How to Optimize AI Query Results: AskTable Best Practices

AskTable Team · 2025-12-31

The accuracy of AI queries depends on several factors: database metadata quality, training samples, permission configuration, and question phrasing. This article systematically explains how to optimize AskTable's query results so that the AI understands your needs more accurately.

1. Metadata Optimization

1.1 Why Metadata is Important

Metadata is the foundation for AI to understand database structure. High-quality metadata can:

  • Help AI quickly locate related tables and fields
  • Reduce query errors and ambiguity
  • Improve query response speed
  • Reduce LLM API costs

1.2 Table and Field Descriptions

Add Clear Descriptions:

# Good description
table: "orders"
description: "Customer order records, including order basic info, amount, status, etc."

field: "order_status"
description: "Order status: pending, paid, shipped, completed, cancelled"

# Bad description
table: "orders"
description: "Orders"  # Too simple

field: "status"
description: "Status"  # Doesn't explain possible values

Description Best Practices:

  1. Table description: Explain table's business meaning, main purpose, data scope
  2. Field description: Explain field meaning, data type, possible values, units, etc.
  3. Enumeration values: For status fields, list all possible values and their meanings
  4. Relationship explanation: Explain foreign key relationships and table associations
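
Pulled together, a metadata entry following these practices might look like this (an illustrative sketch using the same notation as the examples above; the table, values, and data scope are made up):

```yaml
# Table description: business meaning, main purpose, data scope
table: "orders"
description: "Customer order records: one row per order, covering all online and offline orders"

# Enumeration field: list all possible values
field: "order_status"
description: "Order status (enum): pending | paid | shipped | completed | cancelled"

# Relationship explanation: document the foreign key
field: "customer_id"
description: "Foreign key referencing customers.id; links each order to its customer"
```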

1.3 Field Visibility Control

Hiding unnecessary fields can:

  • Reduce AI's selection scope
  • Lower query complexity
  • Improve query accuracy

# Hide internal fields
field: "internal_id"
visibility: false

field: "created_by_system"
visibility: false

# Keep business fields
field: "order_id"
visibility: true

field: "customer_name"
visibility: true

Recommended Fields to Hide:

  • System internal fields (like created_at_ms, updated_by_system)
  • Technical fields (like hash_key, partition_key)
  • Redundant fields (e.g. created_date when created_at already exists)
  • Sensitive fields (like password hashes, internal identifiers)

1.4 Example Value Injection

AskTable supports automatically injecting example values into field descriptions:

# Original description
field: "region"
description: "Sales region"

# After example value injection
field: "region"
description: "Sales region (e.g. \"East China\", \"North China\", \"South China\")"

How It Works:

  1. When user asks a question, AskTable extracts keywords
  2. Search for matching values in the value index
  3. Inject matching values into field descriptions
  4. AI sees concrete examples and is more likely to generate correct queries
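
The four steps above can be sketched in a few lines of Python. Here the value index is mocked as a plain dict and the function name is illustrative; in AskTable the lookup is backed by the Azure AI Search value index:

```python
# Minimal sketch of example-value injection: find indexed values that
# appear in the user's question and append them to field descriptions.
def inject_examples(question: str, field_descriptions: dict, value_index: dict) -> dict:
    """Return descriptions enriched with matching example values."""
    enriched = {}
    for field, description in field_descriptions.items():
        # Keep only indexed values for this field that occur in the question
        matches = [v for v in value_index.get(field, []) if v in question]
        if matches:
            examples = ", ".join(f'"{v}"' for v in matches)
            enriched[field] = f"{description} (e.g. {examples})"
        else:
            enriched[field] = description
    return enriched

value_index = {"region": ["East China", "North China", "South China"]}
fields = {"region": "Sales region", "amount": "Sales amount"}

result = inject_examples("Sales in East China region last week", fields, value_index)
print(result["region"])  # Sales region (e.g. "East China")
```

The AI then sees the concrete value "East China" next to the `region` field and can generate the correct `WHERE region = 'East China'` filter directly.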

Enable Value Index:

# Configure Azure AI Search
aisearch_host: "https://your-search.search.windows.net"
aisearch_master_key: "your-key"

2. Semantic Search Optimization

2.1 Semantic Search Workflow

When users ask questions, AskTable uses a two-stage retrieval:

User Question: "Sales in East China region last week"
    ↓
┌─────────────────────────────────────┐
│ 1. Extract subqueries and keywords  │
│    subqueries: ["sales", "region"]  │
│    keywords: ["East China", "last week"] │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ 2. Semantic search fields           │
│    - Search related fields in       │
│      vector database                │
│    - Match: sales.amount, sales.region│
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ 3. Full-text search values          │
│    - Search keywords in value index │
│    - Match: region="East China"     │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ 4. Merge results and inject examples│
│    region: "Sales region(e.g. \"East China\")" │
└─────────────────────────────────────┘

2.2 Vector Database Configuration

AskTable uses Qdrant to store field vectors:

# Field vectorization
field_vector = embed_model.encode(
    f"{table.name} {field.name} {field.description}"
)

# Store in Qdrant
qdrant.upsert(
    collection_name=f"ds_{datasource_id}_fields",
    points=[{
        "id": field_id,
        "vector": field_vector,
        "payload": {
            "schema_name": schema.name,
            "table_name": table.name,
            "field_name": field.name,
            "description": field.description,
        }
    }]
)

Optimization Suggestions:

  1. Regularly sync metadata: Sync after database structure changes
  2. Use high-quality descriptions: More detailed descriptions make vector representation more accurate
  3. Monitor search quality: Check if retrieved fields are relevant
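
To build intuition for what "search quality" means here, the retrieval side can be reduced to a self-contained toy: brute-force cosine similarity over a stub set of field vectors, standing in for Qdrant and the real embedding model (all names, vectors, and dimensions below are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_fields(question_vec, field_vectors, k=2):
    """Rank fields by similarity to the question vector, as Qdrant would."""
    scored = [(name, cosine(question_vec, vec)) for name, vec in field_vectors.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions
field_vectors = {
    "sales.region": [0.9, 0.1, 0.0],
    "sales.amount": [0.7, 0.6, 0.1],
    "orders.status": [0.0, 0.2, 0.9],
}

top = top_k_fields([0.8, 0.4, 0.0], field_vectors)
print([name for name, _ in top])  # the two sales.* fields rank highest
```

Checking retrieval quality then amounts to asking: for a representative question vector, do the top-scoring fields actually belong to the tables the question is about?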

2.3 Value Index Configuration

Value index is used for full-text search of specific data values:

# Index unique values of fields
unique_values = datasource.query(
    f"SELECT DISTINCT {field.name} FROM {table.name} LIMIT 1000"
)

# Store in Azure AI Search
for value in unique_values:
    search_client.upload_documents([{
        "id": f"{field_id}_{value}",
        "schema_name": schema.name,
        "table_name": table.name,
        "field_name": field.name,
        "value": value,
        "type": "value"
    }])

Fields Suitable for Indexing:

  • Enumeration fields (status, type, category)
  • Geographic fields (city, region, country)
  • Organization fields (department, team, company)
  • Product fields (product name, brand, model)

Fields Not Suitable for Indexing:

  • High-cardinality fields (user ID, order number)
  • Continuous numeric fields (amount, quantity)
  • Timestamp fields
  • Text fields (comments, descriptions)
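
The suitable/unsuitable split above is essentially a cardinality check, which can be automated. The helper name and thresholds below are illustrative, not AskTable configuration:

```python
def is_indexable(distinct_count: int, row_count: int,
                 max_distinct: int = 1000, max_ratio: float = 0.1) -> bool:
    """Index low-cardinality columns: few distinct values, each repeated often."""
    if row_count == 0 or distinct_count > max_distinct:
        return False
    # Enum-like columns have a small distinct/total ratio
    return distinct_count / row_count <= max_ratio

# An enum-like status column: 5 values over 100k rows -> worth indexing
print(is_indexable(5, 100_000))
# A near-unique order-number column: almost one value per row -> skip
print(is_indexable(99_500, 100_000))
```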

3. Training Sample Optimization

3.1 Role of Training Samples

Training Pairs (question-SQL pairs) are used for:

  1. Few-shot Learning: Provide query examples for AI reference
  2. Schema Linking: Mark commonly used tables and fields
  3. Pattern Learning: Let AI learn specific query patterns

3.2 Adding Training Samples

Through UI:

  1. Execute a query in the chat interface
  2. Click "Add to training set" button on query result
  3. Edit question and SQL (if needed)
  4. Save

Through API:

import requests

response = requests.post(
    f"https://api.asktable.com/api/v1/datasources/{ds_id}/training-pairs",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "question": "Sales in East China region last week",
        "sql": """
            -- Sales in East China region last week
            SELECT SUM(amount) as total_sales
            FROM sales
            WHERE region = 'East China'
              AND order_date >= DATE_SUB(CURDATE(), INTERVAL 1 WEEK)
              AND order_date < CURDATE()
        """
    }
)

3.3 Training Sample Best Practices

1. Cover Common Query Patterns:

-- Time range query
SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'

-- Aggregation query
SELECT region, SUM(amount) as total
FROM sales
GROUP BY region

-- Multi-table join
SELECT o.order_id, c.customer_name, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.id

-- Ranking query
SELECT product_name, sales_count
FROM products
ORDER BY sales_count DESC
LIMIT 10

2. Include Business Terms:

-- Question: "This month's GMV"
-- SQL: SELECT SUM(amount) FROM orders WHERE ...

-- Question: "Active users"
-- SQL: SELECT COUNT(DISTINCT user_id) FROM user_actions WHERE ...

-- Question: "Repurchase rate"
-- SQL: SELECT COUNT(DISTINCT CASE WHEN order_count > 1 THEN customer_id END) / COUNT(DISTINCT customer_id) ...

3. Annotate Complex Logic:

-- Question: "Year-over-year growth rate"
SELECT
  current_year.month,
  (current_year.sales - last_year.sales) / last_year.sales * 100 as growth_rate
FROM
  (SELECT MONTH(order_date) as month, SUM(amount) as sales
   FROM orders
   WHERE YEAR(order_date) = 2024
   GROUP BY MONTH(order_date)) current_year
JOIN
  (SELECT MONTH(order_date) as month, SUM(amount) as sales
   FROM orders
   WHERE YEAR(order_date) = 2023
   GROUP BY MONTH(order_date)) last_year
ON current_year.month = last_year.month

4. Quantity Recommendations:

  • Minimum: 10-20 samples to cover basic queries
  • Recommended: 50-100 samples to cover common scenarios
  • Maximum: 200-300 samples (more increases token consumption)

3.4 Using Training Samples

AskTable automatically uses training samples during queries:

# 1. Extract relevant fields from training samples
training_pairs = retrieve_training_pairs(datasource_id, question)
training_fields = extract_fields_from_sql(training_pairs)

# 2. Merge into retrieval results
all_fields = semantic_search_fields + training_fields

# 3. Inject into System Prompt
system_prompt = f"""
Relevant training samples:
{format_training_pairs(training_pairs)}

Relevant fields:
{format_fields(all_fields)}
"""

4. Permission Configuration Optimization

4.1 Impact of Permissions on Queries

Permission configuration affects the data range AI can access:

# Users can only see their own department's data
role_policy = {
    "tables": ["sales"],
    "row_filter": "department = '{{user.department}}'"
}

# AI-generated SQL will automatically add filter conditions
# Original: SELECT * FROM sales
# Actual: SELECT * FROM sales WHERE department = 'Sales Dept'
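
As an illustration of the mechanism, the template substitution and SQL rewriting could look like the sketch below. This is deliberately simplified, not AskTable's actual implementation: real SQL rewriting must handle subqueries, GROUP BY, quoting, and injection safety.

```python
import re

def apply_row_filter(sql: str, row_filter: str, user: dict) -> str:
    """Substitute {{user.attr}} placeholders, then append as a WHERE clause."""
    def resolve(match: re.Match) -> str:
        # Look up the user attribute named in the placeholder
        return str(user[match.group(1)])

    resolved = re.sub(r"\{\{user\.(\w+)\}\}", resolve, row_filter)
    # Append with AND if the query already has a WHERE clause
    joiner = " AND " if " where " in sql.lower() else " WHERE "
    return sql + joiner + resolved

sql = apply_row_filter(
    "SELECT * FROM sales",
    "department = '{{user.department}}'",
    {"department": "Sales Dept"},
)
print(sql)  # SELECT * FROM sales WHERE department = 'Sales Dept'
```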

Optimization Suggestions:

  1. Set table permissions reasonably: Only authorize necessary tables
  2. Use row-level filtering: Limit data scope rather than hiding entire tables
  3. Avoid over-restriction: Too much restriction prevents AI from answering questions

4.2 Field Masking

Mask sensitive fields:

# Configure field masking
field_policy = {
    "field": "customer_phone",
    "mask_type": "phone",  # 138****5678
}

field_policy = {
    "field": "customer_email",
    "mask_type": "email",  # a***@example.com
}
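
The two mask_type formats shown in the comments above could be produced by helpers like these (illustrative implementations; AskTable's actual masking rules may differ):

```python
def mask_phone(phone: str) -> str:
    """Keep the first 3 and last 4 digits, mask the middle: 138****5678."""
    return phone[:3] + "****" + phone[-4:]

def mask_email(email: str) -> str:
    """Keep the first character of the local part: a***@example.com."""
    local, domain = email.split("@", 1)
    return local[0] + "***@" + domain

print(mask_phone("13812345678"))        # 138****5678
print(mask_email("alice@example.com"))  # a***@example.com
```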

See: AskTable Data Security Best Practices

5. Questioning Skills

5.1 Clear Questions

Good questions:

  • "Sales in East China region last week"
  • "Top 10 products by sales in January 2024"
  • "New users this month"

Bad questions:

  • "Sales situation" (too vague)
  • "Data" (doesn't specify what data)
  • "Help me check" (doesn't specify what to check)

5.2 Include Key Information

Time range:

  • "Last week" / "This month" / "2024"
  • "Last 7 days" / "Past 30 days"
  • "2024-01-01 to 2024-01-31"

Dimension information:

  • "By region" / "By product" / "By department"
  • "East China region" / "Sales department" / "iPhone"

Metric information:

  • "Sales amount" / "Order count" / "User count"
  • "Average" / "Sum" / "Growth rate"

5.3 Use Business Terms

If your training samples include business terms, you can use them directly:

"This month's GMV" → AI knows GMV = SUM(amount)
"Active users" → AI knows the definition of active users
"Repurchase rate" → AI knows how to calculate repurchase rate

5.4 Step-by-Step Queries

For complex questions, you can query in steps:

Step 1: "Sales by region in 2024"
Step 2: "Sales proportion of East China region"
Step 3: "Top 10 products by sales in East China region"

In Canvas, you can combine multiple queries into a workflow.

6. Monitoring and Debugging

6.1 View AI Thinking Process

AskTable streams its responses, so you can watch the AI's reasoning process as it runs:

1. Searching metadata...
   Found tables: sales, orders
   Found fields: region, amount, order_date

2. Viewing table structure...
   sales.region: Sales region (e.g. "East China", "North China")
   sales.amount: Sales amount
   sales.order_date: Order date

3. Executing SQL...
   SELECT SUM(amount) FROM sales WHERE region = 'East China' AND ...

4. Returning results
   Total sales: 1,234,567

6.2 Check Retrieval Quality

Check if AI-retrieved tables and fields are relevant:

# View retrieval results in logs
retrieved_fields = [
    "sales.region (score: 0.92)",
    "sales.amount (score: 0.89)",
    "sales.order_date (score: 0.85)",
]

If retrieval is inaccurate:

  • Check if field descriptions are clear
  • Check if vector database is synced
  • Consider adding training samples

6.3 Analyze Failed Queries

When a query fails, check:

  1. Metadata issues: Did AI find the correct tables and fields?
  2. Permission issues: Does the user have permission to access these tables?
  3. SQL errors: Are there syntax errors in the generated SQL?
  4. Data issues: Is there relevant data in the database?

7. Performance Optimization

7.1 Reduce Metadata Scale

Hide unnecessary tables:

# Only keep business tables
visible_tables = ["orders", "customers", "products", "sales"]

# Hide system tables
hidden_tables = ["_migrations", "_audit_log", "_temp_*"]

Hide unnecessary fields:

# Only keep business fields
visible_fields = ["id", "name", "amount", "status", "created_at"]

# Hide technical fields
hidden_fields = ["internal_*", "*_hash", "*_key"]
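
Wildcard patterns like these can be matched against field names with Python's standard-library fnmatch; the helper below is an illustrative sketch, not AskTable's API:

```python
from fnmatch import fnmatch

HIDDEN_PATTERNS = ["internal_*", "*_hash", "*_key"]

def is_hidden(field_name: str) -> bool:
    """True if the field name matches any hide pattern."""
    return any(fnmatch(field_name, pattern) for pattern in HIDDEN_PATTERNS)

fields = ["order_id", "internal_id", "password_hash", "partition_key", "amount"]
visible = [f for f in fields if not is_hidden(f)]
print(visible)  # ['order_id', 'amount']
```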

7.2 Use Prompt Caching

Enabling Prompt Caching can reduce processing time for repeated content:

# Automatically enable Prompt Caching
messages = message_builder.dump_openai(cache_control=True)

Content suitable for caching:

  • Database metadata (rarely changes)
  • Training samples (relatively stable)
  • System Prompt (fixed content)

7.3 Control Training Sample Count

Too many training samples increase token consumption:

# Only retrieve most relevant training samples
training_pairs = retrieve_training_pairs(
    datasource_id,
    question,
    limit=10  # Limit count
)

7.4 Use Appropriate Models

Choose models based on query complexity:

# Simple queries: Use fast model
model = "gpt-4o-mini"

# Complex queries: Use powerful model
model = "gpt-4o"

# Canvas nodes: Use reasoning model
model = "o1-mini"
reasoning_effort = "medium"

8. Common Questions

Q1: AI query results are inaccurate

Possible causes:

  • Table/field descriptions are unclear
  • Vector database not synced
  • Question phrasing is unclear

Solutions:

  1. Improve table and field descriptions
  2. Resync metadata to vector database
  3. Add relevant training samples
  4. Use clearer question phrasing

Q2: AI-generated SQL has errors

Possible causes:

  • Field type information inaccurate
  • Missing relevant training samples
  • Question too complex

Solutions:

  1. Check if field's data_type is correct
  2. Add training samples for similar queries
  3. Break complex questions into multiple simple ones

Q3: Query is slow

Possible causes:

  • Metadata scale too large
  • Too many training samples
  • Prompt Caching not enabled

Solutions:

  1. Hide unnecessary tables and fields
  2. Limit training sample count
  3. Enable Prompt Caching
  4. Use faster models

Q4: AI-returned data doesn't match permissions

Possible causes:

  • Permission configuration error
  • Row-level filtering not working

Solutions:

  1. Check role permission configuration
  2. Verify row-level filter expression
  3. Check if generated SQL includes filter conditions

9. Summary

Optimizing AI query results is a continuous process that requires attention to multiple aspects:

  1. Metadata optimization: Clear descriptions, reasonable visibility, example value injection
  2. Semantic search: Vector database, value index, regular sync
  3. Training samples: Cover common patterns, include business terms, add in moderation
  4. Permission configuration: Reasonable authorization, row-level filtering, field masking
  5. Questioning skills: Clear expression, include key information, use business terms
  6. Monitoring and debugging: View thinking process, check retrieval quality, analyze failure reasons
  7. Performance optimization: Reduce metadata, Prompt Caching, choose appropriate models

By systematically applying these best practices, you can significantly improve AskTable's query accuracy and efficiency.
