
sidebar.wechat

sidebar.feishu
sidebar.chooseYourWayToJoin

sidebar.scanToAddConsultant
The accuracy of AI queries depends on multiple factors: database metadata quality, training samples, permission configuration, question phrasing, etc. This article systematically introduces how to optimize AskTable's query effects, making AI understand your needs more accurately.
Metadata is the foundation for AI to understand database structure. High-quality metadata can:
Add Clear Descriptions:
# Good description
table: "orders"
description: "Customer order records, including order basic info, amount, status, etc."
field: "order_status"
description: "Order status: pending, paid, shipped, completed, cancelled"
# Bad description
table: "orders"
description: "Orders" # Too simple
field: "status"
description: "Status" # Doesn't explain possible values
Description Best Practices:
Hiding unnecessary fields can:
# Hide internal fields
field: "internal_id"
visibility: false
field: "created_by_system"
visibility: false
# Keep business fields
field: "order_id"
visibility: true
field: "customer_name"
visibility: true
Recommended Fields to Hide:
created_at_ms, updated_by_system)hash_key, partition_key)created_date if created_at exists)AskTable supports automatically injecting example values into field descriptions:
# Original description
field: "region"
description: "Sales region"
# After example value injection
field: "region"
description: "Sales region(e.g. \"East China\",\"North China\",\"South China\")"
How It Works:
Enable Value Index:
# Configure Azure AI Search
aisearch_host: "https://your-search.search.windows.net"
aisearch_master_key: "your-key"
When users ask questions, AskTable uses a two-stage retrieval:
User Question: "Sales in East China region last week"
↓
┌─────────────────────────────────────┐
│ 1. Extract subqueries and keywords │
│ subqueries: ["sales", "region"] │
│ keywords: ["East China", "last week"] │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ 2. Semantic search fields │
│ - Search related fields in │
│ vector database │
│ - Match: sales.amount, sales.region│
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ 3. Full-text search values │
│ - Search keywords in value index │
│ - Match: region="East China" │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ 4. Merge results and inject examples│
│ region: "Sales region(e.g. \"East China\")" │
└─────────────────────────────────────┘
AskTable uses Qdrant to store field vectors:
# Field vectorization
field_vector = embed_model.encode(
f"{table.name} {field.name} {field.description}"
)
# Store in Qdrant
qdrant.upsert(
collection_name=f"ds_{datasource_id}_fields",
points=[{
"id": field_id,
"vector": field_vector,
"payload": {
"schema_name": schema.name,
"table_name": table.name,
"field_name": field.name,
"description": field.description,
}
}]
)
Optimization Suggestions:
Value index is used for full-text search of specific data values:
# Index unique values of fields
unique_values = datasource.query(
f"SELECT DISTINCT {field.name} FROM {table.name} LIMIT 1000"
)
# Store in Azure AI Search
for value in unique_values:
search_client.upload_documents([{
"id": f"{field_id}_{value}",
"schema_name": schema.name,
"table_name": table.name,
"field_name": field.name,
"value": value,
"type": "value"
}])
Fields Suitable for Indexing:
Fields Not Suitable for Indexing:
Training Pairs (question-SQL pairs) are used for:
Through UI:
Through API:
import requests
response = requests.post(
"https://api.asktable.com/api/v1/datasources/{ds_id}/training-pairs",
headers={"Authorization": f"Bearer {api_key}"},
json={
"question": "Sales in East China region last week",
"sql": """
-- Sales in East China region last week
SELECT SUM(amount) as total_sales
FROM sales
WHERE region = 'East China'
AND order_date >= DATE_SUB(CURDATE(), INTERVAL 1 WEEK)
AND order_date < CURDATE()
"""
}
)
1. Cover Common Query Patterns:
-- Time range query
SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'
-- Aggregation query
SELECT region, SUM(amount) as total
FROM sales
GROUP BY region
-- Multi-table join
SELECT o.order_id, c.customer_name, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
-- Ranking query
SELECT product_name, sales_count
FROM products
ORDER BY sales_count DESC
LIMIT 10
2. Include Business Terms:
-- Question: "This month's GMV"
-- SQL: SELECT SUM(amount) FROM orders WHERE ...
-- Question: "Active users"
-- SQL: SELECT COUNT(DISTINCT user_id) FROM user_actions WHERE ...
-- Question: "Repurchase rate"
-- SQL: SELECT COUNT(DISTINCT CASE WHEN order_count > 1 THEN customer_id END) / COUNT(DISTINCT customer_id) ...
3. Annotate Complex Logic:
-- Question: "Year-over-year growth rate"
SELECT
current_year.month,
(current_year.sales - last_year.sales) / last_year.sales * 100 as growth_rate
FROM
(SELECT MONTH(order_date) as month, SUM(amount) as sales
FROM orders
WHERE YEAR(order_date) = 2024
GROUP BY MONTH(order_date)) current_year
JOIN
(SELECT MONTH(order_date) as month, SUM(amount) as sales
FROM orders
WHERE YEAR(order_date) = 2023
GROUP BY MONTH(order_date)) last_year
ON current_year.month = last_year.month
4. Quantity Recommendations:
AskTable automatically uses training samples during queries:
# 1. Extract relevant fields from training samples
training_pairs = retrieve_training_pairs(datasource_id, question)
training_fields = extract_fields_from_sql(training_pairs)
# 2. Merge into retrieval results
all_fields = semantic_search_fields + training_fields
# 3. Inject into System Prompt
system_prompt = f"""
Relevant training samples:
{format_training_pairs(training_pairs)}
Relevant fields:
{format_fields(all_fields)}
"""
Permission configuration affects the data range AI can access:
# Users can only see their own department's data
role_policy = {
"tables": ["sales"],
"row_filter": "department = '{{user.department}}'"
}
# AI-generated SQL will automatically add filter conditions
# Original: SELECT * FROM sales
# Actual: SELECT * FROM sales WHERE department = 'Sales Dept'
Optimization Suggestions:
Mask sensitive fields:
# Configure field masking
field_policy = {
"field": "customer_phone",
"mask_type": "phone", # 138****5678
}
field_policy = {
"field": "customer_email",
"mask_type": "email", # a***@example.com
}
See: AskTable Data Security Best Practices
Good questions:
Bad questions:
Time range:
Dimension information:
Metric information:
If your training samples include business terms, you can use them directly:
"This month's GMV" → AI knows GMV = SUM(amount)
"Active users" → AI knows the definition of active users
"Repurchase rate" → AI knows how to calculate repurchase rate
For complex questions, you can query in steps:
Step 1: "Sales by region in 2024"
Step 2: "Sales proportion of East China region"
Step 3: "Top 10 products by sales in East China region"
In Canvas, you can combine multiple queries into a workflow.
AskTable provides streaming response, allowing you to see AI's thinking process:
1. Searching metadata...
Found tables: sales, orders
Found fields: region, amount, order_date
2. Viewing table structure...
sales.region: Sales region(e.g. "East China","North China")
sales.amount: Sales amount
sales.order_date: Order date
3. Executing SQL...
SELECT SUM(amount) FROM sales WHERE region = 'East China' AND ...
4. Returning results
Total sales: 1,234,567
Check if AI-retrieved tables and fields are relevant:
# View retrieval results in logs
retrieved_fields = [
"sales.region (score: 0.92)",
"sales.amount (score: 0.89)",
"sales.order_date (score: 0.85)",
]
If retrieval is inaccurate:
When a query fails, check:
Hide unnecessary tables:
# Only keep business tables
visible_tables = ["orders", "customers", "products", "sales"]
# Hide system tables
hidden_tables = ["_migrations", "_audit_log", "_temp_*"]
Hide unnecessary fields:
# Only keep business fields
visible_fields = ["id", "name", "amount", "status", "created_at"]
# Hide technical fields
hidden_fields = ["internal_*", "*_hash", "*_key"]
Enabling Prompt Caching can reduce processing time for repeated content:
# Automatically enable Prompt Caching
messages = message_builder.dump_openai(cache_control=True)
Content suitable for caching:
Too many training samples increase token consumption:
# Only retrieve most relevant training samples
training_pairs = retrieve_training_pairs(
datasource_id,
question,
limit=10 # Limit count
)
Choose models based on query complexity:
# Simple queries: Use fast model
model = "gpt-4o-mini"
# Complex queries: Use powerful model
model = "gpt-4o"
# Canvas nodes: Use reasoning model
model = "o1-mini"
reasoning_effort = "medium"
Possible causes:
Solutions:
Possible causes:
Solutions:
Possible causes:
Solutions:
Possible causes:
Solutions:
Optimizing AI query effects is a continuous process requiring attention to multiple aspects:
By systematically applying these best practices, you can significantly improve AskTable's query accuracy and efficiency.
sidebar.noProgrammingNeeded
sidebar.startFreeTrial