
sidebar.wechat

sidebar.feishu
sidebar.chooseYourWayToJoin

sidebar.scanToAddConsultant
In Text-to-SQL systems, metadata quality directly determines SQL generation accuracy. But traditional metadata management often only focuses on technical aspects (table names, field names, data types), ignoring business semantics. AskTable's MetaAdmin semantic layer organically combines technical metadata and business semantics, allowing AI to truly understand enterprise data.
Technical Metadata Example:
CREATE TABLE ord_dtl (
ord_id VARCHAR(32),
cust_id VARCHAR(32),
prod_id VARCHAR(32),
qty INT,
amt DECIMAL(10,2),
sts TINYINT,
crt_tm DATETIME
);
Problems:
ord_dtl mean? Order details?qty quantity of?amt amount including or excluding tax?sts values 1, 2, 3 represent?crt_tm creation time or order time?User Question: "What is the order amount for last month?"
AI's Confusion:
ord_dtl table or order_detail table?amt field or amount field?sts field? What values count as valid orders?AskTable's MetaAdmin semantic layer provides complete business semantics:
Table(
name="ord_dtl",
origin_desc="ord_dtl", # Original table name
curr_desc="Order details table, records product明细 information for each order", # Business description
fields={
"ord_id": Field(
name="ord_id",
data_type="VARCHAR(32)",
curr_desc="Order ID, references order master table",
),
"amt": Field(
name="amt",
data_type="DECIMAL(10,2)",
curr_desc="Order amount, unit: yuan, excluding tax, excluding shipping",
),
"sts": Field(
name="sts",
data_type="TINYINT",
curr_desc="Order status: 1-pending payment, 2-paid, 3-shipped, 4-completed, 5-cancelled",
),
}
)
Now AI can:
ord_dtl is the order details tableamt is amount excluding tax and shippingsts of 2, 3, 4 are valid ordersThe diagram shows MetaAdmin's three-layer architecture: Schema → Table → Field. Each layer contains original description and business description. Through this layered structure, AI can accurately understand the business meaning and hierarchical relationships of data.
AskTable adopts three-layer structure: Schema → Table → Field
@dataclasses.dataclass(kw_only=True)
class StructureBase:
name: str
origin_desc: str | None = None # Original description
curr_desc: str | None = None # Current description (business semantics)
curr_desc_stat: str = "origin" # Description status: origin/manual/ai_generated
@dataclasses.dataclass()
class Field(StructureBase):
schema_name: str
table_name: str
sample_data: str | None = None # Sample data
data_type: str | None = None # Data type
is_index: bool = False # Whether indexed field
index_count: int = 0 # Index cardinality
visibility: bool = True # Whether visible
identifiable_type: IdentifiableType = IdentifiableType.plain # Sensitive information type
is_vector_synced: bool = True # Whether synced to vector library
@property
def full_name(self) -> str:
return f"{self.schema_name}.{self.table_name}.{self.name}"
@dataclasses.dataclass()
class Table(StructureBase):
schema_name: str
fields: dict[str, Field] = dataclasses.field(default_factory=dict)
@property
def full_name(self) -> str:
return f"{self.schema_name}.{self.name}"
@dataclasses.dataclass()
class Schema(StructureBase):
custom_configs: dict | None = None
tables: dict[str, Table] = dataclasses.field(default_factory=dict)
@property
def full_name(self) -> str:
return self.name
@dataclasses.dataclass(kw_only=True)
class MetaAdmin:
datasource_id: str | None = None
schemas: dict[str, Schema] = dataclasses.field(default_factory=dict)
@property
def schema_count(self):
return len(self.schemas.values())
@property
def table_count(self):
return sum([len(schema.tables.values()) for schema in self.schemas.values()])
@property
def field_count(self):
return sum([
len(table.fields.values())
for schema in self.schemas.values()
for table in schema.tables.values()
])
def __repr__(self):
return f"[{self.datasource_id}] {self.schema_count} S, {self.table_count} T, {self.field_count} F"
Design Highlights:
origin_desc: str | None = None # Original description
curr_desc: str | None = None # Current description
curr_desc_stat: str = "origin" # Description source
Why two descriptions?
sample_data: str | None = None # Sample data
is_index: bool = False # Whether indexed
index_count: int = 0 # Index cardinality
visibility: bool = True # Whether visible
identifiable_type: IdentifiableType = IdentifiableType.plain # Sensitive type
MetaAdmin provides powerful filtering, supporting filtering by name and regex:
def filter_fields_by_names(
self, field_full_names: list[tuple[str, str, str]]
) -> "MetaAdmin":
"""
Filter by field full names
field_full_names: [(schema_name, table_name, field_name), ...]
"""
# ... implementation
def filter_tables_by_names(
self, table_full_names: list[tuple[str, str]]
) -> "MetaAdmin":
"""
Filter by table full names
table_full_names: [(schema_name, table_name), ...]
"""
# ... implementation
def filter_by_regex(
self, schema_pattern=None, table_pattern=None, field_pattern=None
) -> "MetaAdmin":
"""
Filter by regex patterns
"""
# ... implementation
Supports merging and difference operations for multiple MetaAdmins:
def merge_metas(allow_metas: list[MetaAdmin], deny_metas: list[MetaAdmin]) -> MetaAdmin:
"""
Merge multiple allow_metas, and remove fields in deny_metas from result
"""
# ... implementation
MetaAdmin can export to Markdown format, easy to pass to LLM:
def to_markdown(
self,
level: Literal["table", "field"] = "field",
include: set[str] = set(),
exclude: set[str] = set(),
) -> str:
"""
Convert to Markdown format
"""
# ... implementation
Requirement: Sales department can only query orders for their responsible regions
# Get user role's accessible metadata
role = get_user_role(user_id)
accessible_meta = role.get_accessible_meta(datasource)
# Filter visible fields
# ... implementation
# Pass to Agent
agent = DBAgent(datasource=datasource, meta=accessible_meta, assumed_role=role)
Requirement: Customer service can query user info, but phone numbers and ID numbers need masking
# Tag sensitive fields
user_table.fields["phone"].identifiable_type = IdentifiableType.phone
user_table.fields["id_card"].identifiable_type = IdentifiableType.id_card
# Auto-mask during query
result = datasource.accessor.query("SELECT * FROM users LIMIT 10")
# ... masking logic
Requirement: Auto-generate field descriptions, improve metadata quality
async def enhance_metadata(meta: MetaAdmin) -> MetaAdmin:
"""Use AI to enhance metadata descriptions"""
# ... implementation
class MetaAdmin:
_lazy_loaded: bool = False
def ensure_loaded(self):
"""Lazy load metadata"""
# ... implementation
def update_field_description(
self,
schema_name: str,
table_name: str,
field_name: str,
new_desc: str
):
"""Incrementally update field description"""
# ... implementation
class MetaCache:
_cache: dict[str, MetaAdmin] = {}
_ttl: int = 3600 # 1 hour
# ... implementation
Table Description:
# ✅ Good description
table.curr_desc = "Order table, records all customer order information including order amount, status, creation time, etc."
# ❌ Poor description
table.curr_desc = "orders" # No business meaning
Field Description:
# ✅ Good description
field.curr_desc = "Order amount, unit: yuan, excluding tax, excluding shipping"
# ❌ Poor description
field.curr_desc = "Amount" # Insufficient information
# Phone number
field.identifiable_type = IdentifiableType.phone
# ID card
field.identifiable_type = IdentifiableType.id_card
# Email
field.identifiable_type = IdentifiableType.email
# Bank card
field.identifiable_type = IdentifiableType.bank_card
# Hide internal fields
field.visibility = False
# Hide test tables
table.visibility = False
# Hide deprecated fields
field.visibility = False
field.curr_desc = f"{field.curr_desc} (Deprecated)"
AskTable's MetaAdmin semantic layer enables AI to truly understand enterprise data through layered metadata management, dynamic filtering, and permission control:
The semantic layer is the foundation of Text-to-SQL systems. High-quality metadata directly determines SQL generation accuracy. Through MetaAdmin, AskTable achieves a perfect combination of technical metadata and business semantics.
sidebar.noProgrammingNeeded
sidebar.startFreeTrial