Semantic Layer Architecture: Letting AI Truly Understand Enterprise Data

In Text-to-SQL systems, metadata quality directly determines SQL generation accuracy. But traditional metadata management often only focuses on technical aspects (table names, field names, data types), ignoring business semantics. AskTable's MetaAdmin semantic layer organically combines technical metadata and business semantics, allowing AI to truly understand enterprise data.

Why Do We Need a Semantic Layer?

Limitations of Traditional Metadata

Technical Metadata Example:

CREATE TABLE ord_dtl (
    ord_id VARCHAR(32),
    cust_id VARCHAR(32),
    prod_id VARCHAR(32),
    qty INT,
    amt DECIMAL(10,2),
    sts TINYINT,
    crt_tm DATETIME
);

Problems:

•What does table name ord_dtl mean? Order details?
•What is field qty quantity of?
•Is field amt amount including or excluding tax?
•What do field sts values 1, 2, 3 represent?
•Is field crt_tm creation time or order time?

User Question: "What is the order amount for last month?"

AI's Confusion:

•Should query ord_dtl table or order_detail table?
•Should use amt field or amount field?
•Need to filter sts field? What values count as valid orders?

Value of Semantic Layer

AskTable's MetaAdmin semantic layer provides complete business semantics:

Table(
    name="ord_dtl",
    origin_desc="ord_dtl",  # Original table name
    curr_desc="Order details table, records product明细 information for each order",  # Business description
    fields={
        "ord_id": Field(
            name="ord_id",
            data_type="VARCHAR(32)",
            curr_desc="Order ID, references order master table",
        ),
        "amt": Field(
            name="amt",
            data_type="DECIMAL(10,2)",
            curr_desc="Order amount, unit: yuan, excluding tax, excluding shipping",
        ),
        "sts": Field(
            name="sts",
            data_type="TINYINT",
            curr_desc="Order status: 1-pending payment, 2-paid, 3-shipped, 4-completed, 5-cancelled",
        ),
    }
)

Now AI can:

•Understand ord_dtl is the order details table
•Know amt is amount excluding tax and shipping
•Know only orders with sts of 2, 3, 4 are valid orders

MetaAdmin Layered Architecture

加载图表中...

The diagram shows MetaAdmin's three-layer architecture: Schema → Table → Field. Each layer contains original description and business description. Through this layered structure, AI can accurately understand the business meaning and hierarchical relationships of data.

Core Architecture Design

1. Layered Metadata Structure

AskTable adopts three-layer structure: Schema → Table → Field

@dataclasses.dataclass(kw_only=True)
class StructureBase:
    name: str
    origin_desc: str | None = None  # Original description
    curr_desc: str | None = None    # Current description (business semantics)
    curr_desc_stat: str = "origin"  # Description status: origin/manual/ai_generated

@dataclasses.dataclass()
class Field(StructureBase):
    schema_name: str
    table_name: str

    sample_data: str | None = None  # Sample data
    data_type: str | None = None    # Data type

    is_index: bool = False          # Whether indexed field
    index_count: int = 0            # Index cardinality

    visibility: bool = True         # Whether visible
    identifiable_type: IdentifiableType = IdentifiableType.plain  # Sensitive information type

    is_vector_synced: bool = True   # Whether synced to vector library

    @property
    def full_name(self) -> str:
        return f"{self.schema_name}.{self.table_name}.{self.name}"

@dataclasses.dataclass()
class Table(StructureBase):
    schema_name: str
    fields: dict[str, Field] = dataclasses.field(default_factory=dict)

    @property
    def full_name(self) -> str:
        return f"{self.schema_name}.{self.name}"

@dataclasses.dataclass()
class Schema(StructureBase):
    custom_configs: dict | None = None
    tables: dict[str, Table] = dataclasses.field(default_factory=dict)

    @property
    def full_name(self) -> str:
        return self.name

@dataclasses.dataclass(kw_only=True)
class MetaAdmin:
    datasource_id: str | None = None
    schemas: dict[str, Schema] = dataclasses.field(default_factory=dict)

    @property
    def schema_count(self):
        return len(self.schemas.values())

    @property
    def table_count(self):
        return sum([len(schema.tables.values()) for schema in self.schemas.values()])

    @property
    def field_count(self):
        return sum([
            len(table.fields.values())
            for schema in self.schemas.values()
            for table in schema.tables.values()
        ])

    def __repr__(self):
        return f"[{self.datasource_id}] {self.schema_count} S, {self.table_count} T, {self.field_count} F"

Design Highlights:

1.1 Dual Description Mechanism

origin_desc: str | None = None  # Original description
curr_desc: str | None = None    # Current description
curr_desc_stat: str = "origin"  # Description source

Why two descriptions?

•origin_desc: Preserve original information for traceability
•
curr_desc: Current business description in use, can be:
- •Manually written (manual)
- •AI generated (ai_generated)
- •Original value (origin)

1.2 Rich Field Attributes

sample_data: str | None = None  # Sample data
is_index: bool = False          # Whether indexed
index_count: int = 0            # Index cardinality
visibility: bool = True         # Whether visible
identifiable_type: IdentifiableType = IdentifiableType.plain  # Sensitive type

2. Dynamic Filtering Capability

MetaAdmin provides powerful filtering, supporting filtering by name and regex:

2.1 Filter by Field Name

def filter_fields_by_names(
    self, field_full_names: list[tuple[str, str, str]]
) -> "MetaAdmin":
    """
    Filter by field full names
    field_full_names: [(schema_name, table_name, field_name), ...]
    """
    # ... implementation

2.2 Filter by Table Name

def filter_tables_by_names(
    self, table_full_names: list[tuple[str, str]]
) -> "MetaAdmin":
    """
    Filter by table full names
    table_full_names: [(schema_name, table_name), ...]
    """
    # ... implementation

2.3 Filter by Regex

def filter_by_regex(
    self, schema_pattern=None, table_pattern=None, field_pattern=None
) -> "MetaAdmin":
    """
    Filter by regex patterns
    """
    # ... implementation

3. Metadata Merging

Supports merging and difference operations for multiple MetaAdmins:

def merge_metas(allow_metas: list[MetaAdmin], deny_metas: list[MetaAdmin]) -> MetaAdmin:
    """
    Merge multiple allow_metas, and remove fields in deny_metas from result
    """
    # ... implementation

4. Markdown Export

MetaAdmin can export to Markdown format, easy to pass to LLM:

def to_markdown(
    self,
    level: Literal["table", "field"] = "field",
    include: set[str] = set(),
    exclude: set[str] = set(),
) -> str:
    """
    Convert to Markdown format
    """
    # ... implementation

Practical Application Scenarios

Scenario 1: Permission Control

Requirement: Sales department can only query orders for their responsible regions

# Get user role's accessible metadata
role = get_user_role(user_id)
accessible_meta = role.get_accessible_meta(datasource)

# Filter visible fields
# ... implementation

# Pass to Agent
agent = DBAgent(datasource=datasource, meta=accessible_meta, assumed_role=role)

Scenario 2: Sensitive Information Masking

Requirement: Customer service can query user info, but phone numbers and ID numbers need masking

# Tag sensitive fields
user_table.fields["phone"].identifiable_type = IdentifiableType.phone
user_table.fields["id_card"].identifiable_type = IdentifiableType.id_card

# Auto-mask during query
result = datasource.accessor.query("SELECT * FROM users LIMIT 10")
# ... masking logic

Scenario 3: Metadata Quality Enhancement

Requirement: Auto-generate field descriptions, improve metadata quality

async def enhance_metadata(meta: MetaAdmin) -> MetaAdmin:
    """Use AI to enhance metadata descriptions"""
    # ... implementation

Performance Optimization Practices

1. Lazy Loading

class MetaAdmin:
    _lazy_loaded: bool = False

    def ensure_loaded(self):
        """Lazy load metadata"""
        # ... implementation

2. Incremental Updates

def update_field_description(
    self,
    schema_name: str,
    table_name: str,
    field_name: str,
    new_desc: str
):
    """Incrementally update field description"""
    # ... implementation

3. Caching Strategy

class MetaCache:
    _cache: dict[str, MetaAdmin] = {}
    _ttl: int = 3600  # 1 hour

    # ... implementation

Best Practice Recommendations

1. Metadata Description Standards

Table Description:

# ✅ Good description
table.curr_desc = "Order table, records all customer order information including order amount, status, creation time, etc."

# ❌ Poor description
table.curr_desc = "orders"  # No business meaning

Field Description:

# ✅ Good description
field.curr_desc = "Order amount, unit: yuan, excluding tax, excluding shipping"

# ❌ Poor description
field.curr_desc = "Amount"  # Insufficient information

2. Sensitive Information Tagging

# Phone number
field.identifiable_type = IdentifiableType.phone

# ID card
field.identifiable_type = IdentifiableType.id_card

# Email
field.identifiable_type = IdentifiableType.email

# Bank card
field.identifiable_type = IdentifiableType.bank_card

3. Visibility Control

# Hide internal fields
field.visibility = False

# Hide test tables
table.visibility = False

# Hide deprecated fields
field.visibility = False
field.curr_desc = f"{field.curr_desc} (Deprecated)"

Summary

AskTable's MetaAdmin semantic layer enables AI to truly understand enterprise data through layered metadata management, dynamic filtering, and permission control:

•Dual Description: Preserve original information, support business semantics
•Rich Attributes: Sample data, index information, sensitive tags
•Dynamic Filtering: Filter by name, regex, permissions
•Metadata Merging: Support multi-role, multi-datasource
•Markdown Export: Clear structure, easy for LLM to understand

The semantic layer is the foundation of Text-to-SQL systems. High-quality metadata directly determines SQL generation accuracy. Through MetaAdmin, AskTable achieves a perfect combination of technical metadata and business semantics.

cta.readyToSimplify

sidebar.noProgrammingNeeded
sidebar.startFreeTrial

cta.startFree cta.viewPricing

cta.noCreditCard

cta.quickStart

cta.dbSupport

sidebar.joinAskTableCommunity