
There's a classic saying in the data analyst industry:
Garbage in, garbage out.
If the input data has problems, then no matter how sophisticated the analysis method or how complex the model, the output will be unreliable.
But have you run into problems like the ones below? They all share one trait: the data has issues, and nobody discovers them until the analysis is already complete.
AskTable's Data Quality Detection Skill does one thing: proactively check data quality before and during analysis, ensuring analysis results are trustworthy.
Manifestation: missing data, null values, and gaps in time series
Examples:
- A store's sales data for March 15-18 is empty
- 40% of the "age" field is empty in user profiles
- A 3-day gap in a time series
Impact:
- Trend analysis shows artificial "cliffs"
- Aggregated totals come out lower than reality
- Prediction models are distorted by insufficient historical data
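A gap like the 3-day breakpoint above can be found by scanning the expected date range for days with no record. A minimal sketch (the dates and function name are illustrative, not AskTable's internal code):

```python
from datetime import date, timedelta

def find_missing_days(recorded_dates, start, end):
    """Return every day in [start, end] that has no record."""
    present = set(recorded_dates)
    missing, d = [], start
    while d <= end:
        if d not in present:
            missing.append(d)
        d += timedelta(days=1)
    return missing

# A week of daily sales records with a two-day gap
recorded = [date(2026, 3, d) for d in (13, 14, 17, 18, 19)]
gaps = find_missing_days(recorded, date(2026, 3, 13), date(2026, 3, 19))
# gaps -> [date(2026, 3, 15), date(2026, 3, 16)]
```

The same scan generalizes to hourly or weekly granularity by changing the step.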
Manifestation: Same metric values inconsistent across different systems
Examples:
- CRM system shows this month's sales 5M, ERP shows 4.8M
- Reason: CRM counts by order time, ERP counts by shipping time
- Different definitions produce two versions of the "truth"
Impact:
- Decision-makers see contradictory numbers and don't know which to trust
- Cross-system analysis results unreliable
Manifestation: Data values clearly illogical
Examples:
- A user's age shows as 200
- An order amount is negative
- A small store's daily sales shows 100M
Impact:
- Averages skewed by extreme values
- Aggregated results seriously distorted
Manifestation: Same record recorded multiple times
Examples:
- Some orders were recorded twice due to a system malfunction
- A user's form was submitted multiple times because of network jitter
Impact:
- Aggregated totals are artificially inflated
- Ratio metrics such as conversion rate are distorted
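The inflation effect is easy to see in a minimal sketch (the order IDs and amounts are made up for illustration):

```python
# Duplicate rows inflate totals until the data is deduplicated by a key.
orders = [
    {"order_id": "A001", "amount": 100},
    {"order_id": "A002", "amount": 250},
    {"order_id": "A002", "amount": 250},  # same order recorded twice by a glitch
]

raw_total = sum(o["amount"] for o in orders)          # inflated: 600
deduped = list({o["order_id"]: o for o in orders}.values())  # one row per order_id
clean_total = sum(o["amount"] for o in deduped)       # correct: 350
duplication_rate = 1 - len(deduped) / len(orders)     # 1/3 of the rows were duplicates
```

Deduplicating by a business key (here `order_id`) rather than by full-row equality also catches duplicates whose non-key fields drifted between submissions.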
Manifestation: Data arrives late or is outdated
Examples:
- "Today's" data was actually last updated 3 days ago
- After a data-source change, data using the old definition keeps flowing in
Impact:
- Decisions based on outdated data may already be invalid
- Trend analysis shows artificial turning points
Before each analysis, AskTable automatically executes the following data quality checks:
Detection items:
- Null value ratio: proportion of null values in a field
- Missing data blocks: consecutive missing time periods or records
- Coverage: actual data volume vs expected data volume
Example:
"Past 30 days sales data:
- 28 days with data, 2 days missing (March 15-16)
- Data coverage 93.3%
- Missing days proportion 6.7%, below 10% threshold ✅"
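The coverage arithmetic above can be sketched as follows (the threshold and field names are illustrative, not AskTable's internal code):

```python
def completeness_report(values, expected_count, max_missing_ratio=0.10):
    """Coverage of a series against the expected record count; None means missing."""
    present = sum(1 for v in values if v is not None)
    missing_ratio = 1 - present / expected_count
    return {
        "coverage_pct": round((present / expected_count) * 100, 1),
        "missing_pct": round(missing_ratio * 100, 1),
        "passed": missing_ratio <= max_missing_ratio,
    }

# 30 expected days, 2 of them missing
daily_sales = [1000] * 28 + [None, None]
report = completeness_report(daily_sales, expected_count=30)
# report -> {'coverage_pct': 93.3, 'missing_pct': 6.7, 'passed': True}
```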
Detection items:
- Cross-system metric comparison: same metric values from different sources
- Definition change records: whether statistical definitions changed
- Reconciliation checks: logical relationships between related metrics
Example:
"Sales comparison:
- Data source A (POS system): 5.2M
- Data source B (ERP system): 5.15M
- Difference 50K (0.97%), within acceptable range ✅
But note:
- Data source A counts by order time, data source B counts by payment time
- Difference mainly from cross-day orders (orders placed 23:00-24:00, paid next day)"
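A cross-source comparison like the one above boils down to a relative-difference check against a tolerance. A sketch (note this version divides by the larger value, so it reports 0.96% rather than the 0.97% shown above, which divides by the smaller):

```python
def compare_sources(value_a, value_b, tolerance=0.01):
    """Relative difference between two sources, judged against a tolerance (1% default)."""
    base = max(abs(value_a), abs(value_b))
    ratio = abs(value_a - value_b) / base if base else 0.0
    return {
        "difference": abs(value_a - value_b),
        "difference_pct": round(ratio * 100, 2),
        "within_tolerance": ratio <= tolerance,
    }

# POS vs ERP monthly sales, mirroring the example above
check = compare_sources(5_200_000, 5_150_000)
# check -> {'difference': 50000, 'difference_pct': 0.96, 'within_tolerance': True}
```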
Detection items:
- Range validation: whether values are within reasonable ranges
- Extreme value detection: whether abnormally high/low values exist
- Logical validation: logical relationships between related fields
Example:
"Data accuracy check:
- Age field: found 3 records with age > 120 ⚠️
- Order amount: found 1 negative value record (returns) ⚠️
- Sales/orders = avg transaction, validation passed ✅"
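Range and logical validation of this kind can be sketched with a few explicit rules (the field names, bounds, and `is_return` flag are illustrative assumptions):

```python
def accuracy_checks(rows):
    """Flag out-of-range ages and unexplained negative amounts."""
    issues = []
    for r in rows:
        if not 0 <= r["age"] <= 120:                      # range validation
            issues.append(("age_out_of_range", r["age"]))
        if r["amount"] < 0 and not r.get("is_return", False):  # logical validation
            issues.append(("negative_amount", r["amount"]))
    return issues

rows = [
    {"age": 34, "amount": 120.0},
    {"age": 200, "amount": 80.0},                     # impossible age
    {"age": 45, "amount": -30.0, "is_return": True},  # negative, but a valid return
]
flags = accuracy_checks(rows)
# flags -> [('age_out_of_range', 200)]
```

Separating "out of range" from "negative but explained by business logic" is what lets the report above mark returns as normal while still flagging impossible ages.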
AskTable gives a comprehensive data quality score (0-100):
📊 Data Quality Report
Overall score: 82/100 ✅ Good
┌──────────────┬───────┬────────┐
│ Dimension    │ Score │ Status │
├──────────────┼───────┼────────┤
│ Completeness │  88   │ ✅     │
│ Consistency  │  90   │ ✅     │
│ Accuracy     │  75   │ ⚠️     │
│ Duplication  │  95   │ ✅     │
│ Timeliness   │  65   │ ⚠️     │
└──────────────┴───────┴────────┘
Issues found:
1. ⚠️ Accuracy: 3 records with age > 120, suggest cleanup
2. ⚠️ Timeliness: Data delayed 3 days, latest data is 3 days ago
Analysis impact assessment:
- Current data quality is sufficient for trend and comparative analysis
- Exact aggregated figures may deviate slightly
- Suggest fixing the timeliness issue before making precise predictions
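One way the per-dimension scores could roll up into the overall score is a weighted average. An equal-weight sketch (AskTable's actual weighting is not documented here; equal weights give 83 rather than the 82 shown above, so the real scheme presumably weights dimensions differently):

```python
def overall_score(dimension_scores, weights=None):
    """Weighted average of per-dimension scores (equal weights by default)."""
    if weights is None:
        weights = {k: 1.0 for k in dimension_scores}
    total = sum(weights.values())
    return round(sum(s * weights[k] for k, s in dimension_scores.items()) / total)

dims = {"completeness": 88, "consistency": 90, "accuracy": 75,
        "duplication": 95, "timeliness": 65}
score = overall_score(dims)
# score -> 83 with equal weights
```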
AskTable not only tells you "data has problems" but also gives fix suggestions:
| Issue | Fix Suggestion |
|---|---|
| Null values | "Can use before/after average filling (recommended) or exclude missing days directly" |
| Extreme values | "Suggest excluding 3 records with age > 120" |
| Duplicates | "5 duplicate orders detected, suggest deduplicating and recalculating" |
| Definition inconsistency | "Suggest unified order time definition, or note difference source in report" |
| Data delay | "Suggest checking data sync task, currently delayed 3 days" |
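The "before/after average" fix in the table generalizes to linear interpolation across a run of missing values. A sketch (illustrative, not AskTable's fill logic; leading or trailing gaps are left untouched):

```python
def fill_gaps(series):
    """Fill runs of None by interpolating between the nearest known neighbors.

    For a single missing value this reduces to the before/after average.
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        if span > 1:
            step = (filled[right] - filled[left]) / span
            for i in range(left + 1, right):
                filled[i] = filled[left] + step * (i - left)
    return filled

patched = fill_gaps([100, None, None, 130, 150])
# patched -> [100, 110.0, 120.0, 130, 150]
```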
User asks: "Is this data reliable?"
📊 Data Quality Checkup Report
Data source: Sales database (January-March 2026)
Record count: 90 days × 3 stores = 270 store-day records
Overall score: 82/100 ✅ Good
【Completeness】88/100
- 2 of 270 store-day records missing (March 15-16, system maintenance)
- Coverage 99.3%
【Consistency】90/100
- Cross-system difference < 1%, normal
【Accuracy】75/100 ⚠️
- 3 abnormal records (age > 120)
- 1 negative amount record (returns, normal business)
【Timeliness】65/100 ⚠️
- Data delayed 3 days
- Latest available data: April 3
【Conclusion】
Data quality good, can be used for trend and comparative analysis.
But for day-level precise prediction, suggest fixing data delay first.
User asks: "Are there any problems with recent data?"
📊 This Week's Data Quality Inspection
Inspection range: Past 7 days (March 31 - April 6)
Inspection result: 2 items to watch
1. ⚠️ April 2 data missing
- Cause: POS system upgrade interrupted data sync
- Impact: that day's sales data is unavailable
- Suggestion: fill with the before/after average, or ask IT to recover the data
2. ⚠️ User data duplication rate increased to 2.5%
- Normal range: < 1%
- Possible cause: Registration system upgrade caused duplicate submissions
- Suggestion: Run deduplication script, check registration system
Other dimensions normal.
📊 Data Quality Trend (Past 4 weeks)
| Week | Overall Score | Main Issue |
|----|----------|----------|
| W1 | 78 | Data delayed 5 days |
| W2 | 80 | Delay fixed, but nulls appeared |
| W3 | 82 | Nulls fixed, duplicates appeared |
| W4 | 85 | Duplicate cleanup in progress |
Trend: ✅ Continuously improving
Expected to reach 90+ next week
"Is this data reliable?"
"Are there any problems with recent data?"
"Check data quality for me"
"Before analysis, check if data has problems"
"Run a comprehensive data quality check"
"Check this week's data quality"
"Compare differences between two data sources"
When the Data Quality Guardian Agent is enabled, AskTable automatically checks data quality on a schedule and proactively pushes an alert when it finds an issue.
Data quality detection (prerequisite: is data reliable?)
↓
Anomaly detection (discovered issue: is data anomalous?)
↓
Drill-down/attribution (diagnose issue: why anomalous?)
↓
Metric interpretation (translate: what does it mean?)
↓
Report orchestration (output: complete analysis report)
Data quality detection is the prerequisite for all other analysis skills. If the data quality score is below 70, AskTable first reminds you to fix the data rather than continuing the analysis.
Pain point: Management lacked trust in data reports after two decision-making mistakes caused by inconsistent data definitions. Every time they looked at a report, they had to ask, "Is this data accurate?"
Solution: Enable Data Quality Detection Skill, each analysis report automatically includes data quality score.
Effects:
"Before, I had to question every report; now, seeing a data quality score of 90+ gives me confidence. This isn't a technical issue, it's a trust issue." (Data Lead at a financial company)
The Data Quality Detection Skill's core value isn't in discovering that data has problems; it's in the principle it enforces:
Good data analysis doesn't start with analysis. It starts with confirming that the data is reliable.