
Data Quality Detection Skill: Garbage In, Garbage Out - Ensuring Analysis Starts with Reliable Data

AskTable Team · 2026-04-06

There's a classic saying in data analysis:

Garbage in, garbage out.

If the input data is flawed, then no matter how advanced the analysis method or how sophisticated the model, the output will be unreliable.

But have you encountered situations like these:

  • The report is finished, only to discover that 15% of the data is missing
  • "Sales amount" doesn't match between two systems because the definitions differ
  • The data contains obvious duplicate records that distort aggregation results
  • Last month's data was used for trend analysis, but last month's data collection had issues

What these problems have in common: the data had issues, but no one discovered them before the analysis was complete.

AskTable's Data Quality Detection Skill does one thing: proactively check data quality before and during analysis, ensuring analysis results are trustworthy.


I. Five Types of Data Quality Issues

1.1 Completeness Issues

Manifestation: Missing data, null values, gaps in series

Examples:
- A store's March 15-18 sales data is empty
- The "age" field is empty in 40% of user profiles
- A 3-day gap in a time series

Impact:
- Trend analysis shows artificial "cliffs"
- Aggregated totals come out lower than they should
- Prediction models distorted due to insufficient historical data

1.2 Consistency Issues

Manifestation: Same metric values inconsistent across different systems

Examples:
- CRM system shows this month's sales 5M, ERP shows 4.8M
- Reason: CRM counts by order time, ERP counts by shipping time
- Different definitions, resulting in two "truths"

Impact:
- Decision-makers see contradictory data and don't know which version to trust
- Cross-system analysis results unreliable

1.3 Accuracy Issues

Manifestation: Data values clearly illogical

Examples:
- A user's age shows as 200
- An order amount is negative
- A store's daily sales show as 100M (it's actually a small store)

Impact:
- Averages skewed by extreme values
- Aggregated results seriously distorted

1.4 Duplication Issues

Manifestation: Same record recorded multiple times

Examples:
- Some orders were recorded twice due to a system malfunction
- A user submitted a form; network jitter caused duplicate submissions

Impact:
- Aggregated data artificially inflated
- Ratio metrics like conversion rate distorted
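To make the duplication check concrete, here is a minimal sketch (not AskTable's actual implementation) of flagging records that share the same key; the `order_id` field and sample data are hypothetical:

```python
from collections import Counter

def find_duplicates(orders, key_fields=("order_id",)):
    """Count orders sharing the same key; any count > 1 is a suspected duplicate."""
    counts = Counter(tuple(o[f] for f in key_fields) for o in orders)
    return {key: n for key, n in counts.items() if n > 1}

orders = [
    {"order_id": "A1", "amount": 120},
    {"order_id": "A2", "amount": 80},
    {"order_id": "A1", "amount": 120},  # duplicate caused by a retry
]
dupes = find_duplicates(orders)
print(dupes)  # {('A1',): 2}
```

In practice the key would combine several fields (customer, timestamp, amount) to catch duplicates that received different IDs.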

1.5 Timeliness Issues

Manifestation: Data arrives late or is outdated

Examples:
- "Today's" data was actually last updated 3 days ago
- After a data-source change, data under the old definition keeps flowing in

Impact:
- Decisions based on outdated data may already be invalid
- Trend analysis shows artificial turning points
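A timeliness check like this boils down to measuring how far the newest data trails today. A minimal sketch (the one-day lag tolerance is an illustrative assumption):

```python
from datetime import date

def freshness_lag_days(latest_data_date: date, today: date) -> int:
    """How many days the newest available data trails today."""
    return (today - latest_data_date).days

def is_fresh(latest_data_date: date, today: date, max_lag_days: int = 1) -> bool:
    """True if the data lag is within the allowed tolerance."""
    return freshness_lag_days(latest_data_date, today) <= max_lag_days

today = date(2026, 4, 6)
lag = freshness_lag_days(date(2026, 4, 3), today)
print(lag, is_fresh(date(2026, 4, 3), today))  # 3 False
```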

II. How Data Quality Detection Skill Works

2.1 Automatic Detection Process

Before each analysis, AskTable automatically executes the following data quality checks:


Completeness Check

Detection items:
- Null value ratio: proportion of null values in a field
- Missing data blocks: consecutive missing time periods or records
- Coverage: actual data volume vs expected data volume

Example:
"Past 30 days sales data:
- 28 days with data, 2 days missing (March 15-16)
- Data coverage 93.3%
- Missing days proportion 6.7%, below 10% threshold ✅"
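The completeness numbers above (28 of 30 days present, 93.3% coverage) can be reproduced with a minimal sketch like this; the data layout (a date-to-value mapping) is an illustrative assumption:

```python
from datetime import date, timedelta

def completeness_report(daily_values, start, end):
    """daily_values: dict mapping date -> value (None or absent = missing)."""
    expected = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    missing = [d for d in expected if daily_values.get(d) is None]
    coverage = 1 - len(missing) / len(expected)
    return {"expected": len(expected), "missing": missing, "coverage": coverage}

sales = {date(2026, 3, d): 100 for d in range(1, 31)}
sales.pop(date(2026, 3, 15))
sales.pop(date(2026, 3, 16))
rep = completeness_report(sales, date(2026, 3, 1), date(2026, 3, 30))
print(f"{rep['coverage']:.1%}")  # 93.3%
```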

Consistency Check

Detection items:
- Cross-system metric comparison: same metric values from different sources
- Definition change records: whether statistical definitions changed
- Reconciliation checks: logical relationships between related metrics

Example:
"Sales comparison:
- Data source A (POS system): 5.2M
- Data source B (ERP system): 5.15M
- Difference 50K (0.97%), within acceptable range ✅

But note:
- Data source A counts by order time, data source B counts by payment time
- Difference mainly from cross-day orders (orders placed 23:00-24:00, paid next day)"
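A cross-source comparison with a tolerance might look like the following sketch; the 1% tolerance and the choice of denominator are illustrative assumptions (the report above appears to divide by the smaller total, giving 0.97%, whereas dividing by the larger gives 0.96%):

```python
def consistency_check(value_a, value_b, tolerance=0.01):
    """Relative difference between two sources, compared against a tolerance."""
    base = max(abs(value_a), abs(value_b))
    diff = abs(value_a - value_b)
    diff_ratio = diff / base if base else 0.0
    return {"diff": diff, "diff_ratio": diff_ratio, "ok": diff_ratio <= tolerance}

result = consistency_check(5_200_000, 5_150_000)
print(f"{result['diff_ratio']:.2%}", result["ok"])  # 0.96% True
```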

Accuracy Check

Detection items:
- Range validation: whether values are within reasonable ranges
- Extreme value detection: whether abnormally high/low values exist
- Logical validation: logical relationships between related fields

Example:
"Data accuracy check:
- Age field: found 3 records with age > 120 ⚠️
- Order amount: found 1 negative value record (returns) ⚠️
- Sales/orders = avg transaction, validation passed ✅"
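Range validation of the kind shown can be sketched as follows; the field name and the 0-120 age bounds are illustrative assumptions:

```python
def range_violations(records, field, low, high):
    """Return records whose field value falls outside the [low, high] range."""
    return [r for r in records if not (low <= r[field] <= high)]

users = [{"id": 1, "age": 34}, {"id": 2, "age": 200}, {"id": 3, "age": 27}]
bad = range_violations(users, "age", 0, 120)
print(bad)  # [{'id': 2, 'age': 200}]
```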

2.2 Data Quality Score

AskTable gives a comprehensive data quality score (0-100):

📊 Data Quality Report

Overall score: 82/100 ✅ Good

┌──────────────┬───────┬────────┐
│ Dimension    │ Score │ Status │
├──────────────┼───────┼────────┤
│ Completeness │ 88    │ ✅     │
│ Consistency  │ 90    │ ✅     │
│ Accuracy     │ 75    │ ⚠️     │
│ Duplication  │ 95    │ ✅     │
│ Timeliness   │ 65    │ ⚠️     │
└──────────────┴───────┴────────┘

Issues found:
1. ⚠️ Accuracy: 3 records with age > 120, suggest cleanup
2. ⚠️ Timeliness: Data delayed 3 days, latest data is 3 days ago

Analysis impact assessment:
- Current data quality sufficient for trend and comparative analysis
- But exact aggregated figures may deviate slightly
- Suggest fixing timeliness issues before making precise predictions
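One plausible way to combine the five dimension scores into an overall score is a weighted average. The equal weights below are an assumption for illustration; they yield 82.6 for the scores above, close to the report's 82, so AskTable's actual weighting or rounding likely differs:

```python
def overall_score(dimension_scores, weights=None):
    """Weighted average of per-dimension scores (equal weights by default)."""
    dims = list(dimension_scores)
    if weights is None:
        weights = {d: 1.0 for d in dims}
    total_weight = sum(weights[d] for d in dims)
    return sum(dimension_scores[d] * weights[d] for d in dims) / total_weight

scores = {"completeness": 88, "consistency": 90, "accuracy": 75,
          "duplication": 95, "timeliness": 65}
print(round(overall_score(scores)))  # 83
```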

2.3 Fix Suggestions

AskTable not only tells you "data has problems" but also gives fix suggestions:

| Issue | Fix Suggestion |
|------|------|
| Null values | "Fill with the before/after average (recommended), or exclude the missing days" |
| Extreme values | "Suggest excluding the 3 records with age > 120" |
| Duplicates | "5 duplicate orders detected; suggest deduplicating and recalculating" |
| Definition inconsistency | "Suggest unifying on the order-time definition, or noting the source of the difference in the report" |
| Data delay | "Suggest checking the data sync task; currently delayed 3 days" |
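The "before/after average filling" suggested for null values can be sketched like this (a minimal illustration, not AskTable's actual fill logic; edge gaps simply copy the nearest value):

```python
def fill_with_neighbors(series):
    """Fill None gaps with the average of the nearest non-None neighbors;
    gaps at either edge copy the nearest available value."""
    filled = list(series)
    for i, v in enumerate(filled):
        if v is None:
            prev = next((filled[j] for j in range(i - 1, -1, -1)
                         if filled[j] is not None), None)
            nxt = next((filled[j] for j in range(i + 1, len(filled))
                        if filled[j] is not None), None)
            if prev is not None and nxt is not None:
                filled[i] = (prev + nxt) / 2
            else:
                filled[i] = prev if prev is not None else nxt
    return filled

print(fill_with_neighbors([100, None, 120]))  # [100, 110.0, 120]
```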

III. Typical Application Scenarios

Scenario 1: Pre-Analysis Data Checkup

User asks: "Is this data reliable?"

📊 Data Quality Checkup Report

Data source: Sales database (January-March 2026)
Record count: 90 days × 3 stores = 270 store-days of data

Overall score: 82/100 ✅ Good

【Completeness】88/100
- 2 days data missing (March 15-16, system maintenance)
- Coverage 99.3%

【Consistency】90/100
- Cross-system difference < 1%, normal

【Accuracy】75/100 ⚠️
- 3 abnormal records (age > 120)
- 1 negative amount record (returns, normal business)

【Timeliness】65/100 ⚠️
- Data delayed 3 days
- Latest available data: April 3

【Conclusion】
Data quality good, can be used for trend and comparative analysis.
But for day-level precise prediction, suggest fixing data delay first.

Scenario 2: Regular Data Quality Inspection

User asks: "Are there any problems with recent data?"

📊 This Week's Data Quality Inspection

Inspection range: Past 7 days (March 31 - April 6)
Inspection result: 2 items to watch

1. ⚠️ April 2 data missing
   - Cause: POS system upgrade, data sync interrupted
   - Impact: Day's sales data unavailable
   - Suggestion: Fill with before/after average or contact IT to recover

2. ⚠️ User data duplication rate increased to 2.5%
   - Normal range: < 1%
   - Possible cause: Registration system upgrade caused duplicate submissions
   - Suggestion: Run deduplication script, check registration system

Other dimensions normal.

Scenario 3: Data Quality Trend Tracking

📊 Data Quality Trend (Past 4 weeks)

| Week | Overall Score | Main Issue |
|----|----------|----------|
| W1 | 78 | Data delayed 5 days |
| W2 | 80 | Delay fixed, but nulls appeared |
| W3 | 82 | Nulls fixed, duplicates appeared |
| W4 | 85 | Duplicate cleanup in progress |

Trend: ✅ Continuously improving
Expected to reach 90+ next week

IV. Hands-On: How to Use Data Quality Detection Skill

4.1 Natural Language Trigger

"Is this data reliable?"
"Are there any problems with recent data?"
"Check data quality for me"
"Before analysis, check if data has problems"

4.2 Deep Detection

"Run a comprehensive data quality check"
"Check this week's data quality"
"Compare differences between two data sources"

4.3 Auto Detection

When the Data Quality Guardian Agent is enabled, AskTable runs data quality checks on a schedule and proactively pushes alerts when issues are found.


V. How It Works with Other Analysis Skills

Data quality detection (prerequisite: is data reliable?)
    ↓
Anomaly detection (discovered issue: is data anomalous?)
    ↓
Drill-down/attribution (diagnose issue: why anomalous?)
    ↓
Metric interpretation (translate: what does it mean?)
    ↓
Report orchestration (output: complete analysis report)

Data quality detection is the prerequisite for all analysis skills. If the data quality score is below 70, AskTable first prompts the user to fix the data rather than continuing the analysis.
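The threshold gate described here can be sketched as a simple guard around the analysis step; the function names are hypothetical:

```python
QUALITY_THRESHOLD = 70  # below this, fix the data before analyzing

def gate_analysis(quality_score, run_analysis):
    """Run the analysis only if data quality clears the threshold."""
    if quality_score < QUALITY_THRESHOLD:
        return {"status": "blocked",
                "message": f"Quality score {quality_score} < {QUALITY_THRESHOLD}: "
                           "fix the data first."}
    return {"status": "ok", "result": run_analysis()}

print(gate_analysis(65, lambda: "trend report")["status"])  # blocked
print(gate_analysis(82, lambda: "trend report")["result"])  # trend report
```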


VI. Customer Case

A Certain Financial Company: From "Data Distrust" to "Data-Driven Decision Making"

Pain point: Management lacked trust in data reports after two decision-making mistakes caused by inconsistent data definitions. Every report review began with the question, "Is this data accurate?"

Solution: Enable Data Quality Detection Skill, each analysis report automatically includes data quality score.

Effects:

  • Data quality score improved from 68 to 92 (after 2 months continuous fixing)
  • Management trust in data reports increased from 45% to 90%
  • 47 data quality issues discovered and fixed
  • Decision mistakes caused by data issues dropped from 2-3 per quarter to 0

"Before, I had to question every report; now, seeing a data quality score of 90+ gives me confidence. This isn't a technical issue, it's a trust issue." —— Data Lead, a financial company


Summary

The Data Quality Detection Skill's core value isn't merely in discovering that data has problems, but in:

  1. Proactive detection: Don't wait for users to discover issues, check data quality before analysis
  2. Comprehensive coverage: Five dimensions - completeness, consistency, accuracy, duplication, timeliness
  3. Quantified scoring: Give 0-100 comprehensive score, making quality measurable and trackable
  4. Fix suggestions: Not just telling you "there's a problem", but "how to fix it"

Good data analysis doesn't start with analysis, but with confirming data is reliable.

