
If you've used any conversational AI product, you've almost certainly experienced this scenario:
In your first conversation, you tell the AI: "From now on, whenever you report metrics like sales revenue, include the specific time period."
AI replies: "Got it, I'll remember that."
The next day, you open a new conversation window and ask: "How were sales last month?"
AI replies: "Sales last month were 3.2 million." — No time range included.
You sigh and say it again: "I told you to include the time."
AI: "Sorry, I'll do better next time."
This isn't because the AI isn't smart enough. It's a systemic flaw in conversational AI.
Every new conversation is, for the AI, a "first meeting." All your preferences, context, and habits — gone, reset to zero.
In this article, I want to talk from a user experience perspective about how we built a permanent memory system in AskTable, and why it's the critical step that takes an AI data analysis product from "usable" to "great."
Mainstream conversational AI products today limit memory to the current session's context window. This means that once the session ends, or a new conversation window is opened, everything the user has said is forgotten.
This design was barely acceptable in early general-purpose chat scenarios — users were just "asking questions" and moving on. But in data analysis, the problem is magnified many times over.
Data analysis is not an "ask and leave" scenario. It's a continuous, iterative, cumulative workflow.
Consider this real user journey:
Day 1: Operations manager Xiao Li uses an AI data analysis tool for the first time. He spends a lot of time explaining the company's business definitions — how GMV is calculated, how returns are handled, which provinces are in the East China region. The AI says it understands everything.
Day 2: Xiao Li continues working, but all the definitions from yesterday are gone. He has to explain everything again.
Week 1: Xiao Li has now said the same things five times. He starts wondering whether this tool is even worth using.
Month 1: Xiao Li gives up on the AI tool and goes back to Excel. Because "teaching it takes longer than just looking things up myself."
This is the real fate of memoryless AI in data analysis scenarios.
| Scenario | Without Memory | With Memory |
|---|---|---|
| First time defining "GMV" | AI understands | AI understands |
| Second time asking about GMV | Need to re-explain the definition | AI uses your definition directly |
| Third monthly report | Explain format preferences again | AI generates per your preference |
| Nth query | Still need to start from scratch | AI works with you like a seasoned colleague |
The core issue isn't that AI isn't smart enough — it's that every conversation starts from zero.
It's like having the same colleague ask you to reintroduce yourself every single time you talk. That's not intelligence — that's an efficiency killer.
When users interact with AI, they naturally bring an "anthropomorphic" expectation:
"I already told you once — why did you forget again?"
Behind this complaint lies the user's expectation of a continuous personality — they want the AI to be an assistant that "knows them," not a stranger rebooted every time.
From a psychological perspective, this is a fundamental need of relational interaction. Humans expect a degree of "memory continuity" from any intelligent agent, whether human or machine. It's the foundation of building trust and usage habits.
From a technical perspective, this expectation is entirely reasonable. Because that's how human assistants actually work: you only need to say something once.
Some products on the market have tried to address this, but their solutions generally fall short: either not automated enough, not flexible enough, or not precise enough.
Permanent memory means the AI can remember users' preferences, habits, definitions, and context across sessions, across conversations, and across time, and automatically apply them in future interactions.
Crucially, these memories are not passively stored; they are actively recalled and applied when you ask questions.
In AskTable's data analysis scenarios, we've identified several core categories of memory:
1. Preference Memory
Personal preference settings established during usage, such as preferred chart types (bar charts over pie charts) or always showing year-over-year and month-over-month growth rates.
2. Definition Memory
User definitions for business metrics and terminology, such as how GMV is calculated or which provinces belong to the East China region.
3. Focus Memory
Dimensions, regions, and time windows the user frequently focuses on, such as the East China region or last-month comparisons.
4. Correction Memory
Errors the user has corrected in past conversations, such as "GMV should include pending orders."
In AskTable, the memory system operates in two phases, read and write, across four stages: Search, Inject, Extract, and Store.
The four stages are:
1. Search
When a user initiates a new request, the system retrieves relevant historical memories from the memory store. This happens early in the request processing pipeline, ensuring that subsequent analysis already carries the memory context.
Specifically, when a user inputs "How were sales in the East China region last month?", the system embeds the question, retrieves the Top-K most relevant memories from the vector store, and carries them forward into the next stage.
2. Inject
The retrieved memories are formatted as structured context prompts and injected into the AI's system prompt. When generating a response, the AI naturally incorporates these memories.
The injected context looks roughly like this:
## Historical Memories
- When the user asks about metrics like sales or profit margin, include the specific time period (2026-04-01)
- The user primarily focuses on East China region data (2026-03-15)
- Prefers to show year-over-year and month-over-month growth rates in charts (2026-03-20)
These memory entries are appended to the end of the system prompt, allowing the AI to automatically apply them when generating responses.
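The Inject stage can be sketched as a simple prompt-assembly step. This is a minimal illustration, not AskTable's actual prompt code; the base prompt text and the memory dict fields (`text`, `created_at`) are assumptions.

```python
# Sketch of the Inject stage: append retrieved memories to the system prompt
# as a "## Historical Memories" section, mirroring the format shown above.

def inject_memories(system_prompt: str, memories: list[dict]) -> str:
    """Append retrieved memory entries to the end of the system prompt."""
    if not memories:
        return system_prompt
    lines = ["## Historical Memories"]
    for m in memories:
        lines.append(f"- {m['text']} ({m['created_at']})")
    return system_prompt + "\n\n" + "\n".join(lines)

prompt = inject_memories(
    "You are a data analysis assistant.",
    [
        {"text": "Include the specific time period when reporting sales metrics",
         "created_at": "2026-04-01"},
        {"text": "The user primarily focuses on East China region data",
         "created_at": "2026-03-15"},
    ],
)
print(prompt)
```

Because the memories arrive as plain bullet lines at the end of the prompt, the model applies them without any special tooling on the response side.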
3. Extract
After the AI completes its response, the system analyzes the conversation and extracts segments that may constitute new memories — such as user preference declarations, metric definitions, correction feedback, etc.
For example, if the user says in a conversation: "Don't use pie charts anymore, bar charts are better for comparison." The system uses an LLM to analyze this statement and extract a structured memory entry: "The user prefers bar charts over pie charts for data comparison."
4. Store
The extracted new memories are written to the vector database and indexed for future queries. Each memory entry includes a timestamp and source information for management and traceability.
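A stored entry might look like the following sketch. The field names here are illustrative assumptions; in the real system mem0 and Qdrant also persist an embedding vector alongside this metadata, which is omitted.

```python
# Hypothetical shape of a stored memory entry: the extracted text plus the
# timestamp and source information mentioned above. The embedding vector
# that the vector database indexes is not shown.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryEntry:
    text: str    # the extracted memory, e.g. a preference or definition
    source: str  # where it came from, e.g. a conversation identifier
    kind: str    # "preference" | "definition" | "focus" | "correction"
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

entry = MemoryEntry(
    text="The user prefers bar charts over pie charts for data comparison",
    source="conv-20260403-017",  # illustrative ID
    kind="preference",
)
print(entry.kind, "->", entry.text)
```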
In our engineering implementation, we adopted a read-write separation architecture: memory reads are synchronous on the request path, while memory writes run asynchronously in the background.
The key consideration behind this design:
Reading affects the present; writing affects the future.
Reading directly determines the quality of the current response — if memories aren't loaded promptly, the AI will answer like it has amnesia. So reading must be synchronous and low-latency.
Writing, on the other hand, is about optimizing future experiences — even if this session's memory isn't written immediately, it only means one fewer memory next time, without impacting the current experience. So writing can be asynchronous and fire-and-forget.
Users don't need to wait for "memory write complete" before seeing their answer. The AI gives you results first, then silently learns in the background.
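The split described above can be sketched in a few lines. This is a toy illustration under stated assumptions: an in-memory list stands in for the real vector store, and a background thread stands in for the real async write pipeline.

```python
# Minimal sketch of read-write separation: reads block the request path,
# writes are fire-and-forget on a background thread.
import threading

memory_store: list[str] = []
store_lock = threading.Lock()

def search_memories(query: str) -> list[str]:
    """Synchronous read: must complete before the AI can answer well."""
    with store_lock:
        return [m for m in memory_store if query.lower() in m.lower()]

def write_memory_async(text: str) -> threading.Thread:
    """Asynchronous write: the user never waits on this."""
    def _write():
        with store_lock:
            memory_store.append(text)
    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t

t = write_memory_async("The user primarily focuses on East China region data")
t.join()  # joined here only to make the example deterministic
print(search_memories("east china"))
```

In production the write would also tolerate failure silently: a dropped write costs one memory next time, never the current answer.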
For the memory system's tech stack, we settled on mem0 as the memory management framework, paired with Qdrant for vector storage.
Why mem0?
1. Focused on One Thing: Memory
mem0's positioning is very clear — providing persistent memory capabilities for AI applications. It's not an all-in-one Agent framework; it's middleware specifically solving "memory storage, retrieval, and update." This focus keeps the integration surface small.
2. Out-of-the-Box Memory Operations
mem0 provides four core APIs:
- add() — Add new memories
- search() — Retrieve relevant memories
- update() — Update existing memories
- delete() — Delete specific memories
This aligns perfectly with our "read-write-update-delete" needs. We don't need to implement CRUD logic for memories at the bottom layer ourselves.
3. LLM-Driven Intelligent Extraction
mem0's add operation doesn't simply dump conversation content into a database — it uses an LLM to analyze the conversation and extract structured memory fragments, storing distilled facts rather than raw transcripts.
This intelligent extraction capability significantly reduces application-layer memory management complexity. We don't need to write our own rules for parsing user intent — mem0 handles this internally via LLM.
Qdrant is an open-source vector database. We chose it based on the following considerations:
1. High-Performance Similarity Search
Memory retrieval's core task is "finding the most relevant historical memories for the current question." Qdrant's HNSW index performs excellently in vector similarity search scenarios, returning Top-K relevant memories in milliseconds.
In our testing, even with thousands of memories, retrieval latency remained under 10ms — fully meeting real-time conversation requirements.
2. Metadata Filtering
Memories don't just contain vector representations — they also need timestamps, source, type, and other metadata. Qdrant supports combined queries of vector search + metadata filtering, which is crucial for precise memory recall.
For example, we can query "preference memories related to sales created within the last 30 days" — a combined query capability that pure vector search cannot provide.
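The idea behind such a combined query can be shown in plain Python. This is a concept sketch, not Qdrant's API: real Qdrant uses an HNSW index with `Filter` conditions, while here a brute-force cosine similarity over toy 3-dimensional vectors and dict metadata illustrates the same filtering-plus-ranking behavior.

```python
# Concept sketch of "vector search + metadata filtering": filter candidates
# by metadata first, then rank the survivors by cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

memories = [  # toy embeddings; real vectors have hundreds of dimensions
    {"text": "Include time period for sales metrics", "vec": [0.9, 0.1, 0.0],
     "kind": "preference", "created_at": "2026-04-01"},
    {"text": "GMV includes paid and pending orders", "vec": [0.1, 0.9, 0.0],
     "kind": "definition", "created_at": "2026-02-10"},
    {"text": "Focuses on East China region", "vec": [0.8, 0.2, 0.1],
     "kind": "focus", "created_at": "2026-03-15"},
]

def search(query_vec, kind=None, since=None, top_k=2):
    candidates = [m for m in memories
                  if (kind is None or m["kind"] == kind)
                  and (since is None or m["created_at"] >= since)]
    return sorted(candidates, key=lambda m: cosine(query_vec, m["vec"]),
                  reverse=True)[:top_k]

# "preference memories related to sales created within the last 30 days"
hits = search([1.0, 0.0, 0.0], kind="preference", since="2026-03-10")
print([m["text"] for m in hits])
```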
3. Flexible Deployment
Qdrant supports Docker containerized deployment as well as cloud services, integrating well with our overall tech stack. For enterprise customers with on-premises deployment needs, Qdrant can also run entirely within the customer's internal network.
This is one of the most critical architectural decisions in the entire memory system design.
Several isolation granularities are available for memory systems:
| Isolation Granularity | Description | Use Case |
|---|---|---|
| User-level | Each user has an independent memory space | Personal assistants |
| Session-level | Each session has independent memory | None (defeats cross-session purpose) |
| Agent-level | All users share memory under the same Agent | Team knowledge base |
| Organization-level | Shared across the entire organization | Enterprise-wide knowledge base |
AskTable chose Agent-level isolation granularity. This means:
Under the same Data Agent, all users' memories are shared. When User A defines the "GMV" metric, User B also benefits from this memory when asking questions.
Why this design?
1. Team Knowledge Base Model
In data analysis scenarios, metric definitions, business rules, and preference settings are typically not personal — they are team-level consensus. When User A tells the AI "GMV includes refund amounts," this definition applies equally to User B.
With user-level isolation, you'd end up with the confusing situation of "User A's GMV definition differs from User B's GMV definition."
2. Reducing Repetitive Communication Costs
With user-level isolation, every team member would need to repeat the same definitions and preferences to the AI. This directly contradicts our original goal of solving the "repeating yourself" pain point.
Our objective is to eliminate the need for teams to repeatedly explain the same thing. Agent-level sharing is the prerequisite for achieving this.
3. Memory as Knowledge Accumulation
As the team uses the system over time, the Agent's memory store naturally evolves into a living team knowledge base. New members don't need to teach the AI anything — the AI already "knows" the team.
The value of this knowledge accumulation far exceeds simple conversation efficiency gains. It's actually building the team's data assets.
4. Alignment with AskTable's Product Model
AskTable's core product unit is the Data Agent. Each Data Agent corresponds to a specific data analysis scenario (e.g., sales analysis, operations monitoring), with independent data sources, analysis models, and permission configurations. As a capability of the Agent, the memory system naturally shares the same Agent boundary.
Of course, Agent-level sharing also introduces challenges — such as how to handle memory conflicts between users. In practice, this is resolved through timestamp priority and manual management capabilities.
Architecturally, we made an important decision: memory capabilities are abstracted through a Protocol layer, and mem0 is just the current implementation.
┌──────────────────────────────────────┐
│ Agent Framework Layer                │
│ (depends only on MemoryProtocol)     │
├──────────────────────────────────────┤
│ MemoryProtocol Abstraction Layer     │
│   - search(query)                    │
│   - add(conversation)                │
│   - update(id, content)              │
│   - delete(id)                       │
├──────────────────────────────────────┤
│ Implementation Layer (replaceable)   │
│  ┌─────────────┐   ┌─────────────┐   │
│  │ mem0        │   │ In-house    │   │
│  │ + Qdrant    │   │ (future)    │   │
│  └─────────────┘   └─────────────┘   │
└──────────────────────────────────────┘
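In Python, such an abstraction maps naturally onto `typing.Protocol`. The sketch below is illustrative, not AskTable's actual code: method names mirror the diagram above, and the trivial in-memory backend stands in for the real mem0 + Qdrant implementation (which would satisfy the same interface).

```python
# Sketch of the MemoryProtocol layer: the Agent framework types against the
# Protocol, so any backend with matching methods can be swapped in.
from typing import Protocol
import uuid

class MemoryProtocol(Protocol):
    def search(self, query: str) -> list[dict]: ...
    def add(self, conversation: list[dict]) -> str: ...
    def update(self, memory_id: str, content: str) -> None: ...
    def delete(self, memory_id: str) -> None: ...

class InMemoryBackend:
    """Trivial stand-in backend used only for illustration."""
    def __init__(self):
        self._store: dict[str, str] = {}

    def search(self, query: str) -> list[dict]:
        return [{"id": k, "text": v} for k, v in self._store.items()
                if query.lower() in v.lower()]

    def add(self, conversation: list[dict]) -> str:
        # A real backend would run LLM extraction here; we store the last turn.
        memory_id = str(uuid.uuid4())
        self._store[memory_id] = conversation[-1]["content"]
        return memory_id

    def update(self, memory_id: str, content: str) -> None:
        self._store[memory_id] = content

    def delete(self, memory_id: str) -> None:
        self._store.pop(memory_id, None)

backend: MemoryProtocol = InMemoryBackend()
mid = backend.add([{"role": "user", "content": "GMV includes pending orders"}])
print(backend.search("gmv"))
```

Because the framework layer only ever sees `MemoryProtocol`, swapping mem0 for an in-house engine is a one-line wiring change rather than a refactor.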
This means the Agent framework layer depends only on the MemoryProtocol interface, never on mem0 directly. The benefit is clear: if mem0 ever falls short, we can swap in an in-house implementation without touching the framework layer.
Before the memory system, the user experience looked like this:
Conversation 1 (April 1):
User: "What is our GMV calculation definition?"
AI: "Please tell me your GMV definition."
User: "GMV includes paid and pending orders, excludes refunds."
AI: "Got it, I understand."
Conversation 2 (April 2):
User: "What is this month's GMV?"
AI: "This month's GMV is 12 million."
(No time range, not using the correct definition)
User: "I told you GMV should include pending orders, and include the time!"
AI: "Sorry, this month's (2026-04-01 to 2026-04-30) GMV is 13.5 million,
including paid and pending orders."
Conversation 3 (April 3):
User: "What about the East China region?"
AI: "Which region are you referring to?"
(Doesn't remember the user primarily focuses on East China)
User: "Forget it, I'll look it up myself."
Conversation 4 (April 5):
User: "Generate a monthly report."
AI: "Sure, what content do you need? What chart format?
Which regions to focus on? What time range?"
User: "......"
(User closes the browser)
Every conversation, the user repeats the same things. Over time, their patience with the AI is completely exhausted.
This isn't the user's problem — it's a product problem.
With the permanent memory system:
Conversation 1 (April 1):
User: "What is our GMV calculation definition?"
AI: "Please tell me your GMV definition."
User: "GMV includes paid and pending orders, excludes refunds."
AI: "Got it, I understand."
(Background: memory write completed)
Conversation 2 (April 2):
User: "What is this month's GMV?"
AI: "This month's (2026-04-01 to 2026-04-30) GMV is 13.5 million,
including paid and pending orders."
(Automatically includes time range and correct definition)
User: "Nice."
Conversation 3 (April 3):
User: "What about the East China region?"
AI: "East China region's GMV this month is 4.8 million, accounting for 35.6%,
up 12.3% month-over-month."
(Automatically applies the user's focus on East China region memory)
Conversation 4 (April 5):
User: "Generate a monthly report."
AI: Directly generates the report in the user's preferred format,
focusing on their regions of interest, using the correct definitions.
Includes: East China province-level data, YoY/MoM bar charts, time range annotations.
User: "Great, I'll send this to the boss directly."
The AI no longer needs to be repeatedly taught. It learns silently, then applies silently.
| Dimension | Before | After |
|---|---|---|
| New conversation startup | Zero context | Automatically loads relevant memories |
| Metric definitions | Need to re-explain every time | Define once, applies globally |
| Preference settings | Repeated declarations | Automatically applied |
| Error corrections | Same mistakes repeat after correction | Corrections are permanent |
| Team collaboration | Everyone teaches the AI individually | One person defines, everyone benefits |
| New member onboarding | Need to teach the AI from scratch | The AI already knows the team |
| User feeling | "It doesn't remember me" | "It understands me better each time" |
To give users full control and visibility over the memory system, we added a "Memories" Tab to the Data Agent configuration page.
This Tab provides the following capabilities:
1. Memory Overview
Displays all memory entries for the current Agent, sorted by time in descending order. Each entry includes the memory content, its creation timestamp, and source information.
2. Search Memories
Supports keyword search to quickly locate specific memory entries. For example, searching "GMV" instantly finds all GMV-related memories.
3. Manually Add Memories
Users can directly add memory entries through the UI without needing to trigger them via conversation. This is useful for rules and preferences that don't naturally emerge from dialogue.
4. Delete Memories
Users can delete memories that are no longer needed. For example, when a metric definition changes, the old definition memory can be removed.
5. Toggle Control
Users can disable the memory feature. In certain scenarios (e.g., temporary analysis, test data), users may not want the memory system to intervene.
6. Memory Quality Feedback
In the future, we plan to add memory quality feedback functionality, allowing users to flag memory accuracy and help the system continuously improve.
Introducing any new system brings new challenges. The memory system is no exception. Here are the main challenges we encountered in practice and how we addressed them.
The Problem
Each mem0 add operation requires 1-2 LLM calls (to extract structured memories from conversations). For high-frequency usage scenarios, this overhead is not negligible.
Assuming a team has 50 conversations per day, each generating 1 memory, that's an additional 50-100 LLM calls per day. Over a month, that's 1,500-3,000 calls.
Mitigation Strategies
Asynchronous Execution
Memory writes execute asynchronously after the response is complete, adding no user-perceived latency. The speed at which users see answers is unaffected.
Batch Processing
For multiple rounds of conversation within a short time window, we merge extractions rather than processing each individually. For example, if a user asks 5 questions within 10 minutes, we consolidate those 5 rounds into a single extraction operation rather than 5 separate ones.
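The windowing logic can be sketched as follows. This is an illustrative assumption about the batching policy, not the production implementation; timestamps are epoch seconds and the window size is a configurable constant.

```python
# Sketch of batch processing: conversation rounds within one 10-minute
# window are merged into a single extraction call instead of one per round.
WINDOW_SECONDS = 10 * 60

def batch_rounds(rounds: list[tuple[float, str]]) -> list[list[str]]:
    """Group (timestamp, text) rounds so each batch spans at most one window."""
    batches: list[list[str]] = []
    window_start = None
    for ts, text in sorted(rounds):
        if window_start is None or ts - window_start > WINDOW_SECONDS:
            batches.append([])       # open a new batch
            window_start = ts
        batches[-1].append(text)
    return batches

# 5 questions within 10 minutes, then one much later:
rounds = [(0, "q1"), (120, "q2"), (300, "q3"), (450, "q4"), (590, "q5"),
          (4000, "q6")]
print(batch_rounds(rounds))
```

With this grouping, the five rapid-fire questions cost one extraction call instead of five, and only the isolated sixth question triggers a second call.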
On-Demand Enablement
Users can control whether the memory feature is enabled in Agent configuration. For temporary analysis tasks, memory can be disabled to avoid unnecessary LLM overhead.
Threshold Control
Once the memory store reaches a certain scale, we can reduce the write frequency. As the memory store matures, the marginal value of new memories gradually diminishes.
The Problem
mem0's fact extraction prompt is primarily optimized for English scenarios. In Chinese conversations, extraction accuracy and structured output quality may be affected.
For example, a preference stated in English tends to extract into a clean, structured entry, while the same statement in Chinese may yield a vaguer or less complete one.
Mitigation Strategies
Structured Memory Reduces Language Dependency
AskTable's memory content tends toward structured data (metric definitions, preference declarations, etc.), which is less affected by language model differences in cross-language scenarios. Structured content inherently carries sufficient semantic information.
Progressive Prompt Optimization
We plan to customize extraction prompts and memory schemas based on Chinese scenario characteristics. Chinese-specific optimization plans are already in the pipeline.
Impact Assessment and Monitoring
We continuously monitor memory quality in production use, driving optimizations with data. We periodically sample memory entries for accuracy to ensure extraction quality remains within acceptable bounds.
The Problem
When the memory system "remembers wrong" or "recalls irrelevant memories," how do you pinpoint the issue? The black-box nature of memory systems makes debugging a challenge.
Typical debugging scenarios include tracing why an irrelevant memory was recalled, or why a relevant one was not.
Mitigation Strategies
Frontend Visibility
In the Memories Tab of the Data Agent configuration page, users can inspect every memory entry, search for specific ones, and delete entries that are wrong or stale.
This visibility ensures users are no longer "feeling around in the dark" — they can clearly see what the AI has remembered.
Protocol Abstraction Ensures Replaceability
Through the Protocol layer abstraction, mem0 is just one implementation. If we find mem0 insufficient for observability in the future, we can switch to an in-house solution that provides more detailed trace and debug information.
Internal Trace Records
The system records the memory retrieval and write process, including which memories were retrieved for each request and which new entries were extracted and written.
These trace records provide data support for troubleshooting.
The Problem
In Agent-level shared memory scenarios, different users may have different definitions for the same metric. For example: User A says "GMV includes refund amounts," while User B says "GMV excludes refund amounts."
These two memories are contradictory. Who should the AI listen to?
Mitigation Strategies
Timestamp Priority
The newest memory overrides older ones. This aligns with the conventional logic that "business definitions follow the latest specification."
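This resolution rule is easy to express in code. The sketch below is illustrative: the `term` and `created_at` fields are assumed, and ISO-formatted dates are compared lexically, which orders them correctly.

```python
# Sketch of timestamp-priority conflict resolution: when several memories
# define the same term, keep only the newest one.
def resolve_conflicts(memories: list[dict]) -> dict[str, dict]:
    """Return the newest memory per term (ISO date strings compare lexically)."""
    newest: dict[str, dict] = {}
    for m in memories:
        term = m["term"]
        if term not in newest or m["created_at"] > newest[term]["created_at"]:
            newest[term] = m
    return newest

conflicting = [
    {"term": "GMV", "text": "GMV includes refund amounts",
     "created_at": "2026-02-01"},
    {"term": "GMV", "text": "GMV excludes refund amounts",
     "created_at": "2026-04-10"},
]
print(resolve_conflicts(conflicting)["GMV"]["text"])
```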
Conflict Alerts
When the system detects potential memory conflicts, it can flag them in the Memories Tab to alert users.
Manual Management
Users can view and edit memories in the Memories Tab to manually resolve conflicts. For critical metric definitions, we recommend centralized management by a data owner.
Granular Isolation (Future)
We're considering support for user-level isolated memory spaces in specific scenarios, coexisting with Agent-level shared memory. For example, some preferences are personal (like chart color preferences), while some definitions are team-shared (like GMV definitions).
The permanent memory system is the first step in AskTable's AI capability evolution, but not the last. Here are the directions we're planning and exploring.
The current memory system is flat — all memories are stored in the same space. In the future, we plan to introduce memory categorization along the lines of the types described earlier: preference, definition, focus, and correction memories. Different memory types will then have different retrieval and application strategies.
Memories are not eternal. As business evolves, certain memories will expire, become invalid, or need updating, so we plan to introduce memory lifecycle management.
The current memory system is passive — memories are only extracted when users explicitly declare them in conversation. We're exploring proactive memory: inferring preferences and focus areas from usage patterns without an explicit declaration.
As AskTable supports more types of Agents, memory sharing and collaboration between different Agents will become an important topic.
Our envisioned future architecture:
┌──────────────────────────────────────────────────────────────┐
│ Organization-Level Knowledge Graph                           │
│ (Cross-Agent shared meta-knowledge and business definitions) │
├────────────────────┬────────────────────┬────────────────────┤
│ Sales Agent        │ Operations Agent   │ Analysis Agent     │
│ Dedicated Memory   │ Dedicated Memory   │ Dedicated Memory   │
└────────────────────┴────────────────────┴────────────────────┘
At the organization level, shared business definitions and terminology; at the Agent level, each maintains its own dedicated memories. This ensures both knowledge consistency and memory specificity.
We will gradually explore these directions in future versions.
At its core, the permanent memory system transforms AI from a tool that's "a stranger every time" into a colleague that "understands you better over time."
This isn't a nice-to-have feature — it's the infrastructure that takes conversational AI from usable to great.
In AskTable's practice, through the mem0 + Qdrant architecture, we've achieved cross-session permanent memory: definitions, preferences, and corrections stated once are automatically recalled and applied, and are shared across the team at the Agent level.
Additionally, through frontend-visible memory management features, users can view, edit, and control their memory data, ensuring the transparency and controllability of the memory system.
Our design philosophy for the memory system can be summarized in one sentence:
A good memory system shouldn't make users aware of its existence. It should be like a human assistant's memory — natural, reliable, and requiring no conscious maintenance.
If you're interested in the agent architecture behind AI data analysis, we recommend our in-depth analysis of How AskTable AI Agent Works, as well as our article on How AskTable Packages Data Analysts into AI.
We also have a dedicated discussion on how memory systems enable AI to become a continuously growing "data team": Memory Systems: Making AI a Growing Data Team.
Key Takeaway: A good AI assistant shouldn't make you reintroduce yourself every time. It should remember everything you've said and silently apply it in the next conversation. That's the purpose of permanent memory — not technical showboating, but making interaction feel natural again.