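# I Replaced My AI Agent's Flat Fact Store with a Graph Database and It Runs in 85MB
I've been building LocalClaw, a local-model-first AI agent framework that runs on personal hardware through Ollama. No cloud, no API costs. A few weeks ago I posted about the router/specialist architecture, and a lot of people asked about the memory system, so here it is.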
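## The Problem
I started with a JSONL fact store and embedding-similarity retrieval. Simple enough until it wasn't. After a few weeks of real use I had 14 near-duplicate facts about the same topics from different sessions. I layered dedup on top of dedup and it still wasn't clean.
The bigger problem was relationships. "Peter works at DevMesh" and "DevMesh is building an outreach platform" were two separate embeddings. You could retrieve each one, but you couldn't traverse from one to the other. No multi-hop. No fact evolution. Old facts and new facts coexisted with no signal about which was current.
Four iterations on the flat store later, I accepted I was patching the wrong thing.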
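## Why FalkorDB
I looked at Neo4j (Community Edition is intentionally crippled), Memgraph (no native vector search), and FalkorDB.
FalkorDB runs in Docker, speaks the Redis wire protocol, has native HNSW vector search, and the entire thing sits at 85MB at my current scale. Graph traversal, vector similarity, and hybrid keyword search all run in one container. No separate Qdrant, no sync issues between two stores.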
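## What the Graph Enables
Every fact connects to the entities it references via ABOUT edges. Multi-hop traversal becomes natural: find everything connected to a project, or find all entities mentioned alongside a technology.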
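To make the ABOUT-edge idea concrete, here is a minimal in-memory sketch. It is not LocalClaw's code and it does not talk to FalkorDB; the `FactGraph` class and its method names are hypothetical, chosen only to show how facts linked to shared entities make multi-hop lookups possible.

```python
from collections import defaultdict


class FactGraph:
    """In-memory stand-in for the graph: each fact has ABOUT edges to entities."""

    def __init__(self):
        self.about = defaultdict(set)     # fact text -> entity names
        self.mentions = defaultdict(set)  # entity name -> fact texts

    def add_fact(self, text, entities):
        for name in entities:
            self.about[text].add(name)
            self.mentions[name].add(text)

    def co_mentioned(self, entity):
        """Entities that appear in at least one fact together with `entity`."""
        related = set()
        for fact in self.mentions.get(entity, ()):
            related |= self.about[fact]
        related.discard(entity)
        return related

    def facts_within(self, entity, hops=1):
        """Expand outward from `entity` for `hops` rounds.

        Each round collects the facts attached to the current entities,
        then moves on to the entities those facts mention.
        """
        seen, frontier, facts = {entity}, {entity}, set()
        for _ in range(hops):
            next_frontier = set()
            for name in frontier:
                for fact in self.mentions.get(name, ()):
                    facts.add(fact)
                    next_frontier |= self.about[fact] - seen
            seen |= next_frontier
            frontier = next_frontier
        return facts


graph = FactGraph()
graph.add_fact("Peter works at DevMesh", ["Peter", "DevMesh"])
graph.add_fact("DevMesh is building an outreach platform",
               ["DevMesh", "outreach platform"])

# One round from Peter finds only his own fact; a second round also
# reaches what DevMesh is building, which a flat store could not link.
print(graph.facts_within("Peter", hops=1))
print(graph.facts_within("Peter", hops=2))
```

In a graph database the same traversal would be a single pattern match over ABOUT edges rather than hand-written loops.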
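When a fact changes, the new fact gets a SUPERSEDES edge to the old one. Both persist with timestamps, so temporal queries now work. "What did the system know about this last month?" is a real query.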
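A small sketch of how a SUPERSEDES chain can answer an "as of" question. The `Fact` dataclass, the `known_as_of` helper, and the example facts are all made up for illustration; the actual schema in LocalClaw may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    text: str
    created_at: datetime
    supersedes: Optional["Fact"] = None  # SUPERSEDES edge to the older fact


def known_as_of(latest: Fact, when: datetime) -> Optional[Fact]:
    """Walk the chain from newest to oldest and return the first fact
    that already existed at `when`, or None if nothing was known yet."""
    node = latest
    while node is not None:
        if node.created_at <= when:
            return node
        node = node.supersedes
    return None


# Hypothetical history: one fact replaced by a newer one.
old = Fact("The agent stores facts in JSONL", datetime(2025, 1, 10))
new = Fact("The agent stores facts in FalkorDB", datetime(2025, 3, 5),
           supersedes=old)

print(known_as_of(new, datetime(2025, 2, 1)).text)  # the older fact
print(known_as_of(new, datetime(2025, 4, 1)).text)  # the current fact
```

Because the old fact is kept rather than overwritten, "current" is simply the head of the chain and history is everything behind it.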
The vector index runs inside FalkorDB on 4096-dimensional embeddings from qwen3-embedding:8b. HNSW search is roughly O(log n). No external database.
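For readers who want to see what an in-database vector index might look like, the sketch below only builds the query strings. The `Fact` label, the `embedding` and `text` properties, and the cosine similarity function are assumptions, since the post does not name them. The `CREATE VECTOR INDEX` and `db.idx.vector.queryNodes` forms follow FalkorDB's documented syntax as best understood here, so check them against the version you run.

```python
EMBEDDING_DIM = 4096  # from the post: qwen3-embedding:8b output size

# Assumed schema: (:Fact {text, embedding}). Not taken from LocalClaw.
CREATE_INDEX = (
    "CREATE VECTOR INDEX FOR (f:Fact) ON (f.embedding) "
    f"OPTIONS {{dimension: {EMBEDDING_DIM}, similarityFunction: 'cosine'}}"
)


def knn_query(k: int) -> str:
    """Cypher for a k-nearest-neighbour lookup; the query vector is meant
    to be passed separately as the $query_vec parameter."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (
        "CALL db.idx.vector.queryNodes('Fact', 'embedding', "
        f"{k}, vecf32($query_vec)) YIELD node, score "
        "RETURN node.text AS text, score"
    )


print(CREATE_INDEX)
print(knn_query(5))
```

The index dimension has to match the embedding model's output size, which is why switching embedding models usually means rebuilding the index.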
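## The Part That Surprised Me
Entity extraction by a small local model is unreliable when it runs blind. phi4-mini classified DGX Spark as software and created separate nodes for the singular and plural forms of the same entity.
The fix: before extracting entities from a new fact, query the existing typed entities from the graph and inject them into the NER prompt as reference context. Now phi4-mini sees "DGX Spark → hardware, FalkorDB → software" before it classifies anything new. Each correctly typed entity makes future extractions more consistent. The graph teaches the model over time without any additional training.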
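The prompt-injection step could look something like the sketch below. The function name, the prompt wording, and the `known_entities` dictionary (standing in for the result of a graph query) are hypothetical; only the idea of listing already-typed entities ahead of the new fact comes from the post.

```python
def build_ner_prompt(fact_text, known_entities):
    """Build an extraction prompt that shows the model existing typed entities.

    known_entities maps entity name -> type, e.g. as fetched from the graph.
    """
    lines = [f"- {name} -> {etype}"
             for name, etype in sorted(known_entities.items())]
    reference = "\n".join(lines) if lines else "- (none yet)"
    return (
        "Extract the entities from the fact below and assign each a type.\n"
        "If an entity matches a known entity, reuse its exact name and type "
        "instead of creating a variant.\n\n"
        f"Known entities:\n{reference}\n\n"
        f"Fact: {fact_text}\n"
        "Entities:"
    )


prompt = build_ner_prompt(
    "The DGX Sparks run FalkorDB in Docker.",
    {"DGX Spark": "hardware", "FalkorDB": "software"},
)
print(prompt)
```

Sorting the reference list keeps the prompt stable between runs, which makes extraction differences easier to attribute to the new fact rather than to prompt ordering.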
## Scoring
Pure vector similarity surfaces whatever is semantically closest, regardless of whether it matters. The scoring formula:
```
score = similarity × 0.5 + recency × 0.2 + importance × 0.3
```
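Importance uses a 1-5 tier (critical health/family = 5, job/identity = 4, preference = 3, context = 2, ephemeral = 1). A moderately relevant but critical fact scores higher than a highly relevant but ephemeral one. Your wife's health condition surfaces above yesterday's weather.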
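A worked sketch of the formula, under two assumptions the post does not spell out: the 1-5 tier is rescaled to 0..1 as `(tier - 1) / 4`, and recency is an exponential decay with a 30-day half-life. Only the three weights come from the post; the function names and example numbers are illustrative.

```python
from datetime import datetime, timedelta

# Weights from the post.
W_SIMILARITY, W_RECENCY, W_IMPORTANCE = 0.5, 0.2, 0.3


def recency(created_at, now, half_life_days=30.0):
    """Assumed decay: 1.0 for a brand-new fact, 0.5 after one half-life."""
    age_days = max((now - created_at).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def normalize_importance(tier):
    """Assumed mapping of tiers 1..5 onto 0.0..1.0."""
    if not 1 <= tier <= 5:
        raise ValueError("tier must be between 1 and 5")
    return (tier - 1) / 4.0


def score(similarity, created_at, tier, now):
    return (W_SIMILARITY * similarity
            + W_RECENCY * recency(created_at, now)
            + W_IMPORTANCE * normalize_importance(tier))


now = datetime(2025, 6, 1)
# Moderately similar, a month old, but tier 5 (critical health/family).
health = score(0.6, now - timedelta(days=30), 5, now)
# Highly similar, one day old, but tier 1 (ephemeral).
weather = score(0.9, now - timedelta(days=1), 1, now)
print(health > weather)
```

With these assumptions the critical fact lands near 0.70 and the ephemeral one near 0.65, so the ordering matches the behavior described above. A different normalization (for example `tier / 5`) would narrow that gap, so the choice is worth testing against real queries.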
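## What I Learned
The model computes nothing. Code handles which facts changed, which are duplicates, and what the scores are. The model handles what it means. The moment you let a model do arithmetic or hash-based dedup, you get failures you can't explain.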
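As one illustration of keeping deterministic work in code, here is a minimal hash-based dedup sketch. The normalization rules (lowercasing, collapsing whitespace, dropping trailing punctuation) are assumptions, not LocalClaw's actual rules, and exact hashing only catches duplicates that normalize to the same string; near-duplicates still need a similarity check.

```python
import hashlib
import re


def fact_key(text):
    """Stable key for a fact: normalize, then hash with SHA-256."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    normalized = normalized.rstrip(".!?")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(facts):
    """Keep the first occurrence of each fact, preserving input order."""
    seen, unique = set(), []
    for fact in facts:
        key = fact_key(fact)
        if key not in seen:
            seen.add(key)
            unique.append(fact)
    return unique


facts = [
    "Peter works at DevMesh.",
    "peter  works at devmesh",
    "DevMesh is building an outreach platform",
]
print(dedupe(facts))
```

The same input always produces the same key, so when a duplicate slips through you can inspect the normalized string and see exactly why.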
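Importance tiers need concrete examples in the extraction prompt. phi4:14b defaulted everything to tier 2 until I added few-shot examples with emotional weight. Abstract instructions don't calibrate a model.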
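A sketch of what few-shot calibration might look like in a prompt builder. The tier labels come from the scoring section; the example sentences, the prompt wording, and both function names are invented for illustration and are not the prompts LocalClaw ships.

```python
import re

# Tier labels from the post.
TIERS = {
    5: "critical health/family",
    4: "job/identity",
    3: "preference",
    2: "context",
    1: "ephemeral",
}

# Hypothetical few-shot examples, one per tier.
FEW_SHOT = [
    ("My wife was diagnosed with a heart condition.", 5),
    ("I work as a platform engineer.", 4),
    ("I prefer dark roast coffee.", 3),
    ("I'm debugging a Docker networking issue this week.", 2),
    ("It rained yesterday.", 1),
]


def build_importance_prompt(fact_text):
    rubric = "\n".join(f"{tier} = {label}"
                       for tier, label in sorted(TIERS.items(), reverse=True))
    shots = "\n".join(f'Fact: "{text}"\nTier: {tier}'
                      for text, tier in FEW_SHOT)
    return (
        "Rate the importance of the fact on this scale:\n"
        f"{rubric}\n\nExamples:\n{shots}\n\n"
        f'Fact: "{fact_text}"\nTier:'
    )


def parse_tier(reply):
    """Pull the first digit 1-5 out of the model's reply, or None."""
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else None


print(build_importance_prompt("My daughter starts school in September."))
```

Parsing the reply in code, and treating `None` as a retry or a default, keeps the model responsible for judgment only.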
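The graph beats flat storage the moment you need relationship reasoning. The SUPERSEDES chain alone justified the migration.
It all runs on a Mac Mini. 85MB for the graph. Everything local.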
GitHub: https://github.com/PeterGreenAppliedAI/LocalClaw