Show HN: Replacing $50k manual forensic audits with a deterministic .py engine
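I'm a software architect, and I recently built Exit Protocol (<a href="https://exitprotocols.com" rel="nofollow">https://exitprotocols.com</a>), an automated forensic accounting engine for high-conflict litigation.
<p>Problem:
If you get divorced and need to prove that a specific $250k in a heavily commingled joint bank account is your "separate property" (e.g., from a pre-marital startup exit), the burden of proof is strictly mathematical. Historically, this meant paying a forensic CPA $500/hour to dump years of blurry bank PDFs into Excel and manually trace every dollar. It takes weeks and routinely costs over $50,000.
<p>I looked at the legal standard courts use for this, the Lowest Intermediate Balance Rule (LIBR), and realized it wasn't an accounting problem. It is a distributed-systems state-machine problem.
<p>Why didn't we just "throw AI at it"?
<p>There are a hundred legal-tech startups right now trying to use LLMs to summarize bank data. In a courtroom, GenAI is a fatal liability. If an LLM hallucinates a single transaction, the entire ledger is inadmissible under the Daubert standard.
<p>To make this court-ready, we had to build a strictly deterministic pipeline: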
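<p>1. Vision-Native Ingestion (Beating Tesseract)
Bank statements are the final boss of OCR (merged cells, overlapping debit/credit columns). Standard linear OCR fails catastrophically. We built a spatial-grid OCR pipeline (using Azure Document Intelligence with a local Surya OCR fallback) that maps the geometric structure of the page. It reconstructs tabular ledgers perfectly, even from multi-generational "PDFs from hell."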
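<p>To make the spatial-grid idea concrete, here is a minimal sketch of rebuilding table rows from word positions. It does not call Azure Document Intelligence or Surya, and it is not the Exit Protocol implementation: the `(text, x_center, y_center)` tuple shape, the `column_edges` list, and the `rebuild_rows` helper are all hypothetical and stand in for whatever geometry the OCR layer returns.

```python
def rebuild_rows(words, column_edges, row_tolerance=4.0):
    """Group (text, x_center, y_center) tuples into rows of fixed columns.

    Hypothetical helper: `column_edges` holds the x boundaries of each
    column, e.g. [0, 80, 300, 380, 460] for date/description/debit/credit.
    """
    # Pass 1: cluster words into visual lines by vertical position.
    lines = []  # list of (anchor_y, [words on that line])
    for word in sorted(words, key=lambda w: w[2]):
        if lines and abs(word[2] - lines[-1][0]) <= row_tolerance:
            lines[-1][1].append(word)
        else:
            lines.append((word[2], [word]))

    # Pass 2: within each line, read left to right and drop each word
    # into the column whose x range contains it.
    table = []
    for _, line_words in lines:
        cells = [""] * (len(column_edges) - 1)
        for text, x, _y in sorted(line_words, key=lambda w: w[1]):
            for col in range(len(cells)):
                if column_edges[col] <= x < column_edges[col + 1]:
                    cells[col] = f"{cells[col]} {text}".strip()
                    break
        table.append(cells)
    return table


words = [
    ("03/01", 40, 100.0), ("Coffee", 120, 100.5), ("Shop", 160, 100.0),
    ("4.50", 330, 101.0),      # x falls in the debit column
    ("03/02", 40, 120.0), ("Payroll", 125, 120.2),
    ("2,000.00", 420, 119.8),  # x falls in the credit column
]
table = rebuild_rows(words, column_edges=[0, 80, 300, 380, 460])
# table[0] == ["03/01", "Coffee Shop", "4.50", ""]
# table[1] == ["03/02", "Payroll", "", "2,000.00"]
```

<p>The point of the sketch is that debit vs. credit is decided by where the number sits on the page, not by the order in which a linear OCR pass happens to emit it.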
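<p>2. The Deterministic Engine (LIBR)
The LIBR algorithm acts as a one-way ratchet. If an account balance drops below your separate property claim amount, your claim is permanently capped at that new floor. Subsequent marital deposits do not refill it (the "replenishment fallacy"). The engine replays thousands of transactions chronologically, continuously evaluating S_t = min(S_{t-1}, B_t), where S_t is the traceable separate-property claim and B_t is the account balance after transaction t.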
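<p>A minimal sketch of that replay, assuming signed amounts (deposits positive, withdrawals negative) that are already in chronological order. The `libr_replay` name and its parameters are illustrative, not the engine's actual API, and flooring the claim at zero on an overdraft is an assumption the post does not spell out.

```python
from decimal import Decimal


def libr_replay(opening_balance, separate_claim, amounts):
    """Replay signed amounts in order; return the claim S_t after each one."""
    balance = Decimal(opening_balance)
    # The claim can never exceed what is actually in the account.
    claim = min(Decimal(separate_claim), balance)
    history = []
    for amount in amounts:
        balance += Decimal(amount)
        # One-way ratchet: S_t = min(S_{t-1}, B_t).
        # Assumption: an overdrawn account floors the claim at zero.
        claim = max(Decimal(0), min(claim, balance))
        history.append(claim)
    return history


# $300k account holding a $250k separate-property claim.
ledger = ["-200000", "150000", "-30000"]
trace = libr_replay("300000", "250000", ledger)
# trace == [Decimal("100000"), Decimal("100000"), Decimal("100000")]
```

<p>The $200k withdrawal drops the balance to $100k, which caps the claim there. The later $150k deposit raises the balance back to $250k but leaves the claim at $100k, which is the "replenishment fallacy" in action. `Decimal` is used instead of floats so that cent-level arithmetic stays exact.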
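<p>3. Resolving Timestamp Ambiguity
Bank PDFs give you dates, not timestamps. If a $10k deposit and a $10k withdrawal happen on the same day, order matters. We built a simulation toggle that forces "Worst Case" (withdrawals process first) vs. "Best Case" sorting, establishing a mathematically irrefutable "Zone of Truth" for settlement negotiations.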
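<p>Here is a rough sketch of how such a toggle could work. It assumes "Best Case" means deposits process first within a day (the post only defines "Worst Case"), that dates are ISO strings so they sort chronologically, and that `final_claim` and `withdrawals_first` are hypothetical names.

```python
from collections import defaultdict
from decimal import Decimal


def final_claim(opening_balance, separate_claim, dated_amounts, withdrawals_first):
    """Run the LIBR ratchet with a forced same-day ordering."""
    by_day = defaultdict(list)
    for day, amount in dated_amounts:
        by_day[day].append(Decimal(amount))

    balance = Decimal(opening_balance)
    claim = min(Decimal(separate_claim), balance)
    for day in sorted(by_day):  # ISO dates sort chronologically as strings
        # Ascending order puts negative amounts (withdrawals) first;
        # descending order puts deposits first.
        ordered = sorted(by_day[day], reverse=not withdrawals_first)
        for amount in ordered:
            balance += amount
            claim = max(Decimal(0), min(claim, balance))
    return claim


txns = [("2024-03-01", "10000"), ("2024-03-01", "-10000")]
worst = final_claim("12000", "8000", txns, withdrawals_first=True)
best = final_claim("12000", "8000", txns, withdrawals_first=False)
# worst == Decimal("2000"), best == Decimal("8000")
```

<p>With a $12k balance and an $8k claim, processing the withdrawal first dips the balance to $2k and caps the claim there, while processing the deposit first never touches the claim. The true answer has to lie somewhere in the $2k to $8k band, whatever the bank's real intraday order was.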
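<p>4. Cryptographic Chain of Custody and Sovereign Mode
Lawyers are terrified of cloud SaaS breaches. We containerized the entire monolith (Django 5.0/Postgres/Celery) via Docker so enterprise firms can run it air-gapped on their own hardware (Sovereign Mode). Furthermore, every generated PDF dossier is sealed with a SHA-256 hash of the underlying data snapshot, letting a judge verify that the output hasn't been tampered with since generation.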
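<p>One way such a seal could be computed, as a sketch only: the snapshot layout, the canonical-JSON serialization, and the `seal_snapshot` / `verify_snapshot` helpers are assumptions for illustration, since the post does not describe how the snapshot is serialized before hashing.

```python
import hashlib
import hmac
import json


def seal_snapshot(snapshot):
    """Return the SHA-256 hex digest of a canonical JSON form of the snapshot."""
    # Sorted keys and fixed separators make the bytes independent of
    # dict ordering, so the same data always hashes to the same digest.
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_snapshot(snapshot, expected_digest):
    """Recompute the digest and compare it with the one printed on the dossier."""
    return hmac.compare_digest(seal_snapshot(snapshot), expected_digest)


# Hypothetical snapshot; amounts are strings to avoid float formatting drift.
snapshot = {
    "account": "joint-checking",
    "transactions": [
        {"date": "2024-03-01", "amount": "10000.00"},
        {"date": "2024-03-01", "amount": "-10000.00"},
    ],
}
seal = seal_snapshot(snapshot)

tampered = json.loads(json.dumps(snapshot))  # deep copy
tampered["transactions"][1]["amount"] = "-1000.00"
```

<p>Here `verify_snapshot(snapshot, seal)` returns `True` and `verify_snapshot(tampered, seal)` returns `False`. A bare hash only detects changes relative to a digest that was recorded somewhere trustworthy, so where the digest is stored matters as much as how it is computed.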
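<p>If you want to see the math in action, we set up a "Demo Sandbox" populated with a synthetic, highly complex 3-year commingled ledger. You can run the engine yourself here (Desktop recommended): <a href="https://exitprotocols.com/simulation/uplink/" rel="nofollow">https://exitprotocols.com/simulation/uplink/</a>
<p>Here is the exact "Attorney Work Product" (the Forensic Audit Dossier) our system generates from raw PDFs: <a href="https://exitprotocols.com/static/documents/Forensic_Audit_Sample_Vinay_MKT2026.pdf" rel="nofollow">https://exitprotocols.com/static/documents/Forensic_Audit_Sample_Vinay_MKT2026.pdf</a>
<p>I'd love feedback from the HN crowd on the architecture, specifically on handling edge-case data ingestion and maintaining cryptographic integrity in B2B enterprise deployments.
<p>Cheers!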