Show HN: Replacing $50k manual forensic audits with a deterministic .py engine

2 points by cd_mkdir, about 2 months ago
I'm a software architect, and I recently built Exit Protocol (<a href="https://exitprotocols.com" rel="nofollow">https://exitprotocols.com</a>), an automated forensic accounting engine for high-conflict litigation.
<p>Problem: If you get divorced and need to prove that a specific $250k in a heavily commingled joint bank account is your "separate property" (e.g., from a pre-marital startup exit), the burden of proof is essentially mathematical. Historically, this meant paying a forensic CPA $500/hour to dump years of blurry bank PDFs into Excel and manually trace every dollar. It takes weeks and routinely costs over $50,000.
<p>I looked at the legal standard courts use for this, the Lowest Intermediate Balance Rule (LIBR), and realized it isn't really an accounting problem. It is a state-machine problem of the kind familiar from distributed systems: replay an ordered log of events and track one piece of state.
<p>Why didn't we just "throw AI at it"?
<p>There are a hundred legal-tech startups right now trying to use LLMs to summarize bank data. In a courtroom, GenAI is a fatal liability. If an LLM hallucinates a single transaction, the entire ledger risks being ruled inadmissible under the Daubert standard.
<p>To make this court-ready, we had to build a strictly deterministic pipeline:
<p>1. Vision-Native Ingestion (Beating Tesseract). Bank statements are the final boss of OCR (merged cells, overlapping debit/credit columns). Standard linear OCR fails catastrophically. We built a spatial-grid OCR pipeline (using Azure Document Intelligence with a local Surya OCR fallback) that maps the geometric structure of the page. It reconstructs tabular ledgers perfectly, even from multi-generational "PDFs from hell."
<p>2. The Deterministic Engine (LIBR). The LIBR algorithm acts as a one-way ratchet. If the account balance drops below your separate property claim amount, your claim is permanently capped at that new floor. Subsequent marital deposits do not refill it (assuming that they do is the "replenishment fallacy").
The engine replays thousands of transactions chronologically, continuously evaluating S_t = min(S_{t-1}, B_t), where S_t is the traceable separate-property amount and B_t is the account balance after transaction t.
<p>3. Resolving Timestamp Ambiguity. Bank PDFs give you dates, not timestamps. If a $10k deposit and a $10k withdrawal happen on the same day, order matters. We built a simulation toggle that forces "Worst Case" (withdrawals process first) vs. "Best Case" sorting. The two results bound the claim, establishing a mathematically irrefutable "Zone of Truth" for settlement negotiations.
<p>4. Cryptographic Chain of Custody &amp; Sovereign Mode. Lawyers are terrified of cloud SaaS breaches. We containerized the entire monolith (Django 5.0/Postgres/Celery) via Docker so enterprise firms can run it air-gapped on their own hardware (Sovereign Mode). Furthermore, every generated PDF dossier is sealed with a SHA-256 hash of the underlying data snapshot, so a judge can verify that the output hasn't been tampered with since generation.
<p>If you want to see the math in action, we set up a "Demo Sandbox" populated with a synthetic, highly complex 3-year commingled ledger. You can run the engine yourself here (desktop recommended): <a href="https://exitprotocols.com/simulation/uplink/" rel="nofollow">https://exitprotocols.com/simulation/uplink/</a>
<p>Here is the exact "Attorney Work Product" (the Forensic Audit Dossier) that our system generates from raw PDFs: <a href="https://exitprotocols.com/static/documents/Forensic_Audit_Sample_Vinay_MKT2026.pdf" rel="nofollow">https://exitprotocols.com/static/documents/Forensic_Audit_Sa...</a>
<p>I'd love feedback from the HN crowd on the architecture, specifically on handling edge-case data ingestion and maintaining cryptographic integrity in B2B enterprise deployments.
<p>Cheers!
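<p>For readers who want to see the ratchet from step 2 and the same-day ordering toggle from step 3 in code, here is a minimal sketch. It is not Exit Protocol's actual implementation: the `Txn` class, the `replay_libr` function, the sample ledger, the zero floor on the claim, and the reading of "Best Case" as "deposits process first" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    """Hypothetical ledger row: positive amount = deposit, negative = withdrawal."""
    day: date
    amount: Decimal


def replay_libr(opening_balance, separate_claim, txns, withdrawals_first):
    """Replay txns in date order and return the surviving separate-property claim."""

    def sort_key(txn):
        is_withdrawal = txn.amount < 0
        # Within a single day, rank 0 is processed before rank 1.
        same_day_rank = 0 if is_withdrawal == withdrawals_first else 1
        return (txn.day, same_day_rank)

    balance = opening_balance
    claim = min(separate_claim, opening_balance)
    for txn in sorted(txns, key=sort_key):
        balance += txn.amount
        # One-way ratchet S_t = min(S_{t-1}, B_t); the zero floor is an assumption
        # so that an overdrawn account cannot produce a negative claim.
        claim = max(Decimal("0"), min(claim, balance))
    return claim


# Synthetic example: a same-day $10k deposit and $10k withdrawal,
# followed by a later $50k marital deposit that must not refill the claim.
ledger = [
    Txn(date(2024, 3, 1), Decimal("10000")),
    Txn(date(2024, 3, 1), Decimal("-10000")),
    Txn(date(2024, 3, 5), Decimal("50000")),
]
opening = Decimal("250000")
worst = replay_libr(opening, opening, ledger, withdrawals_first=True)
best = replay_libr(opening, opening, ledger, withdrawals_first=False)
print(worst, best)  # 240000 250000
```

<p>In this toy ledger the two orderings bracket the claim between $240,000 and $250,000, and the later $50k deposit changes neither bound. `Decimal` is used instead of `float` so that cent-level arithmetic stays exact and reproducible.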
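<p>The SHA-256 sealing idea from step 4 can be sketched in the same hedged way. The post does not say how the data snapshot is serialized, so the canonical-JSON approach below, along with the `seal_snapshot` and `verify_snapshot` names and the sample snapshot, are assumptions rather than a description of the real system.

```python
import hashlib
import hmac
import json


def seal_snapshot(snapshot):
    """Return a SHA-256 hex digest over a canonical JSON encoding of the snapshot."""
    # Sorted keys and fixed separators make the byte encoding independent of
    # dict insertion order, so equal data always hashes to the same digest.
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_snapshot(snapshot, expected_seal):
    """Recompute the digest and compare it with the recorded seal."""
    return hmac.compare_digest(seal_snapshot(snapshot), expected_seal)


# Synthetic snapshot; amounts are strings so no float rounding enters the hash.
snapshot = {
    "case_id": "DEMO-001",
    "transactions": [
        {"day": "2024-03-01", "amount": "10000.00"},
        {"day": "2024-03-01", "amount": "-10000.00"},
    ],
}
seal = seal_snapshot(snapshot)

tampered = json.loads(json.dumps(snapshot))  # deep copy via a JSON round trip
tampered["transactions"][0]["amount"] = "10000.01"

print(verify_snapshot(snapshot, seal), verify_snapshot(tampered, seal))  # True False
```

<p>A bare digest only detects changes relative to a seal that was recorded somewhere the editor of the data cannot also rewrite, so where the seal is stored matters as much as how it is computed.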