Hacker News (Chinese edition)



What are people's experiences with using LLMs to mine information from scientific papers?

My own experience: I first attempted to extract the anti-drug antibody (ADA) rate from each of 3730 clinical-trial papers, all indexed in PubMed. I started from PDFs. Claude Opus 4.7 analyzed each PDF using a written rules doc that we had formulated. Running all the papers took about a week because I kept hitting session limits; the total cost was ~$25 (USD). We got actual rates from 909 papers. The rest were mostly cases where the rate was not present or did not meet our criteria, including administering only one drug at a time.

I read thirty of the papers and re-read those where I got a different answer from Claude, concluding that it had erred one time and I had erred three times.

So this works, but is not totally convenient: session limits mean that I can't start it up and walk away. Or I don't know how to engineer this capability. In addition, I was curious how local models would perform.

To that end I tried llama 3.3 70B on my Mac M5 Max (128 GB mem). I used Ollama, Q4_K_M, 128k context, ~80k input tokens after pdftotext -layout.

One paper took 18 minutes; the model was unable to determine the ADA rate, even though it is clearly stated in the paper. One paper is not a proper benchmark, but it's too slow to do a proper test. Clearly part of the speed issue here is that Claude has access to a server farm, whereas I'm running on just one Mac. This is part of the practical problem that someone would face with local computation.

What is the state of the art on this type of problem, for answering questions one paper at a time or using many papers at once? I'd love to hear success stories!
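On the "start it up and walk away" point, one possible approach is a batch loop that appends each result to a checkpoint file, so a run interrupted by a session limit can be restarted and will skip the papers already done. The following is only a minimal sketch under stated assumptions: `run_batch`, `load_done`, the JSONL checkpoint layout, and the `extract` callable (whatever function sends the text to a model and returns an answer) are all hypothetical names, and treating a limit as a `RuntimeError` is an assumption. Only the `pdftotext -layout` step comes from the setup described above.

```python
import json
import subprocess
import time
from pathlib import Path
from typing import Callable, Iterable, Set


def pdf_to_text(pdf_path: Path) -> str:
    """Convert one PDF to text with `pdftotext -layout`; "-" sends output to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def load_done(checkpoint: Path) -> Set[str]:
    """Return the set of papers already recorded in the JSONL checkpoint."""
    if not checkpoint.exists():
        return set()
    done = set()
    with checkpoint.open() as fh:
        for line in fh:
            if line.strip():
                done.add(json.loads(line)["paper"])
    return done


def run_batch(
    pdfs: Iterable[Path],
    extract: Callable[[str], str],   # hypothetical: text in, model answer out
    checkpoint: Path,
    to_text: Callable[[Path], str] = pdf_to_text,
    max_retries: int = 3,
    backoff_s: float = 60.0,
) -> int:
    """Process papers not yet in the checkpoint; return how many were processed."""
    done = load_done(checkpoint)
    processed = 0
    for pdf in pdfs:
        key = str(pdf)
        if key in done:
            continue  # finished in an earlier run
        text = to_text(pdf)
        for attempt in range(max_retries):
            try:
                answer = extract(text)
                break
            except RuntimeError:
                # Assumption: limits and transient failures surface as RuntimeError.
                if attempt == max_retries - 1:
                    raise
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
        # Append immediately so a crash loses at most the paper in flight.
        with checkpoint.open("a") as fh:
            fh.write(json.dumps({"paper": key, "ada_rate": answer}) + "\n")
        processed += 1
    return processed
```

With something like this, the same command can be rerun (by hand or from a scheduler) until it reports zero papers processed. Whether backoff alone is enough to ride out a session limit depends on the provider, so the retry settings here are placeholders rather than recommendations.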

Ask HN: Mining scientific papers