Ask HN: Mining scientific papers

1 point by davidbjaffe 1 day ago
What are people's experiences with using LLMs to mine information from scientific papers?

My own experience: I first attempted to extract the anti-drug antibody (ADA) rate from each of 3730 clinical-trial papers, all indexed in PubMed. I started from PDFs. Claude Opus 4.7 analyzed each PDF using a written rules doc that we had formulated. Running all the papers took about a week because I kept hitting session limits; the total cost was ~$25 (USD). We got actual rates from 909 papers. The rest were mostly cases where the rate was not reported or the paper did not meet our criteria, which included administering only one drug at a time.

I read thirty of the papers and re-read those where my answer differed from Claude's, concluding that Claude had erred once and I had erred three times.

So this works, but it is not totally convenient: session limits mean that I can't start it up and walk away, or at least I don't know how to engineer that capability. In addition, I was curious how local models would perform.

To that end I tried llama 3.3 70B on my Mac M5 Max (128 GB memory). I used Ollama, Q4_K_M, 128k context, ~80k input tokens after pdftotext -layout.

One paper took 18 minutes, and the model was unable to determine the ADA rate even though it is clearly stated in the paper. One paper is not a proper benchmark, but it's too slow to do a proper test. Clearly part of the speed issue is that Claude has access to a server farm, whereas I'm running on just one Mac. This is part of the practical problem that anyone would face with local computation.

What is the state of the art on this type of problem, whether answering questions one paper at a time or using many papers at once? I'd love to hear success stories!
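On the start-it-and-walk-away part, one possible shape is a resumable loop that writes each result to disk as it goes and backs off when a limit is hit. The following is only a minimal sketch under stated assumptions, not what was actually run: `RateLimited`, `extract`, `run_batch`, and the JSON-lines checkpoint format are all hypothetical names made up for illustration, and `extract` stands in for whatever call sends one paper to a model. Only the `pdftotext -layout` step comes from the setup described above.

```python
import json
import subprocess
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


class RateLimited(Exception):
    """Hypothetical signal that the model backend hit a session or rate limit."""


def pdf_to_text(pdf_path: str) -> str:
    # Same conversion as in the post: pdftotext -layout, with "-" meaning stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def load_done(checkpoint: Path) -> Dict[str, Optional[float]]:
    # Assumed checkpoint format: one JSON record per line,
    # {"paper": <name>, "ada_rate": <number or null>}.
    done: Dict[str, Optional[float]] = {}
    if checkpoint.exists():
        for line in checkpoint.read_text().splitlines():
            if line.strip():
                rec = json.loads(line)
                done[rec["paper"]] = rec["ada_rate"]
    return done


def run_batch(
    papers: Iterable[str],
    extract: Callable[[str], Optional[float]],  # hypothetical: one paper in, rate or None out
    checkpoint: Path,
    max_retries: int = 5,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[float]]:
    done = load_done(checkpoint)  # skip papers finished in an earlier run
    with checkpoint.open("a") as out:
        for paper in papers:
            if paper in done:
                continue
            for attempt in range(max_retries):
                try:
                    rate = extract(paper)
                    break
                except RateLimited:
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    sleep(base_delay * 2 ** attempt)
            else:
                # Earlier results are already on disk, so a rerun resumes from here.
                raise RuntimeError(f"gave up on {paper} after {max_retries} retries")
            out.write(json.dumps({"paper": paper, "ada_rate": rate}) + "\n")
            out.flush()
            done[paper] = rate
    return done
```

Because every finished paper is appended to the checkpoint before the next one starts, rerunning the same command after a crash or a long limit only processes the papers that are still missing.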