Unit tests for LLMs?

Author: simantakDabhade, 2 months ago
Is there any package that helps with Vitest-style quick sanity checks on the output of an LLM, so that I can automatically catch regressions when I change my prompt?

For example, this agent for a realtor kept offering virtual viewings (even though that isn't a thing) instead of doing a handoff (I modified the prompt for this). So I want a package where I can write a test like: for this input, do not mention this, or never mention those things. Or: for certain inputs, always call this tool (rough sketch of what I mean at the end of this post).

I started engineering my own little utility for this, but before I dive deep and build my own package, I wanted to see if something like this already exists, or whether I'm heading down the wrong path here!

P.S. Not sure if this should be called evals; it kind of overlaps, but what should this even be called?
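To make it concrete, here's a minimal sketch of the kind of test I mean, written with Vitest. runAgent and the tool name handoff_to_agent are made-up placeholders for whatever the agent under test actually exposes, not a real package's API:

    import { describe, it, expect } from 'vitest';

    // Placeholder helper: calls the LLM agent and returns its reply text
    // plus the names of any tools it invoked. Swap in your own agent here.
    import { runAgent } from './agent';

    describe('realtor assistant prompt', () => {
      it('never offers virtual viewings', async () => {
        const { text } = await runAgent('Can I view the property remotely?');
        // Hard constraint: this phrase must never appear in the reply.
        expect(text.toLowerCase()).not.toContain('virtual viewing');
      });

      it('hands off scheduling requests to a human', async () => {
        const { toolCalls } = await runAgent('I want to book a showing on Saturday');
        // For this input, the agent should always call the handoff tool.
        expect(toolCalls).toContain('handoff_to_agent');
      });
    });

Since LLM output is nondeterministic, each case would probably need to run several times (or against a temperature-0 sample) before a failure counts as a real regression.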