Unit tests for LLMs?

Author: simantakDabhade, 2 months ago
Is there any package that helps with Vitest-style quick sanity checks on the output of an LLM, so that I can automatically catch regressions when I change my prompt?

For example, this agent for a realtor kept offering virtual viewings (even though that isn't a thing) instead of doing a handoff (I modified the prompt for this). So I want a package where I can write a test like: for this input, do not mention this, or never mention those things. Or: for certain inputs, always call this tool (rough sketch of what I mean at the end of this post).

I started engineering my own little utility for this, but before I dive deep and build my own package, I wanted to see if something like this already exists, or whether I'm heading down the wrong path here!

P.S. Not sure if this should be called evals; it kind of overlaps, but what should this even be called?
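To make it concrete, here's a minimal sketch of the kind of test I mean, written with Vitest. runAgent and the tool name handoff_to_agent are made-up placeholders for whatever the agent under test actually exposes, not a real package's API:

    import { describe, it, expect } from 'vitest';

    // Placeholder helper: calls the LLM agent and returns its reply text
    // plus the names of any tools it invoked. Swap in your own agent here.
    import { runAgent } from './agent';

    describe('realtor assistant prompt', () => {
      it('never offers virtual viewings', async () => {
        const { text } = await runAgent('Can I view the property remotely?');
        // Hard constraint: this phrase must never appear in the reply.
        expect(text.toLowerCase()).not.toContain('virtual viewing');
      });

      it('hands off scheduling requests to a human', async () => {
        const { toolCalls } = await runAgent('I want to book a showing on Saturday');
        // For this input, the agent should always call the handoff tool.
        expect(toolCalls).toContain('handoff_to_agent');
      });
    });

Since LLM output is nondeterministic, each case would probably need to run several times (or against a temperature-0 sample) before a failure counts as a real regression.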