Curious how people here evaluate changes when they tweak an AI skill / prompt / workflow.

A lot of the time, a tweak might feel better in one or two cases, but it’s hard to tell if it actually improved the skill overall or just changed its behavior in a way that looks better for a bit.

Do you mostly go by intuition, or do you have some lightweight way to check if a tweak really helped?

Ask HN: How do you tell whether a tweak to your AI skill improved its performance?