HackerNews中文版

大家好，这是我的第一个问题！我一直在寻找很多用于匿名化提示数据的预处理工具，但我想知道是否有人知道可以用于后处理大型语言模型（LLM）聊天记录文件的工具。我想进行一项研究，旨在更轻松地匿名化参与者的聊天记录，以便在我收到这些记录时，降低个人身份信息（PII）的风险。我还需要添加的另一个步骤是删除讨论个人健康的聊天记录，或者总结讨论个人健康话题的聊天记录？我真的不太确定，所以在自己开发之前来这里询问一下！

查看原文

Hi all, this is my first question! I have been finding a lot of pre-processing tools to anonymize prompt data, but was wondering if anyone knew of tools that could be used in post-processing llm chat history files.<p>I want to conduct a study that strives to more easily anonymize the participant chat history so that when I receive it, it reduces PII risk.<p>another step I will need to add is just dropping chats that discuss personal health or rather summarizes chats that discuss topics of personal health? I really don't know, hence me asking here before just developing it on my own!

对大型语言模型（LLM）聊天记录导出进行匿名化/去标识化后处理