Post-processing LLM chat history exports for anonymization/de-identification
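Hi all, this is my first question! I have found a lot of pre-processing tools for anonymizing prompt data, but I was wondering whether anyone knows of tools for post-processing LLM chat history files.
I want to conduct a study and am looking for an easier way to anonymize participants' chat histories, so that the PII risk is already reduced by the time I receive them.
Another step I will need to add is either dropping chats that discuss personal health or summarizing them instead. I really don't know which, hence my asking here before developing something on my own!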
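For context, here is a minimal sketch of the kind of post-processing I would otherwise end up writing myself. Everything in it is an assumption made for illustration: the export layout (a list of chats, each with a `title` and a list of `messages` carrying `role` and `content`), the two regex patterns, the keyword list, and the function names `redact`, `is_health_chat`, and `scrub_export`. Regexes and keywords like these will miss names, addresses, and most indirect identifiers, so this is a starting point and not a real de-identification tool.

```python
import re

# Hypothetical patterns: real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

# Hypothetical keyword list for flagging personal-health chats.
HEALTH_TERMS = re.compile(
    r"\b(diagnos\w*|medication|symptom\w*|therapy|doctor)\b", re.IGNORECASE
)


def redact(text):
    """Replace each PII match with a bracketed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def is_health_chat(chat):
    """Return True if any message in the chat contains a health keyword."""
    return any(HEALTH_TERMS.search(m["content"]) for m in chat["messages"])


def scrub_export(chats):
    """Drop health-related chats, then redact PII in the remaining ones."""
    cleaned = []
    for chat in chats:
        if is_health_chat(chat):
            continue  # drop the whole chat instead of summarizing it
        cleaned.append({
            "title": redact(chat.get("title", "")),
            "messages": [
                {"role": m["role"], "content": redact(m["content"])}
                for m in chat["messages"]
            ],
        })
    return cleaned
```

The idea is that participants could run something like `scrub_export` locally on their own export, so that the raw history never reaches me. Dropping health chats outright is the simpler of the two options I mentioned; summarizing them would need a separate step.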