Ask HN: What's a good format for submitting CSV data to LLMs?
I need to submit roughly 1,000 rows of data to an LLM so I can ask it about trends within the data. If I use JSON, the GPT tokenizer shows about 40 tokens per row, because the headers are repeated on every row, which is inefficient. That means about 40,000 input tokens, which would definitely put me in context-rot (hallucination) territory. And I've heard that using CSV is very inaccurate. Any suggestions?
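To sanity-check the overhead myself, here's a minimal sketch (assuming Python with the tiktoken package; the column names and values are made-up placeholders, not my real data) that counts tokens for the same rows serialized as JSON records versus CSV, where the header appears only once:

```python
import csv
import io
import json

import tiktoken  # pip install tiktoken

# Hypothetical sample rows standing in for the real dataset.
rows = [
    {"date": f"2024-01-{d:02d}", "region": "EU", "revenue": 100 + d}
    for d in range(1, 31)
]

enc = tiktoken.get_encoding("cl100k_base")  # GPT-3.5/4-era tokenizer

# JSON records repeat every key (plus quotes and braces) on every row.
as_json = json.dumps(rows)

# CSV states the header once, then bare comma-separated values.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)
as_csv = buf.getvalue()

print("JSON tokens:", len(enc.encode(as_json)))
print("CSV tokens: ", len(enc.encode(as_csv)))
```

On rows shaped like this, the CSV serialization usually comes out several times smaller, since JSON pays the per-row cost of repeating keys and punctuation; whether the model then reads the CSV accurately is a separate question.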