Hacker News (Chinese edition)

Hey HN,

I would like to extract structured information from captured audio on a device that is not too expensive (a small LLM would be an option; I have an old Nvidia 1660 Super with 6 GB of VRAM).

OpenAI Whisper could be used to transcribe the audio to text, but I don't know how to reliably extract the information from that text in a structured way. There is always a "purpose", selected from a fixed set of, say, 10 possible purposes, and "required data", which depends on the purpose and consists of key-value pairs that also have predefined values.

An example (spoken text):

```
Please apply for leave from 1st November to 8th November.
```

Result (structured data):

```
{ purpose: "apply for leave", data: { start: "2025-11-01", end: "2025-11-08" } }
```

What are my options for doing this reliably, so that different purposes are matched with their different data using a "best match" approach?
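To make the target format concrete, here is a minimal Python sketch of how a candidate result could be checked against a fixed set of purposes and their required keys. It is not a solution to the extraction itself. The `SCHEMAS` table and the `validate` function are hypothetical names, only the leave example comes from the post, and the sketch assumes the result arrives as strict JSON (quoted keys), unlike the looser notation in the example.

```python
import json
from datetime import date

# Hypothetical schema table: each purpose maps to the keys it requires and a
# parser used to check each value. Only "apply for leave" is from the post;
# the remaining purposes would be added the same way.
SCHEMAS = {
    "apply for leave": {"start": date.fromisoformat, "end": date.fromisoformat},
}


def validate(raw: str) -> dict:
    """Parse a JSON string and check it against the schema for its purpose.

    Raises ValueError if the JSON is malformed, the purpose is unknown,
    keys are missing or unexpected, or a value fails its parser.
    """
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")

    purpose = obj.get("purpose")
    if purpose not in SCHEMAS:
        raise ValueError(f"unknown purpose: {purpose!r}")

    schema = SCHEMAS[purpose]
    data = obj.get("data")
    if not isinstance(data, dict):
        raise ValueError("'data' must be an object")

    missing = sorted(set(schema) - set(data))
    extra = sorted(set(data) - set(schema))
    if missing or extra:
        raise ValueError(f"missing keys: {missing}, unexpected keys: {extra}")

    for key, parse in schema.items():
        try:
            parse(data[key])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"bad value for {key!r}: {data[key]!r}") from exc
    return obj


if __name__ == "__main__":
    text = ('{"purpose": "apply for leave", '
            '"data": {"start": "2025-11-01", "end": "2025-11-08"}}')
    print(validate(text))
```

A check like this only rejects malformed results; whatever produces the candidate (a small LLM or something else) is still an open choice.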

Ask HN: How to extract structured information from recorded audio?