Ask HN: How to extract structured information from recorded audio?

1 point by sandreas, 5 months ago
Hey HN,

I would like to extract structured information from captured audio on a device that is not too expensive (a small LLM would be an option; I have an old Nvidia 1660 Super with 6 GB of VRAM).

OpenAI Whisper could be used to transcribe the audio to text, but I don't really know how to reliably extract the information in a structured way. There is always a "purpose", which is selected from, say, 10 possible purposes, and "required data", which depends on the purpose and consists of key-value pairs that also have predefined values.

An example (spoken text):

```
Please apply for leave from 1st November to 8th November.
```

Result (structured data):

```
{ purpose: "apply for leave", data: { start: "2025-11-01", end: "2025-11-08" } }
```

What are my options for doing this reliably, so that different purposes are matched with their different data using a "best match" approach?
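To make the target format concrete, here is a minimal Python sketch of the structure described above: a fixed set of purposes, each with its own required keys and a check on the values. It only validates a candidate result and does not perform the extraction itself. The names `SCHEMAS`, `is_iso_date`, and `validate`, and the second purpose `"report sick"`, are hypothetical and only for illustration; only `"apply for leave"` with `start` and `end` comes from the example in the post.

```python
from datetime import date


def is_iso_date(value):
    """Return True if value is a string in ISO date form such as 2025-11-01."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical registry: purpose -> {required key: value check}.
# Only "apply for leave" is taken from the post; "report sick" is made up.
SCHEMAS = {
    "apply for leave": {"start": is_iso_date, "end": is_iso_date},
    "report sick": {"day": is_iso_date},
}


def validate(result):
    """Return a list of problems; an empty list means the result fits its schema."""
    purpose = result.get("purpose")
    schema = SCHEMAS.get(purpose)
    if schema is None:
        return [f"unknown purpose: {purpose!r}"]

    data = result.get("data")
    if not isinstance(data, dict):
        return ["data must be a mapping of key-value pairs"]

    problems = []
    # Every required key must be present and pass its value check.
    for key, check in schema.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not check(data[key]):
            problems.append(f"invalid value for {key}: {data[key]!r}")
    # Keys that the purpose does not define are reported as well.
    for key in data:
        if key not in schema:
            problems.append(f"unexpected key: {key}")
    return problems


candidate = {
    "purpose": "apply for leave",
    "data": {"start": "2025-11-01", "end": "2025-11-08"},
}
print(validate(candidate))
```

Whatever produces the candidate (a small LLM, rules, or something else), a check like this could reject results that name an unknown purpose or omit required data, which is one possible way to approach the "reliable" part of the question.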