Ask HN: How can I extract structured information from recorded audio?
Hey HN,
I would like to extract structured information from audio captured on a device that isn't too expensive (a small LLM would be an option; I have an old NVIDIA 1660 Super with 6 GB of VRAM).
OpenAI's Whisper could be used to transcribe the audio to text, but I don't really know how to reliably extract the information in a structured way. There is always a "purpose", selected from roughly 10 possible purposes, and "required data", which depends on the purpose and consists of key-value pairs whose values are also predefined.
An example (spoken text):
```
Please apply for leave from 1st November to 8th November.
```
Result (structured data):
```
{
  "purpose": "apply for leave",
  "data": {
    "start": "2025-11-01",
    "end": "2025-11-08"
  }
}
```
What are my options for doing this reliably, matching the different purposes to their required data with a "best match" approach?
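To illustrate what I mean by "best match", here is a minimal sketch of purpose selection over a Whisper transcript. The purpose names, keywords, and field lists are made up for the example; a real setup would declare the ~10 purposes mentioned above, and the date extraction for the required data is left out.

```python
# Sketch: pick the purpose whose keyword set best overlaps the transcript.
# Purposes, keywords, and fields below are illustrative placeholders.
PURPOSES = {
    "apply for leave": {
        "keywords": {"apply", "leave", "vacation", "holiday"},
        "fields": ["start", "end"],
    },
    "book meeting room": {
        "keywords": {"book", "reserve", "meeting", "room"},
        "fields": ["room", "start", "end"],
    },
}

def best_match(transcript: str) -> str:
    """Return the purpose with the largest keyword overlap."""
    words = set(transcript.lower().split())
    return max(PURPOSES, key=lambda p: len(PURPOSES[p]["keywords"] & words))

print(best_match("please apply for leave from 1st november to 8th november"))
# -> "apply for leave"
```

A keyword overlap like this is obviously brittle (paraphrases, synonyms), which is why I'm wondering whether a small LLM with constrained output, or embedding similarity against purpose descriptions, would be the more reliable route.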