Launch HN: Exa (YC S21) - The web as a database

45 points by willbryk, 7 months ago
Hey HN! We're Will and Jeff from Exa (https://exa.ai). We recently launched Exa Websets, an embeddings-powered search engine designed to return exactly what you're asking for. You can get precise results for complex queries like "all startups working on open-source developer tools based in SF, founded 2021-2025". Demo here - https://youtu.be/Unt8hJmCxd4

We started working on Exa because we were frustrated that while the LLM state of the art advances every week, Google has gotten worse over time. The Internet used to feel like a magical information portal, but it doesn't feel that way anymore when you're constantly being pushed towards SEO-optimized clickbait.

Websets is a step in the opposite direction. For every search, we perform dozens of embedding searches over Exa's vector database of the web to find good search candidates, then we run agentic workflows on each result to verify that it matches exactly what you asked for.

Websets results are good for two reasons. First, we train custom embedding models for our main search algorithm instead of relying on typical keyword-matching algorithms. Our embedding models are trained specifically to return exactly the type of entity you ask for. In practice, that means if you search "startups working in nanotech", keyword-based search engines return listicles about nanotech startups, because those listicles match the keywords in the query. In contrast, our embedding models return actual startup homepages, because those homepages match the meaning of the query.

Second, LLMs provide the last-mile intelligence needed to verify every result. Each result and piece of data is backed by supporting references that we used to validate that the result actually matches your search criteria. That's why Websets can take minutes or even hours to run, depending on your query and how many results you ask for. For valuable search queries, we think this is worth it.

Also, notably, Websets are tables, not lists. You can add "enrichment" columns to find more information about each result, like "# of employees" or "does author have blog?", and the cells load in asynchronously. This table format hopefully makes the web feel more like a database.

A few examples of searches that work with Websets:

"Math blogs created by teachers from outside the US" - https://websets.exa.ai/cma1oz9xf007sis0ipzxgbamn

"research paper about ways to avoid the O(n^2) attention problem in transformers, where one of the first author's first name starts with "A", "B", "S", or "T", and it was written between 2018 and 2022": https://websets.exa.ai/cm7dpml8c001ylnymum4sp11h

"US based healthcare companies, with over 100 employees and a technical founder": https://websets.exa.ai/cm6lc0dlk004ilecmzej76qx2

"all software engineers in the Bay Area, with experience in startups, who know Rust and have published technical content before": https://youtu.be/knjrlm1aibQ

You can try it at https://websets.exa.ai/ and API docs are at https://docs.exa.ai/websets. We'd love to hear your feedback!
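
For anyone who wants a concrete picture of the retrieve-then-verify flow described above, here is a rough Python sketch. It is not Exa's implementation or API. The index, the 3-dimensional vectors, the URLs, and the names `embedding_search`, `verify`, and `webset` are all made up for illustration, and the `verify` function is a simple stand-in for the LLM-based check mentioned in the post.

```python
import math

# Hypothetical toy index: URL -> (embedding, metadata). Real embeddings would
# come from a trained model; these 3-d vectors are invented for illustration.
INDEX = {
    "nanoco.example": ([0.9, 0.1, 0.0], {"kind": "startup_homepage"}),
    "top10-nanotech.example": ([0.7, 0.6, 0.1], {"kind": "listicle"}),
    "bakery.example": ([0.0, 0.1, 0.9], {"kind": "startup_homepage"}),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def embedding_search(query_vec, k):
    """Return the k URLs whose embeddings are closest to query_vec."""
    ranked = sorted(
        INDEX, key=lambda url: cosine(query_vec, INDEX[url][0]), reverse=True
    )
    return ranked[:k]


def verify(url):
    """Stand-in for per-result verification (assumed; the post uses LLMs)."""
    return INDEX[url][1]["kind"] == "startup_homepage"


def webset(query_vecs, k=2):
    # Run several embedding searches and merge candidates without duplicates.
    candidates = []
    for vec in query_vecs:
        for url in embedding_search(vec, k):
            if url not in candidates:
                candidates.append(url)
    # Keep only the candidates that pass verification.
    return [url for url in candidates if verify(url)]


# Two made-up query vectors standing in for variations of one search.
results = webset([[1.0, 0.0, 0.0], [0.8, 0.3, 0.0]])
print(results)
```

In this toy setup the listicle scores well on similarity and makes it into the candidate set, but the verification pass filters it out, which is the division of labor the post describes: embeddings for recall, a per-result check for precision.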