Show HN: I built a 2B-page search engine independent of Google/Bing
Hi HN,
For the last 18 months, I've been working solo on building a completely independent search engine from scratch. Today, I'm opening it up for beta testing and would love to get your feedback.
The project powers two public sites from the same 2-billion-page index:
- Searcha.Page: A session-aware search engine that uses a persistent browser key (not a cookie) for better context.
- Seek.Ninja: A 100% stateless, privacy-first version with no identifiers at all.
The entire stack is self-hosted on a single ~$4k bare-metal EPYC server in my laundry room (no cloud, no VC funding). The search pipeline is a hybrid model, using a traditional lexical index for the heavy lifting and lightweight LLMs for specific tasks like query expansion and re-ranking. It's an experiment in capital efficiency and digital sovereignty—proving you don't need Big Tech APIs to compete.
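For anyone who wants a concrete picture of that hybrid shape, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code running on the server: the toy corpus, the BM25 scoring, the function names, and the two stand-ins for the LLM steps (a fixed synonym table for query expansion, a term-coverage heuristic for re-ranking) are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; the real index is described as ~2 billion pages.
DOCS = {
    1: "self hosted search engine on bare metal server",
    2: "cloud hosting prices for small startups",
    3: "privacy first search without tracking identifiers",
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_search(query_terms, index, lengths, k1=1.2, b=0.75):
    """Lexical stage: score documents with BM25 (assumed; the post only says 'lexical')."""
    n = len(lengths)
    avg_len = sum(lengths.values()) / n
    scores = defaultdict(float)
    for term in set(query_terms):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + (n - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def expand_query(terms):
    """Stand-in for LLM query expansion: a fixed, hypothetical synonym table."""
    synonyms = {"private": ["privacy"], "selfhosted": ["self", "hosted"]}
    expanded = list(terms)
    for term in terms:
        expanded.extend(synonyms.get(term, []))
    return expanded


def rerank(query_terms, candidates, docs):
    """Stand-in for LLM re-ranking: prefer documents that cover more of the
    original query terms, breaking ties by the lexical score."""
    def coverage(doc_id):
        doc_terms = set(tokenize(docs[doc_id]))
        return sum(t in doc_terms for t in query_terms) / max(len(query_terms), 1)

    return sorted(candidates, key=lambda c: (coverage(c[0]), c[1]), reverse=True)


def search(query, docs=DOCS, top_k=10):
    """Expand the query, retrieve lexically, then re-rank only the top candidates."""
    index, lengths = build_index(docs)
    terms = tokenize(query)
    candidates = bm25_search(expand_query(terms), index, lengths)[:top_k]
    return [doc_id for doc_id, _ in rerank(terms, candidates, docs)]


if __name__ == "__main__":
    print(search("private search"))
```

The point of the split is cost: the cheap lexical stage touches the whole index, while the expensive model only sees the query and a short candidate list.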
I’m looking for feedback on search result relevance, speed, and the clarity of the privacy models. Please try it out and let me know what you think.
Links:
https://searcha.page
https://seek.ninja
Thanks,
Ryan