Ask HN: How do you search your personal data?
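I have personal notes, correspondence, code and documentation from nearly 20 years of work. These are spread across multiple (cloud) services, and searching across these fiefdoms has become impractical.
The problem goes like this: "Ah, I remember having a conversation with someone about [algorithm], then recording an important insight. Let's find that."
This isn't a problem an LLM solves. The blocker is that there is no way to run search code across all this plain text.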
Services:
* Email (Gmail, synced to my macOS disk with Apple Mail)
* Dropbox
* Notion
* Google Drive
* Obsidian
* GitHub
* Apple Notes
* Discord chats
* Trello
* My own blog
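If I had everything synced to my Mac's disk, maybe I could do a plaintext search there. However, Spotlight's indexing is always incomplete and often misses obvious files. My Dropbox is so large that I don't sync all of it locally.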
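A minimal sketch of what such a local plaintext search could look like, assuming each service has already been exported or synced into local folders as text files. The function name `search_plaintext` and the extension list are illustrative assumptions, not part of any existing tool, and the matching is a plain case-insensitive substring test rather than a real index.

```python
import os
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Hypothetical set of extensions treated as plain text; adjust as needed.
TEXT_EXTENSIONS = {".txt", ".md", ".html", ".json", ".csv", ".eml"}


def search_plaintext(
    roots: Iterable[str], query: str, extensions=TEXT_EXTENSIONS
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for every line containing query."""
    needle = query.casefold()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if Path(name).suffix.lower() not in extensions:
                    continue
                path = os.path.join(dirpath, name)
                try:
                    # errors="ignore" keeps going past bytes that are not valid UTF-8.
                    with open(path, encoding="utf-8", errors="ignore") as handle:
                        for number, line in enumerate(handle, start=1):
                            if needle in line.casefold():
                                yield path, number, line.strip()
                except OSError:
                    # Unreadable file (permissions, broken symlink): skip it.
                    continue
```

Calling `search_plaintext([os.path.expanduser("~/Exports")], "dijkstra")` would walk a hypothetical export folder and yield each matching line with its file path and line number. This only covers whatever is actually on disk, so a Dropbox that is not fully synced stays out of reach.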
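Some services I no longer use, like Evernote. When I archived that service, I exported everything and moved it into my Dropbox. So if I search Dropbox, it also searches my old Evernote notes. There's no way I could do this for all the services I actively use.
The way I search now is to guess which service the result is most likely in and search there. When I find no results, I search the next most likely service, ad nauseam.
For my own blog, I used to use Google's site search, but I recently discovered it was incomplete: https://bsky.app/profile/dustinfreeman.bsky.social/post/3m5l5tto6pk27
I could imagine a solution where some third-party service holds access keys to all my services. But, let's be real, that requires a huge amount of trust. Also, my access to all these services is 2FA'd with expiry, so I'd be continually re-upping auth to this third-party service. At that point, it makes sense to just keep searching the way I do now.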