Sophia NLU Engine – For AI Agent Developers – Understand Your Users
If you're into AI agents, you've probably found it's a struggle to figure out what your users are saying. You're essentially stuck either pinging an LLM like ChatGPT and asking for a JSON object, or using a bulky, complex Python implementation like NLTK, SpaCy, or Rasa.
The latest iteration of the open source Sophia NLU (natural language understanding) engine just dropped, with full details including an online demo at:
https://cicero.sh/sophia/
Developed in Rust, its key differentiator is its self-contained, lightweight nature. It has no external dependencies or API calls, processes about 20,000 words/sec, and ships with two vocabulary data stores: the base store is 79MB with 145k words, while the full vocabulary is 177MB with 914k words. That's a massive boost over the Python systems out there, which are multi-gigabyte installs and process at best 300 words/sec.
It has a built-in POS tagger, named entity recognition, a phrase interpreter, anaphora resolution, auto-correction of spelling typos, and a multi-hierarchical categorization system that lets you easily map clusters of words to actions. A friendly localhost RPC server lets you run it from any programming language; see the Implementation page for code examples.
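As a rough illustration of what talking to a localhost RPC server from another language looks like, here is a minimal Python sketch. The URL, port, method name, and payload shape below are assumptions for illustration only; the actual wire format is documented on the Implementation page.

```python
import json
from urllib import request

# Hypothetical endpoint -- check the Implementation page for the real
# host, port, and wire format.
SOPHIA_RPC_URL = "http://127.0.0.1:7865/rpc"

def build_request(method: str, text: str, req_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 style envelope (assumed format, for illustration)."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": {"text": text},
    }

def call_sophia(method: str, text: str) -> dict:
    """POST a request to the local Sophia server and decode the JSON reply."""
    payload = json.dumps(build_request(method, text)).encode("utf-8")
    req = request.Request(
        SOPHIA_RPC_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Requires a running Sophia RPC server; "pos_tag" is a guessed method name.
    print(call_sophia("pos_tag", "Visit the store tomorrow."))
```

The point is simply that any language with an HTTP client can drive the engine, with no Python runtime or multi-gigabyte model download involved.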
Unfortunately, there are still slight issues with the POS tagger due to a noun-heavy bias in the training data. It was trained on 229 million tokens using a 3-of-4 consensus score across four POS taggers, but the PyTorch-based taggers performed poorly. No matter; it's all easily fixable within a week. Details of the problem and solution are here if you're interested:
https://cicero.sh/forums/thread/sophia-nlu-engine-v1-0-released-000005#p6
An advanced contextual awareness upgrade is in the works and should hopefully be out within a few weeks. It will be a massive boost, allowing the engine to differentiate between, for example, "visit google.com", "visit Mark's idea", "visit the store", and "visit my parents". It will also bring a much more advanced hybrid phrase interpreter, and the categorization system will be flipped into vector scoring for better clustering and more granular filtering of words.
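To make the "visit" examples concrete, here is a deliberately naive sketch of the disambiguation problem the upgrade targets: deciding what kind of thing the object of "visit" is, so an agent can route it to a different action. This is a toy illustration only, not how Sophia implements it; the word lists and category names are made up.

```python
import re

# Toy lookup tables -- in a real NLU engine this knowledge would come from
# the vocabulary store and surrounding context, not hard-coded sets.
PEOPLE = {"mark", "my parents"}
PLACES = {"the store"}

def classify_visit_object(obj: str) -> str:
    """Guess what kind of thing is being 'visited' (toy heuristic)."""
    obj = obj.lower().strip()
    # Something ending in a dot + TLD-like suffix looks like a web address.
    if re.search(r"\.[a-z]{2,}$", obj):
        return "open_url"
    # Possessive phrasing ("Mark's idea") suggests a conceptual reference.
    if "'s " in obj:
        return "reference_idea"
    if obj in PEOPLE:
        return "social_visit"
    if obj in PLACES:
        return "physical_errand"
    return "unknown"

for phrase in ["google.com", "Mark's idea", "the store", "my parents"]:
    print(phrase, "->", classify_visit_object(phrase))
```

Hand-written rules like these fall over quickly, which is exactly why contextual awareness has to come from the engine itself rather than from the calling application.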
The NLU engine itself is free and open source; GitHub and crates.io links are available on the site. However, I had no choice but to adopt the typical dual-license model and also offer premium licenses, because life likes to have fun with me. I'm currently out of runway and won't get into it here. If interested, there's a quick 6-minute audio intro and backstory at:
https://youtu.be/bkpuo1EtElw
I need something to happen, as I only have an RTX 3050 for compute, which isn't enough to fix the POS tagger. So I'll make you a deal: the current premium price is about a third of what it will be once the contextual awareness upgrade is released.
Grab a copy now and get instant access to the binary app with SDK. The new vocabulary data store with the fixed POS tagger will be open sourced within a week, and the contextual awareness upgrade, a massive improvement, lands a few weeks after that, at which point the price will triple. Plus, you have my guarantee that I'll do everything in my power to make Sophia the de facto world-leading NLU engine.
If you're deploying AI agents of any kind, this is an excellent tool for your kit. Instead of pinging ChatGPT for JSON objects and getting unpredictable results, you get a nice, self-contained little package that resides on your own server, is blazingly fast, produces the same reliable, predictable results every time, keeps all data local and private to you, and carries no monthly API bills. It's a sweet deal.
Besides, it's for an excellent cause. You can read the full manifesto of the Cicero project in the "Origins and End Goals" post at:
https://cicero.sh/forums/thread/cicero-origins-and-end-goals-000004
If you made it this far, thanks for listening. Feel free to reach out directly at matt@cicero.sh; I'm happy to engage, get you on the phone if desired, and so on.
Full details on Sophia, including the open source download, are at: https://cicero.sh/sophia/