Show HN: Day 1 of trying to fit a chatbot into a QR code

1 | Author: kuberwastaken | 2 days ago
Image for day 1: https://i.imgur.com/bQ3Oxc5.png

After I tried to fit DOOM inside a QR code last time (https://news.ycombinator.com/item?id=43729683), I'm continuing this "series" by trying to get an actually decent chatbot into a QR code.

This is, of course, not as easy as the former. I could always cheat and make a rule-based ELIZA-style chatbot (which I actually dabbled with earlier), but I want to make something at least somewhat useful. I know quite little about how LLMs and Transformers fundamentally work, so this will also teach me a lot about AI (it will also be public and open source once it turns into something somewhat cool).

Here are our limitations: the largest standard QR code (Version 40) holds 2,953 bytes (~2.9 KB). This is very small: a Windows sound file of 1/15th of a second is 11 KB! Plus, we can't directly dump HTML/JS into the QR code; we need to compress it to BASE64 (or BigInt), which takes up 0.1-0.15 KB as well, so we have about 2.7 KB for the entire thing, yikes!

Here's what I did for day 1:

The first version (v0) was incredibly basic, a simple pattern-matching chatbot with predefined responses:

```
const V = "you,I,is,are,do,what,how,why,,...e".split(",");
const P = [
  [5,2,0,8],  // what is you like
  [5,4,0,8],  // what do you like....
  [0,8,15,9]  // you like me think
];
```

(v1) added better CSS (still light theme), topic memory, sentiment analysis and transition patterns, but all this pushed the file size to a bit over 4 KB.

(v2) was v1 with more compression; it lost features but shrank to 2.8 KB.

(v3) added a retro UI because it seemed fitting, plus ASCII art and simplified text formatting with newlines, but it was still extremely dumb.
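
A rough way to sanity-check these sizes against the QR limit is sketched below. It assumes the page is shipped as a Base64 `data:` URI, which the post does not confirm. `QR_V40_BYTES` is the 2,953-byte figure quoted above; `base64Length`, `dataUriLength`, `fitsInQr` and `maxPayloadBytes` are hypothetical helper names.

```javascript
// Sketch only: checks a payload against the Version 40 capacity from the post.
// The data: URI wrapper is an assumption, not the author's confirmed setup.
const QR_V40_BYTES = 2953;
const DATA_URI_PREFIX = "data:text/html;base64,";

// Base64 emits 4 output characters for every 3 input bytes (padding included).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Total characters needed to carry `byteCount` bytes inside a data: URI.
function dataUriLength(byteCount, prefix = DATA_URI_PREFIX) {
  return prefix.length + base64Length(byteCount);
}

// True if the encoded payload still fits in a single Version 40 code.
function fitsInQr(byteCount) {
  return dataUriLength(byteCount) <= QR_V40_BYTES;
}

// Largest raw payload that fits once the prefix and Base64 are accounted for.
function maxPayloadBytes(prefix = DATA_URI_PREFIX) {
  return Math.floor((QR_V40_BYTES - prefix.length) / 4) * 3;
}

console.log(maxPayloadBytes(), fitsInQr(2800));
```

Base64 on its own grows a payload by about a third, so how much room is really left depends on how well the page compresses before it is encoded.
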
(v4) and (v5) made more cuts to barely get it below the limit (2.85 KB).

So I changed the approach for (v6) and went for a trie data structure for response lookups:

```
const t={h:{e:{l:{l:{o:["Hello! How can I help you today?","Hi! What's on your mind?"]}}}}};
```

This allowed for prefix matching under our constraints, and there was no need for pattern matching.

(v7) was an attempt to optimise it, but it still ended up at around 3.3 KB: better than before but still not very "intelligent".

For (v8), I took a lot of time and switched to a very basic implementation of a two-layer neural network:

```
// c holds the layer sizes (vSize, eDim, hSize, oSize)
const network = {
  embeddings: new Float32Array(c.vSize * c.eDim),
  hidden: new Float32Array(c.eDim * c.hSize),
  output: new Float32Array(c.hSize * c.oSize),
  hiddenBias: new Float32Array(c.hSize),
  outputBias: new Float32Array(c.oSize)
};
```

This gives us a 582-char neural network that's 8-bit quantized but, as you would expect, this was huge, about 11 KB.

(v9) and (v10) were basically minifying this further, down to about 3.2 KB, not bad!

The last version I worked on today was (v10.5). I used word-level processing instead of character-level, with 4D vectors, template responses with context awareness, better state tracking and 8 output dimensions. I also added a repetition penalty (currently a little broken), but it is actually kind of good... 5.3 KB good.

For Day 2, I'm thinking:
1. Implement better context handling
2. Optimize the neural architecture further (maybe a tiny transformer?)
3. Maybe find a way to compress it even more?

Resources:
https://www.youtube.com/watch?v=aircAruvnKk
https://www.youtube.com/watch?v=zhxNI7V2IxM&t=275s
https://github.com/rasbt/LLMs-from-scratch
https://github.com/lionelmessi6410/Neural-Networks-from-Scratch
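
To make the (v6) prefix matching concrete, here is a minimal sketch of a lookup over the trie shown in the post. The `lookup` and `reply` helpers are hypothetical (the post only shows the data structure), and ignoring everything after the first matched prefix is an assumption.

```javascript
// Trie from the post: nested objects keyed by character, arrays hold responses.
const t = {h:{e:{l:{l:{o:["Hello! How can I help you today?","Hi! What's on your mind?"]}}}}};

// Hypothetical helper: walk the trie along the input and return the response
// list at the first leaf reached, or null if the input leaves the trie early.
function lookup(trie, input) {
  let node = trie;
  for (const ch of input.toLowerCase()) {
    if (Array.isArray(node)) break; // leaf reached, ignore the rest of the input
    if (!Object.prototype.hasOwnProperty.call(node, ch)) return null;
    node = node[ch];
  }
  return Array.isArray(node) ? node : null;
}

// `pick` defaults to the first response so the result is deterministic;
// a real bot would more likely pick one at random.
function reply(input, pick = (list) => list[0]) {
  const responses = lookup(t, input);
  return responses ? pick(responses) : null;
}

console.log(reply("Hello there"));
```

An input that only covers part of a stored key (such as "hel") returns `null` here, since no response list has been reached yet.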
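
The 8-bit quantization mentioned for (v8) could look roughly like the sketch below, which maps a `Float32Array` of weights to an `Int8Array` plus one scale factor. The post does not say which scheme was used; symmetric per-array scaling and the `quantize`/`dequantize` names are assumptions.

```javascript
// Sketch only: symmetric 8-bit quantization of one weight array.
// One scale per array is an assumption, not necessarily what the post used.
function quantize(weights) {
  let maxAbs = 0;
  for (const w of weights) maxAbs = Math.max(maxAbs, Math.abs(w));
  const scale = maxAbs / 127 || 1; // fall back to 1 when every weight is zero
  const q = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    q[i] = Math.round(weights[i] / scale); // always within [-127, 127]
  }
  return { q, scale };
}

// Recover approximate float weights from the quantized form.
function dequantize({ q, scale }) {
  const out = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) out[i] = q[i] * scale;
  return out;
}
```

Each weight then takes one byte instead of four, at the cost of a rounding error of roughly half a scale step at most.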