Show HN: Sheet music in smart glasses

10 points | by kevinlinxc | 2 months ago
Hi everyone, my name is Kevin Lin, and this is a Show HN for my sheet music smart glasses project. My video was on the front page on Friday: https://news.ycombinator.com/item?id=43876243, but dang said we should do a Show HN as well, so here goes!

I've wanted to put sheet music into smart glasses for a long time, but the perfect opportunity to execute came in mid-February, when Mentra (YC W25) tweeted about a smart glasses hackathon they were hosting - winners would get to take home a pair. I went, had a blast making a bunch of music-related apps with my teammate, and we won, so I got to take the glasses home, refine the project, and make a pretty cool video about it (https://www.youtube.com/watch?v=j36u2i7PKKE).

The glasses are Even Realities G1s. They look normal, but they have two microphones, a screen in each lens, and can even be made with a prescription. Every person I've met who tried them on was surprised at how good the display is; the video recordings unfortunately don't do them justice.

The software runs on AugmentOS, Mentra's smart glasses operating system, which works on various third-party smart glasses, including the G1s. All I had to do to make an app was write and run a TypeScript file using the AugmentOS SDK. This gives you voice transcription and raw audio as input, and text or bitmaps as output to the screens; everything else is completely abstracted away. Your glasses communicate with an AugmentOS app, and that app communicates with your TypeScript service.

The only hard part was creating a Python script to turn sheet music (MusicXML format) into small, optimized bitmaps to display on the screens. To start, the existing landscape of music-related Python libraries is poorly documented, and I ran into multiple never-before-seen error messages. Downscaling to the small size of the glasses screens also meant that stems and staff lines were disappearing, so I used morphological dilation to emphasize them without making the notes unintelligible. The final pipeline was: MusicXML -> music21 to render chunks of bars to PNG -> dilate with OpenCV -> downscale -> convert to bitmap with Pillow -> optimize bitmaps with ImageMagick. This is far from the best code I've ever written, but the LLMs' attempts at the whole task were abysmal, and my years of Python experience really got to shine here. The code is on GitHub: https://github.com/kevinlinxc/AugmentedChords.
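For the curious, here's a rough sketch of that pipeline (simplified, not my exact code; the real thing is in the repo above. The screen dimensions and ImageMagick flags are placeholders, and it assumes MuseScore is installed so music21 can render PNGs):

    # Sketch of the MusicXML -> bitmap pipeline (simplified; placeholder sizes/flags)
    import subprocess

    import cv2
    import numpy as np
    from music21 import converter
    from PIL import Image

    SCREEN_W, SCREEN_H = 576, 136  # placeholder resolution, not the G1s' real one

    score = converter.parse("song.musicxml")

    # Render a chunk of bars to PNG; music21 shells out to MuseScore for this.
    chunk = score.measures(1, 4)
    png_path = chunk.write("musicxml.png")

    img = cv2.imread(str(png_path), cv2.IMREAD_GRAYSCALE)

    # Dilate the inverted image so stems and staff lines survive downscaling,
    # then invert back to black-on-white.
    kernel = np.ones((3, 3), np.uint8)
    thickened = cv2.bitwise_not(cv2.dilate(cv2.bitwise_not(img), kernel, iterations=1))

    # Downscale to the screen size (a real version would preserve aspect ratio).
    small = cv2.resize(thickened, (SCREEN_W, SCREEN_H), interpolation=cv2.INTER_AREA)

    # Convert to a 1-bit bitmap with Pillow.
    Image.fromarray(small).convert("1").save("chunk.bmp")

    # Final cleanup with ImageMagick (the exact flags I used differ).
    subprocess.run(["magick", "chunk.bmp", "-strip", "-monochrome", "chunk_opt.bmp"], check=True)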
Putting it together, my TypeScript service serves these bitmaps locally when requested. I put together a UI where I can navigate menus and sheet music with voice commands (e.g. show catalog, next, select, start, exit, pause), and then I connected foot pedals to my laptop. Because of bitmap sending latency (~3 s right now, but future glasses will do better), using foot pedals to turn the bars while playing wasn't viable, so I instead had one pedal toggle autoscrolling and the other two speed the scrolling up or pause it temporarily.

After lots of adjustments, I was able to play a full song using just the glasses! It took many takes, and there's definitely lots of room for improvement. For example:

- Bitmap sending is pretty slow, which is why using the foot pedals to turn bars wasn't viable.
- The resolution is pretty small; I would love to fit more bars on screen at once so I can flip less frequently.
- Since foot pedals aren't portable, it would be cool to have a mode where the audio dictates when the sheet music changes. I tried implementing that with FFT (a simplified sketch of the idea is at the end of this post), but it was often wrong and more effort is needed. Head tilt controls would be cool too, because full manual control is a hard requirement for practicing.

All of these pain points are being targeted by Mentra and other companies competing in the space, so I'm super excited to see the next generation! Also, feel free to ask me anything!
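P.S. For anyone curious about the audio-following idea: my FFT attempt was roughly along these lines (a simplified reconstruction, not my exact code; the sample rate is a placeholder). It just estimates the dominant pitch in a short mono frame and checks it against the next expected note, which is why harmonics, octave errors, and chords broke it so often:

    # Simplified sketch of the FFT-based "follow the player" idea (not my exact code)
    import numpy as np

    SAMPLE_RATE = 16000  # placeholder; the SDK provides the raw mic audio


    def dominant_frequency(frame: np.ndarray, sample_rate: int = SAMPLE_RATE) -> float:
        """Return the strongest frequency (Hz) in a mono audio frame."""
        windowed = frame * np.hanning(len(frame))
        spectrum = np.abs(np.fft.rfft(windowed))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
        return float(freqs[np.argmax(spectrum)])


    def matches_expected_note(frame: np.ndarray, expected_hz: float, tolerance_hz: float = 10.0) -> bool:
        """True if the frame's dominant pitch is close to the next note expected in the score."""
        return abs(dominant_frequency(frame) - expected_hz) < tolerance_hz


    if __name__ == "__main__":
        # Synthesize one frame of A4 (440 Hz) to show the idea.
        t = np.arange(2048) / SAMPLE_RATE
        frame = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
        print(dominant_frequency(frame))            # ~440 Hz
        print(matches_expected_note(frame, 440.0))  # True

A real version would need proper onset detection and score following rather than a single peak per frame, which is the "more effort" I mentioned above.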