Hacker News (Chinese edition)

I’ve been using wandb quite a bit and looked into neptune ai and some open source alternatives, but I’ve always felt that collaboration and version control (e.g. associating code snapshots with training runs, etc.) is clunky. I was also thinking it’d be nice to have some kind of monitoring on my longer runs to alert me on certain criteria, or even be able to stop or restart a run with hyperparam modifications remotely (take potentially agentic actions), like from my phone.

I was curious what all of your experiences have been with these (and similar) AI developer platforms / observability layers and what you’ve found lacking or gripes you have with the existing solutions (if anything). I’ve found the research process extremely painful and was wondering if this was just me.
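
The "alert me on certain criteria" idea can be sketched as rule functions evaluated over a run's logged metric history. This is a minimal illustrative sketch, not the wandb or neptune ai API. `AlertRule`, `loss_is_nan`, `plateaued`, and `evaluate` are hypothetical names, and sending the alert (e.g. a push notification to a phone) or stopping the run is left out.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AlertRule:
    # Hypothetical rule type: a name plus a predicate over the metric history.
    name: str
    check: Callable[[List[float]], bool]

def loss_is_nan(history: List[float]) -> bool:
    # Fires when the most recent logged loss is NaN (a diverged run).
    return bool(history) and math.isnan(history[-1])

def plateaued(patience: int, min_delta: float = 0.0) -> Callable[[List[float]], bool]:
    # Fires when the best loss in the last `patience` steps has not improved
    # on the best loss seen before them by at least `min_delta`.
    def check(history: List[float]) -> bool:
        if len(history) <= patience:
            return False
        best_before = min(history[:-patience])
        recent_best = min(history[-patience:])
        return recent_best > best_before - min_delta
    return check

def evaluate(history: List[float], rules: List[AlertRule]) -> List[str]:
    # Returns the names of all rules that currently fire.
    return [rule.name for rule in rules if rule.check(history)]

rules = [AlertRule("nan_loss", loss_is_nan), AlertRule("plateau", plateaued(patience=3))]
print(evaluate([1.0, 0.9, 0.8, 0.85, 0.82, 0.81], rules))
```

In a real setup the training loop (or a separate watcher process) would call `evaluate` after each logging step and forward any fired rule names to whatever notification channel is available.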

Experiences with research tooling