OSUniverse: Building a Better OS World
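Hey all,
We are happy to release a new benchmark for computer use. We didn't set out to build a benchmark, but we found OSWorld in its current state very challenging to work with, and numerous tests were faulty.
OSUniverse aims to be dead simple to use: it only requires Docker and runs with a single command. It offers test levels that increase in complexity and are easy to extend.
We have benchmarked all the top agents, and we will continue to update their performance as new GUI agents are released.
Enjoy!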