Launch HN: Transload (YC P26) – Measure freight using CCTV

7 | Author: nils_spatial | 17 days ago
Hi HN, we’re Julius, Jago, and Nils, and we’re building transload (transload.io).

transload helps LTL (less-than-truckload) trucking companies measure freight dimensions using the security cameras already installed in their terminals. Instead of sending shipments through a dedicated dimensioning station, we measure them automatically as they move through the normal dock workflow.

We’ve put together a small HN-specific demo site here: https://hn.transload.io/

In LTL trucking, dimensions matter because they affect pricing, freight classification, and trailer utilization. If a shipment is larger than the shipper reported, the carrier may undercharge for it while still giving up the same amount of trailer space. The obvious fix is to measure every shipment, but that is surprisingly hard in a busy freight terminal. Dedicated dimensioning systems work for freight that passes through them, but they can add forklift travel, create dock congestion, and change the normal flow of work. In practice, many terminals only measure a sample of their shipments.

Jago grew up close to this industry through his family’s LTL trucking and cross-docking business. We did not start out building freight dimensioning. Our first idea was an AI system for optimizing forklift routes inside cross-dock terminals. After spending time with customers and talking to more than 50 trucking companies, we realized that forklift routing was not the pain people kept bringing up. Freight dimensions were.

At the same time, we saw that spatial AI was advancing quickly. Monocular metric depth estimation has become dramatically better, making it possible to recover accurate 3D structure from ordinary camera footage without expensive LiDAR sensors.
MapAnything (https://github.com/facebookresearch/map-anything) and MoGe (https://github.com/microsoft/moge) are two examples.

Freight terminals also have helpful structure: fixed cameras, repeated workflows, barcode scan timestamps, and known layouts. Nearly every warehouse already has CCTV. That led us to a simple question: what if we could measure freight automatically using the existing security cameras, entirely in the background? That would allow carriers to measure every shipment without changing the dock workflow.

Our system has two main steps: connect a barcode scan to the right object in the video, then estimate that object’s dimensions in real-world units.

Dock workers already scan freight as part of the normal workflow. Each scan gives us a timestamp and a handling-unit ID. Around that timestamp, we analyze the video to infer which worker scanned and which shipment they scanned. We expected VLMs to handle this; they turned out to be far too unreliable. Instead, we train our own model that reasons in 3D over cues like gaze, body orientation, and movement.

That association step is critical. A frame can contain dozens of pallets, several workers, forklifts, and partially hidden freight. If we attach the scan to the wrong object, the measurement is useless.

Once we know the target shipment, we segment it and estimate a metric 3D bounding box from the monocular camera view. After the box is fitted, the dimensions are straightforward: length, width, height, and volume come directly from it.

The hard part is precisely fitting that bounding box from one ordinary security camera. A single 2D image does not directly tell you object shape or scale, and many different 3D boxes can explain similar-looking image evidence.
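
To make the scale ambiguity concrete, here is a minimal Python sketch using an idealized pinhole camera. It is only an illustration, not our production code: the focal length, the box sizes, and the helper names (`project`, `box_corners`, `scaled`) are all made up for the example. It shows that a box and a copy twice as large and twice as far away land on exactly the same pixels.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]

def project(point: Point3, focal_px: float = 1000.0) -> Tuple[float, float]:
    """Ideal pinhole projection of a camera-frame point (z points forward)."""
    x, y, z = point
    return (focal_px * x / z, focal_px * y / z)

def box_corners(center: Point3, size: Point3) -> List[Point3]:
    """Eight corners of an axis-aligned box from its center and (w, h, d)."""
    cx, cy, cz = center
    w, h, d = size
    return [
        (cx + sx * w / 2, cy + sy * h / 2, cz + sz * d / 2)
        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)
    ]

def scaled(center: Point3, size: Point3, k: float) -> Tuple[Point3, Point3]:
    """Scale both the box position and the box size by the same factor k."""
    return tuple(k * c for c in center), tuple(k * s for s in size)

# Hypothetical pallet about 4 m in front of the camera (metres).
near_center, near_size = (0.5, 0.2, 4.0), (1.2, 1.0, 0.8)
# The same box, twice as big and twice as far away.
far_center, far_size = scaled(near_center, near_size, 2.0)

near_px = [project(p) for p in box_corners(near_center, near_size)]
far_px = [project(p) for p in box_corners(far_center, far_size)]

same_pixels = all(
    abs(a[0] - b[0]) < 1e-9 and abs(a[1] - b[1]) < 1e-9
    for a, b in zip(near_px, far_px)
)
print(same_pixels)  # True: the image alone cannot tell the two boxes apart
```

The far box has eight times the volume of the near one, yet the image evidence is identical, which is why extra constraints are needed to pin down metric size.
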
We use the object mask, visible edges, floor contact, camera geometry, and constraints from the terminal to find the 3D box that best matches the scene.

We are currently working with several LTL carriers. For one customer, roughly 10% of checked shipments had dimension errors. The first use case is revenue recovery: identify under-dimensioned shipments, attach visual evidence, and help carriers correct the billing or classification. Longer term, the same data can help carriers understand trailer utilization better.

LTL freight is an odd place to be doing 3D computer vision, and we learn something new every week. If you’ve worked on monocular reconstruction, 3D object detection, warehouse perception, or messy real-world CV, we’d love your take. Questions about freight, LTL terminals, or the technical approach are very welcome too.
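
For readers who want the last step spelled out, here is a rough Python sketch of reading dimensions off a fitted box and flagging an under-dimensioned shipment. It rests on assumptions chosen for illustration only: the box is represented by three edge vectors in metres, the 5% tolerance is arbitrary, and the function names are hypothetical. It is not a description of our actual pipeline.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def norm(v: Vec3) -> float:
    return math.sqrt(sum(c * c for c in v))

def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))

def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def box_dimensions(edge_a: Vec3, edge_b: Vec3, edge_up: Vec3):
    """Return (length, width, height, volume) in metres and cubic metres.

    edge_a and edge_b span the footprint, edge_up is the vertical edge.
    """
    length, width = sorted((norm(edge_a), norm(edge_b)), reverse=True)
    height = norm(edge_up)
    # Scalar triple product gives the volume of the box spanned by the edges.
    volume = abs(dot(edge_a, cross(edge_b, edge_up)))
    return length, width, height, volume

def is_under_dimensioned(declared_m3: float, measured_m3: float,
                         tolerance: float = 0.05) -> bool:
    """Flag shipments whose measured volume exceeds the declared one by more than the tolerance."""
    return measured_m3 > declared_m3 * (1.0 + tolerance)

# Hypothetical fitted box: a 1.2 m x 0.8 m footprint rotated 30 degrees on the floor, 1.5 m tall.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
length, width, height, volume = box_dimensions(
    (1.2 * c, 1.2 * s, 0.0), (-0.8 * s, 0.8 * c, 0.0), (0.0, 0.0, 1.5)
)

declared_volume = 1.2 * 0.8 * 1.0  # shipper declared a height of 1.0 m
flagged = is_under_dimensioned(declared_volume, volume)
print(round(length, 3), round(width, 3), round(height, 3), round(volume, 3), flagged)
```

Because the dimensions come from edge lengths rather than axis-aligned extents, the rotation of the pallet on the dock floor does not inflate the result.
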