


The Problem: LLMs are terrible at understanding eCommerce sites. They:

- Hallucinate prices/specs from messy HTML
- Waste tokens on UI boilerplate (headers, popups, ads)
- Struggle with real-time inventory/pricing updates

Our solution: A fork of Answer.AI’s llms.txt that introduces site-llms.xml, an XML sitemap protocol for product data.

Stores expose:

- /site-llms.xml: Index of all product URLs
- /product/123/llms.txt: Clean Markdown with specs/pricing (example in repo)

Benefits:

- AI gets structured data instead of scraping
- Stores control what’s exposed (like robots.txt)
- Scales to millions of products (sitemap indexes supported)

We’re open-sourcing this under CC BY-SA (same as the sitemap protocol). Would love HN’s thoughts:

- Is this the right abstraction?
- Could it work for non-eCommerce sites?

Repo: github.com/Lumigo-AI/site-llms (stars welcome!)
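
To make the index idea concrete, here is a rough sketch of how a client might read /site-llms.xml and collect the per-product llms.txt URLs. It assumes the file reuses the standard sitemaps.org `<urlset>`/`<url>`/`<loc>` layout and namespace, which the post implies but does not spell out; the real schema in the repo may differ. The sample document, `example.com`, the product IDs, and the helper name `list_product_docs` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: site-llms.xml uses the sitemaps.org namespace and element names.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical index document; the host and product IDs are invented.
SAMPLE_SITE_LLMS_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/product/123/llms.txt</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/product/456/llms.txt</loc>
  </url>
</urlset>
"""


def list_product_docs(xml_text):
    """Return (url, lastmod) pairs from a sitemap-style index.

    lastmod is None when the entry does not carry one.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        if not loc:
            continue  # skip malformed entries with no location
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        docs.append((loc.strip(), lastmod.strip() if lastmod else None))
    return docs


if __name__ == "__main__":
    for loc, lastmod in list_product_docs(SAMPLE_SITE_LLMS_XML):
        print(loc, lastmod)
```

A crawler could then fetch each listed llms.txt and use `lastmod`, where present, to decide whether a cached copy of the product's Markdown is still fresh.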

A scalable standard for clean eCommerce data (a fork of llms.txt)