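Ask HN: Is anyone using image models for production-grade image editing? How do you do it?
Hey HN, I'm building an app where users upload "real life" clothing photos (e.g., a wrinkly shirt folded on the floor). The goal is to transform that single photo into a clean, ecommerce-style image of the garment.
One key UX requirement: the output needs to be a PNG with transparency (alpha) so we can consistently crop/composite the garment into an on-rails UI (cards, outfit layouts, etc.). Think "subject cutout that drops cleanly into templates."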
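To make the "drops cleanly into templates" requirement concrete, here is a minimal sketch of the placement math, assuming the alpha channel is available as plain rows of 0..255 integers. The names `alpha_bbox` and `fit_to_card`, the `threshold` default, and the centering policy are all hypothetical and only illustrate the idea; they are not what the app actually runs.

```python
from typing import List, Optional, Tuple

# (left, top, right, bottom), with right/bottom exclusive
Box = Tuple[int, int, int, int]


def alpha_bbox(alpha: List[List[int]], threshold: int = 8) -> Optional[Box]:
    """Bounding box of pixels whose alpha exceeds `threshold`, or None if there are none.

    The threshold (an assumed value) ignores faint residue such as leftover shadow.
    """
    rows = [y for y, row in enumerate(alpha) if any(a > threshold for a in row)]
    if not rows:
        return None
    width = len(alpha[0])
    cols = [x for x in range(width) if any(row[x] > threshold for row in alpha)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def fit_to_card(box: Box, card_w: int, card_h: int, padding: int = 0) -> Tuple[float, int, int]:
    """Uniform scale plus top-left offset that centers the cropped cutout in the card."""
    w, h = box[2] - box[0], box[3] - box[1]
    # Uniform scale so the garment keeps its aspect ratio inside the padded area.
    scale = min((card_w - 2 * padding) / w, (card_h - 2 * padding) / h)
    off_x = round((card_w - w * scale) / 2)
    off_y = round((card_h - h * scale) / 2)
    return scale, off_x, off_y
```

The point of cropping to the alpha bounding box first is that every garment then lands at the same relative size and position in a card, regardless of how much empty space the upstream model left around it.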
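My current pipeline looks like:
1. User-uploaded photo (messy background, weird angles)
2. The upload is matched to a "query" image (style target) plus a prompt for the generation/edit step (currently using Nano Banana)
3. Background removal model to get transparency and save as RGBA PNG
This works, but it feels hacky and occasionally introduces edge artifacts. Also, the generation model sometimes invents shadows or background cues that confuse the background removal step. It feels like the two steps are fighting each other.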
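One rough way to put a number on "the two steps are fighting" is to measure how much of the visible cutout is only partially opaque: a clean cutout has a thin anti-aliased rim, while leftover shadows and halos tend to show up as large semi-transparent areas. The sketch below is an assumed heuristic, not an established metric; `soft_alpha_fraction`, `looks_clean`, and the `max_soft` default are hypothetical and would need tuning on real outputs.

```python
from typing import List

# 8-bit alpha channel as rows of ints in 0..255
Alpha = List[List[int]]


def soft_alpha_fraction(alpha: Alpha) -> float:
    """Share of visible (alpha > 0) pixels that are not fully opaque."""
    visible = [a for row in alpha for a in row if a > 0]
    if not visible:
        return 0.0
    soft = sum(1 for a in visible if a < 255)
    return soft / len(visible)


def looks_clean(alpha: Alpha, max_soft: float = 0.25) -> bool:
    """Flag cutouts whose semi-transparent share exceeds an assumed limit."""
    return soft_alpha_fraction(alpha) <= max_soft
```

A check like this would not fix anything by itself, but it could route suspicious outputs to a retry or a manual review instead of letting them reach the UI.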
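I'm trying to understand what "good" looks like in production for this kind of workflow:
Are people still doing gen/edit followed by a separate background removal step as the standard?
Are any of you using alpha-native generation (RGBA outputs) in production? If so, what's the stack/workflow?
If you've done "messy UGC photo → catalog asset" specifically: what broke most often, and what fixed it?
I'm not looking for vendor pitches; I mostly want the practical patterns people are using (open source workflows, model classes, ComfyUI/SD pipelines, API-based stacks, etc.). Happy to share more details if helpful.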