HackerNews Chinese Edition

ByteDance quietly shipped Seedance 2.0. The interesting part isn't the usual text-to-video upgrade; it's the reference/conditioning system.

What's different from the typical text-to-video (T2V) model:

- Accepts 4 input modalities simultaneously: text, images (up to 9), video clips (up to 3, ≤15s total), and audio (up to 3, ≤15s total). Mixed inputs are capped at 12 files.
- Reference-driven generation: you can use an image to lock composition/character appearance, a video clip to specify camera movement and motion dynamics, and an audio track to drive rhythm and tempo. Outputs include generated SFX/BGM.
- The key claim is "audio-driven video" rather than "video with audio attached", meaning motion is actually synced to the beat structure of the audio input, not just overlaid.
- Supports video continuation/extension with shot-to-shot coherence, plus editing operations (character swap, segment insertion/removal) on existing clips.
- Output: 4–15s, selectable. Comes with built-in sound.

Why this matters technically:

Most current video models treat audio as a post-processing step. Seedance 2.0 appears to condition the diffusion process on audio features directly, which would explain the beat-sync behavior. The multi-reference @ tagging system (@image1 for composition, @video1 for motion, @audio1 for rhythm) suggests a mixture-of-conditions architecture rather than simple concatenation.

Haven't seen an official announcement yet. Docs are up on Dreamina (ByteDance's creative platform). Curious if anyone has more details on the architecture.

If you want to test it after launch, here are a few good platforms depending on your use case:

- For developers (API): https://www.atlascloud.ai/
- For creators: Higgsfield, ImagenArt

More info on Seedance 2.0: https://www.reddit.com/r/SoraAi/comments/1qxdv5u/seedance_20_teaser_better_than_sora_2_true/

Subreddit for Seedance 2.0 discussion: https://www.reddit.com/r/Seedance_AI
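To make the reported input limits concrete, here is a minimal client-side validation sketch. It assumes only the numbers quoted in the post (9 images; 3 video clips and 3 audio tracks with ≤15s total each; a 12-file cap; 4–15s output) and a prompt that points at uploads with 1-based @image1/@video1/@audio1-style tags. `GenerationRequest`, `validate`, and all constants are hypothetical names. This is not Dreamina's or any provider's actual API, and the real service may count or enforce these limits differently.

```python
import re
from dataclasses import dataclass, field

# Limits as reported in the post (unofficial); constant names are hypothetical.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3
MAX_TOTAL_FILES = 12
MAX_VIDEO_SECONDS = 15.0
MAX_AUDIO_SECONDS = 15.0
MIN_OUTPUT_SECONDS, MAX_OUTPUT_SECONDS = 4, 15

# Matches reference tags such as @image1, @video2, @audio3 inside the prompt.
TAG_PATTERN = re.compile(r"@(image|video|audio)(\d+)")


@dataclass
class GenerationRequest:
    """Hypothetical request shape, not a real Dreamina or provider API."""
    prompt: str
    images: list[str] = field(default_factory=list)           # image file names
    video_seconds: list[float] = field(default_factory=list)  # length of each video clip
    audio_seconds: list[float] = field(default_factory=list)  # length of each audio track
    output_seconds: int = 5


def validate(req: GenerationRequest) -> list[str]:
    """Return human-readable problems; an empty list means the request is within limits."""
    errors = []
    if len(req.images) > MAX_IMAGES:
        errors.append(f"too many images: {len(req.images)} > {MAX_IMAGES}")
    if len(req.video_seconds) > MAX_VIDEOS:
        errors.append(f"too many video clips: {len(req.video_seconds)} > {MAX_VIDEOS}")
    if sum(req.video_seconds) > MAX_VIDEO_SECONDS:
        errors.append(f"video references total {sum(req.video_seconds):.1f}s > {MAX_VIDEO_SECONDS:.0f}s")
    if len(req.audio_seconds) > MAX_AUDIO:
        errors.append(f"too many audio tracks: {len(req.audio_seconds)} > {MAX_AUDIO}")
    if sum(req.audio_seconds) > MAX_AUDIO_SECONDS:
        errors.append(f"audio references total {sum(req.audio_seconds):.1f}s > {MAX_AUDIO_SECONDS:.0f}s")
    total_files = len(req.images) + len(req.video_seconds) + len(req.audio_seconds)
    if total_files > MAX_TOTAL_FILES:
        errors.append(f"too many files: {total_files} > {MAX_TOTAL_FILES}")
    if not MIN_OUTPUT_SECONDS <= req.output_seconds <= MAX_OUTPUT_SECONDS:
        errors.append(f"output length {req.output_seconds}s not in {MIN_OUTPUT_SECONDS}-{MAX_OUTPUT_SECONDS}s")
    # Every @tag must point at an upload that actually exists (tags assumed 1-based).
    counts = {"image": len(req.images), "video": len(req.video_seconds), "audio": len(req.audio_seconds)}
    for kind, index in TAG_PATTERN.findall(req.prompt):
        if not 1 <= int(index) <= counts[kind]:
            errors.append(f"@{kind}{index} has no matching upload")
    return errors


req = GenerationRequest(
    prompt="Keep the character from @image1, copy the camera move in @video1, cut on the beats of @audio1",
    images=["hero.png"],
    video_seconds=[6.0],
    audio_seconds=[12.0],
    output_seconds=10,
)
print(validate(req))  # []
```

Note that the per-type maximums add up to 15 files, so the 12-file cap is a separate constraint and gets its own check.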
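One way to picture "audio-driven video" as opposed to "video with audio attached" is to turn the beat structure into a per-frame signal that a generator could condition on while it produces frames, instead of muxing the soundtrack in afterwards. The sketch below assumes the beat timestamps already come from some external beat tracker and that output runs at a hypothetical 24 fps. `beat_envelope` and its exponential decay are illustrative choices, not anything documented for Seedance 2.0.

```python
import math


def beat_envelope(beat_times: list[float], duration_s: float, fps: int = 24, decay_s: float = 0.25) -> list[float]:
    """Per-frame signal that is 1.0 on a beat and decays exponentially until the next one.

    beat_times are assumed to be beat onsets in seconds from an external beat tracker.
    Frames before the first beat get 0.0.
    """
    n_frames = round(duration_s * fps)
    envelope = []
    for frame in range(n_frames):
        t = frame / fps
        # Time elapsed since the most recent beat at or before this frame.
        elapsed = [t - b for b in beat_times if b <= t]
        envelope.append(math.exp(-min(elapsed) / decay_s) if elapsed else 0.0)
    return envelope


# Beats every 0.5 s, as a 120 BPM track would have; 4 s of output at 24 fps.
env = beat_envelope([0.5 * k for k in range(1, 8)], duration_s=4.0)
beat_frames = [i for i, v in enumerate(env) if v == 1.0]
print(beat_frames)  # [12, 24, 36, 48, 60, 72, 84]
```

A model that receives a signal like this (or richer learned audio features) during generation can place motion accents on the peak frames; a pipeline that adds audio only after the frames are fixed has no such signal to steer by.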
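The mixture-of-conditions vs. concatenation distinction can be illustrated with a toy single-query cross-attention in pure Python. In the concatenation variant every reference token competes in one softmax, so an audio token can pull on the result regardless of its intended role. In the routed variant each tagged role (composition, motion, rhythm) gets its own attention branch and a per-role gate. Everything here is a hypothetical illustration (2-d vectors, tokens doubling as keys and values, hand-set gates instead of learned ones); nothing is publicly known about Seedance 2.0's actual architecture.

```python
import math

Vector = list[float]


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, tokens: list[Vector]) -> Vector:
    """Scaled dot-product attention of one query over tokens (tokens used as keys and values)."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, tok)) / scale for tok in tokens])
    return [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(len(query))]


def concat_conditioning(query: Vector, refs: dict[str, list[Vector]]) -> Vector:
    """Concatenation: all reference tokens share a single softmax; roles are not separated."""
    all_tokens = [tok for tokens in refs.values() for tok in tokens]
    return attend(query, all_tokens)


def routed_conditioning(query: Vector, refs: dict[str, list[Vector]], gates: dict[str, float]) -> Vector:
    """Mixture of conditions: one attention branch per role, summed with per-role gates."""
    out = [0.0] * len(query)
    for role, tokens in refs.items():
        context = attend(query, tokens)
        gate = gates.get(role, 0.0)
        out = [o + gate * c for o, c in zip(out, context)]
    return out


# Hypothetical 2-d "embeddings" for @image1 (composition), @video1 (motion), @audio1 (rhythm).
refs = {
    "composition": [[1.0, 0.0], [0.0, 1.0]],
    "motion": [[0.5, 0.5]],
    "rhythm": [[2.0, 0.0]],
}
latent_query = [1.0, 0.0]
gates = {"composition": 1.0, "motion": 1.0, "rhythm": 0.0}

print(concat_conditioning(latent_query, refs))
print(routed_conditioning(latent_query, refs, gates))
```

With the rhythm gate at 0, swapping the audio tokens leaves the routed output unchanged but still shifts the concatenated one. That per-reference control is the kind of behavior the @ tags hint at; plain concatenation offers no equivalent knob.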

Seedance 2.0 preview: the best video model of 2026, surpassing Sora 2