Seedance 2.0 preview: Best video model of 2026, outperforming Sora 2

1 point by Alisaqqt, 3 months ago
ByteDance quietly shipped Seedance 2.0. The interesting part isn't the usual text-to-video upgrade; it's the reference/conditioning system.

What's different from the typical T2V model:

- Accepts 4 input modalities simultaneously: text, images (up to 9), video clips (up to 3, ≤15s total), and audio (up to 3, ≤15s total). Mixed input cap is 12 files.
- Reference-driven generation: you can use an image to lock composition/character appearance, a video clip to specify camera movement and motion dynamics, and an audio track to drive rhythm and tempo. Outputs include generated SFX/BGM.
- The key claim is "audio-driven video" rather than "video with audio attached", meaning motion is actually synced to the audio input's beat structure, not just overlaid.
- Supports video continuation/extension with shot-to-shot coherence, and editing operations (character swap, segment insertion/removal) on existing clips.
- Output: 4–15s, selectable. Comes with built-in sound.

Why this matters technically:

Most current video models treat audio as a post-processing step. Seedance 2.0 appears to condition the diffusion process on audio features directly, which would explain the beat-sync behavior. The multi-reference @ tagging system (@image1 for composition, @video1 for motion, @audio1 for rhythm) suggests a mixture-of-conditions architecture rather than simple concatenation.

Haven't seen an official announcement yet. Docs are up on Dreamina (ByteDance's creative platform). Curious if anyone has more details on the architecture.

If you want to test it after launch, here are a few good platforms depending on your use case:

- For developers (API): https://www.atlascloud.ai/
- For creators: Higgsfield, ImagenArt

More info on Seedance 2.0: https://www.reddit.com/r/SoraAi/comments/1qxdv5u/seedance_20_teaser_better_than_sora_2_true/

Subreddit for Seedance 2.0 discussion: https://www.reddit.com/r/Seedance_AI
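
For anyone who wants to sanity-check a request before uploading, here is a rough Python sketch of the input limits and @ tags described in the post. It is not the Dreamina or Seedance API: `Reference`, `check_request`, and the constant names are all made up for illustration, the numbers are just the ones quoted in the post, and I'm assuming tags are numbered from 1 per modality (as `@image1` suggests) and that the 12-file cap counts every uploaded reference.

```python
import re
from dataclasses import dataclass

# Limits as quoted in the post; the real service may enforce them differently.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3
MAX_CLIP_SECONDS = 15.0    # total duration per modality (video, audio)
MAX_FILES = 12             # assumed to count every uploaded reference
OUTPUT_SECONDS = (4, 15)   # selectable output length

# Matches tags such as @image1, @video2, @audio3.
TAG_RE = re.compile(r"@(image|video|audio)(\d+)")


@dataclass
class Reference:
    kind: str             # "image", "video", or "audio"
    seconds: float = 0.0  # clip duration; ignored for images


def check_request(prompt, refs, output_seconds):
    """Return a list of problems; an empty list means the request fits the quoted limits."""
    problems = []
    by_kind = {"image": [], "video": [], "audio": []}
    for ref in refs:
        if ref.kind not in by_kind:
            problems.append(f"unknown reference kind: {ref.kind}")
            continue
        by_kind[ref.kind].append(ref)

    if len(by_kind["image"]) > MAX_IMAGES:
        problems.append(f"too many image files (max {MAX_IMAGES})")
    for kind, cap in (("video", MAX_VIDEOS), ("audio", MAX_AUDIO)):
        clips = by_kind[kind]
        if len(clips) > cap:
            problems.append(f"too many {kind} files (max {cap})")
        if sum(c.seconds for c in clips) > MAX_CLIP_SECONDS:
            problems.append(f"{kind} total duration exceeds {MAX_CLIP_SECONDS:g}s")
    if len(refs) > MAX_FILES:
        problems.append(f"too many files overall (max {MAX_FILES})")

    lo, hi = OUTPUT_SECONDS
    if not lo <= output_seconds <= hi:
        problems.append(f"output length must be {lo}-{hi}s")

    # Every @tag in the prompt should point at an uploaded reference (1-based, assumed).
    for kind, index in TAG_RE.findall(prompt):
        if not 1 <= int(index) <= len(by_kind[kind]):
            problems.append(f"@{kind}{index} has no matching upload")
    return problems


refs = [Reference("image"), Reference("video", 6.0), Reference("audio", 12.0)]
prompt = "Keep the framing of @image1, follow the camera move in @video1, cut on the beat of @audio1"
print(check_request(prompt, refs, output_seconds=10))  # []
```

Returning a list of problems rather than raising on the first one means a single pass reports everything that would need fixing, which seems handy when juggling up to 12 reference files.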