Tom Chea


Today, we're officially launching Wan2.5-Preview! It's set to reshape the future of visual generation with a new architecture and powerful features.
• Architectural Features: Native Multimodality, Deep Alignment
∘ Native Multimodal Architecture: Adopts a new, unified framework for both understanding and generation, flexibly supporting the input and output of text, images, video, and audio.
∘ Joint Multimodal Training: Achieves stronger cross-modal alignment by training jointly on text, audio, and visual data. This is key to enabling audio-visual sync and greatly improved instruction following.
∘ Human Preference Alignment: Applies Reinforcement Learning from Human Feedback (RLHF) to continuously align with human preferences, enhancing image quality and video dynamics.
• Video Capabilities: A/V Synchronization, Cinematic Quality
∘ Synchronized A/V Generation: Natively supports high-fidelity, high-consistency video generation with synchronized audio, including multi-person vocals, sound effects, and background music (BGM).
∘ Controllable Multimodal Input: Supports text, images, and audio as input sources for limitless creativity.
∘ Cinematic Aesthetics: Delivers strong motion dynamics and structural stability through an upgraded cinematic control system, generating 10-second 1080p HD videos of cinematic quality.
• Image Capabilities: Creative & Precise Control
∘ Advanced Image Generation: Greatly improved instruction following supports photorealistic images, diverse artistic styles, creative typography, and professional-grade charts.
∘ Image Editing: Supports conversational, instruction-based image editing with pixel-level precision for tasks such as multi-concept fusion, material transformation, and product color swapping.