Tom Chea

1 post

@TomChea56513

Joined February 2024
2 Following · 0 Followers
Wan @Alibaba_Wan
Today, we're officially launching Wan2.5-Preview! It's set to reshape the future of visual generation with a new architecture and powerful features.

• Architectural Features: Native Multimodality, Deep Alignment
  ∘ Native Multimodal Architecture: Adopts a new, unified framework for both understanding and generation, flexibly supporting the input and output of text, images, video, and audio.
  ∘ Joint Multimodal Training: Achieves stronger modal alignment by jointly training on text, audio, and visual data, which is key to enabling audio-visual sync and greatly improved instruction following.
  ∘ Human Preference Alignment: Implements Reinforcement Learning from Human Feedback (RLHF) to continuously align with human preferences, enhancing image quality and video dynamics.

• Video Capabilities: A/V Synchronization, Cinematic Quality
  ∘ Synchronized A/V Generation: Natively supports high-fidelity, high-consistency video generation with synchronized audio, including multi-person vocals, sound effects, and BGM.
  ∘ Controllable Multimodal Input: Supports text, images, and audio as input sources for limitless creativity.
  ∘ Cinematic Aesthetics: Features powerful dynamics and structural stability with an upgraded cinematic control system, generating 10-second 1080p HD videos of cinematic quality.

• Image Capabilities: Creative & Precise Control
  ∘ Advanced Image Generation: Greatly improved instruction following to support photorealistic quality, diverse artistic styles, creative typography, and professional-grade charts.
  ∘ Image Editing: Supports conversational, instruction-based image editing with pixel-level precision for tasks such as multi-concept fusion, material transformation, and product color swapping.