Wildminder

2.4K posts

@wildmindai

Physicist, Programmer, Designer

Berlin · Joined December 2024

91 Following · 10.4K Followers

Wildminder@wildmindai·1h

SCENA by Lightricks. Reference-driven multi-speaker audio scene gen! Prompt "two friends arguing in a rainy cafe" + two ref voice clips -> full 20s audio scene in one pass. Overlapping speech, room echo, background sounds: all baked in.
- based on LTX-2.3
- beats ZipVoice-Dialog/MOSS-TTSD
- up to 3 ref speakers
- perfect text alignment
- identity-aware positional encodings
- SOTA performance
finmickey.github.io/scena/

Wildminder@wildmindai·1h

Like this LTX2.3 CrossView LoRA. You basically get a multi-cam setup from a single raw clip: feed it a video plus a camera-angle prompt and get the same scene re-rendered from a completely different viewpoint. Subject stays, camera moves. huggingface.co/Cseti/LTX2.3-2…

Wildminder@wildmindai·6h

LTX2.3 Multiple-Subject-Reference V2 LoRA. Character identity and outfits actually stay consistent now, and the annoying flickering is mostly gone. Feed it up to 5 refs at once and it blends them naturally. Character movement and scene interaction finally make sense. huggingface.co/LiconStudio/LT…

Wildminder@wildmindai·9h

No way 😭 Wan dropped Wan-Dancer-14B and it's 85GB... I'm crying
> Minute-scale music-to-dance
> 720p@30fps
> 5 genres
> identity preservation
> multimodal conditioning (audio, text, ref images)
> sharp movements
huggingface.co/Wan-AI/Wan-Dan…

Wildminder@wildmindai·10h

ComfyUI now supports the Lucida background remover. Nice results. For design work, product shots, and illustrations, this is likely the best option right now. It isn't a total replacement for everything, though: for portraits with lots of hair or super busy scenes, InSPyReNet still has the edge. huggingface.co/Comfy-Org/BiRe…

Wildminder@wildmindai·10h

Motion4Motion by StepFun. Training-free motion transfer, no skeleton rigging, no training, no tweaking.
- WAN-T2V-14B + 3D causal VAE
- appearance consistency
- good pose similarity
- beats MotionClone
- diverse morphologies (human to animal etc.)
lhchen.top/Motion4Motion/

Wildminder@wildmindai·3d

Just insane. Audio-to-MIDI, and it handles the whole band at once, which most tools can't do. It'll likely struggle with death/black metal, but it's still pretty cool.

Mirelo@MireloAI

Today, together with @kyutai_labs, we’re introducing our new Audio-to-MIDI model. It takes a finished recording, identifies the instruments playing, and returns separate MIDI tracks for each — voice, drums, bass, keys, and more. Unlike most existing solutions, our model works directly from the full mix rather than requiring separate stems. It also detects chords, key, and tempo, giving producers broader musical context. We’ve written more about the model, the problem, and how it works here: mirelo.ai/blog/turning-a…

Wildminder@wildmindai·3d

ComfyUI LingBot. Nice! There's already a PR for LingBot-Video support. It should be merged pretty soon. github.com/Comfy-Org/Comf…

Wildminder@wildmindai

And an impressive video model: LingBot-Video. Sparse MoE for physically consistent video gen. Prioritizes physical realism and action-consequence logic.
- DiT + Qwen3-VL-4B conditioning + Wan2.1-VAE
- 120B params
- spatiotemporal geometry stability
- T2I, T2V, TI2V
- 1080p
technology.robbyant.com/lingbot-video

Wildminder@wildmindai·3d

MobileWan by Qualcomm. High-fidelity video gen on your phone.
- Wan2.2 on Snapdragon
- 480×832, 16 FPS in ~20s
- uses RNN-like DiT + 2-step DMD2 distillation
- only 10GB RAM
- memory-efficient VAE
qualcomm-ai-research.github.io/MobileWan/

Wildminder@wildmindai·5d

It's wild! LingBot-World 2.0
- infinite-horizon video world simulation
- drift-free, unbounded environments
- real-time 720p 60fps
- 60-minute uninterrupted sessions
- complex interactions
- 14B/1.3B
It's a creative playground too. You can step into any setting: ancient Egypt, a sci-fi city...
technology.robbyant.com/lingbot-world-…

Wildminder@wildmindai·5d

Phew… completely rebuilt the updating pipeline. New voice models now appear almost instantly. Updated info, new sections, and a bunch of new models. github.com/wildminder/awe…

Wildminder@wildmindai·6d

Vibecoders don't use DaVinci Resolve, they build their own.

Purz.ai@PurzBeats

You can just vibe code DaVinci Resolve plugins. 🤣

Wildminder@wildmindai·6d

LTX-2.3 as an interactive world! Amazing! AlayaWorld generates playable virtual worlds in real-time as you interact with them.
- 720p@24fps
- uses 3D Cache + DMD distillation
- long-horizon stability
- different styles
- realistic simulations
Creates infinite, unscripted video games.
alaya-lab.github.io/AlayaWorld/

Wildminder@wildmindai·Jul 7

Cool LTX2.3 Vintage Style LoRA. Minimalist, vintage-style illustrations with stop-motion typography animation huggingface.co/a3xrfgb/Fable5…

Wildminder@wildmindai·Jul 7

PixWorld turns flat 2D into a fully navigable 3D environment.
- 1.04B pixel-space DiT
- no VAE loss
- uses pixel-aligned 3DGS and Flow Matching
- complete 3D scenes in 15s on a single A100
A dream tool for interior designers.
sensengao.github.io/PixWorld/

Wildminder@wildmindai·Jul 7

MIRA by Epic Games.
- first multiplayer interactive world model
- simulates 2v2 Rocket League matches
- 5B-LDM with Diffusion Forcing for stable 5-minute rollouts
- 70ms end-to-end latency
- preserves global consistency
mira-wm.com

Wildminder@wildmindai·Jul 7

Wan-Streamer v0.2 looks great. Native full-duplex audio-visual streaming at 25fps.
- 640x368
- 200ms signal-to-signal latency
- parallelized perception and generation
- grounded interaction with gaze and hand-movement legibility
If Baldur’s Gate 3 had real-time situated dialogue NPCs, it would be insane...
wan-streamer.com/v0.2/

Wildminder@wildmindai·Jul 6

LTX-2.3 Product Ad Style LoRA. Turn your basic product photos into $1M luxury commercials. Nails that high-end perfume and tech ad vibe. Perfect for fashion, cosmetics, tech showcases. huggingface.co/SOLRICKS/ltx-2…

Wildminder@wildmindai·Jul 6

OrbitQuant: Data-agnostic quantization for DiTs. Shrinks the model's memory footprint and speeds up processing without needing any training.
- lossless at 4-bit
- FLUX.1, Wan 2.1, Z, HunyuanVideo
saurabhcantina.github.io/orbitquant/
