camenduru

3.2K posts

camenduru
@camenduru

building 🍞 @tost_ai ❤ open source https://t.co/8MMNbygz1P

Joined December 2006
5.7K Following · 21.3K Followers

camenduru retweeted

JHC620 (@jhc620)
We've updated ReconViaGen-v0.5 based on TRELLIS.2, supporting the generation of high-resolution meshes and PBR materials from multi-view images, and have also released the training code for ReconViaGen!
Code: github.com/GAP-LAB-CUHK-S…
Hugging Face demo: huggingface.co/spaces/Stable-…
Replies 1 · Reposts 34 · Likes 164 · Views 8.3K

camenduru retweeted

李萌萌 (@ljsabc)
We're reintroducing and open-sourcing project "See-through". Given a single anime illustration, it automatically decomposes the character into fully-inpainted semantic layers with depth ordering. One image in, layered PSD out. (1/n)
Repo: github.com/shitagaki-lab/…
Replies 48 · Reposts 650 · Likes 4.1K · Views 457.7K

camenduru retweeted

Sand.ai (@SandAI_HQ)
🪄 Introducing daVinci-MagiHuman: the performance-level audio-video generative foundation model. Proudly open-sourced and jointly developed by SII GAIR Lab & Sand.ai, it sets a new standard for multimodal AI. ⏳ 1/6
Replies 3 · Reposts 11 · Likes 33 · Views 2.8K

camenduru retweeted

Zhiyang (Frank) Dou (@frankzydou)
We have seen many works unlock the power of pretrained models for images and videos🏞️. But what about human motion🕺💃? Can we leverage a pretrained motion prior for a wide range of downstream tasks? Yes!!

UMO is a simple yet effective framework that, for the first time, unlocks the priors of a motion foundation model (i.e., HY-Motion) for 10+ tasks, including editing, reaction generation, stylization, trajectory control, obstacle avoidance, keyframe infilling, and more. Amazing work, @xiaoyan_cong and @kunkun0w0!

🏠 Webpage: oliver-cong02.github.io/UMO.github.io/
📄 Paper: arxiv.org/abs/2603.15975

With the growing number of tools for transferring SMPL motion to humanoids, we hope it could also become a source of skills for humanoid robot learning.

#Graphics #Motion #Animation #AIGC #GenerativeAI #Vision #3DV #Robotics #Robot #Humanoid #Learning #GenAI
Xiaoyan Cong (@xiaoyan_cong)

💡Introducing 𝑼𝑴𝑶 -- one unified model that unlocks motion foundation model (HY-Motion @TencentHunyuan) priors for 𝟏𝟎+ 𝐭𝐚𝐬𝐤𝐬: 𝐞𝐝𝐢𝐭𝐢𝐧𝐠, 𝐫𝐞𝐚𝐜𝐭𝐢𝐨𝐧 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧, 𝐬𝐭𝐲𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧, 𝐭𝐫𝐚𝐣𝐞𝐜𝐭𝐨𝐫𝐲 𝐜𝐨𝐧𝐭𝐫𝐨𝐥, 𝐨𝐛𝐬𝐭𝐚𝐜𝐥𝐞 𝐚𝐯𝐨𝐢𝐝𝐚𝐧𝐜𝐞, 𝐤𝐞𝐲𝐟𝐫𝐚𝐦𝐞 𝐢𝐧𝐟𝐢𝐥𝐥𝐢𝐧𝐠... (1/8) 🌐 Webpage: oliver-cong02.github.io/UMO.github.io/ 📄 Paper: arxiv.org/abs/2603.15975

Replies 0 · Reposts 23 · Likes 82 · Views 8.5K
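
For a sense of what keyframe infilling means mechanically, here is a minimal sketch of the masked-conditioning setup such motion models consume. `umo_infill` is a placeholder, not the released API, and the tensor sizes are assumptions; the real interface is on the project page above.

```python
import torch

T, J = 120, 22                            # frames and joints; sizes assumed
motion = torch.zeros(T, J, 3)             # motion sequence to be filled in
mask = torch.zeros(T, dtype=torch.bool)   # True = frame is a fixed keyframe
for t in (0, 60, 119):                    # three user-provided keyframes
    motion[t] = torch.randn(J, 3)         # your keyframe poses go here
    mask[t] = True

# filled = umo_infill(motion, mask)       # placeholder call; see the repo
```
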
camenduru retweeted

Donghao Zhou @ CUHK (@donghao_zhou)
Ever tried inpainting an object into a scene with #AI, but details got lost? 🥴 Meet HiFi-Inpaint (#CVPR2026)! 🎉 High-fidelity detail preservation for reference-based inpainting — texts, logos, textures, all intact. No more blur in your ad images! 👇 correr-zhou.github.io/HiFi-Inpaint/
Replies 3 · Reposts 15 · Likes 101 · Views 6.1K
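
For contrast with the reference-based setup the post describes, this is the plain diffusers inpainting call where that detail loss typically shows up. HiFi-Inpaint's own interface is on the project page; the checkpoint below is just a standard public inpainting model, not theirs.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Generic (non-reference-based) inpainting baseline.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("scene.png").convert("RGB")
mask = Image.open("mask.png").convert("L")    # white = region to repaint
result = pipe(
    prompt="a soda can with a printed logo",
    image=image,
    mask_image=mask,
).images[0]
result.save("inpainted.png")   # logos/text tend to blur here without a reference image
```
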
camenduru retweeted

LTX (@ltx_model)
LTX-2.3 is a major upgrade. It’s a production-ready multimodal engine - designed to be built on. Here’s what’s new 🧵 1/7
Replies 89 · Reposts 236 · Likes 2.7K · Views 830.4K

camenduru retweeted

Bin Lin (@LinBin46984)
🤯 Real-time video generation just got HUGE. Introducing Helios: a 14B-parameter model running at 19.5 FPS on a single H100. Cheaper, faster, and stronger than 1.3B models, generating minute-long videos without the usual tricks (not even quantization). github.com/PKU-YuanGroup/…
Replies 6 · Reposts 41 · Likes 306 · Views 27.5K
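
A quick sanity check on those numbers (values taken straight from the post):

```python
# 19.5 FPS on one H100 implies a ~51 ms end-to-end budget per frame,
# and roughly 1,170 frames for a minute-long video.
fps = 19.5
frames_per_minute = fps * 60        # 1170.0 frames
budget_ms = 1000 / fps              # ~51.3 ms per frame
print(frames_per_minute, round(budget_ms, 1))
```
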
camenduru retweeted

Qwen (@Alibaba_Qwen)
🚀 Introducing the Qwen3.5 Small Model Series: Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B

✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL:
• 0.8B / 2B → tiny, fast, great for edge devices
• 4B → a surprisingly strong multimodal base for lightweight agents
• 9B → compact, but already closing the gap with much larger models

And yes — we're also releasing the Base models. We hope this better supports research, experimentation, and real-world industrial innovation.

Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
Replies 922 · Reposts 2.9K · Likes 21.4K · Views 8.9M
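
A minimal sketch of loading one of these checkpoints with Hugging Face transformers. The repo id is an assumption based on the naming above (check the linked collection for the real ids), and this text-only usage ignores the multimodal side.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.5-4B"   # assumed repo id; see the HF collection link

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize flow matching in one line."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```
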
camenduru retweeted

Bo Wang (@BoWang87)
Bytedance just dropped a paper that might change how AI thinks. Literally. They figured out why LLMs fail at long reasoning — and framed it as chemistry.

The discovery: chain-of-thought isn't just words. It's molecular structure. Three bond types:
• Deep reasoning = covalent bonds (strong, unbreakable)
• Self-reflection = hydrogen bonds (flexible, context-aware)
• Exploration = van der Waals forces (weak, ever-present)

Why most AI "thinking" sucks: everyone's been imitating keywords — "wait," "let me check" — without building the actual bonds. It's like copying the shape of a protein without the atomic forces holding it together. Bytedance proved: structure emerges from training, not prompting.

The fix: Mole-Syn. Their method doesn't just generate text. It synthesizes stable thought molecules. Results: better reasoning, more stable RL training.

Bytedance is treating AI reasoning like organic chemistry — and it works.

Paper: arxiv.org/abs/2601.06002
Replies 114 · Reposts 514 · Likes 2.8K · Views 241.2K

camenduru retweeted

ModelScope (@ModelScope2022)
Introducing FireRed-Image-Edit-1.0 from FireRedTeam! 🚀 It's officially the new SOTA for general image editing.
✅ Better than closed-source: outperforms Nano-Banana & Seedream4.0 on GEdit benchmarks.
✅ Native evolution: built from T2I foundations, not just a "patch" on existing models.
✅ Style mastery: scored a record-breaking 4.97/5.0 in style transfer.
✅ High-fidelity text: keeps original font styles perfectly.
✅ Virtual try-on: native support for multi-image joint editing.
✅ Bilingual: native support for both English & Chinese prompts.

Apache 2.0 license. Local deployment ready.
🤖 Model: modelscope.cn/models/FireRed…
🎠 Demo: modelscope.cn/studios/FireRe…
🛠️ Github: github.com/FireRedTeam/Fi…
Replies 16 · Reposts 62 · Likes 518 · Views 79.2K

camenduru retweeted

Owen Tian Ye (@tiny85114767)
Just shipped FastFlux2 Realtime Editor. A fully open-source real-time editing studio in your browser. Webcam → FLUX.2-klein-4B → Single 4090 @ 5 FPS, H100 @ 10+ FPS. Repo: github.com/Owen718/flux-s…
Replies 7 · Reposts 20 · Likes 222 · Views 22.3K
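
The webcam → model → screen loop the post describes, sketched with OpenCV. `edit_frame` is a placeholder for the FLUX.2-klein editing step; the real API lives in the linked repo.

```python
import time
import cv2

def edit_frame(frame, prompt):
    # Placeholder for the actual FLUX.2-klein-4B editing call (assumption).
    return frame

cap = cv2.VideoCapture(0)                 # default webcam
prompt = "make it look like watercolor"
while True:
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.time()
    edited = edit_frame(frame, prompt)
    fps = 1.0 / max(time.time() - t0, 1e-6)
    cv2.putText(edited, f"{fps:.1f} FPS", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("realtime edit", edited)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```
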
camenduru retweeted

ZHYang (@yang_zihan79147)
Excited to share our work: ArcFlow, a 2-step text-to-image generation framework via high-precision non-linear flow distillation. It ensures high-quality alignment with the teacher, delivering a 40× speedup and 4× faster convergence with <5% of the parameters.
Code: github.com/pnotp/ArcFlow
Replies 0 · Reposts 3 · Likes 31 · Views 2.3K
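
Mechanically, "2-step generation" means two updates along a learned velocity field instead of dozens. Below is a generic sketch of that recipe, not ArcFlow's actual non-linear distilled sampler; `model` stands in for a distilled velocity network.

```python
import torch

@torch.no_grad()
def sample_2step(model, text_emb, shape, device="cuda"):
    """Generic 2-step flow sampler: two Euler updates from noise (t=0)
    toward data (t=1). `model` predicts a velocity field v(x, t, cond)."""
    x = torch.randn(shape, device=device)       # start from pure noise
    for t0, t1 in [(0.0, 0.5), (0.5, 1.0)]:     # two Euler steps
        t = torch.full((shape[0],), t0, device=device)
        v = model(x, t, text_emb)               # predicted velocity at (x, t)
        x = x + (t1 - t0) * v                   # Euler update toward t1
    return x
```
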
AnTh (@AnTh2107)
@camenduru Can I use my own voice on songs?
Replies 1 · Reposts 0 · Likes 0 · Views 75
camenduru (@camenduru)
🎵 Turns out we can make Turkish music with ACE-Step v1.5 😛
🪽 Video model: Imagine (480p, 6s)
🧬 github.com/ace-step/ACE-S…
🎮 github.com/fspecii/ace-st…
🎮 acemusic.ai
ACE Music (@acemusicAI)

We're releasing ACE-Step-v1.5 (2B), a fast, high-quality open-source music model. It runs locally on a consumer-grade GPU, generates a full song in under 2 seconds (on an A100), supports LoRA fine-tuning, and beats SUNO on common eval metrics. GitHub: github.com/ace-step/ACE-S…

Key traits:
• Quality: beats Suno on common eval scores
• Speed: full song under 2s on an A100
• Local: ~4GB VRAM, under 10s on an RTX 3090
• LoRA: train your own style with a few songs
• License: MIT, free for commercial use
• Data: fully authorized plus synthetic

The music AI space lacks commercial-grade open models. Many creators are forced to rely on closed-source services and can't fully own, run locally, or fine-tune their own models. We want to help change that.

Replies 2 · Reposts 2 · Likes 62 · Views 7.4K
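
A hypothetical sketch of what the local flow could look like. Every name below is made up for illustration; the real entry point, arguments, and LoRA tooling are in the linked ace-step repo.

```python
# Hypothetical wrapper, not the real ACE-Step API (see the repo above).
from ace_step import generate_song  # made-up import for illustration

song = generate_song(
    prompt="synthwave, 120 bpm, warm female vocals",
    lyrics="[verse] ...",           # elided; supply your own
    duration_s=180,
    lora_path="./my_style_lora",    # the post says a few songs suffice for a LoRA
)
song.save("out.wav")                # ~4GB VRAM, <10s on an RTX 3090 per the post
```
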