camenduru

3.2K posts

camenduru
@camenduru

building 🍞 @tost_ai ❤ open source https://t.co/8MMNbygz1P

Joined December 2006
5.7K Following · 21.3K Followers

camenduru retweeted

JHC620 (@jhc620)
We've updated ReconViaGen-v0.5 based on TRELLIS.2, supporting the generation of high-resolution meshes and PBR materials from multi-view images, and have also released the training code for ReconViaGen!
Code: github.com/GAP-LAB-CUHK-S…
Hugging Face demo: huggingface.co/spaces/Stable-…
Replies 1 · Reposts 34 · Likes 164 · Views 8.3K

camenduru retweeted

李萌萌 (@ljsabc)
We're reintroducing and open-sourcing project "See-through". Given a single anime illustration, it automatically decomposes the character into fully-inpainted semantic layers with depth ordering. One image in, layered PSD out. (1/n)
Repo: github.com/shitagaki-lab/…
Replies 48 · Reposts 650 · Likes 4.1K · Views 457.7K

camenduru retweeted

Sand.ai (@SandAI_HQ)
🪄 Introducing daVinci-MagiHuman: the performance-level audio-video generative foundation model. Proudly open-sourced and jointly developed by SII GAIR Lab & Sand.ai, it sets a new standard for multimodal AI. ⏳ 1/6
Replies 3 · Reposts 11 · Likes 33 · Views 2.8K

camenduru retweeted

Zhiyang (Frank) Dou (@frankzydou)
We have seen many works unlock the power of pretrained models for images and videos🏞️. But what about human motion🕺💃? Can we leverage a pretrained motion prior for a wide range of downstream tasks? Yes!!

UMO is a simple yet effective framework that, for the first time, unlocks the priors of a motion foundation model (i.e., HY-Motion) for 10+ tasks, including editing, reaction generation, stylization, trajectory control, obstacle avoidance, keyframe infilling, and more. Amazing work, @xiaoyan_cong and @kunkun0w0!

🏠 Webpage: oliver-cong02.github.io/UMO.github.io/
📄 Paper: arxiv.org/abs/2603.15975

With the growing number of tools for transferring SMPL motion to humanoids, we hope it could also become a source of skills for humanoid robot learning.

#Graphics #Motion #Animation #AIGC #GenerativeAI #Vision #3DV #Robotics #Robot #Humanoid #Learning #GenAI
Xiaoyan Cong (@xiaoyan_cong)

💡Introducing 𝑼𝑴𝑶 -- one unified model that unlocks motion foundation model (HY-Motion @TencentHunyuan) priors for 𝟏𝟎+ 𝐭𝐚𝐬𝐤𝐬: 𝐞𝐝𝐢𝐭𝐢𝐧𝐠, 𝐫𝐞𝐚𝐜𝐭𝐢𝐨𝐧 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧, 𝐬𝐭𝐲𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧, 𝐭𝐫𝐚𝐣𝐞𝐜𝐭𝐨𝐫𝐲 𝐜𝐨𝐧𝐭𝐫𝐨𝐥, 𝐨𝐛𝐬𝐭𝐚𝐜𝐥𝐞 𝐚𝐯𝐨𝐢𝐝𝐚𝐧𝐜𝐞, 𝐤𝐞𝐲𝐟𝐫𝐚𝐦𝐞 𝐢𝐧𝐟𝐢𝐥𝐥𝐢𝐧𝐠... (1/8) 🌐 Webpage: oliver-cong02.github.io/UMO.github.io/ 📄 Paper: arxiv.org/abs/2603.15975

Replies 0 · Reposts 23 · Likes 82 · Views 8.5K
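
For a sense of what keyframe infilling means mechanically, here is a minimal sketch of the masked-conditioning setup such motion models consume. `umo_infill` is a placeholder, not the released API, and the tensor sizes are assumptions; the real interface is on the project page above.

```python
import torch

T, J = 120, 22                            # frames and joints; sizes assumed
motion = torch.zeros(T, J, 3)             # motion sequence to be filled in
mask = torch.zeros(T, dtype=torch.bool)   # True = frame is a fixed keyframe
for t in (0, 60, 119):                    # three user-provided keyframes
    motion[t] = torch.randn(J, 3)         # your keyframe poses go here
    mask[t] = True

# filled = umo_infill(motion, mask)       # placeholder call; see the repo
```
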
camenduru retweeted

Donghao Zhou @ CUHK (@donghao_zhou)
Ever tried inpainting an object into a scene with #AI, but details got lost? 🥴 Meet HiFi-Inpaint (#CVPR2026)! 🎉 High-fidelity detail preservation for reference-based inpainting — texts, logos, textures, all intact. No more blur in your ad images! 👇 correr-zhou.github.io/HiFi-Inpaint/
Replies 3 · Reposts 15 · Likes 101 · Views 6.1K
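
For contrast with the reference-based setup the post describes, this is the plain diffusers inpainting call where that detail loss typically shows up. HiFi-Inpaint's own interface is on the project page; the checkpoint below is just a standard public inpainting model, not theirs.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Generic (non-reference-based) inpainting baseline.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("scene.png").convert("RGB")
mask = Image.open("mask.png").convert("L")    # white = region to repaint
result = pipe(
    prompt="a soda can with a printed logo",
    image=image,
    mask_image=mask,
).images[0]
result.save("inpainted.png")   # logos/text tend to blur here without a reference image
```
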
camenduru retweeted

LTX (@ltx_model)
LTX-2.3 is a major upgrade. It’s a production-ready multimodal engine - designed to be built on. Here’s what’s new 🧵 1/7
Replies 89 · Reposts 236 · Likes 2.7K · Views 830.4K

camenduru retweeted

Bin Lin (@LinBin46984)
🤯 Real-time video generation just got HUGE. Introducing Helios: a 14B-parameter model running at 19.5 FPS on a single H100. Cheaper, faster, and stronger than 1.3B models, generating minute-long videos without the usual tricks (not even quantization). github.com/PKU-YuanGroup/…
Replies 6 · Reposts 41 · Likes 306 · Views 27.5K
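
A quick sanity check on those numbers (values taken straight from the post):

```python
# 19.5 FPS on one H100 implies a ~51 ms end-to-end budget per frame,
# and roughly 1,170 frames for a minute-long video.
fps = 19.5
frames_per_minute = fps * 60        # 1170.0 frames
budget_ms = 1000 / fps              # ~51.3 ms per frame
print(frames_per_minute, round(budget_ms, 1))
```
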
camenduru retweeted

Qwen (@Alibaba_Qwen)
🚀 Introducing the Qwen3.5 Small Model Series: Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B

✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL:
• 0.8B / 2B → tiny, fast, great for edge devices
• 4B → a surprisingly strong multimodal base for lightweight agents
• 9B → compact, but already closing the gap with much larger models

And yes — we're also releasing the Base models. We hope this better supports research, experimentation, and real-world industrial innovation.

Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
Replies 922 · Reposts 2.9K · Likes 21.4K · Views 8.9M
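
A minimal sketch of loading one of these checkpoints with Hugging Face transformers. The repo id is an assumption based on the naming above (check the linked collection for the real ids), and this text-only usage ignores the multimodal side.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.5-4B"   # assumed repo id; see the HF collection link

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize flow matching in one line."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```
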
camenduru retweeted

Bo Wang (@BoWang87)
Bytedance just dropped a paper that might change how AI thinks. Literally. They figured out why LLMs fail at long reasoning — and framed it as chemistry.

The discovery: chain-of-thought isn't just words. It's molecular structure. Three bond types:
• Deep reasoning = covalent bonds (strong, unbreakable)
• Self-reflection = hydrogen bonds (flexible, context-aware)
• Exploration = van der Waals forces (weak, ever-present)

Why most AI "thinking" sucks: everyone's been imitating keywords — "wait," "let me check" — without building the actual bonds. It's like copying the shape of a protein without the atomic forces holding it together. Bytedance proved: structure emerges from training, not prompting.

The fix: Mole-Syn. Their method doesn't just generate text. It synthesizes stable thought molecules. Results: better reasoning, more stable RL training.

Bytedance is treating AI reasoning like organic chemistry — and it works.

Paper: arxiv.org/abs/2601.06002
Replies 114 · Reposts 514 · Likes 2.8K · Views 241.2K

camenduru retweeted

ModelScope (@ModelScope2022)
Introducing FireRed-Image-Edit-1.0 from FireRedTeam! 🚀 It's officially the new SOTA for general image editing.
✅ Better than closed-source: outperforms Nano-Banana & Seedream4.0 on GEdit benchmarks.
✅ Native evolution: built from T2I foundations, not just a "patch" on existing models.
✅ Style mastery: scored a record-breaking 4.97/5.0 in style transfer.
✅ High-fidelity text: keeps original font styles perfectly.
✅ Virtual try-on: native support for multi-image joint editing.
✅ Bilingual: native support for both English & Chinese prompts.

Apache 2.0 license. Local deployment ready.
🤖 Model: modelscope.cn/models/FireRed…
🎠 Demo: modelscope.cn/studios/FireRe…
🛠️ Github: github.com/FireRedTeam/Fi…
Replies 16 · Reposts 62 · Likes 518 · Views 79.2K

camenduru retweeted

Owen Tian Ye (@tiny85114767)
Just shipped FastFlux2 Realtime Editor. A fully open-source real-time editing studio in your browser. Webcam → FLUX.2-klein-4B → Single 4090 @ 5 FPS, H100 @ 10+ FPS. Repo: github.com/Owen718/flux-s…
Replies 7 · Reposts 20 · Likes 222 · Views 22.3K
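
The webcam → model → screen loop the post describes, sketched with OpenCV. `edit_frame` is a placeholder for the FLUX.2-klein editing step; the real API lives in the linked repo.

```python
import time
import cv2

def edit_frame(frame, prompt):
    # Placeholder for the actual FLUX.2-klein-4B editing call (assumption).
    return frame

cap = cv2.VideoCapture(0)                 # default webcam
prompt = "make it look like watercolor"
while True:
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.time()
    edited = edit_frame(frame, prompt)
    fps = 1.0 / max(time.time() - t0, 1e-6)
    cv2.putText(edited, f"{fps:.1f} FPS", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("realtime edit", edited)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```
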
camenduru retweeted

ZHYang (@yang_zihan79147)
Excited to share our work: ArcFlow, a 2-step text-to-image generation framework via high-precision non-linear flow distillation. It ensures high-quality alignment with the teacher, delivering a 40× speedup and 4× faster convergence with <5% of the parameters.
Code: github.com/pnotp/ArcFlow
Replies 0 · Reposts 3 · Likes 31 · Views 2.3K
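
Mechanically, "2-step generation" means two updates along a learned velocity field instead of dozens. Below is a generic sketch of that recipe, not ArcFlow's actual non-linear distilled sampler; `model` stands in for a distilled velocity network.

```python
import torch

@torch.no_grad()
def sample_2step(model, text_emb, shape, device="cuda"):
    """Generic 2-step flow sampler: two Euler updates from noise (t=0)
    toward data (t=1). `model` predicts a velocity field v(x, t, cond)."""
    x = torch.randn(shape, device=device)       # start from pure noise
    for t0, t1 in [(0.0, 0.5), (0.5, 1.0)]:     # two Euler steps
        t = torch.full((shape[0],), t0, device=device)
        v = model(x, t, text_emb)               # predicted velocity at (x, t)
        x = x + (t1 - t0) * v                   # Euler update toward t1
    return x
```
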
AnTh (@AnTh2107)
@camenduru Can I use my own voice on songs?
Replies 1 · Reposts 0 · Likes 0 · Views 75
camenduru (@camenduru)
🎵 Turns out we can make Turkish music with ACE-Step v1.5 😛
🪽 Video model: Imagine (480p, 6s)
🧬 github.com/ace-step/ACE-S…
🎮 github.com/fspecii/ace-st…
🎮 acemusic.ai
ACE Music (@acemusicAI)

We're releasing ACE-Step-v1.5 (2B), a fast, high-quality open-source music model. It runs locally on a consumer-grade GPU, generates a full song in under 2 seconds (on an A100), supports LoRA fine-tuning, and beats SUNO on common eval metrics. GitHub: github.com/ace-step/ACE-S…

Key traits:
• Quality: beats Suno on common eval scores
• Speed: full song under 2s on an A100
• Local: ~4GB VRAM, under 10s on an RTX 3090
• LoRA: train your own style with a few songs
• License: MIT, free for commercial use
• Data: fully authorized plus synthetic

The music AI space lacks commercial-grade open models. Many creators are forced to rely on closed-source services and can't fully own, run locally, or fine-tune their own models. We want to help change that.

Replies 2 · Reposts 2 · Likes 62 · Views 7.4K
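
A hypothetical sketch of what the local flow could look like. Every name below is made up for illustration; the real entry point, arguments, and LoRA tooling are in the linked ace-step repo.

```python
# Hypothetical wrapper, not the real ACE-Step API (see the repo above).
from ace_step import generate_song  # made-up import for illustration

song = generate_song(
    prompt="synthwave, 120 bpm, warm female vocals",
    lyrics="[verse] ...",           # elided; supply your own
    duration_s=180,
    lora_path="./my_style_lora",    # the post says a few songs suffice for a LoRA
)
song.save("out.wav")                # ~4GB VRAM, <10s on an RTX 3090 per the post
```
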