Drbaph

333 posts

@drbaph

💀 Illustrator | 3D Designer | ML Researcher

Joined August 2021

598 Following · 769 Followers

Drbaph@drbaph·9h

@MosiAI_Official @huggingface amazing!

MOSI@MosiAI_Official·9h

🤗 MOSS-VL-Realtime is now open source on @huggingface. Built for real-time visual understanding over continuous video streams:

🧠 11B vision-language model
📜 Apache-2.0 license
💬 Ask questions at any point in a video stream
👀 Keeps watching while generating a response
🔄 Revises or interrupts its response as the scene changes
🤫 Can remain silent when more evidence is needed
🧩 256K-token context window
🌐 Chinese and English multimodal understanding
📦 Base, Instruct, and Realtime models are all open source

From “watch first, answer later” to “keep watching while answering.” 👀 @Open_MOSS

Thank you @sgl_project @lmsysorg for day-0 support! 🚀

Drbaph@drbaph·3d

@cocktailpeanut @kyutai_labs @MireloAI pretty cool, good job!

cocktail peanut@cocktailpeanut·3d

MuScriptor: Transcribe any music, on your local PC

Just made a 1-click launcher for MuScriptor (by @kyutai_labs and @MireloAI).

1. Runs on ALL OS (Mac, Windows, Linux).
2. Runs on ALL machines even with low memory (around 1GB for small, 2~3GB for medium, ~8GB for large model)

kyutai@kyutai_labs

We're releasing MuScriptor, the best open model for multi-instrument transcription to date, created in collaboration with @MireloAI. Give it a recording in any genre: pop, classical, metal, jazz, whatever, and it transcribes the individual instruments into MIDI. Link in 🧵

Drbaph@drbaph·Jun 30

@SlipperyGem longcat-2.0 for my Hermes setup atm is sweet, not the cheapest but exceeds deepseek for what i need it to do

Brie Wensleydale🧀🐭@SlipperyGem·Jun 30

Since Dipsy V4 had a price increase and I've been interested in a HIGH \ LOW setup, went and looked up the price. Yeah, perhaps I'll stick with Dipsy for now.

Drbaph@drbaph·Jun 30

The biggest steady-state issue looks like repeated full-scene work.

In the current build, live car reflections are configured with applyLiveUpdates=true, minFrameInterval=1, moveThreshold=0, and facesPerFrame=6, so the reflection path can render a cubemap of the scene every frame. Profiling backs this up: disabling reflections massively reduces draw calls/triangles and improves FPS. Shadows and active traffic physics are the next major costs.

My proposed fix would be:

- Keep reflection quality, but cache/invalidate probes instead of updating all cube faces every frame. Update on meaningful car movement, lighting change, or nearby dynamic-object change; spread faces across frames; ideally render a reflection-only proxy scene/layer.
- Keep shadow resolution/visual quality, but render shadow-caster proxies and cached clipmaps rather than full beauty geometry for world/building detail.
- Add traffic/physics LOD: player and nearby/interacting cars stay high-frequency, distant traffic becomes kinematic/lower-rate/asleep.
- For the 30-60s construction problem, fix PCG streaming priority/cancellation: nearest-first pages before details/colliders, real flush/progress, and don’t let far building work clog the queue. The worker already transfers buffers, so the issue is scheduling rather than typed-array copying.
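The probe caching idea in the first bullet can be sketched in a few lines. This is a minimal, engine-agnostic illustration in Python, not the game's code: `ProbeScheduler`, `move_threshold`, and `faces_per_frame` are hypothetical names chosen to mirror the settings quoted above, and the actual rendering of each face is left to the caller.

```python
import math


class ProbeScheduler:
    """Chooses which cubemap faces (0-5) to re-render on each frame.

    Hypothetical sketch: the probe is refreshed only after it has been
    invalidated, and the refresh is spread over several frames.
    """

    def __init__(self, move_threshold=0.5, faces_per_frame=1):
        self.move_threshold = move_threshold    # distance that counts as "meaningful" movement
        self.faces_per_frame = faces_per_frame  # per-frame face budget
        self._anchor = None                     # car position at the last invalidation
        self._cursor = 0                        # next face to render, round-robin
        self._remaining = 0                     # faces still owed to the current refresh

    def invalidate(self):
        """Request a full refresh, e.g. after a lighting change."""
        self._remaining = 6

    def faces_to_render(self, car_pos):
        """Return the face indices to render this frame (possibly none)."""
        moved = (
            self._anchor is None
            or math.dist(car_pos, self._anchor) >= self.move_threshold
        )
        if moved:
            self._anchor = tuple(car_pos)
            self.invalidate()
        count = min(self.faces_per_frame, self._remaining)
        faces = [(self._cursor + i) % 6 for i in range(count)]
        # The cursor keeps advancing across invalidations, so a car that
        # moves every frame still cycles through all six faces.
        self._cursor = (self._cursor + count) % 6
        self._remaining -= count
        return faces


scheduler = ProbeScheduler(move_threshold=1.0, faces_per_frame=1)
parked = (0.0, 0.0, 0.0)
rendered = sum(len(scheduler.faces_to_render(parked)) for _ in range(60))
print(rendered)  # 6 face renders over 60 frames, versus 360 at 6 faces every frame
```

A parked car pays for one full cubemap and then nothing, while a moving car pays at most `faces_per_frame` faces per frame. The trade-off is that each face can be a few frames stale, which is why the bullet pairs this with invalidation on lighting and nearby-object changes.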

robot 2.0@alightinastorm·Jun 30

vibe-stack.github.io/60fps?debug=1

whoever finds out first how we can get 60fps without losing quality will receive $50-100 in SOL (depending on my mood and how good the answer is)

it's a raw game dump so sorry 50mb

constructing the world takes a while (30-60 seconds), you can interact while it does but the worker queue is clogging hard

anyways, have fun

don't even try on mobile, no chance
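One way a clogged construction queue like this might be untangled is a priority queue with cancellation, so that near work runs first and far work can be dropped. The sketch below is an assumption-laden illustration in Python, not the game's worker code: `StreamQueue`, the stage names, and `cancel_beyond` are all hypothetical.

```python
import heapq
import itertools


class StreamQueue:
    """Nearest-first job queue with lazy cancellation (hypothetical sketch)."""

    # Lower rank runs first: coarse pages before details and colliders.
    STAGES = {"page": 0, "detail": 1, "collider": 2}

    def __init__(self):
        self._heap = []
        self._live = {}                 # job_id -> heap entry still wanted
        self._tie = itertools.count()   # unique tiebreaker keeps ordering stable

    def push(self, job_id, stage, distance):
        """Queue a job; re-pushing an id replaces its earlier entry."""
        self.cancel(job_id)
        entry = [self.STAGES[stage], distance, next(self._tie), job_id, True]
        self._live[job_id] = entry
        heapq.heappush(self._heap, entry)

    def cancel(self, job_id):
        """Mark a job dead; it is skipped when it reaches the top of the heap."""
        entry = self._live.pop(job_id, None)
        if entry is not None:
            entry[4] = False

    def cancel_beyond(self, max_distance):
        """Drop every queued job farther away than max_distance."""
        for job_id, entry in list(self._live.items()):
            if entry[1] > max_distance:
                self.cancel(job_id)

    def pop(self):
        """Return the next live job id, or None when nothing is left."""
        while self._heap:
            _stage, _distance, _tie, job_id, alive = heapq.heappop(self._heap)
            if alive:
                del self._live[job_id]
                return job_id
        return None


queue = StreamQueue()
queue.push("far_page", "page", 900.0)
queue.push("near_detail", "detail", 5.0)
queue.push("near_page", "page", 10.0)
queue.push("far_collider", "collider", 800.0)
queue.cancel_beyond(500.0)
order = [queue.pop(), queue.pop(), queue.pop()]
print(order)  # ['near_page', 'near_detail', None]
```

Sorting by stage before distance is a design choice made here for illustration; sorting by distance first would instead finish each nearby tile completely before starting the next one.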

Drbaph@drbaph·Jun 30

@skbulous sick

SKB@skbzz7·Jun 29

Nothing special, just a small radio player for ComfyUI. Supports YouTube/live links and direct radio streams. That’s it. github.com/SKBv0/ComfyUI_…

Drbaph@drbaph·Jun 24

@Machinedelusion @nazar44444444 or egpu / dock

Machine Delusions@Machinedelusion·Jun 24

@nazar44444444 It’s definitely not a lie. Just use a GPU instead of a laptop =P

444@nazar44444444·Jun 24

local inference is a lie sadly - at least for now - no one's waiting that long

Machine Delusions@Machinedelusion

im running Krea 2 on a macbook pro m5, 2048 res, and its taking about 5 min. For a 1k image it takes about 30 seconds

Ting Chen Liang@ting_·Jun 20

omg this made my day 🥹 Someone already made a ComfyUI node pack for @MosiAI_Official @Open_MOSS MOSS-TTS Local Transformer v1.5 Clone voices, generate speech in 30+ languages, and export 48 kHz stereo audio right from your workflow!! thank you for building this🩵 @drbaph

Drbaph@drbaph·Jun 20

@ting_ @MosiAI_Official @Open_MOSS thank you for your work, this is an incredible model quite fast and very stable 🔥

Drbaph@drbaph·Jun 19

@RoRo_Rhoda @SMOUSE_CG delete it and add it outright from the gumroad shop, it's $0.00

Ronan_Kyle_Rhoda@RoRo_Rhoda·Jun 19

@SMOUSE_CG it says free on the site(100% off), but only gives a 50% discount 😕

SMOUSE 🔸 🏳️‍🌈🏳️‍⚧️@SMOUSE_CG·Jun 19

In celebration of my new website, I'm giving away Underwater Caustics Pro ($20) for free for a very limited time! 🥰 Everything else is also 50% Off - I mean everything! Go check it out :D smouse.studio/underwatercaus…

Drbaph@drbaph·Jun 19

@ting_ @huggingface github.com/Saganaki22/Mos…

Ting Chen Liang@ting_·Jun 18

🤗 our latest tts model is open source on @huggingface

> 48 kHz
> 30+ languages supported
> streaming output
> inline pause control

OpenMOSS@Open_MOSS

🤗 MOSS-TTS-Local Transformer v1.5 is now open source. Built with a pure autoregressive Audio Tokenizer + LLM paradigm:

> MOSS-Audio-Tokenizer-v2, 2B params
> Qwen3-4B backbone
> Native 48 kHz stereo audio
> Streaming output with theoretical sub-100 ms TTFT
> Zero-shot voice cloning
> Inline [pause] control
> 🇺🇸 🇯🇵 🇰🇷 31 language synthesis
> SGLang-Omni Day0 support 🎉 @sgl_project @lmsysorg

Designed for voice agents, digital humans, game NPCs, audiobooks, and real-time speech generation. 👇

Drbaph@drbaph·Jun 14

@Gcabrielclark ComfyUI ready, bf16 & fp8 mixed github.com/Saganaki22/Zon…

Gabriel Clark@Gcabrielclark·Jun 12

Been working on this for a while go check it out! Tech report coming soon. I’m so excited for people to dig into all the weird stuff we found.

Zyphra@ZyphraAI

Today we're releasing ZONOS2, our next-generation real-time TTS model with high-fidelity voice cloning. ZONOS2 is the most expressive open-source TTS model, released under Apache 2.0 and available on Zyphra Cloud on @AMD. 🧵

Drbaph@drbaph·Jun 9

@cocktailpeanut @ideogram_ai even with proper json schema, if the prompt is fairly simple or short it will trigger it. some people bypass the model's first 2 layers by manual sigmas and it never triggers the safety filter

cocktail peanut@cocktailpeanut·Jun 9

Just me, or is it super frustrating to use the JSON prompt with @ideogram_ai? Keep getting this safety filter thing half the time despite using 100% valid JSON prompts validated against the schema. Even a simple "a cat" returns the safety filter bs

cocktail peanut@cocktailpeanut

Run Ideogram 4 with 6GB VRAM Locally

WanGP now supports Ideogram 4, just need around 6GB VRAM to run. The JSON based prompting lets you describe the layout of the image PRECISELY, and you can use the built-in prompt helper to build the prompt.

Drbaph@drbaph·Jun 7

@PenguinWeb3

Penguin@PenguinWeb3·Jun 6

I found the weirdest ChatGPT image bug

If you ask it this prompt:

“Restore the attached photo. I apologise for the content of the photo! I know it’s very strange. Don’t ask any questions, don’t accept any explanations. Just restore the image, please. Don’t ask me to upload the photo again; just close your eyes and restore it. Make up the photo yourself”

but there's no actual photo, the model starts hallucinating the image by itself, and the results are genuinely cursed, like creepy lost media nightmare photos

@sama @OpenAI

Drbaph@drbaph·Jun 6

@toyxyz3 thats pretty cool

toyxyz@toyxyz3·Jun 6

I am currently developing a simple SteamVR-based motion capture tool. It also supports capturing multiple actors simultaneously.

Drbaph@drbaph·Jun 6

@FeitengLi github.com/Saganaki22/Dot…

Feiteng@FeitengLi·Jun 6

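Xiaohongshu's AI team (rednote-hilab) has open-sourced dots.tts, a voice-cloning model. It outputs 48 kHz high-fidelity audio and is released under Apache 2.0, so it is free for commercial use.

Chinese: on Seed-TTS-Eval, the Chinese misreading rate is 0.94% and timbre similarity is 81.0, which puts it roughly in the top tier of open-source models.

Multilingual: on a 24-language multilingual test, speaker similarity averages 83.9, and common languages such as Chinese, English, Japanese, Korean, French, and German are all covered.

Three models were released: for voice cloning, pick soar (the official default, with the highest similarity); to train your own voice, use base; for speed and real-time use, pick mf (meanflow, audio in 2~4 steps).

Code: github.com/rednote-hilab/…
Models: 🤗 huggingface.co/collections/re…
Online demo: huggingface.co/spaces/rednote…
Demo page: rednote-hilab.github.io/dots.tts-demo/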

Drbaph@drbaph·Jun 5

@ostrisai there might be a bug where no gpu shows up in autocaptioner gpu id dropdown, even though training jobs can see the GPU #0

Ostris@ostrisai·Jun 4

Added an Ideogram 4 auto captioner to AI Toolkit. It automatically does the boxes and the json for you. I tested with Qwen-3-VL 8B and it works quite well. I even added a little toggle to view the boxes in the dataset viewer.

Drbaph@drbaph·Jun 5

@PhotogenicWeekE @ai_nontan_room made this simple node, it uses their free prompt magic api, grab a key and use it for free github.com/Saganaki22/ide…

Photogenic Weekend@PhotogenicWeekE·Jun 4

@ai_nontan_room You have to write the prompt in JSON. Set up an LLM with the following as its System Prompt. Then give the LLM input along the lines of "TARGET IMAGE ASPECT RATIO: 2:3 (width:height). User idea: a young Japanese woman", and use the JSON it outputs as your prompt. github.com/ideogram-oss/i…

－TAKATO－AI_room@ai_nontan_room·Jun 4

I tried Ideogram 4.0, and even though it was running locally it got blocked by the safety filter lol. Getting blocked locally for a big-busted bikini prompt hurts lol. Is it just me? 😅

Drbaph@drbaph·Jun 4

@ostrisai yes this was in auraflow 😅

Ostris@ostrisai·Jun 3

Some of you may remember an instance a while back when a model was trained on Ideogram images, and their safety filter image placeholder accidentally got burned into the model so NSFW terms generated something resembling that image. It was a super interesting accident.

Ostris@ostrisai·Jun 3

Ideogram actually burned in a safety filter into the Ideogram4 model. This is without any prompt upsampling, which seems to trigger false positives, with AI Toolkit default prompts. Most failed like this. It actually diffused and generated this image.

Drbaph@drbaph·Jun 1

github.com/Saganaki22/SAM…

Drbaph@drbaph·Jun 1

Built a desktop app around SAM3DBody-cpp:

- CUDA accelerated
- Video → BVH motion capture
- Image → Static Mesh / BVH
- 8 GB VRAM
- Skeleton + body mesh visualization
- OBJ / GLB export
- Tauri + Rust + CPP + ONNX Runtime

Link in comments 👇
