Ivan Zhang
@izzycodev
347 posts

Developer, Failed Youtuber, and trying to fail at trading and AI

Joined March 2026
67 Following · 274 Followers

Ivan Zhang @izzycodev ·
>dflash on mlx w/ qwen 3.5 27b
>m5 max, 128gb
>4x faster
>cool cool cool
>time for 3.6 27b
0 replies · 0 reposts · 1 like · 163 views
Ivan Zhang @izzycodev ·
@iotcoi I'm so curious, was it easy to set up? Let me know what repos you're using to get this going. Much appreciated.
1 reply · 0 reposts · 3 likes · 4.6K views
Mitko Vasilev @iotcoi ·
Qwen3.6-27B-FP8 + Dflash + DDTree, 256k context, 10 agents. ~200 tokens/sec max decode, 136 t/s average, on a single tiny GB10 GPU at 49 W.
57 replies · 61 reposts · 763 likes · 74.5K views
Ivan Zhang @izzycodev ·
The next wave isn't bigger cloud models. It's smaller, faster, local ones. Models that run on your hardware. Your data never leaves. No API bills. No rate limits. No outages. The gap between local and frontier is closing faster than anyone expected. The people getting comfortable with local inference right now are going to have a massive advantage in 12 months. Don't be behind.
0 replies · 0 reposts · 1 like · 32 views
Ivan Zhang @izzycodev ·
@witcheer Lovely, I had it hooked up to an M1 Max to test things together, but it's pretty solid as is.
0 replies · 0 reposts · 2 likes · 835 views
Mike Gannotti @MichaelGannotti ·
All my AI are now switched to @Kimi_Moonshot K2.6 and all three are cooking on separate projects. In separate news, I need some mini PCs and a switcher to clean up this mess 😜😜
5 replies · 2 reposts · 19 likes · 1.2K views
Joey @aijoey ·
DGX Spark video I should have posted first, lol. Like I said, I'm not a real video person. I love and appreciate tech. Have to keep learning and applying. And the fastest way for me to do that is jump in the water and build. @NVIDIA_AI_PC @microcenter
9 replies · 2 reposts · 34 likes · 2.8K views
Zhijian Liu @zhijianliu_ ·
DFlash for Qwen3.6-35B-A3B just dropped ⚡ The community was running the day-1 preview before we even finished training. Now it's done:
✅ Training complete
✅ Validation passed
✅ Weights finalized
↓ Go build
github.com/z-lab/dflash
huggingface.co/z-lab/Qwen3.6-…
Zhijian Liu @zhijianliu_ (quoted):
🔥 DFlash x MLX is happening! Shoutout to @aryagm01 for the early work on this. We're building on the momentum. Native MLX support, more models (Qwen3.5), up to 4x faster. Lossless! 👉 github.com/z-lab/dflash
36 replies · 73 reposts · 681 likes · 89K views
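The "lossless" claim above is worth unpacking. The thread doesn't spell out DFlash's actual mechanism, but "faster yet lossless" is the signature of draft-and-verify speculative decoding: a cheap drafter proposes several tokens, the target model verifies them in one pass, and only tokens the target itself would have produced are kept, so the output is bit-identical to plain target decoding. A minimal sketch, assuming greedy decoding and using toy stand-in `target`/`draft` functions over integer tokens (both hypothetical, not DFlash's real interface):

```python
# Toy sketch of lossless draft-and-verify decoding (the general idea behind
# speculative decoders; the real DFlash algorithm may differ in the details).

def speculative_decode(target, draft, prompt, n_tokens, k=4):
    """Generate n_tokens greedily, verifying k-token drafts per round."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft k tokens with the cheap model.
        spec = []
        for _ in range(k):
            spec.append(draft(out + spec))
        # 2) Verify: keep the longest prefix the target agrees with.
        accepted = 0
        for i in range(k):
            if target(out + spec[:i]) == spec[i]:
                accepted += 1
            else:
                break
        out += spec[:accepted]
        # 3) On a mismatch (or after a full accept), emit one target token,
        #    so the output always equals the target's own greedy decode.
        if len(out) - len(prompt) < n_tokens:
            out.append(target(out))
    return out[len(prompt):len(prompt) + n_tokens]

# Stand-in next-token functions: the draft agrees with the target on
# roughly two out of every three positions.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: (ctx[-1] + 1) % 10 if len(ctx) % 3 else (ctx[-1] + 2) % 10

print(speculative_decode(target, draft, [0], 8))  # identical to greedy target
```

The speedup comes from step 2: one target pass can accept several drafted tokens at once, while correctness is guaranteed because acceptance is defined as "exactly what the target would have said."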
Ivan Zhang @izzycodev ·
Trying to get @__tinygrad__ running with my M4 mini and a 5070. Seeing if it'll run a decent enough model for Hermes Agent from @NousResearch. Don't know if 12GB is going to cut it.
0 replies · 0 reposts · 2 likes · 157 views
송준 Jun Song @songjunkr ·
Sharing the local LLM setup I use personally. Hardware: Mac Studio M2 Ultra 64GB. Models loaded: SuperQwen3.6 35b mlx 4bit (90 tok/s); Ernie Image Turbo (image generation model). Hermes Agent + MLX-LM + GPT Codex (coding), Gemini (chat, images) 🧵
28 replies · 23 reposts · 395 likes · 23K views
Ivan Zhang @izzycodev ·
More data on Qwen3.6-35B-A3B 8-bit, this time with continuous batching. What batching actually does: instead of running requests one by one, the model processes them concurrently. You're already paying to stream weights through the GPU; might as well serve 8 users at once. The numbers:
→ 1x = 99 tok/s decode
→ 4x = 240 tok/s (+2.4x throughput)
→ 8x = 307 tok/s (+3x throughput)
Anyone else enjoying this model?
0 replies · 0 reposts · 1 like · 96 views
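The sublinear scaling in those numbers (4x batch buys only ~2.4x throughput) follows from a simple cost model: each decode step pays a fixed price to stream the weights through the GPU, plus a small per-request compute cost. Batching amortizes the fixed part but not the per-request part. A minimal sketch, with `weight_ms` and `per_req_ms` being illustrative constants loosely fitted to the tweet's figures, not measurements:

```python
# Toy cost model of continuous-batching decode throughput.
# One decode step = stream full weights once (fixed cost, dominates at
# small batch) + a small per-request cost; the step emits `batch` tokens.

def decode_throughput(batch, weight_ms=8.0, per_req_ms=2.2):
    """Aggregate tokens/sec across the whole batch."""
    step_ms = weight_ms + per_req_ms * batch
    return batch * 1000.0 / step_ms

for b in (1, 4, 8):
    print(f"{b}x -> {decode_throughput(b):.0f} tok/s")
```

With these constants the curve lands near 98 / 238 / 313 tok/s for batch 1 / 4 / 8 — the same shape as the tweet's 99 / 240 / 307: throughput keeps rising with batch size but with diminishing returns, because the per-request term grows while the weight-streaming term is paid once per step regardless.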
Vikoo @vikrambuilds ·
Is this enough?
35 replies · 1 repost · 46 likes · 1.7K views