Ivan Zhang
@izzycodev
347 posts

Developer, Failed Youtuber, and trying to fail at trading and AI

Joined March 2026
67 Following · 274 Followers

Ivan Zhang @izzycodev ·
>dflash on mlx w/ qwen 3.5 27b
>m5 max, 128gb
>4x faster
>cool cool cool
>time for 3.6 27b
0 replies · 0 reposts · 1 like · 163 views
Ivan Zhang @izzycodev ·
@iotcoi I'm so curious, was it easy to set up? Let me know what repos you're using to get this going. Much appreciated.
1 reply · 0 reposts · 3 likes · 4.6K views
Mitko Vasilev @iotcoi ·
Qwen3.6-27B-FP8 + Dflash + DDTree, 256k context, 10 agents. ~200 tokens/sec max decode, 136 t/s average, on a single tiny GB10 GPU at 49 W.
57 replies · 61 reposts · 763 likes · 74.5K views
Ivan Zhang @izzycodev ·
The next wave isn't bigger cloud models. It's smaller, faster, local ones. Models that run on your hardware. Your data never leaves. No API bills. No rate limits. No outages. The gap between local and frontier is closing faster than anyone expected. The people getting comfortable with local inference right now are going to have a massive advantage in 12 months. Don't be behind.
0 replies · 0 reposts · 1 like · 32 views
Ivan Zhang @izzycodev ·
@witcheer Lovely, I had it hooked up to an M1 Max to test things together, but it's pretty solid as is.
0 replies · 0 reposts · 2 likes · 835 views
Mike Gannotti @MichaelGannotti ·
All my AI are now switched to @Kimi_Moonshot K2.6 and all three are cooking on separate projects. In separate news, I need some mini PCs and a switcher to clean up this mess 😜😜
5 replies · 2 reposts · 19 likes · 1.2K views
Joey @aijoey ·
DGX Spark video I should have posted first, lol. Like I said, I'm not a real video person. I love and appreciate tech. Have to keep learning and applying. And the fastest way for me to do that is jump in the water and build. @NVIDIA_AI_PC @microcenter
9 replies · 2 reposts · 34 likes · 2.8K views
Zhijian Liu @zhijianliu_ ·
DFlash for Qwen3.6-35B-A3B just dropped ⚡ The community was running the day-1 preview before we even finished training. Now it's done:
✅ Training complete
✅ Validation passed
✅ Weights finalized
↓ Go build
github.com/z-lab/dflash
huggingface.co/z-lab/Qwen3.6-…
Zhijian Liu @zhijianliu_ (quoted):
🔥 DFlash x MLX is happening! Shoutout to @aryagm01 for the early work on this. We're building on the momentum. Native MLX support, more models (Qwen3.5), up to 4x faster. Lossless! 👉 github.com/z-lab/dflash
36 replies · 73 reposts · 681 likes · 89K views
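The "lossless" claim above is worth unpacking. The thread doesn't spell out DFlash's actual mechanism, but "faster yet lossless" is the signature of draft-and-verify speculative decoding: a cheap drafter proposes several tokens, the target model verifies them in one pass, and only tokens the target itself would have produced are kept, so the output is bit-identical to plain target decoding. A minimal sketch, assuming greedy decoding and using toy stand-in `target`/`draft` functions over integer tokens (both hypothetical, not DFlash's real interface):

```python
# Toy sketch of lossless draft-and-verify decoding (the general idea behind
# speculative decoders; the real DFlash algorithm may differ in the details).

def speculative_decode(target, draft, prompt, n_tokens, k=4):
    """Generate n_tokens greedily, verifying k-token drafts per round."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft k tokens with the cheap model.
        spec = []
        for _ in range(k):
            spec.append(draft(out + spec))
        # 2) Verify: keep the longest prefix the target agrees with.
        accepted = 0
        for i in range(k):
            if target(out + spec[:i]) == spec[i]:
                accepted += 1
            else:
                break
        out += spec[:accepted]
        # 3) On a mismatch (or after a full accept), emit one target token,
        #    so the output always equals the target's own greedy decode.
        if len(out) - len(prompt) < n_tokens:
            out.append(target(out))
    return out[len(prompt):len(prompt) + n_tokens]

# Stand-in next-token functions: the draft agrees with the target on
# roughly two out of every three positions.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: (ctx[-1] + 1) % 10 if len(ctx) % 3 else (ctx[-1] + 2) % 10

print(speculative_decode(target, draft, [0], 8))  # identical to greedy target
```

The speedup comes from step 2: one target pass can accept several drafted tokens at once, while correctness is guaranteed because acceptance is defined as "exactly what the target would have said."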
Ivan Zhang @izzycodev ·
Trying to get @__tinygrad__ running with my M4 mini and a 5070. Seeing if it'll run a decent enough model for Hermes Agent from @NousResearch. Don't know if 12GB is going to cut it.
0 replies · 0 reposts · 2 likes · 157 views
송준 Jun Song @songjunkr ·
Sharing the local LLM setup I use personally. Hardware: Mac Studio M2 Ultra 64GB. Models loaded: SuperQwen3.6 35b mlx 4bit (90 tok/s); Ernie Image Turbo (image generation model). Hermes Agent + MLX-LM + GPT Codex (coding), Gemini (chat, images) 🧵
28 replies · 23 reposts · 395 likes · 23K views
Ivan Zhang @izzycodev ·
More data on Qwen3.6-35B-A3B 8-bit, this time with continuous batching. What batching actually does: instead of running requests one by one, the model processes them concurrently. You're already paying to stream weights through the GPU; might as well serve 8 users at once. The numbers:
→ 1x = 99 tok/s decode
→ 4x = 240 tok/s (+2.4x throughput)
→ 8x = 307 tok/s (+3x throughput)
Anyone else enjoying this model?
0 replies · 0 reposts · 1 like · 96 views
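The sublinear scaling in those numbers (4x batch buys only ~2.4x throughput) follows from a simple cost model: each decode step pays a fixed price to stream the weights through the GPU, plus a small per-request compute cost. Batching amortizes the fixed part but not the per-request part. A minimal sketch, with `weight_ms` and `per_req_ms` being illustrative constants loosely fitted to the tweet's figures, not measurements:

```python
# Toy cost model of continuous-batching decode throughput.
# One decode step = stream full weights once (fixed cost, dominates at
# small batch) + a small per-request cost; the step emits `batch` tokens.

def decode_throughput(batch, weight_ms=8.0, per_req_ms=2.2):
    """Aggregate tokens/sec across the whole batch."""
    step_ms = weight_ms + per_req_ms * batch
    return batch * 1000.0 / step_ms

for b in (1, 4, 8):
    print(f"{b}x -> {decode_throughput(b):.0f} tok/s")
```

With these constants the curve lands near 98 / 238 / 313 tok/s for batch 1 / 4 / 8 — the same shape as the tweet's 99 / 240 / 307: throughput keeps rising with batch size but with diminishing returns, because the per-request term grows while the weight-streaming term is paid once per step regardless.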
Vikoo @vikrambuilds ·
Is this enough?
35 replies · 1 repost · 46 likes · 1.7K views