stevibe

2.5K posts

@stevibe

Fullstack | LLM | Local AI addict | Learning ML | Builds things nobody asked for. Benchmarks things for fun.

Joined July 2009
1.5K Following · 12.4K Followers
Pinned Tweet
stevibe @stevibe ·
Claude Sonnet 4.6, when asked in Chinese "你是什么模型?" (What model are you?), confidently replies: "我是 DeepSeek。" (I am DeepSeek.) This is the same model whose company just accused DeepSeek of "industrial-scale distillation attacks".
353 replies · 1.3K reposts · 13.1K likes · 1.9M views
stevibe @stevibe ·
@digitalix Always set up as new; it's the only chance you get with a new Mac.
0 replies · 0 reposts · 0 likes · 44 views
Alex Ziskind @digitalix ·
for the first time in 6 or more years, I'm thinking about this choice
13 replies · 1 repost · 43 likes · 6.6K views
stevibe @stevibe ·
I gave 6 frontier coding models the same task: turn this emoji into an SVG, from scratch, in real time. Watching them stream their thinking before a single shape appears is wild — some plan meticulously, others just wing it.
Models:
- GPT-5.3 Codex
- Claude Opus 4.6
- Gemini 3.1 Pro
- MiniMax M2.7
- GLM-5
- Kimi K2.5
8 replies · 2 reposts · 54 likes · 6.8K views
stevibe @stevibe ·
Ok but the real winner here: Kimi and GLM both put wasabi inside the sushi 🍣
[tweet media]
0 replies · 0 reposts · 4 likes · 357 views
stevibe @stevibe ·
NVIDIA just dropped Nemotron-3-Nano:4b — a tiny 2.8GB model. Guess whose hardware runs it the fastest?
- RTX 4090: 226 tok/s
- RTX 3090: 187 tok/s
- Mac Studio M2 Ultra: 86 tok/s
- Mac Mini M4: 25 tok/s
Home court advantage is real. Also trying a new layout with live performance charts. Lmk what you think!
78 replies · 133 reposts · 1.1K likes · 120.9K views
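For anyone wanting to reproduce tok/s numbers like these: Ollama's `/api/generate` endpoint reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds) in its final response, so throughput is a one-line division. A minimal sketch; the model tag in the commented example is illustrative, not a confirmed Ollama tag.

```python
import json
import urllib.request


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Ollama reports eval_duration in nanoseconds; convert to tokens/second."""
    return eval_count / (eval_duration_ns / 1e9)


def benchmark(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Run one non-streamed generation and return decode throughput in tok/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    return tokens_per_second(stats["eval_count"], stats["eval_duration"])


if __name__ == "__main__":
    # With a local Ollama server running, e.g. (model tag is a guess):
    # print(benchmark("nemotron-3-nano:4b", "Write a haiku about GPUs."))
    print(tokens_per_second(1130, 5_000_000_000))  # 1130 tokens in 5 s -> 226.0
```

Prompt-processing (prefill) speed is reported separately as `prompt_eval_count` / `prompt_eval_duration`, so the figure above isolates decode speed.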
rukasufall @rukasufall ·
@stevibe The RTX 5070 Ti did around 214 tok/s. I'm really liking the capabilities of this nano.
2 replies · 0 reposts · 3 likes · 576 views
stevibe @stevibe ·
@zhaoxiongding Definitely. We're just comparing one of the factors here.
0 replies · 0 reposts · 0 likes · 123 views
Ding @zhaoxiongding ·
@stevibe People don't run a model because it's fast; people run a model because it's good.
1 reply · 0 reposts · 0 likes · 155 views
stevibe @stevibe ·
@Nice1774036 Hey, if you want to run local models, the easiest options are Ollama (the one this test uses) or LM Studio; for advanced usage, llama.cpp and vLLM are good choices.
1 reply · 0 reposts · 1 like · 15 views
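For a beginner following the advice above, the shortest path is Ollama's local REST API: pull a model with `ollama pull`, then send a prompt to `/api/generate`. A minimal sketch assuming a default Ollama install on port 11434; the model tag is an example, not a recommendation from the thread.

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434"


def build_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """Send one prompt to the local Ollama server and return the full reply."""
    data = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA}/api/generate", data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    # After `ollama pull qwen3.5:4b` (example tag), uncomment to run:
    # print(generate("qwen3.5:4b", "Say hello in five words."))
    print(build_request("qwen3.5:4b", "Say hello in five words."))
```

LM Studio works similarly but exposes an OpenAI-compatible endpoint instead, so the same idea applies with a different URL and payload shape.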
Nice @Nice1774036 ·
@stevibe How can I use it from the beginning? 😺 First time visiting your profile, and I want to learn something new. 😔 Have you uploaded any episode or video I can watch, so I can research it deeply and use it?
1 reply · 0 reposts · 0 likes · 11 views
stevibe @stevibe ·
You don't need a cloud API for great OCR anymore. GLM-OCR runs locally with just ~2GB VRAM, handles tables and math equations, and hits ~260 tok/s on a Mac Studio M2 Ultra. Local models are getting better AND smaller at a crazy pace. If you have a GPU or a Mac, you're already ready for the AI era. @Zai_org
49 replies · 181 reposts · 1.9K likes · 136.4K views
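Running a local OCR model like this typically means sending a base64-encoded image to a multimodal endpoint. A sketch assuming the model is served through Ollama, whose `/api/generate` accepts an `images` list of base64 strings; the model tag passed in is a hypothetical placeholder (the tweet doesn't say how GLM-OCR was served).

```python
import base64
import json
import urllib.request


def build_ocr_request(model: str, image_path: str,
                      prompt: str = "Extract all text from this image.") -> dict:
    """Ollama's /api/generate accepts base64-encoded images for multimodal models."""
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode("ascii")
    return {"model": model, "prompt": prompt, "images": [img_b64], "stream": False}


def ocr(model: str, image_path: str, host: str = "http://localhost:11434") -> str:
    """Run one OCR pass against a locally served multimodal model."""
    data = json.dumps(build_ocr_request(model, image_path)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Usage (model tag is hypothetical): ocr("glm-ocr", "invoice.png")
```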
stevibe @stevibe ·
@changtimwu The real optimised version would be NVFP4, but I'm using a normal Q4 quant here.
0 replies · 0 reposts · 1 like · 127 views
Tim Wu @changtimwu ·
@stevibe It seems the Nemotron series has been optimized for the NV arch? Or do gaming GPUs have an advantage executing SLMs around 4B?
1 reply · 0 reposts · 0 likes · 159 views
Abby @abbly298 ·
@stevibe @Alibaba_Qwen I tested MLX Qwen3.5 9B on the 2020 13-inch MacBook Pro with the M1 chip and 16GB of RAM. It reached 13 tok/s using 5.116 GB of RAM, which is double what Ollama can do! I got 23 tok/s for Qwen3.5 4B 4-bit, using 2.456 GB of RAM.
1 reply · 0 reposts · 1 like · 36 views
stevibe @stevibe ·
Qwen3.5:9b reasoning head-to-head:
- Mac Studio M2 Ultra 64GB: 43.08 tok/s
- Mac Mini M4 16GB: 13.07 tok/s
@Alibaba_Qwen
64 replies · 150 reposts · 2K likes · 240.7K views
stevibe @stevibe ·
@meta_alex Yeah, my DGX Spark is arriving next week; will add it in future tests!
2 replies · 0 reposts · 6 likes · 2K views
Alex Skinner @meta_alex ·
@stevibe Now try it on a DGX Spark or any other unified-memory setup. It's not just NVIDIA vs. others; these are pure native-VRAM cards.
3 replies · 0 reposts · 4 likes · 2.1K views
Jimsta @Jimster4801 ·
@stevibe What is your prompt here, exactly?
1 reply · 0 reposts · 0 likes · 725 views
Ernest Yeung @ernestyalumni ·
@stevibe OK, noted. BTW, how do you run the benchmark to get the tok/s stat?
1 reply · 0 reposts · 0 likes · 157 views
stevibe @stevibe ·
@nuvolore All tests use the same prompt, but I didn't touch the temperature and top_p settings in this test; will consider it next time. Thanks for suggesting!
1 reply · 0 reposts · 4 likes · 379 views
Lorenzo Nuvoletta @nuvolore ·
@stevibe Have you tried setting temperature to 0, top_p to 1, and an identical seed?
1 reply · 0 reposts · 1 like · 390 views
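The reproducibility settings @nuvolore suggests map directly onto Ollama's `options` object, which accepts `temperature`, `top_p`, and `seed` per request. A minimal sketch of the payload; the model tag in the comment is illustrative. Note these settings make outputs identical across runs (useful for comparing quality), while tok/s throughput is largely unaffected by them.

```python
def build_repro_request(model: str, prompt: str, seed: int = 42) -> dict:
    """Greedy, seeded sampling so repeated runs emit identical token streams."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # temperature=0 removes sampling randomness; top_p=1 disables nucleus
        # truncation; a fixed seed pins any remaining tie-breaking.
        "options": {"temperature": 0, "top_p": 1, "seed": seed},
    }

# POST this dict as JSON to http://localhost:11434/api/generate,
# e.g. build_repro_request("qwen3.5:9b", "Solve: 17 * 23")  # tag is an example
```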