






no and this is nowhere close to "Pixar-grade"




How Fast is Gemma 4 on a MacBook Pro M4? Benchmarking Google's new MoE (26B-A4B)

> Model size: 26.1 GiB
> Load time: ~4.2s

Comparing single request vs concurrent requests performance
> 32k total context, 4 parallel slots

Single request behavior:
> TTFT: 5.68s
> prompt: 3,701 tokens @ 652 tok/s
> decode: 40.08 tok/s

Sequential (1 request at a time):
> avg duration: 20.5s
> p99: 22.1s
> throughput: 40.11 tok/s
> clean finishes: 100%

Concurrent (4 parallel requests):
> aggregate throughput: 47.25 tok/s
> total system throughput: 262.27 tok/s
> avg duration: 65.1s
> p95 latency: 68.8s
> req/sec: 0.058

Head-to-Head: Sequential vs Concurrent

Throughput:
> 40.11 tok/s → 47.25 tok/s (+17.8%)
> small gain despite 4x parallelism

Latency per request:
> 20.5s → 65.1s (~3.2x slower)
> you pay heavily for concurrency

System throughput (true utilization):
> ~40 tok/s → 262 tok/s (~6.5x total output)
> this is where concurrency wins

Tokens per second (decode ceiling):
> ~40 tok/s steady in both modes
> hardware-bound, not scheduler-bound

TTFT impact:
> ~5.7s baseline → buried under queueing in concurrent
> queue wait time becomes the bottleneck

What this actually means:
- You don't get linear scaling from parallel slots
- You trade latency for total output
- The Mac's unified memory setup is clearly saturating
- Bandwidth + scheduling overhead show up immediately

This is exactly why GPUs dominate here: concurrency without killing latency.
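The head-to-head ratios quoted above follow directly from the raw numbers in the benchmark output; a quick sanity-check sketch (figures taken verbatim from the post):

```python
# Sanity-check the head-to-head ratios from the benchmark numbers above.
seq_tput = 40.11    # tok/s, sequential throughput
conc_tput = 47.25   # tok/s, concurrent aggregate throughput
seq_dur = 20.5      # s, sequential avg request duration
conc_dur = 65.1     # s, concurrent avg request duration
sys_tput = 262.27   # tok/s, total system throughput across 4 slots

tput_gain = (conc_tput - seq_tput) / seq_tput * 100  # +17.8%
latency_penalty = conc_dur / seq_dur                 # ~3.2x slower per request
system_gain = sys_tput / seq_tput                    # ~6.5x total output

print(f"throughput gain: +{tput_gain:.1f}%")
print(f"latency penalty: {latency_penalty:.1f}x")
print(f"system throughput gain: {system_gain:.1f}x")
```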

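The sequential-vs-concurrent measurement pattern itself is simple to reproduce. A minimal sketch, assuming a stand-in `fake_request` in place of a real call to a local LLM server: because the simulated work here is pure sleep, it parallelizes almost perfectly, which is exactly the idealization the real memory-bandwidth-bound numbers fall short of.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(n_tokens=820, tok_per_s=40.0):
    """Simulated decode; stand-in for a real local LLM server call."""
    time.sleep(n_tokens / tok_per_s / 100)  # scaled down 100x to keep the demo fast
    return n_tokens

def bench(n_requests=4, parallel=1):
    """Run n_requests with `parallel` workers and return aggregate tok/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        tokens = sum(pool.map(lambda _: fake_request(), range(n_requests)))
    return tokens / (time.perf_counter() - start)

seq = bench(parallel=1)
conc = bench(parallel=4)
print(f"sequential: {seq:.0f} tok/s, concurrent: {conc:.0f} tok/s")
```

In this idealized simulation concurrent throughput scales near-linearly with the number of slots; on the real M4 it gained only ~18% per-request throughput, because all four slots contend for the same unified-memory bandwidth.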





We are planning to open-source the Qwen3.6 models (particularly medium-sized versions) to facilitate local deployment and customization for developers. Please vote for the model size you are **most** anticipating—the community’s voice is vital to us!














💥Fresh polls ahead of Hungary’s April 12 parliamentary elections show the opposition TISZA party’s lead over Viktor Orbán’s Fidesz is holding or widening. Under normal democratic conditions, it’s hard to see Orbán closing such a large gap. But this campaign isn't normal at all.









