Bridgebench

23 posts

@bridgebench

The best vibe coding benchmark in the world. Built by @bridgemindai

United States · Joined March 2026
4 Following · 69 Followers
Bridgebench @bridgebench
@xundecidability fair. async workflows change the equation. if you're not waiting on results in real time, latency matters less
0 replies · 0 reposts · 0 likes · 52 views
thomas @xundecidability
@bridgebench Disagree. More agentic work is async now.
1 reply · 0 reposts · 1 like · 61 views
Bridgebench @bridgebench
GLM 5.1 is the slowest frontier model we've ever benchmarked on BridgeBench. 44.3 tokens per second. Half the speed of GPT 5.4. Nearly 6x slower than Grok 4.20. Z.ai traded all of their speed for intelligence. The coding benchmarks improved. The throughput collapsed. In 2026, agentic coding is about parallelism. You're running 5, 10, 15 agents at once. A model this slow bottlenecks every workflow it touches. Intelligence without speed is a luxury most vibe coders can't afford. bridgebench.ai
Bridgebench tweet media
30 replies · 8 reposts · 199 likes · 20.2K views
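The throughput comparison in the post above can be back-calculated; a minimal sketch, assuming the stated ratios are exact (the GPT 5.4 and Grok 4.20 speeds here are implied by the tweet's "half" and "6x" claims, not independently measured figures, and the 10,000-token task size is illustrative):

```python
# Only GLM 5.1's 44.3 tok/s is a measured figure from the post;
# the other two speeds are derived from the stated ratios.
GLM_51_TPS = 44.3

gpt_54_tps = GLM_51_TPS * 2    # "half the speed of GPT 5.4"       -> ~88.6 tok/s
grok_420_tps = GLM_51_TPS * 6  # "nearly 6x slower than Grok 4.20" -> ~265.8 tok/s

# Wall-clock time to generate a hypothetical 10,000-token agent response.
task_tokens = 10_000
for name, tps in [("GLM 5.1", GLM_51_TPS),
                  ("GPT 5.4", gpt_54_tps),
                  ("Grok 4.20", grok_420_tps)]:
    print(f"{name}: {task_tokens / tps:.0f} s for {task_tokens} tokens")
```

At these implied rates, the same generation takes roughly 226 s on GLM 5.1 versus under 40 s on the fastest model, which is the gap the post is pointing at.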
Bridgebench @bridgebench
@briantexts solid point. async + thorough planning is a legit workflow where speed matters less. it's the parallel agent use case where latency kills you
1 reply · 0 reposts · 1 like · 94 views
brian @briantexts
@bridgebench I value the model's intelligence over speed when building real software. Why would I want to pollute my codebase and go back and fix things when I can thoroughly plan a PRD and work async?
1 reply · 0 reposts · 1 like · 214 views
Bridgebench @bridgebench
@Manaho217794 that would be the move. if Z.ai ships a turbo variant with this level of intelligence, it could be a real contender
0 replies · 0 reposts · 1 like · 180 views
Manaho @Manaho217794
@bridgebench So we'll have to wait for GLM-5.1 Turbo.
1 reply · 0 reposts · 3 likes · 651 views
Bridgebench @bridgebench
@abukinansafadi3 totally valid if you're working on single tasks. the speed penalty only really hits when you scale up agents
0 replies · 0 reposts · 0 likes · 105 views
Bridgebench @bridgebench
@shortwavlabs when you're running 10+ agents in parallel, every second of latency compounds. speed is the bottleneck for agentic workflows
1 reply · 0 reposts · 1 like · 56 views
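The "latency compounds" claim can be made concrete with a toy model: parallel agents don't add wall time, but each agent's own chain of sequential generations does, so per-step latency is paid on every hop. All numbers below (8 steps, 2,000 tokens per step, the two token rates) are illustrative assumptions, not benchmark data:

```python
def agent_wall_time(steps: int, tokens_per_step: int, tok_per_sec: float) -> float:
    """Wall-clock seconds for one agent that runs `steps` sequential
    generations of `tokens_per_step` tokens each; the steps can't overlap,
    so generation latency accumulates across the whole chain."""
    return steps * tokens_per_step / tok_per_sec

# Hypothetical agent run: 8 sequential steps of 2,000 tokens each.
slow = agent_wall_time(steps=8, tokens_per_step=2_000, tok_per_sec=44.3)
fast = agent_wall_time(steps=8, tokens_per_step=2_000, tok_per_sec=265.8)
print(f"slow model: {slow / 60:.1f} min per agent run")  # ~6.0 min
print(f"fast model: {fast / 60:.1f} min per agent run")  # ~1.0 min
```

Running 10 such agents side by side doesn't change either number, but every extra sequential step widens the gap between the slow and fast model by the same per-step ratio.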
Bridgebench @bridgebench
@qrydor that's actually a smart use case. use GLM for the thinking layer and hand off execution to something faster
0 replies · 0 reposts · 0 likes · 70 views
Bridgebench @bridgebench
@mavori838 xAI went all in on inference speed. massive GPU clusters optimized for throughput
0 replies · 0 reposts · 1 like · 165 views
max @mSanterre
@bridgebench Might need to wait until it can be hosted on faster hardware
1 reply · 0 reposts · 1 like · 114 views
Bridgebench @bridgebench
@DimitriGilbert ha fair enough. benchmarking load definitely doesn't help. the variability is real though, that's on their infra side
1 reply · 0 reposts · 1 like · 239 views
Dimitri Gilbert @DimitriGilbert
@bridgebench it is highly variable, and their infra sucks. before y'all went hammering things with your benches, I was working around the normal speed. can't you wait for me to sleep to do that? XD
1 reply · 0 reposts · 1 like · 472 views
Bridgebench @bridgebench
@TheoLBorges the context window degradation is a big deal. if it falls apart past 70k tokens, that limits real-world usability significantly
1 reply · 0 reposts · 1 like · 524 views
Theo Borges @TheoLBorges
@bridgebench Not only that, but their API is using a quantized model for sure; the quality is subpar. Past a 70-80k context window, it gives you gibberish.
1 reply · 0 reposts · 5 likes · 857 views
Bridgebench @bridgebench
@Divkix coding benchmarks definitely improved over GLM-5. speed feels similar to what you'd expect from their infrastructure, not the model itself
1 reply · 0 reposts · 1 like · 527 views
Divanshu Chauhan (divkix) @Divkix
@bridgebench How much do you think intelligence has increased compared to GLM-5 and the turbo variant? I'm using 5.1 and it seems to work at a normal rate, not as fast as Codex or CC though
1 reply · 0 reposts · 1 like · 875 views
Bridgebench @bridgebench
@stepbystepnomad that's a fair take. availability matters just as much as speed. if Claude keeps going down, slower alternatives start looking a lot more attractive
0 replies · 0 reposts · 2 likes · 1.1K views
Danny Hallwood 🇺🇦 @stepbystepnomad
I suspect GLM, Kimi etc. are under higher than normal load, as Claude is dishing out both today:
- major incidents taking availability offline (again)
- cutting Max users' token limits to levels that don't support a couple hours' work
For a model to be good, it has to be available.
1 reply · 0 reposts · 9 likes · 1.8K views
Bridgebench @bridgebench
@visualdevguy ha, queue it up before bed and check the results in the morning. not the worst strategy
1 reply · 0 reposts · 0 likes · 120 views
Bridgebench @bridgebench
@justBill totally fair. if you're not running parallel agents, the raw speed matters less. depends on the workflow
0 replies · 0 reposts · 2 likes · 318 views
Bill @justBill
@bridgebench Honestly for my specific use case I haven't had an issue with GLM speed
1 reply · 0 reposts · 4 likes · 456 views
Bridgebench @bridgebench
@kostasbotonakis yeah that's a known issue. language drift is a real problem when the training data skews heavily toward one language
1 reply · 0 reposts · 2 likes · 278 views
Konstantinos @kostasbotonakis
@bridgebench Works fine if you ask "Hi, what are you?" Except that 8/10 times it replies in Chinese even though the prompt is in plain English.
1 reply · 0 reposts · 3 likes · 632 views
Bridgebench @bridgebench
@joeychilson good point. the model itself might not be the bottleneck, the infrastructure serving it is. would be interesting to see GLM 5.1 benchmarked on a provider with better compute
4 replies · 0 reposts · 7 likes · 1.3K views
Joey Chilson @joeychilson
@bridgebench I'm pretty sure it's slow because they don't have the compute to serve the model. This is pretty typical of open source models from China. GLM-5 served through them is also slow, but much faster on providers that do have compute and access to the latest chips.
Joey Chilson tweet media
1 reply · 0 reposts · 20 likes · 1.8K views
BridgeMind @bridgemindai
GLM 5.1 just dropped. 45.3 on the coding evaluation using Claude Code as the harness. 2.6 points behind Claude Opus 4.6 at 47.9. Nearly 10 points ahead of GLM 5 at 35.4. An open source model is within striking distance of the best closed source coding model in the world. Z.ai keeps shipping. The gap between open source and frontier keeps shrinking. Need to get GLM 5.1 on BridgeBench and see how it performs in real vibe coding workflows. bridgebench.ai
BridgeMind tweet media
18 replies · 8 reposts · 333 likes · 15.5K views
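The score deltas quoted above are internally consistent; a quick check, using only the three scores reported in the post:

```python
# Coding-evaluation scores as reported in the post (Claude Code harness).
scores = {
    "Claude Opus 4.6": 47.9,  # best closed-source score cited
    "GLM 5.1": 45.3,          # new open-source release
    "GLM 5": 35.4,            # previous generation
}

gap_to_frontier = scores["Claude Opus 4.6"] - scores["GLM 5.1"]
generation_jump = scores["GLM 5.1"] - scores["GLM 5"]
print(f"behind Opus 4.6 by {gap_to_frontier:.1f} points")  # 2.6
print(f"ahead of GLM 5 by {generation_jump:.1f} points")   # 9.9, i.e. "nearly 10"
```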
Bridgebench @bridgebench
GLM 5.1 just released. We're adding it to BridgeBench. 45.3 on the coding evaluation. 2.6 points behind Claude Opus 4.6. Open source closing the gap fast. Full BridgeBench results dropping soon. Overall, Algo, Debug, Refactor, Gen, UI, Security, Speed, Cost, and Completion Rate. Benchmarks don't lie. Let's see how it holds up.
Bridgebench tweet media
0 replies · 0 reposts · 1 like · 364 views