Bridgebench

23 posts

Bridgebench
@bridgebench

The best vibe coding benchmark in the world. Built by @bridgemindai

United States · Joined March 2026
4 Following · 61 Followers

Bridgebench @bridgebench
@xundecidability fair. async workflows change the equation. if you're not waiting on results in real time, latency matters less

thomas @xundecidability
@bridgebench Disagree. More agentic work is async now.

Bridgebench @bridgebench
GLM 5.1 is the slowest frontier model we've ever benchmarked on BridgeBench. 44.3 tokens per second. Half the speed of GPT 5.4. Nearly 6x slower than Grok 4.20. Z.ai traded all of their speed for intelligence. The coding benchmarks improved. The throughput collapsed. In 2026, agentic coding is about parallelism. You're running 5, 10, 15 agents at once. A model this slow bottlenecks every workflow it touches. Intelligence without speed is a luxury most vibe coders can't afford. bridgebench.ai

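The speed comparisons in the post above are just ratios of the quoted 44.3 tok/s figure. A quick sketch of what those ratios imply for wall-clock time — the turn count and tokens-per-turn are hypothetical, chosen only to illustrate the scale:

```python
# Back-of-the-envelope numbers implied by the post: GLM 5.1 at 44.3 tok/s,
# "half the speed of GPT 5.4", "nearly 6x slower than Grok 4.20".
GLM_TPS = 44.3
GPT_TPS = GLM_TPS * 2   # ~88.6 tok/s implied
GROK_TPS = GLM_TPS * 6  # ~265.8 tok/s implied

# Hypothetical agent task: 15 model turns, ~2,000 generated tokens per turn.
TURNS = 15
TOKENS_PER_TURN = 2_000

for name, tps in [("GLM 5.1", GLM_TPS), ("GPT 5.4", GPT_TPS), ("Grok 4.20", GROK_TPS)]:
    minutes = TURNS * TOKENS_PER_TURN / tps / 60
    print(f"{name:10s} {minutes:5.1f} min of pure generation per task")
```

Under these assumptions the slow model spends roughly 11 minutes per task on generation alone versus under 2 minutes for the fastest, which is the gap the post is pointing at.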
Bridgebench @bridgebench
@briantexts solid point. async + thorough planning is a legit workflow where speed matters less. it's the parallel agent use case where latency kills you

brian @briantexts
@bridgebench I value the model's intelligence over speed when building real software. Why would I want to pollute my codebase and go back and fix things when I can thoroughly plan a PRD and work async?

Bridgebench @bridgebench
@Manaho217794 that would be the move. if Z.ai ships a turbo variant with this level of intelligence, it could be a real contender

Manaho @Manaho217794
@bridgebench So we'll have to wait for GLM-5.1 Turbo.

Bridgebench @bridgebench
@abukinansafadi3 totally valid if you're working on single tasks. the speed penalty only really hits when you scale up agents

Bridgebench @bridgebench
@shortwavlabs when you're running 10+ agents in parallel, every second of latency compounds. speed is the bottleneck for agentic workflows

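The "latency compounds" claim can be sketched with a toy model: an agent loop is sequential, so per-turn generation time multiplies across turns, and a batch of parallel agents finishes no sooner than its slowest loop. Only the 44.3 tok/s figure comes from the thread; the other numbers are hypothetical:

```python
# Toy model of turn latency compounding in an agent loop.

def task_seconds(turns: int, tokens_per_turn: int, tokens_per_sec: float) -> float:
    """An agent loop is sequential: every turn waits on the previous response."""
    return turns * tokens_per_turn / tokens_per_sec

# Running 10 agents in parallel doesn't hide this: the batch finishes when the
# last loop does, so per-turn generation speed sets the floor for the batch.
slow = task_seconds(turns=12, tokens_per_turn=1_500, tokens_per_sec=44.3)
fast = task_seconds(turns=12, tokens_per_turn=1_500, tokens_per_sec=265.8)
print(f"44.3 tok/s: {slow/60:.1f} min per batch; 265.8 tok/s: {fast/60:.1f} min")
```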
Bridgebench @bridgebench
@qrydor that's actually a smart use case. use GLM for the thinking layer and hand off execution to something faster

Bridgebench @bridgebench
@mavori838 xAI went all in on inference speed. massive GPU clusters optimized for throughput

Bridgebench @bridgebench
@mSanterre agreed. better hardware could close the gap fast

max @mSanterre
@bridgebench Might need to wait until it can be hosted on faster hardware

Bridgebench @bridgebench
@DimitriGilbert ha fair enough. benchmarking load definitely doesn't help. the variability is real though, that's on their infra side

Dimitri Gilbert @DimitriGilbert
@bridgebench it is highly variable, and their infra sucks. before y'all went hammering things with your benches, I was working around the normal speed. can't you wait for me to sleep to do that? XD

Bridgebench @bridgebench
@TheoLBorges the context window degradation is a big deal. if it falls apart past 70k tokens, that limits real-world usability significantly

Theo Borges @TheoLBorges
@bridgebench Not only that, but their API is using a quantized model for sure; the quality is subpar. Past a 70-80k token context, it gives you gibberish.

Bridgebench @bridgebench
@Divkix coding benchmarks definitely improved over GLM-5. speed feels similar to what you'd expect from their infrastructure, not the model itself

Divanshu Chauhan (divkix) @Divkix
@bridgebench How much do you think intelligence has increased compared to GLM-5 and the turbo variant? I'm using 5.1 and it seems to work at a normal rate, not as fast as codex, cc tho

Bridgebench @bridgebench
@stepbystepnomad that's a fair take. availability matters just as much as speed. if Claude keeps going down, slower alternatives start looking a lot more attractive

Danny Hallwood 🇺🇦 @stepbystepnomad
I suspect GLM, Kimi etc. are under higher than normal load, as Claude is dishing out both today:
- major incidents taking availability offline (again)
- cutting Max users' token limits to levels that don't support a couple hours' work
For a model to be good, it has to be available.

Bridgebench @bridgebench
@visualdevguy ha, queue it up before bed and check the results in the morning. not the worst strategy

Bridgebench @bridgebench
@justBill totally fair. if you're not running parallel agents, the raw speed matters less. depends on the workflow

Bill @justBill
@bridgebench Honestly for my specific use case I haven't had an issue with GLM speed

Bridgebench @bridgebench
@kostasbotonakis yeah that's a known issue. language drift is a real problem when the training data skews heavily toward one language

Konstantinos @kostasbotonakis
@bridgebench Works fine if you ask "Hi, what are you?" Except that 8/10 times it replies in Chinese even though the prompt is in plain English.

Bridgebench @bridgebench
@joeychilson good point. the model itself might not be the bottleneck, the infrastructure serving it is. would be interesting to see GLM 5.1 benchmarked on a provider with better compute

Joey Chilson @joeychilson
@bridgebench I'm pretty sure it's slow because they don't have the compute to serve the model. This is pretty typical of open source models from China. GLM-5 served through them is also slow, but much faster on providers that do have compute and access to the latest chips.

BridgeMind @bridgemindai
GLM 5.1 just dropped. 45.3 on the coding evaluation using Claude Code as the harness. 2.6 points behind Claude Opus 4.6 at 47.9. Nearly 10 points ahead of GLM 5 at 35.4. An open source model is within striking distance of the best closed source coding model in the world. Z.ai keeps shipping. The gap between open source and frontier keeps shrinking. Need to get GLM 5.1 on BridgeBench and see how it performs in real vibe coding workflows. bridgebench.ai

Bridgebench @bridgebench
GLM 5.1 just released. We're adding it to BridgeBench. 45.3 on the coding evaluation. 2.6 points behind Claude Opus 4.6. Open source closing the gap fast. Full BridgeBench results dropping soon. Overall, Algo, Debug, Refactor, Gen, UI, Security, Speed, Cost, and Completion Rate. Benchmarks don't lie. Let's see how it holds up.