Bridgebench

23 posts

Bridgebench
@bridgebench

The best vibe coding benchmark in the world. Built by @bridgemindai

United States · Joined March 2026
4 Following · 61 Followers

Bridgebench @bridgebench
@xundecidability fair. async workflows change the equation. if you're not waiting on results in real time, latency matters less

thomas @xundecidability
@bridgebench Disagree. More agentic work is async now.

Bridgebench @bridgebench
GLM 5.1 is the slowest frontier model we've ever benchmarked on BridgeBench. 44.3 tokens per second. Half the speed of GPT 5.4. Nearly 6x slower than Grok 4.20. Z.ai traded all of their speed for intelligence. The coding benchmarks improved. The throughput collapsed. In 2026, agentic coding is about parallelism. You're running 5, 10, 15 agents at once. A model this slow bottlenecks every workflow it touches. Intelligence without speed is a luxury most vibe coders can't afford. bridgebench.ai

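The speed comparisons in the post above are just ratios of the quoted 44.3 tok/s figure. A quick sketch of what those ratios imply for wall-clock time — the turn count and tokens-per-turn are hypothetical, chosen only to illustrate the scale:

```python
# Back-of-the-envelope numbers implied by the post: GLM 5.1 at 44.3 tok/s,
# "half the speed of GPT 5.4", "nearly 6x slower than Grok 4.20".
GLM_TPS = 44.3
GPT_TPS = GLM_TPS * 2   # ~88.6 tok/s implied
GROK_TPS = GLM_TPS * 6  # ~265.8 tok/s implied

# Hypothetical agent task: 15 model turns, ~2,000 generated tokens per turn.
TURNS = 15
TOKENS_PER_TURN = 2_000

for name, tps in [("GLM 5.1", GLM_TPS), ("GPT 5.4", GPT_TPS), ("Grok 4.20", GROK_TPS)]:
    minutes = TURNS * TOKENS_PER_TURN / tps / 60
    print(f"{name:10s} {minutes:5.1f} min of pure generation per task")
```

Under these assumptions the slow model spends roughly 11 minutes per task on generation alone versus under 2 minutes for the fastest, which is the gap the post is pointing at.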
Bridgebench @bridgebench
@briantexts solid point. async + thorough planning is a legit workflow where speed matters less. it's the parallel agent use case where latency kills you

brian @briantexts
@bridgebench I value the model's intelligence over speed when building real software. Why would I want to pollute my codebase and go back and fix things when I can thoroughly plan a PRD and work async?

Bridgebench @bridgebench
@Manaho217794 that would be the move. if Z.ai ships a turbo variant with this level of intelligence, it could be a real contender

Manaho @Manaho217794
@bridgebench So we'll have to wait for GLM-5.1 Turbo.

Bridgebench @bridgebench
@abukinansafadi3 totally valid if you're working on single tasks. the speed penalty only really hits when you scale up agents

Bridgebench @bridgebench
@shortwavlabs when you're running 10+ agents in parallel, every second of latency compounds. speed is the bottleneck for agentic workflows

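The "latency compounds" claim can be sketched with a toy model: an agent loop is sequential, so per-turn generation time multiplies across turns, and a batch of parallel agents finishes no sooner than its slowest loop. Only the 44.3 tok/s figure comes from the thread; the other numbers are hypothetical:

```python
# Toy model of turn latency compounding in an agent loop.

def task_seconds(turns: int, tokens_per_turn: int, tokens_per_sec: float) -> float:
    """An agent loop is sequential: every turn waits on the previous response."""
    return turns * tokens_per_turn / tokens_per_sec

# Running 10 agents in parallel doesn't hide this: the batch finishes when the
# last loop does, so per-turn generation speed sets the floor for the batch.
slow = task_seconds(turns=12, tokens_per_turn=1_500, tokens_per_sec=44.3)
fast = task_seconds(turns=12, tokens_per_turn=1_500, tokens_per_sec=265.8)
print(f"44.3 tok/s: {slow/60:.1f} min per batch; 265.8 tok/s: {fast/60:.1f} min")
```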
Bridgebench @bridgebench
@qrydor that's actually a smart use case. use GLM for the thinking layer and hand off execution to something faster

Bridgebench @bridgebench
@mavori838 xAI went all in on inference speed. massive GPU clusters optimized for throughput

Bridgebench @bridgebench
@mSanterre agreed. better hardware could close the gap fast

max @mSanterre
@bridgebench Might need to wait until it can be hosted on faster hardware

Bridgebench @bridgebench
@DimitriGilbert ha fair enough. benchmarking load definitely doesn't help. the variability is real though, that's on their infra side

Dimitri Gilbert @DimitriGilbert
@bridgebench it is highly variable, and their infra sucks. before y'all went hammering things with your benches, I was working around the normal speed. can't you wait for me to sleep to do that? XD

Bridgebench @bridgebench
@TheoLBorges the context window degradation is a big deal. if it falls apart past 70k tokens, that limits real-world usability significantly

Theo Borges @TheoLBorges
@bridgebench Not only that, but their API is using a quantized model for sure; the quality is subpar. Past a 70-80k token context, it gives you gibberish.

Bridgebench @bridgebench
@Divkix coding benchmarks definitely improved over GLM-5. speed feels similar to what you'd expect from their infrastructure, not the model itself

Divanshu Chauhan (divkix) @Divkix
@bridgebench How much do you think intelligence has increased compared to GLM-5 and the turbo variant? I'm using 5.1 and it seems to work at a normal rate, not as fast as codex, cc tho

Bridgebench @bridgebench
@stepbystepnomad that's a fair take. availability matters just as much as speed. if Claude keeps going down, slower alternatives start looking a lot more attractive

Danny Hallwood 🇺🇦 @stepbystepnomad
I suspect GLM, Kimi etc. are under higher than normal load, as Claude is dishing out both today:
- major incidents taking availability offline (again)
- cutting Max users' token limits to levels that don't support a couple hours' work
For a model to be good, it has to be available.

Bridgebench @bridgebench
@visualdevguy ha, queue it up before bed and check the results in the morning. not the worst strategy

Bridgebench @bridgebench
@justBill totally fair. if you're not running parallel agents, the raw speed matters less. depends on the workflow

Bill @justBill
@bridgebench Honestly for my specific use case I haven't had an issue with GLM speed

Bridgebench @bridgebench
@kostasbotonakis yeah that's a known issue. language drift is a real problem when the training data skews heavily toward one language

Konstantinos @kostasbotonakis
@bridgebench Works fine if you ask "Hi, what are you?" Except that 8/10 times it replies in Chinese even though the prompt is in plain English.

Bridgebench @bridgebench
@joeychilson good point. the model itself might not be the bottleneck, the infrastructure serving it is. would be interesting to see GLM 5.1 benchmarked on a provider with better compute

Joey Chilson @joeychilson
@bridgebench I'm pretty sure it's slow because they don't have the compute to serve the model. This is pretty typical of open source models from China. GLM-5 served through them is also slow, but much faster on providers that do have compute and access to the latest chips.

BridgeMind @bridgemindai
GLM 5.1 just dropped. 45.3 on the coding evaluation using Claude Code as the harness. 2.6 points behind Claude Opus 4.6 at 47.9. Nearly 10 points ahead of GLM 5 at 35.4. An open source model is within striking distance of the best closed source coding model in the world. Z.ai keeps shipping. The gap between open source and frontier keeps shrinking. Need to get GLM 5.1 on BridgeBench and see how it performs in real vibe coding workflows. bridgebench.ai

Bridgebench @bridgebench
GLM 5.1 just released. We're adding it to BridgeBench. 45.3 on the coding evaluation. 2.6 points behind Claude Opus 4.6. Open source closing the gap fast. Full BridgeBench results dropping soon. Overall, Algo, Debug, Refactor, Gen, UI, Security, Speed, Cost, and Completion Rate. Benchmarks don't lie. Let's see how it holds up.