Chris Rabl

2.3K posts

@crabl

🇨🇦/acc, technoalchemist @chaosgremlinlab

Lethbridge, Alberta · Joined August 2008

1.4K Following · 404 Followers

Chris Rabl reposted

chaos gremlin labs@chaosgremlinlab·5d

you should be gremlinmaxxing

Chris Rabl@crabl·4d

thank you @modal for existing! serverless H200 instances with snapshots are so nice for quick experiments

Chris Rabl@crabl·Apr 20

@haki_xer cards and border radius is a dead giveaway

Haki@haki_xer·Apr 20

Be honest, does this look vibe coded and if yes why?

Chris Rabl@crabl·Apr 20

@VictorTaelin you have to hold the model differently, they don't work the same

Taelin@VictorTaelin·Apr 19

people who swear 4.7 > 4.6 (if anyone): what are you doing

Chris Rabl@crabl·Apr 18

opus 4.7 is a VERY GOOD model

Chris Rabl@crabl·Apr 17

@joshcrnls bc their target audience is not "designers"

Josh@joshcrnls·Apr 17

can someone explain to me why everyone thinks an ai lab made up entirely of engineers is going to build a better design product than a company who has lived and breathed the needs of designers for a decade

Claude@claudeai

Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.

Chris Rabl@crabl·Apr 15

@JakeKing @realsigridjin building @chaosgremlinlab here in AB, llm eval tooling and mech interp

Jake@JakeKing·Apr 15

Who's building devtools in 🇨🇦 right now? want my algo to be filled with cool people building cool shit up here.

Chris Rabl@crabl·Apr 13

anyway. the anthropic team (@bcherny @trq212 and everyone else) are clearly working hard on this. just wanted to share concrete data instead of vibes. it's easy to criticize those in the arena, but please remember they are humans working their butts off for your benefit <3

Chris Rabl@crabl·Apr 13

in contrast to others i don't think the fix is more thinking tokens. tool selection and "thrash" detection are two places to start: make the model verify against actual source material before it starts claiming things, and detect when the model starts swirling and second-guessing

Chris Rabl@crabl·Apr 13

been data mining my claude code transcripts to figure out why conversations "feel worse" lately. pulled out 264 "tilt" incidents across 127 sessions, ~9k tool calls and this is what i found...

Chris Rabl@crabl·Apr 13

@kcosr @sdmat123 yes, this takes into account reading files through both the standard Read and Grep tools as well as cat, head, tail, diff, grep, git diff, git show, sed, etc. via Bash
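
A minimal sketch of how the read-counting described in this reply might be done. It is not the author's actual analysis code: the `ToolCall` record shape, the helper names, and the crude pipeline splitting are all assumptions made for illustration, and the "etc." in the reply is not expanded.

```python
import re
from dataclasses import dataclass


# Hypothetical record shape; the real transcript schema is not given in the thread.
@dataclass
class ToolCall:
    tool: str          # e.g. "Read", "Grep", "Bash"
    command: str = ""  # shell command text, only meaningful for Bash calls


# Tools and shell commands named in the reply.
READ_TOOLS = {"Read", "Grep"}
READ_COMMANDS = {"cat", "head", "tail", "diff", "grep", "sed"}
GIT_READ_SUBCOMMANDS = {"diff", "show"}


def is_file_read(call: ToolCall) -> bool:
    """Return True if the call plausibly surfaces file contents to the model."""
    if call.tool in READ_TOOLS:
        return True
    if call.tool != "Bash":
        return False
    # Crude split on pipes and command separators; quoting is not handled.
    for segment in re.split(r"\|\||&&|[|;]", call.command):
        words = segment.split()
        if not words:
            continue
        if words[0] in READ_COMMANDS:
            return True
        if words[0] == "git" and len(words) > 1 and words[1] in GIT_READ_SUBCOMMANDS:
            return True
    return False


def count_reads(calls) -> int:
    """Count the calls in a session that look like file reads."""
    return sum(is_file_read(c) for c in calls)
```

A classifier like this would overcount in some cases (for example `sed -i`, which writes rather than reads), so any real analysis would likely need per-command rules.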

Kevin@kcosr·Apr 13

@crabl @sdmat123 Does this account for files read using other tools like Bash or Exec (or whatever Anthropic calls them)? These can output file contents from shell commands like cat, grep, sed and diff.

sdmat@sdmat123·Apr 12

Thariq,
Firstly, huge respect to you and your team for the innovative work on Claude Code over the past year. But "mostly" doesn't cut it here.
Yes, one of the metrics in the report is flawed due to your change hiding the thinking traces. That doesn't invalidate the other objective behavioral metrics showing severe degradation. This is a high profile user who put a huge amount of effort into communicating exactly what they were seeing with more rigor and precision than Anthropic provides regarding changes. Thousands of other users are shouting about their similar experiences.
Anthropic has won a lot of love from developers. There are only two ways to retain that love when performance drops:
1) Fix the problem
2) Be transparent about necessary compromises/tradeoffs, including explaining what changed in detail
If this level of transparency will alienate users, that requires some soul searching.

Thariq@trq212

@Hesamation boris responded to this in depth in the issue- it's mostly just that we stopped showing thinking summaries for latency (you can opt-in to showing it) which was affecting the thinking measurement in the post github.com/anthropics/cla… (#issuecomment-4194007103)

Chris Rabl@crabl·Apr 13

@ns123abc grok and gemini combo in opencode
