John Solly

5.2K posts

John Solly

@_jsolly

Technologist, crossfitter, and satire lover

Philadelphia, PA
Joined December 2021
569 Following, 721 Followers
Peter Gostev @petergostev
Note we've renamed Code Arena to Frontend Design: WebDev for these chats. I hope this is less confusing, but lmk if you have better suggestions
Arena.ai @arena

MiMo-V2.5 by @XiaomiMiMo is the #11 model (#3 among open) in Code Arena for frontend design. A new MIT-licensed open source model with 1M context, it also ranks strongly as an open model in Text and Vision Arena.

Code Arena: frontend webdev design
- MiMo-V2.5-Pro: #3 open (#11 overall)
- MiMo-V2.5: #5 open (#18 overall)

Text Arena: text prompts
- MiMo-V2.5-Pro: #2 open (#22 overall)

Vision Arena: visual input reasoning
- MiMo-V2.5: #7 open (#37 overall)

Congrats to the @XiaomiMiMo team on this achievement!

12
9
172
19.6K
Randy Olson @randal_olson
Someone discovered that Claude knows when you've been cheating on it with Codex. Lucas maintains an open relationship with his LLMs. Source: reddit.com/r/ClaudeAI/com…
1
0
1
1.2K
John Solly @_jsolly
The amount of sass and Gen Z-isms I get from Opus 4.7 constantly catches me by surprise. Anyone else?
0
0
3
25
John Solly @_jsolly
@JamesTimmins I feel like at this point agents should just abstract them away.
0
0
0
27
James Timmins @JamesTimmins
I hate git worktrees so much
4
0
3
144
John Solly @_jsolly
I think there’s a Jevons paradox with increasing model capability. As models get better, my prompts become vaguer and cover harder and harder tasks. So the ‘cheap, but good enough’ idea falls apart: you always want the best model. But for fixed, commodity prompts, it makes sense to use a lesser model.
0
0
0
19
John Solly @_jsolly
@nghoihin @derekmeegan This is an interesting approach. If you have it connected to Claude via MCP, are you running docker in the cloud?
0
0
1
20
John Solly @_jsolly
@beffjezos Finally, an opportunity to increase shareholder value at my kid’s piano recital.
0
0
0
56
John Solly @_jsolly
@derekmeegan F it. Put the ontology in an AGENTS.md and then just have a bunch of markdown files it references.
0
0
0
13
derek @derekmeegan
@_jsolly idk if the issue is that there aren’t good tools for repackaging/distilling facts, or whether models are actually good at using them and referencing context intelligently
1
0
0
20
Tai Groot 🐧 @taigrr
my agent is smarter than your agent
2
0
9
197
John Solly @_jsolly
Oh good point! Maybe MCPs help solve this. Take each important part of your stack and limit how it can be manipulated and by whom. There also seems to be good progress with running LLMs in sandboxes and controlling how they can ‘escape’. But eventually, you get to a point where you’re exchanging flexibility for control.
0
0
1
13
Engel Nyst - open/acc @engelnyst
@_jsolly Absolutely on 1). I do wonder about 2); it doesn’t seem easy when the agent is powerful. There are so many ways to write a thing in bash/python/anything! LLMs, even Claude cloud agent, are like “hmm not allowed, let me try that way” - boom. Maybe with proxy/OS interception of the action?
1
0
0
29
Engel Nyst - open/acc @engelnyst
One more time, for those in the back: Prompts are not safeguards Prompts are not safeguards Prompts are not safeguards Repeat: All LLMs are jailbreakable. Prompts are not safeguards. All LLMs are jailbreakable. Prompts are not safeguards. (yes normal prompts! No Pliny required)
JER @lifeof_jer

x.com/i/article/2048…

2
0
3
123
svg @newgeographer2
Green or purple?
2
0
4
295
Kent C. Dodds 🏹 @kentcdodds
I'm currently writing an article titled "The Last Software Engineer"
59
5
258
76.3K
John Solly @_jsolly
Chat, GPT-5.5 is mid
Arena.ai @arena

GPT-5.5 by @OpenAI is now live in the Arena, landing across multiple leaderboards. Here’s how it ranks by modality:
- Code Arena (agentic web dev): #9, a strong +50pt jump over GPT-5.4
- Document Arena (analysis & long-content reasoning): #6, on par with Sonnet 4.6
- Text Arena: #7, Math #3, Instruction Following: #8
- Expert Arena: #5
- Search Arena: #2
- Vision Arena: #5

Strong, well-rounded performance, especially in Code (+50 pts vs GPT-5.4). Congrats to @OpenAI on the release. Full category breakdowns by modality in the thread.

0
0
1
85