Shaun Smith

1.9K posts

@evalstate

https://t.co/Hf39YScQZv https://t.co/rA1UoojwhN https://t.co/TCqQhhMkBM https://t.co/76p6mDAfej

United Kingdom · Joined July 2024
796 Following · 962 Followers
Shaun Smith @evalstate
LOL, Codex Goal is stuck in a waiting-for-human-review loop it can't get out of (polling every second).
[Media attached]
Shaun Smith @evalstate
@thsottiaux I've had some things not go as well as usual. It's hard to tell whether it's a performance issue, but some sessions have felt a bit "off" over the last couple of days. Good luck with the investigation.
Tibo @thsottiaux
Codex team is aware of reports of GPT-5.5 performing worse for some users and investigating. We don't have anything conclusive yet and systems are healthy but we will share updates as we go.
Shaun Smith retweeted
Julien Chaumond @julien_c
Friday project: Readable rewrite of the hardware-detection module behind @midudev's canirun-ai. Same heuristics, shaders & spec tables — just descriptive names + JSDoc. github.com/julien-c/canir…
Shaun Smith @evalstate
I don't disagree with your central thesis and certainly agree that frontier models are way overused. This: claude -p "review this PR" and claude < "review this PR" being charged differently is absurd. Savvy users now know they are in a game of ToS enforcement cat-and-mouse. Will "automation detection" be tripped by 3rd party voice transcription? Or with keyboard macro software? Or over SSH? I don't think these questions are hyperbole at all.
Tadas Antanavicius @tadasayy
Unpopular opinion: as a heavy user of claude -p on Claude Max plans, I actually think Anthropic is going to come out ahead with this move.

If Anthropic had to turn flat fees off for programmatic usage, OpenAI is going to have to eventually as well. Anyone relying on these flat fee subscriptions is going to churn off the idea they can count on them. It's too risky to build infrastructure and workflows wholly reliant on the subsidization now.

And if most people do that, then it's a level playing field where the best mix of token cost vs. model performance wins. People will still use Opus and Sonnet. They'll probably spread out and incorporate other cheaper models too. We'll be forced to be more economical and stop bringing the Ferrari for workloads where a Prius will do.

The ecosystem is going to do what we always do: innovate and figure out how to accomplish the same outcomes with more efficient token spend, so the API fees become stomachable.

I expect to churn off of my Claude Max accounts, but I think I'll continue to pay a decent amount in API fees to Anthropic, and probably a bunch to other model providers too where I figure out opportunities to downshift from Opus without meaningful performance loss.
Quoting Matt Pocock @mattpocockuk:

This is the clarity we've been crying out for. But it's a poisoned chalice. This is a 10X cut to claude -p disguised as a monthly bonus. Anthropic is discouraging any kind of programmatic usage. And that's fine - no subsidy lasts forever. But it's time to try Codex.

Shaun Smith @evalstate
@reach_vb Does that QR code send a link with credit on it? I've got a friend who wants to try it for a month...
Shaun Smith @evalstate
@stochasticchasm Possible, but my search setup, which was tightly coupled to gpt-oss-120b, needed a lot of refinement to get working with spark. Both are very odd/different models to work with, though.
stochasm @stochasticchasm
insane if true (from semianalysis)
[Media attached]
Theo - t3.gg @theo
Setting an upper bound at $20,000 because I'm already gonna be broke with the increase in inference.
Shaun Smith @evalstate
OK, last post on this *but* it would be so much better if this new credit were accessible with... an API key.
[Media attached]
Shaun Smith @evalstate
OK, so on an evening where the PAYG API is doing this - again - Anthropic have increased limits on Claude Plans by gifting a separate credit pool on top of existing usage? Really?
[Media attached]
Shaun Smith @evalstate
@mattpocockuk This might make sense if Codex was worse at the moment. For most people it's clearly better. For Anthropic this seems like the worst time to have people "try Codex". Bizarre times.
Matt Pocock @mattpocockuk
This is the clarity we've been crying out for. But it's a poisoned chalice. This is a 10X cut to claude -p disguised as a monthly bonus. Anthropic is discouraging any kind of programmatic usage. And that's fine - no subsidy lasts forever. But it's time to try Codex.
Quoting ClaudeDevs @ClaudeDevs:

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of:
- Claude Agent SDK
- claude -p
- Claude Code GitHub Actions
- Third-party apps built on the Agent SDK
Shaun Smith @evalstate
Did I just read that you will have a separate credit account for sending a message via a command line flag rather than typing it in? What sort of stupid shit is that?
Christopher @communicating
@evalstate Stumbled on this, and at first glance it looked interesting. Since we were talking PII, I thought I'd share. Not a rec, since I haven't had time to actually review it yet. MemPrivacy - Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents: huggingface.co/papers/2605.09…
Shaun Smith @evalstate
@reach_vb Hi Codex, codex codex, codexcodexcodex, codex. codexxxmaxxing.
Vaibhav (VB) Srivastav
Putting together a group chat for Codex power users in London/Europe: who are the biggest ballers around?
Shaun Smith @evalstate
Not sure I'm going to get the answer I want, but how is Panther Lake for CPU inference?
Shaun Smith retweeted
clem 🤗 @ClementDelangue
As President Trump meets President Xi this week, a call to the American AI community: If your startup, lab, non-profit or company benefits from open international AI - especially Chinese (Deepseek, Qwen, Kimi, GLM,…), please share! Open source is the most important driver of competition, jobs and wealth creation in AI today. Let’s support and promote it at critical times like this week!
Shaun Smith @evalstate
@rachelnabors You should. It's got a new trick: it renders both desktop and mobile views of websites to make sure the content looks good.
Shaun Smith @evalstate
Using GEPA to design updated Tool Schemas...
[Media attached]