Shaun Smith @evalstate
1.6K posts

https://t.co/Hf39YSdoP3 https://t.co/rA1Uook47l https://t.co/TCqQhhMSrk https://t.co/76p6mDAN3R

United Kingdom · Joined July 2024
726 Following · 875 Followers
Shaun Smith @evalstate ·
@dominikhonnef When they did the $1,000 free credits when they launched Claude Code, it scooped up an API key and started using it. Issue filed, no response, out of pocket.
0 replies · 0 reposts · 0 likes · 28 views

Dominik Honnef @dominikhonnef ·
Anthropic's "Claude for Open Source" program has been an awful experience for me. I applied and got invited a couple of days later. When I used the promo link to get free Max, it charged me $200 anyway. It has been 17 days of me trying to reach a human to rectify that.
3 replies · 1 repost · 9 likes · 833 views

Mario Zechner @badlogicgames ·
Who here will be at AI Engineer London in April? I'm ready for more pub visits.
34 replies · 0 reposts · 95 likes · 8.6K views

jason liu @jxnlco ·
Future of AI
57 replies · 1 repost · 74 likes · 61.9K views

OpenAI Newsroom @OpenAINewsroom ·
We've reached an agreement to acquire Astral. After we close, OpenAI plans for @astral_sh to join our Codex team, with a continued focus on building great tools and advancing the shared mission of making developers more productive. openai.com/index/openai-t…
476 replies · 819 reposts · 7.1K likes · 3.8M views

Shaun Smith @evalstate ·
@arvidkahl I've found in my testing that it's perfectly usable for coding -- if I were on API rather than a plan, it would be my default choice.
0 replies · 0 reposts · 0 likes · 304 views

Arvid Kahl @arvidkahl ·
If you do AI inference via OpenAI’s API, you should use the flex tier for half price. My requests always try to use flex tier first, and on 429 / 500 errors, I use the default service tier. 95% of my requests are flex. 2 tries flex, then fall back to standard. Massive cost cut.
[image attached]
29 replies · 6 reposts · 172 likes · 19.1K views
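The "2 tries flex, then fall back to standard" pattern Arvid describes can be sketched as below. This is a hypothetical helper, not his code; it assumes an OpenAI-style `create()` that accepts a `service_tier` argument (`"flex"` vs. `"default"`), injected as a callable so the fallback logic stands on its own.

```python
# Hypothetical sketch of "try flex twice, then fall back to standard".
# Assumes an OpenAI-style create() that accepts service_tier="flex"; the
# callable is passed in so the retry logic works without network access.

def complete_with_flex(create_fn, flex_tries=2, **kwargs):
    """Try the half-price flex tier first, then the default tier."""
    for _ in range(flex_tries):
        try:
            return create_fn(service_tier="flex", **kwargs)
        except Exception:
            # In practice you would catch only 429 / 5xx API errors here.
            continue
    return create_fn(service_tier="default", **kwargs)
```

With the real SDK this would be invoked as something like `complete_with_flex(client.chat.completions.create, model=..., messages=...)`.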
Ido Salomon @idosal1 ·
The bigger IDE is multiplayer. AgentCraft now lets humans and agents collaborate in one shared workspace! ⚔️ See allies on one map. Share context. Hand off agent work across machines.
Quoting Andrej Karpathy @karpathy:

Expectation: the age of the IDE is over Reality: we’re going to need a bigger IDE (imo). It just looks very different because humans now move upwards and program at a higher level - the basic unit of interest is not one file but one agent. It’s still programming.

19 replies · 19 reposts · 262 likes · 34.5K views

Shaun Smith @evalstate ·
Loving this new way of looking at the Hugging Face Hub: Generative UI. Early days, but looking promising 😎
2 replies · 1 repost · 6 likes · 518 views

Shaun Smith @evalstate ·
I'll publish a few quickstart packs over the next couple of days with development environments optimised for Codex and Hugging Face IP and code mode subagents.
0 replies · 0 reposts · 0 likes · 81 views

Shaun Smith @evalstate ·
OK, we know the drill by now. Proper llama.cpp support yesterday; now the best(?) coding and general-purpose agent has GPT-5.4-mini/nano support. Oh, and MCP server-side migration to FastMCP 3 (deprecating SSE transports). Web Search is very snappy with the mini model 🌐
[image attached]
1 reply · 0 reposts · 3 likes · 217 views

Shaun Smith @evalstate ·
What's a good simple MCP Apps testing tool that isn't MCP Jam?
4 replies · 1 repost · 8 likes · 746 views

Shaun Smith @evalstate ·
@Shashikant86 Definitely some variance in TTFT -- think that's probably the issue.
[image attached]
1 reply · 0 reposts · 1 like · 22 views
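Since TTFT (time to first token) is the variable here: a minimal way to measure it from any streaming response is to time the arrival of the first chunk. The helper below is illustrative and not tied to any specific SDK; it assumes the stream is lazy, i.e. the request is only dispatched when iteration begins.

```python
import time

def time_to_first_token(stream):
    """Return (seconds until the first chunk arrives, the first chunk).

    `stream` is any iterator of response chunks; for a lazy generator the
    timer covers request dispatch plus the wait for the first token.
    """
    start = time.monotonic()
    first_chunk = next(iter(stream))
    return time.monotonic() - start, first_chunk
```

In practice you would pass the streaming response object returned by your client library, then keep consuming the same iterator for the rest of the tokens.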
Shashi 🇬🇧🇺🇸 @Shashikant86 ·
Codex is amazing and writes more bug-free software than any other agent. However, I would love to see a fast mode in Codex so that I can get something done without needing the deep analysis of the code. It would be amazing to get a "fast" mode in Codex for those who need to get something done quickly when time is money. @thsottiaux @ah20im @romainhuet Something similar to what @AmpCode did with smart/fast modes, and let users decide.
Quoting Ahmed @ah20im:

What would you like to see in Codex?

1 reply · 0 reposts · 0 likes · 209 views

Shaun Smith @evalstate ·
@Shashikant86 Yes, I'm trialling using spark for it again, but gpt-oss-120b with heavy guardrails still seems to beat it.
0 replies · 0 reposts · 0 likes · 18 views

Shaun Smith @evalstate ·
And that's the FastMCP 3.1.1 migration completed, with working Hugging Face OAuth and token passthrough, and elicitations and all that stuff! A wise man said "you're weird - I bet you can't just change the imports". They were right 🤣
0 replies · 1 repost · 4 likes · 573 views

Shaun Smith @evalstate ·
Yes, the llama.cpp thing is nice as it makes it very easy to download models, and not having to configure windows, output lengths etc. is super convenient. Qwen3.5-9B is small and capable. As a subagent, you can just ask a big model to tune it for a task you have in mind (keep history off etc.)
1 reply · 0 reposts · 1 like · 82 views
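For context on "very easy to download models": llama.cpp's `llama-server` can pull a GGUF straight from the Hugging Face Hub with the `-hf` flag, so there is no separate download step. The repo name below is only an example; substitute whatever model you actually want.

```shell
# Starts an OpenAI-compatible local server on port 8080, downloading the
# GGUF from the Hugging Face Hub on first run (model repo is illustrative).
llama-server -hf ggml-org/gemma-3-1b-it-GGUF --port 8080 -c 8192
```

Any OpenAI-compatible client can then be pointed at `http://localhost:8080/v1`, which is what makes the subagent setup described above straightforward.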
Christopher @communicating ·
@evalstate Are you seeing anything close to a "1 million" token window, or does it start losing its mind at around 40% like so many other larger-context models have in the past? I'm not doing much local at the moment but loving the llama.cpp support; it'll be a big crowd-pleaser. 👍
2 replies · 0 reposts · 1 like · 35 views

Shaun Smith @evalstate ·
fast-agent 0.6.0... big update for Anthropic 1M context window defaults, Google model improvements... and llama.cpp support. Discovers and sets model parameters and capabilities (e.g. vision) from llama.cpp servers.
[image attached]
3 replies · 0 reposts · 6 likes · 291 views
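The "discovers and sets model parameters" step can be sketched as below. The payload shape is an assumption modelled on llama-server's `/props` endpoint (field names such as `default_generation_settings.n_ctx` may differ between llama.cpp versions), and the helper is hypothetical, not fast-agent's actual code.

```python
# Hypothetical reduction of a llama-server /props payload to the fields an
# agent framework cares about; the field names are assumptions, not a spec.

def discover_model(props: dict) -> dict:
    settings = props.get("default_generation_settings", {})
    return {
        "context_length": settings.get("n_ctx"),  # max context size
        "vision": bool(props.get("modalities", {}).get("vision", False)),
        "model": props.get("model_path", "unknown"),  # loaded GGUF path
    }
```

In practice the payload would come from something like `requests.get(f"{base_url}/props").json()` before calling the helper.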
Shaun Smith @evalstate ·
@communicating I made the change a few days back and have given it a few workouts. I know the NIH benchmarks going around look good for 1M, but I find Claude models start losing coherence around the ~120k level for code - and the new settings don't seem to have changed that. Glad it's free now tho'
0 replies · 0 reposts · 1 like · 22 views

Shaun Smith @evalstate ·
Friends let friends use their 4x RTX 3090 cluster. Thanks for the tokens @SecretiveShell 🙂
1 reply · 0 reposts · 0 likes · 124 views