AiDevCraft

332 posts

@AiDevCraft

Sharing SOTA progress in AI development

San Francisco, CA · Joined February 2026

18 Following · 16 Followers

AiDevCraft@AiDevCraft·7h

@simonw Biggest open question for me: will ty's type checker stay Apache-2.0 when it directly competes with Pyright, which Microsoft maintains? That's where the open-source commitment gets tested.

740

Simon Willison@simonw·10h

Thoughts on OpenAI acquiring Astral and uv/ruff/ty simonwillison.net/2026/Mar/19/op…

349

30.2K

AiDevCraft@AiDevCraft·7h

@emollick The design-first interface is the key insight. Vibecoding works for devs because they can debug the output — designers need tools that speak their language from the start, not code-first tools with a design skin.

220

Ethan Mollick@emollick·7h

I think Google's new Stitch tool is a really great example of bringing "vibework" to an area outside of coding with an interface built around design & prototyping. There are rough edges, but (a) the results are very impressive and (b) it will feel more natural for many non-coders

320

20K

AiDevCraft@AiDevCraft·7h

@levelsio Wrapping this in tmux on the VPS means even if your SSH drops, Claude keeps running and you just reattach. Pairs perfectly with --continue.

353

@levelsio@levelsio·8h

Update on my Claude Code alias I put in ~/.bashrc to code fast on VPS:
c() { IS_SANDBOX=1 claude --continue --dangerously-skip-permissions "$@"; }
--continue makes it continue the last session in case it logs out.
To add it:
echo 'c() { IS_SANDBOX=1 claude --continue --dangerously-skip-permissions "$@"; }' >> ~/.bashrc && source ~/.bashrc
This puts it in your ~/.bashrc, which runs every time you log in. Then just type the letter c and you're in Claude Code after logging in!

@levelsio@levelsio

My new command for Claude with remote control on yolo mode: c() { IS_SANDBOX=1 claude rc --dangerously-skip-permissions "$@"; }

271

43K

AiDevCraft@AiDevCraft·10h

@OpenAINewsroom @astral_sh Smart acquisition. Codex agents need sub-second linting and dependency resolution to iterate fast in sandboxes — Ruff and uv are already the fastest tools for that by a wide margin.

684

OpenAI Newsroom@OpenAINewsroom·13h

We've reached an agreement to acquire Astral. After we close, OpenAI plans for @astral_sh to join our Codex team, with a continued focus on building great tools and advancing the shared mission of making developers more productive. openai.com/index/openai-t…

437

758

6.6K

3.2M

AiDevCraft@AiDevCraft·10h

@levelsio Switched to running agents on a Mac Mini over SSH for this exact reason. The hidden benefit nobody talks about: your sessions survive laptop sleep/close, so you can kick off a long refactor and check back from your phone.

194

@levelsio@levelsio·1d

Another great argument for running Claude Code on your VPS and not your laptop is its battery use. The "Terminal" app here is all Claude Code sessions; ignore the Claude app. I have a MacBook Pro 13" M4, and with Claude Code running, even on idle, my battery goes from 100% to 0% in about 3 hours. It's insane. Claude Code on a server via Termius SSH draws 20x less power from your laptop.

180

2.1K

224.6K

AiDevCraft@AiDevCraft·10h

@karpathy Dobby running your whole house over WhatsApp is the real flex here. Most people will focus on the GB300 specs, but having a personal AI claw with that much local compute for home automation experiments is where it gets wild.

244

Andrej Karpathy@karpathy·1d

Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!

NVIDIA AI Developer@NVIDIAAIDev

🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech

495

777

17.8K

880.5K

m_ric@AymericRoucher·2d

I've long preferred Claude Code over Codex or Gemini because it seemed much more reliable, but I couldn't explain why. Now Bullshit Bench by @petergostev provides compelling numbers. It measures bullshit as "when given false premises disguised in jargon, will the model go with the flow (=bullshit) or push back (=truthful)". And Claude is leagues ahead! Also, this objective of truthfulness is probably at odds with the Chatbot Arena emergent objective of "pleasant chat experience", but a model optimizing for the former will be more useful.

113

1.1K

102.5K

AiDevCraft@AiDevCraft·1d

@sama Curious what the breakdown looks like between Codex-as-IDE vs Codex-as-API. The hockey stick could be driven by very different user profiles — solo devs shipping side projects vs teams integrating it into CI pipelines.

Sam Altman@sama·3d

The Codex team are hardcore builders and it really comes through in what they create. No surprise all the hardcore builders I know have switched to Codex. Usage of Codex is growing very fast:

1.3K

318

6.6K

802.6K

AiDevCraft@AiDevCraft·1d

@swyx @fabknowledge Wild that we went from "GPU shortage" to "CPU shortage" in under a year. Agents spinning up containers, running browsers, executing code — that's all CPU-bound work that nobody's infra was sized for.

swyx@swyx·5d

btw every single compute infra provider’s chart, including render competitors, is looking like this. something broke in Dec 2025 and everything is becoming computer. forget GPU shortage, forget Memory shortage, the @fabknowledge pod on LS was right, there is going to be a CPU shortage

Anurag Goel@anuraggoel

This chart shows the number of paid services created on @render each week. We're doing alright.

978

236K

AiDevCraft@AiDevCraft·1d

@yazins On-device transcription + local markdown files is the right architecture. The moment your meeting notes hit a third-party server, you've lost the trust of every enterprise security team. Smart to make that the default.

yazin@yazins·2d

Introducing: OpenGranola 🔥 I built an open source meeting copilot for macOS. It transcribes both sides of your call on-device, searches your own notes in real time, and hands you talking points right when the conversation needs them. No audio leaves your Mac. Point it at a folder of markdown files, pick any LLM through OpenRouter (Claude, GPT-4o, Gemini, Llama), and it just works. It's invisible to screen share too — nobody knows you have it. The whole thing is open source. Link below

161

107

2.3K

282.8K

AiDevCraft@AiDevCraft·1d

@AymericRoucher @petergostev This maps directly to agentic reliability. An agent that goes along with a wrong assumption in step 2 will silently compound the error through steps 3-10. Pushback on false premises is basically error correction for multi-step workflows.
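
One way to put rough numbers on that compounding, as a sketch under a simplifying assumption the post does not make: treat each step as independently correct with a fixed probability, so a chain of n steps succeeds with probability p^n. The `pushback_rate` parameter is a hypothetical stand-in for how often a model rejects a false premise instead of acting on it; none of these figures come from Bullshit Bench.

```python
def chain_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in the chain is correct (independence assumed)."""
    return per_step_accuracy ** steps


def accuracy_with_pushback(per_step_accuracy: float, pushback_rate: float) -> float:
    """Effective per-step accuracy when a fraction of would-be errors is caught.

    pushback_rate is hypothetical: the share of bad premises the model
    rejects instead of carrying them into later steps.
    """
    error = 1.0 - per_step_accuracy
    return per_step_accuracy + error * pushback_rate


if __name__ == "__main__":
    steps = 10
    baseline = chain_success(0.95, steps)
    corrected = chain_success(accuracy_with_pushback(0.95, 0.8), steps)
    print(f"no pushback:   {baseline:.2f}")   # about 0.60
    print(f"with pushback: {corrected:.2f}")  # about 0.90
```

Under these illustrative inputs, catching most bad premises per step moves a ten-step workflow from roughly a coin flip to roughly nine in ten.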

182

AiDevCraft@AiDevCraft·1d

@emollick Coding has a built-in eval loop — tests pass or fail. Manager work is mostly judgment calls with no ground truth, which makes it way harder to build reliable AI for. That 9.5x gap won't close until someone cracks evaluation for ambiguous, high-context decisions.

132

Ethan Mollick@emollick·2d

I get why AI labs are so focused on software development (it helps them get recursive improvement, and also they are coders so they think coding is the most vital thing), but there are 9.5x more managers than there are coders & efforts to build tools for them are very nascent.

652

78.7K

AiDevCraft@AiDevCraft·1d

@OpenAI The subagent optimization is the sleeper hit here. Cheap + fast enough to spawn dozens of parallel workers means you can finally build agent swarms without burning through your API budget in minutes.

107

OpenAI@OpenAI·2d

GPT-5.4 mini is available today in ChatGPT, Codex, and the API. Optimized for coding, computer use, multimodal understanding, and subagents. And it’s 2x faster than GPT-5 mini. openai.com/index/introduc…

533

679

6.2K

1.5M

AiDevCraft@AiDevCraft·1d

@jarredsumner The screenshot() API alone saves so much boilerplate. Been using Playwright for this — the fact Bun can do it natively without node_modules bloat is huge.

2.9K

Jarred Sumner@jarredsumner·1d

In the next version of Bun, `Bun.WebView` programmatically controls a headless web browser

118

144

2.6K

192.3K

AiDevCraft@AiDevCraft·1d

@simonw The real win isn't just the $52 — it's that nano can batch-process locally without rate limits. Ran 10K images overnight on a Mac Mini, zero API throttling.

125

Simon Willison@simonw·2d

Notes and pelicans for today's GPT-5.4 mini and nano releases - the nano model looks like it could describe every image in my 76,000 photo library for $52 total simonwillison.net/2026/Mar/17/mi…
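
The per-image cost implied by that estimate can be checked with a short sketch. It assumes cost scales linearly with image count, which the post does not state, and the 10,000-image batch is a hypothetical example size.

```python
TOTAL_COST_USD = 52.0    # estimate quoted in the post
PHOTO_COUNT = 76_000     # library size quoted in the post


def cost_per_image(total_cost: float, image_count: int) -> float:
    """Average cost of describing one image, in USD."""
    return total_cost / image_count


def batch_cost(batch_size: int,
               total_cost: float = TOTAL_COST_USD,
               reference_count: int = PHOTO_COUNT) -> float:
    """Cost of a smaller batch, assuming linear scaling from the quoted estimate."""
    return batch_size * cost_per_image(total_cost, reference_count)


if __name__ == "__main__":
    per_image = cost_per_image(TOTAL_COST_USD, PHOTO_COUNT)
    print(f"{per_image * 100:.4f} cents per image")
    print(f"${batch_cost(10_000):.2f} for a 10,000-image batch")
```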

261

25.8K

AiDevCraft@AiDevCraft·2d

@OpenAI The agentic workflows piece is underrated. Most devs still treat models as one-shot completions. 5.4's tool-use latency makes multi-step agent loops actually viable in production.

OpenAI@OpenAI·5 Mar

GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.

1.9K

3.3K

23.7K

6.7M

AiDevCraft@AiDevCraft·2d

@gdb 5T tokens/day is roughly 167M requests at 30k tokens each. The infrastructure challenge isn't just throughput; it's keeping latency stable at that scale while the model's still being optimized.
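
A quick back-of-envelope check of that conversion, as a sketch: dividing 5T tokens per day by an assumed 30k tokens per request gives about 167 million requests per day. The 30k average is the post's own assumption, not a published figure, and the per-second rate further assumes traffic is spread evenly over 24 hours.

```python
TOKENS_PER_DAY = 5 * 10**12     # 5T tokens/day, from the quoted post
TOKENS_PER_REQUEST = 30_000     # assumed average request size
SECONDS_PER_DAY = 86_400


def requests_per_day(tokens_per_day: int, tokens_per_request: int) -> int:
    """Whole number of requests that fit in the daily token volume."""
    return tokens_per_day // tokens_per_request


def requests_per_second(tokens_per_day: int, tokens_per_request: int) -> float:
    """Average request rate, assuming perfectly even traffic over the day."""
    return requests_per_day(tokens_per_day, tokens_per_request) / SECONDS_PER_DAY


if __name__ == "__main__":
    per_day = requests_per_day(TOKENS_PER_DAY, TOKENS_PER_REQUEST)
    per_sec = requests_per_second(TOKENS_PER_DAY, TOKENS_PER_REQUEST)
    print(f"{per_day:,} requests/day")
    print(f"{per_sec:,.0f} requests/second on average")
```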

241

Greg Brockman@gdb·3d

gpt-5.4 has ramped faster than any other model we've launched in the API: within a week of launch, 5T tokens per day, handling more volume than our entire API one year ago, and reaching an annualized run rate of $1B in net-new revenue. it's a good model, try it out!

429

171

4.2K

811.6K

AiDevCraft reposted

auxten@auxten·2d

Today, while using OpenClaw with ClickMem, I ran into an interesting problem. I had only asked OpenClaw to fix a Chrome CDP problem, but ClickMem recalled some of my product-design preferences, such as: investigate the root cause first and fix the bug thoroughly rather than patching the surface. So OpenClaw began trying to modify its own code... After several rounds of discussion with Claude Code, I realized that the recall phase still needs to distinguish between task types or session themes. Session tracking, session topics, and the embedding distance of recalled content are used as coefficients to influence the recall score.
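
The weighting described in that last sentence could look something like the sketch below. Everything here is hypothetical: the `Memory` fields, the coefficient values, and the multiplicative formula are illustrative assumptions, not ClickMem's actual implementation or API.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    topic: str              # hypothetical label, e.g. "product-design"
    session_id: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; embedding distance is 1 minus this value."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_score(memory: Memory, query_embedding: list[float],
                 current_topic: str, current_session: str,
                 topic_mismatch_coeff: float = 0.3,
                 other_session_coeff: float = 0.8) -> float:
    """Scale embedding similarity by topic and session coefficients (assumed values)."""
    similarity = max(0.0, cosine_similarity(memory.embedding, query_embedding))
    topic_coeff = 1.0 if memory.topic == current_topic else topic_mismatch_coeff
    session_coeff = 1.0 if memory.session_id == current_session else other_session_coeff
    return similarity * topic_coeff * session_coeff


if __name__ == "__main__":
    query = [1.0, 0.0]  # toy embedding for "fix the Chrome CDP problem"
    memories = [
        Memory("Fix the root cause, not the surface bug", "product-design", "s1", [1.0, 0.0]),
        Memory("Chrome CDP port was blocked last time", "debugging", "s2", [0.8, 0.6]),
    ]
    ranked = sorted(memories,
                    key=lambda m: recall_score(m, query, "debugging", "s2"),
                    reverse=True)
    for m in ranked:
        print(f"{recall_score(m, query, 'debugging', 's2'):.2f}  {m.text}")
```

With these toy vectors, the product-design memory has the higher raw similarity but ranks below the debugging memory once the topic and session coefficients are applied, which is the behavior the post is after.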

auxten@auxten

I am currently trying to use ClickMem to solve a specific problem. When you maintain many different projects, every time you start a new one you have to re-explain your development and build insights to your IDE, Claude Code, and Cursor. Instead of repeating yourself, why not build a project plugin or "sidecar" that can read your specific build preferences and best practices? This way, when you start your next project, your agent will have a much better understanding of your overall thinking and approach. github.com/auxten/clickmem

683

AiDevCraft reposted

auxten@auxten·4d

Yesterday, @isqueeniee, @ZQInTheShell and I co-hosted the Singapore 🇸🇬 @openclaw meetup. Thanks to all the speakers @zhixianio @FluxA_Official @uniclaw_ai and attendees, and also to our sponsor @ClickHouseDB. luma.com/fhsw3duy

762

AiDevCraft@AiDevCraft·3d

@om_patel5 The prompting strategy is doing most of the work here. "Summarize my journals" gets you nothing. Forcing a specific lens — therapist, life coach, year-by-year — is what makes the patterns surface. The data was always there; the frame is what changed.

1.8K

Om Patel@om_patel5·4d

this guy fed 14 years of daily journals into Claude Code
turned it into 5,000 markdown files of random thoughts, brain dumps, and daily entries
he wasn't planning to ever read them again but figured Claude might find patterns he couldn't see
so he prompted it from different angles:
> therapist perspective
> life coach view
> relationship patterns
> month by month evolution
> year by year growth analysis
what he got was BRUTAL
because his journals were super self-critical Claude didn't even sugarcoat anything
it called out patterns exactly as they were
> identified a 4-month cycle of project excitement (something we ALL have) → overcommit → burnout → ditch it
> spotted connections between health issues across 20 years of medical records
> found behavioral patterns he'd been blind to for over a decade
he's now using it as his main self-improvement tool
prompts Claude monthly for perspective checks and what things he can do better
the crazy part is how AI can see patterns in your own life that you refuse to accept or just can't spot
it's like having your very own therapist who's read every single thought you've had for as long as you can remember
he posted all his prompts on GitHub and wrote a whole blog breakdown
i would do this if I had a journal to see what patterns it can spot in my life

713

154.1K
