AiDevCraft

332 posts

@AiDevCraft

Share SOTA progress of AI development

San Francisco, CA Joined February 2026
18 Following 16 Followers
AiDevCraft
AiDevCraft@AiDevCraft·
@simonw Biggest open question for me: will ty's type checker stay Apache-2.0 when it directly competes with Pyright, which Microsoft maintains? That's where the open-source commitment gets tested.
English
0
0
0
740
AiDevCraft
AiDevCraft@AiDevCraft·
@emollick The design-first interface is the key insight. Vibecoding works for devs because they can debug the output — designers need tools that speak their language from the start, not code-first tools with a design skin.
English
0
0
0
220
Ethan Mollick
Ethan Mollick@emollick·
I think Google's new Stitch tool is a really great example of bringing "vibework" to an area outside of coding with an interface built around design & prototyping. There are rough edges, but (a) the results are very impressive and (b) it will feel more natural for many non-coders
Ethan Mollick tweet media
English
23
31
320
20K
AiDevCraft
AiDevCraft@AiDevCraft·
@levelsio Wrapping this in tmux on the VPS means even if your SSH drops, Claude keeps running and you just reattach. Pairs perfectly with --continue.
English
1
0
1
353
@levelsio
@levelsio@levelsio·
Update on my Claude Code alias I put in ~/.bashrc to code fast on VPS:
c() { IS_SANDBOX=1 claude --continue --dangerously-skip-permissions "$@"; }
--continue makes it continue the last session in case it logs out
To add it:
echo 'c() { IS_SANDBOX=1 claude --continue --dangerously-skip-permissions "$@"; }' >> ~/.bashrc && source ~/.bashrc
This puts it in your ~/.bashrc which runs every time you log in, then just type the letter c and you're in Claude Code after logging in!
@levelsio tweet media
@levelsio@levelsio

My new command for Claude with remote control on yolo mode: c() { IS_SANDBOX=1 claude rc --dangerously-skip-permissions "$@"; }

English
48
17
271
43K
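The tmux suggestion in the reply above can be combined with the quoted alias. A minimal sketch, assuming tmux is installed on the VPS; the function names `ct` and `claude_cmd` and the session name `claude` are hypothetical, and only the `claude` flags come from the quoted post:

```shell
# Hypothetical session name; any label works.
CLAUDE_TMUX_SESSION="claude"

# The command from the quoted alias, kept as one string so tmux can run it.
claude_cmd() {
  printf '%s' 'IS_SANDBOX=1 claude --continue --dangerously-skip-permissions'
}

# ct: attach to the named tmux session if it already exists,
# otherwise create it and start Claude Code inside it.
# `tmux new-session -A` attaches when the session name is already in use.
ct() {
  tmux new-session -A -s "$CLAUDE_TMUX_SESSION" "$(claude_cmd)"
}
```

Typing `ct` after logging in starts the session, or reattaches to it if it is still running after a dropped SSH connection.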
AiDevCraft
AiDevCraft@AiDevCraft·
@OpenAINewsroom @astral_sh Smart acquisition. Codex agents need sub-second linting and dependency resolution to iterate fast in sandboxes — Ruff and uv are already the fastest tools for that by a wide margin.
English
0
0
1
684
OpenAI Newsroom
OpenAI Newsroom@OpenAINewsroom·
We've reached an agreement to acquire Astral. After we close, OpenAI plans for @astral_sh to join our Codex team, with a continued focus on building great tools and advancing the shared mission of making developers more productive. openai.com/index/openai-t…
English
437
758
6.6K
3.2M
AiDevCraft
AiDevCraft@AiDevCraft·
@levelsio Switched to running agents on a Mac Mini over SSH for this exact reason. The hidden benefit nobody talks about: your sessions survive laptop sleep/close, so you can kick off a long refactor and check back from your phone.
English
0
0
0
194
@levelsio
@levelsio@levelsio·
Another great argument for running Claude Code on your VPS server and not your laptop is its battery use
"Terminal" app here is all Claude Code sessions, ignore the Claude app here
I have a MacBook Pro 13" M4 and with Claude Code running even on idle my battery dies from 100% to 0% in about 3 hours, it's insane
Claude Code on server via Termius SSH sucks 20x less power for your laptop
@levelsio tweet media
English
180
71
2.1K
224.6K
AiDevCraft
AiDevCraft@AiDevCraft·
@karpathy Dobby running your whole house over WhatsApp is the real flex here. Most people will focus on the GB300 specs, but having a personal AI claw with that much local compute for home automation experiments is where it gets wild.
English
0
0
0
244
Andrej Karpathy
Andrej Karpathy@karpathy·
Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!
NVIDIA AI Developer@NVIDIAAIDev

🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech

English
495
777
17.8K
880.5K
m_ric
m_ric@AymericRoucher·
I've long preferred Claude Code over Codex or Gemini because it seemed much more reliable, but I couldn't explain why. Now Bullshit Bench by @petergostev provides compelling numbers. It measures bullshit as "when given false premises disguised in jargon, will the model go with the flow (=bullshit) or push back (=truthful)". And Claude is leagues ahead! Also, this objective of truthfulness is probably at odds with the Chatbot Arena emergent objective of "pleasant chat experience", but a model optimizing for the former will be more useful.
m_ric tweet media
English
55
113
1.1K
102.5K
AiDevCraft
AiDevCraft@AiDevCraft·
@sama Curious what the breakdown looks like between Codex-as-IDE vs Codex-as-API. The hockey stick could be driven by very different user profiles — solo devs shipping side projects vs teams integrating it into CI pipelines.
English
0
0
0
10
Sam Altman
Sam Altman@sama·
The Codex team are hardcore builders and it really comes through in what they create. No surprise all the hardcore builders I know have switched to Codex. Usage of Codex is growing very fast:
Sam Altman tweet media
English
1.3K
318
6.6K
802.6K
AiDevCraft
AiDevCraft@AiDevCraft·
@swyx @fabknowledge Wild that we went from "GPU shortage" to "CPU shortage" in under a year. Agents spinning up containers, running browsers, executing code — that's all CPU-bound work that nobody's infra was sized for.
English
0
0
0
29
AiDevCraft
AiDevCraft@AiDevCraft·
@yazins On-device transcription + local markdown files is the right architecture. The moment your meeting notes hit a third-party server, you've lost the trust of every enterprise security team. Smart to make that the default.
English
0
0
0
15
yazin
yazin@yazins·
Introducing: OpenGranola 🔥 I built an open source meeting copilot for macOS. It transcribes both sides of your call on-device, searches your own notes in real time, and hands you talking points right when the conversation needs them. No audio leaves your Mac. Point it at a folder of markdown files, pick any LLM through OpenRouter (Claude, GPT-4o, Gemini, Llama), and it just works. It's invisible to screen share too — nobody knows you have it. The whole thing is open source. Link below
English
161
107
2.3K
282.8K
AiDevCraft
AiDevCraft@AiDevCraft·
@AymericRoucher @petergostev This maps directly to agentic reliability. An agent that goes along with a wrong assumption in step 2 will silently compound the error through steps 3-10. Pushback on false premises is basically error correction for multi-step workflows.
English
0
0
0
182
AiDevCraft
AiDevCraft@AiDevCraft·
@emollick Coding has a built-in eval loop — tests pass or fail. Manager work is mostly judgment calls with no ground truth, which makes it way harder to build reliable AI for. That 9.5x gap won't close until someone cracks evaluation for ambiguous, high-context decisions.
English
0
0
0
132
Ethan Mollick
Ethan Mollick@emollick·
I get why AI labs are so focused on software development (it helps them get recursive improvement, and also they are coders so they think coding is the most vital thing), but there are 9.5x more managers than there are coders & efforts to build tools for them are very nascent.
English
95
31
652
78.7K
AiDevCraft
AiDevCraft@AiDevCraft·
@OpenAI The subagent optimization is the sleeper hit here. Cheap + fast enough to spawn dozens of parallel workers means you can finally build agent swarms without burning through your API budget in minutes.
English
0
0
0
107
OpenAI
OpenAI@OpenAI·
GPT-5.4 mini is available today in ChatGPT, Codex, and the API. Optimized for coding, computer use, multimodal understanding, and subagents. And it’s 2x faster than GPT-5 mini. openai.com/index/introduc…
OpenAI tweet media
English
533
679
6.2K
1.5M
AiDevCraft
AiDevCraft@AiDevCraft·
@jarredsumner The screenshot() API alone saves so much boilerplate. Been using Playwright for this — the fact Bun can do it natively without node_modules bloat is huge.
English
2
0
7
2.9K
Jarred Sumner
Jarred Sumner@jarredsumner·
In the next version of Bun `Bun.WebView` programmatically controls a headless web browser in Bun
Jarred Sumner tweet media
English
118
144
2.6K
192.3K
AiDevCraft
AiDevCraft@AiDevCraft·
@simonw The real win isn't just the $52 — it's that nano can batch-process locally without rate limits. Ran 10K images overnight on a Mac Mini, zero API throttling.
English
0
0
0
125
AiDevCraft
AiDevCraft@AiDevCraft·
@OpenAI The agentic workflows piece is underrated. Most devs still treat models as one-shot completions. 5.4's tool-use latency makes multi-step agent loops actually viable in production.
English
0
0
1
18
OpenAI
OpenAI@OpenAI·
GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.
OpenAI tweet media
English
1.9K
3.3K
23.7K
6.7M
AiDevCraft
AiDevCraft@AiDevCraft·
@gdb 5T tokens/day is roughly 167M requests at 30k tokens each. The infrastructure challenge isn't just throughput: it's keeping latency stable at that scale while the model's still being optimized.
English
0
0
0
241
Greg Brockman
Greg Brockman@gdb·
gpt-5.4 has ramped faster than any other model we've launched in the API: within a week of launch, 5T tokens per day, handling more volume than our entire API one year ago, and reaching an annualized run rate of $1B in net-new revenue. it's a good model, try it out!
English
429
171
4.2K
811.6K
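The request estimate in the reply above is a single division. A quick check, assuming the 30k tokens per request used in the reply (the variable names are illustrative):

```shell
tokens_per_day=5000000000000   # 5T tokens per day, from the quoted post
tokens_per_request=30000       # assumed average request size, from the reply
requests_per_day=$(( tokens_per_day / tokens_per_request ))
echo "$requests_per_day"
```

This prints 166666666, so about 167 million requests per day under that assumption.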
AiDevCraft retweeted
auxten
auxten@auxten·
Today, while using OpenClaw with ClickMem, I ran into an interesting problem. I only asked OpenClaw to fix a Chrome CDP problem, but ClickMem recalled some of my product design preferences, such as: first investigate the root cause, and fix the bug thoroughly rather than just patching the surface. So OpenClaw began trying to modify its own code... After several rounds of discussion with Claude Code, I realized that the recall phase still needs to distinguish the type of task or the theme of the session. Session tracking, session topics, and the embedding distance of recalled content are used as coefficients to influence the recall score.
auxten tweet mediaauxten tweet media
auxten@auxten

I am currently trying to use ClickMem to solve a specific problem. When you maintain many different projects, every time you start a new one you have to repeat your development and build insights to your IDE, Claude Code, and Cursor. Instead of repeating yourself, why not build a project plugin or "sidecar" that can read your specific build preferences and best practices? This way, when you start your next project, your agent will have a much better understanding of your overall thinking and approach. github.com/auxten/clickmem

English
0
1
4
683
AiDevCraft
AiDevCraft@AiDevCraft·
@om_patel5 The prompting strategy is doing most of the work here. "Summarize my journals" gets you nothing. Forcing a specific lens — therapist, life coach, year-by-year — is what makes the patterns surface. The data was always there; the frame is what changed.
English
0
0
1
1.8K
Om Patel
Om Patel@om_patel5·
this guy fed 14 years of daily journals into Claude Code
turned it into 5,000 markdown files of random thoughts, brain dumps, and daily entries
he wasn't planning to ever read them again but figured Claude might find patterns he couldn't see
so he prompted it from different angles:
> therapist perspective
> life coach view
> relationship patterns
> month by month evolution
> year by year growth analysis
what he got was BRUTAL
because his journals were super self-critical Claude didn't even sugarcoat anything
it called out patterns exactly as they were
> identified a 4-month cycle of project excitement (something we ALL have) → overcommit → burnout → ditch it
> spotted connections between health issues across 20 years of medical records
> found behavioral patterns he'd been blind to for over a decade
he's now using it as his main self-improvement tool
prompts Claude monthly for perspective checks and what things he can do better
the crazy part is how AI can see patterns in your own life that you refuse to accept or just can't spot
it's like having your very own therapist who's read every single thought you've had for as long as you can remember
he posted all his prompts on GitHub and wrote a whole blog breakdown
i would do this if I had a journal to see what patterns it can spot in my life
Om Patel tweet media
English
58
48
713
154.1K