The Trainman

1K posts


@ohadmaor

Senior iOS Engineer @ https://t.co/fL9oXSwJFW , ex @AlibabaGroup, @ridewithvia. Developing features for the machines but mainly for the Merovingian.

Israel · Joined June 2017
345 Following · 58 Followers
The Trainman @ohadmaor ·
“A new study: Here is what experienced developers do instead. → They plan before they prompt. They write out the architecture, the constraints, and the edge cases first, then hand the agent a tightly scoped task.”
Sukh Sroay@sukh_saroy

A new study just blew up the entire "vibe coding" movement. Researchers from UC San Diego and Cornell tracked 112 experienced software developers using AI agents in their actual jobs. The finding is the opposite of every viral demo on your timeline. Professional developers don't vibe code. They control. Here's what they actually found.

The researchers ran two studies. 13 developers were observed live as they coded with agents in real production work. 99 more answered a deep qualitative survey. Every participant had at least 3 years of professional experience. Some had 25.

The viral pitch of agentic coding goes like this. Hand the agent a vague prompt. Don't read the diff. Forget the code even exists. Trust the vibes. Andrej Karpathy coined the term. Tens of thousands of developers on X claim to run "dozens of agents at once" building entire production systems hands-off. The data says almost nobody serious actually works that way.

Here is what experienced developers do instead.
→ They plan before they prompt. They write out the architecture, the constraints, and the edge cases first, then hand the agent a tightly scoped task.
→ They review every diff. Not because they're paranoid. Because they've seen what happens when you don't.
→ They constrain the agent's blast radius. Small, well-defined tasks only. The moment a problem touches multiple systems or has unclear requirements, they take over.
→ They treat the agent like a fast junior dev that needs supervision, not a senior engineer that can be trusted alone.

The researchers also found something darker buried in the data. A separate randomized trial they cite showed that experienced open source maintainers were 19% slower when allowed to use AI. A different agentic system deployed in a real issue tracker had only 8% of its invocations result in a merged pull request. 92% failure rate in production. 19% productivity drop for senior devs. The viral demos lied to you.
The paper's biggest insight is in one sentence: experienced developers feel positive about AI agents only when they remain in control. The moment they let go, quality collapses, and they know it. This matches what every serious shop has quietly figured out. The developers shipping the most with AI right now aren't the ones vibing. They're the ones with the strictest review processes, the tightest task scoping, and the clearest mental model of what the agent can and cannot do. Vibe coding makes for great Twitter videos. It does not make great software. The next time someone tells you they let Claude build their entire SaaS in a weekend, ask them how much of that code they've actually read. The honest answer separates real engineers from the demo crowd.

0 replies · 0 reposts · 0 likes · 14 views
The Trainman @ohadmaor ·
“How it can be true that a single artifact will simultaneously coherently refactor a 100,000-line code base *and* tell you to walk to the car wash to wash your car. You're either in the data distribution (on the rails of the RL circuits) and flying, or you're off-roading in the jungle with a 🗡️”
Andrej Karpathy@karpathy

Fireside chat at Sequoia Ascent 2026 from a ~week ago. Some highlights:

The first theme I tried to push on is that LLMs are about a lot more than just speeding up what existed before (e.g. coding). Three examples of new horizons:
1. menugen: an app that can be fully engulfed by LLMs, with no classical code needed: input an image, output an image, and an LLM can natively do the thing.
2. install.md skills instead of install.sh scripts. Why create a complex Software 1.0 bash script for e.g. installing a piece of software if you can write the installation out in words and say "just show this to your LLM"? The LLM is an advanced interpreter of English and can intelligently target installation to your setup, debug everything inline, etc.
3. LLM knowledge bases as an example of something that was *impossible* with classical code, because it's computation over unstructured data (knowledge) from arbitrary sources and in arbitrary formats, including simply text articles etc.

I pushed on these because in every new paradigm change, the obvious things are always in the realm of speeding up or somehow improving what existed, but here we have examples of functionality that either suddenly perhaps shouldn't even exist (1, 2), or was fundamentally not possible before (3).

The second (ongoing) theme is trying to explain the pattern of jaggedness in LLMs. How it can be true that a single artifact will simultaneously 1) coherently refactor a 100,000-line code base *and* 2) tell you to walk to the car wash to wash your car. I previously wrote about the source of this as having to do with verifiability of a domain; here I expand on this as having to also do with economics, because revenue/TAM dictates what the frontier labs choose to package into training data distributions during RL. You're either in the data distribution (on the rails of the RL circuits) and flying, or you're off-roading in the jungle with a machete, in relative terms. Still not 100% satisfied with this, but it's an ongoing struggle to build an accurate model of LLM capabilities if you wish to practically take advantage of their power while avoiding their pitfalls, which brings me to...

Last theme is the agent-native economy. The decomposition of products and services into sensors, actuators and logic (split up across all of 1.0/2.0/3.0 computing paradigms), how we can make information maximally legible to LLMs, some words on the quickly emerging agentic engineering and its skill set, related hiring practices, etc., possibly even hints/dreams of fully neural computing handling the vast majority of computation with some help from (classical) CPU coprocessors.

0 replies · 0 reposts · 0 likes · 10 views
Artem Grebenkin @artem_grebenkin ·
@_MaxBlade Is it the algorithm fcking with me, or are many people actually moving to Codex from Claude Code?
7 replies · 1 repost · 50 likes · 8.5K views
Max Blade @_MaxBlade ·
How to start winning: Install Hermes on your MacBook. Get a $100-a-month Codex subscription and have it run on GPT 5.5. Don't waste time on anything else. It has absolute frontier intelligence and they have made so many improvements to the personality. It feels like working with a friend. (Kind of scary tbh.)

The amount of tokens you get subsidized with running GPT 5.5 through the Codex CLI is unbelievable.

Give Hermes GitHub access, SSH keys to your servers, and Cloudflare tokens. It will take your local projects and put them live on the web with domain, DNS, SSL, nginx, pm2, everything set up in just ONE prompt. It can monitor your backend / database and make sure everything is running perfectly. Give it a support email; when customers reach out, instruct it to look into their account on the server and make any non-destructive fixes.

Hermes will automatically turn everything it does into skills, getting better and learning from mistakes constantly. You can literally have a $250k-a-year highly technical employee that WORKS 24/7 for $100 right now. I need you to wake up.
202 replies · 163 reposts · 3.5K likes · 232.8K views
The Trainman reposted
Big Brain AI @realBigBrainAI ·
Jack Dorsey, co-founder of Twitter (now X) and Block, on why treating AI as a "copilot" is a losing strategy:

@jack argues that most companies are approaching AI in a way that will make it nearly impossible for them to survive. "I think most of the industry is thinking about AI as like a co-pilot, as something that is augmented onto, rather than like how do you just rebuild our whole company with this as the core."

His concern is that bolting AI onto existing structures produces companies that look indistinguishable from each other, and from the AI labs themselves. "If it doesn't make sense for your business to do that and you end up being or looking very similar or rhyming too closely with the frontier labs, then I think it's going to be very, very challenging to differentiate and survive."

This thinking has been driving his decisions since early 2024, when these tools "really came to bear." That's when his team began building Goose, an agent coding harness, as part of a broader effort to rebuild around AI rather than layer it on top.

The core insight? Speeding up old workflows with AI is a short-term gain every competitor will match. Real differentiation comes from rebuilding the company itself around intelligence.
177 replies · 243 reposts · 1.9K likes · 855.2K views
The Trainman reposted
Paras Chopra @paraschopra ·
AI bois be like:
Paras Chopra tweet media
125 replies · 549 reposts · 7.5K likes · 294.3K views
The Trainman reposted
Aakash Gupta @aakashgupta ·
Karpathy told Dwarkesh that a 1 billion parameter model, trained on clean data, could hit the intelligence of today's 1.8 trillion parameter frontier. That is a 1,800x compression claim. The math behind it is more defensible than it sounds.

When researchers at frontier labs look at random samples from their training corpus, they see stock ticker symbols, broken HTML, forum spam, autogenerated gibberish. Not Wikipedia. Not the Wall Street Journal. The actual pretraining dataset is mostly noise, and the model is burning parameters to vaguely remember all of it. One estimate pegs Llama 3's information compression at 0.07 bits per token. Well-structured English carries around 1.5 bits per token of real information. The trillion-parameter model is holding a roughly 5% resolution image of the internet it trained on.

So when a lab ships a 1.8 trillion parameter model, the overwhelming majority of those weights are handling rough memorization. They are compression overhead for a noisy training set, taking up capacity that could be doing reasoning instead.

Karpathy's proposal is to separate the two. Build a cognitive core: a small model that contains only the algorithms for reasoning and problem-solving, stripped of encyclopedic memorization. Pair it with external memory the model queries when it needs a fact. A 1 billion parameter reasoner plus retrieval beats a 1.8 trillion parameter model trying to do both.

The data already supports this direction. GPT-4o runs at roughly 200 billion parameters and outperforms the original 1.8 trillion GPT-4. Inference costs for GPT-3.5 level performance fell 280x between 2022 and 2024, driven almost entirely by smaller, cleaner, better-architected models. The trend line is pointing where Karpathy says it should.

The real implication for anyone tracking the AI trade: data quality is the actual constraint. The companies winning the next phase will be the ones who figured out what to train on, and what to throw away.
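The "roughly 5%" and "1,800x" figures above follow directly from the numbers the thread cites; a quick check of the arithmetic (using the thread's own estimates, not independently verified values):

```python
# Estimates cited in the thread, not independently verified.
llama3_bits_per_token = 0.07   # claimed information retention of Llama 3
english_bits_per_token = 1.5   # claimed information content of clean English

# "Roughly 5% resolution image of the internet it trained on":
resolution = llama3_bits_per_token / english_bits_per_token
print(f"retained resolution: {resolution:.1%}")  # ~4.7%, i.e. roughly 5%

# "1,800x compression claim": 1.8T-parameter frontier vs 1B cognitive core.
compression_factor = 1.8e12 / 1e9
print(f"compression factor: {compression_factor:.0f}x")  # 1800x
```

So the 5% figure is just the ratio of the two bits-per-token estimates; the whole claim stands or falls on those inputs.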
129 replies · 349 reposts · 3.1K likes · 505.7K views
The Trainman reposted
Uncle Bob Martin @unclebobmartin ·
AIs are just another step up the semantic expression ladder. We initially expressed our semantics in binary, then assembler, then Fortran, then C, then Java, then Python, etc. AI is just the next step up that same old ladder.

And when you take that step, nothing else changes. You are still expressing behavioral semantics. You still need to express structural semantics. All the old principles still apply. You still have to be concerned about design and architecture. And even though the syntax allows informal statement, you cannot abandon formalism.

When you express behavior you need a formal way to enforce the behavior you want. I use Gherkin for this. It seems to work pretty well. Consider that Gherkin is written in triplets of Given/When/Then. Each of those GWT triplets is a transition of a state machine. A full suite of Gherkin triplets is a formal description of the finite state machine that represents the behavior of the application.

Other formalisms that matter are things like module dependency graphs, testing constraints, complexity constraints, and many others. This step up the semantic expression ladder provides you with an enormous amount of options. But you'd better choose those options wisely!
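The GWT-triplet-as-state-transition idea is easy to make concrete. A minimal sketch, where each Given/When/Then triplet becomes one entry in a transition table (the shopping-cart states and events are invented for illustration, not from any real Gherkin suite):

```python
# Each Gherkin triplet "Given <state> / When <event> / Then <state>"
# read as one transition of a finite state machine. The suite of
# triplets IS the machine; anything not listed is unspecified behavior.
transitions = {
    # (given_state, when_event): then_state
    ("cart_empty", "add_item"): "cart_has_items",
    ("cart_has_items", "checkout"): "order_placed",
    ("order_placed", "payment_fails"): "cart_has_items",
}

def run(start, events):
    """Replay a sequence of events; fail loudly if no triplet covers a step."""
    state = start
    for event in events:
        key = (state, event)
        if key not in transitions:
            raise ValueError(f"no GWT triplet covers {key}")
        state = transitions[key]
    return state

print(run("cart_empty", ["add_item", "checkout"]))  # order_placed
```

This is the enforcement point: an LLM-generated implementation either satisfies every listed transition or it raises on an uncovered one, which is exactly the formalism the tweet argues you cannot abandon.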
56 replies · 72 reposts · 660 likes · 36K views
The Trainman reposted
Andrej Karpathy @karpathy ·
Someone recently suggested to me that the reason the OpenClaw moment was so big is that it's the first time a large group of non-technical people (who otherwise only knew AI as synonymous with ChatGPT as a website) experienced the latest agentic models.
254 replies · 174 reposts · 3.9K likes · 439.8K views
Uri Eliabayev @urieli17 ·
Third time today that it hit its maximum, so I've now decided to open Codex and point it at the folder with all the skills I created with Claude. We're now running tests to make all of them usable for GPT 5.4. Luckily it recognized all of them and quickly understood the general concept. Anthropic have gone completely overboard.
Uri Eliabayev@urieli17

Claude Code has lately been burning through its tokens way too fast for me, even on the $100 plan. I'm slowly starting to move some of the work over to OpenAI's Codex (I have a $20 subscription). The goal is also to reinforce with strong local models and reduce the dependence on Claude. I'll keep you posted.

34 replies · 0 reposts · 184 likes · 18.4K views
The Trainman @ohadmaor ·
“Researchers tested 8 coordination protocols across 8 models and up to 256 agents. The protocol where agents were given NO assigned roles, NO hierarchy, and NO coordinator outperformed centralized coordination by 14%”
Sukh Sroay@sukh_saroy

🚨Shocking: A 25,000-task experiment just proved that the entire multi-agent AI framework industry is built on the wrong assumption.

Every major framework - CrewAI, AutoGen, MetaGPT, ChatDev - starts from the same premise: assign roles, define hierarchies, let a coordinator distribute work. Researchers tested 8 coordination protocols across 8 models and up to 256 agents. The protocol where agents were given NO assigned roles, NO hierarchy, and NO coordinator outperformed centralized coordination by 14%. The gap between the best and worst protocol was 44%. That's not noise. That's a completely different outcome depending on how you organize the agents - not which model you use.

Here's what makes this uncomfortable: When agents were simply given a fixed turn order and told "figure it out," they spontaneously invented 5,006 unique specialized roles from just 8 agents. They voluntarily sat out tasks they weren't good at. They formed their own shallow hierarchies - without anyone designing them. The researchers call it the "endogeneity paradox." The best coordination isn't maximum control or maximum freedom. It's minimal scaffolding - just enough structure for self-organization to emerge.

But there's a catch nobody building agents wants to hear: below a certain model capability threshold, the effect reverses. Weaker models actually need rigid structure. Autonomy only works when the model is smart enough to use it. Which means every agent framework shipping with one-size-fits-all hierarchies is wrong twice - over-constraining strong models and under-constraining weak ones.

The $2B+ invested in agent orchestration tooling may be solving a problem that capable models solve better on their own.
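As a toy illustration of the "minimal scaffolding" protocol described above (this is not the paper's code; the skill scores, the pass rule, and the single-actor simplification are all invented for the sketch):

```python
import random

# Minimal scaffolding: agents get only a fixed turn order. No roles,
# no hierarchy, no coordinator. Each agent may act or voluntarily sit
# out, mimicking the self-selection behavior the study reports.
random.seed(0)
agents = [{"name": f"a{i}", "skill": random.random()} for i in range(8)]

def solve(task_difficulty):
    """Walk the fixed turn order; the first agent confident enough acts."""
    for agent in agents:                       # fixed turn order
        if agent["skill"] >= task_difficulty:  # otherwise it sits out
            return agent["name"]
    return None  # nobody volunteered: the task fails

print(solve(0.5))   # an easy task gets picked up early in the order
print(solve(0.95))  # a too-hard task: every agent sits out
```

Even this toy shows the capability-threshold catch: lower every `skill` value (a weaker model) and `solve` returns `None` for most tasks, which is where rigid role assignment would have to step back in.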

0 replies · 0 reposts · 0 likes · 9 views
The Trainman reposted
Alex Volkov @altryne ·
If you, like me, just woke up, let me catch you up on the Claude Code Leak (I know nothing, all conjecture):
> Someone inside Anthropic got switched to Adaptive reasoning mode
> Their Claude Code switched to Sonnet
> Committed the .map file of Claude Code
> Effectively leaking the ENTIRE CC Source Code
> @realsigridjin was tired after running 2 South Korean hackathons in SF, saw the leak
> Rules in Korea are different, he cloned the repo, went to sleep
> Wakes up to 25K stars, and his GF begging him to take it down (she's a copyright lawyer)
> Their team decided: how about we have agents rewrite this in Python!? Surely... this is more legal
> Rewrite in Py
> Board a plane to SK🇰🇷
> One of the guys decides Python is slow, is now rewriting ALL OF CLAUDE CODE into Rust
> Anthropic cannot take down, cannot sue
> Is this "fair use?"
> TL;DR - we're about to have open source Claude Code in Rust
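For context on why a single committed .map file can leak an entire codebase: JavaScript source maps may embed every original source file verbatim in their `sourcesContent` field, so anyone holding the map can reconstruct the pre-minification sources. A sketch with an invented miniature map (the file name and content are hypothetical, not from the actual leak):

```python
import json

# A hypothetical source map for a minified bundle. Real maps follow the
# same shape: "sources" lists file paths, and the optional
# "sourcesContent" array holds each original file's full text.
source_map = {
    "version": 3,
    "sources": ["src/cli.ts"],
    "sourcesContent": ["export const main = () => console.log('hi')\n"],
    "mappings": "AAAA",
}

def extract_sources(map_text):
    """Recover original files from a source map, if they are embedded."""
    m = json.loads(map_text)
    return dict(zip(m.get("sources", []), m.get("sourcesContent") or []))

files = extract_sources(json.dumps(source_map))
print(list(files))  # ['src/cli.ts']
```

This is why shipping or committing .map files for proprietary code is treated as a disclosure: the map is not just line offsets, it can be the source itself.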
Alex Volkov tweet media
352 replies · 1.1K reposts · 11.9K likes · 2M views
The Trainman reposted
Ole Lehmann @itsolelehmann ·
i can't believe more people aren't talking about this part of the claude code leak. there's a hidden feature in the source code called KAIROS, and it basically shows you anthropic's endgame.

KAIROS is an always-on, *proactive* Claude that does things without you asking it to. it runs in the background 24/7 while you work (or sleep). anthropic hasn't turned it on to the public yet, but the code is fully built.

here's how it works: every few seconds, KAIROS gets a heartbeat. basically a prompt that says "anything worth doing right now?" it looks at what's happening and makes a call: do something, or stay quiet. if it acts, it can fix errors in your code, respond to messages, update files, run tasks... basically anything claude code can already do, just without you telling it to.

but here's what makes KAIROS different from regular claude code: it has (at least) 3 exclusive tools that regular claude code doesn't get:
1. push notifications, so it can reach you on your phone or desktop even when you're not in the terminal
2. file delivery, so it can send you things it created without you asking for them
3. pull request subscriptions, so it can watch your github and react to code changes on its own

regular claude code can only talk to you when you talk to it. KAIROS can tap you on the shoulder. and it keeps daily logs of everything:
> what it noticed
> what it decided
> what it did
append-only, meaning it can't erase its own history (you can read everything).

at night it runs something the code literally calls "autoDream," where it consolidates what it learned during the day and reorganizes its memory while you sleep. and it persists across sessions. close your laptop friday, open it monday, it's been working the whole time.

think about what this means in practice:
> you're asleep and your website goes down. KAIROS detects it, restarts the server, and sends you a notification. by the time you see it, it's already back up
> you get a customer complaint email at 2am. KAIROS reads it, sends the reply, and logs what it did. you wake up and it's already resolved
> your stripe subscription page has a typo that's been live for 3 days. KAIROS spots it, fixes it, and logs the change

endless use-cases, it's essentially a co-founder who never sleeps. the codebase has this fully built and gated behind internal feature flags called PROACTIVE and KAIROS.

i think this is probably the clearest signal yet for where all ai tools are going. we are heading into the "post-prompting" era where the ai just works for you in the background, like an all-knowing teammate who notices and handles everything before you even think to ask.
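Taking the thread's description at face value, the heartbeat pattern it describes (tick, observe, act-or-stay-quiet, append-only log) might look roughly like this. Every name and the toy policy below are speculative illustrations; nothing here is from the leaked code:

```python
# Speculative sketch of a heartbeat agent loop as described above:
# each tick asks "anything worth doing right now?", then either acts
# or stays quiet, and always appends to an append-only log.
log = []  # append-only: (what it noticed, what it decided, what it did)

def decide(observation):
    """Placeholder policy: act only when something looks broken."""
    return "restart_server" if observation == "server_down" else None

def heartbeat(observation):
    """One tick: observe, decide, act or stay quiet, and log it."""
    action = decide(observation)
    log.append((observation, action or "stay_quiet", action or "nothing"))
    return action

heartbeat("all_quiet")    # stays quiet, still logged
heartbeat("server_down")  # acts, logged
print(log)
```

The interesting design property is the log: because every tick is recorded whether or not the agent acted, the history doubles as an audit trail the agent cannot rewrite, matching the "it can't erase its own history" claim.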
Ole Lehmann tweet media
Chaofan Shou@Fried_rice

Claude code source code has been leaked via a map file in their npm registry! Code: …a8527898604c1bbb12468b1581d95e.r2.dev/src.zip

248 replies · 333 reposts · 3.2K likes · 802.4K views
The Trainman reposted
Andrej Karpathy @karpathy ·
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it's so convincing!
- Fun idea: let's ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol

The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.
1.7K replies · 2.4K reposts · 31.4K likes · 3.4M views
Ejaaz @cryptopunk7213 ·
wtf did i just read LMAO. ai-powered cows worth $2 billion are using an algorithm called the "cowgorithm" to boost farming productivity (i'm not fucking joking)
- Halter makes ai-powered cow collars that virtually monitor health, location and herd cows
- farmers literally tap a button on the app and the cows gather for milking
- 600,000 cow collars already live
- peter thiel is backing their latest round worth $2 billion

every fucking time i think ive seen the most ridiculous (but cool) application of ai i am proven wrong.
Ejaaz tweet media
Bloomberg@business

Peter Thiel’s Founders Fund is backing a company bringing AI to cow herding at a $2 billion valuation bloomberg.com/news/articles/…

521 replies · 1.2K reposts · 15.7K likes · 2.5M views