robi 😼🧶

12.5K posts


@subdigit

nomadic 😺

Portland, OR · Joined March 2007
1.1K Following · 993 Followers
robi 😼🧶@subdigit·
"collapse icon in section does not rotate properly at the center" and it just fixes it. I no longer have to spend 10 minutes hunting it down and figuring out what the issues is. It just takes seconds. For UI work, just multiply that by 100 bugs and imagine the time savings...
0 replies · 0 reposts · 0 likes · 7 views
robi 😼🧶@subdigit·
And this is the most damning thing you'll read about the AI hype train.
Peter Girnus 🦅@gothburz

@josephfounder You did the investigation. I did the confession. Karpathy did the endorsement. Rank those by reach and you'll see the problem.

0 replies · 0 reposts · 0 likes · 5 views
robi 😼🧶@subdigit·
AI is at a very interesting inflection point. We went from hallucinating AI, to helpful AI, to "just do it" AI in a matter of months. This will only accelerate. Software development as a career is going to be wildly different in just a few months. Everyone needs to be ready.
0 replies · 0 reposts · 0 likes · 9 views
robi 😼🧶@subdigit·
@ab_aditya It feels like magic. Still need to hand hold to get things done right, but magic otherwise. So very interested to see where this goes once the AI is actually intelligent...
1 reply · 0 reposts · 1 like · 13 views
Aditya Banerjee@ab_aditya·
To me Claude Code feels like playing a text based adventure in co-op mode with the LLM, with me dual hatting as a player and DM. We both get to level up our skills and also build a code base as loot.
Andrej Karpathy@karpathy

A few random notes from claude coding quite a bit the last few weeks.

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. i.e. I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

IDEs/agent swarms/fallibility. Both the "no need for IDE anymore" hype and the "agent swarm" hype are imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might make. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a huge net improvement and it's very difficult to imagine going back to manual coding. TLDR everyone has their developing flow; my current one is a few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch one struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do because 1) I can code up all kinds of things that just wouldn't have been worth coding before and 2) I can approach code that I couldn't work on before because of knowledge/skill issues. So certainly it's a speedup, but it's possibly a lot more of an expansion.

Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do; give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

Fun. I didn't anticipate that with agents programming feels *more* fun, because a lot of the fill-in-the-blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that my ability to write code manually is slowly starting to atrophy. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little, mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), on the side of actual, real improvements.

Questions. A few of the questions on my mind:
- What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows *a lot*.
- Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
- What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
- How much of society is bottlenecked by digital knowledge work?

TLDR Where does this leave us? LLM agent capabilities (Claude & Codex especially) crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related fields. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high energy year as the industry metabolizes the new capability.
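Karpathy's "naive first, then optimize while preserving correctness" advice can be sketched concretely. The pair-counting task and all function names below are invented for illustration; the point is the success criterion you would hand an agent: the optimized version must agree with an obviously-correct reference on many random inputs.

```python
# Illustrative sketch (not from the thread): a naive reference implementation
# acts as the correctness oracle for an agent's optimized rewrite.
import random
from collections import Counter

def count_pairs_naive(nums, target):
    """Obviously-correct O(n^2) reference: count pairs (i, j), i < j, summing to target."""
    return sum(
        1
        for i in range(len(nums))
        for j in range(i + 1, len(nums))
        if nums[i] + nums[j] == target
    )

def count_pairs_fast(nums, target):
    """Optimized O(n) version an agent might produce on request."""
    seen = Counter()
    count = 0
    for x in nums:
        count += seen[target - x]  # pairs completed by x with earlier elements
        seen[x] += 1
    return count

# The success criterion the agent loops against: fast must agree with naive.
for _ in range(200):
    nums = [random.randint(-5, 5) for _ in range(random.randint(0, 20))]
    assert count_pairs_fast(nums, 7) == count_pairs_naive(nums, 7)
```

The agent never needs to be told *how* to optimize; it just has to keep the assertion loop green while making the fast path faster.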

1 reply · 0 reposts · 0 likes · 31 views
robi 😼🧶 reposted
Andrej Karpathy@karpathy·
(same thread as quoted above)
1.6K replies · 5.4K reposts · 39.4K likes · 7.6M views
robi 😼🧶
robi 😼🧶@subdigit·
Been messing around with antigravity, and for secondary stuff, it feels magical enough that I can tell it what to do, tell it to tweak it, and get it to a good enough state where it works. I don't even look at the code. Primary work replacement is just a matter of time.
Guillermo Rauch@rauchg

10 days into 2026: - Terence Tao announces GPT & Aristotle solve Erdős problem autonomously - Linus Torvalds concedes vibe coding is better than hand-coding for his non-kernel project - DHH walks back “AI can’t code” from Lex podcast 6 months later An acceleration is coming the likes of which humanity has never experienced before

0 replies · 0 reposts · 0 likes · 17 views
robi 😼🧶@subdigit·
@owl_elc @Cr7Godbrand I'm so sorry for the profound assholes in your reply thread. I'm just stunned by the level of blame surfaced back to you.
0 replies · 0 reposts · 0 likes · 5 views
Owl of Athena 🇮🇱🎗️🐿️
It's much simpler than that. When we do say exactly what we want, when we get up the nerve to talk to someone who appears to be oblivious to things right in front of their face, not esoteric emotional needs, but picking their dirty laundry up off the floor, and not leaving a wet towel on the bed, we're called nags and treated as if the work we do is of no value if they bring home a paycheck that is slightly larger than ours.
42 replies · 2 reposts · 62 likes · 35.2K views
STUNNER@Cr7Godbrand·
I think I finally understand why many women complain about men so much. At its core, it comes from the frustration that men do not think, process, or experience the world the way women do. women struggle with having to verbalize their needs….to explain what’s wrong and spell things out emotionally. They wish men would simply anticipate and intuit without being told. When that doesn’t happen, it feels exhausting and unfair. This is why ideas like “mental load” and “emotional labor” keep coming up…not because men are malicious, but because men do not naturally think in the same constant, anticipatory, emotionally layered way women do. Men tend to be more linear, task-focused, and present-oriented, while women lean toward overthinking, future-projection, perfectionism, and emotional vigilance. That difference breeds resentment. Instead of accepting these differences, many women try to push men to adopt feminine emotional traits…hyper-empathy, constant emotional processing, and anticipatory caretaking…which do not come naturally to most men. This is why phrases like “emotional intelligence,” “if he wanted to, he would,” and “a real man would notice” are often used. They assume men should operate with the same internal wiring as women, which they simply do not.
[image attached]
482 replies · 591 reposts · 6.2K likes · 539.6K views
robi 😼🧶@subdigit·
It's bad that articles no longer do comments, as there isn't anyone to tell you that this is extraordinarily bad advice, or at best, incomplete advice. We're in an age of information where nothing wants to be questioned. People just want to be right... share.google/UQFczdCtgjGPCU…
0 replies · 0 reposts · 0 likes · 15 views
robi 😼🧶@subdigit·
It always scares me that LLMs are genuinely stupid and overconfident. Especially the difference in the lower vs higher models. The lower models will never be convinced its answer is wrong. It just says "sorry" and keeps producing the wrong answer. So dangerous.
0 replies · 0 reposts · 0 likes · 5 views
robi 😼🧶@subdigit·
I wouldn't call this "true" reasoning, but more of a "brute force" reasoning. There still isn't genuine understanding and reasoning until it's told what the right answer is to continue to build on. The methodology to force it to understand is interesting, if you have ♾️ tokens
Carlos E. Perez@IntuitMachine

Everyone says LLMs can't do true reasoning - they just pattern-match and hallucinate code. So why did our system just solve abstract reasoning puzzles that are specifically designed to be unsolvable by pattern matching? Let me show you what happens when you stop asking AI for answers and start asking it to think. 🧵

First, what even is ARC-AGI? It's a benchmark that looks deceptively simple: you get 2-4 examples of colored grids transforming (input → output), and you have to figure out the rule. But here's the catch: these aren't IQ test patterns. They're designed to require genuine abstraction.

(Why This Is Hard) Humans solve these by forming mental models: "Oh, it's mirroring across the diagonal." "It's finding the bounding box of blue pixels." "It's rotating each object independently." Traditional ML? Useless. You'd need millions of examples to learn each rule. LLMs? They hallucinate plausible-sounding nonsense.

But we had a wild idea: what if instead of asking the LLM to predict the answer, we asked it to write Python code that transforms the grid? Suddenly, the problem shifts from "memorize patterns" to "reason about transformations and implement them." Code is a language of logic.

Here's the basic algorithm:
1. Show the LLM examples: "Write a transform(grid) function"
2. LLM writes code
3. Run it against examples
4. If wrong → show exactly where it failed
5. Repeat with feedback

Sounds simple, right? But that's not even the most interesting part. When the code fails, we don't just say "wrong." We show the LLM a visual diff of what it predicted vs. what was correct:

Your output:
1 2/3 4   ← "2/3" means "you said 2, correct was 3"
5 6/7 8

Plus a score: "Output accuracy: 0.75". It's like a teacher marking your work in red ink. With each iteration, the LLM sees its previous failed attempts, exactly what went wrong, and the accuracy score. It's not guessing. It's debugging. And here's where it gets wild: we give it up to 10 tries to refine its logic. Most problems? Solved by iteration 3-5.

But wait, it gets crazier. We don't just run this once. We run it with 8 independent "experts" - same prompt, different random seeds. Why? Because the order you see examples matters. Shuffling them causes different insights. Then we use voting to pick the best answer. After all experts finish, we group solutions by their outputs. If 5 experts produce solution A and 3 produce solution B, we rank A higher. Why does this work? Because wrong answers are usually unique. Correct answers converge. It's wisdom of crowds, but for AI reasoning.

Each expert gets a different random seed, which affects:
- Example order (we shuffle them)
- Which previous solutions to include in feedback
- The "creativity" of the response

Same prompt. Same model. Wildly different exploration paths. One expert might focus on colors. Another on geometry.

Our prompts are elaborate. We don't just say "solve this." We teach the LLM how to approach reasoning: analyze objects and relationships, form hypotheses (start simple!), test rigorously, refine based on failures. It's like giving it a graduate-level course in problem-solving.

Here's why code matters. When you write:

    def transform(grid):
        return np.flip(grid)

you're forced to be precise. You can't hand-wave. Code doesn't tolerate ambiguity. It either works or it doesn't. This constraint makes the LLM think harder.

Oh, and we execute all this code in a sandboxed subprocess with timeouts. Because yeah, the LLM will occasionally write infinite loops or try to import libraries that don't exist. Safety first. But also: fast failure = faster learning.

ARC-AGI isn't about knowledge. It's about abstraction (seeing the pattern behind the pattern), generalization (applying a rule to new cases), and reasoning (logical step-by-step thinking). We're not teaching the AI facts. We're teaching it how to think.

So did it work? We shattered the state-of-the-art on ARC-AGI-2. Not by a little. By a lot. Problems that stumped every other system? Solved. And the solutions are readable, debuggable Python functions. You can literally see the AI's reasoning process.

This isn't just about solving puzzles. It's proof that LLMs can do genuine reasoning if you frame the problem correctly. Don't ask for answers. Ask for logic. Don't accept vague outputs. Demand executable precision. Don't settle for one attempt. Iterate and ensemble. Which makes you wonder: what else are we getting wrong about AI capabilities because we're asking the wrong questions? Maybe the limit isn't the models. Maybe it's our imagination about how to use them.

Here's what you can steal from this. When working with LLMs on hard problems:
- Ask for code/structure, not raw answers
- Give detailed feedback on failures
- Let it iterate
- Run multiple attempts with variation
- Use voting/consensus to filter noise

Precision beats creativity. The most powerful pattern here? Treating the LLM like a reasoning partner, not an oracle. We're not extracting pre-trained knowledge. We're creating a thought process - prompt → code → test → feedback → refined thought. That loop is where the magic lives. If you're working on hard AI problems, stop asking "Can the model do X?" Start asking "How can I design a process that lets the model discover X?" The future of AI isn't smarter models. It's smarter prompts, loops, and systems around them.
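The loop the thread describes (propose code, run it on the examples, feed failures back, then vote across experts) can be sketched in miniature. Everything here is illustrative: `propose_program` is a stub standing in for the LLM call, and its two hard-coded candidates replace real code generation.

```python
# Toy sketch of the generate -> run -> feedback -> ensemble-vote loop above.
# `propose_program` is a stub for the LLM; the candidates are hypothetical.
import random
from collections import Counter

IDENTITY_SRC = "def transform(grid):\n    return grid"                      # wrong rule
MIRROR_SRC = "def transform(grid):\n    return [row[::-1] for row in grid]"  # right rule

def run_program(src, grid):
    """Execute a candidate transform(grid) defined in `src` (trusted toy input)."""
    ns = {}
    exec(src, ns)  # real systems run this in a sandboxed subprocess with a timeout
    return ns["transform"](grid)

def propose_program(examples, feedback, rng):
    """LLM stub: guesses at first, then 'learns' after enough red-ink feedback."""
    if len(feedback) >= 3:
        return MIRROR_SRC
    return rng.choice([IDENTITY_SRC, MIRROR_SRC])

def solve(examples, test_input, experts=8, iters=10, seed=0):
    answers = []
    for e in range(experts):                  # independent experts, different seeds
        rng = random.Random(seed + e)
        feedback = []
        for _ in range(iters):                # iterate with failure feedback
            src = propose_program(examples, feedback, rng)
            fails = [(i, o) for i, o in examples if run_program(src, i) != o]
            if not fails:
                answers.append(str(run_program(src, test_input)))
                break
            feedback.append((src, fails))     # the "red ink" for the next attempt
    # Vote: correct answers converge, wrong ones tend to be unique.
    return Counter(answers).most_common(1)[0][0]

examples = [([[1, 2]], [[2, 1]]), ([[3, 4, 5]], [[5, 4, 3]])]
print(solve(examples, [[7, 8, 9]]))  # → [[9, 8, 7]]
```

The stub makes the dynamics visible: the identity candidate fails the examples, the feedback list grows, the "learned" mirror candidate passes, and all experts converge on the same answer, which wins the vote.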

1 reply · 0 reposts · 0 likes · 21 views
robi 😼🧶@subdigit·
@jaffathecake It feels like current AI takes too much stick bashing to make it work. If you look at the image AIs, the prompts rarely produce what you want. You may be pleasantly surprised, but it's never what you asked for. It's scary to think this same thinking is applied to AI code agents
0 replies · 0 reposts · 1 like · 20 views
Jake Archibald@jaffathecake·
For the map, it actually created a map of a driving route between the locations, which is pretty good, but it isn't what I wanted. I tried to correct it, and it got increasingly worse from there. AI-in-browser continues to feel like a novelty, and not practically useful.
[image attached]
3 replies · 0 reposts · 19 likes · 3K views
Jake Archibald@jaffathecake·
A couple of times recently I felt I could benefit from a browser agent. 1️⃣ Here's a list of TV shows & movies. Add them to my Trakt watchlist. 2️⃣ I have a bunch of open tabs to restaurants/pubs. Create a map displaying all. I tried these with ChatGPT Atlas…
5 replies · 3 reposts · 41 likes · 10.8K views
robi 😼🧶@subdigit·
The 15" MBP seems like a cash grab. A entry level machine that touts AI capabilities which a few more months down the road will surely be eclipsed by the M5 Max and Ultra and whatever else they're calling it. Just a gap filler essentially.
0 replies · 0 reposts · 0 likes · 32 views
robi 😼🧶@subdigit·
I guess this is the modern-day corollary to "Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can" from jwz? Cause if I can browse, I can also do email, so ☑️
[image attached]
0 replies · 0 reposts · 0 likes · 12 views
robi 😼🧶@subdigit·
@jeffrey_abbott I just don't understand that in the great age of our Lord 2025, it's still not easy to set up a dev-to-production pipeline for the language and framework you want. It shouldn't be this convoluted...
0 replies · 0 reposts · 0 likes · 19 views
robi 😼🧶@subdigit·
Nothing makes me feel dumber than having to deal with installing a python app... brew, pip, pipx, python, python3, venv, installing in venv, setting one up, making sure the right python is there, configuring pip to use the right version... blah blah blah
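For what it's worth, the dance in the post can at least be scripted. A minimal sketch using only the stdlib `venv` module; the `.venv` path and POSIX layout are assumptions, not anything the post prescribes.

```python
# Minimal sketch of the setup dance from the post, scripted with the stdlib only.
# Assumes a POSIX layout (.venv/bin/...); on Windows it is .venv\Scripts\...
import subprocess
import venv
from pathlib import Path

env_dir = Path(".venv")
venv.create(env_dir, with_pip=True)   # isolated env with its own pip (ensurepip, no network)

py = env_dir / "bin" / "python"       # *this* interpreter, not whatever `python3` resolves to
subprocess.run([py, "-m", "pip", "--version"], check=True)  # prove the venv's pip answers

# Installing through `py -m pip` keeps "the right python" and "the right pip" aligned:
# subprocess.run([py, "-m", "pip", "install", "some-app"], check=True)
```

Always invoking pip as `<venv python> -m pip` sidesteps most of the "which python / which pip" confusion the post complains about.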
0 replies · 0 reposts · 0 likes · 37 views
robi 😼🧶@subdigit·
When things change, you break the trust. It may indeed be for the good, but the timing is horrible. Shows no concern for stability in a peak period. Just fake metrics. So bad. Definitely no wow.
0 replies · 0 reposts · 0 likes · 9 views
robi 😼🧶@subdigit·
Sigh. Add it up there in addition to where it was, don't remove it. Why? Because you're in the busiest part of the tax season. People want a stable interface. They don't have time to have things change. They don't have time to worry about what odd change is next.
Department of Government Efficiency@DOGE

On the IRS.gov website, the "log in" button was not in the top right on the navbar like it is on most websites. It was weirdly placed in the middle of the page below the fold. An IRS engineer explained that the *soonest* this change could get deployed is July 21st... 103 days from now. This engineer worked with the DOGE team to delete the red tape and accomplished the task in 71 minutes. See before/after pictures below. There are great people at the IRS, who are simply being strangled by bureaucracy.

1 reply · 0 reposts · 0 likes · 40 views