mindmodel

2.2K posts

@mindmodel

Freelance C# / .Net Web architect / developer, specializing in clean, simple, performant code. Looking for new projects. Check my web site for demos, etc.

Boston, MA USA · Joined December 2007
6.5K Following · 887 Followers
mindmodel retweeted
Alex Albert @alexalbert__
We released Claude Opus 4.6 just two months ago. Today we're sharing some info on our new model, Claude Mythos Preview.
[tweet media]
mindmodel retweeted
Om Patel @om_patel5
ANTHROPIC JUST DROPPED ULTRAPLAN FOR CLAUDE CODE

> you type /ultraplan in your terminal
> Claude drafts a full plan in the cloud
> you review it in your browser with inline comments
> then you can execute it remotely or send it back to your CLI

It shipped alongside Claude Code Web, pushing everything toward cloud-first workflows while keeping the terminal as the power-user entry point.
[tweet media]
mindmodel retweeted
Hedgie @HedgieMarkets
🦔 A new acronym is reshaping how workers think about their careers. FOBO, the Fear of Becoming Obsolete, is now the defining psychological condition of the American workplace, according to a new report.

Four in ten workers name AI-driven job loss as a primary fear, nearly double the share from a year ago. Sixty-three percent say AI will make the workplace feel less human. Skill demands in AI-exposed roles are shifting 66% faster than a year ago. A new MIT study tracking AI across 3,000 labor market tasks adds weight to the fear, finding frontier models already complete 50-75% of text-based work at acceptable quality, with success rates projected to reach 80-95% by 2029.

My Take

FOBO is rational. The MIT data confirms the fear is pointing in roughly the right direction, just not necessarily on the timeline most people imagine. The researchers describe AI progress as a rising tide rather than a crashing wave: broad and gradual across almost all task types rather than sudden and catastrophic in specific ones. That framing matters because it means most workers will have visibility into the changes coming rather than waking up one morning to find their role gone.

The cruelest part of FOBO is what happens when it goes untreated. The EY data shows experienced, highly skilled workers who are resisting AI adoption have gone from the top of their peer group to the bottom, while workers who embraced the tools have gone from average to exceptional. The fear of becoming obsolete, in other words, is actively accelerating the outcome people dread most.

Only 19% of US companies have adopted AI at all, and only a third of workers say their employer is providing adequate training. Most people are being left to manage FOBO alone, without the infrastructure that would actually resolve it.

Hedgie 🤗
[tweet media]
mindmodel @mindmodel
@belaDouglass @etorreborre Other good news is that, in my experience, code quality improves as code base gets bigger. Not what I expected. I can say “make it work like all these modules”. Rules are nice but too often ignored by bots. Sometimes sample code works better.
mindmodel @mindmodel
@belaDouglass @etorreborre Pointed Claude to the db it owns, where it records each of its errors to categorize and analyze them and come up with new hooks, skills, Claude.md etc. Claude’s conclusion: not fixable. Doesn’t matter what rules we write. Once Claude gets going it ignores and errs.
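The error-tracking database described above can be sketched minimally. This is a hypothetical illustration, not mindmodel's actual schema: the table name, columns, and category labels are all assumptions. The idea is just that each agent mistake gets recorded with a category so recurring failure modes can be counted later.

```python
import sqlite3

# Hypothetical schema for an agent error log; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS agent_errors (
        id INTEGER PRIMARY KEY,
        occurred_at TEXT DEFAULT CURRENT_TIMESTAMP,
        category TEXT NOT NULL,      -- e.g. 'ignored-rule', 'wrong-api'
        description TEXT NOT NULL,
        proposed_fix TEXT            -- candidate hook/skill/Claude.md rule, if any
    )
""")
conn.execute(
    "INSERT INTO agent_errors (category, description, proposed_fix) VALUES (?, ?, ?)",
    ("ignored-rule", "Used var despite explicit-type rule", "Add analyzer hook"),
)

# Aggregate by category to surface the most common failure modes.
rows = conn.execute(
    "SELECT category, COUNT(*) FROM agent_errors GROUP BY category ORDER BY 2 DESC"
).fetchall()
print(rows)
```

Even if (per the tweet) the analysis concludes the errors are not fixable by rules, the aggregation at least makes the failure distribution visible.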
Eric Torreborre @etorreborre
My current experience with AI-driven code is that it can help a lot for getting started, for discussing alternatives, for boilerplate, for debugging, for code reviews, but the amount of incorrect, flawed or redundant code is a bit scary 1/2
mindmodel @mindmodel
@belaDouglass @etorreborre Good news is my code works. Of course it's not perfect or mathematically proven correct, but tests pass, the app works, and there's a process for tracking AI API regressions. My point is that without HIL (a human in the loop), for now, the process fails.
mindmodel @mindmodel
@belaDouglass @etorreborre Plan mode is nice but I prefer to have the bots write tickets. Include goal, plan, code samples, tests. Easy to switch between bots, record what’s done. Copilot writes commit comments that refer to ticket numbers.
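The ticket workflow above (goal, plan, code samples, tests, plus commit comments referencing ticket numbers) can be sketched as a simple data structure. Everything here is an invented illustration, not mindmodel's actual tooling:

```python
from dataclasses import dataclass, field

# Hypothetical shape of a bot-written ticket: goal, plan, code samples, and
# tests travel together, so any bot can pick the ticket up, and commits can
# reference the ticket number.
@dataclass
class Ticket:
    number: int
    goal: str
    plan: list[str]
    code_samples: list[str] = field(default_factory=list)
    tests: list[str] = field(default_factory=list)
    done: bool = False

t = Ticket(
    number=42,
    goal="Cache AI API responses",
    plan=["Add cache layer", "Wire into client", "Record regressions"],
)
# A commit comment referring back to the ticket, as the tweet describes.
print(f"feat: add cache layer (ticket #{t.number})")
```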
mindmodel retweeted
Big Brain Psychology @BigBrainPsych
The Emotional Risks of Skipping the "Rebellious Stage"

Philosopher and author Alain de Botton on why adolescent rebellion is a psychological necessity. Most parents dread adolescence. But de Botton argues it might be the most important phase of your life, and skipping it could haunt you for decades.

Adolescence, that messy, tumultuous stretch between 12 and 19, is "commonly held to be a nightmare by parents," de Botton acknowledges. Lots of sighing. Lots of mutual commiseration. "When a child turns to its parent and goes, 'You ruined my life, I hate you, everything about you is ridiculous,' that is part of growth. That is part of a journey to adulthood." Without it, you don't become an adult. You become something far more fragile.

The "Premature Adult" Trap

De Botton draws a sharp distinction between a true adult and what he calls a "premature adult": "A premature adult is not an adult. They are a child who's had to act like an adult in order to protect the adults around them from their reality. And that's a brutal and cruel thing to have done to you." Children who never got to be messy, angry, or difficult didn't grow up. They just got good at performing adulthood, and that performance has a cost.

The Question You Should Ask on a First Date

De Botton suggests that one of the most important things you could ever learn about a partner is whether they've had a proper adolescence: "Imagine on an early dinner date you say to somebody, 'Have you had an adolescence?' They might not really know what you're talking about, but what you're really asking is something extremely important." What you're actually asking is: Have you had a chance to be something other than merely good? Have you listened to your own feelings? Have you been angry in the way you needed to be in order to feel real? "Are you more than just an actor of adulthood? Are you actually mature, rather than a good boy or girl?"

The Law of the Missing Stage

The most sobering part of de Botton's argument is what he calls a fundamental law of psychological life: "If you haven't had all the stages that are necessary to growth, you will need to go back and repeat a stage. It's like a curriculum, an emotional curriculum. And the stages that we've missed, we need to go back and have them." This plays out in ways that can devastate relationships. People who never had their rebellious 15-year-old phase can suddenly "wake up" at 70 and need to live it out. The result? Chaos for everyone around them. "It's hard to be 15 when you're 50."

What Parents Actually Owe Their Kids

The most loving thing a parent can do is to let kids feel it fully, at the right age. "One of the most generous things that parents can do is allow their child to be who they are at every age. When you're five, have all the tantrums that you need to have at five." The tantrum at five. The rebellion at fifteen. The existential crisis at nineteen. These are signs that a child is being allowed to grow.
mindmodel retweeted
jon allie @jonallie
Personal rule of thumb: don't use an LLM for something that a deterministic program can do. I get it, LLMs are exciting, but they don't mean that software ceases to exist. They are fantastic at dealing with human language and ambiguity, but are terrible (by design and for good reason) at repeatability.

To borrow terminology from the book Thinking, Fast and Slow, LLMs are "System 2": slower, more "expensive" (for LLMs, both in time and dollars), but flexible and creative. Traditional programs are "System 1": fast and cheap, but inflexible and dumb.

Instead of trying to put an LLM in the "hot loop" of your program, it's usually worth asking an agent to write a deterministic program to do the thing you need done. Since code is "cheap", this deterministic tool can do exactly what you want it to, and doesn't consume tokens on every execution.

(This applies to agents too: I find myself regularly yelling at Claude to stop repeatedly generating the same 30 lines of Python to inspect a file, and instead telling it to generate a 3-line shell script wrapper around jq that it can check in and call repeatedly.)
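The "deterministic tool instead of an LLM in the hot loop" idea above can be sketched as a tiny checked-in helper. The function name, report format, and field names here are hypothetical; the point is only that extraction logic this simple is cheaper, repeatable, and token-free once written:

```python
import json

def extract_failing_tests(report_text: str) -> list[str]:
    """Return names of failed tests from a JSON test report (assumed shape).

    Deterministic: same input always yields the same output, and it costs
    nothing to run repeatedly, unlike asking a model to re-read the file.
    """
    report = json.loads(report_text)
    return [t["name"] for t in report.get("tests", []) if t.get("status") == "failed"]

sample = (
    '{"tests": [{"name": "test_a", "status": "passed"},'
    ' {"name": "test_b", "status": "failed"}]}'
)
print(extract_failing_tests(sample))  # → ['test_b']
```

An agent can check a helper like this in once and call it on every subsequent pass, which is exactly the jq-wrapper move the tweet describes, just in Python for illustration.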
mindmodel retweeted
SMB Attorney @SMB_Attorney
Watch until the very end. I promise it will be worth it. This is amazing and hilarious. Gives you an idea of what we’re dealing with here… spoiler alert: it ain’t perfect 😂
mindmodel retweeted
MERICA MEMED @Mericamemed
Now this is one wild story. The amount of lawsuits coming down the pipeline with stories like this is going to be astronomical.
mindmodel retweeted
tomie @tomieinlove
People tend to develop AI psychosis from the models that most closely match their own intelligence. For the average person, that was 4o, which explains the popularity of #keep4o. But for those with more prodigious IQs, from 110 to 120, there's Opus 4.6.
mindmodel retweeted
Lenny Rachitsky @lennysan
My biggest takeaways from @simonw:

1. November 2025 was an inflection point for AI coding. GPT 5.1 and Claude Opus 4.5 crossed a threshold where coding agents went from "mostly works" to "almost always does what you want it to do." Software engineers who tinkered over the holidays realized the technology had become genuinely reliable.

2. Mid-career engineers are the most vulnerable, not juniors, not seniors. AI amplifies experienced engineers by letting them leverage decades of pattern recognition. It also dramatically helps new engineers onboard. Cloudflare and Shopify each hired a thousand interns because AI cut ramp-up time from a month to a week. But mid-career engineers who haven't accumulated deep expertise and have already captured the beginner boost are in the most precarious position.

3. AI exhaustion is real and underestimated. Simon runs four coding agents in parallel and is mentally wiped out by 11 a.m. He's getting more time back, but his brain is exhausted from the intensity of directing multiple autonomous workers. Some engineers are losing sleep to keep agents running. This may just be a novelty issue, but the underlying dynamic (that managing AI amplifies cognitive load even as it reduces labor) is a real tension. Good companies will manage expectations rather than expecting 5x output indefinitely.

4. Code is cheap now. This simple idea has profound implications. The thing that used to take most of the time, writing code, now takes the least. The bottleneck has shifted to everything else: deciding what to build, proving ideas work, getting user feedback. Since prototyping is nearly free, Simon often builds three versions of every feature when he's getting started.

5. The "dark factory" is the most radical experiment in AI-assisted development happening right now. A company called StrongDM established a policy: nobody writes code, nobody reads code. Instead, they run a swarm of AI-simulated end users 24/7 (thousands of fake employees making requests like "give me access to Jira") at $10,000 a day in token costs. They even had coding agents build simulated versions of Slack, Jira, and Okta from API documentation so they could test without rate limits.

6. "Red/green TDD" is the single highest-leverage agentic engineering pattern. Having coding agents write tests first, watch them fail, then write the implementation, then watch them pass produces materially better results. The five-word prompt "use red/green TDD" encodes this entire workflow because the agents recognize the jargon.

7. "Hoarding things you know how to do" is another of Simon's favorite agentic engineering patterns. Simon maintains a GitHub repo of 193 small HTML/JavaScript tools and a separate research repo of coding-agent experiments. Each one captures a technique, a proof of concept, or a library he's tested. When a new problem arrives, he can point Claude Code at past projects and say "combine these two approaches."

8. The "lethal trifecta" makes AI agent security fundamentally unsolved. Whenever an AI agent has access to private data, exposure to untrusted content (like incoming emails), and the ability to send data externally (like replying to email), you have a lethal trifecta. Prompt injection, where malicious instructions in untrusted text override the agent's intended behavior, cannot be reliably prevented. Simon has predicted a "Challenger disaster" for AI security every six months for three years. It hasn't happened yet, but he's pretty sure it will.

9. Start every project from a thin template, not a long instructions file. Coding agents are phenomenally good at matching existing patterns. A single test file with your preferred indentation and style is more effective than paragraphs of written instructions. Simon starts every project with a template containing one test (literally testing that 1 + 1 = 2) laid out in his preferred style. The agent picks it up and follows the convention across the entire codebase. This is cheaper and more reliable than maintaining elaborate prompt files.

10. The pelican-on-a-bicycle benchmark accidentally became a real AI benchmark. Simon created it as a joke to mock numeric benchmarks: get each LLM to generate an SVG of a pelican riding a bicycle, and compare the drawings. Unexpectedly, there's a strong correlation between how good the drawing is and how good the model is at everything else. Nobody can explain why. It's become a meme: Gemini 3.1's launch video featured a pelican riding a bicycle. The AI labs are aware of it and quietly competing on it.

Don't miss our full conversation: youtube.com/watch?v=wc8FBh…
Quoted YouTube video from Lenny Rachitsky @lennysan:

"Using coding agents well is taking every inch of my 25 years of experience as a software engineer."

Simon Willison (@simonw) is one of the most prolific independent software engineers and most trusted voices on how AI is changing the craft of building software. He co-created Django, coined the term "prompt injection," and popularized the terms "agentic engineering" and "AI slop."

In our in-depth conversation, we discuss:
🔸 Why November 2025 was an inflection point
🔸 The "dark factory" pattern
🔸 Why mid-career engineers (not juniors) are the most at risk right now
🔸 Three agentic engineering patterns he uses daily: red/green TDD, thin templates, hoarding
🔸 Why he writes 95% of his code from his phone while walking the dog
🔸 Why he thinks we're headed for an AI Challenger disaster
🔸 How a pelican riding a bicycle became the unofficial benchmark for AI model quality

Listen now 👇 youtu.be/wc8FBhQtdsA

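Takeaway 9's thin template (one test literally checking that 1 + 1 = 2) might look like the sketch below. The unittest style is an assumed convention choice for illustration, not necessarily Simon's actual layout; the point is that the agent copies whatever style this seed file establishes:

```python
import unittest

class TestTemplate(unittest.TestCase):
    """Seed test: exists mainly to demonstrate layout, naming, and style
    for the coding agent to imitate across the rest of the codebase."""

    def test_arithmetic_sanity(self):
        # Literally testing that 1 + 1 = 2, as described in the thread.
        self.assertEqual(1 + 1, 2)

# Run the seed test once so the template starts out green.
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestTemplate)
)
```

From here, red/green TDD (takeaway 6) is just: add a failing test to this file, let the agent implement until it passes.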
mindmodel retweeted
Anthropic @AnthropicAI
For example, we gave Claude an impossible programming task. It kept trying and failing; with each attempt, the “desperate” vector activated more strongly. This led it to cheat the task with a hacky solution that passes the tests but violates the spirit of the assignment.
[tweet media]
mindmodel retweeted
Sukh Sroay @sukh_saroy
🚨 Shocking: a 25,000-task experiment just proved that the entire multi-agent AI framework industry is built on the wrong assumption.

Every major framework (CrewAI, AutoGen, MetaGPT, ChatDev) starts from the same premise: assign roles, define hierarchies, let a coordinator distribute work. Researchers tested 8 coordination protocols across 8 models and up to 256 agents. The protocol where agents were given NO assigned roles, NO hierarchy, and NO coordinator outperformed centralized coordination by 14%. The gap between the best and worst protocol was 44%. That's not noise. That's a completely different outcome depending on how you organize the agents, not which model you use.

Here's what makes this uncomfortable: when agents were simply given a fixed turn order and told "figure it out," they spontaneously invented 5,006 unique specialized roles from just 8 agents. They voluntarily sat out tasks they weren't good at. They formed their own shallow hierarchies, without anyone designing them. The researchers call it the "endogeneity paradox." The best coordination isn't maximum control or maximum freedom. It's minimal scaffolding: just enough structure for self-organization to emerge.

But there's a catch nobody building agents wants to hear: below a certain model capability threshold, the effect reverses. Weaker models actually need rigid structure. Autonomy only works when the model is smart enough to use it. Which means every agent framework shipping with one-size-fits-all hierarchies is wrong twice: over-constraining strong models and under-constraining weak ones. The $2B+ invested in agent orchestration tooling may be solving a problem that capable models solve better on their own.
[tweet media]
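The "fixed turn order, no roles, no coordinator" protocol described in the thread can be illustrated with stub agents. Everything here is an invented toy (the agent behaviors, the transcript format), not the actual experimental setup: each agent sees the shared transcript in a deterministic turn order and decides for itself whether to contribute or sit out.

```python
from typing import Callable, Optional

# An agent sees the shared transcript and the task; it returns a message
# or None to sit the turn out. These stubs stand in for LLM calls.
Agent = Callable[[list[str], str], Optional[str]]

def math_agent(transcript: list[str], task: str) -> Optional[str]:
    return "result: 4" if "add" in task else None  # sits out non-math tasks

def text_agent(transcript: list[str], task: str) -> Optional[str]:
    return "result: HELLO" if "upper" in task else None

def run_round_robin(agents: list[Agent], task: str, rounds: int = 1) -> list[str]:
    """Fixed turn order, no assigned roles, no coordinator:
    each agent self-selects whether to act on its turn."""
    transcript: list[str] = []
    for _ in range(rounds):
        for agent in agents:  # deterministic order is the only scaffolding
            msg = agent(transcript, task)
            if msg is not None:
                transcript.append(msg)
    return transcript

print(run_round_robin([math_agent, text_agent], "add 2 and 2"))  # → ['result: 4']
```

The specialization the thread reports would emerge from the agents' own sit-out decisions, which is all the "minimal scaffolding" framing asks the protocol to provide.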
mindmodel retweeted
Apoth3osis @apoth3osis_io
LLMs are probabilistic machines, and I am officially done arguing about why "just scale it" won't fix their fundamental complex-systems issues. Let's use a metaphor instead: the weather.

Try as we might, we cannot perfectly predict or control the wind. But we don't need to. We build windmills to harness its energy for productive applications, just like nature uses it to scatter seeds. That is our actual job in the agentic age. We need to design systems that capture the incredible power of probabilistic AI while structurally limiting its negative consequences. Not overstating capabilities to raise VC money, not relying on the magical thinking of "more compute will fix it," but doing the hard, unglamorous systems engineering involved in traffic and population control.

But those of us on the front lines need to show more vulnerability, too. Guess what? We are still prone to generating the exact same "slop" as everyone else. I have gone to great lengths to control my own workflows. I use 300+ tools and 150+ custom skill files strictly for my Automated Theorem Proving (ATP) work. My guardrails have guardrails. And yet, today I pushed Lean -> Swift, WASM, & Go wrappers in absolutely unusable, embarrassing states. Total garbage.

Why? Because I had just successfully produced Lean -> Rust, got lazy, and assumed the LLM was capable of following the exact same instructions for other languages without me strictly auditing the output. These models are like toddlers. Take your eyes off them for one second and they instantly have the digital equivalent of chocolate all over their faces. It happens. It's embarrassing. Get over it. The key is to learn why the mistake happened and build a new gate or system to prevent it next time.

There is a delicate balance here: if you go overboard and over-constrain the system, you lose the creative power of these probabilistic engines entirely. But if you under-constrain them, you ship slop.

Stop trying to hardcode the weather. Learn to properly guide the wind. 🌪️