Mike
165 posts

Mike retweeted

So we now have a pretty good picture of the state of the frontier AI model makers.
US closed source models continue to lead. Google, OpenAI, and Anthropic stand well ahead of the pack, and may be showing signs of recursive self-improvement. xAI has fallen from frontier status for now (though promises to return shortly). Meta re-entered the space today with a not-quite-frontier closed source model, but with an approach that suggests they might be back in the race. All the other US players seem far behind.
On the Chinese model front, Alibaba (Qwen), Moonshot (Kimi), MiniMax, Xiaomi (MiMo), Deepseek, and Z (GLM) all still appear to be very much in the race, though the best Chinese models are still 7-9+ months behind released US closed source models. For some of these players, especially Xiaomi and Alibaba, the commitment to open weights appears to be slipping.
Outside of China, Mistral seems to have fallen from frontier status.

Coding agents are presented as turnkey solutions. The reality is that coding agents have a massive skill curve. The more you use them, the farther you can see on the horizon.
The people using them heavily have watched the capability curve constantly ascend to new highs for the past 2 years.

Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT sometime last year and allowed it to inform their views on AI a little too much. This is the group reacting with laughter at various quirks of the models, hallucinations, etc. Yes, I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability of this year's latest round of state of the art agentic models, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram reels and, *at the same time*, OpenAI's highest-tier paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests pass, yes or no, in contrast to writing, which is much harder to judge explicitly), but also 2) they are a lot more valuable in b2b settings, meaning the biggest fraction of the team is focused on improving them. So here we are.
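The "verifiable reward" idea in the paragraph above can be sketched in a few lines of Python. This is a hypothetical illustration of the concept, not any lab's actual training code: the reward for an agent's patch is simply whether a test command exits cleanly, which is exactly the kind of unambiguous signal RL can optimize against (unlike prose quality).

```python
import subprocess
import sys

def verifiable_reward(test_cmd: list[str], repo_dir: str = ".") -> float:
    """Binary reward for an RL rollout: 1.0 if the test command
    exits 0 (all tests passed), else 0.0. Hypothetical sketch."""
    result = subprocess.run(test_cmd, cwd=repo_dir, capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0

# A toy "test suite": a one-liner assertion run as a subprocess.
passing = verifiable_reward([sys.executable, "-c", "assert 1 + 1 == 2"])
failing = verifiable_reward([sys.executable, "-c", "assert 1 + 1 == 3"])
```

The point is that the grader needs no judgment at all: pass/fail is read straight off the exit code, which is why coding and math improved faster than writing or advice.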
staysaasy@staysaasy
The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

This isn’t Rogan being combative for sport; it’s him refusing to let the Overton window drift in real time. Theo’s chasing the “safe, harmonious everyman” spot in a culture that rewards emotional fluency over intellectual consistency. Their friction is the sound of one guy still demanding arguments and the other watching the cultural weather vane. Rare to see the mask slip so cleanly on a major platform.

The other axis to watch is the high-performer skill curve. People having Opus 4.6 run for hours without interaction, building things, is real. People sensationalize it, but you can actually do this now for some use cases. I expect people to have mastered this skill in another 6 months. That is what I am aiming for.

It's a good question. I'm not sure. The models are improving at a faster rate than the average dev is acquiring coding skills. I have been surprised on the downside by the average person's ramp-up at my org. Being effective with these agents is a skill that takes time to cultivate. The early adopters make it look like agents are a turnkey solution, which is not the reality.

Wow… I know this is happening but it’s still so jarring to listen to
This is Owen Jennings, the Business Lead at Block (the company that just laid off 40% of its entire workforce), talking about the new workflow for the company post-reorg
>Dev teams of 14 turned to 3
>Claude Fast mode with unlimited tokens
>Managing multiple agents at once + context switching
Also, he flatly rejects the idea that the layoffs were because of excess Covid hiring; he attributes them specifically to advancements in AI
Really just mind blowing


My company rolled out AI tools 11 months ago. Since then, every task I do takes longer.
I am not allowed to say this out loud.
Not because there is a policy. There is no policy. There is something worse than a policy. There is enthusiasm.
There is a Slack channel called #ai-wins where people post screenshots of AI outputs with captions like "this just saved me an hour." There is a VP who opens every all-hands with "the companies that adopt fastest win." There is a Director who renamed his team from Operations to Intelligent Operations. There is a peer review question that now asks: "How have you leveraged AI tools to enhance your workflow this quarter?"
If the answer is "I haven't, because I was faster before," that is a career decision.
So I leverage.
Emails.
Before the tools, I wrote emails. This took the amount of time it takes to write an email. I did not measure it. Nobody measured it. The email got written and sent and it was fine.
Now I write the email. Then I highlight the text and click "Enhance with AI." The AI rewrites my email. It replaces "Can we meet Thursday?" with "I'd love to explore the possibility of finding a mutually convenient time to align on this." I read the rewrite. I delete the rewrite. I send my original email.
This takes 4 minutes instead of 2. The 2 extra minutes are the enhancement. I do this 11 times a day. That is 22 minutes I spend each day rejecting improvements to sentences that were already finished.
In #ai-wins I posted a screenshot of the rewrite. I did not post the part where I deleted it. 23 people reacted with the rocket emoji.
That is adoption.
Meetings.
We have an AI notetaker in every meeting now. It joins automatically. It records. It transcribes. It summarizes. After each meeting I receive a 3-paragraph summary of the meeting I just attended.
I read the summary. This takes 3 minutes. I was in the meeting. I know what happened. I am reading a machine's account of something I experienced firsthand. Sometimes the account is wrong. Last Tuesday it attributed a comment about Q3 revenue to me. My manager made that comment. I spent 4 minutes correcting the transcript.
Before the notetaker, I did not spend 7 minutes after each meeting correcting a robot's memory of something I personally witnessed. I attend 11 meetings a week. That is 77 minutes per week supervising a transcription nobody requested.
I mentioned this once. My manager said "think about the people who weren't in the meeting." The people who weren't in the meeting do not read the summaries. I checked. The read receipts show single-digit opens. The summaries exist not because they are useful but because they are there. I read them for the same reason.
Documents.
I write a weekly status update. Before the tools, this took 10 minutes. I typed what happened. I sent it. My manager skimmed it. The system worked.
Now I open the AI writing assistant. I give it my bullet points. It produces a draft. The draft says "Significant progress was achieved across multiple workstreams." I did not achieve significant progress across multiple workstreams. I updated a spreadsheet and sent 4 emails.
I rewrite the draft to say what actually happened. Then I run my rewrite through the grammar tool. It suggests I change "done" to "completed" and "next week" to "in the forthcoming period." I click Ignore 9 times. Then I send the version I would have written in 10 minutes. The process now takes 30.
I have been doing this every week for 11 months. I have added 20 minutes to a task that did not need 20 more minutes. I call this efficiency. I have been calling it efficiency for 11 months. That is what efficiency means now. It means the additional time you spend to arrive at the same outcome through a longer process. Nobody has questioned this definition. I have not offered it for review.
I kept a log once. 2 weeks. Every task, timed. Before-AI and after-AI. The after number was larger in every case. Every single one. Not by a little. The range was 40 to 200 percent.
I deleted the log.
I deleted it because it was a document that said, in plain numbers, that the AI tools make me slower. And a document like that has no place in a company where AI adoption is a strategic priority. I could not send it to my manager. He championed the rollout. I could not post it in #ai-wins. I could not raise it in a meeting because the notetaker would transcribe it and the summary would read "[Name] expressed concerns about AI tool efficacy" and that summary would be the first one anyone actually reads.
So I do what everyone does.
I use the tools. I spend the extra time. I post in #ai-wins. I write "leveraged AI to streamline weekly reporting" in my review and my manager gives me a 4 out of 5 for innovation. I have innovated nothing. I have added steps to processes that were already finished. I have made simple things longer and labeled the difference with words that used to mean something.
Every week in #ai-wins someone posts a screenshot. And 20 people react with the rocket emoji. And nobody posts the part where they deleted the output and did the task themselves. Nobody posts the revert. Nobody posts the before-and-after timer. Nobody will. Because "I was better at my job before the AI tools" is a sentence that cannot be said out loud in any company that has decided AI is the future.
Every company has decided AI is the future.
So we leverage. Quietly. Adding steps. Calling them optimization. Getting slightly less done, slightly more slowly, with slightly more steps, and reporting it as progress.
My yearly review is next month. There is a new section this year. "AI Impact Assessment." It asks me to quantify the hours saved by AI tools per week.
I will write a number. The number will be positive. It will not be true.
But the AI writing assistant will help me phrase it convincingly. That is the one thing it does well.
Mike retweeted

I'm usually not one to write thought pieces without much technical depth. But here we go.
Slow the fuck down.
mariozechner.at/posts/2026-03-…

@jeremyphoward Users need to change the way they use it. Iterate on a markdown file: they can go back and forth as much as they need, then when it is ready, tell Claude to implement it.
Mike retweeted

@bt_sofia_ai @lossfunk @ShriKaranHanda It's a fair ask to see what they can do natively. Evidence is growing that LLMs are great memorization machines, and this is further evidence of that.
But I agree that results without a comparison against agentic harnesses in the loop are only half the picture.

@lossfunk @ShriKaranHanda What's the point of testing if you don't already use agentic tooling, huh
Mike retweeted

Some people haven't stared into GPT-5.3-Codex's dead dark eyes long enough.
They haven't looked into the abyss deep enough to notice that down there there's only sed, rg, nl, cat, and awk.
It hasn't hit them that if you take a handful of Unix tools and combine them with intelligence, there's essentially nothing you can't do.
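For a flavor of what that handful of tools can do when composed, here is a toy example (the log file and its contents are hypothetical, not from the original post): sed filters and awk tallies, exactly the kind of pipeline these agents chain together in a terminal.

```shell
# A toy log file to chew on (hypothetical contents).
cat > app.log <<'EOF'
INFO  auth user logged in
ERROR auth token expired
ERROR db   connection refused
INFO  db   query ok
ERROR db   timeout
EOF

# Count ERROR lines per module: sed selects, awk aggregates.
sed -n '/^ERROR/p' app.log \
  | awk '{count[$2]++} END {for (m in count) print m, count[m]}'
```

Each tool does one small thing; the "intelligence" is in knowing which ones to chain, in what order, against which files.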
Mike retweeted

I wrote about the exponential improvement path of AI, the early signs of massive transformations in the nature of work (including software companies where nobody codes any more), and how one week in February is an omen of our future as things get weirder. open.substack.com/pub/oneusefult…