Mike
@mikeob_
165 posts
LLMs and Cyber Security Research
Joined March 2020
384 Following · 32 Followers
Mike reposted
Aran Komatsuzaki @arankomatsuzaki
Anthropic just introduced forked subagents in their latest update. Unlike regular subagents, forked subagents can inherit the same context as the main agent. This looks convenient for cases where richer context matters more. This is just what I needed!
[image attached]
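To make the idea concrete: Claude Code subagents are defined as markdown files with YAML frontmatter. The tweet does not show how a fork is actually declared, so the `context: fork` field in this sketch is purely hypothetical; it only illustrates inheriting the parent's context instead of starting clean.

```markdown
---
name: deep-reviewer
description: Reviews the change the parent agent just made.
# `context: fork` is a hypothetical field, not confirmed syntax.
# The idea: inherit the parent conversation's context instead of
# starting from a clean slate, at the cost of a larger prompt.
context: fork
---
Review the diff the parent agent produced, using what is already
in context: the original task, its constraints, and the files read
so far. Report problems; do not edit files.
```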
Mike reposted
ClaudeDevs @ClaudeDevs
Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.
Mike reposted
OpenAI @OpenAI
Introducing GPT-5.5: a new class of intelligence for real work and for powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.
Mike reposted
Ethan Mollick @emollick
So we now have a pretty good picture of the state of the frontier AI model makers. US closed source models continue to lead. Google, OpenAI, and Anthropic stand well ahead of the pack, and may be showing signs of recursive self-improvement. xAI has fallen from frontier status for now (though promises to return shortly). Meta re-entered the space today with a not-quite-frontier closed source model, but an approach that suggests they might be back in the race. All the other US players seem far behind.

On the Chinese model front, Alibaba (Qwen), Moonshot (Kimi), MiniMax, Xiaomi (MiMo), Deepseek, and Z (GLM) all still appear to be very much in the race, though the best Chinese models are still 7-9+ months behind released US closed source models. For some of these players, especially Xiaomi and Alibaba, their commitment to open weights appears to be slipping.

Outside of China, Mistral seems to have fallen from frontier status.
Mike @mikeob_
Coding agents are presented as turnkey solutions. The reality is that coding agents have a massive skill curve. The more you use them, the farther you can see on the horizon. The people heavily using them have seen the capability curve constantly ascend to new highs for the past 2 years.
Andrej Karpathy @karpathy
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This group reacts by laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

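Karpathy's "verifiable rewards" point can be made concrete. A minimal sketch in Python, assuming a repo whose tests run under pytest; the command, path, and timeout here are illustrative, not any lab's actual training setup:

```python
import subprocess

def verifiable_reward(repo_dir: str) -> float:
    """Binary reward for RL on coding tasks: 1.0 if the test
    suite passes, 0.0 otherwise. The signal is cheap, objective,
    and automatic, unlike judging the quality of prose, which is
    why coding and math capabilities climbed fastest."""
    result = subprocess.run(
        ["pytest", "-q"],      # assumed test runner for the repo
        cwd=repo_dir,
        capture_output=True,
        timeout=600,           # kill runaway runs
    )
    return 1.0 if result.returncode == 0 else 0.0
```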
𝐈𝐆𝐆𝐘 @iggway
This isn't Rogan being combative for sport; it's him refusing to let the Overton window drift in real time. Theo's chasing the "safe, harmonious everyman" spot in a culture that rewards emotional fluency over intellectual consistency. Their friction is the sound of one guy still demanding arguments and the other sensing the cultural weather vane. Rare to see the mask slip so cleanly on a major platform.
Autism Capital 🧩 @AutismCapital
If you watch the full Joe Rogan x Theo Von podcast from today you can cut the tension with a knife. Theo will say something "left coded" and Joe will immediately take the opposite stance, and challenge him over and over until Theo basically says "What are you doing?"
Mike @mikeob_
The other axis to watch is the high-performer skill curve. People having Opus 4.6 run without interaction, building things for hours, is real. People sensationalize it, but you can actually do this now for some use cases. I expect people to have mastered this skill in another 6 months. That is what I am aiming for.
Mike @mikeob_
It's a good question. I'm not sure. The models are improving at a faster rate than the average dev is acquiring coding skills. I have been surprised on the downside by the average person at my org's ramp-up. Being effective with them is a skill which takes time to cultivate. The early adopters make it look like agents are a turnkey solution, which is not reality.
Midnight Capital LLC @Midnight_Captl
Wow… I know this is happening, but it's still so jarring to listen to. This is Owen Jennings, the Business Lead at Block (the company that just laid off 40% of its entire company), talking about the new workflow for the company post-reorg:
>Dev teams of 14 turned to 3
>Claude Fast mode with unlimited tokens
>Managing multiple agents at once + context switching
Also, he totally refutes the idea that the layoffs were because of excess Covid hiring; he attributes them specifically to advancements in AI. Really just mind-blowing.
Mike @mikeob_
@pangramlabs You all are doing the lord's work out here
Pangram Labs @pangramlabs
Seven hundred words about the value of cognitive effort, yet not an iota of cognitive effort expended to produce a single one of them.
[image attached]
Muhammad Ayan @socialwithaayan

MIT's Nobel Prize-winning economist just published a model with one of the most alarming conclusions in the AI literature so far. If AI becomes accurate enough, it can destroy human civilization's ability to generate new knowledge entirely. Not gradually degrade it. Collapse it.

The paper is called AI, Human Cognition and Knowledge Collapse. Authors: Daron Acemoglu, Dingwen Kong, and Asuman Ozdaglar. MIT. Published February 20, 2026. Acemoglu won the Nobel Prize in Economics in 2024. He is not a doomer blogger. He is the most cited economist of his generation, and his models tend to be taken seriously by the people who set policy.

Here is the argument in plain terms. Human knowledge is not just a collection of facts stored in individuals. It is a living system that requires continuous reproduction. People learn things. They apply them. They teach others. They build on prior work to generate new work. The entire engine of science, medicine, technology, and innovation runs on this cycle of active human cognition.

What happens when AI provides personalized, accurate answers to every question people would otherwise have to learn themselves? Individually, each person is better off. They get correct answers faster. They make fewer errors. Their immediate outcomes improve. But they stop doing the cognitive work that sustains the collective knowledge base.

Acemoglu's model shows this produces a non-monotone welfare curve. Modest AI accuracy: net positive. AI helps at the margin, humans still do enough learning to sustain collective knowledge, everyone gains. High AI accuracy: net catastrophic. AI is accurate enough that learning yourself feels unnecessary. Human learning effort collapses. The knowledge base that AI was trained on is no longer being refreshed or extended. Innovation stalls. Then stops.

The model proves the existence of two stable steady states. A high-knowledge steady state where human learning and AI assistance coexist productively. A knowledge-collapse steady state where collective human knowledge has effectively vanished, individuals still receive good personalized AI recommendations, but the shared intellectual infrastructure that enables new discoveries is gone.

And the transition between them is not gradual. It is a threshold effect. Below a certain level of AI accuracy, society stays in the high-knowledge equilibrium. Above that threshold, the system tips. And once it tips, the collapse is self-reinforcing. Because the people who would have learned the things that would have pushed the frontier forward never learned them. And the AI cannot push the frontier on its own. It can only recombine what humans already knew when it was trained.

The dark irony at the center of the model: The AI does not fail. It keeps giving accurate, personalized, useful answers right through the collapse. From the individual's perspective, nothing looks wrong. You ask a question, you get a correct answer. But the collective capacity to ask questions nobody has asked before, to build the frameworks that generate new knowledge rather than retrieve existing knowledge, that capacity is quietly disappearing.

Acemoglu has been the most prominent mainstream economist skeptical of transformative AI productivity claims. His prior work found that AI's actual measured productivity gains were much smaller than the technology industry projected. This paper is a different kind of warning. Not that AI will fail to deliver promised gains. But that if it succeeds too completely, it will undermine the human cognitive infrastructure that makes long-run progress possible at all.

The welfare effect is non-monotone. That is the sentence worth sitting with. Helpful until it is not. Beneficial until it crosses a threshold. And past that threshold, the same accuracy that made it so useful is precisely what makes it devastating.

Every student who uses AI instead of working through a problem is a data point. Every researcher who uses AI instead of developing intuition is a data point. Every generation that grows up with accurate AI answers and no incentive to develop deep domain knowledge is a data point. Individually rational. Collectively catastrophic.

Acemoglu proved this is not just a cultural concern or a vague anxiety about screen time. It is a mathematically coherent equilibrium that a sufficiently accurate AI system will push society toward. And there is no visible warning sign before the threshold is crossed.

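The thread doesn't reproduce the paper's math, but the mechanism is easy to caricature. In this deliberately crude Python sketch (functional forms and parameters invented here, not Acemoglu's), knowledge must reproduce itself through human learning, and higher AI accuracy `a` crowds that learning out; the collapse arrives as a threshold at a* = 1 - delta/r, not as a gentle slope:

```python
def long_run_knowledge(a: float, steps: int = 2000, k0: float = 0.6,
                       r: float = 0.30, delta: float = 0.06) -> float:
    """Toy knowledge dynamics, NOT the paper's model: the k*(1-k)
    term means new knowledge needs existing teachers and prior work,
    the (1-a) factor is AI accuracy crowding out human learning
    effort, and delta is depreciation of the knowledge stock."""
    k = k0
    for _ in range(steps):
        k += r * (1 - a) * k * (1 - k) - delta * k
        k = max(k, 0.0)
    return k

# Sweep AI accuracy: every individual answer gets better as `a`
# rises, yet the knowledge stock falls off a cliff past a* = 0.8.
for a in (0.0, 0.5, 0.75, 0.79, 0.85, 0.95):
    print(f"AI accuracy {a:.2f} -> long-run knowledge {long_run_knowledge(a):.3f}")
```

In this toy the high-knowledge steady state simply vanishes past the threshold; the paper's two-stable-states result is stronger, but the cliff-rather-than-slope behavior is the point the thread is making.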
Mike @mikeob_
@gothburz The irony of using AI to write a lazy post shitting on AI. Also, half these replies are AI.
Peter Girnus 🦅 @gothburz
My company rolled out AI tools 11 months ago. Since then, every task I do takes longer.

I am not allowed to say this out loud. Not because there is a policy. There is no policy. There is something worse than a policy. There is enthusiasm. There is a Slack channel called #ai-wins where people post screenshots of AI outputs with captions like "this just saved me an hour." There is a VP who opens every all-hands with "the companies that adopt fastest win." There is a Director who renamed his team from Operations to Intelligent Operations. There is a peer review question that now asks: "How have you leveraged AI tools to enhance your workflow this quarter?" If the answer is "I haven't, because I was faster before," that is a career decision. So I leverage.

Emails. Before the tools, I wrote emails. This took the amount of time it takes to write an email. I did not measure it. Nobody measured it. The email got written and sent and it was fine. Now I write the email. Then I highlight the text and click "Enhance with AI." The AI rewrites my email. It replaces "Can we meet Thursday?" with "I'd love to explore the possibility of finding a mutually convenient time to align on this." I read the rewrite. I delete the rewrite. I send my original email. This takes 4 minutes instead of 2. The 2 extra minutes are the enhancement. I do this 11 times a day. That is 22 minutes I spend each day rejecting improvements to sentences that were already finished. In #ai-wins I posted a screenshot of the rewrite. I did not post the part where I deleted it. 23 people reacted with the rocket emoji. That is adoption.

Meetings. We have an AI notetaker in every meeting now. It joins automatically. It records. It transcribes. It summarizes. After each meeting I receive a 3-paragraph summary of the meeting I just attended. I read the summary. This takes 3 minutes. I was in the meeting. I know what happened. I am reading a machine's account of something I experienced firsthand. Sometimes the account is wrong. Last Tuesday it attributed a comment about Q3 revenue to me. My manager made that comment. I spent 4 minutes correcting the transcript. Before the notetaker, I did not spend 7 minutes after each meeting correcting a robot's memory of something I personally witnessed. I attend 11 meetings a week. That is 77 minutes per week supervising a transcription nobody requested. I mentioned this once. My manager said "think about the people who weren't in the meeting." The people who weren't in the meeting do not read the summaries. I checked. The read receipts show single-digit opens. The summaries exist not because they are useful but because they are there. I read them for the same reason.

Documents. I write a weekly status update. Before the tools, this took 10 minutes. I typed what happened. I sent it. My manager skimmed it. The system worked. Now I open the AI writing assistant. I give it my bullet points. It produces a draft. The draft says "Significant progress was achieved across multiple workstreams." I did not achieve significant progress across multiple workstreams. I updated a spreadsheet and sent 4 emails. I rewrite the draft to say what actually happened. Then I run my rewrite through the grammar tool. It suggests I change "done" to "completed" and "next week" to "in the forthcoming period." I click Ignore 9 times. Then I send the version I would have written in 10 minutes. The process now takes 30. I have been doing this every week for 11 months. I have added 20 minutes to a task that did not need 20 more minutes.

I call this efficiency. I have been calling it efficiency for 11 months. That is what efficiency means now. It means the additional time you spend to arrive at the same outcome through a longer process. Nobody has questioned this definition. I have not offered it for review.

I kept a log once. 2 weeks. Every task, timed. Before-AI and after-AI. The after number was larger in every case. Every single one. Not by a little. The range was 40 to 200 percent. I deleted the log. I deleted it because it was a document that said, in plain numbers, that the AI tools make me slower. And a document like that has no place in a company where AI adoption is a strategic priority. I could not send it to my manager. He championed the rollout. I could not post it in #ai-wins. I could not raise it in a meeting because the notetaker would transcribe it and the summary would read "[Name] expressed concerns about AI tool efficacy" and that summary would be the first one anyone actually reads.

So I do what everyone does. I use the tools. I spend the extra time. I post in #ai-wins. I write "leveraged AI to streamline weekly reporting" in my review and my manager gives me a 4 out of 5 for innovation. I have innovated nothing. I have added steps to processes that were already finished. I have made simple things longer and labeled the difference with words that used to mean something.

Every week in #ai-wins someone posts a screenshot. And 20 people react with the rocket emoji. And nobody posts the part where they deleted the output and did the task themselves. Nobody posts the revert. Nobody posts the before-and-after timer. Nobody will. Because "I was better at my job before the AI tools" is a sentence that cannot be said out loud in any company that has decided AI is the future. Every company has decided AI is the future. So we leverage. Quietly. Adding steps. Calling them optimization. Getting slightly less done, slightly more slowly, with slightly more steps, and reporting it as progress.

My yearly review is next month. There is a new section this year. "AI Impact Assessment." It asks me to quantify the hours saved by AI tools per week. I will write a number. The number will be positive. It will not be true. But the AI writing assistant will help me phrase it convincingly. That is the one thing it does well.
Mike @mikeob_
@jeremyphoward Users need to change the way they use it. Iterate on a markdown file. They can go back and forth as much as they need. Then, when it is ready, tell Claude to implement it.
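The loop made concrete; the file name, feature, and prompt below are invented for illustration:

```markdown
# plan.md (iterated back and forth in chat before any code is written)

## Goal
Add rate limiting to the /api/search endpoint.

## Constraints
- Token bucket, 30 requests/min per API key
- No new external dependencies
- Return 429 with a Retry-After header when limited

## Out of scope
- Per-IP limits (follow-up work)
```

Once the plan stops changing, the hand-off prompt is simply: implement plan.md. The model leads the typing but the human has already led the design.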
Jeremy Howard @jeremyphoward
Opus & Sonnet 4.6 haven't been a great hit for most of my work, or our customers, since (as warned in their tech report) they're over-enthusiastic about agentically taking over, rather than letting the human lead. Any suggestions for competent models that are patient followers?
Mike reposted
Patrick McKenzie @patio11
Doing the reading is a superpower, and it's even better in a world where "no one" is doing the reading. (Inspired by a conversation I had with some college students.)
Mike @mikeob_
@bt_sofia_ai @lossfunk @ShriKaranHanda It's a fair ask to see what they can do natively. Evidence is growing that LLMs are great memorization machines, and this is further evidence of it. But I agree that results without a comparison against agentic harnesses in the loop are only half the picture.
Lossfunk @lossfunk
🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵
Mike reposted
Thorsten Ball @thorstenball
Some people haven't stared into GPT-5.3-Codex's dead dark eyes long enough. They haven't looked into the abyss deep enough to notice that down there, there's only sed, rg, nl, cat, and awk. It hasn't hit them that if you take a handful of Unix tools and combine them with intelligence, there's essentially nothing it can't do.
Mike @mikeob_
@trq212 @pdrmnvd I have found this to be the single biggest ROI when using agents: "The highest-signal content in any skill is the Gotchas section. These sections should be built up from common failure points that Claude runs into when using your skill"
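Skills themselves are markdown files with YAML frontmatter, so a Gotchas section is just a heading that accumulates observed failures. The skill, commands, and gotchas below are all invented for illustration:

```markdown
---
name: db-migrations
description: Create and apply schema migrations in this repo.
---

## Usage
Run `make migrate-new NAME=...`, edit the generated file, then
run `make migrate-up`.

## Gotchas
- Claude tends to edit already-applied migrations. Never modify an
  applied migration; always create a new one.
- `make migrate-up` must run from the repo root, not from the
  migrations/ directory.
- Every up-migration needs a matching down-migration, or CI fails.
```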
Mike @mikeob_
@emollick @simonw Half the posts are bad too. I have been slowly refining a list of accounts that don't post low-effort AI slop. Feels like we need an AI downvote button.
Mike reposted
Ethan Mollick @emollick
I know I go on about this, but comments to all of my posts, both here and on LinkedIn, are no longer worth reading at all due to AI bots. That was not the case a few months ago. (Or rather, bad/crypto comments were obvious, but now it is only meaning-shaped attention vampires)
Mike reposted
Ethan Mollick @emollick
I wrote about the exponential improvement path of AI, the early signs of massive transformations in the nature of work (including software companies where nobody codes any more), and how one week in February is an omen of our future as things get weirder. open.substack.com/pub/oneusefult…