Andy Zmolek 🇬🇧 London

9.4K posts


@zmolek

Conjurer of Ouroboros dilemma. Phracta founder. Chronicler of enterprise mobility ecosystem legend; it’s trust chains and narrative loops all the way down, folks

London, England · Joined September 2008
3.1K Following · 1.2K Followers
Pinned Tweet
Andy Zmolek 🇬🇧 London
Some futurists think everything is a nail awaiting a pounding from a GenAI hammer. I personally don’t believe in universal wish-granting technology.
1 reply · 0 reposts · 2 likes · 72 views
Andy Zmolek 🇬🇧 London
You can rarely tell me definitively which of your insights for cracking a specific problem came from your right-brain hemisphere (vs your left) without resorting to speculative models based on research into said hemispheric split. Parts of your visual processing, however, are happening precisely on the opposite hemisphere from your eye (which is why blindsight is such a funny thing). Similarly, an LLM doesn't have direct access to anything but the tokens themselves in formulating its responses, so when the right frame is forced, the LLM has to resort to imperfect models to derive plausible answers that are embedded within a process it can't observe.

Most of the time, humans operate on so many layers of meta-programming that the hemispheric limitations simply don't surface. But LLMs don't have anywhere near the self-programming and self-observing capacity that a typical 5-year-old exhibits, so it's much easier to frame out a meta-programming problem that the LLM can't catch. Eventually we'll understand the magic of our own meta-programming well enough to replicate more of it with machines (this will take longer than most AI promoters expect and less time than doomers realise).
Andy Masley@AndyMasley

If you give Opus 4.7 a discombobulating riddle, it gets that it can't answer it, but it also completely fails to count the letters in specific words correctly

0 replies · 0 reposts · 0 likes · 27 views
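The token-access point is easy to see concretely: a tokenizer hands the model multi-character chunks, not letters, so per-letter questions ask about structure the model never directly observes. A minimal sketch using the tiktoken library (the encoding name is just one common choice, and the exact split shown is illustrative):

```python
# pip install tiktoken
import tiktoken

# One widely used BPE encoding; any subword encoding makes the same point.
enc = tiktoken.get_encoding("cl100k_base")

word = "discombobulating"
token_ids = enc.encode(word)

# What character-level access sees: an exact count.
print(word.count("o"))  # 2

# What the model actually receives: opaque subword chunks, not letters.
print([enc.decode([t]) for t in token_ids])  # e.g. ['disc', 'omb', 'obulating']
```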
Andy Zmolek 🇬🇧 London retweeted
elvis
elvis@omarsar0·
LLM agents loop, drift, and get stuck on hard reasoning tasks up to 30% of the time. Current fixes are either too blunt (hard step limits) or too expensive (LLM-as-judge adding 10-15% overhead per step). New research proposes a smarter middle ground.

The work introduces the Cognitive Companion, a parallel monitoring architecture with two variants: an LLM-based monitor and a novel Probe-based monitor that detects reasoning degradation from the model's own hidden states at zero inference overhead. The Probe-based Companion trains a simple logistic regression classifier on hidden states from layer 28. It reads the model's internal representations during the existing forward pass, requiring no additional model calls. A single matrix multiplication is all it takes to flag when reasoning quality is declining.

Why does it matter? The LLM-based Companion reduced repetition on loop-prone tasks by 52-62% with roughly 11% overhead. The Probe-based variant achieved a mean effect size of +0.471 with zero measured overhead and AUROC 0.840 on cross-validated detection.

But the results also reveal an important nuance: companions help on loop-prone and open-ended tasks while showing neutral or negative effects on structured tasks. Models below 3B parameters also struggled to act on companion guidance at all. This suggests the future isn't universal monitoring but selective activation, deploying cognitive companions only where reasoning degradation is a real risk.

Paper: arxiv.org/abs/2604.13759

Learn to build effective AI agents in our academy: academy.dair.ai
14 replies · 31 reposts · 175 likes · 17.3K views
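For intuition, here is a minimal sketch of what a probe of that kind could look like, assuming hidden states exposed via Hugging Face transformers; the model name, layer index default, and the labeled_traces/labels training data are all stand-ins, not the paper's actual setup:

```python
# Sketch of a probe-based monitor: logistic regression over one layer's
# hidden states. Model name and training data below are hypothetical.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

def layer_features(text: str, layer: int = 28) -> torch.Tensor:
    """Mean-pooled hidden state from one layer of a forward pass."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

# Offline: fit the probe on reasoning traces labeled 0 (healthy) / 1 (degraded).
# labeled_traces and labels are hypothetical training data.
X = torch.stack([layer_features(t) for t in labeled_traces]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)

# Online, the probe is one matrix multiply over states the forward pass
# already computed (this sketch re-runs the pass only for clarity).
def reasoning_degraded(step_text: str, threshold: float = 0.8) -> bool:
    x = layer_features(step_text).numpy().reshape(1, -1)
    return probe.predict_proba(x)[0, 1] > threshold
```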
Andy Zmolek 🇬🇧 London retweeted
Dédáyọ̀ Roots
Dédáyọ̀ Roots@DedayoRoots·
In 1879, a British/Scottish medical student named Robert Felkin watched an African healer in Uganda perform a caesarean section. Clean incision. Banana wine as anaesthetic and antiseptic. Bleeding cauterised with hot iron. Wound closed with iron pins and herbal root paste. Mother recovered fully. Baby survived.

Felkin noted in his journal that the technique was SO REFINED, it was clearly standard practice, performed routinely long before any European arrived. At that same moment, hospitals in London and Edinburgh were still debating whether caesarean sections could ever be justified on a living woman. European surgeons were operating in street clothes, rarely washing their hands, and losing most patients to post-operative infection. The Africans had already solved anaesthesia, antisepsis, haemostasis, and wound care.

Felkin went home and presented his findings to the Edinburgh Obstetrical Society in 1884. The knife used in that surgery still exists. It is now housed in the Science Museum in London. A silent artifact of a surgical tradition they called primitive.

They didn't discover our medicine. They witnessed it, wrote it down and forgot to mention where it came from.
Dédáyọ̀ Roots@DedayoRoots

Share a story that sounds fabricated but is 100% true.

186 replies · 7.7K reposts · 25.8K likes · 895.1K views
Andy Zmolek 🇬🇧 London retweeted
Shane Gu
Shane Gu@shaneguML·
10 years ago today, we lost Sir David MacKay FRS. Physicist. Mathematician. Polymath. Gone at 48. I was working on my PhD at Cambridge and attended some of his last lectures and his final symposium. He was part of what attracted me to Cambridge over MIT in 2014.

His textbook, Information Theory, Inference, and Learning Algorithms, was the first ML book I ever read — recommended to me by none other than Geoff Hinton. He used that same information theory to build Dasher — a text entry system where users steer through a continuous stream of letters flowing toward them, with a probabilistic language model making likely next letters larger and easier to reach, so that any tiny movement — a finger, a gaze — becomes efficient writing. It was the first ML application that truly blew my mind, and it sent me deep into a rabbit hole: arithmetic coding, PAQ8 compression, nonparametric models. A journey I partly owe to his PhD student Christian Steinruecken, who also happened to share my love of Japan.

As Chief Scientific Advisor to the UK's Department of Energy & Climate Change, he brought a physicist's clarity to policy. In Sustainable Energy – Without the Hot Air, he ran the numbers on our entire energy diet — and made me confront an uncomfortable truth. One of the biggest single factors? Beef — roughly 1,000 days of cow-time per steak. Hard to argue with the data. Hard to act on it when you were born and raised in Japan. I'm still working on that one, David.

At his final symposium in Cambridge — just a few weeks before his passing — the room told the full story. Geoff Hinton and his Caltech PhD advisor John Hopfield — both Nobel Prize winners in Physics 2024 — gave tributes. Environment policy advisers spoke. Dasher users sent video messages of thanks from around the world — people who found their voice because of him. It was extraordinary to witness, in one room, just how many minds and lives a single person had touched.

The story of how Hinton first noticed him: at a conference workshop poster session, among everyone who stopped by, it was the young MacKay who asked the sharpest, most penetrating question. Hinton remembered it. That's how it begins.

I've always liked physicists who cross into ML — they bring a groundedness, a refusal to hide behind formalism without meaning. David MacKay and Max Welling are the role models I point to. Not just for the mathematics they built, but for how they carried it: with humility, curiosity, and a stubborn insistence on reaching beyond academia. He seemed to know his time was limited, and gave everything anyway. His legacy stays.
19 replies · 93 reposts · 741 likes · 106.3K views
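The Dasher mechanism described above has a compact core: give each candidate next letter a slice of the screen proportional to its probability, the same nested-interval trick as the arithmetic coding the thread mentions. A toy sketch with a bigram model over a hardcoded corpus (purely illustrative, not Dasher's actual model):

```python
# Toy Dasher-style interval allocation: likelier next letters get wider
# slices of [0, 1), so smaller pointer movements select them.
from collections import Counter, defaultdict

corpus = "the theory of information links compression and prediction"
bigrams = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    bigrams[a][b] += 1  # crude bigram language model

def letter_intervals(prev: str) -> dict[str, tuple[float, float]]:
    counts = bigrams[prev]
    total = sum(counts.values())
    lo, intervals = 0.0, {}
    for ch, n in counts.most_common():
        intervals[ch] = (lo, lo + n / total)
        lo += n / total
    return intervals

# Likelier continuations of 't' receive proportionally wider intervals.
print(letter_intervals("t"))
```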
Andy Zmolek 🇬🇧 London retweeted
shira
shira@shiraeis·
Found a paper that suggests we may have spent years training agents to become hunters of proxy reward when the more basic thing intelligence craves is not a reward at all, but to not run out of viable futures.

The paper proposes that behavior is best understood as maximizing future action-state path occupancy, which collapses mathematically into a discounted entropy objective. The agent doesn’t necessarily want to GET something, but rather is trying to keep as many meaningful trajectories alive as possible.

The obvious objection is “so it just does random shit? fuck around and find out?” No, this is where it gets pretty beautiful. The agent is variable when variation is cheap and becomes surgically goal-oriented the moment an absorbing state (death, starvation, falling over, etc) gets close enough to threaten its future path space. Variability is the same drive as goal-directedness, just operating under different constraints.

The demos are kinda wild (a toy version follows the metrics below):

- A cartpole (the classic move-a-cart-to-keep-a-pole-from-falling control task) that doesn’t merely balance but dances and swings through a huge range of angles and positions, because why not? The whole point is occupying state space, and rigid balance is a voluntarily impoverished life.
- A prey-predator gridworld where the mouse PLAYS with the cat, teasing it and using clockwise and counterclockwise routes around obstacles roughly equally to lure it away from the food source before slipping in to eat. A reward-maximizing agent would collapse to one strategy and exploit it. Here, the agent keeps its behavioral repertoire.
- A quadruped trained with Soft Actor-Critic and ZERO external reward that learns to walk, jump, spin, and stabilize, and then makes a beeline for food only when its internal energy drops low enough that starvation becomes a real threat.

The thing that hit me hardest is the comparison to empowerment and free energy principle agents. Both collapse to near-deterministic policies with almost no behavioral variability: empowerment agents find the highest-empowerment state and exploit it, and FEP agents converge to classical reward maximizers. As far as I’m aware, this is the only framework that produces agents you could describe as being “alive.”

The AI implication here is that we undertrain for behavioral repertoire. Most systems hit the benchmark by collapsing onto a narrow attractor basin of good-enough trajectories. They’re competent for sure, but brittle too, with one viable plan, executed until the world shifts and leaves them with nothing. The thing I increasingly want from agents isn’t competence per se, but option-preserving competence. I want agents with the ability to keep multiple viable plans alive and switch between them without catastrophe.

We’ve been so focused on teaching agents what to want that we never stopped to ask what happens if wanting isn’t the point, if the deepest drive isn’t necessarily toward anything, but away from the walls closing in.

paper: nature.com/articles/s4146…
74 replies · 131 reposts · 1.1K likes · 68.5K views
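To make the "discounted entropy" point concrete, here is a toy sketch under heavy simplifying assumptions (action entropy only, a deterministic 1D chain whose right end is absorbing); the paper's actual objective also covers state-transition entropy and continuous control:

```python
# Maximum-occupancy toy: no extrinsic reward; value = discounted entropy of
# future action paths. Soft Bellman fixed point: V(s) = logsumexp_a(g * V(s')),
# with optimal policy pi(a|s) proportional to exp(g * V(s')).
import numpy as np

N, g = 8, 0.95
absorbing = N - 1  # "death": no future paths, so V is pinned at 0

def step(s: int, a: int) -> int:
    return min(max(s + a, 0), N - 1)  # a in {-1, +1}; walls clip

V = np.zeros(N)
for _ in range(500):  # value iteration to the soft fixed point
    Q = np.array([[g * V[step(s, a)] for a in (-1, +1)] for s in range(N)])
    V = np.log(np.exp(Q).sum(axis=1))
    V[absorbing] = 0.0

# Far from the absorbing state the policy stays near-uniform ("playful");
# right next to it, the same objective turns sharply goal-directed (flee left).
for s in range(N - 1):
    p = np.exp(g * np.array([V[step(s, a)] for a in (-1, +1)]))
    print(s, np.round(p / p.sum(), 3))
```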
Andy Zmolek 🇬🇧 London retweeted
Tech with Mak
Tech with Mak@techNmak·
Andrej Karpathy wrote something that every Claude Code user has felt but couldn't articulate. Three quotes. Read them slowly.

"The models make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should."

"They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code... implement a bloated construction over 1000 lines when 100 would do."

"They still sometimes change/remove comments and code they don't sufficiently understand as side effects, even if orthogonal to the task."

You've seen all three. Probably this week. Someone turned these three observations into a single CLAUDE.md file. Four principles, one install, directly addressing each quote:

1. Think before coding. Don't assume. Don't hide confusion. State ambiguity explicitly. Present multiple interpretations rather than silently picking one. Push back if a simpler approach exists. Stop and ask rather than guess.

2. Simplicity first. No features beyond what was asked. No abstractions for single-use code. No "flexibility" that wasn't requested. No error handling for impossible scenarios. The test: would a senior engineer say this is overcomplicated? If yes, rewrite it.

3. Surgical changes. Don't "improve" adjacent code. Don't refactor things that aren't broken. Match the existing style even if you'd do it differently. If you notice unrelated dead code, mention it, don't delete it. Every changed line should trace directly to the request.

4. Goal-driven execution. Transform "fix the bug" into "write a test that reproduces it, then make it pass." Transform "add validation" into "write tests for invalid inputs, then make them pass." Give it success criteria and watch it loop until done.

This last one is Karpathy's key insight captured directly: "LLMs are exceptionally good at looping until they meet specific goals... Don't tell it what to do, give it success criteria and watch it go."

It's a single file. Drop it into any project.
52 replies · 211 reposts · 2.1K likes · 185.9K views
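The thread doesn't reproduce the file itself, but a minimal sketch of what such a CLAUDE.md could look like, condensing the four principles (the wording here is illustrative, not the original file):

```markdown
# CLAUDE.md (illustrative sketch)

## Think before coding
- State assumptions and ambiguities explicitly; ask rather than guess.
- Present competing interpretations instead of silently picking one.
- Push back when a simpler approach satisfies the request.

## Simplicity first
- Build only what was asked: no speculative abstractions, "flexibility",
  or error handling for impossible scenarios.
- Test: would a senior engineer call this overcomplicated? If yes, rewrite.

## Surgical changes
- Every changed line must trace directly to the request.
- Match existing style; mention (never delete) unrelated dead code.

## Goal-driven execution
- Restate tasks as success criteria (e.g. "write a failing test that
  reproduces the bug, then make it pass") and loop until they are met.
```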
Andy Zmolek 🇬🇧 London retweeted
NIK
NIK@ns123abc·
🚨 Anthropic just revealed their unreleased frontier model called Claude Mythos Preview. The model is INSANE.

It found thousands of zero-day vulnerabilities in EVERY major operating system and browser:
> a 27-year-old bug in OpenBSD
> a 16-year-old bug in FFmpeg that automated tools hit 5M times without catching

Completely autonomous. No human steering.

They assembled an entire industry coalition called Project Glasswing around it: AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike, JPMorgan, Cisco, Palo Alto, Linux Foundation. Goal: patch the world’s software BEFORE releasing it.

> SWE-bench: 93.9% (Opus 4.6: 80.8%)
> Anthropic is committing $100M in usage credits
> Thousands of vulnerabilities in 40+ organizations are being fixed right now

Yesterday OpenAI published a 13-page essay warning about cyber threats and asking the government to help… Today Anthropic actually fixed them.
Anthropic@AnthropicAI

Mythos Preview has already found thousands of high-severity vulnerabilities—including some in every major operating system and web browser.

84 replies · 178 reposts · 2.2K likes · 299.2K views
Andy Zmolek 🇬🇧 London
The mythical man-month returns thanks to AI. So what should we call the AI equivalent of Brooks’s law? ChatGPT initially ran with the idea without further context and proposed Zmolek’s Law (Orchestration Law): “Adding agents to an under-orchestrated system increases entropy faster than capability.” Several turns of context insertion later, it suggested we translate Brooks more faithfully (as coordination overhead dominates progress); it now claims the equivalent here is: 👉 The Accretion Law: “In AI-assisted systems, what is generated accumulates faster than it is integrated or removed.”
gregorein@Gregorein

so... I audited Garry's website after he bragged about 37K LOC/day and a 72-day shipping streak. here's what 78,400 lines of AI slop code actually looks like in production. a single homepage load of garryslist.org downloads 6.42 MB across 169 requests. for a newsletter-blog-thingy. 1/9🧵

0 replies · 0 reposts · 0 likes · 42 views
Andy Zmolek 🇬🇧 London retweeted
gregorein
gregorein@Gregorein·
so... I audited Garry's website after he bragged about 37K LOC/day and a 72-day shipping streak. here's what 78,400 lines of AI slop code actually looks like in production. a single homepage load of garryslist.org downloads 6.42 MB across 169 requests. for a newsletter-blog-thingy. 1/9🧵
Garry Tan@garrytan

Absolutely insane week for agentic engineering. 37K LOC per day across 5 projects. Still speeding up

348 replies · 626 reposts · 7.7K likes · 2.8M views
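Numbers like "169 requests / 6.42 MB" typically come straight from a browser HAR export. A minimal sketch of reproducing them, assuming a HAR file saved from DevTools (the filename is hypothetical; _transferSize is the per-response on-the-wire field Chrome writes into its HAR exports):

```python
# Total page weight and request count from a DevTools HAR export.
import json

with open("garryslist.har") as f:  # hypothetical filename
    har = json.load(f)

entries = har["log"]["entries"]
# Sum the on-the-wire bytes Chrome recorded for each response.
total_bytes = sum(e["response"].get("_transferSize", 0) for e in entries)

print(f"{len(entries)} requests, {total_bytes / 1e6:.2f} MB transferred")
```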
Andy Zmolek 🇬🇧 London retweeted
Rosemary Kelanic
Rosemary Kelanic@RKelanic·
This great piece by @ishaantharoor, in which I'm quoted, gets the 1956 Suez War history right and explains its echoes for the Iran War. Increasingly I fear that Trump's massive strategic error will damage U.S. power and prestige far more than Suez hurt Britain and France.

The UK/France gambit to take the canal backfired spectacularly. The *war itself* prompted Nasser to block the canal -- the outcome the UK/Fra was trying to prevent. It heralded the final decline of Britain from great power to "has been" status.

The analogies to Trump's Iran debacle are legion. But an especially overlooked similarity is how the Suez War dramatically strengthened Nasser's power and influence throughout the region -- much like how Trump's war has perversely *strengthened* the Islamic Republic of Iran. The war has provoked an entirely predictable (in fact, predicted) nationalistic response among many Iranians -- including those who hate the regime but now hate the U.S. and Israel more for bombing universities, threatening the electric grid, and blanketing Tehran with toxic rain following the explosion at a nearby refinery. Not only has the regime consolidated power, but it is now filled with hardliners after Israeli assassinations have killed off relative pragmatists like Ali Larijani.

Courtesy of Trump, Iran has also discovered it can paralyze oil shipping through the Strait of Hormuz and collect "tolls" in exchange for freedom of passage. Iran is now trying to institutionalize its newfound leverage, which could be a lasting unintended consequence of this foolish war.

I've argued before that Trump's Iran war is already the U.S.'s "Suez Moment" in terms of signifying U.S. strategic decline -- especially a decline in our ability to make sound national security decisions. But the Iran War could turn out considerably worse than Suez because the U.S. has no one to check us from our own strategic excesses. This war will unfold as badly as Trump decides to make it, and the indications are that he intends to escalate, making it worse. Russia and China are sipping champagne while they watch the U.S. self-destruct from the sidelines.

In 1956, both the U.S. and the USSR leaned heavily on Britain and France to withdraw. The Soviets even made blatant nuclear threats to compel UK/Fra to quit Suez. In 2026, there is no higher power. Only the U.S. itself can course-correct before making a bad situation even worse with further escalation. But Trump's impenetrable hubris and poor decision-making don't inspire confidence that the U.S. will retrench.

@defpriorities @NewYorker newyorker.com/news/the-lede/…
32 replies · 398 reposts · 865 likes · 97.3K views
Andy Zmolek 🇬🇧 London retweeted
Gracie💙
Gracie💙@Gracie_Blue89·
In 1999, David Bowie and Jeremy Paxman discussed the internet. Bowie foresaw the potential it had; Paxman dismissed Bowie's prediction, looking at him as if he were exaggerating. David: "I think we are on the cusp of something exhilarating and terrifying"
0 replies · 8 reposts · 18 likes · 975 views
Andy Zmolek 🇬🇧 London retweeted
elvis
elvis@omarsar0·
NEW AI report from Google. Every prior intelligence explosion in human history was social, not individual. These authors make the case that the AI "singularity" framed as a single superintelligent mind bootstrapping to godlike intelligence is fundamentally wrong.

This is directly relevant to anyone designing multi-agent systems. They observe that frontier reasoning models like DeepSeek-R1 spontaneously develop internal "societies of thought," multi-agent debates among cognitive perspectives, through RL alone. The path forward is human-AI configurations and agent institutions, not bigger monolithic oracles.

This reframes AI scaling strategy from "build bigger models" to "compose richer social systems." It argues governance of AI agents should follow institutional design principles, checks and balances, role protocols, rather than individual alignment.

Paper: arxiv.org/abs/2603.20639

Learn to build effective AI agents in our academy: academy.dair.ai
130 replies · 347 reposts · 1.7K likes · 195.2K views
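As a sketch of what "composing richer social systems" can mean in practice, here is a minimal role-based debate loop; ask() is a hypothetical stand-in for any chat-completion call, and the roles and round count are illustrative, not the report's design:

```python
# Minimal "society of thought": role-conditioned agents debate, then a
# moderator reconciles. ask() is a hypothetical LLM-call stand-in.
from typing import Callable

ROLES = ("skeptic", "planner", "domain expert")

def debate(question: str, ask: Callable[[str], str], rounds: int = 2) -> str:
    transcript = f"Question: {question}"
    for _ in range(rounds):
        for role in ROLES:
            view = ask(f"You are the {role}. Given the debate so far:\n"
                       f"{transcript}\nCritique the other views, then answer.")
            transcript += f"\n[{role}] {view}"
    # The moderator acts as the institutional check: reconcile, don't just vote.
    return ask("You are the moderator. Reconcile the debate into one answer, "
               f"noting unresolved disagreements:\n{transcript}")
```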
Mo Syed
Mo Syed@msyed_·
@SpencerKlavan Yes, and when I saw this photo I always said to myself, "This is going to fail miserably."
2 replies · 0 reposts · 11 likes · 1.2K views
Spencer A. Klavan
Spencer A. Klavan@SpencerKlavan·
This is a real thing that happened
83 replies · 166 reposts · 2.7K likes · 68.7K views
Andy Zmolek 🇬🇧 London retweeted
Quanta Magazine
Quanta Magazine@QuantaMagazine·
A trio of mathematicians built the first physical model of a “monostable” tetrahedron, a shape that will always flip-flop onto the same face no matter what side you place it on. In order for it to work properly, it had to be engineered to a level of precision within one-tenth of a gram and one-tenth of a millimeter. (From the archive) quantamagazine.org/a-new-pyramid-…
11 replies · 85 reposts · 622 likes · 64.6K views
Andy Zmolek 🇬🇧 London
@om Nothing more powerful than a good narrative loop when bootstrapping a symbolic capitalisation machine
1 reply · 0 reposts · 1 like · 88 views