Andy Zmolek 🇬🇧 London

9.4K posts


@zmolek

Conjurer of Ouroboros dilemma. Phracta founder. Chronicler of enterprise mobility ecosystem legend; it’s trust chains and narrative loops all the way down, folks

London, England · Joined September 2008
3.1K Following · 1.2K Followers
Pinned Tweet
Andy Zmolek 🇬🇧 London
Some futurists think everything is a nail awaiting a pounding from a GenAI hammer. I personally don’t believe in universal wish-granting technology.
1 reply · 0 reposts · 2 likes · 72 views
Andy Zmolek 🇬🇧 London
You can rarely tell me definitively which of your insights for cracking a specific problem came from your right-brain hemisphere (vs your left) without resorting to speculative models based on research into said hemispheric split. Parts of your visual processing, however, are happening precisely on the opposite hemisphere from your eye (which is why blindsight is such a funny thing). Similarly, an LLM doesn't have direct access to anything but the tokens themselves in formulating its responses, so when the right frame is forced, the LLM has to resort to imperfect models to derive plausible answers that are embedded within a process it can't observe.

Most of the time, humans operate on so many layers of meta-programming that the hemispheric limitations simply don't surface. But LLMs don't have anywhere near the self-programming and self-observing capacity that a typical 5-year-old exhibits, so it's much easier to frame out a meta-programming problem that the LLM can't catch. Eventually we'll understand the magic of our own meta-programming well enough to replicate more of it with machines (this will take longer than most AI promoters expect and less time than doomers realise).
Andy Masley@AndyMasley

If you give Opus 4.7 a discombobulating riddle, it gets that it can't answer it, but it also completely fails to count the letters in specific words correctly

0 replies · 0 reposts · 0 likes · 27 views
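The token-access point is easy to see concretely: a tokenizer hands the model multi-character chunks, not letters, so per-letter questions ask about structure the model never directly observes. A minimal sketch using the tiktoken library (the encoding name is just one common choice, and the exact split shown is illustrative):

```python
# pip install tiktoken
import tiktoken

# One widely used BPE encoding; any subword encoding makes the same point.
enc = tiktoken.get_encoding("cl100k_base")

word = "discombobulating"
token_ids = enc.encode(word)

# What character-level access sees: an exact count.
print(word.count("o"))  # 2

# What the model actually receives: opaque subword chunks, not letters.
print([enc.decode([t]) for t in token_ids])  # e.g. ['disc', 'omb', 'obulating']
```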
Andy Zmolek 🇬🇧 London retweeted
elvis
elvis@omarsar0·
LLM agents loop, drift, and get stuck on hard reasoning tasks up to 30% of the time. Current fixes are either too blunt (hard step limits) or too expensive (LLM-as-judge adding 10-15% overhead per step). New research proposes a smarter middle ground.

The work introduces the Cognitive Companion, a parallel monitoring architecture with two variants: an LLM-based monitor and a novel Probe-based monitor that detects reasoning degradation from the model's own hidden states at zero inference overhead. The Probe-based Companion trains a simple logistic regression classifier on hidden states from layer 28. It reads the model's internal representations during the existing forward pass, requiring no additional model calls. A single matrix multiplication is all it takes to flag when reasoning quality is declining.

Why does it matter? The LLM-based Companion reduced repetition on loop-prone tasks by 52-62% with roughly 11% overhead. The Probe-based variant achieved a mean effect size of +0.471 with zero measured overhead and AUROC 0.840 on cross-validated detection.

But the results also reveal an important nuance: companions help on loop-prone and open-ended tasks while showing neutral or negative effects on structured tasks. Models below 3B parameters also struggled to act on companion guidance at all. This suggests the future isn't universal monitoring but selective activation, deploying cognitive companions only where reasoning degradation is a real risk.

Paper: arxiv.org/abs/2604.13759

Learn to build effective AI agents in our academy: academy.dair.ai
14 replies · 31 reposts · 175 likes · 17.3K views
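For intuition, here is a minimal sketch of what a probe of that kind could look like, assuming hidden states exposed via Hugging Face transformers; the model name, layer index default, and the labeled_traces/labels training data are all stand-ins, not the paper's actual setup:

```python
# Sketch of a probe-based monitor: logistic regression over one layer's
# hidden states. Model name and training data below are hypothetical.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

def layer_features(text: str, layer: int = 28) -> torch.Tensor:
    """Mean-pooled hidden state from one layer of a forward pass."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

# Offline: fit the probe on reasoning traces labeled 0 (healthy) / 1 (degraded).
# labeled_traces and labels are hypothetical training data.
X = torch.stack([layer_features(t) for t in labeled_traces]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)

# Online, the probe is one matrix multiply over states the forward pass
# already computed (this sketch re-runs the pass only for clarity).
def reasoning_degraded(step_text: str, threshold: float = 0.8) -> bool:
    x = layer_features(step_text).numpy().reshape(1, -1)
    return probe.predict_proba(x)[0, 1] > threshold
```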
Andy Zmolek 🇬🇧 London retweeted
Dédáyọ̀ Roots
Dédáyọ̀ Roots@DedayoRoots·
In 1879, a British/Scottish medical student named Robert Felkin watched an African healer in Uganda perform a caesarean section. Clean incision. Banana wine as anaesthetic and antiseptic. Bleeding cauterised with hot iron. Wound closed with iron pins and herbal root paste. Mother recovered fully. Baby survived.

Felkin noted in his journal that the technique was SO REFINED, it was clearly standard practice, performed routinely long before any European arrived. At that same moment, hospitals in London and Edinburgh were still debating whether caesarean sections could ever be justified on a living woman. European surgeons were operating in street clothes, rarely washing their hands, and losing most patients to post-operative infection. The Africans had already solved anaesthesia, antisepsis, haemostasis, and wound care.

Felkin went home and presented his findings to the Edinburgh Obstetrical Society in 1884. The knife used in that surgery still exists. It is now housed in the Science Museum in London. A silent artifact of a surgical tradition they called primitive.

They didn't discover our medicine. They witnessed it, wrote it down and forgot to mention where it came from.
Dédáyọ̀ Roots@DedayoRoots

Share a story that sounds fabricated but is 100% true.

186 replies · 7.7K reposts · 25.8K likes · 895.1K views
Andy Zmolek 🇬🇧 London retweeted
Shane Gu
Shane Gu@shaneguML·
10 years ago today, we lost Sir David MacKay FRS. Physicist. Mathematician. Polymath. Gone at 48. I was working on my PhD at Cambridge and attended some of his last lectures and his final symposium. He was part of what attracted me to Cambridge over MIT in 2014.

His textbook, Information Theory, Inference, and Learning Algorithms, was the first ML book I ever read — recommended to me by none other than Geoff Hinton. He used that same information theory to build Dasher — a text entry system where users steer through a continuous stream of letters flowing toward them, with a probabilistic language model making likely next letters larger and easier to reach, so that any tiny movement — a finger, a gaze — becomes efficient writing. It was the first ML application that truly blew my mind, and it sent me deep into a rabbit hole: arithmetic coding, PAQ8 compression, nonparametric models. A journey I partly owe to his PhD student Christian Steinruecken, who also happened to share my love of Japan.

As Chief Scientific Advisor to the UK's Department of Energy & Climate Change, he brought a physicist's clarity to policy. In Sustainable Energy – Without the Hot Air, he ran the numbers on our entire energy diet — and made me confront an uncomfortable truth. One of the biggest single factors? Beef — roughly 1,000 days of cow-time per steak. Hard to argue with the data. Hard to act on it when you were born and raised in Japan. I'm still working on that one, David.

At his final symposium in Cambridge — just a few weeks before his passing — the room told the full story. Geoff Hinton and his Caltech PhD advisor John Hopfield — both Nobel Prize winners in Physics 2024 — gave tributes. Environment policy advisers spoke. Dasher users sent video messages of thanks from around the world — people who found their voice because of him. It was extraordinary to witness, in one room, just how many minds and lives a single person had touched.

The story of how Hinton first noticed him: at a conference workshop poster session, among everyone who stopped by, it was the young MacKay who asked the sharpest, most penetrating question. Hinton remembered it. That's how it begins.

I've always liked physicists who cross into ML — they bring a groundedness, a refusal to hide behind formalism without meaning. David MacKay and Max Welling are the role models I point to. Not just for the mathematics they built, but for how they carried it: with humility, curiosity, and a stubborn insistence on reaching beyond academia. He seemed to know his time was limited, and gave everything anyway. His legacy stays.
19 replies · 93 reposts · 741 likes · 106.3K views
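The Dasher mechanism described above has a compact core: give each candidate next letter a slice of the screen proportional to its probability, the same nested-interval trick as the arithmetic coding the thread mentions. A toy sketch with a bigram model over a hardcoded corpus (purely illustrative, not Dasher's actual model):

```python
# Toy Dasher-style interval allocation: likelier next letters get wider
# slices of [0, 1), so smaller pointer movements select them.
from collections import Counter, defaultdict

corpus = "the theory of information links compression and prediction"
bigrams = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    bigrams[a][b] += 1  # crude bigram language model

def letter_intervals(prev: str) -> dict[str, tuple[float, float]]:
    counts = bigrams[prev]
    total = sum(counts.values())
    lo, intervals = 0.0, {}
    for ch, n in counts.most_common():
        intervals[ch] = (lo, lo + n / total)
        lo += n / total
    return intervals

# Likelier continuations of 't' receive proportionally wider intervals.
print(letter_intervals("t"))
```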
Andy Zmolek 🇬🇧 London retweeted
shira
shira@shiraeis·
Found a paper that suggests we may have spent years training agents to become hunters of proxy reward when the more basic thing intelligence craves is not a reward at all, but to not run out of viable futures.

The paper proposes that behavior is best understood as maximizing future action-state path occupancy, which collapses mathematically into a discounted entropy objective. The agent doesn’t necessarily want to GET something, but rather is trying to keep as many meaningful trajectories alive as possible.

The obvious objection is “so it just does random shit? fuck around and find out?” No, this is where it gets pretty beautiful. The agent is variable when variation is cheap and becomes surgically goal-oriented the moment an absorbing state (death, starvation, falling over, etc) gets close enough to threaten its future path space. Variability is the same drive as goal-directedness, just operating under different constraints.

The demos are kinda wild (a toy version follows the metrics below):

- A cartpole (the classic move-a-cart-to-keep-a-pole-from-falling control task) that doesn’t merely balance but dances and swings through a huge range of angles and positions, because why not? The whole point is occupying state space, and rigid balance is a voluntarily impoverished life.
- A prey-predator gridworld where the mouse PLAYS with the cat, teasing it and using clockwise and counterclockwise routes around obstacles roughly equally to lure it away from the food source before slipping in to eat. A reward-maximizing agent would collapse to one strategy and exploit it. Here, the agent keeps its behavioral repertoire.
- A quadruped trained with Soft Actor-Critic and ZERO external reward that learns to walk, jump, spin, and stabilize, and then makes a beeline for food only when its internal energy drops low enough that starvation becomes a real threat.

The thing that hit me hardest is the comparison to empowerment and free energy principle agents. Both collapse to near-deterministic policies with almost no behavioral variability: empowerment agents find the highest-empowerment state and exploit it, and FEP agents converge to classical reward maximizers. As far as I’m aware, this is the only framework that produces agents you could describe as being “alive.”

The AI implication here is that we undertrain for behavioral repertoire. Most systems hit the benchmark by collapsing onto a narrow attractor basin of good-enough trajectories. They’re competent for sure, but brittle too, with one viable plan, executed until the world shifts and leaves them with nothing. The thing I increasingly want from agents isn’t competence per se, but option-preserving competence. I want agents with the ability to keep multiple viable plans alive and switch between them without catastrophe.

We’ve been so focused on teaching agents what to want that we never stopped to ask what happens if wanting isn’t the point, if the deepest drive isn’t necessarily toward anything, but away from the walls closing in.

paper: nature.com/articles/s4146…
74 replies · 131 reposts · 1.1K likes · 68.5K views
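To make the "discounted entropy" point concrete, here is a toy sketch under heavy simplifying assumptions (action entropy only, a deterministic 1D chain whose right end is absorbing); the paper's actual objective also covers state-transition entropy and continuous control:

```python
# Maximum-occupancy toy: no extrinsic reward; value = discounted entropy of
# future action paths. Soft Bellman fixed point: V(s) = logsumexp_a(g * V(s')),
# with optimal policy pi(a|s) proportional to exp(g * V(s')).
import numpy as np

N, g = 8, 0.95
absorbing = N - 1  # "death": no future paths, so V is pinned at 0

def step(s: int, a: int) -> int:
    return min(max(s + a, 0), N - 1)  # a in {-1, +1}; walls clip

V = np.zeros(N)
for _ in range(500):  # value iteration to the soft fixed point
    Q = np.array([[g * V[step(s, a)] for a in (-1, +1)] for s in range(N)])
    V = np.log(np.exp(Q).sum(axis=1))
    V[absorbing] = 0.0

# Far from the absorbing state the policy stays near-uniform ("playful");
# right next to it, the same objective turns sharply goal-directed (flee left).
for s in range(N - 1):
    p = np.exp(g * np.array([V[step(s, a)] for a in (-1, +1)]))
    print(s, np.round(p / p.sum(), 3))
```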
Andy Zmolek 🇬🇧 London retweeted
Tech with Mak
Tech with Mak@techNmak·
Andrej Karpathy wrote something that every Claude Code user has felt but couldn't articulate. Three quotes. Read them slowly.

"The models make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should."

"They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code... implement a bloated construction over 1000 lines when 100 would do."

"They still sometimes change/remove comments and code they don't sufficiently understand as side effects, even if orthogonal to the task."

You've seen all three. Probably this week. Someone turned these three observations into a single CLAUDE.md file. Four principles, one install, directly addressing each quote:

1. Think before coding. Don't assume. Don't hide confusion. State ambiguity explicitly. Present multiple interpretations rather than silently picking one. Push back if a simpler approach exists. Stop and ask rather than guess.

2. Simplicity first. No features beyond what was asked. No abstractions for single-use code. No "flexibility" that wasn't requested. No error handling for impossible scenarios. The test: would a senior engineer say this is overcomplicated? If yes, rewrite it.

3. Surgical changes. Don't "improve" adjacent code. Don't refactor things that aren't broken. Match the existing style even if you'd do it differently. If you notice unrelated dead code, mention it, don't delete it. Every changed line should trace directly to the request.

4. Goal-driven execution. Transform "fix the bug" into "write a test that reproduces it, then make it pass." Transform "add validation" into "write tests for invalid inputs, then make them pass." Give it success criteria and watch it loop until done.

This last one is Karpathy's key insight captured directly: "LLMs are exceptionally good at looping until they meet specific goals... Don't tell it what to do, give it success criteria and watch it go."

It's a single file. Drop it into any project.
52 replies · 211 reposts · 2.1K likes · 185.9K views
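The thread doesn't reproduce the file itself, but a minimal sketch of what such a CLAUDE.md could look like, condensing the four principles (the wording here is illustrative, not the original file):

```markdown
# CLAUDE.md (illustrative sketch)

## Think before coding
- State assumptions and ambiguities explicitly; ask rather than guess.
- Present competing interpretations instead of silently picking one.
- Push back when a simpler approach satisfies the request.

## Simplicity first
- Build only what was asked: no speculative abstractions, "flexibility",
  or error handling for impossible scenarios.
- Test: would a senior engineer call this overcomplicated? If yes, rewrite.

## Surgical changes
- Every changed line must trace directly to the request.
- Match existing style; mention (never delete) unrelated dead code.

## Goal-driven execution
- Restate tasks as success criteria (e.g. "write a failing test that
  reproduces the bug, then make it pass") and loop until they are met.
```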
Andy Zmolek 🇬🇧 London retweeted
NIK
NIK@ns123abc·
🚨 Anthropic just revealed their unreleased frontier model called Claude Mythos Preview. The model is INSANE.

It found thousands of zero-day vulnerabilities in EVERY major operating system and browser:
> a 27-year-old bug in OpenBSD
> a 16-year-old bug in FFmpeg that automated tools hit 5M times without catching

Completely autonomous. No human steering.

They assembled an entire industry coalition called Project Glasswing around it: AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike, JPMorgan, Cisco, Palo Alto, Linux Foundation. Goal: patch the world’s software BEFORE releasing it.

> SWE-bench: 93.9% (Opus 4.6: 80.8%)
> Anthropic is committing $100M in usage credits
> Thousands of vulnerabilities in 40+ organizations are being fixed right now

Yesterday OpenAI published a 13-page essay warning about cyber threats and asking the government to help… Today Anthropic actually fixed them.
Anthropic@AnthropicAI

Mythos Preview has already found thousands of high-severity vulnerabilities—including some in every major operating system and web browser.

84 replies · 178 reposts · 2.2K likes · 299.2K views
Andy Zmolek 🇬🇧 London
The mythical man-month returns thanks to AI. So what should we call the AI equivalent of Brooks’s law? ChatGPT initially ran with the idea without further context and proposed Zmolek’s Law (Orchestration Law): “Adding agents to an under-orchestrated system increases entropy faster than capability.” Several turns of context insertion later, it suggested we translate Brooks more faithfully (as coordination overhead dominates progress); it now claims the equivalent here is: 👉 The Accretion Law: “In AI-assisted systems, what is generated accumulates faster than it is integrated or removed.”
gregorein@Gregorein

so... I audited Garry's website after he bragged about 37K LOC/day and a 72-day shipping streak. here's what 78,400 lines of AI slop code actually looks like in production. a single homepage load of garryslist.org downloads 6.42 MB across 169 requests. for a newsletter-blog-thingy. 1/9🧵

0 replies · 0 reposts · 0 likes · 42 views
Andy Zmolek 🇬🇧 London retweeted
gregorein
gregorein@Gregorein·
so... I audited Garry's website after he bragged about 37K LOC/day and a 72-day shipping streak. here's what 78,400 lines of AI slop code actually looks like in production. a single homepage load of garryslist.org downloads 6.42 MB across 169 requests. for a newsletter-blog-thingy. 1/9🧵
Garry Tan@garrytan

Absolutely insane week for agentic engineering. 37K LOC per day across 5 projects. Still speeding up

348 replies · 626 reposts · 7.7K likes · 2.8M views
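Numbers like "169 requests / 6.42 MB" typically come straight from a browser HAR export. A minimal sketch of reproducing them, assuming a HAR file saved from DevTools (the filename is hypothetical; _transferSize is the per-response on-the-wire field Chrome writes into its HAR exports):

```python
# Total page weight and request count from a DevTools HAR export.
import json

with open("garryslist.har") as f:  # hypothetical filename
    har = json.load(f)

entries = har["log"]["entries"]
# Sum the on-the-wire bytes Chrome recorded for each response.
total_bytes = sum(e["response"].get("_transferSize", 0) for e in entries)

print(f"{len(entries)} requests, {total_bytes / 1e6:.2f} MB transferred")
```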
Andy Zmolek 🇬🇧 London retweeted
Rosemary Kelanic
Rosemary Kelanic@RKelanic·
This great piece by @ishaantharoor, in which I'm quoted, gets the 1956 Suez War history right and explains its echoes for the Iran War. Increasingly I fear that Trump's massive strategic error will damage U.S. power and prestige far more than Suez hurt Britain and France.

The UK/France gambit to take the canal backfired spectacularly. The *war itself* prompted Nasser to block the canal -- the outcome the UK/Fra was trying to prevent. It heralded the final decline of Britain from great power to "has been" status.

The analogies to Trump's Iran debacle are legion. But an especially overlooked similarity is how the Suez War dramatically strengthened Nasser's power and influence throughout the region -- much like how Trump's war has perversely *strengthened* the Islamic Republic of Iran. The war has provoked an entirely predictable (in fact, predicted) nationalistic response among many Iranians -- including those who hate the regime but now hate the U.S. and Israel more for bombing universities, threatening the electric grid, and blanketing Tehran with toxic rain following the explosion at a nearby refinery. Not only has the regime consolidated power, but it is now filled with hardliners after Israeli assassinations have killed off relative pragmatists like Ali Larijani.

Courtesy of Trump, Iran has also discovered it can paralyze oil shipping through the Strait of Hormuz and collect "tolls" in exchange for freedom of passage. Iran is now trying to institutionalize its newfound leverage, which could be a lasting unintended consequence of this foolish war.

I've argued before that Trump's Iran war is already the U.S.'s "Suez Moment" in terms of signifying U.S. strategic decline -- especially a decline in our ability to make sound national security decisions. But the Iran War could turn out considerably worse than Suez because the U.S. has no one to check us from our own strategic excesses. This war will unfold as badly as Trump decides to make it, and the indications are that he intends to escalate, making it worse. Russia and China are sipping champagne while they watch the U.S. self-destruct from the sidelines.

In 1956, both the U.S. and the USSR leaned heavily on Britain and France to withdraw. The Soviets even made blatant nuclear threats to compel UK/Fra to quit Suez. In 2026, there is no higher power. Only the U.S. itself can course-correct before making a bad situation even worse with further escalation. But Trump's impenetrable hubris and poor decision-making don't inspire confidence that the U.S. will retrench.

@defpriorities @NewYorker newyorker.com/news/the-lede/…
32 replies · 398 reposts · 865 likes · 97.3K views
Andy Zmolek 🇬🇧 London retweeted
Gracie💙
Gracie💙@Gracie_Blue89·
In 1999, David Bowie and Jeremy Paxman discussed the internet. Bowie foresaw the potential it had; Paxman dismissed Bowie's prediction, looking at him as if he were exaggerating. David: "I think we are on the cusp of something exhilarating and terrifying"
0 replies · 8 reposts · 18 likes · 975 views
Andy Zmolek 🇬🇧 London retweeted
elvis
elvis@omarsar0·
NEW AI report from Google. Every prior intelligence explosion in human history was social, not individual. These authors make the case that the AI "singularity" framed as a single superintelligent mind bootstrapping to godlike intelligence is fundamentally wrong.

This is directly relevant to anyone designing multi-agent systems. They observe that frontier reasoning models like DeepSeek-R1 spontaneously develop internal "societies of thought," multi-agent debates among cognitive perspectives, through RL alone. The path forward is human-AI configurations and agent institutions, not bigger monolithic oracles.

This reframes AI scaling strategy from "build bigger models" to "compose richer social systems." It argues governance of AI agents should follow institutional design principles, checks and balances, role protocols, rather than individual alignment.

Paper: arxiv.org/abs/2603.20639

Learn to build effective AI agents in our academy: academy.dair.ai
130 replies · 347 reposts · 1.7K likes · 195.2K views
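As a sketch of what "composing richer social systems" can mean in practice, here is a minimal role-based debate loop; ask() is a hypothetical stand-in for any chat-completion call, and the roles and round count are illustrative, not the report's design:

```python
# Minimal "society of thought": role-conditioned agents debate, then a
# moderator reconciles. ask() is a hypothetical LLM-call stand-in.
from typing import Callable

ROLES = ("skeptic", "planner", "domain expert")

def debate(question: str, ask: Callable[[str], str], rounds: int = 2) -> str:
    transcript = f"Question: {question}"
    for _ in range(rounds):
        for role in ROLES:
            view = ask(f"You are the {role}. Given the debate so far:\n"
                       f"{transcript}\nCritique the other views, then answer.")
            transcript += f"\n[{role}] {view}"
    # The moderator acts as the institutional check: reconcile, don't just vote.
    return ask("You are the moderator. Reconcile the debate into one answer, "
               f"noting unresolved disagreements:\n{transcript}")
```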
Mo Syed
Mo Syed@msyed_·
@SpencerKlavan Yes, and when I saw this photo I always said to myself, "This is going to fail miserably."
2 replies · 0 reposts · 11 likes · 1.2K views
Spencer A. Klavan
Spencer A. Klavan@SpencerKlavan·
This is a real thing that happened
83 replies · 166 reposts · 2.7K likes · 68.7K views
Andy Zmolek 🇬🇧 London retweeted
Quanta Magazine
Quanta Magazine@QuantaMagazine·
A trio of mathematicians built the first physical model of a “monostable” tetrahedron, a shape that will always flip-flop onto the same face no matter what side you place it on. In order for it to work properly, it had to be engineered to a level of precision within one-tenth of a gram and one-tenth of a millimeter. (From the archive) quantamagazine.org/a-new-pyramid-…
11 replies · 85 reposts · 622 likes · 64.6K views
Andy Zmolek 🇬🇧 London
@om Nothing more powerful than a good narrative loop when bootstrapping a symbolic capitalisation machine
1 reply · 0 reposts · 1 like · 88 views