Adrian de Wynter

57 posts

@deWynterruption

Scientist at Microsoft/University of York. Opinions are my own.

Both Cambridges, sort of · Joined May 2025

15 Following · 144 Followers

Pinned Tweet

Adrian de Wynter@deWynterruption·Jun 7

Since the AoE 2 paper seems to have picked up steam (ayyy), and IT IS 13 dense af PAGES, I would like to clarify a few things: 1. I do not deal with consciousness bc it is neither well-defined nor measurable, 2. The AoE II bit is for substrate dependence, and 3. The core argument/proof of the paper is that claiming the existence (or lack thereof) of human-like attributes needs better experimental setups. Using AoE II (and later Boston) is to emphasise the distinction between our interpretation of what AI does when we observe it, vs what it does. E.g., an LLM outputting an explanation is a sequence of tokens. Calling it an 'explanation' is our observation of it. Assuming it means something (like self-awareness) is an assumption impacting your experimental setup and thus your conclusions. The paper is dense because one needs to be careful when providing these types of arguments (otherwise, trust me, it'd have been a lot shorter). So I wrote a less-formal, more-digestible thing here: adriandewynter.substack.com/p/if-llms-have…

103.3K

Adrian de Wynter@deWynterruption·6h

@bigjapbitch 10/10 on this idea though it is hilarious

1.7K

japbitch@bigjapbitch·6h

@deWynterruption yeah that's no problem at all you don't have to 'endorse' it, but you're more than welcome to collect the fees already sent towards your github. You can claim when you're ready

1.1K

Adrian de Wynter@deWynterruption·6h

... did someone just make a memecoin out of my paper? This is a big moment for me. I think this tops having my paper rt'd to the pope and Hinton.

8.2K

Adrian de Wynter@deWynterruption·10h

Chuffed that this post does not go straight to consciousness and did get the message :)

Rohan Paul@rohanpaul_ai

New Microsoft + York Univ paper argues that LLMs should not be treated as human-like without clear tests and narrower claims. Many studies ask whether LLMs have things like understanding, empathy, anxiety, or self-awareness, but they often build those ideas into the test from the start. The author shows that, in principle, the old strategy game can implement logic gates, train a tiny perceptron, and serve as a substrate for computation. If the same language model could be rebuilt inside a game, with goats moving around as bits, would we still say it “understands,” “feels anxiety,” or “has empathy” when it produces the same sentence? The point is not that the game is secretly intelligent, but that the same computation can be represented in a very different form. If an LLM-like system were rebuilt inside that game, its answers might stay similar, but people would probably find its “feelings” or “understanding” much less convincing. The authors argue that this shows a big measurement problem: many human-like claims about LLMs may depend on the interface and the observer, not only on the system itself. The paper is not saying LLMs definitely lack human-like attributes, or that all talk of AI cognition is nonsense. It is saying that many experiments smuggle the conclusion into the setup: they assume the model has, or cannot have, a human-like property, then interpret behavior through that assumption. ---- Link – arxiv.org/abs/2605.31514 Title: "If LLMs Have Human-Like Attributes, Then So Does Age of Empires II"

2.7K

Adrian de Wynter@deWynterruption·6d

something something goats in AoE II

Avijit Ghosh@evijit

I cannot overstate how important it is to not anthropomorphize models. “Happiness” is a sentient property, which this measurement is very much not. A simulacrum at best. Once again, important to read what the actual eval is instead of the paper title, and it is just this 🤦‍♂️

1.4K

Adrian de Wynter@deWynterruption·Jun 10

@dwarkesh_sp Mechanistic interp works so well here, love it! And I do agree with this. But we don't define *what* reasoning is though. Not just here--everywhere. From that perspective it is quite easy to claim it exists and find evidence fitting the claim

137

Dwarkesh Patel@dwarkesh_sp·Jun 9

Whatever AI sceptics say, LLMs really can reason. They're not just doing an imitation that looks like reasoning, it's the real deal. But even though they are able to reason, sometimes they won't! If you ask an LLM a question it can't answer, sometimes it will just try to imitate reasoning without doing it. The chain of thought looks basically indistinguishable from actual reasoning. But under the hood something very different is going on. @TrentonBricken talked with me about what work on circuits inside LLMs has revealed:

1.1K

104.1K

Adrian de Wynter@deWynterruption·Jun 10

I broadly agree with this but it drives me up the wall we NEVER say which type of reasoning is happening... or what reasoning even means in the given context 🫠

Dwarkesh Patel@dwarkesh_sp

479

Adrian de Wynter@deWynterruption·Jun 10

@mav3ri3k Thank you! :) I'm trash at AoE II so I stay away from people 🥲-- I plan to start playing with people this weekend though. I'm so dead.

Apurva Mishra@mav3ri3k·Jun 10

Probably the most fun paper I read in a while 😄. What is your elo on aoe2 ?

Adrian de Wynter@deWynterruption·Jun 8

Well well I gotta say I'm happy that the AoE2 paper sparked a discussion. That was its point. People got so mad for the wrong reasons tho... Here's another good summary, but you need to skip up until the 'circular reasoning' bit. aitoolly.com/ai-news/articl…

279

Adrian de Wynter@deWynterruption·Jun 7

Not exactly. The unfortunate thing about this picture is that the highlighted thing is AoE, not the actual paper's argument (which is the subsequent sentences). The paper itself does address your comments: for example, there's no need to show AoE has human-like attributes. In fact, the paper works at a meta-level indicating that the failure mode of research is independent of philosophical viewpoint, validity of such viewpoint, substrate, or the nature (positive or negative) of the assumptions made. Indeed, what it shows is not that it 'fails', but that you get unsound conclusions because, although their truth-value might hold, their validity might not: they are either circular or uninformative within the setup. About the title: so one of the fundamental assumptions you gotta make when performing a measurement is that the substrate could present these attributes (in the paper). So transferring an entity cross-substrate does imply that such substrate is assumed to present the attributes you have ascribed to the entity. Same for not presenting it (it's just a symmetric argument). It follows that by analogy the title holds--and there's a few other examples within the paper, too.

LIFE 2030 and Beyond@life2030com·Jun 7

1) the paper does not really show that the original videogame Age of Empires II has human-like attributes. So, the HONEST title of this paper would be: "If LLMs have human-like attributes, then an LLM implemented inside Age of Empires II may have them too." 2) The author says: "LLM interpretation is substrate-dependent." That is a better claim. But then the target should be interpretation of the LLM under different implementations, not attribution to the substrate AoE II itself. The paper's argument appears to conflate the substrate with the implemented system. 3) The paper warns against poorly designed anthropomorphism research, but it does not show that attributing human-like functional properties to neural networks or LLM-like systems necessarily fails. It mainly shows that attributing such properties to a bizarre substrate without distinguishing substrate from implemented system leads to confusion. This comment is related to my comment yesterday: x.com/life2030com/st…

LIFE 2030 and Beyond@life2030com

The title of this paper is a click-bait, and doesn't make sense. x.com/MilesCranmer/s… Let me fix the title of this paper to: “If LLMs have human-like attributes, then so does the neural network inside Age of Empires II." The paper's argument appears to conflate the substrate with the implemented system. Age of Empires II, as a classical videogame, is not shown to possess anthropomorphic attributes, as the title suggested. What is actually shown is that a trainable neural network or LLM-like system can be implemented inside the game environment, like Age of Empires II. Nothing new or novel is spotted in this paper. Therefore, the content of this paper is stale, and does not evidently support the title's claim.

Miles Cranmer@MilesCranmer·Jun 6

This is an insane paper and I love it arxiv.org/abs/2605.31514

156

1.3K

11.2K

623.7K

Adrian de Wynter@deWynterruption·Jun 7

Hi! Author here -- yes, I completely agree with this and subscribe to this school of thought. The paper's main message is to stop making these human-like assumptions when performing measurements, because they lead to incorrect conclusions regardless of the outcome. What you mention is also part of objection 6.4: being able to calibrate the instrument (i.e., having different facets of intelligence) to debug failure modes of hypotheses means that the conclusions won't be incorrect :)

Michael Levin@drmichaellevin·Jun 7

I think the basic issue in all such discussions is that these kinds of competencies aren't some kind of essential "human attributes". Measuring cognitive properties by humans is myopic and causes all kinds of pseudoproblems. The emerging field of diverse intelligence has better frameworks. A spectrum of highly variable intelligences, and lots of research on which kinds of architectures enable which kinds of patterns (of behavior, of computation, of physiology, etc.), is more useful for discussions of natural, artificial, and hybrid agents of varying provenance and composition. frontiersin.org/articles/10.33… journal.frontiersin.org/article/10.338… and more at drmichaellevin.org/publications/b…, plus lots of other good labs.

209

9.9K

Adrian de Wynter@deWynterruption·Jun 7

Nearly forgot--the common thing I've read (outside of 'this is dumb' ): ) is that this is functionalism/materialism/CRA/CBE. The paper (and abstract) do point out that the results are independent of the philosophical school of thought you subscribe to.

292

Adrian de Wynter@deWynterruption·Jun 7

Both brilliant comments! That's why the title says that AoE II could have human-like characteristics (not 'a neural network in AoE II'). The key here is that people implicitly assume these characteristics could arise when 'subscribing' (or rejecting) some framing (e.g., materialism). What the paper shows is that it really doesn't matter which framework one accepts/rejects, the experiments will still fail. ... and yes I stayed away from consciousness 😂

rewind@0xRWND·Jun 7

@zoewangai @MilesCranmer I think that the phrase “human-like characteristics” is also a limiting framing used to avoid more loaded topics like “consciousness” which isn’t really a sufficiently provable characteristic; even in this more generalized version, novelty is in how they share them but diverge

Adrian de Wynter@deWynterruption·Jun 7

Hi! Author here -- yes, the summary is almost correct. But the paper is related to assumptions in measurement. Lots of papers assume these properties when setting up experiments, then point at the conclusions and be all like 'see! I told you!'. From AoE II/Boston we see that LLM interpretation is substrate-dependent (i.e., 'non-unique'), so measurements of their properties should account for it. Then I show that experiments assuming human-like properties lead to failed conclusions--and same for the converse: either they are circular arguments or uninformative. And then the null assumption, which accounts for non-uniqueness, walks in.

Zoe Wang@zoewangai·Jun 6

@MilesCranmer The study invalidates LLM research built on preset human-like characteristics. After building an in-game neural network in Age of Empires II, the authors find perceived AI anthropomorphism stems mainly from interface design instead of inherent properties. openread.academy/en/paper/readi…

5.8K

Adrian de Wynter@deWynterruption·Jun 7

@gshaikovski Thank you! :) Yes, you may compute LLM outputs with pen and paper and make the same argument. Back in the day I even worked out the ops for BERT (by hand!) for a proof. But consciousness... I intentionally avoid it in the paper bc... what IS consciousness? How do we measure it?🫠

George Shaikovski@gshaikovski·Jun 6

@deWynterruption Found your paper in my feed, really cool argument :) Just yesterday I saw someone here making a similar argument - you could compute LLM output with a pen and paper - does this mean you would instantiate consciousness?

Adrian de Wynter@deWynterruption·Jun 1

Would you believe an #LLM has feelings if it is made of #ageofempires goats? What if it spans all of #boston ? They wouldn't have emergent properties, would they? That's what I argue in 'If LLMs Have Human-Like Attributes, Then So Does Age of Empires II' arxiv.org/abs/2605.31514

1.1K

Adrian de Wynter@deWynterruption·Jun 7

@evangineer They always were (at a low-level haha). Real talk, though, LLMs do present interesting properties which arise from their pretraining. The question is how do we measure these without always being fooled--ELIZA sounds silly but it DID fool people back then

Mamading Ceesay@evangineer·Jun 6

@deWynterruption So you are saying that LLMs are just Eliza on steroids metaphorically speaking.

Adrian de Wynter@deWynterruption·Jun 2

Man everyone can describe my paper better than I 😂😂😂

Anomaly@theanomalyai

A researcher just proved that if ChatGPT has human-like feelings, then Age of Empires II might have them too. Not metaphorically. Literally inside the argument. The paper is called "If LLMs Have Human-Like Attributes, Then So Does Age of Empires II" and it might be one of the funniest serious AI papers I have read this year. The researcher built a neural network inside Age of Empires II. He used goats, rails, gates, triggers, villagers, markets, farms, and game mechanics to show that the 1999 strategy game can implement computation. First, he built NAND gates inside the game. That matters because NAND gates are functionally complete. If you can build NAND gates, you can theoretically build any computation. Then he went further. He built a 1-bit perceptron inside Age of Empires II and trained it to learn AND. In normal language: he made Age of Empires II run a tiny neural network. But the point is not that AoE II is secretly an AI lab. The point is much more uncomfortable. If an LLM can be copied into any sufficiently powerful substrate, and that system gives the same output to the same prompt, then the "human-like" part might not belong to the model at all. It might belong to the interface. Here is the thought experiment from the paper. You tell the system: "I feel lonely." It replies: "I feel bad for you, maybe catch up with a friend? Closeness always helps in these situations." When ChatGPT says this, people call it empathy. When the same response comes from goats moving back and forth inside Age of Empires II, nobody wants to call it empathy. Same input. Same output. Different costume. That is the entire trap. The paper is not saying LLMs definitely have no understanding, no empathy, no self-awareness, no morality, no anxiety, no deception, no theory of mind. It is saying the way researchers measure these traits is often broken from the start. 
Because many studies begin by treating the LLM like the kind of thing that can have human-like traits, then run an experiment, then act surprised when the result looks human-like. That is circular. You ask a model if it feels anxious. You design a test for anxiety. You interpret a language output through human psychology. Then you conclude the model has an anxiety-like state. But if the same behavior came from a video game circuit made of goats, the interpretation would collapse instantly. That means the experiment may not be measuring the system. It may be measuring our willingness to anthropomorphize the system. The meta-review in the paper is even more disturbing. The researcher analyzed 315 LLM papers from 2024 to 2026. 57% assumed human-like attributes when studying LLMs. 15% made human-like attributes the main subject. And among the papers that directly studied those traits, 77% concluded that LLMs had them. That does not automatically prove the conclusions are wrong. But it does show the field is playing with fire. Because once you start asking whether a machine "understands," "feels," "believes," "wants," or "knows," you have already smuggled human psychology into the measurement. The paper’s cleanest argument is this: Measure behavior first. Interpret later. A model can comfort without empathy. It can explain without understanding. It can say "I believe" without belief. It can describe its inner state without having access to one. It can pass a theory-of-mind benchmark without having a mind. That distinction matters because AI companies are already selling these systems as assistants, companions, therapists, agents, tutors, co-workers, and almost-people. The more human the interface gets, the easier it becomes to confuse performance with personhood. A chat window feels alive because it responds instantly, speaks naturally, remembers context, and mirrors your emotions. A goat circuit inside Age of Empires II feels ridiculous. 
But if both can produce the same response, maybe the feeling was never evidence. Maybe it was just design. The scariest part is not that AI might become human. The scariest part is that humans are so easy to fool that a sufficiently good interface can make us see a mind where there may only be machinery. And according to this paper, if we are not careful, the next decade of AI psychology could become scientists staring at goats in Age of Empires II and calling it consciousness.
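
The two computational steps described above (NAND gates being functionally complete, and a tiny perceptron learning AND) can be sketched in plain Python. This is only an illustration of the two ideas, not the paper's in-game construction: the function names, the step activation, the integer learning rate, and the epoch cap are all assumptions made for the sketch.

```python
# Part 1: NAND is functionally complete.
# Every gate below is built only from calls to nand().

def nand(a: int, b: int) -> int:
    """Return 0 only when both inputs are 1."""
    return 1 - (a & b)

def not_gate(a: int) -> int:
    return nand(a, a)

def and_gate(a: int, b: int) -> int:
    return not_gate(nand(a, b))

def or_gate(a: int, b: int) -> int:
    return nand(not_gate(a), not_gate(b))

def xor_gate(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

# Part 2: a single perceptron trained on the AND truth table.
# Hypothetical setup: step activation, classic perceptron update rule.

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, bias, inputs) -> int:
    """Fire (1) when the weighted sum is strictly positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs: int = 50, lr: int = 1):
    """Run the perceptron learning rule until an epoch has no errors."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        errors = 0
        for inputs, target in samples:
            delta = target - predict(weights, bias, inputs)
            if delta != 0:
                # Nudge the weights and bias toward the target output.
                weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
                bias += lr * delta
                errors += 1
        if errors == 0:
            break
    return weights, bias

if __name__ == "__main__":
    w, b = train_perceptron(AND_SAMPLES)
    for inputs, target in AND_SAMPLES:
        print(inputs, "->", predict(w, b, inputs), "expected", target)
```

Because AND is linearly separable, the perceptron learning rule is guaranteed to converge on it, which is why such a small network is enough for the demonstration. Nothing here depends on the substrate: the same truth tables could in principle be realised with transistors, pen and paper, or goats.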

1.4K

Adrian de Wynter retweeted

Dan Roy@roydanroy·Jun 5

Warning: once you learn category theory, you'll never be able or willing to talk with people who don't know category theory.

Markus J. Buehler@ProfBuehlerMIT

We've made a breakthrough in self-evolving AI scientists moving from "search" to "principled discovery": Scientific discovery requires that the search space itself changes, and an AI scientist must perceive this shift without intervention. We built an AI that achieves this for the first time with the ability to discover the scientific vocabulary it reasons in. Evidence, tools, artifacts, verifiers, failures & claims become typed provenance. We show three distinct modalities: 1) retrieval, adding known objects; 2) search, exploring a fixed schema; and critically: 3) discovery, a verified regime transition. We solve the open-endedness evaluation problem by lifting agentic workflows into a typed copresheaf and proving, via a Kan obstruction, that true discovery is not unbounded generation but a verifiable schema expansion: old evidence is transported by Left Kan extension, and genuine novelty is mathematically quantified by the pointwise residual beyond the transported image - separating discovery from mere search and making novelty objective and measurable rather than a subjective judgment or benchmark delta. Our AI scientist is built in a way that does not pre-conceive the approach it chooses; instead, we endow the system with formal power to adapt, evolve, and reason from first principles. Case studies include: 1⃣Builder/Breaker model that discovers mode-conditioned compliance in proteins; 2⃣CategoryScienceClaw that finds anisotropic fiber-network stiffness rules. Great work in collaboration with my graduate student @fwang108_ @MITdeptofBE F.Y. Wang & M.J. Buehler, Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence, arXiv:2606.01444, 2026

148

2.6K

454.7K
