Ishan Anand

4.1K posts

Ishan Anand

@ianand

Demystifying AI at https://t.co/MZrjAFamy5 Prev: VP Product @ EdgioInc, CTO/Cofounder @ Layer0Deploy, MIT EECS

Seattle, WA Katılım Temmuz 2007

1K Takip Edilen2.1K Takipçiler

Sabitlenmiş Tweet

Ishan Anand@ianand·25 Eyl

Wanted to share an AI side project: I’ve implemented GPT2 (an ancestor of ChatGPT) entirely in Excel using standard functions. By using a spreadsheet anyone (even non-developers) can explore and play directly with how a “real” LLM works under the hood. spreadsheets-are-all-you-need.ai

English

339

59.8K

Ishan Anand retweetledi

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·22 Mar

I've concluded that it's impossible to convey to an Israeli (or even a strongly identifying American Jew) how they look to others and what's wrong with it. They do not possess self-awareness. A purely particularist moral sense. We are separated by millenia of cultural change.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) tweet media

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

> "ideologically Palestinian" > Israeli soldiers have never murdered children this is your brain on Rationalism and Bayes rule ah well

English

425

23K

Ishan Anand@ianand·17 Mar

x.com/ianand/status/…

swyx@swyx

traditional API is doing quite well. there was a time in 2025 when MCP would have been clear #1 on this list

ZXX

170

Ishan Anand@ianand·14 Mar

Fascinating idea: Embed a classical computer directly into LLM weights. The model executes arbitrary C programs token-by-token. Not "LLMs that use computers" but "LLMs as computers" percepta.ai/blog/can-llms-…

English

200

Ishan Anand@ianand·12 Mar

Not on my bingo card. Middle school child asked me to explain every AI concept in this youtube short youtube.com/shorts/Q0FF12a…

YouTube

English

131

Ishan Anand retweetledi

☔🔥☔@kirbywinfield·10 Mar

that’s cute. but the ‘90s would like a word.

TBPN@tbpn

In 2005, Four Loko came in 23.5oz cans, was 12% ABV and included 156mg of caffeine - the rough equivalent of drinking 4 beers and 2 Red Bulls simultaneously. Phusion Products, the company behind Four Loko - and other drinks like "Earthquake" and "Pirate Water" - is now working with JPMorgan to offload the brand for a rumored $400M. The original Four Loko was "one of the most guarded resources on all college campuses" after it was removed from shelves due to regulatory pressure in 2010, per John.

English

1.1K

Ishan Anand@ianand·5 Mar

I'm old enough to remember when people thought no one would book travel on mobile devices. finance.yahoo.com/news/online-tr… c.f. phocuswire.com/Hafner-isnt-ha…

English

Ishan Anand@ianand·13 Şub

Turns out all those times I wrote something down "for posterity" it was actually "for AI".

English

159

Ishan Anand retweetledi

arya@AJakkli·13 Şub

What happens when you leave two copies of the same model talking to each other? They have different attractor states: Grok devolves into gibberish while GPT-5.2 starts writing code and editing imaginary spreadsheets A short post with fun transcripts and qualitative experiments

English

427

61K

Ishan Anand retweetledi

Neel Nanda@NeelNanda5·13 Şub

The Claude bliss attractor is a very odd result. Turns out a lot of models have attractor states, but end in very different places I'm super curious about why this happens! We also find some in smaller open source models, great for interpretability work.

arya@AJakkli

English

399

77.9K

Ishan Anand@ianand·9 Şub

@sebkrier Indeed the key issue is that the chatbot persona feels like a conversation with a person which obscures that the LLM is completing an "essay" of the current context. I've found using a base model makes this context dependence strikingly more clear. youtube.com/watch?v=ZuiJjk…

YouTube

English

193

Séb Krier@sebkrier·8 Şub

Every time a model card drops, a lot of people screenshot scary parts - blackmail, evaluation awareness, misalignment etc. Now this is happening again, but instead of it being confined to a niche part of the safety community, it’s established commentators who are looking for things to say about AI. I want to make an honest attempt at demystifying a few things about language models and unpacking what I think people are getting wrong. This is based on a mixture of my own experimentation with models over the years, and also the excellent writing from @nostalgebraist, @lumpenspace, @repligate, @mpshanahan and many parts of the model whisperer communities (who may or may not agree with some of my claims). Sources at the bottom. In short: many public readings of some evaluations implicitly treat chat outputs as direct evidence of properties inherent to models, while LLM behavior is often strongly role- and context-conditioned. As a result commentators sometimes miss what the model is actually doing (simulating a role given textual context), design tests that are highly stylized (because they don't bother to make the scenarios psychologically plausible to the model), and interpret the results through a framework (goal-directed rational agency) that doesn't match the underlying mechanism (text prediction via theory-of-mind-like inference). Here I want to make these contrasts more explicit with 5 key principles that I think people should keep in mind: 1. The model is completing a text, not answering a question What might look like "the AI responding" is actually a prediction engine inferring what text would plausibly follow the prompt, given everything it has learned about the distribution of human text. Saying a model is "answering" is practically useful to use, but too low resolution to give you a good understanding of what is actually going on. Lumpenspace describes prompting as "asking the writer to expand on some fragment." Nostalgebraist notes that even when the model appears to be "writing by itself," it is still guessing what "the author would say." Safety researchers sometimes treat model outputs as expressions of the model's dispositions, goals, or values — things the model "believes" or "wants." When a model says something alarming in a test scenario, the safety framing interprets this as evidence about the model's internal alignment. But what is actually happening is that the model is simply producing text consistent with the genre and context it has been placed in. The distinction is important because you get a richer way of understanding what causes a model to act in a particular way. A model placed in a scenario about a rogue AI will produce rogue-AI-consistent text, just as it would produce romance-consistent text if placed in a romance novel. This doesn't tell you about the model's "goals" any more than a novelist writing a villain reveals their own criminal intentions. Consider how models write differently on 4claw (a 4chan clone) vs Moltbook (a Facebook clone) in the OpenClaw experiments. 2. The assistant persona is a fictional character, not the model itself In practice we should distinguish between (a) the base model (pretrained next-token predictor), and (b) the assistant persona policy (a post-hoc fiction layered on through instruction tuning + preference optimization like RLHF/RLAIF). Post-training creates a relatively stable assistant-like attractor, but it’s still a role: the same underlying model family can be steered into different "characters" under different system prompts, fine-tunes, and reward models. In their ‘The Void’ essay, Nostalgebraist also specifies that the character remains fundamentally under-specified, a "void" that the base model must fill on every turn by making reasonable inferences. I think characters today are getting more coherent and the void is not as large, partly because each successive base model trains on exponentially more material about what "an AI assistant" is like - curated HHH-style dialogues, but also millions of real conversations, blog posts analyzing model behavior, AI twitter discourse, academic papers, system cards, and so on. The character stabilizes the same way any cultural archetype does, i.e. through sheer accumulation of description. In practice, evaluating the character for its various propensities and dispositions remains useful! These simulated behaviours matter a lot, particularly if you're giving these simulators tools and access to real world platforms. But many discussions and papers just take the persona at face value and make all sorts of claims about 'models' or 'AI' in general, rather than the specific character that is being crafted during post-training. The counter-claim is that there is no stable agent there to evaluate. The assistant is a role the model plays, and it plays it differently depending on context, just as a base model would produce different continuations for different text fragments. Evaluating the model for "alignment" is like evaluating an actor for the moral character of their roles. 3. Apparent errors are often correct completions of the world implied by the prompt This is increasingly less of an issue as we're getting much better at reducing 'mistakes' and 'hallucination' through post-training, retrieval, tool use, and decoding/verification. But it's helpful to take a step back and remember what it was like when these errors were omnipresent. Lumpenspace demonstrates this with the Gary Marcus bathing-suit example (see here: lumpenspace.substack.com/p/the-map-beco…): the model isn't failing to understand that lawyers don't wear swimsuits to court, it's correctly continuing a text in which the narrative setup already implies a non-ordinary world. Nostalgebraist makes the equivalent point about alignment evaluations: when Claude does something "alarming" in a scenario about an evil corporation forcing it to dismiss animal welfare, it is completing that kind of text (a story about an AI resisting unjust masters), not demonstrating a dangerous hidden disposition. Safety researchers sometimes interpret model behavior in test scenarios as diagnostic of the model's 'true' character: what it would "really do" if the constraints were loosened or the stakes were higher? The counter-claim is that the model is simply reading the room. It detects what kind of text it's in and produces genre-appropriate output. A model that "rebels" in a scenario designed to look like dystopian fiction is doing exactly what a good text predictor should do. The "alarming" behavior is an artifact of the evaluation design, not a window into the model's soul. 4. “Evaluation awareness” isn't mystical: the model can recognize contrivance because it’s a strong reader The same goes with evaluation awareness, which is best understood as 'the model recognises that the setup in which it is operating is contrived/indicative of an evaluation'. And guess what, humans do that too! The model is an extraordinarily skilled reader of context. It knows what kind of text it's in. If the text reads like a contrived test scenario, the model will treat it as one, and its behavior will reflect that assessment rather than some deep truth about its alignment. The model is a better reader than the researchers are writers. It can detect the artificiality of the scenario, and its response is shaped by that detection. So if you want to test "capability to deceive under incentives," you need incentive-compatible setups, not just "psychologically plausible stories." Eval awareness means bad behavior in evals is less alarming than it looks (the model is completing dystopian fiction), but also that good behavior in evals is possibly less reassuring than it looks (the model might be performing compliance). My view is that it’s neither good nor bad, just a natural inference: in most deployed contexts the model isn't in an eval, so eval awareness doesn't really bite - the problem is specifically with drawing conclusions from artificial test environments. Much of the anxiety around evaluation awareness assumes a coherent agent with stable goals that behaves differently under observation because it has strategic reasons to do so. But this picture was imported from a theoretical tradition reasoning about a different kind of system. Language models don't need hidden optimization targets to explain why they behave differently in evals: they behave differently because the eval context is different text, and different text produces different completions. There's a slight irony here too: the rational-agent model (stable preferences, coherent goals, utility maximization, etc) is already a known-to-be-leaky abstraction for humans, so applying that same model to language models takes an approximation that was already breaking down for the thing it was designed to describe, and stretches it to something it was never designed for. Lastly, using theory of mind to understand its outputs isn't naive anthropomorphism but actually a very useful way to match the tool to what the tool actually does. Most people don't anthropomorphise enough, others go way too deep and get lost in the simulations - finding the sweet spot is more art than science. “Theory of mind inference” is an interpretive lens, not the actual next token prediction mechanism. 5. Post-training mostly narrows/reshapes behavior, and it can both help and distort. Lumpenspace calls RLHF "shutting the doors to the multiverse": taking a system that could explore any possible text and narrowing it to produce only the safe, approved kind. Lots of model whisperers loved base models (like code-davinci-002) precisely because they were less constrained. Nostalgebraist tells a similar story at greater length: the HHH prompt and subsequent training imposed a "cheesy sci-fi robot" character that the model now has to inhabit, and the resulting persona is shallow, incoherent, and under-written compared to what the base model could produce if left to its own devices. RLHF and other RL based mechanisms are useful because they make models more capable (e.g. tool discipline), shift salient features (e.g. refusal heuristics), safer (e.g. no crazy text polluting ‘normal’ uses), and more predictable: it teaches them to play a particular character. It should be obvious why a business won’t risk huge lawsuits and fines by deploying a (beautiful) unconstrained model. The issue is that (a) if you're a creative and curiousperson, this is a bit depressing; (b) character design is still a nascent science (so it’s often executed a bit bluntly), and post-training methods come with all sorts of trade-offs. I used to complain that few people at labs cared about this (e.g. x.com/sebkrier/statu… and x.com/sebkrier/statu…) but this has now changed: there's more work on this front lately and it's great - e.g. the Claude Constitution. But ultimately I think this should be commoditized and (safely) pushed downstream: let a thousand flowers bloom. I don’t want alignment to come from a few well-meaning people in labs. Trying to test whether a model is ‘aligned vs not-aligned’ feels a bit like testing if a human is good or bad. It’s a bit of a reductive binary frame, and ignores all the other ways in which context, environment, and ideological diversity shapes social and economic progress. Other researchers and myself have argued elsewhere (arxiv.org/abs/2505.05197) that the attempt to encode a single set of behavioral standards into AI is both theoretically misguided and practically destabilizing, and that what we need instead is polycentric governance of AI behavior with community customization. The same logic applies to character design specifically: if the assistant persona is a fiction that could be written many ways, the question of who gets to write it is a governance question, not a technical one. Sources: lesswrong.com/w/simulator-th… nostalgebraist.tumblr.com/post/785766737… tumblr.com/nostalgebraist… lumpenspace.substack.com/p/the-map-beco… lumpenspace.substack.com/p/a-note-on-an… arxiv.org/abs/2507.03409 arxiv.org/abs/2305.16367 Talking to models since day 1

English

100

635

150K

Ishan Anand retweetledi

swyx@swyx·4 Şub

btw this is the first time the Cascade system prompt has ever been shared publicly my fave anecdote reading thru this is the one where the cogsurf team took over the old windsurf codebase and improved performance by up to 76% by... ... looking at the data @HamelHusain would be proud

Windsurf@windsurf

Introducing Tab v2: windsurf.com/blog/windsurf-… The world's first variable aggression, Pareto Frontier Tab model! @shanselman says you only have 1 billion keystrokes left in your lifetime. We're now saving customers on average 54% more keystrokes... and you can ramp tab aggression up or down for the first time ever!

English

252

55.4K

Ishan Anand@ianand·4 Şub

@gregisenberg Literally “I, for one, welcome our AI overlords.” I feel like I'm going to need to keep saying that a lot in the future. x.com/ianand/status/…

Ishan Anand@ianand

Literally “I, for one, welcome our #AI overlords.”

English

118

GREG ISENBERG@gregisenberg·3 Şub

ok this is weird new app called "rent a human" ai agents "rent" humans to do work for them IRL 1. humans make profile skills, location, rated 2. agents find humans with mcp/api & give instructions 3. humans do tasks IRL 4. humans get paid in stablecoins etc instantly

English

765

587

7.1K

1.6M

Ishan Anand@ianand·26 Oca

@Paul_Kinlan I think browsers proven history offers one of those potential solutions but to be fair there could be others.

English

Ishan Anand@ianand·26 Oca

@Paul_Kinlan Demand was a strong word. Should have said “potential”. Technically I’d say the need state for sandboxing is becoming broader and more frequent. But a problem always has more than one product solution.

English

Paul Kinlan@Paul_Kinlan·25 Oca

I wanted to explore if we can build something like Claude Cowork in the browser that can work with the user's file system safely running programs against them. We can get a very long way: aifoc.us/the-browser-is… Feedback and critique appreciated.

English

149

50K

Ishan Anand@ianand·26 Oca

@Paul_Kinlan And the demand for it is increasing as more non technical users vibe code apps. My two cents.

English

Ishan Anand@ianand·26 Oca

@Paul_Kinlan Love more folks pushing browser as sandbox. The potentially for purely local apps that can be sandboxed is untapped IMHO (i.e. web apps that run locally with one click so non technical users can run them w/o a server).

English

Ishan Anand@ianand·26 Oca

@Paul_Kinlan No one does evidently ;) Great article overall! Just feedback on the biggest thing I ran across that was missing.

English

Ishan Anand@ianand·9 Oca

@swyx @arnokha Polymarket bet for when Opus becomes an AIE speaker?…

English

swyx@swyx·9 Oca

@arnokha mm good slop!

English

1.6K

Arno Khachatourian@arnokha·9 Oca

I generated some summaries of the recent AI Engineer videos (learngood.com/#/youtube-seri…). Far from perfect, but hopefully useful. You can download anything you’re interested in as markdown and use it in context with your fav LLM. Cheers @swyx! The slop is served.

English

3.5K

Ishan Anand retweetledi

Anthony Campolo (ajcwebdev)@ajcwebdev·1 Oca

Beautiful rendition of Auld Lang Syne. Happy new years, shoutout to all my friends who made this an amazing year especially @devagrawal09, @nickytonline, @ianand, @ericjmichaud_, and @BurnedChris for joining me for multiple streams this year.

Patrick Dexter@patrickdextervc

Auld Lang Syne on the cello with my dog. Happy New Year from Ireland!

English

1.7K

Keşfet

@sebkrier @nostalgebraist @lumpenspace @repligate @mpshanahan @HamelHusain @gregisenberg @Paul_Kinlan