AI Explained (@AIExplainedYT) - Twitter profile

AI Explained@AIExplainedYT·23 Feb

I can't believe you phoned me personally to apologize! No worries at all! We agreed you did deliberately bury the "prediction" aspect deep in the post and used verbs like "blows" and "surpasses" and "word has it" to imply knowledge of what you had in fact completely made up. I understand why people do it, the attention economy rewards it; just flagging, for the 99% who only read the visible part of the post, that it's not true. *I predict you will phone me; sorry if I made it seem like you had, just the attention economy again.

Dan McAteer@daniel_mac8·23 Feb

@AIExplainedYT Yes. It’s called a prediction. Some of them have turned out to be correct.

Dan McAteer@daniel_mac8·22 Feb

GPT-5.3, codenamed "Garlic" 🧄, is released on Thursday, Feb. 26th. It surpasses the human baseline of 83.7% on SimpleBench. In fact, it blows every previous model out of the water on all non-coding benchmarks. Word has it it's a *HUGE* leap. A GPT-3 to GPT-4 moment again. OpenAI has long had the best RL/post-training pipeline, which makes sense since they were the first lab to train LLMs for inference-time reasoning using RL (o1). Now they've got their mojo back when it comes to pretraining too (Mark Chen, Chief Researcher, alluded to this on Ashlee Vance's podcast last year). Public comments from sama also point in the direction of major progress. This could be the big one. It may be deserving of a major version bump. That's my prediction.

AI Explained@AIExplainedYT·6 Feb

@PatBQc It's already back up, see pinned comment! Thank you so much for watching.

Patrick Bélanger@PatBQc·6 Feb

@AIExplainedYT did you have to take your Claude Opus 4.6 and GPT-5.3-Codex video private? I was watching and it now seems to be marked as private. Can you confirm? Thanks again, the first minutes were great. And as always, have a really great day.

AI Explained@AIExplainedYT·29 Jan

9. In 2023, you predicted that ‘AI systems may facilitate extraordinary insights in broad swaths of many science and engineering disciplines’ by ’24-25. Did you mean purely LLMs (in which case, are you disappointed?), or, if you meant systems like Google’s WeatherNext or AlphaEvolve, why have Anthropic never publicly posted about/worked on neuro-symbolic or non-LLM systems? 10/12

10. Do you acknowledge the conflict of interest you could be perceived to have, in that you are calling to stop China from getting Nvidia chips while at the same time it is those open-weight Chinese models, and scaffolds like Kimi Code, that could most threaten Anthropic’s revenue? 11/12

AI Explained@AIExplainedYT·29 Jan

8. Can you describe the tipping point when you decided to switch from training Claude to ‘avoid implying it had a personal identity’ in ’23-24 to ‘encourag[ing] Claude to think of itself as a particular type of person’ in ’25-’26? 9/12

AI Explained@AIExplainedYT·29 Jan

The Adolescence of Technology is a well-written 20,000-word new essay on what you should expect from the near future of AI. I read it in full + every footnote and link, and have these 10 questions (of a type not asked at Davos) for @DarioAmodei, the essay author and CEO of Anthropic, makers of Claude. 1/12

Dario Amodei@DarioAmodei

The Adolescence of Technology: an essay on the risks posed by powerful AI to national security, economies and democracy—and how we can defend against them: darioamodei.com/essay/the-adol…

AI Explained@AIExplainedYT·24 Nov

@johntheadman_ @karpathy Would love for you to set up one of those councils on lmcouncil.ai, then share the link so I could try it!

John Calvin Weaver@johntheadman_·23 Nov

I use a council for advice on different things, but I have created advisors out of famous people alive and dead, so I literally get economic advice from Milton Friedman and Adam Smith, and I get writing advice from William Shakespeare and Ezra Pound. I've actually developed this into a very competitive set of advisory boards for various professions.

Andrej Karpathy@karpathy·23 Nov

As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently: "openai/gpt-5.1", "google/gemini-3-pro-preview", "anthropic/claude-sonnet-4.5", "x-ai/grok-4"; then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response. It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses. Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain. That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored. I pushed the vibe coded app to github.com/karpathy/llm-c… if others would like to play. ty nano banana pro for fun header image for the repo

Andrej Karpathy@karpathy

I’m starting to get into a habit of reading everything (blogs, articles, book chapters,…) with LLMs. Usually pass 1 is manual, then pass 2 “explain/summarize”, pass 3 Q&A. I usually end up with a better/deeper understanding than if I moved on. Growing to among top use cases. On the flip side, if you’re a writer trying to explain/communicate something, we may increasingly see less of a mindset of “I’m writing this for another human” and more “I’m writing this for an LLM”. Because once an LLM “gets it”, it can then target, personalize and serve the idea to its user.
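
Karpathy's three stages (parallel answers, anonymized peer ranking, chairman synthesis) can be sketched offline in Python. This is a minimal illustration under stated assumptions, not the code in his llm-council repo: `Member`, `run_council`, `prefer_longest` and `pick_top` are hypothetical names, the model calls are toy functions rather than OpenRouter requests, and combining the rankings with a Borda count is an assumption, since the post does not say how rankings are aggregated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Member:
    """Hypothetical council member: one function answers, one ranks."""
    name: str
    answer: Callable[[str], str]
    # Gets the query plus anonymized answers; returns labels, best first.
    rank: Callable[[str, Dict[str, str]], List[str]]


Chairman = Callable[[str, Dict[str, str], Dict[str, int]], str]


def run_council(query: str, members: List[Member], chairman: Chairman) -> str:
    # Stage 1: every member answers the same query independently.
    raw = {m.name: m.answer(query) for m in members}

    # Stage 2a: hide authorship behind labels "Response A", "Response B", ...
    label_of = {m.name: f"Response {chr(ord('A') + i)}" for i, m in enumerate(members)}
    anonymized = {label_of[name]: text for name, text in raw.items()}

    # Stage 2b: every member ranks all anonymized answers, its own included.
    # Assumed aggregation: Borda count, so with n answers first place earns
    # n - 1 points and last place earns 0.
    n = len(anonymized)
    scores = {label: 0 for label in anonymized}
    for m in members:
        for position, label in enumerate(m.rank(query, anonymized)):
            scores[label] += n - 1 - position

    # Stage 3: the chairman sees the query, the answers and the peer scores.
    return chairman(query, anonymized, scores)


# --- Toy usage: stand-ins for real model calls ---
def prefer_longest(query: str, answers: Dict[str, str]) -> List[str]:
    # Toy reviewer that simply ranks longer answers higher.
    return sorted(answers, key=lambda label: len(answers[label]), reverse=True)


def pick_top(query: str, answers: Dict[str, str], scores: Dict[str, int]) -> str:
    # Toy chairman: returns the top-scored answer instead of writing a synthesis.
    best = max(scores, key=scores.get)
    return f"{best}: {answers[best]}"


members = [
    Member("model-x", lambda q: "Short.", prefer_longest),
    Member("model-y", lambda q: "A somewhat longer answer.", prefer_longest),
    Member("model-z", lambda q: "Medium answer.", prefer_longest),
]

result = run_council("What is an LLM council?", members, pick_top)
print(result)  # Response B: A somewhat longer answer.
```

In a real council, `pick_top` would be replaced by a call to the chairman model with the answers and scores in its context; the middle stage of `run_council` is where the "design space of the data flow" Karpathy mentions would be explored.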

AI Explained@AIExplainedYT·23 Nov

As luck would have it, I used @openrouter to go one step further and turn this idea into a full app, launched last month with 1.5k users! lmcouncil.ai. And another coincidence: I first created and benchmarked the approach behind Karpathy's chairman in mid-2023 (see my SmartGPT video, with his pinned comment), but I took the name of my app from Karpathy, from a video he made 9 months ago. Now we both have an AI council lol.

OpenRouter@OpenRouter·23 Nov

Fun new project by @karpathy: LLM Council, with a Chairman model to synthesize the result

Quoting Andrej Karpathy@karpathy (the llm-council post reproduced in full above)

AI Explained@AIExplainedYT·23 Nov

@karpathy Hey Andrej, what do you think of my version?: lmcouncil.ai More like 6 months than one weekend, as I expanded the concept into images, audio, polls, smartgpt leaders and more.

AI Explained@AIExplainedYT·20 Nov

Nano Banana Pro drew an admirably edgy Rake's Progress, 2025-edition.

AI Explained@AIExplainedYT·17 Nov

@Miles_Brundage I had a go at fixing this with lmcouncil.ai, just depends if there is enough of a niche who want those second opinions regularly.

Miles Brundage@Miles_Brundage·17 Nov

Great AI models do not currently think alike, which is good (but expensive) for consumers who want multiple opinions.

AI Explained@AIExplainedYT·13 Aug

If you use GPT-5 Pro for coding, you will swiftly realize that it will never agree to anything, even its own suggestions, without adding 'two quick tweaks'. It's a pathological perfectionist. *Still very useful, just strange in this particular way. **Gave Pro this tweet and it suggested this new version, with 'two tweaks': "If you use GPT‑5 Pro for coding, you’ll quickly realize it never accepts anything—even its own suggestions—without adding “two quick tweaks.” It’s a pathological perfectionist. Still useful, just strange in this particular way." ***Gave Pro that tweet, and it had 'two tiny notes': "When posting, drop the outermost quotation marks (they’re just framing here). Check you’re within the 280‑character limit (~229 chars, including line breaks)."

AI Explained@AIExplainedYT·28 May

2 quick updates, and a look-ahead, exactly a year on from first testing models on Simple-Bench:

1) Claude 4 busted our rate limits, and my entreaties to @AnthropicAI (to allow us to spend more money!) have yet to bear fruit. A shame, as I am fairly confident Opus 4 would be SOTA.

2) Gemini 2.5 Pro 05-06 and Flash 05-20 (the latest versions) are actually a slight downgrade in both performance and instruction-following, and the one full run we got out of 2.5 Pro scored 46% (below the previous version's 51%). We would prefer to get an AVG@5, for fairness, before posting on the leaderboard.

Thoughts: RL becoming 20% of the compute spend for frontier models may have more strange side effects than labs were anticipating. 'Over-eagerness' over simply following commands seems barely under control. On Simple, I had been fairly confident it would be saturated (>80-85%) by the end of the year. Now I think it is more like 50-50, and progress could instead slow for a while, as models become relentlessly optimised for dollar-maximising tasks, like software engineering, over general nous. Spatial intelligence, like spotting that the glove would fall onto the road in the question pasted at the bottom of this tweet, is simply not yet as lucrative.

As ever, grateful to @weights_biases and @Ag_Mlynarczyk in particular for keeping the show on the road.

Q. A luxury sports-car is traveling north at 30km/h over a roadbridge, 250m long, which runs over a river that is flowing at 5km/h eastward. The wind is blowing at 1km/h westward, slow enough not to bother the pedestrians snapping photos of the car from both sides of the roadbridge as the car passes. A glove was stored in the trunk of the car, but slips out of a hole and drops out when the car is half-way over the bridge. Assume the car continues in the same direction at the same speed, and the wind and river continue to move as stated. 1 hour later, the water-proof glove is (relative to the center of the bridge) approximately?
Models (super-trained on HS Math): 4km East
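
To make the failure mode concrete, here is a toy Python reconstruction of the two readings of the glove question. The function and variable names are illustrative, and treating the 1 km/h breeze as negligible for a glove lying on the road is an assumption; the code only contrasts the arithmetic behind the "4km East" answer with the tweet's point that the glove falls onto the road.

```python
# East-west motion only, east positive, speeds in km/h as stated in the question.
RIVER_KMH = 5.0   # river flows east at 5 km/h
WIND_KMH = -1.0   # wind blows west at 1 km/h
HOURS = 1.0


def pattern_matched_offset_km() -> float:
    # The "HS Math" reading: the glove rides the river, nudged by the wind.
    return (RIVER_KMH + WIND_KMH) * HOURS


def grounded_offset_km(lands_in_river: bool) -> float:
    # The glove slips out when the car is half-way over, i.e. at the bridge's
    # center. If it lands on the road deck, the river current never acts on
    # it, and the 1 km/h breeze is treated as negligible (an assumption).
    if lands_in_river:
        return pattern_matched_offset_km()
    return 0.0


print(f"pattern-matched: {pattern_matched_offset_km():.0f} km east")  # 4 km east
print(f"grounded: ~{grounded_offset_km(lands_in_river=False):.0f} km from center")
```

The vector sum is trivially correct; the error the question targets is in the modelling step before it, deciding whether the river is involved at all.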
