AI Explained (@AIExplainedYT) - Twitter Profili

AI Explained@AIExplainedYT·27 Nis

@the_allocator DMs work or aiexplained@outlook.com

English

0

67

TheAllocator@the_allocator·27 Nis

@AIExplainedYT Really enjoy your work. Would love to catch up on wider AI implications — is there a contact email or preferred route?

English

1

0

53

AI Explained@AIExplainedYT·7 Nis

Anthropic: "We do not plan to make Claude Mythos Preview generally available" A big line, buried quite deep. Possible reasons? So many, inc: 1) The model is expensive (25/125), not far off GPT 4.5, which became commercially unviable. Less likely, given the claims about Mythos. 2) They genuinely are worried about unleashing cybersecurity choas on the world. 3) They don't have capacity to serve it at scale yet. 4) They will quickly distil the early access outputs of Mythos into a lighter model, so no need to release the bigger model when a more cost efficient one coming imminently. 5) Other. Not read the 250 page report yet, but will do.

English

74

21

807

94K

AI Explained@AIExplainedYT·23 Şub

I can't believe you phoned me personally to apologize! No worries at all! We agreed you did deliberately bury the ''prediction" aspect deep into the post and used verbs like "blows" and "surpasses" and "word has it" to imply knowledge of what you had in fact completely made up. I understand why people do it, the attention economy rewards it, just flagging for the 99% who only read the visible part of the post that it's not true. *I predict you will phone me; sorry, if I made it seem like you had, just the attention economy again.

English

4

0

98

1.7K

Dan McAteer@daniel_mac8·23 Şub

@AIExplainedYT Yes. It’s called a prediction. Some of them have turned out to be correct.

English

7

0

3

6.2K

Dan McAteer@daniel_mac8·22 Şub

GPT-5.3, codenamed "Garlic" 🧄 is released on Thursday, Feb. 26th. It surpasses human baseline on SimpleBench of 83.7%. In fact, it blows every previous model out of the water on all non-coding benchmarks. Word has it is a *HUGE* leap. A GPT-3 to GPT-4 moment again. OpenAI has long had the best RL/post-training pipeline, which makes sense since they were the first lab to train LLMs for inference time reasoning using RL (o1). Now they've got their mojo back when it comes to pretraining too (Mark Chen, Chief Researcher, alluded to this on Ashlee Vance's podcast last year). Public comments from sama also point in the direction of major progress. This could be the big one. It may be deserving of a major version bump. That's my prediction.

English

219

149

1.7K

472.3K

AI Explained@AIExplainedYT·6 Şub

@PatBQc It's already back up, see pinned comment! Thank you so much for watching.

English

0

2

264

Patrick Bélanger@PatBQc·6 Şub

@AIExplainedYT did you have to take your Claude Opus 4.6 and GPT-5.3-Codex private ? I was watching and it now seems to be marked as private. Can you confirm? Thanks again, the first minutes were great. And as always, have a really great day.

English

2

0

2

310

AI Explained@AIExplainedYT·29 Oca

9. In 2023, you predicted that ‘AI systems may facilitate extraordinary insights in broad swaths of many science and engineering disciplines’ by ‘24-25 but did you mean purely LLMs, (in which case are you disappointed?) or if you meant systems like Google’s WeatherNext or AlphaEvolve, why have Anthropic never publicly posted about/worked on neuro-symbolic or non-LLM systems? 10/12 10. Do you acknowledge the conflict of interest you could be perceived to have, in that you are calling to stop China getting Nvidia chips while at the same time it is those open-weight Chinese models, and scaffolds like Kimi Code, that could most threaten Anthropic’s revenue? 11/12

English

1

0

24

3.7K

AI Explained@AIExplainedYT·29 Oca

8. Can you describe the tipping point when you decided to switch from training Claude to ‘avoid implying it had a personal identity’ in ‘23-24 to ‘encourag[ing] Claude to think of itself as a particular type of person’ in ‘25-’26? 9/12

English

1

0

27

4.1K

AI Explained@AIExplainedYT·29 Oca

The Adolescence of Technology is a well-written 20,000-word new essay on what you should expect from the near future of AI. I read it in full + every footnote and link, and have these 10 questions (of a type not asked at Davos) for @DarioAmodei, the essay author and CEO of Anthropic, makers of Claude. 1/12

Dario Amodei@DarioAmodei

The Adolescence of Technology: an essay on the risks posed by powerful AI to national security, economies and democracy—and how we can defend against them: darioamodei.com/essay/the-adol…

English

4

7

84

9.2K

AI Explained@AIExplainedYT·24 Kas

@johntheadman_ @karpathy Would love for you to set up opne of those councils on lmcouncil.ai, then share the link so I could try it!

English

1

3

371

John Calvin Weaver@johntheadman_·23 Kas

I use a council for advice on different things, but I have created advisors out of famous people alive and dead so I literally get economic advice from Milton Friedman, and Adam Smith and I get writing advice from William Shakespeare and Ezra Pound. I've actually developed this into a very competitive set of advisory boards for various professions.

English

6

1

29

13.7K

Andrej Karpathy@karpathy·23 Kas

As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently: "openai/gpt-5.1", "google/gemini-3-pro-preview", "anthropic/claude-sonnet-4.5", "x-ai/grok-4", Then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response. It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses. Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain. That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored. I pushed the vibe coded app to github.com/karpathy/llm-c… if others would like to play. ty nano banana pro for fun header image for the repo

Andrej Karpathy@karpathy

I’m starting to get into a habit of reading everything (blogs, articles, book chapters,…) with LLMs. Usually pass 1 is manual, then pass 2 “explain/summarize”, pass 3 Q&A. I usually end up with a better/deeper understanding than if I moved on. Growing to among top use cases. On the flip side, if you’re a writer trying to explain/communicate something, we may increasingly see less of a mindset of “I’m writing this for another human” and more “I’m writing this for an LLM”. Because once an LLM “gets it”, it can then target, personalize and serve the idea to its user.

English

908

1.5K

17K

5.3M

AI Explained@AIExplainedYT·23 Kas

As luck would have it, I used @openrouter to go one step further and turn this idea in a full app, launched last month with 1.5k users! lmcouncil.ai: and another coincidence, I first created and benchmarked the approach behind Karpathy's chairman, see my SmartGPT video with his pinned comment, in mid-2023, but I took the name of my app from Karpathy, from a video he made 9 months ago. Now we both have an AI council lol.

English

0

10

960

OpenRouter@OpenRouter·23 Kas

Fun new project by @karpathy: LLM Council, with a Chairman model to synthesize the result

Andrej Karpathy@karpathy

As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently: "openai/gpt-5.1", "google/gemini-3-pro-preview", "anthropic/claude-sonnet-4.5", "x-ai/grok-4", Then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response. It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses. Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain. That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored. I pushed the vibe coded app to github.com/karpathy/llm-c… if others would like to play. ty nano banana pro for fun header image for the repo

English

18

29

524

103.1K

AI Explained@AIExplainedYT·23 Kas

@karpathy Hey Andrej, what do you think of my version?: lmcouncil.ai More like 6 months than one weekend, as I expanded the concept into images, audio, polls, smartgpt leaders and more.

English

7

6

150

17.6K

AI Explained@AIExplainedYT·20 Kas

Nano Banana Pro drew an admirably edgy Rake's Progress, 2025-edition.

English

11

4

74

7K

AI Explained

Keşfet