Cezar Andrei

843 posts

@cezarandrei

Joined November 2009
166 Following 45 Followers
Cezar Andrei
Cezar Andrei@cezarandrei·
@daveremy 100%, very well said. I hope schools, from elementary to college, will switch to this new paradigm of encouraging curiosity and the exploration of ideas, and do away with cookie-cutter frameworks.
0
0
1
19
Cezar Andrei retweeted
kache
kache@yacineMTB·
you can outsource your thinking but you cannot outsource your understanding
257
3.8K
16.7K
2.4M
Massimo
Massimo@Rainmaker1973·
Scientists have created one of the most detailed 3D reconstructions of a human cell (eukaryotic cell) ever produced. This groundbreaking model, often termed a "Cellular Landscape Cross-Section Through a Eukaryotic Cell," combines data from X-ray tomography, nuclear magnetic resonance (NMR), and cryo-electron microscopy to map molecular structures in extreme detail.
853
4.7K
22.2K
2.2M
Cezar Andrei
Cezar Andrei@cezarandrei·
@karpathy No need for human researchers, when I thought they would be one of the last jobs standing. Now all that matters is how big your compute is!
0
0
2
305
Andrej Karpathy
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
966
2.1K
19.5K
3.6M
Freya Lawson
Freya Lawson@Freyabuilds·
🚨BREAKING: Microsoft just solved the "Agent Loop" problem. Agent Lightning is an open-source framework that lets agents learn from their own mistakes using Reinforcement Learning. Your agent fails a task → Agent Lightning analyzes why → Updates the prompt automatically → Next run succeeds. 100% Opensource.
44
123
763
63.8K
Cezar Andrei
Cezar Andrei@cezarandrei·
@r0ck3t23 "By end of year" seems way too early, but I bet at least signs of this will appear even sooner than that.
0
0
0
45
Dustin
Dustin@r0ck3t23·
Elon Musk thinks coding dies this year. Not evolves. Dies. By December, AI won’t need programming languages. It generates machine code directly. Binary optimized beyond anything human logic could produce. No translation. No compilation. Just pure execution. Musk: “You don’t even bother doing coding.” Code was never the point. It was friction. A tax we paid because machines didn’t speak human. AI just learned fluent human. The tax is gone. Now plug that into Neuralink. No syntax. No keyboard. No screen. Musk: “Imagination-to-software.” Thought becomes executable. You imagine an outcome, the system architects and compiles it into reality instantly. We’re not automating programming. We’re erasing it from existence. The entire profession collapses into a thought. Decades of training reduced to irrelevance. The gap between idea and instantiation hits zero. You don’t build anymore. You imagine, and it materializes. Not incremental progress. Total phase shift. The way humans have created things for ten thousand years just became obsolete. Welcome to a world where the limiting factor isn’t skill, resources, or time. It’s whether you can picture what you want clearly enough for a machine to birth it into existence.
1.9K
3K
15.7K
4.1M
Cezar Andrei retweeted
elvis
elvis@omarsar0·
NEW research from FAIR at Meta, Cornell, and CMU. This paper is a bigger deal than it seems. Apparently, you don't need billions of parameters to teach an AI model to reason. The default approach to post-training language models for reasoning today remains finetuning millions or even billions of parameters. But what if the signal needed for reasoning is far sparser than we assume? This new research introduces TinyLoRA, a method that scales low-rank adapters down to as few as a single trainable parameter. Using TinyLoRA with RL, they trained Qwen2.5-7B to 91% accuracy on GSM8K with only 13 parameters in bf16. That's 26 total bytes. So what's the idea? RL and SFT require fundamentally different amounts of model capacity. SFT must absorb the full demonstration, encoding both task-relevant structure and irrelevant noise into the update. RL receives a sparser, cleaner signal. The reward separates what matters from what doesn't, so resampling amplifies useful information while noise cancels out. Here are the results: On GSM8K, models trained with GRPO reach 90% accuracy with fewer than 100 parameters. Models of the same capacity trained with SFT barely outperform the base model. On harder benchmarks like MATH500, AIME, and AMC, finetuning just 196 parameters retains 87% of the absolute performance improvement averaged across six benchmarks. The trend scales with model size, too. Larger models need proportionally smaller updates, suggesting trillion-scale models may be trainable for many tasks with just a handful of parameters. The key takeaway is that reasoning may already live inside pretrained models. RL doesn't inject new knowledge; it surfaces what's already there, and it can do so with almost no parameter change at all. Paper: arxiv.org/abs/2602.04118 Learn to build effective AI agents in our academy: academy.dair.ai
21
94
583
50.8K
Cezar Andrei
Cezar Andrei@cezarandrei·
If this is true, the only limiting factor to progress is compute power.
Connor Davis@connordavis_ai

MIT just published a paper that quietly explains why LLM reasoning hits a wall and how to push past it. The usual story is that models fail on hard problems because they lack scale, data, or intelligence. This paper argues something much more structural: models stop improving because the learning signal disappears. Once a task becomes too difficult, success rates collapse toward zero, reinforcement learning has nothing to optimize, and reasoning stagnates. The failure isn’t cognitive, it’s pedagogical. The authors propose a simple but radical reframing. Instead of asking how to make models solve harder problems, they ask how models can generate problems that teach them. Their system, SOAR, splits a single pretrained model into two roles: a student that attempts extremely hard target tasks, and a teacher that generates new training problems. The catch is that the teacher is not rewarded for producing clever or realistic questions. It is rewarded only if the student’s performance improves on a fixed set of real evaluation problems. No improvement means zero reward. That incentive reshapes everything. The teacher learns to generate intermediate, stepping-stone problems that sit just inside the student’s current capability boundary. These problems are not simplified versions of the target task, and strikingly, they do not even require correct solutions. What matters is that their structure forces the student to practice the right kind of reasoning, allowing gradient signal to emerge even when direct supervision fails. The experimental results make the point painfully clear. On benchmarks where models start with zero success and standard reinforcement learning completely flatlines, SOAR breaks the deadlock and steadily improves performance. The model escapes the edge of learnability not by thinking harder, but by constructing a better learning environment for itself. The deeper implication is uncomfortable. 
Many supposed “reasoning limits” may not be limits of intelligence at all. They are artifacts of training setups that assume the world provides learnable problems for free. This paper suggests that if models can shape their own curriculum, reasoning plateaus become engineering problems, not fundamental barriers. No new architectures, no extra human data, no larger models. Just a shift in what we reward: learning progress instead of answers.

0
0
0
8
Cezar Andrei
Cezar Andrei@cezarandrei·
Very important, but a group of people with agency also has to collaborate and move in approximately the same direction, toward a bigger vision.
Andrej Karpathy@karpathy

Agency > Intelligence I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are we educating for agency? Are you acting as if you had 10X agency? Grok explanation is ~close: “Agency, as a personality trait, refers to an individual's capacity to take initiative, make decisions, and exert control over their actions and environment. It’s about being proactive rather than reactive—someone with high agency doesn’t just let life happen to them; they shape it. Think of it as a blend of self-efficacy, determination, and a sense of ownership over one’s path. People with strong agency tend to set goals and pursue them with confidence, even in the face of obstacles. They’re the type to say, “I’ll figure it out,” and then actually do it. On the flip side, someone low in agency might feel more like a passenger in their own life, waiting for external forces—like luck, other people, or circumstances—to dictate what happens next. It’s not quite the same as assertiveness or ambition, though it can overlap. Agency is quieter, more internal—it’s the belief that you *can* act, paired with the will to follow through. Psychologists often tie it to concepts like locus of control: high-agency folks lean toward an internal locus, feeling they steer their fate, while low-agency folks might lean external, seeing life as something that happens *to* them.”

0
0
0
9
Cezar Andrei
Cezar Andrei@cezarandrei·
@karpathy Vibe coding is all the way to number 5. Isn't it amazing?
0
0
0
44
Cezar Andrei retweeted
Wes Roth
Wes Roth@WesRoth·
A leak reveals Google is working on TorchTPU, a secret project to make PyTorch run natively on Google TPUs breaking NVIDIA’s legendary CUDA lock-in.
Ricardo@Ric_RTP

Google just launched a direct attack on Nvidia's most valuable asset. Not their chips. Their SOFTWARE. And if this works, Nvidia's $4 trillion empire collapses. Here's what just leaked: Google is building "TorchTPU" - a secret project that makes PyTorch seamlessly run on Google's TPU chips instead of Nvidia GPUs. Why does this matter? PyTorch is the MOST USED AI framework on Earth. Every AI developer uses it. And PyTorch was built around Nvidia's CUDA software. Wall Street analysts call CUDA "Nvidia's strongest defensive wall." It's the reason companies can't easily switch away from Nvidia even when alternatives exist. You don't just buy Nvidia chips. You buy into their entire ecosystem. Switching costs MILLIONS in engineering work. Months of rewrites. Performance drops. So companies stay locked in. Even when Nvidia raises prices. Even when supply runs short. That's not a hardware moat. That's a SOFTWARE prison. And Google just found the escape route. Here's the problem Nvidia created for itself: Google's TPU chips are actually GOOD. Competitive performance. Better availability. Lower cost. But developers won't use them because Google's chips run JAX (Google's internal framework), not PyTorch. That means if you want to use Google TPUs, you have to rewrite your entire codebase. Nobody wants to do that. So Google TPUs sit unused while developers fight over Nvidia chips. Until now. TorchTPU makes PyTorch run natively on Google hardware. No rewrites. No performance loss. No months of engineering. You just... switch. And Google is partnering with META (who built PyTorch) to make it happen. They're even considering OPEN-SOURCING parts of it to speed adoption. Translation: Google is willing to give this away for free just to break Nvidia's lock. The implications are insane: Every company currently paying Nvidia's premium prices suddenly has a way out. Oracle, Microsoft, OpenAI - all locked into Nvidia's ecosystem - can switch to Google. 
Nvidia's pricing power evaporates overnight. And the timing is perfect: Nvidia is already facing heat. Semiconductor index dropped 3% today. Oracle just lost their biggest investor over AI spending concerns. Companies are realizing AI infrastructure costs are unsustainable. Now Google hands them an alternative. Same performance. Lower cost. Better availability. Jensen Huang knows exactly what this means. CUDA has been Nvidia's untouchable advantage for YEARS. It's why Nvidia trades at 50x earnings while AMD trades at 25x. The software moat justified the premium. But if Google removes that switching cost? Nvidia becomes just another chip company. And chip companies compete on price, not ecosystem lock-in. Here's what happens next: Google needs 12-18 months to make TorchTPU production-ready. If it works, cloud providers will adopt it instantly. They WANT an alternative to Nvidia's monopoly pricing. Amazon already building their own Trainium chips. Microsoft making Maia. They're all trying to escape Nvidia. Google just gave them the software bridge. Nvidia's response options are limited: They can't buy Google. Can't kill PyTorch (Meta owns it). Can't stop open source. Their only play is to keep improving CUDA faster than Google can catch up. But that's a race, not a moat. The market isn't pricing this in yet. Nvidia down 2% today. Google down 2%. Investors think this is just "another competitor." They don't understand this is an attack on the FOUNDATION of Nvidia's valuation. Hardware is replaceable. Software lock-in is what made Nvidia worth $4 trillion. Google is attacking the lock-in. Watch what happens in 2026 when TorchTPU goes live and companies realize they can actually leave Nvidia. The "Nvidia is unstoppable" narrative dies. And a $4 trillion valuation built on software moats gets repriced.

114
552
8.3K
781.5K
Cezar Andrei
Cezar Andrei@cezarandrei·
"This is similar to how our brain manages short-term and long-term memory simultaneously. We might finally be closing the gap between AI and the human brain's ability to continually learn." Interesting...
Akshay 🚀@akshay_pachaar

Google just dropped "Attention is all you need (V2)" This paper could solve AI's biggest problem: Catastrophic forgetting. When AI models learn something new, they tend to forget what they previously learned. Humans don't work this way, and now Google Research has a solution. Nested Learning. This is a new machine learning paradigm that treats models as a system of interconnected optimization problems running at different speeds - just like how our brain processes information. Here's why this matters: LLMs don't learn from experiences; they remain limited to what they learned during training. They can't learn or improve over time without losing previous knowledge. Nested Learning changes this by viewing the model's architecture and training algorithm as the same thing - just different "levels" of optimization. The paper introduces Hope, a proof-of-concept architecture that demonstrates this approach: ↳ Hope outperforms modern recurrent models on language modeling tasks ↳ It handles long-context memory better than state-of-the-art models ↳ It achieves this through "continuum memory systems" that update at different frequencies This is similar to how our brain manages short-term and long-term memory simultaneously. We might finally be closing the gap between AI and the human brain's ability to continually learn. I've shared link to the paper in the next tweet!

0
0
0
7
Cezar Andrei retweeted
Brian Roemmele
Brian Roemmele@BrianRoemmele·
How small is a transistor on a modern processor?
254
2.4K
15.8K
1M
Cezar Andrei
Cezar Andrei@cezarandrei·
The model already proven to be the best is getting better!
Google@Google

Today @GoogleDeepMind and @GoogleResearch are introducing WeatherNext 2, our most advanced and efficient forecasting model. WeatherNext 2 can generate forecasts 8x faster and provide hundreds of possible weather outcomes for more accurate forecasts.

0
0
0
7
Cezar Andrei retweeted
Massimo
Massimo@Rainmaker1973·
The Rhind Mathematical Papyrus is the oldest manuscript written in algebra and trigonometry, dating back to 3,550 years ago. It shows that the Egyptians used first-order equations, geometric series and a second-order algebraic equation, related to the Pythagorean theorem a² + b² = c² It also describes how to obtain an approximation of π accurate to within less than 1% and one of the earliest attempts at squaring the circle.
101
262
1.3K
90.7K
Cezar Andrei retweeted
AshutoshShrivastava
AshutoshShrivastava@ai_for_success·
🚨 Google’s new approach makes AI learn, adapt, and remember more like a human brain. TLDR: Google Research has introduced a new ML paradigm called Nested Learning, designed to help models learn new tasks without forgetting old ones. - A proof-of-concept model called “Hope” was built using this approach - Hope shows better long-context memory and language modeling performance than standard transformers - The framework introduces “continuum memory systems,” where memory updates occur at different frequency rates - Experiments show Hope achieves lower perplexity and higher accuracy on reasoning and long-context tasks - Nested Learning aims to reduce catastrophic forgetting and bring AI closer to human-like continual learning
Google Research@GoogleResearch

Introducing Nested Learning: A new ML paradigm for continual learning that views models as nested optimization problems to enhance long context processing. Our proof-of-concept model, Hope, shows improved performance in language modeling. Learn more: goo.gle/47LJrzI @GoogleAI

27
94
835
100.1K
Cezar Andrei
Cezar Andrei@cezarandrei·
@cb_doge @grok Hey Grok, can you estimate how much of total LLM traffic is going through OpenRouter?
1
0
1
197
DogeDesigner
DogeDesigner@cb_doge·
BREAKING: xAI is officially leading the AI model market on OpenRouter. xAI has overtaken all competitors to dominate nearly one-third of total market share defeating Google, OpenAI and others.
246
173
1.2K
62.3K