David Hall

215 posts

@halldm2000

Researcher: Artificial Intelligence, Extreme Weather, Climate, Physics

Joined August 2012
560 Following · 107 Followers
David Hall retweeted
Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QK-norm didn't have a scale multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the value embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale, of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
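The QK-norm oversight described above is easy to see numerically: with unit-normalized queries and keys, every attention logit is a cosine similarity bounded by 1, so the softmax stays near-uniform until a scale multiplier is attached. A minimal NumPy sketch, illustrative only — the function name, shapes, and fixed scale values are assumptions, not nanochat's actual code:

```python
import numpy as np

def qknorm_attn_weights(q, k, scale):
    # Parameterless QK-norm: unit-normalize q and k, so each logit is a
    # cosine similarity in [-1, 1].
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    k = k / np.linalg.norm(k, axis=-1, keepdims=True)
    logits = scale * (q @ k.T)
    # Numerically stable row-wise softmax over the keys.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 64))  # (seq, head_dim)
k = rng.standard_normal((16, 64))

diffuse = qknorm_attn_weights(q, k, scale=1.0)   # no multiplier: near-uniform
sharp = qknorm_attn_weights(q, k, scale=12.0)    # with a multiplier: peaked
```

In a real model the fixed `12.0` would be a learned per-layer scalar, letting training choose how sharp each attention head should be.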
966
2.1K
19.4K
3.5M
David Hall retweeted
Jim Fan @DrJimFan
Announcing DreamDojo: our open-source, interactive world model that takes robot motor controls and generates the future in pixels. No engine, no meshes, no hand-authored dynamics. It's Simulation 2.0. Time for robotics to take the bitter lesson pill.

Real-world robot learning is bottlenecked by time, wear, safety, and resets. If we want Physical AI to move at pretraining speed, we need a simulator that adapts to pretraining scale with as little human engineering as possible. Our key insights: (1) human egocentric videos are a scalable source of first-person physics; (2) latent actions make them "robot-readable" across different hardware; (3) real-time inference unlocks live teleop, policy eval, and test-time planning *inside* a dream.

We pre-train on 44K hours of human videos: cheap, abundant, and collected with zero robot-in-the-loop. Humans have already explored the combinatorics: we grasp, pour, fold, assemble, fail, retry - across cluttered scenes, shifting viewpoints, changing light, and hour-long task chains - at a scale no robot fleet could match.

The missing piece: these videos have no action labels. So we introduce latent actions: a unified representation inferred directly from videos that captures "what changed between world states" without knowing the underlying hardware. This lets us train on any first-person video as if it came with motor commands attached. As a result, DreamDojo generalizes zero-shot to objects and environments never seen in any robot training set, because humans saw them first.

Next, we post-train onto each robot to fit its specific hardware. Think of it as separating "how the world looks and behaves" from "how this particular robot actuates." The base model follows the general physical rules, then "snaps onto" the robot's unique mechanics. It's kind of like loading new character and scene assets into Unreal Engine, but done through gradient descent, and it generalizes far beyond the post-training dataset.

A world simulator is only useful if it runs fast enough to close the loop. We train a real-time version of DreamDojo that runs at 10 FPS, stable for over a minute of continuous rollout. This unlocks exciting possibilities:

- Live teleoperation *inside* a dream. Connect a VR controller, stream actions into DreamDojo, and teleop a virtual robot in real time. We demo this on a Unitree G1 with a PICO headset and one RTX 5090.
- Policy evaluation. You can benchmark a policy checkpoint in DreamDojo instead of the real world. The simulated success rates strongly correlate with real-world results - accurate enough to rank checkpoints without burning a single motor.
- Model-based planning. Sample multiple action proposals → simulate them all in parallel → pick the best future. Gains +17% real-world success out of the box on a fruit-packing task.

We open-source everything!! Weights, code, post-training dataset, eval set, and a whitepaper with tons of details to reproduce. DreamDojo is based on NVIDIA Cosmos, which is open-weight too. 2026 is the year of World Models for physical AI. We want you to build with us. Happy scaling! Links in thread:
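The "sample proposals → simulate → pick the best future" planning loop mentioned above can be sketched generically. This is a toy stand-in, not DreamDojo's API: `world_model` and `score` are hypothetical stubs that substitute trivial dynamics for rolling out dream frames:

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model(state, action):
    # Stand-in dynamics; a real system would roll out world-model frames here.
    return state + action

def score(state, goal):
    # Higher is better: negative distance to the goal state.
    return -np.linalg.norm(state - goal)

def plan(state, goal, n_proposals=64, horizon=5, action_dim=3):
    # Sample many candidate action sequences, imagine each one in the
    # model, and return the first action of the best-scoring future.
    proposals = rng.normal(size=(n_proposals, horizon, action_dim))
    scores = []
    for seq in proposals:
        s = state.copy()
        for a in seq:
            s = world_model(s, a)
        scores.append(score(s, goal))
    return proposals[int(np.argmax(scores))][0]

state = np.zeros(3)
goal = np.ones(3)
best_first_action = plan(state, goal)
```

In practice the proposals would be simulated in parallel batches, and only the first action executed before replanning (receding-horizon control).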
80
177
1.2K
201.2K
David Hall retweeted
China pulse 🇨🇳 @Eng_china5
Unitree Robotics robot shooting test. It feels like it was generated by AI. It’s terrifying… in the future, wars might not need humans anymore
1.8K
2.6K
11.1K
1.5M
David Hall retweeted
Simplifying AI @simplifyinAI
wild.. tencent researchers just killed fine-tuning. They built a "training-free" method that costs $18 and outperforms $10k reinforcement-learning setups. It's called "Training-Free GRPO" and it proves you don't need to update a single parameter to get reinforcement-learning performance. Instead of expensive gradient updates, the model learns from "semantic advantage": a natural-language memory of its own successes and failures.

- No gradients: the model stays frozen.
- Self-correction: it introspects its own rollouts to extract "what worked" into a text-based experience library.
- Massive efficiency: achieves fine-tuned performance with just 100 examples.
- Cost: ~$18 total (vs $10,000+ for traditional RL).
- It's effectively an agent that writes its own "strategy guide" in real time.
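The shape of that loop - a frozen model whose only mutable state is a text experience library - can be sketched in a few lines. This is a toy illustration: the "model", the lesson-extraction step, and the prompt format are all stand-ins, not Tencent's actual method:

```python
# Training-free loop: no gradients, no parameter updates. The model is a
# stub that "succeeds" only when a previously extracted hint is in the
# prompt -- a placeholder for a real frozen LLM call.

def frozen_model(prompt):
    return "correct" if "hint: show your work" in prompt else "wrong"

def extract_lesson(rollout, outcome):
    # "Semantic advantage" stand-in: turn a failed rollout into a text
    # lesson instead of a gradient update.
    return "hint: show your work" if outcome == "wrong" else None

experience = []  # the text-based experience library; the only state that changes
for step in range(3):
    prompt = "\n".join(experience) + "\nsolve: 2+2"
    outcome = frozen_model(prompt)
    lesson = extract_lesson(prompt, outcome)
    if lesson and lesson not in experience:
        experience.append(lesson)

# After the loop, the frozen model answers correctly because the library
# now carries the extracted lesson.
final = frozen_model("\n".join(experience) + "\nsolve: 2+2")
```

The point of the sketch: performance improves across iterations while the model's parameters never change; everything learned lives in `experience`.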
12
58
489
24.9K
David Hall retweeted
CyberRobo @CyberRobooo
Just saw something that actually feels like a real leap in robotics hardware.👋 @AllonicRobotics built a robot hand using 3D Tissue Braiding, basically weaving high-strength fibers around a minimal rigid skeleton the way human connective tissue wraps around bone. No hundreds of screws, bearings, cables, or fiddly joints. Instead, a continuous automated process creates the tendons, soft tissue & compliant structure all at once. The outcome is wild:

» strong yet naturally soft & safe for close human interaction
» surprisingly dexterous
» produced from digital design → physical part in minutes
» cost drops so much that you could eventually swap end-effectors like disposable gloves

This is starting to feel like the moment robotic bodies get their own "3D printing revolution": hardware iteration speed finally approaching software speed. If this scales, it could be one of the missing pieces that lets dexterous humanoid robots move from lab → factories → homes. (Oh, and the company just raised a $7.2M Pre-Seed, the largest ever in Hungary. Budapest-based with a US HQ. Led by Visionaries Club plus angels from OpenAI, Hugging Face, ETH Zurich, Northwestern, etc.) The prototype hand looks insane; the woven fiber texture is unreal.
175
936
5.8K
256.8K
David Hall retweeted
Phil Trubey @PTrubey
This morning at NeurIPS, Rich Sutton reminded us that we need continual learning to reach AGI. This afternoon, Ali Behrouz presented a Google poster paper, Nested Learning, which provides new ideas on the path to continual learning. I recorded the 40-minute talk, as it might be useful for some researchers in the audience. For the rest of us, I subscribe to Andrej Karpathy's suspicion that it will take 5-10 papers like this to move us to AGI from where we are now, just like it took about 10 papers to move from 2012's AlexNet to ChatGPT. At the very end, I ask Ali how far along the path to continual learning this represents. Full paper link below, as well as a YouTube link.

ps. Sorry about the first 2 minutes of bad audio: there were 2 idiots standing beside me having a conversation right in front of the presenter in a rather packed poster session. Honestly, tamp down your egos, guys, and show some common courtesy!
45
247
2.4K
234.8K
David Hall retweeted
Eric Wall @ercwl
i'm finding it increasingly difficult to participate in modern discourse, to the point that i don't really know what to do or how to talk to people.

over the past few years, i've seen the capabilities of ai continue to improve, and all the goalposts of what ai should/shouldn't be able to do constantly shift. it's completely obvious that we've discovered a way for machines to learn things. they're now at the capabilities in math and coding where they outcompete most humans on most tasks. it should be clear to you that the fundamental barriers that you thought would prevent ai from getting better haven't actually prevented ai from getting better. it was only a year ago that you said that their capabilities would flatline because we're running out of training data. that sentiment peaked around september last year, and for all the lamentations about data walls or efficient compute frontiers, new paradigms (in RL and others) *have worked*. machines continue to get better. we don't know where they'll stop. and the methods by which they think (cot, scratch pads, python tools), which have barely existed nanoseconds on the cosmic calendar, already *do* show forms of reasoning that allow them to reach impressive conclusions.

if we don't agree about this - that machines will continue to learn, that we'll continue to explore paradigms for them in which to think, and that it is at least quite likely that they'll continue to get better - i already don't know how to converse with you. if you still observe this from a "oh the statistical parrot machine spat out a token in a sequence that was likely, wow, such reasoning!" point of view, i already don't know how to converse with you.

and here's where things get more complicated. if you're with me this far, it should be reasonable to describe our current world as one where a new form of possibly superior intelligence is arriving and is steadily improving. now, where things get completely bizarre for me is where *you* think that your opinions of what this new form of intelligence can and cannot do matter or are relevant. and this bleeds into topics like quantum and everything else for me.

from my point of view, you are now a chimpanzee on planet earth, studying the arrival of "humans". you climb up every tree on earth and jump down from it, and you establish that gravity seems to work equally on all surfaces of the earth, and from this experiment you establish that regardless of how smart this new "human" is, it will not be possible for it to master space travel. because *you* understand gravity. if you don't approach the world from a point of view that you don't actually know anything about what an intelligence smarter than you is capable or not capable of doing, i already don't know how to converse with you.

and i'm not talking about a leap in intelligence that requires something impossible that has never happened in history before. i'm just talking about the difference in intelligence between chimpanzees and humans. which has happened before. and led to space travel. the only thing that was required for this superior form of intelligence to evolve on the planet (from molecules to *us*) was for a rock (earth) to get hit by an ice ball and then spin around the sun a bunch of times.

the ingredients that produce intelligence are not complex; they only require iterations and feedback from the universe. and it appears that this exact process is now happening in machines, which do not require the thousand-year process of evolution, because we have invented a way for rocks to think in silicon, and the substrate of intelligence (the matter on which it runs) has shifted to something far more malleable, where the iterations are much faster.

to me, all of these things are rather simple, easily observable phenomena. they're all ongoing and they're real. but in all conversations, basically no matter where i look, everyone is still stuck in a "but hoomans can't do space travel because muh gravity"-type of reasoning and incapable of embracing the very realistic prospect that all our models are quite likely to soon be broken.

i tire of the conversation, and find that i do not have much to add to what you are currently talking about, because it appears that we are so diametrically opposed in our understanding of what is happening in the world that we cannot begin to have a useful exchange on current topics.
264
132
1.4K
203.6K
David Hall retweeted
Min Choi @minchoi
Hailuo 02 video just broke the Internet yesterday. This is 100% AI. 7 wild examples + prompts: 1. Animal Olympics
35
88
592
133.1K
David Hall @halldm2000
I highly recommend this podcast. It does an awesome job explaining how AI works and is evolving and where it’s likely to go. Best description I’ve heard so far of the existing models. podcasts.apple.com/us/podcast/the…
0
0
0
41
David Hall retweeted
AI Notkilleveryoneism Memes ⏸️
o3 scores 136 IQ on Mensa Norway, qualifying for Mensa
AI Notkilleveryoneism Memes ⏸️ @AISafetyMemes

It's happening: OpenAI's new model jumped ***30 IQ points*** to 120 IQ. 120 IQ is higher than 9 in 10 humans.

"Pour one out for our 300,000-year reign as the smartest species on the planet. Was a great run." - @waitbutwhy

"Worried about AI taking over the world? You probably should be. That's my new takeaway after testing OpenAI's new model. I had become blasé about AI progress after my initial tests in February, because there was approximately zero IQ improvement since then. This week, that all changed."

Note: the author @maximlott administered another contamination-free test which showed a lower score (~100, average human) but a relatively similar leap forward in IQ. So regardless of which score you look at, the leap was HUGE, and the trend is obvious. There is not much time left.

55
166
1.5K
197.3K
David Hall retweeted
ₕₐₘₚₜₒₙ @hamptonism
This is genuinely a billion dollar industry.
177
545
5.8K
1.1M
David Hall retweeted
AI Notkilleveryoneism Memes ⏸️
How can people see chart after chart like this and not realize the tsunami that is imminent? AGI companies are about to unleash 100 billion superhuman shoggoth coders onto the internet.
AI Notkilleveryoneism Memes ⏸️ @AISafetyMemes

"Due to improved coding and research engineering performance, OpenAI o3-mini is the first model to reach Medium risk on Model Autonomy." This is the riskiest possible threat level where OpenAI has said they will still release a model.

108
112
981
272.2K
David Hall retweeted
Josie Kins @Josikinz
The first version of our exploratory LLM Self-Model analysis is complete and publicly available! Including all prompts, outputs, comics, resulting data, and a breakdown of our methodology. Link to the webpage in the thread below ⬇️
44
102
766
70.4K
David Hall retweeted
AI Digest @aidigest_
If the faster trend continues, agents might reach month-long tasks in 2027. However, looking at just one year's data gives a less robust estimate. The rate of progress might slow down.
2
3
20
2.4K
David Hall retweeted
Brian Roemmele @BrianRoemmele
Preview.
189
361
1.9K
191.3K
David Hall retweeted
vittorio @IterIntellectus
how do you even defend from an army of these chasing you?!
441
165
1.8K
313.5K