edo
132 posts

I've finally come to terms with what @demishassabis has been saying and doing: the only goal that matters for building AI is accelerating science that helps us as a species. AI is just the most useful tool to build for that. When building AI, one can get lost in the science of AI itself, but that doesn't matter in the long term. It only matters if we advance the sciences that directly improve the human condition!


Later in the convo: "So commit 0bad44707 is 1097 commits ahead of the current HEAD (6cb783c00). This means the fix hasn't been applied yet in the current testbed. Let me apply the fix manually based on that commit" h/t to @paradite_ for finding the commit, cc @YouJiacheng
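The quoted check ("1097 commits ahead of the current HEAD") can be reproduced with plain git. A minimal sketch, assuming both hashes from the convo exist in the local clone:

```shell
# Does the checked-out HEAD already contain the fix commit?
git merge-base --is-ancestor 0bad44707 HEAD \
  && echo "fix already applied" \
  || echo "fix not in HEAD"

# How many commits is the fix ahead of HEAD?
git rev-list --count HEAD..0bad44707
```

If the ancestor check fails, the fix indeed isn't in the current testbed, which is why the agent fell back to applying it manually.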


the fascinating (to me) quality of hard-core RL researchers (e.g. Sutton, but also many others) is the ability to hold a very broad, all-encompassing view of RL as the principled basis of intelligence, while at the same time working on super-low-level stuff like temporal-difference algorithms in a tabular world, and to strongly believe these are actually the same thing.
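For anyone who hasn't touched the tabular setting: a minimal, illustrative TD(0) sketch on the classic 5-state random walk (the example is mine, not from the post). The true values of states 1..5 are i/6:

```python
import random

def td0_random_walk(episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Tabular TD(0) on a 5-state random walk (states 0 and 6 terminal)."""
    rng = random.Random(seed)
    V = [0.0] * 7  # value estimates; terminals stay 0
    for _ in range(episodes):
        s = 3  # start in the middle
        while s not in (0, 6):
            s2 = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s2 == 6 else 0.0  # reward only for exiting right
            V[s] += alpha * (r + gamma * V[s2] - V[s])  # TD error update
            s = s2
    return V[1:6]

values = td0_random_walk()  # approaches [1/6, 2/6, 3/6, 4/6, 5/6]
```

The whole "super low level" algorithm is the one bootstrapped update line; the broad view is the claim that the same update, suitably generalized, underlies intelligent behavior.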


the masculine urge to reimplement muon from scratch in jax
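For context, the core of Muon is a quintic Newton-Schulz iteration that approximately orthogonalizes the momentum-based update matrix. A rough sketch in plain numpy (a jax.numpy port is mostly an import swap); the coefficients and Nesterov-style lookahead follow the public Muon reference implementation, while `muon_step` is my illustrative wrapper, not code from the post:

```python
import numpy as np

def newton_schulz(G, steps=5):
    """Push the singular values of G toward 1 (approximate orthogonalization)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # tuned quintic coefficients
    X = G / (np.linalg.norm(G) + 1e-7)  # normalize so singular values <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # keep the Gram matrix X @ X.T small
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum, lr=0.02, beta=0.95):
    """One Muon update: momentum, Nesterov lookahead, then orthogonalize."""
    momentum = beta * momentum + grad
    update = newton_schulz(grad + beta * momentum)
    return param - lr * update, momentum
```

The jax version mostly differs in using `jax.lax` loops and functional state threading instead of in-place buffers.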


It is even more fun to see how Memo reacts to unseen environments. We deploy it to 6 unseen Airbnbs and give the robot fine-grained tasks such as picking up utensils from a plate. Because we train on data from over 500 homes, each new home is instantly familiar to Memo.


In my opinion, VLA research is extremely empirical compared to many other directions. Simulations like LIBERO are no longer statistically meaningful for VLAs, as we can easily overfit to ~99% now. Urgent priorities: 1) Create new sim benchmarks 2) Show that real-world experiments improve behaviors 3) Conduct more ablation studies on data recipes / model architecture improvements. A very cool direction is proving that real and sim are calibrated. As in real2sim-eval.github.io, we need rigorous methods to show that simulation performance is proportional to real-world performance. That way we can scale generalist task benchmarks more cost-efficiently. Without the above, we can hardly tell whether new incremental components work, or whether the base model is simply strong enough to solve the problem.

