Jack Monas

127 posts

@JackMonas

AI + World Models @1x_tech Prev. @Princeton @GoogleQuantumAI, @MSFTResearch

Joined January 2014
385 Following · 1.3K Followers
Jack Monas @JackMonas
This is going to be fun....excited to scale this up with you @_sam_sinha_!
Samarth Sinha @_sam_sinha_

I am SO excited to be sharing that I am joining @BerntBornich and @1x_tech to lead the new 1X World Model Lab aimed at building the next frontier of embodied AI! The core guiding principle of the lab is: scale up along every damn axis!! 🚀

Robotics data is NOT a second-class citizen - it is too important a problem to be left to fine-tuning! Your model needs to see your most important tokens from step 0.

We need to think about robotics through the first principles of AI: how do we best utilize the vast amounts of web-scale media, and how do we create a data-flywheel to collect millions of hours of rich robot interactions?

There is no other moat in AI outside of data, and @1x_tech has done an INCREDIBLE job scaling manufacturing, production and hardware to build humanoid robots that can create a unique data-flywheel in unstructured environments. Scaling data collection for highly dexterous on-policy robot data will be the only way to create a moat in AI.

@JackMonas and team have made great progress in building World Models, and now the goal is to supercharge this effort by starting a hyper-focused scale and data-pilled lab.

Before scaling compute / data / models, we are currently RAPIDLY scaling our team and hiring across the 4 core pillars of AI: model + data, data infra, ML infra and evals. Looking for folks that are excited about the 0->1 problem and share the same principles as us. There’s a single application for everyone in the lab - if you’re good at engineering and ML, we will find a place for you in the team ❤️

AGI won’t be solved by fine-tuning… Let’s build the next frontier of AI together 🚀

My DMs are always open!!

2
2
46
4K
Jack Monas reposted
Eric Jang @ericjang11
For the last few months I've been working on a from-scratch implementation of AlphaGo, a 2016 AI breakthrough that inspired me to get into deep learning. My casual understanding of AlphaGo was "search-augmented deep neural networks trained with self-play", but I wanted to go deeper and understand it by creating it.

Frontier deep learning research has always been expensive, but any given capability gets cheaper very quickly. In 2026, you no longer need DeepMind's resources to train a strong Go AI - you can vibe code all of it yourself for just a few thousand dollars of rented compute.

It was a huge honor to be invited to teach this with @dwarkesh_sp on @dwarkeshpodcast. I am an AlphaGo & Go apprentice, not a master, so all factual errors in the podcast are mine.

Web version of tutorial: evjang.com/2026/04/28/aut…
Code: github.com/ericjang/autogo
Play the Go bot here: autogo.evjang.com
Dwarkesh Patel @dwarkesh_sp

New blackboard lecture w/ @ericjang11. He walks through how to build AlphaGo from scratch, but with modern AI tools.

Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.

Once he explained how AlphaGo works, it gave us the context to have a discussion about how RL works in LLMs and how it could work better – naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer, while AlphaGo’s MCTS suggests a strictly better action every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.

Eric also kickstarted an Autoresearch loop on his project. And it was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). Informative to all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.

Timestamps:
0:00:00 – Basics of Go
0:08:06 – Monte Carlo Tree Search
0:31:53 – What the neural network does
1:00:22 – Self-play
1:25:27 – Alternative RL approaches
1:45:36 – Why doesn’t MCTS work for LLMs
2:00:58 – Off-policy training
2:11:51 – RL is even more information inefficient than you thought
2:22:05 – Automated AI researchers

49
181
2.4K
533.6K
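The point made in the lecture summary above, that MCTS hands the network a better action at every move and so provides a per-move training target, can be sketched in a few lines. This is a minimal illustration and not code from the autogo repository: the function names, the example visit counts, and the `temperature` parameter are assumptions made for the sketch. It follows the AlphaGo Zero-style recipe of normalizing root visit counts into a policy target and scoring the network's move distribution against it with cross-entropy.

```python
import math


def visit_counts_to_target(visit_counts, temperature=1.0):
    """Turn MCTS root visit counts into a probability distribution over moves.

    Each count is raised to 1/temperature and the results are normalized,
    so moves that search visited more often get more probability mass.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    if total == 0:
        raise ValueError("at least one move must have been visited")
    return [s / total for s in scaled]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between the search target and the network's policy."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hypothetical example: three legal moves, search visited them 10/30/60 times.
visits = [10, 30, 60]
target = visit_counts_to_target(visits)

# A network that has learned nothing yet predicts a uniform policy.
uniform_policy = [1 / 3, 1 / 3, 1 / 3]
loss = cross_entropy(target, uniform_policy)
```

Every move in a self-play game yields one such target, which is why the supervision signal is dense compared with a single end-of-trajectory reward.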
Jack Monas reposted
1X @1x_tech
Building Your NEO
115
227
2.2K
663.8K
Jack Monas reposted
The Humanoid Hub @TheHumanoidHub
Jitendra Malik rants about parallel-jaw grippers being inadequate; he believes multi-fingered hands with tactile sensing are necessary for advanced dexterous manipulation. Malik is a Professor at UC Berkeley and a Distinguished Scientist at Amazon.
Jitendra MALIK @JitendraMalikCV

At the RI seminar at CMU yesterday, I presented a 3 level analysis of robot skills & discussed the pros and cons of teleoperation, simulation, and learning from videos, before presenting our research. Enjoy! youtube.com/watch?v=ry8iti…

6
15
135
23.3K
Jack Monas reposted
Nicholas Pfaff @NicholasEPfaff
Meet SceneSmith: An agentic system that generates entire simulation-ready environments from a single text prompt. VLM agents collaborate to build scenes with dozens of objects per room, articulated furniture, and full physics properties. We believe environment generation is no longer the bottleneck for scalable robot training and evaluation in simulation. Website: scenesmith.github.io 👇🧵(1/8)
18
80
565
74K
Jack Monas reposted
Joel Jang @jang_yoel
Introducing DreamZero 🤖🌎 from @nvidia
> A 14B “World Action Model” that achieves zero-shot generalization to unseen tasks & few-shot adaptation to new robots
> The key? Jointly predicting video & actions in the same diffusion forward pass
Project Page: dreamzero0.github.io
🧵 (1/10)
18
49
260
61.9K
Jack Monas reposted
RoboPapers @RoboPapers
Every home is different. That means that to build a useful home robot, we must be able to perform zero-shot generalization on a wide range of tasks. Humanoid company @1x_tech has a solution: world models.

1X Director of Evaluations @itsdanielho joins us on RoboPapers to talk about:
- why world models are the future for scaling robot learning
- how to use world models for robot control
- what world models unlock for evaluating robot model performance
- how we can hill-climb from here to general purpose robots

Watch Episode #61 of RoboPapers, with @micoolcho and @chris_j_paxton, now!
6
7
59
27.5K
Jack Monas reposted
Arash Vahdat ✈️ #CVPR2026 @ArashVahdat
🚀 Diffusion too slow? Fix it in a few steps.
📢 Introducing NVIDIA FastGen, a plug-and-play research library for turning slow diffusion models into high-quality few-step generators.

⚡ What’s inside:
• Consistency & MeanFlow (CM, sCM, TCM, MeanFlow)
• Distribution Matching (DMD, f-Distill, LADD)
• Long-video generation (CausVid, Self-Forcing)
• Fine-tuning & KD (SFT, CausalSFT, KD, Causal KD)

🧠 Includes:
📷 EDM, DiT, SD 1.5, SDXL, Flux
🎬 WAN (T2V / I2V / VACE), CogVideoX, Cosmos Predict2

✨ One unified interface. Research-ready. Apache-2.0.
🔗 Blog: nvda.ws/3LARhFy
💻 Code: github.com/NVlabs/FastGen
8
53
361
48.8K
Anurag Bagchi @Miccooper9
[1/6] Ego-centric World Models
We introduce EgoWM, a video world model that simulates EVE-1X humanoid interactions from a single ego-view image + full-body joint angle trajectories. Moreover, it effortlessly generalizes to extreme OOD domains, including paintings!
12
45
415
43K
Jack Monas reposted
Moo Jin Kim @moo_jin_kim
We release Cosmos Policy 💫: a state-of-the-art robot policy built on a video diffusion model backbone.
- policy + world model + value function, in 1 model
- no architectural changes to the base video model
- SOTA in LIBERO (98.5%), RoboCasa (67.1%), & ALOHA tasks (93.6%)
🧵👇
18
109
862
148.7K
Jack Monas @JackMonas
@ericjang11 Eric the Mage, thank you for everything. It has been a pleasure working with you.
0
0
7
622
Eric Jang @ericjang11
Life update: I've decided to leave 1X. It's been an honor helping grow the company.

I joined Halodi Robotics in 2022 (prior name of the company) as the only California-based employee. At the time, we were about 40 based out of Norway and 2 in Texas. My first hire and I worked from my garage for a few months to save money. Today, 1X is hundreds of people, with hardware, design, software, AI, manufacturing, product all relocated to the SF Bay Area, firing on all cylinders and working on getting NEO ready for the home. A big thank you to all my colleagues that I worked with.

It was a hard decision to leave. When working at an exciting startup that is growing fast, there's always so much to do and never a perfect time to move on. We have several works in the pipeline that are so exciting because they greatly advance general autonomy and scalability of our deployment approach and really show a realistic path towards the product working. The recent World Model autonomy update is one example, and there's more coming. The 1X factory is so exciting. Things are accelerating at a speed I would have been surprised by a few years ago.

In 2022, most technologists and researchers and VCs were skeptical about humanoids and large scale imitation learning. "Why Legs?" "How could end-to-end learning ever be good enough?" "Why go for the home and not the factory?" "How will we ever gather enough data?" The Overton window on general-purpose robotics has shifted a lot since then. Although we are still early in our mission, I remain confident that soon, house robots will be as commonplace as air conditioners, cars, and ChatGPT. Just talk to the bot, and it will go and quietly get it done. Entire economies will eventually re-organize around this technology. People get it now.

What's next? I believe that progress in applied deep learning generally rides on "harnessing the magic" of a few magical objects. These magical objects possess way more generalization power than one might normally expect. Just asking the LLM to understand what you want is magic. Video generation models are magic. Reasoning is magic. You don't run into a magic object every day, but when you do, you make sure to grab it and put it to work to make something useful in the robot somehow.

A lot of my early conviction for where robotics was headed came from working on BC-Z from 2018-2021. The "magical object" I bet on at the time was the surprising data-absorption capabilities of supervised learning and "just ask for generalization". This pioneered a lot of the standard ingredients we see in VLAs today:
- Generalization to unseen language commands
- Human-Guided DAgger for policy improvement
- Open-loop auxiliary predictions + receding horizon control, AKA action chunking
- Manipulation keypoints to improve servoing
- Simple ResNet18 with FiLM conditioning on multi-modal inputs

The next "magical object" we bet on at 1X was video models, because they are clearly magical objects that learn a data distribution not too dissimilar from what a robot needs to learn. They generalize surprisingly well.

I am once again feeling that there are more magical objects in play now, which opens up a lot of new possibilities for robotics and beyond. I'm taking a few months to empty my cup of priors and gain fresh perspective. When I left Google in 2022, I spent about 2 weeks deciding what to do next. This time, I want to take a lot more time to catch up on what has happened in the broader AI + robotics space. I've been re-implementing some deep learning papers. I'm working on a big tutorial for my blog. I'm learning all the Claude power user tricks. I'm reading the Thinking Machines blog posts to understand what kinds of experiments are being run at frontier labs. I'm reading Ben Katz's 2016 thesis on the Mini-cheetah actuator. I'm traveling to China in March to meet incredible companies in the Chinese robotics ecosystem.

Now, more than ever, is the time for both humans and machines to learn. The next token of my life sequence will be an important one.

To colleagues and investors that bet on 1X early, even before we became a household name - I thank you from the bottom of my heart. I won't forget it ♥️
154
43
1.7K
287K
Jack Monas @JackMonas
@robotryer @1x_tech Interesting questions. The user should be able to define behaviors. We have some work to do to figure out the “how” so stay tuned.
0
0
7
98
Robert Moore @robotryer
@JackMonas @1x_tech Who defines the preferred behaviors, and how? Will the owners be able to define NEO’s “personality” in that regard?
1
0
2
195
Jack Monas @JackMonas
One of many next steps at @1x_tech: preference learning for world-model-based policies. Given a generated starting frame, we can sample multiple video rollouts from our WM and use preference feedback to steer the model toward higher-quality behavior. This lets us fix policy failures in synthetic worlds—resolving bad NEO behaviors with generated dogs before we ever meet real ones.
8
23
198
19.6K
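The loop described in the post above (sample several rollouts from one starting frame, collect preference feedback, steer toward the preferred behavior) can be sketched as follows. The post does not say which objective 1X uses, so the Bradley-Terry pairwise loss here is an assumption chosen because it is a common way to learn from preferences. The rollout labels, their scores, and the function names are all hypothetical stand-ins; a real system would score generated video rather than look up numbers in a dictionary.

```python
import math


def bradley_terry_loss(score_preferred, score_rejected):
    """Negative log-likelihood that the preferred rollout beats the rejected one.

    Equals -log(sigmoid(score_preferred - score_rejected)), written in a
    numerically stable form for both signs of the margin.
    """
    margin = score_preferred - score_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def rank_rollouts(rollouts, score_fn):
    """Sort sampled rollouts from highest to lowest score."""
    return sorted(rollouts, key=score_fn, reverse=True)


# Hypothetical stand-ins: three rollouts sampled from the same starting frame,
# each with a made-up scalar score from some learned scoring model.
scores = {"rollout_a": 1.5, "rollout_b": -0.5, "rollout_c": 0.2}
ranked = rank_rollouts(list(scores), scores.get)

# Feedback says rollout_a is better than rollout_b. The loss is small when the
# scores already agree with that preference and large when they disagree.
loss_agree = bradley_terry_loss(scores["rollout_a"], scores["rollout_b"])
loss_disagree = bradley_terry_loss(scores["rollout_b"], scores["rollout_a"])
```

Minimizing this loss over many labeled pairs pushes scores for preferred rollouts above scores for rejected ones; how those scores then steer the world model or policy is left open here, as it is in the post.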
Jack Monas @JackMonas
@fatemi_michael Your work during your time at 1X helped lay the foundation for this release. Thank you Michael!
0
0
6
127
Jack Monas reposted
Peter Liu @peterliuposts
One of the coolest examples we found is NEO holding up a peace sign. The WM both understands what a peace sign is and is self-aware (no hands in starting frame), and the IDM extracts finger-level actions :)
1X @1x_tech

NEO’s Starting to Learn on Its Own

3
4
17
2.2K
Jack Monas reposted
Karan Dalal @karansdalal
LLM memory is considered one of the hardest problems in AI. All we have today are endless hacks and workarounds. But the root solution has always been right in front of us. Next-token prediction is already an effective compressor. We don’t need a radical new architecture. The missing piece is to continue training the model at test-time, using context as training data. Our full release of End-to-End Test-Time Training (TTT-E2E) with @NVIDIAAI, @AsteraInstitute, and @StanfordAILab is now available. Blog: nvda.ws/4syfyMN Arxiv: arxiv.org/abs/2512.23675 This has been over a year in the making with @arnuvtandon and an incredible team.
91
321
2.1K
574.4K