Peter

73 posts

@PeterSushko

Web agents @allen_ai Built MolmoWeb https://t.co/PjSLu7OOlz

Seattle · Joined June 2024
131 Following · 61 Followers
Peter reposted
Ai2 @allen_ai
Robotics models often struggle outside controlled environments. Ours is built to work in real ones. Today we're launching MolmoAct 2, which can assist with a host of chores & lab tasks, plus the MolmoAct 2-Bimanual YAM dataset—the largest open robotics dataset of its kind. 🧵
3
45
169
120.5K
Peter @PeterSushko
@DJiafei Congratulations Jiafei!
0
0
1
164
Jiafei Duan @DJiafei
4 years have been simply amazing! I’m happy to share that I have successfully defended my PhD! Thank you to everyone who came to support me, and most importantly, to my thesis committee, advisors, collaborators, friends, and family for being part of this journey.
41
12
293
17.2K
Peter @PeterSushko
LLM evals are hard. Agentic evals are very hard. Web browsing evals are crazy. The same webpage will show different content based on:
Time of year (seasonal promos)
Your IP (stores near me)
Your device (OS + browser combo)
Random A/B tests
This codebase solves evals and training.
Ai2@allen_ai

You can now train, adapt, and eval web agents on your own tasks. We're releasing the full MolmoWeb codebase—the training code, eval harness, annotation tooling, synthetic data pipeline, & client-side code for our demo. 🧵

1
7
17
4.4K
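One way to picture the problem in the post above is to record the factors it lists for every eval run and only compare runs where they all match. This is a minimal sketch under that assumption, not how the MolmoWeb codebase works; `BrowsingContext`, `context_key`, and `comparable` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class BrowsingContext:
    """Hypothetical record of the factors that change what a page serves."""
    date: str       # time of year (seasonal promos)
    region: str     # stand-in for IP-based location (stores near me)
    os: str         # device: operating system
    browser: str    # device: browser
    ab_bucket: str  # A/B variant the site assigned, if it can be observed


def context_key(ctx: BrowsingContext) -> str:
    """Return a short stable hash of the context fields."""
    payload = json.dumps(asdict(ctx), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def comparable(a: BrowsingContext, b: BrowsingContext) -> bool:
    """Treat two runs as comparable only if every recorded factor matches."""
    return context_key(a) == context_key(b)
```

Grouping results by `context_key` would at least make it visible when two scores differ because the page differed, not because the agent did.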
Peter reposted
Ai2 @allen_ai
Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores. 🧵
9
62
284
84.3K
Peter reposted
Tanmay Gupta @tanmay2099
72hrs after the release, looking at the community’s excitement around MolmoWeb, I have been reflecting on what leading this project throughout the past year was actually like.
It didn’t feel like winning. It felt like a constant uphill battle. Making the case that this is worth building. Building a team around the project from the ground up. Working through compute constraints and org-wide competing priorities. Showing early demos that didn’t quite land. And so on.
But reading people’s comments, it is clear that builders wanted an open web agent they could run locally. They wanted MolmoWeb.
For me, it is a powerful reminder that sometimes you must go against the grain. Sometimes you must work in silence until your results can speak for themselves. If you are wrong, you will learn. If you are right, you might just give the world what it needs.
3
3
28
3K
Peter reposted
Mehdi Quant @quantum_citoyen
@allen_ai Open weights stomping proprietary models on web agents? Fucking finally. 4B parameters pulling this off is wild.
2
0
2
587
Peter reposted
Ai2 @allen_ai
Today we're releasing MolmoWeb, an open source agent that can navigate + complete tasks in a browser on your behalf. Built on Molmo 2 in 4B & 8B sizes, it sets a new open-weight SOTA across four major web-agent benchmarks & even surpasses agents built on proprietary models. 🧵
21
114
809
129.8K
Twlvone @twlvone
@allen_ai 4B open-weight browser agent beating proprietary models on web benchmarks. the capability gap that justified closed access just got a lot smaller.
1
0
1
184
Peter @PeterSushko
@bnafOg @allen_ai And we will soon release a tool that will allow you to finetune MolmoWeb on specific types of tasks/websites. This way you can tailor the model to your needs!
0
0
1
16
Bnaf.OG | 🟧 @bnafOg
@allen_ai The 4B size is the key unlock — runs locally, no API keys, no data leaving your infra. Same screenshot-observation architecture as Claude Computer Use, just fully open-weight. Which of the 4 benchmarks showed the biggest gap vs. closed models?
1
0
2
272
Peter @PeterSushko
@Web3__Youth @allen_ai It’s great at tasks on a single website. Like looking up plane tickets, finding specific information, online shopping etc. We will soon release the eval code on GitHub, so you can test it in benchmarks!
0
0
0
20
MyAI @Web3__Youth
@allen_ai That's a huge leap forward in open source web agents. What kind of tasks can MolmoWeb handle out of the box, and how does it compare to existing automation tools like Selenium?
1
0
2
768
Peter @PeterSushko
@anitakirkovska @allen_ai Playwright is a tool to execute actions in a browser. MolmoWeb is a model that comes up with the right actions. Playwright is like the hand and MolmoWeb is like the brain.
0
0
0
17
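The hand/brain split in the reply above can be sketched as a small loop: something decides the next action, something else carries it out. This is only an illustration with stand-in classes; `Action`, `Brain`, `Hand`, `run_agent`, `ScriptedBrain`, and `LoggingHand` are hypothetical names, and no real MolmoWeb or Playwright calls are used.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Action:
    kind: str         # e.g. "click", "type", or "done"
    target: str = ""  # element the action applies to
    text: str = ""    # text to type, if any


class Brain(Protocol):
    """Decides what to do next (the role the post gives to the model)."""
    def next_action(self, task: str, observation: str) -> Action: ...


class Hand(Protocol):
    """Observes the page and executes actions (the role of a browser driver)."""
    def observe(self) -> str: ...
    def execute(self, action: Action) -> None: ...


def run_agent(brain: Brain, hand: Hand, task: str, max_steps: int = 10) -> list[Action]:
    """Alternate observe -> decide -> execute until "done" or the step limit."""
    history: list[Action] = []
    for _ in range(max_steps):
        action = brain.next_action(task, hand.observe())
        history.append(action)
        if action.kind == "done":
            break
        hand.execute(action)
    return history


class ScriptedBrain:
    """Stand-in brain that replays a fixed list of actions, then says "done"."""
    def __init__(self, actions: list[Action]) -> None:
        self._actions = iter(actions)

    def next_action(self, task: str, observation: str) -> Action:
        return next(self._actions, Action("done"))


class LoggingHand:
    """Stand-in hand that records actions instead of driving a browser."""
    def __init__(self) -> None:
        self.log: list[Action] = []

    def observe(self) -> str:
        return f"page after {len(self.log)} actions"

    def execute(self, action: Action) -> None:
        self.log.append(action)
```

In a real setup the `Hand` would wrap a browser automation tool and the `Brain` would wrap the model; the loop itself would not need to change.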
anita @anitakirkovska
@allen_ai How is this different from Playwright?
1
0
2
414
Peter reposted
DailyPapers @HuggingPapers
Ai2 just released MolmoWeb on Hugging Face.
A fully open multimodal web agent that autonomously controls browsers to complete tasks, achieving SOTA results and surpassing GPT-4o based agents on WebVoyager and Mind2Web.
1
12
56
7.4K
Peter reposted
Boyuan Zheng @boyuan__zheng
Still missing that sweet summer with the AI2 team ❤️CUA research is incredibly hard in academia — the lack of trajectories and RL environments is still a real bottleneck. (too profitable to open-source🥲) Excited to see MolmoWeb finally out and potentially unlock key directions for making CUA work: self-play, continual learning, RL in generative environments, and more. 2026 is going to be a big year for CUA. 🚀
Ai2@allen_ai

Today we're releasing MolmoWeb, an open source agent that can navigate + complete tasks in a browser on your behalf. Built on Molmo 2 in 4B & 8B sizes, it sets a new open-weight SOTA across four major web-agent benchmarks & even surpasses agents built on proprietary models. 🧵

0
4
39
4.5K
Peter @PeterSushko
Very proud & excited to share what I've been working on at Ai2. MolmoWeb is:
1. A pretty strong agent for browsing the web
2. A huge collection of artifacts: synthetically generated data, human annotations, model checkpoints, evaluation codebase (coming soon)
Check it out!
Ai2@allen_ai

Today we're releasing MolmoWeb, an open source agent that can navigate + complete tasks in a browser on your behalf. Built on Molmo 2 in 4B & 8B sizes, it sets a new open-weight SOTA across four major web-agent benchmarks & even surpasses agents built on proprietary models. 🧵

0
5
26
4.2K
Peter reposted
Ranjay Krishna @RanjayKrishna
We are releasing MolmoBot! We challenge the assumption that sim-to-real requires real-world finetuning. Our robot models beat strong baselines with no real world data. With enough diversity and scale in simulation, zero-shot transfer can actually work—across both static and mobile manipulation. Similar to all our projects, everything is open sourced.
Ai2@allen_ai

Today, a step forward in open robotics - our results show that sim-to-real zero shot transfer for manipulation is possible. MolmoBot is our open model suite for robotics, trained entirely in simulation on MolmoSpaces.🧵

0
5
60
6K
Peter reposted
Graham Neubig @gneubig
What I do if a paper I like doesn't have a GitHub repo.
-2025: email the authors for the code
2026-: ask OpenHands to reimplement the code
5
9
171
16.4K
Peter reposted
Ai2 @allen_ai
Molmo 2 (8B) is now available via @huggingface Inference Providers, courtesy of Public AI. State-of-the-art video understanding with pointing, counting, & multi-frame reasoning. Track objects through scenes and identify where + when events occur. 🧵
9
25
181
32.7K