Rowan Zellers

591 posts

@rown

multimodal @thinkymachines. I also like to climb rocks and throw pottery. https://t.co/5Er4j39K71 (he/him)

San Francisco, CA · Joined November 2008
998 Following · 14.7K Followers
Pinned Tweet
Rowan Zellers @rown ·
Excited to introduce GPT-4o. Language, vision, and sound -- all together and all in real time. This thing has been so much fun to work on. It's been even more fun to play with -- with moments of magic where things feel totally fluid and I forget I'm video chatting with an AI.
22
38
369
95.2K
Rowan Zellers @rown ·
I find it amusing that Claude/Codex will describe race conditions as "racy"
0
0
8
1.4K
Rowan Zellers retweeted
Xiangru (Edward) Jian @EdwardJian2 ·
🚀 Announcing CUA-Suite, a computer-use agent (CUA) training and evaluation ecosystem based on the largest open expert video corpus for desktop CUAs – VideoCUA. 55 hours of human demonstrations across 87 professional apps — 2.5× bigger than the previous largest dataset. 🌐 cua-suite.github.io
GIF
2
16
83
28K
Rowan Zellers @rown ·
Really cool work by the SI team! It's impressive to see a low level inverse dynamics model for computer use work here... both for computer tasks as well as a bonus self driving car demo
Standard Intelligence @si_pbc

Computer use models shouldn't learn from screenshots. We built a new foundation model that learns from video like humans do. FDM-1 can construct a gear in Blender, find software bugs, and even drive a real car through San Francisco using arrow keys.

0
2
64
9.6K
Rowan Zellers @rown ·
I realized an interesting connection between climbing and research recently. In climbing, sometimes you can send a boulder problem and then get 'stuck' with a certain beta, with a bunch of moves that seem critical but end up being arbitrary. Cleaning it up by doing boulder repeats is valuable but hard. You're fighting against your mind telling you certain moves are necessary. The same is true for frontier research. Any model training recipe probably has a bunch of pieces that are unnecessary, or perhaps even harmful to performance. Making progress requires constantly questioning old truths. Just like boulder repeats -- careful experimentation is hard but it can be fun. A healthy lab is one where this is encouraged 😀
2
3
103
10.9K
Rowan Zellers retweeted
Lilian Weng @lilianweng ·
I’ve been telling people this a lot today: I enjoy so much working with people who care about what they are building and craftsmanship. It is a privilege to have a chance to work on something I’m passionate about, beyond making a living. I cherish it and don’t take it for granted.
65
64
1.6K
174K
Rowan Zellers @rown ·
doesn’t the name ‘Shimpo VL-Whisper’ sound like a vision / language / audio model?
Rowan Zellers tweet media
3
0
23
3.4K
Rowan Zellers @rown ·
@BigMeanInternet fair but then isn’t the question ‘why aren’t capitalists investing one trillion dollars into iphone-of-ebikes like companies?’
1
0
0
97
Malcolm Harris @BigMeanInternet ·
@rown I bet if capitalists invested one trillion dollars he wouldn't have gone bankrupt.
1
0
0
99
Malcolm Harris @BigMeanInternet ·
Why build all these data centers rather than the iphone of electric bikes? How come no one wants to build the iphone of electric bikes?
15
12
218
13.5K
Rowan Zellers retweeted
Yue Zhao @__yuezhao__ ·
Discrete or continuous tokens? Or even tokenizer-free? The visual modeling debate rages on, but for now, let me introduce L24SQ, a provably optimal, regularizer-free quantizer with a large codebook (~200k), achieving SoTA reconstruction-compression tradeoff and generative power!
Yue Zhao tweet media
4
34
198
29K
Rowan Zellers @rown ·
@ziqiao_ma I might have missed this, do you have the ablation where you predict the next patch’s pixels via l2 loss instead of the embedding?
1
0
10
1.9K
Martin Ziqiao Ma @ziqiao_ma ·
NEPA: Next-Embedding Predictive Autoregression
A simple objective for visual SSL and generative pretraining. Instead of reconstructing pixels or predicting discrete tokens, we train an autoregressive model to predict the next embedding given all previous embeddings.
Key ideas:
- One self-supervised signal: cosine-style next-embedding prediction
- Autoregression runs directly on the embeddings from a native encoder (no offline encoder)
- No pixel decoder (and loss), no contrastive pairs, no task-specific heads, no random masks
Scales into modern ViT backbones and stays competitive after supervised fine-tuning:
- ImageNet-1K (Base 83.8%; Large 85.3%)
- ADE20K
Fully open-sourced with reproducibility verified:
- Homepage: sihanxu.me/nepa/
- Paper: arxiv.org/abs/2512.16922
- Code: github.com/SihanXU/nepa
- Weights: huggingface.co/collections/Si…
This work is led by @6SihanXu and advised by @SLED_AI, @sainingxie, and Stella X. Yu. Contributors: me, @wenhaocha1, @ChenXuweiyi, and @JinWeiyang18434.
Martin Ziqiao Ma tweet media
20
98
725
137.7K
Ben Goodger @bengoodger ·
@rown Before you do, would you be able to show the details of the Atlas processes in Activity Monitor? We are hunting some leaks and any info is helpful. Thank you!!
1
0
1
108
Rowan Zellers @rown ·
I was playing around with ChatGPT atlas browser (congrats @bengoodger and team). A lot of cool ML + UX codesign! I had it go to amazon and help me purchase things. Curiously... ChatGPT atlas adds some prompt injection... which the latest model recognizes, flags, and ignores (!)
Rowan Zellers tweet media
3
3
93
10.1K
Rowan Zellers @rown ·
@bengoodger FYI one more thing you might want to fix (I haven’t used it in the last few days… just kept the app open. Will uninstall now)
Rowan Zellers tweet media
1
0
5
191
Ben Goodger @bengoodger ·
@rown Thanks for sharing! We're working on a fix.
1
0
5
193
Rowan Zellers retweeted
Ai2 @allen_ai ·
Last year Molmo set SOTA on image benchmarks + pioneered image pointing. Millions of downloads later, Molmo 2 brings Molmo’s grounded multimodal capabilities to video 🎥—and leads many open models on challenging industry video benchmarks. 🧵
Ai2 tweet media (3 images)
7
64
320
126.2K
Rowan Zellers @rown ·
Also check out the cookbook example that @brandontrabucco put together for finetuning Qwen3-VL-235B on image classification github.com/thinking-machi… this is just scratching the surface, excited to see what the community can do with this!
0
2
31
3.5K
Rowan Zellers @rown ·
Today we are releasing Tinker to everyone, and now with vision input! You can now finetune a frontier Qwen3-VL-235B on your own image+text data, bringing your own algorithm (sft, RL, something else?). We'll take care of the GPU infra. Full update: thinkingmachines.ai/blog/tinker-ge…
35
112
1.2K
135.6K
Rowan Zellers retweeted