Jim Bohnslav
@jbohnslav

7.1K posts

training VLMs @zoox

Boston, MA · Joined February 2011
4.3K Following · 2.2K Followers
Jim Bohnslav
Jim Bohnslav@jbohnslav·
@vikhyatk open chatgpt. "create an image that looks like pen and paper..."
0
0
1
419
vik
vik@vikhyatk·
ML interview question: Here are the weights for Llama 3.1 70B. Generate a token by executing the forward pass manually using pen and paper. You have 30 minutes.
55
24
1.4K
115.6K
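For scale: even a toy version of that forward pass involves embedding lookups, attention, an MLP, and an unembedding before you get a single token. Below is a miniature numpy sketch of those steps under made-up assumptions (tiny dimensions, one decoder block, ReLU in place of a gated MLP); it is nothing like Llama 3.1 70B's actual configuration.

```python
# Toy single-token "forward pass by hand": one decoder block at miniature scale.
# Dimensions, weights, and the ReLU MLP are illustrative stand-ins, not Llama 3.1 70B.
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 16, 8                                  # tiny vocabulary and hidden size
E = rng.normal(size=(vocab, d))                   # token embeddings (also used to unembed)
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(4))
W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))

def rmsnorm(x):
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + 1e-6)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

tokens = np.array([3, 7, 1])                      # prompt token ids
x = E[tokens]                                     # (seq, d)

# self-attention with a causal mask, plus residual
h = rmsnorm(x)
q, k, v = h @ Wq, h @ Wk, h @ Wv
scores = q @ k.T / np.sqrt(d)
scores += np.triu(np.full((len(tokens), len(tokens)), -np.inf), k=1)
x = x + softmax(scores) @ v @ Wo

# MLP with residual (ReLU stand-in for the gated MLP)
x = x + np.maximum(rmsnorm(x) @ W1, 0.0) @ W2

logits = rmsnorm(x[-1:]) @ E.T                    # unembed the last position
next_token = int(np.argmax(logits))               # greedy choice of the next token
print(next_token)
```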
Jim Bohnslav retweeted
Shenzhi Wang🌟
Shenzhi Wang🌟@ShenzhiWang_THU·
When training Qwen3.5, we kept asking ourselves: 🧐 What kind of multimodal RLVR data actually leads to generalizable gains? 💡 We believe the answer may not lie only in data tightly tailored to specific benchmarks, but also in OOD proxy tasks that train the foundational abilities behind long-chain visual reasoning.

The motivation is simple: VLMs are still unreliable in long-CoT settings. Small mistakes in perception, reasoning, knowledge use, or grounding can compound across intermediate steps and eventually lead to much larger final errors. However, much of today's RLVR data still does not require complex reasoning chains grounded in visual evidence throughout, meaning these failure modes are often not sufficiently stressed during training.

🚀 Excited to share our new work from Qwen and Tsinghua LeapLab:
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
This is also one of the training task sources used in Qwen3.5 VL RLVR.

To study this question, we propose HopChain, a scalable framework for synthesizing multi-hop vision-language reasoning data for RLVR training. The key idea is to build each query as a chain of logically dependent hops: earlier hops establish the instances, sets, or conditions needed for later hops, while the model must repeatedly return to the image for fresh visual grounding along the way. At the same time, each query ends with a specific, unambiguous numerical answer, making it naturally suitable for verifiable rewards.

Concretely, HopChain combines two complementary structures: perception-level hops and instance-chain hops. We require each synthesized example to involve both, so the model cannot simply continue reasoning from language inertia. Instead, it is forced to keep grounding intermediate steps in the image, maintain cross-step dependencies, and control error accumulation across long reasoning trajectories. Our goal is not to mimic any specific downstream benchmark, but to strengthen the more fundamental abilities that long-CoT vision-language reasoning depends on.

We add HopChain-synthesized data into RLVR training for Qwen3.5-35B-A3B and Qwen3.5-397B-A17B, and evaluate on 24 benchmarks spanning diverse domains. Despite not being designed for any particular benchmark, HopChain improves 20 out of 24 benchmarks on both models, indicating broad and generalizable gains. We also find that full chained multi-hop queries are crucial: replacing them with half-multi-hop or single-hop variants reduces performance substantially. Most notably, the gains are especially strong on long-CoT and ultra-long-CoT vision-language reasoning, peaking at more than 50 accuracy points in the ultra-long-CoT regime.

Our main takeaway is simple: beyond benchmark-aligned data, OOD proxy tasks that systematically train the core mechanics of long-chain visual reasoning can be a powerful and scalable source of RLVR supervision for VLMs, and can lead to more generalizable improvements.

🔗 huggingface.co/papers/2603.17…
[3 images]
2
55
434
57.6K
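The thread doesn't spell out the data schema, but the recipe it describes (a chain of dependent hops over one image, ending in a single unambiguous number checked by a verifiable reward) suggests something like the sketch below. All field names, the example query, and the reward rule are hypothetical illustrations, not HopChain's actual format.

```python
# Hypothetical sketch of a HopChain-style training record and a verifiable
# numeric reward. Field names and the example are illustrative only.
from dataclasses import dataclass
import re

@dataclass
class MultiHopExample:
    image_path: str        # the image every hop must re-ground against
    hops: list[str]        # logically dependent sub-questions (perception + instance-chain)
    question: str          # the final composed query shown to the model
    answer: float          # unambiguous numeric target, enabling a verifiable reward

def numeric_reward(model_output: str, target: float, tol: float = 1e-3) -> float:
    """Binary verifiable reward: 1.0 if the last number in the output matches the target."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if abs(float(numbers[-1]) - target) <= tol else 0.0

example = MultiHopExample(
    image_path="scene.jpg",
    hops=[
        "Which shelf holds the red boxes?",          # perception-level hop
        "Count the red boxes on that shelf.",        # instance-chain hop
        "Multiply that count by the price on the tag shown.",
    ],
    question="What is the total price of the red boxes on the correct shelf?",
    answer=18.0,
)
print(numeric_reward("... so the total is 18.0", example.answer))  # 1.0
```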
Soumith Chintala
Soumith Chintala@soumithchintala·
@dphuang2 not AI, the book was actually made by Hatty Wang at Harvard. here's a few pages....
[4 images]
4
3
98
2.6K
Soumith Chintala
Soumith Chintala@soumithchintala·
someone's getting started early!
[image]
73
169
3.6K
105.4K
Lotto
Lotto@LottoLabs·
I like my models small, chinese, dense and not thinking.
13
12
214
16.3K
json
json@JsonBasedman·
Codex writes good code but my God GPT-5.4 is a chore to talk to
57
6
552
62.5K
Jim Bohnslav
Jim Bohnslav@jbohnslav·
@paradite_ it feels different the past few days. can't even use a CLI that smart opus created last week
0
0
5
1.6K
Zhu Liang
Zhu Liang@paradite_·
Opus 4.6 is literally broken right now. Examples:
- Ask it to re-run pass@3, it proceeds to run pass@1.
- Ask it to check recent commits, it misses the most recent commit.
- Ask it why it did something, it apologizes and executes commands instead of just giving an answer.
69
11
517
79.6K
Jim Bohnslav
Jim Bohnslav@jbohnslav·
I wonder what it feels like for GPT5.3 to read something GPT5.4 wrote. So similar, and yet smarter. Like meeting your higher-achieving long-lost twin.
1
0
1
267
Tenobrus
Tenobrus@tenobrus·
unfortunately ghostty remains completely useless to me until it supports tmux -CC. somehow, across literally all platforms, iterm 2 remains literally the only emulator to build complete support for this, so i'm absolutely locked in
Mitchell Hashimoto@mitchellh

Ghostty 1.3 is now out! Scrollback search, native scrollbars, click-to-move cursor, rich clipboard copy, AppleScript, split drag/drop, Unicode 17 and international text improvements, massive performance improvements, and hundreds more changes. ghostty.org/docs/install/r…

22
1
205
74.7K
Jim Bohnslav retweeted
CLS
CLS@ChengleiSi·
Great to see autoresearch blowing up because of the legendary Karpathy sensei. This year will of course be an exciting year for automated AI research. For all of you excited to jump onto it, hopefully our papers will be some helpful references:
- automated feedback loop for research agents to optimize LLM pre-training and post-training stacks: x.com/ChengleiSi/sta…
- generating novel research ideas with LLMs, along with a comparison against human experts: x.com/ChengleiSi/sta…
- evaluating the effectiveness of LLM-generated ideas through experiment execution: x.com/ChengleiSi/sta…
- finetuning LLMs to directly predict the effectiveness of research ideas: x.com/jiaxinwen22/st…
Andrej Karpathy@karpathy

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)

9
27
342
49.8K
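Stripped to its skeleton, the loop Karpathy describes is: propose an edit to the training script, run a fixed-length training job, and keep the edit (as a git commit) only if validation loss improved. A rough sketch of that outer loop follows; the file names, the agent-edit stub, and the val_loss log format are placeholders I'm assuming, not what the actual repo uses.

```python
# Sketch of an agent-driven hill-climbing loop over a training script.
# Assumptions (not the real repo): train.py prints "val_loss=<float>",
# the agent edit is a stub, and each run finishes within the timeout.
import re
import subprocess

def run_training(script: str = "train.py") -> float:
    """Launch one bounded training run and parse the final validation loss."""
    proc = subprocess.run(["python", script], capture_output=True, text=True,
                          timeout=6 * 60)
    return float(re.findall(r"val_loss=([\d.]+)", proc.stdout)[-1])

def propose_edit(prompt_path: str, script: str) -> None:
    """Placeholder: an LLM agent rewrites `script`, guided by the prompt file."""
    ...

best = run_training()
for step in range(100):
    propose_edit("prompt.md", "train.py")          # agent iterates on the .py
    loss = run_training()                          # one ~5-minute run per dot
    if loss < best:                                # keep only improvements...
        best = loss
        subprocess.run(["git", "commit", "-am", f"step {step}: val_loss {loss:.4f}"])
    else:                                          # ...and revert the rest
        subprocess.run(["git", "checkout", "--", "train.py"])
```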
Jim Bohnslav
Jim Bohnslav@jbohnslav·
@ericliuof97 home robotics is in its exuberant era, like self driving in 2015. soon people will realize that Figure falling down your stairs will crush your toddler to death. then it'll take 10 more years to get to the safety level of eg Waymo (and Zoox!)
0
0
1
33
Crystal
Crystal@crystalsssup·
what?! @sainingxie is joining Yann LeCun's new lab, AMI Labs, as a cofounder and CSO
[image]
20
15
491
40.1K
Brian Li
Brian Li@Brian_Bo_Li·
Working with GOATs, Saining, LeCun, and more names that I've long dreamed of.
Saining Xie@sainingxie

i’m joining forces with @ylecun and an incredible group of people to start AMI Labs @amilabs. AMI isn’t a conventional lab. we don’t intend to become one. a lot to say about why this moment matters, but for now we’re heads down building. join us: amilabs.xyz

3
1
128
9.7K
Jim Bohnslav
Jim Bohnslav@jbohnslav·
@vikhyatk p(state_{t+1} | state_t, action). The action conditioning makes it different from e.g. generic text-to-video models.
0
0
1
146
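A minimal sketch of that factorization, p(state_{t+1} | state_t, action), as a model: the next-state distribution is conditioned on the action as well as the current state, which is exactly what a generic text-to-video model lacks. Shapes and architecture below are illustrative only, not any particular system.

```python
# Minimal action-conditioned world model p(s_{t+1} | s_t, a_t).
# Dimensions and architecture are illustrative, not any specific system's.
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    def __init__(self, state_dim: int = 32, action_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2 * state_dim),  # mean and log-variance of the next state
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor):
        # Condition on BOTH state and action; dropping the action input would
        # collapse this to an unconditional next-state predictor.
        out = self.net(torch.cat([state, action], dim=-1))
        mean, log_var = out.chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_var.exp().sqrt())

model = TinyWorldModel()
s_t = torch.randn(1, 32)
a_t = torch.randn(1, 4)
dist = model(s_t, a_t)
s_next = dist.sample()          # roll the model forward one step
print(s_next.shape)             # torch.Size([1, 32])
```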
vik
vik@vikhyatk·
not sure what a world model is and at this point i’m too afraid to ask
35
1
203
14.7K
Lucas Beyer (bl16)
Lucas Beyer (bl16)@giffmana·
I had a funny/cute back-and-forth with Claude last week at work, where I had two trainings running, and one was much slower than the other, but there was no good reason for it. So I asked Claude for help, but Claude could not access either system, so I let Claude use me as a tool: it told me what commands to run on both systems, and I gave it the commands' outputs. It went something like this:

C: Aha! I found the issue, it is [blabla]
Me: What can I run on both systems to conclusively confirm or deny this hypothesis?
C: run [commands]
Me: here's the output of [commands]: [outputs]
C: This changes everything! Now the picture is crystal clear! The real issue is [blabla]
Me: What can I run on both systems to conclusively confirm or deny this hypothesis?
C: run [commands]
Me: here's the output of [commands]: [outputs]
C: These results are enlightening! I was completely wrong, but now I am certain the issue is [blabla]
Me: What can I run on both systems to conclusively confirm or deny this hypothesis?
C: run [commands]
Me: here's the output of [commands]: [outputs]
C: Wow! Thank you! These results are extremely helpful. I was wrong. But also, now everything is clear! The issue is [blabla]

and so on and so on lol. It was pretty endearing, if it weren't a Friday afternoon with this blocking me from running the big weekend run 😅
18
3
167
18.6K
Ross Wightman
Ross Wightman@wightmanr·
Time flies. After almost 4 years at @huggingface, I'm moving on. A major part of that chapter was timm, which I sold to the company and continued to build. For anyone relying on it, I've agreed to collaborate on bug fixes and basic maintenance, but new feature development will likely cease.

It was a meaningful chapter, and I'm thankful for the opportunity to grow timm over that time. AI is moving incredibly fast, and I'm excited to focus on new ideas and opportunities that feel like the right fit for this moment. There will be significant decisions for me ahead. I look forward to more of the serendipitous collaborations (e.g. OpenCLIP, ResNet Strikes Back, HTTY ViT) that I've enjoyed in the past.

I'm currently working on a long-overdue OpenCLIP refactoring that I hope will be useful for all and make it easier to add new model + objective combinations.
36
20
445
28K
Jim Bohnslav retweeted
Ai2
Ai2@allen_ai·
📢 Update: the Molmo 2 codebase is now open source. We're releasing the code behind Molmo 2—our open model family for video & image understanding, pointing, tracking, & more. Now you can easily train Molmo 2 on your own data. 🧵
[image]
6
51
364
31K
Jim Bohnslav retweeted
Ted Zadouri
Ted Zadouri@tedzadouri·
Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast that exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed!

joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/
[image]
7
132
781
221.7K
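For readers wondering why exp2 shows up in an attention kernel at all: the softmax exponential is commonly evaluated through the GPU's fast exp2 path via the identity exp(x) = 2^(x·log2 e), so exp2 throughput becomes part of the softmax cost the tweet refers to. A generic numpy illustration of that identity, not FlashAttention-4's kernel code:

```python
# Illustration of the exp-to-exp2 rewrite used in fast softmax kernels:
# exp(x) == 2**(x * log2(e)), so log2(e) can be folded into the scores and
# the hardware exp2 path used instead of exp. Generic numpy, not FA-4 code.
import numpy as np

LOG2E = 1.4426950408889634  # log2(e)

def softmax_exp2(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax computed through exp2, numerically matching the exp version."""
    shifted = scores - scores.max(axis=-1, keepdims=True)   # standard max-subtraction
    p = np.exp2(shifted * LOG2E)                             # exp(x) via exp2(x * log2 e)
    return p / p.sum(axis=-1, keepdims=True)

def softmax_exp(scores: np.ndarray) -> np.ndarray:
    shifted = scores - scores.max(axis=-1, keepdims=True)
    p = np.exp(shifted)
    return p / p.sum(axis=-1, keepdims=True)

x = np.random.randn(4, 8)
assert np.allclose(softmax_exp2(x), softmax_exp(x))  # same result, different exponential path
```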
Jim Bohnslav
Jim Bohnslav@jbohnslav·
@uwukko maybe claire will be better at writing rust
0
0
14
1.3K