Rupesh Srivastava

1.1K posts

@rupspace

Fully open LLM frontiers @MBZUAI IFM Silicon Valley. Previously (co)developed Highway Networks, Upside-Down RL, Bayesian Flow Networks, EvoTorch.

Santa Cruz, CA · Joined September 2014
770 Following · 2.7K Followers
Pinned Tweet
Rupesh Srivastava @rupspace
Update: new gig, and I'm hiring! I recently joined the Institute of Foundation Models in the SF Bay Area! Our goal is to train large-scale FULLY open-source LLMs at and beyond the frontier, from scratch, with open science, open data and open checkpoints. We are hiring across the training stack. Further, I'm building a new team to advance open agentic LLMs, and hiring researchers/engineers on-site. Send me a DM or email if you are interested! I'll also be at #NeurIPS2025 in San Diego this week to talk to potential candidates for internships and FT positions.
16 replies · 21 reposts · 224 likes · 36.6K views
Rupesh Srivastava retweeted
Mingkai Deng @mdeng34
Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens. We study a related but more structural question: what 𝗸𝗶𝗻𝗱 𝗼𝗳 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 should we adapt? Last year in SiRA (upper figure), we showed that simulative reasoning (System II), which uses a 𝘄𝗼𝗿𝗹𝗱 𝗺𝗼𝗱𝗲𝗹 to evaluate consequences of actions, yields up to 124% improvement over reactive baselines (System I), and that strong reasoning models (o1, o3-mini) fail as planners without this structure. In our new paper SR²AM (lower figure), we add a learned 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗼𝗿 (System III) that self-regulates when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is not just shorter reasoning: it is better allocation of simulation.
3 replies · 39 reposts · 214 likes · 38.2K views
Taylor W. Killian @tw_killian
📣 There's never a "best" time to share important updates, especially after sitting on this for so long... I'm joining the faculty @BYU + @BYUCS this Summer as an Assistant Professor in preparation for the upcoming school year. Lots of excitement and a fair bit of nerves. 🧵
44 replies · 11 reposts · 173 likes · 18K views
Rupesh Srivastava @rupspace
@agarwl_ @BlackHC That idea came from von Malsburg. Hinton and Plaut even cited him in the paper for this, but his influence is sadly forgotten.
0 replies · 0 reposts · 2 likes · 34 views
Rishabh Agarwal @agarwl_
@BlackHC For me, the inspiration was mostly two timescales of learning and the fact that not everything has to go to network weights. Rest is vibes.
1 reply · 0 reposts · 5 likes · 267 views
Rishabh Agarwal @agarwl_
Training LLMs is synonymous with updating their weights. However, LLMs can also learn in-context using *frozen* weights. There is no good reason for restricting learning to being in-context or in-weights. So a natural idea is "Learning, Fast and Slow" (FST). In FST, slow learning is LLM weights trained with RL while fast learning is context / prompt (fast weights) optimized with GEPA. Compared to RL, FST performs better while being more data efficient, adaptable (plasticity), and forgetting less (stays closer to base models). I think this idea of learning both fast-slow weights would be a good foundation for continual learning. PS: Geoff Hinton (the OG) described the idea of fast weights and slow weights several years ago, and back then I remember thinking it's a very cool idea. See more details here: gepa-ai.github.io/gepa/blog/2026…
18 replies · 73 reposts · 565 likes · 69.3K views
Rupesh Srivastava retweeted
Jeff Clune @jeffclune
Thrilled to share that we founded Recursive to create AI that safely conducts experiments on how to improve itself in an open-ended process of endless, automated scientific discovery. As I wrote in my 2019 AI-generating algorithms paper, this will likely be the fastest path to superintelligence. Our work since has shown the power of this approach. Excited to scale up and improve upon ideas like the Darwin Gödel Machine, HyperAgents, ADAS, OMNI, ALMA, The AI Scientist, PromptBreeder, Rainbow Teaming, Automated Capability Discovery, and other work on open-ended and AI-generating algorithms. We’ve assembled a dream team of researchers and significant resources to pursue this vision. My amazing co-founders are pictured here, and we have an all-star team of founding members (we’re over 25 and growing). Please join us if you are interested! Follow our progress @Recursive_SI
49 replies · 44 reposts · 609 likes · 115.6K views
Rupesh Srivastava retweeted
Loren Lugosch @lorenlugosch
In this paper, we ask: 𝘏𝘰𝘸 𝘤𝘢𝘯 𝘸𝘦 𝘤𝘭𝘶𝘮𝘴𝘪𝘭𝘺 𝘳𝘦𝘧𝘰𝘳𝘮𝘶𝘭𝘢𝘵𝘦 𝘵𝘩𝘦 𝘤𝘢𝘱𝘢𝘣𝘪𝘭𝘪𝘵𝘺 𝘸𝘦 𝘪𝘮𝘱𝘭𝘦𝘮𝘦𝘯𝘵𝘦𝘥 𝘪𝘯 𝘵𝘩𝘦 𝘧𝘰𝘳𝘮 𝘰𝘧 𝘢 𝘲𝘶𝘦𝘴𝘵𝘪𝘰𝘯?
0 replies · 1 repost · 14 likes · 2.1K views
Rupesh Srivastava @rupspace
@finbarrtimbers I think this is likely a difference of scale mainly. If there's enough filtered data to train on, then use that. If there's limited data, train on all.
0 replies · 0 reposts · 1 like · 226 views
finbarr @finbarrtimbers
An interesting gap in the literature is that the large open weights labs (DeepSeek, Zhipu) do correctness filtering for their SFT data, but there's a bunch of results from smaller labs (OpenThoughts, for one) that claim you should also include incorrect responses in SFT.
7 replies · 1 repost · 61 likes · 8.1K views
Rupesh Srivastava retweeted
Shibo Hao @Ber18791531
🍫 CocoaBench v1.0 is out! CocoaBench is a benchmark for unified digital agents, built around open-world tasks that require composing 💻 coding, 👀 vision, 🌐 search. Since our first research preview last December, we have expanded the benchmark substantially with community contributed tasks, and spent months testing and refining the tasks, evaluations, and agent runs.
Some takeaways:
• Even the best agent system reaches only 45.1% on CocoaBench v1.0.
• Coding agents like Codex are already surprisingly strong on general tasks beyond software engineering.
• Stronger agents tend to push more of the work into code.
• Open source models still lag behind leading frontier models on these general tasks.
👇More on the website and in the paper #AI #Agents #LLM #Benchmark #CocoaBench
Shibo Hao@Ber18791531

🍫 CocoaBench is calling for contributions from the community! Join us and help shape how next-generation agents are evaluated and built🚀✨ #LLM #AI #Agent #CocoaBench More details in the threads 👇

2 replies · 34 reposts · 79 likes · 11.3K views
Rupesh Srivastava retweeted
Institute of Foundation Models
A visually convincing rollout is not the same thing as a useful world model. WR-Arena is built to test the harder question: can a model simulate futures well enough to support action, planning, and reasoning? That’s the shift from simple next-state prediction to realistic world simulation grounded in real-world utility. Paper + code are live. t.co/waRc0MJmwP t.co/ZzN76nOwoI #AI #WorldModels #Benchmarking #EmbodiedIntelligence #PhysicalAI #MachineLearning
0 replies · 10 reposts · 46 likes · 5.1K views
Grad @Grad62304977
@kalomaze So elegant esp when u mix with DSA (sideways MoE)
2 replies · 0 reposts · 20 likes · 1.2K views
Rupesh Srivastava retweeted
Alex Shaw @alexgshaw
The Harbor registry is getting an upgrade. Now, anyone can publish to the registry to make their dataset available to every Harbor user:
4 replies · 5 reposts · 38 likes · 4.8K views
Rupesh Srivastava retweeted
Lucas Beyer (bl16) @giffmana
Yes and no. Very often it turns out that what you think solves the problem is not what actually solves it, and this you only find out by not moving on, but making sure you have experiments that back up the *exact* statement you make removing all reasonable confounders. And that, you get from one of:
- public review
- extremely strict colleagues
- insane self discipline
1 reply · 6 reposts · 166 likes · 9.1K views
Rupesh Srivastava retweeted
Seungwook Han @seungwookh
Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)
47 replies · 261 reposts · 1.7K likes · 253.5K views
Rupesh Srivastava retweeted
Subham Sahoo @ssahoo_
📢@CVPR 2026: first-ever tutorial dedicated to DISCRETE DIFFUSION 🔥 Part I: Consistency Models + Flow Maps - @JCJesseLai Part II: Discrete Diffusion - by me. ✨Few-step gen + inference-time scaling + live demos Co-orgs: @StefanoErmon @DrYangSong @mittu1204 @gimdong58085414 Full schedule + details👇 (1/3)
5 replies · 41 reposts · 327 likes · 21.1K views
elie @eliebakouch
today is my last day at hugging face

feeling really grateful to have worked with such an amazing team and learned so much along the way. i’m proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding

i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward. i’m also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people

things can get quite intense in this field, but i’m still very excited about the next challenges and about the good this technology can do

but first, taking a few weeks break :)
116 replies · 10 reposts · 745 likes · 33.2K views
Rupesh Srivastava retweeted
Wonmin Byeon @wonmin_byeon
🚀 New paper: Mamba–Transformer hybrid VLMs can go fast without forgetting. We introduce stateful token reduction for long-video VLMs. ✅ Only 25% of visual tokens 🚀 3.8–4.2× faster prefilling (TTFT) 🎯 Near-baseline accuracy (can exceed baseline with light finetuning)
3 replies · 24 reposts · 218 likes · 14.1K views