Rupesh Srivastava

1.1K posts

@rupspace

Fully open LLM frontiers @MBZUAI IFM Silicon Valley. Previously (co)developed Highway Networks, Upside-Down RL, Bayesian Flow Networks, EvoTorch.

Santa Cruz, CA · Joined September 2014

770 Following · 2.7K Followers

Pinned Tweet

Rupesh Srivastava@rupspace·1 Dec

Update: new gig, and I'm hiring! I recently joined the Institute of Foundation Models in the SF Bay Area! Our goal is to train large-scale FULLY open-source LLMs at and beyond the frontier, from scratch, with open science, open data and open checkpoints. We are hiring across the training stack. Further, I'm building a new team to advance open agentic LLMs, and hiring researchers/engineers on-site. Send me a DM or email if you are interested! I'll also be at #NeurIPS2025 in San Diego this week to talk to potential candidates for internships and FT positions.

Rupesh Srivastava retweeted

Mingkai Deng@mdeng34·1d

Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens. We study a related but more structural question: what 𝗸𝗶𝗻𝗱 𝗼𝗳 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 should we adapt? Last year in SiRA (upper figure), we showed that simulative reasoning (System II), which uses a 𝘄𝗼𝗿𝗹𝗱 𝗺𝗼𝗱𝗲𝗹 to evaluate consequences of actions, yields up to 124% improvement over reactive baselines (System I), and that strong reasoning models (o1, o3-mini) fail as planners without this structure. In our new paper SR²AM (lower figure), we add a learned 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗼𝗿 (System III) that self-regulates when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is not just shorter reasoning: it is better allocation of simulation.

Rupesh Srivastava@rupspace·16 May

@tw_killian @BYU @BYUCS

GIF

Taylor W. Killian@tw_killian·15 May

@rupspace @BYU @BYUCS Wanna come with?!

Taylor W. Killian@tw_killian·14 May

📣 There's never a "best" time to share important updates, especially after sitting on this for so long... I'm joining the faculty @BYU + @BYUCS this Summer as an Assistant Professor in preparation for the upcoming school year. Lots of excitement and a fair bit of nerves. 🧵

Rupesh Srivastava@rupspace·15 May

@agarwl_ @BlackHC That idea came from von Malsburg. Hinton and Plaut even cited him in the paper for this, but his influence is sadly forgotten.

Rishabh Agarwal@agarwl_·15 May

@BlackHC For me, the inspiration was mostly two timescales of learning and the fact that not everything has to go to network weights. Rest is vibes.

Rishabh Agarwal@agarwl_·15 May

Training LLMs is synonymous with updating their weights. However, LLMs can also learn in-context using *frozen* weights. There is no good reason for restricting learning to being in-context or in-weights. So a natural idea is "Learning, Fast and Slow" (FST). In FST, slow learning is LLM weights trained with RL while fast learning is context / prompt (fast weights) optimized with GEPA. Compared to RL, FST performs better while being more data efficient, adaptable (plasticity), and forgetting less (stays closer to base models). I think this idea of learning both fast-slow weights would be a good foundation for continual learning. PS: Geoff Hinton (the OG) described the idea of fast weights and slow weights several years ago, and back then I remember thinking it's a very cool idea. See more details here: gepa-ai.github.io/gepa/blog/2026…

Rupesh Srivastava retweeted

Jeff Clune@jeffclune·13 May

Thrilled to share that we founded Recursive to create AI that safely conducts experiments on how to improve itself in an open-ended process of endless, automated scientific discovery. As I wrote in my 2019 AI-generating algorithms paper, this will likely be the fastest path to superintelligence. Our work since has shown the power of this approach. Excited to scale up and improve upon ideas like the Darwin Gödel Machine, HyperAgents, ADAS, OMNI, ALMA, The AI Scientist, PromptBreeder, Rainbow Teaming, Automated Capability Discovery, and other work on open-ended and AI-generating algorithms. We’ve assembled a dream team of researchers and significant resources to pursue this vision. My amazing co-founders are pictured here, and we have an all-star team of founding members (we’re over 25 and growing). Please join us if you are interested! Follow our progress @Recursive_SI

Rupesh Srivastava@rupspace·1 May

Did he just ... wow @fredagainagain1 thank you so much! youtube.com/watch?v=GiXKuk…

Rupesh Srivastava@rupspace·25 Apr

Yes!

Susan Zhang@suchenzang

@charuman wasn't meant as sarcasm it's always nice to see a lab so confident/secure in their capabilities that they can openly publish all their struggles

Rupesh Srivastava retweeted

Loren Lugosch@lorenlugosch·22 Apr

In this paper, we ask: 𝘏𝘰𝘸 𝘤𝘢𝘯 𝘸𝘦 𝘤𝘭𝘶𝘮𝘴𝘪𝘭𝘺 𝘳𝘦𝘧𝘰𝘳𝘮𝘶𝘭𝘢𝘵𝘦 𝘵𝘩𝘦 𝘤𝘢𝘱𝘢𝘣𝘪𝘭𝘪𝘵𝘺 𝘸𝘦 𝘪𝘮𝘱𝘭𝘦𝘮𝘦𝘯𝘵𝘦𝘥 𝘪𝘯 𝘵𝘩𝘦 𝘧𝘰𝘳𝘮 𝘰𝘧 𝘢 𝘲𝘶𝘦𝘴𝘵𝘪𝘰𝘯?

Rupesh Srivastava@rupspace·16 Apr

@finbarrtimbers I think this is likely a difference of scale mainly. If there's enough filtered data to train on, then use that. If there's limited data, train on all.

finbarr@finbarrtimbers·16 Apr

An interesting gap in the literature is that the large open weights labs (DeepSeek, Zhipu) do correctness filtering for their SFT data, but there's a bunch of results from smaller labs (OpenThoughts, for one) that claim you should also include incorrect responses in SFT.

Rupesh Srivastava retweeted

Shibo Hao@Ber18791531·14 Apr

🍫 CocoaBench v1.0 is out! CocoaBench is a benchmark for unified digital agents, built around open-world tasks that require composing 💻 coding, 👀 vision, 🌐 search. Since our first research preview last December, we have expanded the benchmark substantially with community contributed tasks, and spent months testing and refining the tasks, evaluations, and agent runs.
Some takeaways:
• Even the best agent system reaches only 45.1% on CocoaBench v1.0.
• Coding agents like Codex are already surprisingly strong on general tasks beyond software engineering.
• Stronger agents tend to push more of the work into code.
• Open source models still lag behind leading frontier models on these general tasks.
👇More on the website and in the paper #AI #Agents #LLM #Benchmark #CocoaBench

Shibo Hao@Ber18791531

🍫 CocoaBench is calling for contributions from the community! Join us and help shape how next-generation agents are evaluated and built🚀✨ #LLM #AI #Agent #CocoaBench More details in the threads 👇

Rupesh Srivastava retweeted

Institute of Foundation Models@IFM_MBZUAI·11 Apr

A visually convincing rollout is not the same thing as a useful world model. WR-Arena is built to test the harder question: can a model simulate futures well enough to support action, planning, and reasoning? That’s the shift from simple next-state prediction to realistic world simulation grounded in real-world utility. Paper + code are live. t.co/waRc0MJmwP t.co/ZzN76nOwoI #AI #WorldModels #Benchmarking #EmbodiedIntelligence #PhysicalAI #MachineLearning

Rupesh Srivastava@rupspace·4 Apr

@Grad62304977 @kalomaze All networks are mixtures of experts, just gated at unit level :) arxiv.org/abs/1410.1165

Grad@Grad62304977·4 Apr

@kalomaze So elegant esp when u mix with DSA (sideways MoE)

kalomaze@kalomaze·4 Apr

i feel like the concept of MoE is pretty simple (activate some subnetwork via gating mechanism at each layer) and is only hard to deal with for "making things go fast on GPUs can be hard" reasons, and i feel those reasons are unrelated to elegance in the conceptual sense

Arthur Zucker@art_zucker

The main reason I don't like MoEs is just philosophical, I'm a big ockham's razor believer and no one computed the actual brain/money cost of all in moe...

Rupesh Srivastava retweeted

Alex Shaw@alexgshaw·1 Apr

The Harbor registry is getting an upgrade. Now, anyone can publish to the registry to make their dataset available to every Harbor user:

Rupesh Srivastava retweeted

Institute of Foundation Models@IFM_MBZUAI·27 Mar

Back in beautiful New Haven this weekend for YHack. We’ll be there with K2 Think V2, a fully open-source reasoning system. Hackers! Dig into how it works: huggingface.co/LLM360/K2-Thin…

Rupesh Srivastava retweeted

Lucas Beyer (bl16)@giffmana·16 Mar

Yes and no. Very often it turns out that what you think solves the problem is not what actually solves it, and this you only find out by not moving on, but making sure you have experiments that back up the *exact* statement you make removing all reasonable confounders. And that, you get from one of:
- public review
- extremely strict colleagues
- insane self discipline

Rupesh Srivastava retweeted

Seungwook Han@seungwookh·12 Mar

Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)

Rupesh Srivastava retweeted

Subham Sahoo@ssahoo_·11 Mar

📢@CVPR 2026: first-ever tutorial dedicated to DISCRETE DIFFUSION 🔥
Part I: Consistency Models + Flow Maps - @JCJesseLai
Part II: Discrete Diffusion - by me.
✨Few-step gen + inference-time scaling + live demos
Co-orgs: @StefanoErmon @DrYangSong @mittu1204 @gimdong58085414
Full schedule + details👇 (1/3)

Rupesh Srivastava@rupspace·10 Mar

@kalomaze @teortaxesTex @kzkirie 👀

kalomaze@kalomaze·10 Mar

@teortaxesTex MoEUT 2 electric boogaloo?

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·10 Mar

Very cool. MoE with cross-layer expert sharing (+reuse), so vastly richer combinatorially than the normal case, but what's neat is it can be warm-started from normal MoE checkpoints. I'm surprised at the claim of benign training dynamics and routing.

Yilong Chen@Yichen4NLP

We introduce MoUE. A new MoE paradigm boosts base-model performance by up to 1.3 points from scratch and up to 4.2 points on average, without increasing either activated parameters or total parameters. The main idea is simple: a sufficiently wide MoE layer with recursive reuse can be treated as a strict generalization of standard MoE. arxiv.org/abs/2603.04971 huggingface.co/papers/2603.04… #MoE #LLM #MixtureOfExperts #SparseModels #ScalingLaws #Modularity #UniversalTransformers #RecursiveComputation #ContinualPretraining

Rupesh Srivastava@rupspace·10 Mar

@eliebakouch Congrats on a great run!

elie@eliebakouch·9 Mar

today is my last day at hugging face
feeling really grateful to have worked with such an amazing team and learned so much along the way. i’m proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding
i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward. i’m also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people
things can get quite intense in this field, but i’m still very excited about the next challenges and about the good this technology can do
but first, taking a few weeks break :)

Rupesh Srivastava retweeted

Wonmin Byeon@wonmin_byeon·4 Mar

🚀 New paper: Mamba–Transformer hybrid VLMs can go fast without forgetting. We introduce stateful token reduction for long-video VLMs.
✅ Only 25% of visual tokens
🚀 3.8–4.2× faster prefilling (TTFT)
🎯 Near-baseline accuracy (can exceed baseline with light finetuning)
