Rupesh Srivastava
@rupspace
1.1K posts

Fully open LLM frontiers @MBZUAI IFM Silicon Valley. Previously (co)developed Highway Networks, Upside-Down RL, Bayesian Flow Networks, EvoTorch.

Santa Cruz, CA · Joined September 2014
768 Following · 2.7K Followers

Pinned Tweet
Rupesh Srivastava @rupspace ·
Update: new gig, and I'm hiring! I recently joined the Institute of Foundation Models in the SF Bay Area! Our goal is to train large-scale FULLY open-source LLMs at and beyond the frontier, from scratch, with open science, open data and open checkpoints.

We are hiring across the training stack. Further, I'm building a new team to advance open agentic LLMs, and hiring researchers/engineers on-site. Send me a DM or email if you are interested!

I'll also be at #NeurIPS2025 in San Diego this week to talk to potential candidates for internships and FT positions.
Grad @Grad62304977 ·
@kalomaze So elegant esp when u mix with DSA (sideways MoE)
kalomaze @kalomaze ·
i feel like the concept of MoE is pretty simple (activate some subnetwork via gating mechanism at each layer) and is only hard to deal with for "making things go fast on GPUs can be hard" reasons, and i feel those reasons are unrelated to elegance in the conceptual sense
Arthur Zucker@art_zucker

The main reason I don't like MoEs is just philosophical: I'm a big Occam's razor believer, and no one has computed the actual brain/money cost of going all-in on MoE...

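The mechanism kalomaze describes ("activate some subnetwork via gating mechanism at each layer") can be sketched in a few lines. This is a minimal toy illustration, not any particular framework's implementation: the function names (`top_k_gating`, `moe_layer`) and shapes are made up here, and the experts are plain linear maps rather than MLPs.

```python
import numpy as np

def top_k_gating(x, w_gate, k=2):
    """Score all experts with a learned projection, keep the top-k.

    x:      (d,) token representation
    w_gate: (d, n_experts) router weights
    Returns the k selected expert indices and softmax weights over them.
    """
    logits = x @ w_gate                    # (n_experts,) router scores
    top = np.argsort(logits)[-k:]          # indices of the k largest scores
    z = logits[top] - logits[top].max()    # subtract max for numerical stability
    weights = np.exp(z) / np.exp(z).sum()
    return top, weights

def moe_layer(x, w_gate, experts, k=2):
    """Sparse MoE layer: only the k gated experts actually run."""
    idx, weights = top_k_gating(x, w_gate, k)
    return sum(w * experts[i](x) for i, w in zip(idx, weights))

# Toy usage: 4 experts, each a fixed random linear map.
rng = np.random.default_rng(0)
d, n = 8, 4
w_gate = rng.normal(size=(d, n))
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n)]
y = moe_layer(rng.normal(size=d), w_gate, experts, k=2)
```

The conceptual part really is this small; as the tweet says, the hard part is making the sparse, data-dependent dispatch fast on real hardware.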
Rupesh Srivastava retweeted
Alex Shaw @alexgshaw ·
The Harbor registry is getting an upgrade. Now, anyone can publish to the registry to make their dataset available to every Harbor user:
[image]
Rupesh Srivastava retweeted
Lucas Beyer (bl16) @giffmana ·
Yes and no. Very often it turns out that what you think solves the problem is not what actually solves it, and this you only find out by not moving on, but by making sure you have experiments that back up the *exact* statement you make, removing all reasonable confounders. And that, you get from one of:
- public review
- extremely strict colleagues
- insane self-discipline
Rupesh Srivastava retweeted
Seungwook Han @seungwookh ·
Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)
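The paper pre-pre-trains on *neural* cellular automata; as a much simpler illustration of the underlying idea — fully synthetic, language-free token streams with non-trivial structure — here is an elementary (non-neural) cellular automaton turned into a 0/1 token stream. The rule number, widths, and function names are illustrative choices, not the paper's actual setup.

```python
import random

def step(state, rule=110):
    """One synchronous update of an elementary cellular automaton (wrap-around edges).

    Each cell's next value is the bit of `rule` indexed by its
    (left, center, right) neighborhood read as a 3-bit number.
    """
    n = len(state)
    return [
        (rule >> (state[(i - 1) % n] * 4 + state[i] * 2 + state[(i + 1) % n])) & 1
        for i in range(n)
    ]

def ca_token_stream(width=16, steps=8, rule=110, seed=1):
    """Flatten successive CA generations into one binary token stream."""
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(width)]
    stream = []
    for _ in range(steps):
        stream.extend(state)          # append this generation's cells as tokens
        state = step(state, rule)     # evolve to the next generation
    return stream
```

A next-token model trained on such streams must pick up local update rules to predict well — the kind of structural prior (here vastly simplified) that the tweet reports transferring to language.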
Rupesh Srivastava retweeted
Subham Sahoo @ssahoo_ ·
📢 @CVPR 2026: first-ever tutorial dedicated to DISCRETE DIFFUSION 🔥
Part I: Consistency Models + Flow Maps - @JCJesseLai
Part II: Discrete Diffusion - by me.
✨ Few-step gen + inference-time scaling + live demos
Co-orgs: @StefanoErmon @DrYangSong @mittu1204 @gimdong58085414
Full schedule + details 👇 (1/3)
elie @eliebakouch ·
today is my last day at hugging face. feeling really grateful to have worked with such an amazing team and learned so much along the way.

i'm proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding.

i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward. i'm also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people.

things can get quite intense in this field, but i'm still very excited about the next challenges and about the good this technology can do. but first, taking a few weeks break :)
Rupesh Srivastava retweeted
Wonmin Byeon @wonmin_byeon ·
🚀 New paper: Mamba–Transformer hybrid VLMs can go fast without forgetting. We introduce stateful token reduction for long-video VLMs.
✅ Only 25% of visual tokens
🚀 3.8–4.2× faster prefilling (TTFT)
🎯 Near-baseline accuracy (can exceed baseline with light finetuning)
Rupesh Srivastava @rupspace ·
Exhibits from GPT-5.2-Pro trying to understand a coding agent harness. The final answer was impeccable btw.
[images]
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
None of these models have been trained specifically on AIME 2026. This is just the state of RL now; there's been a step change in a year. Contamination? What contamination? Contamination with generalizable heuristics, perhaps. And yes, I recommend you try out Step 3.5 again.
[image]
Jasper Dekoninck@j_dekoninck

Link to MathArena: matharena.ai/?view=problem Link to HF: huggingface.co/datasets/MathA…

Rupesh Srivastava @rupspace ·
@giffmana I feel like after the recent "juice" changes the Thinking model tends to skip thinking more often.
Lucas Beyer (bl16) @giffmana ·
@rupspace It feels like it didn't, at least not enough. I had it on standard, on extended it gets it.
Rupesh Srivastava retweeted
Max Jaderberg @maxjaderberg ·
We give a glimpse at some of the capabilities of IsoDDE:
- predicting novel biomolecular structures with 2-3x the accuracy of previous methods (including our own!)
- the ability to predict binding affinity, one of the holy-grail quantities of rational drug design, better even than physics simulations
- the ability to highlight and uncover new pockets that had not previously been discovered
2/7
Rupesh Srivastava retweeted
Lucas Beyer (bl16) @giffmana ·
As per my recurring rants:
[image]
Andrew Ambrosino @ajambrosino ·
We want to make this a lot smoother for the public version. Something that is fully integrated. It should work for individual people with a remote machine and also large enterprise. We're taking a bit of time to do this the right way.
Andrew Ambrosino @ajambrosino ·
On SSH/remote/boxes: We're working on this! Quick notes: 👇