Rupesh Srivastava
1.1K posts
@rupspace
Fully open LLM frontiers @MBZUAI IFM Silicon Valley. Previously (co)developed Highway Networks, Upside-Down RL, Bayesian Flow Networks, EvoTorch.
Santa Cruz, CA · Joined September 2014
768 Following · 2.7K Followers

Pinned Tweet
Rupesh Srivastava @rupspace
Update: new gig, and I'm hiring! I recently joined the Institute of Foundation Models in the SF Bay Area! Our goal is to train large-scale FULLY open-source LLMs at and beyond the frontier, from scratch, with open science, open data, and open checkpoints.

We are hiring across the training stack. Further, I'm building a new team to advance open agentic LLMs, and hiring researchers/engineers on-site. Send me a DM or email if you are interested! I'll also be at #NeurIPS2025 in San Diego this week to talk to potential candidates for internships and FT positions.
[image]
Grad @Grad62304977
@kalomaze So elegant, especially when you mix it with DSA (sideways MoE)
Rupesh Srivastava retweeted
Alex Shaw @alexgshaw
The Harbor registry is getting an upgrade. Now, anyone can publish to the registry to make their dataset available to every Harbor user:
[image]
Rupesh Srivastava retweeted
Lucas Beyer (bl16) @giffmana
Yes and no. Very often it turns out that what you think solves the problem is not what actually solves it, and you only find this out by not moving on, but by making sure you have experiments that back up the *exact* statement you make, removing all reasonable confounders. And that you get from one of: public review, extremely strict colleagues, or insane self-discipline.
Rupesh Srivastava retweeted
Seungwook Han @seungwookh
Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)
[image]
Rupesh Srivastava retweeted
Subham Sahoo @ssahoo_
📢 @CVPR 2026: first-ever tutorial dedicated to DISCRETE DIFFUSION 🔥
Part I: Consistency Models + Flow Maps - @JCJesseLai
Part II: Discrete Diffusion - by me.
✨ Few-step gen + inference-time scaling + live demos
Co-orgs: @StefanoErmon @DrYangSong @mittu1204 @gimdong58085414
Full schedule + details 👇 (1/3)
[image]
elie @eliebakouch
today is my last day at hugging face

feeling really grateful to have worked with such an amazing team and learned so much along the way. i'm proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding

i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward. i'm also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people

things can get quite intense in this field, but i'm still very excited about the next challenges and about the good this technology can do

but first, taking a few weeks break :)
Rupesh Srivastava retweeted
Wonmin Byeon @wonmin_byeon
🚀 New paper: Mamba–Transformer hybrid VLMs can go fast without forgetting. We introduce stateful token reduction for long-video VLMs.
✅ Only 25% of visual tokens
🚀 3.8–4.2× faster prefilling (TTFT)
🎯 Near-baseline accuracy (can exceed baseline with light finetuning)
[image]
Rupesh Srivastava @rupspace
Exhibits from GPT-5.2-Pro trying to understand a coding agent harness. The final answer was impeccable btw.
[two images]
Rupesh Srivastava @rupspace
@giffmana I feel like after the recent "juice" changes the Thinking model tends to skip thinking more often.
Lucas Beyer (bl16) @giffmana
@rupspace It feels like it didn't, at least not enough. I had it on standard; on extended it gets it.
Rupesh Srivastava retweeted
Max Jaderberg @maxjaderberg
We give a glimpse at some of the capabilities of IsoDDE:
- predicting novel biomolecular structures with 2-3x the accuracy of previous methods (including our own!)
- the ability to predict binding affinity, one of the holy-grail quantities of rational drug design, better even than physics simulations
- the ability to highlight and uncover new pockets that had not previously been discovered
2/7
Rupesh Srivastava retweeted
Lucas Beyer (bl16) @giffmana
As per my recurring rants:
[image]
Andrew Ambrosino @ajambrosino
We want to make this a lot smoother for the public version. Something that is fully integrated. It should work for individual people with a remote machine and also for large enterprises. We're taking a bit of time to do this the right way.
Andrew Ambrosino @ajambrosino
On SSH/remote/boxes: We're working on this! Quick notes: 👇