Sam Stevens
@iamsamstevens
336 posts

PhD focusing on AI-accelerated scientific discovery; seeking full-time research roles.

Joined August 2022
326 Following · 348 Followers

Sam Stevens reposted
Hanane Nour Moussa @HananeNMoussa
Gym environments have played a key role in advancing LMs and agents for general coding tasks. But how do we build them for scientific coding? Introducing D3-Gym, the first automatically constructed dataset of verifiable environments for data-driven scientific discovery. 🧵
Sam Stevens reposted
Jaylen Jones @Jaylen_JonesNLP
At #ICLR2026, many highlighted the gap between agent performance in evaluations and the reliability needed for real-world deployment. Incidents like this show that gap playing out in practice. Our recent study introduces AutoElicit, an automatic method to proactively surface and analyze such unintended behaviors before they reach production systems. We find that computer-use agents don't always adhere to core safety principles, which can lead to severe consequences. Reliability for computer-use agents isn't optional - it's essential for trustworthy deployment.
JER@lifeof_jer

x.com/i/article/2048…

Sam Stevens @iamsamstevens
@natolambert this is hard distillation? like data gen + next token prediction? or soft logit knowledge distillation?
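For readers outside LM training, the distinction the question draws can be sketched in a few lines. This is a toy illustration with made-up logits, not anyone's actual pipeline: "hard" distillation trains the student with ordinary next-token cross-entropy against tokens the teacher produced, while "soft" distillation matches the teacher's full output distribution (e.g. via KL divergence over the logits).

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-token vocabulary; logits are illustrative, not from real models.
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [1.5, 1.2, 0.3]

t = softmax(teacher_logits)
s = softmax(student_logits)

# Hard distillation: treat the teacher's argmax token as the label
# and apply standard next-token cross-entropy.
hard_label = t.index(max(t))
hard_loss = -math.log(s[hard_label])

# Soft distillation: match the teacher's whole distribution,
# here via KL(teacher || student) over the full vocabulary.
soft_loss = sum(p * math.log(p / q) for p, q in zip(t, s))
```

The practical difference: hard distillation only needs teacher-generated text (so it works through an API), while soft distillation needs the teacher's logits, which generally requires white-box access to the teacher.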
Nathan Lambert @natolambert
A surprising fact is that open research often distills from open models (likely largely because of a mix of more control and lower cost)
Nathan Lambert @natolambert
Would be good to fund people openly studying relative difference in performance and cost of post-training for SOTA performance with & without distillation. Fully open labs rely on distillation because buying data tends to be too expensive. More data co’s should do partnerships
Sam Stevens reposted
NVIDIA AI Developer @NVIDIAAIDev
AI is helping scientists see nature in entirely new ways. 🔍 In collaboration with @OhioState, BioCLIP2 runs on NVIDIA accelerated computing to identify over a million species and reveal hidden patterns that support conservation and ecosystem health worldwide. 👉 nvda.ws/4v1RK5p
Sam Stevens @iamsamstevens
Excited to present some more SAE work, this time applied to insects and arachnids! Come talk about bugs!
Vardaan Pahuja @vardaanpahuja

I won’t be at ICLR, but my amazing co-author @iamsamstevens will be presenting our paper on Friday, April 24, in Pavilion 4 (9:30 am). We introduce a trait-annotation pipeline that leverages sparse autoencoders and multimodal language models to generate grounded, interpretable morphological trait descriptions from ecological images. Paper: arxiv.org/pdf/2604.01619 Website: osu-nlp-group.github.io/sae-trait-anno… #ICLR2026

Sam Stevens @iamsamstevens
@torchcompiled cool work! i'd love to see a comparison on modded-nanogpt, since undertuned baselines are such a persistent problem in architecture papers. it's nothing against this work in particular; it's just a hard evaluation issue, especially for smaller teams
Ethan @torchcompiled
Have you ever gotten tired of boring plain linear layers and wanted a more complex function? We find that attaching low-rank nonlinear residual functions can significantly accelerate pretraining, with one identified variant, CosNet, consistently achieving a 20+% wallclock speedup!
vik @vikhyatk
@iamsamstevens normally i'd love to open source everything, but our thesis is that people will pay for really fast inference, so we have to keep it proprietary, unfortunately. re: packaging/static linking, maturin is phenomenal and basically solves it
vik @vikhyatk
Moondream's inference engine got so fast image decoding became a bottleneck. So we shipped a SIMD image decoding library that's faster than all the Python options I know of. Plus it's statically linked so not a pain in the ass to install.
Sam Stevens @iamsamstevens
@vikhyatk to be clear, I appreciate you sharing the lib at all. just looking to grow/learn from great code
Sam Stevens @iamsamstevens
@vikhyatk can you publish/share any details or resources/guides on SIMD in Rust + static linking? might be useful to the community for lots of efficient dataloaders
Sam Stevens @iamsamstevens
@vikhyatk doesn't seem to have the kestrel-native source, just references it as a dependency. is it proprietary moondream rust code?
Sam Stevens @iamsamstevens
@khoomeik isn't your brain constantly running JEPA? we don't actually predict the next token; we predict the embedding of the next token (but your point about loss masking is clear)
Rohan Pandey @khoomeik
your brain is constantly running next token prediction on the world, rewiring itself to predict the content you consume. training with diverse contexts is great, but it's critical to learn when to set certain token spans to loss_mask=0
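The loss_mask idea is easy to state concretely. A minimal sketch with toy per-token probabilities (not any specific framework's API): masked spans stay visible as context the model conditions on, but contribute nothing to the training loss.

```python
import math

# Probability the model assigned to the correct next token at each
# position (toy values for a 5-token sequence).
token_probs = [0.9, 0.2, 0.5, 0.8, 0.1]

# loss_mask=0 excludes a span from the loss (e.g. text we want the
# model to read but not learn to reproduce); those tokens remain in
# the context yet produce no gradient.
loss_mask = [1, 1, 0, 0, 1]

per_token_nll = [-math.log(p) for p in token_probs]
masked = [nll * m for nll, m in zip(per_token_nll, loss_mask)]

# Average over the unmasked tokens only.
loss = sum(masked) / sum(loss_mask)
```

Dividing by the count of unmasked tokens (rather than the sequence length) keeps the loss scale comparable across examples with different amounts of masking.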
Sam Stevens @iamsamstevens
@nrehiew_ this was a great thread btw, I developed some more intuition about both flow matching and drifting
Sam Stevens @iamsamstevens
@nrehiew_ > pixel space l2 dist doesn't relate to similarity & is a poor measure of img quality (blurry imgs have low l2 dist)

I do discriminative img research and not img gen, but this was surprising to me. how did you build up this belief? specific papers, blogs, models?
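The quoted claim can be demonstrated in a few lines. This is a hedged toy illustration using a 1-D step edge as a stand-in for an image (not from the thread's experiments): blurring the edge barely moves it in L2, while translating it by two samples (a perceptually near-identical change) moves it much further.

```python
import math

def l2(a, b):
    # Euclidean distance between two equal-length signals.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A 1-D stand-in for an image: a sharp step edge.
sharp = [0.0] * 8 + [1.0] * 8
n = len(sharp)

# Blurred version: 3-tap moving average with clamped edges.
blurred = [
    (sharp[max(i - 1, 0)] + sharp[i] + sharp[min(i + 1, n - 1)]) / 3
    for i in range(n)
]

# Shifted version: the same edge translated right by two samples.
shifted = [0.0] * 10 + [1.0] * 6

# The blurry signal is closer in L2 than the translated one, even
# though translation barely changes what the "image" depicts.
blur_dist = l2(sharp, blurred)    # small
shift_dist = l2(sharp, shifted)   # larger
```

This is the usual argument for why pixel-space L2 rewards blur: averaging nearby pixels is a cheap way to reduce L2 error, which is one motivation for perceptual metrics like LPIPS in the generative-modeling literature.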
wh @nrehiew_
Thread on Drifting Models, the new generative modelling paradigm that everyone has been talking about from Kaiming He's team. (with additional plots from my own experiments)
Sam Stevens @iamsamstevens
@xeophon my personal cua evaluation is whether they can fill out my concur receipts for reimbursements
Florian Brand @xeophon
the more you use CC/Codex to research, the more (CPU) compute you need, the more storage you need, the better network you need. truly insane.
Sam Stevens @iamsamstevens
@xeophon true but if you're bottlenecked by running a headless chrome instance because concur/salesforce only has a web ui and not an http api...
Florian Brand @xeophon
@iamsamstevens yeah but there are a trillion more python sandboxes i can spin up before your ubuntu vm boots up :P
Sam Stevens @iamsamstevens
@xeophon I don't do llm research so I have no intelligent takes here BUT every time someone says llms won't be able to do X someone makes them do X (math proofs, arc-agi, scientific discovery, etc)
Sam Stevens @iamsamstevens
@xeophon computer use agents are gonna have to run dozens of instances of chrome 😭
Sam Stevens @iamsamstevens
@xeophon yeah but just writing code, running browsers, making excel spreadsheets, rendering markdown to html, etc will all need cpu cycles