Sam Stevens
@iamsamstevens
336 posts

PhD focusing on AI-accelerated scientific discovery; seeking full-time research roles.

Joined August 2022
326 Following · 348 Followers

Sam Stevens reposted
Hanane Nour Moussa @HananeNMoussa
Gym environments have played a key role in advancing LMs and agents for general coding tasks. But how do we build them for scientific coding? Introducing D3-Gym, the first automatically constructed dataset of verifiable environments for data-driven scientific discovery. 🧵
Sam Stevens reposted
Jaylen Jones @Jaylen_JonesNLP
At #ICLR2026, many highlighted the gap between agent performance in evaluations and the reliability needed for real-world deployment. Incidents like this show that gap playing out in practice. Our recent study introduces AutoElicit, an automatic method to proactively surface and analyze such unintended behaviors before they reach production systems. We find that computer-use agents don't always adhere to core safety principles, which can lead to severe consequences. Reliability for computer-use agents isn't optional - it's essential for trustworthy deployment.
JER@lifeof_jer

x.com/i/article/2048…

Sam Stevens @iamsamstevens
@natolambert this is hard distillation? like data gen + next token prediction? or soft logit knowledge distillation?
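For readers outside LM training, the distinction the question draws can be sketched in a few lines. This is a toy illustration with made-up logits, not anyone's actual pipeline: "hard" distillation trains the student with ordinary next-token cross-entropy against tokens the teacher produced, while "soft" distillation matches the teacher's full output distribution (e.g. via KL divergence over the logits).

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-token vocabulary; logits are illustrative, not from real models.
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [1.5, 1.2, 0.3]

t = softmax(teacher_logits)
s = softmax(student_logits)

# Hard distillation: treat the teacher's argmax token as the label
# and apply standard next-token cross-entropy.
hard_label = t.index(max(t))
hard_loss = -math.log(s[hard_label])

# Soft distillation: match the teacher's whole distribution,
# here via KL(teacher || student) over the full vocabulary.
soft_loss = sum(p * math.log(p / q) for p, q in zip(t, s))
```

The practical difference: hard distillation only needs teacher-generated text (so it works through an API), while soft distillation needs the teacher's logits, which generally requires white-box access to the teacher.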
Nathan Lambert @natolambert
A surprising fact is that open research often distills from open models (likely largely because of a mix of more control and lower cost)
Nathan Lambert @natolambert
Would be good to fund people openly studying relative difference in performance and cost of post-training for SOTA performance with & without distillation. Fully open labs rely on distillation because buying data tends to be too expensive. More data co’s should do partnerships
Sam Stevens reposted
NVIDIA AI Developer @NVIDIAAIDev
AI is helping scientists see nature in entirely new ways. 🔍 In collaboration with @OhioState, BioCLIP2 runs on NVIDIA accelerated computing to identify over a million species and reveal hidden patterns that support conservation and ecosystem health worldwide. 👉 nvda.ws/4v1RK5p
Sam Stevens @iamsamstevens
Excited to present some more SAE work, this time applied to insects and arachnids! Come talk about bugs!
Vardaan Pahuja @vardaanpahuja

I won’t be at ICLR, but my amazing co-author @iamsamstevens will be presenting our paper on Friday, April 24, in Pavilion 4 (9:30 am). We introduce a trait-annotation pipeline that leverages sparse autoencoders and multimodal language models to generate grounded, interpretable morphological trait descriptions from ecological images. Paper: arxiv.org/pdf/2604.01619 Website: osu-nlp-group.github.io/sae-trait-anno… #ICLR2026

Sam Stevens @iamsamstevens
@torchcompiled cool work! i'd love to see a comparison on modded-nanogpt, since undertuned baselines are such a persistent problem in architecture papers. it's nothing against this work in particular; it's just a hard evaluation issue, especially for smaller teams
Ethan @torchcompiled
Have you ever gotten tired of boring plain linear layers and wanted a more complex function? We find that attaching low-rank nonlinear residual functions can significantly accelerate pretraining, with one identified variant, CosNet, consistently achieving a 20+% wallclock speedup!
vik @vikhyatk
@iamsamstevens normally i'd love to open source everything, but our thesis is that people will pay for really fast inference, so we have to keep it proprietary, unfortunately. re: packaging/static linking, maturin is phenomenal and basically solves it
vik @vikhyatk
Moondream's inference engine got so fast image decoding became a bottleneck. So we shipped a SIMD image decoding library that's faster than all the Python options I know of. Plus it's statically linked so not a pain in the ass to install.
Sam Stevens @iamsamstevens
@vikhyatk to be clear, I appreciate you sharing the lib at all. just looking to grow/learn from great code
Sam Stevens @iamsamstevens
@vikhyatk can you publish/share any details or resources/guides on SIMD in Rust + static linking? might be useful to the community for lots of efficient dataloaders
Sam Stevens @iamsamstevens
@vikhyatk doesn't seem to have the kestrel-native source, just references it as a dependency. is it proprietary moondream rust code?
Sam Stevens @iamsamstevens
@khoomeik isn't your brain constantly running JEPA? we don't actually predict the next token; we predict the embedding of the next token (but your point about loss masking is clear)
Rohan Pandey @khoomeik
your brain is constantly running next token prediction on the world, rewiring itself to predict the content you consume. training with diverse contexts is great, but it's critical to learn when to set certain token spans to loss_mask=0
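The loss_mask idea is easy to state concretely. A minimal sketch with toy per-token probabilities (not any specific framework's API): masked spans stay visible as context the model conditions on, but contribute nothing to the training loss.

```python
import math

# Probability the model assigned to the correct next token at each
# position (toy values for a 5-token sequence).
token_probs = [0.9, 0.2, 0.5, 0.8, 0.1]

# loss_mask=0 excludes a span from the loss (e.g. text we want the
# model to read but not learn to reproduce); those tokens remain in
# the context yet produce no gradient.
loss_mask = [1, 1, 0, 0, 1]

per_token_nll = [-math.log(p) for p in token_probs]
masked = [nll * m for nll, m in zip(per_token_nll, loss_mask)]

# Average over the unmasked tokens only.
loss = sum(masked) / sum(loss_mask)
```

Dividing by the count of unmasked tokens (rather than the sequence length) keeps the loss scale comparable across examples with different amounts of masking.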
Sam Stevens @iamsamstevens
@nrehiew_ this was a great thread btw, I developed some more intuition about both flow matching and drifting
Sam Stevens @iamsamstevens
@nrehiew_ > pixel space l2 dist doesn't relate to similarity & is a poor measure of img quality (blurry imgs have low l2 dist)

I do discriminative img research and not img gen, but this was surprising to me. how did you build up this belief? specific papers, blogs, models?
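The quoted claim can be demonstrated in a few lines. This is a hedged toy illustration using a 1-D step edge as a stand-in for an image (not from the thread's experiments): blurring the edge barely moves it in L2, while translating it by two samples (a perceptually near-identical change) moves it much further.

```python
import math

def l2(a, b):
    # Euclidean distance between two equal-length signals.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A 1-D stand-in for an image: a sharp step edge.
sharp = [0.0] * 8 + [1.0] * 8
n = len(sharp)

# Blurred version: 3-tap moving average with clamped edges.
blurred = [
    (sharp[max(i - 1, 0)] + sharp[i] + sharp[min(i + 1, n - 1)]) / 3
    for i in range(n)
]

# Shifted version: the same edge translated right by two samples.
shifted = [0.0] * 10 + [1.0] * 6

# The blurry signal is closer in L2 than the translated one, even
# though translation barely changes what the "image" depicts.
blur_dist = l2(sharp, blurred)    # small
shift_dist = l2(sharp, shifted)   # larger
```

This is the usual argument for why pixel-space L2 rewards blur: averaging nearby pixels is a cheap way to reduce L2 error, which is one motivation for perceptual metrics like LPIPS in the generative-modeling literature.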
wh @nrehiew_
Thread on Drifting Models, the new generative modelling paradigm that everyone has been talking about from Kaiming He's team. (with additional plots from my own experiments)
Sam Stevens @iamsamstevens
@xeophon my personal cua evaluation is whether they can fill out my concur receipts for reimbursements
Florian Brand @xeophon
the more you use CC/Codex to research, the more (CPU) compute you need, the more storage you need, the better network you need. truly insane.
Sam Stevens @iamsamstevens
@xeophon true but if you're bottlenecked by running a headless chrome instance because concur/salesforce only has a web ui and not an http api...
Florian Brand @xeophon
@iamsamstevens yeah but there are a trillion more python sandboxes i can spin up before your ubuntu vm boots up :P
Sam Stevens @iamsamstevens
@xeophon I don't do llm research so I have no intelligent takes here BUT every time someone says llms won't be able to do X someone makes them do X (math proofs, arc-agi, scientific discovery, etc)
Sam Stevens @iamsamstevens
@xeophon computer use agents are gonna have to run dozens of instances of chrome 😭
Sam Stevens @iamsamstevens
@xeophon yeah but just writing code, running browsers, making excel spreadsheets, rendering markdown to html, etc will all need cpu cycles