Sameer Reddy

286 posts

Sameer Reddy

@SameerReddy0

AI Research Engineer @ Predibase

NYC Katılım Eylül 2019

298 Takip Edilen47 Takipçiler

Sameer Reddy@SameerReddy0·29 Tem

@StasBekman Great work @StasBekman! Really enjoyed/learned a lot about practical optimization from your ALST blog.

English

199

Stas Bekman@StasBekman·29 Tem

Llama-8b 1.2M sequence length training is now possible on a 1x H200 gpu with ALST + FA3 + Liger-Kernel. That's 2.4x longer than with 1x H100. Ready to run recipes: #1-gpu" target="_blank" rel="nofollow noopener">github.com/snowflakedb/Ar… For how this is possible see: arxiv.org/abs/2506.13996 To build FA3: #flashattention-3-beta-release" target="_blank" rel="nofollow noopener">github.com/Dao-AILab/flas… If you run into any issues please open an Issue and tag me. Once FA3 supports B200 we should get more than 1.5M seqlen on 1x B200. Thanks to @tri_dao for making FA3 support int64 indexing, and the @liger_kernel team for the same!

English

215

16.4K

Sameer Reddy retweetledi

Dhruv@dhruvamin·3 Tem

our soham parekh story: - yes, we hired him. we're building an AI agent in SF. he was eng #5. - recommended by a recruiter, which lent legitimacy. - he was eager and crushed our in person pair programming onsite. i believe he's actually a good engineer. - some have said "this is the danger of credentials". we didn't care, we cared he could ship. - said catnip for founders like "i love what you're building" and "i just want to build 24/7". finally, someone to help carry the load - he gave references. I gave offer while waiting for responses for the first (and last) time. checked linkedin, github, open source commits, blog posts lightly. maybe should have gone deeper but startups need to offer fast to win - he accepted same evening. said he had an nyc trip planned, then would start. - he went dark the next week (strange) but texted on weekend excited for Monday - his first day at 9:30am he calls in sick (strange). said he'd onboard from home. gave an address to ship laptop. - i honestly thought maybe he just missed the flight back from NYC and is embarrassed. idk people's first days are weird when you're getting to know each other - the first red flag: address to ship the laptop to was an SF office building, not apartment (strange). i nearby for a doctor's appointment same day so checked lobby. it had an industrial and sync labs, a yc company. thought, huh maybe his friend works there to grab the laptop, but weird. - next day, soham calls to say too sick to work, going to sleep it off, but he was up to speed on the codebase from yesterday. we said cool, maybe we just push the start date a week to recover. he said he'd onboard throughout the week. - next day, by chance, my co-founder notices his Github profile has a ton of commits in other private repos middle of the night after saying too sick to work. that's when we noticed 1) he had yet to clone our repo 2) he had public commits to sync lab's documentation. - we called him up to ask, what's up, are you still working for sync? all good, just tell us if you are so we can move on. he denied and said he couldn't sleep so was playing with deepseek in his own repos. - my co-founder was ready to move on then (too much smoke). i thought what are the chances someone's actually trying to work at 2 in person SF startups at the same time. i get it if remote. or large company. but 2+ 9-9-7 startups?? no way. let's push his start date, give him a week to recover. - his first day in person day was killer. showed up on time, stayed late, shipped something significant on Day 1. breathed a momentary sigh of relief. - the next few days it all fell apart. called in sick again, but said well enough to work. told us he had just gotten diagnosed with a chronic condition and was really scared. medication had him up all hours. he was waiting on his o1 approval and thought at risk of losing that. wanted to support him and felt for him but whole team was losing trust. late communication, weird signals. - he then spent 2 days saying he was working on something from home we knew should have taken him 1/2 a day max. always almost ready, just testing something. - finally it started blocking the main thread. so my co-founder asked to take over his branch to get it done. almost nothing had been done. fine if too sick to work, but should have communicated that. so knew it was just straight up lie he was working on it. - co-founder decided to call it for performance / shadiness. but just to check, he went over to sync labs and asked "hey is soham here?" someone said "no, he's at home" as if he worked there - at that point, we pinged their founder to confirm he was employed there. - when we called soham up, he denied it to the end. said sync guys were just friends. either way, we were out. in an ironic twist of fate sync dropped an employee of the month video same day that featured none other than soham. - told him working 2+ places same time was breach of FTE contract so not going to process first payroll. no argument from him. he just dipped. - i just assumed young kid who made a mistake. a few months later a few other founders reached out about him. told them the story in private. reflections: - whole thing was a drain of 1 month of time, focus, and energy. only resources you have as a startup. so sucked. - it was embarrassing until yesterday when i realized how widespread. then i was pissed. then impressed. still not sure how he pulled it off for so long with in person startups with long hours, but appreciated the hustle. hope he had a good reason. feels like a stressful way to make money - i made my jokes yesterday, but the internet piles on so also shot soham a text yesterday. wish he'd have been straight up. wish him well. - he's a good eng so will probably be fine. biggest mistake was lying repeatedly which just kills any team's trust fast off the chest so moving on :)

English

198

281

5.2K

725K

Sameer Reddy retweetledi

Tomek Korbak@tomekkorbak·5 Haz

I reimplemented the bliss attractor eval from Claude 4 System Card. It's fascinating how LLMs reliably fall into attractor basins of their pet obsessions, how different these attractors across LLMs, and how they say something non-trivial about LLMs' personalities. 🌀🌀🌀

English

174

38.5K

Sameer Reddy@SameerReddy0·30 May

@StellaLisy @YiranWu18 Hmm but I think that the difference before/after RL is not the only thing to care about here. There are many suboptimal prompts that would give low base model acc, that RL could surmount through the reward signal. However the magnitude of the gain via RL claimed would be wrong.

English

158

Stella Li@StellaLisy·29 May

@YiranWu18 Thanks for raising this! Prompts used can make a big difference in numbers. We didn’t use any advanced prompt, just the basic ones (same prompt and acc as TTRL), and consistent across exps. The important message here, though, is the difference before/after RL and across models.

English

943

Stella Li@StellaLisy·27 May

🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: + 28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…

English

344

1.8K

700K

Sameer Reddy retweetledi

Shashwat Goel@ShashwatGoel7·29 May

Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers of the Pre-RL models and realized they were serverely underreported across papers. We compiled discrepancies in a blog below🧵👇

English

121

873

323.9K

Sameer Reddy@SameerReddy0·18 Nis

RT @predibase: Every extra millisecond your #LLM spends “thinking” is $$$ left on the table and results in more users abandoning your AI ap…

English

Sameer Reddy@SameerReddy0·15 Nis

@willdepue Why can't deep research add images and figures to the reports? It could even leverage the image generation + image search to combine forces.

English

will depue@willdepue·14 Nis

ok hit me with your craziest ideas for chatgpt, if it’s really good i’ll mail you limited openai merch. hit me 👇

English

1.5K

2.4K

716.6K

Sameer Reddy retweetledi

Naman Jain@StringChaos·9 Nis

Excited to release R2E-Gym - 🔥 8.1K executable environments using synthetic data - 🧠 Hybrid verifiers for enhanced inference-time scaling - 📈 51% success-rate on the SWE-Bench Verified - 🤗 Open Source Data + Models + Trajectories 1/

English

256

51.8K

Sameer Reddy retweetledi

Lewis Tunstall@_lewtun·8 Nis

New R1-Zero experiments with GRPO: 1. Mask the loss from completions that don't terminate in an EOS token (DAPO). Significantly improves stability when doing importance sampling with μ>0. Coming soon to TRL! 2. Use a "soft" format reward function to elicit the and tags from the model, and then refine with a strict format. Inspired by @willccbb's famous GRPO training script and makes all the difference between never learning the format to rapidly getting it 100% correct Log book: #67f54321bbbacbc327d03e6c" target="_blank" rel="nofollow noopener">huggingface.co/spaces/open-r1…

Lewis Tunstall@_lewtun

New log book: figuring out which of the many methods are actually needed for stable R1-Zero-like training

English

271

26.8K

Sameer Reddy@SameerReddy0·8 Nis

@MLOpsWorld Excited to attend and be presenting at MLOps World!

English

Sameer Reddy retweetledi

MLOps World | GenAI Summit@MLOpsWorld·8 Nis

Excited to have @SameerReddy0 at the Agents & GenAI Infrastructure and Tooling Summit on April 15, 2025! In his session, he’ll explain how reinforcement fine-tuning (RFT) improves agent decision-making. 📅 April 15, 2025 | 8:00 AM - 3:30 PM EST | Virtual

English

108

Sameer Reddy retweetledi

Amey Agrawal@agrawalamey12·27 Mar

Super long-context models with context window spanning millions of tokens are becoming commonplace (@GoogleDeepMind Gemini, @xai Grok 3, @Alibaba_Qwen Qwen2.5). But efficiently serving these models is tough, especially alongside short requests. Head-of-Line (HOL) blocking becomes a major issue, hurting latency for everyone. We present Medha, a system designed to handle this mix efficiently. Achieving 30x lower latency, and 5x higher throughput compared to the state-of-the-art. Full paper: arxiv.org/pdf/2409.17264. 🧵

English

3.6K

Sameer Reddy retweetledi

Predibase by Rubrik@predibase·19 Mar

Today we're thrilled to announce the first end-to-end platform for Reinforcement Fine-Tuning. With just a dozen labeled data points, you can outperform #OpenAI o1 and #DeepSeekR1 on complex tasks. Built on the #GRPO methodology that DeepSeek-R1 popularized, our platform delivers exceptional results. In our real-world PyTorch to Triton transpilation case study, we achieved 3x higher accuracy than OpenAI o1 and DeepSeek-R1 when writing GPU code. Check out the thread below to learn how you can adapt an #opensource #LLM to your use cases with unmatched efficiency. #rft

English

500

Sameer Reddy retweetledi

Rohan Paul@rohanpaul_ai·19 Mar

Want AI that speaks your language? Fine-tuning is the spark you need. Essentially, you’re tailoring an off-the-shelf model to your precise goals. The catch? Historically, it took an avalanche of labeled data—thousands of samples—to train your ‘old dog’ to perform a brand-new trick. I was getting tired of labeling thousands of samples and the cost. And then I looked into Reinforcement Fine-Tuning (RFT) from @predibase . 🧵1/n With RFT, a lightweight opensource LLM can rapidly evolve into an exceptional problem-solving machine. 📚 Reinforcement Method Predibase's Reinforcement Fine-Tuning (RFT) addresses the constraints of classic supervised approaches. It systematically applies a reward function to guide model updates. - Advanced policy gradient methods that shorten iteration cycles. - This setup surpasses standard fine-tuning by providing precise feedback and higher accuracy with minimal labeled data. ⚙️ Minimal Labeled Data - RFT excels with fewer than 100 labeled examples. - It discards the old requirement for massive datasets by validating responses against a reward metric. - This approach cuts data-collection costs significantly. - Small, controlled feedback loops sharpen performance in logical or multi-hop tasks. 🔗 Chain-of-Thought Boost Chain-of-thought integration refines step-by-step reasoning. RFT checks partial correctness and then readjusts updates to strengthen valid outputs. This self-correcting mechanism limits error propagation in arithmetic or combinatorial tasks. Iterative feedback enables the model to refine its own reasoning. 🚀 Integration and Impact The RFT pipeline is seamlessly integrated within the Predibase platform. - Debugging tools help trace reward distribution, and distributed training supports larger-scale tasks. - Models can be deployed or tracked in a serverless manner without hardware overhead. - Adaptive reward shaping and automated checkpointing accelerate development. - RFT extends fine-tuning capabilities for LLMs in limited-data domains. - This eliminates hefty labeling expenses while sustaining performance gains. - It directly tackles situations where correctness is measurable but labeled data is scarce. ------ The below image is from their official technical report (link in comment). RFT leads the pack at each data scale. Note the jump in performance when training with just 10 or 100 samples.

English

11.8K

Sameer Reddy@SameerReddy0·17 Mar

@antferrui @hongyu_chang I come from a deep learning background and this sounds so interesting. I'm able to grasp some of what you are saying, but are you able to provide a really intuitive high level description? What does it mean to have disentangled representations in these context, mathematically?

English

343

AntonioFR@antferrui·17 Mar

Excited to share our latest story! We found disentangled memory representations in the hippocampus that generalized across time and environments, despite the seemingly random drift and remapping of single cells. This code enabled the transfer of prior knowledge to solve new tasks

English

160

92.5K

Sameer Reddy@SameerReddy0·23 Şub

@HaydnBelfield i think people know this lol

English

Haydn Belfield@HaydnBelfield·23 Şub

One of the wisest things Dario Amodei has said is that's it's best to think of companies as companies - corporate bureaucracies responding to market incentives - rather than as sports teams or fandoms.

Dan McAteer@daniel_mac8

🪄 just realized these four major US AI Labs map almost perfectly to the four Hogwarts houses: 1. Slytherin - @xai (obviously) 2. Gryffindor - @OpenAI (obviously) 3. Hufflepuff - @AnthropicAI (obviously) 4. Ravenclaw - @AIatMeta (obviously)

English

117

7.2K

Sameer Reddy retweetledi

Leshem (Legend) Choshen 🤖🤗@LChoshen·21 Şub

AdaMerging finds that when you merge models you can weight the avg. and it matters a lot (big surprise..) The twist is that to find the coefficients you don't need labeled data. Instead, they minimize the entropy of the predictions, low entropy➡️high perf.

English

5.7K

Sameer Reddy retweetledi

Alex Lew@alexanderklew·10 Şub

@xtimv and I were just discussing this interesting comment in the DeepSeek paper introducing GRPO: a different way of setting up the KL loss. It's a little hard to reason about what this does to the objective. 1/

English

2.6K

Sameer Reddy retweetledi

Sebastian Raschka@rasbt·15 Şub

It's 2025, and I’ve finally updated my Python setup guide to use uv + venv instead of conda + pip! Here's my go-to recommendation for uv + venv in Python projects for faster installs, better dependency management: github.com/rasbt/LLMs-fro… (Any additional suggestions?)

English

270

2.3K

192.3K

Sameer Reddy retweetledi

Paul Calcraft@paul_cal·15 Şub

The story of LLMs playing games, and what we know so far Tic Tac Toe, Chess, Minecraft, NYT Connections, Wordle, Pictionary, Connect 4, Codenames, Snake... 1/n

GIF

English

108

248.8K

Keşfet

@StasBekman @tri_dao @liger_kernel @StellaLisy @YiranWu18 @predibase @willdepue @willccbb