Atharva

41 posts

@atharvarta

Exploring everything that life throws at me 🫪

Mumbai · Joined August 2024

111 Following · 0 Followers

Atharva@atharvarta·1h

Back to experiments. The goal now isn't getting better numbers, it's understanding why the numbers move in the first place. A lot more ablations ahead. Updated README: github.com/Atharva-Mendhu…

Atharva@atharvarta·1h

Quick StrataRL update. First Kaggle run and now I know why training the model is the easy part. Getting the infrastructure, logging, monitoring, and evaluation right took way more time than I expected. Starting to understand why people obsess over metrics so much.

Atharva@atharvarta·6h

@iyoushetwt This might be true, but just because of the hardware

Ayushi☄️@iyoushetwt·21h

unpopular opinion: macOS is better than linux (for coding)

Atharva@atharvarta·1d

@itzsam_ai I'm a solo dev so idk 😝💔

Sattyam Samania@itzsam_ai·1d

@atharvarta when will you launch this?

Sattyam Samania@itzsam_ai·1d

Are you Building in public? Drop your project below👇

Atharva@atharvarta·1d

@nezbuilds Working on StrataRL: A GRPO infrastructure for multi-domain reasoning in Small Language Models. github.com/Atharva-Mendhu…

Nez@nezbuilds·1d

Good morning builders 👋 It’s Wednesday, time to show the internet what you’ve been building. Drop your project + a short description below. I’ll be checking out projects, giving feedback and connecting with fellow founders throughout the day 👇

Atharva@atharvarta·1d

Local validation is done. Now I'm moving the experiments to Kaggle. The biggest bottlenecks so far are GPU memory constraints, rollout speed, and fitting meaningful GRPO experiments into Kaggle's runtime limits. Would love to hear from anyone who's run RLHF/GRPO on Kaggle before.

Atharva@atharvarta·1d

GRPO + domain-aware rewards + stratified advantage normalization + curriculum scheduling + training monitoring. Currently validated locally on a laptop (MBA M4, 24GB) with Qwen2.5-3B: 35+ tests passing, 0 failures for now, and successful training runs so far.
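
For anyone new to the terms in that list, here is a minimal Python sketch of GRPO-style group-relative advantages (each reward minus the group mean, divided by the group standard deviation, as described in DeepSeekMath), plus one possible reading of "stratified advantage normalization": normalizing within each domain separately. The function names, the `eps` term, the choice of population standard deviation, and the per-domain interpretation are all assumptions for illustration, not StrataRL's actual code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one group of sampled completions.

    Each reward is centered on the group mean and scaled by the group
    standard deviation. `eps` (an assumption) avoids division by zero
    when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a choice made for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def stratified_advantages(rewards, domains, eps=1e-6):
    """Hypothetical per-domain variant: normalize inside each domain.

    `domains[i]` is the domain label of `rewards[i]`. Rewards from
    different domains never share a mean or std, so a domain with a
    large reward scale cannot dominate the others.
    """
    indices_by_domain = defaultdict(list)
    for i, domain in enumerate(domains):
        indices_by_domain[domain].append(i)

    advantages = [0.0] * len(rewards)
    for indices in indices_by_domain.values():
        group = [rewards[i] for i in indices]
        for i, adv in zip(indices, group_relative_advantages(group, eps)):
            advantages[i] = adv
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 0.0, 10.0, 20.0]
    domains = ["math", "math", "code", "code"]
    print(stratified_advantages(rewards, domains))
```

With the sample inputs, the two "math" rewards and the two "code" rewards each come out close to +1 and -1 despite the very different reward scales, which is the point of normalizing per domain in this sketch.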

Atharva@atharvarta·1d

Lately I've been reading a lot about RL and GRPO. What started as a few papers turned into a deep dive on why multi-domain RL can improve one benchmark while making another worse. One paper I found interesting was DeepSeekMath. Worth a read if you're interested in GRPO and RL.

Atharva@atharvarta·26 May

Just finished reading SICP Chapter 3. Never thought a book written decades ago would map so well to modern agentic AI systems. State, shared context, mutation, concurrency, synchronization. Same problems, different scale.