Aditya Ghai

130 posts

@aditya_ghai07

Building Vithos || ML Systems at Scale || 5X Hackathon Winner

Joined January 2024

179 Following · 29 Followers

Pinned Tweet

Aditya Ghai@aditya_ghai07·Apr 8

Training run: 10K steps, ~4B tokens, 225M params, 3× RTX 4090s on Novita AI. Throughput sat rock solid at ~39.5K tok/s the whole run, and DDP was clean. Loss opened at 10.15 and closed at 2.73. The curve looks exactly like it should. Could've used Karpathy's autoresearch loop to squeeze more out of the setup. Chose not to. Wanted to see the best I could get by hand: scheduler, buffer cycling, DDP sync, the whole thing. You learn differently when you can't outsource the debugging. Still stopped just shy of Chinchilla-optimal. Loss was still dropping at step 10K. The $25 budget made the call, not me.
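
The numbers in this post can be sanity-checked with a few lines of arithmetic. This is a minimal sketch: the four constants come from the post, while the 20-tokens-per-parameter ratio is an assumption (the common rule-of-thumb reading of Chinchilla) and `training_summary` is a hypothetical helper name.

```python
# Figures quoted in the post; everything else is derived arithmetic.
PARAMS = 225e6        # model parameters
TOKENS_SEEN = 4e9     # approximate tokens processed over the run
STEPS = 10_000        # optimizer steps
THROUGHPUT = 39.5e3   # tokens per second, aggregate across the 3 GPUs

# Assumption: the rule-of-thumb "~20 training tokens per parameter".
TOKENS_PER_PARAM = 20


def training_summary():
    """Derive per-step batch size, wall-clock time, and distance from the target."""
    tokens_per_step = TOKENS_SEEN / STEPS
    wall_clock_hours = TOKENS_SEEN / THROUGHPUT / 3600
    chinchilla_tokens = PARAMS * TOKENS_PER_PARAM
    fraction_of_target = TOKENS_SEEN / chinchilla_tokens
    return tokens_per_step, wall_clock_hours, chinchilla_tokens, fraction_of_target


if __name__ == "__main__":
    per_step, hours, target, fraction = training_summary()
    print(f"tokens per step:   {per_step:,.0f}")
    print(f"wall-clock hours:  {hours:.1f}")
    print(f"20x-params target: {target:,.0f} tokens")
    print(f"fraction reached:  {fraction:.0%}")
```

Under these assumptions the run works out to roughly 400K tokens per step, about 28 hours of wall-clock time, and about 89% of a 4.5B-token target, which matches "just shy of Chinchilla-optimal".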

Aditya Ghai@aditya_ghai07·19h

@ravikiran_dev7 Supabase, and sometimes Firebase, mostly for MVPs.

Ray🫧@ravikiran_dev7·1d

As a developer, which authentication provider do you prefer?

Aditya Ghai@aditya_ghai07·20h

@N_and_ni Personally, I used Striver.

Nandini@N_and_ni·1d

I am starting DSA. Which playlist is better? 🤔 Striver? Love Babbar?

Aditya Ghai@aditya_ghai07·20h

@sickdotdev Your family's health history, finally in one place. Upload a lab PDF. We extract every biomarker, track how it changes over time, and let you ask an AI that actually knows your history. For you and your whole family. [LIFETIME FREE] vithos.in

Sick@sickdotdev·1d

Drop what you’re building. Last time 50k people saw it. Consider this as marketing.

Aditya Ghai@aditya_ghai07·21h

@Haezurath My LinkedIn seems fine, but I am just bad at X :(

Kacie Ahmed@Haezurath·22h

Your personal brand can get you a job. I’m not joking. Drop a comment and I’ll give you advice and actionable feedback on how to improve yours.

Aditya Ghai@aditya_ghai07·1d

@suni_code Know your family's health. Every marker, always. FREE FOR LIFE for the next 1000 signups: vithos.in

Suni@suni_code·2d

Drop your project, I will sign up, and let's drive traffic to your site.

Aditya Ghai@aditya_ghai07·1d

@0xPrajwal_ yes :(

Prajwal@0xPrajwal_·1d

@aditya_ghai07 Oh, is it?

Prajwal@0xPrajwal_·2d

Applications for the Anthropic Fellows Program are now open. Chance to earn a $3,850/week stipend. Details + link in the first comment

Aditya Ghai@aditya_ghai07·1d

@amritwt Your health, remembered. vithos.in

amrit@amritwt·2d

pitch me your app in three words

Aditya Ghai@aditya_ghai07·6d

@Harshit_srv204 Here is the link: vithos.in

Aditya Ghai@aditya_ghai07·6d

Vithos just got an update. Full-body reports can take a bit longer now, so we added email alerts. You’ll get notified as soon as your report is ready. Custom SMTP setup done, thanks @Harshit_srv204. Take care of your parents’ health. FREE FOR LIFE for the next few users.

Aditya Ghai@aditya_ghai07·Apr 10

@omeedtehrani done

omeed@omeedtehrani·Apr 10

I am hiring. DM me.

Aditya Ghai@aditya_ghai07·Apr 8

@attharrva15 Congrats!

Atharva@attharrva15·Apr 8

I met so many goated people in this batch, damn. Everyone is so ambitious and trying their level best, and that is what I like. It's not about the money or placement; the people are totally ambitious, and that's why it's a great thing to join.

Harkirat Singh@kirat_tw

Super 30 2.0 ends today. 6 months ago, 78 people joined us onsite in Noida. Most of them were either in their final year or recently graduated. Today 60/78 have been placed at an average offer of $1.8k/mo and median of $1.5k/mo. Next Super 30 starts 7th May, you can pre-register for it here - 100xschool.in/super30-admiss…

Aditya Ghai@aditya_ghai07·Apr 8

@iBhanuDahiya So sad. Happy that my college allows a yearlong internship as well, and I get the college credits. So basically I only had to study till 3rd year. Will be graduating in a few months, btw.

Bhanu D — sys/acc@iBhanuDahiya·Apr 7

The Indian education system is cooked fr. Btw, I’m from Maharaja Agrasen Institute of Technology, and I’ve got no issue saying this publicly because MAIT has never really provided anything useful. I needed an NOC for my full-time job. This is how it went:

Me: Sir, why am I getting detained? I’m working full time and can provide employment confirmation and all required docs.
Mentor: Not in my hands. Talk to HOD.
Me: Sir, mentor asked me to talk to you about stopping my detention and NOC.
HOD: Not in my hands. Talk to Director.
Me: Sir, HOD asked me to talk to you about my job and detention in midterms.
Director: Who told you to take a job? Come to college and you won’t be detained. Quit the job and come to college. If you want the NOC, come in the 8th sem.

I absolutely love my college, it’s amazing, peak college experience. If you value your sanity, highly recommend everyone to join MAIT 👍

Aditya Ghai@aditya_ghai07·Apr 7

Link for the tokenizer and details: github.com/adityaghai07/a…

Aditya Ghai@aditya_ghai07·Apr 7

The reason is simple. Byte-level BPE (GPT-2, LLaMA, DeepSeek) treats each Devanagari character as 3 tokens before any merges, one per UTF-8 byte. Ours starts from Unicode characters directly. Every matra, conjunct, virama sequence is one unit from the start. You don't need a bigger vocab. You need the right alphabet.
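
The three-bytes-per-character point is easy to check directly. The sketch below compares the starting alphabet a byte-level tokenizer would see with the one a character-level tokenizer would see for one Hindi word. It only illustrates the base units before any merges; `base_units` is a hypothetical helper, and nothing here reproduces the author's tokenizer or how it groups conjuncts.

```python
def base_units(text: str):
    """Return the pre-merge units for byte-level and character-level starts."""
    byte_units = list(text.encode("utf-8"))  # one unit per UTF-8 byte
    char_units = list(text)                  # one unit per Unicode code point
    return byte_units, char_units


# "Hindi" written in Devanagari, spelled out as code points:
# HA, vowel sign I, NA, virama, DA, vowel sign II
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

byte_units, char_units = base_units(word)
print(len(char_units), "code points")  # 6
print(len(byte_units), "UTF-8 bytes")  # 18

# Every code point in the Devanagari block (U+0900 to U+097F) takes 3 bytes.
assert all(len(ch.encode("utf-8")) == 3 for ch in word)
```

A byte-level vocabulary therefore has to spend merges just to rebuild single characters, while a character-level start already has each code point as a unit.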

Aditya Ghai@aditya_ghai07·Apr 7

General-purpose tokenizers are terrible at Hindi. GPT-4 takes 997 tokens for a 186-word Hindi paragraph. SmolDocling hits 1088. DeepSeek-V3, which people think is multilingual-aware, still clocks 558. The script is paying a tax it shouldn't have to.
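
These counts can be turned into fertility, taken here as tokens per word. This is a minimal sketch using only the figures quoted in the post; the tokens-per-word definition is an assumption (the post does not say how words were counted), and `fertility` is a hypothetical helper name.

```python
WORDS = 186  # length of the Hindi paragraph quoted in the post

# Token counts quoted in the post for that same paragraph.
TOKEN_COUNTS = {
    "GPT-4": 997,
    "SmolDocling": 1088,
    "DeepSeek-V3": 558,
}


def fertility(tokens: int, words: int) -> float:
    """Average number of tokens spent per word (lower is better)."""
    return tokens / words


if __name__ == "__main__":
    for name, tokens in TOKEN_COUNTS.items():
        print(f"{name:12s} {fertility(tokens, WORDS):.2f} tokens/word")
```

On these numbers GPT-4 spends about 5.36 tokens per word, SmolDocling about 5.85, and DeepSeek-V3 exactly 3.0. For comparison, a fertility of about 1.5 on the same paragraph would be roughly 279 tokens, though the post does not say which text that figure was measured on.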

Aditya Ghai@aditya_ghai07·Apr 7

Dataset: Hindi Wikipedia + ai4bharat/sangraha. Baseline cleaning, Wikipedia-heavy by design, cleaner signal over raw web crawl. Built 500M token buffers and cycled them during training instead of one static pass. ~4B tokens seen total. Simple pipeline, intentional data ordering.
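
One way to picture buffer cycling is a generator that walks through a list of token buffers in order and wraps around until a token budget is spent. This is a sketch under stated assumptions, not the author's pipeline: `cycle_buffers`, its parameters, and the in-memory lists are all hypothetical, and the post does not say whether buffers were repeated or refreshed. With the figures quoted, ~4B tokens over 500M-token buffers comes to roughly eight buffer loads.

```python
from typing import Iterator, List, Sequence


def cycle_buffers(
    buffers: Sequence[Sequence[int]],
    total_tokens: int,
    batch_tokens: int,
) -> Iterator[List[int]]:
    """Yield fixed-size token batches, moving buffer to buffer and wrapping around.

    Stops once at least `total_tokens` tokens have been yielded.
    """
    if not buffers or any(len(b) < batch_tokens for b in buffers):
        raise ValueError("every buffer must hold at least one full batch")

    seen = 0
    buf_idx = 0
    while seen < total_tokens:
        buf = buffers[buf_idx % len(buffers)]  # wrap back to the first buffer
        for start in range(0, len(buf) - batch_tokens + 1, batch_tokens):
            if seen >= total_tokens:
                return
            yield list(buf[start:start + batch_tokens])
            seen += batch_tokens
        buf_idx += 1


if __name__ == "__main__":
    # Toy stand-ins for 500M-token buffers.
    toy = [list(range(0, 6)), list(range(100, 106))]
    for batch in cycle_buffers(toy, total_tokens=16, batch_tokens=3):
        print(batch)
```

In a real run each buffer would be loaded from disk when its turn comes rather than held in memory, and each DDP rank would read a disjoint slice of it.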

Aditya Ghai@aditya_ghai07·Apr 7

Built a 225M Hindi LLM from scratch. MLA + block attention residuals, DDP across 3× RTX 4090s, trained on ~4B tokens in bf16. Custom BPE tokenizer hits ~1.5 fertility, beating SOTA on Hindi. Fully open source.
