Aditya Ghai

130 posts

@aditya_ghai07

Building Vithos || ML Systems at Scale || 5X Hackathon Winner

Joined January 2024
179 Following · 29 Followers
Pinned Tweet
Aditya Ghai @aditya_ghai07
Training run: 10K steps, ~4B tokens, 225M params, 3× RTX 4090s on Novita AI. Throughput sat rock-solid at ~39.5K tok/s the whole run, and DDP was clean. Loss opened at 10.15 and closed at 2.73. The curve looks exactly like it should. Could've used Karpathy's autoresearch loop to squeeze more out of the setup. Chose not to. Wanted to see the best I could get by hand: scheduler, buffer cycling, DDP sync, the whole thing. You learn differently when you can't outsource the debugging. Still, the run stopped just shy of Chinchilla-optimal, and loss was still dropping at step 10K. The $25 budget made the call, not me.
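A quick back-of-the-envelope check shows these numbers hang together. This sketch assumes the ~39.5K tok/s figure is aggregate throughput across all 3 GPUs (the tweet doesn't say) and uses the common ~20 tokens-per-parameter Chinchilla rule of thumb; the constant names are illustrative.

```python
# Back-of-the-envelope check of the run's reported numbers.
# Assumption: 39.5K tok/s is aggregate throughput across all 3 GPUs.
PARAMS = 225_000_000        # 225M parameters
TOKENS = 4_000_000_000      # ~4B tokens seen
STEPS = 10_000              # optimizer steps
THROUGHPUT = 39_500         # tokens/second (aggregate, assumed)
CHINCHILLA_RATIO = 20       # rule of thumb: ~20 training tokens per parameter

tokens_per_step = TOKENS // STEPS                # tokens consumed per optimizer step
wall_clock_hours = TOKENS / THROUGHPUT / 3600    # time to stream every token once
tokens_per_param = TOKENS / PARAMS               # achieved tokens-per-parameter ratio
chinchilla_tokens = CHINCHILLA_RATIO * PARAMS    # approximate compute-optimal token budget

print(f"tokens/step:       {tokens_per_step:,}")
print(f"wall clock:        {wall_clock_hours:.1f} h")
print(f"tokens/param:      {tokens_per_param:.1f}")
print(f"Chinchilla budget: {chinchilla_tokens / 1e9:.1f}B tokens")
```

Under these assumptions the run works out to 400,000 tokens per step, about 28 hours of wall clock, and ~17.8 tokens per parameter against a ~4.5B-token budget, which is consistent with stopping "just shy" of Chinchilla-optimal.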
Ray🫧 @ravikiran_dev7
As a developer, which authentication provider do you prefer?
Nandini @N_and_ni
I'm starting DSA. Which playlist is better? 🤔 Striver? Love Babbar?
Aditya Ghai @aditya_ghai07
@sickdotdev Your family's health history, finally in one place. Upload a lab PDF. We extract every biomarker, track how it changes over time, and let you ask an AI that actually knows your history. For you and your whole family. [LIFETIME FREE] vithos.in
Sick @sickdotdev
Drop what you're building. Last time, 50k people saw it. Consider this marketing.
Aditya Ghai @aditya_ghai07
@Haezurath My LinkedIn seems fine, but I'm just really bad at X :(
Kacie Ahmed @Haezurath
Your personal brand can get you a job. I'm not joking. Drop a comment and I'll give you advice and actionable feedback on how to improve yours.
Suni @suni_code
Drop your project, I'll sign up, and let's drive traffic to your site...
Prajwal @0xPrajwal_
Applications for the Anthropic Fellows Program are now open. Chance to earn a $3,850/week stipend. Details + link in the first comment
amrit @amritwt
pitch me your app in three words
Aditya Ghai @aditya_ghai07
Vithos just got an update. Full-body reports can take a bit longer now, so we added email alerts: you'll get notified as soon as your report is ready. Custom SMTP setup done, thanks @Harshit_srv204. Take care of your parents' health. FREE FOR LIFE for the next few users.
omeed @omeedtehrani
I am hiring. DM me.
Atharva @attharrva15
I met so many goated people in this batch. Damn, everyone is so ambitious and trying their level best, and that's what I like. It's not about the money or placement; the people are genuinely ambitious, and that's why it's a great thing to join.
Harkirat Singh @kirat_tw

Super 30 2.0 ends today. 6 months ago, 78 people joined us onsite in Noida. Most of them were either in their final year or recently graduated. Today, 60 of 78 have been placed, at an average offer of $1.8k/mo and a median of $1.5k/mo. The next Super 30 starts 7th May; you can pre-register here: 100xschool.in/super30-admiss…

Aditya Ghai @aditya_ghai07
@iBhanuDahiya So sad. Happy that my college allows a year-long internship as well, and I get college credits for it. So I basically only had to study until 3rd year. Will be graduating in a few months btw.
Bhanu D — sys/acc @iBhanuDahiya
Indian education system is cooked fr. Btw, I'm from Maharaja Agrasen Institute of Technology, and I've got no issue saying this publicly because MAIT has never really provided anything useful. I needed an NOC for my full-time job. This is how it went:
Me: Sir, why am I getting detained? I'm working full-time and can provide employment confirmation and all required docs.
Mentor: Not in my hands. Talk to HOD.
Me: Sir, my mentor asked me to talk to you about stopping my detention and the NOC.
HOD: Not in my hands. Talk to Director.
Me: Sir, HOD asked me to talk to you about my job and detention in midterms.
Director: Who told you to take a job? Come to college and you won't be detained. Quit the job and come to college. If you want the NOC, come in 8th sem.
I absolutely love my college, it's amazing, peak college experience. If you value your sanity, highly recommend everyone to join MAIT 👍
Aditya Ghai @aditya_ghai07
The reason is simple. Byte-level BPE (GPT-2, LLaMA 3, DeepSeek) treats each Devanagari character as 3 tokens before any merges, one per UTF-8 byte. Ours starts from Unicode characters directly, so every matra, virama, and conjunct is built from whole characters from the start, never from byte fragments. You don't need a bigger vocab. You need the right alphabet.
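To see the 3-to-1 ratio concretely, this sketch counts the starting symbols for the word हिन्दी under both schemes. It only illustrates the initial alphabet a BPE trainer sees before any merges; it is not the author's tokenizer, and the helper name `initial_units` is hypothetical.

```python
import unicodedata

# "हिन्दी" (Hindi), written code point by code point so the example is unambiguous:
# HA, VOWEL SIGN I (matra), NA, VIRAMA, DA, VOWEL SIGN II (matra)
word = "\u0939\u093F\u0928\u094D\u0926\u0940"


def initial_units(text: str) -> dict:
    """Count the starting symbols before any BPE merges (hypothetical helper)."""
    return {
        "byte_level": len(text.encode("utf-8")),  # byte-level BPE: one symbol per UTF-8 byte
        "char_level": len(text),                  # character-level BPE: one symbol per code point
    }


for ch in word:
    # Every code point in the Devanagari block (U+0900-U+097F) takes 3 bytes in UTF-8.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<28} {len(ch.encode('utf-8'))} bytes")

print(initial_units(word))  # → {'byte_level': 18, 'char_level': 6}
```

Six characters become 18 starting symbols under byte-level BPE, so the merges have to spend vocabulary slots just reassembling characters before they can learn anything about Hindi morphology.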
Aditya Ghai @aditya_ghai07
General-purpose tokenizers are terrible at Hindi. GPT-4 takes 997 tokens for a 186-word Hindi paragraph. SmolDocling hits 1,088. DeepSeek-V3, which people assume is multilingual-aware, still clocks 558. The script is paying a tax it shouldn't have to.
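Fertility, the metric these comparisons boil down to, is simply tokens per word. A small sketch converting the reported counts, taking the 186-word count from the tweet as given (the names are illustrative):

```python
# Reported token counts for the same 186-word Hindi paragraph.
NUM_WORDS = 186
REPORTED_TOKENS = {"GPT-4": 997, "SmolDocling": 1088, "DeepSeek-V3": 558}


def fertility(num_tokens: int, num_words: int) -> float:
    """Average tokens per word; lower means the tokenizer wastes less context."""
    return num_tokens / num_words


for name, n_tokens in REPORTED_TOKENS.items():
    print(f"{name:<12} {fertility(n_tokens, NUM_WORDS):.2f} tokens/word")

# The ~1.5 fertility claimed for the custom tokenizer would put the
# same paragraph at about 1.5 * 186 tokens.
print(round(1.5 * NUM_WORDS))
```

GPT-4 lands at about 5.36 tokens per word, SmolDocling at 5.85, and DeepSeek-V3 at 3.00. At ~1.5 fertility the same paragraph would take about 279 tokens, half of DeepSeek-V3's 558.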
Aditya Ghai @aditya_ghai07
Dataset: Hindi Wikipedia + ai4bharat/sangraha. Baseline cleaning only; the mix is Wikipedia-heavy by design, for a cleaner signal than raw web crawl. Built 500M-token buffers and cycled through them during training instead of making one static pass. ~4B tokens seen in total. Simple pipeline, intentional data ordering.
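The tweet doesn't spell out the buffer-cycling mechanics, so here is one plausible reading as a minimal sketch: draw random training windows from one token buffer for a fixed number of steps, then rotate to the next buffer and wrap around. `cycle_buffers` and its parameters are hypothetical, and the toy buffers stand in for the real 500M-token ones (at that scale you would load each buffer from disk lazily instead of keeping them all in memory). At ~4B tokens seen, that is about 8 buffers' worth of data.

```python
import itertools
import random
from typing import Iterator, List, Sequence


def cycle_buffers(
    buffers: Sequence[Sequence[int]],
    seq_len: int,
    batch_size: int,
    steps_per_buffer: int,
    seed: int = 0,
) -> Iterator[List[List[int]]]:
    """Yield batches of (seq_len + 1)-token windows, rotating buffers round-robin.

    Each window holds seq_len inputs plus one extra token, so targets can be
    the inputs shifted by one position.
    """
    rng = random.Random(seed)
    for buf in itertools.cycle(buffers):            # wrap around after the last buffer
        max_start = len(buf) - (seq_len + 1)        # last valid window start
        if max_start < 0:
            raise ValueError("buffer is shorter than one training window")
        for _ in range(steps_per_buffer):           # stay on this buffer for a fixed number of steps
            starts = [rng.randint(0, max_start) for _ in range(batch_size)]
            yield [list(buf[s : s + seq_len + 1]) for s in starts]


# Two toy "buffers" of token ids; real ones would hold ~500M tokens each.
toy_buffers = [list(range(0, 1000)), list(range(1000, 2000))]
stream = cycle_buffers(toy_buffers, seq_len=8, batch_size=4, steps_per_buffer=3)
for step, batch in enumerate(itertools.islice(stream, 9)):
    print(step, "buffer", batch[0][0] // 1000, "window starts at token", batch[0][0])
```

Steps 0-2 draw from the first buffer, 3-5 from the second, and 6-8 wrap back to the first. Fixing the seed keeps the data ordering reproducible, which matters when the ordering is deliberate.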
Aditya Ghai @aditya_ghai07
Built a 225M-parameter Hindi LLM from scratch. MLA (multi-head latent attention) + block attention residuals, DDP across 3× RTX 4090s, trained on ~4B tokens in bf16. The custom BPE tokenizer hits ~1.5 fertility, beating SOTA tokenizers on Hindi. Fully open source.