Harsh Bhatt

1.8K posts

Harsh Bhatt

@harshbhatt7585

20 | reinforcing agents, prev RL @softmaxresearch, 3x ml at startup

simulation · Joined October 2022
635 Following · 2.1K Followers
Pinned Tweet
Harsh Bhatt @harshbhatt7585
I just uploaded a video on implementing Qwen 3.5 from scratch.
> Implement RoPE
> Implement Group-Query Attention
> Implement Recurrent Linear Attention
> Implement KV Cache Management
> Implement Decoder
Check it out here: Coding Qwen 3.5 LLM from scratch! youtu.be/wzW7Kf7sDvU
14 replies · 65 reposts · 659 likes · 21.6K views
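The RoPE step from the video outline can be sketched in plain Python (a minimal, non-optimized sketch of the standard rotary-embedding formula; this is not code from the video):

```python
import math

def rope(x, pos, base=10000.0):
    """Apply Rotary Position Embedding to a single vector `x` at position `pos`.
    Each pair of dimensions (2i, 2i+1) is rotated by angle pos * base^(-2i/d)."""
    d = len(x)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)        # lower frequencies at higher dims
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s      # 2x2 rotation of the pair
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

The useful property: each rotation preserves the vector's norm, and the dot product of a rotated query and key depends only on their relative position, which is why attention scores become translation-invariant in position.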
Harsh Bhatt @harshbhatt7585
@vbppl it gave a push of ~+200-300 tokens/sec
2 replies · 0 reposts · 0 likes · 5 views
Harsh Bhatt @harshbhatt7585
@vbppl I will post about the actual margin gain!
1 reply · 0 reposts · 0 likes · 9 views
Harsh Bhatt @harshbhatt7585
Training at 1M tokens/s. Here's how I'm doing it.

I've been reading the nanochat training pipeline, and it's a great example of squeezing the model for high-throughput training.

TorchAO is really great because it quantizes not the entire model but selected layers. TorchAO is a PyTorch-native optimization library for quantization, sparsity, and low-precision training.

I'm selecting the layers where the computation is dense, mostly the heavy matrix-multiplication parts, and converting those to float8; for the other parts I'm using bfloat16. That means faster matrix multiplications and better GPU throughput.

TorchAO already supports float8 training, and PyTorch reports speedups of up to around 1.5x at large training scale with torch.compile. That's how you squeeze more tokens per second out of the same hardware, and it's one of the underrated parts of training LLMs.
4 replies · 2 reposts · 42 likes · 1.7K views
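The selective-float8 idea can be illustrated with a toy round-trip in plain Python (a sketch of per-tensor scaled e4m3 quantization, modelled here as 3 mantissa bits; this is not the TorchAO API):

```python
import math

E4M3_MAX = 448.0  # largest finite value in the float8 e4m3 format

def quantize_fp8_e4m3(values):
    """Simulate per-tensor scaled float8 (e4m3) quantization: scale the tensor
    so its absolute max maps to E4M3_MAX, round-trip through a coarse
    3-mantissa-bit grid, then unscale. Shows the error float8 training accepts
    in exchange for faster matmuls."""
    amax = max(abs(v) for v in values) or 1.0
    scale = E4M3_MAX / amax

    def to_e4m3(v):
        if v == 0.0:
            return 0.0
        e = math.floor(math.log2(abs(v)))
        step = 2.0 ** (e - 3)  # 3 mantissa bits -> 8 representable steps per octave
        return round(v / step) * step

    return [to_e4m3(v * scale) / scale for v in values]
```

Relative error stays within roughly 2^-4 (~6%), which dense matmuls tolerate well; that is the trade behind converting only the heavy linear layers to float8 while keeping the rest in bfloat16.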
Harsh Bhatt @harshbhatt7585
@vbppl linear layers are heavy, so we can convert them to float8
1 reply · 0 reposts · 0 likes · 17 views
Rocky @Rocky_T07
@harshbhatt7585 You'll like Vol 2 even more. It'll come in handy for training new LLMs :)
1 reply · 0 reposts · 1 like · 51 views
Rocky @Rocky_T07
Harvard's CS249r course has its main book in two volumes, along with all labs and tinytorch, available right in the browser. Be the first to try it out: harvard-edge.github.io/cs249r_book_de… This is the new dev version, which will be released as stable once we're done with all the testing. Try it out, folks!!
2 replies · 0 reposts · 6 likes · 110 views
Suyash Jain @Suyash151504
@harshbhatt7585 Bro, I am a big fan of your tutorial videos, they are so practical to learn from. Can you please make a video on the above thing you are doing 🙏🏻
1 reply · 0 reposts · 1 like · 86 views
Rocky @Rocky_T07
@harshbhatt7585 This is a really smart move, perfect ML systems engineering as it should be. W!!
1 reply · 0 reposts · 1 like · 104 views
Kylin Shaw @ShawKylin
Proud to share that the U.S. Army, 1st Cavalry Division, has reached a deployment agreement with Hippos. 1CD is one of the most storied combat divisions in American military history. Phase 1 and 2 deployment begins this year, with a pathway toward 2,000-unit ($2M) division-wide procurement in 2027. As a 21-year-old, all I could have asked for was a place where I could freely dream the impossible, a team to build shoulder to shoulder with, and a purpose greater than myself. America gave me all three. This is the beginning of paying back a debt of gratitude I will carry for life. 🇺🇸
36 replies · 12 reposts · 101 likes · 13.7K views
Arnav Mehta | CAMB.AI
Performing a peak-energy experiment right now because I have too many things to get done by tomorrow. I've combined:
- Protein Coffee from Potential (L-Theanine)
- A Black Coffee
- A Cappuccino
- Casa Pons energy leaves bought from Dubai
Will update results after a quick nap
2 replies · 0 reposts · 6 likes · 147 views
Krishiv @KrishivThakuria
Turned 18 today. Grateful for everything tech did for me as a kid. I hope to give back soon.
16 replies · 0 reposts · 42 likes · 1.1K views
Harsh Bhatt @harshbhatt7585
@Rocky_T07 It's your profile picture that's set things on fire, Rocky bhaiiii!
1 reply · 0 reposts · 1 like · 54 views
Harsh Bhatt @harshbhatt7585
I am implementing and training a diffusion language model from scratch. The challenge: no eating my favourite food until I figure out how to get decent inference performance!
> Implementing training infrastructure and scaling the training
> Optimising KV cache performance to speed up the transformer blocks
> Implementing Diffusion Blocks that can predict tokens in parallel!
lessgoo!
18 replies · 10 reposts · 187 likes · 9.7K views
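The "predict tokens in parallel" part of a diffusion LM can be sketched as MaskGIT-style iterative parallel decoding (a toy sketch, not the author's code; `predict` is a hypothetical stand-in for the model):

```python
MASK = "<mask>"

def parallel_denoise(tokens, predict, steps=4):
    """Toy discrete-diffusion decoding: start from a (partially) masked
    sequence; each step fills every masked position in parallel, then
    re-masks the least confident positions so later steps can revise them."""
    seq = list(tokens)
    for step in range(steps):
        # `predict(seq, i)` returns a (token, confidence) proposal per position
        proposals = [predict(seq, i) for i in range(len(seq))]
        for i, (tok, _conf) in enumerate(proposals):
            if seq[i] == MASK:
                seq[i] = tok
        # keep a growing fraction each step; re-mask the least confident rest
        n_remask = int(len(seq) * (1 - (step + 1) / steps))
        for i in sorted(range(len(seq)), key=lambda j: proposals[j][1])[:n_remask]:
            seq[i] = MASK
        if n_remask == 0:
            break
    return seq
```

Unlike an autoregressive decoder, every position is proposed in one forward pass per step, so decoding cost scales with the number of refinement steps rather than with sequence length.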