Aditya Kulshrestha

474 posts

@Aditya_kul02

GenAI & Solutions @ Intel | Ex https://t.co/Ted7LL3YNQ, https://t.co/oDR2AwJqny | Part time sasta Philosopher

Joined February 2024
445 Following · 55 Followers
Aditya Kulshrestha@Aditya_kul02·
What are some must-read RL papers apart from the classic RL techniques? I am interested in papers focusing more on rewards, exploitation, efficiency, and less popular applications.
Aditya Kulshrestha@Aditya_kul02·
@maharshii Finally someone talking about it. It's not just about writing the fastest kernel but about the tradeoffs of system-level optimization: how it gets impacted by concurrency and by other processes fighting for bandwidth.
maharshi@maharshii·
the deeper i go into ml optimizations the more i realize that it is a system design problem. fast kernels are important yes, but how you integrate them matters a lot. the best part is there’s no one way to do things, you can be very creative and still get significant speedups!
Saurabh Kumar@drummatick·
@0xSero In what way is SGLang better than vLLM? Community? Speed? Concurrency? Day 0 rollouts? Features?
0xSero@0xSero·
SGLang > vLLM > ExLlamaV3 > llama.cpp. IF you have any Nvidia hardware, this is the play.
Asiya@asiyayyay·
@sheeeevam who says SDEs are not good writers 🫶🏻
MSIZI 🇿🇦@msiziworld·
What kind of mosquito is this?😳😳
[image]
maharshi@maharshii·
gonna send this to start conversations with all my friends from now on
[image]
Manthan Gupta@manthanguptaa·
Book Review

Inference Engineering is best approached as a map of the space rather than something that teaches you how to actually do it. The book does a really good job of laying out the breadth of the field as it touches topics like GPUs, infra, serving patterns, and production concerns, and gives a solid sense of what exists and how the pieces fit together. In that sense, it feels like a "Wikipedia for inference engineering."

But that breadth comes with a tradeoff. It's trying to compress an entire emerging field into a small number of pages, so a lot of concepts don't get the depth needed to really click. You often come away knowing that something is important, but not fully understanding why or how it works in practice. The biggest gap for me was the lack of concrete examples. More real-world scenarios or step-by-step breakdowns would have made a huge difference in building intuition.

Overall, it's useful if your goal is to get oriented and understand the landscape. But if you are trying to build real intuition or actually learn how to design and optimize inference systems, it's not enough on its own. Feels like a great reference to revisit once you have already started building, not something you rely on as your primary learning resource.
Aditya Kulshrestha@Aditya_kul02·
@_avichawla What does Flash Attention have to do with context-length expansion, especially in the training part? It is used to speed up the process, but it's not a technique to expand the context length.
Avi Chawla@_avichawla·
You're in a Research Scientist interview at OpenAI. The interviewer asks: "How would you expand the context length of an LLM from 2K to 128K tokens?" You: "I will fine-tune the model on longer docs with 128K context." Interview over. Here's what you missed:
Aditya Kulshrestha@Aditya_kul02·
@kadirnardev Did you face catastrophic forgetting? My experiments yielded null audio even though the loss suggested otherwise.
Kadir Nar@kadirnardev·
I started training a new TTS model supporting Japanese and English using the Mimi and Qwen3-0.6 models. I did several experiments with Mimi before and think it's the best option for LLM-based TTS models. When I just increase the codebook value, model training slows down quite a bit.
Data: Emilia subset (en) + private ja (3M samples)
GPU: 8xB200
Model: Qwen3 + Mimi (32 codebooks)
Total time: 150 hours
Arnie Ramesh@arnie_hacker·
@Aditya_kul02 Despite ~3 weeks of back & forth & $10K credits, they won’t grant me more than 8 G vCPUs
Arnie Ramesh@arnie_hacker·
Can you believe I am CPU bound on AWS?
Aditya Kulshrestha@Aditya_kul02·
Please don't judge me but I use hf for file sharing and storage. Lately, more often than usual. Hope they don't block me.
aryash jain@Aaryash747·
Sometimes I watch a movie and think, "Why did they make this movie?"
Aditya Kulshrestha@Aditya_kul02·
Breakdown:
- queue: like a waiter that takes the instructions; handles data movement, prefetch, and synchronization.
- Unified Shared Memory allocation: host and accelerator can both point to the same memory pointer.
- q.parallel_for: launch N parallel workers on the queue (q).
- .wait(): sync.
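The breakdown above maps onto a small SYCL program. A minimal sketch, assuming a SYCL toolchain such as Intel's DPC++ (`icpx -fsycl`); the array size `N` and the doubling kernel are illustrative, not from the original post:

```cpp
// Minimal SYCL sketch: queue, Unified Shared Memory, parallel_for, wait.
#include <sycl/sycl.hpp>

int main() {
    constexpr size_t N = 1024;

    // The queue is the "waiter": it accepts kernels, copies, and syncs.
    sycl::queue q;

    // Unified Shared Memory: one pointer usable on both host and accelerator.
    float *data = sycl::malloc_shared<float>(N, q);
    for (size_t i = 0; i < N; ++i) data[i] = 1.0f;

    // Launch N parallel workers; each one doubles a single element.
    q.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) {
        data[i] *= 2.0f;
    });

    // Block until the kernel finishes before touching data on the host.
    q.wait();

    sycl::free(data, q);
    return 0;
}
```

Because the allocation is shared, there are no explicit host-to-device copies in this sketch; the runtime migrates the pages as needed.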
Aditya Kulshrestha@Aditya_kul02·
Finally getting back to it. Started learning about SYCL programming: a hardware-agnostic programming model built on top of OpenCL by the Khronos Group.
- OpenCL: hardware-agnostic compute framework; uses different compilers for host and accelerator.
- SYCL: abstraction over OpenCL.
- DPC++: Intel's extension.
Aditya Kulshrestha@Aditya_kul02

This is a reminder for me to start building a model that can write good kernels. The challenge is the low availability of data and the small amount of kernel code in the pretraining corpora of current coding models.
