Nitin Kedia

20 posts

@nitinkedi

CS PhD Student at @UTAustin | ex @MSFTResearch @zetasuite @IITGuwahati | Systems for ML

Joined May 2023
74 Following · 16 Followers
Nitin Kedia reposted
Pratyush Kumar @pratykumar
Drop 13/14: The 30B and 105B models, benchmarks, and HF links will all come. But today it is a drop about people. About how our team of just 15 folks gave it their all to do what many doubted was doable, i.e., train usefully large, globally competitive models from scratch in India. This team of 15 has now firmly launched @sarvam into its second innings. Yes, we can! @_mohit_singla @anand_404 @kediaharshit9 @AashaySachdeva @sumanthd17 @ArpitDwivedi100 @HarveenChadha @rkal4 @sushil_khyalia @ManavSinghal157 @sohampetkar missing in the picture - @selfawareatom @AnnaUpreti Anand @MeghMakwan33973 Utkarsh
Nitin Kedia reposted
kwatra @kwatra
TokenWeave – Efficient Compute-Communication Overlap for Distributed LLM Inference. Why? Even with high-speed NVLink on an H100 DGX, communication overhead for distributed LLM inference can exceed 20%! Can we recover this overhead? (1/10)
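The overlap idea in the tweet can be sketched in plain Python: split work into chunks and keep the "communication" of one chunk in flight on a background thread while the next chunk's "compute" runs. This is a hedged illustration of the scheduling pattern only, not TokenWeave's implementation; real systems overlap collectives with kernels on separate GPU streams and copy engines, and the function names here are made up.

```python
# Hedged sketch of compute-communication overlap: while chunk i's
# (simulated) all-reduce runs on a background thread, chunk i+1's
# compute proceeds on the main thread.
from concurrent.futures import ThreadPoolExecutor

def compute(chunk):            # stand-in for a GPU kernel
    return [x * 2 for x in chunk]

def communicate(chunk):        # stand-in for an all-reduce
    return sum(chunk)

def overlapped_pipeline(chunks):
    results, pending = [], None
    with ThreadPoolExecutor(max_workers=1) as comm:
        for chunk in chunks:
            out = compute(chunk)            # compute current chunk...
            if pending is not None:         # ...while previous comm is in flight
                results.append(pending.result())
            pending = comm.submit(communicate, out)
        if pending is not None:
            results.append(pending.result())
    return results
```

With perfect overlap, total time approaches max(compute, communication) per chunk instead of their sum, which is how the >20% communication overhead could be hidden.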
Nitin Kedia reposted
Vima Gupta @vima_gupta
1/7 🧵 MoEs: A tale of expectation vs. reality
Marketing: "Only compute the expert parameters you need!"
Reality: Batch 16 requests → ALL experts activate.
At serving time (vLLM/TGI), arithmetic intensity: AI ≈ (num_tokens * top_k) / total_experts
In simpler terms: your decode arithmetic intensity scales inversely with expert count 🤔
#MoE #LLMs #ChatGPT #Claude #vllm #AI #ML
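The estimate in the thread can be made concrete with a few lines of Python, assuming the simplified formula as stated (uniform routing, tokens spread evenly across experts):

```python
def moe_decode_arithmetic_intensity(num_tokens, top_k, total_experts):
    """Approximate per-expert decode arithmetic intensity for an MoE layer,
    per the simplified estimate AI ~ (num_tokens * top_k) / total_experts.
    Each expert sees this many tokens on average, so its weights are
    amortized over that few activations."""
    return num_tokens * top_k / total_experts

# Batch of 16 decode tokens, top-2 routing, 64 experts:
# each expert processes ~0.5 tokens on average -> heavily memory-bound.
ai = moe_decode_arithmetic_intensity(16, 2, 64)
```

This is why the "only compute what you need" pitch backfires at serving time: the same batch spread over more experts means fewer tokens per expert, so each expert's weight reads are amortized over less compute.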
Nitin Kedia reposted
Amey Agrawal @agrawalamey12
@Google has silently but surely developed an edge over @OpenAI. Long context processing seems to be the key to Google's AI strategy. NotebookLM is a prime example of what long context processing can unlock. In our latest paper, we talk about how systems can be built to support multi-million-token context lengths, matching Google's capabilities. In case you missed the paper, here is the NotebookLM-generated podcast! Podcast: notebooklm.google.com/notebook/764f5… Arxiv: arxiv.org/abs/2409.17264
Nitin Kedia @nitinkedi
Are you getting the performance you paid for from your LLM provider? Benchmark it with Metron. It is one of our biggest learnings from working on LLM inference over the last year at @MSFTResearch and @gtcomputing, where we shipped Chunked Prefill at OSDI'24 and Vidur at @MLSysConf.
Amey Agrawal @agrawalamey12

🚀 Introducing Metron: Redefining LLM Serving Benchmarks! 📊 Tired of misleading metrics for LLM performance? Our new paper introduces a holistic framework that captures what really matters - the user experience! 🧠💬 github.com/project-metron… #LLM #AI #Benchmark

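The "metrics that capture user experience" idea can be sketched from per-token arrival timestamps. This is a generic illustration in the spirit of such benchmarks, not Metron's actual API; the function names and the choice of metrics (time-to-first-token and p99 time-between-tokens) are assumptions here.

```python
# Hypothetical user-facing serving metrics from per-token timestamps:
# TTFT (how long until the first token appears) and p99 TBT (the worst
# stutters between consecutive tokens), rather than raw throughput alone.

def ttft(request_start, token_times):
    """Time-to-first-token: delay the user sees before output begins."""
    return token_times[0] - request_start

def tbt_p99(token_times):
    """p99 time-between-tokens: captures stalls mid-generation."""
    gaps = sorted(b - a for a, b in zip(token_times, token_times[1:]))
    return gaps[int(0.99 * (len(gaps) - 1))]
```

The point of tail TBT over mean throughput is that a stream averaging 50 tok/s can still stall for a second mid-answer, and that stall is what the user notices.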
Nitin Kedia reposted
fly51fly @fly51fly
[LG] Vidur: A Large-Scale Simulation Framework For LLM Inference arxiv.org/abs/2405.05465
- This paper presents Vidur, a high-fidelity and easily extensible simulator for large language model (LLM) inference, along with a benchmark and search suite.
- Vidur models the performance of LLM operators using a combination of experimental profiling and predictive modeling, and evaluates end-to-end inference performance for different workloads.
- It estimates metrics like latency, throughput, model FLOPs utilization, and memory utilization with high accuracy.
- Vidur addresses challenges unique to simulating LLM inference, like finer time granularity, varying iteration times, and cascading errors.
- It uses insights like the architectural uniformity of LLMs, operator triaging, and automated profiling for parallelism strategies to achieve fidelity.
- Vidur-Search uses Vidur to automatically identify optimal cost-effective deployment configurations meeting performance constraints.
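The "experimental profiling + predictive modeling" combination the summary describes can be illustrated with a toy: profile each operator at a few input sizes, fit a simple runtime model per operator, then predict iteration latency by summing the predictions. None of these names or numbers come from Vidur itself; this is a minimal sketch of the general idea, assuming linear-in-tokens operator runtimes.

```python
# Hypothetical sketch: fit per-operator runtime models from profiled
# (input size, runtime) samples, then predict end-to-end iteration latency.

def fit_linear(xs, ys):
    """Least-squares fit of runtime ~ a*x + b from profiled samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Made-up profiled runtimes (ms) at a few token counts per operator.
profile = {"mlp":  ([128, 256, 512], [0.9, 1.7, 3.3]),
           "attn": ([128, 256, 512], [0.5, 1.1, 2.3])}
models = {op: fit_linear(xs, ys) for op, (xs, ys) in profile.items()}

def predict_iteration(num_tokens):
    """Predicted iteration latency: sum of per-operator predictions."""
    return sum(a * num_tokens + b for a, b in models.values())
```

The payoff of a simulator built this way is that a deployment search (tensor/pipeline parallelism, batch sizes) can be evaluated in seconds of CPU time instead of hours of GPU time.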
Nitin Kedia @nitinkedi
We at @MSFTResearch and @GeorgiaTech believe that running LLMs shouldn't be so expensive 💵 So we built a tool 🛠️ that will enable you to run them cheaper. Introducing Vidur👳🏽, the first LLM inference system simulator. #mlsys #vllm #llm #llama #gpt
Nitin Kedia reposted
Amey Agrawal @agrawalamey12
1/ LLM inference systems are like high-performance engines ⚙️—complex, powerful, and full of intricate settings. Efficiently deploying them to maximize GPU performance is a challenge typically tackled by experts at orgs like @OpenAI and @AIatMeta 🚀. 🧵