Jayashree Mohan

76 posts

Jayashree Mohan

@jayashree2912

Researcher | Microsoft Research India

Bengaluru, India Katılım Kasım 2015

351 Takip Edilen663 Takipçiler

Jayashree Mohan retweetledi

kwatra@kwatra·20 May

TokenWeave – Efficient Compute-Communication Overlap for Distributed LLM Inference. Why? Even with highspeed NVLink on H100 DGX, communication overhead for distributed LLM inference can be > 20 %! Can we recover this overhead? (1/10)

English

1.4K

Jayashree Mohan@jayashree2912·16 Tem

Check out this exciting recent work from our group that shows why legacy eval metrics like accuracy don't completely capture the compressed LLM model behavior and the metrics you should actually look out for! Read details here...

Abhinav Dutta@abhinavdutta555

🚨 Are LLM compression methods (𝘲𝘶𝘢𝘯𝘵𝘪𝘻𝘢𝘵𝘪𝘰𝘯, 𝘱𝘳𝘶𝘯𝘪𝘯𝘨, 𝘦𝘢𝘳𝘭𝘺 𝘦𝘹𝘪𝘵) too good to be true and are existing eval metrics sufficient? We've looked into it in our latest research at @MSFTResearch 🧵 (1/n) arxiv.org/abs/2407.09141

English

2.2K

Jayashree Mohan@jayashree2912·14 Tem

Conventional latency and throughput metrics are insufficient in capturing user-facing performance for interactive LLM applications. Introducing our effort : Metron, a benchmark with fluidity metric and fluidity token generation rate that captures user experience!

Amey Agrawal@agrawalamey12

🚀 Introducing Metron: Redefining LLM Serving Benchmarks! 📊 Tired of misleading metrics for LLM performance? Our new paper introduces a holistic framework that captures what really matters - the user experience! 🧠💬 github.com/project-metron… #LLM #AI #Benchmark

English

2.2K

Jayashree Mohan retweetledi

Alexey Tumanov@alsched·30 May

[proud advisor moment] Happy to see @agrawalamey12 work in close collaboration with MSR-India already enjoying adoption and impact at @anyscalecompute! Please refer to Sarathi-Serve (arxiv.org/abs/2403.02310) for the better reference and coverage of the chunked prefill mechanism.

Robert Nishihara@robertnishihara

One of @vllm_project's strengths is that it exposes the ability to trade off latency and throughput. However, higher qps regimes cause significant latency degradation. The underlying reason has to do with inference taking place in two stages: prefilling (processing the input context) and decoding (generating output tokens). Normally in vLLM, these stages don't happen at the same time and so an incoming request will trigger prefill computations that interrupt ongoing decoding (leading to a spike in latency). Chunked prefill breaks prefill computations into multiple "chunks" and batches them along with ongoing decoding computations to avoid interruptions. In a low qps regime, this makes no difference, but in the high qps regime with constant interrupts, this is a big deal. More details in this paper arxiv.org/pdf/2308.16369 by @agrawalamey12. Also in this RFC github.com/vllm-project/v….

English

1.4K

Jayashree Mohan@jayashree2912·12 Tem

Talk to @agrawalamey12 or @nitinkedi at #osdi24 to learn more about our recent work Sarathi-Serve, a scheduler that addresses the latency-throughput tradeoff in LLM serving.

Amey Agrawal@agrawalamey12

Did you ever feel that @chatgpt is done generating your response and then suddenly a burst of tokens show up? This happens when the serving system is prioritizing someone else’s request before generating your response. But why? well to reduce cost. 🧵

English

Jayashree Mohan@jayashree2912·4 Eyl

Checkout our recent work on speeding up token generation for LLM inference. See the tweet deck below to learn more! Full paper is now up on arXiv - arxiv.org/pdf/2308.16369…

Amey Agrawal@agrawalamey12

Ever wondered why @OpenAI charges 2x price for output tokens compared to input? Turns out that an output token can be up to 200x more compute time than an input token. Why? We explored this phenomenon during my internship at @MSFTResearch. 🧵

English

2.6K

Jayashree Mohan@jayashree2912·25 Nis

NEW! Spring Deadline : Eurosys 2023 Consider submitting your work to the Spring deadline (May 11 - abstract, May 18- full paper)

ACM SIGOPS@ACMSIGOPS

#CFP @EuroSys_conf 2023 will take place in Rome, #Italy. Eurosys adopts a new dual-deadline format. The first of the two deadlines (Spring & Fall) is 11-May-2022 (for abstract). CFP and other details can be found at #cfp" target="_blank" rel="nofollow noopener">2023.eurosys.org/cfp.html#cfp

English

Jayashree Mohan@jayashree2912·10 Eyl

@SuprShastri @UTSASLab @RohanKadekodi @aashaka_ @hlebland @Ponnapalli95 @vj_chidambaram @SekwonL Thank you Supreeth, I am humbled. We cubies must catch up sometime soon!

Austin, TX 🇺🇸 English

Supreeth Shastri@SuprShastri·10 Eyl

@jayashree2912 @UTSASLab @RohanKadekodi @aashaka_ @hlebland @Ponnapalli95 @vj_chidambaram @SekwonL Brilliant news, @jayashree2912 What you’ve done is honestly 1.5 PhDs. Guess everyone wanted to have you around for a bit longer! You have been a role model for so many (including me). I fondly remember my time as your cubemate. Fly high, Dr. Mohan 🎉

English

Jayashree Mohan@jayashree2912·9 Eyl

The @UTSASLab family 🤗 Successfully defended my PhD thesis today. Mixed feelings! Going to miss the enthusiasm and energy of this lab more than anything! Can’t believe time flew by in the blink of an eye! We missed you @SuprShastri !

English

Jayashree Mohan@jayashree2912·9 Eyl

@arun_kadekodi @UTSASLab @SuprShastri @RohanKadekodi @aashaka_ @hlebland @Ponnapalli95 @vj_chidambaram @SekwonL Thank you so much, uncle! 😊

Austin, TX 🇺🇸 English

Arun Kadekodi@arun_kadekodi·9 Eyl

@jayashree2912 @UTSASLab @SuprShastri @RohanKadekodi @aashaka_ @hlebland @Ponnapalli95 @vj_chidambaram @SekwonL Congratulations, Jayashree!

English

Jayashree Mohan@jayashree2912·9 Eyl

@TweetAtAKK @vj_chidambaram @UTSASLab Thank you Arun! 😊

Austin, TX 🇺🇸 English

Arun Kumar@TweetAtAKK·9 Eyl

@jayashree2912 @vj_chidambaram @UTSASLab Awesome news! Congrats on both the defense and joining MSRI, Jayashree! They are lucky to have you. 👍 And congrats to the rightfully beaming advisor @vj_chidambaram too! 😄

English

Jayashree Mohan@jayashree2912·9 Eyl

Wohoo. Back after a long break on Twitter! You now know what I was upto 😅 The past 5 years have been a beautiful experience and it’s hard to bid goodbye. So fortunate to have a caring and supportive advisor @vj_chidambaram and group @UTSASLab

Round Rock, TX 🇺🇸 English

135

Jayashree Mohan@jayashree2912·9 Eyl

@Ponnapalli95 @UTSASLab @SuprShastri @RohanKadekodi @aashaka_ @hlebland @vj_chidambaram @SekwonL Thanks so much Soujanya! Super happy to share this moment with you all 🤩

Austin, TX 🇺🇸 English

Soujanya Ponnapalli@Ponnapalli95·9 Eyl

@jayashree2912 @UTSASLab @SuprShastri @RohanKadekodi @aashaka_ @hlebland @vj_chidambaram @SekwonL Congratulations Jayashree! So happy you got to defend in-person and we got to celebrate with you! 🥳

English

Jayashree Mohan@jayashree2912·9 Eyl

@SriramRajamani @stub_AS @vj_chidambaram @UTCompSci @IndiaMSR @DebopamBhattac5 Thanks Sriram! Super excited and looking forward to joining @IndiaMSR !

English

Sriram Rajamani@SriramRajamani·9 Eyl

@stub_AS @vj_chidambaram @jayashree2912 @UTCompSci @IndiaMSR @DebopamBhattac5 Congratulations to Jayashree for defending her thesis! At MSR India, we are thrilled to have @jayashree2912 and @DebopamBattac5 join us as colleagues! Many thanks to Vijay and Ankit!

English

Jayashree Mohan@jayashree2912·9 Eyl

@siobhcroo @vj_chidambaram @UTSASLab @IndiaMSR Thank you so much, @siobhcroo !

English

Natacha Crooks@siobhcroo·9 Eyl

@jayashree2912 @vj_chidambaram @UTSASLab That was insanely quick (and yet very successful)!!! Congratulations and best of luck for the future! @IndiaMSR are lucky to have you.

English

Jayashree Mohan@jayashree2912·9 Eyl

@agrawalamod @vj_chidambaram @UTSASLab Thank you @agrawalamod! No plans of visiting Bay Area this time, but we’ll definitely catch when you’re in Bangalore!

Austin, TX 🇺🇸 English

Amod Agrawal@agrawalamod·9 Eyl

@jayashree2912 @vj_chidambaram @UTSASLab BIG Congratulations, Jayashree! Looking forward to you being on other side of the desk at MSRI. :) Meet up if you’re ever in the Bay Area.

English

Jayashree Mohan@jayashree2912·9 Eyl

@stub_AS @vj_chidambaram @UTCompSci @IndiaMSR @DebopamBhattac5 Thank you! That’s good to know :)

English

Ankit Singla@stub_AS·8 Eyl

@vj_chidambaram @jayashree2912 @UTCompSci @IndiaMSR Congratulations to both of you! @DebopamBhattac5 is joining MSR-Bangalore too, so maybe it helps to connect the two soon-to-be-new-MSR-researchers :)

English

Jayashree Mohan@jayashree2912·25 Haz

@Debolin17031191 @ShashidharNanj4 Sorry, missed this. Dm me if you're still unable to find a slot.

English

Debolina Chatterjee@Debolin17031191·21 Haz

@jayashree2912 @ShashidharNanj4 Hi Jayashree, if you don't mind may I ask how you are getting updates on the availability? This is help me a lot since I don't have to check the account so many times so that it geta frozen. Please help. Thanks for your prompt help so far.

English

Jayashree Mohan@jayashree2912·21 Haz

@Debolin17031191 @ShashidharNanj4 Oh no, sorry about that! I think all slots are gone. There were 50+ slots when I checked. If your account has been locked, maybe you should try after 48-72 hours.

English

Debolina Chatterjee@Debolin17031191·21 Haz

@jayashree2912 @ShashidharNanj4 Thanks a ton. Hey i tried Delhi. But it says error maximum number of times viewed this page!!!! Ufff I am so frustrated. Are they still available? Please let me know if you have any info. After how long should I try again?

English

Jayashree Mohan@jayashree2912·21 Haz

@Debolin17031191 @ShashidharNanj4 Delhi slots available now!

English

Jayashree Mohan@jayashree2912·21 Haz

@Debolin17031191 @ShashidharNanj4 Around 8:30 pm IST. I really don't think there's any pattern in slot availability. I had also been trying for more than a week with no luck.

English

Keşfet

@agrawalamey12 @anyscalecompute @nitinkedi @SuprShastri @UTSASLab @RohanKadekodi @aashaka_ @hlebland