SOUVIK KUNDU

218 posts

@thisissouvikk

IEEE/ACM DAC under 40 innovator'25 | INNS Young Investigator'24 | CPAL rising star'25 | SRC Outstanding Researcher'23 | RS @ Intel Labs | Opinions are personal.

Los Angeles (US), Kolkata (IND) · Joined April 2012
261 Following · 222 Followers
SOUVIK KUNDU@thisissouvikk·
🎉🎉Announcing the #ICML2026 workshop on "Multi-Modal Agentic AI": scale-icml-2026.github.io
Topics covered: 1. Agentic memory, 2. Efficient agentic AI systems, 3. Scaling of multi-modal agents, 4. Agents for planning, 5. Evaluation, guardrailing, and benchmarking of agents
🚀Confirmed speakers: @dasongle (GenBio) @mohitban47 (UNCCH) @james_y_zou (Stanford) @chelseabfinn (Stanford) @MengdiWang10 (Princeton) @sunjiao123sun_ (Google) @MikeShou1 (NUS) @MinhyukSung (KAIST)
With @HongyiWang10 @digbose92 @jaeh0ng_yoon @ManlingLi_ @nagsayan112358 @schowdhury671
#AgenticAI #MAS #MultimodalAI
[image attached]
0 replies · 4 reposts · 11 likes · 3.8K views
SOUVIK KUNDU@thisissouvikk·
As the #ICML2026 review deadline approaches (in a day), here is the current status of my AC lot. It seems like a race against time now: overall, we need to chase down around 44% of the missing reviews in a day, with only 12.5% of papers having received all four reviews. God bless peer reviewing; long live human reviews.
[image attached]
0 replies · 0 reposts · 0 likes · 239 views
#CVPR2026@CVPR·
The area chairs suggested 1,717 papers (which are not accepted to the #CVPR2026 main conference) for inclusion in the Findings workshop. Inclusion is subject to opt-in by the authors, and review by the Findings workshop organizers.
9 replies · 4 reposts · 45 likes · 15.4K views
SOUVIK KUNDU@thisissouvikk·
🚀#MLSys2026 The KV cache that grows with the CoT reasoning steps and interpretable responses of large reasoning models (LRMs) is a problem for both datacenter serving and PC inference. To address this, we present SkipKV, a two-way solution:
✅ Skip KV generation: a strategy to reduce redundant thoughts
✅ Skip KV storage: sentence- and semantics-aware cache eviction
👉Paper: arxiv.org/pdf/2512.07993
👉Code: to be released soon!
[image attached]
0 replies · 0 reposts · 7 likes · 137 views
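The "skip KV storage" idea — evicting cached entries at sentence granularity rather than token by token — can be illustrated with a toy sketch. Everything here (the importance-scoring function, the token budget, the data layout) is a hypothetical stand-in, not the paper's actual eviction criterion:

```python
# Toy sketch of sentence-level KV-cache eviction (NOT the SkipKV algorithm;
# the scores and budget below are hypothetical stand-ins).

def evict_sentences(kv_cache, sentence_scores, budget):
    """Retain the highest-scoring sentences' KV entries within `budget` tokens.

    kv_cache: dict mapping sentence_id -> list of per-token KV entries
    sentence_scores: dict mapping sentence_id -> importance score
    budget: maximum total number of token entries to retain
    """
    kept, used = {}, 0
    # Keep sentences in descending importance until the token budget is hit;
    # whole sentences are evicted together, never individual tokens.
    for sid in sorted(kv_cache, key=lambda s: sentence_scores[s], reverse=True):
        n = len(kv_cache[sid])
        if used + n <= budget:
            kept[sid] = kv_cache[sid]
            used += n
    return kept
```

The point of the sentence granularity is that evicting a whole low-importance thought keeps the remaining cache semantically coherent, which per-token eviction cannot guarantee.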
SOUVIK KUNDU retweeted
机器之心 JIQIZHIXIN@jiqizhixin·
Ever wonder why big AI models get so slow and memory-hungry when they "think"? Intel researchers find that current methods to speed them up actually break their reasoning and make them more verbose. The solution? SkipKV, a clever technique that compresses the thought process by removing redundant sentences, boosting both speed and accuracy. SkipKV: Selective Skipping of KV Generation and Storage for Efficient Inference with Large Reasoning Models Paper: arxiv.org/abs/2512.07993
[image attached]
4 replies · 20 reposts · 184 likes · 10.9K views
SOUVIK KUNDU@thisissouvikk·
#NeurIPS2025 We have been working on a new sampling method for autoregressive token generation in LLMs, particularly to push the boundary of the creativity-coherence balance!! Happy to share some initial success, published in our recently accepted #NeurIPS2025 paper:
𝗧𝗼𝗽-𝗛 𝗗𝗲𝗰𝗼𝗱𝗶𝗻𝗴: 𝗔𝗱𝗮𝗽𝘁𝗶𝗻𝗴 𝘁𝗵𝗲 𝗖𝗿𝗲𝗮𝘁𝗶𝘃𝗶𝘁𝘆 𝗮𝗻𝗱 𝗖𝗼𝗵𝗲𝗿𝗲𝗻𝗰𝗲 𝘄𝗶𝘁𝗵 𝗕𝗼𝘂𝗻𝗱𝗲𝗱 𝗘𝗻𝘁𝗿𝗼𝗽𝘆 𝗶𝗻 𝗧𝗲𝘅𝘁 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻
In this paper, we propose Top-H decoding, a new entropy-bounded sampling strategy that dynamically balances creativity and coherence in large language model generation. Our method achieves up to a 𝟮𝟱.𝟲𝟯% improvement over strong baselines like min-p while staying robust at high temperatures, which is especially useful for creative writing and open-ended generation. Work done in collaboration with @USCMingHsiehEE @USC !!!
📄 Read the paper: lnkd.in/eive6rsc
💻 Code: lnkd.in/e89h5ipU
[image attached]
0 replies · 0 reposts · 3 likes · 134 views
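The entropy-bounded idea can be sketched in miniature: keep adding the most probable tokens to the candidate set while the renormalized entropy of the kept set stays under a bound. This is a simplified illustration in the spirit of Top-H, not the paper's exact procedure; the threshold `tau` and the greedy stopping rule are assumptions:

```python
import math

# Toy sketch of entropy-bounded truncation sampling (a simplified illustration
# in the spirit of Top-H decoding; `tau` and this exact stopping rule are
# assumptions, not the paper's algorithm).

def entropy(ps):
    """Shannon entropy (in nats) of the renormalized probability list."""
    z = sum(ps)
    return -sum((p / z) * math.log(p / z) for p in ps if p > 0)

def entropy_bounded_truncate(probs, tau):
    """Keep the most probable token indices while the renormalized entropy
    of the kept set stays within `tau` nats."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = [order[0]]  # the argmax token is always kept
    for i in order[1:]:
        if entropy([probs[j] for j in kept + [i]]) > tau:
            break
        kept.append(i)
    return kept  # sample from probs renormalized over these indices
```

A small `tau` collapses toward greedy decoding (coherence); a large `tau` admits the long tail (creativity), which is the knob the tweet's creativity-coherence framing refers to.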
SOUVIK KUNDU retweeted
Amey Agrawal@agrawalamey12·
After hitting evaluation puzzles like this in our own work, we analyzed patterns across LLM inference papers and identified 8 systematic evaluation issues that can make performance comparisons misleading. We have compiled a practical evaluation checklist to help avoid these pitfalls. 📄 arxiv.org/abs/2507.09019 We're also releasing Veeksha, our comprehensive LLM inference evaluation framework, later this month to help the community design more robust benchmarks! 🛠️ What evaluation issues have you discovered in your systems work? Let's learn from each other's mistakes! @nitinkedi @jayashree2912 @kwatra @thisissouvikk @ramaramjee @alsched @gtcomputing @MSFTResearch @intel
0 replies · 3 reposts · 5 likes · 578 views
SOUVIK KUNDU retweeted
NeurIPS Conference@NeurIPSConf·
We're excited to announce a second physical location for NeurIPS 2025, in Mexico City. By expanding our physical locations, we hope to address concerns around skyrocketing attendance and difficulties in obtaining travel visas that some attendees have experienced in the past few years when only one location was available. Read more in our blog post: blog.neurips.cc/2025/07/16/neu…
21 replies · 80 reposts · 621 likes · 206.9K views
SOUVIK KUNDU@thisissouvikk·
It is a pleasant surprise to be recognized as one of the three young investigators of 2024 (award year #IJCNN2025), as conferred by the @INNSociety Congratulations to the other recipients @sijialiu17 and @ Bo Han. Now it's time to go back to the black screen and meetings with the awesome collaborators.. :)
[image attached]
1 reply · 0 reposts · 6 likes · 222 views
SOUVIK KUNDU@thisissouvikk·
#NeurIPS2025 #reviewingstatus Dear reviewers of @NeurIPSConf , today is the deadline to submit your reviews. Please do so ASAP. Reviewing responsibility should be evaluated more strictly for such conferences to maintain good-quality and timely reviews! Here is a snippet of the current review completion status under my AC lot. Let's get this done before the date changes.
[image attached]
0 replies · 0 reposts · 2 likes · 507 views
SOUVIK KUNDU@thisissouvikk·
With the rapid growth of image and video generative tasks, their efficient and adaptable deployment has become essential. The denoising steps of diffusion models are the key mechanism for generating an image from noise, and they are also the key bottleneck for a diffusion model's generation latency. While there is ongoing work on improving efficiency by reducing time steps and performing low-precision diffusion operations, we take an orthogonal route: adaptively choosing among denoising and generative models. Specifically, we introduce a conglomeration of models dubbed a "𝗺𝗶𝘅𝘁𝘂𝗿𝗲 𝗼𝗳 𝗱𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻 𝗺𝗼𝗱𝗲𝗹𝘀" (𝗠𝗼𝗗𝗠) that can serve as the fundamental block of an efficient image generation system. This work has been accepted at the Int. Conference on Architectural Support for Programming Languages and Operating Systems (#ASPLOS) 2026, with a 𝟗̲.̲𝟓̲%̲ ̲𝐚̲𝐜̲𝐜̲𝐞̲𝐩̲𝐭̲𝐚̲𝐧̲𝐜̲𝐞̲ ̲𝐫̲𝐚̲𝐭̲𝐞̲ 🥶 !!!
🚀𝗞𝗲𝘆 𝘁𝗮𝗸𝗲𝗮𝘄𝗮𝘆𝘀:
-----------------------
🌟We present MoDM, a novel 𝗰𝗮𝗰𝗵𝗶𝗻𝗴-𝗯𝗮𝘀𝗲𝗱 𝘀𝗲𝗿𝘃𝗶𝗻𝗴 𝘀𝘆𝘀𝘁𝗲𝗺 𝗳𝗼𝗿 𝗱𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻 𝗺𝗼𝗱𝗲𝗹𝘀 that dynamically balances latency and quality through a mixture of diffusion models.
🌟We design a 𝗴𝗹𝗼𝗯𝗮𝗹 𝗺𝗼𝗻𝗶𝘁𝗼𝗿 𝘁𝗵𝗮𝘁 𝗼𝗽𝘁𝗶𝗺𝗮𝗹𝗹𝘆 𝗮𝗹𝗹𝗼𝗰𝗮𝘁𝗲𝘀 𝗚𝗣𝗨 𝗿𝗲𝘀𝗼𝘂𝗿𝗰𝗲𝘀 and balances inference workload, ensuring high throughput while meeting Service-Level Objectives (SLOs) under varying request rates.
🌟Our evaluations show that MoDM significantly 𝗿𝗲𝗱𝘂𝗰𝗲𝘀 𝗮𝘃𝗲𝗿𝗮𝗴𝗲 𝘀𝗲𝗿𝘃𝗶𝗻𝗴 𝘁𝗶𝗺𝗲 𝗯𝘆 𝟮.𝟱× 𝘄𝗵𝗶𝗹𝗲 𝗿𝗲𝘁𝗮𝗶𝗻𝗶𝗻𝗴 𝗶𝗺𝗮𝗴𝗲 𝗾𝘂𝗮𝗹𝗶𝘁𝘆, making it a practical solution for scalable and resource-efficient model deployment.
Work done with @TalatiNishil (@UMich)
[image attached]
0 replies · 0 reposts · 1 like · 111 views
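The caching-based routing at the heart of such a system can be sketched as a toy request router: on a near-duplicate cache hit, a cheap model refines the cached latent in a few denoising steps; otherwise the large model generates from scratch. The similarity metric, threshold, and model interfaces below are all hypothetical illustrations, not MoDM's actual design:

```python
# Toy sketch of a mixture-of-diffusion-models request router, in the spirit
# of MoDM's caching-based serving. The similarity function, threshold, and
# model call signatures are hypothetical stand-ins.

def route_request(prompt, cache, small_model, large_model, similarity,
                  sim_threshold=0.8):
    """On a near-hit in the latent cache, refine the cached latent with a
    cheap model; otherwise generate from scratch with the large model.

    cache: dict mapping previously served prompt -> cached latent
    similarity: callable (prompt, prompt) -> score in [0, 1]
    """
    # Find the most similar previously served prompt.
    best_latent, best_sim = None, 0.0
    for cached_prompt, latent in cache.items():
        s = similarity(prompt, cached_prompt)
        if s > best_sim:
            best_latent, best_sim = latent, s
    if best_latent is not None and best_sim >= sim_threshold:
        # Cache hit: a small model running few denoising steps suffices.
        return small_model(prompt, init_latent=best_latent)
    # Cache miss: pay for a full generation with the large model.
    return large_model(prompt)
```

The latency win comes from the hit path: refining a cached latent needs far fewer denoising steps than generating from pure noise, which is what lets a serving system trade quality against latency per request.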