Rishabh Iyer

561 posts


@rishiyer

Prof. at UTD CS, Director @caraml_lab | ML/AI/Optimization | Ex-Microsoft | MS, PhD: UW, BTech: IITB | https://t.co/TB3cpJcBeC

Dallas, TX · Joined February 2016
327 Following · 1.6K Followers
Pinned Tweet
Rishabh Iyer@rishiyer·
I just finished up a new course I've been teaching for Spring 2021 titled "Optimization in Machine Learning". Different from typical "OptML" courses, I covered both discrete and continuous optimization in 11 weeks. Here is the YouTube playlist: youtube.com/playlist?list=….
Rishabh Iyer@rishiyer·
I’ll be at NeurIPS 2025 from December 2nd to December 7th, 2025! Looking forward to meeting new friends and reconnecting with old ones! I’m excited to discuss topics around data subset selection, combinatorial optimization, data-efficient learning, representation learning, and GenAI (specifically targeted generation).
Rishabh Iyer@rishiyer·
Takeaway: Your split strategy defines the story your evaluation tells. Random splits answer “can the model generalize on average?” Temporal splits ask “can it predict the future?” Stratified splits ensure fairness across classes. Group-based and leave-one-group-out splits measure generalization to new entities and institutions. Spatial splits test new regions. Domain splits test new contexts. Adversarial splits stress-test robustness. Choosing the right strategy ensures your model isn’t just good on paper, but trustworthy in deployment.
Rishabh Iyer@rishiyer·
To ground these split strategies in practice, we ran a set of small experiments on synthetic datasets. Each experiment shows how the wrong split can give misleading confidence in your model, and how the right split exposes the truth. The Figure below collects the results from all the experiments.

1. Temporal Split vs Random Split: We simulated time-series data with a drifting trend. With a random split, the model "peeked into the future" and achieved artificially low error (MSE of ~40). With a proper temporal split (train on the past, test on the future), error rose dramatically (MSE of ~351), a realistic reflection of deployment. The top row of the Figure shows the results. Takeaway: always use temporal splits for forecasting problems like finance, ads, or search.

2. Group-Based Split vs Random Split: We created data where each patient had their own distribution. A decision tree trained and tested on a random split looked great, because the same patients leaked into both sets. But with group splits (entire patients held out of training), accuracy dropped sharply. The middle row of the Figure shows the results. Takeaway: group splits are essential in healthcare, recommender systems, or any scenario with entity-level correlations.

3. Adversarial / Stress-Test Split: We trained a linear model and a non-linear model on circular data. On a normal test set, the tree model achieved near-perfect accuracy while logistic regression got around 60%. On a stress-test set (points near the non-linear boundary), the linear model collapsed to near-random accuracy, and the non-linear model's performance also dropped significantly (by around 20%), suggesting that even the non-linear model could be improved with more hyper-parameter tuning. The bottom row of the Figure compares the same models (linear and DT) on random and hard test splits.

Takeaway: stress-test splits expose weaknesses hidden in average-case performance, revealing when you need a more robust model.
[Figure: experiment results. Top row: temporal vs random split; middle row: group-based vs random split; bottom row: stress-test vs random split.]
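A minimal sketch of the temporal-vs-random comparison: the synthetic drift, model choice, and split sizes below are my own illustrative assumptions, not the exact code behind the figure, so the MSE values will differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
t = np.arange(1000, dtype=float)
# Quadratic drift: the trend keeps accelerating, so the future
# looks different from the past.
y = 1e-4 * t ** 2 + rng.normal(0, 1, size=t.shape)
X = t.reshape(-1, 1)

# Random split: future time steps leak into the training set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
mse_random = mean_squared_error(
    y_te, LinearRegression().fit(X_tr, y_tr).predict(X_te))

# Temporal split: train on the first 80%, test on the last 20%.
cut = int(0.8 * len(t))
model = LinearRegression().fit(X[:cut], y[:cut])
mse_temporal = mean_squared_error(y[cut:], model.predict(X[cut:]))

print(f"random split MSE:   {mse_random:.1f}")
print(f"temporal split MSE: {mse_temporal:.1f}")  # much larger: no peeking
```

The random split lets the model interpolate between seen time steps, while the temporal split forces genuine extrapolation, which is what deployment demands.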
Rishabh Iyer@rishiyer·
ML Nugget #2: Choosing the Right Train/Test Split

Splitting your dataset into training, validation, and test sets feels like one of the most straightforward steps in machine learning. But the truth is: how you split the data can dramatically change your evaluation and, more importantly, how your model performs once deployed. Random splits aren't always enough. Depending on the application, you might need to think more carefully about the structure of your data and the problem you're solving. Here are eight key ways to split data, and when they matter most.

- Random Split: The most common approach is to randomly shuffle the dataset and split it into train, validation, and test sets. This works well when the data is i.i.d. (independent and identically distributed), as in classifying handwritten digits (MNIST), spam detection on randomly sampled emails, and many real-world applications with balanced data. The random split ensures that train and test come from the same distribution, giving a fair measure of generalization. But if your application involves time, groups, or correlations, random splits can give overly optimistic results.

- Temporal Split: When data evolves over time, as in quantitative finance, online advertising, search, or real estate forecasting, temporal splits are the gold standard. Train on past data, validate on more recent data, and test on the most recent data, so that your training data is temporally before your validation/test data. This setup mirrors deployment: tomorrow's predictions are always based on yesterday's information. For example, in stock prediction, a random split could accidentally mix tomorrow's price into today's training set, inflating backtest results. A temporal split prevents this "future leakage" and gives a realistic sense of predictive power.

- Stratified Split: In highly imbalanced problems like fraud detection, rare disease classification, or churn prediction, random splits can leave the test set missing the rare but critical class altogether. Stratified splits preserve the class distribution across train and test sets, ensuring your model is evaluated fairly on both common and rare outcomes. For example, in medical applications you don't want your test set to contain zero positive cases of a disease just because they're rare; stratification guarantees that your evaluation remains meaningful.

- Group-Based Split: Sometimes your dataset contains multiple examples tied to the same entity: multiple scans of the same patient, multiple purchases from the same customer, or multiple ratings from the same user. If you split randomly, the same entity might appear in both train and test, and the model could exploit entity-specific quirks rather than learning generalizable patterns. Group-based splits ensure that all data from one entity is placed entirely in either train or test. In recommendation systems or medical imaging, this is essential for measuring performance on new users or new patients.

- Spatial / Geographic Split: Location matters in domains like real estate pricing, agriculture, satellite imagery, or climate prediction. A random split might put neighboring regions into both train and test sets, leading to inflated performance since nearby areas often share strong correlations. A geographic split (train on Dallas, test on Houston) better evaluates generalization across space. In real estate, for example, a model that only learns "local quirks" may fail when deployed in a completely new city.

- Domain / Task Split (Cross-Domain Evaluation): In NLP, vision, and speech, it's not enough to test on the same dataset you trained on. Real-world deployment often means facing new domains. Training a sentiment classifier on electronics reviews and testing it on clothing reviews, or training on natural images and testing on X-rays, checks whether your model learned general features or just dataset-specific ones. Cross-domain splits are a tough but honest way to measure robustness when your deployment setting may not match your training environment.

- Leave-One-Group-Out (Cross-Institution Splits): A variant of group-based splits, leave-one-group-out is especially valuable in healthcare and enterprise applications. Imagine training a diagnostic model on data from nine hospitals and testing on the tenth. Rotating through each hospital reveals whether your model generalizes across institutions with different demographics, equipment, or labeling practices. This approach answers a critical deployment question: will my model work in a completely new organization that wasn't represented in training?

- Adversarial / Stress-Test Split: Sometimes you don't want your test set to be average; you want it to be hard. Adversarial or stress-test splits deliberately focus on rare or challenging scenarios. In fraud detection, you might train on common fraud schemes but test on emerging ones. In self-driving perception, you might train on sunny images but test on rain, snow, or fog. In speech recognition, you might train on quiet audio but test on noisy factory recordings. These splits measure robustness under worst-case conditions, which is often more valuable than average accuracy.
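Several of the strategies above map directly onto scikit-learn's splitter classes. A minimal sketch; the synthetic data, group IDs, and split parameters here are illustrative assumptions, not from the thread:

```python
import numpy as np
from sklearn.model_selection import (GroupShuffleSplit, LeaveOneGroupOut,
                                     StratifiedShuffleSplit)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (rng.random(300) < 0.1).astype(int)   # imbalanced: ~10% positives
groups = rng.integers(0, 10, size=300)    # e.g. patient or hospital IDs

# Stratified split: preserve the ~10% positive rate in both halves.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
strat_tr, strat_te = next(sss.split(X, y))

# Group-based split: each group lands entirely in train or entirely in test.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
grp_tr, grp_te = next(gss.split(X, y, groups))
assert set(groups[grp_tr]).isdisjoint(groups[grp_te])  # no entity leakage

# Leave-one-group-out: rotate every group (e.g. hospital) into the test set.
n_folds = sum(1 for _ in LeaveOneGroupOut().split(X, y, groups))
print(f"positive rate, train vs test: {y[strat_tr].mean():.2f} "
      f"vs {y[strat_te].mean():.2f}")
print(f"leave-one-group-out folds: {n_folds}")
```

For temporal splits, scikit-learn's `TimeSeriesSplit` plays the analogous role: every fold trains only on indices that precede the test indices.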
Rishabh Iyer@rishiyer·
To make this concrete, I ran two simple simulations that show just how dangerous distribution shift can be.

Feature Shift (Covariate Shift): Here, the underlying input features gradually drift over time. Think of stock price features that evolve as market conditions change, or sensor measurements that drift as devices age. I trained a logistic regression model on clean data and then tested it on data where the feature distribution was shifted step by step. What I saw was striking: the model's accuracy started high, as the deployment distribution matched the train/test distribution, but dropped steadily on the shifted deployment data. The test set gave the illusion of stability, while the deployment data exposed the decay.

Label Distribution Shift (Prior Probability Shift): In this case, the features stayed the same, but the proportion of labels changed; for example, the fraud rate increasing in finance, or churn rising in a subscription business. The model was trained on data with a 70/30 class balance, but in deployment, the ratio of positives gradually increased. Accuracy again declined sharply, even though the features hadn't changed. Once again, train and test scores looked fine, but deployment accuracy told the real story.

Together, these simulations (shown in the Figure below) highlight the deployment gap: the difference between test performance (on same-distribution data) and true performance in production (under drift).
[Figure: accuracy decay under feature shift (covariate shift) and label distribution shift.]
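A minimal sketch of the feature-shift simulation: the Gaussian class structure, shift schedule, and logistic model below are my own assumptions, not the exact code behind the figure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n_per_class, shift=0.0):
    # Two Gaussian classes; `shift` translates all inputs (covariate shift).
    X = np.vstack([rng.normal(-1, 1, (n_per_class, 2)),
                   rng.normal(+1, 1, (n_per_class, 2))]) + shift
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

X_train, y_train = make_data(500)
clf = LogisticRegression().fit(X_train, y_train)

# Accuracy on "deployment" data as the feature distribution drifts away
# from the training distribution.
accs = [clf.score(*make_data(500, shift=s)) for s in (0.0, 0.5, 1.0, 2.0, 3.0)]
print([f"{a:.2f}" for a in accs])  # steady decay as the drift grows
```

The decision boundary was fit for the unshifted data, so as the inputs translate away from it, one class drifts across the boundary and accuracy decays toward chance.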
Rishabh Iyer@rishiyer·
ML Nugget #1: Beyond Train/Test: The Deployment Gap and How to Quantify It

Every ML 101 course teaches you about train, validation, and test splits. The train set helps the model learn, the validation set helps tune hyperparameters, and the test set estimates generalization. We have learned not to trust training and validation performance; the sole purpose of the test set is to quantify how well the model will perform at deployment. This setup works beautifully in controlled academic settings.

But here's the catch: the test set is usually sampled from the same distribution as the train set. In the real world, that assumption often breaks down. When you deploy a model into production, it faces distribution shift. User behavior evolves, sensors drift, new slang emerges, and world events (like COVID-19) completely reshape data patterns. In applications like quantitative trading, online search, search advertising, and home price prediction, the effects of distribution shift are very pronounced. Your model, tuned and validated on historical data, may suddenly underperform in production.

That's why it's valuable to keep a deployment-style holdout set: a dataset that better simulates post-deployment conditions. For example, you might use time-based splits (train on the past, validate on recent history, and hold out the most recent chunk as a "deployment proxy"). This helps you measure the deployment gap, the difference between test performance and real-world performance. In addition, it is good practice to have an automated system that continuously tracks deployment performance and raises alerts and warnings when it starts declining significantly below what you expect from the model (e.g., from the test set).

In industry, the lesson is clear: don't just optimize for test accuracy. Optimize for robustness against the unknown future. Having this deployment holdout set can save you from painful surprises when your "state-of-the-art" model collapses in production.
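The automated monitoring idea can be sketched in a few lines. Everything here is illustrative: the window size, tolerance, and the simulated stream of per-batch deployment accuracies are my own assumptions.

```python
import numpy as np

def drift_alert(test_acc, deployment_accs, window=50, tolerance=0.05):
    """Return (alert, rolling_acc): the alert fires when the rolling mean of
    recent deployment accuracy falls more than `tolerance` below the
    accuracy measured on the held-out test set."""
    rolling = float(np.mean(deployment_accs[-window:]))
    return rolling < test_acc - tolerance, rolling

# Simulated per-batch deployment accuracies that slowly decay under drift.
rng = np.random.default_rng(0)
accs = list(np.clip(0.90 - 0.002 * np.arange(200)
                    + rng.normal(0, 0.01, 200), 0.0, 1.0))

alert, rolling = drift_alert(test_acc=0.90, deployment_accs=accs)
print(f"rolling deployment accuracy: {rolling:.2f}, alert: {alert}")
```

Early in the stream the rolling accuracy sits close to the test accuracy and no alert fires; once drift accumulates, the same check trips the alert.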
Rishabh Iyer@rishiyer·
I’ve been teaching AI/ML courses for several years and working with companies and startups for over a decade. Along the way, I’ve picked up practical lessons in machine learning that often don’t make it into standard textbooks or courses. I’m starting a short series to share these “ML nuggets” — insights at the intersection of research and real-world practice.
Aryan Mokhtari@AryanMokhtari·
Exciting News: I officially got tenure! Huge thanks to my amazing students, collaborators, and mentors!
Mohit Bansal@mohitban47·
Deeply honored and humbled to have received the Presidential #PECASE Award by the @WhiteHouse and @POTUS office! Very grateful to my amazing mentors, students, postdocs, collaborators, and friends+family for making this possible, and for making the journey worthwhile + beautiful 💙 🙏 (Also congrats to all the winners from the last 4-5 years/batches + glad this has been finally announced officially 🙂)
UNC Computer Science@unccs

🎉 Congratulations to Prof. @mohitban47 for receiving the Presidential #PECASE Award by @WhiteHouse, which is the highest honor bestowed by US govt. on outstanding scientists/engineers who show exceptional potential for leadership early in their careers! whitehouse.gov/ostp/news-upda…

Rishabh Iyer@rishiyer·
Congratulations India! What a World Cup win! Indian team today is one of the strongest it has ever been! While India has had good batsmen always, India’s bowling has really improved. This has been a game changer!
Andreas Krause@arkrause·
Greatly honored to join the ranks of the #ACMFellows! Thank you so much to my nominator and endorsers, as well as my amazing students, collaborators and mentors over the years! @ETH_en @ETH_AI_Center @TheOfficialACM
ETH CS Department@CSatETH

👏Big congratulations to @arkrause for being named @TheOfficialACM Fellow. The distinction recognises Krause's extensive research contributions to learning-based decision making under uncertainty. @ETH_en @ETH_AI_Center bit.ly/429CxBg

Rishabh Iyer@rishiyer·
Today I'm filled with joy to see the Ram Mandir opening in Ayodhya!! Lord Ram exemplifies what it means to be an ideal person - ideal king, ideal son, ideal husband, and ideal in every way! I have little doubt that the next decade will be that of India! #RamMandirPranPrathistha
Rishabh Iyer@rishiyer·
Thank you for the invitation! I presented work done by @krishnatejakk's Ph.D. on subset selection for compute-efficient deep learning! I also enjoyed all the other talks at the conference! It was a solid program! Congrats to the organizers of @indoml_sym!!
IndoML Symposium, 2025@indoml_sym

Day 1: Session 2 Machine Learning Talk 2: Rishabh Iyer Professor UT Dallas Subset Selection for Compute-Efficient Deep Learning Professor Rishabh took us through approaches like GLISTER, GRAD-MATCH, MILD, which helped us make our concepts on subset selection crystal clear!
