Rishabh Iyer

561 posts


@rishiyer

Prof. at UTD CS, Director @caraml_lab | ML/AI/Optimization | Ex-Microsoft | MS, PhD: UW, BTech: IITB | https://t.co/TB3cpJcBeC

Dallas, TX · Joined February 2016
327 Following · 1.6K Followers
Pinned Tweet
Rishabh Iyer@rishiyer·
I just finished up a new course I've been teaching for Spring 2021 titled "Optimization in Machine Learning". Different from typical "OptML" courses, I covered both discrete and continuous optimization in 11 weeks. Here is the YouTube playlist: youtube.com/playlist?list=….
Rishabh Iyer@rishiyer·
I’ll be at NeurIPS 2025 from December 2nd to December 7th, 2025! Looking forward to meeting new friends and reconnecting with old ones! I’m excited to discuss topics around data subset selection, combinatorial optimization, data-efficient learning, representation learning, and GenAI (specifically targeted generation).
Rishabh Iyer@rishiyer·
Takeaway: Your split strategy defines the story your evaluation tells. Random splits answer “can the model generalize on average?” Temporal splits ask “can it predict the future?” Stratified splits ensure fairness across classes. Group-based and leave-one-group-out splits measure generalization to new entities and institutions. Spatial splits test new regions. Domain splits test new contexts. Adversarial splits stress-test robustness. Choosing the right strategy ensures your model isn’t just good on paper, but trustworthy in deployment.
Rishabh Iyer@rishiyer·
To ground these split strategies in practice, we ran a set of small experiments on synthetic datasets. Each experiment shows how the wrong split can give misleading confidence in your model, and how the right split exposes the truth. The Figure below collects the results from all the experiments.

1. Temporal Split vs Random Split: We simulated time-series data with a drifting trend. With a random split, the model "peeked into the future" and achieved artificially low error (MSE of ~40). With a proper temporal split (train on the past, test on the future), error rose dramatically (MSE of ~351), a realistic reflection of deployment. The top row of the Figure shows the results. Takeaway: always use temporal splits for forecasting problems like finance, ads, or search.

2. Group-Based Split vs Random Split: We created data where each patient had their own distribution. A decision tree trained and tested on a random split looked great, because the same patients leaked into both sets. But with group splits (entire patients held out of training), accuracy dropped sharply. The middle row of the Figure shows the results. Takeaway: group splits are essential in healthcare, recommender systems, or any scenario with entity-level correlations.

3. Adversarial / Stress-Test Split: We trained a linear model and a non-linear model on circular data. On a normal test set, the tree model achieved near-perfect accuracy while logistic regression got around 60%. On a stress-test set (points near the non-linear boundary), the linear model collapsed to near-random accuracy, and the non-linear model's performance also dropped significantly (by around 20%), suggesting that even the non-linear model could be improved with more hyper-parameter tuning. The bottom row of the Figure compares the same models (linear and DT) on random and hard test splits.

Takeaway: stress-test splits expose weaknesses hidden in average-case performance, revealing when you need a more robust model.
[Figure: experiment results. Top row: temporal vs random split; middle row: group-based vs random split; bottom row: stress-test vs random split.]
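A minimal sketch of the temporal-vs-random comparison: the synthetic drift, model choice, and split sizes below are my own illustrative assumptions, not the exact code behind the figure, so the MSE values will differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
t = np.arange(1000, dtype=float)
# Quadratic drift: the trend keeps accelerating, so the future
# looks different from the past.
y = 1e-4 * t ** 2 + rng.normal(0, 1, size=t.shape)
X = t.reshape(-1, 1)

# Random split: future time steps leak into the training set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
mse_random = mean_squared_error(
    y_te, LinearRegression().fit(X_tr, y_tr).predict(X_te))

# Temporal split: train on the first 80%, test on the last 20%.
cut = int(0.8 * len(t))
model = LinearRegression().fit(X[:cut], y[:cut])
mse_temporal = mean_squared_error(y[cut:], model.predict(X[cut:]))

print(f"random split MSE:   {mse_random:.1f}")
print(f"temporal split MSE: {mse_temporal:.1f}")  # much larger: no peeking
```

The random split lets the model interpolate between seen time steps, while the temporal split forces genuine extrapolation, which is what deployment demands.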
Rishabh Iyer@rishiyer·
ML Nugget #2: Choosing the Right Train/Test Split

Splitting your dataset into training, validation, and test sets feels like one of the most straightforward steps in machine learning. But the truth is: how you split the data can dramatically change your evaluation and, more importantly, how your model performs once deployed. Random splits aren't always enough. Depending on the application, you might need to think more carefully about the structure of your data and the problem you're solving. Here are eight key ways to split data, and when they matter most.

- Random Split: The most common approach is to randomly shuffle the dataset and split it into train, validation, and test sets. This works well when the data is i.i.d. (independent and identically distributed), as in classifying handwritten digits (MNIST), spam detection on randomly sampled emails, and many real-world applications with balanced data. The random split ensures that train and test come from the same distribution, giving a fair measure of generalization. But if your application involves time, groups, or correlations, random splits can give overly optimistic results.

- Temporal Split: When data evolves over time, as in quantitative finance, online advertising, search, or real estate forecasting, temporal splits are the gold standard. Train on past data, validate on more recent data, and test on the most recent data, so that your training data is temporally before your validation/test data. This setup mirrors deployment: tomorrow's predictions are always based on yesterday's information. For example, in stock prediction, a random split could accidentally mix tomorrow's price into today's training set, inflating backtest results. A temporal split prevents this "future leakage" and gives a realistic sense of predictive power.

- Stratified Split: In highly imbalanced problems like fraud detection, rare disease classification, or churn prediction, random splits can leave the test set missing the rare but critical class altogether. Stratified splits preserve the class distribution across train and test sets, ensuring your model is evaluated fairly on both common and rare outcomes. For example, in medical applications you don't want your test set to contain zero positive cases of a disease just because they're rare; stratification guarantees that your evaluation remains meaningful.

- Group-Based Split: Sometimes your dataset contains multiple examples tied to the same entity: multiple scans of the same patient, multiple purchases from the same customer, or multiple ratings from the same user. If you split randomly, the same entity might appear in both train and test, and the model could exploit entity-specific quirks rather than learning generalizable patterns. Group-based splits ensure that all data from one entity is placed entirely in either train or test. In recommendation systems or medical imaging, this is essential for measuring performance on new users or new patients.

- Spatial / Geographic Split: Location matters in domains like real estate pricing, agriculture, satellite imagery, or climate prediction. A random split might put neighboring regions into both train and test sets, leading to inflated performance since nearby areas often share strong correlations. A geographic split (train on Dallas, test on Houston) better evaluates generalization across space. In real estate, for example, a model that only learns "local quirks" may fail when deployed in a completely new city.

- Domain / Task Split (Cross-Domain Evaluation): In NLP, vision, and speech, it's not enough to test on the same dataset you trained on. Real-world deployment often means facing new domains. Training a sentiment classifier on electronics reviews and testing it on clothing reviews, or training on natural images and testing on X-rays, checks whether your model learned general features or just dataset-specific ones. Cross-domain splits are a tough but honest way to measure robustness when your deployment setting may not match your training environment.

- Leave-One-Group-Out (Cross-Institution Splits): A variant of group-based splits, leave-one-group-out is especially valuable in healthcare and enterprise applications. Imagine training a diagnostic model on data from nine hospitals and testing on the tenth. Rotating through each hospital reveals whether your model generalizes across institutions with different demographics, equipment, or labeling practices. This approach answers a critical deployment question: will my model work in a completely new organization that wasn't represented in training?

- Adversarial / Stress-Test Split: Sometimes you don't want your test set to be average; you want it to be hard. Adversarial or stress-test splits deliberately focus on rare or challenging scenarios. In fraud detection, you might train on common fraud schemes but test on emerging ones. In self-driving perception, you might train on sunny images but test on rain, snow, or fog. In speech recognition, you might train on quiet audio but test on noisy factory recordings. These splits measure robustness under worst-case conditions, which is often more valuable than average accuracy.
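Several of the strategies above map directly onto scikit-learn's splitter classes. A minimal sketch; the synthetic data, group IDs, and split parameters here are illustrative assumptions, not from the thread:

```python
import numpy as np
from sklearn.model_selection import (GroupShuffleSplit, LeaveOneGroupOut,
                                     StratifiedShuffleSplit)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (rng.random(300) < 0.1).astype(int)   # imbalanced: ~10% positives
groups = rng.integers(0, 10, size=300)    # e.g. patient or hospital IDs

# Stratified split: preserve the ~10% positive rate in both halves.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
strat_tr, strat_te = next(sss.split(X, y))

# Group-based split: each group lands entirely in train or entirely in test.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
grp_tr, grp_te = next(gss.split(X, y, groups))
assert set(groups[grp_tr]).isdisjoint(groups[grp_te])  # no entity leakage

# Leave-one-group-out: rotate every group (e.g. hospital) into the test set.
n_folds = sum(1 for _ in LeaveOneGroupOut().split(X, y, groups))
print(f"positive rate, train vs test: {y[strat_tr].mean():.2f} "
      f"vs {y[strat_te].mean():.2f}")
print(f"leave-one-group-out folds: {n_folds}")
```

For temporal splits, scikit-learn's `TimeSeriesSplit` plays the analogous role: every fold trains only on indices that precede the test indices.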
Rishabh Iyer@rishiyer·
To make this concrete, I ran two simple simulations that show just how dangerous distribution shift can be.

Feature Shift (Covariate Shift): Here, the underlying input features gradually drift over time. Think of stock price features that evolve as market conditions change, or sensor measurements that drift as devices age. I trained a logistic regression model on clean data and then tested it on data where the feature distribution was shifted step by step. What I saw was striking: the model's accuracy started high, as the deployment distribution matched the train/test distribution, but dropped steadily on the shifted deployment data. The test set gave the illusion of stability, while the deployment data exposed the decay.

Label Distribution Shift (Prior Probability Shift): In this case, the features stayed the same, but the proportion of labels changed; for example, the fraud rate increasing in finance, or churn rising in a subscription business. The model was trained on data with a 70/30 class balance, but in deployment, the ratio of positives gradually increased. Accuracy again declined sharply, even though the features hadn't changed. Once again, train and test scores looked fine, but deployment accuracy told the real story.

Together, these simulations (shown in the Figure below) highlight the deployment gap: the difference between test performance (on same-distribution data) and true performance in production (under drift).
[Figure: accuracy decay under feature shift (covariate shift) and label distribution shift.]
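A minimal sketch of the feature-shift simulation: the Gaussian class structure, shift schedule, and logistic model below are my own assumptions, not the exact code behind the figure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n_per_class, shift=0.0):
    # Two Gaussian classes; `shift` translates all inputs (covariate shift).
    X = np.vstack([rng.normal(-1, 1, (n_per_class, 2)),
                   rng.normal(+1, 1, (n_per_class, 2))]) + shift
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

X_train, y_train = make_data(500)
clf = LogisticRegression().fit(X_train, y_train)

# Accuracy on "deployment" data as the feature distribution drifts away
# from the training distribution.
accs = [clf.score(*make_data(500, shift=s)) for s in (0.0, 0.5, 1.0, 2.0, 3.0)]
print([f"{a:.2f}" for a in accs])  # steady decay as the drift grows
```

The decision boundary was fit for the unshifted data, so as the inputs translate away from it, one class drifts across the boundary and accuracy decays toward chance.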
Rishabh Iyer@rishiyer·
ML Nugget #1: Beyond Train/Test: The Deployment Gap and How to Quantify It

Every ML 101 course teaches you about train, validation, and test splits. The train set helps the model learn, the validation set helps tune hyperparameters, and the test set estimates generalization. We have learned not to trust training and validation performance; the sole purpose of the test set is to quantify how well the model will perform at deployment. This setup works beautifully in controlled academic settings.

But here's the catch: the test set is usually sampled from the same distribution as the train set. In the real world, that assumption often breaks down. When you deploy a model into production, it faces distribution shift. User behavior evolves, sensors drift, new slang emerges, and world events (like COVID-19) completely reshape data patterns. In applications like quantitative trading, online search, search advertising, and home price prediction, the effects of distribution shift are very pronounced. Your model, tuned and validated on historical data, may suddenly underperform in production.

That's why it's valuable to keep a deployment-style holdout set: a dataset that better simulates post-deployment conditions. For example, you might use time-based splits (train on the past, validate on recent history, and hold out the most recent chunk as a "deployment proxy"). This helps you measure the deployment gap, the difference between test performance and real-world performance. In addition, it is good practice to have an automated system that continuously tracks deployment performance and raises alerts and warnings when it starts declining significantly below what you expect from the model (e.g., from the test set).

In industry, the lesson is clear: don't just optimize for test accuracy. Optimize for robustness against the unknown future. Having this deployment holdout set can save you from painful surprises when your "state-of-the-art" model collapses in production.
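The automated monitoring idea can be sketched in a few lines. Everything here is illustrative: the window size, tolerance, and the simulated stream of per-batch deployment accuracies are my own assumptions.

```python
import numpy as np

def drift_alert(test_acc, deployment_accs, window=50, tolerance=0.05):
    """Return (alert, rolling_acc): the alert fires when the rolling mean of
    recent deployment accuracy falls more than `tolerance` below the
    accuracy measured on the held-out test set."""
    rolling = float(np.mean(deployment_accs[-window:]))
    return rolling < test_acc - tolerance, rolling

# Simulated per-batch deployment accuracies that slowly decay under drift.
rng = np.random.default_rng(0)
accs = list(np.clip(0.90 - 0.002 * np.arange(200)
                    + rng.normal(0, 0.01, 200), 0.0, 1.0))

alert, rolling = drift_alert(test_acc=0.90, deployment_accs=accs)
print(f"rolling deployment accuracy: {rolling:.2f}, alert: {alert}")
```

Early in the stream the rolling accuracy sits close to the test accuracy and no alert fires; once drift accumulates, the same check trips the alert.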
Rishabh Iyer@rishiyer·
I’ve been teaching AI/ML courses for several years and working with companies and startups for over a decade. Along the way, I’ve picked up practical lessons in machine learning that often don’t make it into standard textbooks or courses. I’m starting a short series to share these “ML nuggets” — insights at the intersection of research and real-world practice.
Aryan Mokhtari@AryanMokhtari·
Exciting News: I officially got tenure! Huge thanks to my amazing students, collaborators, and mentors!
Mohit Bansal@mohitban47·
Deeply honored and humbled to have received the Presidential #PECASE Award by the @WhiteHouse and @POTUS office! Very grateful to my amazing mentors, students, postdocs, collaborators, and friends+family for making this possible, and for making the journey worthwhile + beautiful 💙 🙏 (Also congrats to all the winners from the last 4-5 years/batches + glad this has been finally announced officially 🙂)
UNC Computer Science@unccs

🎉 Congratulations to Prof. @mohitban47 for receiving the Presidential #PECASE Award by @WhiteHouse, which is the highest honor bestowed by US govt. on outstanding scientists/engineers who show exceptional potential for leadership early in their careers! whitehouse.gov/ostp/news-upda…

Rishabh Iyer@rishiyer·
Congratulations India! What a World Cup win! Indian team today is one of the strongest it has ever been! While India has had good batsmen always, India’s bowling has really improved. This has been a game changer!
Andreas Krause@arkrause·
Greatly honored to join the ranks of the #ACMFellows! Thank you so much to my nominator and endorsers, as well as my amazing students, collaborators and mentors over the years! @ETH_en @ETH_AI_Center @TheOfficialACM
ETH CS Department@CSatETH

👏Big congratulations to @arkrause for being named @TheOfficialACM Fellow. The distinction recognises Krause's extensive research contributions to learning-based decision making under uncertainty. @ETH_en @ETH_AI_Center bit.ly/429CxBg

Rishabh Iyer@rishiyer·
Today I'm filled with joy to see the Ram Mandir opening in Ayodhya!! Lord Ram exemplifies what it means to be an ideal person - ideal king, ideal son, ideal husband, and ideal in every way! I have little doubt that the next decade will be that of India! #RamMandirPranPrathistha
Rishabh Iyer@rishiyer·
Thank you for the invitation! I presented work done by @krishnatejakk's Ph.D. on subset selection for compute-efficient deep learning! I also enjoyed all the other talks at the conference! It was a solid program! Congrats to the organizers of @indoml_sym!!
IndoML Symposium, 2025@indoml_sym

Day 1: Session 2 Machine Learning Talk 2: Rishabh Iyer Professor UT Dallas Subset Selection for Compute-Efficient Deep Learning Professor Rishabh took us through approaches like GLISTER, GRAD-MATCH, MILD, which helped us make our concepts on subset selection crystal clear!
