Samet Oymak
@SametOymac
Professor @UMich EECS | Visiting Faculty @Google. Research on the Foundations of ML+RL+LLM
Joined April 2021
313 Following · 1.4K Followers · 337 posts
Samet Oymak retweeted
Alex Cohen @anothercohen·
I was fired from Block today. I was the PM in charge of changing the default tip option on the Square terminal to start at 40%. Jack replaced me with an AI agent that decides which tip amount to show based on your age, weight, and race. If anyone is hiring product managers, please let me know!
Samet Oymak @SametOymac·
Fantastic job by @DimitrisPapail! Two hypotheses:
- Harder benchmarks likely have high variance and are tricky to predict (e.g., many models achieve near-zero scores).
- Expensive benchmarks with more problems may be more predictable due to finer difficulty granularity (so scaling laws show up).
This makes IMO 2025 the hardest to predict, as actually observed :)
Dimitris Papailiopoulos @DimitrisPapail

x.com/i/article/2026…

Samet Oymak retweeted
Koustava Goswami @koustavagoswami·
Our paper "SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG" is accepted at #ICLR2026 🎉 Static chunks fail for long docs; let the model decide. We make chunking a reasoning problem, introducing STITCH, a new RL↔SFT post-training loop.
Samet Oymak retweeted
Jeff Dean @JeffDean·
We’ve pushed out the Pareto frontier of efficiency vs. intelligence again. With Gemini 3 Flash ⚡️, we are seeing reasoning capabilities previously reserved for our largest models, now running at Flash-level latency. This opens up entirely new categories of near real-time applications that require complex thought. It’s available in the API, and rolling out today as the default model in AI Mode in Search and Gemini app globally. Read more on the blog at: bit.ly/4pTo5YU More in thread ⬇️
Samet Oymak retweeted
Jeff Dean @JeffDean·
I’m really excited about our release of Gemini 3 today, the result of hard work by many, many people in the Gemini team and all across Google! 🎊 We’ve built many exciting new product experiences with it, as you’ll see today and in the coming weeks and months. You can find it today on @GeminiApp and AI Mode in Search. For developers, you can build with it now in @GoogleAIStudio and Vertex AI. blog.google/products/gemin… The model performs quite well on a wide range of benchmarks.
Samet Oymak retweeted
Frank Hutter @FrankRHutter·
Update: TabPFN-2.5 is actually the SOTA model on all of TabArena (which has datasets with up to 100k training data points). In a single forward pass, TabPFN-2.5 outperforms all other models, even if you tune them for 4 hours. We built and previously evaluated TabPFN-2.5 for up to 50k data points (and 2k features) and were kind of surprised that it's SOTA up to 100k 🙂 👉 TabPFN-2.5 webinar tomorrow: app.livestorm.co/p/21526e44-406… 👉 Model report on arXiv: arxiv.org/pdf/2511.08667
Samet Oymak retweeted
Ben Recht @beenwrekt·
The arXiv position paper controversy and the weird, unwritten, organic evolution of academic practice. argmin.net/p/a-position-o…
Samet Oymak retweeted
Pushmeet Kohli @pushmeet·
(1) Our team at @GoogleDeepMind has been collaborating with Terence Tao and Javier Gómez-Serrano to use our AI agents (AlphaEvolve, AlphaProof, & Gemini Deep Think) for advancing Maths research. They find that AlphaEvolve can help discover new results across a range of problems.
Samet Oymak @SametOymac·
@tydsh This is crazy and it's truly their loss
Yuandong Tian @tydsh·
Several of my team members + myself are impacted by this layoff today. Welcome to connect :)
Samet Oymak retweeted
Richard Sutton @RichardSSutton·
Learning is the derivative of knowledge.
Samet Oymak @SametOymac·
This paper plane outside my office has been there since I joined U of M in 2023. Considering nominating it for tenure.
Samet Oymak retweeted
Andrea Montanari @Andrea__M·
Honest question: what "scaling laws" theory papers are not a variation on 1980s nonparametric statistics?
Samet Oymak @SametOymac·
@docmilanfar Wow, this is exactly what I was teaching in class this week: Gaussian kernel smoother = softmax attention.
Peyman Milanfar @docmilanfar·
How Kernel Regression is related to Attention Mechanism - a summary in 10 slides. 0/1
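The identity behind the exchange above is that the Nadaraya-Watson kernel smoother with a Gaussian kernel is exactly a softmax over negative scaled squared distances: the kernel's normalizing constants cancel in the weighted average, and (for keys of equal norm) the squared-distance logits reduce to dot-product logits with temperature h². A minimal NumPy sketch of this equivalence, with function names and test data of my own choosing:

```python
import numpy as np

def gaussian_kernel_smoother(query, keys, values, h):
    """Nadaraya-Watson estimator with a Gaussian kernel of bandwidth h."""
    weights = np.exp(-np.sum((keys - query) ** 2, axis=1) / (2 * h ** 2))
    return weights @ values / weights.sum()

def softmax_attention(query, keys, values, h):
    """Softmax attention whose logits are negative scaled squared distances."""
    logits = -np.sum((keys - query) ** 2, axis=1) / (2 * h ** 2)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values

rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 4))    # 8 "training" inputs / attention keys
values = rng.normal(size=(8, 3))  # their targets / attention values
query = rng.normal(size=4)

out_kernel = gaussian_kernel_smoother(query, keys, values, h=1.0)
out_attn = softmax_attention(query, keys, values, h=1.0)
assert np.allclose(out_kernel, out_attn)
```

The bandwidth h plays the role of the attention temperature: small h sharpens the weights toward the nearest key, large h averages broadly.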
Dimitris Papailiopoulos @DimitrisPapail·
Small models as the new frontier, and why this may be academia's LLM moment.

Academia should reject the nihilism of "scale is all you need", i.e., the idea that meaningful research requires frontier-scale compute. This mindset hurts basic research and what we can contribute to machine learning in practice. Many interesting questions about architectures, data, and training methods do show signal and can be tested at the O(100M) to O(1B) parameter scale within a reasonable budget. There seems to be no fundamental reason why these insights wouldn't transfer and hold up at 14B, 32B, or even larger models. Yes, there will be trends and observations that break at the trillion-parameter scale, but my conjecture is that this will be irrelevant for the majority of models people will actually deploy locally in the future.

The economics of post-training (SFT/RL) are finally favorable for academia. Post-training a 7B model fits on a single H100 GPU, which costs roughly $3/hour on cloud providers. You can train on 100M+ tokens for under $100.

Why care about mid/post-training? That's where a lot of the interesting problems are! Reasoning, tool use, specialization, etc.: these are settings where you see meaningful performance improvements and skills learned within millions of training tokens, not the billions typical of pretraining.

More importantly, the 4B-32B parameter range will likely dominate local deployment in the not-so-distant future. These models fit on reasonable hardware (a beefy laptop), since inference requires only enough RAM to hold the model, and you can run single-batch inference calls without GPUs. Models at that scale are also getting seriously good at tasks like coding, math, tutoring, computer use, etc.

So here is my conjecture: local models at the <100B scale will eventually generate more tokens/day than API-hosted frontier models. This may be academia's moment! The open-weights ecosystem provides a path to real impact without million-dollar GPU clusters at this scale.

Our research can directly study, understand, and improve the 99% of models that will run locally, not the 1% that require data centers. This is finally both possible and meaningful. Don't be discouraged by scale maximalism!!
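The "$100 for 100M+ tokens" claim is easy to sanity-check with back-of-envelope arithmetic. The throughput figure below is a hypothetical number I chose for illustration, not a measurement from the thread:

```python
# Back-of-envelope check of the cost claim above. Only the $3/hour rate and
# the 100M-token figure come from the post; the throughput is an assumption.
H100_USD_PER_HOUR = 3.0            # cloud rate quoted in the post
TOKENS_TO_TRAIN = 100_000_000      # "100M+ tokens"
ASSUMED_TOKENS_PER_SEC = 12_000    # hypothetical 7B SFT throughput on one H100

gpu_hours = TOKENS_TO_TRAIN / ASSUMED_TOKENS_PER_SEC / 3600
cost_usd = gpu_hours * H100_USD_PER_HOUR
print(f"~{gpu_hours:.1f} GPU-hours, ~${cost_usd:.0f}")
```

At these assumed numbers the run takes about 2.3 GPU-hours and costs on the order of $7, so even a throughput several times worse stays comfortably under the $100 bound.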
Samet Oymak retweeted
Dimitris Papailiopoulos @DimitrisPapail·
Congrats to @jackcai1206 and @nayoung_nylee for their NeurIPS Spotlight on "Extrapolation by Association: Length Generalization Transfer In Transformers" :)
Dimitris Papailiopoulos @DimitrisPapail

Excited about our new work: language models develop computational circuits that are reusable AND TRANSFER across tasks.

Over a year ago, I tested GPT-4 on 200-digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained GPT-4 explicitly on 200-digit arithmetic (can't find the tweet :( ). How?? It felt like magic. In controlled arithmetic tests on transformers, length generalization consistently failed. There must be something magic about pretraining?

Turns out there's a clean, simple, and plausible answer: transfer. Here is what we find with Jack @jackcai1206, Nayoung @nayoung_nylee, Avi @A_v_i__S, and my friend Samet @SametOymac: language models develop computational circuits that TRANSFER length generalization across related tasks. arxiv.org/abs/2506.09251

A "main" task (like addition) trained on short sequences inherits length capabilities from an "auxiliary" task (like carry prediction) trained on longer sequences, if the model is co-trained on BOTH. This happens even when we train from scratch on only tasks A and B. But it only happens when A and B are related. So length TRANSFERS between tasks when they are similar. I think this is very cool!

We tested this across three types of tasks:
- arithmetic (reverse addition, carry operations)
- string manipulation (copying, case flipping)
- maze solving (DFS, shortest path)
Same pattern!

We also find that language pretraining acts as implicit auxiliary training. Finetuning checkpoints at different pretraining stages shows that more pretraining => better length generalization on downstream synthetic tasks.

After ~3 years studying length generalization, much of the initial magic has dissipated. And that's great! This is what science does. It lifts the veil of ignorance :)
