Samet Oymak

337 posts

@SametOymac

Professor @UMich EECS | Visiting Faculty @Google. Research on the Foundations of ML+RL+LLM

Joined April 2021

313 Following · 1.4K Followers

Samet Oymak reposted

Alex Cohen @anothercohen · Feb 27

I was fired from Block today. I was the PM in charge of changing the default tip option on the Square terminal to start at 40%. Jack replaced me with an AI agent that decides which tip amount to show based on your age, weight, and race. If anyone is hiring product managers, please let me know!

213

5.6K

997.4K

Samet Oymak @SametOymac · Feb 26

Fantastic job by @DimitrisPapail! Two hypotheses:
- Harder benchmarks likely have high variance and are tricky to predict (for example, many models achieve near zero).
- Expensive benchmarks with more problems may be more predictable due to finer difficulty granularity (so scaling laws show up).
This makes IMO 2025 the hardest to predict, as actually observed :)

Dimitris Papailiopoulos@DimitrisPapail

x.com/i/article/2026…

1.1K

Samet Oymak reposted

Koustava Goswami @koustavagoswami · Jan 28

Our paper "SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG" is accepted at #ICLR2026 🎉 Static chunks fail for long docs. Let the model decide. We make chunking a reasoning problem, introducing STITCH, a new RL↔SFT post-training loop

937

Samet Oymak reposted

Jeff Dean @JeffDean · Dec 17

We’ve pushed out the Pareto frontier of efficiency vs. intelligence again. With Gemini 3 Flash ⚡️, we are seeing reasoning capabilities previously reserved for our largest models, now running at Flash-level latency. This opens up entirely new categories of near real-time applications that require complex thought. It’s available in the API, and rolling out today as the default model in AI Mode in Search and Gemini app globally. Read more on the blog at: bit.ly/4pTo5YU More in thread ⬇️

196

1.8K

158.3K

Samet Oymak reposted

Ben Recht @beenwrekt · Dec 3

All of reinforcement learning in 809 words. argmin.net/p/defining-rei…

191

19.5K

Samet Oymak reposted

Jeff Dean @JeffDean · Nov 18

I’m really excited about our release of Gemini 3 today, the result of hard work by many, many people in the Gemini team and all across Google! 🎊 We’ve built many exciting new product experiences with it, as you’ll see today and in the coming weeks and months. You can find it today on @GeminiApp and AI Mode in Search. For developers, you can build with it now in @GoogleAIStudio and Vertex AI. blog.google/products/gemin… The model performs quite well on a wide range of benchmarks.

208

339

3.4K

398.2K

Samet Oymak reposted

Frank Hutter @FrankRHutter · Nov 13

Update: TabPFN-2.5 is actually the SOTA model on all of TabArena (which has datasets with up to 100k training data points). In a single forward pass, TabPFN-2.5 outperforms all other models, even if you tune them for 4 hours. We built and previously evaluated TabPFN-2.5 for up to 50k data points (and 2k features) and were kind of surprised that it's SOTA up to 100k 🙂 👉 TabPFN-2.5 webinar tomorrow: app.livestorm.co/p/21526e44-406… 👉 Model report on arXiv: arxiv.org/pdf/2511.08667

5.5K

Samet Oymak reposted

Ben Recht @beenwrekt · Nov 10

The arXiv position paper controversy and the weird, unwritten, organic evolution of academic practice. argmin.net/p/a-position-o…

4.6K

Samet Oymak reposted

Pushmeet Kohli @pushmeet · Nov 6

(1) Our team at @GoogleDeepMind has been collaborating with Terence Tao and Javier Gómez-Serrano to use our AI agents (AlphaEvolve, AlphaProof, & Gemini Deep Think) for advancing Maths research. They find that AlphaEvolve can help discover new results across a range of problems.

180

1.8K

147.2K

Samet Oymak @SametOymac · Nov 1

Much needed step from arXiv for quality control and preventing blog post & survey dumping: blog.arxiv.org/2025/10/31/att…

2.5K

Samet Oymak @SametOymac · Oct 23

@tydsh This is crazy and it's truly their loss

1.1K

Yuandong Tian @tydsh · Oct 23

Several of my team members + myself are impacted by this layoff today. Welcome to connect :)

473

268

6.4K

4.4M

Samet Oymak reposted

Richard Sutton @RichardSSutton · Oct 22

Learning is the derivative of knowledge.

135

1.6K

109.5K

Samet Oymak @SametOymac · Oct 20

This paper plane outside my office has been there since I joined U of M in 2023. Considering nominating it for tenure.

7.1K

Samet Oymak reposted

Andrea Montanari @Andrea__M · Oct 4

Honest question. What "scaling laws" theory papers are there that are not a variation on 1980s nonparametric statistics?

134

16K

Samet Oymak @SametOymac · Oct 4

@docmilanfar Wow this is exactly what I was teaching in the class this week: Gaussian kernel smoother = Softmax Attention

3.8K

Peyman Milanfar @docmilanfar · Oct 4

How Kernel Regression is related to Attention Mechanism - a summary in 10 slides. 0/1

193

1.3K

221.1K
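
The identity mentioned in the reply above (Gaussian kernel smoother = softmax attention) can be checked numerically. The following is a minimal sketch, not taken from the slides or the class: it assumes all keys have the same norm (here unit norm), in which case the Gaussian weights exp(-||q - k||^2 / (2σ^2)) differ from exp(q·k / σ^2) only by a factor that cancels after normalization. The function names, the bandwidth σ = 1, and the random data are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel_smoother(query, keys, values, bandwidth=1.0):
    """Nadaraya-Watson estimate at `query` using a Gaussian kernel."""
    sq_dists = np.sum((keys - query) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    weights /= weights.sum()
    return weights @ values

def softmax_attention(query, keys, values, temperature=1.0):
    """Single-query softmax attention with dot-product scores."""
    scores = keys @ query / temperature
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()
    return weights @ values

rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 4))
# Assumption: equal-norm keys, so the ||k||^2 term cancels in the normalization.
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = rng.normal(size=(8, 3))
query = rng.normal(size=4)

bandwidth = 1.0
smoothed = gaussian_kernel_smoother(query, keys, values, bandwidth=bandwidth)
# The matching softmax temperature is bandwidth squared.
attended = softmax_attention(query, keys, values, temperature=bandwidth ** 2)
match = bool(np.allclose(smoothed, attended))
```

Without the equal-norm assumption the two outputs generally differ, because the per-key factor exp(-||k||^2 / (2σ^2)) no longer cancels.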

Samet Oymak @SametOymac · Sep 19

@DimitrisPapail 100% agreed. Today is the worst day SLMs will ever be 🙂

562

Dimitris Papailiopoulos @DimitrisPapail · Sep 19

Small models as the new frontier and why this may be academia's LLM moment
Academia should reject the nihilism of "scale is all you need", i.e., that meaningful research requires frontier-scale compute. This mindset hurts basic research and what we can contribute to machine learning in practice. Many interesting questions about architectures, data, and training methods do show signal and can be tested at the O(100M) to O(1B) parameter scale within a reasonable budget. There seems to exist no fundamental reason why these insights wouldn't transfer and hold up to 14B, 32B, or even larger models. Yes, there will be trends and observations that break at the trillion parameter scale, but my conjecture is that this will be irrelevant for the majority of models people will actually deploy locally in the future.
The economics of post-training (SFT/RL) are finally favorable for academia. Post-training a 7B model fits on a single H100 GPU, which costs roughly $3/hour on cloud providers. You can train on 100M+ tokens for under $100.
Why care about mid/post-training? That's where a lot of interesting problems are! Reasoning, tool use, specialization, etc.: these are settings where you see meaningful performance improvements and skills learnt within millions of trained tokens, not the billions that are typical for pretraining.
More importantly, the 4B-32B parameter range will likely dominate local deployment in the not so distant future. These models fit on reasonable hardware (a beefy laptop), as inference requires enough RAM to fit the model, but you can run them without GPUs for single-batch inference calls. Also, these models, at that scale, are getting seriously good at tasks like coding, math, tutoring, computer use, etc.
So here is my conjecture: local models at the <100B scale will eventually generate more tokens/day than API-hosted frontier models.
This may be academia's moment! The open-weights ecosystem provides a path to real impact without million-dollar GPU clusters at this scale.
Our research can directly study, understand, and improve the 99% of models that will run locally, not the 1% that require data centers. This is finally both possible and meaningful. Don't be discouraged by scale maximalism!!
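
As a rough check on the cost figures quoted in the post (about $3/hour for one H100, 100M tokens for under $100), the sketch below works out the GPU-hours that budget buys and the training throughput it implies. Only the three input numbers come from the post; the implied tokens-per-second figure is a derived quantity, not a measured one, and the variable names are illustrative.

```python
# Figures quoted in the post.
gpu_price_per_hour = 3.0      # USD per hour, single H100 on a cloud provider
budget = 100.0                # USD
tokens_to_train = 100_000_000 # 100M training tokens

# GPU-hours the budget buys at the quoted price.
budget_hours = budget / gpu_price_per_hour

# Throughput needed to finish 100M tokens within that many hours.
required_tokens_per_second = tokens_to_train / (budget_hours * 3600)

print(f"GPU-hours available: {budget_hours:.1f}")
print(f"Required throughput: {required_tokens_per_second:.0f} tokens/s")
```

Under these numbers the budget covers about 33 GPU-hours, so the claim holds whenever post-training sustains roughly 830 tokens per second or more on that one GPU.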

338

65.4K

Samet Oymak @SametOymac · Sep 19

Our paper BREAD is accepted to #NeurIPS2025. BREAD can (provably) solve problems that are not solvable with standard SFT->GRPO training by interpolating between them: arxiv.org/pdf/2506.17211 Congrats to the lead authors @xuechen_zhang (on the job market!) and @HuangJ42673.

161

8.5K

Samet Oymak reposted

Dimitris Papailiopoulos @DimitrisPapail · Sep 18

Congrats to @jackcai1206 and @nayoung_nylee for their NeurIPS Spotlight on "Extrapolation by Association: Length Generalization Transfer In Transformers" :)

Dimitris Papailiopoulos@DimitrisPapail

Excited about our new work: Language models develop computational circuits that are reusable AND TRANSFER across tasks.
Over a year ago, I tested GPT-4 on 200 digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained GPT-4 explicitly on 200-digit arithmetic. (can't find the tweet :( ) How?? It felt like magic. In controlled arithmetic tests on transformers, length generalization consistently failed. There must be something magic about pretraining?
Turns out there's a clean, simple, and plausible answer: Transfer. Here is what we find with Jack @jackcai1206 Nayoung @nayoung_nylee, Avi @A_v_i__S, and my friend Samet @SametOymac: Language models develop computational circuits that TRANSFER length generalization across related tasks. arxiv.org/abs/2506.09251
A "main" task (like addition) trained on short sequences inherits length capabilities from an "auxiliary" task (like carry prediction) trained on longer sequences, if the model is co-trained on BOTH. This happens even when we train from scratch on only task A and B. But it only happens when A and B are related. So, length TRANSFERS between tasks, when they are similar. I think this is very cool!
We tested this across three types of tasks:
- arithmetic (reverse addition, carry operations)
- string manipulation (copying, case flipping)
- maze solving (DFS, shortest path).
Same pattern!
We also find that language pretraining acts as implicit auxiliary training. Finetuning checkpoints at different pretraining stages shows that more pretraining => better length generalization on downstream synthetic tasks.
After ~3 years studying length generalization, much of the initial magic has dissipated. And that's great! This is what science does. It lifts the veil of ignorance :)

114

11.3K

Samet Oymak @SametOymac · Sep 16

Per reddit/ML (tinyurl.com/4am325bn), #AAAI2026 reviews and decisions are a mess. Seems overwhelmed by 30k submissions...

2.1K
