Ivan Rubachev
@puhsuuu

ML Researcher @YandexResearch | Tabular ML

F#1M/3146 SAE feature · Joined August 2016
1.7K Following · 421 Followers

Pinned Tweet
Ivan Rubachev @puhsuuu
Tabular DL success on benchmarks ≠ success in production. We know this first-hand from trying to ship models. This motivated us to create TabReD: a new suite of 8 tabular datasets that capture real-world data characteristics overlooked by existing benchmarks. (1/N)
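For context, the real-world characteristic TabReD leans on most is temporal evaluation: production data arrives over time, so random train/test splits hide the distribution shift a deployed model actually faces. A minimal Python sketch of that splitting idea (toy data and function names are illustrative, not the TabReD loader):

# Minimal sketch of time-based splitting (illustrative, not the TabReD API):
# production tabular data arrives as a stream, so evaluation should hold out
# the most recent rows rather than a random subset.
import numpy as np
import pandas as pd

def temporal_split(df: pd.DataFrame, time_col: str, test_frac: float = 0.2):
    """Hold out the latest rows by timestamp instead of sampling at random."""
    df = df.sort_values(time_col)
    cutoff = int(len(df) * (1 - test_frac))
    return df.iloc[:cutoff], df.iloc[cutoff:]

# Toy data with a drifting target: a random split would leak the future
# distribution into training; a temporal split exposes the shift.
rng = np.random.default_rng(0)
ts = pd.date_range("2023-01-01", periods=10_000, freq="h")
df = pd.DataFrame({"ts": ts, "x": rng.normal(size=len(ts))})
df["y"] = df["x"] + np.linspace(0, 2, len(ts)) + rng.normal(0.0, 0.1, len(ts))

train, test = temporal_split(df, "ts")
assert train["ts"].max() < test["ts"].min()  # no temporal leakage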
Ivan Rubachev retweeted
Abhinav Moudgil @amoudgl
Introducing Celo2: Towards Learned Optimization Free Lunch. We show that learned optimizers can generalize to practical tasks like GPT-3 1.3B pretraining and several out-of-distribution vision/RL tasks from limited meta-training (~4.5 GPU hours)! 🧵
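For context, the standard learned-optimizer setup Celo2 builds on: a small network maps per-parameter features (gradient, momentum) to an update, and its weights are meta-trained across many tasks, then applied zero-shot to new ones. A hypothetical minimal version in Python (not Celo2's actual architecture):

# Hypothetical minimal learned optimizer (not Celo2's architecture):
# a tiny shared MLP maps per-parameter features to an update direction.
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        # Features per parameter: gradient and momentum (2 inputs -> 1 update).
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def step(self, param: torch.Tensor, grad: torch.Tensor,
             momentum: torch.Tensor, beta: float = 0.9):
        momentum = beta * momentum + (1 - beta) * grad
        feats = torch.stack([grad, momentum], dim=-1)  # (..., 2)
        update = self.net(feats).squeeze(-1)           # (...,)
        # Small output scale keeps early meta-training stable.
        return param - 1e-3 * update, momentum

opt = LearnedOptimizer()
p, g, m = torch.randn(10), torch.randn(10), torch.zeros(10)
p_new, m_new = opt.step(p, g, m)

Meta-training would backprop through many such inner steps on a distribution of small tasks; the generalization claim is that the result transfers to much larger, unseen tasks.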
Ivan Rubachev retweeted
Weight Space Symmetries @ ICML 2026
📢Excited to announce the Workshop on Weight-Space Symmetries @icmlconf! We welcome 4-page submissions analysing symmetries, their effects on training and model structure, and practical methods to utilize them. Submission Deadline: April 24 (23:59 AoE) #ICML2026
Ivan Rubachev retweeted
away from keyboard @catisafk
good things are coming
Ivan Rubachev retweeted
Ofir Press @OfirPress
If you work in AI you have to watch this talk by Moritz Hardt on the science of benchmarking. It covers a lot of unexpected properties of benchmarks that I don't think most people are aware of, e.g. that benchmarks can be incredibly noisy/imprecise and still be useful. 🔗⬇️
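One counterintuitive point from that line of work: per-task noise can dwarf the true gap between two models, yet averaging over enough tasks still ranks them reliably. An illustrative simulation with made-up numbers:

# Illustrative simulation (made-up numbers): per-task benchmark scores are
# noisy, but the *average* over many tasks still ranks two models reliably.
import numpy as np

rng = np.random.default_rng(0)
true_gap = 0.01      # model A is truly 1 point better (on a 0-1 scale)
task_noise = 0.05    # per-task measurement noise, 5x larger than the gap
n_tasks = 100
n_trials = 10_000

a = true_gap + rng.normal(0, task_noise, size=(n_trials, n_tasks))
b = rng.normal(0, task_noise, size=(n_trials, n_tasks))

per_task_correct = (a > b).mean()                   # ~56%: nearly a coin flip
aggregate_correct = (a.mean(1) > b.mean(1)).mean()  # ~92%: averaging wins
print(per_task_correct, aggregate_correct)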
Ivan Rubachev retweeted
Ji-Ha @Ji_Ha_Kim
Blog post: Transformers as Constrained Optimization. Rewriting pre-norm decoder-only transformers as solutions to regularized objectives. Changing the regularization to a hard constraint gives a canonical temperature, generalizes to KL-divergence, and suggests ideas about cross-layer interaction.
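The standard fact this builds on: softmax is the unique maximizer of an entropy-regularized linear objective over the simplex, and swapping the regularizer for a hard entropy constraint turns the temperature into a Lagrange multiplier. A LaTeX sketch of that step (my paraphrase, not the post's notation):

% Softmax as a regularized argmax over the simplex \Delta (standard fact):
\[
  p^{\star}(z) \;=\; \arg\max_{p \,\in\, \Delta}\; \langle p, z \rangle + \tau H(p)
  \;=\; \operatorname{softmax}(z/\tau),
  \qquad H(p) = -\textstyle\sum_i p_i \log p_i .
\]
% Hard-constraint form: maximize \langle p, z \rangle subject to H(p) \ge c.
% The Lagrangian
\[
  L(p, \tau) \;=\; \langle p, z \rangle + \tau\,\bigl(H(p) - c\bigr)
\]
% has the same stationarity condition, so \tau reappears as the Lagrange
% multiplier of the entropy constraint; the "canonical" temperature is the
% \tau at which the constraint is tight: H(\operatorname{softmax}(z/\tau)) = c.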
Ivan Rubachev retweeted
shako @shakoistsLog
I finally wrote my anti time-series foundation models screed. Based on my own long experience building forecasts. Must read if you like my forecasting content. Link in replies.
Ivan Rubachev retweeted
Ivan Rubachev @puhsuuu
Interesting analysis. I've seen some stuff in the Emacs community recently (neoemacs, a Wayland EXWM-like window manager, a faster TRAMP remote method, a React-like UI lib; all kinda big and impossible without AI). Anecdotally it felt like an improvement in the number of new interesting packages. Maybe it's the hobbyist and personal-tooling landscape that makes the difference (like the pi extensions in my Twitter feed; people do love to improve their tools). Or maybe it wouldn't look different if we looked at the data, and these are just too few examples.
Alexis Gallagher @alexisgallagher
If AI is so great for coding, where are the apps? @R_Dimm and I studied the Python Package Index to find an "AI effect". Here's where it is not, where it is, and thoughts on why. WHERE AI IS NOT: There's no clear AI effect on Python _package creation_ since ChatGPT.
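The underlying measurement is reproducible: PyPI's public JSON API exposes per-release upload times, so package-creation dates can be estimated from first releases. A rough Python sketch (my approximation of such a pipeline, not the authors' code):

# Rough sketch of measuring package creation over time via PyPI's public
# JSON API (my approximation, not the authors' pipeline).
import collections
import requests

def first_release_month(package: str) -> str | None:
    r = requests.get(f"https://pypi.org/pypi/{package}/json", timeout=10)
    if r.status_code != 200:
        return None
    uploads = [
        f["upload_time"]
        for files in r.json()["releases"].values()
        for f in files
    ]
    return min(uploads)[:7] if uploads else None  # "YYYY-MM"

# In practice you'd enumerate packages from https://pypi.org/simple/
# (or the BigQuery public dataset); a small sample illustrates the shape.
sample = ["requests", "numpy", "pandas"]
counts = collections.Counter(first_release_month(p) for p in sample)
print(counts)  # bucket counts by first-release month, then plot the trend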
Ivan Rubachev retweeted
Ethan @torchcompiled
I’m not sure there was ever actually evidence that DiTs trained faster than conv or hybrid alternatives; in fact, the original DiT paper had a somewhat misleading comparison that ultimately favors non-DiT models.
miru @miru_why

Reviving ConvNeXt for Efficient Convolutional Diffusion Models. github.com/star-kwon/FCDM arxiv.org/abs/2603.09408… The authors propose an improved ConvNeXt-based diffusion model architecture that reportedly matches DiT-XL/2 quality with 7x fewer training steps.
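For reference, the ConvNeXt block being revived is simple: a 7x7 depthwise conv, LayerNorm, and a pointwise inverted-bottleneck MLP inside a residual connection. A minimal PyTorch sketch (the paper's diffusion-specific conditioning, e.g. timestep modulation, is omitted):

# Minimal ConvNeXt block; diffusion-specific conditioning is omitted.
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)            # applied channels-last
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # inverted bottleneck
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # NCHW -> NHWC for LayerNorm
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                # back to NCHW
        return residual + x

x = torch.randn(1, 64, 32, 32)
print(ConvNeXtBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])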
Ivan Rubachev retweeted
Aleksandra Bakalova @abakalova13175
Can we rewrite Transformers as human-readable code? In this paper, we decompile Transformers trained on algorithmic and formal language tasks into D-RASP, a programming language that mirrors the Transformer architecture. 🧵
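D-RASP is the authors' variant; the underlying RASP idea (Weiss et al.) expresses an attention head as select/aggregate operations over the token sequence. A toy Python rendering of those primitives (not D-RASP itself):

# Toy rendering of RASP-style primitives (not D-RASP itself): 'select'
# builds an attention pattern from a predicate; 'selector_width' counts
# how many positions each query attends to -- one head, readable code.
def select(keys, queries, predicate):
    # attn[q][k] is True where the head attends.
    return [[predicate(k, q) for k in keys] for q in queries]

def selector_width(attn):
    # Number of positions each query attends to (RASP's selector_width).
    return [sum(row) for row in attn]

# Example "program": a token histogram, a classic algorithmic RASP task.
tokens = list("hello")
attn = select(tokens, tokens, lambda k, q: k == q)
print(selector_width(attn))  # [1, 1, 2, 2, 1] -- per-token counts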
Ivan Rubachev retweeted
Machine Learning Street Talk @MLStreetTalk
A masterclass from @jeremyphoward on why AI coding tools can be a trap, and what 45 years of programming taught him that most vibe coders will never learn.
- AI coding tools exploit gambling psychology
- The difference between typing code and software engineering
- Enterprise coding AND prompt-only vibe coding are "inhumane", i.e. they disconnect humans from understanding-building
- AI tools remove the "desirable difficulty" you need to build deep mental models
Out on MLST now!
Ivan Rubachev retweeted
Dmitry Eremeev @eremeev_d42
Graph foundation model with SOTA results on real-world graphs! Our “GraphPFN: A Prior-Data Fitted Graph Foundation Model” paper recently got a major update, with better ICL performance, new ablations, code improvements and more! 🧵1/11
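The prior-data fitted recipe behind GraphPFN (in the TabPFN lineage) is: pretrain a transformer on many synthetic tasks drawn from a prior, then solve a new task by in-context learning in a single forward pass, with no gradient updates. A schematic of that interface in Python (hypothetical names and shapes, not the GraphPFN code; graph-structure conditioning is omitted):

# Schematic of the prior-data fitted / in-context-learning interface
# (hypothetical names and shapes, not the actual GraphPFN code).
import torch
import torch.nn as nn

class PFNStyleModel(nn.Module):
    def __init__(self, d_in: int, d_model: int = 64, n_classes: int = 2):
        super().__init__()
        self.embed_x = nn.Linear(d_in, d_model)
        self.embed_y = nn.Embedding(n_classes + 1, d_model)  # +1 = "unknown"
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x_ctx, y_ctx, x_qry):
        # Context tokens carry features + labels; queries carry the
        # "unknown" label and get classified by attending to the context.
        unk = torch.full(x_qry.shape[:2], self.embed_y.num_embeddings - 1,
                         dtype=torch.long, device=x_qry.device)
        ctx = self.embed_x(x_ctx) + self.embed_y(y_ctx)
        qry = self.embed_x(x_qry) + self.embed_y(unk)
        h = self.encoder(torch.cat([ctx, qry], dim=1))
        return self.head(h[:, ctx.shape[1]:])    # logits for query nodes

# "Fitting" a new node-classification task is just one forward pass:
model = PFNStyleModel(d_in=8)
x_ctx, y_ctx = torch.randn(1, 100, 8), torch.randint(0, 2, (1, 100))
x_qry = torch.randn(1, 20, 8)
print(model(x_ctx, y_ctx, x_qry).shape)  # torch.Size([1, 20, 2])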