Ivan Rubachev
282 posts

Ivan Rubachev
@puhsuuu

ML Researcher @YandexResearch | Tabular ML

Joined August 2016
1.7K Following · 421 Followers

Pinned Tweet
Ivan Rubachev @puhsuuu
Tabular DL success on benchmarks ≠ success in production. We know this first hand, trying to ship models. This motivated us to create TabReD - a new suite of 8 tabular datasets that capture real-world data characteristics overlooked by existing benchmarks. (1/N)
[tweet media]
2 replies · 11 reposts · 43 likes · 9.4K views
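One of the "real-world data characteristics" the thread alludes to is evaluation protocol: a deployed model is trained on the past and scored on the future, while academic benchmarks usually split rows at random. A minimal sketch of the difference on synthetic data (plain pandas; this is an illustration, not TabReD's actual loading or splitting code):

```python
import numpy as np
import pandas as pd

# Toy event log: production tabular data typically arrives as a time-stamped stream.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=n, freq="h"),
    "feature": rng.normal(size=n),
    "target": rng.integers(0, 2, size=n),
})

# Random split: train and test rows are interleaved in time
# (the protocol most academic benchmarks use).
random_test = df.sample(frac=0.2, random_state=0)
random_train = df.drop(random_test.index)

# Time-based split: the test period strictly follows the train period,
# mimicking how a deployed model is actually evaluated.
df = df.sort_values("timestamp")
split = int(len(df) * 0.8)
time_train, time_test = df.iloc[:split], df.iloc[split:]

assert time_train["timestamp"].max() < time_test["timestamp"].min()
print(len(time_train), len(time_test))  # 800 200
```

Under temporal shift, a model can look strong under the random split and noticeably weaker under the time-based one, which is the kind of benchmark-vs-production gap the thread describes.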
Ivan Rubachev reposted
Abhinav Moudgil @amoudgl
Introducing Celo2: Towards Learned Optimization Free Lunch. We show that learned optimizers can generalize to practical tasks like GPT-3 1.3B pretraining and several out-of-distribution vision/RL tasks from limited meta-training (~4.5 GPU hours)! 🧵
[tweet media]
3 replies · 21 reposts · 99 likes · 7.7K views
Ivan Rubachev reposted
Weight Space Symmetries @ ICML 2026
📢Excited to announce the Workshop on Weight-Space Symmetries @icmlconf! We welcome 4-page submissions analysing symmetries, their effects on training and model structure, and practical methods to utilize them. Submission Deadline: April 24 (23:59 AoE) #ICML2026
[tweet media]
1 reply · 34 reposts · 51 likes · 16.8K views
Ivan Rubachev reposted
away from keyboard @catisafk
good things are coming
163 replies · 8.9K reposts · 35.3K likes · 796.4K views
Ivan Rubachev reposted
Ofir Press @OfirPress
If you work in AI you have to watch this talk by Moritz Hardt on the science of benchmarking. It talks about a lot of unexpected properties of benchmarks that I don't think most people are aware of; e.g. benchmarks can be incredibly noisy/imprecise and still be useful. 🔗⬇️
[tweet media]
3 replies · 14 reposts · 225 likes · 19.8K views
Ivan Rubachev reposted
Ji-Ha @Ji_Ha_Kim
Blog post: Transformers as Constrained Optimization. Rewriting pre-norm decoder-only transformers as solutions to regularized objectives. Changing the regularization to a hard constraint gives a canonical temperature, generalizing to KL-divergence and ideas of cross-layer interaction.
[tweet media]
10 replies · 68 reposts · 602 likes · 29.7K views
Ivan Rubachev reposted
shako @shakoistsLog
I finally wrote my anti time-series foundation models screed. Based on my own long experience building forecasts. Must read if you like my forecasting content. Link in replies.
[tweet media]
13 replies · 8 reposts · 110 likes · 5.6K views
Ivan Rubachev reposted
Ivan Rubachev @puhsuuu
Interesting analysis. I’ve seen some notable projects in the Emacs community recently (neoemacs, a Wayland EXWM-like window manager, a faster TRAMP remote method, a React-like UI lib; all fairly big and arguably impossible without AI). Anecdotally it felt like an uptick in the amount of new interesting packages. Maybe it’s the hobbyist and personal-tooling landscape that makes the difference (like the pi extensions in my Twitter feed; people do love to improve their tools). Or maybe it wouldn’t look different if we examined the data, and these are just too few examples.
0 replies · 0 reposts · 1 like · 392 views
Alexis Gallagher @alexisgallagher
If AI is so great for coding, where are the apps? @R_Dimm and I studied the Python Package Index to find an "AI effect". Here's where it is not, where it is, and thoughts on why. WHERE AI IS NOT: there's no clear AI effect on Python _package creation_ since ChatGPT.
[tweet media]
7 replies · 9 reposts · 37 likes · 26.8K views
Ivan Rubachev reposted
Ethan @torchcompiled
I’m not sure there was ever evidence that DiTs trained faster than conv or hybrid-based alternatives; in fact, the original DiT paper had a somewhat misleading comparison that ultimately favors non-DiT models.
Quoted tweet: miru @miru_why
Reviving ConvNeXt for Efficient Convolutional Diffusion Models github.com/star-kwon/FCDM arxiv.org/abs/2603.09408… The authors propose an improved ConvNeXt-based diffusion model architecture that reportedly matches DiT-XL/2 quality with 7x fewer training steps.
1 reply · 2 reposts · 35 likes · 5.3K views
Ivan Rubachev reposted
Aleksandra Bakalova @abakalova13175
Can we rewrite Transformers as human-readable code? In this paper, we decompile Transformers trained on algorithmic and formal language tasks into D-RASP, a programming language that mirrors the Transformer architecture. 🧵
[tweet media]
2 replies · 39 reposts · 236 likes · 24K views
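For readers unfamiliar with the RASP family of languages the tweet builds on: attention is modelled as a `select` (which positions attend to which) followed by an `aggregate` (averaging values over the selected positions). A rough illustration in plain Python (the paper's actual D-RASP syntax and semantics will differ):

```python
def select(keys, queries, predicate):
    # Boolean attention pattern: row q marks the key positions query q attends to.
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(pattern, values, default=0.0):
    # Uniform attention: each query averages the values at its selected positions.
    out = []
    for row in pattern:
        picked = [v for v, m in zip(values, row) if m]
        out.append(sum(picked) / len(picked) if picked else default)
    return out

def reverse(tokens):
    # "Reverse the sequence" as a one-layer attention program:
    # position q attends to position n-1-q and copies that token's character code.
    n = len(tokens)
    idx = list(range(n))
    flip = select(idx, idx, lambda k, q: k == n - 1 - q)
    ids = aggregate(flip, [ord(t) for t in tokens])
    return "".join(chr(int(i)) for i in ids)

print(reverse("abc"))  # cba
```

Decompiling a trained Transformer into such a program is what makes its computation human-readable: each attention head becomes a select/aggregate pair that can be inspected directly.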
Ivan Rubachev reposted
Machine Learning Street Talk @MLStreetTalk
A masterclass from @jeremyphoward on why AI coding tools can be a trap, and what 45 years of programming taught him that most vibe coders will never learn.
- AI coding tools exploit gambling psychology
- The difference between typing code and software engineering
- Enterprise coding AND prompt-only vibe coding are "inhumane", i.e. they disconnect humans from understanding-building
- AI tools remove the "desirable difficulty" you need to build deep mental models
Out on MLST now!
36 replies · 77 reposts · 615 likes · 127.8K views
Ivan Rubachev reposted
Dmitry Eremeev @eremeev_d42
Graph foundation model with SOTA results on real-world graphs! Our “GraphPFN: A Prior-Data Fitted Graph Foundation Model” paper recently got a major update, with better ICL performance, new ablations, code improvements and more! 🧵1/11
[tweet media]
2 replies · 5 reposts · 32 likes · 1.9K views