Ivan Rubachev
@puhsuuu

ML Researcher @YandexResearch | Tabular ML

F#1M/3146 SAE feature · Joined August 2016
1.7K Following · 421 Followers

Pinned Tweet
Ivan Rubachev @puhsuuu
Tabular DL success on benchmarks ≠ success in production. We know this first-hand from trying to ship models. This motivated us to create TabReD: a new suite of 8 tabular datasets that capture real-world data characteristics overlooked by existing benchmarks. (1/N)
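For context, the real-world characteristic TabReD leans on most is temporal evaluation: production data arrives over time, so random train/test splits hide the distribution shift a deployed model actually faces. A minimal Python sketch of that splitting idea (toy data and function names are illustrative, not the TabReD loader):

# Minimal sketch of time-based splitting (illustrative, not the TabReD API):
# production tabular data arrives as a stream, so evaluation should hold out
# the most recent rows rather than a random subset.
import numpy as np
import pandas as pd

def temporal_split(df: pd.DataFrame, time_col: str, test_frac: float = 0.2):
    """Hold out the latest rows by timestamp instead of sampling at random."""
    df = df.sort_values(time_col)
    cutoff = int(len(df) * (1 - test_frac))
    return df.iloc[:cutoff], df.iloc[cutoff:]

# Toy data with a drifting target: a random split would leak the future
# distribution into training; a temporal split exposes the shift.
rng = np.random.default_rng(0)
ts = pd.date_range("2023-01-01", periods=10_000, freq="h")
df = pd.DataFrame({"ts": ts, "x": rng.normal(size=len(ts))})
df["y"] = df["x"] + np.linspace(0, 2, len(ts)) + rng.normal(0.0, 0.1, len(ts))

train, test = temporal_split(df, "ts")
assert train["ts"].max() < test["ts"].min()  # no temporal leakage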
Ivan Rubachev retweeted
Abhinav Moudgil @amoudgl
Introducing Celo2: Towards Learned Optimization Free Lunch. We show that learned optimizers can generalize to practical tasks like GPT-3 1.3B pretraining and several out-of-distribution vision/RL tasks from limited meta-training (~4.5 GPU hours)! 🧵
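For context, the standard learned-optimizer setup Celo2 builds on: a small network maps per-parameter features (gradient, momentum) to an update, and its weights are meta-trained across many tasks, then applied zero-shot to new ones. A hypothetical minimal version in Python (not Celo2's actual architecture):

# Hypothetical minimal learned optimizer (not Celo2's architecture):
# a tiny shared MLP maps per-parameter features to an update direction.
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        # Features per parameter: gradient and momentum (2 inputs -> 1 update).
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def step(self, param: torch.Tensor, grad: torch.Tensor,
             momentum: torch.Tensor, beta: float = 0.9):
        momentum = beta * momentum + (1 - beta) * grad
        feats = torch.stack([grad, momentum], dim=-1)  # (..., 2)
        update = self.net(feats).squeeze(-1)           # (...,)
        # Small output scale keeps early meta-training stable.
        return param - 1e-3 * update, momentum

opt = LearnedOptimizer()
p, g, m = torch.randn(10), torch.randn(10), torch.zeros(10)
p_new, m_new = opt.step(p, g, m)

Meta-training would backprop through many such inner steps on a distribution of small tasks; the generalization claim is that the result transfers to much larger, unseen tasks.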
Ivan Rubachev retweeted
Weight Space Symmetries @ ICML 2026
📢Excited to announce the Workshop on Weight-Space Symmetries @icmlconf! We welcome 4-page submissions analysing symmetries, their effects on training and model structure, and practical methods to utilize them. Submission Deadline: April 24 (23:59 AoE) #ICML2026
Ivan Rubachev retweeted
away from keyboard @catisafk
good things are coming
Ivan Rubachev retweeted
Ofir Press @OfirPress
If you work in AI you have to watch this talk by Moritz Hardt on the science of benchmarking. It covers a lot of unexpected properties of benchmarks that I don't think most people are aware of, e.g. that benchmarks can be incredibly noisy/imprecise and still be useful. 🔗⬇️
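One counterintuitive point from that line of work: per-task noise can dwarf the true gap between two models, yet averaging over enough tasks still ranks them reliably. An illustrative simulation with made-up numbers:

# Illustrative simulation (made-up numbers): per-task benchmark scores are
# noisy, but the *average* over many tasks still ranks two models reliably.
import numpy as np

rng = np.random.default_rng(0)
true_gap = 0.01      # model A is truly 1 point better (on a 0-1 scale)
task_noise = 0.05    # per-task measurement noise, 5x larger than the gap
n_tasks = 100
n_trials = 10_000

a = true_gap + rng.normal(0, task_noise, size=(n_trials, n_tasks))
b = rng.normal(0, task_noise, size=(n_trials, n_tasks))

per_task_correct = (a > b).mean()                   # ~56%: nearly a coin flip
aggregate_correct = (a.mean(1) > b.mean(1)).mean()  # ~92%: averaging wins
print(per_task_correct, aggregate_correct)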
Ivan Rubachev retweeted
Ji-Ha @Ji_Ha_Kim
Blog post: Transformers as Constrained Optimization. Rewriting pre-norm decoder-only transformers as solutions to regularized objectives. Changing the regularization to a hard constraint gives a canonical temperature, generalizes to KL-divergence, and suggests ideas about cross-layer interaction.
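The standard fact this builds on: softmax is the unique maximizer of an entropy-regularized linear objective over the simplex, and swapping the regularizer for a hard entropy constraint turns the temperature into a Lagrange multiplier. A LaTeX sketch of that step (my paraphrase, not the post's notation):

% Softmax as a regularized argmax over the simplex \Delta (standard fact):
\[
  p^{\star}(z) \;=\; \arg\max_{p \,\in\, \Delta}\; \langle p, z \rangle + \tau H(p)
  \;=\; \operatorname{softmax}(z/\tau),
  \qquad H(p) = -\textstyle\sum_i p_i \log p_i .
\]
% Hard-constraint form: maximize \langle p, z \rangle subject to H(p) \ge c.
% The Lagrangian
\[
  L(p, \tau) \;=\; \langle p, z \rangle + \tau\,\bigl(H(p) - c\bigr)
\]
% has the same stationarity condition, so \tau reappears as the Lagrange
% multiplier of the entropy constraint; the "canonical" temperature is the
% \tau at which the constraint is tight: H(\operatorname{softmax}(z/\tau)) = c.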
Ivan Rubachev retweeted
shako @shakoistsLog
I finally wrote my anti time-series foundation models screed. Based on my own long experience building forecasts. Must read if you like my forecasting content. Link in replies.
Ivan Rubachev retweeted
Ivan Rubachev @puhsuuu
Interesting analysis. I've seen some stuff in the Emacs community recently (neoemacs, a Wayland EXWM-like window manager, a faster TRAMP remote method, a React-like UI lib; all kinda big and impossible without AI). Anecdotally it felt like an improvement in the number of new interesting packages. Maybe it's the hobbyist and personal-tooling landscape that makes the difference (like the pi extensions in my Twitter feed; people do love to improve their tools). Or maybe it wouldn't look different if we looked at the data, and these are just too few examples.
Alexis Gallagher @alexisgallagher
If AI is so great for coding, where are the apps? @R_Dimm and I studied the Python Package Index to find an "AI effect". Here's where it is not, where it is, and thoughts on why. WHERE AI IS NOT: There's no clear AI effect on Python _package creation_ since ChatGPT.
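The underlying measurement is reproducible: PyPI's public JSON API exposes per-release upload times, so package-creation dates can be estimated from first releases. A rough Python sketch (my approximation of such a pipeline, not the authors' code):

# Rough sketch of measuring package creation over time via PyPI's public
# JSON API (my approximation, not the authors' pipeline).
import collections
import requests

def first_release_month(package: str) -> str | None:
    r = requests.get(f"https://pypi.org/pypi/{package}/json", timeout=10)
    if r.status_code != 200:
        return None
    uploads = [
        f["upload_time"]
        for files in r.json()["releases"].values()
        for f in files
    ]
    return min(uploads)[:7] if uploads else None  # "YYYY-MM"

# In practice you'd enumerate packages from https://pypi.org/simple/
# (or the BigQuery public dataset); a small sample illustrates the shape.
sample = ["requests", "numpy", "pandas"]
counts = collections.Counter(first_release_month(p) for p in sample)
print(counts)  # bucket counts by first-release month, then plot the trend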
Ivan Rubachev retweeted
Ethan @torchcompiled
I’m not sure there was ever actually evidence that DiTs trained faster than conv or hybrid alternatives; in fact, the original DiT paper had a somewhat misleading comparison that ultimately favors non-DiT models.
miru @miru_why

Reviving ConvNeXt for Efficient Convolutional Diffusion Models. github.com/star-kwon/FCDM arxiv.org/abs/2603.09408… The authors propose an improved ConvNeXt-based diffusion model architecture that reportedly matches DiT-XL/2 quality with 7x fewer training steps.
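For reference, the ConvNeXt block being revived is simple: a 7x7 depthwise conv, LayerNorm, and a pointwise inverted-bottleneck MLP inside a residual connection. A minimal PyTorch sketch (the paper's diffusion-specific conditioning, e.g. timestep modulation, is omitted):

# Minimal ConvNeXt block; diffusion-specific conditioning is omitted.
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)            # applied channels-last
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # inverted bottleneck
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # NCHW -> NHWC for LayerNorm
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                # back to NCHW
        return residual + x

x = torch.randn(1, 64, 32, 32)
print(ConvNeXtBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])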
Ivan Rubachev retweeted
Aleksandra Bakalova @abakalova13175
Can we rewrite Transformers as human-readable code? In this paper, we decompile Transformers trained on algorithmic and formal language tasks into D-RASP, a programming language that mirrors the Transformer architecture. 🧵
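D-RASP is the authors' variant; the underlying RASP idea (Weiss et al.) expresses an attention head as select/aggregate operations over the token sequence. A toy Python rendering of those primitives (not D-RASP itself):

# Toy rendering of RASP-style primitives (not D-RASP itself): 'select'
# builds an attention pattern from a predicate; 'selector_width' counts
# how many positions each query attends to -- one head, readable code.
def select(keys, queries, predicate):
    # attn[q][k] is True where the head attends.
    return [[predicate(k, q) for k in keys] for q in queries]

def selector_width(attn):
    # Number of positions each query attends to (RASP's selector_width).
    return [sum(row) for row in attn]

# Example "program": a token histogram, a classic algorithmic RASP task.
tokens = list("hello")
attn = select(tokens, tokens, lambda k, q: k == q)
print(selector_width(attn))  # [1, 1, 2, 2, 1] -- per-token counts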
Ivan Rubachev retweeted
Machine Learning Street Talk @MLStreetTalk
A masterclass from @jeremyphoward on why AI coding tools can be a trap, and what 45 years of programming taught him that most vibe coders will never learn.
- AI coding tools exploit gambling psychology
- The difference between typing code and software engineering
- Enterprise coding AND prompt-only vibe coding are "inhumane", i.e. they disconnect humans from understanding-building
- AI tools remove the "desirable difficulty" you need to build deep mental models
Out on MLST now!
Ivan Rubachev retweeted
Dmitry Eremeev @eremeev_d42
Graph foundation model with SOTA results on real-world graphs! Our “GraphPFN: A Prior-Data Fitted Graph Foundation Model” paper recently got a major update, with better ICL performance, new ablations, code improvements and more! 🧵1/11
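The prior-data fitted recipe behind GraphPFN (in the TabPFN lineage) is: pretrain a transformer on many synthetic tasks drawn from a prior, then solve a new task by in-context learning in a single forward pass, with no gradient updates. A schematic of that interface in Python (hypothetical names and shapes, not the GraphPFN code; graph-structure conditioning is omitted):

# Schematic of the prior-data fitted / in-context-learning interface
# (hypothetical names and shapes, not the actual GraphPFN code).
import torch
import torch.nn as nn

class PFNStyleModel(nn.Module):
    def __init__(self, d_in: int, d_model: int = 64, n_classes: int = 2):
        super().__init__()
        self.embed_x = nn.Linear(d_in, d_model)
        self.embed_y = nn.Embedding(n_classes + 1, d_model)  # +1 = "unknown"
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x_ctx, y_ctx, x_qry):
        # Context tokens carry features + labels; queries carry the
        # "unknown" label and get classified by attending to the context.
        unk = torch.full(x_qry.shape[:2], self.embed_y.num_embeddings - 1,
                         dtype=torch.long, device=x_qry.device)
        ctx = self.embed_x(x_ctx) + self.embed_y(y_ctx)
        qry = self.embed_x(x_qry) + self.embed_y(unk)
        h = self.encoder(torch.cat([ctx, qry], dim=1))
        return self.head(h[:, ctx.shape[1]:])    # logits for query nodes

# "Fitting" a new node-classification task is just one forward pass:
model = PFNStyleModel(d_in=8)
x_ctx, y_ctx = torch.randn(1, 100, 8), torch.randint(0, 2, (1, 100))
x_qry = torch.randn(1, 20, 8)
print(model(x_ctx, y_ctx, x_qry).shape)  # torch.Size([1, 20, 2])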