Noah Hollmann (@noahholl)
147 posts
Co-founder & CTO @ Prior Labs | Building AI for tabular data
Berlin, Germany · Joined November 2010
139 Following · 529 Followers
Noah Hollmann @noahholl
It’s incredible to welcome Turing Award winner and "Godfather of AI" Yann LeCun to our Scientific Advisory Board at Prior Labs. Yann has defined the frontier of AI; now he’s helping us build Tabular Foundation Models, giving structured data the native intelligence it deserves. 📈
Prior Labs @prior_labs

Honored to announce that Yann LeCun @ylecun is joining Prior Labs’ Scientific Advisory Board.

0 replies · 1 repost · 5 likes · 843 views
Noah Hollmann reposted
Prior Labs @prior_labs
Honored to announce that Yann LeCun @ylecun is joining Prior Labs’ Scientific Advisory Board.
[image]
7 replies · 14 reposts · 275 likes · 56.8K views
Noah Hollmann reposted
BURKOV @burkov
This paper really is groundbreaking. It solves a long-standing embarrassment in machine learning: despite all the hype around deep learning, traditional tree-based methods (XGBoost, CatBoost, random forests, etc.) have dominated tabular data, the most common data format in real-world applications, for two decades. Deep learning conquered images, text, and games, but spreadsheets remained stubbornly resistant.

The paper's main contribution (published in Nature, by the way) is a foundation model that finally beats tree-based methods convincingly on small-to-medium datasets, and does so very fast. TabPFN in 2.8 seconds outperforms CatBoost tuned for 4 hours, a 5,000× speedup. That's not incremental; it's a different regime entirely.

The training approach is also fundamentally different. GPT trains on internet text; CLIP trains on image-caption pairs. TabPFN trains on entirely synthetic data: over 100 million artificial datasets generated from causal graphs. TabPFN generates training data by randomly constructing directed acyclic graphs where each edge applies a random transformation (using neural networks, decision trees, discretization, or noise), then pushes random noise through the root nodes and lets it propagate through the graph. The intermediate values at various nodes become features, one becomes the target, and post-processing adds realistic messiness like missing values and outliers. By training on millions of these synthetic datasets with very different structures, the model learns general prediction strategies without ever seeing real data.

The inference mechanism is also unusual. Rather than finetuning or prompting, TabPFN performs both "training" and prediction in a single forward pass. You feed it your labeled training data and unlabeled test points together, and it outputs predictions immediately. There's no gradient descent at inference time; the model has learned how to learn from examples during pretraining. The architecture respects tabular structure with two-way attention (across features within a row, then across samples within a column), unlike standard transformers that treat everything as a flat sequence.

So, the transformer has basically learned to do supervised learning.

Talk to the paper on ChapterPal: chapterpal.com/s/a1899430/acc…
Download the PDF: nature.com/articles/s4158…
[image]
77 replies · 391 reposts · 2.6K likes · 331.2K views
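The synthetic training pipeline described above is easy to make concrete. Below is a toy sketch, my own illustration rather than Prior Labs' actual prior, of drawing one artificial dataset from a random DAG: noise enters at the root nodes, random transforms propagate it along edges, node values become features, and one held-out node becomes the target.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_synthetic_dataset(n_nodes=8, n_samples=256):
    # Random DAG: allow an edge i -> j only when i < j, guaranteeing acyclicity.
    adj = np.triu(rng.random((n_nodes, n_nodes)) < 0.4, k=1)
    values = np.zeros((n_samples, n_nodes))
    for j in range(n_nodes):
        parents = np.where(adj[:, j])[0]
        if len(parents) == 0:
            values[:, j] = rng.normal(size=n_samples)  # root node: pure noise
        else:
            # Edge transform: random linear mix, a nonlinearity, and noise.
            w = rng.normal(size=len(parents))
            z = values[:, parents] @ w + 0.1 * rng.normal(size=n_samples)
            values[:, j] = np.tanh(z)
    # One node becomes the target; the rest become features.
    target_col = int(rng.integers(n_nodes))
    X = np.delete(values, target_col, axis=1)
    y = (values[:, target_col] > np.median(values[:, target_col])).astype(int)
    X[rng.random(X.shape) < 0.05] = np.nan  # post-processing: missing values
    return X, y

X, y = sample_synthetic_dataset()
```

The real prior samples graph structures, edge transforms (neural networks, tree functions, discretization), and post-processing from far richer distributions; this sketch only shows the control flow of generating one of the 100M+ datasets.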
Noah Hollmann reposted
Frank Hutter @FrankRHutter
Update: TabPFN-2.5 is actually the SOTA model on all of TabArena (which has datasets with up to 100k training data points). In a single forward pass, TabPFN-2.5 outperforms all other models, even if you tune them for 4 hours. We built and previously evaluated TabPFN-2.5 for up to 50k data points (and 2k features) and were kind of surprised that it's SOTA up to 100k 🙂
👉 TabPFN-2.5 webinar tomorrow: app.livestorm.co/p/21526e44-406…
👉 Model report on arXiv: arxiv.org/pdf/2511.08667
[image]
0 replies · 7 reposts · 74 likes · 5.6K views
Noah Hollmann @noahholl
A new version of TabPFN is ready!

When we released TabPFN v1 over three years ago, I didn't expect at all the hundreds of comments and reposts we would see. Tabular data had been a field getting little love from AI research, but we immediately felt that this was a topic that data scientists, scientists, financial analysts, and enterprise users deeply cared about, and that using in-context learning could be a breakthrough for it. We kept pushing: v2 landed in Nature earlier this year, and we started Prior Labs so we could obsess over this full-time.

Now, with TabPFN-2.5, we were honestly surprised by how much further we could go. On datasets up to 50,000 datapoints and 2,000 features, its untuned performance now far outperforms tuned XGBoost and CatBoost. It even matches a 4-hour tuned AutoGluon ensemble, an ensemble that includes our previous v2 model.

We also focused a lot on making models deployment-ready: we show how to export our models to a tree or MLP representation (TabPFN-as-MLP in the plot below), making them fast in inference and easy to deploy. At the same time, we improved our Python SDK (a lot!) for running TabPFN in the cloud, and added a REST API for developers to access the model from anywhere.

We are incredibly proud of the ways TabPFN has already helped in science and decision-making, from oncology to climate research (over 400 citations and 100 published use cases since the beginning of the year!). This new release will immediately boost all those applications; for example, our report shows v2.5 is much stronger for causal inference tasks.

It's amazing to see the field accelerating. We can't wait to show you what's next.
[4 images]
2 replies · 4 reposts · 21 likes · 2.5K views
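For anyone who hasn't tried it, the interface is the scikit-learn fit/predict pattern; the cloud SDK and REST API mentioned above expose the same shape. A minimal sketch, assuming a current install of the open tabpfn package (exact defaults may differ across versions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()           # note: no hyperparameter tuning step
clf.fit(X_train, y_train)          # "fit" stores the in-context training examples
proba = clf.predict_proba(X_test)  # train and test go through one forward pass
print(roc_auc_score(y_test, proba[:, 1]))
```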
Noah Hollmann reposted
Prior Labs @prior_labs
Today, TabPFN gets an upgrade. TabPFN-2.5 is here. 🪂 TabPFN-2.5 outperforms tuned tree-based models and matches the performance of a complex ensemble (AutoGluon 1.4) tuned for 4 hours on benchmarks of up to 50,000 samples and 2,000 features. 🧵 1/7 #tabpfn #priorlabs
[image]
1 reply · 7 reposts · 15 likes · 1.1K views
Noah Hollmann reposted
Léo Grinsztajn @LeoGrint
🍁 Excited to be in Vancouver for ICML! Don't hesitate to reach out to me or @FrankRHutter if you'd like to chat about foundation models, tabular data, time series, and what we're doing on these topics at @prior_labs! P.S.: we're hiring 😌
0 replies · 2 reposts · 18 likes · 834 views
Noah Hollmann reposted
Frank Hutter @FrankRHutter
It's great to have this new open-source benchmark for tabular data 🚀 It's really comprehensive, created and maintained by serious open-source contributors from various groups, and I expect it to quickly become the new standard benchmark. I'm super excited that progress in the tabular space is exploding!

It's also interesting to see that, for datasets within TabPFN v2's constraints (<10k data points, <500 features), half a year after its release, TabPFN v2 still strongly outperforms all other methods!
[image]
Lennart Purucker @LennartPurucker

🚨What is SOTA on tabular data, really? We are excited to announce 𝗧𝗮𝗯𝗔𝗿𝗲𝗻𝗮, a living benchmark for machine learning on IID tabular data with: 📊 an online leaderboard (submit!) 📑 carefully curated datasets 📈 strong tree-based, deep learning, and foundation models 🧵

4 replies · 18 reposts · 107 likes · 12.4K views
Kyunghyun Cho @kchonyc
finally, wind is changing its direction: causal inference becomes easier if we give up on designing a new estimation algorithm ourselves (i don't think we've evolved to do so ourselves well.) let learning find one for you!
[3 images]
5 replies · 20 reposts · 173 likes · 16K views
Noah Hollmann reposted
Bernhard Schölkopf @bschoelkopf
In 2015, we ran a workshop on "Drawing causal inference from Big Data" at the NAS. Back then, “Big Data” felt like a buzzword. Ten years later, we might finally have a method to make it real.
Jake Robertson @JakeMRobertson

We believe that Do-PFN can provide causal insights on diverse and understudied problems where experimental data is scarce! This is joint work with @Arik_Reuter (shared), @syguoML, @noahholl, @FrankRHutter, @bschoelkopf. Check out the paper at: arxiv.org/abs/2506.06039 (7/7)

3 replies · 5 reposts · 44 likes · 6.2K views
Noah Hollmann reposted
Frank Hutter @FrankRHutter
Super excited about this interventional version of TabPFN!

We rarely make predictions just for the sake of making predictions, but rather in order to make (automated) decisions. If these decisions entail interventions on some of the features (treatments, prices, purchases, etc.), then models fitted purely on observational data are off. I.e., *all* traditional methods (gradient boosting but also transformer models) are systematically wrong in that case.

With Do-PFN, we can now, e.g., predict conditional average treatment effects (CATEs) better than models specially designed for this purpose! I'm very much looking forward to scaling this up.

Also really happy about this being the first work in a line of many in a cooperation between @prior_labs and @ELLISInst_Tue, together with @bschoelkopf, a leading authority in causality!
Jake Robertson @JakeMRobertson

We present a new approach to causal inference. Pre-trained on synthetic data, Do-PFN opens the door to a new domain: PFNs for causal inference. We are excited to announce our new paper "Do-PFN: In-Context Learning for Causal Effect Estimation" on arXiv! 🔨🔍 A thread:

0 replies · 4 reposts · 36 likes · 2.4K views
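To see why Frank calls observational models systematically wrong here, consider the naive baseline below: a T-learner that fits one regressor on treated rows and one on control rows, then differences the predictions to estimate the CATE. The simulated data is my own illustration with a hidden confounder; Do-PFN instead predicts interventional outcomes in-context (see the paper for its actual interface).

```python
import numpy as np
from tabpfn import TabPFNRegressor

rng = np.random.default_rng(0)
n = 400
u = rng.normal(size=n)                               # hidden confounder, not fully in X
X = rng.normal(size=(n, 3)) + 0.3 * u[:, None]       # features only weakly reflect u
t = (u + 0.5 * rng.normal(size=n) > 0).astype(int)   # treatment assignment driven by u
y = 2.0 * t + 1.5 * u + rng.normal(size=n)           # true treatment effect: 2.0

# T-learner: separate outcome models for treated and control, then difference.
m1 = TabPFNRegressor().fit(X[t == 1], y[t == 1])
m0 = TabPFNRegressor().fit(X[t == 0], y[t == 0])
cate_hat = m1.predict(X) - m0.predict(X)
print(f"naive CATE estimate: {cate_hat.mean():.2f} (true effect: 2.00)")
```

Because the confounder u is only partially visible through X, the naive estimate is biased away from 2.0; closing that gap on observational data is exactly what an interventional model is for.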
Noah Hollmann reposted
Prior Labs @prior_labs
Curious to see how fine-tuned TabPFN can transform your team's results? Visit our site for full details and compelling use cases, and to explore the possibilities. 👇 #PriorLabs 🧵 5/5 priorlabs.ai/finetuning
0 replies · 1 repost · 3 likes · 398 views
Noah Hollmann reposted
Frank Hutter @FrankRHutter
Tabular data often has messy text features; preprocessing them is tedious and time-consuming. Our new #TabPFN 2+ removes that hassle: just feed in your data and get fast, accurate predictions. No feature engineering required. We'd love your feedback! API + example notebook 👇 1/5
[image]
1 reply · 8 reposts · 91 likes · 5.1K views
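A sketch of what "just feed in your data" looks like in practice, using a pandas DataFrame with a raw text column. The toy data is mine, and I'm assuming the API-backed client (tabpfn_client, which needs an access token) handles text columns as the tweet describes; check the linked notebook for the supported usage.

```python
import pandas as pd
from tabpfn_client import TabPFNClassifier  # cloud client; requires authentication

# Toy data: one free-text column, one numeric column, a binary label.
df = pd.DataFrame({
    "review":   ["arrived broken, very late", "great quality, fast shipping",
                 "okay but packaging was damaged", "perfect, would buy again"],
    "price":    [19.99, 34.50, 12.00, 48.75],
    "returned": [1, 0, 1, 0],
})

clf = TabPFNClassifier()
clf.fit(df[["review", "price"]], df["returned"])  # no manual text preprocessing
print(clf.predict_proba(df[["review", "price"]]))
```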
Noah Hollmann reposted
Frank Hutter @FrankRHutter
We’re actively hiring ML Engineers & Applied AI Scientists at @prior_labs to build the future of foundation models for structured data!

We’re pushing the boundaries of AI with foundation models that understand structured data, an untapped AI frontier powering science, finance, healthcare, and business. Backed by €9M funding from @balderton, @xtxmarkets and AI pioneers from @huggingface, @AMDSiloAI, @bfl_ml and @DataRobot/@AtScale, we aim to become the world-leading lab for structured data AI.

🔹 Who should apply? PhDs or experienced engineers with deep ML expertise, especially in PyTorch, scikit-learn, NN architectures and structured data (tabular, time series). If you want to work on fundamental AI breakthroughs, scalable models, and multimodal architectures, join us!
📍 Location: Freiburg or Berlin
💼 Compensation: Competitive salary (€70K - 120K) + meaningful equity
🚀 Join us in shaping the future of AI for structured data!
🔗 Apply now: jobs.ashbyhq.com/prior-labs
0 replies · 5 reposts · 31 likes · 2.1K views
Noah Hollmann reposted
Frank Hutter @FrankRHutter
DeepSeek-R1 is a cheap, non-US, open-source foundation model challenging US market leaders … and so is TabPFN v2! Happy to report that TabPFN-TS just took #1 in the popular time series benchmark GIFT-Eval, outperforming Amazon's leading time series foundation model Chronos! 🧵 1/5
[image]
4 replies · 17 reposts · 88 likes · 8.8K views
Breno Brito @brenorb
@FrankRHutter Hey, I tried TabPFN yesterday with 10k time series datapoints, and after 7+ hours on a Mac M1 I decided to stop and move on. Should I have resampled it to use fewer datapoints, or run on a cloud GPU?
2 replies · 0 reposts · 2 likes · 204 views
Frank Hutter @FrankRHutter
#TabPFN v2 also excels on time series! Just before our Nature paper came out, we had this paper at the #NeurIPS time series workshop: openreview.net/forum?id=ho8Yx… We simply cast time series as tabular regression and use exactly the model from Nature. It's crazy that this works! 🧵 1/10
[image]
7 replies · 42 reposts · 306 likes · 29.2K views
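The cast-as-tabular-regression trick is simple to sketch: featurize each timestamp (a running index for trend, calendar fields for seasonality), fit on the observed history, and query future timestamps as test rows. The feature choices below are my illustration, not necessarily the workshop paper's exact setup.

```python
import numpy as np
import pandas as pd
from tabpfn import TabPFNRegressor

idx = pd.date_range("2024-01-01", periods=400, freq="h")
rng = np.random.default_rng(0)
y = np.sin(np.arange(400) * 2 * np.pi / 24) + 0.1 * rng.normal(size=400)

def featurize(ts_index):
    # Turn timestamps into a plain feature table: trend plus calendar fields.
    return np.column_stack([
        np.arange(len(ts_index)),  # running index captures trend
        ts_index.hour,             # hour-of-day captures daily seasonality
        ts_index.dayofweek,        # day-of-week captures weekly effects
    ])

X = featurize(idx)
model = TabPFNRegressor().fit(X[:300], y[:300])  # observed history as context
forecast = model.predict(X[300:])                # future timestamps as queries
```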