Noah Hollmann (@noahholl)
147 posts
Co-founder & CTO @ Prior Labs | Building AI for tabular data
Berlin, Germany · Joined November 2010
139 Following · 529 Followers
Noah Hollmann @noahholl
It’s incredible to welcome Turing Award winner and "Godfather of AI" Yann LeCun to our Scientific Advisory Board at Prior Labs. Yann has defined the frontier of AI; now he’s helping us build Tabular Foundation Models, giving structured data the native intelligence it deserves. 📈
Prior Labs @prior_labs

Honored to announce that Yann LeCun @ylecun is joining Prior Labs’ Scientific Advisory Board.

0 replies · 1 repost · 5 likes · 843 views
Noah Hollmann reposted
Prior Labs @prior_labs
Honored to announce that Yann LeCun @ylecun is joining Prior Labs’ Scientific Advisory Board.
[image]
7 replies · 14 reposts · 275 likes · 56.8K views
Noah Hollmann reposted
BURKOV @burkov
This paper really is groundbreaking. It solves a long-standing embarrassment in machine learning: despite all the hype around deep learning, traditional tree-based methods (XGBoost, CatBoost, random forests, etc.) have dominated tabular data, the most common data format in real-world applications, for two decades. Deep learning conquered images, text, and games, but spreadsheets remained stubbornly resistant.

The paper's main contribution (published in Nature, by the way) is a foundation model that finally beats tree-based methods convincingly on small-to-medium datasets, and does so very fast. TabPFN in 2.8 seconds outperforms CatBoost tuned for 4 hours, a 5,000× speedup. That's not incremental; it's a different regime entirely.

The training approach is also fundamentally different. GPT trains on internet text; CLIP trains on image-caption pairs. TabPFN trains on entirely synthetic data: over 100 million artificial datasets generated from causal graphs. TabPFN generates training data by randomly constructing directed acyclic graphs where each edge applies a random transformation (using neural networks, decision trees, discretization, or noise), then pushes random noise through the root nodes and lets it propagate through the graph. The intermediate values at various nodes become features, one becomes the target, and post-processing adds realistic messiness like missing values and outliers. By training on millions of these synthetic datasets with very different structures, the model learns general prediction strategies without ever seeing real data.

The inference mechanism is also unusual. Rather than finetuning or prompting, TabPFN performs both "training" and prediction in a single forward pass. You feed it your labeled training data and unlabeled test points together, and it outputs predictions immediately. There's no gradient descent at inference time; the model has learned how to learn from examples during pretraining. The architecture respects tabular structure with two-way attention (across features within a row, then across samples within a column), unlike standard transformers that treat everything as a flat sequence.

So, the transformer has basically learned to do supervised learning.

Talk to the paper on ChapterPal: chapterpal.com/s/a1899430/acc…
Download the PDF: nature.com/articles/s4158…
[image]
77 replies · 391 reposts · 2.6K likes · 331.2K views
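The synthetic training pipeline described above is easy to make concrete. Below is a toy sketch, my own illustration rather than Prior Labs' actual prior, of drawing one artificial dataset from a random DAG: noise enters at the root nodes, random transforms propagate it along edges, node values become features, and one held-out node becomes the target.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_synthetic_dataset(n_nodes=8, n_samples=256):
    # Random DAG: allow an edge i -> j only when i < j, guaranteeing acyclicity.
    adj = np.triu(rng.random((n_nodes, n_nodes)) < 0.4, k=1)
    values = np.zeros((n_samples, n_nodes))
    for j in range(n_nodes):
        parents = np.where(adj[:, j])[0]
        if len(parents) == 0:
            values[:, j] = rng.normal(size=n_samples)  # root node: pure noise
        else:
            # Edge transform: random linear mix, a nonlinearity, and noise.
            w = rng.normal(size=len(parents))
            z = values[:, parents] @ w + 0.1 * rng.normal(size=n_samples)
            values[:, j] = np.tanh(z)
    # One node becomes the target; the rest become features.
    target_col = int(rng.integers(n_nodes))
    X = np.delete(values, target_col, axis=1)
    y = (values[:, target_col] > np.median(values[:, target_col])).astype(int)
    X[rng.random(X.shape) < 0.05] = np.nan  # post-processing: missing values
    return X, y

X, y = sample_synthetic_dataset()
```

The real prior samples graph structures, edge transforms (neural networks, tree functions, discretization), and post-processing from far richer distributions; this sketch only shows the control flow of generating one of the 100M+ datasets.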
Noah Hollmann reposted
Frank Hutter @FrankRHutter
Update: TabPFN-2.5 is actually the SOTA model on all of TabArena (which has datasets with up to 100k training data points). In a single forward pass, TabPFN-2.5 outperforms all other models, even if you tune them for 4 hours. We built and previously evaluated TabPFN-2.5 for up to 50k data points (and 2k features) and were kind of surprised that it's SOTA up to 100k 🙂
👉 TabPFN-2.5 webinar tomorrow: app.livestorm.co/p/21526e44-406…
👉 Model report on arXiv: arxiv.org/pdf/2511.08667
[image]
0 replies · 7 reposts · 74 likes · 5.6K views
Noah Hollmann @noahholl
A new version of TabPFN is ready!

When we released TabPFN v1 over three years ago, I didn't expect at all the hundreds of comments and reposts we would see. Tabular data had been a field getting little love from AI research, but we immediately felt that this was a topic that data scientists, scientists, financial analysts, and enterprise users deeply cared about, and that using in-context learning could be a breakthrough for it. We kept pushing: v2 landed in Nature earlier this year, and we started Prior Labs so we could obsess over this full-time.

Now, with TabPFN-2.5, we were honestly surprised by how much further we could go. On datasets up to 50,000 datapoints and 2,000 features, its untuned performance now far outperforms tuned XGBoost and CatBoost. It even matches a 4-hour tuned AutoGluon ensemble, an ensemble that includes our previous v2 model.

We also focused a lot on making models deployment-ready: we show how to export our models to a tree or MLP representation (TabPFN-as-MLP in the plot below), making them fast in inference and easy to deploy. At the same time, we improved our Python SDK (a lot!) for running TabPFN in the cloud, and added a REST API for developers to access the model from anywhere.

We are incredibly proud of the ways TabPFN has already helped in science and decision-making, from oncology to climate research (over 400 citations and 100 published use cases since the beginning of the year!). This new release will immediately boost all those applications; for example, our report shows v2.5 is much stronger for causal inference tasks.

It's amazing to see the field accelerating. We can't wait to show you what's next.
[4 images]
2 replies · 4 reposts · 21 likes · 2.5K views
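For anyone who hasn't tried it, the interface is the scikit-learn fit/predict pattern; the cloud SDK and REST API mentioned above expose the same shape. A minimal sketch, assuming a current install of the open tabpfn package (exact defaults may differ across versions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()           # note: no hyperparameter tuning step
clf.fit(X_train, y_train)          # "fit" stores the in-context training examples
proba = clf.predict_proba(X_test)  # train and test go through one forward pass
print(roc_auc_score(y_test, proba[:, 1]))
```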
Noah Hollmann reposted
Prior Labs @prior_labs
Today, TabPFN gets an upgrade. TabPFN-2.5 is here. 🪂 TabPFN-2.5 outperforms tuned tree-based models and matches the performance of a complex ensemble (AutoGluon 1.4) tuned for 4 hours on benchmarks of up to 50,000 samples and 2,000 features. 🧵 1/7 #tabpfn #priorlabs
[image]
1 reply · 7 reposts · 15 likes · 1.1K views
Noah Hollmann reposted
Léo Grinsztajn @LeoGrint
🍁 Excited to be in Vancouver for ICML! Don't hesitate to reach out to me or @FrankRHutter if you'd like to chat about foundation models, tabular data, time series, and what we're doing on these topics at @prior_labs! P.S.: we're hiring 😌
0 replies · 2 reposts · 18 likes · 834 views
Noah Hollmann reposted
Frank Hutter @FrankRHutter
It's great to have this new open-source benchmark for tabular data 🚀 It's really comprehensive, created and maintained by serious open-source contributors from various groups, and I expect it to quickly become the new standard benchmark. I'm super excited that progress in the tabular space is exploding!

It's also interesting to see that, for datasets within TabPFN v2's constraints (<10k data points, <500 features), half a year after its release, TabPFN v2 still strongly outperforms all other methods!
[image]
Lennart Purucker @LennartPurucker

🚨What is SOTA on tabular data, really? We are excited to announce 𝗧𝗮𝗯𝗔𝗿𝗲𝗻𝗮, a living benchmark for machine learning on IID tabular data with: 📊 an online leaderboard (submit!) 📑 carefully curated datasets 📈 strong tree-based, deep learning, and foundation models 🧵

4 replies · 18 reposts · 107 likes · 12.4K views
Kyunghyun Cho @kchonyc
finally, wind is changing its direction: causal inference becomes easier if we give up on designing a new estimation algorithm ourselves (i don't think we've evolved to do so ourselves well.) let learning find one for you!
[3 images]
5 replies · 20 reposts · 173 likes · 16K views
Noah Hollmann reposted
Bernhard Schölkopf @bschoelkopf
In 2015, we ran a workshop on "Drawing causal inference from Big Data" at the NAS. Back then, “Big Data” felt like a buzzword. Ten years later, we might finally have a method to make it real.
Jake Robertson @JakeMRobertson

We believe that Do-PFN can provide causal insights on diverse and understudied problems where experimental data is scarce! This is joint work with @Arik_Reuter (shared), @syguoML, @noahholl, @FrankRHutter, @bschoelkopf. Check out the paper at: arxiv.org/abs/2506.06039 (7/7)

3 replies · 5 reposts · 44 likes · 6.2K views
Noah Hollmann reposted
Frank Hutter @FrankRHutter
Super excited about this interventional version of TabPFN!

We rarely make predictions just for the sake of making predictions, but rather in order to make (automated) decisions. If these decisions entail interventions on some of the features (treatments, prices, purchases, etc.), then models fitted purely on observational data are off. I.e., *all* traditional methods (gradient boosting but also transformer models) are systematically wrong in that case.

With Do-PFN, we can now, e.g., predict conditional average treatment effects (CATEs) better than models specially designed for this purpose! I'm very much looking forward to scaling this up.

Also really happy about this being the first work in a line of many in a cooperation between @prior_labs and @ELLISInst_Tue, together with @bschoelkopf, a leading authority in causality!
Jake Robertson @JakeMRobertson

We present a new approach to causal inference. Pre-trained on synthetic data, Do-PFN opens the door to a new domain: PFNs for causal inference. We are excited to announce our new paper "Do-PFN: In-Context Learning for Causal Effect Estimation" on arXiv! 🔨🔍 A thread:

0 replies · 4 reposts · 36 likes · 2.4K views
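To see why Frank calls observational models systematically wrong here, consider the naive baseline below: a T-learner that fits one regressor on treated rows and one on control rows, then differences the predictions to estimate the CATE. The simulated data is my own illustration with a hidden confounder; Do-PFN instead predicts interventional outcomes in-context (see the paper for its actual interface).

```python
import numpy as np
from tabpfn import TabPFNRegressor

rng = np.random.default_rng(0)
n = 400
u = rng.normal(size=n)                               # hidden confounder, not fully in X
X = rng.normal(size=(n, 3)) + 0.3 * u[:, None]       # features only weakly reflect u
t = (u + 0.5 * rng.normal(size=n) > 0).astype(int)   # treatment assignment driven by u
y = 2.0 * t + 1.5 * u + rng.normal(size=n)           # true treatment effect: 2.0

# T-learner: separate outcome models for treated and control, then difference.
m1 = TabPFNRegressor().fit(X[t == 1], y[t == 1])
m0 = TabPFNRegressor().fit(X[t == 0], y[t == 0])
cate_hat = m1.predict(X) - m0.predict(X)
print(f"naive CATE estimate: {cate_hat.mean():.2f} (true effect: 2.00)")
```

Because the confounder u is only partially visible through X, the naive estimate is biased away from 2.0; closing that gap on observational data is exactly what an interventional model is for.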
Noah Hollmann reposted
Prior Labs @prior_labs
Curious to see how fine-tuned TabPFN can transform your team's results? Visit our site for full details and compelling use cases, and to explore the possibilities. 👇 #PriorLabs 🧵 5/5 priorlabs.ai/finetuning
0 replies · 1 repost · 3 likes · 398 views
Noah Hollmann reposted
Frank Hutter @FrankRHutter
Tabular data often has messy text features; preprocessing them is tedious and time-consuming. Our new #TabPFN 2+ removes that hassle: just feed in your data and get fast, accurate predictions. No feature engineering required. We'd love your feedback! API + example notebook 👇 1/5
[image]
1 reply · 8 reposts · 91 likes · 5.1K views
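A sketch of what "just feed in your data" looks like in practice, using a pandas DataFrame with a raw text column. The toy data is mine, and I'm assuming the API-backed client (tabpfn_client, which needs an access token) handles text columns as the tweet describes; check the linked notebook for the supported usage.

```python
import pandas as pd
from tabpfn_client import TabPFNClassifier  # cloud client; requires authentication

# Toy data: one free-text column, one numeric column, a binary label.
df = pd.DataFrame({
    "review":   ["arrived broken, very late", "great quality, fast shipping",
                 "okay but packaging was damaged", "perfect, would buy again"],
    "price":    [19.99, 34.50, 12.00, 48.75],
    "returned": [1, 0, 1, 0],
})

clf = TabPFNClassifier()
clf.fit(df[["review", "price"]], df["returned"])  # no manual text preprocessing
print(clf.predict_proba(df[["review", "price"]]))
```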
Noah Hollmann reposted
Frank Hutter @FrankRHutter
We’re actively hiring ML Engineers & Applied AI Scientists at @prior_labs to build the future of foundation models for structured data!

We’re pushing the boundaries of AI with foundation models that understand structured data, an untapped AI frontier powering science, finance, healthcare, and business. Backed by €9M funding from @balderton, @xtxmarkets and AI pioneers from @huggingface, @AMDSiloAI, @bfl_ml and @DataRobot/@AtScale, we aim to become the world-leading lab for structured data AI.

🔹 Who should apply? PhDs or experienced engineers with deep ML expertise, especially in PyTorch, scikit-learn, NN architectures and structured data (tabular, time series). If you want to work on fundamental AI breakthroughs, scalable models, and multimodal architectures, join us!
📍 Location: Freiburg or Berlin
💼 Compensation: Competitive salary (€70K - 120K) + meaningful equity
🚀 Join us in shaping the future of AI for structured data!
🔗 Apply now: jobs.ashbyhq.com/prior-labs
0 replies · 5 reposts · 31 likes · 2.1K views
Noah Hollmann reposted
Frank Hutter @FrankRHutter
DeepSeek-R1 is a cheap, non-US, open-source foundation model challenging US market leaders … and so is TabPFN v2! Happy to report that TabPFN-TS just took #1 in the popular time series benchmark GIFT-Eval, outperforming Amazon's leading time series foundation model Chronos! 🧵 1/5
[image]
4 replies · 17 reposts · 88 likes · 8.8K views
Breno Brito @brenorb
@FrankRHutter Hey, I tried TabPFN yesterday with 10k time series datapoints, and after 7+ hours on a Mac M1 I decided to stop and move on. Should I have resampled it to use fewer datapoints, or run on a cloud GPU?
2 replies · 0 reposts · 2 likes · 204 views
Frank Hutter @FrankRHutter
#TabPFN v2 also excels on time series! Just before our Nature paper came out, we had this paper at the #NeurIPS time series workshop: openreview.net/forum?id=ho8Yx… We simply cast time series as tabular regression and use exactly the model from Nature. It's crazy that this works! 🧵 1/10
[image]
7 replies · 42 reposts · 306 likes · 29.2K views
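The cast-as-tabular-regression trick is simple to sketch: featurize each timestamp (a running index for trend, calendar fields for seasonality), fit on the observed history, and query future timestamps as test rows. The feature choices below are my illustration, not necessarily the workshop paper's exact setup.

```python
import numpy as np
import pandas as pd
from tabpfn import TabPFNRegressor

idx = pd.date_range("2024-01-01", periods=400, freq="h")
rng = np.random.default_rng(0)
y = np.sin(np.arange(400) * 2 * np.pi / 24) + 0.1 * rng.normal(size=400)

def featurize(ts_index):
    # Turn timestamps into a plain feature table: trend plus calendar fields.
    return np.column_stack([
        np.arange(len(ts_index)),  # running index captures trend
        ts_index.hour,             # hour-of-day captures daily seasonality
        ts_index.dayofweek,        # day-of-week captures weekly effects
    ])

X = featurize(idx)
model = TabPFNRegressor().fit(X[:300], y[:300])  # observed history as context
forecast = model.predict(X[300:])                # future timestamps as queries
```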