Marco Slot

616 posts

Marco Slot @marcoslot

Mostly tweets about Postgres, Snowflake Postgres, and Postgres extensions. Formerly Crunchy Data, Microsoft, Citus Data, AWS, TCD, VU

Haarlem, Netherlands · Joined May 2009
662 Following · 1.4K Followers

Pinned Tweet
Marco Slot @marcoslot ·
The modern data warehouse for PostgreSQL has arrived! Crunchy Data Warehouse is PostgreSQL enhanced for fast analytics and data pipelines, powered by Iceberg and DuckDB, with easy data lake query/import/export, fast local disk cache, managed for you by the team at Crunchy Data.
Crunchy Data@crunchydata

Announcing Crunchy Data Warehouse! A next-generation Postgres-native data warehouse. Full Iceberg support for fast analytical queries and transactions, built on unmodified Postgres to support the features and ecosystem you love. crunchydata.com/blog/crunchy-d…

1 reply · 10 reposts · 68 likes · 6K views
Marco Slot @marcoslot ·
@tzkb Yes, it can read Iceberg tables managed by a REST catalog, though support is still somewhat limited (currently only tested with Polaris, can only configure 1 catalog) github.com/Snowflake-Labs…
1 reply · 0 reposts · 4 likes · 158 views
こば -Koba as a DB engineer-
pg_lake — I had misunderstood it. So it's a setup where PostgreSQL itself holds the Iceberg catalog. I thought it was something that went out to an external catalog and queried Iceberg from there. I wonder if it can do that too.
2 replies · 0 reposts · 5 likes · 1.5K views
Marco Slot @marcoslot ·
A deep dive on why Postgres and Iceberg belong together 🙂
Stanislav Kozlovski@kozlovski

Your Postgres is 100x slower than traditional OLAP engines. A deceptively simple OSS extension fixes this. Here's an interview where we dive into the deep engineering around how this is achieved.

Joining me (and leading the conversation) is Marco Slot: an engineer with an EXTENSIVE and impressive career history around PostgreSQL:
👉 Created pg_cron in 2017 (3.7k stars) - a tool to run cron jobs in Postgres
👉 Built pg_incremental - fast, reliable, incremental batch processing inside PostgreSQL itself
👉 Co-created pg_lake (after working on Crunchy Data's Warehouse, and getting acquired into Snowflake)
👉 Helped get pg_documentdb (MongoDB-on-Postgres) off the ground

@marcoslot is a world-class expert in Postgres extensions. He seriously impressed me with his knowledge over the course of a private LinkedIn conversation, and now that I type out his resume - I understand where it came from. He should be on everyone's radar. So I brought him on the pod.

In our full 2-hour deep-dive, we went over:
• 🔥 how pg_lake makes analytics 100x faster (literally)
• 🔥 perf internals like vectorized execution & CPU branching
• 🤔 practical differences between OLTP and OLAP database development (and the age-old mission of uniting both)
• 🤔 how (and why) pg_lake intercepts query plans and delegates parts of the query tree to DuckDB
• 💡 why Postgres is architecturally terrible at analytical queries (and how vectorized execution fixes this)
• 💡 Marco's hard-won experience through a decade+ career in Postgres
• 🏆 Iceberg's role as the TCP/IP for tables
• 🏆 what the real moat of PostgreSQL is

Developments like pg_lake are a real reason why "Just Use Postgres" is much more than a meme, and it'll continue to dominate discourse. I promise you will learn a lot from this episode.

Timestamps:
(0:02) What is pg_lake?
(2:23) Postgres' 100x-slower problem and the columnar storage experiments they made to speed Postgres up for analytics
(6:00) practical examples and internals
(16:20) perf internals - vectorized execution & CPU optimization
(23:00) pg_lake architecture (why DuckDB isn't embedded) and the connection-per-process issue
(29:16) how pg_lake intercepts the query plan tree and delegates parts to DuckDB
(41:09) Iceberg catalogs
(48:24) Postgres-to-Iceberg ingestion patterns (and pg_incremental)
(53:40) Marco's (long) career: early AWS, Citus, Microsoft, Crunchy Data & Snowflake
(1:04:20) Marco's observations around the merging of OLTP and OLAP (and the subtle dev differences there)
(1:15:30) reverse ETL
(1:33:08) Iceberg as the TCP/IP for tables
(1:35:00) Marco's thoughts on the "Just Use Postgres" fever

0 replies · 0 reposts · 8 likes · 780 views
Marco Slot reposted
Stanislav Kozlovski @kozlovski ·
[Same interview announcement as quoted in full above.]
2 replies · 24 reposts · 232 likes · 16.4K views
Marco Slot reposted
Craig Kerstiens @craigkerstiens ·
Your fully open source time series stack:
Postgres: because duh
pg_partman: time partitioning
pg_lake: Iceberg for archival natively in Postgres
pg_incremental: incremental data processing
snowflake.com/en/engineering…
1 reply · 11 reposts · 76 likes · 4.1K views
Marco Slot @marcoslot ·
@denismagda @tobias_petry We've definitely considered it, but two main challenges are the space reclamation in DuckDB files, and unioning not-cached Parquet with a large number of DuckDB tables without repercussions. So far, we've punted on those due to high complexity for incremental improvement.
0 replies · 0 reposts · 1 like · 33 views
Denis Magda @denismagda ·
Let Postgres own the Iceberg catalog and delegate analytics to DuckDB. The result => transactional lakehouse updates with fast analytical queries. This isn’t a concept. It’s exactly what pg_lake delivers today.

pg_lake combines a set of extensions and components that let you query and modify Iceberg tables (and other lakehouse formats) directly from Postgres. DuckDB is used to accelerate analytical queries and runs in a sidecar process called pgduck_server, which communicates with Postgres during query execution.

How it works (diagram below):
1. An application sends a query to Postgres to calculate unrealized PnL (Profit and Loss) for the Disney ticker.
2. Postgres parses the query and identifies the part that computes the average price from historical lakehouse data.
3. That part is forwarded to pgduck_server for accelerated execution.
4. pgduck_server delegates execution to DuckDB, which queries the lakehouse (reusing cached data if available).
5. DuckDB computes the average price and returns it to Postgres.
6. Postgres joins the result with local portfolio data and computes the unrealized PnL.
7. The final result is returned to the application.
7 replies · 33 reposts · 375 likes · 28.4K views
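A hedged sketch of what the query in step 1 of the walkthrough above might look like. The table names (`portfolio` as a regular Postgres heap table, `trades_history` as a pg_lake-managed Iceberg table) and all columns are hypothetical; the point is only that the aggregate over lakehouse data is the part Postgres can delegate to DuckDB:

```sql
-- Hypothetical schema: portfolio is a local Postgres table,
-- trades_history is an Iceberg table managed by pg_lake.
SELECT
    p.ticker,
    -- Unrealized PnL: current price vs. historical average, times position size.
    (p.last_price - h.avg_price) * p.quantity AS unrealized_pnl
FROM portfolio p
JOIN (
    -- This aggregate over lakehouse data is the subtree that pg_lake
    -- can forward to pgduck_server / DuckDB for accelerated execution.
    SELECT ticker, avg(price) AS avg_price
    FROM trades_history
    GROUP BY ticker
) h USING (ticker)
WHERE p.ticker = 'DIS';
```

The join with `portfolio` and the final arithmetic then run in Postgres itself, matching steps 6 and 7.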
fictiousxxl @Mr_D_V_D ·
@denismagda can we see the implementation of this so we understand the actual setup
1 reply · 0 reposts · 0 likes · 372 views
Marco Slot @marcoslot ·
@tobias_petry @denismagda We implemented an LRU file caching layer (write-through and background fetches) on top of DuckDB's file system abstraction which can be activated by running pgduck_server with --cache_dir .. In addition Iceberg metadata is cached in Postgres tables.
1 reply · 0 reposts · 2 likes · 94 views
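A minimal sketch of the cache setup described above. The `--cache_dir` flag comes from the tweet; the directory path is an assumption, and any other flags pgduck_server may need are omitted:

```shell
# Start the DuckDB sidecar with a local LRU file cache directory
# (write-through caching and background fetches for files read from the lake).
pgduck_server --cache_dir /var/cache/pgduck
```

Iceberg metadata caching, per the tweet, happens separately in Postgres tables rather than in this file cache.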
Tobias_Petry.sql @tobias_petry ·
@denismagda > 4. pgduck_server delegates execution to DuckDB, which queries the lakehouse (reusing cached data if available). How does it reuse cached data? Did you setup a file-based cache for the lakehouse data?
1 reply · 0 reposts · 1 like · 711 views
Marco Slot reposted
Mim @mim_djo ·
First look at pg_lake, and how #duckdb gave #postgresql a boost :) Apache Iceberg + Postgres + DuckDB compute pushdown is a very interesting combo. youtu.be/qlzIY6hjjLw
1 reply · 11 reposts · 54 likes · 6.2K views
さくらもち太郎🍡
さくらもち太郎🍡@Korosuke512tr·
Work has wrapped up for the year, but how best to compact Iceberg tables created with pg_lake is still a mystery, so the investigation continues 📈
1 reply · 0 reposts · 2 likes · 140 views
Marco Slot @marcoslot ·
@tzkb It uses text to store custom types and leaves the parsing and filtering to Postgres. It's definitely not very efficient, but always works. Note that pg_lake can delegate whole complex queries into DuckDB, just not when it needs to filter in Postgres.
0 replies · 0 reposts · 0 likes · 68 views
こば -Koba as a DB engineer-
With pg_lake, create table succeeds even when I specify data types that DuckDB supposedly can't handle. And when I send a query, the filter seems to be applied in both DuckDB and Postgres. I wonder if pg_lake (pgduck_server?) is doing some kind of conversion. Intuitively, this behavior looks even stranger to me.
1 reply · 0 reposts · 1 like · 712 views
こば -Koba as a DB engineer-
Found a good summary of the current state of Postgres + DuckDB/Iceberg 👀 I hadn't really understood how pg_mooncake behaves, so this is much appreciated. And since Iceberg support keeps landing from DuckDB v1.4 onward, the landscape here seems likely to change soon. zenn.dev/forcia_tech/ar…
1 reply · 5 reposts · 38 likes · 4.1K views
Marco Slot @marcoslot ·
@mim_djo with del as (delete from heap returning *) insert into iceberg select * from del; Put it in a pg_cron job and problem solved 😉
2 replies · 0 reposts · 2 likes · 328 views
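Marco's one-liner above, wrapped in a pg_cron schedule as he suggests. `cron.schedule` is pg_cron's real scheduling function; the job name, the five-minute interval, and the `heap` and `iceberg` table names are placeholders:

```sql
-- Periodically drain a regular heap table into an Iceberg table.
-- DELETE ... RETURNING feeds the INSERT atomically in a single statement.
SELECT cron.schedule(
    'move-to-iceberg',   -- job name (placeholder)
    '*/5 * * * *',       -- every five minutes (placeholder interval)
    $$
    WITH del AS (
        DELETE FROM heap RETURNING *
    )
    INSERT INTO iceberg SELECT * FROM del
    $$
);
```

Because both the DELETE and the INSERT run in one transaction, rows either move or stay put; a failed run leaves the heap table untouched.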
Mim @mim_djo ·
Iceberg or Delta for that matter are not great for streaming; I got like 1.3 transactions per second 🙂
2 replies · 1 repost · 14 likes · 1.6K views
Marco Slot reposted
Mim @mim_djo ·
Playing with pg_lake in PostgreSQL, suddenly everything makes sense: same database, two workloads, OLTP and OLAP, using 🦆
1 reply · 1 repost · 16 likes · 1.5K views
Marco Slot @marcoslot ·
pg_lake just went open source! pg_lake is a set of extensions (from Crunchy Data Warehouse) that add comprehensive Iceberg support and data lake access to Postgres, with @duckdb transparently integrated into the query engine. GitHub: snowflake-labs/pg_lake Blog link below
1 reply · 5 reposts · 28 likes · 1.4K views
Marco Slot reposted
Andy Pavlo (@andypavlo.bsky.social)
.@abigale_kim's paper is unleashed! It's the most complete eval of DB extensions/plugins. We analyze @PostgreSQL, @MySQL, @mariadb , SQLite, @duckdb, @Redis. TLDR: Postgres ecosystem is fraught w/ footguns. Other DBMSs have fewer extns but less problems. DuckDB has cleanest API.
PVLDB@pvldb

Vol:18 No:6 → Anarchy in the Database: A Survey and Evaluation of Database Management System Extensibility vldb.org/pvldb/vol18/p1…

1 reply · 32 reposts · 182 likes · 14.5K views
Marco Slot reposted
Andy Pavlo (@andypavlo.bsky.social)
No system hits the sweet spot of allowing for extensibility while maintaining systems safety. It would be nice if there was a standard plugin API (think POSIX) that allows compatibility across systems. Thanks to @marcoslot + @dave_andersen for their collaboration on this project
0 replies · 3 reposts · 22 likes · 3.4K views