Julien Le Dem

11K posts

@J_

Architect, Founder, Angel, Advisor, Keynote speaker, OSS: @OpenLineage @MarquezProject, ASF: Parquet Arrow Iceberg 🐖. 🦋 https://t.co/4VQUXaZ5vu . he/him

California · Joined May 2009
2.1K Following · 4.1K Followers

Julien Le Dem retweeted
Kevin Liu @kevinjqliu
Hey it’s @J_ @ Iceberg Summit. Great keynote!!

Julien Le Dem retweeted
AI Council @AICouncilConf
The database landscape is going through its biggest shift in a decade. AI workloads are pushing OLTP and OLAP closer together, object storage is becoming the de facto foundation for AI search at hundred‑billion‑ to trillion‑document scale, and agents are spinning up databases and schemas programmatically in patterns classic systems were never optimized for.

AI Council's Data Engineering & Databases track is where the builders re-architecting the stack come together. Curated by @saisrirampur, Director of Product at @ClickHouseDB, here's the lineup:

→ Hannes Mühleisen, Co-Founder & CEO at @duckdb: "Super-Secret Next Big Thing for DuckDB"
→ @nikhilbenesch, CTO at @turbopuffer: "Fast AI Search on Object Storage @ >1 Trillion Scale"
→ @J_ & Pierre Lacave, Principal Engineer & Staff Engineer at @datadoghq: "The Deconstructed Database at Datadog"
→ @iskakaushik, Engineering Lead at ClickHouseDB: "Building a Unified OLTP + OLAP Database for AI Workloads"
→ Bhargavi Reddy Dokuru, Staff Data Engineer at @netflix: "Democratizing Analytics via Self Service: Netflix Games Edition"
→ @kelvich, Principal Software Engineer at @databricks & Neon co-founder: "AI Needs a New Kind of OLTP: Lakebase & Serverless Postgres in the Agent Era"
→ Robin Tang, Co-founder & CTO at @artie_labs: "Scaling CDC to Trillions of Rows: What Broke, What We Rebuilt, and What AI Demands Next"

A huge thank you to Sai for curating this awesome track! Join us in SF, May 12-14!
🎟️ aicouncil.com

Julien Le Dem retweeted
Onehouse @Onehousehq
@J_ co-created Apache Parquet, Apache Arrow, and OpenLineage. Three projects. Three industry standards.

Parquet at Twitter in 2013. Arrow at Dremio. OpenLineage at Datakin, acquired as part of Astronomer's $213M Series C. He is now Principal Engineer at Datadog and an officer of the Apache Software Foundation. That is an unusual track record of picking the right abstraction at the right time.

His OpenXData talk argues that the current wave of challengers -- Lance, Vortex, Nimble, FastLanes, BtrBlocks, F3 -- are solving real problems but misreading what made Parquet succeed in the first place. The core contribution was not the encoding choices. It was the community consensus mechanism those choices were built inside.

His case: use established open source communities to absorb these innovations rather than fragment the ecosystem across six competing formats.

He published the written version of this argument at sympathetic.ink in December 2025. OpenXData is where you can push back live.

👉 Register here: openxdata.ai

Julien Le Dem @J_
@changhiskhan My point is that the upper layers can be built on top of the existing format. You don’t need to start a new stack from scratch. Not everything needs to be in Parquet, and you don’t need a new format to build them.

changhiskhan @changhiskhan
@J_ Great article and really thoughtful! Encodings are almost the least interesting part of Lance tho. Requirements for AI data and workloads span multiple layers. Even IF Parquet moves forward on encodings, there’s still so much that’s missing for AI engineers and researchers *now*.

Julien Le Dem @J_
In the past few years, we’ve seen a Cambrian explosion of new columnar formats challenging the hegemony of Parquet. Presumably, the design of yore is not going to cut it moving forward. I spent some time trying to understand how things actually changed. sympathetic.ink/2025/12/11/Col…

Julien Le Dem retweeted
Andrew Lamb @andrewlamb1111
There is some crazy (good) activity on the @ApacheParquet mailing list for new encodings. A sample: PFOR, FSST, ALP, Strings and Cascaded Encodings. 🤯 Huge kudos to Arnav Balyan, Prateek Gaur, and Micah Kornfield for driving this. lists.apache.org/list.html?dev@…

Julien Le Dem @J_
I’ll be speaking in Mountain View on Thursday. Come say hi!
Quoting Delta Lake @DeltaLakeOSS:

Parquet sparked a revolution in columnar storage. Now AI workloads are driving a new wave of change. At 𝗢𝗽𝗲𝗻 𝗟𝗮𝗸𝗲𝗵𝗼𝘂𝘀𝗲 + 𝗔𝗜 𝗠𝗶𝗻𝗶 𝗦𝘂𝗺𝗺𝗶𝘁, Julien Le Dem (@datadoghq) will cover: 🔹 What’s changed since Parquet was introduced 🔹 Why new columnar formats are emerging now 🔹 The encoding advances shaping what comes next—and how they’re pushing Parquet to evolve 📅 Nov 13 | Mountain View 🔗 Register: luma.com/OLMS-1113 #openlakehouse #opensource #columnstorage #ai #parquet


Julien Le Dem retweeted
Hyperparam @hyperparamapp
Cool parquet metadata visualizer by @J_, powered by Hyparquet

Julien Le Dem @J_
If you've been wondering why we see a flurry of new columnar formats, come see me present "Column Storage for the AI Era". I'll talk about what has changed, new advances in data encoding and how that's pushing Parquet to evolve. Event tomorrow: luma.com/pxikwty3

Julien Le Dem @J_
@julianhyde Ironically it is simultaneously terrible and the thing everyone compares themselves against :)

Julian Hyde @julianhyde
@J_ I hope you get a dime every time someone says Parquet is terrible.

Julien Le Dem @J_
I'm trying to better understand real-life deployments of open source ClickHouse. If you're using it, what does your deployment look like?