Apache Hudi

627 posts

Apache Hudi

@apachehudi

Official twitter handle of Apache Hudi, an open data lakehouse platform. https://t.co/SXay7oHNah

Joined January 2019
126 Following · 3.7K Followers

Apache Hudi @apachehudi
At scale, table metadata explodes, as Google's VLDB paper "Big Metadata" shows. Hudi's metadata table treats it like big data 📊: self-managed, indexed Key bits: - Partition/file listings 📁 - Column stats 📈 - Record indexes 🔍 This ensures efficient reads/writes as things grow 🚀.
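One rough way to picture the metadata table idea from the post (a toy sketch, not Hudi's actual layout; all structure and names here are illustrative): partition file listings, column stats, and a record index are kept as indexed data, so planning a query never requires a slow storage listing.

```python
# Illustrative sketch of a "metadata table": table metadata stored and
# indexed like data, instead of listing cloud storage on every query.
# Structures and names are hypothetical, not Hudi's real layout.

metadata_table = {
    "files": {                      # partition -> file listing
        "city=SF": ["f1.parquet", "f2.parquet"],
        "city=NY": ["f3.parquet"],
    },
    "column_stats": {               # (file, column) -> (min, max)
        ("f1.parquet", "ts"): (100, 200),
        ("f2.parquet", "ts"): (201, 300),
        ("f3.parquet", "ts"): (150, 250),
    },
    "record_index": {"user42": "f2.parquet"},  # record key -> file
}

def files_for_predicate(partition, column, lo, hi):
    """List candidate files via indexed metadata, with no storage listing."""
    out = []
    for f in metadata_table["files"][partition]:
        fmin, fmax = metadata_table["column_stats"][(f, column)]
        if fmax >= lo and fmin <= hi:   # ranges overlap -> must read file
            out.append(f)
    return out

print(files_for_predicate("city=SF", "ts", 250, 400))  # only f2 qualifies
print(metadata_table["record_index"]["user42"])        # point lookup by key
```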

Apache Hudi @apachehudi
Data skipping often assumes simple filters: column = value 💡 But real queries are messy 😩 with transforms like: • from_unixtime(ts) ⏰ • substring(id, …) ✂️ • lower(name) 🔡 Hudi's expression indexes reframe Hive partitioning as indexing ✨ - Logical partitioning: Index by 'ts' despite physical 'city' partitions - Build indexes on transformed expression outputs Perfect for complex lakehouse queries!
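A minimal model of the expression-index idea (illustrative only, not Hudi's internals): per-file min/max stats are computed over the *output* of a transform, here the hour of an epoch timestamp, so a filter on the transformed value can prune files even though the physical layout is partitioned by something else.

```python
# Sketch of an expression index: per-file min/max stats over a transformed
# expression (the hour extracted from an epoch-seconds timestamp), so a
# filter like hour(ts) BETWEEN 9 AND 11 prunes files regardless of the
# physical partitioning. Names and layout are illustrative.
from datetime import datetime, timezone

def hour_of(ts):
    return datetime.fromtimestamp(ts, tz=timezone.utc).hour

# file -> raw epoch-second timestamps it contains
files = {
    "a.parquet": [3600 * 1, 3600 * 2],    # hours 1-2
    "b.parquet": [3600 * 9, 3600 * 10],   # hours 9-10
}

# Build the index on the *expression output*, not the raw column.
expr_index = {
    f: (min(map(hour_of, rows)), max(map(hour_of, rows)))
    for f, rows in files.items()
}

def prune(lo, hi):
    """Files that may contain rows with lo <= hour(ts) <= hi."""
    return [f for f, (mn, mx) in expr_index.items() if mx >= lo and mn <= hi]

print(prune(9, 11))  # only b.parquet can match
```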

Apache Hudi @apachehudi
Blog introducing expression indexes: hudi.apache.org/blog/2024/12/1…

Apache Hudi @apachehudi
Merge-on-Read is more than just fast writes 🚀. It also needs ongoing file compaction during ingestion 🔄. Key: this compaction can't block writers ⚠️. Hudi's async compaction ⚙️ runs as a background service, keeping ingestion smooth while merging logs and base files. No freshness lags or livelocks 🔒—unlike others that halt writers.
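A toy model of the non-blocking property (purely illustrative; Hudi's actual compactor is a distributed service, not this): the writer appends to a log, and the compactor grabs the pending log with a brief O(1) swap under a lock, then does the heavy merge outside the lock, so ingestion never waits on a merge.

```python
# Toy model of async compaction: a writer appends row-level updates to a
# log while a compactor folds the log into the base file WITHOUT blocking
# the writer. A short lock only swaps the log reference; the merge itself
# runs outside the lock. Illustrative only, not Hudi's implementation.
import threading

lock = threading.Lock()
base = {"k1": 1}     # compacted "base file" contents: key -> value
log = []             # pending "log file" entries: (key, value)

def write(key, value):
    with lock:
        log.append((key, value))   # writers never wait on a merge

def compact():
    global log
    with lock:
        pending, log = log, []     # O(1) swap under the lock
    for key, value in pending:     # heavy merge happens outside the lock
        base[key] = value

def read():
    with lock:
        merged = dict(base)
        merged.update(dict(log))   # readers see base + un-compacted log
    return merged

write("k1", 2)
write("k2", 7)
t = threading.Thread(target=compact); t.start(); t.join()
write("k3", 9)                     # ingestion continues around compaction
print(read())
```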

Apache Hudi @apachehudi
Explore concurrency pitfalls in this blog: hudi.apache.org/blog/2021/12/1…

Apache Hudi @apachehudi
Many pipelines claim “incremental” but rescan full tables for changes 🔍. That's just scheduled batch scans, not true incrementals. Apache Hudi delivers real incremental processing 🔄: Fetch only changes since a commit. Slashes costs 💸: Lower I/O 📉, shuffle 🔀, latency ⏱️, simpler DAGs 🗺️. Reimagine derived tables—update from upstream streams, skip full rebuilds. Hudi builds this right into storage.
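The contrast above can be sketched in a few lines (a conceptual model, not Hudi's API; commit IDs and structures are illustrative): if each commit records which rows changed, a downstream job checkpointed at a commit fetches only the changes after it, instead of rescanning the full table.

```python
# Sketch of incremental pulls: each commit records the rows it changed,
# so a downstream job fetches only changes after its checkpoint instead
# of rescanning the table. Commit IDs and structures are illustrative.
commits = {
    1: [("u1", {"spend": 10})],
    2: [("u2", {"spend": 5})],
    3: [("u1", {"spend": 25}), ("u3", {"spend": 7})],
}

def incremental_read(since_commit):
    """All row changes strictly after `since_commit`, in commit order."""
    return [
        change
        for c in sorted(commits)
        if c > since_commit
        for change in commits[c]
    ]

# Downstream job checkpointed at commit 2: touch 2 rows, not the whole table.
changes = incremental_read(since_commit=2)
print(changes)   # [('u1', {'spend': 25}), ('u3', {'spend': 7})]
```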

Apache Hudi @apachehudi
Operational data changes row by row. Lake storage is immutable. That mismatch is why old data lake pipelines rewrite entire partitions for small fixes, late events, or CDC updates/deletes. Apache Hudi matters because it adds two missing primitives: •row-level upserts/deletes •incremental reads of what changed That turns the lake into something you can actually operate.
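A tiny sketch of the first primitive (illustrative structures only, not Hudi's storage code): with an index from record key to file group, an upsert rewrites just the one small file that holds the record, rather than the whole partition.

```python
# Sketch of row-level upserts over immutable files: an index maps each
# record key to its file group, so an upsert copy-on-writes ONE file
# instead of rewriting the partition. Entirely illustrative.
partition = {                       # file -> {key: row}
    "fg-0.parquet": {"u1": {"city": "SF"}, "u2": {"city": "NY"}},
    "fg-1.parquet": {"u3": {"city": "LA"}},
}
record_index = {"u1": "fg-0.parquet", "u2": "fg-0.parquet", "u3": "fg-1.parquet"}

def upsert(key, row):
    """Rewrite only the file group that holds `key`."""
    f = record_index[key]           # index lookup: key -> file group
    new_contents = dict(partition[f])   # copy-on-write of ONE small file
    new_contents[key] = row
    partition[f] = new_contents
    return f                        # the single file rewritten

touched = upsert("u3", {"city": "SEA"})
print(touched)                      # fg-1.parquet: 1 of 2 files rewritten
```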

Apache Hudi @apachehudi
🎓 Vinoth at CMU on a key lakehouse problem: read/write amplification in MERGE operations. MERGE INTO is everywhere in ETL. Most merges only change a few columns per record. Yet full records get rewritten. @apachehudi's answer: partial update encoding on MOR tables. Write only the changed columns to log files. Defer full merges to compaction. Benchmark (1TB table, 100 fields, updates on 3): ⚡ Update latency: 1.4x faster 💾 Bytes written: 70.2x less I/O 🔍 Query latency: 5.7x faster 70x less write I/O. That's the power of writing only what changed. Watch: youtu.be/AYaw06_Xazo?si… #ApacheHudi #DataLakehouse #DataEngineering
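The mechanism in the talk can be modeled in miniature (a sketch, not Hudi's actual encoding): the log stores only the columns an update touched, and compaction later folds those partials onto the full base record.

```python
# Sketch of partial-update encoding on a MOR table: the log stores only
# the changed columns; compaction merges partials onto the full base
# record. Structures are illustrative, not Hudi's wire format.
base = {"id7": {"name": "Ann", "city": "SF", "spend": 10}}  # ~100 fields in practice
log = [("id7", {"spend": 25})]          # write ONLY the changed column

def compact(base, log):
    """Fold partial-column log entries onto full base records."""
    merged = {k: dict(v) for k, v in base.items()}
    for key, partial in log:
        merged.setdefault(key, {}).update(partial)
    return merged

print(compact(base, log))  # {'id7': {'name': 'Ann', 'city': 'SF', 'spend': 25}}
```

With wide tables (100 fields, 3 updated, as in the benchmark), writing one column instead of the full record is where the ~70x write-I/O reduction comes from.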

Apache Hudi @apachehudi
What if your query engine could skip entire partitions before looking at files? Hudi 1.0's dual-layer data skipping: 1️⃣ Partition stats skip partitions using partition-level min/max 2️⃣ Column stats then prune files within surviving partitions In a 1TB benchmark, files read dropped from 393,360 → 19,304. Gains vary by data and query, but the mechanism is clear: fewer partitions = fewer files scanned. ⚡ Both indexes enabled by default. No extra config. 👇 Deep dive with benchmarks: hudi.apache.org/blog/2025/10/2… #ApacheHudi #DataLakehouse #DataEngineering
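The two layers compose like this (stats and names are illustrative, not real Hudi metadata): partition-level min/max eliminates whole partitions first, then file-level column stats prune files only within the survivors.

```python
# Sketch of dual-layer data skipping: partition-level min/max prunes
# whole partitions, then file-level column stats prune files within the
# surviving partitions. Stats are illustrative.
partition_stats = {"p1": (0, 50), "p2": (51, 120)}  # partition -> (min, max) of ts
file_stats = {
    "p1": {"f1": (0, 20), "f2": (21, 50)},
    "p2": {"f3": (51, 90), "f4": (91, 120)},
}

def overlaps(stats, lo, hi):
    mn, mx = stats
    return mx >= lo and mn <= hi

def files_to_scan(lo, hi):
    survivors = [p for p, s in partition_stats.items() if overlaps(s, lo, hi)]
    return [
        f
        for p in survivors                 # layer 1: prune partitions
        for f, s in file_stats[p].items()
        if overlaps(s, lo, hi)             # layer 2: prune files
    ]

print(files_to_scan(95, 110))  # p1 pruned entirely; only f4 survives
```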

Apache Hudi @apachehudi
90% less data scanned. 58% faster queries. 🚀 Apache Hudi's secondary indexes bring database-style indexing to the lakehouse. CREATE INDEX idx_city ON hudi_table(city); That's it. Now queries on non-key fields skip irrelevant files instead of scanning everything. ✂️ 📊 Benchmark on 1TB TPCDS: 📉 67 GB scanned → 7 GB 📁 5000 files → 521 files ⚡ 14s → 6s For Athena users: less data scanned = lower costs 💰 👇 Deep dive with examples: hudi.apache.org/blog/2025/04/0… #ApacheHudi #DataLakehouse #DataEngineering
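The `CREATE INDEX` statement in the post is real Hudi SQL; what it buys can be modeled with a toy value-to-files map (the Python below is purely illustrative): an equality filter on a non-key column reads only files the index points at.

```python
# Toy model of a secondary index on a non-key column: value -> set of
# files containing it, so a filter on `city` scans only matching files.
# Illustrative only; Hudi's index lives in its metadata table.
files = {
    "f1": [{"id": 1, "city": "SF"}, {"id": 2, "city": "NY"}],
    "f2": [{"id": 3, "city": "LA"}],
    "f3": [{"id": 4, "city": "SF"}],
}

# CREATE INDEX idx_city ON hudi_table(city);  -- modeled as:
idx_city = {}
for f, rows in files.items():
    for row in rows:
        idx_city.setdefault(row["city"], set()).add(f)

def query_city(city):
    """Scan only files the index says may contain `city`."""
    return [
        r
        for f in sorted(idx_city.get(city, ()))
        for r in files[f]
        if r["city"] == city
    ]

print(query_city("SF"))   # 2 of 3 files scanned
```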

Apache Hudi @apachehudi
One Hudi job. Multiple Kafka topics. Multiple Hudi tables. 🚀 New AWS tutorial: CDC pipelines with the MultiTable Hudi Streamer (a.k.a. DeltaStreamer) ⚡ Process multiple MSK topics in parallel 🕐 15-min sync intervals 🔒 ACID guarantees 📐 Schema evolution support Common config + table-specific overrides = elegant simplicity. Full walkthrough: hudi.apache.org/blog/2026/01/1… #ApacheHudi #AWS #DataEngineering #CDC

Apache Hudi @apachehudi
@datahubhouse Thanks for the great content! re-shared x.com/apachehudi/sta…
Apache Hudi @apachehudi

🎬 Demystifying @apachehudi - solid intro to lakehouse fundamentals by @datahubhouse ! Covers: 📦 Copy-on-Write vs Merge-on-Read table types ⚡ Hudi 1.0 LSM timeline + secondary indexing 🔄 ACID, schema evolution, CDC, time travel Great starting point for data engineers evaluating lakehouse platforms. 🔗 youtube.com/watch?v=P0cfrp… #ApacheHudi #DataEngineering #DataLakehouse


DataHubHouse @datahubhouse
Demystifying Apache Hudi Apache Hudi is a sophisticated lakehouse platform designed to manage large-scale, mutable datasets through transactional table formats. The provided documentation highlights two primary storage strategies: Copy-on-Write, which is optimised for heavy read workloads by creating new base files, and Merge-on-Read, which balances performance via delta logs and background compaction. These sources detail the Hudi 1.0 release, introducing an enhanced LSM-based timeline for high-frequency writes and advanced secondary indexing to accelerate query speeds. The technical specifications explain how the system ensures ACID transactions and schema evolution across diverse engines like Spark and Flink. Furthermore, the texts explore Change Data Capture and incremental processing, allowing users to efficiently track record updates and perform time-travel queries. Ultimately, the materials demonstrate how Hudi transforms immutable cloud storage into a high-performance, stream-processing-friendly data environment. youtu.be/P0cfrpcM55Y?si… via @YouTube