Daft

616 posts

@daftengine

Distributed query engine providing simple and reliable data processing for any modality and scale (https://t.co/IN219tFqrN)

Joined September 2022

46 Following · 802 Followers

Pinned Tweet

Daft@daftengine·12 Dec

.@SourcetableApp CTO @andrewgrosser shares his recommended tech stack for serious startups - a "wicked combination" that includes:
- S3 + Cassandra for data
- Daft for processing
- Python, WASM, Ray
Learn how they built the first AI-powered spreadsheet: daft.ai/blog/how-sourc…

Daft retweeted

Sammy Sidhu@Sammy_Sidhu·3d

🚨 @NVIDIA's Jensen Huang just dropped the biggest validation for the DataFrame ecosystem at #GTC2026: “You've heard of it — SQL, Spark, Pandas, Velox — some of these really important very large platforms — Snowflake, Databricks, Amazon EMR, Azure Fabric, and Google Cloud, BigQuery... ALL OF THESE PLATFORMS ARE PROCESSING DATAFRAMES. ...About 90% of what's generated every single year is unstructured data. Until now, this data has been completely useless to the world. We read it, we put it into our file system, and that's it. Unfortunately, we can't query it. We can't search for it... so now we have AI do that. Just as AI was able to solve multi-modality perception and understanding, you can use that same technology to go read a PDF."

Daft retweeted

Everett Kleven@everettkleven·3d

daft.func scales stateless Python. But what if your function needs to hold onto something between rows? A loaded model. A database connection. An API client. @daftengine daft.cls turns a Python class into a distributed operator. __init__ runs once per worker. Methods run on every row.
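The "__init__ runs once per worker, methods run on every row" idea can be sketched in plain Python. This is only an illustration of the pattern, not Daft's API: `Tokenizer`, `run_worker`, and the vocabulary are hypothetical names, and the "worker" is simulated by a single function call.

```python
from typing import Iterable, List


class Tokenizer:
    """Hypothetical stateful operator: costly setup in __init__, cheap per-row method."""

    init_calls = 0  # class-level counter so we can observe how often setup runs

    def __init__(self, vocab: Iterable[str]) -> None:
        Tokenizer.init_calls += 1
        # Stand-in for loading a model or opening a database connection.
        self.vocab = {word: idx for idx, word in enumerate(vocab)}

    def encode(self, text: str) -> List[int]:
        # Runs for every row and reuses the state built in __init__.
        return [self.vocab.get(tok, -1) for tok in text.split()]


def run_worker(rows: List[str]) -> List[List[int]]:
    """Simulates one worker: build the operator once, then apply it to each row."""
    op = Tokenizer(["hello", "daft", "world"])
    return [op.encode(row) for row in rows]


result = run_worker(["hello daft", "hello world", "unknown"])
print(result)
```

The point of the split is cost: the setup is paid once per worker rather than once per row, which matters when the state is a large model or a network client.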

Daft retweeted

Everett Kleven@everettkleven·12 Mar

@desmondcheongzx shipped 8 PRs in @daftengine this week. The range is absurd.
Performance:
→ Parquet late materialization via RowFilter + RowSelection (#6311) - skip data you don't need
→ Lazy-load Lance so `import daft` is faster for everyone (#6328)
Infrastructure:
→ Removed parquet2 entirely (#6339)
→ Made arrow-rs the default Parquet reader (#6335)
→ Field.name: String → Arc for fewer allocations (#6351)
Docs:
→ Custom catalog implementation guide (#6366)
→ Join strategies optimization guide (#6367)
8 PRs. Perf, infra, and docs. One week.
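Late materialization, mentioned above, can be pictured with a small conceptual sketch. It assumes in-memory lists standing in for Parquet column chunks; `late_materialize` and the sample table are hypothetical, and none of this is the arrow-rs or Daft API.

```python
from typing import Any, Callable, Dict, List


def late_materialize(
    columns: Dict[str, List[Any]],
    filter_col: str,
    predicate: Callable[[Any], bool],
    projected: List[str],
) -> Dict[str, List[Any]]:
    # Step 1 (the "row filter"): look only at the filter column and
    # record which row positions satisfy the predicate.
    selection = [i for i, value in enumerate(columns[filter_col]) if predicate(value)]
    # Step 2 (the "row selection"): touch the remaining columns only at
    # the selected positions instead of reading every row.
    return {name: [columns[name][i] for i in selection] for name in projected}


table = {
    "id": [1, 2, 3, 4],
    "score": [0.1, 0.9, 0.4, 0.8],
    "payload": ["a", "b", "c", "d"],
}
out = late_materialize(table, "score", lambda s: s > 0.5, ["id", "payload"])
print(out)
```

In a real columnar reader the saving comes from not decoding the wide columns for rows that the filter already rejected.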

Daft retweeted

Everett Kleven@everettkleven·12 Mar

Daft v0.7.5 just shipped.
NEW NATIVE PLUGIN SYSTEM
Any language that can produce a C-compatible shared library can now extend @daftengine's query engine. rchowell modeled it after PostgreSQL's extension system: a stable binary interface, with the Arrow C Data Interface at the boundary.

Daft retweeted

Everett Kleven@everettkleven·11 Mar

goodbye arrow2. universalmind303 merged the final PR removing arrow2 from @daftengine. 122 PRs. 9 contributors. 38,850 lines changed. 6 releases. Every kernel in the engine rewritten. One PR title says it all: "refactor(arrow2): goodbye arrow2" github.com/Eventual-Inc/D…

Daft retweeted

Everett Kleven@everettkleven·10 Mar

Last week we covered what a UDF is and why you need one. This week: the four stateless patterns in @daftengine's daft.func. Row-wise. Generators. Async. Batch. One decorator. Four shapes of work. Same code on your laptop or a cluster.
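The four shapes named above can be shown as plain Python functions. This sketch deliberately leaves out the decorator and any Daft calls. All function names are hypothetical, and the small driver at the bottom only stands in for what an engine would do with each shape.

```python
import asyncio
from typing import Iterator, List


def double(x: int) -> int:
    # Row-wise: one value in, one value out.
    return x * 2


def split_words(text: str) -> Iterator[str]:
    # Generator: one value in, zero or more values out.
    for word in text.split():
        yield word


async def increment(x: int) -> int:
    # Async: awaits I/O (simulated here) so many rows can overlap.
    await asyncio.sleep(0)
    return x + 1


def center(xs: List[int]) -> List[float]:
    # Batch: sees a whole batch at once, here to subtract the batch mean.
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


async def _run_async(xs: List[int]) -> List[int]:
    return list(await asyncio.gather(*(increment(x) for x in xs)))


rows = [1, 2, 3]
doubled = [double(x) for x in rows]
words = [w for text in ["a b", "c"] for w in split_words(text)]
incremented = asyncio.run(_run_async(rows))
centered = center(rows)
print(doubled, words, incremented, centered)
```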

Daft retweeted

Sammy Sidhu@Sammy_Sidhu·6 Mar

One of my favorite PRs for @daftengine Thanks @andrewlamb1111 and the rest of the arrow-rs community for creating such a great crate!

Daft retweeted

Everett Kleven@everettkleven·6 Mar

✌️ Deuces @ApacheArrow @CW_Arrow

Daft@daftengine·5 Mar

Works with every catalog @ApacheIceberg supports: REST, Hive, Glue, Nessie, @unitycatalog. Read distributed. Write distributed. Filter pushdown. Zero code changes local to cluster. Iceberg docs: docs.daft.ai/en/stable/conn…

Daft@daftengine·5 Mar

Daft shipped same-day support for PyIceberg 0.11.0 — including the breaking API changes to partition fields and data file inspection. Plus: custom snapshot properties on writes for pipeline lineage tracking. Day-one compatibility. Not backfilling months later.

Daft@daftengine·5 Mar

Everyone's talking about Apache Iceberg this week. Snowflake just shipped v3 public preview. Supabase built Analytics Buckets on it. The entire data eng timeline is Iceberg. Meanwhile @daftengine has been quietly shipping deep Iceberg support. Here's what you can do today:

Daft retweeted

Everett Kleven@everettkleven·4 Mar

What's the most painful custom transform you've had to scale? Most data frameworks make you choose between writing idiomatic Python and scaling with internal APIs. @daftengine says no. Use `@daft.func` and `@daft.cls` to scale any Python function. [sync, batch, async, generators]

Daft retweeted

Everett Kleven@everettkleven·3 Mar

60+ storage backends. One line of config. @daftengine now supports @ApacheOpenDAL — the Apache project that unifies S3, GCS, Azure, HDFS, Tencent COS, Google Drive, Redis, and dozens more behind a single API.

Daft retweeted

Everett Kleven@everettkleven·3 Mar

Most data teams don't have one observability problem. They have three:
1) Local development
2) Production monitoring
3) Post-incident analysis
How we're tackling this in @daftengine:

Daft@daftengine·26 Feb

why did we choose arrow2... this one's for the real ones... universalmind303 what would we do without you 😩

Daft@daftengine·26 Feb

uv add "daft>=0.7.4"
github.com/Eventual-Inc/D…

Daft@daftengine·26 Feb

OpenDAL
Daft now reads from any storage backend that Apache OpenDAL supports (S3, GCS, Azure Blob, HDFS, Tencent COS, and dozens more). One API. No dedicated connectors.

Daft@daftengine·26 Feb

Daft v0.7.4 just shipped.
THE GREAT ARROW-RS MIGRATION
122 PRs completing the arrow2 → arrow-rs migration. 38,850 lines changed.
Plus: full observability stack, Apache OpenDAL support, and Flight shuffle for Flotilla.
