Daft

616 posts

@daftengine

Distributed query engine providing simple and reliable data processing for any modality and scale (https://t.co/IN219tFqrN)

Joined September 2022
46 Following · 802 Followers
Pinned Tweet
Daft (@daftengine)
.@SourcetableApp CTO @andrewgrosser shares his recommended tech stack for serious startups, a "wicked combination" that includes:
- S3 + Cassandra for data
- Daft for processing
- Python, WASM, Ray

Learn how they built the first AI-powered spreadsheet: daft.ai/blog/how-sourc…
Daft reposted
Sammy Sidhu (@Sammy_Sidhu)
🚨 @NVIDIA's Jensen Huang just dropped the biggest validation for the DataFrame ecosystem at #GTC2026: “You've heard of it — SQL, Spark, Pandas, Velox — some of these really important very large platforms — Snowflake, Databricks, Amazon EMR, Azure Fabric, and Google Cloud, BigQuery... ALL OF THESE PLATFORMS ARE PROCESSING DATAFRAMES. ...About 90% of what's generated every single year is unstructured data. Until now, this data has been completely useless to the world. We read it, we put it into our file system, and that's it. Unfortunately, we can't query it. We can't search for it... so now we have AI do that. Just as AI was able to solve multi-modality perception and understanding, you can use that same technology to go read a PDF."
Daft reposted
Everett Kleven (@everettkleven)
`daft.func` scales stateless Python. But what if your function needs to hold onto something between rows? A loaded model. A database connection. An API client. @daftengine `daft.cls` turns a Python class into a distributed operator. `__init__` runs once per worker. Methods run on every row.
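The execution model described above (one `__init__` per worker, one method call per row) can be sketched in plain Python. This is a toy simulation, not Daft's API: in Daft itself you would decorate the class with `@daft.cls` and let the engine place it on workers. The `Tokenizer` class, its tiny vocabulary, and the `run_on_workers` helper that splits rows into contiguous partitions are hypothetical, chosen only to make the once-per-worker setup countable.

```python
from typing import Callable, List


class Tokenizer:
    """Hypothetical stateful operator; the class attribute counts setups."""

    init_calls = 0

    def __init__(self) -> None:
        # Stand-in for expensive setup such as loading a model or opening a client.
        Tokenizer.init_calls += 1
        self.vocab = {"hello": 1, "world": 2}

    def __call__(self, text: str) -> List[int]:
        # Per-row work reuses the state built once in __init__.
        return [self.vocab.get(tok, 0) for tok in text.split()]


def run_on_workers(
    make_op: Callable[[], Callable[[str], List[int]]],
    rows: List[str],
    num_workers: int,
) -> List[List[int]]:
    """Simulate workers: one contiguous partition and one operator instance each."""
    size = max(1, -(-len(rows) // num_workers))  # ceiling division, at least 1
    results: List[List[int]] = []
    for start in range(0, len(rows), size):
        op = make_op()  # constructed once per simulated worker
        for row in rows[start:start + size]:
            results.append(op(row))  # called once per row
    return results
```

With five rows and two simulated workers, `__init__` runs twice while `__call__` runs five times, which is the trade-off that makes a stateful operator worthwhile when setup is expensive.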
Daft reposted
Everett Kleven (@everettkleven)
@desmondcheongzx shipped 8 PRs in @daftengine this week. The range is absurd.

Performance:
→ Parquet late materialization via RowFilter + RowSelection (#6311): skip data you don't need
→ Lazy-load Lance so `import daft` is faster for everyone (#6328)

Infrastructure:
→ Removed parquet2 entirely (#6339)
→ Made arrow-rs the default Parquet reader (#6335)
→ Field.name: String → Arc for fewer allocations (#6351)

Docs:
→ Custom catalog implementation guide (#6366)
→ Join strategies optimization guide (#6367)

8 PRs. Perf, infra, and docs. One week.
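Late materialization (#6311) means decoding the predicate column first and reading the other columns only at the rows that survive. The sketch below models that idea in plain Python under stated assumptions; it does not use the arrow-rs `parquet` crate, where `RowFilter` and `RowSelection` roughly correspond to the filtering and selection steps during decoding. `CountingColumn` and `scan_with_late_materialization` are hypothetical names, and a "file" here is just a dict of Python lists.

```python
from typing import Any, Callable, Dict, List


class CountingColumn:
    """Hypothetical column wrapper that counts how many values get decoded."""

    def __init__(self, values: List[Any]) -> None:
        self.values = values
        self.decoded = 0

    def read_all(self) -> List[Any]:
        self.decoded += len(self.values)
        return list(self.values)

    def read_at(self, positions: List[int]) -> List[Any]:
        self.decoded += len(positions)
        return [self.values[i] for i in positions]


def scan_with_late_materialization(
    columns: Dict[str, CountingColumn],
    filter_col: str,
    predicate: Callable[[Any], bool],
    projection: List[str],
) -> Dict[str, List[Any]]:
    # Step 1 (filter): decode only the column the predicate needs.
    keys = columns[filter_col].read_all()
    # Step 2 (selection): record which row positions survived.
    selected = [i for i, v in enumerate(keys) if predicate(v)]
    # Step 3: materialize the other projected columns only at those positions.
    out: Dict[str, List[Any]] = {}
    for name in projection:
        if name == filter_col:
            out[name] = [keys[i] for i in selected]
        else:
            out[name] = columns[name].read_at(selected)
    return out
```

Because only two of the four rows match in a filter like `year == 2024` over `[2021, 2024, 2019, 2024]`, the payload column decodes two values instead of four; with wide tables and selective filters, that skipped decoding is the intended saving.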
Daft reposted
Everett Kleven (@everettkleven)
Daft v0.7.5 just shipped.

NEW NATIVE PLUGIN SYSTEM

Any language that can produce a C-compatible shared library can now extend @daftengine's query engine. rchowell modeled it after PostgreSQL's extension system: a stable binary interface, with the Arrow C Data Interface at the boundary.
Daft reposted
Everett Kleven (@everettkleven)
goodbye arrow2. universalmind303 merged the final PR removing arrow2 from @daftengine. 122 PRs. 9 contributors. 38,850 lines changed. 6 releases. Every kernel in the engine rewritten. One PR title says it all: "refactor(arrow2): goodbye arrow2" github.com/Eventual-Inc/D…
Daft reposted
Everett Kleven (@everettkleven)
Last week we covered what a UDF is and why you need one. This week: the four stateless patterns in @daftengine's daft.func. Row-wise. Generators. Async. Batch. One decorator. Four shapes of work. Same code on your laptop or a cluster.
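The four shapes named above differ in how inputs map to outputs. A plain-Python model might look like the following; it is not Daft's `daft.func` API, every helper and example function name is hypothetical, and it assumes a column is a Python list and that a generator may emit zero or more output rows per input row.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def apply_row_wise(fn: Callable[[A], B], col: List[A]) -> List[B]:
    # Row-wise: one input value in, one output value out.
    return [fn(x) for x in col]


def apply_generator(fn: Callable[[A], Iterator[B]], col: List[A]) -> List[B]:
    # Generator: each input row may yield zero or more output rows.
    return [y for x in col for y in fn(x)]


def apply_async(fn: Callable[[A], Awaitable[B]], col: List[A]) -> List[B]:
    # Async: run all calls concurrently; gather keeps results in input order.
    async def run() -> List[B]:
        return list(await asyncio.gather(*(fn(x) for x in col)))

    return asyncio.run(run())


def apply_batch(fn: Callable[[List[A]], List[B]], col: List[A], batch_size: int) -> List[B]:
    # Batch: the function sees a whole chunk of rows per call.
    out: List[B] = []
    for start in range(0, len(col), batch_size):
        out.extend(fn(col[start:start + batch_size]))
    return out


# Hypothetical example functions, one per shape.
def shout(s: str) -> str:
    return s.upper()


def words(s: str) -> Iterator[str]:
    yield from s.split()


async def fake_fetch(s: str) -> int:
    await asyncio.sleep(0)  # stand-in for a network call
    return len(s)


def lengths(batch: List[str]) -> List[int]:
    return [len(s) for s in batch]
```

Note that the async and batch shapes produce the same values as a row-wise length function would; what changes is how the work is scheduled, which is the part an engine can then distribute.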
Daft reposted
Sammy Sidhu (@Sammy_Sidhu)
One of my favorite PRs for @daftengine. Thanks @andrewlamb1111 and the rest of the arrow-rs community for creating such a great crate!
Daft (@daftengine)
Daft shipped same-day support for PyIceberg 0.11.0 — including the breaking API changes to partition fields and data file inspection. Plus: custom snapshot properties on writes for pipeline lineage tracking. Day-one compatibility. Not backfilling months later.
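Custom snapshot properties let each commit carry metadata such as a pipeline run ID, so the table's snapshot history doubles as a lineage log. The toy model below illustrates that idea only; `ToyTable`, `Snapshot`, and the `snapshot_properties` argument are hypothetical and are not the PyIceberg or Daft write API. The `added-records` key mimics the kind of entry an Iceberg snapshot summary carries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    summary: Dict[str, str]
    row_count: int


@dataclass
class ToyTable:
    """Hypothetical in-memory table that keeps an append-only snapshot log."""

    snapshots: List[Snapshot] = field(default_factory=list)

    def append(
        self, rows: List[dict], snapshot_properties: Optional[Dict[str, str]] = None
    ) -> Snapshot:
        parent = self.snapshots[-1] if self.snapshots else None
        total = (parent.row_count if parent else 0) + len(rows)
        summary = {"operation": "append", "added-records": str(len(rows))}
        # Custom properties are merged into the summary so each commit records
        # which pipeline run produced it.
        summary.update(snapshot_properties or {})
        snap = Snapshot(
            snapshot_id=len(self.snapshots) + 1,
            parent_id=parent.snapshot_id if parent else None,
            summary=summary,
            row_count=total,
        )
        self.snapshots.append(snap)
        return snap

    def lineage(self, key: str) -> List[Optional[str]]:
        # Walk the history and read one lineage property from each snapshot.
        return [s.summary.get(key) for s in self.snapshots]
```

Keeping lineage in the snapshot summary means it travels with the table itself rather than living in a separate tracking system.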
Daft (@daftengine)
Everyone's talking about Apache Iceberg this week. Snowflake just shipped v3 public preview. Supabase built Analytics Buckets on it. The entire data eng timeline is Iceberg. Meanwhile @daftengine has been quietly shipping deep Iceberg support. Here's what you can do today:
Daft reposted
Everett Kleven (@everettkleven)
What's the most painful custom transform you've had to scale? Most data frameworks make you choose between writing idiomatic Python and scaling with internal APIs. @daftengine says no. Use `@daft.func` and `@daft.cls` to scale any Python function [sync, batch, async, generators].
Daft reposted
Everett Kleven (@everettkleven)
60+ storage backends. One line of config. @daftengine now supports @ApacheOpenDAL — the Apache project that unifies S3, GCS, Azure, HDFS, Tencent COS, Google Drive, Redis, and dozens more behind a single API.
Daft reposted
Everett Kleven (@everettkleven)
Most data teams don't have one observability problem. They have three:
1) Local development
2) Production monitoring
3) Post-incident analysis

How we're tackling this in @daftengine:
Daft (@daftengine)
why did we choose arrow2... this one's for the real ones... universalmind303 what would we do without you 😩
Daft (@daftengine)
OpenDAL

Daft now reads from any storage backend that Apache OpenDAL supports: S3, GCS, Azure Blob, HDFS, Tencent COS, and dozens more. One API. No dedicated connectors.
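The "one API, no dedicated connectors" idea amounts to routing every URL through a single interface that dispatches on the scheme. Here is a minimal plain-Python sketch, assuming in-memory stand-ins for real services. The `Operator` name echoes OpenDAL's central abstraction, but this class, `InMemoryBackend`, and `register` are hypothetical and are not OpenDAL's or Daft's API.

```python
from typing import Dict, Protocol
from urllib.parse import urlparse


class Backend(Protocol):
    def read(self, path: str) -> bytes: ...


class InMemoryBackend:
    """Hypothetical stand-in for a real service such as S3 or GCS."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects

    def read(self, path: str) -> bytes:
        return self.objects[path]


class Operator:
    """Routes a URL to whichever backend is registered for its scheme."""

    def __init__(self) -> None:
        self.backends: Dict[str, Backend] = {}

    def register(self, scheme: str, backend: Backend) -> None:
        self.backends[scheme] = backend

    def read(self, url: str) -> bytes:
        parsed = urlparse(url)
        backend = self.backends.get(parsed.scheme)
        if backend is None:
            raise ValueError(f"no backend registered for scheme {parsed.scheme!r}")
        # Callers never touch backend-specific code: "bucket/key" in, bytes out.
        return backend.read(parsed.netloc + parsed.path)
```

Adding a new storage system in this model means registering one more backend; the read path the engine calls stays unchanged.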
Daft (@daftengine)
Daft v0.7.4 just shipped.

THE GREAT ARROW-RS MIGRATION

122 PRs completing the arrow2 → arrow-rs migration. 38,850 lines changed.

Plus: full observability stack, Apache OpenDAL support, and Flight shuffle for Flotilla.