Scott Haines

2.8K posts

@newfront

OSS Engineer | Speaker • Trainer | #DatabricksMVP | Author @OReillyMed | ❤️ #ApacheSpark. ❤️ #Dogs. #DatabricksMVP. Views are my own

California, USA · Joined November 2008
889 Following · 834 Followers
Scott Haines retweeted
DuckDB (@duckdb)
The Delta and Unity Catalog extensions in the latest DuckDB release come with a fresh set of features and have shed their experimental labels. In today's blog post, Ben Fleis walks you through the key improvements:

✍️ You can now write Delta tables with DuckDB. Multiple inserts within a transaction produce a single atomic version in the Delta table.
🤝 The Unity Catalog unlocks multi-writer access. DuckDB and other clients such as Spark can now perform writes alongside each other with the catalog handling concurrency control.
⏪ You can use the coolest feature of data lake formats: time travel. This lets you query any Delta table at a specific historical version. Thanks to incremental snapshot loading, this is fast even across large Delta logs.

Read the full blog post for more – link in the thread 🧵
2
13
66
5.5K
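The two behaviours the DuckDB post describes (several inserts in one transaction landing as a single table version, and reading the table as of an older version) can be illustrated with a toy in-memory model. This is only a sketch of the idea: it does not use DuckDB or the Delta log format, and `ToyVersionedTable` and `ToyTransaction` are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyVersionedTable:
    """Toy append-only table: each committed transaction adds one version."""

    # versions[i] holds the rows added by the commit that created version i.
    versions: list = field(default_factory=list)

    def transaction(self) -> "ToyTransaction":
        return ToyTransaction(self)

    def latest_version(self) -> int:
        # -1 means nothing has been committed yet.
        return len(self.versions) - 1

    def read(self, version: Optional[int] = None) -> list:
        """Return every row visible at `version` (default: the latest)."""
        if version is None:
            version = self.latest_version()
        if version < -1 or version > self.latest_version():
            raise ValueError(f"unknown version {version}")
        rows = []
        for committed in self.versions[: version + 1]:
            rows.extend(committed)
        return rows


class ToyTransaction:
    """Buffers inserts and publishes them as one version on a clean exit."""

    def __init__(self, table: ToyVersionedTable) -> None:
        self.table = table
        self.pending = []

    def insert(self, row: dict) -> None:
        self.pending.append(row)

    def __enter__(self) -> "ToyTransaction":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Commit only if the block finished without an exception.
        if exc_type is None and self.pending:
            self.table.versions.append(list(self.pending))
        return False


table = ToyVersionedTable()
with table.transaction() as tx:
    tx.insert({"id": 1})
    tx.insert({"id": 2})  # same transaction, so still version 0
with table.transaction() as tx:
    tx.insert({"id": 3})  # second commit creates version 1

print(table.latest_version())        # 1
print(len(table.read(version=0)))    # 2 rows: "time travel" to version 0
print(len(table.read()))             # 3 rows at the latest version
```

A real Delta table records each commit as a log entry on storage rather than a Python list, but the reader-side rule is the same: a version is the result of replaying all commits up to and including it.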
Scott Haines retweeted
Unity Catalog (@unitycatalog_io)
[Ep 1] Open Lakehouse + AI: The Catalog Layer, Interoperability & AI-Agent Governance

Learn how composable lakehouses center the catalog for metadata, versioning, and commit coordination. 👇

🔸 Interop: Iceberg REST, Unity Catalog (OSS), Spark, Delta Lake, Delta-RS, Iceberg.
🔸 Governance: credential vending, row/col filters, column masks, audits, and why “no governance was the easiest governance.”
🔸 Agents as data customers (Temporal). Fine-grained access beats blanket credentials on 24/7 workloads.

🎥 Full episode: youtube.com/watch?v=dEFAkS…

#OpenLakehouse #UnityCatalog @ApacheSpark @ApacheIceberg @DeltaLakeOSS
1
2
4
276
Kirill Skrygan (@kskrygan)
Would you be interested if JetBrains released a totally local AI agent, working 100% on your laptop, using our code insight engine and deeply integrated into the IDE? Yes, it will probably be 1 month behind the very recent frontier models, but no token blood bath anymore. WDYT?
805
234
7.1K
489.3K
Scott Haines (@newfront)
☘️ I had the pleasure of speaking at the Seattle/Bellevue Apache DataFusion meetup last Thursday at @databricks.

🚔 The talk was on governance: "Unifying Open Lakehouse Governance via Policy Portability". This might feel like a boring concept, but it is, in my opinion, one of the most important things for us to get right to actually support "agents" and "autonomous work".

🌲 I covered using the Cedar Policy Language (as a foundation to move beyond simple authorization through the use of "annotations"). Learn more at cedarpolicy.com/en.

🧪 I showed how "row-filters" and "column-masks" can be encoded and then compiled to Google's Common Expression Language (CEL) as a portable means of sharing "expressions" that can then be transformed for a given "engine". For this talk I covered @ApacheDataFusio (but the same pattern can also work with @ApacheSpark). The litmus test was supporting the two engines.

Thanks to @andrewlamb1111 for putting together a great group of speakers (@westoncpace from @lancedb, @lukekim from @spice_ai, @JiaYu_JY from @wherobots, Ruihang Xia from @Greptime, myself @newfront of @databricks) and of course Andrew Lamb himself from @InfluxDB.

I'll be releasing the policast environment within the next two weeks so people can take a look at it.
0
0
3
85
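The pattern described in the talk summary above (encode a row filter or column mask once as a portable expression, then transform it per engine) can be sketched roughly as follows. This is a toy illustration under stated assumptions: the small expression tree below stands in for a portable expression and is not Cedar or CEL syntax, it is not the policast code, and every class and function name here is hypothetical. The only engine difference modelled is identifier quoting (double quotes for DataFusion, backticks for Spark SQL).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: Union[str, int, bool]


@dataclass(frozen=True)
class Eq:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Lit, Eq, And]

# Assumption for this sketch: the engines differ only in identifier quoting.
IDENTIFIER_QUOTE = {"datafusion": '"', "spark": "`"}


def render(expr: Expr, engine: str) -> str:
    """Render the engine-neutral expression as SQL text for one engine."""
    quote = IDENTIFIER_QUOTE[engine]
    if isinstance(expr, Col):
        if quote in expr.name:
            raise ValueError("quote characters in column names are not handled")
        return f"{quote}{expr.name}{quote}"
    if isinstance(expr, Lit):
        if isinstance(expr.value, bool):  # check bool before int
            return "TRUE" if expr.value else "FALSE"
        if isinstance(expr.value, int):
            return str(expr.value)
        # Simplification: refuse strings that would need escaping.
        if "'" in expr.value or "\\" in expr.value:
            raise ValueError("string literals needing escapes are not handled")
        return f"'{expr.value}'"
    if isinstance(expr, Eq):
        return f"({render(expr.left, engine)} = {render(expr.right, engine)})"
    if isinstance(expr, And):
        return f"({render(expr.left, engine)} AND {render(expr.right, engine)})"
    raise TypeError(f"unsupported expression node: {expr!r}")


def render_mask(column: Col, visible_when: Expr, replacement: Lit, engine: str) -> str:
    """Column mask: show the real value only when `visible_when` holds."""
    col = render(column, engine)
    return (
        f"CASE WHEN {render(visible_when, engine)} THEN {col} "
        f"ELSE {render(replacement, engine)} END AS {col}"
    )


# One policy expression, two engine-specific renderings.
row_filter = And(Eq(Col("region"), Lit("EU")), Eq(Col("active"), Lit(1)))
print(render(row_filter, "datafusion"))
print(render(row_filter, "spark"))
print(render_mask(Col("email"), Eq(Col("role"), Lit("admin")), Lit("***"), "spark"))
```

Keeping the policy as a tree and deferring SQL generation to the last step is what makes the expression portable: adding a third engine means adding a renderer, not rewriting the policy.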
Scott Haines retweeted
Delta Lake (@DeltaLakeOSS)
Join us for this first Delta Lake Community Meetup of 2026 on Tuesday, April 21 at 9AM PT! 🚀

We’re bringing the community together for a deep dive into the ecosystem, infrastructure enhancements, and the future project roadmap. Come get your technical questions answered live by the maintainers.

What we'll cover:
🔹 Latest Delta Lake updates and how the community is evolving
🔹 A technical look at infrastructure enhancements
🔹 The future of Delta Lake: Roadmap insights and a deep dive into Iceberg v4 compatible metadata
🔹 Live Q&A with the community

RSVP ➡️ luma.com/deltalake-0426

#opensource #oss #deltalake #community
0
1
1
217
Scott Haines (@newfront)
@jaceklaskowski What was your experience like? What advice would you give folk looking to migrate from structured streaming?
0
0
0
10
Matei Zaharia (@matei_zaharia)
Definitely a surprise! It wouldn't have been possible without my awesome collaborators and students.
Databricks (@databricks)

We're incredibly proud to congratulate our co-founder and CTO, @matei_zaharia, on receiving the ACM Prize in Computing for his development of distributed data systems that have enabled large-scale machine learning, analytics, and AI. Matei's open-source contributions have fundamentally changed how organizations work with data and AI — including Apache Spark™, Delta Lake, and MLflow. Researchers, nonprofits, startups, and enterprises across every industry have built on the foundation he helped create. Now he's pushing the frontier further, focusing on building and scaling reliable AI agents through open-source research like DSPy and GEPA. Matei, this recognition is so well deserved. We're honored to build alongside you every day. awards.acm.org/about/2025-acm…

12
21
211
14.7K