Scott Haines

2.8K posts

@newfront

OSS Engineer | Speaker • Trainer | #DatabricksMVP | Author @OReillyMed | ❤️ #ApacheSpark. ❤️ #Dogs. #DatabricksMVP. Views are my own

California, USA · Joined November 2008

923 Following · 853 Followers

Scott Haines@newfront·4d

@ChiefScientist @vercel @rauchg I love that everything solves itself. Like, I know most things are table stakes nowadays, but having SSL certs keep themselves up to date...

Alexy 🤍💙🤍@ChiefScientist·4d

I derive so much value from @vercel that I've been relieved when I hit some limits in Hobby with like 4 projects working for months, and now am a paying customer. @rauchg provides so much value for $20/month, it is unbelievable!

302

Scott Haines reposted

Youssef Mrini@YoussefMrini·4d

Apache Spark 4.2 is already available on DBR 19, and it reflects the strength of the Apache Spark community, with more than 1,900 commits from over 260 contributors. What's new in Apache Spark 4.2?
🛑 Metric views: a native semantic layer that lets you define governed business metrics once and reuse them consistently across SQL, BI, apps and AI, preserving correct aggregation semantics. (Open source is in our DNA.)
🛑 Spark Connect & PySpark: better Spark Classic compatibility (RDD APIs, YARN cluster mode, error propagation) + easier remote invocation of Spark as a service.
🛑 Arrow-first Python: Arrow-optimized Python UDFs on by default, Pandas 3 support, Arrow UDFs and zero-copy interop with Polars/DuckDB via the Arrow Data Interface.
🛑 AI-native SQL: vector similarity/distance functions and NEAREST BY top-K retrieval for search, recommendations and RAG-style workloads.
🛑 Native geospatial: built-in GEOMETRY/GEOGRAPHY types and ST_* functions, with Parquet, WKT/WKB and SRID support.
🛑 More SQL primitives: SYSTEM.BUILTIN/SYSTEM.SESSION qualification, SET PATH search paths, SQL cursors, QUALIFY, time_bucket, tuple sketches, top-K max_by/min_by and IGNORE/RESPECT NULLS.
🛑 Auto CDC in Declarative Pipeline: first-class SCD Type 1 change-data processing via a Python API, replacing hand-written merge logic.
🛑 Real-Time Mode for PySpark (like the Flash): millisecond-latency streaming, now extended to stateless PySpark queries.
🛑 Data Source V2 advances: first-class CDC via the CHANGES clause, MERGE INTO performance gains, schema evolution for INSERT INTO, and transaction API foundations.
🛑 Platform improvements: modernized Web UI (Bootstrap 5, dark mode), better Kubernetes support and JDK 25.
#Databricks

256

Scott Haines reposted

Databricks@databricks·5d

Teams no longer have to choose between multi-engine flexibility, performance, and centralized governance. External access to Unity Catalog managed Delta tables is now in Public Preview. Engines like Apache Spark, Apache Flink, and DuckDB can create, read, and write to the same governed copy of data. Behind the scenes, Predictive Optimization improves query performance and lowers storage costs. Migrating to UC managed tables is easy: simply upgrade your external tables with ALTER TABLE SET MANAGED. databricks.com/blog/how-unity…

4.7K

Scott Haines@newfront·5d

@savsql happy Wednesday :)

savannah longoria@savsql·6d

hi friends!!! missed your beautiful faces on my TL.

Scott Haines reposted

Vectorize@Vectorizeio·5d

Hindsight is now natively inside @Omnigent_ai 🎉 One memory setup. Every harness. Tell your agent something once. Claude Code, Codex, Pi, anything you run through Omnigent remembers it. Great work from @databricks and @matei_zaharia on the meta-harness layer. Memory belongs here.

Vectorize@Vectorizeio

x.com/i/article/2077…

1.8K

Scott Haines reposted

Delta Lake@DeltaLakeOSS·Jul 10

Catalog-managed Delta 4.3: Spark streaming + CDF + catalog-driven batch CDC 👇 🔹 All standard streaming read options 🔹 Replay via CHANGES FROM VERSION/TIMESTAMP 🔹 Delta Sharing CDF for shared tables 🔗 Learn more: delta.io/blog/2026-06-2… #DeltaLake

646

Scott Haines@newfront·Jul 9

@dougc333 @HoytEmerson :). Engineering life has left me wizened!

doug chang@dougc333·Jul 9

@HoytEmerson @newfront nice hair SH!!

Scott Haines reposted

Hoyt Emerson@HoytEmerson·Jul 7

What does "no vendor lock-in" really mean when it comes to lakehouses like Apache Iceberg and Delta Lake? Allow Scott Haines (@newfront), Dev Advocate at Databricks, to explain it best. Watch it all here: youtube.com/watch?v=4eKnmA…

1.2K

Scott Haines reposted

Ali Ghodsi@alighodsi·Jul 3

My co-founder @rxin personally wrote this really good paper that explains the main idea behind postgres Lakebase as well as LTAP. It almost serves as a primer on how transactional databases are built and how Lakebase and LTAP work. Maybe more importantly, what are the tradeoffs, and what are you giving up by adopting this new approach. Highly recommended reading: databricks.com/blog/lakebase-…

331

31.2K

Scott Haines reposted

Delta Lake@DeltaLakeOSS·Jul 1

The single canonical stack is giving way to composable, interoperable lakehouses. In this clip, @lisancao and @newfront cover format choice, glue code, and what comes next. 👀 Full video: youtu.be/dEFAkS7vYV0 #DeltaLake #OpenLakehouse

698

Scott Haines reposted

Delta Lake@DeltaLakeOSS·Jun 29

Delta Lake: The Definitive Guide book signing at Data + AI Summit 2026 was a big success! 📚 @dennylee, @newfront, Tristen Wentling, and Tyler Croy were busy signing copies and meeting community members, with a line stretching across the expo floor. Thank you to the 300+ people who waited in line for a physical book and connected with the authors. 🙌 Here's to everyone building with Delta Lake. #opensource #deltalake #oss #dataaisummit

638

Scott Haines reposted

Delta Lake@DeltaLakeOSS·Jun 28

Delta 4.3 extends catalog-managed tables with Unity Catalog REST APIs. 👇 🔌 Intent-based catalog commits 🔄 Atomic incremental UniForm 📡 CDC streaming + Row Tracking Apache Spark, DuckDB, Apache Flink + Delta-Kernel on one catalog. 🔗 Read more: delta.io/blog/2026-06-2… #DeltaLake #ApacheSpark #OpenSource

4.7K

Scott Haines reposted

Matei Zaharia@matei_zaharia·Jun 19

We just released Omnigent 0.2 with a ton of improvements from the past 5 days! Here's what's new according to Omnigent. Major additions are @cursor_ai CLI and @antigravity harnesses, lots of new sandbox providers, and secret-free sandboxing via API proxy. github.com/omnigent-ai/om…

192

28.3K

Scott Haines reposted

#DataAISummit@Data_AI_Summit·Jun 17

📚 Attendees are lined up for the @DeltaLakeOSS: The Definitive Guide book signing at our DevLounge! Stop by to meet authors @dennylee, Tristen Wentling, @newfront, and Prashanth Babu and pick up a signed copy.

1.5K

Scott Haines reposted

Delta Lake@DeltaLakeOSS·Jun 13

From 2017 to now, Delta Lake has grown to 40M+ downloads/month and powers daily processing of hundreds of exabytes. At #DataAISummit, “The Road to Delta 5.0” will cover what’s next: 🔹 catalog-first Delta 🔹 Data Source V2 modernization 🔹 Delta + Iceberg convergence 🔹 Delta Kernel alignment (Java + Rust) Details: databricks.com/dataaisummit/s… #DeltaLake #OpenSource

611

Scott Haines@newfront·Jun 12

@YoussefMrini @DeltaLakeOSS Yes you can!

Youssef Mrini@YoussefMrini·Jun 12

@DeltaLakeOSS @newfront Can I get one ?

Scott Haines reposted

Delta Lake@DeltaLakeOSS·Jun 11

Back by popular demand! 📘 Delta Lake: The Definitive Guide book signing returns to Data + AI Summit 2026. If you are at Data + AI Summit, come meet the authors, say hi to the Delta Lake community, and pick up a signed copy while supplies last. 🗓️ Tuesday, June 16 🕝 2:00–2:30 PM 📍 Dev Lounge, Data + AI Summit Expo, Moscone Center Books tend to go quickly, so plan to stop by early. Hope to see you there! 👋 #DeltaLake #DataAISummit #OpenSource #DataEngineering @dennylee @newfront @Data_AI_Summit

2.9K

Scott Haines reposted

Delta Lake@DeltaLakeOSS·Jun 3

Data + AI Summit session: Tyler Croy (delta-rs maintainer) will introduce Virtual Delta Tables and the associated open source code designed for multimodal inference.👇 🗓️ June 15-18 📍 San Francisco 🔗 Session details: databricks.com/dataaisummit/s… #DeltaLake #MultimodalAI #DataAndAISummit #Lakehouse

374

Scott Haines reposted

Alexy 🤍💙🤍@ChiefScientist·May 31

Exciting!

dale@daleverett

pgGraph v0.1.4 is here. (Full Open Source, Rust)
For anyone new here: pgGraph is a PostgreSQL extension for bringing graph capabilities directly into Postgres.
With v0.1.4, we focused on making pgGraph easier to install, package, validate, and prepare for wider source-based distribution through PGXN. This release keeps the existing v0.1.3 SQL contract intact and introduces no breaking changes.
What changed:
> Added PGXN META.json distribution metadata at the repository root, plus a PGXN-compatible top-level Makefile that delegates into cargo pgrx. That means pgGraph is now better aligned with the standard PostgreSQL extension distribution flow.
> Expanded the installation documentation across all the docs including clearer guidance for source installs, PG_CONFIG targeting, and missing PostgreSQL header troubleshooting.
And yes, something much bigger is coming very soon. We have been working on a massive update that pushes pgGraph far beyond “Postgres extension with graph traversal.” I cannot say too much yet, but if you care about Postgres, graph workloads, AI agents, or making existing databases more intelligent, you will want to watch this repo.
pgGraph v0.1.4 is out now. Link to repo & docs below.

787

Scott Haines reposted

Apache Spark@ApacheSpark·Jun 1

#DataAISummit Session Spotlight ➡️ Learn how to build agentic workflows with OSS Spark Declarative Pipelines, with patterns for deterministic, testable, production-ready data workflows. 🗓️ June 15–18 📍 San Francisco 🔗 Session details: databricks.com/dataaisummit/s… #ApacheSpark #DataAISummit

1.1K
