Matthew Russo
@RussoMatthew

23 posts

Third-year PhD candidate in the MIT Data Systems Group. Working on Palimpzest and semantic operators. Project: https://t.co/2gSZUQiHli Email: [email protected]

Boston, MA · Joined May 2024
40 Following · 40 Followers
Matthew Russo reposted
Andy Pavlo (@andypavlo.bsky.social)
Congratulations to the 2026 @cidrdb prize awardees!
@tianyu_li_ → Gong Show Winner
@FuhengZ → Database Quiz Winner
They each received a rare signed print of "The Birth of the Database Messiah" (est. value $12,000).
Matthew Russo reposted
Connor Shorten @CShorten30 ·
I am SUPER EXCITED to publish the 131st episode of the Weaviate Podcast with Matthew Russo (@RussoMatthew), a Ph.D. student at MIT! 🎉

AI is transforming Database Systems. Perhaps the biggest impact so far has been natural language to query language translation, or Text-to-SQL. However, another massive innovation is brewing. 💥

AI presents new Semantic Operators for our query languages. For example, we are all familiar with the WHERE filter. Now we have AI_WHERE, in which an LLM or another AI model computes the filter value without needing it to be already available in the database!

```sql
SELECT * FROM podcasts AI_WHERE "Text-to-SQL" in topics
```

Semantic Filters are just the tip of the iceberg; the roster of Semantic Operators further includes Semantic Joins, Map, Rank, Classify, Groupby, and Aggregation! 🛠️

And it doesn't stop there! One of the core ideas in Relational Algebra and its influence on Database Systems is query planning: finding the optimal order in which to apply filters. For example, say you have two filters: the car is red, and the car is a BMW. Now say the dataset contains only 100 BMWs, but 50,000 red cars!! Applying the BMW filter first limits the size of the set for the next filter! 🧠

This foundational idea has all sorts of extensions now that LLMs are involved! This opportunity is giving rise to new query engines with declarative optimizers such as Palimpzest, LOTUS, and others! 💻

So many interesting nuggets in this podcast. I loved discussing these things with Matthew, and I hope you find it interesting! 👇
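The filter-ordering intuition above can be made concrete with a toy cost model. This is an illustrative sketch only, not Palimpzest's or LOTUS's actual optimizer: it assumes each LLM-evaluated filter costs one model call per row it sees and that filter selectivities are independent, using the BMW/red-car numbers from the example.

```python
def plan_cost(total_rows: int, selectivities: list[float]) -> int:
    """Cost of applying filters in the given order: each filter makes one
    (hypothetical) LLM call per row that survived the previous filters."""
    cost, rows = 0, total_rows
    for sel in selectivities:
        cost += rows            # one call per surviving row
        rows = int(rows * sel)  # rows passing this filter
    return cost

TOTAL = 100_000
sel_bmw = 100 / TOTAL      # "is a BMW": 100 of 100,000 rows pass
sel_red = 50_000 / TOTAL   # "is red": 50,000 of 100,000 rows pass

# Running the highly selective BMW filter first is far cheaper:
print(plan_cost(TOTAL, [sel_bmw, sel_red]))  # 100,100 calls
print(plan_cost(TOTAL, [sel_red, sel_bmw]))  # 150,000 calls
```

With LLM calls costing real money, a declarative optimizer that picks the cheap ordering automatically saves a third of the bill in this toy case.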
Matthew Russo @RussoMatthew ·
👏 And finally kudos to the rest of the PZ team! Chunwei Liu, Gerardo Vitagliano, Sivaprasad Sudhir, Peter Baile Chen, Zui Chen, Rana Shahout, Lei Cao, Mike Cafarella, Sam Madden, Tim Kraska, and Michael J. Franklin
Matthew Russo @RussoMatthew ·
If this (very high-level) summary of our work has piqued your interest, go read our full paper!
📄 Paper: arxiv.org/pdf/2405.14696
💻 Code: github.com/mitdbg/palimpz…
We would love to hear any feedback, ideas for more use cases, and/or opportunities for collaboration.
Matthew Russo @RussoMatthew ·
Finally, with parallelism enabled, we show Palimpzest can achieve a 90.3x speedup at 9.1x lower cost while obtaining an F1-score within 83.5% of the single-threaded GPT-4 baseline. Crucially, the user does not need to modify their declarative program to obtain these speedups.
Matthew Russo @RussoMatthew ·
For our evaluation, we curated three SAPP workloads:
- Legal Discovery (identifying evidence of fraud at Enron)
- Real Estate Search (finding a suitable home in Cambridge, MA)
- Medical Schema Matching (reproducing a real-world bioinformatics paper)
Matthew Russo @RussoMatthew ·
Second, using three different policies on each dataset, we find that PZ selects high-quality plans. Plans selected by PZ consistently have similar or better quality than naive GPT-4 baselines, with up to 80.0% lower single-threaded runtime and up to 89.7% lower cost.
Matthew Russo @RussoMatthew ·
First, we find PZ produces appealing plans at numerous points in the tradeoff space. PZ produces plans that are:
- 4.7x faster, 9.1x cheaper, and within 85.7% of the naive GPT-4 plan's F1-score
- 3.3x faster, 2.9x cheaper, and up to 1.1x better F1 than the naive GPT-4 plan
Matthew Russo @RussoMatthew ·
We evaluated two claims:
1. PZ creates a set of candidate plans that offer diverse tradeoffs and better performance than naive baselines
2. PZ can select a high-quality plan from the set of candidates
(naive baseline = the plan you would get using a single model without optimization)
Matthew Russo @RussoMatthew ·
PZ first compiles a user program into a logical plan. It then uses logical and physical optimizations to create a large set of candidate plans. PZ estimates the cost of each plan and executes the optimal one based on user-specified preferences. For details, please read the paper.
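The final plan-selection step described above can be sketched as follows. This is a minimal illustration, not Palimpzest's actual internals: the candidate plans, their names, and their estimated runtime/cost/quality numbers are all hypothetical, and the policy shown ("maximize quality subject to a cost budget") is just one example of a user-specified preference.

```python
# Each candidate physical plan carries estimates of runtime (seconds),
# monetary cost (dollars), and output quality (e.g. expected F1-score).
candidates = [
    {"name": "gpt-4 everywhere",     "runtime": 300.0, "cost": 10.0, "quality": 0.90},
    {"name": "small-model filters",  "runtime": 90.0,  "cost": 1.2,  "quality": 0.82},
    {"name": "code-synth convert",   "runtime": 40.0,  "cost": 0.4,  "quality": 0.75},
]

def select_plan(plans, policy):
    """Pick the highest-quality plan whose estimated cost fits the budget;
    return None if no plan is feasible under the policy."""
    feasible = [p for p in plans if p["cost"] <= policy["max_cost"]]
    return max(feasible, key=lambda p: p["quality"]) if feasible else None

best = select_plan(candidates, {"max_cost": 2.0})
print(best["name"])  # -> small-model filters
```

Other policies (minimize cost subject to a quality floor, minimize runtime, etc.) slot into the same shape by changing the feasibility filter and the objective.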
Matthew Russo @RussoMatthew ·
Thus, we built Palimpzest (PZ), a system which compiles Python programs written using our declarative framework into optimized physical plans which it then executes.
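To make "declarative framework" concrete, here is a hedged sketch of what such a Python program might look like. The class and method names (`Dataset`, `sem_filter`, `sem_map`) are assumptions for illustration, not Palimpzest's actual API; the stand-in simply records operators lazily, where a real engine would compile the chain into a logical plan and optimize it.

```python
class Dataset:
    """Toy stand-in for a declarative dataset: operators are recorded,
    not executed, so an optimizer could later choose how to run them."""
    def __init__(self, source, ops=None):
        self.source, self.ops = source, ops or []

    def sem_filter(self, nl_predicate):
        # Natural-language filter; an engine would pick a model to evaluate it.
        return Dataset(self.source, self.ops + [("filter", nl_predicate)])

    def sem_map(self, nl_instruction):
        # Natural-language transformation applied to each record.
        return Dataset(self.source, self.ops + [("map", nl_instruction)])

# The user states *what* they want; the optimizer decides *how* to run it:
emails = Dataset("enron/*.eml")
evidence = (emails
            .sem_filter("the email discusses a fraudulent transaction")
            .sem_map("extract sender, recipient, and dollar amount"))
print(evidence.ops)  # two recorded logical operators, not yet executed
```

Because the program only declares intent, the system is free to reorder filters, swap models, or parallelize execution without any change to the user's code.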
Matthew Russo @RussoMatthew ·
Our key insight is that declarative program optimization can alleviate many of the challenges with building AI systems today, just as it did for database systems in the 1970s. We believe that machines should figure out how best to optimize AI workloads, not human engineers.
Matthew Russo @RussoMatthew ·
For a general class of AI workloads, which we define and term Semantic Analytics Applications (SAPPs), we argue that:
1. These workloads are highly amenable to optimization
2. Building AI systems for these workloads with conventional programming frameworks is painful