Matthew Russo
@RussoMatthew

23 posts

Third-year PhD candidate in the MIT Data Systems Group. Working on Palimpzest and semantic operators. Project: https://t.co/2gSZUQiHli Email: [email protected]

Boston, MA · Joined May 2024
40 Following · 40 Followers
Matthew Russo reposted
Andy Pavlo (@andypavlo.bsky.social)
Congratulations to the 2026 @cidrdb prize awardees!
@tianyu_li_ → Gong Show Winner
@FuhengZ → Database Quiz Winner
They each received a rare signed print of "The Birth of the Database Messiah" (est. value $12,000).
Matthew Russo reposted
Connor Shorten @CShorten30 ·
I am SUPER EXCITED to publish the 131st episode of the Weaviate Podcast with Matthew Russo (@RussoMatthew), a Ph.D. student at MIT! 🎉

AI is transforming Database Systems. Perhaps the biggest impact so far has been natural language to query language translation, or Text-to-SQL. However, another massive innovation is brewing. 💥

AI presents new Semantic Operators for our query languages. For example, we are all familiar with the WHERE filter. Now we have AI_WHERE, in which an LLM or another AI model computes the filter value without needing it to be already available in the database!

```sql
SELECT * FROM podcasts AI_WHERE "Text-to-SQL" in topics
```

Semantic Filters are just the tip of the iceberg; the roster of Semantic Operators further includes Semantic Joins, Map, Rank, Classify, Groupby, and Aggregation! 🛠️

And it doesn't stop there! One of the core ideas in Relational Algebra and its influence on Database Systems is query planning: finding the optimal order in which to apply filters. For example, say you have two filters: the car is red, and the car is a BMW. Now say the dataset contains only 100 BMWs, but 50,000 red cars!! Applying the BMW filter first limits the size of the set for the next filter! 🧠

This foundational idea has all sorts of extensions now that LLMs are involved! This opportunity is giving rise to new query engines with declarative optimizers such as Palimpzest, LOTUS, and others! 💻

So many interesting nuggets in this podcast. I loved discussing these things with Matthew, and I hope you find it interesting! 👇
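The filter-ordering intuition above can be made concrete with a toy cost model. This is an illustrative sketch only, not Palimpzest's or LOTUS's actual optimizer: it assumes each LLM-evaluated filter costs one model call per row it sees and that filter selectivities are independent, using the BMW/red-car numbers from the example.

```python
def plan_cost(total_rows: int, selectivities: list[float]) -> int:
    """Cost of applying filters in the given order: each filter makes one
    (hypothetical) LLM call per row that survived the previous filters."""
    cost, rows = 0, total_rows
    for sel in selectivities:
        cost += rows            # one call per surviving row
        rows = int(rows * sel)  # rows passing this filter
    return cost

TOTAL = 100_000
sel_bmw = 100 / TOTAL      # "is a BMW": 100 of 100,000 rows pass
sel_red = 50_000 / TOTAL   # "is red": 50,000 of 100,000 rows pass

# Running the highly selective BMW filter first is far cheaper:
print(plan_cost(TOTAL, [sel_bmw, sel_red]))  # 100,100 calls
print(plan_cost(TOTAL, [sel_red, sel_bmw]))  # 150,000 calls
```

With LLM calls costing real money, a declarative optimizer that picks the cheap ordering automatically saves a third of the bill in this toy case.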
Matthew Russo @RussoMatthew ·
👏 And finally kudos to the rest of the PZ team! Chunwei Liu, Gerardo Vitagliano, Sivaprasad Sudhir, Peter Baile Chen, Zui Chen, Rana Shahout, Lei Cao, Mike Cafarella, Sam Madden, Tim Kraska, and Michael J. Franklin
Matthew Russo @RussoMatthew ·
If this (very high-level) summary of our work has piqued your interest, go read our full paper!
📄 Paper: arxiv.org/pdf/2405.14696
💻 Code: github.com/mitdbg/palimpz…
We would love to hear any feedback, ideas for more use cases, and/or opportunities for collaboration.
Matthew Russo @RussoMatthew ·
Finally, with parallelism enabled, we show Palimpzest can achieve a 90.3x speedup at 9.1x lower cost while obtaining an F1-score within 83.5% of the single-threaded GPT-4 baseline. Crucially, the user does not need to modify their declarative program to obtain these speedups.
Matthew Russo @RussoMatthew ·
For our evaluation, we curated three SAPP workloads:
- Legal Discovery (identifying evidence of fraud at Enron)
- Real Estate Search (finding a suitable home in Cambridge, MA)
- Medical Schema Matching (reproducing a real-world bioinformatics paper)
Matthew Russo @RussoMatthew ·
Second, using three different policies on each dataset, we find that PZ selects high-quality plans. Plans selected by PZ consistently have similar or better quality than naive GPT-4 baselines, with up to 80.0% lower single-threaded runtime and up to 89.7% lower cost.
Matthew Russo @RussoMatthew ·
First, we find PZ produces appealing plans at numerous points in the tradeoff space. PZ produces plans that are:
- 4.7x faster, 9.1x cheaper, and within 85.7% of the naive GPT-4 plan's F1-score
- 3.3x faster, 2.9x cheaper, and up to 1.1x better F1 than the naive GPT-4 plan
Matthew Russo @RussoMatthew ·
We evaluated two claims:
1. PZ creates a set of candidate plans that offer diverse tradeoffs and better performance than naive baselines
2. PZ can select a high-quality plan from the set of candidates
(naive baseline = the plan you would get using a single model without optimization)
Matthew Russo @RussoMatthew ·
PZ first compiles a user program into a logical plan. It then uses logical and physical optimizations to create a large set of candidate plans. PZ estimates the cost of each plan and executes the optimal one based on user-specified preferences. For details, please read the paper.
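The final plan-selection step described above can be sketched as follows. This is a minimal illustration, not Palimpzest's actual internals: the candidate plans, their names, and their estimated runtime/cost/quality numbers are all hypothetical, and the policy shown ("maximize quality subject to a cost budget") is just one example of a user-specified preference.

```python
# Each candidate physical plan carries estimates of runtime (seconds),
# monetary cost (dollars), and output quality (e.g. expected F1-score).
candidates = [
    {"name": "gpt-4 everywhere",     "runtime": 300.0, "cost": 10.0, "quality": 0.90},
    {"name": "small-model filters",  "runtime": 90.0,  "cost": 1.2,  "quality": 0.82},
    {"name": "code-synth convert",   "runtime": 40.0,  "cost": 0.4,  "quality": 0.75},
]

def select_plan(plans, policy):
    """Pick the highest-quality plan whose estimated cost fits the budget;
    return None if no plan is feasible under the policy."""
    feasible = [p for p in plans if p["cost"] <= policy["max_cost"]]
    return max(feasible, key=lambda p: p["quality"]) if feasible else None

best = select_plan(candidates, {"max_cost": 2.0})
print(best["name"])  # -> small-model filters
```

Other policies (minimize cost subject to a quality floor, minimize runtime, etc.) slot into the same shape by changing the feasibility filter and the objective.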
Matthew Russo @RussoMatthew ·
Thus, we built Palimpzest (PZ), a system which compiles Python programs written using our declarative framework into optimized physical plans which it then executes.
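To make "declarative framework" concrete, here is a hedged sketch of what such a Python program might look like. The class and method names (`Dataset`, `sem_filter`, `sem_map`) are assumptions for illustration, not Palimpzest's actual API; the stand-in simply records operators lazily, where a real engine would compile the chain into a logical plan and optimize it.

```python
class Dataset:
    """Toy stand-in for a declarative dataset: operators are recorded,
    not executed, so an optimizer could later choose how to run them."""
    def __init__(self, source, ops=None):
        self.source, self.ops = source, ops or []

    def sem_filter(self, nl_predicate):
        # Natural-language filter; an engine would pick a model to evaluate it.
        return Dataset(self.source, self.ops + [("filter", nl_predicate)])

    def sem_map(self, nl_instruction):
        # Natural-language transformation applied to each record.
        return Dataset(self.source, self.ops + [("map", nl_instruction)])

# The user states *what* they want; the optimizer decides *how* to run it:
emails = Dataset("enron/*.eml")
evidence = (emails
            .sem_filter("the email discusses a fraudulent transaction")
            .sem_map("extract sender, recipient, and dollar amount"))
print(evidence.ops)  # two recorded logical operators, not yet executed
```

Because the program only declares intent, the system is free to reorder filters, swap models, or parallelize execution without any change to the user's code.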
Matthew Russo @RussoMatthew ·
Our key insight is that declarative program optimization can alleviate many of the challenges with building AI systems today, just as it did for database systems in the 1970s. We believe that machines should figure out how best to optimize AI workloads, not human engineers.
Matthew Russo @RussoMatthew ·
For a general class of AI workloads, which we define and term Semantic Analytics Applications (SAPPs), we argue that:
1. These workloads are highly amenable to optimization
2. Building AI systems for these workloads with conventional programming frameworks is painful