Immanuel Trummer

465 posts

@ImmanuelTrummer

Database Prof at Cornell. I make data analysis more efficient and more user-friendly.

Ithaca, NY (USA) · Joined October 2017
57 Following · 2.1K Followers
Immanuel Trummer (@ImmanuelTrummer)
💡 Two arXiv papers published in recent days (one from us, one from TUD) reach the same conclusion: LLMs can now generate C++ code for SQL processing that outperforms classical database systems.

⚙️ Our code generator is based on Claude Code and exploits multiple agents working in parallel. Each agent performs tasks typically associated with a different component of a #DBMS, such as workload analysis, query optimization, or physical design tuning.

📊 We compare against various classical #DBMS such as DuckDB, ClickHouse, Umbra, MonetDB, and PostgreSQL, finding that the agent-generated code is often significantly faster. Code generation costs are moderate (<$20), making the approach practical for frequently executed queries.

🤖 Analyzing the generated code, we find that agents exploit various optimization techniques, including query-specific data structures as well as low-level optimizations that are specific to the hardware cache hierarchy of our server.

📃 Paper: arxiv.org/pdf/2603.02081
💾 Code: github.com/SolidLao/GenDB
🌐 Site: solidlao.github.io/GenDB

@lojil192574 #LLM #Databases #AI #DB
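To make "query-specific data structures" a bit more concrete, here is a hand-written C++ sketch of the kind of specialized code such an approach might produce. It is not GenDB output: the `orders` table, its dictionary-encoded `status` column, `kNumStatus`, and `sum_by_status` are all hypothetical. The idea it illustrates is that when a GROUP BY key is known to have a tiny domain, a fixed-size array can replace a general-purpose hash table, and filtering and aggregation can be fused into one sequential scan.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical columnar layout of a table orders(status, amount).
// status is dictionary-encoded into codes 0 .. kNumStatus-1.
constexpr std::size_t kNumStatus = 4;

struct OrdersColumns {
    std::vector<std::uint8_t> status;  // dictionary-encoded status codes
    std::vector<double> amount;        // order amounts
};

// Hand-specialized plan for the (hypothetical) query
//   SELECT status, SUM(amount) FROM orders
//   WHERE amount > threshold GROUP BY status;
// The group key's domain is tiny and known in advance, so a 32-byte array of
// accumulators replaces a general-purpose hash table and should stay in L1 cache.
std::array<double, kNumStatus> sum_by_status(const OrdersColumns& t, double threshold) {
    assert(t.status.size() == t.amount.size());
    std::array<double, kNumStatus> sums{};  // value-initialized to 0.0
    for (std::size_t i = 0; i < t.amount.size(); ++i) {
        // Filter and aggregate fused into one sequential pass over both columns.
        if (t.amount[i] > threshold) {
            assert(static_cast<std::size_t>(t.status[i]) < kNumStatus);  // debug-only domain check
            sums[t.status[i]] += t.amount[i];
        }
    }
    return sums;
}
```

For example, with `status = {0, 1, 2, 1, 3, 0}`, `amount = {10.0, 5.0, 7.5, 20.0, 1.0, 2.0}`, and a threshold of 3.0, `sum_by_status` returns `{10.0, 25.0, 7.5, 0.0}`. The array only works because the key domain is known up front; that kind of workload knowledge is what specialized code can exploit and a generic engine cannot assume.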
Immanuel Trummer (@ImmanuelTrummer)
A demo of #ThalamusDB (#SIGMOD2023), introducing semantic filter operators. Users write SQL queries with natural-language predicates on table columns containing 🖼️ images, 📃 text, or 🔊 sound files. These predicates are evaluated via #LLMs.

In the video, I'm querying for furniture ads with pictures showing "wooden tables". After I enter my query, #ThalamusDB
1️⃣ performs data profiling and cost-based optimization,
2️⃣ shows the Pareto frontier of cost-quality tradeoffs, and
3️⃣ updates bounds on query aggregates while processing.

#ThalamusDB is designed from the ground up for approximate processing, prioritizing the data that maximally reduces the approximation error per unit of cost.

🪧 #SIGMOD2023 demo: dl.acm.org/doi/abs/10.114…
📃 #SIGMOD2024 paper: dl.acm.org/doi/10.1145/36…
💾 Code repository: github.com/saehanjo/thala…

@SaehanJo @sigmod #GPT4 #LanguageModel #MultimodalData @Cornell @CornellCIS
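Bounds on aggregates during approximate processing can be illustrated with the simplest aggregate, COUNT. The C++ sketch below is a simplified illustration under assumptions made for this example, not ThalamusDB's actual algorithm. Each row's natural-language predicate is either not yet evaluated, true, or false (the hypothetical `PredicateResult`), and `count_bounds` and `next_row` are hypothetical helpers. Rows the LLM has not yet seen could still go either way, which yields deterministic lower and upper bounds at every point during processing.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical per-row state of a natural-language predicate such as
// "picture shows a wooden table": std::nullopt = not yet evaluated by the LLM.
using PredicateResult = std::optional<bool>;

struct CountBounds {
    std::size_t lower;  // rows confirmed to satisfy the predicate
    std::size_t upper;  // confirmed rows plus all rows still unevaluated
};

// Deterministic bounds on SELECT COUNT(*) ... WHERE <semantic predicate>,
// valid at any point during processing.
CountBounds count_bounds(const std::vector<PredicateResult>& rows) {
    CountBounds b{0, 0};
    for (const PredicateResult& r : rows) {
        if (!r.has_value()) {
            ++b.upper;  // unknown: may still match
        } else if (*r) {
            ++b.lower;  // confirmed match raises both bounds
            ++b.upper;
        }
        // A confirmed non-match contributes to neither bound.
    }
    return b;
}

// Evaluating any unknown row shrinks (upper - lower) by exactly one, so for
// COUNT the best error reduction per cost unit is the cheapest unknown row.
// Returns std::nullopt once every row has been evaluated.
std::optional<std::size_t> next_row(const std::vector<PredicateResult>& rows,
                                    const std::vector<double>& cost) {
    assert(rows.size() == cost.size());
    std::optional<std::size_t> best;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (!rows[i].has_value() && (!best || cost[i] < cost[*best])) {
            best = i;
        }
    }
    return best;
}
```

With rows `{true, unevaluated, false, unevaluated, true}`, the bounds are [2, 4]; given per-row costs `{1, 5, 1, 2, 1}`, the next row to evaluate is index 3. For COUNT, every evaluation narrows the gap by exactly one, so cost alone decides the order; for aggregates such as SUM or AVG, the error reduction per row differs and would have to be estimated.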
Immanuel Trummer (@ImmanuelTrummer)
📢 All our posters & talks at #SIGMOD2025!

1️⃣ λ-Tune: using #LLMs to write configuration scripts for databases
🪧 Poster: itrummer.github.io/SIGMOD2025/Lam…
💬 Slides: itrummer.github.io/SIGMOD2025/Lam…
@giannakourisv

2️⃣ SpareLLM: selecting #LLMs with optimal cost-quality tradeoffs
🪧 Poster: itrummer.github.io/SIGMOD2025/Spa…
@SaehanJo

3️⃣ SQLBarber: generating custom benchmarks via #LLMs
🪧 Poster: itrummer.github.io/SIGMOD2025/SQL…
@lojil192574

4️⃣ CEDAR: cost-efficient data-driven claim verification via #LLMs
🪧 Poster: itrummer.github.io/SIGMOD2025/CED…
@Tharushi96

5️⃣ SwellDB: generating data on-the-fly during query processing by #LLMs
🪧 Poster: itrummer.github.io/SIGMOD2025/Swe…
@giannakourisv

6️⃣ Query optimization for hybrid classical-quantum workflows
💬 Slides: itrummer.github.io/SIGMOD2025/Que…

7️⃣ Quantum annealing for optimal data partitioning
💬 Slides: itrummer.github.io/SIGMOD2025/Qua…

8️⃣ Panel "AI for Future Databases" with @tim_kraska, @adityagp, @feifei_initiald, @ailamaki, and #SurajitChaudhuri
💬 Slides: itrummer.github.io/SIGMOD2025/Pan…

@SIGMODConf @sigmod @Cornell @CornellCIS
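The post does not describe how SpareLLM selects models. As a generic illustration of what "optimal cost-quality tradeoffs" means, here is a C++ sketch that computes the Pareto frontier over candidate LLMs, each with an estimated cost and quality, and then picks the cheapest frontier option that meets a quality target. `Candidate`, `pareto_frontier`, and `cheapest_meeting` are hypothetical names, and the cost and quality estimates are assumed to be given.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical candidate: an LLM (or LLM configuration) with an estimated
// processing cost and an estimated result quality (e.g., accuracy in [0, 1]).
struct Candidate {
    std::string name;
    double cost;
    double quality;
};

// Returns the Pareto-optimal candidates (no other candidate is at most as
// expensive and at least as good, with one of the two strictly better),
// ordered by ascending cost. After sorting by cost (ties: higher quality
// first), one pass keeps only candidates that beat the best quality so far.
std::vector<Candidate> pareto_frontier(std::vector<Candidate> cands) {
    std::sort(cands.begin(), cands.end(), [](const Candidate& a, const Candidate& b) {
        if (a.cost != b.cost) return a.cost < b.cost;
        return a.quality > b.quality;
    });
    std::vector<Candidate> frontier;
    for (const Candidate& c : cands) {
        if (frontier.empty() || c.quality > frontier.back().quality) {
            frontier.push_back(c);
        }
    }
    return frontier;
}

// Cheapest frontier candidate whose quality meets the target, or nullptr.
// Relies on the frontier being sorted by ascending cost.
const Candidate* cheapest_meeting(const std::vector<Candidate>& frontier, double min_quality) {
    for (const Candidate& c : frontier) {
        if (c.quality >= min_quality) return &c;
    }
    return nullptr;
}
```

For candidates A (cost 1, quality 0.6), B (2, 0.5), C (3, 0.8), D (3, 0.7), and E (5, 0.9), the frontier is A, C, E: B is dominated by A and D by C. With a quality target of 0.75, `cheapest_meeting` returns C; with 0.95, it returns `nullptr`.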
Immanuel Trummer (@ImmanuelTrummer)
🥳 Looking forward to an amazing #SIGMOD2025 conference! Our schedule:
📃 Sunday, 15:00-17:30: Data partitioning with quantum and digital annealers
📃 Sunday, 15:00-17:30: Optimizing hybrid quantum-classical processing pipelines
📃 Tuesday, 10:30-11:30: SpareLLM - selecting LLMs with optimal cost-quality tradeoffs
🖥️ Tuesday, 11:30-13:00: Demonstrating SQLBarber - generating custom benchmarks via LLMs
🖥️ Tuesday, 11:30-13:00: Demonstrating SwellDB - generating data on-the-fly during query processing
📢 Tuesday, 16:30-18:00: Panel on AI for future databases with @TimKraska, @drfeifei, @adityagp, @ailamaki, and Surajit Chaudhuri
📃 Thursday, 10:30-11:30 & 16:30-18:00: λ-Tune - using LLMs to write configuration scripts for databases
🖥️ Thursday, 16:30-18:00: Demonstrating CEDAR - cost-efficient data-driven claim verification

@giannakourisv @Tharushi96 @SaehanJo @lojil192574 @SIGMODConf @sigmod #LLM #SQL #Database
Immanuel Trummer (@ImmanuelTrummer)
🥳Many congrats to Dr. Saehan Jo! 🎓Saehan successfully defended his PhD thesis "Efficient Data Systems for Scalable Analysis with LLMs", introducing systems like #ThalamusDB and #SpareLLM that scale up processing with #LLMs to very large data sets! @SIGMODConf #Data #SQL #ML