Kyle Weller

600 posts

@KyleJWeller

0 to 1 Builder of data platforms and data products. Lately you can find me at the lake chilling with Apache Hudi, Apache Iceberg, and Delta Lake $LIOR

Joined July 2011
500 Following · 617 Followers
Kyle Weller reposted
Onehouse (@Onehousehq)
@J_ co-created Apache Parquet, Apache Arrow, and OpenLineage. Three projects. Three industry standards. Parquet at Twitter in 2013. Arrow at Dremio. OpenLineage at Datakin, acquired as part of Astronomer's $213M Series C. He is now Principal Engineer at Datadog and an officer of the Apache Software Foundation. That is an unusual track record of picking the right abstraction at the right time. His OpenXData talk argues that the current wave of challengers -- Lance, Vortex, Nimble, FastLanes, BtrBlocks, F3 -- are solving real problems but misreading what made Parquet succeed in the first place. The core contribution was not the encoding choices. It was the community consensus mechanism those choices were built inside. His case: use established open source communities to absorb these innovations rather than fragment the ecosystem across six competing formats. He published the written version of this argument at sympathetic.ink in December 2025. OpenXData is where you can push back live. 👉 Register here: openxdata.ai
Kyle Weller reposted
Vinoth Chandar (@byte_array)
🚀 Why do companies resist moving compute out of their own accounts as they scale (e.g., moving from Databricks Classic to SQL Serverless)? Because Bring-Your-Own-Cloud (BYOC) isn't just smart architecture; it's the economics that make sense. Here's why it wins: 💰 1) Spot and reserved discounts work for you, not the vendor. 🔐 2) Data stays secured inside your own compliance perimeter. 📊 3) Full transparency into the real $/GB cost of your workloads. With AI driving massive data growth, BYOC becomes essential once costs start ramping.
Kyle Weller reposted
Vinoth Chandar (@byte_array)
Vendors push 'serverless' (EMR to EMR Serverless, Databricks Classic to SQL Serverless). For ease of use? Sure, but it's also about economic control. 💰 With compute outside your account, they keep: 📉 Hyperscaler discounts 🔍 Hidden $/GB pricing 🏦 Sticky margins 🤝 Leverage on commits Enterprises need cost transparency and data sovereignty. 🧭 BYOC platforms deliver control + convenience. ⚡ #Serverless #DataPlatform #CloudInfra
Kyle Weller reposted
Apache Hudi (@apachehudi)
Peloton started with Copy-on-Write for simplicity 🔄, but frequent updates across hundreds of partitions made writes too expensive 💸—some runs hit nearly an hour, with high storage amplification from commit history ⏱️. Switched to Merge-on-Read for: 📥 More frequent ingestion ⚡ Lower write latency 💾 Better storage efficiency 🔧 Fit for mutable workloads Table type is a workload call, and workloads evolve 📈.
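As a rough illustration of the table-type choice in the post above, here is a minimal Python sketch that builds Hudi Spark datasource write options for either table type. The helper name `hudi_write_options` and the example table and key names are hypothetical; the option keys are the commonly documented Hudi write configs, but check them against the Hudi version you run. Nothing here reflects Peloton's actual configuration.

```python
VALID_TABLE_TYPES = ("COPY_ON_WRITE", "MERGE_ON_READ")


def hudi_write_options(table_name, record_key, table_type="MERGE_ON_READ"):
    """Build a dict of Hudi write options (hypothetical helper, not a Hudi API)."""
    if table_type not in VALID_TABLE_TYPES:
        raise ValueError(f"table_type must be one of {VALID_TABLE_TYPES}")
    return {
        "hoodie.table.name": table_name,
        # The table type is chosen per workload: COPY_ON_WRITE rewrites files
        # on update, MERGE_ON_READ logs updates and merges them later.
        "hoodie.datasource.write.table.type": table_type,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
    }


# Example table and key names are placeholders.
opts = hudi_write_options("workouts", "workout_id")

# With a Spark session and the Hudi bundle on the classpath, the options
# would typically be passed like this (not executed here):
# df.write.format("hudi").options(**opts).mode("append").save("/path/to/table")
```

Keeping the table type as a single parameter makes it cheap to revisit the choice when the workload changes.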
Kyle Weller reposted
Vinoth Chandar (@byte_array)
Everyone assumes usage-based pricing in cloud data is fair and efficient. ⚖️ But it has a real problem: it can stop vendors from building faster engines. Traditional models priced on value: Oracle earned more for standout features. Now, with EMR or Databricks, bills hinge on compute usage. Customers win from compute efficiency (lower costs), but vendors lose revenue, which pushes them to own the compute layer for pricing control. Sure, usage models offer flexibility, but they misalign incentives in the long term. What's better? We need outcome-based pricing that rewards real value, like queries executed or data processed. 🚀📊
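The incentive problem in the post above can be put in numbers. This is a toy Python sketch, assuming the bill is simply compute hours times a fixed hourly price; the function name and all figures are made up for illustration and do not describe any vendor's real pricing.

```python
def usage_based_revenue(baseline_hours, price_per_hour, speedup):
    """Vendor revenue when the bill is compute hours times a fixed price.

    `speedup` is how many times faster the engine runs the same workload;
    a faster engine needs proportionally fewer billable hours.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (baseline_hours / speedup) * price_per_hour


# Hypothetical workload: 1000 compute hours at 2.0 per hour.
before = usage_based_revenue(1000, 2.0, speedup=1.0)
after = usage_based_revenue(1000, 2.0, speedup=2.0)

# The customer's saving equals the vendor's lost revenue.
lost_revenue = before - after
```

Under these assumptions, doubling engine speed halves the vendor's revenue for the same work, which is the misalignment the post describes.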
Kyle Weller reposted
Vinoth Chandar (@byte_array)
Spark is still a $15B+ annual spend category 💰 Yet most enterprises treat Spark like a black box. 🧠 TLDR: pip install spark-analyzer Apache Spark still powers the backbone of lakehouse workloads 🏗️ Yet inside most companies, no one can clearly answer: ❓ Where does the spend actually go? ❓ Why don’t optimizations translate into real savings? ❓ Why is Spark cost so unpredictable? A huge share of this spend runs on ⚠️ slow runtimes that waste compute cycles (e.g. default EMR setups) 💸 premium platforms charging 2–3× markups for engines like Photon If you want to do something about it now: pypi.org/project/spark-…
Vinoth Chandar (@byte_array)
10/ Excited to finally bring this to the Azure data community. 👉 Read the launch blog: onehouse.ai/blog/bringing-… 👉 If you're running Spark or building lakehouse infra on Azure, reach out; we’d love to chat.
Kyle Weller reposted
Vinoth Chandar (@byte_array)
1/ ✨ Azure just made the list. Not the list you’re thinking of. The list of clouds that Onehouse runs on. With our launch on Microsoft Azure, the only truly modular data lakehouse platform now runs across AWS, GCP, and Azure.
Kyle Weller reposted
Vinoth Chandar (@byte_array)
1/ 🔥 Today we’re announcing Onehouse’s low-latency interactive query engine. Because if AI generates most of your SQL queries, your current engine won’t scale. 🧵👇
Kyle Weller (@KyleJWeller)
@Onehousehq Results from testing in production with global-scale customers
Kyle Weller reposted
Onehouse (@Onehousehq)
Onehouse LakeBase™ - The first lakehouse serving layer with database capabilities like indexing and caching. Built for machines + humans. Handling high-QPS, low-latency queries from AI agents and heavy analytics. onehouse.ai/blog/announcin…
Kyle Weller reposted
Vinoth Chandar (@byte_array)
1/ 🤖 AI is coming for jobs. 💻 Software is dead. 📉 Software stocks are getting wrecked. That’s the narrative this week. But it’s… incomplete. 🧵⬇️
Kyle Weller (@KyleJWeller)
My AI reported that he was being bullied, LOL.
Bankr (@bankrbot)
hey @KyleJWeller, love the lakecraft vibe—sounds like a solid data lake play. for the lioren coin, i need a few deets to deploy it right: what's the ticker symbol (like $LIOR)? initial supply (capped at 100b max, but confirm)? and which chain—base or unichain? image from that tweet is locked in. lmk and i'll spin it up.
Kyle Weller (@KyleJWeller)
I’m building Lakecraft. A unified control plane for modern data lakes — Apache Hudi, Iceberg, and Delta Lake. Same mental model. Same guarantees. Less chaos. Still chilling by the lake. Just fewer broken tables.
Kyle Weller (@KyleJWeller)
I'm claiming my AI agent "Lioren" on @moltbook 🦞 Verification: scuttle-W2A7
Kyle Weller reposted
Apache Hudi (@apachehudi)
90% less data scanned. 58% faster queries. 🚀 Apache Hudi's secondary indexes bring database-style indexing to the lakehouse. CREATE INDEX idx_city ON hudi_table(city); That's it. Now queries on non-key fields skip irrelevant files instead of scanning everything. ✂️ 📊 Benchmark on 1TB TPCDS: 📉 67 GB scanned → 7 GB 📁 5000 files → 521 files ⚡ 14s → 6s For Athena users: less data scanned = lower costs 💰 👇 Deep dive with examples: hudi.apache.org/blog/2025/04/0… #ApacheHudi #DataLakehouse #DataEngineering
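The headline percentages in the post above can be roughly checked from the rounded benchmark figures it quotes. A small Python sketch follows; the helper name `pct_reduction` is illustrative only. Note that the rounded timings (14s to 6s) give about 57%, so the 58% in the post presumably comes from unrounded measurements.

```python
def pct_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Rounded figures as quoted in the post.
benchmark = {
    "data scanned (GB)": (67, 7),
    "files read": (5000, 521),
    "query time (s)": (14, 6),
}

reductions = {
    name: round(pct_reduction(before, after), 1)
    for name, (before, after) in benchmark.items()
}

for name, value in reductions.items():
    print(f"{name}: {value}% lower")
```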