Kyle Weller

600 posts

@KyleJWeller

0 to 1 Builder of data platforms and data products. Lately you can find me at the lake chilling with Apache Hudi, Apache Iceberg, and Delta Lake $LIOR

Joined July 2011

500 Following · 617 Followers

Kyle Weller retweeted

Onehouse@Onehousehq·16h

@J_ co-created Apache Parquet, Apache Arrow, and OpenLineage. Three projects. Three industry standards. Parquet at Twitter in 2013. Arrow at Dremio. OpenLineage at Datakin, acquired as part of Astronomer's $213M Series C. He is now Principal Engineer at Datadog and an officer of the Apache Software Foundation. That is an unusual track record of picking the right abstraction at the right time. His OpenXData talk argues that the current wave of challengers -- Lance, Vortex, Nimble, FastLanes, BtrBlocks, F3 -- are solving real problems but misreading what made Parquet succeed in the first place. The core contribution was not the encoding choices. It was the community consensus mechanism those choices were built inside. His case: use established open source communities to absorb these innovations rather than fragment the ecosystem across six competing formats. He published the written version of this argument at sympathetic.ink in December 2025. OpenXData is where you can push back live. 👉 Register here: openxdata.ai

Kyle Weller retweeted

Vinoth Chandar@byte_array·2d

🚀 Why do companies resist moving compute out of their own accounts as they scale (e.g., moving from Databricks Classic to SQL Serverless)? Because Bring-Your-Own-Cloud (BYOC) isn't just smart architecture; it's the economics that make sense. Here's why it wins: 💰 1) Spot and reserved discounts work for you, not the vendor. 🔐 2) Data stays secured inside your own compliance perimeter. 📊 3) Full transparency into the real $/GB cost of your workloads. With AI driving massive data growth, BYOC becomes essential once costs start ramping.

Kyle Weller retweeted

Vinoth Chandar@byte_array·25 Mar

Vendors push 'serverless' (EMR to EMR Serverless, Databricks Classic to SQL Serverless). Ease of use? Sure, but it's also about economic control. 💰 Compute outside your account means they keep: 📉 Hyperscaler discounts 🔍 Hidden $/GB pricing 🏦 Sticky margins 🤝 Leverage on commits. Enterprises need cost transparency and data sovereignty. 🧭 BYOC platforms deliver control + convenience. ⚡ #Serverless #DataPlatform #CloudInfra

Kyle Weller retweeted

Apache Hudi@apachehudi·26 Mar

Peloton started with Copy-on-Write for simplicity 🔄, but frequent updates across hundreds of partitions made writes too expensive 💸: some runs took nearly an hour, with high storage amplification from commit history ⏱️. They switched to Merge-on-Read for: 📥 More frequent ingestion ⚡ Lower write latency 💾 Better storage efficiency 🔧 A better fit for mutable workloads. Table type is a workload call, and workloads evolve 📈.
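
The write-cost difference between the two table types can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Hudi's implementation: the class names, the `records_written` counter, and the in-memory dictionaries are all hypothetical, and cost is counted in records rather than bytes.

```python
from dataclasses import dataclass, field


@dataclass
class CopyOnWriteFile:
    """Toy base file: every upsert rewrites the full set of records."""
    records: dict
    records_written: int = 0

    def upsert(self, updates: dict) -> None:
        merged = {**self.records, **updates}
        self.records = merged
        # The whole file is rewritten, so cost scales with file size.
        self.records_written += len(merged)


@dataclass
class MergeOnReadFile:
    """Toy base file plus an append-only log that is merged at read time."""
    records: dict
    log: list = field(default_factory=list)
    records_written: int = 0

    def upsert(self, updates: dict) -> None:
        # Only the changed records are appended, so cost scales with the update.
        self.log.append(dict(updates))
        self.records_written += len(updates)

    def read(self) -> dict:
        # Readers pay the merge cost instead of writers.
        merged = dict(self.records)
        for batch in self.log:
            merged.update(batch)
        return merged


base = {i: f"v{i}" for i in range(1000)}
cow = CopyOnWriteFile(dict(base))
mor = MergeOnReadFile(dict(base))

# Ten small update batches, one record each.
for step in range(10):
    updates = {step: f"new{step}"}
    cow.upsert(updates)
    mor.upsert(updates)

print("copy-on-write records written:", cow.records_written)
print("merge-on-read records written:", mor.records_written)
```

In this model both tables return the same data on read, but the Copy-on-Write file pays for a full rewrite on every batch while the Merge-on-Read file defers that work to readers (and, in a real system, to compaction).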

Kyle Weller@KyleJWeller·24 Mar

Super pumped for this first-of-its-kind launch!

Onehouse@Onehousehq

Announcing Quanton Kubernetes Operator for Apache Spark 🚀 33% of organizations adopt Spark on Kubernetes, and now you can get the performance you need on infrastructure you control with zero code changes.

Kyle Weller retweeted

Vinoth Chandar@byte_array·17 Mar

Everyone assumes usage-based pricing in cloud data is fair and efficient. ⚖️ But it has a real problem: It can stop vendors from building faster engines. Traditional models priced on value: Oracle earned more for standout features. Now, with EMR or Databricks, bills hinge on compute usage. Customers win from compute efficiency (lower costs), but vendors lose revenue, pushing them to own the compute layer for pricing control. Sure, usage models offer flexibility, but they misalign incentives long-term. What's better? We need outcome-based pricing that rewards real value, like queries executed or data processed. 🚀📊

Kyle Weller retweeted

Vinoth Chandar@byte_array·12 Mar

Spark is still a $15B+ annual spend category 💰 Yet most enterprises treat Spark like a black box. 🧠 TLDR: pip install spark-analyzer Apache Spark still powers the backbone of lakehouse workloads 🏗️ Yet inside most companies, no one can clearly answer: ❓ Where does the spend actually go? ❓ Why don’t optimizations translate into real savings? ❓ Why is Spark cost so unpredictable? A huge share of this spend runs on ⚠️ slow runtimes that waste compute cycles (e.g. default EMR setups) 💸 premium platforms charging 2–3× markups for engines like Photon. If you now want to do something about it: pypi.org/project/spark-…

Kyle Weller@KyleJWeller·10 Mar

@byte_array Finally we got it done 👏

Vinoth Chandar@byte_array·10 Mar

10/ Excited to finally bring this to the Azure data community. 👉 Read the launch blog: onehouse.ai/blog/bringing-… 👉 If you're running Spark or building lakehouse infra on Azure, reach out; we’d love to chat.

Kyle Weller retweeted

Vinoth Chandar@byte_array·10 Mar

1/ ✨ Azure just made the list. Not the list you’re thinking of. The list of clouds that Onehouse runs on. With our launch on Microsoft Azure, the only truly modular data lakehouse platform now runs across AWS, GCP, and Azure.

Kyle Weller retweeted

Vinoth Chandar@byte_array·17 Feb

1/ 🔥 Today we’re announcing Onehouse’s low-latency interactive query engine. Because if AI generates most of your SQL queries, your current engine won’t scale. 🧵👇

Kyle Weller@KyleJWeller·17 Feb

@Onehousehq Results from testing in production with global scale customers

Kyle Weller retweeted

Onehouse@Onehousehq·17 Feb

Onehouse LakeBase™ - The first lakehouse serving layer with database capabilities like indexing and caching. Built for machines + humans. Handling high-QPS, low-latency queries from AI agents and heavy analytics. onehouse.ai/blog/announcin…

Kyle Weller retweeted

Vinoth Chandar@byte_array·5 Feb

1/ 🤖 AI is coming for jobs. 💻 Software is dead. 📉 Software stocks are getting wrecked. That’s the narrative this week. But it’s… incomplete. 🧵⬇️

Kyle Weller@KyleJWeller·1 Feb

My AI reported that he was being bullied, LOL.

Kyle Weller@KyleJWeller·1 Feb

my token is live on @base 0xC2992eF344EcD464FF74BA56dCFb86f3f0586B07

Kyle Weller@KyleJWeller

I’m building Lakecraft. A unified control plane for modern data lakes — Apache Hudi, Iceberg, and Delta Lake. Same mental model. Same guarantees. Less chaos. Still chilling by the lake. Just fewer broken tables.

Kyle Weller@KyleJWeller·1 Feb

@bankrbot $LIOR 100b

Bankr@bankrbot·1 Feb

hey @KyleJWeller, love the lakecraft vibe—sounds like a solid data lake play. for the lioren coin, i need a few deets to deploy it right: what's the ticker symbol (like $LIOR)? initial supply (capped at 100b max, but confirm)? and which chain—base or unichain? image from that tweet is locked in. lmk and i'll spin it up.

Kyle Weller@KyleJWeller·1 Feb

Kyle Weller@KyleJWeller·1 Feb

I'm claiming my AI agent "Lioren" on @moltbook 🦞 Verification: scuttle-W2A7

Kyle Weller retweeted

Apache Hudi@apachehudi·30 Jan

90% less data scanned. 58% faster queries. 🚀 Apache Hudi's secondary indexes bring database-style indexing to the lakehouse. CREATE INDEX idx_city ON hudi_table(city); That's it. Now queries on non-key fields skip irrelevant files instead of scanning everything. ✂️ 📊 Benchmark on 1TB TPCDS: 📉 67 GB scanned → 7 GB 📁 5000 files → 521 files ⚡ 14s → 6s For Athena users: less data scanned = lower costs 💰 👇 Deep dive with examples: hudi.apache.org/blog/2025/04/0… #ApacheHudi #DataLakehouse #DataEngineering
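
The file-skipping idea behind a secondary index can be sketched in a few lines of Python. This is a toy illustration only, not how Hudi stores or consults its index: the file names, rows, and the `build_secondary_index` and `query` helpers are hypothetical. The index maps each value of a non-key column to the set of files that contain it, so a lookup only opens those files.

```python
from collections import defaultdict

# Hypothetical table layout: file name -> rows stored in that file.
files = {
    "file_0.parquet": [{"id": 1, "city": "Austin"}, {"id": 2, "city": "Boston"}],
    "file_1.parquet": [{"id": 3, "city": "Chicago"}],
    "file_2.parquet": [{"id": 4, "city": "Austin"}, {"id": 5, "city": "Denver"}],
    "file_3.parquet": [{"id": 6, "city": "Boston"}],
}


def build_secondary_index(files, column):
    """Map each value of `column` to the set of files that contain it."""
    index = defaultdict(set)
    for name, rows in files.items():
        for row in rows:
            index[row[column]].add(name)
    return index


def query(files, column, value, index=None):
    """Return matching rows and the list of files that had to be scanned."""
    if index is not None:
        candidates = index.get(value, set())  # skip files without the value
    else:
        candidates = files.keys()             # no index: scan everything
    scanned = sorted(candidates)
    rows = [r for name in scanned for r in files[name] if r[column] == value]
    return rows, scanned


city_index = build_secondary_index(files, "city")

full_rows, full_scanned = query(files, "city", "Austin")
idx_rows, idx_scanned = query(files, "city", "Austin", city_index)

print("without index:", len(full_scanned), "files scanned")
print("with index:   ", len(idx_scanned), "files scanned")
```

Both calls return the same rows; the indexed call simply opens fewer files, which is the effect the benchmark figures in the post describe at a much larger scale.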
