RisingWave
@RisingWaveLabs
1.1K posts

Best-in-class stream processing, analytics, and management. 🚀 10x more productive. 🚀 10x more cost-efficient. Chat: https://t.co/MpFCvvNxz1

San Francisco Bay Area · Joined July 2021
896 Following · 3K Followers
RisingWave @RisingWaveLabs
Eid Mubarak to our amazing community! 🌙 Wishing peace, joy, and prosperity to everyone building the future of data across streaming, lakehouses, and the growing AI agentic ecosystem. May this Eid bring new opportunities, happiness, and success into the lives of all of you!
RisingWave @RisingWaveLabs
Most data stacks still force a trade-off between fast ingestion and fast analytics. Streaming systems optimize for writes. Warehouses optimize for reads. But rarely do you get both without adding complexity.

That's where this architecture shines: Build a Flexible Iceberg Lakehouse with RisingWave (CoW + MoR)

Why this matters
Traditional lakehouse pipelines often struggle with:
- Write-heavy CDC workloads slowing down analytics
- Read-optimized tables breaking under frequent updates
- Complex tuning across ingestion and query layers
- Extra pipelines just to balance performance trade-offs
By introducing configurable Iceberg write modes, you eliminate this friction.

What this architecture enables
- Choose write-optimized (MoR) or read-optimized (CoW) per table
- Handle high-throughput CDC without falling behind
- Serve fast BI queries on clean, optimized data
- Use a single system for both streaming + analytics
- Native Iceberg integration with open table formats
- Fully compatible with Spark, Trino, DuckDB

How the flow works
1) Streaming data (e.g. CDC from PostgreSQL) is ingested into RisingWave
2) RisingWave writes to Iceberg tables using configurable write modes:
   - Merge-on-Read (MoR) → fast streaming ingestion via delta files
   - Copy-on-Write (CoW) → optimized query performance via file rewrites
3) Iceberg tables are stored on S3 and managed via a catalog (e.g. Lakekeeper)
4) Compaction continuously optimizes storage and performance
5) Query engines like Spark, Trino, and DuckDB read the same tables directly

What you get
- Streaming ingestion: PostgreSQL → RisingWave (CDC, real-time updates)
- Flexible storage layer: Iceberg tables with CoW or MoR on S3
- Multi-engine access: Spark, Trino, DuckDB querying the same data
- Unified pipeline: Streaming + batch without duplication

Choosing the right write mode
Merge-on-Read (MoR) → for streaming workloads
- Low write latency
- Handles frequent updates efficiently
- Slightly slower reads (merge required)
Copy-on-Write (CoW) → for analytics workloads
- Fast, predictable queries
- Clean data files
- Higher write cost (file rewrites)

Common pattern
- MoR → raw streaming / CDC tables
- CoW → curated analytics / BI tables
This gives you the best of both worlds without extra pipelines.

The shift
This isn't just about tuning performance. It's a shift from:
- Rigid, one-size-fits-all storage
- Fragmented ingestion vs analytics systems
To:
- A flexible streaming lakehouse where each table is optimized for its purpose
- A system where write and read performance are no longer at odds
- A fully open, engine-agnostic architecture

Use RisingWave to stream data, choose the right Iceberg write mode per workload, and let engines like Spark, Trino, and DuckDB query the same data, optimized for both ingestion and analytics.
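The per-table choice above can be sketched in RisingWave-style SQL. This is a minimal sketch: the `write_mode` option name and its values are assumptions about the configurable write modes described here, and the table schemas are hypothetical; check the RisingWave Iceberg docs for the exact syntax.

```sql
-- Raw CDC table: merge-on-read keeps write latency low.
-- (write_mode is an illustrative option name, not confirmed syntax.)
CREATE TABLE orders_raw (
    order_id BIGINT PRIMARY KEY,
    amount   DECIMAL,
    status   VARCHAR
)
WITH (write_mode = 'merge-on-read')
ENGINE = iceberg;

-- Curated BI table: copy-on-write trades write cost for fast, predictable reads.
CREATE TABLE orders_curated (
    order_id BIGINT PRIMARY KEY,
    amount   DECIMAL,
    status   VARCHAR
)
WITH (write_mode = 'copy-on-write')
ENGINE = iceberg;
```

The same schema appears twice on purpose: the raw table absorbs the CDC firehose, while the curated table serves BI queries, matching the "MoR for raw, CoW for curated" pattern above.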
RisingWave @RisingWaveLabs
Build Real-time Apps Without Polling or Caches

Most frontend data stacks bolt on real-time as an afterthought:
- REST APIs for reads
- Caches for speed
- WebSockets and polling for "live" updates
Result? Complex, fragile systems.

The shift: make data live by design. Build reactive apps with Surf + RisingWave.

Why this matters: traditional real-time apps suffer from
- Re-running queries on timers
- Manual diffing and cache invalidation
- Over-fetching even when nothing changes

What this enables:
- Queries to materialized views that stay up to date
- No polling; updates pushed via SUBSCRIBE
- Transactions for safe writes
- Live React UI with useQuery()
- One WebSocket replaces REST and polling

How it works: Write → RisingWave updates view → SUBSCRIBE pushes changes → UI re-renders automatically

The shift: stop asking for data; let the database push updates.

Architecture:
Before: Frontend → API → Cache → DB → Queue → WebSocket
After: Frontend ↔ Surf ↔ RisingWave

Result: simpler systems, faster queries, truly real-time apps. Declare your data once and let the database keep your UI in sync.

Read the blog by @kwannoel, the creator of Surf: noelkwan.xyz/surf/developer…
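The push flow above can be sketched with RisingWave's subscription feature; the view, table, and subscription names here are hypothetical:

```sql
-- A materialized view that RisingWave keeps incrementally up to date.
CREATE MATERIALIZED VIEW live_counts AS
SELECT status, COUNT(*) AS n
FROM orders
GROUP BY status;

-- A subscription turns view changes into a pushable change stream.
CREATE SUBSCRIPTION live_counts_sub FROM live_counts
WITH (retention = '1D');

-- Consumers (e.g. Surf's WebSocket layer) fetch pushed changes via a cursor
-- instead of re-running the query on a timer.
DECLARE cur SUBSCRIPTION CURSOR FOR live_counts_sub;
FETCH NEXT FROM cur;
```

Each fetched row carries the change, so the client never diffs or invalidates a cache; it just applies what the database pushed.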
RisingWave @RisingWaveLabs
Most data stacks still split streaming and analytics into separate worlds. PostgreSQL handles transactions. Streaming systems handle ingestion. Warehouses handle analytics. But the real power comes when all of them work together seamlessly.

That's where this architecture shines: Build a Streaming Lakehouse with PostgreSQL + RisingWave + Apache Iceberg (Glue Catalog) + Spark

Why this matters
Traditional pipelines often introduce unnecessary complexity:
- Data duplication across systems
- Fragile batch ETL jobs
- Delays between ingestion and analytics
- Tight coupling to specific engines
By combining CDC, streaming, and open table formats, you eliminate these gaps.

What this architecture enables
- Real-time PostgreSQL CDC ingestion into RisingWave
- Native Iceberg table management backed by AWS Glue
- Continuous streaming writes with transactional consistency
- Query the same data directly from Spark
- Zero data duplication across systems
- Fully open, engine-agnostic architecture

How the flow works
1) PostgreSQL changes are captured via CDC into RisingWave
2) RisingWave connects to an Iceberg catalog (AWS Glue) and S3 storage
3) An internal table is created using ENGINE = iceberg
4) Streaming data is continuously written into Iceberg tables
5) RisingWave supports real-time queries and materialized views
6) Spark queries the same Iceberg table directly

What you get
- Streaming ingestion: PostgreSQL → RisingWave
- Open lakehouse storage: Iceberg on S3 with Glue catalog
- Multi-engine access: Spark, Trino, DuckDB
- Unified pipeline: Real-time + batch in one system

This isn't just another pipeline. It's a shift from fragmented systems to a unified streaming lakehouse where data is fresh, open, and accessible from anywhere. Use RisingWave to connect to PostgreSQL, stream CDC, and write directly into Iceberg, while engines like Trino, Spark, and DuckDB read from the same S3 tables.
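The first two steps of the flow, PostgreSQL CDC into RisingWave, might look like this in SQL. Connection values are placeholders and the schema is hypothetical:

```sql
-- Connect to upstream PostgreSQL via CDC (all connection values are placeholders).
CREATE SOURCE pg_cdc WITH (
    connector = 'postgres-cdc',
    hostname = 'db.example.com',
    port = '5432',
    username = 'rw_user',
    password = 'secret',
    database.name = 'shop',
    schema.name = 'public'
);

-- Materialize one upstream table from the change stream.
CREATE TABLE orders (
    order_id BIGINT,
    amount DECIMAL,
    updated_at TIMESTAMP,
    PRIMARY KEY (order_id)
) FROM pg_cdc TABLE 'public.orders';
```

From there, a table created with ENGINE = iceberg (step 3 above) persists the stream to Iceberg on S3 under the Glue catalog.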
RisingWave @RisingWaveLabs
Which Apache Iceberg Catalog Should You Choose?

Iceberg doesn't have just one catalog. And that choice matters more than most teams realize. Your catalog determines how well your lakehouse handles:
- metadata coordination
- multi-engine access
- governance
- scaling
- cloud portability

In a modern lakehouse:
- Data = object storage
- Format = Iceberg
- Engines = Spark, Trino, Snowflake, RisingWave, etc.
- Catalog = metadata control plane

Iceberg catalogs act as the single source of truth for table metadata, tracking schemas, snapshots, partitions, and file locations so engines know how to read and write tables safely.

Think of it like an airport, an analogy from Arsham Eslami:
- Data = cargo
- Engines = planes
- Catalog = air traffic control

Common catalog options
- Hadoop – simple but not cloud-scale
- Hive Metastore – widely supported but operationally heavy
- AWS Glue – great for AWS ecosystems
- JDBC – good for testing and PoCs
- REST catalogs – open, multi-engine, multi-cloud; examples include Polaris, Nessie, Gravitino, and Lakekeeper

Why REST catalogs are gaining traction
They enable:
- cross-engine sharing
- standardized APIs
- cloud portability
- centralized governance
One API. Any engine. Any cloud.

RisingWave supports multiple Iceberg catalogs (Glue, Hive, JDBC, REST, Snowflake, Unity Catalog, Lakekeeper) and even provides a self-hosted Iceberg REST catalog for creating Iceberg-native tables directly from streaming pipelines.

Want a deeper comparison, with pros and cons of each catalog? Read this blog by Fahad Shah: risingwave.com/blog/apache-ic…
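In practice the catalog choice surfaces as a handful of connection options. The snippet below imagines pointing RisingWave at a REST catalog such as Lakekeeper; the option names follow common Iceberg conventions but are assumptions here, so verify them against the RisingWave docs:

```sql
-- Hypothetical REST-catalog connection (names, URIs, and paths are placeholders).
CREATE CONNECTION lakehouse WITH (
    type = 'iceberg',
    catalog.type = 'rest',
    catalog.uri = 'http://lakekeeper.example.com/catalog',
    warehouse.path = 's3://my-bucket/warehouse'
);
```

Swapping Glue, Hive, or JDBC in for REST typically means changing only catalog.type and its connection parameters, which is exactly why the catalog is the control plane rather than part of the data path.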
RisingWave @RisingWaveLabs
Build a Streaming Lakehouse with Kafka + RisingWave + Iceberg + DuckDB

Most streaming pipelines stop at ingestion. Most lakehouses stop at storage. The real value comes when streaming ingestion, open table storage, and external query engines all work together as one system. That is exactly what this architecture enables:

Kafka → RisingWave → Iceberg → DuckDB

Why this matters
Traditional pipelines often create silos: streaming systems handle ingestion, warehouse systems handle analytics. This often leads to:
- duplicate storage
- fragile ETL pipelines
- engine lock-in
- slow access to fresh data

This setup solves that by combining real-time ingestion with open lakehouse storage:
- Real-time ingest from Kafka into RisingWave
- RisingWave-managed Iceberg tables stored in object storage via the Lakekeeper catalog
- Continuous streaming writes with transactional table commits
- Query the same table directly from DuckDB
- No data copies and no proprietary lock-in
- Open architecture compatible with other engines

It transforms your stack from a collection of disconnected systems into a unified streaming lakehouse. With this pattern, you get:
- Streaming ingestion
- Open Apache Iceberg storage
- External query access from multiple engines
- A clean path to unify real-time and batch workloads

In this flow:
1) Kafka streams user events into RisingWave
2) RisingWave connects to an Iceberg catalog (Lakekeeper, which acts as the control plane connecting everything) and object storage
3) A RisingWave-managed internal table is created
4) Streaming data is continuously written into Iceberg
5) RisingWave queries and materializes results in real time
6) DuckDB queries the same Iceberg table directly

What you get
- Streaming ingest: Kafka → RisingWave
- Open lakehouse storage: Iceberg on object storage
- Multi-engine access: DuckDB, Spark, Trino
- Managed table lifecycle through RisingWave

This is what the streaming lakehouse should look like:
- Real-time ingestion and analysis
- Open storage formats
- Query from any Iceberg-compatible engine
And with RisingWave supporting Iceberg natively, the streaming lakehouse is no longer theoretical. It's here!
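The DuckDB end of this flow can be sketched with DuckDB's iceberg extension; the table location and columns are placeholders:

```sql
-- Load DuckDB's Iceberg support, then read the RisingWave-written table
-- directly from object storage (the path is a placeholder).
INSTALL iceberg;
LOAD iceberg;

SELECT status, COUNT(*) AS n
FROM iceberg_scan('s3://my-bucket/warehouse/events')
GROUP BY status;
```

No export step, no copy: DuckDB reads the same files RisingWave commits, because the table format, not the engine, owns the data.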
RisingWave @RisingWaveLabs
🔥Excited to partner with our friends at @platformatory to help bring another edition of Bangalore Streams to life on March 7! Expect great conversations, plenty of networking, and good food. 🍕🧉 Save your spot here: meetup.com/bengaluru-stre…
RisingWave @RisingWaveLabs
Join us TOMORROW for the debut of our Customer Spotlight series, featuring a live discussion with GDU Labs: How GDU Labs Uses RisingWave to Turn Fragmented Data into Verified Profiles with @ProductPasha and Alex Robbin. Save your spot: luma.com/2h8bla03
RisingWave @RisingWaveLabs
🔥Happens TOMORROW! NYC Open Source Data Happy Hour cohosted by @aiven_io and @RisingWaveLabs. We’ll talk Apache Kafka and Apache Iceberg, focusing on practical lessons, real-world use cases, and the latest technical insights. Save your spot: luma.com/84ihoxyb
RisingWave @RisingWaveLabs
🥁 Join us on March 5 for our very first Customer Spotlight webinar and a live convo with GDU Labs: How GDU Labs Uses RisingWave to Turn Fragmented Data into Verified Profiles with Alex Robbin (GDU Labs) and @ProductPasha (RisingWave). 🎟️Sign up here: luma.com/2h8bla03
RisingWave @RisingWaveLabs
Join us on March 4 in NYC for the Open Source Data Happy Hour by @aiven_io and RisingWave! Wrap up your day with some great convos about building modern data platforms with Apache Kafka and Apache Iceberg. Save your spot: luma.com/84ihoxyb
RisingWave @RisingWaveLabs
Iceberg writes look simple… until you see what's happening under the hood.

Iceberg write path in one line:
Data files → Manifest files → Manifest list → Metadata file → Catalog (pointer update)

Apache Iceberg Write Path (Engine → Table)
When an engine writes to an Iceberg table, it builds a new snapshot bottom-up, then publishes it atomically:

1) Data Files
The engine writes new data files to object storage (new files for inserts, and new/rewritten files for updates/merges/compaction).

2) Manifest Files
Iceberg records which data files were added and removed in manifest files, along with partition info and file-level stats.

3) Manifest List
Those manifest files are grouped into a manifest list for the snapshot being committed.

4) Metadata File
Iceberg writes a new table metadata file that describes the updated table state (snapshot, schema, history) and references the new manifest list.

5) Catalog (Current Metadata Pointer)
Finally, the catalog is updated to point to the new metadata file, publishing the new snapshot so every engine sees the latest table state.

Why this matters
Iceberg writes are atomic: engines first create the new snapshot's files + metadata, then flip a single catalog pointer. That means readers either see the old snapshot or the new snapshot, never a half-written state.
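One way to see these layers in practice: Iceberg exposes each of them as a queryable metadata table. A Spark SQL sketch, where the catalog and table names are placeholders:

```sql
-- Snapshots produced by commits (steps 4-5 of the write path).
SELECT snapshot_id, committed_at, operation
FROM demo.db.events.snapshots;

-- Manifest files behind the current snapshot (steps 2-3).
SELECT path, added_data_files_count
FROM demo.db.events.manifests;

-- The underlying data files (step 1).
SELECT file_path, record_count
FROM demo.db.events.files;
```

Running these after a commit shows exactly which files the new snapshot added, which is a handy way to verify compaction and write-mode behavior.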
RisingWave @RisingWaveLabs
We’ve built an end-to-end demo that turns raw network metric streams into actionable, real-time insights using simple SQL. It combines Kafka for high-throughput data ingestion, RisingWave for real-time stream processing, Iceberg for data lake persistence, and @Minio for object storage. For details, read the blog below: risingwave.com/blog/real-time…
RisingWave @RisingWaveLabs
Ever wondered how an engine actually reads an Iceberg table?

Iceberg read path in one line:
Catalog → Metadata → Manifest list → Manifest files → Data files

Apache Iceberg Read Path (Engine → Table)
When an engine reads an Iceberg table, it walks this chain from top to bottom:

1) Catalog
The starting point. Stores a pointer to the table's current metadata file, which represents the latest snapshot reference.

2) Metadata File
Defines the table schema, lists snapshots, and references the manifest list for the snapshot being read.

3) Manifest List
Tracks all manifest files associated with the selected snapshot.

4) Manifest Files
Contain metadata about data files, including partition values and file-level statistics, which help determine which files should be read.

5) Data Files
The actual table data, stored in object storage. This is what the engine ultimately reads.

Why this matters
During reads, Iceberg resolves the snapshot through the catalog and metadata layers, then uses manifest metadata to identify the exact set of data files for that snapshot.
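Because every read resolves to one snapshot through this chain, time travel falls out naturally. A Spark SQL sketch, where the table name, snapshot id, and timestamp are placeholders:

```sql
-- Read a specific historical snapshot by id...
SELECT * FROM demo.db.events VERSION AS OF 8744736658442914487;

-- ...or the snapshot that was current at a point in time.
SELECT * FROM demo.db.events TIMESTAMP AS OF '2025-01-01 00:00:00';
```

Both queries walk the same catalog → metadata → manifest chain; they just start from an older metadata entry instead of the current pointer.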
RisingWave @RisingWaveLabs
We built an end-to-end demo that shows how to turn live logistics streams into accurate, real-time ETA insights using SQL, without complex stream-processing code. It combines Kafka for ingestion and RisingWave for continuous processing and serving. For details, read this blog: risingwave.com/blog/real-time…
RisingWave @RisingWaveLabs
Got a feature request or bold idea? We'd love to hear it!
RisingWave @RisingWaveLabs
Mark your calendar for Feb 19 and join us for What’s Ahead for RisingWave: The 2026 Roadmap. Get an inside look at what we’re building next with @ProductPasha and be part of the conversation shaping our direction. Save your spot: luma.com/1xukxo2t