Bartosz Konieczny

860 posts

@waitingforcode

Freelance Data Engineer and instructor, enjoy solving data problems with #ApacheSpark #AWS #GCP #Azure 👨‍🏭 | [email protected]

remote · Joined January 2011
79 Following · 1.8K Followers
Bartosz Konieczny retweeted
Delta Lake @DeltaLakeOSS
⏰ Final Reminder – Delta Lake Webinar Tomorrow! Wondering if data engineering design patterns can unlock new insights into Delta Lake? Or how Delta Lake can become a key part of your streaming data architecture? Join @newfront (@bufbuild) and @waitingforcode as they tackle these questions head-on! 🗓️ Oct 14 @ 9AM PT 🎥 Live on LinkedIn, YouTube & X 📍 Reserve your spot today: luma.com/delta-1014 #opensource #oss #deltalake #streaming #dataengineering
1
2
5
1.4K
Bartosz Konieczny retweeted
Jack Vanlightly @vanlightly
Why don’t Iceberg or Delta Lake have secondary indexes? Because analytics workloads and OLTP workloads optimize for opposite I/O patterns. See my dive into data layout, pruning, and what “indexing” really means in open table formats: jack-vanlightly.com/blog/2025/10/8…
2
30
197
11.5K
Bartosz Konieczny retweeted
Delta Lake @DeltaLakeOSS
Are you wondering if general concepts like data engineering design patterns can help you learn about #DeltaLake? Or, if it's possible to leverage Delta Lake within your streaming data architecture? In this webinar, Scott Haines and Bartosz Konieczny will answer these two questions. Scott, who gained streaming expertise at Yahoo, Twilio, and Nike, will share with you best practices for leveraging Delta Lake as a component of your streaming architecture. ✅ Bartosz, who recently published Data Engineering Design Patterns, will reverse-engineer a few of these design patterns to explain which Delta Lake features make everything tick. 🗓️ Tuesday, Oct 14 🕝 9AM PT Don't miss it! 🔗 Register today: luma.com/delta-1014 #opensource #oss #dataarchitecture #dataengineering @waitingforcode
0
2
4
637
Bartosz Konieczny @waitingforcode
@AdiPolak I'm not that new anymore, but "Stream Processing with Apache Flink" was my first learning resource; well structured, covering IMO the most important parts to start. Now, I'm deeply appreciating Flink Forward technical deep dives to go further 🤩
0
0
1
93
Adi Polak @AdiPolak
People who are newly learning Flink: what learning resources are you finding helpful these days? Senior folks forget what it's like, and the resources we learned on get out of date. Teach me your ways!
3
1
11
1.5K
Bartosz Konieczny retweeted
Jim Dowling @jim_dowling
I have been busy the last few months writing a book for O'Reilly about how to build ML systems (batch, real-time, and LLMs), distilling much of what I have learnt from both working with customers as well as students. Why could the book interest you?
* Data Scientists - transition from training models to building ML systems
* ML Engineers - learn about how to build batch, real-time, LLM systems in modular parts that you compose into a ML system
* Data Engineers - learn about the data transformation taxonomy for ML and how badly structured DAGs prevent reuse in ML systems
* Architects - divide et impera - learn how modularity helps you build faster and better ML systems.
Early access to the first chapter (52 pages) is available here: hopsworks.ai/lp/oreilly-boo…
6
22
120
15.3K
Bartosz Konieczny retweeted
Gwen (Chen) Shapira @gwenshap
I don't want to start a flame war here, but IMO it is a mistake to jump straight to distributed databases (and 90% of the content below is distributed databases) without first learning fundamentals on single node databases.
Here's my 10 things to understand about databases:
1. Relational model. Primary keys, foreign keys, normal form.
2. SQL language. Ideally with advanced SQL (CTE, analytics)
3. ACID and how transactions work
4. Write-ahead log (or binlog) and how it is used. Especially around restarts, recovery and replication.
5. Buffer cache, disk storage layout and how they interact
6. What happens when databases start? when they shut down?
7. Indexes, cluster tables, partitions and other types of database structures.
8. Query parsing, planning and optimizing.
9. MVCC and how to deal with its quirks in your DB of choice
10. Security - authentication, authorization, encryption on wire and at rest.
11. (Bonus) Investigating performance issues and making sense of benchmarks.
Entire world, stuff that 99% of developers use daily. You can be a deep expert without ever looking at distributed databases. And this also serves as strong foundation once you do.
And if you use Postgres, I found this free book super helpful in making sense of things: interdb.jp/pg/
Kaivalya Apte - The Geek Narrator @thegeeknarrator

Ten things to understand about your database:
1) High level Architecture
2) How writes work? (Replication, data distribution, internal organisation etc)
3) How reads work? (Consistency guarantees, tuning options, etc)
4) CAP theorem, ex. CP or AP
5) Transactions and Concurrency models
6) How does it scale?
7) How are failures handled?
8) Best practices on Querying data
9) How is geo-distribution supported, so you can plan ahead in time?
10) How to optimise cost?
Episodes to watch to understand the above for different databases:
DynamoDB: youtu.be/ifSckJlatWE
Cassandra: youtu.be/V1EO_0i3RNA
CockroachDB: youtu.be/1NuvxQEoVHU
General database internals: Part-1 youtu.be/DiLA0Ri6RfY and Part-2 youtu.be/IW4cpnpVg7E
Realtime Analytics with Apache Pinot: youtu.be/cGTffWg2EFs
Geo Distribution of databases: youtu.be/JQfnMp0OeTA
CDC and Debezium: youtu.be/VGH6TlhEJpM
Twisp - A ledger database: youtu.be/VGb54yNQrHM
Kafka internals youtu.be/d89W_GzWnRw
YugaByteDB Internals: youtu.be/cXIPIA7e220
Write ahead logging: youtu.be/yV_Zp0Mi3xs and youtu.be/2MqY_mT1vw8
B-Trees on Disk: youtu.be/dTfR0S_rBGg
Graph Database Internals: youtu.be/iihJXKAQZkA
ScyllaDb internals: youtu.be/AqY13RjWwJg
Duckdb Internals: youtu.be/f9QlkXW4H9A
RisingWave Streaming Database: youtu.be/nckuW02gI3Y
Clickhouse Internals: youtu.be/sh5EBqrrwEU

24
89
583
123.4K
Bartosz Konieczny retweeted
Delta Lake @DeltaLakeOSS
The early release of Delta Lake: The Definitive Guide is here! 🎉 The latest edition includes the addition of Chapter 12: Performance Tuning. Download here ➡️ bit.ly/472DVY7 Authors @dennylee, Prashanth Babu, Tristen Wentling, & @newfront #opensource #deltalake #oss
3
14
81
14.4K
Bartosz Konieczny retweeted
Antón @antonmry
A list of articles I share again and again when developers ask me about Kafka 🧵
9
80
323
58.5K
Bartosz Konieczny retweeted
Apache Spark @ApacheSpark
[ANNOUNCEMENT] Congrats to the Apache Spark community and all the contributors! The Apache Spark 3.5.0 release is here. Try it out! spark.apache.org/releases/spark…
5
31
111
14K