Lucas Bradstreet @ghaz.bsky.social

1.3K posts

@ghaz

Working on the Kafka Storage team at @confluentinc. Lately I've been working on Confluent Freight. Ex-Distributed Masonry. {he/him}

Singapore · Joined August 2008
1.5K Following · 813 Followers
Lucas Bradstreet @ghaz.bsky.social
@ALEXEIMARTOV A little bit. But maybe it's safer than typing it in manually for many people. The main reason is that 1PW can actually check that the website is the correct one when it auto-enters it (though you can override it). This is one big benefit of passkeys or U2F keys.
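The phishing resistance described here comes from the manager matching the saved origin, not the user's eyes. A minimal sketch of that idea (the `SAVED_ORIGIN` entry and `should_autofill` helper are hypothetical names, not 1Password's actual API):

```python
from urllib.parse import urlsplit

# Hypothetical vault entry: the origin recorded when the credential was saved.
SAVED_ORIGIN = ("https", "example.com")

def should_autofill(current_url: str) -> bool:
    """Offer the credential only when scheme and host match the saved origin
    exactly, so a lookalike domain never receives the password."""
    parts = urlsplit(current_url)
    return (parts.scheme, parts.hostname) == SAVED_ORIGIN

print(should_autofill("https://example.com/login"))  # True
print(should_autofill("https://examp1e.com/login"))  # False: lookalike domain
```

A human glancing at the address bar can miss the `1` in `examp1e.com`; an exact string comparison cannot, which is the same property passkeys and U2F get by binding the credential to the origin cryptographically.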
Martin Bradstreet @ALEXEIMARTOV
@ghaz Right, but it kinda defeats the 2-factor thing if all they need is your one password, right?
Lucas Bradstreet @ghaz.bsky.social
@ALEXEIMARTOV If you don't want to type in the 2FA codes in the first place, you can store them in 1PW. I don't do that because I don't want them in the same place as my passwords, but it is pretty convenient. You should be using passkeys (I put them in 1PW) where available. Those are even safer.
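What a password manager stores when it "stores your 2FA" is just the shared secret; the six-digit code is derived from it on demand. A minimal sketch of that derivation per RFC 6238 (TOTP), using only the standard library:

```python
import hmac, hashlib, struct

def totp(secret: bytes, for_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HMAC-SHA1 over the time-step counter, dynamically truncated."""
    counter = struct.pack(">Q", for_time // step)          # 8-byte big-endian step count
    digest = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                             # low nibble picks the window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test vector: ASCII secret "12345678901234567890", t=59s, 8 digits
print(totp(b"12345678901234567890", 59, digits=8))  # -> 94287082
```

This is also why keeping the TOTP secret next to the password collapses the two factors into one vault, which is the trade-off being weighed in the thread.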
Lucas Bradstreet @ghaz.bsky.social
@ALEXEIMARTOV The impression I got is that it's trying to describe the code to you inline, but it doesn't expect you to retain most of the comments in the code you merge. Most of the comments it leaves for me aren't quite to that degree, though.
Martin Bradstreet @ALEXEIMARTOV
Why does Gemini write so many comments? When I read Google's coding standards a million years ago, it said not to do stuff like this: if (VRReplicatedCamera) // Check if the VR Camera component is valid
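The style-guide objection is that a comment restating the condition adds nothing; comments should carry what the code can't. A small illustration of the difference (the `Camera` class and the tracking scenario are invented for the example):

```python
class Camera:
    """Stand-in for a component that may not exist yet."""
    def activate(self) -> str:
        return "activated"

vr_camera = Camera()  # could be None before tracking starts

# Redundant comment: restates what the code already says -
# the style the guide (and the tweet) objects to.
if vr_camera is not None:  # check if the VR camera is valid
    vr_camera.activate()

# Useful comment: explains *why* the check exists, which the
# code alone cannot convey.
if vr_camera is not None:
    # Tracking starts asynchronously, so the camera component can
    # still be missing on the first frames after load; skip
    # activation until it appears.
    vr_camera.activate()
```

Both branches compile to the same behavior; only the second comment would survive review under a "comments explain why, not what" rule.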
Andrew Fisher @acpandy
fucking surreal levels $NVDA $86.88 $TSLA $218.88 who is buying?
Lucas Bradstreet @ghaz.bsky.social reposted
Confluent @confluentinc
Apache Kafka 4.0 is here! This release marks a major milestone: Kafka now runs entirely without ZooKeeper! By running in KRaft mode, Kafka simplifies deployments, improves scalability, and reduces operational overhead.

Plus, get ready for:
✅ Faster consumer rebalances with KIP-848
✅ Early access to Queues for Kafka (KIP-932) for point-to-point messaging

Explore what's new in @davidjacot's blog ➡️ cnfl.io/4iCd1vp
Lucas Bradstreet @ghaz.bsky.social reposted
Gwen (Chen) Shapira @gwenshap
🔒 Simplify your multi-tenant app authentication with our step-by-step guide using Next.js and Postgres! Join me in today's video for 3 simple steps to multi-tenant authentication.
Stanislav Kozlovski @kozlovski
Proprietary Kafka designs can save 90% of cluster costs by avoiding replication & writing directly to S3. How hard would it actually be to extend the open-source design to do this?

Surprisingly simple.

If you want to go the leaderless Kafka model, you'd probably need a full rewrite, since there are a lot of changes:
• every broker needs to accept data for every partition
• every broker needs to write the data into a mixed multi-partition blob
• the system needs a centralized consensus layer to serialize the order of messages per partition
• the system needs a background job that splits the mixed blobs into properly-ordered sequential partition data

It's complex. 🙄 But what if we literally did the MVP?

1. Create a special kind of topic - call it a Glacier Topic. 🧊 Still has leaders and followers as normal.
2. Writes to the leader replica would be cached for up to 300ms, then PUT to S3 (via multi-part upload). ♻️ This persists the progress made so far; the broker builds up the segment PUT by PUT.
3. The replication protocol still exists, but it wouldn't send Fetch requests with the actual data. 🔖 Glacier replication would just send metadata from the leader to the followers. The local Glacier topic data is then simply metadata - basically the durable persistence layer for the multi-part PUT progress.

The leader broker would only respond to produce requests once its followers have replicated the metadata about the PUT containing the produced data. And because the local topic holds only light metadata, you could afford to rebalance super fast! Kafka topics with ultra-fast scale-up, scale-down and load balancing? 🤩

I see one gotcha with this design. 🚨 You can't read persisted multi-part PUTs from S3 until the object is fully complete. 💡 This has implications for follower reads and failover: 👇

1. Follower brokers cannot serve consume requests for the latest data. Not until the segment is fully persisted in S3.
2. Leader brokers can serve consume requests for the latest data if they cache the produced data. This is fine in the happy path, but can result in out-of-memory issues or inaccessible data if it has to get dropped from memory.
3. On failover, the new leader won't have any of the recently-written data.

You have a few solutions here:
• On failover, you could simply force-complete the PUT from the new leader prematurely. Then the data would be readable from S3.
• You could serve follower reads by proxying them to the leader. This crosses zone boundaries ($$$) and doesn't solve the memory problem. 👎
• You could say outright that you're unable to read the latest data until the segment is closed and completely PUT. 🛑🤷‍♂️

That last option sounds extreme but can actually be palatable at high throughput. We could have the broker break a segment (1 GiB) down into 20 chunks (e.g. ~50 MiB each). When a chunk is full, complete the multi-part PUT. 👌

If we agree that the main use case for these Glacier Topics would be:
• extremely latency-insensitive workloads ("I'll access it after tens of seconds")
• high throughput - e.g. 1 MiB/s+ per partition (a super fair assumption)

then a 1 MiB/s partition would need less than a minute (51 seconds) for data to become "visible":
• 4 MiB/s partition - 13 seconds to become visible
• 8 MiB/s partition - 6.5 seconds to become visible 👀

If it reduces your cost by 90%, 13 seconds until you're able to "see" the data sounds like a super fair trade-off for eligible use cases. And you could tune the chunk count to further reduce this visibility-to-throughput ratio.

Granted, brokers would need to rebuild the chunks to complete the segment. There would simply need to be some new background process that eventually merges this mess into one object. That sounds easily doable via e.g. the coordinator pattern Kafka leverages today (consumer, transaction).

With this new design, we'd ironically be moving Kafka toward more micro-batching-oriented workloads. But I don't see anything wrong with that.

Anyway. This post was my version of napkin-math design. Am I missing anything?
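The post's visibility numbers fall out of a single ratio: data becomes readable only when a chunk's multi-part PUT completes, so time-to-visibility is roughly chunk size over partition throughput. A quick check of that napkin math, assuming the post's 1 GiB segment split into 20 chunks:

```python
# Glacier Topic visibility math from the post: a chunk is only readable
# from S3 once its multi-part PUT completes, so latency-to-visibility
# ~= chunk_size / partition_throughput.
SEGMENT_MIB = 1024  # 1 GiB segment
CHUNKS = 20
CHUNK_MIB = SEGMENT_MIB / CHUNKS  # 51.2 MiB per chunk

def seconds_until_visible(throughput_mib_per_s: float) -> float:
    """Worst-case wait before newly produced data is readable from S3."""
    return CHUNK_MIB / throughput_mib_per_s

for rate in (1, 4, 8):
    print(f"{rate} MiB/s -> {seconds_until_visible(rate):.1f} s")
```

The results (51.2 s, 12.8 s, 6.4 s) line up with the post's quoted 51 / 13 / 6.5 seconds within rounding, and show why raising the chunk count directly shrinks the visibility window at a given throughput.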
Lucas Bradstreet @ghaz.bsky.social reposted
Jack Vanlightly @vanlightly
A new log replication disaggregation survey post is out! The Kafka Replication Protocol: 🔹Separation of control plane from data plane. 🔹Role separation with minimal coupling. 🔹Kafka’s alignment with Paxos roles. jack-vanlightly.com/blog/2025/2/21…
Kevin → Plant Daddy @KevinEspiritu
I haven't showered indoors in over a year. Instead, I built an outdoor shower in my garden and plumbed it so the water irrigates 16 varieties of citrus trees that I planted in my front yard
Lucas Bradstreet @ghaz.bsky.social reposted
Jack Vanlightly @vanlightly
New distributed systems protocol write-up! Virtual Consensus (Delos) heavily inspired the new architecture in Confluent's Kora engine, powering Freight Clusters. This write-up dives into the Virtual Consensus in Delos paper and why it makes sense as the default log replication protocol in the era of object storage and hybrid environments. jack-vanlightly.com/blog/2025/2/5/…
Lucas Bradstreet @ghaz.bsky.social reposted
Jack Vanlightly @vanlightly
With the announcement of S3-native streams (Freight clusters), here is a commentary on Confluent's strategy regarding object storage, streaming, and an open data architecture. jack-vanlightly.com/blog/2024/5/2/…
Lucas Bradstreet @ghaz.bsky.social reposted
Mahesh Balakrishnan @maheshb
confluent.io/blog/introduci… In Feb 2022, @ghaz and I wrote an internal proposal at Confluent arguing for a "cost-saving design (e.g., writing to S3 directly) that can eliminate cross-AZ traffic costs for high-rate elephant workloads". I was hoping to apply my research on Corfu and Delos on Kafka. This turned into a classical systems research project with Lucas and @garmanrnar (it felt like I was back at MSR SVC -- new types of futures! fast simulation testing! quorum protocols and atomic registers!). Luckily we anticipated an important industry trend around object storage; an immensely talented engineering team formed around the project; and today Confluent announced Freight clusters based on their hard work. Turns out systems research is important, who knew! Look forward to writing a paper about this some day, but for now it's back to building.
Lucas Bradstreet @ghaz.bsky.social reposted
zach @ztellman
For the past few years, my younger brother has been designing a new kind of acoustic piano: smaller, lighter, quieter. He just gave his first talk about it; if you’re at all curious about the design space for stringed instruments, check it out: vimeo.com/907981562
apenwarr @apenwarr
@lawrjones Snowflake really focuses on one thing (mostly-read-only tables that can be parallel scanned). PostgreSQL tries to be everything to everyone but is especially bad at both of those things because of its core transaction-centric architecture.
apenwarr @apenwarr
If you were starting from scratch today, it would be a lot easier to write an open-source Snowflake than an open-source PostgreSQL. But alas, here we are.
Lucas Bradstreet @ghaz.bsky.social reposted
Michael Drogalis @MichaelDrogalis
🔥 It's finally here! I'm excited to announce that @ShadowTrafficIO is now available. Head to the home page to get started for free.

For my entire career, I've been baffled by how long it takes to build demos, load tests, and proof-of-concept projects. Everyone's built little internal data generator tools, but none of them are good. ShadowTraffic is a tasteful, complete product that rapidly simulates production traffic to your backend: @apachekafka, @PostgreSQL, and webhooks.

Help me get the word out:
👨‍💻 Hacker News: I'm on the /newest page
🛠️ Product Hunt: I'm listed under Developer Tools
📰 Reshare this post

ShadowTraffic was built completely in public in just 90 days, 100% bootstrapped on my own dime. Thank you *so* much for following along, guys!