Ismael Juma

12K posts

@ijuma

Kafka, Scala, JVM, distributed systems, performance, machine learning, Haskell, @ConfluentInc.

San Francisco Bay Area, CA · Joined December 2008
513 Following · 4.6K Followers
Ismael Juma retweeted
Guillaume Smet (@gsmet_)
If you want to learn more about how we got a Quarkus REST app to start in 80ms on the JVM, have a look at the very detailed blog post we wrote with @geoand86 about Quarkus + Project Leyden. quarkus.io/blog/leyden-2/
Ismael Juma retweeted
Matthias J. Sax 🦦 (@MatthiasJSax)
With the recent #ApacheKafka 4.2 release, the new "streams" rebalance protocol is production ready. It's exciting that it's also GA on Confluent Cloud now. The cherry on top is that our #KafkaStreams Cloud UI now also shows task assignment information for "streams" groups.
Ismael Juma retweeted
Jason (@jason_koch)
"Optimizing Recommendation Systems with JDK's Vector API": I think this is maybe our first prod usage of the Vector API, and it works great. netflixtechblog.com/optimizing-rec…
Ismael Juma retweeted
Anton Zhiyanov (@ohmypy)
SwissTable, a high-performance open-addressing hash map originally developed by Google, is becoming more popular in the industry. First, Rust adopted it for its HashMap type. Then Go started using a custom version of SwissTable for its map type. Now, Valkey (a Redis fork) has rewritten its core hash table data structure, switching from the old chained implementation to SwissTable. Pretty cool! valkey.io/blog/new-hash-…
Ismael Juma retweeted
Francesco Nigro (@forked_franz)
@debasishg In relation to observability for Loom, I've made a goroutine-like tracer to help debug the custom Loom scheduler 🚀
Ismael Juma retweeted
Java (@java)
Boost your Java application performance by applying the Ahead-of-Time cache features added in recent JDK releases. ⚡ This article guides you through it. social.ora.cl/6011CILyh
Gunnar Morling 🌍 (@gunnarmorling)
Turns out GZIP decompression is consuming a fair share of CPU cycles when profiling the #Hardwood Parquet parser. Anyone aware of more efficient alternatives to the GZIP implementation coming with the JDK?
Quoted post by Gunnar Morling 🌍 (@gunnarmorling):

🏎️ Made some good progress over the weekend improving the performance of the #Hardwood parser for Apache Parquet; 11 files from the 2025 NYC taxi ride data set (~720 MB) can now be fully parsed in ~1.9 sec. Besides some decoder tweaks, I focused mostly on improving the parallelism of the parser this time. Which, as it turns out, is a surprisingly tricky problem. I'm still not really happy with how things are, but they are much better now than before.

A naive approach would be to just parse separate column chunks in parallel. This can help a little bit, but it falls short very quickly: your file might not have many columns to begin with, or they could have different lengths (one column is repeatable, while others are not).

So I took a first stab at implementing page-level parallelism (the Parquet format organizes files in row groups, which are made up of column chunks, which are made up of pages), which allows fanning out the work at a more fine-grained level. Once you have identified the page boundaries within a chunk (Parquet supports indexes for that, but not all files have them), and you have parsed its dictionary (if the column uses dictionary encoding), you can distribute the work of parsing pages to multiple threads, which increases CPU utilization a lot.

There's still a problem: there can be significant differences in how CPU-intensive the decoding of a given page is, depending on its encoding type, and thus in the time it takes to parse a page; essentially, faster columns will wait on slower columns. The way I'm currently tackling this is via adaptive page pre-fetching: slower columns build up a deeper page pre-fetching queue over time, so more threads can pick up their parsing tasks. Eventually, whenever a new page is needed when iterating through the Parquet file, that page should be decoded already, no matter its value or encoding type.

This gets me to a CPU utilization of ~800%, which is a significant improvement over single-threaded parsing or basic column-level parallelism. In wall clock profiling, I'm still seeing decoder threads idling for about half of the time, but we're getting there, step by step 🤓. 👉 github.com/hardwood-hq/ha…
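
The adaptive page pre-fetching idea described in the post can be illustrated with a small Python sketch. This is not Hardwood's code: `ColumnReader`, `read_columns`, the `decode` callback, and the "double the depth whenever the consumer would have to wait" rule are all hypothetical names and assumptions chosen for illustration, and real Parquet pages are replaced by plain values.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class ColumnReader:
    """Yields the decoded pages of one column in order, decoding ahead."""

    def __init__(self, pages, decode, pool, initial_depth=1, max_depth=8):
        self._pages = iter(pages)    # undecoded pages, in file order
        self._decode = decode        # hypothetical page decoder function
        self._pool = pool            # decoder threads shared by all columns
        self._depth = initial_depth  # current prefetch depth of this column
        self._max_depth = max_depth
        self._inflight = deque()     # futures of pages submitted for decoding
        self._fill()

    def _fill(self):
        # Submit decode tasks until `depth` pages are in flight.
        while len(self._inflight) < self._depth:
            page = next(self._pages, None)
            if page is None:
                return
            self._inflight.append(self._pool.submit(self._decode, page))

    def __iter__(self):
        return self

    def __next__(self):
        if not self._inflight:
            raise StopIteration
        future = self._inflight.popleft()
        if not future.done():
            # Assumed heuristic: the consumer is about to block, so this
            # column decodes slowly; keep more of its pages in flight.
            self._depth = min(self._depth * 2, self._max_depth)
        page = future.result()
        self._fill()
        return page


def read_columns(columns, decode, workers=4):
    """Decode every column page by page; returns {name: [decoded pages]}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        readers = {name: ColumnReader(pages, decode, pool)
                   for name, pages in columns.items()}
        return {name: list(reader) for name, reader in readers.items()}
```

Calling `read_columns({"a": [1, 2, 3], "b": [10]}, lambda p: p * 2)` returns the decoded pages per column in their original order. Columns whose pages are ready in time keep a shallow queue, while columns that make the consumer wait gradually occupy more decoder threads, up to `max_depth`.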

Ismael Juma retweeted
Johannes Bechberger (@parttimen3rd)
Async-profiler has released an update that includes native lock profiling and a latency filter. Consequently, ap-loader, which wraps async-profiler in a platform-independent JAR, has also been updated: github.com/jvm-profiling-…
Ismael Juma retweeted
Matei Zaharia (@matei_zaharia)
Super excited that we’re open sourcing the Dicer autosharder! It’s become a critical piece of infrastructure in Databricks that’s made many of our systems more scalable and reliable, and it’s powered by really cool systems work. databricks.com/blog/open-sour…
Ismael Juma retweeted
Kevin Kiley (@KevinKileyCA)
The proposed “wealth tax” would not only collapse our state’s finances. It is also blatantly unconstitutional in confiscating the assets of former residents who have left California. I’m preparing legislation to expressly preempt this provision under federal law.
Ismael Juma retweeted
Mayor Matt Mahan (@MattMahanSJ)
The @CAgovernor and I have had our policy disagreements over the past few years on issues like his opposition to Prop 36, but he is spot on here. This so-called wealth tax is going to backfire, and middle-class taxpayers are going to be forced to pick up the bill. We need to close federal tax loopholes and cut waste, not crash California's economy. politico.com/news/2026/01/1…
Ismael Juma retweeted
Kafka Streams (@kafkastreams)
There is a new KIP, KIP-1244, proposing to deprecate the Kafka Streams Scala API and remove it in the next major release. Please join the discussion on the dev mailing list if this concerns you.
Ismael Juma retweeted
Francesco Nigro (@forked_franz)
Quick poll. I am working on a Netty Loom scheduler (see github.com/franz1981/Nett…). I haven't released it yet, but that will happen soon. Who is interested in trying it (and why), and keen to give some feedback on the @OpenJDK loom-dev list? ❤️
Ismael Juma retweeted
Java (@java)
Put on your comfy shoes, because we're walking through some of the many notable performance improvements and features in JDK 25. 🚶 social.ora.cl/6011A7thd