Johannes Rudolph
@virtualvoid

487 posts

Fiddling bits since 2342 @[email protected]

Joined January 2009
297 Following · 1.5K Followers

Pinned Tweet
Johannes Rudolph @virtualvoid ·
My job as a Rust Engineering Lead at a startup fell victim to restructuring and I'm looking for a new role. I enjoy working on infra software and Open Source, with Rust, Scala, or whatever is needed. I'm also available on a contract basis to support previous libraries like Pekko HTTP.
Johannes Rudolph @virtualvoid ·
@hadilq @debasishg If you need fast append/insert/delete, the solution is usually not "just an array" but clever pre-allocation schemes and a mixture of linear and linked data that amortizes the cost of updates while keeping most of the linear-memory benefits for sequential access.
Hadi @hadilq ·
@debasishg Wait what?! Did I understand it correctly? If you use a LinkedList, with O(1) add/remove, in your LRU implementation instead of an array, with O(n) insert/delete, the array one will be faster because of CPU architecture!?
Debasish (দেবাশিস্) Ghosh 🇮🇳 @debasishg ·
Linux recently improved its page-fault handling, replacing linked lists and red-black trees with a new data structure that offers better cache friendliness. This talk has some pointers to why linked lists fail on modern CPUs and what it takes to make a cache-friendly data structure. The Maple Tree FTW .. youtu.be/TEHRMzZ01nE?si…
Johannes Rudolph @virtualvoid ·
@debasishg @hadilq Another downside is that the reference chain in a linked list adds a dependency chain between elements that prevents potential out-of-order execution benefits and SIMD optimizations.
Debasish (দেবাশিস্) Ghosh 🇮🇳 @debasishg ·
With a linked list, every time you traverse to the next item in the list, you are jumping to a totally random location in memory, and potentially it's a cache miss. With earlier processors, processor speed was roughly comparable to that of a memory access, so there was no substantial difference between accessing the next element of an array and that of a linked list. But this difference is huge with modern processors using L1, L2, or even L3 caches, so a cache miss is much costlier today. With arrays, elements are placed in contiguous locations, and hence the cache hit ratio is much higher compared to linked lists, where elements are placed at random memory locations.
Johannes Rudolph @virtualvoid ·
@forked_franz I see. I looked into uprobes again for github.com/jvm-profiling-… but JIT code / code in anonymous mappings is still not supported. It seems that you can use memory-execution watchpoints to trace function entry, but something like return probes still seems quite elusive.
Francesco Nigro @forked_franz ·
@virtualvoid But the problem is that it will observe just the Java side, and with some overhead too :/. And it will be blind to native frames and the rest of the kernel stacks, clearly... (2/2)
Francesco Nigro @forked_franz ·
Anyone aware of something like the kernel's function-graph tracer (kernel.org/doc/html/v4.18…) but in some Java profiler?
Johannes Rudolph @virtualvoid ·
Almost 50 years ago, mastermind Loriot created this visualization of what happens during a rolling upgrade if you implement sharding naively (w/o consistent hashing): dailymotion.com/video/x2x2dhm
Johannes Rudolph @virtualvoid ·
@lukasz_bialy @WojciechM_dev Great stuff, maybe I should retry llama2.scala on Scala Native as well, since it was also hampered by, e.g., ByteBuffer issues.
Łukasz Biały @lukasz_bialy ·
In these last hours of the #1brc challenge, I want to share with you my small exploration, not of how fast I can make the JVM go BRRRRR, but of whether Scala Native is now a real, usable contender in the space of functional langs compiled to native binaries. The biggest issue, of course, was whether I could parallelise the load. Scala Native did not support multithreading for the longest time, but now that has changed! In mid-January I implemented a relatively (no Panama and Unsafe, no SWAR) fast solution inspired by the work of @sampullara, Yavuz Tas, and @royvanrijn. Then I cooked the binary using scala-cli and benchmarked it against the fastest JVM in the tournament. #scala 1/*
Johannes Rudolph @virtualvoid ·
Thank you, @scrollprize, for awarding me a prize in the Vesuvius Challenge for my Open Source contributions cataloguing released segments and running OS ink detection models on them. It has been a fun journey and great collaboration! blog.virtual-void.net/2023/12/11/ves…
Johannes Rudolph @virtualvoid ·
@ernerfeldt @scrollprize Thanks for egui :) For the first time in 15 years, I have been enjoying working on a GUI, after avoiding GUIs at all costs because of the lack of good options...
Johannes Rudolph @virtualvoid ·
@hetzner The whole point of cloud volumes is being able to treat storage as detached from compute. Optimally, storage can easily be migrated to new compute when problems occur. Quite a bummer if storage cannot be relocated in cases like this... @Hetzner_Online
Johannes Rudolph @virtualvoid ·
Mmh, a @hetzner cloud volume has been hanging while unmounting for an hour. No more actions on the server are possible; after shutting the node down, other mounted volumes are also blocked, and the server cannot be restarted. A big share of cluster storage is unavailable => extended downtime.
Johannes Rudolph @virtualvoid ·
Also, the day when you first run a code-generation model on your engine, asking for suggestions on how to improve the algorithms of the underlying engine 🤯 (the suggestions are mostly weird and broken, admittedly, but there were some ideas to follow up on) gist.github.com/jrudolph/fb764…
Johannes Rudolph @virtualvoid ·
@RealNeilC @karpathy's llama2.c is interesting because it removes all abstraction and breaks inference down into a single file of C (which can be seen as a lingua franca of computation, because data structures map closely to actual memory layout and operations to CPU instructions).
Johannes Rudolph @virtualvoid ·
@RealNeilC Python's speed is mostly irrelevant, since ML means instructing the GPU which matrix op to do next. In big models, each multiplication takes so long that it dwarfs the overhead of the language. If you are looking for state-of-the-art inference on CPU, look at llama.cpp.
Johannes Rudolph retweeted
Apache Pekko @ApachePekko ·
Apache Pekko HTTP 1.0.0 has been released, see pekko.apache.org/download.html#… and pekko.apache.org/docs/pekko-htt… for more details. Apache Pekko HTTP also includes Scala 3.3 support, the result of an ongoing community effort that spanned years!