
Ali-Reza Adl-Tabatabai

@aadltaba
Dad, Computer Scientist & Entrepreneur. Passionate about developer tools. Previously head of Developer Platform at Uber. Ex Google, Facebook, and Intel.




Avoid the pitfalls of stale feature flags: they not only increase tech debt but also make debugging a nightmare. At Gitar, we're aiming to enhance every aspect of development. DM us to get an exclusive look at how we’re making development more reliable, secure & enjoyable.

Boost the performance of your Go services by 10-30% without touching a line of code! Profile-Guided Optimization (PGO) uses runtime profiles to enhance your application's latency & efficiency. Read more about it on our blog: gitar.co/blog/unlocking…



I recently investigated the 1brc challenge and its Go implementation in the context of PGO (Profile-Guided Optimization), which we upstreamed to Google. The benchmark turns out to be predominantly I/O-bound, which limits how much advanced compiler techniques like PGO can help. Even so, PGO still delivers a 3.59% improvement. I ran the analysis on macOS with M2/64GB hardware. Details below:

1) Without PGO, average time per run = 8.08s.

2) Profiling shows that approximately 66% of the execution time goes to I/O, specifically bufio.(*Scanner).Scan, which is what makes this an I/O-bound benchmark. strconv.ParseFloat accounts for about 9% of the time, and accumulator::ensure for 0.9%.

3) The PGO we introduced in the Go compiler yields a 3.59% improvement on this benchmark, primarily from inlining several critical functions across packages. With the inlining threshold raised from 80 to 2000 for hot functions, both accumulator::ensure (cost 100) and strconv.ParseFloat (cost 1505) were inlined.

// benchstat
pkg: github.com/warpstreamlabs…
                  │ before.txt  │         pgo_after.txt          │
                  │   sec/op    │   sec/op     vs base           │
YourFunction-12     8.085 ± 2%    7.795 ± 1%  -3.59% (p=0.000 n=10)

4) Another interesting finding: applying PGO to slice sizes, currently set manually to 1<<5, can remove accumulator::ensure from the hot path entirely. Increasing the slice size to 1<<9 means ensure never has to be called to grow the slice [or copy its contents], though this holds only when the same measurements.txt file is used from run to run. Similarly, PGO could be applied to choose the buffer size in bufio.NewReaderSize(sr, 1<<19).

References:
1. 1brc challenge: x.com/gunnarmorling/…
2. Go implementation: github.com/warpstreamlabs…
3. Our PGO upstreamed proposal: go.googlesource.com/proposal/+/mas…
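The slice-size point in 4) can be illustrated with a small Go sketch. This is not the accumulator code from the 1brc implementation; growCount is a hypothetical helper, and the element count of 400 is an assumed workload chosen only so that it fits within 1<<9. It counts how often append has to reallocate (and copy) the backing array for a given starting capacity.

```go
package main

import "fmt"

// growCount appends n values to a slice that starts with the given capacity
// and reports how many times append had to reallocate the backing array.
func growCount(initialCap, n int) int {
	s := make([]float64, 0, initialCap)
	grows := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, float64(i))
		if cap(s) != before {
			grows++ // capacity changed, so the data was copied to a new array
		}
	}
	return grows
}

func main() {
	const n = 400 // assumed number of entries; hypothetical, not from the benchmark
	fmt.Println("start cap 1<<5:", growCount(1<<5, n), "reallocations")
	fmt.Println("start cap 1<<9:", growCount(1<<9, n), "reallocations")
}
```

As long as the number of entries stays at or below 1<<9, the second call reports zero reallocations, which is the effect described above: the growth path drops out of the hot loop. The right capacity depends on the input, which is why a profile-driven choice is tied to a particular measurements.txt.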











