Andreas Abel

120 posts

@uops_info

Zürich, Switzerland · Joined March 2014

46 Following · 694 Followers

Pinned Tweet

Andreas Abel@uops_info·3d

I have added latency, throughput, and port usage data for Emerald Rapids, Meteor Lake, Arrow Lake, and Zen 5 to uops.info/table.html.

Andreas Abel@uops_info·3d

@cmuratori x.com/uops_info/stat…

Andreas Abel@uops_info

I have added latency, throughput, and port usage data for Emerald Rapids, Meteor Lake, Arrow Lake, and Zen 5 to uops.info/table.html.

Casey Muratori@cmuratori·3d

THANK YOU UOPS.INFO!!!!

Andreas Abel@uops_info·Oct 30

@G_melo_ding @IanCutress I'm pretty sure we didn't.

Game.Keeps.Loading@G_melo_ding·Oct 27

@IanCutress Lmao 😂🤣 I feel like they used chatGPT when writing it

𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠@IanCutress·Oct 27

The most amusing thing about this paper is that they're comparing the performance of superoptimizers analysis tools for simulated Intel architectures by running the benchmark suite on an AMD Ryzen 5900X. 😂👌

Matt@matt_dz

Facile: Fast, Accurate, and Interpretable Basic-Block Throughput Prediction arxiv.org/abs/2310.13212 IEEE International Symposium on Workload Characterization (IISWC) 2023 Andreas Abel (@uops_info), Shrey Sharma, Jan Reineke

Andreas Abel retweeted

Matt@matt_dz·Oct 26

Andreas Abel@uops_info·Oct 8

@FUZxxl @AgnerFog_ On Skylake, rd*sbase has 6 uops and a throughput of 6 cycles, wr*sbase has 7 uops and a throughput of 18.

Robert Clausecker@FUZxxl·Oct 8

Does anybody know the latency/throughput of rdfsbase, rdgsbase, wrfsbase, and wrgsbase? These could be (ab)used to turn FS/GS into extra index registers, but the usual tables (@uops_info @AgnerFog_) don't have any information on them.

Andreas Abel@uops_info·Nov 28

Latency, throughput, and port usage data for #Zen4 is now available at uops.info/table.html

Andreas Abel@uops_info·Aug 4

@corsix @trav_downs @geofflangdale @Wunkolo My benchmarks on SKX for VPADDD don't show such an extra uop: uops.info/html-tp/SKX/VP…

Pete Cawley@corsix·Aug 4

@uops_info @trav_downs @geofflangdale @Wunkolo They tend to decompose into several uops though, one of which being a merge op to combine the old and new contents of the destination.

Pete Cawley@corsix·Aug 1

Given:
1. crc32 has throughput 1 on port 1
2. pclmulqdq has throughput 1 on port 5
3. pclmulqdq+pxor can emulate crc32

It seems that fastest crc32 code should divide input in half and issue a crc32 _and_ a pclmulqdq every cycle. Code and numbers at corsix.org/content/fast-c…
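Splitting the input in half only works because a CRC over a concatenation can be recovered from two independent passes over the halves. The sketch below illustrates that property and is not the code from the linked post. It uses Python's `zlib.crc32` (the CRC-32 polynomial, whereas the x86 `crc32` instruction computes CRC-32C), and the function name `crc32_of_concat` is made up for illustration. The zero-byte passes are there only to keep the example short; a fast implementation would presumably replace them with precomputed constants or carry-less multiplies.

```python
import zlib


def crc32_of_concat(a: bytes, b: bytes) -> int:
    """Return the CRC-32 of a + b using one pass over a and one over b.

    For a fixed message length, CRC-32 is an affine function over GF(2):
        crc(x ^ y) == crc(x) ^ crc(y) ^ crc(zeros)
    Writing a + b as (a || 0^n) ^ (0^m || b) gives the combination below.
    """
    m, n = len(a), len(b)
    # Pass over a, then extend the running CRC across n zero bytes: crc(a || 0^n).
    left = zlib.crc32(bytes(n), zlib.crc32(a))
    # Pass over b, starting from the CRC state after m zero bytes: crc(0^m || b).
    right = zlib.crc32(b, zlib.crc32(bytes(m)))
    # Correction term for the affine offset: crc of m + n zero bytes.
    zeros = zlib.crc32(bytes(m + n))
    return left ^ right ^ zeros


if __name__ == "__main__":
    data = bytes(range(256)) * 4
    half = len(data) // 2
    # The combined value matches a single sequential pass over the whole input.
    print(crc32_of_concat(data[:half], data[half:]) == zlib.crc32(data))
```

The two passes touch disjoint halves of the input, which is what allows one half to go through one execution port and the other half through a different one.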

Andreas Abel@uops_info·Aug 4

@trav_downs @geofflangdale @corsix @Wunkolo There are several instructions with a writemask (such as "VPADDD (XMM, K, XMM, XMM)") that technically also read all three XMM registers. Other than that, TERNLOG indeed seems to be unique.

Travis Downs@trav_downs·Aug 4

@uops_info @geofflangdale @corsix @Wunkolo Good point, though those only arrive in VBMI2. I wonder if TERNLOG is unique in that respect in SKX-ish?

Andreas Abel@uops_info·Aug 4

@trav_downs @geofflangdale @corsix @Wunkolo There is VPSH(L/R)DV(W/D/Q)

Travis Downs@trav_downs·Aug 4

@geofflangdale @corsix @Wunkolo It *is* very nice. Are there even any other 1-latency 3-[xyz]mm input instructions out there? cc @uops_info

Andreas Abel retweeted

Geoff Langdale@geofflangdale·Jun 11

Good feature of uops.info: "URL" button in the top right corner gets you a URL that preserves the state of the table you've selected (which can be slowish to reconstruct). I was too dim to notice this! Thanks @uops_info for pointing this feature out.

Andreas Abel@uops_info·May 8

@_monoid @trav_downs Whether Zen2 actually runs this at 1 cyc/iteration depends on how xmm1 is initialized. If the previous write to xmm1 zeros the upper bits (like "vmovd xmm1, eax") it works. On the other hand, for, e.g., "vmovupd xmm1, [r14]" it runs at 9 cyc/iteration (even if [r14] contains 0).

Alexander Monakov@_monoid·May 6

@trav_downs @uops_info Have you seen discussion which CPUs manage to avoid false dependency on scalar SSE ops such as roundss that merge unmodified high bits into the result? Zen2 can, it runs this loop at 1cyc/iteration while UICA says all Intels stall: bit.ly/39HgkDr

Andreas Abel@uops_info·Mar 2

@Stanisl61420489 @InstLatX64 Already available since December at uops.info/table.html 😉 twitter.com/uops_info/stat…

Andreas Abel@uops_info

Latency, throughput, and port usage data for Alder Lake is now available at uops.info/table.html. #Intel #AlderLake (1/4)

Stanislav@Stanisl61420489·Mar 1

@InstLatX64 Golden Cove throughput/latency tables going to air soon too

InstLatX64@InstLatX64·Mar 1

#Intel released the 45th edition of the x86/x64 Software Optimization Manual with #AlderLake #GoldenCove and #Gracemont microarchitecture intel.com/content/www/us…

InstLatX64@InstLatX64

#Intel released the 44th edition of the x86/x64 Software Optimization Manual with fixed and downloadable code samples: software.intel.com/content/dam/de… GitHub: github.com/intel/optimiza…

InstLatX64@InstLatX64·Feb 8

The entire #Intel Atom fleet appeared on uops.info site! Congratulations, @uops_info!

Andreas Abel@uops_info·Feb 9

@InstLatX64 twitter.com/uops_info/stat…

Andreas Abel@uops_info

@IanCutress @BloodyTangerine AVX512 data for Alder Lake is now available at uops.info. I have also added instruction data for Tremont, Goldmont (Plus), Airmont, and Bonnell.

Andreas Abel@uops_info·Jan 26

@IanCutress @BloodyTangerine AVX512 data for Alder Lake is now available at uops.info. I have also added instruction data for Tremont, Goldmont (Plus), Airmont, and Bonnell.

𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠@IanCutress·Dec 23

@uops_info @BloodyTangerine No avx512 numbers? 😉

Andreas Abel@uops_info·Dec 23

Latency, throughput, and port usage data for Alder Lake is now available at uops.info/table.html. #Intel #AlderLake (1/4)

Andreas Abel@uops_info·Jan 22

@vic_mic_ @trav_downs @dendibakh Thanks!

Victor Michel@vic_mic_·Jan 21

@trav_downs @dendibakh @uops_info My meager contribution: github.com/andreas-abel/u…

Victor Michel@vic_mic_·Jan 21

On the Skylakes that didn't get their LSD disabled, are there documented corner cases of JCC erratum mitigation not behaving as it should? This uops.info snippet with offset 50 bit.ly/3fJnTJx has a suspiciously high count of DSB+LSD when I actually run it

Andreas Abel@uops_info·Jan 21

@trav_downs @pervognsen @gamozolabs What would be examples of things that happen only on odd or even cycles?

Travis Downs@trav_downs·Jan 21

@uops_info @pervognsen @gamozolabs Yes, but are there tricks for *all* of them? I think we don't even know all of them: there is a long tail of hidden state that starts to matter less and less, but plenty of things which happen only on odd or even cycles (so the "parity" of your start cycle matters), lots of \

Andreas Abel@uops_info·Jul 31

Today, I released uiCA, the "uops.info Code Analyzer". uiCA is based on data from uops.info, combined with a new detailed pipeline model. An online version (that also supports other tools) is available at uica.uops.info (1/3)

Andreas Abel@uops_info·Jan 21

@trav_downs @pervognsen @gamozolabs No, unfortunately, I don't have tricks for all of them.

Andreas Abel@uops_info·Jan 21

@trav_downs @gamozolabs twitter.com/uops_info/stat…

Andreas Abel@uops_info

nanoBench now has the option to perform cycle-by-cycle measurements, similar to @gamozolabs' Sushi Roll technique (twitter.com/gamozolabs/sta…) github.com/andreas-abel/n…

Travis Downs@trav_downs·Aug 3

@uops_info Awesome work! Is the extension to nanoBench which allows cycle-by-cycle measurement (so-called "Falk diagrams") available? cc @gamozolabs
