Fabian Giesen

83.6K posts

@rygorous

Abstraction maker, abstraction breaker. @rygorous@mastodon.gamedev.place he/him

Joined December 2009

91 Following · 14.8K Followers

Pinned tweet

Fabian Giesen@rygorous·18 Nov

-> mastodon.gamedev.place/@rygorous/

Fabian Giesen@rygorous·26 Dec

@Streetware_ This is comparable to, or slightly less work than, just converting a regular (u)int32 to float!

Fabian Giesen@rygorous·24 Dec

New blog post: "UNORM and SNORM to float, hardware edition" fgiesen.wordpress.com/2024/12/24/uno…
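
For reference, the API-level conversions (as in the Vulkan and D3D conversion rules) are UNORM = v / (2^n - 1) and SNORM = max(v / (2^(n-1) - 1), -1.0), so for 8-bit SNORM both -128 and -127 become -1.0. A minimal scalar sketch of the 8-bit case; the function names are hypothetical, and this is only the reference arithmetic, not the hardware approach the post describes.

```c
#include <stdint.h>

/* UNORM8 -> float: v / 255, so 0 -> 0.0f and 255 -> 1.0f exactly. */
static float unorm8_to_float(uint8_t v) {
    return (float)v / 255.0f;
}

/* SNORM8 -> float: v / 127, clamped at -1.0f, so -128 and -127 both map
 * to -1.0f and the representable range is symmetric around zero. */
static float snorm8_to_float(int8_t v) {
    float f = (float)v / 127.0f;
    return f < -1.0f ? -1.0f : f;
}
```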

Fabian Giesen@rygorous·26 Nov

We just released Oodle 2.9.13. Significantly increased BC7 encoding speed (about 20-25% encode time reduction for non-RDO on typical content, 25-30% encode time reduction for RDO) at slightly increased quality. Also several bug fixes and experimental WASM 64-bit support.

Fabian Giesen@rygorous·11 Nov

@daniel_collin On x86, same thing with PSUBW + PMULHRSW + PADDW, FWIW. (PMULHRSW is basically the same as ARM SQRDMULH, the just-multiply-not-multiply-accumulate version of SQRDMLAH.)

Daniel Collin@daniel_collin·9 Nov

My new favorite ARM Neon instruction is: sqrdmlah developer.arm.com/architectures/… It allows doing a LERP of 8 x i16 values with only two instructions (a vsub and the instruction above). Super useful for what I'm currently fiddling with :) Thanks to @rygorous for the tip!
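
A scalar model of the two variants in this exchange may help: NEON SQRDMLAH computes acc + round(b * c / 2^15) with saturation, and SSSE3 PMULHRSW computes round(a * b / 2^15), so a Q15 lerp is a + (b - a) * t either way. This is a sketch assuming t is a Q15 weight in 0..32767 and that b - a fits in int16 (for example, a and b both in 0..32767); the helper names are made up.

```c
#include <stdint.h>

static int16_t sat16(int64_t x) {
    return (int16_t)(x > INT16_MAX ? INT16_MAX : x < INT16_MIN ? INT16_MIN : x);
}

/* One 16-bit lane of NEON SQRDMLAH: acc + round(b * c / 2^15), saturated.
 * (>> on a negative value is an arithmetic shift on mainstream compilers.) */
static int16_t sqrdmlah16(int16_t acc, int16_t b, int16_t c) {
    int64_t t = (int64_t)acc * 65536 + 2 * (int64_t)b * c + 32768;
    return sat16(t >> 16);
}

/* One lane of SSSE3 PMULHRSW: round(a * b / 2^15) as ((a*b >> 14) + 1) >> 1. */
static int16_t pmulhrsw16(int16_t a, int16_t b) {
    int32_t t = (((int32_t)a * b) >> 14) + 1;
    return (int16_t)(t >> 1);  /* -32768 * -32768 wraps to -32768, like the instruction */
}

/* lerp(a, b, t) = a + (b - a) * t, with t in Q15 (0..32767 ~ 0.0..1.0).
 * Assumes b - a fits in int16. */
static int16_t lerp_neon(int16_t a, int16_t b, int16_t t) {
    int16_t d = (int16_t)(b - a);            /* SUB */
    return sqrdmlah16(a, d, t);              /* SQRDMLAH */
}

static int16_t lerp_x86(int16_t a, int16_t b, int16_t t) {
    int16_t d = (int16_t)(b - a);            /* PSUBW */
    return (int16_t)(a + pmulhrsw16(d, t));  /* PMULHRSW + PADDW */
}
```

For in-range inputs the two formulations agree: ((p >> 14) + 1) >> 1 and (2p + 2^15) >> 16 both round p / 2^15 half up.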

Fabian Giesen@rygorous·7 Nov

New blog post: "Exact UNORM8 to float" fgiesen.wordpress.com/2024/11/06/exa… a satisfying solution to a problem that, quite possibly, nobody has
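
One way to see why this has a clean answer: 1/255 in binary is 0.00000001 00000001 ..., so v/255 is the 8 bits of v repeated forever, and v * 0x0101010101010101 is that pattern truncated to 64 bits. For 0 < v < 255 the exact value is strictly less than 1 above that integer, every rounding midpoint at this magnitude is a multiple of 2^32, and the integer's low 32 bits are nonzero, so no midpoint lies between the two and both round to the same float (v = 0 and v = 255 can be checked directly). This is our own sketch, assuming IEEE round-to-nearest conversions, and not necessarily the method in the post; the names are hypothetical.

```c
#include <stdint.h>

/* Exact UNORM8 -> float without a divide: build v/255 * 2^64 (truncated)
 * by replicating v's byte, convert once, then scale by 2^-64 (exact). */
static float unorm8_to_float_exact(uint8_t v) {
    uint64_t r = (uint64_t)v * 0x0101010101010101ull;
    return (float)r * 0x1p-64f;
}

/* Exhaustive check against the correctly rounded IEEE division. */
static int unorm8_mismatches(void) {
    int bad = 0;
    for (int v = 0; v < 256; v++)
        if (unorm8_to_float_exact((uint8_t)v) != (float)v / 255.0f)
            bad++;
    return bad;
}
```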

Fabian Giesen@rygorous·4 Nov

New blog post: "BC7 optimal solid-color blocks" fgiesen.wordpress.com/2024/11/03/bc7… clearing out my "I should write this up" queue, this technique is from... *checks git logs* May 2017. Oh my. (I have quite the backlog.)

Fabian Giesen@rygorous·26 Oct

@tom_forsyth PMULHW is at 0x0f 0xe5. PMULHUW is 0x0f 0xe4. MUL and IMUL are ModR/M reg=4 and reg=5 (/4 and /5) in their group. It's possible they just blocked things out this way by coincidence, but given this and Andy's comments, I doubt it.
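
For readers decoding the opcode argument: the ModR/M byte is mod (2 bits), reg (3 bits), rm (3 bits), and for group opcodes like F7 it is the reg field, written /4 and /5 in Intel's tables, that selects MUL and IMUL; the 2-bit mod field can only hold 0..3. A small sketch of the bit fields, with hypothetical helper names; F7 E0 encodes mul eax and F7 E8 encodes imul eax.

```c
#include <stdint.h>

/* ModR/M byte: mod = bits 7..6, reg = bits 5..3, rm = bits 2..0.
 * For group opcodes such as F7 (group 3), reg selects the operation:
 * /4 = MUL, /5 = IMUL. */
static unsigned modrm_mod(uint8_t m) { return m >> 6; }
static unsigned modrm_reg(uint8_t m) { return (m >> 3) & 7; }

/* Low 3 bits of the byte after 0F: PMULHUW = 0F E4, PMULHW = 0F E5. */
static unsigned opcode_low3(uint8_t op) { return op & 7; }
```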

Fabian Giesen@rygorous·26 Oct

@tom_forsyth Because there was a mandate at the time to be "more RISC-y", which management interpreted as "fewer instructions is good". Andy Glew was still publicly salty about it 5 years later. web.stanford.edu/class/ee380/Ab…

Fabian Giesen@rygorous·26 Oct

New blog post: "Why those particular integer multiplies?" fgiesen.wordpress.com/2024/10/26/why… some explanation and some speculation on the integer SIMD multiplies offered in x86, along with some history
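
As a quick reference for what the 16-bit members of that family compute, here is a per-lane scalar sketch that uses the SSE2 instruction names as function names (our naming; real code would use the corresponding intrinsics on whole vectors). Out-of-range conversions to int16_t/int32_t are assumed to wrap, as they do on mainstream compilers.

```c
#include <stdint.h>

/* PMULLW: low 16 bits of the 32-bit product (same for signed and unsigned). */
static int16_t pmullw(int16_t a, int16_t b) { return (int16_t)((int32_t)a * b); }

/* PMULHW: high 16 bits of the signed 32-bit product. */
static int16_t pmulhw(int16_t a, int16_t b) { return (int16_t)(((int32_t)a * b) >> 16); }

/* PMULHUW: high 16 bits of the unsigned 32-bit product. */
static uint16_t pmulhuw(uint16_t a, uint16_t b) { return (uint16_t)(((uint32_t)a * b) >> 16); }

/* PMADDWD: multiply two adjacent signed pairs and add into one 32-bit lane. */
static int32_t pmaddwd(int16_t a0, int16_t a1, int16_t b0, int16_t b1) {
    uint32_t sum = (uint32_t)((int32_t)a0 * b0) + (uint32_t)((int32_t)a1 * b1);
    return (int32_t)sum;  /* only all four = -32768 wraps, to INT32_MIN */
}
```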

Fabian Giesen@rygorous·26 Oct

@corsix no, it's a completely separate unit normally

Pete Cawley@corsix·26 Oct

@rygorous Would you also try to fit pclmulqdq into the same data path? It is, after all, kind of an integer mul, just without carries.
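
For context, a carry-less multiply is schoolbook binary multiplication with XOR in place of addition (multiplication of polynomials over GF(2)). A minimal scalar sketch of what PCLMULQDQ computes for one pair of 64-bit inputs; the struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128_parts;

/* Carry-less 64x64 -> 128-bit multiply: shifted partial products of a are
 * combined with XOR instead of ADD, so nothing carries between bit positions. */
static u128_parts clmul64(uint64_t a, uint64_t b) {
    u128_parts r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i != 0)
                r.hi ^= a >> (64 - i);  /* bits of a shifted past bit 63 */
        }
    }
    return r;
}
```

For example, 3 * 3 is 9 with carries but 5 carry-less, since (x + 1)^2 = x^2 + 1 over GF(2).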

Fabian Giesen@rygorous·25 Oct

@geofflangdale It's different for every "iteration" and BC7 decode does it 1-3 times in a row. The actual decoder has this in vector regs so I don't have PDEP/PEXT to begin with.

Geoff Langdale@geofflangdale·25 Oct

@rygorous Nice! In your application, how often is the 'pos' parameter a delightful surprise that varies unpredictably per iteration? Do you ever need this twice in a row? I think this wins vs PDEP (I'm not as sure whether the "remove 0" wins vs PEXT), and is more portable, natch.

Fabian Giesen@rygorous·25 Oct

New blog post: "Inserting a 0 bit in the middle of a value" fgiesen.wordpress.com/2024/10/24/ins… I guess it's 2-for-1 bit hacks week.
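
A minimal sketch of one way to do the insertion with plain integer ops (not necessarily the formulation in the post): masking off the bits at position pos and above and adding them back to x doubles them, which shifts exactly that part left by one and leaves a 0 at pos. Subtracting half of the upper part undoes it, which is the "remove 0" direction mentioned in the replies. With BMI2, PDEP and PEXT with a mask that has only bit pos cleared do the same two jobs. Function names are hypothetical; pos is assumed to be in 0..31.

```c
#include <stdint.h>

/* Insert a 0 bit at position pos: bits below pos stay put, bits at pos
 * and above move up by one (the old bit 31 is shifted out). */
static uint32_t insert_zero_bit(uint32_t x, unsigned pos) {
    uint32_t hi = x & (~0u << pos);  /* bits at pos and above */
    return x + hi;                   /* lo + 2*hi == lo | (hi << 1) */
}

/* Inverse: remove bit pos (assumed to be 0), moving higher bits down by one. */
static uint32_t remove_zero_bit(uint32_t x, unsigned pos) {
    uint32_t hi = x & (~0u << pos);  /* bit pos is 0, so hi is even */
    return x - (hi >> 1);            /* lo + hi - hi/2 == lo | (hi >> 1) */
}
```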

Fabian Giesen@rygorous·24 Oct

New blog post: "Zero or sign extend" fgiesen.wordpress.com/2024/10/23/zer…
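
A common branch-free way to make zero- versus sign-extension a data choice rather than a separate code path: mask the field, then compute (v ^ m) - m, where m is the field's sign bit for signed fields and 0 for unsigned ones. This is a sketch of that general trick, not necessarily the approach in the post; the function name is hypothetical and bits is assumed to be in 1..32.

```c
#include <stdint.h>

/* Extend the low `bits` bits of x to 32 bits. If the sign bit m is set,
 * v ^ m clears it and "- m" subtracts its weight again, so the borrow
 * fills all higher bits; if it is clear, ^ m and - m cancel out.
 * With m = 0 the expression is a plain zero-extension. */
static uint32_t zero_or_sign_extend(uint32_t x, unsigned bits, int is_signed) {
    uint32_t field = bits == 32 ? ~0u : (1u << bits) - 1;
    uint32_t m = (uint32_t)(is_signed != 0) << (bits - 1);
    uint32_t v = x & field;
    return (v ^ m) - m;  /* bit pattern; reinterpret as int32_t when signed */
}
```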

Fabian Giesen@rygorous·12 Sep

@nothings my first association on reading that string of letters is cs.toronto.edu/~simon/html/un…

Fabian Giesen@rygorous·1 Sep

@nothings Also this one that I really like! youtube.com/watch?v=mRfSM-…

Fabian Giesen@rygorous·21 Aug

@nothings @aras_p It's already shipped in UE 5.4! dev.epicgames.com/documentation/…

Fabian Giesen@rygorous·19 Jul

@liam_whan @cmuratori Casey has me blocked so I can't even read the tweet in question. (Not that I'm really active on here anymore anyway.)

Liam Whan@liam_whan·19 Jul

@cmuratori This is cheating, but I'm curious about what @rygorous thinks...

Casey Muratori@cmuratori·17 Jul

I'd like to gauge developers' instinctive feeling about types of cores they've heard of. Without looking anything up, if you could have the computing power of a single instance of one of the following cores, which one would you pick?

Fabian Giesen@rygorous·6 Jun

@gdamjan @jonmasters It shipped in several Skylake SKUs. en.wikichip.org/wiki/intel/mic…

will over matter@gdamjan·5 Jun

@jonmasters IIRC Intel has been talking about a “Memory Side Cache” for 10 years. They tried something with the eDRAM, but I guess they then didn't follow through.

Jon Masters 🏴‍☠️@jonmasters·5 Jun

Don’t forget the split scheduler on the back end, and a “Memory Side Cache”. I know my own bingo card was full by the time they were done describing everything Apple already shipped 4 years ago.

INIYSA@lafaiel

Intel is now essentially following Apple's design philosophy, with an integrated memory architecture, a large front-end, a large L1 cache, removal of SMT, 4+4 cores

Fabian Giesen@rygorous·27 Mar

@Simon_Fe1 @tom_forsyth @FreyaHolmer I was talking about Booth encoding in a regular multiplier (you never Booth encode both operands). I'm pretty sure squarers don't Booth encode at all, yes.

Freya Holmér@FreyaHolmer·26 Mar

lordie christ I just want to find *any* results on squaring floating point numbers in IEEE-754, apart from a single math paper the only thing search engines want to find is floating point square roots and the fast inverse square root 💀

Fabian Giesen@rygorous·26 Mar

@tom_forsyth @FreyaHolmer The main application I'm aware of is "High-Speed Function Approximation Using a Minimax Quadratic Interpolator" by Piñeiro, Oberman, Muller and Bruguera. (That was the internals of NVidia GPU SFUs at some point; I think their current SFUs are still descended from it.)
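
Roughly, that family of designs splits the input into x1 (the top bits, used as a table index) and x2 (the remaining bits) and evaluates C0 + C1*x2 + C2*x2^2 from per-segment coefficients, which is where a dedicated squarer for x2^2 comes in. The sketch below only illustrates that structure in floating point, with hypothetical choices throughout: f(x) = 1/x on [1, 2), 64 segments, and Taylor coefficients at the segment start instead of minimax ones (the real designs also use fixed-point datapaths).

```c
/* Hypothetical parameters: 2^6 segments over [1, 2). */
enum { IDX_BITS = 6, SEGMENTS = 1 << IDX_BITS };

static double C0[SEGMENTS], C1[SEGMENTS], C2[SEGMENTS];

/* Taylor coefficients of f(x) = 1/x around each segment start x1. */
static void build_tables(void) {
    for (int i = 0; i < SEGMENTS; i++) {
        double x1 = 1.0 + (double)i / SEGMENTS;  /* segment start */
        C0[i] = 1.0 / x1;                        /* f(x1) */
        C1[i] = -1.0 / (x1 * x1);                /* f'(x1) */
        C2[i] = 1.0 / (x1 * x1 * x1);            /* f''(x1) / 2 */
    }
}

/* x in [1, 2): the top IDX_BITS fraction bits pick the segment (x1), the
 * rest form x2 = x - x1, and 1/x ~ C0 + C1*x2 + C2*x2^2. The Taylor
 * remainder here is at most x2^3 <= 2^-18. */
static double recip_approx(double x) {
    int i = (int)((x - 1.0) * SEGMENTS);
    double x2 = x - (1.0 + (double)i / SEGMENTS);
    return C0[i] + C1[i] * x2 + C2[i] * (x2 * x2);  /* x2*x2: the squarer's job */
}
```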

Fabian Giesen@rygorous·26 Mar

@tom_forsyth @FreyaHolmer Squarers are mostly a thing in special function units for polynomial eval. You always only Booth encode one of the arguments, the other is left alone, so that doesn't save anything, but IIRC there are some shortcuts you can do for squaring.
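
To make "Booth encode only one argument" concrete: radix-4 (modified) Booth recoding rewrites one operand y as digits d_i in -2..2, so the multiplier sums half as many partial products, each just the other operand x shifted and possibly negated; x itself is used as-is. A scalar sketch for 16-bit signed operands, with made-up function names; real multipliers do the summation in hardware (e.g. a compressor tree) rather than with a loop.

```c
#include <stdint.h>

/* Radix-4 Booth digit i of the 16-bit two's-complement value y:
 * d_i = -2*y[2i+1] + y[2i] + y[2i-1] (with y[-1] = 0), so d_i is in -2..2
 * and y == sum over i = 0..7 of d_i * 4^i. */
static int booth_digit(int16_t y, int i) {
    uint32_t u = (uint16_t)y;  /* raw bit pattern */
    int b_hi  = (int)((u >> (2 * i + 1)) & 1);
    int b_mid = (int)((u >> (2 * i)) & 1);
    int b_lo  = i > 0 ? (int)((u >> (2 * i - 1)) & 1) : 0;
    return -2 * b_hi + b_mid + b_lo;
}

/* Multiply by recoding only y: 8 partial products, each x times a digit
 * in -2..2 (a shift and/or negation of x in hardware), scaled by 4^i. */
static int32_t booth_multiply(int16_t x, int16_t y) {
    int32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc += booth_digit(y, i) * (int32_t)x * (1 << (2 * i));
    return acc;
}
```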
