Fabian Giesen

83.6K posts

@rygorous

Abstraction maker, abstraction breaker. @rygorous@mastodon.gamedev.place he/him

Joined December 2009
91 Following 14.8K Followers
Pinned Tweet
Fabian Giesen@rygorous·
-> mastodon.gamedev.place/@rygorous/
Fabian Giesen@rygorous·
@Streetware_ This is a comparable to slightly smaller amount of work than just converting a regular (u)int32 to float!
Fabian Giesen@rygorous·
We just released Oodle 2.9.13. Significantly increased BC7 encoding speed (about 20-25% encode time reduction for non-RDO on typical content, 25-30% encode time reduction for RDO) at slightly increased quality. Also several bug fixes and experimental WASM 64-bit support.
Fabian Giesen@rygorous·
@daniel_collin On x86, same thing with PSUBW + PMULHRSW + PADDW, FWIW. (PMULHRSW is basically the same as ARM SQRDMULH, the just-multiply-not-multiply-accumulate version of SQRDMLAH.)
Daniel Collin@daniel_collin·
My new favorite ARM Neon instruction is: sqrdmlah developer.arm.com/architectures/… It lets you do a LERP of 8 x i16 values with only two instructions (a vsub and the instruction above). Super useful for what I'm currently fiddling with :) Thanks to @rygorous for the tip!
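A minimal scalar sketch of the LERP discussed in this thread, modelling one 16-bit lane of the x86 sequence PSUBW + PMULHRSW + PADDW. The function names (`wrap16`, `pmulhrsw`, `lerp_q15`) are hypothetical, and the sketch assumes the blend factor `t` is a Q15 value in [0, 32767] and that `b - a` fits in a signed 16-bit lane. The x86 adds and subtracts wrap, whereas SQRDMLAH saturates, so the two only agree when nothing overflows.

```python
def wrap16(x):
    """Wrap an integer to a signed 16-bit lane, as PSUBW/PADDW do."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000


def pmulhrsw(a, b):
    """Scalar model of one PMULHRSW lane: (a * b) / 2**15, rounded."""
    return wrap16(((a * b >> 14) + 1) >> 1)


def lerp_q15(a, b, t):
    """Compute a + (b - a) * t / 32768 with rounding, using three lane ops."""
    diff = wrap16(b - a)        # PSUBW
    scaled = pmulhrsw(diff, t)  # PMULHRSW
    return wrap16(a + scaled)   # PADDW
```

For example, `lerp_q15(0, 1000, 16384)` blends halfway between 0 and 1000. On ARM the subtract is a separate instruction and SQRDMLAH fuses the rounded multiply with the final add.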
Fabian Giesen@rygorous·
New blog post: "BC7 optimal solid-color blocks" fgiesen.wordpress.com/2024/11/03/bc7… clearing out my "I should write this up" queue, this technique is from... *checks git logs* May 2017. Oh my. (I have quite the backlog.)
Fabian Giesen@rygorous·
@tom_forsyth PMULHW is at 0x0f 0xe5. PMULHUW is 0x0f 0xe4. MUL and IMUL are ModR/M reg=4 and reg=5 (/4 and /5) in their group. It's possible they just blocked out things this way by coincidence, but given this and Andy's comments, I doubt it.
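The encoding pattern in this tweet can be checked with a small sketch. The group opcode for MUL and IMUL is 0xF7, and the /4 and /5 opcode extension lives in the reg field (bits 5:3) of the ModR/M byte. The helper `modrm_fields` is a hypothetical name; the byte sequences are the standard encodings of `mul eax` and `imul eax`.

```python
def modrm_fields(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 3, (byte >> 3) & 7, byte & 7


MUL_EAX = bytes([0xF7, 0xE0])   # mul eax  -> F7 /4
IMUL_EAX = bytes([0xF7, 0xE8])  # imul eax -> F7 /5

PMULHUW_OPCODE = 0xE4  # second byte of 0F E4 (unsigned high multiply)
PMULHW_OPCODE = 0xE5   # second byte of 0F E5 (signed high multiply)

# In both pairs the low bit of the selector is 0 for unsigned, 1 for signed.
mul_reg = modrm_fields(MUL_EAX[1])[1]
imul_reg = modrm_fields(IMUL_EAX[1])[1]
```

Here `mul_reg` is 4 and `imul_reg` is 5, so the signed variant sets the low bit in the same way that PMULHW does relative to PMULHUW.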
Fabian Giesen@rygorous·
@corsix no, it's a completely separate unit normally
Pete Cawley@corsix·
@rygorous Would you also try to fit pclmulqdq in to the same data path? It is after all kind of an integer mul, just without carries.
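To illustrate what "an integer mul, just without carries" means, here is a minimal sketch of a carry-less multiply on plain Python integers. The name `clmul` is hypothetical; this is a bit-serial model of the operation, not how the hardware unit is built.

```python
def clmul(a, b):
    """Carry-less multiply: XOR together shifted copies of a, one per set bit of b."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift  # XOR instead of ADD, so no carries propagate
        b >>= 1
        shift += 1
    return result
```

For instance `clmul(3, 3)` gives 5, whereas the ordinary product is 9: the partial products are the same, and only the way they are summed differs.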
Fabian Giesen@rygorous·
@geofflangdale It's different for every "iteration" and BC7 decode does it 1-3 times in a row. The actual decoder has this in vector regs so I don't have PDEP/PEXT to begin with.
Geoff Langdale@geofflangdale·
@rygorous Nice! In your application, how often is the 'pos' parameter a delightful surprise that varies unpredictably per iteration? Do you ever need this twice in a row? I think this wins vs PDEP (I'm not as sure whether the "remove 0" wins vs PEXT), and is more portable, natch.
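The post this thread replies to is not part of this feed, so the following is only a guess at the kind of operation being compared with PDEP and PEXT: inserting a 0 bit at position `pos`, and the inverse "remove" step. The function names are hypothetical, and the add-based trick shown is one well-known way to do the insert, not necessarily the technique from the original post.

```python
def insert_zero_bit(x, pos):
    """Insert a 0 bit at bit position pos, shifting higher bits up by one."""
    high = x & (-1 << pos)  # bits at position pos and above
    return x + high         # adding the high part to itself doubles (shifts) it


def remove_bit(x, pos):
    """Drop the bit at position pos, shifting higher bits down by one."""
    low = x & ((1 << pos) - 1)
    return low | ((x >> (pos + 1)) << pos)
```

The insert needs only a shift to build the mask, an AND and an ADD, which works equally well in vector registers where PDEP and PEXT are not available.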
Fabian Giesen@rygorous·
@liam_whan @cmuratori Casey has me blocked so I can't even read the tweet in question. (Not that I'm really active on here anymore anyway.)
Casey Muratori@cmuratori·
I'd like to gauge developers' instinctive feeling about types of cores they've heard of. Without looking anything up, if you could have the computing power of a single instance of one of the following cores, which one would you pick?
will over matter@gdamjan·
@jonmasters IIRC Intel has been talking about “Memory Side Cache” since 10 years ago. They tried something with the eDRAM, but I guess then didn't follow thru
Fabian Giesen@rygorous·
@Simon_Fe1 @tom_forsyth @FreyaHolmer I was talking about Booth encoding in a regular multiplier (you never Booth encode both operands). I'm pretty sure squarers don't Booth encode at all, yes.
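A minimal sketch of radix-4 Booth recoding, to show why only one operand gets encoded. The multiplier `y` is recoded into digits in {-2, -1, 0, 1, 2}, halving the number of partial products, while the multiplicand `x` is used as-is. The function names and the 16-bit default width are assumptions for illustration; `y` must fit in a signed `bits`-wide value.

```python
def booth_radix4_digits(y, bits=16):
    """Recode a signed bits-wide multiplier into radix-4 Booth digits."""
    assert bits % 2 == 0
    y &= (1 << bits) - 1  # two's complement bit pattern of y
    digits = []
    prev = 0              # implicit bit below bit 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # digit in {-2, -1, 0, 1, 2}
        prev = b1
    return digits


def booth_multiply(x, y, bits=16):
    """Sum one partial product (digit * x) per radix-4 digit of y."""
    digits = booth_radix4_digits(y, bits)
    return sum((d * x) << (2 * i) for i, d in enumerate(digits))
```

Each partial product is just 0, ±x or ±2x, shifted into place, so the hardware needs only shifts and negation on `x`. Recoding `x` as well would not reduce the number of partial products any further.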
Freya Holmér@FreyaHolmer·
lordie christ I just want to find *any* results on squaring floating point numbers in IEEE-754, apart from a single math paper the only thing search engines want to find is floating point square roots and the fast inverse square root 💀
Fabian Giesen@rygorous·
@tom_forsyth @FreyaHolmer The main application I'm aware of is "High-Speed Function Approximation Using a Minimax Quadratic Interpolator" by Piñeiro, Oberman, Muller and Bruguera. (Internals of NVidia GPU SFUs at some point, I think their current SFUs are still descended from this.)
Fabian Giesen@rygorous·
@tom_forsyth @FreyaHolmer Squarers are mostly a thing in special function units for polynomial eval. You always only Booth encode one of the arguments, the other is left alone, so that doesn't save anything, but IIRC there are some shortcuts you can do for squaring.