MichalDrobot (@MichalDrobot)
69 posts
Technology Fellow @InfinityWard | Personal stuff, private opinions. Studio Head @InfinityWardPL
Venice, CA / Krakow, PL · Joined February 2023
181 Following · 521 Followers
MichalDrobot (@MichalDrobot):
@SebAaltonen @jamonholmgren That's what we do too. Fixed-point position representation and a zero world-space origin for shaders. Been like that for almost a decade and it still suffices.
Sebastian Aaltonen (@SebAaltonen):
In a custom engine, I could personally choose 32-bit fixed point. If we have 1048km * 1048km world size (height = 1048km too). We have uniform density of 4096 units in one meter (~0.2mm). Same storage and CPU cost as 32-bit floating point. But you need to make positions (3xuint32) and vectors (3xfloat32) separate types. It has engine-wide implications.
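Sebastian's layout can be sketched roughly like this (the type and function names are invented for illustration, not from any actual engine):

```cpp
#include <cassert>
#include <cstdint>

// 4096 fixed-point units per meter gives 2^32 / 4096 = 1,048,576 m (~1048 km)
// of world extent per axis, with uniform ~0.24 mm precision everywhere.
constexpr int64_t kUnitsPerMeter = 4096;

struct WorldPos { uint32_t x, y, z; };  // absolute positions: 3x uint32
struct WorldVec { float    x, y, z; };  // relative vectors:   3x float32

// Subtracting two positions yields a float vector. The local difference is
// small, so float32 precision is fine even though the absolute coordinates
// span the whole world.
inline WorldVec sub(const WorldPos& a, const WorldPos& b) {
    return {
        float(int64_t(a.x) - int64_t(b.x)) / float(kUnitsPerMeter),
        float(int64_t(a.y) - int64_t(b.y)) / float(kUnitsPerMeter),
        float(int64_t(a.z) - int64_t(b.z)) / float(kUnitsPerMeter),
    };
}
```

Keeping `WorldPos` and `WorldVec` as distinct types is the engine-wide implication he mentions: the compiler then rejects any code that accidentally mixes absolute and relative coordinates.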
Jamon (@jamonholmgren):
Devs are … fun. Every reply is either: 1. You should do origin shifting (the thing I literally say I’m doing in the second paragraph) 2. Wow what a dummy just use 64 bit floats / integers (sigh…) 3. Wow that’s actually hilarious, leave it in (best replies)
Quoting Jamon (@jamonholmgren):
Floating point precision gets pretty bad at 500 kilometers from the origin point 😅 Working on an origin shifting system, but it's slow going, given my terrain system doesn't support it and I'm having to do a bunch of C++ wrangling, compiling, reloading ... not the fastest feedback cycle

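A quick way to sanity-check the "500 kilometers" observation is to measure the float32 ULP (spacing between adjacent representable values) at that distance; the helper name here is made up:

```cpp
#include <cassert>
#include <cmath>

// One ULP of float32 at x: the gap to the next representable value.
// 500,000 lies in [2^18, 2^19), so the ULP there is 2^(18-23) = 0.03125 m:
// world-space positions 500 km from the origin can only move in ~3 cm steps,
// which is plenty visible as jitter on a character or camera.
inline float ulpAt(float x) {
    return std::nextafterf(x, INFINITY) - x;
}
```

Origin shifting works precisely because it keeps the interesting geometry near zero, where the ULP is tiny again.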
MichalDrobot (@MichalDrobot):
@terekhov_de Just reverse the face culling and rasterize the mesh interior when you detect the camera is inside the shape. That's a trivial check on the GPU and more expensive on the CPU, as you need to run it against all triangles - but only for lights whose bbox intersects the camera origin.
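A minimal sketch of the check described above, with invented names; in a real renderer the flipped mode would be applied via the rasterizer state on the light's proxy draw:

```cpp
#include <cassert>

// When the camera sits inside a light's proxy volume, back faces are the
// only ones still visible, so flip to front-face culling and rasterize the
// mesh interior instead.
enum CullMode { CullBackFaces, CullFrontFaces };

struct AABB { float min[3], max[3]; };

inline bool contains(const AABB& b, const float p[3]) {
    for (int i = 0; i < 3; ++i)
        if (p[i] < b.min[i] || p[i] > b.max[i]) return false;
    return true;
}

// Cheap per-light CPU test: only lights whose bbox contains the camera
// origin need the flipped mode; everything else keeps back-face culling.
inline CullMode cullModeForLight(const AABB& lightBounds, const float camPos[3]) {
    return contains(lightBounds, camPos) ? CullFrontFaces : CullBackFaces;
}
```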
Dmitrii Terekhov (@terekhov_de):
@MichalDrobot I'm also curious how you classified inner vs outer lights for a concave light proxy? Did you run a point-inside-polyhedron test? Did you subdivide the mesh into multiple convex polyhedra? Or did you just use an analytical shape?
Dmitrii Terekhov (@terekhov_de):
@MichalDrobot Hello! I'm implementing tiled shading based on your amazing presentation. I'm curious how did you guys compensate for PCF kernel and shadow map precision in light binning?
MichalDrobot (@MichalDrobot):
@terekhov_de
- proxy shapes have the max penumbra baked into their size. Because it's a mesh and the penumbra depends on distance from the source, this doesn't bloat the proxy too much
- we use MSAA instead of conservative rasterization. If that is not enough (due to hw) we would move vertex positions in the VS
MichalDrobot (@MichalDrobot):
@FilmicWorlds With the caveat that we do 4xMSAA - 8xMSAA generates too many sub-pixel quads and not everything uses quad-less rendering
MichalDrobot (@MichalDrobot):
@FilmicWorlds That's how COD resolves the upsampled image when all sub-samples are available via SW VRS - with the caveat that we also output geometric depth/normal at full MSAA rate and use that to guide reconstruction.
MichalDrobot (@MichalDrobot):
@KostasAAA And where is the second part, where you forced the compiler to coalesce those reads while retaining a low VGPR count? ^^
Kostas Anagnostou (@KostasAAA):
While the long branch unnecessarily increased VGPRs, the compiler took advantage of it to batch many tex reads up front and cache the results in VGPRs. Without the inactive branch it serialised access issuing a tex load, wait, use result and then issue another, to save VGPRs. 2/2
Kostas Anagnostou (@KostasAAA):
I like sharing this story as a warning that the result of optimisation may defy expectation and that one should always profile any change: Once we removed a long, inactive branch to reduce VGPR allocation/increase occupancy. Shader became unexpectedly slow, mem latency bound. 1/2
MichalDrobot (@MichalDrobot):
@NOTimothyLottes Yeah not disagreeing. Specific problem specific solution. I was just pointing out hardships with portable MSAA solution (MSAA quirks aside. Gotta tell new people about sv_coverage ;> )
NOTimothyLottes (@NOTimothyLottes):
@MichalDrobot At that stage I'd have transitioned to direct final view reconstruction from object space domain. Walking object space, atomicMin(MSB{z,ref}LSB) at some good enough density into the final projected space, then use a neighborhood of 'refs' to get back to object space neighborhoods
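The atomicMin(MSB{z,ref}LSB) packing above can be sketched in scalar C++ like this (names invented; a real implementation would use a 64-bit image atomic in the shader):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Depth goes in the high 32 bits, a payload "ref" id in the low 32 bits,
// so a single 64-bit atomicMin keeps the nearest sample and its ref in one
// operation. Positive IEEE-754 floats order the same way as their bit
// patterns, so the unsigned comparison matches the depth comparison.
inline uint64_t packZRef(float z, uint32_t ref) {
    uint32_t zbits;
    std::memcpy(&zbits, &z, sizeof zbits);   // assumes z >= 0
    return (uint64_t(zbits) << 32) | ref;
}

inline uint32_t unpackRef(uint64_t packed) {
    return uint32_t(packed & 0xFFFFFFFFu);
}
```

Here `std::min` stands in for the GPU atomic: after all writers race, the surviving packed value is the one with the smallest depth, and its low bits point back at the object-space neighborhood that produced it.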
NOTimothyLottes (@NOTimothyLottes):
So have we (the industry) really ever gone back to the MSAA-based TAA combinations and applied everything we learned in the past decade ...?
MichalDrobot (@MichalDrobot):
@NOTimothyLottes Then you need to invest heavily in a visbuffer for "simple draws" and pay a setup price to compensate for those deficiencies - an involved pipeline that is not portable. I can totally see your plan working for specific art styles or more retro gaming. There is a huge indie market for this
NOTimothyLottes (@NOTimothyLottes):
@MichalDrobot I see, yes once authoring pushes tri/pix density then what makes sense changes. I was thinking more of the way things had been authored in the relief mapping era.
MichalDrobot (@MichalDrobot):
@NOTimothyLottes That's pretty much exactly the mindset, and what ended up shipping in Killzone Shadow Fall (4xMSAA -> 2x scaling + TAA) and in Far Cry 4 (8xMSAA FMask only + analytical resolve + TAA). Doing sub-samples in FC4 was challenging due to all that alpha-tested foliage.
NOTimothyLottes (@NOTimothyLottes):
@MichalDrobot My mindset is more tuned towards rendering without minSampleShading and using the temporal component to get the shading resolution up since base frame rate will be high anyway. So no soft VRS in my line of thinking either (other than whatever falls out of 8xMSAA render)
MichalDrobot (@MichalDrobot):
@NOTimothyLottes If low-end hw is decent with MSAA AND you can control triangle density effectively to combat quad-occupancy loss (we can't) - that's the best there is imo. FWIW that's why we use stencil even for geo edges with MSAA - because we get too many interior edges that don't do anything
NOTimothyLottes (@NOTimothyLottes):
@MichalDrobot I guess the question then is if you implement variable spatial scaling in the custom resolve, is the 8xMSAA loss something that can easily be covered by increased spatial scale on the low end HW? 8x area scaling at 8xMSAA is still likely good for 3xAA on edges at spatial-only
MichalDrobot (@MichalDrobot):
@NOTimothyLottes But sure, I agree that 8x is the way to go for 4K. Our challenge is that we still have gen8 with a 1k target, and at those resolutions 8x scales really sub-linearly due to quad occupancy
MichalDrobot (@MichalDrobot):
@NOTimothyLottes 1) it's 4xMSAA but we DRS only on the X axis, which matches your case I believe. Similar to how it was done in Killzone Shadow Fall. With 8xMSAA we could explore XY scaling 2) depends. On PC it's flat w/ DCC. On gen8 it's FMask packed. Gen9 is FMask with DCC plane 0/1
MichalDrobot (@MichalDrobot):
@adamjmiles @NOTimothyLottes Maybe 1000Hz is a bit unnecessary - but looking at those numbers it's 25%-100% of your 120Hz title budget gone. Aside from frame gen not being great for competitive shooters, TAA needs to be really fast there
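The budget arithmetic behind this reply, with placeholder pass costs rather than AMD's actual published figures:

```cpp
#include <cassert>

// A frame at refresh rate hz has 1000/hz milliseconds of total budget.
constexpr double frameBudgetMs(double hz) { return 1000.0 / hz; }

// Fraction of that budget a single pass of passCostMs consumes.
constexpr double budgetFraction(double passCostMs, double hz) {
    return passCostMs / frameBudgetMs(hz);
}
```

At 120 Hz the whole frame is ~8.33 ms, so a (hypothetical) 2-8 ms upscale + frame-gen pass is indeed 25%-100% of the budget; at 1000 Hz the entire budget is 1 ms, which is why a multi-millisecond post chain can never fit.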
NOTimothyLottes (@NOTimothyLottes):
At least AMD is honest about their runtime costs: gpuopen.com/fidelityfx-sup… Frame-gen naturally costs more than the scaling-TAA. Obvious from these numbers that the industry "standard" scaling-TAA + frame-gen isn't going to be a good solution for 1000Hz even on a 7900 XTX
[image: FSR performance figures]
MichalDrobot (@MichalDrobot):
@NOTimothyLottes Fully agree. Pains me to see TFLOPs going to waste for arguably minuscule quality "improvements" - if we can even call laggy hallucinated images that. Also I have a feeling that the industry is taking a step back in terms of engineering and research - dumping it all into big data
NOTimothyLottes (@NOTimothyLottes):
People keep claiming non-AI is dead with regards to visual improvements, well that is 100% bogus. No one has yet scratched the surface of what is possible.
NOTimothyLottes (@NOTimothyLottes):
__/ HOW TO MAKE A "GOOD" NON-ML SCALING-TAA \__ ML people specialize in understanding what training data to feed a fixed network, likely instead of understanding how the network is actually solving the problem. Alternative: understand the problem and write a shader solution ...
MichalDrobot (@MichalDrobot):
@dark1x @NOTimothyLottes Sure that is true, but that area is fairly limited unless you just use processing power to compensate for deficiencies elsewhere. I am just not a fan of “let’s PT everything” and there are some good lessons learnt in offline rendering where baking and approximating is doing great
NOTimothyLottes (@NOTimothyLottes):
__/ NVIDIA CES 2025 KEYNOTE THOUGHTS \__ HAVE TO ASK: There are 70 PS4s' worth of shader TFLOPs in a 5090 - why do you need to artificially generate frames? PS4 as in graphics like Uncharted 4 below.
[image: Uncharted 4 in-game photo, credit Ray Soemarsono]
MichalDrobot (@MichalDrobot):
@dark1x @NOTimothyLottes If there is no benefit to the player or to gameplay, then there is no reason to waste 100x more processing power than baking. Better use it elsewhere.
John Linneman (@dark1x):
@NOTimothyLottes Well, in that specific case, it would be nice if the lighting in that scene could be reproduced in full real-time rather than baked... but I think that is the goal with RT and the like. Still, that does look amazing even now.
MichalDrobot (@MichalDrobot):
@_mamoniem @4rknova @wojtsterna Considering it barely saves anything on modern GPUs, I would strongly advise against it in anything that requires precision (which is probably important here). Correlating denominators is a great way to get unpredictable results and will eat into ULP precision super fast.
mamoniem (@_mamoniem):
Cool observation by Wojtek Sterna
[image: screenshot of Wojtek Sterna's observation]
MichalDrobot (@MichalDrobot):
@SebAaltonen hah yeah I know : ) I was actually surprised you want to add that specialized depth downsample for it - so maybe I just went ahead a bit too far. I think your plan is a solid one for those specs and expectations. As always curious to see how it pans out.
Sebastian Aaltonen (@SebAaltonen):
New idea for Z-downsample for SSAO: Instead of 2x2 min or max, we calculate average of them and pick the sample that's closest. Solves: Edge 1 vs 3 outlier case. Picks the most common surface. Slope = pick middle (best for bilateral upscale). Thoughts?
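Sebastian's idea, written out as a scalar sketch (function name invented):

```cpp
#include <cassert>
#include <cmath>

// Z-downsample for SSAO: average the 2x2 depths, then return the actual
// sample closest to that average. An outlier in a 1-vs-3 edge case only
// drags the average a little, so the majority surface wins; on a smooth
// slope the average sits mid-way, so a middle sample gets picked, which is
// the best anchor for a bilateral upscale.
inline float downsampleDepth2x2(const float d[4]) {
    float avg  = 0.25f * (d[0] + d[1] + d[2] + d[3]);
    float best = d[0];
    for (int i = 1; i < 4; ++i)
        if (std::fabs(d[i] - avg) < std::fabs(best - avg))
            best = d[i];
    return best;
}
```

Unlike a plain min or max, the result is always one of the real input depths, so the downsampled buffer never invents a surface that isn't there.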