MichalDrobot (@MichalDrobot)
69 posts
Technology Fellow @InfinityWard | Personal stuff, private opinions. Studio Head @InfinityWardPL
Venice, CA / Krakow, PL · Joined February 2023
181 Following · 521 Followers
MichalDrobot (@MichalDrobot):
@SebAaltonen @jamonholmgren That's what we do too. Fixed-point position representation and a zero world-space origin for shaders. Been like that for almost a decade and it still suffices.
Sebastian Aaltonen (@SebAaltonen):
In a custom engine, I could personally choose 32-bit fixed point. If we have 1048km * 1048km world size (height = 1048km too). We have uniform density of 4096 units in one meter (~0.2mm). Same storage and CPU cost as 32-bit floating point. But you need to make positions (3xuint32) and vectors (3xfloat32) separate types. It has engine-wide implications.
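Sebastian's layout can be sketched roughly like this (the type and function names are invented for illustration, not from any actual engine):

```cpp
#include <cassert>
#include <cstdint>

// 4096 fixed-point units per meter gives 2^32 / 4096 = 1,048,576 m (~1048 km)
// of world extent per axis, with uniform ~0.24 mm precision everywhere.
constexpr int64_t kUnitsPerMeter = 4096;

struct WorldPos { uint32_t x, y, z; };  // absolute positions: 3x uint32
struct WorldVec { float    x, y, z; };  // relative vectors:   3x float32

// Subtracting two positions yields a float vector. The local difference is
// small, so float32 precision is fine even though the absolute coordinates
// span the whole world.
inline WorldVec sub(const WorldPos& a, const WorldPos& b) {
    return {
        float(int64_t(a.x) - int64_t(b.x)) / float(kUnitsPerMeter),
        float(int64_t(a.y) - int64_t(b.y)) / float(kUnitsPerMeter),
        float(int64_t(a.z) - int64_t(b.z)) / float(kUnitsPerMeter),
    };
}
```

Keeping `WorldPos` and `WorldVec` as distinct types is the engine-wide implication he mentions: the compiler then rejects any code that accidentally mixes absolute and relative coordinates.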
Jamon (@jamonholmgren):
Devs are … fun. Every reply is either: 1. You should do origin shifting (the thing I literally say I’m doing in the second paragraph) 2. Wow what a dummy just use 64 bit floats / integers (sigh…) 3. Wow that’s actually hilarious, leave it in (best replies)
Quoting Jamon (@jamonholmgren):
Floating point precision gets pretty bad at 500 kilometers from the origin point 😅 Working on an origin shifting system, but it's slow going, given my terrain system doesn't support it and I'm having to do a bunch of C++ wrangling, compiling, reloading ... not the fastest feedback cycle

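A quick way to sanity-check the "500 kilometers" observation is to measure the float32 ULP (spacing between adjacent representable values) at that distance; the helper name here is made up:

```cpp
#include <cassert>
#include <cmath>

// One ULP of float32 at x: the gap to the next representable value.
// 500,000 lies in [2^18, 2^19), so the ULP there is 2^(18-23) = 0.03125 m:
// world-space positions 500 km from the origin can only move in ~3 cm steps,
// which is plenty visible as jitter on a character or camera.
inline float ulpAt(float x) {
    return std::nextafterf(x, INFINITY) - x;
}
```

Origin shifting works precisely because it keeps the interesting geometry near zero, where the ULP is tiny again.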
MichalDrobot (@MichalDrobot):
@terekhov_de Just reverse the face culling and rasterize the mesh interior when you detect the camera is inside the shape. That's a trivial check on the GPU and more expensive on the CPU, as you need to run it against all triangles - but only for lights whose bbox intersects the camera origin.
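A minimal sketch of the check described above, with invented names; in a real renderer the flipped mode would be applied via the rasterizer state on the light's proxy draw:

```cpp
#include <cassert>

// When the camera sits inside a light's proxy volume, back faces are the
// only ones still visible, so flip to front-face culling and rasterize the
// mesh interior instead.
enum CullMode { CullBackFaces, CullFrontFaces };

struct AABB { float min[3], max[3]; };

inline bool contains(const AABB& b, const float p[3]) {
    for (int i = 0; i < 3; ++i)
        if (p[i] < b.min[i] || p[i] > b.max[i]) return false;
    return true;
}

// Cheap per-light CPU test: only lights whose bbox contains the camera
// origin need the flipped mode; everything else keeps back-face culling.
inline CullMode cullModeForLight(const AABB& lightBounds, const float camPos[3]) {
    return contains(lightBounds, camPos) ? CullFrontFaces : CullBackFaces;
}
```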
Dmitrii Terekhov (@terekhov_de):
@MichalDrobot I'm also curious how you classified inner vs outer lights for a concave light proxy? Did you run a point-inside-polyhedron test? Did you subdivide the mesh into multiple convex polyhedra? Or did you just use an analytical shape?
Dmitrii Terekhov (@terekhov_de):
@MichalDrobot Hello! I'm implementing tiled shading based on your amazing presentation. I'm curious how did you guys compensate for PCF kernel and shadow map precision in light binning?
MichalDrobot (@MichalDrobot):
@terekhov_de
- proxy shapes have the max penumbra baked into their size. Because it's a mesh and the penumbra depends on distance from the source, this doesn't bloat the proxy too much
- we use MSAA instead of conservative rasterization. If that is not enough (due to hw) we would move vertex positions in the VS
MichalDrobot (@MichalDrobot):
@FilmicWorlds With the caveat that we do 4xMSAA - 8xMSAA generates too many sub-pixel quads and not everything uses quad-less rendering
MichalDrobot (@MichalDrobot):
@FilmicWorlds That's how COD resolves the upsampled image when all sub-samples are available via SW VRS - with the caveat that we also output geometric depth/normal at full MSAA rate and use that to guide reconstruction.
MichalDrobot (@MichalDrobot):
@KostasAAA And where is the second part, where you forced the compiler to coalesce those reads while retaining a low VGPR count? ^^
Kostas Anagnostou (@KostasAAA):
While the long branch unnecessarily increased VGPRs, the compiler took advantage of it to batch many tex reads up front and cache the results in VGPRs. Without the inactive branch it serialised access issuing a tex load, wait, use result and then issue another, to save VGPRs. 2/2
Kostas Anagnostou (@KostasAAA):
I like sharing this story as a warning that the result of optimisation may defy expectation and that one should always profile any change: Once we removed a long, inactive branch to reduce VGPR allocation/increase occupancy. Shader became unexpectedly slow, mem latency bound. 1/2
MichalDrobot (@MichalDrobot):
@NOTimothyLottes Yeah not disagreeing. Specific problem specific solution. I was just pointing out hardships with portable MSAA solution (MSAA quirks aside. Gotta tell new people about sv_coverage ;> )
NOTimothyLottes (@NOTimothyLottes):
@MichalDrobot At that stage I'd have transitioned to direct final view reconstruction from object space domain. Walking object space, atomicMin(MSB{z,ref}LSB) at some good enough density into the final projected space, then use a neighborhood of 'refs' to get back to object space neighborhoods
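The atomicMin(MSB{z,ref}LSB) packing above can be sketched in scalar C++ like this (names invented; a real implementation would use a 64-bit image atomic in the shader):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Depth goes in the high 32 bits, a payload "ref" id in the low 32 bits,
// so a single 64-bit atomicMin keeps the nearest sample and its ref in one
// operation. Positive IEEE-754 floats order the same way as their bit
// patterns, so the unsigned comparison matches the depth comparison.
inline uint64_t packZRef(float z, uint32_t ref) {
    uint32_t zbits;
    std::memcpy(&zbits, &z, sizeof zbits);   // assumes z >= 0
    return (uint64_t(zbits) << 32) | ref;
}

inline uint32_t unpackRef(uint64_t packed) {
    return uint32_t(packed & 0xFFFFFFFFu);
}
```

Here `std::min` stands in for the GPU atomic: after all writers race, the surviving packed value is the one with the smallest depth, and its low bits point back at the object-space neighborhood that produced it.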
NOTimothyLottes (@NOTimothyLottes):
So have we (the industry) really ever gone back to the MSAA-based TAA combinations and applied everything we learned in the past decade ...?
MichalDrobot (@MichalDrobot):
@NOTimothyLottes Then you need to invest heavily in a visbuffer for "simple draws" and pay a setup price to compensate for those deficiencies - an involved pipeline that is not portable. I can totally see your plan working for specific art styles or more retro gaming. There is a huge indie market for this
NOTimothyLottes (@NOTimothyLottes):
@MichalDrobot I see, yes once authoring pushes tri/pix density then what makes sense changes. I was thinking more of the way things had been authored in the relief mapping era.
MichalDrobot (@MichalDrobot):
@NOTimothyLottes That's pretty much exactly the mindset, and what ended up shipping in Killzone Shadow Fall (4xMSAA -> 2x scaling + TAA) and in Far Cry 4 (8xMSAA FMask only + analytical resolve + TAA). Doing sub-samples in FC4 was challenging due to all that alpha-tested foliage.
NOTimothyLottes (@NOTimothyLottes):
@MichalDrobot My mindset is more tuned towards rendering without minSampleShading and using the temporal component to get the shading resolution up since base frame rate will be high anyway. So no soft VRS in my line of thinking either (other than whatever falls out of 8xMSAA render)
MichalDrobot (@MichalDrobot):
@NOTimothyLottes If low-end hw is decent with MSAA AND you can control triangle density effectively to combat quad-occupancy loss (we can't) - that's the best there is imo. FWIW that's why we use stencil even for geo edges with MSAA - because we get too many interior edges that don't do anything
NOTimothyLottes (@NOTimothyLottes):
@MichalDrobot I guess the question then is if you implement variable spatial scaling in the custom resolve, is the 8xMSAA loss something that can easily be covered by increased spatial scale on the low end HW? 8x area scaling at 8xMSAA is still likely good for 3xAA on edges at spatial-only
MichalDrobot (@MichalDrobot):
@NOTimothyLottes But sure, I agree that 8x is the way to go for 4K. Our challenge is that we still have gen8 with a 1k target, and at those resolutions 8x scales really sub-linearly due to quad occupancy
MichalDrobot (@MichalDrobot):
@NOTimothyLottes 1) it's 4xMSAA but we DRS only on the X axis, which matches your case I believe. Similar to how it was done in Killzone Shadow Fall. With 8xMSAA we could explore XY scaling 2) depends. On PC it's flat w/ DCC. On gen8 it's FMask packed. Gen9 is FMask with DCC plane 0/1
MichalDrobot (@MichalDrobot):
@adamjmiles @NOTimothyLottes Maybe 1000Hz is a bit unnecessary - but looking at those numbers it's 25%-100% of your 120Hz title budget gone. Aside from frame gen not being great for competitive shooters, TAA needs to be really fast there
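The budget arithmetic behind this reply, with placeholder pass costs rather than AMD's actual published figures:

```cpp
#include <cassert>

// A frame at refresh rate hz has 1000/hz milliseconds of total budget.
constexpr double frameBudgetMs(double hz) { return 1000.0 / hz; }

// Fraction of that budget a single pass of passCostMs consumes.
constexpr double budgetFraction(double passCostMs, double hz) {
    return passCostMs / frameBudgetMs(hz);
}
```

At 120 Hz the whole frame is ~8.33 ms, so a (hypothetical) 2-8 ms upscale + frame-gen pass is indeed 25%-100% of the budget; at 1000 Hz the entire budget is 1 ms, which is why a multi-millisecond post chain can never fit.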
NOTimothyLottes (@NOTimothyLottes):
At least AMD is honest about their runtime costs: gpuopen.com/fidelityfx-sup… Frame-gen naturally costs more than the scaling-TAA. Obvious from these numbers that the industry "standard" scaling-TAA + frame-gen isn't going to be a good solution for 1000Hz even on a 7900 XTX
[image: FSR performance figures]
MichalDrobot (@MichalDrobot):
@NOTimothyLottes Fully agree. Pains me to see TFLOPs going to waste for arguably minuscule quality "improvements" - if we can even call laggy hallucinated images that. Also I have a feeling that the industry is taking a step back in terms of engineering and research - dumping it all into big data
NOTimothyLottes (@NOTimothyLottes):
People keep claiming non-AI is dead with regards to visual improvements, well that is 100% bogus. No one has yet scratched the surface of what is possible.
NOTimothyLottes (@NOTimothyLottes):
__/ HOW TO MAKE A "GOOD" NON-ML SCALING-TAA \__ ML people specialize in understanding what training data to feed a fixed network, likely instead of understanding how the network is actually solving the problem. Alternative: understand the problem and write a shader solution ...
MichalDrobot (@MichalDrobot):
@dark1x @NOTimothyLottes Sure that is true, but that area is fairly limited unless you just use processing power to compensate for deficiencies elsewhere. I am just not a fan of “let’s PT everything” and there are some good lessons learnt in offline rendering where baking and approximating is doing great
NOTimothyLottes (@NOTimothyLottes):
__/ NVIDIA CES 2025 KEYNOTE THOUGHTS \__ HAVE TO ASK: There are 70 PS4s' worth of shader TFLOPs in a 5090 - why do you need to artificially generate frames? PS4 as in graphics like Uncharted 4 below.
[image: Uncharted 4 in-game photo, credit Ray Soemarsono]
MichalDrobot (@MichalDrobot):
@dark1x @NOTimothyLottes If there is no benefit to the player or to gameplay, then there is no reason to waste 100x more processing power than baking. Better use it elsewhere.
John Linneman (@dark1x):
@NOTimothyLottes Well, in that specific case, it would be nice if the lighting in that scene could be reproduced in full real-time rather than baked... but I think that is the goal with RT and the like. Still, that does look amazing even now.
MichalDrobot (@MichalDrobot):
@_mamoniem @4rknova @wojtsterna Considering it barely saves anything on modern GPUs, I would strongly advise against it in anything that requires precision (which is probably important here). Correlating denominators is a great way to get unpredictable results and will eat into ULP precision super fast.
mamoniem (@_mamoniem):
Cool observation by Wojtek Sterna
[image: screenshot of Wojtek Sterna's observation]
MichalDrobot (@MichalDrobot):
@SebAaltonen hah yeah I know : ) I was actually surprised you want to add that specialized depth downsample for it - so maybe I just went ahead a bit too far. I think your plan is a solid one for those specs and expectations. As always curious to see how it pans out.
Sebastian Aaltonen (@SebAaltonen):
New idea for Z-downsample for SSAO: Instead of 2x2 min or max, we calculate average of them and pick the sample that's closest. Solves: Edge 1 vs 3 outlier case. Picks the most common surface. Slope = pick middle (best for bilateral upscale). Thoughts?
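Sebastian's idea, written out as a scalar sketch (function name invented):

```cpp
#include <cassert>
#include <cmath>

// Z-downsample for SSAO: average the 2x2 depths, then return the actual
// sample closest to that average. An outlier in a 1-vs-3 edge case only
// drags the average a little, so the majority surface wins; on a smooth
// slope the average sits mid-way, so a middle sample gets picked, which is
// the best anchor for a bilateral upscale.
inline float downsampleDepth2x2(const float d[4]) {
    float avg  = 0.25f * (d[0] + d[1] + d[2] + d[3]);
    float best = d[0];
    for (int i = 1; i < 4; ++i)
        if (std::fabs(d[i] - avg) < std::fabs(best - avg))
            best = d[i];
    return best;
}
```

Unlike a plain min or max, the result is always one of the real input depths, so the downsampled buffer never invents a surface that isn't there.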