Alex Goldring
@SoftEngineer
638 posts

Building Shade — next-gen Web graphics engine | Graphics engineer | consultant

Joined August 2009
37 Following · 2.2K Followers
Alex Goldring@SoftEngineer·
@cassett31 That's why I built it this way - different animations would be just as cheap too. Or "expensive", whichever way you want to look at it. Every character gets a full copy of geometry, a complete new skeleton, and fully evaluates its animation.
mimo@cassett31·
@SoftEngineer Since it's the same instance of the same mesh, the GPU doesn't have to process everything again and again. What about different animations, then?
Alex Goldring@SoftEngineer·
Apparently animating more than ~20 characters in modern graphics engines is a big deal 😅 324 skinned characters animating independently in WebGPU (browser). Each character is animating with a completely separate skeleton and timeline; no two characters sample the same time. Each character has 66 bones and 28,106 triangles. I want to stress that there is no instancing of any kind here, and the CPU is not involved at all.
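For anyone curious what "no CPU involvement" can look like on the API side, here is a minimal TypeScript sketch of recording a single skinning compute pass over many characters in WebGPU. The pipeline setup, bind group contents, and workgroup size are hypothetical placeholders, not Shade's actual code.

```ts
// Hypothetical sketch: skin every character in one compute pass.
// Assumes `pipeline` was built from a WGSL module whose entry point skins one
// vertex per invocation, and `characterBindGroups` holds one bind group per
// character (its skeleton, base mesh and skinned output buffer).
const WORKGROUP_SIZE = 64; // must match @workgroup_size in the WGSL

function encodeSkinning(
  device: GPUDevice,
  pipeline: GPUComputePipeline,
  characterBindGroups: GPUBindGroup[],
  vertexCount: number
): GPUCommandBuffer {
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  for (const bindGroup of characterBindGroups) {
    pass.setBindGroup(0, bindGroup);
    // one thread per vertex; each character has its own skeleton and output
    pass.dispatchWorkgroups(Math.ceil(vertexCount / WORKGROUP_SIZE));
  }
  pass.end();
  return encoder.finish();
}
```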
Alex Goldring@SoftEngineer·
Why does it have to be a texture? Easy - because that's all it could be when the tech was invented. It's true that today you don't have to go that route. However, a lot of tooling was already built around textures, and formats too. So unless you're building something from scratch, or your tooling is more modern, you'll be limited by that side too. Basically - it's legacy. Nothing wrong with old tech per se, there are reasons why it can still be good. For example, you're limited in how many storage buffers you can attach in WebGPU, but textures don't share that same limit, so if you're pushing up against that limit, textures can still be a valid path.
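The storage buffer limit mentioned here is queryable: WebGPU exposes maxStorageBuffersPerShaderStage on the adapter (the spec default is 8). A small sketch of using it to decide between a buffer path and a texture path; the "already in use" count is made up for illustration.

```ts
// Sketch: check how many storage buffers a shader stage may bind before
// choosing between buffer-backed and texture-backed animation data.
async function pickAnimationDataPath(): Promise<"storage-buffer" | "texture"> {
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) throw new Error("WebGPU not available");

  const limit = adapter.limits.maxStorageBuffersPerShaderStage; // spec default: 8
  // Hypothetical budget: buffers already bound for positions, normals,
  // bone matrices, indirection tables, etc.
  const storageBuffersAlreadyInUse = 7;

  return storageBuffersAlreadyInUse < limit ? "storage-buffer" : "texture";
}
```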
Erfan@ahmadierfan999·
I'm not an expert in rendering animations, but why does it have to be in a texture?
- You could have 12-byte vertex data with 4-byte alignment and save memory if you use BDA or a byte address buffer
- Textures usually need an extra staging copy to upload, that's just an extra copy and swizzle
- AFAIK WebGPU doesn't let you decide on "Optimal" vs "Linear" tiling like Vulkan, which means you'll also be paying for extra data swizzling into an optimal layout for regular sampling (most likely a version of Z-order)
Yeah, it's brutal overall, considering that streaming that small 15 MB texture (one second of animation) in full would take 1-2 ms depending on PCIe config. I suppose there is also more advanced stuff one could do if they really wanted to do VAT, just spitting out ideas:
- Instead of storing all frames and linearly interpolating, use 4-6x less data and do polynomial (quadratic) interpolation with the nearest 3 points instead
- To support 10-100 second weird animations, one could stream only a portion; not all of it needs to be resident in VRAM
Alex Goldring@SoftEngineer·
VAT or "Vertex animation" the way everyone seems to refer to it, is just pre-recorded set of positions for each vertex. Discretized into frames. Imagine like a large table with each row being a set of vertex coordinates for every vertex the character has. Most powerful GPUs cap out at 32k texture resolution, meaning that you character can't have more than 32k vertices. If your character has more than that - it's just not going to work. A lot of cards don't support 32k texture resolution, but, say 4k instead. If you have a 4k texture and you store full vertex positions, you'll need 12 bytes per vertex. Except, texture formats don't do 3 channels today, you have 1,2 or 4. We have to go with 4. That's 16 bytes (float32 = 4 bytes). The character I was showing, with ~30k vertices, would need 30,000^16 = 480 Kb of storage per frame. Let's say your animation is very short and sweet, only 30 frames, that's about 1-2 second of playback with decent resolution. You would need 15Mb of storage for that. The dancer in my video had an 18s animation loop. At just 30 frames per second, you'd need 260Mb of texture storage for just this one animation. Now let's say you want multiple animations, for multiple characters... it's a nice piece of tech for short and complex animations. Like baking a sim for example, or taking a very complex rig with 100s of IK drivers, and baking all of that down into a VAT. Lots of modern games use VATs for just that. Typically VAT will account for a tiny fraction of animations that you would see in a game. Using VAT for a skinned skeletal animation seems like a bad fit to me. That same 18s animation using exact curve data takes around < 100Kb of storage by comparison, that's 260,000% less
Alex Goldring@SoftEngineer·
It can be, but typically those spectators have very low poly counts. You take a high-poly character with bones, animate it, then decimate it, then bake that into a vertex flipbook (VAT or whatever). The reason it's done, at least in part, is because skeletal animation is expensive in those engines.
悪魔的Z計画@akumatekiz·
@SoftEngineer Exactly. For skinning, it's better to store matrices - I do the same myself. That said, if you only need to play a single short animation, VAT is faster, so it's often used for things like large crowds of spectators or glow sticks.
Alex Goldring@SoftEngineer·
@BrookeHodgman Yep, faces are complex, it's in that realm where skeletal animations just don't cut it or are a poor fit.
Alex Goldring@SoftEngineer·
True enough. I wrote a whole Buffer abstraction system on top of WebGL textures for meep ( npmjs.com/package/@woosh… ), that's what it uses for implementation of Virtual Geometry and meshlet-based rendering. discourse.threejs.org/t/virtually-ge… It's a workable solution, it's also a massive pain in the rear, compared to just having the raw memory access via buffers. In meep, I even have a page-based allocator on top of that buffer abstraction, it *almost* feels like working with real buffers, but the cost of such abstraction is non-zero. It also took me weeks to tune and debug in total.
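For readers unfamiliar with the trick, the core of a texture-backed buffer is just index math: treat the texture as a flat array and convert a linear element index into 2D texel coordinates. A minimal sketch of that idea, not meep's actual abstraction or allocator.

```ts
// Minimal sketch of addressing a texture as a linear buffer: one element per texel.
interface TexelAddress {
  x: number;
  y: number;
}

function texelAddress(elementIndex: number, textureWidth: number): TexelAddress {
  return {
    x: elementIndex % textureWidth,
    y: Math.floor(elementIndex / textureWidth),
  };
}

// Element 70,000 in a 4096-wide texture lands at texel { x: 368, y: 17 }
console.log(texelAddress(70_000, 4096));
```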
Matthew Collison@MrCollison·
@SoftEngineer In the UK they add a Vertex Animation charge to every purchase. I love your rendering work btw. I feel like I recognize Shade from the @threejs forums from when I was researching clustered rendering. Are you that guy?
Alex Goldring@SoftEngineer·
Implemented per-vertex and per-object motion vectors for Shade #WebGPU. Integrated with TAA. Added an optional motion blur post-process. Not necessary, but it's something that requires high quality motion vectors, so why not.
Alex Goldring@SoftEngineer·
@ivanpopelyshev @BrookeHodgman @marcsh Ah, cool. Yeah, there are a lot of limits that you can overcome a lot more easily with compute, especially once you add atomics and subgroup operations into the mix.
Alex Goldring@SoftEngineer·
"Bone mixing" is a cool name, what is it? :D If you mean transform blending, I'm doing dual quat. If you mean performing hierarchical transforms - I do that too, without limits or multiple dispatches. If it's something highly specific like treating the skin as a pipe of fixed diameter...? The slides seem to have been made to accompany something. Maybe it's just me, but there are no speaker notes. Let's start with a basic set of questions:
1. What problem does this solve?
2. Who is this for? (use cases)
3. What is the main contribution?
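For context on "dual quat": this usually means dual quaternion linear blending (DLB), where each bone transform is a dual quaternion and the per-vertex blend is a sign-corrected weighted sum followed by normalization. A rough TypeScript sketch of that blend step; the layout and names are illustrative, not Shade's.

```ts
// Dual quaternion linear blending (DLB) sketch. A dual quaternion is stored as
// 8 floats: [rx, ry, rz, rw, dx, dy, dz, dw]. Names and layout are illustrative.
type DualQuat = number[]; // length 8

function blendDualQuats(bones: DualQuat[], weights: number[]): DualQuat {
  const out = new Array<number>(8).fill(0);
  const pivot = bones[0]; // sign-correct against the first influencing bone

  for (let i = 0; i < bones.length; i++) {
    const dq = bones[i];
    // q and -q encode the same rotation; pick the hemisphere closest to the
    // pivot so the weighted sum doesn't cancel itself out
    const dot = dq[0] * pivot[0] + dq[1] * pivot[1] + dq[2] * pivot[2] + dq[3] * pivot[3];
    const w = dot < 0 ? -weights[i] : weights[i];
    for (let k = 0; k < 8; k++) out[k] += w * dq[k];
  }

  // normalize by the magnitude of the real (rotation) part
  const norm = Math.hypot(out[0], out[1], out[2], out[3]);
  for (let k = 0; k < 8; k++) out[k] /= norm;
  return out;
}
```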
Alex Goldring@SoftEngineer·
@BrookeHodgman Thanks, I haven't seen many assets on the web with 6 or 8 weights yet, I guess mostly due to asset size constraints. But I plan to support more than 4.
Alex Goldring@SoftEngineer·
Depends on what you mean. A character with 66 bones and 4 weights is already quite heavy. Animating 100 characters with 16 bones each and 2 blend weights is easy. On the other hand, animating more than a few dozen fully rigged characters is hard.

When you play a game and you see a ton of characters on screen - they are typically LoDs: maybe 20 are full rigs, and the rest are drastically simplified skeletons and skins, running at a reduced animation tick rate.

The reason is quite simple: if you have 100 bones to animate, that's at least 400 animation curves to sample - 4 curves per bone quaternion, and that's just rotation. If you have some kind of displacement going on, such as for hips, shoulders, face etc. - that would be more curves. Then we need to update the node graph for bone hierarchies, that's a bunch of 4x4 matrix multiplications. Now, if we have animation blending - you have to multiply the previous work by the number of animations you're blending. Then we also have bounds calculation, because graphics engines need to know the spatial bounds that a mesh occupies. Very quickly the memory footprint explodes, and there's a ton of ALU as well. You can get far by carefully packing animation and transform data in memory, but it's inherently a problem with a massive amount of data.

I was recently playing Cyberpunk 2077, and wherever you look - there are typically no more than ~16 animated characters on screen at the same time. Why does "on screen" matter? Because a smart animation system can take advantage of that and pause animation if the character is not on screen.

GPU-driven animation and skinning systems are not really new, we've been moving in that direction in an ad-hoc way. Recent notable examples would be:
1. CDPR's Witcher 4 demo
2. Remedy's Alan Wake 2, where vegetation is driven through skinned animation
3. Warhammer 40k Space Marine 2, which uses the same idea as Alan Wake 2 for animating a huge number of Xenos
Quoting @marcsh:
@SoftEngineer @ivanpopelyshev A lot of modern games are running like 50 anims that are all being blended together in complex, expensive ways. So when folks talk about 'animations' they often mean those sorts of controllers.
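To make the curve and matrix counts above concrete, here is a rough, illustrative cost model. The factors are the ones quoted in the thread (4 rotation curves per bone, extra curves for translated bones, one hierarchy multiply plus one skinning-matrix multiply per bone); nothing here is measured from a real engine.

```ts
// Rough per-character, per-frame cost model for CPU-side skeletal animation.
interface RigCost {
  curvesSampled: number;    // animation curve evaluations
  matrixMultiplies: number; // 4x4 matrix multiplications (hierarchy + skinning)
}

function estimatePoseCost(
  boneCount: number,
  blendedClips: number,
  translatedBones: number // bones that also animate position (hips, shoulders, ...)
): RigCost {
  const rotationCurves = boneCount * 4; // 4 curves per bone quaternion
  const translationCurves = translatedBones * 3;
  return {
    // every blended clip samples its own set of curves
    curvesSampled: (rotationCurves + translationCurves) * blendedClips,
    // one local-to-world multiply per bone, plus one skinning-matrix multiply
    matrixMultiplies: boneCount * 2,
  };
}

// 100 bones, 3 blended clips, ~10 bones with positional animation:
// ~1290 curve samples and ~200 matrix multiplies per character per frame
console.log(estimatePoseCost(100, 3, 10));
```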

Alex Goldring@SoftEngineer·
@BrookeHodgman Can you show me some examples and bone counts that you consider to be typical in 2026?
Alex Goldring@SoftEngineer·
@cassett31 These, basically, are 324 different meshes. Each skin is a full copy of the character. In terms of perf, there would be no difference between this and a scene where every character is a different mesh.
Alex Goldring@SoftEngineer·
For now - blending is coming, but controls stay on the CPU side. That is, the user is responsible for changing the current time if they want the animation to play from a different point. The user also sets playback_rate, which can be negative by the way, as well as weight. IK and other animation-adjacent features are not planned for now. There is space in the architecture for these features though.
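A minimal sketch of what that CPU-side control surface could look like per playing clip; the field names are illustrative, not Shade's API.

```ts
// Illustrative per-clip playback state: the CPU writes it, the GPU-driven
// animation system reads it when sampling the clip.
interface ClipPlayback {
  clipIndex: number;    // which baked clip to sample
  time: number;         // current sample time in seconds, set by the user
  playbackRate: number; // 1 = normal speed; negative values play backwards
  weight: number;       // blend weight, once blending lands
}

// Example: play clip 2 backwards at half speed with full weight
const playback: ClipPlayback = { clipIndex: 2, time: 4.5, playbackRate: -0.5, weight: 1 };
```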
Gabriel L. Kannenberg@gabriellkann·
@SoftEngineer Have you drafted a plan of animation features you would need to support for a full game? Since it's all GPU driven, I am curious how you would implement things like state machines, IK and transition blends. That's usually gameplay driven... What about multi-actor interactions?
Alex Goldring@SoftEngineer·
Actual skinning takes the most time, so it would help a lot. I'd say about 90% of the perf budget goes towards skinning and only about 10% towards actual animation. If you dropped vertex count per character from the 28k that this uses to 14k - you could expect perf to almost double.
Kryyative@KryyssX·
@SoftEngineer Out of curiosity, how much difference in performance is there if a stylised aesthetic is used to lower the tri count per actor? Using a cel shader to justify the lack of detailing on the materials and mesh, for example.
Bartek Moniewski@BartekMoniewski·
@SoftEngineer When I hit that I immediately kill the context and start a fresh chat. There are a few scientific papers suggesting that any LLM will slowly deep-fry itself into full-blown artificial psychosis after a bit of arguing.
Alex Goldring@SoftEngineer·
As an expert, I find AI gaslighting frustrating: when Claude/Gemini/ChatGPT tries to convince you that black is white and the sky is actually green. Specific examples include hallucinating a new API for WebGPU and trying to convince me that "no, it really exists, you're just too uninformed".
Alex Goldring@SoftEngineer·
@kteleho Looks interesting. I was thinking of supporting physics and attachments later on. For that you kind of need bones, and with bones skinning makes sense. I'll check out the video later, thanks for the tip.
kt@kteleho·
@SoftEngineer youtu.be/xh0gT8acihE?si… one of the series. Not a big deal since the VAT texture method is a common practice. Bake anims to vertex colors as coords and animate using a timeline. All on GPU, instanced mesh and a state machine to control variations. Something like this. Here, 10,000 meshes.
Alex Goldring@SoftEngineer·
@AgileJebrim @Colonthreee You're right, and to answer your question - a bit of both. It's hard to find the right balance, and some things end up being judgement calls.
Jebrim@AgileJebrim·
@SoftEngineer @Colonthreee Is realism the priority or real-time performance? To optimize for one is usually to deoptimize for the other.