
Meetem
@Meetem4
C# lover and Unity Performance Maniac. | 🇺🇦 will win | Cutting milliseconds for money | https://t.co/M0cMRkjVv6
Belgrade, Republic of Serbia · Joined January 2023
336 Following · 400 Followers
Pinned Tweet

People are a bit confused by the CoreCLR news.
CoreCLR is great news for newer C# adoption and compatibility, but:
1) I have yet to see how the new GC system is implemented. They have already said they will need a custom GC, because the .NET GC is not meant for games
(more)
Unity for Games@unitygames
Get ready for a Unity 6.8 alpha coming later this year, as we prepare for the entire ecosystem to take full advantage of CoreCLR for faster iteration times and reduced wait times for incremental changes.

@SebAaltonen Interesting, the last time I checked, structured buffers in the compute pipeline were a little bit faster. They are probably using a different path in compute. Interesting why that is.

Nvidia is one of the few vendors that still has IA (fixed-function input assembly). But Turing (RTX 2000) added fast-path raw loads (~3x lower latency vs. the sampler) and 2x L1$ bandwidth on the raw load path. So you just use that; performance is practically the same. But don't use typed buffers, those go through the sampler = higher latency and lower throughput.

This reminded me that I should kill vertex buffers and vertex input assembly setup APIs in our RHI. We no longer need GLES backwards compatibility.
Vertex vertex = buffer.vertices[gl_VertexID];
Tech Bro Memes@techbromemes
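A minimal vertex-pulling sketch in TypeScript/WebGPU (illustrative only, not Shade's actual RHI; the buffer layout and names are assumptions): the vertex data lives in a plain storage buffer and the vertex shader indexes it by the built-in vertex index, so no vertex buffer or input-assembly state is ever bound.
---
// Hypothetical WebGPU vertex pulling: no vertex buffers, no IA setup.
// Assumes `device` is a GPUDevice and `vertexData` a Float32Array of
// position (vec3f) + uv (vec2f) records, padded to 32 bytes per vertex.
const vertexStorage = device.createBuffer({
  size: vertexData.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(vertexStorage, 0, vertexData);

const shader = device.createShaderModule({
  code: /* wgsl */ `
    struct Vertex { pos: vec3f, uv: vec2f };
    @group(0) @binding(0) var<storage, read> vertices: array<Vertex>;

    @vertex fn vs(@builtin(vertex_index) vid: u32) -> @builtin(position) vec4f {
      let v = vertices[vid]; // the same fetch as the GLSL line above
      return vec4f(v.pos, 1.0);
    }
  `,
});
// The render pipeline is then created with no vertex.buffers entry at all;
// draw(vertexCount) is enough, since the shader pulls its own data.
---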

@SoftEngineer Ah, I've got it. I did something similar for Unity, implementing the NICE filter on the GPU: constructing a slightly smaller kernel and taking max(currentMip - 3, 0) as the source.
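A toy illustration of that source-mip selection (TypeScript; the function name is hypothetical, the 3 is the skip distance from the reply above):
---
// When generating a mip, read from a few levels back toward mip 0 instead
// of from the immediately previous mip, to avoid compounding the blur.
function sourceMipFor(currentMip: number, skipDistance = 3): number {
  return Math.max(currentMip - skipDistance, 0);
}
// e.g. mip 5 is filtered from mip 2, while mips 1..3 all read from mip 0.
---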

On the mipmap generation? Well, there are a few things here. In real-time scenarios you're typically forced to run small filter kernels; 5x5 is already large. A typical linear mipmap (the basic standard) uses a 2x2 kernel.
When we generate mipmaps offline, we can always start from mip0 (the highest resolution) and run progressively larger kernels all the way down. You get the best results this way, but it's slow, and it doesn't scale on GPUs.
If you just run a recursive filter (filter mip0 to mip1, then apply the same filter to mip1 to get mip2, and so on), you get a pretty blurry result, because each stage destroys some information.
So with all that, my mipmap generator cheats: instead of going all the way back to mip0, which is too expensive, it goes back only a few mip levels, where you still have more detail and information. The kernel gets larger, but stays within an acceptable range. I also don't use the basic linear filter. This, however, creates a conundrum: you want to run a complex filter, but not recursively. So my mechanism is split into optional forward and backward passes. The backward pass is optional; it only kicks in when the requested filter needs it.
To better illustrate the system, here's the processing spec for the Mitchell filter:
---
[TextureFilterType.Mitchell]: FilterProcessDescriptor.from({
    filter: FilterShaderType.Mitchell,
    skip_distance: 2,
    base_filter: FilterShaderType.MagicKernelBase,
})
---
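Reading that spec literally, here's one way it could drive the two passes (a TypeScript sketch with hypothetical names; the exact semantics of skip_distance here are my guess, not Shade's actual behavior):
---
// Hypothetical planner for the forward/backward scheme described above.
interface FilterSpec {
  filter: string;        // expensive filter, e.g. "Mitchell"
  skipDistance: number;  // how far back the backward pass reaches
  baseFilter?: string;   // cheap filter used by the forward pass
}

function planMipPasses(mipCount: number, spec: FilterSpec): string[] {
  const plan: string[] = [];
  // Forward pass: the classic recursive chain, with the cheap base filter,
  // so every mip level exists before the backward pass runs.
  for (let mip = 1; mip < mipCount; mip++) {
    plan.push(`${spec.baseFilter ?? spec.filter}: mip${mip - 1} -> mip${mip}`);
  }
  // Backward pass (optional): re-filter each mip with the expensive filter
  // from a less-degraded source a few levels back toward mip 0.
  if (spec.baseFilter) {
    for (let mip = 1; mip < mipCount; mip++) {
      const src = Math.max(mip - spec.skipDistance, 0);
      plan.push(`${spec.filter}: mip${src} -> mip${mip}`);
    }
  }
  return plan;
}
---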
Beyond this, the mipmap generator has a scheduler that runs a bit of the mipmap workload each frame, which prevents the main thread from locking up. There's a fair bit of complexity around memory management and keeping draw-call overhead to a minimum.
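A minimal sketch of that kind of time-sliced scheduler (generic browser TypeScript, not Shade's code; the 2 ms budget is an arbitrary assumption): give each frame a budget and pop queued work until it's spent.
---
// Generic per-frame work scheduler: run queued GPU work in small slices
// so mipmap generation never locks the main thread for a whole batch.
type Job = () => void; // e.g. encode one filter dispatch for one mip

const queue: Job[] = [];
const FRAME_BUDGET_MS = 2; // assumed slice of the frame for mip work

function pump(): void {
  const start = performance.now();
  while (queue.length > 0 && performance.now() - start < FRAME_BUDGET_MS) {
    queue.shift()!();
  }
  if (queue.length > 0) requestAnimationFrame(pump);
}

function enqueueMipWork(jobs: Job[]): void {
  const wasIdle = queue.length === 0;
  queue.push(...jobs);
  if (wasIdle) requestAnimationFrame(pump);
}
---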

Sponza scene, in Shade (WebGPU graphics engine).
I was working on mipmap texture filtering. It's one of those things that can be a time black hole, but when it improves image clarity, I can't help feeling it's worth it.
I guess my mipmap generation system in Shade is the most sophisticated mipmap system on the web. Perhaps a bit of overkill, but I'm super happy with the results.



@unitygames Will the runtime be open-sourced and as easily recompilable as the original dotnet repo?

There are now two full (all 14 block sizes) open-source ASTC LDR GPU texture encoders in the world: ARM's and Binomial's. We also support ASTC HDR 4x4 and 6x6 with high-quality specialized encoders.
github.com/BinomialLLC/ba…

@Mirko_Salm Probably just use SH? I didn't even know they'd patented that!

@AlperenAkkuncu We became closer to God, e.g. to the glass weights and light-speed inference.

Apparently, there is new hardware that can run LLMs much, much faster. It can easily give you over 15,000 tokens per second, which is stupidly fast.
From what I understand, it's 'hard-coded': you can't change the weights or anything; the chip can only ever run the model it was manufactured with.
Interesting concept.

Meetem retweeted

Continuing the theme of a measured approach to using AI. I stumbled upon an interesting piece, "Breaking the Spell of Vibe Coding". A pretty interesting text about the risks of mindless vibe coding. Every developer (and not only developers, of course) knows the concept of the flow state, the one in which good programmers manage to move mountains. The article's author examines a variant of this state, dark flow, the kind a gambling addict experiences at a slot machine, and argues that a person experiences the same thing while vibe coding: when you don't look at the resulting code at all but are utterly absorbed in implementing your surely brilliant idea. The result of such work is a house built on sand, a construction with no coherent architecture, a pile of workarounds and bugs, collapsing under its own weight as the codebase grows. Roughly the same outcome, if you think about it, as the gambler's: broke, with debt collectors at the front door.
I think there's something to this.

@James_M_South Getting better drawing performance is usually simple; what's complicated is getting results back from the GPU without a huge delay while also keeping the frontend similar to the CPU one.
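For context, a minimal WebGPU readback sketch (generic TypeScript, not the code under discussion): results are copied into a mappable staging buffer and awaited asynchronously, which is exactly where that delay comes from.
---
// Hypothetical async GPU-to-CPU readback in WebGPU. The mapAsync() await
// is the delay: the copy completes only after the GPU catches up, so a
// CPU-style synchronous frontend must either stall here or double-buffer.
async function readback(device: GPUDevice, src: GPUBuffer, size: number) {
  const staging = device.createBuffer({
    size,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
  });
  const enc = device.createCommandEncoder();
  enc.copyBufferToBuffer(src, 0, staging, 0, size);
  device.queue.submit([enc.finish()]);

  await staging.mapAsync(GPUMapMode.READ);
  const data = new Float32Array(staging.getMappedRange().slice(0));
  staging.unmap();
  staging.destroy();
  return data;
}
---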

@unitycoder_com Yeah, funny thing: they are still using the CPU to submit draw calls, and only sub-cull draw calls with occlusion culling. Sorting/filtering is basically impossible to do on the GPU with the current API design. At least that was the case when I last checked (~half a year ago).

Unity survey about the "GPU Resident Drawer": discussions.unity.com/t/feedback-nee…

BTW, Unity DXR is broken because they use a per-material variant to bind textures (sic!) instead of using bindless.
So it's slow as hell.
Gabriel Dechichi@gdechichi
just wrapped up the PBR path tracer for my engine, and looks like it's about 5x faster than @UnrealEngine and 12x faster than @unity for the same scene. goal is to make this real time

@marcel_wiessler @guycalledfrank @pemathedev years of not supporting binding MRTs without a depth buffer (while there is no gfx API that disallows it):
discussions.unity.com/t/setting-mult…






