Burak Efe | Climbing the mount Vulkan🌋💀

121 posts

@burak_efe_dev

CARE! I might post about C++ or Vulkan. Opinions are my own, not my landlord's.

Mount Vulkan · Joined January 2024
128 Following · 31 Followers
Burak Efe | Climbing the mount Vulkan🌋💀 retweeted
Nic Barker (@nicbarkeragain)
This photo has been posted so many times, and often sparks discussion about iterating towards an ideal solution, but I feel like the message for programmers hiding in plain sight here is that we should be throwing software away and rebuilding it from scratch more often
Nic Barker tweet media
Adam Sawicki (@Reg__)
@SaschaWillems2 @VulkanAPI Interesting article! In my VMA sample app I always had double-buffering of everything and no vkQueueWaitIdle but I have some validation layer errors regarding synchronization that I need to fix.
Sascha Willems (@SaschaWillems2)
When I started work on my @VulkanAPI #Vulkan C++ samples almost 10 years ago, I made some non-optimal decisions, esp. around sync and command buffer recording. Finally fixed those and also wrote a bit about that and 10 years of Vulkan samples: saschawillems.de/blog/2025/08/1…
Mathias W (@mwesterdahl76)
what's a good image (tga, png etc) inspector? I've stopped using PS (it wasn't good for this anyways), and Krita isn't good either... #gamedev
Volkan İlbeyli (@Varaquilex)
Biology, for understanding how the eye and human color perception work.
Math, for converting between the various 3D and 4D spaces that map human color perception to numbers.
Technology, for understanding how modern color display works, from the eye to the optical device, and how it is standardized so that a given input (content) looks as close as possible on various output devices (TVs, monitors, cinema, etc.).
Volkan İlbeyli (@Varaquilex)
One of my favorite things about color spaces & rendering is that you have to learn a little bit of biology and a good amount of math and technology all at once, spanning a healthy number of domains.
Andres Hernandez (@cybereality)
So here's a better shot of the screen space subsurface scattering in #Degine on and off so it's clear what the shader is doing.
Andres Hernandez tweet media (two images)
Sergiy Kanilo (@spkanilo)
@icauroboros IMHO instantiation time is not the reason you don't want to use some particular feature
Nicholas Wilt (@CUDAHandbook)
@icauroboros The memory traffic has to be reconciled with the cache hierarchy.
Nicholas Wilt (@CUDAHandbook)
It’s a genuine tragedy and failure of our computer engineering community that memset and memcpy are not hardware primitives with fast native support. x86 has had them in STOS/MOVS since the 1970s, but nevertheless the software guys optimize for the machine they’re on, and…
Falco Girgis (@falco_girgis)

Been writing SH4 assembly code for the Sega Dreamcast all day and night, hoping to bring big performance gainz to everyone in the community by providing a replacement memcpy() routine that doesn't suck for our GCC toolchains. As it turns out, the Newlib-provided memcpy() we have backing the C standard library in our SH GCC toolchains is slow AF.

This impacts not only our Grand Theft Auto 3 and Vice City ports, but also Doom64, Mario Kart 64, WipeOut, and virtually every homebrew game or port that uses KallistiOS! Just have a look at the benchmark results on the left to see just how shittily it performs.

The benchmarker invokes a series of memcpy() implementations over an increasingly large buffer window with compile-time configurable alignments. Each iteration initializes the source buffer with a series of randomly generated numbers and clears the destination buffer before clearing both the data and icaches for each run. During the run, the performance counters on the SH4 CPU are used to record cycle-accurate timing for each memcpy() invocation, which is then validated after the run for correctness. There are also large buffers located before and after the destination buffer, which are scanned for any stray/out-of-boundary writes after each iteration.

ANYWAY, what you're seeing in the benchmark output is the performance of my custom 1, 2, 4, 8, and 32-byte aligned memcpy() variants, which are highly optimized for specific use-cases, as well as the result of "memcpy_gainz()" which is the generalized form which attempts to call into the fastest of these specialized forms. Meanwhile, "memcpy_fast()" is a routine we found on the internet many years ago from STMicroelectronics which has impressive speeds, but has an LGPL license, which prevents us from statically linking to it in closed-source commercial games. Finally, "memcpy()" is the C standard library routine that ships with our toolchains... and as you can see... It runs like absolute, total, and complete shit.

Somehow, at a pathologically best-case alignment of 32-bytes with 1024-byte+ copy requests, the damn thing manages to be slower than "memcpy1()" which is a simple for loop in vanilla C that could've been written by a total newbie that just copies the source buffer to the destination buffer one byte at a time...

So basically all of the bazillion things that are using memcpy() in our software in the Dreamcast community, including everything ranging from copying strings or vertices to transferring packets to and from the layers of our network stack, is all taking a massive performance hit due to us having a shitty memcpy() implementation. After I discovered this, I embarked on a quest to take my specialized memcpyN() routines and see if I could use them as the basis for a generalized memcpy() routine to leverage. This is how "memcpy_gainz()" was born.

Unfortunately I was on my own for this quest, as every single resource that I found for writing optimal memcpy() routines was targeted at platforms which support unaligned memory accesses. Such platforms require a fundamentally different approach from the one taken for SuperH and other RISC processors without such support. Rather than simply falling back to unaligned memory accesses, my routine attempts to align the destination buffer to 32-byte cache line boundaries where it can call into one of the fast specialized routines depending on the relative alignment of the source buffer. Then it simply does byte-by-byte unaligned copying for any bytes before or after the cache line boundaries.

At this point in time, I'm happy to say that for all alignment types I am beating even our fast_memcpy() implementation for transaction sizes larger than 32 bytes and smaller than 8KB. There's still plenty of work to do for both tiny and massive sizes, but I'm stoked to see what people do with the extra cycles once this is done!
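The strategy described in the post (byte copies until the destination reaches a 32-byte cache-line boundary, a wider copy for whole lines picked from the source's relative alignment, then byte copies for the tail) can be sketched in portable C++. This is only an illustration under stated assumptions: `memcpy_aligned_sketch`, `copy_line`, and `kCacheLine` are hypothetical names, not the author's `memcpy_gainz()` or its SH4 assembly, and the fixed-size `std::memcpy` calls merely stand in for single aligned load/store instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed cache-line size: the 32 bytes quoted in the post.
constexpr std::size_t kCacheLine = 32;

// Copies one cache line using accesses of sizeof(Word) bytes.
// The fixed-size std::memcpy calls are a portable stand-in for one
// aligned load and one aligned store of that width.
template <typename Word>
void copy_line(unsigned char* d, const unsigned char* s) {
    for (std::size_t i = 0; i < kCacheLine; i += sizeof(Word)) {
        Word w;
        std::memcpy(&w, s + i, sizeof(Word));
        std::memcpy(d + i, &w, sizeof(Word));
    }
}

// Hypothetical illustration of the alignment strategy, not a tuned routine.
void* memcpy_aligned_sketch(void* dst, const void* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);

    // Head: byte copies until the destination sits on a cache-line boundary.
    while (n > 0 && reinterpret_cast<std::uintptr_t>(d) % kCacheLine != 0) {
        *d++ = *s++;
        --n;
    }

    // Pick the widest access the source's alignment allows. The source
    // advances by 32 bytes per line, so this alignment holds for every line.
    using LineCopy = void (*)(unsigned char*, const unsigned char*);
    const auto src_addr = reinterpret_cast<std::uintptr_t>(s);
    LineCopy line_copy = copy_line<std::uint8_t>;
    if (src_addr % 8 == 0) {
        line_copy = copy_line<std::uint64_t>;
    } else if (src_addr % 4 == 0) {
        line_copy = copy_line<std::uint32_t>;
    } else if (src_addr % 2 == 0) {
        line_copy = copy_line<std::uint16_t>;
    }

    // Body: whole cache lines through the selected specialized copy.
    while (n >= kCacheLine) {
        line_copy(d, s);
        d += kCacheLine;
        s += kCacheLine;
        n -= kCacheLine;
    }

    // Tail: byte copies for whatever is left after the last full line.
    while (n > 0) {
        *d++ = *s++;
        --n;
    }
    return dst;
}
```

Calling `memcpy_aligned_sketch(dst, src, n)` behaves like `memcpy` for non-overlapping buffers. A real SH4 version would presumably replace `copy_line` with hand-written assembly per alignment case, which is where the gains described in the post would come from.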

Charlie Callahan (@ccallac7)
Working on fixing a bunch of edge cases with the automatic steam pipe routing. ~3k lines of Lua just to generate these pipes.
Charlie Callahan tweet media
Sarper Şoher (@sarpersoher)
So much work just to pick an object with the mouse. My respect for the CPU has grown tenfold since I started building a tiny custom game engine.
Sarper Şoher tweet media
logic destroyer (@splinedrive)
C++ is basically a disease, a money-laundering scheme. Some guys, constantly trying to prove themselves, keep writing and adding to the standards. It's complete and utter nonsense. But if you just use a subset of it, it's damn good. Still, it's bloated and has become an industry for some people who just want to cash in on it. Good night
Charlie Shenton (@charshenton)
Adreno drivers fail to perform this optimisation with all available render pass information. Failure to merge these subpasses literally triples my frame times. Please just let me program directly against the tile memory myself, this is ridiculous.
Charlie Shenton (@charshenton)
Starting to think we need a new graphics API dedicated to mobile GPUs. The fact that I can't express something as simple as:
- write to MSAA depth
- resolve to single-sample depth
- read locally as input attachment
within a single, merged render pass is a total fail.
Burak Efe | Climbing the mount Vulkan🌋💀
For directional lights, it seems all of them use lux, which makes sense: you don't care how much light is emitted into the universe in total (lumens) or how much per solid angle (candela). There is no angle, since the beams are parallel, so the most sensible question is "how bright is my front yard?" (lux).
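As a rough illustration of why lux suits a directional light: with parallel beams there is no falloff with distance, so the only thing that changes the illuminance on a surface is the angle of incidence (Lambert's cosine law). A minimal sketch, where `directional_light_lux` and its parameter names are hypothetical and not taken from any particular renderer:

```cpp
#include <algorithm>
#include <cassert>

// Illuminance (lux) received by a surface lit by a directional light.
// lux_perpendicular: illuminance on a surface facing the light head-on.
// cos_theta: cosine of the angle between the surface normal and the
//            direction towards the light (dot(N, L) for unit vectors).
// Distance does not appear: parallel beams do not spread out.
double directional_light_lux(double lux_perpendicular, double cos_theta) {
    assert(lux_perpendicular >= 0.0);
    // Surfaces facing away from the light receive nothing.
    return lux_perpendicular * std::max(cos_theta, 0.0);
}
```

For example, a light specified at 100000 lux hitting a surface tilted 60 degrees away (`cos_theta` of 0.5) delivers 50000 lux, whether that surface is one metre or one kilometre away.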
Burak Efe | Climbing the mount Vulkan🌋💀
I looked at state-of-the-art renderers to see how they handle light units. There seems to be no common choice for point lights: some use lumens, others use candela, and some let the user pick.
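The two point-light conventions are interchangeable if the light is assumed to radiate uniformly in all directions, because a full sphere spans 4π steradians. A small sketch of that conversion and of the resulting illuminance via the inverse-square law; the function names and the isotropic assumption are mine, not any specific renderer's API:

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Luminous intensity (candela) of an isotropic point light emitting
// `lumens` in total: the flux is spread evenly over 4*pi steradians.
double point_light_candela(double lumens) {
    return lumens / (4.0 * kPi);
}

// Inverse conversion: total luminous flux (lumens) from intensity.
double point_light_lumens(double candela) {
    return candela * 4.0 * kPi;
}

// Illuminance (lux) on a surface facing the light at `distance_m` metres,
// following the inverse-square law.
double point_light_lux(double candela, double distance_m) {
    assert(distance_m > 0.0);
    return candela / (distance_m * distance_m);
}
```

Under this assumption a 1000 lm point light has an intensity of roughly 79.6 cd. Spot lights or lights with an emission profile would need a different solid angle, which may be one reason renderers disagree on the unit.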
Keruis (@yutongwu111140)
Writing CMake is really tedious. I'm developing a blueprint with Qt, and CMake is taking up too much of my time
Keruis tweet media