Burak Efe | Climbing the mount Vulkan🌋💀

121 posts

@burak_efe_dev

CARE! I might post about C++ or Vulkan. Opinions are my own, not my landlord's.

Mount Vulkan · Joined January 2024

128 Following · 31 Followers

Burak Efe | Climbing the mount Vulkan🌋💀 reposted

Nic Barker@nicbarkeragain·10 Oct

This photo has been posted so many times, and often sparks discussion about iterating towards an ideal solution, but I feel like the message for programmers hiding in plain sight here is that we should be throwing software away and rebuilding it from scratch more often

115

1.3K

52.8K

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·16 Aug

@Reg__ @SaschaWillems2 @VulkanAPI Was it this? docs.vulkan.org/guide/latest/s… I didn't know the submit semaphore should be per swapchain image, not per frame, before this error.
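A minimal sketch of the indexing this reply refers to, assuming the usual setup of a few frames in flight and a swapchain with its own image count. It deliberately avoids the Vulkan headers: `SemaphoreHandle` is a placeholder for `VkSemaphore`, and `FrameSync`, `acquire_for`, and `submit_for` are hypothetical names, not part of any API. The idea is that the semaphore passed to image acquisition can cycle with the frame-in-flight index, while the semaphore signaled by the queue submit and waited on by present is looked up by the acquired swapchain image index, since that semaphore is generally only known to be reusable once the same image has been acquired again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder for VkSemaphore so the sketch compiles without Vulkan (assumption).
using SemaphoreHandle = std::uint64_t;

// Hypothetical helper: owns both sets of semaphores and picks the right one.
struct FrameSync {
    std::vector<SemaphoreHandle> acquire_semaphores; // one per frame in flight
    std::vector<SemaphoreHandle> submit_semaphores;  // one per swapchain image

    FrameSync(std::size_t frames_in_flight, std::size_t swapchain_images) {
        assert(frames_in_flight > 0 && swapchain_images > 0);
        SemaphoreHandle next = 1; // stand-in for real semaphore creation
        for (std::size_t i = 0; i < frames_in_flight; ++i) acquire_semaphores.push_back(next++);
        for (std::size_t i = 0; i < swapchain_images; ++i) submit_semaphores.push_back(next++);
    }

    // Semaphore handed to image acquisition: cycles with the frame counter.
    SemaphoreHandle acquire_for(std::size_t frame_counter) const {
        return acquire_semaphores[frame_counter % acquire_semaphores.size()];
    }

    // Semaphore signaled by the submit and waited on by present:
    // indexed by the swapchain image index returned from acquisition.
    SemaphoreHandle submit_for(std::uint32_t image_index) const {
        return submit_semaphores.at(image_index);
    }
};
```

In a real renderer the two vectors would hold semaphores created by the device, and `submit_for` would be called with the index returned by the acquire call for that frame.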

Adam Sawicki@Reg__·15 Aug

@SaschaWillems2 @VulkanAPI Interesting article! In my VMA sample app I always had double-buffering of everything and no vkQueueWaitIdle but I have some validation layer errors regarding synchronization that I need to fix.

364

Sascha Willems@SaschaWillems2·15 Aug

When I started work on my @VulkanAPI #Vulkan C++ samples almost 10 years ago I made some non-optimal decisions esp. around sync and command buffer recording. Finally fixed those and also write a bit about that and 10 years of Vulkan samples: saschawillems.de/blog/2025/08/1…

272

12.6K

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·19 Jun

@mwesterdahl76 Let us know if you manage to find a good one, I was searching for it too...

Mathias W@mwesterdahl76·18 Jun

what's a good image (tga, png etc) inspector? I've stopped using PS (it wasn't good for this anyways), and Krita isn't good either... #gamedev

600

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·17 Jun

@Varaquilex I think photometry would be a better term than biology in this context. There is also radiometry if it's an offline renderer.

Volkan İlbeyli@Varaquilex·17 Jun

Biology for understanding how the eye and human color perception work
Math for changing between the various 3D and 4D spaces that map the human perception of color into numbers
Technology to understand how modern color display works, from eye to optical device, and how it's standardized to achieve results that are as close as possible for a given input (content) on various output devices (TVs, monitors, cinema, etc.)

394

Volkan İlbeyli@Varaquilex·17 Jun

One of my favorite things about color spaces & rendering is that you'll have to learn about a little bit of biology, good amount of math and technology all at once, spanning a healthy amount of domains.

1.4K

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·16 Jun

@cybereality Yes, it's visible on the self-shadowed parts. I'm not an expert, but shouldn't SSS also reduce specular reflections?

Andres Hernandez@cybereality·16 Jun

So here's a better shot of the screen space subsurface scattering in #Degine on and off so it's clear what the shader is doing.

262

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·16 Jun

@DanielJCollier github.com/aras-p/ClangBu… Thanks to @aras_p btw

310

Daniel@DanielJCollier·15 Jun

@icauroboros what tool is this?

290

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·15 Jun

There are 10 reasons not to use span and 7 reasons not to use unique ptrs

6.7K

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·15 Jun

@spkanilo It adds up, development time is just as important as other aspects of software

254

Sergiy Kanilo@spkanilo·15 Jun

@icauroboros IMHO instantiation time is not the reason you don't want to use some particular feature

264

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·15 Jun

@CUDAHandbook Couldn't it work so that the CPU handles the caches itself while the RAM chip handles the memory operation asynchronously? Or is it already like this?

Nicholas Wilt@CUDAHandbook·15 Jun

@icauroboros The memory traffic has to be reconciled with the cache hierarchy.

Nicholas Wilt@CUDAHandbook·13 Jun

It’s a genuine tragedy and failure of our computer engineering community that memset and memcpy are not hardware primitives with fast native support. x86 has had them in STOS/MOVS since the 1970s, but nevertheless the software guys optimize for the machine they’re on, and…

Falco Girgis@falco_girgis

Been writing SH4 assembly code for the Sega Dreamcast all day and night, hoping to bring big performance gainz to everyone in the community by providing a replacement memcpy() routine that doesn't suck for our GCC toolchains. As it turns out, the Newlib-provided memcpy() we have backing the C standard library in our SH GCC toolchains is slow AF. This impacts not only our Grand Theft Auto 3 and Vice City ports, but also Doom64, Mario Kart 64, WipeOut, and virtually every homebrew game or port that uses KallistiOS! Just have a look at the benchmark results on the left to see just how shittily it performs.

The benchmarker invokes a series of memcpy() implementations over an increasingly large buffer window with compile-time configurable alignments. Each iteration initializes the source buffer with a series of randomly generated numbers and clears the destination buffer before clearing both the data and icaches for each run. During the run, the performance counters on the SH4 CPU are used to record cycle-accurate timing for each memcpy() invocation, which is then validated after the run for correctness. There are also large buffers located before and after the destination buffer, which are scanned for any stray/out-of-boundary writes after each iteration.

ANYWAY, what you're seeing in the benchmark output is the performance of my custom 1, 2, 4, 8, and 32-byte aligned memcpy() variants, which are highly optimized for specific use-cases, as well as the result of "memcpy_gainz()" which is the generalized form which attempts to call into the fastest of these specialized forms. Meanwhile, "memcpy_fast()" is a routine we found on the internet many years ago from STMicroelectronics which has impressive speeds, but has an LGPL license, which prevents us from statically linking to it in closed-source commercial games. Finally, "memcpy()" is the C standard library routine that ships with our toolchains... and as you can see... It runs like absolute, total, and complete shit. Somehow, at a pathologically best-case alignment of 32-bytes with 1024-byte+ copy requests, the damn thing manages to be slower than "memcpy1()" which is a simple for loop in vanilla C that could've been written by a total newbie that just copies the source buffer to the destination buffer one byte at a time...

So basically all of the bazillion things that are using memcpy() in our software in the Dreamcast community, including everything ranging from copying strings or vertices to transferring packets to and from the layers of our network stack, is all taking a massive performance hit due to us having a shitty memcpy() implementation.

After I discovered this, I embarked on a quest to take my specialized memcpyN() routines and see if I could use them as the basis for a generalized memcpy() routine to leverage. This is how "memcpy_gainz()" was born. Unfortunately I was on my own for this quest, as every single resource that I found for writing optimal memcpy() routines was targeted at platforms which support unaligned memory accesses. Such platforms require a fundamentally different approach from the one taken for SuperH and other RISC processors without such support. Rather than simply falling back to unaligned memory accesses, my routine attempts to align the destination buffer to 32-byte cache line boundaries where it can call into one of the fast specialized routines depending on the relative alignment of the source buffer. Then it simply does byte-by-byte unaligned copying for any bytes before or after the cache line boundaries.

At this point in time, I'm happy to say that for all alignment types I am beating even our fast_memcpy() implementation for transaction sizes larger than 32 bytes and smaller than 8KB. There's still plenty of work to do for both tiny and massive sizes, but I'm stoked to see what people do with the extra cycles once this is done!
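The strategy described in the post above (byte copies until the destination reaches a 32-byte cache line boundary, whole cache lines in the middle with a path chosen from the source's relative alignment, then byte copies for the tail) can be sketched in portable C++ roughly as follows. This is not the author's routine, which is SH4 assembly: `copy_dst_aligned` and `kCacheLine` are hypothetical names, only two source-alignment paths are shown, and the inner fixed-size `std::memcpy` calls merely stand in for the aligned word loads and stores a hand-written version would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kCacheLine = 32; // SH4 cache line size, per the post

// Hypothetical illustration: align the destination, then copy whole lines.
void* copy_dst_aligned(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);

    // Head: byte copies until dst sits on a cache line boundary (or n runs out).
    const std::size_t mis = reinterpret_cast<std::uintptr_t>(d) % kCacheLine;
    std::size_t head = (mis == 0) ? 0 : kCacheLine - mis;
    if (head > n) head = n;
    for (std::size_t i = 0; i < head; ++i) d[i] = s[i];
    d += head;
    s += head;
    n -= head;

    // Body: dst is now line-aligned; pick a path from the source's alignment.
    const bool src_word_aligned = reinterpret_cast<std::uintptr_t>(s) % 4 == 0;
    while (n >= kCacheLine) {
        if (src_word_aligned) {
            // Eight 32-bit moves per line; fixed-size memcpy avoids aliasing UB.
            for (std::size_t w = 0; w < kCacheLine; w += 4) {
                std::uint32_t word;
                std::memcpy(&word, s + w, sizeof word);
                std::memcpy(d + w, &word, sizeof word);
            }
        } else {
            // Fallback path for a byte-misaligned source.
            for (std::size_t i = 0; i < kCacheLine; ++i) d[i] = s[i];
        }
        d += kCacheLine;
        s += kCacheLine;
        n -= kCacheLine;
    }

    // Tail: remaining bytes after the last full cache line.
    for (std::size_t i = 0; i < n; ++i) d[i] = s[i];
    return dst;
}
```

A real implementation would add paths for 2-byte and 8-byte relative alignment and would be validated with guard regions around the destination, much as the benchmark in the post does.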

243

32.6K

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·15 Jun

@ccallac7 Looks like the pipes can be built freely rather than at predefined voxelized positions. Looks hard indeed.

Charlie Callahan@ccallac7·14 Jun

Working on fixing a bunch of edge cases with the automatic steam pipe routing. ~3k lines of Lua just to generate these pipes.

1.2K

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·15 Jun

@sarpersoher More like so much work just for raycasting, but it's so useful in game dev that I think it's definitely worth having.

Sarper Şoher@sarpersoher·15 Jun

So much work just to pick an object with the mouse. My respect for the CPU has grown tenfold since I started building a tiny custom game engine.

234

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·11 Jun

@splinedrive True, but it will still take years to decide what not to use for that subset. And ensuring that you and your teammates stay within that subset is hard, if not impossible.

310

logic destroyer@splinedrive·11 Jun

C++ is basically a disease, a money-laundering scheme. Some guys, constantly trying to prove themselves, keep writing and adding to the standards. It's complete and utter nonsense. But if you just use a subset of it, it's damn good. Still, it's bloated and has become an industry for some people who just want to cash in on it. Good night

345

25.5K

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·9 Jun

light units on popular renderers

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·9 Jun

@charshenton or qcom should be a decent gpu developer and hire some driver programmers instead of wasting the time of graphics developers with bugs and subpar performance.

Charlie Shenton@charshenton·7 Jun

Adreno drivers fail to perform this optimisation with all available render pass information. Failure to merge these subpasses literally triples my frame times. Please just let me program directly against the tile memory myself, this is ridiculous.

275

Charlie Shenton@charshenton·7 Jun

Starting to think we need a new graphics API dedicated to mobile GPUs. The fact I can't express something as simple as:
- write to MSAA depth
- resolve to single-sample depth
- read locally as input attachment
Within a single, merged render pass. Is a total fail.

764

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·9 Jun

For directional lights, it seems like all of them use lux, which makes sense: you don't care how much light is emitted into the universe in total (lumen) or how much per solid angle (candela), because there is no angle and the beams are parallel. The most sensible question is "how bright is my front yard?" (lux).

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·9 Jun

I was looking at state-of-the-art renderers to see how they handle light units. It looks like there is no common choice when it comes to point lights: some use lumen, while others use candela or let the user pick.
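For reference, the two point-light units mentioned here are related by simple photometry, so a renderer that accepts either can convert between them. A small sketch, assuming an idealized isotropic point light (equal emission in every direction) and a surface facing the light head-on; the function names are hypothetical and not taken from any particular renderer:

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Luminous intensity (candela) of an isotropic point light with the given
// total luminous flux (lumen): I = flux / (4 * pi steradians).
double candela_from_lumens_isotropic(double lumens) {
    return lumens / (4.0 * kPi);
}

// Illuminance (lux) on a surface facing the light at the given distance,
// from the inverse-square law: E = I / d^2.
double lux_from_candela(double candela, double distance_m) {
    assert(distance_m > 0.0);
    return candela / (distance_m * distance_m);
}
```

For example, `lux_from_candela(candela_from_lumens_isotropic(flux), d)` gives the illuminance a point light of total flux `flux` produces at distance `d`; spot lights and other non-isotropic emitters would need a different solid angle than the full sphere.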

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·9 Jun

Also, I think I found a mistake in Unreal's documentation about lumen; not sure where to report it...

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·28 May

@SebAaltonen Looks incredible? Dude is levitating...

429

Sebastian Aaltonen@SebAaltonen·28 May

I hate comparison shots like this. There's no shadows or screen space AO. A game designed without path tracing would definitely have both of those today. And the image would still look good.

Compusemble@compusemble

One example from the video below. Standard ray tracing vs path tracing. Path tracing in F1 25 looks incredible.

389

40.3K

Burak Efe | Climbing the mount Vulkan🌋💀@burak_efe_dev·25 May

@yutongwu111140 For me, one of the best uses for AI is asking CMake questions.

Keruis@yutongwu111140·23 May

Writing CMake is really tedious. I'm developing a blueprint with Qt, and CMake is taking up too much of my time

13.4K
