GPSnoopy

20 posts

@TFautre

Incompetent C++ and C# developer obsessed with performance.

Joined May 2021

6 Following · 4 Followers

GPSnoopy@TFautre·10 May

@Yxuer @Peter_shirley As a baseline, using the NVIDIA CUDA backend for Vulkan Raytracing, at 2560x1440 (8 samples, 16 bounces) on a GTX 1080 Ti: 3.4 FPS (0.29s per frame). There is probably some nice low-hanging performance fruit to pick in your fun project.

Yxuer@Yxuer·6 May

@Peter_shirley So I took your "Raytracing in One Weekend" tutorials some years ago, and now I've reimplemented it in CUDA as a personal project! So far it's going cool! It can generate the random spheres scene (1200x675 pixels, 10 samples per pixel) in 4.5 seconds approx!

GPSnoopy@TFautre·22 Apr

@DefPriPub @Peter_shirley Also, conceptually there is no reason for `final` to go faster, as all the method calls in RTIOW are made through the base pointer and are virtual. Only if the pointer is to the final class will the compiler infer that the method call is direct and non-virtual.
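
A minimal sketch of the distinction drawn in the tweet above. The `Hittable` and `Sphere` types here are hypothetical stand-ins, not the actual RTIOW classes, and whether a given compiler really devirtualizes the second call depends on the optimization level.

```cpp
#include <cassert>

// Hypothetical stand-ins for an RTIOW-style hierarchy (names are assumptions).
struct Hittable {
    virtual ~Hittable() = default;
    virtual int id() const = 0;
};

struct Sphere final : Hittable {
    int id() const override { return 1; }
};

// Call through the base pointer: the dynamic type is unknown at this call
// site, so `final` on Sphere gives the compiler nothing to work with and the
// call remains a virtual dispatch.
int via_base(const Hittable* h) { return h->id(); }

// Call through a pointer to the final class: no further override can exist,
// so the compiler is permitted to bind the call statically and inline it.
int via_final(const Sphere* s) { return s->id(); }
```

Both functions return the same value for a `Sphere`; only the second gives the optimizer the static type information that `final` relies on.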

Benjamin Summerton@DefPriPub·22 Apr

I measured C++’s ‘final’ keyword to see if it boosted performance or not. Results were unexpected. Read here: 16bpp.net/blog/post/the-…

GPSnoopy@TFautre·22 Apr

@DefPriPub If you really mean business, using the CPU, then ISPC beats GCC and Clang by a healthy margin. It has a similar paradigm to GPU programming languages. Example: github.com/GPSnoopy/RayTr… `final` is just noise compared to any of these aforementioned approaches.

GPSnoopy@TFautre·22 Apr

@DefPriPub I've found the following to yield the best results on an i9 9900K for RT: `g++ -O3 -ffast-math -march=skylake`. `-ffast-math` allows SSE/AVX instructions instead of slow IEEE-compliant routines. `-march=skylake` implies the use of FMA instructions. Overall, it goes more than 2x faster.
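
As a hedged illustration of why these flags can matter, here is a plain scalar reduction of the kind ray tracers are full of. The function name `dot` is an assumption for this sketch; it is not taken from the linked repository, and the 2x figure above is the tweet's own measurement, not something this snippet reproduces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product written as a simple scalar reduction.
// Under strict IEEE semantics the additions must happen in source order, so
// GCC generally keeps this loop scalar. With -ffast-math it may reassociate
// the sum into several partial sums and use SIMD adds, and with
// -march=skylake it may fuse each multiply and add into an FMA instruction.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Comparing the assembly from `g++ -O3 -S` against `g++ -O3 -ffast-math -march=skylake -S` is one way to see whether the loop was vectorized on a given compiler version.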

GPSnoopy@TFautre·20 May

@KerbalSpaceP Congratulations on implementing lens flares as they ought to have been done since 2007. registry.khronos.org/OpenGL/extensi…

Kerbal Space Program@KerbalSpaceP·19 May

New Dev Update: "Mohopeful" by Nate Simpson, Creative Director Read it 👉 forum.kerbalspaceprogram.com/index.php?/top…

GPSnoopy@TFautre·2 Feb

@ID_AA_Carmack The real question is how come memory bandwidth utilization is not reported as a base metric by all the common OSes, like they do for CPU & Disk IO. Given that all algorithms are either CPU, memory or IO bound, it seems like a rather unfortunate blind spot to have.

John Carmack@ID_AA_Carmack·1 Feb

Virtual CPU utilization can be tracked and balanced neatly in cloud datacenters, but what about memory? It seems like there should be more NUMA scalability going on. Is there instrumentation about how much memory is getting intensively used, vs just allocated?

GPSnoopy@TFautre·30 Dec

@d0cTB @PayPalFrance Not that I'm a legal expert, but GDPR normally covers this case and gives you serious legal weight. Wikipedia: "right to contest any automated decision-making that was made on a solely algorithmic basis, and their right to file complaints with a Data Protection Authority"

Doc TB@d0cTB·30 Dec

After 7 calls, a mind-boggling conversation with a "supervisor" at @PayPalFrance: "It's the system. We can't know, we have no control, we can't do anything. I'm sorry but that's how it is, you won't get any further explanation, it's the algorithm". Scary. 😆

Doc TB@d0cTB

I knew there was a class action in the US, but I've just been hit with a permanent ban on my 20-year-old @PayPalFrance account, without notice or explanation, and with the money confiscated (€200). Impossible to reach anyone. It feels very mafia-like...😬 engadget.com/paypal-lawsuit…

GPSnoopy retweeted

flavio@flaviocopes·12 Oct

GPSnoopy@TFautre·23 Apr

@d0cTB Single channel? Bandwidth estimates look low in general. I assume Memtest86+ is using BCOPY convention (en.wikipedia.org/wiki/Memory_ba…)? Might be good to clarify in the UI. Personally prefer the Hardware convention, as it's closer to the memory official specifications.
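
A rough worked example of how much the convention changes the reported number for the same memory copy. This follows the conventions as I understand them from the Wikipedia article linked in the tweet; the struct and function names are hypothetical, and the 3x factor assumes a write-allocate cache that reads each destination line before overwriting it.

```cpp
#include <cassert>
#include <cstdint>

// Bandwidth figures for one memory copy under three counting conventions.
struct CopyBandwidth {
    double bcopy_gbps;     // counts only the bytes copied
    double stream_gbps;    // counts bytes read plus bytes written
    double hardware_gbps;  // also counts the write-allocate read of the destination
};

// bytes_copied: size of the buffer being copied; seconds: elapsed time.
// write_allocate is an assumption about the cache policy of the machine.
CopyBandwidth copy_bandwidth(std::uint64_t bytes_copied, double seconds,
                             bool write_allocate = true) {
    const double bcopy = (static_cast<double>(bytes_copied) / 1e9) / seconds;
    const double hardware = (write_allocate ? 3.0 : 2.0) * bcopy;
    return CopyBandwidth{bcopy, 2.0 * bcopy, hardware};
}
```

Under these assumptions, copying 10 GB in one second would be reported as 10 GB/s, 20 GB/s, or 30 GB/s depending on the convention, which is why a low-looking figure is worth labelling in the UI.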

Doc TB@d0cTB·23 Apr

👀

GPSnoopy@TFautre·31 Jan

@EyezCG I don't think it should take 6h to render at that resolution, even single threaded. Suggest you check this C++ version as a performance baseline: github.com/GPSnoopy/RayTr… Then the ISPC version is the best I managed on the CPU, and serves as a good basis for a pure CUDA version

Eyez_CG@EyezCG·31 Jan

take 6h to render 1000 x 666 image🥺🧨

Eyez_CG@EyezCG·31 Jan

#raytracing in one weekend (1 month) made by following @Peter_shirley ‘s down to earth book. Amazing tutorial for someone (me) who gets irritably itchy if didn’t understand every nuance of how things work. The very next thing I wanna do is to parallel it on GPU so it doesn’t

GPSnoopy@TFautre·8 Dec

@TheCherno I couldn't resist comparing (5950X / 3090FE):
- Code from the video: 22.0 seconds
- C++: 9.9 seconds [1]
- ISPC: 2.8 seconds [1]
- Vulkan: 0.005 seconds (200 fps) [2]
[1] github.com/GPSnoopy/RayTr…
[2] github.com/GPSnoopy/RayTr…

Yan Chernikov@TheCherno·3 Dec

I made it FASTER // Code Review youtu.be/mOSirVeP5lo

GPSnoopy@TFautre·13 Aug

@wkjarosz This is likely not what you want to hear but: don't use static initialisation. Use explicit object creation and registration within your main() method. 30 years of Computer Science will thank you (so will your concurrency code, unit tests, and debugging tools).
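
One way the advice above could look in practice, as a minimal sketch. `Shape`, `Circle`, `Square`, `ShapeFactory`, and `make_factory` are all hypothetical names invented for this example; the point is only that registration happens in an ordinary function called from `main()`, with no static initializers that a static library's linker could drop.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};
struct Circle : Shape { std::string name() const override { return "circle"; } };
struct Square : Shape { std::string name() const override { return "square"; } };

// A plain factory object: no globals, no self-registration.
class ShapeFactory {
public:
    using Creator = std::function<std::unique_ptr<Shape>()>;

    void add(std::string key, Creator creator) {
        creators_[std::move(key)] = std::move(creator);
    }

    // Returns nullptr when the key was never registered.
    std::unique_ptr<Shape> create(const std::string& key) const {
        const auto it = creators_.find(key);
        if (it == creators_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// Explicit registration, intended to be called once from main().
ShapeFactory make_factory() {
    ShapeFactory factory;
    factory.add("circle", []() -> std::unique_ptr<Shape> { return std::make_unique<Circle>(); });
    factory.add("square", []() -> std::unique_ptr<Shape> { return std::make_unique<Square>(); });
    return factory;
}
```

Because the factory is an ordinary value, a unit test can build its own instance with only the creators it needs, and nothing depends on initialization order across translation units.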

Wojciech Jarosz@wkjarosz·12 Aug

Any pointers for how to code up a self-registering factory pattern in #c++ that still works in static libraries? Seen many that fix the static initialization order fiasco, but none seem to work from a static lib.

GPSnoopy@TFautre·1 Jul

@skaven_ @Peter_shirley Do you have a reproducible example that you can share? Seems hard to believe you otherwise

GPSnoopy@TFautre·15 Jun

@IanCutress Care to at least give the URL? I've checked a few articles but couldn't easily find the explanation. Worth asking the license owner, no? "Most benchmarks are not open source": this guy would seriously disagree with you -> phoronix-test-suite.com

𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠@IanCutress·15 Jun

@TFautre Technically the license is owned by the publisher of the paper on which the code is based. My student project was 10 years ago. Regardless, many benchmarks are closed. Also, again, I've explained reasons why AVX2 to avx512 speedup is greater than expected. Go find them.

𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠@IanCutress·6 Mar

3DPM Peak AVX test
8-core RKL 4.6 GHz AVX512 290 W ➡ 32845 pts
64-core Zen2 3.0 GHz AVX2 280 W ➡ 28761 pts

GPSnoopy@TFautre·15 Jun

@IanCutress What's stopping you from open-sourcing it? Seriously? It's your student project. Less credibility than a big name benchmark. Getting >5x speed up when using AVX512 compared to AVX2 is not expected just from doubling the register size. Something else is at play here
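
One plausible way to arrive at a figure above 5x from the scores quoted in this thread is to normalize by core count and clock speed. This is an assumption about the reasoning, not something the tweet states, and it treats performance as scaling linearly with cores and GHz, which ignores power limits, boost behaviour, and memory effects. The function name is hypothetical.

```cpp
#include <cassert>

// Benchmark points per core per GHz, assuming linear scaling in both.
double points_per_core_ghz(double points, int cores, double ghz) {
    return points / (static_cast<double>(cores) * ghz);
}

// Ratio of the normalized scores for the two results quoted in the thread:
// 8-core RKL at 4.6 GHz (AVX512, 32845 pts) versus
// 64-core Zen2 at 3.0 GHz (AVX2, 28761 pts).
double normalized_avx512_vs_avx2() {
    const double rkl = points_per_core_ghz(32845.0, 8, 4.6);
    const double zen2 = points_per_core_ghz(28761.0, 64, 3.0);
    return rkl / zen2;
}
```

Under that normalization the ratio comes out a little under 6, whereas doubling the vector width from 256 to 512 bits would by itself suggest at most about 2x.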

𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠@IanCutress·14 Jun

@TFautre No source: No problem. Most benchmarks are not open source. Student Project: What's that got to do about anything? Patched by Intel: OK'ed by AMD. Very surprising results: Not at all. But you do you. I've spoken about this test at length. Not going to repeat just for you.

GPSnoopy@TFautre·14 Jun

@IanCutress Let's be honest: no source, student project, patched by Intel, and very surprising results. It requires a bit more transparency. You have gotten us used to better journalistic standards than that.

𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠@IanCutress·14 Jun

@TFautre Where did I say that the AVX2 path isn't optimized?

GPSnoopy@TFautre·14 Jun

@IanCutress "we also have a fully optimized AVX2/AVX512 version, which uses intrinsics to get the best performance out of the software. This was done by a former Intel AVX-512 engineer who now works elsewhere" anandtech.com/show/16495/int… Implies strong possibility of bias

GPSnoopy retweeted

SwiftOnSecurity@SwiftOnSecurity·21 Oct

History in pics: Testing prototype Roombas in 1982. It would take two decades until they could be made small enough to clean under a couch.

GPSnoopy@TFautre·13 May

@damageboy @ridiculous_fish On a Ryzen 5950X (PBO Off - No overclocking) AVX2:
u32: 7 1.231 0.547 0.551 0.108 0.108 2.013 2
u64: 7 1.845 0.440 0.461 0.274 0.280 2.449 2
Tested using Ubuntu 20.04 on Windows 10 WSL.

damageboy@damageboy·13 May

@ridiculous_fish On an Intel tiger-lake i7-1165G7@2.80GHz (AVX512):
u32: 7 1.321 0.594 0.594 0.109 0.082 2.472 2
u64: 7 2.174 0.528 0.528 0.326 0.277 2.319 2

Peter Ammon@ridiculous_fish·12 May

libdivide and division benchmarked on Apple M1 and Intel AVX512 ridiculousfish.com/blog/posts/ben…
