GPSnoopy

20 posts

@TFautre

Incompetent C++ and C# developer obsessed with performance.

Joined May 2021
6 Following · 4 Followers
GPSnoopy@TFautre·
@Yxuer @Peter_shirley As a baseline, using the NVIDIA CUDA backend for Vulkan Raytracing, at 2560x1440 (8 samples, 16 bounces) on a GTX 1080 Ti: 3.4 FPS (0.29s per frame). There is probably some nice low-hanging performance fruit to pick in your fun project.
Yxuer@Yxuer·
@Peter_shirley So I took your "Raytracing in One Weekend" tutorials some years ago, and now I've reimplemented it in CUDA as a personal project! So far it's going cool! It can generate the random spheres scene (1200x675 pixels, 10 samples per pixel) in 4.5 seconds approx!
GPSnoopy@TFautre·
@DefPriPub @Peter_shirley Also, conceptually there is no reason for `final` to go faster, as all the method calls in RTIOW go through the base pointer and are virtual. Only if the pointer is to the final class will the compiler infer that the method call is direct and non-virtual.
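A minimal sketch of the point above, assuming a simplified hierarchy. `Hittable`, `Sphere` and `hit_count` are hypothetical stand-ins, not the actual RTIOW classes:

```cpp
// Hypothetical stand-ins for a ray tracer's object hierarchy.
struct Hittable {
    virtual ~Hittable() = default;
    virtual int hit_count() const = 0;
};

struct Sphere final : Hittable {
    int hit_count() const override { return 1; }
};

// Call through the base pointer: the dynamic type is not known here, so in
// general this remains a virtual dispatch whether or not Sphere is `final`.
int call_via_base(const Hittable* h) { return h->hit_count(); }

// Call through a pointer to the final class: no further override can exist,
// so the compiler is allowed to emit a direct (and inlinable) call.
int call_via_final(const Sphere* s) { return s->hit_count(); }
```

Both functions return the same value; the difference only shows up in the generated code, which is why `final` does little when every call site holds a base pointer.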
GPSnoopy@TFautre·
@DefPriPub If you really mean business on the CPU, then ISPC beats GCC and Clang by a healthy margin. It has a similar paradigm to GPU programming languages. Example: github.com/GPSnoopy/RayTr… `final` is just noise compared to any of these approaches.
GPSnoopy@TFautre·
@DefPriPub I've found the following to yield the best results on an i9-9900K for RT: g++ -O3 -ffast-math -march=skylake. -ffast-math relaxes strict IEEE compliance, so the compiler can use faster SSE/AVX instruction sequences instead of slow IEEE-compliant routines. -march=skylake enables FMA instructions. Overall it goes more than 2x faster.
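As a rough illustration of where FMA can apply, here is a sketch of a ray/sphere style discriminant. The function names are hypothetical, and whether the compiler actually fuses the first version depends on the flags and compiler version:

```cpp
#include <cmath>

// Build with the flags quoted above (an assumption about your setup):
//   g++ -O3 -ffast-math -march=skylake example.cpp

// Discriminant of a quadratic, b*b - 4*a*c, as used in ray/sphere tests.
// With FMA available, the compiler may contract the multiply and subtract
// into a single fused multiply-add instruction.
double discriminant(double a, double b, double c) {
    return b * b - 4.0 * a * c;
}

// The same computation with the fused operation requested explicitly via
// std::fma, which computes b*b + (-4*a*c) with a single rounding.
double discriminant_fma(double a, double b, double c) {
    return std::fma(b, b, -4.0 * a * c);
}
```

Inspecting the generated assembly (for example with `g++ -S`) is the reliable way to confirm which instructions were emitted.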
GPSnoopy@TFautre·
@ID_AA_Carmack The real question is how come memory bandwidth utilization is not reported as a base metric by all the common OSes, like they do for CPU & Disk IO. Given that all algorithms are either CPU, memory or IO bound, it seems like a rather unfortunate blind spot to have.
John Carmack@ID_AA_Carmack·
Virtual CPU utilization can be tracked and balanced neatly in cloud datacenters, but what about memory? It seems like there should be more NUMA scalability going on. Is there instrumentation about how much memory is getting intensively used, vs just allocated?
GPSnoopy@TFautre·
@d0cTB @PayPalFrance Not that I'm a legal expert, but normally GDPR covers this case and gives you considerable legal weight. Wikipedia: "right to contest any automated decision-making that was made on a solely algorithmic basis, and their right to file complaints with a Data Protection Authority"
Doc TB@d0cTB·
After 7 calls, a mind-boggling conversation with a "supervisor" at @PayPalFrance: "It's the system. We can't know, we have no control, we can't do anything. I'm sorry but that's how it is, you won't get any further explanation, it's the algorithm". Scary. 😆
Doc TB@d0cTB

I knew there was a class action in the US, but I've just had my 20-year-old @PayPalFrance account permanently banned, without notice or explanation, and the money (€200) confiscated. Impossible to reach anyone. It feels very mafia-like...😬 engadget.com/paypal-lawsuit…

GPSnoopy retweeted
flavio@flaviocopes·
flavio tweet media
GPSnoopy@TFautre·
@d0cTB Single channel? The bandwidth estimates look low in general. I assume Memtest86+ is using the BCOPY convention (en.wikipedia.org/wiki/Memory_ba…)? Might be good to clarify in the UI. Personally I prefer the Hardware convention, as it's closer to the official memory specifications.
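To show why the convention matters, here is a small sketch that converts one timed memory copy into a figure under each convention. The function name and the traffic model are assumptions for illustration: bcopy counts each copied byte once, while the hardware figure is modelled as one read plus one write per byte, plus one extra read of the destination when the cache uses write-allocate.

```cpp
#include <cassert>
#include <cstdint>

enum class Convention { Bcopy, Hardware };

// Reported bandwidth in GB/s (1e9 bytes/s) for a copy of `bytes_copied`
// bytes that took `seconds`. The hardware factors are an idealized model,
// not a measurement: real traffic would need hardware counters.
double copy_bandwidth_gb_per_s(std::uint64_t bytes_copied, double seconds,
                               Convention convention,
                               bool write_allocate = true) {
    assert(seconds > 0.0);
    double transfers_per_byte = 1.0;  // bcopy: each copied byte counted once
    if (convention == Convention::Hardware) {
        // read source + write destination (+ write-allocate read)
        transfers_per_byte = write_allocate ? 3.0 : 2.0;
    }
    return transfers_per_byte * static_cast<double>(bytes_copied) / seconds / 1e9;
}
```

Under this model the same copy can be reported as 2 GB/s or 6 GB/s, which is one way an estimate can look low next to the memory's rated bandwidth.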
Doc TB@d0cTB·
👀
GPSnoopy@TFautre·
@EyezCG I don't think it should take 6h to render at that resolution, even single-threaded. I suggest checking this C++ version as a performance baseline: github.com/GPSnoopy/RayTr… The ISPC version is the best I managed on the CPU, and serves as a good basis for a pure CUDA version.
Eyez_CG@EyezCG·
Takes 6h to render a 1000 x 666 image🥺🧨
Eyez_CG@EyezCG·
#raytracing in one weekend (1 month) made by following @Peter_shirley ‘s down to earth book. Amazing tutorial for someone (me) who gets irritably itchy if didn’t understand every nuance of how things work. The very next thing I wanna do is to parallel it on GPU so it doesn’t
GPSnoopy@TFautre·
@wkjarosz This is likely not what you want to hear but: don't use static initialisation. Use explicit object creation and registration within your main() method. 30 years of Computer Science will thank you (so will your concurrency code, unit tests, and debugging tools).
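A minimal sketch of what explicit registration could look like. `Material`, `Lambertian`, `Metal`, `MaterialFactory` and `make_factory` are hypothetical names chosen for illustration:

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Material {
    virtual ~Material() = default;
    virtual std::string name() const = 0;
};
struct Lambertian : Material {
    std::string name() const override { return "lambertian"; }
};
struct Metal : Material {
    std::string name() const override { return "metal"; }
};

class MaterialFactory {
public:
    using Creator = std::function<std::unique_ptr<Material>()>;

    // Returns false if the key was already registered.
    bool add(const std::string& key, Creator creator) {
        return creators_.emplace(key, std::move(creator)).second;
    }

    // Returns nullptr for an unknown key.
    std::unique_ptr<Material> create(const std::string& key) const {
        const auto it = creators_.find(key);
        if (it == creators_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// Meant to be called explicitly from main(). No global constructors are
// involved, so nothing depends on static initialization order or on the
// linker keeping an otherwise unreferenced object file from a static library.
MaterialFactory make_factory() {
    MaterialFactory factory;
    factory.add("lambertian", []() -> std::unique_ptr<Material> {
        return std::make_unique<Lambertian>();
    });
    factory.add("metal", []() -> std::unique_ptr<Material> {
        return std::make_unique<Metal>();
    });
    return factory;
}
```

The cost is one line per type in `make_factory`; in exchange the factory is an ordinary object that unit tests can build and tear down at will.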
Wojciech Jarosz@wkjarosz·
Any pointers for how to code up a self-registering factory pattern in #c++ that still works in static libraries? Seen many that fix the static initialization order fiasco, but none seem to work from a static lib.
GPSnoopy@TFautre·
@skaven_ @Peter_shirley Do you have a reproducible example that you can share? Otherwise it seems hard to believe.
GPSnoopy@TFautre·
@IanCutress Care to at least give the URL? I've checked a few articles but couldn't easily find the explanation. Worth asking the license owner, no? "Most benchmarks are not open source": this guy would seriously disagree with you -> phoronix-test-suite.com
𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠
@TFautre Technically the license is owned by the publisher of the paper on which the code is based. My student project was 10 years ago. Regardless, many benchmarks are closed. Also, again, I've explained reasons why the AVX2 to AVX512 speedup is greater than expected. Go find them.
GPSnoopy@TFautre·
@IanCutress What's stopping you from open-sourcing it? Seriously? It's your student project. It has less credibility than a big-name benchmark. Getting a >5x speedup when using AVX512 compared to AVX2 is not expected just from doubling the register size. Something else is at play here.
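The register-width part of that argument can be written out as simple arithmetic. This sketch covers only the lane count per register; other AVX-512 features such as the larger register file, mask registers and additional instructions are outside it. The helper name `lanes` is hypothetical:

```cpp
// Number of elements that fit in one SIMD register.
constexpr int lanes(int register_bits, int element_bits) {
    return register_bits / element_bits;
}

constexpr int avx2_f32_lanes = lanes(256, 32);    // 256-bit registers
constexpr int avx512_f32_lanes = lanes(512, 32);  // 512-bit registers

// Doubling the register width doubles the lanes, so width alone
// accounts for at most a 2x factor.
static_assert(avx512_f32_lanes == 2 * avx2_f32_lanes,
              "512-bit registers hold twice as many 32-bit lanes");
```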
𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠
@TFautre No source: No problem. Most benchmarks are not open source. Student Project: What's that got to do about anything? Patched by Intel: OK'ed by AMD. Very surprising results: Not at all. But you do you. I've spoken about this test at length. Not going to repeat just for you.
GPSnoopy@TFautre·
@IanCutress Let's be honest: no source, student project, patched by Intel, and very surprising results. It requires a bit more transparency. You have gotten us used to better journalistic standards than that.
GPSnoopy@TFautre·
@IanCutress "we also have a fully optimized AVX2/AVX512 version, which uses intrinsics to get the best performance out of the software. This was done by a former Intel AVX-512 engineer who now works elsewhere" anandtech.com/show/16495/int… This implies a strong possibility of bias.
GPSnoopy retweeted
SwiftOnSecurity@SwiftOnSecurity·
History in pics: Testing prototype Roombas in 1982. It would take two decades until they could be made small enough to clean under a couch.
GPSnoopy@TFautre·
@damageboy @ridiculous_fish On a Ryzen 5950X (PBO off, no overclocking), AVX2:
u32: 7 1.231 0.547 0.551 0.108 0.108 2.013 2
u64: 7 1.845 0.440 0.461 0.274 0.280 2.449 2
Tested using Ubuntu 20.04 on Windows 10 WSL.
damageboy@damageboy·
@ridiculous_fish On an Intel Tiger Lake i7-1165G7 @ 2.80GHz (AVX512):
u32: 7 1.321 0.594 0.594 0.109 0.082 2.472 2
u64: 7 2.174 0.528 0.528 0.326 0.277 2.319 2