Neeraj ⚡ retweeted
Neeraj ⚡
3.8K posts


more optimisations: took the backward pass speedup from 2x -> 9x -> 11x by
> adding warp shuffle reductions to replace atomicAdd for gradient accumulation
> caching intermediate y_norm_f32 to avoid bf16 -> f32 conversion in backward
> vectorizing memory access with float4 loads/stores for elementwise ops
> removing redundant tensor allocations
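For readers unfamiliar with the first item: a warp shuffle reduction lets the 32 lanes of a warp sum their values through registers, so only one lane per warp has to touch global memory instead of every thread issuing its own atomicAdd. Below is a minimal CPU-side Python simulation of that pattern, not the author's kernel. `shfl_down` is a hypothetical stand-in for CUDA's `__shfl_down_sync`, and `accumulate_grad` is an illustrative helper that also counts how many atomic adds would remain.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def shfl_down(values, offset):
    """Simulate __shfl_down_sync on a full warp: lane i reads lane i + offset.
    Lanes whose source would fall past the end of the warp keep their own value."""
    return [
        values[i + offset] if i + offset < len(values) else values[i]
        for i in range(len(values))
    ]


def warp_reduce_sum(lane_values):
    """Tree reduction with offsets 16, 8, 4, 2, 1:
    after log2(32) = 5 steps, lane 0 holds the sum of all 32 lanes."""
    assert len(lane_values) == WARP_SIZE
    vals = list(lane_values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [v + s for v, s in zip(vals, shifted)]
        offset //= 2
    return vals[0]  # only lane 0's value is the full sum


def accumulate_grad(per_thread_grads):
    """Sum per-thread gradient contributions warp by warp.
    Returns (total, atomic_adds): one add per warp (issued by lane 0)
    instead of one atomicAdd per thread."""
    total = 0.0
    atomic_adds = 0
    for start in range(0, len(per_thread_grads), WARP_SIZE):
        warp = list(per_thread_grads[start:start + WARP_SIZE])
        warp += [0.0] * (WARP_SIZE - len(warp))  # inactive lanes contribute zero
        total += warp_reduce_sum(warp)
        atomic_adds += 1
    return total, atomic_adds


total, adds = accumulate_grad([float(i) for i in range(128)])
print(total, adds)  # 8128.0 4
```

With 128 contributions the sketch needs 4 adds instead of 128, which is the kind of reduction in global atomics the tweet describes.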

Archie Sengupta @archiexzzz
Pic 1: Tested raw GH200 - 480GB. Pic 2: Optimized the backward pass. Before, the backward kernels were calling cudaMallocAsync/cudaFreeAsync inside the hot path to allocate temporary buffers for the parallel reduction, which added ~400μs of overhead per op (at least on GH200). Small fix I added: for small n (≤8), replaced the malloc + two-kernel reduction with a single kernel that uses atomic adds to shared memory → no allocation needed. Backward pass speedup improved from 2.9x → 9.7x for the 320×1280×4 config.
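The shape of the small-n fix can be sketched the same way. This is a hedged Python simulation, not the actual kernel: it assumes n is the number of values each reduction produces and that each "block" handles a fixed number of rows, and both function names are hypothetical. The two-kernel version needs a temporary partials buffer sized by the number of blocks (the kind of allocation that was happening in the hot path), while the single-kernel version only needs an n-slot array per block, small enough to sit in shared memory.

```python
def two_kernel_reduce(rows, n, rows_per_block):
    """Kernel 1: each block writes n partial sums into a temporary buffer
    (standing in for the per-call cudaMallocAsync allocation).
    Kernel 2: reduce those partials into the final n outputs."""
    num_blocks = (len(rows) + rows_per_block - 1) // rows_per_block
    partials = [[0.0] * n for _ in range(num_blocks)]  # temp buffer, num_blocks x n
    for b in range(num_blocks):
        for row in rows[b * rows_per_block:(b + 1) * rows_per_block]:
            for j in range(n):
                partials[b][j] += row[j]
    out = [0.0] * n
    for b in range(num_blocks):  # the second "kernel"
        for j in range(n):
            out[j] += partials[b][j]
    return out


def single_kernel_shared_atomics(rows, n, rows_per_block):
    """One pass, no temp buffer: each block accumulates into an n-slot array
    (standing in for atomicAdd on shared memory), then folds it into the
    output (standing in for one global atomicAdd per slot per block)."""
    assert n <= 8, "sketch of the small-n path only"
    out = [0.0] * n
    for start in range(0, len(rows), rows_per_block):
        shared = [0.0] * n  # per-block scratch; on a GPU this would be shared memory
        for row in rows[start:start + rows_per_block]:
            for j in range(n):
                shared[j] += row[j]
        for j in range(n):
            out[j] += shared[j]
    return out


rows = [[float(i + j) for j in range(4)] for i in range(320)]  # toy input: 320 rows, n = 4
a = two_kernel_reduce(rows, 4, rows_per_block=64)
b = single_kernel_shared_atomics(rows, 4, rows_per_block=64)
print(a == b)  # True
```

Both functions add the same values in the same order, so the outputs match exactly. On a GPU, the per-block array stays tiny because n ≤ 8, and no temporary global buffer has to be allocated or freed on each call.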

For this holiday release, we've focused entirely on fixing bugs and improving reliability.
We know how much stability matters in a tool that you use every day.
Read the full list of bug fixes and improvements:
cursor.com/changelog/2-3
Neeraj ⚡ retweeted

NEW: Tennis star Kamil Majchrzak is looking for a young boy who had a hat snatched from him by a grown man at the US Open.
Majchrzak was seen trying to hand the boy his hat when a grown man took it and stashed it in a bag.
"After the match, I didn't record that my cap didn't get to the boy ... Could you help me find the kid from my match?" Majchrzak said on IG.
Neeraj ⚡ retweeted

Address to the nation. twitter.com/i/broadcasts/1…

India opens three gates of Salal Dam in Jammu and Kashmir's Reasi district and one gate of Baglihar Dam in Ramban district
The government cited rising water levels due to rainfall as a reason behind this move
@sidhant joins @SehgalRahesha for more on this
Neeraj ⚡ retweeted


@ft_sidd @AravSrinivas @AskPerplexity It's also about taking a stand for your beliefs. Not saying what you said is wrong, but these two things don't need to be mutually exclusive.

@AravSrinivas @AskPerplexity Not sure this is the right moment to boost product adoption. Bad taste.

@leerob For me, the worst offender is the tendency to use useEffect as a handler for user actions. Such code keeps running into max call stack errors as the complexity grows.
So, evals that limit useEffect usage to work that needs to happen on mount/unmount would be great 🙏

this kid is literally the most successful 18 y/o in tech on the planet and he gets rejected from fucking NYU???? lol???
sentiment towards higher education wouldn't be this negative if admissions wasn't run by blue-haired baristas who pick sob stories over actual talent.
Zach Yadegari @zach_yadegari
18 years old 34 ACT 4.0 GPA $30M ARR biz Stanford ❌ MIT ❌ Harvard ❌ Yale ❌ WashU ❌ Columbia ❌ UPenn ❌ Princeton ❌ Duke ❌ USC ❌ Georgia Tech ✅ UVA ❌ NYU ❌ UT ✅ Vanderbilt ❌ Brown ❌ UMiami ✅ Cornell ❌