Dev Patel
1.1K posts

@devnp2007
CSE'29 @ Nirma 🇮🇳 | Robotics | Hardware | Software | AI/ML | DevOps | Chips

triton, gluon, cutedsl, hopper, blackwell, tensorcores, layouts, composition, local_tile, partitionS, partitionD, wgmma, tcgen05, TMA, block scaling, coalesced access, ampere, ada lovelace, cutlass, cublas, cudnn, flash attention, gemm, sgemm, fp16, bf16, mxfp8, nvfp4, int4, quantization, mixed precision, occupancy, reductions, warp divergence, bank conflicts, memory coalescing, shared memory, global memory, texture memory, constant memory, unified memory, epilogues, kernel fusion, graph optimization, tensorrt, torch compile, dynamo, inductor, graph capture, thread blocks, warps, SIMT, streaming multiprocessors, L1 cache, L2 cache, register spilling, thread divergence, memory bandwidth, compute capability, CUDA cores, ldg, stg, ncu, nsys, atomic operations, syncthreads, cooperative groups, dynamic parallelism, persistent kernels, vectorized loads, static quantization, tensors, swizzling, predication, instruction throughput, memory latency hiding...

AI education is moving from “watch tutorials” to “ship artifacts.” That’s why I built AI Engineering from Scratch: 416+ lessons, 20 phases. Every lesson outputs something reusable: prompts, skills, agents, MCP servers. Learn AI with AI, then ship tools. github.com/rohitg00/ai-en…

200 followers. next stop 300. if you’re in tech, let’s connect.
