Pinned Tweet
Sagnik


@vllm_project engineers are crazy. Most of the components are art.
I was going through the GPU memory part, like how they decide it: a 12 GB VRAM GPU at 95% utilization -> 11.4 GB. Within that budget, how do the whole model weights, all the KV cache, CUDA graphs, and other runtime memory load and run?
Their technique (rough code sketch below):
> load the model weights
> run a dummy forward pass (fake inputs)
> also track other memory reqs (kernels)
> now we have --> weights + others (CUDA/custom)
> the rest goes to the KV cache
> same for all models (text/mm)
> so proper GPU utilization
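here's the idea reconstructed in CUDA C++. this is my own sketch, not vLLM's actual code (vLLM does this in Python on top of PyTorch); load_model_weights and dummy_forward_pass are hypothetical placeholders:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    const double util = 0.95;                    // the gpu-memory-utilization knob
    size_t budget = (size_t)(total_b * util);    // 12 GB card -> ~11.4 GB budget

    size_t free_before = 0;
    cudaMemGetInfo(&free_before, &total_b);
    // load_model_weights();   // hypothetical: weight allocations happen here
    // dummy_forward_pass();   // hypothetical: peak activations + kernel workspaces
    cudaDeviceSynchronize();
    size_t free_after = 0;
    cudaMemGetInfo(&free_after, &total_b);

    // whatever the weights + runtime consumed is "non-KV"; the rest of the
    // budget becomes the KV cache, same recipe for text and multimodal models
    size_t non_kv = free_before - free_after;
    size_t kv_budget = budget > non_kv ? budget - non_kv : 0;
    printf("KV cache gets %zu bytes\n", kv_budget);
    return 0;
}
```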

@jbhuang0604 What an algo! I was just watching your FlashAttention video 😅

I'm building a GPT-2 inference engine from scratch in CUDA.
the project focuses on implementing and optimizing transformer inference kernels directly at the CUDA level, with an emphasis on reduction strategies, memory behavior, numerical stability, and kernel-level performance optimization.
by the end, this engine will take a prompt, tokenize it, run a full GPT-2 forward pass using custom CUDA kernels, and autoregressively generate text, with a KV cache for fast decoding. every operation from embedding lookup to sampling runs through kernels written and profiled from scratch.
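the KV cache part in one picture: each decode step appends the new token's K/V row, and attention reads the cache instead of recomputing past keys/values, so step t costs O(t*d) instead of O(t^2*d). a minimal sketch, assuming a per-head cache laid out [seq_len, head_dim] (names are mine, not the repo's):

```cuda
// append step t's key/value vectors into row t of the cache
__global__ void kv_cache_append(const float* k_new, const float* v_new,
                                float* k_cache, float* v_cache,
                                int t, int head_dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < head_dim) {
        k_cache[t * head_dim + i] = k_new[i];
        v_cache[t * head_dim + i] = v_new[i];
    }
}
```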
things planned for the repo:
- attention kernels
- causal masking
- KV cache, and a lot more.
things i've implemented so far (rough sketches of each right after the list):
- tiled matmul with shared memory
- online softmax
- 3-pass layernorm: benchmarked against Welford across all GPT-2 model sizes; using the 3-pass version as the primary LN kernel
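a standard shared-memory tiled matmul as a reference sketch (assumes square N x N row-major matrices with N divisible by TILE; the repo's kernel may differ in details):

```cuda
#define TILE 16

// C = A * B; each block computes one TILE x TILE tile of C, staging tiles of
// A and B through shared memory so each global element is loaded once per tile
__global__ void matmul_tiled(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int k0 = 0; k0 < N; k0 += TILE) {
        As[threadIdx.y][threadIdx.x] = A[row * N + (k0 + threadIdx.x)];
        Bs[threadIdx.y][threadIdx.x] = B[(k0 + threadIdx.y) * N + col];
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = acc;
}
```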
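the online softmax recurrence, written serially per row so the math is visible (launch with <<<1,1>>>; a real kernel parallelizes this as a reduction):

```cuda
#include <math.h>

// single pass keeps a running max m and running denominator d, rescaling d
// by exp(m_old - m_new) whenever the max grows (Milakov & Gimelshein, 2018)
__global__ void softmax_online(const float* x, float* y, int n) {
    float m = -INFINITY, d = 0.0f;
    for (int i = 0; i < n; ++i) {
        float m_new = fmaxf(m, x[i]);
        d = d * expf(m - m_new) + expf(x[i] - m_new);
        m = m_new;
    }
    for (int i = 0; i < n; ++i)
        y[i] = expf(x[i] - m) / d;
}
```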
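and the 3-pass LN structure, one thread per row for clarity (Welford fuses passes 1 and 2 into a single streaming pass, which is the trade-off the benchmark measured):

```cuda
// pass 1: mean, pass 2: variance, pass 3: normalize + affine
__global__ void layernorm_3pass(const float* x, float* y,
                                const float* gamma, const float* beta,
                                int rows, int cols, float eps) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    const float* xr = x + r * cols;
    float mean = 0.0f;
    for (int c = 0; c < cols; ++c) mean += xr[c];
    mean /= cols;
    float var = 0.0f;
    for (int c = 0; c < cols; ++c) {
        float d = xr[c] - mean;
        var += d * d;
    }
    var /= cols;
    float inv_std = rsqrtf(var + eps);
    for (int c = 0; c < cols; ++c)
        y[r * cols + c] = (xr[c] - mean) * inv_std * gamma[c] + beta[c];
}
```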
benchmarks and profiling done on RTX 3050 laptop GPU.
repo:
github.com/Mog9/gpt2-infe…