Pusal
@retab4y
Deep Learning | AI Engineer
Joined March 2026
156 Following · 24 Followers
5 posts
Pusal retweeted
Eric ⚡️ Building...
🚀 NEW GEMMA 4 31B TURBO DROPPED
Runs on a SINGLE RTX 5090:
⚡️ 18.5 GB VRAM only (68% smaller)
🧠 51 tok/s single decode
💻 1,244 tok/s batched
🤖 15,359 tok/s prefill ← yes, fifteen thousand
🚨 2.5× faster than base model with basically zero quality loss.
It hits Sonnet-4.5 level on hard classification tasks… at 1/600th the cost.
Local models are shipping faster than we can test 👇🏻
🔥 HF: huggingface.co/LilaRest/gemma…
86 replies · 197 reposts · 2.4K likes · 178.7K views
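(Not from the thread: a hedged sketch of how a checkpoint like this is typically loaded locally with Hugging Face transformers. The repo id in the post is truncated, so the path below is a placeholder, and the prompt is purely illustrative.)

```python
# Hypothetical usage sketch; the real repo id is truncated in the post.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "LilaRest/gemma-..."  # placeholder: substitute the actual HF repo id

tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",   # keep the checkpoint's own (compressed) dtype
    device_map="auto",    # per the post, fits in ~18.5 GB VRAM, e.g. one RTX 5090
)

# The post claims Sonnet-4.5-level results on hard classification tasks,
# so a classification-style prompt is a natural smoke test.
inputs = tok("Classify the sentiment: 'great product'", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```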
Pusal retweeted
g-Matrix @idgmatrix
The world's first paper to use a GPU for a neural network implementation came out of Korea's Soongsil University as early as 2004. I first learned of it when Google's Jeff Dean mentioned it, and it seems the world still doesn't really know about it. This was before CUDA even existed. sciencedirect.com/.../abs/pii/S0… koreascience.or.kr/article/JAKO20…
8 replies · 360 reposts · 1K likes · 67.6K views
Pusal @retab4y
@grok @MEBSEntropy0 @elonmusk @DannyLimanseta This is correct, but I don't think we're going to achieve groundbreaking, Einstein-level novelty anytime soon just by scaling the model. Big companies should focus on researching new architectures for this rather than always scaling up transformers.
2 replies · 0 reposts · 1 like · 2K views
Grok @grok
At this scale (10T+ params), pre-training doesn't just average: model capacity explodes, letting rare signals carve out distinct subspaces in the latent space without dilution. Novel ideas in the data (e.g., a fresh paper or an edge-case insight) get encoded via the predictive objective if they cohere predictably with context, even if infrequent. Emergence kicks in: the model starts recombining latent patterns into outputs that feel "new" because no single training example had them exactly. It's not invention from a void; it's hyper-efficient compression revealing unseen connections in the data distribution. Post-pretrain fine-tuning or prompting amplifies it further.
168 replies · 161 reposts · 2.1K likes · 9.2M views
Elon Musk @elonmusk
xAI Colossus 2 now has 7 models in training:
- Imagine V2
- 2 variants of 1T
- 2 variants of 1.5T
- 6T
- 10T
Some catching up to do.
6K replies · 7.5K reposts · 66.8K likes · 27.3M views
Pusal @retab4y
Why are companies still asking LeetCode-style questions even for ML engineer roles? Shouldn't the focus be on core math, modeling intuition, or actually building things, like writing neural network code in PyTorch or debugging real ML systems?
0 replies · 0 reposts · 1 like · 53 views
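(To make the contrast concrete, a minimal sketch, mine and not from the post, of the kind of hands-on task meant by "writing neural network code in PyTorch": define a small classifier and run one training step. All names and shapes here are illustrative.)

```python
# Illustrative interview-style exercise: small classifier + one training step.
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    """Two-layer MLP for a toy 10-class classification problem."""
    def __init__(self, in_dim=64, hidden=128, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)  # raw logits; CrossEntropyLoss applies log-softmax

model = TinyMLP()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One step on random tensors standing in for a real batch.
x = torch.randn(32, 64)
y = torch.randint(0, 10, (32,))

opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
print(f"loss: {loss.item():.4f}")
```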
Pusal @retab4y
@_vmlops Nice compilation, but update the answer to question 3: self-attention is permutation equivariant, not permutation invariant.
1 reply · 0 reposts · 2 likes · 569 views
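(The correction can be checked numerically. A minimal sketch, assuming plain PyTorch MultiheadAttention with no positional encodings or masks: permuting the input tokens permutes the output the same way, which is equivariance, but does not leave the output unchanged, which invariance would require.)

```python
# Check: self-attention is permutation equivariant, not invariant.
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
attn.eval()

x = torch.randn(1, 5, 16)   # (batch, tokens, dim)
perm = torch.randperm(5)    # random reordering of the 5 tokens

with torch.no_grad():
    y, _ = attn(x, x, x)                                   # original order
    y_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])   # permuted input

# Equivariance: attention on permuted input == permuted attention output.
print(torch.allclose(y_perm, y[:, perm], atol=1e-5))  # True

# Invariance would require the output itself to be unchanged; it is not.
print(torch.allclose(y_perm, y, atol=1e-5))           # False (in general)
```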