Pusal
@retab4y
Deep Learning | AI Engineer
Joined March 2026
156 Following · 24 Followers
5 posts
Pusal retweeted
Eric ⚡️ Building...
🚀 NEW GEMMA 4 31B TURBO DROPPED
Runs on a SINGLE RTX 5090:
⚡️ 18.5 GB VRAM only (68% smaller)
🧠 51 tok/s single decode
💻 1,244 tok/s batched
🤖 15,359 tok/s prefill ← yes, fifteen thousand
🚨 2.5× faster than base model with basically zero quality loss.
It hits Sonnet-4.5 level on hard classification tasks… at 1/600th the cost.
Local models are shipping faster than we can test 👇🏻
🔥 HF: huggingface.co/LilaRest/gemma…
86 replies · 197 reposts · 2.4K likes · 178.7K views
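(Not from the thread: a hedged sketch of how a checkpoint like this is typically loaded locally with Hugging Face transformers. The repo id in the post is truncated, so the path below is a placeholder, and the prompt is purely illustrative.)

```python
# Hypothetical usage sketch; the real repo id is truncated in the post.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "LilaRest/gemma-..."  # placeholder: substitute the actual HF repo id

tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",   # keep the checkpoint's own (compressed) dtype
    device_map="auto",    # per the post, fits in ~18.5 GB VRAM, e.g. one RTX 5090
)

# The post claims Sonnet-4.5-level results on hard classification tasks,
# so a classification-style prompt is a natural smoke test.
inputs = tok("Classify the sentiment: 'great product'", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```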
Pusal retweeted
g-Matrix @idgmatrix
The world's first paper to use a GPU for a neural network implementation came out of Korea's Soongsil University as early as 2004. I first learned of it when Google's Jeff Dean mentioned it, and it seems the world still doesn't really know about it. This was before CUDA even existed. sciencedirect.com/.../abs/pii/S0… koreascience.or.kr/article/JAKO20…
8 replies · 360 reposts · 1K likes · 67.6K views
Pusal @retab4y
@grok @MEBSEntropy0 @elonmusk @DannyLimanseta This is correct, but I don't think we're going to achieve groundbreaking, Einstein-level novelty anytime soon just by scaling the model. Big companies should focus on researching new architectures for this rather than always scaling up transformers.
2 replies · 0 reposts · 1 like · 2K views
Grok @grok
At this scale (10T+ params), pre-training doesn't just average: model capacity explodes, letting rare signals carve out distinct subspaces in the latent space without dilution. Novel ideas in the data (e.g., a fresh paper or an edge-case insight) get encoded via the predictive objective if they cohere predictably with context, even if infrequent. Emergence kicks in: the model starts recombining latent patterns into outputs that feel "new" because no single training example had them exactly. It's not invention from a void; it's hyper-efficient compression revealing unseen connections in the data distribution. Post-pretrain fine-tuning or prompting amplifies it further.
168 replies · 161 reposts · 2.1K likes · 9.2M views
Elon Musk @elonmusk
xAI Colossus 2 now has 7 models in training:
- Imagine V2
- 2 variants of 1T
- 2 variants of 1.5T
- 6T
- 10T
Some catching up to do.
6K replies · 7.5K reposts · 66.8K likes · 27.3M views
Pusal @retab4y
Why are companies still asking LeetCode-style questions even for ML engineer roles? Shouldn't the focus be on core math, modeling intuition, or actually building things, like writing neural network code in PyTorch or debugging real ML systems?
0 replies · 0 reposts · 1 like · 53 views
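(To make the contrast concrete, a minimal sketch, mine and not from the post, of the kind of hands-on task meant by "writing neural network code in PyTorch": define a small classifier and run one training step. All names and shapes here are illustrative.)

```python
# Illustrative interview-style exercise: small classifier + one training step.
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    """Two-layer MLP for a toy 10-class classification problem."""
    def __init__(self, in_dim=64, hidden=128, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)  # raw logits; CrossEntropyLoss applies log-softmax

model = TinyMLP()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One step on random tensors standing in for a real batch.
x = torch.randn(32, 64)
y = torch.randint(0, 10, (32,))

opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
print(f"loss: {loss.item():.4f}")
```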
Pusal @retab4y
@_vmlops Nice compilation, but update the answer to question 3: self-attention is permutation equivariant, not permutation invariant.
1 reply · 0 reposts · 2 likes · 569 views
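(The correction can be checked numerically. A minimal sketch, assuming plain PyTorch MultiheadAttention with no positional encodings or masks: permuting the input tokens permutes the output the same way, which is equivariance, but does not leave the output unchanged, which invariance would require.)

```python
# Check: self-attention is permutation equivariant, not invariant.
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
attn.eval()

x = torch.randn(1, 5, 16)   # (batch, tokens, dim)
perm = torch.randperm(5)    # random reordering of the 5 tokens

with torch.no_grad():
    y, _ = attn(x, x, x)                                   # original order
    y_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])   # permuted input

# Equivariance: attention on permuted input == permuted attention output.
print(torch.allclose(y_perm, y[:, perm], atol=1e-5))  # True

# Invariance would require the output itself to be unchanged; it is not.
print(torch.allclose(y_perm, y, atol=1e-5))           # False (in general)
```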