罗杰斯

544 posts

@dhbrojas

AGI @ Poolside 🏖️ Prev. https://t.co/vrJX6VP8I0, Tsinghua University

Paris · Joined April 2020
1.1K Following · 435 Followers
Pinned Tweet
罗杰斯
罗杰斯@dhbrojas·
Life Update: In June, after 3.5 years, I left Z.ai to join Poolside as an Applied Researcher on the Pre-training Data team 🏖️
English
11
0
72
7.8K
罗杰斯 retweeted
the tiny corp
the tiny corp@__tinygrad__·
@jayair I'm an emo kid, non conforming as can be. You'd be non conforming too if you looked just like me.
English
2
2
109
4K
罗杰斯 retweeted
maharshi
maharshi@maharshii·
ML perf is in its JS framework era, i love it.
Bohan Hou@bohanhou1998

We release TIRx today, a minimal compiler stack and hardware-native DSL for frontier ML kernels, built around storage-first tensor layouts and reusable tile primitives. tvm.apache.org/2026/06/22/tirx

On NVIDIA B200, TIRx delivers up to ~1.08× over cuBLASLt on dense GEMM, outperforms DeepGEMM on all FP8 blockwise workloads with up to ~1.09× speedup, keeps FlashAttention-4 (FA4) typically within ~±2% of CuTeDSL, and remains competitive with cuBLASLt/FlashInfer on NVFP4 GEMM.

Through our past experiences building frontier ML kernels, megakernels, and agentic kernel systems, we kept seeing the same boundary problem: new operators and new hardware require new optimization strategies that often break old programming models or compiler passes. TIRx builds on top of Apache TVM and moves toward a simple goal: let users and agents express the best-performing program, even for future hardware generations, while keeping the engineering effort for new kernels and new hardware as low as possible.

English
2
2
85
6.6K
罗杰斯
罗杰斯@dhbrojas·
@tmuxvim Vague-posting like this is a crime BTW
English
0
0
0
149
罗杰斯
罗杰斯@dhbrojas·
Trust me bro, it's only $20K bro, you only need four of them bro, you can run GLM 5.2 REAP INT2 at 20 tokens/s bro
English
28
18
505
36.3K
罗杰斯
罗杰斯@dhbrojas·
@1casie Glad to hear! Please don't hesitate to share some feedback, positive or negative 🫶
English
0
0
1
56
🜑
🜑@1casie·
@dhbrojas indeed, it wasn't (i had m.1 implement pliny's glossopetrae in pi! it's good, i'm proud of u and whatever)
English
1
0
0
69
罗杰斯
罗杰斯@dhbrojas·
@CRC_8341 On my way to work, I open X, the everything app, I’m instantly blasted with TusPark Wudaokou-core, Arcadia, the memories come flooding in
English
0
0
10
1.4K
罗杰斯
罗杰斯@dhbrojas·
Even with GLM 5.2, using Chinese models doesn’t make much sense when your average John Westernman is hooked on heavily subsidized OpenAI/Anthropic subscriptions. That’s mostly until the sun sets on AI’s $5 Uber era.
English
2
1
25
3.1K
罗杰斯
罗杰斯@dhbrojas·
@guojing0 I don’t run GLM 5.2 but I have some NVIDIA 4090s, 5090s and soon Tenstorrent Blackhole p300s
English
1
0
0
105
Jing Guo
Jing Guo@guojing0·
@dhbrojas What/Which GPUs are you running with?
English
1
0
0
27
Joel Grus 🤠
Joel Grus 🤠@joelgrus·
guy who buys $20k of hardware to avoid paying these astronomical prices
Scenic Oaks, TX 🇺🇸 English
60
19
1.6K
134.9K
罗杰斯 retweeted
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
GLM 5.2 is one *of the* greatest gap reductions ever, but I think it is *the* greatest show of benchmark solidity from an open model claiming SoTA ever. Normally, you have some variety of the bad old Qwen pattern: headline benchmarks are SoTA+, new OOD ones are ≈8 months behind, and real experience is spiky, competitive in places, but usually ≈1 year behind, and sometimes utterly falling apart. Knock on it and hear the hollow sound. Yes, even DeepSeek. Not so here. There's no progressive decay. It's "Opus 4.5-4.7ish" throughout, in anything of value that you throw at it. It is the first truly, completely solid Chinese model. A phase change, I hope.
Elliot Arledge@elliotarledge

Beyond the megakernel, a 6-problem hard CUDA/Triton deck. Speedup is over torch.compile (a strong baseline, not naive PyTorch). Paged attention is where compile falls down and a real kernel runs away with it: Opus 4.8 hits 56.8x on B200.

English
13
21
597
52.8K
Arthur Zucker
Arthur Zucker@art_zucker·
People always dunk on France and our issues with #clim but not many know about Fraîcheur de Paris. Paris has a mutualized cooling system that exchanges heat with the cool Seine and pumps through the entire city. Individual AC is just so stupid compared to that: ~35% less electricity, ~50% less CO₂, ~90% less refrigerant emissions and much lower cost once installed.
English
9
0
10
4.7K
罗杰斯
罗杰斯@dhbrojas·
@eric_alcaide My hot take is that they should bring back GLM Flash 🫣
English
0
0
2
106