Zihao Ye

230 posts

@ye_combinator

Seattle · Joined October 2017
617 Following · 2K Followers
Zihao Ye retweeted
Hassan Hayat 🔥@TheSeaMouse·
Codex laughs at your petty guardrails
81 replies · 291 reposts · 6.2K likes · 319.6K views
Zihao Ye retweeted
You Jiacheng@YouJiacheng·
DCA is cool cuz it uses different aggregators for QKV. The main strengths of AttnRes are: 1. co-design with PP (block ver.) and inference (2-stage batching); 2. a large-scale & solid baseline. I believe we can find the same core idea in even earlier papers; literature review is hard.
Ali Behrouz@behrouz_ali

This paper is the same as the DeepCrossAttention (DCA) method from more than a year ago: arxiv.org/abs/2502.06785. As far as I understood, there is no innovation here to be excited about, and yet surprisingly there is no citation or discussion of DCA! The level of redundancy in LLM research, and then the hype on X, is getting worse and worse! DeepCrossAttention is built on the intuition that depth-wise cross-attention allows richer interactions between layers at different depths. DCA further provides both empirical and theoretical results to support this approach.

1 reply · 4 reposts · 52 likes · 5.2K views
Zihao Ye retweeted
Ethan He@EthanHe_42·
My last open-source project before joining xAI is just out today. Megatron Core MoE is probably the best open framework out there to seriously train mixture of experts at scale. It achieves 1233 TFLOPS/GPU for DeepSeek-V3-685B. arxiv.org/abs/2603.07685
39 replies · 106 reposts · 992 likes · 80.5K views
Zihao Ye retweeted
Claude@claudeai·
Introducing Code Review, a new feature for Claude Code. When a PR opens, Claude dispatches a team of agents to hunt for bugs.
2.1K replies · 5.2K reposts · 63K likes · 23.3M views
Zihao Ye retweeted
Shiyi Cao@shiyi_c98·
🤖🤖 Tried something fun today: asked Claude Code to create an agent team (an Implementer + a Planner) to implement the flashinfer mla paged decode CUDA kernel. The Implementer spent ~20 turns writing tests and debugging to use wgmma but kept getting stuck.😵‍💫😵‍💫😵‍💫 The Planner noticed the stagnation and (at its own decision) went off to carefully read the CUTLASS docs 📚, found the bug, and suggested the fix — the Implementer applied the fix and it worked immediately! Watching this kind of emergent coordination behaviour in an agent team is pretty interesting✨
4 replies · 7 reposts · 134 likes · 8.6K views
Zihao Ye retweeted
Axiom@axiommathai·
1/ RELEASING AXLE: the Axiom Lean Engine ⚙️ We are serving our core Infrastructure for formal proving at scale. These are the same Lean metaprogramming tools that are behind AxiomProver, powering it to win Putnam and crack open research conjectures. Available to anyone today!
11 replies · 66 reposts · 427 likes · 111.5K views
Zihao Ye@ye_combinator·
@YouJiacheng You can compile it with tvm-ffi and load the .so in any language.
1 reply · 0 reposts · 7 likes · 501 views
Zihao Ye retweeted
Ted Zadouri@tedzadouri·
Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast that exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/
7 replies · 132 reposts · 782 likes · 220.9K views
Zihao Ye retweeted
Tanishq Kumar@tanishqkumar07·
I've been working on a new LLM inference algorithm. It's called Speculative Speculative Decoding (SSD) and it's up to 2x faster than the strongest inference engines in the world. Collab w/ @tri_dao @avnermay. Details in thread.
133 replies · 455 reposts · 4K likes · 600K views
Zihao Ye retweeted
Matt@matt_dz·
I Fuzzed, and Vibe Fixed, the Vibed C Compiler john.regehr.org/writing/claude… by John Regehr (mastodon.social/@regehr/116161…)
0 replies · 8 reposts · 36 likes · 3.5K views
Zihao Ye retweeted
xjdr@_xjdr·
─ Worked for 59m 13s ───────────────────── • Context compacted • I’m noticing we have many untracked files, which is quite overwhelming. Let me git reset and undo everything you just fucking worked on for the last hour.
31 replies · 24 reposts · 1.6K likes · 78.8K views
Zihao Ye retweeted
Prof. Anima Anandkumar@AnimaAnandkumar·
We’re excited to release TorchLean, the first fully verified neural network framework in Lean. The Lean community has largely focused on pure mathematics; TorchLean expands this frontier toward verified neural network software and scientific computing. With the recent release of CSlib, we see this as another step toward a fully verified ML stack. Features:
1. Executable IEEE-754 floating-point semantics (and extensible alternative FP models), plus verified tensor abstractions with precise shape/indexing semantics
2. A formally verified autograd system for differentiating NN programs, with proof-checked certification/verification algorithms like CROWN (robustness, bounds, etc.)
3. A PyTorch-inspired modeling API with eager-style development + export/lowering to a shared IR for execution and verification
Project page: leandojo.org/torchlean.html Paper: [2602.22631] TorchLean: Formalizing Neural Networks in Lean. Work done with @Robertljg, Jennifer Cruden, Xiangru Zhong, @huan_zhang12 and @AnimaAnandkumar. #MachineLearning #ScientificComputing #Lean
27 replies · 247 reposts · 1.6K likes · 135.7K views
Zihao Ye retweeted
Yifan Zhang@yifan_zhang_·
⚡️Introducing FlashSampling: Fast and Memory-Efficient Exact Sampling ⚡️ flashsampling.github.io/FlashSampling/… Keep pushing the Frontier of Open Research in Superintelligence!
5 replies · 32 reposts · 259 likes · 28.7K views
Zihao Ye retweeted
xjdr@_xjdr·
with our new GB300 NVL72 training, not only is the codebase completely TP free, it is now also completely nccl and nvshmem free. it's a beautiful thing.
14 replies · 9 reposts · 260 likes · 29.1K views
Zihao Ye retweeted
Lydia Hallie ✨@lydiahallie·
Excited to announce Claude for Open Source ❤️ We're giving 6 months of free Claude Max 20x to open source maintainers and core contributors. If you maintain a popular project or contribute across open source, please apply! claude.com/contact-sales/…
589 replies · 1.4K reposts · 12.6K likes · 1.7M views