Ruixiang Zhang

99 posts

@onloglogn

Research @Apple MLR, PhD @Mila_Quebec. Prev @GoogleDeepMind @preferred_jp

Joined June 2017
1.3K Following · 601 Followers
Pinned Tweet
Ruixiang Zhang (@onloglogn)
At #NeurIPS2025 Tue-Fri presenting 3 papers from our 🍎Apple ML research team. Interested in LLM, RL, reasoning, and diffusion LLMs. We also have FY26 research intern and full-time positions available. DM me if you're interested in a chat!
Ruixiang Zhang (@onloglogn)
Thanks @BoWang87 for posting our work! We have released our model checkpoints on 🤗 at huggingface.co/collections/ap… Please also check out our detailed thread on this work at x.com/YizheZhangNLP/…
Bo Wang (@BoWang87)

Apple Research just published something really interesting about post-training of coding models. You don't need a better teacher. You don't need a verifier. You don't need RL. A model can just… train on its own outputs. And get dramatically better.

Simple Self-Distillation (SSD): sample solutions from your model, don't filter them for correctness at all, fine-tune on the raw outputs. That's it.

Qwen3-30B-Instruct: 42.4% → 55.3% pass@1 on LiveCodeBench. +30% relative. On hard problems specifically, pass@5 goes from 31.1% → 54.1%. Works across Qwen and Llama, at 4B, 8B, and 30B. One sample per prompt is enough. No execution environment. No reward model. No labels.

SSD works by reshaping distributions in a context-dependent way: suppressing distractors at locks while keeping diversity alive at forks. The capability was already in the model. Fixed decoding just couldn't access it.

The implication: a lot of coding models are underperforming their own weights. Post-training on self-generated data isn't just a cheap trick; it's recovering latent capacity that greedy decoding leaves on the table.

paper: arxiv.org/abs/2604.01193
code: github.com/apple/ml-ssd
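The recipe described above is short enough to sketch end to end. This is a hedged illustration, not the released implementation: `ToyModel`, `generate`, and `finetune` are placeholder names standing in for a real LLM and trainer.

```python
class ToyModel:
    """Stand-in for an LLM; a real SSD run would use an actual model + SFT trainer."""
    def __init__(self):
        self.data = []

    def generate(self, prompt, temperature=1.0):
        # placeholder for sampling a solution from the model
        return f"solution for: {prompt}"

    def finetune(self, dataset):
        # placeholder for standard supervised fine-tuning
        self.data.extend(dataset)


def simple_self_distillation(model, prompts, temperature=1.0):
    # 1) sample raw solutions from the model itself (one per prompt is enough)
    samples = [model.generate(p, temperature=temperature) for p in prompts]
    # 2) no correctness filtering, no verifier, no reward model, no labels
    dataset = list(zip(prompts, samples))
    # 3) fine-tune the same model on its own raw outputs
    model.finetune(dataset)
    return model


m = simple_self_distillation(ToyModel(), ["two-sum", "reverse a list"])
print(len(m.data))  # 2
```

The point the tweet makes is that this loop has no external signal at all: the only input besides the prompts is the model's own sampling distribution.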

Ruixiang Zhang retweeted
Yizhe Zhang @ ICLR 2026 (@YizheZhangNLP)
1/6 The "Self-Improvement" Paradox

Can an LLM get smarter using only its own raw, unverified outputs? No verifiers. No teachers. No RL. We found the answer is an emphatic YES.

Introducing SimpleSD: Embarrassingly Simple Self-Distillation. By simply sampling solutions from a model with specific temperature and truncation settings and then fine-tuning the model on those exact samples, Qwen3-30B jumped from 42.4% to 55.3% (a 30% relative improvement) on LiveCodeBench v6 just by training on its own samples! 🚀

The gain is universal across model sizes (4B, 8B, 30B) and model families (Llama, Qwen). The harder the problem, the larger the gain. 📈

Kudos to my amazing colleagues @onloglogn, @richard_baihe, @UnderGroundJeg, Navdeep Jaitly, @trebolloc. Check out the paper and code below! 👇

paper: arxiv.org/abs/2604.01193
code: github.com/apple/ml-ssd
HF models: huggingface.co/collections/ap…
Ruixiang Zhang retweeted
Yizhe Zhang @ ICLR 2026 (@YizheZhangNLP)
Latent reasoning is an interesting domain. It bridges continuous and discrete modalities, and bridges autoregressive and non-autoregressive thinking.
Yann LeCun (@ylecun)

@elonmusk Thinking in language has limited applications, largely in coding and mathematics where the language itself can help reasoning. But, as I've been saying for years, thinking manipulates mental models in abstract (continuous) representation space. Soooo, xAI gonna use JEPA now?

Ruixiang Zhang retweeted
Shuangfei Zhai (@zhaisf)
Say hi to Exclusive Self Attention (XSA), a (nearly) free improvement to Transformers for LM.

Observation: for y = attn(q, k, v), yᵢ and vᵢ tend to have a very high cosine similarity.
Fix: exclude vᵢ from yᵢ via zᵢ = yᵢ - (yᵢᵀvᵢ)vᵢ/‖vᵢ‖²
Result: better training/val loss across model sizes; increasing gains as sequence length grows.

See more: arxiv.org/abs/2603.09078
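The fix in the tweet is a per-position projection. A minimal NumPy sketch of that step (function name and shapes are illustrative; the paper's actual implementation may differ):

```python
import numpy as np

def exclude_value_component(y, v, eps=1e-12):
    """XSA-style fix: z_i = y_i - (y_i·v_i) v_i / ||v_i||^2, so z_i ⟂ v_i.

    y, v: arrays of shape (seq_len, d) holding the attention outputs
    and value vectors for each position.
    """
    # projection coefficient (y_i · v_i) / ||v_i||^2, one scalar per position
    coef = np.sum(y * v, axis=-1, keepdims=True) / (
        np.sum(v * v, axis=-1, keepdims=True) + eps
    )
    # subtract the component of y_i along v_i
    return y - coef * v

rng = np.random.default_rng(0)
y = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
z = exclude_value_component(y, v)
# after the fix, each z_i has (numerically) zero dot product with its v_i
print(np.abs(np.sum(z * v, axis=-1)).max() < 1e-8)
```

Since this is a rank-one correction applied independently per position, the overhead on top of standard attention is tiny, which matches the "(nearly) free" claim.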
Ruixiang Zhang retweeted
Yuyang Wang (@YuyangW95)
Now accepted to ICLR 2026! Check out our repo if you're interested in how a "simple" flow-matching model with a standard Transformer works for protein folding: github.com/apple/ml-simpl…. See you in Brazil!
Yuyang Wang (@YuyangW95)

New preprint & open-source! 🚨 "SimpleFold: Folding Proteins is Simpler than You Think" (arxiv.org/abs/2509.18480). We ask: Do protein folding models really need expensive and domain-specific modules like pair representation? We build SimpleFold, a scalable 3B folding model built solely on general-purpose transformers + flow matching and trained on 9M structures. SimpleFold supports easy deployment and efficient inference on consumer-level hardware with PyTorch/MLX (try it on your MacBook!) (1/n)
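Flow matching, the generative backbone named above, samples by integrating a learned velocity field from noise to data. A hedged, abstract sketch of that sampling loop (in SimpleFold the field is a transformer over atom coordinates; the toy linear field and all names here are illustrative, not the repo's API):

```python
import numpy as np

def sample_flow(velocity_fn, x0, n_steps=50):
    """Euler-integrate a velocity field v(x, t) from t=0 (noise) to t=1 (sample)."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)  # one Euler step along the flow
    return x

# toy field that pushes points toward a fixed "target structure"
target = np.ones(3)
pulled = sample_flow(lambda x, t: target - x, np.zeros(3))
print(pulled)  # points have moved most of the way toward the target
```

The appeal the tweet highlights is that this sampler needs no pair representation or other folding-specific machinery: all the structure-specific knowledge lives inside the learned velocity network.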

Ruixiang Zhang retweeted
Yuyang Wang (@YuyangW95)
I'll present SimpleFold at the MLSB workshop tomorrow. Come by if interested!
Ruixiang Zhang retweeted
Shuangfei Zhai (@zhaisf)
Check out the new addition to our TarFlow franchise. TLDR: normalizing flows "just work" for generating videos. This adds another piece of strong evidence to our argument that NFs are capable generative models, and I'm now more convinced than ever that they will keep working better.
Jiatao Gu (@thoma_gu)

STARFlow gets an upgrade: it now works on videos🎥 We present STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows, an invertible, causal video generator built on autoregressive flows! 📄 Paper huggingface.co/papers/2511.20… 💻 Code github.com/apple/ml-starf… (1/10)

Ruixiang Zhang retweeted
Yuyang Wang (@YuyangW95)
I'll be in San Diego attending #NeurIPS2025 Dec 3-7. DM me if interested in diffusion models, multimodal, or protein generative models! We're looking for FTEs to join us working on generative models. You can also find me at the Apple booth on Dec 3, 3-5pm.
Ruixiang Zhang (@onloglogn)
Why is Opus 4.5 no-thinking better than 64k-thinking on swebench here?
[attached image]
Ruixiang Zhang (@onloglogn)
Is it just me, or is codex down again…?