Zhikun Xu

64 posts

@JerrryKun

ri @AIatMeta, pursuing CS PhD @SCAI_ASU | Prev: Applied Math (B.S. & M.S.) @FudanUni, Research Internship @awscloud @AlibabaGroup @AMD | Opinions are my own.

Joined November 2018

136 Following · 69 Followers

Zhikun Xu@JerrryKun·10 Jun

@xwang_lk

Xin Eric Wang@xwang_lk·10 Jun

Legit statement from someone who once encouraged their employees to tokenmax.

Amazon Web Services@awscloud

More AI-generated code doesn't make your team faster. It might actually slow you down.

Zhikun Xu@JerrryKun·9 Jun

@hendav136 🫡 (pretraining for what?)

Hen Davidov@hendav136·8 Jun

Hen, first year PhD, pretraining an LLM with academia compute for the first time. Wish me luck.

Zhikun Xu@JerrryKun·8 Jun

@wenhaocha1 Indeed. Tokens are becoming electricity. Acting is more expensive.

Wenhao Chai@wenhaocha1·8 Jun

The cost and speed of tokens don’t matter; the only thing that matters is performance. Today, the main cost of coding agents lies in rollouts, but once agentic systems permeate every industry, tool calls will become the dominant source of latency and expense. The cost of making a single mistake will far exceed the cost of thousands of API calls.

Zhikun Xu@JerrryKun·7 Jun

arxiv.org/abs/2605.31509

Zhikun Xu@JerrryKun·7 Jun

That last part resonates with what motivated our ongoing project, ReuseRL: if a capability only internalizes when it is compressible, the real question becomes what gets internalized. We go looking for the atoms: the small set of reusable skills a model can absorb and build on.

Zhikun Xu@JerrryKun·7 Jun

Treating reasoning and acting as two tools for one job folds many debates (long context vs RAG, think vs act, etc.) into a single allocation question. And framing the internalization–externalization boundary as the next design question feels exactly right. Really inspiring read!

Hongru Wang@HongruWang007

Two students take the same exam. Both score 100: one solved it himself, the other Googled every answer. A semester later, the gap is huge. That's the problem with today's AI agents. I wrote a detailed blog post to share my recent thoughts on this, mainly based on Theory of Agents. I promise it is worth 30 minutes of your time. Blog: notion.so/Second-Half-of… Project: hrwise-nlp.github.io/assets/website…

Zhikun Xu@JerrryKun·4 Jun

@Swarooprm7 We may need to define "machine creativity/novelty" first. What about searching for counterexamples in formal math? Concepts are created to simplify things, facilitate abstraction, and improve human understanding (efficiency, etc.). Why do AI models need novel concepts?

Swaroop Mishra@Swarooprm7·4 Jun

How can we accurately evaluate whether an AI model has generated a genuinely novel concept? Is there a widely accepted benchmark for measuring machine creativity or invention?

Zhikun Xu@JerrryKun·25 May

@wzenus @YejinChoinka @jiajunwu_cs @ManlingLi_ @LINJIEFUN @chi_gui_1 @DeimosGN @qineng_wang @James_KKW @shiqi_chen17 @zhengyuan_yang Well-deserved👍

Zihan "Zenus" Wang@wzenus·24 May

RAGEN-2 is selected as ICML oral! Congrats and great appreciation to all collaborators!!

Zhikun Xu@JerrryKun·22 Apr

✨Check out the paper to learn more! This paper was done by our intern @Zijun0916 last summer. A big thank you to my labmate @XiaoYe1170354 and our advisor @BenZhou96 for the support and guidance!

Zhikun Xu@JerrryKun·22 Apr

📈 The results: Consistent gains over vanilla baselines, including up to +9.3% on in-domain Textbook problems and +9.6% on out-of-domain TheoremQA. We also ran ablation experiments showing that the gains hold across different models and benchmarks.

Zhikun Xu@JerrryKun·22 Apr

LLMs memorize massive amounts of text, but can they actually apply this knowledge conceptually? 🤔 Our #ICLR '26 paper from the ARC Lab probes this in math reasoning! "CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap..." 🔗 openreview.net/forum?id=pRSRi…

Zhikun Xu@JerrryKun·31 Jan

@HBX_hbx @QuYuxiao Besides QuestA, there are many other related works from last year that use a similar idea: BREAD (arxiv.org/abs/2506.17211), Scaf-GRPO (arxiv.org/abs/2510.19807), and CORE (arxiv.org/abs/2512.18857). The "guided prefix" could be partial oracle solutions, problem-related concepts, etc.

Bingxiang He@HBX_hbx·31 Jan

@QuYuxiao Excellent work! I am curious what the difference is between POPE and QuestA, since both leverage prefixes of references to conduct curriculum learning.

Yuxiao Qu@QuYuxiao·30 Jan

🚨 NEW PAPER: “POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration”! ❓ How do we train LLMs with RL on hard problems when the model never gets a single correct rollout? 💡 Short answer: standard RL is stuck. We show why, and introduce POPE to break this deadlock. 🧵[1/N]

Zhikun Xu@JerrryKun·7 Jan

@denny_zhou Fake agi: Gemini🤖 Real agi: daughter🧒

Denny Zhou@denny_zhou·7 Jan

My daughter asked a fun question: are humans actually aliens on this planet? Her argument: the intelligence gap between humans and every other species is massive. No other life has built civilization. Dinosaurs ruled Earth for hundreds of millions of years and still produced no civilization. Yet humans did it in a few hundred thousand. My take: we probably aren’t aliens. Look at AI. The gap between AI 10 years ago and today can feel like dinosaurs vs humans. Back then it was mostly shallow mapping from input to output with statistical models (e.g., image/text classification). Now AI can “think” for hours and solve genuinely hard problems.

Zhikun Xu@JerrryKun·4 Dec

This resonates strongly. Conceptual reasoning should be very important for LLM reasoning.

Denny Zhou@denny_zhou

To the questions of “why not both?”: my dream is for LLMs to make conceptual discoveries, like Galois with group theory or Einstein with general relativity. I don’t believe breakthroughs like these would come from A* search or its more advanced version MCTS.

Zhikun Xu@JerrryKun·4 Dec

@ShashwatGoel7 👀 arxiv.org/abs/2310.12962

Shashwat Goel@ShashwatGoel7·4 Dec

Was wondering whether GRPO-style RL is "only a few tokens deep"... Intuitively, we take a next-token predictor and slightly upweight some tokens, s.t. NTP leads to success. Found this interesting ICLR submission that preliminarily indicates the hypothesis is true: openreview.net/forum?id=8vWIX…

Zhikun Xu@JerrryKun·1 Dec

First time at #NeurIPS2025! I’ll be in San Diego from Dec 1–6 and would love to make new friends, grab some tea🍵, and discuss LLM reasoning (math & cognition-inspired), RL, and more! Feel free to DM!

Zhikun Xu@JerrryKun·29 Nov

@zjasper @hyperbolic_labs github.com/THUKElab/COUNT… 👀 How about testing it with some counterexamples?

Jasper@zjasper·28 Nov

We got deepseek-math-v2 running on an 8xH200 node on the @hyperbolic_labs on-demand GPU platform. Feel free to reply with any math problems you want answered and I can share the answers. An exciting time to own the brain of one of the best mathematicians!

clem 🤗@ClementDelangue

As far as I know, there isn't any chatbot or API that gives you access to an IMO 2025 gold-medalist model. Not only does this change today, but you get to download the weights with the Apache 2.0 open-source release of @deepseek_ai Math-V2 on @huggingface! Imagine owning the brain of one of the best mathematicians in the world for free to: - explore it for research - fine-tune it - optimize it - run it on your own hardware No limitations, no nerfing, no company or government to take it back. That's democratization of AI and knowledge at its best, literally 🤯🤯🤯 You can download the weights here: huggingface.co/deepseek-ai/De…. The frontier of AI is open-source!

Zhikun Xu@JerrryKun·26 Nov

@taiwei_shi NeurIPS+1! Hope to talk with u🙋

Taiwei Shi@taiwei_shi·25 Nov

I will be at NeurIPS next week. ✨ Looking forward to catching up with friends old and new. Happy to talk about reinforcement learning, test-time training, long-term memory, self-evolving agents, and so much more! DMs are open. 😄

Taiwei Shi@taiwei_shi

Want to 𝐜𝐮𝐭 𝐑𝐅𝐓 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐭𝐢𝐦𝐞 𝐛𝐲 𝐮𝐩 𝐭𝐨 𝟐× and boost performance? 🚀 Meet 𝑨𝒅𝒂𝑹𝑭𝑻 — a lightweight, plug-and-play curriculum learning method you can drop into any mainstream RFT algorithms (PPO, GRPO, REINFORCE). Less compute. Better results. 🧵 1/n
