Zhikun Xu

64 posts

@JerrryKun

ri @AIatMeta, pursuing CS PhD @SCAI_ASU | Prev: Applied Math (B.S. & M.S.) @FudanUni, Research Internship @awscloud @AlibabaGroup @AMD | Opinions are my own.

Joined November 2018
136 Following · 69 Followers
Hen Davidov @hendav136
Hen, first year PhD, pretraining an LLM with academia compute for the first time. Wish me luck.
21
3
301
44.2K
Zhikun Xu @JerrryKun
@wenhaocha1 Indeed. Tokens are becoming electricity. Acting is more expensive.
0
0
2
219
Wenhao Chai @wenhaocha1
The cost and speed of tokens don’t matter; the only thing that matters is performance. Today, the main cost of coding agents lies in rollouts, but once agentic systems permeate every industry, tool calls will become the dominant source of latency and expense. The cost of making a single mistake will far exceed the cost of thousands of API calls.
2
1
32
4.8K
Zhikun Xu @JerrryKun
That last part resonates with what motivated our ongoing project, ReuseRL: if a capability only internalizes when it is compressible, the real question becomes what gets internalized. We go looking for the atoms: the small set of reusable skills a model can absorb and build on.
1
0
2
112
Zhikun Xu @JerrryKun
Treating reasoning and acting as two tools for one job folds many debates (long context vs RAG, think vs act, etc.) into a single allocation question. And framing the internalization–externalization boundary as the next design question feels exactly right. Really inspiring read!
Hongru Wang @HongruWang007

Two students take the same exam. Both score 100 — one solved it himself, the other Googled every answer. A semester later, the gap is huge. That's the problem with today's AI agents. I wrote a detailed blog post to share my recent thoughts on this, mainly based on Theory of Agents. I promise this is definitely worth 30 minutes of your time. Blog: notion.so/Second-Half-of… Project: hrwise-nlp.github.io/assets/website…

2
1
4
633
Zhikun Xu @JerrryKun
@Swarooprm7 We may need to define "machine creativity/novelty" first. What about searching for some counterexamples in formal math? Concepts are created to simplify things, facilitate abstraction, and improve human understanding (efficiency, etc.). Why do AI models need novel concepts?
0
0
1
215
Swaroop Mishra @Swarooprm7
How can we accurately evaluate whether an AI model has generated a genuinely novel concept? Is there a widely accepted benchmark for measuring machine creativity or invention?
8
2
22
5.3K
Zihan "Zenus" Wang @wzenus
RAGEN-2 is selected as ICML oral! Congrats and great appreciation to all collaborators!!
9
16
158
14.8K
Zhikun Xu @JerrryKun
✨Check out the paper to learn more! This paper was done by our intern @Zijun0916 last summer. A big thank you to my labmate @XiaoYe1170354 and our advisor @BenZhou96 for the support and guidance!
0
0
0
202
Zhikun Xu @JerrryKun
📈 The results: Consistent gains over vanilla baselines, including up to +9.3% on in-domain Textbook problems and +9.6% on out-of-domain TheoremQA. We also did ablation experiments to show the results are consistent with different models and across different benchmarks.
1
0
0
243
Zhikun Xu @JerrryKun
LLMs memorize massive amounts of text, but can they actually apply this knowledge conceptually? 🤔 Our #ICLR '26 paper from the ARC Lab probes this in math reasoning! "CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap..." 🔗 openreview.net/forum?id=pRSRi…
1
2
9
600
Bingxiang He @HBX_hbx
@QuYuxiao Excellent work! I am curious what the difference is between POPE and QuestA, since both leverage prefixes of references to conduct curriculum learning.
2
0
3
527
Yuxiao Qu @QuYuxiao
🚨 NEW PAPER: “POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration”! ❓ How do we train LLMs with RL on hard problems when the model never gets a single correct rollout? 💡 Short answer: standard RL is stuck. We show why, and introduce POPE to break this deadlock. 🧵[1/N]
9
34
235
44.8K
Denny Zhou @denny_zhou
My daughter asked a fun question: are humans actually aliens on this planet? Her argument: the intelligence gap between humans and every other species is massive. No other life has built civilization. Dinosaurs ruled Earth for hundreds of millions of years and still produced no civilization. Yet humans did it in a few hundred thousand. My take: we probably aren’t aliens. Look at AI. The gap between AI 10 years ago and today can feel like dinosaurs vs humans. Back then it was mostly shallow mapping from input to output with statistical models (e.g., image/text classification). Now AI can “think” for hours and solve genuinely hard problems.
28
1
127
20.1K
Shashwat Goel @ShashwatGoel7
Was wondering whether GRPO-style RL is "only a few tokens deep"... Intuitively, we take a next-token predictor and slightly upweight some tokens, s.t. NTP leads to success. Found this interesting ICLR submission, which preliminarily indicates that this hypothesis is true: openreview.net/forum?id=8vWIX…
2
9
92
8.3K
Zhikun Xu @JerrryKun
First time at #NeurIPS2025! I’ll be in San Diego from Dec 1–6 and would love to make new friends, grab some tea🍵, and discuss LLM reasoning (math & cognition-inspired), RL, and more! Feel free to DM!
0
0
6
642
Jasper @zjasper
We got deepseek-math-v2 running on an 8xH200 node on the @hyperbolic_labs on-demand GPU platform. Feel free to reply with any math problems you want answered, and I can share the answers. An exciting time to own the brain of one of the best mathematicians!
clem 🤗 @ClementDelangue

As far as I know, there isn't any chatbot or API that gives you access to an IMO 2025 gold-medalist model. Not only does this change today, but you get to download the weights with the Apache 2.0 open-source release of @deepseek_ai Math-V2 on @huggingface! Imagine owning the brain of one of the best mathematicians in the world for free to: - explore it for research - fine-tune it - optimize it - run it on your own hardware No limitations, no nerfing, no company or government to take it back. That's democratization of AI and knowledge at its best, literally 🤯🤯🤯 You can download the weights here: huggingface.co/deepseek-ai/De…. The frontier of AI is open-source!

29
16
232
33.6K
Taiwei Shi @taiwei_shi
I will be at NeurIPS next week. ✨ Looking forward to catching up with friends old and new. Happy to talk about reinforcement learning, test-time training, long-term memory, self-evolving agents, and so much more! DMs are open. 😄
Taiwei Shi @taiwei_shi

Want to 𝐜𝐮𝐭 𝐑𝐅𝐓 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐭𝐢𝐦𝐞 𝐛𝐲 𝐮𝐩 𝐭𝐨 𝟐× and boost performance? 🚀 Meet 𝑨𝒅𝒂𝑹𝑭𝑻 — a lightweight, plug-and-play curriculum learning method you can drop into any mainstream RFT algorithms (PPO, GRPO, REINFORCE). Less compute. Better results. 🧵 1/n

5
4
91
13.8K