Nan Zhang

133 posts

@NanZhangNLP

PhD Student @ISTatPENNSTATE, NLP #NLProc, ML, AI. Ex-intern @SFResearch, @NECLabsAmerica. On the job market for research scientist and postdoc positions!

State College, PA · Joined December 2015
538 Following · 133 Followers
Pinned Tweet
Nan Zhang @NanZhangNLP
Are there simpler and more effective compression signals than activation and 2nd-order info? Yes: fine-tuning traces. 📢Introducing QuantLRM, quantization of large reasoning models via fine-tuning signals. Code: github.com/psunlpgroup/Qu… Paper: arxiv.org/abs/2602.02581 (1/N)🧵
1 reply · 6 reposts · 12 likes · 1K views
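The thread doesn't show code at this point, but the idea of using fine-tuning traces as a compression signal is concrete enough to sketch. Below is a minimal toy illustration, assuming a simple round-to-nearest quantizer; the function name, the per-tensor scaling, and the top-k protection rule are my own illustrative choices, not QuantLRM's actual algorithm.

```python
# Toy sketch (not the official QuantLRM code): weights that moved the most
# during fine-tuning are treated as salient and kept in full precision,
# while everything else is quantized with round-to-nearest.
import torch

def quantize_with_ft_trace(w_base: torch.Tensor,
                           w_ft: torch.Tensor,
                           n_bits: int = 3,
                           protect_frac: float = 0.02) -> torch.Tensor:
    trace = (w_ft - w_base).abs()              # fine-tuning update magnitude
    k = max(1, int(protect_frac * trace.numel()))
    protected = torch.zeros(trace.numel(), dtype=torch.bool)
    protected[trace.view(-1).topk(k).indices] = True
    protected = protected.view_as(trace)

    # Symmetric per-tensor round-to-nearest quantization for the rest.
    qmax = 2 ** (n_bits - 1) - 1
    scale = w_ft.abs().max() / qmax
    w_q = (w_ft / scale).round().clamp(-qmax - 1, qmax) * scale
    return torch.where(protected, w_ft, w_q)
```

The design point the thread makes is that the trace `|w_ft - w_base|` is cheap to obtain (it falls out of fine-tuning for free), unlike activation statistics or Hessian-based saliency, which require extra calibration passes.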
Nan Zhang retweeted
Rameswar Panda @rpanda89
We are looking for exceptional candidates to join the core Granite LLM team at IBM Research. If you’re interested, please fill out this form: 👉 forms.gle/DFoBFTAS7Cb2ws…
12 replies · 23 reposts · 355 likes · 37K views
Nan Zhang retweeted
Ryo Kamoi @RyoKamoi
🎉 Our paper has been accepted to ACL 2026 Findings! We propose FoVer, an efficient method for creating accurate PRM training data using formal verification tools 🚀 Our tool-annotated formal synthetic training data improves PRMs on math, NLI, and BBH 😳 arxiv.org/abs/2505.15960
[image]
2 replies · 7 reposts · 74 likes · 9.6K views
Nan Zhang @NanZhangNLP
QuantLRM delivers consistent gains, averaging a 6.55% improvement on 3-bit Olmo-3-7B-Think (an RL fine-tuned model). QuantLRM also supports non-fine-tuned models by gathering updates via pseudo-fine-tuning, demonstrating wide applicability. (6/N)🧵
[image]
1 reply · 1 repost · 2 likes · 50 views
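As a rough sketch of how "gathering updates via pseudo-fine-tuning" might look in practice, one could take a few gradient steps on calibration data and treat the resulting weight deltas as the trace. The helper name, optimizer, and hyperparameters below are illustrative guesses, not taken from the paper.

```python
# Hypothetical pseudo-fine-tuning sketch: when no real fine-tuning trace
# exists, run a few calibration gradient steps and use the per-weight
# update magnitudes as the saliency signal.
from itertools import cycle
import torch

def pseudo_finetune_trace(model, calib_batches, lr: float = 1e-5, steps: int = 8):
    base = {n: p.detach().clone() for n, p in model.named_parameters()}
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    batches = cycle(calib_batches)
    for _ in range(steps):
        loss = model(**next(batches)).loss   # assumes an HF-style model returning .loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Per-weight update magnitude serves as the pseudo fine-tuning trace.
    return {n: (p.detach() - base[n]).abs() for n, p in model.named_parameters()}
```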
Nan Zhang retweeted
Mayank Vora @aiwithmayank
OpenAI and Anthropic engineers leaked a prompting technique that separates beginners from experts. It's called "Socratic prompting" and it's insanely simple. Instead of telling the AI what to do, you ask it questions. My output quality: 6.2/10 → 9.1/10 Here's how it works:
129 replies · 529 reposts · 6.6K likes · 1.7M views
Nan Zhang @NanZhangNLP
A huge shoutout to my amazing collaborators, Eugene Kwek, @YusenZhangNLP, @HiuNguy71624401, Prasenjit Mitra, and @ruizhang_nlp! I am also on the job market now for research scientist and postdoc positions! If you believe I might be a good fit, I am happy to connect!🤝 (N/N)
0 replies · 0 reposts · 2 likes · 140 views
Nan Zhang @NanZhangNLP
🧐(3) Current quantization overly compresses the final-layer modules and MLP gate, so protecting just 2% of weights that are overly compressed can raise average accuracy by 6.57%, greatly surpassing SOTA. Our 3 findings generalize well to R1 and GPT-OSS distilled models! (5/N)🧵
[image]
1 reply · 1 repost · 1 like · 112 views
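To make finding (3) concrete, here is a toy mixed-precision sketch that simply skips quantization for the modules the thread flags as overly compressed. The module-name patterns are hypothetical (LLaMA-style naming), and QuantLRM's actual criterion selects the ~2% of overly compressed weights, not whole modules.

```python
# Illustrative module-level protection sketch (not QuantLRM's selection rule):
# keep the final decoder layer and MLP gate projections in full precision,
# quantize everything else with round-to-nearest.
import torch

PROTECT_PATTERNS = ("gate_proj", "model.layers.31.")  # hypothetical names

@torch.no_grad()
def quantize_model_rtn(model, n_bits: int = 3):
    qmax = 2 ** (n_bits - 1) - 1
    for name, p in model.named_parameters():
        if any(pat in name for pat in PROTECT_PATTERNS):
            continue                      # leave protected weights untouched
        scale = p.abs().max() / qmax
        p.copy_((p / scale).round().clamp(-qmax - 1, qmax) * scale)
```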
Nan Zhang retweeted
Noam Brown @polynoamial
I'm often asked how to land a research job at a frontier AI lab. It's hard, especially without a research background, but I like to point to @kellerjordan0 as an example showing it can be done.

Keller graduated from UCSD with no publication record and was working at an AI content moderation startup when he landed a cold call with @bneyshabur (who was at Google) and presented an idea to improve upon Behnam's recent paper. Behnam agreed to mentor him, which led to an ICLR paper.

Sadly there's less open research today, but improving upon a researcher's published work is a great way to demonstrate excellence to someone inside a lab and give them the conviction to advocate for an interview.

Later, Keller got on @OpenAI's radar thanks to the NanoGPT speed run he started. All his work was documented and it was easy to measure his success, so the case for hiring him was strong.

Keller is one example, but there are plenty of other success stories as well: 🧵
Andrej Karpathy @karpathy

nanoGPT speedrun: Nice work from @kellerjordan0 adapting the nanoGPT/llmc PyTorch training code into a benchmark training a 124M Transformer to a fixed validation loss target. Current SOTA is 3.8X more token-efficient training (2.7B vs. 10B tokens)

51 replies · 177 reposts · 2.9K likes · 705.6K views
Nan Zhang retweeted
Sebastian Raschka @rasbt
I have been pretty heads-down this year to finish Chapter 6 on implementing reinforcement learning with verifiable rewards from scratch (using GRPO). I just finished it this weekend, and I'd say it's the best (or at least my favorite) chapter yet!

The goal of this chapter is to explain and implement GRPO from the bottom up. This means coding and walking through each GRPO step one by one (advantages, rewards, logprobs, and loss) and then training a 0.6B base model on the 12k examples from the MATH training set. (This takes the model from 15% to 47% accuracy on the MATH-500 test set, which is about as good as the official Qwen3 reasoning model of similar size.)

The focus is on readability and understanding GRPO, but the supplementary materials also contain scripts to run it in a multi-GPU setting. The code notebook is already available on GitHub if you want to take a look: github.com/rasbt/reasonin…. (And the full chapter should make it to the early access version of the book at mng.bz/Nwr7 soon!)

PS: The next chapter will introduce additional tips and tricks to improve the GRPO algorithm for better and more stable training behavior.
[image]
47 replies · 295 reposts · 2.1K likes · 133.7K views
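For readers who haven't seen the GRPO advantage step the chapter walks through, here is a minimal stand-alone sketch (my own toy version, not the notebook's code): each prompt gets a group of sampled completions, each completion gets a verifiable reward, and advantages are the rewards normalized within their group.

```python
# Group-relative advantage computation, the core of GRPO's "advantages" step.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """rewards: (num_prompts, group_size) verifiable rewards, e.g. 1.0 if the
    final answer matches the reference, else 0.0. Returns group-normalized
    advantages of the same shape."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
r = torch.tensor([[1., 0., 0., 1.],
                  [0., 0., 1., 0.]])
print(grpo_advantages(r))  # correct completions get positive advantage
```

Because advantages are relative within a group, no separate value network is needed, which is what makes the from-scratch implementation tractable on a 0.6B model.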
Nan Zhang retweeted
Sebastian Raschka @rasbt
Another really interesting paper from my 2025 bookmarked papers: On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (arxiv.org/abs/2512.07783). In short, RL is most effective when applied to data that is neither too close to nor too far from the pre-training distribution. If the data is too in-distribution, RL adds little beyond supervised training. If it is too far out-of-distribution, RL struggles because the model lacks the necessary priors. This has been known before, but it's nice to see it formalized with data and figures to reference.
[image]
21 replies · 141 reposts · 915 likes · 49.2K views