Nan Zhang

133 posts

Nan Zhang

@NanZhangNLP

PhD Student @ISTatPENNSTATE, NLP #NLProc, ML, AI. Ex-intern @SFResearch, @NECLabsAmerica On the job market for research scientist and postdoc positions!

State College, PA Katılım Aralık 2015

538 Takip Edilen133 Takipçiler

Sabitlenmiş Tweet

Nan Zhang@NanZhangNLP·12 Şub

Are there simple and more effective compression signals than activation and 2nd-order info? Yes, fine-tuning traces. 📢Introducing QuantLRM, quantization of large reasoning models via fine-tuning signals. Code: github.com/psunlpgroup/Qu… Paper: arxiv.org/abs/2602.02581 (1/N)🧵

English

Nan Zhang retweetledi

Rameswar Panda@rpanda89·14 Nis

We are looking for exceptional candidates to join the core Granite LLM team at IBM Research. If you’re interested, please fill out this form: 👉 forms.gle/DFoBFTAS7Cb2ws…

English

355

37K

Nan Zhang retweetledi

Ryo Kamoi@RyoKamoi·10 Nis

🎉 Our paper has been accepted to ACL 2026 Findings! We propose FoVer, an efficient method for creating accurate PRM training data using formal verification tools 🚀 Our tool-annotated formal synthetic training data improves PRMs on math, NLI, and BBH 😳 arxiv.org/abs/2505.15960

English

9.6K

Nan Zhang retweetledi

Amy Tam@amytam01·17 Şub

x.com/i/article/2023…

ZXX

185

827

7.6K

2.7M

Nan Zhang@NanZhangNLP·12 Şub

Shoutout to my amazing collaborators, Eugene Kwek, @YusenZhangNLP, Muyu Pan, Suhang Wang, Prasenjit Mitra, and @ruizhang_nlp! We have released the 3-bit pseudo-quantized models by QuantLRM on Hugging Face: huggingface.co/models?other=a….🤗 (N/N)

English

Nan Zhang@NanZhangNLP·12 Şub

QuantLRM delivers a consistent improvement, with an average improvement of 6.55% on 3-bit Olmo-3-7B-Think (an RL fine-tuned model). QuantLRM also supports non-fine-tuned models by gathering updates via pseudo-fine-tuning, which demonstrates wide applicability. (6/N)🧵

English

Nan Zhang@NanZhangNLP·12 Şub

English

Nan Zhang retweetledi

Mayank Vora@aiwithmayank·7 Şub

OpenAI and Anthropic engineers leaked a prompting technique that separates beginners from experts. It's called "Socratic prompting" and it's insanely simple. Instead of telling the AI what to do, you ask it questions. My output quality: 6.2/10 → 9.1/10 Here's how it works:

English

129

529

6.6K

1.7M

Nan Zhang@NanZhangNLP·3 Şub

A huge shoutout to my amazing collaborators, Eugene Kwek, @YusenZhangNLP, @HiuNguy71624401, Prasenjit Mitra, and @ruizhang_nlp! I am also on the job market now for research scientist and postdoc positions! If you believe I might be a good fit, I am happy to connect!🤝 (N/N)

English

140

Nan Zhang@NanZhangNLP·3 Şub

🧐(3) Current quantization overly compresses the final-layer modules and MLP gate, so protecting just 2% of weights that are overly compressed can raise average accuracy by 6.57%, greatly surpassing SOTA. Our 3 findings generalize well to R1 and GPT-OSS distilled models! (5/N)🧵

English

112

Nan Zhang@NanZhangNLP·3 Şub

📢Excited our benchmarking and interpretation analysis of compressed large reasoning models (LRMs) has been accepted by #ICLR2026! We study how LRMs' reasoning capabilities are compressed during compression. Code: github.com/psunlpgroup/Co… Paper: arxiv.org/abs/2504.02010 (1/N)🧵

English

3.1K

Nan Zhang retweetledi

Noam Brown@polynoamial·22 Oca

I'm often asked how to land a research job at a frontier AI lab. It's hard, especially without a research background, but I like to point to @kellerjordan0 as an example showing it can be done. Keller graduated from UCSD with no publication record and was working at an AI content moderation startup when he landed a cold call with @bneyshabur (who was at Google) and presented an idea to improve upon Behnam's recent paper. Behnam agreed to mentor him, which led to an ICLR paper. Sadly there's less open research today, but improving upon a researcher's published work is a great way to demonstrate excellence to someone inside a lab and give them the conviction to advocate for an interview. Later, Keller got on @OpenAI's radar thanks to the NanoGPT speed run he started. All his work was documented and it was easy to measure his success, so the case for hiring him was strong. Keller is one example, but there's plenty of other success stories as well: 🧵

Andrej Karpathy@karpathy

nanoGPT speedrun: Nice work from @kellerjordan0 adapting the nanoGPT/llmc PyTorch training code into a benchmark training a 124M Transformer to a fixed validation loss target. Current SOTA is 3.8X more token-efficient training (2.7B vs. 10B tokens)

English

177

2.9K

705.6K

Nan Zhang retweetledi

Sebastian Raschka@rasbt·18 Oca

I have been pretty heads-down this year to finish Chapter 6 on implementing reinforcement learning with verifiable rewards from scratch (using GRPO). I just finished it this weekend, and I'd say it's the best (or at least my favorite) chapter yet! The goal of this chapter is to explain and implement GRPO from the bottom up. This means coding and walking through each GRPO step one by one (advantages, rewards, logprobs, and loss) and then training a 0.6B base model on the 12k examples from the MATH training set. (This takes the model from 15% to 47% accuracy on the MATH-500 test set, which is about as good as the official Qwen3 reasoning model of similar size.) The focus is on readability and understanding GRPO, but the supplementary materials also contain scripts to run it in a multi-GPU setting. The code notebook is already available on GitHub if you want to take a look: github.com/rasbt/reasonin…. (And the full chapter should make it to the early access version of the book at mng.bz/Nwr7 soon!) PS: The next chapter will introduce additional tips and tricks to improve the GRPO algorithm for better and more stable training behavior.

English

295

2.1K

133.7K

Nan Zhang retweetledi

Sebastian Raschka@rasbt·2 Oca

Another really interesting paper from my 2025 bookmarked papers: On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (arxiv.org/abs/2512.07783). In short, RL is most effective when applied to data that is neither too close to nor too far from the pre-training distribution. If the data is too in-distribution, RL adds little beyond supervised training. If it is too far out-of-distribution, RL struggles because the model lacks the necessary priors. This has been known before, but it's nice to see it formalized with data and figures to reference.

English

141

915

49.2K

Keşfet

@YusenZhangNLP @ruizhang_nlp @kellerjordan0 @bneyshabur @OpenAI @elonmusk @BarackObama @taylorswift13