Kairong Luo ✈️ ICLR2026

19 posts

@openhonor

PhD Student @ Tsinghua University | Researching LLMs

Joined February 2025
109 Following · 86 Followers
Kairong Luo ✈️ ICLR2026
✈️ Heading to ICLR 🇧🇷 Apr 22–27. Come to our oral on Fri, Apr 24 (10:30 AM–12:00 PM, Room 202 A/B) or find me at our poster (3:15 PM–5:45 PM, P3-#521). We study why LR decay can hurt curriculum-based LLM pretraining — and how to fix it. Happy to chat!
Kairong Luo ✈️ ICLR2026@openhonor·
🚀 Announcing PCMind-2.1-Kaiyuan-2B A new frontier for fully open-source models. Not just weights—full pretraining pipeline & recipe. Specs: 2B params, 2.2T tokens Approach: data-centric pretraining Status: SOTA among fully-open models 🤗 HF: huggingface.co/thu-pacman/PCM…
Kairong Luo ✈️ ICLR2026@openhonor·
🛠️ Engineering: "Hard Mode" (FP16) Training on FP16-only hardware risks divergence. We modified the architecture for maximum stability: - Sandwich Normalization (controls residual growth) - Logits Soft-Capping (prevents extreme values) - QK-Norm
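The three stabilizers named above can be sketched in a few lines of NumPy. This is an illustrative toy, not the released architecture: the RMS-norm variant, the `cap=30.0` value, and all function names here are my assumptions.

```python
import numpy as np

def soft_cap(logits, cap=30.0):
    # Logits soft-capping: smoothly bound values to (-cap, cap) so they
    # stay representable in FP16. cap=30.0 is an illustrative choice.
    return cap * np.tanh(logits / cap)

def rms_norm(x, eps=1e-6):
    # Simple RMS normalization over the last axis (assumed norm variant).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def sandwich_block(x, sublayer):
    # Sandwich Normalization: normalize the sublayer input AND its output
    # before the residual add, limiting residual-stream growth.
    return x + rms_norm(sublayer(rms_norm(x)))

def qk_norm_attention(q, k, v):
    # QK-Norm: normalize queries and keys so the QK^T attention logits
    # cannot blow up; soft-capping bounds them further.
    q, k = rms_norm(q), rms_norm(k)
    scores = soft_cap(q @ k.T / np.sqrt(q.shape[-1]))
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

Each trick bounds a different failure mode: the residual stream (sandwich norm), the attention logits (QK-Norm), and the output logits (soft-capping).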
Kairong Luo ✈️ ICLR2026@openhonor·
🏆 Evaluation Results KAIYUAN-2B pushes the fully open-source boundary. ✅ Beats: SmolLM2-1.7B & OLMo-2-1B ✅ Matches: Larger models like YuLan-Mini (2.4B) ⚔️ Approaches: Open-weight leaders (Qwen2-1.5B / Llama3.2-3B) 💪 Exceptionally strong in Chinese, Math, and Code.
Kairong Luo ✈️ ICLR2026@openhonor·
📈 Innovation 3: Quality Curriculum Samples sorted by quality (ascending), then interleaved globally. - Progressive Exposure: Model sees "textbook quality" data only when mature. - Stable Mix: Domain ratios (Chinese/Code/Math) remain fixed while quality ramps up.
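A minimal sketch of this interleaving, assuming each sample carries a domain tag and a scalar quality score. The function and its signature are hypothetical illustrations, not the released pipeline:

```python
def quality_curriculum(samples, ratios):
    # samples: list of (domain, quality_score, payload) tuples.
    # ratios: dict mapping domain -> items drawn per round (fixed mix).
    by_domain = {}
    for d, q, x in samples:
        by_domain.setdefault(d, []).append((q, x))
    for items in by_domain.values():
        items.sort(key=lambda t: t[0])    # ascending quality: best data last
    cursors = {d: 0 for d in by_domain}
    schedule = []
    # Round-robin by fixed domain ratios, so the Chinese/Code/Math mix
    # stays constant while average quality ramps up over the run.
    while any(cursors[d] < len(by_domain[d]) for d in by_domain):
        for d, n in ratios.items():
            for _ in range(n):
                if cursors[d] < len(by_domain[d]):
                    q, x = by_domain[d][cursors[d]]
                    schedule.append((d, q, x))
                    cursors[d] += 1
    return schedule
```

Sorting per domain gives the quality ramp; the fixed per-round draw counts keep the domain ratios stable.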
Kairong Luo ✈️ ICLR2026@openhonor·
🔄 Innovation 2: Strategic Repetition High-quality data is finite. We use a multi-phase approach to repeat the best data without overfitting. Method: Retain top 50% → 30% → 10% in later phases. Result: Top 10% samples seen 4x; low-quality samples seen only once.
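The retention schedule above can be sketched as a simple counting exercise; `fractions` mirrors the 100% → 50% → 30% → 10% phases, and the assumption that a higher score means better data is mine:

```python
def repetition_counts(scores, fractions=(1.0, 0.5, 0.3, 0.1)):
    # One pass over all data, then phases that retain only the top
    # 50% / 30% / 10% of samples by quality score (higher = better).
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    seen = [0] * len(scores)
    for frac in fractions:
        keep = max(1, int(len(scores) * frac))
        for i in ranked[:keep]:
            seen[i] += 1          # sample is trained on in this phase
    return seen
```

With these fractions, the top 10% of samples are seen 4x while the bottom half is seen exactly once, matching the tweet's numbers.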
Kairong Luo ✈️ ICLR2026@openhonor·
📊 Innovation 1: Quantile Probing Stop blind filtering. Start systematic probing. We trained reference models on data subsets across quality quantiles (top 15%... 75%). Insight: Quality is task-dependent. FineWeb-Edu 👑 Knowledge (MMLU) DCLM-Baseline 👑 Reasoning (WinoGrande)
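The subset construction behind the probing can be sketched as follows. The helper name is hypothetical and the cutoffs simply echo the tweet; the actual reference-model training is out of scope here.

```python
import numpy as np

def quantile_subsets(scores, quantiles=(0.15, 0.30, 0.50, 0.75)):
    # For each cutoff q, keep the top-q fraction of samples by quality
    # score. A small reference model trained on each subset then "probes"
    # how that quality band moves each downstream benchmark.
    order = np.argsort(scores)[::-1]            # best first
    n = len(scores)
    return {q: order[: max(1, int(n * q))].tolist() for q in quantiles}
```

Comparing benchmark scores across these reference models is what surfaces the task-dependence: one source's top slice can win on knowledge while another's wins on reasoning.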
Kairong Luo ✈️ ICLR2026@openhonor·
🧐 Challenge: Heterogeneity & Scarcity Open datasets (DCLM, FineWeb) are great but vastly different, and high-quality tokens are potent but rare. How do we compare and mix heterogeneous sources? How do we maximize efficiency with sparse "gold" data? We tackle these questions with data-centric training. 👇
Kairong Luo ✈️ ICLR2026@openhonor·
📢 Come meet us at #ICLR2025! We'll be presenting our Multi-Power Law — a new approach to predicting full pretraining loss curves across LR schedules — during the poster session: 🗓 Friday, April 25 🕒 3:00 PM – 5:30 PM CST 📍 Hall 3 + Hall 2B, Poster #237 We'd love your feedback!
Quoting Kairong Luo ✈️ ICLR2026 @openhonor:

🔍How does pretraining loss evolve under different LR schedules? 🌟Meet our Multi-Power Law: predicts the full loss curve for various schedules! 🌟Accurate enough to optimize LR schedules directly. 🌟Result? A WSD-like schedule that outperforms the rest! 🔥Accepted at #ICLR2025

Kairong Luo ✈️ ICLR2026@openhonor·
🔹 Using predicted final loss as a surrogate objective, we induce an optimized schedule—matching WSD (Hu et al., 2024) in shape but achieving even lower loss!
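The optimization loop can be sketched as a search over a WSD-style (warmup-stable-decay) schedule's decay point, scored by a surrogate predictor of final loss. The `surrogate_loss` callable here is a made-up stand-in for the fitted Multi-Power Law, and the peak/floor LR values are illustrative, not the paper's settings:

```python
def wsd_schedule(total, decay_start, peak=3e-4, floor=3e-5):
    # WSD shape: hold the peak LR, then decay linearly to a small floor.
    lrs = []
    for t in range(total):
        if t < decay_start:
            lrs.append(peak)
        else:
            frac = (t - decay_start) / max(1, total - decay_start)
            lrs.append(peak + (floor - peak) * frac)
    return lrs

def optimize_decay_start(total, surrogate_loss):
    # Grid-search the decay-start step against a surrogate that predicts
    # final loss from the full LR schedule (the role the fitted law plays).
    return min(range(1, total),
               key=lambda s: surrogate_loss(wsd_schedule(total, s)))
```

In the paper's setting the surrogate is differentiable, so the schedule can also be optimized directly rather than by grid search; this sketch only conveys the surrogate-objective idea.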
Kairong Luo ✈️ ICLR2026@openhonor·
💡 Results at a glance: 🔹 Our law is fitted on the schedules in the first row—then accurately predicts loss curves for unseen schedules in the second row!
Kairong Luo ✈️ ICLR2026@openhonor·
🔍How does pretraining loss evolve under different LR schedules? 🌟Meet our Multi-Power Law: predicts the full loss curve for various schedules! 🌟Accurate enough to optimize LR schedules directly. 🌟Result? A WSD-like schedule that outperforms the rest! 🔥Accepted at #ICLR2025