Kairong Luo ✈️ ICLR2026

19 posts

@openhonor

PhD Student @ Tsinghua University | Researching LLMs

Joined February 2025
109 Following · 86 Followers
Kairong Luo ✈️ ICLR2026
✈️ Heading to ICLR 🇧🇷 Apr 22–27. Come to our oral on Fri, Apr 24 (10:30 AM–12:00 PM, Room 202 A/B) or find me at our poster (3:15 PM–5:45 PM, P3-#521). We study why LR decay can hurt curriculum-based LLM pretraining — and how to fix it. Happy to chat!
Kairong Luo ✈️ ICLR2026@openhonor·
🚀 Announcing PCMind-2.1-Kaiyuan-2B A new frontier for fully open-source models. Not just weights—full pretraining pipeline & recipe. Specs: 2B params, 2.2T tokens Approach: data-centric pretraining Status: SOTA among fully-open models 🤗 HF: huggingface.co/thu-pacman/PCM…
Kairong Luo ✈️ ICLR2026@openhonor·
🛠️ Engineering: "Hard Mode" (FP16) Training on FP16-only hardware risks divergence. We modified the architecture for maximum stability: - Sandwich Normalization (controls residual growth) - Logits Soft-Capping (prevents extreme values) - QK-Norm
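The three stabilizers named above can be sketched in a few lines of NumPy. This is an illustrative toy, not the released architecture: the RMS-norm variant, the `cap=30.0` value, and all function names here are my assumptions.

```python
import numpy as np

def soft_cap(logits, cap=30.0):
    # Logits soft-capping: smoothly bound values to (-cap, cap) so they
    # stay representable in FP16. cap=30.0 is an illustrative choice.
    return cap * np.tanh(logits / cap)

def rms_norm(x, eps=1e-6):
    # Simple RMS normalization over the last axis (assumed norm variant).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def sandwich_block(x, sublayer):
    # Sandwich Normalization: normalize the sublayer input AND its output
    # before the residual add, limiting residual-stream growth.
    return x + rms_norm(sublayer(rms_norm(x)))

def qk_norm_attention(q, k, v):
    # QK-Norm: normalize queries and keys so the QK^T attention logits
    # cannot blow up; soft-capping bounds them further.
    q, k = rms_norm(q), rms_norm(k)
    scores = soft_cap(q @ k.T / np.sqrt(q.shape[-1]))
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

Each trick bounds a different failure mode: the residual stream (sandwich norm), the attention logits (QK-Norm), and the output logits (soft-capping).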
Kairong Luo ✈️ ICLR2026@openhonor·
🏆 Evaluation Results KAIYUAN-2B pushes the fully open-source boundary. ✅ Beats: SmolLM2-1.7B & OLMo-2-1B ✅ Matches: Larger models like YuLan-Mini (2.4B) ⚔️ Approaches: Open-weight leaders (Qwen2-1.5B / Llama3.2-3B) 💪 Exceptionally strong in Chinese, Math, and Code.
Kairong Luo ✈️ ICLR2026@openhonor·
📈 Innovation 3: Quality Curriculum Samples sorted by quality (ascending), then interleaved globally. - Progressive Exposure: Model sees "textbook quality" data only when mature. - Stable Mix: Domain ratios (Chinese/Code/Math) remain fixed while quality ramps up.
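A minimal sketch of this interleaving, assuming each sample carries a domain tag and a scalar quality score. The function and its signature are hypothetical illustrations, not the released pipeline:

```python
def quality_curriculum(samples, ratios):
    # samples: list of (domain, quality_score, payload) tuples.
    # ratios: dict mapping domain -> items drawn per round (fixed mix).
    by_domain = {}
    for d, q, x in samples:
        by_domain.setdefault(d, []).append((q, x))
    for items in by_domain.values():
        items.sort(key=lambda t: t[0])    # ascending quality: best data last
    cursors = {d: 0 for d in by_domain}
    schedule = []
    # Round-robin by fixed domain ratios, so the Chinese/Code/Math mix
    # stays constant while average quality ramps up over the run.
    while any(cursors[d] < len(by_domain[d]) for d in by_domain):
        for d, n in ratios.items():
            for _ in range(n):
                if cursors[d] < len(by_domain[d]):
                    q, x = by_domain[d][cursors[d]]
                    schedule.append((d, q, x))
                    cursors[d] += 1
    return schedule
```

Sorting per domain gives the quality ramp; the fixed per-round draw counts keep the domain ratios stable.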
Kairong Luo ✈️ ICLR2026@openhonor·
🔄 Innovation 2: Strategic Repetition High-quality data is finite. We use a multi-phase approach to repeat the best data without overfitting. Method: Retain top 50% → 30% → 10% in later phases. Result: Top 10% samples seen 4x; low-quality samples seen only once.
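The retention schedule above can be sketched as a simple counting exercise; `fractions` mirrors the 100% → 50% → 30% → 10% phases, and the assumption that a higher score means better data is mine:

```python
def repetition_counts(scores, fractions=(1.0, 0.5, 0.3, 0.1)):
    # One pass over all data, then phases that retain only the top
    # 50% / 30% / 10% of samples by quality score (higher = better).
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    seen = [0] * len(scores)
    for frac in fractions:
        keep = max(1, int(len(scores) * frac))
        for i in ranked[:keep]:
            seen[i] += 1          # sample is trained on in this phase
    return seen
```

With these fractions, the top 10% of samples are seen 4x while the bottom half is seen exactly once, matching the tweet's numbers.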
Kairong Luo ✈️ ICLR2026@openhonor·
📊 Innovation 1: Quantile Probing Stop blind filtering. Start systematic probing. We trained reference models on data subsets across quality quantiles (top 15%... 75%). Insight: Quality is task-dependent. FineWeb-Edu 👑 Knowledge (MMLU) DCLM-Baseline 👑 Reasoning (WinoGrande)
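The subset construction behind the probing can be sketched as follows. The helper name is hypothetical and the cutoffs simply echo the tweet; the actual reference-model training is out of scope here.

```python
import numpy as np

def quantile_subsets(scores, quantiles=(0.15, 0.30, 0.50, 0.75)):
    # For each cutoff q, keep the top-q fraction of samples by quality
    # score. A small reference model trained on each subset then "probes"
    # how that quality band moves each downstream benchmark.
    order = np.argsort(scores)[::-1]            # best first
    n = len(scores)
    return {q: order[: max(1, int(n * q))].tolist() for q in quantiles}
```

Comparing benchmark scores across these reference models is what surfaces the task-dependence: one source's top slice can win on knowledge while another's wins on reasoning.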
Kairong Luo ✈️ ICLR2026@openhonor·
🧐 Challenge: Heterogeneity & Scarcity Open datasets (DCLM, FineWeb) are great but vastly different, and high-quality tokens are potent but rare. How do we compare and mix heterogeneous sources? How do we maximize efficiency with sparse "gold" data? We tackle these questions with data-centric training. 👇
Kairong Luo ✈️ ICLR2026@openhonor·
📢 Come meet us at #ICLR2025! We'll be presenting our Multi-Power Law — a new approach to predicting full pretraining loss curves across LR schedules — during the poster session: 🗓 Friday, April 25 🕒 3:00 PM – 5:30 PM CST 📍 Hall 3 + Hall 2B, Poster #237 We'd love your feedback!
Quoting Kairong Luo ✈️ ICLR2026 @openhonor:

🔍How does pretraining loss evolve under different LR schedules? 🌟Meet our Multi-Power Law: predicts the full loss curve for various schedules! 🌟Accurate enough to optimize LR schedules directly. 🌟Result? A WSD-like schedule that outperforms the rest! 🔥Accepted at #ICLR2025

Kairong Luo ✈️ ICLR2026@openhonor·
🔹 Using predicted final loss as a surrogate objective, we induce an optimized schedule—matching WSD (Hu et al., 2024) in shape but achieving even lower loss!
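The optimization loop can be sketched as a search over a WSD-style (warmup-stable-decay) schedule's decay point, scored by a surrogate predictor of final loss. The `surrogate_loss` callable here is a made-up stand-in for the fitted Multi-Power Law, and the peak/floor LR values are illustrative, not the paper's settings:

```python
def wsd_schedule(total, decay_start, peak=3e-4, floor=3e-5):
    # WSD shape: hold the peak LR, then decay linearly to a small floor.
    lrs = []
    for t in range(total):
        if t < decay_start:
            lrs.append(peak)
        else:
            frac = (t - decay_start) / max(1, total - decay_start)
            lrs.append(peak + (floor - peak) * frac)
    return lrs

def optimize_decay_start(total, surrogate_loss):
    # Grid-search the decay-start step against a surrogate that predicts
    # final loss from the full LR schedule (the role the fitted law plays).
    return min(range(1, total),
               key=lambda s: surrogate_loss(wsd_schedule(total, s)))
```

In the paper's setting the surrogate is differentiable, so the schedule can also be optimized directly rather than by grid search; this sketch only conveys the surrogate-objective idea.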
Kairong Luo ✈️ ICLR2026@openhonor·
💡 Results at a glance: 🔹 Our law is fitted on the schedules in the first row—then accurately predicts loss curves for unseen schedules in the second row!
Kairong Luo ✈️ ICLR2026@openhonor·
🔍How does pretraining loss evolve under different LR schedules? 🌟Meet our Multi-Power Law: predicts the full loss curve for various schedules! 🌟Accurate enough to optimize LR schedules directly. 🌟Result? A WSD-like schedule that outperforms the rest! 🔥Accepted at #ICLR2025