Junhyuck Kim
@jhyuckkim
Researcher @Krafton_AI (@PUBG) Prev @CambridgeMLG


💻Tired of running so many slow, expensive benchmark evals across every checkpoint? Try ✨BenchPress✨ at microsoft.github.io/benchpress/: provide a few benchmark scores, then get predictions for the remaining ~100 benchmarks, with trust probabilities and calibrated 90% prediction intervals.

How does this work? In his original post (x.com/DimitrisPapail…), @DimitrisPapail first tried the idea as a fun question: collect model-by-benchmark scores into a matrix, find its low-rank structure, and use matrix completion to predict missing benchmark scores from a few observed ones.

We expanded this into a full system: a fully audited 84-model x 133-benchmark score matrix, an optimized matrix-completion predictor, and a reliability layer for trust probabilities and 90% prediction intervals.

Beyond predicting missing scores, we also suggest practical seed benchmark sets. The five-probe set {GPQA-D, HLE, Codeforces, MMLU-Pro, ARC-AGI-1} recovers the rest of a model's public score profile with a MedAE of 3.93 points. A lower-cost set {GPQA-D, MMLU-Pro, Aider Polyglot, MATH-500, AIME 2026} reaches 4.55 points.

See more details below 🧵1/7

This work is with @DimitrisPapail at AI Frontiers, a boutique research lab inside @MSFTResearch.
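To make the matrix-completion idea in the post above concrete, here is a minimal sketch of one generic approach, iterative truncated-SVD imputation. It is not BenchPress's actual predictor, and it has no reliability layer or prediction intervals. The function name `complete_low_rank`, the rank, the iteration count, and the synthetic scores are all assumptions made up for illustration.

```python
import numpy as np

def complete_low_rank(scores, rank=1, n_iters=200):
    """Fill NaN entries of a model-by-benchmark matrix.

    Generic iterative truncated-SVD imputation (an illustrative
    assumption, not the BenchPress method): alternate between a
    rank-`rank` approximation and restoring the observed scores.
    """
    observed = ~np.isnan(scores)
    # Start the missing entries at each benchmark's observed mean.
    col_means = np.nanmean(scores, axis=0)
    filled = np.where(observed, scores, col_means)
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        # Keep observed scores fixed; update only the missing ones.
        filled = np.where(observed, scores, low_rank)
    return filled

# Synthetic rank-1 data: score = model ability * benchmark easiness.
# These numbers are invented, not taken from the BenchPress matrix.
ability = np.array([0.3, 0.5, 0.6, 0.8, 0.9])
easiness = np.array([90.0, 70.0, 60.0, 40.0])
true_scores = np.outer(ability, easiness)

# Pretend the last model was evaluated on only the first two benchmarks.
scores = true_scores.copy()
scores[4, 2:] = np.nan

completed = complete_low_rank(scores, rank=1)
print("predicted:", np.round(completed[4], 2))
print("actual:   ", np.round(true_scores[4], 2))
```

Real benchmark matrices are noisy and far sparser than this toy example, so in practice the rank and any regularization would need to be tuned on held-out scores.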







this is basically @steipete's oracle pattern, right? github.com/steipete/oracle i have been using a custom skill in codex/claude to consult `gpt-5.4-pro` for difficult tasks: github.com/search?q=repo%… `gpt-5.4-pro` is the best model out there that *anyone* can use


got scooped by Ant. Oh well :p cute idea

My team has been cooking nonstop for a while... and I’m so excited to finally share what we’ve been building!!! Today, we’re releasing four open models, many of which are the best models of the same size 🥳!!!

tldr;
1) Raon-Speech: 9B SOTA speech LLM
2) Raon-SpeechChat: 9B full duplex model
3) Raon-OpenTTS: 0.3B/1B open-data-open-weight SOTA TTS
4) Raon-VisionEncoder: 0.4B vision encoder trained only with public data

huggingface.co/collections/KR…

===

1) Raon-Speech (9B)
Raon-Speech is a speech LLM (LLM + speech understanding + speech generation). It's a bilingual model (English/Korean), and it's ranked #1 on both leaderboards 😎 tldr; it's the best open-model alternative to ChatGPT voice mode.
Model: huggingface.co/KRAFTON/Raon-S…
Tech report: huggingface.co/KRAFTON/Raon-S…
Web demo: raon.krafton.ai ("Speech Chat" menu here. "auto" is a bit unstable, so use "manual" and choose the language!)

2) Raon-SpeechChat (9B)
While a speech LLM is useful, it’s kind of like a walkie-talkie. A full-duplex model is more like a phone, so it is even more useful in many applications. That’s why we also built and are releasing Raon-SpeechChat. Again, on several quantitative evaluation metrics, Raon-SpeechChat scored the best on average.
Model: huggingface.co/KRAFTON/Raon-S…
Tech report: huggingface.co/KRAFTON/Raon-S…
Web demo: raon.krafton.ai ("Full Duplex" menu here.)

3) Raon-OpenTTS (0.3B, 1B)
We’re also releasing Raon-OpenTTS, a state-of-the-art open-data, open-weight TTS model.
Model + data: huggingface.co/KRAFTON/Raon-O…
The 1B model and a detailed tech report are coming soon!

4) Raon-VisionEncoder (0.4B)
Last but not least, we’re releasing Raon-VisionEncoder, a vision encoder trained from scratch using only public data. It closely matches SOTA vision encoder quality too!
Model: huggingface.co/KRAFTON/Raon-V…
Tech blog: krafton.ai/blog/posts/202…

===

That’s it! I’m incredibly proud of what my team has built! My AI research team at KRAFTON (@Krafton_AI), which undoubtedly is the most cracked team in Korea, has been cooking nonstop for a while for this 😅... This is just the beginning of our planned model releases, so stay tuned!

ps1/ Ah, by the way, you may ask why “Raon”? “Raon” is an old Korean word meaning happy. And, well, we’re kRAftON :-)

ps2/ KRAFTON is one of the four teams participating in Korea’s national frontier-model project, together with SK Telecom. We’re training something very exciting together... and more to come soon!







