Wayne Chi
@iamwaynechi

447 posts

CS Ph.D. at @SCSatCMU. Funded by @NDSEG Fellowship. Editor at https://t.co/kBygvj9hF0.

Santa Clara · Joined July 2013
212 Following · 878 Followers

Pinned Tweet
Wayne Chi @iamwaynechi ·
New preprint alert 🚨 Can LLM agents develop video games? We release GameDevBench, the first benchmark evaluating agentic game development in a game engine, Godot. We also present two simple multimodal feedback mechanisms that lead to immediate performance gains. /🧵
19 replies · 29 reposts · 253 likes · 22.6K views
Wayne Chi @iamwaynechi ·
I think I might be addicted to making benchmarks... evaluating LLMs is, for some strange reason, incredibly fun... Anyways new benchmark coming soon!
1 reply · 0 reposts · 11 likes · 317 views
Wayne Chi @iamwaynechi ·
@MatanHalevy Time to make a tier list of LLM tier lists. With MiniMax: S tier. Without: A tier.
0 replies · 0 reposts · 0 likes · 37 views
Tyler LaBonte @tmlabonte ·
@iamwaynechi Can't wait for more games in various shades of red! ("rougelikes"... ok I'll see myself out)
1 reply · 0 reposts · 1 like · 62 views
Wayne Chi @iamwaynechi ·
@chongdashu codex 5.3 (which powers gpt 5.4) is at the top of GameDevBench... Really cool to see it in action!
0 replies · 0 reposts · 0 likes · 107 views
Chong-U @chongdashu ·
Couldn't help it! Had to give GPT 5.4 (High) + /fast mode a try.
→ Added height terrains to the level
→ Animation tweens for the jumps
Used xHigh to solve a gnarly bug with the controls successfully 💪
This Final Fantasy Tactics-inspired game was completely vibe coded!
Quoting Chong-U @chongdashu:

Just recorded a step by step walkthrough of how I vibe code games using Codex and Claude Code. I implement new features 'live' in the recording, showing how I get the most out of GPT and Opus. Full video hopefully landing tomorrow! It's going to be a good one... don't miss it!

23 replies · 34 reposts · 404 likes · 272.9K views
Wayne Chi @iamwaynechi ·
With a dramatic +9.5% improvement, codex 5.3 (high) is the new best agent solving 59.1% of GameDevBench tasks. This was achieved using our multimodal feedback method. Honestly, I did not expect codex (which was previously the weakest of the three big providers) to suddenly take the top spot... But congrats to the @OpenAI team on a great model!
Wayne Chi tweet media
Quoting Wayne Chi @iamwaynechi:

New preprint alert 🚨 Can LLM agents develop video games? We release GameDevBench, the first benchmark evaluating agentic game development in a game engine, Godot. We also present two simple multimodal feedback mechanisms that lead to immediate performance gains. /🧵

1 reply · 6 reposts · 14 likes · 1.6K views
Wayne Chi @iamwaynechi ·
@mohbii Agreed. But they still struggle with many basic game tasks, which I think are prerequisites to fun games
0 replies · 0 reposts · 0 likes · 13 views
mohbi @mohbii ·
@iamwaynechi Generating game code isn't the same as designing a game. LLMs can scaffold systems, but they have no model of what's fun. The hard part of game dev isn't the code.
English
1
0
1
11
Graham Neubig @gneubig ·
What I do if a paper I like doesn't have a GitHub repo.
-2025: email the authors for the code
2026-: ask OpenHands to reimplement the code
5 replies · 9 reposts · 170 likes · 16.3K views
Julian Togelius @togelius ·
But really, playing all the top games on the App Store or Steam without having been trained on them is maybe just the beginning; how about independently designing those games? Designing new and good games _should_ be harder than playing them, because it requires being able to play what you design. So, that is perhaps the real benchmark!
2 replies · 0 reposts · 3 likes · 422 views
Julian Togelius @togelius ·
How can it be that modern LLMs are so bad at playing games? Aren't they supposed to be generally intelligent? Honestly, they are better at coding games than playing them. Maybe programming is just a particular type of game? Our new position paper tackles these questions. (1/n)
Julian Togelius tweet media
5 replies · 7 reposts · 40 likes · 3.3K views
Chong-U @chongdashu ·
This took a while... but it finally happened.
Chong-U tweet media
14 replies · 0 reposts · 71 likes · 2K views
Wayne Chi @iamwaynechi ·
We saw in GameDevBench that agent scaffolding (claude-code, codex, openhands) is just as important as the model itself, affecting both performance and cost. gpt-5.1-codex-max shot up by close to 10% simply by switching over to openhands; sonnet-4.5 shot up by 5%.

Is there a reason that agentic evals don't evaluate across various scaffolds? For example, SWE-Bench-Pro only evaluates on swe-agent. Why not also test performance on claude-code?

I think with all the new agent scaffolds around, we have to shift from caring only about models to caring about everything that goes into making an agent tick.
Quoting Wayne Chi @iamwaynechi:

We find Gemini 3 flash in the Gemini CLI to be by far the most cost-effective model, with second-best performance at low cost. Claude Sonnet 4.5 and ChatGPT Codex 5.1 actually perform better in OpenHands than in their own native agentic frameworks (claude code / codex). This performance, however, comes at increased cost.

2 replies · 3 reposts · 13 likes · 1.6K views
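The cross-scaffold evaluation Wayne is arguing for amounts to running a model × scaffold grid rather than fixing one scaffold per benchmark. A minimal hypothetical sketch (model and scaffold names are pulled from the tweets above; `run_benchmark` is a stand-in for a real harness, not GameDevBench's actual API):

```python
from itertools import product

# Models and scaffolds mentioned in the thread (illustrative only).
MODELS = ["gpt-5.1-codex-max", "sonnet-4.5", "gemini-3-flash"]
SCAFFOLDS = ["claude-code", "codex", "openhands", "gemini-cli"]

def run_benchmark(model: str, scaffold: str) -> float:
    """Placeholder for a real harness call; should return a resolve rate in [0, 1]."""
    raise NotImplementedError

def evaluate_grid(run=run_benchmark) -> dict[tuple[str, str], float]:
    """Score every (model, scaffold) pair instead of one scaffold per model."""
    results = {}
    for model, scaffold in product(MODELS, SCAFFOLDS):
        results[(model, scaffold)] = run(model, scaffold)
    return results
```

The point of the grid is that rankings can flip across columns: per the quoted tweet, the same model can score roughly 5-10% higher under a different scaffold, so a single-scaffold leaderboard conflates the model with the agent built around it.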
swyx @swyx ·
i've been cynical on open source ai for the last 3 years, and it's not been a popular view. people want to hear that open source is catching up, that some underdog team found this One Weird Trick to outperform gpt5. Kimi K2.5 didn't even beat GPT 5.2 in the end.

@DeepSeek_ai v4 next week is probably the moment I really change my stance for the first time. Hearing that the Chinese labs leak like a sieve (do you know which culture loves gossip more than Americans? that's right) and all the other Tigers duly lined up to have their 15 seconds this week. (almost) everything is out now, and the stage is set for Whalefall. Looking forward to it.
61 replies · 38 reposts · 830 likes · 300.7K views
Wayne Chi @iamwaynechi ·
@pachu2120 It definitely increases cost, so there's a tradeoff. Most of the time the model turns the video into frames via Python and just ingests a few images rather than the entire video.
1 reply · 0 reposts · 1 like · 41 views
Pachu @pachu2120 ·
Fantastic work! I've been using opus 4.5 with godot for some time, but just with a simple MCP that allows it to start the game to see if it compiles. Does having video input of the screen actually work well? How long are the videos? I would expect that having a video input would blow up the context window?
1 reply · 0 reposts · 1 like · 81 views
Engel Nyst - open/acc @engelnyst ·
Interesting! They tested with @OpenHandsDev and got even better performance than the native CLIs from the LLM providers (claude code, codex, gemini-cli) 👀 No, I don't know why. We have done some multimodal improvements, I wouldn't say a lot, and here we are. 😅
Quoting Wayne Chi @iamwaynechi:

New preprint alert 🚨 Can LLM agents develop video games? We release GameDevBench, the first benchmark evaluating agentic game development in a game engine, Godot. We also present two simple multimodal feedback mechanisms that lead to immediate performance gains. /🧵

1 reply · 0 reposts · 3 likes · 142 views