Tyler LaBonte

940 posts

@tmlabonte

ML PhD student @GeorgiaTech, Math BS @USC. Deep learning theory, generalization, robustness.

Atlanta, GA · Joined December 2019

702 Following · 894 Followers

Pinned Tweet

Tyler LaBonte@tmlabonte·1 May

Excited to present at the first #AISTATS2025 poster session on May 3! Ever wondered how LLMs can generalize to new tasks in-context despite only training on token completion? We formalize this phenomenon as "task shift" and investigate a linear version: arxiv.org/abs/2502.13285

2.5K

Tyler LaBonte@tmlabonte·9 Mar

@iamwaynechi Can't wait for more games in various shades of red! ("rougelikes"... ok I'll see myself out)

Wayne Chi@iamwaynechi·9 Mar

Slay the Spire 2 is having one of the most successful launches in indie gaming history... And it's made entirely in Godot. I think Godot will have a meteoric rise in the coming years, and it's a big reason why I focused GameDevBench (arxiv.org/abs/2602.11103) on Godot.

Dexerto@Dexerto

Indie game Slay the Spire 2 has surpassed 500,000 concurrent players on Steam. The rougelike is now in the top 20 games with the highest all-time peaks on Valve's platform.

489

Tyler LaBonte retweeted

Microsoft Research@MSFTResearch·9 Mar

Multimodal reasoning with Phi-4-reasoning-vision, new work on scaling LLM inference, benchmarking AI agents in network operations, cinematic video generation, adaptive evaluation for LLMs, and using AI to improve individual and population health. msft.it/6013QiQgx

11.4K

Tyler LaBonte@tmlabonte·5 Mar

Our Phi-4-reasoning-vision-15B technical report is now available on arxiv: arxiv.org/abs/2603.03975

378

Tyler LaBonte@tmlabonte·5 Mar

Some nice coverage on our new model release, highlighting our hybrid approach to multimodal reasoning 🚀

VentureBeat@VentureBeat

Microsoft built Phi-4-reasoning-vision-15B to know when to think — and when thinking is a waste of time venturebeat.com/ai/microsoft-b…

380

Tyler LaBonte@tmlabonte·5 Mar

It's been the privilege of my career to help build the newest Phi series model from @MSFTResearch! Phi-4-reasoning-vision-15B is open-weight & competitive on perf with 10X less compute/tokens. Read the blog for math and CUA case studies, hybrid reasoning, data insights, & more!

Microsoft Research@MSFTResearch

Vision-language models improve multimodal systems, but can make them slower, costlier, and harder to deploy. Learn how Phi-4-reasoning-vision-15B, a compact and fast multimodal reasoning model, blends strengths of different methods while reducing their limits: msft.it/6014Q5X0u

856

Tyler LaBonte@tmlabonte·5 Feb

@bneyshabur Best of luck, Behnam! Looking forward to what comes next!

784

Behnam Neyshabur@bneyshabur·5 Feb

I've left Anthropic to start something new. 🧵

156

2.9K

398K

Tyler LaBonte@tmlabonte·14 Jan

Finally, thanks to @Kangwook_Lee's "Tenure Track Simulator" post for inspiring me to make the game public and write this up!

Tyler LaBonte@tmlabonte·14 Jan

Misc takeaways:
• Copilot + GitHub was far more useful than I expected
• Keeping code style consistent across humans + agents is painful
• Overall: Claude was best for agentic coding; Gemini best for interactive pair-programming

121

Tyler LaBonte@tmlabonte·14 Jan

Over the holidays, I stress-tested the AI coding hype by doing something concrete: I built a college football simulator game from scratch to see if agents actually deliver. Here’s what I learned 👇

168

Tyler LaBonte@tmlabonte·14 Jan

@liyzhen2 What are the other workshops? I couldn't find them on the AISTATS website

152

yingzhen@liyzhen2·13 Jan

AISTATS has introduced workshop track this year! 😍 this is one of the 3 workshops in the inaugural year, consider submitting

Edgar Dobriban@EdgarDobriban

We are excited to announce the workshop *Theory and Applications of Calibration for Modern AI* held at AISTATS (May 5, 2026; Tangier, Morocco)!
This workshop aims to bring together researchers and practitioners interested in the calibration of machine learning and AI models from a variety of areas; encompassing theory, methods, and applications. Calibration has always been an important topic at the interface of many different areas. Despite this, there is no well-established "home" for discussion about calibration. We aim to fill this gap.
We are thrilled to have a great lineup of speakers, including Peter Flach (delivering a tutorial), Ewout W. Steyerberg, Johanna Ziegel, @nhaghtal, Futoshi Futami, and @BuettnerFlo.
We also accept submissions for workshop papers, which will be presented at our poster session. Deadline: *February 20, 2026*. (Openreview link below.)
Please find more details on our website: calibration-workshop.github.io. Spread the word and submit your work!
Glad to co-organize with @Eugene_Berta, @SebGGruber, @TPopordanoska, @yifanwu2014 and @BachFrancis.

Tyler LaBonte@tmlabonte·11 Dec

Nice work from GT colleagues about how next-token prediction naturally captures long-range structural dependencies!

Xinyuan Cao@CaoYouki

(1/6) Why does next-token prediction work so well, even for long text? 🤔 Check out “Provable Long-Range Benefits of Next-Token Prediction”. A rigorous explanation for LLM’s long-range coherence/reasoning. Joint work with Santosh Vempala📄 arXiv: arxiv.org/abs/2512.07818

288

Tyler LaBonte@tmlabonte·9 Dec

@marikgoldstein Yes definitely, though it's better than 6mo-1yr ago. I find rewording in a more objective way helps, e.g., "Prove whether f(x) is O(n)" instead of "Prove that f(x) is O(n)". Also for anything important (i.e., research) I ask both GPT and Gemini and compare, then verify myself.

112

Mark Goldstein@marikgoldstein·9 Dec

Suppose that you are trying to prove XYZ with GPT. If in the chat you show some desire for the claim to be true, have you noticed GPT making mistakes frequently? so to speak, maybe prioritizing affirmation over correctness? If so, beyond inserting "I might be wrong", what helps?

648

Tyler LaBonte@tmlabonte·25 Nov

Fara has been one of the most exciting projects to watch evolve @MSFTResearch over the last few months. From my perspective, Fara is a real advance towards natively multimodal computer-use agents (e.g., no accessibility trees). Congrats to Corby and the team on the release!

Corbin Rosset@corby_rosset

Microsoft just dropped Fara-7B, its first on-device AI Agent that can use your computer just like you would: it clicks, types, fills out forms and completes tasks just by “seeing” the screen. It’s best-in-class in terms of accuracy and cost from yours truly at Microsoft AI Frontiers and you can use it today

264

Tyler LaBonte@tmlabonte·22 Sep

@im_td Comments with em-dashes

120

Tim Davidson@im_td·22 Sep

what is the em-dash equivalent for AI generated code?

474
