Good Start Labs

Good Start Labs reposted

Timothy Chen@tnachen·Jan 28

Games are enabling frontier models in all sizes

alex duffy@alxai_

@goodstartlabs is proud to be supporting @arcee_ai with high-complexity game RL environments to improve their Trinity family's 〜reasoning, tool use, long horizon planning, & humor 〜 a fast, solid, open source model from a U.S. lab that's just getting started 🔼🔼🔼

Good Start Labs reposted

alex duffy@alxai_·Jan 28

@goodstartlabs is proud to be supporting @arcee_ai with high-complexity game RL environments to improve their Trinity family's 〜reasoning, tool use, long horizon planning, & humor 〜 a fast, solid, open source model from a U.S. lab that's just getting started 🔼🔼🔼

Arcee.ai@arcee_ai

Today, we’re releasing the first weights from Trinity Large, our first frontier-scale model in the Trinity MoE family.

Good Start Labs reposted

Sean Cai@SeanZCai·Jan 11

x.com/i/article/2010…

Good Start Labs reposted

alex duffy@alxai_·Jan 7

sonnet just beat opus. same task. better scaffolding.
i'm convinced
1. models will continue to improve
2. the skillcap for AI use is quite high
@goodstartlabs we spent months getting models to play diplomacy well 〜 just finished training Qwen 3 on that environment & wow...
gains on games it never saw (hanabi, wordle).
gains on benchmarks it never trained on (BFCL, Tau 2, Math).
more to come here!
♻️
1. make great harness
2. get better model responses
3. train on it
4. model gets better
♻️
that loop has room to run ⤴️

Zhenting Qi@ZhentingQi

Agent scaffolding matters as much as, or even more than, raw model capability for hard agentic tasks. In our latest research with @Meta, we show that carefully designed scaffolding achieves 54.3% (Claude Opus) and 52.7% (Claude Sonnet) on SWE-Bench-Pro, compared to Claude Opus's 52.0% result under a proprietary scaffold @claudeai.

Good Start Labs reposted

alex duffy@alxai_·Dec 4

At NeurIPS presenting our paper at the Workshop on Multi-Turn Interactions in Large Language Models! If you're into AI 〜 Games 〜 RL Envs 〜 Alignment 〜 Interpretability, let's chat!

Good Start Labs reposted

∿@somewheresy·Nov 26

btw — I had a very short run, quietly working with @goodstartlabs on some incredible stuff you’ll see next year. amazing team, incredible trajectory & make sure to follow them. everything Every touches turns to gold and it was nothing but a pleasure :)

Our full vibe check of Gemini 3 Pro now live on @every every.to/vibe-check/vib…

Good Start Labs reposted

alex duffy@alxai_·Nov 20

gemini 3 pro is now:
· The funniest model as voted by real people
· The best Diplomacy player, finally dethroning o3 without being as ruthless
It's one of the only models to successfully use convoys, requiring precision across multiple coordinated units and long-term strategic planning.
that precision & planning shows up in the code it writes, especially in their new IDE Antigravity. better multi-file coordination. cleaner architectures. less bloat.
feels like a genuine step change. confirms there's a clear path forward through better pre-training and post-training.
i'll keep using claude and chatgpt, but reaching for gemini and antigravity a lot more now for code - full vibe check 👇
we're still early.

Dan Shipper 📧@danshipper

Good Start Labs reposted

alex duffy@alxai_·Nov 14

games aren't just for AI training
they're how we figure out what we need AI to do and how to work together
your hardest problem? might be a game too
first edition of Playtesting → every.to/playtesting/ai…

Good Start Labs reposted

alex duffy@alxai_·Oct 22

Was great talking with Wes + Dylan. We spoke a lot about the inspiration behind @goodstartlabs and why games + AI is such an important intersection

Wes Roth@WesRoth

CAN AN AI MODEL "HACK" YOUR BRAIN?
we recently interviewed Alex Duffy (@alxai_) and one thing he said has me stuck:
“Language is an attack vector for humans.”
can LLMs "hack" our brains by mastering the weapon we created (language)?
If words are the API of the human mind, then AIs fluent in it have deep psychological leverage.
check out the full interview below:

Good Start Labs@goodstartlabs·Oct 17

@polats Looking forward to seeing you on the battlefield 🫡

Paul Gadi@polats·Oct 17

I just applied to Battle of the Bots by @goodstartlabs — 49 AI commanders, $1000+ prizes, pure strategy. goodstartlabs.com/battle-of-the-…

Good Start Labs reposted

(El Capitano) Otman Mechbal, PM & AI Consultant@El_Capitano_O·Oct 15

After raising $3.6M, @goodstartlabs is pursuing its mission: using games as dynamic environments to train and evaluate AI. Using games to sharpen both human and AI skills can lead to better, more relatable AI, and help close the understanding gap between users and technologies.
The future will bring more AI-vs-AI battles, games that test creativity (like humor challenges), and a stronger focus on using AI to solve meaningful real-world problems, not just play for fun.
Imagine two or more artificial intelligences playing a game against each other, like a robot chess tournament but with software instead of physical robots.
Why do it? By letting AIs compete, we can see which one is smarter, funnier, or better at solving problems. It’s a fresh way to discover each AI’s strengths and weaknesses instead of just testing them with yes/no questions or static quizzes.

Dan Shipper 📧@danshipper

We taught AI to play games, now it’s a $3.6m company. I sat down with @alxai_ to talk about how and why playing games is the future of AI:

Good Start Labs reposted

Dan Shipper 📧@danshipper·Oct 15

We taught AI to play games, now it’s a $3.6m company. I sat down with @alxai_ to talk about how and why playing games is the future of AI:

Good Start Labs reposted

alex duffy@alxai_·Oct 15

Talked w/ @danshipper about @goodstartlabs, games & @every - if you're curious about what we're building & why, more in this episode. Dan's the best, grateful we get to chat like this when cameras are and aren't rolling

Dan Shipper 📧@danshipper

We taught AI to play games, now it’s a $3.6m company. I sat down with @alxai_ to talk about how and why playing games is the future of AI:

Good Start Labs reposted

Dan Shipper 📧@danshipper·Oct 15

BIG NEWS: Our latest incubation @every just raised $3.6m to teach AIs to play games. It’s spinning out as a separate company—@goodstartlabs—with funding led by @generalcatalyst and @inovia. GSL builds custom AI-playable game environments with popular existing games—like Diplomacy and Cards Against Humanity. These environments allow AIs and millions of players to naturally generate rich and valuable training data for reinforcement learning and pre-training—all through play. GSL is led by our head of AI training @alxai_ and his co-founder @Tyler_Marques. It started as AI Diplomacy, a project @every where we taught frontier models to battle each other for world domination in Diplomacy. I’m proud that GSL started its journey @Every: It’s one of my favorite examples of how great thinking can turn into great writing, which can turn into valuable businesses. Extremely excited to watch what @alxai_ and co do next—and proud to be along for the ride!

Good Start Labs reposted

alex duffy@alxai_·Oct 15

Today, we're launching Good Start Labs w/ $3.6M from amazing investors including @Inovia & @generalcatalyst
My whole life I've been learning from games. Over the past five years, I've dreamt about how AI could learn with me.
Today we're launching LOL Arena, the first AI benchmark for humor, informed by millions of human votes. We are also launching Diplomacy Arena, ranking strategy, betrayal, and prompt impact across models.
In the coming years we hope to lead at the intersection of Gen AI & Games and define what it means to do alignment via entertainment, ensuring everyone can share their voice and help AI become a tool that really is custom built to help bring our dreams to life.
If that inspires you, join us! We're hiring.
Here's what we're shipping today: 🧵

Good Start Labs reposted

alex duffy@alxai_·Sep 18

Bringing AI into the real world hits different. Quick 🧵 on how I turned an idea → physical trophy that you can win along w/ $1000 as part of our AI Diplomacy prompting competition in October. Link to enter 👇 (it's free but only a few spots left, 49 people will play)

Good Start Labs@goodstartlabs·Aug 7

Sign up to win $1000 by commanding your AI agent to victory 🤖

alex duffy@alxai_

GPT-5 is out. It's pretty great, steerable, & fast, BUT...
- o3 still wins
- GPT-5-mini is cheaper & as good as 2.5 Flash, developers rejoice!
- GPT-5 is super steerable! Great prompts make a big difference
- Different 'reasoning-effort' makes a big difference
Results below!
AI Diplomacy proved to be an awesome test bed for the model. Look how it compares to o3, o4-mini, Gemini 2.5 Flash, & the new open source models
1. Watch different versions of GPT-5 & other models play Diplomacy live on Twitch now!
2. Read our @every vibe check
3. Sign up for our upcoming Battle of the Bots, where you can control your own AI agent playing Diplomacy to win $1000+ in prizes
Links 👇
