Good Start Labs

37 posts

Good Start Labs

@goodstartlabs

Joined August 2024
54 Following · 259 Followers
Good Start Labs reposted
alex duffy @alxai_
arc-agi-3 launch
march 25 · sf
chollet × altman fireside at yc
· · ·
games are the new training arenas for intelligence
· · ·
if you're into: rl envs · simulations · ai × games
come debrief after, or just come hang before we all get back to work
4
4
26
5.5K
Good Start Labs reposted
alex duffy @alxai_
@goodstartlabs we trained AI on a board game & it became a better customer support agent

Published research & thoughts on WHY in our Playtesting column @every

Research already shows that AI makes weird connections:
Train on insecure code -> AI thinks humans should be enslaved*
Train on cheating -> learns to sabotage AI safety research^

Just maybe, training on the right games can teach -> collaboration, negotiation, camaraderie, long-horizon reasoning, level-headedness, theory of mind, & more!
3
10
74
5.8K
Good Start Labs reposted
Timothy Chen @tnachen
Games are enabling frontier models in all sizes
alex duffy @alxai_

@goodstartlabs is proud to be supporting @arcee_ai with high-complexity game RL environments to improve their Trinity family's 〜 reasoning, tool use, long-horizon planning, & humor 〜
a fast, solid, open source model from a U.S. lab that's just getting started 🔼🔼🔼

2
4
9
1.3K
Good Start Labs reposted
alex duffy @alxai_
sonnet just beat opus. same task. better scaffolding.

i'm convinced
1. models will continue to improve
2. the skillcap for AI use is quite high

@goodstartlabs we spent months getting models to play diplomacy well 〜 just finished training Qwen 3 on that environment & wow...
gains on games it never saw (hanabi, wordle).
gains on benchmarks it never trained on (BFCL, Tau 2, Math).
more to come here!

♻️
1. make great harness
2. get better model responses
3. train on it
4. model gets better
♻️

that loop has room to run ⤴️
Zhenting Qi @ZhentingQi

Agent scaffolding matters as much as, or even more than, raw model capability for hard agentic tasks. In our latest research with @Meta, we show that carefully designed scaffolding achieves 54.3% (Claude Opus) and 52.7% (Claude Sonnet) on SWE-Bench-Pro, compared to Claude Opus's 52.0% result under a proprietary scaffold @claudeai.

2
1
8
1.1K
Good Start Labs reposted
alex duffy @alxai_
At NeurIPS presenting our paper at the Workshop on Multi-Turn Interactions in Large Language Models! If you're into AI 〜 Games 〜 RL Envs 〜 Alignment 〜 Interpretability, let's chat!
0
1
11
562
Good Start Labs reposted
∿
@somewheresy·
btw — I had a very short run, quietly working with @goodstartlabs on some incredible stuff you’ll see next year. amazing team, incredible trajectory & make sure to follow them. everything Every touches turns to gold and it was nothing but a pleasure :)
0
1
11
709
Good Start Labs reposted
alex duffy @alxai_
gemini 3 pro is now:
· The funniest model, as voted by real people
· The best Diplomacy player, finally dethroning o3 without being as ruthless

It's one of the only models to successfully use convoys, which require precision across multiple coordinated units and long-term strategic planning.

that precision & planning shows up in the code it writes, especially in their new IDE, Antigravity. better multi-file coordination. cleaner architectures. less bloat.

feels like a genuine step change. confirms there's a clear path forward through better pre-training and post-training.

i'll keep using claude and chatgpt, but i'm reaching for gemini and antigravity a lot more now for code - full vibe check 👇

we're still early.
Dan Shipper 📧 @danshipper

Our full vibe check of Gemini 3 Pro now live on @every every.to/vibe-check/vib…

2
7
24
4.4K
Good Start Labs reposted
alex duffy @alxai_
games aren't just for AI training
they're how we figure out what we need AI to do and how to work together
your hardest problem? might be a game too
first edition of Playtesting → every.to/playtesting/ai…
3
11
56
7.7K
Good Start Labs reposted
alex duffy @alxai_
Was great talking with Wes + Dylan. We spoke a lot about the inspiration behind @goodstartlabs and why games + AI is such an important intersection
Wes Roth @WesRoth

CAN AN AI MODEL "HACK" YOUR BRAIN?

we recently interviewed Alex Duffy (@alxai_) and one thing he said has me stuck:

“Language is an attack vector for humans.”

can LLMs "hack" our brains by mastering the weapon we created (language)? If words are the API of the human mind, then AIs fluent in them have deep psychological leverage.

check out the full interview below:

3
2
15
4.8K
Good Start Labs @goodstartlabs
@polats Looking forward to seeing you on the battlefield 🫡
0
0
0
11
Good Start Labs reposted
(El Capitano) Otman Mechbal, PM & AI Consultant
Having raised $3.6M, @goodstartlabs is on a mission to use games as dynamic environments to train and evaluate AI.

Using games to sharpen both human and AI skills can lead to better, more relatable AI, and help close the understanding gap between users and technologies.

The future will bring more AI-vs-AI battles, games that test creativity (like humor challenges), and a stronger focus on using AI to solve meaningful real-world problems, not just to play for fun.

Imagine two or more artificial intelligences playing a game against each other, like a robot chess tournament but with software instead of physical robots.

Why do it? By letting AIs compete, we can see which one is smarter, funnier, or better at solving problems. It's a fresh way to discover each AI's strengths and weaknesses instead of just testing them with yes/no questions or static quizzes.
Dan Shipper 📧 @danshipper

We taught AI to play games, now it’s a $3.6m company. I sat down with @alxai_ to talk about how and why playing games is the future of AI:

1
1
5
1.3K
Good Start Labs reposted
Dan Shipper 📧 @danshipper
We taught AI to play games, now it’s a $3.6m company. I sat down with @alxai_ to talk about how and why playing games is the future of AI:
1
6
38
31K
Good Start Labs reposted
Dan Shipper 📧 @danshipper
BIG NEWS: Our latest incubation @every just raised $3.6m to teach AIs to play games.

It's spinning out as a separate company, @goodstartlabs, with funding led by @generalcatalyst and @inovia.

GSL builds custom AI-playable game environments with popular existing games like Diplomacy and Cards Against Humanity. These environments allow AIs and millions of players to naturally generate rich and valuable training data for reinforcement learning and pre-training, all through play.

GSL is led by our head of AI training @alxai_ and his co-founder @Tyler_Marques. It started as AI Diplomacy, a project @every where we taught frontier models to battle each other for world domination in Diplomacy.

I'm proud that GSL started its journey @Every: It's one of my favorite examples of how great thinking can turn into great writing, which can turn into valuable businesses.

Extremely excited to watch what @alxai_ and co do next, and proud to be along for the ride!
23
9
233
26.4K
Good Start Labs reposted
alex duffy @alxai_
Today, we're launching Good Start Labs w/ $3.6M from amazing investors including @Inovia & @generalcatalyst

My whole life I've been learning from games. Over the past five years, I've dreamt about how AI can learn with me.

Today we're launching LOL Arena, the first AI benchmark for humor, informed by millions of human votes. We are also launching Diplomacy Arena, ranking strategy, betrayal, and prompt impact across models.

In the coming years we hope to lead at the intersection of Gen AI & Games and define what it means to do alignment via entertainment, ensuring everyone can share their voice and help AI become a tool that really is custom built to help bring our dreams to life.

If that inspires you, join us! We're hiring.

Here's what we're shipping today: 🧵
30
34
234
99.2K
Good Start Labs reposted
alex duffy @alxai_
Bringing AI into the real world hits different.

Quick 🧵 on how I turned an idea → a physical trophy that you can win, along w/ $1000, as part of our AI Diplomacy prompting competition in October

Link to enter 👇 (it's free but only a few spots are left; 49 people will play)
5
6
24
2.2K
Good Start Labs reposted
alex duffy @alxai_
GPT-5 is out. It's pretty great, steerable, & fast, BUT...

- o3 still wins
- GPT-5-mini is cheaper & as good as 2.5 Flash. Developers rejoice!
- GPT-5 is super steerable! Great prompts make a big difference
- Different 'reasoning-effort' settings make a big difference

Results below!

AI Diplomacy proved to be an awesome test bed for the model. Look how it compares to o3, o4-mini, Gemini 2.5 Flash, & the new open source models

1. Watch different versions of GPT-5 & other models play Diplomacy live on Twitch now!
2. Read our @every vibe check
3. Sign up for our upcoming Battle of the Bots, where you can control your own AI agent playing Diplomacy to win $1000+ in prizes

Links 👇
3
6
41
11.7K