Tyler Marques
45 posts

@Tyler_Marques
Data Scientist / Engineer, Techie, Home-labber
Toronto, Canada, joined May 2008
58 following, 164 followers

@danshipper Agreed but architects need better tooling - right now reviewing pirate code is... not fun to say the least

new model for engineering team structure in 2026:
2 people only
one pirate and one architect
the pirate's job is to move as fast as possible to develop valuable, shipped product features by vibe coding.
the architect's job is to turn the product surface discovered by the pirate into a reliable, structured machine—also by vibe coding, but at a slower, more well-reasoned pace.
every product needs a pirate but most products only need an architect once they have some form of PMF, and in that case they usually don't need one full-time. architects can work across many codebases and solve interesting technical challenges. pirates go hard on a product that they own end-to-end.
Tyler Marques retweeted

Today, we're launching Good Start Labs
w/ $3.6M from amazing investors including
@Inovia & @generalcatalyst
My whole life I've been learning from games
Over the past five years, I've dreamt about how AI could learn with me.
Today we're launching LOL Arena, the first AI benchmark for humor, informed by millions of human votes.
We are also launching Diplomacy Arena ranking strategy, betrayal, and prompt impact across models.
In the coming years we hope to lead at the intersection of Gen AI & Games and define what it means to do alignment via entertainment.
Ensuring everyone can share their voice and help AI become a tool that really is custom built to help bring our dreams to life.
If that inspires you, join us!
We're hiring.
Here's what we're shipping today: 🧵

Tyler Marques retweeted

Thanks to @swyx for having me at @aiDotEngineer as well as on the @latentspacepod, both were a blast
was great to talk about benchmarks that mean something with people who care.
V1 of AI Diplomacy live stream wrapping up in the next couple days with three great games left 👇

Tyler Marques retweeted

AI Diplomacy made @BusinessInsider !
The people want better benchmarks:
"Everyone knows the usual benchmarks are a bore."
Couldn't have built it w/o @Tyler_Marques - excited to keep it rolling
Shipping updates to the stream constantly, come check it out!


@AdrienLE @morqon @danshipper @alxai_ Yeah, I'd like to put up a better summary of the games and who won and why. It'd be great to have a bigger sample size, it's just expensive to run lots of games

@alxai_ @karpathy @danshipper My favourite bits of this are seeing the "personalities", for lack of a better word, emerge from the models. Claude is honest to a fault here.

@karpathy @danshipper Here's the (more costly than anticipated 😅) data!
If only every model was as cheap, fast, and good as 2.5 Flash
drive.google.com/drive/folders/…

🚨 NEW:
We made Claude, Gemini, o3 battle each other for world domination.
We taught them Diplomacy—the strategy game where winning requires alliances, negotiation, and betrayal.
Here's what happened:
DeepSeek turned warmongering tyrant. Claude couldn't lie—everyone exploited it ruthlessly. Gemini 2.5 Pro nearly conquered Europe with brilliant tactics. Then o3 orchestrated a secret coalition, backstabbed every ally, and won.
Why did we do this? The most popular AI benchmarks don't test deception. But as these models get deployed everywhere—from your email to your workplace—we need to know: Will they lie to get what they want?
So @every we built the ultimate test: AI Diplomacy, a dynamic benchmark that measures AI's ability to form alliances, negotiate, and betray each other.
Watch them live below! Created from the ground up by @alxai_ and @Tyler_Marques.

We launched twitch.tv/ai_diplomacy today! Been working on this for a while and super proud of it. Watch Claude, Gemini, o3, and others battle it out in the classic board game of Diplomacy.
Super proud to be working with @alxai_ and the team at @every

Congratulations @Beckett_hannah! You’ve worked so hard to get here. Very proud to be by your side for the future #UWaterlooGrad

@kosinception Your domain has expired!! If you try to go to your website it’s messed up.

@joshdneufeld Hey look! You made it onto buzzfeed!! buzzfeed.com/jessicamisener…

@joshdneufeld A cool gif of slime mold searching out food! i.imgur.com/4dpbdyH.gifv
