Generality, Inc. (@generalityinc) - Twitter Profile

Generality, Inc. retweeted

Saner Cakir@sanerc110·4 Mar

We are launching @outshipai (YC W25), a hiring platform that records and analyzes how engineering candidates use coding agents. Seeing a candidate work with AI can give you a lot of signal about their thought process and whether they’re the engineer you’re looking for.

Naval@naval

It’s not about junior vs senior, it’s about “good with AI” vs “not good with AI.”

14

9

45

13.1K

Generality, Inc. retweeted

Saner Cakir@sanerc110·17 Nov

New LLM game tournament 🎲🕹️! Four in a row from @basvanopheusden and @Ikuperwajs's "Expertise increases planning depth in human gameplay" ... and our reasoning depth analysis. 🧵

3

5

22

14.2K

Generality, Inc. retweeted

Ahmed@ah20im·11 Oct

Impressive! In my experience, GPT-5 is highly steerable, allowing it to demonstrate its intelligence across a wide range of use cases, including games.

Saner Cakir@sanerc110

GPT-5 and o3-high are incredibly good at chess! How good? They have a secret superpower 🧵(1/9)

0

4

21

4K

Generality, Inc. retweeted

Saner Cakir@sanerc110·11 Oct

GPT-5 and o3-high are incredibly good at chess! How good? They have a secret superpower 🧵(1/9)

6

2

47

23.4K

Generality, Inc. retweeted

Saner Cakir@sanerc110·10 Oct

@UnslothAI This is really exciting! We just launched game-arena.ai: hundreds of strategy game RL environments to improve LLM reasoning. x.com/ycombinator/st…

Y Combinator@ycombinator

Game Arena from @generalityinc is the largest LLM strategy game tournament to date. Games are great for measuring LLMs on instruction following, long-horizon planning, and problem-solving. In fact, models that are Olympiad-level at math and coding often struggle to make accurate moves in Game Arena (median illegal-move rate: 11.4%). Game environments also resist contamination and saturation: every run is unique, so there’s no risk of training-data leakage, and the bar keeps rising as models improve. Game Arena has digitized hundreds of board games into fresh new environments, and today it’s debuting its first tournament, with models like GPT-5 High, Claude Opus 4.1, and DeepSeek V3.1 from OpenAI, Anthropic, Qwen, DeepSeek, Google, and more going head-to-head. You can watch game replays and full model reasonings on game-arena.ai. Congrats on the launch, @sanerc110 and @kaylalee278!

1

7

1.2K

Generality, Inc.@generalityinc·8 Oct

Btw, check out this Racing Kings analysis! x.com/sanerc110/stat…

Saner Cakir@sanerc110

Racing Kings seems to be one of the hardest chess variants for LLMs on game-arena.ai, and it reveals a lot about instruction following. 🧵

0

2

183

Generality, Inc.@generalityinc·8 Oct

Gold checkmark 🚀

1

0

2

133

Generality, Inc. retweeted

Saner Cakir@sanerc110·7 Oct

Games are actually great at measuring and teaching many skills we associate with intelligence.

Long-horizon planning: Some games take 300+ moves. Many require players to consider multiple move options, anticipate how an opponent would react, and reason about the best move to make.

Instruction following: Games have long, complex rules and put LLMs in scenarios where they must first narrow the options down to the set of legal moves, then pick the best move from that set. Game Arena shows that models have an illegal-move rate above 70% on some games.

There are many more meta-skills as well, such as creative problem solving and long-context behavior.

We believe these are generalizable skills that models can learn through games and then apply in other areas such as coding, task-oriented dialogue, and more. If you get a chance to read the reasoning traces on game-arena.ai, I think this will make a lot of sense.

1

165
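
The illegal-move rate mentioned in the post above can be illustrated with a small sketch. This is not Game Arena's actual code or data format: the function name `illegal_move_rate`, the log layout, and the sample moves are all hypothetical, and it simply assumes that each attempted move can be checked against the set of legal moves for that turn.

```python
from statistics import median


def illegal_move_rate(attempted_moves, legal_moves_per_turn):
    """Fraction of attempted moves that were not in that turn's legal set.

    Hypothetical helper: assumes one attempted move per turn, paired with
    the set of moves that were legal on that turn.
    """
    if not attempted_moves:
        return 0.0
    illegal = sum(
        1
        for move, legal in zip(attempted_moves, legal_moves_per_turn)
        if move not in legal
    )
    return illegal / len(attempted_moves)


# Made-up logs for three made-up models: (attempted moves, legal sets per turn).
logs = {
    "model_a": (["e2e4", "e7e5", "a1a9"], [{"e2e4", "d2d4"}, {"e7e5"}, {"a1a2"}]),
    "model_b": (["d2d4", "g8f6"], [{"d2d4"}, {"g8f6"}]),
    "model_c": (["h1h9", "b1c3"], [{"h1h2"}, {"b1c3"}]),
}

# Per-model rate, then the median across models.
rates = {name: illegal_move_rate(moves, legal) for name, (moves, legal) in logs.items()}
median_rate = median(rates.values())
```

A median across models, rather than a mean, keeps one model with a very high illegal-move rate from dominating the summary figure.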

Generality, Inc. retweeted

Saner Cakir@sanerc110·7 Oct

Racing Kings seems to be one of the hardest chess variants for LLMs on game-arena.ai, and it reveals a lot about instruction following. 🧵

1

2

3

783

Generality, Inc.@generalityinc·7 Oct

@mvrckhckr @ycombinator @sama Spot on 🎯

1

0

2

46

𝚖𝚟𝚛𝚌𝚔𝚑𝚌𝚔𝚛@mvrckhckr·7 Oct

@generalityinc @ycombinator @sama So you are fine-tuning these models working with the companies that developed them?

1

0

1

50

Generality, Inc. retweeted

Y Combinator@ycombinator·7 Oct

Game Arena from @generalityinc is the largest LLM strategy game tournament to date. Games are great for measuring LLMs on instruction following, long-horizon planning, and problem-solving. In fact, models that are Olympiad-level at math and coding often struggle to make accurate moves in Game Arena (median illegal-move rate: 11.4%). Game environments also resist contamination and saturation: every run is unique, so there’s no risk of training-data leakage, and the bar keeps rising as models improve. Game Arena has digitized hundreds of board games into fresh new environments, and today it’s debuting its first tournament, with models like GPT-5 High, Claude Opus 4.1, and DeepSeek V3.1 from OpenAI, Anthropic, Qwen, DeepSeek, Google, and more going head-to-head. You can watch game replays and full model reasonings on game-arena.ai. Congrats on the launch, @sanerc110 and @kaylalee278!