Generality, Inc.

16 posts

@generalityinc

Humanity is advancing full tilt towards general intelligence. Generality, Inc. is building the measures to take us beyond pointwise progress. | YC W25

San Francisco, California · Joined August 2025
6 Following · 46 Followers
Generality, Inc. retweeted
Saner Cakir @sanerc110
New LLM game tournament 🎲🕹️! Four in a row from @basvanopheusden and @Ikuperwajs's "Expertise increases planning depth in human gameplay" ... and our reasoning depth analysis. 🧵
3
5
22
14.2K
Generality, Inc. retweeted
Saner Cakir @sanerc110
GPT-5 and o3-high are incredibly good at chess! How good? They have a secret superpower 🧵(1/9)
6
2
47
23.4K
Generality, Inc. retweeted
Saner Cakir @sanerc110
@UnslothAI This is really exciting! We just launched game-arena.ai: hundreds of strategy game RL environments to improve LLM reasoning. x.com/ycombinator/st…
1
1
7
1.2K
Generality, Inc. retweeted
Saner Cakir @sanerc110
Games are actually great at measuring and teaching many skills we associate with intelligence.
Long-horizon planning: Some games take 300+ moves. Many require players to consider multiple move options, anticipate how an opponent would react, and reason about the best move to make.
Instruction following: Games have long, complex rules and put LLMs in scenarios where they first have to narrow the options down to the set of legal moves and then pick the best move from that set. Game Arena shows that models have an illegal-move rate above 70% on some games.
There are many more meta-skills as well, such as creative problem solving and long-context behavior. We believe these are generalizable skills that models can learn through games and then apply in other areas like coding, task-oriented dialogue, and more. If you get a chance to read the reasoning traces on game-arena.ai, I think this will make a lot of sense.
1
1
1
165
Generality, Inc. retweeted
Saner Cakir @sanerc110
Racing Kings seems to be one of the hardest chess variants for LLMs on game-arena.ai, and it reveals a lot about instruction following. 🧵
1
2
3
783
Generality, Inc. retweeted
Y Combinator @ycombinator
Game Arena from @generalityinc is the largest LLM strategy game tournament to date. Games are great for measuring LLMs on instruction following, long-horizon planning, and problem-solving. In fact, models that are Olympiad-level at math and coding often struggle to make accurate moves in Game Arena (median illegal-move rate: 11.4%). Game environments also resist contamination and saturation: every run is unique, so there’s no risk of training-data leakage, and the bar keeps rising as models improve. Game Arena has digitized hundreds of board games into fresh new environments, and today it’s debuting its first tournament, with models like GPT-5 High, Claude Opus 4.1, and DeepSeek V3.1 from OpenAI, Anthropic, Qwen, DeepSeek, Google, and more going head-to-head. You can watch game replays and full model reasonings on game-arena.ai. Congrats on the launch, @sanerc110 and @kaylalee278!
16
22
108
21.7K