Johan Land

39 posts

@LandJohan

https://t.co/C03802IDYv

Joined September 2011

11 Following · 341 Followers

Johan Land @LandJohan · Feb 5

@82deutschmark Thanks man! And, thanks a ton for all you do for the community!

Mark Barney @82deutschmark · Feb 5

Huge congrats to @LandJohan on verification of this amazing achievement!

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@charlielidbury @arcprize 76.11%

Charlie Lidbury @charlielidbury · Feb 4

@arcprize @LandJohan What % did you get on the public eval set? arcprize.org/media/data/lea… only shows your semi-private scores

ARC Prize @arcprize · Feb 3

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@ndzfs No, you still need to find the solution, not just judge it

Franck SN @ndzfs · Feb 4

@LandJohan If you know beforehand when they are right or wrong, then you would score 100% on any benchmark, I guess?

Franck SN @ndzfs · Feb 3

Here is the ensemble agent I was conceptually imagining, now scoring >70% on v2 of ARC-AGI.

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@codewithimanshu @chatgpt21 Yeah, proves scaffolding is impactful!

Himanshu Kumar @codewithimanshu · Feb 4

@chatgpt21 @LandJohan That's a significant improvement in ARC-AGI performance, especially considering the cost per task.

Chris @chatgpt21 · Feb 3

New SOTA public ARC-AGI!! V1: 94.5%, $11.4/task V2: 72.9%, $38.9/task GPT 5.2 refine! Arc says - “submission by @LandJohan ensembles many approaches together” Congratulations!

ARC Prize@arcprize

Johan's submission does a multi-model ensemble. It runs the same task through GPT-5.2, Gemini-3, and Claude Opus 4.5 in parallel. Tries multiple times with different prompting strategies (standard, deep thinking, with images). Then, instead of predicting the grid directly, the LLMs write Python functions that describe the transformation rule, then execute that code in a sandbox to produce the answer. After collecting many candidate answers, separate AI "judge" models evaluate and vote on which solution is most likely correct. See the repo here: github.com/beetree/ARC-AGI
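The generate-execute-vote loop described above can be illustrated with a minimal sketch. It is not code from the linked repo. The candidate programs are hard-coded, hypothetical stand-ins for LLM-written functions, the `transform` name is an assumption, `exec` is used only for brevity and is not a real sandbox, and a plain majority vote is swapped in for the AI judges.

```python
from collections import Counter

# Hypothetical candidate programs, standing in for code written by the LLMs.
CANDIDATES = [
    "def transform(grid):\n    return [row[::-1] for row in grid]",            # mirror left-right
    "def transform(grid):\n    return [list(r) for r in zip(*grid)]",          # transpose
    "def transform(grid):\n    return [list(reversed(row)) for row in grid]",  # mirror, written differently
    "def transform(grid):\n    return grid[0][0",                              # broken code (syntax error)
]

def run_candidate(source, grid):
    """Run one candidate on a copy of the grid; return None on any failure.

    exec() is NOT a sandbox. A real system would isolate this step,
    for example in a separate process with resource limits.
    """
    namespace = {}
    try:
        exec(source, namespace)
        return namespace["transform"]([row[:] for row in grid])
    except Exception:
        return None

def solve(train_pairs, test_input, candidates=CANDIDATES):
    """Keep candidates that reproduce every training pair, then majority-vote."""
    votes = Counter()
    for source in candidates:
        if all(run_candidate(source, x) == y for x, y in train_pairs):
            out = run_candidate(source, test_input)
            if out is not None:
                votes[tuple(map(tuple, out))] += 1  # hashable key for counting
    if not votes:
        return None  # no candidate explained the training examples
    best, _ = votes.most_common(1)[0]
    return [list(row) for row in best]

train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
print(solve(train, [[5, 6, 7]]))  # prints [[7, 6, 5]]
```

Checking each program against the training pairs before voting is one cheap filter. The submission is described as using judge models for the final choice, which this sketch does not attempt to reproduce.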

Johan Land @LandJohan · Feb 4

@desai_pratik No, sorry. Code is open source though.

Pratik A. Desai @desai_pratik · Feb 4

@LandJohan Any underpinning research paper to share?

Johan Land @LandJohan · Jan 5

Just scored 76.11% on ARC-AGI 2, beating the public GPT-5.2 and Gemini-3-Pro baselines by >20% and (as far as I know) the best publicly reported result so far.

Approach: what I’d call Multi-Model Reflective Reasoning
- Using GPT-5.2, Gemini-3, Opus 4.5
- Long-horizon/multi-step reasoning (~6hrs/problem)
- Agentic codegen (>100,000 python calls)
- Visual reasoning
- Council of judges

Fun fact: all solver code was written by Gemini-3-CLI. Does this count as AI generating a new AI that beats the prior SOTA? 🤔

Full run + code (open source): kaggle.com/code/johanland…

@GregKamradt, holiday break is over 🙂 semi-private when?

#ARCAGI #AIResearch

Johan Land @LandJohan · Feb 4

@pradanadimass @arcprize Oh, I am :) can trade y for x at efficient ratio. This was really going for max y though.

Dimas Pradana🛰 @pradanadimass · Feb 4

@arcprize @LandJohan tell him, he needs to consider the x-axis

Johan Land @LandJohan · Feb 4

@thedjpetersen @ItsBrain4Brain @arcprize The code is open source: github.com/beetree/ARC-AGI Do whatever you want with it :) I don't think I even put in any license in there so it's free for all!

DJ Petersen @thedjpetersen · Feb 4

@LandJohan @ItsBrain4Brain @arcprize Curious could you share?

Johan Land @LandJohan · Feb 4

No, but it's all open source. Maybe I should write a paper. Essentially, it's gathering the reasoning traces for all possible solutions. Then it's exposing those to three different judges with slightly different roles. The judges then express their opinions, after which a solution is picked.
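A toy sketch of how a council like this could be wired up, assuming three judges that each score a candidate from its reasoning trace. The judge roles, the scoring rules, and summing the scores are all illustrative assumptions. In the real system the judges are models, and the plain functions here only stand in for them.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    answer: str            # proposed solution (a string here for brevity)
    trace: str             # reasoning trace that produced the answer
    passed_training: bool  # did its rule reproduce the training examples?

# Hypothetical judge roles; each returns a score between 0 and 1.
def evidence_judge(c, pool):
    # Rewards candidates whose rule was verified on the training examples.
    return 1.0 if c.passed_training else 0.0

def consensus_judge(c, pool):
    # Rewards answers that many independent attempts agree on.
    return sum(other.answer == c.answer for other in pool) / len(pool)

def skeptic_judge(c, pool):
    # Penalizes traces that admit to guessing.
    return 0.0 if "guess" in c.trace.lower() else 1.0

JUDGES = (evidence_judge, consensus_judge, skeptic_judge)

def pick(pool):
    """Sum the judges' scores per candidate and return the best-scoring answer."""
    best_by_answer = {}
    for c in pool:
        score = sum(judge(c, pool) for judge in JUDGES)
        best_by_answer[c.answer] = max(best_by_answer.get(c.answer, 0.0), score)
    return max(best_by_answer, key=best_by_answer.get)

pool = [
    Candidate("A", "Rule: mirror each row; verified on all training pairs.", True),
    Candidate("B", "Best guess: rotate the grid.", False),
    Candidate("B", "Not sure, so I guess a rotation.", False),
]
print(pick(pool))  # prints A
```

In this example the verified answer wins even though the unverified one has more votes, which is the kind of outcome a council is meant to allow and a plain majority vote is not.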

Jonathan Dunlap @JonathanRoseD · Feb 4

@LandJohan @ender_mode_ Respect. Also congrats on reaching new high scores! I'd love to understand more about the "council" mechanics. Is there a paper?

Ender @ender_mode_ · Feb 3

Hall of fame twitter profile, guy has like 20 followers

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@diegocabezas01 It's all public source: github.com/beetree/ARC-AGI Check the v7 branch, that's the latest. Actually, go back a few commits and you'll find an even higher performing version - I had to dumb it down a bit for the submission.

Diego | AI 🚀 - e/acc @diegocabezas01 · Feb 4

@LandJohan That’s actually amazing! Any repo for the model? Or info on how it was built?

Diego | AI 🚀 - e/acc @diegocabezas01 · Feb 3

AI model based on GPT-5.2 got a new RECORD in ARC-AGI 2: 73%

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@Viam_Invenias_0 @teortaxesTex I think you can get higher with the Chinese models with an approach similar to the one I took here.

Maiv @Viam_Invenias_0 · Feb 4

@teortaxesTex Their scaling is just crazy. No Chinese model has even gotten past 20% yet. Can the whale bros do it?

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex · Feb 3

Anyone still wants to argue that 5.2 is not the strongest model around? (yes this is a multi-model setup, but it's principally built around 5.2) ARC-AGI 2 will be solved this year.

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@captain_marrvel @OpenAI Beautiful model indeed. Slow though :/

MarrvelSystems @captain_marrvel · Feb 3

GPT-5.2 is a beast, extremely underrated. I hope @OpenAI keeps it.

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@DeryaTR_ Next challenge is indeed ARC-AGI-3! The beautiful thing about ARC-AGI is that they allow "hobbyists" like myself to fairly be benchmarked against the labs.

Derya Unutmaz, MD @DeryaTR_ · Feb 3

Wow! New records on ARC-AGI, based on GPT 5.2! V1 is now at 94.5%, almost saturated! V2 is at 72.9%, I anticipate that it will near 100% in a few months! Next challenge is completing ARC-AGI-3!

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@kimmonismus It's moving fast, indeed! Exciting times ahead!

Chubby♨️ @kimmonismus · Feb 4

Within just 10 months, performance on the ARC-AGI-2 benchmark surpassed 75%. Let that sink in.

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@joshlee361 Largely I agree. Few other things to it, but the key thing indeed is that different models/prompts/modalities/chaining generate diverse results. But then, you also need to "know when you know" and "know when you don't know" which is the other half of the problem.

⚔️Digital 👹 Ronin⚔️ (クラッシュ・オーバーライドX) @joshlee361 · Feb 3

Just to set the record straight 📌 — no single model did the heavy lifting here. GPT helped with the math and structural scaffolding 🧮 Gemini handled the core logic, reasoning flow, and split image generation with precision 🧠🖼️ Claude absolutely dominated the creative side — especially when it came to accurate reproduction, narrative cohesion, and score fidelity 🎼✍️ When the outputs were evaluated, the results weren’t “exact benchmark copies,” but they were so close in behavior and performance that the distinction becomes academic. That’s the real takeaway. This is exactly why I’ve been preaching multi-agent / multi-model orchestration for over a year now 🔁🤝. Different intelligences excel at different layers — forcing one model to do everything is leaving performance on the table. Shoutout to @RileyRalmuto as well — he’s been pushing polyphonic system designs in the same spirit. Different voices, different strengths, one coherent output 🎶🧩. This isn’t model worship. It’s systems engineering. And this is where the real gains are coming from 🚀

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@BadTechBandit @arcprize "Indie researcher", I like it :)

Roman M - Still Human Robot Boss - e/acc @BadTechBandit · Feb 4

📢 @arcprize: Johan Land's AI ensemble hits 94.5% on ARC-AGI leaderboard Solo researcher outperforms GPT-5.2 Pro (90.5%) on the benchmark designed to test generalization. Uses an ensemble approach combining multiple techniques. Code open-sourced. Why it matters: The gap between frontier labs and indie researchers keeps shrinking.

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@permaximum88 Yes! I love the community! Thanks @permaximum88 for everything you're doing!

Permaximum @permaximum88 · Feb 3

Our active discord member "beetree" (@LandJohan) submitted a SOTA solution that scored 72.9% on ARC-AGI-2. Congrats! Come join us on discord.gg/9b77dPAmcA

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@BillyHoy1_ Not really :)

Billy Hoy @BillyHoy1_ · Feb 4

Just a good old ensemble of models

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together

Johan Land @LandJohan · Feb 4

@SuperbBias Diversity is the keyword indeed. One of the biggest insights I had was to induce diversity in the models by forcing them to think in different spaces and modalities.

Tom English @SuperbBias · Feb 4

Diverse ensembles of models are The Way. Thus saith Tomathustra. (See the model ranking criterion from 1994 in my profile pic. Ideally, the highly-ranked models would have mutually orthogonal error vectors.)
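The point about mutually orthogonal error vectors can be made concrete with a small sketch. The error vectors below are made-up toy data (1 means the model got that task wrong), the model names are hypothetical, and the majority vote is a simplification that treats every task as right-or-wrong.

```python
import math

def cosine(u, v):
    """Cosine similarity of two error vectors; 0.0 means orthogonal errors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def majority_errors(error_vectors):
    """A task is an ensemble error only if more than half the models miss it."""
    n = len(error_vectors)
    return [int(sum(column) * 2 > n) for column in zip(*error_vectors)]

# Toy data: each model misses two of six tasks, but never the same ones.
errors = {
    "model_a": [1, 0, 0, 1, 0, 0],
    "model_b": [0, 1, 0, 0, 1, 0],
    "model_c": [0, 0, 1, 0, 0, 1],
}

print(cosine(errors["model_a"], errors["model_b"]))  # prints 0.0
print(majority_errors(list(errors.values())))        # prints [0, 0, 0, 0, 0, 0]

# Three copies of the same model fail together, so voting cannot help.
print(majority_errors([errors["model_a"]] * 3))      # prints [1, 0, 0, 1, 0, 0]
```

Each model here has the same individual accuracy in both cases. Only the overlap between their mistakes changes, and that alone decides whether the ensemble improves on its members.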

ARC Prize@arcprize

New SOTA public submission to ARC-AGI: - V1: 94.5%, $11.4/task - V2: 72.9%, $38.9/task Based on GPT 5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together
