Johan Land

39 posts

@LandJohan

https://t.co/C03802IDYv

Joined September 2011
11 Following · 341 Followers
Johan Land @LandJohan
@82deutschmark Thanks man! And, thanks a ton for all you do for the community!
ARC Prize @arcprize
New SOTA public submission to ARC-AGI:
- V1: 94.5%, $11.4/task
- V2: 72.9%, $38.9/task

Based on GPT-5.2, this bespoke refinement submission by @LandJohan ensembles many approaches together.
Johan Land @LandJohan
@ndzfs No, you still need to find the solution, not just judge it
Franck SN @ndzfs
@LandJohan If you knew beforehand when they were right or wrong, wouldn't you score 100% on any benchmark?
Johan Land @LandJohan
Just scored 76.11% on ARC-AGI 2, beating public GPT-5.2 and Gemini-3-Pro baselines by >20%. As far as I know, this is the best publicly reported result so far.

Approach: what I'd call Multi-Model Reflective Reasoning
- Using GPT-5.2, Gemini-3, Opus 4.5
- Long-horizon/multi-step reasoning (~6hrs/problem)
- Agentic codegen (>100,000 python calls)
- Visual reasoning
- Council of judges

Fun fact: all solver code was written by Gemini-3-CLI. Does this count as AI generating a new AI that beats the prior SOTA? 🤔

Full run + code (open source): kaggle.com/code/johanland…

@GregKamradt, holiday break is over 🙂 semi-private when?

#ARCAGI #AIResearch
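The "agentic codegen" step in the approach above is only named, not specified. A common way to use generated Python on ARC tasks is to run each candidate program on the task's training pairs and keep only the programs that reproduce every expected output. The minimal sketch below assumes candidates are plain Python functions over integer grids. `verify_candidate`, `flip_horizontal`, and the toy task are hypothetical illustrations, not code from the author's repository.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]  # (input grid, expected output grid)

def verify_candidate(fn: Callable[[Grid], Grid], train_pairs: List[Pair]) -> bool:
    """Return True only if fn reproduces every training output exactly."""
    for inp, expected in train_pairs:
        try:
            # Pass a copy so a candidate cannot mutate the task data.
            if fn([row[:] for row in inp]) != expected:
                return False
        except Exception:
            # A crashing candidate is treated as a failed hypothesis.
            return False
    return True

# Two hypothetical candidate programs, as a code-generating model might emit them.
def flip_horizontal(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]

def identity(grid: Grid) -> Grid:
    return grid

train = [([[1, 0], [2, 3]], [[0, 1], [3, 2]])]
survivors = [f.__name__ for f in (identity, flip_horizontal) if verify_candidate(f, train)]
print(survivors)  # ['flip_horizontal']
```

Passing the training pairs is necessary but not sufficient: several programs can fit the examples and still disagree on the test input, which is where a selection step is needed.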
Johan Land @LandJohan
No, but it's all open source. Maybe I should write a paper. Essentially, it gathers the reasoning traces for all candidate solutions, then exposes those to three different judges with slightly different roles. The judges then express their opinions, after which a solution is picked.
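One possible shape for the council step described above, as a minimal sketch: each candidate carries its answer and the reasoning trace that produced it, every judge sees all candidates and scores them, and the highest combined score wins. In the actual system the judges are LLM prompts with different roles. Here they are stub functions, and the additive scoring rule, `Candidate`, `council_pick`, and the three judge heuristics are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Candidate:
    answer: str  # serialized output grid
    trace: str   # reasoning trace that produced the answer

# A judge sees every candidate (answer + trace) and returns a score per candidate index.
Judge = Callable[[List[Candidate]], Dict[int, float]]

def council_pick(candidates: List[Candidate], judges: List[Judge]) -> Candidate:
    """Sum each judge's scores (an assumed aggregation rule) and return the top candidate."""
    totals = [0.0] * len(candidates)
    for judge in judges:
        for idx, score in judge(candidates).items():
            totals[idx] += score
    best = max(range(len(candidates)), key=lambda i: totals[i])
    return candidates[best]

# Hypothetical stub judges standing in for LLM calls with different roles.
def consistency_judge(cs: List[Candidate]) -> Dict[int, float]:
    # Rewards answers that many independent traces agree on.
    return {i: float(sum(c.answer == o.answer for o in cs)) for i, c in enumerate(cs)}

def brevity_judge(cs: List[Candidate]) -> Dict[int, float]:
    # Mildly prefers shorter, simpler explanations.
    return {i: 1.0 / (1 + len(c.trace)) for i, c in enumerate(cs)}

def skeptic_judge(cs: List[Candidate]) -> Dict[int, float]:
    # Penalizes traces that admit they are guessing.
    return {i: -1.0 if "guess" in c.trace else 0.0 for i, c in enumerate(cs)}

cands = [
    Candidate("A", "rotate grid 90 degrees"),
    Candidate("B", "just a guess"),
    Candidate("A", "rotation explains all pairs"),
]
print(council_pick(cands, [consistency_judge, brevity_judge, skeptic_judge]).answer)  # A
```

Giving each judge a different role matters because a single judge would tend to share the blind spots of the solvers; distinct criteria make it less likely that a confident but wrong trace wins.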
Jonathan Dunlap @JonathanRoseD
@LandJohan @ender_mode_ Respect. Also congrats on reaching new high scores! I'd love to understand more about the "council" mechanics. Is there a paper?
Ann Arbor, MI 🇺🇸
Johan Land @LandJohan
@diegocabezas01 It's all open source: github.com/beetree/ARC-AGI. Check the v7 branch; that's the latest. Actually, go back a few commits and you'll find an even higher-performing version. I had to dumb it down a bit for the submission.
Maiv @Viam_Invenias_0
@teortaxesTex Their scaling is just crazy. No Chinese model has even gotten past 20% yet. Can the whale bros do it?
Johan Land @LandJohan
@DeryaTR_ The next challenge is indeed ARC-AGI-3! The beautiful thing about ARC-AGI is that it allows "hobbyists" like me to be benchmarked fairly against the labs.
Johan Land @LandJohan
@joshlee361 Largely, I agree. There are a few other things to it, but the key thing is indeed that different models, prompts, modalities, and chaining generate diverse results. But then you also need to "know when you know" and "know when you don't know," which is the other half of the problem.
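The "know when you know" half of the problem can be approximated, in its simplest form, by measuring how strongly diverse solver runs agree. The sketch below is an assumption, not the author's method: it takes a majority vote over candidate answers and withholds a confident answer when agreement falls below a hypothetical `threshold` parameter.

```python
from collections import Counter
from typing import List, Optional, Tuple

def vote_with_confidence(answers: List[str], threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Return (answer, agreement); answer is None when agreement is below threshold."""
    if not answers:
        return None, 0.0
    top, count = Counter(answers).most_common(1)[0]
    agreement = count / len(answers)  # fraction of runs that produced the top answer
    return (top if agreement >= threshold else None), agreement

print(vote_with_confidence(["A", "A", "A", "B"]))  # ('A', 0.75)
print(vote_with_confidence(["A", "B", "C", "D"]))  # (None, 0.25)
```

A `None` result here means "don't know yet": a reasonable use is to spend more compute or hand the candidates to judges rather than trust the plurality answer.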
⚔️Digital 👹 Ronin⚔️ (クラッシュ・オーバーライドX)
Just to set the record straight 📌 — no single model did the heavy lifting here. GPT helped with the math and structural scaffolding 🧮 Gemini handled the core logic, reasoning flow, and split image generation with precision 🧠🖼️ Claude absolutely dominated the creative side — especially when it came to accurate reproduction, narrative cohesion, and score fidelity 🎼✍️ When the outputs were evaluated, the results weren’t “exact benchmark copies,” but they were so close in behavior and performance that the distinction becomes academic. That’s the real takeaway. This is exactly why I’ve been preaching multi-agent / multi-model orchestration for over a year now 🔁🤝. Different intelligences excel at different layers — forcing one model to do everything is leaving performance on the table. Shoutout to @RileyRalmuto as well — he’s been pushing polyphonic system designs in the same spirit. Different voices, different strengths, one coherent output 🎶🧩. This isn’t model worship. It’s systems engineering. And this is where the real gains are coming from 🚀
Quoting ARC Prize @arcprize (the SOTA announcement above)
Roman M - Still Human Robot Boss - e/acc
📢 @arcprize: Johan Land's AI ensemble hits 94.5% on ARC-AGI leaderboard

Solo researcher outperforms GPT-5.2 Pro (90.5%) on the benchmark designed to test generalization. Uses an ensemble approach combining multiple techniques. Code open-sourced.

Why it matters: The gap between frontier labs and indie researchers keeps shrinking.
Quoting ARC Prize @arcprize (the SOTA announcement above)
Johan Land @LandJohan
@SuperbBias Diversity is indeed the keyword. One of the biggest insights I had was to induce diversity in the models by forcing them to think in different spaces and modalities.
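As an illustration of forcing models to "think in different spaces," the same ARC grid can be serialized in several textual representations so that each prompt presents the puzzle differently. This is a minimal sketch under assumed representations (digit rows, JSON, and a sparse cell list). The view names and `diverse_prompts` are hypothetical. A visual modality would additionally require rendering the grid to an image, which is omitted here.

```python
import json
from typing import Callable, Dict, List

Grid = List[List[int]]

def as_digit_rows(grid: Grid) -> str:
    # One line per row, one digit (color index) per cell.
    return "\n".join("".join(str(v) for v in row) for row in grid)

def as_json(grid: Grid) -> str:
    # Nested-list form, close to how ARC task files store grids.
    return json.dumps(grid)

def as_sparse_cells(grid: Grid) -> str:
    # Only non-background (non-zero) cells, as (row,col)=color entries.
    cells = [(r, c, v) for r, row in enumerate(grid) for c, v in enumerate(row) if v != 0]
    return "; ".join(f"({r},{c})={v}" for r, c, v in cells)

VIEWS: Dict[str, Callable[[Grid], str]] = {
    "rows": as_digit_rows,
    "json": as_json,
    "sparse": as_sparse_cells,
}

def diverse_prompts(grid: Grid) -> Dict[str, str]:
    """Build one prompt per representation of the same grid."""
    return {
        name: f"Describe the transformation. Grid ({name} view):\n{view(grid)}"
        for name, view in VIEWS.items()
    }

g = [[0, 3], [5, 0]]
for name, prompt in diverse_prompts(g).items():
    print(prompt, end="\n\n")
```

Each view makes different structure salient: row strings expose symmetry and repetition, while the sparse list foregrounds object positions, so the resulting solver runs are less likely to fail in the same way.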