Eric Pang
@_eric_pang_
113 posts

math/cs @uwaterloo | prev: ml @quora, @amazon

San Francisco, CA · Joined June 2020
754 Following · 1.5K Followers

Pinned Tweet
Eric Pang @_eric_pang_
Here's how I (almost) got the high scores in ARC-AGI-1 and 2 (the honor goes to @jeremyberman) while keeping the cost low. To put things into perspective: o3-preview scored 75.7% on ARC-AGI-1 last year while spending $200/task on low setting. My approach scores 77.1% while spending $2.56!
ARC Prize @arcprize

New SOTA on ARC-AGI
- V1: 79.6%, $8.42/task
- V2: 29.4%, $30.40/task
Custom submissions by @jeremyberman and @_eric_pang_ are now the best known solutions to ARC-AGI. Both:
* Are open source
* Use Grok 4
* Implement program-synthesis outer loops with test-time adaptation
Eric Pang @_eric_pang_
@karpathy @goakhmad Agree with most points except that the golden age of movies started in the 80s. imo 70s Hollywood was the most experimental, with the death of the counterculture movement and the end of the Hays Code. Obviously worldwide cinema had a different peak period too.
Andrej Karpathy @karpathy
Movies are great though. Even if you set aside the pure artistic enjoyment (you shouldn't). Movies are stories, and stories are powerful, primal, moving, motivating. They are prompts to you to consider dilemmas and scenarios, to build your world model and compass. My rec is to go to the golden age of storytelling and moviemaking that imo ramped up in the 80s, was roaring in the 90s, peaked in the early 00s, and has declined since. One sourcing example: pick a random year there, look up Oscar winners, pick and watch. Enjoy and attend guilt free.
Akhmad Mamirov @goakhmad
I enjoy learning from @karpathy videos. As a cs student I feel guilty watching movies lately when there is a gold mine for free.
weisser @julianweisser
Francis Ford Coppola sold his vineyard to self-finance ($120M) a film he’d been obsessed with making for decades. Almost everyone says it’s awful. Yet after watching it last night I keep thinking about how the plot (+ the backstory) celebrates builders in a world of naysayers.
Axel Darmouni @ADarmouni
Tonight on ARC-ventures: making vLLM x gpt-oss work, and attempting the DreamCoder-like scaffold… But it seems OSS-20b is a relatively weak model: 20 tasks solved initially, but no more after that, even when it takes time to reflect. So on the roadmap: finetuning (and perhaps RL?) afoot!
Eric Pang @_eric_pang_
Thanks for the cover! My architecture graph does not have a typo: when it's evaluating on the public eval set, the actual test outputs are given, so the system does check if the best program gets 100% on test examples. You are right that we don't know the answers for the submission run.
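The distinction Eric draws above can be sketched as follows. This is a minimal illustration, not his actual code: the function name and the task layout (`task["test"]` as a list of input/output pairs) are assumptions.

```python
# Illustrative sketch (assumed names, not the actual implementation).
# On the public eval set the true test outputs are available, so the system
# can verify whether the best program solves 100% of the test examples.
# On the hidden submission run the answers are unknown, so no such check
# is possible and the function signals that with None.

def passes_test_examples(program, task, outputs_known=True):
    """True/False when test outputs are available; None on a hidden run."""
    if not outputs_known:  # submission run: answers are hidden
        return None
    return all(program(inp) == out for inp, out in task["test"])
```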
Eric Pang @_eric_pang_
@FraserGreenlee Yes, I think this point is underdiscussed. My solution has higher accuracy and lower cost per task on ARC-1 compared to the average human.
Siméon @Simeon_Cps
@_eric_pang_ Hey Eric! Congrats on the results. I'm building something new very closely related to what you did and reached out to you on LinkedIn (DMs aren't open here)! Would love to chat.
Eric Pang @_eric_pang_
For the same reason, ARC-AGI is the most important benchmark in AI. It is the only benchmark that has not been saturated after repeated attempts from players big and small.
Eric Pang @_eric_pang_
That's right: when the system attempts the first task, it skips the program-fetching step since the library is initially empty. If you want to see how the library evolves, check out github.com/epang080516/ar…. This is the resulting library after the system attempts the ARC-2 public training set to build Knowledge Priors.
Axel Darmouni @ADarmouni
Super cool read by @_eric_pang_ on how he managed, thanks to the power of recent models like Grok 4, to get a crazy high score on both @arcprize 1 (77%) and 2 (26%) at less than $4 per task! The methodology is inspired by DreamCoder, a pretty neat approach I discovered with the read :)

For each task, look at a library of conserved programs (if the lib is empty, I suppose you start without a program). Find the best program and generate 2 improvements; keep the best one by adding it to the library, which you thus construct iteratively while sampling. Do this process for all tasks, then redo with the newly improved library.

It only required him… 10 LLM calls per task. Legit insane when you think that previous solutions had 500 or 8000. Pretty cool! Am really, really interested in getting reasoning traces of the refinements if possible… otherwise I might take matters into my own hands and do them, because this is a crazy good induction idea! 😁
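The outer loop Axel describes can be sketched roughly as below. This is a minimal illustration of a DreamCoder-style library-growing loop under stated assumptions: `llm_propose` is a stub standing in for the real Grok 4 call, all function names are invented, and programs are plain Python callables rather than whatever representation the actual solution uses.

```python
# Hypothetical sketch of the DreamCoder-like outer loop described above.

def score(program, task):
    """Fraction of a task's training pairs the candidate program solves."""
    solved = sum(1 for inp, out in task["train"] if program(inp) == out)
    return solved / len(task["train"])

def llm_propose(best_program, task, n=2):
    """Placeholder for an LLM call proposing n refinements of a program."""
    return [best_program] * n  # stub: a real system would prompt the model

def solve_tasks(tasks, library, rounds=2):
    for _ in range(rounds):                      # redo with the improved library
        for task in tasks:
            # Fetch the best known program; skip when the library is empty.
            best = max(library, key=lambda p: score(p, task), default=None)
            if best is None:
                best = lambda grid: grid         # trivial seed program
            # Generate 2 improvements, keep the best, grow the library.
            candidates = [best] + llm_propose(best, task, n=2)
            best = max(candidates, key=lambda p: score(p, task))
            library.append(best)
    return library
```

The design point the thread highlights is the budget: because each task only consults the shared library plus a couple of refinement calls, the per-task LLM call count stays around ten instead of hundreds or thousands.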
Eric Pang @_eric_pang_
@K3ithAI @jeremyberman On the semi-private eval set:
- ARC-AGI-1: 77.1%, $2.56/task
- ARC-AGI-2: 26.0%, $3.97/task
K3ith.AI @K3ithAI
@_eric_pang_ @jeremyberman Did you score that on the training or actual eval set? I’ve crushed the training set too…nbd…but the eval set, the 100 questions for arc-agi2 are no joke…and you got 77% on that?!?!
ARC Prize @arcprize
New SOTA on ARC-AGI
- V1: 79.6%, $8.42/task
- V2: 29.4%, $30.40/task
Custom submissions by @jeremyberman and @_eric_pang_ are now the best known solutions to ARC-AGI. Both:
* Are open source
* Use Grok 4
* Implement program-synthesis outer loops with test-time adaptation
Eric Pang @_eric_pang_
@joshlee361 @arcprize @jeremyberman My solution is cost-efficient. It costs <$500 to fully test on the public eval set with Grok-4. You can decrease the cost further with a more lightweight model.
Eric Pang retweeted
Jeremy Berman @jeremyberman
I'm back at the top of ARC-AGI with my new program. I use @grok 4 and multi-agent collaboration with evolutionary test-time compute
ARC Prize @arcprize

New SOTA on ARC-AGI
- V1: 79.6%, $8.42/task
- V2: 29.4%, $30.40/task
Custom submissions by @jeremyberman and @_eric_pang_ are now the best known solutions to ARC-AGI. Both:
* Are open source
* Use Grok 4
* Implement program-synthesis outer loops with test-time adaptation