Eric Pang
@_eric_pang_
113 posts

math/cs @uwaterloo | prev: ml @quora, @amazon

San Francisco, CA · Joined June 2020
754 Following · 1.5K Followers

Pinned Tweet
Eric Pang @_eric_pang_
Here's how I (almost) got the high scores in ARC-AGI-1 and 2 (the honor goes to @jeremyberman) while keeping the cost low. To put things into perspective: o3-preview scored 75.7% on ARC-AGI-1 last year while spending $200/task on low setting. My approach scores 77.1% while spending $2.56!
ARC Prize @arcprize

New SOTA on ARC-AGI
- V1: 79.6%, $8.42/task
- V2: 29.4%, $30.40/task
Custom submissions by @jeremyberman and @_eric_pang_ are now the best known solutions to ARC-AGI. Both:
* Are open source
* Use Grok 4
* Implement program-synthesis outer loops with test-time adaptation
Eric Pang @_eric_pang_
@karpathy @goakhmad Agree with most points except that the golden age of movies started in the 80s. imo 70s Hollywood was the most experimental, with the death of the counterculture movement and the end of the Hays Code. Obviously worldwide cinema had a different peak period too.
Andrej Karpathy @karpathy
Movies are great though. Even if you set aside the pure artistic enjoyment (you shouldn't). Movies are stories, and stories are powerful, primal, moving, motivating. They are prompts to you to consider dilemmas and scenarios, to build your world model and compass. My rec is to go to the golden age of storytelling and moviemaking that imo ramped up in the 80s, was roaring in the 90s, peaked in the early 00s, and has declined since. One sourcing example: pick a random year there, look up Oscar winners, pick and watch. Enjoy and attend guilt free.
Akhmad Mamirov @goakhmad
I enjoy learning from @karpathy videos. As a cs student I feel guilty watching movies lately when there is a gold mine for free.
weisser @julianweisser
Francis Ford Coppola sold his vineyard to self-finance ($120M) a film he’d been obsessed with making for decades. Almost everyone says it’s awful. Yet after watching it last night I keep thinking about how the plot (+ the backstory) celebrates builders in a world of naysayers.
Axel Darmouni @ADarmouni
Tonight on ARC-ventures: making vLLM x gpt-oss work, and attempting the DreamCoder-like scaffold… But it seems OSS-20b is a relatively weak model: 20 tasks solved initially, but no more after that, even when it takes time to reflect. So on the roadmap: finetuning (and perhaps RL?) afoot!
Eric Pang @_eric_pang_
Thanks for the cover! My architecture graph does not have a typo: when it's evaluating on the public eval set, the actual test outputs are given, so the system does check if the best program gets 100% on test examples. You are right that we don't know the answers for the submission run.
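The distinction Eric draws above can be sketched as follows. This is a minimal illustration, not his actual code: the function name and the task layout (`task["test"]` as a list of input/output pairs) are assumptions.

```python
# Illustrative sketch (assumed names, not the actual implementation).
# On the public eval set the true test outputs are available, so the system
# can verify whether the best program solves 100% of the test examples.
# On the hidden submission run the answers are unknown, so no such check
# is possible and the function signals that with None.

def passes_test_examples(program, task, outputs_known=True):
    """True/False when test outputs are available; None on a hidden run."""
    if not outputs_known:  # submission run: answers are hidden
        return None
    return all(program(inp) == out for inp, out in task["test"])
```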
Eric Pang @_eric_pang_
@FraserGreenlee Yes, I think this point is underdiscussed. My solution has higher accuracy and lower cost per task on ARC-1 compared to the average human.
Siméon @Simeon_Cps
@_eric_pang_ Hey Eric! Congrats on the results. I'm building something new very closely related to what you did and reached out to you on LinkedIn (DMs aren't open here)! Would love to chat.
Eric Pang @_eric_pang_
For the same reason, ARC-AGI is the most important benchmark in AI. It is the only benchmark that has not been saturated after repeated attempts from players big and small.
Eric Pang @_eric_pang_
That's right: when the system attempts the first task, it skips the program-fetching step since the library is initially empty. If you want to see how the library evolves, check out github.com/epang080516/ar…. This is the resulting library after the system attempts the ARC-2 public training set to build Knowledge Priors.
Axel Darmouni @ADarmouni
Super cool read by @_eric_pang_ on how he managed, thanks to the power of recent models like Grok 4, to get a crazy high score on both @arcprize 1 (77%) and 2 (26%) at less than $4 per task! The methodology is inspired by DreamCoder, a pretty neat approach I discovered with the read :)

For each task, look at a library of conserved programs (if the lib is empty, I suppose you start without a program). Find the best program and generate 2 improvements; keep the best one by adding it to the library, which you thus construct iteratively while sampling. Do this process for all tasks, then redo with the newly improved library.

It only required him… 10 LLM calls per task. Legit insane when you think that previous solutions had 500 or 8000. Pretty cool! Am really, really interested in getting reasoning traces of the refinements if possible… otherwise I might take matters into my own hands and do them, because this is a crazy good induction idea! 😁
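The outer loop Axel describes can be sketched roughly as below. This is a minimal illustration of a DreamCoder-style library-growing loop under stated assumptions: `llm_propose` is a stub standing in for the real Grok 4 call, all function names are invented, and programs are plain Python callables rather than whatever representation the actual solution uses.

```python
# Hypothetical sketch of the DreamCoder-like outer loop described above.

def score(program, task):
    """Fraction of a task's training pairs the candidate program solves."""
    solved = sum(1 for inp, out in task["train"] if program(inp) == out)
    return solved / len(task["train"])

def llm_propose(best_program, task, n=2):
    """Placeholder for an LLM call proposing n refinements of a program."""
    return [best_program] * n  # stub: a real system would prompt the model

def solve_tasks(tasks, library, rounds=2):
    for _ in range(rounds):                      # redo with the improved library
        for task in tasks:
            # Fetch the best known program; skip when the library is empty.
            best = max(library, key=lambda p: score(p, task), default=None)
            if best is None:
                best = lambda grid: grid         # trivial seed program
            # Generate 2 improvements, keep the best, grow the library.
            candidates = [best] + llm_propose(best, task, n=2)
            best = max(candidates, key=lambda p: score(p, task))
            library.append(best)
    return library
```

The design point the thread highlights is the budget: because each task only consults the shared library plus a couple of refinement calls, the per-task LLM call count stays around ten instead of hundreds or thousands.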
Eric Pang @_eric_pang_
@K3ithAI @jeremyberman On the semi-private eval set:
- ARC-AGI-1: 77.1%, $2.56/task
- ARC-AGI-2: 26.0%, $3.97/task
K3ith.AI @K3ithAI
@_eric_pang_ @jeremyberman Did you score that on the training or actual eval set? I’ve crushed the training set too…nbd…but the eval set, the 100 questions for arc-agi2 are no joke…and you got 77% on that?!?!
ARC Prize @arcprize
New SOTA on ARC-AGI
- V1: 79.6%, $8.42/task
- V2: 29.4%, $30.40/task
Custom submissions by @jeremyberman and @_eric_pang_ are now the best known solutions to ARC-AGI. Both:
* Are open source
* Use Grok 4
* Implement program-synthesis outer loops with test-time adaptation
Eric Pang @_eric_pang_
@joshlee361 @arcprize @jeremyberman My solution is cost-efficient. It costs <$500 to fully test on the public eval set with Grok-4. You can decrease the cost further with a more lightweight model.
Eric Pang retweeted
Jeremy Berman @jeremyberman
I'm back at the top of ARC-AGI with my new program. I use @grok 4 and multi-agent collaboration with evolutionary test-time compute
ARC Prize @arcprize

New SOTA on ARC-AGI
- V1: 79.6%, $8.42/task
- V2: 29.4%, $30.40/task
Custom submissions by @jeremyberman and @_eric_pang_ are now the best known solutions to ARC-AGI. Both:
* Are open source
* Use Grok 4
* Implement program-synthesis outer loops with test-time adaptation