Psyho

2.2K posts

@FakePsyho

Humanity's Last Programmer; Game Designer; Problem Solver; past: OpenAI (Dota), Pro Competitive Programmer, Poker

I don't know anymore · Joined May 2012
392 Following · 27.4K Followers
Pinned Tweet
Psyho
Psyho@FakePsyho·
Humanity has prevailed (for now!) I'm completely exhausted. I figure I've had 10h of sleep in the last 3 days, and I'm barely alive. I'll post more about the contest when I get some rest. (To be clear, those are provisional results, but my lead should be big enough.)
English
550
1.1K
13.2K
2.2M
Psyho
Psyho@FakePsyho·
@sentientcar Imho, his timelines were already unreasonably long (as stated on Dwarkesh), so I highly doubt they got even longer. This was in line with him being focused on education.
English
1
0
11
1.3K
Sentient Car
Sentient Car@sentientcar·
@FakePsyho I read this as the opposite. He felt that R&D was a solved problem (or at least the path was obvious) and had jumped to education, where I think he is one of the best in deep learning. My read of him coming back would be longer timelines, as only then does research matter more.
English
1
0
6
1.4K
Psyho
Psyho@FakePsyho·
FYI, all four subgames are heavily inspired by existing free puzzle games:
- The Hearty Heroes of Hauling: sites.math.washington.edu/~ostroff/puzzl… + sites.math.washington.edu/~ostroff/puzzl… + sites.math.washington.edu/~ostroff/puzzl…
- The Mirror Isles: alan.draknek.org/games/puzzlesc…
- The Promise: silverspaceship.com/promesst/ & silverspaceship.com/promesst2/
- Skipping Stones to Lonely Homes: alan.draknek.org/games/puzzlesc…
Heroes of Sokoban is my favorite and also the most approachable
Jonathan Blow@Jonathan_Blow

Something we've been working on...

English
3
10
307
36.1K
Psyho
Psyho@FakePsyho·
Lisan already mentioned most of the reasons. It's an aggregate benchmark (it combines results from multiple benchmarks). The issue is that none of the benchmarks used have anything to do with IQ testing. The projection onto the IQ scale is completely arbitrary, chosen so that the final values look somewhat reasonable. In other words, you're looking at a bunch of random numbers.
English
1
0
2
133
Kyler
Kyler@AI_evangelist42·
@FakePsyho why is it not just another benchmark?
English
1
0
1
137
Psyho
Psyho@FakePsyho·
@scaling01 I think I disagree about the effort. We're at a point where the whole concept and the site can be created with a single prompt.
English
1
0
1
177
Lisan al Gaib
Lisan al Gaib@scaling01·
@FakePsyho I don't blame the guy. at least he put some effort into it and the benchmark selection is good. others just post ChatGPT generated slop
English
2
0
0
201
Psyho
Psyho@FakePsyho·
@scaling01 sadly, works wonders for visibility
someone should analyze all of those shitty tweets and create a list of the most braindead reposters in AI twitter
English
1
0
7
402
Lisan al Gaib
Lisan al Gaib@scaling01·
yeah im a bit confused about this one too
what's the point of trying to force the IQ scale onto model intelligence when models are improving so much every month.
I also think it creates a lot of confusion because I don't see how this maps to real human IQ scores. It just seems like an arbitrary axis that looks IQ-like but is not really comparable
English
3
0
19
2K
Colossus
Colossus@colossusmag·
Scott Wu is the co-founder of Cognition AI, one of the fastest-growing companies in history. He’s also the greatest competitive programmer the US has ever produced. You may have seen him doing impossible card tricks and mental math. You’ve never seen him asked about weed, Michael Jordan, cancer, and human consciousness over a punnet of strawberries. That is what Colossus editor-in-chief Jeremy Stern did on a recent visit to San Francisco.
For those less familiar with @ScottWu46: In 2nd grade, he entered a math competition for 7th graders, lost, and was so furious he still fumes about it 20 years later. The next year he entered the 9th-grade division as a 3rd-grader and got a perfect score. Then he won first place at the US national middle-school math competition and three straight gold medals at the International Olympiad in Informatics, where he became the greatest American gold-medalist and coach in history.
Most of the people running the biggest AI companies met as teenagers, competing for their countries on international math and science teams. OpenAI’s Greg Brockman, Anthropic’s Dario Amodei, Meta’s Alexandr Wang, to name just a few. Most agree that the von Neumann among them was Scott Wu.
In November 2023, a few weeks after his mother died of lung cancer, on the day Sam Altman was fired from OpenAI, Wu founded his own AI company: Cognition. He was 26 and saw earlier than almost anyone that AI would converge on agents that work in the background, 24/7, like coworkers. He shipped Cognition’s AI software engineer Devin in March 2024. It worked poorly, and he took intense public criticism for it. Now, in its first 18 months of service, Devin has generated $445 million of revenue run rate and usage has doubled every eight weeks. The US Army, Goldman Sachs, and Mercedes-Benz are all customers. Cognition is raising at a valuation around $25 billion.
@JeremySternLA sat down with Wu, the emperor of the nerds, to ask the questions we’d all ask one of the smartest people in America, building the most consequential technology of our generation, if we ever got the chance. As well as MJ and weed, they talk about the cluster of competitive math prodigies behind so much of AI, what makes us human when AGI arrives, and why Wu believes he was put on this earth to teach AI how to code. Read the piece below.
English
84
170
1.9K
2.5M
Psyho
Psyho@FakePsyho·
Correct me if I'm wrong:
- I read "competing for their countries on international math and science teams" as representing the US on the international level
- he got into "U.S. Physics Team" which was top24 in the US; despite its name, it's a training camp
- International olympiad (IPhO) is top5, he did not participate in IPhO
English
0
0
41
2.3K
Psyho
Psyho@FakePsyho·
@ppiotr_ @trajektoriePL Avoiding sunlight and being unable to grow facial hair helps a lot with that
Polish
1
0
4
247
piotr
piotr@ppiotr_·
@FakePsyho @trajektoriePL I keep wondering whether you just don't age, or whether you're simply time traveling at this point?
Polish
1
0
1
227
Michał Podlewski
Michał Podlewski@trajektoriePL·
Found in the depths of the internet. 2014: Jakub Pachocki, Marek Cygan, Przemysław "Psyho" Dębiak
Polish
7
6
240
21.4K
Psyho
Psyho@FakePsyho·
It's much closer than before, but it's still quite far behind when it comes down to creating novel approaches. All of the evolve-likes / autoresearch approaches hit a wall very fast. There's usually almost no improvement after a few hours. Coding agents are incredibly fast but they get stuck easily. They're not good at exploratory work with no imminent reward.
Heuristic contests can run anywhere from 4h (AI is going to have a huge advantage here) to several weeks. With very short contests, humans have no time to experiment and no time to optimize their code (which is really important in those contests). The AWTF finals were 10h long, which is on the shorter side, and hence not fully representative.
English
1
1
14
889
Miles Brundage
Miles Brundage@Miles_Brundage·
@FakePsyho Thanks, that helps! Though could you clarify what you mean re: AI not being better at long-running contests? Why isn't the 2nd place / big speedup thing a sign that that's close? Or are these different types of contests I'm conflating
English
1
0
4
654
Miles Brundage
Miles Brundage@Miles_Brundage·
What happened to AI performance in coding competitions? Last I heard @FakePsyho was the one person to beat AI but that was at least a few months ago...
English
9
0
47
10.4K
Psyho
Psyho@FakePsyho·
@TorteDeLini going from left to right: yes no no no that's you I think yes no (that's me) I think yes
English
0
1
8
925
Torte de Lini
Torte de Lini@TorteDeLini·
In 2018, I met the OpenAI team at TI. Anybody know how many of them are still working there?
Torte de Lini@TorteDeLini

I MET @OpenAI I MET @OpenAI I MET @OpenAI Such an awesome group of people, so happy to have met them!! Best part? The bots are loyal and never complain about the builds haha. I'm 100% embracing the inevitable bots take-over! Thanks for using my Hero Builds, so honoured

English
7
3
136
29.9K
Psyho
Psyho@FakePsyho·
@fchollet Considering they're quoting Agentica's result, I'm pretty sure the comment is about Agentica + GPT-5.5.
English
0
0
5
972
Psyho
Psyho@FakePsyho·
@patience_cave @scaling01 Thanks, wasn't aware of it. Writing solvers feels like cheating, but I guess it still tests the rule discovery. I'd love to see an ablation study of some kind to see which features have the most impact (like better prompting alone, agent orchestration alone, no code exec, etc).
English
1
0
3
509
Psyho
Psyho@FakePsyho·
Initially I was somewhat optimistic, but I definitely gravitated towards the "it's silly" side over time. Their prompt is so basic that it completely undersells how much effort the model should put into it. There's a lot of similarity between games, so humans have an unfair advantage as well. And agentica almost matching the score of an average human tester is the cherry on top.
English
2
0
7
717
Lisan al Gaib
Lisan al Gaib@scaling01·
@FakePsyho yup, that's basically what I said on release day
maybe at the end of 2026 or 2027 this benchmark will start to be useful
but having to wait a couple of weeks for results also isn't ideal. METR has the same problem
English
2
0
25
3.7K