Benjamin Ou

465 posts

@AlephNuul

Benchmarking with games at @EpochAIResearch. Opinions are my own. Send me your weird and esoteric LLM benchmarks!

SF Bay Area · Joined September 2020

126 Following · 25 Followers

Benjamin Ou@AlephNuul·51m

@cherylwoooo I had a dad-podcasts arc but fell off it as I realized I wasn't retaining much. But now that I'm working again, I can feel the pull; there's just not enough time to read books anymore ;-;

Cheryl Wu@cherylwoooo·9h

I’m a consumer of “dad books,” but I do realize how many podcasts I started listening to in the last few years!

Derek Thompson@DKThomp

"Dad books" (which this article, and some publishing insiders, use to describe "serious nonfiction" books across biography, current affairs, and business and economics) are reportedly in a free fall, with sales declining every year for the last few years. “The trend couldn’t be clearer,” said Jonathan Karp, the former chief executive of Simon & Schuster and publisher of the new Simon Six imprint. “When we have internal meetings to talk about this problem, it always comes around to podcasts,” said Jonathan Burnham, president and publisher of the Harper Group at HarperCollins Publishers.

Benjamin Ou@AlephNuul·1h

@peligrietzer @glyphikon "I skimmed the Wikipedia page for the incompleteness theorems and a couple chapters of GEB, so I now understand the nature of all things"

Peli Grietzer@peligrietzer·4h

@glyphikon what

Glyph@glyphikon·20h

In contrast to Gödel's own notoriously mystical Platonic views of mathematics, Gödel's own work essentially proved that mathematics isn't scientific. In other words, it can only ever serve as useful epistemic heuristics and can never ever be considered adequate enough to serve as the ontological basis of any kind of science. Unfortunately, even to this day, a lot of mathematicians, physicists, and virtually all neoclassical economists still have yet to get the memo about all this.

Quanta Magazine@QuantaMagazine

At age 25, Kurt Gödel proved there can never be a mathematical “theory of everything.” In this week’s Qualia column, @nattyover asks experts how his ideas changed the course of humanity’s unending search for truth. quantamagazine.org/what-do-godels…

Benjamin Ou@AlephNuul·9h

@herbiebradley @MetacriticCap epoch.ai/blog/have-ai-c… There's some evidence the slope has been ticking upwards since reasoning models dropped. ECI is still not really a general reasoning index, though (but we're working on making it better!)

Herbie Bradley@herbiebradley·9h

@MetacriticCap ECI is a linear scale that aggregates % accuracy benchmarks, right? And the slope is fairly constant. Ant ARR is just representative of having crossed a threshold of usefulness; it's pretty decoupled from whether or not AI R&D is automated.

Herbie Bradley@herbiebradley·10h

Some takes about RSI from discussions with many smart researchers & thinkers:
1. Many RSI (or automated AI R&D) debates converge to similar cruxes: is a 1000x sample efficiency improvement possible, can you just simulate reality and train on it with no sim2real gap, can we easily make models good at "fuzzy" tasks? People like to assume that automated research agents will find such breakthroughs specifically *because* without them, progress could be heavily bottlenecked on data or continued compute scale-ups.
2. The Yudkowsky "genius brain in a box" framing of ASI has latent influence on many researcher views even though people may not be aware of it. A common move is to "flip" predictions, as they go further out, from assuming LLM or deep learning-specific properties of future AI to assuming "von Neumann x1000", human brain-like properties. I'd like to see more thought-out reasoning on why this flip should occur at any particular point (e.g. pre or post automated AI R&D); this question is a crux behind many predictions like AI 2027.
3. There are some cracks in this worldview beginning to show: predictions from a few years ago that models would be less jagged now than they are, that they would be more deceptive, that synthetic data would work better, etc. Many of these seem like prediction errors from imagining future models as a "human brain in a box", but LLMs are empirically a different kind of intelligence. Most models of software-only intelligence explosion are also coarse enough to mostly ignore properties of LLMs.
4. Views about fast RSI progress seem to be correlated with (a) belief that synthetic data is all you need, (b) belief in very high GDP growth and an industrial explosion because of automated firms, and (c) having worked only in AI research or in small organizations.
5. Key technical things to track over the next 1-2 years: whether RL generalization improves, AI lab data spend, whether we can automate synthetic RL env construction, best practices for FDEs deploying AI into large enterprises, coherency of AI personas, how powerful multi-agent scaling of test-time compute will be, and continual learning.
6. Overall I think the "RSI leading to *fast* takeoff" frame had huge alpha in 2022, moderate alpha in 2024, and is potentially of neutral usefulness for predicting the future in 2026.

Benjamin Ou@AlephNuul·14h

Obviously there's a balance to be struck here, particularly if there's workplace pressure to output slop fast and skimp on learning/reviewing, but for people who are lucky enough to be able to take this approach, just ask the bots more questions! They don't get tired!

Benjamin Ou@AlephNuul·14h

Coding agent output a big PR you don't wanna review? Ask it to walk you through the code chunk-by-chunk. Chatbot spat out a bunch of stuff about philosophers you've never heard of? Ask it for book recommendations and then go read the books.

Benjamin Ou@AlephNuul·14h

There's a lot of bellyaching going on about AI reliance meaning you skip the dirty work where you build important intuitions in research, programming, etc. This is true, but it's also not that hard to use the chatbots themselves to fix these issues on a personal level.

Benjamin Ou retweeted

Cheryl Wu@cherylwoooo·18h

Also want to point out that, perhaps unexpectedly to many people, econ fields are super related to AI. Growth theory is used to understand AI progress. Labor is used to understand social impacts. Economic history is used to study historical parallels. Political economy is used to study the global AI arms race. … Wield your weapons!!!

Joel Becker@joel_bkr

new (spicy) post from me: "Economists, mobilize" economics ideas are extremely helpful for understanding AI, but academia is dropping the ball. now is the time for economists to work on the most important problems in AI and to loudly encourage colleagues to do the same.

Benjamin Ou retweeted

Joel Becker@joel_bkr·21h

Benjamin Ou@AlephNuul·2d

@_sholtodouglas @patwoozey @trq212 I asked Claude Opus 4.7 to find the Newcomb's problem charts from its own system card, and it failed to find them and insisted I must be mistaken. GPT-5.5 found them and gave me the section/page numbers easily. In general, Claude gives up quickly and puts little effort into searches.

Sholto Douglas@_sholtodouglas·2d

@patwoozey @trq212 fair take, working on it - got any examples of searches we do badly at?

Sholto Douglas@_sholtodouglas·2d

When do you reach for other models instead of Claude? What can we do better? Hit me with all of your frustrations. DMs open. If you can give me detail (e.g. specifics/transcripts), it'll help a lot in finding out exactly what we need to do to improve the next model.

Benjamin Ou@AlephNuul·2d

@tracewoodgrains In a vacuum, I like it too! But aesthetics is socially situated, so I cannot help but associate the Grokka's AI art tells with the truly abominable pissfilter garbage crypto e/acc types will shove onto my feed. It's guilt-by-association and not rational, but so goes aesthetics.

Jack@tracewoodgrains·2d

@AlephNuul I like the quokka. it makes me laugh

Jack@tracewoodgrains·2d

as someone who uses AI images for my posts I think this instinct is broadly a mistake. most writers are not artists, but cover art is helpful to readers and publications. the prior alternative was often unsatisfying creative commons images. AI cover art is more fun

Fernando 🌺🌌@zetalyrae

I don't want to read any blog post that has an AI-generated illustration.

Benjamin Ou@AlephNuul·2d

@tracewoodgrains I'm not strongly against AI art in principle, but I do despise it if it looks bad, and if your lack of aesthetic taste means you're blasting us with bad AI art (such as this quokka), then I'm going to think less of you for making the world a slightly uglier place.

Jack@tracewoodgrains·2d

like, am I gonna find a creative commons image of a quokka with a machine gun? no. clearly not. and nobody wants to see what it looks like when I try to draw it but my friendly neighborhood robot can give me one x.com/prerat/status/…

prerat@prerat

this image is a good litmus test

Benjamin Ou retweeted

T. Greer@Scholars_Stage·3d

On this question of whether we should or should not read "the classics" -- A decade back I taught some to high school students in China. It was a clarifying experience for me. As I put it:

Benjamin Ou@AlephNuul·4d

@hecubian_devil @JeremiahDJohns @zikakuto Which y'know, you could probably fairly argue your accounting is no worse than the typical liberal's. But from the perspective of Taiwanese people the gaps you're not willing to confront are gaping and make you come off as a propagandist.

Benjamin Ou@AlephNuul·4d

@hecubian_devil @JeremiahDJohns @zikakuto I mean, you are doing your own sleight-of-hand here where none of China's human rights abuses within their own borders factor in, nor any of the fear Taiwanese face from China's "remarkable restraint" in the form of non-dreamworld airspace incursions and blockade exercises.

Cassie Pritchard@hecubian_devil·4d

Yeah this is totally the message voters care about, dude. Really got his finger on the pulse of America

Chuck Schumer@SenSchumer

Trump must not sell out Taiwan, period.

Benjamin Ou@AlephNuul·4d

@GregHBurnham @YafahEdelman @mentalgeorge rather than asking twitter what might happen if opus 4.7 plays 1 billion games of chess, we should simply have opus 4.7 play 1 billion games of chess

Greg Burnham@GregHBurnham·4d

@YafahEdelman @mentalgeorge Big if true, IMO! I think we see relatively little evidence of this. Structurally easier to hill-climb on coding tasks. But I’m not certain. We hope to investigate just this in some of the board games work we’ve been doing.

Tom Reed@mentalgeorge·4d

Say you let Opus 4.7 play 1 billion games of online chess. Between games, it can reflect on its play and write .md files to itself for future match-ups. How much does its Elo change?

Benjamin Ou@AlephNuul·4d

@YafahEdelman how many charts have accumulated in your back pocket

Yafah Edelman@YafahEdelman·4d

Handy chart to keep in your back pocket!

Epoch AI@EpochAIResearch

Servers account for 60% of the total cost of owning a 1 GW AI data center. A typical 1 GW AI data center costs about $38B in up-front capital and $0.9B/year to operate. Annualizing the capital expenses over equipment lifespans, that equates to $8.5B/year, with $5B for servers.
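
A quick sketch checking the arithmetic in the Epoch AI post above. It assumes the $8.5B/year figure is the total annualized cost of ownership (annualized capex plus the $0.9B/year opex), which the post does not state outright. Under that reading, servers come to 5 / 8.5 ≈ 59% of annual cost, consistent with the rounded 60% headline. If the $8.5B covered capex alone, the share would instead be 5 / 9.4 ≈ 53%. The constant and function names are illustrative only.

```python
# Sanity check of the Epoch AI data-center figures (all values in $B).
# ASSUMPTION: the quoted $8.5B/year is the total annualized cost of ownership,
# i.e. annualized capex plus opex. The post does not say this explicitly.

ANNUALIZED_FIGURE = 8.5   # $B/year, the "$8.5B/year" from the post
SERVER_COST = 5.0         # $B/year attributed to servers
OPEX = 0.9                # $B/year operating cost


def share(part: float, total: float) -> float:
    """Return `part` as a fraction of `total`."""
    return part / total


# Reading 1 (assumed): the $8.5B already includes opex.
server_share = share(SERVER_COST, ANNUALIZED_FIGURE)
annualized_capex = ANNUALIZED_FIGURE - OPEX

# Reading 2 (alternative): the $8.5B is annualized capex only, so add opex on top.
alt_server_share = share(SERVER_COST, ANNUALIZED_FIGURE + OPEX)

print(f"Server share (total reading): {server_share:.1%}")           # 58.8%
print(f"Implied annualized capex: ${annualized_capex:.1f}B/year")     # $7.6B/year
print(f"Server share (capex-only reading): {alt_server_share:.1%}")  # 53.2%
```

Only the first reading lands near the post's 60% figure, which is why it is treated as the default here.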

Benjamin Ou@AlephNuul·11 May

@mathandcobb Plenty; "sigmoid curve" is often what's thrown around in forecasting an imminent plateau in capabilities. It's just had a rough track record over the past few years, in the face of seemingly unstoppable exponential capability growth driven by sheer scaling and a couple of algorithmic innovations.

Alvaro Lozano-Robledo@mathandcobb·11 May

It seems to me that some of the most pessimistic sentiments about the future of the math profession rely on some of the most optimistic predictions about the future of AI/LLMs, and in particular they rely on the exponential growth of the capabilities of the models. Aren't there predictions about when the capacity of models will plateau (at least in their current incarnations of "AI"), because of theoretical or practical reasons?

Benjamin Ou@AlephNuul·10 May

@ToddBoogaloo @Afinetheorem You could try the same prompt in a private/incognito chat, most LLM chatbots should have that kind of option (often as a not super clearly labeled toggle towards the top right)

Garbage Snake 🇪🇹@ToddBoogaloo·10 May

@Afinetheorem Gemini, using the same prompt, gave me suggestions that were more left-wing (rail nationalization, universal childcare/single payer, and Vienna-model housing), perhaps because I had spoken to it about these ideas before in previous chats?

Kevin A. Bryan@Afinetheorem·10 May

Interesting. Every single suggestion by Claude here is one I would agree is a good idea and impactful. But for the sake of epistemic humility (and to understand AI better): is this because technocratic econ-minded INTJ centrists have great ideas, or is there a problem? Steelman?

Arram@arram

Asked Claude: 'There's a meme called the "fix everything easily switch". What policies do you think are the best candidates for being a real fix everything switch in the US? Give me your top ten, your confidence, your reasoning, and why a given policy has not been implemented.'

Benjamin Ou retweeted

METR@METR_Evals·9 May

We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks.

Benjamin Ou@AlephNuul·9 May

@hecubian_devil I think in the bay area they just call this a polycule

Cassie Pritchard@hecubian_devil·8 May

How come rich people don’t become patrons anymore? Rich guy in 1430 would be supporting like 16 master artists and all their studios. Never see that anymore. They should do that again, but also for posters, specifically (the real artists of the 21st century)
