jsd
@datagenproc
5.9K posts
@EpochAIResearch. My DMs are open. Anonymous feedback: https://t.co/0k6Duylwqa

Berkeley, CA · Joined August 2022
3.9K Following · 1.4K Followers
Pinned Tweet
jsd @datagenproc
Hi all, I'm interested in feedback! You can leave anonymous comments here: admonymous.co/jsd
0 replies · 1 retweet · 4 likes · 2.5K views
Astatide @Astatide42
Where my Francecamp tweeters at?
2 replies · 0 retweets · 1 like · 76 views
jsd @datagenproc
@CharlesD353 I’m down for early morning runs on trails, though am unlikely to last 2h!
1 reply · 0 retweets · 1 like · 76 views
Charles🔸 @CharlesD353
I'm going to be in Berkeley next week and quite jet lagged, thus probably awake at 5am. Unlike London, Berkeley seems to have nice trails and hills. If anyone wants to meet up and go for a ~two-hour run up some trails at ~5:30am on Tuesday or Wednesday, lmk!
5 replies · 1 retweet · 20 likes · 1.5K views
elie @eliebakouch
we let opus 4.7 and gpt 5.5 run on the nanogpt optimizer speedrun: ~10k runs, 14k H200 hours, 23.9B tokens. opus hits 2930, codex 2950, both beating the human baseline of 2990. we cover claude autonomy failures, codex high compute usage, and much more primeintellect.ai/auto-nanogpt
[media]
Quoted tweet — Prime Intellect @PrimeIntellect:
Automating AI research is the next major step in AI. We let Claude Code (Opus 4.7) and Codex (GPT 5.5) run autonomously on the nanoGPT speedrun optimizer track using our idle compute. ~10k runs, ~14k H200 hours. Opus now holds the record at 2930 steps vs the 2990 human baseline.
35 replies · 82 retweets · 783 likes · 105.8K views
jsd @datagenproc
@samth @natalia__coelho @joseph_h_garvin @fleetingbits I think benchmarks are not very informative if you want to go from a score to some measure of real-world impact. But I think a model being clearly better than another on a benchmark is usually a decent predictor of a real-world capability gap.
0 replies · 0 retweets · 1 like · 39 views
Sam Tobin-Hochstadt @samth
@natalia__coelho @joseph_h_garvin @fleetingbits It's certainly possible that these benchmarks are accurate reflections of capabilities. But I have a strong prior belief that benchmarks of frontier models are not very informative, and thus that the actual evidence about Mythos is highly significant where it doesn't exist for 5.5.
2 replies · 0 retweets · 2 likes · 78 views
FleetingBits @fleetingbits
some thoughts on gpt-5.5 and the missing zero-day vulnerabilities
1) so, @natalia__coelho wrote an article in which she argued (among other things) that gpt-5.5 and mythos have similar cyber capabilities
2) one of the counterarguments has been that, even though the two models have similar benchmark scores, openai did not discover the kind of vulnerabilities that anthropic discovered with mythos
3) i think one of the reasons that anthropic discovered so many vulnerabilities with mythos is that anthropic has such a strong focus on safety, esp measurement
4) and so it would make sense for anthropic to spin up a team to try to find real-world vulnerabilities with their models, in order to measure when models developed dangerous cyber capabilities
5) but this is a difficult organizational commitment; you need to run mythos, triage the vulnerabilities, inform the maintainers, figure out what to do after that, etc.
6) if you find the vulnerabilities but do nothing about them, then you have created a real public relations risk for yourself if those vulnerabilities are ever exploited
7) and if you just report them all to the maintainers and swamp them, then you create another problem for yourself, where open source maintainers will complain about you
8) and if you wait to release your model, you may forego revenue that you could otherwise have obtained, or put yourself behind in the race to grab customers
9) this is all to say that scanning thousands of open source repositories for vulnerabilities isn't a risk-free decision; it's actually a potentially expensive decision
10) now, if you are anthropic, and your leadership cares a lot about safety (as an org), and because you believe in safety you think this is just how the world is going to go, and you can see the market opportunity for security, then this isn't that hard of a decision
11) but if you are openai, and your leadership probably sees safety as part hindrance (painful stuff they have to do before they release a model) and part optics (good stuff they can say to congress), then it's not that easy
12) any time your executive team spends thinking about how to staff the team, handle the outreach to maintainers, decide whether to embargo the model, etc. could have been spent figuring out how to sell ads, get compute, etc.
13) and although they might see security as a very valuable market, maybe it's not as important immediately as getting a new frontier model out across all the other use cases where they can win deals now
14) so, i think the fact that openai did not run a similar program with gpt-5.5 is not strong evidence that gpt-5.5 could not have been used to find similar vulnerabilities
Quoted tweet — Sam Tobin-Hochstadt @samth:
I think this analysis is fundamentally misguided, in a looking-under-the-streetlight way. The reason people were freaked out about Mythos was not SWE-bench scores, but all the 0days. And 5.5, which is a great model, is not producing tons of 0days.
6 replies · 7 retweets · 82 likes · 8K views
jsd @datagenproc
Servers are most of the cost of AI data centers. Though note that the energy and facility share would likely be larger than it appears here for behind-the-meter (BTM) powered data centers, or if including CapEx for BYOP/C projects.
Quoted tweet — Epoch AI @EpochAIResearch:
Servers account for 60% of the total cost of owning a 1 GW AI data center. A typical 1 GW AI data center costs about $38B in up-front capital and $0.9B/year to operate. Annualizing the capital expenses over equipment lifespans, that equates to $8.5B/year, with $5B for servers.
0 replies · 0 retweets · 7 likes · 647 views
jsd retweeted
Epoch AI @EpochAIResearch
Servers account for 60% of the total cost of owning a 1 GW AI data center. A typical 1 GW AI data center costs about $38B in up-front capital and $0.9B/year to operate. Annualizing the capital expenses over equipment lifespans, that equates to $8.5B/year, with $5B for servers.
[media]
10 replies · 40 retweets · 207 likes · 27.7K views
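The arithmetic in the Epoch AI tweet can be sanity-checked in a few lines. This is a back-of-the-envelope sketch, not from the tweet itself: the variable names are mine, and which ratio the "60%" refers to is my reading of the figures.

```python
# Figures quoted in the Epoch AI tweet (1 GW AI data center).
upfront_capex = 38e9        # up-front capital, $
opex_per_year = 0.9e9       # operating cost, $/year
annualized_capex = 8.5e9    # capital annualized over equipment lifespans, $/year
server_annualized = 5.0e9   # servers' share of annualized capital, $/year

# Total annualized cost of ownership: annualized capital plus operating cost.
total_per_year = annualized_capex + opex_per_year       # $9.4B/year

# Servers' share of annualized capital vs. share of total annual cost.
share_of_capex = server_annualized / annualized_capex   # ~0.59
share_of_total = server_annualized / total_per_year     # ~0.53

print(f"total: ${total_per_year / 1e9:.1f}B/yr, "
      f"server share of annualized capex: {share_of_capex:.0%}, "
      f"of total annual cost: {share_of_total:.0%}")
```

Read this way, the tweet's 60% figure matches the servers' share of annualized capital (~59%); their share of the total annual cost, including opex, comes out a bit lower (~53%).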
jsd retweeted
Mechanize @MechanizeWork
Here's how each model scored over 24 hours of wall-clock time, on a single attempt. GPT-5.5 performs best, with a late gain in the final hours. Claude models settle in by hour 8 and cluster within noise. More at: gbaeval.com/leaderboard
[media]
1 reply · 7 retweets · 56 likes · 13.9K views
jsd @datagenproc
[media]
1 reply · 0 retweets · 0 likes · 81 views
jsd @datagenproc
sectors of the organizations named in OpenAI Customer Stories
[media]
1 reply · 1 retweet · 7 likes · 423 views
jsd retweeted
Epoch AI @EpochAIResearch
Superstar AI researchers are paid >10× more than their frontier lab colleagues, and >100× more than most postdocs. Why? The naive explanation is that this is just due to differences in researcher quality. But in a new essay, @ansonwhho argues that this is very incomplete.
[media]
9 replies · 17 retweets · 287 likes · 65.5K views
jsd @datagenproc
@SakumiBLR Soon there won't be any need for the technician anymore
0 replies · 0 retweets · 0 likes · 30 views
jsd @datagenproc
@SakumiBLR Seriously, I wonder why this isn't already ubiquitous, including in other fields. There are things like MSFT Remote Assist, but it's pretty rare.
2 replies · 0 retweets · 1 like · 113 views
jsd @datagenproc
@panickssery Maybe he was also referring to lower courts, not just the Supreme Court? Not sure if that makes it more true.
1 reply · 0 retweets · 1 like · 177 views
Arjun Panickssery @panickssery
I don't get the defense for this claim in the comments, because if you're 45 then you were 5 years old when Scalia joined and Rehnquist became CJ, and since then there's mainly Obergefell to complain about, while you're happy with Citizens United, Heller maybe, Dobbs, whatever.
Quoted tweet — Sean T at RCP @SeanTrende:
@Blake_Allen13 I think most on the right today grew up with the courts promoting liberal social goals and would see kneecapping them as a feature rather than a bug.
2 replies · 1 retweet · 22 likes · 1.9K views
Yafah Edelman @YafahEdelman
In celebration of passing 1k followers, I'm inviting people to reply to this asking me to forecast the probability of any AI-related event happening in the future. I will provide a point estimate based largely on whatever reasoning pops into my head first.
100 replies · 8 retweets · 165 likes · 28.1K views