jsd
@datagenproc
5.9K posts
@EpochAIResearch. My DMs are open. Anonymous feedback: https://t.co/0k6Duylwqa

Berkeley, CA · Joined August 2022
3.9K Following · 1.4K Followers
Pinned Tweet
jsd @datagenproc
Hi all, I'm interested in feedback! You can leave anonymous comments here: admonymous.co/jsd
0 replies · 1 retweet · 4 likes · 2.5K views
Astatide @Astatide42
Where my Francecamp tweeters at?
2 replies · 0 retweets · 1 like · 76 views
jsd @datagenproc
@CharlesD353 I’m down for early morning runs on trails, though am unlikely to last 2h!
1 reply · 0 retweets · 1 like · 76 views
Charles🔸 @CharlesD353
I'm going to be in Berkeley next week and quite jet lagged, thus probably awake at 5am. Unlike London, Berkeley seems to have nice trails and hills. If anyone wants to meet up and go for a ~two-hour run up some trails at ~5:30am on Tuesday or Wednesday, lmk!
5 replies · 1 retweet · 20 likes · 1.5K views
elie @eliebakouch
we let opus 4.7 and gpt 5.5 run on the nanogpt optimizer speedrun: ~10k runs, 14k H200 hours, 23.9B tokens. opus hits 2930, codex 2950, both beating the human baseline of 2990. we cover claude autonomy failures, codex high compute usage, and much more primeintellect.ai/auto-nanogpt
[media]
Quoted tweet — Prime Intellect @PrimeIntellect:
Automating AI research is the next major step in AI. We let Claude Code (Opus 4.7) and Codex (GPT 5.5) run autonomously on the nanoGPT speedrun optimizer track using our idle compute. ~10k runs, ~14k H200 hours. Opus now holds the record at 2930 steps vs the 2990 human baseline.
35 replies · 82 retweets · 783 likes · 105.8K views
jsd @datagenproc
@samth @natalia__coelho @joseph_h_garvin @fleetingbits I think benchmarks are not very informative if you want to go from a score to some measure of real-world impact. But I think a model being clearly better than another on a benchmark is usually a decent predictor of a real-world capability gap.
0 replies · 0 retweets · 1 like · 39 views
Sam Tobin-Hochstadt @samth
@natalia__coelho @joseph_h_garvin @fleetingbits It's certainly possible that these benchmarks are accurate reflections of capabilities. But I have a strong prior belief that benchmarks of frontier models are not very informative, and thus that the actual evidence about Mythos is highly significant where it doesn't exist for 5.5.
2 replies · 0 retweets · 2 likes · 78 views
FleetingBits @fleetingbits
some thoughts on gpt-5.5 and the missing zero-day vulnerabilities
1) so, @natalia__coelho wrote an article in which she argued (among other things) that gpt-5.5 and mythos have similar cyber capabilities
2) one of the counterarguments has been that, even though the two models have similar benchmark scores, openai did not discover the kind of vulnerabilities that anthropic discovered with mythos
3) i think one of the reasons that anthropic discovered so many vulnerabilities with mythos is that anthropic has such a strong focus on safety, esp measurement
4) and so it would make sense for anthropic to spin up a team to try to find real-world vulnerabilities with their models, in order to measure when models developed dangerous cyber capabilities
5) but this is a difficult organizational commitment; you need to run mythos, triage the vulnerabilities, inform the maintainers, figure out what to do after that, etc.
6) if you find the vulnerabilities but do nothing about them, then you have created a real public relations risk for yourself if those vulnerabilities are ever exploited
7) and if you just report them all to the maintainers and swamp them, then you create another problem for yourself, where open source maintainers will complain about you
8) and if you wait to release your model, you may forego revenue that you could otherwise have obtained, or put yourself behind in the race to grab customers
9) this is all to say that scanning thousands of open source repositories for vulnerabilities isn't a risk-free decision; it's actually a potentially expensive decision
10) now, if you are anthropic, and your leadership cares a lot about safety (as an org), and because you believe in safety you think this is just how the world is going to go, and you can see the market opportunity for security, then this isn't that hard of a decision
11) but if you are openai, and your leadership probably sees safety as part hindrance (painful stuff they have to do before they release a model) and part optics (good stuff they can say to congress), then it's not that easy
12) any time your executive team spends thinking about how to staff the team, handle the outreach to maintainers, decide whether to embargo the model, etc. could have been spent figuring out how to sell ads, get compute, etc.
13) and although they might see security as a very valuable market, maybe it's not as important immediately as getting a new frontier model out across all the other use cases where they can win deals now
14) so, i think the fact that openai did not run a similar program with gpt-5.5 is not strong evidence that gpt-5.5 could not have been used to find similar vulnerabilities
Quoted tweet — Sam Tobin-Hochstadt @samth:
I think this analysis is fundamentally misguided, in a looking-under-the-streetlight way. The reason people were freaked out about Mythos was not SWE-bench scores, but all the 0days. And 5.5, which is a great model, is not producing tons of 0days.
6 replies · 7 retweets · 82 likes · 8K views
jsd @datagenproc
Servers are most of the cost of AI data centers. Though note that the energy and facility share would likely be larger than it appears here for behind-the-meter (BTM) powered data centers, or if including CapEx for BYOP/C projects.
Quoted tweet — Epoch AI @EpochAIResearch:
Servers account for 60% of the total cost of owning a 1 GW AI data center. A typical 1 GW AI data center costs about $38B in up-front capital and $0.9B/year to operate. Annualizing the capital expenses over equipment lifespans, that equates to $8.5B/year, with $5B for servers.
0 replies · 0 retweets · 7 likes · 647 views
jsd retweeted
Epoch AI @EpochAIResearch
Servers account for 60% of the total cost of owning a 1 GW AI data center. A typical 1 GW AI data center costs about $38B in up-front capital and $0.9B/year to operate. Annualizing the capital expenses over equipment lifespans, that equates to $8.5B/year, with $5B for servers.
[media]
10 replies · 40 retweets · 207 likes · 27.7K views
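The arithmetic in the Epoch AI tweet can be sanity-checked in a few lines. This is a back-of-the-envelope sketch, not from the tweet itself: the variable names are mine, and which ratio the "60%" refers to is my reading of the figures.

```python
# Figures quoted in the Epoch AI tweet (1 GW AI data center).
upfront_capex = 38e9        # up-front capital, $
opex_per_year = 0.9e9       # operating cost, $/year
annualized_capex = 8.5e9    # capital annualized over equipment lifespans, $/year
server_annualized = 5.0e9   # servers' share of annualized capital, $/year

# Total annualized cost of ownership: annualized capital plus operating cost.
total_per_year = annualized_capex + opex_per_year       # $9.4B/year

# Servers' share of annualized capital vs. share of total annual cost.
share_of_capex = server_annualized / annualized_capex   # ~0.59
share_of_total = server_annualized / total_per_year     # ~0.53

print(f"total: ${total_per_year / 1e9:.1f}B/yr, "
      f"server share of annualized capex: {share_of_capex:.0%}, "
      f"of total annual cost: {share_of_total:.0%}")
```

Read this way, the tweet's 60% figure matches the servers' share of annualized capital (~59%); their share of the total annual cost, including opex, comes out a bit lower (~53%).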
jsd retweeted
Mechanize @MechanizeWork
Here's how each model scored over 24 hours of wall-clock time, on a single attempt. GPT-5.5 performs best, with a late gain in the final hours. Claude models settle in by hour 8 and cluster within noise. More at: gbaeval.com/leaderboard
[media]
1 reply · 7 retweets · 56 likes · 13.9K views
jsd @datagenproc
[media]
1 reply · 0 retweets · 0 likes · 81 views
jsd @datagenproc
sectors of the organizations named in OpenAI Customer Stories
[media]
1 reply · 1 retweet · 7 likes · 423 views
jsd retweeted
Epoch AI @EpochAIResearch
Superstar AI researchers are paid >10× more than their frontier lab colleagues, and >100× more than most postdocs. Why? The naive explanation is that this is just due to differences in researcher quality. But in a new essay, @ansonwhho argues that this is very incomplete.
[media]
9 replies · 17 retweets · 287 likes · 65.5K views
jsd @datagenproc
@SakumiBLR Soon there won't be any need for the technician anymore
0 replies · 0 retweets · 0 likes · 30 views
jsd @datagenproc
@SakumiBLR Seriously, I wonder why this isn't already ubiquitous, including in other fields. There are things like MSFT Remote Assist, but it's pretty rare.
2 replies · 0 retweets · 1 like · 113 views
jsd @datagenproc
@panickssery Maybe he was also referring to lower courts, not just the Supreme Court? Not sure if that makes it more true.
1 reply · 0 retweets · 1 like · 177 views
Arjun Panickssery @panickssery
I don't get the defense for this claim in the comments, because if you're 45 then you were 5 years old when Scalia joined and Rehnquist became CJ, and since then there's mainly Obergefell to complain about, while you're happy with Citizens United, Heller maybe, Dobbs, whatever.
Quoted tweet — Sean T at RCP @SeanTrende:
@Blake_Allen13 I think most on the right today grew up with the courts promoting liberal social goals and would see kneecapping them as a feature rather than a bug.
2 replies · 1 retweet · 22 likes · 1.9K views
Yafah Edelman @YafahEdelman
In celebration of passing 1k followers, I'm inviting people to reply to this asking me to forecast the probability of any AI-related event happening in the future. I will provide a point estimate based largely on whatever reasoning pops into my head first.
100 replies · 8 retweets · 165 likes · 28.1K views