hanlon’s mortola razr

194 posts

hanlon’s mortola razr

@rhizomaticthot

special interest librarian at the second amended and restated Xerox PARC successor corporation holding company

deep in the LLM mines Katılım Haziran 2025

452 Takip Edilen40 Takipçiler

hanlon’s mortola razr@rhizomaticthot·4h

@generativist Hans Christian Andersen was pretty woke for 1837

English

johnny v5@generativist·5h

okay so you’re saying that to get the man she wants she has to literally give up her voice? come on. a little too on the nose

English

301

hanlon’s mortola razr@rhizomaticthot·15h

@tszzl Sadie Plant was the real one

English

273

roon@tszzl·1d

im sorry mythos but 2000s mark fisher wasn’t even the most interesting philosopher even in that building. drier, more incoherent, and boring than 90s Land

English

652

83.2K

hanlon’s mortola razr@rhizomaticthot·18h

@Aella_Girl this might be your greatest contribution to gender studies

English

2.1K

Aella@Aella_Girl·19h

"If a man wants to cum on his girlfriend’s face because he saw it in porn, and she doesn’t feel like getting her face cummed on, we cheer go-girl and frown at the man for even watching porn at all. But if a woman wants her boyfriend to buy her flowers because she grew up being fed this action in movies, and her boyfriend doesn’t feel like it, we have no sympathy for him. We are horrified even at comparing the two things. It’s obvious to us that porn should not influence what happens in sex, but we have been so immersed in romance-porn that the idea of buying flowers for a woman is seen as what romance ought to be. The romance narrative has become romance, and rejecting the narrative becomes rejecting romance itself. We have lost the ability to distinguish!"

English

101

114

2.4K

83.3K

hanlon’s mortola razr retweetledi

thaddeus e. grugq@thegrugq·1d

DoD: Friendship ended with Anthropic. Now OpenAI is my best friend. [one month later] Anthropic: We’re pleased to announce the most powerful hacker capability ever created, able to discover and exploit thousands of critical 0days. Friends only release.

English

739

23.9K

hanlon’s mortola razr@rhizomaticthot·1d

the way @DKokotajlo predicted today’s news paper Agent-2 (Mythos) is indeed a little worse than the best human hackers but the announcement is over shadowed by the top head line being about a war in middle east

Shakeel@ShakeelHashim

The Anthropic Mythos release does not appear near the top of the homepage on any major news site today. The NYT is closest, but it's still pretty far down. The Guardian thinks a Vogue cover with Anna Wintour and Meryl Streep is more important. The Washington Post is prioritizing yet another "we tried to get into Berghain" story. The media is not adequately covering the insane moment we are in.

English

hanlon’s mortola razr@rhizomaticthot·2d

@seanhn though hoisting FreeBSD / OpenBSD onto kASAN might be a lift

English

hanlon’s mortola razr@rhizomaticthot·2d

@seanhn asan crash verifier means you only get TPs

English

218

Sean Heelan@seanhn·2d

This 'experiment' is silly, and a cynical man might conclude Aisle are purposefully muddying the waters here. The correct evaluation is not "given a code snippet can you write a plausible bug report", it is "given an entire codebase what are the true and false positive numbers"

Stanislav Fort@stanislavfort

New post: We tested the Mythos showcase vulnerabilities with open models. They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model. Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged!

English

9.3K

hanlon’s mortola razr@rhizomaticthot·2d

@tenobrus @TheZvi this is not true. the most capable “zero-day” in widely deployed / hardened software in the Mythos blog is a DoS Integer Overflow via TCP SACK.

English

542

Tenobrus@tenobrus·3d

maybe this is not yet clear, so let me state it plainly: as of right now Anthropic, and really a small number of individuals at Anthropic, has the capacity to directly attack and cause major damage to the United States Government, China, and generally global superpowers. government agencies like the NSA do not have internal models or defense capabilities that outclass frontier models. if they chose to do so, they could likely exfiltrate top secret information from government systems, gain control over critical infrastructure including military infrastructure, sabotage or modify communications between members of government at the highest level, and potentially carry on activities for some time without detection. the thing about having access to a huge number of zerodays your adversaries don't know about is it gives you a massive asymmetric advantage. they did not exploit this to gain power or destabilize the world order. they publicly released the information that they had these capabilities and worked to mitigate these flaws. you should be grateful american frontier labs have proven themselves remarkably trustworthy and concerned with the public good. but it's critical you understand we are in a new regime. private entities now have power that directly rivals and impacts the government's monopoly on influence and violence. and anthropic is certainly not the only one, there's little chance OpenAI's internal models are far behind. this trend will accelerate on virtually every dimension, not slow down. my prediction for how it plays out is the relatively imminent seizure and nationalization of labs by the US government, sometime over the next two years. it's very tough for me to see how they accept the existence of this kind of threat. but this adds a whole new class of governance issues, as then we've handed these extremely wide-reaching capabilities from private entities to public ones.

English

224

557

5.4K

816.4K

hanlon’s mortola razr@rhizomaticthot·5d

@hopes_revenge do you accept donations

English

119

hope hopes hoping@hopes_revenge·5d

my ai-focused super pac is spending 20M lobbying the city of Berkeley to put estrogen in the water supply

English

290

9.2K

hanlon’s mortola razr@rhizomaticthot·5d

@S1r1u5_ VR is much more like go and we’re nowhere near superhuman with opus-4.6. current models and LMP harnesses are only good finding bugs with a narrow CFG+Attack-Surface graph cut and _very_ short graph aspect ratio. but what it lacks in skill, it makes up in velocity / coverage.

English

1.9K

s1r1us (mohan)@S1r1u5_·5d

in chess, the best game engine stockfish can beat magnus all the time, but whatever moves it makes they are not complex to understand in retrospect. however in Go, the moves alphago makes are straight up incomprehensible to the best of the best Go players, the reason being go search space is astronomically huge, and we are cognitively bounded. So i wonder if the vulnerability space is more like Go or chess. to be a bit egoistic it seems pretty limited to me, and whatever bug any super intelligence can find i would be able to understand, cuz the underlying software isn't complex. but it is perfectly possible that claude 8 or 10 would produce an exploit thats like alphago's move 37 and i aint be understanding shit for like a month.

English

1.2K

138.5K

hanlon’s mortola razr@rhizomaticthot·5d

@deanwball conditioning the youth to accept persistent physical autonomous instruments of violence is very bad

English

Dean W. Ball@deanwball·5d

to be clear I think this slaps

English

3.5K

Dean W. Ball@deanwball·5d

drones policing high school hallways, presented without comment

English

236

40.6K

hanlon’s mortola razr@rhizomaticthot·5d

@generativist especially glad you’re back given it’s possible we’re getting SoTA imagegen this week and BurittoBench is the only eval i trust

English

johnny v5@generativist·5d

hello, friends

English

181

hanlon’s mortola razr@rhizomaticthot·5d

@simpsoka @OpenAI basèd. what are you making?

English

389

Kath Korevec@simpsoka·5d

I have two weeks off before my first day at @OpenAI, so I’ll be soldering a big box of these 16-segment displays 😅

English

318

16.7K

hanlon’s mortola razr@rhizomaticthot·5d

@wisepissmage OBAA as a loose adaptation benefits tremendously from its linearization

English

1.4K

isabel@wisepissmage·5d

Finally finished this book and fucking hated it

English

557

56.5K

hanlon’s mortola razr@rhizomaticthot·5d

we got a US president white boy inshallah-ing before GTA6

English

166

hanlon’s mortola razr@rhizomaticthot·5d

@xlr8harder for an 8b?

English

xlr8harder@xlr8harder·5d

@rhizomaticthot not so sure about that footnote. 2.7m samples for SFT is not so unreasonable these days, but i guess the boundaries for things like midtraining are quite poorly defined.

English

hanlon’s mortola razr@rhizomaticthot·5d

and the other model they used in the paper, llama-3.1-8b-instruct, was midtrained¹ on synthetic data from llama-3 405b. i suspect the same effect is at play, i.e. the noise shifting the distribution more toward the synth code teacher rather than the IF distribution. _ ¹ they call it SFT in the paper but 2.7m 405b rollouts extending an 8b would be called “midtraining” today.

English

xlr8harder@xlr8harder·5d

on the other hand x.com/rhizomaticthot…

hanlon’s mortola razr@rhizomaticthot

>be me >looking for new RL result >ask researcher if result is from a non-distilled prior or if π₀=Qwen >she doesn’t understand >pull out poster diagramming a vanilla pretrained model and a model trained to 100 T_eff on Claude 4.5 Opus >explain “this is clean, this is qwensoup” >she laughs and says “it’s a good RL, sir” >open paper >its π₀=Qwen

English

727

xlr8harder@xlr8harder·5d

SSD feels similar to dreaming.

wh@nrehiew_

The last experiment they run is when they create gibberish training data with temperature = 2. Note that the outputs are unusable, with more than half having no extractable code. Somehow this works??? The evaluation temperature needs to be low enough between 0.6 and 1.1 for a effective temperature of [1.2, 2.2] which actually sits nicely in the effective temperature range described earlier This means that the data isnt actually important but its the nature of the distribution that the model is reshaped towards that matters

English

5.8K

hanlon’s mortola razr@rhizomaticthot·5d

@xlr8harder am highly skeptical this isn’t just recovering the distribution Qwen was distilled from and then lightly post-trained away from. x.com/rhizomaticthot…

hanlon’s mortola razr@rhizomaticthot

English

hanlon’s mortola razr@rhizomaticthot·5d

@DRBoguslaw reads like NYTese for “you wouldn't believe what i was told off the record”

English

3.1K

Daniel Boguslaw@DRBoguslaw·5d

Not so subtle kicker on the times write up of Iran rescue effort.

English

348

31.9K

hanlon’s mortola razr@rhizomaticthot·5d

@paul_cal @mgostIH Llama-3.1 8b and 70b were both midtrained on synthetic code generated by Llama-3 405b see section 4 of arxiv.org/pdf/2407.21783

English

Paul Calcraft@paul_cal·5d

@rhizomaticthot @mgostIH Agree on qwen, and llama results are way less impressive. But not zero? Still something to explain I assume the explanation is humdrum but not sure we have it nailed down yet

English

364

mgostIH@mgostIH·6d

Another episode of "nobody really understands deep learning", training on gibberish self inferenced trajectories improves LLMs, even when not filtered with any criteria.

English

360

20.7K

Keşfet

@generativist @tszzl @Aella_Girl @DKokotajlo @seanhn @tenobrus @TheZvi @hopes_revenge