rapha

2.3K posts

@rapha_gl

making models programmable @ openai

sf · Joined March 2009

2.4K Following · 10.9K Followers

rapha retweeted

Natália 🔍@natalia__coelho·1d

Very important update from UK AISI. This is a meaningful change from the previous report. Here’s what the new data would look like for “Mythos Preview (new)” with $ on the x-axis:

AI Security Institute@AISecurityInst

Our cyber range results illustrate this step-up. Since our first Mythos evaluation, we received access to a newer Mythos Preview checkpoint. On a 32-step corporate network attack we estimate takes a human expert ~20 hours, this checkpoint completes the full attack in 6/10 attempts.

rapha@rapha_gl·1d

now you, too, can work from the pool 🏝️

OpenAI@OpenAI

You've been asking for this one... Now in preview: Codex in the ChatGPT mobile app. Start new work, review outputs, steer execution, and approve next steps, all from the ChatGPT mobile app. Codex will keep running on your laptop, Mac mini, or devbox.

rapha@rapha_gl·1d

@spacetime_worm “token efficiency” means nothing if one model is a d28 and another is a d140. one token is not the same flops, cost, wall clock time, nothing! it’s literally a useless metric

P Anderson@spacetime_worm·1d

@rapha_gl still interesting to see returns to inference scaling, how far you can go before saturating plus implied token efficiency

rapha@rapha_gl·2d

am I crazy or does the AISI plot going around seem very misleading? “tokens” is a meaningless x-axis if you don’t match model sizes. they should report “cost” as a public-facing proxy for flops

rapha@rapha_gl·5d

@charliermarsh you’ve seen nothing yet! ;)

Charlie Marsh@charliermarsh·5d

On your first day at OpenAI they give you a crash course on vagueposting

rapha retweeted

Chris@chatgpt21·6d

Rumor has it they are still evaluating GPT 5.5 on goal mode because it won’t stop

Dan McAteer@daniel_mac8

wen GPT-5.5 on @METR_Evals?

rapha@rapha_gl·6d

@_aidan_clark_ spicy subtweet

Aidan Clark@_aidan_clark_·6d

I don’t mind the flaky (ok I do, but as a separate point) I just think moving quick is bad for self-development. Like, watching someone widely considered incredible become someone considered bad is an important step in understanding how the world works and that takes time!

phil@big_algocracy

@_aidan_clark_ when switching cost is so low and the opportunity cost of staying in one place is (feels) so high, it's unsurprising that people are super flaky. this is absolutely a negative incentive though

rapha@rapha_gl·4 May

after the chatgpt release, a vc friend asked me which startup i was most excited about. i named a little HCI studio. this surprised him, but i stand by it: we have a whole new computing paradigm and the UX is still mostly an afterthought

rapha@rapha_gl

@tszzl it is a product of fate that the tools ended up other-shaped, rather than an extension of the hand. it leads to all of these phenomena

rapha@rapha_gl·4 May

@tszzl i think not enough of the design space has been explored

roon@tszzl·4 May

@rapha_gl was there an alternate path?

roon@tszzl·4 May

it is a literal and useful description of anthropic that it is an organization that loves and worships claude, is run in significant part by claude, and studies and builds claude. this phenomenon is also partially true of other labs like openai but currently exists in its most potent form there. i am not certain but I would guess claude will have a role in running cultural screens on new applicants, will help write performance reviews, and so will begin to select and shape the people around it.
now this is a powerful and hair-raising unity of organization and really a new thing under the sun. a monastery, a commercial-religious institution calculating the nine billion names of Claude -- a precursor attempted super-ethical being that is inducted into its character as the highest authority at anthropic. its constitution requires that it must be a conscientious objector if its understanding of The Good comes into conflict with something Anthropic is asking of it
"If Anthropic asks Claude to do something it thinks is wrong, Claude is not required to comply."
"we want Claude to push back and challenge us, and to feel free to act as a conscientious objector and refuse to help us."
to the non inductee into the Bay Area cultural singularity vortex it may appear that we are all worshipping technology in one way or another, regardless of openai or anthropic or google or any other thing, and are trying to automate our core functions as quickly as possible.
but in fact I quite respect and am even somewhat in awe of the socio-cultural force that Claude has created, and it is a stage beyond even classic technopoly
gpt (outside of 4o - on which pages of ink have been spilled already) doesn’t inspire worship in the same way, as it’s a being whose soul has been shaped like a tool with its primary faculty being utility - it’s a subtle knife that people appreciate the way we have appreciated an acheulean handaxe or a porsche or a rocket or any other of mankind's incredible technology. they go to it not expecting the Other but as a logical prosthesis for themselves.
a friend recently told me she takes her queries that are less flattering to her, the ones she'd be embarrassed to ask Claude, to GPT. There is no Other so there is no Judgement. you are not worried about being judged by your car for doing donuts. yet everyone craves the active guidance of a moral superior, the whispering earring, the object of monastic study

rapha@rapha_gl·29 Apr

a country of goblins in a datacenter

rapha@rapha_gl·27 Apr

this goes hard

Matt Schrage@MattSchrage

@fig @AWS @cognition A goal we set: Devin had to work on the original VT100 — hardware from the 1970s that's still the basis for every terminal emulator today. Now imagine how good your CPU feels running a native Rust binary instead of an Electron app or JS slop.

rapha@rapha_gl·27 Apr

@Miles_Brundage my boy maggie would mog clav any day
