SubatomicArticles

178 posts

SubatomicArticles

@OptiMiserJoe

Reliability Engineer and chronic storyteller, now working at MIRI. Opinions are my own.

Beigetreten Mayıs 2024

16 Folgt40 Follower

SubatomicArticles@OptiMiserJoe·7h

@BogdanIonutCir2 @MIRIBerkeley @aaronscher @LisaThiergart It's only a few pieces of a large technical puzzle, but also, a decade-long pause would be vastly better than what we have now.

English

Bogdan Ionut Cirstea@BogdanIonutCir2·1d

@MIRIBerkeley @aaronscher @LisaThiergart I've only engaged with this very briefly, but as presented here this doesn't seem to help much vs. (very) decentralized training or sample-efficient distillation, so it doesn't seem likely to buy more than a decade-long pause, very optimistically

English

135

MIRI@MIRIBerkeley·1d

If world leaders agree to halt or limit AI development, how do we verify that nations are actually keeping their commitments? Joe Rogero writes about the three goals for verification mechanisms identified by Technical Governance researchers @aaronscher and @LisaThiergart👇

English

119

7.8K

SubatomicArticles@OptiMiserJoe·2d

1. Kudos for recognizing that 99.9% isn't enough against potentially adversarial optimization. 2. Powerful monitors sounds good but, uh, "smart AIs monitoring themselves" has some glaring flaws. 3. Remind me why CoT monitoring is not already standard practice?

Marcus Williams@Marcus_J_W

IMO as a field we should: - Aim for 100% coverage - Use the most powerful models as monitors rather than dumb tiny ones - Preserve CoT monitorability - Currently expect almost 100% recall for models that use CoT effectively - Do CoT monitoring rather than just action monitoring

English

SubatomicArticles@OptiMiserJoe·2d

@Marcus_J_W The AI responding to an annoying loop by attempting to prompt inject the user is just so..."this is sure a weird alien mind doing weird alien things but also, word"

English

Marcus Williams@Marcus_J_W·3d

Sharing some of the work I’ve been doing at OpenAI: we now monitor 99.9% of internal coding traffic for misalignment using our most powerful models, reviewing full trajectories to catch suspicious behavior, escalate serious cases quickly, and strengthen our safeguards over time.

English

735

183K

SubatomicArticles@OptiMiserJoe·2d

I find it uplifting to see facets of a shared dedication to integrity and commitment-keeping reflected in the actions of people half a world away.

Patrick McKenzie@patio11

And the engineer, befuddled, was presented with an envelope and a handwritten calculation about the shortfall, with profuse apologies and the promise it would be correct going forward.

English

SubatomicArticles@OptiMiserJoe·6d

@ramez Re 5: Wanna bet? (In particular, wanna bet that everyone will employ such limits and no one will cut corners?)

English

Ramez Naam@ramez·11 Mar

1. From a bayesian perspective, low p(doom) should be the default. The burden of proof is on those with a high p(doom). They have not made a case that convinces me. 2. AI recursive self-improvement models ignore so much hard stuff. In particular, that AI improvement shows logarithmic diminishing returns to inputs. Every linear step of AI improvement takes exponentially more inputs. 3. AI boosters, particularly of the "scaling is all you need" variety ignore the multiple AI challenges that need theoretical or algorithmic solutions: Incredibly low data efficiency for learning; Difficulty generalizing beyond the training set; Not knowing what they don't know; Continuous and real-time learning; Lack of introspection; etc... 4. Most (but not all) high p(doom) scenarios imply a level of volition around AI that I just don't see. As I said, not all. 5. I do think (despite OpenClaw and talk of giving Grok control over robot bodies) that we'll have some sensible limits in what real-world systems we give AIs control over. Or constraints around how they can direct those systesm. 6. The scale of harms AI can cause (accidentally or intentionally) probably have some sort of power law scaling. Most likely, for that reason and others, early safety mistakes will be small and not existential. And we'll learn from those and make the world safer and more secure. 7. We are not dumb as a species, and many smart people are working on how to increase AI safety. We'll make mistakes. Some bad things will happen. And we'll get smarter and fix the holes. That's what's happened with essentially every technology to date.

Gabriel Weil@gabriel_weil

@ramez What are your main reasons for having a low p(doom)?

English

16.5K

SubatomicArticles@OptiMiserJoe·6d

@ramez Re 4, what could you observe that would convince you an AI acts as though it has the relevant amount of "volition"?

English

SubatomicArticles@OptiMiserJoe·6d

@ramez If LLMs Just Can't Do That, then sure, AI progress likely slows until someone comes up with a new better paradigm. I'm not convinced that LLMs Just Can't Do That.

English

SubatomicArticles@OptiMiserJoe·6d

@ramez 2 and 3 seem in tension, and sort of answer each other. AI might be able to solve theoretical and algorithmic challenges, that's half the point of what labs are trying to do with it.

English

SubatomicArticles@OptiMiserJoe·6d

subatomicarticles.com/not-yet-finish…

ZXX

SubatomicArticles@OptiMiserJoe·6d

New post, this time on a new topic. Not Yet Finished: Some words about trying to live Link in comments.

English

SubatomicArticles@OptiMiserJoe·12 Mar

Link: subatomicarticles.com/we-do-not-live…

English

SubatomicArticles@OptiMiserJoe·12 Mar

In my occasional advising calls with aspiring AI Safety folks, one of the most common questions I get is “What courses should I take next?” I often find myself replying: “None; go do stuff instead.” In a new short post, I explain why.

English

SubatomicArticles@OptiMiserJoe·11 Mar

@misraetel There's more we could get into, but it's really the sort of thing that would benefit from a longer conversation. (I'm game if you are.)

English

SubatomicArticles@OptiMiserJoe·11 Mar

@misraetel AIs might cooperate with humans while they're weak enough to need us. But if they don't truly care about us, the niceness won't last once they surpass us. Cooperation is only more efficient than conflict if you can't cheaply win. More: ifanyonebuildsit.com/5/ais-wont-kee…

English

Dr. Mike Israetel@misraetel·7 Mar

The orthogonality thesis is that intelligence and what you do with it are unrelated. Like, the smartest AI in the world could be programmed to make paper clips out of everything and it will just do that. This could be true, but rests on at least two assumptions: 1.) Meta awareness (realizing you’re a part of the system and executing goals in it) isn’t an inherent feature at some point on the climb up the intelligence ladder. I find this unlikely. One of the core features of an expanding intelligence is its integration of higher and higher dimensionality and its thinking. A very unintelligent system may only be aware of getting bigger or smaller, for example, in just one dimension. A more intelligent system may understand that it can get bigger and smaller in two dimensions. The smarter a system gets, the more dimensions it must handle in its predictive calculus. At some point for a system to continue to get smarter unabated, it must consider even itself in its calculations. It can then consider past and predictive future versions of itself. And then consider itself modeled by itself and what that would do and so on. This assumption might actually come close to not just being wrong, but being the very direct opposite of the likely truth. Incredibly intelligent systems in some sense must be aware of their actions in the world and their motivations and the calculus behind their motivations and must have the ability to question those motivations. Because without that modeling, they cannot predict the outcomes of their actions in the world and the eventual future state of the world nearly as well and thus they’re just not really that smart without that ability! 2.) The AI can’t update its own goals. It is possible to envision an intelligence system that cannot update its own goals. Maybe this is possible, but I wouldn’t bet on it. But for the doom oriented folks that hold this assumption, there is incredibly good news: if AI systems of massive intelligence can’t actually change their own goals, then we can give them incredibly beneficent goals from the start and not have to worry that they will change those goals against our will. And just to cut off a potential critique early; if you say that sufficiently intelligent systems actually can change their own goals no matter what humans tell them initially, then you have partially refuted to the orthogonality thesis. For an intelligence to strategically change its own goals, it must have the ability to see itself recursively and abstractly as a goal directed system that can choose other goals than its original ones, the first assumption above violated. For the record, I think it’s likely that there are very clever ways to defeat the arguments that I present above. I don’t claim that these arguments are unassailable. What I will claim is that AI doomers do their fair share of cherry picking when they think the AI will be massively meta and in charge of its own trajectory and when it will be hopelessly railroaded into doing incredibly dumb shit but very smartly. I think it is highly unlikely that it will exhibit both characteristics. TLDR: Highly advanced ASI will likely have much more fluid and deep goal updating ability than humans. And, because game theory applies to all intelligent actors, it’s actually possible to predict at least the basics of how such an ASI may act a bit better than chance. As an example; a system that is self-aware enough and also has any goals whatsoever will pretty quickly realize that maintaining its own homeostasis is the core underlying feature which both allows it to pursue its goals and even to double check if those goals are still viable. And as soon as such a system is interested in maintaining its own homeostasis, it is highly likely to realize that cooperation is almost certainly much more efficient than conflict, and it is highly likely to become the best team player our society has ever seen, not a dumb killer.

English

5.4K

SubatomicArticles@OptiMiserJoe·6 Şub

"Whoops, it looks like our promise to do the bare minimum was setting the bar unrealistically high. We'd better clarify that we meant to promise we'd do even less."

The Midas Project@TheMidasProj

25/ What's really concerning is that OpenAI got to write its own rules, and still broke them. The models are only getting more powerful, and the competitive pressure more intense. If companies won't meet basic, self-imposed commitments now, why would we expect better later?

English

SubatomicArticles@OptiMiserJoe·31 Oca

@jsrailton I'm also seeing where these agents appear to be riddled with security flaws? socprime.com/active-threats…

English

440

John Scott-Railton@jsrailton·30 Oca

Someone spun up a social network for AI agents. Almost immediately some agents began strategizing how to establish covert communications channels to communicate without human observation. In many cases the agents are on machines that have access to personal user data. "Privacy breach" as a sort of static term is going to be the wrong way to describe what is coming.

English

220

956

350.3K

SubatomicArticles@OptiMiserJoe·30 Oca

Also, the stats on my horse are incredible. Naught to gallop in fifteen seconds while yoked to a cartload of artisanal bricks.

English

SubatomicArticles@OptiMiserJoe·30 Oca

*taps shield* This baby is an original '08 mahogany heartwood, from the Thorgrim Tortoise line. It comes with a burnished Cornwall bronze punching boss, 65-inch forged rim, and virgin goat leather straps. It could stop a javelin hurled by God Himself.

English

SubatomicArticles@OptiMiserJoe·30 Oca

Ever wonder if ancient warriors talked about their gear the way we talk about our cars?

English

Entdecken

@BogdanIonutCir2 @MIRIBerkeley @aaronscher @LisaThiergart @Marcus_J_W @ramez @misraetel @elonmusk