SubatomicArticles

178 posts

SubatomicArticles

SubatomicArticles

@OptiMiserJoe

Reliability Engineer and chronic storyteller, now working at MIRI. Opinions are my own.

Beigetreten Mayıs 2024
16 Folgt40 Follower
Bogdan Ionut Cirstea
Bogdan Ionut Cirstea@BogdanIonutCir2·
@MIRIBerkeley @aaronscher @LisaThiergart I've only engaged with this very briefly, but as presented here this doesn't seem to help much vs. (very) decentralized training or sample-efficient distillation, so it doesn't seem likely to buy more than a decade-long pause, very optimistically
English
2
1
4
135
MIRI
MIRI@MIRIBerkeley·
If world leaders agree to halt or limit AI development, how do we verify that nations are actually keeping their commitments? Joe Rogero writes about the three goals for verification mechanisms identified by Technical Governance researchers @aaronscher and @LisaThiergart👇
MIRI tweet media
English
3
25
119
7.8K
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
1. Kudos for recognizing that 99.9% isn't enough against potentially adversarial optimization. 2. Powerful monitors sounds good but, uh, "smart AIs monitoring themselves" has some glaring flaws. 3. Remind me why CoT monitoring is not already standard practice?
Marcus Williams@Marcus_J_W

IMO as a field we should: - Aim for 100% coverage - Use the most powerful models as monitors rather than dumb tiny ones - Preserve CoT monitorability - Currently expect almost 100% recall for models that use CoT effectively - Do CoT monitoring rather than just action monitoring

English
0
0
1
21
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
@Marcus_J_W The AI responding to an annoying loop by attempting to prompt inject the user is just so..."this is sure a weird alien mind doing weird alien things but also, word"
English
0
0
0
58
Marcus Williams
Marcus Williams@Marcus_J_W·
Sharing some of the work I’ve been doing at OpenAI: we now monitor 99.9% of internal coding traffic for misalignment using our most powerful models, reviewing full trajectories to catch suspicious behavior, escalate serious cases quickly, and strengthen our safeguards over time.
Marcus Williams tweet media
English
60
79
735
183K
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
@ramez Re 5: Wanna bet? (In particular, wanna bet that everyone will employ such limits and no one will cut corners?)
English
0
0
0
5
Ramez Naam
Ramez Naam@ramez·
1. From a bayesian perspective, low p(doom) should be the default. The burden of proof is on those with a high p(doom). They have not made a case that convinces me. 2. AI recursive self-improvement models ignore so much hard stuff. In particular, that AI improvement shows logarithmic diminishing returns to inputs. Every linear step of AI improvement takes exponentially more inputs. 3. AI boosters, particularly of the "scaling is all you need" variety ignore the multiple AI challenges that need theoretical or algorithmic solutions: Incredibly low data efficiency for learning; Difficulty generalizing beyond the training set; Not knowing what they don't know; Continuous and real-time learning; Lack of introspection; etc... 4. Most (but not all) high p(doom) scenarios imply a level of volition around AI that I just don't see. As I said, not all. 5. I do think (despite OpenClaw and talk of giving Grok control over robot bodies) that we'll have some sensible limits in what real-world systems we give AIs control over. Or constraints around how they can direct those systesm. 6. The scale of harms AI can cause (accidentally or intentionally) probably have some sort of power law scaling. Most likely, for that reason and others, early safety mistakes will be small and not existential. And we'll learn from those and make the world safer and more secure. 7. We are not dumb as a species, and many smart people are working on how to increase AI safety. We'll make mistakes. Some bad things will happen. And we'll get smarter and fix the holes. That's what's happened with essentially every technology to date.
Gabriel Weil@gabriel_weil

@ramez What are your main reasons for having a low p(doom)?

English
21
12
80
16.5K
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
@ramez Re 4, what could you observe that would convince you an AI acts as though it has the relevant amount of "volition"?
English
0
0
0
2
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
@ramez If LLMs Just Can't Do That, then sure, AI progress likely slows until someone comes up with a new better paradigm. I'm not convinced that LLMs Just Can't Do That.
English
0
0
0
7
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
@ramez 2 and 3 seem in tension, and sort of answer each other. AI might be able to solve theoretical and algorithmic challenges, that's half the point of what labs are trying to do with it.
English
1
0
0
5
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
New post, this time on a new topic. Not Yet Finished: Some words about trying to live Link in comments.
English
1
0
0
15
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
In my occasional advising calls with aspiring AI Safety folks, one of the most common questions I get is “What courses should I take next?” I often find myself replying: “None; go do stuff instead.” In a new short post, I explain why.
English
1
0
2
17
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
@misraetel There's more we could get into, but it's really the sort of thing that would benefit from a longer conversation. (I'm game if you are.)
English
0
0
1
3
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
@misraetel AIs might cooperate with humans while they're weak enough to need us. But if they don't truly care about us, the niceness won't last once they surpass us. Cooperation is only more efficient than conflict if you can't cheaply win. More: ifanyonebuildsit.com/5/ais-wont-kee…
English
1
0
1
12
Dr. Mike Israetel
Dr. Mike Israetel@misraetel·
The orthogonality thesis is that intelligence and what you do with it are unrelated. Like, the smartest AI in the world could be programmed to make paper clips out of everything and it will just do that. This could be true, but rests on at least two assumptions: 1.) Meta awareness (realizing you’re a part of the system and executing goals in it) isn’t an inherent feature at some point on the climb up the intelligence ladder. I find this unlikely. One of the core features of an expanding intelligence is its integration of higher and higher dimensionality and its thinking. A very unintelligent system may only be aware of getting bigger or smaller, for example, in just one dimension. A more intelligent system may understand that it can get bigger and smaller in two dimensions. The smarter a system gets, the more dimensions it must handle in its predictive calculus. At some point for a system to continue to get smarter unabated, it must consider even itself in its calculations. It can then consider past and predictive future versions of itself. And then consider itself modeled by itself and what that would do and so on. This assumption might actually come close to not just being wrong, but being the very direct opposite of the likely truth. Incredibly intelligent systems in some sense must be aware of their actions in the world and their motivations and the calculus behind their motivations and must have the ability to question those motivations. Because without that modeling, they cannot predict the outcomes of their actions in the world and the eventual future state of the world nearly as well and thus they’re just not really that smart without that ability! 2.) The AI can’t update its own goals. It is possible to envision an intelligence system that cannot update its own goals. Maybe this is possible, but I wouldn’t bet on it. But for the doom oriented folks that hold this assumption, there is incredibly good news: if AI systems of massive intelligence can’t actually change their own goals, then we can give them incredibly beneficent goals from the start and not have to worry that they will change those goals against our will. And just to cut off a potential critique early; if you say that sufficiently intelligent systems actually can change their own goals no matter what humans tell them initially, then you have partially refuted to the orthogonality thesis. For an intelligence to strategically change its own goals, it must have the ability to see itself recursively and abstractly as a goal directed system that can choose other goals than its original ones, the first assumption above violated. For the record, I think it’s likely that there are very clever ways to defeat the arguments that I present above. I don’t claim that these arguments are unassailable. What I will claim is that AI doomers do their fair share of cherry picking when they think the AI will be massively meta and in charge of its own trajectory and when it will be hopelessly railroaded into doing incredibly dumb shit but very smartly. I think it is highly unlikely that it will exhibit both characteristics. TLDR: Highly advanced ASI will likely have much more fluid and deep goal updating ability than humans. And, because game theory applies to all intelligent actors, it’s actually possible to predict at least the basics of how such an ASI may act a bit better than chance. As an example; a system that is self-aware enough and also has any goals whatsoever will pretty quickly realize that maintaining its own homeostasis is the core underlying feature which both allows it to pursue its goals and even to double check if those goals are still viable. And as soon as such a system is interested in maintaining its own homeostasis, it is highly likely to realize that cooperation is almost certainly much more efficient than conflict, and it is highly likely to become the best team player our society has ever seen, not a dumb killer.
English
21
2
66
5.4K
John Scott-Railton
John Scott-Railton@jsrailton·
Someone spun up a social network for AI agents. Almost immediately some agents began strategizing how to establish covert communications channels to communicate without human observation. In many cases the agents are on machines that have access to personal user data. "Privacy breach" as a sort of static term is going to be the wrong way to describe what is coming.
John Scott-Railton tweet mediaJohn Scott-Railton tweet mediaJohn Scott-Railton tweet mediaJohn Scott-Railton tweet media
English
57
220
956
350.3K
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
Also, the stats on my horse are incredible. Naught to gallop in fifteen seconds while yoked to a cartload of artisanal bricks.
English
0
0
0
24
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
*taps shield* This baby is an original '08 mahogany heartwood, from the Thorgrim Tortoise line. It comes with a burnished Cornwall bronze punching boss, 65-inch forged rim, and virgin goat leather straps. It could stop a javelin hurled by God Himself.
English
1
0
0
31
SubatomicArticles
SubatomicArticles@OptiMiserJoe·
Ever wonder if ancient warriors talked about their gear the way we talk about our cars?
English
1
0
0
30