Rakshit Trivedi (@rstriv) - Twitter Profile

Pinned Tweet

As increasingly capable AI systems are deployed, humans, institutions, and other AI systems adapt in response — i.e. the world pushes back. So is capability still the central safety challenge for AI? We think not. We believe the harder challenge is coexistence. The current AI research paradigm treats the world as a stationary source of feedback, what we refer to as the solipsistic approach to AI design. This raises serious risks for coexistence. In our new #ICML2026 paper, we argue that superintelligence — an extremely capable task solver, built through such a solipsistic approach — is unlikely to be cooperative. 🧵

1

2

10

4.8K

Rakshit Trivedi retweeted

Cooperative AI Foundation@coop_ai·13h

How does democratic accountability work if institutions are run by agents? Join @bakkermichiel (@MIT) for his seminar on Tuesday 16 June exploring 'Closing the Democratic Loop: Automated Oversight for the AGI Era'. Link below.

1

3

11

579

Rakshit Trivedi@rstriv·1d

📄 Paper: arxiv.org/abs/2606.03237 Work done in collaboration with my wonderful coauthors @natashajaques, @locross, Sasha Vezhnevets, and @jzl86. Very excited to present this at #ICML 2026. If you are visiting, come say hi at our poster session. We would love to discuss!

0

3

95

Rakshit Trivedi@rstriv·1d

The paper concludes by tackling several counterarguments such as:
- multi-actor designs may have worse failure modes
- competitive pressure may produce cooperation naturally
- the empirical track record may not justify alarm
- scale may solve interaction dynamics
- RLHF may already train cooperative behavior

These are serious objections. Our response is that each misses how deployment changes the game. 12/n

1

0

74

Rakshit Trivedi retweeted

Cas (Stephen Casper)@StephenLCasper·12 Mar

🚨New paper led by @aribak02 Lots of prior research has assumed that LLMs have stable preferences, align with coherent principles, or can be steered to represent specific worldviews. No ❌, no ❌, and definitely no ❌. We need to be careful not to anthropomorphize LLMs too much.

11

90

386

106.7K

Rakshit Trivedi retweeted

Daphne Cornelisse@daphne_cor·28 Feb

Sim agents are key for developing autonomous systems for safety-critical systems, like self-driving cars. We're open-sourcing sim agents that achieve a 99.8% success rate with < 0.8% failures on the Waymo Dataset. These agents are built through scaling self-play.

3

27

177

22K

Rakshit Trivedi retweeted

Cooperative AI Foundation@coop_ai·20 Feb

The development and widespread deployment of advanced AI agents will give rise to multi-agent systems of unprecedented complexity. A new report from staff at CAIF and a host of leading researchers explores the novel and under-appreciated risks these systems pose. Details below.

1

42

116

24.1K

Rakshit Trivedi retweeted

Atoosa Kasirzadeh@Dr_Atoosa·14 Feb

In this review paper, we advocate for the normalization of AI safety as an inherent component of AI development and deployment. AI safety should be a standard practice integrated into every stage of AI creation and deployment. Developing and deploying safe AI should be a universal priority for everyone. Read our preprint here: lnkd.in/dMFPUGiB

4

40

157

28.8K

Rakshit Trivedi retweeted

Gillian Hadfield@ghadfield·26 Jan

Video from our tutorial @NeurIPSConf 2024 is up! @dhadfieldmenell @jzl86 @rstriv and I explore how frameworks from economics, institutional and political theory, and biological and cultural evolution can advance approaches to AI alignment neurips.cc/virtual/2024/t…

0

8

21

3K

Rakshit Trivedi retweeted

Jakob Foerster@j_foerst·21 Jan

RL has always been the future and the future is now. Having an open-source version released _before_ major closed-source labs managed to rediscover this internally (as far as I know) is amazing.

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z

So @karthikv792 checked out @deepseek_ai's R1 LRM on PlanBench (arxiv.org/abs/2206.10498)--and found that it is very much competitive with o1 (preview), but at a fraction of the cost. The fact that it is open source and doesn't hide its intermediate tokens opens up a rich avenue for understanding LRMs based on RL post-training. 1/

9

10

187

22.8K
