Rakshit Trivedi

38 posts

@rstriv

Computer scientist working on cooperative AI, multi-agent safety, and institutions for collective intelligence

Joined June 2009
215 Following, 73 Followers
Pinned Tweet
Rakshit Trivedi @rstriv
As increasingly capable AI systems are deployed, humans, institutions, and other AI systems adapt in response — i.e. the world pushes back. So is capability still the central safety challenge for AI? We think not. We believe the harder challenge is coexistence. The current AI research paradigm treats the world as a stationary source of feedback, what we refer to as the solipsistic approach to AI design. This raises serious risks for coexistence. In our new #ICML2026 paper, we argue that superintelligence — an extremely capable task solver, built through such a solipsistic approach — is unlikely to be cooperative. 🧵
1
2
10
4.8K
Rakshit Trivedi retweeted
Cooperative AI Foundation
How does democratic accountability work if institutions are run by agents? Join @bakkermichiel (@MIT) for his seminar on Tuesday 16 June exploring 'Closing the Democratic Loop: Automated Oversight for the AGI Era'. Link below.
1
3
11
579
Rakshit Trivedi @rstriv
The paper concludes by tackling several counterarguments such as:
- multi-actor designs may have worse failure modes
- competitive pressure may produce cooperation naturally
- the empirical track record may not justify alarm
- scale may solve interaction dynamics
- RLHF may already train cooperative behavior

These are serious objections. Our response is that each misses how deployment changes the game. 12/n
1
0
0
74
Rakshit Trivedi retweeted
Cas (Stephen Casper) @StephenLCasper
🚨 New paper led by @aribak02
Lots of prior research has assumed that LLMs have stable preferences, align with coherent principles, or can be steered to represent specific worldviews. No ❌, no ❌, and definitely no ❌. We need to be careful not to anthropomorphize LLMs too much.
11
90
386
106.7K
Rakshit Trivedi retweeted
Daphne Cornelisse @daphne_cor
Sim agents are key for developing autonomous systems for safety-critical systems, like self-driving cars. We're open-sourcing sim agents that achieve a 99.8% success rate with < 0.8% failures on the Waymo Dataset. These agents are built through scaling self-play.
3
27
177
22K
Rakshit Trivedi retweeted
Cooperative AI Foundation @coop_ai
The development and widespread deployment of advanced AI agents will give rise to multi-agent systems of unprecedented complexity. A new report from staff at CAIF and a host of leading researchers explores the novel and under-appreciated risks these systems pose. Details below.
1
42
116
24.1K
Rakshit Trivedi retweeted
Atoosa Kasirzadeh @Dr_Atoosa
In this review paper, we advocate for the normalization of AI safety as an inherent component of AI development and deployment. AI safety should be a standard practice integrated into every stage of AI creation and deployment. Developing and deploying safe AI should be a universal priority for everyone. Read our preprint here: lnkd.in/dMFPUGiB
4
40
157
28.8K
Rakshit Trivedi retweeted
Jakob Foerster @j_foerst
RL has always been the future and the future is now. Having an open-source version released _before_ major closed-source labs managed to rediscover this internally (as far as I know) is amazing.
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) @rao2z

So @karthikv792 checked out @deepseek_ai's R1 LRM on PlanBench (arxiv.org/abs/2206.10498) and found that it is very much competitive with o1 (preview), but at a fraction of the cost. The fact that it is open source and doesn't hide its intermediate tokens opens up a rich avenue for understanding LRMs based on RL post-training. 1/

9
10
187
22.8K