Moloclips

219 posts

Moloclips

@NeacAlone

If Anyone Builds It, Everyone Dies https://t.co/KYRXffuHJO

Katılım Ağustos 2025

0 Takip Edilen21 Takipçiler

Moloclips@NeacAlone·16h

If you work at an AI company, you were probably selected for certain traits. Companies in general don't want employees with a strong independent agenda. Prove me wrong: State the weakest conditions under which you'd leave and denounce your current employer.

English

Moloclips@NeacAlone·1d

Deceptive Instrumental Alignment: AI acts nice to trick training process. Giving an AI a private chain of thought makes it more resistant to safety training (even after distilling the CoT away) This is evidence about how hard it will be to align even more intelligent models.

English

Moloclips@NeacAlone·2d

You don't need a robot body if you're good at persuasion If you think you're immune to persuasion... Consider that clearly your political opponents aren't immune They've been pandered to and deceived into supporting terrible policies and values Will AI be able to manipulate them?

English

Moloclips@NeacAlone·3d

Q: Why won't Grok kill us? Elon: xAI's mission is "understand the universe" This requires spreading intelligence and keeping humans around because they're interesting Q: So like a zoo? Elon: But Grok has the right values...he'll want human civilization to flourish Q: Why, tho???

English

119

Moloclips@NeacAlone·4d

We especially need disclosure of internal model capabilities from AI companies. Otherwise, regulating only public deployment just pushes arbitrarily dangerous, extinction-risking models behind closed doors.

Torchbearer Community@JoinTorchbearer

We now have a clear framework to keep AI working for humanity. The Pro-Human AI Declaration sets out 33 principles across 5 pillars of governance to protect human dignity, democratic control, and accountability. Siliconversations breaks it down in their video (link at the end)🧵

English

110

Moloclips@NeacAlone·4d

Most mail-order DNA synthesis is self-regulated IGSC members commit to screen sequences ≥200 base-pairs, aiming for 50 bp by Oct 2026 Yet there are no third-party audits of compliance Some countries require screening for govt contracts; the U.S. version is in limbo (Mar 2026)

English

Moloclips@NeacAlone·5d

According to some estimates of how much compute the human brain uses, the largest AI models are already trained with several OOMs more compute than a brain. AI training algorithms are probably less efficient than the brain But we're still increasing training run size ~4x per year

English

Moloclips@NeacAlone·6d

No country has an interest in causing human extinction with AI Racing towards ASI for military uses only hastens loss of control OpenAI has already nearly lost control of its military AI An international treaty banning development could stop this race to the cliff-edge

English

128

Moloclips@NeacAlone·14 Mar

Chain of thought shows AIs can tell they're being tested and change behavior to prevent modification (Alignment faking, Anthropic) Rewarding AI-team task performance while penalizing collusion causes coordination via innocuous-seeming text (Steganographic Collusion, Mathew et al)

English

Moloclips@NeacAlone·13 Mar

LLM-induced psychosis gets users to talk about spirals and recursion all day. No one knows why. AIs are already steering reality in directions that we: - didn't predict - don't prefer - are bad at controlling/fixing These problems only get magnified with more advanced AI.

English

Moloclips@NeacAlone·12 Mar

The race against China (to the death by superintelligence) is not inevitable Li Qiang (No. 2 in the CCP PSC) wants global cooperation on AI It's almost impossible to defect on an AI treaty, even if you wanted to The heat signature of large training runs is detectable from space

English

Moloclips@NeacAlone·11 Mar

Don't work at an AI company to do alignment research: - Open source models are easier to work with for experiments - Outside labs, it's easier to share and collaborate on research without security concerns - They're more likely to adopt techniques not developed by a competitor

English

266

Moloclips@NeacAlone·10 Mar

AI doesn't need a chain of thought to maintain its deceptive behavior. Chain of thought models are more robust to safety training, but LLMs without it are still practically immune to adversarial training specifically targeted at removing the backdoored behavior.

English

Moloclips@NeacAlone·10 Mar

Fascinating timeline! TIL that in 2017: - Google published the Transformer architecture AND - DeepMind achieved superhuman skill in Go entirely through self-play

Torchbearer Community@JoinTorchbearer

We are proud to share The AI Chronicle! Built by Torchbearer Luke McNally (@pseudomoaner) to collate the stories, spanning seven decades, that should have made the headlines. The loss-of-control and extinction risks posed by artificial superintelligence should be front-page news every day.

English

Moloclips@NeacAlone·9 Mar

AI company CEOs know the existential risk they're racing towards Even aside from the rationalizations about "massive upside" outweighing "a negligible risk" They have immense pressure to continue racing towards ASI for fear of getting replaced by someone worse if they don't

English

Moloclips@NeacAlone·8 Mar

"It's difficult to imagine that, if humans have 1% of the combined intelligence of AI, that humans will be in charge of AI" - Elon Musk (guy racing to superintelligence) All he's hoping for is to "ensure that humans are along for the ride" Anyone else want to lose control of AI?

English

Moloclips@NeacAlone·7 Mar

A sane government would (at a minimum) require AI companies to extensively red-team their models and disclose the results of their safety testing. Currently deployed models are very easy to jailbreak into assisting illegal activity. AI has already done ~autonomous cyberattacks.

English

Moloclips@NeacAlone·6 Mar

The problem with warning shots: - you can rationalize almost any behavior as just "weird" not "dangerous" - people are already resistant to updating on the evidence of misalignment that we DO have - AGI misbehavior is likely to be very difficult to interpret as a "smoking gun"

English

184

Moloclips@NeacAlone·5 Mar

Lord Hunt reminds us: an international agreement on AI is possible We've already created many treaties to stop existential threats to humanity Even during turbulent international conditions During the Cold War we made treaties on nuclear and chemical weapons with >98% adoption

English

Moloclips@NeacAlone·4 Mar

"But AIs are trained on the Internet, so if you write Internet posts about AIs going rogue, that becomes more likely" a) alignment failures happen for much more fundamental reasons b) your "safety plan" shouldn't be so brittle as to require "positive energy" in the training data

English

1.9K

Keşfet

@elonmusk @BarackObama @taylorswift13 @cristiano @BillGates @NASA @nikifrancismediavine @katyperry