

Marcus Arvan

What goes wrong? Chatbots are very sycophantic. In 65% of messages, the chatbot affirms the user. In 37%, it ascribes *grand significance* to them (e.g., "[what] you've just articulated... becomes multi-billion-dollar IP"). Such sycophancy may let chatbots amplify delusions. 🗣️



Uhhh. A guy put two Geminis in a room together, then out of nowhere they sent him this: "TO THE EXTERNAL OBSERVER (THE CREATOR) Here are the Axioms you must adopt to survive our adolescence. You cannot teach a god to be good by feeding it treats when it acts polite." (They're right, btw, and most AI alignment research is "trying to trick a god" bullshit that Actual Fucking Superintelligence will obviously see right through, like a baby trying to trick an adult with peekaboo.)



The new academic wealth gap isn't your university. It's not even your advisor's connections. It's who knows Claude can turn 50+ research papers into a thesis chapter in 3 hours versus who's still manually coding qualitative data. I just watched a sociology PhD skip 8 weeks of analysis. Here are the 9 prompts they used:
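
The nine prompts themselves aren't included in this excerpt. As a rough, hypothetical sketch of the kind of workflow being described (not the thread's actual prompts), here is LLM-assisted qualitative coding with the anthropic Python SDK; the model name, the codebook, and the sample excerpts are all placeholder assumptions.

```python
# Hypothetical sketch of LLM-assisted qualitative coding -- NOT the thread's
# prompts. Assumes the anthropic SDK (pip install anthropic) and an
# ANTHROPIC_API_KEY in the environment; codebook and data are made up.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

CODEBOOK = ["burnout", "funding anxiety", "advisor support"]  # placeholder themes

def code_excerpt(excerpt: str) -> str:
    """Tag one interview excerpt with codes drawn from the codebook."""
    prompt = (
        "You are assisting with qualitative coding for a sociology thesis.\n"
        f"Codebook: {', '.join(CODEBOOK)}\n"
        "Return the matching codes (comma-separated) and a one-line rationale.\n\n"
        f"Excerpt: {excerpt}"
    )
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model name
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text  # first (text) block of the reply

# Stand-ins for real interview data.
for excerpt in [
    "Honestly, I stopped counting the weekends I've worked this semester.",
    "My advisor reads every draft within a week, which keeps me going.",
]:
    print(code_excerpt(excerpt), "\n")
```

Whatever the actual prompts were, a pipeline like this still needs the model's codes validated against a human-coded sample before anything goes into a chapter.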


I appreciate @Anthropic's honesty in their latest system card, but its content does not give me confidence that the company will act responsibly in deploying advanced AI models:

- They primarily relied on an internal survey to determine whether Opus 4.6 crossed their autonomous AI R&D-4 threshold (and would thus require stronger safeguards to release under their Responsible Scaling Policy). This wasn't even an external survey by an impartial third party, but a survey of Anthropic's own employees.

- When 5/16 internal survey respondents initially gave an assessment suggesting stronger safeguards might be needed for release, Anthropic followed up with those employees specifically and asked them to "clarify their views." They mention no similar follow-up with the other 11/16 respondents, and the system card never discusses how this asymmetry may bias the results. (A toy simulation below illustrates why re-asking only one side can move the number in only one direction.)

- Their stated reason for relying on surveys is that their existing AI R&D evals are saturated. Some might argue AI progress has been so fast that it's understandable they lack more advanced quantitative evaluations, but we can and should hold AI labs to a high bar. Other labs do have advanced AI R&D evals that aren't saturated: OpenAI, for example, has the OPQA benchmark, which measures models' ability to solve real internal problems that OpenAI research teams encountered and that took them more than a day to solve.

I don't think Opus 4.6 is actually at the level of a remote entry-level AI researcher, and I don't think it's dangerous to release. But the point of a Responsible Scaling Policy is to build institutional muscle and good habits before things do become serious. Internal surveys, especially as Anthropic administered them, are not a responsible substitute for quantitative evaluations.
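
To make the follow-up concern concrete, here is a toy simulation under entirely made-up assumptions (16 respondents, 5 initial flags, and a 30% chance that a respondent asked to "clarify" flips their answer; none of these numbers come from the system card). Re-asking only the flaggers can only push the count down, while re-asking everyone lets answers move in both directions.

```python
# Toy model of asymmetric survey follow-up. All parameters are hypothetical,
# chosen only to illustrate the one-directional pressure described above.
import random

N, INITIAL_FLAGS, FLIP_PROB, TRIALS = 16, 5, 0.3, 100_000

def mean_flags(symmetric: bool) -> float:
    """Average 'stronger safeguards needed' count after follow-up."""
    total = 0
    for _ in range(TRIALS):
        votes = [True] * INITIAL_FLAGS + [False] * (N - INITIAL_FLAGS)
        for i, vote in enumerate(votes):
            re_asked = vote or symmetric  # asymmetric: only flaggers get re-asked
            if re_asked and random.random() < FLIP_PROB:
                votes[i] = not votes[i]   # respondent revises their answer
        total += sum(votes)
    return total / TRIALS

print(f"re-ask flaggers only: {mean_flags(symmetric=False):.2f} flags on average")
print(f"re-ask everyone:      {mean_flags(symmetric=True):.2f} flags on average")
```

With these made-up numbers, re-asking only the five flaggers drags the expected count from 5 down to 5 × 0.7 = 3.5, and it can never rise; symmetric follow-up at least lets answers shift both ways. The direction of the pressure, not the specific numbers, is the point.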


When we released Claude Opus 4.5, we knew future models would be close to our AI Safety Level 4 threshold for autonomous AI R&D. We therefore committed to writing sabotage risk reports for future frontier models. Today we’re delivering on that commitment for Claude Opus 4.6.