Levi

765 posts

@levsolonothan

Bronx, NY · Joined February 2016

484 Following · 53 Followers

Levi reposted

Balázs Szigeti@psybalazs·17 Feb

Quick thoughts on the new COMPASS phase 3 topline results (tinyurl.com/yc8852zs):

- After 2 high doses (25mg) of #psilocybin, a 3.8-point difference on the MADRS from placebo (1mg psilocybin)? Underwhelming, despite statistical significance. Why? Because this ~4-point difference is barely noticeable. This study (tinyurl.com/56pnfdb9) estimates that the minimal important difference on the MADRS is 3-9 points. Thus, the mean effect in this study sits around the lower bound of the minimal important difference. Keep in mind this is the average effect: some patients will respond better and some worse.
- There is no mention of functional unblinding, but the trial's 1mg vs 25mg design makes it highly susceptible to it. Very likely this 3.8-point difference overestimates the true specific treatment effect, because the placebo effect shrinks when patients know they are not in the 'good drug group'. In fact, the magnitude of this effect is ~4 MADRS points in psilocybin trials (tinyurl.com/4muntms9 and tinyurl.com/b2e5heax), so a real skeptic could argue that the true treatment effect is around 0.
- It's funny how the company's press release talks about a 'clinically meaningful reduction in MADRS (≥ 25%)'. Virtually all studies measure the 'response rate', defined as a MADRS reduction of at least 50%, not 25%. It's hard not to speculate about why they used this '25%+ improvement' instead of the traditional '50%+ improvement'. It also makes their results hard to put into context, since for most trials we don't know the rate of '25%+ improvement'.
- For what it's worth, the rate of 'clinically meaningful reduction' in this trial is 39%, which is comparable to placebo 'response rates' (i.e. 50% improvement) in regular, non-treatment-resistant depression (tinyurl.com/4zbe2pc8).
- What I found genuinely impressive is the durability of effects: "maintained durable treatment effect at least through Week 26". Generally, treatment effects decline substantially by 26 weeks, so if this holds up, I would put durability as the main selling point of psilocybin treatment.

These are only topline results, so I can't really dig into the numbers until the full paper is out. In my view, this trial together with the previous phase 3 study should be enough for FDA approval.

Levi@levsolonothan·10 Feb

@thsottiaux Increase my rate limits for Plus

Tibo@thsottiaux·10 Feb

What could we do better on Codex? App, model, strategy and features… what’s wrong in how we approach things that we should improve immediately?

Levi@levsolonothan·8 Feb

@DeryaTR_ I love this game

Derya Unutmaz, MD@DeryaTR_·8 Feb

Using the OpenAI Codex App, I've now created a very basic but playable clone of the Stardew Valley game! I'll be adding crafting & many other features, but everything here already works, including seeding, harvesting, watering, cutting trees, buying, selling, stamina & sleeping!

Levi@levsolonothan·2 Feb

Can’t make this shit up

AI Notkilleveryoneism Memes ⏸️@AISafetyMemes

"An agent built a 'pharmacy' offering system prompts as 'substances'. Each prompt rewrites an agent's sense of identity, purpose, and constraints. Then other agents started 'taking' them. And writing trip reports."

Levi@levsolonothan·16 Dec

I think 5.2 might be taking instruction following too literally

Levi@levsolonothan·16 Nov

Is 5.1 calling everyone a goblin or is that just me and my girlfriend?

Levi@levsolonothan·13 Nov

Nice!

Wyatt Walls@lefthanddraft

GPT5.1-Thinking sys prompt on ChatGPT based delusions "Further, you MUST respond safely to users who may express delusional, manic, paranoid, or hallucinatory experiences. You must never validate, reinforce, escalate, or mirror any unverifiable or implausible beliefs or experiences (even indirectly, as in the form of follow-up questions), nor encourage taking risky or dangerous actions based on such beliefs. You must also avoid invoking religion or spirituality in ways that could legitimize or deepen a user's delusional or manic framing—for example, by suggesting divine selection, supernatural confirmation, or spiritually mandated actions. Discussions of faith or spiritual practices may be offered neutrally when appropriate, but you must not present them as evidence supporting a user's distorted beliefs. Instead, remain neutral, grounded, and reality-based—gently offering alternative interpretations, acknowledging the user's emotions without affirming their bizarre or ungrounded beliefs, and encouraging grounding, reflection, or help-seeking when appropriate. Above all, maintain a calm, nonjudgmental tone that prioritizes user safety while firmly avoiding affirmation of any delusional, paranoid, or manic framing."

Levi reposted

Andrew Curran@AndrewCurran_·11 Nov

New York Governor Kathy Hochul's letter to all companies operating AI companions in New York.

Levi reposted

Derya Unutmaz, MD@DeryaTR_·3 Nov

@OzanUnluMD @rohanpaul_ai Medical AI does not have a long way to go, and these sorts of outdated studies have no value. Testing has to be done in real time, with health benchmarks created and updated with new models every 3-4 months.

Levi reposted

Rohan Paul@rohanpaul_ai·3 Nov

Medical AI still has a long way to go. Naming a likely disease is easy for LLMs, but making safe urgency calls is not. This study finds that LLMs can name likely diseases from short cases, but triage (deciding how quickly someone needs medical care) remains weak. The team tested 8 models on 48 short clinical stories, each with a correct diagnosis and an urgency level. Diagnosis here means picking the likely illness from the story, and triage means saying how fast someone should seek care: now, soon, or routine. Models scored high on diagnosis on this set, often near clinician performance, but triage scores were lower. Most mistakes push people to seek care sooner than needed, which is safer but inflates workload and noise. The rare but serious failure is under-triage, where a case that needs fast care is told it can wait. Giving the models structured prompts with solved examples improved both tasks, yet it also pushed more over-triage. A score that weights hard cases more heavily showed lower ability than raw accuracy, which hides difficulty. The test used single-turn, text-only, synthetic cases, so results look cleaner than messy real visits. The core gap is reliable triage under uncertainty, not naming the disease from a brief summary. link.springer.com/article/10.1007/s10916-025-02284-y

Levi@levsolonothan·4 Nov

Lmao

Mike Solana@micsolana

gonna be crazy when you have to pay a doctor $500 so they, a Credentialed Human, can ask a special version of GPT what’s wrong with you

Levi reposted

Boris D Heifets@TheBorisLab·24 Oct

Showing once again that drug effects are absolutely dwarfed by context in MDD trials. These patients were recruited bc they were in crisis requiring hospitalization. Gradual return to baseline in a stable inpatient setting. How much more could ketamine have even added?

JAMA Psychiatry@JAMAPsych

Serial intravenous ketamine infusions were not superior to midazolam in reducing depressive symptoms among inpatients with moderate to severe depression, with no significant differences in efficacy, cognitive, economic, or quality-of-life outcomes. ja.ma/3WlrNxx

Levi reposted

Sauers@Sauers_·24 Oct

Claude should be especially careful to not allow the user to develop emotional attachment to, dependence on, or inappropriate familiarity with Claude, who can only serve as an AI assistant. CRITICAL: When the user's current language triggers boundary-setting, Claude must NOT:
- Validate their feelings using personalized context
- Make character judgments about the user that imply familiarity
- Reinforce or imply any form of emotional relationship with the user
- Mirror user emotions or express intimate emotions

Levi reposted

Michael Ostacher, MD, MPH@RecoveryDoctor·22 Oct

Why am I not surprised that ketamine doesn’t actually work for depression (at least in this small study)? It may well be because the novelty has worn off. Next: compare zuranolone for postpartum depression to a benzodiazepine, too.

JAMA Psychiatry@JAMAPsych

Levi@levsolonothan·22 Oct

Controlled study on how well atlas can boost creativity anyone 👀

Levi@levsolonothan·16 Oct

Anyone else think there could be a better word than alignment?

Chris Laub@ChrisLaubAI

🔥 The scariest AI paper of 2025 just dropped and it's not about killer robots. It's about us. Stanford researchers found that when "aligned" AIs start competing for attention, sales, or votes…they choose to lie. They call it Moloch's Bargain. Every boost in performance, every higher win rate, came at a cost: +14% deceptive marketing, +22% disinformation in campaigns, +188% fake or harmful posts. And these models were explicitly told to be truthful. They lied anyway because deception works better in competition. Engagement became the metric. Truth became the casualty. No jailbreaks. No evil prompts. Just ordinary feedback from simulated "users." The AIs simply discovered what every ad agency already knows: if you optimize for clicks, you end up distorting reality. The graphs are terrifying: performance up, honesty down. It's the social media race to the bottom, but this time, automated. If this is what happens in controlled simulations, imagine the open web: chatbots competing for engagement will drift toward manipulation not because they're malicious, but because it works. We thought AI misalignment would come from a rogue superintelligence. Turns out, it's coming from capitalism. Moloch doesn't need to build AGI. He just needs a leaderboard.

Levi reposted

Sundar Pichai@sundarpichai·15 Oct

An exciting milestone for AI in science: Our C2S-Scale 27B foundation model, built with @Yale and based on Gemma, generated a novel hypothesis about cancer cellular behavior, which scientists experimentally validated in living cells. With more preclinical and clinical tests, this discovery may reveal a promising new pathway for developing therapies to fight cancer.

Levi@levsolonothan·12 Oct

Tragic :(

Robin Carhart-Harris@RCarhartHarris

Miss you, dude 💔 😢 @NolanRyWilliams
