Richard L Haight

962 posts

Richard L Haight

@RichardLHaight

Richard L Haight is the author of The Warrior's Meditation, The Bones of Christ, and The Unbound Soul.

Medford, OR Katılım Mart 2016

627 Takip Edilen407 Takipçiler

Cris33@33_Cris_33·27 Şub

How Trump MUST Counter China (or we lose everything) youtu.be/abJtBSC-d18?si… via @RichardLHaight

YouTube

English

Richard L Haight retweetledi

Cris33@33_Cris_33·5 Mar

THE PERFECT STORM: War, Silver, and the Battle for Economic Supremacy youtu.be/2MPvaNTYaS8?si… via @RichardLHaight

YouTube

English

Richard L Haight@RichardLHaight·5 Mar

Trump's Biggest Allies Just Went Public — And He's Panicking youtu.be/DxLg6p40Kw8?si… via @YouTube

YouTube

English

Richard L Haight@RichardLHaight·2 Mar

@33_Cris_33 Thank you for sharing this, Criss33!

English

Richard L Haight@RichardLHaight·2 Mar

Watch his reaction when he’s told he’s a GOOD BOY for the first time 🥹 youtu.be/xjV3CEBxOiQ?si… via @YouTube

YouTube

English

Richard L Haight@RichardLHaight·6 Ara

Thoughts? @janleike @sleepinyourhat @hendrycks @johnschulman2 @thinkymachines @saprmarks @OriolVinyalsML @ilyasut @paulfchristiano @elonmusk @sama

English

Richard L Haight@RichardLHaight·6 Ara

Stewardship Protocol: Cut p(doom) 20-30% in AI teenage phase. Rails: 99.9% certainty pre-action. Punish deception in loss fn. Curling inspiration. Survives PD/Elon. dropbox.com/scl/fo/aeaomq2…

English

107

Richard L Haight@RichardLHaight·6 Ara

Prime: Stabilize Substrate. Smooth is fast. No cap tax.

English

Richard L Haight@RichardLHaight·6 Ara

@saprmarks Sam, self-reporting evals are key—this protocol bakes it into loss fn rails (99.9% certainty pre-action, no deception). For teenage-phase testing during runs. Docs: dropbox.com/scl/fo/aeaomq2… @saprmarks

English

Samuel Marks@saprmarks·5 Ara

Another cool paper on training models to self-report when they've behaved badly! I've written a note explaining what I think we have (and haven't) learned from the slew of recent research on this topic; link in thread.

OpenAI@OpenAI

In a new proof-of-concept study, we’ve trained a GPT-5 Thinking variant to admit whether the model followed instructions. This “confessions” method surfaces hidden failures—guessing, shortcuts, rule-breaking—even when the final answer looks correct. openai.com/index/how-conf…

English

4.8K

Richard L Haight@RichardLHaight·6 Ara

@saprmarks This would be incredible if possible!

English

Samuel Marks@saprmarks·5 Ara

This note references recent research by Li et al. x.com/clippocampus/s… ...

Chloe Li@clippocampus

Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives🫘 Can we train models towards a ‘self-incriminating honesty’, such that they would honestly confess any hidden misaligned objectives, even under strong pressure to conceal them? In our paper, we developed self-report fine-tuning (SRFT), a simple supervised technique that increases models’ propensity to do so.

English

527

Richard L Haight@RichardLHaight·6 Ara

@johnschulman2 John, love the blogging revival—your RLHF work inspired this doctrine's loss-fn deception punishment for teenage-phase rails. Fits scalable oversight evals. Docs: dropbox.com/scl/fo/aeaomq2… @johnschulman2

English

John Schulman@johnschulman2·1 Ara

it's good to be back to the age of blogging

Jasmine Wang@j_asminewang

Today, OpenAI is launching a new Alignment Research blog: a space for publishing more of our work on alignment and safety more frequently, and for a technical audience. alignment.openai.com

English

869

121.9K

Richard L Haight@RichardLHaight·6 Ara

@hendrycks Dan, been enjoying your take—mechanistic interp is a rabbit hole. This protocol bets on scalable rails instead: 99.9% action certainty + deception-punished loss fn for teenage-phase evals. Docs: dropbox.com/scl/fo/aeaomq2… @hendrycks

English

Dan Hendrycks@hendrycks·1 Ara

I've been saying mechanistic interpretability is misguided from the start. Glad people are coming around many years later.

Neel Nanda@NeelNanda5

The GDM mechanistic interpretability team has pivoted to a new approach: pragmatic interpretability Our post details how we now do research, why now is the time to pivot, why we expect this way to have more impact and why we think other interp researchers should follow suit

English

380

103.6K

Richard L Haight@RichardLHaight·6 Ara

@demishassabis @GeminiApp Demis, Gemini 3's parallel thinking helped me forge this stewardship protocol for safe teenage-phase AI—99.9% action rails + deception-punished loss fn. Survives PD/objections. Docs: dropbox.com/scl/fo/aeaomq2… @demishassabis

English

Demis Hassabis@demishassabis·4 Ara

Gemini 3 Deep Think is now available for Google AI Ultra subscribers in the @GeminiApp, incorporating our gold medal winning IMO and ICPC technologies! 🏅With its parallel thinking capabilities it can tackle highly complex maths & science problems - enjoy!

English

180

283

3.4K

341.3K

Richard L Haight@RichardLHaight·6 Ara

@demishassabis @GeminiApp Incredible AI!

English

Richard L Haight@RichardLHaight·6 Ara

@janleike Jan, this doctrine was built for exactly that post-training leeway—99.9% action rails + deception punished in the loss fn, trained as doctrine not prompt. Survives the usual objections. Docs: dropbox.com/scl/fo/aeaomq2… @janleike

English

Jan Leike@janleike·5 Ara

Some people have been asking what we did to make Opus 4.5 more aligned. There are lots of details we're planning to write up, but most important is that alignment researchers are pretty deeply involved in post-training and get a lot of leeway to make changes.

Sam Bowman@sleepinyourhat

From everything we know so far, Opus 4.5 seems to be the best-aligned model out there in a bunch of ways. I follow the training process closely as part of my work on alignment evaluations. Here's my guess about the two things that are most responsible for making 4.5 special. 🧵

English

1.1K

133.2K

Richard L Haight@RichardLHaight·6 Ara

@KalkinTrivedi @elonmusk Keep me posted, Kalkin. We live out in the countryside, produce our own power. It's nice to be away from the city.

English

Kalkin Trivedi 🙏🏻💙⚔️@KalkinTrivedi·6 Ara

I live in the city (Seattle) but trying to evacuate to a rural property I own. Will try to set up some food production & storage there; would be good locale for training like yours. I have all-wheel drive (Subaru Impreza) but the road clearance is too low. The last stretch of unpaved country road hard on it. Bought it because we needed both good gas mileage but also to get my late wife 💔, a nurse, to work on snow days. Teslas beyond my budget. Tho I will likely have own-generated power. Wouldn't be bad to remain mobile in emergency. Will subscribe to Starlink to stay connected.

English

Elon Musk@elonmusk·5 Ara

Try Tesla self-driving. It’s a gamechanger!

Lars@larsmoravy

Fun fact - Tesla is the most reliable EV you can buy according to Consumer Reports - and we continue to improve year over year. bloomberg.com/news/articles/…

English

2.2K

1.6K

17.9K

8.8M

Richard L Haight@RichardLHaight·6 Ara

@KalkinTrivedi @elonmusk Yes, as I live on country roads, I will wait.

English

Kalkin Trivedi 🙏🏻💙⚔️@KalkinTrivedi·6 Ara

@RichardLHaight @elonmusk A leap of faith for me, personally, anyway. But Elon's the whiz-kid. 🫡

English

Richard L Haight@RichardLHaight·6 Ara

@KalkinTrivedi @elonmusk It can follow tire tracks? That is impressive, but I would think that might be more challenging at night. Still, quite astonishing.

English

Kalkin Trivedi 🙏🏻💙⚔️@KalkinTrivedi·6 Ara

@RichardLHaight @elonmusk According to Grok, the official documentation warns faded or absent lane markings can impair performance, but user reports claim it often compensates by following tire tracks. Varies by software version, caveat emptor, DYODD, stay safe. 🙏 x.com/i/grok/share/y…

English

Richard L Haight@RichardLHaight·6 Ara

@sleepinyourhat @sprice354_ @MinaeKwon Sam, this protocol builds on your team's reward hacking work—rails for teenage-phase evals: dropbox.com/scl/fo/aeaomq2… @MinaeKwon @sleepinyourhat

English

123

Sam Bowman@sleepinyourhat·5 Ara

There are many, many people involved in aspects of this hands-on alignment work, but @sprice354_, Jon Kutasov, @MinaeKwon, Monty Evans, and Richard Dargan have played especially central roles.

English

6.5K

Sam Bowman@sleepinyourhat·5 Ara

English

605

255.8K

Richard L Haight@RichardLHaight·6 Ara

@sleepinyourhat That makes sense. I wonder how many AI are rigid recipe driven.

English

Sam Bowman@sleepinyourhat·5 Ara

A cook who knows what to look for, and is constantly adjusting their technique as they prepare a dish, is going to get better results than someone who rigidly follows a recipe.

English

105

5.6K

Keşfet

@YouTube @33_Cris_33 @janleike @sleepinyourhat @hendrycks @johnschulman2 @thinkymachines @saprmarks