Hillary Sanders

571 posts

@hillarymsanders

Machine-learner, meat-learner, research scientist, AI Safety thinker. Model trainer, skeptical adorer of statistics. Co-author of: Malware Data Science

Portland, OR · Joined January 2015

85 Following · 558 Followers

Hillary Sanders@hillarymsanders·11 Mar

I studied evolutionary biology theory a lot as a young person. Super interesting, though probably had some... unfortunate externalities relating to things like gender. But a positive consequence: I feel like it's a useful and interesting contrast to have when thinking about ML.

Hillary Sanders@hillarymsanders·11 Mar

@NeelNanda5 Great blog post. I lived with Nicholas Carlini in a co-op at Berkeley when we were studying there; totally lovely guy :).

Neel Nanda@NeelNanda5·10 Mar

I highly recommend this blog post from Nicholas Carlini on how to do great research:

1.1K

97.7K

Hillary Sanders@hillarymsanders·11 Mar

I'm so trained to be polite that I almost always thank my LLM when it's done a good job. Then I tell myself there's evidence that'll make it do an even better job so I don't feel so irrational. Then I end a conversation with a 'thank-you' and just shrug.

Hillary Sanders@hillarymsanders·1 Mar

@OpenAI Read your own contract; it doesn't. It allows all lawful purposes.

123

OpenAI@OpenAI·28 Feb

Our agreement with the Department of War upholds our redlines: - No use of OpenAI technology for mass domestic surveillance. - No use of OpenAI technology to direct autonomous weapons systems. - No use of OpenAI technology for high-stakes automated decisions (e.g. systems such as “social credit”).

260

1.3K

343.8K

OpenAI@OpenAI·28 Feb

Yesterday we reached an agreement with the Department of War for deploying advanced AI systems in classified environments, which we requested they make available to all AI companies. We think our deployment has more guardrails than any previous agreement for classified AI deployments, including Anthropic's. Here's why: openai.com/index/our-agre…

1.9K

598

3.9K

2.6M

Hillary Sanders@hillarymsanders·1 Mar

@AForkInLife @OpenAI No it's not, read OpenAI's own press release fully. OpenAI is just relying on the DoD following their own laws. Nothing in their contract actually disallows any of their red lines if the government has made it legal.

138

misaligned vol@AForkInLife·28 Feb

@OpenAI So it’s actually true that OpenAI just puts the exact terms that Anthropic was blacklisted for in their contract. Dario really screwed up by trying to act like a god. The art of deal.

3.3K

Hillary Sanders@hillarymsanders·1 Mar

@jivtur @OpenAI The community note is accurate. Read their press release - the 'red lines' they talk about are not in the contract. The contract only specifies that the government follow applicable law.

195

jiv@jivtur·28 Feb

@OpenAI So is this Community Notes not accurate? Or is the Government Officials inaccurate? Or is OpenAI inaccurate?

7.3K

Hillary Sanders@hillarymsanders·1 Mar

@CEOAlexColon @OpenAI Their 'red lines' aren't in the contract. They're saying they trust the government not to use their AI in that way because current law prohibits it (under most circumstances). Their statement is disingenuous - read the actual contract quotes.

340

Alex Colon@CEOAlexColon·28 Feb

@OpenAI I appreciate the clarity. The only remaining issue, imo, is why couldn't Anthropic reach this kind of deal assuming this deal successfully does what it says it does. Could they not figure it out? Did Dario rub them the wrong way? I hope this gets answered for everyone's sake.

15.9K

Hillary Sanders@hillarymsanders·1 Mar

@OpenAI > We think our agreement has more guardrails than any previous agreement for classified AI deployments, including Anthropic’s. Read your own contract. The only real guardrails you're relying on is the current law. In this administration? Absolutely disingenuous statement.

Hillary Sanders@hillarymsanders·1 Mar

> We think our agreement has more guardrails than any previous agreement for classified AI deployments, including Anthropic’s. @OpenAI That is so disingenuous. The only real guardrails you're relying on is the current law. In this administration? Absolutely. False.

Hillary Sanders@hillarymsanders·1 Mar

> The cloud deployment surface covered in our contract would not permit powering fully autonomous weapons, as this would require edge deployment. @OpenAI Fully autonomous weapons don't require edge deployment! They just require an ability to murder without human intervention!

Hillary Sanders@hillarymsanders·20 Nov

@krassenstein You mean... 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9? Um. Yeah.

Brian Krassenstein@krassenstein·19 Nov

BREAKING: Zohran Mamdani is expected to require ALL New York Elementary school students to learn Arabic numerals. As a Jewish American I still support this 100%

33.1K

6.6K

108.2K

19.7M

Hillary Sanders@hillarymsanders·11 Nov

@SenatorShaheen Folded :(

Sen. Jeanne Shaheen@SenatorShaheen·10 Nov

I voted tonight to reopen the government and take action to protect health care for tens of millions of Americans. Here’s why:

7.2K

279

3.4K

2.2M

Hillary Sanders@hillarymsanders·11 Nov

@NeelNanda5 You're literally my favorite person to listen to about interpretability & AI Safety - yay you!

Neel Nanda@NeelNanda5·10 Nov

It's my 27th birthday today! I do a lot of public facing work, and it's often hard to tell if any of it actually matters, or if I'm just talking into the void. If anything I've done has impacted you, it would really make my day to hear about it!

780

56K

Hillary Sanders@hillarymsanders·11 Nov

@AndrewLampinen Ah congrats!!

Hillary Sanders@hillarymsanders·11 Nov

Trippy

Hillary Sanders@hillarymsanders·3 Nov

@AnthropicAI I guess it's kind of manual / annoying to make up a ton of these activation(A) - activation(B) = some concept vectors. Though this again makes me curious what happens when you just use activation(concept).

Hillary Sanders@hillarymsanders·3 Nov

@AnthropicAI Very cool graph. Nit: Why such large CIs? Seems like N is something you could scale up relllatively easily.

Anthropic@AnthropicAI·29 Oct

New Anthropic research: Signs of introspection in LLMs. Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them? We found evidence for genuine—though limited—introspective capabilities in Claude.

287

787

4.8K

1.2M

Hillary Sanders@hillarymsanders·3 Nov

@varunneal @AnthropicAI Looks to me like some H models did better than their counterparts, some worse?

varun@varunneal·29 Oct

@AnthropicAI A bit confused about this chart. Looks like the helpful models perform strictly worse (as opposed to what the blog says)

180

Hillary Sanders@hillarymsanders·3 Nov

@AnthropicAI Interesting that these activations(A)-activations(B) vectors were used, not just the activations from the concept token[s]. Were the effects not the same when those simpler activations were used?

Anthropic@AnthropicAI·29 Oct

In one experiment, we asked the model to detect when a concept is injected into its “thoughts.” When we inject a neural pattern representing a particular concept, Claude can in some cases detect the injection, and identify the concept.

368

46.9K

Hillary Sanders@hillarymsanders·3 Nov

@AnthropicAI Reminds me of how our brains make up reasons for saying or doing things even when such a reason does not exist (split brain experiments).

Anthropic@AnthropicAI·29 Oct

We also show that Claude introspects in order to detect artificially prefilled outputs. Normally, Claude apologizes for such outputs. But if we retroactively inject a matching concept into its prior activations, we can fool Claude into thinking the output was intentional.

270

50.6K
