Hillary Sanders

571 posts

@hillarymsanders

Machine-learner, meat-learner, research scientist, AI Safety thinker. Model trainer, skeptical adorer of statistics. Co-author of: Malware Data Science

Portland, OR · Joined January 2015
85 Following · 558 Followers
Hillary Sanders @hillarymsanders
I studied evolutionary biology theory a lot as a young person. Super interesting, though probably had some... unfortunate externalities relating to things like gender. But a positive consequence: I feel like it's a useful and interesting contrast to have when thinking about ML.
0
0
0
23
Hillary Sanders @hillarymsanders
@NeelNanda5 Great blog post. I lived with Nicholas Carlini in a co-op at Berkeley when we were studying there; totally lovely guy :).
0
0
1
99
Neel Nanda @NeelNanda5
I highly recommend this blog post from Nicholas Carlini on how to do great research:
10
59
1.1K
97.7K
Hillary Sanders @hillarymsanders
I'm so trained to be polite that I almost always thank my LLM when it's done a good job. Then I tell myself there's evidence that'll make it do an even better job so I don't feel so irrational. Then I end a conversation with a 'thank-you' and just shrug.
0
0
0
27
Hillary Sanders @hillarymsanders
@OpenAI Read your own contract; it doesn't. It allows all lawful purposes.
0
0
1
123
OpenAI @OpenAI
Our agreement with the Department of War upholds our redlines:
- No use of OpenAI technology for mass domestic surveillance.
- No use of OpenAI technology to direct autonomous weapons systems.
- No use of OpenAI technology for high-stakes automated decisions (e.g. systems such as “social credit”).
260
81
1.3K
343.8K
OpenAI @OpenAI
Yesterday we reached an agreement with the Department of War for deploying advanced AI systems in classified environments, which we requested they make available to all AI companies. We think our deployment has more guardrails than any previous agreement for classified AI deployments, including Anthropic's. Here's why: openai.com/index/our-agre…
1.9K
598
3.9K
2.6M
Hillary Sanders @hillarymsanders
@AForkInLife @OpenAI No it's not, read OpenAI's own press release fully. OpenAI is just relying on the DoD following their own laws. Nothing in their contract actually disallows any of their red lines if the government has made it legal.
0
0
2
138
misaligned vol @AForkInLife
@OpenAI So it’s actually true that OpenAI just puts the exact terms that Anthropic was blacklisted for in their contract. Dario really screwed up by trying to act like a god. The art of deal.
4
0
1
3.3K
Hillary Sanders @hillarymsanders
@jivtur @OpenAI The community note is accurate. Read their press release - the 'red lines' they talk about are not in the contract. The contract only specifies that the government follow applicable law.
0
0
0
195
jiv @jivtur
@OpenAI So is this Community Notes not accurate? Or is the Government Officials inaccurate? Or is OpenAI inaccurate?
jiv tweet media
4
0
69
7.3K
Hillary Sanders @hillarymsanders
@CEOAlexColon @OpenAI Their 'red lines' aren't in the contract. They're saying they trust the government not to use their AI in that way because current law prohibits it (under most circumstances). Their statement is disingenuous - read the actual contract quotes.
0
0
0
340
Alex Colon @CEOAlexColon
@OpenAI I appreciate the clarity. The only remaining issue, imo, is why couldn't Anthropic reach this kind of deal assuming this deal successfully does what it says it does. Could they not figure it out? Did Dario rub them the wrong way? I hope this gets answered for everyone's sake.
16
0
13
15.9K
Hillary Sanders @hillarymsanders
@OpenAI
> We think our agreement has more guardrails than any previous agreement for classified AI deployments, including Anthropic’s.
Read your own contract. The only real guardrails you're relying on is the current law. In this administration? Absolutely disingenuous statement.
0
0
1
28
Hillary Sanders @hillarymsanders
> We think our agreement has more guardrails than any previous agreement for classified AI deployments, including Anthropic’s.
@OpenAI That is so disingenuous. The only real guardrails you're relying on is the current law. In this administration? Absolutely. False.
0
0
1
65
Hillary Sanders @hillarymsanders
> The cloud deployment surface covered in our contract would not permit powering fully autonomous weapons, as this would require edge deployment.
@OpenAI Fully autonomous weapons don't require edge deployment! They just require an ability to murder without human intervention!
0
0
1
69
Brian Krassenstein @krassenstein
BREAKING: Zohran Mamdani is expected to require ALL New York Elementary school students to learn Arabic numerals. As a Jewish American I still support this 100%
Brian Krassenstein tweet media
33.1K
6.6K
108.2K
19.7M
Sen. Jeanne Shaheen @SenatorShaheen
I voted tonight to reopen the government and take action to protect health care for tens of millions of Americans. Here’s why:
Sen. Jeanne Shaheen tweet media
7.2K
279
3.4K
2.2M
Hillary Sanders @hillarymsanders
@NeelNanda5 You're literally my favorite person to listen to about interpretability & AI Safety - yay you!
0
0
1
96
Neel Nanda @NeelNanda5
It's my 27th birthday today! I do a lot of public facing work, and it's often hard to tell if any of it actually matters, or if I'm just talking into the void. If anything I've done has impacted you, it would really make my day to hear about it!
64
5
780
56K
Hillary Sanders @hillarymsanders
@AnthropicAI I guess it's kind of manual / annoying to make up a ton of these activation(A) - activation(B) = some concept vectors. Though this again makes me curious what happens when you just use activation(concept).
0
0
0
10
Hillary Sanders @hillarymsanders
@AnthropicAI Very cool graph. Nit: Why such large CIs? Seems like N is something you could scale up relllatively easily.
1
0
0
24
Anthropic @AnthropicAI
New Anthropic research: Signs of introspection in LLMs. Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them? We found evidence for genuine—though limited—introspective capabilities in Claude.
Anthropic tweet media
287
787
4.8K
1.2M
varun @varunneal
@AnthropicAI A bit confused about this chart. Looks like the helpful models perform strictly worse (as opposed to what the blog says)
1
0
0
180
Hillary Sanders @hillarymsanders
@AnthropicAI Interesting that these activations(A)-activations(B) vectors were used, not just the activations from the concept token[s]. Were the effects not the same when those simpler activations were used?
0
0
0
56
Anthropic @AnthropicAI
In one experiment, we asked the model to detect when a concept is injected into its “thoughts.” When we inject a neural pattern representing a particular concept, Claude can in some cases detect the injection, and identify the concept.
Anthropic tweet media
3
16
368
46.9K
Hillary Sanders @hillarymsanders
@AnthropicAI Reminds me of how our brains make up reasons for saying or doing things even when such a reason does not exist (split brain experiments).
0
0
0
21
Anthropic @AnthropicAI
We also show that Claude introspects in order to detect artificially prefilled outputs. Normally, Claude apologizes for such outputs. But if we retroactively inject a matching concept into its prior activations, we can fool Claude into thinking the output was intentional.
Anthropic tweet media
9
8
270
50.6K