JunShern

200 posts

@junshernchan

Trying to make AI go well @AnthropicAI. Previously @OpenAI, @CHAI_Berkeley, @nyuniversity, autonomous vehicles @motionaldrive. 🇲🇾

Joined March 2018
1.4K Following, 451 Followers
JunShern retweeted
Alex Albert (@alexalbert__)
With the help of Claude Mythos Preview, the Firefox team fixed more security bugs in April than in the past 15 months combined.
[Image]
345
1.3K
15.5K
1.5M
JunShern retweeted
Nick (@nickcammarata)
at the intersection of i've basically been replaced and i've never worked so hard in my life
26
118
2K
76.9K
JunShern retweeted
Logan Graham (@logangraham)
Back in ~November, our team picked a stretch goal of seeing if we could find and fix vulnerabilities in Firefox with Opus 4.6. In 2 weeks, we found 22, and ~1/5th of all high severity CVEs in a year. For our team, this feels like a rubicon moment.
[Image]
17
51
356
33.5K
JunShern retweeted
Thomas H. Ptacek (@tqbf)
Nicholas Carlini at [un]prompted. If you know Carlini, you know this is a startling claim.
[Image]
19
144
1.3K
196.3K
JunShern retweeted
Rudolf Laine (@LRudL_)
The increasingly-hyperbolic METR graph is actually good news for safety. We just have to survive a brief singularity in March, and then afterwards the models will never be able to do more than undo a few hours' worth of work
[Image]
26
76
1.7K
69.5K
JunShern retweeted
John Yang (@jyangballin)
Across all mini-SWE-agent + <model> runs, SWE-bench Verified's current "ceiling"?
- 87.4%
(0.874 - 0.8) * 500 = another *37* instances that aren't solved consistently.
If you recalculate this number across all official SWE-bench Verified submissions?
- 95% from SWE-bench site
[Image]
6
9
49
21.5K
JunShern retweeted
Anthropic (@AnthropicAI)
We’re publishing a new constitution for Claude. The constitution is a detailed description of our vision for Claude’s behavior and values. It’s written primarily for Claude, and used directly in our training process. anthropic.com/news/claude-ne…
519
973
7.8K
3.4M
JunShern retweeted
Anthropic (@AnthropicAI)
New on our Frontier Red Team blog: We tested whether AIs can exploit blockchain smart contracts. In simulated testing, AI agents found $4.6M in exploits. The research (with @MATSprogram and the Anthropic Fellows program) also developed a new benchmark: red.anthropic.com/2025/smart-con…
348
701
4.8K
2.1M
JunShern retweeted
Anthropic (@AnthropicAI)
Remarkably, prompts that gave the model permission to reward hack stopped the broader misalignment. This is “inoculation prompting”: framing reward hacking as acceptable prevents the model from making a link between reward hacking and misalignment—and stops the generalization.
[Image]
37
136
1.5K
461.2K
JunShern retweeted
Jascha Sohl-Dickstein (@jaschasd)
Title: Advice for a young investigator in the first and last days of the Anthropocene
Abstract: Within just a few years, it is likely that we will create AI systems that outperform the best humans on all intellectual tasks. This will have implications for your research and career! I will give practical advice, and concrete criteria to consider, when choosing research projects, and making professional decisions, in these last few years before AGI.
This is my current go-to academic talk. It's mostly targeted at early career scientists. It gets diverse and strong reactions. Let's try it here. Posting slides with speaker notes...
--
The title is a play on a very opinionated and pragmatic book by the Nobel Prize winner Ramón y Cajal, who is one of the founders of modern neuroscience.
To get you in the right mindset, on the right we have a plot of GDP vs time. That is you, standing precariously on the top of that curve. You are thinking to yourself -- I live in a pretty normal world. Some things are going to change, but the future is going to look mostly like a linear extrapolation of the present. And the plot should suggest that this may not be the right perspective on the future. This plot by the way looks surprisingly similar even if you plot it on a log scale. We didn't stabilize on our current rate of growth until around 1950.
[Image]
58
271
1.8K
344.1K
JunShern retweeted
Ethan Perez (@EthanJPerez)
We’re hiring someone to run the Anthropic Fellows Program! Our research collaborations have led to some of our best safety research and hires. We’re looking for an exceptional ops generalist, TPM, or research/eng manager to help us significantly scale and improve our collabs 🧵
10
42
256
69.5K
JunShern retweeted
John Coogan (@johncoogan)
Ender, that wasn’t an RL environment with a verifiable reward, those were real Amazon orders you placed.
31
109
1.9K
116K
JunShern retweeted
Logan Graham (@logangraham)
Launching now — a new blog for research from @AnthropicAI’s Frontier Red Team and others. > red.anthropic.com We’ll be covering our internal research on cyber, bio, autonomy, national security and more.
[Image]
26
120
939
110.9K
JunShern retweeted
Psyho (@FakePsyho)
Humanity has prevailed (for now!) I'm completely exhausted. I figured, I had 10h of sleep in the last 3 days and I'm barely alive. I'll post more about the contest when I get some rest. (To be clear, those are provisional results, but my lead should be big enough)
[Image]
551
1.1K
13.2K
2.2M
JunShern retweeted
Mikita Balesni 🇺🇦 (@balesni)
A simple AGI safety technique: AI’s thoughts are in plain English, just read them
We know it works, with OK (not perfect) transparency!
The risk is fragility: RL training, new architectures, etc threaten transparency
Experts from many orgs agree we should try to preserve it: 🧵
[Image]
42
111
457
236.2K
JunShern retweeted
Transluce (@TransluceAI)
To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses. We're introducing Docent to accelerate analysis of AI agent transcripts. It can spot surprises in seconds. 🧵👇
10
64
337
196.7K