Hunter Jay
@HunterJayPerson
1.3K posts

Engineer & entrepreneur, formerly w/ Ripe Robotics. Very concerned about unfriendly superintelligence in the next decade. https://t.co/2bSqUt5evy

Sydney, New South Wales · Joined June 2018
283 Following · 127 Followers

Pinned Tweet
Hunter Jay@HunterJayPerson·
Superintelligent AI is possible in the 2020s -- progress continues to outpace predictions, and the trends in benchmarks and compute are unwavering.
[image attached]
1 reply · 0 reposts · 1 like · 360 views
Hunter Jay@HunterJayPerson·
@MKinniment I'm working on a paper in this direction right now, should have a draft out within a week or so.
0 replies · 0 reposts · 1 like · 82 views
Hunter Jay retweeted
Lyptus Research@LyptusResearch·
We release a new application of the METR time-horizon methodology to offensive cybersecurity, grounded in a new human expert study with 10 professional security practitioners. Offensive cyber capability has been doubling every 9.8 months since 2019, accelerating to every 5.7 months on a 2024+ fit. Opus 4.6 and GPT-5.3 Codex sit well above both trendlines again, reaching 50% success on tasks that take human experts ~3 hours. Furthermore, our 2M-token evaluations materially understate current frontier capability. Recent progress has likely moved faster than these numbers suggest.
[image attached]
6 replies · 45 reposts · 226 likes · 45.4K views
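The doubling times above compound quickly. A minimal sketch (my own illustration, not Lyptus Research code) of how a fixed doubling time translates into task-horizon growth:

```python
def horizon_multiplier(months: float, doubling_months: float) -> float:
    """Factor by which the 50%-success task horizon grows over `months`,
    assuming a constant doubling time."""
    return 2 ** (months / doubling_months)

# At the reported 9.8-month doubling time, growth over four years:
print(round(horizon_multiplier(48, 9.8), 1))  # ~29.8x
# At the faster 5.7-month (2024+) fit:
print(round(horizon_multiplier(48, 5.7), 1))  # ~342.8x
```

The gap between the two fits is the whole story: a shift from a ~10-month to a ~6-month doubling time turns a ~30x gain over four years into a ~340x gain.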
Hunter Jay@HunterJayPerson·
@cauliflwr_human You reminded me of something John Cleese said about David Frost that was like this. Paraphrasing, "He was the opposite of paranoid. A pronoid." "He was just completely convinced everyone was out to help him." Seemed to work well enough for Frost!
0 replies · 0 reposts · 2 likes · 59 views
cauli (post caulicamp rest)@cauliflwr_human·
thinking about super-cooperators -- seems important. How do you identify super-cooperators with enough confidence to form useful networks between them? How do you cultivate super-cooperators' emotional wellbeing and agency? How do you protect super-cooperators from exploitation?
6 replies · 3 reposts · 26 likes · 1.1K views
Hunter Jay@HunterJayPerson·
Again I want to stress these are vibes, not a considered opinion. I expect to change my mind quickly once challenged with evidence my guesses are wrong.
0 replies · 0 reposts · 0 likes · 13 views
Hunter Jay@HunterJayPerson·
It also lets you give clear, recordable, updatable beliefs. So, spitballing:
Anthropic -- leader
Deepmind -- +2 years for equal safety
OpenAI -- +2.5 years
SSI -- +2.5 years
Deepseek -- +3 years
Zai -- +4 years
Xai -- +5 years
Alibaba -- +5 years
Meta -- +6 years
1 reply · 0 reposts · 1 like · 60 views
Hunter Jay@HunterJayPerson·
Regarding AGI race dynamics -- I wonder if there's an intuition pump for 'time vs competitor' preference? For example, to me, based on my current knowledge, I think Anthropic reaching RSI before the next best company (Deepmind, maybe?) is worth about two years of time.
1 reply · 0 reposts · 0 likes · 36 views
Hunter Jay retweeted
David@dnhkng·
11/n Key finding: duplicating a SINGLE middle layer almost never helps. Usually makes things worse. But duplicating a BLOCK of ~7 layers? Big boost. The middle layers aren't doing independent iterative refinement. They're circuits — multi-step recipes that work best as units.
[image attached]
3 replies · 3 reposts · 137 likes · 19.1K views
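The block-vs-single-layer idea can be made concrete with a toy helper (hypothetical, not from the thread) that splices a duplicated contiguous block into a layer stack:

```python
def duplicate_block(layers, start, length):
    """Return a new layer list with layers[start:start+length] repeated once,
    keeping the duplicated block contiguous (the multi-step 'recipe' intact)."""
    block = layers[start:start + length]
    return layers[:start + length] + block + layers[start + length:]

layers = [f"L{i}" for i in range(24)]   # a depth-24 stack
deeper = duplicate_block(layers, 9, 7)  # repeat a ~7-layer middle block
print(len(deeper))                      # 31
```

The finding above is that repeating the whole 7-layer block this way helps, while `length=1` (repeating one middle layer in isolation) usually hurts.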
Hunter Jay@HunterJayPerson·
Wrote up some thoughts on near-term AI self-improvement -- basically thinking through some possible ways current or near future AIs can contribute to the next generation of AIs, and guessing where we are on the S curve of that category.
1 reply · 0 reposts · 0 likes · 16 views
Hunter Jay@HunterJayPerson·
@HackingButLegal You want to clearly inform people by tricking them about what the General Counsel has said?
1 reply · 0 reposts · 4 likes · 172 views
Jackie Singh@HackingButLegal·
@HunterJayPerson No, thanks. That would cancel out my intent to clearly inform individuals as to the risk and illegality of Sec. Hegseth's actions. Who are you, again?
3 replies · 0 reposts · 2 likes · 739 views
Hunter Jay@HunterJayPerson·
@cauliflwr_human @chrislakin Expanded context windows with high-quality summaries (and regularly updating frontier models in the usual fashion) actually look almost identical, for practical purposes, to weights-update-based continual learning. Wrote about it recently!
2 replies · 0 reposts · 3 likes · 85 views
Nate Sharpe@nssharpe·
@deanwball @JoinFAI @Allinallnotbad Are you looking for just relevant/important public figure or just anyone willing to sign on in support? I’m happy to sign if it’s the latter, can’t imagine I qualify as the former 😅
1 reply · 0 reposts · 4 likes · 278 views
Dean W. Ball@deanwball·
If any people or organizations want to sign on to @JoinFAI’s amicus brief in support of Anthropic, please reach out to me or @Allinallnotbad. You better believe I will be signing.
10 replies · 35 reposts · 342 likes · 32K views
Hunter Jay retweeted
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I've done daily for two decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course -- you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
[image attached]
973 replies · 2.1K reposts · 19.4K likes · 3.6M views
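The propose → run → measure → keep loop Karpathy describes can be sketched as a greedy hill-climb (my own pseudocode-level illustration, not his actual harness):

```python
def autoresearch(baseline_loss, propose, run_experiment, rounds=700):
    """Greedy loop: propose a change based on history, run it, and keep it
    only if it lowers the best validation loss seen so far."""
    best_loss, accepted, history = baseline_loss, [], []
    for _ in range(rounds):
        change = propose(history)       # agent plans from past results
        loss = run_experiment(change)   # e.g. a short small-scale training run
        history.append((change, loss))
        if loss < best_loss:            # keep only changes that help
            best_loss = loss
            accepted.append(change)
    return best_loss, accepted
```

In the real setting, `run_experiment` is a small-scale training run and `propose` is an LLM agent reading the experiment history; promoting accepted changes to increasingly larger scales is the "just engineering" part.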
Hunter Jay@HunterJayPerson·
@alexalbert__ Will it be rolled out to Max users soon? Looks like Teams / Enterprise only rn, unless I'm missing something?
0 replies · 0 reposts · 0 likes · 92 views