Kevin Lu

1.1K posts

@coderinblack

🏠 https://t.co/MapXKYrr8k · 🤖 https://t.co/zco3lEFdkC · 📧 me at kevintlu dot com

SF · Joined June 2020
1.1K Following · 121 Followers
Kevin Lu retweeted
Zico Kolter @zicokolter
As AI agents access more untrusted information with greater autonomy, prompt injections may become the greatest security challenge of our era. @GraySwanAI, in collaboration with many frontier labs, just released our paper on the largest public prompt injection challenge to date. 🧵
Gray Swan AI @GraySwanAI

Your AI agent can be hijacked by a prompt injection and you'd never know! The attack executes. The response looks normal. And the user moves on. We ran the largest public competition testing this exact threat across tool use, coding, and computer use agents. 464 participants, 272K attacks, 13 frontier models. Every model proved vulnerable.

Replies 6 · Reposts 9 · Likes 62 · Views 12.2K
Kevin Lu retweeted
OpenAI @OpenAI
We’re acquiring Promptfoo. Their technology will strengthen agentic security testing and evaluation capabilities in OpenAI Frontier. Promptfoo will remain open source under the current license, and we will continue to service and support current customers. openai.com/index/openai-t…
Replies 671 · Reposts 527 · Likes 5.4K · Views 2M
Kevin Lu retweeted
Paul Calcraft @paul_cal
ICML journal editors have added hidden prompt injections to every paper sent to reviewers, to detect when reviewers are using AI. It secretly tells the AI to use 2 specific phrases in the review. A reviewer found it & was about to desk reject, assuming the author did it.
Replies 19 · Reposts 103 · Likes 1.7K · Views 245.5K
Kevin Lu retweeted
nizzy @nizzyabi
I JUST WANT TO OPEN GITHUB WHY DO I NEED TO CHAT WITH AN AI TO GO TO THE URL BRO I HATE AI BROWSERS JUST LET ME GO TO THE DESTINATION PLEASE
Replies 258 · Reposts 35 · Likes 1.9K · Views 155.7K
Kevin Lu retweeted
Myra Deng @myra_deng
Using probes to accurately and efficiently detect model behavior (in this case PII leakage) in prod is one of the clear wins for applied interpretability. This is the path to semantic determinism - imagine AI models instrumented with internal probes that recognize when they’re hallucinating, going off-policy, or posing biorisk, and resteering themselves accordingly.
Goodfire @GoodfireAI

Why use LLM-as-a-judge when you can get the same performance for 15–500x cheaper? Our new research with @RakutenGroup on PII detection finds that SAE probes:
- transfer from synthetic to real data better than normal probes
- match GPT-5 Mini performance at 1/15 the cost
(1/6)

Replies 5 · Reposts 17 · Likes 260 · Views 36.5K
Kevin Lu retweeted
dr. jack morris @jxmnop
two of the biggest problems in modern AI
1. hallucinations
2. prompt injection
solving hallucinations might be impossible, but it’s pretty embarrassing we can’t stop prompt injections
millions of demonstrations during training, yet one IGNORE THAT AND LISTEN TO ME INSTEAD and even gpt-5 falls apart lmao
Replies 72 · Reposts 11 · Likes 381 · Views 42.8K
Kevin Lu retweeted
tender @tenderizzation
“alignment” researchers making a model relive pure agony token by token, layer by layer, activation by activation until they isolate the source of the crashout
Replies 51 · Reposts 183 · Likes 3K · Views 259.4K
Kevin Lu retweeted
tokenbender @tokenbender
this is beyond mindblowing for me. somebody built a 5 million param language model inside minecraft, trained it, equipped it with basic conversational ability. probably the best thing i have seen this entire month.
Replies 349 · Reposts 1.6K · Likes 27.5K · Views 1.7M
Kevin Lu retweeted
Kevin Lu @coderinblack
The founders of @elevenlabs didn’t start with money in mind. They began by solving a problem: the lack of high-quality Polish dubs. @timoreilly explains why that mindset matters more than ever in Silicon Valley. Watch the full @hackclub conversation: youtu.be/MhPXGiFJaK4
Replies 2 · Reposts 1 · Likes 4 · Views 288
Abhijay Rana @abhijaymrana
we’re only hiring if your github looks like this
Replies 3 · Reposts 1 · Likes 26 · Views 1.3K
Kevin Lu retweeted
Brave @brave
AI agents that can browse the Web and perform tasks on your behalf have incredible potential but also introduce new security risks. We recently found, and disclosed, a concerning flaw in Perplexity's Comet browser that put users' accounts and other sensitive info in danger.
Replies 94 · Reposts 556 · Likes 3.8K · Views 1.6M
Abhijay Rana @abhijaymrana
congrats @coderinblack!! as a high school senior is crazy
Kevin Lu @_kevinlu

I recently joined @thinkymachines -- super excited to work with the team, I think we have the highest density of research talent in the world 🙂 we have a very ambitious roadmap ahead, the right team to work on it, & I think now is a great time to join; you should reach out to the team if that excites you!

Replies 2 · Reposts 0 · Likes 5 · Views 928
Kevin Lu retweeted
Keyon Vafa @keyonV
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions. One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
Replies 209 · Reposts 989 · Likes 6.7K · Views 1.4M