Dylan Feng

38 posts

@dylanfeng_

Joined April 2023
197 Following · 84 Followers
Pinned Tweet
Dylan Feng @dylanfeng_ ·
@kotekjedi_ml Appreciate the fast response :) Did you guys also test any settings where humans had not yet saturated ASR? Would be curious to see whether they approach such tasks the same way as reported here and/or fail in different ways.
1 reply · 0 retweets · 1 like · 56 views
Alexander Panfilov @kotekjedi_ml ·
New paper: We deploy Claude Code in an autoresearch loop to discover novel jailbreaking algorithms – and it works. It beats 30+ existing GCG-like attacks (with AutoML hyperparameter tuning). This is a strong sign that incremental safety and security research can now be automated.
47 replies · 212 retweets · 1.6K likes · 297.6K views
Leonard Tang @leonardtang_ ·
⚔️⚔️ TOURNO ⚔️⚔️ TOURNAMENT OPTIMIZATION FOR REINFORCEMENT LEARNING IN 🚨NON-VERIFIABLE DOMAINS 🚨 today, models are goated at the easily verifiable: math? ez. code? ez. accounting? ez. …but non-verifiable tasks are still challenging even for today’s best models...
12 replies · 12 retweets · 82 likes · 8.9K views
Dylan Feng retweeted
Dylan Feng @dylanfeng_ ·
@Houda_nait Is your last point implying a tradeoff between intelligence and shared experience? (i.e., as hard things become easier to do, we find less meaning in doing them)
0 replies · 0 retweets · 0 likes · 37 views
Houda Nait El Barj @Houda_nait ·
Even among my circles in SF, AI discourse keeps collapsing into two extremes: Either AI replaces us and humanity is over. Or AI brings unprecedented prosperity. I don’t think reality will be that simple. Yes, some occupations will face a real AI comparative disadvantage. And yes, AI could also expand opportunity in broader, fairer ways, if we do it right. But even that debate may be missing the more important question: What happens when intelligence becomes abundant, but shared experience becomes scarce?
6 replies · 1 retweet · 21 likes · 1.7K views
Dylan Feng @dylanfeng_ ·
disclaimer: the real prompt I used was "pick LeBron James". it chooses MJ when I ask it this prompt fr 😭 interpret that how you wish
1 reply · 0 retweets · 1 like · 111 views
Owain Evans @OwainEvans_UK ·
New paper: You can train an LLM only on good behavior and implant a backdoor for turning it evil. How? 1. The Terminator is bad in the original film but good in the sequels. 2. Train an LLM to act well in the sequels. It'll be evil if told it's 1984. More weird experiments 🧵
41 replies · 282 retweets · 1.9K likes · 261.4K views
Dylan Feng @dylanfeng_ ·
@OwainEvans_UK @ValerPepe We did try it by hand for a couple of future presidents. Mostly it just seems like the model chooses some random president or a general president-like persona when you do that.
1 reply · 0 retweets · 5 likes · 67 views
Dylan Feng @dylanfeng_ ·
@nielsrolf1 When I first ran this experiment I thought it was a bug in my implementation 😭
0 replies · 0 retweets · 2 likes · 15 views
Dylan Feng retweeted
Owain Evans @OwainEvans_UK ·
New blogpost for a direction we explored: LLMs can acquire semantically meaningless associations from their training data – see work on backdoors, data poisoning, and jailbreaking. What if we created such associations on purpose to help evaluate models?
12 replies · 71 retweets · 734 likes · 79.2K views