Chi Nguyen

5 posts

@NguyenSquared

Joined April 2020
67 Following · 24 Followers
Chi Nguyen retweeted
Caspar Oesterheld @C_Oesterheld
How do LLMs reason about playing games against copies of themselves? 🪞 We made the first LLM decision theory benchmark to find out. 🧵 1/10
Replies: 2 · Reposts: 19 · Likes: 102 · Views: 10.9K
Chi Nguyen retweeted
METR @METR_Evals
How close are current AI agents to automating AI R&D? Our new ML research engineering benchmark (RE-Bench) addresses this question by directly comparing frontier models such as Claude 3.5 Sonnet and o1-preview with 50+ human experts on 7 challenging research engineering tasks.
Replies: 15 · Reposts: 171 · Likes: 833 · Views: 444.8K
Chi Nguyen @NguyenSquared
@FinishItPod After 6 weeks of listening, I've just finished episode 106 (yes, I listened to the first 9 episodes in one day). Sure hope you still do complis and concris because I want mine in 4 months... Also, the abominable snowman 1 isn't on Spotify.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 45
Marc Lipsitch @mlipsitch
#Covidtwitter Is there an analysis out there of probability of infection among known contacts of a case according to source age (besides RIVM) and according to contact age?
Replies: 14 · Reposts: 12 · Likes: 73 · Views: 0