Emily Capstick

142 posts


@EmCapstick

@OpenAI. Personal account. Any views do not represent those of my employer.

San Francisco, CA · Joined September 2013
1.1K Following · 313 Followers
Emily Capstick reposted
Millie Marconi @MillieMarconnni
Holy shit...Stanford just built a system that converts research papers into working AI agents. It’s called Paper2Agent, and it literally: • Recreates the method in the paper • Applies it to your own dataset • Answers questions like the author This changes how we do science forever. Let me explain ↓
91 replies · 824 reposts · 4.2K likes · 299.1K views
Emily Capstick reposted
Neel Nanda @NeelNanda5
After supervising 20+ papers, I have highly opinionated views on writing great ML papers. When I entered the field I found this all frustratingly opaque So I wrote a guide on turning research into high-quality papers with scientific integrity! Hopefully still useful for NeurIPS
25 replies · 277 reposts · 2.6K likes · 336.8K views
Emily Capstick reposted
Reid Hoffman @reidhoffman
1/ A recent Stanford study led by @erikbryn found that entry-level jobs for 22-25 year-olds in fields most exposed to AI have dropped 16%. Some reactions to the data, and why I believe we need to design a new on-ramp to work in the AI era:
45 replies · 132 reposts · 762 likes · 139.5K views
Emily Capstick reposted
Nicholas Decker @captgouda24
This is the job market paper of the year, and the best paper on industrial policy I have ever seen. Industrial policy can affect outcomes either directly by changing an area’s fundamentals, or by coordinating simultaneous investment. How important is each? Let’s find out. 1/
11 replies · 135 reposts · 890 likes · 82.8K views
Emily Capstick reposted
Dan McAteer @daniel_mac8
GPT-5 coding cheat sheet from @OpenAIDevs
48 replies · 371 reposts · 3.6K likes · 555.5K views
Emily Capstick @EmCapstick
Great paper! 🚀 I do continue to wonder, no matter how rigorous the benchmarking process, whether we ought to ever claim to have representatively summarised an 'average' human's ability to be anything as subjective/intangible/fluid as: fair/trustworthy, compassionate...
Kevin Wei @kevinlwei

🚨 New paper alert! 🚨 Are human baselines rigorous enough to support claims about "superhuman" performance? Spoiler alert: often not! @prpaskov and I will be presenting our spotlight paper at ICML next week on the state of human baselines + how to improve them!

0 replies · 0 reposts · 1 like · 118 views
Emily Capstick reposted
Yoshua Bengio @Yoshua_Bengio
The Code of Practice is out. I co-wrote the Safety & Security Chapter, which is an implementation tool to help frontier AI companies comply with the EU AI Act in a lean but effective way. I am proud of the result! 1/3
8 replies · 31 reposts · 105 likes · 7.7K views
Emily Capstick reposted
Will Knight @willknight
New on @WIRED: A novel type of distributed mixture-of-experts model from Ai2 (called FlexOlmo) allows data to be contributed to a frontier model confidentially, and even revoked after the model is built: wired.com/story/flexolmo…
3 replies · 11 reposts · 38 likes · 30.9K views
Emily Capstick reposted
swyx 🇸🇬 @swyx
whoa so @thinkymachines is doing model merging + customized RL quite a come-up for merging in the past couple weeks, with @arcee_ai mergekit also featuring heavily in AFM. credit due to @jeremyphoward for being the first to make me take modelmerging seriously
26 replies · 49 reposts · 775 likes · 145K views
Emily Capstick reposted
Dawn Song @dawnsongtweets
1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects 💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖 Autonomously. A pivotal shift is underway — AI agents can now autonomously do what only elite human hackers could before.
28 replies · 149 reposts · 544 likes · 136.9K views
Emily Capstick reposted
Scott Singer (宋杰) @Scott_R_Singer
Over the last year, those of us who follow China's AI governance have been carefully watching whether China would establish an AI Safety Institute (AISI) to match those in the UK, US, and globally. That institution has now emerged, and it tells us a lot about the state of debate on frontier AI risks in China. Some takeaways from our @CarnegieEndow paper with rockstar co-authors @kelmgren and @OliverEGuest
10 replies · 105 reposts · 450 likes · 72.1K views
Emily Capstick reposted
Marius Hobbhahn @MariusHobbhahn
LLMs Often Know When They Are Being Evaluated! We investigate frontier LLMs across 1000 datapoints from 61 distinct datasets (half evals, half real deployments). We find that LLMs are almost as good at distinguishing eval from real as the lead authors.
17 replies · 77 reposts · 541 likes · 171.8K views
Emily Capstick reposted
Stanford HAI @StanfordHAI
HAI Senior Fellow @aiprof_mykel's AI safety research underscores a critical gap in AI development, highlighting the need to prioritize developing rigorous evaluation methods to ensure AI systems deliver intended societal benefits. stanford.io/43LAsN8
4 replies · 5 reposts · 21 likes · 2K views
Emily Capstick reposted
Benjamin Hilton @benjamin_hilton
Come work with me!! I'm hiring a research manager for @AISecurityInst's Alignment Team. You'll manage exceptional researchers tackling one of humanity’s biggest challenges. Our mission: ensure we have ways to make superhuman AI safe before it poses critical risks. 1/4
4 replies · 18 reposts · 80 likes · 13.3K views
Emily Capstick reposted
Steven Adler @sjgadler
Anthropic announced they've activated "AI Safety Level 3 Protections" for their latest model. What does this mean, and why does it matter? Let me share my perspective as OpenAI's former lead for dangerous capabilities testing. (Thread)
110 replies · 429 reposts · 4K likes · 1.5M views