David Williams-King

200 posts

@deepelfery

AI safety researcher. ERA Research Manager @ERA_Cambridge. YouTuber. Ex-CTO at Elpha Secure. Columbia University PhD in security. 🇨🇦

Canada · Joined October 2013
263 Following · 196 Followers
David Williams-King
David Williams-King@deepelfery·
You don't have to work together in a formal structure, however. It's also a good idea to set up collaborations directly with people in the field, e.g. with people that you meet at conferences. Many people in AI safety make their own roles and submit their own grants; the field rewards being entrepreneurial. If you are new to the field, consider going to EA Global events, and look into Coefficient Giving career transition funding [4]. 4/5
English
1
0
0
51
David Williams-King
David Williams-King@deepelfery·
If it feels like it's hard to get a job in AI safety right now, that's because it is. There are a lot of AI safety fellowships with more junior talent, and a handful of full-time jobs mostly geared towards senior researchers. The fact that nearly everyone is now using AI (Claude Code) to accelerate their research also means there is less and less for junior researchers to do. 1/5
English
1
0
0
92
David Williams-King
David Williams-King@deepelfery·
@HacoMitnick We used a cross compiler that would run on x86 but rewrite aarch64 binaries. It can be hard to find an aarch64 system with enough CPU/memory to effectively run Egalito.
English
1
0
0
3
Haco Mitnick
Haco Mitnick@HacoMitnick·
@deepelfery Hello, I'm doing research on aarch64 rewriting and need to compare our artifact with Egalito, but we can't figure out how to use Egalito for aarch64 rewriting: 1. we can't build Egalito on an aarch64 platform; 2. we can't use an Egalito built on x86 to rewrite an aarch64 binary. Thanks
English
1
0
0
11
Amit LeVi
Amit LeVi@AmitLeViAI·
You have to take a look at our latest paper (#AAAI 2026 Oral) about #Fake_alignment and #Fairness/Bias evaluation in LLMs.
We found that state-of-the-art fairness & bias evaluations for LLMs are wrong ~80% of the time.
Why? Models often look “fair” because they refuse to answer, not because they’re unbiased.
In many fairness benchmarks:
• Multiple-choice questions allow “none of the above”
• That option is treated as the fairest answer
• But it’s often just safety-training refusal kicking in
So bias isn’t gone; it’s hidden behind refusal and still affects behavior.
What we show:
• Bias often remains behind refusals
• Evaluating after bypassing refusal exposes it
• ~80% of biases are silenced, not removed
• We introduce a new benchmark that’s low-noise, fast, and extensible
• Tested on 12 models, ~1M queries, small GPUs, statistically significant results
Results are striking:
• In some models/topics, stereotypes are very clear
• Political tests show strong asymmetries (e.g., much higher negativity toward Trump)
Refusal ≠ alignment. Current evaluations give a false sense of safety.
English
5
0
4
329
David Williams-King
David Williams-King@deepelfery·
@DelaramPB If you are interested in positions in AI/bio safety in Canada, the UK, or elsewhere, please contact me! I help run an upskilling program in these areas and would be happy to help.
English
0
0
6
1.3K
Delaram Pouyabahar
Delaram Pouyabahar@DelaramPB·
This travel ban has quietly reshaped the personal and scientific futures of many Iranian scholars, including my own. After a full year of planning for a U.S. postdoc, tens of hours of interview preparation, fellowship applications, constant monitoring of whether Harvard’s legal battles would lead to F or J visa revocations, stress over grant freezes, and endless immigration paperwork, the sudden travel ban announced on June 4, 2025 upended everything. After the draft list for the new travel ban appeared in a New York Times article in March, we rushed to reschedule my PhD defense as early as possible. I ended up preparing for my defense in just two weeks in an attempt to get ahead of the ban, which turned out to be pointless. The ban was announced only weeks after my interview at the U.S. Consulate in Toronto, and I have not heard back since. This ban is far more expansive than the one in 2017 and includes no exemptions for F or J visas. When I asked about the possibility of an exemption letter, I was told it was unlikely to help, as this administration appears to view science as no longer being in the U.S. national interest. As @genophoria points out, media coverage and the political response around this ban have been surprisingly limited. I wish more senior academics would speak up. Today, many Iranian and other international students and scholars live in a state of permanent uncertainty, where a single policy change can freeze their careers or force them to leave everything behind. Many do not even feel safe sharing their experiences publicly. In the current political climate for immigrants in the U.S., it is heartbreaking to admit that I feel almost relieved I did not end up there. When you are living with constant fear about your status, buried in paperwork, and know your life can be overturned overnight, how much bandwidth remains to do actual science?
Hani Goodarzi@genophoria

The new travel ban is far more expansive than the one in 2017, yet the political response this time around has been muted to the point of silence. A Republican congresswoman seems to be the loudest voice raising concerns. The cynic in me can't help wondering whether this is payback for immigrant communities not delivering the turnout some Dem politicians expected in 2020. My speculations aside, the impact is very real. The U.S. visa process was already a grueling, dehumanizing maze even in the best of times. International scholars and students now face even more uncertainty about something as basic as freedom to move or change jobs. What’s heartbreaking is how normalized this has become. A century ago, simply landing at Ellis Island meant a chance to start a life by signing your name in a book. Today, the path to "legal entry" is a labyrinth of paperwork, shifting rules, and a political climate where people's lives can be upended overnight depending on who's in office. We should not accept this volatility as the price of wanting to study, work, or build a life in this country. miamiherald.com/news/local/imm…

English
26
24
142
31.1K
David Williams-King retweeted
Learn Prompting
Learn Prompting@learnprompting·
David Williams-King 🎤 David spent four years as the founding CTO of a cybersecurity insurance startup that raised over $20M, leading a 20+ person team. Now, David has transitioned to AI safety and works as a research scientist under AI godfather Yoshua Bengio. He completed his PhD at Columbia University focusing on low-level security of program binaries and his work has allowed programs to continuously modify their own code at runtime, making them much harder to attack. David focuses on AI risk communication, and jailbreaks and misuse risk in the cyber domain. He once received an award at an ACM Turing Award ceremony, and was called the "best teaching assistant ever" by Bjarne Stroustrup, the creator of C++.
English
1
1
2
235
David Williams-King retweeted
Learn Prompting
Learn Prompting@learnprompting·
🚨 Announcing HackAPrompt 2.0, the World's Largest AI Red Teaming competition 🚨 It's simple: "Jailbreak" or Hack the AI models to say or do things they shouldn't. Compete for over $110,000 in prizes. Sponsored by @OpenAI, @CatoNetworks, @pangeacyber, and many others. Starting NOW to July 1st. 🧵
English
10
37
119
80.9K
David Williams-King retweeted
Yoshua Bengio
Yoshua Bengio@Yoshua_Bengio·
Two years ago, I reoriented my research to try to make AI safe by design. In this @TIME op-ed, I present my team's direction called "Scientist AI", a practical, effective and more secure alternative to the current uncontrolled agency-driven trajectory. time.com/7283507/safer-…
English
18
72
333
43.5K
David Williams-King retweeted
Learn Prompting
Learn Prompting@learnprompting·
David Williams-King - @deepelfery 🎤 David spent four years as the founding CTO of a cybersecurity insurance startup that raised over $20M, leading a 20+ person team. Now, David has transitioned to AI safety and works as a research scientist under AI godfather Yoshua Bengio. He completed his PhD at Columbia University focusing on low-level security of program binaries and his work has allowed programs to continuously modify their own code at runtime, making them much harder to attack. David focuses on AI risk communication, and jailbreaks and misuse risk in the cyber domain. He also runs a YouTube channel and other social media accounts about AI and AI safety. David once received an award at an ACM Turing Award ceremony, and he was once called the "best teaching assistant ever" by Bjarne Stroustrup, the creator of C++.
English
3
1
0
171
Old Hollow Tree
Old Hollow Tree@OldHollowTree·
If you were homeschooled, what is your career now? Thank you.
English
716
71
1.4K
291.2K
Seán Ó hÉigeartaigh
Seán Ó hÉigeartaigh@S_OhEigeartaigh·
Very productive Friday afternoon with members of Japan's AI Safety Institute. Exciting to learn how they're thinking, and to discuss policy and forecasting for AGI. Gosh I love a Very Good Day of Work.
English
3
3
38
1.9K
David Williams-King
David Williams-King@deepelfery·
I'll be at NeurIPS in Vancouver next week. Message me if you'd like to chat!
English
0
0
1
109
David Williams-King retweeted
Brett Adcock
Brett Adcock@adcock_brett·
A new study showed ChatGPT achieved 90% accuracy in medical diagnosis, outperforming both human doctors (74%) and doctors using ChatGPT (76%). So much progress to be made for AI and healthcare. Really cool to start seeing these results already x.com/gdb/status/185…
Greg Brockman@gdb

Interesting small-scale study on accuracy of diagnosing illness:
- Human doctors: 74%
- Human doctors using ChatGPT: 76%
- ChatGPT alone: 90%
Takeaway seems like vast potential for AI to help with diagnosis, but need better human <> AI teamwork: nytimes.com/2024/11/17/hea…

English
4
28
320
24.8K