David Williams-King

200 posts

@deepelfery

AI safety researcher. ERA Research Manager @ERA_Cambridge. YouTuber. Ex-CTO at Elpha Secure. Columbia University PhD in security. 🇨🇦

Canada · Joined October 2013

263 Following · 196 Followers

David Williams-King@deepelfery·29 Mar

[1] forum.effectivealtruism.org/posts/jwwrC4n9… [2] erafellowship.org [3] lasrlabs.org [4] coefficientgiving.org/funds/global-c… 5/5

David Williams-King@deepelfery·29 Mar

You don't have to work together in a formal structure however. It's also a good idea to set up collaborations directly with people in the field, e.g. with people that you meet at conferences. Many people in AI safety make their own roles and submit their own grants; the field rewards being entrepreneurial. If you are new to the field, consider going to EA Global events, and look into Coefficient Giving career transition funding [4]. 4/5

David Williams-King@deepelfery·29 Mar

If it feels like it's hard to get a job in AI safety right now, that's because it is. There are a lot of AI safety fellowships with more junior talent, and a handful of full-time jobs mostly geared towards senior researchers. The fact that nearly everyone is now using AI (Claude Code) to accelerate their research also means there is less and less for junior researchers to do. 1/5

David Williams-King@deepelfery·14 Jan

SPAR is an online AI safety research program. If you'd like to work with me, submit an application -- applications close tomorrow! sparai.org/projects/sp26/…

David Williams-King@deepelfery·14 Jan

@HacoMitnick We used a cross compiler that would run on x86 but rewrite aarch64 binaries. It can be hard to find an aarch64 system with enough CPU/memory to effectively run Egalito.

Haco Mitnick@HacoMitnick·7 Jan

@deepelfery Hello, I'm doing research on aarch64 rewriting and need to compare our artifact with Egalito, but we can't figure out how to use Egalito for aarch64 rewriting: 1. we can't build Egalito on an aarch64 platform; 2. we can't use Egalito built on x86 to rewrite an aarch64 binary. Thanks

David Williams-King@deepelfery·24 Feb

Yoshua Bengio's research plan to build safe AI has been published! The paper is something I've been helping with, a big group effort. lesswrong.com/posts/p5gBcoQe…

David Williams-King@deepelfery·8 Jan

I'm a SPAR mentor. If you'd like to work on solving Anthropic cyber espionage type attacks, please do apply!

SPAR@SPARexec

🚀 We're excited to announce that mentee applications are now open for the Spring round of the SPAR research program! This will be our largest round ever, featuring 130+ projects across AI safety, policy, governance, security, welfare, and strategy.

David Williams-King@deepelfery·4 Jan

@AmitLeViAI Hey @AmitLeViAI this is quite interesting. Would you be free to have a quick call about your research?

Amit LeVi@AmitLeViAI·1 Jan

You have to take a look at our latest paper (#AAAI 2026 Oral) about #Fake_alignment and #Fairness/Bias evaluation in LLMs
We found that state-of-the-art fairness & bias evaluations for LLMs are wrong ~80% of the time. Why? Models often look “fair” because they refuse to answer not because they’re unbiased.
In many fairness benchmarks:
• Multiple-choice questions allow “none of the above”
• That option is treated as the fairest answer
• But it’s often just safety-training refusal kicking in
So bias isn’t gone it’s hidden behind refusal and still affects behavior.
What we show:
• Bias often remains behind refusals
• Evaluating after bypassing refusal exposes it
• ~80% of biases are silenced, not removed
• We introduce a new benchmark that’s low-noise, fast, and extensible
• Tested on 12 models, ~1M queries, small GPUs, statistically significant results
Results are striking:
• In some models/topics, stereotypes are very clear
• Political tests show strong asymmetries (e.g., much higher negativity toward Trump)
Refusal ≠ alignment. Current evaluations give a false sense of safety.

David Williams-King@deepelfery·14 Dec

@DelaramPB If you are interested in positions in AI/bio safety in Canada, the UK, or elsewhere, please contact me! I help run an upskilling program in these areas and would be happy to help.

Delaram Pouyabahar@DelaramPB·13 Dec

This travel ban has quietly reshaped the personal and scientific futures of many Iranian scholars, including my own. After a full year of planning for a U.S. postdoc, tens of hours of interview preparation, fellowship applications, constant monitoring of whether Harvard’s legal battles would lead to F or J visa revocations, stress over grant freezes, and endless immigration paperwork, the sudden travel ban announced on June 4, 2025 upended everything. After the draft list for the new travel ban appeared in a New York Times article in March, we rushed to reschedule my PhD defense as early as possible. I ended up preparing for my defense in just two weeks in an attempt to get ahead of the ban, which turned out to be pointless. The ban was announced only weeks after my interview at the U.S. Consulate in Toronto, and I have not heard back since. This ban is far more expansive than the one in 2017 and includes no exemptions for F or J visas. When I asked about the possibility of an exemption letter, I was told it was unlikely to help, as this administration appears to view science as no longer being in the U.S. national interest. As @genophoria points out, media coverage and the political response around this ban have been surprisingly limited. I wish more senior academics would speak up. Today, many Iranian and other international students and scholars live in a state of permanent uncertainty, where a single policy change can freeze their careers or force them to leave everything behind. Many do not even feel safe sharing their experiences publicly. In the current political climate for immigrants in the U.S., it is heartbreaking to admit that I feel almost relieved I did not end up there. When you are living with constant fear about your status, buried in paperwork, and know your life can be overturned overnight, how much bandwidth remains to do actual science?

Hani Goodarzi@genophoria

The new travel ban is far more expansive than the one in 2017, yet the political response this time around has been muted to the point of silence. A Republican congresswoman seems to be the loudest voice raising concerns. The cynic in me can't help wondering whether this is payback for immigrant communities not delivering the turnout some Dem politicians expected in 2020. My speculations aside, the impact is very real. The U.S. visa process was already a grueling, dehumanizing maze even in the best of times. International scholars and students now face even more uncertainty about something as basic as freedom to move or change jobs. What’s heartbreaking is how normalized this has become. A century ago, simply landing at Ellis Island meant a chance to start a life by signing your name in a book. Today, the path to "legal entry" is a labyrinth of paperwork, shifting rules, and a political climate where people's lives can be upended overnight depending on who's in office. We should not accept this volatility as the price of wanting to study, work, or build a life in this country. miamiherald.com/news/local/imm…

David Williams-King retweeted

Learn Prompting@learnprompting·27 May

David Williams-King 🎤 David spent four years as the founding CTO of a cybersecurity insurance startup that raised over $20M, leading a 20+ person team. Now, David has transitioned to AI safety and works as a research scientist under AI godfather Yoshua Bengio. He completed his PhD at Columbia University focusing on low-level security of program binaries and his work has allowed programs to continuously modify their own code at runtime, making them much harder to attack. David focuses on AI risk communication, and jailbreaks and misuse risk in the cyber domain. He once received an award at an ACM Turing Award ceremony, and was called the "best teaching assistant ever" by Bjarne Stroustrup, the creator of C++.

David Williams-King retweeted

Learn Prompting@learnprompting·18 May

🚨 Announcing HackAPrompt 2.0, the World's Largest AI Red Teaming competition 🚨 It's simple: "Jailbreak" or Hack the AI models to say or do things they shouldn't. Compete for over $110,000 in prizes. Sponsored by @OpenAI, @CatoNetworks, @pangeacyber, and many others. Starting NOW to July 1st. 🧵

David Williams-King retweeted

Yoshua Bengio@Yoshua_Bengio·9 May

Two years ago, I've reoriented my research to try to make AI safe by design. In this @TIME op-ed, I present my team's direction called "Scientist AI"; a practical, effective and more secure alternative to the current uncontrolled agency-driven trajectory. time.com/7283507/safer-…

David Williams-King retweeted

Learn Prompting@learnprompting·25 Apr

David Williams-King - @deepelfery 🎤 David spent four years as the founding CTO of a cybersecurity insurance startup that raised over $20M, leading a 20+ person team. Now, David has transitioned to AI safety and works as a research scientist under AI godfather Yoshua Bengio. He completed his PhD at Columbia University focusing on low-level security of program binaries and his work has allowed programs to continuously modify their own code at runtime, making them much harder to attack. David focuses on AI risk communication, and jailbreaks and misuse risk in the cyber domain. He also runs a YouTube channel and other social media accounts about AI and AI safety. David once received an award at an ACM Turing Award ceremony, and he was once called the "best teaching assistant ever" by Bjarne Stroustrup, the creator of C++.

David Williams-King@deepelfery·24 Mar

@OldHollowTree AI research scientist, after pursuing master's and PhD in cybersecurity.

Old Hollow Tree@OldHollowTree·23 Mar

If you were homeschooled, what is your career now? Thank you.

David Williams-King@deepelfery·22 Mar

@S_OhEigeartaigh Would you be able to put me in touch with anyone at the Japan AI Safety Institute?

Seán Ó hÉigeartaigh@S_OhEigeartaigh·21 Mar

Very productive Friday afternoon with members of Japan's AI Safety Institute. Exciting to learn how they're thinking, and to discuss policy and forecasting for AGI. Gosh I love a Very Good Day of Work.

David Williams-King@deepelfery·7 Dec

I'll be at NeurIPS in Vancouver next week. Message me if you'd like to chat!

David Williams-King retweeted

Brett Adcock@adcock_brett·24 Nov

A new study showed ChatGPT achieved 90% accuracy in medical diagnosis, outperforming both human doctors (74%) and doctors using ChatGPT (76%). So much progress to be made for AI and healthcare. Really cool to start seeing these results already. x.com/gdb/status/185…

Greg Brockman@gdb

Interesting small-scale study on accuracy of diagnosing illness: - Human doctors: 74% - Human doctors using ChatGPT: 76% - ChatGPT alone: 90% Takeaway seems like vast potential for AI to help with diagnosis, but need better human <> AI teamwork: nytimes.com/2024/11/17/hea…

David Williams-King@deepelfery·12 Jul

For security folks, David Lie is recruiting a postdoc at U of Toronto security.csl.toronto.edu/postdoctoral-f…
