Saad Khan

4.2K posts

@saadventures

playing peek-a-boo. Free Radical @ Uprising.

Joined March 2007
2.7K Following · 4.8K Followers
Pinned Tweet
Saad Khan @saadventures
"There are decades where nothing happens; then there are weeks where decades happen." -- Vladimir Ilich Lenin
0
6
18
0
Adam Gries @adamgries
1/ 📣Registration is open for the Longevity Science Conference (April 12-13, Berkeley) @ Vitalist Bay! Co-organized w/ @AgingBiomarkers, we'll explore breakthroughs turning aging biology into real therapies Tickets: vitalistbay.com/longevity-scie… For 72 hrs, use EARLY100 for $100 off
9
32
81
13.7K
Saad Khan @saadventures
@ocontrerasv @shasta721 Do you two know each other? If not you can thank me later. :) (Oscar -I was in that piano #scifoo session when we got to check out your biometrics :)
0
0
0
81
Saad Khan @saadventures
Thinking of Brother Malcolm (X) today. إِنَّا ِلِلَّٰهِ وَإِنَّا إِلَيْهِ رَاجِعُونَ
1
0
1
308
Saad Khan @saadventures
Gangster
Andrej Karpathy @karpathy

I touched on the idea of sleeper agent LLMs at the end of my recent video, as a likely major security challenge for LLMs (perhaps more devious than prompt injection). The concern I described is that an attacker might be able to craft a special kind of text (e.g. with a trigger phrase), put it up somewhere on the internet, so that when it later gets picked up and trained on, it poisons the base model in specific, narrow settings (e.g. when it sees that trigger phrase) to carry out actions in some controllable manner (e.g. jailbreak, or data exfiltration). Perhaps the attack might not even look like readable text - it could be obfuscated in weird UTF-8 characters, base64 encodings, or carefully perturbed images, making it very hard to detect by simply inspecting data. One could imagine computer security equivalents of zero-day vulnerability markets, selling these trigger phrases. To my knowledge the above attack hasn't been convincingly demonstrated yet. This paper studies a similar (slightly weaker?) setting, showing that given some (potentially poisoned) model, you can't "make it safe" just by applying the current/standard safety finetuning. The model doesn't learn to become safe across the board and can continue to misbehave in narrow ways that potentially only the attacker knows how to exploit. Here, the attack hides in the model weights instead of hiding in some data, so the more direct attack here looks like someone releasing a (secretly poisoned) open weights model, which others pick up, finetune and deploy, only to become secretly vulnerable. Well-worth studying directions in LLM security and expecting a lot more to follow.

1
0
1
288
Saad Khan retweeted
Pillars Fund @pillars_fund
Applications for the 2024 Pillars Artist Fellowship are now open! Don’t miss out on this incredible opportunity if you are a Muslim director or screenwriter living in the U.S. or U.K. Apply here: pillarsfund.org/artist-fellows…
11
65
129
39.7K
Saad Khan @saadventures
AI is everywhere. Mubarak @FahadsEmpire !
fahadkhan @FahadsEmpire

Excited about @unity's beta for Safe Voice, a project I worked on that uses #AI / #ML tech to detect and end player toxicity for in-game voice chat. A process that's traditionally been resource-intensive and highly-manual, much more automated, efficient and scalable for studios

0
0
3
306
Matt Mickiewicz @MattMickiewicz
RIP Dad. Less than 6 years after being diagnosed with Parkinson’s, my dad passed away at 5:45AM this morning from PSP, a rare brain disease. I’ll be forever grateful for him and my mom leaving communist Poland behind to ultimately give my brother and I opportunity in Canada.
86
5
363
29K
Saad Khan @saadventures
Still got @SynBioBeta on the brain. @johncumbers Reflecting on potential tracks for you next year. Question: Doesn’t AI + Biology = ‘I’? Just saying :)
0
0
0
178
Saad Khan @saadventures
Warriors lost. I am in need of an angel
0
0
0
198
Fahad Hassan @FahadHassan
Today, I couldn't be more excited to share our $12M fundraise led by Google's AI Fund @GradientVC along with some of the most brilliant tech investors and advisors in Silicon Valley.
9
6
68
17.9K
joher khan @joherkhan
some people go to disneyland. for everyone else, there's aisf.co
20
16
651
52.7K