geb

317 posts

geb

@13utters

Berlin, Germany Katılım Temmuz 2011

1.2K Takip Edilen153 Takipçiler

geb@13utters·20 Ağu

@compasssecurity *.ch ?

109

Compass Security@compasssecurity·20 Ağu

Very excited to let you know we will soon be launching a #bugbounty program for an entire TLD. Stay tuned!

English

2.5K

geb@13utters·25 Tem

@vxunderground A - any size

English

vx-underground@vxunderground·25 Tem

We're doing a giveaway because of excess sponsor money. Option A. vx-uwu white t-shirt Option B. vx-uwu black hoodie Option C. vx-uwu black sweatshirt Comment which option you'd like and what size. Sizes available are Small - 2XL. We'll choose random nerds.

English

899

927

114.4K

geb retweetledi

samim@samim·16 Tem

As of today, I'm seeking new career opportunities. Learn more about me here: samim.io/studio/

English

13.9K

geb@13utters·16 Tem

@loackme_

QME

geb@13utters·25 Haz

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:H/A:H in prod today

Français

geb@13utters·13 May

Good morning tree people

English

geb@13utters·28 Şub

🐿️

ART

geb@13utters·17 Şub

@817_756_43_19 DM me

English

241

geb@13utters·16 Şub

@chriskoebke @farad0x

QAM

Chris Köbke@chriskoebke·16 Şub

Hey music theory Twitter, are you aware of any computational models that can extract and compare melody features? Currently reading "The Analysis and Cognition of Basic Melodic Structures" and it's very hard to find anything on that topic.

English

196

geb@13utters·13 Oca

@karpathy Log what a LLM does in clear text on OS level. Evaluate actions using DL to determine changed alignment.

English

Andrej Karpathy@karpathy·13 Oca

I touched on the idea of sleeper agent LLMs at the end of my recent video, as a likely major security challenge for LLMs (perhaps more devious than prompt injection). The concern I described is that an attacker might be able to craft special kind of text (e.g. with a trigger phrase), put it up somewhere on the internet, so that when it later gets pick up and trained on, it poisons the base model in specific, narrow settings (e.g. when it sees that trigger phrase) to carry out actions in some controllable manner (e.g. jailbreak, or data exfiltration). Perhaps the attack might not even look like readable text - it could be obfuscated in weird UTF-8 characters, byte64 encodings, or carefully perturbed images, making it very hard to detect by simply inspecting data. One could imagine computer security equivalents of zero-day vulnerability markets, selling these trigger phrases. To my knowledge the above attack hasn't been convincingly demonstrated yet. This paper studies a similar (slightly weaker?) setting, showing that given some (potentially poisoned) model, you can't "make it safe" just by applying the current/standard safety finetuning. The model doesn't learn to become safe across the board and can continue to misbehave in narrow ways that potentially only the attacker knows how to exploit. Here, the attack hides in the model weights instead of hiding in some data, so the more direct attack here looks like someone releasing a (secretly poisoned) open weights model, which others pick up, finetune and deploy, only to become secretly vulnerable. Well-worth studying directions in LLM security and expecting a lot more to follow.

Anthropic@AnthropicAI

New Anthropic Paper: Sleeper Agents. We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slipped through. arxiv.org/abs/2401.05566

English

204

680

4.9K

906.8K

geb retweetledi

okio@okio_ai·22 Ara

Introducing three new models for Nendo, trained by our community member pharoAIsanders: • Model 1 generates finest vintage Dub music • Model 2 generates Boom Bap Hip Hop tunes • Model 3 generates rolling Drum'n'Bass bangers All models are extensive, high quality finetunes of musicgen, that open up a universe of new sounds. Try them with this easy to use Nendo colab: colab.research.google.com/drive/1uGQIeju… Download the models: huggingface.co/pharoAIsanders… huggingface.co/pharoAIsanders… huggingface.co/pharoAIsanders…

English

112

58.2K

geb retweetledi

Jürgen Schmidhuber@SchmidhuberAI·30 Kas

Q*? 2015: reinforcement learning prompt engineer in Sec. 5.3 of “Learning to Think...” arxiv.org/abs/1511.09249. A controller neural network C learns to send prompt sequences into a world model M (e.g., a foundation model) trained on, say, videos of actors. C also learns to interpret answers of M, extracting algorithmic information from M. Acid test: does C learn its control tasks faster with M than without? Is it cheaper to learn C’s tasks from scratch, or to address algorithmic info in M in some computable way, enabling things such as abstract hierarchical planning and reasoning? 2018: collapsing C and M into a single network arxiv.org/abs/1802.08864 using the neural network distillation of 1991 x.com/schmidhuberai/… 1990: online planning & reinforcement learning with recurrent world models and artificial curiosity / GANs: people.idsia.ch/~juergen/world…

English

241

1.5K

468.6K

geb@13utters·23 Kas

Remember to practice OPSEC

English

geb@13utters·22 Kas

@adversaryminded @Flangvik Compliance audits just pay the bill.

English

Melvin langvik@Flangvik·22 Kas

😂

QME

15.3K

geb@13utters·20 Kas

Start calling it AGI when it does produce as much drama as the @openai situation

English

geb@13utters·13 Kas

@troyhunt The poor soul surely did too much compliance audits and got confused.

English

383

Troy Hunt@troyhunt·13 Kas

You idiot troyhunt.com/beg-bounties/

English

531

204.9K

geb@13utters·29 Eyl

@FlexElektro signed distance functions

English

FlexElektro@FlexElektro·29 Eyl

@13utters Well … idk exactly to be honest :D in my understanding sdls measure the distance from some things … so i would say no.

English

FlexElektro@FlexElektro·29 Eyl

#generativeart

QME

161

geb@13utters·14 Eyl

@compasssecurity ihr habt da was verloren...

Deutsch

geb@13utters·9 Eyl

@samim samim.io/breath/

QME

samim@samim·9 Eyl

A recent deepmind paper uses ML to come up with the most effective prompts for LLM's. "Take a deep breath and work step-by-step!" was one of the top ones: arxiv.org/abs/2309.03409

English

3.8K

Keşfet

@compasssecurity @vxunderground @loackme_ @817_756_43_19 @chriskoebke @farad0x @karpathy @Flangvik