Chawin Sitawarin (@csitawarin) - Twitter Profili

@kotekjedi_ml @javirandor @alxndrdavies @usmananwar391 @_zifan_wang @edoardo_debe @GraySwanAI Thanks a lot for sharing. Cool stuff!! Was gonna try something similar, but I guess now we didn’t have to :)

English

0

3

63

Alexander Panfilov@kotekjedi_ml·26 Mar

Tagging who might find this interesting/useful: @javirandor @alxndrdavies @usmananwar391 @_zifan_wang @edoardo_debe @csitawarin @GraySwanAI

English

3

0

26

3.3K

Alexander Panfilov@kotekjedi_ml·26 Mar

New paper: We deploy Claude Code in an autoresearch loop to discover novel jailbreaking algorithms – and it works. It beats 30+ existing GCG-like attacks (with AutoML hyperparameter tuning) This is a strong sign that incremental safety and security research can now be automated.

English

48

209

1.6K

300.4K

Chawin Sitawarin retweetledi

Andrew Gordon Wilson@andrewgwils·7 Mar

To be honest, I was initially confused and reserved about AI alignment. It's not that I was against the research direction, quite the opposite. For 15 years, I'd been developing the foundations of what had been rebranded as alignment. But, I've changed my mind. 1/6

English

8

20

259

52.6K

Chawin Sitawarin retweetledi

Nils Walter@nilspwalter·29 Eki

It is notoriously hard to defend LLMs against prompt injections. Most defenses show good performance on static benchmarks but fall apart against stronger adaptive attackers. In our latest work, we present an almost embarrassingly simple defense that delivers ~3× better robustness against the strongest adaptive prompt injection attacks to date - while keeping utility degradation acceptable. Joint work with @csitawarin, Jamie Hayes, @davidstutz92, @iliaishacked.

English

1

7

14

2.4K

Chawin Sitawarin retweetledi

Federico Barbero@fedzbar·22 Eki

Feel free to check out the paper here :) arxiv.org/abs/2510.18554 Special thanks to the amazing co-authors! w/ @gu_xiangming @Chris_Choquette @csitawarin Matthew Jagielski @itay__yona @PetarV_93 @iliaishacked Jamie Hayes

English

0

5

17

1.6K

Chawin Sitawarin@csitawarin·24 Eki

@_alyxya @SallyHZhu Sorry typo: "the explanation here does not seem convincing"

English

0

96

Chawin Sitawarin@csitawarin·24 Eki

@_alyxya @SallyHZhu I didn't read the paper but sort of have the same question as @_alyxya. This seems like an obvious test so I assume I miss something that's covered in the paper? But the explanation here does seem convincing to me... There might be FPs sure, but it seems like

English

2

0

3

447

Sally Zhu@SallyHZhu·23 Eki

🔎Did someone steal your language model? We can tell you, as long as you shuffled your training data🔀. All we need is some text from their model! Concretely, suppose Alice trains an open-weight model and Bob uses it to produce text. Can Alice prove Bob used her model?🚨

English

34

95

757

471.1K

Chawin Sitawarin@csitawarin·24 Eki

@_alyxya @SallyHZhu a lot stronger statistical test than checking against the training data (regardless of whether Bob fine-tunes the ?

English

0

1

102

Chawin Sitawarin retweetledi

Florian Tramèr@florian_tramer·13 Eki

5 years ago, I wrote a paper with @wielandbr @aleks_madry and Nicholas Carlini that showed that most published defenses in adversarial ML (for adversarial examples at the time) failed against properly designed attacks. Has anything changed? Nope...

English

5

27

183

20.9K

Chawin Sitawarin retweetledi

Konrad Rieck 🌈@mlsec·30 Tem

🚨 Got a great idea for an AI + Security competition? @satml_conf is now accepting proposals for its Competition Track! Showcase your challenge and engage the community. 👉 satml.org/call-for-compe… 🗓️ Deadline: Aug 6

English

0

13

31

4K

Chawin Sitawarin@csitawarin·26 Tem

Very cool thought-provoking piece! In practice, computation units are much more nuanced than what theories capture. But just trying to identify classes of problems that benefit from sequential computation (or is unsolvable without it) seems very useful!

Konpat Ta Preechakul@konpatp

Some problems can’t be rushed—they can only be done step by step, no matter how many people or processors you throw at them. We’ve scaled AI by making everything bigger and more parallel: Our models are parallel. Our scaling is parallel. Our GPUs are parallel. But what if the real bottleneck isn’t size—but depth?What if the model just didn’t have enough serial steps to get it right? Some problems need depth, not width. This is the Serial Scaling Hypothesis. This is not the same as recent studies in scaling test-time compute, which focus on train vs. test and are agnostic to parallel vs. serial. For example: test-time majority voting increases compute by running models in parallel — but doesn’t help when the task itself is serial. We argue: what really matters is how the compute is structured. And for many real-world problems, it must be serial. Read more at: arxiv.org/abs/2507.12549 or 🧵. (In collaboration with: @layer07_yuxi , Kananart Kuwaranancharoen and @YutongBAI1002 )

English

0

7

395

Chawin Sitawarin retweetledi

Konpat Ta Preechakul@konpatp·21 Tem

Some problems can’t be rushed—they can only be done step by step, no matter how many people or processors you throw at them. We’ve scaled AI by making everything bigger and more parallel: Our models are parallel. Our scaling is parallel. Our GPUs are parallel. But what if the real bottleneck isn’t size—but depth?What if the model just didn’t have enough serial steps to get it right? Some problems need depth, not width. This is the Serial Scaling Hypothesis. This is not the same as recent studies in scaling test-time compute, which focus on train vs. test and are agnostic to parallel vs. serial. For example: test-time majority voting increases compute by running models in parallel — but doesn’t help when the task itself is serial. We argue: what really matters is how the compute is structured. And for many real-world problems, it must be serial. Read more at: arxiv.org/abs/2507.12549 or 🧵. (In collaboration with: @layer07_yuxi , Kananart Kuwaranancharoen and @YutongBAI1002 )

English

26

75

426

57.8K

Chawin Sitawarin@csitawarin·22 Tem

@edoardo_debe Awesome! Congrats 🎉 You gonna be in Menlo Park?

English

1

0

1

295

Edoardo Debenedetti@edoardo_debe·22 Tem

Excited to start as a Research Scientist Intern at Meta, in the GenAI Red Team, where I will keep working on AI agents security. I'll be based in the Bay Area, so reach out if you're around and wanna chat about AI security!

English

23

9

365

22.8K

Chawin Sitawarin@csitawarin·14 Tem

I will be at ICML this year after a full long year of not attending any conference :) Happy to chat, and please don’t hesitate to reach out here, email, on Whova, or in person 🥳

English

0

3

252

Chawin Sitawarin retweetledi

Andreas Terzis@aterzis·2 Haz

We are starting our journey on making Gemini robust to prompt injections and in this paper we present the steps we have taken so far. A collective effort by the GDM Security & Privacy Research team spanning over > 1 year.

Ilia Shumailov🦔@iliaishacked

Our new @GoogleDeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.

English

0

4

36

2.9K

Chawin Sitawarin retweetledi

dr. jack morris@jxmnop·3 Haz

new paper from our work at Meta! **GPT-style language models memorize 3.6 bits per param** we compute capacity by measuring total bits memorized, using some theory from Shannon (1953) shockingly, the memorization-datasize curves look like this: ___________ / / (🧵)

English

78

370

3.3K

410.1K

Chawin Sitawarin retweetledi

Tong Wu@TongWu_Pton·2 Nis

🛠️ Still doing prompt engineering for R1 reasoning models? 🧩 Why not do some "engineering" in reasoning as well? Introducing our new paper, Effectively Controlling Reasoning Models through Thinking Intervention. 🧵[1/n]

English

2

3

27

5.9K

Chawin Sitawarin retweetledi

Edoardo Debenedetti@edoardo_debe·25 Mar

1/🔒Worried about giving your agent advanced capabilities due to prompt injection risks and rogue actions? Worry no more! Here's CaMeL: a robust defense against prompt injection attacks in LLM agents that provides formal security guarantees without modifying the underlying model!

English

2

16

84

11.8K

Chawin Sitawarin retweetledi

Max Nadeau@MaxNadeau_·6 Şub

🧵 Announcing @open_phil's Technical AI Safety RFP! We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.

English

4

83

250

83.4K

Chawin Sitawarin retweetledi

Sicheng Zhu@sichengzhuml·21 Ara

Using GCG to jailbreak Llama 3 yields only a 14% attack success rate. Is GCG hitting a wall, or is Llama 3 just safer? We found that simply replacing the generic "Sure, here is***" target prefix with our tailored prefix boosts success rates to 80%. (1/8)

English

3

11

64

6.2K

Chawin Sitawarin retweetledi

Nikola Jovanović@ni_jovanovic·20 Ara

SynthID-Text by @GoogleDeepMind is the first large-scale LLM watermark deployment, but its behavior in adversarial scenarios is largely unexplored. In our new blogpost, we apply the recent works from @the_sri_lab and find that... 👇🧵

English

2

15

27

3.6K

Chawin Sitawarin

Keşfet