Alexander Mackie

1K posts

@ZanderMackie

Security research and other such things. Detection engineer @datadogHQ

Joined October 2022

2K Following · 194 Followers

Pinned Tweet

Alexander Mackie@ZanderMackie·25 Mar

@IceSolst Any sufficiently advanced LLM tool is indistinguishable from malware.

Alexander Mackie retweeted

pdawg@prathamgrv·2d

I made a Claude Code skill that turns any arxiv paper into working code. Every line traces back to the paper section it came from & any implementation detail the paper skips will be flagged, and not assumed. open sourcing it - github.com/PrathamLearnsT…

Alexander Mackie retweeted

Samir@SBousseaden·1d

very cool research! that's why we already pushed some GenAI related EDR detection like this one github.com/elastic/protec…

Malware Unicorn@malwareunicorn

New blog: We found a sandbox breakout and remote dev tunnel bug in Cursor. Called it NomShub. It was fun making my vscode dev tunnel C2 dashboard pink. na2.hubs.ly/H04GPbw0

Alexander Mackie retweeted

Danielle Fong 🔆@DanielleFong·21h

we live in a reality of agent swarms, so if software has this class of bug deep down we should set up to just migrate every class of heap overflow bug until we solve it, among others

Dan McAteer@daniel_mac8

Nicholas Carlini, Anthropic Researcher, reports that Claude Code found multiple remotely exploitable security vulnerabilities in the Linux kernel. Includes a bug that evaded human software engineers for an astonishing 23 (!) years.

Alexander Mackie retweeted

William Fedus@LiamFedus·22h

RL against verifiable rewards in LLMs has clearly opened a very powerful regime. It works, and because it works, there is a strong tendency to view more and more problems through that lens. You optimize for tasks where the reward is clean, where success is easy to check, where the feedback loop closes quickly. This is productive and will keep paying off. But it also creates a bias: you start emphasizing what is legible to the training setup, not necessarily what is most valuable. Scientific reasoning is a good example. Not every step in science is something that can be cleanly graded at the moment it is produced. A hypothesis can later fail experimentally and still have been exactly the right kind of thinking at the time: creative, mechanistically grounded, and responsive to the available evidence. “Turns out to be wrong” does not imply “was low-quality thinking”. A big part of the next frontier will be AI systems that can operate well under this kind of uncertainty, just like a big part of the last one was RL against verifiable rewards.

Alexander Mackie retweeted

Ryan Naraine@ryanaraine·1d

"I remind you that this present you're so concerned about losing, you hated it in the first place." @juanandres_gs on why security practitioners should stop clinging to the broken thing and start imagining what the fixed thing looks like. New episode is live 👇 open.spotify.com/episode/7K67rf…

Alexander Mackie retweeted

Brendan Dolan-Gavitt@moyix·1d

One of the really interesting things about this is that they think their dataset is *already* saturated if you can give the models a realistic amount of compute/tokens. 2M tokens is a tiny amount to spend on an exploit!

Ethan Mollick@emollick

Here’s an independent domain extension of METR’s famous time-horizon analysis, applying it to offensive cybersecurity with real human expert timing data. Similar to METR: a 5.7-month doubling time. Frontier models now succeed 50% of the time at tasks that take human experts 10.5h.
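To make the doubling-time figure concrete, here is a minimal sketch of the arithmetic, assuming the quoted trend (a 10.5h horizon doubling every 5.7 months) simply continues. The function name and the extrapolation itself are illustrative assumptions, not part of the cited analysis.

```python
def projected_horizon_hours(months_ahead: float,
                            current_hours: float = 10.5,
                            doubling_months: float = 5.7) -> float:
    """Extrapolate the 50%-success task horizon under a constant doubling time.

    Assumption: exponential growth continues unchanged, which the source
    does not claim.
    """
    return current_hours * 2 ** (months_ahead / doubling_months)


# One doubling period ahead the horizon is twice the current value.
print(projected_horizon_hours(5.7))
```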

Alexander Mackie@ZanderMackie·1d

This is great! Here’s my recipe:
- take a domain or attack surface where there is a deterministic test of whether a security property has been violated
- find existing research
- distill vulnerability patterns
- use an LLM to build a massive bucket of candidates where these patterns may exist
- use an LLM to build a filtration system that orchestrates deterministic tests (i.e. exploits) against said candidates on said attack surface
- use LLMs and human judgement to build increasingly improved validations of success / failure of exploitation
- repeat and improve the system
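The recipe above can be sketched as a small generate-then-verify loop. This is a toy illustration under stated assumptions: the `Candidate` fields, the substring-based `propose` step (standing in for the LLM), and `overflow_oracle` are all hypothetical, and a real system would run actual exploit attempts as the deterministic test.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    name: str
    source: str    # snippet where a vulnerability pattern may exist
    buf_len: int   # toy metadata used only by the toy oracle below
    copy_len: int


@dataclass(frozen=True)
class Finding:
    candidate: Candidate
    pattern: str


def propose(corpus: List[Candidate], patterns: List[str]) -> List[Finding]:
    # Placeholder for the LLM step: a substring match stands in for
    # "an LLM thinks this pattern may exist here".
    return [Finding(c, p) for c in corpus for p in patterns if p in c.source]


def overflow_oracle(c: Candidate) -> bool:
    # Deterministic test of the property "a copy never exceeds the
    # destination buffer". Returns True when the property is violated.
    return c.copy_len > c.buf_len


def run_pipeline(corpus: List[Candidate],
                 patterns: List[str],
                 oracle: Callable[[Candidate], bool]) -> List[Finding]:
    # Keep only the proposed findings that the deterministic test confirms.
    return [f for f in propose(corpus, patterns) if oracle(f.candidate)]


corpus = [
    Candidate("parse_header", "memcpy(buf, pkt, n)", buf_len=32, copy_len=64),
    Candidate("parse_footer", "memcpy(buf, pkt, n)", buf_len=32, copy_len=16),
    Candidate("log_event", "snprintf(buf, sizeof buf, fmt)", buf_len=32, copy_len=8),
]
confirmed = run_pipeline(corpus, ["memcpy("], overflow_oracle)
print([f.candidate.name for f in confirmed])  # ['parse_header']
```

The split matters: the proposal step can be noisy and cheap because the oracle, not the model, decides what counts as a confirmed finding.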

Alexander Mackie retweeted

Simon Willison@simonw·1d

Started a new tag on my blog to track stories about AI-powered security research, which is very much having a moment right now - 11 posts so far already simonwillison.net/tags/ai-securi…

Alexander Mackie retweeted

Sebastian Raschka@rasbt·1d

Components of a coding agent: a little write-up on the building blocks behind coding agents, from repo context and tool use to memory and delegation. Link: magazine.sebastianraschka.com/p/components-o…

Alexander Mackie retweeted

Ryan Naraine@ryanaraine·1d

If AI finds the zero-day, writes the exploit, and patches the code, who trains the next generation of security researchers? Chris St. Myers' "Cognitive Rust Belt" essay kicked off a debate we couldn't stop having. Apple Podcasts podcasts.apple.com/us/podcast/thr…

Alexander Mackie retweeted

s1r1us (mohan)@S1r1u5_·1d

this year's pwn2own isn't just interesting because there will be lots of entries with AI+human. it is also interesting because a) anthropic burned a ton of tokens on firefox, basically running claude in a loop until it found something for a month, probably exhausting whatever claude can one shot. b) if someone submits full chain without much use of ai, it tells you one shotting plateaus and these models are a bit more like fuzzers than seasoned security researchers. c) even if they used an llm to find the bug, this tells us scaffolding/harness design, prompting, and the operator matter a lot.

Alexander Mackie retweeted

Anjney Midha@AnjneyMidha·1d

Stanford @CS153Systems, Week 1 (Full Lecture): AI Scaling, Bottlenecks, and Why Compute Isn't a Commodity Yet
00:00 Compute Coachella
00:29 Simple Life Heuristic
01:08 Uncertainty Creates Opportunity
01:42 Four Bottlenecks Framework
01:51 Empirical Proof Matters
02:05 Cloud Costs Are Shifting
02:15 Verifiable vs Fuzzy Progress
02:48 Scaling Predictability Explained
03:43 CapEx Explosion in Big Tech
04:06 Chips Aren’t Commodities
04:45 Compute Scarcity Conclusion

Alexander Mackie retweeted

Dr. Anton Chuvakin@anton_chuvakin·2d

"RSA 2026: Agentic Future, Analog Fundamentals — The Paradox of Why the Old Guard Still Survives" bit.ly/47CLA1u <- the classic! My annual #RSAC #RSA2026 reflection post!

Alexander Mackie retweeted

Florian Roth ⚡️@cyb3rops·2d

Gemma 4 outperforms all other open source models in my cyber security related benchmark set

Arena.ai@arena

Gemma 4 by @GoogleDeepMind debuts at 3rd and 6th on the open source leaderboard, making it the #1 ranked US open source model. By total parameter count, Gemma 4 31B is 24× smaller than GLM-5 and 34× smaller than Kimi-K2.5-Thinking, delivering comparable performance at a fraction of the footprint.

Alexander Mackie retweeted

Tim Becker@tjbecker·2d

Remote heap-buffer-overflow in CUPS just dropped. Found autonomously by Xint Code. Disclosure + PoC below

Alexander Mackie retweeted

flavio@flaviocopes·2d

How Axios was compromised 🤯

Alexander Mackie@ZanderMackie·2d

@vxunderground I’m sorry but you’re not leet without a battlekilt

vx-underground@vxunderground·2d

There is this strange phenomenon where people new to cybersecurity go way overboard trying to look cool and badass to give the facade of being really technical. I'll tell you something right now. You probably won't like to hear it, but it is important.

Nobody cares about:
- Your certificates
- The conferences you've attended
- Your vendor swag
- What OS you're using
- How many LEDs your computer has

Here is what your peers admire the most:
- If you're polite
- If you're willing to admit if you're wrong
- If you're easy to get along with

If you're just a chill nerd who is nice, easy going, willing to admit when you're wrong, you will go further than the big mean nerd with the galaxy brain

Alexander Mackie@ZanderMackie·3d

@tqbf @roddux Ooph

Thomas H. Ptacek@tqbf·3d

@roddux Everything in syzkaller is about to get higher-stakes now that everyone's LLM agent is going to try to shortcut generating working exploits by pulling from its crashers.

roddux@roddux·3d

Linux has been actively drowning in bugs for years, many 0days can often be found even on the public syzkaller dashboard

Thomas H. Ptacek@tqbf

I'd say "I called this" but I didn't really call anything; more like standing on the shore going "yup, the tide is coming in". Most important open source project, went from slop reports to drowning in real vulnerability reports: lwn.net/Articles/10656…

Alexander Mackie@ZanderMackie·3d

@theonejvo Ah interesting. That’s a good point!

Jamieson O'Reilly@theonejvo·3d

@ZanderMackie I've thought about this too. I can totally understand a forced hand situation, where if they were also behind recent but prior attacks, and the creds they gained from those were at high risk of being rotated, classic all or nothing decision imo.

Jamieson O'Reilly@theonejvo·3d

The irony of these attacks is that North Korean hackers have spent so much time masquerading in sleeper mode as legitimate developers that they know more about the open-source dev ecosystem, including how to exploit it, than most hackers or devs anywhere in the world.

Lukasz Olejnik@lukOlejnik

North Korea planted malicious code in Axios - one of the most popular JavaScript libraries, used by developers worldwide: in AI tools, ML pipelines, and fintech infrastructure. Had the attack gone undetected, infected packages could have reached hundreds of thousands of projects, servers, and production systems - from startups to banks and government institutions. The malware collected host data and waited for orders from Pyongyang, running on Linux, macOS, and Windows. STARDUST CHOLLIMA - a DPRK unit specializing in cryptocurrency theft and software supply chain attacks is behind the op. The motivation is simple: cash for the regime. The target: everyone who has ever imported axios.

Alexander Mackie retweeted

sshkhr@sshkhr16·3d

(1/7) Everyone's been piling on the cs153.stanford.edu syllabus at @Stanford by @AnjneyMidha and @mabb0tt as being too high-level (I saw some comparisons being made to TED Talks, even Coachella 😅) fwiw I personally think this is a great class, and it's probably significantly more valuable in-person than online. I saw the lectures with @GuillaumeLample and @sualehasif996 from the Winter 2025 session and personally found them very informative. But if you're looking for a deeper dive into frontier AI systems, here are 5 courses (and a few other resources) you will like:
