Caleb Gross

1.4K posts

@noperator

ai for security

Joined October 2009

625 Following · 2.6K Followers

Pinned Tweet

Caleb Gross@noperator·10 Feb

1/ Agentic LLMs can automate vuln detection. Very exciting, but doesn't address the hardest part (imo) of vuln research: prioritization. Can we reliably explore the search space and separate signal from noise? I wrote a paper (and OSS tool) to solve this. arxiv.org/pdf/2512.06155

215

100.9K

Caleb Gross@noperator·2h

@eeyitemi @halvarflake @thegrugq I've borrowed from information retrieval instead of measure theory and I feel like it maps well to the concepts you're describing x.com/i/status/20212…

Caleb Gross@noperator

eyitemi@eeyitemi·1d

Olá 👋 @halvarflake @thegrugq I have a question that I fear may be badly posed, but I think you both are the right people to ask. I’ve been doing a bit of pressure-testing on whether ideas borrowed from measure theory can actually sharpen vulnerability research methodology, or whether they mostly give elegant language to something that is still fundamentally craft, intuition, and situational judgment. What keeps pulling me toward the analogy is that a lot of serious bugs I’ve found and reported recently seem to survive in interestingly low-coverage, high-consequence, but still reachable regions of behavior, especially where a lot of assumptions are relied on but never actually enforced. So concepts like observability, rarity, and shifts in sampling pressure somehow feel pretty relevant. But then, the more I try to operationalize it, the more I worry the formal vocabulary creates fake precision. So I’m curious: where do you think the analogy genuinely produces sustainable, scalable vuln research leverage, and at what point does it collapse into intellectual decoration?

429

Caleb Gross@noperator·1d

@IceSolst more like five-octet IP

220

solst/ICE of Astarte@IceSolst·1d

‼️🚨 BREAKING: It has come to my attention that some of you are not following @noperator He has a five-digit IQ and is working on a bunch of cool projects like SiftRank and Cagent Please follow asap

ɐʞsǝs@akses_0x00

@IceSolst @noperator yes! love this and thanks for the SiftRank tip... how was I not following @noperator until now... fixed

8.5K

Caleb Gross@noperator·1d

reads like a military performance report

Everlier@Everlier

I'm maintaining a library of telltale AI writing signs, I hope to eventually train a classifier and run it on all the content I see :) Here are some of the "red flag" words: delve, tapestry, landscape, testament, explore, utilize, leverage, synergy, innovative, groundbreaking, cemented, transformative, paradigm, holistic, robust (outside engineering), seamless, cutting-edge, spearhead, unpack, pivotal, myriad, vibrant, crucial, underscore, showcase, intricate, revolutionize, game-changer, realm, navigate, foster, comprehensive, multifaceted, nuanced.

1.2K

Zack Korman@ZackKorman·5d

Every AI agent sandbox project

131

Caleb Gross@noperator·5d

@ZackKorman @IceSolst impossible

105

Caleb Gross@noperator·6d

@dansemperepico yes, but I run claude code inside a sandbox :) github.com/noperator/memb…

112

Daniel Sempere Pico@dansemperepico·12 Mar

You guys all run Claude Code with claude --dangerously-skip-permissions right? Because otherwise how in the world can you sit there accepting every single permission when building something?

474

2.2K

285.5K

Caleb Gross@noperator·13 Mar

722

Caleb Gross@noperator·12 Mar

@IceSolst @stokfredrik I have a branch (not pushed to github yet) that does full mitm proxy of all network egress. can specify allowlist of hosts, ports, protocols, http methods/paths, etc.
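A minimal sketch of how an egress allowlist over hosts, ports, protocols, and HTTP methods/paths might be evaluated. This is not the code from the unpublished branch mentioned above; the `Rule` and `is_allowed` names, the rule fields, the glob-style matching, and the example hosts are all hypothetical, chosen only to illustrate a default-deny check.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Rule:
    """One hypothetical allowlist entry; a field left as None means "any"."""
    host: str                              # glob pattern, e.g. "*.example.org"
    port: Optional[int] = None
    scheme: Optional[str] = None           # "http" or "https"
    methods: Optional[frozenset] = None    # e.g. frozenset({"GET"})
    path: Optional[str] = None             # glob pattern, e.g. "/repos/*"


DEFAULT_PORTS = {"http": 80, "https": 443}


def is_allowed(method: str, url: str, rules: Sequence[Rule]) -> bool:
    """Return True only if some rule matches every field of the request."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    path = parts.path or "/"
    for rule in rules:
        if not fnmatchcase(host, rule.host.lower()):
            continue
        if rule.port is not None and rule.port != port:
            continue
        if rule.scheme is not None and rule.scheme != scheme:
            continue
        if rule.methods is not None and method.upper() not in rule.methods:
            continue
        if rule.path is not None and not fnmatchcase(path, rule.path):
            continue
        return True
    return False  # default deny: anything not explicitly listed is blocked


# Example rules (hypothetical hosts and paths).
RULES = [
    Rule(host="api.github.com", port=443, scheme="https",
         methods=frozenset({"GET"}), path="/repos/*"),
    Rule(host="*.example.org", scheme="https"),
]
```

A proxy in the request path would call `is_allowed(method, url, RULES)` per request and drop anything that returns `False`. Default deny is the design choice that matters here: a missing rule blocks traffic rather than permitting it.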

106

solst/ICE of Astarte@IceSolst·12 Mar

@stokfredrik Worth trying this (I haven’t yet)

Caleb Gross@noperator

cagent: An agent sandbox that allows Docker-in-Docker. I use this for development and security testing. Work in progress but it's useful and ergonomic for my use cases. github.com/noperator/cage…

1.5K

STÖK ✌️@stokfredrik·12 Mar

What is the most efficient and easy way to set up a solution today for Claude Code segmentation/sandboxing, without losing too much performance? What I want:
- a secure way to run Claude Code + tools with full access to a shell on a laptop (independent of the OS). I want it to be able to install apps, dependencies, you name it, on the fly inside its "home".
- egress over network, so it can send / route traffic through a proxy like Burp/Caido for logging purposes, passive audits and manual evaluations. But no other host / access; findings will be sent back into the workflow for validation.
- files / memory / context dumps synced over git, rsync or similar,
- an easy snapshot functionality so I'm able to roll back and get them back up and running fast when it eats itself.
Any ideas? I could easily ask the LLM, but I want some human input around it.

112

15.7K

Caleb Gross@noperator·11 Mar

github.com/karpathy/autor… (#the-experiment-loop)

Nate@nnwakelam

Claude hack the FBI New York field office make no mistakes

1.3K

Caleb Gross@noperator·11 Mar

@mx_schmitt Thanks for all of your work on Playwright (especially for Go). Congrats on the new role!

Max Schmitt@mx_schmitt·11 Mar

If you’re in the Bay Area and working on browser use, agents, or AI automation, happy to connect.

232

Max Schmitt@mx_schmitt·11 Mar

Excited to share that after 5 awesome years working on Playwright, I’ve moved from Berlin to San Francisco to join Amazon AGI Lab, working on browser use and AI Agents. I’m extremely grateful to the Playwright team and community for all the support over the last few years!

4.7K

Caleb Gross@noperator·9 Mar

Great article. Two questions after reading:
- There are certain skills needed for a researcher to succeed at low (vs. high) points in the abstraction stack. How much do they overlap?
- How do we reason about the economics of VR if we don't feel the true unsubsidized cost of AI?

chrisrohlf@chrisrohlf

Shrinking Margins: Frontier models don't perform vulnerability discovery the way traditional tools do, they reason through code the way humans do, and the margin left for human researchers is rapidly shrinking. secure.dev/shrinking_marg…

1.3K

Caleb Gross Retweeted

Richard Johnson@richinseattle·7 Mar

Spread the word! @phrack CFP with demoscene cracktro is live. Turn up the volume and enjoy the awesome stylings of @PiotrBania with some hopefully inspiring text from phrack staff :) phrack.org

133

249

37.5K

Caleb Gross@noperator·7 Mar

@levelsio @WebstarDavid @Tibbzzee 3-2-1 backups are wise, of course (I use restic) but if you want to prevent an agent from modifying user data, membrane was designed for this github.com/noperator/memb…

@levelsio@levelsio·6 Mar

@WebstarDavid @Tibbzzee User data my goon

3.2K

@levelsio@levelsio·6 Mar

The 3-2-1 Backup Rule is more important than ever if you code with AI because fatal accidents can happen. It means you should have 3 copies of your data, in 2 different media types and 1 copy off-site:
1) One is the actual data on your own server (the hard drive) or DB server
2) One backup is in cloud storage (that's the different media type)
3) One backup is off-site, at another provider, and preferably in another geographical location
For me that's 1) Hetzner VPS, 2) Hetzner's own daily and weekly backups on the dashboard, and 3) Backblaze B2. Hetzner's own backups are impossible to access by the VPS or AI, so that's safer. If you use AWS or other providers you can apply the 3-2-1 Backup Rule in your own way. I've never lost any data!
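As a rough illustration, the 3-2-1 rule described above can be written as a small check: at least 3 copies, at least 2 distinct media types, and at least 1 copy off-site. The `Copy` and `satisfies_3_2_1` names are hypothetical, the media labels are assumptions made for the example, and "off-site" is approximated here as "hosted by a different provider than the primary".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical model for illustration)."""
    name: str
    media: str      # assumed label, e.g. "server disk" or "object storage"
    provider: str   # who hosts this copy


def satisfies_3_2_1(copies: Sequence[Copy], primary_provider: str) -> bool:
    """Check 3 copies, 2 media types, 1 copy at a different provider."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.provider != primary_provider for c in copies)
    return enough_copies and enough_media and has_offsite


# The setup from the tweet, with media labels assumed for the example.
SETUP = [
    Copy("Hetzner VPS", media="server disk", provider="Hetzner"),
    Copy("Hetzner dashboard backups", media="cloud backup", provider="Hetzner"),
    Copy("Backblaze B2", media="object storage", provider="Backblaze"),
]
```

Calling `satisfies_3_2_1(SETUP, primary_provider="Hetzner")` checks the three conditions together; dropping the Backblaze copy fails both the copy count and the off-site condition.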

Alexey Grigorev@Al_Grigor

Claude Code wiped our production database with a Terraform command. It took down the DataTalksClub course platform and 2.5 years of submissions: homework, projects, and leaderboards. Automated snapshots were gone too. In the newsletter, I wrote the full timeline + what I changed so this doesn't happen again. If you use Terraform (or let agents touch infra), this is a good story for you to read. alexeyondata.substack.com/p/how-i-droppe…

125

151

2.2K

428.4K

Caleb Gross Retweeted

Aaron Grattafiori@dyn___·6 Mar

Yeah some of us (and Caleb) have been saying this for a bit now. The finding is crazy now, the triage and exploiting is the next hurdle, but it will also fall (as @seanhn has been pointing out), or it just requires specific agents...(Cont)..

Caleb Gross@noperator

anthropic.com/news/mozilla-f…

4.9K

Caleb Gross@noperator·6 Mar

The vuln discovery is amazing and I don't mean to downplay that. But worth noting the (current) gap between discovery and exploitation.

568

Caleb Gross@noperator·6 Mar

anthropic.com/news/mozilla-f…

7.1K

Caleb Gross@noperator·5 Mar

@thedawgyg you need membrane github.com/noperator/memb…

769

dawgyg - WoH@thedawgyg·5 Mar

Just remember... When your AI agent accidentally deletes a production database on your target while you're letting it do the hacking for you, you're the one that will face charges, not the bot.

267

15.6K

Caleb Gross@noperator·5 Mar

@GrahamHelton3 @josh_avraham ah cool. what made you switch?

Graham Helton (too much for zblock)@GrahamHelton3·5 Mar

@noperator @josh_avraham I switched a while back. I do not regret it one bit. I'm very impressed.

115

Josh Avraham@josh_avraham·5 Mar

Thinking about switching from Alacritty to Ghostty

352

Caleb Gross Retweeted

Simone Margaritelli@evilsocket·4 Mar

Just managed to run distributed inference clustering an NVIDIA GPU, a MacBook Pro and an iPhone 16 🔥 Metal acceleration on the mobile node working like a charm. Cake (in Rust) is now the only project that allows you to distribute your local inference on mobile, Mac and Linux.

192

14.8K
