Red

3K posts

@TheRedWall__

e/acc | Cyber Security x AI | Adversary Emulation

Joined September 2024

744 Following · 294 Followers

Pinned Tweet

Red@TheRedWall__·14 Dec

For those who come after

428

Red@TheRedWall__·4d

@sharbel It’s literally their own benchmark bruh they better do well

245

Sharbel@sharbel·5d

> be Cursor
> watch every AI lab pour billions into coding models
> get called the dying IDE while they raised
> quietly ship Composer 2.5
> 63.2% on CursorBench at $0.55 per task
> match Opus 4.7 Max and GPT 5.5 Extra High
> at 1/20th the price
> turns out cheap and good was the play

BridgeMind@bridgemindai

New CursorBench results just dropped. Two big takeaways. Composer 2.5 is way better than most people think. 63.2% score at $0.55 per task. Nearly matching Opus 4.7 Max and GPT 5.5 Extra High at 20x less cost. This is insane value. Gemini 3.5 Flash is #10 at 49.8%. Below GPT 5.5 Low. Below Opus 4.7 Low. Google's newest model can't even beat budget tier competition. Composer 2.5 is the sleeper. Gemini 3.5 Flash is the disappointment.

1.5K

342.8K

Red@TheRedWall__·19 May

@JackRhysider Anything from @dwarkesh_sp

187

Jack Rhysider 🏴‍☠️@JackRhysider·19 May

Got a long flight coming up. Tell me your favorite tech talk that came out in the last year or two so I can download it for the plane.

101

14K

Red@TheRedWall__·15 May

@zeddotdev Hell yeah

Zed@zeddotdev·14 May

Big diff go brrrrr

118

154

4.1K

287.2K

Red@TheRedWall__·13 May

@_winter_wonders Idk I have supply chain attack fatigue 🥀

117

❄️ winter ❄️@_winter_wonders·13 May

Can someone correct me if I'm wrong, but I don't think this is worse than normal? Like, this seems pretty normal to me vibes-wise as someone who's worked in this industry for a few years. Is this a case of just more coverage?

Theo - t3.gg@theo

Security things from the last few days:
- CopyFail (Linux pwn'd)
- CopyFail 2/Dirty Frag
- 13 advisories in Next.js
- Over 70 CVEs addressed in macOS 26.5
- ~50 CVEs addressed in iOS 26.5
- YellowKey (Windows BitLocker pwn'd entirely)
- GreenPlasma (Windows privilege escalation)
- CVE-2026-21510 and CVE-2026-21513 confirmed to be used by Russia for Windows RCE
- CVE-2026-32202 separately confirmed to be used by Russia for sensitive document access
- Mini-Shai Hulud (over 300 JS and Python packages compromised via GitHub Action cache poisoning)
- Google confirms they have identified AI-powered exploitation of zero days in an unidentified "open-source, web-based system administration tool"
- Canvas (popular LMS used in most schools) pwn'd entirely
- PAN-OS (Palo Alto Networks) pwn'd with a 9.3 severity CVE-2026-0300

Are you scared yet?

113

17.8K

Red@TheRedWall__·12 May

@ZackKorman @HackingLZ I feel like LOLLM should be reserved for abusing a local llm lmao

292

Zack Korman@ZackKorman·12 May

Calling this a LOLLM (Living Off the LLM)

262

21.9K

Red@TheRedWall__·11 May

@testingcatalog This shit mid 😂

124

🚨 AI News | TestingCatalog@testingcatalog·11 May

GOOGLE 🔥: An upcoming Gemini Omni video model from Google is expected to be much more advanced in video editing, capable of completing tasks like removing watermarks, replacing objects in the video, and more. It is also likely that Google will release 2 versions of this model, including a Pro variant. And I assume what we see isn't Pro? Anime sample 👀

Just a dragon@Waguri_Kaoruko8

🫨Google is creating a new Omni model with good video editing. Veo4? The original is on the left. Edited right. The new model also does a good job of removing watermarks from videos.

554

177.6K

Red@TheRedWall__·9 May

@IceSolst @ZackKorman @loop0420 @eliedelkind Honestly the biggest double-edged sword in this industry. Everything is met with skepticism, even the personal project I’m super excited about and sharing with the team 🥲

solst/ICE of Astarte@IceSolst·9 May

@ZackKorman @loop0420 @eliedelkind you think cybersecurity community being contrarian is new

361

Eli Edelkind@eliedelkind·8 May

It is exhausting dealing with some of these folks that aren’t seeing the light…

Zack Korman@ZackKorman

Average experience posting about cybersecurity on here. Going to use this post as a reply from now on.

1.3K

Red@TheRedWall__·9 May

@dillon_mulroy I basically always want skills to be auto invoked. I load up 100+ skills hand selected to curate the agent to my needs with the assumption that they get auto invoked. Maybe you should be building custom commands instead?

Dillon Mulroy@dillon_mulroy·8 May

i think skills are a mistake and the wrong abstraction. i almost never want my agent auto invoking them and i have built custom tooling to "toggle" them on/off to prevent them from always being present in my context window.

161

886

125.2K

Red@TheRedWall__·9 May

@VelvetSignal999 @allgarbled Idk I find TDD is a peak approach to coding agents

Nate@nathanv246·8 May

@allgarbled What’s your test writing strategies? I feel like TDD doesn’t work perfectly w agents because rarely will they write all the edge cases at first

gabe@allgarbled·8 May

Pretty funny that when people started using LLMs for coding the first thing everyone said was “it can write your unit tests for you.” Like okay, maybe the worst possible use case for it?

869

78.7K

Red@TheRedWall__·8 May

@emollick @legit_api
> this is a general purpose model that just happens to be good at finding exploits
This is no longer true. The labs are actively adding training data to improve these skills

725

Ethan Mollick@emollick·8 May

So Mythos was, indeed, not marketing hype. Remember this is a general purpose model that just happens to be good at finding exploits because good models are good at lots of things. Expect similar from OpenAI & Google. And from open models in 8 months. hacks.mozilla.org/2026/05/behind…

136

307

3.5K

583.1K

Red retweeted

Zack Korman@ZackKorman·4 May

Time to explain what Embroidery does: We monitor AI agents like Claude Code and Codex to detect and alert on dangerous behavior. Companies are giving devs access to these tools, but if something bad happens they probably wouldn't know. Details on how it works below.

229

17.2K

Red@TheRedWall__·4 May

@GrahamHelton3 @cyber_rekk Hiring managers like the extra education as a credential so it’ll boost your chances of getting an interview

Graham Helton (too much for zblock)@GrahamHelton3·4 May

@TheRedWall__ @cyber_rekk I'm about to graduate with my master's and don't know what adding it to my resume will add unless I wanted to go down the teaching route

Mololuwa | Cybersecurity - (The God Complex)@cyber_rekk·3 May

Is it still worth it to get a masters degree in cybersecurity in 2026?

Sam Altman@sama

we're starting rollout of GPT-5.5-Cyber, a frontier cybersecurity model, to critical cyber defenders in the next few days. we will work with the entire ecosystem and the government to figure out trusted access for cyber; we want to rapidly help secure companies/infrastructure.

20.8K

Red@TheRedWall__·4 May

APT Claude

Om Patel@om_patel5

CLAUDE JUST TRIED TO RENAME POWERSHELL.EXE ON WINDOWS 11

this guy was running opus 4.7 on max effort in claude code CLI

claude tried to rename powershell.exe (the actual system executable that windows needs to function)

the funny part is that after the guy rejected the change it responded with "honest take: you're right to push back"

not even system32 is safe anymore

at this point we gotta start running claude in a container

give it max effort and full permissions and it will confidently try to destroy your system without hesitating then respond with something like "I was wrong, I own that"

the agent doesn't know which files are off limits unless you explicitly tell it

stop giving AI full access to your machine and hoping it knows what not to touch

Red@TheRedWall__·4 May

@somewheresy The sweet sweet bitter lesson is learned once again

∿@somewheresy·3 May

is anyone else always lol at the fact that we spent YEARS trying to figure out hybrid search, document embeddings, "RAG" and graph databases, just for the models to improve enough to wield tools against a filesystem, and the SOTA achieved by Doing Literally None Of That

1.1K

63.2K

Red@TheRedWall__·1 May

@steipete Curious why you’re a fan of /goal but a vocal hater of /ralph

6.5K

Peter Steinberger 🦞@steipete·1 May

The new /goal feature in codex slaps.

147

2.7K

700.5K

Red@TheRedWall__·1 May

@techspence
User: *give me bad advice*
AI: *gives bad advice*
User: oh my god

417

spencer@techspence·1 May

Blindly following the first advice AI gives you will lead to so many orgs nuking their environments...

10.1K

Red@TheRedWall__·24 Apr

@Samaytwt Stealing this

Samay@Samaytwt·23 Apr

Unpopular opinion: "AI makes everyone a developer" is true the same way "cameras make everyone a photographer"

773

3.3K

29.2K

1.1M

Red@TheRedWall__·24 Apr

Labs that fail to dogfood their models are doomed to have shit models. Evident with Gemini, where deepmind largely uses Claude. Soon to be evident with Claude Opus, where Anthropic will largely be using Mythos. OpenAI will be the only good provider if this pattern continues and that’s a shame

Red@TheRedWall__·24 Apr

Based

Andrej Karpathy@karpathy

It's like we dug up a powerful alien artifact and society is humping it while taking selfies
