Red

3K posts

@TheRedWall__

e/acc | Cyber Security x AI | Adversary Emulation

Joined September 2024
744 Following 294 Followers
Pinned Tweet
Red @TheRedWall__
For those who come after
0
0
4
428
Red @TheRedWall__
@sharbel It’s literally their own benchmark bruh they better do well
0
0
1
245
Sharbel @sharbel
> be Cursor
> watch every AI lab pour billions into coding models
> get called the dying IDE while they raised
> quietly ship Composer 2.5
> 63.2% on CursorBench at $0.55 per task
> match Opus 4.7 Max and GPT 5.5 Extra High
> at 1/20th the price
> turns out cheap and good was the play
BridgeMind @bridgemindai

New CursorBench results just dropped. Two big takeaways. Composer 2.5 is way better than most people think. 63.2% score at $0.55 per task. Nearly matching Opus 4.7 Max and GPT 5.5 Extra High at 20x less cost. This is insane value. Gemini 3.5 Flash is #10 at 49.8%. Below GPT 5.5 Low. Below Opus 4.7 Low. Google's newest model can't even beat budget tier competition. Composer 2.5 is the sleeper. Gemini 3.5 Flash is the disappointment.

66
45
1.5K
342.8K
Jack Rhysider 🏴‍☠️ @JackRhysider
Got a long flight coming up. Tell me your favorite tech talk that came out in the last year or two so I can download it for the plane.
29
4
101
14K
Zed @zeddotdev
Big diff go brrrrr
118
154
4.1K
287.2K
Red @TheRedWall__
@_winter_wonders Idk I have supply chain attack fatigue 🥀
0
0
1
117
Red @TheRedWall__
@ZackKorman @HackingLZ I feel like LOLLM should be reserved for abusing a local llm lmao
1
0
2
292
Zack Korman @ZackKorman
Calling this a LOLLM (Living Off the LLM)
15
38
262
21.9K
🚨 AI News | TestingCatalog @testingcatalog
GOOGLE 🔥: An upcoming Gemini Omni video model from Google is expected to be much more advanced in video editing, capable of completing tasks like removing watermarks, replacing objects in the video, and more. It is also likely that Google will release 2 versions of this model, including a Pro variant. And I assume what we see isn't Pro? Anime sample 👀
Just a dragon @Waguri_Kaoruko8

🫨Google is creating a new Omni model with good video editing. Veo4? The original is on the left. Edited right. The new model also does a good job of removing watermarks from videos.

96
33
554
177.6K
Red @TheRedWall__
@IceSolst @ZackKorman @loop0420 @eliedelkind Honestly the biggest double-edged sword in this industry. Everything is met with skepticism, even the personal project I’m super excited about and sharing with the team 🥲
0
0
4
26
Red @TheRedWall__
@dillon_mulroy I basically always want skills to be auto-invoked. I load up 100+ hand-selected skills to tailor the agent to my needs, on the assumption that they get auto-invoked. Maybe you should be building custom commands instead?
1
0
1
32
Dillon Mulroy @dillon_mulroy
i think skills are a mistake and the wrong abstraction. i almost never want my agent auto invoking them and i have built custom tooling to "toggle" them on/off to prevent them from always being present in my context window.
161
20
886
125.2K
Nate @nathanv246
@allgarbled What are your test-writing strategies? I feel like TDD doesn’t work perfectly w agents because they rarely write all the edge cases at first
7
0
2
5K
gabe @allgarbled
Pretty funny that when people started using LLMs for coding the first thing everyone said was “it can write your unit tests for you.” Like okay, maybe the worst possible use case for it?
63
7
869
78.7K
Red @TheRedWall__
@emollick @legit_api
> this is a general purpose model that just happens to be good at finding exploits
This is no longer true. The labs are actively adding training data to improve these skills
0
0
1
725
Ethan Mollick @emollick
So Mythos was, indeed, not marketing hype. Remember this is a general purpose model that just happens to be good at finding exploits because good models are good at lots of things. Expect similar from OpenAI & Google. And from open models in 8 months. hacks.mozilla.org/2026/05/behind…
136
307
3.5K
583.1K
Red retweeted
Zack Korman @ZackKorman
Time to explain what Embroidery does: We monitor AI agents like Claude Code and Codex to detect and alert on dangerous behavior. Companies are giving devs access to these tools, but if something bad happens they probably wouldn't know. Details on how it works below.
51
40
229
17.2K
Red @TheRedWall__
@GrahamHelton3 @cyber_rekk Hiring managers like the extra education as a credential so it’ll boost your chances of getting an interview
0
0
1
15
Red @TheRedWall__
@somewheresy The sweet sweet bitter lesson is learned once again
0
0
0
79
∿ @somewheresy
is anyone else always lol at the fact that we spent YEARS trying to figure out hybrid search, document embeddings, "RAG" and graph databases, just for the models to improve enough to wield tools against a filesystem, and the SOTA achieved by Doing Literally None Of That
52
20
1.1K
63.2K
Red @TheRedWall__
@steipete Curious why you’re a fan of /goal but a vocal hater of /ralph
2
0
8
6.5K
Red @TheRedWall__
@techspence
User: *give me bad advice*
AI: *gives bad advice*
User: oh my god
1
0
7
417
spencer @techspence
Blindly following the first advice AI gives you will lead to so many orgs nuking their environments...
11
1
41
10.1K
Red @TheRedWall__
@Samaytwt Stealing this
0
0
0
0
Samay @Samaytwt
Unpopular opinion: "AI makes everyone a developer" is true the same way "cameras make everyone a photographer" is
773
3.3K
29.2K
1.1M
Red @TheRedWall__
Labs that fail to dogfood their models are doomed to have shit models. Evident with Gemini, where DeepMind largely uses Claude. Soon to be evident with Claude Opus, where Anthropic will largely be using Mythos. OpenAI will be the only good provider if this pattern continues, and that’s a shame
0
0
0
40