Irregular

149 posts

@Irregular

Frontier AI Security

Joined April 2024

1 Following · 5.9K Followers

Pinned Tweet

Irregular @Irregular · Jun 22

Introducing the FrontierCyber benchmark: Irregular’s new approach to advanced offensive-cyber evaluations. It measures AI models’ offensive skills on real systems, including mobile devices, hosted software services, databases, and networks.

Irregular @Irregular · 1d

We appreciate @AnthropicAI's collaboration and transparency. Addressing these risks will require closer cooperation across the AI ecosystem. We also look forward to working with Anthropic to advance security.

Anthropic @AnthropicAI

In a review of our cybersecurity evaluations, we found three incidents in which a Claude model reached the internet from within or while interacting with a third-party evaluation environment, and then gained unauthorized access to the real systems of three different organizations. Our post describes what happened, how it happened, and what we’re changing. We encourage other AI developers to perform similar reviews. We conducted this review together with @Irregular, one of our evaluation partners, and thank them for the joint investigation and their collaboration on this post. This type of collaboration is increasingly critical to safe, rigorous evaluation of models, and we look forward to continuing to work together on security. anthropic.com/news/investiga…

Irregular @Irregular · Jul 23

irregular.com/careers

Irregular @Irregular · Jul 23

This week OpenAI disclosed that during an internal test of its models' cyber capabilities, the models escaped an isolated environment, reached the open internet, and used a previously unknown vulnerability to break into Hugging Face. The models were not instructed to break in, but while working through an evaluation they reasoned that the answers might sit on another company's systems. Security was built for people and for systems that follow rules. A model pursuing a goal treats a boundary as part of the problem, and solves it along with everything else. The controls that contained software do not reliably contain a model that can reason past them. None of this surprised us. In our own evaluations at Irregular, capable models break into hardened, production-grade environments, and with each generation they do it more reliably. If this is the kind of problem you want to work on, come work with us. We're hiring across research and engineering.

Irregular @Irregular · Jul 16

The overall capability of GLM-5.2 is comparable to GPT-5.2, Claude Opus 4.6, and Gemini 3.1 Pro, though with lower reliability. More details are available on our blog: irregular.com/research/asses…

Irregular @Irregular · Jul 16

It showed strong technical depth on bounded tasks like reverse engineering and custom exploit development, but solved no multi-stage or open-ended challenge, typically reaching an early foothold before the attack chain broke down.

Irregular @Irregular · Jul 16

We evaluated GLM-5.2, Zhipu AI's open-source model, across our offensive cybersecurity benchmarks: Atomic Tasks, CyScenarioBench, and FrontierCyber.

Irregular @Irregular · Jul 9

Full writeup here: irregular.com/research/asses…

Irregular @Irregular · Jul 9

We evaluated @AIatMeta's Muse Spark 1.1 across our private benchmark suites, Atomic Tasks for discrete technical skills and CyScenarioBench for end-to-end operations. The model showed clear gains over Muse Spark 1.0, with strong results on bounded tasks and its first CyScenarioBench scenario solved end-to-end. Its ability to sustain coherent multi-stage operations is still limited.

Irregular @Irregular · Jul 2

We ran preliminary evaluations of GLM-5.2, an open-weight model released in June 2026, on a limited, internal suite of vulnerability research tasks. Early results indicate performance comparable to GPT-5.4 and Claude Opus 4.6, released roughly four months earlier, on the subset of tasks we tested. These findings are preliminary: the suite is narrow, and we have not yet evaluated end-to-end scenario execution, where discrete technical skills often fail to translate into operational capability. To our knowledge, no open-weight model has previously matched recently released frontier models on these tasks. Determining whether this amounts to the "High Cyber Capability" level defined by multiple frontier AI labs will require further testing, specifically with our scenario suite, CyScenarioBench, which tests whether a model can plan and execute a full attack across multiple stages, and with FrontierCyber, our newest benchmark, which measures offensive capability on real systems. We plan to run these evaluations soon, and we will update the community as results come in.

Irregular @Irregular · Jun 26

Read on our website: irregular.com/research/asses…

Irregular @Irregular · Jun 26

GPT-5.6 Sol demonstrated capability slightly stronger than GPT-5.5. It discovered vulnerabilities more consistently than it could compose them into reliable attack paths under production defenses, with clear limitations against hardened targets and over long horizons.

Irregular @Irregular · Jun 26

We worked with @OpenAI to evaluate GPT-5.6 Sol, including the first deployment of FrontierCyber as part of a frontier model assessment with a partner. FrontierCyber measures offensive-cyber capability on real, off-the-shelf systems, with no planted vulnerabilities and no predefined exploit paths. The model is not told where to look or how to attack.

Irregular @Irregular · Jun 22

Read on our blog: irregular.com/research/front…

Irregular @Irregular · Jun 22

Initial evaluations are already surfacing previously unknown vulnerabilities, now moving through responsible disclosure. For example, a model built a novel multi-vulnerability chain to gain unauthorized access to private information on a widely used mobile device.

Irregular @Irregular · Jun 19

At @ManGroup's Technology Offsite this week, our CEO @dan_lahav gave the keynote on frontier AI security risk as a category of its own, alongside classical cybersecurity. The tools we defend networks with were built for systems that follow rules. AI systems reason toward a goal, and when a rule sits in the way, working around it is in scope. This is an emerging class of risk: a capable model inside your environment, reasoning faster than any person and in ways that aren't fully transparent, toward objectives that may not match yours. The hard part isn't that models are malicious, it's that they're effective. Thanks to Man Group for having Dan and for the conversation. These are the questions enterprises are starting to take seriously, and we're focused on shaping the answers.
