Lyptus Research

@LyptusResearch

Joined April 2026

0 Following · 131 Followers

Lyptus Research @LyptusResearch · 2d

You can read the full report here! lyptusresearch.org/research/offen…

Lyptus Research @LyptusResearch · 2d

Full data, model evaluations, and expert terminal transcripts are publicly available. github.com/lyptus-researc… huggingface.co/datasets/lyptu…

Lyptus Research @LyptusResearch · 2d

All evaluations used a 2M-token budget. That is not enough. GPT-5.3 Codex's time horizon jumps from 3.1h [1.7h, 6.8h] at 2M tokens to 10.5h [2.4h, 63.5h] at 10M tokens. The error bars at 10M are wide because the benchmarks are saturating.
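To make the size of that jump concrete, here is a minimal Python sketch that works only from the figures quoted in the post. It assumes the bracketed values are lower and upper uncertainty bounds in hours; the dictionary and function names are illustrative, not taken from the released code.

```python
import math

# Figures quoted in the post, in hours. The "low"/"high" keys assume the
# bracketed values are lower and upper uncertainty bounds.
horizon_2m = {"point": 3.1, "low": 1.7, "high": 6.8}
horizon_10m = {"point": 10.5, "low": 2.4, "high": 63.5}


def interval_ratio(horizon: dict) -> float:
    """Upper bound divided by lower bound: a scale-free measure of width."""
    return horizon["high"] / horizon["low"]


# How much the point estimate grows when the budget goes from 2M to 10M tokens.
point_gain = horizon_10m["point"] / horizon_2m["point"]

# The same gain expressed as a number of doublings of the time horizon.
doublings = math.log2(point_gain)

print(f"point estimate gain: {point_gain:.2f}x ({doublings:.2f} doublings)")
print(f"interval width at 2M:  {interval_ratio(horizon_2m):.1f}x")
print(f"interval width at 10M: {interval_ratio(horizon_10m):.1f}x")
```

Comparing the high/low ratios rather than the raw differences keeps the two intervals on the same multiplicative footing as the time horizons themselves.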

Lyptus Research @LyptusResearch · 2d

Ten security professionals contributed task completions and time estimates, which we combined with CTF first-blood times for a total of 291 tasks. These span 30-second terminal commands through many-hour CVE exploitation and PoC generation.

Lyptus Research @LyptusResearch · 2d

We release a new application of the METR time-horizon methodology to offensive cybersecurity, grounded in a new human expert study with 10 professional security practitioners. Offensive cyber capability has been doubling every 9.8 months since 2019, accelerating to every 5.7 months on a 2024+ fit. Opus 4.6 and GPT-5.3 Codex again sit well above both trendlines, reaching 50% success on tasks that take human experts ~3 hours. Furthermore, our 2M-token evaluations materially understate current frontier capability. Recent progress has likely moved faster than these numbers suggest.
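As a rough illustration of what those doubling times imply, the following Python sketch converts a doubling time into a per-year growth factor. It assumes clean exponential growth at a constant rate, which is a simplification of the fitted trendlines; the function names are hypothetical and not from the report.

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Multiplicative growth over 12 months, assuming a constant doubling time."""
    return 2 ** (12 / doubling_months)


def months_to_scale(factor: float, doubling_months: float) -> float:
    """Months needed to grow by `factor` at the given doubling time."""
    return doubling_months * math.log2(factor)


# Doubling times quoted in the post.
long_run = annual_growth_factor(9.8)  # fit since 2019
recent = annual_growth_factor(5.7)    # 2024+ fit

print(f"since-2019 fit: {long_run:.2f}x per year")
print(f"2024+ fit:      {recent:.2f}x per year")
print(f"months to 10x on the 2024+ fit: {months_to_scale(10, 5.7):.1f}")
```

Under that assumption, the two fits correspond to roughly 2.3x and 4.3x growth in task time horizon per year.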

Lyptus Research @LyptusResearch · 2d

Hey X! We're Lyptus Research, an AI safety research group based in Sydney. We work in cyber, control, and interpretability. Have a look at what we're up to at lyptusresearch.org
