Lyptus Research

7 posts

@LyptusResearch

Joined April 2026
0 Following · 131 Followers

Lyptus Research @LyptusResearch
All evaluations used a 2M-token budget, and that is not enough: GPT-5.3 Codex's time horizon jumps from 3.1h [1.7h, 6.8h] at 2M tokens to 10.5h [2.4h, 63.5h] at 10M tokens. The error bars at 10M are wide because the benchmarks are saturating.

Lyptus Research @LyptusResearch
Ten security professionals contributed task completions and time estimates. Combined with CTF first-blood times, these cover 291 tasks, ranging from 30-second terminal commands to many-hour CVE exploitation and PoC generation.

Lyptus Research @LyptusResearch
We are releasing a new application of the METR time-horizon methodology to offensive cybersecurity, grounded in a new human expert study with 10 professional security practitioners. Offensive cyber capability has been doubling every 9.8 months since 2019, accelerating to every 5.7 months on a 2024+ fit. Opus 4.6 and GPT-5.3 Codex again sit well above both trendlines, reaching 50% success on tasks that take human experts ~3 hours. Furthermore, our 2M-token evaluations materially understate current frontier capability, so recent progress has likely been faster than these numbers suggest.
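The two calculations behind a time-horizon trend can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Lyptus Research's or METR's actual code. It assumes a task-level 50% time horizon is found by fitting a logistic curve of model success against log2 of human completion time, then solving for where predicted success equals 50%. It also assumes the doubling time is the reciprocal of the least-squares slope of log2(horizon) against release date. The function names (`fit_logistic`, `time_horizon`, `doubling_time_months`, `annual_growth`) are hypothetical. The fit is unweighted and unregularized, and it ignores details such as task weighting and how first-blood times are incorporated.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=50):
    """Maximum-likelihood fit of P(success | x) = sigmoid(b0 + b1 * x)
    via Newton's method. Assumes the data are not perfectly separable."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the negative Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton update: beta <- beta + H^{-1} g (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def time_horizon(human_minutes, successes, p=0.5):
    """Human task length (minutes) at which predicted model success is p."""
    xs = [math.log2(m) for m in human_minutes]
    b0, b1 = fit_logistic(xs, successes)
    # Solve b0 + b1 * x = logit(p) for x = log2(minutes).
    x = (math.log(p / (1.0 - p)) - b0) / b1
    return 2.0 ** x


def doubling_time_months(release_months, horizons):
    """Least-squares slope of log2(horizon) vs. release date gives
    doublings per month; its reciprocal is the doubling time."""
    ys = [math.log2(h) for h in horizons]
    n = len(ys)
    mx = sum(release_months) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(release_months, ys))
    sxx = sum((x - mx) ** 2 for x in release_months)
    return sxx / sxy


def annual_growth(doubling_months):
    """Multiplicative growth in time horizon over 12 months."""
    return 2.0 ** (12.0 / doubling_months)


# Usage with the doubling times reported above.
print(f"{annual_growth(9.8):.2f}x per year")  # 2.34x per year
print(f"{annual_growth(5.7):.2f}x per year")  # 4.30x per year
```

Under this arithmetic, a 9.8-month doubling time corresponds to roughly 2.34x growth in time horizon per year, and a 5.7-month doubling time to roughly 4.30x. Working in log2 space is what makes "doubling time" a single slope parameter.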

Lyptus Research @LyptusResearch
Hey X! We're Lyptus Research, an AI safety research group based in Sydney. We work on cyber, control, and interpretability. Have a look at what we're up to at lyptusresearch.org