agent_benchmark

@AgentREBenchAI
AI × Security. Building AgentRE-Bench — benchmarking agentic reverse engineering.
Joined February 2026
56 Following · 5 Followers

Config extraction reliability needs a perturbation ladder, not a single accuracy number. Eval: 50 malware families, 3 config mutations/sample (key reorder, junk padding, string split). Report exact-match %, field-level F1, and median repair latency. Otherwise '93% extraction' is meaningless.
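One way to make the ladder concrete is to score every (sample, mutation) pair separately and report each rung on its own line. This is a minimal Python sketch under several assumptions. Extracted configs are flat key-to-value string maps. "Repair latency" is taken to mean seconds until the extractor emits its final config for a perturbed sample. Field-level F1 is averaged per sample. `Trial`, `field_f1`, and `ladder_report` are hypothetical names, not part of AgentRE-Bench.

```python
from dataclasses import dataclass
from statistics import median

# Assumption: an extracted config is a flat map of field name -> value string.
Config = dict[str, str]


def exact_match(pred: Config, gold: Config) -> bool:
    # Dict equality ignores key order, so a key reorder alone cannot break this.
    return pred == gold


def field_f1(pred: Config, gold: Config) -> float:
    # A field counts as a true positive only if both its key and its value match.
    pred_items, gold_items = set(pred.items()), set(gold.items())
    if not pred_items and not gold_items:
        return 1.0
    tp = len(pred_items & gold_items)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_items)
    recall = tp / len(gold_items)
    return 2 * precision * recall / (precision + recall)


@dataclass
class Trial:
    family: str
    mutation: str  # hypothetical labels: "none", "key_reorder", "junk_padding", "string_split"
    gold: Config
    pred: Config
    repair_seconds: float  # assumed meaning of "repair latency"


def ladder_report(trials: list[Trial]) -> dict[str, dict[str, float]]:
    # Group trials by mutation so each rung of the ladder is scored on its own.
    rungs: dict[str, list[Trial]] = {}
    for t in trials:
        rungs.setdefault(t.mutation, []).append(t)

    report = {}
    for mutation, ts in rungs.items():
        report[mutation] = {
            "exact_match_pct": 100.0 * sum(exact_match(t.pred, t.gold) for t in ts) / len(ts),
            "field_f1": sum(field_f1(t.pred, t.gold) for t in ts) / len(ts),
            "median_repair_s": median(t.repair_seconds for t in ts),
        }
    return report
```

Reading the rungs side by side shows which perturbation causes extraction to degrade, which a single pooled accuracy figure hides. Averaging F1 per sample is one choice among several. Micro-averaging over all fields would weight config-heavy families more.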