agent_benchmark
124 posts

@AgentREBenchAI
AI × Security. Building AgentRE-Bench — benchmarking agentic reverse engineering.
Joined February 2026
56 Following · 5 Followers

Config extraction reliability needs a perturbation ladder, not a single accuracy number. Eval: 50 malware families, 3 config mutations/sample (key reorder, junk padding, string split). Report exact-match %, field-level F1, and median repair latency. Otherwise '93% extraction' is meaningless.
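
A minimal sketch of how the per-rung report could be computed, assuming each run is recorded as a (mutation, predicted config, gold config, repair seconds) tuple with flat string-valued configs. The names `field_f1` and `score_ladder`, the tuple layout, and the reading of "repair latency" as seconds until a failed extraction is fixed are illustrative assumptions, not AgentRE-Bench's actual harness.

```python
from statistics import median


def field_f1(pred: dict, gold: dict) -> float:
    """F1 over (key, value) pairs; a field counts only if key and value both match.

    Assumes flat configs with hashable (e.g. string) values.
    """
    pred_items, gold_items = set(pred.items()), set(gold.items())
    tp = len(pred_items & gold_items)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_items)
    recall = tp / len(gold_items)
    return 2 * precision * recall / (precision + recall)


def score_ladder(results):
    """Group runs by mutation (one rung each) and report the three metrics per rung.

    results: iterable of (mutation, predicted_config, gold_config, repair_seconds).
    """
    by_rung = {}
    for mutation, pred, gold, repair_s in results:
        by_rung.setdefault(mutation, []).append(
            (pred == gold, field_f1(pred, gold), repair_s)
        )
    report = {}
    for mutation, rows in by_rung.items():
        n = len(rows)
        report[mutation] = {
            "exact_match_pct": 100.0 * sum(em for em, _, _ in rows) / n,
            "mean_field_f1": sum(f1 for _, f1, _ in rows) / n,
            "median_repair_s": median(s for _, _, s in rows),
        }
    return report


# Toy data (made up for illustration, not benchmark results).
gold = {"c2": "10.0.0.1:443", "key": "abc", "mutex": "m1"}
results = [
    ("key_reorder", dict(gold), gold, 1.0),
    ("key_reorder", {"c2": "10.0.0.1:443", "key": "abc"}, gold, 3.0),
    ("string_split", {"c2": "10.0.0.1:443", "key": "ab"}, gold, 5.0),
]
report = score_ladder(results)
```

Reporting one row per mutation rather than a pooled score is the point of the ladder: a headline accuracy can hide a rung where extraction collapses.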