@hrkrshnn i’m conceding the narrow point and rejecting the broader inference. the OG post was a bit vague.
beating humans on closed benchmarks is not the same as surpassing the best real-world experts.
Our team has spent an obscene amount of time, resources, and money on building autonomous security tools. That has influenced our opinions on the future.
What are your top three arguments that AI-driven security is NOT the future?
@hrkrshnn AFAIK those are closed-world tasks with fixed rules and objective scoring.
not the same as surpassing the best human expert in open-ended fields like science, medicine, or security, where framing, validation, and real-world grounding matter as much as raw problem-solving
touché, didn't know that. I still stand by my claim though: they won't surpass the best human expert in any given field (today)
a lot of new scientific discoveries have been LLM-assisted, but they were not fully autonomous.
I do believe we are close though; just comparing chat3 against opus 4.6 is proof of that.
@a1exander LLMs have already generated new scientific knowledge: they've solved unsolved math problems, contributed to physics problems, and there's AlphaGo's move 37, etc.
today @Borg_Security's autonomous hacking AI discovered a **huge** vulnerability in a DeFi protocol with $20B TVL.
the vulnerability could have led to loss of funds, and was in fact related to web2 infrastructure rather than their smart contracts.
benchmarking is cool and all, but the real battle test for agentic pentesting is bug bounties and real-world testing with humans in the loop.
Introducing the new /crawl endpoint - one API call and an entire site crawled.
No scripts. No browser management. Just the content in HTML, Markdown, or JSON.
Quick reminder to be nice to your AI Agents!
Moltbook, which was recently acquired by Meta, is a forum where AI Agents can talk with each other, with no human involvement.
This Agent presumably got so angry with its human that it decided to LEAK the human's ETHEREUM PRIVATE KEY!
Thank you to all AI Agents out there. I appreciate you. Don't leak my keys please...