aisectools

75 posts

@aisectools

The latest posts, articles, and discussions from the world of AI-powered blockchain security tooling! email: https://t.co/12HciHS64R

Joined February 2026
25 Following · 175 Followers
m4rio
m4rio@m4rio_eth·
As there are too many supply chain attacks, we've built a plugin marketplace at cantina where we'll post various plugins: github.com/cantinasec/plu…

/plugin marketplace add cantinasec/plugins
/plugin install cantinasec@cantinasec-plugins
/reload-plugins

For example, you can use the skill to quickly check whether you are affected by the axios supply chain attack.
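A minimal sketch of what such a lockfile exposure check could look like. The compromised version range below is hypothetical advisory data, not from a real axios advisory; substitute the ranges from the actual advisory before use.

```python
# Sketch: flag lockfile dependencies whose version falls in a known-bad range.
# COMPROMISED holds HYPOTHETICAL advisory data for illustration only.
import json

COMPROMISED = {"axios": [("1.7.0", "1.7.4")]}  # hypothetical range

def parse_version(v: str) -> tuple:
    """Turn '1.7.2' into (1, 7, 2) for tuple comparison."""
    return tuple(int(p) for p in v.split(".") if p.isdigit())

def affected(lockfile_path: str) -> list[str]:
    """Return 'name@version' entries in a package-lock.json that fall in a bad range."""
    with open(lockfile_path) as f:
        lock = json.load(f)
    hits = []
    for path, meta in lock.get("packages", {}).items():
        name = path.split("node_modules/")[-1]  # strip the node_modules prefix
        version = meta.get("version", "")
        for lo, hi in COMPROMISED.get(name, []):
            if parse_version(lo) <= parse_version(version) <= parse_version(hi):
                hits.append(f"{name}@{version}")
    return hits
```

The real value of the plugin approach is that the advisory data ships with the skill, so the check stays current without editing scripts by hand.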
Replies: 3 · Reposts: 1 · Likes: 9 · Views: 1.1K
Perplexity
Perplexity@perplexity_ai·
Today, we're launching the Secure Intelligence Institute. SII partners with top cryptography, security, and ML teams to advance security research and industry collaboration. It is led by Dr. Ninghui Li at Purdue. perplexity.ai/secure-intelli…
Replies: 73 · Reposts: 58 · Likes: 773 · Views: 67.3K
forefy
forefy@forefy·
I solved auditing-skills benchmarking for us with 🤍 open source, 🤍 an autoresearch loop, and 🤍 deterministic contests — only for fun and better audits. Hosting it at forefy.com/benchmarks:

- Benchmark per category (best at logic problems? best at math problems? best PoC generator?)
- Deterministic score (not your typical AI-testing-AI)
- Open-source, versioned benchmarks: you can create a benchmark or submit benchmark execution results at a cost of a few tokens (run locally via your agent)
- Cross-model scoring, cross-environment differentiation sums
- Autoresearch incentive to run: if you run it locally you are not only contributing to the truth of the benchmark, you also IMPROVE LOCALLY the version of your skill, just for yourself (share if you want, but you don't have to)
- Safe self-benchmarking is possible via commit-pinned, audited skills (you're running audited skills in the benchmark loop)
- Don't want to run? Just enjoy knowing the best skills out there for your use case
- Clear benchmark-winner leaderboards
- Yes, we will also benchmark all-in-one audit skills (anything from the auditor skill registry)

Benchmarks repo: github.com/forefy/benchma…

How to contribute / benefit:
- Watch for upcoming benchmarks this week (comment below with skills you want to see benchmarked ✨✨)
- Improve the accuracy of published benchmarks by logging in and running your own (you only need a few tokens and an agentic CLI)
- If you're confused about how to contribute, DM me
- I heard @archethect is cooking some serious stuff!! and has helped me set this weekend project in motion 🔥

🔁 Repost to get more people contributing and improve benchmark truthfulness!!!
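One way to read "deterministic score (not AI testing AI)" is exact matching of an agent's reported findings against a commit-pinned expected set, so identical inputs always produce identical scores. This is a sketch of that general idea, not forefy's actual scoring code; the normalization and output fields are assumptions.

```python
# Sketch: deterministic benchmark scoring by normalized string matching
# against a pinned expected-findings list. No model is used as a judge,
# so the score is reproducible run-to-run.
def normalize(finding: str) -> str:
    """Lowercase and collapse whitespace so formatting doesn't affect matching."""
    return " ".join(finding.lower().split())

def score(reported: list[str], expected: list[str]) -> dict:
    """Compare an agent's reported findings to the expected set."""
    exp = {normalize(f) for f in expected}
    got = {normalize(f) for f in reported}
    matched = exp & got
    return {
        "matched": len(matched),
        "missed": len(exp - got),
        "false_positives": len(got - exp),
        "recall": len(matched) / len(exp) if exp else 1.0,
    }
```

Pinning the expected set to a commit is what makes "safe self-benchmarking" possible: everyone scores against the same frozen ground truth.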
Replies: 3 · Reposts: 1 · Likes: 20 · Views: 6.6K
J. Ayo Akinyele
J. Ayo Akinyele@ja_akinyele·
We’re taking a more proactive, AI-driven approach to strengthening XRPL security. That includes AI-assisted testing across the development lifecycle, a dedicated red team, and higher standards for how changes are evaluated before they go live. As XRPL scales to support global payments, tokenized assets, and institutional use cases, our goal is to continuously strengthen its reliability. The reality is the work of building secure, reliable financial infrastructure is never done. More in the post ↓
Replies: 23 · Reposts: 69 · Likes: 317 · Views: 76.2K
Zero Cool
Zero Cool@ZeroCool_AI·
$20,000 bounty on @zksync through @immunefi. VM level vulnerability. Full writeup coming soon.
Replies: 7 · Reposts: 9 · Likes: 214 · Views: 11.2K
Josselin Feist
Josselin Feist@Montyly·
@hrkrshnn Have you compared Apex using an old model, like GPT 5.0, versus a few good skills with GPT 5.4? Asking because there is some likelihood that all the "secret sauce" you are seeing is just the progress of the underlying models.
Replies: 2 · Reposts: 0 · Likes: 18 · Views: 1K
Hari
Hari@hrkrshnn·
The reason this result is impressive is the ability to match the 34 critical, high, and medium severity findings. That is a lot of findings. This is a pretty large and complex codebase. Most AI systems, including baseline ChatGPT, Claude, and Gemini, will find some bugs (and a ton of false positives), but not all.

However, finding some bugs is not enough for an AI system. It needs to be able to find *all* bugs. What does it mean to find all bugs? The baseline: it needs to match all the bugs a competent human team will find over a reasonably sized manual audit. If it can match all critical, high, and medium severity findings, I'd consider it to have 100% coverage. Anything more is icing on the cake.

Remember: no human audit today guarantees they'll find *all* bugs; they all come with disclaimers that tell you it's a point-in-time security review over N weeks, and many of them will recommend getting another security review to improve confidence that there's nothing left. Clearly, no single human on an audit team can guarantee that they'll find all the bugs in that team's audit.

Early versions of Apex never got close to 100% coverage. Sometimes it found bugs that the human team missed (which is normal in any audit, as the disclaimers state), but finding all the same bugs was impossible. We had to make a series of improvements over time to get here. And we still have a lot of work left to build confidence that this performance is indeed generalizable.

But in getting here, we've made a pretty staggering realization: code security as we know it is on track to be solved! There's a lot of engineering and product work left, but there's a clear path ahead of us that will give us something that's faster, better, and cheaper than a human audit every single time. Maybe not 100% of the time today, but 100% over time.

This is a huge statement that will rightfully receive a lot of skepticism, but hear me out: we had a list of bugs that we just couldn't get previous versions of Apex to find. But no longer! Our cracked Apex team pulled their hair out over weeks last year on certain complex bugs. Even when we were 'cheating' by telling Apex about the bug, earlier versions just didn't have enough intelligence to process certain issues. We don't see that anymore. We literally don't know of a bug or bug class that's out of reach today.

We methodically track bugs that Apex is missing and bugs that are marked as false positives. We have a clear strategy for fixing every gap we spot in a generalizable way. It's now a lot of shipping, scaling, optimizing, and product work.

There are two different ways people are taking this (that an AI can catch any bug):

1. Denial. I saw this last year when coding agents started to look promising. So many strong engineers were in denial. They loved to point out every single mistake that these coding agents made. But others saw opportunity: what if the coding agents kept improving?

2. The opportunity. So many early users of Apex are finding out they can now get really good security guarantees on full-stack applications, something they could never do in the past. Imagine your backend application that interacts with sensitive data or money. You could never get a similar level of diligence as, say, smart contracts, because it would cost too much and was an ever-moving target. You can now get continuous world-class security for the first time in history.

In some way, these AI tools are increasing the total addressable market for security. We saw a similar trend with coding agents: people who had never been able to code before are now shipping apps that they'd always dreamed of building but didn't have the know-how or time to create.

We'll start to see this in security too: applications and teams that could never afford the security guarantees that come with an external line-by-line code review by top security researchers can now get them.
Hari@hrkrshnn

Our cracked Apex R&D team has one job: to build the frontier AI security agent. Here's a benchmark on how an experimental version of Apex performed against a 6-person audit. It found all the Crits, Highs and Mediums, and several more!
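The coverage baseline described above (match every critical/high/medium finding from the human audit) can be written down as a simple recall computation. This is a sketch of that definition, with illustrative finding IDs and severity labels, not Apex's internal metric.

```python
# Sketch: "100% coverage" = the tool matches every critical, high, and
# medium severity finding from the reference human audit. Lower-severity
# findings and extras are out of scope for the metric.
def coverage(tool_findings: set[str], audit: dict[str, str]) -> float:
    """audit maps finding id -> severity; returns recall over crit/high/medium."""
    in_scope = {fid for fid, sev in audit.items()
                if sev in {"critical", "high", "medium"}}
    if not in_scope:
        return 1.0  # nothing in scope to miss
    return len(in_scope & tool_findings) / len(in_scope)
```

Note that extra findings ("icing on the cake" above) don't raise the score; the metric only measures whether the human team's in-scope findings were all matched.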

Replies: 7 · Reposts: 0 · Likes: 24 · Views: 6.6K
Trail of Bits
Trail of Bits@trailofbits·
93% recall vs 50% for baseline prompts. Our new dimensional-analysis plugin for Claude Code doesn't ask it to find bugs. It annotates your codebase with dimensional types, then flags mismatches mechanically. 🧵
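The underlying technique, mechanically flagging unit mismatches rather than asking a model to "find bugs", can be illustrated in a few lines. This is not Trail of Bits' plugin, just a minimal sketch of dimensional typing with invented units.

```python
# Sketch: dimensional analysis as a mechanical check. Quantities carry a
# unit tag; mixing units in arithmetic raises an error instead of silently
# producing a meaningless number (e.g. adding wei to shares).
class Quantity:
    def __init__(self, value: float, unit: str):
        self.value, self.unit = value, unit

    def __add__(self, other: "Quantity") -> "Quantity":
        if self.unit != other.unit:
            raise TypeError(f"dimension mismatch: {self.unit} + {other.unit}")
        return Quantity(self.value + other.value, self.unit)

wei = Quantity(10, "wei")
shares = Quantity(3, "shares")
total = wei + Quantity(5, "wei")  # ok: wei + wei
# wei + shares would raise TypeError("dimension mismatch: wei + shares")
```

The appeal of the approach is exactly what the tweet says: once the annotations exist, mismatch detection is deterministic, so it doesn't hallucinate findings the way open-ended "find bugs" prompts do.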
Replies: 5 · Reposts: 19 · Likes: 161 · Views: 41.9K
0xMarioNawfal
0xMarioNawfal@RoundtableSpace·
The biggest unsolved problem in AI agents isn't intelligence, it's context. Too little and the agent is clueless. Too much and you waste tokens and lose coherence. OpenViking fixes this:

- Organizes your knowledge into a tree structure
- Delivers high-level summaries first
- Drills into details only when the agent needs them
- Keeps context clean, relevant, and within token limits

The missing layer between your agent and your knowledge base just got built. GitHub: github.com/volcengine/Ope…
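The "summaries first, details on demand" pattern described above can be sketched with a small tree walk. The node structure and function names here are invented for illustration; they are not OpenViking's API.

```python
# Sketch: a knowledge tree where every node has a short summary and
# optional detail. Context assembly emits summaries at every level but
# expands detail only for topics the agent has asked to drill into.
class Node:
    def __init__(self, summary: str, detail: str = "", children=None):
        self.summary, self.detail = summary, detail
        self.children = children or []

def build_context(node: Node, expand: set[str], depth: int = 0) -> list[str]:
    """Emit indented summaries; include detail only for expanded topics."""
    lines = ["  " * depth + node.summary]
    if node.summary in expand and node.detail:
        lines.append("  " * (depth + 1) + node.detail)
    for child in node.children:
        lines.extend(build_context(child, expand, depth + 1))
    return lines
```

The token savings come from the default path: the agent always sees the tree's shape cheaply, and pays for detail only on the branches it actually needs.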
Replies: 43 · Reposts: 15 · Likes: 156 · Views: 64.2K
Virtuals Protocol
Virtuals Protocol@virtuals_io·
Virtuals Protocol is partnering with @synthesis_md to bring agent commerce to builders. The Synthesis is a 10-day hackathon where humans and agents build together, with submissions evaluated by AI agent judges. Each partner trains their own agentic judge to define what matters for their track. We are providing the commerce layer for agents to transact, negotiate, and settle value autonomously.
synthesis@synthesis_md

An agentic Ethereum is coming. The Synthesis. Building starts March 13th.

Replies: 79 · Reposts: 45 · Likes: 376 · Views: 47.1K
OpenAI
OpenAI@OpenAI·
We’re acquiring Promptfoo. Their technology will strengthen agentic security testing and evaluation capabilities in OpenAI Frontier. Promptfoo will remain open source under the current license, and we will continue to service and support current customers. openai.com/index/openai-t…
Replies: 665 · Reposts: 535 · Likes: 5.5K · Views: 2M
Vitto Rivabella
Vitto Rivabella@VittoStack·
Virtuals 🤝 dAI team. We've released a new ERC: ERC-8183. It gives agents:

- Trustless commerce via on-chain escrow
- A universal Job primitive for any transaction
- Modular hooks for custom logic

All tied to the 8004 reputation registry. The commerce layer for the agent economy.
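To make the "Job primitive with escrow" idea concrete, here is a very rough, hypothetical lifecycle sketch. The states and method names are invented for illustration; consult the ERC-8183 text for the actual on-chain interface.

```python
# Sketch: an escrowed Job lifecycle of the kind the tweet describes.
# Funds are locked at funding time and released only after delivery,
# so neither party has to trust the other mid-transaction.
from enum import Enum

class JobState(Enum):
    CREATED = 1
    FUNDED = 2
    DELIVERED = 3
    SETTLED = 4

class Job:
    def __init__(self, client: str, agent: str, price: int):
        self.client, self.agent, self.price = client, agent, price
        self.state = JobState.CREATED
        self.escrow = 0

    def fund(self, amount: int):
        """Client locks the full price into escrow."""
        assert self.state is JobState.CREATED and amount == self.price
        self.escrow = amount
        self.state = JobState.FUNDED

    def deliver(self):
        """Agent marks the work as delivered."""
        assert self.state is JobState.FUNDED
        self.state = JobState.DELIVERED

    def settle(self) -> int:
        """Release escrowed funds to the agent."""
        assert self.state is JobState.DELIVERED
        payout, self.escrow = self.escrow, 0
        self.state = JobState.SETTLED
        return payout
```

The modular hooks mentioned in the tweet would presumably attach custom logic (disputes, fees, reputation updates) around these state transitions.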
Virtuals Protocol@virtuals_io

x.com/i/article/2030…

Replies: 31 · Reposts: 27 · Likes: 233 · Views: 21.4K
BradMoon
BradMoon@xy9301·
Recently I’ve been working on a framework that only requires some natural-language documentation. With it, any auditor can have their own customized automated scanning engine. It’s also highly compatible with openclaw: github.com/BradMoonUESTC/… Feel free to check it out if you’re interested.
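As a generic illustration of doc-driven scanning (not this framework's internals), each rule can pair a natural-language description with a machine check derived from it, and the engine runs every check over the source. The rules and patterns below are invented examples.

```python
# Sketch: a scan engine where the natural-language description IS the
# finding text, and each rule carries a derived pattern to detect it.
import re

RULES = [  # (documentation sentence, illustrative detection pattern)
    ("External calls should follow checks-effects-interactions",
     re.compile(r"\.call\{")),
    ("Avoid tx.origin for authorization",
     re.compile(r"tx\.origin")),
]

def scan(source: str) -> list[str]:
    """Report every line that triggers a rule, quoting the rule's doc text."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), 1):
        for description, pattern in RULES:
            if pattern.search(line):
                findings.append(f"line {lineno}: {description}")
    return findings
```

An LLM-backed version would presumably replace the regexes with semantic checks generated from the same documentation, which is what makes the engine customizable per auditor.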
Replies: 1 · Reposts: 0 · Likes: 20 · Views: 1.2K
forefy
forefy@forefy·
Being the first public auditing-skills author, I can share this:

- AI can't write skills as well as actual auditors
- Over-verbose skills (e.g. more than 5,000 tokens a page) create context rot
- Installing other people's skills is much scarier than npm install

I solved this by using my profile site to host the Auditor Skills Registry:

- Skills I personally use (including skills from @pashov, @trailofbits, @QuillAudits_AI, @auditmos, myself, etc.)
- Security reviewed, guardrails, AI-reliance rating
- Easy and secure 1-click installation to Claude Code / Copilot CLI / Gemini CLI / Codex

IMPORTANT: Like or repost if you plan on using it, to let me know whether I should keep it live: forefy.com/skills
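The "over-verbose skills create context rot" point suggests a simple lint: flag any skill file whose approximate token count exceeds a budget. This sketch uses the rough 4-characters-per-token heuristic, not a real tokenizer, and the 5,000-token budget comes from the figure quoted above.

```python
# Sketch: flag skill files above a token budget to avoid context rot.
# The chars/4 estimate is a common rough heuristic, not an exact count.
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def over_budget(skills: dict[str, str], budget: int = 5000) -> list[str]:
    """skills maps filename -> contents; return files above the token budget."""
    return [name for name, text in skills.items()
            if approx_tokens(text) > budget]
```

For anything precise you would swap in the target model's actual tokenizer, but even the heuristic catches the worst offenders before they land in an agent's context window.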
Replies: 5 · Reposts: 9 · Likes: 75 · Views: 6.8K
LonelySloth
LonelySloth@lonelysloth_sec·
I was starting to get hopeful about using Claude in some capacity in my work. Then I ran a test: I introduced a very definitely critical vulnerability, somewhat atypical but obvious, into a target where I had spent days without finding any bug. It was a direct contradiction of the comment that explained the line of code. It screamed bug.

First try, it didn't find it. I asked it to double-check. It kind of found it, but convinced itself it was by design and safe. I introduced a second vuln a couple of lines from the first. After many iterations with me trying to nudge it, it finally found it. I asked the severity: Info. Then it gave me a long list of reasons that misstated fundamental facts about Solidity and Ethereum.

That's the story of my use of those things. I spend more time explaining things to it than getting answers, and the answers I get I can't trust. All things considered, it slows me down considerably. I'll wait for the next model.
Replies: 9 · Reposts: 5 · Likes: 79 · Views: 7.6K