Anto

556 posts

@blocksec

Security Eng @eigenlayer , prev @coinbase

Mempool · Joined April 2022
2K Following · 1.8K Followers
Anto @blocksec
@ar0cket1 Thanks for sharing :) will run some experiments
1
0
1
19
ar0cket1 @ar0cket1
@blocksec Well, RL scaling works with a basically unlimited upper bound until you saturate your capacity. Scaling the base helps the RL scaling do better. The new Mythos checkpoint is more RL scaling, for instance.
1
0
1
41
ar0cket1 @ar0cket1
imo this big jump is not really a research jump from the current line but just scaling. I think we would very much be on the line (or not as far off) if GPT 5.5 and Mythos were the same size as their previous models (5.4 and Opus).
AI Security Institute @AISecurityInst

Our evaluations show that frontier AI's cyber capabilities are advancing quickly. The length of cyber tasks frontier models can complete has been doubling every few months, and this rate has become faster over time, with recent models exceeding our previous trends. 🧵

1
0
1
178
Anto @blocksec
@ar0cket1 I wonder whether the same base model can hill-climb with RL, or whether it needs a larger base too :)
1
0
0
17
ar0cket1 @ar0cket1
@blocksec More of a capacity thing than data. GPT 5.5 and Mythos are larger than the previous trend and thus come with stronger-than-trend improvements. You can't use test-time scaling (TTS) to make a weaker model like Kimi do nearly as well as a stronger model. TTS is bounded by raw capabilities.
1
0
0
32
Anto @blocksec
@ar0cket1 Okay, so a larger pre-train. When backtesting on larger codebases, I did see improvements in performance with TTS. My intuition was that the pre-train had enough data and TTS helps access that?
1
0
0
19
ar0cket1 @ar0cket1
@blocksec This isn’t what I’m saying. I’m saying that the shift off the trend line comes rather from the model parameter-size scaling that has happened recently. Also, Kimi likely wouldn’t come close to this. You can only test-time scale a little until you saturate it.
1
0
1
84
Anto @blocksec
Opus this, GPT that bro, it all depends on your task at hand and whether it is in the model's distribution. If it is, it's great; if it isn't, it's just okay. There is no overall better model; it depends on your task.
1
0
1
142
Anto @blocksec
@hrkrshnn This framing makes sense, but I would imagine there are a lot more non-Claude Code people out there? I would think that’s the 90%.
0
0
2
107
Anto @blocksec
An email is a single logprob update. A conversation is online learning with back-prop.
0
0
1
108
cat @_catwu
Claude Security is now in public beta, built into Claude Code on the web. Point it at a repo, get validated vulnerability findings, and fix them in the same place you're already writing code. claude.com/product/claude…
22
24
435
52.3K
Andrea Michi @andreamichi
I’m excited about our results and very proud of the challenges our small (but mighty) team solved. We’ve already tackled a restricted context window that required us to learn summarization during training, and learned to show generalization from smart contracts to all types of vulnerabilities. depthfirst.com/post/dfs-mini1…
1
0
1
193
Andrea Michi @andreamichi
This week @depthfirstlabs introduced dfs-mini1, a security model trained via reinforcement learning to detect vulnerabilities in smart contracts. The model achieves Pareto optimality on OpenAI’s EVMBench Detect and SOTA at pass@8, beating frontier models at a fraction of the cost.
7
13
42
7.6K
Josselin Feist @Montyly
Today I am releasing IsItVulnerable, a new tool I’ve been working on for the past several months: github.com/montyly/isItVu…
It builds on recent LLM progress and over a decade of experience building security tools. I developed a new technique that combines abstract interpretation with machine learning.
The key insight is that this method abstracts the intelligence away entirely. I call it Abstract Intelligence, or AI.
The result is a major breakthrough in program analysis: IsItVulnerable finds all bugs with 100% recall. Yes, all bugs. Fully guaranteed.
I have tested it extensively, and it has never failed. The results are honestly incredible.
April 1, 2026 marks a turning point for security, and the industry will never be the same.
My DMs are open for investors. Entry ticket starts at $500k.
33
21
210
13.4K
EigenCloud @eigencloud
if you build agents, tomorrow is a good day to be online
38
14
107
16.1K
Anto @blocksec
All the big labs have their AI security products at this point. @OpenAI has Aardvark, aka Codex Security. @AnthropicAI just announced Claude Code Security. The next frontier is RFT. More on that soon!
Claude @claudeai

Introducing Claude Code Security, now in limited research preview. It scans codebases for vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix issues that traditional tools often miss. Learn more: anthropic.com/news/claude-co…

2
0
16
5.7K