harisec

3.9K posts

@har1sec

Interested in web security, bug bounties, machine learning and investing. SolidGoldMagikarp. Orson Kovacs.

SolidGoldMagikarp · Joined September 2010
2.8K Following · 8.4K Followers
harisec retweeted
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger findings:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
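The QKnorm oversight Karpathy mentions can be illustrated with a minimal, self-contained sketch (pure Python with illustrative names; this is not nanochat's actual code). Normalizing queries and keys to unit length bounds every attention logit to [-1, 1], so without a learned scale multiplier the softmax over the keys stays close to uniform ("too diffuse"); multiplying the cosine logits by a larger scale sharpens the distribution:

```python
import math

def unit(v):
    """Normalize a vector to unit length (the QK-norm step)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_weights(q, keys, scale=1.0):
    """Attention weights with QK-norm: each logit is scale * cos(q, k),
    so without a scale multiplier it is confined to [-1, 1]."""
    logits = [scale * sum(a * b for a, b in zip(unit(q), unit(k)))
              for k in keys]
    return softmax(logits)

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
diffuse = attention_weights(q, keys, scale=1.0)  # spread across all keys
sharp = attention_weights(q, keys, scale=8.0)    # concentrates on the match
```

Even though the first key matches `q` exactly, at `scale=1.0` it receives only about two thirds of the attention mass; at `scale=8.0` it receives nearly all of it, which is the sharpening effect the agent's multipliers provided.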
Andrej Karpathy tweet media
harisec@har1sec·
@caseyjohnellis In practice it doesn't really matter; there is more than enough public security material for LLMs on the public net. The latest models like Opus 4.6 are insanely great at security tasks.
harisec@har1sec·
@tqbf IMO it's only a matter of time before every security researcher who's being honest with themselves says the same.
harisec retweeted
Thomas H. Ptacek@tqbf·
Nicholas Carlini at [un]prompted. If you know Carlini, you know this is a startling claim.
Thomas H. Ptacek tweet media
harisec retweeted
Daniel Cuthbert@dcuthbert·
Everyone today is a hacker in a sense, but there are very few OG hackers on whose shoulders we stand. Oh dude, Felix “FX” Lindner, you were so much a hacker's hacker and you will be missed. RIP my friend, and thank you.
Daniel Cuthbert tweet media
harisec@har1sec·
@zseano @Mohnad IMO, in a few years it will be a battle royale of the AI bots: the person who has the cheapest/fastest/most intelligent AI bot will make most of the money. Humans will still find the most clever bugs (for a while) but won't make enough money for it to be a full-time job.
zseano@zseano·
@Mohnad We are a few years away in my opinion, but I honestly think in 5 years this industry is going to look very different. Still time to make a couple mill from bug bounties, but IMO this opportunity isn't going to be around forever.
zseano@zseano·
one day we will look back at bug bounty days and think “damn… we had it good”
harisec@har1sec·
@senorarroz Maybe not today, but if you have that wording in your TOS, it's just a matter of time until it happens.
Alex Rice@senorarroz·
Not all AI is created equal! ❌Training GenAI on researcher submissions: No. docs.hackerone.com/en/articles/10… ❎Good ol' ML models: Yes, for 10+ years under our terms. We hear y'all -- making our terms clearer on this new distinction is coming. Thanks for keeping us transparent.
zseano@zseano

@ahacker1_h1 @Radiowebcc Wow… I thought h1 said they were not using our data to train their AI model. I’m going to ask h1 to clarify 🧐

harisec retweeted
Kling AI@Kling_ai·
Kling 3.0 is truly "one giant leap for AI video generation"! Check out this amazing mockumentary from Kling AI Creative Partner Simon Meyer!
harisec retweeted
Boris Cherny@bcherny·
When I created Claude Code as a side project back in September 2024, I had no idea it would grow to be what it is today. It is humbling to see how Claude Code has become a core dev tool for so many engineers, how enthusiastic the community is, and how people are using it for all sorts of things, from coding to devops to research to non-technical use cases. This technology is alien and magical, and it makes it so much easier for people to build and create. Increasingly, code is no longer the bottleneck.

A year ago, Claude struggled to generate bash commands without escaping issues. It worked for seconds or minutes at a time. We saw early signs that it might become broadly useful for coding one day. Fast forward to today: in the last thirty days, I landed 259 PRs -- 497 commits, 40k lines added, 38k lines removed. Every single line was written by Claude Code + Opus 4.5. Claude consistently runs for minutes, hours, and days at a time (using Stop hooks).

Software engineering is changing, and we are entering a new period in coding history. And we're still just getting started.
Boris Cherny tweet media
harisec@har1sec·
@karpathy This is very surprising. I just asked Claude Opus to explain LSP and pasted Karpathy's tweet without any mention of Karpathy, and it inferred it was a tweet from him. What is happening here? claude.ai/share/50e3a1f7…
harisec@har1sec·
@wunderwuzzi23 Good luck with your talk, I'm sure it will be great. I'm in Hamburg but didn't manage to get a 39c3 ticket :(
Johann Rehberger@wunderwuzzi23·
Creating some new last-minute artwork for my CCC talk tomorrow, going Goethe's sorcerer's-apprentice style
Johann Rehberger tweet media
Taelin@VictorTaelin·
TBH, every time the AI fails, I mentally blame it on you. Right now GPT-5.2 noticed that the parser was counting variables incorrectly, causing a linearity bug. The solution? "Ignore the parser counter and implement a separate counter."

At this point, this isn't about being dumb. This is about making a bad decision that under no circumstances would be good. Either we remove the parser counter and use a separate function as the source of truth, or we keep it and fix it. But such insane duct-taping has no place in a serious codebase, and that idea would never have occurred to an intelligence that evolved to learn coding from a pure blank state. It must have been corrupted by evil forces that only humans can produce.

So I can't help but wonder... Who did it learn that from? I blame it on you
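The anti-pattern being described, bolting a second counter onto a parser instead of fixing the one it already has, can be sketched in a few lines. The `Parser` below is entirely hypothetical (not GPT-5.2's actual output or Taelin's codebase); it just makes the "two sources of truth" problem concrete:

```python
class Parser:
    """Hypothetical parser whose variable counter is buggy: it counts
    every occurrence, so a variable seen twice is counted twice."""

    def __init__(self):
        self.var_count = 0

    def parse_var(self, name):
        self.var_count += 1  # bug: double-counts repeated names
        return name


# Duct-tape "fix": leave the broken counter in place and keep a second,
# separate tally. Now two sources of truth can silently disagree.
def count_vars_separately(names):
    return len(set(names))


# Proper fix: repair the parser's own counter so it stays the single
# source of truth (count each distinct name once as it is registered).
class FixedParser(Parser):
    def __init__(self):
        super().__init__()
        self._seen = set()

    def parse_var(self, name):
        if name not in self._seen:
            self._seen.add(name)
            self.var_count += 1
        return name
```

With input `["x", "y", "x"]`, the broken parser reports 3 variables while the separate tally reports 2; the fixed parser makes `var_count` itself report 2, so no shadow counter is needed.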
harisec@har1sec·
@moyix lol, I know the feeling. It's "production ready" all over again
Brendan Dolan-Gavitt@moyix·
Claude has now claimed it found the "smoking gun" in these log files about half a dozen times. We need better gun control, or at least an anti-smoking campaign for the guns so they don't get lung cancer
harisec retweeted
Ivan at Wallarm / API security solution
Looking for a security researcher with a great public profile. Remote. API/AI exploits, with a focus on novel techniques. No XSSers please ;) Reply here or DM. Please repost.
Damian Strobel@damian_89_·
So what do you bug bounty guys say? I found a complex chain (auth bypass, code injection -> code exec in GitLab CI -> exfil of prod secrets via HTTP) - got just 50% of the full P1 payout because I am told that some internal system reported it immediately... Fair? Not fair?
Damian Strobel tweet media
harisec retweeted
Marius Avram@securityshell·
Holy shit… the exploitation of CVE-2025-55182 has reached a new level. There’s now a publicly available Chrome extension on GitHub that automatically scans for and exploits vulnerable sites as you browse. Absolutely wild. 🤦‍♂️
Marius Avram tweet media
harisec@har1sec·
@stdoutput Thank you for publishing your analysis; finally some real information, not just AI slop.
harisec retweeted
Moritz Sanft@stdoutput·
Since I started to analyze CVE-2025-55182 (React/Next.js RCE) at work today, I decided to publish my analysis findings so far, given all the fuss about the vulnerability: github.com/msanft/CVE-202… Feel free to contribute to the search for a proper RCE sink!
harisec retweeted
shubs@infosec_au·
Our Security Research team at @SLCyberSec just published a high-fidelity detection mechanism for the Next.js/RSC RCE (CVE-2025-55182 & CVE-2025-66478) - slcyber.io/research-cente…. There are a lot of PoCs on GitHub that are adding noise to the problem; I hope this helps people!