Abhi

400 posts
@bufoverflow

cto & entropy weaver @polychain. long (🇺🇸, crypto). grew up with the monks #!/bin/bash # opinions are mine, not my employer's chown $USER:opinions

software x finance interface · Joined October 2012
869 Following · 279 Followers
Abhi @bufoverflow
Biggest unlock since GPT-4 is generating the features and the attacks at the same time. The shift-left movement is structurally built for adversarial AI: during codegen the model has full context (intent, deps, IaC topology, and implied threat models). That attack-surface density decays as you move right, where you're reduced to pattern matching. CTOs/CSOs can finally turn software and security engineering into a dialectical process. Red teaming isn't a separate team or a tool that runs after you deploy; it is literally a second pass at codegen through the same model with an adversarial dev.prompt
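A minimal sketch of that second adversarial pass, assuming nothing beyond a generic chat-completion call. `complete()` below is a hypothetical placeholder, and the prompts and pipeline shape are illustrative, not a shipped tool:

```python
def complete(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion call to any provider."""
    return f"<model output for: {prompt[:40]}...>"

def generate_feature(intent: str, context: str) -> str:
    # Pass 1: codegen with full context (intent, deps, IaC topology).
    return complete(
        "You are a software engineer. Implement this feature.\n"
        f"Intent: {intent}\nRepo/infra context: {context}"
    )

def red_team(code: str, intent: str, context: str) -> str:
    # Pass 2: the same model, re-prompted as an adversary against its own output.
    return complete(
        "You are an attacker with full knowledge of this codebase.\n"
        f"Intent: {intent}\nContext: {context}\nCode:\n{code}\n"
        "List concrete exploits and the diffs that close them."
    )

if __name__ == "__main__":
    intent = "expose a signed-URL upload endpoint"
    context = "FastAPI service, S3 bucket via Terraform, JWT auth"
    code = generate_feature(intent, context)
    findings = red_team(code, intent, context)
```

The point of the design is that both passes share the same context window, so the adversarial pass sees everything the feature pass saw.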
Marc Andreessen 🇺🇸 @pmarca
Every security engineer knows "security through obscurity" doesn't work. But that's how we've actually been operating for the entire existence of computers, until now. AI can finally fix that.
Abhi @bufoverflow
nope. i’m eaglemaxxing. 🦅🇺🇸
[image]
Abhi @bufoverflow
So what does a Homebrew Gradient Club mean for us mortals? Compute or harness isn't the only moat. Autoresearch-accelerated proprietary gradient and mechanism design is your most durable moat.
Abhi @bufoverflow
People are overthinking and underestimating @karpathy's autoresearch at the same time. It isn't RSI or just autonomous hyperparam search, but something weirder. It feels like the Homebrew Computer Club moment for model, optimizer, and training-loop experimentation.
Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I've done daily for 2 decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experiment results and used them to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually before, and they stack up and actually improved nanochat. Among the bigger ones:

- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.

And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

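A stripped-down sketch of the loop described above: propose a change, score it with a cheap proxy run, keep it only if validation loss improves, and plan the next experiment from the log. `propose_change` and `train_and_eval` are hypothetical hooks, not part of nanochat:

```python
import random

def propose_change(history):
    """Hypothetical hook: the agent plans the next experiment from past results."""
    return f"change-{len(history)}"

def train_and_eval(changes):
    """Hypothetical proxy run: train a small model, return its validation loss."""
    return random.random()  # stand-in for a real short training run

def autoresearch(baseline_loss: float, rounds: int = 20) -> list:
    """Greedy experiment loop: keep a change only if it improves val loss."""
    best_loss, accepted, history = baseline_loss, [], []
    for _ in range(rounds):
        change = propose_change(history)            # agent reads past results
        loss = train_and_eval(accepted + [change])  # small-scale proxy metric
        history.append((change, loss))
        if loss < best_loss:                        # additive improvement: keep
            best_loss = loss
            accepted.append(change)                 # stack it onto the config
    return accepted

if __name__ == "__main__":
    kept = autoresearch(baseline_loss=1.0)
```

The promote-to-larger-scale step from the tweet would sit outside this loop: rerun `accepted` at depth=24 and keep only what transfers.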
Abhi retweeted
Nivi @nivi
You can make a significant contribution to practically any field by applying Deutsch’s theories of knowledge generation.
Abhi @bufoverflow
Ask your fav model: "Can you estimate how many tokens I have used across all of my interactions?" You have my permission to mog me with the rookie-numbers meme.
[image]
Abhi @bufoverflow
Following up on the agent-defined-x slope. Software-defined-x (networking, finance) was the last major infrastructure abstraction shift, where the control plane decoupled from the execution plane. Agent-defined-x is the same move one layer up: the control plane isn't code anymore, it's intent. Agent-defined security, agent-defined compliance, agent-defined trading. The architecture that makes it work: agents in the loop at machine speed, experts on the loop for oversight.
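A hedged sketch of that in-the-loop/on-the-loop split. The event shape, `decide()` policy, and confidence threshold are all illustrative assumptions standing in for whatever escalation policy you'd actually run:

```python
from queue import Queue

expert_queue: Queue = Queue()  # experts ON the loop drain this asynchronously

def decide(event: dict) -> tuple[str, float]:
    """Hypothetical agent policy; swap in a real model call."""
    return "allow", 0.55

def handle(event: dict, threshold: float = 0.9) -> str:
    """Agent IN the loop: act at machine speed, escalate outliers."""
    action, confidence = decide(event)
    if confidence >= threshold:
        return action                   # hot path, no human involved
    expert_queue.put((event, action))   # outlier: route to a human expert
    return "hold"

if __name__ == "__main__":
    print(handle({"type": "login", "risk": "geo-anomaly"}))
```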
Abhi @bufoverflow
@garrytan's yc pod opens with a lobster costume. Tech twitter saw the thumbnail and 30s of the video and chose violence. If you get past the joke bit and read between the lines, you'll realize the signal isn't costumes or the hype cycle but a new computing primitive: an abstraction that plans, reasons about tools, and runs adversarial simulations based only on your intents, not prior code. We have never had a computer like this before. The flywheel isn't human-in-the-loop but agent-in-the-loop (hot path, machine speed) escalating to expert-ON-the-loop (oversight, catching outliers). Every tech wave passes through a hype cycle where interest ebbs. The "path of least resistance" dunkers always confuse intercept for slope. This has the same energy as 2018 crypto twitter's "cRyPtO iS pOnZi". Ignore the costume and focus on the slope (agent-defined-x).
Y Combinator @ycombinator

With the takeoff of OpenClaw and MoltBook, a new agent-driven economy is taking shape. On the @LightconePod, we took a look at the explosive growth of AI dev tools and whether the time has come for builders to make something agents want.

00:00 - Intro
02:12 - No human involvement is changing the experience
04:55 - Does YC need to change its motto?
07:48 - Email tools and agent infrastructure
09:36 - Agent-driven documentation
13:00 - Swarm intelligence
15:36 - Content generation and dead Internet theory
18:12 - Growth, rules, and founder insights

Abhi @bufoverflow
@martin_casado The "execution eats strategy for breakfast" folks are in for whiplash.
martin_casado @martin_casado
We're very much in the electric scissors phase of AI software development ...
[image]
Abhi @bufoverflow
@agupta I predict token budget (the non-crypto kind) becomes your new employee perk.
Abhi @bufoverflow
Friends in the gc: "Abhi, join this time or else you'll have FOMO."
My wife: "He doesn't have FOMO, he has FOBI - Fear of Being Included."
Knew I snagged the right one.
Abhi @bufoverflow
This one lived in drafts for too long. @sama The whole unwrap experience was so rewarding. Mad props for shipping this!
[four images]
Abhi @bufoverflow

Santa season = recap season. Spotify Wrapped, GitHub, YT.. where is my LLM Wrapped? @sama I'd love a '25 GPT Recap. 🎁 Feels like a missed opportunity. I love the easter egg, but give me the real Wrapped: how I used LLMs, how it improved me, and how I improved it. Would love a recap of:
- Total "thought minutes"
- % of knowledge work
- Which chats were flagged as salient
- Best takes
- Emergent insights

Abhi @bufoverflow
@hamptonism Daily turmeric, sea salt and hot water gargle. Passed down through generations in my family.
Abhi @bufoverflow
Everyone appreciates the chad penguin's grind (tenacity, rebellion, existential quest). But the real hero is the mountain itself. Without the abyss and that towering pull there is no yearning. The void calls; the waddle answers 🐧🏔️
Marc Andreessen 🇺🇸 @pmarca

This is a test.
