@coreyganim Just curious to know two things:
- how do you handle Paperclip token usage?
- how did you integrate gstack skills here when they are mostly Q&A driven?
Did you fine-tune anything?
What I love about this article is that it shows you what to actually DO with Paperclip.
The stack:
→ Paperclip = your AI company (assigns work, tracks progress)
→ gstack = your engineering team (15 specialist skills from Garry Tan)
→ autoresearch = your R&D lab (100 experiments while you sleep, from Karpathy)
The 10-minute setup:
STEP 1: npx paperclipai onboard --yes
Open dashboard → Create company → Hire your CEO agent
STEP 2: Clone gstack to ~/.claude/skills/gstack
Now your agents can: /office-hours (plan) → /review (check code) → /qa (test in real browser) → /ship (deploy)
STEP 3: Build autoresearch as a skill
Give it a research question → Sleep → Wake up to 100 completed experiments
The killer move: Run 10-15 gstack commands simultaneously. One agent plans, another tests, another ships. All at once.
Three free tools. Zero employees. One AI company.
Claude Code 2.1.78 has been released.
26 CLI changes, 3 system prompt changes
Highlights:
• Response text streams line-by-line as it's generated, providing immediate partial output for faster feedback
• Third-party uploads show a clear public-exposure warning to reduce accidental sharing of sensitive data
• StopFailure hook fires when a turn ends from API errors (rate limit/auth), enabling explicit error handling
Complete details in thread ↓
We're shipping a new feature in Claude Cowork as a research preview that I'm excited about: Dispatch!
One persistent conversation with Claude that runs on your computer. Message it from your phone. Come back to finished work.
To try it out, download Claude Desktop, then pair your phone.
Over the past couple of months experimenting with multi-agent AI, one problem kept appearing: agent handoffs across frameworks. So we built OADP — Open Agent Delegation Protocol, a minimal standard for agent-to-agent delegation. #AI #Protocol #Agents
🔗 github.com/Open-Agent-Del…
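The post doesn't show OADP's wire format, but a minimal agent-to-agent delegation message might look like the sketch below. Every field name here is my own guess for illustration, not taken from the actual OADP spec:

```python
import json
import uuid
from datetime import datetime, timezone

def make_delegation(task: str, from_agent: str, to_agent: str) -> dict:
    """Build a hypothetical delegation envelope.

    Field names are illustrative assumptions, not OADP's real schema.
    """
    return {
        "id": str(uuid.uuid4()),          # unique message id
        "type": "delegation",             # message kind
        "from": from_agent,               # delegating agent
        "to": to_agent,                   # agent receiving the task
        "task": task,                     # natural-language task spec
        "created_at": datetime.now(timezone.utc).isoformat(),
        "status": "pending",              # lifecycle state
    }

# Example: a planner agent handing work to a researcher agent
msg = make_delegation("summarize repo README", "planner", "researcher")
print(json.dumps(msg, indent=2))
```

The appeal of a standard like this is that any framework able to parse the envelope can accept work from any other, regardless of how each agent is implemented internally.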
Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of my daily work for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experimental results and used them to plan the next experiments. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:
- It noticed an oversight: my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism.
github.com/karpathy/nanoc…
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.
And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code; then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)
The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.
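The loop described above can be sketched in a few lines. In this toy version the real 5-minute training run is replaced by a cheap stand-in objective so the propose-evaluate-keep structure is visible; everything here is an illustrative sketch, not autoresearch's actual code:

```python
import random

def evaluate(cfg: dict) -> float:
    """Stand-in for a 5-minute training run: returns a toy 'validation
    loss' minimized at lr=0.01, wd=0.1 (a made-up objective)."""
    return (cfg["lr"] - 0.01) ** 2 + (cfg["wd"] - 0.1) ** 2

def propose(cfg: dict, rng: random.Random) -> dict:
    """Mutate one hyperparameter, like the agent editing the script."""
    new = dict(cfg)
    key = rng.choice(list(new))
    new[key] *= rng.uniform(0.5, 2.0)
    return new

def autoresearch(steps: int = 200, seed: int = 0) -> tuple[dict, float]:
    rng = random.Random(seed)
    best_cfg = {"lr": 0.1, "wd": 1.0}     # deliberately mistuned start
    best_loss = evaluate(best_cfg)
    for _ in range(steps):                # each iteration ~ one run
        cand = propose(best_cfg, rng)
        loss = evaluate(cand)
        if loss < best_loss:              # keep only improvements,
            best_cfg, best_loss = cand, loss  # like a git commit
    return best_cfg, best_loss
```

The real system replaces `evaluate` with an actual training run and `propose` with an LLM agent reading past results and editing code, but the accept-if-better accumulation of commits is the same skeleton.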
github.com/karpathy/autor…
Part code, part sci-fi, and a pinch of psychosis :)
I always wondered which Ollama model my machine could run properly.
Now you can easily check using llm-checker.
It's a CLI tool that inspects your system specs and recommends the best models to run on Ollama based on what your machine can handle.
npm install -g llm-checker
github.com/Pavelevich/llm…
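I haven't read llm-checker's source, but the core idea is roughly "map available memory to model sizes." A toy version of that mapping, where the tiers and model tags are my own illustrative guesses rather than the tool's actual logic:

```python
def recommend_models(ram_gb: float) -> list[str]:
    """Toy RAM-to-model mapping (thresholds and model tags are
    illustrative guesses, not taken from llm-checker itself)."""
    tiers = [
        (4, ["qwen2.5:0.5b", "tinyllama"]),
        (8, ["llama3.2:3b", "phi3:mini"]),
        (16, ["llama3.1:8b", "mistral:7b"]),
        (32, ["qwen2.5:32b", "mixtral:8x7b"]),
    ]
    picks: list[str] = []
    for min_ram, models in tiers:
        if ram_gb >= min_ram:
            picks = models  # highest tier the machine clears wins
    return picks
```

A real checker would also account for VRAM, quantization level, and CPU/GPU backend, which is presumably what the actual tool does.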
I owe a Public Apology to @OlaElectric and @bhash. 🙏
I previously analyzed your numbers and said the situation looked bad. I was wrong.
I’m sorry, but the numbers are actually much, much worse.
Here is the Q3 reality check that no one is talking about. 🧵👇
Layer 6: AI Systems
A model alone isn’t a product.
Real AI systems add:
• RAG
• Vector DB
• Batching
• Quantization
• Caching
Most hallucination issues are system problems.
Infrastructure defines scalability.
#RAG #MLOps #AIArchitecture
[DemystifyingAI using AI] - AI isn’t magic. It’s a 7-layer stack.
From silicon to chatbot, every AI product runs on the same hidden architecture.
Understand the layers → stop chasing hype → start making better decisions.
Thread 👇
#AI #DeepLearning #LLM