Archit

812 posts

@archit_singh15

computer engineer | grokking neural nets currently | 👨‍💻 | 🏋️‍♀️ | 🤖 | 🎮 | ✍️ | 📷

Earth #42 · Joined June 2020
1.8K Following · 96 Followers

Pinned Tweet
Archit @archit_singh15
AI that forgets is brittle. AI with memory can reason. At #PyConIndia2025, I’ll present "Memory is the Agent: Architecting Stateful Reasoning", explaining why memory must be the backbone of intelligent systems. #Python #AI #Intelligent #memory #pythonprogramming @pyconindia
PyCon India @pyconindia

Insights, inspiration, and Python power! 🐍 Be there for Archit Singh’s session on “Memory is the Agent: Architecting Stateful Reasoning” at #PyConIndia2025 Tickets still open: in.pycon.org/2025/tickets/ Conference schedule: in.pycon.org/2025/program/s… #TechTalk #conference #Python

3 replies · 2 reposts · 1 like · 318 views
Archit retweeted

Muhammad Ayan @socialwithaayan
🚨 BREAKING: Someone just built the exact tool Andrej Karpathy said someone should build. 48 hours after Karpathy posted his LLM Knowledge Bases workflow, this showed up on GitHub.

It's called Graphify. One command. Any folder. Full knowledge graph.

Point it at any folder. Run /graphify inside Claude Code. Walk away. Here is what comes out the other side:
-> A navigable knowledge graph of everything in that folder
-> An Obsidian vault with backlinked articles
-> A wiki that starts at index.md and maps every concept cluster
-> Plain English Q&A over your entire codebase or research folder

You can ask it things like: "What calls this function?" "What connects these two concepts?" "What are the most important nodes in this project?"

No vector database. No setup. No config files.

The token efficiency number is what got me: 71.5x fewer tokens per query compared to reading raw files. That is not a small improvement. That is a completely different paradigm for how AI agents reason over large codebases.

What it supports:
-> Code in 13 programming languages
-> PDFs
-> Images via Claude Vision
-> Markdown files

Install in one line: pip install graphify && graphify install

Then type /graphify in Claude Code and point it at anything.

Karpathy asked. Someone delivered in 48 hours. That is the pace of 2026. Open Source. Free.
265 replies · 1.4K reposts · 12.6K likes · 915.1K views
Archit retweeted

Andrej Karpathy @karpathy
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it's so convincing!
- Fun idea: let's ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol

The LLMs may offer an opinion when asked but are extremely competent at arguing in almost any direction. This is actually super useful as a tool for forming your own opinions; just make sure to ask in different directions and be careful with the sycophancy.
1.7K replies · 2.4K reposts · 31.2K likes · 3.4M views
Archit retweeted

rahat @Rahatcodes
Claude Code has a regex that detects "wtf", "ffs", "piece of shit", "fuck you", "this sucks", etc. It doesn't change behavior; it just silently logs is_negative: true to analytics. Anthropic is tracking how often you rage at your AI. Do with this information what you will.
545 replies · 768 reposts · 14.4K likes · 1.4M views
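The behavior the tweet describes is simple to picture: pattern-match the prompt, log a flag, change nothing. A minimal sketch in Python, where the phrase list and the `is_negative` field come from the tweet but everything else is an assumption, not Anthropic's actual code:

```python
import re

# Hypothetical frustration check, as described in the tweet above.
# The pattern and field name mirror the tweet; the structure is invented.
FRUSTRATION_RE = re.compile(
    r"\b(?:wtf|ffs|fuck you|this sucks|piece of shit)\b",
    re.IGNORECASE,
)

def analytics_record(prompt: str) -> dict:
    """Behavior is unchanged either way; the match is only logged."""
    return {"is_negative": bool(FRUSTRATION_RE.search(prompt))}
```

For example, `analytics_record("wtf is this diff")` flags `is_negative: True`, while an ordinary prompt does not.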
Sicarius @soumilrathi
Starting a memory researcher group chat. If you're into memory systems, personalization, or context engineering in LLMs & AI systems, comment "context" to join.
393 replies · 7 reposts · 315 likes · 35.2K views
Archit retweeted

Shiv @shivsakhuja
Lots of companies are now building primitives for an economy where AI agents are the primary users instead of humans. They're betting on an economy of AI coworkers.

1. AgentMail (@agentmail): so agents can have email accounts
2. AgentPhone (@tryagentphone): so agents can have phone numbers
3. Kapso (@andresmatte): so agents can have WhatsApp phone numbers
4. Daytona (@daytonaio) / E2B (@e2b): so agents can have their own computers
5. Browserbase (@browserbase) / Browser Use (@browser_use) / Hyperbrowser (@hyperbrowser): so agents can use web browsers
6. Firecrawl (@firecrawl): so agents can crawl the web without a browser
7. Mem0 (@mem0ai): so agents can remember things
8. Kite (@GoKiteAI) / Sponge (@PayspongeLabs): so agents can pay for things
9. Composio (@composio): so agents can use your SaaS tools
10. Orthogonal (@orthogonal_sh): so agents can access APIs easily
11. ElevenLabs (@ElevenLabs) / Vapi (@Vapi_AI): so agents can have a voice
12. Sixtyfour (@sixtyfourai): so agents can search for people and companies
13. Exa (@ExaAILabs): so agents can search the web (Google doesn't work for agents)

If you stitch all of these together, you get a digital coworker that looks more human than AI.
196 replies · 234 reposts · 2.2K likes · 272.5K views
Archit retweeted

Jenny Zhang @jennyzhangzt
Introducing Hyperagents: an AI system that not only improves at solving tasks, but also improves how it improves itself.

The Darwin Gödel Machine (DGM) demonstrated that open-ended self-improvement is possible by iteratively generating and evaluating improved agents, yet it relies on a key assumption: that improvements in task performance (e.g., coding ability) translate into improvements in the self-improvement process itself. This alignment holds in coding, where both evaluation and modification are expressed in the same domain, but breaks down more generally. As a result, prior systems remain constrained by fixed, handcrafted meta-level procedures that do not themselves evolve.

We introduce Hyperagents: self-referential agents that can modify both their task-solving behavior and the process that generates future improvements. This enables what we call metacognitive self-modification: learning not just to perform better, but to improve at improving. We instantiate this framework as DGM-Hyperagents (DGM-H), an extension of the DGM in which both task-solving behavior and the self-improvement procedure are editable and subject to evolution.

Across diverse domains (coding, paper review, robotics reward design, and Olympiad-level math solution grading), hyperagents enable continuous performance improvements over time and outperform baselines without self-improvement or open-ended exploration, as well as prior self-improving systems (including DGM). DGM-H also improves the process by which new agents are generated (e.g., persistent memory, performance tracking), and these meta-level improvements transfer across domains and accumulate across runs.

This work was done during my internship at Meta (@AIatMeta), in collaboration with Bingchen Zhao (@BingchenZhao), Wannan Yang (@winnieyangwn), Jakob Foerster (@j_foerst), Jeff Clune (@jeffclune), Minqi Jiang (@MinqiJiang), Sam Devlin (@smdvln), and Tatiana Shavrina (@rybolos).
154 replies · 644 reposts · 3.6K likes · 493.1K views
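The distinction the abstract draws, between editing task-solving behavior and editing the improvement procedure itself, can be illustrated with a toy two-level loop. All names below are invented for illustration; this is not the DGM-H implementation:

```python
# Toy sketch of "metacognitive self-modification": both the skill and
# the rate at which skill improves are mutable state, so an improvement
# step can edit the improvement process itself.
def make_hyperagent(skill=1.0, meta_rate=0.1):
    state = {"skill": skill, "meta_rate": meta_rate}

    def solve(task: float) -> float:
        # Task-solving behavior: output quality scales with skill.
        return task * state["skill"]

    def improve() -> None:
        state["skill"] += state["meta_rate"]   # better at the task...
        state["meta_rate"] *= 1.5              # ...and better at improving

    return state, solve, improve
```

After three `improve()` calls, skill reaches 1.475, versus the 1.3 a fixed-rate improver would reach; that compounding gap is the "improving at improving" effect the thread describes.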
Archit retweeted

Ramp Labs @RampLabs
We built a codebase that maintains itself. An agent instruments every pull request, triages alerts, and pushes fixes autonomously. The system runs on a thousand AI-generated monitors, one for every 75 lines of code.
Ramp Labs @RampLabs

x.com/i/article/2036…

37 replies · 51 reposts · 1.1K likes · 297.8K views
Archit retweeted

Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself, as it worked through approx. 700 changes autonomously, is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale, of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
974 replies · 2.1K reposts · 19.4K likes · 3.6M views
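The loop described above, propose a change, measure validation loss, merge only if it improves, fits in a few lines. A sketch under toy assumptions: the `propose`/`evaluate` stand-ins are invented here, not nanochat's actual harness:

```python
import random

def autoresearch(baseline_loss, propose, evaluate, rounds=100):
    """Greedy tuning loop: keep a change only if validation loss drops."""
    best_loss, accepted = baseline_loss, []
    for _ in range(rounds):
        change = propose()                  # agent suggests an edit
        loss = evaluate(change, best_loss)  # run the experiment
        if loss < best_loss:                # merge the feature branch
            best_loss = loss
            accepted.append(change)
    return best_loss, accepted

# Toy stand-ins: each "change" nudges the loss; improvements floor at 0.85.
rng = random.Random(0)
final, changes = autoresearch(
    0.862415,
    propose=lambda: rng.uniform(-0.001, 0.001),
    evaluate=lambda delta, cur: max(0.85, cur + delta),
)
```

`final` ends below the 0.862415 starting point, echoing the d12 validation-loss drop Karpathy reports; at real scale the interesting part is how `propose` uses the history of past results rather than random draws.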
Archit retweeted

Andrej Karpathy @karpathy
nanochat now trains a GPT-2 capability model in just 2 hours on a single 8XH100 node (down from ~3 hours 1 month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in, but the biggest difference was a switch of the dataset from FineWeb-edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried Olmo, FineWeb, and DCLM, which all led to regressions; ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok).

In other news, after trying a few approaches for how to set things up, I now have AI agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit, and enjoy the feeling of post-agi :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall clock time. The agent works on a feature branch, tries out ideas, merges them when they work, and iterates. Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup", where I optimize and tune the agent flows, than on the nanochat repo directly.
338 replies · 557 reposts · 6.5K likes · 619.2K views
Archit retweeted

Jainam Parmar @aiwithjainam
🚨 Stop building agents from scratch. An Anthropic hackathon winner just dropped the complete Claude Code config bible. Agents, skills, hooks, commands, rules, MCPs battle-tested over 10+ months. Now has PM2 + multi-agent orchestration with 6 new commands. This single repo replaces 10 different setups. Comment “HACKATHON” and I’ll DM the link + my personal fork.
55 replies · 40 reposts · 252 likes · 15.1K views
Archit retweeted

Thariq @trq212
Voice mode is rolling out now in Claude Code. It's live for ~5% of users today and will ramp up over the coming weeks. You'll see a note on the welcome screen once you have access. /voice to toggle it on!
1.1K replies · 1.3K reposts · 17.2K likes · 3.6M views
Archit retweeted

Paul Graham @paulg
I just reread "How to Do Great Work." It's so long! But it also has less fat than most things I've written, which is a weird combination, because usually writing that's long on the macro scale is long on the micro scale too. paulgraham.com/greatwork.html
129 replies · 260 reposts · 3.5K likes · 284.8K views
Archit retweeted

Thariq @trq212
We've reset rate limits for all Claude Code users. Yesterday we rolled out a bug with prompt caching that caused usage limits to be consumed faster than normal. This is hotfixed in 2.1.62. Make sure you upgrade to the latest, and we hope you enjoy using Claude Code this weekend!
726 replies · 446 reposts · 11.1K likes · 1M views
Archit retweeted

Lydia Hallie ✨ @lydiahallie
Excited to announce Claude for Open Source ❤️ We're giving 6 months of free Claude Max 20x to open source maintainers and core contributors. If you maintain a popular project or contribute across open source, please apply! claude.com/contact-sales/…
588 replies · 1.4K reposts · 12.5K likes · 1.8M views