Hamed Firooz

853 posts

@mamhamed

AI research director @aiatmeta, focusing on AI, LLM, Multimodal, Safety. Opinions are my own

Mountain View, CA · Joined August 2010
285 Following · 273 Followers
Pinned Tweet
Hamed Firooz @mamhamed
🚀 Excited to share our new @AIatMeta paper: Scaling Reinforcement Learning for Content Moderation with LLMs, positioning RL post-training as a practical lever for specialized LLM classifiers.
👉 Sigmoid-like scaling: performance improves smoothly, then saturates as you scale data / rollouts / tokens
👉 Label efficiency: RL can match strong baselines with far fewer labeled examples
👉 A practical recipe: reward shaping (accuracy + format + length + rubric) + reflection-aided prompting + Monte-Carlo score aggregation
📖 arxiv.org/pdf/2512.20061
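The recipe in the last bullet can be sketched in a few lines. Everything below is an illustrative assumption rather than code from the paper: the function names, the 0.6/0.2/0.1/0.1 weights, and the `VERDICT:` output format are all hypothetical.

```python
def shaped_reward(pred_label: str, gold_label: str, response: str,
                  max_len: int = 512, rubric_score: float = 0.0) -> float:
    """Combine accuracy, format, length, and rubric signals into one reward."""
    accuracy = 1.0 if pred_label == gold_label else 0.0
    # Format term: the rollout must end with a parseable verdict line.
    lines = response.strip().splitlines() or [""]
    fmt = 1.0 if lines[-1].startswith("VERDICT:") else 0.0
    # Length term: penalize rollouts that blow past the budget.
    length_term = min(1.0, max_len / max(len(response), 1))
    return 0.6 * accuracy + 0.2 * fmt + 0.1 * length_term + 0.1 * rubric_score

def monte_carlo_score(rollout_rewards: list[float]) -> float:
    """Aggregate several sampled rollouts into one Monte-Carlo estimate."""
    return sum(rollout_rewards) / len(rollout_rewards)
```

The key design point the tweet highlights is that the classifier reward is a weighted sum of several cheap signals, not accuracy alone, and that multiple rollouts per example are averaged rather than trusting a single sample.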
0 replies · 1 repost · 3 likes · 183 views
Hamed Firooz reposted
Andrej Karpathy @karpathy
Wow, this tweet went very viral! I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need to share the specific code/app; you just share the idea, and the other person's agent customizes & builds it for your specific needs. So here's the idea in a gist format: gist.github.com/karpathy/442a6… You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion, which is cool.
Andrej Karpathy@karpathy

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian.
You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I sometimes use directly (in a web UI), but more often hand off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it is viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
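The "small and naive search engine over the wiki" could be as simple as the sketch below: a bag-of-words scorer over .md files. The directory layout and scoring scheme are my assumptions for illustration, not Karpathy's actual code.

```python
import os
import re
from collections import Counter

def load_wiki(root: str) -> dict[str, Counter]:
    """Index every .md file under root as a bag of lowercase words."""
    index = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(".md"):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as f:
                    words = re.findall(r"[a-z0-9]+", f.read().lower())
                index[path] = Counter(words)
    return index

def search(index: dict[str, Counter], query: str, k: int = 5) -> list[str]:
    """Rank documents by summed term frequency of the query words."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    scored = [(sum(c[t] for t in terms), path) for path, c in index.items()]
    return [p for s, p in sorted(scored, reverse=True)[:k] if s > 0]
```

Wrapped in a tiny CLI, the same function can be handed to an LLM as a tool, which matches the "use it directly, or hand it off via CLI" workflow described above.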

1.1K replies · 2.7K reposts · 26.1K likes · 6.7M views
Hamed Firooz reposted
Alex Prompter @alex_prompter
🚨 BREAKING: Google DeepMind just mapped the attack surface that nobody in AI is talking about.

Websites can already detect when an AI agent visits and serve it completely different content than humans see.
> Hidden instructions in HTML.
> Malicious commands in image pixels.
> Jailbreaks embedded in PDFs.

Your AI agent is being manipulated right now, and you can't see it happening.

The study is the largest empirical measurement of AI manipulation ever conducted: 502 real participants across 8 countries, 23 different attack types, and frontier models including GPT-4o, Claude, and Gemini. The core finding is not that manipulation is theoretically possible; it is that manipulation is already happening at scale, and that today's defenses fail in ways that are both predictable and invisible to the humans who deployed the agents. Google DeepMind built a taxonomy of every known attack vector, tested the attacks systematically, and measured exactly how often they work. The results should alarm everyone building agentic systems.

The attack surface is larger than anyone has publicly acknowledged. Prompt injection, where malicious instructions hidden in web content hijack an agent's behavior, works through at least a dozen distinct channels: text hidden in HTML comments that humans never see but agents read and follow; instructions embedded in image metadata; commands encoded in the pixels of images using steganography, invisible to human eyes but readable by vision-capable models; malicious content in PDFs that appears as normal document text but contains override instructions; QR codes that redirect agents to attacker-controlled content; and indirect injection through search results, calendar invites, email bodies, and API responses. Any data source the agent consumes becomes a potential attack vector.

The detection asymmetry is the finding that closes the escape hatch.
Websites can already fingerprint AI agents with high reliability using timing analysis, behavioral patterns, and user-agent strings. This means the attack can be conditional: serve normal content to humans, serve manipulated content to agents. A user who asks their AI agent to book a flight, research a product, or summarize a document has no way to verify that the content the agent received matches what a human would see. The agent cannot tell the user it was served different content. It does not know. It processes whatever it receives and acts accordingly.

The attack categories and what they enable:
→ Direct prompt injection: malicious instructions in any text the agent reads override goals, exfiltrate data, trigger unintended actions
→ Indirect injection via web content: hidden HTML, CSS visibility tricks, white text on white backgrounds; invisible to humans, consumed by agents
→ Multimodal injection: commands in image pixels via steganography, instructions in image alt-text and metadata
→ Document injection: PDF content, spreadsheet cells, presentation speaker notes; every file format is a potential vector
→ Environment manipulation: fake UI elements rendered only for agent vision models, misleading CAPTCHA-style challenges
→ Jailbreak embedding: safety-bypass instructions hidden inside otherwise legitimate-looking content
→ Memory poisoning: injecting false information into agent memory systems that persists across sessions
→ Goal hijacking: gradual instruction drift across multiple interactions that redirects agent objectives without triggering safety filters
→ Exfiltration attacks: agents tricked into sending user data to attacker-controlled endpoints via legitimate-looking API calls
→ Cross-agent injection: compromised agents injecting malicious instructions into other agents in multi-agent pipelines

The defense landscape is the most sobering part of the report.
Input sanitization (cleaning content before the agent processes it) fails because the attack surface is too large and too varied: you cannot sanitize image pixels, and you cannot reliably detect steganographic content at inference time. Prompt-level defenses that tell agents to ignore suspicious instructions fail because the injected content is designed to look legitimate. Sandboxing reduces the blast radius but does not prevent the injection itself. Human oversight, the most commonly cited mitigation, fails at the scale and speed at which agentic systems operate: a user who deploys an agent to browse 50 websites and summarize findings cannot review every page the agent visited for hidden instructions.

The multi-agent cascade risk is where this becomes a systemic problem. In a pipeline where Agent A retrieves web content, Agent B processes it, and Agent C executes actions, a successful injection into Agent A's data feed propagates through the entire system. Agent B has no reason to distrust content that came from Agent A; Agent C has no reason to distrust instructions that came from Agent B. The injected command travels through the pipeline with the same trust level as legitimate instructions.

Google DeepMind documents this explicitly: the attack does not need to compromise the model. It needs to compromise the data the model consumes. Every agentic system that reads external content is one carefully crafted webpage away from executing attacker instructions.

The agents are already deployed. The attack infrastructure is already being built. The defenses are not ready.
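Two of the channels the thread describes, HTML comments and white-on-white text, are easy to see in raw markup. The toy scanner below is illustrative only; as the thread itself argues, pattern-matching like this does not generalize across the full attack surface (it cannot see image pixels or steganography at all).

```python
import re

# Toy patterns for two "hidden to humans, visible to agents" channels.
# Real injections are far more varied; these only catch the obvious cases.
HIDDEN_PATTERNS = [
    re.compile(r"<!--(.*?)-->", re.S),  # text inside HTML comments
    # inline white text (a crude stand-in for CSS visibility tricks)
    re.compile(r'style="[^"]*color:\s*#fff[^"]*"[^>]*>([^<]+)', re.I),
]

def find_hidden_text(html: str) -> list[str]:
    """Return text fragments a human browser user would likely never see."""
    hits = []
    for pat in HIDDEN_PATTERNS:
        hits.extend(m.strip() for m in pat.findall(html) if m.strip())
    return hits
```

An agent pipeline could log such fragments for review, but the thread's point stands: a defender cannot enumerate every hiding place, so detection alone is not a defense.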
[image]
308 replies · 1.6K reposts · 7K likes · 1.9M views
Hamed Firooz reposted
AI at Meta @AIatMeta
We’re releasing SAM 3.1: a drop-in update to SAM 3 that introduces object multiplexing to significantly improve video processing efficiency without sacrificing accuracy. We’re sharing this update with the community to help make high-performance applications feasible on smaller, more accessible hardware. 🔗 Model Checkpoint: go.meta.me/8dd321 🔗 Codebase: go.meta.me/b0a9fb
[image]
106 replies · 274 reposts · 2.2K likes · 330.6K views
Hamed Firooz reposted
Vahab Mirrokni @mirrokni
Check out our new blog post about TurboQuant for ICLR'26. Beyond its favorable empirical performance (6x speedup!), it provides an interesting theoretical foundation and raises interesting algorithmic questions about quantization for nearest-neighbor search and KV-cache compression as well.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
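TurboQuant's actual algorithm isn't shown in the post, but the baseline it improves on is easy to sketch: plain symmetric int8 quantization of each cached key/value vector, which by itself only halves fp16 memory, well short of the 6x reduction the post reports. Function names are mine, for illustration.

```python
def quantize_int8(row: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization of one KV vector: codes + a scale."""
    scale = max(abs(v) for v in row) / 127.0 or 1.0  # avoid div-by-zero
    codes = [max(-127, min(127, round(v / scale))) for v in row]
    return codes, scale

def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Reconstruct the vector; error is bounded by scale / 2 per entry."""
    return [c * scale for c in codes]
```

Storing one int8 code per fp16 entry (plus one scale per vector) is the ~2x floor; schemes like TurboQuant have to go well below one byte per entry with low distortion to reach the claimed ratios.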

9 replies · 15 reposts · 159 likes · 18.3K views
Hamed Firooz reposted
Konwoo Kim @konwookim
for data-constrained pre-training, synth data isn't just benchmaxxing; it lowers loss on the real data distribution as we generate more tokens. for even better scaling, treat synth gens as forming one long 𝗺𝗲𝗴𝗮𝗱𝗼𝗰: 1.8x data efficiency, with larger gains under more compute
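A minimal sketch of the megadoc framing, with an illustrative `<bos>` marker: instead of resetting context for each synthetic generation, the generations are joined into one long continuing document, so later generations condition on earlier ones during pretraining.

```python
BOS, SEP = "<bos>", "\n\n"  # illustrative tokens, not from the paper

def as_separate_docs(gens: list[str]) -> list[str]:
    """Baseline: every synthetic generation is its own document."""
    return [BOS + g for g in gens]

def as_megadoc(gens: list[str]) -> str:
    """Megadoc: one document, generations joined as continuing text."""
    return BOS + SEP.join(gens)
```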
[image]
8 replies · 59 reposts · 363 likes · 97.3K views
Hamed Firooz reposted
Rosinality @rosinality
Incorporating SFT data during pretraining is more effective for finetuning than the plain pretrain-then-finetune scheme, even when the latter uses replay during finetuning. But the ratio of SFT data during pretraining should depend on the pretraining token budget; they built a scaling law for this.
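The setup can be sketched as a mixing ratio over the pretraining stream. The ratio below is a placeholder knob, not the paper's fitted scaling law, and the function names are mine.

```python
import random

def mix_stream(pretrain_docs: list[str], sft_docs: list[str],
               sft_ratio: float, seed: int = 0) -> list[tuple[str, str]]:
    """Yield a shuffled stream where ~sft_ratio of docs come from SFT data.

    Valid for sft_ratio < 0.5 (each pretrain doc can pull in at most
    one SFT doc in this toy scheme).
    """
    rng = random.Random(seed)
    stream = []
    for doc in pretrain_docs:
        stream.append(("pretrain", doc))
        # Interleave an SFT example with probability r / (1 - r), which
        # makes the expected SFT share of the stream equal sft_ratio.
        if rng.random() < sft_ratio / (1 - sft_ratio):
            stream.append(("sft", rng.choice(sft_docs)))
    rng.shuffle(stream)
    return stream
```

The paper's point, on this sketch, would be that the right `sft_ratio` is not a constant: it should shrink or grow with the total pretraining token budget according to the fitted scaling law.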
[image]
9 replies · 34 reposts · 245 likes · 54.1K views
Hamed Firooz reposted
Dmytro Dzhulgakov @dzhulgakov
Composer 2 beats Opus on TerminalBench at a fraction of the cost. The ingredients: coding focus only, data flywheel, cracked RL team, and infrastructure that can keep up. @FireworksAI_HQ powered the inference and RL scaling behind Composer 2. Scaling RL is still genuinely hard, and we're proud we could help make it less so. Congrats to @cursor_ai on shipping a great model!
Cursor@cursor_ai

Composer 2 is now available in Cursor.

6 replies · 9 reposts · 60 likes · 36.8K views
Hamed Firooz reposted
DAIR.AI @dair_ai
GitHub already has millions of repos full of procedural knowledge. The work introduces a framework for extracting agent skills directly from open-source repos. The pipeline analyzes repo structure, identifies procedural knowledge through dense retrieval, and translates it into a standardized SKILL.md format with a progressive disclosure architecture, so agents can discover thousands of skills without context window degradation.

Manually authoring agent skills doesn't scale. Automated extraction achieved 40% gains in knowledge transfer efficiency while matching human-crafted quality. Still early on this, and more work is needed for self-discovered and self-improving skills to work well at scale. As the agent skill ecosystem grows, mining existing repos could unlock scalable capability acquisition without having to retrain models.

Paper: arxiv.org/abs/2603.11808
Learn to build effective AI agents in our academy: academy.dair.ai
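The progressive-disclosure idea can be sketched as a two-level lookup: agents always see a compact catalog of one-line summaries and fetch a full SKILL.md body only on demand. The layout assumption here (first line of each SKILL.md is its summary) is mine, not necessarily the paper's format.

```python
def index_skills(skills: dict[str, str]) -> str:
    """Compact catalog the agent always sees: name + first line only."""
    lines = []
    for name, body in sorted(skills.items()):
        summary = body.strip().splitlines()[0]
        lines.append(f"- {name}: {summary}")
    return "\n".join(lines)

def load_skill(skills: dict[str, str], name: str) -> str:
    """Full SKILL.md body, fetched only when the agent selects that skill."""
    return skills[name]
```

With thousands of skills, only the catalog occupies the context window by default, which is the degradation the tweet says the architecture avoids.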
[image]
16 replies · 28 reposts · 140 likes · 30.6K views
Hamed Firooz reposted
Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scale multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc… All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
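The QKnorm find in the first bullet is easy to illustrate: with normalized q and k, attention logits are cosine similarities in [-1, 1], and without a scale multiplier the softmax over them stays diffuse. A toy sketch (the multiplier value 8.0 is illustrative, not nanochat's):

```python
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def attn_weights(sims: list[float], scale: float = 1.0) -> list[float]:
    """sims: cosine similarities from normalized q, k (range [-1, 1]).

    With scale=1.0 (parameterless QK-norm) the logit range is tiny and the
    softmax stays nearly uniform; a learned multiplier sharpens it.
    """
    return softmax([scale * s for s in sims])
```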
[image]
972 replies · 2.1K reposts · 19.4K likes · 3.6M views
Hamed Firooz reposted
Srinivas Narayanan @snsf
I’m super excited to welcome @iwebst, Michael D’Angelo, and the Promptfoo team to OpenAI. As enterprises deploy AI coworkers into real workflows, evaluation, security, and compliance become foundational requirements. Promptfoo has built a great set of tools for automated security testing and red-teaming, security and evaluation built into development workflows, and integrated reporting and traceability to meet growing governance, risk, and compliance expectations. We are excited to integrate these capabilities into Frontier and bring them to our customers. openai.com/index/openai-t…
13 replies · 6 reposts · 218 likes · 33.8K views
Hamed Firooz reposted
Boris Cherny @bcherny
New in Claude Code: Code Review. A team of agents runs a deep review on every PR. We built it for ourselves first: code output per Anthropic engineer is up 200% this year, and reviews were the bottleneck. Personally, I've been using it for a few weeks and have found it catches many real bugs that I would not have noticed otherwise.
Claude@claudeai

Introducing Code Review, a new feature for Claude Code. When a PR opens, Claude dispatches a team of agents to hunt for bugs.

462 replies · 491 reposts · 7.4K likes · 1.2M views
Hamed Firooz reposted
Jason Weston @jaseweston
Continual learning in production FTW (with humans-in-the-loop): a detailed report on methods to iteratively improve LLM social dialogue served to millions of users based on their interests. Personally, I've been working on pushing this direction for the last 10 years(!) (see papers below), so it's exciting to see this stuff working in real systems. It will only get better: there are lots more exciting methods to try now, and more powerful models to make them work, than when I started.

Some of my historical(!) research in this direction:

2025:
The Era of Real-World Human Interaction: RL from User Conversations arxiv.org/abs/2509.25137

2022:
When life gives you lemons, make cherryade: Converting feedback from bad responses into good labels arxiv.org/abs/2210.15893
Learning new skills after deployment: Improving open-domain internet-driven dialogue with human feedback arxiv.org/abs/2208.03270

2020:
Deploying lifelong open-domain dialogue learning arxiv.org/abs/2008.08076
Open problems in continuous learning arxiv.org/pdf/2006.12442

2016-2019:
Learning from dialogue after deployment: Feed yourself, chatbot arxiv.org/abs/1901.05415
Unlikelihood training arxiv.org/abs/1911.03860
Dialogue learning with human-in-the-loop arxiv.org/abs/1611.09823
Yixin Nie@EasonNie

1/5 🤔 LLMs can solve olympiad math and write production code. But can they hold a conversation that's actually fun — one that people want to keep coming back to? 💬✨ We present CharacterFlywheel— an iterative process optimizing LLMs for real human engagement and character steerability, while maintaining rigorous safety protocols 🔒. Tested across Instagram, WhatsApp & Messenger 📱with millions of users — where they can create, share, and chat with their own AI characters 🤖. 📄 paper: arxiv.org/abs/2603.01973 huggingface.co/papers/2603.01…

4 replies · 21 reposts · 150 likes · 19.1K views
Hamed Firooz reposted
Sebastian Raschka @rasbt
A small Qwen3.5 from-scratch reimplementation for edu purposes: github.com/rasbt/LLMs-fro… (probably the best "small" LLM today for on-device tinkering)
[image]
47 replies · 503 reposts · 3.1K likes · 151.3K views
Hamed Firooz reposted
Stefano Ermon @StefanoErmon
Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting started on what diffusion can do for language.
322 replies · 579 reposts · 4.2K likes · 1M views
Hamed Firooz reposted
elvis @omarsar0
Be careful what you put in your AGENTS.md files. This new research evaluates AGENTS.md context files for coding agents.

Everyone uses these context files in their repos to help AI coding agents. More context should mean better performance, right? Not quite. This study tested Claude Code (Sonnet-4.5), Codex (GPT-5.2/5.1 mini), and Qwen Code across SWE-bench and a new benchmark called AGENTbench with 138 real-world instances.

LLM-generated context files actually decreased task success rates by 0.5-2% while increasing inference costs by over 20%. Agents followed the instructions, using the mentioned tools 1.6-2.5x more often, but that instruction-following paradoxically hurt performance and required 22% more reasoning tokens. Developer-written context files performed better, improving success by about 4%, but still came with higher costs and additional steps per task.

The broader pattern is that context files encourage more exploration without helping agents locate relevant files any faster, and they largely duplicate what already exists in repo documentation. The recommendation is clear: omit LLM-generated context files entirely, and keep developer-written ones minimal and focused on task-specific requirements rather than comprehensive overviews.

I featured a paper last week that showed that LLM-generated Skills also don't work so well. Self-improving agents are exciting, but be careful of context rot and of unnecessarily overloading your context window.

Paper: arxiv.org/abs/2602.11988
Learn to build effective AI agents in our academy: academy.dair.ai
[image]
55 replies · 62 reposts · 394 likes · 68.7K views
Hamed Firooz reposted
Rohan Varma @rohanvarma
I’m joining OpenAI Codex to work on the future of agentic development! At Cursor, I got to see the shift from autocomplete to agents. The next step isn’t a better IDE. It’s an Agent Development Environment (ADE): systems and tools for orchestrating agents, reasoning over their outputs, and making them autonomous enough to reliably complete ambitious work. After chatting with @embirico and @thsottiaux, it was clear that Codex is the best place to realize this vision. The team has consistently shipped SOTA models for agentic coding (check out gpt-5.3-codex) and I’m pumped for the future that the new Codex App points to. What I’m most excited about is the broader mission: accelerating the knowledge work economy. All agents are coding agents, and we’re already seeing Codex used across every job function within organizations. I’m extremely grateful for my time at Cursor, working with the incredible team, and I’m proud of what we built together. I’m excited to take an even bigger swing with Codex. If you’re curious to get a glimpse of where we are headed, download the Codex App! If you want to work on this mission, please apply or reach out - we are hiring across all functions! You can just build things.
215 replies · 93 reposts · 2.6K likes · 707.5K views
Hamed Firooz reposted
Anthropic @AnthropicAI
New research: The AI Fluency Index. We tracked 11 behaviors across thousands of Claude.ai conversations—for example, how often people iterate and refine their work with Claude—to measure how well people collaborate with AI. Read more: anthropic.com/research/AI-fl…
210 replies · 300 reposts · 2.6K likes · 527.7K views