Mohammed Ibrahim

8.3K posts

Mohammed Ibrahim
@MohammedIM1982

A student of life, lovin' it till you can't get enough. 🇦🇪❤️ Father of ☝🏼… 🤲🏼

U.A.E. · Joined April 2011
461 Following · 253 Followers
Pinned Tweet
Mohammed Ibrahim @MohammedIM1982
Praise be to God for the most beautiful of gifts: by God's grace we have been granted our newborn, Abdullah bin Mohammed bin Murad... O God, make him one of Your righteous servants and the delight of our eyes, set right our affairs in what You have provided us, and bless him for us... 🤲🏼❤️❤️❤️ -240514
[photo]
3 replies · 0 reposts · 6 likes · 1.1K views
Mohammed Ibrahim reposted
François Chollet @fchollet
You can also enter the ARC-AGI-3 competition on Kaggle. Your AI agents will be tested on two separate private test sets of 55 environments. kaggle.com/competitions/a…
1 reply · 6 reposts · 102 likes · 21.4K views
Mohammed Ibrahim reposted
François Chollet @fchollet
ARC-AGI-3 is out now! We've designed the benchmark to evaluate agentic intelligence via interactive reasoning environments. Beating ARC-AGI-3 will be achieved when an AI system matches or exceeds human-level action efficiency on all environments, upon seeing them for the first time.

We've done extensive human testing that shows 100% of these environments are solvable by humans upon first contact, with no prior training and no instructions. Meanwhile, all frontier AI reasoning models score under 1% at this time.
191 replies · 328 reposts · 2.6K likes · 551.4K views
Mohammed Ibrahim reposted
François Chollet @fchollet
You can go play some of the environments yourself - 25 of them are now public: arcprize.org
4 replies · 6 reposts · 162 likes · 21.7K views
Mohammed Ibrahim reposted
Alif Munim (d/acc) @alifmunim
Since @karpathy kicked off recursive self-improvement a few days ago, I've been thinking about how we can automate interpretability research. I asked Claude to train a sparse autoencoder on Gemma3-1B. It recovered 96% of Gemma's behaviors from interpretable features overnight.
[image]
20 replies · 40 reposts · 452 likes · 41.4K views
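The post above describes training a sparse autoencoder (SAE) on a model's activations so that behavior can be read off from a small set of interpretable features. As a minimal sketch of that technique (not the linked work's code): random data stands in for the captured Gemma3-1B activations, the dimensions are illustrative, and the gradients are written out by hand for clarity.

```python
# Toy SAE: learn an overcomplete, sparse, ReLU-gated code for activations,
# trained with full-batch gradient descent on reconstruction + L1 loss.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae, n = 64, 256, 512     # activation dim, dictionary size, samples
X = rng.normal(size=(n, d_model))    # stand-in for captured model activations

W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
lr, l1 = 0.05, 1e-3
losses = []

def forward(X):
    f = np.maximum(X @ W_enc + b_enc, 0.0)   # sparse feature activations (ReLU)
    return f, f @ W_dec                      # reconstruction

for step in range(200):
    f, X_hat = forward(X)
    err = X_hat - X
    losses.append(float((err ** 2).mean() + l1 * np.abs(f).mean()))
    # Manual full-batch gradients (encoder bias update omitted for brevity).
    g_out = 2.0 * err / err.size                              # d(mse)/d(X_hat)
    g_dec = f.T @ g_out                                       # decoder grad
    g_f = (g_out @ W_dec.T + l1 * np.sign(f) / f.size) * (f > 0)  # through ReLU
    g_enc = X.T @ g_f                                         # encoder grad
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
```

After training, each column of `W_dec` is a candidate "feature direction"; the interpretability work is in labeling which directions correspond to which behaviors.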
Mohammed Ibrahim reposted
Elliot Arledge @elliotarledge
Karpathy asked. I delivered. Introducing OpenSquirrel! Written in pure Rust with GPUI (same as Zed), but with agents as the central unit rather than files. Supports Claude Code, Codex, Opencode, and Cursor (CLI). This really forced me to think through the UI/UX from first principles instead of relying on common Electron slop. github.com/Infatoshi/Open…

Quoting Andrej Karpathy @karpathy:
Expectation: the age of the IDE is over. Reality: we're going to need a bigger IDE (imo). It just looks very different because humans now move upwards and program at a higher level - the basic unit of interest is not one file but one agent. It's still programming.

143 replies · 171 reposts · 2.5K likes · 412K views
Mohammed Ibrahim reposted
Thomas Wolf @Thom_Wolf
This is really cool. It got me thinking more deeply about personalized RL: what's the real point of personalizing a model in a world where base models can become obsolete so quickly?

The reality in AI is that new models ship every few weeks, each better than the last, and the pace is only accelerating, as we see on the Hugging Face Hub. We are not far from better base models dropping daily.

There's a research gap in RL here that almost no one is working on. Most LLM personalization research assumes a fixed base model, but very few ask what happens to that personalization when you swap the base model. Think about going from Llama 3 to Llama 4: all the tuned preferences, reward signals, and LoRAs are suddenly tied to yesterday's model. As a user or a team, you don't want to reteach every new model your preferences, but you also don't want to be stuck on an older one just because it knows you.

We could call this "RL model transferability": how can an RL trace, a reward signal, or a preference representation trained on model N be distilled, stored, and automatically reapplied to model N+1 without too much user involvement? We solved this for SFT, where a training dataset can be stored and reused to train a future model. We also tackled a version of it in RLHF phases, but it remains unclear more generally when RL is deployed in the real world.

There are some related threads (RLTR for transferable reasoning traces, P-RLHF and PREMIUM for model-agnostic user representations, HCP for portable preference protocols), but the full loop seems under-studied to me. Some of these questions are about off-policy learning; others are about capabilities versus personalization: which of the old customizations and fixes does the new model already handle out of the box, and which are genuinely user- or team-specific and will never be solved by default? For now you would store those in a skill, but RL allows you to extend beyond the written-guidance level.

I have surely missed some work, so please post any good work you've seen on this topic in the comments.

Quoting Ronak Malde @rronak_:
This paper is so good that I almost didn't want to share it. Ignore the OpenClaw clickbait: OPD + RL on real agentic tasks with significant results is very exciting, and moves us away from needing verifiable rewards. Authors: @YinjieW2024 Xuyang Chen, Xialong Jin, @MengdiWang10 @LingYang_PU

34 replies · 66 reposts · 739 likes · 118.8K views
Mohammed Ibrahim reposted
Muratcan Koylan @koylanai
If you're building anything in AI, the best skill to be using right now is hugging-face-paper-pages. Whatever problem you're facing, someone has probably already published a paper about it. HF's Papers API gives you hybrid semantic search over AI papers.

I wrote an internal skill, context-research, that orchestrates the HF Papers API into a research pipeline. It runs five parallel searches with keyword variants, triages by relevance and recency, fetches full paper content as markdown, then reads the actual methodology and results sections. The skill also chains into a deep-research API that crawls the broader web to complement the academic findings.

The gap between "a paper was published" and "a practitioner applies the insight" is shrinking, and I think this is a practical way to provide relevant context to coding agents. So you should write a skill on top of the HF Papers skill that teaches the model how to think about research, not just what to search for.
[image]
46 replies · 149 reposts · 1.6K likes · 98.3K views
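The pipeline described above (parallel keyword-variant searches, then triage by recency) can be sketched in a few lines. The endpoint path and the response field names (`id`, `publishedAt`) are assumptions inferred from the public paper pages, not a documented contract; verify them before depending on this.

```python
# Fan out several keyword variants against the (assumed) HF Papers search
# endpoint in parallel, then dedupe and rank the merged results by recency.
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

SEARCH_URL = "https://huggingface.co/api/papers/search?q="  # assumed endpoint

def search(query):
    with urllib.request.urlopen(SEARCH_URL + urllib.parse.quote(query)) as r:
        return json.load(r)

def fan_out(variants):
    # Five keyword variants -> five parallel searches, flattened.
    with ThreadPoolExecutor(max_workers=5) as pool:
        batches = pool.map(search, variants)
    return [paper for batch in batches for paper in batch]

def triage(papers, top_k=5):
    # Dedupe by id, then rank newest first (ISO dates sort lexicographically).
    unique = {p["id"]: p for p in papers}
    return sorted(unique.values(),
                  key=lambda p: p["publishedAt"], reverse=True)[:top_k]
```

The "read the methodology and results sections" step would then fetch each surviving paper's page as markdown and feed it to the agent as context.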
Mohammed Ibrahim reposted
jack @jack
is the future value of "open source" even code anymore? i believe it's shifting to data, provenance, protocols, evals, and weights. in that order.
977 replies · 770 reposts · 7.4K likes · 777.8K views
Mohammed Ibrahim reposted
Lewis Tunstall @_lewtun
You can now pretrain LLMs entirely on the HF Hub 💥

Last week, @OpenAI launched a competition to see who can pretrain the best LLM in under 10 minutes. So over the weekend, I made a little demo that automates this end-to-end using the Hub as the infra layer:

- Jobs to scale compute
- Buckets to store all experiments
- Trackio to log all the metrics

The cool thing here is that everything is launched locally: no ssh shenanigans into a cluster or fighting with colleagues over storage and GPUs ⚔️

All that's left is coming up with new ideas, but luckily Codex can automate that part too 😁

Can I have a job now please @reach_vb 🙏?
[GIF]
14 replies · 41 reposts · 249 likes · 76.2K views
Mohammed Ibrahim reposted
Jenny Zhang @jennyzhangzt
Introducing Hyperagents: an AI system that not only improves at solving tasks, but also improves how it improves itself.

The Darwin Gödel Machine (DGM) demonstrated that open-ended self-improvement is possible by iteratively generating and evaluating improved agents, yet it relies on a key assumption: that improvements in task performance (e.g., coding ability) translate into improvements in the self-improvement process itself. This alignment holds in coding, where both evaluation and modification are expressed in the same domain, but breaks down more generally. As a result, prior systems remain constrained by fixed, handcrafted meta-level procedures that do not themselves evolve.

We introduce Hyperagents: self-referential agents that can modify both their task-solving behavior and the process that generates future improvements. This enables what we call metacognitive self-modification: learning not just to perform better, but to improve at improving. We instantiate this framework as DGM-Hyperagents (DGM-H), an extension of the DGM in which both task-solving behavior and the self-improvement procedure are editable and subject to evolution.

Across diverse domains (coding, paper review, robotics reward design, and Olympiad-level math solution grading), hyperagents enable continuous performance improvements over time and outperform baselines without self-improvement or open-ended exploration, as well as prior self-improving systems (including the DGM). DGM-H also improves the process by which new agents are generated (e.g., persistent memory, performance tracking), and these meta-level improvements transfer across domains and accumulate across runs.

This work was done during my internship at Meta (@AIatMeta), in collaboration with Bingchen Zhao (@BingchenZhao), Wannan Yang (@winnieyangwn), Jakob Foerster (@j_foerst), Jeff Clune (@jeffclune), Minqi Jiang (@MinqiJiang), Sam Devlin (@smdvln), and Tatiana Shavrina (@rybolos).
[image]
154 replies · 652 reposts · 3.6K likes · 491.8K views
Mohammed Ibrahim reposted
Jenny Zhang @jennyzhangzt
Hyperagents: arxiv.org/abs/2603.19461
Code: github.com/facebookresear…

Huge thank you to everyone who discussed and gave feedback during this project, and to all collaborators Bingchen Zhao (@BingchenZhao), Wannan Yang (@winnieyangwn), Jakob Foerster (@j_foerst), Jeff Clune (@jeffclune), Minqi Jiang (@MinqiJiang), Sam Devlin (@smdvln), and Tatiana Shavrina (@rybolos) for hosting me at Meta (@AIatMeta)!
3 replies · 22 reposts · 167 likes · 10.3K views
Mohammed Ibrahim reposted
Jenny Zhang @jennyzhangzt
We also observe evidence of compounding self-improvement: improvements discovered in one run can be transferred to a new setting and continue accumulating. The figure shows that initializing from a hyperagent transferred from another experiment leads to faster progress and higher final performance.
[image]
2 replies · 2 reposts · 54 likes · 7.5K views
Mohammed Ibrahim reposted
Jenny Zhang @jennyzhangzt
DGM-H learns how to improve, yielding a general and transferable self-improvement capability. One example is the autonomous innovation of persistent memory, which enables learning to accumulate across iterations. Instead of merely logging numerical scores, the hyperagent stores synthesized insights, causal hypotheses, and forward-looking plans (e.g., identifying which generations performed best, diagnosing overcorrections, and proposing how to combine successful strategies). This memory is actively consulted during subsequent self-modification steps, allowing later generations to build on earlier discoveries and avoid repeating past mistakes. Example of a stored memory entry:
[image]
2 replies · 3 reposts · 55 likes · 7K views
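The kind of memory entry described above (insights and plans rather than bare scores, consulted before each self-modification) can be given a concrete shape. The field names below are guesses from the post's wording, not the paper's actual schema.

```python
# Illustrative schema for persistent hyperagent memory: each entry carries
# a synthesized insight, a causal hypothesis, and a forward-looking plan,
# and recent entries are consulted before the next self-modification step.
from dataclasses import dataclass, field, asdict

@dataclass
class MemoryEntry:
    iteration: int
    insight: str      # e.g. "generation 7's prompt rewrite performed best"
    hypothesis: str   # e.g. "gains came from shorter tool-call chains"
    plan: str         # e.g. "combine 7's rewrite with 5's retry policy"

@dataclass
class PersistentMemory:
    entries: list = field(default_factory=list)

    def record(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

    def consult(self, last_k: int = 3) -> list:
        # Later generations read the most recent entries so they can build
        # on earlier discoveries instead of repeating past mistakes.
        return [asdict(e) for e in self.entries[-last_k:]]
```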
Mohammed Ibrahim reposted
Jenny Zhang @jennyzhangzt
Our experiments show that DGM-H can continuously self-improve across diverse domains, with generalizable improvements in both task performance and self-improvement ability. On coding, DGM-H achieves gains comparable to the DGM, despite not being handcrafted for coding. Beyond coding, DGM-H substantially improves performance on paper review and robotics reward design, with gains transferring to held-out test tasks and significantly outperforming prior self-improving algorithms, which struggle outside coding unless customized.

The left figure shows a tree diagram of the open-ended evolutionary search process of hyperagents. The right figure shows performance progress over iterations and a summary of DGM-H's key innovations on paper review.
[image]
2 replies · 4 reposts · 72 likes · 8.9K views
Mohammed Ibrahim reposted
Jenny Zhang @jennyzhangzt
To understand hyperagents, it helps to start with a prior self-improving AI system, the Darwin Gödel Machine (DGM). In the DGM, a coding agent repeatedly generates modified versions of itself, evaluates them on coding tasks, and stores successful variants in an archive of stepping stones for future improvement. However, the DGM improves at improving primarily within coding tasks. It relies on a key assumption: the evaluation task and the self-modification task must be aligned. In coding this works well, since improving the agent's coding ability also improves its ability to analyze its own code and generate better modifications. But outside coding, this alignment often breaks. For example, improving an agent's ability to write poetry would not necessarily improve its ability to modify its own code.

We address this limitation with hyperagents. A hyperagent integrates the task agent and the meta agent into a single self-referential, editable program. Because the meta-level modification procedure is itself modifiable, the system does not require alignment between the evaluation task and the self-modification task. We instantiate this idea by extending the Darwin Gödel Machine to create DGM-Hyperagents (DGM-H). DGM-H retains the open-ended exploration process of the DGM while allowing the self-improvement mechanism itself to evolve, enabling metacognitive self-modification across diverse domains.
[image]
2 replies · 14 reposts · 95 likes · 14.9K views
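The loop described in this thread (an archive of stepping stones, a procedure that proposes modified agents, and that procedure itself being modifiable) can be reduced to a toy sketch. This is an illustration of the idea only: the real system edits programs, while here an "agent" is a number, the "meta-procedure" is a mutation step size, and none of this comes from the paper's code.

```python
# Toy DGM-H-style loop: sample a stepping stone from the archive, let its
# meta-procedure propose a child, mutate the meta-procedure too, and keep
# (child, new meta) pairs that do not regress on the task score.
import random

def evaluate(agent):
    # Stand-in task score: skill closer to 1.0 scores higher.
    return 1.0 - abs(1.0 - agent["skill"])

def make_meta(step):
    # A meta-procedure proposes a modified child; its step size is the
    # (toy) editable part of the self-improvement process itself.
    def meta(agent, rng):
        child = dict(agent)
        child["skill"] += rng.uniform(-step, 2 * step)
        return child
    meta.step = step
    return meta

def run(iterations=50, seed=0):
    rng = random.Random(seed)
    archive = [({"skill": 0.1}, make_meta(0.05))]
    for _ in range(iterations):
        agent, meta = rng.choice(archive)        # open-ended: any stepping stone
        child = meta(agent, rng)
        # Metacognitive step: the meta-procedure is also modified, so the
        # system can get better at getting better.
        new_meta = make_meta(meta.step * rng.choice([0.5, 1.0, 2.0]))
        if evaluate(child) >= evaluate(agent):   # archive non-regressions
            archive.append((child, new_meta))
    return max(evaluate(a) for a, _ in archive)
```

In the full system both the agent and the meta-procedure are code, which is what removes the DGM's requirement that the evaluation task and the self-modification task live in the same domain.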
Mohammed Ibrahim reposted
Jenny Zhang @jennyzhangzt
The paper "Hyperagents": arxiv.org/abs/2603.19461

Hyperagents suggest a path toward self-accelerating systems that not only search for better solutions, but continually improve their ability to self-improve.
7 replies · 14 reposts · 215 likes · 43.8K views
Mohammed Ibrahim reposted
Vaidehi @Ai_Vaidehi
🚨 BREAKING: Claude Code just got INFINITE MEMORY (for free)

A new open-source plugin called Claude-Mem gives Claude Code persistent memory across sessions. Meaning: Claude can now remember your projects FOREVER.

Here's why this is a big deal:
• Up to 95% fewer tokens per session
• Up to 20× more tool calls before hitting limits
• Real project memory across sessions
• Fully open-source

Instead of starting from zero every session, Claude now builds long-term knowledge about your codebase.

This changes the game: AI is no longer just answering questions. It's starting to remember, learn, and evolve with you. We're moving from AI chats → AI coworkers with memory 🧠

The AI dev workflow is evolving fast, and most people haven't caught up yet. Repo in comments 👇
[image]
3 replies · 3 reposts · 19 likes · 706 views