Muhammad Zayed

728 posts

@MoZayed007

Research Engineer. On my journey of 10K hrs. Opinions are my own.

Cairo, Egypt · Joined October 2018

2.5K Following · 211 Followers

Muhammad Zayed@MoZayed007·2d

@elliotarledge Just saw the announcement from Unsloth. The Studio implementation might help as a reference for getting the reasoning tokens task done. Good luck, and excited for what comes next for this new coding space

Elliot Arledge@elliotarledge·4d

Here's the demo you all asked for. You can also see settings with command+k to get some basic but powerful customization. This is 100% rust btw. Much more coming soon!

Elliot Arledge@elliotarledge

Karpathy asked. I delivered. Introducing OpenSquirrel! Written in pure rust with GPUI (same as zed) but with agents as central unit rather than files. Supports Claude Code, Codex, Opencode, and Cursor (cli). This really forced me to think up the UI/UX from first principles instead of relying on common electron slop. github.com/Infatoshi/Open…

125

17K

Muhammad Zayed@MoZayed007·3d

@elliotarledge This is gonna be a hit if you make it as wild as ThePrimeagen's neovim setup, or his vim setup generally. I'll learn vim specially for it. Ty for the great efforts, gonna give it a roll <3

Muhammad Zayed reposted

Han Xiao@hxiao·3d

If you only have 60s of attention for Kimi's Attention Residuals paper, watch this.

120

81.9K

Muhammad Zayed@MoZayed007·6d

@jxnlco Is the Codex for OSS supporting research ideas? For example, if a repo is a fork from karpathy nano repo to try a hypothesis starting from GPT2, then moving forward to other models, if it scales, etc.

176

jason liu@jxnlco·6d

Codex for OSS next batch getting queued up today. Will review applicants and should expect emails Monday! Improving my fraud detection took a while. Thanks gpt5.4pro

124

Muhammad Zayed@MoZayed007·10 Mar

Does anyone here have connections in MRI, brain imaging, or EEG research who can help me, especially if they use ML/AI in those domains?

Muhammad Zayed@MoZayed007·9 Mar

@kepano @xz__cv You're always one of the goats, really like your perspectives since your note on the water bottle you have. I've been following you and your journey building Obsidian. Thanks for the RTL support

kepano@kepano·8 Mar

@xz__cv Excuse me? Obsidian is fully translated into Arabic, and I am actively working on improvements to right-to-left writing support. x.com/kepano/status/…

kepano@kepano

What bothers you about using Obsidian for languages that are written right to left?

حَـاءْ ☁️@xz__cv·8 Mar

Arabic is the fourth language in the world with 422 million speakers, and these are the most popular apps that ignore us: 🤷‍♀️ • Instagram • Opera • Discord • Trello • Notion • Obsidian How do you support Slovak and Lithuanian, which have fewer than 3 million speakers.. and ignore Arabic?!

6.1K

Muhammad Zayed reposted

Tri Dao@tri_dao·5 Mar

The FA4 paper is finally out after a year of work. On Blackwell GPUs, attention now goes about as fast as matmul even though the bottlenecks are so different! Tensor cores are now so crazy fast that attn fwd is bottlenecked by exponential, and attn bwd is bottlenecked by shared memory bandwidth. Some fun stuff in the redesigned algorithm to overcome these bottlenecks: exponential emulation with polynomials, new online softmax to avoid 90% of softmax rescaling, 2CTA MMA instructions that allow two thread blocks to share operands to reduce smem traffic.
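
For readers unfamiliar with the "online softmax" mentioned in the post above, here is a minimal pure-Python sketch of the textbook streaming version: it processes scores block by block, keeps a running maximum and a running sum, and rescales the sum whenever the maximum changes. This is not the FA4 algorithm (which, per the post, avoids most of that rescaling); the function names and `block_size` are hypothetical and chosen only for illustration.

```python
import math


def softmax_two_pass(scores):
    """Reference softmax: find the max first, then normalize."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_online(scores, block_size=2):
    """Streaming softmax: normalizer is built in one pass over blocks."""
    running_max = float("-inf")
    running_sum = 0.0
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        new_max = max(running_max, max(block))
        # Rescale the sum accumulated so far to the new reference maximum.
        # On the first block this multiplies 0.0 by exp(-inf) == 0.0.
        running_sum *= math.exp(running_max - new_max)
        running_sum += sum(math.exp(s - new_max) for s in block)
        running_max = new_max
    return [math.exp(s - running_max) / running_sum for s in scores]


scores = [1.0, 3.0, -2.0, 7.5, 0.25]
reference = softmax_two_pass(scores)
streamed = softmax_online(scores)
max_abs_diff = max(abs(a - b) for a, b in zip(reference, streamed))
```

The rescaling multiply inside the loop is the step the post says the redesigned algorithm skips in most cases; the sketch performs it on every block for simplicity.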

Ted Zadouri@tedzadouri

Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! joint work w/ Markus Hoehnerbach, Jay Shah(@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__ ), Tri Dao (@tri_dao) 1/

230

1.8K

183.2K

Muhammad Zayed@MoZayed007·5 Mar

@jxnlco congrats <3

jason liu@jxnlco·5 Mar

I’ve been at OpenAI for two weeks! I think? It’s felt like 6 months.

489

82.6K

Muhammad Zayed@MoZayed007·4 Mar

this

Han Xiao@hxiao

Okay Junyang is not Qwen, and Qwen is not Junyang. Behind every great model is a team grinding through data pipelines, training runs, and sleepless launches. But he was the voice, the bridge, the person who made the global AI dev community feel like Qwen was theirs too. That kind of developer trust takes years to earn and seconds to lose. What bothers me is the pattern. Big companies talk about valuing people holistically, then punish the very qualities that made those people valuable. A first-principles thinker inside a non-first-principles system doesn't just burn out - they suffocate. And the system never sees it as its own failure. In big corp, the hardest part was never the tech - they have the talent - it was watching talented people get squeezed between what they know is right and what the org will allow. That gap is where you lose your best. Wherever he lands next, the community will follow. That should tell Alibaba everything.

Muhammad Zayed@MoZayed007·4 Mar

@huybery I still remember the day I said "Aha" and "hmmm" because of your explanations of the XML improvements. Ty for the great run, and I hope to keep learning from you wherever you are. You are still GOATs, both of you

3.9K

Binyuan Hui@huybery·4 Mar

bye qwen, me too.

Junyang Lin@JustinLin610

me stepping down. bye my beloved qwen.

305

224

4.8K

2.6M

Muhammad Zayed@MoZayed007·3 Mar

@crystalsssup It saddens me when people I look up to face those situations in places I assumed GOALs would be achieved, but now I don't know where to aspire to join anymore without facing these situations.

5.1K

Crystal@crystalsssup·3 Mar

I'm truly surprised. Qwen has really lost a great talent. But that's the politics of big tech hierarchies. Junyang was a P10 at Alibaba, and with the highest level being P14, there were many layers between him and top leadership. Perhaps many things weren't his call to make, but he was a good leader - which can also become a threat in power structures. Junyang made the right choice to leave. He deserves a better place. 🫶

You Jiacheng@YouJiacheng

To be precise: Alibaba-Cloud kicked out Qwen's tech lead.

1.1K

116.5K

Muhammad Zayed@MoZayed007·3 Mar

I hate politics and bureaucracy

Muhammad Zayed@MoZayed007·3 Mar

@JustinLin610 Thanks for the insightful, interesting, and educational run with Qwen, hope what comes next continues under the same spotlight 🙏🏻

1.3K

Junyang Lin@JustinLin610·3 Mar

me stepping down. bye my beloved qwen.

1.7K

741

13.6K

6.5M

Muhammad Zayed@MoZayed007·3 Mar

@OfficialLoganK Please let it be "it"

Logan Kilpatrick@OfficialLoganK·3 Mar

gemini

465

141

3.3K

625.3K

Muhammad Zayed reposted

StepFun@StepFun_ai·2 Mar

"can we get the base model?" sure. here's two. "can we get the code?" sure. here's SteptronOSS. "what about the SFT data?" coming soon. maximum sincerity, minimum barriers. - Step 3.5 Flash Base — pretrained foundation - Step 3.5 Flash Base-Midtrain — code, agents & long-context - SteptronOSS — open-sourced, ready for your custom workflows - SFT Data — coming soon for reference not just the final checkpoint — a customizable pipeline. 🤗 huggingface.co/stepfun-ai/Ste… 🤗 huggingface.co/stepfun-ai/Ste… 💻 github.com/stepfun-ai/Ste…

120

1.2K

142.5K

Muhammad Zayed reposted

Max Li 李赵硕@mli0603·1 Mar

I've been debugging RoPE recently and kept getting tripped up by details that most explanations gloss over. So I wrote a deep dive. "Understanding RoPE: From Rotary Embeddings to Context Extension" mli0603.notion.site/Understanding-… The blog covers: • Full RoPE derivation from rotation matrices • A clean proof of why RoPE's attention decays with distance (and when it breaks) • The π boundary (RoPE's Nyquist limit) • NTK-aware scaling derivation • Dynamic NTK • YaRN's frequency ramp + attention scaling • Reference PyTorch code Hope it helps! Feedback welcome!
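
As a small illustration of the rotation-matrix view of RoPE that the post above refers to, the following pure-Python sketch rotates consecutive pairs of a vector by position-dependent angles and checks the key property: the dot product of a rotated query and key depends only on their relative offset. It is not taken from the linked blog; the interleaved pairing, the `base=10000.0` default, and the helper names are assumptions made for this example (some libraries pair dimensions differently).

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate each (even, odd) pair of `vec` by an angle proportional to `pos`."""
    dim = len(vec)
    rotated = []
    for i in range(0, dim, 2):
        # Pair j = i // 2 uses frequency base ** (-2j / dim) == base ** (-i / dim).
        theta = pos * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        rotated += [x * cos_t - y * sin_t, x * sin_t + y * cos_t]
    return rotated


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.75, -0.5, 1.0]

# Same relative offset (4), very different absolute positions.
score_near = dot(rope(q, 3), rope(k, 7))
score_far = dot(rope(q, 103), rope(k, 107))
```

Because each pair is rotated by an angle linear in position, the query and key rotations compose into a single rotation by the position difference, which is why `score_near` and `score_far` agree up to floating-point error.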

537

60.3K

Muhammad Zayed@MoZayed007·2 Mar

@maharshii can I skip Triton and go straight from studying CUDA to CuTe?

maharshi@maharshii·1 Mar

Imo, it takes a while to get familiar with layouts, tiling, and predication stuff but it’s smooth sailing after that if you have worked with pure CUDA before. The thing that blew my mind was predication through the identity tensor, epic stuff.

2.6K

maharshi@maharshii·1 Mar

CuTeDSL is my new favourite thing: I wrote a kernel for RMS norm after learning about layouts, tiling, copying tensors, reductions and so on, especially for inference and it is about 2.13x faster than a triton fused kernel for the given shape.

267

16.1K

Muhammad Zayed@MoZayed007·28 Feb

@wesamo__ Wish you all the best in your upcoming arc <3

1.6K

Wesam@wesamo__·28 Feb

I just handed in my resignation at OpenAI

Sam Altman@sama

Tonight, we reached an agreement with the Department of War to deploy our models in their classified network. In all of our interactions, the DoW displayed a deep respect for safety and a desire to partner to achieve the best possible outcome. AI safety and wide distribution of benefits are the core of our mission. Two of our most important safety principles are prohibitions on domestic mass surveillance and human responsibility for the use of force, including for autonomous weapon systems. The DoW agrees with these principles, reflects them in law and policy, and we put them into our agreement. We also will build technical safeguards to ensure our models behave as they should, which the DoW also wanted. We will deploy FDEs to help with our models and to ensure their safety, we will deploy on cloud networks only. We are asking the DoW to offer these same terms to all AI companies, which in our opinion we think everyone should be willing to accept. We have expressed our strong desire to see things de-escalate away from legal and governmental actions and towards reasonable agreements. We remain committed to serve all of humanity as best we can. The world is a complicated, messy, and sometimes dangerous place.

1.3K

6.1K

65.5K

3.7M

Muhammad Zayed@MoZayed007·22 Feb

To be truly honest: not zero modifications, but minimal ones.

Muhammad Zayed@MoZayed007·22 Feb

4/ And it has Universal Portability. Using PyTorch forward_hooks, the learning mechanics (Knowledge Distillation & Dense Sparse Attention) can now be injected natively into standard HuggingFace models (Qwen, Llama, Mistral) with ZERO source-code modifications.
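
To make the forward-hook idea in 4/ concrete, here is a minimal sketch using PyTorch's `register_forward_hook`. It is not code from the THEN repo: `TinyBlock`, `capture_hook`, and `scale_hook` are hypothetical stand-ins, and a small `nn.Linear` layer takes the place of a HuggingFace model. It only shows the general pattern of observing or altering a submodule's output without editing the model's source.

```python
import torch
from torch import nn


class TinyBlock(nn.Module):
    """Hypothetical stand-in for a layer inside a pretrained model."""

    def __init__(self, dim=8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.proj(x))


captured = {}


def capture_hook(module, inputs, output):
    # Returning None leaves the output unchanged; we only record it.
    captured["proj_out"] = output.detach()


def scale_hook(module, inputs, output):
    # Returning a tensor replaces the module's output downstream.
    return output * 0.5


torch.manual_seed(0)
model = TinyBlock()
x = torch.randn(2, 8)
baseline = model(x)

# Hooks run in registration order after proj.forward().
capture_handle = model.proj.register_forward_hook(capture_hook)
scale_handle = model.proj.register_forward_hook(scale_hook)
hooked = model(x)

# Removing the handles restores the original behavior.
capture_handle.remove()
scale_handle.remove()
restored = model(x)
```

With a real pretrained model the same calls would be made on one of its submodules; which submodule to hook, and what the hook computes, depends on the architecture and is outside the scope of this sketch.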

Muhammad Zayed@MoZayed007·22 Feb

Remember that prototype to give LLMs "Live Memory" without external databases? I haven't fully run the experiment yet, but the architecture is so promising, I wanted to open-source the pre-alpha Temporal History Episodic Network (THEN) for anyone ready to try it. 🧵👇 1/ A few weeks ago, I started hacking memory directly into a toy GPT-2 transformer. The goal: let the AI form internal memories that update during inference, inspired by the human hippocampus.

Muhammad Zayed@MoZayed007·22 Feb

Please note this is a hypothesis that has not been fully tested; use at your own discretion.

Muhammad Zayed@MoZayed007·22 Feb

6/ I’m releasing this wrapper directory early for anyone who has a setup ready to test their own modifications. (Again, pre-alpha, so please run at your own discretion!) Grab the repo fork (with the added docs, the extra portable module, and my modified THEN wrapper), drop it into your local pipeline, and let me know how it goes! github.com/mozayed007/nan…
