Muhammad Zayed

728 posts

@MoZayed007

Research Engineer On my journey of 10K hrs, Opinions are my own.

Cairo, Egypt Joined October 2018
2.5K Following 211 Followers
Muhammad Zayed
Muhammad Zayed@MoZayed007·
@elliotarledge Just saw the announcement from Unsloth. The studio implementation might serve as a reference for getting the reasoning tokens task done. Good luck, and excited for what comes next for this new coding space.
English
0
0
0
19
Muhammad Zayed
Muhammad Zayed@MoZayed007·
@elliotarledge This is gonna be a hit if you make it as wild as ThePrimeagen's neovim setup, or his vim setup generally. I'll learn vim especially for it. Ty for the great effort, gonna give it a roll <3
English
0
0
1
85
Muhammad Zayed retweeted
Han Xiao
Han Xiao@hxiao·
If you only have 60s of attention for Kimi's Attention Residuals paper, watch this.
English
16
120
1K
81.9K
Muhammad Zayed
Muhammad Zayed@MoZayed007·
@jxnlco Does Codex for OSS support research ideas? For example, a repo forked from karpathy's nano repo to test a hypothesis starting from GPT-2, then moving on to other models to see whether it scales, etc.
English
0
0
0
176
jason liu
jason liu@jxnlco·
Codex for OSS next batch getting queued up today. Will review applicants and should expect emails Monday! Improving my fraud detection took a while. Thanks gpt5.4pro
English
21
2
124
6K
Muhammad Zayed
Muhammad Zayed@MoZayed007·
Does anyone here have connections in MRI, brain imaging, or EEG research who can help me? Especially anyone using ML/AI in those domains.
English
0
0
0
24
Muhammad Zayed
Muhammad Zayed@MoZayed007·
@kepano @xz__cv You're always one of the goats, really like your perspectives since your note on the water bottle you have. I've been following you and your journey building Obsidian. Thanks for the RTL support
English
0
0
2
89
حَـاءْ ☁️
حَـاءْ ☁️@xz__cv·
Arabic is the fourth most spoken language in the world, with 422 million speakers, and these are the best-known apps that ignore us: 🤷‍♀️ • Instagram • Opera • Discord • Trello • Notion • Obsidian How do they support Slovak and Lithuanian, whose speakers number fewer than 3 million.. yet ignore Arabic?!
Arabic
2
1
16
6.1K
Muhammad Zayed retweeted
Tri Dao
Tri Dao@tri_dao·
The FA4 paper is finally out after a year of work. On Blackwell GPUs, attention now goes about as fast as matmul even though the bottlenecks are so different! Tensor cores are now so crazy fast that attn fwd is bottlenecked by exponential, and attn bwd is bottlenecked by shared memory bandwidth. Some fun stuff in the redesigned algorithm to overcome these bottlenecks: exponential emulation with polynomials, new online softmax to avoid 90% of softmax rescaling, 2CTA MMA instructions that allow two thread blocks to share operands to reduce smem traffic.
Ted Zadouri@tedzadouri

Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! joint work w/ Markus Hoehnerbach, Jay Shah(@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__ ), Tri Dao (@tri_dao) 1/

English
30
230
1.8K
183.2K
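For context on the "softmax rescaling" mentioned in the FA4 post, here is a minimal pure-Python sketch of the standard online softmax. It illustrates only the baseline technique; it is not FA4's new variant and is not taken from the paper. The function names, the scalar values, and the sample inputs are illustrative assumptions, and scores are assumed to be finite.

```python
import math

def online_softmax_weighted_sum(scores, values):
    """One-pass softmax-weighted sum with a running max (hypothetical helper).

    Each time the running max grows, the partial sums are rescaled by
    exp(old_max - new_max). That rescaling is the step the post refers to.
    """
    running_max = float("-inf")
    denom = 0.0  # running sum of exp(score - running_max)
    acc = 0.0    # running sum of exp(score - running_max) * value
    for s, v in zip(scores, values):
        new_max = max(running_max, s)
        scale = math.exp(running_max - new_max)  # 0.0 on the first step
        weight = math.exp(s - new_max)
        denom = denom * scale + weight
        acc = acc * scale + weight * v
        running_max = new_max
    return acc / denom

def two_pass_reference(scores, values):
    """Conventional softmax-weighted sum: find the max first, then sum."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

scores = [1.0, 3.0, 2.0, 5.0]
values = [10.0, 20.0, 30.0, 40.0]
online_result = online_softmax_weighted_sum(scores, values)
reference_result = two_pass_reference(scores, values)
```

The two results agree up to floating-point error. The one-pass form never needs the full row of scores at once, which is why tiled attention kernels use it.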
jason liu
jason liu@jxnlco·
I’ve been at OpenAI for two weeks! I think? It’s felt like 6 months.
English
62
6
489
82.6K
Muhammad Zayed
Muhammad Zayed@MoZayed007·
@huybery I still remember the day I said "Aha" and "hmmm" because of your explanations of the xml improvements. Ty for the great run, and I hope to keep learning from you wherever you are. You are still one of the GOATs, both of you.
English
0
0
1
3.9K
Muhammad Zayed
Muhammad Zayed@MoZayed007·
@crystalsssup It saddens me when people I look up to face those situations in places I assumed GOALs would be achieved, but now I don't know where to aspire to join anymore without facing these situations.
English
1
0
9
5.1K
Crystal
Crystal@crystalsssup·
I'm truly surprised. Qwen has really lost a great talent. But that's the politics of big tech hierarchies. Junyang was a P10 at Alibaba, and with the highest level being P14, there were many layers between him and top leadership. Perhaps many things weren't his call to make, but he was a good leader - which can also become a threat in power structures. Junyang made the right choice to leave. He deserves a better place. 🫶
You Jiacheng@YouJiacheng

To be precise: Alibaba-Cloud kicked out Qwen's tech lead.

English
22
30
1.1K
116.5K
Muhammad Zayed
Muhammad Zayed@MoZayed007·
I hate politics and bureaucracy.
English
0
0
0
33
Muhammad Zayed
Muhammad Zayed@MoZayed007·
@JustinLin610 Thanks for the insightful, interesting, and educational run with Qwen, hope what comes next continues under the same spotlight 🙏🏻
English
0
0
0
1.3K
Junyang Lin
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
English
1.7K
741
13.6K
6.5M
Muhammad Zayed retweeted
StepFun
StepFun@StepFun_ai·
"can we get the base model?" sure. here's two. "can we get the code?" sure. here's SteptronOSS. "what about the SFT data?" coming soon. maximum sincerity, minimum barriers. - Step 3.5 Flash Base — pretrained foundation - Step 3.5 Flash Base-Midtrain — code, agents & long-context - SteptronOSS — open-sourced, ready for your custom workflows - SFT Data — coming soon for reference not just the final checkpoint — a customizable pipeline. 🤗 huggingface.co/stepfun-ai/Ste… 🤗 huggingface.co/stepfun-ai/Ste… 💻 github.com/stepfun-ai/Ste…
English
33
120
1.2K
142.5K
Muhammad Zayed retweeted
Max Li 李赵硕
Max Li 李赵硕@mli0603·
I've been debugging RoPE recently and kept getting tripped up by details that most explanations gloss over. So I wrote a deep dive. "Understanding RoPE: From Rotary Embeddings to Context Extension" mli0603.notion.site/Understanding-… The blog covers: • Full RoPE derivation from rotation matrices • A clean proof of why RoPE's attention decays with distance (and when it breaks) • The π boundary (RoPE's Nyquist limit) • NTK-aware scaling derivation • Dynamic NTK • YaRN's frequency ramp + attention scaling • Reference PyTorch code Hope it helps! Feedback welcome!
English
8
58
537
60.3K
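To make the rotation-matrix view of RoPE from the post above concrete, here is a minimal pure-Python sketch of the relative-position property. It is not the reference PyTorch code from the linked blog. The function names, the base of 10000, the adjacent-pair layout, and the sample vectors are illustrative assumptions.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each adjacent pair of dimensions by a position-dependent angle.

    Pair j (dimensions 2j and 2j+1) uses frequency base ** (-2j / dim).
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):  # i == 2j, so the exponent is -i / dim
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Both pairs of positions are 3 apart, so the attention scores should match.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_far = dot(rope_rotate(q, 103), rope_rotate(k, 100))
```

Because each pair is rotated by an angle proportional to position, the dot product depends only on the offset between the query and key positions, so `score_near` and `score_far` agree up to floating-point error.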
maharshi
maharshi@maharshii·
Imo, it takes a while to get familiar with layouts, tiling, and predication stuff but it’s smooth sailing after that if you have worked with pure CUDA before. The thing that blew my mind was predication through the identity tensor, epic stuff.
English
2
0
13
2.6K
maharshi
maharshi@maharshii·
CuTeDSL is my new favourite thing: I wrote a kernel for RMS norm after learning about layouts, tiling, copying tensors, reductions and so on, especially for inference and it is about 2.13x faster than a triton fused kernel for the given shape.
English
14
7
267
16.1K
Muhammad Zayed
Muhammad Zayed@MoZayed007·
To be truly honest: not zero modifications, but minimal.
English
0
0
0
18
Muhammad Zayed
Muhammad Zayed@MoZayed007·
4/ And it has Universal Portability. Using PyTorch forward_hooks, the learning mechanics (Knowledge Distillation & Dense Sparse Attention) can now be injected natively into standard HuggingFace models (Qwen, Llama, Mistral) with ZERO source-code modifications.
English
2
0
0
57
Muhammad Zayed
Muhammad Zayed@MoZayed007·
Remember that prototype to give LLMs "Live Memory" without external databases? I haven't fully run the experiment yet, but the architecture is so promising, I wanted to open-source the pre-alpha Temporal History Episodic Network (THEN) for anyone ready to try it. 🧵👇 1/ A few weeks ago, I started hacking memory directly into a toy GPT-2 transformer. The goal: let the AI form internal memories that update during inference, inspired by the human hippocampus.
English
1
0
1
70
Muhammad Zayed
Muhammad Zayed@MoZayed007·
Please note this is a hypothesis that has not been fully tested; use at your own discretion.
English
0
0
0
14
Muhammad Zayed
Muhammad Zayed@MoZayed007·
6/ I’m releasing this wrapper directory early for anyone who has a setup ready to test their own modifications. (Again, pre-alpha, so please run at your own discretion!) Grab the repo fork (with the added docs, the extra portable module, and my modified THEN wrapper), drop it into your local pipeline, and let me know how it goes! github.com/mozayed007/nan…
English
1
0
0
28