Mohamed Abdelfattah

631 posts

@mohsaied

Assistant Prof @CornellECE and cofounder/Chief Science Officer at @mako_dev_ai. At the intersection of machine learning and hardware. Father. Muslim.

NYC · Joined March 2009
601 Following · 1.6K Followers
Mohamed Abdelfattah @mohsaied
It's mid-March and I have been getting one review request per day from @IEEEorg journals. I've happily reviewed many papers in the past, but I obviously cannot do one per day. Not sure what's going on. Is it just me? Any tips on how to throttle these requests? #academia
[image attached]
0 replies · 0 reposts · 4 likes · 530 views
Lydia Hallie ✨ @lydiahallie
Excited to announce Claude for Open Source ❤️ We're giving 6 months of free Claude Max 20x to open source maintainers and core contributors. If you maintain a popular project or contribute across open source, please apply! claude.com/contact-sales/…
590 replies · 1.4K reposts · 12.6K likes · 1.7M views
Mohamed Abdelfattah @mohsaied
@dandiep We’re friends with @OpenAI ❤️ we are also using a new feature where rollouts run on our servers through an API call.
1 reply · 0 reposts · 2 likes · 131 views
Dan Diephouse @dandiep
@mohsaied How does one get access to GPT-5 for fine-tuning? Still stuck on 4.1 …
1 reply · 0 reposts · 1 like · 205 views
Mohamed Abdelfattah @mohsaied
At Makora, we collaborated with OpenAI to fine-tune GPT-5 for GPU kernel generation. In our technical report, we outline intricacies related to dataset curation, the RL evaluation environment, hack mitigation, tool calling, and agent workflow integration. This results in more than a 2x performance improvement over PyTorch!

We're expanding our dataset, scaling up training, extending to multiple languages and hardware platforms, and working on many more new and exciting ways to make GPU kernel generation more controllable and predictable. 🚀 arxiv.org/abs/2602.11000
6 replies · 29 reposts · 234 likes · 14.7K views
Mohamed Abdelfattah reposted
Waleed Atallah @wAIeedatallah
You're probably not ready for how dramatically the world of GPU kernel engineering will change in the next 12 months. It's not about the next DSL. @makora_ai is generating SOTA kernels automatically. @AIatAMD GPUs running MLA twice as fast as an H100 across all sizes below:
[image attached]
9 replies · 17 reposts · 170 likes · 24.6K views
Mohamed Abdelfattah reposted
Makora @makora_ai
Mako is now Makora. Same team. Same mission. New name. Stay tuned for what's coming next. 👀
2 replies · 3 reposts · 13 likes · 3.4K views
Mohamed Abdelfattah @mohsaied
Cornell has multiple post-doctoral fellowships for working on foundational and applied AI. Get in touch if you are interested in working on AI efficiency, hardware, or systems. Apply now to this amazing opportunity: lnkd.in/eCiDjj6N
[image attached]
1 reply · 0 reposts · 0 likes · 291 views
Mohamed Abdelfattah reposted
Rohan Paul @rohanpaul_ai
New Google+Cornell paper shows 1 compact language model can read code and predict memory, latency, and accuracy across languages and hardware. A 300M model hits 0.9+ on APPS memory and leads classic neural architecture search predictors.

The task is code-to-metric regression: predict memory or runtime from code without running it. Past systems rely on hand-tuned features per language or graph, and they break when code changes. This model reads raw code or ONNX graphs with a T5Gemma encoder and predicts numbers digit by digit.

Sequential prediction lets 1 model learn many tasks and capture tradeoffs like accuracy versus latency. Digit tokenization avoids normalization across mixed scales and beats a mean-squared-error head. High rank correlation means it ranks candidates well, like picking the lowest-memory solution.

Language pretraining and synthetic regression pretraining speed training and raise accuracy while removing brittle feature engineering. Net effect: 1 text model replaces many bespoke predictors across languages, graphs, and hardware.

Paper: arxiv.org/abs/2509.26476
Paper title: "Regression Language Models for Code"
[image attached]
5 replies · 33 reposts · 178 likes · 18.5K views
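The tweet above describes the core trick: a metric like memory or runtime is emitted digit by digit as text tokens, which sidesteps per-task normalization. The serialization below is a minimal illustrative sketch of that idea, not the paper's actual scheme; the sign/mantissa/exponent format and helper names are invented here.

```python
import math

def to_digit_tokens(value: float, sig_digits: int = 4) -> list[str]:
    """Serialize a metric as sign, mantissa digits, and exponent tokens."""
    if value == 0:
        return ["+", "0", "0", "0", "0", "e", "+", "0"]
    exp = math.floor(math.log10(abs(value)))
    mantissa = abs(value) / 10 ** exp          # in [1, 10)
    digits_str = f"{mantissa:.{sig_digits - 1}f}"
    if float(digits_str) >= 10:                # rounding pushed mantissa past 10
        exp += 1
        digits_str = f"{mantissa / 10:.{sig_digits - 1}f}"
    digits = digits_str.replace(".", "")[:sig_digits]
    return (["+" if value > 0 else "-"]
            + list(digits)
            + ["e", "+" if exp >= 0 else "-", str(abs(exp))])

def from_digit_tokens(tokens: list[str]) -> float:
    """Invert the serialization back to a float."""
    sign = 1.0 if tokens[0] == "+" else -1.0
    e_idx = tokens.index("e")
    digits = "".join(tokens[1:e_idx])
    mantissa = float(digits[0] + "." + digits[1:])
    exp = int(tokens[e_idx + 1] + tokens[e_idx + 2])
    return sign * mantissa * 10 ** exp

tokens = to_digit_tokens(1536.0)
print(tokens)                      # ['+', '1', '5', '3', '6', 'e', '+', '3']
print(from_digit_tokens(tokens))   # 1536.0
```

Because the model generates these tokens sequentially, the same decoder head can target memory in one example and latency in the next, which is what lets one model cover many metrics at mixed scales.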
Mohamed Abdelfattah @mohsaied
Have you written a GPU kernel before that made GPUs go brrrrrrrrr? We're hiring experienced GPU developers at @mako_dev_ai to join our growing team of ninja engineers, working on redefining AI infrastructure. Message me directly or apply here: jobs.mako.dev/GPU-Kernel-Eng…
0 replies · 1 repost · 7 likes · 331 views
Mohamed Abdelfattah reposted
The Information @theinformation
.@wAIeedatallah, CEO of @mako_dev_ai, on automating GPU code generation: “It really opens the door for new AI research, new algorithms, and new hardware. We want to compress months of engineering effort down into hours, and that's what our agent does.” Watch the full episode on bit.ly/4lm78Dp.
0 replies · 2 reposts · 14 likes · 3.6K views
Mohamed Abdelfattah @mohsaied
.@M13Company has been an amazing partner in our seed round. A thoroughly technical team that understands the problems that we’re trying to solve. High performance GPU code needs automation and abstraction. That’s exactly what we’re going after at @mako_dev_ai!
M13 @M13Company

We’re thrilled to welcome @mako_dev_ai to the M13 portfolio. Mako is changing how AI models run at scale.

For years, NVIDIA’s CUDA has been the default programming interface for GPU workloads, giving developers power but also locking them into one way of working. Now, as AI hardware diversifies, from AMD to custom accelerators, the industry needs a performance layer that works everywhere. That’s what Mako delivers.

Co-founders @wAIeedatallah, @mohsaied, and Lukasz Dudziak are building AI-native infrastructure that automates GPU kernel generation and tuning. This lets developers deploy models faster, hit better price-performance, and run on any GPU with no rewrites and no hand-tuning. It’s like what Kubernetes did for the cloud, but for AI compute.

M13 led Mako’s $8.5M+ seed round with @neo, @flybridge, and angel investors including AI pioneer Jeff Dean. We’re excited to be part of infrastructure history in the making.

For more about Mako: @ChristineMHall talks to the Mako team and @kalomarNYC about its bold vision for GPU freedom. m13.co/article/meet-m…

0 replies · 1 repost · 8 likes · 597 views
Mohamed Abdelfattah reposted
Makora @makora_ai
We just shipped 15x faster #CUDA kernel compilation for MakoGenerate. Here's how and why we're digging into this part of the pipeline, with a detailed blog post below 🧵
1 reply · 1 repost · 8 likes · 961 views
Mohamed Abdelfattah @mohsaied
We use large-scale text-to-text regression to predict specific parameters (e.g. utilization) of compute nodes in Google's datacenter, purely based on training on a (very) large corpus of unstructured system logs!! Paper: arxiv.org/abs/2506.21718 Code: github.com/google-deepmin…
Richard Song @XingyouSong

Seeing text-to-text regression work for Google’s massive compute cluster (billion $$ problem!) was the final result to convince us we can reward-model literally any world feedback.

Paper: arxiv.org/abs/2506.21718
Code: github.com/google-deepmin…

Just train a simple encoder-decoder from scratch to read the cluster’s complex state as text, then generate numeric tokens. We’re also seeing strong results on classic tabular data and "exotic" inputs like graphs, system logs, and even code snippets. Feature engineering will no longer exist!

Authors: @yashakha, Bryan Lewandowski, Cheng-Hsi Lin, Adrian N. Reyes, Grant C. Forbes, Arissa Wongpanich, Bangding Yang, @mohsaeid, @SagiPerel, @XingyouSong

0 replies · 0 reposts · 6 likes · 745 views
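The quoted thread's recipe is: serialize the raw system state as text for the encoder, and serialize the target metric as tokens for the decoder, with no feature engineering in between. A hedged sketch of how such training pairs might be formed; the log fields, format, and function name here are invented for illustration and are not from the paper.

```python
# Sketch: turn unstructured log lines plus a target metric (e.g. node
# utilization) into a (source, target) text pair for an encoder-decoder.
# No hand-built features: the source is just the raw log text, and the
# target is the number spelled out character by character.

def make_training_pair(log_lines: list[str], utilization: float) -> tuple[str, str]:
    """Build one text-to-text regression example from raw logs."""
    source = "\n".join(log_lines)               # unstructured state, as-is
    target = " ".join(f"{utilization:.3f}")     # character-level numeric target
    return source, target

src, tgt = make_training_pair(
    ["job=train-7 cpu=61% mem=48G", "job=serve-2 cpu=12% mem=9G"],
    0.725,
)
print(tgt)   # "0 . 7 2 5"
```

At inference the decoder generates those character tokens and the prediction is recovered by joining and parsing them; the appeal is that the same pipeline works unchanged when the "state" is a table, a graph dump, or a code snippet rendered as text.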
Mohamed Abdelfattah reposted
Gene Chou @gene_ch0u
We've released all code and models for FlashDepth! It produces depth maps from a 2K streaming video in real time. This was a really fun course project inspired by discussions with @mohsaied and @stevenygd, and we look forward to presenting it at #ICCV2025. GitHub: github.com/Eyeline-Resear… Project page: eyeline-research.github.io/FlashDepth/
Eyeline @eyelinestudios

The latest research paper from @eyelinestudios, FlashDepth, has been accepted to the International Conference on Computer Vision (#ICCV2025). Our model produces accurate and high-resolution depth maps from streaming videos in real time and is completely built on open-source models and data.

We hope it will be applied to various online applications, like robotics and on-set video composition. It has already been integrated into a few internal tools for visual effects and real-time depth estimation, segmentation, and matting tasks.

Congrats to the team: @gene_ch0u, @wenqi_xian, @stevenygd, @mohsaied, @Bharathharihar3, @Jimantha, @realNingYu, @debfx!

All models and code have been released:
GitHub: github.com/Eyeline-Resear…
Project page: eyeline-research.github.io/FlashDepth/
Paper: arxiv.org/abs/2504.07093

6 replies · 68 reposts · 549 likes · 37.6K views