Hamdy🧬
@mhamdy_res
677 posts

A curious explorer of human and machine learning 🧐🤝🤖

Cairo, Egypt · Joined March 2020
3.7K Following · 155 Followers
xjdr@_xjdr·
ok so:
engram is moe over ngramed memory
mHC is moe over the residual stream
NSA is moe over attention
MoE is moe over FFNs
... im sensing a theme ....
26 replies · 30 reposts · 545 likes · 59.6K views
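A minimal sketch of the shared pattern the post above is pointing at, in PyTorch: a learned router scores a set of experts per token, keeps the top-k, and mixes their outputs. What differs across engram / mHC / NSA / classic MoE is what the experts act on (n-gram memory, residual stream, attention, FFNs); the routing skeleton below, the FFN-style experts, and the top-2 choice are illustrative assumptions, not any of those papers' code.

```python
# Generic top-k MoE routing sketch (illustrative only, not from any specific paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouterMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)   # router
        # Stand-in experts; in practice these could operate on memory, attention, etc.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                           nn.Linear(4 * d_model, d_model)) for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        scores = self.gate(x)                                    # (B, S, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)     # route each token
        weights = F.softmax(topk_scores, dim=-1)                 # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[..., slot]                            # (B, S)
            w = weights[..., slot].unsqueeze(-1)                 # (B, S, 1)
            for e, expert in enumerate(self.experts):
                mask = (idx == e).unsqueeze(-1)                  # tokens routed to expert e
                if mask.any():
                    out = out + mask * w * expert(x)             # simple (not efficient) dispatch
        return out

x = torch.randn(2, 16, 64)
print(TopKRouterMoE(64)(x).shape)  # torch.Size([2, 16, 64])
```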
Hamdy🧬@mhamdy_res·
The new DeepSeek Engram paper is super fun! It also integrates mHC, and I think they're probably releasing all these papers to keep the V4 report at a reasonable length 😄 Here's a nice short summary from @GeminiApp 🫡
[image attached]
1 reply · 0 reposts · 1 like · 205 views
Sundar Pichai@sundarpichai·
MedGemma 1.5 is a major upgrade to our open models for healthcare developers. The new 4B model enables developers to build applications that natively interpret full 3D scans (CTs, MRIs) with high efficiency - a first, we believe, for an open medical generalist model. MedGemma 1.5 also pairs well with MedASR, our speech-to-text model fine-tuned for highly accurate medical dictation. Developers can now use these multimodal capabilities to build medical apps that reach patients in more places.
179 replies · 698 reposts · 6K likes · 394.5K views
Hamdy🧬@mhamdy_res·
On this day, the Heuristically programmed ALgorithmic computer, better known as HAL 9000, became operational.

"I am putting myself to the fullest possible use, which is all I think that any conscious entity can ever hope to do." - HAL 9000
0 replies · 0 reposts · 1 like · 28 views
Hamdy🧬@mhamdy_res·
On the slow death of scaling

Because we can't scale forever, and the world is not enough. Give it a chance; this isn't another "deep learning is hitting a wall" take...

open.substack.com/pub/surfingman…
[image attached]
1 reply · 0 reposts · 1 like · 38 views
Hamdy🧬@mhamdy_res·
Severance directed by David Lynch
[4 images attached]
0 replies · 0 reposts · 1 like · 41 views
Hamdy🧬@mhamdy_res·
Weekend read ☕🐋
[image attached]
1 reply · 0 reposts · 0 likes · 25 views
Hamdy🧬 retweeted
Google AI Developers@googleaidevs·
Announcing FunctionGemma, a specialized version of our Gemma 3 270M model that’s fine-tuned for function calling ⚙️ The new release brings bespoke function calling to the edge, and is designed as a strong base for further training into custom, fast, private, local agents that translate natural language into executable API actions. blog.google/technology/dev…
30 replies · 173 reposts · 1K likes · 180.8K views
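As a rough illustration of what "translate natural language into executable API actions" means in a function-calling setup: the app declares tool schemas, the model emits a structured call, and the app executes it. This is a generic sketch, not the FunctionGemma API; run_model, the schema layout, and the set_timer tool are hypothetical stand-ins.

```python
# Generic function-calling loop (illustrative; not FunctionGemma's actual interface).
import json

# Tool schemas the app exposes to the model (layout here is a common JSON-style convention).
TOOLS = [{
    "name": "set_timer",
    "description": "Start a countdown timer.",
    "parameters": {"minutes": {"type": "integer"}},
}]

def run_model(prompt: str, tools: list) -> str:
    """Stand-in for local inference with a function-calling model.
    A real model would map the request onto one of the declared tools;
    here the structured call is hard-coded for demonstration."""
    return json.dumps({"name": "set_timer", "arguments": {"minutes": 10}})

def dispatch(call_json: str) -> str:
    """Parse the model's structured call and execute the matching tool."""
    call = json.loads(call_json)
    if call["name"] == "set_timer":
        return f"timer started for {call['arguments']['minutes']} minutes"
    raise ValueError(f"unknown tool: {call['name']}")

call = run_model("set a timer for ten minutes", TOOLS)
print(dispatch(call))  # -> timer started for 10 minutes
```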
Hamdy🧬 retweeted
Brian Hie@BrianHie·
Very exciting work demonstrating the emergence of in-context learning in Evo 2 on purely synthetic tasks.
Daniel Khashabi 🕊️@DanielKhashabi

For years since the GPT-2 paper, emergent in-context learning (ICL) from 'next-token' training has been treated as something deeply tied to human language. But … is it? Thrilled to share our latest result: genomic🧬 models trained only on 'next-nucleotide prediction' exhibit ICL!

What's remarkable is that their overall pattern closely mirrors LLMs:
→ similar few-shot pattern induction
→ similar log-linear gains with more shots
→ similar improvement with model scale
... all learned purely from DNA (nucleotide) sequences.

How did we compare genomic vs language models? We built a suite of symbolic bitstring-reasoning tasks and encoded them two ways: (1) genomic alphabet (A/T/C/G) and (2) linguistic alphabet (digits). This lets us compare Evo2 (genomic) vs Qwen3 (language) under matched few-shot prompts.

Why it matters: To our knowledge, this is the first evidence of emergent ICL in non-[human]language symbolic sequences. It suggests that ICL is modality-agnostic and a general consequence of large-scale autoregressive training on rich data distributions.

Does ICL in genomic vs language models act identically? No! While they share macro-level ICL trends, each shows domain-specific inductive biases traceable to properties of DNA vs human language.

Does this mean human language structure is irrelevant? No! But it suggests there may be universal distributional properties across different languages (human, DNA, etc.) that yield ICL. It remains an open question what these properties are.

Draft: huggingface.co/papers/2511.12…

Huge thanks to @N8Programs for leading the work, and to collaborators @anqi_liu33 @aamixsh @mrevsine @mike_schatz. We're extremely thankful to the Evo2 team (@BrianHie @pdhsu @garykbrixi @mgdurrant @MichaelPoli6 etc.). Not only do these models help advance biomedical research; now we see that they can also help the AI community better understand the fundamentals of pre-training.

2 replies · 14 reposts · 99 likes · 20.9K views
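A toy sketch of the matched-prompt comparison described in the quoted post: the same symbolic bitstring task rendered once in a genomic alphabet (for a DNA model such as Evo2) and once in a digit alphabet (for a language model such as Qwen3). The concrete task (bitwise NOT), the 0/1 → A/T mapping, and the prompt layout are assumptions for illustration, not the paper's actual task suite.

```python
# Matched few-shot prompts for the same bitstring task in two alphabets (illustrative).
import random

GENOMIC = {"0": "A", "1": "T"}   # assumed bit-to-nucleotide mapping (uses 2 of the 4 letters)
DIGITS  = {"0": "0", "1": "1"}   # identity encoding for the language model

def bitwise_not(bits: str) -> str:
    return "".join("1" if b == "0" else "0" for b in bits)

def encode(bits: str, alphabet: dict) -> str:
    return "".join(alphabet[b] for b in bits)

def few_shot_prompt(n_shots: int, length: int, alphabet: dict, seed: int = 0) -> str:
    """Render n_shots input->output demonstrations plus one query, all in one alphabet."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_shots + 1):  # the last example becomes the query
        x = "".join(rng.choice("01") for _ in range(length))
        y = bitwise_not(x)
        lines.append(f"{encode(x, alphabet)}>{encode(y, alphabet)}")
    # Withhold the final answer so the model must complete it in-context.
    lines[-1] = lines[-1].split(">")[0] + ">"
    return "\n".join(lines)

print(few_shot_prompt(3, 8, GENOMIC))  # prompt for the genomic model
print(few_shot_prompt(3, 8, DIGITS))   # matched prompt for the language model
```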
Hamdy🧬 retweeted
Cohere Labs@Cohere_Labs·
Don’t miss "From Idea to Impact" — an AMA moderated by @singhshiviii, who grew from open science community contributor to Cohere Labs research engineer, first authoring some of our most impactful work. 🚀
[image attached]
1 reply · 2 reposts · 8 likes · 859 views
Hamdy🧬 retweeted
Zhanhui Zhou@asapzzhou·
(1/n) 🚨 BERTs that chat: turn any BERT into a chatbot with diffusion

hi @karpathy, we just trained a few BERTs to chat with diffusion — we are releasing all the model checkpoints, training curves, and recipes! Hopefully this spares you the side quest into training nanochat with diffusion for now 🙂. It's both a hands-on tutorial for beginners and an example showing how to use our complete toolkit (dLLM) for deeper projects.

Code: github.com/ZHZisZZ/dllm
Report: api.wandb.ai/links/asap-zzh…
Checkpoints: huggingface.co/collections/dl…

Motivation: I couldn't find a good "Hello World" example for training a minimally working yet useful diffusion language model, a class of bidirectional language models capable of parallel token generation in arbitrary order. So I tried finetuning BERTs to make them chat with discrete diffusion, and it turned out more fun than I expected.

TLDR: With a small amount of open-source instruction-following data, a standard BERT can gain conversational ability with diffusion. Specifically, a finetuned ModernBERT-large, with a similar number of parameters, performs close to Qwen1.5-0.5B.
Andrej Karpathy@karpathy

Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right, bottom) is the dominant paradigm in text. For audio I've seen a bit of both.

A lot of diffusion papers look a bit dense but if you strip the mathematical formalism, you end up with simple baseline algorithms, e.g. something a lot closer to flow matching in continuous, or something like this in discrete. It's your vanilla transformer but with bi-directional attention, where you iteratively re-sample and re-mask all tokens in your "tokens canvas" based on a noise schedule until you get the final sample at the last step. (Bi-directional attention is a lot more powerful, and you get a lot stronger autoregressive language models if you train with it; unfortunately it makes training a lot more expensive because now you can't parallelize across the sequence dim.)

So autoregression is doing an `.append(token)` to the tokens canvas while only attending backwards, while diffusion is refreshing the entire token canvas with a `.setitem(idx, token)` while attending bidirectionally.

Human thought naively feels a bit more like autoregression, but it's hard to say that there aren't more diffusion-like components in some latent space of thought. It feels quite possible that you can further interpolate between them, or generalize them further. And it's a component of the LLM stack that still feels a bit fungible.

Now I must resist the urge to side quest into training nanochat with diffusion.

21 replies · 118 reposts · 980 likes · 176K views
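The append-vs-refresh contrast in the quoted post can be sketched as two toy sampling loops over a shared stub model. The confidence-based re-masking rule and linear unmasking schedule below are one common choice for discrete diffusion samplers, assumed here for illustration; this is not the dLLM or nanochat recipe.

```python
# Autoregressive vs. discrete-diffusion sampling, with a stub model (illustrative only).
import torch

VOCAB, MASK_ID, SEQ_LEN = 1000, 0, 32

def model(tokens: torch.Tensor) -> torch.Tensor:
    """Stub: returns random logits of shape (seq, vocab). A real model would attend
    causally for autoregression and bidirectionally for diffusion."""
    return torch.randn(tokens.shape[0], VOCAB)

def sample_autoregressive(max_len: int = SEQ_LEN) -> torch.Tensor:
    tokens = torch.empty(0, dtype=torch.long)
    for _ in range(max_len):                          # left-to-right: .append(token)
        logits = model(tokens) if len(tokens) else torch.randn(1, VOCAB)
        next_tok = torch.distributions.Categorical(logits=logits[-1]).sample()
        tokens = torch.cat([tokens, next_tok.view(1)])
    return tokens

def sample_diffusion(steps: int = 8, max_len: int = SEQ_LEN) -> torch.Tensor:
    tokens = torch.full((max_len,), MASK_ID, dtype=torch.long)   # start fully masked
    for step in range(steps):                         # whole canvas: .setitem(idx, token)
        probs = model(tokens).softmax(-1)
        sampled = torch.distributions.Categorical(probs=probs).sample()
        conf = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
        # Noise schedule: keep progressively more of the most confident positions,
        # re-mask the rest and refine them on the next step.
        n_keep = int(max_len * (step + 1) / steps)
        keep = conf.topk(n_keep).indices
        new_tokens = torch.full_like(tokens, MASK_ID)
        new_tokens[keep] = sampled[keep]
        tokens = new_tokens
    return tokens

print(sample_autoregressive().shape, sample_diffusion().shape)  # both torch.Size([32])
```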
Hamdy🧬 retweeted
Cohere Labs@Cohere_Labs·
From multilingual models to diverse benchmarks and multimodal learning — Day 1 of Connect brings together researchers expanding what’s possible in global AI. 🖇️ Our lightning talks spotlight collaborative work that makes AI more representative of the world’s languages. ⚡
[image attached]
1 reply · 8 reposts · 16 likes · 1.4K views