Hamdy🧬
@mhamdy_res
677 posts

A curious explorer of human and machine learning 🧐🤝🤖

Cairo, Egypt · Joined March 2020
3.7K Following · 155 Followers
xjdr@_xjdr·
ok so:
engram is moe over ngramed memory
mHC is moe over the residual stream
NSA is moe over attention
MoE is moe over FFNs
... im sensing a theme ....
26 replies · 30 reposts · 545 likes · 59.6K views
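A minimal sketch of the shared pattern the post above is pointing at, in PyTorch: a learned router scores a set of experts per token, keeps the top-k, and mixes their outputs. What differs across engram / mHC / NSA / classic MoE is what the experts act on (n-gram memory, residual stream, attention, FFNs); the routing skeleton below, the FFN-style experts, and the top-2 choice are illustrative assumptions, not any of those papers' code.

```python
# Generic top-k MoE routing sketch (illustrative only, not from any specific paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouterMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)   # router
        # Stand-in experts; in practice these could operate on memory, attention, etc.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                           nn.Linear(4 * d_model, d_model)) for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        scores = self.gate(x)                                    # (B, S, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)     # route each token
        weights = F.softmax(topk_scores, dim=-1)                 # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[..., slot]                            # (B, S)
            w = weights[..., slot].unsqueeze(-1)                 # (B, S, 1)
            for e, expert in enumerate(self.experts):
                mask = (idx == e).unsqueeze(-1)                  # tokens routed to expert e
                if mask.any():
                    out = out + mask * w * expert(x)             # simple (not efficient) dispatch
        return out

x = torch.randn(2, 16, 64)
print(TopKRouterMoE(64)(x).shape)  # torch.Size([2, 16, 64])
```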
Hamdy🧬@mhamdy_res·
The new DeepSeek Engram paper is super fun! It also integrates mHC, and I think they're probably releasing all these papers to keep the V4 report at a reasonable length 😄 Here's a nice short summary from @GeminiApp 🫡
[image attached]
1 reply · 0 reposts · 1 like · 205 views
Sundar Pichai@sundarpichai·
MedGemma 1.5 is a major upgrade to our open models for healthcare developers. The new 4B model enables developers to build applications that natively interpret full 3D scans (CTs, MRIs) with high efficiency - a first, we believe, for an open medical generalist model. MedGemma 1.5 also pairs well with MedASR, our speech-to-text model fine-tuned for highly accurate medical dictation. Developers can now use these multimodal capabilities to build medical apps that reach patients in more places.
179 replies · 698 reposts · 6K likes · 394.5K views
Hamdy🧬@mhamdy_res·
On this day, the Heuristically programmed ALgorithmic computer, better known as HAL 9000, became operational.

"I am putting myself to the fullest possible use, which is all I think that any conscious entity can ever hope to do." - HAL 9000
0 replies · 0 reposts · 1 like · 28 views
Hamdy🧬@mhamdy_res·
On the slow death of scaling

Because we can't scale forever, and the world is not enough. Give it a chance; this isn't another "deep learning is hitting a wall" take...

open.substack.com/pub/surfingman…
[image attached]
1 reply · 0 reposts · 1 like · 38 views
Hamdy🧬@mhamdy_res·
Severance directed by David Lynch
[4 images attached]
0 replies · 0 reposts · 1 like · 41 views
Hamdy🧬@mhamdy_res·
Weekend read ☕🐋
[image attached]
1 reply · 0 reposts · 0 likes · 25 views
Hamdy🧬 retweeted
Google AI Developers@googleaidevs·
Announcing FunctionGemma, a specialized version of our Gemma 3 270M model that’s fine-tuned for function calling ⚙️ The new release brings bespoke function calling to the edge, and is designed as a strong base for further training into custom, fast, private, local agents that translate natural language into executable API actions. blog.google/technology/dev…
30 replies · 173 reposts · 1K likes · 180.8K views
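As a rough illustration of what "translate natural language into executable API actions" means in a function-calling setup: the app declares tool schemas, the model emits a structured call, and the app executes it. This is a generic sketch, not the FunctionGemma API; run_model, the schema layout, and the set_timer tool are hypothetical stand-ins.

```python
# Generic function-calling loop (illustrative; not FunctionGemma's actual interface).
import json

# Tool schemas the app exposes to the model (layout here is a common JSON-style convention).
TOOLS = [{
    "name": "set_timer",
    "description": "Start a countdown timer.",
    "parameters": {"minutes": {"type": "integer"}},
}]

def run_model(prompt: str, tools: list) -> str:
    """Stand-in for local inference with a function-calling model.
    A real model would map the request onto one of the declared tools;
    here the structured call is hard-coded for demonstration."""
    return json.dumps({"name": "set_timer", "arguments": {"minutes": 10}})

def dispatch(call_json: str) -> str:
    """Parse the model's structured call and execute the matching tool."""
    call = json.loads(call_json)
    if call["name"] == "set_timer":
        return f"timer started for {call['arguments']['minutes']} minutes"
    raise ValueError(f"unknown tool: {call['name']}")

call = run_model("set a timer for ten minutes", TOOLS)
print(dispatch(call))  # -> timer started for 10 minutes
```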
Hamdy🧬 retweeted
Brian Hie@BrianHie·
Very exciting work demonstrating the emergence of in-context learning in Evo 2 on purely synthetic tasks.
Daniel Khashabi 🕊️@DanielKhashabi

For years since the GPT-2 paper, emergent in-context learning (ICL) from 'next-token' training has been treated as something deeply tied to human language. But … is it? Thrilled to share our latest result: genomic🧬 models trained only on 'next-nucleotide prediction' exhibit ICL!

What's remarkable is that their overall pattern closely mirrors LLMs:
→ similar few-shot pattern induction
→ similar log-linear gains with more shots
→ similar improvement with model scale
... all learned purely from DNA (nucleotide) sequences.

How did we compare genomic vs language models? We built a suite of symbolic bitstring-reasoning tasks and encoded them two ways: (1) genomic alphabet (A/T/C/G) and (2) linguistic alphabet (digits). This lets us compare Evo2 (genomic) vs Qwen3 (language) under matched few-shot prompts.

Why it matters: To our knowledge, this is the first evidence of emergent ICL in non-[human]language symbolic sequences. It suggests that ICL is modality-agnostic and a general consequence of large-scale autoregressive training on rich data distributions.

Does ICL in genomic vs language models act identically? No! While they share macro-level ICL trends, each shows domain-specific inductive biases traceable to properties of DNA vs human language.

Does this mean human language structure is irrelevant? No! But it suggests there may be universal distributional properties across different languages (human, DNA, etc.) that yield ICL. It remains an open question what these properties are.

Draft: huggingface.co/papers/2511.12…

Huge thanks to @N8Programs for leading the work, and to collaborators @anqi_liu33 @aamixsh @mrevsine @mike_schatz. We're extremely thankful to the Evo2 team (@BrianHie @pdhsu @garykbrixi @mgdurrant @MichaelPoli6 etc.). Not only do these models help advance biomedical research; now we see that they can also help the AI community better understand the fundamentals of pre-training.

2 replies · 14 reposts · 99 likes · 20.9K views
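A toy sketch of the matched-prompt comparison described in the quoted post: the same symbolic bitstring task rendered once in a genomic alphabet (for a DNA model such as Evo2) and once in a digit alphabet (for a language model such as Qwen3). The concrete task (bitwise NOT), the 0/1 → A/T mapping, and the prompt layout are assumptions for illustration, not the paper's actual task suite.

```python
# Matched few-shot prompts for the same bitstring task in two alphabets (illustrative).
import random

GENOMIC = {"0": "A", "1": "T"}   # assumed bit-to-nucleotide mapping (uses 2 of the 4 letters)
DIGITS  = {"0": "0", "1": "1"}   # identity encoding for the language model

def bitwise_not(bits: str) -> str:
    return "".join("1" if b == "0" else "0" for b in bits)

def encode(bits: str, alphabet: dict) -> str:
    return "".join(alphabet[b] for b in bits)

def few_shot_prompt(n_shots: int, length: int, alphabet: dict, seed: int = 0) -> str:
    """Render n_shots input->output demonstrations plus one query, all in one alphabet."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_shots + 1):  # the last example becomes the query
        x = "".join(rng.choice("01") for _ in range(length))
        y = bitwise_not(x)
        lines.append(f"{encode(x, alphabet)}>{encode(y, alphabet)}")
    # Withhold the final answer so the model must complete it in-context.
    lines[-1] = lines[-1].split(">")[0] + ">"
    return "\n".join(lines)

print(few_shot_prompt(3, 8, GENOMIC))  # prompt for the genomic model
print(few_shot_prompt(3, 8, DIGITS))   # matched prompt for the language model
```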
Hamdy🧬 retweeted
Cohere Labs@Cohere_Labs·
Don’t miss "From Idea to Impact" — an AMA moderated by @singhshiviii, who grew from open science community contributor to Cohere Labs research engineer, first authoring some of our most impactful work. 🚀
[image attached]
1 reply · 2 reposts · 8 likes · 859 views
Hamdy🧬 retweeted
Zhanhui Zhou@asapzzhou·
(1/n) 🚨 BERTs that chat: turn any BERT into a chatbot with diffusion

hi @karpathy, we just trained a few BERTs to chat with diffusion — we are releasing all the model checkpoints, training curves, and recipes! Hopefully this spares you the side quest into training nanochat with diffusion for now 🙂. It's both a hands-on tutorial for beginners and an example showing how to use our complete toolkit (dLLM) for deeper projects.

Code: github.com/ZHZisZZ/dllm
Report: api.wandb.ai/links/asap-zzh…
Checkpoints: huggingface.co/collections/dl…

Motivation: I couldn't find a good "Hello World" example for training a minimally working yet useful diffusion language model, a class of bidirectional language models capable of parallel token generation in arbitrary order. So I tried finetuning BERTs to make them chat with discrete diffusion, and it turned out more fun than I expected.

TLDR: With a small amount of open-source instruction-following data, a standard BERT can gain conversational ability with diffusion. Specifically, a finetuned ModernBERT-large, with a similar number of parameters, performs close to Qwen1.5-0.5B.
Andrej Karpathy@karpathy

Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right, bottom) is the dominant paradigm in text. For audio I've seen a bit of both.

A lot of diffusion papers look a bit dense but if you strip the mathematical formalism, you end up with simple baseline algorithms, e.g. something a lot closer to flow matching in continuous, or something like this in discrete. It's your vanilla transformer but with bi-directional attention, where you iteratively re-sample and re-mask all tokens in your "tokens canvas" based on a noise schedule until you get the final sample at the last step. (Bi-directional attention is a lot more powerful, and you get a lot stronger autoregressive language models if you train with it; unfortunately it makes training a lot more expensive because now you can't parallelize across the sequence dim.)

So autoregression is doing an `.append(token)` to the tokens canvas while only attending backwards, while diffusion is refreshing the entire token canvas with a `.setitem(idx, token)` while attending bidirectionally.

Human thought naively feels a bit more like autoregression, but it's hard to say that there aren't more diffusion-like components in some latent space of thought. It feels quite possible that you can further interpolate between them, or generalize them further. And it's a component of the LLM stack that still feels a bit fungible.

Now I must resist the urge to side quest into training nanochat with diffusion.

21 replies · 118 reposts · 980 likes · 176K views
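The append-vs-refresh contrast in the quoted post can be sketched as two toy sampling loops over a shared stub model. The confidence-based re-masking rule and linear unmasking schedule below are one common choice for discrete diffusion samplers, assumed here for illustration; this is not the dLLM or nanochat recipe.

```python
# Autoregressive vs. discrete-diffusion sampling, with a stub model (illustrative only).
import torch

VOCAB, MASK_ID, SEQ_LEN = 1000, 0, 32

def model(tokens: torch.Tensor) -> torch.Tensor:
    """Stub: returns random logits of shape (seq, vocab). A real model would attend
    causally for autoregression and bidirectionally for diffusion."""
    return torch.randn(tokens.shape[0], VOCAB)

def sample_autoregressive(max_len: int = SEQ_LEN) -> torch.Tensor:
    tokens = torch.empty(0, dtype=torch.long)
    for _ in range(max_len):                          # left-to-right: .append(token)
        logits = model(tokens) if len(tokens) else torch.randn(1, VOCAB)
        next_tok = torch.distributions.Categorical(logits=logits[-1]).sample()
        tokens = torch.cat([tokens, next_tok.view(1)])
    return tokens

def sample_diffusion(steps: int = 8, max_len: int = SEQ_LEN) -> torch.Tensor:
    tokens = torch.full((max_len,), MASK_ID, dtype=torch.long)   # start fully masked
    for step in range(steps):                         # whole canvas: .setitem(idx, token)
        probs = model(tokens).softmax(-1)
        sampled = torch.distributions.Categorical(probs=probs).sample()
        conf = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
        # Noise schedule: keep progressively more of the most confident positions,
        # re-mask the rest and refine them on the next step.
        n_keep = int(max_len * (step + 1) / steps)
        keep = conf.topk(n_keep).indices
        new_tokens = torch.full_like(tokens, MASK_ID)
        new_tokens[keep] = sampled[keep]
        tokens = new_tokens
    return tokens

print(sample_autoregressive().shape, sample_diffusion().shape)  # both torch.Size([32])
```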
Hamdy🧬 retweeted
Cohere Labs@Cohere_Labs·
From multilingual models to diverse benchmarks and multimodal learning — Day 1 of Connect brings together researchers expanding what’s possible in global AI. 🖇️ Our lightning talks spotlight collaborative work that makes AI more representative of the world’s languages. ⚡
[image attached]
1 reply · 8 reposts · 16 likes · 1.4K views