Dan Ofer (Was @ICML,@Worldcon )

4.7K posts

@danofer

#Data scientist, #Researcher, Bioinformatician, Photographer, Geek & Bookworm. PhD #AI #LLM @HebrewU @HyadataLab @liniallab @shebaARC

Israel · Joined May 2008

1.1K Following · 1.1K Followers

Pinned Tweet

Dan Ofer (Was @ICML,@Worldcon )@danofer·1 Sep

1/ Our paper “Protein Language Models Expose Viral Immune Mimicry” is now published in Viruses! We show that protein Language Models can identify viral proteins, and those that fool our immune system.

1.5K

Dan Ofer (Was @ICML,@Worldcon )@danofer·21h

@andrey_kurenkov It's working for me.

Andrey Kurenkov@andrey_kurenkov·2d

Can we all agree that LLM-powered hyper param search to optimize nanoGPT better is not really AI research?

Prime Intellect@PrimeIntellect

Automating AI research is the next major step in AI. We let Claude Code (Opus 4.7) and Codex (GPT 5.5) run autonomously on the nanoGPT speedrun optimizer track using our idle compute. ~10k runs, ~14k H200 hours. Opus now holds the record at 2930 steps vs the 2990 human baseline.

429

42.6K

Dan Ofer (Was @ICML,@Worldcon )@danofer·21h

@KevinKaichuang @KlaraH_lab Almost like you could just randomly...evolve towards them? \Staga-dish drum roll

340

Kevin K. Yang 楊凱筌@KevinKaichuang·1d

Screen 1M random protein sequences to discover that biology-like folds are accessible from random sequences with surprising frequency @KlaraH_lab

280

22.7K

Dan Ofer (Was @ICML,@Worldcon ) retweeted

University of California@UofCalifornia·3d

Pancreatic cancer is one of the most dire diagnoses in medicine with few available treatments. Until now, thanks to university research, including @UCSF scientists, and federal investment in science research. Read about this huge breakthrough via @nytimes nyti.ms/4wfziXs

119

40.9K

Dan Ofer (Was @ICML,@Worldcon )@danofer·2d

@yoavgo @Jonathan_Blow switch 2 will have less of a price drop

(((ل()(ل() 'yoav))))👾@yoavgo·2d

@Jonathan_Blow ahhh now i am torn between the need to buy a Switch 2 and the need to buy a PC some time before 2027

2.1K

Jonathan Blow@Jonathan_Blow·2d

Something we've been working on...

271

728

9.8K

1.2M

Dan Ofer (Was @ICML,@Worldcon )@danofer·2d

@will_ea @LLMenjoyer Awesome. Thanks!

Will Bui@will_ea·2d

@danofer @LLMenjoyer It is different. The kernels in our package are for Block AttnRes. For multiple queries per block, it is more efficient. The kernels in FLA are more suited for Full AttnRes.

Will Bui@will_ea·3 May

27x faster Attention Residuals!!! 🚀
We implemented Block AttnRes as a pip-installable package.
!pip install flash-attn-res
No annoying kernel nonsense. No compile/autograd plumbing. Call it like a regular PyTorch op. It just works.
Methodology:
🔹 fused Triton kernels
🔹 batched attention over residual blocks
🔹 online-softmax merge
🔹 flash attention-style split-KV reduction
Thanks @LLMenjoyer and @cartesia for the support and guidance ✌️

Kimi.ai@Kimi_Moonshot

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.
Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗 Full report: github.com/MoonshotAI/Att…
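
For readers unfamiliar with the "online-softmax merge" named in the flash-attn-res announcement above, here is a minimal pure-Python sketch of the general technique. It is not taken from the flash-attn-res package or the MoonshotAI report; the function names (`partial_attention`, `merge`, `full_attention`) and the toy numbers are made up for illustration, and values are scalars to keep it short. The idea is that softmax attention over a long list of scores can be computed chunk by chunk, as long as each chunk keeps its running max and its sum of exponentials so the partial results can be rescaled and combined.

```python
import math

def partial_attention(scores, values):
    """Attention over one chunk, kept unnormalized.

    Returns (chunk max, sum of exp(score - max), exp-weighted sum of values).
    """
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    acc = sum(w * v for w, v in zip(weights, values))
    return m, denom, acc

def merge(a, b):
    """Combine two partial results by rescaling both to a shared max."""
    m_a, d_a, acc_a = a
    m_b, d_b, acc_b = b
    m = max(m_a, m_b)
    scale_a = math.exp(m_a - m)
    scale_b = math.exp(m_b - m)
    return m, d_a * scale_a + d_b * scale_b, acc_a * scale_a + acc_b * scale_b

def full_attention(scores, values):
    """Reference: ordinary softmax-weighted average over all scores at once."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical toy inputs, split into two chunks.
scores = [0.5, 2.0, -1.0, 3.0]
values = [1.0, 2.0, 3.0, 4.0]

left = partial_attention(scores[:2], values[:2])
right = partial_attention(scores[2:], values[2:])
_, denom, acc = merge(left, right)

merged_output = acc / denom
reference_output = full_attention(scores, values)
```

Here `merged_output` and `reference_output` agree up to floating-point error, which is what lets a split-KV kernel process chunks independently and reduce them afterwards.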

770

74K

Dan Ofer (Was @ICML,@Worldcon )@danofer·2d

@Aria_Churchill @multimodali Very much so

Aria Churchill@Aria_Churchill·9 May

@multimodali Is this true, chat?

175

Dan Ofer (Was @ICML,@Worldcon ) retweeted

multimodali@multimodali·8 May

Chris Hayduk@ChrisHayduk

Exactly - biology is a fundamentally different domain than text and scaling laws do not apply cleanly ~All the tasks you want an LLM to do are contained in the text data itself. For biology, NONE of the tasks you want the model to do are contained in the sequence data itself.

167

12.9K

Dan Ofer (Was @ICML,@Worldcon )@danofer·2d

@miangoar Weird in that for some benchmarks, model size | precompute | Being pretrained at all (vs random init), doesn't yield better performance. (Lots of XORs)

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·3d

@danofer That’s interesting. Weird in what sense? For example, something like showing a double descent behavior when training a PLM? Or weird in the sense that a smaller PLM shows better performance than its larger version?

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·10 May

I think this plot from the ProteinBERT paper clearly illustrates this. But if you want to see a more detailed analysis showing how scaling appears to improve only structure-related tasks, check out this other paper. Feature Reuse and Scaling x.com/KevinKaichuang…

multimodali@multimodali

4.1K

Dan Ofer (Was @ICML,@Worldcon )@danofer·3d

@FrankRHutter I couldn't quite understand what it expects re data preprocessing though?

Frank Hutter@FrankRHutter·4d

The data science revolution is here now. TabPFN-3 is live, taking tabular foundation models to enterprise scale 🤩 1M training rows on a single H100. No training. No tuning. Load and predict. 🧵 1/5 #tabpfn #tabularfoundationmodels #priorlabs

280

25.7K

Dan Ofer (Was @ICML,@Worldcon )@danofer·3d

@mayukh_panja It depends what you mean by balance. There's a gaping difference between "I want to do a job simultaneously" to "I can't meet up this week(end), we're doing a submission"; to "I left the lab before 11PM twice this month, and I stress cried because everyone was still working".

852

Mayukh@mayukh_panja·4d

I don’t agree. A PhD student should not prioritize work-life balance. Getting to do a PhD is a privilege. You are paid to think. There is no pressure for you to be economically useful. It is a unique opportunity to push the boundaries of human knowledge and produce something groundbreaking. And nothing great ever happens without complete devotion. Look at everything that moved and shaped the world. Every single person who created anything meaningful, in science, in arts, in music, in movies, devoted their lives to their craft. Extraordinary outcomes require extraordinary inputs and some degree of sacrifice. Sure, have work-life balance during your PhD. But be content with a mediocre outcome.

Dr. Manabendra Saharia@m_saharia

Yesterday, I was giving an intro talk to our dept's new PhD students. Technical things aside, my number 1 suggestion has remained the same over the years: Treat your PhD like a job.
- Avoid 1.5h lunch and three tea breaks.
- Avoid gossiping and loitering at work.
- Lab at 9 am and leave at 6 pm.
Being productive till 11 pm in the lab is a lie people tell themselves when their day starts at 1 PM. Everything worth doing can be done with high intensity focus during work hours. And having fun in life is the secret to being productive in a marathon.

359

277

1.1M

Dan Ofer (Was @ICML,@Worldcon )@danofer·3d

@yoavgo Biology and psych are good at experiments. CS don't know their noise from their random. (Still better than Physics tho).

182

(((ل()(ل() 'yoav))))👾@yoavgo·4d

generally, experiments with LLMs suck. even (esp?) from big players like anthropic. one failure mode is an experiment in which you change the input in some controlled way, and see a change in output. say, in 10% of cases you changed male to female, the response got ruder. you conclude that the model is rude to females. but... if you just did some other change (say, change active to passive voice), you also see that in 10% of cases the model got ruder. in other words, we failed to control the experiment. this is experimentation 101, but new results keep falling for this. i guess CS people just kinda suck at experimentation. anyways, extremely common. this work documents the issues, and offers guidelines on how to improve. (i'm on this paper, but did very little. i am strongly supportive of the message though)

Zihao (Gavin) Yang@ZihaoGavinYang

1/ (New paper!) If swapping the gender in an input prompt makes the AI model give a different answer it means that it has to have a gender bias, right? Wrong. 🧵on counterfactual prompting for LLM evals: Paper: arxiv.org/abs/2605.01048
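
The failure mode described in this thread can be made concrete with a small sketch. This is not the procedure from the linked paper; it is a generic two-proportion z-test, assuming the "got ruder" outcomes for the gender swap and for a control edit (such as the active-to-passive rewrite) are available as counts. The function name `two_proportion_z` and the counts below are hypothetical, chosen to echo the 10% figure in the tweet, and treating the two sets as independent samples is a simplification.

```python
import math

def two_proportion_z(k_treat, n_treat, k_ctrl, n_ctrl):
    """Compare the change rate under the targeted edit with the rate under a control edit.

    Returns (z statistic, two-sided p-value) for a pooled two-proportion z-test.
    """
    p_treat = k_treat / n_treat
    p_ctrl = k_ctrl / n_ctrl
    pooled = (k_treat + k_ctrl) / (n_treat + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_treat + 1 / n_ctrl))
    if se == 0:
        return 0.0, 1.0
    z = (p_treat - p_ctrl) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical counts: both edits make 10% of responses ruder,
# so the gender swap shows no effect beyond the control edit.
same_z, same_p = two_proportion_z(100, 1000, 100, 1000)

# Hypothetical counts: the gender swap makes 20% ruder vs 10% for the control edit.
diff_z, diff_p = two_proportion_z(200, 1000, 100, 1000)
```

In the first case the targeted edit is indistinguishable from the control, so the raw 10% says nothing about gender; only the gap over the control rate, as in the second case, is evidence of a specific effect.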

289

39K

Dan Ofer (Was @ICML,@Worldcon )@danofer·3d

@yoavgo Eh. Experience counts for an awful lot I say. Some things transfer. (Like spotting test set leakage)

(((ل()(ل() 'yoav))))👾@yoavgo·5d

"I've been doing AI for 20 years and ..." and nothing. LLMs are new. LLM-Agents are new. our 20+ years experience with AI/ML/NLP may be marginally useful for understanding aspects of their training, but that's about it. we need new tools and experiences. we don't deserve authority.

401

23.1K

Dan Ofer (Was @ICML,@Worldcon )@danofer·6d

@miangoar Uniprot sounds wrong. Just PTM on Swissprot would be a lot, nvm more manual curation

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·6d

“entire PDB archive is conservatively estimated at ∼US$20B, assuming an average cost of ∼US$100K for regenerating each experimental structure” academic.oup.com/nar/article/51… For Uniprot, the annual economic value is estimated between €332M - €524M ebi.ac.uk/about/news/ann…

Shae McLaughlin@shae_mcl

It’s estimated that the Protein Data Bank (PDB) cost around $13B to create. Alphafold was only possible because of it. If we want ML to solve biology, we should be funding the creation of databases and the development of new assay technologies. ML is nothing without data.

3.2K

Dan Ofer (Was @ICML,@Worldcon )@danofer·6d

@anshulkundaje Love this. (Shared with lab :D)

Anshul Kundaje@anshulkundaje·8 May

"The Bitter Lesson has fully arrived in sequence biology and protein structure. Evo 2, AlphaFold 2 and 3, ProGen3, RFdiffusion". This sentence has some issues IMO. 1/

Sylvain Gariel@SylvainGariel

x.com/i/article/2051…

392

66.7K

Dan Ofer (Was @ICML,@Worldcon )@danofer·10 May

@KevinKaichuang @francescazfl @avapamini @yisongyue @alexijielu Can confirm, horrifyingly true for some tasks. Not all :D

Kevin K. Yang 楊凱筌@KevinKaichuang·8 Feb

We did 370 experiments to discover that protein language models primarily learn structure and won't scale for protein function prediction. We need new pretraining tasks! Work led by @francescazfl with @avapamini @yisongyue @alexijielu See Alex's thread + the paper for more!