Daniel Shao
@DanielStupid
45 posts
Joined May 2015
218 Following · 30 Followers
Daniel Shao retweeted
Biology+AI Daily @BiologyAIDaily
AbRank: A Benchmark Dataset and Metric-Learning Framework for Antibody–Antigen Affinity Ranking

1. AbRank introduces a large-scale benchmark for antibody–antigen (Ab–Ag) affinity prediction, reframing the task as pairwise ranking rather than regression. This design improves generalization and robustness by focusing on relative binding preferences instead of noisy absolute values.
2. The dataset comprises over 380,000 Ab–Ag binding measurements aggregated from nine public sources, spanning highly diverse antibodies and antigens across multiple experimental conditions and affinity metrics (Kd, IC50, escape fractions).
3. AbRank introduces "m-confident ranking" by training only on pairs with at least an m-fold difference in affinity, filtering out ambiguous comparisons and emphasizing biologically meaningful distinctions.
4. Three standard train-test splits assess generalization: (i) Balanced, (ii) Hard Ab (novel antibodies), and (iii) Hard Ag (novel antigens). These splits test performance under increasing distribution shift.
5. Two benchmarking scenarios are supported: the Unrelated Complex Benchmark (diverse Ab–Ag pairs) and the Local Perturbation Benchmark (closely related variants). This dual setup evaluates both broad generalization and fine-grained affinity shifts (e.g., from mutations).
6. Structures for all antibodies and antigens were predicted using efficient models (IgFold, Boltz-1), enabling scalable structure-aware learning without requiring known complex structures.
7. The authors propose WALLE-Affinity, a graph-based method combining pretrained embeddings (AntiBERTy for Abs, ESM-2 for Ags) with structural graphs to predict pairwise affinity rankings.
8. WALLE-Affinity trained with a ranking loss consistently outperforms regression-based variants and other baselines (ANTIPASTI, GearBind, PBEE, FoldX), especially under hard generalization settings.
9. The model performs inference using only individual Ab and Ag structures, avoiding complex structure prediction while remaining fast (~10 sec/complex) and accurate.
10. Ranking-based supervision consistently yields better generalization than regression, particularly for unseen-antigen scenarios, supporting the hypothesis that pairwise comparison is more robust to noise and label uncertainty.
11. Despite its scalability and robustness, the model's performance declines on local perturbation tasks, reflecting the challenge of predicting subtle changes from minor sequence edits.
12. AbRank offers a unified platform for evaluating Ab–Ag affinity models under realistic and challenging scenarios, designed to catalyze progress in therapeutic antibody design, affinity maturation, and immune escape prediction.

💻 Code: github.com/biochunan/AbRa…
📜 Paper: arxiv.org/abs/2506.17857…

#AntibodyDesign #ProteinInteraction #MachineLearning #Bioinformatics #GraphNeuralNetworks #Ranking #Benchmark #ComputationalBiology
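The pairwise-ranking setup described above, with training restricted to m-confident pairs, can be pictured with a minimal sketch. Everything below is a hypothetical illustration, not the AbRank code: a filter that keeps only pairs whose measured affinities differ by at least m-fold, plus a logistic pairwise ranking loss over model scores.

```python
import math

def m_confident_pairs(measurements, m=10.0):
    """Keep only Ab-Ag pairs whose measured affinities (e.g. Kd) differ
    by at least an m-fold ratio -- the 'm-confident ranking' filter
    described in the thread. `measurements` is a list of
    (complex_id, affinity) tuples; lower Kd means tighter binding.
    Hypothetical helper, not the authors' code."""
    pairs = []
    for i, (id_a, aff_a) in enumerate(measurements):
        for id_b, aff_b in measurements[i + 1:]:
            if max(aff_a, aff_b) / min(aff_a, aff_b) >= m:
                # order each pair so the tighter binder comes first
                winner, loser = (id_a, id_b) if aff_a < aff_b else (id_b, id_a)
                pairs.append((winner, loser))
    return pairs

def pairwise_ranking_loss(score_winner, score_loser):
    """Logistic pairwise ranking loss: small when the model scores the
    tighter binder higher, large when it gets the order wrong."""
    return math.log(1.0 + math.exp(-(score_winner - score_loser)))

data = [("ab1", 1e-9), ("ab2", 5e-9), ("ab3", 2e-7)]
print(m_confident_pairs(data))
# -> [('ab1', 'ab3'), ('ab2', 'ab3')]: the 5-fold ab1/ab2 pair is dropped
```

Training on filtered pairs like these optimizes relative order directly, which is why noisy absolute affinity values matter less than in a regression setup.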
Daniel Shao retweeted
Michael Moor @Michael_D_Moor
🧵1/ ✨New preprint✨

LLMs are getting better at answering medical questions. However, they still struggle to spot and fix errors in their own reasoning. That's a big problem in medicine, where the stakes are high and a mistake at any step could be critical.

To address this, we introduce Med-PRM, a process reward model that evaluates each reasoning step using clinical guidelines and high-quality medical sources. Evaluated on 7 benchmarks, Med-PRM improves accuracy by up to +13.5%, enabling the first open 8B-parameter model to surpass 80% on MedQA. We hope this work takes the field one step toward trustworthy and verified medical LLMs.

📄 Paper: arxiv.org/abs/2506.11474
🔗 Page: med-prm.github.io
🧠 Model: huggingface.co/dmis-lab/llama…
📚 Dataset: huggingface.co/datasets/dmis-…
💻 Code: github.com/eth-medical-ai…

Great collab with: Jaehoon Yun*, Jiwoong Sohn* (@de_Jiung), Jungwoo Park*, Hyunjae Kim, Xiangru Tang (@XiangruTang), Daniel Shao (@DanielStupid), Yong Hoe Koo, Ko Minhyeok, Qingyu Chen (@qingyu_qc), Mark Gerstein (@MarkGerstein), Jaewoo Kang# (@jkang101).
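As a rough illustration of how a process reward model can be used at inference time, here is a minimal sketch. Both helpers and the min-aggregation are my assumptions, not the Med-PRM implementation: each step of a sampled reasoning chain gets a reward, a chain is scored by its weakest step, and the answer from the best-scoring chain wins.

```python
def score_reasoning_chain(step_rewards):
    """Aggregate per-step PRM rewards into one chain-level score.
    Taking the minimum (a chain is only as good as its weakest step)
    is one common choice -- the paper's aggregation may differ."""
    return min(step_rewards)

def select_best_answer(candidates):
    """candidates: list of (answer, step_rewards) for sampled reasoning
    chains; return the answer whose chain scores highest.
    Hypothetical re-ranking helper, not the Med-PRM code."""
    best_answer, _ = max(candidates, key=lambda c: score_reasoning_chain(c[1]))
    return best_answer

chains = [
    ("A", [0.9, 0.8, 0.4]),   # one shaky step drags the whole chain down
    ("B", [0.7, 0.7, 0.7]),
]
print(select_best_answer(chains))  # -> B
```

The point of step-level rewards, as opposed to a single outcome reward, is exactly this: chain A's confident-looking answer is rejected because one intermediate step is judged unreliable.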
CaoHe @Shinichi_Izumm
@andrewwhite01 Great work! We also explore LLM reasoning in chemistry with Beyond Chemical QA (arxiv.org/pdf/2505.21318), a 22-task, 1500-sample benchmark for step-by-step evaluation. We're expanding it into a larger long-CoT dataset for better RL training. Exciting times ahead! 🚀
Andrew White 🐦‍⬛ @andrewwhite01
At FutureHouse, we've noticed scientific agents are good at applying average intelligence across tasks. They always seem to make the obvious choices, which is good, but discovery sometimes requires more intuition and insight than average.

We've made a first step today towards superhuman insight by training a reasoning model for a specific domain of science: designing drug-like molecules. We're releasing a 24B open-weights reasoning model called 𝚎𝚝𝚑𝚎𝚛𝟶. 𝚎𝚝𝚑𝚎𝚛𝟶 has been trained with reinforcement learning to exceed frontier models and human experts across a range of molecular design tasks. It takes in natural language, reasons in English, and outputs a new molecule. 𝚎𝚝𝚑𝚎𝚛𝟶 is now a tool for our chemistry design agent, Phoenix, which can call upon it to design molecules.

Training a reasoning model for a scientific domain like chemistry, rather than math or programming, required a number of small technical advances. For example, we developed an iterative method of splitting specialist models and aggregating their reasoning traces. Another example: we used LLMs to rewrite questions that were partially solved.

A major finding from this work is that we can train with >10x efficiency per experimental measurement when using a reasoning model rather than fine-tuning. We also found that reasoning models can learn new tasks developed specifically for this paper that do not appear in pretraining corpora. We even saw a task sit at 0% performance until 100 steps into RL, at which point it was randomly solved once. This, along with the change in modality from natural language to molecules, bodes well for applying reasoning models far from natural language.

Reasoning models in science are the future. Scientific tasks naturally come with verifiable rewards: the physical world is the ultimate arbiter of accuracy, rather than human contractors. The data-efficiency gain and the ability to exceed frontier models with relatively few parameters and little compute mean we should expect more scientific reasoning models soon.

Congrats to the team @SidN137, James, @Ryan__Rhys, Albert, @GWellawatte, @maykcaldas, @ludomitch, and @SGRodriques. Thanks to @VoltagePark, @nvidia, and @huggingface for supporting us, and huge thanks to @ericschmidt for funding @FutureHouseSF. The model weights, reward model, and new benchmark are open source. You can also read more about scientific reasoning models in our exclusive with Nature.
Daniel Shao retweeted
Jiayi Zhang @didiforx
No fortress, purely open ground. Manus 👋. We open-sourced its core feature in 2 hours after dinner. Check it out 👇: github.com/mannaandpoem/O… 1/4
Daniel Shao retweeted
MetaGPT @MetaGPT_
20 months: 0 → 7 papers (2 ICLR orals) & 40+ institution collabs. With a clear vision, we're building the open-source foundation for tomorrow's agents. We also release MGX (mgx.dev) and commit to open-sourcing its core soon. Check the thread for what we've built! 1/8
Daniel Shao retweeted
Jiayi Zhang @didiforx
Reasoning models lack atomic thought ⚛️ Unlike humans, who reason in independent units, they store full histories 🤔 Introducing Atom of Thoughts (AOT): it lifts gpt-4o-mini to 80.6% F1 on HotpotQA, surpassing o3-mini and DeepSeek-R1! The best part? It plugs into ANY framework 🔌 1/5
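From the description above, the AOT loop might look roughly like this. This is a toy sketch under my own reading of the thread: `decompose`, `solve_atom`, and `contract` stand in for LLM calls, and the arithmetic demo is entirely hypothetical; the actual method's prompts and decomposition differ.

```python
def atom_of_thoughts(question, decompose, solve_atom, contract, max_iters=5):
    """Sketch of the AOT idea: peel off independent atomic subquestions,
    solve each in isolation, then contract their answers into a smaller
    question -- instead of carrying the full reasoning history."""
    current = question
    for _ in range(max_iters):
        atoms = decompose(current)
        if not atoms:                       # nothing left to peel off
            break
        answers = [solve_atom(a) for a in atoms]
        current = contract(answers)         # the question shrinks
    return solve_atom(current)

# Toy demo: a total-price question decomposed into independent lookups.
PRICES = {"apple": 2, "pear": 3}

def decompose(q):
    return [w for w in q.split() if w in PRICES]  # independent atoms

def solve_atom(q):
    if q in PRICES:
        return PRICES[q]
    if isinstance(q, str) and q.startswith("add"):
        return sum(int(x) for x in q.split()[1:])
    return q

def contract(answers):
    return "add " + " ".join(str(a) for a in answers)

print(atom_of_thoughts("price of apple and pear",
                       decompose, solve_atom, contract))  # -> 5
```

The key contrast with chain-of-thought is that each iteration replaces the question rather than appending to a transcript, so the context stays small.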
Daniel Shao retweeted
Rob Tang 🦞 @XiangruTang
Excited to share our latest work, BC-Design, a new framework for highly accurate inverse protein folding that achieves an unprecedented 88.37% sequence recovery rate (previous SOTA: ~67%)! 🧬

To put this in perspective, the field's progress on the CATH 4.2 benchmark:
- ProteinMPNN (2022): ~51%
- ESM-IF1 (2023): ~55%
- SPDesign (2024): ~67%
- BC-Design: 88.37%
A quantum leap! 📈

BC-Design uses a novel architecture combining:
- a Struct-Encoder for backbone structure
- a BC-Encoder for biochemical features
- a BC-Fusion module to integrate both signals
- all optimized with contrastive learning 🧪

Key innovation: we represent biochemical properties (hydrophobicity & charge) as distributions in 3D space rather than per-residue features, a more natural way to capture their spatial distribution. 🔬

Particularly proud of the robust generalization: consistently high performance across all major CATH fold classes 💪 and across proteins of different sizes (50-500 residues) and structural complexities. 📈

Code and models are available! Looking forward to seeing how the community builds on this work to advance protein design! 😃😃🧑‍🔬

📜: biorxiv.org/content/10.110…

#StructuralBiology #DeepLearning #ProteinDesign
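The "properties as distributions in 3D space" idea can be pictured with a small sketch: instead of attaching a scalar hydrophobicity to each residue, smear each residue's value over space and sample the resulting field at query points. The Gaussian kernel and sigma are my assumptions for illustration, not the paper's parametrization.

```python
import math

def property_field(residues, query_points, sigma=4.0):
    """Sample a Gaussian-smeared biochemical property field at 3D query
    points. `residues` is a list of ((x, y, z), value) pairs, e.g. one
    hydrophobicity value per residue; sigma is a smoothing length in
    Angstroms. Illustrative only -- not the BC-Design encoder."""
    field = []
    for qx, qy, qz in query_points:
        total = 0.0
        for (x, y, z), value in residues:
            d2 = (qx - x) ** 2 + (qy - y) ** 2 + (qz - z) ** 2
            total += value * math.exp(-d2 / (2 * sigma ** 2))
        field.append(total)
    return field

# One hydrophobic residue at the origin: the field peaks there and
# decays smoothly with distance, giving a spatial (not per-residue) signal.
near, far = property_field([((0.0, 0.0, 0.0), 1.0)], [(0, 0, 0), (20, 0, 0)])
```

A field like this can be sampled on a grid or point cloud and fed to a 3D encoder, which is the kind of representation the thread's "distributions in 3D space" phrasing suggests.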
Premier League @premierleague
Who gets your vote? 🗳️
Jim Fan @DrJimFan
While we are waiting for World Cup finals, here are DeepMind’s AI bots playing soccer in simulation! The agents don’t communicate with each other and only try to maximize their own incentive. But teamwork and complex strategies *emerge* through repeated competition!
LiveScore @livescore
Which Premier League side is this? 🤔