Whijae Roh

738 posts


@whijae

Computational Biologist #CancerGenomics #GenomicMedicine #AI

San Diego, CA · Joined May 2010
599 Following · 295 Followers
Whijae Roh retweeted
Sakana AI@SakanaAILabs·
The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature

Nature: nature.com/articles/s4158…
Blog: sakana.ai/ai-scientist-n…

When we first introduced The AI Scientist, we shared an ambitious vision of an agent powered by foundation models capable of executing the entire machine learning research lifecycle. From inventing ideas and writing code to executing experiments and drafting the manuscript, the system demonstrated that end-to-end automation of the scientific process is possible. Soon after, we shared a historic update: the improved AI Scientist-v2 produced the first fully AI-generated paper to pass a rigorous human peer-review process.

Today, we are happy to announce that “The AI Scientist: Towards Fully Automated AI Research,” our paper describing all of this work, along with fresh new insights, has been published in @Nature!

This Nature publication consolidates these milestones and details the underlying foundation model orchestration. It also introduces our Automated Reviewer, which matches human review judgments and actually exceeds standard inter-human agreement. Crucially, by using this reviewer to grade papers generated by different foundation models, we discovered a clear scaling law of science: as the underlying foundation models improve, the quality of the generated scientific papers increases correspondingly. This implies that as compute costs decrease and model capabilities continue to increase exponentially, future versions of The AI Scientist will be substantially more capable.

Building upon our previous open-source releases (github.com/SakanaAI/AI-Sc…), this open-access Nature publication comprehensively details our system's architecture, outlines several new scaling results, and discusses the promise and challenges of AI-generated science.
This substantial milestone is the result of a close and fruitful collaboration between researchers at Sakana AI, the University of British Columbia (UBC) and the Vector Institute, and the University of Oxford. Congrats to the team! @_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru @jeffclune
Whijae Roh retweeted
Bo Wang@BoWang87·
Today we’re announcing X-Cell — Xaira’s first step toward a virtual cell. 🧬 A foundation model that predicts how gene expression changes under causal perturbations — across cell types, conditions, and even unseen biology. This is not trained on observational atlases. It is trained on interventions. 🧵👇
Whijae Roh retweeted
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger things:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale, of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
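The loop described above (propose a change, run a cheap experiment, keep the change only if the validation metric improves, and let the result history shape the next proposal) can be sketched in miniature. Everything below is hypothetical: `train_and_eval` is a toy surrogate objective standing in for a real training run, and the knob names and step sizes are invented for illustration.

```python
import random

# Hypothetical sketch (not Karpathy's actual tooling) of an autoresearch
# loop: propose a config tweak, evaluate it cheaply, keep the tweak only if
# the validation metric improves, and use the accept/reject history to shape
# later proposals.

def train_and_eval(config):
    """Toy surrogate 'validation loss', minimized at wd=0.1, beta2=0.95."""
    return (config["weight_decay"] - 0.1) ** 2 + (config["beta2"] - 0.95) ** 2

def propose(config, history):
    """Perturb one knob at random; shrink steps as accepted changes pile up."""
    knob = random.choice(sorted(config))
    step = 0.1 / (1 + sum(h["accepted"] for h in history))
    candidate = dict(config)
    candidate[knob] += random.uniform(-step, step)
    return candidate

random.seed(0)
config = {"weight_decay": 0.5, "beta2": 0.80}   # deliberately mistuned start
best_loss = train_and_eval(config)
history = []

for _ in range(200):                             # ~"700 changes", scaled down
    candidate = propose(config, history)
    loss = train_and_eval(candidate)
    accepted = loss < best_loss
    if accepted:
        config, best_loss = candidate, loss
    history.append({"loss": loss, "accepted": accepted})

print(f"best val loss {best_loss:.4f} with {config}")
```

The greedy accept-if-better rule is the simplest possible policy; the workflow in the tweet additionally reasons over the experiment history when planning, which here is reduced to shrinking the proposal step size.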
Whijae Roh retweeted
Patrick Hsu@pdhsu·
Evo 2, our fully open-source biological foundation model trained on trillions of DNA tokens spanning the entire tree of life, is out in @Nature today We & the scientific community have done a lot with this @arcinstitute @nvidia model in the last year! 🧵👇
Whijae Roh retweeted
Rimsha Bhardwaj@heyrimsha·
🚨 Holy shit… Google published one of the cleanest demonstrations of real multi-agent intelligence I’ve seen so far. Not another “look, two chatbots are talking” demo. An actual framework for how agents can infer who they’re interacting with and adapt on the fly.

The paper is “Multi-agent cooperation through in-context co-player inference.” The core idea is deceptively simple: in multi-agent environments, performance doesn’t just depend on the task. It depends on who you’re paired with.

Most current systems ignore this. They optimize against an average opponent, assume fixed partner behavior, or hard-code roles. Google does something smarter: they let the model infer its co-player’s strategy directly from the interaction history inside the context window. No retraining, no separate belief model, and no explicit opponent classifier. Just in-context inference. The agent observes a few rounds of behavior, forms an implicit hypothesis about its partner’s type, then updates its own strategy accordingly. This turns static policies into adaptive ones.

The experiments are structured around cooperative and social dilemma games where partner types vary: some partners are fully cooperative, some are selfish, some are stochastic, some strategically defect. Agents without co-player inference treat all partners the same. Agents with inference adjust. And the performance gap is significant.

What makes this paper uncomfortable for a lot of current “multi-agent” hype is how clearly it shows what real coordination requires. First, coordination is not just communication; it’s modeling the incentives and likely actions of others. Second, robustness matters: an agent that cooperates blindly gets exploited, and an agent that defects blindly loses cooperative gains. The system must dynamically balance trust and caution. Third, adaptation must happen at inference time; in real deployments, you cannot retrain every time the population changes.
The most interesting part is that this capability emerges purely from structured context. The model isn’t fine-tuned to classify opponent types explicitly. It uses behavioral traces embedded in the prompt to infer latent strategy. That’s belief modeling through language. And it scales.

Think about where this matters outside toy games:
- Autonomous trading systems reacting to different market participants.
- Negotiation agents interacting with unpredictable humans.
- Distributed AI workflows coordinating across departments.
- Swarm robotics where teammate reliability varies.

In all these settings, static competence is not enough. Strategic awareness is the bottleneck.

The deeper shift is philosophical. We’ve been treating LLM agents as isolated optimizers. This paper moves us toward agents that reason about other agents reasoning about them. That’s recursive modeling. And once that loop becomes stable, you no longer have “a chatbot.” You have a participant in a strategic ecosystem.

The takeaway isn’t that multi-agent AI is solved. It’s that most current systems aren’t even attempting the hard part. Real multi-agent intelligence isn’t multiple prompts in parallel. It’s adaptive belief formation under uncertainty. And this paper is one of the first clean proofs that large models can do that using nothing but context.

Paper: Multi-agent cooperation through in-context co-player inference
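A toy version of the core idea (a drastic simplification, not the paper's actual method or environments) fits in a few lines: in an iterated prisoner's dilemma, an adaptive agent estimates its partner's cooperation rate from the raw interaction history alone and responds accordingly. The policies and payoff values below are standard textbook choices, invented here for illustration.

```python
# Toy illustration of co-player inference from interaction history:
# the adaptive agent probes cooperatively, estimates the partner's
# cooperation rate from observed moves, and adapts. Payoffs use the
# standard prisoner's-dilemma values T=5, R=3, P=1, S=0.

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def adaptive(history):
    """Infer the partner's 'type' from observed moves, then pick a response."""
    if len(history) < 3:
        return "C"                                # probe cooperatively first
    partner_moves = [theirs for _, theirs in history]
    coop_rate = partner_moves.count("C") / len(partner_moves)
    return "C" if coop_rate > 0.5 else "D"

def play(partner_policy, rounds=50):
    """Run `rounds` turns of adaptive agent vs. partner; return agent score."""
    history, score = [], 0
    for _ in range(rounds):
        mine, theirs = adaptive(history), partner_policy(history)
        score += PAYOFF[(mine, theirs)]
        history.append((mine, theirs))
    return score

always_cooperate = lambda history: "C"
always_defect = lambda history: "D"

# Against a cooperator the agent locks into mutual cooperation; against a
# defector it infers the type after three probe rounds and stops being
# exploited.
print(play(always_cooperate), play(always_defect))  # prints: 150 47
```

The point of the sketch is the mechanism, not the numbers: the agent's policy is a function of the interaction history, so its behavior changes with the partner without any retraining, which is the property the thread attributes to in-context inference.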
Whijae Roh retweeted
Bo Wang@BoWang87·
Thrilled to share our review paper, out today in @NatureRevGenet: "Harnessing artificial intelligence to advance CRISPR-based genome editing technologies"

Full paper: 🔗 nature.com/articles/s4157…

CRISPR has already changed medicine. AI is now changing CRISPR. We spent a long time mapping the full landscape of where machine learning and deep learning are having real, measurable impact across the genome editing workflow, and where the most exciting opportunities lie ahead.

Here's what we cover:

Guide RNA design: Deep learning models now predict on- and off-target activity for Cas9, Cas12, Cas13, and emerging systems like TnpB and IscB. We've gone from sequence heuristics to transformer-based models that generalize across organisms. Cell-type-specific generalization remains a frontier.

Base and prime editing: ML models predict bystander effects, product purity, and editing efficiency from sequence context alone. For prime editing, tools like PRIDICT and DeepPE have made pegRNA design far more tractable at scale.

Enzyme engineering: Protein language models (ESM, EVOLVEpro) are now guiding directed evolution of Cas proteins, expanding PAM compatibility, reducing immunogenicity, and improving compactness at a pace impossible through classical lab iteration alone.

Novel enzyme discovery: Foundation models trained on metagenomics are uncovering entirely new CRISPR systems from microbial diversity: new Cas variants, TnpB systems, and eukaryotic Fanzor proteins. The search space is enormous; AI is how we navigate it.

Virtual cell models: This is where I'm most excited. AI-powered virtual cells can, in principle, predict the functional consequences of any edit in any cell type: selecting targets, anticipating off-targets, modeling tissue-specific outcomes. But realizing this vision requires causally rich, contextually diverse perturbation data. Scale of data matters as much as scale of model.
Delivery: ML-guided LNP design is closing the last mile between an edit that works in a dish and one that works in a patient.

Across all of this, one theme recurs: AI accelerates where data is abundant and well-structured. The field's next challenge is generating that data at the right diversity and scale.

This paper was a true collaboration. Huge thanks to Tyler Thomson, Gen Li, Amy Strilchuk, @HAOTIANCUI1, and Bowen Li: you each brought something irreplaceable to this. Special shoutout to @BowenLi_Lab for his leadership in this work!
Whijae Roh retweeted
Vaishnavi@_vmlops·
Anthropic dropped a 33-page guide on Claude Skills... and this changes how serious teams build AI workflows.

A Claude Skill is basically a reusable workflow in a folder. One SKILL.md file teaches Claude exactly how you want tasks done, consistently, every time.

The real insight isn't Skills... it's how to design them properly:
• Build micro-skills, not monoliths
• Keep instructions short and decisive
• Move heavy context into references and assets
• Always refine generated Skills manually
• Connect Skills to tools via MCP and hooks

That's when AI stops being a chatbot… and starts becoming a system.

Link - platform.claude.com/docs/en/agents… drive.google.com/file/d/1RR4zKK…
Whijae Roh@whijae·
“I performed my initial port with Claude Opus 4.5 and 4.6, and more recently worked on finishing off several pieces with Codex GPT-5.3. I find it remarkable that these tools were able to produce an essentially complete Python port in a week. Remarkably, the running time of the Python port is comparable to that of edgeR (in some cases the port is faster). The port does use Numba, but it does not rely on any C code. While I had to work closely with the Claude and Codex tools to make the port work, and some of my contribution was non-trivial (thanks to Sina Booeshaghi for some major assists along the way), it seems to me that much of my involvement could be automated in the future. The acceleration of computational biology is increasing.”
Lior Pachter@lpachter

I used Claude Opus 4.5/4.6 (and a bit of Codex GPT-5.3) to port edgeR to Python. See edgePython github.com/pachterlab/edg… This allowed me to develop a single-cell DE method that extends NEBULA with edgeR Empirical Bayes. All in one week. Details in doi.org/10.64898/2026.…

Whijae Roh retweeted
Ming "Tommy" Tang@tangming2005·
Nature Methods: Squidiff: predicting cellular development and responses to perturbations using a diffusion model from single cell data nature.com/articles/s4159…
Whijae Roh retweeted
Google DeepMind@GoogleDeepMind·
How could AI act as a better research collaborator? 🧑‍🔬 In two new papers with @GoogleResearch, we show how Gemini Deep Think uses agentic workflows to help solve research-level problems in mathematics, physics, and computer science. More → goo.gle/4aGs3Pz
Whijae Roh retweeted
Isomorphic Labs@IsomorphicLabs·
Today we share a technical report demonstrating how our drug design engine achieves a step-change in accuracy for predicting biomolecular structures, more than doubling the performance of AlphaFold 3 on key benchmarks and unlocking rational drug design even for examples it has never seen before. Head to the comments to read our blog.
Whijae Roh retweeted
Andrej Karpathy@karpathy·
A lot of people quote tweeted this as the 1 year anniversary of vibe coding. Some retrospective: I've had a Twitter account for 17 years now (omg) and I still can't predict my tweet engagement basically at all. This was a shower-of-thoughts throwaway tweet that I just fired off without thinking, but somehow it minted a fitting name at the right moment for something that a lot of people were feeling at the same time, so here we are: vibe coding is now mentioned on my Wikipedia page as a major memetic "contribution", and even its own article is longer. lol

The one thing I'd add is that at the time, LLM capability was low enough that you'd mostly use vibe coding for fun throwaway projects, demos and explorations. It was good fun and it almost worked. Today (1 year later), programming via LLM agents is increasingly becoming a default workflow for professionals, except with more oversight and scrutiny. The goal is to claim the leverage from the use of agents but without any compromise on the quality of the software. Many people have tried to come up with a better name for this to differentiate it from vibe coding; personally, my current favorite is "agentic engineering":

- "agentic" because the new default is that you are not writing the code directly 99% of the time; you are orchestrating agents who do, and acting as oversight.
- "engineering" to emphasize that there is an art & science and expertise to it. It's something you can learn and become better at, with its own depth of a different kind.

In 2026, we're likely to see continued improvements on both the model layer and the new agent layer. I feel excited about the product of the two and another year of progress.
Andrej Karpathy@karpathy

There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
