Ryan-Rhys Griffiths

202 posts

@Ryan__Rhys

Research Scientist | ex-@Meta | ex-@Mila_Quebec | ex-@Huawei | PhD @Cambridge_Uni | LLMs | AI4Science | Bayesian Optimization | Chess FIDE Master

San Francisco · Joined July 2020
4K Following · 5K Followers
Zekun Guo@Zek_Guo·
@Ryan__Rhys Huge thanks for sharing! I found you through GitHub and just came to X to follow you.
Ryan-Rhys Griffiths@Ryan__Rhys·
From J-1 to Green Card in 15 months. I'm open-sourcing my complete 1600-page EB-1A petition (pictured). Includes:
→ Complete LaTeX petition template
→ Example reference letters
→ Timeline
In light of Trump's comments on streamlining the EB-1A process, there's never been a more important time to consider it. Huge thanks to @RazMarinescu, whose open-source EB-1A effort inspired me to take this path.
⚠️ I'm not a lawyer—consult one before filing. For educational purposes only.
GitHub: github.com/Ryan-Rhys/EB1A
Blog post: ryan-rhys.github.io/ryan__rhys/blo…
#EB1A #GreenCard
Ryan-Rhys Griffiths tweet media
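The actual template lives in the linked GitHub repo; purely as a rough illustration (the section names and exhibit labels here are hypothetical, not taken from the real petition), an EB-1A petition skeleton in LaTeX might be organized around the regulatory criteria like this:

```latex
\documentclass[12pt]{article}
\usepackage{hyperref}

\begin{document}

\section*{Petition for Alien of Extraordinary Ability (EB-1A)}

% Each section maps to one of the criteria in 8 CFR 204.5(h)(3);
% a strong petition typically documents at least three.
\section{Original Contributions of Major Significance}
% Summarize the contribution, then cite exhibits and reference letters.

\section{Authorship of Scholarly Articles}
% List publications with venues and citation evidence.

\section{Judging the Work of Others}
% Peer-review records, program-committee service, etc.

\appendix
\section{Exhibit List}
% Exhibit A: reference letters; Exhibit B: citation records; ...

\end{document}
```

Cross-referencing every claim to a numbered exhibit, as sketched in the comments, is the organizational pattern such petitions tend to follow.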
Ryan-Rhys Griffiths reposted
Simon Frieder@friederrrr·
2023: It's hard to devise an LLM that solves a math problem. 2025: It's hard to devise a math problem that stumps an LLM. (...this is in the context of competitive math questions, but we'll also get to research-level math soon)
Ryan-Rhys Griffiths reposted
Simon Frieder@friederrrr·
Can you solve this Olympiad-level problem? This is the kind of problem LLMs have to solve to compete at the third AI Math Olympiad.
Simon Frieder tweet media
Matt Durrant@mgdurrant·
Thrilled to share that I have joined @AnthropicAI as a life science researcher! I am confident that Claude will do amazing things to accelerate biology. Big things ahead!
Matt Durrant tweet media
Ryan-Rhys Griffiths reposted
Simon Frieder@friederrrr·
OpenAI have just released an open-weight LLM. openai.com/index/introduc… This is great news for the AIMO3 competition we'll launch. Get your GPUs ready to fine-tune the lemmas out of GPT-OSS! :)
Ryan-Rhys Griffiths reposted
Simon Frieder@friederrrr·
Below is a chronological thread summarizing the main developments around IMO25 and the controversial AI evaluation. (Note that because I manage the AIMO, I can't really comment on ongoing developments, and the links below should not be construed as endorsements.)
1. The IMO announces that they cannot certify the AI submission models or guarantee scientific reproducibility. (imo2025.au/news/the-66th-…)
2. OpenAI publishes their announcement on X that they won gold. (x.com/alexwei_/statu…)
3. This raised various questions about the circumstances under which the evaluation was carried out, and whether announcing before the IMO ceremony was poorly timed. (x.com/demishassabis/…)
4. Around the same time, DeepMind announces that they, too, achieved gold. (deepmind.google/discover/blog/…) [I have heard that a number of further companies and organizations will also make announcements, so stay tuned.]
5. This was then picked up by Hacker News, Reddit and other platforms. (news.ycombinator.com/item?id=446371…)
6. At the same time, prominent voices from the wider community expressed dissatisfaction with this new way of doing science. (linkedin.com/posts/gerko_ne…)
7. Finally, even mainstream news picked it up, e.g. the NY Times, albeit with a slightly inaccurate message: the correctness of the final solutions was emphasized much more strongly than the process by which they were obtained. (nytimes.com/2025/07/21/tec…)
Ryan-Rhys Griffiths reposted
Simon Frieder@friederrrr·
The IMO is changing - and is walking in the footsteps of chess competitions. Last year it just hosted the #aimoprize. This year there was an associated event where a handful of AI companies and organizations tested some of their models on the IMO problems. The story with math will be similar to how the story with chess went: chess engines these days routinely beat humans, so there are human-vs-human competition tracks, machine-vs-machine competition tracks, as well as a machine-assisted-human-vs-machine-assisted-human track. The latter is called "advanced chess". en.m.wikipedia.org/wiki/Advanced_… I predict that in a few years there will be an "advanced IMO."
Ryan-Rhys Griffiths reposted
Simon Frieder@friederrrr·
IMO2025 has begun. Last year, AlphaProof won a silver medal (though no paper nor software was released, so we have to trust that claim and the mathematicians who had access). This year, a whole bunch of organizations requested access to the IMO problems, so it will be interesting to see how they perform. My bet is that we'll see a gold medal for the "formal Olympiad". While impressive, that still won't be an accurate assessment of these systems, as 6 problems don't make a convincing benchmark (though they work wonders for marketing). As an example, AlphaGeometry and my own improvement of it, Newclid, also solve Olympiad-level geometry problems, but cannot prove the Pythagorean theorem (Newclid has some limited support for it, though). This slipped under the radar in the first release of AlphaGeometry, since the benchmarks weren't large enough. Small benchmarks are dangerous.
Ryan-Rhys Griffiths reposted
Sam Rodriques@SGRodriques·
Today, we're announcing the first major discovery made by our AI Scientist with the lab in the loop: a promising new treatment for dry AMD, a major cause of blindness. Our agents generated the hypotheses, designed the experiments, analyzed the data, iterated, and even made figures for the paper. The resulting manuscript is a first of its kind in the natural sciences: everything that needed to be done to write the paper was done by AI agents, apart from actually conducting the physical experiments in the lab and writing the final manuscript.

We are also introducing Robin, the first multi-agent system that fully automates the in-silico components of scientific discovery, which made this discovery. This is the first time, as far as we are aware, that hypothesis generation, experimentation, and data analysis have been joined up in a closed loop, and it is the beginning of a massive acceleration in the pace of scientific discovery driven by these agents. We will be open-sourcing the code and data next week.

Robin is a multi-agent system that uses Crow, Falcon, and Finch, the agents on our platform, to generate novel hypotheses, plan experiments, and analyze data. We asked Robin to find a new treatment for dry age-related macular degeneration. Robin considered the disease mechanisms associated with dry AMD, proposed a specific experimental assay that could be used to evaluate hypotheses in the wet lab, and proposed specific molecules we could test in that assay. We tested the molecules and gave it the resulting data, which it analyzed before proposing more experiments. In the end, it identified Ripasudil, a Rho kinase (ROCK) inhibitor approved in Japan for several other diseases, which seems very promising as a potential treatment for dry AMD. It also identified specific molecular mechanisms that might underlie the effects of Ripasudil in RPE cells, from an RNA sequencing experiment it proposed.
To be clear, no one has proposed using ROCK inhibitors to treat dry AMD in the literature before, as far as we can find, and I think it would have been very difficult for us to come up with this hypothesis without the agents. We have also run the proposed treatment by several experts in AMD, who confirm that it is interesting and novel. Moreover, this project was fast: with Robin in hand, the entire project took about 10 weeks, far shorter than it would have taken if we had been doing all of the in-silico components ourselves.

Important caveats: we are real biologists at FutureHouse, so I want to be clear that although the discovery here is exciting, we are not claiming to have cured dry AMD. Fully validating this hypothesis as a treatment for dry AMD will require human trials, which will take much longer. Also, this discovery is cool, but it is not yet a "move 37"-style discovery. At the current rate of progress, I'm sure we will get to that level soon.

Congratulations to the team. Congratulations in particular to Robin, which generated the hypotheses, proposed the experiments, analyzed the data and generated the figures. And major congratulations also to the human team, which built Robin: @MichaelaThinks, @agreeb66, @benjamin0chang, @ludomitch, Mo Razzak, Kiki Szostkiewicz, and Angela Yiu.
Ryan-Rhys Griffiths@Ryan__Rhys·
The FutureHouse platform is now live and ready to use, featuring a variety of ways to leverage language agents in scientific endeavors: platform.futurehouse.org
Andrew White 🐦‍⬛@andrewwhite01

The plan at FutureHouse has been to build scientific agents and use them to make novel discoveries. We've spent the last year researching the best way to make agents. We've made a ton of progress and now we've engineered them to be used at scale, by anyone. Today, we're launching the FutureHouse Platform: an API and website to use our AI agents for scientific discovery. It's been a bit of a journey!
June 2024: we released Lab-Bench, a benchmark of what we believe is required of scientific agents to make an impact in biology.
September 2024: we built one agent, PaperQA2, that could beat biology experts on literature research tasks by a few points.
October 2024: we proved out scaling by writing 17,000 missing Wikipedia articles for coding genes in humans.
December 2024: we released a framework and training method to train agents across multiple tasks, beating biology experts in molecular cloning and literature research by >20 points of accuracy.
May 2025: we're releasing the FutureHouse Platform for anyone to deploy, visualize, and call on multiple agents.
I'm so excited for this, because it's the moment that we can see agents impacting people broadly. I'm so impressed with the team at FutureHouse for executing our plan in less than a year: from benchmark to wide deployment of agents that can exceed human performance on those benchmarks!
So what exactly is the FutureHouse Platform? We're starting with four agents: precedent search in literature (Owl), literature review (Falcon), chemical design (Phoenix), and concise literature search (Crow). The ethos of FutureHouse is to create tools for experts. Each agent's individual actions, observations, and reasoning are displayed on the platform. Each scientific source is assessed on retraction status, citation count, publisher record, and citation graph. A complete description of the tools and how the LLM sees them is visible.
I think you'll find it very refreshing to have complete visibility into what the agents are doing. We're scientific developers at heart at FutureHouse, so we built this platform API-first. For example, you can call Owl to determine if a hypothesis is novel. So, if you're thinking about an agent that proposes new ideas, use our API to check them for novelty. Or check out Z. Wei's Fleming paper, which uses Crow to check ADMET properties against the literature by breaking a molecule into functional groups. We've open-sourced almost everything already, including the agents, the framework, the evals, and more. We have more benchmarking and head-to-head comparisons available in our blog post; see the complete run-down there.
You will notice our agents are slow! They do dozens of LLM queries, consider hundreds of research papers (agents ONLY consider full-text papers), make calls to Open Targets and Clinical Trials APIs, and ponder citations. Please do not expect this to be like other LLMs/agents you've tried: the tradeoff in speed is made up for in accuracy, thoroughness and completeness. I hope, with patience, you find the output as exciting as we do!
This truly represents the culmination of a ton of effort. Here are some things that kept me up at night: we wrote special tools for querying clinical trials. We worked out how to source open-access papers and preprints at a scale that reaches over 100 PDFs per question. We tested dozens of LLMs and permutations of them. We trained our own agents with Llama 3.1. We wrote a theoretical grounding on what an agent even is! We had to find a way to host ~50 tools, including many that require GPUs (not including the LLMs).
Obviously this was a huge team effort: @m_skarlinski is the captain of the platform and has taught me and everyone at FutureHouse how to be part of a serious technology org. @SGRodriques is the indefatigable leader of FutureHouse and keeps us focused on the goal.
Our entire front-end team is just half of @tylernadolsk's time. Big thanks to James Braza for leading the fight against CI failures and teaching me so much about Python; to @SidN137 and @Ryan__Rhys for helping us define what an agent actually is; and to @maykc for responding to my deranged Slack DMs for more tools at all times. Everyone at FutureHouse contributed to this in some way, so thanks to them all! This is not the end, but it feels like the conclusion of the first chapter of FutureHouse's mission to automate scientific discovery. DM me anything cool you find!

Ryan-Rhys Griffiths reposted
Sam Rodriques@SGRodriques·
The next frontier for AI Agents in Science will be data analysis. Today, we're releasing BixBench, the most sophisticated benchmark yet for data analysis in biology. Agents that can do these tasks will be powerful tools for discovery. So far, they're not even close.
Sam Rodriques tweet media
Ryan-Rhys Griffiths reposted
Andrew White 🐦‍⬛@andrewwhite01·
Half of an AI scientist is rejecting or accepting hypotheses. @FutureHouseSF and @SciMac just put out ~300 novel hypotheses from ~50 published papers along with ground-truth data. Humans take 4.2 hours to solve these and frontier models get 10-20% correct. SWE-bench for bio
Andrew White 🐦‍⬛ tweet media
Dorsa@dorsa_rohani·
Who are the best people to follow on X for deep chemical & bio tech?
Rhys Goodall@RhysGoodall·
Is there any good literature on the use of deep set models as part of deep kernel learning GPs for bayesopt? @Ryan__Rhys @andrewgwils
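The question above combines two ideas: a deep set encoder (a permutation-invariant network over set-valued inputs) used as the feature extractor of a deep kernel learning GP. A minimal NumPy sketch of that combination follows; the encoder weights are random and untrained here (in real deep kernel learning they would be optimized jointly with the GP hyperparameters, e.g. via the marginal likelihood in GPyTorch), and all names and sizes are illustrative assumptions, not anyone's published method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random (untrained) deep-set encoder: phi is applied per element,
# the results are summed (permutation invariant), then rho maps the
# pooled representation to a fixed-size embedding.
D_IN, D_HID, D_EMB = 3, 16, 8
W_phi = rng.normal(size=(D_IN, D_HID))
W_rho = rng.normal(size=(D_HID, D_EMB))

def embed(point_set):
    """Map a variable-size set of D_IN-dim points to a D_EMB vector."""
    h = np.tanh(point_set @ W_phi)      # phi, applied element-wise
    pooled = h.sum(axis=0)              # permutation-invariant pooling
    return np.tanh(pooled @ W_rho)      # rho

def rbf(a, b, lengthscale=1.0):
    """RBF kernel on the learned embeddings: the 'deep kernel'."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * lengthscale ** 2))

def gp_posterior_mean(train_sets, y, test_set, noise=1e-3):
    """GP posterior mean at test_set under the deep-set kernel."""
    Z = [embed(s) for s in train_sets]
    z_star = embed(test_set)
    K = np.array([[rbf(a, b) for b in Z] for a in Z]) + noise * np.eye(len(Z))
    k_star = np.array([rbf(z_star, b) for b in Z])
    return k_star @ np.linalg.solve(K, y)

# Toy data: each input is a set of points; the target here is set size.
train_sets = [rng.normal(size=(n, D_IN)) for n in (2, 4, 6, 8)]
y = np.array([len(s) for s in train_sets], dtype=float)
mu = gp_posterior_mean(train_sets, y, rng.normal(size=(5, D_IN)))
```

Because the pooling step is a sum, the induced kernel is invariant to the ordering of elements within each set, which is the property that makes this construction attractive for BayesOpt over set-structured design spaces.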
Ryan-Rhys Griffiths@Ryan__Rhys·
What are the mathematical capabilities of #LLMs, and how are they evolving over time? GHOSTS is a new advanced mathematics benchmark for large language models, accepted at #NeurIPS2023.
CITATIONS: If you spot something you feel we should cite, leave a comment before the camera-ready deadline!
Paper: arxiv.org/abs/2301.13867
Leaderboard: ghosts.xyfrieder.xyz
Ryan-Rhys Griffiths tweet media