Kenny Workman
@kenbwork
CTO @latchbio, data infrastructure for biology
1.6K posts · Joined May 2020 · 1.3K Following · 6.7K Followers

Pinned Tweet
Kenny Workman @kenbwork ·
Full footage from our systems x biology reading group with researchers from Arc Institute + FutureHouse.
2:33 LData: Building a distributed filesystem on Postgres and S3 (LatchBio)
26:10 BINSEQ: High-performance binary formats for DNA sequences (Noam Teyssier, Arc Institute)
49:00 Data Flywheels: Reinforcement learning algorithms for scientific AI (James Braza, FutureHouse)
1:08:07 Scaling Deep Learning to 1B+ Single Cells (Abhinav Adduri, Arc Institute)
1:36:30 Closing (Shreya Shekha, Greylock)
Biology is still a greenfield space for systems work. As molecular datasets continue to scale, engineering challenges will emerge at every layer of the stack, e.g. file systems, storage + ML infra.
Kenny Workman retweeted
Pierce @PierceOgdenJ ·
Really great collaboration between @ManifoldBio and @nvidia to test 1M de novo designed binders against 127 targets, measuring over 100 million potential protein-protein interactions, with some very exciting results. NVIDIA's new Proteina-Complexa method is SOTA for de novo minibinder design. If you're interested in designing minibinders to targets you couldn't hit with other methods, try it out! I'm particularly excited about what this large-scale data enables. As we generate 1000s of experimentally validated structures, this data becomes the input to training new protein design models. At Manifold, we are generating datasets of this size continuously, and have experimentally validated thousands of de novo designed binders across many formats (VHH, minibinders, peptides, etc.). New models will open up new hard-to-hit targets and, paired with our large-scale in vivo measurement, will enable us to create previously impossible therapeutics. Up next is training new models on this and the other data we have generated, so stay tuned! Thanks to NVIDIA for setting up such a great collaboration, it's been fun and fruitful!
Christian Dallago@sacdallago

🧵 We ran the largest head-to-head benchmark of protein binder design methods in the wet lab. Project page: research.nvidia.com/labs/genair/pr… 1 million designs. 127 targets. RFdiffusion, BindCraft, BoltzGen, and Proteina-Complexa — all tested side by side.👇

Sauers @Sauers_ ·
My project to build disease prediction models, formalize (and prove new) genomic theory in Lean 4, discover powerful new methods for computational biology, and create the best generalized additive model engine, has been accepted into the @AnthropicAI Open Source Program!
Kenny Workman @kenbwork ·
Talks at the intersection of systems engineering and computational biology
0:20 Why study systems x biology in the "age of agents"
5:50 Forch: Building a utilitarian cloud container orchestrator (Max Smolin, LatchBio)
41:25 cyto: Ultra high-throughput processing of 10x Flex single-cell sequencing (Noam Teyssier, Arc Institute)
1:04:30 SLAF: A single-cell omics storage format for the virtual cell era (Pavan Ramkumar, SLAF Project)
1:33:30 Lessons in Perturbation Modeling: STATE, STACK, and Beyond (Dhruv Gautam, Arc Institute + UC Berkeley)
2:03:15 Leveraging Serverless Distributed Computing to Scale Computational Biology (Ben Shababo, Modal)
Topics span container orchestration, single-cell infra, and perturbation modeling for biology at scale.
Kenny Workman @kenbwork ·
Agents for data analysis, especially for assays with week-to-month-long computational components in the prep-to-insight cycle (e.g. spatial), will see widespread adoption before there's a feedback loop. Life science cloud spend will be redirected here through tool calls. Let's check in at the end of the year.
David Li @davidycli ·
Until AI x life sciences models can run agent-generated hypotheses in the physical world with empirical, real-world feedback, there will be no meaningful value creation (or value capture). It will continue to be a nice toy in industrial workflows.
Kenny Workman @kenbwork ·
Hosting another computing x biology reading group with Modal. Progress has really picked up the past 6 months + many interesting projects to highlight.
- Max Smolin (LatchBio): Building "Forch", a Utilitarian Cloud Container Orchestrator
- Noam Teyssier (Arc Institute): cyto: Ultra High-Throughput Processing of 10x Flex Single-Cell Sequencing
- Pavan Ramkumar (SLAF Project): SLAF: A Single-Cell Omics Storage Format for the Virtual Cell Era
- Dhruv Gautam (Arc Institute): Lessons in Perturbation Modeling: STATE, STACK, and Beyond
- Ben Shababo (Modal): Leveraging Serverless Distributed Computing to Scale Computational Biology
Come join us for pizza and good technical talks on March 4th in Mission Bay, SF. Design decisions, paper highlights + snippets of source code.
Kenny Workman @kenbwork ·
Never been a better time to be a serious student of computers, especially if you enjoy hardware, low-level, and systems work. This skillset will get more valuable as models improve and the world reorganizes around new scales of aggregated data + compute. Training, inference, but also the workloads induced by large teams of agents making millions of tool calls in science + engineering. There will always be new bottlenecks at the edge and growing demand for engineers who can reason about the whole stack well enough to both find and fix them. Easy to forget these machines are for far more than hosting CRUD apps. Get used to the idea of computers as vehicles for real work: storing and transforming so much information that it spills into the physical world. Drums of magnetic tape, disks, power delivery, cooling, switches, racks competing for the same space as wet labs, factories, energy plants.
Kenny Workman @kenbwork ·
Few months after building the first proof of concept. Seems we are reasonably close to scientists routinely answering questions from raw data generated by spatial and single-cell kits. Watched a dev bio researcher spend >100 hours in an agent session over the course of a week and get (some) of what he needed for publication. Still many papercuts, but the rate of improvement is steady.
Kenny Workman@kenbwork

If you think about it, machine-guided data analysis, especially in biology, is likely the next frontier after agentic SWE. Verifiable tools that help scientists with strong existing understanding of the domain do work with higher quality + speed (raise the ceiling, not the floor) will have the most impact. Intermediate artifacts and transparent thinking prevent errors from compounding. Possible then that outputs can be used to drive expensive business and scientific decisions, or be placed in publications.

Kenny Workman @kenbwork ·
been dating this girl for three years and she still knows more about semiconductors than me
Kenny Workman @kenbwork ·
Also think products will emerge that are far more tailored for learning technical topics and could replace the textbook/paper: essentially building their own model of the student's brain and figuring out the optimal path to understanding given physical learning constraints. They would control the ordering and timing of reading material, problems, troubleshooting, etc.
Kenny Workman @kenbwork ·
Have worked through quite a few pure + applied math texts (D+F, Leinster, Hatcher, Wasserman) after leaving college and experimented with many different learning styles: notes or book alone, solutions and no solutions, how much to use Math Overflow, and recently AI tools. Think this take is both wrong and actually kind of dangerous for rising generations of engineers/scientists.

Like many students who spent years reflecting on what their brain was doing while building 'mathematical maturity', have come to understand that the seemingly 'pointlessly obtuse' layout of the math curriculum is *itself* a very deliberate pedagogical tool for learning formal material. You kind of need to load all of the definitions, theorems, and lemmas in your mind, and just let it sit while you fight - and it's definitely a fight - to get things to click. So you need a learning structure that encourages you to spend as much time in this state as possible.

The flavor of understanding you get reading someone else's solutions (or generating AI summaries of, e.g., baby Rudin) is like a shadow of the real thing. It's hard for your brain to immediately tell the difference and it will feel real. But you will find that if you go to a whiteboard or blank piece of paper and try to recover it, let alone pull out pieces and play with them, you won't be able to. You can't use it.

All of this kind of depends on your goals. Staring at shadows can be a fun and useful way to speedrun a lot of interesting material to get a taste of ideas you won't have the time/ability to actually understand with the real constraints of the day. But if you want to actually *own* the objects/tools and use them fluidly throughout your life, the shadows aren't super useful.

So how does AI fit into this? These tools are actually genuinely incredible study companions when used correctly, allowing motivated individuals to learn faster, dig deeper, and get exposure to similar ideas in other fields. But think the interaction patterns should be sparse, not dissimilar from interacting with a grad student in office hours, rather than living in the chat interface.

AI directly addresses the primary failure mode of the 'bare textbook' method: diminishing returns on time spent in the stuck, focused state. After some point, your mind just won't make the leap to the solution and you need something external to show you a trick or nudge you in the right direction. And up until recently, the only way to really do this was to get another person to reason through the structure you've built up in your mind so they could show you what to do in that specific context. Remember, if you look at the answer, you never tease out the reasoning trace yourself and what you're left with is a shadow. Existing MO/Internet resources can be useful (and 'answers' often don't provide the answers), but they drop off as you get into more difficult + niche material.

Today, you can take a picture of your own written work (or paste in blocks of LaTeX with a sentence of description) and get the correctly calibrated 'push' towards a solution for *pretty much any field of math a reasonable person wants to learn*. This is beyond incredible, truly science fiction stuff, and is the main way I use these things.

Also find the following tasks additive (and unique/hard to access before):
- Generating new problems that pull out + focus on a specific piece you're struggling with (e.g. a lot of stat calculations depend on algebraic tricks, recognizing when something is a Taylor series, or integrating gnarly functions like the Gaussian, where the difficulty is more calculus than anything else; worked sketch below)
- Figuring out what the author is trying to teach me / what "we're really trying to do" with a problem

But most of the time you should still be staring at the obtuse math textbook and working out problems on paper. The real concern with these takes is that most students are no longer going to actually build durable knowledge, while a small fraction will become wildly more capable than previous generations. They're only cheating themselves, but developing minds need discipline guided by the tools/methods we recommend.

So tldr; encourage everyone to use AI to learn math, but use it in a surgical way, respect the material, and understand how you actually learn. Otherwise you'll be playing with shadows.
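As one concrete instance of the "more calculus than statistics" sticking point above (my illustration, not from the original post), the Gaussian normalization constant falls out of the classic polar-coordinates trick:

I = \int_{-\infty}^{\infty} e^{-x^2}\,dx, \qquad
I^2 = \iint_{\mathbb{R}^2} e^{-(x^2+y^2)}\,dx\,dy
    = \int_0^{2\pi}\!\int_0^{\infty} e^{-r^2}\, r\,dr\,d\theta
    = 2\pi \cdot \tfrac{1}{2} = \pi,

so I = \sqrt{\pi}, and substituting x \mapsto (x-\mu)/(\sigma\sqrt{2}) gives
\int_{-\infty}^{\infty} \tfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1.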
alz@alz_zyd_

Math textbooks are written in a pointlessly obtuse way. Gemini does an incomparably better job. My professional opinion is that all undergrads learning real analysis should give up reading baby Rudin, and simply learn analysis from Gemini instead

Kenny Workman @kenbwork ·
@TuXinming Thank you. Please shoot me an email at kenny@latch.bio. Have already supplied it to a few researchers, but prefer to do it this way to avoid benchmark hacking.
Xinming Tu @TuXinming ·
@kenbwork Nice work! I’m curious about how we can access the complete benchmark dataset beyond the demo task.
Kenny Workman retweeted
Prof. Nikolai Slavov @slavov_n ·
A new benchmark of 394 verifiable problems allowed @kenbwork and colleagues to ask: How good are frontier AI agents at routine scRNA-seq analysis? They have improved. They still fail, often.
Kenny Workman @kenbwork ·
Excited to come back (last memories of Cory Hall were EECS126 lectures). Spatial biology presents challenging agent engineering problems. Only some of what works for coding RL can be adapted and a lot has to be introduced. Will discuss our approach to verifiability with messy real world data, scaling data infra for hundreds of parallel agent environments and what we learned about the scientific behavior of frontier models analyzing thousands of trajectories.
Arshia Nayebnazar@arshianazar

We're excited to announce the second Berkeley BioML seminar of the semester happening next Tuesday 2/17! Join us for a talk by Kenny Workman (@kenbwork) from LatchBio about the performance of agents for spatial biology analysis. luma.com/f3xa3dst

Kenny Workman @kenbwork ·
@metapredict Only familiar with the math in a pre-clinical context, but sure the trend is the same, just with a larger fixed cost to procure/manage patient samples? The point is that the variable cost associated with computers scales disproportionately to lab costs as plex increases.
Jamie Timmons @metapredict ·
The cost of doing a decent-sized spatial clinical sample study is >>$250,000 (it requires staff). That data can be modelled well by a postdoc inexpensively in 1 yr. Of course, if people continue to work with Mickey Mouse n=3 ROIs, then sure, the analysis costs more than the lab work.
Kenny Workman @kenbwork ·
Analysis cost is rapidly becoming more expensive than reagents/prep labor with modern assays. Spatial kits are good examples. As plex/throughput increase, computers eat a larger fraction of the total cost of the assay. Can see this numerically with CosMx napkin math: a CosMx 6K slide with ~1 week/40 hr of analysis labor (very conservative) lands at ~$11k/slide. Analysis cost is ~$4.25k, or ~40% of the total (both labor + compute resources). Jump to CosMx WTx (whole transcriptome) and raw volume increases ~3.2x. Analysis increases to 1.5 weeks/60 hrs, with higher compute/storage costs. Again, super conservative. Assuming you are able to "just scale" both of these (scaling computing resources by constant multiples eventually hits hard limits, both in access to those resources + systems software just breaking), the slide is ~$14.3k with analysis ~50% of the total. Jump again
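A minimal sketch of the napkin math above, assuming an illustrative $100/hr fully loaded analyst rate and treating the remainder of the per-slide cost as reagents/prep; the specific rates and compute line items are my assumptions, not from the thread:

# Hypothetical per-slide cost split for a spatial assay, following the napkin math above.
# Dollar figures are illustrative assumptions, not vendor pricing.
ANALYST_RATE = 100.0  # $/hr, assumed fully loaded labor rate

def slide_cost(analysis_hours, compute_storage, reagents_prep):
    """Return (total cost per slide, fraction of total spent on analysis)."""
    analysis = analysis_hours * ANALYST_RATE + compute_storage
    total = analysis + reagents_prep
    return total, analysis / total

# 6K-plex-style scenario: 40 hr of analysis labor, modest compute/storage
total_6k, frac_6k = slide_cost(analysis_hours=40, compute_storage=250, reagents_prep=6_750)
print(f"6K-plex:  ${total_6k:,.0f}/slide, analysis {frac_6k:.0%}")   # ~$11,000, ~40%

# Whole-transcriptome-style scenario: 60 hr of analysis, heavier compute/storage
total_wtx, frac_wtx = slide_cost(analysis_hours=60, compute_storage=1_000, reagents_prep=7_300)
print(f"WTx-plex: ${total_wtx:,.0f}/slide, analysis {frac_wtx:.0%}")  # ~$14,300, ~50%

Under these assumed inputs the analysis share of total cost climbs from roughly 40% to roughly 50% as plex increases, matching the figures quoted above.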
Sebastian S. Cocioba🪄🌷@ATinyGreenCell

@ledflyd @jrkelly I don't think compute infra is comparable to the monstrous costs of running modern wetlab experiments. A single antibody is like 100hrs on an HPC. Typing is free and uni infra is paid for by larger grants with shared use across many researchers. Code cheaper than molecules always
