Justin Butler

1.2K posts


@Butlerjustin

building at the intersection of atoms+bits+cells+ @EclipseVentures

Palo Alto, CA · Joined June 2010
2K Following · 701 Followers
Justin Butler
Justin Butler@Butlerjustin·
Congrats to @SGRodriques and the team @EdisonSci on the launch of Kosmos. The future is going to be better than we all think, specifically because of breakthroughs like this.
Sam Rodriques@SGRodriques

Today, we’re announcing Kosmos, our newest AI Scientist, available to use now. Users estimate Kosmos does 6 months of work in a single day. One run can read 1,500 papers and write 42,000 lines of code. At least 79% of its findings are reproducible. Kosmos has made 7 discoveries so far, which we are releasing today, in areas ranging from neuroscience to material science and clinical genetics, in collaboration with our academic beta testers. Three of these discoveries reproduced unpublished findings; four are net new, validated contributions to the scientific literature. AI-accelerated science is here.

Our core innovation in Kosmos is the use of a structured, continuously-updated world model. As described in our technical report, Kosmos’ world model allows it to process orders of magnitude more information than could fit into the context of even the longest-context language models, allowing it to synthesize more information and pursue coherent goals over longer time horizons than Robin or any of our other prior agents. In this respect, we believe Kosmos is the most compute-intensive language agent released so far in any field, and by far the most capable AI Scientist available today. The use of a persistent world model also enables single Kosmos trajectories to produce highly complex outputs that require multiple significant logical leaps.

As with all of our systems, Kosmos is designed with transparency and verifiability in mind: every conclusion in a Kosmos report can be traced through our platform to the specific lines of code or the specific passages in the scientific literature that inspired it, ensuring that Kosmos’ findings are fully auditable at all times.

We are also using this opportunity to announce the launch of Edison Scientific, a new commercial spinout of FutureHouse, which will be focused on commercializing our agents and applying them to automate scientific research in drug discovery and beyond.
Edison will be taking over management of the FutureHouse platform, where you can access Kosmos alongside our Literature, Molecules, and Precedent agents (previously Crow, Phoenix, and Owl). Edison will continue to offer free tier usage for casual users and academics, while also offering higher rate limits and additional features for users who need them. You can read more about this spinout on our blog, below.

A few important notes if you’re going to try Kosmos. First, Kosmos is different from many other AI tools you might have played with, including our other agents. It is more similar to a Deep Research tool than to a chatbot: it takes some time to figure out how to prompt it effectively, and we have tried to include guidelines on this to help (see below). It costs $200/run right now (200 credits per run, at $1/credit), with some free tier usage for academics. This is heavily discounted; people who sign up for Founding Subscriptions now can lock in the $1/credit price indefinitely, but the price ultimately will probably be higher. Again, this is less a chatbot and more a research tool, something you run on high-value targets as needed.

Some caveats are also warranted. We find that 80% of Kosmos findings are reproducible, which also means 20% are not -- some things it says will be wrong. Kosmos certainly does produce outputs equivalent to several months of human labor, but it also often goes down rabbit holes or chases statistically significant yet scientifically irrelevant findings. We often run Kosmos multiple times on the same objective in order to sample the various research avenues it can take. There are still a bunch of rough edges on the UI and such, which we are working on. Finally, we are aware that the 6-month figure is much greater than estimates by other AI labs, like METR, of the length of tasks that AI agents can currently perform. You can read discussion about this in our blog post.
Huge congratulations to our team that put this together, led by @ludomitch and @michaelathinks: Angela Yiu, @benjamin0chang, @sidn137, Edwin Melville-Green, Albert Bou, @arvissulovari, Oz Wassie, @jonmlaurent. A particular shout out to @m_skarlinski and his team that rebuilt the platform for this launch, especially Andy Cai @notAndyCai, Richard Magness, Remo Storni, Tyler Nadolski @_tnadolski, Mayk Caldas @maykcaldas, Sam Cox @samcox822 and more. This work would not have been possible without significant contributions from academic collaborators @mathieubourdenx, @EricLandsness, @bdanubius, @physicistnevans, Tonio Buonassisi, @BGomes_1905, Shriya Reddy, @marthafoiani, and @RandallBateman3. We also want to thank our numerous supporters, especially @ericschmidt, who has been a tremendous ally. We will have more to say about our supporters soon!

Justin Butler retweeted
Patrick Collison
Patrick Collison@patrickc·
A recent reflection, based on conversations with economists and policy leaders, is that there are two superficially similar but importantly different perspectives one can hold with respect to US manufacturing:

* Affinity for manufacturing and physical production is an anachronistic fetish, embodied by populists with outdated attraction to hard hats and clanging forges. A great deal of manufacturing has departed the US, which is certainly fine and probably even quite good. It's unpleasant labor, and countries ought to each specialize in their respective comparative advantages.

* Manufacturing is the ultimate network effects and economies-of-scale business. As services are substituted by AI, and as datacenter deployment accelerates, the relative importance of manufacturing is likely to grow. To think that one can pick and choose sectors in which one will excel ("let's win in drones but not in dishwashers") is a fallacy. Manufacturing is hence of paramount strategic importance. However, we don't know how to make the US the world's preeminent manufacturing power (given its cost base and given the current center of gravity in China) -- indeed, we don't know whether it's even possible -- and this is a significant strategic problem for the country.

I have zero direct expertise here, but my outside view is closer to #2 than #1: it seems that the ecosystems and supply chains create strong gravity across the board. I also asked @elonmusk, who has clearly done more over the past decade to advance sophisticated US manufacturing than anyone else, and this appears to be his view. Most economists, on the other hand, are much closer to #1, and I don't think that the economics profession considers the absence of good ideas for reviving US manufacturing to be a problem of particular significance. (There are lots of snide epithets about the efficacy of industrial policy.)
It seems to me that there's even some amount of backwards reasoning happening, where, because we don't know how to do #2, #1 is subconsciously a much more comfortable position to hold.

Talking about winning particular manufacturing sectors feels to me a bit like talking about winning individual biological research sectors or winning particular software sectors. That is: it seems that the strong default assumption should be that "the place that is best at biology research sector X will also be best at sector Y", and similarly in software, because the skills and inputs needed are so transferable. As such, my guess is that if the US seeks meaningful sovereignty or preeminence in any of drones, robotics, solar, batteries, pharma, etc., we need to bite the bullet and win at manufacturing across the board.

Overall, I'd love to read more arguments for and against these perspectives, particularly from those with direct expertise.
Justin Butler retweeted
Windsurf
Windsurf@windsurf·
Qwen3-Coder at ~2000 tokens/sec is now live in Windsurf! ⚡️ Fully hosted on US servers by @cerebras. Video is 1x speed.
Director Michael Kratsios
Director Michael Kratsios@mkratsios47·
To win the AI race and ensure global technological dominance, we need more power.
Justin Butler
Justin Butler@Butlerjustin·
I'm excited for a true product genius to get a hold of the new generation of AI. We need the Steve Jobs of AI to really turn this thing on and change the world.
Justin Butler
Justin Butler@Butlerjustin·
The future of healthcare is going to be wild. The work going on to build a true model of the cell is going to fundamentally change what is possible in medicine. @arcinstitute @cziscience
Justin Butler retweeted
Michael Nuñez
Michael Nuñez@MichaelFNunez·
🚨 BREAKING: Perplexity AI & Cerebras unveil one of the fastest AI search engines ever—running at 1,200 tokens/sec. This exclusive report dives into how custom AI chips could reshape search as we know it. Read here: venturebeat.com/ai/cerebras-pe… @perplexity_ai @cerebras
Justin Butler retweeted
Cerebras
Cerebras@cerebras·
DeepSeek R1 70B is now on Cerebras!
- Instant reasoning at 1,500 tokens/s – 57x faster than GPUs
- Higher model accuracy than GPT-4o and o1-mini
- Runs 100% on Cerebras US data centers
inference.cerebras.ai
Justin Butler retweeted
Cerebras
Cerebras@cerebras·
DeepSeek’s R1 70B combines the powerful reasoning ability of the full R1 model with the size and speed of Llama 70B. R1 70B outperforms GPT-4o and o1-mini across a range of general and reasoning benchmarks, making it the most capable Llama 70B variant by far.
Charly Mwangi
Charly Mwangi@charlythuo·
Modern society is just an elaborate version of a village. However, certain things accepted in today’s society would be unacceptable in a village. For example, I couldn’t imagine a village that holds in high esteem the person who net-consumes the most instead of one who net-produces the most.