herrick
@herkhere
@koahlabs @southpkcommons
Joined May 2016
977 Following · 215 Followers
58 posts
herrick retweeted
Koah @koahlabs
Koah has raised a $5M seed round led by @ForerunnerVC, with participation from @southpkcommons. In just 8 months, apps reaching 100M+ users are monetizing sustainably with Koah’s AI-native ads. We deliver $10+ eCPMs, premium demand, and minimal impact on retention. We are quickly becoming the standard for consumer AI monetization. This round will accelerate our mission to make scalable, native monetization effortless for every AI app.
herrick retweeted
Chris Barber @chrisbarber
I asked a few founders & investors: "What are 1-2 interesting startups for software engineers & researchers to join, beyond the big labs?"

1. MatX - chips for LLM training and inference - Founders: Reiner Pope, Mike Gunter (both ex-Google)
2. Prosper - domestic-work robots for households and businesses - Founder: Shariq Hashme (ex-OpenAI)
3. Manifest - core ML research lab - Founders: Jacob Buckman, Carles Gelada, Sean Zhang
4. Physical Intelligence - foundation models for robots - Founders: Karol Hausman, Sergey Levine, Chelsea Finn, Brian Ichter, Quan Vuong, Adnan Esmail, Lachy Groom
5. Mechanize - RL environments with an initial focus on software engineering - Founders: Tamay Besiroglu, Matthew J Barnett, Ege Erdil
6. Lindy - AI agents for workflow automation - Founder: Flo Crivello
7. Forge - workflow automation for large enterprises - Founder: Markie Wagner
8. Reflection - agentic models with an initial focus on code - Founders: Misha Laskin, Ioannis Antonoglou
9. Fractal Software - venture studio building vertical SaaS companies - Founders: Nate Baker, Mike Furlong
10. Thinking Machines Lab - product-focused AI lab - Founders: Mira Murati, Barret Zoph, John Schulman, Lilian Weng, Andrew Tulloch, Luke Metz
11. Meter - managed internet and secure networking for offices, hardware and software - Founders: Anil & Sunil Varanasi (brothers :))
12. Allen Institute for AI (AI2) - nonprofit AI research - Founder: Paul Allen (1953-2018)
13. Abridge - AI medical scribe - Founders: Shivdev Rao, Sandeep Konam, Zachary Lipton, Florian Metze
14. Higgsfield - video gen models and apps (mobile focused) - Founder: Alex Mashrabov
15. Black Forest Labs - FLUX image models - Founders: Robin Rombach, Patrick Esser, Andreas Blattmann
16. LMArena - LLM evals for big labs - Founders: Anastasios N Angelopoulos, Wei-Lin Chiang, Ion Stoica
17. Marathon Fusion - fuel for future fusion plants (and creating gold?) - Founders: Kyle Schiller, Adam Rutkowski
18. Luma AI - Dream Machine video model - Founders: Amit Jain, Alex Yu
19. Aravolta - software for data center monitoring and control - Founders: Margarita Groisman, Jack Sutton
20. Harmonic - AI math and reasoning lab - Founders: Vlad Tenev (Robinhood co-founder), Tudor Achim
21. Socket - prevent vulnerabilities from dependencies in your code - Founder: Feross Aboukhadijeh
22. Expo - framework and cloud platform for making cross-platform apps - Founders: Charlie Cheever, James Ide
23. Koah Labs - native ads network for AI chatbots - Founders: Nic Baird, Herrick Fang, Mike Choi
24. Brisk Teaching - AI tool for teacher feedback, lesson plans, and assessments - Founder: Arman Jaffer
25. Anysphere (Cursor) - AI code editor - Founders: Michael Truell, Aman Sanger, Sualeh Asif, Arvid Lunnemark
26. Letta - memory for AI agents - Founders: Charles Packer, Sarah Wooders
27. Baseten - serverless GPU hosting - Founders: Tuhin Srivastava, Amir Haghighat, Philip Howes, Pankaj Gupta
28. Mithril (prev. ML Foundry) - GPU compute for training and inference - Founder: Jared Quincy Davis
29. Quill Meetings - AI notetaker for meetings - Founders: Michael Daugherty and Nick Adams
30. Pluralis Research - multi-participant training of models, for cryptocurrency protocols I think - Founder: Alexander Long
31. Linear - issue and roadmap tracking - Founders: Karri Saarinen, Jori Lallo, Tuomas Artman
32. Qualia - real estate closing platform for title, escrow, and lenders - Founders: Nate Baker, Joel Gottsegen, Lucas Hansen
33. CodeRabbit - AI code review for pull requests - Founders: Harjot Singh Gill, Guritfaq Singh
34. Sygaldry Technologies - quantum computing for AI inference and training - Founders: Chad Rigetti, Idalia Friedson
35. HealthMC - AI-powered medical coding - Founders: Ian Blumenfeld, Brian Johnson, Judah Rabinowitz
36. Lorikeet AI - AI that does customer support ticket resolution - Founders: Steve Hind, Jamie Hall
37. RL Core - RL for optimizing water and wastewater plants - Founders: Martha White, Adam White, Alden Christianson
38. Protege AI - a marketplace for AI datasets - Founders: Bobby Samuels, Travis May, Engy Ziedan, Richard Ho
39. Periodic Labs - AI-driven materials discovery for energy and electronics - Founders: Ekin Dogus Cubuk, Liam Fedus (Ekin is ex-GDM and Liam is ex-OAI)

Thanks to the nominators: @calvinfo, @tamaybes, @Altimor, @lachygroom, @finbarrtimbers, @aranibatta, @MorganSchwanke, @AnjneyMidha, @jacobmbuckman, @ROWGHANI, @NWischoff, @ccheever, @herkhere, Nate Baker, @ianblu1, @dolaoseb, Anon, @FlintCasey, @cjc, @LilyPetherick & @auren
herrick retweeted
Chris Barber @chrisbarber
How To Help Your Partner Feel Deeply Seen

This video is a live demo of resonance. Resonance is a communication tool for helping someone feel seen and emotionally co-regulated (think: helping someone return to feeling peaceful).

Here are the steps to trying it with your partner:
1. Give them a heads up: Tell them about it in advance; say you'd like to use it when one of you wants to vent.
2. Use a signal: "Can you resonate me?" and "Want me to resonate you?" (Only for stuff not about each other at first!)
3. Practice: Do this in 10-20 low-stakes vents (things your partner is emotionally neutral about).
4. Level up: Once you're comfy with it, try it for minor relationship stuff. (Keep the high-stakes things for later.)

(For the full episode: 'The Way You Lead - Chris Barber' on Spotify/Apple podcasts)

Chapters:
00:00 Overview
00:21 Copies
02:13 Guesses (explanation)
09:06 Guesses (instructions)
15:37 Exaggerations (explanation)
18:45 Exaggerations (instructions)
22:18 Debrief
herrick retweeted
Chris Barber @chrisbarber
Now is a good time to build weird things

This was partially inspired by conversations with Sam Altman.

1. When the rate of change is high: 1) society needs more new examples, and 2) the value of trying new things is higher.
2. The rate of change is very high at the moment, especially in tech and AI.
3. So society needs explorers to build new things. But genuinely new is normally weird, so build weird things.
4. People dislike weird things, because weird is unfamiliar at first. It seems to take people 7-14 exposures to a very new thing before they stop disliking it just because it's new. So prioritize the opinions of people who've used whatever you're making over those who haven't.
5. What's not weird? Things where, the first time everyone hears it, they all think it's a nice idea or instinctively nod. Everyone already knows that everybody already approves.
6. Instead: Is it interesting to you? Do you have retention? Does it make your life better? Does it make the lives of the people who use it better? What do people who have actually tried what you're making think?
7. It's easier to make weird things if you're a bit weird.
8. Inhabit your eccentricities. Be "you" -- so that the genetic experiment that is you, is valid. It's good for the species when you do that!
9. Find your "founder-market fit." Find the areas that need your unique strengths and where your unique weaknesses aren't a challenge, or can be patched with a collaborator.
10. How can you tell if something is the right kind of weird? By using your gut feeling. With a week or two of trying, you'll find that you can access your gut feeling on any decision. So practice using your gut feeling on more things. Practice following weird intuitions without needing a "reason".
11. Meet other people who build weird things.
12. If you DM the people you really resonate with, you'll be glad you did.
13. Say one nice thought you had when you were reading their thing. Maybe two sentences. Like a text to a friend.
14. Contact anyone who made something that you really resonated with.
15. Contact them mid-way through reading their thing or using their product. Don't wait till you're done; the time is now.
16. DM them on twitter, or check their personal website for an email, or guess their email.
17. After they reply to the initial contact, send whatever you'd send if you were already close friends. Want to send links back and forth? Do that. Want to get their input on things you make? Do that. Plant the seeds and see which ones grow.
18. Do the weird things that you've cared about for months, even (or especially) if you don't have a reason for caring about them.
19. If you first cared about something a few months ago but have been avoiding it because you don't have a good "reason" to work on it, there's a good chance you'll still care a few months from now. But if you first cared about it a week ago, you might not care in a month.
20. So: what do you care about now? And of those, which did you care about 3/6/9/12/18 months ago? This is like filtering your interests for retention and whether they're "lindy".
21. The longer you've cared about something, the longer you're likely to continue to care about it.
22. A reader suggested: "I would more say you need a portfolio of things you cared about when you were a child, things you cared about a few months ago, and things you only started caring about right now."
23. If you've cared about something for a while and don't have a reason for caring about it, that's a weird intuition worth paying attention to. It might bring you to something weird and new that provides a helpful new example for society.
24. Given the high rate of change, it's particularly helpful to build weird things now, when they have time to flourish and develop.
25. Do you have a weird intuition of something that'd be good to do, where you've had that intuition for months now, and you still feel it? I hope you'll try it now!
26. If you have something that you care a lot about and want to do in the future, do it now, instead.

Thank you to the following people for feedback on drafts: @griffinchoe, @herkhere, Tim Wee, Josh Singer, @ybenpan, Rick Barber, @mattfigdore, @intellectronica, @zhengdongwang, @krishnanrohit, @BasilHalperin, @binarybits, @jacobmbuckman, and @sama.
herrick retweeted
Chris Barber @chrisbarber
I did a new podcast with @dkazand about how to help others process big emotions and "feel their feelings". IMO it's a key life skill, esp for relationships, parenting, and self-integration! We did a live demo of resonance for emotional processing. It's an improvable skill!
Daniel K @dkazand:

most problems are people problems, which are really emotional regulation problems. In this episode @chrisbarber teaches us a concrete skill for processing emotions with words. Links below.

00:00 Intro to Chris
00:52 What even is Emotional Regulation?
03:25 Why a successful founder focused his life on this
06:45 All problems are people problems
07:40 What is Resonance?
09:00 Attachment issues, mothers, and Resonance
10:57 Practical use cases for Resonance
13:33 What Resonance is *not*
15:32 How to do it: the three tools of Resonance
18:26 Live Demo: Copy Statements
20:40 Live Demo: Guess Statements
26:33 Live Demo: Exaggeration Statements
36:00 What it feels like when Resonance works
46:12 How to practice without becoming a weirdo
52:05 What about petulant people who weaponize emotion?
57:09 Final reflections & invitation

herrick retweeted
Chris Barber @chrisbarber
I just made a vibe coding guide for non-coders. How to go from idea -> working prototype. I filmed it on a blank laptop & I show every step from start to finish.

00:23 Install Tools
06:44 Create Project
22:08 Vibe Debugging
37:22 Install Deployment Tools
45:19 Go Live
herrick retweeted
Koah @koahlabs
We have been working with @liner_app since last year and it's been a blast. Liner and Koah have been experimenting with GenAI ad formats the world has never seen before, and the results are in: the era of GenAI advertising is here.
Mike Choi @guard_if:

New case study on how @liner_app uses @koahlabs to monetize their free users with Koah's GenAI-native ad formats. @liner_app is one of the fastest teams I have seen, and extra special to me since they're from my home country, Korea 🇰🇷 [case study linked below]

herrick retweeted
Chris Barber @chrisbarber
Just did a call with Tamay Besiroglu (@tamaybes), co-founder of Epoch AI, about what he expects for the future of AI.

"I think 2035 or so might be my median for when we'll have drop-in AI remote workers."

"I think for 30% per year GDP growth we might see that in 2050, which would be roughly my median."

Here's our chat:

Which tasks will AI be able to do first?

One useful framing is Moravec's paradox: tasks that seem easy for humans are hard for AI. Skills that are relatively new, that natural selection has optimized much less, and that conferred smaller fitness gains to humans when they emerged are easier for AI systems to do. That would be chess, go, math, advanced abstract reasoning, coding, even language to some extent. By contrast, things that evolution has selected for and optimized over a very long timeframe, like sensorimotor skills: interacting with your environment, using your body and your senses to navigate it, basic locomotion, balancing your body, being able to feed yourself. Those skills are harder to automate because our brains have very efficient software for learning them. We will likely see, and to a large extent already do see, a divide along roughly these lines: at coding, math, and chess, AI is getting pretty good, or, in the case of chess and go, is better than the best humans. At sensorimotor skills, AI is really quite bad and very inefficient; you need way more data and compute and even different modalities. This tells you which tasks will fall first. It doesn't tell you when.

When will AI be able to do those tasks?

What's useful for thinking about the timeframe is to look at similar tasks: how long did it take, in calendar time or in compute scaling, to get from "we're making progress on the task" to "we're at the average human" to "beating all humans"? With chess, a lab at Northwestern was working on chess engines in the 1970s that matched club players (1500-2000 Elo and below), and then came Kasparov in 1997, so it took about 25-30 years to go from the median human to beating the best humans. Now the scaling up of compute has massively accelerated, from doubling every two years to doubling twice per year. So I would expect it to take 5 to 10 years to go from models that match the median expert to beating all humans at that task. On that basis I would predict superhuman math reasoning and superhuman coding in roughly 5 years, or maybe a little longer. Super-reasoners for these specific domains: we're already seeing a lot of progress in math and coding, and I expect those to fall in 5 to 10 years.

Do you expect drop-in remote workers by 2030?

I don't expect drop-in remote workers on that timeframe. I expect superhuman coding, math, and other types of reasoning, and to some extent other skills. AIs might be better than humans at management before they can be drop-in remote workers, just because managing a large organization wasn't strongly selected for by evolution. The really interesting thing is getting superhuman engineers and superhuman abstract reasoning; I predict we get that first, roughly on this 5 to 10 year timeframe.

I think eventually you also get drop-in remote workers, but that might take a little longer, because it requires a lot more than abstract reasoning: interfacing with other people, building rapport, and navigating an environment from very little data in a way that might require very good spatial-visual reasoning. It's a much broader set of things to master. People will realize that software engineers don't just write code; there's a lot else they do, and those things will take slightly longer to automate.

By when do you think there's a 50-50 chance of drop-in remote workers?

I think 2035 or so might be my median for when we'll have drop-in AI remote workers.

You've mentioned that it's unclear if remote-work automation is enough for economic acceleration. Is robotics the missing piece?

That's right, robotics is. I do think you get a very substantial acceleration from drop-in remote work, because it's maybe 1/4 of the total US economy, and remote work is also higher compensation on average. So you might get on the order of trillions of dollars per year in additional economic output from automating remote work. It could be a really substantial rate of growth.

By when do you think consumer-oriented humanoid robots will be good enough that wealthy households would pay to keep one after a free trial?

Initially they might keep it because it's kind of cool, a nice gadget that impresses guests. For pure utility value, I think perhaps the mid 2030s is when that happens more regularly. It would be earlier for very specific robots, say one that does laundry but can't cook your food. But for general household robotic systems, I think mid 2030s.
Do you expect relatively full employment through 2030-2035, with lots of job destruction but also creation, because of human-preference jobs, bullshit jobs, etc.?

I'm very uncertain. The right attitude is to have fairly wide confidence intervals on labor force participation over the next 10 years, and even wider over the next 20 and beyond. I don't think it's because people will continue working in bullshit jobs; productivity will improve, so jobs will on average become less bullshit. They will become more meaningful in terms of adding to our economy, because our technology improves over time, especially as AI gets really good at R&D, which enhances productivity; our economy becomes larger, and we accumulate more capital, so there's more capital per worker. And finally, humans will work on the things that AI systems struggle at, so we effectively complement them, or are complemented by these very powerful AI systems. Then you get much more output, and the marginal product of workers goes up, because the economy really benefits from having humans do the things AI can't, since we're effectively bottlenecked on those tasks.

This is related to what you told me: even if the economy is 90% automated, the share of economic spend on labor could be much higher than 10%?

Yeah, exactly. If you take current tasks and automate 90% of them, we could still have very high employment, because all the human workers could work on the remaining 10%, and that remaining 10% has become very important, because it helps us unlock the value of AI.

What's your median prediction for when total global inference spend exceeds $70 trillion per year (i.e., roughly current total labor spend)?

That's a nice question. In my head I'm trying to project GDP over time, because that's growing. One question you could pose is when it will exceed labor's current share [i.e., labor is $70T, GWP about $105T, so when would it be >65-70% of global output], not just that dollar amount but the actual fraction of output. That could never happen; it could remain below that because we get bottlenecked by other factors, which then grow as a fraction of total spend in our economy. Maybe we spend more on land because we get bottlenecked by that, or something. For inference spend to be 65-70% of the economy, we would have to automate basically all jobs. And when inference spend reaches $70 trillion, that will be a much smaller fraction of our economy than $70 trillion is now, because our economy will grow. My median for when inference spend exceeds $70 trillion per year, I guess I would say 2035 or something like that.

What's your expectation for total labor spend in 2035?

If we grow maybe 5% per year for 10 years, it could be $130 trillion, perhaps more. I'm fairly uncertain, so I'd say $130 trillion with very wide confidence intervals, because of uncertainty about economic growth and about job-loss effects from AI automation.

By when do you think AI will flexibly substitute for remote computer jobs? Is that the same as your 2035 answer for drop-in remote workers?

Remote computer work would be different from all remote work. If you're imagining, say, IT support, I expect that sooner relative to all remote work. It could be 2030-2032 for AI to flexibly substitute for all IT work that is currently done remotely.

If AI can flexibly substitute for remote computer roles, and total labor spend will still be relatively large in 2035, is that due to relative advantage and complementarity?

It would be because I think AI systems will not match the efficiency of humans at complex motor skills, visual-motor skills, and in-person work. It might be quite good at things you can do remotely, but not things you need embodiment, hands, for.

Does that mean you expect a large amount of job displacement from purely computer-based jobs toward computer-plus-real-world hybrid jobs and pure real-world jobs?

Yeah, that's right, I do expect a shift toward that.

It sounds like explosive economic growth (i.e., GDP growth of 30%+ per year) is predicated on having enough pieces of the economy automated that you're not bottlenecked by the remaining pieces. What's your median for when we'll see explosive economic growth?

I think for 30% per year GDP growth we might see that in 2050, which would be roughly my median.

Elon said, "If you define AGI (artificial general intelligence) as smarter than the smartest human, I think it's probably next year [2025], within two years [by April 2026]" (April 2024). What's the charitable interpretation where that's true, and how do you rate the literal interpretation?

Maybe the charitable interpretation is that there's some definition of "smarter" where this is true: more knowledgeable, more widely read, more capable of recalling facts, faster at reading and maybe writing and context switching. It's definitely smarter in those dimensions, but obviously not smarter in all dimensions. By 2026, Terence Tao will still very clearly be better at math than the best AI systems. The literal interpretation, that there will be no human an AI does not pareto-dominate, is going to turn out to be terribly wrong.

Do you expect inference costs to continue declining at the current rate?

GPT-4 was cents per 1000 tokens, and now we have Gemini 2.0 Flash, or whatever it's called, at cents per million tokens. So that's 3 OOMs of efficiency gains for roughly the same performance: 10x per year, a 90% decline per year.
This is not the same as saying we're making an OOM per year of effective-compute gains, because 10x more compute might let you really push out the frontier, whereas these specific innovations are biased toward obtaining already-attained capabilities more cheaply. I do think we will continue to see that type of innovation at a very fast pace; whether it's as fast as we've seen historically is a little uncertain. There might have been more low-hanging fruit. But techniques like distillation and iterated amplification seem pretty powerful. Maybe you can't really approximate the capabilities of very, very good reasoning models in these tiny models; perhaps there's some minimum circuit depth required. But I do expect the cost decline to be very fast, maybe as fast as we've seen historically, maybe slightly slower.

How would you advise your relatives to prepare for AI?

I would encourage them to save more and invest, for precautionary reasons. It depends on how old the relatives are, but if you're young, the large amount of wealth you thought you could earn by selling your labor might disappear. So save more, in general and especially if you're young, for precautionary reasons and also because the returns might be really great, so your investments might do especially well. Also, stay healthy, because you might get access to really great technology in the future that you don't want to miss out on; be more risk-averse so you're more likely to enjoy the benefits of this technology, and potentially longevity.

How do you think about capital allocation? Is it basically chips, data centers, energy infrastructure, generally the compute stack? Or more app layer?

With the disclaimer that this is not financial advice: I think it's hard to pick winners, so more diversified portfolios are better. You might get the general sector right, like compute, but bet on the wrong player. You might want to make bets that really focus on your disagreement with the market about AI. One way to do this is buying very far out-of-the-money call options on the S&P 500, which would materially appreciate if we see explosive growth; those options are cheap because volatility on the S&P 500 is much lower than it would be for Nvidia. So maybe buy really cheap options that highlight the disagreement between you and the market that is specifically tied to AI rather than other things.

Do you expect millions of knowledge-work jobs to be lost and recreated elsewhere?

Yeah, I do. I think early job loss won't translate into the same reduction in labor force participation; jobs will be redefined, and people will do new tasks they previously weren't doing. Software engineers will spend more of their time instructing AI models and checking their work rather than writing code. And people will be able to find new tasks, especially if the economy is really booming and technology is improving. Very-near-term substantial job loss is not something I would predict. In fact, you might even see people come out of unemployment because wages are so good and the economy is booming, so labor force participation could increase. It depends a bit on the pace of automation: if we automate jobs faster than it takes to retrain people for new ones, that friction might decrease labor force participation.

It sounds like the overall picture is large job displacement in knowledge work, but lots of movement into other roles; drop-in remote-worker products perhaps by 2035; and the prep advice is that you can't necessarily rely on labor income long term, so increase savings and invest broadly, don't try to pick individual winners unless you're particularly well informed, and job-wise, assume you'll need to reorganize into something complementary to AI.

Yeah, that's right. And a person should have a lot of uncertainty and plan for a pretty wide range of outcomes. I think that's also very important.
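The cost-decline arithmetic quoted in the interview above ("3 OOMs of efficiency gains ... 10x per year, 90% decline per year") can be sketched in a few lines. This is an illustration, not from the thread: the three-year window is an assumption chosen so the quoted 10x-per-year figure comes out exactly; a shorter window implies a steeper annual rate.

```python
def annual_cost_decline(cost_start, cost_end, years):
    """Annualized cost-reduction factor and percent price decline per year."""
    factor = (cost_start / cost_end) ** (1.0 / years)  # e.g. 10.0 means 10x cheaper each year
    pct_decline = (1.0 - 1.0 / factor) * 100.0
    return factor, pct_decline

# The thread's figures: ~3 orders of magnitude (1000x) over an assumed 3 years
factor, pct = annual_cost_decline(cost_start=1000.0, cost_end=1.0, years=3.0)
# -> roughly 10x per year, i.e. ~90% decline per year

# The same 1000x compressed into a 2-year window would be steeper
factor2, _ = annual_cost_decline(cost_start=1000.0, cost_end=1.0, years=2.0)
# -> roughly 31.6x per year
```

The point of the second call is that the quoted "10x per year" is sensitive to the assumed window; the 1000x headline figure is the sturdier number.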
herrick retweeted
Chris Barber @chrisbarber
Humanoid Robot Timeline Predictions

From my conversation with 1X Head of AI Eric Jang (@ericjang11, @1x_tech, creators of the NEO robot): "When Will Humanoid Robots Have Early-Adopter Product-Market Fit?" Eric's personal views (not 1X's) below.

Chris: What's the first year where you feel that humanoids are good enough that 40% of households currently employing domestic staff would keep a robot after a 21-day trial, purely on utility value?

Eric: 5 years (2030) for early-adopter product-market fit, 10-20 years (2035-2045) for diffusion. By early-adopter product-market fit I mean that those households would pay to keep the robot after a trial purely on utility value, though they'd choose to keep their household staff as well. It'd be worth the money, though not a full replacement. By 2035, for early-adopter families, it might be at the point where humanoid robots are preferred and replace their existing staff.

Chris: What are your top 1-3 reasons for why it'll be that soon?

Eric:
1) That is my day job. 1X is working on the household humanoid market. We think all the pieces required to make the product are mostly there, except for limitations on the dexterity of the hardware, but that is a known unknown now. On the autonomy side, there's no big scientific unknown; it's a real-world prediction problem not too dissimilar from self-driving cars or VLMs. Of course there are aspects that are different, but it's not a completely different animal.
2) When 1X started in 2015, the market was pretty static; there were just a few players, and we were all building on fairly capital-conservative timelines. The field has changed a lot in the past couple of years, and many people are aiming to solve general-purpose robots. Regardless of who fails, there will be people trying again immediately.
3) We've hit the critical threshold of smart people in this domain; I think it'll happen.

Chris: What are your top 1-3 reasons for why not sooner?

Eric: Elon Musk is famous for saying that getting hardware right takes 3 major iteration cycles, and each iteration cycle is 1-2 years. That was the case with the cars. I don't see how we can greatly accelerate hardware development of this kind with the technologies we have today. And the software can really only happen once the hardware is there; you can try to build some of it beforehand, but you kind of need to co-develop it with the hardware.
herrick retweeted
Chris Barber @chrisbarber
DeepSeek-R1: What's the main takeaway & what should we expect next? I asked AI researchers and Jordan Schneider from ChinaTalk. FYI: Long post. Finbarr Timbers, @finbarrtimbers (Artfintel, former DeepMind) 1) What's the main takeaway: The biggest update that we should see is that it turns out to be very easy to replicate these reasoning models, and it's purely a question of getting the right data. And that wasn't obvious: when o1 came out, there was a lot of speculation as to how they did it, and you know whether or not it'd be difficult to replicate, and it just turns out it's not! It turns out you take a good base model and you do very basic RL, and that's enough. 2) What should we expect to see next: First, in the domains where there are objective signals, like math questions, generally science and technology and engineering questions, or where there's code we're able to execute and get signal, we should expect massive improvements. And we should expect to be able to pay more and get better answers. Before this, it wasn't clear that you could pay for a good model without going in and getting, millions of of examples of data that you could do pre training on. In the medium term, I think we will see general reasoning improvements. Another question is whether or not it's possible iterate on this data and make it better and better and better. Because if you run this thinking process and you're able to figure out an answer to this question, can you then have a model, summarize the reasoning trace, summarize the chain of thought and then get a more concise version. And then, instead of thinking for 1000 tokens, you can think for 100 tokens. And then just keep iterating on that. I strongly suspect that we can. So not only we will see this new capability where our models are going to do well in reward rich settings, but we should also see an improvement in the rate of progress for research on these domains where this objective reward signal exists. 
3) What's important but under-discussed: This is a really simple approach. If you were to say, hey, let's use RL with a language model to solve these problems, this would be pretty close to the first thing that I'd try. And it turns out that it works. There was all that work on Atari that made Atari agents really good, and we're just not using those tricks. So, how much of that stuff can we apply? And how do we efficiently use inference-time compute? I still think MCTS has a lot of opportunity. (Finbarr has a good read on his blog about what research gets wrong about MCTS.) Muesli is another RL algorithm that DeepMind wrote. I think some of this stuff is going to be useful. We're scratching the surface on how we're doing search and how we're doing the reinforcement learning here, and I think there's a lot of potential to improve it.

Jacob Buckman, @jacobmbuckman (Co-founder at Manifest AI, former Google Brain)

1) What's the main takeaway: Don't discount boring-seeming optimizations. There's nothing fundamental here; it's just a bunch of really good engineering making things work. Also, you have to imagine they did lots of hyperparameter sweeping and ablations. Don't discount the value of getting things exactly right. In this field there's really high leverage on ideas and precise implementation.

2) What should we expect to see next: Expect to see big teams investing more heavily in research and less in naive compute scaling. Instead of just dumping dollars into GPUs and getting the edge that way, they'll invest in getting good scaling laws. That said, they'll still be dumping dollars into GPUs too, because once they have the good scaling laws, they scale up.

3) What's important but under-discussed: The claim that everyone is wild about is the pre-training budget, which is from V3. But people didn't seem to take much notice of V3 until R1 came out, which is a different model that wasn't trained on the $6mm budget.
That one is definitely doing a bunch of distillation, which means they trained a big model and then trained a small model off of the big one. It's also doing reasoning inference-time unrolling. I haven't read it in detail, but they must have done something like collecting the unrolling data. That either means they scraped it from humans (paid $ for data collection), found a way to scrape it from OpenAI/o1 chains of thought to bootstrap it, or used a classic RL setup with search to find the right answer and then added that to the dataset (paying compute for dataset construction). So even if the parameter updates aren't expensive, the parameter updates are on data that was expensive to collect and create.

It was bizarre that that is what got the attention: the distilled reasoning fine-tuned model, with that attention then broadcast back to DeepSeek-V3 and its training budget. Which was impressive, but not impressive enough for this level of attention.

Jordan Schneider, @jordanschneider (Founder of ChinaTalk)

3) What's important but under-discussed: Compute still matters. DeepSeek closed itself to new signups today. Even with its efficiency gains, it's one thing to train a model and another to deploy it to millions of users. Export controls on SME (semiconductor manufacturing equipment) and AI chips, smart immigration policy, and government policies that support AI diffusion are the best tools at the US government's disposal to ensure liberal democracies maintain a sustainable edge on AI vs. China.

Nathan Lambert, @natolambert (Post-Training Lead at Allen AI)

1) What's the main takeaway: R1 is about the pace of innovation when new techniques are found.

2) What should we expect to see next: The most important thing DeepSeek shows us is that the pace and opportunity of innovation in AI is still very high, and we're in for a wild ride. Oh, and the people in charge should figure out the geopolitics.
3) What's important but under-discussed: I think everything important is mentioned, but it is drowned in crap takes. DeepSeek is a great team, and talent is spread around the world; we, as Americans, want it here!

Ross Taylor, @rosstaylor90 (Previously led Meta AI reasoning team)

1) What's the main takeaway: Research update: never underestimate the power of pushing a simple baseline with more compute. Product update: open thought models which show the chain of thought are compelling products.

2) What should we expect to see next: Reasoning is going to be everywhere. Instead of thinking about agents in the world, it's maybe more helpful to think of the world itself becoming more thought-dense: we'll be able to embed thought into things that previously were static or reactive.

3) What's important but under-discussed: The interplay between generation and verification, how they are linked, and what that means for the future. Reasoning in more modalities than just text. Continual learning and better understanding of task context.

Thank you to Finbarr Timbers, Griffin Choe, Gwern, Herrick Fang, Jacob Buckman, Jordan Schneider, Josh Singer, Matt Figdore, Nathan Lambert, Rick Barber, Ron Bhattacharyay, Ross Taylor, Tim Wee and Tyler Cowen.

What's the even more important question that I should be asking?
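Finbarr's recipe of "a good base model plus very basic RL" on objective signals can be sketched as a toy loop. This is a minimal illustration of the verifiable-reward idea, not DeepSeek's actual pipeline: all names and the answer-extraction format here are hypothetical, and the "policy update" is reduced to rejection sampling (keep only completions a deterministic checker accepts).

```python
import random
import re


def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Deterministic checker: reward 1.0 iff the completion's final
    'answer: N' matches the gold answer (format is illustrative)."""
    match = re.search(r"answer\s*[:=]\s*(-?\d+)", completion.lower())
    return 1.0 if match and match.group(1) == gold_answer else 0.0


def rejection_sample_step(policy, prompt: str, gold_answer: str, k: int = 8):
    """Sample k completions and keep only the ones that earn reward.
    In a real RLVR setup the kept traces would update the model
    (e.g. via PPO/GRPO); here we just return the accepted set."""
    samples = [policy(prompt) for _ in range(k)]
    return [s for s in samples if verifiable_reward(s, gold_answer) == 1.0]


# Toy stand-in for a base model: answers "2+2" noisily.
random.seed(0)
toy_policy = lambda prompt: f"reasoning... answer: {random.choice(['3', '4', '4', '5'])}"
accepted = rejection_sample_step(toy_policy, "What is 2+2?", "4")
```

The point the sketch makes is how little machinery the reward side needs once answers are mechanically checkable: the entire "signal" is a regex and a string comparison.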
herrick retweeted
Chris Barber
Chris Barber@chrisbarber·
Will scaling reasoning models like o1, o3 and R1 unlock superhuman reasoning? I asked Gwern + former OpenAI/DeepMind researchers. Warning: long post.

As we scale up training and inference compute for reasoning models, will they show: A) strong general reasoning skills that work across most logically-bound tasks, B) some generalization to other logic tasks, but perhaps requiring domain-specific retraining, or C) minimal generalization?

Finbarr Timbers (Artfintel, former DeepMind): "We'll achieve superhuman performance on specific tasks with verifiable rewards. I see no evidence for general broad transfer, but it seems extremely plausible."

Gwern Branwen: "Everyone neglects to ask: what are we scaling? It depends on what data they scale up on. The more you scale up on a few domains like coding, the less I expect transfer, as they become ultra-specialized. Far transfer is rare, and after a certain point, meta-learning collapses into learning for additional optimality. AlphaZero doesn't develop general gameplaying ability that transfers strongly across all domains either."

Jacob Buckman (Co-founder at Manifest AI, former Google Brain): "Generalization can't really be predicted like that except empirically. All I know is that as you add more compute and data you go from minimal transfer to some transfer to broad transfer. I have no clue where on that spectrum we stand when we run out of compute or data."

Karthik Narasimhan (Sierra AI, former OpenAI, co-author of GPT paper): "I expect some generalization with domain-specific retraining."

Near (Independent): "I think the 'spikiness' of intelligence will continue to be notable (models which are extremely good at some things yet quite 'dumb' at others), but it is easy to improve generalization in the areas we care about, since it just requires some data/RL fun."
Nathan Lambert (Post-Training Lead at Allen AI): "With chain-of-thought generation as a general approach for autoregressive language models to break down and process information, the new models trained to reason heavily about every subject will come to have better average performance than standard autoregression. In domains with explicit verifiers, this performance will be superhuman; in domains without, reasoning will still enable better performance, but maybe not more economical performance."

Pengfei Liu (Shanghai Jiao Tong University): "Increased compute and inference time will drive reasoning capabilities to expert-level performance where rich feedback loops exist. However, the development of general reasoning will be gated by two factors: the availability of problems requiring genuine deep thinking, and access to high-quality expert cognitive process data or well-defined reward signals."

Ross Taylor (Previously led Meta AI reasoning team): "I think general reasoning will come fairly quickly. Right now it's easier to scale in domains where problems are easy to verify with an external signal. The generalisation will come if models themselves become good verifiers across domains."

Shannon Sands (Nous Research): "There's at least some generalisation to other tasks like logic puzzles, but it might require more domain-specific training to improve on many more out-of-domain tasks."

Steve Newman (Co-founder of AI Soup, co-founder of Writely aka Google Docs): "This is a trillion-dollar question. If I had to guess: we'll see some transfer of reasoning skills across domains, but (on anything resembling current architectures) some specialized training will be needed in each domain. We'll learn a lot one way or another this year."

Tamay Besiroglu (Co-founder of Epoch AI): "I think minimal transfer is wrong because reasoning is a very general skill that you can apply to perform a wide range of actions. Planning, for instance, is something that requires good reasoning, and is useful to achieve high levels of performance across a wide range of tasks. Differentiating between broad transfer and some transfer is hard without being more precise or quantitative."

Teortaxes (Independent): "I think there will be a period of a strong 'natively verifiable reasoning overhang', which translates to more general verifiers using models' strong coding ability and general knowledge + tools; then they grok more general regularities of sound reasoning, and the next generation can natively generate good reasoning data for all domains."

Xeophon (Independent): "We will see some generalization into other domains the model was not explicitly trained on. For example, R1 writes better and more creative stories than V3, the model it is based on. To push this further, models need to be trained on more data in other domains."

Chris Barber (Independent, creator of this roundup), synthesis: "The expert takes point to generalization for all logically-bound domains where we can construct verifiers for now, trending in the direction of broad transfer in the future."

Implications and follow-up questions

Gwern: "The important question is to what extent the verifiers or judge models can be cheaply set up for each new domain, I'd think. If you can quickly 'o3-ize' every important domain, particularly with general-purpose coder & mathematician expert models, then the scaling can continue cheaply." Gwern's recent comment on LW and McLaughlin's essays are also great reads.

Ross Taylor: "Verification will become a test-time compute problem: the longer you spend checking a solution, the more accurate the verification signal should be. Then the question will be: how do you verify the verifiers? I suspect we'll end up in a world where multiple agents check each others' work (a bit like how the research community works). Not clear if we're at the level of model capability for this to work yet, but I wouldn't bet against it!"

A Separate Question: Is There a Ceiling?

Finbarr Timbers: "I see no reason for reasoning models to plateau around world-expert level, tbh. The existing paradigm for reasoning models looks like RLVR (reinforcement learning with verifiable rewards), where the model learns to solve tasks with deterministic/verifiable rewards. No reason that has any limit around human levels. Look at DeepSeek R1, for instance."

Jacob Buckman: "In general, search can surpass humans. This was clear even pre-deep-learning, e.g. with 'classical' chess AI. But most real-world domains don't have the nice properties (cheap to simulate, clear success condition) that make them amenable to search. In a setting without this niceness, reasoning/search amounts to learning a human expert's 'search function' and then unrolling it. It's possible to surpass the original expert using this technique, by unrolling for more steps than they do. The bottleneck in this case becomes rate-of-search; that is, you won't be able to improve performance faster per step of search than the expert."

Tamay suggested a good follow-up would be to "elicit ideas for experiments that people would expect to turn out one way conditional on 'weak transfer' and another if 'strong transfer' is correct" – let me know if you have ideas.
Thank you to Amir Haghighat, Arun Rao, Ash Bhat, Avery Lamp, Charlie Songhurst, Connor Mann, Daniel Kang, Dhruv Singh, Eric Jang, Ethan Beal-Brown, Finbarr Timbers, Flo Crivello, Griffin Choe, Gwern, Herrick Fang, Jacob Buckman, James Betker, Jay Hack, Josh Singer, Julian Michael, Katja Grace, Karthik Narasimhan, Logan Graham, Matt Figdore, Mike Choi, Nathan Lambert, Nicholas Carlini, Nitish Kulkarni, Pengfei Liu, Rick Barber, Robert Nishihara, Robert Wachen, Rohit Krishnan, Ron Bhattacharyay, Ross Taylor, Shannon Sands, Spencer Greenberg, Steve Newman, Tamay Besiroglu, Teknium, Teortaxes, Tim Shi, Tim Wee, Tom, Tyler Cowen, and Xeophon.

Upcoming posts:
1) Computer-use benchmark timeline predictions
2) Clips from researcher interviews (e.g. Finbarr Timbers and Jacob Buckman from the list above)
3) More takes on scaling reasoning models

Questions:
1. What's your take?
2. Who else should I ask?
3. What follow-up questions should I ask?
4. What's the even more important question that I should be asking instead?
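Buckman's point about surpassing an expert by "unrolling for more steps than they do" can be made concrete with a toy sketch. Everything here is illustrative and not from the thread: `improve_step` stands in for a learned local-search move, and the "expert" is just the same step function run with a smaller step budget.

```python
def improve_step(guess: float, score) -> float:
    """One step of a (hypothetical) learned search move: try small
    perturbations of the current guess and keep the best-scoring one."""
    candidates = [guess - 0.1, guess, guess + 0.1]
    return max(candidates, key=score)


def unroll(start: float, score, steps: int) -> float:
    """Apply the same search step repeatedly ('unrolling')."""
    state = start
    for _ in range(steps):
        state = improve_step(state, score)
    return state


# Toy score function with its optimum at x = 3.0.
score = lambda x: -(x - 3.0) ** 2

expert_budget = 5      # the 'expert' stops searching after 5 steps
extended_budget = 40   # same step function, unrolled much further

expert_result = unroll(0.0, score, expert_budget)        # stops near 0.5
extended_result = unroll(0.0, score, extended_budget)    # converges near 3.0
```

The extended unroll beats the expert using only the expert's own step function, which is exactly the surpass-by-more-steps mechanism; it also shows the bottleneck Buckman names, since per-step progress is capped by how good that step function is.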
herrick retweeted
Mike Choi
Mike Choi@guard_if·
what if it becomes dead simple to find hidden, high leverage contacts?
herrick retweeted
Eureka Health
Eureka Health@AskEureka·
Introducing Eureka, the world’s first AI doctor. Eureka can order labs and give care in the real world. It is covered by health insurance just like a healthcare provider and is already working with thyroid patients in the US. It’s 90x faster than most care in the US, and 9 out of 10 users want to continue with Eureka’s recommendations. Eureka thinks like a doctor and reasons like a detective. Before any care starts, a board-certified physician reviews Eureka’s recommendations to make sure everything is in order. Eureka currently specializes in endocrine conditions like thyroid and diabetes. Here’s what users think about Eureka 🧵 (1/5)
herrick
herrick@herkhere·
TIL that CMD+SHIFT+G is a thing in Finder on Mac 🤯
herrick retweeted
studiolanes
studiolanes@studiolanes·
New Discord Channel for Buddy on the Vision Pro, join here to say hi or to share your thoughts! discord.gg/yCacv799VV