Francis Davidson

195 posts

@FDavidsonT

Co-founder and former CEO of Sonder. Now working on new ideas.

Joined September 2014
1.5K Following · 3.2K Followers
Francis Davidson retweeted
Aaron Haynes @myeyesshine_
Chartbeat data shows Google Search referrals down 34% overall and 60% for small publishers, while AI chatbots are still <1% of referral traffic. This aligns with @ahrefs data showing an 18% search-traffic decline across 74.7K sites while AI replaced less than 5% of the loss. The traffic isn't being replaced. It's being absorbed. AI answers the query directly and the user never clicks. The value is shifting from traffic to influence, and most analytics dashboards can't see influence. Nice share @NexusBen
9to5Google @9to5Google

Google Search referrals to the web have plummeted, AI links are 'less than 1%' of traffic 9to5google.com/2026/03/18/goo… by @nexusben

8 replies · 6 reposts · 55 likes · 8.6K views
Francis Davidson retweeted
Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experimental results and used that to plan the next experiments. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger findings:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
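The propose-test-keep loop described above can be sketched in a few lines. Everything here is a hypothetical stand-in for illustration: `train_and_eval` is a toy objective, not a real training run, and `propose_change` is a crude random perturbation, not the actual nanochat/autoresearch agent.

```python
import random

# Toy stand-in for a short training run: returns a "validation loss"
# that is minimized when every hyperparameter equals 1.0.
def train_and_eval(config):
    return sum(abs(v - 1.0) for v in config.values())

# Stand-in for the agent's proposal step: perturb one hyperparameter.
def propose_change(config):
    key = random.choice(list(config))
    new = dict(config)
    new[key] *= random.choice([0.8, 1.25])
    return new

# Greedy autoresearch loop: a change is kept only if it improves the
# validation loss, so accepted changes stack additively.
def autoresearch(config, rounds=200):
    best_loss = train_and_eval(config)
    for _ in range(rounds):
        candidate = propose_change(config)
        loss = train_and_eval(candidate)
        if loss < best_loss:
            config, best_loss = candidate, loss
    return config, best_loss
```

The real workflow adds the expensive part this sketch omits: re-testing accepted changes at larger depth to confirm they transfer.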
962 replies · 2.1K reposts · 19.3K likes · 3.5M views
Francis Davidson @FDavidsonT
Can’t wait to share with you all what we’ve been cooking
Alfred Lin @Alfred_Lin

@FDavidsonT has launched over 5,000 agents and used over 3.5B tokens *just for himself* in the last 31 days alone. This is what 10x productivity looks like in practice.

4 replies · 0 reposts · 17 likes · 2.5K views
Francis Davidson retweeted
John Rush @johnrushx
This equals 40,000 software developers working full-time. End of 2026: 200,000 developers. 2027: Claude Code alone will be adding as much code as 1,000,000 full-time human developers. 2028: 1B+. Enjoy your last lines of handwritten code. Horses replaced by cars.
SemiAnalysis @SemiAnalysis_

4% of GitHub public commits are being authored by Claude Code right now. At the current trajectory, we believe that Claude Code will be 20%+ of all daily commits by the end of 2026. While you blinked, AI consumed all of software development.
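The "equals N developers" framing above is simple proportional arithmetic. A quick sanity-check sketch follows; every input here (total daily public commits, commits per developer per day) is an assumption chosen for illustration, not a figure from SemiAnalysis or GitHub.

```python
# Convert an AI-authored share of commits into "full-time developer
# equivalents": (total commits x AI share) / commits per dev per day.
def equivalent_devs(total_daily_commits, ai_share, commits_per_dev_per_day):
    return total_daily_commits * ai_share / commits_per_dev_per_day

# Assumed: ~5M public commits/day and ~5 commits per dev per day.
devs_now = equivalent_devs(5_000_000, 0.04, 5)    # 4% share today
devs_2026 = equivalent_devs(5_000_000, 0.20, 5)   # projected 20% share
print(devs_now, devs_2026)
```

Under those assumed inputs, the 4% share works out to 40,000 developer-equivalents and the projected 20% share to 200,000; different assumptions about commit volume or per-developer output shift these numbers proportionally.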

83 replies · 62 reposts · 591 likes · 152.6K views
Garrett Langley @glangley
When it comes to preventing crime, the first response is often simple: harsher punishment. Longer sentences, mandatory minimums, more incarceration. This is what most people's intuition says should work. But there is another approach: make it harder to get away with crime in the first place. More eyes, faster identification. A world where committing a crime without getting caught is unthinkable.

Since the 1980s, most American criminal justice policy has been built on the first approach. But the most important finding in criminology is that it barely works. Daniel Nagin, a researcher at Carnegie Mellon University, has studied criminology for decades. His conclusion, confirmed by hundreds of studies and multiple meta-analyses: the certainty of being caught deters crime. The severity of punishment does not. The National Institute of Justice, the research arm of the Department of Justice, put it even more clearly: if criminals think there's only a slim chance of being caught, even draconian punishments won't deter them.

This makes sense when you think about it. Most crimes are impulsive. Most criminals don't know the specific penalties. Only half of all crimes are reported to police at all. Several analyses have found that three-strikes laws actually increase homicide rates, because offenders facing life sentences had nothing left to lose. So severity doesn't deter. Certainty does. That changes how we need to go about public safety.

How do we put this into practice? Swift, Certain, Fair is one approach that's shown promise. Offenders serve their sentences in the community, where they can work and contribute, under conditions that make getting away with a breach impossible. South Dakota took this approach to drunk driving. Offenders could serve time in the community as long as they passed a sobriety test twice a day. A failed or skipped test meant a night or two behind bars, not a three-month minimum sentence. The program halved reoffending. It was so effective that county-wide arrests for drunk driving and domestic violence fell by around 10%. And it cost the taxpayer nothing: participants paid the $2 a day for testing out of their own pockets.

The US spends $270 billion a year on criminal justice. The average cost to incarcerate one person is about $61,000 per year, about the same as the median full-time American worker earns in a year. In New York City, it's $507,000, closer to the earnings of a surgeon. What are we getting for that money? A system where 60% of released prisoners are rearrested within two years, all while nearly half of violent crimes and over 80% of property crimes go unsolved.

And prison doesn't just fail to rehabilitate. The evidence suggests it makes reoffending more likely. A meta-analysis of 116 studies found that custodial sentences actually increase recidivism compared to non-custodial alternatives. Every year of incarceration decreases the likelihood of getting a job upon release. Our $270 billion buys us a system that manufactures the next generation of criminals.

Then there's the problem of age. Prisoners over 55 now make up 15% of the incarcerated population, up from 3.4% in 1991. Because of their healthcare needs, they cost 2-3x as much as younger prisoners to incarcerate, a total of $16 billion a year. And for what? 84% of people released at age 60+ are never rearrested. In 2012, 178 elderly people sentenced to life imprisonment in Maryland were released after a court ruling. In the four years afterward, not one of them was rearrested for anything more serious than a traffic violation.

Criminologists Lawrence Cohen and Marcus Felson argued that crime is most likely when three conditions are met: a motivated offender, a vulnerable victim, and the absence of a capable guardian. There will always be motivated offenders and vulnerable victims, but we can ensure that capable guardians are everywhere. This is where Flock Safety comes in.

Flock operates in over 5,000 communities across 49 states. In Marietta, Georgia, areas with Flock cameras saw a 34% drop in crime, triple the citywide average. Communities we serve have reported up to 80% reductions in residential burglaries. Across all customers, Flock helps solve an estimated 700,000 crimes per year. And each new camera added to the network makes every other camera more valuable to the police departments, investigators, and first responders who rely on them.

The deterrence research says severity doesn't work. What works is the infrastructure of certainty. Cameras, networks, real-time alerts, cross-jurisdictional data sharing. A world where the odds of getting away with crime drop every year. That's what Flock Safety is building. The goal is fewer victims, not more prison cells. The evidence says you can have both. Every community deserves that.
16 replies · 9 reposts · 104 likes · 74.4K views
World of Statistics @stats_feed
A neuroscience study suggests that Gen Z has become the first generation to be less intelligent than its predecessor, the Millennials. For the first time in 100 years, young people are scoring lower than their parents on IQ tests and core skills like memory, reading, and focus. This is happening mainly in the US and Europe. The main cause appears to be excessive screen time and digital device use, particularly in schools and social settings. While some suggest Gen Z is just developing different skills, the research shows actual declines in fundamental problem-solving abilities.
1.4K replies · 4.6K reposts · 26.2K likes · 5.5M views
Francis Davidson retweeted
Andrej Karpathy @karpathy
A few random notes from Claude-coding quite a bit the last few weeks.

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. I.e., I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit, but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double-digit percent of engineers out there, while awareness of it in the general population feels well into low single-digit percent.

IDEs/agent swarms/fallibility. Both the "no need for an IDE anymore" hype and the "agent swarm" hype are imo too much for right now. The models definitely still make mistakes, and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might make. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm, couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a huge net improvement and it's very difficult to imagine going back to manual coding. TLDR: everyone has their developing flow; my current one is a few small CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch one struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do, because 1) I can code up all kinds of things that just wouldn't have been worth coding before and 2) I can approach code that I couldn't work on before because of knowledge/skill issues. So certainly it's a speedup, but it's possibly a lot more an expansion.

Leverage. LLMs are exceptionally good at looping until they meet specific goals, and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do; give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

Fun. I didn't anticipate that with agents programming feels *more* fun, because a lot of the fill-in-the-blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that my ability to write code manually is slowly starting to atrophy. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little, mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), alongside actual, real improvements.

Questions. A few of the questions on my mind:
- What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows *a lot*.
- Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
- What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
- How much of society is bottlenecked by digital knowledge work?

TLDR: Where does this leave us? LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related fields. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high-energy year as the industry metabolizes the new capability.
1.6K replies · 5.4K reposts · 39.4K likes · 7.6M views
Shubham @aShubhamz
Tech Companies Tier List:

1) S+ tier (basically mythic status): •Renaissance Technologies - Medallion Fund •NSA, pure research side •OpenAI (core research / superalignment types) •SpaceX (Starship + propulsion specifically) •TGS •Anthropic (interpretability people) •DeepMind - AlphaFold / geometry-heavy work •NVIDIA architecture (Blackwell / Rubin era)

2) S tier (elite, but slightly less legendary): •NSA CNODP •Anduril •Hudson River Trading •Apple - Silicon Engineering Group •Tesla (Autopilot / AI core) •Neuralink •Five Rings •Headlands •xAI •Meta (FAIR) •Tenstorrent •Oxide •AMD RTG (ROCm / compiler / low-level stack) •Radix •Citadel Securities (actual algo teams, not generic roles) •Jane Street •PDT Partners •Etched •Cerebras •Groq •Valve (R&D / Proton / low-level systems) •Discord (voice + infra) •Waymo (Perception Team)

3) A tier (excellent engineering cultures): •Linear •The Browser Company (Arc) •Zed Industries •Notion (core product teams) •Figma (WebGL / C++ side) •Vercel (Next.js core) •Stripe (infra / systems) •Databricks (MosaicML) •Snowflake (engine teams) •Shield AI •Ghost •Signal •Telegram (core team) •Hugging Face •Mistral •Perplexity •Midjourney •Netflix (core engineering) •Uber (platform) •Airbnb (design systems) •DoorDash (logistics / algorithms) •Coinbase (protocol) •Kraken •Monad •Paradigm •Flashbots •Framework •Teenage Engineering •Panic •System76 •Palo Alto Networks (Unit 42) •CrowdStrike (Overwatch / intel) •Mandiant •Trail of Bits •Waymo •Zoox

4) B tier (strong, but more conventional): •Google (Search / Core / Ads) •Meta (FB / Instagram product teams) •Apple Services / iCloud •Amazon AWS •Microsoft Azure •Microsoft Office / Teams •LinkedIn •Palantir (SWE) •TikTok / ByteDance •Roblox •Epic •Unity •Pinterest •Snap •Spotify •Shopify •Atlassian •Salesforce •Adobe •Intuit •Block / Square •Cash App •Ripple •Chainlink •Circle •Affirm •Robinhood •SoFi •Nubank •Revolut •Booking •Adyen •Klarna •Canva •GitLab •GitHub (MS-owned, still decent) •Red Hat •Canonical •SUSE •HashiCorp •Confluent •MongoDB •Elastic •Redis •Grafana •Datadog

5) C tier (big, slow, or mixed signal): •Intel •IBM •Oracle (OCI) •Cisco •Dell •HP •Samsung •Sony •Qualcomm •Broadcom •TI •Micron •Western Digital •Seagate •Garmin •GoDaddy •TripAdvisor •Yelp •Zillow •Redfin •Wayfair •Chewy •Peloton •Roku •Zoom •DocuSign •Dropbox •Box •Twilio •Okta •Workday •ServiceNow •SAP •Siemens •Bosch •GE •Honeywell •John Deere •Lockheed •RTX •Northrop •Boeing •GM (Cruise) •Ford •Rivian •Lucid •Polestar •Palantir (FDE)

6) D tier (mostly tech-as-a-cost-center): •JPMorgan (tech) •Goldman (tech) •Morgan Stanley (tech) •Citi •Bank of America •Capital One •AmEx •Visa •Mastercard •PayPal •eBay •Walmart Labs •Target •Home Depot •Best Buy •Nike (tech org) •Starbucks (tech) •Disney streaming •Comcast •AT&T •Verizon •T-Mobile •UnitedHealth •CVS •Epic Systems •Cerner

7) F tier (body shops / consulting mills): •Accenture •Deloitte •PwC •EY •KPMG •McKinsey (QuantumBlack) •BCG X •Infosys •TCS •Wipro •HCL •Cognizant •Capgemini •Revature
99 replies · 169 reposts · 2.8K likes · 318.6K views
Francis Davidson @FDavidsonT
While those are some of my predictions for the decade ahead, I certainly hope that we'll get ahead of some of the real issues facing us:
- desires growing faster than reality, primarily driven by excessive screen time and addictive social media feeds.
- low-competence state institutions absorbing a growing share of output while failing to build necessary infrastructure.
1 reply · 0 reposts · 0 likes · 299 views
Francis Davidson @FDavidsonT
Screen time will keep rising as companies get increasingly clever at creating consumer demand. Expectations will keep rising, and even though the economy will keep raising standards of living across all income levels, desires will grow faster, with wellbeing declining as a result.
2 replies · 0 reposts · 0 likes · 402 views
Francis Davidson @FDavidsonT
My predictions for 2035 (USA):
- Rapid AI takeoff, but slow adoption. GDP growth of 2.5% on average, hampered by organizational inertia, regulations, and cultural norms.
- Full employment. The market naturally reallocates, with many interest groups protecting classes of jobs from AI replacement.
- Government share of GDP rises from 38% to 45%. Much of the gains from AI get offset by inefficient government spending.
- Millions of humanoid robots are deployed into the economy; multiple trillion-dollar companies aggressively scale up production.
- Real incomes grow at a healthy clip for all quintiles. But measures of wellbeing continue deteriorating as people's perception of the economy worsens, primarily due to social-media-fueled envy.
- Birth rates continue declining to sub-1.5, far below the replacement rate. Red states outgrow blue states in population.
- California high-speed rail (with sub-2:40 transit) has been delayed past 2040 or abandoned.
2 replies · 1 repost · 5 likes · 599 views
Francis Davidson @FDavidsonT
Higher expectations for one's quality of life will keep lowering birth rates, especially among the irreligious and coastal elites. Consumerist gratification is difficult to reconcile with child rearing, which costs money and requires altruistic sacrifice in the short run.
0 replies · 0 reposts · 1 like · 228 views
Francis Davidson @FDavidsonT
@patrickc The 21st century aesthetic will continue to be constrained by regulations and developer economics. To see improvements in what gets built, more degrees of freedom will be required. Architects will have to start from developer IRRs and work their way back to novel design patterns.
4 replies · 2 reposts · 61 likes · 27.3K views