Pinned Tweet
TK Stohlman
1.3K posts

TK Stohlman
@tkstohlman
Founder https://t.co/7hwBwUpD83 Previously founded Autoscale AI (acq. 2021). Former U.S. Army officer
Encinitas, CA · Joined January 2009
715 Following · 796 Followers
TK Stohlman retweeted

@tkstohlman Most people are still focused on better models. The real unlock is something else entirely: systems that govern how AI operates, enforce constraints during execution, AND produce traceable, auditable, and repeatable outputs.
This is a new conversation entirely.
#AI #AIGovernance
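The "govern + constrain + audit" idea above can be illustrated with a tiny wrapper around a model call. This is a hypothetical sketch, not Project 777's actual design: `governed`, `AUDIT_LOG`, and the `model` stub are illustration names only.

```python
import hashlib
import time

AUDIT_LOG = []  # append-only record of every governed call

def governed(constraint):
    """Wrap a model call so a constraint is enforced during execution
    and every run leaves a traceable, repeatable audit record."""
    def wrap(fn):
        def run(prompt):
            output = fn(prompt)
            allowed = constraint(output)
            # Hash prompt/output so the record is traceable without
            # storing sensitive content verbatim.
            AUDIT_LOG.append({
                "ts": time.time(),
                "prompt_sha": hashlib.sha256(prompt.encode()).hexdigest(),
                "output_sha": hashlib.sha256(output.encode()).hexdigest(),
                "allowed": allowed,
            })
            if not allowed:
                raise ValueError("constraint violated; output blocked")
            return output
        return run
    return wrap

@governed(constraint=lambda out: "SSN" not in out)
def model(prompt):
    return f"echo: {prompt}"  # stand-in for a real model call

result = model("hello")
```

Because the audit record hashes both sides of every call, identical inputs yield identical records, which is what makes the outputs repeatable and verifiable after the fact.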

We’ve been building Project 777 for a while.
Today marks another step forward: we’ve executed a $1M+ license with Atherion Bioresearch.
More here: apnews.com/press-release/…

@mchadsells Thanks Chad - that means a lot. You’ve seen more of what’s behind this than most, so I really appreciate the support. 🙏

@tkstohlman Really proud of what TK and the team have built. Congratulations. The idea of an execution layer sitting above frontier models to make outputs more usable, governed, and enterprise ready is exactly where the market needs to go. Excited to see this drive meaningful adoption.

@Drewandcrew22 Thanks Drew! I appreciate all the long talks at the football games when you listened to the vision, asked questions, and offered advice along the way. I appreciate you!

@tkstohlman This is incredible. TK and team continue to set the standard with a proven track record of innovation and execution in this space. The potential here—from disease treatment to broader applications—is massive. Excited to watch this unfold!

@kenschaefer1 Thanks Ken! You’ve seen this from the earliest days, and your guidance along the way has meant the world. Really appreciate you!

@JeremyBrandt Thanks Jeremy - really appreciate it! Been a long road, good to finally get it out there.

@tkstohlman Awesome work my friend. It is great to see the progress.

@chadsepulveda Thanks Chad - I really appreciate it. And I appreciate all the EO Forum advice over the years 🙏

@RussellAllen50 Thanks Russell - really appreciate it. Excited to finally get this out there 🙏

Incredible! @tkstohlman and his team are changing the world. Proud to call you friend!

@chamath Agree. AI favorability will improve when people see it solving real problems - cures, infrastructure resilience, scientific discovery. We should focus less on benchmarks and more on the abundance AI will create.

This lies at the feet of all AI CEOs. We need to do better.
Polymarket@Polymarket
JUST IN: NBC News poll reveals AI favorability at just 26% — lower than ICE.

@PeterDiamandis Or they already sold their last AI startup and have spent the past few years heads down building the next architectural breakthrough.

The person who will reshape the next 50 years of technology is likely writing a screenplay right now.
The cell phone, the internet, the submarine, self-driving cars, voice assistants, the helicopter, and the space shuttle were all imagined by storytellers before they were built by engineers.
Stories are blueprints.

@slow_developer Both are true. LLM capability will keep improving, but it’s also rapidly commoditizing. The real shift will come from new architectures that govern and orchestrate those models.

I wonder what Yann LeCun's reaction is after GPT-5.4.
He has been saying LLMs are a dead end for years, but LLMs have continued to improve steadily.
Yann is highly respected in AI, and I wouldn't claim to know more than he does,
but it still makes sense to keep pushing LLMs while others explore different architectures.

@gdb Agree 100%. The real metric isn’t benchmarks - it’s whether AI produces solutions that materially improve human life.

AI becomes the safest and most useful infrastructure humanity has ever built - helping design cures, powering trustworthy personal AI, and coordinating robots, vehicles, homes, factories, grids, etc. All of this using new architectures - not just bigger models. Abundance is achieved via new architecture breakthroughs.

@karpathy good stuff. We’ve been working in this direction for ~5 years (bought autoresearch.ai in 2019). It’s evolved into a governed execution layer that turns model capability into deployable solutions across domains. Early commercial validation includes a $1M+ license.

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of my daily work for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experiment results and used them to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
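The first finding (a parameterless QK norm with no scale multiplier) is easy to see numerically: when queries and keys are L2-normalized, the attention logits are cosine similarities bounded in [-1, 1], so the softmax stays near-uniform until a multiplier sharpens it. A minimal pure-Python sketch with toy vectors; the scale value 8.0 is arbitrary, not the multiplier the agent actually found:

```python
import math

def qk_norm_logit(q, k, scale=1.0):
    """Dot product of L2-normalized q and k, times a scale multiplier.
    With scale=1 the logit is a cosine similarity in [-1, 1]."""
    qn = math.sqrt(sum(x * x for x in q))
    kn = math.sqrt(sum(x * x for x in k))
    return scale * sum(a * b for a, b in zip(q, k)) / (qn * kn)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# scale=1 (no multiplier): logits bounded in [-1, 1] -> diffuse weights
diffuse = softmax([qk_norm_logit(q, k, scale=1.0) for k in keys])
# scale=8: same key directions, much sharper attention distribution
sharp = softmax([qk_norm_logit(q, k, scale=8.0) for k in keys])
```

With scale=1 the best-matching key only gets roughly two thirds of the attention mass; with the multiplier it gets nearly all of it, which is the "too diffuse" effect described above.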
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism.
github.com/karpathy/nanoc…
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.
And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
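The loop described above (propose a change, evaluate a cheap proxy metric, keep it only if the metric improves) can be sketched as a greedy search. Everything here is a toy stand-in: `proxy_loss` replaces actually training a small model, and the hyperparameters and their optimum are invented for illustration:

```python
import random

def proxy_loss(cfg):
    # Stand-in for "train a small model, return validation loss".
    # Hypothetical optimum at lr=0.02, wd=0.1 (illustration only).
    return (cfg["lr"] - 0.02) ** 2 + (cfg["wd"] - 0.1) ** 2

def propose(cfg, rng):
    # A real agent would propose edits based on experiment history;
    # here we just scale one hyperparameter by a random factor.
    new = dict(cfg)
    key = rng.choice(list(new))
    new[key] *= rng.choice([0.5, 0.8, 1.25, 2.0])
    return new

def autoresearch(cfg, rounds=200, seed=0):
    rng = random.Random(seed)
    best, best_loss = cfg, proxy_loss(cfg)
    for _ in range(rounds):
        cand = propose(best, rng)
        loss = proxy_loss(cand)
        if loss < best_loss:  # keep only changes that improve the metric
            best, best_loss = cand, loss
    return best, best_loss

best, loss = autoresearch({"lr": 0.1, "wd": 0.01})
```

The point is the shape of the loop, not the search strategy: anything with a cheap, reliable evaluation signal can sit where `proxy_loss` does, and the swarm version parallelizes `propose` across agents while sharing the experiment history.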

TK Stohlman retweeted

Someone took their Cybertruck into Lake Grapevine in Texas.
The Cybertruck can drive in water up to 31 inches deep using Wade Mode, which relies on a technology Tesla calls the scuba pack: the truck's built-in air suspension is used to pressurize the battery.
However, this video looks deeper than 31” lol




