Garry Tan

73.4K posts

@garrytan

President & CEO @ycombinator —Founder @garryslist—Creator of GStack & GBrain—designer/engineer who helps founders—SF Dem accelerating the boom loop

San Francisco, CA · Joined January 2008
5.8K Following · 832K Followers
Pinned Tweet
Garry Tan
Garry Tan@garrytan·
Tech gave me everything I have. Its capacity to lift people into abundance is incredible and there is nothing like it. We must make that into prosperity for everyone.
Bloomberg Technology@technology

"I realized tech is this thing that can bring people out of whatever situation they're in and often into prosperity. And that's what I want for everyone." @ycombinator’s @garrytan tells @emilychangtv how tech changed his family's life. Watch here: trib.al/sxg1VGR

English
897
814
6.3K
4.3M
Garry Tan
Garry Tan@garrytan·
By evals I mean literally tell the agent: given what we discussed about what we are doing and why and what happened, use three different frontier models to look at inputs and outputs of your skill file calling the code, and rate it on effectiveness. Why isn’t it a 10? How could it be made to be so? Run this a few times and you will be surprised how fast it gets astonishingly better. And since it is in a skill file plus code with evals (LLM as judge) and unit tests, it stays better forever.
English
1
1
23
2.8K
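The multi-judge eval described in the tweet above could be sketched roughly as follows. This is a minimal illustration, not the author's actual setup: the three judges are hypothetical stub functions standing in for calls to different frontier models, and the names `make_stub_judge` and `evaluate` are assumptions made for this example.

```python
from statistics import mean
from typing import Callable, Dict, List

# A judge takes (input, output) and returns a 1-10 score plus a critique.
# In practice each judge would wrap a call to a different frontier model;
# here they are hypothetical stand-ins so the sketch runs offline.
Judge = Callable[[str, str], Dict[str, object]]


def make_stub_judge(name: str, score: int, critique: str) -> Judge:
    def judge(task_input: str, task_output: str) -> Dict[str, object]:
        return {"judge": name, "score": score, "critique": critique}
    return judge


def evaluate(task_input: str, task_output: str, judges: List[Judge]) -> Dict[str, object]:
    """Collect every judge's rating and the reasons the output is not a 10."""
    verdicts = [j(task_input, task_output) for j in judges]
    return {
        "mean_score": mean(v["score"] for v in verdicts),
        "min_score": min(v["score"] for v in verdicts),
        # "Why isn't it a 10?": keep only critiques from judges that scored below 10.
        "critiques": [v["critique"] for v in verdicts if v["score"] < 10],
    }


judges = [
    make_stub_judge("model_a", 7, "Misses edge cases in the input."),
    make_stub_judge("model_b", 8, "Output format is inconsistent."),
    make_stub_judge("model_c", 10, ""),
]
report = evaluate("raw item", "skill output", judges)
```

The collected critiques are what would be fed back to the agent before the next round; rerunning `evaluate` after each fix gives the repeated loop the tweet describes.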
Garry Tan
Garry Tan@garrytan·
Funny how simple using OpenClaw and Hermes Agent is these days. Just have it do stuff. Then improve in progressive batches with evals from multiple frontier models. It self-improves!
Garry Tan@garrytan

Right now I just use my personal AI and our company brain and it screws up and I tell it to fix it and write tests for it. Also I do cross modal evals on progressive batches (eg if there are 10000 items do 5 and eval the input and output and skill, then keep doubling the batch size as you go)

English
16
2
84
10.4K
Garry Tan retweeted
arman
arman@ksw_arman·
it's crazy how @greptile has had such a noticeable improvement in the last few months. i've never seen an agent at that scale improve drastically so fast
English
1
3
38
6.2K
Garry Tan
Garry Tan@garrytan·
@aidenybai Sounds like a “giving a shit” problem really
English
1
0
4
642
Aiden Bai
Aiden Bai@aidenybai·
this is mostly a guardrails problem:
- teams can't keep up with code review
- existing testing is mostly "fake"
- the good ICs care, most don't give a shit.
tokens amplify this problem
Hedgie@HedgieMarkets

🦔Uber's COO Andrew Macdonald said on Saturday that the company is having a harder time justifying its AI spend. After CTO Praveen Neppalli Naga went viral in April for admitting Uber burned through its 2026 Claude Code budget in four months, senior engineering leaders concluded higher token usage was not translating into proportionally more useful product. Macdonald said the link between AI consumption and shipped features is "not there yet." CEO Dara Khosrowshahi confirmed on the earnings call that Uber is slowing hiring to fund its AI spend. Duolingo also walked back its decision to include AI usage in performance reviews last month.
My Take
Uber is the first major enterprise where the C-suite has publicly admitted, on the record, that the AI productivity story is not closing for them. That matters because Uber is not a skeptic. The company went all-in on AI tooling, set internal targets, and burned through its annual research and development budget in four months trying to make it work. The conclusion from the people running the experiment is that tokens consumed and value shipped are not the same number, and management is finally noticing.
Duolingo's reversal lands in the same week for a reason. CEO Luis von Ahn said employees were asking whether they needed to use AI just to use AI, which is Goodhart's Law showing up in a performance review system. When usage becomes the metric, employees optimize for usage, not output.
Microsoft canceled internal Claude Code licenses, Google AI Pro stripped credits from paid subscribers, and now Uber is admitting the ROI does not close at scale. The narrative has shifted in the last 30 days from "AI productivity is here" to "AI productivity is harder to measure than we thought." The companies pushing tokenmaxxing internally are now the same companies signaling cost pressure externally.
The IPO calendar for OpenAI and Anthropic is going to get a lot more complicated if the largest enterprise customers keep saying this out loud.
Hedgie🤗

English
20
5
119
24.1K
Garry Tan
Garry Tan@garrytan·
@karrisaarinen Use AI effectively to create new products and services that didn’t exist before that customers love
English
4
0
14
861
Karri Saarinen
Karri Saarinen@karrisaarinen·
@garrytan True but how do you solve the demand side? Selling more to old or new customers?
English
2
0
2
1.8K
Karri Saarinen
Karri Saarinen@karrisaarinen·
We keep hearing about 10x or 100x productivity gains in engineering and knowledge work. But outside the model labs, I haven’t seen the corresponding 10-100x revenue growth across the market or increase in quality. So where is the productivity going?
English
125
18
598
49.4K
Garry Tan
Garry Tan@garrytan·
This sounds complicated, but the agents can implement this in OpenClaw/Hermes Agent trivially (use skillify from GBrain with a link to this tweet). Sounds ridiculous, but you should try it.
Muratcan Koylan@koylanai

Gradient descent for SKILL.md files sounds interesting, maybe a bit complex but it's becoming a real part of agent harness. SkillOpt is one of the first papers to treat markdown skill files as trainable parameters and provides a proper optimization framework for them. A few things I learned that you should consider too.
1. The validation gate is the only thing that matters in a self-editing loop. Held-out set, strict improvement, ties rejected. End-to-end, their best skills land with 1 to 4 accepted edits total. If your "self-improving agent" is accepting most of what it proposes, you're shipping slop.
2. Bounded edits are better than full rewrites. 4 to 8 edits per step is the sweet spot. Remove the budget and performance collapses. This is the textual analog of learning rate, and it transfers to any LLM-as-author loop. If you're using an agent to refactor your docs, your prompts, or your skills, cap the diff size.
3. Compactness wins. Median final skill: ~920 tokens. Skills do not need to be long. They need to be high-signal. Most skill files I see are bloated because length feels like effort. It isn't.
4. The harness is becoming less important; the skill is becoming more important. A Codex-trained skill ported into Claude Code hit +59.7 points on SpreadsheetBench. Procedural knowledge is more general than the runtime that produced it.
5. Frozen model + trained context is the practical adaptation. GPT-5.4-nano with a SkillOpt'd skill ≈ frontier behavior on procedural benchmarks. Cheaper, portable, inspectable, zero inference-time cost. This is the answer to "how do we adapt a frontier model for our domain" for almost everyone who isn't training their own models.
6. Verification is the bottleneck. Every gate in this paper depends on an auto-grader. That works for benchmarks. It fails for writing, design, and strategy, exactly the open-ended work we want to automate. Whoever builds the verifier for open-ended tasks owns the next stage.
There are also two lessons I learned while shipping v2.3.0 of my Context Engineering Agent Skills repo, measured across composer-2, claude-opus-4-7, gpt-5.5, and gemini-3.1-pro via the @cursor_ai SDK:
- Description and body are two different surfaces. The router only sees the description. The agent sees the body once activated. They can quietly disagree, and only end-to-end task tests catch it.
- Aggregate accuracy is the wrong unit. When I rewrote three descriptions, the corpus average moved ~1pp. Individual skills moved 23–25pp. Per-skill effect size is where the action is.
Also, in Feb 2026 I shared a piece called Personal Brain OS arguing that the markdown file is a first-class substrate for agent state. SkillOpt is the optimizer-shaped version of that same argument: not "store memory in files" but "treat files as trainable parameters with proper optimization machinery around them." That's the move from static to measured.
The fast/slow split they describe already lives implicitly in the digital-brain-skill repo:
- voice-guide and tone-of-voice.md are slow-state (rarely touched)
- posts.jsonl and bookmarks.jsonl are fast-state
What SkillOpt adds that I didn't have is a protected section invariant, a structural guarantee that fast edits cannot overwrite slow lessons. Removing that mechanism cost them 22 points on SpreadsheetBench. Worth borrowing.
If you're building agents, SkillOpt: Executive Strategy for Self-Evolving Agent Skills is a good paper to read: arxiv.org/pdf/2605.23904

English
7
9
96
12.1K
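The validation gate from point 1 of the quoted thread (held-out set, strict improvement, ties rejected) might look something like the sketch below. It is an illustration under stated assumptions, not the paper's implementation: `gated_update` is a hypothetical name, and the held-out scores are made-up numbers in a lookup table standing in for a real auto-grader.

```python
from typing import Callable, List, Tuple


def gated_update(
    current: str,
    candidates: List[str],
    score: Callable[[str], float],
) -> Tuple[str, int]:
    """Accept a candidate skill only if it strictly beats the current best
    on the held-out set; ties are rejected. Returns (skill, accepted_count)."""
    best, best_score, accepted = current, score(current), 0
    for candidate in candidates:
        s = score(candidate)
        if s > best_score:  # strict inequality, so a tie never replaces the incumbent
            best, best_score, accepted = candidate, s, accepted + 1
    return best, accepted


# Hypothetical held-out scores for four versions of a skill file.
held_out = {"v0": 0.60, "v1": 0.60, "v2": 0.72, "v3": 0.70}
skill, accepted = gated_update("v0", ["v1", "v2", "v3"], held_out.__getitem__)
```

Here `v1` ties and is rejected, `v2` is accepted, and `v3` is rejected for scoring below `v2`, so only one of three proposals gets through, in line with the thread's warning about loops that accept most of what they propose.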
Garry Tan
Garry Tan@garrytan·
@karrisaarinen I’m sorry to say it requires skills that few people even possess because it is all so new
English
8
3
79
4K
Garry Tan
Garry Tan@garrytan·
Right now I just use my personal AI and our company brain and it screws up and I tell it to fix it and write tests for it. Also I do cross modal evals on progressive batches (eg if there are 10000 items do 5 and eval the input and output and skill, then keep doubling the batch size as you go)
English
0
0
6
10.9K
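The progressive-batch schedule in the tweet above (start with 5 of 10,000 items, eval, then keep doubling) can be sketched as below. This is a minimal sketch under assumptions: `progressive_batches` and `run_with_evals` are hypothetical names, and `process` and `passes_eval` are placeholders for the skill being run and whatever eval decides a batch is good enough.

```python
from typing import Callable, Iterator, Sequence


def progressive_batches(items: Sequence, first: int = 5) -> Iterator[Sequence]:
    """Yield consecutive slices whose sizes double (5, 10, 20, ...).
    The last slice holds whatever remains, so it may be smaller."""
    start, size = 0, first
    while start < len(items):
        yield items[start:start + size]
        start += size
        size *= 2


def run_with_evals(
    items: Sequence,
    process: Callable,
    passes_eval: Callable[[Sequence, list], bool],
    first: int = 5,
) -> int:
    """Process batches in order; stop at the first batch that fails its eval.
    Returns the number of items in batches that passed."""
    done = 0
    for batch in progressive_batches(items, first):
        outputs = [process(x) for x in batch]
        if not passes_eval(batch, outputs):
            return done  # stop here so the skill can be fixed before scaling up
        done += len(batch)
    return done


sizes = [len(b) for b in progressive_batches(range(10_000))]
```

For 10,000 items this gives eleven batches: ten doubling from 5 to 2,560, then a final batch of the remaining 4,885. A mistake caught in the first few batches costs a handful of items, not thousands.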
Alex Hovansky
Alex Hovansky@Alex_TGH·
@garrytan this sounds like you're building an actual brain instead of just a bot script. curious how the feedback loop looks in practice
English
1
0
1
425
Bhavyam Arora (Content Arc)
Bhavyam Arora (Content Arc)@AroraBhavyam·
Applications for both @ycombinator and A16Z @Speedrun are closed now completely! But if you are a founder who wants to raise funding for your #startup, here's a list of the best pre-seed / seed funds that are investing actively:
@204BVC $82M, deep tech/bio, pre-seed/seed
@Afore $185M, generalist, pre-seed specialist
@AntiFund $30M, AI + defense, $100K-$500K first check
"Follow @AroraBhavyam if you found this valuable 🫡"
@basecasecapital ~$99M, enterprise infra, solo GP
@haunventures $1B, crypto + AI agents x finance
@HaystackVC $85M, generalist software, pre-seed/seed
@HummingbirdVC $800M, outlier founders globally
@MantisVC $100M, cyber + B2B multi-sector
@MischiefVC $80M, generalist software, $1M-$4M
@ModernTechnical $22M, software infra, solo GP
@PrecursorVC $66M, generalist tech, $100K-$500K
@SevenStarsVC $40M, AI applications, pre-seed
@StrikerVenture $165M, AI + cyber + life sciences
@ZeroShotFund $100M target, post-AGI builders
Give feedback if I missed any major ones 👇
Bhavyam Arora (Content Arc)@AroraBhavyam

YC deadline has been extended until this weekend. If you are a founder who missed it before, now is the time... Also, you now get $2M worth of OpenAI tokens if you're selected! (screenshot of @agupta's tweet, gp at @ycombinator)

English
10
4
195
25.6K
Anshu Sharma 🌶
Anshu Sharma 🌶@anshublog·
@levie @random_walker New rule: any ceo who claims work can be fully done by ai needs to immediately let go of their executive assistant. Oh so you’re telling me it can do the job of a software engineer that builds schedulers but not that of a scheduler?
English
5
25
364
21.8K
Garry Tan retweeted
Aaron Levie
Aaron Levie@levie·
CEOs are uniquely prone to AI psychosis because they’re sufficiently distant from the last mile of work that still has to happen to generate most value with AI. So when they play with AI, they see the happy path results, often not considering the next 10 or 20 things that have to happen to get sustainable results from agents. “Look I made this awesome product prototype”. Yes but you didn’t have to review the code before it went into production and fix a bunch of issues. “Look I generated a contract”. Yes but you didn’t verify all the terms before it goes out to the counterparty and didn’t have to wire up all the past contracts to work with. The best thing you can do as a CEO is to use AI a *ton* to figure out the real implications of agents in the enterprise, and come out the other side with an appreciation for both the upside and the real work that goes into them.
Michal Malewicz@michalmalewicz

CEOs are the most delusional about AI. Detached from reality.

English
272
664
6.3K
1M
Alex MacCaw
Alex MacCaw@maccaw·
@levie If anything, CEOs aren’t AI-pilled enough. As a former manager/CEO turned IC again, my experience is that AI can do everything I throw at it, and more.
English
1
0
5
1.6K
Garry Tan retweeted
Kathryn Wu
Kathryn Wu@kathrynwu1·
I think one reason YC likes logical engineers is not just because they can code. A lot of them are unusually clear communicators. Coding trains you to think in strict logical sequences: input → output, cause → effect, constraint → solution. You can hear it immediately in good founders. Not necessarily charismatic, but coherent. People underestimate how much startup momentum comes from simply being easy to understand.
English
10
2
72
6.3K
Garry Tan
Garry Tan@garrytan·
@tszzl Will fight against this until my dying days
English
12
4
107
8.1K
Garry Tan
Garry Tan@garrytan·
@SplinteredEsq Markdown system of record. GBrain uses pgvector and Postgres. I’m on a Supabase XL instance now
English
0
0
0
36
Chris Baker
Chris Baker@SplinteredEsq·
@garrytan how do you store them? Like if the models are on a VPS, what's the best storage method for the originals and then the markdown?
English
1
0
0
56
Garry Tan
Garry Tan@garrytan·
GBrain just got a big update: graph generation is now much more automated and powerful. My knowledge wiki is now pushing 300k markdown files across multiple federated company brains.
English
39
15
359
28.6K
elvis
elvis@omarsar0·
New research from Microsoft Research. I see a lot of AI engineers handwriting agent skill docs and hoping they generalize. Probably not optimal. This work shows why. It treats the skill doc as a trainable external state of a frozen agent instead. It introduces SkillOpt, where an optimizer model makes validation-gated edits to the skill file. It adds, deletes, or replaces instructions, with a textual learning rate that controls how aggressively each round rewrites the doc. The agent itself never changes. SkillOpt is best or tied on all 52 (model, benchmark, harness) cells. On GPT-5.5 it adds 23.5 points in direct chat, 24.8 with Codex, and 19.1 with Claude Code over no skill. It beats human-written skills, TextGrad, GEPA, and EvoSkill, carries zero extra inference-time cost, and the learned skills transfer across models and harnesses. Paper: arxiv.org/abs/2605.23904 Learn to build effective AI agents in our academy: academy.dair.ai
English
34
123
659
70.6K
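The add/delete/replace edits with a "textual learning rate" mentioned in the tweet above could be modeled as a per-round edit budget. The sketch below is one plausible reading, not the paper's actual mechanism: the `Edit` type, the `apply_edits` function, and the sample instructions are all hypothetical, and the skill doc is simplified to a list of instruction lines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    op: str                     # "add", "delete", or "replace"
    index: int                  # position in the list at the time the edit is applied
    text: Optional[str] = None  # new instruction text (unused for "delete")


def apply_edits(skill: List[str], edits: List[Edit], max_edits: int) -> List[str]:
    """Apply at most `max_edits` edits in order, returning a new list.
    The cap plays the role of a learning rate: a smaller budget means a smaller rewrite."""
    result = list(skill)
    for edit in edits[:max_edits]:
        if edit.op == "add":
            result.insert(edit.index, edit.text)
        elif edit.op == "delete":
            del result[edit.index]
        elif edit.op == "replace":
            result[edit.index] = edit.text
        else:
            raise ValueError(f"unknown op: {edit.op}")
    return result


skill = ["Read the sheet.", "Guess the formula.", "Write the answer."]
proposed = [
    Edit("replace", 1, "Verify the formula on a sample row."),
    Edit("add", 3, "Report cell references."),
    Edit("delete", 0),
]
updated = apply_edits(skill, proposed, max_edits=2)
```

With a budget of 2, the third proposed edit is dropped, so the first instruction survives the round. In a full loop the edited doc would still have to pass a validation gate before replacing the current skill.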