Trevor Gurgick
@TGurgick

1.4K posts

Teaching machines + humans to work better, together 🤝 @Fanatics | AI, Data, Robotics | Product | Startups | @MIT @NSF I-Corps adjunct | Fmr @Amazon @AOL @Yahoo

Boston, MA · Joined May 2011
1K Following · 387 Followers
Pinned Tweet
Trevor Gurgick @TGurgick
@joulee
1. Which parts of the experiences are creating customer friction/pain? (for prioritization, impact)
2. Which actions/inputs are driving target outcomes? (particularly which are causal...)
3. What does a healthy customer lifecycle look like? What is an anomaly?
Trevor Gurgick retweeted
Thariq @trq212
We just released Claude Code channels, which allows you to control your Claude Code session through select MCPs, starting with Telegram and Discord. Use this to message Claude Code directly from your phone.
Trevor Gurgick retweeted
Andrew Curran @AndrewCurran_
Striking image from the new Anthropic labor market impact report.
Trevor Gurgick @TGurgick
I’ve interviewed more LLMs than humans thus far in 2026. Maybe AI really is taking all the jobs?
Trevor Gurgick @TGurgick
@realmadhuguru Feel this, but at the same time, instead of worrying about being behind, we should focus on what problems we can solve with this newfound power.
Trevor Gurgick retweeted
Madhu Guru @realmadhuguru
Same feeling as a product person. For years, my constraint was execution capacity - lots of fun ideas, but building each required assembling eng and design teams. Hard and unscalable.

With AI, I can now 'hire' a full team instantly. Massive scale unlocked...in theory. Because now there’s a new constraint: mastering how to orchestrate AI teams…the tools, workflows and product craft in this new world.

Feels like there is no bigger lever than learning these skills. Channeling all my spare time and energy here.
Andrej Karpathy @karpathy

I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and in between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year, and a failure to claim the boost feels decidedly like a skill issue.

There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering.

Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession. Roll up your sleeves to not fall behind.

Trevor Gurgick retweeted
Andrew Ng @AndrewYNg
Really proud of the DeepLearningAI team. When Cloudflare went down, our engineers used AI coding to quickly implement a clone of basic Cloudflare capabilities to run our site on. So we came back up long before even the major websites did!
Trevor Gurgick @TGurgick
Showing has always been more powerful than telling. Data and goals are still important (“why?”), but condensing the cycle down with a rapid prototype is way better than weeks of word edits.
Madhu Guru @realmadhuguru

At @Google, we are moving from a writing‑first culture to a building‑first one. Writing was a proxy for clear thinking, optimized for scarce eng resources and long dev cycles - you had to get it right before you built. Now, when time to vibe-code prototype ≈ time to write PRD, PMs can SHOW not tell. Role profiles are blurring, creativity and building are happening in parallel.

Trevor Gurgick @TGurgick
Irony of working in AI… my natural use of hyphens and dashes is no longer ‘human enough’
Trevor Gurgick @TGurgick
@venturetwins @a16z Love this. Having a newborn and tracking data for her manually, there seems to be open space to better streamline how we integrate growth and development details to get more out of AI during the journey. Currently using 6 different apps to do that.
Justine Moore @venturetwins
🚨 New @a16z investment thesis: AI x parenting. Raising kids is one of our most challenging and important jobs. Every parent needs support sometimes - but many can't access it because it's too expensive or inaccessible. AI changes this. What we're seeing 👇
Trevor Gurgick retweeted
Alex Albert @alexalbert__
Computer use is the first step toward a completely new form of human-computer interaction. In just a few years, the way we interface with computers will be completely different from today. Let me explain:
Trevor Gurgick @TGurgick
Why is every company shifting towards ‘Agentic’ AI products? TAM
Scott Brinker @chiefmartec

Excellent article by Sonya Huang and Pat Grady of @Sequoia, "The Agentic Reasoning Era Begins", and the $10 trillion opportunity with service-as-a-software: sequoiacap.com/article/genera… "Thanks to agentic reasoning, the AI transition is service-as-a-software. Software companies turn labor into software. That means the addressable market is not the software market, but the services market measured in the trillions of dollars."

Trevor Gurgick retweeted
Allie K. Miller @alliekmiller
ChatGPT isn’t slowing down. They just released a new feature called “Canvas” so that more work doesn’t just get assisted by ChatGPT, it gets done. Check it out.
Trevor Gurgick @TGurgick
@thiagocaserta @deedydas Definitely better when you're starting with a blank slate, or you'll need to give a lot of context. That said, I'm optimistic this is just growing pains and the meta-reasoning / architecture learning will come soon (or with tuning).
Thiago Caserta @thiagocaserta
Today, after using almost all of them (Copilot, v0, Cursor, etc.), I can tell that they still need to improve considerably. I’d say that less than 20% of my code is AI generated, just because they can’t see the bigger picture, so they are good just for troubleshooting and small feature implementations, imho.
Deedy @deedydas
Head of deploying AI at a big global bank: “30-40% of GitHub Copilot code suggestions were accepted by junior engineers. Contrary to what most of my counterparts say, even the most distinguished engineers working on niche software were at ~30%”
Trevor Gurgick @TGurgick
One of the best TTS products created. Whole startups spent years on this technology for lesser results… definitely worth playing with.
Andrej Karpathy @karpathy

NotebookLM is quite powerful and worth playing with notebooklm.google It is a bit of a re-imagination of the UIUX of working with LLMs, organized around a collection of sources you upload and then refer to with queries, seeing results alongside and with citations.

But the current most new/impressive feature (that is surprisingly hidden almost as an afterthought) is the ability to generate a 2-person podcast episode based on any content you upload. For example someone took my "bitcoin from scratch" post from a long time ago: karpathy.github.io/2021/06/21/blo… and converted it to podcast, quite impressive: notebooklm.google.com/notebook/ba017…

You can podcastify *anything*. I give it train_gpt2.c (C code that trains GPT-2): github.com/karpathy/llm.c… and made a podcast about that: notebooklm.google.com/notebook/2585c… I don't know if I'd exactly agree with the framing of the conversation and the emphasis or the descriptions of layernorm and matmul etc but there's hints of greatness here and in any case it's highly entertaining.

Imo LLM capability (IQ, but also memory (context length), multimodal, etc.) is getting way ahead of the UIUX of packaging it into products. Think Code Interpreter, Claude Artifacts, Cursor/Replit, NotebookLM, etc. I expect (and look forward to) a lot more and different paradigms of interaction than just chat.

That's what I think is ultimately so compelling about the 2-person podcast format as a UIUX exploration. It lifts two major "barriers to enjoyment" of LLMs.
1. Chat is hard. You don't know what to say or ask. In the 2-person podcast format, the question asking is also delegated to an AI so you get a lot more chill experience instead of being a synchronous constraint in the generating process.
2. Reading is hard and it's much easier to just lean back and listen.

Supermaven @SupermavenAI
We've raised $12 million from Bessemer Venture Partners to build an AI-focused text editor that integrates tightly with our models.
Trevor Gurgick @TGurgick
Great overview of o1 / Strawberry 👇
Jim Fan @DrJimFan

OpenAI Strawberry (o1) is out! We are finally seeing the paradigm of inference-time scaling popularized and deployed in production. As Sutton said in the Bitter Lesson, there're only 2 techniques that scale indefinitely with compute: learning & search. It's time to shift focus to the latter.

1. You don't need a huge model to perform reasoning. Lots of parameters are dedicated to memorizing facts, in order to perform well in benchmarks like trivia QA. It is possible to factor out reasoning from knowledge, i.e. a small "reasoning core" that knows how to call tools like browser and code verifier. Pre-training compute may be decreased.

2. A huge amount of compute is shifted to serving inference instead of pre/post-training. LLMs are text-based simulators. By rolling out many possible strategies and scenarios in the simulator, the model will eventually converge to good solutions. The process is a well-studied problem like AlphaGo's monte carlo tree search (MCTS).

3. OpenAI must have figured out the inference scaling law a long time ago, which academia is just recently discovering. Two papers came out on Arxiv a week apart last month:
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. Brown et al. finds that DeepSeek-Coder increases from 15.9% with one sample to 56% with 250 samples on SWE-Bench, beating Sonnet-3.5.
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. Snell et al. finds that PaLM 2-S beats a 14x larger model on MATH with test-time search.

4. Productionizing o1 is much harder than nailing the academic benchmarks. For reasoning problems in the wild, how to decide when to stop searching? What's the reward function? Success criterion? When to call tools like code interpreter in the loop? How to factor in the compute cost of those CPU processes? Their research post didn't share much.

5. Strawberry easily becomes a data flywheel. If the answer is correct, the entire search trace becomes a mini dataset of training examples, which contain both positive and negative rewards. This in turn improves the reasoning core for future versions of GPT, similar to how AlphaGo’s value network — used to evaluate quality of each board position — improves as MCTS generates more and more refined training data.

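The repeated-sampling idea behind points 2-3 can be sketched in a few lines. This is a toy illustration, not OpenAI's method: the "model" is a stand-in random guesser and the "verifier" is a stand-in numeric check (the real analogues would be an LLM rollout and a code/unit-test verifier), but the structure — draw many stochastic candidates, keep one that passes an automatic check — is the same, and it shows why more samples raise the chance of a verified solution.

```python
import random

def sample_candidate(rng):
    # Stand-in for one stochastic model rollout:
    # here, a blind guess at sqrt(2) in [1, 2].
    return rng.uniform(1.0, 2.0)

def verify(candidate):
    # Stand-in for an automatic verifier (unit tests, proof checker, ...):
    # accept candidates whose square is within 1e-3 of 2.
    return abs(candidate * candidate - 2.0) < 1e-3

def best_of_n(n, seed=0):
    """Repeated sampling: draw up to n candidates,
    return the first that passes the verifier (else None)."""
    rng = random.Random(seed)
    for _ in range(n):
        candidate = sample_candidate(rng)
        if verify(candidate):
            return candidate
    return None

# A single sample almost never passes; many samples almost always do.
one_shot = best_of_n(1)
many_shot = best_of_n(100_000)
```

The design point is that compute moves from the sampler (which stays cheap and dumb) to the search loop, and it only works when verification is much cheaper and more reliable than generation — which is exactly the open question Jim Fan raises in point 4 for problems "in the wild" with no crisp verifier.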