Phil Glazer

248 posts

Phil Glazer

@phil_glazer

San Francisco, CA Katılım Temmuz 2016

295 Takip Edilen940 Takipçiler

Phil Glazer@phil_glazer·24 Şub

@ch402 "This autocomplete AI can even write stories about helpful AI assistants. And according to our theory, that’s “Claude”—a character in an AI-generated story about an AI helping a human." would claude say it is living in a simulation?

English

343

Chris Olah@ch402·24 Şub

I'm increasingly taking pretty strong versions of this view seriously.

Anthropic@AnthropicAI

AI assistants like Claude can seem shockingly human—expressing joy or distress, and using anthropomorphic language to describe themselves. Why? In a new post we describe a theory that explains why AIs act like humans: the persona selection model. anthropic.com/research/perso…

English

900

223.8K

Phil Glazer@phil_glazer·7 Şub

@The_Whole_Daisy @bcherny great work!

English

Daisy Hollman@The_Whole_Daisy·7 Şub

It's been a wild ride for my second major Claude Code feature I've had the privilege to lead. I'm proud of the whole team that made this a reality, and grateful to @bcherny for giving me the creative freedom to explore this space. Looking forward to seeing what you build with it!

Claude@claudeai

On Claude Code, we’re introducing agent teams. Spin up multiple agents that coordinate autonomously and work in parallel—best for tasks that can be split up and tackled independently. Agent teams are in research preview: code.claude.com/docs/en/agent-…

English

217

35.4K

Phil Glazer@phil_glazer·6 Şub

taken longer than expected, but latest version of computer use is finally there still a bit slow, but works great and can be relied on in background async to take on complicated tasks with reliably strong results anthropic has done it again

Phil Glazer@phil_glazer

Anthropic's computer use running locally - slow but good, can see in 6mo-12mo being great

English

Phil Glazer@phil_glazer·6 Şub

Imo, this is a legitimate attack vector and the right decision It's still perfectly possible to model switch on a task by loading the message history into the initial message to a separate model instead of constructing a turn-based back and forth that gets passed in As safety work continues maybe the models will be more resilient to these types of attacks and this capability can be returned

dax@thdxr

are we misunderstanding this? the implication is you can't insert any content that anthropic didn't know to have generated this breaks things like switching models mid session and a dozen other things harnesses rely on i switch between claude and gpt all the time :(

English

Phil Glazer@phil_glazer·15 Oca

@ayushjaiswal when full grok 4.1 thinking in api?

English

Ayush Jaiswal@ayushjaiswal·15 Oca

Let grok do all the research & tell you the truth.

English

3.4K

Phil Glazer@phil_glazer·14 Oca

@rahulgs does gpt-4 in opencode work well? intuitively feels like it lacked the juice to be effective but maybe it really was there all along

English

402

rahul@rahulgs·14 Oca

function calling came out on June 13, 2023 in the OpenAI API with the original gpt-4 API, you literally could have built Claude code, Cowork, and Manus the last two years of progress isn’t necessary applications are finally catching up, it always surprises how long things take to diffuse

English

188

27.5K

Phil Glazer@phil_glazer·13 Oca

tools creation evals are more interesting than tool use evals as agents take on longer running tasks the ability to self reflect and create effective scaffolding is powerful

English

Phil Glazer@phil_glazer·30 Ara

just can't be bullish enough on ramp - such a great product and culture, so many companies falling behind on eng but they continue to get it and stay on bleeding edge despite reaching larger scale

rahul@rahulgs

yes things are changing fast, but also I see companies (even faang) way behind the frontier for no reason. you are guaranteed to lose if you fall behind. the no unforced-errors ai leader playbook: For your team: - use coding agents. give all engineers their pick of harnesses, models, background agents: Claude code, Cursor, Devin, with closed/open models. Hearing Meta engineers are forced to use Llama 4. Opus 4.5 is the baseline now. - give your agents tools to ALL dev tooling: Linear, GitHub, Datadog, Sentry, any Internal tooling. If agents are being held back because of lack of context that’s your fault. - invest in your codebase specific agent docs. stop saying “doesn’t do X well”. If that’s an issue, try better prompting, agents.md, linting, and code rules. Tell it how you want things. Every manual edit you make is an opportunity for agent.md improvement - invest in robust background agent infra - get a full development stack working on VM/sandboxes. yes it’s hard to set up but it will be worth it, your engineers can run multiple in parallel. Code review will be the bottleneck soon. - figure out security issues. stop being risk averse and do what is needed to unblock access to tools. in your product: - always use the latest generation models in your features (move things off of last gen models asap, unless robust evals indicate otherwise). Requires changes every 1-2 weeks - eg: GitHub copilot mobile still offers code review with gpt 4.1 and Sonnet 3.5 @jaredpalmer. You are leaving money on the table by being on Sonnet 4, or gpt 4o - Use embedding semantic search instead of fuzzy search. Any general embedding model will do better than Levenshtein / fuzzy heuristics. - leave no form unfilled. use structured outputs and whatever context you have on the user to do a best-effort pre-fill - allow unstructured inputs on all product surfaces - must accept freeform text and documents. Forms are dead. - custom finetuning is dead. Stop wasting time on it. Frontier is moving too fast to invest 8 weeks into finetuning. Costs are dropping too quickly for price to matter. Better prompting will take you very far and this will only become more true as instruction following improves - build evals to make quick model-upgrade decisions. they don’t need to be perfect but at least need to allow you to compare models relative to each other. most decisions become clear on a Pareto cost vs benchmark perf plot - encourage all engineers to build with ai: build primitives to call models from all code bases / models: structured output, semantic similarity endpoints, sandbox code execution. etc What else am I missing?

English

291

Phil Glazer@phil_glazer·30 Ara

@john_ssuh @karpathy @dwarkesh_sp disconnect in tone does seem wild

English

John Suh@john_ssuh·30 Ara

It seems like when @karpathy did the interview with @dwarkesh_sp, he had only used Cursor and wasn’t fully aware of the state of the art of agents. I predict a new podcast in the next 6 months where he drastically shortens his timelines and expands the impact of agents as-is

English

666

Phil Glazer@phil_glazer·30 Ara

this is a great acq for meta

Manus@ManusAI

Manus is entering the next chapter: we’re joining forces with Meta to take general agents to the next level. Full story on our blog: manus.im/blog/manus-joi…

English

149

Phil Glazer@phil_glazer·24 Ara

@BucknSF x.com/phil_glazer/st… prob less than 2yrs, this is like <1wk side project added web search earlier today + could add anything with an api clip doesn't do justice on larger models, does pretty decent work feels like 6mo away from strong on most modeling use cases

Phil Glazer@phil_glazer

fun exploration from the past couple of days, an excel add in that can both: - passively observe your work and suggest edits, like tab complete in an IDE - create plans and build models

English

Buck@BucknSF·22 Ara

In <2 yrs you will be able to pull data and build a GOOD model in minutes. Has a lot of implications for finance. Boutiques with small teams become even more competitive. Retail investor sophistication will increase. Fun to think about.

andrew pignanelli@ndrewpignanelli

Code was the killer app for AI cause it verifies really well. Linting, unit tests and instant feedback for frontend make it so outputs are verified in real time. If you want to think about what gets automated by AI next look at the verification mechanisms. We’re seeing this start in fields like math and biology which have excellent verification systems. Art went quickly because you can instantly tell if it’s good enough. All of this is still maturing but the rate at which the industry matures is directly related to how fast the verification loop is. Medicine and law will have a p99 problem and take forever to diffuse. What’s next? Probably accounting and finance since they’re super easy to verify.

English

213

74K

Phil Glazer@phil_glazer·20 Ara

fun exploration from the past couple of days, an excel add in that can both: - passively observe your work and suggest edits, like tab complete in an IDE - create plans and build models

English

237

Phil Glazer@phil_glazer·20 Ara

@modestproposal1 with the various params available via the api (reasoning level, token allocation, web search, etc) i think approximating this is definitely possible it's possible to take X min as a target and work backwards on params to have it take about that long

English

modest proposal@modestproposal1·19 Ara

would be cool if there's a way over time the models could tell you ETA to output ahead of time. after you type a query but before sending it, you toggle between models to see the difference between eg 5.2 Pro vs Extended Thinking, and decide if the extra 30 mins are worth it.

English

7.1K

Phil Glazer@phil_glazer·19 Ara

will start posting about a handful of custom tools + MCPs I've built and use day-to-day and also various half-baked product explorations (probably some open source) to start, just a pleasant image from image gen MCP

English

113

Phil Glazer@phil_glazer·18 Ara

x.com/OpenAIDevs/sta…

OpenAI Developers@OpenAIDevs

📣Calling all app developers! Starting today, you can submit your ChatGPT app for review. Approved apps will be listed in the app directory, a new surface for users to search for apps directly in ChatGPT. openai.com/index/develope…

ZXX

Phil Glazer@phil_glazer·6 Eki

now available via x.com/OpenAIDevs/sta…

OpenAI Developers@OpenAIDevs

You can start building and testing apps in ChatGPT with the Apps SDK preview, which we're releasing today as an open standard built on MCP. Later this year, we’ll begin accepting app submissions for publication. developers.openai.com/apps-sdk

English

202

Phil Glazer@phil_glazer·26 Haz

GPTs within ChatGPT that are just additional hidden information and prompts aren't interesting (the base model and web search can handle this) GPTs within ChatGPT that expose new tools and capabilities will be interesting ChatGPT via GPTs will become a distribution channel itself to ship to - a layer in front of almost everything else This hasn't happened yet because remote MCPs are only available on Pro tier with Deep Research toggled on

English

361

Phil Glazer@phil_glazer·9 Ara

@danlovesproofs i like this view

English

Dan Robinson@danlovesproofs·9 Ara

In 2025 the limiting reagent in building software is human attention. Compute is cheap, information entering a human brain is expensive. Devtools should optimize accordingly.

English

3.4K

Dan Robinson@danlovesproofs·9 Ara

We built a bug finder. We're finding serious, "let's fix that right now" issues in every codebase we run it on. Introducing Detail!

English

360

112.6K

Phil Glazer@phil_glazer·26 Kas

on the things that really matter, the distance between anthropic's models and competing models is actually growing, not shrinking they might be pulling away

English

106

Phil Glazer@phil_glazer·26 Kas

@jerhadf Truly a great model - if I had to reach for critique, occasionally exhibits “overeager” behavior present in sonnet 3.7 being a bit too agentic/ambitious/confident

English

2.9K

jeremy@jerhadf·26 Kas

not hearing enough opus 4.5 criticism! let's hear it!

jeremy@jerhadf

what do people think about Opus 4.5 for coding so far? what are the behavioral problems or limitations you still want to see improved? we're hungry for feedback 🙏

English

195

339

110.7K

Phil Glazer@phil_glazer·25 Kas

@jerhadf anthropic has long underhyped+overdelivered and it is much appreciated

English

513

jeremy@jerhadf·24 Kas

one thing about anthropic...we don't hype a lot. perhaps i will even go out on a limb to say that we may, in fact, be underhyping in the case of opus

Zvi Mowshowitz@TheZvi

They're burying a lot here. There's a 66% price cut from Opus 4.1 to $5/$25, it uses fewer tokens to solve problems, upgrades to Claude Code in the app, no more length limits on conversations, no more Opus-specific plan caps...

English

287

35.9K

Phil Glazer@phil_glazer·31 Eki

@nejatian we do a lot of that for opendoor's business 🙂 things looking much better lately vs 2022!

English

931

Kaz Nejatian@nejatian·31 Eki

The trick to basically everything that has a number next to it is to ask "but what do the cohorts look like"?

English

363

45K

Keşfet

@ch402 @The_Whole_Daisy @bcherny @ayushjaiswal @rahulgs @john_ssuh @karpathy @dwarkesh_sp