Phil Glazer

248 posts

Phil Glazer banner
Phil Glazer

Phil Glazer

@phil_glazer

San Francisco, CA Katılım Temmuz 2016
295 Takip Edilen940 Takipçiler
Phil Glazer
Phil Glazer@phil_glazer·
@ch402 "This autocomplete AI can even write stories about helpful AI assistants. And according to our theory, that’s “Claude”—a character in an AI-generated story about an AI helping a human." would claude say it is living in a simulation?
English
0
0
1
343
Daisy Hollman
Daisy Hollman@The_Whole_Daisy·
It's been a wild ride for my second major Claude Code feature I've had the privilege to lead. I'm proud of the whole team that made this a reality, and grateful to @bcherny for giving me the creative freedom to explore this space. Looking forward to seeing what you build with it!
Claude@claudeai

On Claude Code, we’re introducing agent teams. Spin up multiple agents that coordinate autonomously and work in parallel—best for tasks that can be split up and tackled independently. Agent teams are in research preview: code.claude.com/docs/en/agent-…

English
26
3
217
35.4K
Phil Glazer
Phil Glazer@phil_glazer·
Imo, this is a legitimate attack vector and the right decision It's still perfectly possible to model switch on a task by loading the message history into the initial message to a separate model instead of constructing a turn-based back and forth that gets passed in As safety work continues maybe the models will be more resilient to these types of attacks and this capability can be returned
dax@thdxr

are we misunderstanding this? the implication is you can't insert any content that anthropic didn't know to have generated this breaks things like switching models mid session and a dozen other things harnesses rely on i switch between claude and gpt all the time :(

English
0
0
1
52
Ayush Jaiswal
Ayush Jaiswal@ayushjaiswal·
Let grok do all the research & tell you the truth.
Ayush Jaiswal tweet media
English
4
0
43
3.4K
Phil Glazer
Phil Glazer@phil_glazer·
@rahulgs does gpt-4 in opencode work well? intuitively feels like it lacked the juice to be effective but maybe it really was there all along
English
1
0
2
402
rahul
rahul@rahulgs·
function calling came out on June 13, 2023 in the OpenAI API with the original gpt-4 API, you literally could have built Claude code, Cowork, and Manus the last two years of progress isn’t necessary applications are finally catching up, it always surprises how long things take to diffuse
rahul tweet media
English
46
6
188
27.5K
Phil Glazer
Phil Glazer@phil_glazer·
tools creation evals are more interesting than tool use evals as agents take on longer running tasks the ability to self reflect and create effective scaffolding is powerful
English
0
0
2
57
Phil Glazer
Phil Glazer@phil_glazer·
just can't be bullish enough on ramp - such a great product and culture, so many companies falling behind on eng but they continue to get it and stay on bleeding edge despite reaching larger scale
rahul@rahulgs

yes things are changing fast, but also I see companies (even faang) way behind the frontier for no reason. you are guaranteed to lose if you fall behind. the no unforced-errors ai leader playbook: For your team: - use coding agents. give all engineers their pick of harnesses, models, background agents: Claude code, Cursor, Devin, with closed/open models. Hearing Meta engineers are forced to use Llama 4. Opus 4.5 is the baseline now. - give your agents tools to ALL dev tooling: Linear, GitHub, Datadog, Sentry, any Internal tooling. If agents are being held back because of lack of context that’s your fault. - invest in your codebase specific agent docs. stop saying “doesn’t do X well”. If that’s an issue, try better prompting, agents.md, linting, and code rules. Tell it how you want things. Every manual edit you make is an opportunity for agent.md improvement - invest in robust background agent infra - get a full development stack working on VM/sandboxes. yes it’s hard to set up but it will be worth it, your engineers can run multiple in parallel. Code review will be the bottleneck soon. - figure out security issues. stop being risk averse and do what is needed to unblock access to tools. in your product: - always use the latest generation models in your features (move things off of last gen models asap, unless robust evals indicate otherwise). Requires changes every 1-2 weeks - eg: GitHub copilot mobile still offers code review with gpt 4.1 and Sonnet 3.5 @jaredpalmer. You are leaving money on the table by being on Sonnet 4, or gpt 4o - Use embedding semantic search instead of fuzzy search. Any general embedding model will do better than Levenshtein / fuzzy heuristics. - leave no form unfilled. use structured outputs and whatever context you have on the user to do a best-effort pre-fill - allow unstructured inputs on all product surfaces - must accept freeform text and documents. Forms are dead. - custom finetuning is dead. Stop wasting time on it. Frontier is moving too fast to invest 8 weeks into finetuning. Costs are dropping too quickly for price to matter. Better prompting will take you very far and this will only become more true as instruction following improves - build evals to make quick model-upgrade decisions. they don’t need to be perfect but at least need to allow you to compare models relative to each other. most decisions become clear on a Pareto cost vs benchmark perf plot - encourage all engineers to build with ai: build primitives to call models from all code bases / models: structured output, semantic similarity endpoints, sandbox code execution. etc What else am I missing?

English
1
0
4
291
John Suh
John Suh@john_ssuh·
It seems like when @karpathy did the interview with @dwarkesh_sp, he had only used Cursor and wasn’t fully aware of the state of the art of agents. I predict a new podcast in the next 6 months where he drastically shortens his timelines and expands the impact of agents as-is
English
2
1
4
666
Phil Glazer
Phil Glazer@phil_glazer·
@BucknSF x.com/phil_glazer/st… prob less than 2yrs, this is like <1wk side project added web search earlier today + could add anything with an api clip doesn't do justice on larger models, does pretty decent work feels like 6mo away from strong on most modeling use cases
Phil Glazer@phil_glazer

fun exploration from the past couple of days, an excel add in that can both: - passively observe your work and suggest edits, like tab complete in an IDE - create plans and build models

English
0
0
2
62
Phil Glazer
Phil Glazer@phil_glazer·
fun exploration from the past couple of days, an excel add in that can both: - passively observe your work and suggest edits, like tab complete in an IDE - create plans and build models
English
0
0
5
237
Phil Glazer
Phil Glazer@phil_glazer·
@modestproposal1 with the various params available via the api (reasoning level, token allocation, web search, etc) i think approximating this is definitely possible it's possible to take X min as a target and work backwards on params to have it take about that long
English
0
0
1
40
modest proposal
modest proposal@modestproposal1·
would be cool if there's a way over time the models could tell you ETA to output ahead of time. after you type a query but before sending it, you toggle between models to see the difference between eg 5.2 Pro vs Extended Thinking, and decide if the extra 30 mins are worth it.
English
4
0
17
7.1K
Phil Glazer
Phil Glazer@phil_glazer·
will start posting about a handful of custom tools + MCPs I've built and use day-to-day and also various half-baked product explorations (probably some open source) to start, just a pleasant image from image gen MCP
Phil Glazer tweet media
English
0
0
2
113
Phil Glazer
Phil Glazer@phil_glazer·
GPTs within ChatGPT that are just additional hidden information and prompts aren't interesting (the base model and web search can handle this) GPTs within ChatGPT that expose new tools and capabilities will be interesting ChatGPT via GPTs will become a distribution channel itself to ship to - a layer in front of almost everything else This hasn't happened yet because remote MCPs are only available on Pro tier with Deep Research toggled on
English
1
0
5
361
Dan Robinson
Dan Robinson@danlovesproofs·
In 2025 the limiting reagent in building software is human attention. Compute is cheap, information entering a human brain is expensive. Devtools should optimize accordingly.
English
2
3
32
3.4K
Dan Robinson
Dan Robinson@danlovesproofs·
We built a bug finder. We're finding serious, "let's fix that right now" issues in every codebase we run it on. Introducing Detail!
Dan Robinson tweet media
English
28
24
360
112.6K
Phil Glazer
Phil Glazer@phil_glazer·
on the things that really matter, the distance between anthropic's models and competing models is actually growing, not shrinking they might be pulling away
English
0
0
2
106
Phil Glazer
Phil Glazer@phil_glazer·
@jerhadf Truly a great model - if I had to reach for critique, occasionally exhibits “overeager” behavior present in sonnet 3.7 being a bit too agentic/ambitious/confident
English
2
0
17
2.9K
Phil Glazer
Phil Glazer@phil_glazer·
@jerhadf anthropic has long underhyped+overdelivered and it is much appreciated
English
0
0
3
513
Phil Glazer
Phil Glazer@phil_glazer·
@nejatian we do a lot of that for opendoor's business 🙂 things looking much better lately vs 2022!
Phil Glazer tweet media
English
0
2
8
931
Kaz Nejatian
Kaz Nejatian@nejatian·
The trick to basically everything that has a number next to it is to ask "but what do the cohorts look like"?
English
23
18
363
45K