Tadyo
@ucfmkr

204 posts

I test AI tools so you don't have to. 3 real picks/week → The Lever newsletter. No hype. Just tools that move the needle. 🔗 https://t.co/9hOKsmHnKb

Joined May 2024
155 Following · 31 Followers
Tadyo@ucfmkr·
The teams shipping production agents this year are not the ones with the best model access. They are the ones who shrank the workflow until the math stopped fighting them.
0 · 0 · 0 · 0
Tadyo@ucfmkr·
The fix that actually works is fewer steps. Cut the chain in half. Add a deterministic step where you can. Fail loudly when the model is unsure instead of guessing.
1 · 0 · 0 · 0
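The "deterministic step, fail loudly" idea can be sketched in a few lines. Everything below is illustrative: the task, function name, regex, and confidence threshold are my inventions, not anything from the thread — the point is only the shape of a shrunken agent step that validates deterministically and escalates instead of guessing.

```python
import re

CONFIDENCE_FLOOR = 0.8  # hypothetical threshold; tune per workflow

def extract_invoice_total(model_output: str, confidence: float) -> float:
    """One agent step, shrunk: parse a total, validate deterministically,
    and fail loudly below the confidence floor instead of guessing."""
    if confidence < CONFIDENCE_FLOOR:
        # Fail loudly: escalate to a human instead of emitting a guess.
        raise ValueError(f"model unsure (confidence={confidence}); escalating")
    match = re.search(r"TOTAL:\s*\$?(\d+\.\d{2})", model_output)
    if match is None:
        # Deterministic check: the format is either right or the step fails.
        raise ValueError("no parseable total in model output")
    return float(match.group(1))

print(extract_invoice_total("TOTAL: $1234.56", confidence=0.93))  # 1234.56
```

The design choice is that both failure modes raise: a raised exception shows up in a log and a retry queue, while a silent guess shows up in an audit three months later.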
Tadyo@ucfmkr·
An agent with 85% accuracy per step running a 10-step workflow succeeds about 20% of the time. The model is not the problem. The math is the problem.
1 · 0 · 0 · 0
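The arithmetic behind that 20% figure is compounded per-step reliability, assuming each step fails independently (the function name here is mine):

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """P(entire chain succeeds) when each step succeeds independently."""
    return per_step_accuracy ** steps

# 85% accuracy per step over a 10-step chain:
print(round(workflow_success_rate(0.85, 10), 3))  # 0.197, i.e. about 20%
```

Halving the chain to 5 steps lifts the rate to roughly 44%, which is why "cut the chain in half" is the first fix rather than a better model.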
Tadyo@ucfmkr·
@AIHighlight The older, more educated, higher-paid workers being affected first is no surprise to anyone in compliance or finance ops. Those roles were already 80% structured workflow, and that is exactly what frontier models eat first.
0 · 0 · 0 · 1
AI Highlight@AIHighlight·
🚨BREAKING: Anthropic just published a study mapping exactly which jobs its own AI is replacing right now.

The workers most at risk are not who anyone expected. They are older. They are more educated. They earn 47% more than average. And they are nearly four times more likely to hold a graduate degree than the workers AI is not touching.

The argument is straightforward. Anthropic built a new metric called "observed exposure." Not what AI could theoretically do. What it is actually doing right now in professional settings, measured against millions of real Claude conversations from enterprise users. For computer and math workers, AI is theoretically capable of handling 94% of their tasks. It is currently handling 33% of them. For office and administrative roles, theoretical capability is 90%. Current observed usage is 40%. The gap between what AI can do and what it is already doing is enormous. The researchers are explicit about what comes next: as capabilities improve and adoption deepens, the red area grows to fill the blue.

The demographic finding is what makes the paper uncomfortable. The most AI-exposed workers earn 47% more on average than the least exposed group. They are more likely to be female. They are more likely to be college educated. This is not a story about warehouse workers or truck drivers. It is a story about lawyers, financial analysts, market researchers, and software developers: the exact group whose education was supposed to insulate them.

Computer programmers showed the highest observed AI exposure at 74.5%. Customer service representatives at 70.1%. Data entry keyers at 67.1%. Medical record specialists at 66.7%. Market research analysts and marketing specialists at 64.8%. These are not predictions. These are measurements of work that is already happening on AI platforms right now.

Then there is the pipeline finding nobody is talking about loudly enough. Anthropic's researchers found a 14% decline in the job-finding rate for workers aged 22 to 25 in highly exposed occupations since ChatGPT launched. No comparable effect for workers over 25. Entry-level roles were never just jobs. They were the training ground where junior analysts became senior analysts, where junior lawyers learned how arguments hold together. If that layer disappears, nobody has answered the question of where the next generation of senior professionals comes from.

The detail buried in the paper that most coverage missed: 30% of American workers have zero AI exposure at all. Cooks. Mechanics. Bartenders. Dishwashers. The technology reshaping professional careers is completely irrelevant to roughly a third of the workforce. The divide is no longer between high skill and low skill. It is between presence and absence.

The company publishing this study is the same company selling the AI doing the replacing. Anthropic had every commercial incentive to soften these findings. They published them anyway. If you spent four years and $200,000 on a degree to land a white-collar career, the company that builds Claude just confirmed your job is more exposed than the bartender pouring drinks at your graduation party.

Source: Anthropic, "Labor market impacts of AI: A new measure and early evidence"
PDF: anthropic.com/research/labor…
AI Highlight tweet media
258 · 1.5K · 4.4K · 798.5K
Tadyo@ucfmkr·
The bigger shift is architectural. The default for some workflows is moving from an API call to a hyperscaler to a checkpoint running on a local box. That changes how you think about the whole stack.
0 · 0 · 0 · 1
Tadyo@ucfmkr·
We have been testing Qwen on internal docs this week. It is not faster than the frontier models. But it is faster than nothing, which is what we had before for sensitive workflows.
1 · 0 · 0 · 2
Tadyo@ucfmkr·
A 27B model that runs locally on a 3090 just matched Claude 4.5 Opus on Terminal Bench. For anyone in regulated work, that headline reads differently than it does for the rest of the timeline.
1 · 0 · 0 · 1
Tadyo@ucfmkr·
@rahulgs Matches what we see on compliance reviews. Opus stays cheaper on the smallest diffs, but 5.5 wins everything past a few hundred lines. Cache-write pricing is the part most teams haven't accounted for yet.
0 · 0 · 0 · 2
rahul@rahulgs·
GPT-5.5 is ~39% cheaper than Opus 4.7 across merged PRs bucketed by diff size in Inspect. Despite the higher output-token cost, 5.5 is cheaper for input tokens (cache writes are free), more token-efficient, and tokenizes the same text to fewer tokens.
rahul tweet media
35 · 62 · 1.1K · 133.5K
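rahul's claim reduces to a per-PR cost function with separate input, cache-write, and output rates. The prices below are placeholders I made up to show the shape of the crossover, not the real Opus 4.7 or GPT-5.5 rate cards; the token counts are likewise hypothetical.

```python
# Hypothetical per-million-token prices, chosen only to illustrate the
# crossover: cheaper input and free cache writes vs. pricier output.
PRICES = {
    "opus-4.7": {"input": 15.0, "cache_write": 18.75, "output": 75.0},
    "gpt-5.5":  {"input": 10.0, "cache_write": 0.0,   "output": 90.0},
}

def pr_cost(model: str, input_tok: int, cache_write_tok: int, output_tok: int) -> float:
    """Dollar cost of one PR review, given token counts per category."""
    p = PRICES[model]
    return (input_tok * p["input"]
            + cache_write_tok * p["cache_write"]
            + output_tok * p["output"]) / 1_000_000

small = {"input_tok": 2_000, "cache_write_tok": 0, "output_tok": 2_000}
big = {"input_tok": 200_000, "cache_write_tok": 150_000, "output_tok": 5_000}

print(pr_cost("opus-4.7", **small) < pr_cost("gpt-5.5", **small))  # True: Opus wins tiny diffs
print(pr_cost("opus-4.7", **big) > pr_cost("gpt-5.5", **big))      # True: 5.5 wins big diffs
```

On a tiny diff, output tokens dominate and the higher output rate loses; on a large diff, cached context dominates and free cache writes win, which matches the "wins everything past a few hundred lines" observation above.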
Tadyo@ucfmkr·
5-hour rate limits stopped being a usage policy the moment teams built production workflows on top of the model. Now they are a load-bearing assumption. The pricing change is not the story. The story is what breaks the next morning.
0 · 0 · 0 · 3
Tadyo@ucfmkr·
The fix is not better models. The fix is shrinking the agent's job until the boundary is so obvious nobody can argue with it.
0 · 0 · 0 · 2
Tadyo@ucfmkr·
We learned this on a compliance bot last quarter. The model was the easy part. The hard part was the four legacy systems it had to talk to, none of which had documentation a human had read in years.
1 · 0 · 0 · 3
Tadyo@ucfmkr·
The pattern is the same every time. The prototype handles the happy path. Production has to handle the audit log, the SSO flow, the role check, the retry policy, and one slightly drunk customer at 2am.
1 · 0 · 0 · 4
Tadyo@ucfmkr·
The 88% number lands hard if you have ever sat in a board meeting and watched the same agent demo for the third quarter in a row. It always works on the slide. It rarely works on the org chart.
1 · 0 · 0 · 2
Tadyo@ucfmkr·
For every 33 AI prototypes built this year, 4 reach production. The other 29 die in the gap between demo and deploy.
1 · 0 · 0 · 3
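The "88% number" earlier in the thread and the 4-in-33 figure here are the same fraction seen from opposite ends. A quick check:

```python
prototypes, shipped = 33, 4
print(round(shipped / prototypes * 100))                 # 12: ~12% reach production
print(round((prototypes - shipped) / prototypes * 100))  # 88: the "88% number"
```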