Tadyo
@ucfmkr

204 posts

I test AI tools so you don't have to. 3 real picks/week → The Lever newsletter. No hype. Just tools that move the needle. 🔗 https://t.co/9hOKsmHnKb

Joined May 2024
155 Following · 31 Followers
Tadyo@ucfmkr·
The teams shipping production agents this year are not the ones with the best model access. They are the ones who shrank the workflow until the math stopped fighting them.
0 · 0 · 0 · 0
Tadyo@ucfmkr·
The fix that actually works is fewer steps. Cut the chain in half. Add a deterministic step where you can. Fail loudly when the model is unsure instead of guessing.
1 · 0 · 0 · 0
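The "deterministic step, fail loudly" idea can be sketched in a few lines. Everything below is illustrative: the task, function name, regex, and confidence threshold are my inventions, not anything from the thread — the point is only the shape of a shrunken agent step that validates deterministically and escalates instead of guessing.

```python
import re

CONFIDENCE_FLOOR = 0.8  # hypothetical threshold; tune per workflow

def extract_invoice_total(model_output: str, confidence: float) -> float:
    """One agent step, shrunk: parse a total, validate deterministically,
    and fail loudly below the confidence floor instead of guessing."""
    if confidence < CONFIDENCE_FLOOR:
        # Fail loudly: escalate to a human instead of emitting a guess.
        raise ValueError(f"model unsure (confidence={confidence}); escalating")
    match = re.search(r"TOTAL:\s*\$?(\d+\.\d{2})", model_output)
    if match is None:
        # Deterministic check: the format is either right or the step fails.
        raise ValueError("no parseable total in model output")
    return float(match.group(1))

print(extract_invoice_total("TOTAL: $1234.56", confidence=0.93))  # 1234.56
```

The design choice is that both failure modes raise: a raised exception shows up in a log and a retry queue, while a silent guess shows up in an audit three months later.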
Tadyo@ucfmkr·
An agent with 85% accuracy per step running a 10-step workflow succeeds about 20% of the time. The model is not the problem. The math is the problem.
1 · 0 · 0 · 0
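The arithmetic behind that 20% figure is compounded per-step reliability, assuming each step fails independently (the function name here is mine):

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """P(entire chain succeeds) when each step succeeds independently."""
    return per_step_accuracy ** steps

# 85% accuracy per step over a 10-step chain:
print(round(workflow_success_rate(0.85, 10), 3))  # 0.197, i.e. about 20%
```

Halving the chain to 5 steps lifts the rate to roughly 44%, which is why "cut the chain in half" is the first fix rather than a better model.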
Tadyo@ucfmkr·
@AIHighlight The older, more educated, higher-paid workers being affected first is no surprise to anyone in compliance or finance ops. Those roles were already 80% structured workflow, and that is exactly what frontier models eat first.
0 · 0 · 0 · 1
AI Highlight@AIHighlight·
🚨BREAKING: Anthropic just published a study mapping exactly which jobs its own AI is replacing right now.

The workers most at risk are not who anyone expected. They are older. They are more educated. They earn 47% more than average. And they are nearly four times more likely to hold a graduate degree than the workers AI is not touching.

The argument is straightforward. Anthropic built a new metric called "observed exposure." Not what AI could theoretically do. What it is actually doing right now in professional settings, measured against millions of real Claude conversations from enterprise users. For computer and math workers, AI is theoretically capable of handling 94% of their tasks. It is currently handling 33% of them. For office and administrative roles, theoretical capability is 90%. Current observed usage is 40%. The gap between what AI can do and what it is already doing is enormous. The researchers are explicit about what comes next: as capabilities improve and adoption deepens, the red area grows to fill the blue.

The demographic finding is what makes the paper uncomfortable. The most AI-exposed workers earn 47% more on average than the least exposed group. They are more likely to be female. They are more likely to be college educated. This is not a story about warehouse workers or truck drivers. It is a story about lawyers, financial analysts, market researchers, and software developers: the exact group whose education was supposed to insulate them.

Computer programmers showed the highest observed AI exposure at 74.5%. Customer service representatives at 70.1%. Data entry keyers at 67.1%. Medical record specialists at 66.7%. Market research analysts and marketing specialists at 64.8%. These are not predictions. These are measurements of work that is already happening on AI platforms right now.

Then there is the pipeline finding nobody is talking about loudly enough. Anthropic's researchers found a 14% decline in the job-finding rate for workers aged 22 to 25 in highly exposed occupations since ChatGPT launched. No comparable effect for workers over 25. Entry-level roles were never just jobs. They were the training ground where junior analysts became senior analysts, where junior lawyers learned how arguments hold together. If that layer disappears, nobody has answered the question of where the next generation of senior professionals comes from.

The detail buried in the paper that most coverage missed: 30% of American workers have zero AI exposure at all. Cooks. Mechanics. Bartenders. Dishwashers. The technology reshaping professional careers is completely irrelevant to roughly a third of the workforce. The divide is no longer between high skill and low skill. It is between presence and absence.

The company publishing this study is the same company selling the AI doing the replacing. Anthropic had every commercial incentive to soften these findings. They published them anyway. If you spent four years and $200,000 on a degree to land a white-collar career, the company that builds Claude just confirmed your job is more exposed than the bartender pouring drinks at your graduation party.

Source: Anthropic, "Labor market impacts of AI: A new measure and early evidence"
PDF: anthropic.com/research/labor…
AI Highlight tweet media
258 · 1.5K · 4.4K · 798.5K
Tadyo@ucfmkr·
The bigger shift is architectural. The default for some workflows is moving from an API call to a hyperscaler to a checkpoint running on a local box. That changes how you think about the whole stack.
0 · 0 · 0 · 1
Tadyo@ucfmkr·
We have been testing Qwen on internal docs this week. It is not faster than the frontier models. But it is faster than nothing, which is what we had before for sensitive workflows.
1 · 0 · 0 · 2
Tadyo@ucfmkr·
A 27B model that runs locally on a 3090 just matched Claude 4.5 Opus on Terminal Bench. For anyone in regulated work, that headline reads differently than it does for the rest of the timeline.
1 · 0 · 0 · 1
Tadyo@ucfmkr·
@rahulgs Matches what we see on compliance reviews. Opus stays cheaper on the smallest diffs, but 5.5 wins everything past a few hundred lines. Cache-write pricing is the part most teams haven't accounted for yet.
0 · 0 · 0 · 2
rahul@rahulgs·
GPT-5.5 is ~39% cheaper than Opus 4.7 across merged PRs bucketed by diff size in Inspect. Despite the higher output-token cost, 5.5 is cheaper for input tokens (cache writes are free), more token-efficient, and tokenizes the same text to fewer tokens.
rahul tweet media
35 · 62 · 1.1K · 133.5K
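rahul's claim reduces to a per-PR cost function with separate input, cache-write, and output rates. The prices below are placeholders I made up to show the shape of the crossover, not the real Opus 4.7 or GPT-5.5 rate cards; the token counts are likewise hypothetical.

```python
# Hypothetical per-million-token prices, chosen only to illustrate the
# crossover: cheaper input and free cache writes vs. pricier output.
PRICES = {
    "opus-4.7": {"input": 15.0, "cache_write": 18.75, "output": 75.0},
    "gpt-5.5":  {"input": 10.0, "cache_write": 0.0,   "output": 90.0},
}

def pr_cost(model: str, input_tok: int, cache_write_tok: int, output_tok: int) -> float:
    """Dollar cost of one PR review, given token counts per category."""
    p = PRICES[model]
    return (input_tok * p["input"]
            + cache_write_tok * p["cache_write"]
            + output_tok * p["output"]) / 1_000_000

small = {"input_tok": 2_000, "cache_write_tok": 0, "output_tok": 2_000}
big = {"input_tok": 200_000, "cache_write_tok": 150_000, "output_tok": 5_000}

print(pr_cost("opus-4.7", **small) < pr_cost("gpt-5.5", **small))  # True: Opus wins tiny diffs
print(pr_cost("opus-4.7", **big) > pr_cost("gpt-5.5", **big))      # True: 5.5 wins big diffs
```

On a tiny diff, output tokens dominate and the higher output rate loses; on a large diff, cached context dominates and free cache writes win, which matches the "wins everything past a few hundred lines" observation above.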
Tadyo@ucfmkr·
5-hour rate limits stopped being a usage policy the moment teams built production workflows on top of the model. Now they are a load-bearing assumption. The pricing change is not the story. The story is what breaks the next morning.
0 · 0 · 0 · 3
Tadyo@ucfmkr·
The fix is not better models. The fix is shrinking the agent's job until the boundary is so obvious nobody can argue with it.
0 · 0 · 0 · 2
Tadyo@ucfmkr·
We learned this on a compliance bot last quarter. The model was the easy part. The hard part was the four legacy systems it had to talk to, none of which had documentation a human had read in years.
1 · 0 · 0 · 3
Tadyo@ucfmkr·
The pattern is the same every time. The prototype handles the happy path. Production has to handle the audit log, the SSO flow, the role check, the retry policy, and one slightly drunk customer at 2am.
1 · 0 · 0 · 4
Tadyo@ucfmkr·
The 88% number lands hard if you have ever sat in a board meeting and watched the same agent demo for the third quarter in a row. It always works on the slide. It rarely works on the org chart.
1 · 0 · 0 · 2
Tadyo@ucfmkr·
For every 33 AI prototypes built this year, 4 reach production. The other 29 die in the gap between demo and deploy.
1 · 0 · 0 · 3
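The "88% number" earlier in the thread and the 4-in-33 figure here are the same fraction seen from opposite ends. A quick check:

```python
prototypes, shipped = 33, 4
print(round(shipped / prototypes * 100))                 # 12: ~12% reach production
print(round((prototypes - shipped) / prototypes * 100))  # 88: the "88% number"
```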