Mitchell Troyanovsky (@mitch_troy) - Twitter profile

Pinned Tweet

Excited to announce this milestone. We are deploying long-horizon agents at scale that can do real work in the real economy. Our agents can already operate reliably for hours at a time and I expect that to be days by the end of the year. We can barely keep up with demand. If you want to come join and work at the frontier for the most intense, fast-paced couple years of your life, send me a note.

Basis@trybasis

We've raised $100m at a $1.15b valuation from @Accel, @GVteam, and existing investors to accelerate deployment of the most capable and accurate accounting agents across CAS, tax, audit, and advisory. Basis is used by 30% of the Top 25 accounting firms and dozens more across the Top 150. Today we're announcing the first accounting agent to complete a business tax workbook end-to-end. Our focus on production-grade, long-horizon agents means that 12 months from now, the work Basis handles will make even this look routine. We're looking for a few very intense people who want to build at the frontier.

42

21

276

82.9K

Mitchell Troyanovsky@mitch_troy·7h

#1 You did say that on the show on #1, but I think it could have been pushed back on more strongly. The gap between reality and what Kedrosky is saying about the benchmarks (like the Epoch composite) is that the vast majority of benchmarks measure a model's capability to perform some task. But what actually matters is a model's ability to regulate its own state. It's like how in Memento Leonard has to be smart about deciding what to write down and then, on waking up, in what order to read it. Models have improved dramatically in that ability in ways that I think have been predictable since o3 de-risked the relationship between post-training RL and quality/length of reasoning at inference time and Gemini 3/opus 4 de-risked the idea that more data/parameters in pre-training was asymptoting. The best benchmarks for seeing this are the coding ones, METR, and maybe ARC. Many others just don't matter anymore, so the composite tells a false story.
#2 I agree that the level of subjectivity is inversely correlated with the benefit of throwing more tokens at a task. But I think where he's wrong is in thinking that SWE is all deterministic and other work (like law or accounting) is very subjective. The more senior one is, the more subjective the work is. E.g., writing a React component per a codebase's styling guide is straightforward and kinda verifiable, but choosing how to design a database schema is not. So tokens help with the former and less so with the latter in terms of end-to-end work. However, in most knowledge work you see the exact same pattern. People are already treated like non-deterministic functions by their superiors, with layers of review, trust, etc. That exists because there actually IS some form of right answer. It may not be programmatically verifiable like running the code, but the majority of software development is also not programmatically verifiable. If something can be pseudo-verified with inference, then it too will require tons of tokens.
As an example, say I am a sales leader who has an opinion on how we should speak to prospects on calls. To the extent that I can encode my intent into English, I can now use that to verify outputs across an entire sales organization, just like a software team might use unit tests. It's the same thing; the only question is whether the testing is done through deterministic compute or inference.

0

1

14
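The sales-call example in the tweet above can be sketched in code. This is only an illustration under stated assumptions, not anything the author describes building: `deterministic_check`, `inference_check`, `run_checks`, and `stub_judge` are hypothetical names, and `stub_judge` is a crude keyword test standing in for the place where a real model call would go. The point it tries to show is that both kinds of check can share one interface, so intent written in English can sit next to a unit-test-style rule.

```python
from typing import Callable, List

# A check maps an output (here, a call transcript) to pass/fail.
Check = Callable[[str], bool]


def deterministic_check(max_words: int) -> Check:
    """Programmatic check, analogous to a unit test: enforce a length limit."""
    def check(text: str) -> bool:
        return len(text.split()) <= max_words
    return check


def inference_check(rubric: str, judge: Callable[[str, str], bool]) -> Check:
    """Check whose verdict comes from judge(rubric, text).

    `judge` is a hypothetical hook where a model call would go.
    """
    def check(text: str) -> bool:
        return judge(rubric, text)
    return check


def stub_judge(rubric: str, text: str) -> bool:
    """Stand-in for a model call (assumption): a keyword test, NOT real inference."""
    return "pricing" not in text.lower()


def run_checks(text: str, checks: List[Check]) -> List[bool]:
    """Run every check against one output and collect the verdicts in order."""
    return [check(text) for check in checks]


# One deterministic rule and one rule expressed as intent in English.
CHECKS = [
    deterministic_check(max_words=50),
    inference_check("Do not discuss pricing on a first call.", stub_judge),
]

if __name__ == "__main__":
    print(run_checks("Happy to walk you through the product today.", CHECKS))
    print(run_checks("Our pricing starts at ten dollars a seat.", CHECKS))
```

In a real setup the stub would presumably be replaced by a call to a model that reads the rubric and the transcript, which is where the extra token spend the tweet mentions would come from.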

Derek Thompson@DKThomp·8h

strong counter-arguments. i agree on 1 (i think i said as much on the show). on 2 i think there is probably a nuanced position between kedrosky and his critics, where SWE work is more iterative and requires a fairly unique level of operational perfection that might result in SWE use of agents being higher than non-SWE use of agents even after we've reached some fuller level of adoption

1

0

1

44

Derek Thompson@DKThomp·1d

New newsletter: The transcript of my AI bubble conversation, with @pkedrosky. Feat.:
- Why did the Mag7 equity miracle suddenly stop?
- The growing private credit crisis, explained
- Why the enormous revenue boom from new agents like Claude Code might be a sugar high, in which explosive revenue growth today precedes much slower revenue growth after AI adoption among software engineers peaks
- Where equity value is flowing if it’s leaving software
- Why US productivity seems to be rising but actually isn't
derekthompson.org/p/yes-ai-is-a-…

8

84

98.1K

Mitchell Troyanovsky@mitch_troy·8h

@DKThomp It's a nice story but it isn't true on the #1 x.com/mitch_troy/sta…

Mitchell Troyanovsky@mitch_troy

@DKThomp The two pillars of @pkedrosky's argument are fundamentally flawed.
1. "Orchestration improvements have driven capabilities, rather than model improvements". Wrong! Claude Code/Codex are really good now BECAUSE of model improvements. This whole idea of investing at the orchestration layer reflects a lack of understanding. Yes, there are innovations there, but most of the improvements are from models getting smarter at regulating their own state & exploring their environment, NOT from harnesses.
2. "Outside of coding, knowledge work is token compressive. so it creates less tokens than more". Wrong! Even in engineering, the hardest things are often not writing MORE code. It's making hard decisions. It can take a HUGE amount of tokens to eventually write 2 lines of code because of the work in planning, iterating, validating, etc. The whole quote of "sorry I wrote such a long letter, I didn't have time to write a shorter one" applies here. In other domains (and we run agents for accounting), the work is extremely token intensive because you are paying people like Basis for accuracy and competency, and as models get smarter you can spend an order of magnitude MORE tokens to reach a decision or take an action that is good, even if that action itself is few tokens.

1

0

36

Derek Thompson@DKThomp·1d

Since some people are interpreting this chart as a theory I personally hold strongly rather than a theory I considered worth passing along, I'll make my own view here more plain:
1. I think Paul is right that SWE jobs might be token intensive in a way that other white collar jobs are not.
2. If (1) is true, we should expect to see token growth and revenue growth rise fastest during intensive periods of SWE agent adoption, and we can't extrapolate that trend-line forward for all white-collar jobs, if and when AI adoption rises throughout the white-collar workforce.
3. I still think the basic insights of the Jevons paradox will apply to the white collar workforce and AI. I think cheap tokens and powerful models will change aspects of white-collar work, even if it's hard to currently predict how and how much, and so I'm not confident that today's inference surge is *obviously* the sugar-high vertical line of the S-curve, which would sharply level off some time soon.

2

0

2

6.1K

Mitchell Troyanovsky@mitch_troy·8h

@DKThomp The two pillars of @pkedrosky's argument are fundamentally flawed.
1. "Orchestration improvements have driven capabilities, rather than model improvements". Wrong! Claude Code/Codex are really good now BECAUSE of model improvements. This whole idea of investing at the orchestration layer reflects a lack of understanding. Yes, there are innovations there, but most of the improvements are from models getting smarter at regulating their own state & exploring their environment, NOT from harnesses.
2. "Outside of coding, knowledge work is token compressive. so it creates less tokens than more". Wrong! Even in engineering, the hardest things are often not writing MORE code. It's making hard decisions. It can take a HUGE amount of tokens to eventually write 2 lines of code because of the work in planning, iterating, validating, etc. The whole quote of "sorry I wrote such a long letter, I didn't have time to write a shorter one" applies here. In other domains (and we run agents for accounting), the work is extremely token intensive because you are paying people like Basis for accuracy and competency, and as models get smarter you can spend an order of magnitude MORE tokens to reach a decision or take an action that is good, even if that action itself is few tokens.

Derek Thompson@DKThomp

New newsletter: The transcript of my AI bubble conversation, with @pkedrosky. Feat.:
- Why did the Mag7 equity miracle suddenly stop?
- The growing private credit crisis, explained
- Why the enormous revenue boom from new agents like Claude Code might be a sugar high, in which explosive revenue growth today precedes much slower revenue growth after AI adoption among software engineers peaks
- Where equity value is flowing if it’s leaving software
- Why US productivity seems to be rising but actually isn't
derekthompson.org/p/yes-ai-is-a-…

0

1

3

308

Mitchell Troyanovsky@mitch_troy·1d

Humans are already good at working with non-deterministic systems, but those systems are their coworkers, not their computers. The key to building great UX for agents is thinking deeply about how to activate the “working with intelligence” part of user expectations rather than the “working with software” part.

3

0

11

725

Mitchell Troyanovsky@mitch_troy·2d

@VinaiRachakonda Wait tell me more. Why?

0

27

Vinai Rachakonda@VinaiRachakonda·2d

@mitch_troy anything but okta!!!!!!

1

0

58

Mitchell Troyanovsky@mitch_troy·3d

Okta or Google for enterprise SSO? What do folks do these days?

5

0

1

1.1K

Mitchell Troyanovsky@mitch_troy·6d

What is the right setup these days to allow agents to effectively make "mocks" in HTML from your codebase?

0

1

479

Mitchell Troyanovsky@mitch_troy·13 Mar

@JustinBleuel Lmao

0

8

4.4K

Justin@JustinBleuel·13 Mar

@mitch_troy hm reunion going to be awk

1

0

10

5.4K

Mitchell Troyanovsky@mitch_troy·12 Mar

An insane heist by xAI

Jason Ginsberg@JasonBud

I’m proud to be joining SpaceX and xAI with @milichab It has become clear that software is changing fundamentally. More and more, people can shape the tools they use directly, and the ceiling of what can be built keeps rising. What makes xAI special is the scale of its ambition: to build from first principles all the way out to the stars. I’m especially grateful to work on products that expand human agency and freedom. That mission is deeply personal to me. My family came to the United States fleeing communism, and the belief that freedom should be part of the next generation of the internet has driven me every day since Andrew and I started Skiff. Now, we get to work on intelligence, understanding, and freedom on a universal scale.

85

377

2.4K

863.1K

Mitchell Troyanovsky@mitch_troy·13 Mar

@0xKoller @bentossell @xmcp_dev So you’re saying they built their own mcp app that makes charts and then flowed it through. Makes sense

0

1

35

Koller@0xKoller·12 Mar

@bentossell @mitch_troy You can just do it with @xmcp_dev 😄 Just run npx create-xmcp-app --example mcp-app and it will clone this repo for you as a quickstart: github.com/xmcp-dev/templ… But if you want to make one from scratch, you can follow this: xmcp.dev/blog/mcp-apps

1

0

2

77

Ben Tossell@bentossell·12 Mar

who's going to be the first to reverse-eng this and release open-source?

Claude@claudeai

Claude can now build interactive charts and diagrams, directly in the chat. Available today in beta on all plans, including free. Try it out: claude.ai

36

6

175

44.1K

Mitchell Troyanovsky@mitch_troy·12 Mar

damn with the twitter verified logo on day 1 too

1

0

33

5.9K

Mitchell Troyanovsky@mitch_troy·12 Mar

@bentossell @0xKoller can u open source?

1

0

67

Ben Tossell@bentossell·12 Mar

@0xKoller yeh trying to build my own now brb

1

0

352

Mitchell Troyanovsky@mitch_troy·12 Mar

@JasonBud @shaunmmaguire @milichab Dude congrats that’s amazing!

0

2

12

3.4K

Jason Ginsberg@JasonBud·12 Mar

I’m proud to be joining SpaceX and xAI with @milichab It has become clear that software is changing fundamentally. More and more, people can shape the tools they use directly, and the ceiling of what can be built keeps rising. What makes xAI special is the scale of its ambition: to build from first principles all the way out to the stars. I’m especially grateful to work on products that expand human agency and freedom. That mission is deeply personal to me. My family came to the United States fleeing communism, and the belief that freedom should be part of the next generation of the internet has driven me every day since Andrew and I started Skiff. Now, we get to work on intelligence, understanding, and freedom on a universal scale.

480

493

8.6K

47.8M

Mitchell Troyanovsky@mitch_troy·10 Mar

Has anyone built an OSS tool that awaits all the code review agents and de-dupes their comments?

1

0

3

732
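The tool asked about above could take roughly the following shape. This is a minimal sketch under stated assumptions, not a reference to any existing project: `Finding`, `collect_findings`, `reviewer_a`, and `reviewer_b` are hypothetical names, the reviewers are stubs rather than real code review agents, and de-duplication here is exact matching on file, line, and a normalized message. Real review comments would likely need fuzzier or semantic matching.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Tuple


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    message: str


# A reviewer is any async callable that returns its findings (hypothetical interface).
Reviewer = Callable[[], Awaitable[List[Finding]]]


def _key(f: Finding) -> Tuple[str, int, str]:
    """Normalize case and whitespace so trivially different wordings collide."""
    return (f.path, f.line, " ".join(f.message.lower().split()))


async def collect_findings(reviewers: List[Reviewer]) -> List[Finding]:
    """Wait for every reviewer to finish, then drop duplicate findings."""
    # gather runs the reviewers concurrently and keeps results in input order.
    results = await asyncio.gather(*(r() for r in reviewers))
    seen = set()
    unique: List[Finding] = []
    for findings in results:
        for f in findings:
            k = _key(f)
            if k not in seen:
                seen.add(k)
                unique.append(f)
    return unique


# Stub reviewers standing in for real agents (assumption).
async def reviewer_a() -> List[Finding]:
    await asyncio.sleep(0)
    return [Finding("app.py", 10, "Possible None dereference")]


async def reviewer_b() -> List[Finding]:
    await asyncio.sleep(0)
    return [
        Finding("app.py", 10, "possible  none dereference"),
        Finding("db.py", 3, "Unclosed connection"),
    ]


if __name__ == "__main__":
    for finding in asyncio.run(collect_findings([reviewer_a, reviewer_b])):
        print(finding)
```

The first occurrence of a finding is kept and later duplicates are dropped, so the order of the reviewer list decides whose wording survives.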

Mitchell Troyanovsky@mitch_troy·9 Mar

@avontell Can u run it locally? Why need to go through PR flow

0

1

69

Aaron Vontell@avontell·9 Mar

This tool is sooooo good. Having CC make a PR, get feedback through code review, and then address those comments has been very powerful.

Claude@claudeai

Introducing Code Review, a new feature for Claude Code. When a PR opens, Claude dispatches a team of agents to hunt for bugs.

2

0

2

138

Mitchell Troyanovsky@mitch_troy·9 Mar

@EvanMendenhall_ haha thank you!

0

1

4

Evan Mendenhall@EvanMendenhall_·9 Mar

@mitch_troy Good job today, sent one of my friends your hiring information. He helped me solve an issue on some real-time voice agents which act as the office manager for four businesses currently. Hope he applies, I like what you said in the OpenAI Build Hour!

1

0

1

32

Mitchell Troyanovsky@mitch_troy·9 Mar

@ryancarson haha ty. hopefully i did ok. Deck/Content a tad rushed lol....

0

1

36