Mitchell Troyanovsky

3.3K posts

@mitch_troy

Building agents for accountants @trybasis (Hiring)

New York, USA · Joined February 2017
2.4K Following · 2.9K Followers
Pinned post
Mitchell Troyanovsky
Mitchell Troyanovsky@mitch_troy·
Excited to announce this milestone. We are deploying long-horizon agents at scale that can do real work in the real economy. Our agents can already operate reliably for hours at a time and I expect that to be days by the end of the year. We can barely keep up with demand. If you want to come join and work at the frontier for the most intense, fast-paced couple years of your life, send me a note.
Basis@trybasis

We've raised $100m at a $1.15b valuation from @Accel, @GVteam, and existing investors to accelerate deployment of the most capable and accurate accounting agents across CAS, tax, audit, and advisory. Basis is used by 30% of the Top 25 accounting firms and dozens more across the Top 150. Today we're announcing the first accounting agent to complete a business tax workbook end-to-end. Our focus on production-grade, long-horizon agents means that 12 months from now, the work Basis handles will make even this look routine. We're looking for a few very intense people who want to build at the frontier.

42
21
276
82.9K
Mitchell Troyanovsky
Mitchell Troyanovsky@mitch_troy·
#1 You did say that on the show, but I think it could have been pushed back on more strongly. The gap between reality and what Kedrosky is saying about the benchmarks (like the Epoch composite) is that the vast majority of benchmarks measure a model's capability to perform some task. But what actually matters is a model's ability to regulate its own state. Like how in Memento, Leonard has to be smart about deciding what to write down and then, when waking up, in what order to read it. Models have improved dramatically in that ability in ways that I think have been predictable since o3 de-risked the relationship between post-training RL and quality/length of reasoning at inference time and Gemini 3/Opus 4 de-risked the idea that more data/parameters in pre-training was asymptoting. The best benchmarks for seeing this are the coding ones, METR, and maybe ARC. Many of the others just don't matter anymore, so the composite tells a false story. #2 I agree that the level of subjectivity is inversely correlated with the benefit of throwing more tokens at it. But I think where he's wrong is in thinking that SWE is all deterministic and other work (like law or accounting) is very subjective. The more senior one is, the more subjective the work is. I.e., writing a React component per a codebase's styling guide is straightforward and kinda verifiable, but choosing how to design a database schema is not. So tokens help with the former and less so with the latter in terms of end-to-end work. However, in most knowledge work you see the exact same pattern. People are already treated like non-deterministic functions by their superiors, with layers of review, trust, etc. Those layers exist because there actually IS some form of right answer. It may not be programmatically verifiable like running the code, but the majority of software development is also not programmatically verifiable. If something can be pseudo-verified with inference, then it too will require tons of tokens.
As an example, say I am a sales leader who has an opinion on how we should speak to prospects on calls. To the extent that I can encode my intent in English, I can now use that to verify outputs across an entire sales organization, just like a software team might use unit tests. It's the same thing, just a question of whether the testing is through deterministic compute or inference.
0
0
1
14
Derek Thompson
Derek Thompson@DKThomp·
strong counter-arguments. i agree on 1 (i think i said as much on the show). on 2, i think there is probably a nuanced position between kedrosky and his critics, where SWE work is more iterative and requires a fairly unique level of operational perfection that might result in SWE use of agents being higher than non-SWE use of agents even after we've reached some fuller level of adoption
1
0
1
44
Derek Thompson
Derek Thompson@DKThomp·
New newsletter: The transcript of my AI bubble conversation, with @pkedrosky. Feat.: - Why did the Mag7 equity miracle suddenly stop? - The growing private credit crisis, explained - Why the enormous revenue boom from new agents like Claude Code might be a sugar high, in which explosive revenue growth today precedes much slower revenue growth after AI adoption among software engineers peaks - Where equity value is flowing if it’s leaving software - Why US productivity seems to be rising but actually isn't derekthompson.org/p/yes-ai-is-a-…
8
8
84
98.1K
Mitchell Troyanovsky
Mitchell Troyanovsky@mitch_troy·
@DKThomp It's a nice story, but it isn't true on #1 x.com/mitch_troy/sta…
Mitchell Troyanovsky@mitch_troy

@DKThomp The two pillars of @pkedrosky's argument are fundamentally flawed. 1. "Orchestration improvements have driven capabilities, rather than model improvements". Wrong! Claude Code/Codex are really good now BECAUSE of model improvements. This whole idea of investing at the orchestration layer reflects a lack of understanding. Yes, there are innovations there, but most of the improvements are from models getting smarter at regulating their own state & exploring their environment, NOT from harnesses. 2. "Outside of coding, knowledge work is token compressive. so it creates less tokens than more" Wrong! Even in engineering, many times the hardest thing is not writing MORE code. It's making hard decisions. It can take a HUGE amount of tokens to eventually write 2 lines of code because of the work in planning, iterating, validating, etc. The whole quote of "sorry I wrote such a long letter, I didn't have time to write a shorter one" applies here. In other domains, and we run agents for accounting, the work is extremely token intensive because you are paying people like Basis for accuracy and competency, and as models get smarter you can spend orders of magnitude MORE tokens to reach a decision or take an action that is good, even if that action itself is only a few tokens.

1
0
0
36
Derek Thompson
Derek Thompson@DKThomp·
Since some people are interpreting this chart as a theory I personally hold strongly rather than a theory I considered worth passing along, I'll make my own view here more plain: 1. I think Paul is right that SWE jobs might be token intensive in a way that other white collar jobs are not. 2. If (1) is true, we should expect to see token growth and revenue growth rise fastest during intensive periods of SWE agent adoption, and we can't extrapolate that trend-line forward for all white-collar jobs, if and when AI adoption rises throughout the white-collar workforce. 3. I still think the basic insights of the Jevons paradox will apply to the white collar workforce and AI. I think cheap tokens and powerful models will change aspects of white-collar work, even if it's hard to currently predict how and how much, and so I'm not confident that today's inference surge is *obviously* the sugar-high vertical line of the S-curve, which would sharply level off some time soon.
2
0
2
6.1K
Mitchell Troyanovsky
Mitchell Troyanovsky@mitch_troy·
@DKThomp The two pillars of @pkedrosky's argument are fundamentally flawed. 1. "Orchestration improvements have driven capabilities, rather than model improvements". Wrong! Claude Code/Codex are really good now BECAUSE of model improvements. This whole idea of investing at the orchestration layer reflects a lack of understanding. Yes, there are innovations there, but most of the improvements are from models getting smarter at regulating their own state & exploring their environment, NOT from harnesses. 2. "Outside of coding, knowledge work is token compressive. so it creates less tokens than more" Wrong! Even in engineering, many times the hardest thing is not writing MORE code. It's making hard decisions. It can take a HUGE amount of tokens to eventually write 2 lines of code because of the work in planning, iterating, validating, etc. The whole quote of "sorry I wrote such a long letter, I didn't have time to write a shorter one" applies here. In other domains, and we run agents for accounting, the work is extremely token intensive because you are paying people like Basis for accuracy and competency, and as models get smarter you can spend orders of magnitude MORE tokens to reach a decision or take an action that is good, even if that action itself is only a few tokens.
Derek Thompson@DKThomp

New newsletter: The transcript of my AI bubble conversation, with @pkedrosky. Feat.: - Why did the Mag7 equity miracle suddenly stop? - The growing private credit crisis, explained - Why the enormous revenue boom from new agents like Claude Code might be a sugar high, in which explosive revenue growth today precedes much slower revenue growth after AI adoption among software engineers peaks - Where equity value is flowing if it’s leaving software - Why US productivity seems to be rising but actually isn't derekthompson.org/p/yes-ai-is-a-…

0
1
3
308
Mitchell Troyanovsky
Mitchell Troyanovsky@mitch_troy·
Humans are already good at working with non-deterministic systems, but those systems are their coworkers, not their computers. The key to building great UX for agents is thinking deeply about how to activate the “working with intelligence” part of user expectations rather than the “working with software” part.
3
0
11
725
Mitchell Troyanovsky
Mitchell Troyanovsky@mitch_troy·
Okta or Google for enterprise SSO? What do folks do these days?
5
0
1
1.1K
Mitchell Troyanovsky
Mitchell Troyanovsky@mitch_troy·
What is the right setup these days to allow agents to effectively make "mocks" in HTML from your codebase?
0
0
1
479
Justin
Justin@JustinBleuel·
@mitch_troy hm reunion going to be awk
1
0
10
5.4K
Mitchell Troyanovsky
Mitchell Troyanovsky@mitch_troy·
An insane heist by xAI
Jason Ginsberg@JasonBud

I’m proud to be joining SpaceX and xAI with @milichab It has become clear that software is changing fundamentally. More and more, people can shape the tools they use directly, and the ceiling of what can be built keeps rising. What makes xAI special is the scale of its ambition: to build from first principles all the way out to the stars. I’m especially grateful to work on products that expand human agency and freedom. That mission is deeply personal to me. My family came to the United States fleeing communism, and the belief that freedom should be part of the next generation of the internet has driven me every day since Andrew and I started Skiff. Now, we get to work on intelligence, understanding, and freedom on a universal scale.

85
377
2.4K
863.1K
Mitchell Troyanovsky
Mitchell Troyanovsky@mitch_troy·
damn with the twitter verified logo on day 1 too
1
0
33
5.9K
Jason Ginsberg
Jason Ginsberg@JasonBud·
I’m proud to be joining SpaceX and xAI with @milichab It has become clear that software is changing fundamentally. More and more, people can shape the tools they use directly, and the ceiling of what can be built keeps rising. What makes xAI special is the scale of its ambition: to build from first principles all the way out to the stars. I’m especially grateful to work on products that expand human agency and freedom. That mission is deeply personal to me. My family came to the United States fleeing communism, and the belief that freedom should be part of the next generation of the internet has driven me every day since Andrew and I started Skiff. Now, we get to work on intelligence, understanding, and freedom on a universal scale.
480
493
8.6K
47.8M
Mitchell Troyanovsky
Mitchell Troyanovsky@mitch_troy·
Anyone built an OSS tool that awaits all the code review agents and de-dupes them?
1
0
3
732
Evan Mendenhall
Evan Mendenhall@EvanMendenhall_·
@mitch_troy Good job today, sent one of my friends your hiring information. He helped me solve an issue on some real-time voice agents which act as the office manager for four businesses currently. Hope he applies, I like what you said in the OpenAI Build Hour!
1
0
1
32
Josh
Josh@JoshPurtell·
@mitch_troy He was absolutely right at the time
2
0
1
55