Inokentii Mykhailov

Raghav Sethi@raghavsethi·Apr 23

@darraghcurran @intercom @gregolsent How are y'all still compliant with SOC2?

Darragh Curran@darraghcurran·Apr 22

Last week I wrote about how we smashed our 2x goal at @intercom - and one number attracting the most surprise: 19% of PRs now auto-approved by AI, no human in the loop. Today @gregolsent and Niamh Young published the deep dive on how we built this, and why it makes us safer, not less safe.

Inokentii Mykhailov@gregolsent·Apr 21

@irvinebroque Absolutely! Our safety sub-agent explicitly calls out a missing feature flag, or a flag that doesn’t fully “cover” the diff.

Brendan Irvine-Broque@irvinebroque·Apr 21

@gregolsent are the agent-approved and merged changes guarded by feature flags? is that an input the review agent considers?

Inokentii Mykhailov@gregolsent·Apr 21

You hear "coding is solved" a lot these days. Maybe. What's not solved yet is delivering massive product changes to your customers quickly and safely.

An agentic coding tool will gladly create a thousand-line Pull Request on your behalf – but no human/bot review can properly de-risk those monsters. The only proven way to ship a large, risky diff is by splitting it into smaller, self-contained MVPs, uncovering all the unknown unknowns one by one, incrementally, in production – your only objective reality.

Our ambitious goal is to create a positive incentive in the socio-technical system (Intercom's R&D) for folks to ship in smaller increments, faster and hence safer. Everyone will have a choice to either create a large PR and wait hours for a human review or ship things incrementally and benefit from streamlined reviews and auto-approvals from our agent.

Here's the post with more details: intercom.com/blog/ai-is-app…

Inokentii Mykhailov@gregolsent·Apr 19

Every second matters! Amazing work by @_byroot!

Jean Boussier@_byroot

With the new job, I didn't get much time or energy to blog lately, but I figured I'd use the various downtime while traveling to Ruby Kaigi to write about some of the things I worked on recently: byroot.github.io/ruby/performan…

Inokentii Mykhailov@gregolsent·Apr 17

@adrian_cooney @darraghcurran Incoming PRs, and one of the first bottlenecks we hit was our CI/CD.

Agent Cooney@adrian_cooney·Apr 17

@gregolsent @darraghcurran what was the pressure?

Inokentii Mykhailov@gregolsent·Apr 17

Pumping pressure into the system to surface real bottlenecks is the ONLY way to get your shipping factory modernised. AI code generation means nothing unless you set an ambitious target to drive real socio-technical change in your R&D. Hats off to @darraghcurran – a true pioneer!

Darragh Curran@darraghcurran

9 months ago we publicly committed to 2x the productivity of our R&D org at @intercom. It was scary. It wasn't always clear we'd pull it off. We hit it with 3 months to spare. In fact, looking back 16 months - we've 3x'd. Here's what actually happened (with receipts): 🧵

Inokentii Mykhailov@gregolsent·Apr 3

@jonas It’s even easier to ship AI-slop rubber-stamped by humans. Our ultimate goal is to modernize the product shipping factory. If you read the thread - 50% won’t be achieved by approving larger, riskier PRs but by incentivizing smaller, safer, incremental changes.

Jonas Templestein@jonas·Apr 3

I’m as bullish as the next guy about AI, but man… Setting a goal for 50% of PRs approved without human review is not going to end well.

It’s the same as aiming for “each engineer spends 2x their salary in tokens” or “our goal is to 3x headcount this year” (back in the pre-AI days).

Those may well be properties of the most effective organisations, but those organizations don’t have them as their primary goal.

V easy to achieve these goals with disastrous side effects.

This week at Intercom we hit over 19% of PRs auto-approved by our PR review agent based on Claude Code. Our ambitious goal is to get to 50+% by the end of this month. I'll spill all the details below and you decide yourself if we are out of our damned minds or onto something...

Inokentii Mykhailov@gregolsent·Apr 3

@sebish Unfortunately, can’t do 😬 but I feel your pain and will pass the feedback to our design team 🙇

Sebastian Tiller@sebish·Apr 3

@gregolsent Amazing! While you’re at it can you make hyperlinks in the chat bubbles look like hyperlinks and not like underlined black text. Thank you 🙏

Inokentii Mykhailov@gregolsent·Apr 2

This week at Intercom we hit over 19% of PRs auto-approved by our PR review agent based on Claude Code. Our ambitious goal is to get to 50+% by the end of this month. I'll spill all the details below and you decide yourself if we are out of our damned minds or onto something...

Inokentii Mykhailov@gregolsent·Apr 3

@GrahamJCampbell Yeah, funny :-) The majority of our largest, most prolonged outages had nothing to do with product code changes though. We have an evals dataset of historical PRs that were rolled back and reverted - can you guess what % of those were approved by a human? ;-)

Graham Campbell 🐘@GrahamJCampbell·Apr 3

50% uptime.

This week at Intercom we hit over 19% of PRs auto-approved by our PR review agent based on Claude Code. Our ambitious goal is to get to 50+% by the end of this month. I'll spill all the details below and you decide yourself if we are out of our damned minds or onto something...

Inokentii Mykhailov@gregolsent·Apr 3

@SyedMSawaid @nateberkopec We expect more engineers to become factory builders, elevating product-capable folks to a higher level of abstraction: problem -> decisions -> outcome shipped, with technical details (PRs, CI, etc.) moving to the background.

Sawaid@SyedMSawaid·Apr 3

@nateberkopec Are we cooked?

Nate Berkopec@nateberkopec·Apr 3

The software factory is being built…

This week at Intercom we hit over 19% of PRs auto-approved by our PR review agent based on Claude Code. Our ambitious goal is to get to 50+% by the end of this month. I'll spill all the details below and you decide yourself if we are out of our damned minds or onto something...

Inokentii Mykhailov@gregolsent·Apr 3

@rockatanescu Exactly! We are driving a cultural change in a socio-technical system. The PR review agent has to be intentionally “picky” to incentivize smaller, incremental, faster - and hence safer - iterations.

Andrei Maxim@rockatanescu·Apr 3

I think the first tweet is a bit clickbait-y and it's very much worth looking at the details. For example, I think that one of the side-effects of this workflow is that a lot of people will be encouraged to do small refactorings because it will be simple to get the PR merged.

This week at Intercom we hit over 19% of PRs auto-approved by our PR review agent based on Claude Code. Our ambitious goal is to get to 50+% by the end of this month. I'll spill all the details below and you decide yourself if we are out of our damned minds or onto something...

Inokentii Mykhailov@gregolsent·Apr 3

@madsenmm That would be an interesting experiment to run, to see if there’s any bias in the agent towards AI slop. If we get the system prompt right, I expect the opposite.

Tobias Madsen@madsenmm·Apr 3

@gregolsent But… if the code is written by AI, the chance of approval seems very high?

Inokentii Mykhailov@gregolsent·Apr 3

@abolabz Not easy, but safe - small incremental changes, feature flagged. An LLM can easily generate a 1k LOC file; it is much harder to force it into a modular design. But our auto-approval “carrot” will shift the culture towards safety 🤞🏻

Aurélien@abottaz·Apr 3

@gregolsent Yes. They will do the easy things, code in isolation, and try not to trigger anything that would involve human reviews or discussions.

Inokentii Mykhailov@gregolsent·Apr 3

@joshmlewis @TheITBagpiper @SocksMyRocks @AitizazK Some PRs are AI-only. Obvs can't share details behind the audit process, but you have to build a strong case demonstrating you're not just YOLO-ing your approvals but building a robust, auditable system. They know where the industry is going...

Josh Lewis@joshmlewis·Apr 3

@gregolsent @TheITBagpiper @SocksMyRocks @AitizazK So is it still AI reviewed all the way to prod or is a human testing / reviewing at some point? If it is all AI, what was the consensus from the auditors on how to still be kosher? I think people could benefit from those learnings

Inokentii Mykhailov@gregolsent·Apr 3

@TheITBagpiper @joshmlewis @SocksMyRocks @AitizazK ... last year we added deterministic rule-based auto-approvals for a set of specific changes like documentation, tests, etc., and the new agentic review is the next step
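
A deterministic, rule-based auto-approval of the kind mentioned in this reply might look roughly like the sketch below. It is only an illustration: the glob patterns, the `SAFE_PATTERNS` allow-list, and the `can_auto_approve` function are hypothetical names chosen here, not Intercom's actual rules or code. The assumed idea is that a PR qualifies only when every changed file falls into a low-risk category such as documentation or tests.

```python
from fnmatch import fnmatchcase

# Hypothetical allow-list of low-risk paths (docs and tests).
# These patterns are illustrative assumptions, not real production rules.
SAFE_PATTERNS = ("docs/*", "*.md", "spec/*", "test/*")


def matches_safe_pattern(path: str) -> bool:
    """Return True if the path matches at least one allow-listed glob."""
    return any(fnmatchcase(path, pattern) for pattern in SAFE_PATTERNS)


def can_auto_approve(changed_paths: list[str]) -> bool:
    """Approve only non-empty diffs where every changed file is low-risk."""
    return bool(changed_paths) and all(
        matches_safe_pattern(path) for path in changed_paths
    )


if __name__ == "__main__":
    print(can_auto_approve(["docs/setup.md", "spec/models/user_spec.rb"]))
    print(can_auto_approve(["docs/setup.md", "app/models/user.rb"]))
```

Because the check is a pure function of the changed file list, the same diff always yields the same decision, which is what makes a rule like this straightforward to audit.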

Inokentii Mykhailov@gregolsent·Apr 3

@TheITBagpiper @joshmlewis @SocksMyRocks @AitizazK we'll be providing exhaustive evidence "showing that the change was reviewed, tested, and approved with segregation of duties prior to deployment to production". We've been in touch with the auditors since last year on SOC2 controls, in fact...

Inokentii Mykhailov@gregolsent·Apr 2

@rnescalz 100%! Spinning up an ephemeral container with the app running on the PR branch and having Claude with Playwright click through it is something we are thinking about as a step too!

Renan Cidale@rnescalz·Apr 2

@gregolsent tested against, sometimes just a FE change, might require some look into backend state to make sure that there is a match. Then we have the role of the orchestrator that basically is the smarter model that ping pongs and gets to decide whether the work done by the investigator

Inokentii Mykhailov@gregolsent·Apr 2

Bonus point: can you guess what our "logical correctness" subagent is? Tip: it is not Claude ;-)

Inokentii Mykhailov@gregolsent·Apr 2

Only driving a cultural change will get us to 50+% auto-approval – creating a positive incentive to stop pushing massive unsafe AI slop that is harder to review (human or not). Instead ship product incrementally, faster and hence safer. Shipping is our heart-beat after all!