Inokentii Mykhailov

102 posts

@gregolsent

Principal Engineer at @intercom

Joined June 2010
71 Following, 365 Followers

Darragh Curran @darraghcurran
Last week I wrote about how we smashed our 2x goal at @intercom, and one number attracted the most surprise: 19% of PRs are now auto-approved by AI, with no human in the loop. Today @gregolsent and Niamh Young published the deep dive on how we built this, and why it makes us safer, not less safe.

Inokentii Mykhailov @gregolsent
@irvinebroque Absolutely! Our safety sub-agent explicitly calls out a missing feature flag, or a flag that doesn’t fully “cover” the diff.

Brendan Irvine-Broque @irvinebroque
@gregolsent Are the agent-approved and merged changes guarded by feature flags? Is that an input the review agent considers?

Inokentii Mykhailov @gregolsent
You hear "coding is solved" a lot these days. Maybe. What's not solved yet is delivering massive product changes to your customers quickly and safely. An agentic coding tool will gladly create a thousand-line pull request on your behalf – but no human or bot review can properly de-risk those monsters.
The only proven way to ship a large, risky diff is to split it into smaller, self-contained MVPs, uncovering all the unknown unknowns one by one, incrementally, in production – your only objective reality.
Our ambitious goal is to create a positive incentive in the socio-technical system (Intercom's R&D) for folks to ship in smaller increments, faster and hence safer. Everyone will have a choice: either create a large PR and wait hours for a human review, or ship things incrementally and benefit from streamlined reviews and auto-approvals from our agent.
Here's the post with more details: intercom.com/blog/ai-is-app…

Inokentii Mykhailov @gregolsent
Pumping pressure into the system to surface real bottlenecks is the ONLY way to get your shipping factory modernised. AI code generation means nothing unless you set an ambitious target that drives real socio-technical change in your R&D. Hats off to @darraghcurran – a true pioneer!
Quoting Darragh Curran @darraghcurran:

9 months ago we publicly committed to 2x the productivity of our R&D org at @intercom. It was scary. It wasn't always clear we'd pull it off. We hit it with 3 months to spare. In fact, looking back 16 months - we've 3x'd. Here's what actually happened (with receipts): 🧵


Inokentii Mykhailov @gregolsent
@jonas It’s even easier to ship AI slop rubber-stamped by humans. Our ultimate goal is to modernize the product shipping factory. If you read the thread: 50% won’t be achieved by approving larger, riskier PRs, but by incentivizing smaller, safer, incremental changes.

Jonas Templestein @jonas
I’m as bullish as the next guy about AI, but man…
Setting a goal for 50% of PRs approved without human review is not going to end well.
It’s the same as aiming for “each engineer spends 2x their salary in tokens” or “our goal is to 3x headcount this year” (back in the pre-AI days).
Those may well be properties of the most effective organisations, but those organisations don’t have them as their primary goal.
V easy to achieve these goals with disastrous side effects.
Quoting Inokentii Mykhailov @gregolsent:

This week at Intercom we hit over 19% of PRs auto-approved by our PR review agent based on Claude Code. Our ambitious goal is to get to 50+% by the end of this month. I'll spill all the details below and you decide yourself if we are out of our damned minds or onto something...


Inokentii Mykhailov @gregolsent
@sebish Unfortunately, I can’t do that 😬 but I feel your pain and will pass the feedback on to our design team 🙇

Sebastian Tiller @sebish
@gregolsent Amazing! While you’re at it, can you make hyperlinks in the chat bubbles look like hyperlinks and not like underlined black text? Thank you 🙏

Inokentii Mykhailov @gregolsent
This week at Intercom we hit over 19% of PRs auto-approved by our PR review agent based on Claude Code. Our ambitious goal is to get to 50+% by the end of this month. I'll spill all the details below and you decide yourself if we are out of our damned minds or onto something...

Inokentii Mykhailov @gregolsent
@GrahamJCampbell Yeah, funny :-) The majority of our largest, most prolonged outages had nothing to do with product code changes, though. We have an evals dataset of historical PRs that were rolled back and reverted. Can you guess what % of those were approved by a human? ;-)

Inokentii Mykhailov @gregolsent
@SyedMSawaid @nateberkopec We expect more engineers to become factory builders, elevating product-capable folks to a higher level of abstraction: problem -> decisions -> outcome shipped, with technical details (PRs, CI, etc.) moving to the background.

Inokentii Mykhailov @gregolsent
@rockatanescu Exactly! We are driving a cultural change in a socio-technical system. The PR review agent has to be intentionally “picky” to incentivize smaller, incremental, faster – and hence safer – iterations.

Andrei Maxim @rockatanescu
I think the first tweet is a bit clickbait-y and it's very much worth looking at the details. For example, I think that one of the side-effects of this workflow is that a lot of people will be encouraged to do small refactorings because it will be simple to get the PR merged.
Quoting Inokentii Mykhailov @gregolsent:

This week at Intercom we hit over 19% of PRs auto-approved by our PR review agent based on Claude Code. Our ambitious goal is to get to 50+% by the end of this month. I'll spill all the details below and you decide yourself if we are out of our damned minds or onto something...


Inokentii Mykhailov @gregolsent
@madsenmm It would be an interesting experiment to check whether the agent has any bias towards AI slop. If we get the system prompt right, I expect the opposite.

Tobias Madsen @madsenmm
@gregolsent But… if the code is written by AI. The chance of approval seems very high?

Inokentii Mykhailov @gregolsent
@abolabz Not easy, but safe: small, incremental, feature-flagged changes. An LLM can easily generate a 1k LOC file; it is much harder to force it into a modular design. But our auto-approval “carrot” will shift the culture towards safety 🤞🏻

Aurélien @abottaz
@gregolsent Yes. They will do the easy things, code in isolation, and try not to trigger anything that would involve human reviews or discussions.

Inokentii Mykhailov @gregolsent
@joshmlewis @TheITBagpiper @SocksMyRocks @AitizazK Some PRs are AI-only. Obviously I can't share details of the audit process, but you have to build a strong case demonstrating that you're not just YOLO-ing your approvals and that you're building a robust, auditable system. They know where the industry is going...

Josh Lewis @joshmlewis
@gregolsent @TheITBagpiper @SocksMyRocks @AitizazK So is it still AI-reviewed all the way to prod, or is a human testing/reviewing at some point? If it is all AI, what was the auditors' consensus on how to stay kosher? I think people could benefit from those learnings.

Inokentii Mykhailov @gregolsent
@TheITBagpiper @joshmlewis @SocksMyRocks @AitizazK We'll be providing exhaustive evidence "showing that the change was reviewed, tested, and approved with segregation of duties prior to deployment to production". In fact, we've been in touch with the auditors about SOC2 controls since last year...

Inokentii Mykhailov @gregolsent
@rnescalz 100%! Spinning up an ephemeral container with the app running on the PR branch, and having Claude click through it with Playwright, is something we are considering as a step too!

Renan Cidale @rnescalz
@gregolsent tested against. Sometimes even a FE-only change might require a look into the backend state to make sure there is a match. Then there is the role of the orchestrator, which is basically the smarter model that ping-pongs and gets to decide whether the work done by the investigator

Inokentii Mykhailov @gregolsent
Bonus point: can you guess what our "logical correctness" subagent is? Tip: it is not Claude ;-)

Inokentii Mykhailov @gregolsent
Only driving a cultural change will get us to 50+% auto-approval – creating a positive incentive to stop pushing massive, unsafe AI slop that is harder to review (by a human or not), and to ship product incrementally instead: faster and hence safer. Shipping is our heartbeat, after all!