Shriyash Upadhyay

267 posts

@shriyashku

Founder @withmartian

Joined February 2018
127 Following, 368 Followers
Pinned Tweet
Shriyash Upadhyay @shriyashku
Safety is being subsumed to capitalism because when there is misalignment between the two, capitalism wins. The only way to make sure AI is safe is to make a strong capitalist case for the technologies that will make AI safe: creating an economic incentive to understand models.
Shriyash Upadhyay retweeted
Martian @withmartian
We've been tracking AI code review tools across OSS, and a new category is emerging. We're calling it "Deep Review": → Standard AI review: PR-level, fast, human in the loop → Deep Review: repo-wide context, runs autonomously in the background 🧵👇
Shriyash Upadhyay @shriyashku
First was codegen, now code review. Every product category will have background agents. Tools in most fields talk about augmenting humans, but that’s a bad design pattern. It encourages humans to be the bottleneck. Things will just happen in the background, automatically
Martian @withmartian

We've been tracking AI code review tools across OSS, and a new category is emerging. We're calling it "Deep Review": → Standard AI review: PR-level, fast, human in the loop → Deep Review: repo-wide context, runs autonomously in the background 🧵👇

Shriyash Upadhyay @shriyashku
We caught the same pattern with Claude Code Review. Reviews written by claude-code improved in the data weeks before Anthropic's announcement on Monday. This is how we figured out the launch was coming before the blog post dropped.
Shriyash Upadhyay retweeted
CodeRabbit @coderabbitai
Every AI code review benchmark published so far has one thing in common: they were all made by vendors. And somehow, their own tool always wins. That just changed with the first independent benchmark. Here's how we performed on real OSS PRs! 👇
Shriyash Upadhyay retweeted
Rohan Paul @rohanpaul_ai
The developer space is absolutely on fire over the last few days. 🔥

And now we have Martian releasing the largest coding benchmark ever to evaluate how well AI agents review your daily code. And it's open-sourced. This is also the first unbiased code review benchmark to finally stop AI models from cheating on tests.

The real breakthrough is that this is the first "self-correcting" benchmark that can't be gamed by marketing teams or lazy training data. Most benchmarks are like a fixed school exam that never changes; once the "students" (the AI models) see the questions enough times, they just memorize the answers, and the test becomes useless.

Martian structurally fixed this by introducing a Dual-Layer Evaluation system. They have an "Offline" layer (a fair, side-by-side test on static data) and an "Online" layer (tracking real-world behavior of what developers actually use). If an AI company tries to "cheat" by optimizing their model specifically for the offline test, their score will stop matching the real-world usage in the online layer, and everyone will see they are faking it.

This dual method completely stops companies from rigging the scores and proves which tools actually work. This is the first time we've had a measuring stick for AI that actually survives contact with the real world without breaking down or becoming biased over time.

They combined live data from human behavior with isolated offline tests to evaluate over 200,000 code changes. The system remains totally neutral because the creators do not sell any coding assistants themselves. Software teams finally have a reliable measuring standard that adapts to the real world and never breaks.
Martian @withmartian

Introducing Code Review Bench v0: codereview.withmartian.com The first independent code review benchmark. 200,000+ PRs. Unbiased. Fully OSS. Updated daily. Tool performance highlights 🧵👇 Featuring: @augmentcode @baz_scm @claudeai @coderabbitai @cursor @GeminiApp @github @graphite @greptile @kilocode @OpenAIDevs @propelcode @QodoAI

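
The dual-layer idea described in the Rohan Paul post above (an offline score checked against online, real-world behavior) can be sketched roughly as follows. This is only an illustration under assumptions: the `ToolScores` fields, the rank-gap rule, the `max_rank_gap` threshold, and the tool names and numbers are all hypothetical and are not taken from Code Review Bench itself.

```python
from dataclasses import dataclass


@dataclass
class ToolScores:
    name: str
    offline: float  # hypothetical score on the static, side-by-side test set
    online: float   # hypothetical score derived from real-world usage


def rank(values):
    """Return 1-based ranks; the highest value gets rank 1 (ties keep input order)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def flag_divergent(tools, max_rank_gap=2):
    """Flag tools whose offline rank beats their online rank by more than max_rank_gap.

    A large gap is treated here as a hint that a tool may be tuned to the
    offline test rather than to real usage. The rule is an assumption.
    """
    offline_ranks = rank([t.offline for t in tools])
    online_ranks = rank([t.online for t in tools])
    return [
        tool.name
        for tool, off, on in zip(tools, offline_ranks, online_ranks)
        if on - off > max_rank_gap
    ]


# Invented illustration data, not real benchmark results.
example_tools = [
    ToolScores("tool_a", offline=0.9, online=0.3),
    ToolScores("tool_b", offline=0.8, online=0.9),
    ToolScores("tool_c", offline=0.7, online=0.8),
    ToolScores("tool_d", offline=0.6, online=0.7),
]
print(flag_divergent(example_tools))
```

Comparing ranks rather than raw scores keeps the check independent of how each layer scales its numbers; a real system would presumably use a more careful statistical comparison.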