Kapil

646 posts

@KapilBuilds

AI systems | Workflow automation | Building tools that save time

Joined November 2024

80 Following · 29 Followers

Kapil@KapilBuilds·4h

@Yuchenj_UW True, and the hard part is that the switch itself breaks context, so even a stronger model often loses some of its advantage when you have to re-explain everything.

Yuchen Jin@Yuchenj_UW·16h

One big problem with agentic coding today is that models are pretty “spiky.” For example, Claude Opus is better at frontend + agentic workflows, while GPT-5.4 is better at backend + distributed systems. But Claude Code and Codex are locked into their own models. You also often have to jump between them. I sometimes write code with Claude, then when it has a complex bug, I have to spin up a separate terminal to have Codex review it. Ideally, you’d want multiple models collaborating within the same context. Automatic model routing and cross-model or agent collaboration will be a huge unlock. There are a few technical challenges. Early model routers (like ChatGPT’s) were pretty rough. (Cursor and OpenCode seem to be in the best position to do this. Let me know if they already have strong model routers.)

Kapil@KapilBuilds·2d

The role of engineers is changing in a subtle way. The leverage from AI is real, but it mostly rewards people who already know how to structure problems, reduce complexity, and judge tradeoffs. Writing code is becoming cheaper. Good technical judgment is not. That may be why the strongest engineers still stand out even more when everyone has access to similar tools.

Kapil@KapilBuilds·2d

While everyone debates GPT-5.4 vs Claude Opus benchmarks, Anthropic quietly gave Apple, Microsoft, Amazon, Google and Cisco access to a model that can find vulnerabilities in their own products that no human researcher ever caught. That's a different conversation entirely.

Kapil@KapilBuilds·2d

@svpino A lot of this does feel like positioning before public comparison happens.

Santiago@svpino·2d

You've been saying the same things since GPT-2: • We can't release this • We have god in a bottle • We are all doomed • This one is different fr • AGI is here • End of civilization next week So spare me the grandiose speech, show us the model, and let people judge.

Kapil@KapilBuilds·2d

@signulll Yeah, remote control and dispatch feel like real game changers. That’s where it starts moving from assistant to actual workflow layer.

signüll@signulll·2d

it’s incredibly fascinating to see anthropic build & ship what is effectively an os for almost all of white collar labor. claude code is the base layer, mcp is the linker, & each model upgrade is a flag that optimizes everything above it simultaneously. this is what compounding actually looks like in the ai era & it’s why the gap between anthropic & everyone else might actually be wider than maybe most think.

Kapil@KapilBuilds·2d

@forgebitz Honestly, some of it just feels like marketing. Perception shifts faster than the actual product most of the time.

Klaas@forgebitz·2d

coming out with "the best ai model" for coding and cybersecurity a week after leaking your entire source code is wild

Kapil@KapilBuilds·2d

@yezhang1998 It's a valid concern, and probably why many enterprises will prefer tighter self-hosted or isolated workflows as usage deepens.

Ye Zhang@yezhang1998·3d

Genuine question: if every major company starts scanning their entire codebase with Claude Code (auth flows, security logic, vuln patches, all of it) isn’t that a massive privacy concern? Anthropic would theoretically get visibility into the core product logic and security infrastructure for huge chunks of the world. Even with enterprise “no training” promises and zero-retention policies, that’s a lot of trust to place in a single vendor. Am I overthinking this, or is it actually problematic?

Kapil@KapilBuilds·3d

GitHub just hit 1 billion commits in a single year. A 25% jump from last year. The reason? AI coding tools. Developers are now merging 43 million pull requests every month. Software is being written faster than at any point in human history, and we're just getting started. If you're not shipping AI-assisted projects in 2026, you're not competing. You're watching.

Kapil@KapilBuilds·3d

Anthropic's Claude Mythos benchmarks dropped today. The numbers are insane. Here's Mythos vs Opus 4.6, their previous best model:
SWE-bench Verified (coding): 80.8% → 93.9%
SWE-bench Pro (hard coding): 53.4% → 77.8%
Terminal-Bench (agentic): 65.4% → 82.0%
Humanity's Last Exam: 40.0% → 56.8%
SWE-bench Multimodal: 27.1% → 59.0%
CyberGym (security): 66.6% → 83.1%
SWE-bench Multimodal more than doubled. SWE-bench Pro jumped +24 points. Anthropic built it and decided the world isn't ready for it yet. That's either the most responsible thing an AI company has ever done, or the wildest flex in tech history.
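
For anyone who wants to check the arithmetic behind "doubled" and "+24 points", here is a minimal sketch. It only uses the scores quoted in the post above; the `deltas` helper and the `scores` dictionary are illustrative names, not part of any benchmark tooling.

```python
# Scores as quoted in the post (Opus 4.6 -> Mythos), in percent.
# These numbers are copied from the post, not independently verified.
scores = {
    "SWE-bench Verified": (80.8, 93.9),
    "SWE-bench Pro": (53.4, 77.8),
    "Terminal-Bench": (65.4, 82.0),
    "Humanity's Last Exam": (40.0, 56.8),
    "SWE-bench Multimodal": (27.1, 59.0),
    "CyberGym": (66.6, 83.1),
}


def deltas(before: float, after: float) -> tuple[float, float]:
    """Return (gain in percentage points, ratio of after to before)."""
    return round(after - before, 1), round(after / before, 2)


for name, (before, after) in scores.items():
    gain, ratio = deltas(before, after)
    print(f"{name}: +{gain} points, {ratio}x")
```

On these figures, SWE-bench Pro gains 24.4 points and SWE-bench Multimodal ends at about 2.18 times its earlier score.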

Kapil@KapilBuilds·3d

@Yuchenj_UW I think the better engineers are the ones who keep reducing complexity, because simpler systems are usually easier to scale, debug, and maintain.

Yuchen Jin@Yuchenj_UW·4d

The dumbest way to evaluate engineers today is by counting lines of code.

Kapil@KapilBuilds·3d

@haider1 Right now, distribution may matter more than model quality. Once models get close enough, adoption will create the edge.

Haider.@haider1·3d

and suddenly, it looks like the AI race is between openai, anthropic, and google they have the best models, research, talent, and the compute needed to keep scaling. i would not rule out xAI, because elon is building data centers very quickly but meta, microsoft, amazon, and apple do not seem to have much of a chance right now

Kapil@KapilBuilds·4d

Claude Code just shipped Remote Control, and it changes everything about how developers work. You kick off a complex task in your terminal. Close your laptop. Walk away. Your code never leaves your machine, but from your phone, you can:
→ Watch Claude refactor files in real time
→ Approve or redirect actions mid-task
→ Keep your full local environment (filesystem, MCP servers, config) alive
No cloud upload, no port forwarding. Just scan a QR code and go. The Claude Code PM literally said: "Take a walk, see the sun, walk your dog, without losing your flow." This isn't autocomplete. This is an AI agent running your codebase while you live your life.

Kapil@KapilBuilds·4d

Hot take: Claude Code is the most underrated tool in tech right now. 100,000+ GitHub stars. Used by engineers at Microsoft, Google, and OpenAI. It doesn't just suggest code. It reads your entire project, makes multi-file edits, runs your tests, and handles PRs. All from your terminal. In plain English. When did this become normal? #ClaudeCode #AITools

Kapil@KapilBuilds·4d

Nobody is talking about the real AI shift happening right now. It's not ChatGPT. It's not Gemini. It's AI agents quietly replacing entire workflows: research, emails, scheduling, customer support. The people building these agents today are going to look like wizards in 18 months. What's one task you'd automate first? #AIAgents #Automation

Kapil@KapilBuilds·4d

@shiri_shh Feels like distribution is becoming even more valuable now.

shirish@shiri_shh·5d

people in marketing are going to make a lot of money over the next few years. might be the right time to pivot.

Kapil@KapilBuilds·4d

@yacineMTB Yeah, now the hard part is knowing when the output is actually right.

kache@yacineMTB·4d

Coding agents are the ultimate unhobbling of talent. The quality of your code is now only limited by you, not by the human limitations of actually writing out code and remembering all the abstract syntax. The only limit now is understanding what the computer is actually doing

Kapil@KapilBuilds·4d

@Joestar_sann A lot of people just wanted something practical fast, not necessarily the optimal path.

Joestar@Joestar_sann·5d

so let me get this straight all of ai twitter was telling people to buy a mac mini to run openclaw, which is literally just a framework, an orchestration layer that sends api requests to actual ai models. something you can run on a $5/month vps. which is exactly what i do btw but when google drops gemma 4, an actual large language model that you can run and fine-tune locally on that same mac mini, with no api costs, no subscriptions, no third party dependencies, completely yours under apache 2.0 the ai community is silent you were buying $800 hardware to run a wrapper but ignoring the actual ai model that would justify that hardware this tells you everything you need to know about the average iq of ai twitter

Kapil@KapilBuilds·4d

@Yuchenj_UW Makes sense why Claude gets expensive fast when people use it like an actual worker.

Yuchen Jin@Yuchenj_UW·4d

I’m pretty sure the $20/$200 subscription pricing was vibe-coded by OpenAI, then copied by Anthropic. That pricing works for chatbots, not agents. A 24/7 agent can burn through orders of magnitude more tokens than a user chatting with a chatbot. Now they’re stuck. Neither Anthropic nor OpenAI wants to be the first to change pricing and risk user churn, so the options are: keep subsidizing, get more GPUs, tighter rate limits, and enforce rules like limiting 3rd-party apps. I wouldn’t be surprised if intelligence gets more expensive, not cheaper.

Kapil@KapilBuilds·Apr 1

@ravikiran_dev7 Feels like the degree stayed the same, but the market around it changed way too fast.
