Kapil

646 posts

@KapilBuilds

AI systems | Workflow automation | Building tools that save time

Joined November 2024
80 Following, 29 Followers
Kapil
Kapil@KapilBuilds·
@Yuchenj_UW true and the hard part is that the switch itself breaks context, so even a stronger model often loses some advantage when you have to re-explain everything.
0
0
0
27
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
One big problem with agentic coding today is that models are pretty “spiky.” For example, Claude Opus is better at frontend + agentic workflows, while GPT-5.4 is better at backend + distributed systems. But Claude Code and Codex are locked into their own models. You also often have to jump between them. I sometimes write code with Claude, then when it has a complex bug, I have to spin up a separate terminal to have Codex review it. Ideally, you’d want multiple models collaborating within the same context. Automatic model routing and cross-model or agent collaboration will be a huge unlock. There are a few technical challenges. Early model routers (like ChatGPT’s) were pretty rough. (Cursor and OpenCode seem to be in the best position to do this. Let me know if they already have strong model routers.)
83
17
293
30.1K
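The routing idea in the post above, where a task goes to whichever model suits it while every model sees the same context, can be sketched roughly as follows. This is a minimal illustration, not how Cursor, OpenCode, or any real router works: the `ROUTES` table just encodes the strengths claimed in the post, the model labels are placeholders rather than real API identifiers, and `Session` is a hypothetical name.

```python
from dataclasses import dataclass, field

# Hypothetical routing table, taken from the strengths claimed in the post.
# The model labels are placeholders, not real API model identifiers.
ROUTES = {
    "frontend": "claude-opus",
    "agentic": "claude-opus",
    "backend": "gpt-5.4",
    "distributed-systems": "gpt-5.4",
}
DEFAULT_MODEL = "claude-opus"  # assumption: fall back to one model for unknown tasks


@dataclass
class Session:
    """One shared context that survives a switch between models."""
    history: list = field(default_factory=list)

    def route(self, task_type: str, prompt: str) -> str:
        # Pick a model by task type, then record the turn in the shared
        # history so the next model does not need everything re-explained.
        model = ROUTES.get(task_type, DEFAULT_MODEL)
        self.history.append(
            {"model": model, "task_type": task_type, "prompt": prompt}
        )
        return model


session = Session()
print(session.route("frontend", "Build the settings page"))
print(session.route("backend", "Review the retry logic for this bug"))
print(len(session.history), "turns kept in the shared context")
```

The point of the single `history` list is the one raised in the reply: the switch only pays off if context carries over with it.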
Kapil
Kapil@KapilBuilds·
The role of engineers is changing in a subtle way. The leverage from AI is real, but it mostly rewards people who already know how to structure problems, reduce complexity, and judge tradeoffs. Writing code is becoming cheaper. Good technical judgment is not. That may be why the strongest engineers still stand out even more when everyone has access to similar tools.
0
0
0
10
Kapil
Kapil@KapilBuilds·
While everyone debates GPT-5.4 vs Claude Opus benchmarks, Anthropic quietly gave Apple, Microsoft, Amazon, Google and Cisco access to a model that can find vulnerabilities in their own products that no human researcher ever caught. That's a different conversation entirely.
0
1
1
29
Kapil
Kapil@KapilBuilds·
@svpino a lot of this does feel like positioning before public comparison happens.
0
0
0
117
Santiago
Santiago@svpino·
You've been saying the same things since GPT-2:
• We can't release this
• We have god in a bottle
• We are all doomed
• This one is different fr
• AGI is here
• End of civilization next week
So spare me the grandiose speech, show us the model, and let people judge.
80
63
786
20.5K
Kapil
Kapil@KapilBuilds·
@signulll Yeah, remote control and dispatch feel like real game changers. That’s where it starts moving from assistant to actual workflow layer.
0
0
0
1K
signüll
signüll@signulll·
it’s incredibly fascinating to see anthropic build & ship what is effectively an os for almost all of white collar labor. claude code is the base layer, mcp is the linker, & each model upgrade is a flag that optimizes everything above it simultaneously. this is what compounding actually looks like in the ai era & it’s why the gap between anthropic & everyone else might actually be wider than maybe most think.
45
31
837
53.4K
Kapil
Kapil@KapilBuilds·
@forgebitz Honestly, some of it just feels like marketing. Perception shifts faster than the actual product most of the time.
0
0
1
66
Klaas
Klaas@forgebitz·
coming out with "the best ai model" for coding and cybersecurity a week after leaking your entire source code is wild
39
90
1.4K
21.1K
Kapil
Kapil@KapilBuilds·
@yezhang1998 It's a valid concern, probably why many enterprises will prefer tighter self-hosted or isolated workflows as usage deepens.
0
0
1
2.9K
Ye Zhang
Ye Zhang@yezhang1998·
Genuine question: if every major company starts scanning their entire codebase with Claude Code (auth flows, security logic, vuln patches, all of it) isn’t that a massive privacy concern? Anthropic would theoretically get visibility into the core product logic and security infrastructure for huge chunks of the world. Even with enterprise “no training” promises and zero-retention policies, that’s a lot of trust to place in a single vendor. Am I overthinking this, or is it actually problematic?
210
47
1.7K
247.3K
Kapil
Kapil@KapilBuilds·
GitHub just hit 1 billion commits in a single year. A 25% jump from last year. The reason? AI coding tools. Developers are now merging 43 million pull requests every month. Software is being written faster than at any point in human history, and we're just getting started. If you're not shipping AI-assisted projects in 2026, you're not competing. You're watching.
1
0
1
12
Kapil
Kapil@KapilBuilds·
Anthropic's Claude Mythos benchmarks dropped today. The numbers are insane. Here's Mythos vs Opus 4.6, their previous best model:
SWE-bench Verified (coding): 80.8% → 93.9%
SWE-bench Pro (hard coding): 53.4% → 77.8%
Terminal-Bench (agentic): 65.4% → 82.0%
Humanity's Last Exam: 40.0% → 56.8%
SWE-bench Multimodal: 27.1% → 59.0%
CyberGym (security): 66.6% → 83.1%
SWE-bench Multimodal more than doubled. SWE-bench Pro jumped 24 points. Anthropic built it and decided the world isn't ready for it yet. That's either the most responsible thing an AI company has ever done, or the wildest flex in tech history.
0
0
0
32
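For anyone who wants to check the deltas quoted in the post above, here is a small sketch that recomputes the point gain and the ratio for each benchmark. The scores are copied from the post as written and are not independently verified; `scores` and `deltas` are names chosen for this example only.

```python
# Scores copied from the post above as (Opus 4.6, Mythos), in percent.
# They are the post's figures, not independently verified numbers.
scores = {
    "SWE-bench Verified": (80.8, 93.9),
    "SWE-bench Pro": (53.4, 77.8),
    "Terminal-Bench": (65.4, 82.0),
    "Humanity's Last Exam": (40.0, 56.8),
    "SWE-bench Multimodal": (27.1, 59.0),
    "CyberGym": (66.6, 83.1),
}


def deltas(table):
    """Return {benchmark: (point_gain, ratio)}, rounded for display."""
    return {
        name: (round(new - old, 1), round(new / old, 2))
        for name, (old, new) in table.items()
    }


results = deltas(scores)
for name, (gain, ratio) in results.items():
    print(f"{name}: +{gain} points, {ratio}x the previous score")
```

Reporting both columns matters because the two claims in the post use different measures: "doubled" is a ratio, while "24 points" is an absolute gain.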
Kapil
Kapil@KapilBuilds·
@Yuchenj_UW I think the better engineers are the ones who keep reducing complexity, because simpler systems are usually easier to scale, debug, and maintain.
0
0
0
53
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
The dumbest way to evaluate engineers today is by counting lines of code.
128
22
583
66.2K
Kapil
Kapil@KapilBuilds·
@haider1 right now, distribution may matter more than model quality. Once models get close enough, adoption will create the edge.
0
0
2
254
Haider.
Haider.@haider1·
and suddenly, it looks like the AI race is between openai, anthropic, and google
they have the best models, research, talent, and the compute needed to keep scaling.
i would not rule out xAI, because elon is building data centers very quickly
but meta, microsoft, amazon, and apple do not seem to have much of a chance right now
115
19
429
34K
Kapil
Kapil@KapilBuilds·
Claude Code just shipped Remote Control, and it changes everything about how developers work. You kick off a complex task in your terminal. Close your laptop. Walk away. Your code never leaves your machine, but from your phone, you can:
→ Watch Claude refactor files in real time
→ Approve or redirect actions mid-task
→ Keep your full local environment (filesystem, MCP servers, config) alive
No cloud upload, no port forwarding. Just scan a QR code and go. The Claude Code PM literally said: "Take a walk, see the sun, walk your dog, without losing your flow." This isn't autocomplete. This is an AI agent running your codebase while you live your life.
0
0
0
34
Kapil
Kapil@KapilBuilds·
Hot take: Claude Code is the most underrated tool in tech right now. 100,000+ GitHub stars. Used by engineers at Microsoft, Google, and OpenAI. It doesn't suggest code, it reads your entire project, makes multi-file edits, runs your tests, and handles PRs. All from your terminal. In plain English. When did this become normal? #ClaudeCode #AITools
1
0
0
20
Kapil
Kapil@KapilBuilds·
Nobody is talking about the real AI shift happening right now. It's not ChatGPT. It's not Gemini. It's AI agents quietly replacing entire workflows, research, emails, scheduling, customer support. The people building these agents today are going to look like wizards in 18 months. What's one task you'd automate first? #AIAgents #Automation
1
0
1
27
Kapil
Kapil@KapilBuilds·
@shiri_shh Feels like distribution is becoming even more valuable now.
0
0
0
40
shirish
shirish@shiri_shh·
people in marketing are going to make a lot of money over the next few years. might be the right time to pivot.
152
77
1.9K
165.6K
Kapil
Kapil@KapilBuilds·
@yacineMTB Yeah, now the hard part is knowing when the output is actually right.
0
0
1
40
kache
kache@yacineMTB·
Coding agents are the ultimate unhobbling of talent. The quality of your code is now only limited by you, not by the human limitations of actually writing out code and remembering all the abstract syntax. The only limit now is understanding what the computer is actually doing
41
29
588
18.4K
Kapil
Kapil@KapilBuilds·
@Joestar_sann a lot of people just wanted something practical fast, not necessarily the most optimal path.
0
0
0
6
Joestar
Joestar@Joestar_sann·
so let me get this straight

all of ai twitter was telling people to buy a mac mini to run openclaw, which is literally just a framework, an orchestration layer that sends api requests to actual ai models. something you can run on a $5/month vps. which is exactly what i do btw

but when google drops gemma 4, an actual large language model that you can run and fine-tune locally on that same mac mini, with no api costs, no subscriptions, no third party dependencies, completely yours under apache 2.0

the ai community is silent

you were buying $800 hardware to run a wrapper but ignoring the actual ai model that would justify that hardware

this tells you everything you need to know about the average iq of ai twitter
723
1.3K
22.2K
807.1K
Kapil
Kapil@KapilBuilds·
@Yuchenj_UW Makes sense why Claude gets expensive fast when people use it like an actual worker.
0
0
0
329
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
I’m pretty sure the $20/$200 subscription pricing was vibe-coded by OpenAI, then copied by Anthropic. That pricing works for chatbots, not agents. A 24/7 agent can burn through orders of magnitude more tokens than a user chatting with a chatbot. Now they’re stuck. Neither Anthropic nor OpenAI wants to be the first to change pricing and risk user churn, so the options are: keep subsidizing, get more GPUs, tighten rate limits, and enforce rules like limiting 3rd-party apps. I wouldn’t be surprised if intelligence gets more expensive, not cheaper.
191
67
1.8K
216.6K
Kapil
Kapil@KapilBuilds·
@ravikiran_dev7 feels like the degree stayed the same, but the market around it changed way too fast.
1
0
2
334
Ray🫧
Ray🫧@ravikiran_dev7·
Computer Science went from one of the absolute best degrees to pursue to one of the worst, all within a decade. Absolutely nuts!
145
93
1.6K
141.8K
Kapil
Kapil@KapilBuilds·
@pcshipp Feels bad when it stops in the middle of work.
0
0
2
421
pc
pc@pcshipp·
Hot take: Purchasing Claude Code for $20 is the biggest scam
126
18
626
41.3K