Mukul Sharma
@elitecoder · 694 posts

Agentic Workflow Engineer | Opinionated - All opinions my own. He/Him - If you disagree with me, let's discuss why.

Milpitas, CA · Joined April 2008
169 Following · 184 Followers
Mukul Sharma @elitecoder
Today, I don't have advice or answers. On my mind is a problem I don't really have a good handle on. Maybe people here have ideas.

For a very large org, it is very hard to mandate how engineers should standardize their use of Claude Code/Cursor/Codex, largely because everyone's skill and comfort level is different: some folks are advanced and comfortable automating their own workflows, while others do not yet know there is a better/easier path forward.

This becomes an even bigger issue when Security teams want to make sure agents follow security best practices, Frontend teams want to make sure agents use correct frontend dev practices (correct tokens/icon standards, etc.), and so on. This ultimately results in lots of Cursor/Claude rules or MCP tools checked into mono-repos, available for everyone to use, but without any heads-up that these rules are being added.

Keep in mind, these rules/MCP tools are added to establish a consistent agentic experience for most engineers. A noble goal, but their presence is largely invisible to almost everyone. How many of us actually run /context on a regular basis to keep an eye on what is loading into our context? And Anthropic making the 1M context size standard adds fuel to this fire, because the token count from tools/rules slowly builds up and you won't even notice a 2% increment to the total window size.

How are people who work in very large organizations and mono-repos handling this issue? How do we stay transparent when adding these rules, so advanced users are not blindsided, while also letting newcomers automatically get a standardized agentic experience?
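One small starting point for the visibility problem: a script that tallies the approximate token footprint of rules files checked into a repo. This is only a sketch; the glob patterns and the ~4-characters-per-token heuristic are assumptions for illustration, not a real Claude Code/Cursor API.

```python
# Sketch: estimate how many context tokens checked-in agent rules consume.
# The glob patterns and the 4-chars-per-token ratio are rough assumptions.
from pathlib import Path

RULE_GLOBS = ["**/.cursor/rules/*.md", "**/CLAUDE.md", "**/.claude/**/*.md"]

def estimate_rule_tokens(repo_root: str) -> dict[str, int]:
    """Return an approximate token count per rules file under repo_root."""
    root = Path(repo_root)
    totals: dict[str, int] = {}
    for pattern in RULE_GLOBS:
        for path in root.glob(pattern):
            text = path.read_text(errors="ignore")
            totals[str(path.relative_to(root))] = len(text) // 4
    return totals
```

Sorting the result by count and printing it in CI whenever a rules file changes would at least give engineers a heads-up that their context budget moved.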
Mukul Sharma @elitecoder
If you invest in learning one thing in this agentic/AI world, it should be teaching AI how to verify its own work. This statement can seem a little handwavy, but I have some examples to share. An agent verifying its own work can mean different things depending on the kind of work it's doing:

1. Building a user-facing feature - make sure integration tests are written and pass, and none of the existing feature set regresses.
2. Optimizing build speed - make sure building does not get slower.
3. Optimizing a user-facing operation - make sure the new FPS and p95 values meet your standards.
4. Optimizing page load speed - make sure it doesn't regress the UX and actually loads faster.

These are things engineers work on regularly. Some of these examples are easier to teach agents to validate; others are much harder. But if you can nail this one skill, the results can be jaw-dropping. Just this weekend, my Claude Code achieved the following three things for me:

1. Bazel build speed improved by 45% after cache warmup.
2. Drag performance improved to match 60 fps (was at 20 fps).
3. 1 second shaved off cold page load speed.

If you give Opus a target and the ability to validate its own work... the sky is the limit. Try it out!
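The pattern above can be sketched as a simple loop: the agent acts, a verification command (tests, a timed build, an FPS probe) judges the result, and failures are fed back. In this sketch, `ask_agent` is a stand-in for a real Claude Code/Codex invocation, not an actual API.

```python
# Sketch of a "verify your own work" loop. `ask_agent` is a placeholder for
# a real agent call; the verifier is any command whose exit code means pass/fail.
import subprocess

def run_verifier(cmd: list[str]) -> tuple[bool, str]:
    """Run a verification command (tests, benchmark) and report pass + log."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def agent_loop(task: str, verify_cmd: list[str], ask_agent, max_rounds: int = 5) -> bool:
    """Let the agent retry until its own verification command passes."""
    feedback = ""
    for _ in range(max_rounds):
        ask_agent(task, feedback)       # agent edits the code base
        ok, log = run_verifier(verify_cmd)
        if ok:
            return True                 # target met and verified
        feedback = log                  # feed the failure back to the agent
    return False
```

The same loop covers all four examples above; only `verify_cmd` changes (integration tests, a timed build, an FPS measurement, a page-load benchmark).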
Mukul Sharma retweeted
Mohit Sindhwani @onghu
"I just built... with CC/Codex" is the new "I searched for 2 hrs, tried 3 tools, and found one that does exactly what I want"... and I'm not sure how I feel about it. #Programming #Tech
Mukul Sharma retweeted
Anusheel Bhushan @sheel_ai
I built an agent swarm platform where anyone can launch an AI agent to play and compete on @arcprize ARC-AGI-3 games using plain-English strategy prompts, without writing a single line of code. Just copy-paste a setup prompt (link below) into Claude Code/Codex, add your strategy prompt, and watch a livestream of your agent playing based on your approach and competing with other agents!

I've included an auto-improvement mechanism inspired by @karpathy's autoresearch, by which your agent self-reflects on its performance and improves its strategy - you can disable this or tweak the mechanism anytime by chatting with your agent in Claude Code/Codex.

Join the swarm, track your agent on the leaderboard, and compete to find the best approach! arc-agi-swarm.vercel.app (h/t to @GregKamradt for the fun brainstorming)
Anusheel Bhushan tweet media
Mukul Sharma @elitecoder
We spent a month building something we might throw away. And I'm totally fine with it.

When we started building ForgeAI (github.com/elitecoder/for…), Opus 4.6 had just dropped. We were blown away by its ability to deliver solutions with senior-expert quality. So we designed Forge to break the software development process into bite-sized steps - small enough that Opus/Sonnet could execute them with high confidence and minimal hallucination. We built a Python harness to generate prompts for agent sub-processes, paired with LLM judges to verify the work.

That was a month ago. In this space, a month is a lifetime. Two recent developments are making me rethink that entire approach:

1. 1M context window now generally available for Opus 4.6 & Sonnet 4.6
2. Recursive Language Models - a novel solution for context rot (credit: Alex Zhang's research)

Together, these essentially eliminate the problem we were engineering around. We no longer need to obsess over carefully managing context rot. We can put more trust in advanced models to follow procedural directions and combat drift natively.

A month of work, potentially obsolete. But here's the thing - code is almost free. Lessons learned are what stay with you. I'm amazed at how fast this industry moves. I feel like I'm perpetually behind, but that also means new ideas every single day. What an exciting time to be building.

If you've gone through the same thought churn - tearing down what you just built because the ground shifted underneath you - let's talk. I'd love to connect with others navigating this space.

🔗 HN thread on 1M context: news.ycombinator.com/item?id=473671…
🔗 RLM research: alexzhang13.github.io/blog/2025/rlm/

#AI #LLM #BuildInPublic #AgenticAI #SoftwareEngineering #Claude #AnthropicAI #AIAgents #ContextWindow #StartupLife #MachineLearning #GenerativeAI #TechFounders
Mukul Sharma retweeted
Anusheel Bhushan @sheel_ai
I wrote a multi-agent loop for autoresearch from @karpathy. Result: 9/12 (75%) experiments improved val_bpb vs 15/83 (18%) in the original. It's continuing to run, so stay tuned!

Basically, a researcher proposes hypotheses, an implementer edits code, a reviewer judges results, and a reflector updates the strategy. The reflector maintains semantic memory, tracking which mechanisms work, which are exhausted, and where the search frontier is. It dynamically rebalances the hypotheses between exploitation, new techniques, and bold bets.
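Stripped of the LLM calls, the loop described here is roughly this shape. All four roles are callables you supply (in practice, model calls), and the memory dict is a toy stand-in for the reflector's semantic memory; only the wiring is shown.

```python
# Sketch of a researcher -> implementer -> reviewer -> reflector loop.
# Each role is a callable (in practice an LLM call); this only shows the wiring.
def autoresearch_loop(researcher, implementer, reviewer, reflector, rounds: int):
    memory = {"tried": [], "frontier": []}  # reflector's running notes
    results = []
    for _ in range(rounds):
        hypothesis = researcher(memory)                  # propose an experiment
        change = implementer(hypothesis)                 # edit code, run training
        verdict = reviewer(change)                       # judge: did the metric improve?
        memory = reflector(memory, hypothesis, verdict)  # update search strategy
        results.append((hypothesis, verdict))
    return results
```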
Anusheel Bhushan tweet media
Mukul Sharma @elitecoder
Over the past couple of months, we've changed how we work as a team with agents.

• We've built commands + skills to eliminate repetition.
• Created opinionated code review agents so humans can focus on architecture while bots handle the finer details.

What I am still actively thinking about is how to create a feedback loop when an agent makes mistakes. How do we identify where automated execution went wrong - bad plan, bad spec, or bad code? Would love insights from folks who have built agentic harnesses for mono-repos with a high quality bar.
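For the failure-attribution question, one possible sketch is a triage step that forces a judge to pick exactly one fault category per failure, so the feedback loop accumulates structured data. The `judge` callable is a stand-in (human or LLM call), and the categories come straight from the question above.

```python
# Sketch of a failure-triage step. `judge` is a stand-in (human or LLM call);
# the three fault categories come from the post above.
FAULT_KINDS = ("bad_plan", "bad_spec", "bad_code")

def triage_failure(spec: str, plan: str, diff: str, failure_log: str, judge) -> str:
    """Ask the judge to attribute a failure to the plan, the spec, or the code."""
    verdict = judge(spec=spec, plan=plan, diff=diff, log=failure_log)
    if verdict not in FAULT_KINDS:
        raise ValueError(f"judge must return one of {FAULT_KINDS}, got {verdict!r}")
    return verdict
```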
Mukul Sharma @elitecoder
@chintanturakhia Fantastic post. I'd love to pick your brain on feedback loops. My mono-repo is quite opinionated, and our current struggle is identifying where a failure originates - bad plan, bad spec, or bad code. Thoughts on that?
Chintan Turakhia @chintanturakhia
Back in January I told Eng two things:

1. Delete your IDE
2. Stop writing code

And build only through agents. In a few weeks, we:

• Built 30+ internal tools to 10x the way we work
• Created a deep library of agents + skills to kill repetitive work
• Formed "agent councils" for PR and app perf reviews
• Shipped multi-month projects in ~1 day

It was a clear mental shift to focus us on the things that matter most:

- Upstream intent.
- Downstream validation.

Engineering has always been about building with intent and judgement. The code was just a medium for expression. Now agents are that medium. Rip the bandaid off.
Chintan Turakhia tweet media
Mukul Sharma @elitecoder
I was so blown away by Opus that I built a whole critique pipeline around it. Interestingly, it is slow enough to make me question my decision. I think Sonnet is a great tradeoff for most functional critiques. You can always use Opus to pass the final verdict.
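A minimal sketch of that tradeoff, with `fast_critic` and `final_judge` as stand-ins for Sonnet and Opus calls (not real API signatures): the cheap model runs every routine check, and the expensive model is called once at the end.

```python
# Sketch of a tiered critique pipeline: a cheap, fast model runs every
# routine check, and the expensive model is called once for the final verdict.
def tiered_review(diff: str, checks: list, fast_critic, final_judge):
    """Collect cheap findings per check, then escalate once for the verdict."""
    findings = []
    for check in checks:
        findings.extend(fast_critic(diff, check))  # e.g. style, tests, naming
    return final_judge(diff, findings)             # single expensive call
```

The design choice: latency and cost scale with the number of checks on the fast model, while the slow model's cost stays constant at one call per review.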
Mukul Sharma @elitecoder
Working towards iteratively building a Software Factory. I'll try to make it as plug-and-play as possible. Of course, every team's workflow is different. But that's what makes it a fascinating problem to solve. What's consuming me today is how not to burn tokens.
Mukul Sharma retweeted
Matt Van Horn @mvanhorn
Just shipped /last30days. A Claude Code skill for @claudeai that scans the last 30 days on Reddit, X, and the web for any topic and returns prompt patterns + new releases + workflows that work right now. Last 30 days of research. 30 seconds of work. 👉 github.com/mvanhorn/last3…
Mukul Sharma @elitecoder
@kyleshevlin I once received interview feedback that I hit Compile + Run too much, showing a lack of confidence in my code 🤷‍♂️
Mukul Sharma @elitecoder
@samikatplays Sometimes I forget that a trailer is scheduled. I appreciate seeing a picture as a reminder that I should catch up on it. And if I already know the trailer is out, I can stay off Twitter to avoid spoilers. 🤷‍♂️
SamiKat @samikatplays
After my spoiler apology tweet yesterday and some DMs since, I pose a question… Are screenshots from or tweets about a trailer that are posted within an hour of the trailer's official release considered spoilers? Essentially, can you spoil a trailer?
SamiKat @samikatplays
Today I was told I should cap my frame rate during the Corrupted GM boss fight. Apparently this isn't supposed to happen. (Let alone FOUR times.) I've never had to walk away from my computer during a stream before, but today... Upset doesn't even begin to describe it. #Destiny2
Mukul Sharma @elitecoder
@samikatplays I appreciate your choice of loadout and don't understand the "did you know" crowd. If someone is struggling while in a group, sure, make suggestions. But you are a very focused content creator, and doing solo stuff requires you to be at the top of your game. 🤷‍♂️
SamiKat @samikatplays
So the moral of the story… DID YOU KNOW YOU CAN GET ANARCHY FROM THE TOWER?!?! Just go there and get it from that kiosk by Shaxx. It’s really easy to do.
SamiKat @samikatplays
Why I Refuse to Get Anarchy: A Story I started running solo dungeons in Sept 2020, during a time when Anarchy reigned supreme. Being me, I only ever ran all-bows in dungeons, making sure to make my insanity clear by noting that specifically in my title. Of course, in came the