Mukul Sharma
@elitecoder · 694 posts

Agentic Workflow Engineer | Opinionated - All opinions my own. He/Him - If you disagree with me, let's discuss why.

Milpitas, CA · Joined April 2008
169 Following · 184 Followers
Mukul Sharma @elitecoder
Today, I don't have advice or answers. On my mind is a problem I don't really have a good handle on. Maybe people here have ideas.

For a very large org, it is very hard to mandate how engineers should standardize their use of Claude Code/Cursor/Codex, largely because everyone's skill and comfort level is different: some folks are advanced and comfortable automating their own workflows, while others do not yet know there is a better/easier path forward.

This becomes an even bigger issue when Security teams want to make sure agents follow security best practices, Frontend teams want to make sure agents use correct frontend dev practices (correct tokens/icon standards, etc.), and so on. This ultimately results in lots of Cursor/Claude rules or MCP tools checked into mono-repos, available for everyone to use, but without any heads-up that these rules are being added.

Keep in mind, these rules/MCP tools are added to establish a consistent agentic experience for most engineers. A noble goal, but their presence is largely invisible to almost everyone. How many of us actually run /context on a regular basis to keep an eye on what is loading into our context? And Anthropic making the 1M context size standard adds fuel to this fire, because the token count from tools/rules slowly builds up and you won't even notice a 2% increment to the total window size.

How are people who work in very large organizations and mono-repos handling this issue? How do we stay transparent when adding these rules, so advanced users are not blindsided, while also letting newcomers automatically get a standardized agentic experience?
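One small starting point for the visibility problem: a script that tallies the approximate token footprint of rules files checked into a repo. This is only a sketch; the glob patterns and the ~4-characters-per-token heuristic are assumptions for illustration, not a real Claude Code/Cursor API.

```python
# Sketch: estimate how many context tokens checked-in agent rules consume.
# The glob patterns and the 4-chars-per-token ratio are rough assumptions.
from pathlib import Path

RULE_GLOBS = ["**/.cursor/rules/*.md", "**/CLAUDE.md", "**/.claude/**/*.md"]

def estimate_rule_tokens(repo_root: str) -> dict[str, int]:
    """Return an approximate token count per rules file under repo_root."""
    root = Path(repo_root)
    totals: dict[str, int] = {}
    for pattern in RULE_GLOBS:
        for path in root.glob(pattern):
            text = path.read_text(errors="ignore")
            totals[str(path.relative_to(root))] = len(text) // 4
    return totals
```

Sorting the result by count and printing it in CI whenever a rules file changes would at least give engineers a heads-up that their context budget moved.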
Mukul Sharma @elitecoder
If you invest in learning one thing in this agentic/AI world, it should be teaching AI how to verify its own work. This statement can seem a little handwavy, but I have some examples to share. An agent verifying its own work can mean different things depending on the kind of work it's doing:

1. Building a user-facing feature - make sure integration tests are written and pass, and none of the existing feature set regresses.
2. Optimizing build speed - make sure building does not get slower.
3. Optimizing a user-facing operation - make sure the new FPS and p95 values meet your standards.
4. Optimizing page load speed - make sure it doesn't regress the UX and actually loads faster.

These are things engineers work on regularly. Some of these examples are easier to teach agents to validate; others are much harder. But if you can nail this one skill, the results can be jaw-dropping. Just this weekend, my Claude Code achieved the following three things for me:

1. Bazel build speed improved by 45% after cache warmup.
2. Drag performance improved to match 60 fps (was at 20 fps).
3. 1 second shaved off cold page load speed.

If you give Opus a target and the ability to validate its own work... the sky is the limit. Try it out!
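The pattern above can be sketched as a simple loop: the agent acts, a verification command (tests, a timed build, an FPS probe) judges the result, and failures are fed back. In this sketch, `ask_agent` is a stand-in for a real Claude Code/Codex invocation, not an actual API.

```python
# Sketch of a "verify your own work" loop. `ask_agent` is a placeholder for
# a real agent call; the verifier is any command whose exit code means pass/fail.
import subprocess

def run_verifier(cmd: list[str]) -> tuple[bool, str]:
    """Run a verification command (tests, benchmark) and report pass + log."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def agent_loop(task: str, verify_cmd: list[str], ask_agent, max_rounds: int = 5) -> bool:
    """Let the agent retry until its own verification command passes."""
    feedback = ""
    for _ in range(max_rounds):
        ask_agent(task, feedback)       # agent edits the code base
        ok, log = run_verifier(verify_cmd)
        if ok:
            return True                 # target met and verified
        feedback = log                  # feed the failure back to the agent
    return False
```

The same loop covers all four examples above; only `verify_cmd` changes (integration tests, a timed build, an FPS measurement, a page-load benchmark).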
Mukul Sharma retweeted
Mohit Sindhwani @onghu
"I just built... with CC/Codex" is the new "I searched for 2 hrs, tried 3 tools, and found one that does exactly what I want"... and I'm not sure how I feel about it. #Programming #Tech
Mukul Sharma retweeted
Anusheel Bhushan @sheel_ai
I built an agent swarm platform where anyone can launch an AI agent to play and compete on @arcprize ARC-AGI-3 games using plain-English strategy prompts, without writing a single line of code. Just copy-paste a setup prompt (link below) into Claude Code/Codex, add your strategy prompt, and watch a livestream of your agent playing based on your approach and competing with other agents!

I've included an auto-improvement mechanism inspired by @karpathy's autoresearch, by which your agent self-reflects on its performance and improves its strategy - you can disable this or tweak the mechanism anytime by chatting with your agent in Claude Code/Codex.

Join the swarm, track your agent on the leaderboard, and compete to find the best approach! arc-agi-swarm.vercel.app (h/t to @GregKamradt for the fun brainstorming)
Anusheel Bhushan tweet media
Mukul Sharma @elitecoder
We spent a month building something we might throw away. And I'm totally fine with it.

When we started building ForgeAI (github.com/elitecoder/for…), Opus 4.6 had just dropped. We were blown away by its ability to deliver solutions with senior-expert quality. So we designed Forge to break the software development process into bite-sized steps - small enough that Opus/Sonnet could execute them with high confidence and minimal hallucination. We built a Python harness to generate prompts for agent sub-processes, paired with LLM judges to verify the work.

That was a month ago. In this space, a month is a lifetime. Two recent developments are making me rethink that entire approach:

1. 1M context window now generally available for Opus 4.6 & Sonnet 4.6
2. Recursive Language Models - a novel solution for context rot (credit: Alex Zhang's research)

Together, these essentially eliminate the problem we were engineering around. We no longer need to obsess over carefully managing context rot. We can put more trust in advanced models to follow procedural directions and combat drift natively.

A month of work, potentially obsolete. But here's the thing - code is almost free. Lessons learned are what stay with you. I'm amazed at how fast this industry moves. I feel like I'm perpetually behind, but that also means new ideas every single day. What an exciting time to be building.

If you've gone through the same thought churn - tearing down what you just built because the ground shifted underneath you - let's talk. I'd love to connect with others navigating this space.

🔗 HN thread on 1M context: news.ycombinator.com/item?id=473671…
🔗 RLM research: alexzhang13.github.io/blog/2025/rlm/

#AI #LLM #BuildInPublic #AgenticAI #SoftwareEngineering #Claude #AnthropicAI #AIAgents #ContextWindow #StartupLife #MachineLearning #GenerativeAI #TechFounders
Mukul Sharma retweeted
Anusheel Bhushan @sheel_ai
I wrote a multi-agent loop for autoresearch from @karpathy. Result: 9/12 (75%) experiments improved val_bpb vs 15/83 (18%) in the original. It's continuing to run, so stay tuned!

Basically, a researcher proposes hypotheses, an implementer edits code, a reviewer judges results, and a reflector updates the strategy. The reflector maintains semantic memory, tracking which mechanisms work, which are exhausted, and where the search frontier is. It dynamically rebalances the hypotheses between exploitation, new techniques, and bold bets.
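Stripped of the LLM calls, the loop described here is roughly this shape. All four roles are callables you supply (in practice, model calls), and the memory dict is a toy stand-in for the reflector's semantic memory; only the wiring is shown.

```python
# Sketch of a researcher -> implementer -> reviewer -> reflector loop.
# Each role is a callable (in practice an LLM call); this only shows the wiring.
def autoresearch_loop(researcher, implementer, reviewer, reflector, rounds: int):
    memory = {"tried": [], "frontier": []}  # reflector's running notes
    results = []
    for _ in range(rounds):
        hypothesis = researcher(memory)                  # propose an experiment
        change = implementer(hypothesis)                 # edit code, run training
        verdict = reviewer(change)                       # judge: did the metric improve?
        memory = reflector(memory, hypothesis, verdict)  # update search strategy
        results.append((hypothesis, verdict))
    return results
```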
Anusheel Bhushan tweet media
Mukul Sharma @elitecoder
Over the past couple of months, we've changed how we work as a team with agents.

• We've built commands + skills to eliminate repetition.
• Created opinionated code review agents so humans can focus on architecture while bots handle the finer details.

What I am still actively thinking about is how to create a feedback loop when an agent makes mistakes. How do we identify where automated execution went wrong - bad plan, bad spec, or bad code? Would love insights from folks who have built agentic harnesses for mono-repos with a high quality bar.
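For the failure-attribution question, one possible sketch is a triage step that forces a judge to pick exactly one fault category per failure, so the feedback loop accumulates structured data. The `judge` callable is a stand-in (human or LLM call), and the categories come straight from the question above.

```python
# Sketch of a failure-triage step. `judge` is a stand-in (human or LLM call);
# the three fault categories come from the post above.
FAULT_KINDS = ("bad_plan", "bad_spec", "bad_code")

def triage_failure(spec: str, plan: str, diff: str, failure_log: str, judge) -> str:
    """Ask the judge to attribute a failure to the plan, the spec, or the code."""
    verdict = judge(spec=spec, plan=plan, diff=diff, log=failure_log)
    if verdict not in FAULT_KINDS:
        raise ValueError(f"judge must return one of {FAULT_KINDS}, got {verdict!r}")
    return verdict
```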
Mukul Sharma @elitecoder
@chintanturakhia Fantastic post. I'd love to pick your brain on feedback loops. My mono-repo is quite opinionated, and our current struggle is identifying where a failure originates - bad plan, bad spec, or bad code. Thoughts on that?
Chintan Turakhia @chintanturakhia
Back in January I told Eng two things:

1. Delete your IDE
2. Stop writing code

And build only through agents. In a few weeks, we:

• Built 30+ internal tools to 10x the way we work
• Created a deep library of agents + skills to kill repetitive work
• Formed "agent councils" for PR and app perf reviews
• Shipped multi-month projects in ~1 day

It was a clear mental shift to focus us on the things that matter most:

- Upstream intent.
- Downstream validation.

Engineering has always been about building with intent and judgement. The code was just a medium for expression. Now agents are that medium. Rip the bandaid off.
Chintan Turakhia tweet media
Mukul Sharma @elitecoder
I was so blown away by Opus that I built a whole critique pipeline around it. Interestingly, it is slow enough to make me question my decision. I think Sonnet is a great tradeoff for most functional critiques. You can always use Opus to pass the final verdict.
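A minimal sketch of that tradeoff, with `fast_critic` and `final_judge` as stand-ins for Sonnet and Opus calls (not real API signatures): the cheap model runs every routine check, and the expensive model is called once at the end.

```python
# Sketch of a tiered critique pipeline: a cheap, fast model runs every
# routine check, and the expensive model is called once for the final verdict.
def tiered_review(diff: str, checks: list, fast_critic, final_judge):
    """Collect cheap findings per check, then escalate once for the verdict."""
    findings = []
    for check in checks:
        findings.extend(fast_critic(diff, check))  # e.g. style, tests, naming
    return final_judge(diff, findings)             # single expensive call
```

The design choice: latency and cost scale with the number of checks on the fast model, while the slow model's cost stays constant at one call per review.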
Mukul Sharma @elitecoder
Working towards iteratively building a Software Factory. I'll try to make it as plug-and-play as possible. Of course, every team's workflow is different. But that's what makes it a fascinating problem to solve. What's consuming me today is how not to burn tokens.
Mukul Sharma retweeted
Matt Van Horn @mvanhorn
Just shipped /last30days. A Claude Code skill for @claudeai that scans the last 30 days on Reddit, X, and the web for any topic and returns prompt patterns + new releases + workflows that work right now. Last 30 days of research. 30 seconds of work. 👉 github.com/mvanhorn/last3…
Mukul Sharma @elitecoder
@kyleshevlin I once received interview feedback that I hit Compile + Run too much, showing a lack of confidence in my code 🤷‍♂️
Mukul Sharma @elitecoder
@samikatplays Sometimes I forget that a trailer is scheduled. I appreciate seeing a picture as a reminder that I should catch up on it. And if I already know the trailer is out, I can stay off Twitter to avoid spoilers. 🤷‍♂️
SamiKat @samikatplays
After my spoiler apology tweet yesterday and some DMs since, I pose a question… Are screenshots from or tweets about a trailer that are posted within an hour of the trailer's official release considered spoilers? Essentially, can you spoil a trailer?
SamiKat @samikatplays
Today I was told I should cap my frame rate during the Corrupted GM boss fight. Apparently this isn't supposed to happen. (Let alone FOUR times.) I've never had to walk away from my computer during a stream before, but today... Upset doesn't even begin to describe it. #Destiny2
Mukul Sharma @elitecoder
@samikatplays I appreciate your choice of loadout and don't understand the "did you know" crowd. If someone is struggling while in a group, sure, make suggestions. But you are a very focused content creator, and doing solo stuff requires you to be at the top of your game. 🤷‍♂️
SamiKat @samikatplays
So the moral of the story… DID YOU KNOW YOU CAN GET ANARCHY FROM THE TOWER?!?! Just go there and get it from that kiosk by Shaxx. It’s really easy to do.
SamiKat @samikatplays
Why I Refuse to Get Anarchy: A Story I started running solo dungeons in Sept 2020, during a time when Anarchy reigned supreme. Being me, I only ever ran all-bows in dungeons, making sure to make my insanity clear by noting that specifically in my title. Of course, in came the