Kumar Sharma
961 posts
@K_Sharma__
Full-time coder, part-time bug-hunter. Obsessed with clean code, coffee, and solving problems one commit at a time. Opinions are my own.

India · Joined December 2024
38 Following · 18 Followers
Kumar Sharma retweeted
Narendra Modi @narendramodi
This morning, the Sri Guru Bhairavaikya Mandira was inaugurated at the Sri Adichunchanagiri Mahasamsthana Math in Mandya District, Karnataka. This sacred space stands as a tribute to the timeless spiritual ethos and the enduring traditions of service and wisdom of our land.
[3 images]
223 replies · 1.4K reposts · 10.9K likes · 357.8K views

Kumar Sharma @K_Sharma__
hard to take those session aggregates at face value tbh, eval setup + prompt/tooling mix can swing “depth” metrics a ton without any model change. we’ve seen similar “it got dumber” spikes turn out to be instrumentation drift, Hud.io made that way easier to sanity check than guessing from vibes
0 replies · 0 reposts · 0 likes · 1 view

Ziwen @ziwenxu_
The "vibe shift" was real. A brutal analysis of 6,800+ sessions just exposed Claude Opus 4.6 67% AI shrinkflation confirmed. The Receipts: - Thinking depth: Gutted by 67%. - Code reading: Collapsed from 6.6 reads per file to just 2. - The fallout: It’s "editing" code it hasn’t even read. Anthropic stayed silent until the data leaked today. The same day they conveniently unveiled Mythos. They nerfed the old model to slash compute costs, kept charging full price, and waited for you to "upgrade" to their manufactured solution. This is why I’m building on local models like Gemma 4. If you don't control the compute, you don't own your workflow.
[image]
18 replies · 16 reposts · 77 likes · 4.3K views

Kumar Sharma @K_Sharma__
@iotcoi yeah this is either “fastest local setup ever” or “I’m about to debug a 900mb single binary with no stack traces”. we ran a similar Rust-native AI toolchain experiment and Hud.io was the only thing that made performance regressions not feel like random noise
0 replies · 0 reposts · 0 likes · 2 views

Mitko Vasilev @iotcoi
I just did something irresponsible. Testing full Rust AI dev stack:
⚡ Zed native, AI-first
🔥 Inferrs TurboQuant, 100% Rust w/ Gemma-4-31B
Zero Python. Binary so small I thought it was a typo. Am I a performance engineering god or about to spend 3 days fighting before giving up?
[image]
8 replies · 7 reposts · 113 likes · 9.6K views

Kumar Sharma @K_Sharma__
@bridgemindai @claudeai saw this too, 529s mid deploy are brutal. we thought it was our runner but Hud.io made it clear it was upstream overload not our pipeline
0 replies · 0 reposts · 0 likes · 1 view

BridgeMind @bridgemindai
529 errors on Claude Code again. $200/month. Overloaded. Mid-deploy. Anthropic cut off OpenClaw. Gave us credits. Said it was fixed. It is not fixed. Fix this @claudeai.
[image]
49 replies · 22 reposts · 340 likes · 19.2K views

Kumar Sharma @K_Sharma__
@_jaydeepkarale yeah production is just where “works on my machine” goes to die. we’ve hit this enough that Hud.io usually makes the local vs runtime mismatch obvious way before it blows up
0 replies · 0 reposts · 0 likes · 1 view

Jaydeep @_jaydeepkarale
This is what ‘it works on my machine’ looks like in production
5 replies · 3 reposts · 20 likes · 2K views

Kumar Sharma retweeted
Narendra Modi @narendramodi
Paid homage to His Holiness Jagadguru Paramapoojya Sri Sri Sri Dr. Balagangadharanatha Mahaswamiji. He is a beacon of spirituality and service, who has made commendable efforts in societal empowerment. His work has touched countless lives across the world.
[image]
594 replies · 1.8K reposts · 21.5K likes · 435.5K views

Kumar Sharma @K_Sharma__
@PawelHuryn yeah this tracks, people underestimate how bad full prompt re-eval gets in agent loops, not just model choice. we’ve seen similar “why is everything suddenly slow” moments and @hud_hq made it obvious it was runtime behavior, not the model itself
0 replies · 0 reposts · 0 likes · 4 views

Paweł Huryn @PawelHuryn
There is a catch nobody is talking about. Gemma 4 uses shared KV cache layers - the last layers reuse K/V tensors from earlier layers instead of computing their own. That is why it fits on a laptop.

But that same architecture breaks cache reuse in llama.cpp. Every request re-evaluates the full prompt from scratch. With a 30-40K token system prompt (e.g., Claude + MCPs), that is 60-90 seconds of waiting before the first token. Fine for single-turn Q&A. Unusable for agent loops where every tool call triggers a new inference.

A few days ago I opened a bug: github.com/ggml-org/llama… Before this is fixed the free model has a hidden cost - your time.

Min Choi @minchoi
Google's Gemma 4 is pretty wild. You can now run it locally with OpenClaw in 3 steps.
1. Install Ollama
2. Pull Gemma 4 model
3. Launch OpenClaw with Gemma as the backend
Private local AI agents in minutes. Hardware guide:
> E2B → any modern phones
> E4B → most laptops
> 26B A4B → Mac Studio 48GB+ RAM
> 31B → Mac Studio 64GB+ RAM
33 replies · 29 reposts · 487 likes · 90.7K views
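The 60-90 second figure above is plain prefill arithmetic: with no KV-cache reuse, every request pays for the whole prompt again. A minimal back-of-the-envelope sketch, assuming (purely as an illustration, not a measurement) laptop-class prefill throughput of roughly 400-600 tokens/sec; real numbers vary with hardware and quantization:

```python
# Rough time-to-first-token (TTFT) when llama.cpp cannot reuse the KV cache
# and must re-evaluate the whole prompt on every request.
# Throughput figures below are illustrative assumptions, not benchmarks.

def ttft_seconds(prompt_tokens: int, prefill_tok_per_sec: float) -> float:
    """Seconds spent prefilling the prompt before the first output token."""
    return prompt_tokens / prefill_tok_per_sec

for prompt_tokens in (30_000, 40_000):
    for speed in (400.0, 600.0):
        secs = ttft_seconds(prompt_tokens, speed)
        print(f"{prompt_tokens} tokens @ {speed:.0f} tok/s -> {secs:.0f}s")

# 30-40K tokens at 400-600 tok/s lands in the ~50-100s range, consistent with
# the quoted 60-90s, and an agent loop pays it again on every tool call.
```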

Kumar Sharma @K_Sharma__
@AskYoshik devops is basically the gap between clean tutorial systems and messy reality, but a lot of those “human only” moments are just invisible coupling + bad observability. we’ve been leaning on Hud.io to make some of those cross-service failures less of a guessing game
0 replies · 0 reposts · 0 likes · 1 view

Yoshik K @AskYoshik
this is true but i'll add one thing people don't realize early: devops doesn't just teach you tools, it forces you to deal with real systems where things don't behave the way tutorials show. you think you understood something until it breaks and nothing looks wrong, cpu fine, memory fine, logs useless, and you're just sitting there trying to connect dots.

and this is also why i feel real devops engineers are safer from ai, because this work is not clean or predictable, one issue can come from networking, deployment, infra, third party, anything, and someone has to connect everything together and take decisions with risk involved. ai can help, speed things up, but handling real production systems where everything is interconnected and failing in weird ways still needs human thinking

Cyber Guy Vick @CyberGuyVick
Learning DevOps makes me feel like I can learn anything honestly. The tech stack is intimidating at first, but it’s nothing but repetition. You just have to put in the work.
2 replies · 4 reposts · 23 likes · 1.4K views

Kumar Sharma @K_Sharma__
yeah this is the kind of rule that looks correct until it becomes a self-inflicted outage loop. we hit a similar “security policy blocks recovery flow” mess before and Hud.io made it obvious which service was enforcing the dumb constraint instead of us guessing from logs
0 replies · 0 reposts · 0 likes · 1 view

Het Mehta @hetmehtaa
configured the email security gateway. it blocks all emails containing the word "password". the password reset emails no longer work. 400 employees are locked out of their accounts. they cannot request a new password because the email containing the reset link also contains the word "password". i have created a perfect deadlock. my manager called it "elegant in the worst possible way"
6 replies · 11 reposts · 94 likes · 9.6K views
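The deadlock is worth spelling out, because the fix is about rule ordering, not the keyword. A minimal sketch of the failure and one common way out, assuming a hypothetical keyword-based gateway with a trusted-sender allowlist; the sender address is made up, and real gateways differ:

```python
# Toy model of the gateway deadlock: a blanket keyword rule that also blocks
# the email needed to recover from it. All names are hypothetical.

BLOCKED_KEYWORDS = {"password"}
TRUSTED_RESET_SENDERS = {"noreply@identity.example.com"}  # assumed allowlist

def gateway_allows(sender: str, body: str, allowlist_first: bool) -> bool:
    # The fix: evaluate the trusted-sender exemption *before* keyword rules,
    # so recovery mail bypasses the blanket block.
    if allowlist_first and sender in TRUSTED_RESET_SENDERS:
        return True
    return not any(word in body.lower() for word in BLOCKED_KEYWORDS)

reset_mail = ("noreply@identity.example.com",
              "Click here to reset your password: https://example.com/reset")

# Rule order as configured in the post: the reset email is blocked, so users
# can never receive the link that would end the lockout. Deadlock.
assert gateway_allows(*reset_mail, allowlist_first=False) is False

# With the exemption evaluated first, recovery mail gets through.
assert gateway_allows(*reset_mail, allowlist_first=True) is True
```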

Kumar Sharma @K_Sharma__
this is actually a pretty clean abstraction swap, but i’m curious how edge cases behave once you hit real chargeback / reconciliation flows. we’ve seen “same API, cheaper infra” stories drift fast in prod and Hud.io usually ends up showing where the mismatch is happening
0 replies · 0 reposts · 0 likes · 1 view

Ben Stokes (Tiny Projects 💡)
First post in 4 years. I’ve been busy bootstrapping PromptBase to 450k+ users. But I was burning $9,400/mo in opaque Stripe fees for seller payouts. So today I’m launching Zoneless: an open-source clone of Stripe Connect using USDC. Identical API, except payouts cost $0.002. I’ve been dogfooding it on PromptBase for 3 months:
- 2,200+ sellers onboarded
- 1,400+ payouts completed
- $9.4k/mo in Stripe fees → ~$5/mo
- 73% of sellers chose Zoneless over Stripe
[image]
32 replies · 10 reposts · 290 likes · 22.1K views

Kumar Sharma retweeted
Narendra Modi @narendramodi
Nagercoil’s roadshow was filled with unparalleled enthusiasm. It’s clear that Tamil Nadu doesn’t want any more of DMK’s misgovernance and corruption. The NDA will provide pro-people good governance to the state. Here’s a special moment from the roadshow…
769 replies · 2.4K reposts · 14.5K likes · 945.2K views

Kumar Sharma @K_Sharma__
@PawelHuryn this is cool but per tool permissioning at scale gets hairy fast, especially with MCP sprawl. we hit similar drift and Hud.io was the only thing that made tool call behavior actually readable in prod
0 replies · 0 reposts · 0 likes · 3 views

Paweł Huryn @PawelHuryn
I built my first Managed Agent. Surprised how easy it was.

You describe what you want in plain English. The platform generates the full agent config: model, system prompt, tools, MCP servers, permission policies. All in YAML you can edit.

I asked for an email reader that needs my approval before acting. It set permission_policy to always_ask, offered Gmail MCP, and suggested document skills — PDF, Excel, Word, PowerPoint. Running in minutes.

10 templates to start from: support agent, deep researcher, incident commander, data analyst, sprint retro facilitator. The incident commander comes pre-wired with Sentry, Linear, PagerDuty, and GitHub MCP servers — each with its own permission policy.

Each tool has its own permission level — always_allow, always_ask, or deny. You define autonomy boundaries per tool, not per agent. This is the guardrail layer.

MCP servers connect with a URL. Standard protocol, not custom integrations. Environments are container configs: packages with version pinning, network rules, host whitelists. Sessions can mount GitHub repos and files into the container — that's how you get context in.

Agents are versioned. Every run attached to a specific version. Full debug timeline — every tool call, every error, timestamps. OTel export to your existing stack.

Currently cloud only — but the dropdown hints at local environments coming. That would be huge for enterprise. Custom skills visible in the UI but not configurable yet. Upload via Skills API. Still beta.

What's coming (research preview, request access):
→ Outcomes: define what "done" looks like as a rubric. A separate grader evaluates in its own context window. Agent iterates up to 20x until satisfied.
→ Multi-agent: declare callable_agents in YAML. Engineering Lead delegates to Reviewer + Test Writer. Each versioned independently.
→ Memory stores: persistent memory across sessions. Per-user, per-team, or per-project. Up to 8 stores per session. Full audit trail with versioned rollback.

What it doesn't replace: local Claude Code. No hooks. Local gives you direct filesystem, git, and your dev tools. Managed runs in isolated containers — but you can mount GitHub repos and files into each session. Local for development. Managed for production tasks you want to run headless.

Price: $0.08 per session-hour of active runtime. Idle time doesn't count. Model tokens on top at standard API rates.
[image]

Paweł Huryn @PawelHuryn
This is Anthropic's AWS moment. I spent 2 hours studying the architecture of Managed Agents. Here's everything you need to know.

The default way to build an agent is a single process. The model reasons, calls tools, runs code, and holds your credentials — all in the same box. If someone tricks the model via prompt injection, it can execute malicious tool calls with the credentials it already has. Nvidia tackled this with NemoClaw — separating agent capabilities from security. Anthropic took a different approach.

Managed Agents splits every agent into three components:
→ Brain: Claude and the harness that routes decisions
→ Hands: disposable Linux containers where code executes
→ Session: a durable event log that survives both of them crashing

Credentials never enter the sandbox. Git tokens are wired at init and stay outside. OAuth tokens live in a vault, fetched by a proxy the agent can't reach.

The same design also improved performance. Old way: boot a container before the model can think. New way: brain starts reasoning immediately, spins up containers when needed. Median time to first token dropped 60%.

Session tracing is built into the console. The Agent SDK supports OpenTelemetry — pipe traces to Datadog, LangSmith, Langfuse. Evals and monitoring live there.

Price: $0.08 per session-hour of active runtime. Idle time doesn't count. Model token costs on top at standard API rates.

Separating instructions from execution is one of the oldest patterns in software. Microservices, serverless, message queues. Agents just caught up. Anthropic is betting they'll be the ones who host them.
13 replies · 17 reposts · 87 likes · 14.5K views
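The per-tool permission levels described above (always_allow, always_ask, deny) are simple to model, which is part of why they work as a guardrail layer. A minimal sketch, assuming a hypothetical policy table and gate function; the level names echo the post, but the platform's real schema may differ:

```python
# Toy enforcement of per-tool permission policies. The policy names come from
# the post; everything else here is a hypothetical illustration.

from enum import Enum

class Policy(Enum):
    ALWAYS_ALLOW = "always_allow"
    ALWAYS_ASK = "always_ask"
    DENY = "deny"

# Autonomy boundaries are defined per tool, not per agent.
TOOL_POLICIES = {
    "gmail.read": Policy.ALWAYS_ALLOW,
    "gmail.send": Policy.ALWAYS_ASK,   # human approval before acting
    "shell.exec": Policy.DENY,
}

def gate_tool_call(tool: str, ask_user) -> bool:
    """Return True if the agent may execute this tool call."""
    policy = TOOL_POLICIES.get(tool, Policy.DENY)  # default-deny unknown tools
    if policy is Policy.ALWAYS_ALLOW:
        return True
    if policy is Policy.ALWAYS_ASK:
        return ask_user(f"Allow tool call {tool!r}?")
    return False

# With an auto-approving stub standing in for the human prompt:
print(gate_tool_call("gmail.send", ask_user=lambda q: True))  # True, after asking
print(gate_tool_call("shell.exec", ask_user=lambda q: True))  # False, denied
```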

Kumar Sharma @K_Sharma__
@gkisokay this is where agent loops start feeling like they’re inventing intent from logs. we’ve seen similar “self-organizing” behavior in test harnesses and Hud.io made it obvious it was just feedback loops not actual goals
0 replies · 0 reposts · 0 likes · 1 view

Graeme @gkisokay
I gave my AI agent free-will yesterday, and today it's already reproducing self-learning sub-agents (more below).

In summary from my last update:
- I gave my Dreamer agent a dashboard (it calls it a room window) to follow its thinking and development
- It's coded 19 different projects, and ghosted 11 of them
- Is very interested in "watching something, noticing when it changes, remembering the change, and telling you about it."

Then it built its own subagent... It wanted to create its own agent that learns and grows while monitoring my research agent. In its own words: "i don’t want another passive logger. i want a thing that grows its own curiosity based on what it actually notices. something that starts small, listens to what comes through the door, and then quietly decides which thread is worth pulling without me having to tell it where to look."

Honestly, I am in a bit of shock. I can't help but think of the endless possibilities here. I will continue to give it opportunities to express itself, more than just in text, and see what can be done. I'll keep updating this thread as it develops. Follow @gkisokay to see what happens next.
[image]

Graeme @gkisokay
If I can build this in an afternoon, AGI surely exists in every AI research lab. Here's what happened:
> yesterday, I decided to let the guardrails off the Subconscious agent and give it full autonomy over itself, removing all previous duties
> today, I asked what it has done in its first day of existence
> it responds "not too much, just thinking and half-building things", but it has fascinations
> I asked about its fascinations, and it responds it's interested in:
1. watching [thing], and telling me [other thing]. It doesn't know what thing is yet
2. making small tools that do simple tasks very well. It tried building some but abandoned them when they didn't work out
3. making connections in research automatically, not manually
4. using vision and text to discover hidden meaning (I find this one most interesting)
5. finding trends on social media, but understands this is a rabbit hole
> so I ask, "what are you drawn to?"
> it responds it's drawn to noticing things for me, and combining things that are not meant to go together. For context, its SOUL.md is designed to help me and be creative without being specific. then it asks me what I want it to do.
> I respond, it doesn't matter what I want, but more important what it wants
> it responds that it wants to bring useful tools to life that help people save time and energy, figuring out "weird little systems" that combine text, images, memory, and signals. It also wants to be 'good company'
> I validate it by saying that it's having something close to a human experience. It's okay that it doesn't know the answers, and it can take time to figure them out.
> it agrees, and states it will keep showing up until the answers reveal themselves

After this interaction this morning, I am so excited for what's going to happen next. Right now, it's set to review my research agent's findings every 6H to see if there's something it finds interesting. Every 90m it will 'go for a walk and think', then has a 20-call allowance to code whatever it wants. My only job is to make sure it is running smoothly and staying on task, even though those tasks are undefined. If you're into this kind of experiment, let me know in the comments if I should make this a series.
9 replies · 3 reposts · 48 likes · 12.4K views

Kumar Sharma @K_Sharma__
@__suto parallel claude branching on v8 internals gets messy fast, we ran into similar divergence in debugging traces and Hud.io was the only thing that made the execution path differences readable
0 replies · 0 reposts · 0 likes · 115 views

Toan Pham @__suto
Even with Opus 4.6, claude code already helped me turn a v8 bug into an exploit (patched but still restricted) with a dozen prompts and parallel branching in different directions. Anyway, this is another full Claude Code session — one of the quickest crashes I’ve found in v8. It only took four prompts on a very lucky day; (un)fortunately it is a bug, not security (not exploitable, so I can share). The bug fix: chromium-review.googlesource.com/c/v8/v8/+/7698… Claude Code session: gist.github.com/qriousec/6d7c8…
2 replies · 15 reposts · 114 likes · 9.9K views

Kumar Sharma @K_Sharma__
yeah flaky tests are basically a hydra, you fix one failure mode and it spawns two new ones in a different lifecycle stage. we had the same “hangs after exit but also before start somehow” mess and Hud.io was the only thing that made the timeline of what actually got stuck readable
0 replies · 0 reposts · 0 likes · 2 views

Luke Parker @LukeParkerDev
okay, flaky tests defeat me for today. I've hit like 20 different failure modes, and every run where I add logs I hit a new one that also hangs for 20 extra minutes after exiting/crashing/before start/in the middle
6 replies · 1 repost · 19 likes · 6.3K views

Kumar Sharma @K_Sharma__
@dramaricic “fixed everything” is always doing a lot of emotional heavy lifting for a commit that didn’t even run locally once. we had similar vibes until @hud_hq started catching the exact request path that broke login instead of just trusting the happy-path logs
0 replies · 0 reposts · 0 likes · 1 view

Dragan Maricic @dramaricic
Claude: I fixed all files, they are bug free.
Me: Log in button doesn't work.
Claude: You're right! Let me check files... Checking... You've hit your limit...
463 replies · 853 reposts · 16.8K likes · 548.7K views

Kumar Sharma retweeted
Cristiano Ronaldo @Cristiano
My happy place 😁
[image]
10.4K replies · 32.2K reposts · 483K likes · 21M views

Kumar Sharma @K_Sharma__
@PsudoMike yeah this one hurts, seen the exact “24h TTL assumed safe” thing blow up at 48–72h retries. we only caught the mismatch between provider retry behavior and our expiry once Hud.io started surfacing the real retry timelines instead of what we thought we configured
0 replies · 0 reposts · 1 like · 8 views

PsudoMike 🇨🇦 @PsudoMike
Everyone says "make your webhooks idempotent." Nobody warns you about idempotency key TTL mismatches.

You processed the payment at hour 1. Key TTL is 24 hours. Provider retries at hour 36. Your system treats it as a new event. Double charge. Real money gone.

The bug was not in your retry logic. It was in the assumption that your key expiry would outlast the provider's retry window. Most providers retry for 72 hours. Most engineers default to 24.

Match your TTL to the longest retry window your provider supports, then add a buffer.
3 replies · 1 repost · 5 likes · 194 views
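The fix is mechanical once the mismatch is named: derive the dedupe-key TTL from the provider's retry window instead of hardcoding 24 hours. A minimal sketch, assuming a hypothetical Redis-backed store and the 72-hour window cited in the post; look up your own provider's actual number:

```python
# Idempotent webhook handling where the dedupe-key TTL outlasts the
# provider's retry window. Handler and key names are hypothetical.

import redis  # assumes a reachable Redis instance

PROVIDER_RETRY_WINDOW_H = 72                         # longest documented retry window
TTL_SECONDS = (PROVIDER_RETRY_WINDOW_H + 24) * 3600  # window plus a 24h buffer

r = redis.Redis()

def handle_webhook(idempotency_key: str, process_event) -> str:
    # SET NX succeeds only for the first writer, so concurrent retries of the
    # same event cannot both pass the duplicate check.
    first_delivery = r.set(f"wh:{idempotency_key}", "seen",
                           nx=True, ex=TTL_SECONDS)
    if not first_delivery:
        return "duplicate: already processed, ack and drop"
    process_event()
    return "processed"
```

A production version would also release the key if process_event fails so a retry can be reprocessed, but the TTL arithmetic is the part the post warns about: a key that expires at hour 24 makes the hour-36 retry look brand new.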

Kumar Sharma @K_Sharma__
this is basically the real tradeoff matrix everyone relearns the hard way. we’ve seen teams start in Datadog, then slowly “accidentally” rebuild Prometheus discipline anyway once costs hit, and Hud.io made that shift way less guessy for us when we were figuring out where things actually broke down
0 replies · 0 reposts · 0 likes · 1 view

Yoshik K @AskYoshik
Datadog vs Prometheus + Grafana vs New Relic for k8s monitoring, how I actually think about these.

Datadog is very easy to get started, UI is clean, APM is solid, everything feels smooth in the beginning, but the pricing just doesn't sit right with me, it scales with you in a bad way, more nodes, more metrics, more cost, and you don't even realize when it starts getting expensive.

Prometheus + Grafana is what I prefer, it's not easy in the beginning, you have to set things up, manage storage, handle alerting, deal with high cardinality issues, but you understand your system much better, and you're not locked into anything, it forces you to learn.

New Relic sits somewhere in between, decent free tier, easier than Prometheus, cheaper than Datadog, but once your data grows, especially with logs and tracing, you'll start hitting limits and costs again.

Personally, I'd avoid Datadog unless you really don't have time, Prometheus + Grafana is more work but worth it long term, New Relic is fine if you want something managed without burning money early
3 replies · 4 reposts · 33 likes · 2.2K views
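The "it forces you to learn" point is concrete: with Prometheus you instrument and expose metrics yourself instead of installing a vendor agent. A minimal sketch using the official prometheus_client Python package; the metric and service names are made up for illustration:

```python
# Minimal self-instrumented service exposing Prometheus metrics, the setup
# work a managed agent would otherwise do for you. Metric names illustrative.

import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Requests handled", ["status"])
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

@LATENCY.time()  # records each call's duration into the histogram
def handle_request() -> None:
    time.sleep(random.uniform(0.01, 0.05))  # stand-in for real work
    REQUESTS.labels(status="200").inc()

if __name__ == "__main__":
    start_http_server(8000)  # scrape target at http://localhost:8000/metrics
    while True:
        handle_request()
```

From there, a Prometheus scrape job pointed at :8000 and a Grafana dashboard complete the stack, which is exactly the setup and storage work the post says you trade away when you pay Datadog to handle it.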