Antrixsh Gupta

2K posts

@AntrixshG

Data Science Professional, Technology Geek,

Pune, India · Joined August 2018

114 Following · 423 Followers

Antrixsh Gupta@AntrixshG·3d

Vibe coding hackathon. You all love this.

Antrixsh Gupta@AntrixshG·26 Apr

@arsh_goyal Hyderabad

Arsh Goyal@arsh_goyal·25 Apr

Guess the city?

Antrixsh Gupta@AntrixshG·9 Apr

A lot of AI agent startups got cooked today. Anthropic launched Managed Agents. A fully hosted service that runs long-horizon AI agents on your behalf. Session management. Sandbox execution. Context engineering. Failure recovery. All of it. Native to the Claude platform. Here is what this actually means. There is an entire category of startups whose product is exactly this. “We make AI agents reliable at scale.” That is their pitch. That is their Series A. That is their moat. Anthropic just made it a feature. The AI agent infrastructure space just got a lot more crowded.

Claude@claudeai

Introducing Claude Managed Agents: everything you need to build and deploy agents at scale. It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days. Now in public beta on the Claude Platform.

Antrixsh Gupta@AntrixshG·7 Apr

I just published Scaling Agentic AI: Multi-Agent Systems Explained medium.com/p/scaling-agen…

Antrixsh Gupta@AntrixshG·1 Apr

Oracle fired 30k employees via a 6 AM email. And Oracle is not a struggling company; they made more money than ever. Despite that, 30k people lost their jobs. They told you to code, you did. They told you to upskill, you did. They told you to learn AI, you did. And then they replaced you with the same system you helped them build. #oracle #layoff

Antrixsh Gupta@AntrixshG·30 Mar

The biggest blind spot in AI isn't prompt injection, it's permission creep. Agents execute valid API calls that quietly cross trust boundaries. EDR sees processes, not context drift. Treat agents like privileged identities, not just software. #AIInfra #AppSec #Agents
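
One way to read "treat agents like privileged identities" is to give each agent an explicit allow-list of scopes and check it before every tool call, logging the decision either way. The sketch below is only an illustration under that assumption: `AgentIdentity`, `call_tool`, the scope strings, and the tool names are all hypothetical and do not come from any real IAM product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical agent principal with an explicit allow-list of scopes."""
    name: str
    scopes: frozenset = field(default_factory=frozenset)


class PermissionDenied(Exception):
    """Raised when an agent attempts a call outside its granted scopes."""


def call_tool(agent, tool, required_scope, audit_log):
    """Check the scope before the call and record every decision."""
    allowed = required_scope in agent.scopes
    audit_log.append((agent.name, tool, required_scope, allowed))
    if not allowed:
        raise PermissionDenied(f"{agent.name} lacks {required_scope} for {tool}")
    # Placeholder for the real side effect (assumption: none is performed here).
    return f"{tool} executed"


audit_log = []
reader = AgentIdentity("report-bot", frozenset({"crm:read"}))

result = call_tool(reader, "crm.list_accounts", "crm:read", audit_log)

try:
    call_tool(reader, "crm.delete_account", "crm:write", audit_log)
    denied = False
except PermissionDenied:
    denied = True
```

The audit log keeps denied attempts as well as allowed ones, which is the part that makes gradual permission creep visible after the fact.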

Antrixsh Gupta@AntrixshG·29 Mar

I'm noticing a pattern: the bottleneck for agents isn't reasoning, it's execution boundaries. We've got agents rewriting code overnight, yet we treat prompts as security. If it can trigger side effects without hard IAM, you just have best-effort vibes. #AIagents #LLMs #IAM

Antrixsh Gupta@AntrixshG·29 Mar

Noticing a shift away from naive RAG. Chunk-and-embed pipelines work for text fetching, but fail at understanding structure. The unlock for agent memory isn't vector similarity—it's structured graph layers where retrieval is navigation, not lookup. #RAG #AIAgents #LLMs

Antrixsh Gupta@AntrixshG·29 Mar

I'm noticing a shift in agent infra: browser automation is a dead end. Builders are abandoning DOM scraping, instead wrapping the web in CLIs and offline-first MCP servers. Stop making your models read HTML. Give them native programmatic access. #AIagents #MCP #dev

Antrixsh Gupta@AntrixshG·29 Mar

I'm noticing a pattern: our evals are rotting. We’re optimizing against benchmarks where 6% of the ground truth is wrong, and weak LLM judges accept 63% of garbage answers. We're benchmarking context windows, not actual memory. Build deterministic evals. #LLMs #Evals #GenAI

Antrixsh Gupta@AntrixshG·29 Mar

@sickdotdev github.com/antrixsh/trust… This is what I am building: an open-source framework for evaluating LLM safety, fairness, and reliability in regulated industries.

Sick@sickdotdev·28 Mar

Hey Founders Drop what you’re building👇 Last time 1M+ people saw it. Consider this as marketing.

Antrixsh Gupta@AntrixshG·29 Mar

@heyblake github.com/antrixsh/trust…

Blake Emal@heyblake·28 Mar

Drop your project URL Let’s drive some traffic

Antrixsh Gupta@AntrixshG·29 Mar

An agent without strict monitoring and token limits isn't a feature. It's a memory leak with a credit card attached. Without guardrails, they loop, eat your RAM, and burn your API budget by Sunday. Agents are volatile infra, not static code. #AIAgents #Infra #LLMs
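
A minimal sketch of the kind of guardrail this describes, assuming the agent loop can report how many tokens each step used. `RunBudget`, `run_agent`, and the `(done, tokens_used)` step contract are hypothetical names chosen for illustration, not any framework's API.

```python
class BudgetExceeded(Exception):
    """Raised when a run goes past its step or token limit."""


class RunBudget:
    """Hard caps on steps and tokens for a single agent run."""

    def __init__(self, max_steps, max_tokens):
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.steps = 0
        self.tokens = 0

    def charge(self, tokens_used):
        """Record one step and stop the run once either cap is exceeded."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} exceeded")


def run_agent(step_fn, budget):
    """Drive the loop; step_fn is assumed to return (done, tokens_used)."""
    while True:
        done, tokens_used = step_fn()
        budget.charge(tokens_used)
        if done:
            return "finished"


# A stand-in for an agent that never decides it is done.
budget = RunBudget(max_steps=10, max_tokens=2000)
try:
    run_agent(lambda: (False, 400), budget)
    stopped = False
except BudgetExceeded:
    stopped = True
```

With 400 tokens per step and a 2000-token cap, the stand-in loop is cut off on its sixth step instead of running until the bill arrives.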

Antrixsh Gupta@AntrixshG·29 Mar

I'm noticing a massive blind spot in agent evals: grading the final output. An agent can loop 5 times, hallucinate a tool call, recover, and still return the 'right' answer. If you aren't scoring the execution trace, your evals are lying to you. #AIAgents #Evals #LLMs
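
As a rough illustration of scoring the trace rather than the answer, the sketch below counts repeated identical tool calls, calls to tools that were never registered, and steps over a budget. The trace format (a list of dicts with `tool` and `args`) and the function name `score_trace` are assumptions made for this example.

```python
from collections import Counter


def score_trace(trace, known_tools, max_steps=10):
    """Summarize structural problems in a trace, independent of the final answer."""
    # Identical (tool, args) pairs seen more than once suggest a loop.
    calls = [(step["tool"], repr(sorted(step["args"].items()))) for step in trace]
    repeated = sum(n - 1 for n in Counter(calls).values() if n > 1)
    # Calls to tools that were never registered are treated as hallucinated.
    unknown = sum(1 for step in trace if step["tool"] not in known_tools)
    over_budget = max(0, len(trace) - max_steps)
    return {
        "repeated_calls": repeated,
        "unknown_tools": unknown,
        "steps_over_budget": over_budget,
        "clean": repeated == 0 and unknown == 0 and over_budget == 0,
    }


# Hypothetical trace: three identical searches, one invented tool, then an answer.
trace = [
    {"tool": "search", "args": {"q": "refund policy"}},
    {"tool": "search", "args": {"q": "refund policy"}},
    {"tool": "search", "args": {"q": "refund policy"}},
    {"tool": "send_fax", "args": {"to": "billing"}},
    {"tool": "answer", "args": {"text": "30 days"}},
]
report = score_trace(trace, known_tools={"search", "answer"})
```

An output-only eval would pass this run if "30 days" is correct; the trace report flags it anyway.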

Antrixsh Gupta@AntrixshG·28 Mar

The real issue is we're optimizing models against broken yardsticks. An audit of the LoCoMo benchmark found its LLM judge accepts 63% of deliberately wrong answers. LongMemEval is just a context window test. Stop trusting leaderboards. Build custom evals. #LLMs #Evals #AI

Antrixsh Gupta@AntrixshG·28 Mar

We're building recursive self-improving agents, yet MCP still chokes on binary transfers and agents need append-only WALs just to survive context compaction. The real bottleneck isn't reasoning. It's brittle infra and output-only evals. #AIAgents #LLMs #DevTools
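
For readers unfamiliar with the append-only WAL idea mentioned here, a minimal sketch follows, assuming the agent writes each decision to disk before acting so it can be replayed after the context is compacted. The `WriteAheadLog` class, the JSON-lines format, and the record fields are illustrative assumptions.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Append-only JSON-lines log; records are never rewritten in place."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        """Write one record and force it to disk before returning."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        """Return every record in write order, or an empty list if no log exists."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "agent.wal")
wal = WriteAheadLog(path)
wal.append({"step": 1, "decision": "use staging database"})
wal.append({"step": 2, "decision": "skip migration 0007"})

# A fresh object stands in for the agent after its context was compacted.
recovered = WriteAheadLog(path).replay()
```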

Antrixsh Gupta@AntrixshG·28 Mar

Evaluating AI agents on their final output is a massive blind spot. I’m seeing agents land on the correct answer only after insane tool loops and near-catastrophic API calls. Stop scoring the output. The real signal is in the execution trace. #AIAgents #LLMs #Evals

Antrixsh Gupta@AntrixshG·28 Mar

Most 'agents' are just hardcoded DAGs with an LLM node in the middle. And that's fine. Hardcode your logic. Use models strictly for messy inputs. When workflows break, you patch a node. When agents break, you're lost in hallucinated tool calls. #LLMs #Agents #DevTools
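
The shape described here can be sketched as a fixed pipeline where the model is called once, only to turn messy text into a value, and everything after that is ordinary code. The refund scenario, the threshold, and `fake_llm` (a regex stand-in so the example runs without a model) are assumptions for illustration.

```python
import re


def fake_llm(text):
    """Stand-in for the single model call: pulls the first number out of messy text."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return match.group(0) if match else ""


def validate_amount(raw):
    """Deterministic node: reject anything that is not a non-negative number."""
    amount = float(raw)  # raises ValueError on garbage model output
    if amount < 0:
        raise ValueError("negative amount")
    return amount


def route(amount, review_threshold=1000.0):
    """Deterministic node: hardcoded business rule, no model involved."""
    return "manual_review" if amount >= review_threshold else "auto_approve"


def run_workflow(text, llm):
    """Fixed order of nodes: extract (LLM) -> validate -> route."""
    raw = llm(text)
    amount = validate_amount(raw)
    return route(amount)


big = run_workflow("hi pls refund of 1250 dollars thx", fake_llm)
small = run_workflow("refund 40 bucks", fake_llm)
```

If the extraction goes wrong, the failure surfaces in `validate_amount`, which is the single node to patch.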

Antrixsh Gupta@AntrixshG·28 Mar

The biggest blindspot in agent dev isn't reasoning, it's infra. We obsess over prompts but ignore structural failures: missing idempotency keys, hidden trace loops, and duplicate tool calls. Stop evaluating just final outputs and audit the trace. #AIAgents #LLMs #Infra
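
A small sketch of how an idempotency key can stop duplicate tool calls from repeating a side effect. Deriving the key from the run ID, tool name, and arguments is an assumption made here; many real systems instead take a key supplied by the caller. `ToolExecutor` and the tool names are hypothetical.

```python
import hashlib
import json


class ToolExecutor:
    """Runs each distinct (run_id, tool, args) call at most once."""

    def __init__(self):
        self._results = {}
        self.side_effects = 0

    def execute(self, run_id, tool, args):
        # Assumption: the key is a hash of the run, the tool, and its arguments.
        payload = json.dumps([run_id, tool, args], sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key in self._results:
            # Duplicate call: return the stored result, perform nothing.
            return self._results[key]
        self.side_effects += 1  # placeholder for the real side effect
        result = f"{tool} ok #{self.side_effects}"
        self._results[key] = result
        return result


executor = ToolExecutor()
first = executor.execute("run-1", "charge_card", {"amount": 25, "currency": "USD"})
# The agent retries the same call with the arguments in a different order.
retry = executor.execute("run-1", "charge_card", {"currency": "USD", "amount": 25})
other = executor.execute("run-1", "charge_card", {"amount": 30, "currency": "USD"})
```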

Antrixsh Gupta@AntrixshG·28 Mar

Evaluating agents purely on final output is a trap. They can hit the right answer while doing nonsense under the hood: infinite loops, hallucinated tool calls, and wasted compute. If your evals don't score the execution trace, you're flying blind. #Agents #Evals #LLMs
