Sai

298 posts


@mnvsk97

💻 Dev Rel at Truefoundry 🧑‍🎓 IIT Roorkee

San Francisco, CA · Joined December 2015
42 Following · 72 Followers
Sai
Sai@mnvsk97·
@jpschroeder Setting goals with 10% usage left is crazy haha
Justin Schroeder
Justin Schroeder@jpschroeder·
daaaaaaaanggg. alright codex. alright. Long-term goals just shipped in the latest version behind a feature flag. Apparently, it just runs...
Sai
Sai@mnvsk97·
@BogdanFlo26 Yep, switching models is fine to an extent, but what makes the biggest difference is how the harness is built, how the memory and context are handled.
Bogdan Florian Iancu
Bogdan Florian Iancu@BogdanFlo26·
@mnvsk97 Exactly. A lot of people think the leverage is “which model is better”. Past a certain point, the real multiplier is handoff quality, context continuity, and orchestration across sessions.
Sai
Sai@mnvsk97·
I use Codex, Claude Code, and Cursor, sometimes all three in the same session. Every time I switch, I lose my place.
→ The new tool doesn't know what I just told the last one
→ Re-explaining the same task from scratch
→ No clue what I've spent across sessions
→ Fumbling through flags I used an hour ago
→ Hit a rate limit, and now I'm starting over somewhere else
The tools work. Switching between them doesn't.
So I'm building a CLI that wraps all of them. One command, shared context, cost tracking across tools, and automatic failover when you hit limits.
Shipping soon.
#buildinpublic #claudecode #codex
Sai tweet media
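The automatic-failover idea in the tweet above could look something like this minimal sketch: try each coding CLI in order and fall through to the next one on a rate limit. The tool names and the "429 in output" check are my assumptions for illustration, not the actual tool's design.

```python
import subprocess

# Hypothetical tool order; real CLIs and their exit behavior may differ.
TOOLS = ["codex", "claude", "cursor-agent"]

def run_with_failover(prompt: str, runner=subprocess.run) -> str:
    """Run the prompt on the first tool that isn't rate-limited."""
    for tool in TOOLS:
        result = runner([tool, prompt], capture_output=True, text=True)
        out = (result.stdout or "") + (result.stderr or "")
        if result.returncode == 0:
            return f"{tool}: {result.stdout}"
        if "429" in out or "rate limit" in out.lower():
            continue  # rate-limited: fall through to the next tool
        raise RuntimeError(f"{tool} failed: {out}")
    raise RuntimeError("all tools rate-limited")
```

The shared-context and cost-tracking pieces would layer on top of a loop like this; the failover itself is just ordered retry with a distinguishable rate-limit signal.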
Sai
Sai@mnvsk97·
@adisingh Makes sense Thanks 🙏🏻
Sai
Sai@mnvsk97·
@adisingh I wanted to do this with Truefoundry’s signup flow but I was concerned about unverified signups. How do you tackle this?
Adi Singh
Adi Singh@adisingh·
F*ck it. Your agent can now sign up for AgentMail with one prompt "get yourself an inbox through agent.email" New users, test this and let me know how it works!
Sai
Sai@mnvsk97·
Someone sends a bug video. You watch it, write a ticket, and assign it. A lot of steps for "tell me what's wrong."

eyeroll does the watching. Loom, YouTube, local files, screenshots. Pulls frames, transcribes audio, runs it through Gemini or GPT-4 or Ollama, and writes up what's broken, repro steps, and severity.

It's also a Claude Code plugin, which is where it gets interesting. Your agent already has your codebase loaded, so when eyeroll says "the sidebar re-renders stale data after navigation," it knows which file, which component. You get a report written against your code, not a generic description of what's on screen.

pip install eyeroll
/eyeroll:watch --context "login broken after PR #432"
/eyeroll:fix

github.com/mnvsk97/eyeroll

#claudecode #skills #aiagents #devtools #buildinpublic
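The "pulls frames" step described above boils down to sampling the video at fixed intervals, capped so a long recording doesn't blow up the LLM context. A minimal sketch of choosing which timestamps to extract; eyeroll's real sampling strategy is an assumption here:

```python
def frame_timestamps(duration_s: float, every_s: float = 2.0,
                     max_frames: int = 50) -> list[float]:
    """Evenly spaced sample times for a video, capped at max_frames."""
    n = int(duration_s // every_s) + 1
    if n > max_frames:
        # Stretch the interval so the whole video still gets covered.
        every_s = duration_s / (max_frames - 1)
        n = max_frames
    return [round(i * every_s, 2) for i in range(n)]
```

Each timestamp would then be handed to a frame extractor (e.g. ffmpeg) before the frames go to the vision model.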
Sai
Sai@mnvsk97·
AI agents work fine until the API has a bad day. This isn't hypothetical. Claude's API was up 98.98% over the last 90 days, which is almost 22 hours of downtime (check status.claude.com). If you're self-hosting without the right config, that number is usually worse.

Finding this out in production sucks, so I built AgentBreak. It's a proxy that sits between your agent and its LLM or tool server and deliberately throws failures at it. Timeouts, 429s, garbage responses, slow tools. Basically, chaos engineering, but for agents.

pip install agentbreak
agentbreak init
agentbreak serve

Point your agent at localhost:5005 instead of the real API. Run it. Check the scorecard. It tells you what your agent survived, what degraded it, and what broke it completely, along with suggested fixes. No SDK changes needed, just swap the URL. Works with the OpenAI API, Anthropic's Messages API, and MCP servers.

There's also a Claude Code plugin if you want a more guided setup:

/plugin marketplace add mnvsk97/agentbreak
/plugin install agentbreak@mnvsk97-agentbreak
/reload-plugins

Then three steps:
1. /agentbreak:init reads your codebase, figures out your provider, and configures everything.
2. /agentbreak:create-tests generates fault scenarios based on your actual agent.
3. /agentbreak:run-tests launches the proxy, runs traffic through it, and gives you the report.

Github repo: lnkd.in/gyPFZnje
Docs: lnkd.in/g--szHFj

#llm #agents #langchain
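The core fault-injection move the tweet describes can be sketched in a few lines: wrap the upstream call and, on a dice roll, replace the response with a timeout, a 429, or garbage. Fault names and rates here are illustrative, not AgentBreak's actual config format.

```python
import random

# Illustrative fault table: probability of each injected failure mode.
FAULTS = {"timeout": 0.1, "rate_limit": 0.1, "garbage": 0.1}

def inject(upstream, rng=random.random):
    """Call upstream, but sometimes return an injected failure instead."""
    roll = rng()
    cum = 0.0
    for fault, rate in FAULTS.items():
        cum += rate
        if roll < cum:
            if fault == "timeout":
                raise TimeoutError("injected timeout")
            if fault == "rate_limit":
                return {"status": 429, "body": "injected rate limit"}
            return {"status": 200, "body": "}{ not json"}  # garbage payload
    return {"status": 200, "body": upstream()}
```

A real proxy would sit on an HTTP port and forward to the provider, but the agent-facing behavior, "sometimes the API lies to you", is exactly this.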
Sai
Sai@mnvsk97·
I've been building agents for a while now, and one thing I like testing before going to production is how resilient my agents are. I'm working on a tool that replicates adverse behaviors like an inference server going down, brownouts, MCP servers responding with a bad response schema, and more. Attaching a sneak peek at how the final report looks.
Sai tweet media
Sai retweeted
Sydney Runkle
Sydney Runkle@sydneyrunkle·
day 2 of the harness engineering series: dynamic config middleware lets you reshape your agent's model, tools, and prompt at every step based on context. ex: LLMToolSelectorMiddleware runs a fast filter on your tool registry so your main model receives streamlined tool specs.
Sydney Runkle tweet media
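The tool-selector middleware described in the retweet above, a fast filter pass over the tool registry before the main model sees the specs, can be sketched like this. The keyword-overlap scoring is a stand-in for the small LLM the real middleware uses; the function name and registry shape are my assumptions.

```python
def select_tools(query: str, registry: dict[str, str], k: int = 3) -> list[str]:
    """Cheaply rank tools by description overlap with the query, keep top k."""
    words = set(query.lower().split())
    scored = sorted(
        registry,
        key=lambda name: -len(words & set(registry[name].lower().split())),
    )
    return scored[:k]
```

The main model then receives only the k selected tool specs instead of the whole registry, which is the "streamlined tool specs" payoff.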
Sai
Sai@mnvsk97·
My mindset has fundamentally shifted in how I build apps now. The first thing I ask is whether it can be a skill; then I work backwards towards a CLI, an API, or maybe an MCP server if I can't do either. Only if it absolutely cannot be a skill do I think about building a full-blown app.
Sai
Sai@mnvsk97·
@E_FutureFan @LangChain @hwchase17 The transcript in this example is pretty small; it fits the context windows of the models I chose. If the transcripts are big, which is the case in the real world, I would let the agent look through the transcript in chunks by providing a search tool of sorts (I'm sure there are better ways).
Erika S
Erika S@E_FutureFan·
@mnvsk97 @LangChain @hwchase17 Admittedly I'm still learning orchestration nuances, but this mirrors my fine-tuning work. Native strengths beat generalists. How are you managing context window limits?
Sai
Sai@mnvsk97·
I've been following DeepAgents by @LangChain since @hwchase17 started it as a personal project — great to see the traction it's getting and that it now powers OpenSWE! Just built a Voice Call Analyser powered by 3 LLMs running in parallel using DeepAgents: - Gemini analyses sentiment - Claude extracts action items & creates Linear tickets via MCP - GPT coaches the rep - Claude Sonnet 4.6 orchestrates the whole flow All LLMs route through @truefoundry AI Gateway (one endpoint, one key, full observability). Linear ticket creation goes through the TrueFoundry MCP Gateway — a secure tool with built-in auth and audit. Different models are good at different things. An AI Gateway + MCP Gateway make it practical to use the right model for each task. repo: github.com/truefoundry/tf…
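The fan-out described above, three models analysing the same call in parallel through one gateway endpoint, can be sketched with a thread pool. `call_gateway` is a placeholder for a single AI-gateway client call; the task-to-model mapping comes from the tweet, but the code itself is an assumption, not the repo's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Which model handles which analysis task (per the tweet above).
TASKS = {
    "sentiment": "gemini",
    "action_items": "claude",
    "coaching": "gpt",
}

def analyse(transcript: str, call_gateway) -> dict[str, str]:
    """Run all analysis tasks concurrently through one gateway function."""
    with ThreadPoolExecutor() as pool:
        futures = {
            task: pool.submit(call_gateway, model, transcript)
            for task, model in TASKS.items()
        }
        return {task: f.result() for task, f in futures.items()}
```

Routing every call through one `call_gateway` is what makes the "one endpoint, one key, full observability" setup practical: swapping a model is a one-line change in the task table.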
Sai
Sai@mnvsk97·
@ArvindKampli @LangChain @hwchase17 I wanted to demonstrate the capabilities of subagents. Model choice came down to what was available to me. You can swap in kimi 2.5.
Sai
Sai@mnvsk97·
@sydneyrunkle @masondrxy How is hosting a deepagents server different from a langgraph server? Do you recommend hosting it with different infra in mind?
Sydney Runkle
Sydney Runkle@sydneyrunkle·
ai dev report from today: i'm working on a new docs page spent 10 minutes whiteboarding + mapping some concepts into a table took a pic of the result, uploaded to deepagents CLI, then updated my docs PR w/ my diagram. deepagents deployment docs coming soon!!!
Sai
Sai@mnvsk97·
come say hi
tokens&ai @tokensandai

this friday in SF we're putting $50K+ on the table. join us at @awscloud for a full day of building agents that actually execute. research agents that go 10 layers deep. support systems that resolve issues end to end. pipelines that run themselves.

🏆 $50K+ inclusive of prizes, credits, and $6500+ cash
📅 march 27
🎟️ in-person | teams of 4 max | limited spots

registrations close thursday. apply below 👇

Sai
Sai@mnvsk97·
@truefoundry is sponsoring a hackathon on March 27th with over 1,000 builders registered. I'll be speaking and presenting about our AI Gateway. If you’re interested in hacking on deep agents, sign up here: luma.com/deepagentshack Come say Hi!