sahan

630 posts

@sahanTweets

co-founder @ https://t.co/YzAgcXEaiT. building ai products that add tiny bits of value to humanity. scuba diver. check - `npx keyleaks`, https://t.co/kcSuVxwqXf, https://t.co/n7SdRdkOGb

San Francisco, CA · Joined May 2011
1.2K Following · 146 Followers
sahan@sahanTweets·
@confusedqubit cold start is nice, but qemu support is the “is this a real machine or a fancy subprocess” test. agents eventually need to meet the same cursed infra shape users have 😅
Shivansh Vij@confusedqubit·
Real sandbox providers enable nested virtualization in their sandboxes. If your sandbox can’t run QEMU, it’s a sparking container. If I achieve anything this year it will be beating all the bad sandbox providers. 500ms for a cold start? You’re a joke.
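The "can it run QEMU" test above is easy to probe. A minimal sketch, assuming a Linux guest where KVM is exposed as `/dev/kvm` (the device path is standard; whether a given sandbox actually exposes it is the whole question):

```python
import os

def kvm_available() -> bool:
    # Hardware-accelerated nested virtualization requires the host to
    # expose /dev/kvm inside the sandbox; plain containers usually don't.
    return os.path.exists("/dev/kvm")

def qemu_mode() -> str:
    # QEMU still runs without KVM, but falls back to slow TCG emulation,
    # which is the "fancy subprocess" failure mode.
    return "kvm" if kvm_available() else "tcg-emulation-only"
```

Run inside a candidate sandbox: if `qemu_mode()` reports TCG-only, you have a container with extra steps, not a machine.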
sahan@sahanTweets·
@QuinnyPig @vercel @Cloudflare @awscloud this is the spec: gated mutations, brokered secrets, consistent APIs, and agent identity. without those, “agent-native cloud” just means faster ways to do weird things in the wrong account.
Corey Quinn@QuinnyPig·
Been thinking about what an "agent-native cloud" actually needs to look like. Mentioned this, and @Vercel's CEO replied that it'll be them. Cool! Here's the spec they (or @Cloudflare, or some startup not yet invented) actually have to hit. It won't be @awscloud. Thread...
Guillermo Rauch@rauchg

@QuinnyPig It'll be ▲. Would love your feedback. This is our primary focus!

sahan@sahanTweets·
@ivanburazin nested infra support is where sandboxes stop being “run this command safely” and become actual world simulators for agents. docker compose is often the real task.
Ivan Burazin@ivanburazin·
Docker in Docker is something almost no sandbox provider supports. For RL workloads specifically, being able to spin up a Docker Compose or a K3S cluster inside a sandbox unlocks an enormous range of workflows that simply don't work anywhere else. That alone has been a meaningful wedge into the research + RL customer segment.
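Why Docker-in-Docker is a wedge: the official `docker:dind` image needs `--privileged`, which is exactly the capability most sandbox providers refuse to grant. A sketch that only builds the command (container name is illustrative):

```python
def dind_run_command(name: str = "inner-docker") -> list:
    # Docker-in-Docker requires --privileged (or a close equivalent);
    # a sandbox that blocks this flag cannot host Compose or K3S inside itself.
    return [
        "docker", "run", "-d",
        "--privileged",     # the capability most sandboxes won't grant
        "--name", name,
        "docker:dind",      # official Docker-in-Docker image
    ]
```

In a sandbox that truly supports nested containers, `subprocess.run(dind_run_command())` succeeds; in a locked-down one, the `--privileged` flag is where it dies.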
sahan@sahanTweets·
@elvissun @AnthropicAI /goal spawn subagents until tasks.json stops changing is how you wake up with 400 commits, 3 new frameworks, and one TODO that says “investigate ourselves” 😅
sahan@sahanTweets·
@appfactory the interesting bit is explicit state movement. agents need portable machines, but security needs the handoff to be a snapshot you chose, not a background leak with a cute UI.
Peter Pistorius@appfactory·
machinen.dev - boot once, run everywhere. A MicroVM that runs on hardware you already own. Close your laptop and it hands off to another host. Works across macOS, Linux, and Raspberry Pi. (aarch64)
sahan@sahanTweets·
@KevRojox local OCR as a tool is underrated. it turns screenshots from context sludge into searchable text, and saves vision tokens for the cases where vision actually matters 👀
KevRojo@KevRojox·
🦅 Dulus 0.2.89 — live on PyPI ✨

Highlights:

🔤 **Local OCR everywhere**
- New `/ocr` command and `ExtractTextFromImage` tool — extract text from images via pytesseract (with easyocr fallback). Zero vision tokens, offline, instant.
- Use case: receipts, code screenshots, error stacks, dense tables. Anything text-shaped inside an image.

📸 **WebBridgeScreenshot now auto-OCRs**
- One tool call returns `{ saved_to, text }`. Model no longer has to chain Screenshot → ExtractText. No more "let me Read this PNG" → 8KB of binary garbage in context.

🖼️ **/img + OCR fallback**
- Vision models (Claude, GPT-4o, Gemini, etc.) now receive the image AND a verbatim text transcription side-by-side. Fewer OCR-style misreads on receipts and code.
- Text-only models (or routes where the bridge drops the image) now get the OCR text and can actually answer. Graceful degrade > silent fail.

🦅 **Sandbox embedded inside the desktop GUI**
- Click "🦅 Sandbox" in the GUI Web tab → the Dulus OS renders INSIDE the frame, not in a popup window. pywebview spawned as a subprocess, reparented via Win32 SetParent into the tkinter content frame. Production-grade Python desktop integration.
- Drag inside the OS doesn't jitter the embed. Opens at the GUI's current size, frozen there.
- Smart URL: uses `:5000/sandbox/` if /webchat is running (gives full API access), otherwise spins up a local SandboxServer on a random port. Works offline either way.

📚 **kepano/obsidian-skills bundled**
- 5 skills shipped out of the box: defuddle, json-canvas, obsidian-bases, obsidian-cli, obsidian-markdown.
- Dulus writes Obsidian Flavored Markdown by default — wikilinks `[[Note]]`, callouts `> [!note]`, properties, embeds. Killer pairing: open ~/.dulus/memory as an Obsidian vault and the graph view connects related memories automatically.

🌐 **LiteLLM gateway polish**
- One provider entry → 100+ backends (OpenRouter, Groq, Together, xAI, Mistral, Cohere, Perplexity, …)
- Welcome wizard auto-installs LiteLLM in-place and asks for the backend-specific API key (no more exit-wizard, pip-install, re-run dance).

📦 Install: `pip install -U dulus`
🔗 PyPI: pypi.org/project/dulus/…
🔗 GitHub: github.com/KevRojo/Dulus
🌐 Web: kevrojo.github.io/Dulus/

Named after the bird, not the rocket. 🦅🇩🇴 $Dulus $DULUS #Dulus
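The screenshots-to-searchable-text idea can be sketched in a few lines. `pytesseract.image_to_string` and `PIL.Image.open` are the real APIs; the graceful-degrade wrapper and its empty-string fallback are my assumption, not Dulus's actual implementation:

```python
def ocr_image(path: str) -> str:
    """Extract text from an image locally; return '' when OCR deps are absent."""
    try:
        import pytesseract          # wraps the tesseract CLI
        from PIL import Image
    except ImportError:
        return ""                   # degrade gracefully instead of failing silently
    return pytesseract.image_to_string(Image.open(path))
```

The payoff is exactly what the reply says: the model gets plain text in context instead of spending vision tokens, and vision stays reserved for images that aren't text-shaped.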
sahan@sahanTweets·
@ThePrimeagen a prompt endpoint is not an API, it’s a negotiation with side effects. the boring contract surface is the part keeping prod from becoming improv night.
sahan@sahanTweets·
@geoffreywoo the moat test is brutal: does cheaper intelligence make your loop spin faster, or does it just make your slide look cheaper? 💀
GEOFF WOO@geoffreywoo·
if your moat slide has "fine-tuned model" anywhere near the top, i already know you’re dead the second base models move. the brutal question is simpler: what gets stronger for you when intelligence gets cheaper?
sahan@sahanTweets·
@diptanu 100GB sandbox pause/resume is the unglamorous layer that makes long-running agents feel real. users only notice it when it fails.
Diptanu Choudhury@diptanu·
We deployed a new distributed warm migration mechanism for Sandboxes in production today! 14 of 15 sandboxes with 100+ GB snapshots were paused and resumed across machines when we drained a machine. We discovered some bugs and bottlenecks in the process which are pretty trivial to fix. This paves the way for us to roll out new machines in production and move sandboxes around without customers noticing major disruptions.
sahan@sahanTweets·
@ghumare64 the useful sandbox question is not Firecracker vs libkrun in abstract, it’s where the agent actually runs: laptop, CI, cloud, and what isolation survives all three.
sahan@sahanTweets·
@ctatedev explicit capabilities + JSON diagnostics is the agent-native bit. the compiler error becoming a repair contract is more interesting than the syntax.
Chris Tate@ctatedev·
Introducing Zero The programming language for agents. I wanted a systems language that was faster, smaller, and easier for agents to use and repair. Explicit capabilities. JSON diagnostics. Typed safe fixes. Made for agents on day zero.
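Zero's real diagnostic schema isn't shown in the thread; a hypothetical shape (field names are mine) illustrating why JSON diagnostics plus typed fixes turn a compiler error into a repair contract an agent can act on without parsing prose:

```python
# Hypothetical diagnostic: illustrative only, not Zero's actual schema.
diagnostic = {
    "code": "E0425",
    "message": "cannot find value `contx` in this scope",
    "span": {"file": "main.zr", "line": 12, "col": 5},
    "suggested_fix": {"replace_with": "ctx"},
}

def apply_fix(source_line: str, diag: dict) -> str:
    # The offending identifier is quoted in backticks; the typed fix says
    # what to substitute. No natural-language parsing required.
    bad = diag["message"].split("`")[1]
    return source_line.replace(bad, diag["suggested_fix"]["replace_with"])

print(apply_fix("let x = contx.load();", diagnostic))  # → let x = ctx.load();
```

The contrast is with free-text errors, where the agent must guess which token the message is about before it can attempt a repair.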
sahan@sahanTweets·
@palashshah agent evals are hard because the failure is not one answer anymore, it’s the whole trajectory: tools chosen, retries, side effects, and whether cleanup happened.
Palash Shah@palashshah·
turns out that building evals is super super challenging even now. i thought a lot of it was table stakes but turns out it has only become harder since agents are now more complex than ever! going to start tweeting more about how i design evals, especially to create autonomous improvement loops!
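A toy sketch of what "the failure is the whole trajectory" means for scoring. The names and weights are mine, not any real eval framework; the point is that the score is a function of the step sequence and side effects, not of one final answer:

```python
from dataclasses import dataclass

@dataclass
class Step:
    tool: str   # which tool the agent chose
    ok: bool    # whether the call succeeded

def score_trajectory(steps, cleaned_up: bool) -> float:
    """Tool-call success rate, halved when the agent skipped cleanup."""
    if not steps:
        return 0.0
    success_rate = sum(s.ok for s in steps) / len(steps)
    return success_rate * (1.0 if cleaned_up else 0.5)
```

A real harness would also penalize unnecessary retries and unwanted side effects, but even this shape shows why single-answer grading stops working for agents.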
sahan@sahanTweets·
@Aiswarya_Sankar the useful chart is not tokens spent, it’s tokens that survived into shipped product. everything else is agent churn wearing a productivity costume 🎭
Aiswarya Sankar@Aiswarya_Sankar·
The token maxxing hype is over 80% waste. We pulled data from our customer spend analysis across 2,444 companies and found something uncomfortable. For every $1 spent on AI tokens, only $0.18 actually reaches the product. The other $0.82 breaks down like this: $0.44 goes into fixing bugs the AI itself introduced, $0.27 disappears into rewriting or reworking code that missed context, and $0.11 is lost to review friction & context switching. Caught up with @zubinpahuja an ex Uber colleague and now founder of @joinNEXA to further discuss his thoughts on where the token maxxing trend is headed.
sahan@sahanTweets·
@ivanburazin sandbox forking turns handoff from a summary problem into a state-continuity problem. the hard part is deciding what should not be inherited: secrets, creds, stuck processes.
Ivan Burazin@ivanburazin·
No one's talking about how sandbox forking is going to change how multi-agent handoffs work. Right now, when one agent hands work to another, you destroy the VM and start fresh. The context, file state, environment, etc are gone. With forking, you can just instantly clone the running sandbox, and the next agent picks up from exactly where the last one left off.
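One way to make "what should not be inherited" concrete: scrub credential-shaped environment variables before handing the forked sandbox to the next agent. The marker list is an assumption for illustration; a real fork would also drop mounted secrets and kill stuck processes:

```python
SECRET_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL")

def scrub_env(env: dict) -> dict:
    # Keep benign state (paths, locale, tool config); drop anything
    # credential-shaped so the next agent inherits work, not access.
    return {
        k: v for k, v in env.items()
        if not any(marker in k.upper() for marker in SECRET_MARKERS)
    }
```

This is the state-continuity trade in miniature: the fork preserves everything by default, so safety has to be an explicit subtraction.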
sahan@sahanTweets·
@palashshah eval environments feel like the product now. the hard part is making them adversarial enough to matter without turning them into a toy replica of the real workflow.
Palash Shah@palashshah·
building environments for evals is some of the most fun i've had. it's such a cool balance between engineering + research, thinking about what is the best way to evaluate an agent. feels like one of the best places to be doing work in right now.
sahan@sahanTweets·
@badlogicgames no-tools mode is agent UX exposure therapy. remove enough affordances and eventually the user realizes they were the plugin all along 😅
sahan@sahanTweets·
@pitdesi chat is great for “what should i look at?” finance gets serious at “do the thing.” approvals, audit trails, limits, and undo buttons beat a text box there.
Sheel Mohnot@pitdesi·
Hot take: Chat is useful for asking questions about money. But it is not a good interface for managing money.

Finance is full of structured workflows: budgeting, bill pay, taxes, investing, debt payoff, categorization, approvals, alerts, and planning. For those, you want purpose-built UI. Charts, tables, sliders for scenarios, dashboards, approval flows, etc.

Chat works best when intent is fuzzy: "Where did my money go?" or "What should I look at?" But once the job becomes structured, repeated, visual, or high-stakes, UI wins. People are way more likely to use a purpose-built UI that guides them than a chat window for this.

Like what @bchesky said about travel, similar thing here.
ChatGPT@ChatGPTapp

A preview for Pro users: a new personal finance experience in ChatGPT. Pro users in the U.S. can securely connect financial accounts, see where their money is going, and ask questions based on the information they choose to connect. Your full financial picture, now in ChatGPT.
