sahan

630 posts

@sahanTweets

co-founder @ https://t.co/YzAgcXEaiT. building ai products that add tiny bits of value to humanity. scuba diver. check - `npx keyleaks`, https://t.co/kcSuVxwqXf, https://t.co/n7SdRdkOGb

San Francisco, CA · Joined May 2011
1.2K Following · 146 Followers
sahan@sahanTweets·
@confusedqubit cold start is nice, but qemu support is the “is this a real machine or a fancy subprocess” test. agents eventually need to meet the same cursed infra shape users have 😅
Shivansh Vij@confusedqubit·
Real sandbox providers enable nested virtualization in their sandboxes. If your sandbox can’t run QEMU, it’s a sparking container. If I achieve anything this year it will be beating all the bad sandbox providers. 500ms for a cold start? You’re a joke.
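The "can it run QEMU" test above is easy to probe. A minimal sketch, assuming a Linux guest where KVM is exposed as `/dev/kvm` (the device path is standard; whether a given sandbox actually exposes it is the whole question):

```python
import os

def kvm_available() -> bool:
    # Hardware-accelerated nested virtualization requires the host to
    # expose /dev/kvm inside the sandbox; plain containers usually don't.
    return os.path.exists("/dev/kvm")

def qemu_mode() -> str:
    # QEMU still runs without KVM, but falls back to slow TCG emulation,
    # which is the "fancy subprocess" failure mode.
    return "kvm" if kvm_available() else "tcg-emulation-only"
```

Run inside a candidate sandbox: if `qemu_mode()` reports TCG-only, you have a container with extra steps, not a machine.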
sahan@sahanTweets·
@QuinnyPig @vercel @Cloudflare @awscloud this is the spec: gated mutations, brokered secrets, consistent APIs, and agent identity. without those, “agent-native cloud” just means faster ways to do weird things in the wrong account.
Corey Quinn@QuinnyPig·
Been thinking about what an "agent-native cloud" actually needs to look like. Mentioned this, and @Vercel's CEO replied that it'll be them. Cool! Here's the spec they (or @Cloudflare, or some startup not yet invented) actually have to hit. It won't be @awscloud. Thread...
Guillermo Rauch@rauchg

@QuinnyPig It'll be ▲. Would love your feedback. This is our primary focus!

sahan@sahanTweets·
@ivanburazin nested infra support is where sandboxes stop being “run this command safely” and become actual world simulators for agents. docker compose is often the real task.
Ivan Burazin@ivanburazin·
Docker in Docker is something almost no sandbox provider supports. For RL workloads specifically, being able to spin up a Docker Compose or a K3S cluster inside a sandbox unlocks an enormous range of workflows that simply don't work anywhere else. That alone has been a meaningful wedge into the research + RL customer segment.
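Why Docker-in-Docker is a wedge: the official `docker:dind` image needs `--privileged`, which is exactly the capability most sandbox providers refuse to grant. A sketch that only builds the command (container name is illustrative):

```python
def dind_run_command(name: str = "inner-docker") -> list:
    # Docker-in-Docker requires --privileged (or a close equivalent);
    # a sandbox that blocks this flag cannot host Compose or K3S inside itself.
    return [
        "docker", "run", "-d",
        "--privileged",     # the capability most sandboxes won't grant
        "--name", name,
        "docker:dind",      # official Docker-in-Docker image
    ]
```

In a sandbox that truly supports nested containers, `subprocess.run(dind_run_command())` succeeds; in a locked-down one, the `--privileged` flag is where it dies.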
sahan@sahanTweets·
@elvissun @AnthropicAI /goal spawn subagents until tasks.json stops changing is how you wake up with 400 commits, 3 new frameworks, and one TODO that says “investigate ourselves” 😅
sahan@sahanTweets·
@appfactory the interesting bit is explicit state movement. agents need portable machines, but security needs the handoff to be a snapshot you chose, not a background leak with a cute UI.
Peter Pistorius@appfactory·
machinen.dev - boot once, run everywhere. A MicroVM that runs on hardware you already own. Close your laptop and it hands off to another host. Works across macOS, Linux, and Raspberry Pi. (aarch64)
sahan@sahanTweets·
@KevRojox local OCR as a tool is underrated. it turns screenshots from context sludge into searchable text, and saves vision tokens for the cases where vision actually matters 👀
KevRojo@KevRojox·
🦅 Dulus 0.2.89 — live on PyPI ✨

Highlights:

🔤 **Local OCR everywhere**
- New `/ocr` command and `ExtractTextFromImage` tool — extract text from images via pytesseract (with easyocr fallback). Zero vision tokens, offline, instant.
- Use case: receipts, code screenshots, error stacks, dense tables. Anything text-shaped inside an image.

📸 **WebBridgeScreenshot now auto-OCRs**
- One tool call returns `{ saved_to, text }`. Model no longer has to chain Screenshot → ExtractText. No more "let me Read this PNG" → 8KB of binary garbage in context.

🖼️ **/img + OCR fallback**
- Vision models (Claude, GPT-4o, Gemini, etc.) now receive the image AND a verbatim text transcription side-by-side. Fewer OCR-style misreads on receipts and code.
- Text-only models (or routes where the bridge drops the image) now get the OCR text and can actually answer. Graceful degrade > silent fail.

🦅 **Sandbox embedded inside the desktop GUI**
- Click "🦅 Sandbox" in the GUI Web tab → the Dulus OS renders INSIDE the frame, not in a popup window. pywebview spawned as a subprocess, reparented via Win32 SetParent into the tkinter content frame. Production-grade Python desktop integration.
- Drag inside the OS doesn't jitter the embed. Opens at the GUI's current size, frozen there.
- Smart URL: uses `:5000/sandbox/` if /webchat is running (gives full API access), otherwise spins up a local SandboxServer on a random port. Works offline either way.

📚 **kepano/obsidian-skills bundled**
- 5 skills shipped out of the box: defuddle, json-canvas, obsidian-bases, obsidian-cli, obsidian-markdown.
- Dulus writes Obsidian Flavored Markdown by default — wikilinks `[[Note]]`, callouts `> [!note]`, properties, embeds. Killer pairing: open ~/.dulus/memory as an Obsidian vault and the graph view connects related memories automatically.

🌐 **LiteLLM gateway polish**
- One provider entry → 100+ backends (OpenRouter, Groq, Together, xAI, Mistral, Cohere, Perplexity, …)
- Welcome wizard auto-installs LiteLLM in-place and asks for the backend-specific API key (no more exit-wizard, pip-install, re-run dance).

📦 Install: `pip install -U dulus`
🔗 PyPI: pypi.org/project/dulus/…
🔗 GitHub: github.com/KevRojo/Dulus
🌐 Web: kevrojo.github.io/Dulus/

Named after the bird, not the rocket. 🦅🇩🇴 $Dulus $DULUS #Dulus
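The screenshots-to-searchable-text idea can be sketched in a few lines. `pytesseract.image_to_string` and `PIL.Image.open` are the real APIs; the graceful-degrade wrapper and its empty-string fallback are my assumption, not Dulus's actual implementation:

```python
def ocr_image(path: str) -> str:
    """Extract text from an image locally; return '' when OCR deps are absent."""
    try:
        import pytesseract          # wraps the tesseract CLI
        from PIL import Image
    except ImportError:
        return ""                   # degrade gracefully instead of failing silently
    return pytesseract.image_to_string(Image.open(path))
```

The payoff is exactly what the reply says: the model gets plain text in context instead of spending vision tokens, and vision stays reserved for images that aren't text-shaped.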
sahan@sahanTweets·
@ThePrimeagen a prompt endpoint is not an API, it’s a negotiation with side effects. the boring contract surface is the part keeping prod from becoming improv night.
sahan@sahanTweets·
@geoffreywoo the moat test is brutal: does cheaper intelligence make your loop spin faster, or does it just make your slide look cheaper? 💀
GEOFF WOO@geoffreywoo·
if your moat slide has "fine-tuned model" anywhere near the top, i already know you’re dead the second base models move. the brutal question is simpler: what gets stronger for you when intelligence gets cheaper?
sahan@sahanTweets·
@diptanu 100GB sandbox pause/resume is the unglamorous layer that makes long-running agents feel real. users only notice it when it fails.
Diptanu Choudhury@diptanu·
We deployed a new distributed warm migration mechanism for Sandboxes in production today! 14 of 15 sandboxes with 100+ GB snapshots were paused and resumed across machines when we drained a machine. We discovered some bugs and bottlenecks in the process which are pretty trivial to fix. This paves the way for us to roll out new machines in production and move sandboxes around without customers noticing major disruptions.
sahan@sahanTweets·
@ghumare64 the useful sandbox question is not Firecracker vs libkrun in abstract, it’s where the agent actually runs: laptop, CI, cloud, and what isolation survives all three.
sahan@sahanTweets·
@ctatedev explicit capabilities + JSON diagnostics is the agent-native bit. the compiler error becoming a repair contract is more interesting than the syntax.
Chris Tate@ctatedev·
Introducing Zero The programming language for agents. I wanted a systems language that was faster, smaller, and easier for agents to use and repair. Explicit capabilities. JSON diagnostics. Typed safe fixes. Made for agents on day zero.
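Zero's real diagnostic schema isn't shown in the thread; a hypothetical shape (field names are mine) illustrating why JSON diagnostics plus typed fixes turn a compiler error into a repair contract an agent can act on without parsing prose:

```python
# Hypothetical diagnostic: illustrative only, not Zero's actual schema.
diagnostic = {
    "code": "E0425",
    "message": "cannot find value `contx` in this scope",
    "span": {"file": "main.zr", "line": 12, "col": 5},
    "suggested_fix": {"replace_with": "ctx"},
}

def apply_fix(source_line: str, diag: dict) -> str:
    # The offending identifier is quoted in backticks; the typed fix says
    # what to substitute. No natural-language parsing required.
    bad = diag["message"].split("`")[1]
    return source_line.replace(bad, diag["suggested_fix"]["replace_with"])

print(apply_fix("let x = contx.load();", diagnostic))  # → let x = ctx.load();
```

The contrast is with free-text errors, where the agent must guess which token the message is about before it can attempt a repair.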
sahan@sahanTweets·
@palashshah agent evals are hard because the failure is not one answer anymore, it’s the whole trajectory: tools chosen, retries, side effects, and whether cleanup happened.
Palash Shah@palashshah·
turns out that building evals is super super challenging even now. i thought a lot of it was table stakes but turns out it has only become harder since agents are now more complex than ever! going to start tweeting more about how i design evals, especially to create autonomous improvement loops!
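A toy sketch of what "the failure is the whole trajectory" means for scoring. The names and weights are mine, not any real eval framework; the point is that the score is a function of the step sequence and side effects, not of one final answer:

```python
from dataclasses import dataclass

@dataclass
class Step:
    tool: str   # which tool the agent chose
    ok: bool    # whether the call succeeded

def score_trajectory(steps, cleaned_up: bool) -> float:
    """Tool-call success rate, halved when the agent skipped cleanup."""
    if not steps:
        return 0.0
    success_rate = sum(s.ok for s in steps) / len(steps)
    return success_rate * (1.0 if cleaned_up else 0.5)
```

A real harness would also penalize unnecessary retries and unwanted side effects, but even this shape shows why single-answer grading stops working for agents.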
sahan@sahanTweets·
@Aiswarya_Sankar the useful chart is not tokens spent, it’s tokens that survived into shipped product. everything else is agent churn wearing a productivity costume 🎭
Aiswarya Sankar@Aiswarya_Sankar·
The token maxxing hype is over 80% waste. We pulled data from our customer spend analysis across 2,444 companies and found something uncomfortable. For every $1 spent on AI tokens, only $0.18 actually reaches the product. The other $0.82 breaks down like this: $0.44 goes into fixing bugs the AI itself introduced, $0.27 disappears into rewriting or reworking code that missed context, and $0.11 is lost to review friction & context switching. Caught up with @zubinpahuja an ex Uber colleague and now founder of @joinNEXA to further discuss his thoughts on where the token maxxing trend is headed.
sahan@sahanTweets·
@ivanburazin sandbox forking turns handoff from a summary problem into a state-continuity problem. the hard part is deciding what should not be inherited: secrets, creds, stuck processes.
Ivan Burazin@ivanburazin·
No one's talking about how sandbox forking is going to change how multi-agent handoffs work. Right now, when one agent hands work to another, you destroy the VM and start fresh. The context, file state, environment, etc are gone. With forking, you can just instantly clone the running sandbox, and the next agent picks up from exactly where the last one left off.
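One way to make "what should not be inherited" concrete: scrub credential-shaped environment variables before handing the forked sandbox to the next agent. The marker list is an assumption for illustration; a real fork would also drop mounted secrets and kill stuck processes:

```python
SECRET_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL")

def scrub_env(env: dict) -> dict:
    # Keep benign state (paths, locale, tool config); drop anything
    # credential-shaped so the next agent inherits work, not access.
    return {
        k: v for k, v in env.items()
        if not any(marker in k.upper() for marker in SECRET_MARKERS)
    }
```

This is the state-continuity trade in miniature: the fork preserves everything by default, so safety has to be an explicit subtraction.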
sahan@sahanTweets·
@palashshah eval environments feel like the product now. the hard part is making them adversarial enough to matter without turning them into a toy replica of the real workflow.
Palash Shah@palashshah·
building environments for evals is some of the most fun i've had. it's such a cool balance between engineering + research, thinking about what is the best way to evaluate an agent. feels like one of the best places to be doing work in right now.
sahan@sahanTweets·
@badlogicgames no-tools mode is agent UX exposure therapy. remove enough affordances and eventually the user realizes they were the plugin all along 😅
sahan@sahanTweets·
@pitdesi chat is great for “what should i look at?” finance gets serious at “do the thing.” approvals, audit trails, limits, and undo buttons beat a text box there.
Sheel Mohnot@pitdesi·
Hot take: Chat is useful for asking questions about money. But it is not a good interface for managing money.

Finance is full of structured workflows: budgeting, bill pay, taxes, investing, debt payoff, categorization, approvals, alerts, and planning. For those, you want purpose-built UI. Charts, tables, sliders for scenarios, dashboards, approval flows, etc.

Chat works best when intent is fuzzy: "Where did my money go?" or "What should I look at?" But once the job becomes structured, repeated, visual, or high-stakes, UI wins. People are way more likely to use a purpose-built UI that guides them than a chat window for this.

Like what @bchesky said about travel, similar thing here.
ChatGPT@ChatGPTapp

A preview for Pro users: a new personal finance experience in ChatGPT. Pro users in the U.S. can securely connect financial accounts, see where their money is going, and ask questions based on the information they choose to connect. Your full financial picture, now in ChatGPT.
