Souvik Banerjee

18 posts

@Souvik1997_

Oakland, CA · Joined March 2009
267 Following · 17 Followers

Souvik Banerjee (@Souvik1997_)
Every year, ~10 million people are booked into U.S. jails. Nearly 70% of the jail population hasn't been convicted of anything. The law guarantees them a phone call. Most don't know a lawyer's number.

A Santa Clara County pilot found that people who spoke to a lawyer right after arrest spent 6 days in jail instead of 29.

So I built JailCall. A real phone number you can call from a police station. An AI voice agent picks up, takes your name and charge, semantically routes your case to local criminal defense firms, sends real intake emails on your behalf, and remembers your case across calls.

Built solo at the YC Call My Agent hackathon. Won 2nd place overall.

Demo: youtube.com/watch?v=c1eKOd…
Code: github.com/souvik1997/Jai…

Thank you to @AgentPhoneHQ!
[YouTube video]
Replies: 3 · Reposts: 0 · Likes: 3 · Views: 131

Souvik Banerjee (@Souvik1997_)
@diptanu @damian_b What do you think would be better than a gateway? My ideal architecture would look like scoped capability tokens minted specifically for each agent/sandbox, with no ambient authority. Agents are the ultimate confused deputy
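A minimal sketch of what "scoped capability tokens minted specifically for each agent/sandbox" could look like, assuming an HMAC-signed claim set that only the issuer can produce. The function names `mint_token` and `check_token`, the scope strings, and the token layout are all hypothetical illustrations, not an API from any project mentioned in this thread.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Sequence


def mint_token(key: bytes, sandbox_id: str, scopes: Sequence[str],
               ttl_s: float, now: Optional[float] = None) -> str:
    """Mint a token bound to one sandbox, a fixed scope list, and an expiry."""
    issued = time.time() if now is None else now
    claims = {"sub": sandbox_id, "scopes": sorted(scopes), "exp": issued + ttl_s}
    body = base64.urlsafe_b64encode(
        json.dumps(claims, sort_keys=True).encode()
    ).decode()
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig


def check_token(key: bytes, token: str, sandbox_id: str, scope: str,
                now: Optional[float] = None) -> bool:
    """Accept only if the signature, sandbox, scope, and expiry all match."""
    current = time.time() if now is None else now
    body, sep, sig = token.rpartition(".")
    if not sep:
        return False
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; reject before parsing untrusted claims.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return (claims["sub"] == sandbox_id
            and scope in claims["scopes"]
            and current < claims["exp"])


# Hypothetical usage: the sandbox holds only this token, never the key.
demo_key = b"held-only-by-the-issuer"
demo_token = mint_token(demo_key, "sandbox-a", ["github:read"], ttl_s=60, now=1000.0)
```

The point of the design is that the checker grants nothing by default: a request succeeds only when the presented token names that sandbox and that scope, so there is no ambient authority for a confused agent to misuse.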
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 72

Diptanu Choudhury (@diptanu)
Solid outline of the future of sandboxes from @damian_b. It's a no-brainer to focus on multi-sandbox environments. The sandbox form factor is best suited to be the primitive for building, testing, and deploying new software. We have to build networking that is simple for agents to manage and that connects sandboxes in a mesh, file systems that are mounted and managed through API calls, and the big one: snapshotting these environments at scale for debugging and bootstrapping new environments. For secrets, the gateway approach is a good start, but IMO it's very fragile.
Quoted: Damian Barabonkov (@damian_b)

The sandboxing debate is becoming more widespread, but I fear it is focusing on the wrong things. Given too many people are ranking startup times, I decided to write this rebuttal.

From Peter’s tweet, his postulation is the correct one: "How would we build software if tokens don’t matter?". In this future, sandboxing will not be about shaving milliseconds off boot. Timings will still matter, but there will be many more layers to this cake.

Most critically, sandboxing will be about giving agents a high-fidelity operating environment. I think the winners will have to get these three things right:

1. Full-capability sandboxes
Most sandboxes people are hyping are not actually capable enough for serious agentic work. A useful coding agent needs to clone arbitrary repos, install dependencies, run tests, spin up services, debug failures, and iterate. For example, many production repos depend on Docker. If your sandbox cannot run Docker, it is probably not ready for the workflows agents are about to own.

2. Trusted credential handling
The moment agents do real work, they need access to real systems. GitHub, Linear, Slack, GMail, cloud accounts, internal dashboards, deployment tools. You cannot just hand the model your secrets and hope for the best. The right abstraction is some version of a trusted proxy: the agent can request actions, but secrets stay controlled, scoped, audited, and revocable.

3. Agent-to-agent communication
Today, most sandboxed agents are isolated workers. I do not think that will last for long. The future probably looks more like fleets of specialized agents, each with different tools, permissions, memories, and objectives. Some review code. Some reproduce bugs. Some run benchmarks. Some test security. Some interact with external apps. Those agents will need to talk to each other, delegate, verify, and coordinate. So the sandbox stops being a box, and it starts to become more of a network substrate.

That is why I think this market is still wide open.

Nobody really knows what the agentic software stack will look like yet. But I am absolutely confident that the winning sandbox provider will not be the one that starts fastest. It will be the one that safely enables agents to do real work.

Replies: 4 · Reposts: 2 · Likes: 17 · Views: 3.4K

Souvik Banerjee retweeted
Damian Barabonkov (@damian_b)
The sandboxing debate is becoming more widespread, but I fear it is focusing on the wrong things. Given too many people are ranking startup times, I decided to write this rebuttal.

From Peter’s tweet, his postulation is the correct one: "How would we build software if tokens don’t matter?". In this future, sandboxing will not be about shaving milliseconds off boot. Timings will still matter, but there will be many more layers to this cake.

Most critically, sandboxing will be about giving agents a high-fidelity operating environment. I think the winners will have to get these three things right:

1. Full-capability sandboxes
Most sandboxes people are hyping are not actually capable enough for serious agentic work. A useful coding agent needs to clone arbitrary repos, install dependencies, run tests, spin up services, debug failures, and iterate. For example, many production repos depend on Docker. If your sandbox cannot run Docker, it is probably not ready for the workflows agents are about to own.

2. Trusted credential handling
The moment agents do real work, they need access to real systems. GitHub, Linear, Slack, GMail, cloud accounts, internal dashboards, deployment tools. You cannot just hand the model your secrets and hope for the best. The right abstraction is some version of a trusted proxy: the agent can request actions, but secrets stay controlled, scoped, audited, and revocable.

3. Agent-to-agent communication
Today, most sandboxed agents are isolated workers. I do not think that will last for long. The future probably looks more like fleets of specialized agents, each with different tools, permissions, memories, and objectives. Some review code. Some reproduce bugs. Some run benchmarks. Some test security. Some interact with external apps. Those agents will need to talk to each other, delegate, verify, and coordinate. So the sandbox stops being a box, and it starts to become more of a network substrate.

That is why I think this market is still wide open.

Nobody really knows what the agentic software stack will look like yet. But I am absolutely confident that the winning sandbox provider will not be the one that starts fastest. It will be the one that safely enables agents to do real work.
Quoted: Peter Steinberger 🦞 (@steipete)

People freaking out over my AI spend. What nobody sees: Part of what excites me so much about working on OpenClaw is that I'm trying to answer the question: How would we build software in the future if tokens don't matter?

We constantly run ~100 codex in the cloud, reviewing every PR, every issue. If a fix on main lands, @clawsweeper will eventually find that 6 month old issue and close it with an exact reference.

We run codex on every commit to review for security issues (as it's far too easy to miss them).

We run codex to de-duplicate issues and find clusters and send reports for the most pressing issues.

We have agents that can recreate complex setups, spin up ephemeral crabbox.sh machines, log into e.g. Telegram, make a video and post before/after fix on the PR.

There's codex that watches new issues and, if one fits our documented vision well, automatically creates a PR for it (which another codex then reviews).

We have codex running that scans comments for spam and blocks people.

We have codex instances running that verify performance benchmarks and report regressions into Discord.

We have agents that listen in on our meetings and proactively start work, e.g. create PRs for new features while we discuss them.

We build clawpatch.ai to split all our projects into functional units to review and find bugs and regressions. We do the same split for security with Vercel's deepsec and Codex Security to find regressions and vulnerabilities.

All that automation allows us to run this project extremely lean.

Replies: 9 · Reposts: 11 · Likes: 110 · Views: 23.9K

Taelin (@VictorTaelin)
the current state of Bend2 is:
→ everything is done
→ everything works
→ all tests pass
yet I can't launch because the codebase is massive and auditing it is taking forever, because each small adjustment or bugfix takes a whole day as the AI re-reads everything once again
*sighs*
Replies: 49 · Reposts: 8 · Likes: 433 · Views: 45.6K

Souvik Banerjee (@Souvik1997_)
@confusedqubit Curious if you're using the in-kernel GIC on macOS? I was working on this a while ago and had to implement GICv3 emulation in userspace to make snapshot/restore work properly
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 30

Shivansh Vij (@confusedqubit)
Spent all of last week patching libkrun to support snapshot/resume for VMs, including taking snapshots from KVM and natively restoring on macOS HVF
Replies: 1 · Reposts: 1 · Likes: 7 · Views: 1.6K

Souvik Banerjee (@Souvik1997_)
@confusedqubit why would you run the agent inside the VM and not want it to be rolled back with the VM?
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 32

Shivansh Vij (@confusedqubit)
Snapshot/Restore for sandboxes is NOT implemented correctly by any provider. Some are snapshotting just disks and calling it a day (looking at you, smolvm). That's not enough. Others allow you to take full VM (memory, etc.) snapshots, but rolling back the VM state interrupts the agent running inside. This is the next UX hurdle. I want my agent to be able to roll back my VM without getting rolled back itself. And I want it in the VM, none of this remote execution nonsense.
Replies: 8 · Reposts: 1 · Likes: 27 · Views: 3.5K

Souvik Banerjee (@Souvik1997_)
@gakonst the problem is not really about Arc performance. it's that it papers over bad design in nonobvious ways. forcing yourself to use references also makes the code better structured
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 63

Jason Zhao (@byjasonz)
we just gave your computer infinite storage. quickly find and edit terabytes of files, all while using zero disk space. here’s a first look, updates shipping daily.
Replies: 155 · Reposts: 48 · Likes: 951 · Views: 107.1K

Jason Zook (@jasondoesstuff)
SOOO many fewer bugs building stuff with this...
- Claude Opus 4.7 makes the feature plan
- GPT-5.5 reviews the plan (always finds issues)
- Opus updates the plan, GPT approves
- Opus builds, uses Playwright to test UX/UI
- GPT reviews feature code (always finds issues)
- Opus fixes issues, GPT signs off ✅
- Then I test fully myself, usually very minor issues
- Merge and deploy! 🚀

I'm using @conductor_build to easily bounce back and forth between the two and VERY happy with this workflow 👏👏. Kind of crazy to pay ~$400/month for what feels like a full dev team that never pushes back on all my stupid UI requests and small changes 😂.
Replies: 112 · Reposts: 61 · Likes: 1.5K · Views: 130.9K

Aaron Kazah (@aaronkazah)
introducing trunks: the most powerful open-source git-native filesystem for ai agents. it gives agents a normal filesystem with git semantics built in: branches, diffs, rollback, checkpoints, and push/pull. available today. completely self-hosted. open source. your data stays on storage you control. github.com/layerbrain/tru…
Replies: 7 · Reposts: 15 · Likes: 145 · Views: 10.9K

Taelin (@VictorTaelin)
again, suppose you have some bit of knowledge that is mandatory for an agent to operate well in your domain. ex:

> using BigInt in this repo is bad

you have two options:

Option 1: you make that directly visible (AGENTS.md)
this DOES work if the Agent is good enough. the problem is that it may actually be complex, like, 1k tokens worth. so, accumulate enough of these and you easily have 500k tokens of mandatory domain knowledge. including that in any model will immediately downgrade it into GPT-2, and cost a fortune

Option 2: you make that SEARCHABLE (RAG, RLMs, etc.)
the problem is that the AI cannot magically guess when it needs that bit of knowledge. it will not stop writing some JS function and think: "wait perhaps there is some part of the domain that tells me that BigInts are bad and I should start looking for it?" it will just use BigInts. It won't OCCUR to it that there is something to be searched

so:
- make visible: too long to fit
- make searchable: it can't guess

that's why I think nightly fine tuning as a product is the only way forward, as it allows you to extend a model with domain knowledge without causing context rot

why nobody is doing this seriously is beyond me. it might be that for whatever reason this wouldn't be practical, but I suspect the real reason is nobody is seriously considering it
Replies: 80 · Reposts: 8 · Likes: 340 · Views: 49K

Taelin (@VictorTaelin)
seriously, working with AI is MISERABLE for one and only one reason: having to re-explain the same thing

"oh yeah this new session obviously doesn't know what proper case trees are, so let me explain it for the 5000th time in my life"

I'm tired

AGENTS.md doesn't solve this because it is impossible to fit the entire domain knowledge without nuking the context - it would be 1m+ tokens worth

RAGs don't solve this, the agent won't search unknown unknowns

SKILLs don't solve this unless I keep like a collection of 1750 skills with specific cuts of domain knowledge for each possible subset of my domain that I might need in a given chat, but that's a lot of manual work

recursive LLMs or whatever don't solve this for the same reason, you can't dump a domain book and expect the AGENT will magically guess that it is supposed to search for a specific bit of knowledge. unknown unknowns

fine tuning doesn't solve this (OSS models suck and OpenAI / Anthropic gave up on user fine tuning)

I honestly think a good product around fine tuning on your domain would be a major hit and an underdog lab should take this opportunity
Replies: 667 · Reposts: 179 · Likes: 3.5K · Views: 252.8K

Souvik Banerjee (@Souvik1997_)
@motatoeshq curious why you differentiate Firecracker microVMs from a "full hypervisor"?
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 93

Souvik Banerjee (@Souvik1997_)
@computesdk don't think just-bash should count here. it's not a full VM or container so you are limited in what you can actually do. you should have some criteria for what a sandbox is, what minimum functionality it should have, etc
Replies: 1 · Reposts: 0 · Likes: 6 · Views: 429

ComputeSDK (@computesdk)
👀 we have a new person in first place?
[3 images]
Replies: 4 · Reposts: 1 · Likes: 21 · Views: 28.4K

Darren Shepherd (@ibuildthecloud)
If you're thinking about launching a sandbox product, realize you're just creating another problem.
Replies: 3 · Reposts: 0 · Likes: 5 · Views: 1.6K

Ivan Burazin (@ivanburazin)
2024 was gpus
2025 was rams
2026 will be cpus

Just had a call where a customer asked: "Can I spin up 5,000 sandboxes per second? And run 50,000 concurrently? Actually, ideally 500,000?" Every second, every day, for two weeks.

Multiple frontier companies have also asked us for 500k+ concurrent sandboxes for RL training.

The compute demand is so staggering that CPUs will soon become the next bottleneck.
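A rough back-of-the-envelope sketch of what those figures imply, assuming the request means a sustained steady-state rate and applying Little's law (concurrency = start rate × mean lifetime). The steady-state reading and the helper names are assumptions for illustration; only the 5,000/s, 50,000, 500,000, and two-week numbers come from the post.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_mean_lifetime_s(concurrent: int, starts_per_s: float) -> float:
    """Little's law rearranged: mean lifetime W = L / lambda."""
    return concurrent / starts_per_s


def total_starts(starts_per_s: int, days: int) -> int:
    """Sandboxes started over the whole period at a constant rate."""
    return starts_per_s * SECONDS_PER_DAY * days


# Figures quoted in the post; everything derived from them is an estimate.
short_lived = implied_mean_lifetime_s(50_000, 5_000)    # seconds per sandbox
long_lived = implied_mean_lifetime_s(500_000, 5_000)    # seconds per sandbox
two_week_total = total_starts(5_000, 14)                # sandboxes started
```

Under these assumptions, 50,000 concurrent sandboxes at 5,000 starts per second corresponds to a mean lifetime of about 10 seconds each, 500,000 concurrent corresponds to about 100 seconds, and two weeks at that rate is roughly 6 billion sandbox starts.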
Replies: 83 · Reposts: 91 · Likes: 1.9K · Views: 245.4K

Michael R. Bock (@michaelrbock)
One cold DM changed my life: Five years ago, @GavinNachbar & I had applied to the @southpkcommons Founder Fellowship and had an interview coming up. I (cold) DM'd someone I respected online and they helped us prepare for the interview. With their prep, we nailed the interview and raised $400k before we had an idea. Now I want to pay it forward: if you have an SPC FF interview coming up, let me know (DM me!), and I'm happy to help you prepare!
[Image]
Replies: 15 · Reposts: 5 · Likes: 146 · Views: 12.5K