Souvik Banerjee (@Souvik1997_) - Twitter Profile

Souvik Banerjee@Souvik1997_·4h

@benswerd Is this Intel or Apple silicon?

Ben ☁️@benswerd·4h

Freestyle MacOS VM. Booyah.

Souvik Banerjee@Souvik1997_·22h

Every year, ~10 million people are booked into U.S. jails. Nearly 70% of the jail population hasn't been convicted of anything. The law guarantees them a phone call. Most don't know a lawyer's number.
A Santa Clara County pilot found that people who spoke to a lawyer right after arrest spent 6 days in jail instead of 29.
So I built JailCall. A real phone number you can call from a police station. An AI voice agent picks up, takes your name and charge, semantically routes your case to local criminal defense firms, sends real intake emails on your behalf, and remembers your case across calls.
Built solo at the YC Call My Agent hackathon. Won 2nd place overall.
Demo: youtube.com/watch?v=c1eKOd…
Code: github.com/souvik1997/Jai…
Thank you to @AgentPhoneHQ!

Souvik Banerjee@Souvik1997_·2d

@diptanu @damian_b What do you think would be better than a gateway? My ideal architecture would look like scoped capability tokens minted specifically for each agent/sandbox, with no ambient authority. Agents are the ultimate confused deputy
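One way to picture "scoped capability tokens minted specifically for each agent/sandbox" is the sketch below. It is only an illustration under stated assumptions: the function names `mint_token` and `verify_token`, the scope strings, and the HMAC-signed bearer-token format are all hypothetical and are not taken from the thread or from any particular product. A bearer token like this only approximates a true object capability, since anyone holding the string can use it until it expires.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Sequence


def mint_token(secret: bytes, sandbox_id: str, scopes: Sequence[str],
               ttl_s: int, now: Optional[float] = None) -> str:
    """Mint a token bound to one sandbox, a fixed set of scopes, and an expiry.

    Hypothetical helper: the claim names ("sub", "scopes", "exp") are illustrative.
    """
    issued = time.time() if now is None else now
    claims = {"sub": sandbox_id, "scopes": sorted(scopes), "exp": issued + ttl_s}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def verify_token(secret: bytes, token: str, sandbox_id: str, scope: str,
                 now: Optional[float] = None) -> bool:
    """Accept only if the signature, sandbox, scope, and expiry all check out."""
    current = time.time() if now is None else now
    try:
        payload, sig = token.rsplit(".", 1)
    except ValueError:
        return False  # no signature part at all
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # forged or tampered token
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return (claims["sub"] == sandbox_id
            and scope in claims["scopes"]
            and current < claims["exp"])
```

In this sketch the agent never sees the signing secret or the underlying credential; it only holds a token that names exactly what it may do, so a request outside those scopes fails even if the agent is tricked into making it.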

Diptanu Choudhury@diptanu·2d

Solid outline of the future of sandboxes from @damian_b. It’s a no-brainer to focus on multi-sandbox environments. The form factor of sandboxes is best suited to be the primitive to build, test and deploy new software. We have to build networking which is simple for agents to manage and connect sandboxes in a mesh. File systems managed by API calls to mount, and the big one: snapshotting these environments at scale for debugging and bootstrapping new environments. And for secrets the gateway approach is a good start, but IMO it’s very fragile.

Damian Barabonkov@damian_b

The sandboxing debate is becoming more widespread, but I fear it is focusing on the wrong things. Given too many people are ranking startup times, I decided to write this rebuttal.
From Peter’s tweet, his postulation is the correct one: "How would we build software if tokens don’t matter?". In this future, sandboxing will not be about shaving milliseconds off boot. Timings will still matter, but there will be many more layers to this cake. Most critically, sandboxing will be about giving agents a high-fidelity operating environment.
I think the winners will have to get these three things right:
1. Full-capability sandboxes
Most sandboxes people are hyping are not actually capable enough for serious agentic work. A useful coding agent needs to clone arbitrary repos, install dependencies, run tests, spin up services, debug failures, and iterate. For example, many production repos depend on Docker. If your sandbox cannot run Docker, it is probably not ready for the workflows agents are about to own.
2. Trusted credential handling
The moment agents do real work, they need access to real systems. GitHub, Linear, Slack, GMail, cloud accounts, internal dashboards, deployment tools. You cannot just hand the model your secrets and hope for the best. The right abstraction is some version of a trusted proxy: the agent can request actions, but secrets stay controlled, scoped, audited, and revocable.
3. Agent-to-agent communication
Today, most sandboxed agents are isolated workers. I do not think that will last for long. The future probably looks more like fleets of specialized agents, each with different tools, permissions, memories, and objectives. Some review code. Some reproduce bugs. Some run benchmarks. Some test security. Some interact with external apps. Those agents will need to talk to each other, delegate, verify, and coordinate. So the sandbox stops being a box, and it starts to become more of a network substrate.
That is why I think this market is still wide open.
Nobody really knows what the agentic software stack will look like yet. But I am absolutely confident that the winning sandbox provider will not be the one that starts fastest. It will be the one that safely enables agents to do real work.

Souvik Banerjee retweeted

Damian Barabonkov@damian_b·2d

Peter Steinberger 🦞@steipete

People freaking out over my AI spend. What nobody sees: Part of what excites me so much about working on OpenClaw is that I'm trying to answer the question: How would we build software in the future if tokens don't matter?
We constantly run ~100 codex in the cloud, reviewing every PR, every issue. If a fix on main lands, @clawsweeper will eventually find that 6 month old issue and close it with an exact reference.
We run codex on every commit to review for security issues (as it's far too easy to miss).
We run codex to de-duplicate issues and find clusters and send reports for the most pressing issues.
We have agents that can recreate complex setups, spin up ephemeral crabbox.sh machines, log into e.g. Telegram, make a video and post before/after fix on the PR.
There's codex that watches new issues and, if it fits our documented vision well, automatically creates a PR of it (that another codex then reviews).
We have codex running that scans comments for spam and blocks people.
We have codex instances running that verify performance benchmarks and report regressions into Discord.
We have agents that listen in on our meetings and proactively start work, e.g. create PRs when we discuss new features while we discuss them.
We build clawpatch.ai to split all our projects into functional units to review and find bugs and regressions. We do the same split for security with Vercel's deepsec and Codex Security to find regressions and vulnerabilities.
All that automation allows us to run this project extremely lean.

Souvik Banerjee@Souvik1997_·4d

@confusedqubit How much of a perf hit did you see with the userspace GIC vs. in-kernel GIC?

Shivansh Vij@confusedqubit·4d

I did get vGIC working! No more performance hits on Apple VMs if you want snapshot/restore support!!! QEMU can go pound sand (jk jk I love QEMU)

Shivansh Vij@confusedqubit

Did I just get hardware GIC on HVF to restore from a KVM snapshot or is Claude hallucinating? 🤔

Souvik Banerjee@Souvik1997_·6d

@VictorTaelin Would absolutely love to beta test!

Taelin@VictorTaelin·6d

the current state of Bend2 is:
→ everything is done
→ everything works
→ all tests pass
yet I can't launch because the codebase is massive and auditing it is taking forever because each small adjust or bugfix takes a whole day as the AI re-reads everything once again
sighs

Souvik Banerjee@Souvik1997_·11 May

@confusedqubit Curious if you're using the in-kernel GIC on macOS? I was working on this a while ago and had to implement GICv3 emulation in userspace to make snapshot/restore work properly

Shivansh Vij@confusedqubit·10 May

Spent all of last week patching libkrun to support snapshot/resume for VMs - including taking snapshots from KVM and natively restoring on MacOS HVF

Souvik Banerjee@Souvik1997_·11 May

@confusedqubit why would you run the agent inside the VM and not want it to be rolled back with the VM?

Shivansh Vij@confusedqubit·10 May

Snapshot/Restore for sandboxes is NOT implemented correctly by any provider. Some are snapshotting just disks and calling it a day (looking at you smolvm). That's not enough. Others allow you to take full VM (memory, etc.) snapshots, but rolling back the VM state interrupts the agent running inside. This is the next UX hurdle. I want my agent to be able to roll back my VM, without it getting rolled back. And I want it in the VM, none of this remote execution nonsense.

Souvik Banerjee@Souvik1997_·9 May

@gakonst the problem is not really about Arc performance: it's that it papers over bad design in nonobvious ways. forcing yourself to use references also makes the code better structured

Georgios Konstantopoulos@gakonst·8 May

arc is fine, we use arc all the time. im lazy and dont wanna think about the millisecond until i've gotten down to it being an actual problem. i'd take arc and good API ergonomics (vs leaky lifetimes) any day unless we're optimizing hot loops

Dmitriy Kovalenko@neogoose_btw

to every rust vibe sloppers: Arc is NOT CHEAP it is a giant overhead tell this to your clanker I am tired of the PRs that are wrapping everything in Arc

Souvik Banerjee@Souvik1997_·8 May

@byjasonz @ItsMatthewAo @BapnaArihant how is this different from nfs?

Jason Zhao@byjasonz·6 May

we just gave your computer infinite storage. quickly find and edit terabytes of files, all while using zero disk space. here’s a first look, updates shipping daily.

Souvik Banerjee@Souvik1997_·6 May

@jasondoesstuff This is the way. Even better if you can automate it: a programmable software factory. This is what we are building at amlalabs.com/ax

Jason Zook@jasondoesstuff·6 May

SOOO many less bugs building stuff with this...
- Claude Opus 4.7 makes the feature plan
- GPT-5.5 reviews the plan (always finds issues)
- Opus updates the plan, GPT approves
- Opus builds, uses Playwright to test UX/UI
- GPT reviews feature code (always finds issues)
- Opus fixes issues, GPT signs off ✅
- Then I test fully myself, usually very minor issues
- Merge and deploy! 🚀
I'm using @conductor_build to easily bounce back and forth between the two and VERY happy with this workflow 👏👏.
Kind of crazy to pay ~$400/month for what feels like a full dev team that never pushes back on all my stupid UI requests and small changes 😂.

Souvik Banerjee@Souvik1997_·30 Apr

@aaronkazah Is this like Mesa?

Aaron Kazah@aaronkazah·30 Apr

introducing trunks: the most powerful open-source git-native filesystem for ai agents. it gives agents a normal filesystem with git semantics built in: branches, diffs, rollback, checkpoints, and push/pull. available today. completely self-hosted. open source. your data stays on storage you control. github.com/layerbrain/tru…

Souvik Banerjee@Souvik1997_·30 Apr

@VictorTaelin A linter rule to disallow bigints?

Taelin@VictorTaelin·30 Apr

again, suppose you have some bit of knowledge that is mandatory for an agent to operate well in your domain. ex:
> using BigInt in this repo is bad for
you have two options:
Option 1: you make that directly visible (AGENTS.md)
this DOES work if the Agent is good enough. the problem is that it may actually be complex, like, 1k tokens worth. so, accumulate enough of these and you easily have 500k tokens of mandatory domain knowledge. including that in any model will immediately downgrade it into GPT-2, and cost a fortune
Option 2: you make that SEARCHABLE (RAG, RLMs, etc.)
the problem is that the AI cannot magically guess when it needs that bit of knowledge. it will not stop writing some JS function and think: "wait perhaps there is some part of the domain that tells me that BigInts are bad and I should start looking for it?" it will just use BigInts. It won't OCCUR to it that there is something to be searched
so:
- make visible: too long to fit
- make searchable: it can't guess
that's why I think nightly fine tuning as a product is the only way forward, as it allows you to extend a model with domain knowledge without causing context rot
why nobody is doing this seriously is beyond me. it might be that for whatever reason this wouldn't be practical, but I suspect the real reason is nobody is seriously considering it
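The "linter rule to disallow bigints" suggested in the reply could look roughly like the sketch below. This is a deliberately naive, regex-based check written for illustration: the name `find_bigint_uses` and the pattern are assumptions, it will also flag matches inside comments and string literals, and a production rule would more likely work on the parsed syntax tree inside an existing JavaScript linter.

```python
import re

# Matches BigInt literals (decimal, hex, binary, octal with an `n` suffix)
# and the bare `BigInt` identifier. Naive: it does not skip comments or strings.
BIGINT_PATTERN = re.compile(
    r"\b(?:0[xX][0-9a-fA-F]+|0[bB][01]+|0[oO][0-7]+|\d+)n\b|\bBigInt\b"
)


def find_bigint_uses(source: str):
    """Return (line_number, matched_text) pairs for each suspected BigInt use."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in BIGINT_PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

The appeal of a check like this is that the rule is enforced mechanically on every run, so the agent gets corrected by a failing lint step instead of having to remember or search for the convention.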

Taelin@VictorTaelin·30 Apr

seriously, working with AI is MISERABLE for one and only one reason: having to re-explain the same thing
"oh yeah this new session obviously doesn't know what proper case trees are, so let me explain it for the 5000th time in my life"
I'm tired
AGENTS.md doesn't solve this because it is impossible to fit the entire domain knowledge without nuking the context - it would be 1m+ tokens worth
RAGs don't solve this, the agent won't search unknown unknowns
SKILLs don't solve this unless I keep like a collection of 1750 skills with specific cuts of domain knowledge for each possible subset of my domain that I might need in a given chat, but that's a lot of manual work
recursive LLMs or whatever don't solve this for the same reason, you can't dump a domain book and expect the AGENT will magically guess that it is supposed to search for a specific bit of knowledge. unknown unknowns
fine tuning doesn't solve this (OSS models suck and OpenAI / Anthropic gave up on user fine tuning)
I honestly think a good product around fine tuning on your domain would be a major hit and an underdog lab should take this opportunity

Souvik Banerjee@Souvik1997_·18 Mar

@motatoeshq curious why you differentiate Firecracker microVMs from a "full hypervisor"?

Mohamed Habib@motatoeshq·18 Mar

x.com/i/article/2033…

Souvik Banerjee@Souvik1997_·11 Mar

@computesdk don't think just-bash should count here. it's not a full VM or container so you are limited in what you can actually do. you should have some criteria for what a sandbox is, what minimum functionality it should have, etc

ComputeSDK@computesdk·11 Mar

👀 we have a new person in first place?

Souvik Banerjee@Souvik1997_·17 Feb

@ibuildthecloud what's the reasoning for this?

Darren Shepherd@ibuildthecloud·17 Feb

If you're thinking about launching a sandbox product, realize you're just creating another problem.

Souvik Banerjee@Souvik1997_·17 Feb

@ivanburazin Curious who is asking for so many sandboxes?

Ivan Burazin@ivanburazin·16 Feb

2024 was gpus
2025 was rams
2026 will be cpus
Just had a call where a customer asked: "Can I spin up 5,000 sandboxes per second? And run 50,000 concurrently? Actually, ideally 500,000?" Every second, every day, for two weeks.
Multiple frontier companies have also asked us for 500k+ concurrent sandboxes for RL training.
The compute demand is so staggering that CPUs will soon become the next bottleneck.

Souvik Banerjee@Souvik1997_·6 Feb

@michaelrbock @GavinNachbar @southpkcommons DMed!

Michael R. Bock@michaelrbock·5 Feb

One cold DM changed my life: Five years ago, @GavinNachbar & I had applied to the @southpkcommons Founder Fellowship and had an interview coming up. I (cold) DM'd someone I respected online and they helped us prepare for the interview. With their prep, we nailed the interview and raised $400k before we had an idea. Now I want to pay it forward: if you have an SPC FF interview coming up, let me know (DM me!), and I'm happy to help you prepare!
