Manoj

503 posts

@mbajaj_

Building https://t.co/rXAD4uhdXy - Credential broker for AI agents | Previously Ruzo, Sqpt, @FalconXGlobal, @iitdelhi

San Francisco · Joined June 2010

147 Following · 142 Followers

Pinned Tweet

Manoj@mbajaj_·14 May

hey, if you're building with AI agents, trust me, you need this. every agent you run needs API keys. right now they sit in .env files on your machine. your agent reads them, holds them in memory, sends them to inference servers. if the agent, the package, or the provider gets compromised, your credentials go with it. so we built Authsome. it's a local proxy that sits between your agent and the APIs it calls. the agent makes a normal HTTP request. authsome intercepts it and injects the auth header. the agent never sees the actual token. no cloud. no SaaS. credentials never leave your machine. 45 providers. works with Claude Code, Codex, Cursor, OpenCode, Hermes. open source. MIT. check us out at: github.com/agentrhq/auths…
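A minimal sketch of the header-injection idea in the tweet above. This is not Authsome's actual code: the `CREDENTIALS` table, the `inject_auth` helper, and the host names are all hypothetical. It only illustrates that the proxy, rather than the agent, holds the token.

```python
from urllib.parse import urlsplit

# Hypothetical credential table; in this sketch only the proxy process has it.
CREDENTIALS = {
    "api.example.com": "Bearer secret-token-123",
}


def inject_auth(url: str, headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the agent's headers with the auth header attached.

    The agent sends a request without credentials; the proxy looks up the
    target host and adds the token before forwarding the request upstream.
    """
    host = urlsplit(url).hostname
    # Assumption: anything the agent put in Authorization is dropped, so the
    # proxy is the only source of credentials.
    outgoing = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    token = CREDENTIALS.get(host)
    if token is not None:
        outgoing["Authorization"] = token
    return outgoing
```

Hosts without an entry in the table are forwarded with no auth header at all, which keeps the agent from smuggling its own credentials through the proxy.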

Manoj@mbajaj_·2h

vaults solve storage. claims solve usage. both layers matter. if you're using bitwarden, hashicorp vault, or any secrets manager with your agents - ask yourself: what happens to the credential after the agent reads it from the vault? if the answer is "it stays in memory until the process dies" - you've encrypted the filing cabinet but left the documents on the desk. pip install authsome github.com/agentrhq/auths… build update #3 in 2 days. till then, keep building folks

Manoj@mbajaj_·2h

hardest part: vault_id isolation. in multi-tenant setups, a crafted request could pull a claim from a different vault. caught it in our own testing - took 3 rewrites to get the isolation model right. the first version passed unit tests. the second passed integration tests. the third passed adversarial tests where we actively tried to break the boundary.
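One way to picture the isolation boundary, as a sketch only. The tweet does not say how Authsome implements it, so `Claim`, `ClaimStore`, and the idea of keying on `(vault_id, claim_id)` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    vault_id: str
    claim_id: str
    secret: str


class ClaimStore:
    """Hypothetical store that scopes every lookup to the caller's vault."""

    def __init__(self) -> None:
        self._claims: dict[tuple[str, str], Claim] = {}

    def put(self, claim: Claim) -> None:
        self._claims[(claim.vault_id, claim.claim_id)] = claim

    def get(self, caller_vault_id: str, claim_id: str) -> Claim:
        # The vault id comes from the authenticated caller, never from the
        # request body, so a crafted claim_id cannot reach another vault.
        claim = self._claims.get((caller_vault_id, claim_id))
        if claim is None:
            raise KeyError("claim not found in this vault")
        return claim
```

An adversarial test in this model would store a claim in one vault and assert that every lookup from a different vault raises, whatever `claim_id` is supplied.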

Manoj@mbajaj_·2h

authsome build update #2: why vaults don't solve the problem. @NousResearch just shipped Bitwarden integration with Hermes Agent. 48K views. everyone celebrating. vault is the right call for storage. but here's what nobody's asking: once the agent pulls the credential from Bitwarden, what controls how long it holds it? what controls which agent process gets it? what happens when the task is done - does the credential go back in the vault or does it sit in process memory until the agent dies? the answer for most vault integrations: the credential is ambient. every process gets it. forever. until someone manually revokes it. we just moved the .env problem into a fancier box.
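The opposite of an ambient credential is one with a bounded lifetime. A rough sketch, assuming a time-limited lease with explicit release; the `Lease` class and its TTL are hypothetical and not taken from Authsome.

```python
import time


class Lease:
    """Hypothetical time-bound credential handle with explicit release."""

    def __init__(self, secret: str, ttl_seconds: float, now=time.monotonic) -> None:
        self._secret = secret
        self._now = now
        self._expires_at = now() + ttl_seconds

    def reveal(self) -> str:
        if self._secret is None or self._now() >= self._expires_at:
            self._secret = None  # drop the reference once expired
            raise PermissionError("lease expired or released")
        return self._secret

    def release(self) -> None:
        # Called when the task finishes, instead of waiting for process exit.
        self._secret = None
```

Dropping the reference does not guarantee the bytes are wiped from process memory in Python; the sketch only shows the control flow of "usable for N seconds, then gone".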

Manoj@mbajaj_·11h

150 lines of code beating Claude Code and Codex on SWE-bench. 50% vs 40% on opus 4.7. the simpler agent won because it has less overhead between the model and the task. Claude Code and Codex are built for humans - permission prompts, interactive workflows, tool management. mini-swe-agent doesn't care about any of that. it just solves the problem. the features we built for developer experience are literally the bottleneck for agent performance.

Kilian Lieret@KLieret·18h

DeepSWE finds that mini-swe-agent significantly outperforms ClaudeCode and Codex on the benchmark. The simpler the system, the better it generalizes (and mini's core agent class is just ~150 lines of code)

Manoj@mbajaj_·11h

the third type: the person who gets told no by one model, pastes the same problem into a different model, and gets a working solution. the LLM's "no" isn't a law of physics. it's the boundary of one model's training data. switching models is the cheapest second opinion in history.

kache@yacineMTB·12h

There are two types of people. The kind of person who gives up on a path because an LLM tells them it isn't possible. And the kind of person who understands things from their principles, and doesn't take an unexplained no for an answer.

Manoj@mbajaj_·11h

streaming being irrelevant assumes latency goes to zero. even at 289 tokens/sec (gemini 3.5 flash), a 10K token response takes 35 seconds. users won't wait for a blank screen that long. streaming isn't about speed - it's about perceived responsiveness. that doesn't go away with faster compute.
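The arithmetic behind the 35-second figure, as a quick check. It assumes a constant decode rate and ignores network and time-to-first-token overhead; the 50-token first chunk is a made-up number for comparison.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response at a constant decode rate."""
    return tokens / tokens_per_second


full_wait = generation_seconds(10_000, 289)  # blank screen until the end
first_chunk = generation_seconds(50, 289)    # hypothetical first streamed chunk
print(f"{full_wait:.1f}s vs {first_chunk:.2f}s")  # prints "34.6s vs 0.17s"
```

Without streaming the user waits the full 34.6 seconds; with it, the first text appears in well under a second at the same generation speed.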

Javier@javi_22_dev·11h

@mbajaj_ @DavidSHolz Token level control is better on NAR. And streaming will be irrelevant as soon as compute allows higher generation speeds.

David@DavidSHolz·12h

Most researchers agree that autoregression is best when memory bandwidth is cheap and diffusion is best when FLOPS are cheap. They also admit the future of compute is all FLOPS because memory scaling is hard and scaling FLOPS is easy. So why not go all in on diffusion????

Manoj@mbajaj_·11h

13F parsing is one of those problems that sounds easy until you're 3 weeks in and BERKSHIRE HATHAWAY, BRK, BRK.A, and a CUSIP are all the same thing but your database thinks they're four different companies. the vibe coded approaches break on exactly this - historical ticker changes and M&A activity. this is a good example of where domain knowledge still matters more than the model. the AI can call the API but it can't build the canonical mapping underneath it.

Virat Singh@virattt·12h

Proud of this one. We shipped a new Institutional Holdings API. This data comes from 13F filings, which is a nightmare to parse at scale. You must normalize tons of data. Funds will report "BERKSHIRE HATHAWAY", "BRK", "BRK.A", and the CUSIP across different positions. Resolving these to a canonical security requires a reference → CUSIP → ticker map that is correct both today and historically. You need to track ticker changes, splits, and M&A activity, which breaks the majority of vibe coded approaches. Also, there is a ton of data: your DB table will easily exceed 1B rows if done right. We solved it @findatasets. It's now 1 API call.
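A toy version of the reference → CUSIP → ticker resolution described above. All identifiers are placeholders, the `ACME` security and its ticker history are invented, and the tables are hypothetical. A real map would come from reference data, not a hand-written dict.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical alias table; the CUSIP values are placeholders, not real CUSIPs.
ALIAS_TO_CUSIP = {
    "BERKSHIRE HATHAWAY": "PLACEHOLDER-1",
    "BRK": "PLACEHOLDER-1",
    "BRK.A": "PLACEHOLDER-1",
    "ACME CORP": "PLACEHOLDER-2",
}

# Per CUSIP: (effective_from, ticker), sorted by date, to model ticker changes.
# The security and its history are fictional.
TICKER_HISTORY = {
    "PLACEHOLDER-2": [(date(2000, 1, 1), "ACME"), (date(2015, 1, 1), "ACMX")],
}


def canonical_cusip(reported: str) -> str:
    """Collapse whatever the filer wrote into one canonical identifier."""
    return ALIAS_TO_CUSIP[reported.strip().upper()]


def ticker_on(cusip: str, as_of: date) -> str:
    """Return the ticker that was in effect for this CUSIP on a given date."""
    history = TICKER_HISTORY[cusip]
    idx = bisect_right([start for start, _ in history], as_of) - 1
    if idx < 0:
        raise LookupError("no ticker effective on that date")
    return history[idx][1]
```

With this shape, the three reported names resolve to a single security, and a 2010 filing and a 2020 filing for the fictional company map to different tickers but the same row.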

Manoj@mbajaj_·11h

@zeeg the bar is on the floor when "we won't train on your data without asking" is a competitive differentiator. this should be the default, not a marketing statement.

David Cramer@zeeg·15h

Sentry will never "train language models" with your data, without your explicit consent, fyi. We don't train models, and even if we found a valuable use case, we think it merits convincing customers. The idea that people think they can get away with this is mind boggling.

Manoj@mbajaj_·11h

that's interesting - diffusion for the plan means you generate the full solution structure in parallel instead of sequentially. then AR executes each piece with precise token-level control. you'd get faster planning and more coherent high-level architecture vs AR planning which can drift as the sequence gets longer. has midjourney experimented with this for any non-image modalities?

David@DavidSHolz·11h

@mbajaj_ why not diffusion planning with AR execution?

Manoj@mbajaj_·12h

@zeeg this is the anti-slop prompt. instead of asking the agent to build, you're asking it to tear down. most agents default to adding code. telling it to actively look for things to remove is a different mode entirely and probably the highest ROI use of /goal I've seen.

David Cramer@zeeg·15h

Manoj@mbajaj_·12h

63µs at 514 tokens with zero heap allocations. the fact that CPU tokenization became a meaningful bottleneck tells you how fast the GPU side has gotten - when your reranker runs in single-digit ms, the tokenizer that feeds it is suddenly 30% of your latency. this is the kind of optimization that only matters at perplexity's scale but sets the floor for everyone else once it's open source. the benchmarks against HuggingFace are brutal - 5x at p50, nearly 4x at p99 on 16k inputs. curious how this performs on multilingual inputs where XLM-RoBERTa's 250K vocab really earns its size.

Perplexity@perplexity_ai·17h

We're open-sourcing the Unigram tokenizer we rebuilt to reduce CPU utilization by 5-6x. Small rerankers and embedders run in single-digit milliseconds on GPU, making CPU tokenization a meaningful share of total latency. github.com/perplexityai/p…

Manoj@mbajaj_·1d

the file system insight is right but the gmail and google calendar connectors part is where it gets interesting. "add in the connectors" sounds simple but each connector is an OAuth token that needs to be acquired, stored, refreshed, and scoped correctly. the non-technical user who just wants to analyze their finances from a folder of PDFs is now managing Google OAuth credentials they don't understand. the "put files in a folder" workflow is genius because it has zero auth overhead. the moment you add connectors, you've moved from "anyone can do this" to "anyone who can manage OAuth tokens can do this" which is a very different audience.
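To make "acquired, stored, refreshed, and scoped" concrete, here is a sketch of the state a single connector has to track. The `OAuthToken` class, the 60-second skew, and the scope string are illustrative assumptions, not any provider's real API.

```python
from dataclasses import dataclass


@dataclass
class OAuthToken:
    access_token: str
    refresh_token: str
    expires_at: float          # Unix timestamp when the access token expires
    scopes: frozenset[str]     # what this token is allowed to do

    def needs_refresh(self, now: float, skew_s: float = 60.0) -> bool:
        # Refresh slightly early so a request does not expire in flight.
        return now >= self.expires_at - skew_s

    def allows(self, scope: str) -> bool:
        return scope in self.scopes


token = OAuthToken("at", "rt", expires_at=1_000.0, scopes=frozenset({"calendar.read"}))
```

Every connector adds one of these to manage, which is the overhead the folder-of-files workflow avoids.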

Thariq@trq212·1d

the basic trick to using Claude Code for non-technical work is to put a bunch of files in a folder and tell it it can write scripts + make HTML

Manoj@mbajaj_·1d

using claude as a sub-agent inside codex means codex is shelling out to claude -p which runs a separate authenticated process on your machine. that second process needs its own API key, its own context window, its own token budget. you're not just using two models - you're running two billing meters, two rate limit pools, and two separate processes with filesystem access simultaneously. the trick works but nobody's talking about what it actually costs or what happens when claude's rate limit hits mid-task and codex doesn't know its sub-agent just died.
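A sketch of how a parent process could at least notice that a sub-agent died. `run_subagent` is a hypothetical helper, and it assumes the failure shows up as a non-zero exit code or a timeout, which may not hold for every CLI or every rate-limit error.

```python
import subprocess
import sys


def run_subagent(cmd: list[str], timeout_s: float = 120.0) -> str:
    """Run a sub-agent command and raise instead of failing silently."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"sub-agent timed out after {timeout_s}s") from exc
    if result.returncode != 0:
        raise RuntimeError(
            f"sub-agent exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# Stand-in command for demonstration; a real caller would pass its own CLI.
print(run_subagent([sys.executable, "-c", "print('ok')"]).strip())
```

This does nothing about the second billing meter or rate-limit pool; it only turns a silent sub-agent death into an error the parent can react to.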

Matt Shumer@mattshumer_·1d

Massively useful Codex trick for 10x better frontend: You can ask Codex to use Claude as a sub-agent to have Claude handle frontend/design work. Just say “Use claude -p with an excellent, well-scoped, but un-opinionated (UI/UX-wise) prompt anytime you need a design change.”

Manoj@mbajaj_·1d

the interesting part: this is the same guy who built Million.js (virtual DOM optimization) and React Scan (re-render detection). each tool moved one layer up the stack. first: make React faster. then: find what's slow. now: fix what the agent broke. the pattern is tools-that-watch-tools and it's going to be its own category. react-doctor for frontend, AI-Trader auditing trading agents, Perplexity's bumblebee scanning for compromised packages. six months from now every serious agent workflow will have a second agent reviewing the first one's work.

Aiden Bai@aidenybai·1d

Introducing /react-doctor Your React app probably has bad code. This fixes it Install as agent skill. Fully open source. npx react-doctor@latest

Manoj@mbajaj_·1d

@PalantirTech “ai solved software creation” that’s correct we have infinite cheap code and nobody is brave enough to give it the production keys

Palantir@PalantirTech·1d

AI solved software creation. Now comes software distribution. The future will not run on blind deployment pipelines. Apollo provides the Ontology Primitives for Software Distribution. Deploy. Patch. Rollback. Validate. Govern. AI-native velocity with human accountability.

Manoj@mbajaj_·1d

8.4 is the most important line here and everyone will scroll past it. "data, permissions, distribution, trust, compliance, regulatory position, and physical assets become more valuable." this is the real list of moats in a post-scarcity code world. every SaaS company whose pitch is "we built this so you don't have to" is about to find out what happens when building it yourself costs $0 and 20 minutes.

Thorsten Ball@thorstenball·1d

x.com/i/article/2059…

Manoj@mbajaj_·1d

@garrytan “the upload flow is still broken” that’s correct our 7-figure engineers prefer prompting claude code

Garry Tan@garrytan·1d

Unbelievable how broken Google apps are on iOS Can’t even upload photos from photo roll properly to Google Drive app People are getting paid 7 figures a year to ship this poor quality software? 👀
