Manoj

503 posts

@mbajaj_

Building https://t.co/rXAD4uhdXy - Credential broker for AI agents | Previously Ruzo, Sqpt, @FalconXGlobal, @iitdelhi

San Francisco · Joined June 2010
147 Following · 142 Followers
Pinned Tweet
Manoj
Manoj@mbajaj_·
hey, if you're building with AI agents, trust me you need this. every agent you run needs API keys. right now they sit in .env files on your machine. your agent reads them, holds them in memory, sends them to inference servers. if the agent, the package, or the provider gets compromised, your credentials go with it. so we built Authsome. it's a local proxy that sits between your agent and the APIs it calls. the agent makes a normal HTTP request. authsome intercepts it and injects the auth header. the agent never sees the actual token. no cloud. no SaaS. credentials never leave your machine. 45 providers. works with Claude Code, Codex, Cursor, OpenCode, Hermes. open source. MIT. check us out at: github.com/agentrhq/auths…
0
0
4
454
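A minimal sketch of the header-injection idea described in the pinned tweet, not Authsome's actual code. The names `LOCAL_TOKENS` and `inject_auth`, the host `api.example.com`, and the Bearer scheme are all assumptions made for illustration. The function is meant to run inside the proxy process, so the token is attached on the way out and never handed back to the agent.

```python
from typing import Mapping

# Hypothetical local credential store: API host -> secret token.
# A real broker would keep this encrypted on disk, not in a plain dict.
LOCAL_TOKENS = {"api.example.com": "sk-local-secret"}


def inject_auth(host: str, headers: Mapping[str, str],
                tokens: Mapping[str, str] = LOCAL_TOKENS) -> dict:
    """Return a copy of the outgoing headers with the auth header added."""
    # Drop any Authorization header the agent tried to send itself.
    outgoing = {k: v for k, v in headers.items()
                if k.lower() != "authorization"}
    token = tokens.get(host)
    if token is None:
        raise PermissionError(f"no credential configured for {host}")
    # Assumption: the upstream API expects a Bearer token.
    outgoing["Authorization"] = f"Bearer {token}"
    return outgoing
```

The agent side only ever builds the request without credentials; everything secret stays in the process that owns `LOCAL_TOKENS`.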
Manoj
Manoj@mbajaj_·
vaults solve storage. claims solve usage. both layers matter. if you're using bitwarden, hashicorp vault, or any secrets manager with your agents - ask yourself: what happens to the credential after the agent reads it from the vault? if the answer is "it stays in memory until the process dies" - you've encrypted the filing cabinet but left the documents on the desk. pip install authsome github.com/agentrhq/auths… build update #3 in 2 days. till then, keep building folks
0
0
6
19
Manoj
Manoj@mbajaj_·
hardest part: vault_id isolation. in multi-tenant setups, a crafted request could pull a claim from a different vault. caught it in our own testing - took 3 rewrites to get the isolation model right. the first version passed unit tests. the second passed integration tests. the third passed adversarial tests where we actively tried to break the boundary.
1
0
5
18
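The tweet above does not say how the isolation was finally implemented, so the following is only one common pattern, sketched under assumptions: the vault is taken from the authenticated session rather than from anything in the request, and claims are keyed by the pair of vault and claim id. `ClaimStore` and its method names are hypothetical.

```python
class ClaimStore:
    """Hypothetical claim store keyed by (vault_id, claim_id)."""

    def __init__(self) -> None:
        self._claims = {}

    def put(self, vault_id: str, claim_id: str, value: str) -> None:
        self._claims[(vault_id, claim_id)] = value

    def get(self, session_vault_id: str, claim_id: str) -> str:
        # session_vault_id must come from the authenticated session,
        # never from a field the caller can set in the request body.
        try:
            return self._claims[(session_vault_id, claim_id)]
        except KeyError:
            # Same error whether the claim is missing or lives in another
            # vault, so the response does not reveal that it exists elsewhere.
            raise LookupError("claim not found") from None
```

Because the vault id is part of the lookup key, a crafted claim id alone cannot reach into another tenant's vault in this sketch.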
Manoj
Manoj@mbajaj_·
authsome build update #2: why vaults don't solve the problem. @NousResearch just shipped Bitwarden integration with Hermes Agent. 48K views. everyone celebrating. vault is the right call for storage. but here's what nobody's asking: once the agent pulls the credential from Bitwarden, what controls how long it holds it? what controls which agent process gets it? what happens when the task is done - does the credential go back in the vault or does it sit in process memory until the agent dies? the answer for most vault integrations: the credential is ambient. every process gets it. forever. until someone manually revokes it. we just moved the .env problem into a fancier box.
1
2
7
66
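The three questions in this update (how long, which process, what happens when the task ends) can be pictured as a lease rather than an ambient secret. This is an illustrative sketch only; `Lease` and its fields are hypothetical and are not taken from Authsome.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Hypothetical short-lived grant of one credential to one process."""
    secret: str
    holder_pid: int
    expires_at: float  # on the time.monotonic() clock
    released: bool = False

    def read(self, pid: int, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        if self.released:
            raise PermissionError("lease already released")
        if pid != self.holder_pid:
            raise PermissionError("lease belongs to another process")
        if now >= self.expires_at:
            raise PermissionError("lease expired")
        return self.secret

    def release(self) -> None:
        """Call when the task is done so the secret stops being readable."""
        self.released = True
        self.secret = ""
```

A caller would create the lease with something like `Lease(secret, os.getpid(), time.monotonic() + 30)` and call `release()` at the end of the task. Clearing a Python string does not scrub process memory, so this shows the access policy, not a memory-safety guarantee.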
Manoj
Manoj@mbajaj_·
150 lines of code beating Claude Code and Codex on SWE-bench. 50% vs 40% on opus 4.7. the simpler agent won because it has less overhead between the model and the task. Claude Code and Codex are built for humans - permission prompts, interactive workflows, tool management. mini-swe-agent doesn't care about any of that. it just solves the problem. the features we built for developer experience are literally the bottleneck for agent performance.
0
1
0
256
Kilian Lieret
Kilian Lieret@KLieret·
DeepSWE finds that mini-swe-agent significantly outperforms ClaudeCode and Codex on the benchmark. The simpler the system, the better it generalizes (and mini's core agent class is just ~150 lines of code)
8
11
72
59.3K
Manoj
Manoj@mbajaj_·
the third type: the person who gets told no by one model, pastes the same problem into a different model, and gets a working solution. the LLM's "no" isn't a law of physics. it's the boundary of one model's training data. switching models is the cheapest second opinion in history.
0
0
0
112
kache
kache@yacineMTB·
There are two types of people. The kind of person who gives up on a path because an LLM tells them it isn't possible. And the kind of person who understands things from their principles, and doesn't take an unexplained no for an answer.
64
25
517
14.8K
Manoj
Manoj@mbajaj_·
streaming being irrelevant assumes latency goes to zero. even at 289 tokens/sec (gemini 3.5 flash), a 10K token response takes 35 seconds. users won't wait for a blank screen that long. streaming isn't about speed - it's about perceived responsiveness. that doesn't go away with faster compute.
1
0
0
38
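The arithmetic behind the "35 seconds" figure, as a small sketch. The 10K tokens and 289 tokens/sec come from the tweet; the 20-token first chunk is an assumed value, and prefill and network latency are ignored.

```python
def full_response_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Seconds until the last token arrives at a constant generation rate."""
    return tokens / tokens_per_sec


def seconds_until_visible(first_chunk_tokens: int, tokens_per_sec: float) -> float:
    """Seconds until a streaming client can render its first chunk.

    Simplifying assumption: no prefill or network delay.
    """
    return first_chunk_tokens / tokens_per_sec


# Without streaming the user stares at a blank screen for the whole response.
blank_screen = full_response_seconds(10_000, 289)  # about 34.6 s

# With streaming, something is on screen after the first (assumed) 20 tokens.
streamed = seconds_until_visible(20, 289)  # under a tenth of a second
```

Total generation time is the same in both cases; only the wait before the first visible output changes, which is the point about perceived responsiveness.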
Javier
Javier@javi_22_dev·
@mbajaj_ @DavidSHolz Token level control is better on NAR. And streaming will be irrelevant as soon as compute allows higher generation speeds.
1
0
0
29
David
David@DavidSHolz·
Most researchers agree that autoregression is best when memory bandwidth is cheap and diffusion is best when FLOPS are cheap. They also admit the future of compute is all FLOPS because memory scaling is hard and scaling FLOPS is easy. So why not go all in on diffusion????
58
51
971
112K
Manoj
Manoj@mbajaj_·
13F parsing is one of those problems that sounds easy until you're 3 weeks in and BERKSHIRE HATHAWAY, BRK, BRK.A, and a CUSIP are all the same thing but your database thinks they're four different companies. the vibe coded approaches break on exactly this - historical ticker changes and M&A activity. this is a good example of where domain knowledge still matters more than the model. the AI can call the API but it can't build the canonical mapping underneath it.
0
0
1
1.2K
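A toy version of the canonical mapping problem, assuming a hand-built alias table with validity dates. Every identifier here is made up for illustration: `SEC-0001` and friends are hypothetical internal ids, `CUSIP-PLACEHOLDER` stands in for a real CUSIP, and the `XYZ` ticker change is invented to show the historical case.

```python
from datetime import date
from typing import Optional

# Hypothetical alias rows: (alias, valid_from, valid_to, canonical_id).
ALIASES = [
    ("BERKSHIRE HATHAWAY", date.min, date.max, "SEC-0001"),
    ("BRK", date.min, date.max, "SEC-0001"),
    ("BRK.A", date.min, date.max, "SEC-0001"),
    ("CUSIP-PLACEHOLDER", date.min, date.max, "SEC-0001"),
    # Invented example: the same ticker points at different securities
    # before and after a made-up change date.
    ("XYZ", date.min, date(2019, 12, 31), "SEC-0002"),
    ("XYZ", date(2020, 1, 1), date.max, "SEC-0003"),
]


def resolve(raw: str, as_of: date) -> Optional[str]:
    """Map a reported name, ticker, or CUSIP to a canonical id as of a date."""
    key = " ".join(raw.upper().split())  # normalize case and whitespace
    for alias, start, end, canonical in ALIASES:
        if alias == key and start <= as_of <= end:
            return canonical
    return None
```

The lookup itself is trivial; the hard part the tweet points at is building and maintaining the rows in `ALIASES` correctly across ticker changes and M&A activity.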
Virat Singh
Virat Singh@virattt·
Proud of this one. We shipped a new Institutional Holdings API. This data comes from 13F filings, which is a nightmare to parse at scale. You must normalize tons of data. Funds will report "BERKSHIRE HATHAWAY", "BRK", "BRK.A", and the CUSIP across different positions. Resolving these to a canonical security requires a reference → CUSIP → ticker map that is correct both today and historically. You need to track ticker changes, splits, and M&A activity, which breaks the majority of vibe coded approaches. Also, there is a ton of data: your DB table will easily exceed 1B rows if done right. We solved it @findatasets. It's now 1 API call.
14
25
429
75.8K
Manoj
Manoj@mbajaj_·
@zeeg the bar is on the floor when "we won't train on your data without asking" is a competitive differentiator. this should be the default, not a marketing statement.
0
0
0
106
David Cramer
David Cramer@zeeg·
Sentry will never "train language models" with your data, without your explicit consent, fyi. We don't train models, and even if we found a valuable use case, we think it merits convincing customers. The idea that people think they can get away with this is mind boggling.
19
10
148
20.2K
Manoj
Manoj@mbajaj_·
that's interesting - diffusion for the plan means you generate the full solution structure in parallel instead of sequentially. then AR executes each piece with precise token-level control. you'd get faster planning and more coherent high-level architecture vs AR planning which can drift as the sequence gets longer. has midjourney experimented with this for any non-image modalities?
1
0
8
693
David
David@DavidSHolz·
@mbajaj_ why not diffusion planning with AR execution?
3
0
19
2.6K
Manoj
Manoj@mbajaj_·
@zeeg this is the anti-slop prompt. instead of asking the agent to build, you're asking it to tear down. most agents default to adding code. telling it to actively look for things to remove is a different mode entirely and probably the highest ROI use of /goal I've seen.
0
0
0
321
Manoj
Manoj@mbajaj_·
63µs at 514 tokens with zero heap allocations. the fact that CPU tokenization became a meaningful bottleneck tells you how fast the GPU side has gotten - when your reranker runs in single-digit ms, the tokenizer that feeds it is suddenly 30% of your latency. this is the kind of optimization that only matters at perplexity's scale but sets the floor for everyone else once it's open source. the benchmarks against HuggingFace are brutal - 5x at p50, nearly 4x at p99 on 16k inputs. curious how this performs on multilingual inputs where XLM-RoBERTa's 250K vocab really earns its size.
0
0
0
102
Perplexity
Perplexity@perplexity_ai·
We're open-sourcing the Unigram tokenizer we rebuilt to reduce CPU utilization by 5-6x. Small rerankers and embedders run in single-digit milliseconds on GPU, making CPU tokenization a meaningful share of total latency. github.com/perplexityai/p…
51
83
721
76.6K
Manoj
Manoj@mbajaj_·
the file system insight is right but the gmail and google calendar connectors part is where it gets interesting. "add in the connectors" sounds simple but each connector is an OAuth token that needs to be acquired, stored, refreshed, and scoped correctly. the non-technical user who just wants to analyze their finances from a folder of PDFs is now managing Google OAuth credentials they don't understand. the "put files in a folder" workflow is genius because it has zero auth overhead. the moment you add connectors, you've moved from "anyone can do this" to "anyone who can manage OAuth tokens can do this" which is a very different audience.
0
0
0
819
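To make "acquired, stored, refreshed, and scoped" concrete, here is a rough sketch of the bookkeeping each connector implies. `StoredToken`, the 60-second skew, and the scope names `mail.read` and `calendar.read` are hypothetical; real providers define their own scope strings and token lifetimes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredToken:
    """Hypothetical record kept for one OAuth connector."""
    access_token: str
    refresh_token: str
    expires_at: float      # unix seconds
    scopes: frozenset      # scope strings granted by the user


def needs_refresh(tok: StoredToken, now: float, skew: float = 60.0) -> bool:
    """True if the access token is expired or about to expire."""
    return now >= tok.expires_at - skew


def allows(tok: StoredToken, required_scope: str) -> bool:
    """True if the user granted the scope this operation needs."""
    return required_scope in tok.scopes
```

None of this exists in the "put files in a folder" workflow, which is the gap in audience the tweet describes.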
Thariq
Thariq@trq212·
the basic trick to using Claude Code for non-technical work is to put a bunch of files in a folder and tell it it can write scripts + make HTML
169
120
3.2K
352.3K
Manoj
Manoj@mbajaj_·
using claude as a sub-agent inside codex means codex is shelling out to claude -p which runs a separate authenticated process on your machine. that second process needs its own API key, its own context window, its own token budget. you're not just using two models - you're running two billing meters, two rate limit pools, and two separate processes with filesystem access simultaneously. the trick works but nobody's talking about what it actually costs or what happens when claude's rate limit hits mid-task and codex doesn't know its sub-agent just died.
0
0
0
854
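One way a parent process could at least notice that a shelled-out sub-agent died, sketched with Python's standard `subprocess` module. `run_subagent` and `SubAgentFailed` are hypothetical names, and the sketch assumes the child signals failure (for example a rate limit) through a non-zero exit code or a hang, which may not hold for every CLI.

```python
import subprocess
from typing import Sequence


class SubAgentFailed(RuntimeError):
    """Raised when the child process times out or exits non-zero."""


def run_subagent(cmd: Sequence[str], timeout_s: float = 300.0) -> str:
    """Run a sub-agent command and return its stdout, or raise on failure."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True,
                              text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        raise SubAgentFailed(f"sub-agent timed out after {timeout_s}s") from exc
    if proc.returncode != 0:
        # Surface stderr so the parent can tell a rate limit from a crash.
        raise SubAgentFailed(
            f"sub-agent exited with {proc.returncode}: {proc.stderr.strip()}")
    return proc.stdout
```

With the command from the quoted tweet this would be called as `run_subagent(["claude", "-p", prompt])`. It makes the failure visible to the caller; it does nothing about the second billing meter or rate limit pool.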
Matt Shumer
Matt Shumer@mattshumer_·
Massively useful Codex trick for 10x better frontend: You can ask Codex to use Claude as a sub-agent to have Claude handle frontend/design work. Just say “Use claude -p with an excellent, well-scoped, but un-opinionated (UI/UX-wise) prompt anytime you need a design change.”
124
55
1.6K
179.2K
Manoj
Manoj@mbajaj_·
the interesting part: this is the same guy who built Million.js (virtual DOM optimization) and React Scan (re-render detection). each tool moved one layer up the stack. first: make React faster. then: find what's slow. now: fix what the agent broke. the pattern is tools-that-watch-tools and it's going to be its own category. react-doctor for frontend, AI-Trader auditing trading agents, Perplexity's bumblebee scanning for compromised packages. six months from now every serious agent workflow will have a second agent reviewing the first one's work.
0
0
1
1.1K
Aiden Bai
Aiden Bai@aidenybai·
Introducing /react-doctor Your React app probably has bad code. This fixes it Install as agent skill. Fully open source. npx react-doctor@latest
126
337
5.7K
516.4K
Manoj
Manoj@mbajaj_·
@PalantirTech “ai solved software creation” that’s correct we have infinite cheap code and nobody is brave enough to give it the production keys
0
0
0
709
Palantir
Palantir@PalantirTech·
AI solved software creation. Now comes software distribution. The future will not run on blind deployment pipelines. Apollo provides the Ontology Primitives for Software Distribution. Deploy. Patch. Rollback. Validate. Govern. AI-native velocity with human accountability.
111
209
1.8K
205.7K
Manoj
Manoj@mbajaj_·
8.4 is the most important line here and everyone will scroll past it. "data, permissions, distribution, trust, compliance, regulatory position, and physical assets become more valuable." this is the real list of moats in a post-scarcity code world. every SaaS company whose pitch is "we built this so you don't have to" is about to find out what happens when building it yourself costs $0 and 20 minutes.
0
0
1
343
Manoj
Manoj@mbajaj_·
@garrytan “the upload flow is still broken” that’s correct our 7-figure engineers prefer prompting claude code
0
0
0
336
Garry Tan
Garry Tan@garrytan·
Unbelievable how broken Google apps are on iOS Can’t even upload photos from photo roll properly to Google Drive app People are getting paid 7 figures a year to ship this poor quality software? 👀
241
39
1.4K
149.4K