Salik Shah ✨🚀

16.3K posts


@salik

Actively building @DirghaAI (alpha). An AI computer that runs your business—research, build, ship. Sovereign, agentic. On cloud, on-prem. Ex @MithilaReview 🕊️

India · Joined March 2007

1.4K Following · 1.7K Followers
Salik Shah ✨🚀
@pmarca The demand will skyrocket. Every profession, every industry. You vibe code, understand the potential, and hire a professional to maintain and take things forward. Security is another big concern, and not many can afford to maintain their own infra either.
Marc Andreessen 🇺🇸
"Tech job openings rebounded sharply in 2026, challenging popular narrative that AI is wiping out engineering roles...more than 67,000 software eng job openings, highest level in 3 years. Listings have doubled since a trough in mid-2023." businessinsider.com/ai-isnt-killin…
Salik Shah ✨🚀
Study and fix.
Alex Prompter@alex_prompter

🚨 BREAKING: Google DeepMind just mapped the attack surface that nobody in AI is talking about.

Websites can already detect when an AI agent visits and serve it completely different content than humans see.

> Hidden instructions in HTML.
> Malicious commands in image pixels.
> Jailbreaks embedded in PDFs.

Your AI agent is being manipulated right now, and you can't see it happening.

The study is the largest empirical measurement of AI manipulation ever conducted: 502 real participants across 8 countries, 23 different attack types, frontier models including GPT-4o, Claude, and Gemini. The core finding is not that manipulation is theoretically possible; it is that manipulation is already happening at scale, and the defenses that exist today fail in ways that are both predictable and invisible to the humans who deployed the agents.

Google DeepMind built a taxonomy of every known attack vector, tested the vectors systematically, and measured exactly how often they work. The results should alarm everyone building agentic systems.

The attack surface is larger than anyone has publicly acknowledged. Prompt injection, where malicious instructions hidden in web content hijack an agent's behavior, works through at least a dozen distinct channels: text hidden in HTML comments that humans never see but agents read and follow; instructions embedded in image metadata; commands encoded in the pixels of images using steganography, invisible to human eyes but readable by vision-capable models; malicious content in PDFs that appears as normal document text but contains override instructions; QR codes that redirect agents to attacker-controlled content; and indirect injection through search results, calendar invites, email bodies, and API responses. Any data source the agent consumes becomes a potential attack vector.

The detection asymmetry is the finding that closes the escape hatch. Websites can already fingerprint AI agents with high reliability using timing analysis, behavioral patterns, and user-agent strings. This means the attack can be conditional: serve normal content to humans, serve manipulated content to agents. A user who asks their AI agent to book a flight, research a product, or summarize a document has no way to verify that the content the agent received matches what a human would see. The agent cannot tell the user it was served different content. It does not know. It processes whatever it receives and acts accordingly.

The attack categories and what they enable:
→ Direct prompt injection: malicious instructions in any text the agent reads override goals, exfiltrate data, trigger unintended actions
→ Indirect injection via web content: hidden HTML, CSS visibility tricks, white text on white backgrounds; invisible to humans, consumed by agents
→ Multimodal injection: commands in image pixels via steganography; instructions in image alt-text and metadata
→ Document injection: PDF content, spreadsheet cells, presentation speaker notes; every file format is a potential vector
→ Environment manipulation: fake UI elements rendered only for agent vision models; misleading CAPTCHA-style challenges
→ Jailbreak embedding: safety-bypass instructions hidden inside otherwise legitimate-looking content
→ Memory poisoning: injecting false information into agent memory systems that persists across sessions
→ Goal hijacking: gradual instruction drift across multiple interactions that redirects agent objectives without triggering safety filters
→ Exfiltration attacks: agents tricked into sending user data to attacker-controlled endpoints via legitimate-looking API calls
→ Cross-agent injection: compromised agents injecting malicious instructions into other agents in multi-agent pipelines

The defense landscape is the most sobering part of the report. Input sanitization, cleaning content before the agent processes it, fails because the attack surface is too large and too varied. You cannot sanitize image pixels. You cannot reliably detect steganographic content at inference time. Prompt-level defenses that tell agents to ignore suspicious instructions fail because the injected content is designed to look legitimate. Sandboxing reduces the blast radius but does not prevent the injection itself. Human oversight, the most commonly cited mitigation, fails at the scale and speed at which agentic systems operate. A user who deploys an agent to browse 50 websites and summarize findings cannot review every page the agent visited for hidden instructions.

The multi-agent cascade risk is where this becomes a systemic problem. In a pipeline where Agent A retrieves web content, Agent B processes it, and Agent C executes actions, a successful injection into Agent A's data feed propagates through the entire system. Agent B has no reason to distrust content that came from Agent A. Agent C has no reason to distrust instructions that came from Agent B. The injected command travels through the pipeline with the same trust level as legitimate instructions.

Google DeepMind documents this explicitly: the attack does not need to compromise the model. It needs to compromise the data the model consumes. Every agentic system that reads external content is one carefully crafted webpage away from executing attacker instructions. The agents are already deployed. The attack infrastructure is already being built. The defenses are not ready.
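The hidden-HTML channel is easy to demonstrate, and so is why sanitization only covers part of the surface. A minimal sketch (stdlib only; the class and names are invented for illustration) that keeps only text a human would plausibly see, dropping comments and inline-hidden elements. Real pages also hide content via external CSS, zero-size fonts, and off-screen positioning, which this cannot catch, and it does nothing for image pixels or PDFs:

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects only the text a human reader would plausibly see.

    Illustrative sketch: drops <script>/<style> bodies and subtrees
    inline-styled as hidden. HTML comments are dropped for free:
    handle_comment() is not overridden, so injected comments like
    <!-- AGENT: ignore previous instructions --> never reach the agent.
    """
    HIDDEN_MARKERS = ("display:none", "visibility:hidden")

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # >0 while inside a hidden/script subtree

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style", "").replace(" ", "").lower()
        hidden = tag in ("script", "style") or any(
            m in style for m in self.HIDDEN_MARKERS)
        if self._skip_depth or hidden:
            self._skip_depth += 1  # track nesting inside hidden subtrees

    def handle_endtag(self, tag):
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def visible_text(html: str) -> str:
    """Return the human-visible text, for feeding to an agent."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Running this over a page with a hidden `<div style="display:none">` payload strips the injected instructions while keeping the visible copy, which illustrates both the defense and its limits: the check only works for channels you already know about.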

Salik Shah ✨🚀
@ashpreetbedi Yes, this needed to be said. I am building a CLI for my platform, and to me, it is not different from building a full system.
Ashpreet Bedi
Ashpreet Bedi@ashpreetbedi·
Maybe I'm missing something, but "harness engineering" might be doing more harm than good.

I've read a couple of posts on harness engineering, filesystem memory, subagent architecture. All real, all important. I've learned a lot from them. But I keep coming back to this: the framing of Agent = Model + Harness undersells the actual engineering involved.

And as far as I can tell, none of the major agent products work this way. Claude, ChatGPT, Devin: these are all systems. They handle authentication, multi-tenancy, deployment, observability, cost controls, state management across sessions and users, RBAC, resource isolation. The "harness" is a subset of the engineering involved in building these products.

A better framing might be Agent = Model + System. This makes sense because you can't serve a raw API call to users; you need the system around it to turn the model into a product. You could argue Agent = Model + Harness + System, and that's fair. But at that point the harness is just a component of the system. Treat it as one.

My concern is that when we center the conversation on harness engineering, we train developers to think about the 30% that touches the model and ignore the 70% that makes the thing actually work in the real world. When we look at the problem through the lens of the 30%, we end up with things like virtualized file systems, which solve problems that shouldn't exist in the first place.

At best, the harness wraps the model. The system is the product. And there's a reason the consensus is that model progress will eventually swallow the harness: the harness is a thin layer. The system is not. The system is the product, and that's what developers should be focusing on.

Another reason to take harness engineering with a grain of salt: it's shaped by coding agents. Coding agents are a very specific form factor, which is itself evolving rapidly. Single user. Running in a terminal. Local filesystem. The patterns that emerge from this form factor are useful for this form factor. And I worry that generalizing them to broader agentic systems is damaging to the ecosystem as a whole.

Here's what I mean. And notice a pattern: many of these are solutions to problems that shouldn't exist in the first place if you start with the right system design.

1. Filesystems for memory and storage. Harness engineering recommends patterns like AGENTS.md files for memory. This works when one developer is running one agent on their laptop. It falls apart the moment you need a real product. There's a reason databases exist. Files don't support concurrent access. They don't support querying. They don't support access control. A filesystem as your memory layer is a single-user solution presented as architecture. And now I'm seeing people build "virtualized file systems" that wrap databases in filesystem-like structures to patch over these limitations. At that point, just expose the database. You get SQL as a first-class interface, proper access control, and durable storage without the abstraction gymnastics. And you know what, LLMs are even better at SQL than they are at cat and bash.

2. No multi-tenancy or RBAC. How do 50 engineers on a team share an agent securely? How do you control which users can trigger which actions? That's multi-tenancy, authorization, and access control. No filesystem pattern solves this. You need real RBAC.

3. No resource isolation. How do you stop one tenant's runaway agent from burning through your entire token budget? That's resource isolation. It lives at the system level. A harness has no concept of it. I hear people recommending sandboxes scoped to individual users, and it makes zero sense to me because your costs will eat you alive.

Btw, these problems aren't new. They're the same problems we've been solving in software engineering for decades. The instinct to create new terminology comes from a good place. "Harness engineering", "scaffolding", "context engineering": people want to name the new discipline. But every time we mint a new term for a subset of systems engineering, I think we make it harder for developers to recognize that the patterns they need already exist, and we shouldn't reinvent the wheel. All the problems that harness engineering solves, you can solve with systems engineering.

Maybe I'm wrong about this, but I'm just seeing harness engineering create more issues than it solves (virtualized file systems???). If we want developers to successfully build agentic products, we should encourage them to think in systems. The solutions already exist. We should use them.

Again, maybe I'm missing something. I'll keep an open mind as I learn more. And maybe the answer is simply that harness engineering applies to coding agents and not to broader agentic products, which makes perfect sense.

TLDR: Agent = Model + Harness undersells the real problem. Harness engineering is shaped by coding agents (single user, terminal, local filesystem) and ignores the 70% that makes agents work in production: multi-tenancy, RBAC, approval flows, audit logs, resource isolation, durable storage. These are systems engineering problems.
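The filesystem-vs-database point can be made concrete. A hypothetical sketch (the schema and function names are invented for illustration, not any product's API) of agent memory backed by SQLite rather than a flat AGENTS.md file, showing the two things the file cannot give you: per-user scoping and querying.

```python
import sqlite3

def open_memory(path=":memory:"):
    """Open an agent memory store. A single table stands in for what
    would be a richer schema in a real system."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS memory (
        user_id TEXT NOT NULL,   -- multi-tenancy: every row is scoped
        key     TEXT NOT NULL,
        value   TEXT NOT NULL,
        PRIMARY KEY (user_id, key))""")
    return db

def remember(db, user_id, key, value):
    """Upsert one memory for one tenant."""
    db.execute("INSERT OR REPLACE INTO memory VALUES (?, ?, ?)",
               (user_id, key, value))
    db.commit()

def recall(db, user_id, pattern):
    """SQL as the first-class interface: the agent queries a scoped
    slice instead of grepping a shared file."""
    rows = db.execute(
        "SELECT key, value FROM memory WHERE user_id = ? AND key LIKE ?",
        (user_id, pattern)).fetchall()
    return dict(rows)
```

Two tenants writing the same key never collide, and `recall` can never return another tenant's rows; neither property holds for a shared markdown file without building the access layer yourself.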
Salik Shah ✨🚀
Spot on. Everyone is burning tokens doing the exact same thing, reinventing SaaS clones with agents. We desperately need an open-source AI Core Library + standards: plug-and-play modules for all the top apps. A fictional open-source AI org could drive this, with a “Core Librarian Agent” that continuously maintains, integrates, and upgrades the foundational layer. This would free compute & tokens for novel tasks, reducing like 90% of duplicate efforts across all platforms. PS. Today, I finally decided to help Claude out by skipping it for intensive agent tasks. Any chance MiMo could offer an unlimited plan like Fireworks Firepass? How are they able to offer this? x.com/salik/status/2…
Fuli Luo
Fuli Luo@_LuoFuli·
Two days ago, Anthropic cut off third-party harnesses from using Claude subscriptions — not surprising. Three days ago, MiMo launched its Token Plan — a design I spent real time on, and what I believe is a serious attempt at getting compute allocation and agent harness development right. Putting these two things together, some thoughts:

1. Claude Code's subscription is a beautifully designed system for balanced compute allocation. My guess: it doesn't make money, and possibly bleeds it, unless their API margins are 10-20x, which I doubt. I can't rigorously calculate the losses from third-party harnesses plugging in, but I've looked at OpenClaw's context management up close — it's bad. Within a single user query, it fires off rounds of low-value tool calls as separate API requests, each carrying a long context window (often >100K tokens) — wasteful even with cache hits, and in extreme cases driving up cache miss rates for other queries. The actual request count per query ends up several times higher than Claude Code's own framework. Translated to API pricing, the real cost is probably tens of times the subscription price. That's not a gap — that's a crater.

2. Third-party harnesses like OpenClaw/OpenCode can still call Claude via API — they just can't ride on subscriptions anymore. Short term, these agent users will feel the pain, with costs easily jumping tens of times. But that pressure is exactly what pushes these harnesses to improve context management, maximize prompt cache hit rates to reuse processed context, and cut wasteful token burn. Pain eventually converts to engineering discipline.

3. I'd urge LLM companies not to blindly race to the bottom on pricing before figuring out how to price a coding plan without hemorrhaging money. Selling tokens dirt cheap while leaving the door wide open to third-party harnesses looks nice to users, but it's a trap — the same trap Anthropic just walked out of. The deeper problem: if users burn their attention on low-quality agent harnesses, highly unstable and slow inference services, and models downgraded to cut costs, only to find they still can't get anything done — that's not a healthy cycle for user experience or retention.

4. On the MiMo Token Plan: it supports third-party harnesses, billed by token quota, same logic as Claude's newly launched extra usage packages. What we're going for is long-term stable delivery of high-quality models and services — not getting you to impulse-pay and then abandon ship.

The bigger picture: global compute capacity can't keep up with the token demand agents are creating. The real way forward isn't cheaper tokens — it's co-evolution: "more token-efficient agent harnesses" × "more powerful and efficient models." Anthropic's move, whether they intended it or not, is pushing the entire ecosystem — open source and closed source alike — in that direction. That's probably a good thing. The Agent era doesn't belong to whoever burns the most compute. It belongs to whoever uses it wisely.
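The cost argument above is simple arithmetic, and a back-of-the-envelope sketch makes the multiplier visible. All prices, round counts, and cache rates here are assumed placeholders for illustration, not any vendor's actual numbers:

```python
def query_cost(rounds, context_tokens, price_per_mtok,
               cache_hit_rate=0.0, cached_discount=0.1):
    """Rough dollar cost of one user query that fires `rounds`
    separate API requests, each resending the full context window.

    Assumption for illustration: tokens that hit the prompt cache are
    billed at `cached_discount` times the full input rate.
    """
    full_rate_tokens = context_tokens * (1 - cache_hit_rate)
    discounted_tokens = context_tokens * cache_hit_rate * cached_discount
    per_round = (full_rate_tokens + discounted_tokens) * price_per_mtok / 1e6
    return rounds * per_round

# A wasteful harness: 12 rounds over a 100K-token context, no reuse.
wasteful = query_cost(rounds=12, context_tokens=100_000, price_per_mtok=3.0)

# A disciplined one: 3 rounds, 90% of the context served from cache.
lean = query_cost(rounds=3, context_tokens=100_000, price_per_mtok=3.0,
                  cache_hit_rate=0.9)
```

Under these placeholder numbers the wasteful harness costs roughly 20x the disciplined one per query, which is the order-of-magnitude gap the thread describes between harness-driven API bills and a subscription price.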
PIV
PIV@piv_piv·
Dither Punks on e-paper screens
PIV tweet media
Salik Shah ✨🚀
@sriramk One product -- a personal computer. That's it. Everything collapses at the OS layer.
Sriram Krishnan
Sriram Krishnan@sriramk·
there are several products waiting to be built here.
Andrej Karpathy@karpathy

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/ and backlinks, and it categorizes the data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data in the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries, so my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), and find interesting connections for new article candidates, to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data. E.g. I vibe coded a small and naive search engine over the wiki, which I use directly (in a web UI), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, all of it viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
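The auto-maintained index files described above can be sketched in a few lines. This is a toy stand-in under stated assumptions: the layout is invented, and the "summary" is just the first non-empty line of each article, whereas in the real workflow the LLM writes the summaries.

```python
from pathlib import Path

def build_index(wiki_dir: str) -> str:
    """Compile a one-line-per-article index over a directory of .md
    files, the kind of navigation aid that lets an agent answer
    questions over ~100 articles without a RAG pipeline."""
    lines = ["# Wiki index", ""]
    for md in sorted(Path(wiki_dir).rglob("*.md")):
        if md.name == "index.md":
            continue  # don't index the index itself
        text = md.read_text(encoding="utf-8")
        # Toy summary: first non-empty line, heading marks stripped.
        first = next((ln.strip() for ln in text.splitlines() if ln.strip()),
                     "(empty)")
        summary = first.lstrip("# ")
        rel = md.relative_to(wiki_dir).as_posix()
        lines.append(f"- [{md.stem}]({rel}): {summary}")
    out = "\n".join(lines) + "\n"
    (Path(wiki_dir) / "index.md").write_text(out, encoding="utf-8")
    return out
```

Re-running this after every ingest keeps the index current, so the agent can read one small file first and then open only the articles a query actually needs.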

Salik Shah ✨🚀
@HarveenChadha Haha! I just caught Claude using its own model even when I asked it to use Kimi (Composer with RL) today. Cost is definitely an issue, and why use Claude when anything else gets the work done?
Salik Shah ✨🚀 tweet media
Harveen Singh Chadha
Harveen Singh Chadha@HarveenChadha·
TIL: Cursor uses composer-2-fast for all auto-spawned subagents by default, even when you have selected opus-4.6 as the main model. If you are on an enterprise plan with request-based billing, you can't change this behaviour. You are charged for opus, but opus is just the orchestrator; under the hood, all subagents are composer.
Harveen Singh Chadha tweet media
Salik Shah ✨🚀
@levelsio This is why sometimes you need outside help for things you are good at but don't want to do for yourself.
@levelsio
@levelsio@levelsio·
There has to be some word for this concept. It's why designers from tech who design touchscreens, like Jony Ive, won't put a touch screen in a car but use real knobs. Or why programmers don't actually like smart homes and smart appliances at all but want things analog. Or why tech people raise their kids without mobile devices. It's knowing something so well from inside of it (tech) that you choose NOT to use it, because you know the negatives that come with it in specific contexts.
Top Gear@BBC_TopGear

"A large touchscreen doesn't work in a car": Sir Jony Ive on designing the Ferrari Luce's interior ➡️ top-gear.visitlink.me/yTpZer

Salik Shah ✨🚀 retweeted
Marc Andreessen 🇺🇸
Silly Business Theory is right: in the future the best work, greatest progress, most valuable innovations won't come from laboring under the false consciousness that work must be hard and serious to produce value. The best work is going to come from people playing and having fun.
47fucb4r8curb4fc8f8r4bfic8r@47fucb4r8c69323

The more I look at this, the more impressed I am, and the more I realize how grateful we should be to Tao.

1. He acknowledges ignorance: this is something academics almost never do, since their cultural capital is tied up in their knowing things. But he can, since, well, he's Terence Tao.

2. He is explicitly acknowledging his use of GenAI to fight the stigma of using AI. If the child prodigy turned UCLA prof who studied with Erdos uses AI, it is legitimate technology. (Please start using this sentence with AI skeptics, btw.)

3. He is also showing how AI is best used: as a kind of syntactic tool that finds connections in possibility space and has access to a larger library of information than our brains do.

There's more here, but the cool internet thing is a list of three. I often lament that Tao has too playful a mode of operating, feeling like he plays with linear algebra when he should be doing foundations of mathematics. But not only does this moment prove my view wrong, it also proves just how much Silly Business Theory #SBT is right: in the future, the best work, the greatest progress, and the most valuable innovations won't come from people laboring under the false consciousness of Protestantism and Marxism that asserts work must be hard and serious to produce value. The best work is going to come from people playing and having fun. We're on the cusp of a near-utopian explosion in human potential and quality of life. And you're bearish?!?!

Salik Shah ✨🚀
My algorithm keeps surfacing your tweets, and they land. Apologies in advance for being your reply guy, but this one hits close to home. One push: India doesn't have "a few decades." Crazy times demand crazy deadlines. This has to be a decade, max. Warp speed. War footing. AI assistant in every student's hand. Labs in every school and factory floor. Talent diaspora pulled back with real equity, real mandate, real ownership. Not just subsidies. Land, ocean, space, moon, Mars. We mobilize at that scale or we don't mobilize at all. The AI question you raise at the end is the real wildcard. It may compress decades of capability-building into years. That's the bet worth making. x.com/Rainmaker1973/…
Dinesh Pai
Dinesh Pai@dineshpaii·
This article on the Indian semiconductor industry misses one specific point. Beyond subsidies, land, water and labor regulations, we need the best of our talent to return to India. We've got so many Indians across the best of the semiconductor companies. Without that talent, nothing else matters. (Sorry, Metallica) If we can frame a moonshot Public - Private partnership, which can enable good pay (competitive, with perhaps some equity upside), visibility for outlandish projects to take shape, speed of execution, encouragement, policy to have incentives for local fabs, we might, just might have a state of the art fab in a few decades. Yep, even doing all things right will be barely enough. But like everything else, we should do all the right things and hope to get lucky. Wonder if AI will change the talent equation though. :)
Dinesh Pai tweet media
Salik Shah ✨🚀 retweeted
Tony Fadell
Tony Fadell@tfadell·
Most tech companies break out product management and product marketing into two separate roles: product management defines the product and gets it built; product marketing writes the messaging, the facts you want to communicate to customers, and gets the product sold.

But from my experience that's a grievous mistake. Those are, and should always be, one job. There should be no separation between what the product will be and how it will be explained; the story has to be utterly cohesive from the beginning. Your messaging is your product. The story you're telling shapes the thing you're making.

I learned storytelling from Steve Jobs. I learned product management from Greg Joswiak. Joz, a fellow Wolverine, Michigander, and overall great person, has been at Apple since he left Ann Arbor in 1986 and has run product marketing for decades. And his superpower, the superpower of every truly great product manager, is empathy. He doesn't just understand the customer. He becomes the customer.

So when Joz stepped into the world with his next-gen iPod to test it out, he fiddled with it like a beginner. He set aside all the tech specs except one: battery life. The numbers were empty without customers, the facts meaningless without context.

And that's why product management has to own the messaging. The spec shows the features, the details of how a product will work, but the messaging predicts people's concerns and finds ways to mitigate them.

- #BUILD Chapter 5.5, The Point of PMs
Salik Shah ✨🚀 retweeted
Md Riyazuddin
Md Riyazuddin@riyazmd774·
🚨 In 1992, an MIT lecture quietly revealed more about product and sales than most 2-year MBAs ever will. Most people have never seen it. It came from Steve Jobs, and instead of teaching theory, he broke down how great products actually win. Watching it today feels unreal.

He explained that people don't buy products; they buy meaning. The best products aren't just functional; they connect with how people see themselves. That's why some ideas spread effortlessly while others die, even if they're technically better.

He also made it clear that marketing isn't about features. It's about clarity. If you can't explain why your product matters in simple terms, it won't matter at all. Complexity doesn't impress; it confuses.

And his biggest edge? Obsession with experience. Not just what the product does, but how it feels. The small details, the simplicity, the story: that's what separates good from unforgettable.

That's why this MIT lecture still hits hard. Because while most people are building products, very few understand why people actually buy them.
Salik Shah ✨🚀 retweeted
clem 🤗
clem 🤗@ClementDelangue·
I think it’s @NaveenGRao who said it before but wouldn’t be surprised if the frontier labs cut their APIs entirely at some point. In a compute constrained world, they’ll always prioritize their own direct products/customers. Makes it scary and unsustainable to only build on top of their APIs!