Nitish Mutha ⚡️

754 posts

Nitish Mutha ⚡️

@nitmusai

Co-founder and CTO @GenieAI - Building Agentic Lawyer to do your legals. @UCL alum.

London Katılım Temmuz 2009

386 Takip Edilen4.4K Takipçiler

Nitish Mutha ⚡️@nitmusai·5h

@aakashgupta Horizontal enabling layer is the right frame. The AWS analogy holds but the timeline compresses. AWS took 10 years to commoditize compute. AI is commoditizing intelligence in 3. The vertical builders who understand this have a window right now but it's closing fast.

English

Aakash Gupta@aakashgupta·4d

Bezos just mass-humbled every AI startup charging a subscription fee. And nobody caught the real tell. He called AI a "horizontal enabling layer." Wall Street heard buzzword. Bezos was giving you the exact playbook he already ran. 2006. Amazon launches AWS. Every enterprise software company laughed. They sold servers, licenses, maintenance contracts. Complexity was the product. If you couldn't navigate the maze, you couldn't compete. AWS dissolved the maze. Today AWS runs at $142 billion in annualized revenue. It generates more than half of Amazon's total operating income. The companies that tried to "sell cloud" as a product in 2008? Most of them are Wikipedia entries now. Now look at AI. Thousands of companies selling wrappers. Features. Subscription tiers. The same pattern playing out at 10x the speed. Bezos has watched this movie before because he directed it. When he says "horizontal enabling layer," he's describing the architecture Amazon has been building for two decades. You don't sell the substrate. You don't compete with the substrate. You build on top of it or you get absorbed into it. The companies that will print money from AI won't sell AI. They'll sell what AI makes possible. The margin lives one layer above the infrastructure. Always has. Every AI startup charging $20/month for a chatbot wrapper is a candle company in 1882. The grid is coming. And the guy who built the last one just told you.

English

415

89.4K

Nitish Mutha ⚡️@nitmusai·5h

@econcallum Goldman is always late to these calls but when they're publishing on it you know the institutional money is paying attention. The AI software productivity numbers are going to look unrecognizable in 5 years vs what anyone modeled in 2023.

English

Callum Williams@econcallum·3d

Insane datapoint from new Goldman report on AI and software

English

152

1.2K

220K

Nitish Mutha ⚡️@nitmusai·5h

@alexalbert__ Vision has always been the scarce resource, not execution. Now that execution is democratized, the people who win are the ones who know exactly what they want and can articulate it precisely. Taste becomes the moat.

English

Alex Albert@alexalbert__·1d

Everyone with a vision can produce very high-quality designs now (with a lil help from Claude)

Claude@claudeai

Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.

English

593

34.8K

Nitish Mutha ⚡️@nitmusai·5h

@Flomerboy Design system setup upfront is the right investment. The output quality difference between someone who's done that groundwork vs someone who hasn't is massive. Taste in what to specify is now the differentiator, not the ability to execute it.

English

Ryan Mather@Flomerboy·1d

🧵 My tips for getting the best results out of Claude Design! I’m on the verticals team at Anthropic which means I serve 7 different products. Claude Design makes it possible! 1. Set up your design system and your core screens. An hour of setup and refinement here is worth it

Claude@claudeai

English

175

656

8.5K

1.4M

Nitish Mutha ⚡️@nitmusai·5h

@DeRonin_ 15 minutes to setup, 80% automation. That ratio is what people don't believe until they try it. The CLAUDE.md file is doing more work than people realize. It's not config, it's the agent's operating context.

English

Ronin@DeRonin_·6d

🚨 A google engineer automated 80% of his work with claude code.. he now works 2-3 hours a day instead of 8 while the system runs itself as CEO of Antrhopic predicted in this video (there's few more insights btw) [ what blew my mind ]: the entire setup takes 15 minutes.. not weeks > one CLAUDE.md file based on karpathy's rules > claude stops breaking conventions (violations drop from 40% to 3%) > 27 specialized agents ready out of the box > a dotnet app checks gitlab every 15 minutes > claude reads issues, creates branches, pushes PRs automatically [ the part nobody's talking about ]: claude code v2.1.100 is silently inflating your tokens by 20,000 per request.. you can't even see it your instructions get diluted.. quality drops.. and you have no idea why fix: one command, 30 seconds [ and the craziest part ]: between you and full automation there are 3 commands and one markdown file most developers will never set this up.. not because it's hard, but because they think it is full breakdown with every step is here

Noisy@noisyb0y1

x.com/i/article/2043…

English

189

998.5K

Nitish Mutha ⚡️@nitmusai·5h

@AlexFinn Project-organized sessions is what makes this shift from tool to workspace. The Cowork integration is what makes it genuinely useful for non-engineers. The interface is finally catching up to the capability.

English

Alex Finn@AlexFinn·4d

The new Claude Code desktop app is sick. You NEED to be testing this Fully customizable interface, multi tasking, organizaed by project, built in routines, integrated with Cowork and chat Here's what I'd set up first: • Start a session in each of your projects so they're all listed in the side bar • Customize your right hand side bar. I like to have open tasks and the plan so I can watch the agent work • Set up a routine so the agent reviews your recent commits every night. Have it look for bugs to fix • Pin your most important sessions Been having a blast coding with it the last couple hours. Definitely feeling a lot more productive Critical you try all the new tools and updates when they come out

Claude@claudeai

We've redesigned Claude Code on desktop. You can now run multiple Claude sessions side by side from one window, with a new sidebar to manage them all.

English

123

971

121.8K

Nitish Mutha ⚡️@nitmusai·5h

@levie Chat to agents is the right transition framing. What I keep hearing from enterprise conversations is that the readiness gap is real, not on the tool side but on the org process side. The tool is ready, the workflows aren't.

English

Aaron Levie@levie·6d

Another week on the road meeting with a couple dozen IT and AI leaders from large enterprises across banking, media, retail, healthcare, consulting, tech, and sports, to discuss agents in the enterprise. Some quick takeaways: * Clear that we’re moving from chat era of AI to agents that use tools, process data, and start to execute real work in the enterprise. Complementing this, enterprises are often evolving from “let a thousand flowers bloom” approach to adoption to targeted automation efforts applied to specific areas of work and workflow. * Change management still will remain one of the biggest topics for enterprises. Most workflows aren’t setup to just drop agents directly in, and enterprises will need a ton of help to drive these efforts (both internally and from partners). One company has a head of AI in every business unit that roles up to a central team, just to keep all the functions coordinated. * Tokenmaxxing! Most companies operate with very strict OpEx budgets get locked in for the year ahead, so they’re going through very real trade-off discussions right now on how to budget for tokens. One company recently had an idea for a “shark tank” style way of pitching for compute budget. Others are trying to figure out how to ration compute to the best use-cases internally through some hierarchy of needs (my words not theirs). * Fixing fragmented and legacy systems remain a huge priority right now. Most enterprises are dealing with decades of either on-prem systems or systems they moved to the cloud but that still haven’t been modernized in any meaningful way. This means agents can’t easily tap into these data sources in a unified way yet, so companies are focused on how they modernize these. * Most companies are *not* talking about replacing jobs due to agents. The major use-cases for agents are things that the company wasn’t able to do before or couldn’t prioritize. Software upgrades, automating back office processes that were constraining other workflows, processing large amounts of documents to get new business or client insights, and so on. More emphasis on ways to make money vs. cut costs. * Headless software dominated my conversations. Enterprises need to be able to ensure all of their software works across any set of agents they choose. They will kick out vendors that don’t make this technically or economically easy. * Clear sense that it can be hard to standardize on anything right now given how fast things are moving. Blessing and a curse of the innovation curve right now - no one wants to get stuck in a paradigm that locks them into the wrong architecture. One other result of this is that companies realize they’re in a multi-agent world, which means that interoperability becomes paramount across systems. * Unanimous sense that everyone is working more than ever before. AI is not causing anyone to do less work right now, and similar to Silicon Valley people feel their teams are the busiest they’ve ever been. One final meta observation not called out explicitly. It seems that despite Silicon Valley’s sense that AI has made hard things easy, the most powerful ways to use agents is more “technical” than prior eras of software. Skills, MCP, CLIs, etc. may be simple concepts for tech, but in the real world these are all esoteric concepts that will require technical people to help bring to life in the enterprise. This both means diffusion will take real work and time, but also everyone’s estimation of engineering jobs is totally off. Engineers may not be “writing” software, but they will certainly be the ones to setup and operate the systems that actually automate most work in the enterprise.

English

253

643

5.3K

1.7M

Nitish Mutha ⚡️@nitmusai·5h

@eglyman The insight here is exactly right: the bottleneck isn't model capability, it's setup friction and context distribution across a team. Day-one configured workspaces with pre-built integrations is how you get real org-wide adoption, not just power user pockets.

English

Eric Glyman@eglyman·6d

99% of Ramp uses ai daily. but we noticed most people were stuck — not because the models weren't good enough, but because the setup was too painful and unintuitive for most. terminal configs, mcp servers, everyone figuring it out alone. so we built Glass. every employee gets a fully configured ai workspace on day one — integrations connected via sso, a marketplace of 350+ reusable skills built by colleagues, persistent memory, scheduled automations. when one person on a team figures out a better workflow, everyone on that team gets it and gets more productive. the companies that make every employee effective with ai will compound advantages their competitors can't match. most are waiting for vendors to solve this. we decided to own it.

Seb Goddijn@sebgoddijn

x.com/i/article/2042…

English

127

168

3.6K

1.3M

Nitish Mutha ⚡️@nitmusai·5h

@assaf_elovic Agent SEO is the new SEO. Not GEO, not traditional SEO optimization for humans but making your product discoverable and usable by AI agents directly. The companies that get this now are building a moat that will compound hard over the next 2 years.

English

Assaf Elovic@assaf_elovic·23h

This might be the most important score your website will ever get. Agents are about to become the #1 source of traffic on the internet. Sites that aren't ready will disappear. orank.ai spawns real agents on your site and tells you exactly where you stand. Scan your site in 1 minute → orank.ai

English

118

30.7K

Nitish Mutha ⚡️@nitmusai·5h

@geminicli Separate context windows per subagent is the right call. Context pollution is one of the biggest silent killers of complex agent tasks. Clean, focused subagents with curated tools beat one bloated orchestrator every time.

English

Gemini CLI@geminicli·3d

Long time in the making... Subagents! 🧠✨ Each subagent comes with a separate context window, custom system instructions, and curated set of tools. • Create specialized expert agents 🤖 • Keep the main agent focused and context clean ✨ • Delegate work to parallel agents at the same time👥 Read the blog below for details 👇

Jack Wotherspoon@JackWoth98

Subagents have arrived in Gemini CLI! 🤖🚀 Create your own custom subagents in @geminicli! Subagents are specialized, expert agents that the main agent can delegate work to. 📦- Subagents have their own set of tools, MCP servers, system instructions, and context window. 🏷️- Use @agent to explicitly delegate to a subagent 🧹- Keeps the main context window clean ⚡️- Speed up work by running agents in parallel Read more in the launch blog below 👇

English

236

152.8K

Nitish Mutha ⚡️@nitmusai·5h

@SlackHQ 10 minutes to live agent in Slack is genuinely impressive. The bottleneck was always the scaffolding, not the model. Removing that friction opens this up to a much wider set of builders.

English

Slack@SlackHQ·2d

Deploying agents shouldn't take days. In Slack, it doesn't. ⚡ Slack Agent Kit is here, with enhanced Bolt frameworks and new CLI commands. Run "slack create agent." Pick a Python template. 10 minutes later, your agent is live in Slack with streaming text and "thinking" statuses built in.

English

25.4K

Slack@SlackHQ·2d

AI agents are everywhere. But if they live in isolated browser tabs, disconnected from where your team actually works, they're just expensive toys. Slack is changing that. Here's what we're shipping for developers 👇

English

839

219.3K

Nitish Mutha ⚡️@nitmusai·5h

@omarsar0 Multi-principal agents are the unsolved real-world problem. Single-user is easy mode. The moment an agent serves a team, you get permission conflicts, context asymmetry and trust hierarchy issues all at once. This research direction matters a lot.

English

elvis@omarsar0·4d

// Multi-User LLM Agents // Every agent framework assumes one user giving instructions. But deploy an agent into a team workflow, and suddenly it has multiple bosses with conflicting goals, private information, and different authority levels. This work formalizes multi-user interaction as a multi-principal decision problem and introduces Muses-Bench with three scenarios: instruction following under authority conflicts, cross-user access control, and multi-user meeting coordination. Even the best model, Gemini-3-Pro, only averages 85.6% across tasks. On meeting coordination, no model exceeds 64.8% success rate. Privacy-utility tradeoffs are especially brutal: models that score near-perfect on privacy (Grok-3-Mini at 99.6%) tank on utility (60.1%). Why does it matter? As agents move into organizational tools, Slack bots, and shared workspaces, multi-principal conflicts become the default, not the exception. Current models aren't ready. They leak more privacy over multi-turn interactions and can't maintain stable prioritization under conflicting objectives. Paper: arxiv.org/abs/2604.08567 Learn to build effective AI agents in our academy: academy.dair.ai

English

199

18.9K

Nitish Mutha ⚡️@nitmusai·5h

@AndrewYNg The PM bottleneck framing is right and it changes everything. Engineering velocity is no longer the constraint. The question is whether organizations can retrain themselves to decide faster, ship hypotheses and iterate. That discipline is harder than it looks.

English

Andrew Ng@AndrewYNg·5d

As AI agents accelerate coding, what is the future of software engineering? Some trends are clear, such as the Product Management Bottleneck, referring to the idea that we are more constrained by deciding what to build rather than the actual building. But many implications, like AI’s impact on the job market, how software teams will be organized, and more, are still being sorted out. The theme of our AI Developer Conference on April 28-29 in San Francisco is The Future of Software Engineering. I look forward to speaking about this topic there, hearing from other speakers on this theme, and chatting with attendees about it. We’re shaping the future, and I hope you will join me there! It is currently trendy in some technology and policy circles to forecast massive job losses due to AI. Even if they have not yet materialized, these losses certainly must be just over the horizon! I have a contrarian view that the AI jobpocalypse — the notion that AI will lead to massive unemployment, perhaps even rioting in the streets — won’t be nearly as bad as dire forecasts by pundits, especially pundits who are trying to paint a picture of how powerful their AI technology is. Among professions, AI is accelerating software engineering most, given the rise of coding agents. According to a new report by Citadel Research, software engineering job postings are rising rapidly. So if software engineering is a harbinger of the impact AI will have on other professions, this expansion of software engineering jobs is encouraging. Yes, fresh college graduates are having a hard time finding jobs. And yes, there have been layoffs that CEOs have attributed to AI, even if a large fraction of this was “AI washing,” where businesses choose to attribute layoffs to AI, even though AI has not changed their internal operations much yet. And yes, there is a subset of job roles, such as call center operator, that are more heavily impacted. Many people are feeling significant job insecurity, and I feel for everyone struggling with employment, whether or not the cause is AI-related. And many other factors, such as over-hiring during the pandemic and high interest rates, have contributed to the slowdown in the labor market, and the notion that AI is leading to unemployment is oversimplified. In software engineering, I see a lot of exciting work ahead to adapt our workflows. It is already clear that: (i) As AI makes coding easier, a lot more people will be doing it. (ii) Writing code by hand and even reading (generated) code is not that important, because we can ask an LLM about the code and operate at a higher level than the raw syntax (although how high we can or should go is rapidly changing). (iii) There will be a lot more custom applications, because now it’s economical to write software for smaller and smaller audiences. (iv) Deciding what to build, more than the actual building, is becoming a bottleneck. (v) The cost of paying down technical debt is decreasing (since AI can refactor for you). At the same time, there are also a lot of open questions for our profession, such as: - In the future, what will be the key skills of a senior software engineer? And for junior levels, what should be the new Computer Science curriculum? - If everyone can build features, what skills, strategies, or resources create competitive advantage for individuals and for businesses? - What are the new building blocks (libraries, SDKs, etc.) of software? How do we organize coding agents to create software? - What should a software team look like? For example, how many engineers, product managers, designers, and so on. What tooling do we need to manage their workflow? - How do AI agents change the workflow of machine learning engineers and data scientists? For example, how can we use agents to accelerate exploring data, identifying hypotheses, and testing them? I’m excited to explore these and other questions about the future of software engineering at AI Dev. I expect this to be an exciting event. Please join us! [Original text: The Batch newsletter.] ai-dev.deeplearning.ai

English

119

163

852

100.6K

Nitish Mutha ⚡️@nitmusai·5h

@heynavtoor Knowledge graphs plus forgetting curves is the right combination. The original wiki pattern was about accumulation, v2 is about curation. That's the harder and more valuable problem. @GenieAI has been thinking about this exact tension in document memory.

English

Nav Toor@heynavtoor·12 Nis

Karpathy's LLM Wiki got 5,000 stars in 48 hours. Now someone extended it with the features it was missing. Memory lifecycle. Confidence scoring. Knowledge graphs. Automated hooks. Forgetting curves. It's called LLM Wiki v2. The original pattern was brilliant. AI builds a wiki instead of re-deriving knowledge from scratch every time. But it treated all knowledge as equally valid forever. In practice, that breaks. Here's what v2 adds: → Confidence scoring. Every fact carries a score. How many sources support it. How recently confirmed. Whether anything contradicts it. Knowledge that decays over time. Not everything is equally true forever. → Memory tiers. Working memory for recent observations. Episodic memory for session summaries. Semantic memory for cross-session facts. Procedural memory for workflows. Each tier more compressed and longer-lived. → Knowledge graph. Not flat pages with links. Typed entities with typed relationships. "A caused B, confirmed by 3 sources, confidence 0.9." Graph traversal catches connections keyword search misses. → Hybrid search. BM25 for keywords. Vector search for semantics. Graph traversal for structure. Fused with reciprocal rank fusion. Replaces the index .md file that breaks past 200 pages. → Automated hooks. On new source: auto-ingest. On session end: compress and file. On schedule: lint, consolidate, decay. The bookkeeping that kills wikis is now fully automated. → Forgetting curves. Facts that haven't been accessed or reinforced in months fade. Not deleted. Deprioritized. Architecture decisions decay slowly. Transient bugs decay fast. → Contradiction resolution. AI doesn't only flag contradictions. It resolves them based on source recency, authority, and supporting evidence. Here's the wildest part: The original LLM Wiki was a flat collection of equally-weighted pages. This turns it into a living system with memory that strengthens, weakens, consolidates, and forgets. Like a real brain. "The Memex is finally buildable. Not because we have better documents or better search, but because we have librarians that actually do the work." Built on lessons from agentmemory, a persistent memory engine for AI agents. Extends Karpathy's original. Open Source.

English

274

2.3K

178.9K

Nitish Mutha ⚡️@nitmusai·5h

@claudeai The pace of iteration from Anthropic this year has been remarkable. Opus keeps raising the ceiling on what complex reasoning looks like in practice. The teams building on top of this have a serious advantage right now.

English

Claude@claudeai·2d

Claude Opus 4.7 is available today on claude.ai, the Claude Platform, and all major cloud platforms. Read more: anthropic.com/news/claude-op…

English

117

286

3.2K

651.5K

Claude@claudeai·2d

Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.

English

4.7K

10.3K

80.9K

12.9M

Nitish Mutha ⚡️@nitmusai·5h

@a16z Coding eating knowledge work is the right frame. Every domain that relies on pattern recognition, synthesis and structured output is exposed. The solopreneur thesis follows naturally from that. One person with the right agent stack can now punch way above their weight.

English

a16z@a16z·6 Nis

“Coding will eat all knowledge work” Peter Yang joins a16z’s Anish Acharya to discuss the post-AI future of work, why AI will create more solopreneurs, why human ambition means there will always be new jobs, and more. 00:00 Intro 01:56 Using OpenClaw for voice, memory & daily life 06:14 Will agents kill apps & SaaS? 11:57 Coding agents: Claude Code vs. Codex 17:00 Future of work: small teams, agents & company culture 24:00 How agents change consumer products & the economy @petergyang @illscience

English

338

95.1K

Nitish Mutha ⚡️@nitmusai·5h

@milesdeutscher @aiedge_ The context file trick is underused. System context is the difference between a smart reply and a system that actually knows what you're building. Taste is knowing what to put in it.

English

Miles Deutscher@milesdeutscher·2d

My most powerful vibe coding prompt ever. Plug this straight into the new Opus 4.7, and you'll literally be able to ship anything. Fully functional apps, landing pages, custom artifacts - you name it. Pro tip: plug this into a "vibe coding" expert project as a context file.

English

860

68.2K

Nitish Mutha ⚡️@nitmusai·5h

@akshay_pachaar The inversion framing is exactly right. The model is the executor, not the brain. The harness is where your competitive differentiation lives. That's why two companies with the same underlying model can produce wildly different outcomes.

English

Akshay 🚀@akshay_pachaar·23h

A harnessed LLM agent. Most people picture this as a model with tools bolted on. The real architecture inverts that relationship. The model itself is deliberately thin. Intelligence gets pushed outward, and the harness composes it at runtime. Three dimensions orbit the harness core: 𝗠𝗲𝗺𝗼𝗿𝘆 holds state the model shouldn't carry in weights or context. Working context, semantic knowledge, episodic experience, and personalized memory each have their own lifecycle. 𝗦𝗸𝗶𝗹𝗹𝘀 hold procedural knowledge. Operational procedures, decision heuristics, and normative constraints specialize the general model per task. 𝗣𝗿𝗼𝘁𝗼𝗰𝗼𝗹𝘀 hold the interaction contracts. Agent-to-user, agent-to-agent, and agent-to-tools are three distinct surfaces with their own failure modes. Between the core and these modules sit the mediators: sandboxing, observability, compression, evaluation, approval loops, and sub-agent orchestration. They govern how the harness reaches out and how state flows back in. The useful question this framing unlocks: for any new capability, where should it live? Stable knowledge goes to memory, learned playbooks go to skills, communication contracts go to protocols, loop governance goes to the mediators. Harness design becomes a question of what to externalize, and how to mediate it. I'm building a minimal agent harness from scratch. Didactic, easy to read, no magic. Open-sourcing it soon. Stay tuned.

GIF

Akshay 🚀@akshay_pachaar

x.com/i/article/2040…

English

178

1.1K

157.8K

Nitish Mutha ⚡️@nitmusai·5h

@Av1dlive That guide is underrated. The core insight that most miss: effective agents aren't about smarter models, they're about cleaner tool design and tighter feedback loops. The engineering discipline is the same as good software, just applied differently.

English

Avid@Av1dlive·2d

In 14 minutes, this Anthropic engineer who wrote "Building Effective Agents" will teach you more about building them right than most developers figure out on their own in months. Bookmark this for the weekend. Then read the builder's guide below.

Avid@Av1dlive

x.com/i/article/2044…

English

1.4K

10.7K

1.6M

Nitish Mutha ⚡️@nitmusai·5h

@RoundtableSpace The pace is wild. We've seen more frontier model releases in 4 months than we did in all of 2023. What's interesting is how quickly the capability gap between labs is narrowing. The real differentiation is moving to context, tools and deployment.

English

0xMarioNawfal@RoundtableSpace·2d

HERE’S A LIST OF ALL THE AI MODELS DROPPED SO FAR THIS YEAR: - Claude Opus 4.6 - Claude Sonnet 4.6 - Claude Opus 4.7 - Claude Mythos Preview - Gemini 3.1 Pro - Gemini 3 Flash - Gemma 4 26B-A4B - Gemma 4 variants - GPT-5.3 Codex - GPT-5.3 Codex Spark - GPT-5.4 - GPT-5.4 mini - GPT-5.4 nano - Grok 4.20 - Grok 4 - Qwen 3.5 - Qwen3-Coder-Next - Kimi K2 - Kimi K2.5 - DeepSeek V3.2 - GLM-5 - GLM-5.1 - Muse Spark - Llama 4 Maverick - MiniMax M2.5 - MiniMax M2.7 - MAI-Transcribe-1 - MAI-Voice-1 - MAI-Image-2 It’s one quarter into the year…

English

438

75.8K

Keşfet

@aakashgupta @econcallum @alexalbert__ @Flomerboy @DeRonin_ @AlexFinn @levie @eglyman