Pratyush Choudhury (PC)

1K posts

@177pc

Activating AI in India | Past: @awscloud, @scaletogether | Previously backed/helped: @emergentlabs, @composio, @rocketdotnew, @thesysdev & more | Views my own

Join 7.8k+ founders & execs → · Joined June 2020
176 Following · 3.4K Followers
Pinned Tweet
Pratyush Choudhury (PC)
I like @deedydas's work, but this take misses context. Sarvam-M isn't a vanity fine-tune; it's India's first open-weights 24B Indic-centric LLM, built under brutal GPU and data scarcity. Judging it by a few hours of Hugging Face stats badly misses the point.

Most people outside India don't appreciate that compute is the invisible ceiling:
- H100 clusters are still not commercially stocked in India
- US export caps tightening next week will squeeze supply even further
- Indian teams literally queue for hours of A100/H100 time that US and Chinese labs get on tap

Data is the long-tail problem. Indic languages form <0.01% of CommonCrawl. You read that right: two orders of magnitude less than Chinese or Spanish. Any local lab must build its corpus first, then train. That's months of ETL before the first gradient step, and synthetic data generation is itself GPU-constrained.

The talent pipeline is still forming. HPC + RLHF + compiler-level optimisation is new ground in India; Sarvam's run has already up-skilled dozens of engineers who now know how to wrangle 10k GPU-hours, FP8 PTQ and GRPO reward engines. Their detailed blog post democratizes a lot of this learning. You can't AWS-credit your way to that muscle memory.

What Sarvam actually shipped:
- 3.7M high-diversity Indic prompts, deduped and quality-scored
- Two-phase non-think/think alignment that adds +2 pp on IndicGen
- GRPO RL with partial-credit rewards: LiveCodeBench jumps 0.23 → 0.44
- FP8 + look-ahead decoding: 2× tokens/s, half the $/M tokens on H100

That means a 🇮🇳-hosted midsize model now matches Gemma-3 27B and Llama-3.3 70B on Indic reasoning while costing a fraction to serve. That's engineering leverage, not hype. Model adoption is in any case a long tail: you need to ship multiple non-frontier models before you get to the one that's truly at the frontier (at least along the dimensions we care about).
Plus, there's a whole host of Indic-language use cases where this sovereign model works much better than any other open-weights model; look at the numbers (LiveCodeBench 0.23 → 0.44, 2× tokens/s). And if you ask for stats, you'll learn that their conversational AI platform reaches 50M+ people in a single week.

What's next, possibly?
- We all recognize the data problem and run nation-scale data-collection drives (something like a CommonCrawl-IN)
- Public RL-as-a-service clusters so smaller labs can replicate GRPO
- Devs who want to push Indic NLP forward can fork Sarvam-M, fine-tune on a domain corpus, benchmark on Indic-Eval, and contribute patches back. Each derivative model widens the knowledge base and closes the English-Indic gap.

In summary, celebrating Sarvam's work (I'm not an investor) isn't nationalism; it's recognizing an innovation feat under constraints. India can't out-GPU Mountain View today, but there's real technical merit on display here, regardless of the download metrics. 👏 @pratykumar, @AashaySachdeva, @HarveenChadha & other friends from @SarvamAI. Here's to more AI in 🇮🇳
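The GRPO-with-partial-credit setup described above can be sketched in a few lines. This is a toy illustration, not Sarvam's actual reward engine: the reward shape (credit for compiling, proportional credit for tests passed) and the group size are my own illustrative assumptions. The group-relative advantage, i.e. normalizing each sampled completion's reward against its own group's mean and standard deviation instead of training a value model, is the core GRPO idea.

```python
import statistics

def partial_credit_reward(tests_passed: int, tests_total: int, compiles: bool) -> float:
    """Hypothetical partial-credit reward for a code task: a small
    credit for compiling at all, the rest proportional to tests passed."""
    if not compiles:
        return 0.0
    return 0.1 + 0.9 * (tests_passed / tests_total)

def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO's group-relative advantage: normalize each completion's
    reward against the mean/std of its own sampled group, removing
    the need for a separate learned value function."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    return [(r - mean) / std for r in rewards]

# Four completions sampled for one prompt (group size 4, assumed):
rewards = [partial_credit_reward(passed, 10, compiles)
           for passed, compiles in [(10, True), (4, True), (0, True), (0, False)]]
advantages = grpo_advantages(rewards)
```

Partial credit matters here because a pass/fail reward is almost always 0 early in training, so the group advantage collapses to zero and no learning signal flows; graded rewards keep the gradient alive.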
Deedy@deedydas

India's biggest AI startup, $1B Sarvam, just launched its flagship LLM. It's a 24B Mistral Small post-trained on Indic data, with a mere 23 downloads 2 days after launch. In contrast, two Korean college students trained an open-source model that did ~200k downloads last month. Embarrassing.

20 · 74 · 584 · 98K
Devansh Shah
Devansh Shah@theboyinatux·
@aakrit @177pc Who's building Scale AI for robotics? I think that's the real arbitrage, and I can connect you with robotics labs in Berkeley
1 · 0 · 0 · 46
Aakrit Vaish
Aakrit Vaish@aakrit·
In the 3 weeks since the IndiaAI Summit, @177pc and I have met 43 founding teams. Deeply technical, most in their mid-to-late 20s, and building for India or with a strong India edge. Some common themes we saw and like:
- AI-led IT Services: What does the Palantir for India look like? Reimagining Infosys/TCS/Wipro AI-first.
- Compute for India: As enterprise demand ramps up, the country needs more sovereign inference and infra capabilities.
- AI for the "real world": Purpose-built models for material sciences, biotech, manufacturing, security & defence.
- Healthcare AI: Both India's AI doctors and AI agents that transform primary care.
- Physical AI: More than just the end robots, can India provide the data infrastructure for the world? (Scale AI for robotics)
- Voice AI: Everything from foundational research to vertical agents to full-stack solutions in what will probably be the largest Voice AI market globally.
Indian AI startups will look different and have their own lane. We're just starting to see the first signs.
19 · 26 · 266 · 16.6K
Pratyush Choudhury (PC)
🇮🇳's AI arrives on the world stage, representing a critical milestone in establishing a sovereign AI stack for India
(1) @SarvamAI just open-sourced 2 MoE reasoning models - 30B & 105B - trained from scratch entirely in India on IndiaAI Mission compute
(2) It's a robust GRPO-based RL pipeline, validating that frontier-tier reasoning, programming & agentic capabilities can be indigenized
(3) BrowserComp is a sleeper hit for the 105B - a nearly 17x improvement over DeepSeek's R1 suggests their agentic RL pipeline (tool use, search integration) is genuinely differentiated
(4) The 30B model seems to be the true disruptor - combining Grouped Query Attention (GQA) w/ an ultra-efficient Indic tokenizer (yielding up to a 10x performance delta), it fundamentally alters inference economics for edge & real-time enterprise deployments in the subcontinent
(5) The 30B's benchmark numbers would have been frontier-class ~12 months ago, & the inference optimization story (3-6x throughput on H100) makes it a plausible production model for cost-sensitive Indian enterprise deployments
(6) The 105B model demonstrates exceptional depth in tool interaction & environment reasoning. The RL pipeline's use of an asynchronous GRPO architecture (notably bypassing standard KL-divergence constraints against a reference model) explicitly rewards verifiable multi-step execution over mere conversational chattiness.
(7) The full-stack inference optimization, achieving 20-40% higher token throughput via custom-shaped MLA optimizations and vocabulary parallelism, creates stickiness at the infrastructure layer that pure model-builders lack.
(8) If Sarvam 30B becomes the default Indic voice/conversational model (which the inference economics support), it creates a meaningful wedge in the Indian BFSI conversational AI market. The 2.4B active parameter count at this quality level is a structural cost advantage vs. deploying GPT/Claude for Hindi/Tamil telephonic agents.
(9) I see the Indic tokenizer + inference optimization stack as a compounding advantage. Every other model serving Indian languages pays a "tax" in token inefficiency and latency, and this compounds across millions of API calls.
(10) There are a couple of areas where I'd like to see improvements, though:
(10.1) SWE-Bench is the elephant in the room. For a model positioned around agentic workflows, the ~20-point gap on real-world software engineering tasks is material. It signals that while the model can reason well in structured settings, it struggles w/ the messy, multi-file, context-heavy nature of real codebases
(10.2) In an era where vision-language is table stakes, both models are text-only. They acknowledge this - mentioning future models for "multimodal conversational tasks" - but it's a gap today.
(11) For Sarvam as a company, this is a credibility-establishing release. @pratykumar & co have demonstrated they can train competitive models from scratch - a very short list globally. The question is whether the model business itself captures value, or whether Sarvam's value creation is upstream (Samvaad platform, enterprise deployments) using these models as proprietary infrastructure.
(12) If I had a say, I'd suggest a couple of things:
(12.1) Offer the 30B model completely free (including localized inference hosting) to Indian telcos & financial institutions for edge deployment, explicitly in exchange for federated access to their anonymized customer interaction data. This would create an insurmountable, proprietary data moat for future RLHF.
(12.2) Aggressively commercialize the "romanized colloquial" capability into a proprietary API for WhatsApp/Telegram business layers. Indian commerce runs on WhatsApp in code-mixed "Hinglish" or "Tanglish" - dominating this exact syntactic niche captures the entire B2C transactional layer.
(12.3) Voice AI vertical integration - combining the 30B w/ their existing TTS/STT APIs into an end-to-end voice agent stack purpose-built for Indian BFSI could be a very high-ROI product move.
Regardless, this is the most credible "sovereign AI" release from India to date - long AI in India.
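The "token tax" argument in (9) is really just linear arithmetic: serving cost and latency scale with token count, so a tokenizer that segments Indic scripts efficiently compounds into real money at API-call volume. A toy sketch, where every number (tokens-per-word fertility, call volume, price) is an assumption chosen purely for illustration, not a measured Sarvam or competitor figure:

```python
def monthly_cost(calls: int, words_per_call: float,
                 tokens_per_word: float, usd_per_m_tokens: float) -> float:
    """Cost scales linearly with total tokens, so tokenizer fertility
    (tokens emitted per word) multiplies straight through the bill."""
    total_tokens = calls * words_per_call * tokens_per_word
    return total_tokens / 1e6 * usd_per_m_tokens

# Assumed numbers for illustration: 10M calls/month, ~200 words each,
# at $0.50 per million tokens.
generic_bpe = monthly_cost(10_000_000, 200, 6.0, 0.50)  # English-centric BPE on Hindi text
indic_tok   = monthly_cost(10_000_000, 200, 1.5, 0.50)  # Indic-optimized tokenizer
```

Under these assumed fertilities the Indic tokenizer cuts the bill 4x at identical pricing, and the same factor applies to time-to-first-token and context-window headroom, which is why the advantage compounds rather than being a one-off discount.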
Pratyush Kumar@pratykumar

📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - sarvam.ai/blogs/sarvam-3…

3 · 23 · 139 · 8.8K
Pratyush Choudhury (PC) retweeted
Activate
Activate@ActivateSignal·
Signal Dialogues #01 is live. We built Signal because there wasn't a clear, consistent voice telling the story of AI in India - across founders, capital, research, and policy. For our first episode, @aakrit sits down with @vkhosla and @mukundjha to go over the rise of Emergent - probably the fastest-growing software company in history to hit $100M ARR. Shot on the sidelines of the @OfficialINDIAAI summit at the iconic @iitdelhi.
Timestamps:
00:00 - Intro: Fastest Growing Software Company Ever?
07:28 - Mukund's Journey: Google → Dunzo → Emergent
22:23 - You're Limited by What You Think You Can Do
27:08 - The $3M Bet That Built the Internet
47:11 - $100B AI Company From India?
1 · 21 · 97 · 83.6K
Pratyush Choudhury (PC)
.@aakrit & I started Activate ~12 months ago w/ a conviction that India will be both a top consumer and a top builder of AI. Today, on the sidelines of the AI Summit, NVIDIA made that conviction official. NVIDIA and Activate are now exclusive partners, and this unlocks 3 things for founders/startups:
(1) Activate startups will have direct access to NVIDIA technical expertise for co-building support and CUDA platform integrations, open-source models like Nemotron, tools, libraries and SDKs.
(2) They will get early access to new NVIDIA products, features and releases as fits their use case, to accelerate product development.
(3) NVIDIA and Activate will jointly identify high-potential startups to invest in & support, & Activate will join the recently announced VC Alliance to work not only with NVIDIA Inception but also with NVentures for potential investments.
Excited to be working with Tobias, Unnikrishnan & the entire NVIDIA leadership for betting on what India's AI ecosystem can become, not just what it is today. And to every founder building at the frontier: the runway just got longer and the ceiling just got higher. We're just getting started 💪 techcrunch.com/2026/02/19/nvi…
8 · 3 · 84 · 5.4K
Pratyush Choudhury (PC)
Sarvam just built India's first full-stack sovereign AI stack that actually works for 1.4 billion people.
Hype-chasers will miss this, but Sarvam built something special yesterday at the IndiaAI Summit. Unfortunately, a lot of the discourse continues to miss the real story. Let's take stock of the state of the union first:
(1) Most people outside India (and even many inside) don't fully appreciate the invisible ceilings we operate under here. H100 and Blackwell clusters are still not commercially stocked at a meaningful scale.
(2) US export caps haven't helped, squeezing supply even further.
(3) Indian teams literally queue for hours - sometimes days - of A100/H100/Blackwell time that US and Chinese labs get on tap.
(4) The IndiaAI Mission provides shared compute, but it comes with strict allocation queues and governance.
(5) Data is the even harder long-tail nightmare. Indic languages plus heavy code-mixing across 22 scheduled tongues form a tiny fraction of global corpora. You can't simply scrape your way to high-quality pretraining data the way English-centric labs do.
(6) Any serious local team must first build its own corpus - months of curation, cleaning, deduplication, and synthetic generation - before the very first gradient step.
(7) The talent pipeline for HPC-scale MoE training, edge optimisation, and state-space architectures is still forming.
Despite all of this, the entire effort was pulled off w/ a core team of just 15 engineers & a meager ~4k GPUs - a REAL feat. They shipped India's first credible sovereign full stack in one coordinated go. Let's take a look at what Sarvam actually built:
(1) A 30B MoE model trained from scratch on 16T pure Indic tokens, 32k context length, ~1B active parameters per token - purpose-engineered for real-time voice conversations and agentic loops that feel completely native in Hinglish or any regional tongue.
(2) A 105B MoE model (128k context, ~9B active parameters) reaching GLM-4.5-Air-class performance on complex reasoning and long-form tasks - the practical walk-phase semi-frontier model that punches far above its headline size.
(3) A 3B state-space Vision model that sets new SOTA on Indic OCR, tables, charts, and even historic Devanagari manuscripts - linear scaling that lets it handle 50-page mixed-language documents where transformers would choke on memory.
(4) Sub-350MB edge models that finally make everything truly offline and population-scale: a 74M Saaras STT with automatic language ID running 8.5× real-time on Snapdragon 8 Gen 3 (TTFT under 300 ms), a 24M Bulbul TTS with natural voice cloning from just one hour of audio inside a 60MB footprint, and a 150M bidirectional translation model covering 110 language pairs across 10 Indic languages + English with zero English pivot.
Smart choices everywhere that scream first-principles engineering. They chose a proven high-sparsity MoE backbone, layered Multi-head Latent Attention (MLA) for massive KV-cache compression wins, & partnered with NVIDIA's Nemotron co-design for both training stability (MoE reinforcement learning is notoriously unstable) & 4× inference throughput on Blackwell. This is real pretraining plus RL, solved under constraints that would make most global teams blink. The 105B isn't 1T-parameter fireworks, but it is the walk-phase model that actually lands on ₹8k feature phones & smart glasses. That is exactly how you reach semi-frontier capability in 2026 w/o burning years on wheel reinvention.
Model adoption is always long-tail. You need to ship multiple non-frontier pieces until the one that truly owns the dimensions we care about arrives.
Sarvam just handed every Indian founder, builder, SME & policymaker a stack that actually works: for farmers checking fertiliser prices in their dialect, street vendors negotiating deals in Hinglish, government departments processing 22-language documents & forms w/o any cloud round-trips, and millions more everyday vernacular scenarios.
This isn't hype. This isn't nationalism. It's recognising a genuine engineering feat under constraints that most of the world never has to face - compute scarcity, data fragmentation, a talent pipeline still maturing. A cracked team of engineers gave it their all over the past several weeks to do what many doubted could be done in/from India: build usefully large, globally competitive models from scratch. India's own AI moment is arriving, & everything this amazing team has done tells us, "Yes, India can & India will" 👏
@SarvamAI, @pratykumar, @vivek_raghavan, @_mohit_singla, @anand_404, @kediaharshit9, @AashaySachdeva, @sumanthd17, @ArpitDwivedi100, @HarveenChadha, @rkal4, @sushil_khyalia, @ManavSinghal157, @sohampetkar, @selfawareatom, @AnnaUpreti, @MeghMakwan33973 & the rest of the team
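The economic punchline of the thread above - a 30B MoE with ~1B active parameters per token - is easy to make concrete with the standard back-of-envelope rule that a decoder forward pass costs roughly 2 FLOPs per *active* parameter per generated token. This is a rough sketch (the rule of thumb ignores attention's sequence-length term, and the dense-70B comparison point is my own choice for scale):

```python
def fwd_flops_per_token(active_params: float) -> float:
    """Rough rule of thumb: ~2 FLOPs per active parameter per token
    for a decoder forward pass (ignores the attention term, which
    grows with sequence length)."""
    return 2.0 * active_params

dense_70b = fwd_flops_per_token(70e9)  # a dense 70B model: all weights fire every token
moe_30b   = fwd_flops_per_token(1e9)   # 30B MoE w/ ~1B active params, per the thread
ratio = dense_70b / moe_30b            # per-token compute advantage of the sparse model
```

Under this rule of thumb the sparse 30B does roughly 70x less arithmetic per generated token than a dense 70B, which is the structural reason MoE models can hit real-time voice latency budgets on modest hardware, provided the full 30B of weights still fits in memory.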
9 · 96 · 402 · 16.5K
Pratyush Choudhury (PC)
Isn't the AC analogy directionally right but structurally different, @GavinSBaker? AC demand is continuous, passive & tied to physical climate, while inference demand is spiky, parallelizable, & subject to algorithmic efficiency gains (distillation, quantization, speculative decoding). The ceiling could be even higher than AC - or efficiency gains could compress it. The key question is whether demand elasticity outpaces efficiency gains. Historically, in computing, it always has (Jevons paradox).
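The elasticity-vs-efficiency question above can be made precise with a toy constant-elasticity model (my own simplification, not from the thread): if efficiency improves k-fold, per-query cost falls by k, demand responds as k to the power of the price elasticity, so total compute scales as k**(elasticity - 1). Jevons' paradox is exactly the elasticity > 1 regime.

```python
def total_compute(base_compute: float, efficiency_gain: float,
                  price_elasticity: float) -> float:
    """Toy constant-elasticity model: a k-fold efficiency gain cuts
    per-query cost by k while demand grows by k**elasticity, so net
    compute scales as k**(elasticity - 1). It *rises* iff elasticity > 1."""
    return base_compute * efficiency_gain ** (price_elasticity - 1)

# A 10x efficiency gain on a baseline of 100 units of compute:
inelastic = total_compute(100.0, 10.0, 0.5)  # demand barely responds: usage shrinks
elastic   = total_compute(100.0, 10.0, 1.5)  # demand responds strongly: usage grows
```

So the AC-vs-inference debate reduces to an empirical question about that single exponent, and the historical pattern in computing (the Jevons observation in the tweet) is that it has sat above 1.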
0 · 0 · 3 · 913
Zaid
Zaid@zaidmukaddam·
Can anyone recommend good India-based VCs or investors?
26 · 3 · 86 · 12.7K
Pratyush Choudhury (PC) retweeted
Aakrit Vaish
Aakrit Vaish@aakrit·
Packed house at the first AI Engineers Day in Bengaluru. Amazing set of technical builders from all over the country.
3 · 7 · 176 · 10K
Pratyush Choudhury (PC) retweeted
Aakrit Vaish
Aakrit Vaish@aakrit·
Doors open today. Welcome to our new home in Bengaluru :)
130 · 23 · 1.5K · 99.7K
Pratyush Choudhury (PC)
Spot on, @karpathy. It feels like the transition from hand-crafting furniture to directing a workshop of master apprentices. The emerging superpower isn't writing pristine code but curating a codebase: knowing when to accept, refactor, or reject an AI output w/ taste + foresight. And the best engineers in 2026 might be those who maintain the lightest touch - guiding agents toward simplicity & elegance, catching the subtle drifts toward over-engineering before they compound. Excited (and a bit nervous) to see how this changes what we value in "great" code.
0 · 0 · 2 · 288
Andrej Karpathy
Andrej Karpathy@karpathy·
A few random notes from claude coding quite a bit last few weeks.

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. i.e. I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

IDEs/agent swarms/fallibility. Both the "no need for IDE anymore" hype and the "agent swarm" hype is imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might make. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a net huge improvement and it's very difficult to imagine going back to manual coding. TLDR everyone has their developing flow; my current is a small few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do because 1) I can code up all kinds of things that just wouldn't have been worth coding before and 2) I can approach code that I couldn't work on before because of knowledge/skill issue. So certainly it's a speedup, but it's possibly a lot more an expansion.

Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do, give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

Fun. I didn't anticipate that with agents programming feels *more* fun because a lot of the fill-in-the-blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), on the side of actual, real improvements.

Questions. A few of the questions on my mind:
- What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows *a lot*.
- Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
- What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
- How much of society is bottlenecked by digital knowledge work?

TLDR Where does this leave us? LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related fields. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high energy year as the industry metabolizes the new capability.
1.6K · 5.4K · 39.4K · 7.6M
Pratyush Choudhury (PC)
How can India build defining AI companies? What kind of founders will build those & why do we love to meet them even before they're ready? What are we doing to solve this? @aakrit & I sit down w/ @waitin4agi_ to peel back the layers in the latest edition of @ActivateSignal
1 · 1 · 10 · 636
Pratyush Choudhury (PC) retweeted
Aakrit Vaish
Aakrit Vaish@aakrit·
Why Activate? What's the thesis? And how is it designed to be more than just an AI fund? @177pc & I sat down with @waitin4agi_ to unpack all of this, why we work with founders at the idea stage, how talent & research matters more than capital, and what it will take for India to build globally relevant AI companies. @ActivateSignal
2 · 2 · 33 · 3.4K
Pratyush Choudhury (PC)
India's Voice AI market is fundamentally different from the rest of the world for 2 main reasons:
(1) Voice is the primary interface for 500M+ users in India, while it's a premium/escalation channel in developed markets
(2) Given that, there's price sensitivity but high volume
Excited to deep-dive w/ @krandiash on how @cartesia views the opportunity. RSVP: luma.com/xwbhew36 (limited slots) W/ @aakrit, @ActivateSignal, @mumbai_tech_
2 · 1 · 35 · 2.4K
Pratyush Choudhury (PC)
Clearly, in practice, these AI agents struggle w/ reliability on the real, messy internet: diverse website designs, anti-bot measures, dynamic content, hallucinations (making up info), failing multi-step workflows, or violating terms of service via scraping/automation. Google launching a new protocol that requires retailers to actively implement it is effectively an admission that agents aren't good enough yet to handle shopping-like tasks autonomously on the existing web.
1 · 3 · 30 · 12K