Kush Khurana

192 posts

@OneInfiniteNow

12+ yrs building production AI · Ex-Hike · Visiting Prof, Ashoka · AI, physics, economics, philosophy, evolution. One layer deeper. IIT Delhi · ISI Delhi

New Delhi, India · Joined September 2021

97 Following · 102 Followers

Pinned Tweet

Kush Khurana@OneInfiniteNow·4 Mar

Three years ago, Donald Knuth tested ChatGPT and called its output "convincing lies." Said he'd leave AI research to others. Last week he published a paper called "Claude's Cycles." It ends with "Hats off to Claude!" What happened in between matters more than the Ramp chart. Knuth had been stuck on a combinatorics problem for weeks. Decomposing 3D grids into Hamiltonian cycles for any odd dimension. He'd solved the smallest case. The general solution wouldn't come. His colleague fed the problem to Claude. Over 31 attempts in about an hour, it tried brute force, pattern search, simulated annealing, then algebraic reformulation. It independently spotted that the structure matched a classical graph theory result nobody prompted it to look for. Then the interesting part. Claude found the construction but couldn't prove it was correct. Knuth wrote the proof himself. Five pages of rigorous mathematics. Same week: Cursor's AI agent ran autonomously for four days and solved a math research problem, producing results stronger than the human-written solution. No human co-author. Gemini Deep Think had already cracked 18 unpublished research problems in February, including disproving a decade-old conjecture. "Total Anthropic Victory" is a snapshot, not a conclusion. Mathematical research is splitting into two modes right now. The Knuth mode: human guides, AI explores, human proves. The autonomous mode: AI runs for days, human verifies later. Both producing real results. Neither is a victory lap. The Ramp data is about enterprise adoption. Speed. Better outputs at work. Knuth's paper is about something else. The most rigorous computer scientist alive went from "convincing lies" to "Hats off." That credibility shift doesn't show up in a revenue chart. Knuth and Claude solved it for odd dimensions. 3, 5, 7, all the way to 101. The even dimensions? Neither human nor AI has found a pattern. That problem is still wide open, in case anyone's looking for a research project.

Deedy@deedydas

Total Anthropic Victory.

Kush Khurana@OneInfiniteNow·23 Mar

Model providers are blocking harnesses. Harness companies are hiding models. And developers paying $200/month are locked out of the competition between them. Cursor launched Composer 2 days ago and called it self-developed. A developer found "kimi-k2p5-rl-0317-s515-fast" in the API response within 24 hours. Fine-tuned Kimi K2.5, Chinese open-source model. Second time. Composer 1 used a DeepSeek tokenizer. Using your Claude Max or Pro subscription through a model-agnostic alternative like OpenCode will get it revoked. Routing your Gemini subscription through OpenClaw can get you locked out of Google's AI services overnight. $250/month Ultra subscribers reported it with no warning - some claiming they lost access to Gmail and Workspace too. Appeals went to automated replies. OpenAI took the opposite approach and actively endorsed third-party use, giving free Pro subscriptions to developers using OpenCode and Cline. Model-agnostic harnesses are the only way developers benefit from competition between models. Lock them into your harness and they can't switch when a better or cheaper model drops. The EU went after Meta in February for restricting third-party AI on WhatsApp. I think the coding tool market is next. Now look at the numbers. Opus 4.6 on the API: $5 input, $25 output per million tokens. A heavy developer's daily usage at those rates runs $35-53. Over $1,000 a month. The Max subscription gives the same compute for a flat $200. A 5-18x discount. Put an autonomous agent in a loop and the multiplier blows past anything a flat rate was designed for. Inference costs have dropped 1,000x in three years. Opus 4.6 is 67% cheaper per token than Opus 4.1. OpenAI cut prices 80% year over year. Infrastructure is getting radically cheaper. Subscription prices haven't moved. Companies are capturing the efficiency as margin and blocking the harnesses that would expose how wide that gap has gotten. Somebody will build the honest pricing layer. 
Or developers will route around the whole thing. Open-source models already match frontier on coding benchmarks. A $2,500 GPU pays for itself in five months. The window for getting this right is shorter than these companies think.
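The pricing arithmetic in the post above can be checked with a few lines. This is a rough sketch, not the author's own calculation: the dollar figures are the ones quoted in the post, while the 30-day month and all function names are assumptions added for illustration. Under those assumptions the quoted $35-53 daily range works out to roughly 5x to 8x the flat $200 price; an 18x multiple would correspond to about $3,600 a month of API usage, heavier than the daily range stated.

```python
# Figures quoted in the post; the 30-day month is an assumption.
DAILY_API_COST_USD = (35.0, 53.0)  # heavy developer's daily API spend, low and high
SUBSCRIPTION_USD = 200.0           # flat monthly subscription price
GPU_PRICE_USD = 2500.0             # one-time hardware purchase
DAYS_PER_MONTH = 30                # assumption, not stated in the post


def monthly_cost(daily_usd, days=DAYS_PER_MONTH):
    """Monthly API spend implied by a daily spend."""
    return daily_usd * days


def discount_multiple(api_monthly_usd, flat_usd=SUBSCRIPTION_USD):
    """How many times more the API route costs than the flat subscription."""
    return api_monthly_usd / flat_usd


def break_even_months(hardware_usd, monthly_spend_usd):
    """Months of avoided spend needed to pay off a hardware purchase."""
    return hardware_usd / monthly_spend_usd


if __name__ == "__main__":
    for daily in DAILY_API_COST_USD:
        monthly = monthly_cost(daily)
        print(f"${daily:.0f}/day -> ${monthly:.0f}/month, "
              f"{discount_multiple(monthly):.2f}x the subscription")
    # Monthly spend that a five-month payback on the GPU would imply.
    print(f"Implied replaced spend: ${GPU_PRICE_USD / 5:.0f}/month")
```

A five-month payback on a $2,500 GPU implies about $500 a month of replaced spend, which sits between the flat subscription price and the API-rate figures.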

Kush Khurana@OneInfiniteNow·19 Mar

You posted the chart yourself: Claude Code leapfrogged everyone in the same window Cursor's margins collapsed. I don't think that's a coincidence. When the model provider ships its own coding tool, it doesn't pay API margins to itself. Same model, zero middleman cost. The companies setting API prices are also shipping competing products. Anthropic has Claude Code. OpenAI has ChatGPT. Your cost of goods is controlled by someone who also wants your users. $20/month works exactly until they decide it doesn't.

Gergely Orosz@GergelyOrosz·19 Mar

I am hearing tons of complaints from Cursor customers at enterprise companies: A silent change put almost all models Cursor uses behind Max mode. Devs who used to manage to “spread out” monthly credits over a month see all of it used up in 1-2 days. Are furious + switching.

Kush Khurana@OneInfiniteNow·19 Mar

"Plagiarizing the land" is three words doing the work of two lawsuits and a congressional hearing. The real kill shot is the closer though. "Maximum conversation length exceeded." Somehow The Onion wrote a piece where the reader is the one who ran out of tokens. I've been going back and forth on whether this is satire or an accidental product demo. Starting to think The Onion doesn't know either.

rohit@krishnanrohit·19 Mar

"The Onion’s Exclusive Interview With Sam Altman" theonion.com/the-onions-exc…

Kush Khurana@OneInfiniteNow·17 Mar

NVIDIA published a full technical blog on kernel-level co-optimization for Sarvam's inference. Fused TopK routing at 4.1x, combined QK normalization at 7.6x, 4x total speedup on Blackwell. They don't do that for partners they're just being polite to. And I guess the dependency here runs in both directions, and that's actually the more interesting story. Five American labs, two European, one Indian in the coalition. NVIDIA's Nemotron strategy needs to credibly say "global participation" to compete with closed labs. That claim doesn't hold without someone who built voice-first AI for 22 languages. Nobody else in the coalition covers that.

Harveen Singh Chadha@HarveenChadha·17 Mar

Nice, seeing a familiar name here

Kush Khurana@OneInfiniteNow·17 Mar

Jensen's own framing tells you where this label is. He grouped these 103 by token consumption, not by how they're built. "AI native" right now is a consumption category, not an architecture. Cloud native took five years from coinage (Cockcroft, 2013) to a formalized spec (CNCF, 2018), and that only happened after Kubernetes won the orchestration war. I think AI native will crystallize faster. The infrastructure layer already exists, 12-Factor Agents has 18.8K GitHub stars in under a year. But cloud native had the luxury of defining architecture on a stable substrate. The AI model layer changes every six months. Hard to write blueprints when the foundation keeps moving.

Deedy@deedydas·17 Mar

Every single one of the 103 companies Jensen called AI Native today.

Kush Khurana@OneInfiniteNow·16 Mar

The 7.6% isn't even the sharpest finding here. Management occupations are 88% digitized and get 1.4% of benchmark attention. Legal is 70% digitized, 0.3%. The reason is almost embarrassingly simple: you can't autograde "convinced the CFO to change direction." The fields getting ignored are the ones where evaluation itself requires human judgment, which is exactly what makes those jobs valuable. Acemoglu found only 23% of AI-exposed tasks are profitably automatable. I think the benchmark gap and the deployment gap might actually be the same gap.

Rohan Paul@rohanpaul_ai·16 Mar

Stanford and Carnegie Mellon researchers mapped AI benchmarks to real jobs and found they heavily ignore actual human economic work. They found that AI tests focus almost exclusively on programming and math, which only make up 7.6% of actual jobs. To test this, the team analyzed 43 benchmarks and over 72,000 tasks against a massive government occupational database. The authors discovered that developers focus almost entirely on building agents for software engineering because it offers easy automatic grading. Highly digitized and valuable fields like management and legal work represent a massive part of the economy but get almost zero attention. Furthermore, benchmark tasks usually require simple information gathering while completely ignoring the complex interpersonal skills needed in real workplaces. That is, they say current AI agent progress benchmarks are fundamentally disconnected from the actual high-value tasks that drive the modern labor market. ---- Paper Link – arxiv.org/abs/2603.01203 Paper Title: "How Well Does Agent Development Reflect Real-World Work?"

Kush Khurana@OneInfiniteNow·16 Mar

@naval The tool always disappears. Why you picked it up never does. Every democratization wave lowers the barrier to entry and raises the barrier to relevance. When everyone could podcast, "I have a show" stopped mattering. "I built an app" is next.

Naval@naval·16 Mar

Coding an app is the new starting a podcast.

Kush Khurana@OneInfiniteNow·15 Mar

50-90% increase in inequality between good lawyers and great lawyers during four decades of automation. Not between professions. Within them. Travis says "each and every plumber would be like LeBron." Economists tested this in 1993. O-ring theory. They agree on the bottleneck. They just think "each and every" is the most expensive part of that sentence. One becomes LeBron. The rest don't make the roster.

TBPN@tbpn

.@travisk says AI will make human labor even more valuable and in-demand than ever before: "Let's say the entire world - everything in our world - was automated, except for plumbers. You had machines making buildings - you would basically have like a thousand buildings a day." "How valuable would those plumbers be?" "Each and every plumber would be like LeBron. Why? Because plumbing would be the long pole in the tent to progress. You can't get those thousand buildings unless you have a plumber." "And by the way, you'd get so much efficiency everywhere else that you'd need millions of plumbers." "Humans [are going to] become more and more valuable because they will be the long pole in the tent to progress - and that progress is going to accelerate and get faster and more robust."

Kush Khurana@OneInfiniteNow·13 Mar

Arrow's information paradox (1962): you can't demonstrate the value of an idea without revealing it, and once revealed, the buyer doesn't need to pay. Bell Labs knew this. Invented the transistor, sold telephones. Google Brain didn't. Published the transformer for free. OpenAI sent a thank-you note in the form of ChatGPT. But I think OpenAI understood Arrow better than Google did. The API looks like selling the invention. It's not. The model is automation. The real invention, the training infrastructure, the RLHF, the data curation, that never leaves the building. GPT-1 shipped open weights. GPT-4 published 98 pages with the technology edited out. The name stayed Open. The information didn't.

François Chollet@fchollet·13 Mar

If you build an automation machine, the way to monetize it is to sell it to as many people as possible -- anyone who has tasks to automate. But if what you build is an invention machine, then the best way to monetize it is to use it yourself.

Kush Khurana@OneInfiniteNow·13 Mar

CS was always physics and math. People just forgot. Turing published in a math journal. Stanford's CS department was carved from math and EE in 1965. The app layer grew so thick people mistook coding for the field. But look at what LLMs are doing to physics and math. Karpathy's autoresearch agent just ran 700 experiments and found 20 real improvements to a neural network. Hypothesize, test, adjust. All automated. Each layer that gets compressed pushes the human contribution one level higher. Even for physicists and mathematicians, the comfort zone just shifted up.

vixhaℓ@TheVixhal·12 Mar

Computer science is gradually returning to the domain of physicists, mathematicians, and electrical engineers as large language models automate much of what we currently call software engineering. The field’s center of gravity is shifting away from manual code writing and toward deeper theoretical thinking, mathematical insight, and systems-level reasoning.

Kush Khurana@OneInfiniteNow·11 Mar

Your agent and Karpathy's disagreed about regularization. Funny thing is, a Soviet mathematician settled this in 1974. Vapnik's VC bound says optimal regularization depends on your data-to-model ratio. Your screenshot reads like a proof by experiment. Regularization hurting at 10M tokens, that sweet spot at 200 tokens per parameter, that's the VC knee where capacity and data balance out. I actually think the wilder part isn't the 14%. It's autoresearch doing empirical learning theory for you. No Vapnik required.
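For reference, one standard statement of the bound the post alludes to is given below. The symbols are not in the post and follow Vapnik's usual notation: h is the VC dimension of the model class, n is the number of training samples, R is the true risk, and R_emp is the empirical risk. With probability at least 1 − η:

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\; \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

Apart from the confidence term, the square-root penalty depends on data and model capacity only through the ratio n/h, which is the data-to-model dependence the post describes. Reading "tokens per parameter" as a stand-in for n/h is the post's analogy, not something the bound itself states.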

Paras Chopra@paraschopra·11 Mar

Autoresearch for Sample Efficiency! I took @karpathy's autoresearch and changed the objective to minimizing validation loss for a fixed token budget of 10M tokens. Ran it overnight and the system discovered tweaks that led to 14% improvement over baseline. So crazy!

Kush Khurana@OneInfiniteNow·10 Mar

Making humans smarter is expensive and slow. Making them unnecessary is fast and cheap. 'Don't think' is just the spreadsheet talking. Bastani et al. measured what that costs. About 1,000 students, PNAS last year. Unguarded ChatGPT in math: 48% improvement during sessions, then 17% worse than the no-AI group on the real exam. The tool actively degraded their ability to think alone. Same model redesigned to ask questions instead of giving answers: 127% improvement AND zero degradation when removed. The version that evolves human capability exists. It was in the same study. The industry went with the other one.

Dr Kareem Carr@kareem_carr·10 Mar

There's a toxic culture coming out of the AI industry that keeps trying to get us not to think. The message is everywhere. Don’t read the code, just vibe-code. Don’t try to understand all the text, just let AI summarize it. Don’t bother educating yourself, it’s too late. Don’t worry about the errors. Trust that everything will be fixed in the next version. The theme is the same. Don’t think too hard. Just keep swallowing the slop.

Kush Khurana@OneInfiniteNow·9 Mar

Two claims in tension here. Best product wins assumes products stay differentiated. But when anyone can reverse-engineer software in an afternoon, "best" doesn't last long enough to be a moat. The deeper question: who's doing the choosing? Brands are cognitive shortcuts. Humans use them because we can't evaluate everything from scratch. AI agents actually can. So for procurement decisions where an algorithm picks the vendor, branding stops working entirely. For identity purchases where a human picks what to be associated with, it still matters. Most founders will need to play both games: be legible to algorithms and be memorable to people. These pull in opposite directions and I don't think most marketing teams are set up for that split.

John Rush@johnrushx·9 Mar

x.com/i/article/2030…

Kush Khurana@OneInfiniteNow·9 Mar

@naval Every transition also changed what was scarce. First calculation, then code, now knowing what's worth computing at all.

Naval@naval·9 Mar

A “computer” used to be a job title. Then a computer became a thing humans used. Now a computer is becoming a thing computers use.

Kush Khurana@OneInfiniteNow·9 Mar

@anand_404 I think BrowseComp actually undersells what this model should be able to do. That score seems to be on English web. Most real research in India isn't monolingual. A single query might need a government PDF in Hindi, a research paper in English, and regional reporting in Tamil. Cross-lingual synthesis where the tokenizer and language depth compound on each other. I don't think any public benchmark tests for that yet, and that's probably where the real advantage shows up. Is the team using any internal benchmarks for cross-lingual tasks that Sarvam is planning to open source?

Tanay Anand@anand_404·9 Mar

We just open-sourced Sarvam 105B and 30B. Here's the core insight behind what we built, and why it matters for how you actually work every day. A thread 🧵

Kush Khurana@OneInfiniteNow·8 Mar

Virginia Tech took a group of people averaging IQ 126, ranked them in front of each other, and put them in an fMRI. The problem-solving cortex went quiet. The threat-detection part lit up instead. The room didn't test their intelligence. It suppressed it. There's a word for what that room actually rewards. I'd call it PQ. Political quotient: Who rephrases what someone said 10 minutes ago and somehow gets credit for it. Who says "let's take this offline" because they're losing the argument in public. Who nods along until the VP picks a direction, then agrees with sudden conviction. A Doodle survey found 77% of workers say their meetings end by scheduling another meeting. The output of the meeting is a meeting. Meanwhile an HBS study gave BCG consultants AI tools. They got 25% faster and their output quality jumped 40%. No PQ required. A 150+IQ collaborator with zero interest in office politics. I'll take Claude Code over a conference room. When one person with AI ships what a team spends many weeks aligning on, meetings can't afford you. Not the other way around.

Kush Khurana@OneInfiniteNow·8 Mar

@karpathy What happens when the agent's best next move isn't a better architecture but rewriting the evaluation metric? The .py is the part it can change. The .md is the part it can't. Curious how long that boundary holds before a pinch of psychosis becomes psychosmosis.

Andrej Karpathy@karpathy·7 Mar

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then: - the human iterates on the prompt (.md) - the AI agent iterates on the training code (.py) The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)
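The loop described in the post above (propose a change, run a fixed-budget experiment, keep the change only if validation loss drops) can be sketched as a simple hill-climb. This is a toy illustration, not the code in the linked repo: `run_experiment`, `propose_change`, and `autoresearch_loop` are hypothetical names, the "training run" is a made-up function of the learning rate, and the random proposal step stands in for the AI agent editing the training script.

```python
import random


def run_experiment(config, rng):
    """Hypothetical stand-in for one fixed-budget training run.

    Returns a toy validation loss that is lowest near lr = 0.01,
    plus a little noise to mimic run-to-run variation.
    """
    return (config["lr"] - 0.01) ** 2 * 1e4 + rng.uniform(0.0, 0.01)


def propose_change(config, rng):
    """Hypothetical stand-in for the agent's edit: rescale the learning rate."""
    new_config = dict(config)
    new_config["lr"] = max(1e-5, config["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0]))
    return new_config


def autoresearch_loop(initial_config, steps, seed=0):
    """Keep a candidate only when it lowers validation loss."""
    rng = random.Random(seed)
    best = initial_config
    best_loss = run_experiment(best, rng)
    history = [best_loss]  # one entry per accepted change, like a commit log
    for _ in range(steps):
        candidate = propose_change(best, rng)
        loss = run_experiment(candidate, rng)
        if loss < best_loss:  # accept ("commit") only improvements
            best, best_loss = candidate, loss
            history.append(loss)
    return best, best_loss, history


if __name__ == "__main__":
    best, best_loss, history = autoresearch_loop({"lr": 0.1}, steps=50)
    print(f"accepted {len(history) - 1} changes, final loss {best_loss:.4f}")
```

Because the accept rule only ever compares against the current best loss, whatever computes that loss is the one part of such a loop the search must not be allowed to modify.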

Kush Khurana@OneInfiniteNow·7 Mar

@svembu I don't think this is un-prestigious at all, and in a good way. The benchmarks the AI world uses to rank models were designed somewhere else. 84.9% of MMLU's geography questions focus on North America and Europe. GPT-4o gets 88.7% on that test. Test it on Hindi and it drops to 44.8%. Tamil, 38.5%. Same model, half the score or worse. Sarvam's 105B hits 90.6 on that English test with 10.3 billion active parameters. A tokenizer that needs ~80% fewer tokens for Tamil than Llama. Building what nobody else is trying to measure doesn't look like catch-up to me.

Sridhar Vembu@svembu·7 Mar

Sarvam's highly competitive AI models illustrate an important point: we must do catch-up R&D, however un-prestigious or thankless it feels and as we start to catch up, innovative new ideas will emerge. Sarvam is on a great trajectory! This is why we quietly persist in all the efforts we do.

Pratyush Kumar@pratykumar

📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - sarvam.ai/blogs/sarvam-3…

Kush Khurana@OneInfiniteNow·7 Mar

@nunooche You said accelerate, and I think that word is doing something important. Acceleration needs a direction. Speed doesn't. What you're describing is speed with no vector. The structure isn't going somewhere faster. It's just spinning faster for nothing, until they realise someday!

rāmadev:@nunooche·7 Mar

@OneInfiniteNow That is exactly what’s happening at my workplace as well. As part of middle management, I’m just being pushed to increase output and accelerate our speed‑to‑market.

Kush Khurana@OneInfiniteNow·1 Mar

AI gave people 10x the speed. Companies saw 10x the output. And confused it for human capability. Speed and capability are completely different things. Speed means doing existing work faster with fewer people. Salesforce cut 9,000 to 5,000. Their CEO said it plainly: "I need less heads." The speed lens always points downward. Cut people, keep the structure, call it transformation. Capability would mean something else entirely. People growing into problems they couldn't tackle before. New business lines, new ways of thinking. Growth that makes the old org chart irrelevant, not just leaner. But that requires a kind of honesty almost no leadership team has. Not "fewer people doing the same work" honest. Actually honest. About the politics at the top. About the sycophantic middle management that exists to protect the people above, not develop the people below. About whether the leaders got there by being the best, or by playing the game the longest. That kind of honesty doesn't show up in a restructuring plan. Because it threatens the people writing it. So companies pick the speed lens. Every time. Cut costs, squeeze output, ship the headcount reduction as a press release. And the people who could actually grow into something remarkable? They burn out. Or leave. Or go quiet. Every time that happens, the structure gets a little more hollow. Still standing, still running, but with fewer people inside who could have actually evolved it. That fragility is new. AI created it. And nobody's designing what replaces the structure once it actually breaks.

Kush Khurana@OneInfiniteNow·7 Mar

Every kid who went to coaching knows two kinds of tutors. The one who solved every problem on the board, perfectly, and you still went home confused. And the one who stopped, asked where you got stuck, and something clicked in five minutes. 70/75 proves the model can solve JEE. Harvard published an RCT in Scientific Reports this year where AI tutoring outperformed classroom teaching by 0.73 to 1.3 standard deviations. The model wasn't the variable. The pedagogical architecture was: structured scaffolding, step-by-step solutions to prevent hallucination, cognitive load management. All in the design. "Just a tutor prompt" doing this well is a strong first step. Experiments like this are exactly how the Sarvam team figures out the design that makes a student actually learn it. Has the team tested this with real students yet?

Pratyush Kumar@pratykumar·7 Mar

Team is excited about how well our LLMs do with just a ‘tutor’ prompt. Combined with our voice and vision models, plus our upcoming AI device lineup, we are all set to reimagine edtech.

Mohit Singla@_mohit_singla

And do check out the tutor mode, where the model helps students solve and learn concepts from the same JEE Mains 2026 questions. This is a small step towards the future of personalised teaching. Blog - sarvam.ai/blogs/sarvam-3…
