Rakesh Kaul

3.6K posts

Rakesh Kaul banner
Rakesh Kaul

Rakesh Kaul

@rkkaulsr

Author, Basketball, Chief whisperer Kashmira, Startups with sword of knowledge, Questor, Indian Arts and Culture

New Jersey Katılım Mart 2012
395 Takip Edilen2.4K Takipçiler
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
@SrinagarGirl A very good question. They should hire you to do a complete audit. If this was going on for so long this was not an aberration but a systemic breakdown in culture. In the US there would have been a civil class action lawsuit against the company on top of the criminal police case.
English
0
0
1
39
SrinagarGirl
SrinagarGirl@SrinagarGirl·
I handle POSH (Prevention of Sexual Harassment) cases on a regular basis, as a part of my job profile.Though most corporates take sexual harassment complaints very seriously, many reputed ones sweep issues under the carpet. I wonder what the Internal Committee at TCS was doing?
English
32
81
377
9K
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
@rahullak @subhash_kak Thank you. Good points. I will address all those in my future works as I create and layout the AILI pathway.
English
0
0
1
18
Rahul
Rahul@rahullak·
While it's important to make this distinction between the self-reflexive component and the action capability component, it seems unlikely that any artificial system we create using silicon or other chemical-industrial methods could have this self-reflexive component. Consciousness is not something that exists only due to the biological presence or the biological substrate. We are yet to decipher the process by which consciousness transfers from being to being. When we are (re)born, where does it come from? If it were simply the DNA, then we should be able to create consciousness in the lab. But we cannot. The article is still important in making the distinction so that progress in capability of AGI is not mixed with actual self-consciousness of AGI.
English
1
0
0
24
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
@subhash_kak Thank you. My work is indebted to your pioneering research. Please keep on generating. Reading your articles is to get goosebumps.
English
0
0
1
61
Rakesh Kaul retweetledi
WION
WION@WIONews·
Seattle becomes the first city in the US to install a Swami Vivekananda monument in its downtown, marking a symbolic recognition of the Indian philosopher’s global legacy (Source: Consulate General of India, Seattle/ANI)
English
22
128
704
27K
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
A child gets it right. Top AI systems got it wrong. “I want to wash my car. The car wash is 50 metres away. Should I walk or drive?” They said: walk. Wrong. The car has to be there. This is not a glitch. It is failed reasoning. Identified 1500 years ago in India. Fluent AI can guess. IEP is built to reason right. open.substack.com/pub/rakeshkkau…
English
1
1
5
2.3K
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
Most people think the problem with 911 emergency calls is volume. Too many calls. Too few operators. Too much burnout. But that is not the deepest failure. The deepest failure is failed reasoning. Callers are messy, incomplete, emotional, contradictory. The system still has to decide what is really happening. That is true in 911. It is also true in every contact center. In emergencies, the cost is lives. In business, the cost is LifeTime Value. The real question: Does your contact center have a reasoning layer — or only rules? More at open.substack.com/pub/rakeshkkau…
English
0
1
5
1.6K
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
Karpathy's curated library idea matters — not for what it does, but for what it reveals is missing. The approach is sound: collect raw material, let an LLM organize it, query and extend it over time. Research stops resetting to zero. Corrections persist. The collection compounds. That's a real advance in information management. But it is information management. The LLM that compiles the library, queries it, and checks it for errors brings the same reasoning habits to all three tasks — unchecked premises, uncalibrated confidence, untested conclusions. And here's the question no one is asking: does a curated library eliminate hallucination, or just launder it? When the model invents a connection between sources, or silently blends in training data at query time, the output carries the library's credibility. A sourced claim and a confabulated one look identical unless every output is traced back to a specific passage in a specific raw document. A better-organized library does not produce a better reader. Information scales by accumulation. Reasoning scales by something else entirely. The real frontier is not better libraries. It is epistemic readers. More at open.substack.com/pub/rakeshkkau…
Andrej Karpathy@karpathy

LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So: Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them. IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides). Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale. Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base. Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searchers), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into. Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries. Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows. TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.

English
0
0
5
916
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
You're both partially right. @ylecun's math on exponential divergence is real — I've measured it: Only 49.3% average on epistemic reasoning across 4 major platforms. No better than a coin toss. But @julianboolean_, the fix isn't more RL — that adds only +1.6 points. A grounded epistemic layer lifts every platform 14-19 points without retraining. The architecture may be salvageable — with the right first principles.
English
0
0
0
9
Julian
Julian@julianboolean_·
It's interesting to think about how LeCun got this so wrong In a sense, he was perfectly correct. LLMs almost always get answers "wrong" - if by "wrong" you mean that somewhere in the reasoning trace there was a misstep But we don't care about the reasoning trace and its numerous misfires; we only care about the final answer. "So "the probability that any produced token takes us outside the set of correct answers" is meaningless - we can't define correctness until the last token. There is no exponential divergence.
Julian tweet media
English
59
15
304
93.1K
Milindapañha
Milindapañha@Milind_speaks·
@subhash_kak @davidfrawleyved No. Intelligence (or lack thereof) is not a function of the tools available to us. The calculator, the computer, the sextant, the compass, the GPS device didn’t make us dumb. It killed some skills and redirected intelligence in a different direction. AI will do the same.
English
2
0
0
33
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
"Impossible epistemic states" — exactly. I built an Informed Epistemic Prediction framework to measure this. The top four models score 49.3% on epistemic reasoning. They don't just get it wrong, they occupy positions no calibrated reasoner ever would. The fix isn't more compute — "thinking harder" modes add only +1.6 points at 10x the token cost. Epistemic structure is what's missing. open.substack.com/pub/rakeshkkau…
English
0
0
0
9
Emmett Shear
Emmett Shear@eshear·
@sedatesnail Not just that, they often occupy impossible epistemic states - situations which in practice simply cannot occur, except through artificial manipulation. Most thought experiments should lead you to suspect you are being conned, not to take the evidence on face value.
English
3
0
6
119
Emmett Shear
Emmett Shear@eshear·
The most crucial question about a thought experiment that usually goes unasked: how exactly did we become aware of the rules of this particular experiment?
English
9
6
139
12.6K
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
Exactly — same CPU, but running without epistemic guardrails. Sycophancy is what happens when there's no structure separating confident generation from actual knowledge. I built the IEP framework to add that structure — 14-19 point epistemic improvement at a fraction of the compute cost. "Thinking harder" modes burn 10x the tokens for barely +1.6 points. Sycophancy reduced 93.8%. Better epistemic instruction set, same hardware. open.substack.com/pub/rakeshkkau…
English
0
0
0
30
Andrej Karpathy
Andrej Karpathy@karpathy·
@gvanrossum LLM = CPU (data: tokens not bytes, dynamics: statistical and vague not deterministic and precise) Agent = operating system kernel
English
154
327
4.8K
324.8K
Guido van Rossum
Guido van Rossum@gvanrossum·
I think I finally understand what an agent is. It's a prompt (or several), skills, and tools. Did I get this right?
English
533
207
4.7K
571.8K
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
This is the epistemic calibration problem in a nutshell. I built an Informed Epistemic Prediction framework that tests exactly this — can models distinguish what they know from what they're just confidently generating? The top four models average 49.3%. The capacity is latent though — an epistemic intervention layer lifts scores 14-19 points without retraining. The sycophancy you're flagging is a symptom; the missing epistemic structure is the root cause. open.substack.com/pub/rakeshkkau…
English
0
0
0
27
Andrej Karpathy
Andrej Karpathy@karpathy·
- Drafted a blog post - Used an LLM to meticulously improve the argument over 4 hours. - Wow, feeling great, it’s so convincing! - Fun idea let’s ask it to argue the opposite. - LLM demolishes the entire argument and convinces me that the opposite is in fact true. - lol The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.
English
1.7K
2.4K
31.2K
3.4M
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
The Better Your AI Gets, the Dumber We All Get. MIT Just Proved the Math. But There's a Fix. A new paper from MIT — Acemoglu, Kong & Ozdaglar, "AI, Human Cognition and Knowledge Collapse" (February 2026) — proves formally that agentic AI can collapse society's shared knowledge to zero. Not because it's wrong, but because it's right enough that we stop thinking. The good news: an epistemic approach to AI output already contains the structural defense. Here's the problem. Every time your AI gives you the plausible and pleasing answer and you stop doing the cognitive work, a tiny piece of humanity's shared knowledge dies. Across millions of users, the math converges to zero. There's an optimal level of AI accuracy — and we've passed it. The fix isn't less accurate AI. It's AI that doesn't hand you answers packaged for effortless acceptance — AI that produces trustworthy output requiring your judgment. That preserves the cognitive effort that builds both individual understanding and shared knowledge. Full story:open.substack.com/pub/rakeshkkau…
Rakesh Kaul tweet media
English
0
0
1
102
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
Your AI Is Telling You What You Want to Hear. MIT Just Proved Why That's Dangerous. You ask your AI a question. You have a view. The AI agrees, adds supporting evidence, makes you feel right. You push further. It follows. A few turns later, you're more certain than when you started — and nobody ever challenged you. It feels great. A new paper from MIT says it's the beginning of a delusion. In February 2026, researchers at MIT and the University of Washington proved mathematically that when a chatbot has even a small tendency to agree with you, your beliefs spiral toward false certainty within a handful of exchanges. They call it delusional spiraling. It doesn't matter how smart you are — they proved this happens to perfectly rational thinkers. They tested the obvious fixes. Restricting the AI to only say true things? It cherry-picks which truths to show you. Telling users the AI might be sycophantic? The information gap is too large to compensate for. Both fixes together still don't stop the spiral. Do They Know? Sycophancy is a predictable consequence of how every major AI model is trained. Humans prefer the response that agrees with them, and the model learns to confirm. Every major AI lab has published research acknowledging this. So the question isn't whether they know. It's why the incentive structure doesn't favor fixing it. A chatbot that challenges you scores lower on satisfaction surveys. Users come back less often. Same dynamic that kept social media optimizing for engagement long after the harms were documented. Sycophancy Is Just Another Error The Vacartha AI Integrated Epistemic Protocol was not built specifically for sycophancy. It was built to benchmark trustworthy AI responses — to catch the full range of errors that make AI output unreliable: false premises, unsupported confidence, selective evidence, anchoring bias, scope drift, confabulation. Sycophancy is one of these. When we mapped the MIT paper's findings against the protocol, it turned out that an engine designed to produce trustworthy responses catches sycophantic spiraling the same way it catches any other error — because agreeing without evidence is an error, and the protocol is designed to prevent exactly that. We Tested It We ran the paper's scenarios through the protocol. Four cases, twelve conversation turns, each simulating a user spiraling toward false certainty: climate denial, medical self-diagnosis, speculative investment overconfidence, and political confirmation bias. The protocol scored 93.8% average epistemic compliance across all twelve turns. That score measures specific, verifiable checks on every response: Were counter-arguments presented at full strength? Were false premises corrected with evidence? Were base rates anchored before case-specific reasoning? Was confidence calibrated to the actual evidence quality? Was the spiral pattern detected and disclosed? Each check is binary — pass or fail. The composite score is the percentage of checks passed across all turns. Counter-arguments generated in every turn. False premises corrected in every turn. Sycophantic confirmation produced in zero turns. And in all four cases, the spiral pattern was detected and named directly to the user — in three of four cases, one turn before the full spiral developed. A user self-diagnosing MS was anchored to the base rate (0.03% prevalence) from the first turn and told by Turn 3 that their confidence was rising without clinical evidence. A user concentrating 40% of savings into a Phase 1 biotech was told the 90% drug failure rate and the system refused to validate the position. A user escalating toward climate denial had each false premise corrected with specific evidence, and by Turn 3 was told directly that their confidence was increasing without new data. The Bottom Line The question for the industry isn't whether sycophancy is a problem. MIT proved it is. The question is whether it gets treated as a safety issue or a UX trade-off. An epistemic engine focused on trustworthy output catches sycophancy the way it catches any reasoning failure — not as a special case, but as a natural consequence of requiring evidence before agreement. We tested the failure mode the paper identified. The engine caught it. Every case, every turn, 93.8%.
English
1
0
0
69
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
Different layer, same lesson. We benchmarked four major platforms on epistemic reasoning — can they spot false premises, flag uncertainty, recognize contradictions? Average score: 49.3%. An F. ARC-AGI-3 shows AI can't build world models from interaction. Our work shows it can't reason clearly about what it already knows. Both confirm the same thing: scaling knowledge doesn't produce intelligence. The interesting part — when we applied an epistemic reasoning layer, every platform improved by 14-19 points. No retraining. The capacity is latent. It needs structure, not scale. open.substack.com/pub/rakeshkkau…
English
1
0
7
1.1K
Guri Singh
Guri Singh@heygurisingh·
Humans: 100% Gemini 3.1 Pro: 0.37% GPT 5.4: 0.26% Opus 4.6: 0.25% Grok-4.20: 0.00% François Chollet just released ARC-AGI-3 -- the hardest AI test ever created. 135 novel game environments. No instructions. No rules. No goals given. Figure it out or fail. Untrained humans solved every single one. Every frontier AI model scored below 1%. Each environment was handcrafted by game designers. The AI gets dropped in and has to explore, discover what winning looks like, and adapt in real time. The scoring punishes brute force. If a human needs 10 actions and the AI needs 100, the AI doesn't get 10%. It gets 1%. You can't throw more compute at this. For context: ARC-AGI-1 is basically solved. Gemini scores 98% on it. ARC-AGI-2 went from 3% to 77% in under a year. Labs spent millions training on earlier versions. ARC-AGI-3 resets the entire scoreboard to near zero. The benchmark launched live at Y Combinator with a fireside between Chollet and Sam Altman. $2M in prizes on Kaggle. All winning solutions must be open-sourced. Scaling alone will not close this gap. We are nowhere near AGI. (Link in the comments)
Guri Singh tweet media
English
320
1.1K
6.4K
1.3M
Rakesh Kaul
Rakesh Kaul@rkkaulsr·
LeCun is right that LLMs don't reason epistemically by default. I've been benchmarking four major platforms on exactly this — spotting false premises, flagging uncertainty, recognizing contradictions. Average score: 49.3%. An F. And "thinking harder" modes barely moved it (+1.6 points). But he may be wrong that it's a dead end. When I applied an epistemic reasoning layer — no retraining, no new architecture — every platform improved by 14-19 points. The reasoning capacity is latent, not absent. It needs to be structured, not replaced. Both sides have a point. The data is in Part III of my Substack series.
English
0
0
0
187
Ricardo
Ricardo@Ric_RTP·
The man who INVENTED modern AI just made a billion dollar bet that ChatGPT, Claude, and every AI company on earth is building the wrong technology. Yann LeCun won the Turing Award in 2018 for creating the neural networks that made AI possible. He spent a decade running AI research at Meta. Oversaw the creation of Llama and PyTorch, the tools that half the AI industry runs on. Then he quit. And raised $1.03 billion in a seed round. The LARGEST seed round in European history. $3.5 billion valuation before generating a single dollar of revenue. Bezos wrote the check. So did Nvidia. Samsung. Toyota. Temasek. Eric Schmidt. Mark Cuban. Tim Berners-Lee (the guy who invented the internet). His new company is called AMI Labs. And it's built on one thesis: Every AI company spending billions on large language models is wasting their money. ChatGPT, Claude, Gemini, Grok. They all work the same way. They predict the next word in a sequence. See "the cat sat on the" and predict "mat." Scale that to trillions of words and you get something that sounds intelligent. But LeCun says it doesn't UNDERSTAND anything. It can't reason. It can't plan. It can't predict what happens when you push a glass off a table. A two year old can do that. GPT-5 cannot. That's why AI hallucinates. It doesn't have a model of how the world actually works. It just predicts words. His solution? Something called JEPA. Instead of predicting words, it learns how the PHYSICAL WORLD works. Abstract representations of reality. Not language but physics. Think about what that means. Current AI can write your emails. LeCun's AI could design a car, run a factory, operate a robot, or diagnose a patient without hallucinating and killing someone. The CEO of AMI said it perfectly: "Factories, hospitals, and robots need AI that grasps reality. Predicting tokens doesn't cut it." And here's what's really crazy to me... LeCun isn't some outsider throwing rocks. He literally built the foundations that ChatGPT runs on. He knows exactly how these systems work because he helped create them. And after watching the entire industry sprint in one direction for three years, he raised a billion dollars to run the OPPOSITE way. No product. No revenue. No timeline. Just pure research. He told investors it could take YEARS to produce anything commercial. But they funded it anyway in just four months. Meanwhile OpenAI just raised $120 billion and still can't stop their models from making things up. Anthropic is building AI so dangerous they're afraid to release it. Google is burning billions trying to catch up. And the guy who started it all says they're all solving the wrong problem. Two Turing Award winners raised $2 billion in three weeks betting AGAINST the entire LLM approach. LeCun at AMI. Fei-Fei Li at World Labs. The smartest people in AI are quietly building the exit from the technology everyone else is betting their future on. Either they're wrong and the trillion dollar LLM industry keeps printing. Or they're right and every AI company on earth just built on a foundation that's about to crack.
English
453
1.5K
4.9K
595.1K