Vinod Sharma

3.5K posts

@vinodbf16

Joined September 2009

292 Following · 70 Followers

Vinod Sharma retweeted

Deedy @deedydas · 3d

This is the single best read on World Models and one of the most important reads in AI.

$10B has flowed into "world models" in the last 18 months, from Yann LeCun to Fei-Fei Li. The promise is that, like LLMs, world models will provide the data it takes to scale robotics foundation models, and solve robotics.

...but the word has been abused to mean one of many things.

This post unpacks:
– What 5 traits make a world model?
– How do the different approaches stack up?
– What is it used for within and beyond robotics?
– Where is the opportunity?
– Citations to research, news and blog posts

Companies / products in the space include:
– BigCo products: Google Genie, Tesla Optimus, Nvidia DreamDojo, DreamZero, Microsoft Muse
– Pure world model: AMI Labs, World Labs, Runway, Rhoda, Decart, Spaitial, Odyssey, Embo, Dream Labs, OneWorld
– Robot foundation model cos: Skild, Physical Intelligence, Figure, Mind

Very likely one of the seminal technologies of the next decade.

Vinod Sharma retweeted

The Year of the Graph @TheYotg · 7 May

BrowseNet: Graph-Based Associative Memory for Contextual Information Retrieval

Standard RAG has a structural blind spot: it retrieves isolated chunks without modeling how they relate to each other. That works fine for simple questions. It falls apart the moment reasoning needs to cross multiple documents.

BrowseNet, accepted at ICLR 2026, addresses this head-on. Developed by researchers at IIT Madras and DevRev, it rethinks retrieval as a graph traversal problem.

The core idea: transform a corpus into a Graph-of-Chunks, where nodes are document passages enriched with semantic embeddings, and edges connect passages that share named entities or synonymous terms. When a multi-hop query arrives, BrowseNet decomposes it into a directed acyclic graph of single-hop subqueries, then walks the Graph-of-Chunks to surface the reasoning path the query actually needs.

This two-track approach, combining lexical graph structure with semantic similarity, outperforms both dense retrieval methods and graph-augmented RAG pipelines including HippoRAG-2 on HotpotQA, 2WikiMQA, and MuSiQue benchmarks.

What makes it practically compelling is the cost story. BrowseNet achieves this with roughly 33x lower LLM cost than the previous SOTA, and only a marginal latency trade-off of under half a second per query. The entire pipeline requires just one LLM call at retrieval time, guided by pre-generated subqueries, rather than repeated back-and-forth inference.

The graph construction uses GLiNER for named entity recognition and ColBERTv2 for synonym matching, with no generative LLM needed during offline indexing.

The code and datasets are fully open-sourced. github.com/bisect-group/B…

#KnowledgeGraph #RAG #MultiHopQA #GraphML #LLM #OpenSource

--

📩 The Year of the Graph Spring 2026 newsletter issue is out! Beyond Context Graphs: How Ontology, Semantics, and Knowledge Graphs Define Context 👇 yearofthegraph.xyz/newsletter/202…

All things #KnowledgeGraph, #GraphDB, Graph #Analytics / #DataScience / #AI and #SemTech. Subscribe and follow to be in the know. Reach out if you'd like to be featured.
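The Graph-of-Chunks idea in the BrowseNet post can be illustrated with a toy sketch. This is not BrowseNet's code: it assumes the entities for each chunk are already extracted (BrowseNet uses GLiNER and ColBERTv2 for that step), ignores the semantic-embedding track entirely, and treats the query's subquery DAG as the simplest case, a linear chain of entity sets. The function names `build_graph_of_chunks` and `walk` and the toy corpus are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph_of_chunks(chunk_entities):
    """Connect every pair of chunks that share at least one entity.

    chunk_entities maps chunk id -> set of (normalized) entity strings.
    Assumption: entities are precomputed; no NER or synonym matching here.
    """
    by_entity = defaultdict(set)
    for cid, ents in chunk_entities.items():
        for e in ents:
            by_entity[e].add(cid)
    graph = {cid: set() for cid in chunk_entities}
    for cids in by_entity.values():
        for a, b in combinations(sorted(cids), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def walk(graph, chunk_entities, subquery_entities):
    """Follow a chain of single-hop subqueries through the graph.

    subquery_entities is a list of entity sets, one per hop. Hop 1 may
    match any chunk; each later hop may only move to neighbours of the
    chunks matched at the previous hop. Returns the matched chunk-id set
    for each hop that succeeded (stops early when a hop finds nothing).
    """
    frontier = set(chunk_entities)  # hop 1: every chunk is a candidate
    path = []
    for ents in subquery_entities:
        hits = {c for c in frontier if chunk_entities[c] & ents}
        if not hits:
            break
        path.append(hits)
        # Next hop is restricted to chunks linked to this hop's matches.
        frontier = set().union(*(graph[c] for c in hits))
    return path


if __name__ == "__main__":
    # Hypothetical toy corpus: chunk id -> entities mentioned in the chunk.
    chunks = {
        "c1": {"Marie Curie", "Warsaw"},
        "c2": {"Warsaw", "Vistula"},
        "c3": {"Paris", "Sorbonne"},
    }
    graph = build_graph_of_chunks(chunks)
    # "Which river flows through Marie Curie's birthplace?" as two hops.
    print(walk(graph, chunks, [{"Marie Curie"}, {"Warsaw"}]))  # [{'c1'}, {'c2'}]
```

Restricting each hop to neighbours of the previous hop's chunks is what keeps the retrieved passages on one connected reasoning path, rather than returning the top-k chunks for each subquery independently.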

Vinod Sharma retweeted

Yuandong Tian @tydsh · 3d

Today we launch Recursive. We are building AI that discovers knowledge automatically and improves itself recursively, an open-ended process that will fundamentally change how science and technology advance. Our 25 top researchers and engineers in San Francisco and London bring diverse expertise spanning agentic AI scientists, architecture and algorithm design, world models, optimization, and interpretability, united by a shared conviction that this is the most important problem we could be working on today. If you are interested in joining, please send your resume to talent@recursive.com. Follow us at @Recursive_SI!

Recursive @Recursive_SI

x.com/i/article/2054…

Vinod Sharma retweeted

Sebastian Raschka @rasbt · 3d

A little talk on what we can learn from implementing LLM architectures from scratch in Python and PyTorch, and how I approach new open-weight models, compare them against reference implementations, etc.: youtube.com/watch?v=TXzQ7P…

Vinod Sharma retweeted

Mira Murati @miramurati · 5d

Today we're sharing our work on interaction models. A new class of model trained from scratch to handle real-time interaction natively, instead of gluing it onto a turn-based one. youtu.be/A12AVongNN4

Vinod Sharma retweeted

elvis @omarsar0 · 8 May

LLM Wikis + HTML Artifacts are insanely powerful. You should seriously consider this in your workflows.

LLM Wikis capture all the important information that lets you and your agents do meaningful work. HTML artifacts present that information in interesting ways that allow you to take important actions along with your agents.

My HTML artifacts sit on top of my LLM wikis. They are dynamic and easily extended as needs arise. I have hooked my artifacts up to talk to my agents, and similarly, the agents can talk to the artifacts.

This has allowed me to build powerful artifacts that reduce my inbox to zero, keep me updated on any topic of interest, prototype quickly, do deep research, design/trigger new experiments, generate figures to improve understanding, schedule research, search relevant information, discover topics, and so much more.

What you see in the clip is not a website. It's a simple interactive HTML artifact.

HTML artifacts are useful for designers, engineers, researchers, students, and anyone working with agents.

Lastly, HTML doesn't replace Markdown. The two work much better in combination.

Vinod Sharma retweeted

hardmaru @hardmaru · 8 May

The human brain🧠 is incredibly efficient because it only activates the specific neurons needed for a thought. Modern LLMs naturally try to do this too (> 95% of neurons in feedforward layers stay silent for any given word), but our hardware punishes them for it.

One of the most frustrating paradoxes in deep learning: making a model do less math often makes it run slower. Why? Because unstructured sparsity introduces irregular memory access, and GPUs are built for predictable, dense blocks of math.

We teamed up with @NVIDIA to try to fix this hardware mismatch. Instead of forcing the GPU to adapt to the sparsity, we built a "Hybrid" format that reshapes the sparsity to fit the GPU.

Our sparsity format (TwELL) dynamically routes the 99% of highly sparse tokens through a fast path, and uses a dense backup matrix as a safety valve for the rare, heavy tokens.

Through TwELL and a new set of custom CUDA kernels for both LLM inference and training, we translated theoretical sparsity into actual wall-clock speedups: >20% faster training and inference on H100 GPUs, while also cutting energy consumption and memory requirements.

Paper: arxiv.org/abs/2603.23198
Blog: pub.sakana.ai/sparser-faster…
Code: github.com/SakanaAI/spars… ⚡️

Sakana AI @SakanaAILabs

How do we make LLMs faster and lighter? Don't force the GPU to adapt to sparsity. Reshape the sparsity to fit the GPU! ⚡️

Excited to share our new #ICML2026 paper in collaboration with @NVIDIA: "Sparser, Faster, Lighter Transformer Language Models". This work introduces new open-source GPU kernels and data formats for faster inference and training of sparse transformer language models:

Paper: arxiv.org/abs/2603.23198
Blog: pub.sakana.ai/sparser-faster…
Code: github.com/SakanaAI/spars…

While LLMs are undoubtedly powerful, they are increasingly expensive to train and deploy, and a large part of this cost comes from their feedforward layers. Yet an interesting phenomenon occurs inside these layers: for any given token, only a small fraction of the hidden activations actually matter. The rest are approximately zero, wasting computation. With ReLU and very mild L1 regularization, this sparsity can exceed 95% with little to no impact on downstream performance.

So, can we leverage this sparsity to make LLMs faster?

The challenge is hardware. Modern GPUs are optimized for dense matrix multiplications. Traditional sparse formats introduce irregular memory access and overheads that cancel out their theoretical savings for GEMM operations.

Our contribution is twofold:
1/ We introduce TwELL (Tile-wise ELLPACK), a new sparse packing format designed to integrate directly into the same optimized tiled matmul kernels without disrupting execution.
2/ We develop custom CUDA kernels that fuse multiple sparse matmuls to maximize throughput and compress TwELL into a hybrid representation that minimizes activation sizes.

We used our kernels to train and benchmark sparse LLMs at billion-parameter scales, demonstrating >20% speedups and even higher savings in peak memory and energy.

This work will be presented at #ICML2026. Please check out our blog and technical paper for a deep dive!
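To make the "fast path plus dense safety valve" idea concrete, here is a deliberately simplified sketch in plain Python. It is not Sakana's TwELL: TwELL is tile-wise and lives inside fused CUDA matmul kernels, whereas this uses plain row-wise ELLPACK (a fixed number `k` of (value, column) slots per token) with a dense fallback for tokens that have more than `k` non-zeros. The slot budget `k`, the function names, and the toy matrices are hypothetical.

```python
def pack_hybrid_ell(h, k):
    """Pack sparse activation rows into ELLPACK slots plus a dense backup.

    h is a list of rows (one per token), each a list of floats after ReLU.
    Rows with at most k non-zeros become exactly k (value, column) slots,
    padded with (0.0, 0). Rows with more than k non-zeros ("heavy" tokens)
    are kept as full rows in a dense backup dict keyed by row index.
    """
    ell, dense = [], {}
    for i, row in enumerate(h):
        nz = [(v, j) for j, v in enumerate(row) if v != 0.0]
        if len(nz) > k:
            dense[i] = row
            ell.append([(0.0, 0)] * k)  # placeholder slots, never read
        else:
            ell.append(nz + [(0.0, 0)] * (k - len(nz)))
    return ell, dense


def hybrid_matmul(ell, dense, w):
    """Compute h @ w (w given as a list of rows) from the hybrid format."""
    m = len(w[0])
    out = []
    for i, slots in enumerate(ell):
        # Heavy tokens walk their full dense row; the rest read k slots.
        if i in dense:
            src = enumerate(dense[i])
        else:
            src = ((j, v) for v, j in slots)
        acc = [0.0] * m
        for j, v in src:
            if v != 0.0:  # padding slots and zeros contribute nothing
                for c in range(m):
                    acc[c] += v * w[j][c]
        out.append(acc)
    return out


if __name__ == "__main__":
    h = [[0.0, 1.5, 0.0, 0.0], [2.0, 0.0, 0.0, 3.0], [1.0, 1.0, 1.0, 0.0]]
    w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    ell, dense = pack_hybrid_ell(h, k=2)  # row 2 has 3 non-zeros -> dense
    print(hybrid_matmul(ell, dense, w))  # [[0.0, 1.5], [8.0, -3.0], [2.0, 2.0]]
```

The design point is that every ELL row has the same length `k`, so the fast path does a fixed, predictable amount of work per token, which is the kind of regularity GPUs reward; only the rare heavy tokens pay for a full dense row.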

Vinod Sharma retweeted

Andrej Karpathy @karpathy · 30 Apr

Fireside chat at Sequoia Ascent 2026 from a ~week ago. Some highlights:

The first theme I tried to push on is that LLMs are about a lot more than just speeding up what existed before (e.g. coding). Three examples of new horizons:

1. menugen: an app that can be fully engulfed by LLMs, with no classical code needed: input an image, output an image and an LLM can natively do the thing.
2. install .md skills instead of install .sh scripts. Why create a complex Software 1.0 bash script for e.g. installing a piece of software if you can write the installation out in words and say "just show this to your LLM". The LLM is an advanced interpreter of English and can intelligently target installation to your setup, debug everything inline, etc.
3. LLM knowledge bases as an example of something that was *impossible* with classical code because it's computation over unstructured data (knowledge) from arbitrary sources and in arbitrary formats, including simply text articles etc.

I pushed on these because in every new paradigm change, the obvious things are always in the realm of speeding up or somehow improving what existed, but here we have examples of functionality that either suddenly perhaps shouldn't even exist (1,2), or was fundamentally not possible before (3).

The second (ongoing) theme is trying to explain the pattern of jaggedness in LLMs. How it can be true that a single artifact will simultaneously 1) coherently refactor a 100,000-line code base *and* 2) tell you to walk to the car wash to wash your car. I previously wrote about the source of this as having to do with verifiability of a domain; here I expand on this as having to also do with economics, because revenue/TAM dictates what the frontier labs choose to package into training data distributions during RL. You're either in the data distribution (on the rails of the RL circuits) and flying or you're off-roading in the jungle with a machete, in relative terms. Still not 100% satisfied with this, but it's an ongoing struggle to build an accurate model of LLM capabilities if you wish to practically take advantage of their power while avoiding their pitfalls, which brings me to...

Last theme is the agent-native economy. The decomposition of products and services into sensors, actuators and logic (split up across all of 1.0/2.0/3.0 computing paradigms), how we can make information maximally legible to LLMs, some words on the quickly emerging agentic engineering and its skill set, related hiring practices, etc., possibly even hints/dreams of fully neural computing handling the vast majority of computation with some help from (classical) CPU coprocessors.

Stephanie Zhan @stephzhan

@karpathy and I are back! At @sequoia AI Ascent 2026. And a lot has changed. Last year, he coined “vibe coding”. This year, he’s never felt more behind as a programmer. The big shift: vibe coding raised the floor. Agentic engineering raises the ceiling. We talk about what it means to build seriously in the agent era. Not just moving faster. Building new things, with new tools, while preserving the parts that still require human taste, judgment, and understanding.

Vinod Sharma retweeted

Santiago @svpino · 2 May

30 agents every AI Engineer must build.

This is the most comprehensive and practical book on AI Engineering that I've ever seen. I can't think of a single use case that they didn't cover here:

1. The autonomous decision-making agent
2. The planning agent
3. The memory-augmented agent
4. The knowledge retrieval agent
5. The document intelligence agent
6. The scientific research agent
7. The tool-using agent
8. The agentic workflow system
9. The data analysis agent
10. The verification and validation agent
11. The general problem solver agent
12. The code generation agent
13. The security-hardened agent
14. The self-improving agent
15. The conversational agent
16. The content creation agent
17. The recommendation agent
18. The vision language agent
19. The audio processing agent
20. The physical world sensing agent
21. The ethical reasoning agent
22. The explainable agent
23. The healthcare intelligence agent
24. The scientific discovery agent
25. The financial advisory agent
26. The legal intelligence agent
27. The education intelligence agent
28. The collective intelligence agent
29. The embodied intelligence agent
30. The domain-transforming integration agent

I also read 50 Algorithms Every Programmer Should Know by Imran. Same vibe.

Here is the Amazon link: amzn.to/4t5ystE

Vinod Sharma retweeted

Akshay 🚀 @akshay_pachaar · 2 May

You're in a Research Scientist interview at Google.

Interviewer: We have a base LLM that's terrible at maths. How would you turn it into a maths & reasoning powerhouse?

You: I'll get some problems labeled and fine-tune the model.

Interview over.

Here's what you missed:

Vinod Sharma retweeted

fks @FredKSchott · 1 May

Introducing Flue — The First Agent Harness Framework

Flue is a TypeScript framework for building the next generation of agents, designed around a built-in agent harness.

Flue is like Claude Code, but 100% headless and programmable. There's no baked-in assumption like requiring a human operator to function. No TUI. No GUI. Just TypeScript.

But using Flue feels like using Claude Code. The agents you build act autonomously to solve problems and complete tasks. They require very little code to run. Most of the "logic" lives in Markdown: skills and context and AGENTS.md.

Flue is like Astro or Next.js for agents (not surprising, given my background 🙃). It's not another AI SDK. It's a proper runtime-agnostic framework. Write once, build, and deploy your agents anywhere (Node.js, Cloudflare, GitHub Actions, GitLab CI/CD, etc.).

We originally built Flue to power AI workflows inside of the Astro GitHub repo. But then @_bgiori got his hands on it, and we realized that every agent needs a framework like Flue, not just us.

Check it out! It's early, but I'm curious to hear what people think. Are agents ready for their library -> framework moment?

Vinod Sharma retweeted

Y Combinator @ycombinator · 1 May

A 7-million-parameter model outperforming models a thousand times its size on tasks like ARC Prize. That's what recursive reasoning unlocks.

In this episode of Decoded, YC's @agupta and @FrancoisChauba1 break down two recent papers on recursive AI models, HRMs and TRMs, that are achieving state-of-the-art results with a fraction of the parameters of today's largest models.

They explain why standard LLMs hit a fundamental ceiling on certain reasoning tasks, how recursion at inference time gives small models the compute depth to break through it, and what happens when you combine these ideas with the power of large-scale foundation models.

00:35 - Model Foundations
01:15 - RNN Limits and LLM Contrast
02:36 - Reasoning Limits and Sorting Analogy
04:22 - HRM Paper Introduction
05:25 - HRM Architecture and Intuition
07:36 - HRM Results and Outer Loop
09:46 - TRM Paper Overview
11:20 - TRM Training and Fixed Point
13:30 - Detailed HRM Summary
20:46 - Comparing HRM and TRM
34:45 - Future Outlook

Vinod Sharma retweeted

elvis @omarsar0 · 3 May

Claude Opus 4.7 just implemented an AlphaZero-style self-play pipeline from scratch. It did this on consumer hardware in three hours, then beat the Pascal Pons solver 7 of 8 as first mover on Connect Four. No other frontier coding agent tested cleared 2 of 8.

This paper proposes a new way to evaluate coding agents: hand them a minimal task description, give them a tight budget, and ask them to autonomously rebuild a famous ML breakthrough.

Connect Four + AlphaZero is the first instance. It's small enough to run on a laptop and hard enough to require a real research engineering loop (MCTS, neural value/policy nets, self-play, training schedule).

We've been measuring coding agents on patches and unit tests. This shifts the bar to "can the agent build a non-trivial ML system end-to-end on its own?" The answer is now yes for at least one frontier model.

Paper: arxiv.org/abs/2604.25067

Learn to build effective AI agents in our academy: academy.dair.ai
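The post names MCTS with neural value/policy nets as part of the research loop the agent had to build. As a reference point only (not the code the agent wrote, which the post does not show), here is the PUCT child-selection rule used in AlphaZero-style MCTS: the policy prior steers exploration early, and as visit counts grow the empirical value Q takes over. The dictionary layout and the default `c_puct=1.5` are assumptions for illustration.

```python
import math


def puct_select(priors, visit_counts, value_sums, c_puct=1.5):
    """Pick the child action to explore next with the PUCT rule.

    priors[a]       -- policy-network prior P(s, a)
    visit_counts[a] -- N(s, a): visits of child a so far (missing = 0)
    value_sums[a]   -- W(s, a): sum of values backed up through child a
    score(a) = Q(s, a) + c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))
    with Q = W / N, and Q = 0 for unvisited children.
    Note: at a fresh node sum_b N(s, b) = 0, so all exploration terms are 0;
    some implementations add 1 inside the square root to avoid this.
    """
    total = sum(visit_counts.values())
    best, best_score = None, -math.inf
    for a, p in priors.items():
        n = visit_counts.get(a, 0)
        q = value_sums.get(a, 0.0) / n if n else 0.0
        u = c_puct * p * math.sqrt(total) / (1 + n)
        if q + u > best_score:
            best, best_score = a, q + u
    return best


if __name__ == "__main__":
    # Hypothetical Connect Four node: keys are column indices.
    priors = {3: 0.5, 2: 0.2, 4: 0.3}
    visits = {3: 10, 2: 1, 4: 2}
    values = {3: 2.0, 2: 0.5, 4: 0.8}
    print(puct_select(priors, visits, values))  # 2
```

In a full self-play loop this selection runs at every node on the way down during each simulation, the reached leaf is evaluated by the value network, and that value is backed up into `value_sums` with its sign flipped at each level for the two players.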

Vinod Sharma retweeted

𝗿𝗮𝗺𝗮𝗸𝗿𝘂𝘀𝗵𝗻𝗮— 𝗲/𝗮𝗰𝗰 @techwith_ram · 2 May

Stanford's latest seminar is a deep dive into the evolution of world modeling in AI. It focuses on the shift in world models from traditional reconstruction methods toward latent-space prediction.

Covers topics like:
- Introduction to JEPA & World Models
- Causal JEPA
- LOWER Model
- Practical Applications & Planning
- Future Outlook

Vinod Sharma retweeted

Pau Labarta Bajo @paulabartabajo_ · 1 May

Advice for AI engineers 💡

Browser control is possible with a small model, like LFM2-350M by @liquidai.

Here's a 60-minute deep dive on how to fine-tune it with RL and OpenEnv by @huggingface

Enjoy ↓

Vinod Sharma retweeted

James Zou @james_y_zou · 30 Apr

Big Update🤩: #paperclip now includes full papers from all of arXiv, PubMed Central and 150 million abstracts!🖇️ You can give your LLM all that knowledge in one line—all optimally indexed for AI agents. Much more thorough and ~100x faster than web search, and free.

Vinod Sharma retweeted

mufeez @moofeez · 28 Apr

I post-trained Qwen3-Coder to fix bugs using an actual debugger.

The result:
Solve rate: 70% → 89%
Median turns to fix: 46 → 19 (-59%)

Instead of just reading code or print-debugging, it:
- reasons from execution
- inspects live variables and call stacks
- sets breakpoints, steps, and evaluates expressions

Vinod Sharma retweeted

Antonio Lupetti @antoniolupetti · 27 Apr

Embeddings power every modern LLM. But what do they actually learn? This Berkeley (BAIR) paper is one of the clearest reads on how AI systems learn and why embeddings really work. bair.berkeley.edu/blog/2025/09/0…

Vinod Sharma retweeted

Yacine Mahdid @yacinelearning · 28 Apr

if you are interested in a great lecture on self-distillation, I've finished editing a ~1h30min lecture with two stellar researchers in that space, @jonashubotter and @IdanShenfeld

lots of different articles distilled into one presentation and a whole lot of questions answered!

Jonas Hübotter @jonashubotter

Today and tomorrow we'll be presenting self-distillation with orals at ICLR in Rio 🇧🇷

1. "Self-Distillation enables Continual Learning" at lifelong agents workshop (Sun 11:30am)
2. "Reinforcement Learning via Self-Distillation" at scaling post-training workshop (Mon 2:40pm)
3. "Test-Time Self-Distillation" at test-time updates workshop (Mon 4:15pm)

Vinod Sharma retweeted

Daily Dose of Data Science @DailyDoseOfDS_ · 28 Apr

Big moment for RL!

ART (Agent Reinforcement Trainer) is an open-source framework for training agents with GRPO + RULER (an automatic reward system). No need to hand-craft reward functions.

GitHub: github.com/OpenPipe/ART

Avi Chawla @_avichawla

x.com/i/article/2048…
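The ART post names GRPO as its training algorithm. The core of GRPO (introduced in the DeepSeekMath paper) is that it needs no learned critic: several rollouts are sampled for the same prompt, and each rollout's advantage is its reward normalized against the rest of its group. A minimal sketch, assuming the rewards for one group are already available (in ART they would come from RULER; how RULER scores trajectories is not shown here) and using the population standard deviation, which is one of several conventions. ART's own implementation may differ.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one group of rollouts of the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so
    better-than-average rollouts get positive advantages and worse ones
    negative, without a value network. eps guards against a zero std
    when every rollout in the group received the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (assumed convention)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for 4 rollouts of one task.
    print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
```

These per-rollout advantages then weight the policy-gradient update for every token of the corresponding trajectory; a group where all rollouts score the same contributes no learning signal.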
