karun

99 posts

karun

@karun_kumar_

Applied AI research

Washington, DC Katılım Ağustos 2013

432 Takip Edilen56 Takipçiler

karun retweetledi

Ming@tslaming·17 Oca

BREAKING 🚨 TESLA HAS PATENTED A "MATHEMATICAL CHEAT CODE" THAT FORCES CHEAP 8-BIT CHIPS TO RUN ELITE 32-BIT AI MODELS AND REWRITES THE RULES OF SILICON 🐳 How does a Tesla remember a stop sign it hasn’t seen for 30 seconds, or a humanoid robot maintain perfect balance while carrying a heavy, shifting box? It comes down to Rotary Positional Encoding (RoPE)—the "GPS of the mind" that allows AI to understand its place in space and time by assigning a unique rotational angle to every piece of data. Usually, this math is a hardware killer. To keep these angles from "drifting" into chaos, you need power-hungry, high-heat 32-bit processors (chips that calculate with extreme decimal-point precision). But Tesla has engineered a way to cheat the laws of physics. Freshly revealed in patent US20260017019A1, Tesla’s "MIXED-PRECISION BRIDGE" is a mathematical translator that allows inexpensive, power-sipping 8-bit hardware (which usually handles only simple, rounded numbers) to perform elite 32-bit rotations without dropping a single coordinate. This breakthrough is the secret "Silicon Bridge" that gives Optimus and FSD high-end intelligence without sacrificing a mile of range or melting their internal circuits. It effectively turns Tesla’s efficient "budget" hardware into a high-fidelity supercomputer on wheels. 📉 The problem: the high cost of precision In the world of self-driving cars and humanoid robots, we are constantly fighting a war between precision and power. Modern AI models like Transformers rely on RoPE to help the AI understand where objects are in a sequence or a 3D space. The catch is that these trigonometric functions (sines and cosines) usually require 32-bit floating-point math—imagine trying to calculate a flight path using 10 decimal places of accuracy. If you try to cram that into the standard 8-bit multipliers (INT8) used for speed (which is like rounding everything to the nearest whole number), the errors pile up fast. The car effectively goes blind to fine details. For a robot like Optimus, a tiny math error means losing its balance or miscalculating the distance to a fragile object. To bridge this gap without simply adding more expensive chips, Tesla had to fundamentally rethink how data travels through the silicon. 🛠️ Tesla's solution: the logarithmic shortcut & pre-computation Tesla’s engineers realized they didn't need to force the whole pipeline to be high-precision. Instead, they designed the Mixed-Precision Bridge. They take the crucial angles used for positioning and convert them into logarithms. Because the "dynamic range" of a logarithm is much smaller than the original number, it’s much easier to move that data through narrow 8-bit hardware without losing the "soul" of the information. It’s a bit like dehydrating food for transport; it takes up less space and is easier to handle, but you can perfectly reconstitute it later. Crucially, the patent reveals that the system doesn't calculate these logarithms on the fly every time. Instead, it retrieves pre-computed logarithmic values from a specialized "cheat sheet" (look-up storage) to save cycles. By keeping the data in this "dehydrated" log-state, Tesla ensures that the precision doesn't "leak out" during the journey from the memory chips to the actual compute cores. However, keeping data in a log-state is only half the battle; the chip eventually needs to understand the real numbers again. 🏗️ The recovery architecture: rotation matrices & Horner’s method When the 8-bit multiplier (the Multiplier-Accumulator or MAC) finishes its job, the data is still in a "dehydrated" logarithmic state. To bring it back to a real angle theta without a massive computational cost, Tesla’s high-precision ALU uses a Taylor-series expansion optimized via Horner’s Method. This is a classic computer science trick where a complex equation (like an exponent) is broken down into a simple chain of multiplications and additions. By running this in three specific stages—multiplying by constants like 1/3 and 1/2 at each step—Tesla can approximate the exact value of an angle with 32-bit accuracy while using a fraction of the clock cycles. Once the angle is recovered, the high-precision logic generates a Rotation Matrix (a grid of sine and cosine values) that locks the data points into their correct 3D coordinates. This computational efficiency is impressive, but Tesla didn't stop at just calculating faster; they also found a way to double the "highway speed" of the data itself. 🧩 The data concatenation: 8-bit inputs to 16-bit outputs One of the most clever hardware "hacks" detailed in the patent is how Tesla manages to move 16-bit precision through an 8-bit bus. They use the MAC as a high-speed interleaver—effectively a "traffic cop" that merges two lanes of data. It takes two 8-bit values (say, an X-coordinate and the first half of a logarithm) and multiplies one of them by a power of two to "left-shift" it. This effectively glues them together into a single 16-bit word in the output register, allowing the low-precision domain to act as a high-speed packer for the high-precision ALU to "unpack". This trick effectively doubles the bandwidth of the existing wiring on the chip without requiring a physical hardware redesign. With this high-speed data highway in place, the system can finally tackle one of the biggest challenges in autonomous AI: object permanence. 🧠 Long-context memory: remembering the stop sign The ultimate goal of this high-precision math is to solve the "forgetting" problem. In previous versions of FSD, a car might see a stop sign, but if a truck blocked its view for 5 seconds, it might "forget" the sign existed. Tesla uses a "long-context" window, allowing the AI to look back at data from 30 seconds ago or more. However, as the "distance" in time increases, standard positional math usually drifts. Tesla's mixed-precision pipeline fixes this by maintaining high positional resolution, ensuring the AI knows exactly where that occluded stop sign is even after a long period of movement. The RoPE rotations are so precise that the sign stays "pinned" to its 3D coordinate in the car's mental map. But remembering 30 seconds of high-fidelity video creates a massive storage bottleneck. ⚡ KV-cache optimization & paged attention: scaling memory To make these 30-second memories usable in real-time without running out of RAM, Tesla optimizes the KV-cache (Key-Value Cache)—the AI's "working memory" scratchpad. Tesla’s hardware handles this by storing the logarithm of the positions directly in the cache. This reduces the memory footprint by 50% or more, allowing Tesla to store twice as much "history" (up to 128k tokens) in the same amount of RAM. Furthermore, Tesla utilizes Paged Attention—a trick borrowed from operating systems. Instead of reserving one massive, continuous block of memory (which is inefficient), it breaks memory into small "pages". This allows the AI5 chip to dynamically allocate space only where it's needed, drastically increasing the number of objects (pedestrians, cars, signs) the car can track simultaneously without the system lagging. Yet, even with infinite storage efficiency, the AI's attention mechanism has a flaw: it tends to crash when pushed beyond its training limits. 🔒 Pipeline integrity: the "read-only" safety lock A subtle but critical detail in the patent is how Tesla protects this data. Once the transformed coordinates are generated, they are stored in a specific location that is read-accessible to downstream components but not write-accessible by them. Furthermore, the high-precision ALU itself cannot read back from this location. This one-way "airlock" prevents the system from accidentally overwriting its own past memories or creating feedback loops that could cause the AI to hallucinate. It ensures that the "truth" of the car's position flows in only one direction: forward, toward the decision-making engine. 🌀 Attention sinks: preventing memory overflow Even with a lean KV-cache, a robot operating for hours can't remember everything forever. Tesla manages this using Attention Sink tokens. Transformers tend to dump "excess" attention math onto the very first tokens of a sequence, so if Tesla simply used a "sliding window" that deleted old memories, the AI would lose these "sink" tokens and its brain would effectively crash. Tesla's hardware is designed to "pin" these attention sinks permanently in the KV-cache. By keeping these mathematical anchors stable while the rest of the memory window slides forward, Tesla prevents the robot’s neural network from destabilizing during long, multi-hour work shifts. While attention sinks stabilize the "memory", the "compute" side has its own inefficiencies—specifically, wasting power on empty space. 🌫️ Sparse tensors: cutting the compute fat Tesla’s custom silicon doesn't just cheat with precision; it cheats with volume. In the real world, most of what a car or robot sees is "empty" space (like clear sky). In AI math, these are represented as "zeros" in a Sparse Tensor (a data structure that ignores empty space). Standard chips waste power multiplying all those zeros, but Tesla’s newest architecture incorporates Native Sparse Acceleration. The hardware uses a "coordinate-based" system where it only stores the non-zero values and their specific locations. The chip can then skip the "dead space" entirely and focus only on the data that matters—the actual cars and obstacles. This hardware-level sparsity support effectively doubles the throughput of the AI5 chip while significantly lowering the energy consumed per operation. 🔊 The audio edge: Log-Sum-Exp for sirens Tesla’s "Silicon Bridge" isn't just for vision—it's also why your Tesla is becoming a world-class listener. To navigate safely, an autonomous vehicle needs to identify emergency sirens and the sound of nearby collisions using a Log-Mel Spectrogram approach (a visual "heat map" of sound frequencies). The patent details a specific Log-Sum-Exp (LSE) approximation technique to handle this. By staying in the logarithm domain, the system can handle the massive "dynamic range" of sound—from a faint hum to a piercing fire truck—using only 8-bit hardware without "clipping" the loud sounds or losing the quiet ones. This allows the car to "hear" and categorize environmental sounds with 32-bit clarity. Of course, all this high-tech hardware is only as good as the brain that runs on it, which is why Tesla's training process is just as specialized. 🎓 Quantization-aware training: pre-adapting the brain Finally, to make sure this "Mixed-Precision Bridge" works flawlessly, Tesla uses Quantization-Aware Training (QAT). Instead of training the AI in a perfect 32-bit world and then "shrinking" it later—which typically causes the AI to become "drunk" and inaccurate—Tesla trains the model from day one to expect 8-bit limitations. They simulate the rounding errors and "noise" of the hardware during the training phase, creating a neural network that is "pre-hardened". It’s like a pilot training in a flight simulator that perfectly mimics a storm; when they actually hit the real weather in the real world, the AI doesn’t "drift" or become inaccurate because it was born in that environment. This extreme optimization opens the door to running Tesla's AI on devices far smaller than a car. 🚀 The strategic roadmap: from AI5 to ubiquitous edge AI This patent is not just a "nice-to-have" optimization; it is the mathematical prerequisite for Tesla’s entire hardware roadmap. Without this "Mixed-Precision Bridge", the thermal and power equations for next-generation autonomy simply do not work. It starts by unlocking the AI5 chip, which is projected to be 40x more powerful than current hardware. Raw power is useless if memory bandwidth acts as a bottleneck. By compressing 32-bit rotational data into dense, log-space 8-bit packets, this patent effectively quadruples the effective bandwidth, allowing the chip to utilize its massive matrix-compute arrays without stalling. This efficiency is critical for the chip's "half-reticle" design, which reduces silicon size to maximize manufacturing yield while maintaining supercomputer-level throughput. This efficiency is even more critical for Tesla Optimus, where it is a matter of operational survival. The robot runs on a 2.3 kWh battery (roughly 1/30th of a Model 3 pack). Standard 32-bit GPU compute would drain this capacity in under 4 hours, consuming 500W+ just for "thinking". By offloading complex RoPE math to this hybrid logic, Tesla slashes the compute power budget to under 100W. This solves the "thermal wall", ensuring the robot can maintain balance and awareness for a full 8-hour work shift without overheating. This stability directly enables the shift to End-to-End Neural Networks. The "Rotation Matrix" correction described in the patent prevents the mathematical "drift" that usually plagues long-context tracking. This ensures that a stop sign seen 30 seconds ago remains "pinned" to its correct 3D coordinate in the World Model, rather than floating away due to rounding errors. Finally, baking this math into the silicon secures Tesla's strategic independence. It decouples the company from NVIDIA’s CUDA ecosystem and enables a Dual-Foundry Strategy with both Samsung and TSMC to mitigate supply chain risks. This creates a deliberate "oversupply" of compute, potentially turning its idle fleet and unsold chips into a distributed inference cloud that rivals AWS in efficiency. But the roadmap goes further. Because this mixed-precision architecture slashes power consumption by orders of magnitude, it creates a blueprint for "Tesla AI on everything". It opens the door to porting world-class vision models to hardware as small as a smart home hub or smartphone. This would allow tiny, cool-running chips to calculate 3D spatial positioning with zero latency—bringing supercomputer-level intelligence to the edge without ever sending private data to a massive cloud server.

English

950

1.8K

10.3K

4.8M

karun@karun_kumar_·15 Eyl

Check out our new paper: "WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers" This work introduces a novel approach to improve speech recognition models using text-only data. Read the full paper here: arxiv.org/abs/2509.10452

English

karun retweetledi

Raphael Tang@ralph_tang·15 Eyl

We introduce WhisTLE (arxiv.org/abs/2509.10452): the first deeply supervised, text-only domain adaptation method for pretrained ASR models like Whisper. Tl;dr: fast, no extra runtime cost, 50%+ relative word error rate reduction

English

214

karun retweetledi

James Lucas@JamesLucasIT·13 Mar

Thread of surreal sculpture details 🧵 1. There is no rope in this image... it's marble.

English

481

5.4K

46.2K

3.7M

karun retweetledi

EMNLP 2026@emnlpmeeting·15 Kas

Announcing the 20 **Outstanding Papers** for #EMNLP2024

English

201

109.5K

karun retweetledi

Shashi Tharoor@ShashiTharoor·18 Eki

Extremely valid questions. Well done @ErikSolheim !

Erik Solheim@ErikSolheim

The Canada 🇨🇦 - India 🇮🇳 row underlines the need for a rules based global order. This week Canada expelled the Indian ambassador to Canada and accused India of murdering sikh extremists in Canada, without providing any evidence. India reciprocated. The relationship reached rock bottom. Hopefully relations will be restored, as the two nations can achieve a lot from working together. There are deep human bounds. But the row also underlines the neeed for a true rules based international order. The West must stop acting as if there is one set of rules for the West, another for the global South. * During the farmers agitation in India, Prime Minister Trudeau intervened and supperted the farmers. Would Canada accept if Modi starts supporting popular protests against the Canadian government? * Canada doesnt seem to be overly concerned that some sikh extremists call for a seperate state carved out of India. No measure is taken, even when support for violence is on display. Would it be OK if Modi offered his support to Québec separatism? * Last year the European Parliament made a resolution on the ethnic conflict in Manipur. I doubt any member of the European parliament can pinpoint Manipur on a map. Would Europe welcome a Lok Sabha resolution on the disputes between Spain and Catalonia? * Canada accuses India for its allerged anti terrorism operation. Why is Canada pointing finger to India and not to the United States and Israel who is constantly killing people they consider terrorists inside a number of other states? Its time for a new rues based global order. The same rules must apply to all.

English

495

1.2K

8.8K

487.8K

karun retweetledi

Tesla Hype@TeslaHype·13 Eki

What did you get done this week? Elon Musk: hold my beer

English

1.2K

13.7K

80.3K

3.1M

karun retweetledi

SpaceX@SpaceX·13 Eki

Mechazilla has caught the Super Heavy booster!

English

11.4K

61.2K

248.1K

45.1M

karun@karun_kumar_·10 Eki

Bharat lost a Ratan today. Rest in peace Sir Ratan Tata. #RatanTata

English

karun@karun_kumar_·26 Ağu

RT @Rainmaker1973: Zooming into iPhone CPU die (5nm)

Polski

163

karun retweetledi

MIT CSAIL@MIT_CSAIL·4 Ağu

The energy consumption of different programming languages (v/@burkov): bit.ly/3WiXC9M One finding: Python consumes 76x more energy while being 72x slower than C.

English

354

1.4K

7.1M

karun@karun_kumar_·16 Tem

Best fit! This is going to be great!

Andrej Karpathy@karpathy

⚡️ Excited to share that I am starting an AI+Education company called Eureka Labs. The announcement: --- We are Eureka Labs and we are building a new kind of school that is AI native. How can we approach an ideal experience for learning something new? For example, in the case of physics one could imagine working through very high quality course materials together with Feynman, who is there to guide you every step of the way. Unfortunately, subject matter experts who are deeply passionate, great at teaching, infinitely patient and fluent in all of the world's languages are also very scarce and cannot personally tutor all 8 billion of us on demand. However, with recent progress in generative AI, this learning experience feels tractable. The teacher still designs the course materials, but they are supported, leveraged and scaled with an AI Teaching Assistant who is optimized to help guide the students through them. This Teacher + AI symbiosis could run an entire curriculum of courses on a common platform. If we are successful, it will be easy for anyone to learn anything, expanding education in both reach (a large number of people learning something) and extent (any one person learning a large amount of subjects, beyond what may be possible today unassisted). Our first product will be the world's obviously best AI course, LLM101n. This is an undergraduate-level class that guides the student through training their own AI, very similar to a smaller version of the AI Teaching Assistant itself. The course materials will be available online, but we also plan to run both digital and physical cohorts of people going through it together. Today, we are heads down building LLM101n, but we look forward to a future where AI is a key technology for increasing human potential. What would you like to learn? --- @EurekaLabsAI is the culmination of my passion in both AI and education over ~2 decades. My interest in education took me from YouTube tutorials on Rubik's cubes to starting CS231n at Stanford, to my more recent Zero-to-Hero AI series. While my work in AI took me from academic research at Stanford to real-world products at Tesla and AGI research at OpenAI. All of my work combining the two so far has only been part-time, as side quests to my "real job", so I am quite excited to dive in and build something great, professionally and full-time. It's still early days but I wanted to announce the company so that I can build publicly instead of keeping a secret that isn't. Outbound links with a bit more info in the reply!

English

karun retweetledi

The Tennis Letter@TheTennisLetter·7 Tem

Novak Djokovic says tennis is endangered at the club level, ‘If we don’t do something about it globally, they’re gonna convert all the tennis clubs into paddle or pickleball’ “In terms of innovation in our sport… other than Slams, we have to figure out how to attract a young audience. Tennis on one hand is in a good place, but at the same time, when we look at Formula 1 for example and what they’ve done in terms of marketing, in terms of growth of the sport, in terms of the races around the world and how popular they are.. I think we need to do a better job on our respective tours. The grand slams are always gonna do well. But I think our tours need to do better. We are lucky to be very historic and a very global sport. But I think one of the studies that was done by PTPA 3 or 4 years ago showed that tennis is the 3rd or 4th most watched sport in the world along with cricket. Number 1 is football or soccer as you call it in the states. Second is basketball. Then it’s tennis and cricket. But tennis is number 9 or 10 on the list of all sports in terms of using its popularity, commercializing or capitalizing on that. I think there’s a huge space for growth. We’re quite fractioned as a sport. There are quite a bit of things for us to collectively look at and try to improve it. We need to grow the number of players that live from this sport. Very rarely do I see in the media that you guys are writing about the fact that you have only 350 to 400 players both men, women, singles, doubles across the board that live from this sport on this planet. That’s deeply concerning for me. Yes, we talk about the grand slam winner wins this or that. The focus is always on the grand prize but what about the base level? We’re still doing a very poor job there… very poor job. Tennis is a very global sport and it’s loved by millions of children that pick up a racquet that wanna play, but we don’t make it accessible. We don’t make it so affordable. Especially in countries like mine that doesn’t have a strong federation, that has Grand Slam or history or big budgets.. so I think collectively we all have to come together or create a new foundation, a corner stone of really what tennis is about.. which is the base level. The club level. Now we have paddle that is growing and emerging. People kind of have fun with it and say ‘Yeah but tennis is tennis. Tennis is the king or queen of all racquet sports.’ That’s true. But on a club level, tennis is endangered. If we don’t do something about it, globally or collectively, paddle, pickleball in the states, they’re gonna convert all the tennis clubs into paddle and pickleball. Because it’s more economical. You have one tennis court… you can build 3 paddle courts on one tennis court. Do the simple math. It’s just much more financially viable for the owner of the club to have those courts. These are some of the things I wanted to share. In the grand scheme of things, we need to address all these challenges and issues. Because they’ve been out there for a while. I don’t think we’ve been addressing it in the proper way.” (via Wimbledon Press)

English

766

3.3K

27.1K

3.3M

karun retweetledi

Raphael Tang@ralph_tang·13 Haz

What words are worth a thousand pictures in text-to-image generation? Check out w1kp.com and our paper!

Jimmy Lin@lintool

They say a picture is worth a thousand words... but work led by @ralph_tang finds words worth a thousand pictures! arxiv.org/abs/2406.08482

English

1.4K

karun retweetledi

Google DeepMind@GoogleDeepMind·13 Haz

With @Harvard, we built a ‘virtual rodent’ powered by AI to help us better understand how the brain controls movement. 🧠 With deep RL, it learned to operate a biomechanically accurate rat model - allowing us to compare real & virtual neural activity. → dpmd.ai/3RobU7e

English

333

1.4K

228.9K

karun retweetledi

Yann LeCun@ylecun·12 Haz

LiveBench! Livebench.ai

Micah Goldblum@micahgoldblum

🚨 Announcing LiveBench, a challenging new general-purpose live LLM benchmark! 🚨 Thanks @crwhite_ml and @SpamuelDooley for leading the charge! Link: livebench.ai Existing LLM benchmarks have serious limitations: 🧵

English

364

94.8K

karun retweetledi

The Tennis Letter@TheTennisLetter·9 Haz

Carlos Alcaraz runs to embrace the Roland Garros ball kids. 🥹 One of the best moments of the tournament. Smiles on all their faces and a very happy Carlitos. A kid at heart. ❤️

English

191

3.6K

46.3K

1.9M

karun retweetledi

OpenAI@OpenAI·6 Haz

We're sharing progress toward understanding the neural activity of language models. We improved methods for training sparse autoencoders at scale, disentangling GPT-4’s internal representations into 16 million features—which often appear to correspond to understandable concepts. openai.com/index/extracti…