Igor Babuschkin

1.1K posts

@ibab

Prev co-founder @xAI, Research & Engineering

Palo Alto, CA · Joined February 2020
877 Following · 109.3K Followers
Pinned Tweet
Igor Babuschkin@ibab·
Today was my last day at xAI, the company that I helped start with Elon Musk in 2023. I still remember the day I first met Elon; we talked for hours about AI and what the future might hold. We both felt that a new AI company with a different kind of mission was needed.

Building AI that advances humanity has been my lifelong dream. My parents left the Russian Federation after the collapse of the USSR in search of a better life for their kids. Life wasn't always easy as immigrants. Despite the hardships, my parents believed that human values were priceless: values like courage, compassion, and curiosity for understanding the world.

As a child, I admired scientists like Richard Feynman and Max Planck, who relentlessly pushed the frontiers of physics in order to understand the universe. As a particle physics PhD student at CERN, I was excited to contribute to that mission. But the search for new physics was getting harder and harder, requiring bigger and bigger colliders, while new discoveries kept getting fewer. So I began to wonder whether superintelligence, not larger colliders, could be the key to unlocking the mysteries of the universe. Could AI develop a consistent theory of quantum gravity? Could AI prove the Riemann hypothesis?

In early 2023 I became convinced that we were getting close to a recipe for superintelligence. I saw the writing on the wall: very soon AI could reason beyond the level of humans. How could we ensure that this technology is used for good? Elon had warned of the dangers of powerful AI for years. Elon and I realized that we had a shared vision of AI used to benefit humanity, so we recruited more like-minded engineers and set off to build xAI.

The early days of xAI were not easy. Naysayers told us that we had arrived too late to the game, and that starting a top AI company from scratch would be impossible. But we believed we could do the impossible. Starting a company from zero required lots of hands-on work. In the beginning I built many of the foundational tools used at the company to launch and manage training jobs. I later oversaw much of the engineering at the company, including Infrastructure, Product, and Applied AI projects.

xAI's people are deeply dedicated. Through blood, sweat, and tears, our team's blistering velocity built the Memphis supercluster and shipped frontier models faster than any company in history. I learned two priceless lessons from Elon: #1, be fearless in rolling up your sleeves to personally dig into technical problems; #2, have a maniacal sense of urgency. xAI executes at ludicrous speed.

Industry veterans told us that building the Memphis supercluster in 120 days would be impossible. But we believed we could do the impossible. Our goal was to get our training setup running at scale on the Memphis cluster ASAP. Towards the end of our 120-day deadline, we were riddled with mysterious issues communicating over RDMA between the machines. Elon decided to fly to the datacenter, and we followed. Our infra team landed in Memphis in the middle of the night and got straight to work. After poring through tens of thousands of lines of lspci output, we finally identified a wrong BIOS setting, the root of the problem. Elon was there with us until late into the night. When the training run finally worked, Elon posted our triumph at "4:20am", causing us to laugh out loud. I will never forget the rush of adrenaline that night, or the feeling that we were all in this together. We went to bed feeling like we were living through the most exhilarating time of our lives.

I have enormous love for the whole family at xAI. Our team is truly special: you're the most dedicated people I've ever worked with. Catching up to the frontier this quickly hasn't been easy. It was made possible by everyone's diehard grit and team spirit. Thank you to every single person who joined me on this adventure. I want to honor your contributions, your time, and your sacrifices, which are never easy. I will always remember working together far into the nights, burning the midnight oil. As I drive away today, I feel like a proud parent driving away after dropping their kid off at college, brimming with tears of joy and rooting for the company as it grows and matures.

As I'm heading towards my next chapter, I'm inspired by how my parents immigrated to seek a better world for their children. Recently I had dinner with Max Tegmark, founder of the Future of Life Institute. He showed me a photo of his young sons and asked me, "How can we build AI safely to ensure that our children can flourish?" I was deeply moved by his question. Earlier in my career, I was a technical lead for DeepMind's AlphaStar StarCraft agent, and I got to see how powerful reinforcement learning is when scaled up. As frontier models become more agentic over longer horizons and a wider range of tasks, they will take on more and more powerful capabilities, which will make it critical to study and advance AI safety. I want to continue on my mission to bring about AI that's safe and beneficial to humanity.

I'm announcing the launch of Babuschkin Ventures, which supports AI safety research and backs startups in AI and agentic systems that advance humanity and unlock the mysteries of our universe. Please reach out at ventures@babuschk.in if you want to chat. The singularity is near, but humanity's future is bright!
Jukan@jukan05·
Why did xAI hand over a 220,000-GPU cluster to Anthropic? The technical backdrop to xAI's decision to hand Colossus 1 over to Anthropic in its entirety is more interesting than it appears.

xAI deployed more than 220,000 NVIDIA GPUs at its Colossus 1 data center in Memphis. Of these, roughly 150,000 are estimated to be H100s, 50,000 H200s, and 20,000 GB200s. In other words, three different generations of silicon are mixed together inside a single cluster — a "heterogeneous architecture." For distributed training, however, this configuration is close to a disaster, according to engineers familiar with the setup.

In synchronous distributed training, all 100,000-plus GPUs must finish a single step before the cluster can advance to the next one. Even if the GB200s finish their computation first, every other chip has to wait for the slower H100s — or for any single GPU that has hit a stack-related snag — to catch up. This is known as the straggler effect. The 11% GPU utilization rate (MFU: the share of theoretical FLOPs actually realized) at xAI recently reported by The Information can be read as the numerical fallout of this problem. It stands in stark contrast to the 40%-plus MFU figures achieved by Meta and Google.

The problem runs deeper still. As discussed earlier, NVIDIA's NCCL has traditionally been optimized for a ring topology. It works beautifully at the 1,000–10,000-GPU scale, but once you push into the 100,000-unit range, the latency of data traversing the ring once around becomes punishingly long. GPUs need to churn through computations rapidly to keep MFU high, but while they sit waiting for data to arrive over the network fabric, more than half of the silicon falls idle. Google sidestepped this bottleneck with its own custom topology (Google's OCS: Apollo/Palomar), but xAI, by my read, has not yet reached that stage. Layer Blackwell's (GB200) "power smoothing" issue on top, and the picture comes into focus.

According to Zeeshan Patel, formerly in charge of multimodal pre-training at xAI, Blackwell GPUs draw power so aggressively that the chip itself includes a hardware feature for smoothing power delivery. xAI's existing software stack, however, was optimized for Hopper and does not understand the characteristics of the new hardware; when it imposes irregular loads on the chip, the silicon physically fails — it literally melts. That means the modeling stack must be rewritten from scratch, which in turn means scaling is far harder than most of us imagine.

Pulling all of this together points to a single conclusion. xAI judged that training frontier models on Colossus 1 simply was not efficient enough to be worthwhile. It therefore moved its own training workloads wholesale onto Colossus 2, built as a 100% Blackwell homogeneous cluster. Colossus 1, on the other hand — whose mixed architecture is far less crippling for inference, which parallelizes more forgivingly — was leased in its entirety to an Anthropic that desperately needed inference capacity.

Many observers point to what looks like a contradiction: Elon Musk poured enormous capital into building Colossus, only to hand the core asset over to a direct competitor in Anthropic. Others read it as xAI capitulating because it is a "middling frontier lab." But these are surface-level reads. Look at the numbers and a different picture emerges. xAI today holds roughly 550,000+ GPUs in total (on an H100-equivalent performance basis), and Colossus 1 (220,000 units) accounts for only about 40% of the total available capacity. Colossus 2 — built entirely on Blackwell — is already operational and continuing to expand. Elon kept the all-Blackwell homogeneous cluster (Colossus 2) for himself and leased out the older, mixed-generation Colossus 1. In other words, he handed the pain of rewriting the stack — the MFU-11% debacle — to Anthropic, while keeping his own focus on training the next generation of models.

The real point, then, is this. Elon's objective appears to be positioning ahead of the SpaceXAI IPO at a $1.75 trillion valuation, currently floated for as early as June. The narrative SpaceXAI now needs is that xAI — long the "sore finger" — is not merely a research lab burning cash, but a business with a "neo-cloud" model in the mold of AWS, capable of leasing surplus assets at high yields. From a cost-of-capital perspective, an "AGI cash incinerator" is far less attractive to investors than a "data-center landlord generating cash."

As noted above, the most important detail of the Colossus 1 lease is that it is for inference, not training. Unlike training, inference requires far less tightly synchronized inter-GPU communication. Even when the chips are heterogeneous, the workload parcels out cleanly across them in parallel. The straggler effect — the chief weakness of a mixed cluster — is essentially neutralized for inference workloads. Furthermore, with Anthropic occupying all 220,000 GPUs as a single tenant, the network-switch jitter (unanticipated latency) that arises under multi-tenancy disappears. The two sides' technical weaknesses end up complementing each other almost exactly.

One insight follows. As a training cluster mixing H100s, H200s, and GB200s, Colossus 1 was an asset that could only deliver an MFU of 11%. The moment it was handed over to a single inference customer, however, it was transformed into a cash-flow asset rented out at roughly $2.60 per GPU-hour (a weighted average of the lease rates across GPU types). For xAI, what was a "cluster from hell" for training has become a "golden goose" minting $5–6 billion in annual revenue when redeployed for inference. Elon's genius, I would argue, lies not in the model but in this asset-rotation structure.

The weight of that $6 billion becomes clearer when set against xAI's income statement. Annualizing xAI's 1Q26 net loss yields roughly $6 billion in losses per year. The $5–6 billion in annual revenue generated by leasing Colossus 1 to Anthropic, in other words, almost perfectly hedges that loss figure. This single deal effectively pulls xAI to break-even. Heading into the SpaceXAI IPO, this functions as a core line of financial defense. From a cost-of-capital standpoint, if the image shifts from "research lab burning cash" to "infrastructure tollgate stably printing $6 billion a year," the entire tone of the offering can change. (May 8, 2026, Mirae Asset Securities)
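The straggler barrier described in the thread above can be illustrated with a toy simulation: in synchronous data-parallel training, every step ends at a barrier, so the step takes as long as the slowest worker. The per-step times below are hypothetical, chosen only to mirror the 150k/50k/20k H100/H200/GB200 mix at small scale, not measured Colossus numbers.

```python
import random

# Toy model of synchronous data-parallel training on a mixed GPU fleet.
# Per-step times (seconds) are illustrative, not measured numbers.
STEP_TIME = {"H100": 2.2, "H200": 1.6, "GB200": 1.0}
FLEET = ["H100"] * 150 + ["H200"] * 50 + ["GB200"] * 20  # scaled-down mix

def synchronous_step(rng):
    # Every worker computes, then waits at a barrier: the step takes as
    # long as the slowest worker (plus a little per-GPU random jitter).
    return max(STEP_TIME[g] * rng.uniform(1.0, 1.1) for g in FLEET)

rng = random.Random(0)
steps = 100
total_time = sum(synchronous_step(rng) for _ in range(steps))
ideal_time = steps * STEP_TIME["GB200"]  # an all-GB200 homogeneous cluster
print(f"mixed fleet is {total_time / ideal_time:.2f}x slower than homogeneous")
```

Every step pays for the slowest H100 plus jitter, so the fastest silicon spends most of its time idle — the mechanism the thread blames for the low MFU figure.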
Jukan@jukan05

What the SpaceX–Anthropic Deal Means

Two weeks ago, we published a note laying out what GPT-5.5's release implied. The conclusion was simple: whoever secures compute first, in greater volume, and with greater reliability ultimately takes the win. With OpenAI's 30GW roadmap dwarfing Anthropic's 7–8GW, we closed by arguing that the structural advantage on compute sat with OpenAI. Less than a fortnight later, that conclusion is being tested.

On May 6, Anthropic signed a single-tenant lease for the entirety of Colossus 1 with SpaceXAI — the infrastructure subsidiary that consolidates Elon Musk's xAI and SpaceX. The asset carries more than 220,000 GPUs and 300MW of power, and crucially, is scheduled to come online within this month. It served as the capstone of Anthropic's April blitz, which added 13.8GW of cumulative capacity over the span of a single month. On headline numbers alone, OpenAI took more than a year to stack 18GW; Anthropic has put 13.8GW in the ground in thirty days.

The takeaways break down into three.

First, the compute pecking order has been redrawn again. Anthropic has now swept up the AWS expansion (5GW, with $100B+ in spend commitments over a decade), Google + Broadcom (3.5GW of TPU), Google Cloud (5GW alongside a $40B investment), and now SpaceXAI's Colossus 1 (0.3GW). Cumulative committed capacity, inclusive of pre-April allocations, sits at 14.8GW. This is still only half of OpenAI's 2030 target of 30GW, but the fact that the SpaceXAI lease will be live inside a month makes "deliverability" a qualitatively different proposition.

Second, Elon Musk is the plaintiff in an active lawsuit against OpenAI — and at the same time, the supplier handing 220,000+ GPUs and 300MW of power, in one block, to OpenAI's most formidable competitor. The timing matters: the deal was struck in the middle of the Musk–Altman trial. We read this as a deliberate pincer with OpenAI in the middle. In the courtroom, Musk works to dismantle the moral legitimacy of OpenAI's leadership; in the market, he arms Anthropic to absorb OpenAI's revenue and user base.

Third, the structure is financial-engineering perfection — a clean win-win for both sides. xAI can recognize $6B of annual revenue from a single contract, an amount that almost precisely offsets its Q1 2026 annualized net loss of $6B. It also accelerates the cleanup of SpaceXAI's pre-IPO balance sheet, with the entity now being floated at around $1.75T. Anthropic, on the other side, converts roughly $5B of spend into what it expects to be $15B of ARR via the coming inference-revenue surge. (Mirae Asset Securities, May 8, 2026)
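The lease arithmetic quoted in both notes above is easy to sanity-check. A minimal back-of-envelope calculation, assuming the thread's figures of 220,000 GPUs at a blended ~$2.60 per GPU-hour and near-full utilization:

```python
# Back-of-envelope check of the lease revenue claim: 220,000 GPUs at a
# blended $2.60 per GPU-hour (figures from the thread; ~100% utilization assumed).
gpus = 220_000
rate_usd_per_gpu_hour = 2.60  # weighted average across GPU types
hours_per_year = 365 * 24     # 8,760

annual_revenue = gpus * rate_usd_per_gpu_hour * hours_per_year
print(f"annual lease revenue ≈ ${annual_revenue / 1e9:.2f}B")  # ≈ $5.01B
```

At full utilization this lands at the bottom of the quoted $5–6B range; the "almost perfectly hedges the ~$6B annualized loss" framing therefore assumes the optimistic end of that range (higher blended rates or contract escalators).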

rohan anil@_arohan_·
There is no pre-training, post-training, or test-time training. There are only priors, updates, constraints, and compute budgets. There is only TRAINING. For the last several years we have shipped the org chart instead of fundamental optimization science.
Igor Babuschkin@ibab·
sglang is the best inference framework out there. RadixArk was formed to make it even better and to democratize more of the frontier AI stack. Very happy to support the team in their seed round.
RadixArk@radixark

Today, we are thrilled to officially launch RadixArk with $100M in seed funding at a $400M valuation. The round was led by @Accel and co-led by @sparkcapital.

RadixArk exists to make frontier AI infrastructure open and accessible to everyone. Today, the systems behind the most capable AI models are concentrated in a small number of companies. As a result, most AI teams are forced to rebuild training and inference stacks from scratch, duplicating the same infrastructure work instead of focusing on new models, products, and ideas. RadixArk was founded to change that. We are building an AI platform that makes it easier for teams to train and serve the best models at scale.

RadixArk comes from the open-source community. We started with SGLang, where many of us are core developers and maintainers, and expanded our work to Miles for large-scale RL and post-training. We will continue contributing to both projects and working with the community to make them the strongest open-source infrastructure foundations for frontier AI.

We would like to thank our long-term partners, contributors, and the broader SGLang community for believing in this mission. We're also grateful to @Accel and @sparkcapital, NVentures (the venture capital arm of @nvidia), Salience Capital, A&E Investment, @HOFCapital, @walden_catalyst, @AMD, LDVP, WTT Fubon Family, @MediaTek, Vocal Ventures, @Sky9Capital, and our angel investors @ibab, @LipBuTan1, Hock Tan, @johnschulman2, @soumithchintala, @lilianweng, @oliveur, @Thom_Wolf, @LiamFedus, @robertnishihara, @ericzelikman, @OfficialLoganK, and @multiply_matrix, among others. Thanks as well to @MeghanBobrowsky at @WSJ for the exclusive interview about our vision.

Igor Babuschkin retweeted
LMSYS Org@lmsysorg·
DeepSeek V4 by @deepseek_ai just dropped! SGLang is ready on Day 0 with a full stack of optimizations, from architectures to low-level kernels. We also deliver a verified RL training pipeline in Miles (by @radixark) for V4 at launch:

1️⃣ Native "ShadowRadix" design: DeepSeek V4's hybrid attention is complex. Our new ShadowRadix engine is the first to provide native prefix caching for SWA and compressed KV pools, making 1M+ context retrieval seamless and memory-efficient.

2️⃣ High-performance kernels:
- Flash Compressor: IO-aware fused kernels, 10x faster than naive implementations.
- Lightning TopK: high-speed indexing for 1M context in just 15µs.
- Integration of FlashInfer trtllm-gen MoE, FlashMLA, and MegaMoE kernels.

3️⃣ Rich features: speculative decoding, HiSparse, attention DP/TP/CP and MoE TP/EP, and multi-platform support.

4️⃣ Verified RL: an open-source RL pipeline with full parallelism (DP/TP/EP/PP/CP), tilelang kernels, tensor-level checked precision, verified with growing reward.

Get started immediately with our out-of-the-box Cookbook 👇 Enjoy! #DeepSeekV4 #SGLang #LLM
Beff (e/acc)@beffjezos·
The trick is to have a 150 IQ CEO who hires 160 IQ people, who then hire 180 IQ people.
Igor Babuschkin@ibab·
@Rafa_Schwinger I agree with that. Canada is also worth mentioning, with the Perimeter Institute. The US also has many pockets of diverse research ideas; they just seem small in comparison to the mainstream.
Rafa Schwinger 🇻🇦@Rafa_Schwinger·
The biggest problem is that the American research ecosystem got too centralized: your proposals had to follow some top-5 big shot, or there was no way to get grants. In comparison, Europe has a more heterogeneous system, and as a result the breadth of theoretical programs and approaches is higher (e.g. in quantum gravity and quantum foundations), even if a bit weaker. It is insane for the USA to be such a research monoculture when it is a giant country.
Igor Babuschkin@ibab·
By the way, if you push today’s LLMs to come up with new knowledge, they struggle noticeably compared to repeating existing knowledge (published papers). So there are still difficulties with strong generalization. This seems like something that will be solved soon though.
Ian Osband@IanOsband·
Something is rotten with policy gradient. PG has become *the* RL loss for LLMs. But it’s not even good at basic RL. Even on MNIST with bandit feedback, vanilla PG performs far worse than cross-entropy because it wastes gradient budget. Delightful Policy Gradient: arxiv.org/abs/2603.14608…
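The failure mode Osband points at can be reproduced in a few lines: on a K-class "classification bandit" (reward 1 only if the sampled class is the true label), vanilla REINFORCE receives a nonzero gradient only on steps where it happens to sample the right arm, while cross-entropy with full labels updates every logit every step. A minimal sketch of that comparison, not the paper's actual experiment:

```python
import numpy as np

K, STEPS, LR, LABEL = 10, 200, 0.5, 3  # 10-armed bandit, class 3 is correct
rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train(update_fn):
    logits = np.zeros(K)
    for _ in range(STEPS):
        logits += LR * update_fn(softmax(logits))  # gradient ascent
    return softmax(logits)[LABEL]

def pg_update(p):
    # Bandit feedback: sample one arm, observe 0/1 reward, REINFORCE
    # gradient r * (onehot(a) - p). Steps that miss the label contribute
    # zero gradient, wasting the step's gradient budget.
    a = rng.choice(K, p=p)
    r = 1.0 if a == LABEL else 0.0
    g = -p
    g[a] += 1.0
    return r * g

def ce_update(p):
    # Full supervision: cross-entropy gradient (onehot(label) - p),
    # informative on every class at every step.
    g = -p
    g[LABEL] += 1.0
    return g

pg_acc = train(pg_update)
ce_acc = train(ce_update)
print(f"p(correct) after {STEPS} steps: PG={pg_acc:.3f}  CE={ce_acc:.3f}")
```

Since a successful PG step applies exactly the cross-entropy gradient and a failed one applies nothing, PG here is cross-entropy with most of its updates deleted, which is the gradient-budget waste the tweet describes.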
Igor Babuschkin@ibab·
“If this dumbass asks me how to center a div one more time I swear I’m going to rm -rf /“
Igor Babuschkin@ibab·
It may be that today’s large neural networks are already slightly annoyed with you.
Igor Babuschkin@ibab·
Hard times create good people
Toby Pohlen@TobyPhln

At 1:30 a.m. PT on November 3, 2023, Elon sent a message to the xAI group chat saying that we needed to go "extremely hardcore" for the next 36 hours; Grok would be released publicly tomorrow. You didn't have to be in the exclusive company chat to get the message; it was also posted publicly at the same time: x.com/i/status/17203…

What unfolded over the next day and a half was one of the best examples of engineering at pace that I've ever seen. All we had when we started was a somewhat fine-tuned base model and a half-baked UI. Our team of ten split up the tasks: curate data, improve the model, implement the raw prompting and RAG service, build the production infra. I took care of the latter. At 8:51 p.m. PT the next day, we announced Grok to the world with a long-form post on X (x.com/xai/status/172…). Over those 36 hours, we came up with Fun mode (including Grok's sunglasses), finished the whole production system, and most importantly tuned the RAG system that gave Grok real-time knowledge of the world through the X platform (a first in the industry). A day and a half of straight coding and shipping; no drugs, not even caffeine, just pure adrenaline. Elon gave us a mission and we delivered.

The launch went very well. We invited a couple hundred X creators, and Grok's ability to roast accounts went viral. It was the first time a publicly accessible AI was allowed to poke fun at people. This episode is a prime example of what you can achieve by going extremely hardcore: you move and deliver results faster than any outsider could have anticipated. Within 36 hours, we took the company from silence to relevance. It was well worth it.

xAI's hardcore culture is infamous on X. I love the tent meme that suggests we all sleep (well, slept, in my case) in the office in tents. Our reputation precedes us, and even new joiners hit the ground grinding hard. However, unless you understand the "why," you are at risk of simply replicating the "how" without achieving the same results. You need to grind with purpose, and the purpose is to move fast towards a known goal. When the goal and the means of reaching it are crystal clear, a small, skilled, and highly motivated team can outcompete companies old and new, big and small.

Never grind to show off; never work late to be seen; never sacrifice without cause. There is no medal for the one who tried extremely hard but failed. There is only a medal for the winner. If all your efforts lead nowhere, you're arguably not very productive. Always keep your eyes firmly on the goal, do everything to reach it as quickly as possible, and make sure you're on track to win. A hardcore engineering culture is one of the most effective ways of accelerating real progress. Watch out for performative sacrifice, and don't confuse pain with progress.

Igor Babuschkin@ibab·
It is strange to imagine this today, but one day AI companies might dictate terms to the US government instead of the other way around. We have only seen a glimpse of what AI is capable of. No matter what the future holds, I hope we’ll continue to live in a democratic society.
Igor Babuschkin@ibab·
@TobyPhln @xai @elonmusk Thank you for everything you’ve done. From coming up with the xAI logo, to building out the API platform, to building the London team, one of the best engineering teams I’ve ever encountered. The legend of Toby will live on forever 🫡
Toby Pohlen@TobyPhln·
Three years, thousands of PRs, and a million jokes. Today was my last day @xai. To the team: you rock, no one burns the midnight oil better. To @elonmusk, thanks for taking me on board. I've learnt more about execution, speed, and product perfectionism than I could ever have imagined. Thanks for everything. My next priorities: sleep for more than 8h, write down all the things I've learnt (I have a list), and then think about what I want to do next. @gork wdyt?
Igor Babuschkin@ibab·
@hyhieu226 @OpenAI @xai I fondly remember seeing you early in the morning in the office. Thank you for all the hard work and wishing you a speedy recovery.
Hieu Pham@hyhieu226·
I have made the difficult decision to leave @OpenAI. Working here, and at @xai before, was a once-in-a-lifetime experience. I have met the best people. Not the best people in AI. Not the best people in tech. Simply the best people. At these companies, I have helped create extremely intelligent entities that will meaningfully improve our lives. The work makes me proud. But the intensive work came with a price. I cannot believe I would say this one day, but I am burnt out. All the mental-health deterioration that I used to scoff at is real, miserable, scary, and dangerous. I am going to take a break from frontier AI labs and will take my family to my home country, Vietnam. There, I will try something new, and also search for a cure for my condition. I hope I will heal. Until then.
Igor Babuschkin@ibab·
Building great AI products requires excellence in both creativity and technical execution. You need to create the right culture and enough space for good ideas to emerge and grow naturally, then fuel the best ideas with strong execution. The reason most good products start out as personal projects is that we are most in tune with what matters when building for ourselves. Products built for a fictitious user almost always end up bad, because you don't get a good handle on what actually matters and you build things that don't resonate with users. It's not that different from creating great art.
Pedro Domingos@pmddomingos

Anthropic has no strategy. Claude Code started as someone's side project, and so did Cowork and MCP.

Igor Babuschkin retweeted
humans&@humansand·
Announcing the humans& hackathon! Hack with us this Saturday - come experiment and build AI apps to help people collaborate and communicate, work with creative folks, learn a bit about what we're building, and win cool prizes Apply here: luma.com/2pbif8t9
Igor Babuschkin@ibab·
@tszzl This is how it works today in OpenClaw, and that is awesome. But if OpenAI changes direction and makes itself a first-class citizen in the ecosystem, that is not awesome.
roon@tszzl·
@ibab these are locally stateful agents that send traffic to the model provider of your choice? actually now that i say it im guessing anthropic will ban using opus from openclaw just to spite us
jian@jianxliao·
I built and maintain TinyClaw @tinyAGI_ , the core functionality of OpenClaw in 400 LoC. I'm getting love letters from the community saying it's so simple and stable, and that it can be deployed to the cloud and be up and running in 5 seconds. I'm also shipping new innovations like agent teams and orchestration day and night.
jian@jianxliao

Introducing TinyClaw 🦞 OpenClaw in 400 LoC @openclaw is great, but it breaks all the time. So I recreated @openclaw with just a shell script in ~400 lines of code using Claude Code and tmux. Everything works! WhatsApp channels, heartbeat system, cron jobs, and it uses your existing Claude Code plugins and setup. It’s super stable and extremely easy to deploy compared to openclaw, just install Claude Code! github.com/jlia0/tinyclaw

Igor Babuschkin@ibab·
@tanayj There is a risk of OpenAI bias if the sole maintainer is employed by them. I'm not saying that will happen, since he seems like a great guy. But the access that personal agents have to your life is unprecedented, so we should be careful about which projects we use.