Vladimir Baranov

3K posts

@vbar_io

world's first working continual learning LLM: https://t.co/hAMh5DElf9. infinite canvas IDE for your terminals: https://t.co/mwSM3EEbB8. ex-Morgan Stanley, Coca Cola engineer

Joined February 2024
230 Following · 1.7K Followers
Pinned post
Vladimir Baranov@vbar_io·
Your Laptop Can Run a Mind, But Never a Superintelligence

We are about to split into two civilizations: those who own their intelligence, and those who rent it.

A 70B parameter model running on a 128GB Apple laptop is likely sufficient for continuously-learning human-level intelligence. A trillion-parameter superintelligence will never run on your local machine. Both of these things are true simultaneously, and the gap between them is not a temporary engineering problem waiting to be solved. It is a permanent feature of physics, and it will reshape society more profoundly than the internet did.

Here is why the 70B ceiling is higher than people think.

The human brain has roughly 86 billion neurons. It does not grow new neurons when you learn something. It reweights existing connections. A static 70B model is a snapshot frozen at training time. A continuously learning 70B model is a living system doing exactly what your brain does: reshaping itself from experience, every day. The parameter count becomes a vessel that is constantly being reformed. Size stops being the variable. Temporal depth of adaptation becomes the variable.

A 128GB M-series MacBook has unified memory shared across CPU, GPU, and Neural Engine at roughly 800 GB/s bandwidth. A 70B model in 4-bit quantization fits in about 38GB, leaving substantial room for context, memory buffers, and lightweight gradient updates. For the first time in history, the continuous learning loop can close locally, in real time, on a device you own.

Now for the hard ceiling at the top.

A 1 trillion parameter model at aggressive 2-bit quantization requires roughly 250GB just to hold the weights, before activations, before the KV cache, before any actual compute happens. No consumer device in any foreseeable roadmap touches this.

But memory size is not even the binding constraint. LLM inference is almost entirely limited by how fast you can stream weights from memory to compute units. A trillion-parameter forward pass requires moving trillions of values. Even at theoretical consumer memory bandwidth speeds, generating a single token takes seconds.

Then there is heat. A laptop sustains 20 to 40 watts. Dense superintelligence inference requires hundreds of kilowatts and active liquid cooling. This is not an engineering gap closing over time. The requirements of the largest models are diverging from consumer hardware, not converging toward it.

What emerges is a permanent three-tier structure:
- At the bottom, sub-human local models between 1B and 13B parameters run on phones and embedded devices, fast and cheap and private, handling narrow tasks brilliantly, essentially free and commoditized.
- In the middle, human-level local models between 30B and 100B parameters represent the genuinely disruptive tier: capable of sustained reasoning, creative work, and long-horizon planning, running privately and persistently on hardware you control, adapting to your thinking over time, operating without sending a single byte to a server. A high-end Apple Silicon laptop sits at the frontier of this tier right now.
- At the top, dense superintelligence above a trillion parameters will exist exclusively in hyperscaler data centers operated by a handful of companies and governments, capable of cross-domain synthesis at a scale no human or local model can approach, running thousands of parallel reasoning chains, accessed on someone else's terms, metered and monitored and expensive.

The separation is not just technical. It is political. Tier 2 democratizes human-level reasoning. Anyone with capable hardware gets a private, persistent, unkillable cognitive partner that knows their history and can never be revoked. Tier 3 concentrates superhuman reasoning in whoever controls the infrastructure.

The most consequential design decisions of the next decade will not be about model architecture or benchmark scores. They will be about which capabilities live in which tier, and who gets to decide. That question is already being answered, mostly without public debate, mostly by the people who benefit most from keeping superintelligence behind a paywall and a terms-of-service agreement.
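The memory and bandwidth arithmetic in the post above can be checked with a short sketch. It is a back-of-the-envelope model, not a benchmark: the function names are hypothetical, the footprint counts raw weights only (the post's 38GB figure for a 4-bit 70B model presumably includes quantization scales and other overhead, which is an assumption here), and the 100 GB/s figure is an assumed stand-in for an ordinary laptop, not a number from the post.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal GB.

    Ignores quantization metadata, activations, and the KV cache,
    so real footprints will be somewhat larger.
    """
    return num_params * bits_per_weight / 8 / 1e9


def min_seconds_per_token(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Lower bound on decode latency for a dense model.

    Assumes every weight is streamed from memory once per generated token
    and that memory bandwidth is the only bottleneck.
    """
    return weights_gb / bandwidth_gb_s


if __name__ == "__main__":
    # (label, parameter count, bits per weight) taken from the post
    models = [("70B @ 4-bit", 70e9, 4), ("1T @ 2-bit", 1e12, 2)]
    # 800 GB/s is the post's figure; 100 GB/s is an assumed ordinary laptop
    bandwidths_gb_s = [800.0, 100.0]

    for label, params, bits in models:
        gb = weight_footprint_gb(params, bits)
        for bw in bandwidths_gb_s:
            latency = min_seconds_per_token(gb, bw)
            print(f"{label}: {gb:.0f} GB weights, "
                  f"{bw:.0f} GB/s -> at least {latency:.3f} s/token")
```

Under these assumptions the raw weights come to 35 GB and 250 GB. The per-token lower bound for the trillion-parameter case is about 0.3 s at 800 GB/s and 2.5 s at the assumed 100 GB/s, so how many seconds a token takes depends heavily on which consumer bandwidth figure is used.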
9
8
65
13K
Vladimir Baranov@vbar_io·
@ivanburazin You have to be able to copy and paste to write real prompts. Utterances are a subset of prompts.
0
0
0
19
Ivan Burazin@ivanburazin·
I had a horrible dream last night: we were throwing away our keyboards and getting this for everyone in the office
Ivan Burazin tweet media
25
0
59
4.4K
Todd Saunders@toddsaunders·
I have a few friends applying for new jobs and they all ask me the same question: what exactly does "AI-native" mean? And as much as I have an answer for them, I really don't. How would you explain it to them?
46
1
17
7.2K
Vladimir Baranov@vbar_io·
@PatrickToulme I wrote about this "Silicon Split" in intelligence recently. You might find it interesting: x.com/vbar_io/status…
Vladimir Baranov@vbar_io

The next war over AI will not be fought in model weights. It will be fought in silicon. And the winner will be determined not by who has the best algorithm, but by who owns the physics.

Software-based AI running on general-purpose processors is a temporary historical accident. It works the way the early internet worked over phone lines: functional, transformative, but fundamentally mismatched to the medium. We are now entering the era where AI migrates from borrowed hardware into purpose-built substrate, and almost nobody is talking about what this means for the power structure that was just starting to form around the current paradigm.

Here is why this migration is inevitable:

A general-purpose CPU spends the majority of its energy on things that have nothing to do with intelligence. Instruction fetching. Branch prediction. Speculative execution. Cache coherency protocols. A modern processor is an elaborate bureaucracy where the actual math of a neural network is a small tenant in a large building full of overhead. A custom chip designed to do nothing but matrix multiplications and attention computations strips all of that away. The same operation that costs one unit of energy on a GPU costs a tenth or a hundredth on a purpose-built ASIC. At the scale of a datacenter running trillions of inference operations per day, this is not an optimization. It is the difference between economic viability and bankruptcy.

But energy is not even the deepest reason. Memory bandwidth is. The transformer architecture has a limiting secret that no amount of software cleverness can fix. Attention requires moving enormous volumes of data between where it is stored and where it is computed. On conventional hardware, memory lives in one place and compute lives in another, connected by a bus that becomes the bottleneck for everything. This is the Von Neumann wall, and it has been the binding constraint on computing for seventy years. Software cannot route around physics. You cannot optimize your way past the speed of electrons moving through copper traces on a motherboard. The only solution is to put the computation where the memory already is. Processing-in-memory. Near-memory compute. Architectures where the data never moves because the math happens at the point of storage. This requires new silicon. No driver update will deliver it.

Then there is numerical precision. Your GPU implements IEEE floating point because it was designed for a world where rendering a pixel incorrectly was unacceptable. AI does not need that. A transformer will produce nearly identical outputs whether you compute in 32-bit floating point, 16-bit, 8-bit, or in some cases 4-bit. Custom AI hardware can implement unusual number formats natively, formats that do not exist in any software library because no general-purpose chip has ever supported them. FP8... FP4... Block floating point... Logarithmic number systems... Each of these buys you a 2x to 8x improvement in throughput for free, because you are doing less work per operation. And the model does not care. Software emulation of these formats eats the gains. Only native silicon makes them real.

Now extend this logic to the extremes and see where it leads.

The implications are a huge unlock: A purpose-built inference chip consuming two watts can run a meaningful model on a device with no fan, no cloud connection, and no terms of service. Your phone. Your car. Your glasses. A medical device implanted in your body. The same model that requires a rack-mounted GPU server today could run on a chip the size of a fingernail tomorrow, not because the model shrank, but because the silicon was sculpted to fit it exactly. This is how Tier 2 intelligence, the human-level local AI from the previous discussion, breaks free of the laptop form factor and becomes ubiquitous. Dedicated hardware is what makes private, persistent, always-on intelligence physically possible in objects you carry.

At the top, the implications are concentrating. And this is the part that requires honest accounting, because the hardware case for superintelligence is not just an amplified version of the edge story. It is a different kind of engineering entirely, governed by new constraints.

Start with the raw arithmetic. A dense trillion-parameter model, even on purpose-built silicon, requires holding those parameters somewhere and streaming them through compute units on every forward pass. Custom number formats help. FP4 and aggressive quantization can compress a trillion parameters into 500GB or less. But a single chip, no matter how specialized, holds at most tens of gigabytes of on-die memory. A wafer-scale chip like Cerebras pushes this further, tiling an entire silicon wafer with cores and on-chip SRAM, eliminating the need to go off-chip for a large fraction of the weights. But even a full wafer tops out. A trillion-parameter superintelligence at any useful precision does not fit on one wafer. It is distributed across dozens or hundreds of chips by necessity, which means the binding constraint shifts from memory bandwidth within a chip to interconnect bandwidth between chips. The speed of light across a fiber optic cable between racks becomes a hard ceiling on how fast the system can think. This is not a software problem. It is not even a chip design problem. It is a facility design problem. The physical layout of a building, the length of copper and glass between nodes, the topology of the network fabric, all become architectural decisions about intelligence itself.

Then there is power. A single purpose-built AI accelerator might draw 300 to 700 watts. A superintelligence system running thousands of these accelerators in parallel, executing long-horizon reasoning chains across a model that spans hundreds of chips, draws megawatts. The facilities being built today to house these systems require dedicated substations, direct connections to power plants, and in some cases entirely new energy infrastructure at the site. Liquid cooling is not optional. It is the baseline. Some next-generation designs are exploring immersion cooling, submerging entire server racks in dielectric fluid, because air cannot remove heat fast enough from silicon running at these densities. Others are investigating co-location with nuclear microreactors, not as a green energy talking point but as a physical engineering requirement because the grid cannot deliver enough power to a single building.

Custom silicon makes all of this more efficient, but it does not make it small. It does the opposite. It makes the investment case for building massive AI facilities overwhelming, because purpose-built hardware extracts so much more intelligence per watt that the returns to concentration become enormous. A hyperscaler running superintelligence on custom ASICs is not just faster than one running on general-purpose GPUs. It is operating at a fundamentally different cost curve. The efficiency gains from dedicated hardware do not democratize the top tier. They subsidize it for the entities large enough to build the facilities in the first place.

Finally, the most radical hardware approaches push this concentration further. Photonic computing performs matrix multiplication at the speed of light using interference patterns in silicon waveguides. Analog compute arrays encode weights as electrical charges on memristive crossbars, performing an entire matrix-vector multiply in a single clock cycle with near-zero energy. Superconducting circuits operating near absolute zero switch at hundreds of gigahertz with negligible power dissipation. Each of these offers 100x to 1000x efficiency gains over digital logic for specific AI workloads. None of them are consumer technologies. A photonic AI chip requires precision fabrication of optical components at nanometer tolerances. A cryogenic superconducting array requires a liquid helium cooling plant. These are physics experiments and not products, and they will be deployed exclusively in facilities that cost billions to build, operated by organizations that can amortize that cost across millions of paying users.

The result is that purpose-built hardware simultaneously enables two opposite things. It liberates local intelligence by making edge inference radically cheaper. And it entrenches centralized superintelligence by making the cost curve of the largest systems steeper, not flatter, rewarding scale and concentration at every level. The gap between Tier 2 and Tier 3 will not be measured in model parameters or benchmark scores. It will be measured in the physics each tier can access. Your local chip does digital math on a few billion parameters. The cathedral does analog and photonic computation across trillions. It is not a difference of degree. It is a difference of substrate.

This creates a new axis of power that did not exist in the software-only world. When AI was purely a software problem, the barrier was data and training compute, both of which are expensive but ultimately fungible. Money could buy GPUs anywhere. The knowledge of how to train a model was diffusing rapidly. Open weights meant that capability, once created, could not be uncreated. But when AI becomes a hardware problem, the barriers become physical. Semiconductor fabrication. Exotic materials. Photonic integration. Cryogenic infrastructure. These do not diffuse like code on GitHub. They concentrate in whoever controls the most advanced fabs, the most specialized manufacturing, and the longest-horizon capital investment. The companies and nations that own the next generation of AI-specific silicon will own a capability advantage that cannot be copied by downloading a file.

This is already happening. It's happening in the TPU program at Google, which has been quietly building custom AI silicon for nearly a decade, creating an inference cost advantage that no amount of open-source software can neutralize. It's happening in the semiconductor export controls that treat advanced chips as strategic weapons. It's happening in every startup designing neuromorphic processors, photonic accelerators, and analog compute arrays behind closed doors with defense funding and no intention of selling to consumers.

The question is not whether AI moves into dedicated hardware. That is settled by thermodynamics. The question is the same one from the previous discussion, sharpened to a finer point: which tier of intelligence gets which tier of silicon? If purpose-built edge chips keep advancing, Tier 2 intelligence, your private, sovereign, continuously-learning cognitive partner, becomes smaller, cheaper, and more capable every year, embedded in everything, loyal to no one but the person who owns it. If the most transformative hardware breakthroughs remain locked in hyperscaler facilities, Tier 3 superintelligence becomes not just expensive to access but physically impossible to replicate.

The split between those who own their intelligence and those who rent it was always going to be defined by hardware. Software was the opening act. Silicon is the main event. The people designing these chips right now are making decisions that will determine the balance and distribution of cognitive power for the next fifty years. Almost none of those decisions are being made in public. Almost all of them are being made by people whose economic incentive is to keep the powerful silicon behind walls, metered and monitored, rented by the hour, revocable at will.

The physics is not the problem. The physics is generous. It offers a path to owning intelligence on a chip you hold in your hand. The question is the same one it has always been: who builds the chip, and who gets to buy it.
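The "raw arithmetic" in the quoted post (500GB of FP4 weights, tens of gigabytes per chip, 300 to 700 watts per accelerator, fiber between racks) can be made concrete with a rough sketch. Everything specific below is an assumption for illustration: the function names are hypothetical, the 20GB per-chip capacity and 4,000-accelerator count are picked to match "tens of gigabytes" and "thousands", and the fiber refractive index of 1.47 is a typical value for silica fiber, not a figure from the post.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458  # vacuum speed of light, exact by definition


def chips_needed(weights_gb: float, per_chip_memory_gb: float) -> int:
    """Minimum chip count to hold the weights alone (no activations or KV cache)."""
    return math.ceil(weights_gb / per_chip_memory_gb)


def cluster_power_mw(num_accelerators: int, watts_each: float) -> float:
    """Accelerator power draw in megawatts, excluding cooling and networking."""
    return num_accelerators * watts_each / 1e6


def fiber_delay_ns(length_m: float, refractive_index: float = 1.47) -> float:
    """One-way propagation delay through optical fiber, in nanoseconds."""
    return length_m * refractive_index / SPEED_OF_LIGHT_M_S * 1e9


if __name__ == "__main__":
    # 500 GB of FP4 weights (from the post), 20 GB per chip (assumed)
    print("chips for weights:", chips_needed(500, 20))
    # 4,000 accelerators (assumed) at 500 W, the middle of the post's range
    print("accelerator power (MW):", cluster_power_mw(4000, 500))
    # 100 m of fiber between racks (assumed distance)
    print("fiber delay (ns):", round(fiber_delay_ns(100), 1))
```

With these assumed inputs the weights alone need 25 chips and the accelerators draw 2 MW, which is consistent with the post's "dozens or hundreds of chips" and "megawatts". The fiber delay term is a fixed cost per hop that no chip design can remove, which is the sense in which layout becomes part of the architecture.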

0
0
0
59
Patrick C Toulme@PatrickToulme·
A few thoughts on OpenAI's Jalapeño chip announcement today:
1. This chip is most likely the first one virtually entirely developed by Codex/GPT. Codex with whatever internal coding model (GPT 5.6/6.0 whatever) coded the entire software stack and most likely the hardware design.
2. OpenAI will write all of their inference serving in pure Jalapeño ISA (instruction set architecture). Why? They only need to get say a few production models serving on Jalapeño. They can handwrite with Codex the entire model in pure ISA to get very high performance.
3. They are most likely running Codex/GPT in custom RL envs to teach the models direct Jalapeño chip programming at ISA level.
4. This is a massive cost savings for OpenAI and only possible IMO due to the breakthroughs in agentic coding. An AI company with frontier coding models can now become a hardware vendor with only a small team of experienced SWEs and an infinite amount of tokens.
This is the first chip program fully accelerated by frontier AI.
109
143
1.6K
222.2K
Vladimir Baranov@vbar_io·
1000 shots of espresso in 6 months. Thank you, Claude Code, for fueling the addiction. 2000 by end of the year?
Vladimir Baranov tweet media
1
0
0
195
Matt Wensing 🐙@mattwensing·
Not talked about enough: Agentic coding has collapsed the demand for that elusive technical cofounder. More companies will be started, and fewer 50/50 splits.
17
0
12
1.8K
Vladimir Baranov@vbar_io·
This is possible when the people around you have common goals and values. Canada's diversity means that your neighbours often don't have the same goals or values. If you act in a benevolent way towards your neighbours, but they do not reciprocate, you will not continue for very long.
0
0
1
69
Ali Asaria@aliasaria·
what does it mean to be ambitious. in canada everyone is saying we need to be more ambitious, celebrate ambition. everything comes down to language, and the loudest voices use it so well. what is ambition other than wanting what's best? if we have ambition only for ourselves, is this not selfishness. the star tech entrepreneur will have us believe that if we are ambitious for ourselves and everyone does that, only then will the entire nation succeed. they ask that the only way to hope for others is to put blinders on and ignore the other entirely. the implication is that we cannot -- we must not -- want what's best for our neighbour, for this is woke socialism. merely looking into the eyes of those who are not doing so well, or feeling for them -- this is the greatest crime. but the unasked question that hangs in the air is this: can we not be monstrously ambitious to ourselves AND to our neighbours?
4
0
17
1.1K
Vladimir Baranov@vbar_io·
I built STAX IDE (@stax_ide) - a native terminal IDE for macOS and Windows where your shells sit on a canvas instead of buried in tabs. STAX IDE is for developers like myself who live somewhere between the terminal, browser and a bunch of notes/screenshots. I'm looking for a few people to try it and tell me what breaks. A month of Claude Code on me if you're up for it. DM or reply if you're interested
3
4
5
507
Vladimir Baranov@vbar_io·
@XianliangWu @stax_ide @RepoPrompt STAX IDE uses its own modules for the terminal, browser, etc (all built from scratch), so you can't bring any software at all into it. You're looking to use it for organization?
0
0
1
28
Vladimir Baranov@vbar_io·
I think there are 2 outcomes:
a) you care about what you do, and are ok with the reward structure
b) you're not
If you're ok with both a) and b) then you'll do amazing work and love it. If you're not ok with either or both, you'll be endlessly bitter and find ways to justify it.
0
0
0
9
Dweller - 🏞️ to the 🌅
@vbar_io @typesfast @HarryStebbings Do you think an effective/serious employee is more effective with commuting or without commuting? Hire the right people and you don't have to worry about wasted time rather than forcing them to the office to babysit them. If they need babysitting then yeah, you have a problem.
1
0
0
13
Harry Stebbings@HarryStebbings·
Why Remote Work is White Collar Fraud. "I have a three-year-old and a five-year-old. The idea that I could do any work at my house is like a total fantasy. The kids come home at 3pm, your work day needs to keep going. I'm highly against it." @typesfast
1.1K
43
1.2K
5.3M
Tim Urban@waitbutwhy·
1 really seems like a prime number
Tim Urban tweet media
64
6
367
43.1K
Jake Mintz@jakemintz·
I do not understand the hype about Notion. It's a worse document editor and spreadsheet than Google Docs. The API is broken due to its privacy model. In the age of AI I'd rather build an app or use a Karpathy-style wiki instead of a Notion database. The AI sucks compared to a real harness + claude/codex/etc. I get it as a Confluence replacement but it's sold and hyped as more than that. What am I missing? @jeff_weinstein I think you are an advocate and super respect your product judgement. What am I doing wrong?
81
5
335
57.9K
Jamie Turner@jamwt·
@vbar_io @ThePrimeagen What if you love your work? What if it's one of the best parts of your life? What if the people you work with are wonderful to spend time with?
1
0
1
84
The Astronomy Guy@astrooalert·
JUST IN🚨: Neuroscience considers metacognition the highest form of Intelligence..... "The ability to think about your own thinking."
453
2.4K
24.4K
2.1M