
the VC almanac
@theVCalmanac
all things venture capital & tech


Introducing SubQ - a major breakthrough in LLM intelligence.
It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus
Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do.
That's nearly 1,000x less compute and a new way for LLMs to scale.
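To make the scaling claim concrete, here is a rough back-of-the-envelope sketch in Python. It only counts query-key pairs: standard attention scores every token against every token, while a sparse scheme scores a fixed budget of keys per query. The post does not say how SSA picks its keys or how many it keeps, so `keys_per_query` and the value 12,000 are assumptions chosen purely for illustration, not SubQ's actual design.

```python
def dense_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by standard attention: every token against every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Pairs scored when each query looks at a fixed budget of keys.

    keys_per_query is a hypothetical budget, not a published SubQ parameter.
    """
    return n_tokens * min(keys_per_query, n_tokens)


def savings(n_tokens: int, keys_per_query: int) -> float:
    """How many times fewer pairs the sparse scheme scores than the dense one."""
    return dense_pairs(n_tokens) / sparse_pairs(n_tokens, keys_per_query)


if __name__ == "__main__":
    ASSUMED_BUDGET = 12_000  # assumption for illustration only
    for n in (100_000, 1_000_000, 12_000_000):
        print(f"{n:>12,} tokens: {savings(n, ASSUMED_BUDGET):,.1f}x fewer pairs")
```

With a fixed budget the gap grows linearly with context length, so under this assumed budget a figure near 1,000x only appears at the 12 million token end of the range.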

Did a very different format with @reinerpope – a blackboard lecture where he walks through how frontier LLMs are trained and served. It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk. It’s a bit technical, but I encourage you to hang in there - it’s really worth it. There are less than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was a real delight to learn from him. Recommend watching this one on YouTube so you can see the chalkboard.
0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, “As we now know, pipelining is not wise.”
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography


Caught up with @karpathy for a new @NoPriorsPod: on the phase shift in engineering, AI psychosis, claws, AutoResearch, the opportunity for a SETI-at-Home like movement in AI, the model landscape, and second order effects
02:55 - What Capability Limits Remain?
06:15 - What Mastery of Coding Agents Looks Like
11:16 - Second Order Effects of Coding Agents
15:51 - Why AutoResearch
22:45 - Relevant Skills in the AI Era
28:25 - Model Speciation
32:30 - Collaboration Surfaces for Humans and AI
37:28 - Analysis of Jobs Market Data
48:25 - Open vs. Closed Source Models
53:51 - Autonomous Robotics and Atoms
1:00:59 - MicroGPT and Agentic Education
1:05:40 - End Thoughts


Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!


Alex Karp rips into the Palantir conspiracy theorists: “You're attacking the person who's protecting you— idiot.” “You may hate this, but there's one person protecting your rights to be a conspiracy theorist that actually has a seat at the table, and that person is me.” “You may not want to hear that truth, but it's fucking true.” “Maybe do a little more reading before you pontificate on your absurd and obviously ill-formed and many times stupid opinions.” “It's like fucking so stupid.” Via @tbpn

Technology only matters if it strengthens the people and institutions that hold society together. In this conversation with a16z’s Katherine Boyle, Alex Karp discusses how AI is reshaping warfare, why Silicon Valley needs to take the stakes of this moment more seriously, and what it will take for America to remain both technologically dominant and socially cohesive. @PalantirTech @KTmBoyle

We've raised $6.5M to kill vector databases. Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit. We built @hydra_db for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time. So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94. More below ⬇️


Jim Cramer in 2010: “I’m not sure Tesla has a business plan that’s going to work, it’s not a smart investment” Inverse Cramer is so real


Chamath: Work-life balance is the worst thing to happen to young people.
"The first and most important thing is you have to be on Broadway."
"If you're into politics, you need to be in Washington D.C. If you want to be in finance, you need to get to New York or London. If you want to be in crypto, you probably need to be in Abu Dhabi. If you want to be in tech, you just need to be in Silicon Valley."
"There is no shortcut for any of these decisions. You have to be where the fish are."
"The number of young people I encounter who talk about all of these idiotic things like work-life balance. I don't even understand what that means."
Remote work is convenient. Being where it happens is how you win.
video: @chamath
