Sasha Krassovsky

746 posts

@bztree

Performance @AnthropicAI

Seattle · Joined March 2023

541 Following · 1.7K Followers

Sasha Krassovsky@bztree·5d

@beffjezos You lose 100% of the shots you don’t take

100

Beff (e/acc)@beffjezos·5d

Karpathy joining Extropic instead of Anthropic would have been a more entertaining outcome

170

7.3K

Sasha Krassovsky@bztree·16 May

Opus 4.7 haters in shambles

David Hershey@DavidSHershey

Many (many) months later, Claude is finally challenging the Elite Four and will likely become a Pokemon Champion tonight 🥲 twitch.tv/claudeplayspok…

184

Sasha Krassovsky@bztree·9 May

@i2cjak My timeline seems to disagree with you

127

i2cjak@i2cjak·9 May

I should quit being such a fucking idiot

172

8.6K

Sasha Krassovsky@bztree·9 May

@ThePrimeagen What is C? I only know ⌘ and ⌃. ⌘-c and ⌘-v works fine for me in Terminal-dot-app

230

ThePrimeagen@ThePrimeagen·9 May

without googling, i still cannot figure out how to copy and paste on a mac into: 1. the provided Terminal app (which sucks) 2. Ghostty. C-V does nothing, C-v goes into escape insertion (expected). I refuse to google this and it should just be obvious...

411

1.7K

352.5K

Sasha Krassovsky@bztree·7 May

This is the part of today’s announcement I’m most excited about 🙂

xAI@xai

SpaceXAI and @AnthropicAI have also expressed interest in partnering to develop multiple gigawatts of orbital AI compute capacity

480

Sasha Krassovsky@bztree·6 May

@benhylak @pronounced_kyle I just finished the part where he puts number theory into a formal system, around page 230. It’s getting tough to keep going but I’m on a mission

228

ben hylak@benhylak·5 May

i know a lot of people who love this book, and none of them have finished it.

Ihtesham Ali@ihtesham2005

A 34-year-old physics graduate student spent years writing a strange 800-page book in 1979 about a logician, a Dutch artist, and a German composer. It won the Pulitzer Prize the following year. It quietly became required reading at every AI lab in the world. It is the only book in history that makes the deepest ideas in computer science feel like a dream you cannot stop thinking about. I read it across 3 months on a single side table next to my bed and walked away seeing intelligence, consciousness, and AI in a way I cannot un-see. His name is Douglas Hofstadter. The book is called Gödel, Escher, Bach. Almost nothing in modern AI makes sense without this book. ChatGPT, Claude, Gemini, the entire architecture of self-attention, the alignment problem, the strange feeling that LLMs sometimes seem to understand and other times seem to be playing an elaborate symbol-shuffling game, all of it traces back to questions Hofstadter laid out in a single book published before most of today's AI engineers were born. Here is the story almost nobody tells you about how the book came to exist. Hofstadter was the son of Robert Hofstadter, who won the Nobel Prize in Physics in 1961 for measuring the size of the proton. He was supposed to follow in his father's footsteps. He started a physics PhD at the University of Oregon. He was miserable. He could not focus. He did not love the work. He kept getting pulled toward something else. The something else was a single question that had haunted him since childhood. How can meaning emerge from meaningless symbols? Specifically, how does a brain, which is made of nothing but cells firing electrical signals at each other, produce something that feels like consciousness, like understanding, like a self? He could not let the question go. He left physics. He started writing. The book took him years. He wrote it largely in isolation, working in the basement of his parents' house and at Indiana University, where he eventually finished it. 
He thought it would be read by maybe a few hundred logicians and AI researchers. Basic Books published it in 1979 as a 777-page hardcover. The next year it won the Pulitzer Prize for general non-fiction and the National Book Award for science. The book is structured in a way that almost no other book has ever attempted. The chapters alternate between two layers. One layer is technical chapters about logic, computability, neuroscience, and AI. The other layer is fictional dialogues between a tortoise and Achilles, characters borrowed from a paradox by Lewis Carroll. The dialogues play with the same ideas the technical chapters explain. Read in order, they do not feel like a textbook. They feel like a strange house with rooms that loop back into each other and corridors that change shape behind you. The first thing the book does is explain Gödel's incompleteness theorems in a way no math textbook had ever managed. Kurt Gödel, an Austrian logician working in 1931, proved something that broke mathematics. He showed that any formal system powerful enough to describe arithmetic contains statements that are true but cannot be proven inside that system. Mathematics, the most certain thing humans had ever built, has holes in it that can never be filled. Hofstadter spends hundreds of pages making you understand this proof not just as a mathematical theorem, but as a structural fact about every sufficiently complex system. Including the brain. Including any AI. The reason AI alignment is genuinely hard is not just engineering. It is structural. Any system smart enough to model itself will contain truths about itself it cannot reach from inside itself. Hofstadter showed this 50 years before AI safety was a field. The second thing the book does is introduce his core idea. He calls it the strange loop. A strange loop is what happens when a system, by climbing through layers of itself, somehow ends up back where it started. 
Escher's drawings of staircases that always go up but somehow loop back are visual strange loops. Bach's musical canons that modulate up through keys and end on the original note are auditory strange loops. Gödel's self-referential statements that talk about themselves are logical strange loops. Hofstadter argues that consciousness is a strange loop. Your brain builds a model of the world. Inside that model, it builds a model of itself perceiving the world. Inside that self-model, it builds a model of itself thinking about itself perceiving the world. The recursion does not bottom out. The self is what the loop feels like from the inside. This is the part that AI researchers cannot stop returning to. Modern transformer models use self-attention, which is technically a mechanism where a network attends to its own internal states across layers. Recursive reasoning, where a model thinks about its own thinking, is now a research area with its own conferences. Meta-learning, where models learn how to learn, is a direct descendant of what Hofstadter described in 1979 as the necessary structure of any conscious system. He wrote the philosophy. The engineers are now building the implementation. The third thing the book does is the part that haunts every AI conversation today. Hofstadter argued that meaning is not something separate from symbol manipulation. It is what symbol manipulation looks like from the inside, when the manipulation is complex enough and self-referential enough. A simple lookup table does not understand anything. But a system that processes symbols at sufficient depth, with enough self-modeling, with enough recursion, starts to look identical from the outside, and possibly from the inside, to a system that understands. This is the deepest question in modern AI. When ChatGPT generates a response, is it actually thinking, or is it just doing very fast symbol shuffling? 
Hofstadter spent 800 pages arguing that the distinction may not exist at sufficient scale. If a system shuffles symbols according to the right structure, meaning is what the shuffling looks like from the inside. You can read modern debates about AI consciousness from Yann LeCun, Geoffrey Hinton, Ilya Sutskever, and David Chalmers, and you will find that they are all, in their own ways, having the argument Hofstadter framed in 1979. The fourth thing the book did is the one that took the longest to be vindicated. Hofstadter argued, and continued arguing for decades, that the actual engine of human intelligence is not logic. It is not deduction. It is not pattern matching in any simple sense. It is analogy. The ability to see one thing as similar to another thing, to map the structure of one situation onto a different situation, is, in his view, the core of thought itself. For decades this was unfashionable. Symbolic AI focused on logic and rules. Statistical AI focused on pattern matching. Almost nobody worked seriously on analogy. Then large language models started working. And the people who looked closely at what they were doing realized something uncomfortable. LLMs are, fundamentally, analogy machines. They learn structural patterns from text and apply those patterns by analogy to new situations. They do not deduce. They do not reason logically by default. They map the shape of one thing onto the shape of another thing and produce output that fits the new shape. Hofstadter saw this before any of it existed. His later book Surfaces and Essences, written with Emmanuel Sander, is 600 pages defending the claim that analogy is the core of cognition. It came out in 2013. It was largely ignored. The ChatGPT release in 2022 was, in some sense, a vindication of the entire argument. The strangest thing about reading Gödel, Escher, Bach in 2026 is realizing how lonely the book must have felt when it was written. In 1979 there was no GPT. No deep learning. No transformer. 
The dominant approach to AI was symbolic logic, and most researchers thought minds were going to be programmed top-down, rule by rule, like a complicated chess engine. Hofstadter said the opposite. He said minds were emergent. They came from the bottom up. They were strange loops in complex substrates. The programmers' approach would never produce real intelligence because it was missing the recursive self-modeling that made minds real. He was right. The book is hard. I had to use all the LLMs and NotebookLM to understand it. It is not a beach read. You do not finish it in a weekend. The math chapters require attention. The dialogues require patience. Most people who buy it never finish it. That is fine. The book is structured so that reading any 50 pages produces a permanent shift in how you think. Bill Gates lists it among the books that shaped him. Steve Jobs read it. Almost every senior AI researcher in the world will tell you it was the book that made them fall in love with the question of intelligence in the first place. Hofstadter himself has been in doubt about modern LLMs. He has said they may have proven him right about analogy and wrong about consciousness at the same time. He is still writing. He is still working on the same question that pulled him out of physics 50 years ago. The 800-page book that explained intelligence before AI existed is sitting one click away from you. Most people will never open it. The ones who do will see the world differently for the rest of their lives.

445

144.7K

Sasha Krassovsky@bztree·2 May

@cmuratori "Loading" and "updating the screen" don't seem all that related anyway? Like you can have a game giving you a load screen and updating the loading spinner at 60 FPS if it's really loading GBs from your hard drive.

279

Casey Muratori@cmuratori·2 May

At this point I feel like I should do a stream tomorrow to talk about the replies I've seen to this post. I completely disagree with people's umbrage about use of FPS as a metric here: A) that is exactly what time-to-show actually is (we measure 1% and 0.1% lows for a reason!), and B) to me, FPS is the most relatable number for response time for average people to understand given that they don't work on software performance for a living like I do. Many people (especially gamers!) intuitively know what 10 or 11fps responsiveness feels like for an action. Few intuitively know what "94ms" responsiveness feels like. I also find it unacceptable to call this "load time" because the user is not asking to "load" anything - it is an action they are taking from a UI that they perceive to be contiguous, and the choice to involve a "load" of any kind at this point is purely the fault of the designers of the system, not some inevitability. Everything has already "loaded" from the point of view of the user, and if you are claiming to have done a rewrite with performance "top-of-mind", you should have preloaded or precached whatever it is that you believe takes 94ms to "load" here.

Casey Muratori@cmuratori

Just want to make sure I'm reading this right: Microsoft rewrote the run dialog with performance "top-of-mind", and the best they could manage to do when putting up a single text box was 10fps?

922

48.2K
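
The two numbers in this exchange, 94ms and roughly 10fps, are the same measurement in different units. A minimal sketch of the conversion (the function names are illustrative and do not come from the thread):

```python
def ms_to_fps(latency_ms: float) -> float:
    """Updates per second that a given per-update latency corresponds to."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def fps_to_ms(fps: float) -> float:
    """Per-update latency in milliseconds for a given update rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"{ms_to_fps(94):.1f} fps")  # 94 ms per update -> 10.6 fps
    print(f"{fps_to_ms(60):.2f} ms")   # 60 fps -> 16.67 ms per update
```

Read this way, a 94ms response sits between the "10 or 11fps" figures quoted in the post.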

Sasha Krassovsky@bztree·25 Apr

@awesomekling EU nutrition labels drive me nuts. The per 100g system is obviously inferior. Like suppose I want to have a protein bar. I care how much is in the protein bar, not in a glob of 2.37 protein bars

151

Andreas Kling@awesomekling·24 Apr

EU nutrition labels:
- nutrients per 100g

US nutrition labels:
- nutrients per 0.5 cup
- nutrients per 3 pieces
- nutrients per 5 sprays

538

24.5K
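
Converting between the two labelling systems in this thread is a single multiplication. A small sketch, assuming a hypothetical bar with 30 g of protein per 100 g; only the "2.37 bars per 100 g" figure comes from the reply above, and it implies a bar of about 42.2 g:

```python
def per_serving(amount_per_100g: float, serving_grams: float) -> float:
    """Scale a per-100g label value to a serving of the given mass."""
    return amount_per_100g * serving_grams / 100.0


def servings_per_100g(serving_grams: float) -> float:
    """How many servings of this mass make up 100 g."""
    return 100.0 / serving_grams


if __name__ == "__main__":
    bar_grams = 100.0 / 2.37        # bar mass implied by "2.37 bars per 100 g"
    protein_per_100g = 30.0         # hypothetical label value, not from the thread
    print(f"bar mass: {bar_grams:.1f} g")
    print(f"protein per bar: {per_serving(protein_per_100g, bar_grams):.1f} g")
```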

Sasha Krassovsky@bztree·20 Apr

@filpizlo @zuhaitz_dev My mind was blown reading the C++ FQA for the first time. C++ really is just reinventing every C feature in its own way.

Filip Jerzy Pizło@filpizlo·20 Apr

@zuhaitz_dev I'm talking about the full latest version of C++ It's just sugar. C and C++ are two dialects of the same thing

9.9K

Zuhaitz@zuhaitz_dev·19 Apr

The fact that you put C and C++ together means that you know neither. Or that you don't know C++ at least. Besides that. It's more about the time you put into it. It can be a lot faster, mainly if you are neurodivergent, which you surely are if you are considering these langs 🤓

Edison@CodeEdison

How long does it take to learn popular programming skills?
- HTML / CSS: 3 months
- JavaScript: 4 months
- Python: 3 months
- React: 3 months
- Java: 9 months
- PHP: 3 months
- C#: 9 months
- SQL: 8 months
- MongoDB: 5 months
- Golang: 2 months
- Backend (REST APIs): 2 months
- Backend (GraphQL): 3 months
- Mobile App Development: 12 months
- Manual Testing: 3 months
- Automation Testing: 12 months
- Data Science: 15 months
- AI / ML: 12 months
- C / C++: 18 months

Learning a language takes months. Getting good takes years. Getting a job takes patience, projects, and luck.

83.4K

Sasha Krassovsky@bztree·7 Apr

OK last post for the night: I tried all the fancy stuff they recommended in their GEMM doc: Z-curve, static extents, accumulation group synchronization. None of it seemed to make any performance improvement - I seem to be stuck at 40 TFLOPs in bf16 across a variety of shapes.

Sasha Krassovsky@bztree

I got my M5 MacBook over the weekend and had some time to mess around with Metal 4 and the Neural Accelerators! Wanted to document some of my first impressions below:

3.4K
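
For context on the "40 TFLOPs" figure: GEMM throughput is conventionally counted as 2·M·N·K floating-point operations (one multiply and one add per inner-product term) divided by the run time. A minimal sketch of that bookkeeping; the 4096-cubed shape is a hypothetical example, since the post does not say which shapes were measured:

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations in C[m,n] = A[m,k] @ B[k,n] (multiply + add)."""
    return 2 * m * n * k


def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in TFLOPS for a GEMM that took `seconds`."""
    return gemm_flops(m, n, k) / seconds / 1e12


def seconds_at(m: int, n: int, k: int, tflops: float) -> float:
    """Run time implied by a given sustained throughput."""
    return gemm_flops(m, n, k) / (tflops * 1e12)


if __name__ == "__main__":
    # Hypothetical shape; 40 TFLOPS is the figure quoted in the post.
    t = seconds_at(4096, 4096, 4096, 40.0)
    print(f"4096^3 GEMM at 40 TFLOPS: {t * 1e3:.2f} ms")
```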

Sasha Krassovsky@bztree·7 Apr

@__simt__ @anemll @ekryski @mweinbach How do you use the fp19? When I had my metal kernel mark the inputs as `float`, the profiler seemed to tell me it wasn't using the neural accelerator, but the normal fp32 ALUs?

olivier giroux@__simt__·7 Apr

@anemll @ekryski @mweinbach @bztree Correct, except the fp32 is fp19
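
The thread does not spell out what "fp19" looks like. One plausible reading, and it is only an assumption, is a 19-bit format with 1 sign bit, 8 exponent bits and 10 mantissa bits, the same layout as NVIDIA's TF32. Under that assumption, the precision loss relative to fp32 can be emulated by clearing the low 13 mantissa bits (real hardware may round rather than truncate):

```python
import struct


def truncate_to_fp19(value: float) -> float:
    """Emulate an ASSUMED 1+8+10-bit format by truncating an fp32 mantissa.

    The bit layout is a guess modelled on TF32; the thread does not confirm it.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    bits &= 0xFFFFE000  # keep sign, exponent, and the top 10 of 23 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for x in (1.0, 1.0 + 2**-10, 1.0 + 2**-11, 3.14159265):
        print(f"{x!r} -> {truncate_to_fp19(x)!r}")
```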

Sasha Krassovsky@bztree·7 Apr

I got my M5 MacBook over the weekend and had some time to mess around with Metal 4 and the Neural Accelerators! Wanted to document some of my first impressions below:

238

58.3K

Sasha Krassovsky@bztree·7 Apr

@ekryski Code here - you can run `uv run test_kernels.py`. Make sure you set `MTL_CAPTURE_ENABLED=1`! github.com/save-buffer/py…

Eric Kryski@ekryski·7 Apr

@bztree Good stuff! Any chance you got an open source repo of your experiment? 👀 Asking for a friend...

Sasha Krassovsky@bztree·7 Apr

@mweinbach I believe that’s for allocating Tensors on the host, nothing to do with the neural accelerator

328

Max Weinbach@mweinbach·7 Apr

@bztree I’m not sure if these are just the API support or the actual silicon but may be useful nonetheless developer.apple.com/documentation/…

372

Sasha Krassovsky@bztree·7 Apr

@mweinbach Yes! It’s a very good document. I haven’t implemented the Z-order traversal they recommend yet, but plan to.

3.1K

Max Weinbach@mweinbach·7 Apr

@bztree Have you seen this? developer.apple.com/download/files…

3.8K
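
The Z-order traversal mentioned in this exchange usually means visiting output tiles along a Morton (Z-order) curve, so that consecutive threadgroups work on tiles that are close together in both dimensions and tend to share cached rows of A and columns of B. The sketch below shows only the generic index mapping; it is not taken from Apple's document, and `morton_decode` is an illustrative name:

```python
def morton_decode(index: int) -> tuple[int, int]:
    """Map a linear index to (x, y) by de-interleaving its bits.

    Even-numbered bits of `index` form x, odd-numbered bits form y.
    """
    x = y = 0
    bit = 0
    while index:
        x |= (index & 1) << bit
        index >>= 1
        y |= (index & 1) << bit
        index >>= 1
        bit += 1
    return x, y


if __name__ == "__main__":
    # Tile visit order for the first 8 linear indices:
    # (0,0) (1,0) (0,1) (1,1) (2,0) (3,0) (2,1) (3,1)
    print([morton_decode(i) for i in range(8)])
```

In a kernel, the linear threadgroup index would be decoded this way to pick which output tile to compute, in place of a plain row-major mapping.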

Sasha Krassovsky@bztree·7 Apr

@__alpoge__ @gallabytes 🪵🪵🪵

levent@__alpoge__·7 Apr

@gallabytes Axis issue

744

Sasha Krassovsky@bztree·7 Apr

Overall had a fun time! To close off with some criticisms:
- it took me a long time to figure out how to enable Metal 4. I wish this were better-documented
- MPP seems a little boiler-platey. I wish there were a slightly more convenient syntax for this stuff, but not a dealbreaker.

Hope this was interesting!

4.4K

Sasha Krassovsky@bztree·7 Apr

I was also expecting a much more dramatic speedup from the Neural Accelerator. It seemed that with my original tile size of 32x32, I was only getting 244 GB/s of memory bandwidth. Bumping it up to 64x64 gave me 740 GB/s, dropping the time to 3.36ms!

5.1K
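
One reason tile size matters so much in a tiled GEMM is data reuse. In a simple model where each T×T output tile reads a T×K strip of A and a K×T strip of B with no sharing between tiles, total reads scale as 1/T and the FLOPs done per byte read scale as T. The sketch below works through that model for bf16 (2 bytes per element). The 4096-cubed shape is hypothetical, and the model ignores caches and writes to C, so it illustrates the trend without attempting to reproduce the measured GB/s figures in the post:

```python
def tiled_gemm_read_bytes(m: int, n: int, k: int, tile: int,
                          bytes_per_elem: int = 2) -> int:
    """Bytes read from A and B for C[m,n] = A[m,k] @ B[k,n] with TxT output tiles.

    Assumes m and n are multiples of `tile` and no reuse between tiles.
    """
    tiles = (m // tile) * (n // tile)
    # Each output tile needs a tile-by-k strip of A and a k-by-tile strip of B.
    per_tile = 2 * tile * k * bytes_per_elem
    return tiles * per_tile


def arithmetic_intensity(tile: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte read: (2*T*T*K) / (2*T*K*bytes_per_elem) = T / bytes_per_elem."""
    return tile / bytes_per_elem


if __name__ == "__main__":
    for t in (32, 64):
        gib = tiled_gemm_read_bytes(4096, 4096, 4096, t) / 2**30
        print(f"tile {t}x{t}: {gib:.0f} GiB read, "
              f"{arithmetic_intensity(t):.0f} FLOPs/byte")
```

Under these assumptions, going from 32x32 to 64x64 halves the bytes read and doubles the work done per byte.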
