Fraser
239 posts

Fraser
@FraserLeeee
GPU Architectural Performance team @ Apple. BSc compsci+physics @ McGill, '25
Boston, MA · Joined July 2018
185 Following · 156 Followers

@gleech @allTheYud Though now that they’re clever enough to be broadly eval-aware we can never trust tests again. Oh well.

@gleech @allTheYud DeepSeek R1 was one step away from babbling to itself in incomprehensible neolanguages. We do *RL for a year*, get narrowly super-human software engineering, yet somehow the resulting models try to make reasonable moral arguments for their courses of action, in English.

Today we're excited to announce NO_FLICKER mode for Claude Code in the terminal
It uses an experimental new renderer that we're excited about. The renderer is early and has tradeoffs, but already we've found that most internal users prefer it over the old renderer. It also supports mouse events (yes, in a terminal).
Try it: CLAUDE_CODE_NO_FLICKER=1 claude
Curt Tigges @CurtTigges
@bcherny @UltraLinx please at least fix the uncontrollable scrolling/flickering before the next 3000 features

@tyler_m_john imo this has more to do with humility / deferring to others. the theory "the sun won't rise tomorrow" would easily be assigned a 1e-6% or lower probability, but most "theories" of interest have proponents, and when disagreeing with others we recognize we are sometimes (+1%) wrong

@tyler_m_john If I sample a 7-bit statement about the world, it’s < 1%
If *you hand me* a 7-bit statement and ask for my credence, all the uncorrected ways my model of reality could fail (missing critical context, in a simulation, etc.) are correlated with it.
Plus normal adverse selection.

@srcooley3 @M1Astra I would put a lot of money on the other side of that bet.

This is extremely interesting, here’s my prediction:
Not only Anthropic but all the AI Labs will be releasing models that easily 10x the capability of their current flagship models over the next month or so.
The cost will be tremendous, making them impractical as a daily driver for almost everyone except research companies and AI-driven companies that have the capital to spend on intelligence.
This will be the biggest shift away from retail consumer models.
As an OpenClaw user, the big benefit I foresee from these models is a once-a-month audit of my entire setup.
Essentially, connect the model and ask it to audit my entire OpenClaw setup, identify every weakness and possible improvement, generate a report, and implement the changes.
It may literally cost $500 or more to do the one prompt and implementation but would likely be worth it.
The next few months are going to be very interesting.

Claude Mythos Blog Post
Saved before it was taken down.
m1astra-mythos.pages.dev

@n8nandrew Oh totally. If the transcription is basically instantaneous (a local Parakeet 120M, etc) then streaming works fine, but with some intermediate level of delay it can get into speech-jammer / DAF territory.

@FraserLeeee I have tried to get into using it, but the delay and the auto-typing as you talk mess me up sometimes. Maybe that's just me?

@Angaisb_ @adrscott Both of you are wrong.
Current models can conquer any of these listed features, but they hit a wall once the architecture exceeds some complexity threshold, and then spend super-linear time refactoring along iso-curves w/o ever converging to a “perfect clone”.
They’ll get there eventually.

GPT-5.4, it's basically perfect (it took it around 24 minutes)
Yeah, Minecraft is pretty much solved, I have to find a new test now
Angel 🌼 @Angaisb_
GPT-5.3 Codex is actually pretty insane with Three.js. This Minecraft clone works smoothly and it didn't take too long to make. I also tried Opus 4.6, but for some reason it got stuck.






