Fraser

239 posts

@FraserLeeee

GPU Architectural Performance team @ Apple
BSc compsci+physics @ McGill, '25

Boston, MA · Joined July 2018

185 Following · 156 Followers

Fraser@FraserLeeee·56m

:let i=1 | %s/^\s*\zs- /\=printf('%d. ', [i, execute("let i += 1")][0])/g
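
For readers who don't use Vim, the one-liner above appears to turn each leading `- ` bullet into a sequentially numbered item while keeping its indentation. A rough Python equivalent, as a sketch only (the function name `number_bullets` is made up for illustration):

```python
import re


def number_bullets(text: str) -> str:
    """Replace each leading '- ' bullet with '1. ', '2. ', ... keeping indentation."""
    counter = 0

    def repl(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        # group(1) is the leading indentation, which is preserved as-is.
        return f"{match.group(1)}{counter}. "

    # ^([ \t]*)-  plays the role of Vim's ^\s*\zs- : only the dash and the
    # following space are replaced, once per line.
    return re.sub(r"^([ \t]*)- ", repl, text, flags=re.MULTILINE)
```

Lines that do not start with a bullet are left untouched, and the counter keeps running across them.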

Fraser@FraserLeeee·1d

Every ambulance should have a small fleet of reconnaissance drones

Fraser@FraserLeeee·5d

@gleech @allTheYud Though now that they’re clever enough to be broadly eval-aware we can never trust tests again. Oh well.

Fraser@FraserLeeee·5d

@gleech @allTheYud DeepSeek R1 was one step away from babbling to itself in incomprehensible neolanguages. We do *RL for a year*, get narrowly super-human software engineering, yet somehow the resulting models try to make reasonable moral arguments for their courses of action, in English.

gavin leech (Non-Reasoning)@gleech·15 Apr

where does people's newfound confidence in LLMs' alignment come from?
1. weak auto-investigation ("misalignment scores")
2. inference from personal experience/cross-species communion/psychosis
3. vibes, labs being pleased with themselves
4. their incoherence
5. dog didn't bark

Fraser@FraserLeeee·6d

You wrote a beautiful permission-scoped rewindable accountable Edit() tool. Your model prefers Bash(cat<<EOF > /tmp/foo.c\nbar\nEOF)

Fraser@FraserLeeee·6d

Or just replace RL incentive to minimize number of tool-calls, with incentives to minimize number of groups of tool-calls, plus incentivize making every individual call atomic and interpretable.

Fraser@FraserLeeee·6d

Bash + all unix utils is too powerful to audit or trust. Agent frameworks should replace it with something syntactically similar, but structured, sampled from the model such that python-in-heredocs, shell in strings, etc, are impossible.
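
One way to picture the proposal above, as a sketch only: the model emits a structured command (a program plus an argument list) instead of a shell string, and the framework rejects anything that could carry nested code. `Command`, `ALLOWED_PROGRAMS`, `SHELL_METACHARS`, and `validate` are hypothetical names for illustration, not part of any existing agent framework.

```python
from dataclasses import dataclass

# Hypothetical allowlist; a real framework would choose its own set.
ALLOWED_PROGRAMS = {"ls", "cat", "grep", "wc"}
# Characters that only matter to a shell. They are rejected conservatively so
# that no argument can carry heredocs, pipes, or command substitution.
SHELL_METACHARS = set("|&;<>$`\n")


@dataclass(frozen=True)
class Command:
    program: str
    args: tuple[str, ...] = ()


def validate(cmd: Command) -> None:
    """Raise ValueError unless cmd is one allowlisted program with plain arguments."""
    if cmd.program not in ALLOWED_PROGRAMS:
        raise ValueError(f"program not allowed: {cmd.program}")
    for arg in cmd.args:
        if SHELL_METACHARS & set(arg):
            raise ValueError(f"shell syntax in argument: {arg!r}")
```

Under these assumptions, `Command("grep", ("-n", "main", "foo.c"))` passes, while `Command("bash", ("-c", "..."))` is rejected because no interpreter is on the allowlist.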

Fraser@FraserLeeee·2 Apr

@bcherny @trq212 CC has become a great editor, but the old flickery mess was a better tool for programmers.

Fraser@FraserLeeee·2 Apr

@bcherny @trq212 I just want something that acts mostly like a log, with some tasteful cursor control to redraw on the last few lines. If it literally never clears scrollback (like Claude Code >8 months ago), that’d be ideal, and I’d gladly sacrifice window resize to have it.

Boris Cherny@bcherny·1 Apr

Today we're excited to announce NO_FLICKER mode for Claude Code in the terminal. It uses an experimental new renderer that we're excited about. The renderer is early and has tradeoffs, but already we've found that most internal users prefer it over the old renderer. It also supports mouse events (yes, in a terminal). Try it: CLAUDE_CODE_NO_FLICKER=1 claude

Curt Tigges@CurtTigges

@bcherny @UltraLinx please at least fix the uncontrollable scrolling/flickering before the next 3000 features

Fraser@FraserLeeee·1 Apr

@Bayesian0_0 @tyler_m_john I’d take you up at those odds

Bayesian@Bayesian0_0·31 Mar

@tyler_m_john imo this has more to do with humility / deferring to others. the theory "the sun won't rise tomorrow" would easily be assigned a 1e-6% or lower probability, but most "theories" of interest have proponents, and when disagreeing with others we recognize we are sometimes (+1%) wrong

Tyler John@tyler_m_john·31 Mar

I have a hypothesis that if you ask someone to state the probability of a theory they will ~always give you a number between 1-100%, never lower. For some reason the centile range is psychologically privileged even though there is no reason to think everything is >1% likely.

Fraser@FraserLeeee·1 Apr

@tyler_m_john If I sample a 7-bit statement about the world, it’s < 1%. If *you hand me* a 7-bit statement and ask for my credence, all the uncorrected ways my model of reality could fail (missing critical context, in a simulation, etc.) are correlated with it. Plus normal adverse selection.
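
The "< 1%" figure follows from a simple count, assuming "7-bit statement" means one statement drawn uniformly from $2^7$ equally likely alternatives (an interpretation, not something the post spells out):

```latex
P = 2^{-7} = \frac{1}{128} \approx 0.78\% < 1\%
```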

Fraser@FraserLeeee·28 Mar

@srcooley3 @M1Astra I would put a lot of money on the other side of that bet.

Spencer Cooley@srcooley3·27 Mar

This is extremely interesting, here’s my prediction: Not only Anthropic but all the AI Labs will be releasing models that easily 10x the capability of their current flagship models over the next month or so. The cost will be tremendous, essentially being unable to be a daily driver for most everyone except research companies and AI driven companies that have the capital to spend on intelligence. This will be the biggest shift away from retail consumer models. As an OpenClaw user what I foresee the amazing benefit of these models will be is a once a month audit on my entire setup. Essentially connect in the model and ask it to audit my entire setup in OpenClaw, identify every weakness and improvement that can be implemented, generate a report and implement. It may literally cost $500 or more to do the one prompt and implementation but would likely be worth it. The next few months are going to be very interesting.

M1@M1Astra·27 Mar

Claude Mythos Blog Post Saved before it was taken down. m1astra-mythos.pages.dev

Fraser@FraserLeeee·14 Mar

@n8nandrew Oh totally. If the transcription is basically instantaneous (a local Parakeet 120M, etc) then streaming works fine, but with some intermediate level of delay it can get into speech-jammer / DAF territory.

Andrew@n8nandrew·14 Mar

@FraserLeeee I have tried to get into using it but the delay messes me up sometimes and the auto typing as you talk, maybe that's just me?

Fraser@FraserLeeee·14 Mar

Mildly disappointed with claude code's /voice. All the technical terminology and keywords from my specific project and system are right there, yet it still completes to generic near-homophones.

Fraser@FraserLeeee·9 Mar

we dare not leave him to his own devices, his halfwitted plans will get out of control. but how do we stop him- his **glamour** increases by leaps every minute he's top of the pole

Fraser@FraserLeeee·6 Mar

@Angaisb_ @adrscott Both of you are wrong. Current models can conquer any of these listed features, but hit a wall once architecture exceeds some complexity threshold, spend super-linear time refactoring along iso-curves w/o ever converging to a “perfect clone”. They’ll get there eventually.

Angel 🌼@Angaisb_·6 Mar

@adrscott This was 24 minutes and HTML. Spend enough time with the model and it'll make you the perfect clone

Angel 🌼@Angaisb_·5 Mar

GPT-5.4, it's basically perfect (it took it around 24 minutes). Yeah, Minecraft is pretty much solved, I have to find a new test now

Angel 🌼@Angaisb_

GPT-5.3 Codex is actually pretty insane with Three.js. This Minecraft clone works smoothly and it didn't take too long to make. I also tried Opus 4.6, but for some reason it got stuck

Fraser@FraserLeeee·28 Feb

Context: Trello is pretty good, but it's suboptimal for my workflow, so I migrated to my own personal planner. I wanted mock-data to test it without using my info, so I got Claude to fill in the day-to-day of Genghis Khan

Fraser@FraserLeeee·28 Feb

good claude.
