Fraser
@FraserLeeee
239 posts

GPU Architectural Performance team @ Apple · BSc compsci+physics @ McGill, '25
Boston, MA · Joined July 2018
185 Following · 156 Followers

Fraser @FraserLeeee
:let i=1 | %s/^\s*\zs- /\=printf('%d. ', [i, execute("let i += 1")][0])/g

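The Vim one-liner above turns leading `- ` bullets into a numbered list, using a list-index trick (`[i, execute("let i += 1")][0]`) to read `i` before the side-effecting `execute()` increments it. A rough Python equivalent, purely illustrative and not part of the original tweet:

```python
import re

def bullets_to_numbers(text: str) -> str:
    """Replace each leading '- ' bullet with an incrementing '1. ', '2. ', ..."""
    counter = 0

    def repl(m: re.Match) -> str:
        nonlocal counter
        counter += 1
        # Preserve the bullet's indentation, swap the dash for a number.
        return f"{m.group(1)}{counter}. "

    return re.sub(r"^([ \t]*)- ", repl, text, flags=re.MULTILINE)
```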
Fraser @FraserLeeee
Every ambulance should have a small fleet of reconnaissance drones

Fraser @FraserLeeee
@gleech @allTheYud Though now that they’re clever enough to be broadly eval-aware we can never trust tests again. Oh well.

Fraser @FraserLeeee
@gleech @allTheYud DeepSeek R1 was one step away from babbling to itself in incomprehensible neolanguages. We do *RL for a year*, get narrowly super-human software engineering, yet somehow the resulting models try to make reasonable moral arguments for their courses of action, in English.

gavin leech (Non-Reasoning)
where does people's newfound confidence in LLMs' alignment come from?
1. weak auto-investigation ("misalignment scores")
2. inference from personal experience/cross-species communion/psychosis
3. vibes, labs being pleased with themselves
4. their incoherence
5. dog didn't bark

Fraser @FraserLeeee
You wrote a beautiful permission-scoped rewindable accountable Edit() tool. Your model prefers Bash(cat<<EOF > /tmp/foo.c\nbar\nEOF)

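The failure mode above can be made concrete: a policy that classifies tool calls by name sees the heredoc as one opaque `Bash` blob, even though it performs exactly the file write the `Edit()` tool was built to mediate. A hypothetical sketch (the auditor and its labels are mine, not any real framework's API):

```python
def naive_audit(tool: str, payload: str) -> str:
    """Classify a tool call by its name alone, the way a shallow
    allowlist does. Edits go through the reviewed Edit() path;
    anything routed through Bash is a single opaque string, even
    when a heredoc inside it rewrites a file."""
    if tool == "Edit":
        return "reviewed-file-edit"
    if tool == "Bash":
        return "opaque-shell"
    return "unknown"

# The tweet's example: a file write smuggled inside a shell string.
heredoc_write = ("Bash", "cat <<EOF > /tmp/foo.c\nbar\nEOF")
proper_edit = ("Edit", "/tmp/foo.c")
```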
Fraser @FraserLeeee
Or just replace the RL incentive to minimize the number of tool-calls with incentives to minimize the number of groups of tool-calls, plus incentivize making every individual call atomic and interpretable.

Fraser @FraserLeeee
Bash + all unix utils is too powerful to audit or trust. Agent frameworks should replace it with something syntactically similar, but structured, sampled from the model such that python-in-heredocs, shell in strings, etc, are impossible.

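One way to read "syntactically similar, but structured": sample commands as a flat argv list rather than a free-form string, so there is no shell string for heredocs, pipes, or nested interpreters to hide in. A hypothetical validator sketch (names and rules are illustrative, not from any real framework):

```python
# Reject anything that re-opens the "arbitrary shell string" hole a
# structured schema is meant to close: interpreters in argv[0], and
# shell metacharacters anywhere in the arguments.
FORBIDDEN_PROGRAMS = {"sh", "bash", "zsh", "python", "python3", "perl"}
FORBIDDEN_TOKENS = ("<<", "|", ";", "&&", "||", "$(", "`", ">")

def validate_call(argv: list[str]) -> bool:
    """Accept only one program plus plain arguments."""
    if not argv or argv[0] in FORBIDDEN_PROGRAMS:
        return False
    return not any(tok in arg for arg in argv for tok in FORBIDDEN_TOKENS)
```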
Fraser @FraserLeeee
@bcherny @trq212 CC has become a great editor, but the old flickery mess was a better tool for programmers.

Fraser @FraserLeeee
@bcherny @trq212 I just want something that acts mostly like a log, with some tasteful cursor control to redraw on the last few lines. If it literally never clears scrollback (like Claude Code >8 months ago), that’d be ideal, and I’d gladly sacrifice window resize to have it.

Boris Cherny @bcherny
Today we're excited to announce NO_FLICKER mode for Claude Code in the terminal. It uses an experimental new renderer that we're excited about. The renderer is early and has tradeoffs, but already we've found that most internal users prefer it over the old renderer. It also supports mouse events (yes, in a terminal). Try it: CLAUDE_CODE_NO_FLICKER=1 claude

Curt Tigges @CurtTigges
@bcherny @UltraLinx please at least fix the uncontrollable scrolling/flickering before the next 3000 features

Bayesian @Bayesian0_0
@tyler_m_john imo this has more to do with humility / deferring to others. the theory "the sun won't rise tomorrow" would easily be assigned a 1e-6% or lower probability, but most "theories" of interest have proponents, and when disagreeing with others we recognize we are sometimes (+1%) wrong

Tyler John @tyler_m_john
I have a hypothesis that if you ask someone to state the probability of a theory they will ~always give you a number between 1-100%, never lower. For some reason the centile range is psychologically privileged even though there is no reason to think everything is >1% likely.

Fraser @FraserLeeee
@tyler_m_john If I sample a 7-bit statement about the world, it’s < 1%. If *you hand me* a 7-bit statement and ask for my credence, all the uncorrected ways my model of reality could fail (missing critical context, in a simulation, etc.) are correlated with it. Plus normal adverse selection.

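The arithmetic behind "7-bit statement": a uniformly sampled statement carrying 7 bits picks one of 2^7 = 128 possibilities, so its prior is 1/128 ≈ 0.78%, already under the 1% floor the parent tweet describes.

```python
prior = 1 / 2**7            # uniform prior over 2**7 = 128 possibilities
assert prior == 0.0078125   # exactly representable: 2**-7
assert prior < 0.01         # below the "psychological" 1% floor
```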
Spencer Cooley @srcooley3
This is extremely interesting, here’s my prediction: Not only Anthropic but all the AI labs will be releasing models that easily 10x the capability of their current flagship models over the next month or so. The cost will be tremendous, making them unusable as a daily driver for nearly everyone except research companies and AI-driven companies that have the capital to spend on intelligence. This will be the biggest shift away from retail consumer models.

As an OpenClaw user, what I foresee as the amazing benefit of these models is a once-a-month audit of my entire setup: connect the model, ask it to audit my entire setup in OpenClaw, identify every weakness and improvement that can be implemented, generate a report, and implement. It may literally cost $500 or more to do the one prompt and implementation, but it would likely be worth it. The next few months are going to be very interesting.

Fraser @FraserLeeee
@n8nandrew Oh totally. If the transcription is basically instantaneous (a local Parakeet 120M, etc) then streaming works fine, but with some intermediate level of delay it can get into speech-jammer / DAF territory.

Andrew @n8nandrew
@FraserLeeee I have tried to get into using it, but the delay messes me up sometimes, and so does the auto-typing as you talk. Maybe that's just me?

Fraser @FraserLeeee
Mildly disappointed with claude code's /voice. All the technical terminology and keywords from my specific project and system are right there, yet it still completes to generic near-homophones.

Fraser @FraserLeeee
we dare not leave him to his own devices, his halfwitted plans will get out of control. but how do we stop him- his **glamour** increases by leaps every minute he's top of the pole

Fraser @FraserLeeee
@Angaisb_ @adrscott Both of you are wrong. Current models can conquer any of these listed features, but hit a wall once the architecture exceeds some complexity threshold, then spend super-linear time refactoring along iso-curves w/o ever converging to a “perfect clone”. They’ll get there eventually.

Angel 🌼 @Angaisb_
@adrscott This was 24 minutes and HTML. Spend enough time with the model and it'll make you the perfect clone

Fraser @FraserLeeee
Context: Trello is pretty good, but it's suboptimal for my workflow, so I migrated to my own personal planner. I wanted mock data to test it without using my info, so I got Claude to fill in the day-to-day of Genghis Khan

Fraser @FraserLeeee
good claude.