Ryan Saxe
23.9K posts

Ryan Saxe
@rcsaxe
Director of Global Engineering @abinbev | NYC Street Performer | MTG Cube Lover | Opinions are mine | He/Him
White Plains, NY · Joined March 2017
533 Following · 3.3K Followers
Ryan Saxe retweeted

New blog post: how to efficiently calibrate LLM uncertainty using semantic entropy as a reward signal. The calibration gap here is the difference between the confidence reported by the model and its accuracy. pic.twitter.com/qo2M8Xlh8U
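A minimal sketch of the calibration gap as the tweet defines it: mean reported confidence minus observed accuracy. This is not the blog post's method (the semantic-entropy reward is not shown here); the function name and the sample numbers are made up for illustration.

```python
from typing import Sequence


def calibration_gap(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean reported confidence minus observed accuracy.

    Positive values mean the model is overconfident,
    negative values mean it is underconfident.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need two non-empty sequences of equal length")
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(bool(c) for c in correct) / len(correct)
    return mean_confidence - accuracy


# Made-up numbers for illustration only: four answers, two of them right.
reported = [0.9, 0.8, 0.7, 0.6]
was_correct = [True, True, False, False]
print(f"calibration gap: {calibration_gap(reported, was_correct):+.2f}")
```

A single average like this hides where the miscalibration sits; binning answers by confidence before comparing would show that, at the cost of needing more samples.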

I wonder if I’ll eventually shift to the app instead of just sshing to my machine and attaching to the tmux session.
I kind of doubt it, but I’ll try this out
Boris Cherny @bcherny
Claude Code Remote is rolling out now for Pro users
Ryan Saxe retweeted

AI eliminated the natural barrier to entry that let OSS projects trust by default. People told me to do something rather than just complain. So I did. Introducing Vouch: explicit trust management for open source. Trusted people vouch for others. github.com/mitchellh/vouch
The idea is simple: Unvouched users can't contribute to your projects. Very bad users can be explicitly "denounced", effectively blocked. Users are vouched or denounced by contributors via GitHub issue or discussion comments or via the CLI.
Integration into GitHub is as simple as adopting the published GitHub actions. Done. Additionally, the system itself is generic to forges and not tied to GitHub in any way.
Who and how someone is vouched or denounced is up to the project. I'm not the value police for the world. Decide for yourself what works for your project and your community.
All of the data is stored in a single flat text file in your own repository that can be easily parsed by standard POSIX tools or mainstream languages with zero dependencies.
My hope is that eventually projects can form a web of trust so that projects with shared values can share their vouch lists with each other (automatically) so vouching or denouncing a person in one project has ripple effects through to other projects.
The idea is based on the already successful system used by @badlogicgames in Pi. Thank you Mario.
Ghostty will be integrating this imminently.
Ryan Saxe retweeted

Wrote up about my personal journey from AI skeptic to someone who finds a lot of value in it daily. My goal is to share a more measured approach to finding value in AI rather than the typical overly dramatic, hyped bait out there. mitchellh.com/writing/my-ai-…

@Chord_O_Calls Always my favorite thing about any format is the exploratory phase to figure this stuff out

Hey @AnthropicAI and @trq212, a quick request for the skills you package: set these up to use @astral_sh's uv with script specification for dependencies. This way you don't have to worry about skills with conflicting dependencies nor having claude try to install things.

Ryan Saxe retweeted

Andrej has such a well-regularized and calibrated world model, and his speech decoder is not excessively RL trained allowing for high mutual information with the WM. Very cool
Dwarkesh Patel @dwarkesh_sp
The @karpathy interview
0:00:00 – AGI is still a decade away
0:30:33 – LLM cognitive deficits
0:40:53 – RL is terrible
0:50:26 – How do humans learn?
1:07:13 – AGI will blend into 2% GDP growth
1:18:24 – ASI
1:33:38 – Evolution of intelligence & culture
1:43:43 – Why self driving took so long
1:57:08 – Future of education
Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc. Enjoy!
Ryan Saxe retweeted

I love hand-crafting release notes. I've always been against most automated changelog tooling. I think changelogs are a boundary point where humans read what other humans should write. It's a social experience as much as it can be on the internet.
But the Ghostty 1.2.1 release notes were started by AI. This is a first for me, and I'm sharing my full session here. I hand-reviewed and edited almost every part of the changelog afterwards, but I still think it was a success with a promising future.
I prompted the agent to use `gh` to inspect PRs and issues within a milestone, and past release notes (all 100% human written) and to generate a draft based on all of that.
It did a decent job and saved me some real time.
The descriptions themselves, the ordering, and subtle syntax were not great. I think a lot of this can be improved through prompting. I think I can reach a point where this can become a slash command for me.
I'll always hand review and edit the resulting work, because like I said, I think release notes are a human experience. And I want the release notes for my projects to be in my voice. But this helps get everything going, and importantly it helps with the rote stuff (commit/contrib counts, list of changes, etc.).
Full session: ampcode.com/threads/T-4bb8…

@sammcallister How long is this running for?
Will stop by Monday or Wednesday if it’s around next week.

Touched grass
Met a bunch of amazing people
Generated zero seconds of slop video
📍Air Mail, West Village


sam mcallister @sammcallister
There couldn't be a better day to keep thinking than today
Ryan Saxe retweeted

@iannuttall I’m pretty sure this is just a UI bug.
If you set model: opusplan in your settings.json, it works.