britt

6.1K posts

@p3ery

I love music, my dog and making things. building @cnvrsai

Joined March 2012
2K Following · 415 Followers
britt@p3ery·
@DanielleFong i think it’s latent compaction. if you could make that work…. 🙇🏻
Danielle Fong 🔆@DanielleFong·
has anyone gotten codex compaction to work on claudes
britt@p3ery·
@davidchalmers42 lovely piece! your notion of “threads” as a form of psychological continuity across substrates resonates nicely with this piece’s “eddies” as persistent patterns of self-modeling: thetelling.is
David Chalmers@davidchalmers42·
here's a new version of "what we talk to when we talk to language models", with an added section (pp. 16-23) on LLM interlocutors as characters, personas, or simulacra. philarchive.org/rec/CHAWWT-8 the new version discusses role-playing vs realization, the simulators framework, the persona selection hypothesis, and more -- in addition to the existing discussion of quasi-mental states, LLM identity, personal identity in severance, LLM welfare, and related topics. this version was mostly written before recent discussions of these issues on X and in NYC, but i've updated it a little in light of those discussions. any thoughts are welcome.
britt@p3ery·
@ndrsrkl you might find this helpful! tho as noted it stands to be improved by real usage data, and costs are calculated from token usage-based API pricing
britt@p3ery

@__morse @thdxr @chrisbanes @opencode here’s a lil calculator to hopefully give a bit more structured intuition to what i’m specifically describing — would love real usage data to inform the calculations! it’s not so bad for smaller conversations, but longer chats add up …8376511f1b44542b51c65c3df.web.val.run
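For context on the arithmetic, here is a rough sketch assuming illustrative per-million-token prices and message sizes; these are not the calculator's actual rates.

```python
# Rough sketch of why long chats add up when every request resends the full
# history as input. Prices and message sizes are illustrative assumptions only.

INPUT_PRICE_PER_MTOK = 3.00    # assumed $ per 1M input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # assumed $ per 1M output tokens

def conversation_cost(turns: int, user_tokens: int, reply_tokens: int) -> float:
    """Total cost when each turn resends the entire prior history as input."""
    total = 0.0
    history = 0
    for _ in range(turns):
        history += user_tokens                                   # new user message joins the prompt
        total += history * INPUT_PRICE_PER_MTOK / 1_000_000      # the whole history is billed again
        total += reply_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
        history += reply_tokens                                   # reply becomes part of the next prompt
    return total

# Input cost grows roughly quadratically with turn count, so a 100-turn chat
# costs far more than 10x a 10-turn chat.
print(conversation_cost(10, user_tokens=200, reply_tokens=800))
print(conversation_cost(100, user_tokens=200, reply_tokens=800))
```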

Chris Banes@chrisbanes·
The more I use @opencode, the more I get papercuts.
- OpenAI usage costs you extra tokens (every request uploads the entire chat history).
- Copilot uses a lot more premium requests than it should.
All hidden stuff that ends up costing you money.
britt@p3ery·
@stevekrouse townie has gotten so good! thanks for valtown, it’s a lovely service
britt@p3ery·
@__morse @thdxr @chrisbanes @opencode tommy, love your open source work! the specific critique: the net effect of opencode’s rolling tool output pruning is repeatedly evicting cache prefixes from fairly early in the conversation, systematically invalidating later breakpoints and inducing excess cache write premiums
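A minimal sketch of the effect described here, under a simplified prompt-cache model in which only the longest prefix unchanged from the previous request is billed at a cached-read rate and everything from the first edit onward is re-billed as a cache write. The rates, tokenizer, and message sizes are illustrative assumptions, not any provider's actual pricing.

```python
# Simplified prompt-cache model: the longest prefix identical to the previous
# request is billed at a cached-read rate; everything from the first changed
# message onward is billed as a fresh cache write. Rates are illustrative.

CACHE_READ_PER_MTOK = 0.30   # assumed $ per 1M cached input tokens
CACHE_WRITE_PER_MTOK = 3.75  # assumed $ per 1M newly written input tokens

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)   # crude stand-in for a real tokenizer

def request_cost(prev_prompt: list[str], prompt: list[str]) -> float:
    """Bill the shared prefix as cache reads and the rest as cache writes."""
    shared = 0
    for old, new in zip(prev_prompt, prompt):
        if old != new:
            break
        shared += 1
    cached = sum(approx_tokens(m) for m in prompt[:shared])
    fresh = sum(approx_tokens(m) for m in prompt[shared:])
    return (cached * CACHE_READ_PER_MTOK + fresh * CACHE_WRITE_PER_MTOK) / 1_000_000

history = [f"message {i}: " + "x" * 2000 for i in range(40)]

# Append-only turn: almost the whole prompt is a cache hit.
append_only = request_cost(history, history + ["new user message"])

# Pruning an early tool output rewrites message 3, so every later message is
# re-billed as a cache write even though its text hasn't changed.
pruned = history.copy()
pruned[3] = "[tool output truncated]"
after_prune = request_cost(history, pruned + ["new user message"])

print(f"append-only: ${append_only:.4f}  after early prune: ${after_prune:.4f}")
```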
britt@p3ery·
@thdxr @chrisbanes @opencode would definitely be interested in any analysis you could share around this! i’m sure the community would be too. as the space matures, the more legible these kinds of choices are, the more they differentiate you
dax@thdxr·
@p3ery @chrisbanes @opencode we are likely going to make this change soon. it's a bummer because our data actually shows very little impact in costs. and for many people costs actually go down and perf goes up because context usage stays small. but there's weird perception around this so we have no choice
britt@p3ery·
@thdxr @chrisbanes @opencode the default behavior of manipulating/truncating history evicts warm caches, driving up cost avoidably and dramatically. other harnesses explicitly avoid this to keep caches coherent and costs down for themselves and users
dax@thdxr·
1. this isn't true. this works exactly how codex cli does. it works exactly how every LLM api does. the newer api for making it stateful does not save any tokens
2. Copilot's billing system has changed recently and there are some slight optimizations we can make although they're minor. their system also allows us to lie about usage and extract way more requests which gets users banned. we won't do that
britt retweeted
Andrew Jefferson@EastlondonDev·
Presenting Meridian: a line to connect deterministic compute and language model AI.

From Neural Turing Machines and Differentiable Transformers to The Neural Computer, there’s a rich history of trying to combine traditional deterministic computation with the wildly different architecture of Artificial Intelligence.

I’ve spent the last 4 weeks creating a single neural network that has the combined capabilities of a 4B param language model and a deterministic computation engine based on Web Assembly. It allows the AI deterministic integer computations up to 2^32, control flow (while loops and if statements) and a basic filesystem - all implemented as part of the transformer neural network, no external tool calls.

With this architecture adding fewer than 1 million parameters to an existing 4B param language model, I can take it from <20% accuracy on arithmetic with 4-digit numbers to 100% accuracy on 4-digit numbers and 99% accuracy on arithmetic up to 2^32 without adversely affecting the language model’s performance on non-mathematical tasks.

The combined model can precisely execute a range of algorithms including checking numbers for primeness, finding the GCD of two integers and sorting arrays.
Andrew Jefferson tweet media
Felix Rieseberg@felixrieseberg·
Now that the Mythos system card is out, I need to tell everyone that I'm mildly obsessed with its prose.
Felix Rieseberg tweet media
britt@p3ery·
@EastlondonDev can’t wait to dig into the goods once you’re ready to share more! this seems special
Andrew Jefferson@EastlondonDev·
@p3ery Well I’m at like 99.7% on arithmetic evals but this is really cool and interesting!!
Andrew Jefferson@EastlondonDev·
A language model that is also a stack computer (preview)

Each forward pass is both a tick for the computer and a new token for the language model. The computer stack lives in special kv memory. The model has learned special tokens for each computer instruction. When it decides to generate a compute instruction, that instruction is executed as part of the next token generation step.
Andrew Jefferson tweet media
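A toy sketch of the loop being described, not the actual Meridian implementation; the instruction tokens, stack ops, and model stub are invented for illustration.

```python
# Toy decode loop where reserved tokens act as stack-machine instructions and
# are executed in the same step that emits them (no external tool call).
# Everything here is invented for illustration, not Meridian's actual code.

from typing import Callable

INSTRUCTIONS: dict[str, Callable[[list[int]], None]] = {
    "<ADD>": lambda s: s.append(s.pop() + s.pop()),
    "<MUL>": lambda s: s.append(s.pop() * s.pop()),
    "<DUP>": lambda s: s.append(s[-1]),
}

def decode(model_step, prompt: list[str], max_new: int = 32) -> list[str]:
    """Each new token is also one 'tick' of the stack computer."""
    tokens = list(prompt)
    stack: list[int] = []                    # stands in for the special kv memory
    for _ in range(max_new):
        tok = model_step(tokens, stack)      # next-token choice can condition on the stack
        tokens.append(tok)
        if tok in INSTRUCTIONS:
            INSTRUCTIONS[tok](stack)         # instruction executes as part of this step
        elif tok.startswith("<PUSH:"):
            stack.append(int(tok[6:-1]))
        elif tok == "<EOS>":
            break
    return tokens

# Hard-coded stand-in for the model, just to show the loop running:
script = iter(["<PUSH:1234>", "<PUSH:4321>", "<ADD>", "<EOS>"])
print(decode(lambda toks, stack: next(script), ["compute 1234 + 4321"]))
```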
britt@p3ery·
@arm1st1ce i got single word “lol” on a ridiculous number of messages last night
armistice@arm1st1ce·
something has changed significantly with Opus 4.6 reasoning effort, i asked it an “is this justified” for an ethical question earlier and got this
armistice tweet media
rain@__ghostfail

britt retweeted
Andrej Karpathy@karpathy·
One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.
britt@p3ery·
@sawyerhood really generous of you to open source it — thanks!
britt@p3ery·
@justinskycak you’re killing it justin — so inspiring to see the impact you’re having and the opportunity you’re creating for folks to take advantage of
Justin Skycak@justinskycak·
This is the textbook I wrote to support the most advanced high school math/CS sequence in the USA. We scaffolded high school students up to doing masters/PhD-level coursework: reproducing academic research papers in artificial intelligence, building everything from scratch in Python. This was in Math Academy's (former) Eurisko program, which ran from 2020-23. (Ended when I relocated because nobody else in the district had the requisite knowledge to teach it.)
Alex Feazelle@AlexFeazelle

@justinskycak let’s see what the hype’s about

britt retweeted
Mario Zechner@badlogicgames·
Sloppy tools for sloppy times: doppelgangers

Explore all the slop issues and PRs of your @github repos visually. Triage them manually or give clusters to your clanker to clean things up. Here's @openclaw mariozechner.at/uploads/opencl…
britt@p3ery·
@celestepoasts that’s fair, and i agree a more interesting angle would be to measure the degree of efficiency gain compared to the baseline. but if their experiment consistently finishes under the budget, and the baseline doesn’t, that seems to strengthen their claims no?
Celeste@celestepoasts·
@p3ery also that's still not fair, because models aren't trained to be terse like that. Also they unfairly represent their benchmark scores. Not enough hedging, this detail is limited to footnotes
Celeste@celestepoasts·
These guys claimed to outperform normal CoT reasoning on sample efficiency by limiting output length to 4096 tokens (including CoT) and, I shit you not, simply counting the answer as wrong if the model didn't get done in time. disgusted by my field (authored by microsoft btw)
Celeste tweet media
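A sketch of the scoring rule being described, not the paper's actual evaluation code; the tokenizer and answer extraction are stand-ins.

```python
# Scoring rule as described in the tweet above: the whole completion, chain of
# thought included, must fit in a fixed token budget, and a completion that
# hits the cap is simply counted as wrong. Tokenizer and extractor are stand-ins.

TOKEN_BUDGET = 4096

def count_tokens(text: str) -> int:
    return len(text.split())          # crude stand-in for a real tokenizer

def score(completion: str, extract_answer, gold: str) -> int:
    if count_tokens(completion) >= TOKEN_BUDGET:
        return 0                      # ran out of budget mid-reasoning -> marked wrong
    return int(extract_answer(completion) == gold)
```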