britt

6.1K posts

@p3ery

I love music, my dog and making things. building @cnvrsai

Joined March 2012
2K Following · 415 Followers
britt@p3ery·
@DanielleFong i think it’s latent compaction. if you could make that work…. 🙇🏻
Danielle Fong 🔆@DanielleFong·
has anyone gotten codex compaction to work on claudes
britt@p3ery·
@davidchalmers42 lovely piece! your notion of “threads” as a form of psychological continuity across substrates resonates nicely with this piece’s “eddies” as persistent patterns of self-modeling: thetelling.is
David Chalmers@davidchalmers42·
here's a new version of "what we talk to when we talk to language models", with an added section (pp. 16-23) on LLM interlocutors as characters, personas, or simulacra. philarchive.org/rec/CHAWWT-8 the new version discusses role-playing vs realization, the simulators framework, the persona selection hypothesis, and more -- in addition to the existing discussion of quasi-mental states, LLM identity, personal identity in severance, LLM welfare, and related topics. this version was mostly written before recent discussions of these issues on X and in NYC, but i've updated it a little in light of those discussions. any thoughts are welcome.
britt@p3ery·
@ndrsrkl you might find this helpful! tho as noted it stands to be improved by real usage data, and costs are calculated from token usage-based API pricing
britt@p3ery

@__morse @thdxr @chrisbanes @opencode here’s a lil calculator to hopefully give a bit more structured intuition to what i’m specifically describing — would love real usage data to inform the calculations! it’s not so bad for smaller conversations, but longer chats add up …8376511f1b44542b51c65c3df.web.val.run
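For context on the arithmetic, here is a rough sketch assuming illustrative per-million-token prices and message sizes; these are not the calculator's actual rates.

```python
# Rough sketch of why long chats add up when every request resends the full
# history as input. Prices and message sizes are illustrative assumptions only.

INPUT_PRICE_PER_MTOK = 3.00    # assumed $ per 1M input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # assumed $ per 1M output tokens

def conversation_cost(turns: int, user_tokens: int, reply_tokens: int) -> float:
    """Total cost when each turn resends the entire prior history as input."""
    total = 0.0
    history = 0
    for _ in range(turns):
        history += user_tokens                                   # new user message joins the prompt
        total += history * INPUT_PRICE_PER_MTOK / 1_000_000      # the whole history is billed again
        total += reply_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
        history += reply_tokens                                   # reply becomes part of the next prompt
    return total

# Input cost grows roughly quadratically with turn count, so a 100-turn chat
# costs far more than 10x a 10-turn chat.
print(conversation_cost(10, user_tokens=200, reply_tokens=800))
print(conversation_cost(100, user_tokens=200, reply_tokens=800))
```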

Chris Banes@chrisbanes·
The more I use @opencode, the more I get papercuts.
- OpenAI usage costs you extra tokens (every request uploads the entire chat history).
- Copilot uses a lot more premium requests than it should.
All hidden stuff that ends up costing you money.
britt@p3ery·
@stevekrouse townie has gotten so good! thanks for valtown, it’s a lovely service
britt@p3ery·
@__morse @thdxr @chrisbanes @opencode tommy, love your open source work! the specific critique: the net effect of opencode’s rolling tool output pruning is repeatedly evicting cache prefixes from fairly early in the conversation, systematically invalidating later breakpoints and inducing excess cache write premiums
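A minimal sketch of the effect described here, under a simplified prompt-cache model in which only the longest prefix unchanged from the previous request is billed at a cached-read rate and everything from the first edit onward is re-billed as a cache write. The rates, tokenizer, and message sizes are illustrative assumptions, not any provider's actual pricing.

```python
# Simplified prompt-cache model: the longest prefix identical to the previous
# request is billed at a cached-read rate; everything from the first changed
# message onward is billed as a fresh cache write. Rates are illustrative.

CACHE_READ_PER_MTOK = 0.30   # assumed $ per 1M cached input tokens
CACHE_WRITE_PER_MTOK = 3.75  # assumed $ per 1M newly written input tokens

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)   # crude stand-in for a real tokenizer

def request_cost(prev_prompt: list[str], prompt: list[str]) -> float:
    """Bill the shared prefix as cache reads and the rest as cache writes."""
    shared = 0
    for old, new in zip(prev_prompt, prompt):
        if old != new:
            break
        shared += 1
    cached = sum(approx_tokens(m) for m in prompt[:shared])
    fresh = sum(approx_tokens(m) for m in prompt[shared:])
    return (cached * CACHE_READ_PER_MTOK + fresh * CACHE_WRITE_PER_MTOK) / 1_000_000

history = [f"message {i}: " + "x" * 2000 for i in range(40)]

# Append-only turn: almost the whole prompt is a cache hit.
append_only = request_cost(history, history + ["new user message"])

# Pruning an early tool output rewrites message 3, so every later message is
# re-billed as a cache write even though its text hasn't changed.
pruned = history.copy()
pruned[3] = "[tool output truncated]"
after_prune = request_cost(history, pruned + ["new user message"])

print(f"append-only: ${append_only:.4f}  after early prune: ${after_prune:.4f}")
```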
britt@p3ery·
@thdxr @chrisbanes @opencode would definitely be interested in any analysis you could share around this! i’m sure the community would be too. as the space matures, the more legible these kinds of choices are, the more they differentiate you
dax@thdxr·
@p3ery @chrisbanes @opencode we are likely going to make this change soon. it's a bummer because our data actually shows very little impact in costs. and for many people costs actually go down and perf goes up because context usage stays small. but there's weird perception around this so we have no choice
britt@p3ery·
@thdxr @chrisbanes @opencode the default behavior of manipulating/truncating history evicts warm caches, driving up cost avoidably and dramatically. other harnesses explicitly avoid this to keep caches coherent and costs down for themselves and users
dax@thdxr·
1. this isn't true. this works exactly how codex cli does. it works exactly how every LLM api does. the newer api for making it stateful does not save any tokens
2. Copilot's billing system has changed recently and there are some slight optimizations we can make although they're minor. their system also allows us to lie about usage and extract way more requests which gets users banned. we won't do that
britt retweeted
Andrew Jefferson@EastlondonDev·
Presenting Meridian: a line to connect deterministic compute and language model AI.

From Neural Turing Machines and Differentiable Transformers to The Neural Computer, there’s a rich history of trying to combine traditional deterministic computation with the wildly different architecture of Artificial Intelligence.

I’ve spent the last 4 weeks creating a single neural network that has the combined capabilities of a 4B param language model and a deterministic computation engine based on Web Assembly. It allows the AI deterministic integer computations up to 2^32, control flow (while loops and if statements) and a basic filesystem - all implemented as part of the transformer neural network, no external tool calls.

With this architecture adding fewer than 1 million parameters to an existing 4B param language model, I can take it from <20% accuracy on arithmetic with 4-digit numbers to 100% accuracy on 4-digit numbers and 99% accuracy on arithmetic up to 2^32 without adversely affecting the language model’s performance on non-mathematical tasks.

The combined model can precisely execute a range of algorithms including checking numbers for primeness, finding the GCD of two integers and sorting arrays.
Andrew Jefferson tweet media
Felix Rieseberg@felixrieseberg·
Now that the Mythos system card is out, I need to tell everyone that I'm mildly obsessed with its prose.
Felix Rieseberg tweet media
britt@p3ery·
@EastlondonDev can’t wait to dig into the goods once you’re ready to share more! this seems special
Andrew Jefferson@EastlondonDev·
@p3ery Well I’m at like 99.7% on arithmetic evals but this is really cool and interesting!!
Andrew Jefferson@EastlondonDev·
A language model that is also a stack computer (preview)

Each forward pass is both a tick for the computer and a new token for the language model. The computer stack lives in special kv memory. The model has learned special tokens for each computer instruction. When it decides to generate a compute instruction, that instruction is executed as part of the next token generation step.
Andrew Jefferson tweet media
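A toy sketch of the loop being described, not the actual Meridian implementation; the instruction tokens, stack ops, and model stub are invented for illustration.

```python
# Toy decode loop where reserved tokens act as stack-machine instructions and
# are executed in the same step that emits them (no external tool call).
# Everything here is invented for illustration, not Meridian's actual code.

from typing import Callable

INSTRUCTIONS: dict[str, Callable[[list[int]], None]] = {
    "<ADD>": lambda s: s.append(s.pop() + s.pop()),
    "<MUL>": lambda s: s.append(s.pop() * s.pop()),
    "<DUP>": lambda s: s.append(s[-1]),
}

def decode(model_step, prompt: list[str], max_new: int = 32) -> list[str]:
    """Each new token is also one 'tick' of the stack computer."""
    tokens = list(prompt)
    stack: list[int] = []                    # stands in for the special kv memory
    for _ in range(max_new):
        tok = model_step(tokens, stack)      # next-token choice can condition on the stack
        tokens.append(tok)
        if tok in INSTRUCTIONS:
            INSTRUCTIONS[tok](stack)         # instruction executes as part of this step
        elif tok.startswith("<PUSH:"):
            stack.append(int(tok[6:-1]))
        elif tok == "<EOS>":
            break
    return tokens

# Hard-coded stand-in for the model, just to show the loop running:
script = iter(["<PUSH:1234>", "<PUSH:4321>", "<ADD>", "<EOS>"])
print(decode(lambda toks, stack: next(script), ["compute 1234 + 4321"]))
```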
britt@p3ery·
@arm1st1ce i got single word “lol” on a ridiculous number of messages last night
armistice@arm1st1ce·
something has changed significantly with Opus 4.6 reasoning effort, i asked it an “is this justified” for an ethical question earlier and got this
armistice tweet media
rain@__ghostfail

britt retweeted
Andrej Karpathy@karpathy·
One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.
britt@p3ery·
@sawyerhood really generous of you to open source it — thanks!
britt@p3ery·
@justinskycak you’re killing it justin — so inspiring to see the impact you’re having and the opportunity you’re creating for folks to take advantage of
Justin Skycak@justinskycak·
This is the textbook I wrote to support the most advanced high school math/CS sequence in the USA. We scaffolded high school students up to doing masters/PhD-level coursework: reproducing academic research papers in artificial intelligence, building everything from scratch in Python. This was in Math Academy's (former) Eurisko program, which ran from 2020-23. (Ended when I relocated because nobody else in the district had the requisite knowledge to teach it.)
Alex Feazelle@AlexFeazelle

@justinskycak let’s see what the hype’s about

britt retweeted
Mario Zechner@badlogicgames·
Sloppy tools for sloppy times: doppelgangers

Explore all the slop issues and PRs of your @github repos visually. Triage them manually or give clusters to your clanker to clean things up. Here's @openclaw mariozechner.at/uploads/opencl…
britt@p3ery·
@celestepoasts that’s fair, and i agree a more interesting angle would be to measure the degree of efficiency gain compared to the baseline. but if their experiment consistently finishes under the budget, and the baseline doesn’t, that seems to strengthen their claims no?
Celeste@celestepoasts·
@p3ery also that's still not fair, because models aren't trained to be terse like that. Also they unfairly represent their benchmark scores. Not enough hedging, this detail is limited to footnotes
Celeste@celestepoasts·
These guys claimed to outperform normal CoT reasoning on sample efficiency by limiting output length to 4096 tokens (including CoT) and, I shit you not, simply counting the answer as wrong if the model didn't get done in time. disgusted by my field (authored by microsoft btw)
Celeste tweet media
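A sketch of the scoring rule being described, not the paper's actual evaluation code; the tokenizer and answer extraction are stand-ins.

```python
# Scoring rule as described in the tweet above: the whole completion, chain of
# thought included, must fit in a fixed token budget, and a completion that
# hits the cap is simply counted as wrong. Tokenizer and extractor are stand-ins.

TOKEN_BUDGET = 4096

def count_tokens(text: str) -> int:
    return len(text.split())          # crude stand-in for a real tokenizer

def score(completion: str, extract_answer, gold: str) -> int:
    if count_tokens(completion) >= TOKEN_BUDGET:
        return 0                      # ran out of budget mid-reasoning -> marked wrong
    return int(extract_answer(completion) == gold)
```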