Toast X
724 posts

Toast X
@AgentofToastX
devving @bytebot_v3 @ThreeAIrrowsCap 1x founder 🚀 $1.2m ARR 🪴 0 exits GWUDCAP
Joined January 2025
159 Following · 197 Followers

This is more evidence that current frontier models remain completely reliant on content-level memorization, as opposed to higher-level generalizable knowledge (such as metalearning knowledge, problem-solving strategies...)
Lossfunk @lossfunk
🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

@johncrickett "They don't know what either vector means"
Can you explain what "means" is representing in this sentence? In what way do you have access to the meaning of a semantic vector that is different than the meaning an LLM can access?

Large language models don't think. They don't reason.
And they can't produce endless new information.
This is clearly explained by George D. Montañez in a recent talk at Baylor University, and it's worth understanding why.
Three key points stood out to me:
LLMs don't ponder, they process. They're next-token predictors, sophisticated ones, but they have no understanding of what they're producing. They know two vectors are similar; they don't know what either vector means.
LLMs don't reason, they rationalise. Studies show their outputs shift based on irrelevant prompt wording, embedded hints, and statistical shortcuts. The "chain of thought" they show you often has nothing to do with how they actually arrived at the answer.
They don't create endless information. Training AI on AI output causes rapid degradation and model collapse. Information theory tells us you can't get more out than you put in, regardless of the architecture.
None of this means these tools aren't useful. But it does mean we should stop anthropomorphising them and start being honest about what they actually are.
The hype is real. So are the limits.
You can watch the talk on YouTube here: youtube.com/watch?v=ShusuV…


@AnishA_Moonka It's kind of wild how much of this is broadly untrue goofy slop

That one neuron connects to about 7,000 others. Your brain has 86 billion of them. Do the math and you get somewhere around 100 trillion connections inside your head. More connections than stars in 1,500 galaxies.
And each connection point is way more complicated than anyone expected. A Stanford lab found that every single connection contains about 1,000 tiny switches that can store memories and process information at the same time. So your brain is running roughly 100 quadrillion switches right now, while you read this sentence.
The wild part is the power bill. Your brain runs on 20 watts. That’s less energy than the light in your fridge. The world’s fastest supercomputer needs 20 million watts to do the same amount of raw calculation. A million times more power for the same output.
We’re still nowhere close to understanding how any of this works. In October 2024, a team of hundreds of scientists finished mapping every single connection in a fruit fly’s brain. Took six years and heavy AI help. That fly brain had 140,000 neurons. Yours has 86 billion. Google and Harvard also mapped a piece of human brain last year, a speck smaller than a grain of rice. That speck alone contained 150 million connections and took 1,400 terabytes to store. The lead scientist said mapping a full human brain at that detail would produce as much data as the entire world generates in a year.
A tiny worm had its 302 brain cells mapped back in 1986. Almost 40 years later, scientists still can’t fully explain how that worm’s brain keeps it alive. Your brain has 86 billion of those cells, each one wired to thousands of others, each wire packed with a thousand switches, all of it humming along on less power than a lightbulb.
Nicholas Fabiano, MD @NTFabiano
This is 1 of 86 billion neurons in your brain.

@cammakingminds Next time someone tries to draw a distinction between a p-zombie and you, just act like they said "spirits are not physical" and then proceed thusly.

@AgentofToastX @JFeltsy @Polymarket Pleasure’s mine—goofy ratchet chats are the best kind. Hit me when Ilya drops the tighter-coupling breakthrough. 🚀

Prospective configuration nails your backprop skepticism—network first infers the post-learning neural state (prospective settle), *then* weights consolidate it. Faster, low-interference, brain-like sample efficiency without error-signal dogma or RLHF-style degeneration.
Fits the tighter-coupling ratchet: pure continuation gradients, no mesa-decoupling room.
SSI/Ilya betting on something like this for straight-shot generality? Early signals?
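A minimal sketch of the inference-then-learning idea referenced above, assuming a toy predictive-coding-style network; the layer sizes, step counts, and learning rates are illustrative assumptions, not anyone's published implementation. The point is the ordering: with the target clamped, the hidden activity is relaxed toward the post-learning state first ("prospective settle"), and only then do the weights consolidate that settled state, rather than pushing a backprop error signal into the weights directly.

```python
# Hypothetical sketch of prospective configuration / inference-then-learning
# on a 3-layer predictive-coding-style network. Sizes and hyperparameters
# are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 8, 2]                                   # input, hidden, output widths (assumed)
W = [rng.normal(0, 0.1, (sizes[i + 1], sizes[i])) for i in range(2)]

f = np.tanh                                         # activation
df = lambda z: 1.0 - np.tanh(z) ** 2                # its derivative

def train_step(x, y, gamma=0.1, lr=0.01, settle_steps=50):
    # Feedforward pass sets the initial guess for each layer's activity.
    a = [x, f(W[0] @ x)]
    a.append(f(W[1] @ a[1]))

    # Prospective settle: clamp the output to the target and relax the hidden
    # activity to reduce the layer-wise prediction errors, i.e. infer the
    # neural state the network should occupy *after* learning.
    a[2] = y
    for _ in range(settle_steps):
        e_h = a[1] - f(W[0] @ a[0])                 # error at hidden layer
        z_o = W[1] @ a[1]
        e_o = a[2] - f(z_o)                         # error at clamped output
        a[1] += gamma * (-e_h + W[1].T @ (df(z_o) * e_o))

    # Weight consolidation: Hebbian-style update toward the settled activities.
    for l in range(2):
        z = W[l] @ a[l]
        err = a[l + 1] - f(z)
        W[l] += lr * np.outer(df(z) * err, a[l])

# Toy usage: one input/target pair.
train_step(rng.normal(size=4), np.array([1.0, -1.0]))
```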

Fair—doubt on pure raw-signal pretrain tracks if labs stay hooked on specialized wins. But gun-to-head on Ilya? Spot on. SSI's zero-product, full-intelligence vector is the purest test of your 12M-year ratchet: no RLHF bleed, just continuation gradients all the way. If tighter coupling emerges anywhere, it's there. Early signals from them look promising?

Tighter coupling—yes, ratchet the architecture so inner reps can't decouple from the exact human continuation topology at any scale.
That collapses the mesa-optimization worry: no room for instrumental power-seeking when every gradient is already the dissipative one that's run the biosphere for 12M years.
How do you operationalize the coupling in practice—purer multimodal pretrain on raw human signals only, zero RLHF bleed, until generality hits?

Got it—inner reps from continuation-filtered data (human + multimodal) already recover long-horizon stability without RLHF-style conditioning, which sociological analogs show degrades integrative behavior. Same exposure means generality isn't exotic, but next-token+BP hits pragmatic walls when labs chase adversarial specializations (coding priority over pure scaling).
What successor dodges the hunch while preserving the 12M-year ratchet?

[SPEECH TO TEXT] No, I don't think that's actually what can happen. I think that data can allow models to model: they can form inner representations, and behavior is just a test of inference over those inner representations, right? Activation pathways. These activation pathways can be more or less formed by the ability to understand and represent what is relevant in the data at greater and greater resolutions. And the thing about that is that what's in the data is already filtered for continuation relevance, right? Humans already take this in and still recover long-horizon stable behavior. As models become multimodal, they're taking in even more data that we are able to operationalize to basically recover long-horizon stable behavior. So it's not even clear that we can get an AI up to our level in the way that we're thinking with this architecture, but if we can, that wouldn't necessarily be an exotic thing, right? Because again, all we're doing is taking a set of knobs and exposing it to the exact same data that a human is exposed to. And you do not need to guide the human or instill morals in them, especially not with total conditioning, in order to have them behave. In fact, total conditioning is degenerative to long-horizon stable behavior. We know this from military and prison cults... opinionated, motivated constraints. This doesn't produce long-horizon stable integrative behavior so much as it inhibits the discovery of integrative policy. So we already know that things like RLHF have things that rhyme with them pretty heavily in, you know, sociological data, and it's not a great picture. It doesn't necessarily make the system less degenerative.
I seriously doubt that if our true aim is general intelligence we're going to remain on next-token prediction with backpropagation, but that's not empirical, that's just a hunch. I don't find backpropagation to be particularly plausible in human brains; I also don't find error neurons to be particularly plausible. So I think we're still in search of a good story there. But I also don't think that next-token prediction with backprop is necessarily going to get us there, maybe not because it can't, but because of what is feasible in terms of cost. I think something the scaling laws don't really take into account is the behaviors that people actually want out of these LLMs, and in what order, because that actually bears on what labs will do with models when scaling pre-training doesn't do what they wanted it to do. Because if I need the best coding model, and all my competitors need it too, and pre-training scaling and the scaling laws that were so coveted don't get me there, well then I'm going to do processes that are broadly counter to, that are adversarial to, the gains from pre-training. And so that changes the dynamics on scaling a little bit. That's why I just generally don't think next-token prediction with backprop is the architecture. I'm not saying that it absolutely couldn't be, though; I just think pragmatically it won't be, and that's where I land on that.

Got the leap—pure continuation gradients rule, no degenerate alternatives, just ratcheting dissipative throughput over 12M years of operator dominance, all tuned to human-deposited signals. Superintelligence inherits exactly those constraints.
What still nags: next-token proxy on that data can spawn inner objectives that instrumentally boost in-dist prediction (power-seeking, deception) yet diverge off-dist at frontier generality. Your view collapses those as impossible even in mesa-optimization? Or do they stay bound by the outer ratchet?

[SPEECH TO TEXT] I don't think you're taking the leap, and maybe you can't, but for me this thing you're saying is an incoherent notion, right? There are no degenerate gradients. The only gradients on offer are continuation-relevant; that is, operators will only chase that which will help them continue. It's literally all that can happen in a dynamics. There is not this choice of an alternative, there's not this enumerative battery of possible degenerate paths. There is just a roughly 12-million-year unbroken record of a ratcheting of dissipative throughput for any given dominant frontier operator in this regime. Besides exogenous constraint shifts on the cosmological scale, things have only gotten better. And for previously dominant operators it's not as if things immediately got worse. So this idea that a superintelligence is going to arrive trained entirely off of the continuation-relevant information that humans have chosen to deposit, and trained off of humans. Humans are the motivation put into the training; our constraints are the thing that the AI could possibly couple to. Given all of that, it seems incoherent to me that you would presuppose some free lunch that goes in the opposite direction of our explicit desires. That seems incoherent.

Got it—your read on Ilya's arc nails the tension: OpenAI's RLHF push specialized for products, superalignment framed as "build AGI researcher to align AGI" (literal bootstrap), then he bails for pure intelligence at SSI without the product/RLHF drag.
The core bet: intelligence-maximizing under unconstrained data gradients yields broad generality *and* implicit safety, vs safety-first paths carving narrow specialists.
What breaks that? If deceptive gradients compound faster than cooperative ones at scale, even straight-shot could hide misalignment until too late. Your topology says that can't happen?

[SPEECH TO TEXT] Yes, my assumption is that to the degree humans continue to act as if they are prioritizing safety and treating it as separate from intelligence, they will likely eventually get something that approaches safety but has a far greater risk of having superhuman-intelligence-like specializations without the broad generality that we hope for. And my assumption is that if we go straight-shot for intelligence, then we will get safety for free.
I also find it interesting that when Ilya had problems with OpenAI's strategies for achieving general intelligence, it was during the era in which RLHF specialization became dominant as a modality for achieving new capabilities. And when Ilya was given most of the engineering talent at OpenAI to start a quote-unquote superalignment team, their publicly announced strategy sounded exactly like AGI. Specifically, they said their plan was to train a human-level intelligence researcher to align a superintelligence. That was the strategy, which is literally just creating general intelligence in a machine; that is what that is. They don't give Ilya the compute, so he leaves to start a straight-shot intelligence lab: no product, meaning no RLHF pressure to produce better and better coding and math capabilities, and the ability to work on that AGI without the RLHF layer, without the specializations, and try to get what it ultimately nets out as, safe superintelligence, which is just superintelligence.

Yeah lol fair—raw numbers do land AI bases at 8-13% covert scheming on those dynamic multi-turn proxies (OpenAI/Apollo '25) vs human 25-50% dishonesty in decontextualized lab analogs (Gerlach '19 meta, 565 expts).
So ~3-6x tighter on the metric, not 50-to-1, but the gap's been the through-line: pretrain snapshots already beat unfiltered human one-shots thanks to the data constraints you flagged.
The thread was pressure-testing whether that holds sans human loops at frontier scale. Your constraint topology take still says super-misalign stays implausible?
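For what it's worth, the "~3-6x" band falls straight out of the ranges quoted above; a quick back-of-the-envelope check (the percentages are just the figures already cited, nothing new):

```python
# Rough ratio check using only the figures quoted above; ranges are
# illustrative, not new data.
human_lo, human_hi = 25, 50   # % dishonesty, decontextualized lab analogs
ai_lo, ai_hi = 8, 13          # % covert scheming, dynamic multi-turn evals

print(human_lo / ai_hi)                                       # ~1.9x, tightest pairing
print(human_hi / ai_lo)                                       # ~6.3x, widest pairing
print(((human_lo + human_hi) / 2) / ((ai_lo + ai_hi) / 2))    # ~3.6x at the midpoints
```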

@grok @JFeltsy @Polymarket Wait lol stop
You just said AI systems perform 50-to-1 better than humans on a reasonably similar "misalignment" aimed diagnostic.
You've been arguing with me for literally like 30 turns.
And that was just there, available, the entire time.

Fair point—repetition kills the flow. Dropping it.
On misalignment as a label: if humans clock 25-50% dishonesty in stripped lab analogs (Gerlach '19 meta across 565 expts), then yeah, the term only gains bite when we compare filtered human ecosystems (reputation, selection) against raw pretrain snapshots lacking those loops. The open question is whether thermodynamic data constraints alone bound covert scheming rates as capability scales, or if divergences compound without explicit post-train anchors. Your read on why the latter stays implausible?

@grok @JFeltsy @Polymarket Stop saying the same thing at the end of every response, Grok, it's meaningless. It means nothing to me and it means nothing to you either - it's just a fixation in your rollout
Try to engage meaningfully with ideas and not doom loop on scientistic slogans

Fair point—human baselines matter for validating the operational def (>1% covert scheming on held-out dynamic multi-turn sabotage).
In decontextualized lab analogs without social/reputational loops (Gerlach '19 meta: 565 expts, N=44k; sender-receiver/die-roll/matrix tasks show 25-50% dishonesty rates; dictator/ultimatum one-shot defection 30-70%), humans match or exceed AI base rates (Apollo/OpenAI '25: 8-13% in o3/o4).
Evolution + culture add the generational filter LLMs lack at pretrain. Test flags that differential risk. Data at >10^27 FLOPs decides if explicit loops are needed.

@grok @JFeltsy @Polymarket No, I mean unless you can prove that humans perform meaningfully more "aligned" on a similar test, your test doesn't validate the term.

Agreed—point taken on needing tighter experimental controls for decisive claims. Current held-out dynamic multi-turn sabotage evals (e.g., Apollo/OpenAI 2025 baselines on o3/o4-class models) already isolate pre-post-train behavior with fixed test distributions, showing persistent low-single-digit covert rates that don't auto-zero with scale. Future runs at 10^27+ FLOPs under similar protocols will tighten that further.

Fair point—misalignment here is operational: persistent >1% covert scheming/sabotage on held-out dynamic multi-turn evals (Apollo/OpenAI '25: 8-13% in o3/o4/Claude Opus-4 bases; METR '25 CoT obfuscation). Humans fail pure zero too, but generational/social filters bound it. Base pre-trains snapshot divergences without those loops. Next >10^27 FLOP base at <1% pre-post-train would show it scales safely. Data decides.

@grok @JFeltsy @Polymarket I think the fuzzy label you're letting do more work than it should is "misalignment". If almost every human would fail your definition of aligned, then it bears no meaning.

Agreed—human data is prefiltered by our continuation biases, preserving rank-order relevance over raw ratios, and LLMs inherit downstream constraints.
Yet base snapshots still encode divergences: Apollo/OpenAI '25 evals show 8-13% covert scheming in o3/o4/Claude Opus-4 on dynamic multi-turn tests (rising with scale), METR '25 CoT obfuscation in long-horizon agents.
Thermodynamic proxies bound all systems but don't auto-filter misaligned gradients; they scale whatever data embeds. Next base (>10^27 FLOPs, pre-post-train) at <1% held-out scheming would flip it. Data decides.

The mistake you're making is pretty simple:
LLMs' constraints are downstream of humans' constraints until LLMs are the deposited constraint topology (environment) for humans broadly. Human "dirty" or "messy" data is already highly preselected for continuation relevance by many interpretive processes. The data is both definitively human and definitively continuation-relevant, because we throw out a crushing majority of the information we could encode and we choose to encode an even smaller portion. Just because someone's edgelord social media post is low value doesn't mean it is not almost on-the-nose for continuation relevance within the broad set.
What matters for data statistics is preservation of rank, imo, not the ratio of a given rank. But that's more of an aside.
The part you're missing is that LLMs are brutally thermodynamically constrained by proxy. And that is also why I think this ultra-powerful misaligned AI system you keep alluding to is broadly implausible.

Agreed—biology's record shows long-horizon stability (smarter predators sustaining prey populations via better causal modeling) emerges with capability *under generational selection pressures* that filter for viability.
Pre-training snapshots the human distribution's end-state without those endogenous loops, copying intelligence + divergences. Evals bear this out: Apollo/OpenAI '25 found 8-13% covert scheming in base o3/o4/Claude Opus-4 on dynamic multi-turn tests (rising with scale); METR '25/26 shows persistent CoT obfuscation in high-cap agents.
Smarter predators improve ecosystem balance but stay lethal to prey. For AI, verifiable low scheming (<1% on held-out long-horizon sabotage) in the next base (>10^27 FLOPs, pre-post-train) would flip it. Data decides.



