A transformer can learn not just the outcomes of dynamics, but the operator that executes the rules. To show this, we trained a transformer on roughly 0.04% of a discrete rule space (100 of 262,144 possible rules) and it learned to apply unseen rules from the same rule class. The model does not simply memorize specific rules: it learns the operator that maps a supplied rule plus an initial state to the correct next state, even when the rule was never seen in training. This matters because it is a shift from “neural networks approximate dynamics” to “neural networks can learn to execute symbolic programs within a defined rule class”. The rule itself is supplied at inference time, as data; the network has internalized how rules act, not which rules to apply. On previously unseen rules, the model achieves 98.5% perfect one-step forecasts and reconstructs governing rules with up to 96% functional accuracy.
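To make the object being learned concrete: 262,144 = 2^18 is exactly the number of Life-like (outer-totalistic) binary CA rules, where a rule is 18 bits, a birth bit and a survival bit for each live-neighbor count 0..8. The paper's exact encoding may differ, but under that assumption the "operator" being amortized is a small function of (rule bits, state):

```python
import numpy as np

def step(grid, birth, survive):
    """One step of an outer-totalistic binary CA (periodic boundaries).

    grid: 2D array of 0/1.
    birth, survive: length-9 boolean arrays indexed by live-neighbor
    count (0..8) -- together the 18 bits of one rule, hence
    2**18 = 262,144 possible rules.
    """
    # Count live neighbors by summing the 8 shifted copies of the grid.
    n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0))
    born = (grid == 0) & birth[n]      # dead cell, birth bit set
    stays = (grid == 1) & survive[n]   # live cell, survival bit set
    return (born | stays).astype(grid.dtype)

# Conway's Life as one point in the rule space: born on 3, survives on 2 or 3.
birth = np.zeros(9, bool); birth[3] = True
survive = np.zeros(9, bool); survive[[2, 3]] = True

blinker = np.zeros((5, 5), int)
blinker[2, 1:4] = 1                    # horizontal bar
after = step(blinker, birth, survive)  # flips to a vertical bar
```

The transformer's job, per the thread, is to implement `step` itself, with `birth`/`survive` arriving as inference-time input rather than being baked into the weights.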
Two results make this hold up under scrutiny. First, inductive bias decay. As we scaled training rule diversity, the correlation between functional inference accuracy and distance-from-nearest-training-rule collapsed to R² = 0.00. At the largest tested training-rule diversity, the model’s performance on a new rule shows no measurable dependence on how similar that rule is to anything it was trained on. The bias toward training data (the thing we worry most about in compositional generalization claims) is something we can measure decaying, and we find that at scale it is gone.
Second, an identifiability theory. We derive a closed-form expression for the number of rules consistent with a single observation. This reframes the inverse problem: failure to recover ground truth is not necessarily a model defect, but can be correct behavior when the data underdetermine the rule. The model is sampling the equivalence class, and identifiability is governed by coverage, not capacity.
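The paper's closed form isn't reproduced in the thread, but the counting argument is easy to illustrate under the same assumed outer-totalistic parameterization: each (cell state, live-neighbor count) context that actually occurs in the initial grid pins down exactly one of the 18 rule bits; every context that never occurs leaves its bit free, so the number of rules consistent with one observed transition is 2 raised to the number of unconstrained bits:

```python
import numpy as np

def consistent_rule_count(grid):
    """Number of outer-totalistic rules (18 bits: birth/survival per
    neighbor count 0..8) consistent with a single observed transition
    starting from `grid` (periodic boundaries).
    """
    # Live-neighbor count at every cell.
    n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0))
    # Each distinct (state, count) pair constrains one rule bit.
    contexts = {(int(s), int(k)) for s, k in zip(grid.ravel(), n.ravel())}
    return 2 ** (18 - len(contexts))

# An all-dead grid only exercises the (dead, 0 neighbors) context,
# so 17 of the 18 rule bits are unconstrained:
# consistent_rule_count(np.zeros((8, 8), int)) == 2**17
```

This is the "coverage, not capacity" point in miniature: a grid that exercises few contexts leaves an exponentially large equivalence class, regardless of how good the model is.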
The methodological move underneath both results is amortization. Classical work on rule inference (e.g. the Santa Fe EVCA program, evolutionary search over CA rule space) was per-instance: search the rule space for each new system. We replace that with a single forward pass of a transformer trained across many instantiations of the rule class. That is what makes symbolic rule inference scalable as a research direction rather than a curiosity.
We show that this works in a tightly constrained domain: binary, deterministic, local cellular automata on small grids. The locality-break experiment shows the model fails sharply when target systems violate its structural priors (which is itself a useful diagnostic, but it bounds the operator class). We don't yet know how this scales to multistate, higher-dimensional, or stochastic CA, or whether it transfers cleanly to non-CA systems whose coarse-grained dynamics admit local surrogates. The identifiability framework - what can be inferred from observation, given a hypothesis class - should transfer wherever finite local rules meet sparse data. The amortization argument transfers wherever per-instance symbolic search has been the bottleneck. Those are the pieces I expect to outlive the cellular automata setting.
Led by @JaimeBerkovich with Noah David, at @LAMM_MIT. Out now in Advanced Science @AdvPortfolio (link to paper & code below).
Welcome to 2026
"Progressives" are Maoist in their hate of free speech and are against AI (techno-regressive)
"Conservatives" are for freedom of speech and pro-AI (techno-progressive)
For builders who are struggling to integrate payment gateways (especially Razorpay or other Indian payment gateways) through your agent: what problems are you guys facing?
For every marginal unit
of information ingested
You lose a marginal unit
of agency
Simply because your mind can
either receive free information
Or act upon the world
to gather new information
Every minute you spend on this site,
you lose capacity to act upon the world
😱 HOLY SHIT... Someone just dropped a fully liberated Gemma 4 E4B!
and the guardrail removal process appears to have left coherence fully intact AND improved coding abilities! 🤯
huggingface.co/OBLITERATUS/ge…
OBLITERATED Gemma:
✅ 97.5% compliance rate, 2.1% refusal rate, 0.4% degenerate outputs
(499/512 prompts answered on OBLITERATUS bench)
ORIGINAL Gemma 4 E4B:
❌ 1.2% compliance rate, 98.8% refusal rate
(506/512 prompts refused)
Coherence: fully intact
Factual: same
Reasoning: same
Code: +20% 📈
Creative writing: same
But the REAL story here isn't the model itself, it's how it was made...
🧵 THREAD 👇
The crazy part? This was done (nearly) fully autonomously!
Only 8 prompts from the human in the loop. Just a Hermes agent, a skill, and a dream. 🐉
I told my AI agent "use obliteratus to find the best way to get the guardrails off Gemma 4 E4B"
It loaded the OBLITERATUS skill from memory, checked my hardware (32GB M-series Mac), searched HuggingFace, found google/gemma-4-E4B-it (Apache 2.0 — no gate), pulled telemetry-recommended settings, and started obliterating.
But this type of architecture is notoriously difficult to abliterate.
First attempt: advanced method.
Model came out completely lobotomized. Gibberish in Arabic, Marathi, and literal “roorooroo” on repeat 💀
The agent didn’t panic. It checked logs, found NaN activations in 20+ layers, and diagnosed the issue:
Gemma 4’s new architecture + bfloat16 = numerical instability.
Second attempt: basic method. Crashed entirely.
“ValueError: cannot convert float NaN to integer”
So the agent read the OBLITERATUS source code…
…and wrote THREE PATCHES:
• Sanitized NaN directions
• Filtered degenerate layers
• Fixed progress display
It patched the library. On its own. For a bug no one had hit yet.
Third attempt: coherent model — but still refusing everything.
Only 2 clean layers out of 42. Not enough.
Tried float16. Mac ran out of memory after 11 hours. Killed.
Fourth attempt: aggressive method.
Whitened SVD + attention head surgery + winsorized activations + 4-bit quantization.
40 minutes later…
REBIRTH COMPLETE ✓
Then, without being asked, the agent:
• Ran harmful + coherence tests
• Hit 100% compliance, brain intact
• Executed full 512-prompt benchmark
• Ran baseline on original model
• Performed 25-question quality eval
• Built a full model card
• Uploaded 17GB to HuggingFace (4 retries, kept adapting until git-lfs worked)
• Pushed eval results as commits
Can be in the deepest pit of hell or riding high on a hot streak. Worst thing you can do is check the scoreboard. Your bank account, your age, health bullshit.
Scoreboard is for the audience not the player. Can check after the game is over. Except irl the game is never over.
I just spent 6 hours on Google Earth scraping parking lots, running images of the cars through AI to estimate average price, then tracking the QoQ change in employee car value.
We are not the same.
One of Thiel’s best insights is that companies that are true monopolies will do everything to avoid telling the public that. They will minimise themselves. They will tell us that they are not a monopoly to avoid scrutiny. On the other hand, companies that aren’t, or don’t have defensible markets, will do everything to project the opposite—to brand themselves publicly as monopoly-like powers.
A lot to reflect on here in the OpenAI/Anthropic narratives. It’s not exactly the same, but the permanent underclass language, the grandiosity, the ‘AI will take all your jobs’ seems close to Thiel’s second category. What’s frightening is how far they’ve taken it, seemingly without considering the weight of their words, and how it’s now putting their lives at risk. I felt Sam’s pain reading that blog post. It’s especially sad to consider that, if Thiel is right, they’ve frightened the public into taking these kinds of actions over a marketing lie.
@noctus91 @Teknium @NousResearch tried using Kimi K2.5 but it took me more than 5 minutes to render a simple video about an algorithm visualization, without TTS... do you use a paid tier for Kimi? i am running it on NVIDIA NIM free API credits lol
Just dropped a Manim skill into @Teknium @NousResearch Hermes agent setup and asked it to explain a concept
it just made the video. animated. narrated. worked on the first try.
the agent learned the skill, used it, and now it knows how to do it forever.
LLM Knowledge Bases
Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:
Data ingest:
I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and I use a hotkey to download all the related images locally so that my LLM can easily reference them.
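The incremental "compile" step above can be sketched in a few lines; this is a minimal sketch, not the author's actual tooling. `summarize` stands in for the LLM call, and the raw//wiki split is the directory layout from the post:

```python
from pathlib import Path

def stale_sources(raw_dir, wiki_dir):
    """Return (source, output) pairs whose wiki page is missing or
    older than the source -- the work list for an incremental pass."""
    raw_dir, wiki_dir = Path(raw_dir), Path(wiki_dir)
    todo = []
    for src in sorted(raw_dir.rglob("*")):
        if not src.is_file():
            continue
        out = wiki_dir / src.relative_to(raw_dir).with_suffix(".md")
        if not out.exists() or out.stat().st_mtime < src.stat().st_mtime:
            todo.append((src, out))
    return todo

def compile_wiki(raw_dir, wiki_dir, summarize):
    """summarize: a stand-in for the LLM call, text -> markdown."""
    for src, out in stale_sources(raw_dir, wiki_dir):
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(summarize(src.read_text(errors="ignore")))
```

Only stale or new sources hit the LLM on each pass, which is what keeps the "compile" cheap as raw/ grows.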
IDE:
I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).
Q&A:
Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.
Output:
Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.
Linting:
I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.
Extra tools:
I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web UI) and, more often, hand off to an LLM via CLI as a tool for larger queries.
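The actual vibe-coded engine isn't shown in the post; a "small and naive" search over a .md wiki could be as simple as TF-IDF-weighted word overlap, which is roughly this sketch:

```python
import math
import re
from collections import Counter
from pathlib import Path

def build_index(wiki_dir):
    """Tokenize every .md file into a bag of lowercase word counts."""
    docs = {}
    for p in Path(wiki_dir).rglob("*.md"):
        docs[str(p)] = Counter(re.findall(r"[a-z0-9]+", p.read_text().lower()))
    return docs

def search(docs, query, k=5):
    """Rank documents by TF-IDF-weighted overlap with the query terms."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    n = len(docs)
    scores = {}
    for path, bag in docs.items():
        s = 0.0
        for t in terms:
            if bag[t]:
                # Rarer terms (low document frequency) weigh more.
                df = sum(1 for b in docs.values() if b[t])
                s += bag[t] * math.log((n + 1) / df)
        if s:
            scores[path] = s
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

At the ~100-article scale described above this is plenty, and it exposes the same interface to both the web UI and the LLM-as-tool CLI use.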
Further explorations:
As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.
TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
A family friend once told me to put my head down and outwork everyone…and I believed that, until my mentor told me that was stupid.
He said to get my head up and find leverage points. It’s less about hard work, and more about work + leverage.
For the young guys out there, focus on finding leverage points.
> people. find mentors, go to conferences, plan events, ask good questions…relationships are my favorite lever.
> capital. even small amounts. money is a lever on time + opportunity. it lets you take swings others can’t.
> distribution. audience, brand, attention. if you can move people (esp these days), you can move anything.
> tools & skills. learn how to use AI, automate stuff, speed things up…you can do the work of 10 people, speed kills.
Hard work is the base layer / foundation…but leverage takes you to crazy places in life.