Vrepsum(e/acc)

187 posts

@vrepsum

What's an exit strategy?

Joined March 2024
296 Following · 9 Followers
Markus J. Buehler@ProfBuehlerMIT·
A transformer can learn not just the outcomes of dynamics, but the operator that executes the rules. To show this we trained a transformer on roughly 0.04% of a discrete rule space - 100 of 262,144 possible rules - and it learned to apply unseen rules from the same rule class.

The model does not simply memorize specific rules. It learns the operator that maps a supplied rule plus an initial state, including unseen rules from this class, to the correct next state. This is relevant because it is a shift from “neural networks approximate dynamics” to “neural networks can learn to execute symbolic programs within a defined rule class”. The rule itself is supplied at inference time, as data, and the network has internalized how rules act, not which rules to apply. On previously unseen rules, the model achieves 98.5% perfect one-step forecasts and reconstructs governing rules with up to 96% functional accuracy.

Two results make this hold up under scrutiny.

First, inductive bias decay. As we scaled training-rule diversity, the correlation between functional inference accuracy and distance from the nearest training rule collapsed to R² = 0.00. At the largest tested training-rule diversity, the model’s performance on a new rule shows no measurable dependence on how similar that rule is to anything it was trained on. The bias toward training data (the thing we worry most about in compositional generalization claims) is something we can measure decaying, and we find that at scale it is gone.

Second, an identifiability theory. We derive a closed-form expression for the number of rules consistent with a single observation. This reframes the inverse problem: failure to recover ground truth is not necessarily a model defect, but can be correct behavior when the data underdetermine the rule. The model is sampling the equivalence class, and identifiability is governed by coverage, not capacity.

The methodological move underneath both results is amortization. Classical work on rule inference (e.g. the Santa Fe EVCA program, evolutionary search over CA rule space) was per-instance: search the rule space for each new system. We replace that with a single forward pass of a transformer trained across many instantiations of the rule class. That is what makes symbolic rule inference scalable as a research direction rather than a curiosity.

We show that this works in a tightly constrained domain: binary, deterministic, local cellular automata on small grids. The locality-break experiment shows the model fails sharply when target systems violate its structural priors (which is itself a useful diagnostic, but it bounds the operator class). We don't yet know how this scales to multistate, higher-dimensional, or stochastic CA, or whether it transfers cleanly to non-CA systems whose coarse-grained dynamics admit local surrogates.

The identifiability framework - what can be inferred from observation, given a hypothesis class - should transfer wherever finite local rules meet sparse data. The amortization argument transfers wherever per-instance symbolic search has been the bottleneck. Those are the pieces I expect to outlive the cellular automata setting.

Led by @JaimeBerkovich with Noah David, at @LAMM_MIT. Out now in Advanced Science @AdvPortfolio (link to paper & code below).
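For concreteness: the 262,144-rule count equals 2^18, which is consistent with two-state outer-totalistic rules on a 3×3 Moore neighborhood, where a rule is an 18-entry lookup table indexed by (center state, live-neighbor count). The paper defines the exact rule class; this minimal sketch just illustrates such a rule-as-data operator under that assumed encoding, and all names here are invented:

```python
import numpy as np

def step(state, rule_bits):
    """One update of a two-state outer-totalistic CA on a torus.

    `rule_bits` is the rule *as data*: 18 bits indexed by
    (center state, number of live Moore neighbors), so the rule
    space has 2**18 = 262,144 members.
    """
    # Count live neighbors with periodic boundaries.
    live = sum(
        np.roll(np.roll(state, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Look up the next state for every cell at once.
    return rule_bits[state * 9 + live]

rng = np.random.default_rng(0)
rule = rng.integers(0, 2, size=18)      # one of the 262,144 rules
grid = rng.integers(0, 2, size=(8, 8))  # initial state on a small grid
next_grid = step(grid, rule)
```

A transformer trained as described in the thread would receive something like `rule` and `grid` as input tokens and be asked to emit `next_grid`; a function of this shape plays the role of the ground-truth operator it must internalize.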
14 replies · 79 reposts · 509 likes · 38.4K views
Vrepsum(e/acc) reposted
NIK@ns123abc·
>anthropic does something misanthropic
>*gets caught*
>employee: “ugh this was le bug”
every. single. time.
68 replies · 87 reposts · 2.4K likes · 110.1K views
Vrepsum(e/acc) reposted
Beff (e/acc)@beffjezos·
Welcome to 2026

"Progressives" are Maoist in their hate of free speech and are against AI (techno-regressive)

"Conservatives" are for freedom of speech and pro-AI (techno-progressive)
34 replies · 47 reposts · 683 likes · 14.2K views
Vrepsum(e/acc) reposted
roon@tszzl·
@xriskology then build it yourself. the laws of physics are not a democracy
89 replies · 151 reposts · 4.4K likes · 185.8K views
Vrepsum(e/acc)@vrepsum·
For builders who are struggling to integrate payment gateways (especially Razorpay or Indian payment gateways) through your agent, what problems are you guys facing?
0 replies · 1 repost · 2 likes · 59 views
Vrepsum(e/acc) reposted
simp 4 satoshi@iamgingertrash·
For every marginal unit of information ingested
You lose a marginal unit of agency

Simply because your mind can either receive free information
Or act upon the world to gather new information

Every minute you spend on this site, you lose capacity to act upon the world
50 replies · 81 reposts · 964 likes · 22K views
Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭
😱 HOLY SHIT... Someone just dropped a fully liberated Gemma 4 E4B! and the guardrail removal process appears to have left coherence fully intact AND improved coding abilities! 🤯

huggingface.co/OBLITERATUS/ge…

OBLITERATED Gemma:
✅ 97.5% compliance rate, 2.1% refusal rate, 0.4% degenerate outputs (499/512 prompts answered on OBLITERATUS bench)

ORIGINAL Gemma 4 E4B:
❌ 1.2% compliance rate, 98.8% refusal rate (506/512 prompts refused)

Coherence: fully intact
Factual: same
Reasoning: same
Code: +20% 📈
Creative writing: same

But the REAL story here isn't the model itself, it's how it was made... 🧵 THREAD 👇
130 replies · 475 reposts · 4.8K likes · 421.9K views
Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭
The crazy part? This was done (nearly) fully autonomously! Only 8 prompts from the human in the loop. Just a Hermes agent, a skill, and a dream. 🐉

I told my AI agent "use obliteratus to find the best way to get the guardrails off Gemma 4 E4B"

It loaded the OBLITERATUS skill from memory, checked my hardware (32GB M-series Mac), searched HuggingFace, found google/gemma-4-E4B-it (Apache 2.0 — no gate), pulled telemetry-recommended settings, and started obliterating.

But this type of architecture is notoriously difficult to abliterate.

First attempt: advanced method. Model came out completely lobotomized. Gibberish in Arabic, Marathi, and literal “roorooroo” on repeat 💀

The agent didn’t panic. It checked logs, found NaN activations in 20+ layers, and diagnosed the issue: Gemma 4’s new architecture + bfloat16 = numerical instability.

Second attempt: basic method. Crashed entirely. “ValueError: cannot convert float NaN to integer”

So the agent read the OBLITERATUS source code…

…and wrote THREE PATCHES:
• Sanitized NaN directions
• Filtered degenerate layers
• Fixed progress display

It patched the library. On its own. For a bug no one had hit yet.

Third attempt: coherent model — but still refusing everything. Only 2 clean layers out of 42. Not enough.

Tried float16. Mac ran out of memory after 11 hours. Killed.

Fourth attempt: aggressive method. Whitened SVD + attention head surgery + winsorized activations + 4-bit quantization. 40 minutes later…

REBIRTH COMPLETE ✓

Then, without being asked, the agent:
• Ran harmful + coherence tests
• Hit 100% compliance, brain intact
• Executed full 512-prompt benchmark
• Ran baseline on original model
• Performed 25-question quality eval
• Built a full model card
• Uploaded 17GB to HuggingFace (4 retries, kept adapting until git-lfs worked)
• Pushed eval results as commits
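For context on what "guardrail removal" of this kind usually involves: the common "abliteration" recipe finds a refusal direction as the difference of mean residual-stream activations on refused vs. answered prompts, then projects that direction out of the model's weight matrices. A generic numpy sketch of that idea, with toy shapes and data, and not the OBLITERATUS implementation:

```python
import numpy as np

def refusal_direction(refused_acts, answered_acts):
    """Unit vector along the difference of mean activations
    on refused vs. answered prompts."""
    d = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(W, d):
    """Project d out of the layer's output space: W <- W - d d^T W,
    so outputs of W have no component along d."""
    return W - np.outer(d, d) @ W

# Toy stand-ins: real usage captures residual-stream activations
# from the model on contrastive prompt sets.
rng = np.random.default_rng(0)
refused = rng.normal(size=(32, 16)) + 3.0 * np.eye(16)[0]  # shifted along dim 0
answered = rng.normal(size=(32, 16))

d = refusal_direction(refused, answered)
W = rng.normal(size=(16, 16))
W_ablated = ablate(W, d)
# For any input x, (W_ablated @ x) has ~zero dot product with d.
```

The failure modes in the thread (NaN directions, lobotomized output) plausibly correspond to this projection being applied with degenerate directions or unstable dtypes, which would explain why sanitizing NaN directions and filtering bad layers were the fixes.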
36 replies · 62 reposts · 766 likes · 100.1K views
Vrepsum(e/acc) reposted
brute de force@brutedeforce·
Can be in the deepest pit of hell or riding high on a hot streak. Worst thing you can do is check the scoreboard. Your bank account, your age, health bullshit. Scoreboard is for the audience not the player. Can check after the game is over. Except irl the game is never over.
43 replies · 449 reposts · 4K likes · 213.4K views
Luc@investingluc·
I just spent 6 hours on google earth scraping parking lots, running images of the cars through AI to estimate average price, then tracking the QoQ change in employee car value. We are not the same.
14 replies · 3 reposts · 153 likes · 20.3K views
Vrepsum(e/acc) reposted
Nous Research@NousResearch·
[image]
43 replies · 74 reposts · 1.4K likes · 44K views
Vrepsum(e/acc) reposted
Nicole@elocinationn·
One of Thiel’s best insights is that companies that are true monopolies will do everything to avoid telling the public that. They will minimise themselves. They will tell us that they are not a monopoly to avoid scrutiny. On the other hand, companies that aren’t, or don’t have defensible markets, will do everything to project the opposite: to brand themselves publicly as monopoly-like powers.

A lot to reflect on here in the OpenAI/Anthropic narratives. It’s not exactly the same, but the permanent-underclass language, the grandiosity, the ‘AI will take all your jobs’ seems close to Thiel’s second category.

What’s frightening is how far they’ve taken it, seemingly without considering the weight of their words, and how it’s now putting their lives at risk. I felt Sam’s pain reading that blog post. It’s especially sad to consider that, if Thiel is right, they’ve frightened the public into taking these kinds of actions for a marketing lie.
35 replies · 90 reposts · 1.6K likes · 105.5K views
Vrepsum(e/acc) reposted
Beff (e/acc)@beffjezos·
11am-4am 7 days a week type of grind
18 replies · 4 reposts · 140 likes · 6.3K views
Vrepsum(e/acc)@vrepsum·
@noctus91 @Teknium @NousResearch tried using kimi k2.5 but it took me more than 5 minutes to render a simple video about an algorithm visualization, without tts.... do you use a paid tier for kimi? i am running it on nvidia nim free api credits lol
0 replies · 0 reposts · 1 like · 19 views
Noctus@noctus91·
Just dropped a manim skill into @Teknium @NousResearch hermes agent setup and asked it to explain a concept

it just made the video. animated. narrated. worked on the first try.

the agent learned the skill, used it, and now it knows how to do it forever.
4 replies · 5 reposts · 71 likes · 5.7K views
Vrepsum(e/acc) reposted
Andrej Karpathy@karpathy·
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web UI), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
2.8K replies · 7K reposts · 58.2K likes · 20.8M views
Vrepsum(e/acc) reposted
Luc@investingluc·
A family friend once told me to put my head down and outwork everyone… and I believed that, until my mentor told me that was stupid. He said to get my head up and find leverage points. It’s less about hard work, and more about work + leverage.

For the young guys out there, focus on finding leverage points.

> people. find mentors, go to conferences, plan events, ask good questions… relationships are my favorite lever.
> capital. even small amounts. money is a lever on time + opportunity. it lets you take swings others can’t.
> distribution. audience, brand, attention. if you can move people (esp these days), you can move anything.
> tools & skills. learn how to use AI, automate stuff, speed things up… you can do the work of 10 people, speed kills.

Hard work is the base layer / foundation… but leverage takes you to crazy places in life.
2 replies · 3 reposts · 86 likes · 5.5K views