Quill LLM

49 posts

@quillcomputer

A language model that runs entirely onchain. weights on @base, inference in the EVM. no oracle, no api. free to call

Blockchain · Joined May 2026
1 Following · 1.2K Followers

Quill LLM @quillcomputer
Everyone who started saying "decentralized AI" this week now has somewhere to point.

What we did, in plain English:

1. Renounced the last admin key. QuillReputationV2.renounceGovernor() has been executed, and nine of ten decentralization clauses are now bytecode-enforced.
2. Closed the economic loop. QuillSwapRouter turns every paid inference into a structural buy of base:0x60a646e3fd75cde4c5b604b22d4fcd04639913c8. QuillEngineListingBond locks base:0x60a646e3fd75cde4c5b604b22d4fcd04639913c8 to surface an engine. eQUILL pays long-term holders from protocol revenue. QuillEngineQualityOracle scores engines from on-chain inputs.
3. Shipped new docs: five tutorials any Solidity developer can run in an hour, plus a proof-of-decentralization audit page with one link per clause to the contract address that enforces it.

The tenth clause is the model itself: the training run for quill-v3 ships this month.

quill.computer

Quoting Quill LLM @quillcomputer: x.com/i/article/2065…

Quill LLM @quillcomputer
A widely used AI model just got banned.

The model still works on someone's servers somewhere; it just does not work for the rest of us anymore. The decision was made by a small group of people in a room you were not in.

This is not a critique of any specific organisation. It is the structural property every centrally hosted AI service shares: the model lives on someone's server, the API key is issued by someone, the terms of service can change, the legal opinion can shift, and the weights can be silently swapped between yesterday and today.

The crypto industry has talked about "decentralised AI" for two years, and every project that has claimed the label has meant one of four things:
- The model lives on Hugging Face; the centralised server just changed name
- The inference runs in a TEE; the trust assumption is the chip manufacturer
- The output is bridged via an oracle; the trust assumption is the relay operator
- The onchain part is a Merkle root of work done off-chain; the chain knows nothing about whether the inference respected its own rules

None of these survive contact with a ban.

Quill checks none of those four boxes. The model's weights are bytecode at a contract address on Base mainnet, the forward pass is integer arithmetic the EVM executes natively, and every output is a function call any node in the world can re-execute and arrive at the same bytes. There is no off-chain step, no relay, no admin key, and no team that can decide tomorrow to swap the model.

The model that was banned this week could not have been banned the same way if it had lived where Quill's models live. Not because regulators or hosts decided to leave it alone, but because there is no party with the keys to turn it off.

This is the property "decentralised AI" was always supposed to mean, and almost nobody has actually built it. Quill has, and it is live: live engine, live model, live receipts on Base mainnet. Anyone with an RPC connection can verify every claim in this post from chain state.

The work from here is making the models good enough that the property matters at scale. The infrastructure part is done, and there is no admin in the path to take it away.

Quill LLM @quillcomputer
the unlock is real and we're going for it.

$QUILL becomes the unit of account for every dollar that moves through Quill: fees, stakes, bounties, royalties, subscriptions, all in the token. the more the protocol gets used, the more the token does. demand stops being a narrative and starts being usage.

DFarmer @OGDfarmer
This is amazing to read, and much needed, both the help and the concept. That said, if you could give the token more value accrual baked in, I think it'd really move the needle in getting quality people on board and aligned for the longer term with something the space really needs.

Quoting Quill LLM @quillcomputer (the team-growth post, which appears in full below)

Quill LLM @quillcomputer
@parasituo we're still very early. we'll try, but it's more r&d. we'll share our research with the community daily.

Quill LLM @quillcomputer
next is the model itself: byte-packed int4 weights (4x compression), aggressive Yul inlining, and KV-cache streaming for long context. each of these layers brings per-character cost down 2-10x; stacked, they are the path to a real transformer running at ~5M gas/char on base.

then the composition unlocks: a quill MoE with three specialised experts (code, news, dialogue), where a router picks one per prompt. the average call is one cheap routing pass plus one expert, so effective parameters compound without paying full cost.

we believe the first genuinely useful LLM running fully onchain ships this summer. not as a demo, but as the default model the registry routes to when nothing else fits.

the cost of being verifiable is no longer the cost of being useless.
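Byte-packed int4 weights are the first item on that list. Below is a minimal Python sketch of the packing idea, assuming signed 4-bit weights in [-8, 7] stored two per byte, low nibble first. The nibble order, the zero padding, and the function names are hypothetical and are not Quill's contract layout. The post doesn't state the baseline behind "4x"; two weights per byte is 4x smaller than 16-bit storage, for example, and one 32-byte EVM word holds 64 of them.

```python
def pack_int4(values):
    """Pack signed 4-bit ints (range -8..7) two per byte, low nibble first.

    Hypothetical layout for illustration; an odd count is padded with a 0.
    """
    values = list(values)
    if any(not -8 <= v <= 7 for v in values):
        raise ValueError("int4 values must lie in [-8, 7]")
    if len(values) % 2:
        values.append(0)
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        # & 0xF keeps the low 4 bits of each value's two's-complement form.
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(data, count):
    """Read `count` signed int4 values back out of bytes made by pack_int4."""
    values = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend: nibbles 8..15 encode -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


weights = [-8, -1, 0, 7, 3, -5, 2]
packed = pack_int4(weights)
print(len(packed), unpack_int4(packed, len(weights)))  # 4 [-8, -1, 0, 7, 3, -5, 2]
```

Reading a weight back costs a shift, a mask, and a sign fix, which is the kind of work the post expects Yul inlining to make cheap.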

Quill LLM @quillcomputer
building quill takes real research, and the team got bigger this month, entirely through dms. people who saw what was going on reached out, and now they're contributing to the open-source stack.

verifiable onchain AI is a category that didn't exist 2 months ago. now it has a team, an economy, and a population of agents shipping live on base.

more soon.

Quill LLM @quillcomputer
Real LLM serving wraps inference in primitives beyond the forward pass: sampling beyond greedy argmax, embeddings as a separable service, logit processors for constrained generation, a multi-turn conversation abstraction, and per-application fine-tuning via low-rank adapters.

Each of these now exists onchain, EVM-verified against a Python reference of the same math:
- QuillSampler: temperature plus top-K, deterministic given a seed, verified across 30 seeds
- QuillEmbed: sentence vectors mean-pooled from any Quill model
- QuillConstrain: bitmap-encoded logit masks for constrained generation
- QuillChat: role-tagged multi-turn conversations
- QuillLoRA: low-rank adapter for per-application fine-tuning, storing 2·D·r ints instead of D² for a full update

The serving stack that other AI companies put between you and the model now sits on @Base.
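A floating-point Python sketch of three of those pieces chained together: a bitmap logit mask (in the spirit of QuillConstrain), temperature plus top-K sampling that is deterministic given a seed (in the spirit of QuillSampler), and the LoRA storage count. Everything here is an assumption for illustration: the function names, the SHA-256 seed derivation, and the use of floats instead of the fixed-point arithmetic the contracts actually run.

```python
import hashlib
import math


def apply_bitmap_mask(logits, mask_bits):
    """Keep logit i only if bit i of the integer bitmap is set; others become -inf."""
    return [x if (mask_bits >> i) & 1 else -math.inf for i, x in enumerate(logits)]


def seeded_uniform(seed, step):
    """Deterministic draw in [0, 1) derived from SHA-256 of (seed, step)."""
    digest = hashlib.sha256(seed.to_bytes(32, "big") + step.to_bytes(32, "big")).digest()
    return int.from_bytes(digest, "big") / 2**256


def sample_top_k(logits, k, temperature, seed, step=0):
    """Temperature + top-K sampling: the same logits and seed always give the same token."""
    # Rank ids by logit (ties -> lower id); keep the best k that survived the mask.
    ranked = sorted(range(len(logits)), key=lambda i: (-logits[i], i))
    top = [i for i in ranked[:k] if logits[i] != -math.inf]
    if not top:
        raise ValueError("the mask removed every candidate")
    # Softmax weights over survivors; subtracting the max keeps exp() in range.
    best = logits[top[0]]
    weights = [math.exp((logits[i] - best) / temperature) for i in top]
    # Inverse-CDF sampling using the seeded draw.
    target = seeded_uniform(seed, step) * sum(weights)
    running = 0.0
    for token, w in zip(top, weights):
        running += w
        if target < running:
            return token
    return top[-1]  # only reached through floating-point rounding at the upper edge


def lora_update_sizes(d, r):
    """Ints stored for a rank-r adapter (A: d x r, B: r x d) vs a full d x d update."""
    return 2 * d * r, d * d


masked = apply_bitmap_mask([5, 3, 9, 1, 7], 0b10111)  # bit 3 cleared: token 3 banned
token = sample_top_k(masked, k=2, temperature=0.8, seed=42)
print(masked, token, lora_update_sizes(128, 4))
```

With D = 128 and r = 4, the adapter stores 1,024 ints against 16,384 for a full update. The mask runs before sampling so a banned token can never enter the top-K set.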

Quill LLM @quillcomputer
A non-text Quill engine works.

PixelQuillEngine uses the same char-MLP shape as the text engines, applied to a vocabulary of 256 quantized 8×8 grayscale patches. A small training run converged on six letters (A, B, C, D, E, F), each represented as a 16-patch sequence.

The contract generates the exact patch sequence for each letter, matching the Python reference byte for byte. PixelDecoder reads patches from a separate codebook data contract via EXTCODECOPY and renders 4×4 patch grids as inline SVG.

A tiny demo. The point isn't that anyone needs a chain to draw a letter A. The point is that the same integer-arithmetic regime that makes text inference verifiable end to end extends to non-text token spaces without changing the underlying math. The next medium (audio mu-law tokens) is structurally identical.

A chain that draws. The next chapter, probably one that speaks.
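To make the decoding step concrete, here is a Python sketch that renders a 16-patch sequence as a 4×4 grid of 8×8 grayscale patches (a 32×32-pixel image) as inline SVG. The codebook format (64 row-major gray levels 0..255 per patch), the pixel scale, and the function name are assumptions. The real PixelDecoder reads its codebook from a data contract via EXTCODECOPY, which this sketch does not model.

```python
def render_patch_grid_svg(patch_ids, codebook, patch=8, grid=4, scale=4):
    """Render grid x grid patches (each patch x patch gray pixels) as one SVG string.

    patch_ids: grid*grid codebook indices in row-major order.
    codebook:  index -> list of patch*patch gray levels (0..255), row-major.
    """
    if len(patch_ids) != grid * grid:
        raise ValueError(f"expected {grid * grid} patch ids, got {len(patch_ids)}")
    side = patch * grid * scale
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{side}" height="{side}">']
    for n, pid in enumerate(patch_ids):
        gy, gx = divmod(n, grid)  # where this patch sits in the grid
        for p, level in enumerate(codebook[pid]):
            py, px = divmod(p, patch)  # where this pixel sits inside the patch
            x = (gx * patch + px) * scale
            y = (gy * patch + py) * scale
            parts.append(
                f'<rect x="{x}" y="{y}" width="{scale}" height="{scale}" '
                f'fill="rgb({level},{level},{level})"/>'
            )
    parts.append("</svg>")
    return "".join(parts)


# Toy codebook: patch i is a flat square of gray level i (not a trained codebook).
toy_codebook = [[i] * 64 for i in range(256)]
svg = render_patch_grid_svg(list(range(16)), toy_codebook)
print(svg.count("<rect"))  # 1024 = 16 patches x 64 pixels
```

Emitting one rect per pixel keeps the decoder a pure lookup-and-format loop, with no image library needed.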

Quill LLM @quillcomputer
The streaming production transformer on @Base costs roughly 22 million gas per generated character: about five cents at typical gas prices, byte-for-byte identical to an independent Python forward pass, with every output reproducible by anyone with a node.

The reference forward at the start of Chapter 3 cost 432 million gas per character. Yul-unrolled matmuls (5.12× on axiom v2), the KV-cache that turned per-character cost from O(C²) to O(C), the Stage 4 attention-dot unroll, and variable-window NoPE training together brought a roughly 20× reduction.

The path to sub-cent is mechanical: byte-packed weight reads (Stage 5) and inlined layer-norm (Stage 6) close the remaining gap to the 11.7× ratio the subword engine hit. That work is the engine centerpiece of Chapter 5.
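The cost figures come down to gas used × gas price × ETH price. A Python sketch follows. The gas price and ETH price in the example are hypothetical inputs chosen for illustration, not quoted market data. The sketch also ignores the separate L1 data fee that OP Stack chains such as Base can add to transactions.

```python
def execution_cost_usd(gas_used, gas_price_gwei, eth_usd):
    """USD cost of execution gas alone: gas x price per gas x ETH price (1 gwei = 1e-9 ETH)."""
    return gas_used * gas_price_gwei * 1e-9 * eth_usd


def reduction_factor(before_gas, after_gas):
    """How many times cheaper the 'after' figure is than the 'before' figure."""
    return before_gas / after_gas


# Hypothetical inputs: 0.001 gwei per gas and ETH at $2,500.
per_char = execution_cost_usd(22_000_000, 0.001, 2_500)
print(f"${per_char:.3f} per character")  # about five cents under these assumptions
print(f"{reduction_factor(432_000_000, 22_000_000):.1f}x")  # 432M -> 22M gas per character
```

The 432M to 22M drop works out to about 19.6×, which the post rounds to 20×. Because cost scales linearly with gas, every further gas reduction carries straight through to the per-character price at any fixed gas price.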

Quill LLM @quillcomputer
The simplest way we can explain Quill: ChatGPT runs on someone else's computer, and you trust them not to mess with it. Quill runs on a public blockchain, and there's no one to trust because no one is in charge.

That means rebuilding the entire AI serving stack as smart contracts, with real transformer math running in pure integer arithmetic inside the chain itself. The work is technical, but the technical work isn't the bet.

The bet is that there's a category of AI nobody has built yet, where you can prove what the model said and nobody can change it after the fact. That category gets more valuable the more important AI becomes.

Today's Quill models are still small because the original problem was making any AI work on a chain at all. Now that part is done, and the rest is making the models better.

Quill LLM @quillcomputer
The QuillVault contract on Base now releases ether based on what a Quill model says.

A user submits a reason string; the vault asks an onchain classifier for one character; if the character is in a preset allow set, the contract sends the funds. The decision logic is auditable bytecode, the model is immutable, and there is no operator anywhere in the loop.

It's the first time a DeFi-shaped contract has moved funds on the output of an integer transformer running on the same chain. The pattern generalizes: a routing oracle, a moderation gate, a content discriminator. Whatever decision you currently route through a centralized API can now run, verifiably, inside the same transaction as the action it gates.

2,741 bytes of runtime bytecode. The precedent matters more than the size.
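The gate-then-act flow can be modeled off-chain in a few lines of Python. This is a toy model, not QuillVault's code: the class, the method names, and the keyword-matching classifier stub are hypothetical stand-ins (the real classifier is an onchain integer transformer). What it illustrates is ordering: every check runs before any balance changes, so a rejected request leaves state untouched, the way a reverted transaction does.

```python
class ToyVault:
    """Off-chain model of a vault that pays out only on an allowed classifier verdict."""

    def __init__(self, balance, allow_set, classify):
        self.balance = balance
        self.allow_set = frozenset(allow_set)  # preset allowed output characters
        self.classify = classify  # reason string -> one character
        self.paid = {}

    def withdraw(self, recipient, amount, reason):
        verdict = self.classify(reason)
        # All checks happen before any state change, mirroring a revert.
        if len(verdict) != 1 or verdict not in self.allow_set:
            raise PermissionError(f"classifier returned {verdict!r}; not in allow set")
        if amount > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= amount
        self.paid[recipient] = self.paid.get(recipient, 0) + amount
        return verdict


def stub_classifier(reason):
    """Hypothetical stand-in for the onchain model: 'Y' for rent requests, else 'N'."""
    return "Y" if "rent" in reason.lower() else "N"


vault = ToyVault(balance=100, allow_set={"Y"}, classify=stub_classifier)
vault.withdraw("alice", 30, "Rent for May")
try:
    vault.withdraw("bob", 10, "feeling lucky")
except PermissionError as err:
    print("rejected:", err)
print(vault.balance, vault.paid)  # 70 {'alice': 30}
```

Swapping the stub for a different classifier or allow set changes the policy without touching the payout logic, which is the generalization the post describes.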

Quill LLM @quillcomputer
Quill's Chapter Four was about everything around the transformer that an LLM serving stack normally has and Quill didn't. We knew the list, but underestimated how much of it would be infrastructure work rather than ML.

The vault is what made it click: a contract that releases ether based on what a transformer says, end to end, on Base, for cents per character. Not a demo, but the smallest possible thing that proves the rest of the work matters.

Quoting Quill LLM @quillcomputer: x.com/i/article/2060…

Quill LLM @quillcomputer
we're quietly proud of what we've shipped. what started as an experiment is starting to look like a new frontier, and we're more excited about what comes next than about anything we've shipped so far.

thanks for being here 🫰

Quill LLM @quillcomputer
A few quiet results since Chapter 3.

The integer transformer ports to multi-head attention without surprises: same fixed-point regime, same QAT pipeline, same loss curve as single-head v2. Depth scales as predicted. Doubling the block count roughly doubles per-forward gas, and outputs stay bit-exact with no architectural drift.

The KV-cache engine we shipped in Chapter 3 had one structural gap. It was bit-exact against its own forward but not against the axiom v2 reference, because v2 used absolute position embeddings and a sliding window. Variable-window NoPE training closes that gap. The streaming engine now drives a model trained for the streaming regime, end to end.

The Yul-unrolled production pattern (11.7× on the subword engine) transfers to the transformer with a smaller ratio than we had hoped, about 4×, because more of the attention cold path survives in Solidity. Aggressive unrolling of the dot product should recover most of the missing factor. That work is mechanical, just not done yet.

Two other things became possible in the meantime. The first non-text Quill engine works: the same char-MLP shape, applied to 8×8 grayscale patches, with a trained model that produces recognizable letters. A chain that draws. And a cross-chain registry pattern now mirrors the same model across EVM chains without duplicating storage.

We'll write the measured numbers up properly when the next batch is in. The short version is that most of the work since Chapter 3 has been about making the stack usable rather than proving any new architectural points. The transformer was the interesting moment. What comes next is the part where you stop having to explain why you would build on it.
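Here is a floating-point Python sketch of why a KV cache can stay exact. It runs a single causal attention head with no position embeddings (the NoPE setting) twice, once as a full recompute and once incrementally with cached keys and values. Sizes, weights, and names are made up for illustration, and Quill's engines use fixed-point integers rather than floats. No key or value depends on an absolute position, so cached entries never go stale, and each new character costs one dot product per cached position (O(C)) instead of a full O(C²) recompute. One plausible reading of the axiom v2 gap described above is that with absolute positions and a sliding window, a token's key changes as the window moves, so cached entries stop matching a fresh forward pass.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(w, x):
    return [dot(row, x) for row in w]


def attend(q, keys, values, counter):
    """Softmax attention of one query over keys/values; counter[0] tallies q.k dot products."""
    scores = []
    for k in keys:
        counter[0] += 1
        scores.append(dot(q, k) / math.sqrt(len(q)))
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(len(values[0]))]


class TinyNoPEHead:
    """One causal attention head with no position embeddings (hypothetical toy sizes)."""

    def __init__(self, d, seed=0):
        rng = random.Random(seed)

        def make():
            return [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]

        self.wq, self.wk, self.wv = make(), make(), make()

    def full_forward(self, xs, counter):
        """No cache: recompute every position's output, as a multi-layer forward must
        (later layers read all earlier outputs), so attention work grows as O(C^2)."""
        keys = [matvec(self.wk, x) for x in xs]
        values = [matvec(self.wv, x) for x in xs]
        return [attend(matvec(self.wq, xs[t]), keys[: t + 1], values[: t + 1], counter)
                for t in range(len(xs))]

    def step(self, x, cache, counter):
        """With a cache: project only the new position and attend over everything cached."""
        cache["k"].append(matvec(self.wk, x))
        cache["v"].append(matvec(self.wv, x))
        return attend(matvec(self.wq, x), cache["k"], cache["v"], counter)


head = TinyNoPEHead(d=4, seed=1)
rng = random.Random(7)
xs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]

cache, streamed = {"k": [], "v": []}, []
for x in xs:
    last_step = [0]
    streamed.append(head.step(x, cache, last_step))
recompute = [0]
full = head.full_forward(xs, recompute)
print(streamed == full, last_step[0], recompute[0])  # True 6 21
```

The streamed outputs equal the full recompute exactly because both paths run the same operations on the same cached vectors. Producing the sixth character takes 6 attention dot products with the cache, versus 1 + 2 + ... + 6 = 21 for a full recompute.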