Diego Kingston🟩

1.6K posts

@diego_aligned

Co-founder @alignedlayer.

Joined September 2022
756 Following, 3.5K Followers
Pinned Tweet
Diego Kingston🟩
Diego Kingston🟩@diego_aligned·
To acquaintances: given recent impersonators, a reminder that I will never write to you unless we have met in real life (and I generally don't write to people often), and I will never ask you to download software, execute commands, or anything of the sort. I have several skills, but Chinese is not among them. I also don't send files, photos, or anything similar. Beware of scammers. If you bump into someone claiming to be me, it's definitely not me.
English
5
2
14
4.7K
Diego Kingston🟩 retweeted
LambdaClass
LambdaClass@class_lambda·
Last week Rome was three conferences in one: zkSummit 14, zkProof, and Eurocrypt 2026. We were there for the latest in cryptography and zero-knowledge, and to get the community's eyes on our new VM.
LambdaClass tweet media
English
1
5
17
1.2K
Diego Kingston🟩
Diego Kingston🟩@diego_aligned·
Are algebraic hash functions screwed?
Diego Kingston🟩 tweet media
English
4
4
43
4.7K
Diego Kingston🟩 retweeted
Giacomo Fenzi
Giacomo Fenzi@GiacomoFenzi·
We are back on! Alessandro Chiesa on Close Enough: Proximity Tests from Linear codes. Livestream: youtube.com/live/Kla_3rFN-…
Giacomo Fenzi tweet media
English
1
2
21
781
Diego Kingston🟩
Diego Kingston🟩@diego_aligned·
How close can you get? IOP fest has the answer
Diego Kingston🟩 tweet media
English
0
0
6
284
Diego Kingston🟩 retweeted
Mauro Toscano 🟩
Mauro Toscano 🟩@mauro_aligned·
But the good news is LambdaVM doesn't care about quantum computers. Neither does @leanEthereum.
English
2
4
28
2.5K
Diego Kingston🟩
Diego Kingston🟩@diego_aligned·
Now on zkproofs, quantum cryptanalysis
Diego Kingston🟩 tweet media
Nederlands
0
0
10
307
Diego Kingston🟩
Diego Kingston🟩@diego_aligned·
The overhead is not 100%; see the paper. You can audit as many tokens as you want, and the provider cannot change responses because they are cryptographically bound by the commitment. The protocol lets you verify locally without re-executing, in significantly less time than it would take you to run the inference yourself. This is far better than having people randomly redo the computation, since you cannot be certain those people are not colluding. The protocol is therefore more efficient in terms of compute, and it lets you verify with certainty any opening you request.
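As a rough illustration of why a commitment binds the provider, here is a minimal Python sketch of a Merkle commitment over a response's tokens (the CommitLLM announcement names Merkle commitments as one of its building blocks). This is not CommitLLM's code: the leaf encoding, the use of SHA-256, the domain-separation bytes, the duplicate-last-node padding, and the names `merkle_root_and_paths` and `verify_opening` are all assumptions made for the example.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_paths(leaves):
    """Return (root, paths); paths[i] is a list of (sibling_hash, sibling_is_left)."""
    level = [h(b"\x00" + leaf) for leaf in leaves]   # 0x00 prefix marks leaf hashes
    paths = [[] for _ in leaves]
    index = list(range(len(leaves)))  # position of each leaf's ancestor in the current level
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # toy padding: duplicate the last node on odd levels
        for leaf_i, pos in enumerate(index):
            sib = pos ^ 1
            paths[leaf_i].append((level[sib], sib < pos))
            index[leaf_i] = pos // 2
        # 0x01 prefix marks internal nodes
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0], paths

def verify_opening(root, leaf, path):
    """Recompute the root from one opened leaf and its sibling path."""
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_left in path:
        node = h(b"\x01" + (sibling + node if sibling_is_left else node + sibling))
    return node == root

# Hypothetical response tokens; the provider publishes only `root` up front.
tokens = [b"The", b" answer", b" is", b" 42", b"."]
root, paths = merkle_root_and_paths(tokens)

opened_ok = verify_opening(root, tokens[3], paths[3])   # honest opening of token 3
tampered_ok = verify_opening(root, b" 43", paths[3])    # provider tries to swap the token
```

Once `root` is published, opening a different token at the same position would require a SHA-256 collision, so the auditor can pick any position after the fact and check it with a logarithmic number of hashes.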
English
1
0
1
42
Alex Mizrahi
Alex Mizrahi@killerstorm·
@ercwl This is a very interesting work, but I don't think it unlocks many use cases, at least as is. I'm not convinced it's actually better than redundant execution. Let's compare: "The full response is always committed, but only a random fraction of responses are opened for audit."
English
3
0
2
792
Diego Kingston🟩 retweeted
Fede’s intern 🥊
Fede’s intern 🥊@fede_intern·
Attention correctly bounded in multiple models. Working on an FP8 implementation now too. Let's make open-weight models the default in AI by making inference verifiable!
Fede’s intern 🥊@fede_intern

LLMs now make critical decisions in hospitals, defense, banks, and governments. Yet nobody can verify which model actually ran, or whether the output was tampered with. A provider or middleman can swap weights, silently requantize the model, alter decoding, inject hidden prompts, mount supply chain attacks, or change the deployment surface without the user knowing. This problem is already serious. It will become critical.
We think this needs a practical solution, not just a theoretically clean one. CommitLLM is designed to be deployable on existing serving stacks now: the provider keeps the normal GPU serving path, does not need a proving circuit, does not need a kernel rewrite, and does not generate a heavy proof for every response.
In practice, two families of approaches dominated the conversation before this work: fingerprinting, which can be gamed, and proof-based systems, which are theoretically strong but too expensive for production inference. We built CommitLLM to target the middle ground. The core idea is to keep the verification discipline of proof systems, but specialize it to open-weight LLM inference.
The cryptographic core is simple: Freivalds-style randomized checks for the large linear layers, plus Merkle commitments for the traced execution. Then a lot of engineering work is needed to make that line up with real GPU inference.
The key trick is this. A provider claims `z = W × x` for a massive weight matrix. Normally you would verify that by redoing the multiply. Instead, the verifier samples a secret random vector `r`, precomputes `v = rᵀ × W`, and later checks whether `v · x = rᵀ · z`. Two dot products instead of a full matrix multiply. In the current implementation, a wrong result passes with probability at most `1 / (2^32 - 5)` per check. A full matrix multiply, audited with two dot products.
Most of the transformer can then be checked exactly or canonically from committed openings.
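The check described above can be sketched in a few lines of Python. This is a toy illustration, not the CommitLLM implementation: it assumes all arithmetic is done modulo the prime 2^32 - 5 (which is what the stated `1 / (2^32 - 5)` bound suggests), uses small integer matrices in place of real quantized weights, and the function names `precompute`, `freivalds_check`, and `matvec` are made up for the example.

```python
import random

P = 2**32 - 5  # assumed modulus, taken from the post's 1 / (2^32 - 5) bound

def precompute(W, r):
    """v = r^T W mod P, computed once per weight matrix by the verifier."""
    rows, cols = len(W), len(W[0])
    return [sum(r[i] * W[i][j] for i in range(rows)) % P for j in range(cols)]

def freivalds_check(v, r, x, z):
    """Accept iff v . x == r . z (mod P): two dot products, no matrix multiply."""
    lhs = sum(vj * xj for vj, xj in zip(v, x)) % P
    rhs = sum(ri * zi for ri, zi in zip(r, z)) % P
    return lhs == rhs

def matvec(W, x):
    """Plain integer matrix-vector product, standing in for the provider's GPU."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

rng = random.Random(0)
W = [[rng.randrange(-128, 128) for _ in range(8)] for _ in range(4)]  # toy INT8-like weights
x = [rng.randrange(-128, 128) for _ in range(8)]
r = [rng.randrange(P) for _ in range(4)]  # secret vector, known only to the verifier
v = precompute(W, r)

z_honest = matvec(W, x)
z_bad = list(z_honest)
z_bad[2] += 1  # provider reports one wrong output entry

print(freivalds_check(v, r, x, z_honest))
print(freivalds_check(v, r, x, z_bad))
```

The honest result always passes because `v · x = rᵀ W x = rᵀ z` holds exactly. The tampered result passes only if the error vector happens to be orthogonal to `r` modulo `P`, which for a uniformly random secret `r` occurs with probability at most `1 / P`.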
Nonlinear operations such as activations and layer norms are canonically re-executed by the CPU verifier. The one honest caveat is attention: native FP16/BF16 attention is not bit-reproducible across hardware. CommitLLM verifies the shell around attention exactly, then independently replays attention and checks that the committed post-attention output stays within a measured INT8 corridor. So attention is bounded and audited, not proved exactly.
That means the protocol already gives very strong exact guarantees on the parts that matter most operationally. If an audited response used the wrong model, the wrong quantization/configuration, or a tampered input/deployment surface, the audit catches that exactly. That includes things like model swaps, silent requantization, and provider-side prompt or system prompt injection.
Today the implementation and measurements are strongest on Qwen and Llama. But the protocol itself is not meant to be Qwen or Llama specific: we expect it to generalize across open-weight decoder-only families. What still has to be done is the engineering work to integrate and validate more families explicitly, and we are already working on that.
On the measured path, online generation overhead is about 12 to 14% with the provider staying on the normal GPU serving path. The heavier receipt finalization cost is separate and can be deferred off the user-facing path. The main systems costs are RAM and bandwidth, not proof generation. The full response is always committed, but only a random fraction of responses are opened for audit. Individual audits are much larger, roughly 4 MB to 100 MB depending on audit depth. The important number is the amortized one: under a reasonable audit policy, the added bandwidth averages to roughly 300 KB per response.
After too many weeks without sleep, I’m proud to show what I built with @diego_aligned: CommitLLM. Thanks Diego for your patience. I've been calling you at random hours.
The code and paper still need some cleaning and formalization. We’re already in talks with multiple providers and teams that have cryptography-related ideas on how to improve it even more. We’re really excited about this and we will continue doubling down on building products in AI, cryptography and security with my company @class_lambda.
If governments, hospitals, defense and financial systems are going to run on LLMs, verifiable inference is not optional. It is infrastructure. I will be explaining this in more detail in the days to come and I will show how to test it and run it.

English
1
4
13
2K
Diego Kingston🟩
Diego Kingston🟩@diego_aligned·
Best place for sushi in Buenos Aires, amazing experience. Ergodic also focuses on craft
English
0
1
2
282
Diego Kingston🟩 retweeted
abdel
abdel@AbdelStark·
This is amazing work! Turns out there might be more ways than ZK to help solve the problem of verifiable AI! I really like the simplicity and pragmatism of the scheme. It seems to be a very interesting set of tradeoffs that could become suitable for multiple production use cases. I implemented a version of CommitLLM in Zig, fully compatible with the Rust reference implementation. In the demo video you can see cross-implementation checks, including a tamper attempt, a.k.a. the CommitLLM Rust prover trying to fool the CommitLLM Zig verifier and failing.
English
4
10
66
8K