Samuel Neves

1.3K posts

@sevenps

Joined December 2010

241 Following · 645 Followers

@ciphergoth @oconnor663 @cryptodavidw It wouldn't make much sense to benchmark universal hashes against unkeyed collision-resistant hash functions. But an almost (Δ-)universal hash section on eBACS could be useful on its own.

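A classic ε-almost-Δ-universal family is polynomial evaluation over a prime field, with the key used as the evaluation point. Take two distinct messages of the same length n. The expression h_k(m) - h_k(m') - δ is then a nonzero polynomial in k of degree at most n, so it vanishes for at most n keys, which gives ε = n/p.

This sketch only illustrates the term. It is not something eBACS or SUPERCOP defines. The modulus 2^61 - 1 and the message format (a list of already-reduced field elements) are assumptions chosen for readability.

```python
# Illustrative epsilon-almost-Delta-universal hash: polynomial evaluation mod a prime.
# Assumptions (not from the thread): p = 2^61 - 1, message blocks already reduced mod p,
# and only messages of equal length are compared.
P61 = (1 << 61) - 1


def poly_hash(key: int, blocks: list[int], p: int = P61) -> int:
    """Return m_1*k^n + m_2*k^(n-1) + ... + m_n*k mod p, computed by Horner's rule."""
    acc = 0
    for m in blocks:
        acc = (acc + m) * key % p
    return acc


if __name__ == "__main__":
    # Exhaustive check on a tiny prime field: for fixed m != m' of length n,
    # no difference value delta is hit by more than n keys.
    p = 101
    m1, m2 = [1, 2, 3], [4, 5, 6]
    worst = max(
        sum(1 for k in range(p) if (poly_hash(k, m1, p) - poly_hash(k, m2, p)) % p == d)
        for d in range(p)
    )
    print(worst, "<=", len(m1))
```

Deployed constructions such as Poly1305 (modulus 2^130 - 5) additionally encode each block so that messages of different lengths do not trivially collide.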
Paul Crowley@ciphergoth·25 May

@oconnor663 @cryptodavidw Shame SUPERCOP doesn't measure almost-universal functions, it's very hard to be faster than that!

Jack O'Connor@oconnor663·24 May

Bragging: In this SUPERCOP benchmark of long inputs on an AVX-512 machine, BLAKE3 is the fastest hash function. Not the fastest cryptographic hash function. The fastest hash function. bench.cr.yp.to/results-hash.h… (#amd64-ygritte)

Samuel Neves@sevenps·8 May

@veorq eprint.iacr.org/2017/806 eprint.iacr.org/2017/490

JP Aumasson@veorq·8 May

what are some examples of bad puns in crypto papers titles? things like "What the Fork: Implementation Aspects of a Forkcipher" and "Does gate count matter? Hardware efficiency of (..)"

Samuel Neves@sevenps·3 Mar

@LennertWo @matthew_d_green @TomerAshur The permutation in Definition 1 has some kind of typo---b_1 is never used, and b_2 is used twice!

Lennert@LennertWo·3 Mar

@matthew_d_green Maybe @TomerAshur can have some fun with it?

Matthew Green@matthew_d_green·3 Mar

In 2005 my research group reverse-engineered an automotive cipher called DST40. Fifteen years later Tesla is using a variant of the same cipher: the DST80. Spoiler: it is not 2^40 times as strong. tches.iacr.org/index.php/TCHE…

Samuel Neves@sevenps·2 Jan

@kste_ @ciphergoth @veorq @WatsonLadd @SchmiegSophie The best trail (the usual caveats apply) for Salsa jumps from 2^-18 to 2^-46 for 3 to 4 rounds; the best trail for Chacha jumps from 2^-12 to 2^-39. But restricting the differences to the attacker-controlled 128 bits instead of the entire space would greatly decrease these probs.

Samuel Neves@sevenps·26 May

@SeanieCurran @veorq The comparison formulas there were derived independently and, if I remember correctly, unsigned < requires one fewer operation than Hacker's Delight.

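This sketch illustrates the operation count under discussion with two branch-free formulas for 32-bit unsigned x < y:

- The Hacker's Delight form is ((~x & y) | ((~x | y) & (x - y))) >> 31.
- A shorter form is (x ^ ((x ^ y) | ((x - y) ^ y))) >> 31.

Counting the final shift, that is seven operations versus six. We believe the shorter form resembles what the crypto coding rules repository uses, but that is an assumption; check the repository itself.

Python integers are unbounded and carry no timing guarantees, so this only verifies the bit arithmetic. In C both would operate on `uint32_t` and the masking would be implicit.

```python
# Branch-free unsigned 32-bit "x < y" (returns 1 or 0), two formulations.
# Python models uint32_t wraparound by masking; in C the masks are implicit.
# Why both work: if the top bits of x and y differ, the answer is the top bit of y;
# otherwise it is the top bit of x - y (mod 2^32).
MASK32 = 0xFFFFFFFF


def lt_u32_hackers_delight(x: int, y: int) -> int:
    # ((~x & y) | ((~x | y) & (x - y))) >> 31: seven operations
    nx = ~x & MASK32
    diff = (x - y) & MASK32
    return ((nx & y) | ((nx | y) & diff)) >> 31


def lt_u32_short(x: int, y: int) -> int:
    # (x ^ ((x ^ y) | ((x - y) ^ y))) >> 31: six operations
    diff = (x - y) & MASK32
    return (x ^ ((x ^ y) | (diff ^ y))) >> 31
```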
JP Aumasson@veorq·26 May

the "crypto coding rules" are back at github.com/veorq/cryptoco… originally started this in 2013, haven't touched it in years, just did some cleanup and update but still lot of work needed! PRs welcome :)

Samuel Neves@sevenps·28 Mar

@chrisrohlf Probably a similar interface to WRMSR or XSETBV: register index in ECX, upper bits in EDX. Since there's only one 32-bit register so far, both are hardcoded to 0.

Samuel Neves@sevenps·18 Feb

@kode54 The answer is no. The construction appears deceptively simple, but its security is not in question.

Samuel Neves@sevenps·2 Jan

@matthew_d_green @mjos_crypto tches.iacr.org/index.php/TCHE…

Matthew Green@matthew_d_green·2 Jan

@mjos_crypto Has anyone ever optimized these ciphers to work more efficiently when enciphering sequential counters as opposed to CBC/OCB where you have to feed actual plaintext into the cipher?

mjos\dwez @m-jos.bsky.social@mjos_crypto·2 Jan

Thanks to the inherent parallelism of AES-GCM (its only saving grace), future AVX512 CPUs can encrypt/decrypt+authenticate four AES blocks in parallel with VAESENC, VAESENCLAST, and VPCLMULQDQ. Why they're wasting huge amounts of area on VAESDEC and VAESDECLAST is a mystery (not needed).

Samuel Neves@sevenps·30 Nov

@johnregehr @lemire pubs.cray.com/content/S-2179… (search for "nosignedshifts") is the only one I know of.

Samuel Neves@sevenps·16 Nov

@oconnor663 I don't understand what you mean.

Jack O'Connor@oconnor663·16 Nov

@sevenps I wonder if that would "hardcode" too many of the particular features of the BLAKE2 compression function. For example, would this general interface take a "root node" or flag or a "leaf vs parent" IV parameter?

Jack O'Connor@oconnor663·15 Nov

@zooko @sevenps the latest benchmarks at github.com/oconnor663/bao… have BLAKE2s beating BLAKE2b after all. Both versions benefit from keeping the state words in transposed form while hashing multiple inputs, to avoid transposing them over and over. But BLAKE2s benefits much more.

Samuel Neves@sevenps·16 Nov

@oconnor663 I meant specify Bao in terms of a compression function (e.g., the one underlying blake2*) instead of a variable input size hash function.

Jack O'Connor@oconnor663·16 Nov

@sevenps Could you clarify "to specify the hash"? Do you mean like exposing a Bao API that takes a compression function as a parameter?

Samuel Neves@sevenps·16 Nov

@oconnor663 rdtsc(p) no longer counts cycles in most chips; it is a timer that runs at the nominal frequency of the processor, but the processor itself can clock higher or lower. So you need to force it to also run at the nominal frequency to have reasonably accurate cycle counts.

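Here is why an unpinned clock skews cycles-per-byte figures. Assume the TSC ticks at a fixed nominal rate while the core runs at some other constant rate during the measurement. Core cycles are then the TSC ticks scaled by the ratio of the two frequencies.

The function name and the example frequencies below are hypothetical. Real cores change frequency during a run, so pinning the clock (disabling Turbo Boost) is the reliable fix rather than correcting after the fact.

```python
def core_cycles_from_tsc(tsc_ticks: int, nominal_hz: float, core_hz: float) -> float:
    """Estimate core clock cycles from a TSC delta.

    Assumes an invariant TSC ticking at nominal_hz and a core that ran at a
    constant core_hz for the whole interval (a simplification).
    """
    return tsc_ticks * core_hz / nominal_hz


if __name__ == "__main__":
    # Hypothetical numbers: 3.0 GHz nominal, 4.5 GHz turbo, 4096-byte input.
    ticks, nbytes = 3000, 4096
    apparent_cpb = ticks / nbytes
    actual_cpb = core_cycles_from_tsc(ticks, 3.0e9, 4.5e9) / nbytes
    print(f"apparent {apparent_cpb:.2f} cpb, actual {actual_cpb:.2f} cpb")
```

With these made-up numbers, the TSC reports about 0.73 cycles per byte while the core actually spent about 1.10. The result looks 1.5x better than it really is.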
Jack O'Connor@oconnor663·16 Nov

@sevenps For example, the ones in this file: github.com/oconnor663/bla…. I thought Turbo Boost only changed the clock frequency, which shouldn't affect the cycles-per-byte. But maybe I've totally misunderstood what this ticks_modern() function is measuring? (docs.rs/amd64_timer/1.…)

Samuel Neves@sevenps·15 Nov

@oconnor663 Another thing---those (particularly the single-threaded) numbers are either too good to be true, or you're not actually disabling Turbo Boost for measuring.

Samuel Neves@sevenps·15 Nov

@oconnor663 There's little point in an AVX2 implementation of BLAKE2s, beyond taking advantage of AVX512F's native rotation instructions and such. On another note, have you considered using the compression function directly to specify the hash?

Samuel Neves@sevenps·9 Nov

@oconnor663 @zooko NEON should make a big difference, seeing that it has native 64-bit addition. On SUPERCOP, blake2b generally outperforms blake2s where NEON is present, e.g., bench.cr.yp.to/results-hash.h… (#armeabi-pi2). On the other hand, blake2s does not generally benefit from NEON, but tree'd blake2s might.

Jack O'Connor@oconnor663·9 Nov

@zooko @sevenps I just put up some preliminary benchmark results for 32-bit ARM at github.com/oconnor663/bao…. As expected, BLAKE2s dramatically outperforms BLAKE2b. I don't know if NEON would affect things in either direction, but I haven't ported anything yet.

Samuel Neves@sevenps·8 Nov

@oconnor663 @zooko Twitter is really not the best medium for this. Everything's out of order. blake2sp is essentially the same speed as blake2bp but is more sensitive to compiler codegen quirks, so depending on compiler version/flags it is often slower.

Jack O'Connor@oconnor663·8 Nov

@zooko @sevenps God so many threading fails :p Believe it or not this is my first long Twitter thread.

Jack O'Connor@oconnor663·3 Nov

@zooko I've been working on a tree hash based on BLAKE2b, and it's at the point where it needs a review from a Real Cryptographer. Do you know anyone who might be interested in collaborating on something like that? github.com/oconnor663/bao

Samuel Neves@sevenps·16 Oct

@pbarreto @cryptojedi @dsp6s The branch is caused by the (unintentional?) conversion of `flip` to `double`, not the mask generation. godbolt.org/z/kAvVYp Cleaner version: godbolt.org/z/6fQ9iZ

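The C code behind the godbolt links is not reproduced here, so this sketches only the general technique the tweet refers to. A 0/1 selector is turned into an all-zeros or all-ones mask by integer negation, and the mask then drives a swap without a branch.

The names `mask_from_bit` and `cswap` are hypothetical, and 64-bit words are an assumption. In C the selector should stay an unsigned integer type end to end. Per the tweet, the branch came from converting `flip` to `double`, not from the mask generation.

```python
# Branch-free conditional swap driven by a 0/1 selector (illustrative sketch).
MASK64 = (1 << 64) - 1


def mask_from_bit(flip: int) -> int:
    """Map flip in {0, 1} to 0 or 2^64 - 1, like 0 - flip on a uint64_t in C."""
    return (-flip) & MASK64


def cswap(a: int, b: int, flip: int) -> tuple[int, int]:
    """Return (b, a) if flip == 1 and (a, b) if flip == 0, with no data-dependent branch."""
    t = (a ^ b) & mask_from_bit(flip)
    return a ^ t, b ^ t
```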
Samuel Neves@sevenps·13 Oct

@oe1cxw @rygorous Neat, the high part does the inversion itself. You can also compute the xor of any number of rotations of a word; for example SHA-256's S1 is doable as clmul(e, 0x4200080) ^ clmulh(e, 0x4200080).

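The SHA-256 identity can be checked directly. A carry-less multiply by a constant with a bit at position 32 - r puts x << (32 - r) in the low word and x >> r in the high word. Those two halves together form the 32-bit right rotation by r.

Setting bits 26, 21 and 7 (for r = 6, 11 and 25) gives 0x4200080. XORing the low and high halves then yields S1(e) = ROTR6(e) ^ ROTR11(e) ^ ROTR25(e).

The Python model of `clmul`/`clmulh` assumes RISC-V bitmanip semantics with XLEN = 32, i.e. the low and high 32 bits of the carry-less product.

```python
# Check: SHA-256 S1 via carry-less multiplication (RISC-V-style clmul/clmulh, XLEN = 32).
MASK32 = 0xFFFFFFFF


def clmul_full(a: int, b: int) -> int:
    """Carry-less (GF(2)[X]) product of two 32-bit values, up to 63 bits."""
    r = 0
    for i in range(32):
        if (b >> i) & 1:
            r ^= a << i
    return r


def clmul(a: int, b: int) -> int:
    return clmul_full(a, b) & MASK32  # low 32 bits


def clmulh(a: int, b: int) -> int:
    return clmul_full(a, b) >> 32  # high 32 bits


def rotr32(x: int, r: int) -> int:
    return ((x >> r) | (x << (32 - r))) & MASK32


def sigma1_reference(e: int) -> int:
    return rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25)


def sigma1_clmul(e: int) -> int:
    c = 0x4200080  # bits 26, 21, 7, i.e. 32 - 6, 32 - 11, 32 - 25
    return clmul(e, c) ^ clmulh(e, c)
```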
Claire Xen 🏳️‍⚧️🧙🏻‍♀️ 💖💛💙 BLM 🏴🚩@oe1cxw·13 Oct

@sevenps @rygorous Oh, yes. That also works. 🧐 However, my "clmul_gray2" uses one instruction fewer. 😝 So current status of gray code vs clmul invariants:

Claire Xen 🏳️‍⚧️🧙🏻‍♀️ 💖💛💙 BLM 🏴🚩@oe1cxw·11 Oct

Can you name applications for CLMUL that are not CRC, GCM, hashing/rng, or Erasure Code? I'm trying to create a list of possible applications. #RISCV #Bitmanip #Followerpower cc @rygorous @geofflangdale @alt_kia @lemire @rrika9

Samuel Neves@sevenps·13 Oct

@oe1cxw @rygorous Since the Gray code is bit-reversed polynomial multiplication by x + 1, whose inverse modulo x^32 is all 1s, you can also have grev32(clmul32(grev32(x ^ (x >> 1), 31), -1), 31) == x.

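The Gray code identity can also be checked mechanically. After bit reversal, x ^ (x >> 1) becomes r ^ (r << 1) truncated to 32 bits, where r is the reversed x. That is carry-less multiplication by the polynomial 1 + X. Since (1 + X)(1 + X + ... + X^31) = 1 + X^32, which is 1 modulo X^32, multiplying by the all-ones word undoes it.

In this sketch `grev32(v, 31)` is modelled as a full 32-bit bit reversal and `clmul32` as the low 32 bits of the carry-less product. This follows the naming in the tweet; it is an illustration, not the xbitmanip draft's reference code.

```python
# Gray-code decode via bit reversal and one carry-less multiply (illustrative check).
MASK32 = 0xFFFFFFFF


def rev32(v: int) -> int:
    """Full 32-bit bit reversal (what grev32(v, 31) computes)."""
    return int(format(v, "032b")[::-1], 2)


def clmul32(a: int, b: int) -> int:
    """Low 32 bits of the carry-less product of a and b."""
    r = 0
    for i in range(32):
        if (b >> i) & 1:
            r ^= a << i
    return r & MASK32


def gray(v: int) -> int:
    return v ^ (v >> 1)


def gray_decode_clmul(g: int) -> int:
    # The tweet's clmul32(..., -1): -1 as a 32-bit word is all ones.
    return rev32(clmul32(rev32(g), MASK32))


def gray_decode_prefix_xor(g: int) -> int:
    """Conventional decoder: XOR of all right shifts of g."""
    v = 0
    while g:
        v ^= g
        g >>= 1
    return v
```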
Claire Xen 🏳️‍⚧️🧙🏻‍♀️ 💖💛💙 BLM 🏴🚩@oe1cxw·12 Oct

@rygorous JFYI: I've now added this as clmul application to the current xbitmanip draft at raw.githubusercontent.com/cliffordwolf/x… (you are now listed as contributor to xbitmanip, btw) and I've added the following invariant to my test cases.

Samuel Neves@sevenps·28 Apr

@ciphergoth If there's a place where you'd find this kind of thing, it would probably be Jörg Arndt's book: jjj.de/fxt/fxtpage.ht…

Paul Crowley@ciphergoth·28 Apr

Ever seen an algorithm so neat you have to share it? This is just the best way I've ever seen to generate every permutation of a list, and I can't find it online anywhere else. gist.github.com/ciphergoth/3d8…
