-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add md5 & some non-crypto hashes. Close #34 #44
base: master
Are you sure you want to change the base?
Conversation
As we are adding more and more hash functions: Why the split on default and max: for example blake2b produces hash output from 1 to 32 bytes, and other hashes can only be truncated in which case the default and maximum will be the same. This will make implementing multihash in various languages much easier. |
Thanks @dcbishop |
We could track that in the csv, yeah. but the implementations should not try to check and validate maxes on each of the hashes, that will be annoying to track and a source for bugs across implementations. implementations should have a global max they want to impose. |
0x66, Murmur3C | ||
0x67, Murmur3F | ||
0x75, crc32 | ||
0x76, crc32-C |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should source more of the "non cryptographic hashes", at least the useful ones. List of more:
and
- more crypto hashes: https://bench.cr.yp.to/results-hash.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a need for non-cryptographic hashes? I don't really think they have their place in the mutihash spec. I like the description in RFC 6920:
The various formats are designed to support, but not require, a
strong link to the referenced object, such that the referenced object
may be authenticated to the same degree as the reference to it.
Unless there is a use /not/ to reference object such that they can be authenticated by the hash, you could of course add MD5, CRC-32 and so on, but I would stay away from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be fantastic to have "multihash" being an IETF supported document, but adding hashes to the list like this would make it rather impossible.
The good thing by having the IANA registry is that algorithms in the registry can be clearly marked as being deprecated. This has happened to MD5 in almost all present algorithm registries. See for example here:
http://www.iana.org/assignments/dns-sec-alg-numbers/dns-sec-alg-numbers.xhtml
If going for your own registry, please have one or more extra fields describing these properties to the algorithms, with a possible source reference.
So you can add "status", and "reference" to the csv table.
It's worth considering that there are a lot of non-cryptographic hashes out there. Heaps when you include all the variants. CRC alone has 52. A single byte isn't enough for them. Might be worth considering a UTF8 like scheme where 0x80 and above indicate more bytes for the hash number. Although that would be backward incompatible as it would move into the size. Or change the scheme somehow if it's not set in stone at this stage (although I see there are heaps of implementations). Another approach would be a 0xFF code that indicates the hash is really wrapping another multihash, but with a different lookup table hashes, that would still specify the size to keep compatibility, but that would mean the size would get specified twice (once for the whole wrapped multihash, and again for the actual digest size). Maybe specify the whole 0xF0 and above. Otherwise just limit to common ones. |
@dcbishop We are already standardising varint encoding for multiformats multiformats/unsigned-varint#1 @jbenet I referenced the max of a hash because of discussion on other issue about what should happen if mulithash is/is requested to be longer than output of hash function. |
See #40 (merged). Intention was to do a varint eventually, and now's the time. |
Hmmm. Right. This complicates implementations but we should evaluate it, i guess. |
@dcbishop can you put all that into pages in the repo itself, as markdown? i'm very negative on github wikis because they're much harder to keep up to date, there's little to no review, etc. |
Thank you! That looks great!! But please no wiki-- wiki does not notify on edits and does not lend itself If you want collab to make sure things merge faster, we can do that. I know
|
@@ -10,3 +10,14 @@ code, name | |||
0x19, shake-256 | |||
0x40, blake2b | |||
0x41, blake2s | |||
0x55, md5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
md5 is now 0xd5 as of #76
Came to this while looking for the Can this be merged please? Many hardware vendors, including FIPS, rely on CRC32 and standard way of representing them help. |
Adds md5, xxhash, spooky, murmur and 2 variants of crc32