SHA1 Is Broken, But It’s Still Good Enough for OpenTimestamps
$ cd examples/sha1 $ diff a b Binary files a and b differ $ ots verify -f a a_or_b.ots Success! Bitcoin attests data existed as of Wed May 17 21:45:45 2017 EDT $ ots verify -f b a_or_b.ots Success! Bitcoin attests data existed as of Wed May 17 21:45:45 2017 EDT
As you probably already guessed, this has something to do with the SHA1
cryptographic hash function. The
a_or_b.ots timestamp proof hashes its input
$ ots info a_or_b.ots | head -n 1 File sha1 hash: f92d74e3874587aaf443d1db961d4e26dde13e9c
…and while files
b are different, they have the same SHA1 digest:
$ sha1sum a b f92d74e3874587aaf443d1db961d4e26dde13e9c a f92d74e3874587aaf443d1db961d4e26dde13e9c b
This is possible because SHA1 is broken: it’s
vulnerable to collision
attacks, allowing an attacker
finds two different messages with the same SHA1 digest. For many cryptographic
protocols collision attacks are a disaster. For example, since Git identifies
source code by SHA1 hash, a software vendor could send an security auditing
team a clean version of a software package, get the auditors sign off on the
software being secure, then secretly send a backdoored version of the software
with the same SHA1 digest to unsuspecting users.
To make a long story short, cryptographers have been recommending against using
SHA1 for over a decade,
prior to Git’s initial release in fact.
So why are SHA1 operations allowed in OpenTimestamps proofs if SHA1 is broken?
Remember that a timestamp proof simply proves that a message existed prior to a
particular time in the past. Fortunately for us, SHA1 is only vulnerable to
collision attacks, not preimage attacks.
The difference is that in a collision attack, the attacker is finding two
different messages, and , that have the same digest. While there are
a variety of different types of collision attacks, they all share something in
common: when and are found, they’re found at the same time.
Thus our SHA1-using timestamp proof is secure: we know
b existed at
the same time, prior to when the Bitcoin block was created.
So what would it take for a timestamp proof to be insecure? A preimage attack.
Here the attacker starts with a existing digest , and has to find a new
message that has the same digest. In the context of a timestamp proof, the
new message makes the proof invalid: it didn’t exist when the proof was
Fortunately, for a variety of reasons the underlying math makes preimage
attacks much more difficult than collision attacks. As Zooko Wilcox notes
in the History of Hash Function Attacks
while collision attacks have been found for a large number of well-known
cryptographic hash functions (albeit all hash functions designed prior to
1998), the only well-known cryptographic hash function has ever been found to
be vulnerable to a preimage attack is Snefru-2, designed in 1990.
SHA1 and Security Engineering
So does this mean we can just use SHA1 everywhere? It’s faster and smaller than
SHA256 after all. Better yet, no-one’s found a preimage attack for MD5 – it’s
even faster and smaller than SHA1!
Not so fast! There’s a lot of good reasons to be dubious about using SHA1. In
fact, the only reason why OpenTimestamps supports SHA1 at all is for
compatibility with SHA1-using systems like Git and the Internet
Archive. If not
for compatibility considerations OpenTimestamps would support exactly one hash
function, SHA256; far more security vulnerabilities have been caused by bad code
than bad math.
Secondly that SHA1 digests can collide breaks assumptions people make about
hash functions. For example, it might be useful for a database of timestamp
proofs to include an index that maps output digest to input message: SHA1
collisions break that assumption. While it’s good to avoid bad assumptions,
it’s safer if you don’t have too. For this reason alone we may further restrict
the use of SHA1 in the future in OpenTimestamps, perhaps relegating it to some
kind of “safeties off” mode.
Finally, safety margins are a good thing. I sleep better at night knowing that
the vast majority of OpenTimestamps proofs are vulnerable to breaks in exactly
one hash function, SHA256. I know that if SHA256 is weak, it’s likely to fall
to a collision attack first, giving me a few years figure out how to move to a
better hash function. And I know that because of Bitcoin, there’s tremendous
incentives to break that hash function; the fact that no-one has found a way to
break it already is a good sign!