During (and just prior to) DEF CON, |)ruid ran a series of challenges for entrance into his LOLBitcoin party. I decided to give them a shot, and thought I’d document my approach to one of the challenges here.
There were several different paths available, so that anyone could approach the challenges using whatever skills they felt were their strongest. I decided to go down ‘The Way of the Cryptologist’ and was met with a few challenges put together by Dan Crowley.
It’s the second of these which I’m describing here. Dan himself beat me to the punch with a solid writeup of his own, along with a tool he released to attack this challenge and similarly flawed uses of crypto.
The challenge was presented in the form of two ciphertexts, and a statement that they were encrypted in a manner that supports Perfect Secrecy. That is, no amount of computational power should provide an aid in recovering the plaintext. However, as we’ll see, the devil’s in the details, and implementation flaws can still cause havoc.
At first, I saw the leading characters ‘ecc885’ and thought this was a hint at Elliptic Curve Cryptography, something I’ll admit I’m not too familiar with.
It was only after doing some research on ECC that I realized the cipher itself probably doesn’t matter, but that these leading bytes being identical was significant. They’re an indicator that the same key, from either a stream cipher of some sort or a One-Time Pad, was used to encrypt both.
Stream Cipher and OTP Mode of Operation
Stream ciphers operate by producing a continuous stream of bytes (the ‘keystream’), which is then mixed with the plaintext one byte at a time by an xor operation to produce the ciphertext. Effectively, the keystream operates as a one-time pad, with the difference that another endpoint which knows the parameters for the stream function can reproduce it and use it for decryption.
For our purposes, whether this is a stream cipher or an OTP is irrelevant. What matters is that the same keystream was applied to two different plaintexts, a flaw that allows us to decode the plaintexts with a little effort.
Consider the following operation;
By xor’ing the two ciphertexts together, the keystream is essentially xor’d against itself, and removed from our text. This leaves the contents of the two plaintexts mixed together via xor.
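Written out as a short derivation, the cancellation looks like this. The symbol names are my own, picked for illustration: P_1 and P_2 are the plaintexts, K is the shared keystream, and C_1 and C_2 are the ciphertexts.

```latex
\begin{align*}
C_1 &= P_1 \oplus K \\
C_2 &= P_2 \oplus K \\
C_1 \oplus C_2 &= (P_1 \oplus K) \oplus (P_2 \oplus K) \\
               &= P_1 \oplus P_2 \oplus (K \oplus K) \\
               &= P_1 \oplus P_2
\end{align*}
```

The last two steps rely on xor being commutative and associative, and on any value xor’d with itself being zero.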
To do this in Ruby, we’ll need the ability to xor the two ciphertexts together. We also need to be able to read in and decode the ascii-hex representations of the ciphertexts that were provided. Ruby doesn’t provide xor operations on strings by default, but I already had some code from the Matasano Crypto Challenges which does this for me. Full Disclosure: I work for Matasano, but am not involved in the challenges, aside from enjoying them and learning from them. If you’re reading this post, you’d probably like them too!
Here are my patches to Ruby’s String class:
This code patches Ruby’s String class, adding a .from_hex method to decode hex strings, .to_hex for the reverse, and an xor operator using the ^ symbol, similar to Ruby’s built-in numeric xor functions. This makes it pretty transparent to use. It also allows us to xor a Ruby string against another arbitrary-length string, or a single-byte Fixnum (Yah, I should make this take arbitrary numeric lengths..) There are even some exceptions raised if the parameters aren’t supported, etc.
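For readers who want something to run, here is a minimal sketch of a patch with that shape. It is a reconstruction from the description above, not the original code, so the details are assumptions: a shorter string operand is repeated to cover the receiver, the numeric operand is checked as an Integer (Fixnum was folded into Integer in Ruby 2.4), and the exception types are my own choice.

```ruby
# Sketch only: a reconstruction from the prose description, not the original patch.
class String
  # Decode an ASCII-hex string ("6869") into raw bytes ("hi").
  def from_hex
    raise ArgumentError, "odd-length or non-hex input" unless match?(/\A(\h\h)*\z/)
    [self].pack("H*")
  end

  # Encode raw bytes as lowercase ASCII-hex.
  def to_hex
    unpack1("H*")
  end

  # xor against another String (assumption: repeated to cover the receiver)
  # or against a single byte given as an Integer.
  def ^(other)
    key = case other
          when Integer
            raise ArgumentError, "byte must be in 0..255" unless (0..255).cover?(other)
            [other]
          when String
            raise ArgumentError, "empty key" if other.empty?
            other.bytes
          else
            raise TypeError, "can't xor String with #{other.class}"
          end
    bytes.each_with_index.map { |b, i| b ^ key[i % key.length] }.pack("C*")
  end
end

puts ("hello" ^ "key").to_hex   # hex of the xor'd bytes
puts ("hello" ^ "key") ^ "key"  # xor'ing twice with the same key round-trips
```

Repeating the shorter operand is what makes a repeating-key xor fall out of the same operator for free; if you would rather truncate to the shorter string, only the index expression inside the block needs to change.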
In IRB, we can now do something like the following;
Notice that while we still haven’t recovered the plaintext, we have removed the influence of the keystream, which makes our job much easier.
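A minimal stand-in session, using made-up plaintexts and a made-up pad rather than the challenge ciphertexts, might look like the sketch below. The helper name xor_bytes and all of the data are assumptions for illustration; it uses a plain function instead of the String patch so that it runs on its own.

```ruby
# Stand-in data only: these are NOT the challenge ciphertexts.

# xor two byte strings, truncating to the shorter one.
def xor_bytes(a, b)
  len = [a.bytesize, b.bytesize].min
  a.bytes.first(len).zip(b.bytes.first(len)).map { |x, y| x ^ y }.pack("C*")
end

keystream = ["9f3a6c01d4e87b25"].pack("H*") # hypothetical 8-byte pad

# Two messages encrypted with the SAME pad, presented as ascii-hex.
c1 = xor_bytes("the fox ", keystream).unpack1("H*")
c2 = xor_bytes("the dog ", keystream).unpack1("H*")
puts c1, c2

# Decode the hex and xor the ciphertexts together: the pad drops out.
mixed = xor_bytes([c1].pack("H*"), [c2].pack("H*"))
puts mixed.unpack1("H*") # => 0000000002001f00
```

The shared "the " prefix produces identical leading ciphertext bytes, and shows up as zero bytes once the two ciphertexts are mixed. Only the positions where the plaintexts differ are non-zero.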
From here, our task becomes separating the two plaintexts from one another. To do this, we have to do a little guessing about the plaintexts themselves. This sort of guessing is called a known plaintext attack, or ‘cribbing’. The idea is that we can assume the plaintext is (in this case) likely an English-language text, and thus guess that it’ll include some common words, articles, etc. From what we know of the challenge, we can guess that it may contain the words ‘bitcoin’, ‘crypto’, or ‘party’ in addition to more general words such as ‘the’, ‘of’, and ‘and.’ Believe it or not, these assumptions give us enough to recover both plaintexts!
Even if our guesses are correct, we don’t know where in the plaintext they occur. So let’s write a function which takes our crib, and xor’s it at every position in the mixed plaintext string. If it’s valid in either plaintext, we should see the contents of the other plaintext and know that we’ve guessed correctly at that position.
This function takes two parameters, our ciphertext (mixed plaintext in this case) and the crib. It loops through the ciphertext, and xor’s the crib at each index. While it’s doing this, it prints out both the index and the xor’d value. We can just eyeball and get an idea where we have a valid crib.
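A function along those lines can be sketched as follows. The names crib_drag and xor_bytes are hypothetical, and the demo plaintexts are made up for illustration; unlike the description above, this version also returns the [index, result] pairs so they can be inspected programmatically.

```ruby
# Sketch only: hypothetical names and demo data, not the original function.

# xor two byte strings, truncating to the shorter one.
def xor_bytes(a, b)
  len = [a.bytesize, b.bytesize].min
  a.bytes.first(len).zip(b.bytes.first(len)).map { |x, y| x ^ y }.pack("C*")
end

# Slide the crib across the mixed plaintexts, printing the index and the
# xor'd bytes at each position. Returns the [index, result] pairs.
def crib_drag(mixed, crib)
  (0..mixed.bytesize - crib.bytesize).map do |i|
    guess = xor_bytes(mixed.byteslice(i, crib.bytesize), crib)
    puts format("%3d: %s", i, guess.inspect)
    [i, guess]
  end
end

# Made-up plaintexts of equal length, mixed the way two ciphertexts
# sharing a keystream would be.
p1 = "ask the cryptographer about it"
p2 = "bring some bitcoin to my party"
crib_drag(xor_bytes(p1, p2), "bitcoin")
```

Most rows come out as escaped junk. The row at the index where ‘bitcoin’ really sits in one plaintext shows the matching slice of the other plaintext in the clear.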
For example, let’s try ‘bitcoin’ as our crib.
It’s mostly gibberish, but at index 39, we see the string ‘rapher’ appears. This seems like it could be valid plaintext. Maybe this is a substring of ‘cryptographer?’
Let’s try that as a crib, and see what we get. Since we prepended a few characters, we can look a little prior to our last index, where we should expect a match.
That gives us ’t lawl.bitcoi’ at index 32. By flipping back and forth between our two plaintexts, and making guesses of what might surround the characters we’ve recovered, we can gradually expand this string to recover the entirety of both plaintexts. It’s helpful to try cribs as large as possible, and to remember that whitespace counts. “ the ” gets two more characters than “the” for example, with essentially the same guess.
Eventually, by iterating through this process and expanding/adding to our cribs, we end up with the following recovered plaintexts;
Notice that the first three bytes of both plaintexts are the same. This is why we saw the duplicate ‘ecc885’ at the start of the ciphertexts. Identical characters at the same position, mixed with the same keystream, give the same ciphertext.
There’s plenty of room for improvement in this code. Notably, it doesn’t make any effort to record successful cribs. In my case, I started with one fairly large crib, and gradually expanded its boundaries until I recovered the plaintext. It may be simpler to record the position of successful guesses as a partial keystream, and then work on filling in the gaps. Dan’s tool does this, and it makes small cribs and articles far more useful.
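One way that bookkeeping could look is sketched below. This is my own illustration, not how Dan’s tool is implemented: the class name PartialKeystream, its methods, the seeded demo pad, and the demo plaintexts are all assumptions.

```ruby
# Sketch only: a hypothetical helper, not a description of Dan's tool.

# xor two byte strings, truncating to the shorter one.
def xor_bytes(a, b)
  len = [a.bytesize, b.bytesize].min
  a.bytes.first(len).zip(b.bytes.first(len)).map { |x, y| x ^ y }.pack("C*")
end

class PartialKeystream
  def initialize(length)
    @bytes = Array.new(length) # nil means "not recovered yet"
  end

  # Record the keystream bytes implied by a confirmed crib at `index`
  # in the plaintext behind `ciphertext`.
  def record(ciphertext, crib, index)
    if index.negative? || index + crib.bytesize > ciphertext.bytesize
      raise ArgumentError, "crib does not fit at index #{index}"
    end
    crib.bytes.each_with_index do |b, j|
      @bytes[index + j] = ciphertext.getbyte(index + j) ^ b
    end
  end

  # Decrypt what we can, marking unknown positions with a placeholder.
  def decrypt(ciphertext, unknown = "_")
    ciphertext.bytes.each_with_index.map { |c, i|
      @bytes[i] ? (c ^ @bytes[i]).chr : unknown
    }.join
  end
end

# Made-up plaintexts and a seeded, made-up pad for the demo.
p1 = "ask the cryptographer about it"
p2 = "bring some bitcoin to my party"
pad = Random.new(1).bytes(p1.bytesize)
c1 = xor_bytes(p1, pad)
c2 = xor_bytes(p2, pad)

ks = PartialKeystream.new(c1.bytesize)
ks.record(c2, "bitcoin", 11) # a confirmed guess in the second plaintext
puts ks.decrypt(c1)
ks.record(c1, "ask the ", 0) # a later guess in the first plaintext
puts ks.decrypt(c2)
```

Because every confirmed guess is stored against the keystream rather than against one plaintext, a short crib anywhere in either message immediately fills in the same positions of the other.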
Another improvement, and something I actually did a bit of, would be to score recovered plaintexts against an English-language character frequency. This would give us some automated insight into what may be a successful crib, rather than relying on scanning the output for obvious matches. It’s not a perfect solution though, especially for shorter texts, but could be a useful aid.
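As a rough idea of what such scoring could look like, here is a deliberately crude sketch. It is not the scoring I used: the function name english_score is hypothetical and the weights are arbitrary. It rewards the most common English characters (the familiar "etaoin shrdlu" set plus the space) and penalizes non-printable bytes, rather than comparing against a real frequency table.

```ruby
# Sketch only: a crude heuristic with arbitrary weights.
COMMON = " etaoinshrdlu".freeze

def english_score(text)
  text.each_byte.sum do |b|
    if b < 0x20 || b > 0x7e
      -5 # non-printable: almost certainly a bad guess
    elsif COMMON.include?(b.chr.downcase)
      2  # very common English character
    elsif b.chr.match?(/[a-z]/i)
      1  # some other letter
    else
      0  # digits and punctuation: neutral
    end
  end
end

# Rank a few candidate fragments, best first.
candidates = ["rapher ", "\x02\x1f\x00\x7fqz", "xq#zj~k"]
candidates.sort_by { |c| -english_score(c) }.each do |c|
  puts format("%4d  %s", english_score(c), c.inspect)
end
```

Applied to the output of a crib drag, a score like this lets you sort the rows and look only at the top few, which helps most when the crib is long enough for the score to be meaningful.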
Anyhow, I hope this post was helpful, and explains my approach to attacking reused keys. Thanks for reading!