A friend asked me some time time ago how his bank’s OTP token worked. Most tokens that banks use (at least in Italy) are products of the “RSA SecurID” family, which are proprietary and secret (and rumored to have been compromised), but the general cryptography behind them is well-known and there are open standards that can be easily deployed by companies that want to add an extra layer of security. An emerging standard for OTP generation is called OATH, and given the general availability of free implementations, I will detail its algorithms in this post.
OTPs (one-time passwords) are based on the concept of a so-called cryptographically-secure pseudo-random number generator (aka CSPRNG). As many programmers know, pseudo-random generator (as found in the standard library of most programming languages, such as C’s rand()) are algorithms that generate a repeatable sequence of numbers that are “random looking”; there are several way to measure how random a sequence is, but the important property that differentiate PRNG from CSPRNG is not really concerned with randomness per-se, but rather with how easy is to predict the next number just by looking at the previous ones. This is important in the OTP context because obviously an attacker might get to know the previous numbers generated by the system (eg. through a key-logger installed on user’s computer), so it is paramount to make sure that he cannot exploit this knowledge to generate the next number.
Once we have chosen a suitable CSPRNG, it is sufficient that the server and the client agrees on the “seed” to be used for the sequence; in fact, both sides will be able to independently generate the same sequence and thus for the server it will be easy to check if the numbers generated by the client are the same that it generates by itself. As long as the algorithm is truly secure and the “seed” is not leaked, the ability of generating the next number can be considered a good authentication mechanism, since nobody else should be able to generate the same sequence. By embedding the seed onto an off-line physical device (the token), leaking the seed becomes almost impossible. If the device is stolen, it is sufficient to revoke it and assign a new device to the user (with a new “seed”).
I’m putting the word seed between quotes because it does not tell all the truth. Any PRNG algorithm works by keeping an internal state; when the next number is requested, the algorithm manipulates (changes) the internal state in some way, and then produces a number as result of this computation. In the simplest of all random algorithms (the same used by many implementation of C’s rand()), the internal state is simply the previous number being generated, but this is obviously not good for a CSPRNG. What we want is a way to generate a number from an internal state in a way not to leak any information (if possible at all) about the internal state. The “seed” is just a way to initializes the internal state, but if the PRNG is not secure, it will eventually leak enough information to let an attacker reconstruct the internal state, even if the seed itself was never leaked.
So, how do we make sure that we can generate a number from a state without leaking information? What we are looking for is called a one-way function. Luckily, in cryptography, we have plenty of one-way functions available: they are called “secure hash functions”, MD5 and SHA1 being the most popular. If we decide that we trust SHA-1 as a good one-way function (that is, we accept that our CSPRNG be as strong as SHA-1, and broken as soon as SHA-1 will be broken), then we don’t need a complex internal state: we can simply use a progressive counter for that. SHA-1′s properties in fact already guarantee that, given SHA1(N), there is no way to reconstruct N; and for every N’ very “similar” to N (eg: obtained by bit-flipping, or a simple increment), there is no correlation at all between SHA1(N’) and SHA1(N).
So, the sequence SHA1(0), SHA1(1), SHA1(2), etc. can be considered a CSPRNG. But it is just one sequence; it’s true that can be arbitrary long, but how can we obtain different sequences for different users? The first solution that comes to mind is to use the same technique that is used to differentiate hashes of the same password: a salt. If we assign each user a random salt S, we can prefix it to the counter and obtain a unique sequence by computing SHA1(S + N). Let’s see this at work with some basic Python:
import os
from hashlib import sha1
def OTP(salt=None, n=0, digits=6):
if salt is None:
salt = os.urandom(8)
print "Salt:", salt.encode("hex")
while True:
hash = sha1(salt + repr(n)).hexdigest()
num = long(hash, 16) % (10**digits)
yield "%06d" % num
n += 1
o = OTP()
print o.next()
print o.next()
which produces an output just like this:
Salt: e4cd2b5b4dbbed47 031049 163172
Up until now I have used the word “salt” but this is in fact incorrect. The term salt is supposed to indicate a random string of bytes which is generated for each invokation of the hash/cipher. In the CSPRNG case, instead, the random data is generated once per user, and reused through the life of the sequence. Together with the counter, this “salt” is actually the secret state of the algorithm. The correct term for that string is “secret key”; if it is leaked, the CSPRNG is basically broken (since recovering the current counter is just a matter of generating the whole sequence up until the match).
OATH is basically the above algorithm, with only a few variations to make it more secure in face of future attacks to the underlying SHA1 hash function:
Even with the above changes, OATH is quite easy to implement, and in fact there exist many different free implementations. For instance, you can download a generic OATH client for both iOS and Android, which are very good substitutes for a hardware token (with just the inconvenience that it is much much easier to extract the secret key in case an attack has physical access to the device for a while).
3 Responses
Guido
30|May|2011 1Out of curiosity, how can the server “follow” the token’s emitted codes? As you brightly showed, the token emits codes in a sequence, which is not in any way connected to local time (true?)
Now the server, given the identity of the token and the knowledge of the secret salt, should be able to generate the same sequence. But there is no obvious method of recognizing *which* code in the sequence we are delivered. Any hint?
Giovanni Bajo
30|May|2011 2There are two kind of tokens: event-based and time-based. Event based means that the counter goes up each time the user clicks a button. Time-based means the counter goes up every 30 seconds or so.
For event-based, the server keeps a log of the last counter that it was seen; then it is configured with a window size of values that are allowed (eg: 60 values after the last one that was seen). If the user presses the button 100 times since last login, the token is desynched, it doesn’t work anymore, and there is usually a procedure to login and then resync it (eg: an emergency login with an in-person authentication, with an additional never-otherwise-used super-secret password, or with a list of 10 random-generated one-time codes that are printed on a paper sheet). Resyncing is of course a matter of inserting the current value; the server keeps counting up until it finds the match, and store the current counter.
Time-based tokens work exactly the same, the only difference is that the allowed window depends on the accuracy of the device internal clock (we assume the server can always have a very accurate time source). If the device time source can be trusted, you can shorten the window down to even just 1 or 2 valid values. If the device time source is not fully accurate (like eg. a smartphone that gets the time from the UMTS protocol but syncs rarely to save batter and can be wrong even by 1-2 minutes), then you need a larger window.
Taifu
30|May|2011 3I think I’m “the friend that asked”
Great article!
Only a minor nitpicking: resyncing usually requires two values.