27 Hash Algorithm

Learning Objectives

  • To introduce the general ideas behind cryptographic hash functions
  • To distinguish between two categories of hash function: those with a compression function made from scratch and those with a block cipher as the compression function
  • To discuss the structure of SHA-512 as an example of a cryptographic hash function with a compression function made from scratch
  • To discuss the structure of Whirlpool as an example of a cryptographic hash function with a block cipher as the compression function

  1.  Hash Function Properties

 

A hash function produces a fingerprint of a file, message, or other block of data: h = H(M). It condenses a variable-length message M to a fixed-size fingerprint h. The hash function itself is assumed to be public.
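As a small illustration of the "variable-length in, fixed-size out" behaviour, the sketch below uses SHA-512 from Python's standard `hashlib` module. The helper name `fingerprint` is only a label chosen for this example.

```python
import hashlib

def fingerprint(message: bytes) -> str:
    """Return h = H(M) as a hex string, with H taken to be SHA-512."""
    return hashlib.sha512(message).hexdigest()

# Messages of very different lengths all condense to a 512-bit fingerprint.
for m in (b"", b"abc", b"a" * 1_000_000):
    h = fingerprint(m)
    print(len(m), "bytes ->", len(h) * 4, "bit digest")
```

Each hex character carries 4 bits, so the 128-character string corresponds to the 512-bit digest.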

 

2.  Requirements for Hash Functions

 

A good hash function H must satisfy the following requirements:

  • It can be applied to a message M of any size.
  • It produces a fixed-length output h.
  • h = H(M) is easy to compute for any message M.
  • One-way property: given h, it is infeasible to find x such that H(x) = h.
  • Weak collision resistance: given x, it is infeasible to find y ≠ x such that H(y) = H(x).
  • Strong collision resistance: it is infeasible to find any pair x, y with x ≠ y such that H(y) = H(x).

These are the specifications for good hash functions. Essentially, it must be extremely difficult to find two messages with the same hash, and the hash should not be related to the message in any obvious way (i.e. it should be a complex non-linear function of the message). There are quite a few similarities in the evolution of hash functions and block ciphers, and in the evolution of the design requirements on both.
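The requirement that the hash "should not be related to the message in any obvious way" can be observed experimentally. The following sketch, which assumes SHA-256 from Python's `hashlib` as the hash and uses an arbitrary sample message, flips a single input bit and counts how many of the 256 digest bits change. The function name `bit_difference` is hypothetical.

```python
import hashlib

def bit_difference(m1: bytes, m2: bytes) -> int:
    """Count the digest bits that differ between SHA-256(m1) and SHA-256(m2)."""
    d1 = int.from_bytes(hashlib.sha256(m1).digest(), "big")
    d2 = int.from_bytes(hashlib.sha256(m2).digest(), "big")
    return bin(d1 ^ d2).count("1")

original = b"transfer 100 to alice"
flipped = bytes([original[0] ^ 0x01]) + original[1:]   # flip one input bit

print(bit_difference(original, flipped), "of 256 digest bits changed")
```

For a well-designed hash, roughly half of the output bits are expected to change, which is what makes the output look unrelated to the input.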

3.  Block Ciphers as Hash Functions

A block cipher can be used as a hash function. Set $H_0 = 0$, zero-pad the final block, and compute $H_i = E_{M_i}[H_{i-1}]$, using each message block $M_i$ as the cipher key; the final value is used as the hash. The construction is similar to CBC mode but without a secret key. With a 64-bit block cipher the resulting hash is too small (64 bits), which exposes it both to a direct birthday attack and to a "meet-in-the-middle" attack. Other variants are also susceptible to attack.
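The chaining rule $H_i = E_{M_i}[H_{i-1}]$ can be sketched in a few lines. Python's standard library has no block cipher, so the example assumes a small XTEA-style cipher (64-bit block, 128-bit key) written inline as a stand-in; it has not been checked against published test vectors and is for illustration only. The names `toy_encrypt` and `block_cipher_hash` are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF

def toy_encrypt(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    """XTEA-style cipher: 64-bit block, 128-bit key (illustration only)."""
    k = struct.unpack(">4I", key)
    v0, v1 = struct.unpack(">2I", block)
    delta, total = 0x9E3779B9, 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + k[total & 3]))) & MASK32
        total = (total + delta) & MASK32
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + k[(total >> 11) & 3]))) & MASK32
    return struct.pack(">2I", v0, v1)

def block_cipher_hash(message: bytes) -> bytes:
    """H_0 = 0; H_i = E_{M_i}[H_{i-1}]; the final H_i is the 64-bit hash."""
    if len(message) % 16:                       # zero-pad the final block
        message += b"\x00" * (16 - len(message) % 16)
    h = b"\x00" * 8                             # H_0 = 0
    for i in range(0, len(message), 16):
        h = toy_encrypt(message[i:i + 16], h)   # the message block acts as the key
    return h

print(block_cipher_hash(b"abc").hex())
```

Besides the short 64-bit output, the sketch exposes a weakness of plain zero-padding: `b"abc"` and `b"abc\x00"` pad to the same block and therefore hash to the same value.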

4.  Hash Algorithms

Hash functions and block ciphers have evolved along similar lines. The increasing power of brute-force attacks has driven an evolution in algorithms: from DES to AES in block ciphers, and from MD4 and MD5 to SHA-1 and RIPEMD-160 in hash algorithms. Like block ciphers, hash algorithms also tend to use a common iterative structure.

4.1 MD5

 

MD5 is a very widely used member of Rivest's family of hash functions. It was designed by Ronald Rivest (the R in RSA) as the latest in a series that includes MD2 and MD4. It produces a 128-bit hash value and was until recently the most widely used hash algorithm, although in recent times both brute-force and cryptanalytic concerns have arisen. It is specified in RFC 1321. The padded message is broken into 512-bit blocks. Each block is processed along with the buffer value using 4 rounds, and the result is added to the input buffer to make the new buffer value. This is repeated until the message is exhausted, and the final buffer value is used as the hash. Note that, because of the padding, the final block is always full and contains the message length.

4.1.1   MD5 Compression Function

 

Each round mixes the buffer input with the next "word" of the message in a complex, non-linear manner. A different non-linear function is used in each of the 4 rounds (but the same function for all 16 steps in a round). The 4 buffer words (a, b, c, d) are rotated from step to step so that all are used and updated. Each round has 16 steps of the form:

a ← b + ((a + g(b,c,d) + X[k] + T[i]) <<< s)

where:

  • a, b, c, d refer to the 4 words of the buffer, used in a varying permutation from step to step. Each step updates only 1 word of the buffer; after 16 steps each word has been updated 4 times.
  • g(b,c,d) is one of the primitive non-linear functions F, G, H, I, used in rounds 1 to 4 respectively.
  • X[k] is the kth 32-bit word in the current message block.
  • T[i] is the ith entry in the table of constants T, derived from the sine function.
  • <<< s is a circular left shift (rotation) of the 32-bit argument by s bits, and + is addition modulo $2^{32}$.

The addition of the varying constants T and the use of different shifts help ensure that it is extremely difficult to compute collisions.
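The padding, the four rounds of 16 steps, and the final addition to the buffer can be put together in a compact sketch. The rotation amounts, word-selection rules and initial buffer values below are taken from RFC 1321 rather than from the text above, the table T is indexed from 0 here, and the function name `md5` is simply a label for this example. It is meant for study only, not for production use.

```python
import math
import struct

MASK = 0xFFFFFFFF
# Rotation amount s for each of the 64 steps (four values per round, cycled).
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Sine-derived constants: T[i] = floor(2^32 * |sin(i + 1)|), 0-based index.
T = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]

def rotl(x: int, s: int) -> int:
    """Circular left shift of a 32-bit word."""
    return ((x << s) | (x >> (32 - s))) & MASK

def md5(message: bytes) -> bytes:
    bit_len = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    # Pad with a 1-bit, then zeros up to 56 mod 64 bytes, then the 64-bit length.
    message += b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    message += struct.pack("<Q", bit_len)
    buf = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    for off in range(0, len(message), 64):
        X = struct.unpack("<16I", message[off:off + 64])
        a, b, c, d = buf
        for i in range(64):
            if i < 16:                       # round 1: F
                g, k = (b & c) | (~b & d), i
            elif i < 32:                     # round 2: G
                g, k = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:                     # round 3: H
                g, k = b ^ c ^ d, (3 * i + 5) % 16
            else:                            # round 4: I
                g, k = c ^ (b | ~d), (7 * i) % 16
            new_b = (b + rotl((a + g + X[k] + T[i]) & MASK, S[i])) & MASK
            a, b, c, d = d, new_b, b, c      # rotate the buffer words
        # Add the result to the input buffer to form the new buffer value.
        buf = [(x + y) & MASK for x, y in zip(buf, (a, b, c, d))]
    return struct.pack("<4I", *buf)

print(md5(b"abc").hex())
```

Only one buffer word is recomputed per step (`new_b`); the other three are simply moved along, which is the rotation of (a, b, c, d) described above.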

4.1.2   MD4

 

MD4 is the precursor to MD5, and was widely used. It uses 3 instead of 4 rounds, and the round functions are a little simpler. In creating MD5, Rivest aimed to strengthen the algorithm by introducing the extra round and varying the constants used. MD5 design goals: collision resistant (hard to find collisions); direct security (no dependence on "hard" problems); fast, simple, and compact; favours little-endian systems (e.g. PCs).

 

4.1.3   Strength of MD5

 

Some progress has been made in analysing MD5, which, along with its hash size of 128 bits, means it is starting to look too small; hence the interest in hash functions that create larger hashes. The MD5 hash is dependent on all message bits, and Rivest claims its security is as good as can be. Known attacks are:

  • Berson (1992) attacked any 1 round using differential cryptanalysis, but could not extend the attack.
  • Boer and Bosselaers (1993) found a pseudo-collision, but were again unable to extend it.
  • Dobbertin (1996) created collisions on the MD5 compression function, but the initial constants prevent an exploit.

The conclusion is that MD5 looks likely to become vulnerable soon.

4.2 Secure Hash Algorithm (SHA)

The Secure Hash Algorithm (SHA) was developed by the National Institute of Standards and Technology (NIST) and published as a federal information processing standard (FIPS 180) in 1993; a revised version was issued as FIPS 180-1 in 1995 and is generally referred to as SHA-1. The actual standards document is entitled Secure Hash Standard. SHA is based on the hash function MD4 and its design closely models MD4. SHA-1 is also specified in RFC 3174, which essentially duplicates the material in FIPS 180-1, but adds a C code implementation. SHA-1 produces a hash value of 160 bits. In 2002, NIST produced a revised version of the standard, FIPS 180-2, that defined three new versions of SHA, with hash value lengths of 256, 384, and 512 bits, known as SHA-256, SHA-384, and SHA-512 (Table 28.1). These new versions have the same underlying structure and use the same types of modular arithmetic and logical binary operations as SHA-1. In 2005, NIST announced the intention to phase out approval of SHA-1 and move to a reliance on the other SHA versions by 2010. Shortly thereafter, a research team described an attack in which two separate messages could be found that deliver the same SHA-1 hash using $2^{69}$ operations, far fewer than the $2^{80}$ operations previously thought needed to find a collision with an SHA-1 hash. This result should hasten the transition to the other versions of SHA.

 

Notes:

  1. All sizes are measured in bits.
  2. Security refers to the fact that a birthday attack on a message digest of size n produces a collision with a work factor of approximately $2^{n/2}$.
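The $2^{n/2}$ work factor can be observed on a deliberately weakened hash. The sketch below truncates SHA-256 to n = 24 bits (an assumption made purely so that the search finishes quickly) and hashes distinct messages until two of them collide. The function names are hypothetical.

```python
import hashlib
from itertools import count

def truncated_hash(message: bytes, n_bits: int) -> int:
    """First n_bits of SHA-256(message), as an integer."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - n_bits)

def find_collision(n_bits: int):
    """Hash distinct messages until two share the same n-bit digest."""
    seen = {}
    for i in count():
        m = str(i).encode()
        h = truncated_hash(m, n_bits)
        if h in seen:
            return seen[h], m, i + 1    # the two messages and the number of trials
        seen[h] = m

m1, m2, trials = find_collision(24)
print(f"collision after {trials} trials; 2^(24/2) = {2 ** 12}")
```

The number of trials is typically of the order of $2^{12}$, far below the $2^{24}$ possible digest values, which is why a 512-bit digest is credited with about 256 bits of collision security.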

SHA-512 Logic

 

The algorithm takes as input a message with a maximum length of less than $2^{128}$ bits and produces as output a 512-bit message digest. The input is processed in 1024-bit blocks. Figure 4 depicts the overall processing of a message to produce a digest.

 

The processing consists of the following steps:

 

Step 1: Append padding bits. The message is padded so that its length is congruent to 896 modulo 1024 [length ≡ 896 (mod 1024)]. Padding is always added, even if the message is already of the desired length. Thus, the number of padding bits is in the range of 1 to 1024. The padding consists of a single 1-bit followed by the necessary number of 0-bits.

 

Step 2: Append length. A block of 128 bits is appended to the message. This block is treated as an unsigned 128-bit integer (most significant byte first) and contains the length of the original message (before the padding). The outcome of the first two steps yields a message that is an integer multiple of 1024 bits in length. In Figure 28.1, the expanded message is represented as the sequence of 1024-bit blocks $M_1, M_2, \ldots, M_N$, so that the total length of the expanded message is N × 1024 bits.
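Steps 1 and 2 can be sketched as follows, under the simplifying assumption that the message is a whole number of bytes (so the single 1-bit and the first seven 0-bits are written as the byte 0x80). The function name `sha512_pad` is hypothetical.

```python
def sha512_pad(message: bytes) -> bytes:
    """Apply SHA-512 padding (steps 1 and 2) to a byte-aligned message."""
    bit_len = len(message) * 8
    # Step 1: a 1-bit, then 0-bits until the length is 896 mod 1024 bits,
    # i.e. 112 mod 128 bytes. Padding is added even if already at 112.
    zeros = (111 - len(message)) % 128
    padded = message + b"\x80" + b"\x00" * zeros
    # Step 2: append the original length as a 128-bit big-endian integer.
    return padded + bit_len.to_bytes(16, "big")

print(len(sha512_pad(b"abc")) * 8)        # one 1024-bit block
print(len(sha512_pad(b"a" * 112)) * 8)    # already 896 bits, so a second block is needed
```

The second call shows the "padding is always added" rule: a message that is already 896 bits long still receives 1024 padding bits before the length field.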

Step 3: Initialize hash buffer. A 512-bit buffer is used to hold intermediate and final results of the hash function. The buffer can be represented as eight 64-bit registers (a, b, c, d, e, f, g, h). These registers are initialized to the following 64-bit integers (hexadecimal values):

a = 6A09E667F3BCC908

b = BB67AE8584CAA73B

c = 3C6EF372FE94F82B

d = A54FF53A5F1D36F1

e = 510E527FADE682D1

f = 9B05688C2B3E6C1F

g = 1F83D9ABFB41BD6B

h = 5BE0CD19137E2179

 

These values are stored in big-endian format, that is, with the most significant byte of a word in the low-address (leftmost) byte position. These words were obtained by taking the first sixty-four bits of the fractional parts of the square roots of the first eight prime numbers.
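The derivation of the initial values can be reproduced with exact integer arithmetic. The sketch below relies on the identity floor(sqrt(p) · 2^64) = isqrt(p · 2^128) and keeps the low 64 bits, which are the first 64 fractional bits. The helper names are hypothetical, and `math.isqrt` requires Python 3.8 or later.

```python
from math import isqrt

def first_primes(n: int) -> list:
    """Return the first n primes by trial division (fine for small n)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def sqrt_fraction_bits(p: int) -> int:
    """First 64 bits of the fractional part of sqrt(p)."""
    # isqrt(p * 2^128) equals floor(sqrt(p) * 2^64); mask off the integer part.
    return isqrt(p << 128) & 0xFFFFFFFFFFFFFFFF

for name, p in zip("abcdefgh", first_primes(8)):
    print(f"{name} = {sqrt_fraction_bits(p):016X}")
```

Constants generated this way are sometimes called "nothing up my sleeve" numbers, since anyone can recompute them and confirm that no hidden structure was inserted.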

Step 4: Process message in 1024-bit (128-byte) blocks. The heart of the algorithm is a module that consists of 80 rounds; this module is labeled F in Figure 2. The logic is illustrated in Figure 2.

 

Each round takes as input the 512-bit buffer value abcdefgh, and updates the contents of the buffer. At input to the first round, the buffer has the value of the intermediate hash value, $H_{i-1}$. Each round t makes use of a 64-bit value $W_t$ derived from the current 1024-bit block being processed ($M_i$). These values are derived using a message schedule described subsequently. Each round also makes use of an additive constant $K_t$, where $0 \le t \le 79$ indicates one of the 80 rounds. These words represent the first sixty-four bits of the fractional parts of the cube roots of the first eighty prime numbers. The constants provide a "randomized" set of 64-bit patterns, which should eliminate any regularities in the input data. The output of the eightieth round is added to the input to the first round ($H_{i-1}$) to produce $H_i$. The addition is done independently for each of the eight words in the buffer with each of the corresponding words in $H_{i-1}$, using addition modulo $2^{64}$.

Step 5: Output. After all N 1024-bit blocks have been processed, the output from the Nth stage is the 512-bit message digest. We can summarize the behavior of SHA-512 as follows:

 

$H_0 = IV$

$H_i = \mathrm{SUM}_{64}(H_{i-1}, abcdefgh_i)$

$MD = H_N$

where

IV = initial value of the abcdefgh buffer, defined in step 3

$abcdefgh_i$ = the output of the last round of processing of the ith message block

N = the number of blocks in the message (including padding and length fields)

$\mathrm{SUM}_{64}$ = addition modulo $2^{64}$ performed separately on each word of the pair of inputs

MD = final message digest value

 

SHA-512 Round Function

 

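Let us look in more detail at the logic in each of the 80 steps of the processing of one 1024-bit block (Figure 28.3). Each round is defined by the following set of equations:

$T_1 = h + \mathrm{Ch}(e, f, g) + \Sigma_1(e) + W_t + K_t$

$T_2 = \Sigma_0(a) + \mathrm{Maj}(a, b, c)$

$h = g,\quad g = f,\quad f = e,\quad e = d + T_1$

$d = c,\quad c = b,\quad b = a,\quad a = T_1 + T_2$

where

t = step number; $0 \le t \le 79$

$\mathrm{Ch}(e, f, g) = (e \text{ AND } f) \oplus (\text{NOT } e \text{ AND } g)$, the conditional function: if e then f else g

$\mathrm{Maj}(a, b, c) = (a \text{ AND } b) \oplus (a \text{ AND } c) \oplus (b \text{ AND } c)$; the function is true only if the majority (two or three) of the arguments are true

$\Sigma_0(a) = \mathrm{ROTR}^{28}(a) \oplus \mathrm{ROTR}^{34}(a) \oplus \mathrm{ROTR}^{39}(a)$

$\Sigma_1(e) = \mathrm{ROTR}^{14}(e) \oplus \mathrm{ROTR}^{18}(e) \oplus \mathrm{ROTR}^{41}(e)$

$\mathrm{ROTR}^n(x)$ = circular right shift (rotation) of the 64-bit argument x by n bits

$W_t$ = a 64-bit word derived from the current 1024-bit input block

$K_t$ = a 64-bit additive constant

+ = addition modulo $2^{64}$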

Figure 6 Elementary SHA-512 Operation (single round)
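A single round can be sketched directly in code. The rotation amounts used for the two Σ functions (28, 34, 39 and 14, 18, 41) are those given in the Secure Hash Standard; the function names are chosen for this example, and the buffer is represented as a tuple of eight integers.

```python
MASK64 = (1 << 64) - 1

def rotr(x: int, n: int) -> int:
    """Circular right shift of a 64-bit word by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64

def ch(e: int, f: int, g: int) -> int:
    """Conditional function: each bit of e selects the bit of f or of g."""
    return (e & f) ^ (~e & g & MASK64)

def maj(a: int, b: int, c: int) -> int:
    """Majority function: a bit is set if set in at least two arguments."""
    return (a & b) ^ (a & c) ^ (b & c)

def big_sigma0(a: int) -> int:
    return rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)

def big_sigma1(e: int) -> int:
    return rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)

def sha512_round(state: tuple, w_t: int, k_t: int) -> tuple:
    """Apply one of the 80 rounds to the buffer (a, b, c, d, e, f, g, h)."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + ch(e, f, g) + big_sigma1(e) + w_t + k_t) & MASK64
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK64
    # Six words simply shift along; only a and e receive new values.
    return ((t1 + t2) & MASK64, a, b, c, (d + t1) & MASK64, e, f, g)

# With an all-zero buffer every function returns 0, so T1 = W_t + K_t and T2 = 0.
print(sha512_round((0,) * 8, 1, 2))
```

The all-zero case is easy to check by hand and makes the data flow visible: the new a and e both equal T1, and every other word is a copy of its neighbour.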

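It remains to indicate how the 64-bit word values $W_t$ are derived from the 1024-bit message block. Figure 28.4 illustrates the mapping. The first 16 values of $W_t$ are taken directly from the 16 words of the current block. The remaining values are defined as follows:

$W_t = \sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16}$

where

$\sigma_0(x) = \mathrm{ROTR}^{1}(x) \oplus \mathrm{ROTR}^{8}(x) \oplus \mathrm{SHR}^{7}(x)$

$\sigma_1(x) = \mathrm{ROTR}^{19}(x) \oplus \mathrm{ROTR}^{61}(x) \oplus \mathrm{SHR}^{6}(x)$

$\mathrm{ROTR}^n(x)$ = circular right shift (rotation) of the 64-bit argument x by n bits

$\mathrm{SHR}^n(x)$ = right shift of the 64-bit argument x by n bits with padding by zeros on the left

+ = addition modulo $2^{64}$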

 

Figure 7 Creation of 80-word Input Sequence for SHA-512 Processing of Single Block

 

Thus, in the first 16 steps of processing, the value of $W_t$ is equal to the corresponding word in the message block. For the remaining 64 steps, the value of $W_t$ is the sum modulo $2^{64}$ of four of the preceding values of $W_t$, with two of those values first subjected to shift and rotate operations. This introduces a great deal of redundancy and interdependence into the message blocks that are compressed, which complicates the task of finding a different message block that maps to the same compression function output.
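The message schedule can be combined with padding, the 80 rounds, and the final word-wise addition to give a complete, if slow, SHA-512 sketch. The shift and rotation amounts follow the Secure Hash Standard, the constants are regenerated from square and cube roots of primes using exact integer arithmetic, and all function names are chosen for this example. The result is compared with Python's `hashlib` as a sanity check.

```python
import hashlib
import struct

MASK64 = (1 << 64) - 1

def first_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def iroot(n, k):
    """Largest integer r with r**k <= n (binary search on exact integers)."""
    lo, hi = 0, 1 << (n.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(80)
IV = [iroot(p << 128, 2) & MASK64 for p in PRIMES[:8]]   # square-root fractions
K = [iroot(p << 192, 3) & MASK64 for p in PRIMES]        # cube-root fractions

def rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK64

def schedule(block):
    """Expand one 1024-bit block into the 80 words W_0..W_79."""
    w = list(struct.unpack(">16Q", block))
    for t in range(16, 80):
        s0 = rotr(w[t - 15], 1) ^ rotr(w[t - 15], 8) ^ (w[t - 15] >> 7)
        s1 = rotr(w[t - 2], 19) ^ rotr(w[t - 2], 61) ^ (w[t - 2] >> 6)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK64)
    return w

def compress(h_prev, block):
    a, b, c, d, e, f, g, h = h_prev
    for w_t, k_t in zip(schedule(block), K):
        t1 = h + ((e & f) ^ (~e & g)) + (rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)) + w_t + k_t
        t2 = (rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)) + ((a & b) ^ (a & c) ^ (b & c))
        a, b, c, d, e, f, g, h = ((t1 + t2) & MASK64, a, b, c,
                                  (d + t1) & MASK64, e, f, g)
    # Feed-forward: add the round output to the input, word by word, mod 2^64.
    return [(x + y) & MASK64 for x, y in zip(h_prev, (a, b, c, d, e, f, g, h))]

def sha512(message):
    bit_len = len(message) * 8
    message += b"\x80" + b"\x00" * ((111 - len(message)) % 128)
    message += bit_len.to_bytes(16, "big")
    h = IV
    for i in range(0, len(message), 128):
        h = compress(h, message[i:i + 128])
    return struct.pack(">8Q", *h)

print(sha512(b"abc") == hashlib.sha512(b"abc").digest())
```

Because each $W_t$ beyond the first 16 depends on four earlier words, changing a single message word alters most of the later schedule, which is the interdependence the paragraph above describes.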

 

5.  Keyed Hash Functions as MACs

 

It is desirable to create a MAC using a hash function rather than a block cipher, because hash functions are generally faster and, unlike block ciphers, are not limited by export controls. The hash includes a key along with the message. The original proposal was KeyedHash = Hash(Key || Message). Some weaknesses were found with this, which eventually led to the development of HMAC.

5.1 HMAC

 

The idea of a keyed hash evolved into HMAC, designed to overcome some problems with the original proposals. Its design has further been shown to have the same security as the underlying hash algorithm. It is specified in RFC 2104 and applies the hash function to the message as follows:

$\mathrm{HMAC}_K = \mathrm{Hash}[(K^+ \oplus \mathrm{opad}) \,\|\, \mathrm{Hash}[(K^+ \oplus \mathrm{ipad}) \,\|\, M]]$

where $K^+$ is the key padded out to the block size of the hash, and opad and ipad are specified padding constants. The hash function need only process 3 more blocks than when hashing just the original message (for the two padded keys and the inner hash). Any of MD5, SHA-1, or RIPEMD-160 can be used; the hash algorithm is chosen based on speed and security concerns.
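The HMAC construction can be written out by hand and compared with Python's standard `hmac` module. The sketch assumes SHA-1 as the underlying hash, and the byte values 0x36 (ipad) and 0x5C (opad) and the rule for over-long keys are taken from RFC 2104. The function name `hmac_sha1` is hypothetical.

```python
import hashlib
import hmac

def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """HMAC_K = Hash[(K+ XOR opad) || Hash[(K+ XOR ipad) || M]] with SHA-1."""
    block_size = 64                           # SHA-1 works on 512-bit blocks
    if len(key) > block_size:                 # over-long keys are hashed first
        key = hashlib.sha1(key).digest()
    k_plus = key.ljust(block_size, b"\x00")   # K+: key padded out with zeros
    inner_key = bytes(b ^ 0x36 for b in k_plus)   # K+ XOR ipad
    outer_key = bytes(b ^ 0x5C for b in k_plus)   # K+ XOR opad
    inner = hashlib.sha1(inner_key + message).digest()
    return hashlib.sha1(outer_key + inner).digest()

tag = hmac_sha1(b"secret key", b"message to authenticate")
print(tag.hex())
print(tag == hmac.new(b"secret key", b"message to authenticate", hashlib.sha1).digest())
```

In practice the library routine would be used directly, together with `hmac.compare_digest` when checking a received tag, so that the comparison takes constant time.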

5.2 HMAC Security

 

The security of HMAC relates to that of the underlying hash algorithm. Attacking HMAC requires either a brute-force attack on the key or a birthday attack; because the hash is keyed, a birthday attack would require observing a very large number of messages. The hash function used is chosen based on speed versus security constraints.

Summary

  • The general ideas behind cryptographic hash functions were explored
  • The difference between two categories of hash function, those with a compression function made from scratch and those with a block cipher as the compression function, was studied
  • The structure of SHA-512 as an example of a cryptographic hash function with a compression function made from scratch was explored