2 Substitution Cipherand Cryptanalysis

Dr Kulothungan

epgp books

 

 

 

Learning Objectives

Ø  To know the substitution cipher in classical cryptography

Ø   To learn about the mono alphabetic substitution cipher

Ø   To learn about the poly alphabetic substitution cipher

Ø   Cryptanalysis of substitution cipher

 

3.1 Introduction

 

Humans could encode their own documents, but computers do it faster and more efficiently. To do this, the computer at each end uses a document called an “SSL Certificate” containing character strings that are the keys to their secret “codes.

 

In cryptography, a transposition cipher is a method of encryption by which the positions held by units of plaintext are shifted according to a regular system, so that the ciphertext constitutes a permutation of the plaintext. That is, the order of the units is changed. Mathematically a bijective function is used on the characters’ positions to encrypt and an inverse function to decrypt. So let’s know about the terminologies and concepts behind the cryptography which acts as a base behind our day today online transactions.

3.2 Substitution Cipher

 

Substitution cipher is a method of encoding in which units of plaintext are replaced with ciphertext, according to a fixed system. The “units” may be single letters (the most common), pairs of letters, triplets of letters, and mixtures of the above. The receiver deciphers the text by performing the inverse substitution.In substitution ciphers, we replace the plaintext letters with other letters

 

–  The resulting text is the ciphertext

–  The substitution rule is the key

 

 

It Corresponds to Shannon’s principle of confusion. The letters of plaintext are replaced by other letters or by numbers or symbols. Plaintext is viewed as a sequence of bits, then substitution involves replacing plaintext bit patterns with ciphertext bit patterns.The simple substitution cipher is a cipher that has been in use for many hundreds of years. It basically consists of substituting every plaintext character for a different ciphertext character. It differs from the Caesar cipher in that the cipher alphabet is not simply the alphabet shifted, it is completely jumbled.The simple substitution cipher offers very little communication security, and it will be shown that it can be easily broken even by hand, especially as the messages become longer (more than several hundred ciphertext characters).

 

 

Example

 

Here is a quick example of the encryption and decryption steps involved with the simple substitution cipher. The text we will encrypt is ‘defend the east wall of the castle’.

 

Keys for the simple substitution cipher usually consist of 26 letters (compared to the caeser cipher’s single number). An example key is:

 

plain alphabet : abcdefghijklmnopqrstuvwxyz

    cipher alphabet: phqgiumeaylnofdxjkrcvstzwb

 

An example encryption using the above key:

 

plaintext : defend the east wall of the castle

 

ciphertext: giuifg cei iprc tpnn du cei qprcni

 

It is easy to see how each character in the plaintext is replaced with the corresponding letter in the cipher alphabet. Decryption is just as easy, by going from the cipher alphabet back to the plain alphabet. When generating keys it is popular to use a key word, e.g. ‘zebra’ to generate it, since it is much easier to remember a key word compared to a random jumble of 26 characters. Using the keyword ‘zebra’, the key would become:

 

cipher alphabet: zebracdfghijklmnopqstuvwxy

 

 

This key is then used identically to the example above. If your key word has repeated characters e.g. ‘mammoth’, be careful not to include the repeated characters in the cipher alphabet

 

3.3 Caesar Cipher

 

 

The Caesar cipher is one of the earliest known and simplest ciphers. It is a type of substitution cipher in which each letter in the plaintext is ‘shifted’ a certain number of places down the alphabet. For example, with a shift of 1, A would be replaced by B, B would become C, and so on. The method is named after Julius Caesar, who apparently used it to communicate with his generals. It is used until 16th century and it replaces each letter by 3rd letter.

 

example:

 

meet me after the toga party

PHHW PH DIWHU WKH WRJD SDUWB

can define transformation as:

    a b c d e f g h i j k l m n o p q r s t u v w x y z

D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

 

mathematically give each letter a number

 

a b c d e f g h i j k l m n o p q r s t u  v w x y z

0    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

 

It can be easily depicted using the following   Caesar cipher :

 

c = E(p) = (p + k) mod (26)

p = D(c) = (c – k) mod (26)

 

 

3.3.1 Cryptanalysis Of Caesar Cipher

Ø  only have 26 possible ciphers

 

§    A maps to A,B,..Z

 

Ø   could simply try each in turn a brute force search

Ø   given ciphertext, just try all shifts of letters

Ø   do need to recognize when have plaintext

Ø   eg. break ciphertext “GCUA VQ DTGCM”

 

The Caesar cipher is probably the easiest of all ciphers to break. Since the shift has to be a number between 1 and 25, (0 or 26 would result in an unchanged plaintext) we can simply try each possibility and see which one results in a piece of readable text. If you happen to know what a piece of the ciphertext is, or you can guess a piece, then this will allow you to immediately find the key.

 

 

3.4 Simple Substitution Cipher

 

Ø  rather than just shifting the alphabet

Ø   could shuffle (jumble) the letters arbitrarily

Ø   each plaintext letter maps to a different random ciphertext letter

Ø   hence key is 26 letters long

A monoalphabetic substitution cipher, also known as a simple substitution cipher, relies on a fixed replacement structure. That is, the substitution is fixed for each letter of the alphabet. Thus, if “a” is encrypted to “R”, then every time we see the letter “a” in the plaintext, we replace it with the letter “R” in the ciphertext.

 

A simple example is where each letter is encrypted as the next letter in the alphabet: “a simple message” becomes “B TJNQMF NFTTBHF”. In general, when performing a simple substitution manually, it is easiest to generate the ciphertext alphabet first, and encrypt by comparing this to the plaintext alphabet. The table below shows how one might choose to, and we will, lay them out for this example.

 

 

The ciphertext alphabet for the cipher where you replace each letter by the next letter in the alphabet. There are many different monoalphabetic substitution ciphers, in fact infinitely many, as each letter can be encrypted to any symbol, not just another letter.

 

The history of simple substitution ciphers can be traced back to the very earliest civilisations, and for a long time they were more than adequate for the purposes for which they were needed. By today’s standards they are very weak, and incredibly easy to break, but they were a very important step in developing.

 

 

3.4.1 Simple Substitution Cipher Security

 

Monoalphabetic ciphers are not very secure and can be easily broken by statistical means

Ø now have a total of 26! = 4 x 1026 keys   HY6JIN9YI66inj

Ø   with so many keys, might think is secure

Ø   but would be !!!WRONG!!!

Ø   problem is language characteristics

 

3.5 Cryptanalysis

Ø  Statistical analysis

o   Statistics might reveal info about key

Ø  Ciphertext should appear random

Ø   But randomness is not easy

o   Difficult to define random (entropy)

Ø  Human languages are redundant

Ø  A strong familiarity with a language includes a grasp of the language’s redundancy.

 

Redundancy means that every language contains more characters or words than are actually needed to convey information. The rules of the English language create redundancy. For example, no English word will begin with the letters “ng.” English also relies heavily on a small number of words. Words like “the,” “of,” “and,” “to,” “a,” “in,” “that,” “it,” “is,” and “I” account for more than one quarter of the text of an average message written in English.

 

Knowing the redundant qualities of a language makes a cryptanalyst’s task much easier. No matter how convoluted the cipher is, it follows some language’s rules in order for the recipient to understand the message. Cryptanalysts look for patterns within ciphers to find common words and letter pairings.

 

One basic technique in cryptanalysis is frequency analysis. Every language uses certain letters more often than others. In English, the letter “e” is the most common letterin English E is by far the most common letter

  • followed by T,R,N,I,O,A,S

 

. By counting up the characters in a text, a cryptanalyst can see very quickly what sort of cipher he has. If the distribution of cipher frequency is similar to the distribution of the frequency of a normal alphabet, the cryptanalyst may conclude that he’s dealing with a monoalphabetic cipher.

 

  • other letters like Z,J,K,Q,X are fairly rare
  • have tables of single, double & triple letter frequencies for various languages

 

 

 

  • English Letter Frequencies

 

3.5.1 Use In Cryptanalysis

 

Ø  key concept – monoalphabetic substitution ciphers do not change relative letter frequencies

Ø  discovered by Arabian scientists in 9th century

Ø   calculate letter frequencies for ciphertext

Ø   compare counts/plots against known values

Ø  If caesar cipher look for common peaks/troughs

  • peaks at: A-E-I triple, NO pair, RST triple
  • troughs at: JK, X-Z

Ø  for monoalphabetic must identify each letter

  • tables of common double/triple letters help

 

 

 

3.5.2 Example Cryptanalysis

Ø  given ciphertext:

 

UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ

 

Ø  count relative letter frequencies (see text)

Ø   guess P & Z are e and t

Ø   guess ZW is th and hence ZWP is the

Ø   proceeding with trial and error finally get:

o   it was disclosed yesterday that several informal but direct contacts have been made,

o   representatives of the viet cong in Moscow

 

 

3.6 Poly-Alphabetic Substitution

 

A          polyalphabetic cipher is based on substitution, using multiple substitution alphabets.

 

 

Ø   Like a simple substitution, but permutation (“alphabet”) changes

–  Often, a new alphabet for each letter

Ø  Very common in classic ciphers

 

–  Vigenere cipher is an example

–  Discuss Vigenere later in this section

 

Ø      The Enigma machine is more complex but still fundamentally a polyalphabetic substitution cipher.

Ø  Used in WWII-era cipher machines.

 

3.7 Playfair Cipher

 

The Playfair cipher was the first practical digraph substitution cipher. The scheme was invented in 1854 by Charles Wheatstone, but was named after Lord Playfair who promoted the use of the cipher. The technique encrypts pairs of letters (digraphs), instead of single letters as in the simple substitution cipher. The Playfair is significantly harder to break since the frequency analysis used for simple substitution ciphers does not work with it. Frequency analysis can still be undertaken, but on the 25*25=625 possible digraphs rather than the 25 possible monographs. Frequency analysis thus requires much more ciphertext in order to work. Playfair is reasonably fast to use and requires no special equipment. A typical scenario for Playfair use would be to protect important but non-critical secrets during actual combat.

 

3.7.1 Playfair Key Matrix

 

Ø  a 5X5 matrix of letters based on a keyword

Ø   fill in letters of keyword (sans duplicates)

Ø   fill rest of matrix with other letters

Ø   eg. using the keyword MONARCHY

 

 

 

3.7.2 Encrypting And Decrypting

 

plaintext is encrypted two letters at a time

Ø  if a pair is a repeated letter, insert filler like ‘X’

 

Ø  if both letters fall in the same row, replace each with letter to right(wrapping back to start from end)

 

Ø  if both letters fall in the same column, replace each with the letter below it (again wrapping to top from bottom)

 

Ø  otherwise each letter is replaced by the letter in the same row and in the column of the other letter of the pair

 

 

In order to encrypt using the Playfair Cipher, we must first draw up a Polybius Square (but without the need for the number headings). This is usually done using a keyword, and either combining “i” and “j” or omitting “q” from the square.

 

We must now split the plaintext up into digraphs (that is pairs of letters). On each digraph we perform the following encryption steps.

 

If the digraph consists of the same letter twice (or there is only one letter left by itself at the end of the plaintext) then insert the letter “X” between the same letters (or at the end), and then continue with the rest of the steps.

 

If the two letters appear on the same row in the square, then replace each letter by the letter immediately to the right of it in the square (cycling round to the left hand side if necessary).

 

If the two letters appear in the same column in the square, then replace each letter by the letter immediately below it in the square (cycling round to the top of the square if necessary).

 

Otherwise, form the rectangle for which the two plaintext letters are two opposite corners. Then replace each plaintext letter with the letter that forms the other corner of the rectangle that lies on the same row as that plaintext letter (being careful to maintain the order).

 

3.7.3 Security Of Playfair Cipher

Ø  security much improved over monoalphabetic

Ø   since have 26 x 26 = 676 digrams

Ø   would need a 676 entry frequency table to analyse (verses 26 for a monoalphabetic)

Ø   and correspondingly more ciphertext

Ø   was widely used for many years

Ø   eg. by US & British military in WW1

Ø   it can be broken, given a few hundred letters

Ø   still has much of plaintext structure

 

 

3.8 Affine Cipher

 

The affine cipher encrypts by multiplying the plaintext by one part of the key followed by addition of another part of the key followed by addition of another part of the key.

Ø  Require that gcd(a, 26) = 1 (why?)

 

The restriction gcd(a,26)=1 arises from the fact that the key parameter a needs to be inverted for decryption. An element a and the modulus must be relatively prime for the inverse of a to exist. Thus, a must be in the set:

A €{1,3,5,7,9,11,15,17,19,21,23,25}

 

But how do we find inverse of a? We can simply compute it by trial and error. For a given a we simply try all possible values of a-1until we obtain

 

 

Ø  Keyspace size?

 

•         Keyspace size is 26·  (26) = 312

•         Too small to be practical. A key space with 312 elements can, still be searched exhaustively that is brute-force attacked, in a fraction of a second with current desktop PCs. Here the mapping between the plaintext letters and ciphertext letters is fixed. Hence, it can easily be broken with letter frequency analysis.

 

 

3.9 Vigenere Cipher

 

A vigenere cipher is a special case of polyalphabetic system in which number of different substitution alphabets are used rather than just one. The number of substitution alphabets used may be anything from 2 to many thousands.

 

Ø  Key is of the form K = (k0,k1,…,kn-1) Where each ki   {0,1,2,…,25}

Ø   Encryption

ci = pi + ki (mod n) (mod 26)

 

Ø  Decryption

pi = ci – ki (mod n) (mod 26)

 

Just a repeating sequence of (shift by n) simple substitutions.

 

3.9.1 Example

 

For example, suppose key is MATH

That is, K = (12,0,19,7), since M is letter 12, and so on

 

    Plaintext:             SECRETMESSAGE

Ciphertext:         EEVYQTFLESTNQ

 

Encrypt:

 

S E C R E T M E S S A G E

18 4 2 17 4 19 12 4 18 18 0 6 4

+12 0 19 7 12 0 19 7 12 0 19 7 12

4     4 21 24 16 19 5 11 4 18 19 13 16 (mod 26)

E E V Y Q T F L E S T N Q

 

But how to determine k (key length)?

There are two methods for that. They are,

 

–  Kasiski Method

–  Index of coincidence

 

 

3.9.2 Kasiski Method

Ø  method developed by Babbage / Kasiski

Ø   repetitions in ciphertext give clues to period

Ø   so find same plaintext an exact period apart

Ø   which results in the same ciphertext

Ø   of course, could also be random fluke

Ø  then attack each monoalphabetic cipher individually using same techniques as before

 

 

3.9.3 Index Of Coincidence

Ø  Assume ciphertext is English letters

Ø  Let n0 be number of As, n1 number of Bs, …, n25 number of Zs in ciphertext

Ø  Let n = n0 + n1 + … + n25

Ø  Gives the probability that 2 randomly selected letters are the same

Ø   For plain English, prob. 2 letter are same:

            

 

Ø  Then for simple substitution, I ≈ 0.065

 

Ø  For random letters, each pi = 1/26

 

Ø Then I ≈ 0.03846 for poly-alphabetic substitution with a very long keyword

Ø How to use this to estimate length of keyword in Vigenere cipher?

Ø Suppose keyword is length k, message is length n

o Ciphertext in matrix with k columns, n/k rows Ø

 

Ø Select 2 letters from same columns

o Like selecting from simple substitution

 

Ø Select 2 letters from different columns

o  Like selecting random letters

 

3.10 Hill Cipher

 

Ø  Invented by Lester Hill in 1929 o A pre-modern block cipher

Ø  Idea is to create a substitution cipher with a large “alphabet”.

Ø  Plaintext, p0, p1, p2, …

Ø  Each pi is block of n consecutive letters

o   As a column vector

Ø  Let A be n x n invertible matrix, mod 26

Ø  Then ciphertext block ci is given by

o   Encryption: ci = Api (mod 26)

o  Decryption: pi = A –1 ci (mod 26)

 

Ø The matrix A is the key

 

 

3.10.1 Hill Cipher Example

Ø  Let n = 2 and

Ø   Plaintext

§    MEETMEHERE = (12,4,4,19,12,4,7,4,17,4)

Ø   Then Ciphertext will be

§    (4,22,23,9,4,22,24,19,10,25) = EWXJEWYTKZ

 

 

 

3.10.2 Hill Cipher Cryptanalysis

Ø  Trudy suspects Alice and Bob are using Hill cipher, with n x n matrix A

Ø   Suppose Trudy knows n plaintext blocks

 

o   Plaintext blocks p0,p1,…,pn-1

o   Ciphertext blocks c0,c1,…,cn-1

 

Ø  Let P be matrix with columns p0,p1,…,pn-1

Ø  Let C be matrix with columns c0,c1,…,cn-1

Ø Then AP = C and A = CP–1 if P –1 exists

Ø  Linear ciphers are weak

o   Since linear equations are easy to solve

Ø  Strong cipher must have nonlinearity

o   Linear components are useful

o   But cipher cannot be entirely linear

Ø  Cryptanalyst try to approximate nonlinear parts with linear equations

 

Summary

  • Ø Discussed about the Substitution cipher cryptography
  • Ø Discussed about the Cryptanalysis
  • Ø Explored the substitution cipher- monoalphabet cipher and polyalphabet cipher
  • Ø Discussed about the monoalphabet cipher, polyalphabet cipher and its cryptanalysis
you can view video on Substitution Cipherand Cryptanalysis