Thursday, July 28, 2011

Vigenere cipher



The Vigenère cipher is a method of encrypting alphabetic text by using a series of different Ceasar ciphers based on the letters of a keyword. The Vigenère cipher is named after Blaise de Vigenère.



It is based on a technique that appears to be unsolvable!!!

ENCRYPTION:

The input taken is plain text with special characters and numbers eliminated. A second text known as the key of length k is used to encrypt the plain text.The new text obtained after this process is called the cipher text(encrypted text). Each character in the alphabet is given a weight based on its position(i.e from 0 to 25)
                    a=0
                    b=1
                    .
                    .
                    z=25
 Now, consider the plain text (say P0, shown bellow), and a key (SET). Place the key repeatedly below the plain text to span till the end of plain text. it is clearly seen that there now exists a one to one correspondence between a character of the P0 and the key as shown below.


P0:         T H I S I S M Y P L A I N T E X T

KEY:        S E T S E T S E T S E T S E T S E
      

Now, each character in the plain text is replaced by another character based on a "shift". The plain text character is "shifted" corresponding to the key character's weight to obtain the new character.The text thus obtained is called the cipher text. For the example above,the key is SET. The equivalent weight of each character is given by:

S=18
E=4
T=19

Thus the cipher text (C0) is obtained by shifting the first character by 18 positions in the english alphabet set, the second character by 4 positions, the third character by 19 positions, the fourth character by 18 and so on..

P0:           T H I S I S M Y P L A I N T E X T

KEY:          S E T S E T S E T S E T S E T S E

C0:           L L B K M L E C I D E B F X X P X 

Thus we have obtained our cipher text.

DECRYPTION:

To begin the decryption process, we start by finding the length of the key. We shift the cipher text by one position towards the right to obtain C1. We compare the obtained shifted text C1 to the initial cipher text C0 to find the number of collisions. This is repeated to obtain C2, C3, C4 and so on upto one less than the number of characters in C0 as continuing will repeat the generation of C0,C1,C2 respectively.

For example:

C0:        L L B K M L E C I D E B F X X P X

C1:        X L L B K M L E C I D E B F X X P                             

   
As seen from the above example, at positions 2 and 15, the same characters occur (L and X) in C0 and C1 resulting in collisions. Thus the number of collisions is 2.
Similarly we obtain C2,C3 and so on and obtain the number of collisions in each case. Comparing the results of each of them, we observe that there is a sudden increase in the number of collisions for a particular Cn. This reveals that the length of the key is n. This happens so because, if the length of the key is n,then every nth character is encrypted by the same alphabet of the key. Thus for maximum collisions, every character seperated by a distance of 'n' must be the same and will be encrypted by the same letter of the key. This happens only when the cipher text is shifted by 'n' positions and it will have the same number of collisions as that of the plain text when shifted by 'n' positions.

Once the length of the key is obtained, we have to find the individual characters of the key. Since the plain text taken is valid English text, the occurance of characters is biased and not random. Some letters occur with higher frequency than the rest. Keeping this fact in mind, we crack each character of the key.
We extract all the characters from the cipher text that have been encrypted by 1st character of the key and store them in an array i.e the 1st, (n+1)th, (2n+1)th... characters are stored in the array.
The frequency of all characters in this array are calculated. From the tabulated frequency of the English alphabet, we can see that the most occuring character is the alphabet 'e'. Similarly, we find the most occuring character in the array and match it to 'e' which means that all the 'e's have become the most occuring character. Hence the character shift is determined and this shift gives us the corresponding equivalent character.For example, if the most occuring character was found to be 'v', then it means that all the 'e's have become 'v's which means that a shift of 17 has occured. this tells that the first character of the key is the 17th alphabet i.e 'q'.

FREQUENCY DISTRIBUTION OF ENGLISH CHARECTERS:


Now, we extract all the characters that have been encrypted by the second character of the key and repeat the above process.
This process is repeated for the rest of the characters and all the characters of the key are determined.
Once the key is known its a child's play to decipher the text!!


Authors:


Amitash Ramesh


Soumya Ramesh


Varsha Bhat K


10 comments:

  1. well @amitashr...

    the beauty of cryptography when presented in the right manner is unmatched. you have certainly put/compiled/copied it in the right mannner.

    i especially liked the part about using a key to encrypt the text.

    and the use of statistics in decrypting the text was also a treat...

    kindly put up more about such ciphers and in general about cryptography...you have presented this in a very easy to understand manner, and that is commendable.

    good stuff!
    chetan.

    ReplyDelete
  2. NO NO! It was not only me.. Soumya and Varsha did the bulk of the work.. They must be given all the credit..

    ReplyDelete
  3. i like this post... shifting looked very impressive.. im fascinated to run a program of this sort... n look at the sudden increase in the number of collisions !
    interesting one ...
    cool :D

    ReplyDelete
  4. @pramod: I can show you the very program. I've written a
    decoder :P

    ReplyDelete
  5. Amitash I think you must put a link to your python code here... It is something that will amuse any graduate student!

    Great job indeed!

    ReplyDelete
  6. @Sudarshan: Sure thing.. little more debugging to do.. I'll put it up soon..

    ReplyDelete
  7. ah,well...

    let us give credit where it is due then,@amitash

    great job varsha+soumya+amitash.

    please note that the + symbol is to be taken as the logical OR and not as some concatenation operator.

    great stuff,
    chetan

    ReplyDelete
  8. Hey nicely written :) good job all of u :) Encryption neatly explained. :) Decryption I have few doubts..
    1. Is collisions found between $C_n$ and $C_0$ or $C_n$ and $C_(n-1)$??
    2. Could you pls put a graph of number of shifts V/s number of collisions? :)

    ReplyDelete
  9. Hey @amitash, @soumya and @varsha,

    Why haven't you people answered Shruthi's question yet? She has probably forgotten what she had asked by now .

    ReplyDelete
  10. Btw, @authors, the sub-heading isn't "decryption" , it should be "cryptanalysis". Please make the necessary changes. Look up for the meaning of these two words.

    ReplyDelete