Some Python code relevant to the CS402 cryptography course is contained in the file cs402.py . A dictionary of English words is contained in the file dictionary.txt . Both files should be placed in the Python directory (such as C:\Python27). You'll need to have the Python numpy module installed in orderd to use cs402.py.

Affine cryptosystem with single letter message units

Suppose we wish to use the enciphering function 

x |---> ax + b       modulo 26

on single letter message units over a 26-letter alphabet A=0, B=1, ..., Z=26, with enciphering key (a,b)=(3,4), to encipher the plaintex

"THISISONEOFTHEMOSTINSECURECRYPTOSYSTEMTSONTHEMARKET".

We can do this by placing the plaintext in a file plain.txt in our Python directory and then using the following Python code.

>>> from cs402 import AffineCipher
>>> C=AffineCipher
>>> C.setAlphabet("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
>>> k=[3,4]
>>> C.encipher("plain.txt")
'JZCGCGURQUTJZQOUGJCRGQKMDQKDYXJUGYGJQOJGURJZQOEDIQJ'

The ciphertext is

"JZCGCGURQUTJZQOUGJCRGQKMDQKDYXJUGYGJQOJGURJZQOEDIQJ" .

We can decipher by placing this ciphertext in a file cipher.txt in our Python directory, noting that the deciphering function is x |---> cx + d with (c,d)=(9,16), and then using the following Python code.

>>> C.setDecipherKey([9,16])
>>> C.decipher("cipher.txt")
'THISISONEOFTHEMOSTINSECURECRYPTOSYSTEMTSONTHEMARKET'


Affine cryptosystem with block message units

Suppose we wish to use the enciphering function 

x |---> Ax + B       modulo 26

on 2-letter message units over a 26-letter alphabet A=0, B=1, ..., Z=26, with enciphering key matrix A=[[3,3],[4,5]] and vector B=[[1],[2]], to encipher the plaintex

"THISISONEOFTHEMOSTINSECURECRYPTOSYSTEMTSONTHEMARKET".

We can do this by placing the plaintext in a file plain.txt in our Python directory and then using the following Python code. The first line of code imports the function mat from the numpy module.

>>> from numpy import mat
>>> A=mat([[3,3],[4,5]])
>>> B=mat([[1],[2]])
>>> from cs402 import AffineCipher
>>> C=AffineCipher
>>> C.setAlphabet("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
>>> C.setEncipherKey([A,B])
>>> C.encipher("plain.txt")
'BJBUBUETDKVNIYBQINMVPQPGMMGRORWSXMINXAIMETBJXAAJ'

The ciphertex is

"BJBUBUETDKVNIYBQINMVPQPGMMGRORWSXMINXAIMETBJXAAJ"

We can decipher by placing this ciphertext in a file cipher.txt in our Python directory, noting that the deciphering function is x |---> Ax + B with A=[[19,25],[16,1]], B=[[9],[8]] and then using the following Python code.

>>> A=mat([[19,25],[16,1]])
>>> B=mat([[9],[8]])
>>> C.setDecipherKey([A,B])
>>> C.decipher("cipher.txt")
'THISISONEOFTHEMOSTINSECURECRYPTOSYSTEMTSONTHEM'


Exhaustive search through all deciphering keys

Suppose we know that the ciphertext

APHIZHIZNVMZNEZAPMZCNIAZHVIMBTQMZBQOFANIOIAMCAIZNVZAPMZCRQSMA

was produced using a affine cryptosystem x |---> ax + b on single letter message units over an alphabet A=0, B=1, ..., Z=25, space=26. Suppose also that we know that the plaintext is in English. There are only Phi(27)*27=486 possible deciphering keys, so we could try all possible deciphering keys until we hit an output in English. If we have placed the ciphertext in a file cipher.txt then the following Python code does just this for us.

>>> from cs402 import AffineCipher
>>> from cs402 import isEnglish
>>> C=AffineCipher
>>> C.setAlphabet("ABCDEFGHIJKLMNOPQRSTUVWXYZ ")
>>> for a in range(1,26):
...   for b in range(0,26):
...    txt=C.decipher("cipher.txt")
...    if isEnglish(txt) : print(txt); print(" ")
...
>>>
>>> from cs402 import AffineCipher
>>> from cs402 import isEnglish
>>> C=AffineCipher
>>> C.setAlphabet("ABCDEFGHIJKLMNOPQRSTUVWXYZ ")
>>> for a in range(1,26):
...   for b in range(0,26):
...     C.setDecipherKey([a,b])
...     txt=C.decipher("cipher.txt")
...     if isEnglish(txt) : print(txt); print(" ")
...
MP BI BILAJILUIMPJIQLBMI ABJOXRJIORNWMLBNBMJQMBILAIMPJIQTRVJM
 
RUEGNEGNQFONQZNRUONVQGRNEFGOTBWONTWSARQGSGROVRGNQFNRUONVYW OR
 
FX C  C ROO RR FXO LRCF  OCOII O I UUFRCUCFOLFC RO FXO LC FOF
 
LU F  F ICC II LUC XIFL  CFCRR C R OOLIFOFLCXLF IC LUC XF LCL
 
IFDKVDKVSULVSJVIFLVWSKIVDUKLPGMLVPMZQISKZKILWIKVSUVIFLVWTM LI
 
RR I  I  RR    RRR I IR  RIR   R   IIR IIIRRIRI  R RRR II RRR
 
THIS IS ONE OF THE MOST INSECURE CRYPTOSYSTEMTS ON THE MARKET
 
RUNYWNYWZFOWZHWRUOWMZYRWNFYOBKEOWBEJSRZYJYROMRYWZFWRUOWMPE OR
 
CL O  O IUU II CLU FIOC  UOURR U R XXCIOXOCUFCO IU CLU FO CUC
 
IRFUFFUFO  FOOFIR FLOUIFF U XXF FXFCCIOUCUI LIUFO FIR FLUFI I
 
IFMBDMBDAULDASDIFLDNABIDMUBLYPVLDYVQHIABQBILNIBDAUDIFLDNKV LI
 
II R  R  II    III R RI  IRI   I   RRI RRRIIRIR  I III RR III
 
RUWPEWPEHFOEHQERUOEDHPREWFPOKTNOEKNAJRHPAPRODRPEHFERUOEDGN OR
 
OF U  U RXX RR OFX CRUO  XUXII X I LLORULUOXCOU RX OFX CU OXO
 
RICXCCXCU  CUUCRI CFUXRCC X LLC CLCOORUXOXR FRXCU CRI CFXCR R
 
UC X  X ILL II UCL OIXU  LXLRR L R FFUIXFXULOUX IL UCL OX ULU
 
IFVTMVTMJULMJAMIFLMEJTIMVUTLGYDLMGDHZIJTHTILEITMJUMIFLMEBD LI


We see from this brute force search that the plaintext in this case is most probably

"THIS IS ONE OF THE MOST INSECURE CRYPTOSYSTEMTS ON THE MARKET" .

The function isEnglish(string) returns true if at least p=80% of the words in string, defined by spaces, are in the file dictionary.txt. The value of p can be adjusted in the code to give potentially better results.

Frequency analysis

To perform frequency analysys on a piece of text we need to be able to count the number of occurences of a particular letter, or a particular pair of letters, or triple of letters in the text. Suppose for instance we wanted to count the number of times "t" or "T" occurs in the following text, and also the number of times "the" or "The" occurs.
 
But here I must deal with a misconception. It is sometimes suggested that pure mathematicians glory in the uselessness of their work [16], and make it a boast that it has no practical applications. The imputation is usually based on an incautious saying attributed to Gauss, to the effect that, if mathematics is the queen of the sciences, then the theory of numbers is, because of its supreme uselessness, the queen of mathematics—I have never been able to find an exact quotation. I am sure that Gauss’s saying (if indeed it be his) has been rather crudely misinterpreted. If the theory of numbers could be employed for any practical and obviously honourable purpose, if it could be turned directly to the furtherance of human happiness or the relief of human suffering,
as physiology and even chemistry can, then surely neither Gauss nor any other mathematician would have been so foolish as to decry or regret such applications. But science works for evil as well as for good (and particularly, of course, in time of war); and
both Gauss and less mathematicians may be justified in rejoicing that there is one science at any rate, and that their own, whose very remoteness from ordinary human activities should keep it gentle and clean.

The following Python code performs the required counts.

>>> str="But here I must deal with a misconception. It is sometimes suggested that pure mathematicians glory in the uselessness of their work [16], and make it a boast that it has no practical applications. The imputation is usually based on an incautious saying attributed to Gauss, to the effect that, if mathematics is the queen of the sciences, then the theory of numbers is, because of its supreme uselessness, the queen of mathematics—I have never been able to find an exact quotation. I am sure that Gauss’s saying (if indeed it be his) has been rather crudely misinterpreted. If the theory of numbers could be employed for any practical and obviously honourable purpose, if it could be turned directly to the furtherance of human happiness or the relief of human suffering, as physiology and even chemistry can, then surely neither Gauss nor any other mathematician would have been so foolish as to decry or regret such applications. But science works for evil as well as for good (and particularly, of course, in time of war); and both Gauss and less mathematicians may be justified in rejoicing that there is one science at any rate, and that their own, whose very remoteness from ordinary human activities should keep it gentle and clean."
>>>
>>> str.count('t')+str.count('T')
93
>>> str.count('the')+str.count('The')
26

If you want to count the occurences of all symbols in the string and display the result as a histogram then you can use the following additional comand.

>>> from cs402 import FrequencyHistogram #You'll need the mathplotlib module for this to work
>>> FrequencyHistogram(str)



 
Instead of typing the text directly into Python it is often more convenient to place it in a file named say file.txt and then use the following commands.

>>> f=open("file.txt","r")  #The "r" is because we are reading from the file. Use "w" to write to a file
>>> str=f.read()
>>> f.close()
>>> str.count('t')+str.count('T')
93
>>> str.count('the')+str.count('The')
26

For frequence analysis it can be handy to first convert all letters to upper case. The following continuation of the above commands shows that white space seems to be the most frequent symbol in English text, followed by E and then followed by one of T, A, S.

>>> STR
'BUT HERE I MUST DEAL WITH A MISCONCEPTION. IT IS SOMETIMES SUGGESTED THAT PURE MATHEMATICIANS GLORY IN THE USELESSNESS OF THEIR WORK [16], AND MAKE IT A BOAST THAT IT HAS NO PRACTICAL APPLICATIONS. THE IMPUTATION IS USUALLY BASED ON AN INCAUTIOUS SAYING ATTRIBUTED TO GAUSS, TO THE EFFECT THAT, IF MATHEMATICS IS THE QUEEN OF THE SCIENCES, THEN THE THEORY OF NUMBERS IS, BECAUSE OF ITS SUPREME USELESSNESS, THE QUEEN OF MATHEMATICS\xe2\x80\x94I HAVE NEVER BEEN ABLE TO FIND AN EXACT QUOTATION. I AM SURE THAT GAUSS\xe2\x80\x99S SAYING (IF INDEED IT BE HIS) HAS BEEN RATHER CRUDELY MISINTERPRETED. IF THE THEORY OF NUMBERS COULD BE EMPLOYED FOR ANY PRACTICAL AND OBVIOUSLY HONOURABLE PURPOSE, IF IT COULD BE TURNED DIRECTLY TO THE FURTHERANCE OF HUMAN HAPPINESS OR THE RELIEF OF HUMAN SUFFERING, AS PHYSIOLOGY AND EVEN CHEMISTRY CAN, THEN SURELY NEITHER GAUSS NOR ANY OTHER MATHEMATICIAN WOULD HAVE BEEN SO FOOLISH AS TO DECRY OR REGRET SUCH APPLICATIONS. BUT SCIENCE WORKS FOR EVIL AS WELL AS FOR GOOD (AND PARTICULARLY, OF COURSE, IN TIME OF WAR); AND BOTH GAUSS AND LESS MATHEMATICIANS MAY BE JUSTIFIED IN REJOICING THAT THERE IS ONE SCIENCE AT ANY RATE, AND THAT THEIR OWN, WHOSE VERY REMOTENESS FROM ORDINARY HUMAN ACTIVITIES SHOULD KEEP IT GENTLE AND CLEAN.

>>> for a in " ABCDEFGHIJKLMNOPQRSTUVWXYZ":
...       print([a,STR.count(a)])
...
[' ', 210]
['A', 86]
['B', 19]
['C', 37]
['D', 28]
['E', 125]
['F', 27]
['G', 15]
['H', 51]
['I', 78]
['J', 2]
['K', 4]
['L', 32]
['M', 30]
['N', 67]
['O', 65]
['P', 19]
['Q', 3]
['R', 55]
['S', 81]
['T', 93]
['U', 44]
['V', 8]
['W', 8]
['X', 1]
['Y', 22]
['Z', 0]

The following commands read in cipher text produced from a single letter message unit permutation cipher, determines the ciphertext character corresponding to white space, and then splits the cipher text into blocks with each block a corresponding to a word in plaintext.

>>> from cs402 import AffineCipher
>>> C=AffineCipher
>>> C.setAlphabet("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
>>> f=open("cipher2.txt","r")
>>> str=f.read()
>>> f.close()
>>> for a in C.Alphabet:
...   print([a,str.count(a)])
...
['0', 103]
['1', 56]
['2', 37]
['3', 25]
['4', 426]
['5', 7]
['6', 1]
['7', 2]
['8', 1]
['9', 5]
['A', 35]
['B', 89]
['C', 43]
['D', 115]
['E', 1]
['F', 89]
['G', 175]
['H', 52]
['I', 135]
['J', 199]
['K', 20]
['L', 3]
['M', 3]
['N', 5]
['O', 5]
['P', 3]
['Q', 2]
['R', 2]
['S', 175]
['T', 100]
['U', 273]
['V', 54]
['W', 181]
['X', 11]
['Y', 64]
['Z', 147]
>>> str.split("4")
['J1IWGV', '2S0', 'DWVDF3', 'WGCF1UGJWSF', 'WG', 'JDU', 'BUKUFZHYUGJ', 'ZC', 'JDUZIUJWTSF', 'TZYH1JUI', '0TWUGTU', 'HIZKWBWGV', 'S', 'CZIYSFW0SJWZG', 'ZC', 'JDU', 'TZGTUHJ0', 'ZC', 'SFVZIWJDY', 'SGB', 'TZYH1JSJWZG', '2WJD', 'JDU', 'J1IWGV', 'YSTDWGU', '2DWTD', 'TSG', 'AU', 'TZG0WBUIUB', 'S', 'YZBUF', 'ZC', 'S', 'VUGUISF', 'H1IHZ0U', 'TZYH1JUI', 'J1IWGV', 'W0', '2WBUF3', 'TZG0WBUIUB', 'JZ', 'AU', 'JDU', 'CSJDUI', 'ZC', 'JDUZIUJWTSF', 'TZYH1JUI', '0TWUGTU', 'SGB', 'SIJWCWTWSF', 'WGJUFFWVUGTU', 'B1IWGV', 'JDU', '0UTZGB', '2ZIFB', '2SI', 'J1IWGV', '2ZIXUB', 'CZI', 'JDU', 'VZKUIGYUGJ', 'TZBU', 'SGB', 'T3HDUI', '0TDZZF', 'SJ', 'AFUJTDFU3', 'HSIX', 'AIWJSWG', '0', 'TZBUAIUSXWGV', 'TUGJIU', 'JDSJ', 'HIZB1TUB', '1FJIS', 'WGJUFFWVUGTU', 'CZI', 'S', 'JWYU', 'DU', 'FUB', 'D1J', 'R', 'JDU', '0UTJWZG', '2DWTD', '2S0', 'IU0HZG0WAFU', 'CZI', 'VUIYSG', 'GSKSF', 'TI3HJSGSF30W0', 'DUIU', 'DU', 'BUKW0UB', 'S', 'G1YAUI', 'ZC', 'JUTDGW 1U0', 'CZI', '0HUUBWGV', 'JDU', 'AIUSXWGV', 'ZC', 'VUIYSG', 'TWHDUI0', 'WGTF1BWGV', 'WYHIZKUYUGJ0', 'JZ', 'JDU', 'HIU2SI', 'HZFW0D', 'AZYAU', 'YUJDZB', 'SG', 'UFUTJIZYUTDSGWTSF', 'YSTDWGU', 'JDSJ', 'TZ1FB', 'CWGB', '0UJJWGV0', 'CZI', 'JDU', 'UGWVYS', 'YSTDWGU', 'J1IWGV', 'HFS3UB', 'S', 'HWKZJSF', 'IZFU', 'WG', 'TISTXWGV', 'WGJUITUHJUB', 'TZBUB', 'YU00SVU0', 'JDSJ', 'UGSAFUB', 'JDU', 'SFFWU0', 'JZ', 'BUCUSJ', 'JDU', 'GSMW0', 'WG', 'YSG3', 'TI1TWSF', 'UGVSVUYUGJ0', 'WGTF1BWGV', 'JDU', 'ASJJFU', 'ZC', 'JDU', 'SJFSGJWT', 'SGB', 'WG', '0Z', 'BZWGV', 'DUFHUB', '2WG', 'JDU', '2SI', 'TZ1GJUICSTJ1SF', 'DW0JZI3', 'W0', 'BWCCWT1FJ', '2WJD', 'IU0HUTJ', 'JZ', 'JDU', 'UCCUTJ', '1FJIS', 'WGJUFFWVUGTU', 'DSB', 'ZG', 'JDU', 'FUGVJD', 'ZC', 'JDU', '2SI', 'A1J', 'SJ', 'JDU', '1HHUI', 'UGB', 'WJ', 'DS0', 'AUUG', 'U0JWYSJUB', 'JDSJ', 'JDW0', '2ZIX', '0DZIJUGUB', 'JDU', '2SI', 'WG', 'U1IZHU', 'A3', 'YZIU', 'JDSG', 'J2Z', '3USI0', 'SGB', '0SKUB', 'ZKUI', 'CZ1IJUUG', 'YWFFWZG', 'FWKU0', '', 'SCJUI', 'JDU', '2SI', 'J1IWGV', '2ZIXUB', 'SJ', 'JDU', 'GSJWZGSF', 'HD30WTSF', 'FSAZISJZI3', '2DUIU', 'DU', 'BU0WVGUB', 'JDU', 'STU', 'SYZGV', 'JDU', 'CWI0J', 'BU0WVG0', 'CZI', 'S', '0JZIUB', 'HIZVISY', 'TZYH1JUI', 'WG', '59PR', 'J1IWGV', 'EZWGUB', 'YSL', 'GU2YSG', '0', 'TZYH1JWGV', 'YSTDWGU', 'FSAZISJZI3', 'SJ', 'JDU', 'KWTJZIWS', '1GWKUI0WJ3', 'ZC', 'YSGTDU0JUI', '2DUIU', 'DU', 'DUFHUB', 'BUKUFZH', 'JDU', 'YSGTDU0JUI', 'TZYH1JUI0', 'SGB', 'AUTSYU', 'WGJUIU0JUB', 'WG', 'YSJDUYSJWTSF', 'AWZFZV3', 'DU', '2IZJU', 'S', 'HSHUI', 'ZG', 'JDU', 'TDUYWTSF', 'AS0W0', 'ZC', 'YZIHDZVUGU0W0', 'SGB', 'HIUBWTJUB', 'Z0TWFFSJWGV', 'TDUYWTSF', 'IUSTJWZG0', '01TD', 'S0', 'JDU', 'AUFZ10ZK', '', '', 'MDSAZJWG0X3', 'IUSTJWZG', 'CWI0J', 'ZA0UIKUB', 'WG', 'JDU', '59QN0', '', 'J1IWGV', '2S0', 'HIZ0UT1JUB', 'WG', '597O', 'CZI', 'DZYZ0UL1SF', 'STJ0', '2DUG', 'A3', 'JDU', 'FSAZ1TDUIU', 'SYUGBYUGJ', 'VIZ00', 'WGBUTUGT3', '2S0', 'TIWYWGSF', 'WG', 'JDU', '1X', 'DU', 'STTUHJUB', 'TDUYWTSF', 'TS0JISJWZG', 'JIUSJYUGJ', '2WJD', 'BU0', 'S0', 'SG', 'SFJUIGSJWKU', 'JZ', 'HIW0ZG', 'J1IWGV', 'BWUB', 'WG', '597P', '5Q', 'BS30', 'AUCZIU', 'DW0', 'POGB', 'AWIJDBS3', 'CIZY', 'T3SGWBU', 'HZW0ZGWGV', 'SG', 'WG 1U0J', 'BUJUIYWGUB', 'DW0', 'BUSJD', 'S0', '01WTWBU', 'A1J', 'WJ', 'DS0', 'AUUG', 'GZJUB', 'JDSJ', 'JDU', 'XGZ2G', 'UKWBUGTU', 'W0', 'SF0Z', 'TZG0W0JUGJ', '2WJD', 'STTWBUGJSF', 'HZW0ZGWGV', 'WG', 'ONN9', 'CZFFZ2WGV', 'SG', 'WGJUIGUJ', 'TSYHSWVG', 'AIWJW0D', 'HIWYU', 'YWGW0JUI', 'VZIBZG', 'AIZ2G', 'YSBU', 'SG', 'ZCCWTWSF', 'H1AFWT', 'SHZFZV3', 'ZG', 'AUDSFC', 'ZC', 'JDU', 'AIWJW0D', 'VZKUIGYUGJ', 'CZI', 'JDU', 'SHHSFFWGV', '2S3', 'DU', '2S0', 'JIUSJUB', ' 1UUG', 'UFWMSAUJD', 'WW', 'VISGJUB', 'DWY', 'S', 'HZ0JD1YZ10', 'HSIBZG', 'WG', 'ON56', 'JDU', 'SFSG', 'J1IWGV', 'FS2', 'W0', 'GZ2', 'SG', 'WGCZIYSF', 'JUIY', 'CZI', 'S', 'ON58', 'FS2', 'WG', 'JDU', '1GWJUB', 'XWGVBZY', 'JDSJ', 'IUJIZSTJWKUF3', 'HSIBZGUB', 'YUG', 'TS1JWZGUB', 'ZI', 'TZGKWTJUB', '1GBUI', 'DW0JZIWTSF', 'FUVW0FSJWZG', 'JDSJ', 'Z1JFS2UB', 'DZYZ0UL1SF', 'STJ0']
>>>

From this we can spot that the ciphertext word "S" most likely corresponds to the plaintext word "A".

From the ciphertext
'SG', 'SFJUIGSJWKU' it seems likely that the ciphertetxt word "SG" corresponds to the plaintext word "AN".

Frequence analysis on Vigenère ciphers

A Vigenère cipher is really just a collection of d Caeser ciphers where d is the length of the secret enciphering key. Once we know d it is as easy to crack a Vigenère cipher is it is to crack d Caeser ciphers.

The cipher text in the file cipher3.txt was produced using a Vigenère cipher. To determine d we performed an analysis of various English texts on the web and found that roughly one fifth to one sixth of all characters are the space character. Armed with this information we procede as follows. The session starts by defining two functions lst() and MostFrequentLeter() which are actually stored in cs402.py and so could be imported directly from that file.

>>> def lst(str,s,d):
... #This function inputs a string str, a non-negative integer s and a positive integer d.
... # The function returns a list consisting of every d-th entry of the string str starting at position s in str
...        M=[]
...        for i in range(0,int(len(str)/d)-s):
...           M.append(str[s+d*i])
...        return M
...

def MostFrequentLetter(x):
#returns the number of times the most frewquent letter occurs
#in the string x
       L=[]
       for a in set(x):
         L.append(x.count(a))
       return max(L)

>>> f=open("cipher3.txt", "r")
>>> x=f.read()
>>> f.close()

>>> d=1;int(len(x)/5.5*d);MostFrequentLetter(lst(x,1,d))
1622
692
>>> d=d+1;int(len(x)/5.5*d);MostFrequentLetter(lst(x,1,d))
3244
554
>>> d=d+1;int(len(x)/5.5*d);MostFrequentLetter(lst(x,1,d))
4866
337
>>> d=d+1;int(len(x)/5.5*d);MostFrequentLetter(lst(x,1,d))
6488
293
>>> d=1;int(len(x)/(5.5*d));MostFrequentLetter(lst(x,1,d))
1622
692
>>> d=d+1;int(len(x)/(5.5*d));MostFrequentLetter(lst(x,1,d))
811
554
>>> d=d+1;int(len(x)/(5.5*d));MostFrequentLetter(lst(x,1,d))
540
337
>>> d=d+1;int(len(x)/(5.5*d));MostFrequentLetter(lst(x,1,d))
405
293
>>> d=d+1;int(len(x)/(5.5*d));MostFrequentLetter(lst(x,1,d))
324
139
>>> d=d+1;int(len(x)/(5.5*d));MostFrequentLetter(lst(x,1,d))
270
268
>>> int(len(x)/(5.5*d));MostFrequentLetter(lst(x,2,d))
270
282
>>> int(len(x)/(5.5*d));MostFrequentLetter(lst(x,3,d))
270
245
>>> int(len(x)/(5.5*d));MostFrequentLetter(lst(x,4,d))
270
256
>>> int(len(x)/(5.5*d));MostFrequentLetter(lst(x,5,d))
270
285

>>> d
6
 
So in this example it seems that d=6.


Known plaintext attack on an affine cipher

If the affine cryptosystem (ZN)d ----> (ZN)d, x |---> Ax + B has been used to produce ciphertext, and if the adversary knows a portion of the corresponding plaintext, then he/she can try to determine the d2 entries in A and d entries in B by setting up a system of d2+d equations in d2+d unkowns modulo N, where N is the length of the alphabet. To solve such a system of equations one can reduce the problem to that of inverting a suitable matrix modulo N (this is just first year maths!).  I've included a rather clumsy method for finding such an inverse in the cs402.py file. It can be used as follows.

>>> from numpy import mat
>>> from cs402 import InverseMatModN
>>> A=mat([[1,2],[7,9]])
>>> InverseMatModN(A,26)
matrix([[19, 16],
        [17, 5]], dtype=object)
>>>
>>> A*InverseMatModN(A,26) %26
matrix([[1, 0],
        [0, 1]], dtype=object)

Converting between strings over an alphabet, decimal lists, binary lists and integers

The python class Cipher defined in the file cs402.py converts a text over an alphabet to a sequence of integers. The folowing is an example of this.

>>> from cs402 import StringToIntList
>>> alphabet="abcdefghijklmnopqrstuvwxyz! ,."
>>> x="this is a string!"
>>> D=StringToIntList(alphabet,x)
>>> D
[19, 7, 8, 18, 27, 8, 18, 27, 0, 27, 18, 19, 17, 8, 13, 6, 26]
 
In many cases, such as with stream ciphers, we need to convert text to a binary sequence. There are several standard ways of doing this in Python, but for pedagocial reasons in this CS402 course we shall not use these standard methods. Rather we shall use the following functions dec2bin(D,n) and bin2dec(B,n) . The first function inputs a list D of decimal integers and a decimal integer n; it converts each term in D to a binary list of length n and then returns the concatenation of these lists.  The second function is the inverse of the first.

>>> from cs402 import dec2bin
>>> from cs402 import bin2dec
>>> D=[2,3,5,7,11,13]
>>> B=dec2bin(D,8)
>>> print(B)
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1]
>>> D1=bin2dec(B,8)
>>> print(D1)
[2, 3, 5, 7, 11, 13]

Sometimes we'd like to convert a binary list to the corresponding binary number and then to the corresponding decimal integer. Here is an example.

>>> from cs402 import StringToIntList
>>> from cs402 import dec2bin
>>> from cs402 import binList2int
>>> alphabet=" 1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ."
>>> x="this is a message">>> D=StringToIntList(alphabet,x)
>>> B=dec2bin(D,7)
>>> m=binList2int(B)
>>> m
156505165645555088404210806358722703L

Linear Feedback Shift Registers

A binary Linear Feedback Shift Register R is determined by its connection polynomial
C(X) = 1 + c1X + c2X2 + ... + cnXn
where each ci = 0 or 1 and cn = 1. For any given binary list s1, s2, ... , sn of length n the shift register determines an infinite binary sequence
s1, s2, ... , sn, sn+1, ...., si, ...
where for i > n we have
si = c1sj-1 + c2sj-2 + c3sj-3+ ... + cnsj-n .

Let K=[i1,i2,...im] be the list of indices of the non-zero coefficients ci in the polynomial C(X). Given K and a list Si=[si,si+1,...,si+n-1] the Python function LSFR(K,Si) in the file cs402.py returns the list  Si+1=[si+1,si+2,...,si+n] .  So for example, the following Python commands produce the first 30 terms of the sequence s1, s2, ... , sn, sn+1, ...., for

C(X) = 1 + X + X3

s1=1, s2=1, s3=0.

>>> from cs402 import LFSR
>>> K=[1,3]
>>> S=[1,1,0]
>>> L=[1,1,0]
>>> for i in range(0,27): S=LFSR(K,S); L.append(S[2]);
...
>>> print L
[1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1]

RSA Cryptography

The RSA public key cryptosystem has public key

K=(N,e)

where N=pq is the product of two primes, and e is coprime to the product (p-1)(q-1). The cryptosystem has private key

L=(N,d)

where

 d=e-1 modulo (p-1)(q-1).

Messages m are represented as integers in the range 0<m<N and the enciphering function is

m |---> me modulo N .

The deciphering function is

m |---> md modulo N .

The following Python code illustrates how to use the RSA implementation in the file cs402.py with

plaintext= "This is a SHORT message."
p=304250263527209
q=23768741896345550770650537601358309
N=7231645985673347207280720222548553948759779729581
e=3
d=4821097323782215625692549251331855329314609896043

Since in this example we are using a very small value for N we need to keep our messages very short since the system relies on a bijection between messages m and integers in the range 0<m<N and with m not equal to p or q.

>>> from cs402 import RSA
>>> C=RSA
>>> plain="This is a SHORT message."
>>> f=open("pln.txt","w")
>>> f.write(plain)
>>> f.close()
>>> cipher=C.encipher("pln.txt")
>>> print cipher
EvrH7lkb6pwBkJHOmjnxR6dptwW

>>> f=open("cph.txt","w")
>>> f.write(cipher)
>>> f.close()
>>> decipher=C.decipher("cph.txt")
>>> print decipher
This is a SHORT message.

>>> C.EncipherKey
[7231645985673347207280720222548553948759779729581L, 3]

>>> C.DecipherKey
[7231645985673347207280720222548553948759779729581L, 4821097323782215625692549251331855329314609896043L]



Choosing an RSA public key

To choose large primes p, q and a suitable e and d we can perform variations on the following basic commands. However, once we've chosen p and q in this way we should certainly perform a few well-chosen tests to make sure they are not "easily decipherable".   For reasonable safety the primes p and q should each have at least 100 digits.

>>> from sympy import nextprime
>>> from cs402 import modinv
>>> p=nextprime(10**100+98765)
>>> q=nextprime(10**101+54321)
>>> p
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000099271L
>>> q
100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000054757L
>>>> e=3
>>> d=modinv(e,(p-1)*(q-1))
>>> d
666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666673649706666666666666666666666666666666666666666666666666666666666666666666666666666666666666666670290418747L

>>> from cs402 import RSA
>>> C=RSA
>>> C.setEncipherKey([p*q,e])
>>> C.setDecipherKey([p*q,d])


Choosing an RSA public key without using the Sympy module

If for some reason you don't have access to the Sympy module then a few lines based on Fermat's Little Theorem (a special case of Euler's Totient Theorem) are all that are needed to find large p and q which are "probably" prime. The following two functions can be imported from the cs402.py file.

from random import randint

def IsProbablyPrime(n,k):

    for i in range(0,k):
        a=randint(1,n)
        if not pow(a,n-1,n)==1: return False
    return True

def ProbablyNextPrime(n):
    m=n
    while not IsProbablyPrime(m,100): m=m+1
    return m

We can now reproduce the calculation of the previous section without using the nextprime() function.

>>> p=ProbablyNextPrime(10**100+98765)
>>> q=ProbablyNextPrime(10**101+54321)
>>> p
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000099271L
>>> q
100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000054757L

Breaking an RSA public key

Suppose that we know the public key
N=121932633334857493
e=3
of someone's RSA cryptosystem and we wish to descover their private key. First we could use the following naive trial division method to find the prime factorisation of N. You'll see from the length of time taken to factorize N that this value of N is getting close to the limit on which trial division can be used to factorize products of two big primes.

>>> from cs402 import factors
>>> N=121932633334857493
>>> x=factors(N)
>>> x
set([1, 987654323, 121932633334857493, 123456791])

>>> p=list(x)[1]
>>> q=list(x)[3]

We can now find
d=e-1 modulo (p-1)(q-1)
as follows.

>>> from cs402 import modinv
>>> e=3
>>> d=modinv(e,(p-1)*(q-1))
>>> d
81288421482497587

But to break more secure RSA public keys we'll need to use some deeper maths in our integer factorization algorithm!