Cryptography Projects

In this code, I have encryption, decryption and cryptanalysis tools for Caesar, Vigenère and monoalphabetic ciphers.

Encryption is pretty standard: take the plaintext from stdin and any keys from the command-line arguments, iterate over the input, remove any non-alphabetic characters, then encrypt what remains. Decryption is similar, but because all the spaces were removed from the text, the output has to be split back into words. I created a separate command, add_spaces, to do this; my code uses a trie and a dictionary, and is heavily based on this.
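As a rough illustration of the two ideas above, here is a minimal Python sketch. It is not the project's actual code: the function names (clean, caesar_encrypt, vigenere_encrypt, build_trie) are hypothetical, and the add_spaces shown here uses a simple "fewest dictionary words" rule that I am assuming for the example; the real command may choose between segmentations differently.

```python
import string

ALPHABET = string.ascii_uppercase

def clean(text):
    """Keep only the letters A-Z, uppercased; spaces and punctuation are dropped."""
    return "".join(ch for ch in text.upper() if ch in ALPHABET)

def caesar_encrypt(plaintext, shift):
    """Shift every letter of the cleaned plaintext forward by `shift` places."""
    return "".join(ALPHABET[(ord(ch) - 65 + shift) % 26] for ch in clean(plaintext))

def vigenere_encrypt(plaintext, key):
    """Shift each letter by the matching key letter, repeating the key as needed."""
    shifts = [ord(k) - 65 for k in clean(key)]
    return "".join(
        ALPHABET[(ord(ch) - 65 + shifts[i % len(shifts)]) % 26]
        for i, ch in enumerate(clean(plaintext))
    )

def build_trie(words):
    """Build a nested-dict trie; the key "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def add_spaces(text, trie):
    """Split text into dictionary words (fewest words wins); None if impossible."""
    n = len(text)
    best = [None] * (n + 1)  # best[i] = list of words covering text[i:]
    best[n] = []
    for i in range(n - 1, -1, -1):
        node = trie
        for j in range(i, n):
            node = node.get(text[j])
            if node is None:
                break  # no dictionary word starts with text[i:j + 1]
            if "$" in node and best[j + 1] is not None:
                candidate = [text[i:j + 1]] + best[j + 1]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return None if best[0] is None else " ".join(best[0])

# Toy dictionary, for illustration only.
trie = build_trie(["THREE", "RING", "RINGS", "FOR", "THE"])
print(caesar_encrypt("Hello, World!", 3))
print(add_spaces("THREERINGSFORTHE", trie))
```

Walking the trie from each start position means every dictionary word beginning there is found in a single pass, which is the main reason to prefer a trie over testing each substring against a set.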


The cryptanalysis is where things get interesting. The system is based on the distributions of single letters, letter pairs, letter triplets and letter quartets (monograms, bigrams, trigrams and quadgrams respectively), trained on the entirety of Project Gutenberg's text (~70 gigabytes) from an archive I downloaded (link - broken 24/03/19). As this took around 4 hours to process on 8 threads, the data is available here (JSON form, binary form). Using this, I was able to give any piece of text a score based on how likely its letter sequences are under these distributions, for example (extract shown, full here):

Rank | Text | Score
1 | Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Nine for Mortal Men, doomed to die | -0.614327
2 | Uisff Sjoht gps uif Fmwfo-ljoht voefs uif tlz, Tfwfo gps uif Exbsg-mpset jo uifjs ibmmt pg tupof, Ojof gps Npsubm Nfo, eppnfe up ejf | -1.94954
3 | Guerr Evatf sbe gur Ryira-xvatf haqre gur fxl, Frira sbe gur Qjnes-ybeqf va gurve unyyf bs fgbar, Avar sbe Zbegny Zra, qbbzrq gb qvr | -1.97711
10 | Sgqdd Qhmfr enq sgd Dkudm-jhmfr tmcdq sgd rjx, Rdudm enq sgd Cvzqe-knqcr hm sgdhq gzkkr ne rsnmd, Mhmd enq Lnqszk Ldm, cnnldc sn chd | -2.39352
20 | Escpp Ctyrd qzc esp Pwgpy-vtyrd fyopc esp dvj, Dpgpy qzc esp Ohlcq-wzcod ty esptc slwwd zq dezyp, Ytyp qzc Xzcelw Xpy, ozzxpo ez otp | -2.75304
26 | Lzjww Jafyk xgj lzw Wdnwf-cafyk mfvwj lzw kcq, Kwnwf xgj lzw Vosjx-dgjvk af lzwaj zsddk gx klgfw, Fafw xgj Egjlsd Ewf, vggewv lg vaw | -3.20434
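To make the scoring idea concrete, here is a small Python sketch of n-gram scoring. Everything in it is an assumption made for illustration: the names train and score, the mean log10 probability as the score, and the fixed penalty for unseen n-grams are my choices, and the tiny training string stands in for the real Project Gutenberg data. The exact formula behind the scores in the table may well differ.

```python
import math
from collections import Counter

def ngrams(text, n):
    """Return all overlapping n-grams of the letters in text (A-Z only)."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]

def train(corpus, n):
    """Build a hypothetical model: (n, log10 probabilities, floor for unseen n-grams)."""
    counts = Counter(ngrams(corpus, n))
    total = sum(counts.values())
    log_probs = {gram: math.log10(count / total) for gram, count in counts.items()}
    floor = math.log10(0.01 / total)  # assumed penalty for n-grams never seen in training
    return n, log_probs, floor

def score(text, model):
    """Mean log10 probability per n-gram; closer to zero means more like the corpus."""
    n, log_probs, floor = model
    grams = ngrams(text, n)
    if not grams:
        return floor
    return sum(log_probs.get(gram, floor) for gram in grams) / len(grams)

# Toy training text standing in for the real corpus.
corpus = ("three rings for the elven kings under the sky "
          "seven for the dwarf lords in their halls of stone")
bigram_model = train(corpus, 2)
print(score("the ring", bigram_model))  # English-like text
print(score("uif sjoh", bigram_model))  # the same text shifted by one letter
```

Dividing by the number of n-grams keeps scores comparable between texts of different lengths; one plausible way to use all four distributions would be to add or average the four per-model scores.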

For the Caesar cipher, I simply generated the scores (using all 4 sets of probabilities) for each of the 26 possible shifts and then selected the one with the highest score.
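The brute-force search can be sketched in a few lines of Python. Note the assumptions: break_caesar and shift_text are hypothetical names, and the scorer here is a stand-in that uses only approximate English letter frequencies (monograms), whereas the project combines all four n-gram distributions. Any function that maps text to a "higher is better" score can be passed in instead.

```python
import math

# Approximate English letter frequencies in percent (rounded, illustrative values).
FREQ = {
    "E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0, "N": 6.7, "S": 6.3,
    "H": 6.1, "R": 6.0, "D": 4.3, "L": 4.0, "C": 2.8, "U": 2.8, "M": 2.4,
    "W": 2.4, "F": 2.2, "G": 2.0, "Y": 2.0, "P": 1.9, "B": 1.5, "V": 1.0,
    "K": 0.8, "J": 0.15, "X": 0.15, "Q": 0.1, "Z": 0.07,
}

def monogram_score(text):
    """Stand-in scorer: mean log10 letter probability (higher is more English-like)."""
    letters = [ch for ch in text.upper() if ch in FREQ]
    return sum(math.log10(FREQ[ch] / 100) for ch in letters) / max(len(letters), 1)

def shift_text(text, shift):
    """Shift ASCII letters by `shift` places, preserving case and other characters."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def break_caesar(ciphertext, score):
    """Try all 26 shifts and return (shift, plaintext) for the best-scoring one."""
    candidates = [(score(shift_text(ciphertext, -k)), k) for k in range(26)]
    _, best_shift = max(candidates)
    return best_shift, shift_text(ciphertext, -best_shift)

plain = ("Three Rings for the Elven-kings under the sky, "
         "Seven for the Dwarf-lords in their halls of stone")
cipher = shift_text(plain, 3)
print(break_caesar(cipher, monogram_score))
```

With only 26 candidates there is no need for anything cleverer than scoring every shift; the quality of the result depends entirely on the scorer, which is why longer n-grams help on short ciphertexts.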


Breaking the Vigenère cipher was more complex than the Caesar cipher, and it follows these steps:


The monoalphabetic cipher required a slightly more advanced technique, but the basic algorithm is as follows:

Gitlab Link