Development debates are common, especially regarding topics such as security.
While working with my team on setting up secure communication, we started a discussion about the configuration of an encryption key. With our insatiable technical curiosity being at least as important as our thirst for debate, I decided to start exploring these concepts to learn more about how the art of encryption was shaped throughout the years.
Little did I know that a whole world would open up before me.
Today, I invite you to jump into a time machine and join me on a journey through history to discover cryptography, the art of making a message secret. In this three-part blog series, we will look at the underlying concepts, play with these mechanisms through historical anecdotes, and follow their evolution up to today.
For the first part of our adventure through time, we will visit ancient and medieval civilizations.
Let the journey begin!
Prepare for time travel
We are about to travel through time to understand the origins of cryptography. Like any journey, we have to prepare ourselves. Here is a short glossary that will come in handy:
Cryptography refers to encryption, which is the process of making a normal message (plaintext) unintelligible (ciphertext). To get back to the normal message, you have to decrypt it. An algorithm and a key are used to encrypt and decrypt the message.
To make a message secret, we can:
- substitute character by character: this is encryption (e.g. replace each letter with the next letter in the alphabet)
- substitute word by word or by logical chunk: this is coding (e.g. translate into another language)
- conceal the very existence of the message: this is called steganography (e.g. invisible ink)
For the sake of simplicity, in this article series, I will only be covering encryption and encrypted messages.
Feeling ready? Let’s jump in!
The art of cryptography 2000 BC – 800 AD
The secrets of ancient Egypt
We just traveled 4000 years into the past and landed in the ancient Egyptian town of Menat Khufu, in the tomb of the nobleman Khnumhotep II. Take a look at those hieroglyphic inscriptions. Are you having difficulty reading them? 😉 That's OK: it is probably because they contain unusual hieroglyphs, used as a way to confuse an unexpected reader.
Using an unusual symbol, or swapping one symbol for another, is a basic and simple way to encrypt a message. You may have doubts about this one, and you are right. This example is controversial because these peculiarities could be due to the context (like our legal vocabulary, which sometimes uses uncommon words), to a stylistic effect, or to the evolution of the language itself.
In any case, a tomb is not the best place to discuss; let's go to our next stop in time.
The most Spartan encryption
We've made a big jump: we are in Greece in the 3rd century BC, searching for Apollonius of Rhodes. I would love to ask him about the famous Argonautica, but we are here to talk about cryptography! We see no one at the meeting point; instead, a weird cylinder with a strip of parchment is waiting for us.
As you may have guessed, that cylinder is a scytale (Ancient Greek: σκυτάλη skutálē "baton, cylinder", also σκύταλον skútalon). In cryptography terminology, the scytale is the key used to encrypt and decrypt the message.
As the system is pretty simple to crack, another interpretation is that this mechanism is used more for authentication than encryption. Both communicants must have the same cylinder characteristics to write and read the message, so the receiver is waiting for a specific ciphered message size, and an interceptor will not be able to corrupt the content easily.
Let's play with it to fully understand the concept. I want to send you a message, in plaintext:
This is my first encrypted message
But it is a secret message, remember? So, I need to encrypt it. We first agree on the type of scytale we want to use (its physical characteristics). I take a scytale (the key), wrap the strip of parchment around it, and write the message along the cylinder, here eight letters per row:
Thisismy
firstenc
ryptedme
ssage
Then I unwrap the parchment and obtain the encrypted message:
Tfrshiysirpasstgiteesedmnmyce
The least we can say is that this message has lost its clarity, hasn't it?
To decrypt this message, you take your scytale with the predefined physical characteristics (the key), wrap the parchment around it, and… tadaaaa! You can now read the original plaintext message.
Excellent — we discovered and used the first encryption mechanism! Of course, it was a first naive approach that could be improved, for example, by adding unexpected characters or using a complex scytale form.
This algorithm, which permutes logical groups of characters, is called a transposition cipher.
One moment, I can hear the sound of a clash... Indeed, this technique was commonly used by the Spartans during military campaigns. I think it would be more prudent to leave right away!
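To make the wrapping and unwrapping concrete, here is a minimal Python sketch of the scytale. The function names and the `width` parameter (the number of letters written per row along the cylinder) are my own assumptions for illustration, and the value 8 is inferred from the ciphertext above rather than stated anywhere.

```python
def scytale_encrypt(plaintext: str, width: int) -> str:
    """Write the letters row by row (width letters per row), read them column by column."""
    letters = [c for c in plaintext if c.isalpha()]  # drop spaces and punctuation
    # Column `col` holds every width-th letter, starting at index `col`.
    return "".join("".join(letters[col::width]) for col in range(width))


def scytale_decrypt(ciphertext: str, width: int) -> str:
    """Rebuild the columns from the strip, then read them row by row."""
    full_rows, extra = divmod(len(ciphertext), width)
    columns, start = [], 0
    for col in range(width):
        # The first `extra` columns hold one more letter than the others.
        size = full_rows + (1 if col < extra else 0)
        columns.append(ciphertext[start:start + size])
        start += size
    rows = full_rows + (1 if extra else 0)
    return "".join(
        columns[col][row]
        for row in range(rows)
        for col in range(width)
        if row < len(columns[col])
    )


secret = scytale_encrypt("This is my first encrypted message", 8)
print(secret)                      # Tfrshiysirpasstgiteesedmnmyce
print(scytale_decrypt(secret, 8))  # Thisismyfirstencryptedmessage
```

Trying another width on the same strip gives gibberish, which is why both parties need the same cylinder.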
We are around 200 BC, near Jerusalem, and we are searching for the Book of Jeremiah. Why so? This text contains Hebrew words that have been encrypted using a method called Atbash (Hebrew: אתבש; also transliterated Atbaš). The name Atbash combines the first and last letters of the Hebrew alphabet, Aleph and Taw, with the second and second-to-last, Bet and Shin.
This encryption consists of replacing each character with the character at the same position in the reversed alphabet.
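As a rough illustration of this rule, here is a small Python sketch. It applies Atbash to the 26-letter Latin alphabet rather than to Hebrew, which is an assumption made for readability, and the function name `atbash` is mine.

```python
import string


def atbash(text: str) -> str:
    """Replace each Latin letter with the letter at the same position in the reversed alphabet."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    # a <-> z, b <-> y, c <-> x, and so on; other characters are left untouched.
    table = str.maketrans(lower + upper, lower[::-1] + upper[::-1])
    return text.translate(table)


print(atbash("abc xyz"))          # zyx cba
print(atbash(atbash("Babylon")))  # Babylon
```

Applying the function twice returns the original text, so the same operation both encrypts and decrypts.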
One well-known example is in Jeremiah 25:26. "The king of Sheshach shall drink after them" — Sheshach meaning Babylon in Atbash (בבל bbl → ששך ššk).
Theologically speaking, Babylon is the city where the original unique human language was separated, and thus the original meaning was lost. Don't you see a form of irony in the context of the origin of cryptography?
By the way, we can perfectly transpose this mechanism to our Roman alphabet. But hey, if we want to play with Latin characters, how about dropping by ancient Rome? Let's go!
YHQL YHGL YLFL
We are in the last century BC during Julius Caesar's reign. The First Triumvirate was difficult to create, and the political life of Julius Caesar was very eventful, with frequent reversals. Because of this, he got into the habit of encrypting his correspondence, and his name ended up being given to an encryption technique: Caesar's cipher.
This encryption method replaces each letter of the plaintext with the letter a fixed number of positions away in the alphabet. This type of method is called a substitution cipher. The shift is the encryption key.
Let's start with our alphabet by implementing a right shift of three:
Plain:  a b c d e f g h i j k l m n o p q r s t u v w x y z
Cipher: x y z a b c d e f g h i j k l m n o p q r s t u v w
d is replaced by a, which is the letter three positions before, and so on. But what about the a that becomes x? The alphabet must be read like a clock: 2 hours before 1 am, it is 11 pm. In the same way, x is three letters before a. This mechanism of wrapping around to the other end is a fundamental mathematical concept in cryptography, and it is called modular arithmetic.
As with the previous techniques, the same key is used to encrypt and decrypt: this is also a symmetric key algorithm.
Before we go any further, it seems that the title of our section has been "inadvertently" encrypted. Can you decrypt it? Rumor has it that it is a famous phrase of Julius Caesar.
YHQL YHGL YLFL
Congrats! We added the substitution cipher to our toolbox, so let's move on to our next destination.
The 44th art of the Kama Sutra
Our time machine suffers from a minor sensor malfunction, and we are not able to isolate the exact point in time where we landed; we are sometime between 0 and 400 CE, probably around 200 CE. We are in Pataliputra, in India, in the footsteps of the brahmin Vātsyāyana, who is writing the first version of the Kama Sutra (Sanskrit: कामसूत्र, Kāma-sūtra; "Principles of Love").
At the risk of disappointing you, and contrary to popular belief, the Kama Sutra is not a collection of sexual positions, but rather a guide on how to develop and maintain a fulfilling romantic relationship.
Among the 64 arts listed in the Kama Sutra, we are interested in the 44th: Mlecchita Vikalpa. This is "the art of secret writing and secret communications".
The initial text does not provide a specific method; however, the text evolved through commentaries. The Jayamangala commentary proposes the Kautilya and Muladeviya methods, which are basically substitution ciphers based on phonetic relations.
Nowadays, what is commonly known as the Kama Sutra cipher refers to the method of grouping letters into pairs and replacing each letter with its partner. For example, the Latin alphabet can be divided into 13 pairs of characters, and each letter in the plaintext is replaced by the letter it is paired with.
With this substitution table,
As you probably guessed already, we are still using a symmetric key algorithm for encrypting and decrypting the message.
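Here is a small Python sketch of the pairing idea. The key below is purely hypothetical (it is not the substitution table of this article): it is a scrambled alphabet whose first 13 letters are paired with its last 13 letters. The function names are mine as well.

```python
# Hypothetical key: any ordering of the 26 letters works.
KEY = "qwertyuiopasdfghjklzxcvbnm"


def pair_table(key: str) -> dict[str, str]:
    """Pair the first half of the key with the second half, in both directions."""
    half = len(key) // 2
    table = {}
    for left, right in zip(key[:half], key[half:]):
        table[left] = right
        table[right] = left
    return table


def kama_sutra(text: str, key: str = KEY) -> str:
    """Swap every letter with its partner; characters without a partner stay as they are."""
    table = pair_table(key)
    return "".join(table.get(ch, ch) for ch in text.lower())


print(kama_sutra("love"))  # ycph
print(kama_sutra("ycph"))  # love
```

Because every letter is swapped with its partner, running the same function on the ciphertext gives back the plaintext.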
This type of algorithm kept evolving, and there are many more checkpoints for this kind of cipher throughout history. But in the interest of time, let's move on to another evolutionary step of cryptography.
Tales of 1001 ciphers
Encrypted messages are an awesome way to establish secure communication, but what happens each time we try to secure a message? It is always possible (if not inevitable) that someone will eventually find a workaround or crack our encryption. That’s where cryptanalysis (from the Greek kryptós, "hidden", and analýein, "to analyze") comes in. Thanks to cryptanalysis, an encrypted message could be decrypted even without the encryption key.
We are in the 9th century AD, near Baghdad, Iraq, and we are searching for the work of Al-Kindi. Among others, he is considered one of the fathers of cryptography and a famous scientist who worked on cryptanalysis and, more specifically, on frequency analysis.
The study of the Quran contains the beginnings of frequency analysis. Indeed, the fragmentary nature of the Quran gave rise to a specific theological branch that analyzed the text in depth. The method consists of checking the frequency of words and comparing it with complete texts from different periods. A text may then be attributed to a specific period thanks to its word frequencies.
In the case of an encrypted message, if the cipher method keeps the position of the characters, we may infer which original character it refers to, thanks to the frequency of this character in a text from the same period and the same language. If you have a long enough text, you sort all characters by their frequency, then you do the same with the encrypted text, and the chances are that those lists will match.
To be honest, it is not that simple: the two lists rarely match one-to-one, so you start with the most frequent letters only. Then you can rely on the language structure, the number of characters, and the position of each character inside a word.
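Counting letters by hand is tedious, so here is a minimal Python sketch of the counting step. The function name `letter_frequencies` is my own, and it assumes we only care about the 26 Latin letters, ignoring case.

```python
from collections import Counter


def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, percentage) pairs, most frequent first."""
    letters = [ch for ch in text.lower() if ch.isascii() and ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, 100 * count / total) for letter, count in counts.most_common()]


for letter, percent in letter_frequencies("Attack at dawn"):
    print(f"{letter}: {percent:.2f}%")
```

Run on a long ciphertext, the top of this list is the first place to look for the letters standing in for e, t, and a.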
Let's try it out!
According to Wikipedia, the frequency of English letters is as follows:
Let's try frequency analysis on a ciphered text:
Jbhyq lbh or noyr gb qrpvcure guvf grkg jvgubhg univat gur rapelcgvba xrl? Ubjrire, gb or rssvpvrag, gur grkg arrqf gb or ybat rabhtu gb graq gb na nirentr qvfgevohgvba bs gur punenpgref. Vs vg vf abg gur pnfr, bgure punenpgrevfgvpf znl or hfrq fhpu nf gur svefg yrggre serdhrapl be gur hfhny cbfvgvba bs ibjryf naq pbafbanagf.
| Rank | Text frequency | English frequency |
| 1 | r: 14.02% | e: 13.00% |
| 2 | g: 12.88% | t: 9.10% |
| 3 | b: 8.71% | a: 8.20% |
| 4 | v: 6.82% | o: 7.50% |
| 5 | a: 6.44% | i: 7.00% |
| 6 | f: 6.44% | n: 6.70% |
| 7 | u: 6.06% | s: 6.30% |
| 8 | n: 5.68% | h: 6.10% |
| 9 | e: 5.30% | r: 6.00% |
| 10 | p: 4.55% | d: 4.30% |
| 11 | h: 3.79% | l: 4.00% |
| 12 | q: 2.65% | c: 2.80% |
| 13 | s: 2.65% | u: 2.80% |
| 14 | y: 2.27% | w: 2.40% |
| 15 | o: 2.27% | m: 2.40% |
| 16 | l: 1.89% | f: 2.20% |
| 17 | t: 1.52% | y: 2.00% |
| 18 | i: 1.52% | g: 2.00% |
| 19 | j: 1.52% | p: 1.90% |
| 20 | c: 1.14% | b: 1.50% |
| 21 | k: 0.76% | v: 0.98% |
| 22 | z: 0.38% | k: 0.77% |
| 23 | d: 0.38% | j: 0.15% |
| 24 | x: 0.38% | x: 0.15% |
| 25 | - | q: 0.10% |
| 26 | - | z: 0.07% |
Starting from the frequencies, we can replace the characters following the order of the previous table and switch to the next candidate letter when the result does not make sense. There are several techniques: we can brute-force all combinations and choose the right one, or we can proceed step by step, detecting common words or, with the reverse logic, ruling out words that are impossible in this context. We will start by replacing the three most frequent letters: r with e, g with t, and b with a.
.a... .a. .e ...e ta .e....e. t... te.t ..t.a.t ...... t.e e.....t.a. .e.? .a.e.e., ta .e e.....e.t, t.e te.t .ee.. ta .e .a.. e.a... ta te.. ta .. ..e...e ...t....t.a. a. t.e ......te... .. .t .. .at t.e ...e, at.e. ......te...t... ... .e ..e. .... .. t.e ....t .ette. ..e..e... a. t.e ..... .a..t.a. a. .a.e.. ... .a..a...t..
"ta" is not an English word, so let's try replacing b with o, the next letter in the English frequency table:
.o... .o. .e ...e to .e....e. t... te.t ..t.o.t ...... t.e e.....t.o. .e.? .o.e.e., to .e e.....e.t, t.e te.t .ee.. to .e .o.. e.o... to te.. to .. ..e...e ...t....t.o. o. t.e ......te... .. .t .. .ot t.e ...e, ot.e. ......te...t... ... .e ..e. .... .. t.e ....t .ette. ..e..e... o. t.e ..... .o..t.o. o. .o.e.. ... .o..o...t..
From here, we can try to infer some replacements, as a cruciverbalist would. "to *e" may be "to be", so o stands for b, and "t*e" could be "the", so u stands for h. Let's try this again:
.o... .o. be .b.e to .e...he. th.. te.t ..tho.t h..... the e.....t.o. .e.? .o.e.e., to be e.....e.t, the te.t .ee.. to be .o.. e.o..h to te.. to .. ..e...e ...t..b.t.o. o. the .h....te... .. .t .. .ot the ...e, othe. .h....te...t... ... be ..e. ...h .. the ....t .ette. ..e..e... o. the ..... .o..t.o. o. .o.e.. ... .o..o...t..
With some other tricks and guesses, we obtain this message with the associated table:
Would you be able to decipher this text without having the encryption key? However, to be efficient, the text needs to be long enough to tend to an average distribution of the characters. If it is not the case, other characteristics may be used such as the first letter frequency or the usual position of vowels and consonants.
The method is not perfect; however, the longer the text, the more likely the frequency is close to the average frequency. We are not so far away in this example.
We found the original message, but can we find the key? If we sort the letters alphabetically, we get:
It is a Caesar cipher with a shift of 13. This method gave us both the plaintext message and the key!
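When we already suspect a Caesar cipher, frequency analysis gives a shortcut to the key: assume the most frequent ciphertext letter stands for e and measure the distance between the two. The Python sketch below illustrates this under that assumption, which only holds for a text long enough (or lucky enough) to have e as its most frequent letter; the function names and the short sample text are mine.

```python
from collections import Counter
import string


def guess_caesar_shift(ciphertext: str, most_common_plain: str = "e") -> int:
    """Guess the shift by assuming the most frequent letter stands for `most_common_plain`."""
    letters = [ch for ch in ciphertext.lower() if ch in string.ascii_lowercase]
    top_letter = Counter(letters).most_common(1)[0][0]
    return (ord(top_letter) - ord(most_common_plain)) % 26


def undo_shift(ciphertext: str, shift: int) -> str:
    """Move every lowercase Latin letter back by `shift` positions (modulo 26)."""
    return "".join(
        chr((ord(ch) - ord("a") - shift) % 26 + ord("a")) if ch in string.ascii_lowercase else ch
        for ch in ciphertext.lower()
    )


sample = "zrrg zr urer"  # hypothetical short message, not taken from the article
shift = guess_caesar_shift(sample)
print(shift)                      # 13
print(undo_shift(sample, shift))  # meet me here
```

If the guess produces gibberish, the next step is to try the second most frequent English letter, just like we did by hand.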
Preparing for the next jump
Ok, we have a new powerful tool that can break encrypted messages with mono-alphabetic substitution. But it is time to rest before we make our next big jump in time.
We have a long way to cover, from exploring ways to counter the frequency analysis, following the evolution of cryptography until the 1940s, and finally playing around with the configuration of an encryption key using methods from the 1970s until today!