Alice wishes to mail Bob a ring. Unfortunately, anything sent through the mail will be stolen unless it is enclosed in a padlocked box. Alice and Bob each have plenty of padlocks, but none to which the other has a key. How can Alice get the ring safely into Bob's hands?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### One possible solution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the solution Caroline had in mind in four steps:\n", "\n", "1. Alice sends Bob a box with the ring in it and one of her padlocks on it.\n", "2. When Bob receives the box he affixes his own padlock to the box and mails it back to Alice with both padlocks on it.\n", "3. When Alice gets it she removes her padlock and sends the box back to Bob.\n", "4. When Bob receives the box again he can remove his padlock and then open the box!\n", "\n", "What does this have to do with cryptography? Surprisingly enough, the idea in this solution is the same idea that makes the Diffie–Hellman key exchange work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Primitive roots mod p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A number $g$ is a _primitive root_ modulo $p$ if the smallest solution of \n", "\n", "$$g^x\\equiv1\\pmod{p} \\qquad (x>0) \\tag{1} $$\n", "\n", "is $x=p-1$. Note that $g$ must be coprime to $p$ for solutions of (1) to exist. Fermat's little theorem implies that $x=p-1$ is _always_ a solution for any $g$ with $\\gcd(g,p)=1$. When $x=p-1$ is the **smallest** solution then $g$ is a primitive root.\n", "\n", "For example, 5 is a primitive root modulo 23 since $5^x\\bmod 23$ for $x=1,\\dotsc,22$ is\n", "\n", "$$ [5,2,10,4,20,8,17,16,11,9,22,18,21,13,19,3,15,6,7,12,14,{\\bf \\color{red}1}] . $$\n", "\n", "Conversely, 3 is not a primitive root modulo 23 since $3^x\\bmod 23$ for $x=1,\\dotsc,22$ is\n", "\n", "$$ [3,9,4,12,13,16,2,6,18,8,{\\bf \\color{red}1},3,9,4,12,13,16,2,6,18,8,{\\bf \\color{red}1}] . $$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show([5^x % 23 for x in (1..22)])\n", "show([3^x % 23 for x in (1..22)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sage provides the built-in function `primitive_root` which computes a primitive root modulo a given modulus $m$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "primitive_root(23)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Outline of Diffie–Hellman" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Say Alice wants to send a secret to Bob over an insecure channel. The first step is to derive a _shared secret key_ that only Alice and Bob know. They can do this as follows:\n", "\n", "1. Alice chooses a prime $p$ by some random process.\n", "2. Alice chooses some $g$, a primitive root modulo $p$.\n", "3. Alice chooses a random integer $a$ and computes $A:=g^a\\bmod p$.\n", "4. Alice sends $p$, $g$, and $A$ to Bob, but keeps $a$ secret.\n", "5. Bob receives $p$, $g$, and $A$, chooses a random integer $b$ and computes $B:=g^b\\bmod p$.\n", "6. Bob sends $B$ to Alice but keeps $b$ secret.\n", "7. Alice computes $B^a\\bmod p$ and Bob computes $A^b\\bmod p$. This is their shared secret number." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Alice chooses the prime $p=31337$.\n", "2. Alice chooses $g=3$ which is a primitive root modulo $p$.\n", "3. Alice chooses the random intger $a=11589$ and computes $A = 3^{11589} \\bmod p = 14479$.\n", "4. Alice sends $p$, $g$ and $A$ to Bob.\n", "5. Bob chooses the random integer $b=31239$ and computes $B = 3^{31239} \\bmod p = 2879$.\n", "6. Bob sends $B$ to Alice.\n", "7. Alice computes $2879^{11589}\\bmod p = 27390$ and Bob computes $14479^{31239} \\bmod p = 27390$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example in Sage" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "p = 31337\n", "g = primitive_root(p)\n", "print(\"p = {}, g = {}\".format(p, g))\n", "\n", "# Alice generates a random integer between 0 and p-1\n", "a = ZZ.random_element(p)\n", "# Bob generates a random integer between 0 and p-1\n", "b = ZZ.random_element(p)\n", "print(\"a = {}, b = {}\".format(a, b))\n", "\n", "A = g^a % p # Alice computes A and sends it to Bob\n", "B = g^b % p # Bob computes B and sends it to Alice\n", "print(\"A = {}, B = {}\".format(A, B))\n", "\n", "B^a % p # Alice computes B^a mod p\n", "A^b % p # Bob computes A^b mod p\n", "print(\"B^a = {} mod p and A^b = {} mod p\".format(B^a % p, A^b % p, p))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Why does this work?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, note that $B^a$ and $A^b$ produce the same number modulo $p$ because\n", "\n", "$$ B^a \\equiv (g^b)^a \\equiv g^{ba} \\equiv g^{ab} \\equiv (g^a)^b \\equiv A^b \\pmod{p} . $$\n", "\n", "Also note that an evesdropper can learn $p$, $g$, $A$, and $B$, but does not know $a$ or $b$. In order to compute these they would have to solve one of the following congruences for $a$ or $b$:\n", "\n", "\\begin{align*}\n", "g^a &\\equiv A \\pmod p \\qquad&\\qquad g^b &\\equiv B \\pmod{p}\n", "\\end{align*}\n", "\n", "Solving such an equation is known as the _discrete logarithm problem_ because it is equivalent to computing a base-$g$ logarithm modulo $p$. In other words, the congruences can be rewritten as\n", "\n", "\\begin{align*}\n", "\\log_g(A) &\\equiv a \\pmod{p} \\qquad&\\qquad \\log_g(B) &\\equiv b \\pmod{p} .\n", "\\end{align*}\n", "\n", "Perhaps surprisingly, computing discrete logarithms is a famously difficult problem. This is in stark contrast to computing logarithms such as $\\log_2(x)$ or $\\ln(x)$ over the real numbers which can be done efficiently." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Takeaway" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Diffie–Hellman relies on the property that the modular exponentiation $g^x\\bmod p$ can be done efficently (via repeated squaring) but no one knows any method of computing the modular logarithm $\\log_g(x)\\bmod p$ efficiently (i.e., polynomial in $\\renewcommand{\\len}{\\operatorname{len}}\\len(p)$) even when $g$ is small, e.g., $g=3$.\n", "\n", "For example, suppose you choose a 1024-bit prime $p$ (which can fit in sixteen 64-bit words). It is estimated that computing $\\log_g(x)$ mod $p$ requires on the order of 100 million dollars in computing power.\n", "\n", "While this is quite expensive, such a computation could actually be afforded by large governments—so if you _really_ want your secrets to be safe, choose a 2048-bit prime instead (which can fit in thirty-two 64-bit words). It's estimated that computing $\\log_g(x)$ mod $p$ would now be a billion times harder, i.e., would cost 100 trillion dollars! Conversely, computing $g^x\\bmod p$ is only slightly more challenging." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Application to communication" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that Diffie–Hellman only enables two parties to agree on the same shared secret. At first this might seem useless for communication purposes, since neither party can control what the shared secret is. However, we will now see a way that a shared secret can enable secure communication over an insecure channel.\n", "\n", "First, we will describe a simple method that only allows sharing a short message (shorter than the shared secret) but demonstrates the basic principle.\n", "\n", "Suppose Alice and Bob have agreed on the shared secret in the example above (27390) and Alice wishes to send Bob the message `CAT`. First, this string can be converted to a number by treating it as a base-27 number with `A=1`, `B=2`, `C=3`, etc. Then Alice wishes to send the number $3\\cdot27^2+1\\cdot27+20=2234$.\n", "\n", "Now Alice writes the shared secret and the message to send in binary: 27390 is `110101011111110` and 2234 is `000100010111010`. She then adds together the digits in the same position modulo 2:\n", "\n", "\\begin{align*}\n", " 110101011111110 \\\\\n", "\\underline{+\\;000100010111010} \\\\\n", " 110001001000100\n", "\\end{align*}\n", "\n", "The result is `110001001000100` in binary which is 25156 in decimal. This number is sent to Bob who can add the shared secret to the number in a similar manner in order to recover the original message. This works because $0+0\\equiv1+1\\equiv0\\pmod{2}$. In other words, adding the shared secret a second time will \"cancel off\" the secret and Bob will be left with the original message.\n", "\n", "Conversely, an evesdropper who does not know the secret cannot perform this cancellation; there is no way of recovering the bits of the original message or shared secret from the encoded message. However, it only works for short messages, since the message cannot be longer than the shared secret without leaking information." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stream ciphers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above communication method only works for short messages but it can be augmented in order to allow longer messages.\n", "\n", "The idea is to use a _stream cipher_ which from a secret key will produce a continuous _keystream_ of \"pseudorandom\" numbers. Since the keystream is deterministically generated from a secret key it is not truly random—however the numbers should \"behave\" as if they were random.\n", "\n", "In particular, an adversary should not be able to determine the key from the keystream and should also not be able to predict future values of the keystream.\n", "\n", "Then Alice and Bob can use Diffie–Hellman to agree on a shared secret key which they use to seed a stream cipher (which they also agree on). Then they use the same process as above (using addition mod 2 on the bits of the message) to encode and decode their messages. However, they add the numbers from the **keystream** to the message instead of the key itself.\n", "\n", "Because stream ciphers can have very long periods (i.e., they take a long time to start repeating) this allows Alice and Bob communicate much longer messages using a single Diffie–Hellman key exchange." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## RSA Cryptosystem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Diffie–Hellman requires both parties to work together in order to exchange a message over an insecure channel. What if Alice wants to send an encrypted message to Bob who is currently not available to reply?\n", "\n", "One solution would be for Alice and Bob to set up a shared key in advance for communicating purposes, but this has some downsides:\n", "\n", "1. Setting up keys in advance can require a lot of work. If there are $n$ parties that each would like to communicate between each other then there are $\\binom{n}{2}$ keys (i.e., a quadratic number of keys in $n$) that need to be exchanged in advance.\n", "2. It may not be possible to set this up in advance, e.g., if Alice did not know Bob before wanting to contact him.\n", "\n", "It might seem like an impossible problem for Alice to securely send a message to someone that she's never previously communicated with—and someone who does not respond back in any way. However, we'll now see that the RSA cryptosystem makes this possible." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Public-key cryptography" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The idea behind public-key cryptography is to introduce an asymmetry between encoding and decoding. In symmetric encryption both encoding and decoding is done with the same shared key which has to be known by both parties that are communicating.\n", "\n", "In public-key cryptography there are separate keys for encoding and decoding. The encoding key is known as the _public key_ (because it can be publicly distributed) and the decoding key is known as the _private key_. The private key has to be kept secret since anyone who has access to it can decode encrypted messages.\n", "\n", "Using such a scheme a group of $n$ people only need to generate $2n$ keys in order to allow anyone to securely communicate with anyone else. Parties also do not have to communicate in advance; they can publish their public keys on a \"key server\" or by other means such as through their webpage.\n", "\n", "To communicate a secure message to someone you only need to look up their public key and then encode your message using that key. Only the holder of the private key associated with that public key is able to decode the message.\n", "\n", "Sounds great in theory, but how can it be implemented in practice?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Enter Rivest, Shamir, and Adleman" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Diffie and Hellman proposed the idea of public-key cryptography in 1976 in addition to their key distribution method. However, they were unable to determine a method for realizing such a scheme.\n", "\n", "In 1977, the three cryptographers Ronald Rivest, Adi Shamir, and Leonard Adleman tried many different methods for designing a public-key cryptosystem. Rivest recounts that sometimes they thought such a scheme was actually impossible to achieve.\n", "\n", "In April 1977, Rivest, Shamir, and Adleman spent Passover at a student's house, apparently drinking Manischewitz wine and leaving around midnight (or so the legend goes). Rivest couldn't sleep and developed the basics of what would later be known as the RSA cryptosystem after the three who had worked on developing a working public-key cryptosystem.\n", "\n", "Similar to the case of Diffie and Hellman's scheme, documents declassified by the British government in 1997 revealed that Clifford Cocks of the British intelligence agency GCHQ had discovered an equivalent system to the RSA method in 1973." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Outline of RSA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to realize public-key cryptography in practice we need a \"one-way function\"—a function that is easy to compute but hard to invert.\n", "\n", "The Diffie–Hellman scheme used the discrete logarithm problem and the RSA cryptosystem will use the factorization problem (given a number compute its prime factors). It is easy to multiply two prime numbers together but difficult to go in the other direction.\n", "\n", "The intuition behind RSA is that computations will be performed modulo a number $N$ that is the product of two prime numbers. Encoding a message $x$ will be done by performing a modular exponentiation $x^e\\pmod N$ for some specific integer $e$. Decoding a message will be done by computing the $e$th root of a number mod $N$ which is equivalent to performing a modular exponentation $x^d\\pmod N$ for some specific integer $d$.\n", "\n", "In other words, encryption is done via the process $x\\mapsto x^e\\bmod N$ and decryption is done via the process $x\\mapsto x^d\\bmod N$. Thus $e$ must be public and $d$ must be private." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Generation of public and private keys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following steps are done by Alice in order to create RSA public and private keys:\n", "\n", "1. Select two prime numbers $p$ and $q$ by some random process and compute $N=p\\cdot q$.\n", "2. Compute $\\varphi(N)=(p-1)(q-1)$. Euler's theorem tells us that $x^{\\varphi(N)}\\equiv1\\pmod{N}$ for all $x$ coprime to $N$.\n", "3. Select an $e>1$ that is coprime to $\\varphi(N)$. This choice can be random or fixed. The number $e=2^{16}+1=65537$ is typical and even $e=3$ works fine (though such a small choice is less secure in some settings).\n", "4. Compute $d:=e^{-1}\\bmod \\varphi(N)$ using the extended Euclidean algorithm. Note that the inverse must exist since $e$ is coprime to $\\varphi(N)$.\n", "5. Publish $N$ and $e$ as your public key. Keep $d$ as your private key. The numbers $p$, $q$, and $\\varphi(N)$ must not be published; they can be kept private or discarded at this point.\n", "\n", "As previously mentioned, a message $x$ is encoded to $x^e\\bmod N$ and an encoded message $y$ is decoded to $y^d\\bmod N$.\n", "\n", "Note that computing the inverse of $e$ mod $\\varphi(N)$ is easy (via the EEA) if $\\varphi(N)=(p-1)(q-1)$ is known. Alice does know $\\varphi(N)$ because she choose $p$ and $q$. However, an attacker has no known way of computing $\\varphi(N)$ without knowing the factorization of $N$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Why does this work?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fundamental theory that RSA relies on is Euler's theorem that $x^{\\varphi(N)}\\equiv1$ for $x$ coprime to $N$. Using this we can show that encryption and decryption are inverses.\n", "In other words, if you encrypt something and then decrypt it you end up with what you started with.\n", "\n", "In symbols, we want to show that ${(x^e)^d \\equiv x \\pmod{N}}$ for all $x$.\n", "\n", "Note that $d$ is the inverse of $e$ mod $\\varphi(N)$ so we have that $ed\\equiv1\\pmod{\\varphi(N)}$. Thus we have that $ed=1+k\\varphi(N)$ for some integer $k$.\n", "\n", "First suppose $x$ is coprime to $N$. Then we have\n", "\n", "$$ (x^e)^d \\equiv x^{ed} \\equiv x^{1+k\\varphi(N)} \\equiv x\\cdot(x^{\\varphi(N)})^k \\equiv x\\cdot 1^k \\equiv x \\pmod{N} $$\n", "\n", "where Euler's theorem was used to reduce $x^{\\varphi(N)}$.\n", "\n", "This is the main case, though to complete the proof we must also consider the rare case when $x$ and $N$ are not coprime. This happens when $x$ is a multiple of $p$ or $q$ (since the only prime divisors of $N$ are $p$ and $q$), although in practice this will never occur if $p$ and $q$ are large primes chosen randomly.\n", "\n", "Say $x$ is a multiple of $p$ and not a multiple of $q$. Then $x\\equiv 0\\pmod p$, so\n", "\n", "$$(x^e)^d\\equiv 0^d\\equiv 0\\equiv x\\pmod p.$$\n", "\n", "By Fermat's little theorem we have $x^{q-1}\\equiv 1\\pmod{q}$. Then\n", "\n", "$$ (x^e)^d \\equiv x^{ed} \\equiv x^{1+k\\varphi(N)} \\equiv x\\cdot(x^{q-1})^{k(p-1)} \\equiv x\\cdot 1^{k(p-1)} \\equiv x \\pmod{q} $$\n", "\n", "where Fermat's little theorem was used to reduce $x^{q-1}$. Then\n", "$(x^e)^d \\equiv x$ modulo $p$ and $q$ and the Chinese remainder theorem then implies $(x^e)^d \\equiv x \\pmod {p\\cdot q}$.\n", "\n", "Similar reasoning applies when $x$ is multiple of $q$ but not $p$. Incidentally, if $x$ is a multiple of $p$ and $q$ then $x\\equiv 0\\pmod{N}$ so the theorem still holds (but this case is not useful for encryption)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set up public and private keys\n", "while True:\n", " p = random_prime(10^30)\n", " q = random_prime(10^30)\n", " n = p*q\n", " phi = (p-1)*(q-1)\n", " \n", " # Ensure that phi(n) is coprime to 3\n", " if phi % 3 != 0:\n", " break\n", " \n", "e = 3\n", "d = e^(-1) % phi\n", "\n", "show(html(\"\\\\begin{align*}\"+\n", " \"p&={}\\\\\\\\\".format(p)+\n", " \"q&={}\\\\\\\\\".format(q)+\n", " \"n&={}\\\\\\\\\".format(n)+\n", " \"\\\\varphi(n)&={}\\\\\\\\\".format(phi)+\n", " \"e&={}\\\\\\\\\".format(e)+\n", " \"d&={}\".format(d)+\n", " \"\\\\end{align*}\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Message to send\n", "message = \"ANTIDISESTABLISHMENTARIANISM\"\n", "l = len(message)\n", "\n", "# Convert message to list of integers in the range 1-26\n", "A = [ord(message[i])-64 for i in range(l)]\n", "\n", "# Convert message to an integer\n", "x = sum(A[i]*27^i for i in range(l))\n", "\n", "# Encode codeword\n", "y = power_mod(x, e, n)\n", "\n", "# Decode encrypted codeword\n", "z = power_mod(y, d, n)\n", "\n", "show(html(\"\\\\begin{align*}\"+\n", " \"x&={}\\\\\\\\\".format(x)+\n", " \"y=x^e\\\\bmod n&={}\\\\\\\\\".format(y)+\n", " \"z=y^d\\\\bmod n&={}\".format(z)+\n", " \"\\\\end{align*}\"))\n", "\n", "print(\"The decrypted message is \" + \"\".join([chr(z.digits(base=27)[i]+64) for i in range(l)]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Warning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although this is the basic idea, RSA does have to be used carefully in practice as in some cases there are ways that an attacker can break the system.\n", "\n", "For example, the choice of $e=3$ is conveinient because it allows encoding to be done very quickly (just a cubing modulo $N$).\n", "\n", "However, this is insecure if the message $x$ to encrypt is small, namely $x