PLAINTEXT is the unencrypted data. It's called plainTEXT even if it's really binary data. The name is a throwback to the days when handwritten messages were encrypted manually.
CIPHERTEXT is the encrypted data. Ideally the ciphertext would be statistically indistinguishable from random noise no matter how regular the plaintext was. The "randomness" of the ciphertext is a measure of the quality of the encryption algorithm. Be aware: humans are generally lousy at evaluating "randomness." Just because it looks like gibberish to you doesn't mean it's random. You need to apply appropriate statistical tests.
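As a very rough illustration, one of the simplest such tests (the "monobit" test) can be sketched in Python using only the standard library. Passing this one test proves nothing by itself; it is only meant to show what "apply a statistical test" looks like in practice.

    import math

    def monobit_statistic(data: bytes) -> float:
        """Normalized bias of 1-bits; near 0 for balanced (random-looking) data."""
        ones = sum(bin(b).count("1") for b in data)
        total_bits = len(data) * 8
        return (ones - total_bits / 2) / math.sqrt(total_bits / 4)

    # For truly random data this statistic behaves like a standard normal value,
    # so a magnitude much larger than about 3 is suspicious.
    print(monobit_statistic(b"AAAA" * 1000))   # heavily biased: large magnitude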
When analyzing cryptographic systems one always adopts an ATTACKER MODEL where the attacker is assumed to have full knowledge of all algorithms and methods used. The attacker is assumed to not know any secret keys or similar secret data items.
When analyzing network protocols it is common to use the DOLEV-YAO ATTACKER MODEL. A D-Y attacker has total control of the network and can read/write anything anywhere on the network. However, a D-Y attacker has no access to the host systems on the network and no ability to defeat any of the encryption technology being used. It is understood that real attackers are not exactly like this, but the model is useful for evaluating the security of network communications.
SYMMETRIC ENCRYPTION algorithms use the same key for encryption as for decryption. When using such an algorithm for secure communication a major problem is: how do Alice and Bob share the key? You need a secure channel for the key material, and if you have such a channel why not just use it for the data itself?
Secure channels tend to be low bandwidth and intermittent (think about Alice and Bob meeting in a café to exchange keys). Thus, the value of encryption is that it lets you use a reliable, high bandwidth, and probably also very public channel for the bulk communication.
Block ciphers like DES or AES operate on a block of data at a time... never more or less. Typical block sizes are 64 bits or 128 bits but some algorithms use other sizes.
The key size is an important parameter. Small keys can be attacked by brute force: just try them all. This assumes you can recognize the plaintext when you see it (normally not a big problem, but sometimes it's an issue). Notice that brute force can be used against any encryption algorithm; if the key is too small it doesn't matter how clever the algorithm is.
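As a toy illustration (a single-byte XOR "cipher," not a real algorithm), a brute force search might look like the Python sketch below. The point is that the attacker must have some way to recognize plausible plaintext among the candidates.

    def xor_bytes(data: bytes, key: int) -> bytes:
        return bytes(b ^ key for b in data)

    # "Encrypt" with a deliberately tiny (8-bit) key space.
    ciphertext = xor_bytes(b"attack at dawn", 0x5A)

    # Brute force: try every possible key and keep candidates that look like text.
    for guess in range(256):
        candidate = xor_bytes(ciphertext, guess)
        if all(32 <= c < 127 for c in candidate):   # crude "printable ASCII" test
            print(guess, candidate)
    # A few false positives may print, but only one candidate reads as English.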
Good algorithms have keys long enough to make brute force "computationally infeasible." What that means exactly depends on several factors: how much money the attacker can spend, how much time the attacker has, and how valuable (and how long-lived) the protected data is.
The parameters above are interrelated. Spending more money will likely reduce the time needed to crack the encryption (more money buys faster hardware) but there is no point in spending more money than the data is worth. Even a weak algorithm is fine if the data it is protecting is relatively worthless or very short-lived (or both).
That said, strong encryption algorithms are readily available so there's no reason not to use one.
Some algorithms have mathematical weaknesses that make them easier to attack than brute force. However, this is not a black and white issue. Often such attacks are still quite difficult or even infeasible. If you are worried about mathematical weaknesses you can try increasing the key size. Often (not always) that makes such attacks harder. Thus, there are reasons for using very long keys even if such keys would be far beyond the reach of brute force.
Good encryption algorithms use the principles of "diffusion" and "confusion."
DIFFUSION means that changing one bit of plaintext will impact a large amount of ciphertext. I use "impact" instead of "change" because changing bits is actually very regular since a bit is either 1 or 0. With a good encryption algorithm changing one bit of plaintext "randomizes" a large amount of ciphertext (the entire block).
CONFUSION means the transformation implemented by the encryption is highly non-linear. If the transformation were linear it would be possible to represent each bit of ciphertext as a key dependent linear combination of the plaintext bits (using modulo 2 arithmetic). This isn't great because if that kind of relationship existed it would be relatively easy to algebraically invert it.
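Diffusion can be observed directly. The sketch below, which assumes the third-party Python 'cryptography' package is installed, flips one plaintext bit and counts how many of the 128 ciphertext bits of an AES block change; for a good cipher the count is typically near half.

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def aes_encrypt_block(key: bytes, block: bytes) -> bytes:
        enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
        return enc.update(block) + enc.finalize()

    key = os.urandom(16)                   # throwaway random key
    p1 = bytes(16)                         # an all-zero 128-bit block
    p2 = bytes([0x01]) + bytes(15)         # the same block with one bit flipped

    c1 = aes_encrypt_block(key, p1)
    c2 = aes_encrypt_block(key, p2)
    changed = sum(bin(a ^ b).count("1") for a, b in zip(c1, c2))
    print(f"{changed} of 128 ciphertext bits changed")   # typically close to 64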
Most encryption algorithms use multiple "rounds." They transform the key to produce multiple "subkeys" (also called "round keys"). A typical algorithm works by doing a basic operation on the input using one of the subkeys. That operation is then repeated multiple times (rounds) using the various subkeys. Generally the strength of the algorithm increases as the number of rounds increases. Many algorithms have been shown to be vulnerable in "reduced round variants" so it's important for the algorithm to have enough rounds. However, doing lots of rounds makes the algorithm slower.
For efficiency reasons you only want the minimum security you can get away with!
The ONE TIME PAD encryption algorithm entails XORing the message with a "key" (called the "pad") that is as long as the message itself. It is critical that a) the pad remain secret, b) the pad be used only once, and c) the pad be generated from a true source of random numbers. If these conditions are met it is theoretically impossible to crack the One Time Pad. The reason is that all possible messages are potential decryptions so the attacker has no way to know which one is the true message. This reasoning applies, surprisingly, even when encrypting a single bit.
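A minimal Python sketch of the idea, using the standard library 'secrets' module as the source of the pad:

    import secrets

    def otp(message: bytes, pad: bytes) -> bytes:
        assert len(pad) == len(message)            # the pad must be as long as the message
        return bytes(m ^ p for m, p in zip(message, pad))

    message = b"meet me at noon"
    pad = secrets.token_bytes(len(message))        # drawn from the OS random source
    ciphertext = otp(message, pad)
    assert otp(ciphertext, pad) == message         # XOR with the same pad decrypts
    # The pad must stay secret and must never be reused for another message.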
A SUBSTITUTION CIPHER entails just mapping each unit of plaintext to a different unit of ciphertext. Typically, a "unit" is a byte. If the unit is small the algorithm is very weak because it doesn't sufficiently disrupt the statistical properties of the plaintext. If the unit of substitution is larger the plaintext statistics tend to be flatter and the method is safer. For example, ECB mode of a normal block cipher is a kind of substitution over block-sized units. The One Time Pad could be regarded as a substitution where the entire message is exactly one unit (and thus no statistics can be obtained for analysis).
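A toy byte-level substitution cipher can be sketched in a few lines of Python. Notice how repeated plaintext produces repeated ciphertext, which is exactly the statistical leak described above.

    import secrets

    # The "key" is a random permutation of the 256 possible byte values.
    table = list(range(256))
    secrets.SystemRandom().shuffle(table)
    inverse = [0] * 256
    for plain, cipher in enumerate(table):
        inverse[cipher] = plain

    def substitute(data: bytes, mapping: list) -> bytes:
        return bytes(mapping[b] for b in data)

    ct = substitute(b"hello hello", table)
    assert substitute(ct, inverse) == b"hello hello"
    # Both occurrences of "hello" encrypt identically, so byte frequencies leak.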
An encryption algorithm can be regarded as a key-dependent permutation of all possible blocks. For example, for a given key the plaintext block 0x0000000000000000 might map to the ciphertext block 0x4F729AAB39382A4F, and every other plaintext block has a mapping to some other ciphertext block. The mapping is one-to-one and onto (a permutation) because a) every possible plaintext must be encryptable to something, and b) every possible ciphertext must be uniquely decryptable.
There are a huge number of possible permutations. For 64 bit blocks the number is (2^64)!. This is vastly larger than the number of possible keys, meaning that for any given algorithm the key only gives you access to a tiny fraction of possible block permutations. This is why it is normally no problem to recognize the plaintext when you do a brute force attack; there is probably only one key that produces anything resembling reasonable plaintext. In contrast, the One Time Pad "algorithm" gives you access to all the permutations making it impossible to distinguish the correct plaintext from irrelevant plaintext.
In addition to the various encryption algorithms there are also various encryption MODES. Each mode represents a different way to use an encryption algorithm, and they differ in their security, support for random read/write, and error propagation. Examples include electronic codebook (ECB), cipher block chaining (CBC), cipher feedback (CFB), output feedback (OFB), counter mode (CTR), etc.
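The difference between modes is easy to demonstrate. The sketch below (again assuming the third-party 'cryptography' package) shows ECB leaking structure: identical plaintext blocks encrypt to identical ciphertext blocks, something chaining or counter-based modes avoid.

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(16)
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    ct = enc.update(b"SIXTEEN BYTE BLK" * 2) + enc.finalize()   # two equal blocks
    print(ct[:16] == ct[16:])   # True: the repetition is visible in the ciphertext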
A HASH FUNCTION is a one-way function that takes a message of arbitrary size and computes a fixed size (usually smaller) value that represents the message. The function is one-way in that it is easy to compute the hash of a message but infeasible to find a message that has a given hash. This property is also sometimes called PRE-IMAGE RESISTANCE. Notice that because the message is large and the hash value is usually smaller it is normally the case that a large number of messages do, in fact, hash to the same value. Pre-image resistance doesn't mean there is no pre-image (in fact there is normally a huge number of pre-images), only that finding one is computationally infeasible. The output of a hash function is typically called a HASH VALUE, but it is also sometimes called a MESSAGE DIGEST.
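For example, using SHA-256 from the Python standard library, the digest is always 32 bytes regardless of the input size:

    import hashlib

    print(hashlib.sha256(b"").hexdigest())
    print(hashlib.sha256(b"A message of arbitrary size...").hexdigest())
    print(len(hashlib.sha256(b"x" * 1_000_000).digest()))   # always 32 bytes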
Hash functions used for security applications should also have two other properties.
COLLISION RESISTANCE. Given a message M it should be computationally infeasible to find another message M' such that H(M) = H(M') where H is the hash function.
STRONG COLLISION RESISTANCE. It should be computationally infeasible to find two distinct messages M and M' such that H(M) = H(M').
Strong collision resistance is a stronger property. The attacker is free to choose both messages. This is in contrast with collision resistance where one of the messages is fixed and the attacker must try to match it in the sense of finding another message with the same hash.
Strong collision resistance is important because of the BIRTHDAY ATTACK. In this attack the attacker prepares two messages; one desirable to the victim and one not. The attacker then identifies a number of independent but semantically insignificant changes to both documents (extra white space, sentence rewordings, etc.). The attacker computes a table of hash values made from the first message while ranging over every combination of insignificant change. Then the attacker computes the hash of the second document while trying every combination of insignificant change there as well. For each hash of the second document the attacker tries to match it against one of the hashes in the previously computed table. If a match is found the attacker will have succeeded in finding two documents with the same hash. This means that a digital signature of one document will verify correctly when applied to the other.
This attack can be surprisingly effective because each hash of the second document is compared against all the hashes of the first. Because of a statistical quirk called the "birthday surprise," the probability of finding a match is higher than one might expect. Specifically, if a hash function produces a 64-bit hash, it is only necessary to use 2^32 variations (the square root of 2^64) of each document to create a high chance of finding a match. For this reason hash values need to be about twice the size you might otherwise expect. A 128-bit hash value "only" entails about 2^64 work to break strong collision resistance. Note that this attack is independent of the hash algorithm.
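The birthday effect is easy to see experimentally by truncating a real hash to a deliberately small size. The Python sketch below searches for a collision in a 24-bit truncation of SHA-256 and typically succeeds after only a few thousand trials (roughly 2^12, the square root of 2^24).

    import hashlib

    def tiny_hash(msg: bytes) -> bytes:
        return hashlib.sha256(msg).digest()[:3]    # keep only 24 bits

    seen = {}
    i = 0
    while True:
        msg = f"variation {i}".encode()
        h = tiny_hash(msg)
        if h in seen:                              # two different messages, same hash
            print("collision:", seen[h], msg, h.hex())
            break
        seen[h] = msg
        i += 1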
A MESSAGE AUTHENTICATION CODE (MAC) is a hash value that was computed with the help of a secret key. Only people in possession of the key can make or verify MACs. A MAC thus provides an authentication and data integrity service. If the MAC checks the sender must have the right key (presumably only the legitimate sender has this key), and furthermore the message must not have been modified.
The difference between MACs and digital signatures (below) is that MACs use symmetric key cryptography whereas digital signatures use public key cryptography. The usual pros and cons apply: MAC computation is faster, but it requires the communicating parties to previously exchange secret key material.
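A minimal MAC example using HMAC-SHA-256 from the Python standard library; the key is assumed to have been shared in advance over a secure channel.

    import hashlib, hmac, secrets

    key = secrets.token_bytes(32)          # assumed to be shared in advance
    message = b"transfer $100 to Bob"
    tag = hmac.new(key, message, hashlib.sha256).digest()

    # The receiver recomputes the tag and compares in constant time.
    expected = hmac.new(key, message, hashlib.sha256).digest()
    print(hmac.compare_digest(tag, expected))   # True only if key and message match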
Some encryption modes, called Authenticated Encryption with Associated Data (AEAD) modes, allow the ciphertext to be authenticated (as with a MAC) and also authenticate some amount of additional plaintext data. The resulting AUTHENTICATION TAG, similar to a MAC, can only be verified using the same secret key that was used to make the tag. However, it shows that both the ciphertext and the additional plaintext data have not been modified.
Using an AEAD mode, of which there are several, is similar to doing a MAC computation on some ciphertext and some additional plaintext. However, the AEAD mode is likely faster than doing a separate encryption plus MAC computation.
The typical use-case of AEAD modes is to encrypt network traffic. For example, the payload of a TCP segment might need to be encrypted for confidentiality. But, we don't want a Dolev-Yao attacker to modify the ciphertext in an undetectable way. We also don't want a D-Y attacker to modify the segment/packet headers in an undetectable way. However, those headers can't normally be encrypted since they are processed by routers and other systems on the network.
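A sketch of AEAD usage with AES-GCM, assuming the third-party 'cryptography' package. The header stands in for packet headers that are sent in the clear but still authenticated; the nonce must never repeat under a given key.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)
    nonce = os.urandom(12)                          # must be unique per message

    header = b"src=10.0.0.1 dst=10.0.0.2"           # authenticated but not encrypted
    payload = b"secret application data"            # encrypted and authenticated

    ct = AESGCM(key).encrypt(nonce, payload, header)      # ciphertext plus tag
    assert AESGCM(key).decrypt(nonce, ct, header) == payload
    # decrypt() raises an exception if the ciphertext or the header was altered.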
In PUBLIC KEY CRYPTOGRAPHY each user has a private key and a corresponding, related public key. It is computationally feasible to generate the keys together but not feasible to compute the private key given knowledge of the public key. This allows the public key to be distributed freely. Anyone can use it to encrypt a message that only the owner of the private key can decrypt.
A major issue with public key cryptography is ensuring that you have the correct public key. An attacker can post a public key claiming to belong to someone else; any message encrypted with it can then be read by the attacker who actually has the corresponding private key. In real life this problem is addressed by "certifying" public keys... having them digitally signed by a "certificate authority" who is trusted to only endorse (sign) public keys that are truly controlled by the entity to which they are associated.
The key sizes of public key algorithms can't be compared with the key sizes of symmetric algorithms. Public key cryptography relies on "hard" mathematical problems and the keys need to be large enough to make solving the underlying problem infeasible. How large this is depends on the underlying problem.
The three main public key algorithms in use today are RSA (which relies on the difficulty of factoring huge numbers), ElGamal (which relies on the difficulty of computing discrete logarithms in large, finite fields), and Elliptic Curve Cryptography (which relies on the difficulty of computing discrete logarithms in groups defined by elliptic curves). All three of these algorithms are much slower than the symmetric key algorithms (typically hundreds of times slower). Also, the required key sizes are much larger. Notably, ECC methods are faster and use smaller keys than the other two approaches. Elliptic curve cryptography is thus very popular now.
As a result of the slowness of public key methods, most uses of public key cryptography entail one principal generating a random SESSION KEY for a suitable symmetric algorithm and then using the public key of the recipient to encrypt just the session key. The bulk data communicated is then encrypted with the session key. This uses the public key algorithm on a small object (the session key) and lets the large data set be handled by the much faster symmetric algorithm.
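A hybrid encryption sketch, assuming the third-party 'cryptography' package: the bulk data is protected with AES-GCM under a random session key, and only that small session key is encrypted with the recipient's RSA public key (OAEP padding).

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    recipient_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    recipient_public = recipient_private.public_key()

    # Encrypt the bulk data with a fresh random session key.
    session_key = AESGCM.generate_key(bit_length=128)
    nonce = os.urandom(12)
    bulk_ct = AESGCM(session_key).encrypt(nonce, b"a large message ...", None)

    # Encrypt only the small session key with the slow public key algorithm.
    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)
    wrapped_key = recipient_public.encrypt(session_key, oaep)

    # Recipient: unwrap the session key, then decrypt the bulk data.
    recovered = recipient_private.decrypt(wrapped_key, oaep)
    assert AESGCM(recovered).decrypt(nonce, bulk_ct, None) == b"a large message ..."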
However, this approach does provide three points of attack: 1) The public key algorithm could be attacked in an effort to recover the session key. 2) The symmetric key algorithm could be attacked in an effort to decrypt the data without the session key. 3) The random number generator used to create the session key could be attacked in an effort to predict the session key closely enough to make guessing it feasible (it is not necessary to predict the session key exactly). This last attack vector is often overlooked.
Most public key encryption systems also allow for the possibility of using the private key to make a DIGITAL SIGNATURE. In the case of the RSA algorithm the signature is made by just encrypting with the private key. Other public key algorithms use different approaches, but get the same effect.
The normal way to make a digital signature entails computing a hash of the document or message to be signed and then applying the digital signature algorithm to the hash. This avoids processing a potentially large message with the slow public key algorithm. It also allows the message itself to be sent or saved "in the clear" as would be appropriate for public information.
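A signing sketch with RSA-PSS, assuming the third-party 'cryptography' package; the library hashes the message internally, so only the small digest passes through the slow private key operation.

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    signer_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    signer_public = signer_private.public_key()

    message = b"a public document sent in the clear"
    pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                      salt_length=padding.PSS.MAX_LENGTH)
    signature = signer_private.sign(message, pss, hashes.SHA256())

    # verify() raises InvalidSignature if the message or the signature was altered.
    signer_public.verify(signature, message, pss, hashes.SHA256())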
Digital signatures provide three security services.
Data Integrity. If an attacker changes the message, he/she will not be able to make a new signature without the private key of the original signer. Assuming the hash function is good, the attacker can't change the message or produce another message with the same hash. This also prevents the attacker from attaching a signature on one message to a different message.
Authentication. Assuming the signer's private key remains secure, a digital signature shows that the signed document was in fact signed by the specified principal. It is important, however, that the entity verifying the signature be confident they have the true public key of the signer. Also note that a valid signature does not prove that the signer was the author/creator of the document.
Non-Repudiation. The signer can't later deny signing the document ("I never said that!") without also claiming that his/her private key has been compromised.
A CERTIFICATE is a digitally signed public key. Certificates typically also include other information. The most commonly used certificate format, X.509, is normally used to bind a public key, which is a large binary number, to the identity (or name) of the key's owner. Thus, X.509 certificates are often called "identity certificates." Other kinds of certificates also exist that bind public keys to other kinds of information. Certificates often contain additional information such as a validity interval (the time over which the certificate is considered valid) and information about the intended purpose of the public key contained inside the certificate.
The entity that signs public keys and thus creates certificates is called a CERTIFICATE AUTHORITY (CA). Depending on the application domain a CA might be a large company but in other cases it can also be a private individual.
To verify a certificate it is necessary to check the digital signature made by the CA. This requires having the public key of the CA. Typically, that is provided in another certificate signed by a "higher level" CA. Thus, it is common for certificates to be arranged in a CERTIFICATE CHAIN where the key used to sign one certificate is provided in the next certificate. The chain ends with a "self-signed" certificate made by the trusted or root authority. This approach allows for distributed CAs whereby a high level CA can certify the public keys of the lower level CAs that then certify users' keys.
A RANDOM NUMBER GENERATOR is an entity (hardware and/or software) that produces a stream of bits that have properties consistent with those of random data. In security applications the primary property of interest is unpredictability: No amount of knowledge of the previous output of the generator allows you to predict the next bit with probability greater than 0.5 (even with knowledge of the algorithms used by the generator). This property implies that the numbers are statistically random but the converse is not true: a generator that produces output that looks statistically random may still be predictable, especially if the inner workings of the generator are known.
For example, one popular (high speed) way to generate random looking data is to use a linear congruential generator. Such generators compute X_(n+1) = (a*X_n + b) mod c. For suitably chosen a, b, and c the output data stream can have good statistical properties. However, there is no security since knowing the output allows one to trivially compute the following output, assuming knowledge of a, b, and c. Even if a, b, and c are not known it is algebraically uncomplicated to solve for them given a sufficiently large sample of the generator's output.
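The predictability is easy to demonstrate. In the Python sketch below (with arbitrary example constants), anyone who knows a, b, c and a single output can reproduce the entire future stream.

    # Arbitrary example constants; nothing special about these values.
    a, b, c = 1103515245, 12345, 2**31

    def lcg(x: int) -> int:
        return (a * x + b) % c

    # The generator's owner produces a stream of "random" values from a secret seed.
    x = 42
    stream = []
    for _ in range(5):
        x = lcg(x)
        stream.append(x)

    # An observer who saw only the first output predicts all later outputs.
    y = stream[0]
    predicted = []
    for _ in range(4):
        y = lcg(y)
        predicted.append(y)
    assert predicted == stream[1:]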
A PSEUDO RANDOM NUMBER GENERATOR (PRNG) is a deterministic algorithm for computing random looking data. The linear congruential generator above is a PRNG.
All PRNGs have internal state that they update as they work. The initial state of the PRNG is called the "seed" and it is considered secret information (similar to a key). Since that state is finite, a PRNG must eventually revisit an earlier state, so all PRNGs produce periodic output. High quality PRNGs have very long periods, however. One could attempt to "brute force" a PRNG by trying every possible seed. Thus, to be secure the internal state of a PRNG must be large. Also, seeding a PRNG with the date/time is not secure because it is too easily guessed.
In security applications PRNGs must be seeded with random data from a "true" RNG. Such generators typically rely on some physical process to collect randomness. Ultimately these physical processes are random because of quantum mechanics and thus a true RNG gets its randomness from very fundamental physics. Yet care must still be taken in the design of a true RNG to avoid introducing biases that make their output more predictable than it should be.
True RNGs tend to produce random numbers slowly so the usual procedure is to use the true RNG to seed a cryptographically strong PRNG which is then used to generate the bulk random data. The PRNG can be periodically reseeded from the true RNG to frustrate analysis. This is conceptually similar, in some ways, to the way session keys are protected by public key cryptography and then used on the bulk data. PRNGs don't really generate any randomness at all. Instead, they effectively just amplify the bandwidth of the true RNG.
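In practice, operating systems already implement this layered design: they gather physical entropy and feed it into a cryptographically strong PRNG. In Python, for example, the standard library 'secrets' module (or os.urandom) exposes that generator and should be used for keys, nonces, and pads instead of the general purpose 'random' module.

    import secrets

    session_key = secrets.token_bytes(16)   # 128 bits of unpredictable key material
    nonce = secrets.token_bytes(12)
    print(session_key.hex(), nonce.hex())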
Last Revised: 2022-09-20
© Copyright 2022 by Peter C. Chapin <pchapin@vtc.edu>