Hyperlinked definitions and discussions of many cryptographic, mathematical, logic, statistics, and electronics terms used in cipher construction and analysis.
Generally used for power distribution because the changing current supports the use of transformers. Utilities can thus transport power at high voltage and low current, which minimizes "ohmic" or $I^2R$ losses. The high voltages are then reduced at power substations and again by pole transformers for delivery to the consumer.
One example is byte addition modulo 256, which simply adds two byte values, each in the range 0..255, and produces the remainder after division by 256, again a value in the byte range of 0..255. Subtraction is also an "additive" combiner.
Another example is bit-level exclusive-OR which is addition mod 2. A byte-level exclusive-OR is a polynomial addition.
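As a minimal sketch (Python, with hypothetical function names), here are the two additive combiners described above together with their inverses. The loop checks that each is reversible and balanced: with the confusion byte held fixed, every one of the 256 output values comes from exactly one data byte.

```python
def add_combine(data: int, key: int) -> int:
    """Byte addition mod 256: combine a data byte with a confusion byte."""
    return (data + key) % 256

def add_extract(combined: int, key: int) -> int:
    """Inverse of add_combine: subtraction mod 256 recovers the data byte."""
    return (combined - key) % 256

def xor_combine(data: int, key: int) -> int:
    """Byte exclusive-OR (mod 2 polynomial addition); it is its own inverse."""
    return data ^ key

for key in (0, 1, 0x5A, 255):
    # Balanced: for a fixed key byte, the outputs are a permutation of 0..255.
    assert sorted(add_combine(d, key) for d in range(256)) == list(range(256))
    assert sorted(xor_combine(d, key) for d in range(256)) == list(range(256))
    # Reversible: the inverse operation recovers every data byte.
    for d in range(256):
        assert add_extract(add_combine(d, key), key) == d
        assert xor_combine(xor_combine(d, key), key) == d

print(add_combine(200, 100))   # 300 mod 256 = 44
```

The same balance is what makes these combiners weak: the relationship between input and output is simple and linear, so knowing any two of data, key, and result gives the third.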
Knuth, D. 1981. The Art of Computer Programming, Vol. 2, Seminumerical Algorithms. 2nd ed. 26-31. Addison-Wesley: Reading, Massachusetts.
Marsaglia, G. and L. Tsay. 1985. Matrices and the Structure of Random Number Sequences. Linear Algebra and its Applications. 67: 147-156.
Advantages include:
In addition, a vast multiplicity of independent cycles has the
potential of confusing even a "quantum computer," should such a
thing become possible.
For Degree-$n$ Primitive, and Bit Width $w$
Total States: $2^{nw}$
Non-Init States: $2^{n(w-1)}$
Number of Cycles: $2^{(n-1)(w-1)}$
Length of Each Cycle: $(2^n - 1)\,2^{w-1}$
Period of LSB: $2^n - 1$
The binary addition of two bits with no carry input is just XOR, so the lsb of an Additive RNG has the usual maximal length period.
A degree-127 Additive RNG using 127 elements of 32 bits each has $2^{4064}$ unique states. Of these, $2^{3937}$ are disallowed by initialization (the lsb's are all "0") but this is just one unusable state out of $2^{127}$. There are still $2^{3906}$ cycles which each have almost $2^{158}$ steps. (The Cloak2 stream cipher uses an Additive RNG with 9689 elements of 32 bits, and so has $2^{310048}$ unique states. These are mainly distributed among $2^{300328}$ different cycles with almost $2^{9720}$ steps each.)
Note that any LFSR, including the Additive RNG, is very weak when used alone. But when steps are taken to hide the sequence (such as using a jitterizer and Dynamic Substitution combining) the result can have significant strength.
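A minimal Additive RNG sketch, assuming the toy lags (7, 3) from the primitive trinomial $x^7 + x^3 + 1$, chosen only so the cycle can be found by brute force (Knuth's well-known example uses lags 55 and 24). The recurrence is X[n] = (X[n-7] + X[n-3]) mod $2^w$. With $w = 1$ the addition is just XOR, which checks the "Period of LSB" row above.

```python
from collections import deque

LONG_LAG, SHORT_LAG = 7, 3  # assumed lags, from the primitive trinomial x^7 + x^3 + 1

def step(state: deque, w: int) -> int:
    """Advance once: X[n] = (X[n-7] + X[n-3]) mod 2**w; returns X[n].

    state holds the last LONG_LAG words, oldest first, so state[0] is
    X[n-7] and state[-SHORT_LAG] is X[n-3].
    """
    new = (state[0] + state[-SHORT_LAG]) % (1 << w)
    state.popleft()
    state.append(new)
    return new

def period(seed, w: int, limit: int = 1 << 20) -> int:
    """Count steps until the full state first returns to the seed state.

    The step is invertible (X[n-7] = X[n] - X[n-3]), so every state lies
    on a cycle and the seed state must recur.
    """
    start = tuple(seed)
    state = deque(seed)
    for n in range(1, limit + 1):
        step(state, w)
        if tuple(state) == start:
            return n
    raise RuntimeError("state did not repeat within limit")

seed = [1, 2, 3, 4, 5, 6, 7]           # 4-bit words; not all lsb's are "0"
lsb_seed = [x & 1 for x in seed]
print(period(lsb_seed, 1))             # 127 = 2**7 - 1, the maximal LFSR period
print(period(seed, 4))                 # full 4-bit state
```

For the full 4-bit state, the table above predicts a cycle length of $(2^7 - 1)\,2^{3} = 1016$; whatever the exact value, it must be a multiple of the 127-step lsb period, because the lsb's repeat whenever the whole state does.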
Assume the flat plane defined by two arbitrary unit vectors $e_1$, $e_2$ and a common origin $O$; this is a coordinate "frame." Assume a grid of lines parallel to each frame vector, separated by unit lengths (a "metric" which may differ for each vector). If the vectors happen to be perpendicular, we have a Cartesian coordinate system, but in any case we can locate any point on the plane by its position on the grid.
An affine transformation can change the origin, the angle between the vectors, and unit vector lengths. Shapes in the original frame thus become "pinched," "squashed" or "stretched" images under the affine transformation. This same sort of thing generalizes to higher degree expressions.
The Handbook of Mathematics says that if $e_1, e_2, e_3$ are linearly independent vectors, any vector $a$ can be expressed uniquely in the form $a = a_1 e_1 + a_2 e_2 + a_3 e_3$ where the $a_i$ are the affine coordinates. (p.518)
The VNR Concise Encyclopedia of Mathematics says "All transformations that lead to a uniquely soluble system of linear equations are called affine transformations." (p.534)
$a_n x_n + a_{n-1} x_{n-1} + \cdots + a_1 x_1 + a_0$, where the operations are mod 2: addition is Exclusive-OR, and multiplication is AND.
Note that all of the variables $x_i$ are to the first power only, and each coefficient $a_i$ simply enables or disables its associated variable. The result is a single Boolean value, but the constant term $a_0$ can produce either possible output polarity.
Here are all possible 3-variable affine Boolean functions (each of which may be inverted by complementing the constant term):
affine truth table
c           0 0 0 0 0 0 0 0
x0          0 1 0 1 0 1 0 1
x1          0 0 1 1 0 0 1 1
x1+x0       0 1 1 0 0 1 1 0
x2          0 0 0 0 1 1 1 1
x2+x0       0 1 0 1 1 0 1 0
x2+x1       0 0 1 1 1 1 0 0
x2+x1+x0    0 1 1 0 1 0 0 1
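The table can be regenerated mechanically. This sketch (hypothetical function name) treats $x_i$ as bit $i$ of the row index, as in the x0, x1, x2 rows above, and evaluates the constant plus the enabled variables with AND for multiplication and XOR for addition. Enumerating every coefficient choice and both constants gives all $2^{3+1} = 16$ affine functions: the 8 shown plus their complements.

```python
from itertools import product

def affine_truth_table(coeffs, c, nvars=3):
    """Truth table of c + a0*x0 + a1*x1 + ... computed mod 2.

    coeffs[i] is a_i, which enables or disables x_i; x_i is bit i of the
    row index.
    """
    rows = []
    for v in range(1 << nvars):
        bit = c
        for i, a in enumerate(coeffs):
            bit ^= a & ((v >> i) & 1)   # multiplication is AND, addition is XOR
        rows.append(bit)
    return rows

all_affine = {tuple(affine_truth_table(coeffs, c))
              for coeffs in product((0, 1), repeat=3) for c in (0, 1)}
print(len(all_affine))                   # 16 distinct functions
print(affine_truth_table((1, 1, 1), 0))  # the x2+x1+x0 row
```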
Transistors are analog amplifiers which are basically linear over a reasonable range and so require DC power. In contrast, Relays are classically mechanical devices with direct metal-to-metal moving connections, and so can handle generally higher power and AC current.
DEC HEX CTRL CMD DEC HEX CHAR DEC HEX CHAR DEC HEX CHAR
0 00 ^@ NUL 32 20 SPC 64 40 @ 96 60 `
1 01 ^A SOH 33 21 ! 65 41 A 97 61 a
2 02 ^B STX 34 22 " 66 42 B 98 62 b
3 03 ^C ETX 35 23 # 67 43 C 99 63 c
4 04 ^D EOT 36 24 $ 68 44 D 100 64 d
5 05 ^E ENQ 37 25 % 69 45 E 101 65 e
6 06 ^F ACK 38 26 & 70 46 F 102 66 f
7 07 ^G BEL 39 27 ' 71 47 G 103 67 g
8 08 ^H BS 40 28 ( 72 48 H 104 68 h
9 09 ^I HT 41 29 ) 73 49 I 105 69 i
10 0a ^J LF 42 2a * 74 4a J 106 6a j
11 0b ^K VT 43 2b + 75 4b K 107 6b k
12 0c ^L FF 44 2c , 76 4c L 108 6c l
13 0d ^M CR 45 2d - 77 4d M 109 6d m
14 0e ^N SO 46 2e . 78 4e N 110 6e n
15 0f ^O SI 47 2f / 79 4f O 111 6f o
16 10 ^P DLE 48 30 0 80 50 P 112 70 p
17 11 ^Q DC1 49 31 1 81 51 Q 113 71 q
18 12 ^R DC2 50 32 2 82 52 R 114 72 r
19 13 ^S DC3 51 33 3 83 53 S 115 73 s
20 14 ^T DC4 52 34 4 84 54 T 116 74 t
21 15 ^U NAK 53 35 5 85 55 U 117 75 u
22 16 ^V SYN 54 36 6 86 56 V 118 76 v
23 17 ^W ETB 55 37 7 87 57 W 119 77 w
24 18 ^X CAN 56 38 8 88 58 X 120 78 x
25 19 ^Y EM 57 39 9 89 59 Y 121 79 y
26 1a ^Z SUB 58 3a : 90 5a Z 122 7a z
27 1b ^[ ESC 59 3b ; 91 5b [ 123 7b {
28 1c ^\ FS 60 3c < 92 5c \ 124 7c |
29 1d ^] GS 61 3d = 93 5d ] 125 7d }
30 1e ^^ RS 62 3e > 94 5e ^ 126 7e ~
31 1f ^_ US 63 3f ? 95 5f _ 127 7f DEL
Also see: commutative and distributive.
Classically, attacks were neither named nor classified; there was just: "here is a cipher, and here is the attack." And while this gradually developed into named attacks, there is no overall attack taxonomy. Currently, attacks are often classified by the information available to the attacker or constraints on the attack, and then by strategies which use the available information. Not only ciphers, but also cryptographic hash functions can be attacked, generally with very different strategies.
We are to attack a cipher which enciphers plaintext into ciphertext or deciphers the opposite way, under control of a key. The available information necessarily constrains our attack strategies.
The goal of an attack is to reveal some unknown plaintext, or the key (which will reveal the plaintext). An attack which succeeds with less effort than a brute-force search we call a break. An "academic" ("theoretical," "certificational") break may involve impractically large amounts of data or resources, yet still be called a "break" if the attack would be easier than brute force. (It is thus possible for a "broken" cipher to be much stronger than a cipher with a short key.) Sometimes the attack strategy is thought to be obvious, given a particular informational constraint, and is not further classified.
Many attacks try to isolate unknown small components or aspects so they can be solved separately, a process known as divide and conquer. Also see: security.
For a known population, the number of repetitions expected at each level has long been understood to be a binomial expression. But if we are sampling in an attempt to establish the effective size of an unknown population, we have two problems:
Fortunately, there is an unexpected and apparently previously unknown combinatoric relationship between the population and the number of combinations of occurrences of repeated values. This allows us to convert any number of triples and higher n-reps to the number of 2-reps which have the same probability. So if we have a double, and then get another of the same value, we have a triple, which we can convert into three 2-reps. The total number of 2-reps from all repetitions (the augmented 2-reps value) is then used to predict population.
We can relate the number of samples $s$ to the population $N$ through the expected number of augmented doubles $E_{ad}$:
$$E_{ad}(N,s) = \frac{s(s-1)}{2N}.$$
This equation is exact, provided we interpret all the exact n-reps in terms of 2-reps. For example, a triple is interpreted as three doubles; the augmentation from 3-reps to 2-reps is $\binom{3}{2} = 3$. The augmented result is the sum of the contributions from all higher repetition levels:
$$ad = \sum_{i=2}^{n} \binom{i}{2}\, r[i],$$
where $ad$ is the number of augmented doubles, and $r[i]$ is the exact repetition count at the $i$-th level.
And this leads to an equation for predicting population:
$$N_{ad}(s, ad) = \frac{s(s-1)}{2\,ad}.$$
This predicts the population $N_{ad}$ based on a mean value of augmented doubles $ad$. Clearly, we expect the number of samples to be far larger than the number of augmented doubles, but an error in the augmented doubles $ad$ should produce a proportionally similar error in the predicted population $N_{ad}$. We typically develop $ad$ to high precision by averaging the results of many large trials.
However, since the trials should have approximately a simple Poisson distribution (which has only a single parameter), we could be a bit more clever and fit the results to the expected distribution, thus perhaps developing a bit more accuracy.
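A minimal simulation of the estimate, assuming a population we pretend not to know (the population size, sample count, trial count, and random seed are arbitrary choices for illustration). Counting $\binom{k}{2}$ for each value seen $k$ times is the same as summing $\binom{i}{2} r[i]$ over the repetition levels.

```python
import random
from collections import Counter

def augmented_doubles(samples):
    """Sum of C(k, 2) over the count k of each distinct sampled value.

    A value seen exactly k times is a k-rep and contributes C(k, 2) 2-reps,
    so this equals SUM over i of C(i, 2) * r[i].
    """
    return sum(k * (k - 1) // 2 for k in Counter(samples).values())

def estimate_population(s, mean_ad):
    """N_ad(s, ad) = s(s - 1) / (2 ad)."""
    return s * (s - 1) / (2 * mean_ad)

rng = random.Random(1)
N, s, trials = 10_000, 1_000, 200     # hypothetical experiment parameters
mean_ad = sum(augmented_doubles([rng.randrange(N) for _ in range(s)])
              for _ in range(trials)) / trials
print(s * (s - 1) / (2 * N))          # expected augmented doubles E_ad(N, s)
print(round(estimate_population(s, mean_ad)))   # should land close to N
```

For example, the samples 5, 5, 5, 7, 7, 9 hold one triple (3 augmented doubles) and one double (1 more), so `augmented_doubles` returns 4.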
Also see the article: Estimating Population from Repetitions in Accumulated Random Samples, and the Population Estimation Worksheets in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.
One form of message authentication computes a CRC hash across the plaintext data, and appends the CRC remainder (or result) to the plaintext data: this adds a computed redundancy to an arbitrary message. The CRC result is then enciphered along with the data. When the message is deciphered, if a second CRC operation produces the same result, the message can be assumed unchanged.
Note that a CRC is a fast, linear hash. Messages with particular CRC result values can be constructed rather easily. However, if the CRC is hidden behind strong ciphering, an Opponent is unlikely to be able to change the CRC value systematically or effectively. In particular, this means that the CRC value will need more protection than a simple exclusive-OR stream cipher or the exclusive-OR approach to handling short last blocks in a block cipher.
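To show why a CRC under plain exclusive-OR ciphering is not enough, here is a sketch using Python's `zlib.crc32`. The helpers `seal` and `unseal` are hypothetical, and the Opponent is assumed to know the original text at the position being changed. For a fixed message length, CRC-32 is affine over GF(2), so the change in the CRC depends only on the change made to the message; the Opponent can therefore fix up the enciphered CRC without knowing the keystream.

```python
import os
import zlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive-OR of two equal-length byte strings."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))

def seal(msg: bytes, keystream: bytes) -> bytes:
    """Append a CRC-32 and XOR everything with a keystream (the weak case)."""
    body = msg + zlib.crc32(msg).to_bytes(4, "big")
    return xor_bytes(body, keystream[:len(body)])

def unseal(ct: bytes, keystream: bytes):
    """Undo the XOR; report the message and whether its CRC checks."""
    body = xor_bytes(ct, keystream[:len(ct)])
    msg, tag = body[:-4], body[-4:]
    return msg, zlib.crc32(msg) == int.from_bytes(tag, "big")

keystream = os.urandom(64)                 # unknown to the Opponent
ct = seal(b"PAY ALICE $0100", keystream)

# The Opponent chooses a change to the text, but never sees the keystream.
delta = xor_bytes(b"PAY ALICE $0100", b"PAY ALICE $9100")
# The CRC changes by crc32(delta) ^ crc32(zeros of the same length).
crc_delta = zlib.crc32(delta) ^ zlib.crc32(bytes(len(delta)))
tampered = xor_bytes(ct, delta + crc_delta.to_bytes(4, "big"))
print(unseal(tampered, keystream))         # altered message, yet the CRC checks
```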
A similar approach to message authentication uses a nonlinear cryptographic hash function. These also add a computed redundancy to the message, but generally require significantly more computation than a CRC. It is thought to be exceedingly difficult to construct messages with a particular cryptographic hash result, so the hash result perhaps need not be hidden by encryption.
One form of cryptographic hash is DES CBC mode: using a key different from that used for encryption, the final block of ciphertext is the hash of the message. This obviously doubles the computation when both encryption and authentication are needed. And since any cryptographic hash is vulnerable to birthday attacks, the small 64-bit block size implies that we should be able to find two different messages with the same hash value by constructing and hashing "only" about $2^{32}$ different messages.
Another approach to message authentication is to use an authenticating block cipher; this is often a block cipher which has a large block, with some "extra data" inserted in an "authentication field" as part of the plaintext before enciphering each block. The "extra data" can be some transformation of the key, the plaintext, and/or a sequence number. This essentially creates a homophonic block cipher: If we know the key, many different ciphertexts will produce the same plaintext field, but only one of those will have the correct authentication field.
The usual approach to authentication in a public key cipher is to encipher with the private key. The resulting ciphertext can then be deciphered by the public key, which anyone can know. Since even the wrong key will produce a "deciphered" result, it is also necessary to identify the resulting plaintext as a valid message; in general this will also require redundancy in the form of a hash value in the plaintext. The process provides no secrecy, but only a person with access to the private key could have enciphered the message.
The classical approach to user authentication is a password; this is "something you know." One can also make use of "something you have" (such as a secure ID card), or "something you are" (biometrics).
The classic problem with passwords is that they must be remembered by ordinary people, and so carry a limited amount of uniqueness. Easy-to-remember passwords are often common language phrases, and so often fall to a dictionary attack. More modern approaches involve using a Diffie-Hellman key exchange, plus the password, thus minimizing exposure to a dictionary attack. This does require a program on the user end, however.
In secret key ciphers, key authentication is inherent in secure key distribution.
In public key ciphers, public keys are exposed and often delivered insecurely. But someone who uses the wrong key may unknowingly have "secure" communications with an Opponent, as in a man-in-the-middle attack. It is thus absolutely crucial that public keys be authenticated or certified as a separate process. Normally this implies the need for a Certification Authority or CA.
"As the input moves through successive layers the pattern of 1's generated is amplified and results in an unpredictable avalanche. In the end the final output will have, on average, half 0's and half 1's . . . ." [p.22]
Feistel, H. 1973. Cryptography and Computer Privacy. Scientific American. 228(5): 15-23.
Also see mixing, diffusion, overall diffusion, strict avalanche criterion, complete, S-box, and the bit changes section of the Ciphers By Ritter / JavaScript computation pages.
"For a given transformation to exhibit the avalanche effect, an average of one half of the output bits should change whenever a single input bit is complemented." [p.523]
Webster, A. and S. Tavares. 1985. On the Design of S-Boxes. Advances in Cryptology -- CRYPTO '85. 523-534.
Also see the bit changes section of the Ciphers By Ritter / JavaScript computation pages.
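The avalanche criterion can be measured by flipping each input bit in turn and counting the changed output bits. This sketch uses SHA-256 purely as a convenient stand-in transformation, because Python's `hashlib` provides it; it is not one of the ciphers discussed here, and `bit_changes` is a hypothetical helper name.

```python
import hashlib

def sha256_digest(block: bytes) -> bytes:
    """Stand-in transformation: SHA-256 from the standard library (256-bit output)."""
    return hashlib.sha256(block).digest()

def bit_changes(f, block: bytes):
    """For each single-bit flip of block, count how many output bits of f change."""
    base = int.from_bytes(f(block), "big")
    counts = []
    for i in range(len(block) * 8):
        flipped = bytearray(block)
        flipped[i // 8] ^= 1 << (i % 8)           # complement one input bit
        out = int.from_bytes(f(bytes(flipped)), "big")
        counts.append(bin(base ^ out).count("1"))  # number of output bits changed
    return counts

counts = bit_changes(sha256_digest, bytes(range(32)))
print(sum(counts) / len(counts))   # should be near 128, half of the 256 output bits
```

A good result shows the mean near half the output width, with individual counts spread roughly binomially around it.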
Back Door
"A function is balanced if, when all input vectors are equally likely, then all output vectors are equally likely."
Lloyd, S. 1990. Properties of binary functions. Advances in Cryptology -- EUROCRYPT '90. 124-139.
There is some desire to generalize this definition to describe multiple-input functions. (Is a function "balanced" if, for one value on the first input, all output values can be produced, but for another value on the first input, only some output values are possible?) Presumably a two-input balanced function would be balanced for either input fixed at any value, which would essentially be a Latin square or a Latin square combiner.
A Balanced Block Mixer is an m-input-port m-output-port mechanism with various properties:
If we have a two port mixer, with input ports labeled A and B, output ports labeled X and Y, and some irreducible mod 2 polynomial p of degree appropriate to the port size, a Balanced Block Mixer is formed by the equations:
X = 3A + 2B (mod 2)(mod p),
Y = 2A + 3B (mod 2)(mod p).
This particular BBM is a self-inverse or involution, and so can be used without change whether enciphering or deciphering. One possible value for p for mixing 8-bit values is 100011011.
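A sketch of this mixer for bytes, using mod 2 polynomial multiplication reduced mod p = 100011011 (hex 11B, the same polynomial AES uses). The function names are hypothetical. The check confirms the self-inverse property: over GF(2), the matrix [[3, 2], [2, 3]] squared is the identity, since 3·3 + 2·2 = 5 + 4 = 1 and 3·2 + 2·3 = 0.

```python
P = 0b100011011  # the example polynomial p for mixing 8-bit values

def gf_mul(a: int, b: int, poly: int = P) -> int:
    """Multiply two bytes as mod 2 polynomials, reducing mod poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # polynomial addition is XOR
        b >>= 1
        a <<= 1
        if a & 0x100:            # degree reached 8: subtract (XOR) the modulus
            a ^= poly
    return result

def bbm(a: int, b: int):
    """X = 3A + 2B, Y = 2A + 3B, in mod 2 polynomial arithmetic mod p."""
    x = gf_mul(3, a) ^ gf_mul(2, b)
    y = gf_mul(2, a) ^ gf_mul(3, b)
    return x, y

# Self-inverse: mixing twice returns the original pair, for all 65536 pairs.
assert all(bbm(*bbm(a, b)) == (a, b) for a in range(256) for b in range(256))
# Balanced: with A fixed, X takes every byte value exactly once as B varies.
assert sorted(bbm(7, b)[0] for b in range(256)) == list(range(256))
print(bbm(0x12, 0x34))
```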
Balanced Block Mixing functions probably should be thought of as orthogonal Latin squares. For example, here is a tiny nonlinear "2-bit" BBM:
3 1 2 0     0 3 2 1     30 13 22 01
0 2 1 3     2 1 0 3  =  02 21 10 33
1 3 0 2     1 2 3 0     11 32 03 20
2 0 3 1     3 0 1 2     23 00 31 12
Suppose we wish to mix (1,3); 1 selects the second row up in both squares, and 3 selects the rightmost column, thus selecting (2,0) as the output. Since there is only one occurrence of (2,0) among all entry pairs, this discrete mixing function is reversible, as well as being balanced on both inputs.
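The tiny 2-bit BBM can be checked directly. This sketch stores the two squares as printed (top row first), so, following the text, input a selects a row counted from the bottom and input b selects a column from the left. It confirms that each square is Latin (so each output is balanced on either input) and that the 16 output pairs are all distinct (the squares are orthogonal, so the mixing is reversible).

```python
SQUARE_X = [[3, 1, 2, 0],
            [0, 2, 1, 3],
            [1, 3, 0, 2],
            [2, 0, 3, 1]]
SQUARE_Y = [[0, 3, 2, 1],
            [2, 1, 0, 3],
            [1, 2, 3, 0],
            [3, 0, 1, 2]]

def mix(a: int, b: int):
    """Row a counted from the bottom, column b from the left."""
    row = 3 - a
    return SQUARE_X[row][b], SQUARE_Y[row][b]

def is_latin(sq) -> bool:
    """Every row and every column holds each symbol exactly once."""
    n = len(sq)
    rows_ok = all(sorted(r) == list(range(n)) for r in sq)
    cols_ok = all(sorted(col) == list(range(n)) for col in zip(*sq))
    return rows_ok and cols_ok

# Orthogonality gives a unique inverse for every output pair.
unmix = {mix(a, b): (a, b) for a in range(4) for b in range(4)}
print(mix(1, 3), unmix[mix(1, 3)])           # (2, 0) (1, 3)
print(is_latin(SQUARE_X), is_latin(SQUARE_Y), len(unmix) == 16)
```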
Cryptographic advantages of balanced block mixing include the fact that each output is always balanced with respect to either input, and that no information is lost in the mixing. This allows us to use balanced block mixing as the "butterfly" operations in a fast Walsh-Hadamard transform or the well-known FFT. By using the mixing patterns of these transforms, we can mix $2^n$ elements such that each input is guaranteed to affect each and every output in a balanced way. And if we use keying to generate the tables, we can have a way to mix huge blocks in small nonlinear mixing tables with overall mixing guarantees.
Also see Mixing Cipher, Dynamic Substitution Combiner, Variable Size Block Cipher, and the Active Balanced Block Mixing in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.
In a statically-balanced combiner, any particular result value can be produced by any value on one input, simply by selecting some appropriate value for the other input. In this way, knowledge of only the output value provides no information -- not even statistical information -- about either input.
The common examples of cryptographic combiner, including byte exclusive-OR (mod 2 polynomial addition), byte addition (integer addition mod 256), or other "additive" combining, are perfectly balanced. Unfortunately, these simple combiners are also very weak, being inherently linear and without internal state.
A Latin square combiner is an example of a statically-balanced reversible nonlinear combiner with massive internal state. A Dynamic Substitution Combiner is an example of a dynamically or statistically-balanced reversible nonlinear combiner with substantial internal state.
0 1 2 3 4 5 6 7 8 9 a b c d e f
0 A B C D E F G H I J K L M N O P
1 Q R S T U V W X Y Z a b c d e f
2 g h i j k l m n o p q r s t u v
3 w x y z 0 1 2 3 4 5 6 7 8 9 + /
use "=" for padding
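The table maps each 6-bit value (row digit, then column digit, in hex) to one character. As a sketch, a small encoder built from the table alone (hypothetical function name) can be checked against Python's standard `base64` module: every 3 bytes (24 bits) become four 6-bit indexes, and a short final group is completed with "=".

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOP" "QRSTUVWXYZabcdef"
            "ghijklmnopqrstuv" "wxyz0123456789+/")   # table rows 0..3

def b64_encode(data: bytes) -> str:
    """Encode 3 bytes at a time as four table characters, padding with '='."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = int.from_bytes(chunk + bytes(3 - len(chunk)), "big")  # zero-fill to 24 bits
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        keep = len(chunk) + 1            # 1 byte -> 2 chars, 2 -> 3, 3 -> 4
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

for sample in (b"", b"M", b"Ma", b"Man", b"Ciphers"):
    assert b64_encode(sample) == base64.b64encode(sample).decode("ascii")
print(b64_encode(b"Man"))                # TWFu
```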
We can do FWT's in "the bottom panel" at the end of Active Boolean Function Nonlinearity Measurement in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.
Here is every bent sequence of length 4, first in {0,1} notation, then in {1,-1} notation, with their FWT results:
bent {0,1}    FWT            bent {1,-1}    FWT
0 0 0 1        1 -1 -1  1     1  1  1 -1     2  2  2 -2
0 0 1 0        1  1 -1 -1     1  1 -1  1     2 -2  2  2
0 1 0 0        1 -1  1 -1     1 -1  1  1     2  2 -2  2
1 0 0 0        1  1  1  1    -1  1  1  1     2 -2 -2 -2
1 1 1 0        3  1  1 -1    -1 -1 -1  1    -2 -2 -2  2
1 1 0 1        3 -1  1  1    -1 -1  1 -1    -2  2 -2  2
1 0 1 1        3  1 -1  1    -1  1 -1 -1    -2 -2  2 -2
0 1 1 1        3 -1 -1 -1     1 -1 -1 -1    -2  2  2  2
These sequences, like all true bent sequences, are not
balanced, and the zeroth element of the
{0,1} FWT is the number of 1's in the sequence.
Here are some bent sequences of length 16:
bent {0,1} 0 1 0 0 0 1 0 0 1 1 0 1 0 0 1 0
FWT 6,-2,2,-2,2,-2,2,2,-2,-2,2,-2,-2,2,-2,-2
bent {1,-1} 1 -1 1 1 1 -1 1 1 -1 -1 1 -1 1 1 -1 1
FWT 4,4,-4,4,-4,4,-4,-4,4,4,-4,4,4,-4,4,4
bent {0,1} 0 0 1 0 0 1 0 0 1 0 0 0 1 1 1 0
FWT 6,2,2,-2,-2,2,-2,2,-2,-2,-2,-2,2,2,-2,-2
bent {1,-1} 1 1 -1 1 1 -1 1 1 -1 1 1 1 -1 -1 -1 1
FWT 4,-4,-4,4,4,-4,4,-4,4,4,4,4,-4,-4,4,4
Bent sequences are said to have the highest possible uniform nonlinearity. But, to put this in perspective, recall that we expect a random sequence of 16 bits to have 8 bits different from any particular sequence, linear or otherwise. That is also the maximum possible nonlinearity, and here we actually get a nonlinearity of 6.
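These FWT results and the nonlinearity of 6 can be reproduced with a short sketch (hypothetical function names). Each {1,-1} FWT value F[k] is agreements minus disagreements with the k-th linear function, so the distance to that function is (L − F[k])/2 and to its complement (L + F[k])/2; the nonlinearity is therefore (L − max|F|)/2 for a sequence of length L.

```python
def fwt(seq):
    """Unnormalized fast Walsh-Hadamard transform (length a power of 2)."""
    a = list(seq)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]  # butterfly
        h *= 2
    return a

def to_pm(bits):
    """Map {0,1} notation to {1,-1} notation: 0 -> 1, 1 -> -1."""
    return [1 - 2 * b for b in bits]

def nonlinearity(bits):
    """Distance to the closest affine sequence: (L - max|F|) / 2."""
    spectrum = fwt(to_pm(bits))
    return (len(bits) - max(abs(v) for v in spectrum)) // 2

seq16 = [0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
print(fwt(seq16))          # zeroth element is the number of 1's
print(fwt(to_pm(seq16)))   # every value is +4 or -4: the sequence is bent
print(nonlinearity(seq16)) # 6
```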
There are various more or less complex constructions for these sequences. In most cryptographic uses, bent sequences are modified slightly to achieve balance.
Bernoulli trials have a Binomial distribution.
Possibly also the confusing counterpart to unary when describing the number of inputs or arguments to a function, but dyadic is almost certainly a better choice.
$$P(k,n,p) = \binom{n}{k}\, p^k (1-p)^{n-k}$$
This ideal distribution is produced by evaluating the probability function for all possible k, from 0 to n.
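A direct sketch of the ideal distribution, evaluating the probability function for every k from 0 to n with `math.comb` (Python 3.8 or later); the function names are hypothetical. With n = 8 and p = 0.5, each of the 256 outcomes is equally likely, so scaling by 256 recovers the binomial coefficients.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k, n, p) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_distribution(n: int, p: float):
    """The ideal distribution: the probability for every k from 0 to n."""
    return [binomial_pmf(k, n, p) for k in range(n + 1)]

dist = binomial_distribution(8, 0.5)
print([round(x * 256) for x in dist])  # [1, 8, 28, 56, 70, 56, 28, 8, 1]
print(sum(dist))                       # the probabilities sum to 1
```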
If we have an experiment which we think should produce a binomial distribution, and then repeatedly and systematically find very improbable test values, we may choose to reject the null hypothesis that the experimental distribution is in fact binomial.
Also see the binomial section of the Ciphers By Ritter / JavaScript computation pages.
Also see: birthday paradox.
The "paradox" is resolved by noting that we have a 1/365 chance of success for each possible pairing of students, and there are 253 possible pairs or combinations of 23 things taken 2 at a time. (To count the number of pairs, we can choose any of the 23 students as part of the pair, then any of the 22 remaining students as the other part. But this counts each pair twice, so we have $23 \times 22 / 2 = 253$ pairs.)
We can compute the overall probability of success from the probability of failure, treating the pairs as approximately independent: $1 - (364/365)^{253} \approx 0.50$.
We can relate the probability of finding at least one "double" of some birthday (Pd) to the expected number of doubles (Ed) as:
$P_d = 1 - e^{-E_d}$, so $E_d = -\ln(1 - P_d)$, and $365 \cdot (-\ln 0.5) = 365 \times 0.693 \approx 253$.
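As a numerical check, this sketch (hypothetical function names) compares the exact probability of a shared birthday among 23 people with the two pair-based approximations above: treating the 253 pairs as independent, and using the expected number of doubles $E_d$.

```python
from math import comb, exp, log, prod

def exact_birthday(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday (exact)."""
    return 1 - prod((days - i) / days for i in range(people))

def pairs_independent(people: int, days: int = 365) -> float:
    """Treat each of the C(people, 2) pairs as an independent 1/days chance."""
    return 1 - (1 - 1 / days) ** comb(people, 2)

def from_expected_doubles(people: int, days: int = 365) -> float:
    """P_d = 1 - e**(-E_d), with E_d = C(people, 2) / days expected doubles."""
    return 1 - exp(-comb(people, 2) / days)

print(comb(23, 2))                                   # 253 pairs
for f in (exact_birthday, pairs_independent, from_expected_doubles):
    print(f.__name__, round(f(23), 4))
print(round(365 * -log(0.5)))                        # pairs needed for P_d = 0.5
```

All three land near one half; the exact value is slightly higher because the pairs are not truly independent.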
Also see: Estimating Population from Repetitions in Accumulated Random Samples, my "birthday" article.
A 64-bit block supports $2^{64}$ or about $1.8 \times 10^{19}$ different block values.
It is not normally possible to block-cipher just a single bit or a single byte of a block. An arbitrary stream of data can always be partitioned into one or more fixed-size blocks, but it is likely that at least one block will not be completely filled. Using fixed-size blocks generally means that the associated system must support data expansion in enciphering, if only by one block. Handling even minimal data expansion may be difficult in some systems.
A block cipher is a transformation between plaintext block values and ciphertext block values, and is thus an emulated simple substitution on huge block-wide values. Within a particular block size, both plaintext and ciphertext have the same set of possible values, and when the ciphertext values have the same ordering as the plaintext, ciphering is obviously ineffective. So effective ciphering depends upon re-arranging the ciphertext values from the plaintext ordering, and this is a permutation of the plaintext values. A block cipher is keyed by constructing a particular permutation of ciphertext values.
In an ideal block cipher, changing even a single bit of the input block will change all bits of the ciphertext result, each with independent probability 0.5. This means that about half of the bits in the output will change for any different input block, even for differences of just one bit. This is overall diffusion and is present in a block cipher, but not in a stream cipher. Data diffusion is a simple consequence of the keyed invertible simple substitution nature of the ideal block cipher.
Improper diffusion of data throughout a block cipher can have serious strength implications. One of the functions of data diffusion is to hide the different effects of different internal components. If these effects are not in fact hidden, it may be possible to attack each component separately, and break the whole cipher fairly easily.
A large message can be ciphered by partitioning the plaintext into blocks of a size which can be ciphered. This essentially creates a stream meta-cipher which repeatedly uses the same block cipher transformation. Of course, it is also possible to re-key the block cipher for each and every block ciphered, but this is usually expensive in terms of computation and normally unnecessary.
A message of arbitrary size can always be partitioned into some number of whole blocks, with possibly some space remaining in the final block. Since partial blocks cannot be ciphered, some random padding can be introduced to fill out the last block, and this naturally expands the ciphertext. In this case it may also be necessary to introduce some sort of structure which will indicate the number of valid bytes in the last block.
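One possible layout for random padding with a length indicator, as a minimal sketch: a hypothetical 16-byte block, random fill, and a final byte recording how many bytes were added. Because the count byte must always be present, a message that already fills its last block gains a whole extra block, which is exactly the data expansion described above.

```python
import os

BLOCK = 16  # hypothetical block size in bytes

def pad(data: bytes) -> bytes:
    """Fill out the last block with random bytes; the final byte is the pad count."""
    n = BLOCK - len(data) % BLOCK          # 1..BLOCK, so there is room for the count
    return data + os.urandom(n - 1) + bytes([n])

def unpad(padded: bytes) -> bytes:
    """Strip the padding, checking that the count is plausible."""
    if not padded or len(padded) % BLOCK:
        raise ValueError("not a whole number of blocks")
    n = padded[-1]
    if not 1 <= n <= BLOCK:
        raise ValueError("bad pad count")
    return padded[:-n]

for size in (0, 5, 15, 16, 17):
    padded = pad(b"x" * size)
    assert len(padded) % BLOCK == 0 and unpad(padded) == b"x" * size
print(len(pad(b"x" * 16)))   # 32: a full final block still costs an extra block
```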
Proposals for using a block cipher supposedly without data expansion may involve creating a tiny stream cipher for the last block. One scheme is to re-encipher the ciphertext of the preceding block, and use the result as the confusion sequence. Of course, the cipher designer still needs to address the situation of files which are so short that they have no preceding block. Because the one-block version is in fact a stream cipher, we must be very careful to never re-use a confusion sequence. But when we only have one block, there is no prior block to change as a result of the data. In this case, ciphering several very short files could expose those files quickly. Furthermore, it is dangerous to encipher a CRC value in such a block, because exclusive-OR enciphering is transparent to the field of mod 2 polynomials in which the CRC operates. Doing this could allow an Opponent to adjust the message CRC in a known way, thus avoiding authentication exposure.
Another proposal for eliminating data expansion consists of ciphering blocks until the last short block, then re-positioning the ciphering window to end at the last of the data, thus re-ciphering part of the prior block. This is a form of chaining and establishes a sequentiality requirement which requires that the last block be deciphered before the next-to-the-last block. Or we can make enciphering inconvenient and deciphering easy, but one way will be a problem. And this approach cannot handle very short messages: its minimum size is one block. Yet any general-purpose ciphering routine will encounter short messages. Even worse, if we have a short message, we still need to somehow indicate the correct length of the message, and this must expand the message, as we saw before. Thus, overall, this seems a somewhat dubious technique.
On the other hand, it does show a way to chain blocks for authentication in a large-block cipher: We start out by enciphering the data in the first block. Then we position the next ciphering to start inside the ciphertext of the previous block. Of course this would mean that we would have to decipher the message in reverse order, but it would also propagate any ciphertext changes through the end of the message. So if we add an authentication field at the end of the message (a keyed value known on both ends), and that value is recovered upon deciphering (this will be the first block deciphered) we can authenticate the whole message. But we still need to handle the last block padding problem and possibly also the short message problem.
Ciphering raw plaintext data can be dangerous when the cipher has a small block size. Language plaintext has a strong, biased distribution of symbols and ciphering raw plaintext would effectively reduce the number of possible plaintext blocks. Worse, some plaintexts would be vastly more probable than others, and if some known plaintext were available, the most-frequent blocks might already be known. In this way, small blocks can be vulnerable to classic codebook attacks which build up the ciphertext equivalents for many of the plaintext phrases. This sort of attack confronts a particular block size, and for these attacks Triple-DES is no stronger than simple DES, because they both have the same block size.
The usual way of avoiding these problems is to randomize the plaintext block with an operating mode such as CBC. This can ensure that the plaintext data which is actually ciphered is evenly distributed across all possible block values. However, this also requires an IV which thus expands the ciphertext.
Another approach is to apply data compression to the plaintext before enciphering. If this is to be used instead of plaintext randomization, the designer must be very careful that the data compression does not contain regular features which could be exploited by The Opponents.
An alternate approach is to use blocks of sufficient size for them to be expected to have a substantial amount of uniqueness or "entropy." If we expect plaintext to have about one bit of entropy per byte of text, we might want a block size of at least 64 bytes before we stop worrying about an uneven distribution of plaintext blocks. This is now a practical block size.
Typically computed by using a fast Walsh-Hadamard transform on the Boolean-valued truth table of the function. This produces the unexpected distance to every possible affine Boolean function (of the given length). Scanning those results for the maximum value implies the minimum distance to some particular affine sequence.
Especially useful in S-box analysis, where the nonlinearity for the table is often taken to be the minimum of the nonlinearity values computed for each output bit.
Also see the Active Boolean Function Nonlinearity Measurement in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.
A cipher is "broken" when the information in a message can be extracted without the key, or when the key itself can be recovered. The strength of a cipher can be considered to be the minimum effort required for a break, by any possible attack. A break is particularly significant when the work involved need not be repeated on every message.
The use of the term "break" can be misleading when an impractical amount of work is required to achieve the break. This case might be better described as a "theoretical" or "certificational" weakness.
Recognizing plaintext may or may not be easy. Even when the key length of a cipher is sufficient to prevent brute force attack, that key will be far too small to produce every possible plaintext from a given ciphertext (see perfect secrecy). Combined with the fact that language is redundant, this means that very few of the decipherings will be words in proper form. Of course, if the plaintext is not language, but is instead computer code, compressed text, or even ciphertext from another cipher, recognizing a correct deciphering can be difficult.
Brute force is the obvious way to attack a cipher, and the way any cipher can be attacked, so ciphers are designed to have a large enough keyspace to make this much too expensive to use in practice. Normally, the design strength of a cipher is based on the cost of a brute-force attack.
Capacitor
Typically, two conductive "plates" or metal foils separated by a thin insulator, such as air, paper, or ceramic. An electron charge on one plate attracts the opposite charge on the other plate, thus "storing" charge. A capacitor can be used to collect a small current over a long time, and then release a high current for a short time, as used in a camera strobe or "flash."
In CBC mode the ciphertext value of the preceding block is exclusive-OR combined with the plaintext value for the current block. This has the effect of distributing the combined block values evenly among all possible block values, and so prevents codebook attacks.
On the other hand, ciphering the first block generally requires an IV or initial value to start the process. The IV necessarily expands the ciphertext, which may or may not be a problem. And the IV must be dynamically random-like so that statistics cannot be developed on the first block of each message sent under the same key.
In CBC mode, each random-like confusing value is the ciphertext from each previous block. Clearly this ciphertext is exposed to The Opponent, so there would seem to be little benefit associated with hiding the IV, which is just the first of these values. But if The Opponent knows the first sent plaintext, and can intercept and change the message IV, The Opponent can manipulate the first block of received plaintext. Because the IV does not represent a message enciphering, manipulating this value does not also change any previous block.
Accordingly, the IV may be sent enciphered or may be specifically authenticated in some way. Alternately, the complete body of the plaintext message may be authenticated, often by a CRC. The CRC remainder should be block ciphered, perhaps as part of the plaintext.
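The chaining and the IV manipulation described above can be sketched with a toy stand-in for a real block cipher: a hypothetical one-byte "block cipher" built as a keyed permutation, which is hopelessly weak and used only to make the CBC structure visible. An Opponent who knows the first plaintext byte and can alter the IV in transit changes that byte to anything desired, while the rest of the message deciphers normally.

```python
import random
import secrets

def toy_cipher(key: int):
    """Hypothetical one-byte 'block cipher': a keyed permutation and its inverse.

    Hopelessly weak; it only stands in for a real block cipher here.
    """
    table = list(range(256))
    random.Random(key).shuffle(table)
    inverse = [0] * 256
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse

def cbc_encrypt(plain: bytes, iv: int, enc) -> bytes:
    prev, out = iv, []
    for p in plain:
        prev = enc[p ^ prev]        # C[i] = E(P[i] xor C[i-1]), with C[-1] = IV
        out.append(prev)
    return bytes(out)

def cbc_decrypt(cipher: bytes, iv: int, dec) -> bytes:
    prev, out = iv, []
    for c in cipher:
        out.append(dec[c] ^ prev)   # P[i] = D(C[i]) xor C[i-1]
        prev = c
    return bytes(out)

enc, dec = toy_cipher(key=2024)
iv = secrets.randbelow(256)         # a fresh random-like IV for each message
ct = cbc_encrypt(b"7 UNITS", iv, enc)
assert cbc_decrypt(ct, iv, dec) == b"7 UNITS"

# Changing only the IV flips bits in the first deciphered block.
forged_iv = iv ^ ord("7") ^ ord("9")
print(cbc_decrypt(ct, forged_iv, dec))   # b'9 UNITS'
```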
CFB is closely related to OFB, and is intended to provide some of the characteristics of a stream cipher from a block cipher. CFB generally forms an autokey stream cipher. CFB is a way of using a block cipher to form a random number generator. The resulting pseudorandom confusion sequence can be combined with data as in the usual stream cipher.
CFB assumes a shift register of the block cipher block size. An IV or initial value first fills the register, and then is ciphered. Part of the result, often just a single byte, is used to cipher data, and the resulting ciphertext is also shifted into the register. The new register value is ciphered, producing another confusion value for use in stream ciphering.
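A minimal sketch of 8-bit CFB as described above. The 8-byte register size, and the use of HMAC-SHA256 truncated to the block size as a stand-in for a real block cipher, are assumptions for illustration: CFB only ever runs the cipher in the forward direction, so a keyed one-way function can play that role in a demo. The names `block_encipher` and `cfb8` are hypothetical.

```python
import hashlib
import hmac

BLOCK = 8  # assumed shift-register (block) size in bytes


def block_encipher(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher's forward direction (HMAC-SHA256, truncated).
    CFB never needs the inverse cipher, so this is enough for a demo."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def cfb8(key: bytes, iv: bytes, data: bytes, decrypt: bool = False) -> bytes:
    """8-bit CFB: one full block ciphering for every data byte."""
    register = bytes(iv)                      # the IV first fills the register
    out = bytearray()
    for b in data:
        confusion = block_encipher(key, register)[0]  # use one byte of the result
        o = b ^ confusion
        out.append(o)
        ciphertext_byte = b if decrypt else o
        register = register[1:] + bytes([ciphertext_byte])  # shift ciphertext in
    return bytes(out)


key = b"hypothetical key"
iv = bytes(range(BLOCK))        # in practice the IV should be random for each message
msg = b"attack at dawn"
ct = cfb8(key, iv, msg)
print(cfb8(key, iv, ct, decrypt=True))  # b'attack at dawn'
```

Note that the ciphertext byte is what gets shifted into the register in both directions, which is why deciphering reproduces the same confusion sequence.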
One disadvantage of this, of course, is the need for a full block-wide ciphering operation, typically for each data byte ciphered. The advantage is the ability to cipher individual characters, instead of requiring accumulation into a block before processing.
Chaos
In physics, the "state" of an analog physical system cannot be fully measured, which always leaves some remaining uncertainty to be magnified on subsequent steps. And, in many cases, a physical system may be slightly affected by thermal noise and thus continue to accumulate new information into its "state."
In a computer, the state of the digital system is explicit and complete, and there is no uncertainty. No noise is accumulated. All operations are completely deterministic. This means that, in a computer, even a "chaotic" computation is completely predictable and repeatable.
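Both halves of this argument can be seen with the logistic map x → 4x(1 − x), a standard textbook chaotic iteration (chosen here for illustration; it is not from the original text). A change of one part in 10^12 in the starting value is magnified until the orbits are unrelated, yet rerunning from the same start reproduces every value exactly.

```python
def logistic_orbit(x0: float, steps: int, r: float = 4.0) -> list:
    """Iterate the logistic map x -> r*x*(1-x) and return the whole orbit."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_orbit(0.123456789, 60)
b = logistic_orbit(0.123456789, 60)          # same start, same machine
c = logistic_orbit(0.123456789 + 1e-12, 60)  # tiny change in the start

print(a == b)                                   # True: completely repeatable
print(max(abs(p - q) for p, q in zip(a, c)))    # large: the tiny change was magnified
```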
In the usual case, many independent samples are counted by category or separated into value-range "bins." The reference distribution gives us the number of values to expect in each bin. Then we compute an X² test statistic related to the difference between the distributions:
X² = SUM( SQR(Observed[i] - Expected[i]) / Expected[i] )
("SQR" is the squaring function, and we require that each expectation not be zero.) Then we use a tabulation of chi-square statistic values to look up the probability that a particular X² value or lower (in the c.d.f.) would occur by random sampling if both distributions were the same. The statistic also depends upon the "degrees of freedom," which is almost always one less than the final number of bins. See the chi-square section of the Ciphers By Ritter / JavaScript computation pages.
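The statistic and its c.d.f. can be sketched in a few lines instead of using a table. The c.d.f. below is the regularized lower incomplete gamma function P(df/2, X²/2), summed by its power series; this is a standard identity, but the function names and the example counts are illustrative assumptions. The series is fine for moderate values, but may overflow for the huge totals produced by distinctly different distributions.

```python
import math


def chi_square(observed, expected):
    """X^2 = SUM( SQR(Observed[i] - Expected[i]) / Expected[i] )."""
    if any(e <= 0 for e in expected):
        raise ValueError("every expectation must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_cdf(x: float, df: int) -> float:
    """c.d.f. of the chi-square distribution, P(df/2, x/2), via power series.
    Intended for moderate x; very large x may overflow."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


observed = [18, 25, 22, 35]          # illustrative counts in 4 bins
expected = [25, 25, 25, 25]          # even reference distribution
x2 = chi_square(observed, expected)  # 6.32
df = len(observed) - 1               # one less than the number of bins
p = chi_square_cdf(x2, df)
print(x2, df, p)                     # c.d.f. about 0.90; upper tail is 1 - p
```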
The c.d.f. percentage for a particular chi-square value is the area of the statistic distribution to the left of the statistic value; this is the probability of obtaining that statistic value or less by random selection when testing two distributions which are exactly the same. Repeated trials which randomly sample two identical distributions should produce about the same number of X² values in each quarter of the distribution (0% to 25%, 25% to 50%, 50% to 75%, and 75% to 100%). So if we repeatedly find only very high percentage values, we can assume that we are probing different distributions. And even a single very high percentage value would be a matter of some interest.
Any statistic probability can be expressed either as the proportion of the area to the left of the statistic value (this is the "cumulative distribution function" or c.d.f.), or as the area to the right of the value (this is the "upper tail"). Using the upper tail representation for the X² distribution can make sense because the usual chi-squared test is a "one tail" test where the decision is always made on the upper tail. But the "upper tail" has an opposite "sense" to the c.d.f., where higher statistic values always produce higher percentage values. Personally, I find it helpful to describe all statistics by their c.d.f., thus avoiding the use of a wrong "polarity" when interpreting any particular statistic. While it is easy enough to convert from the c.d.f. to the complement or vice versa (just subtract from 1.0), we can base our arguments on either form, since the statistical implications are the same.
It is often unnecessary to use a statistical test if we just want to know whether a function is producing something like the expected distribution: We can look at the binned values and generally get a good idea about whether the distributions change in similar ways at similar places. A good rule-of-thumb is to expect chi-square totals similar to the number of bins, but distinctly different distributions often produce huge totals far beyond the values in any table, and computing an exact probability for such cases is simply irrelevant. On the other hand, it can be very useful to perform 20 to 40 independent experiments to look for a reasonable statistic distribution, rather than simply making a "yes / no" decision on the basis of what might turn out to be a rather unusual result.
Since we are accumulating discrete bin-counts, any fractional expectation will always differ from any actual count. For example, suppose we expect an even distribution, but have many bins and so only accumulate enough samples to observe about 1 count for every 2 bins. In this situation, the absolute best sample we could hope to see would be something like (0,1,0,1,0,1,...), which would represent an even, balanced distribution over the range. But even in this best possible case we would still be off by half a count in each and every bin, so the chi-square result would not properly characterize this best possible sequence. Accordingly, we need to accumulate enough samples so that the quantization which occurs in binning does not appreciably affect the accuracy of the result. Normally I try to expect at least 10 counts in each bin.
But when we have a reference distribution that trails off toward zero, inevitably there will be some bins with few counts. Taking more samples will just expand the range of bins, some of which will be lightly filled in any case. We can avoid quantization error by summing both the observations and expectations from multiple bins, until we get a reasonable expectation value (again, I like to see 10 counts or more). In this way, the "tails" of the distribution can be more properly (and legitimately) characterized.
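A minimal sketch of the bin-merging idea, working left to right and folding any short leftover group at the right-hand tail into the last group. The function name is hypothetical, the threshold of 10 follows the rule of thumb above, and the counts are illustrative numbers for a distribution which trails off toward zero.

```python
def merge_sparse_bins(observed, expected, min_expected=10.0):
    """Sum adjacent bins until each group's expectation reaches min_expected.
    Observations and expectations are always summed over the same bins."""
    obs_out, exp_out = [], []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc = e_acc = 0.0
    if o_acc or e_acc:                 # leftover tail bins
        if exp_out:
            obs_out[-1] += o_acc       # fold into the last full group
            exp_out[-1] += e_acc
        else:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out


observed = [38, 33, 14, 9, 3, 2, 1]
expected = [40, 30, 15, 8, 4, 2, 1]    # trails off toward zero
merged_obs, merged_exp = merge_sparse_bins(observed, expected)
print(merged_obs, merged_exp)          # [38.0, 33.0, 14.0, 15.0] [40.0, 30.0, 15.0, 15.0]
```

The merged lists can then go into the usual X² computation, with degrees of freedom one less than the number of merged groups.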
A good cipher can transform secret information into a multitude of different intermediate forms, each of which represents the original information. Any of these intermediate forms or ciphertexts can be produced by ciphering the information under a particular key value. The intent is that the original information only be exposed by one of the many possible keyed interpretations of that ciphertext. Yet the correct interpretation is available merely by deciphering under the appropriate key.
A cipher appears to reduce the protection of secret information to enciphering under some key, and then keeping that key secret. This is a great reduction of effort and potential exposure, and is much like keeping your valuables in your house, and then locking the door when you leave. But there are also similar limitations and potential problems.
With a good cipher, the resulting ciphertext can be stored, transmitted, or otherwise exposed without also exposing the secret information hidden inside. This means that ciphertext can be stored in, or transmitted through, systems which have no secrecy protection. For transmitted information, this also means that the cipher itself must be distributed in multiple places, so in general the cipher cannot be assumed to be secret. With a good cipher, only the deciphering key need be kept secret.
We seek to hide distinctions of size, because operation is independent of size, and because size effects are usually straightforward. We thus classify serious block ciphers as keyed simple substitution, just like newspaper amusement ciphers, despite their obvious differences in strength and construction. This allows us to compare the results from an ideal tiny cipher to those from a large cipher construction; the grouping thus can provide benchmark characteristics for measuring large cipher constructions.
We could of course treat each cipher as an entity unto itself, or relate ciphers by their dates of discovery, the tree of developments which produced them, or by known strength. But each of these criteria is more or less limited to telling us "this cipher is what it is." We already know that. What we want to know is what other ciphers function in a similar way, and then whatever is known about those ciphers. In this way, every cipher need not be an island unto itself, but instead can be judged and compared in a related community of similar techniques.
Our primary distinction is between ciphers which handle all the data at once (block ciphers), and those which handle some, then some more, then some more (stream ciphers). We thus see the usual repeated use of a block cipher as a stream meta-cipher which has the block cipher as a component. It is also possible for a stream cipher to be re-keyed or re-originate frequently, and so appear to operate on "blocks." Such a cipher, however, would not have the overall diffusion we normally associate with a block cipher, and so might usefully be regarded as a stream meta-cipher with a stream cipher component.
The goal is not to give each cipher a label, but instead to seek insight. Each cipher in a particular general class carries with it the consequences of that class. And because these groupings ignore size, we are free to generalize from the small to the large and so predict effects which may be unnoticed in full-size ciphers.
(Note that this definition is somewhat broader than the now common understanding of a huge, and thus emulated, Simple Substitution. But there are ciphers which require blocked plaintext and which do not emulate Simple Substitution, and calling these something other than "block" ciphers negates the advantage of a taxonomy.)
Ciphertext expansion is the general situation: Stream ciphers need a message key, and block ciphers with a small block need some form of plaintext randomization, which generally needs an IV to protect the first block. Only block ciphers with a large size block generally can avoid ciphertext expansion, and then only if each block can be expected to hold sufficient uniqueness or "entropy" to prevent a codebook attack.
It is certainly true that in most situations of new construction a few extra bytes are not going to be a problem. However, in some situations, and especially when a cipher is to be installed into an existing system, the ability to encipher data without requiring additional storage can be a big advantage. Ciphering data without expansion supports the ciphering of data structures which have been defined and fixed by the rest of the system, provided only that one can place the cipher at the interface "between" two parts of the system. This is also especially efficient, as it avoids the process of acquiring a different, larger, amount of store for each ciphering. Such an installation also can apply to the entire system, and not require the re-engineering of all applications to support cryptography in each one.
In an analog system we might produce a known delay by slowly charging a capacitor and measuring the voltage across it continuously until the voltage reaches the desired level. A big problem with this is that the circuit becomes increasingly susceptible to noise at the end of the interval.
In a digital system we create a delay by simply counting clock cycles. Since all external operations are digital, noise effects are virtually eliminated, and we can easily create accurate delays which are as long as the count in any counter we can build.
Coding is a very basic part of modern computation and generally implies no secrecy or information hiding. Some codes are "secret codes," however, and then the transformation between the information and the coding is kept secret. Also see: cryptography and substitution.
The usual ciphertext-only approach depends upon the plaintext having strong statistical biases which make some values far more probable than others, and also more probable in the context of particular preceding known values. Such attacks can be defeated if the plaintext data are randomized and thus evenly and independently distributed among the possible values. (This may have been the motivation for the use of a random confusion sequence in a stream cipher.)
When a codebook attack is possible on a block cipher, the complexity of the attack is controlled by the size of the block (that is, the number of elements in the codebook) and not the strength of the cipher. This means that a codebook attack would be equally effective against either DES or Triple-DES.
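A toy illustration of why the codebook, not the cipher, sets the work: The Opponent simply records known plaintext/ciphertext block pairs and looks up new ciphertext. The 1-byte block size, the keyed shuffle standing in for the cipher, and enciphering each block independently (ECB-style, with no plaintext randomization) are all assumptions chosen to keep the codebook tiny.

```python
import random

# Hypothetical toy block cipher with 1-byte blocks (keyed permutation, illustration only).
secret = list(range(256))
random.Random(99).shuffle(secret)


def encipher_blocks(data: bytes) -> bytes:
    """Encipher each 1-byte block independently, with no randomization."""
    return bytes(secret[b] for b in data)


# The Opponent collects known plaintext/ciphertext block pairs ...
known_pt = b"the quick brown fox"
codebook = {c: p for p, c in zip(known_pt, encipher_blocks(known_pt))}

# ... then reads new ciphertext block-by-block, never knowing the key.
intercepted = encipher_blocks(b"the fox")
recovered = bytes(codebook.get(c, ord("?")) for c in intercepted)
print(recovered)  # b'the fox'
```

Replacing the keyed shuffle with a much stronger cipher would not change this at all; only a larger block (more possible codebook entries) or randomized plaintext does.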
One way a block cipher can avoid a codebook attack is by having a large block size which will contain an unsearchable amount of plaintext "uniqueness" or entropy. Another approach is to randomize the plaintext block, often by using an operating mode such as CBC.
$$\binom{n}{k} = C(n,k) = \frac{n!}{k!\,(n-k)!}$$
See the combinations section of the Ciphers By Ritter / JavaScript computation pages. Also see permutation.
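The formula can be checked directly against a listing of the subsets; `combinations_count` is an illustrative name, while `math.comb` and `itertools.combinations` are the Python standard-library equivalents.

```python
import math
from itertools import combinations


def combinations_count(n: int, k: int) -> int:
    """C(n,k) = n! / (k! (n-k)!), computed directly from the formula."""
    if not 0 <= k <= n:
        return 0
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


print(combinations_count(5, 2))              # 10
print(len(list(combinations(range(5), 2))))  # 10, by listing the subsets
print(math.comb(52, 5))                      # 2598960, standard library (Python 3.8+)
```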
Consider a block cipher: For any given size block, there is some fixed number of possible messages. Since every enciphering must be reversible (deciphering must work), we have a 1:1 mapping between plaintext and ciphertext blocks. The set of all plaintext values and the set of all ciphertext values are the same set; particular values just have different meanings in each set.
Keying gives us no more ciphertext values; it only re-uses the values which are available. Thus, keying a block cipher consists of selecting a particular arrangement or permutation of the possible block values. Permutations are a combinatoric topic. Using combinatorics we can talk about the number of possible permutations or keys in a block cipher, or in cipher components like substitution tables.
Permutations can be thought of as the number of unique arrangements of a given length on a particular set. Other combinatoric concepts include binomials and combinations (the number of unique given-length subsets of a given set).
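Counting the possible arrangements makes the scale concrete. A cipher or table of w-bit blocks can select any one of (2^w)! permutations; the sketch below computes the base-2 logarithm of that count (the function name is an illustrative assumption) and also counts arrangements of a given length with the standard library.

```python
import math
from itertools import permutations


def log2_permutations(bits: int) -> float:
    """log2 of (2**bits)!: the number of distinct arrangements (keyings)
    available to a block cipher or substitution table of that width."""
    return math.lgamma(2 ** bits + 1) / math.log(2)


print(round(log2_permutations(8)))        # 1684: an 8-bit table has about 2**1684 arrangements
print(math.factorial(256).bit_length())   # 1684, checked with exact integers

# Arrangements of length 2 drawn from {a, b, c, d}:
print(math.perm(4, 2), len(list(permutations("abcd", 2))))  # 12 12
```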
Reversible combiners are used to encipher plaintext into ciphertext in a stream cipher. The ciphertext is then deciphered into plaintext using a related inverse or extractor mechanism.
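Two reversible combiners sketched on bytes: byte addition mod 256 with subtraction mod 256 as its extractor, and exclusive-OR, which is its own inverse. The fixed confusion bytes are a hypothetical stand-in for the output of an RNG.

```python
def add_combine(data: bytes, confusion: bytes) -> bytes:
    """Additive combiner: byte addition mod 256."""
    return bytes((d + k) % 256 for d, k in zip(data, confusion))


def add_extract(ciphertext: bytes, confusion: bytes) -> bytes:
    """Inverse (extractor) for the additive combiner: byte subtraction mod 256."""
    return bytes((c - k) % 256 for c, k in zip(ciphertext, confusion))


def xor_combine(data: bytes, confusion: bytes) -> bytes:
    """Exclusive-OR combiner; the same operation also extracts."""
    return bytes(d ^ k for d, k in zip(data, confusion))


confusion = bytes([200, 17, 99, 250, 3, 128, 64, 77])  # hypothetical RNG output
plaintext = b"plaintxt"
ct = add_combine(plaintext, confusion)
print(add_extract(ct, confusion))                                  # b'plaintxt'
print(xor_combine(xor_combine(plaintext, confusion), confusion))   # b'plaintxt'
```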
Irreversible or non-invertible combiners are often used to mix multiple RNG's into a single confusion sequence, also for use in stream cipher designs.
Also see balanced combiner, additive combiner and complete, and The Story of Combiner Correlation: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page.
Also see: associative and distributive.
Completeness does not require that an input bit change an output bit for every input value (which would not make sense anyway, since every output bit must be changed at some point, and if they all had to change at every point, we would have all the output bits changing, instead of the desired half). The inverse of a complete function is not necessarily also complete.
As originally defined in Kam and Davida:
"For every possible key value, every output bit c_i of the SP network depends upon all input bits p_1,...,p_n and not just a proper subset of the input bits." [p.748]
Kam, J. and G. Davida. 1979. Structured Design of Substitution-Permutation Encryption Networks. IEEE Transactions on Computers. C-28(10): 747-753.
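Completeness of a single substitution table (that is, one key value) can be checked exhaustively: for each input bit i and output bit j, look for at least one input x where flipping bit i of x changes bit j of the output. As noted above, this only requires one such x, not every x. The function name and the example tables are illustrative assumptions.

```python
import random


def is_complete(sbox, bits: int) -> bool:
    """True if every output bit depends on every input bit of this table."""
    for i in range(bits):
        for j in range(bits):
            if not any(((sbox[x] ^ sbox[x ^ (1 << i)]) >> j) & 1
                       for x in range(1 << bits)):
                return False
    return True


identity = list(range(256))
keyed = list(range(256))
random.Random(7).shuffle(keyed)

print(is_complete(identity, 8))  # False: input bit i only ever reaches output bit i
print(is_complete(keyed, 8))     # True, as expected for a randomly keyed table
```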
The most successful components are extremely general and can be used in many different ways. Even as a brick is independent of the infinite variety of brick buildings, a flip-flop is independent of the infinite variety of logic machines which use flip-flops.
The source of the ability to design and build a wide variety of different electronic logic machines is the ability to interconnect and use a few very basic but very general parts.
Electronic components include
Cryptographic system components include:
A logic machine with:
Also see: source code, object code and software.
In number theory we say that integer a (exactly) divides integer b (denoted a | b) when b = ka for some integer k.
In number theory we say that integer a is congruent to integer b modulo m, denoted a ≡ b (mod m), when m divides a - b.
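Both definitions reduce to a remainder test; the function names here are illustrative.

```python
def divides(a: int, b: int) -> bool:
    """a | b: b is an exact integer multiple of a (a must be nonzero)."""
    return b % a == 0


def congruent(a: int, b: int, m: int) -> bool:
    """a ≡ b (mod m): m divides a - b."""
    return (a - b) % m == 0


print(divides(3, 12), divides(5, 12))              # True False
print(congruent(17, 5, 12), congruent(17, 4, 12))  # True False
```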
Used in the analysis of signal processing to develop the response of a processing system to a complicated real-valued input signal. The input signal is first separated into some number of discrete impulses. Then the system response to an impulse -- the output level at each unit time delay after the impulse -- is determined. Finally, the expected response is computed as the sum of the contributions from each input impulse, multiplied by the magnitude of each impulse. This is an approximation to the convolution integral with an infinite number of infinitesimal delays. Although originally accomplished graphically, the process is just polynomial multiplication.
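The sum of scaled, delayed impulse responses is the same arithmetic as polynomial multiplication, as the sketch below shows (the function name is illustrative).

```python
def convolve(signal, impulse_response):
    """Discrete convolution: each input impulse adds a copy of the impulse
    response, delayed to its position and scaled by its magnitude."""
    out = [0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# (1 + 2x)(1 + 3x) = 1 + 5x + 6x^2
print(convolve([1, 2], [1, 3]))  # [1, 5, 6]
```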
The convolution of two sequences can also be computed by taking the FFT of each (zero-padded so the result does not wrap around), multiplying these results term-by-term, then taking the inverse FFT. While there is an analogous relationship in the FWT, in this case the "delays" between the sequences represent mod 2 distance differences, which may or may not be useful.
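A pure-Python sketch of FFT-based convolution, using a simple recursive radix-2 FFT so that no external library is assumed. The zero-padding to a power of two at least as long as the full result is what turns the FFT's circular convolution into the ordinary (linear) one; the function names are illustrative.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(x, y):
    """Zero-pad, transform both, multiply term-by-term, inverse transform."""
    size = len(x) + len(y) - 1
    n = 1
    while n < size:
        n *= 2
    fx = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    fy = fft([complex(v) for v in y] + [0j] * (n - len(y)))
    product = [p * q for p, q in zip(fx, fy)]
    result = fft(product, invert=True)
    return [round((v / n).real, 9) for v in result[:size]]  # normalize by n


print(fft_convolve([1, 2], [1, 3]))  # [1.0, 5.0, 6.0]
```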
One way to evaluate the correlation of two real-valued sequences is to multiply them together term-by-term and sum all results. If we do this for all possible "delays" between the two sequences, we get a "vector" or 1-dimensional array of correlations which is a convolution (with one of the sequences reversed in time). Then the maximum value represents the delay with the best correlation.
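A small sketch of finding a delay by correlation. For simplicity it uses circular (wrap-around) delays on equal-length sequences, which is an assumption of this example. The length-7 sequence of +1/-1 values is a maximal-length sequence, chosen because its circular autocorrelation has a single sharp peak.

```python
def circular_correlations(a, b):
    """Term-by-term products summed at every circular delay d of b against a."""
    n = len(a)
    return [sum(a[i] * b[(i + d) % n] for i in range(n)) for d in range(n)]


a = [1, 1, 1, -1, -1, 1, -1]
delay = 3
b = [a[(i - delay) % len(a)] for i in range(len(a))]  # b is a, delayed by 3
corr = circular_correlations(a, b)
best = max(range(len(corr)), key=corr.__getitem__)
print(corr)  # [-1, -1, -1, 7, -1, -1, -1]
print(best)  # 3
```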
"The correlation coefficient associated with a pair of Boolean functions f(a) and g(a) is denoted by C(f,g) and is given by
C(f,g) = 2 * prob(f(a) = g(a)) - 1 ."
Daemen, J., R. Govaerts and J. Vandewalle. 1994. Correlation Matrices. Fast Software Encryption. 276. Springer-Verlag.
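The quoted definition can be evaluated exhaustively for small Boolean functions by taking the input a as uniform over all n-bit values. The function names here are illustrative.

```python
from itertools import product


def correlation_coefficient(f, g, n: int) -> float:
    """C(f,g) = 2 * prob(f(a) = g(a)) - 1, with a uniform over all n-bit inputs."""
    inputs = list(product((0, 1), repeat=n))
    agree = sum(1 for a in inputs if f(a) == g(a))
    return 2 * agree / len(inputs) - 1


def f_xor(a):
    return a[0] ^ a[1]


def f_not_xor(a):
    return 1 - (a[0] ^ a[1])


def f_first(a):
    return a[0]


def f_and(a):
    return a[0] & a[1]


print(correlation_coefficient(f_xor, f_xor, 2))      # 1.0
print(correlation_coefficient(f_xor, f_not_xor, 2))  # -1.0
print(correlation_coefficient(f_xor, f_first, 2))    # 0.0: XOR is uncorrelated with one input
print(correlation_coefficient(f_first, f_and, 2))    # 0.5: they agree on 3 of 4 inputs
```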
A CRC is essentially a fast remainder operation over a huge numeric value which is the data. (For best speed, the actual computation occurs as mod 2 polynomial operations.) The CRC result is an excellent (but linear) hash value corresponding to the data.
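A bit-at-a-time sketch of the common CRC-32 (the variant used by zlib), computed as a mod 2 polynomial remainder with the reflected polynomial 0xEDB88320, and checked against Python's `zlib.crc32`. The last lines show the linearity: for messages of equal length, the CRC of the exclusive-OR of three messages equals the exclusive-OR of their CRCs, which is exactly why a CRC alone offers no strength.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """CRC-32 (zlib / IEEE 802.3 variant), processed one bit at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


print(hex(crc32_bitwise(b"123456789")))                         # 0xcbf43926, the usual check value
print(crc32_bitwise(b"123456789") == zlib.crc32(b"123456789"))  # True

# Linearity: for equal-length messages, CRC(x ^ y ^ z) == CRC(x) ^ CRC(y) ^ CRC(z).
x, y, z = b"plaintext", b"more data", b"third one"
xyz = bytes(p ^ q ^ r for p, q, r in zip(x, y, z))
print(crc32_bitwise(xyz) == crc32_bitwise(x) ^ crc32_bitwise(y) ^ crc32_bitwise(z))  # True
```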
No CRC has any appreciable strength, but some applications -- even in cryptography -- need no strength:
Because there is no th