-------------------------------------
Linear and differential cryptanalysis
-------------------------------------

====================================================
(1) Linear cryptanalysis in the large -- 3.3 Stinson
====================================================

Big idea: what if we can find some linear relationship between
some subset of plaintext bits and some subset of state bits that holds
with probability noticeably different from 1/2
  -- for example, the XOR of those bits is not "1" 1/2 of the time
  -- it is "1" either more than or less than 1/2 of the time
  -- so if the attacker has some large # of PT/CT pairs (all encrypted using the same key K)
  -- then we iterate through candidate keys k' and, for each, we compute
     the values of the relevant plaintext and state bits for every pair
  -- if the linear relationship (from above) holds, we increment a counter for that candidate key
  -- else we don't
  -- at the end, the candidate key whose count (as a fraction of the pairs)
     is farthest from 1/2 is our likely K

-------------------
The Piling-Up Lemma
-------------------

Let's say we have X_1, X_2, ... and each X_i is an independent random variable
which takes on the value 0 with probability p_i
and the value 1 with probability (1 - p_i)

Then for i != j, since X_i and X_j are *independent*, we have:

  Pr[X_i = 0, X_j = 0] = p_i * p_j
  Pr[X_i = 0, X_j = 1] = p_i * (1 - p_j)
  Pr[X_i = 1, X_j = 0] = (1 - p_i) * p_j
  Pr[X_i = 1, X_j = 1] = (1 - p_i) * (1 - p_j)

Now let's consider the discrete RV: X_i XOR X_j

Then:
  Pr[ (X_i XOR X_j) = 0] = (p_i * p_j) + (1 - p_i) * (1 - p_j)
  Pr[ (X_i XOR X_j) = 1] = p_i * (1 - p_j) + (1 - p_i) * p_j

Then we describe the probability distribution of an RV which takes on only
the values 0 or 1 by the BIAS of the distribution ... i.e.
the distribution's distance away from 1/2

  e_i = the bias of X_i = p_i - 1/2

  -1/2 <= e_i <= 1/2

  Pr[ X_i = 0 ] = p_i = 1/2 + e_i
  Pr[ X_i = 1 ] = 1 - p_i = 1 - (1/2 + e_i) = 1/2 - e_i

So the bias of the RV which represents X_i XOR X_j would be: 2 * e_i * e_j

b/c
  Pr[ (X_i XOR X_j) = 0]
    = (1/2 + e_i) * (1/2 + e_j) + (1/2 - e_i) * (1/2 - e_j)
    = 1/4 + 1/2 e_j + 1/2 e_i + e_i * e_j
      + 1/4 - 1/2 e_j - 1/2 e_i + e_i * e_j
    = 1/2 + 2 * e_i * e_j

  Pr[ (X_i XOR X_j) = 1]
    = (1/2 + e_i) * (1/2 - e_j) + (1/2 - e_i) * (1/2 + e_j)
    = 1/4 - 1/2 e_j + 1/2 e_i - e_i * e_j
      + 1/4 + 1/2 e_j - 1/2 e_i - e_i * e_j
    = 1/2 - 2 * e_i * e_j

  --> so the bias for the XOR of those two RVs would be: 2 * e_i * e_j

To generalize:

Then, let's say we have k independent RVs; then for i_1 < i_2 < ... < i_k
(these will be indices) we have the RV: X_i1 XOR ... XOR X_ik

Then the bias of that RV is: e_i1,i2,...,ik

and
  e_i1,i2,...,ik = 2^(k-1) * PRODUCT (from j = 1 to k) of e_ij

Proof by induction.

Note that this implies that if the bias of *any single RV* is 0, then the
bias of the XOR including that RV is also 0.

So in order to have a nonzero bias for the XOR of RVs, every RV must have
a non-zero bias (implying that a single bit of true randomness provides true
randomness for the overall XOR including that bit!).
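A quick numerical check of the lemma, as a minimal sketch: it enumerates every joint outcome of k independent bits, sums the probability that the XOR is 0, and compares the resulting bias with 2^(k-1) * PRODUCT e_i. The function names (xor_bias_exact, xor_bias_lemma) and the sample biases are made up for illustration; they are not from Stinson.

```python
from itertools import product
from math import prod

def xor_bias_exact(biases):
    """Bias of X_1 XOR ... XOR X_k by brute force, assuming the X_i are
    independent and Pr[X_i = 0] = 1/2 + e_i."""
    p_zero = 0.0
    for bits in product((0, 1), repeat=len(biases)):
        # probability of this joint outcome (independence => multiply)
        pr = prod((0.5 + e) if b == 0 else (0.5 - e) for b, e in zip(bits, biases))
        if sum(bits) % 2 == 0:      # even number of 1s -> XOR is 0
            p_zero += pr
    return p_zero - 0.5

def xor_bias_lemma(biases):
    """Bias predicted by the piling-up lemma: 2^(k-1) * product of the e_i."""
    return 2 ** (len(biases) - 1) * prod(biases)

# hypothetical sample biases
for sample in ([0.25, 0.25], [0.25, 0.25, 0.25], [0.25, 0.0, 0.3]):
    print(sample, xor_bias_exact(sample), xor_bias_lemma(sample))
```

The third sample contains one unbiased bit, so both functions should report a bias of (approximately) 0, matching the remark above about a single bit of true randomness.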
The RVs must be independent for the lemma to apply; imagine we have

  e_1 = e_2 = e_3 = 1/4
  e_1,2 = e_2,3 = e_1,3 = 1/8      (by the lemma: 2 * 1/4 * 1/4)

then consider the RV : X_1 XOR X_3

  X_1 XOR X_3 = (X_1 XOR X_2) XOR (X_2 XOR X_3)

if (X_1 XOR X_2) and (X_2 XOR X_3) were *independent*
then the bias e_1,3 would be: 2 * (1/8)^2 = 1/32
which is not the case (e_1,3 = 1/8, as above) because these two RVs,
(X_1 XOR X_2) and (X_2 XOR X_3), are not independent: both contain X_2

------------------------------
Applying this to DES's S-boxes
------------------------------

Imagine we have a random variable that represents the XOR of some of the
input bits (X_i1, X_i2, X_i3, X_i4, X_i5, X_i6) to an S-box
together with some of the output bits (Y_j1, Y_j2, Y_j3, Y_j4) of that S-box;

Then we can compute the bias of that RV directly, by evaluating it for every
possible S-box input and counting how often it is 0.
(The inputs and outputs of a single S-box are not independent, so the
piling-up lemma is not what gives us this number; the lemma comes in later,
when we combine several S-boxes.)

Note that when we're computing the XOR of some bits,
  if those bits have an odd # of 1s, the XOR is 1
  if those bits have an even # of 1s, the XOR is 0
since XOR is just addition mod 2.

For page 83 of Stinson, Example 3.2 (bias of RV : X_3 XOR X_4 XOR Y_1 XOR Y_4)
we get that
  the XOR = 0 with probability 1/8
  the XOR = 1 with probability 7/8
so since p = Pr[the RV = 0], we have: p = 1/8
then e = p - 1/2 ==> e = 1/8 - 1/2 = -3/8

-------------------------------------
Building a linear approximation table
-------------------------------------

So imagine that we have an S-box that takes 4 bits of input and spits out 4 bits of output.
And we want to construct a table that gives the bias of the RV formed by
XORing each possible combination of input and output bits
  - for example, we want to see X_1 XOR X_2 XOR X_3 XOR X_4 XOR Y_1 XOR Y_2 XOR Y_3 XOR Y_4
    -- this would be represented by binary (1111,1111) == (15,15) == (F,F)
  - or instead X_1 XOR X_2 XOR Y_3
    would be represented by (1100,0010) == (12,2) == (C,2)
  - then we think of (xxxx,xxxx) = (a,b)
  - so we build a 2-dimensional table where "a" is represented by a hex value and so is b
  - and the entry for any (a,b) is : N_L(a,b)
  - N_L(a,b) is the number of the 16 input/output pairs (x,y) of the S-box for which
      ( XOR (from i = 1 to 4) of a_i * x_i ) XOR ( XOR (from i = 1 to 4) of b_i * y_i ) = 0
  - where a : a_1 || a_2 || ... || a_4
  - and b : b_1 || ... || b_4
  - so a_i and b_i determine whether we want the value of X_i or Y_i respectively for that i
  - and x_i, y_i are 0 or 1 depending upon the value of X_i, Y_i, respectively for that i
  - then we get the bias from that via: e(a,b) = ( N_L(a,b) - 8 )/16
  - if we look at N_L(1001,0100) = N_L(9,4) == 8 (as we calculated above), so e(9,4) = 0
    if we look at N_L(0011,1001) = N_L(3,9) == 2 (as we calculated above), so e(3,9) = -3/8

-----------------------------------------------------------
A linear attack on a Substitution-Permutation Network (SPN)
-----------------------------------------------------------

So we're going to consider some subset of S-boxes
And we'll pick those S-boxes which have biases with high abs values

Notation: S_i-j : i is the round, j is the S-box

In the first round, we'll pick the second S-box, S_1-2
In the second round, we'll pick the second S-box, S_2-2
In the third round, we'll pick the second S-box, S_3-2, and the fourth S-box, S_3-4

We figure out a particular S-box's bias by getting the bias of the RV
representing the XOR of the chosen input and output bits

For example if an S-box is the second from the left and we use its input bits
in (local) positions 1,3,4 and its output bit in (local) position 2,
i.e. state bits 5,7,8 in and state bit 6 out,
then we compute the bias of that S-box
by computing bias( (5,7,8), (6) ) = bias( 1011, 0100 ) = 1/4
  b/c bits in positions 5,7,8 --> 1011 (input bits)
      bit in position 6       --> 0100 (output bits)

  S_1-2 bias: bias(1011,0100) = bias(B,4) =  1/4
  S_2-2 bias: bias(0100,0101) = bias(4,5) = -1/4
  S_3-2 bias: bias(0100,0101) = bias(4,5) = -1/4
  S_3-4 bias: bias(0100,0101) = bias(4,5) = -1/4

Then we introduce four random variables, one per chosen S-box, with these biases;
  T_1 has bias  1/4 (corresponds to S_1-2)
  T_2 has bias -1/4 (corresponds to S_2-2)
  T_3 has bias -1/4 (corresponds to S_3-2)
  T_4 has bias -1/4 (corresponds to S_3-4)

Now... to figure the bias for T_1 XOR T_2 XOR T_3 XOR T_4 we first rewrite
each T_i; we note that anyplace in the text where we have:

  U_1-5   ( U_i-j is the S-box input val for round i in bit position j,
            V_i-j is the S-box output val for round i in bit position j )

  U_1-5 = X_5 XOR K_1-5
    because the input for any S-box bit is the XOR of a single bit from the
    plaintext or intermed value (X_i) AND a single bit from the current round
    key from the same position as the X_i
  U_1-7 = X_7 XOR K_1-7
  ...

Further, since the input for any round (not equal to the first round) is the
permuted output of the previous round XORed with the key bits for the current round

  U_2-6 = V_1-6 XOR K_2-6
    the input for round 2 in bit position 6 is the output of round 1 in bit
    position 6 XORed with the key bit for round 2 in bit position 6

  NOTE that in some cases we will need to look at the schematic of the SPN
  (its permutation) in order to figure out *which* V_1-x is used
  but we will always use K_2-6 in this case (or its analog in related cases).

  T_1 = U_1-5 XOR U_1-7 XOR U_1-8 XOR V_1-6
      = X_5 XOR K_1-5 XOR X_7 XOR K_1-7 XOR X_8 XOR K_1-8 XOR V_1-6

  T_2 = U_2-6 XOR V_2-6 XOR V_2-8
      = V_1-6 XOR K_2-6 XOR V_2-6 XOR V_2-8

  T_3 = U_3-6 XOR V_3-6 XOR V_3-8
      = V_2-6 XOR K_3-6 XOR V_3-6 XOR V_3-8

  T_4 = U_3-14 XOR V_3-14 XOR V_3-16
      = V_2-8 XOR K_3-14 XOR V_3-14 XOR V_3-16
        (we note from the picture that the bit feeding position 14 of
         round 3 is V_2-8, not V_2-14)

So...
we see another reason why this choice of S-boxes was a good one:
if we XOR all of the terms (T_1, T_2, T_3, T_4), we'll see some 'intermediate' terms drop out
  -- this also illustrates that these four RVs are not, in fact, independent
     but we may still be able to approximate the bias of T_1 XOR T_2 XOR T_3 XOR T_4
     using the method that we use for indy RVs.

  T_1 XOR T_2 XOR T_3 XOR T_4
    = X_5 XOR X_7 XOR X_8
      XOR V_3-6 XOR V_3-8 XOR V_3-14 XOR V_3-16
      XOR K_1-5 XOR K_1-7 XOR K_1-8 XOR K_2-6 XOR K_3-6 XOR K_3-14

If we compute the bias assuming the 4 T's are indy RVs, we have:

  2^3 * (1/4) * (-1/4)^3 = -1/32

This approximation seems to hold up in practice (but there is no guarantee
that it will, given non-indy RVs, of course).

Now we want to get rid of the V_3-j terms by replacing each one with a
U_4-n term and a K_4-n term, where n is the position that the permutation
sends bit j to. In this case we're following the V's down versus following the U's up.

  V_3-6  = U_4-6  XOR K_4-6    (via inspection)
  V_3-8  = U_4-14 XOR K_4-14   (via inspection)
  V_3-14 = U_4-8  XOR K_4-8    (via inspection)
  V_3-16 = U_4-16 XOR K_4-16   (via inspection)

Then resubstitute into our equation again:

  X_5 XOR X_7 XOR X_8
  XOR (U_4-6 XOR K_4-6) XOR (U_4-14 XOR K_4-14)
  XOR (U_4-8 XOR K_4-8) XOR (U_4-16 XOR K_4-16)
  XOR K_1-5 XOR K_1-7 XOR K_1-8 XOR K_2-6 XOR K_3-6 XOR K_3-14

This expression only involves plaintext bits, key bits, and bits of u^4 (state).

If we imagine that the key bits are fixed, then this RV:

  K_1-5 XOR K_1-7 XOR K_1-8 XOR K_2-6 XOR K_3-6 XOR K_3-14
  XOR K_4-6 XOR K_4-14 XOR K_4-8 XOR K_4-16

  --> has a fixed value (0 or 1)

So ... the RV of the remaining terms:

  X_5 XOR X_7 XOR X_8 XOR U_4-6 XOR U_4-14 XOR U_4-8 XOR U_4-16

  has a bias = +/- 1/32
  --> sign of bias depends on unknown key bits
  --> since this latest RV (of the plaintext and u^4 bits) has a non-zero bias,
      we have some linearity upon which to mount an attack.
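The S-box biases used above, bias(B,4) and bias(4,5), and the -1/32 estimate can be checked by brute force. This is only a sketch: the S-box table below is an assumption (it is meant to be the 4-bit S-box of the SPN example, which these notes never list), and the helper names parity, n_l and bias are made up here. Bit 1 of a, b, x and y is taken to be the most significant bit, matching (1100,0010) == (C,2).

```python
# ASSUMPTION: the 4-bit S-box of the SPN example (not reproduced in these notes).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

def parity(v):
    """XOR of all bits of v: 1 if v has an odd number of 1 bits, else 0."""
    return bin(v).count("1") & 1

def n_l(a, b, sbox=SBOX):
    """N_L(a,b): number of inputs x for which the input bits selected by mask a
    XORed with the output bits selected by mask b give 0."""
    return sum(1 for x in range(len(sbox))
               if (parity(a & x) ^ parity(b & sbox[x])) == 0)

def bias(a, b, sbox=SBOX):
    """e(a,b) = (N_L(a,b) - 8) / 16 for a 4-bit S-box."""
    half = len(sbox) // 2
    return (n_l(a, b, sbox) - half) / len(sbox)

print(n_l(0x9, 0x4), n_l(0x3, 0x9))      # 8 2
print(bias(0xB, 0x4), bias(0x4, 0x5))    # 0.25 -0.25

# piling-up estimate for T_1 XOR T_2 XOR T_3 XOR T_4 (treating them as independent)
trail_bias = 2 ** 3 * bias(0xB, 0x4) * bias(0x4, 0x5) ** 3
print(trail_bias)                        # -0.03125, i.e. -1/32
```

Looping n_l over all 16 x 16 values of (a,b) would give the full linear approximation table.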
Imagine we have T PT/CT pairs -- all of which were encrypted using K
  - we'll be able to obtain 8 key bits
      K_5-5, K_5-6, K_5-7, K_5-8, K_5-13, K_5-14, K_5-15, K_5-16
    (this SPN has 4 rounds and 5 round keys; K_5 is the last one)
  - there are 256 candidate subkeys for these 8 key bits
  - these 8 key bits are XORed with the output of S_4-2 and S_4-4
  - for each candidate subkey we compute a partial decryption of a CT y and
    get the state inputs to these two S-boxes --> U_4-6, U_4-8, U_4-14, U_4-16
  - then we XOR those with our 3 plaintext bits: X_5, X_7, X_8
    in order to evaluate our RV with bias +/- 1/32
  - we do this via:
    -- maintain an array of counters indexed by the 256 possible subkeys
    -- for each PT/CT pair, increment the counter corresponding to a subkey
       whenever, for that subkey's partial decryption of y,
         X_5 XOR X_7 XOR X_8 XOR U_4-6 XOR U_4-8 XOR U_4-14 XOR U_4-16
       equals 0.
    -- after iterating thru all pairs and all candidate subkeys, most counters
       should have a value ~= T/2
    -- however the counter for the correct candidate subkey will have a value
       that is close to: T/2 +/- T/32
    -- in so doing we hope to obtain 8 key bits

==========================================================
(2) Differential cryptanalysis in the large -- 3.4 Stinson
==========================================================

For differential cryptanalysis, we're looking at the XOR of two inputs
compared to the XOR of their corresponding two outputs.
  - XOR the inputs, and compare this to the XOR of the outputs

  x' = x XOR x*   (so x and x* are our two inputs, e.g.)
  y' = y XOR y*   (y, y* are the two corresponding outputs)

Chosen plaintext attack.
  - attacker has large # of PT/CT pairs
  - for any pair of PTs, the two are encrypted using the same key K

PI_S : {0,1}^m --> {0,1}^n is an S-box

  input-XOR:  x XOR x*              of length m
  output-XOR: PI_S(x) XOR PI_S(x*)  of length n

for any x' in {0,1}^m, define the set DELTA(x')
  -- consists of all ordered pairs (x,x*) whose input-XOR equals x'

for each pair in DELTA(x'), we compute the output-XOR of the S-box
then we compute the resulting distribution of output-XORs
  - there are 2^m output-XORs, spread over 2^n possible values
  - a non-uniform distribution of these output-XORs is what provides the
    basis for the attack (the example below is non-uniform even though m = n)

--------
EXAMPLE:
--------

x' = 1011

DELTA(x') = { (0000,1011), (0001,1010), ..., (1111,0100) }

for each ordered pair in DELTA(x') we can figure out the output-XOR of PI_S
(from a table) and then look at the distribution of values within that table

Then we devise another distribution table, N_D(x',y'), which contains the
number of pairs with input-XOR equal to x' and output-XOR equal to y'

so each ROW corresponds to some particular input-XOR value, e.g. 1011
  - then each col contains the # of pairs whose output-XOR is the binary
    representation of the hex column head

    e.g. for x' = 1011, we have a' = x' = 11 == B in HEX
    then we take the output-XOR of each of the pairs in DELTA(x'),
    tabulate how many of them equal 0000, and put that # in column zero
    (and likewise for every other column)

    for example in this case, we have 8 (y,y*) pairs whose XOR is 0010
    so under column 0010 = 2 HEX for this row we have the value 8

We can do substitutions similar to what we did above for U_i-j
except we have the added benefit of being able to XOR out the key bits
since we used the same key to encipher both plaintexts.
  - so the input-XOR of a round equals the permuted output-XOR of the previous round

So if we let a' be the input-XOR and b' be the output-XOR, then we call
(a',b') a differential.
  - each entry in the table described above (e.g.
N_D(a',b')) gives an XOR-propagation ratio for that differential

    R_p(a',b') = N_D(a',b') / 2^m = Pr[ output-XOR = b' | input-XOR = a']

We can construct differential trails by chaining differentials: the
input-XOR of a differential in one round must equal the permuted output-XOR
of the differential(s) in the previous round
  - we assume that the propagation ratios comprising a trail are independent
  - and in so doing can obtain the est. propagation ratio of the trail
    (by multiplying them)

Tuples (x, x*, y, y*) for which the differential holds are "right pairs"
  - these are what allow us to figure out the relevant key bits
  - so we can explicitly filter out tuples that cannot be right pairs,
    i.e. noise (improving reliability?)

========================================================================================
(3) Description of the linearity present in DES ("On the Security of DES", Shamir, '85):
========================================================================================

-- recall that each S-box takes 6 bits as input where two of those bits
   determine the row and 4 of those bits determine the column
-- then the output is 4 bits
-- recall also that the S-boxes are what make DES non-linear to the extent that it is
-- we note that for the first S-box, S_1, for 1/2 of the possible input
   columns (columns 1 through 8), there are 7 4-bit outputs whose four bits
   XOR to 0 (out of the 32 table entries in those columns). Whereas for the
   other 1/2 of possible input columns (columns 9 through 16), there are 25
   4-bit outputs whose bits XOR to 0. So if we have a 4-bit output and its
   XOR is 0, then it's more probable that the input column was from that
   second half of columns.
-- this same pattern (of unequal distribution of 4-bit output strings whose
   XOR is 0) holds for each of the 8 S-boxes, though in some cases the right
   half (of cols) will produce more -- as above -- whereas in other cases,
   for S_3, S_4, S_6, it is the left half of columns whose output's XOR is
   more likely to be 0. And it's not always 7 vs 25; in some cases it's worse (S_5 is 6 vs.
26) and in some cases it's not so stark (S_4 is 20 vs. 12)
-- recall as well that the input for each of the 8 S-boxes is the XOR of the
   48-bit round key (for that round) and the expansion of the 32-bit right half
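The 7 vs. 25 split quoted above for S_1 can be reproduced by counting table entries whose four output bits XOR to 0. A minimal sketch, with two assumptions: the S1 table below is the standard published DES S_1 (it is not listed in these notes), and "columns 1 through 8" is read as column indices 0..7 of that table. The helper name count_even_parity is made up.

```python
# ASSUMPTION: DES S-box S_1 as published in the DES standard (4 rows x 16 columns).
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def count_even_parity(table, cols):
    """Number of entries in the given columns whose bits XOR to 0
    (i.e. entries with an even number of 1 bits)."""
    return sum(1 for row in table for c in cols
               if bin(row[c]).count("1") % 2 == 0)

left = count_even_parity(S1, range(0, 8))     # columns 1 through 8
right = count_even_parity(S1, range(8, 16))   # columns 9 through 16
print(left, right)                            # 7 25
```

Running the same count over the other seven S-box tables would be the way to check the per-box figures (e.g. the S_4 and S_5 splits), which are not verified here.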