Coding and Cryptography

Dr T.A. Fisher

Michaelmas 2005

LaTeXed by Sebastian Pancratz

These notes are based on a course of lectures given by Dr T.A. Fisher in Part II of the Mathematical Tripos at the University of Cambridge in the academic year 2005-2006. These notes have not been checked by Dr T.A. Fisher and should not be regarded as official notes for the course. In particular, the responsibility for any errors is mine; please email Sebastian Pancratz (sfp25) with any comments or corrections.

Contents

1 Noiseless Coding
2 Error-control Codes
3 Shannon's Theorems
4 Linear and Cyclic Codes
5 Cryptography

Introduction to Communication Channels

We model communication as illustrated in the following diagram.

[Diagram: Source -> Encoder -> Channel -> Decoder -> Receiver, with errors ("noise") entering at the channel.]

Examples include telegraphs, mobile phones, fax machines, modems, compact discs, or a space probe sending back a picture.

Basic Problem

Given a source and a channel (modelled probabilistically), we must design an encoder and decoder to transmit messages economically (noiseless coding, data compression) and reliably (noisy coding).

Example (Noiseless coding). In Morse code, common letters are given shorter codewords: E is a single dot and A is dot-dash, whereas rare letters such as Q and Z have four-symbol codewords. Noiseless coding is adapted to the source.

Example (Noisy coding). Every book has an ISBN a_1 a_2 ... a_10, where a_i ∈ {0,1,...,9} for 1 ≤ i ≤ 9 and a_10 ∈ {0,1,...,9,X}, with

    ∑_{j=1}^{10} j a_j ≡ 0 (mod 11).

This allows detection of errors such as
• one incorrect digit;
• transposition of two digits.
Noisy coding is adapted to the channel.

Plan of the Course

I. Noiseless Coding
II. Error-control Codes
III. Shannon's Theorems
IV. Linear and Cyclic Codes
V. Cryptography

Useful books for this course include the following.

• D. Welsh: Codes & Cryptography, OUP 1988.
• C.M. Goldie, R.G.E. Pinch: Communication Theory, CUP 1991.
• T.M. Cover, J.A. Thomas: Elements of Information Theory, Wiley 1991.
• W. Trappe, L.C. Washington: Introduction to Cryptography with Coding Theory, Prentice Hall 2002.

The books mentioned above cover the following parts of the course.

               W      G & P     C & T     T & W
    I & III    X        X         X
    II & IV    X        X                   X
    V          X                            X

Overview

Definition. A communication channel accepts symbols from an alphabet Σ_1 = {a_1, ..., a_r} and outputs symbols from an alphabet Σ_2 = {b_1, ..., b_s}. The channel is modelled by the probabilities

    P(y_1 y_2 ... y_n received | x_1 x_2 ... x_n sent).

Definition. A discrete memoryless channel (DMC) is a channel with

    p_ij = P(b_j received | a_i sent)

the same for each channel use and independent of all past and future uses of the channel. The channel matrix is P = (p_ij), an r × s stochastic matrix.

Definition. The binary symmetric channel (BSC) with error probability 0 ≤ p ≤ 1 has Σ_1 = Σ_2 = {0,1}. The channel matrix is

    ( 1-p    p  )
    (  p    1-p ).

A symbol is transmitted correctly with probability 1-p.

Definition. The binary erasure channel has Σ_1 = {0,1} and Σ_2 = {0,1,?}. The channel matrix is

    ( 1-p    0     p )
    (  0    1-p    p ).

We model n uses of a channel by the nth extension, with input alphabet Σ_1^n and output alphabet Σ_2^n.

A code C of length n is a function M → Σ_1^n, where M is the set of possible messages. Implicitly, we also have a decoding rule Σ_2^n → M. The size of C is m = |M|. The information rate is ρ(C) = (1/n) log_2 m. The error rate is ê(C) = max_{x∈M} P(error | x sent).
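To make the BSC model concrete, here is a minimal Python sketch (purely illustrative; the function names and the sample error probabilities are my own choices, not part of the notes) that simulates single uses of a BSC and checks that the empirical probability of a symbol being corrupted is close to the parameter p.

    import random

    def bsc(bit, p):
        # Binary symmetric channel: flip the transmitted bit with probability p,
        # otherwise deliver it unchanged.
        return bit ^ 1 if random.random() < p else bit

    def empirical_flip_rate(p, trials=100_000):
        # Estimate P(received != sent) for a single use of the channel.
        flips = sum(bsc(0, p) != 0 for _ in range(trials))
        return flips / trials

    for p in (0.01, 0.1, 0.25):
        print(p, empirical_flip_rate(p))   # each estimate should be close to p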
Definition. A channel can transmit reliably at rate R if there exist codes (C_n)_{n=1}^∞, with C_n a code of length n, such that

    lim_{n→∞} ρ(C_n) = R   and   lim_{n→∞} ê(C_n) = 0.

Definition. The capacity of a channel is the supremum over all reliable transmission rates.

Fact. A BSC with error probability p < 1/2 has non-zero capacity.

Chapter 1

Noiseless Coding

Notation. For Σ an alphabet, let Σ* = ⋃_{n≥0} Σ^n be the set of all finite strings from Σ. Strings x = x_1...x_r and y = y_1...y_s have concatenation xy = x_1...x_r y_1...y_s.

Definition. Let Σ_1, Σ_2 be alphabets. A code is a function f: Σ_1 → Σ_2*. The strings f(x) for x ∈ Σ_1 are called codewords or words.

Example (Greek Fire Code). Σ_1 = {α, β, γ, ..., ω}, Σ_2 = {1, 2, 3, 4, 5}, with f(α) = 11, f(β) = 12, ..., f(ψ) = 53, f(ω) = 54. Here, xy means x torches held up and another y torches close by.

Example. Let Σ_1 be the words in a given dictionary and Σ_2 = {A, B, C, ..., Z, ␣}, where ␣ denotes a space. Then f is "spell the word and follow it by a space".

We send a message x_1 ... x_n ∈ Σ_1* as f(x_1)f(x_2)...f(x_n) ∈ Σ_2*, i.e. f extends to a function f*: Σ_1* → Σ_2*.

Definition. f is decipherable if f* is injective, i.e. each string in Σ_2* corresponds to at most one message.

Note 1. We need f to be injective, but this is not enough.

Example. Let Σ_1 = {1, 2, 3, 4}, Σ_2 = {0, 1} and f(1) = 0, f(2) = 1, f(3) = 00, f(4) = 01. Then f*(114) = 0001 = f*(312). Here f is injective but not decipherable.

Notation. If |Σ_1| = m and |Σ_2| = a, then f is an a-ary code of size m.

Our aim is to construct decipherable codes with short word lengths. Assuming f is injective, the following codes are always decipherable.
(i) A block code has all codewords of the same length.
(ii) A comma code reserves a letter from Σ_2 to signal the end of a word.
(iii) A prefix-free code is one where no codeword is a prefix of any other distinct codeword. (If x, y ∈ Σ_2*, then x is a prefix of y if y = xz for some z ∈ Σ_2*.)

Note 2. (i) and (ii) are special cases of (iii). Prefix-free codes are sometimes called instantaneous codes or self-punctuating codes.

Exercise 1. Construct a decipherable code which is not prefix-free. For example, take Σ_1 = {1, 2}, Σ_2 = {0, 1} and set f(1) = 0, f(2) = 01.

Theorem 1.1 (Kraft's Inequality). Let |Σ_1| = m and |Σ_2| = a. A prefix-free code f: Σ_1 → Σ_2* with word lengths s_1, ..., s_m exists if and only if

    ∑_{i=1}^m a^{-s_i} ≤ 1.   (∗)

Proof. Rewrite (∗) as

    ∑_{l=1}^s n_l a^{-l} ≤ 1   (∗∗)

where n_l is the number of codewords of length l and s = max_{1≤i≤m} s_i.

If f: Σ_1 → Σ_2* is prefix-free, then

    n_1 a^{s-1} + n_2 a^{s-2} + ··· + n_{s-1} a + n_s ≤ a^s,

since the LHS is the number of strings of length s in Σ_2 with some codeword of f as a prefix and the RHS is the number of strings of length s; dividing by a^s gives (∗∗).

For the converse, given n_1, ..., n_s satisfying (∗∗), we need to construct a prefix-free code f with n_l codewords of length l, for all l ≤ s. We proceed by induction on s. The case s = 1 is clear. (Here, (∗∗) gives n_1 ≤ a, so we can choose a code.) By the induction hypothesis, there exists a prefix-free code g with n_l codewords of length l for all l ≤ s-1. Now (∗∗) implies

    n_1 a^{s-1} + n_2 a^{s-2} + ··· + n_{s-1} a + n_s ≤ a^s,

where the first s-1 terms on the LHS sum to the number of strings of length s with some codeword of g as a prefix and the RHS is the number of strings of length s. Hence we can add at least n_s new codewords of length s to g and maintain the prefix-free property.

Remark. The proof is constructive: just choose codewords in order of increasing length, ensuring that no previous codeword is a prefix.
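As a concrete illustration of this constructive step, the following Python sketch (my own illustrative code; the function name and the example word lengths are not from the notes) checks (∗) and then chooses codewords greedily in order of increasing length, exactly as in the remark.

    from itertools import product

    def prefix_free_code(lengths, alphabet="01"):
        # Greedy construction from the proof of Theorem 1.1: process the requested
        # word lengths in increasing order and, for each one, take the first string
        # (in lexicographic order) with no previously chosen codeword as a prefix.
        a = len(alphabet)
        if sum(a ** (-l) for l in lengths) > 1:
            raise ValueError("Kraft's inequality fails; no prefix-free code exists")
        chosen = []
        for l in sorted(lengths):
            for candidate in ("".join(t) for t in product(alphabet, repeat=l)):
                if not any(candidate.startswith(c) for c in chosen):
                    chosen.append(candidate)
                    break
        return chosen

    # For example, word lengths 1, 2, 3, 3 over the binary alphabet:
    # prefix_free_code([1, 2, 3, 3]) returns ['0', '10', '110', '111'].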
Theorem 1.2 (McMillan). Any decipherable code satisfies Kraft's inequality.

Proof (Kamish). Let f: Σ_1 → Σ_2* be a decipherable code with word lengths s_1, ..., s_m. Let s = max_{1≤i≤m} s_i. For r ∈ ℕ,

    ( ∑_{i=1}^m a^{-s_i} )^r = ∑_{l=1}^{rs} b_l a^{-l}

where

    b_l = |{x ∈ Σ_1^r : f*(x) has length l}| ≤ |Σ_2^l| = a^l,

using that f* is injective. Then

    ( ∑_{i=1}^m a^{-s_i} )^r ≤ ∑_{l=1}^{rs} a^l a^{-l} = rs,

so

    ∑_{i=1}^m a^{-s_i} ≤ (rs)^{1/r} → 1 as r → ∞.

Therefore ∑_{i=1}^m a^{-s_i} ≤ 1.

Corollary 1.3. A decipherable code with prescribed word lengths exists if and only if a prefix-free code with the same word lengths exists.

Entropy is a measure of "randomness" or "uncertainty". A random variable X takes values x_1, ..., x_n with probabilities p_1, ..., p_n, where 0 ≤ p_i ≤ 1 and ∑ p_i = 1. The entropy H(X) is, roughly speaking, the expected number of fair coin tosses needed to simulate X.

Example. Suppose that p_1 = p_2 = p_3 = p_4 = 1/4. Identify {x_1, x_2, x_3, x_4} = {HH, HT, TH, TT}, so two tosses determine X, i.e. the entropy is H = 2.

Example. Let (p_1, p_2, p_3, p_4) = (1/2, 1/4, 1/8, 1/8).

[Tree diagram: toss a coin repeatedly; H gives x_1, TH gives x_2, TTH gives x_3 and TTT gives x_4.]

Hence H = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 7/4.

Definition. The entropy of X is H(X) = -∑_{i=1}^n p_i log p_i = H(p_1, ..., p_n), where in this course log = log_2.

Note 3. H(X) is always non-negative. It is measured in bits.

Exercise 2. By convention 0 log 0 = 0. Show that x log x → 0 as x → 0.

Example. A biased coin has P(Heads) = p, P(Tails) = 1-p. We abbreviate H(p, 1-p) as H(p). Then

    H(p) = -p log p - (1-p) log(1-p),
    H'(p) = log((1-p)/p).

[Graph of H(p) against p: H rises from 0 at p = 0 to 1 at p = 1/2 and falls back to 0 at p = 1.]

The entropy is greatest for p = 1/2, i.e. a fair coin.

Lemma 1.4 (Gibbs' Inequality). Let (p_1, ..., p_n) and (q_1, ..., q_n) be probability distributions. Then

    -∑_{i=1}^n p_i log p_i ≤ -∑_{i=1}^n p_i log q_i

with equality if and only if p_i = q_i for all i.

Proof. Since log x = ln x / ln 2, we may replace log by ln for the duration of this proof. Let I = {1 ≤ i ≤ n : p_i ≠ 0}.

[Graph: the curve ln x lies below the line x - 1, touching it only at x = 1.]

We have

    ln x ≤ x - 1 for all x > 0   (∗)

with equality if and only if x = 1. Hence

    ln(q_i/p_i) ≤ q_i/p_i - 1 for all i ∈ I,

so

    ∑_{i∈I} p_i ln(q_i/p_i) ≤ ∑_{i∈I} q_i - ∑_{i∈I} p_i = ∑_{i∈I} q_i - 1 ≤ 0.

Therefore

    -∑_{i∈I} p_i ln p_i ≤ -∑_{i∈I} p_i ln q_i,

and hence

    -∑_{i=1}^n p_i ln p_i ≤ -∑_{i=1}^n p_i ln q_i.

If equality holds, then ∑_{i∈I} q_i = 1 and q_i/p_i = 1 for all i ∈ I. Therefore p_i = q_i for all 1 ≤ i ≤ n.

Corollary 1.5. H(p_1, ..., p_n) ≤ log n with equality if and only if p_1 = ··· = p_n = 1/n.

Proof. Take q_1 = ··· = q_n = 1/n in Gibbs' inequality.

Let Σ_1 = {µ_1, ..., µ_m} and |Σ_2| = a. The random variable X takes values µ_1, ..., µ_m with probabilities p_1, ..., p_m.
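The quantities above are easy to check numerically. Here is a small Python sketch (illustrative only; the helper names are my own) that computes H(p_1, ..., p_n) for the two four-outcome examples above and verifies one instance of Gibbs' inequality.

    from math import log2

    def entropy(ps):
        # H(p_1, ..., p_n) = -sum p_i log2 p_i, with the convention 0 log 0 = 0.
        return -sum(p * log2(p) for p in ps if p > 0)

    def cross_entropy(ps, qs):
        # -sum p_i log2 q_i; by Gibbs' inequality this is at least entropy(ps).
        return -sum(p * log2(q) for p, q in zip(ps, qs) if p > 0)

    print(entropy([1/4, 1/4, 1/4, 1/4]))      # 2.0
    print(entropy([1/2, 1/4, 1/8, 1/8]))      # 1.75, i.e. 7/4

    p = [1/2, 1/4, 1/8, 1/8]
    q = [1/4, 1/4, 1/4, 1/4]
    assert entropy(p) <= cross_entropy(p, q)  # 1.75 <= 2.0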