Coding and Cryptography

Dr T.A. Fisher
Michaelmas 2005
LaTeXed by Sebastian Pancratz

These notes are based on a course of lectures given by Dr T.A. Fisher in Part II of the Mathematical Tripos at the University of Cambridge in the academic year 2005–2006. These notes have not been checked by Dr T.A. Fisher and should not be regarded as official notes for the course. In particular, the responsibility for any errors is mine; please email Sebastian Pancratz (sfp25) with any comments or corrections.

Contents
1 Noiseless Coding
2 Error-control Codes
3 Shannon's Theorems
4 Linear and Cyclic Codes
5 Cryptography

Introduction to Communication Channels

We model communication by the chain

  Source → Encoder → Channel → Decoder → Receiver,

where the channel is subject to errors ("noise"). Examples include telegraphs, mobile phones, fax machines, modems, compact discs, or a space probe sending back a picture.

Basic Problem. Given a source and a channel (modelled probabilistically), we must design an encoder and decoder to transmit messages economically (noiseless coding, data compression) and reliably (noisy coding).

Example (Noiseless coding). In Morse code, common letters are given shorter codewords, e.g. A = ·−, E = ·, Q = −−·−, and Z = −−··. Noiseless coding is adapted to the source.

Example (Noisy coding). Every book has an ISBN a_1 a_2 ... a_10, where a_i ∈ {0,1,...,9} for 1 ≤ i ≤ 9 and a_10 ∈ {0,1,...,9,X}, with ∑_{j=1}^{10} j a_j ≡ 0 (mod 11). This allows detection of errors such as
  • one incorrect digit;
  • transposition of two digits.
Noisy coding is adapted to the channel.

Plan of the Course
I. Noiseless Coding
II. Error-control Codes
III. Shannon's Theorems
IV. Linear and Cyclic Codes
V. Cryptography

Useful books for this course include the following.
  • D. Welsh: Codes & Cryptography, OUP 1988.
  • C.M. Goldie, R.G.E. Pinch: Communication Theory, CUP 1991.
  • T.M. Cover, J.A. Thomas: Elements of Information Theory, Wiley 1991.
  • W. Trappe, L.C. Washington: Introduction to Cryptography with Coding Theory, Prentice Hall 2002.

The books mentioned above cover the following parts of the course.

             W     G & P   C & T   T & W
  I & III    X     X       X
  II & IV    X     X               X
  V          X                     X

Overview

Definition. A communication channel accepts symbols from an alphabet Σ_1 = {a_1,...,a_r} and outputs symbols from an alphabet Σ_2 = {b_1,...,b_s}. The channel is modelled by the probabilities P(y_1 y_2 ... y_n received | x_1 x_2 ... x_n sent).

Definition. A discrete memoryless channel (DMC) is a channel with
  p_ij = P(b_j received | a_i sent)
the same for each channel use and independent of all past and future uses of the channel. The channel matrix is P = (p_ij), an r × s stochastic matrix.

Definition. The binary symmetric channel (BSC) with error probability 0 ≤ p ≤ 1 has Σ_1 = Σ_2 = {0,1}. The channel matrix is
  ( 1−p    p  )
  (  p    1−p ).
A symbol is transmitted correctly with probability 1−p.

Definition. The binary erasure channel has Σ_1 = {0,1} and Σ_2 = {0,1,?}. The channel matrix is
  ( 1−p    0    p )
  (  0    1−p   p ).

We model n uses of a channel by the nth extension, with input alphabet Σ_1^n and output alphabet Σ_2^n.

A code C of length n is a function M → Σ_1^n, where M is the set of possible messages. Implicitly, we also have a decoding rule Σ_2^n → M. The size of C is m = |M|. The information rate is ρ(C) = (1/n) log_2 m. The error rate is ê(C) = max_{x ∈ M} P(error | x sent).
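The following Python snippet is not part of the notes; it is a minimal sketch illustrating the last two definitions by estimating the error rate ê(C) of the length-5 repetition code, with majority-vote decoding, over a BSC. The function names and parameters are my own choices.

```python
import random
from math import log2

def bsc(bits, p):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ 1 if random.random() < p else b for b in bits]

def encode(bit, n):
    """Length-n repetition code C with M = {0, 1}: the message bit is sent n times."""
    return [bit] * n

def decode(received):
    """Majority-vote decoding rule from Sigma_2^n back to M."""
    return 1 if 2 * sum(received) > len(received) else 0

n, p, trials = 5, 0.1, 100_000
# By symmetry, P(error | 0 sent) = P(error | 1 sent), so this estimates
# e_hat(C) = max over messages x of P(error | x sent).
err = sum(decode(bsc(encode(0, n), p)) != 0 for _ in range(trials)) / trials
print(f"rho(C) = {log2(2) / n:.2f}, estimated e_hat(C) = {err:.4f}")
```

Increasing n drives ê(C) towards 0 for p < 1/2, but only at the cost of sending the information rate ρ(C) = 1/n to 0 as well; the point of the definitions below is to ask for reliability at a fixed non-zero rate.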
Definition. A channel can transmit reliably at rate R if there exist codes (C_n)_{n=1}^∞, with C_n a code of length n, such that
  lim_{n→∞} ρ(C_n) = R,
  lim_{n→∞} ê(C_n) = 0.

Definition. The capacity of a channel is the supremum over all reliable transmission rates.

Fact. A BSC with error probability p < 1/2 has non-zero capacity.

Chapter 1: Noiseless Coding

Notation. For Σ an alphabet, let Σ* = ⋃_{n≥0} Σ^n be the set of all finite strings from Σ. Strings x = x_1...x_r and y = y_1...y_s have concatenation xy = x_1...x_r y_1...y_s.

Definition. Let Σ_1, Σ_2 be alphabets. A code is a function f: Σ_1 → Σ_2*. The strings f(x) for x ∈ Σ_1 are called codewords or words.

Example (Greek Fire Code). Σ_1 = {α, β, γ, ..., ω}, Σ_2 = {1,2,3,4,5}. f(α) = 11, f(β) = 12, ..., f(ψ) = 53, f(ω) = 54. Here, xy means x torches held up and another y torches close by.

Example. Let Σ_1 be the words in a given dictionary and Σ_2 = {A, B, C, ..., Z, ␣}. f is "spell the word and follow it by a space".

We send a message x_1...x_n ∈ Σ_1* as f(x_1)f(x_2)...f(x_n) ∈ Σ_2*, i.e. f extends to a function f*: Σ_1* → Σ_2*.

Definition. f is decipherable if f* is injective, i.e. each string from Σ_2 corresponds to at most one message.

Note 1. We need f to be injective, but this is not enough.

Example. Let Σ_1 = {1,2,3,4}, Σ_2 = {0,1} and f(1) = 0, f(2) = 1, f(3) = 00, f(4) = 01. Then f*(114) = 0001 = f*(312). Here f is injective but not decipherable.

Notation. If |Σ_1| = m and |Σ_2| = a, then f is an a-ary code of size m.

Our aim is to construct decipherable codes with short word lengths. Assuming f is injective, the following codes are always decipherable.
  (i) A block code has all codewords of the same length.
  (ii) A comma code reserves a letter from Σ_2 to signal the end of a word.
  (iii) A prefix-free code is one where no codeword is a prefix of any other distinct codeword. (If x, y ∈ Σ_2*, then x is a prefix of y if y = xz for some z ∈ Σ_2*.)

Note 2. (i) and (ii) are special cases of (iii). Prefix-free codes are sometimes called instantaneous codes or self-punctuating codes.

Exercise 1. Construct a decipherable code which is not prefix-free. For example, take Σ_1 = {1,2}, Σ_2 = {0,1} and set f(1) = 0, f(2) = 01.

Theorem 1.1 (Kraft's Inequality). Let |Σ_1| = m and |Σ_2| = a. A prefix-free code f: Σ_1 → Σ_2* with word lengths s_1, ..., s_m exists if and only if
  ∑_{i=1}^m a^{-s_i} ≤ 1.   (∗)

Proof. Rewrite (∗) as
  ∑_{l=1}^s n_l a^{-l} ≤ 1,   (∗∗)
where n_l is the number of codewords of length l and s = max_{1≤i≤m} s_i.

If f: Σ_1 → Σ_2* is prefix-free, then
  n_1 a^{s−1} + n_2 a^{s−2} + ··· + n_{s−1} a + n_s ≤ a^s,
since the LHS is the number of strings of length s in Σ_2 with some codeword of f as a prefix and the RHS is the number of strings of length s; dividing by a^s gives (∗∗).

For the converse, given n_1, ..., n_s satisfying (∗∗), we need to construct a prefix-free code f with n_l codewords of length l, for all l ≤ s. We proceed by induction on s. The case s = 1 is clear. (Here, (∗∗) gives n_1 ≤ a, so we can choose a code.) By the induction hypothesis, there exists a prefix-free code g with n_l codewords of length l for all l ≤ s−1. (∗∗) implies
  n_1 a^{s−1} + n_2 a^{s−2} + ··· + n_{s−1} a + n_s ≤ a^s,
where the first s−1 terms on the LHS sum to the number of strings of length s with some codeword of g as a prefix and the RHS is the number of strings of length s. Hence we can add at least n_s new codewords of length s to g and maintain the prefix-free property.

Remark. The proof is constructive: just choose codewords in order of increasing length, ensuring that no previous codeword is a prefix.
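As a brief aside, not from the notes, the constructive remark above can be turned into a short Python sketch: given word lengths satisfying Kraft's inequality, pick codewords in order of increasing length, skipping any string that has a previously chosen codeword as a prefix. The function name and the restriction to alphabets of at most ten symbols are my own simplifications.

```python
from itertools import product

def kraft_prefix_free_code(lengths, a=2):
    """Greedy construction from the proof of Kraft's inequality: given word
    lengths s_1,...,s_m with sum(a^{-s_i}) <= 1, build an a-ary prefix-free
    code by choosing codewords in increasing order of length, skipping any
    string that has a previously chosen codeword as a prefix.
    (Assumes a <= 10 so that single digits can serve as the alphabet.)"""
    assert sum(a ** -s for s in lengths) <= 1, "Kraft's inequality fails"
    digits = "0123456789"[:a]
    chosen = []
    for s in sorted(lengths):
        for candidate in ("".join(t) for t in product(digits, repeat=s)):
            if not any(candidate.startswith(c) for c in chosen):
                chosen.append(candidate)  # a free string exists by the counting argument
                break
    return chosen

# Lengths (1, 2, 3, 3) over a binary alphabet satisfy 1/2 + 1/4 + 1/8 + 1/8 = 1;
# the greedy choice gives ['0', '10', '110', '111'].
print(kraft_prefix_free_code([1, 2, 3, 3]))
```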
Theorem 1.2 (McMillan). Any decipherable code satisfies Kraft's inequality.

Proof (Kamish). Let f: Σ_1 → Σ_2* be a decipherable code with word lengths s_1, ..., s_m. Let s = max_{1≤i≤m} s_i. For r ∈ ℕ,
  (∑_{i=1}^m a^{-s_i})^r = ∑_{l=1}^{rs} b_l a^{-l},
where
  b_l = |{x ∈ Σ_1^r : f*(x) has length l}| ≤ |Σ_2^l| = a^l,
using that f* is injective. Then
  (∑_{i=1}^m a^{-s_i})^r ≤ ∑_{l=1}^{rs} a^l a^{-l} = rs,
so
  ∑_{i=1}^m a^{-s_i} ≤ (rs)^{1/r} → 1 as r → ∞.
Therefore ∑_{i=1}^m a^{-s_i} ≤ 1.

Corollary 1.3. A decipherable code with prescribed word lengths exists if and only if a prefix-free code with the same word lengths exists.

Entropy is a measure of "randomness" or "uncertainty". A random variable X takes values x_1, ..., x_n with probabilities p_1, ..., p_n, where 0 ≤ p_i ≤ 1 and ∑ p_i = 1. The entropy H(X) is, roughly speaking, the expected number of fair coin tosses needed to simulate X.

Example. Suppose that p_1 = p_2 = p_3 = p_4 = 1/4. Identify {x_1, x_2, x_3, x_4} = {HH, HT, TH, TT}, i.e. the entropy is H = 2.

Example. Let (p_1, p_2, p_3, p_4) = (1/2, 1/4, 1/8, 1/8). Toss a fair coin repeatedly: H gives x_1, TH gives x_2, TTH gives x_3, and TTT gives x_4. Hence
  H = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 7/4.

Definition. The entropy of X is H(X) = −∑_{i=1}^n p_i log p_i = H(p_1, ..., p_n), where in this course log = log_2.

Note 3. H(X) is always non-negative. It is measured in bits.

Exercise 2. By convention 0 log 0 = 0. Show that x log x → 0 as x → 0.

Example. A biased coin has P(Heads) = p and P(Tails) = 1−p. We abbreviate H(p, 1−p) as H(p). Then
  H(p) = −p log p − (1−p) log(1−p),
  H′(p) = log((1−p)/p).
[Plot of H(p) against p for 0 ≤ p ≤ 1.] The entropy is greatest for p = 1/2, i.e. a fair coin.

Lemma 1.4 (Gibbs' Inequality). Let (p_1, ..., p_n) and (q_1, ..., q_n) be probability distributions. Then
  −∑_{i=1}^n p_i log p_i ≤ −∑_{i=1}^n p_i log q_i,
with equality if and only if p_i = q_i for all i.

Proof. Since log x = ln x / ln 2, we may replace log by ln for the duration of this proof. Let I = {1 ≤ i ≤ n : p_i ≠ 0}. [Plot of ln x against x, together with the line x − 1.] We have
  ln x ≤ x − 1 for all x > 0,   (∗)
with equality if and only if x = 1. Hence
  ln(q_i / p_i) ≤ q_i / p_i − 1 for all i ∈ I,
so
  ∑_{i∈I} p_i ln(q_i / p_i) ≤ ∑_{i∈I} q_i − ∑_{i∈I} p_i = ∑_{i∈I} q_i − 1 ≤ 0.
Therefore
  −∑_{i∈I} p_i ln p_i ≤ −∑_{i∈I} p_i ln q_i,
and hence
  −∑_{i=1}^n p_i ln p_i ≤ −∑_{i=1}^n p_i ln q_i.
If equality holds, then ∑_{i∈I} q_i = 1 and q_i / p_i = 1 for all i ∈ I. Therefore p_i = q_i for all 1 ≤ i ≤ n.

Corollary 1.5. H(p_1, ..., p_n) ≤ log n, with equality if and only if p_1 = ··· = p_n = 1/n.

Proof. Take q_1 = ··· = q_n = 1/n in Gibbs' inequality.

Let Σ_1 = {µ_1, ..., µ_m} and |Σ_2| = a. The random variable X takes values µ_1, ..., µ_m with probabilities p_1, ..., p_m.
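As a quick illustration, not part of the notes, the following Python sketch evaluates the entropy formula above and reproduces the two worked examples; the function names are my own.

```python
from math import log2

def entropy(probabilities):
    """H(X) = -sum_i p_i log2 p_i, with the convention 0 * log 0 = 0."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

print(entropy([1/4, 1/4, 1/4, 1/4]))   # 2.0 bits (the uniform example)
print(entropy([1/2, 1/4, 1/8, 1/8]))   # 1.75 bits, i.e. 7/4

# H(p) for a biased coin; by Corollary 1.5 (with n = 2) it is maximised at p = 1/2.
H = lambda p: entropy([p, 1 - p])
print(H(0.5), H(0.1))                   # 1.0 and roughly 0.469
```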
