
Compression and Predictive Distributions for Large Alphabets


Abstract

Compression and Predictive Distributions for Large Alphabets

Xiao Yang, 2015

Data generated from large alphabets exist almost everywhere in our lives, for example, texts, images and videos. Traditional universal compression algorithms mostly involve small alphabets and implicitly assume an asymptotic regime in which the extra bits incurred in the compression process vanish as the amount of data grows without bound. In this thesis, we focus on compression and prediction for large alphabets, with the alphabet size comparable to or larger than the sample size.

We first consider sequences of random variables independently and identically generated from a large alphabet. In particular, the size of the sample is allowed to be variable. A product distribution based on Poisson sampling and tilting is proposed as the coding distribution; through independence, it greatly simplifies both implementation and analysis. Moreover, we characterize the behavior of the coding distribution through a condition on the tail sum of the ordered counts, and apply it to sequences satisfying this condition. Further, we apply this method to envelope classes. This coding distribution provides a convenient way to approximately compute Shtarkov's normalized maximum likelihood (NML) distribution, and the extra price paid for this convenience is small compared to the total cost. Furthermore, we find that this coding distribution can also be used to calculate the NML distribution by a Monte Carlo method at no extra price, and this calculation remains simple owing to the independence of the coding distribution.

Further, we consider a more realistic class, the Markov class, and in particular tree sources. A context-tree-based algorithm is designed to describe the dependencies among the contexts. It is a greedy algorithm that seeks the greatest savings in codelength when constructing the tree. Compression and prediction of the individual counts associated with the contexts use the same coding distribution as in the i.i.d. case. Combining these two procedures, we demonstrate a compression algorithm based on the tree model.

Results of simulation and real-data experiments for both the i.i.d. model and the Markov model are included to illustrate the performance of the proposed algorithms.
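The abstract leans on Shtarkov's normalized maximum likelihood (NML) distribution and the regret of a coding distribution. As a reader's aid, here are the standard definitions these terms refer to in universal coding; the notation below is ours, not the thesis's:

```latex
% NML distribution for a model class {p_theta} over sequences x^n:
q_{\mathrm{NML}}(x^n) = \frac{\sup_\theta p_\theta(x^n)}{C_n},
\qquad
C_n = \sum_{\tilde{x}^n} \sup_\theta p_\theta(\tilde{x}^n).

% Pointwise regret of a coding distribution q:
\mathrm{reg}(q, x^n) = \log \frac{\sup_\theta p_\theta(x^n)}{q(x^n)}.
% q_NML minimizes the worst-case regret over x^n, and the minimax
% value equals \log C_n (the log of the Shtarkov sum).
```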
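The abstract's claim that the coding distribution can drive a Monte Carlo computation of the NML distribution comes down to importance sampling of the normalizer C_n above. Here is a minimal sketch for an i.i.d. multinomial class, using the uniform distribution as the proposal for simplicity; the thesis instead uses its tilted product distribution as the proposal, which matches the target far better. All function names are illustrative, not from the thesis:

```python
import math
import random
from collections import Counter

def max_log_likelihood(counts, n):
    # Log of the maximized i.i.d. likelihood of a sequence with these
    # symbol counts: sum_j n_j * log(n_j / n).
    return sum(c * math.log(c / n) for c in counts.values() if c > 0)

def log_shtarkov_sum_mc(m, n, num_samples=100_000, rng=random):
    # Importance-sampling estimate of log C_n, where
    # C_n = sum over all x^n of max_theta p_theta(x^n).
    # With a uniform proposal U, C_n = m^n * E_U[ phat(X^n) ].
    total = 0.0
    for _ in range(num_samples):
        counts = Counter(rng.randrange(m) for _ in range(n))
        total += math.exp(max_log_likelihood(counts, n))
    return n * math.log(m) + math.log(total / num_samples)
```

The estimator is unbiased, but its variance depends entirely on how well the proposal tracks the maximized likelihood, which is exactly where a well-chosen product distribution earns its keep when the alphabet size m is comparable to n.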
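For the Markov part, the abstract describes a greedy construction that repeatedly makes the split yielding the greatest codelength saving. The following is a minimal sketch of that greedy loop, assuming a plug-in (empirical-entropy) codelength and a flat per-node description cost; the thesis instead charges codelength under its tilted coding distribution and a principled description cost, so every constant and helper here is illustrative:

```python
import math
from collections import Counter

def code_length(counts):
    # Empirical-entropy codelength (nats) for symbols following a context;
    # a stand-in for the thesis's tilted coding distribution.
    n = sum(counts.values())
    return -sum(c * math.log(c / n) for c in counts.values() if c > 0)

def next_symbol_counts(data, context):
    # Counts of the symbol following each occurrence of `context`
    # (a suffix of the past, oldest symbol first).
    k = len(context)
    counts = Counter()
    for i in range(k, len(data)):
        if tuple(data[i - k:i]) == context:
            counts[data[i]] += 1
    return counts

def grow_tree(data, alphabet, split_cost=2.0, max_depth=4):
    # Greedy context-tree construction: repeatedly refine the leaf context
    # whose extension by one more past symbol saves the most codelength,
    # net of an assumed per-node description cost; stop when no split helps.
    leaves = {(): next_symbol_counts(data, ())}
    while True:
        best = None
        for ctx, counts in leaves.items():
            if len(ctx) >= max_depth:
                continue
            children = {(a,) + ctx: next_symbol_counts(data, (a,) + ctx)
                        for a in alphabet}
            saving = (code_length(counts)
                      - sum(code_length(c) for c in children.values())
                      - split_cost * len(alphabet))
            if saving > 0 and (best is None or saving > best[2]):
                best = (ctx, children, saving)
        if best is None:
            return leaves
        ctx, children, _ = best
        del leaves[ctx]
        leaves.update(children)
```

Calling, say, grow_tree(list("abracadabra"), set("abrcd")) returns the leaf contexts with their next-symbol counts; in the thesis, those per-context counts are then compressed with the same coding distribution as in the i.i.d. case.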
Compression and Predictive Distributions for Large Alphabets

A Dissertation Presented to the Faculty of the Graduate School of Yale University in Candidacy for the Degree of Doctor of Philosophy

by Xiao Yang

Dissertation Director: Andrew R. Barron

May 2015

Copyright © 2015 by Xiao Yang. All rights reserved.

Contents

1 Introduction
  1.1 Universal compression
  1.2 Normalized maximum likelihood distribution

2 i.i.d. model
  2.1 Introduction
  2.2 The Poisson model
  2.3 Results
    2.3.1 Regret
    2.3.2 Subset of sequences with partitioned counts
    2.3.3 Envelope class
    2.3.4 Regret with unknown total count
    2.3.5 Conditional distributions induced by the tilted Stirling ratio distribution
    2.3.6 Computational simplicity
    2.3.7 Computing Shtarkov's NML distribution using Q_a
    2.3.8 Prediction
  2.4 Application
    2.4.1 Simulation
    2.4.2 Real data
  2.5 Discussion

3 Markov model
  3.1 Introduction
  3.2 i.i.d. class
  3.3 Tree source
    3.3.1 Coding cost
    3.3.2 Description cost
    3.3.3 Using codelength to construct the tree
  3.4 A real example
  3.5 Conclusion
  3.6 Discussion

4 Summary and future work

Appendices

A Proof of Theorems
  A.1 Some facts
  A.2 Proof of Theorem 3.2
  A.3 Proof of Pythagorean Equality
  A.4 Redundancy
  A.5 Proof of Theorem 2.3

B Supplementary materials
  B.1 Incompatibility of P_n
  B.2 Computation complexity
  B.3 Approximation of c

List of Figures

2.1 Relationship between a and C_a
2.2 Relationship between a* and ^
2.3 Regret for the case m ~ n
2.4 Relationship between a and V_a
2.5 Regret of using the tilted Stirling ratio distribution for algebraically decreasing counts
2.6 Regret of using the tilted Stirling ratio distribution for an algebraically decreasing envelope class
2.7 Regret of Q_{a,L} for L from 1 to m
3.1 An example context tree with A = {a, b, c, d}, where • represents "others"
3.2 Context tree for Fortress Besieged
A.1 Tilted distribution and the Γ(1,1) density with a = 0.01
A.2 Tilted distribution and the Gamma density; the relevant sum is only to the left of T, with a = 0.01
A.3 Tilted distribution and the Gamma density; the relevant sum is only to the right of T, with a = 0.01

Acknowledgements

The past six years have been a long journey, but I am very happy I made it. At the end of this trip, I want to sincerely thank my advisor, Prof. Andrew Barron. He guided me onto this challenging and interesting path, and gave me endless patience and support. I used to think he was a genius, since he seems to know everything in statistics and can always find the crux of a problem, yet he doesn't seem to labor over research (mostly because he has other responsibilities like teaching and advising students). Later on, I heard a story about him: when he was a graduate student, he read tons of papers and sometimes slept in the office. This is not a unique story, but it told me that the distance between "genius" and me is this determination and commitment. Besides his dedication to research, his generous and positive personality also influenced me a lot. He never hesitates to share ideas or to compliment others. I still remember once, when I had made some small progress in my research, he was so happy that he gave me a high five. I am extremely grateful to have him as my advisor.

I also wish to thank my committee, Prof. Joseph Chang and Prof. Mokshay Madiman. One thing I learned from Joe is that no matter how complex something is, he can always explain it in a simple and intuitive manner. Only people who truly capture the essence and turn it into their own understanding can do this. To me, it is a magical ability. Mokshay also gave me strong support: he never said no when I asked to meet him, even when he was very busy. Because of their presence, I was able to reach the destination.

Moreover, I would like to thank Prof. Peter Jones, Prof. Jun'ichi Takeuchi, Prof. Wojciech Szpankowski, Prof. Narayana Santhanam and Prof. Teemu Roos. They

Description:
Compression and Predictive Distributions for Large Alphabets. A Dissertation Presented to the Faculty of the Graduate School of Yale University. Chapter 2, the i.i.d. model, was submitted to IEEE Transactions on Information Theory as Xiao Yang and Andrew Barron (2013), Large Alphabet Compression and
