Topics in kernel hypothesis testing

Kacper Chwialkowski

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of University College London.
Department of Computer Science, Gatsby Computational Neuroscience Unit
University College London
October 4, 2016

I, Kacper Chwialkowski, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the work.

I thank my parents.

Abstract

This thesis investigates some unaddressed problems in kernel nonparametric hypothesis testing. The contributions are grouped around three main themes:

Wild Bootstrap for Degenerate Kernel Tests. A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed. This bootstrap method is used to construct provably consistent tests that apply to random processes. It applies to a large group of kernel tests based on V-statistics, which are degenerate under the null hypothesis and non-degenerate elsewhere. In experiments, the wild bootstrap gives strong performance on synthetic examples, on audio data, and in performance benchmarking for the Gibbs sampler.

A Kernel Test of Goodness of Fit. A nonparametric statistical test for goodness-of-fit is proposed: given a set of samples, the test determines how likely it is that these were generated from a target density function. The measure of goodness-of-fit is a divergence constructed via Stein's method, using functions from a Reproducing Kernel Hilbert Space. Construction of the test is based on the wild bootstrap method. We apply our test to quantifying convergence of approximate Markov Chain Monte Carlo methods, statistical model criticism, and evaluating the quality of fit versus model complexity in nonparametric density estimation.

Fast Analytic Functions Based Two Sample Test. A class of nonparametric two-sample tests with a cost linear in the sample size is proposed.
Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. Experiments on artificial benchmarks and on challenging real-world testing problems demonstrate a good power/time tradeoff, retained even in high-dimensional problems.

The main contributions to science are the following. We prove that the kernel tests based on the wild bootstrap method tightly control the type I error at the desired level and are consistent, i.e. the type II error drops to zero with an increasing number of samples. We construct a kernel goodness-of-fit test that requires knowledge of the density only up to a normalizing constant. We use this test to construct the first consistent test for convergence of Markov chains, and use it to quantify properties of approximate MCMC algorithms. Finally, we construct a linear-time two-sample test that uses a new, finite-dimensional feature representation of probability measures.

Contents

1 Introduction
  1.1 Research Motivation
  1.2 Research Objectives
  1.3 Contributions to Science
  1.4 Structure of the Thesis
2 Background and Literature Review
  2.1 Application Domain
  2.2 Modeling Techniques
  2.3 Related Work
3 Wild Bootstrap for Degenerate Kernel Tests
  3.1 Asymptotic Distribution of Wild Bootstrapped V-statistics
  3.2 Proofs
  3.3 Applications to Kernel Tests
  3.4 Experiments
4 A Kernel Test of Goodness of Fit
  4.1 Test Definition: Statistic and Threshold
  4.2 Proofs of the Main Results
  4.3 Experiments
5 Fast Analytic Functions Based Two Sample Test
  5.1 Analytic Embeddings and Distances
  5.2 Hypothesis Tests Based on Distances Between Analytic Functions
  5.3 Proofs
  5.4 Experiments
  5.5 Parameter Choice
6 Conclusions and Future Work
Appendices
A Preliminary Article on HSIC for Time Series
  A.1 Introduction
  A.2 Background
  A.3 HSIC for Random Processes
  A.4 Experiments
  A.5 Proofs
  A.6 A Kernel Independence Test for Random Processes - Supplementary

The thesis is based on the following publications:

• Kacper Chwialkowski and Arthur Gretton. A kernel independence test for random processes. In ICML, 2014
• Kacper Chwialkowski, Dino Sejdinovic, and Arthur Gretton. A wild bootstrap for degenerate kernel tests. In Advances in Neural Information Processing Systems 27, pages 3608–3616. Curran Associates, Inc., 2014
• Kacper Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, and Arthur Gretton. Fast two-sample testing with analytic representations of probability measures. In Advances in Neural Information Processing Systems, pages 1972–1980, 2015
• Kacper Chwialkowski, Heiko Strathmann, and Arthur Gretton. A kernel test of goodness of fit. In ICML, 2016

Chapter 1
Introduction

In this chapter, we describe the research motivation, research objectives, and contributions to science, and finish with an outline of the document.
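As a concrete illustration of the linear-time two-sample test summarized in the abstract, here is a minimal one-dimensional sketch, not the thesis implementation: it compares kernel mean embeddings of the two samples evaluated at a handful of random test locations, and thresholds a Hotelling-type statistic against a chi-squared quantile. The function name, the Gaussian feature map, and all parameter defaults are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def mean_embedding_test(x, y, n_locations=5, gamma=1.0, alpha=0.05, rng=None):
    """Linear-time two-sample test sketch: compare the two samples' kernel
    mean embeddings at a few random test locations.  Under H0 (equal
    distributions) the statistic is asymptotically chi-squared with
    n_locations degrees of freedom."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = min(len(x), len(y))
    T = rng.standard_normal(n_locations)            # random test locations
    # Gaussian-kernel features evaluated at each test location
    fx = np.exp(-gamma * (x[:n, None] - T[None, :]) ** 2)
    fy = np.exp(-gamma * (y[:n, None] - T[None, :]) ** 2)
    d = fx - fy                                     # per-sample feature differences
    mu = d.mean(axis=0)
    cov = np.cov(d, rowvar=False) + 1e-8 * np.eye(n_locations)
    stat = n * mu @ np.linalg.solve(cov, mu)        # Hotelling-type statistic
    threshold = stats.chi2.ppf(1 - alpha, df=n_locations)
    return stat, stat > threshold
```

The cost is linear in the sample size because only per-sample feature vectors of fixed dimension are accumulated, rather than an n-by-n kernel matrix.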
Research Motivation

Wild Bootstrap for Degenerate Kernel Tests. Statistical kernel tests, that is, tests based on distribution embeddings into reproducing kernel Hilbert spaces (RKHS), have been applied in many contexts, including two-sample tests by Harchaoui et al. [52], Gretton et al. [47], and Sugiyama et al. [107], tests of independence by Gretton et al. [44], Zhang et al. [115], and Besserve et al. [14], tests of conditional independence by Fukumizu et al. [37], Gretton et al. [49], and Zhang et al. [115], a test for higher-order (Lancaster) interactions by Sejdinovic et al. [87], and a normality test by Baringhaus and Henze [8]. Another example is the kernel goodness-of-fit test developed in this thesis.

For these tests, consistency is usually guaranteed if the observations are independent and identically distributed (i.i.d.), an exception being Zhang et al. [116]. Much real-world data fails to satisfy the i.i.d. assumption: audio signals, EEG recordings, text documents, financial time series, and samples obtained when running Markov Chain Monte Carlo all show significant temporal dependence patterns.

The asymptotic behaviour of the statistics used in kernel tests may become quite different when temporal dependencies exist within the samples. In this case, the null distribution is shown to be an infinite weighted sum of dependent χ²-variables, as opposed to the sum of independent χ²-variables obtained in the case of i.i.d. observations [44]. The difference in the asymptotic null distributions has important implications in practice: under the i.i.d. assumption, an empirical estimate of the null distribution can be obtained by repeatedly permuting the time indices of one of the signals. This breaks the temporal dependence within the permuted signal, which causes the test to return an elevated number of false positives when used for testing time series.
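The dependence problem above motivates the wild bootstrap studied in Chapter 3. As a rough sketch, and not the thesis construction itself (the Gaussian AR(1) auxiliary process and all parameter choices here are illustrative assumptions), a degenerate V-statistic V = (1/n²) Σᵢⱼ Hᵢⱼ can be resampled under the null by reweighting its summands with an autocorrelated, mean-zero, unit-variance process, which preserves the temporal structure that index permutation destroys:

```python
import numpy as np

def wild_bootstrap_process(n, block_length=20, rng=None):
    """Autocorrelated wild-bootstrap process W_t (Gaussian AR(1) version):
    mean zero, variance one, with correlation decaying over lags of
    roughly block_length steps."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.exp(-1.0 / block_length)                 # AR(1) coefficient
    w = np.empty(n)
    w[0] = rng.standard_normal()
    eps = rng.standard_normal(n) * np.sqrt(1.0 - a * a)
    for t in range(1, n):
        w[t] = a * w[t - 1] + eps[t]
    return w

def wild_bootstrap_null(H, n_boot=500, block_length=20, rng=None):
    """Simulate the null distribution of a degenerate V-statistic
    V = (1/n^2) sum_ij H_ij by reweighting each term with W_i * W_j."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = H.shape[0]
    samples = np.empty(n_boot)
    for b in range(n_boot):
        w = wild_bootstrap_process(n, block_length, rng)
        samples[b] = w @ H @ w / n**2
    return samples
```

A test would then compare the observed V-statistic against an upper quantile of the returned bootstrap samples.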
To address this problem, in preliminary work we proposed an alternative estimate of the null distribution, where the null distribution is simulated by repeatedly shifting one signal relative to the other (which is a form of block bootstrap). This preserves the temporal structure within each signal, while breaking the cross-signal dependence.

A serious limitation of the shift bootstrap procedure is that it is specific to the problem of independence testing: there is no obvious way to generalise it to other testing contexts. For instance, we might have two time series, with the goal of comparing their marginal distributions; this is a generalization of the two-sample setting, to which the shift approach does not apply. Another example is the kernel goodness-of-fit test developed in this thesis.

It is interesting to study whether some other bootstrap procedure can be used in place of the shift bootstrap, so that all kernel tests are consistent when applied to data with temporal dependence patterns.

A Kernel Test of Goodness of Fit. A particular type of statistical test, the goodness-of-fit test, is a fundamental tool in statistical analysis, dating back to the tests of Kolmogorov and Smirnov [62, 94]. Given a set of samples {Z_i}_{i=1}^n with distribution Z_i ∼ q, our interest is in whether q matches some reference or target distribution p, which we assume to be known only up to the normalisation constant. This setting, in which the target density is not exactly known, is quite challenging and particularly relevant in Markov Chain Monte Carlo diagnostics.

Recently, Gorham and Mackey [40] proposed an elegant measure of sample quality with respect to a target. This measure is a maximum discrepancy between empirical sample expectations and target expectations over a large class of test functions, constructed so as to have zero expectation over the target distribution by use of a Stein operator. This operator depends only on the derivative of log q: thus, the approach can be applied very generally, as it
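To make the Stein construction concrete, here is a minimal one-dimensional sketch of a kernel Stein discrepancy estimated by a V-statistic with a Gaussian kernel. As the text notes, only the score d/dx log p is needed, so p may be unnormalized. The function names and the bandwidth default are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def ksd_v_statistic(z, score_fn, sigma=1.0):
    """V-statistic estimate of the squared kernel Stein discrepancy in 1-D,
    using a Gaussian kernel and the Langevin Stein operator.  score_fn is
    d/dx log p, so p need only be known up to normalization."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = z[:, None] - z[None, :]                     # pairwise differences x - y
    k = np.exp(-d**2 / (2 * sigma**2))              # Gaussian kernel k(x, y)
    dkx = -d / sigma**2 * k                         # d/dx k(x, y)
    dky = d / sigma**2 * k                          # d/dy k(x, y)
    dkxy = (1.0 / sigma**2 - d**2 / sigma**4) * k   # d^2/(dx dy) k(x, y)
    s = score_fn(z)                                 # score at each sample
    # Stein kernel: h_p(x,y) = s(x)s(y)k + s(x) dk/dy + s(y) dk/dx + d^2k/dxdy
    h = (s[:, None] * s[None, :] * k
         + s[:, None] * dky
         + s[None, :] * dkx
         + dkxy)
    return h.sum() / n**2

# Target: standard normal, whose score is -x (normalizer never used).
rng = np.random.default_rng(0)
good = ksd_v_statistic(rng.standard_normal(500), lambda x: -x)   # near zero
bad = ksd_v_statistic(rng.standard_normal(500) + 1.0, lambda x: -x)
```

Samples drawn from the target give a statistic near zero, while mismatched samples (here shifted by one) give a clearly larger value, which is what the test thresholds via the wild bootstrap.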
