Topics in kernel hypothesis testing
Kacper Chwialkowski
A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
of
University College London.
Department of Computer Science, Gatsby Computational Neuroscience Unit
University College London
October 4, 2016
I, Kacper Chwialkowski, confirm that the work presented in this thesis is my
own. Where information has been derived from other sources, I confirm that
this has been indicated in the work.
I thank my parents.
Abstract
This thesis investigates some unaddressed problems in kernel nonparametric
hypothesis testing. The contributions are grouped around three main themes:
Wild Bootstrap for Degenerate Kernel Tests. A wild bootstrap method for non-
parametric hypothesis tests based on kernel distribution embeddings is pro-
posed. This bootstrap method is used to construct provably consistent tests
that apply to random processes. It applies to a large group of kernel tests
based on V-statistics, which are degenerate under the null hypothesis, and
non-degenerate elsewhere. In experiments, the wild bootstrap gives strong
performance on synthetic examples, on audio data, and in performance benchmarking for the Gibbs sampler.
A Kernel Test of Goodness of Fit. A nonparametric statistical test for goodness-
of-fit is proposed: given a set of samples, the test determines how likely it
is that these were generated from a target density function. The measure of
goodness-of-fit is a divergence constructed via Stein’s method using functions
from a Reproducing Kernel Hilbert Space. Construction of the test is based on
the wild bootstrap method. We apply our test to quantifying convergence of
approximate Markov Chain Monte Carlo methods, statistical model criticism,
and evaluating quality of fit vs model complexity in nonparametric density
estimation.
Fast Analytic Functions Based Two Sample Test. A class of nonparametric two-
sample tests with a cost linear in the sample size is proposed. Two tests are
given, both based on an ensemble of distances between analytic functions
representing each of the distributions. Experiments on artificial benchmarks
and on challenging real-world testing problems demonstrate a good power/time tradeoff, retained even in high-dimensional problems.
The main contributions to science are the following. We prove that the kernel tests based on the wild bootstrap method tightly control the type one error at the desired level and are consistent, i.e. the type two error drops to zero with an increasing number of samples. We construct a kernel goodness-of-fit test that requires knowledge of the density only up to a normalizing constant. We use this test to construct the first consistent test for convergence of Markov chains and use it to quantify properties of approximate MCMC algorithms. Finally, we construct a linear-time two-sample test that uses a new, finite-dimensional feature representation of probability measures.
Contents
1 Introduction
  1.1 Research Motivation
  1.2 Research Objectives
  1.3 Contributions to Science
  1.4 Structure of the Thesis
2 Background and Literature Review
  2.1 Application Domain
  2.2 Modeling techniques
  2.3 Related work
3 Wild Bootstrap for Degenerate Kernel Tests
  3.1 Asymptotic distribution of wild bootstrapped V-statistics
  3.2 Proofs
  3.3 Applications to Kernel Tests
  3.4 Experiments
4 A Kernel Test of Goodness of Fit
  4.1 Test Definition: Statistic and Threshold
  4.2 Proofs of the Main Results
  4.3 Experiments
5 Fast Analytic Functions Based Two Sample Test
  5.1 Analytic embeddings and distances
  5.2 Hypothesis Tests Based on Distances Between Analytic Functions
  5.3 Proofs
  5.4 Experiments
  5.5 Parameters Choice
6 Conclusions and Future Work
Appendices
A Preliminary article on HSIC for time series
  A.1 Introduction
  A.2 Background
  A.3 HSIC for random processes
  A.4 Experiments
  A.5 Proofs
  A.6 A Kernel Independence Test for Random Processes - Supplementary
The thesis is based on the following publications:
• Kacper Chwialkowski and Arthur Gretton. A kernel independence test for random processes. In ICML, 2014.
• Kacper Chwialkowski, Dino Sejdinovic, and Arthur Gretton. A wild bootstrap for degenerate kernel tests. In Advances in Neural Information Processing Systems 27, pages 3608–3616. Curran Associates, Inc., 2014.
• Kacper Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, and Arthur Gretton. Fast two-sample testing with analytic representations of probability measures. In Advances in Neural Information Processing Systems, pages 1972–1980, 2015.
• Kacper Chwialkowski, Heiko Strathmann, and Arthur Gretton. A kernel test of goodness of fit. In ICML, 2016.
Chapter 1
Introduction
In this chapter, we describe the research motivation, the research objectives, and the contributions to science, and finish with an outline of the document.
Research Motivation
Wild Bootstrap for Degenerate Kernel Tests. Statistical kernel tests, that is, tests based on distribution embeddings into reproducing kernel Hilbert spaces (RKHS), have been applied in many contexts, including two-sample tests by Harchaoui et al. [52], Gretton et al. [47], Sugiyama et al. [107], tests of independence by Gretton et al. [44], Zhang et al. [115], Besserve et al. [14], tests of conditional independence by Fukumizu et al. [37], Gretton et al. [49], Zhang et al. [115], tests for higher order (Lancaster) interactions by Sejdinovic et al. [87], and the normality test by Baringhaus and Henze [8]. Another example is the kernel goodness-of-fit test developed in this thesis.
For these tests, consistency is usually guaranteed if the observations are independent and identically distributed (i.i.d.), an exception being Zhang et al. [116]. Much real-world data fails to satisfy the i.i.d. assumption: audio signals, EEG recordings, text documents, financial time series, and samples obtained when running Markov Chain Monte Carlo all show significant temporal dependence patterns.
The asymptotic behaviour of statistics used in kernel tests may become quite different when temporal dependencies exist within the samples. In this case, the null distribution is shown to be an infinite weighted sum of dependent χ²-variables, as opposed to the sum of independent χ²-variables obtained in the case of i.i.d. observations [44]. The difference in the asymptotic null distributions has important implications in practice: under the i.i.d. assumption, an empirical estimate of the null distribution can be obtained by repeatedly permuting the time indices of one of the signals. This breaks the temporal dependence within the permuted signal, which causes the test to return an elevated number of false positives when used for testing time series.
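To illustrate this failure mode, the following is a minimal sketch, not taken from the thesis: the null distribution of a toy dependence statistic is estimated by permuting time indices, and the test is applied to two independent AR(1) signals. The squared-correlation statistic corr_stat is only a stand-in for a kernel statistic such as HSIC, and all names and parameter values are illustrative.

    import numpy as np

    def permutation_null(x, y, statistic, n_perm=500, seed=None):
        """Null distribution estimated by permuting the time indices of one
        signal; this is only valid when the observations are i.i.d."""
        rng = np.random.default_rng(seed)
        return np.array([statistic(x, y[rng.permutation(len(y))])
                         for _ in range(n_perm)])

    def corr_stat(x, y):
        # Squared correlation: a toy stand-in for a kernel dependence statistic.
        return np.corrcoef(x, y)[0, 1] ** 2

    # Two *independent* AR(1) signals: the null hypothesis of no dependence
    # holds, yet each signal is strongly temporally dependent.
    rng = np.random.default_rng(0)
    n, phi = 500, 0.9
    x, y = np.zeros(n), np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
        y[t] = phi * y[t - 1] + rng.normal()

    null_samples = permutation_null(x, y, corr_stat, seed=1)
    p_value = np.mean(null_samples >= corr_stat(x, y))
    # The permutation null ignores autocorrelation, so this p-value is often
    # spuriously small, i.e. the test rejects a true null too frequently.
    print(f"permutation p-value: {p_value:.3f}")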
To address this problem, in preliminary work we proposed an alternative estimate of the null distribution, simulated by repeatedly shifting one signal relative to the other (a form of block bootstrap). This preserves the temporal structure within each signal, while breaking the cross-signal dependence.
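For contrast, here is an equally minimal sketch of the shift idea just described: the null is simulated by circularly shifting one signal relative to the other, keeping the within-signal structure intact. The setup (AR(1) signals, squared correlation as the statistic, a minimum shift of 50) repeats the previous illustration and is not the exact procedure or statistic used in the thesis.

    import numpy as np

    def shift_null(x, y, statistic, min_shift=50):
        """Null distribution simulated by circularly shifting one signal
        relative to the other: the temporal structure within each signal is
        preserved, while the cross-signal dependence is broken."""
        n = len(y)
        return np.array([statistic(x, np.roll(y, s))
                         for s in range(min_shift, n - min_shift)])

    def corr_stat(x, y):
        # Same toy stand-in for a kernel dependence statistic as before.
        return np.corrcoef(x, y)[0, 1] ** 2

    # Same toy data: two independent AR(1) signals.
    rng = np.random.default_rng(0)
    n, phi = 500, 0.9
    x, y = np.zeros(n), np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
        y[t] = phi * y[t - 1] + rng.normal()

    null_samples = shift_null(x, y, corr_stat)
    p_value = np.mean(null_samples >= corr_stat(x, y))
    # Because the shifted null respects the autocorrelation of each signal,
    # the false positive rate is typically close to the nominal level.
    print(f"shift-bootstrap p-value: {p_value:.3f}")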
A serious limitation of the shift bootstrap procedure is that it is specific to the problem of independence testing: there is no obvious way to generalise it to other testing contexts. For instance, we might have two time series, with the goal of comparing their marginal distributions; this is a generalization of the two-sample setting, to which the shift approach does not apply. Another example is the kernel goodness-of-fit test developed in this thesis.
It is interesting to study whether some other bootstrap procedure can be used in place of the shift bootstrap, so that all kernel tests are consistent when applied to data with temporal dependence patterns.
A Kernel Test of Goodness of Fit. A particular type of statistical test, a goodness-of-fit test, is a fundamental tool in statistical analysis, dating back to the test of Kolmogorov and Smirnov [62, 94]. Given a set of samples $\{Z_i\}_{i=1}^n$ with distribution $Z_i \sim q$, our interest is in whether $q$ matches some reference or target distribution $p$, which we assume to be known only up to the normalisation constant. This setting, in which the target density is not exactly known, is quite challenging and particularly relevant in Markov Chain Monte Carlo diagnostics.
Recently, Gorham and Mackey [40] proposed an elegant measure of sample quality with respect to a target. This measure is a maximum discrepancy between empirical sample expectations and target expectations over a large class of test functions, constructed so as to have zero expectation over the target distribution by use of a Stein operator. This operator depends only on the derivative of $\log q$: thus, the approach can be applied very generally, as it does not require the normalisation constant of the target density.
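To make this concrete, one standard instance of such an operator is the univariate Langevin-Stein operator; this is only a sketch of the general idea, whereas the construction developed later in the thesis takes the test functions $f$ from an RKHS. For a smooth test function $f$,
\[
(\mathcal{T}_q f)(x) = f'(x) + f(x)\,\frac{d}{dx}\log q(x),
\qquad
\mathbb{E}_{Z \sim q}\bigl[(\mathcal{T}_q f)(Z)\bigr] = 0,
\]
which holds under mild decay and integrability conditions by integration by parts. Since $\frac{d}{dx}\log q$ is unchanged when $q$ is multiplied by a constant, the normalisation constant of the target never enters the construction.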