Exploratory model analysis for machine learning
Biye Jiang
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2018-34
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-34.html
May 9, 2018
Copyright © 2018, by the author(s).
All rights reserved.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission.
Exploratory model analysis for machine learning
by
Biye Jiang
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Computer Science
in the
Graduate Division
of the
University of California, Berkeley
Committee in charge:
Professor John Canny, Chair
Professor Marti Hearst
Professor Joshua Blumenstock
Spring 2018
Exploratory model analysis for machine learning
Copyright 2018
by
Biye Jiang
1
Abstract
Exploratory model analysis for machine learning
by
Biye Jiang
Doctor of Philosophy in Computer Science
University of California, Berkeley
Professor John Canny, Chair
Machine learning is growing in importance in many different fields. However, it is still
very hard for users to tune hyper-parameters when optimizing their models, or to perform a
comprehensive and interpretable diagnosis of complex models like deep neural nets. Existing
developer tools like TensorBoard provide only limited functionality, usually visualizing
model statistics based on metrics predefined before training starts. Almost nothing can
be adjusted during training. Yet real model optimization and diagnosis procedures
involve extensive interaction and continuous experimentation. To tackle those challenges,
we developed a framework that allows users to perform exploratory model analysis on top of
the hardware-accelerated machine learning toolkit BIDMach. Our system is unique in that it
allows users to interactively add visualizations and adjust hyper-parameters during training.
We also use Monte Carlo style algorithms that let users explore the entire model space
under different user preferences rather than only reaching local optima.
We demonstrate the use of our system in several real-world applications. For problems
like advertisement optimization or clustering, where multiple optimization objectives exist,
users can incorporate secondary criteria into the model-generation process and make trade-
offs interactively. For deep convolutional neural net diagnosis, users can use our
LDAM (Langevin Dynamic Activation Maximization) algorithm to systematically explore
the images that stimulate a given neuron. We conduct experiments and user studies on
several public datasets to show how users from different backgrounds can benefit from our
tools.
i
To my parents and the star brightening my way
ii
Contents
Contents ii
List of Figures iv
List of Tables vi
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 BIDMach: high-performance, customized machine learning toolkit . . . . . . 2
2 Interactive customized optimization 4
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 System design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Interface Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Use cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3 A case study of interactive machine learning 24
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Experimental method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Quantitative results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 Subjects’ responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4 Diagnostic visualization for deep learning 44
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4 Case study: MNIST dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.5 CIFAR dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.6 ImageNet Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
iii
4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5 Designing new model structures 66
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Sparsification of the embedding matrix . . . . . . . . . . . . . . . . . . . . . 68
5.4 Power law shape matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6 Discussion 77
Bibliography 79
iv
List of Figures
1.1 BIDMach’s Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Visualization Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Dashboard for KMeans using MNIST dataset . . . . . . . . . . . . . . . . . . . 9
2.3 Model visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Continuous visualization of the likelihood function . . . . . . . . . . . . . . . . . 12
2.5 Performance indicators for clustering . . . . . . . . . . . . . . . . . . . . . . . 13
2.6 Interactive tuning for KMeans . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.7 Clustering on ImageNet data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.8 Clustering using pool5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.9 Clustering using pool1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.10 Transition from pool1 to pool5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.11 Overlapping topics, K=32 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.12 High temperature with high L1-reg . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.13 Likelihood back to normal, sparsity preserved . . . . . . . . . . . . . . . . . . . 22
2.14 Cosine similarity decreased . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1 Each control is associated with a measurement . . . . . . . . . . . . . . . . . . 28
3.2 Visualization Dashboard for topic modeling . . . . . . . . . . . . . . . . . . . . 28
3.3 Visualization Dashboard for KMeans . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4 Distribution of task completion time . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5 Topic visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.6 Group 1 spent more time looking at topic visualization . . . . . . . . . . . . . . 36
3.7 Effect of adjusting temperature . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.8 Different types of system response . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.1 Monte Carlo sampling can obtain samples (blue dots) from the entire space, while
an optimization method usually reaches only one local optimum and is highly depen-
dent on initialization (red dot). . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 System architecture. Red arrows indicate gradient flow (backward pass), blue
arrows indicate forward pass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
v
4.3 The interface of our system. The upper panel visualizes the image samples: each
column represents one class, and each row represents a different MCMC sampling
procedure. The bottom panel contains the control sliders for the hyper-parameters. 55
4.4 Images generated by LDAM for output neurons in LeNet trained on MNIST
dataset, from left to right: 0,1,2,3,4,5,6,7,9. (Image value not clipped, an offset
of 128 is added when displaying) . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.5 Comparing activation maximization results for LeNet models trained with differ-
ent methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.6 Images generated by LDAM with different discriminator weight for output neu-
rons in LeNet trained on MNIST dataset. (Image value clipped at 0) . . . . . . 59
4.7 Comparing images generated by LDAM for different model architectures trained
on CIFAR10 dataset. Each column represents a class: from left to right: airplane,
automobile, bird, cat, deer, dog, frog, horse, ship, truck . . . . . . . . . . . . . . 60
4.8 Images generated by LDAM for output neurons from a network with 3 convolu-
tional layers followed by 2 fully connected layers, under different discriminator
weights. Each column represents a class: from left to right: airplane, automobile,
bird, cat, deer, dog, frog, horse, ship, truck (Image value clipped at 0). . . . . . 61
4.9 Images generated by LDAM for output neurons from the VGG-16 network, under
different discriminator weights. Each column represents a class, from left to right:
airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck. . . . . . . . . 62
4.10 Comparing activation maximization results for the VGG-16 models trained with
different methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.11 Images generated by LDAM for the output neuron of mushroom class from the
AlexNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.12 Images generated by LDAM for neurons from different layers in AlexNet. Starting
from the output neuron for the mushroom class in the FC8 layer, we find its most relevant
neurons in the FC7 layer using the weight matrix. . . . . . . . . . . . . . . . . . 64
4.13 T-SNE embedding results for the 256 neurons in Conv5 layer based on their
activation vector. Nearby neurons can generate similar activating images. . . . . 65
5.1 Power law shape matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 Log-Log plot of the word count for PTB dataset . . . . . . . . . . . . . . . . . . 69
5.3 Parameter distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.4 The effect of embedding matrix pruning for different L1-regularizations . . . . . 70
5.5 Effect of L1-regularization on embedding matrix via heatmap . . . . . . . . . . 70
5.6 Distribution of non-zero values for embedding matrix, λ = 1e−4. . . . . . . . 71