Table Of Content

Stochastic approximation and least-squares regression, with applications to machine learning

Nicolas Flammarion

To cite this version: Nicolas Flammarion. Stochastic approximation and least-squares regression, with applications to machine learning. Machine Learning [stat.ML]. Université Paris sciences et lettres, 2017. English. NNT: 2017PSLEE056. tel-01693865v2

HAL Id: tel-01693865
https://theses.hal.science/tel-01693865v2
Submitted on 4 Jul 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Doctoral thesis of the research university Paris Sciences Lettres (PSL Research University), prepared at the École normale supérieure

Stochastic Approximation and Least-Squares Regression, with Applications to Machine Learning
(French title: Approximation Stochastique et Régression par Moindres Carrés : Applications en Apprentissage Automatique)

Doctoral school no. 386: École Doctorale de Sciences Mathématiques de Paris Centre
Specialty: Applied Mathematics

Defended by Nicolas Flammarion on 24.07.2017
Supervised by Alexandre d'Aspremont and Francis Bach

Composition of the jury:
Jérôme Bolte, TSE Toulouse, Reviewer
Shai Shalev-Shwartz, The Hebrew University of Jerusalem, Reviewer (absent)
Alexandre d'Aspremont, CNRS-ENS Paris, Thesis advisor
Francis Bach, INRIA-ENS Paris, Thesis advisor
Arnak Dalalyan, ENSAE Paris, Jury member
Eric Moulines, CMAP EP Paris, Jury president

"Was Du für ein Geschenk hältst, ist ein Problem, das Du lösen sollst."
L. Wittgenstein, Vermischte Bemerkungen
"What you are regarding as a gift is a problem for you to solve."

Dedicated to my parents, my sisters and Adèle

Abstract

Many problems in machine learning are naturally cast as the minimization of a smooth function defined on a Euclidean space. For supervised learning, this includes least-squares regression and logistic regression. While small-scale problems with few input features may be solved efficiently by many optimization algorithms (e.g., Newton's method), large-scale problems with many high-dimensional features are typically solved with first-order techniques based on gradient descent, leading to algorithms with many cheap iterations.

In this manuscript, we consider the particular case of the quadratic loss. In the first part, we are interested in its minimization, considering that its gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean, finite-variance random error. We propose different algorithms to efficiently solve these minimization problems in many cases. In the second part, we consider two applications of the quadratic loss in machine learning: unsupervised learning, specifically clustering, and statistical estimation, specifically estimation with shape constraints.

In the first main contribution of the thesis, we provide a unified framework for optimizing non-strongly convex quadratic functions, which encompasses accelerated gradient descent, averaged gradient descent and the heavy-ball method. They are studied through second-order difference equations for which stability is equivalent to an O(1/n^2) convergence rate. This new framework suggests an alternative algorithm that exhibits the positive behavior of both averaging and acceleration.
The second main contribution aims at obtaining the optimal prediction error rates for least-squares regression, both in terms of dependence on the noise of the problem and of forgetting the initial conditions. Our new algorithm rests upon averaged accelerated gradient descent and is analyzed under finer assumptions on the covariance matrix of the input data and the initial conditions of the algorithm, which leads to tighter convergence rates expressed with dimension-free quantities.

The third main contribution of the thesis deals with the minimization of composite objective functions composed of the expectation of quadratic functions and a convex function. We show that stochastic dual averaging with a constant step-size has a convergence rate O(1/n) without strong convexity assumption, extending earlier results on least-squares regression to any regularizer and any geometry represented by a Bregman divergence.

As a fourth contribution, we consider the problem of clustering high-dimensional data. We present a novel sparse extension of the discriminative clustering framework and propose a natural extension for the multi-label scenario. We also provide the first theoretical analysis of this formulation with a simple probabilistic model and an efficient iterative algorithm with better running-time complexity than existing methods.

The fifth main contribution of the thesis deals with the seriation problem, which consists in permuting the rows of a given matrix in such a way that all its columns have the same shape. We propose a statistical approach to this problem where the matrix of interest is observed with noise and study the corresponding minimax rate of estimation of the matrices. We also suggest a computationally efficient estimator whose performance is studied both theoretically and experimentally.
Keywords: Convex optimization, acceleration, averaging, stochastic gradient, least-squares regression, stochastic approximation, dual averaging, mirror descent, discriminative clustering, convex relaxation, sparsity, statistical seriation, permutation learning, minimax estimation, shape constraints.

Résumé

Many problems in machine learning are formally equivalent to the minimization of a smooth function defined on a Euclidean space. More precisely, in supervised machine learning, this includes least-squares regression and logistic regression. While small-scale problems, with few variables, can be solved efficiently with many optimization algorithms (Newton's method, for example), large-scale problems, with many high-dimensional data points, are generally handled with first-order methods derived from gradient descent, leading to algorithms with many inexpensive iterations.

In this manuscript, we consider the particular case of the quadratic loss. In the first part, we are interested in its minimization under the assumption that we access its gradients through a stochastic oracle. This oracle returns the gradient evaluated at the requested point plus a noise of zero mean and finite variance. We propose different algorithms to solve this problem efficiently in multiple cases. In the second part, we consider two different applications of the quadratic loss to machine learning: the first in unsupervised learning, more specifically in data clustering, and the second in statistical estimation, more precisely in estimation under shape constraints.

The first contribution of this thesis is a unified framework for the optimization of non-strongly convex quadratic functions. It includes accelerated gradient descent, averaged gradient descent and the heavy-ball method. These methods are studied through second-order finite difference equations whose stability is equivalent to an O(1/n^2) convergence rate of the method studied. This new framework allows us to propose an alternative algorithm that combines the positive aspects of averaging and those of acceleration.

The second contribution is to obtain the optimal prediction error rate for least-squares regression in terms of its dependence on both the noise of the problem and the forgetting of the initial conditions. Our new algorithm originates from averaged accelerated gradient descent, and we analyze it under finer assumptions on the covariance matrix of the data and on the initial conditions of the algorithm. This new analysis leads to tighter convergence rates that do not involve the dimension of the problem.

The third contribution of this thesis deals with the problem of the minimization
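The stochastic-oracle setting and the idea of averaging described in the abstract can be illustrated with a small sketch. This is not the algorithm proposed in the thesis: it is plain constant-step-size stochastic gradient descent with iterate averaging on a two-dimensional quadratic f(theta) = 0.5 * theta' H theta - b' theta. The function names, the matrix H, the vector b, the step size and the noise level are all assumptions chosen for illustration.

```python
import random


def noisy_gradient(theta, H, b, sigma, rng):
    """Hypothetical stochastic oracle: exact gradient H*theta - b of the
    quadratic, plus independent zero-mean Gaussian noise of std sigma."""
    d = len(theta)
    return [
        sum(H[i][j] * theta[j] for j in range(d)) - b[i] + rng.gauss(0.0, sigma)
        for i in range(d)
    ]


def averaged_sgd(H, b, theta0, step, n_iters, sigma, seed=0):
    """Constant-step-size SGD; returns (last iterate, average of all iterates)."""
    rng = random.Random(seed)
    theta = list(theta0)
    avg = list(theta0)  # running mean of theta_0, ..., theta_n
    for n in range(1, n_iters + 1):
        g = noisy_gradient(theta, H, b, sigma, rng)
        theta = [t - step * gi for t, gi in zip(theta, g)]
        # Incremental update of the mean over the n + 1 iterates seen so far.
        avg = [a + (t - a) / (n + 1) for a, t in zip(avg, theta)]
    return theta, avg


if __name__ == "__main__":
    # Illustrative problem (assumption): H = diag(1, 0.1), b = (1, 0.1),
    # so the minimizer is theta* = (1, 1).
    H = [[1.0, 0.0], [0.0, 0.1]]
    b = [1.0, 0.1]
    last, avg = averaged_sgd(H, b, [0.0, 0.0], step=0.5, n_iters=20000, sigma=0.5)
    print("last iterate:    ", last)
    print("averaged iterate:", avg)
```

With a constant step size the last iterate keeps fluctuating around the minimizer because of the oracle noise, while the averaged iterate smooths these fluctuations out; that contrast is the motivation for averaging in this setting.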

Description:

we consider two applications of the quadratic loss in machine learning: .. tion, stochastic approximation and online learning, which are the main

Stochastic approximation and least-squares regression, with applications to machine learning PDF

305 Pages·2017·4.34 MB·English

by Nicolas Flammarion

Detailed Information

Author:	Nicolas Flammarion
Publication Year:	2017
Pages:	305
Language:	English
File Size:	4.34 MB
Format:	PDF
Price:	FREE
