Articulated Models for Human Motion Analysis

Ph.D. Thesis Dissertation
by
Adolfo López Méndez

Submitted to the Universitat Politècnica de Catalunya (UPC) in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

October 2012

Supervised by Dr. Josep R. Casas
Ph.D. program on Signal Theory and Communications

To my family and to the friends I have made during this stage.

“All our knowledge has its origins in our perceptions.”
Leonardo da Vinci

Summary

Human motion analysis is a broad area of computer vision that has strongly attracted the interest of researchers in recent decades.
Motion analysis covers topics such as human motion tracking and estimation, action and behavior recognition, and segmentation of human motion. All these fields are challenging for different reasons, but mostly because of viewing perspectives, clutter and the imprecise semantics of actions and human motion.

The computer vision community has addressed human motion analysis from several perspectives. Earlier approaches often relied on articulated human body models represented in the three-dimensional world. However, due to the traditionally high difficulty and cost of estimating such an articulated structure from video, research has focused on the development of human motion analysis approaches relying on low-level features. Although they obtain impressive results in several tasks, low-level features are typically conditioned by appearance and viewpoint, which makes them difficult to apply in different scenarios. Nonetheless, the increase in computational power, the massive availability of data and the emergence of consumer depth cameras are changing the scenario, and with that change human motion analysis through articulated models can be reconsidered.

Analyzing and understanding human motion through three-dimensional information is still a crucial issue in order to obtain richer models of dynamics and behavior. In that sense, articulated models of the human body offer a compact and view-invariant representation of motion that can be used to leverage motion analysis. In this dissertation, we present several approaches for motion analysis. In particular, we address the problems of pose inference, action recognition and temporal clustering of human motion. Articulated models are the leitmotiv in all the presented approaches. Firstly, we address pose inference by formulating a layered analysis-by-synthesis framework where models are used to generate hypotheses that are matched against video. Based on the same articulated representation upon which models are built, we propose an action recognition framework.
Actions are seen as time series observed through the articulated model and, we hypothesize, generated by underlying dynamical systems. This hypothesis is used to develop recognition methods based on time-delay embeddings, which are analysis tools that make no assumptions about the form of the underlying dynamical system. Finally, we propose a method to cluster human motion sequences into distinct behaviors, without a priori knowledge of the number of actions in the sequence. Our approach relies on the articulated model representation in order to learn a distance metric from pose data. This metric aims at capturing semantics from labeled data in order to cluster unseen motion sequences into meaningful behaviors. The proposed approaches are evaluated on publicly available datasets in order to measure our contributions objectively.
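
As a brief illustration of the time-delay embeddings mentioned in this summary, the following minimal sketch maps a scalar time series to delay vectors. It is not the recognition method of this dissertation: the function name delay_embed, the embedding dimension dim, the delay tau and the synthetic sinusoid standing in for a joint-angle trajectory are all assumptions made for the example.

```python
import math


def delay_embed(series, dim, tau):
    """Return delay vectors [x[t], x[t + tau], ..., x[t + (dim - 1) * tau]].

    dim (embedding dimension) and tau (delay in samples) are illustrative
    parameters; how to choose them is outside the scope of this sketch.
    """
    n_points = len(series) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("series too short for the requested dim and tau")
    return [[series[t + i * tau] for i in range(dim)] for t in range(n_points)]


# Hypothetical joint-angle trajectory: a sinusoid used in place of real pose data.
angle = [math.sin(0.1 * k) for k in range(200)]

# Each row is one point of the reconstructed trajectory in a 3-dimensional space.
embedded = delay_embed(angle, dim=3, tau=5)
print(len(embedded), len(embedded[0]))  # 190 3
```

The embedding only rearranges observed samples, which is why it requires no explicit model of the system that generated them.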