Human Action Recognition with Depth Cameras

65 pages · 2014 · 2.963 MB · English

Preview: Human Action Recognition with Depth Cameras

SpringerBriefs in Computer Science

Jiang Wang · Zicheng Liu · Ying Wu

Human Action Recognition with Depth Cameras

Series editors: Stan Zdonik, Peng Ning, Shashi Shekhar, Jonathan Katz, Xindong Wu, Lakhmi C. Jain, David Padua, Xuemin (Sherman) Shen, Borko Furht, V. S. Subrahmanian, Martial Hebert, Katsushi Ikeuchi, Bruno Siciliano

For further volumes: http://www.springer.com/series/10028

Jiang Wang, Northwestern University, Evanston, IL, USA
Ying Wu, Northwestern University, Evanston, IL, USA
Zicheng Liu, Microsoft Research, Redmond, WA, USA

ISSN 2191-5768; ISSN 2191-5776 (electronic)
ISBN 978-3-319-04560-3; ISBN 978-3-319-04561-0 (eBook)
DOI 10.1007/978-3-319-04561-0
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2014930577
© The Author(s) 2014

Preface

Action recognition is an enabling technology for many real-world applications, such as human-computer interfaces, surveillance, video retrieval, senior home monitoring, and robotics. In the past decade, it has drawn a great amount of interest in the research community. Recently, the commoditization of depth sensors has generated much excitement about action recognition from depth data. The new depth sensors have enabled many applications that were not feasible before. On one hand, action recognition becomes considerably easier with a depth sensor. On the other hand, people want to recognize more complex actions, which present new challenges. One crucial aspect of action recognition is extracting discriminative features. Depth maps have completely different characteristics from RGB images, so directly applying features designed for RGB images does not work.
Complex actions usually involve complicated temporal structures, human-object interactions, and person-person contacts. New machine learning algorithms need to be developed to learn these complex structures.

The goal of this book is to bring readers quickly to the research front in depth sensor-based action recognition, and to help them gain a deeper understanding of some of the recently developed techniques. We hope this book is useful for both researchers and practitioners who are interested in human action recognition with depth sensors.

This book focuses on the feature representations and machine learning algorithms for action recognition from depth sensors. After presenting a comprehensive overview of the state of the art in action recognition from depth data, the authors provide in-depth descriptions of their recently developed feature representations and machine learning techniques, including lower-level depth and skeleton features, higher-level representations that model temporal structures and human-object interactions, and feature selection techniques for occlusion handling.

Acknowledgments

We would like to thank our colleagues and collaborators who have contributed to the book: Junsong Yuan, Jan Chorowski, Zhuoyuan Chen, Zhengyou Zhang, and Cha Zhang. I owe great thanks to my wife Ying Xia; this book would not have been possible without her consistent support.

Evanston, USA, November 2013
Jiang Wang

Contents

1 Introduction
  1.1 Introduction
  1.2 Skeleton-Based Features
  1.3 Depthmap-Based Features
  1.4 Recognition Paradigms
  1.5 Datasets
  References

2 Learning Actionlet Ensemble for 3D Human Action Recognition
  2.1 Introduction
  2.2 Related Work
  2.3 Spatio-Temporal Features
    2.3.1 Invariant Features for 3D Joint Positions
    2.3.2 Local Occupancy Patterns
    2.3.3 Fourier Temporal Pyramid
    2.3.4 Orientation Normalization
  2.4 Actionlet Ensemble
    2.4.1 Mining Discriminative Actionlets
    2.4.2 Learning Actionlet Ensemble
  2.5 Experimental Results
    2.5.1 MSR-Action3D Dataset
    2.5.2 DailyActivity3D Dataset
    2.5.3 Multiview 3D Event Dataset
    2.5.4 Cornell Activity Dataset
    2.5.5 CMU MoCap Dataset
  2.6 Conclusion
  References

3 Random Occupancy Patterns
  3.1 Introduction
  3.2 Related Work
  3.3 Random Occupancy Patterns
  3.4 Weighted Sampling Approach
    3.4.1 Dense Sampling Space
    3.4.2 Weighted Sampling
  3.5 Learning Classification Functions
  3.6 Experimental Results
    3.6.1 MSR-Action3D
    3.6.2 Gesture3D Dataset
  3.7 Conclusion
  References

4 Conclusion
  4.1 Conclusion
  Reference

Index

Chapter 1: Introduction

Abstract: Recent years have witnessed great progress in depth sensor technology, which brings huge opportunities to the action recognition field. This chapter gives an overview of recent developments in 3D action recognition approaches and presents the motivations for the 3D action recognition features, models, and representations in this book.

Keywords: Action recognition · Depth camera · Depth feature · Human skeleton modeling · Temporal modeling

1.1 Introduction

Humans have a remarkable ability to perceive human actions purely from visual information. We can localize people and objects, track articulated human motions, and analyze human-object interactions to understand what people are doing and even infer their intents. Automatic human action understanding is essential for many artificial intelligence systems, such as video surveillance, human-computer interfaces, sports video analysis, video retrieval, and robotics. For example, to build a human-computer interface that intelligently serves people, a system should be able to not only sense human movement, but also understand people's actions and intent.

Despite much progress, human action recognition is still a challenging task, because human actions are highly articulated, involve human-object interactions, and have complicated spatio-temporal structures. Human action recognition systems not only need to extract low-level appearance and motion information from videos, but also require sophisticated machine learning models to understand the semantic meaning of that information. To achieve this goal, we have to make advances on multiple fronts: sensor technology to accurately obtain visual signals from the world, video/image representation to describe high-dimensional visual data, pattern mining to discover meaningful knowledge, and machine learning to learn from large amounts of data.
[Fig. 1.1 Comparison between RGB channels and depth channels]

The progress of sensor technologies has led to affordable high-definition depth cameras, such as the Microsoft Kinect. A depth camera exploits structured light to sense the depth map in real time: pixels in a depth map record the depth of the scene rather than the intensity of color. The introduction of depth cameras greatly extends the ability of computer systems to sense the 3D visual world and to extract low-level visual information, such as human/object tracking and segmentation. Depth maps are very different from conventional RGB images, as shown in Fig. 1.1. Working in low light levels, and being color and texture invariant, depth cameras offer several advantages over traditional RGB cameras. However, depth maps suffer from significant flicker noise, and their accuracy degrades as the distance to the camera increases. Moreover, when only a single depth camera is used, foreground and background occlusions occur frequently, which increases the difficulty of foreground/background segmentation. Novel visual representations and machine learning methods need to be developed in order to fully exploit depth sensors for human action recognition. This presents new scientific challenges that have to be solved to enable next-generation applications for security, surveillance, robotics, and human-computer interfaces.

Recent years have witnessed an influx of action recognition methods for depth cameras. These methods have made various advances in feature representation and recognition paradigms. We will review those methods and the recently proposed benchmark 3D action recognition datasets.

1.2 Skeleton-Based Features

Kinect provides a powerful skeleton tracking algorithm [1], which outputs 20 3D human joint positions for each frame in real time. Since the movements of the human skeleton can distinguish many actions, exploiting the Kinect skeleton output for action recognition is a promising direction.

However, compared to Motion Capture (MoCap) systems [2], which reliably estimate human joint positions with multiple cameras and joint markers, the joint positions output by Kinect have lower quality, because Kinect uses only a single depth camera to estimate them. The 3D human joint positions
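To make the skeleton data above concrete, here is a minimal sketch of one common way to turn per-frame joint positions into a translation-invariant action feature: pairwise relative joint positions, related to the invariant joint features developed in Chap. 2. This is an illustrative sketch only; the (T, 20, 3) array layout and the relative_joint_features helper are assumptions for this example, not part of the Kinect SDK or the book's code.

```python
import numpy as np

def relative_joint_features(joints):
    """Pairwise relative joint positions for each frame.

    joints: array of shape (T, J, 3) -- T frames, J joints,
    (x, y, z) per joint, e.g. J = 20 for Kinect skeletons.

    For every ordered pair of distinct joints (i, j), take the
    difference p_i - p_j. The result is invariant to where the
    body stands in the scene, since translation cancels out.
    """
    T, J, _ = joints.shape
    pairs = []
    for i in range(J):
        for j in range(J):
            if i != j:
                pairs.append(joints[:, i, :] - joints[:, j, :])
    # Stack to (T, J*(J-1), 3), then flatten to one vector per frame.
    return np.stack(pairs, axis=1).reshape(T, -1)

# Example: 30 frames of synthetic skeleton data.
frames = np.random.rand(30, 20, 3)
feats = relative_joint_features(frames)
print(feats.shape)  # (30, 1140): 20*19 pairs * 3 coordinates
```

Per the table of contents above, Chap. 2 combines joint features of this kind with local occupancy patterns and a Fourier temporal pyramid to capture temporal structure.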
