
Human Activity Recognition and Prediction (PDF)

179 pages · 2016 · 5.61 MB · English
by Yun Fu

Preview: Human Activity Recognition and Prediction

Yun Fu (Editor)
Human Activity Recognition and Prediction
Springer

Editor: Yun Fu, Northeastern University, Boston, Massachusetts, USA

ISBN 978-3-319-27002-9
ISBN 978-3-319-27004-3 (eBook)
DOI 10.1007/978-3-319-27004-3
Library of Congress Control Number: 2015959271

Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper. Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com).

Preface

Automatic human activity sensing has drawn much attention in the field of video analysis technology due to the growing demands from many applications, such as surveillance environments, entertainment, and healthcare systems. Human activity recognition and prediction is closely related to other computer vision tasks such as human gesture analysis, gait recognition, and event recognition. Very recently, the US government funded many major research projects on this topic; in industry, commercial products such as Microsoft's Kinect are good examples that make use of human action recognition techniques. Many commercial surveillance systems seek to develop and exploit video-based detection, tracking, and activity recognition of persons and vehicles in order to infer their threat potential and provide automated alerts.

This book focuses on the recognition and prediction of individual activities and interactions from videos that usually involve several people. It provides a unique view of human activity recognition, especially fine-grained human activity structure learning, human interaction recognition, RGB-D data-based recognition, temporal decomposition, and causal learning in unconstrained human activity videos. These techniques will significantly advance existing methodologies of video content understanding by taking advantage of activity recognition. As a professional reference and research monograph, this book includes several key chapters covering multiple emerging topics in this new field. It links multiple popular research fields in computer vision, machine learning, human-centered computing, human-computer interaction, image classification, and pattern recognition. Contributed by top experts and practitioners of the Synergetic Media Learning (SMILE) Lab at Northeastern University, these chapters complement each other from different angles and compose a solid overview of human activity recognition and prediction techniques.
Well-balanced contents and cross-domain knowledge covering both methodology and real-world applications will benefit readers at different levels of expertise in broad fields of science and engineering.

There are in total eight chapters in this book. The book first gives an overview of recent studies on activity recognition and prediction in the "Introduction" chapter, covering objectives, challenges, representations, classifiers, and datasets, and then discusses several interesting topics in these fields in detail. Chapter 2 addresses the challenges of action recognition through interactions and proposes semantic descriptions, namely "interactive phrases," to better describe complex interactions, and discriminative spatiotemporal patches to provide cleaner features for interaction recognition. Chapter 3 presents a sparse tensor subspace learning method to select variables for the high-dimensional visual representation of action videos, which not only keeps the original structure of the action but also avoids the "curse of dimensionality." Chapter 4 introduces a multiple mode-driven discriminant analysis (MDA) for tensor subspace action recognition, which preserves both discrete and continuous distribution information of action videos in lower-dimensional spaces to boost discriminant power. Chapter 5 presents a transfer learning method that can transfer knowledge from the depth information of RGB-D data to RGB data and use this additional source information to recognize human actions in RGB videos. In particular, a novel cross-modality regularizer is introduced that plays an important role in finding the correlation between the RGB and depth modalities, allowing more depth information from the source database to be transferred to the target. Chapter 6 introduces a multiple temporal scale support vector machine for early classification of unfinished actions; a convex learning formulation is proposed to account for the progressively arriving action data. Chapter 7 discusses an approach for predicting long-duration complex activities by discovering the causal relationships between constituent actions and predictable characteristics of the activities. Chapter 8 introduces an approach to the early classification of human activities represented by multivariate time series data, where the spatial structure of activities is encoded by the dimensions of a predefined human body model and the temporal structure of activities is modeled by temporal dynamics and sequential cues.

This book can be used by broad groups of readers, such as professional researchers, graduate students, and university faculty, especially those with a background in computer science and computer engineering. I would like to sincerely thank all the SMILE Lab contributors to this book for presenting their most recent research advances in an easily accessible manner. I would also like to sincerely thank editors Mary E. James and Rebecca R. Hytowitz from Springer for their strong support of this book project.

Boston, MA, USA
Yun Fu

Contents

1. Introduction (Yu Kong and Yun Fu)
2. Action Recognition and Human Interaction (Yu Kong and Yun Fu)
3. Subspace Learning for Action Recognition (Chengcheng Jia and Yun Fu)
4. Multimodal Action Recognition (Chengcheng Jia, Wei Pang, and Yun Fu)
5. RGB-D Action Recognition (Chengcheng Jia, Yu Kong, Zhengming Ding, and Yun Fu)
6. Activity Prediction (Yu Kong and Yun Fu)
7. Actionlets and Activity Prediction (Kang Li and Yun Fu)
8. Time Series Modeling for Activity Prediction (Kang Li, Sheng Li, and Yun Fu)
Chapter 1: Introduction
Yu Kong and Yun Fu

Y. Kong, Department of Electrical and Computer Engineering, Northeastern University, 360 Huntington Avenue, Boston, MA 02115, USA (e-mail: [email protected])
Y. Fu, Department of Electrical and Computer Engineering and College of Computer and Information Science (Affiliated), Northeastern University, 360 Huntington Avenue, Boston, MA 02115, USA (e-mail: [email protected])

The goal of human action recognition is to predict the label of the action of an individual or a group of people from a video observation. This topic is motivated by a number of useful real-world applications, such as visual surveillance and video understanding. In a large public square, for example, online visual surveillance that understands a group of people's actions is of great importance for public security, and an automatic video understanding system is an effective way to label millions of online videos.

However, in many real-world scenarios (e.g., vehicle accidents and criminal activity), intelligent systems do not have the luxury of waiting for the entire video before having to react to the action contained in it. For example, it is far more useful to predict a dangerous driving situation before it occurs than to recognize it afterward. This task is referred to as action prediction; it requires approaches that can recognize progressively observed video segments, as opposed to action recognition approaches that expect to see the entire set of action dynamics extracted from a full video.
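To make the recognition-versus-prediction distinction concrete, the sketch below (not taken from this book) evaluates an ordinary classifier on only the first portion of each video. Every name in it, including the toy clip_feature descriptor and the scikit-learn-style classifier, is a hypothetical placeholder rather than any method described in the following chapters.

```python
import numpy as np

def clip_feature(frames):
    """Toy clip descriptor: mean and standard deviation of per-frame intensity.
    A real system would use spatio-temporal or learned deep features instead."""
    per_frame = np.array([frame.mean() for frame in frames])
    return np.array([per_frame.mean(), per_frame.std()])

def predict_action(classifier, frames, observation_ratio):
    """Classify an action from only the first `observation_ratio` of the video.
    With observation_ratio = 1.0 this reduces to ordinary full-video recognition."""
    k = max(1, int(round(len(frames) * observation_ratio)))
    partial = frames[:k]                             # progressively observed segment
    feature = clip_feature(partial).reshape(1, -1)
    return classifier.predict(feature)[0]            # any fitted sklearn-style classifier

# Usage sketch (hypothetical data `test_videos` and fitted classifier `clf`):
# for ratio in (0.1, 0.3, 0.5, 1.0):
#     accuracy = np.mean([predict_action(clf, v, ratio) == y for v, y in test_videos])
#     print(ratio, accuracy)
```

A common way to evaluate prediction methods is to plot accuracy against the observation ratio; a recognition method, by contrast, is judged only at full observation.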
Although conventional color videos contain rich motion and appearance information, they do not provide structural information about the entire scene. In other words, machines cannot tell which object in a video is closer to the camera and which is farther away. With the recent advent of cost-effective Kinect sensors, action recognition from RGB-D cameras is receiving increasing interest in the computer vision community. Compared with conventional RGB cameras, Kinect sensors provide depth information, which captures the 3D structure of the entire scene. This 3D structural information can be used to facilitate the recognition task by simplifying intra-class motion variation and removing cluttered background noise.

In this chapter, we will first review recent studies in action recognition and prediction, covering action representations, action classifiers, and action predictors. Approaches for recognizing RGB-D videos will also be discussed. We will then describe several popular action recognition datasets, including ones with individual actions, group actions, unconstrained datasets, and RGB-D action datasets. Some existing studies [106, 107] aim at learning actions from static images, which is not the focus of this book.

1 Challenges of Human Activity Recognition and Prediction

Although significant progress has been made in human activity recognition and prediction, the most advanced algorithms still misclassify action videos due to several major challenges in this task.

1.1 Intra- and Inter-Class Variations

As we all know, people behave differently when performing the same action. For a given semantically meaningful activity, for example "running," a person can run fast, run slowly, or even jump while running. That is to say, one activity category may contain multiple different styles of human motion. In addition, videos of the same action can be captured from various viewpoints: they can be taken in front of the human subject, to the side of the subject, or even from above the subject. Furthermore, different people may show different poses when executing the same activity. All these factors result in large intra-class appearance and pose variations, which confuse many existing activity recognition algorithms. These variations are even larger on real-world activity datasets, which motivates the investigation of more advanced activity recognition algorithms that can be deployed in the real world.

Similarities also exist across different activity categories. For instance, "running" and "walking" involve similar human motion patterns. These similarities are challenging for intelligent machines to differentiate and consequently contribute to misclassifications.

In order to minimize intra-class variations and maximize inter-class variations, much effort has been made to design discriminative activity features. In some recent studies, researchers attempt to learn discriminative features using deep learning techniques in order to better describe complex human activities.
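As a loose illustration of that last point (a minimal sketch under stated assumptions, not a method from this book): a pretrained image CNN can pool frame-level features into a clip descriptor that is typically far less sensitive to intra-class appearance variation than raw pixels. The example assumes PyTorch and torchvision (version 0.13 or later for the weights API); the ResNet-18 backbone and the average-pooling strategy are illustrative choices only.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pretrained ResNet-18 with its classification head removed, used as a frame-level feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Standard ImageNet preprocessing applied to each RGB frame.
preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def video_descriptor(frames):
    """Average-pool per-frame CNN features into one 512-dimensional clip descriptor.
    `frames` is an iterable of HxWx3 uint8 RGB arrays (e.g., decoded video frames)."""
    feats = [backbone(preprocess(f).unsqueeze(0)) for f in frames]
    return torch.cat(feats, dim=0).mean(dim=0)
```

The resulting descriptors can be fed to any standard classifier. More recent work replaces this simple average pooling with temporal models, but the underlying idea of learning the representation rather than hand-crafting it is the same.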
