ebook img

An Interval-Based Bayesian Generative Model for Human Complex Activity Recognition PDF

2.5 MB·
by  Li Liu
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview An Interval-Based Bayesian Generative Model for Human Complex Activity Recognition

1 An Interval-Based Bayesian Generative Model for Human Complex Activity Recognition Li Liu1,2,3, Member, IEEE Yongzhong Yang4, Lakshmi N. Govindarajan4, Shu Wang5, Bin Hu6, Senior Member, IEEE Li Cheng3,4, Senior Member, IEEE, and David S. Rosenblum3, Fellow, IEEE 1Ministry of Education, Key Laboratory of Dependable Service Computing in Cyber Physical Society, Chongqing 400044, China 2School of Software Engineering, Chongqing University, Chongqing 400044, China 3School of Computing, National University of Singapore, 117417, Singapore 4Bioinformatics Institute, A*STAR, 138671, Singapore 5College of Material Science and Engineering, Lanzhou University of Technology, Lanzhou 730050, China 7 6School of Information Science and Engineering, Lanzhou University, 730000, China 1 0 2 Complex activity recognition is challenging due to the inherent uncertainty and diversity of performing a complex activity. Normally, each instance of a complex activity has its own configuration of atomic actions and their temporal dependencies. We n propose in this paper an atomic action-based Bayesian model that constructs Allen’s interval relation networks to characterize a complex activities with structural varieties in a probabilistic generative way: By introducing latent variables from the Chinese J restaurantprocess,ourapproachisabletocaptureallpossiblestylesofaparticularcomplexactivityasauniquesetofdistributions 4 over atomic actions and relations. We also show that local temporal dependencies can be retained and are globally consistent in the resulting interval network. Moreover, network structure can be learned from empirical data. A new dataset of complex hand ] activities has been constructed and made publicly available, which is much larger in size than any existing datasets. Empirical L evaluations on benchmark datasets as well as our in-house dataset demonstrate the competitiveness of our approach. M . Index Terms—complex activity recognition, Allen’s interval relation, Bayesian network, probabilistic generative model, Chinese t a restaurant process, temporal consistency, American Sign Language dataset. t s [ 1 I. INTRODUCTION practical scenarios where temporal relations among activities v 3 Acomplexactivityconsistsofasetoftemporally-composed areintricate.Althoughanumberofsemantic-basedapproaches 0 eventsofatomicactions,whicharethelowest-leveleventsthat have been proposed for learning temporal relations, such as 9 stochastic context-free grammars [29] and Inductive Logic can be directly detected from sensors. In other words, a com- 0 Programming (ILP) [9], they can only learn formulas that plex activity is usually composed of multiple atomic actions 0 are either true or false, but cannot learn their weights, which . occurring consecutively and concurrently over a duration of 1 hinders them from handling uncertainty. time. Modeling and recognizing complex activities remains 0 On the other hand, graphical models become increasingly 7 an open research question as it faces several challenges: popular for modeling complex activities because of their 1 First, understanding complex activities calls for not only the : inferenceofatomicactions,butalsotheinterpretationoftheir capabilityofmanaginguncertainties[31].Unfortunately,most v of them can handle three temporal relations only, i.e. equals, i richtemporaldependencies.Second,individualsoftenpossess X follows and precedes. Both Hidden Markov model (HMM) diverse styles of performing the same complex activity. As a r result,acomplexactivityrecognitionmodelshouldbecapable and conditional random field (CRF) are commonly used for a recognizing sequential activities, but are limited in managing of capturing and propagating the underlying uncertainties overlappingactivities[13].Manyvariantswithcomplexstruc- over atomic actions and their temporal relationships. Third, a tures have been proposed to capture more temporal relations complex activity recognition model should also tolerate errors among activities, such as interleaved hidden Markov models introduced from atomic action level, due to sensor noise or (IHMM)[20],skip-chainCRF[12]andsoon.However,these low-level prediction errors. modelsaretimepoint-based,andhencewiththegrowthofthe number of concurrent activities they are highly computation- A. Related Work ally intensive [22]. Dynamic Bayesian network (DBN) can Currently,alotofresearchfocusesonsemantic-basedcom- learn more temporal dependencies than HMM and CRF by plex activity modeling. Many semantic-based models such as adding activities’ duration states, but imposes more computa- context-free grammar (CFG) [26] and Markov logic network tionalburden[21].Moreover,thestructuresofthesegraphical (MLN) [11], [18]) are used to represent complex activities, modelsareusuallymanuallyspecifiedinsteadoflearnedfrom which can handle rich temporal relations. Yet formulae and thedata.TheintervaltemporalBayesiannetwork(ITBN)[31] their weights in these models (e.g. CFG grammars and MLN differs significantly from the previous methods, as being a structures) need to be manually encoded, which could be graphical model that first integrates interval-based Bayesian rather difficult to scale up and is almost impossible for many network with the 13 Allen’s relations. Nonetheless, ITBN 2 has several significant drawbacks: First, its directed acyclic TableI Bayesian structure makes it have to ignore some temporal ASUMMARYCOMPARISONOFGRAPHICALMODEL-BASEDAPPROACHES FORRECOGNIZINGCOMPLEXACTIVITIES. relations to ensure a temporally consistent network. As such, it may result in loss of internal relations. Second, it would HMMs ITBN our &BNs IBGN be rather computationally expensive to evaluate all possible Time-point-based(p)orinterval-based(i)? p i i consistent network structures, especially when the network Howmanytemporalrelationscanbedescribed? 3 13 13 Retainalltheintervalrelationsduringtrainingstage? (cid:88) X (cid:88) size is large. Third, neither can ITBN manage the multiple Handleanypossiblecombinationsofintervalrelations? X (cid:88) (cid:88) Handlemultipleoccurrencesofthesameatomicaction? (cid:88) X (cid:88) occurrences of the same atomic action, nor can it handle Handlevariablenumberofoverlappingactions? X (cid:88) (cid:88) arbitrarynetworksizeasitremainsunchangedasthecountof Describeacomplexactivitywithvariablesizesofpoints (cid:88) X (cid:88) orintervals? atomic action types. Figure 1 illustrates the graph structures Isthestructurelearnedfromtrainingdata? X (cid:88) (cid:88) of the three commonly-used graphical models. It is worth It is worth mentioning that in spite of the increasing need from diverse applications in the area of complex activity recognition, there are only a few publicly-available complex activity recognition datasets [25], [3], [15]. In particular, the (a) IHMM (b) DBN (c) ITBN number of instances are on the order of hundreds at most. This motivates us to propose a dedicated large-scale dataset Figure 1. The structures of three graphical models for complex activity recognition. (a)IHMM, where the observations of atomic actions (square- ondepthcamera-basedcomplexhandactivityrecognition.We shaped nodes) and several chains of hidden states (round-shaped nodes) are have constructed a new dataset of complex hand activities usedtohandleoverlapping;(b)DBN,wheredurationstatesandatomicaction whichcontainsinstancesthatareaboutanorderofmagnitude statesarerepresentedaschainsofnodes;(c)ITBN,whereanyatomicaction type (A1-A9) is represented by a node and the set of all possible interval largerthantheexistingdatasets.Wehavemadethedatasetand relationsbetweenanypairofatomicactiontypesAiandAj isrepresentedby related tools made publicly available on a dedicated project alinkIi,j. website1 in support of the open-source research activities in this emerging research community. noting that we will focus on complex activity recognition in this paper, and interested readers may consult the excellent reviews [1], [6], [14], [7], [4], [8] for further details regarding II. DEFINITIONSANDPROBLEMFORMULATION atomic-level action recognition. Assume we have at hand a dataset D of N instances from a setofL typesofcomplexactivitiesinvolvingasetofM types of atomic actions A = A ,A ,...,A . An atomic action B. Our approach interval (interval for sh{ort1) is2 writteMn}by Ii = ai@[ti−,ti+), Toaddresstheproblemsinthepreviousmodels,wepresent whereti− andti+ representthestartandendtimeoftheatomic an interval-based Bayesian generative network (IBGN) to ex- action a A, respectively. Each complex activity instance is i ∈ plicitly model complex activities with inherent structural vari- a sequence of k ordered intervals, i.e. I ,I ,...,I , such that 1 2 k eties, which is achieved by constructing probabilistic interval- if 1≤i< j ≤k, then (ti− <t−j ) or ((cid:104)ti− =t−j and(cid:105) ti+ ≤t+j ). based networks with temporal dependencies. In other words, Seven Allen’s interval relations (relations for short) [2] are our modelconsiders aprobabilistic generativeprocess ofcon- used to represent all possible temporal relationships between structing interval-based networks to characterize the complex two intervals, denoted by R = b,m,o,s,c,f, , which is { ≡} activitiesofinterests.Brieflyspeaking,asetoflatentvariables, summarized below: calledtables,whicharegeneratedfromtheChineserestaurant (before) Ii bIj : ti+<t−j , process (CRP) [23] are introduced to construct the interval- based network structures of a complex activity. Each latent (meet) Ii mIj : ti+=t−j , variablecharacterizesauniquestyleofthiscomplexactivityby (overlap) Ii oIj : ti−<t−j <ti+<t+j , containing its distinct set of atomic actions and their temporal (start) Ii sIj : ti−=t−j andti+<t+j , dependencies based on Allen’s interval relations. There are (contain) Ii cIj : ti−<t−j andt+j <ti+, two advantages to using CRP: It allows our model to describe (finished-by) Ii f Ij : ti−<t−j andti+=t+j , a complex activity of arbitrary interval sizes and also to take (equal) Ii ≡ Ij : ti−=t−j andti+=t+j . into account multiple occurrences of the same atomic actions. We define an interval-based network (network for short) We further introduce interval relation constraints that can G=(V,E)torepresentacomplexactivitycontainingthetem- guarantee the whole network generation process is globally poralrelationshipsbetweenintervals.Anodev V represents i temporally consistent without loss of internal relations. In ∈ the i-th interval (i.e. I) in a complex activity instance, while i addition, instead of manually specify a network to a fixed a directed link e E represents the relation between two i,j structure,thenetworkstructureinourapproachislearnedfrom ∈ intervals (i.e. I and I ), where 1 i< j V (V is the i j training data. By learning network structures, our model is ≤ ≤| | | | cardinality of the set V). Note a link always starts from a moreeffectivethanexistinggraphicalmodelsincharacterizing node with a smaller index than its arrival node. Each link e i,j the inherent structural variability in complex activities. A furthercomparisonstudyissummarizedinTableI,whichalso 1The dataset including raw videos, annotations and related tools can be shows our main contributions. foundathttp://web.bii.a-star.edu.sg/ lakshming/complexAct/. ∼ 3 can be constructed following the styles of the complex ac- tivities of interests under uncertainty. We shall also consider the IBGN model to construct networks with variable sizes of nodes and arbitrary structures. Note in our approach, for each type of complex activities we would learn one such dedicated IBGN model. (a) OffensiveplayI (b) OffensiveplayII III. OURIBGNMODEL For any complex activity type l (1 l L), denote D D l ≤ ≤ ⊆ the corresponding subset of N instances, where each element l d D is an instance of the l-th type of complex activity In l ∈ IBGN,thegenerativeprocessofconstructinganinterval-based network G =(V ,E ) for describing the observed instance d d d d consists of two parts, node generation and link generation, (c) NetworkI (d) NetworkII which are described below. Figure2. Twopossiblestylesofthecomplexactivityoffensiveplayandits corresponding networks where the nodes represent the intervals of atomic actions and the links represent their temporal relations. Atomic actions: A. Node Generation A1=walk,A2=stand,A3=holdball,A4=jump,A5=dribble,A6=shoot. In our IBGN model, we consider generating a network where each node is associated with an atomic action in a involvesoneandonlyonerelationr R.Figure2illustrates probabilisticmanner.Wealsorequireourmodeltobecapable i,j ∈ the corresponding networks of a complex activity. ofgeneratingvariablenumbersofnodesandhandlingmultiple The temporal relations on links shall be globally consis- occurrences of the same atomic action in our network, as tent in any interval-based network. Given any two relations summarized in Table I. r ,r R on links e and e , respectively, the relation r Toaccomplishthesetasks,wefirstextendtheprocessofthe i,j j,k i,j j,k i,k on link∈e must follow the transitivity properties as shown in CRPs to make it available in our IBGN model. Originally, a i,k Figure3.Forexample,supposeI meetsI andI startsI , CRPisanalogoustoastochasticprocessofchoosingtablesfor i j j k I is certainly meets I . However, if I starts I and I is customersinaChineserestaurant.Inanutshell,supposethere i k i j j finished by Ik, there are three possible relations between areaninfinitenumberoftables{T1,T2,...}.Thefirstcustomer I andI ,thatis,before,meetandoverlap.Weformally (n=1)alwaysselectsthefirsttable;Anyothercustomer(n> i k use r r to denote such composition operation following 1)randomlyselectsanoccupiedtableoranunoccupiedempty i,j j,k thetrans◦itivityproperties.Wesayanetworkisconsistent such table with a certain probability. that r r r for any triplet of links e , e and e in We continue this process by serving dishes right after each i,k i,j j,k i,j j,k i,k the netw∈ork. ◦ customerisseatedatatable.Assumethereareafinitenumber of dishes and an infinite number of cuisines. Each table is associated with a cuisine that dishes are served with a ri,j◦rj,k b m o s c f ≡ b b b b b b b b uniqueprobabilitydistribution.Anycustomersittingatatable m b b b m b b m randomly selects a dish with the probability relating to its o b b bmo o bmocf bmo o corresponding cuisine. s b b bmo s bmocf bmo s c bmocf ocf ocf ocf c c c In our model, a network contains a group of customers f b m o o c f f where each customer is analogous to a node while a dish is b m o s c f ≡ ≡ analogous to an atomic action type. Suppose customers from the same group prefer similar cuisines, which is analogous Figure3. Thetransitivitytableforanyintervalrelationri,k throughcompo- to a complex activity of interest, and are more likely to sit sitionoperationri,j◦rj,k. at the same tables. Formally, denote t1,t2,... the variables { } Itisclearthatanetworkcancharacterizeonepossiblestyle of tables, and {a1,a2,...} the variables of atomic actions 2. of a complex activity with diverse combinations of atomic To construct a network Gd =(Vd,Ed), tn and an are the table and the atomic action (dish) assigned to the node (customer) actions and their interval relations, as illustrated in Figure 2. v V (n=1,2,..., V ). The generative process operates as From another point of view, a complex activity can be in- n∈ d | d| follows.Then-thnodevn selectsatabletn thatisdrawnfrom stantiated by sampling atomic actions and relations assigned the following distribution: to their associated nodes and links in a network following certain probabilities. In this way, a recognition model built (cid:26) ntζ if T is a non-empty table , on such networks is able to handle uncertainty in complex P(tn=Tζ|t1,...,tn−1)= nn++ααα−11 if Tζζ is a new table, activity recognition. In addition, multiple occurrences of the − (1) same atomic action can appear in the same network but in different nodes (i.e. intervals). To this end, we present the 2Whenever possible, we would use bold lowercase letters such as tn, an to represent variables, and uppercase letters such as Ti and Aj to represent probabilistic generative model IBGN where these networks genericvaluesofthesevariables. 4 Figure4. Anillustrationofthegenerativeprocessofchoosingatabletn andanatomicactionan fornodevn. where nt is the number of existing nodes occupied at table relationr onlinke shallfollowthetransitivityproperties ζ n,n n,n (cid:48) (cid:48) Tζ (ζ =1,2,...), with ∑ζntζ =n 1, t=T1. α is a tuning listed in Figure 3. It is straightforward to verify that the set hyperparameter. It is worth mentio−ning that the distribution R is closed under the composition operation. As a result, by over table assignments in CRP is invariant and exchangeable applyingthetransitivitytable,foranycompositionthereexists under permutation according to de Finetti’s Theorem [28]. only 11 possible transitive relations in Figure 6, denoted as After vn is assigned with a table tn =Tζ, an atomic action C= Cz :1 z 11 . In other words, any composition of (dish) an is chosen from the table by: two c{onsecut≤ive ≤relati}ons satisfies rn,n rn ,n C. (cid:48) (cid:48)(cid:48)◦ (cid:48)(cid:48) ∈ (an|tn)∼Multinomial(θζ), θζ ∼Dirichlet(β), CC81=={{bb,m},,Co}2,=C9{m=}{,oC,3c,=f}{,oC}1,0C=4={b{,sm},,oC,5c,=f}{,cC},11C=6={b{,fm},,oC,7s,=c,{f≡,≡}},. where β is a hyperparameter. Note θ =(θ ,...,θ )T is ζ ζ,1 ζ,M the parameter vector of a multinomial distribution at table Tζ. Figure6. The11possibleintervalcompositionrelations. Figure4presentsacartoonexplanationofthisnodegeneration To construct a globally consistent network, the relations on process of our IBGN model. Since we would obtain one every triangle in a network must also be consistent. Namely, for any triplet of nodes v , v and v , if there exist three n n n (cid:48) (cid:48)(cid:48) links r , r , and r , they need to satisfy the transitive n,n n,n n ,n (cid:48) (cid:48) (cid:48)(cid:48) (cid:48)(cid:48) relation r r r . As such, we define the interval n,n n,n n ,n (cid:48) ∈ (cid:48) (cid:48)(cid:48)◦ (cid:48)(cid:48) relation constraint variable as follows: Definition 1. Give an arbitrary interval-based network G= (a) Distributions of atomic actions at different tables that collectively represent thecomplexactivityoffensiveplay. (V,E),theintervalrelationconstraintcn,n Cforalinken,n (cid:48) ∈ (cid:48) ∈ E (1 n <n V ) is (cid:48) ≤ ≤| | cn(cid:48),n=(cid:26) (cid:84)Rnn−(cid:48)(cid:48)=1n(cid:48)+1(xn(cid:48),n(cid:48)(cid:48)◦xn(cid:48)(cid:48),n) iiff nn>=nn(cid:48)(cid:48)++11,, (cid:26) where xp,q= rcpp,,qq iiff eepp,,qq∈/EE,, (n(cid:48)≤p<q≤n). ∈ (b) Distributionsofatomicactionsatdifferenttablesthatcollectivelyrepresent thecomplexactivitydefensiveplay. In link generation, our IBGN model follows the rule that anyrelationr canonlybedrawnfromc .Wehaveproved n,n n,n Figure5. Examplesofthetablesandcorrespondingdistributionsoveratomic (cid:48) (cid:48) thatanetworkconstructedunderthisruleisgloballyconsistent actions for representing two complex activities, offensive play and defensive play, respectively. Here each table contains a set of atomic and complete, with proofs relegated to the Appendix A. actionsunderaspecificdistribution. Theorem 1. (Network Consistency and Completeness) dedicated IBGN model for each complex activity, a complex A network Gd constructed by obeying the interval relation activity is now characterized as a unique set of distributions constraints is always temporally consistent, and any possible over atomic actions, i.e. θ = θ1,θ2,... . As illustrated in combinationofrelationsinGd canbeconstructedthroughour { } Figure 5, the two different complex activities offensive play IBGN model. and defensive play are associated with two distinct sets of Now, we are ready to assign relations to links. Suppose tables with their own distributions over atomic actions. It an =Ai, an=Aj and cn,n=Cz (1 i,j M, 1 z 11), a suggests that our IBGN modeling approach is capable of (cid:48) (cid:48) ≤ ≤ ≤ ≤ relation rn,n on link en,n Ed is chosen from a distribution differentiating the underlying characteristics associated with (cid:48) (cid:48) ∈ over all possible relations in c as follows: n,n atomic actions from different complex activity categories. (cid:48) rn(cid:48),n|an(cid:48),an,cn(cid:48),n∼Multinomial(ϕi,j,z), B. Link Generation where ϕ is the parameter vector of the multinomial dis- i,j,z Once each node (interval) is assigned to an atomic action, tribution associated with the triplet (A,A ,C ). Note that for i j z links and their associated relations are to be generated next. anintervalrelationconstraintcontainingonlyonerelation(i.e. It is important to ensure consistency of the resulting relations C -C ),theprobabilityofchoosingthatrelationisalwaysone. 1 7 overalllinks.Formally,giventworelationsr andr on It is also important to notice that the quality of the net- n,n n ,n (cid:48) (cid:48)(cid:48) (cid:48)(cid:48) thelinkse ande (n <n <n),respectively,theinterval work structure plays an extremely important role in activity n,n n ,n (cid:48) (cid:48)(cid:48) (cid:48) (cid:48)(cid:48) (cid:48)(cid:48) 5 (a) Chain-basednetwork. (b) Fully-connectednetwork. (c) An exemplar arbitrary network structure. Figure7. Threepossiblenetworkstructuresinlinkgeneration. modeling. In our previous work [17], two variants with fixed Algorithm 1 Generative process. network structures are considered: chain-based network as in 1: procedureGENERATE-NETWORKS(Dl) Fnoigduersea7r(ea)c,ownshterruectoendlyinthneetlwinokrsksb;etfwuleleyn-cotwnnoecnteeidghnbeotuwroinrgk 234::: LCfoehraoreonasceahnacodopimstitpmrilbeauxltniaoecnttwivθoiζtryk∼isnDtsrutiarcintcucherleedtG(βi∗n)fD(rζolm=doD1,l;2,...); (cid:46)c(cid:46)onGs∗tr=uct(VaGn∗e,tEwGo∗r)k, as in Figure 7(b), where all pairwise links are constructed in Gd=(Vd,Ed) nme|vbV1neaod,ts2→w|de,−deeol2rv.,k13nnI+,s)en..t1.wr.Ic.eno,hAperark|nfiVeansydsc-|e−bteni,1anqt,sitt|uVnheeadrdge|lvysanatlheartoetrerweelgoRlatierwtn,kniokosean,rnasoodtpcfenoedltcnyt,hiwsautawlorsaiscitensnhauttesceieocghsnfnh,,nnob|n+V+foe1d1rtoi|wn(−ui1ongrr1≤kcInsBhloinaGndaikenr≤Ness- 11167589120::::::::SupposefoarnCCf(cid:48)eoa=hhrcoohieAooefassnineceedonhCC(cid:48)daa,inelnhafitnlaoovackbobntuloselee(emay1ntst(cid:48)eai,n≤cntcr∼ha(ennencl(cid:48)C,a≤tn−sitRtoiroo|1nPuVnnc(≥datttr|un1h)nn,re(cid:48)|de(cid:48).,nt.≥olno.i|n,f∼a1tknnG)(cid:48)−Me,d∗1nau(cid:48)o;n,(αlni,t.;ic)en;n.o(cid:48),emnni(cid:48)∼,anlM(|=θuζGl)t;i∗n(cid:46))o(cid:46)tSm(cid:46)huSieSapunulpp(poϕppsoioe,sjs,ezce)n;a(cid:48)t,nnn===ATCζ(cid:46)zj 13: endfor inherently consistent because no inconsistent triangle exists. 14: endfor 15: endfor However crucial relations may be missing in this model. 16: endprocedure On the other side of the spectrum, we have fully-connected networks, where all possible pairwise links are considered. Any xp,q in fully-connected networks equals to rp,q. When weoftenset(cid:96)=max Vd ,duetothefactthatgivenatraining fittingtheIBGNmodel,itispossibletoincreasethelikelihood datasetD anumdb∈eDrlo{f|m|a}x V tablesareoccupiedatmost. byaddinglinks,butdoingsomayresultinoverfitting.Instead l d∈Dl{| d|} of prefixing the network structures, in this work we relax the assumption of a structure being either fully-connected or IV. IBGNLEARNING chain-based, and consider learning an optimal structure (i.e. In what follows we focus on how to learn the network to decide which links should exist in the network) from data. structure and the parameter vectors θ and ϕ from the training This allows the IBGN model to handle temporal consistency data D for a particular complex activity l. l for arbitrary network structures, as presented in Figure 7(c). A. Learning Network Structure C. The Generative Process Instead of configuring a network with chain-based or fully- For each dataset D D containing a particular complex connected links, we would like to learn an network structure l ⊂ GinIBGNaccordingtoascorefunctionthatbestmatchesthe activity 1 l L, our model assumes the whole generative process in≤clud≤ing node generation and link generation in training data Dl, i.e., learning from empirical data on which links to select in our constructed networks. Algorithm 1. Notice that the optimal network structure G∗ would be learned from D with details to be elaborated in An IBGN model is built over a collection of variables t l for table assignments, a for atomic-action assignments and section IV-A. The structure G∗ demonstrates whether a link r for relation assignments. In detail, for a specific instance needstobegeneratedinthenetwork.Forexample,toconstruct d =< I ,I ,...,I > with k = V , its corresponding net- iitefnhnse(cid:48),atnann∈ndectEewodnod|l=r,yktGhifGe∗.delni=(cid:48)n,nk(oVebdne(cid:48),,Eynsdf)rtohfmeorsvtarnu(cid:48)ccttoeurrtevaninoifscGoinm∗v,poldlevexendoatcientdivGibtyyd s{rwicdo=1er,2kr,{icarn11v1,,3go2,e,l.2vnr.ee1.,sr,3ac,vl1.a,.knrk.,ie,actrb2w1,l3,eok,s,rcrkt22,=,4s3,t,.{rru.t2.1c,,4,tcu,t22.r|,.,ek..,,d,.|.w.,re.t2k.,}.kfi.,,r..as,.tc.=.ki−.n{.1ta,r,kro1}k,d−.au12c,,Tke.}o.,a.,cacnokun=}l-l, The joint distribution of variables t, a, and r, is given by: intervaltomakeeveryinstanceinD havingthesamenumber l P(a,t,r;α,β)=d∏∈D∏l(cid:16)vn∏P∈V(drn(cid:0)(cid:48)P,n(|tnan|,ta1,n.(cid:48),.c.n,(cid:48)t,nn−)(cid:17)1;.α)P(an|tn;β)(cid:1) (2) o(a+nfd∞k∗i,tns=utlelmd,m∈+aDpx∞lo{r)|aVlsdu|r}cehlaitntihoteanrtvwiatlsist.hasAasoncyniuaotltelhdeinratteionrmvteairlcvaiaslcstdiiosenfinniuseldln.uIalnsl n(cid:48)<n, other words, null can be viewed as a special atomic action en(cid:48),n|=G∗ class. For any instance of size k<k , k k null intervals ∗ ∗ − T∑hde∈Dtlo|Vtadl| annudm|bDelr|·o|EfGv∗a|,riraebslpeescttiv,ealy,.aItndis wroarrteh n∑odti∈nDgl|Vthda|t, ainrestaanpcpeehnadsedtheatsathmeerneuamr boefrtohfeki∗nisntatenrcvea.lsI.nCtohrirseswpaoyn,dienvgelryy, 6 e1,k∗ (r1,k∗,c1,k∗) (t1v,1a1) (r1,e21,,c21,2) //(t2v,2a2)(r1,ek1,,ck1,k)(r2//,e.k2.,,c.k2(,rk1),k∗e−11,k,∗c−11,k∗−**//551()tkv,kak) (rk,k∗e//−.k.1,k.,∗c−k1,k∗−1)(rk,ek∗k,,ck//∗66k(,tk((((k∗∗)−v1k∗,a−k1∗−1) (cerkkk∗∗∗−−−111,,k,kk∗∗∗), 88//(%%%%tk∗v,ka∗k∗) (r2,k∗e−21,k,∗c−21,k∗−1) e2,k∗ (r2,k∗,c2,k∗) Figure8. TheillustrationoftheIBGNnetworkstructureGassociatedwithvariables. any IBGN has a total number of (k +1)k possible random links E with the maximal score under the above constraints. ∗ ∗ G(cid:48) variables, with k∗ possible variables for tables t, k∗ possible In particular, we consider the Bayesian Information Criterion variables for atomic actions a, k∗(k∗−1) possible variables (BIC) as the score function, which addresses the overfitting 2 for interval relations r and k∗(k∗−1) possible variables for issue by introducing a penalty term in the structure, as proved 2 interval relation constraints c. An exemplar IBGN model can in the Appendix B: be described in Figure 8, where each vi is associated with BIC-Score(G:Dl)=BIC-Score(G(cid:48):Dl)+λ vlaenaardrinacbiin,ljeg,swptiriotahbnld1em≤ai,iias≤nddjeefi≤ancekhd∗.eaiT,sjoitstohaifisssneodncdiaa,teoGdur=wIiB(thVGGvN,aErsiGatrb)ulcestuurcir,hej =mΦaxlog|Wi∏=G1(cid:48)|∏jΠ(cid:99)=i1k∏=81φinj(cid:48)ikjk−log2|Dl|·|Wi∑=G1(cid:48)|7·Π(cid:99)i+λ, (3) that the score of G given Dl is maximum. where λ is a constant. Π denotes the parents of the i-th i moNdeexltt,owaeceomrrpelsopyosntdruincgtuprerocbolnesmtraininBtsatyoetsriaannslnaetetwthoerkIsB.GN pnoosdseibilne WinGst(cid:48)anatniadtioΠ(cid:99)nis=of(Mthe+p1a)re|Πnit|,sewthΠichofisthtehei-tnhumnobdeer. Ionf i Definition 2. Given an IBGN model G, its corresponding fact, Π(cid:99)i =1 or (M+1)2. For example, suppose v(cid:48)n,n be the (cid:48) BGa(cid:48)y=esi(aVnG(cid:48),nEetGw(cid:48))orwkhiesredeVfiGn(cid:48)e=d UasG(cid:48)a(cid:83)WdiGre(cid:48)c,tewdithbipthaertitsetrugcrtauprhe ia-nthd nvo(cid:48)nd→e ivn(cid:48)n(cid:48),Wn Ge(cid:48)xiasntdinv(cid:48)nE(cid:48),Gv(cid:48)(cid:48)n (∈i.eU.Gi(cid:48)n.-dIfegtrheee(lvi(cid:48)nn(cid:48)k,ns) =v(cid:48)n(cid:48)2→), tvh(cid:48)ne(cid:48),nn constraints such that the node v(cid:48)n(cid:48),n has two parents (i.e. Πi = {v(cid:48)n(cid:48),v(cid:48)n}) whose (1)|UG(cid:48)|=k∗, |WG(cid:48)|=k∗(k∗−1)/2, number of categories is (M+1), and thus Π(cid:99)i = (M+1)2; (2) :v(cid:98)=M+1, :v(cid:98)=8, otherwise, Π(cid:99)i=1. In addition, Φ is the parameter vector such ∀v(cid:48)∈UG(cid:48) (cid:48) ∀v(cid:48)∈WG(cid:48) (cid:48) that ijk:φijk=P(xik πij), where xik and πij denotes that the (3) v U :in-degree(v(cid:48))=0, v W :in-degree(v(cid:48))=0 or 2,i-thn∀odeinW isassi|gnedwiththek-thelementanditsparent ∀ (cid:48)∈ G(cid:48) ∀ (cid:48)∈ G(cid:48) G(cid:48) nodes are assigned with the j-th element (i.e. an instantiation where v(cid:98)(cid:48) denotes the number of distinct elements of v(cid:48). of its parent set Π), respectively; n indicates how many i (cid:48)ijk A node v(cid:48)n in UG(cid:48) (1≤n≤|UG(cid:48)|) maps to the variable an instances of Dl contain both xik and πij. Note that any node itno Gth,ewvahriilaebalenrond(cid:48),en vin(cid:48)n(cid:48),nG.inTWhaGt(cid:48) i(s1, ≤UGn(cid:48)(cid:48)=<{na1≤,.|.U.,Ga(cid:48)|k)∗}maanpds sve(cid:48)nv(cid:48),ner∈alWteGc(cid:48)hnhiaqsueesigchatnelbeemeemntpslo(iy.ee.dv(cid:100)t(cid:48)no(cid:48),nle=arn8)t.hAetsttrhuicstusrteagoef, WinGtr(cid:48)o=duc{erd1,2t,or1r,3e,p.r.e.s,ern1t,k∗th,.e..a,brske∗−n1c,ek∗}o.f Naontiocdeethoartaarenlualtlioins GG(cid:48)(cid:48)∗e=ffiacrigenmtlayx[B5I]C,-[S1c0o],re[(1G6](cid:48).:DAlf)te,rafilnindkingent,hneisbeinstGstruifctaunrde in an instance of Dl (constraint (2)), and thus there are M+ G(cid:48) (cid:48) only if in-degree(v W )=2. 1Rpo+ss1ib=le8aptoomssiicb-laectrieolnatiaosnsiganssmigennmtsefnotsr afornoadenoidneUinG(cid:48)Wand. (cid:48)n(cid:48),n∈ G(cid:48)∗ | | G(cid:48) Moreover,anynodev inU hasnoparent,andanynodev B. Learning Parameters (cid:48)n G(cid:48) (cid:48)n(cid:48),n in WG(cid:48) has either being connected to the nodes vn(cid:48) and vn in We first estimate the parameters θ = {θ1,θ2,...,θ(cid:96)} for UG(cid:48) or not being connected to any node (constraint (3)). That node generation. Since the variable tn is latent for each node means a link en(cid:48),n exists in G if and only if its corresponding vn in our generative process, we shall approximately estimate node v(cid:48)n,n in G(cid:48) has in-degree(v(cid:48)n,n)=2. The structure of G(cid:48) the posterior distribution on t by Gibbs sampling. Formally, (cid:48) (cid:48) associated with the variables is illustrated in Figure 9. we marginalize the joint probability in Eq.(2) and derive the posteriorprobabilityoflatentvariabletn beingassignedtothe UG′: v′1(a1) v′2(a2) ...... v′n(an) ...... v′k∗−1(ak∗−1) v′k∗(ak∗) table Tζ (1≤ζ ≤(cid:96)) as follows: FWiGg′u:re(rv91′1(cid:15)(cid:15),,22.)(cid:1)(cid:1) Th..e.co&&(rrvr1′1e,,nns)(cid:1)(cid:1)pon..d.ing++(rv1′1b,,kki∗∗p−−11a)r{{tite,,(rvg1′1r,,kka∗∗)p{{hG(cid:48)..o.fth,,(((evr′nn,,Ikk∗∗B−−11G)N,,’’(rvn′1s,,kkt∗∗r)uc..t.ur,,++(ervk′k∗∗G−−11.,,kk∗∗) P(tn=Tζ|t−n,a,r;α,β)∝∑Mi(cid:48)=1n(anζa,˜i,ζ−,in(cid:48),+−nβ+ζ,β˜iζ,i(cid:48))× nn++nαααtζζζζ−−11 iiffζζ≤=NNTTnn+1 where na is the count that nodes have been assigned to the ζ,i (cid:48) Now, the problem of determining whether a link should atomic action type A at the table T , and nt is the count of i ζ ζ (cid:48) exist in IBGN is converted to the problem of finding a set of nodes in Gd that has been assigned to the table Tζ. ˜i refer to 7 the atomic action assignments for nodes vn. NTn is the count By applying a Lagrange multiplier to ensure ∑|rC=z1|ϕi,j,z,r=1, of occupied tables with ∑Ni=T1nnti=n−1. The suffix −n of na maximum likelihood estimate for ϕi,j,z,r is means the count that does not include the current assignment nri,j,z,r otafbtlaeblseelfeocrtitohnendoudriengvnC.αRζP;isβtζh,ei tiusntihnegDpairriacmhleetterpfroiorrthfeorζt-hthe ϕi∗,j,z,r= ∑|rC(cid:48)=z|1nri,j,z,r(cid:48), (5) i-th atomic action conditioned on the ζ-th table. We provide where nri,j,z,r is the number of links en,n are labeled with thedetailedderivationsoftheGibbssamplinginAppendixC. the r-th relation in Dl, with an =Ai, an=(cid:48) Aj, cn,n=Cz and (cid:48) (cid:48) Bysamplingthelatenttablesfollowingtheabovedistribution, en(cid:48),n|=G∗. Note the trivial cases of ϕi∗,j,z,r =1 for 1≤z≤7 the distributions of θ (1 ζ (cid:96)) can be estimated as as each of them contains only one element as indicated in ζ ≤ ≤ Figure 6. naζ,i+βζ,i Now, by integrating out the latent variable t with all the θζ∗,i= ∑M (na +β ) , for each 1≤i≤M. parameters derived above, the probability of the occurrence i(cid:48)=1 ζ,i(cid:48) ζ,i(cid:48) of a new instance given the l-th type of complex activity is Normally, the hyperparameters α and β are set as fixed estimated below values before the execution of a Gibbs sampler. In our IBGN model, α and β involve a number of (cid:96) and (cid:96)M prior P(a(cid:48),r(cid:48);Dl)=∏(∑θζ,i)× ∏ ϕi,j,z,r, (6) parameters, respectively. As they are unfortunately unknown i ζ i,j,z,r|=G∗ beforehand, it is difficult to manually encode each parameter wherea(cid:48)andr(cid:48)arethesetsofatomicactionsandtheirrelations to proper values. As a result, we need to instead learn each in the new instance, respectively, and i,j,z,r|=G∗ indicate hyperparameter to obtain reasonable results. The adoption only these links obeying the structure of G∗ are counted. To predictwhichtypeofcomplexactivityanewinstancebelongs of Gibbs sampling enables us to seamlessly incorporate the to, we simply evaluate the posterior probabilities over each of tuning of these hyperparameters as presented in Algorithm 2. the L possible types of complex activities as The stop condition may be that a predefined max iterations l∗=argmaxP(a(cid:48),r(cid:48);Dl). (7) Algorithm2GibbsSamplingAlgorithmwithHyperparameter 1≤l≤L Tuning Method Embedded. V. EXPERIMENTS 1: procedureGIBBS SAMPLER WITH HYPERPARAMETER ESTIMATE 2: s=0; (cid:46)TheinitialiterationoftheGibbssampler. Experiments are carried out on three benchmark datasets 3: Initializethevaluesofα andβ; as well as our in-house dataset on recognizing complex hand 4: repeat 5: s s+1; (cid:46)Thes-thiterationoftheGibbssampler. activities. In addition to the proposed approach IBGN, two 6: G←etthesamplesoflatenttablesgeneratedbytheGibbssampleratcurrent variantswithfixednetworkstructuresarealsoconsidered:One iteration; 7: Updatehyperparametersαnew=f(α),βnew=g(β); is IBGN-C for chain-based structures, where only the links 8: untilterminationconditionsarereached; between two neighbouring nodes in networks are constructed; 9: endprocedure TheotheroneisIBGN-Fforfully-connectedstructures,where all pairwise links in networks are constructed. Several well- has been reached or that an estimation function converges to established models are employed as the comparison meth- a given threshold. To update the hyperparameters α and β, a ods, which include IHMM [20], dynamic Bayesian network convergentmethodproposedbyMinka[19]isusedasfollows: (DBN) [12] and ITBN [31], where IHMM and DBN are α(s+1)=f(α )=α ∑sΨ(∑d∈Dlntζ(s)+αζ)−Ψ(αζ) , iamutphloerms.eAntleldinotenrnoalurpaorawmne,tearnsdarIeTtBuNnedisfoorbbteasintepderfforormmanthcee ζ ζ ζ×∑sΨ(∑d∈Dl|Vd|+∑(cid:96)ζ(cid:48)=1αζ(cid:48))−Ψ(∑(cid:96)ζ(cid:48)=1αζ(cid:48)) for a fair comparison. The standard evaluation metric of β(s+1)=g(β )=β ∑sΨ(naζ(s,)i+βζ,i)−Ψ(βζ,i) , accuracy is used, which is computed as the proportion of ζ,i ζ,i ζ,i×∑sΨ(∑Mi(cid:48)=1(naζ(s,)i(cid:48)+βζ,i(cid:48)))−Ψ(∑Mi(cid:48)=1βζ,i(cid:48)) correact)pErxepdeicrtiimonens.tal Set-Ups: The Raftery and Lewis diag- whereΨisdigammafunction,andthesuperscript(s)indicates nostic tool [24] is employed to detect the convergence of the thesamplegeneratedbytheGibbssampleratthes-thiteration. Gibbssampler(Algorithm2)fortheIBGNfamily.Ithasbeen Next, we estimate the parameters ϕ = ϕi,j,z : 1 i,j observed that overall we have a short burn-in period, which M,1 z C for link generation. It can{be seen ≤that th≤e suggests the Markov chain samples are mixing well. Thus nt ≤ ≤| |} probability distribution of variable rn,n relies on the triplet andnaaresettotheaveragedcountsoftheirfirst1000samples (cid:48) (Ai,Aj,Cz) only (where an = Ai, an = Aj and cn,n =Cz), after convergence. In addition, we utilize the branch-and- (cid:48) (cid:48) and thus we can learn these parameters by maximum like- bound algorithm [5] for constraints-based structure learning. lihood estimate method. In our IBGN model, given a triplet This approach can strongly reduce the time and memory (Ai,Aj,Cz) A A C, the conditional probability distribu- costs for learning Bayesian network structures based on the ∈ × × tion on rn,n is a multinomial over all possible relations in BIC score function (Eq. (3)) without losing global optimality (cid:48) Cz. Then, the likelihood of the parameter ϕi,j,z for P(rn,n guarantees. Besides, to avoid the division-by-zero issue in Ai,Aj,Cz) with respect to Dl is: (cid:48) | ϕpractic=e (i.e.n∑ri,r|jC,(cid:48)z=z,r|1+nρri,j,z,r(cid:48)b=y i0ntrinodEucqi.ng(5)a),smwaellinssmteoaodthuinsge L(ϕi,j,z;Dl)=d∏∈DlP(rn(cid:48),n|Ai,Aj,Cz;ϕi,j,z)=r∏|C=z1|ϕin,rji,,zj,,zr,r, (4) coi∗,nj,szt,rant ρ∑|rC(cid:48)(=zρ|1n=ri,j1,z0,r(cid:48)−+5ρ)|Cizn| the following experiments. 8 b) Time Complexity Analysis: The time complexities of fails with the multiple occurrences of the same atomic ac- IBGN-C and IBGN-F are O(M2+ D T (cid:96)2) for training each tions or when inconsistent relations existing among training l n | | complexactivitycategory,whereT isthenumberofiterations instances. As a considerable amount of multiple occurrences n executed in Algorithm 2. IBGN has an extra time complexity ofthesameatomicactionsandinconsistenttemporalrelations of O(∑lpo=g20K(cid:0)Kp(cid:1)) for structure learning, where K = k∗(k2∗−1). existinCAD14,ITBNperformstheworst.Itcanalsobeseen On the other hand, the time complexities of the IBGN family that IBGN-F with fully-connected relations performs better at the testing stage are the same, which is O(M(cid:96)) for a single than IBGN-C on the Opportunity and CAD14 datasets where test instance. relations are intricate. However, IBGN-F might be overfitted when relations are simple, e.g. the OSUPEL dataset. Overall A. Experiments on Three Existing Benchmark Datasets IBGN outperforms its two variants by its ability to adaptively a) Datasets and Preprocessing: The three publicly- learn network structures from data, where fixed structures available complex activity recognition datasets collected from might face the issue of either overfitting or underfitting. different types of cameras and sensors are considered, as summarized in Table. II. We employ these datasets in our TableIII evaluation due to their distinctive properties: The OSUPEL ACCURACYCOMPARISONONDIFFERENTDATASETS. dataset [3] can be used to evaluate the case where only a handful of atomic action types and simple relations are IHMM DBN ITBN IBGN-C IBGN-F IBGN recorded; Opportunity [25] is challenging as it contains a OSUPEL 0.53 0.58 0.69 0.79 0.76 0.81 Opportunity 0.74 0.83 0.88 0.98 0.96 0.98 largenumberofatomicactiontypesandalsoinvolvesintricate CAD14 0.93 0.95 0.51 0.97 0.98 0.98 interval relations in instances; CAD14 [15] represents the datasets having relatively larger number of complex activity categories. c) Robustness Tests under Atomic-Level Errors: In prac- ticetheaccuracyofatomicactionrecognitionwillsignificantly TableII SUMMARYOFTHEPUBLICLYAVAILABLEDATASETS. affectcomplexactivityrecognitionresults.Toevaluatetheper- formance robustness, we also compare the competing models OSUPEL Opportunity CAD14 underatomicactionrecognitionerrors.First,itisimportantto Applicationtype Basketballplay Dailyliving Composable activities check whether our model is robust under label perturbations Recordingdevices oneordinarycamera 72 on-body sensors oneRGB-Dcamera of10modalities of atomic-action-level (or atomic-level for short). To show E.g.ofatomicactions shoot,jump,dribble, sit,opendoor,wash clap, talk phone, etc. cup,etc. walk,etc. this, we synthetically perturb the atomic-level predictions. E.g. of complex activi- two offensive play relax,cleanup,coffee talkanddrink,walk ties types time,earlymorning, while clapping, talk Figure 10 reports the comparison results on Opportunity sandwichtime andpickup,etc. under two common atomic-level errors. It can be seen that No. of atomic action 6 211 26 types IBGNs are more robust to misdetection errors where atomic No.ofcomplexactivity 2 5 16 types actions are not detected or are falsely recognized as another No.ofinstances 72 125 693 No.ofintervalsperin- 2-5 1-78 3-11 actions. We perturbed the true labels with error rates ranging stance from 10-30 percents to simulate synthetic misdetection errors. Similarly,wealsoperturbedthestartandendtimeofintervals To recognize atomic actions in each dataset, we adopt the with noises of 10-30 percents to simulate duration-detection methods developed in their respective corresponding work. errors where interval durations are falsely detected. It is also That is, we employed the dynamic Bayesian network models clear that IBGNs outperform other competing models under that are used to model and recognize each atomic action duration-detection errors. including shoot, jump, dribble and so on for OSUPEL [31]. Similarly, for CAD14 [15] we adopted the hierarchical dis- In addition, we report the evaluated performances under criminativemodeltorecognizeatomicactionsuchasclap,talk real detected errors caused by the ARC system for atomic- phoneandsoonthroughandiscriminativeposedictionary.For level recognition. We chose three classifiers for atomic action the sensor-based Opportunity dataset, we utilized the activity recognition from the ARC system, i.e. kNN, SVM and DT. recognition chain system (ARC) [4] to recognize atomic Features such as mean, variance, correlation and so on are actionrecognitionfromsensors.Itisworthmentioningthatwe selected by setting a time-sliced window of 1s. After classifi- can also recognize null type of complex activity by labeling cation, each interval is assigned to an atomic action type. As the intervals that are not annotated to any activities in the shown in Figure 10, the models which can manage interval datasets. relations are relatively more robust to the atomic-level errors b) Comparison under an Ideal Condition: First of all, than other models, such as ITBN and IBGNs. Moreover, it is the competing models are evaluated under the condition that evidentthatIBGNsarenoticeablymorerobustthanITBNwith all intervals are correctly detected. Table III displays the around 15% – 87% performance boost. IBGN performs the averaged accuracy results over 5-fold cross-validations, where best among its family because it is more capable of handling the proposed IBGN family clearly outperforms other methods the structural variability in complex activities than the other with a big margin on all three datasets. The reason is that two variants, which may avoid more noise existing in training IBGNs engage the rich interval relations among atomic ac- and testing information. Note similar conclusions are also tions. Although ITBN can also encode relations, it however obtained on the OSUPEL and CAD14 datasets. 9 atomic-level recognitionaccuracy errorrate IHMM DBN ITBN IBGN-C IBGN-F IBGN undersyntheticatomic-levelmisdetectionerrors 0.1 0.31 0.73 0.79 0.92 0.88 0.94 0.2 0.29 0.67 0.74 0.87 0.87 0.87 0.3 0.22 0.65 0.71 0.83 0.84 0.84 undersyntheticatomic-leveldurationdetectionerrors 0.1 0.16 0.45 0.76 0.83 0.88 0.89 0.2 0.16 0.45 0.72 0.83 0.85 0.85 0.3 0.16 0.45 0.69 0.82 0.83 0.83 underrealatomic-leveldetectederrors (a) withmisdetectionerrors (b) withdurationdetectionerrors 0.165(kNN) 0.66 0.62 0.54 0.91 0.79 0.91 0.242(SVM) 0.58 0.54 0.46 0.83 0.71 0.85 Figure12. Performancechangesvs.perturbationofatomic-action-levelerrors. 0.315(DT) 0.16 0.04 0.38 0.71 0.54 0.71 Figure10. Accuraciesunderatomic-levelerrorsonOpportunitydataset. with a frame-rate of 25 FPS, image size of 320 240, and × hand-camera distance of around 0.6-1m. In total, the dataset contains 19 ASL hand-action complex activities, with each having145-290instancescollectedamongallsubjects.Eachof the instance is comprised of 5-17 atomic action intervals. The 19ASLhand-actionsareair,alphabets,bank,bus,gallon,high (a) Ketchup(CaseI) (b) Ketchup(CaseII) school, how much, ketchup, lab, leg, lady, quiz, refrigerator, several, sink, stepmother, teaspoon, throw, xray. b) Atomic Hand Action Detection: To detect atomic- level hand-actions, we make use of the existing hand pose estimationsystem[30]withapostprocessingsteptomapjoint locationpredictionoutputstothebent/straightstatesoffingers. (c) Intervals(CaseI) (d) Intervals(CaseII) To evaluate performance of the interval-level atomic action detection results, we follow the common practice and use the intersection-over-union of intervals with a 50% threshold to identify a hit/false alarm/missing, respectively. Finally F1 scoreisusedbasedontheobtainedprecisionandrecallvalues. Note here the finger bent states are considered as foreground (e) Afractionoftheinterval-basednetwork (f) A fraction of the interval-based net- intervals. (CaseI) work(CaseII) c) Robustness Tests under Atomic-Level Errors: We first Figure 11. Two instances of the complex activity ASL word ketchup. A1- evaluate the performance on simulated synthetic misdetection A5 refer to the straightening state of thumb, index, middle, ring and pinky fingers,respectively(whitebars),whileA6-A10 refertothebendingstateof errors and duration-detection errors. From Fig. 12, we ob- thesefingers(blackbars). serve that overall IBGN is notably more robust than other approaches,meanwhileIHMMconsistentlyproducestheworst results. Our model is relatively robust in the presence of B. Our Complex Hand Activity Dataset atomic-level errors. a) Data Collection: We propose a new complex activity Now we are ready to show the performance of our model dataset on depth camera-based complex hand activities on when working with our atomic-level predictor as mentioned performing American Sign Language (ASL). It is an ongoing previously. Overall our atomic-level predictor achieves F1 effort,andatthemomentitcontains3,480annotatedinstances, score of 0.724. Table IV summarizes the final accuracy com- which is already about 5-fold larger than existing ones. As parisonsofthesixcompetingapproachesbasedonouratomic- illustrated in Fig. 11, complex activities in our dataset are level predictor vs. the atomic-level ground-truth labels on our defined as selected ASL hand-actions. There are 20 atomic hand-action dataset. It is not surprising that in both scenarios actions, which are defined as the states of individual fingers, IBGN again significantly outperforms the rest approaches. It either straight or bent. It is important to realize that in a is worth noting that taking into account the challenging task complex activity, there could be multiple occurrences of the of atomic-level hand pose estimation on its own, the gap in same atomic action, as is also exemplified in Fig. 11(f) where performances of 0.58 vs. 0.86 on predicting over 19 complex an action A2 appears twice in the sub-network. activity categories is reasonably, which is also certainly one Sixteen subjects participate in the data collection, with thing we should improve over in the future. Fig. 13 presents various factors being taken into consideration to add to the the confusion matrix of IBGN working with our atomic-level data diversities. Subjects of different genders, ages groups, predictor. We observe that our system is able to recognize the races are present in the dataset. The male to female ratio is ASL words such as xray, alphabets and quiz very well. At 12to4,racesratiois14to2andparticipants’agesspanfrom the same time, several ASL words turn to be difficult to deal around 15 to 40. For each subject, the depth image sequences with, with accuracy under 50%. This may mainly due to the are recorded with a front-mount SoftKinetic camera while relatively low accuracy of the atomic-level predictor we are performing designated complex activities in office settings, using on the particular atomic actions. 10 TableIV VII. ACKNOWLEDGMENTS COMPLEXACTIVITYACCURACYCOMPARISONSONOURHAND-ACTION This research was supported in part by grant DATASET. CQU903005203326 from the Fundamental Research Funds for the Central Universities in China, grants R-252-000-473- IHMM DBN ITBN IBGN-C IBGN-F IBGN withrealatomic-levelprediction 133 and R-252-000-473-750 from the National University 0.43 0.51 0.54 0.49 0.55 0.58 of Singapore, and A*STAR JCO grants 15302FG149 and withatomic-levelground-truth(idealsituation) 0.67 0.81 0.77 0.82 0.84 0.86 1431AFG120. APPENDIXA PROOFSOFTHEOREM1 air56% 6% 0% 0% 0% 0% 0% 22% 0% 0% 0% 0% 0% 0% 11% 0% 0% 0% 6% We first prove that the composition operation on the alphabbaentks 00%% 803%%609%% 00%% 101%% 00%% 00%% 265%% 00%% 00%% 00%% 06%% 00%% 00%% 00%% 00%% 00%% 00%% 00%% interval relation union set S satisfies the associative◦law. The bus 0% 6% 0% 37% 3% 0% 0% 20% 0% 0% 0% 31% 0% 0% 0% 0% 0% 0% 3% set of all the 127 unions is denoted by S. gallon 0% 7% 0% 0% 43% 0% 0% 0% 21% 0% 0% 0% 0% 0% 7% 0% 21% 0% 0% highSchool 0% 0% 0% 0% 0% 47% 0% 0% 0% 0% 0% 7% 27%13% 0% 0% 0% 0% 7% Definition 3. (Composition Product of Two Relation Sets) howMuch 0% 0% 0% 0% 0% 0% 50%10% 0% 5% 0% 0% 0% 0% 5% 0% 0% 5% 25% ketchup 0% 4% 4% 0% 0% 0% 0% 70% 0% 0% 0% 0% 0% 0% 0% 0% 4% 0% 17% The composition product of two sets of interval relations lab 0% 7% 0% 0% 20% 0% 0% 0% 73% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% R,R S, i.e. R= r ,...,r and R = r ,...,r , where laledgy 00%% 00%% 00%% 00%% 103%% 00%% 00%% 203%% 00%% 500%%600%%200%% 00%% 70%% 00%% 77%% 00%% 00%% 103%% any(cid:48)r∈,r R, is de{fi1ned as|RR|}R =(cid:83)(cid:48) ({r 1(cid:48) r ). |(cid:48)R(cid:48)|} quiz 0% 0% 7% 0% 0% 0% 0% 7% 0% 0% 0% 87% 0% 0% 0% 0% 0% 0% 0% i (cid:48)j∈ ◦ (cid:48) i,j i◦ (cid:48)j refrigerator 0% 0% 0% 0% 0% 0% 0% 13% 0% 0% 0% 0% 87% 0% 0% 0% 0% 0% 0% Lemma 2. (Associative Law on Composition) several7% 0% 0% 0% 0% 7% 0% 29% 0% 7% 0% 0% 0% 35% 0% 7% 0% 0% 7% sink 0% 0% 14% 0% 7% 0% 0% 0% 0% 0% 0% 7% 0% 0% 57% 0% 0% 0% 14% stetepamsoptohoenr 00%% 00%% 00%% 00%% 38%% 107%% 00%% 2177%% 00%% 00%% 00%% 08%% 00%% 00%% 107%%300%%402%% 00%% 283%% (Rx◦Ry)◦Rz=Rx◦(Ry◦Rz), for any Rx,Ry,Rz∈S. throw 0% 0% 0% 0% 0% 3% 3% 7% 0% 0% 0% 0% 3% 7% 0% 0% 0% 47%30% xray 0% 0% 7% 0% 0% 0% 0% 0% 0% 0% 0% 3% 0% 0% 0% 0% 3% 3% 83% Proof. Let Rx = rx,1,...,rx,X , Ry = ry,1,...,ry,Y and Figure13al.paihraCbetosnbafnuksibousgnhialglhomnScahhootowrliMxukcehtocfhupIBlaGblNadyolnegroefqruiuigzrerahtosarenvedral-staseicpnktmiottoheenarspdooanttharoswextr.ay RR(cid:83)(cid:83)zyii,()jr=,kx◦,(i(R{◦rr(xzz,(cid:83),i1=◦,j,.rk.y((,.(cid:83)rj,)yri,,◦zjj,Z◦r(zr}r,xkz,,)ik◦{)a)r,)ny,=djw))(cid:83)hR◦eix,rj◦Re,k(z(Rrax=}y,ni◦y◦R(cid:83)(zkrr)y(i,(=,j(cid:83)j◦Ri∈r,{jzx,(k◦Rr)(x),.(cid:83)i◦SjT,riknyh(,crje)yen,)j,t}◦◦herrzz,c(,kkRo)))mx==◦- positionon theseveninterval relationsetsatisfies theassocia- tive law that is both left and right associative, the associative law also holds on R, which is closed under the composition VI. CONCLUSION operation, i.e. (rx ry) rz=rx (ry rz), for any rx,ry,rz R. ◦ ◦ ◦ ◦ ∈ So, we have (Rx Ry) Rz=Rx (Ry Rz) . We present an interval-based Bayesian generative network ◦ ◦ ◦ ◦ approach to account for the latent structures of complex Lemma 3. (Path Consistency) activitiesbyconstructingprobabilisticinterval-basednetworks Given an IBGN Gd, if xi,k xi,j xj,k for any 1 i< j<k ∈ ◦ ≤ ≤ withtemporaldependenciesincomplexactivityrecognition.In Vd , then for any path P=vi vi vi ... vk vk in | | → (cid:48) → (cid:48)(cid:48) → → (cid:48) → particular, the Bayesian framework and the novel application Gd, xi,k xi,i xi,i ... xk,k. ∈ (cid:48)◦ (cid:48) (cid:48)(cid:48)◦ ◦ (cid:48) of Chinese restaurant process (CRP) of our IBGN model Proof. For any path P=vi vi vi ... vk vk in enable us to explicitly capture inherit structural variability in → (cid:48) → (cid:48)(cid:48) → → (cid:48) → Gd, we have xi,k xi,i xi,k and xi,k xi,i xi ,k, then xi,k each of the complex activities. In addition, we make publicly ∈ (cid:48)◦ (cid:48) (cid:48) ∈ (cid:48) (cid:48)(cid:48)◦ (cid:48)(cid:48) ∈ x (x x ) x x x .Iteratively,wefinallyhave i,i i,i i ,k i,i i,i i ,k available a new complex hand activity dataset dedicated to (cid:48)◦ (cid:48) (cid:48)(cid:48)◦ (cid:48)(cid:48) ∈ (cid:48)◦ (cid:48) (cid:48)(cid:48)◦ (cid:48)(cid:48) xi,k xi,i xi,i ... xk,k. the purpose of complex activity recognition, which contains ∈ (cid:48)◦ (cid:48) (cid:48)(cid:48)◦ ◦ (cid:48) around an-order-of-magnitude larger number of annotated Proof. instances. Experimental results suggest our proposed model (Consistency) outperforms existing state-of-the-arts by a large margin in a Lemma 3 indicates that if any ijk in the network satisfies (cid:52) range of well-known testbeds as well as our new dataset. It the transitivity properties, then any path in the network also is also shown that our approach is rather robust to the errors satisfies the transitivity properties. Hence, the entire network introducedbythelow-levelatomicactionpredictionsfromraw generated through our construction is consistent. signals.Aspartoffuturework,weareconsideringrelaxingthe (Completeness) assumptionthattheIBGNmodelssharethesamestructurefor (by contradiction) representing multiple instances of the same complex activity, Suppose there exists one interval relations xi,n that can- and will instead learn more flexible structures for each class not be generated through our construction, i.e. xi,n / of complex activities by introducing latent structure variables (cid:84)nj=−i1+1(xi,j◦xj,n), which means there exists at least one j∈∗ that decide whether a link should exist in an instance. Also, (1 j∗ n 1) that xi,n /i,j xj ,n. This contradicts the fact ≤ ≤ − ∈ ∗◦ ∗ we will continue the finalization of our hand-action dataset, that the network is consistent. Besides, any possible relation improving the atomic-level atomic action prediction method, inRcanbegeneratedbetweentwoneighboringnodes.Thatis aswellasattemptingtowardtheestablishmentofstandardized tosay,anypossibletrianglethatisconsistentcanbegenerated comparisons on exiting systems. by our model.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.