Table Of Content

Indexing Density Models for Incremental Learning and Anytime Classification on Data Streams Thomas Seidl1, Ira Assent2, Philipp Kranen1, Ralph Krieger1, Jennifer Herrmann1 1Data Management and Exploration Group, RWTH Aachen University, Germany {seidl,kranen,krieger,herrmann}@cs.rwth-aachen.de 2Department of Computer Science, Aalborg University, Denmark, [email protected] ABSTRACT ers with very limited computational resources. The insects’ inter-arrival rate varies from tenths of seconds to tenths of Classificationofstreamingdatafacesthreebasicchallenges: minutes. Anotherexamplestemsfromtheroboticsdomain, it has to deal with huge amounts of data, the varying time whereshapeshavetobeclassifiedforrobotnavigation. Ac- between two stream data items must be used best possible tionisoftenrequiredinlessthanthetimenecessaryforexact (anytime classification) and additional training data must computation so that the available time before an interrupt beincrementallylearned(anytimelearning)forapplyingthe shouldbeutilizedasefficientlyaspossible. Forclassification classifier consistently to fast data streams. In this work, we of objects that are placed on a moving conveyor, as in fruit propose a novel index-based technique that can handle all sorting and grading, the available time can vary by up to three of the above challenges using the established Bayes three orders of magnitude [30]. classifier on effective kernel density estimators. Our novel In a recent project we worked on a health application to Bayes tree automatically generates (adapted efficiently to remotelymonitormedicalpatientsviabodysensornetworks theindividualobjecttobeclassified)ahierarchyofmixture [21]. The amount and frequency of data sent by the pa- densitiesthatrepresentkerneldensityestimatorsatsucces- tients depends on their condition as preclassified locally on sively coarser levels. Our probability density queries to- the sensor controller. The receiving server has to classify gether with novel classification improvement strategies pro- all incoming patient data as accurately as possible. With vide the necessary information for very effective classifica- many patients potentially sending aggregated or detailed tion at any point of interruption. Moreover, we propose data, the amount of data and their arrival rate may vary a novel evaluation method for anytime classification using widely. Hence, to treat each patient in a global emergency Poissonstreamsanddemonstratetheanytimelearningper- situation,e.g. anearthquake,classificationforeachindivid- formance of the Bayes tree. ualobservationmustbeperformedquicklyandbeimproved as long as time permits. 1. INTRODUCTION Thesetasksthuscallforclassificationundervaryingtime Anytime classification has gained importance with ubiq- constraints. The streaming rate may vary widely yielding uitous streams in applications ranging from sensor data in possiblylargedifferencesintheamountoftimeavailablefor monitoring systems to speech recognition. In monitoring computation. The enormous amounts of data arriving in systems, immediate reaction to specific events is necessary, streamingapplicationscanneitherbestoredpriortoclassi- e.g. for emergency detection. Likewise, interactive voice re- fication nor can classification of one item take longer than sponseimpliesdirectrecognitionofincomingspeechsignals. the time until the next item arrives. Applications of anytime classification have the following Traditionalclassificationaimsatdeterminingtheclassla- characteristics in common: First, typically large volumes bel of unknown objects based on training data. Different of data need to be processed. Second, immediate reaction classification approaches are discussed in the literature, in- is required at interruption, where in many applications the cluding Bayes classification [2, 12], neural networks [7, 27], time allowance is not known in advance, but depends on decisiontrees[24,20],supportvectormachines[6]andnear- varying stream volume or user interruption. To classify an est neighbor techniques [25]. instance from a bursty stream, the time available for clas- Incontrasttoanytimealgorithms,socalledbudget orcon- sification can vary from milliseconds to minutes as Ueno et tract algorithms[36,8]useafixed(budgeted)timelimitfor al. noted in [30]. In their recent work they classify insects their calculation. If the available time is less than the con- based on audio sensor monitoring using embedded comput- tracted budget, they cannot guarantee to give a result at all. Another severe limitation is their inability to improve the result if more time is available, as their classification modelissimplifiedtothedegreerequiredtofulfilltheircon- Permissiontocopywithoutfeeallorpartofthismaterialisgrantedpro- tract. Their only chance is to restart computation with a videdthatthecopiesarenotmadeordistributedfordirectcommercialad- moredetailedmodel. Thisconstitutesthemajordrawback, vantage,theACMcopyrightnoticeandthetitleofthepublicationandits since they cannot use the results they obtained so far, but dateappear,andnoticeisgiventhatcopyingisbypermissionoftheACM. have to start computation from scratch. Tocopyotherwise,ortorepublish,topostonserversortoredistributeto lists,requiresafeeand/orspecialpermissionsfromthepublisher,ACM. Finally, when dealing with data streams, not only the EDBT2009,March24–26,2009,SaintPetersburg,Russia. amountofdatatobeclassifiedbecomesverylargebutthere Copyright2009ACM978-1-60558-422-5/09/0003...$5.00 311 is often also constantly new training data available. There cy 1 a hasbeensomeresearchonexploitingthisnewtrainingdata ur0,8 c by incrementally training the classifier as the data stream ac0,6 proceeds[10,11,33]. Asapplicationsonemaythinkofmon- 00,44 itoring systems, where an expert sporadically inspects the 0,2 system and checks the current status manually to provide 00 new training data. Another example could be a production 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 time site where only goods of a certain type or class arrive for Figure 1: Prototypical anytime classification accu- a period of time which can then be used for incremental racy: accuracy increases over time. learning of the classifier. Sofar,classifiersthatfocusedonanytimelearningbuilta contract classifier [5, 29, 32], i.e. they are not able to per- formanytimeclassification. Anytimeclassifiers[9,13,30]on 2. PRELIMINARIES theotherhandsofarhavenotbeenabletoprofitfromnovel In this section we first review properties of anytime clas- trainingdataunlesstheyweregivenalargeamountoftime sification and its related work. Next we decribe Bayes clas- to retrain their classifier. For a stream classification appli- sification and density estimation and section 2.3 discusses cationitisimportanttosupportbothanytimelearningand anytime learning and hierarchical indexing. anytime classification to allow consistent use on fast data streams. To the best of our knowledge, no such classifier exists so far. 2.1 Anytimeclassification 1.1 Contribution Anytime classifiers are capable of dealing with the varying time constraints and high data volumes of stream ap- OurworkenablesanytimeBayesclassificationusingnon- plications. The advantages of anytime classifiers can be parameterized kernel density estimators. Consistent with summarized as flexibility (exploit all available time), in- classifying fast data streams our technique is able to incre- terruptibility (provide a decision at any time of interrup- mentally learn from data streams at any time. We propose tion)andincremental improvement(continueclassifica- ournovelBayestreeindexingstructurethatprovidesaggre- tion improvement without restart). gatedmodelinformationaboutthekernelsinallsubtreeson Theusefulnessofananytimeclassifierdependsonitsqual- all hierarchy levels of the tree. In our new probability den- ity,whichcanbedeterminedwithrespecttoaccuracyand sity query, descending the tree is based on strategies that accuracy increase. Accuracy referstothequalityofthe favor classification accuracy. As different granularities are resultingclasslabelassignmentandismeasuredastheratio available and compact models are located at the top levels, betweencorrectlyclassifiedobjectsandthetotalnumberof the probability density query can be interrupted early on classified objects |{G(x )=y |x ∈TS}|/|TS|, where G is a (starting with a unimodal model at root level) and refines i i i classifier,x isanobjectinthetestsetTS andy itscorrect themodelaslongastimepermits. Ournovelanytimeclas- i i label. In each time step the accuracy for an anytime classi- sifier does not only improve its result incrementally when fier can be measured as the absolute classification accuracy moretimeisgranted,butalsoadaptsthemodelrefinement (as in traditional classifiers) or as the relative classification individually to the object to be classfied. accuracy with respect to unbound time classification. The design goals of our anytime approach include Accuracy increaseistheimprovementofaccuracyover • Statistical foundation. Our classification uses the time. Ideally, even for little time, classification accuracy establishedBayesclassifier basedonkernelestimators. is close to the best answer. With more time, classification accuracyshouldsoonreachtheclassificationaccuracyofthe • Adaptability. Themodelisadaptedtotheindividual infinitetimeclassifier[36,30]. ThisisillustratedinFigure1. query object, i.e. the mixture model is refined locally Earlyon,theidealanytimeclassifierprovideshighaccuracy with respect to the object to be classified. with rapid increase. In stream classification the available computation time • Anytime classification and anytime learning for varies,thereforetheaccuracyofananytimeclassifierisgiven applyingtheBayestreeconsistentlyinapplicationson by the average classification accuracy with respect to the fast data streams. inter-arrival rate (speed) of the data stream. The accuracy Our contributions on a technical level include increase also influences the stream specific anytime classification accuracy, since a steeper increase yields a better • Novelhierarchicalorganizationofmixturemod- accuracy even if marginally more time is available and thus els. Ournovelindexstructureprovidesahierarchyof a higher average accuracy for the same inter-arrival rate. mixture density models and allows fast access to very Therefore we propose a novel stream specific anytime clas- fine-grained and individually adapted models. sification evaluation. Details are given in Section 4. • Probability density queries. Novel access strate- Anytimeclassificationhasbeendiscussedforboostingtech- gies where traversal decisions are based on the prob- niques [23], decision trees [13], SVMs [9], nearest neighbors ability density of the query object with respect to the [30] and Bayesian networks [19, 22]. An anytime algorithm current mixture model. ondiscrete-valueddataisdiscussedin[34]. Asstatedabove, non of these classifiers allows for incremental learning. For • Poisson stream evaluation. We propose a novel Bayesclassifiersbasedonkerneldensitiesnoanytimeclassi- evaluation method for anytime classification on data fierexiststothebestofourknowledge. Wereviewanytime streams using Poisson processes. density estimation in the following section. 312 2.2 Bayesclassification,mixture using a Gaussian mixture model M as follows: densitiesandkernels k (cid:88) A classifier is a function G which maps an object x to p (x|c )=M (x)= w ·g(x,µ ,Σ ) M i ci j j j a label ci ∈ C. In our work, we focus on the Bayesian j=1 classifierwhichusesthestatisticalBayesiandecisiontheory for classifying objects [2, 12]. The fundamental theory of Ifonlyvariances(σ1,...,σd)areconsideredandcovariances the Bayesian classifier provides a broad range of theoretical are neglected, the covariance matrix Σ is replaced by a di- analyses ranging from asymptotic behavior to mean error agonal matrix Σ(cid:48) and det(Σ(cid:48))=(cid:81)d σ(cid:48). i=1 i rates for many applications. The well studied properties of Another approach to density estimation are kernel densi- Bayesclassifiersmakeitawidelyusedapproachforclassifi- ties, which do not make any assumption about the under- cation. lying data distribution (thus often termed“model-free”or The Bayesian classifier is based on probabilistic distri- “non-parameterized”density-estimation). Kernelestimators bution estimation of existing data. Based on a statistical canbeseenasinfluencefunctionscenteredateachdataob- model of the class labels, the Bayes classifier chooses the ject. Tosmooththekernelestimatorabandwidthorwindow- class with the highest posterior probability P(ci|x) for a width hi isused. ComparingDefinition2withkerneldensi- given query object x according to the Bayes rule P(c |x)= ties, the bandwidth corresponds to the variance of a Gaus- i (P(c )·p(x|c ))/p(x). sian component. i i Definition 1. Bayes classification. Definition 3. Kernel density estimation. Given a set of class labels C = {c ,...,c } and training The class conditional probability density for an object x and 1 m data D = {(x1,y1),...,(xn,yn)} with objects xi ∈ IRd and class ci based on a training set D={(x1,y1),...,(xn,yn)} class labels yj ∈C assigned to each object, the Bayes classi- and kernel K with bandwidth hi is given by: ficlearssG-cBoanydesitieosntiamladteenssitthyepa(xp|rciio)ribapsreodbaobnilitthyePtr(aciin)inangddathtae pK(x|ci)= |D 1|·hd (cid:88) K(cid:18)(cid:107)x−hxj(cid:107)(cid:19) D and assigns an object x to the class with the highest a ci i xj∈Dci i posteriori probability: Thus, the class conditional probability density for any ob- GBayes(x)=argmax{P(ci|x)}=argmax{P(ci)·p(x|ci)} jectxistheweightedsumofkernelinfluencesofallobjects ci∈C ci∈C x of the respective class. In this paper, we use the Gaus- j oWbjeecatlssobedloenfigniengDtcoi a=s{p(excijfi,cyjc)la∈ssDci|.ci = yj} as the set of siankernelKGauss(x)= (2·π1)d/2e−2xh2i alongwithGaussian mixture models in a consistent model hierarchy to support mixing of models and kernels in the Bayes tree. The a priori probability P(c ) can be easily estimated i Therehasbeensomeresearchonanytimedensityestima- fromthetrainingdataastherelativefrequencyofeachclass tion [35, 14]. It is however not described how to use these P(c ) = |Dci|. Note that p(x) is left out in the last term, i |D| approachesforanytimeclassification. Aswewillclearlysee becauseitdoesnoteffecttheargmaxevaluation. Sincexis in section 4, this is not a trivial task. Neither a naive so- typicallymultidimensional,thetaskofestimatingtheclass- lution nor either of the straight forward solutions delivers conditional density p(x|c ) is not trivial. The naive Bayes i competitive results. classifierusesastraightforwardmodeltoestimatethedata In terms of classification accuracy, Bayes classifiers using distribution by assuming strong statistical independence of kernelestimatorshaveshowntoperformwellfortraditional thedimensions. Othermodelsdonotmakethisstronginde- classification tasks. Especially for huge training data sets pendenceassumptionbutestimatethedensityforeachclass the estimation error using kernel densities is known to be by taking the dependencies of the dimensions into account. very low and even asymptotically optimal. In section 3 we A simple method is to assume a certain distribution of the propose a novel indexing structure that enables kernel den- data. Any model assumption (e.g. a single multivariate sityestimationonhugedatasets(e.g. asinstreamapplica- normal distribution) may not reflect the true distribution. tions) and incremental learning at any time. Mixture densities relax this assumption that the data fol- lowsexactlyoneunimodalmodelbyassumingthatthedata 2.3 Anytimelearningusing follows a combination of probability density functions. In hierarchicalindexing our work we use Gaussian mixture densities. Typically, supervised classification is performed in two steps: learning and classification. So far we discussed the Definition 2. Gaussian mixture densities. qualitycriteriaforanytimeclassification. However,instream A multivariate Gaussian mixture model M combines k applicationsitisoftenessentialtoefficientlylearnfromnew Gaussianprobabilitydensityfunctionsg(x,µ,Σ)oftheform labeled objects. As the time to learn from new objects is g(x,µ,Σ)= 1 e(−21(x−µ)TΣ−1(x−µ)) tryuppitceadllaytlaimnyitteidm,ecldaussriifinegrsleaarrenirnegq.uTirhedecwlahsicsihficcaatniobneminotdere-l (2π)d/2·det(Σ)1/2 whichhasbeenlearneduptothispointintimeisthenused where Σ is a covariance matrix, Σ−1 its inverse and det(Σ) for all further classification tasks. We call a classifier which the determinate of Σ. When combined, the functions are canlearnfromnewobjectsatanytimeananytimelearning weighted with a weight w where (cid:80)k w = 1. The class algorithm (also called incremental learning). j j=1 j conditional density probability for an object x and class c ∈ An important quality criterion for anytime learning algo- i C represented by k Gaussian components is then calculated rithms is the runtime complexity of updating the learned 313 model. While using lazy learning (e.g. nearest neighbor) Rangge qqueryy Bayyestreesolution MMBBRRMMBBRRMMBBRR PPrruunneeddddaattaa allows for updating the decision set in constant time, the Necessary information lackofaconcisemodelofthedatastopsthisapproachfrom toanswera beingeffectiveintheanytimeclassificationcontext. Anany- (qqu,e εr)y‐range MBRMBRMBR time classifier based on the nearest neighbor approach has been proposed in [30]. However, to perform anytime clas- DDD DDDD sification, [30] uses an non-lazy learning method that has a PPrroobbaabbiilliittyyddeennssiittyyqquueerryy Howtorealize superlinear complexity for new objects. Ninefocremssaatriyon ? anytimepwruitnhinogu?t Aggregate statisticalinformation We propose a hierarchy of models for Bayes classification ttooaannsswweerraa ?? ?? on higgherlevelsofthetreeto q‐probability enableclassificationwithout thatallowsincrementalrefinementtomeettherequirements densityquery readingtheentireleaflevel. of anytime classification and anytime learning. Our novel KKK KKKK KKK KKKK KKKK Bayes tree indexes density models for fast access. Using multidimensionalindexingtechniquesenablesourBayestree Figure 2: Pruning as in similarity search is not pos- to learn from new objects in logarithmic time. sibleforprobabilitydensityqueries,sinceallkernels Multidimensional indexing structures like the R-tree [15] havetobeaccessedtoansweraq-probabilitydensity have been shown to provide substantial efficiency gains in query. similarity search. The basic idea is to organize the data efficiently on disk pages such that only relevant parts of the database have to be accessed. This is based on the as- specifiedviaaqueryobjectandasimilaritytolerancegiven sumptionthatinsimilaritysearch,queryprocessingrequires byathresholdrangeε. Usingthisεrange,irrelevantpartsof only a small portion of the data. This assumption does not the data are pruned based on directory information during hold in Bayes kernel classification. As the entire model po- query processing. Thus the amount of data that has to be tentially contributes to the class label decision, the entire accessedisreduced,whichinturngreatlyreducesruntimes. databasehastobeaccessedinordertoperformBayesclas- ThisisillustratedinFigure2(topleft). Thequeryandthe sificationwithoutlossofaccuracy(seeSection3fordetails). εrangeallowforstraightfowardpruningofdatabaseobjects Consequently,simplystoringobjectsforkernelestimationin whose directory entries are at more than ε distance (with multidimensionalindexesdoesnotsufficeforanytimeclassi- respect to Euclidean distance). This scenario also applies fication. The Gauss-tree [4] builds on the R*-tree structure for k nearest neighbor queries [26]. [3]toprocessesunimodalprobabilisticsimilarityqueries. As IntheBayestree,thedataobjectsarestoredatleaflevel the application focus is on similarity search, it does neither asinsimilaritysearchapplications. Asclassificationrequires allowforanytimeclassification,norformanagementofmul- reading all kernel estimators of the entire model, accuracy timodal density estimators. For anytime classification, we would be lost if a subset of all kernel densities was ignored. therefore propose a hierarchical indexing of mixture densi- Consequently, there is no irrelevant data, and hence the ties. It includes kernel densities as the most detailed level pruning as in similarity search is infeasible when dealing and allows for interruption at any level of the hierarchy, with density estimation. As for the accuracy increase, ac- using as much detail as was processed until the point of in- cessingtheentirekerneldensitymodelisnotonlyinefficient terruption. but also not interruptible. Moreover, kernel densities provide only a single model of the data, i.e. no incremental 3. THEBAYESTREE improvementofclassificationispossibleasrequiredforany- time classification. Having introduced anytime algorithms and Bayes classi- As illustrated in Figure 2 (bottom left), the upper levels fication using kernel density estimation, we are now ready ofanindexthatsupportsanytimeclassificationneedtopro- to propose our novel anytime approach. Our overall goal vide information that can be used for assigning class labels is a classification algorithm that can be interrupted at any even before the leaf level containing the kernels is reached. givenpointintimeandthatproducesameaningfulclassifi- The Bayes tree solves this problem by storing aggregated cationresultthatimproveswithadditionaltimeallowances. statistical information in its inner nodes (Figure 2 right). The Bayes classifier on kernel densities can not provide a Our approach is therefore a hierarchy of mixture models meaningful class conditional density p(x|c ) at an arbitrary i built on top of kernel densities. Each hierarchy level pre- (early)pointintime,sincetheentiremodelmaypotentially serves as detailed information on the underlying observa- contribute to the class label decision (cf. Section 2.3). In tionsasthenodecapacitypermits,whileallowinginterrup- the next section we give an overview of our technique for tion and adaptive query-based refinement as long as more estimating the class conditional density before giving the time is available. technical details in the following sections. Finally, we de- Thus, our Bayes tree approach comprises: scribe how the classification decision is made and improved using our novel index structure. • Hierarchical indexing of mixture densities that en- 3.1 Outline ables fast access for refinement of the current model As discussed in Section 1, any anytime classifier has to provide an efficient way to improve its results with addi- • Adaptive refinement of the most relevant parts of tional time. Indexing provides means for efficiency in sim- the model depending on the individual query ilarity search and retrieval. By grouping similar data on hard disk and providing directory information on disk page • Kernel densities for Bayes classification at the entries,onlytherelevantpartsofthedataareaccesseddur- finest level of the hierarchy ing query processing. Similarity search queries are usually 314 T = {t ,...,t }. An entry e then stores the fol- MBRs∙1 ns∙1 Σt(s∙1, i) Σt(s∙1, i)2 ptrs∙1 MBRs∙2 ns∙2 Σt(s∙2, i) Σt(s∙2, i)2 ptrs∙2 [ freespace] s (s,1) (s,ns) s lowing information about the subtree T : s •Theminimum bounding rectangleenclosingtheobjects stored in the subtree T as MBR =((l ,u ),...,(l ,u )) s s 1 1 d d nodes s∙1 s∙2 • A pointer ptrs to the subtree Ts es∙1 • The number n of objects in T nodes∙1 K K K K K s s (cid:80)ns (cid:0) (cid:1) • The linear sum t of all objects in T (s,j) s j=1 Figure 3: Nodes and entries in the Bayes tree • The quadratic sum (cid:80)ns (cid:0)t 2(cid:1) of all objects in T (here: above the kernel level). Entries es store min- (s,j) s j=1 imum bounding rectangles (MBR ) and child point- s ers (ptr ) and additionally information to compute s mean and variance, i.e. number of objects (ns), lin- Please note that all objects stored in the leaves of the ear sum ((cid:80)ts,i), quadratic sum ((cid:80)t2s,i). Kernel po- Bayestreeared-dimensionalkernels. Figure3illustratesthe sitionsaredepictedasdots,thehigherlevelmixture structure of a Bayes tree node entry. From the information density is represented as density curves. stored in each entry according to Definition 4, mean and variance are computed as follows [17]: 3.2 StructureoftheBayestree Lemma 1. Computing mean and variance. In this section we describe the Bayes tree structure for a The mean µ and the variance vector σ2 for a subtree T s s s single class ci. Section 3.3 details the processing of proba- canbecomputedfromthestoredvaluesoftherespectivenode bility density queries. entry e as s ThegeneralideaoftheBayestreeisahierarchyofmixture densities stored in a multidimensional index. Each level of 1 (cid:88)ns (cid:0) (cid:1) µ = t the tree stores at a different granularity a complete model s ns (s,j) j=1 of the entire data. To this end, a balanced structure as in R-trees [15, 3] is used to store the kernels at leaf level. The σ2 = 1 (cid:88)ns (cid:0)t 2(cid:1)−(cid:32) 1 (cid:88)ns (cid:0)t (cid:1)(cid:33)2 hierarchy on top is built in a bottom-up fashion, providing s n (s,j) n (s,j) s s ahierarchyofnodeentries,eachofwhichisaGaussianthat j=1 j=1 representstheentiresubtreebelowit. Theindexisdynamic OurBayestreeextendstheR*-treetostoremodelspecific andeasilyadaptstonoveldataasnottheactualparameters information in the following manner: of the Gaussian, its mean and its variance, are stored, but easily maintainable information for their computation. Definition 5. Bayes tree. MultidimensionalindexingstructureslikeR-treessumma- A Bayes tree with fanout parameters m,M and leaf node rizedatarecursivelyinminimumboundingregions(MBRs). capacity parameters l,L is a balanced multidimensional in- Leaves store the actual data objects and the subsequently dexing structure with the following properties: higher levels provide directory information via the MBRs. For kernel densities or mixture densities, MBR information • Each inner node node contains between m and M s is not sufficient for describing the probability density dis- entries (see Def. 4). The root has at least a single tribution of subtrees. Consequently, we propose storing the entry. necessaryinformationtocomputeparametersofthemixture densities at any given level of the hierarchy. • Each inner node with ν entries has exactly ν child s s Recall that the parameters of mixture densities are the nodes mean and variance of Gaussians (µ and σ , respectively). i i Storingthisinformationdirectlywouldrequireaccessingthe • Leaf nodes store between l and L observations actualobjectsontheleafleveltogenerateanyhigherdimen- (d-dimensional kernels). sional model representation. For example, assume we have • A path from the root to any leaf node has always the two mixture densities. Building a coarser mixture density same length (balanced). that represents both is not possible without additional information,sincemeanandvariancearenotdistributive,i.e. This structure has some very nice and intuitive benefits: fromtwomeansalonewecannotinfertheoverallmean. In- sincethenumberofobjects,theirlinearsumaswellastheir stead,thenumberofinstancesisrequiredtoweightthetwo quadratic sum are all distributive measures, they can effi- means accordingly. Mean and variance are thus algebraic ciently be computed bottom up. More precisely, we simply measures that require the number of instances, their linear use the build procedure of any standard R*-tree to create sumandquadraticsumforincrementalcomputation,which our hierarchy of mixture densities. Additionally, using an is exactly what Bayes tree entries store: indexstructurefromtheR-treefamilyautomaticallyallows both processing of huge amounts of data and incremental Definition 4. Bayes tree node entry. learning at any time. A subtree T of a d-dimensional Bayes tree is associated Recall that R*-trees, or any other tree from the B-tree s with the set of objects stored in the leaves of the subtree: or R-tree family, grow in a bottom-up fashion. Whenever 315 a) therearemoreentriesassignedtoanynodethanallowedby 1 the fanout parameters M, which in turn reflects the page size on hard disk, an overflow occurs. This overflow leads 21 23 to a node split, i.e. the entries of this overfull node are separated onto two new nodes. Their ancestor node then stores aggregate information on these two nodes. In the K K K K K K K K K K 22K1 22K2 22K3 K K K K R*-tree case, this was simply the MBR of the two nodes, for the Bayes tree additionally the overall number, linear b) and quadratic sum have to be computed and stored in the ancestornode. Thisway,inaverystraightforwardmanner, coarser mixture density models are created. Consequently, building Bayes trees is a simple procedure that starts from the original kernel density estimators that arestoredinaleafnodeuntilasplitisnecessary. Thisfirst split leads to creation of a second level in the tree, with c) 21 a coarser mixture density model derived through R*-tree- style split [3]. Successive splitting of nodes (as more kernel 23 densities are inserted) leads to further growth of the tree, 1 eventuallyyieldingahierarchyofmixturedensitymodelsas K K K 221 222 223 coarser representations of the kernel densities. Figure 4: Tree frontier: In a) an exemplary fron- 3.3 Queryprocessing tier in a Bayes tree that corresponds to a model Answeringaprobabilitydensityqueryrequiresacomplete with mixed levels of granularity is depicted: nodes model as stored at each level of the tree. Besides these 2 (root level) and 22 (inner level) are refined, yield- full models, local refinement of the model (to adapt flexi- ing 1 (root level), 21 (inner level), 221, 222, 223 (all bly to the query) provides models composed of coarser and leaf level), 23 (inner level). The resulting mixture finer representations. This is illustrated in Figure 4. In density model is depicted in b), the actual kernel any model, each component corresponds to an entry that positions and the hierarchy are depicted in c). represents its subtree. This entry may be replaced by the entriesinitschildnodeyieldingafinerrepresentationofits subtree. This idea leads to query-based refinement in our dot and the vertical line represent the query object from anytime algorithm. Each mixed granularity model corre- which the above frontier originated. sponds to a frontier in the tree, i.e. a set of entries in the A frontier is thus a model representation that consists of tree, such that each kernel estimator is represented exactly nodeentriessuchthateachkernelestimatorcontributesand once. Taking a complete subtree of all nodes starting from suchthatnokernelestimatorisrepresentedredundantly. To theroot,thefrontierdescribesanon-redundantmodelthat formalizethenotionofafrontier,weneedacleardefinition combinesmixturedensitiesatdifferentlevelsofgranularity. for the enumeration of nodes and entries. Recall that an entry e represents all objects in its cor- s Definition 7. Enumeration of nodes and entries. responding subtree by storing the necessary information to calculate its mean and variance. Hence, a set E = {e } TorefertoeachentrystoredintheBayestreeweusealabel i ofentriesdefinesaGaussianmixturemodelMaccordingto s with the following properties: Definition2,whichcanthenbeusedtoansweraprobability • The initial starting point is labeled e with T being density query. ∅ ∅ the complete set of objects stored in the Bayes tree. Definition 6. Probability density query pdq. • Thechildnodeofanentrye hasthesamelabelsasits s Let E = {ei} be a set of entries, ME the corresponding parent entry, i.e. the child of es is nodes. Ts denotes Gaussian mixture model and n = (cid:80)inei the total number the set of objects stored in the respective subtree of es. of objects represented by E. A probability density query pdq • Theν entriesstoredinnode arelabeled{e ...e }, returns the density for an object x with respect to M by s s s◦1 s◦νs E i.e. thelabelsofthepredecessorconcatenatedwiththe pdq(x,E)= (cid:88) nes ·g(x,µ ,σ ) entry’s number in the node. (recall Figure 3: nodes n es es has two entries e and e with their child nodes s◦1 s◦2 es∈E node and node ). s◦1 s◦2 where µ and σ are calculated according to Lemma 1. es es Using the above enumeration a frontier can be derived For a leaf entry a kernel estimator as discussed in Section from any prefix-closed set of nodes. 2.2 is used and obviously µ is the object itself. es Definition 8. Prefix-closed subset and frontier. Figure 4 b) shows the resulting mixture density for the examplefrontierfromparta). TheleftmostGaussianstems A set of nodes N is prefix-closed iff: from the entry e which is located at root level. The right- 1 • node ∈ N (the root is always contained in a prefix- mostGaussianandtheoneinthebackcorrespondtoentries ∅ closed subset) e ande respectively,theremainingrepresentkernelden- 23 21 sitiesatleaflevel. Partc)oftheimagedepictstheunderly- • node ∈ N ⇒ node ∈ N (if a node is contained in s◦i s ing R*-tree MBRs and the kernels as dots. The bigger blue N, its parent is also contained in N) 316 Let E(N) be the set of entries stored in the nodes contained heighttwoandfanouttwo(root,leaves,andtwomixedmod- (cid:83) in a prefix-closed node set N: E(N) = (e ∈ node). els). For realistic trees, with increasing height and fanout, node∈N the number of possible models is exponential in the height The frontier F(N) ⊆ E(N) contains all entries e ∈ E(N) and in the fanout. that satisfy Choosing among all these models is crucial for anytime algorithms, where the time available (typically unknown a • e ∈F(N)⇒e ∈/ F(N): no predecessors of a fron- s◦i s priori)shouldbespentsuchthatthemostpromisingmodel tier entry is in F(N) (predecessor-free) informationisusedfirst. Fortreetraversalweproposethree • e ∈ F(N) ⇒(cid:54) ∃i ∈ IN with e ∈ E(N): a frontier basicdescentstrategiestoanswerprobabilitydensityqueries s s◦i entry has no successors in E(N) (successor-free) on a Bayes tree. Descent in a breadth first (bft) fashion refines each model level completely before descending down Figure4a)illustratesbothaprefix-closedsubsetofnodes tothenextlevel. Alternatively,wecoulddescentthetreein andthecorrespondingfrontier. Thesubset,separatedbythe a depth first (dft) manner, refining a single subtree entirely curved red line, contains the root, node2 and node23, hence downtotheactualkernelestimatorsbeforerefiningthenext itisprefix-closed. Itscorrespondingfrontier(highlightedin subtreebelowtheroot. Thechoiceofthesubtreetorefineis white) contains the entries e1, e21, e23, e221, e222 and e223 made according to a priority measure. The third approach, wherethelastthreerepresentkernels. Notethatthefrontier which we call global best first (glo), orders nodes globally is both predecessor-free and successor-free. with respect to a priority measure and refines nodes in this Using the above definitions we now define the processing ordering. of a probability density query on the Bayes tree. TheprioritymeasureinR-Trees,whichwedenoteasgeo- Let T∅ be the set of objects stored in a Bayes tree con- metric, gives highest priority to the node with the smallest tainingtmax nodes. Anytimepdq processingforanobjectx (Euclidean) distance based on its minimum bounding rect- processes a prefix-closed subset Nt in each time step t with angle. N0 ={node∅},i.e. theprocessingstartswiththerootnode, In the Bayes tree, the focus is not on distances, but on and|Nt+1|=|Nt|+1,i.e. ineachtimesteponemorenodeis densityestimation. Wethereforeproposeaprobabilistic pri- read. Theprobabilitydensityforthequeryobjectxattime ority measure. It awards priority with respect tothe actual steptisthencalculatedusingthemixturemodelcorrespond- densityvalueforthecurrentqueryobjectbasedonthesta- ingtothefrontierF(Nt)ofthecurrentprefix-closedsubset tistical information stored in each node. More specifically, Nt as in Definition 6, that is pdq(x,(cid:80)F(Nt)). Thereby all attimestept,havingreadtheprefix-closedsetofnodesNt, objectsofthetreeareaccountedfor: es∈F(Nt)nes =|T∅|. the probabilistic approach descends at the next time step t+1 into the subtree T belonging to the entry e with Definition 9. Anytime pdq processing. ˆs sˆ (cid:110) n (cid:111) From time step t to t+1, Nt becomes Nt+1 by adding the esˆ= argmax g(x,µes,σes)· ns child node node of one frontier entry e ∈F(N ). If node es∈F(Nt) s s t s has νs entries, then the frontier F(Nt) changes to F(Nt+1) where x is the current query object and g is a Gaussian by probability density function as in Definition 2. In our experiments we evaluate all combinations between F(N )=(F(N )\{e })∪{e ,...,e } t+1 t s s◦1 s◦νs the three descent strategies and the two priority measures, i.e. e isreplacedbyitschildren. Theprobabilitydensityfor where in the breadth first approach the priority measure s x in time step t+1 is then calculated by determines the order of nodes per level. So far, we described descent strategies in a single Bayes pdq(x,F(N )) = pdq(x,F(N )) t+1 t tree,i.e. refiningmixturemodelsforjustoneclass. Now,we n −|T∅s| ·g(x,µes,σes) Tdihsceutsismreefifnoremcleanstsiwficitahtiroenspisecdtitvoidseedvebraeltwcleaesnsesse{vce1r,a.l.B.,acymes}. +(cid:88)νs n|Ts◦|i ·g(x,µes◦i,σes◦i) trreeaeds,lonnoedpeesrscolafsasr,(cif.e..Sebcetiinogna3t.2t)i.mTehseteqpuetsl,tiwonhiicsh,hoafvtinhge i=1 ∅ m trees should be granted the right to read the next node in the following time step t +1? Followingtheaboveequation,theprobabilitydensityforx l Initially, we evaluate the coarsest model for each class c , intimestept+1iscalculatedtakingtheprobabilitydensity i i.e. the model consisting of only one Gaussian probability for x in time step t (row 1), subtracting the contribution of density function representing all objects of c . the refined entry’s Gaussian (row 2) and adding the contri- i A naive improvement strategy chooses the classes in an butions of its children’s Gaussians (row 3). Hence, the cost arbitraryorderC =(c ,...,c ). First,theclassc isrefined for calculating the new probability density for x after read- 1 m 1 according to one of the above strategies. In the next step, ing one additional node is very low due to the information c is refined and so on. After m time steps all classes have stored for mean and variance. 2 beenrefinedandtheBayestreeofthefirstclassisprocessed Note that after reading all t nodes pdq(x,F(N )) max tmax again. As the classes are not sorted in any predefined way, is a full kernel density estimation taking all kernels at leaf we refer to this method as unsorted. level into account. Intuitively, it is very likely that for an object x the true Eachpossible frontier representsapossiblemodelforour classc showsahighposteriorprobabilityindependentofthe anytime algorithm. The number of possible models in any i time t spent decending its corresponding tree: P(c |x) = Bayes tree depends onthe actual status of the tree, i.e. the ci i pdq(x,F(N ))·P(c )/p(x). degree to which it is filled and the height of the tree. For tci i Consequently, we suggest a second strategy for improv- example, there are four possible models for a Bayes tree of 317 ing classification which only chooses the k classes having Synthetic 11K, descent strategies the highest a posteriori probability. This technique stores 1 these k classes in a priority queue (queue best k). Assume 0,95 tThheesnoertxetdnpordieoriistyreqaudeiunetchoentBaainyesstthreeecloafsscelsas(scic1i,1.a.n.,dcitkh)e. yacy 00,99 pprroobb__gblfot__uunnssoorrtteedd probability model is updated accordingly. If more time is ur0,85 prob_dft_unsorted c pkecrlmasitsteesdhcalvaessbecei2niismrperfionveeddaannddstoheonq.ueAueftiesrfikllesdtepasgaainll cac 0,8 ggeeoomm__gbllfot__uunnssoorrttteeddd 0,75 ggeom__dft__unsorted using the new a posteriori probabilities. We evaluate the influenceoftheparameterk intheexperimentsection. Set- 0,7 ting k = 1, always the model of the most probable class is 00 11 22 33 44 55 66 77 88 99 1100 1111 1122 1133 1144 ppaaggeess refined. For k = m (the number of classes) all classes are Synthetic 1M, descent strategies refined in the order of their probability before the queue is 0,9 refilled. We refer to these approaches as qb1, qbm, respec- 0,8 tively, and qbk in general. prob_glo_unsorted 0,7 One observation which is also known from hierarchical cy prob_bft_unsorted a indexing is that the root node often only contains a few ur0,6 prob_dft_unsorted c entries. We use this free space for storing the entries from cac0,5 geeoomm_glloo_unnssoorrtteedd different classes on the same page if it does not exceed the geom_bft_unsorted pagesize. Wecallthesenewrootpagesthecompressedroot. 0,4 geom_dft_unsorted Thetotalsizeofthecompessedrootvariessincethenumber 0,3 of root entries in a single Bayes tree varies as in all index 0 5 10 15 20 25 30 35 40 45 pages structures. Clearly, the total size also grows linearly in the number of dimensions and the number of classes. Figure 5: Comparing descent strategies along with priority measures on synthetic data. 4. EXPERIMENTS We ran experiments on real world and synthetic data to our kernel estimators we use a common data independent evaluate our Bayes tree for both anytime classification and method according to [28]. anytime learning. Table 1 summarizes the data sets and In the following we first evaluate the descent strategies their characteristics. The synthetic data sets were gener- along with the priority measures introduced in Section 3.3. ated using a random Gaussian mixture model as in [17]. In Section 4.2 we analyze various improvement strategies We created one small data set containing 4 dimensions, 2 to find that none of the three simple approaches leads to classes and eleven thousand objects and one large data set competitiveresults. Afterthatweintroduceournoveleval- containing 5 dimensions, 3 classes and one million objects. uationmethodfor anytime classifiers onPoissonstreams in Furthermore we use real world data of various characteris- Section 4.3. Finally we demonstrate the anytime learning ticswhichweretakenfromdifferentsourcessuchastheUCI performance of the Bayes tree for both anytime accuracy KDD archive [18]. and stream accuracy. In all experiments we use a 4 fold cross validation. To evaluate anytime accuracy we report the accuracy (aver- 4.1 Descentstrategies aged over the four folds) of our classifiers after each node For tree traversal we evaluate depth first (dft), breadth (correspondingtoonepage)thatisread. Thefirstaccuracy first (bft) and global best first (glo) strategies. Combined value corresponds to the classification accuracy of the uni- withboththegeometric(geom)andprobabilistic(prob)pri- modalBayesclassifierthatweuseasaninitializationbefore ority measure we compare six approaches for model refine- reading the compressed root (cf. Section 4.1). Experiments ment in a Bayes tree. Figure 5 shows the results for all six were run using 2KB page sizes (except 4KB for Verbmobil approachesonthesmallandonthelargesyntheticdataset and8KBfortheUSPSdataset)onPentium4machineswith using the unsorted improvement strategy. After 5 pages on 2.4 Ghz and 1 GB main memory. To set the bandwidth for the small synthetic data set the locality assumption of the depth first approach fails to identify the most relevant re- name size classes features ref. finements for both geometric and probabilistic priority. For the large synthetic data set the accuracy even decreases af- Synthetic (11K) 11,000 2 4 ter 17 pages. The processing strategy of the breadth first Synthetic (1M) 1,000,000 3 5 approach causes the classification accuracy to only increase Vowel 990 11 10 [18] if a next level of the tree is reached. USPS 8,772 10 39 [16] We found that the performance of the different descent Pendigits 10,992 10 16 [18] strategiesisindependentoftheemployedimprovementstrat- egy (i.e. unsorted or qbk). As an example for the indepen- Letter 20,000 26 16 [18] dence of the improvement strategy Figure 6 shows the per- Gender 189,961 2 9 [1] formanceontheGenderdatasetusingqbm. Theglodescent Covtype 581,012 7 10 [18] strategywithprobabilisticprioritymeasureshowstheover- Verbmobil 647,585 5 32 [31] allbestanytimeclassificationaccuracyforalldatasets. We therefore use global best first descent and the probabilistic Table 1: Data sets used in our experiments. priority measure in all further experiments. 318 Gender, descent strategies Covtype, improvement strategies 0,78 0,72 0,76 0,7 ccuracyac0000,,77,,4277 ppppgerrroooobbbm_____gbgdglffottlo_____qqqqqbbbbmmmm curacyacc000,,,666468 pppprrrroooobbbb____gggglllloooo____uqqqnbbbs31morted 0,68 geom_bft_qbm 0,62 ggeeoomm_ddfftt_qqbbmm 00,6666 0,6 0,64 0 5 10 15 20 25 30 35 40 45 pages 00 55 1100 1155 2200 2255 3300 3355 4400 4455 pages Figure 8: Comparing improvement strategies on Figure 6: Comparing descent strategies along with Covtype. priority measures on the Gender data set. Pendigits, improvement strategies Covtype qb1 outperforms the qbm and unsorted strategies, 0,98 but on Pendigits it performs way worse then both of them. 0,96 Opposite success and failure would befall those in favor for 0,94 qbm. acy0,92 prob_glo_qb3 The results on Verbmobil and Gender were similar to ur prob_glo_qbm those on Pendigits, i.e. qb1 performed way worse than all cacc 00,99 prob_glo_unsorted other strategies. On the other hand, similar to the Cov- 0,88 prob_glo_qb1 type results, qb1 performed better than qbm and unsorted 0,86 onLetterandUSPS.Onalldatasetsallofthethreestrate- 0,84 gies discussed above were clearly outperformed by the qb3 0 5 10 15 20 25 30 35 40 45 pppaaagggeeesss approach. These results prove that anytime density estimation does Figure 7: Comparing improvement strategies on not extend in a naive or straight forward way to anytime Pendigits. classificationwithcompetitiveresults. Moreover,theresults posethequestion,howthechoiceofkinfluencestheanytime classification performance. Figure 9 shows the comparison 4.2 Improvementstrategies ofqb1,qbm andqbk fork∈{2,4,6,8,10}ontheLetterdata We evaluated the improvement strategies unsorted, qb1 set. Asstatedabove,qb1 performsbetterthanqbm onthis andqbm onallsevenrealworlddatasetsfromTable1. Be- data set. sidesthatweusedqb3 forafirstevaluation,i.e. wesetk to Theaccuracyoftheqbm strategyinFigure9initiallyin- 3(2forGender)asitfollowsourintuitionof”somerelevant creases steeply in the first three to four steps. It is clearly candidates”. (A thorough evaluation of various values for k visible that the compressed root needs on average 19 pages follows later in this section.) for the 26 classes (recall the 4 fold cross validation). After WepickedtheresultsfromPendigitsandCovtypeasthey the compressed root is read, the accuracy of the qbm strat- represent the two sorts of outcome we found when we ana- egy again steeply increases for the next four to five steps. lyzedtheresults(cf. Figures7and8). Thenaiveapproach, A similar increase can be observed 26 steps later when all i.e. the unsorted improvement strategy, does not provide a classeshavebeenrefinedoncemore. Opposedtothosethree competitive solution (third rank on Pendigits and last rank strongimprovementsaretherestofthesteps,duringwhich on Covtype). However, after reading the entire compressed theaccuracyhardlychangesatall. Forthisdatasetwede- root this approach has read the same information as qbm. rivefromFigure9thatforanobjectthathastobeclassified ThiscanbestbeseenontheresultsforCovtype. Afterfour thereareonaveragefiveclassesthatareworthconsidering, page accesses the compressed root is read and both strate- the other 21 classes can be precluded. gies deliver the same classification accuracy. From then on, The qb10 strategy needs seven to eight pages to read the their accuracy is the same after every seven steps, i.e. 11, rootinformationforthetoptenclassesfromthecompressed 18, 25, ... since Covtype has seven classes. The graph for root. These steps show the same accuracy as the qbm ap- Pendigits does not show this behavior in such a clear fash- proach,sincetheorderingaccordingtotheprioritymeasure ion. This is due to the averaging over the four folds. The is equal. After that a similar pattern can be observed each compressedrootneedsbetweensevenandeightpageswithin tensteps,i.e. afterreorderingaccordingtothenoveldensity each fold. Therefore the accuracy for unsorted and qbm is values. similar after seven to eight steps, yet not equal. This sim- Continuing with the qb8 variant, the ”ramps”(steep im- ilarity occurs every ten steps from then on, since Pendigits provements) move even closer together and begin earlier. has ten classes. Similar improvement gains are shown by qb6 and qb4. An Regarding the other improvement strategies, one could even smaller k (k = 2 in Figure 9) does no longer improve expect that qb1, i.e. refining only the most probable class, the overall anytime accuracy and qb1 shows an even worse delivers the best results, because it either strengthens the performance than qb10. currentdecisionbyincreasingtheprobabilityforthecorrect For an easier comparison of the qbk performance when class or corrects the decision by decreasing the probability varyingk,wecalculatetheareaundertheanytimecurvefor in case a false class has currently the highest density. On eachkandhencereceiveasinglenumberforeachvalueofk. 319 Letter, variation of k Vowel, Area 0,85 46,90746,889 4466,884455 46,81 46,74 46,66 0,8 4466,555544 46,458 46,406 prob_glo_qb4 46,344 y0,75 prob_glo_qb2 46,227 cc ura prob_glo_qb6 c prob_glo_qb8 ac 0,7 prob_glo_qb10 prob_glo_qb1 prob_glo_qbm 0,65 Covtype, Area 0,6 33,826 0 5 10 15 20 25 30 35 40 45 pages 33,247 32,919 32,778 Figure 9: Variation of k on the Letter data set. 32,368 32,,076 31,84 Letter, Area 37,174 37,212 36,591 35,922 35,022 35,36 33,61 Figure 11: Area values for k ∈ {1...m} for Vowel (top) and Covtype (bottom). Figure 10: The area under the corresponding any- To evaluate anytime classification under variable stream time curves in Figure 9. scenarios, we recapitulate a stochastic model that is widely used to model random arrivals [12]. A Poisson process describes streams where the inter-arrival times are indepen- The area fits the intuition of a best improvement, since qbk dently exponentially distributed. Poisson processes are pa- curves mostly ascend monotonically. Figure 10 shows the rameterized by an arrival rate parameter λ: area values corresponding to the anytime curves in Figure 9. As in the above analysis qb4 turns out to perform best Definition 10. Poisson stream. according to this measure. A probability density function for the inter-arrival time of a We evaluated qbk for k ∈ {1...m} on all data sets and Poisson process is exponentially distributed with parameter foundthesamecharacteristicresults. Figure11displaysthe λ by p(t) = λ·e−λt.The expected inter-arrival time of an resultsforVowelandCovtype. Wefoundthatthemaximum exponentially distributed random variable with parameter λ value was always between k = 2 and k = 4 on our data is E[t]= 1. sets. Moreover, with an increasing number of classes, the λ k value for the best performance also increased, yet only We use a Poisson process to model random stream ar- slightly. An exception is the Gender data set, where k = rivals for each fold of the cross validation. We randomly m=2showedthebestperformance. Thisisduetothebad generateexponentiallydistributedinter-arrivaltimesfordif- performance of qb1, therefore we set k to be at least 2. To ferentvaluesofλ. Ifanewobjectarrives(thetimebetween develop a heuristic for the value of k we evaluated different two objects has passed) we interrupt the anytime classifier data sets having different number of classes. It turned out and measure its accuracy. We repeat this experiment using that a good choice for k is logarithmic in the number of different expected inter-arrival times 1, where a unit corre- λ classes, i.e. k = (cid:98)log2(m)(cid:99). Using this heuristic met the spondstoapageasinallpreviousexperiments. Weassume maximalperformanceforallourevaluations. Yetwesetthe that any object arrives at the earliest after the initializa- minimum value for k to 2 as mentioned above. We use this tion phase of the previous object, i.e. after evaluating the improvement strategy along with the best descent strategy unimodal Bayes. from Section 4.1 in all further experiments. Figure12showstheresultsforthestreamclassificationac- curacyasdescribedabovefordifferentvaluesofλonPendig- 4.3 Evaluatinganytimeclassification its,USPS,Covtype,Verbmobilandthelargesyntheticdata usingPoissonstreams set. All data sets show an improvement of accuracy as the So far we determined the best strategy for descending a expected inter-arrival time 1/λ increases and arrival times Bayestreetorefinethedensitymodelandevaluatedthepro- aredistributedfollowingaPoissonprocess. Thisisanexcel- posedimprovementstrategies. Nextweproposeanoveleval- lentfeaturethatcontrastsanytimeclassifiersfrombudgetor uationmethodforanytimeclassifiersusingPoissonstreams contract classifiers. Budget classifiers would have to use a beforewedemonstratetheanytimelearningperformanceof time budget that is small enough to guarantee the process- our technique on various data streams. ingofallarrivingobjectswhichwouldclearlyresultinworse 320

Description:

ε range allow for straightfoward pruning of database objects . sponds to a frontier in the tree, i.e. a set of entries in the tree, such that each kernel estimator is represented exactly once. Taking a complete subtree of all nodes starting from the root, the frontier describes a non-redundant mod

Indexing Density Models for Incremental Learning and Anytime Classification on Data Streams PDF

12 Pages·2008·0.89 MB·English

Checking for file health...

Save to my drive

Quick download

Download

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Indexing Density Models for Incremental Learning and Anytime Classification on Data Streams

Description:

See more

The list of books you might like

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.