Overlapping Cover Local Regression Machines
Mohamed Elhoseiny¹ · Ahmed Elgammal²

¹ Facebook AI Research
² Department of Computer Science, Rutgers University
E-mail: elhoseiny@fb.com, elgammal@cs.rutgers.edu

Received: date / Accepted: date

Abstract  We present the Overlapping Domain Cover (ODC) notion for kernel machines: a set of overlapping subsets of the data that covers the entire training set and is optimized to be as spatially cohesive as possible. We show how this notion benefits local kernel machines for regression, improving speed while minimizing the prediction error. We propose an efficient ODC framework, which is applicable to various regression models and in particular reduces the complexity of Twin Gaussian Processes (TGP) regression from cubic to quadratic. Our notion is also applicable to several other kernel methods (e.g., Gaussian Process Regression (GPR) and IWTGP regression, as shown in our experiments). We also theoretically justify the idea behind our method of improving local prediction through the overlapping cover. We validated and analyzed our method on three benchmark human pose estimation datasets, and interesting findings are discussed.

1 Introduction

Estimation of a continuous real-valued or a structured-output function from input features is one of the critical problems that appears in many machine learning applications. Examples include predicting the joint angles of the human body from images, head pose, object viewpoint, illumination direction, and a person's age and gender. Typically, these problems are formulated as a regression model. Recent advances in structured regression encouraged researchers to adopt it for formulating various problems with high-dimensional output spaces, such as segmentation, detection, and image reconstruction, as regression problems. However, the computational complexity of the state-of-the-art regression algorithms limits their applicability for big data. In particular, kernel-based regression algorithms such as Ridge Regression [12], Gaussian Process Regression (GPR) [18], and the Twin Gaussian Processes (TGP) [2] require inversion of kernel matrices (O(N^3), where N is the number of training points), which limits their applicability for big data. We refer to these non-scalable versions of GPR and TGP as full-GPR and full-TGP, respectively.

Khandekar et al. [13] discussed properties and benefits of overlapping clusters for minimizing the conductance from a spectral perspective. These properties of overlapping clusters also motivate studying scalable local prediction based on overlapping kernel machines. Figure 1 illustrates the notion by starting from a set of points, dividing them into either disjoint or overlapping subsets, and finally learning a kernel prediction function on each (i.e., f_i(x_*) for subset i, where x_* is a test point). In summary, the main question we address in this paper is how local kernel machines with overlapping training data can help speed up the computations while achieving accurate predictions. We achieved considerable speedup and good performance on GPR, TGP, and IWTGP (Importance Weighted TGP) applied to 3D pose estimation datasets. To the best of our knowledge, our framework is the first to achieve quadratic prediction complexity for TGP. The ODC concept is also novel in the context of kernel machines and is shown here to be successfully applicable to multiple kernel machines; in this work we study the GPR, TGP, and IWTGP kernel machines. The remainder of this paper is organized as follows: Sections 2 and 4 present some motivating kernel machines and the related work. Section 5 presents our approach and a theoretical justification for our ODC concept. Sections 6 and 7 present our experimental validation and conclusion.
Fig. 1: Top: Left: 24 points; Middle: Overlapping Cover; Right: disjoint kernel machines of 8 points (evaluating x_* near the middle of a kernel machine). Bottom: Left: disjoint kernel machine evaluation on a boundary; Right: 6 overlapping kernel machines of 8 points. f_i(x_*) is the i-th kernel machine prediction for the test point x_*.
2 Background on Full GPR and TGP Models

In this section, we show example kernel machines that motivated us to propose the ODC framework to improve their performance and scalability. Specifically, we review GPR for single-output regression and TGP for structured-output regression. We selected the GPR and TGP kernel machines for their increasing interest and impact. However, our framework is not restricted to them.

GPR [18] assumes a linear model in the kernel space with Gaussian noise in a single-valued output, i.e., y = f(x) + N(0, σ_n^2), where x ∈ R^{d_X} and y ∈ R. Given a training set {x_i, y_i, i = 1 : N}, the posterior distribution of y given a test point x_* is:

p(y | x_*) = N( μ_y = k(x_*)^T (K + σ_n^2 I)^{-1} f,
                σ_y^2 = k(x_*, x_*) − k(x_*)^T (K + σ_n^2 I)^{-1} k(x_*) )        (1)

where k(x, x') is a kernel defined in the input space, K is an N × N matrix such that K(l, m) = k(x_l, x_m), k(x_*) = [k(x_*, x_1), ..., k(x_*, x_N)]^T, I is an identity matrix of size N, σ_n^2 is the variance of the measurement noise, and f = [y_1, ..., y_N]^T. GPR can predict a structured output y ∈ R^{d_Y} by training a GPR model for each dimension. However, this means that GPR does not capture the dependency between output dimensions, which limits its performance.

TGP [2] encodes the relation between both inputs and outputs using GP priors. This is achieved by minimizing the Kullback-Leibler divergence between the marginal GP of outputs (e.g., poses) and observations (e.g., features). Hence, the TGP prediction is given by:

ŷ(x_*) = argmin_y [ k_Y(y, y) − 2 k_Y(y)^T (K_X + λ_X I)^{-1} k_X(x_*)
                     − η log( k_Y(y, y) − k_Y(y)^T (K_Y + λ_Y I)^{-1} k_Y(y) ) ]        (2)

where η = k_X(x_*, x_*) − k_X(x_*)^T (K_X + λ_X I)^{-1} k_X(x_*); k_X(x, x') = exp(−‖x − x'‖^2 / (2 ρ_x^2)) and k_Y(y, y') = exp(−‖y − y'‖^2 / (2 ρ_y^2)) are Gaussian kernel functions for the input feature x and the output vector y, with ρ_x and ρ_y the kernel bandwidths for the input and the output. k_Y(y) = [k_Y(y, y_1), ..., k_Y(y, y_N)]^T, where N is the number of training examples, k_X(x_*) = [k_X(x_*, x_1), ..., k_X(x_*, x_N)]^T, and λ_X and λ_Y are regularization parameters to avoid overfitting. This optimization problem can be solved using a quasi-Newton optimizer with cubic polynomial line search [2]; we denote the number of steps to convergence as l_2.
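To make the cost of full-GPR concrete, the following is a minimal NumPy sketch of Eq. 1 (ours, not the authors' code; the names rbf_kernel, gpr_fit, and gpr_predict are illustrative). The O(N^3) factorization of the N × N kernel matrix is the step that the ODC framework later localizes.

```python
import numpy as np

def rbf_kernel(A, B, rho):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * rho ** 2))

def gpr_fit(X, y, rho, sigma_n):
    # Precompute the O(N^3) part of Eq. 1: (K + sigma_n^2 I)^{-1} and its product with y.
    K = rbf_kernel(X, X, rho)
    A_inv = np.linalg.inv(K + (sigma_n ** 2) * np.eye(len(X)))
    return {"X": X, "rho": rho, "A_inv": A_inv, "alpha": A_inv @ y}

def gpr_predict(model, x_star):
    # Predictive mean and variance for a single test point (Eq. 1).
    k_star = rbf_kernel(x_star[None, :], model["X"], model["rho"]).ravel()
    mean = k_star @ model["alpha"]
    var = 1.0 - k_star @ model["A_inv"] @ k_star   # k(x*, x*) = 1 for the RBF kernel
    return mean, var

# Toy usage: N training points in d_X dimensions, single-valued output.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)
model = gpr_fit(X, y, rho=1.0, sigma_n=0.1)
print(gpr_predict(model, rng.normal(size=5)))
```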
Table 1: Comparison of the computational complexity of training and testing for Full, NN (Nearest Neighbor), FIC, Local-RPC, and our ODC. Training time includes all computations that do not depend on the test data (including clustering in some of these methods); testing includes the computations needed only for prediction.

Training for GPR and TGP: Ekmeans Clustering | RPC Clustering | Model training; Testing for each point: GPR-Y | GPR-Var | TGP-Y.

Full:                                -  |  -  |  O(N^3 + N^2 d_X)                 ||  O(N (d_X + d_Y))  |  O(N^2 d_Y)   |  O(l_2 N^2 d_Y)
NN [2]:                              -  |  -  |  -                                 ||  O(M^3 d_Y)        |  O(M^3 d_Y)   |  O(M^3 + l_2 M^2 d_Y)
FIC (GPR only, d_Y = 1) [22]:        -  |  -  |  O(M^2 (N + d_X))                  ||  O(M d_X)          |  O(M^2)       |  -
Local-RPC (GPR only, d_Y = 1) [5]:   -  |  O(N log(N/M) d_X)  |  O(M^2 (N + d_X))  ||  O(M d_X)          |  O(M^2)       |  -
ODC (our framework):                 O(N · (N/((1−p)M)) · d_X · l_1)  |  O(N log(N/((1−p)M)) d_X)  |  O(M^2 (N/(1−p) + d_X))  ||  O(K' M (d_X + d_Y))  |  O(K' M^2 d_Y)  |  O(l_2 K' M^2 d_Y)
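As a rough illustration of Table 1 (a sketch, not from the paper), the snippet below plugs representative values into the dominant per-test-point TGP terms; apart from N and M, which match the HumanEva experiments reported later, the chosen constants (d_Y, l_2, K') and the function names are ours.

```python
# Dominant per-test-point cost of TGP prediction under the schemes in Table 1
# (constant factors ignored; l2 = quasi-Newton steps, dY = output dimensionality).
def tgp_test_cost_full(N, dY, l2):
    return l2 * N**2 * dY

def tgp_test_cost_nn(M, dY, l2):
    return M**3 + l2 * M**2 * dY        # includes the per-test-point M x M inversions

def tgp_test_cost_odc(M, K_prime, dY, l2):
    return l2 * K_prime * M**2 * dY     # inversions are precomputed at training time

# Example with HumanEva-like sizes used later in the experiments (N ~ 9630, M = 800).
N, M, dY, l2, K_prime = 9630, 800, 30, 10, 1
print(tgp_test_cost_full(N, dY, l2) / tgp_test_cost_odc(M, K_prime, dY, l2))   # ~145x fewer operations
print(tgp_test_cost_nn(M, dY, l2) / tgp_test_cost_odc(M, K_prime, dY, l2))     # NN also pays the extra M^3 term
```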
3 Importance Weighted Twin Gaussian Processes (IWTGP)

Yamada et al. [26] proposed the importance-weighted variant of twin Gaussian processes [2], called IWTGP. The weights are calculated using RuLSIF [27] (relative unconstrained least-squares importance fitting). The weights are modeled as w_α(x, θ) = Σ_{l=1}^{n_te} θ_l k(x, x_l) to minimize E_{p_te(x)}[(w_α(x, θ) − w_α(x))^2], where k(x, x_l) = exp(−‖x − x_l‖^2 / (2 τ^2)), w_α(x) = p_te(x) / ((1 − α) p_te(x) + α p_tr(x)), and 0 ≤ α ≤ 1. Compared with plain importance weighting, which can be unstable, using the α-relative weight w_α with 0 ≤ α ≤ 1 is practically useful for stabilizing the covariate shift adaptation, even though it cannot give an unbiased model under covariate shift [27]. According to [26], the optimal θ̂ vector is computed in closed form as follows:

θ̂ = (Ĥ + ν I)^{-1} ĥ        (3)

where Ĥ_{l,l'} = ((1−α)/n_te) Σ_{i=1}^{n_te} k(x_i^te, x_l^te) k(x_i^te, x_{l'}^te) + (α/n_tr) Σ_{j=1}^{n_tr} k(x_j^tr, x_l^te) k(x_j^tr, x_{l'}^te), ĥ is an n_te-dimensional vector whose l-th element is ĥ_l = (1/n_te) Σ_{i=1}^{n_te} k(x_i^te, x_l^te), I is an n_te × n_te identity matrix, and n_te and n_tr are the numbers of testing and training points, respectively. Model selection for RuLSIF is based on cross-validation with respect to the squared-error criterion J in [27]. Having computed θ̂, each input and output example is simply re-weighted by w_α^{1/2} [26]. Therefore, the output of the importance weighted TGP (IWTGP) is given by

ŷ = argmin_y [ K_Y(y, y) − 2 k_y(y)^T u_w
               − η_w log( K_Y(y, y) − k_y(y)^T W^{1/2} (W^{1/2} K_Y W^{1/2} + λ_y I)^{-1} W^{1/2} k_y(y) ) ]        (4)

where u_w = W^{1/2} (W^{1/2} K_X W^{1/2} + λ_x I)^{-1} W^{1/2} k_x(x) and η_w = k_X(x, x) − k_x(x)^T u_w. Similar to TGP, IWTGP can also be solved using a second-order BFGS quasi-Newton optimizer with cubic polynomial line search for optimal step-size selection.

Table 1 shows the training and testing complexity of the full GPR and TGP models, where d_Y is the dimensionality of the output. Table 1 also summarizes the computational complexity of the related approximation methods, discussed in the following section, and of our method.

4 Related Work on Approximation Methods

Various approximation approaches have been presented to reduce the computational complexity in the context of GPR. As detailed in [16], approximation methods for Gaussian Processes may be categorized into three trends: matrix approximation, likelihood approximation, and localized regression. The matrix approximation trend is inspired by the observation that the kernel matrix inversion is the major part of the expensive computation; thus, the matrix is approximated by a lower-rank version, M ≪ N (e.g., the Nyström Method [25]). While this approach reduces the computational complexity from O(N^3) to O(N M^2) for training, there is no guarantee on the non-negativity of the predictive variance [18]. In the second trend, likelihood approximation is performed on testing and training examples, given M artificial examples known as inducing inputs, selected from the training set (e.g., Deterministic Training Conditional (DTC) [19], Fully Independent Conditional (FIC) [22], Partially Independent Conditional (PIC) [21]). The drawback of this trend is the dilemma of selecting the M inducing points, which might be distant from the test point, resulting in a performance decay; see Table 1 for the complexity of FIC.

A third trend, localized regression, is based on the belief that distant observations are almost unrelated. The prediction of a test point is achieved through its M nearest points. One technique to implement this notion is to decompose the training points into disjoint clusters during training, where a prediction function is learned for each of them [16]. At test time, the prediction function of the closest cluster is used to predict the corresponding output. While this method is efficient, it introduces discontinuity problems on the boundaries of the subdomains. Another way to implement local regression is through Mixture of Experts (MoE), an ensemble method that makes the final prediction by combining the outputs of local predictors called experts (see a study on MoE methods [28]). Examples include the Bayesian committee machine (BCM [23]), local probabilistic regression (LPR [24]), mixture of Tree of Gaussian Processes (GPs) [9], and Mixture of GPs [18]. While these approaches overcome the discontinuity problem by the combination mechanism, they suffer from intensive complexity at test time, which limits their applicability in large-scale settings; e.g., Tree of GPs and Mixture of GPs
involve complicated integration, approximated by computationally expensive sampling or Monte Carlo simulation.

Park et al. [16] proposed a large-scale approach for GPR by domain decomposition on up to a 2D grid on the input, where a local regression function is inferred for each subdomain such that the functions are consistent on boundaries. This approach obviously lacks a solution for high-dimensional input data because the size of the grid increases exponentially with the dimensions, which limits its applicability. More recently, [5] proposed a Recursive Projection Clustering scheme (RPC) to decompose the data into non-overlapping equal-size clusters, and built a GPR on each cluster. They showed that this local scheme gives better performance than FIC [22] and other methods. However, this partitioning scheme obviously lacks consistency on the boundaries of the partitions, and it was restricted to single-output GPR. Table 1 shows the complexity of this scheme, denoted Local-RPC, for GPR.

Beyond GPR, we found that local regression was adopted differently in structured regression models like Twin Gaussian Processes (TGP) [2] and a data-bias version of it, denoted IWTGP [26]. TGP and IWTGP outperform not only GPR in this task, but also various regression models including the Hilbert-Schmidt Independence Criterion (HSIC) [10], Kernel Target Alignment (KTA) [6], and Weighted-KNN [18]. Both TGP and IWTGP have no closed-form expression for prediction. Hence, the prediction is made by gradient descent on a function that needs to compute the inverse of both the input and output kernel matrices, an O(N^3) computation. Practically, both approaches have been applied by finding the M ≪ N Nearest Neighbors (NN) of each test point in [2] and [26]. The prediction of a test point is then O(M^3) due to the inversion of the M × M input and output kernel matrices. However, the NN scheme has three drawbacks: (1) a regression model is computed for each test point, which results in a scalability problem in prediction (i.e., matrix inversions on the NN of each test point); (2) the number of neighbors might not be large enough to create an accurate prediction model, since it is constrained by the first drawback; (3) it is inefficient compared with the other schemes used for GPR. Table 1 shows the complexity of this NN scheme.

Table 2: Contrast against the most relevant methods

                                         [16]                           FIC/PIC [22]   NN [2]   ODC
Accurate                                 No for high input dimension    No             Yes      Yes
Efficient                                No                             Yes            No       Yes
Scalable to arbitrary input dimension    No (2D)                        Yes            Yes      Yes
Consistent on boundaries                 Yes                            No             Yes      Yes
Supported kernel machines                GPR                            GPR            TGP      GPR, TGP, IWTGP and others
Easy to parallelize                      No                             No             Yes      Yes

5 ODC Framework

The problems of the existing approaches, presented above, motivated us to develop an approach that satisfies the properties listed in Table 2. The table also shows which of these properties are satisfied by the relevant methods. In order to satisfy all the properties, we present the Overlapping Domain Cover (ODC) notion. We define the ODC as a collection of overlapping subsets of the training points, denoted subdomains, such that they are as spatially coherent as possible. During training, an ODC is computed such that each subdomain overlaps with the neighboring subdomains. Then, a local prediction model (kernel machine) is created for each subdomain, and the computations that do not depend on the test data are factored out and precomputed (e.g., inversion of matrices). The nature of the ODC generation makes these kernel machines consistent in the overlapped regions, which are the boundaries, since we constrain the subdomains to be coherent. This is motivated by the notion that data lives on a manifold with local properties and consistent connections between its neighboring regions. At prediction time, the output is calculated as a reduction function of the predictions of the closest subdomain(s). Table 1 (last row) shows the complexity of our generalized ODC framework, detailed in Sections 5.1 and 5.2. In contrast to the prior work, our ODC framework is designed to cover the structured regression setting, d_Y > 1, and to be applicable to GPR, TGP, and many other models.

Fig. 2: ODC Framework

Notations. Given a set of input data X = {x_1, ..., x_N}, our prediction framework first generates a set of non-overlapping equal-size partitions, C = {C_1, ..., C_K}, such that ∪_i C_i = X and |C_i| = N/K. The ODC is then defined based on them as D = {D_1, ..., D_K}, such that |D_i| = M ∀i and D_i = C_i ∪ O_i ∀i. O_i is the set of points that overlap with the other partitions, i.e., O_i = {x_j : x_j ∈ ∪_{j≠i} C_j}, such that |O_i| = p·M and |C_i| = (1−p)·M, where 0 ≤ p ≤ 1 is the ratio of points in each overlapping subdomain D_i that belong to (overlap with) partitions other than its own C_i.

It is important to note that the ODC can be specified by two parameters, M and p, which are the number of points in each subdomain and the ratio of overlap, respectively; this is since K = N/((1−p)M). This parameterization of the ODC generation is reasonable for the following reasons. First, M defines the number of points that are used to train each local kernel machine, which controls the performance of the local prediction. Second, given M and that K = N/((1−p)M), p defines how coarse or fine the distribution of kernel machines is. It is not hard to see that as p goes to 0, the generated ODC reduces to the set of non-overlapping clusters. Similarly, as p approaches 1 − 1/M, the ODC reduces to generating a cluster at each point with maximum overlap with the other clusters, i.e., K = N, |C_i| = 1, and |O_i| = M − 1. Our main claim is twofold. First, precomputing local kernel machines (e.g., GPR, TGP, IWTGP) during training on the ODC significantly increases the speedup at prediction time. Second, given a fixed M and N, as p increases, the local prediction performance increases, theoretically supported by Lemma 5.1.

Lemma 5.1. Under the ODC notion, as the overlap p increases, the nearest model to an arbitrary test point gets closer and it is more likely that this model was trained on a big neighborhood of the test point.

Proof. We start by outlining the main idea behind the proof, which is directly connected to the fact that K = N/((1−p)M), i.e., the number of local models increases as p increases for fixed N and M. Under the assumption that the local models are spatially cohesive, p → 1 theoretically indicates that there is a local model centered at each point in the space (i.e., K = ∞). Hence, as p increases, the distribution of the kernel machines becomes finer and a test point is more likely to find its closest kernel machine trained on a big neighborhood of it, leading to more accurate prediction. Meanwhile, as p goes to 0, the distribution is coarsest and a test point is less likely to find a closest kernel machine trained on a big neighborhood.

Let us assume that each kernel machine is defined on M points that are spatially cohesive, covering the space of N points with K = N/((1−p)M) machines. Let the center of the M points in kernel machine i be μ_i and the covariance matrix of these points be Σ_i. Hence

p(x | D_i) = N(μ_i, Σ_i) = (2π)^{-d_X/2} |Σ_i|^{-1/2} exp( −(1/2)(x − μ_i)^T Σ_i^{-1} (x − μ_i) )        (5)

where N(μ_i, Σ_i) is a normal distribution with mean μ_i and covariance matrix Σ_i.

Let us assume that there are two ODCs, ODC_1 and ODC_2, defined on the same N points, the first with overlap p_1 and the second with overlap p_2, such that p_2 > p_1. Let the numbers of kernel machines in ODC_1 and ODC_2 be K_1 and K_2, respectively. Hence,

K_1 = N / ((1−p_1) M),   K_2 = N / ((1−p_2) M)        (6)

Since p_2 > p_1, 0 ≤ p_1 < 1, and 0 ≤ p_2 < 1, we have K_2 > K_1, which indicates that the number of kernel machines in ODC_2, with the higher overlap, is bigger than the number of kernel machines in ODC_1. Let us assume that there is a test point x_*, and define the probability that x_* is captured by the ODC to be proportional to the maximum probability of x_* among the domains:

p(x_*) = Σ_{i=1}^{K} p(x_*, D_i)
       = Σ_{i=1}^{K} p(x_* | D_i) δ( p(x_* | D_i) − max_{j=1}^{K} p(x_* | D_j) )
       = max_{i=1}^{K} p(x_* | D_i)
       = (2π)^{-d_X/2} max_{i=1}^{K} |Σ_i|^{-1/2} exp( −(1/2)(x_* − μ_i)^T Σ_i^{-1} (x_* − μ_i) )        (7)

where δ(0) = 1 and δ(·) = 0 otherwise. The reason behind this definition of p(x_*) is that our method selects the domain of prediction based on argmax_{i=1}^{K} p(x_* | D_i). Hence p_{ODC_1}(x_*) = max_{i=1}^{K_1} p_{ODC_1}(x_* | D_i) and p_{ODC_2}(x_*) = max_{i=1}^{K_2} p_{ODC_2}(x_* | D_i).

We start with the case where the points are uniformly distributed in the space. Under this condition, and assuming a spatially cohesive domain cover, p(x_* | D_i) ≈ N(μ_i, Σ) ∀i, where Σ_1 = Σ_2 = ... = Σ_K = Σ. Hence

p(x_* | D_i) ∝ exp( −(1/2)(x_* − μ_i)^T Σ^{-1} (x_* − μ_i) ),
ln p(x_* | D_i) ∝ −(x_* − μ_i)^T Σ^{-1} (x_* − μ_i)        (8)

Then

p(x_*) = max_{i=1}^{K} p(x_* | D_i) = (2π)^{-d_X/2} |Σ|^{-1/2} max_{i=1}^{K} exp( −(1/2)(x_* − μ_i)^T Σ^{-1} (x_* − μ_i) )
       ∝ max_{i=1}^{K} exp( −(1/2)(x_* − μ_i)^T Σ^{-1} (x_* − μ_i) ),
ln p(x_*) ∝ max_{i=1}^{K} −(x_* − μ_i)^T Σ^{-1} (x_* − μ_i)        (9)

Hence, p(x_*) gets maximized as x_* gets closer to one of the domain centers μ_i defined by the ODC. It is not hard to see that the chance of x_* being close to one of the centers covered by ODC_2 is higher than for ODC_1, especially when p_2 ≫ p_1. This is since K_1 = N/((1−p_1)M) and K_2 = N/((1−p_2)M); hence K_2 ≫ K_1 when p_2 ≫ p_1. For instance, when p_1 = 0 and p_2 = 0.9, ODC_1 generates K_1 = N/M domains, while ODC_2 generates K_2 = 10·N/M = 10 K_1, i.e., ten times more domains and centers. The fact that there are many more domains when K_2 ≫ K_1, together with the fact that these domains are spatially cohesive, leads to max_{i=1}^{K_2} −(x_* − μ_i^2)^T Σ^{-1} (x_* − μ_i^2) ≫ max_{i=1}^{K_1} −(x_* − μ_i^1)^T Σ^{-1} (x_* − μ_i^1). This statement derives from the fact that max_{i=1}^{K} −(x_* − μ_i)^T Σ^{-1} (x_* − μ_i) is maximized (1) when x_* gets very close to one of the μ_i, i = 1 : K, and (2) by a smaller variance |Σ|, which is minimized by the nature by which the ODC is created, since each domain i is created from points neighboring its center. This directly leads to the conclusion that if K_2 ≫ K_1 then max_{i=1}^{K_2} −(x_* − μ_i^2)^T Σ^{-1} (x_* − μ_i^2) ≫ max_{i=1}^{K_1} −(x_* − μ_i^1)^T Σ^{-1} (x_* − μ_i^1). Hence, p_{ODC_2}(x_*) ≫ p_{ODC_1}(x_*).

Even if the points are not uniformly distributed, it is still more likely that an ODC with higher overlap has a higher p(x_*), since x_* is, in expectation, close to one of the centers when more spatially cohesive domains are generated, which is the case with higher overlap. Our experiments also show that the ODC concept generalizes on three real datasets where the training points are not distributed uniformly.

5.1 Training

There are several overlapping clustering methods (e.g., [17] and [3]) that look relevant for our framework. However, these methods do not fit our purpose of both spatial cohesion and equal-size constraints for the local kernel machines. We also found them very slow in practice, because their complexity varies from cubic to quadratic (with a big constant factor) in the size of the training set. These problems motivated us to propose a practical method that builds overlapping local kernel machines with spatial and equal-size constraints. These constraints are critical for our purpose since the number of points in each kernel machine determines its local performance. Hence, our training phase has two steps: (1) the training data is split into K = N/((1−p)M) equal-size clusters of (1−p)M points; (2) an ODC with K overlapping subdomains is generated by augmenting each cluster with p·M points from the neighboring clusters.

5.1.1 Equal-size Clustering

There are recent algorithms that deal with size constraints in clustering. For example, [29] formulated the problem of clustering with size constraints as a linear programming problem. However, such algorithms are not computationally efficient, especially for large-scale datasets (e.g., Human3.6M). We study two efficient ways to generate equal-size clusters; see Table 1 (last row) for their ODC complexity.

Recursive Projection Clustering (RPC) [5]. In this method, the training data is partitioned to perform GPR prediction. Initially, all data points are put in one cluster. Then, two points are chosen randomly and the orthogonal projection of all the data onto the line connecting them is computed. Depending on the median value of the projections, the data is then split into two equal-size subsets. The same process is applied to each cluster, generating 2^l clusters after l repetitions. The iterations stop once 2^l > K. As indicated, the number of clusters in this method has to be a power of two, and it might produce long, thin clusters.

Equal-Size K-means (EKmeans). We propose a variant of k-means clustering [11] to generate equal-size clusters. The goal is to obtain a disjoint partitioning of X into clusters C = {C_1, ..., C_K}, similar to the k-means objective, minimizing the within-cluster sum of squared Euclidean distances, C = argmin_C J(C) = min Σ_{j=1}^{K} Σ_{x_i ∈ C_j} d(x_i, μ_j), where μ_j is the mean of cluster C_j and d(·,·) is the squared distance. Optimizing this objective is NP-hard, and k-means iterates between the assignment and update steps as a heuristic to achieve a solution; l_1 denotes the number of k-means iterations. We add equal-size constraints: ∀(1 ≤ i ≤ K), |C_i| = N/K = (1−p)M.

In order to achieve this partitioning, we propose an efficient heuristic algorithm, denoted Assign and Balance (AB) EKmeans. It mainly modifies the assignment step of k-means to bound the size of the resulting clusters. We first assign the points to their closest cluster center, as typically done in the assignment step of k-means. We use C(x_p) to denote the cluster assignment of a given point x_p. This results in three types of clusters: balanced, overfull, and underfull. Then some of the points in the overfull clusters are redistributed to the underfull clusters by assigning each of these points to its closest underfull cluster. This is achieved by initializing a pool of overfull points defined as X̃ = {x_p : x_p ∈ C_l, |C_l| > N/K}; see Figure 3.

Let us denote the set of underfull clusters by C̃ = {C_j : |C_j| < N/K}. We compute the distances d(x_i, μ_j), ∀ x_i ∈ X̃ and C_j ∈ C̃. Iteratively, we pick the minimum-distance pair (x_p, μ_l) and assign x_p to cluster C_l instead of cluster C(x_p). The point is then removed from the overfull pool. Once an underfull cluster becomes full, it is removed from the underfull pool; once an overfull cluster is balanced, the remaining points of that cluster are removed from the overfull pool. The intuition behind this algorithm is that the cost associated with the initial optimal assignment (given the computed means) is minimally increased by each swap, since we pick the minimum-distance pair in each iteration. Hence the cost is kept as low as possible while balancing the clusters. We refer to this algorithm as Assign and Balance (AB) EKmeans. Algorithm 1 illustrates the overall assignment step and Fig. 4 visualizes the balancing step.
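The following is a minimal NumPy sketch (ours, not the authors' implementation) of the assign-and-balance heuristic described above and summarized in Algorithm 1: a standard nearest-center assignment followed by moving points from overfull clusters to their closest underfull clusters, always taking the globally smallest remaining distance.

```python
import numpy as np

def ab_ekmeans_assign(X, centers):
    """One Assign-and-Balance assignment step: returns labels with |C_j| <= ceil(N/K)."""
    N, K = len(X), len(centers)
    cap = int(np.ceil(N / K))                        # ideal (maximum) cluster size N/K
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # N x K squared distances
    labels = dist.argmin(1)                          # plain k-means assignment
    sizes = np.bincount(labels, minlength=K)

    overfull_pts = [p for p in range(N) if sizes[labels[p]] > cap]
    underfull = {j for j in range(K) if sizes[j] < cap}
    # Repeatedly move the overfull point with the smallest distance to any underfull center.
    while underfull and overfull_pts:
        pairs = [(dist[p, j], p, j) for p in overfull_pts for j in underfull]
        _, p, j = min(pairs)
        sizes[labels[p]] -= 1
        if sizes[labels[p]] <= cap:                  # source cluster became balanced:
            overfull_pts = [q for q in overfull_pts  # its remaining points leave the pool
                            if labels[q] != labels[p] or q == p]
        labels[p] = j
        sizes[j] += 1
        overfull_pts.remove(p)
        if sizes[j] >= cap:                          # target cluster is now full
            underfull.discard(j)
    return labels

# Toy usage: 300 2D points into K = 5 equal-size clusters (centers from a plain k-means run).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
centers = X[rng.choice(300, 5, replace=False)]
labels = ab_ekmeans_assign(X, centers)
print(np.bincount(labels))                           # sizes are all close to 300 / 5 = 60
```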
Fig. 3: AB-EKmeans on 300,000 2D points, K = 57

5.1.2 Overlapping Domain Cover (ODC) Model

Having generated the disjoint equal-size clusters, we generate the ODC subdomains based on the overlapping ratio p, such that p·M points are selected from the neighboring clusters. Let us assume that we select points only from the r closest clusters to each cluster, where C_j is considered closer to C_i than C_k if ‖μ_i − μ_j‖ < ‖μ_i − μ_k‖. It is important to note that r must be greater than p/(1−p) in order to supply the required p·M points; this is since the number of points in each cluster is (1−p)M. Hence, the minimum value of r is ⌈(p·M)/((1−p)·M)⌉ = ⌈p/(1−p)⌉ clusters, and we parametrize r as r = ⌈t·p/(1−p)⌉, t ≥ 1. We study the effect of t in the experimental results section. Having computed r from p and t, each subdomain D_i is then created by merging the points in cluster C_i with p·M points retrieved from the r neighboring clusters. Specifically, the points are selected by sorting the points in each of the r clusters by their distance to μ_i. The number of points retrieved from each of the r neighboring clusters is inversely proportional to the distance of its center to μ_i. If a subset of the r clusters is requested to retrieve more than its capacity (i.e., (1−p)M points), the set of extra points is requested from the remaining clusters, giving priority to the closer clusters (i.e., starting from the nearest neighboring cluster to the cluster on which the subdomain is created). As t = 1 and p increases, all points that belong to the r clusters tend to be merged with C_i. In our framework, we used FLANN [15] for fast NN-retrieval; see the pseudo-code of the ODC generation in Appendix C.
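Since Appendix C is not reproduced here, the following is a rough NumPy sketch (ours) of the subdomain-generation step just described: it computes r from (p, t), then augments each cluster with p·M points drawn from its r nearest neighboring clusters, sorted by distance to the cluster center. It omits the inverse-distance-proportional allocation across neighbors and the FLANN-based retrieval used in the paper.

```python
import numpy as np

def build_odc(X, labels, centers, M, p, t=1.0):
    """Augment each equal-size cluster with ~p*M points from its r nearest clusters."""
    K = len(centers)                                   # K = N / ((1 - p) * M)
    n_overlap = int(round(p * M))                      # points borrowed per subdomain
    r = max(int(np.ceil(t * p / (1.0 - p))), 1)        # number of neighboring clusters used
    members = [np.flatnonzero(labels == j) for j in range(K)]
    center_d = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)

    subdomains = []
    for i in range(K):
        neighbors = np.argsort(center_d[i])[1:r + 1]   # r closest clusters (excluding i)
        pool = np.concatenate([members[j] for j in neighbors])
        # Simplification: take the n_overlap pool points closest to this cluster's center.
        d_to_center = ((X[pool] - centers[i]) ** 2).sum(-1)
        overlap = pool[np.argsort(d_to_center)[:n_overlap]]
        subdomains.append(np.concatenate([members[i], overlap]))   # |D_i| ~= M
    return subdomains

# Toy usage with the AB-EKmeans labels from the previous sketch (M = 100, p = 0.4):
# subdomains = build_odc(X, labels, centers, M=100, p=0.4, t=1.0)
```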
Fig. 4: AB-EKmeans: Balancing Step

Input: X (N × d_X), {μ_i}_{i=1}^{K}
Output: labels
1- Assign the points initially to their closest centers; this puts the clusters into 3 groups: (1) balanced clusters, (2) overfull clusters, (3) underfull clusters.
2- Create a matrix D ∈ R^{N×K}, where D[i, j] is the distance between the i-th point and the j-th cluster center; rows are restricted to points that belong to overfull clusters; columns are restricted to underfull cluster centers.
3- Get the coordinates (i*, j*) of the smallest distance in D.
4- Remove the i*-th row from matrix D and mark the point as assigned to the j*-th cluster.
5- If the size of cluster j* reaches the ideal size (i.e., N/K), remove the j*-th column from matrix D.
6- Go to step 3 if there are still unassigned points.
Algorithm 1: Assign and Balance (AB) k-means: Assignment Step

After the ODC is generated, we compute the sample normal distribution using the points that belong to each subdomain. Then, a local kernel machine is trained for each of the overlapping subdomains. We denote the point-set normal distribution of subdomain i as p(x | D_i) = N(μ'_i ∈ R^{d_X}, Σ'_i ∈ R^{d_X × d_X}); Σ'^{-1}_i is precomputed during training for later use during prediction. Finally, we factor out all the computations that do not depend on the test point (for GPR, TGP, IWTGP) and store them with each subdomain as its local kernel machine. We denote the training model for subdomain i as M^i, which is computed as follows for GPR and TGP, respectively.

GPR. Firstly, we precompute (K^i_j + σ_{n_j}^2 I)^{-1}, where K^i_j is an M × M kernel matrix defined on the input points in D_i. Each dimension j of the output can have its own hyper-parameters, which results in a different kernel matrix K^i_j for each dimension. We also precompute (K^i_j + σ_{n_j}^2 I)^{-1} y_j for each dimension. Hence M^i_GPR = {(K^i_j + σ_{n_j}^2 I)^{-1}, (K^i_j + σ_{n_j}^2 I)^{-1} y_j, j = 1 : d_Y}.
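As an illustration (ours, with a single shared noise variance and bandwidth as simplifying assumptions), the per-subdomain GPR precomputation and the resulting cheap predictive-mean evaluation could look like the following; the subdomain mean and inverse covariance are stored here as well, since they are needed for selecting the closest subdomain at prediction time.

```python
import numpy as np

def rbf(A, B, rho):
    return np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / (2 * rho ** 2))

def train_local_gpr(X, Y, subdomains, rho, sigma_n):
    """Precompute, for each subdomain D_i, the quantities that do not depend on the test point."""
    models = []
    for idx in subdomains:                       # idx: indices of the ~M points of D_i
        Xi, Yi = X[idx], Y[idx]                  # Y is N x d_Y (one GPR per output dimension)
        A_inv = np.linalg.inv(rbf(Xi, Xi, rho) + sigma_n ** 2 * np.eye(len(idx)))
        models.append({
            "X": Xi,
            "A_inv": A_inv,                      # (K^i + sigma_n^2 I)^{-1}, reused for the variance
            "alpha": A_inv @ Yi,                 # (K^i + sigma_n^2 I)^{-1} Y_i, reused for the mean
            "mu": Xi.mean(0),                    # subdomain distribution, used to pick the
            "Sigma_inv": np.linalg.pinv(np.cov(Xi, rowvar=False)),  # closest subdomain at test time
        })
    return models

def local_gpr_mean(model, x_star, rho):
    # O(M * d_X) kernel vector plus O(M * d_Y) products per test point (Table 1, ODC row).
    k_star = rbf(x_star[None, :], model["X"], rho).ravel()
    return k_star @ model["alpha"]
```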
TGP. The local kernel machine for each subdomain in the TGP case is defined as M^i_TGP = {(K^i_X + λ^i_X I)^{-1}, (K^i_Y + λ^i_Y I)^{-1}}, where K^i_X and K^i_Y are M × M kernel matrices
defined on the input points and the corresponding output points, respectively, which belong to domain i.

IWTGP. It is not obvious how to factor out the computations that do not depend on the test data in the case of IWTGP, since the computationally expensive factors (i.e., (W^{i 1/2} K^i_X W^{i 1/2} + λ_x I)^{-1} and (W^{i 1/2} K^i_Y W^{i 1/2} + λ_y I)^{-1}) depend on the test set, because W^i (a diagonal matrix) is computed at test time. To help factor out the computation, we used linear algebra to show that

(D A D + λ I)^{-1} = D^{-1} A^{-1} D^{-1} − (λ D^{-2} A^{-2} D^{-2}) / (1 + λ · tr(D^{-1} A^{-1} D^{-1}))        (10)

where D is a diagonal matrix, I is the identity matrix, and tr(B) is the trace of matrix B.

Proof. Kenneth Miller [14] proposed the following lemma on matrix inverses:

(G + H)^{-1} = G^{-1} − (1 / (1 + tr(G^{-1} H))) G^{-1} H G^{-1}        (11)

Applying Miller's lemma with G = D A D and H = λ I leads directly to Eq. 10.

Mapping D to W^{i 1/2} and A to either K^i_X or K^i_Y, we precompute M^i = {(K^i_X)^{-1}, (K^i_Y)^{-1}}. Having computed W^i at test time, (W^{i 1/2} K^i_X W^{i 1/2} + λ_x I)^{-1} and (W^{i 1/2} K^i_Y W^{i 1/2} + λ_y I)^{-1} can then be computed in quadratic time given M^i, following Eq. 10, since the inverse and the powers of W^{i 1/2} have linear computational complexity because it is diagonal.

5.2 Prediction

ODC prediction is performed in three steps.

(1) Finding the closest subdomains. The closest K′ ≪ K subdomains are determined based on the covariance norm of the displacement of the test input from the means of the subdomain distributions (i.e., ‖x − μ'_i‖_{Σ'^{-1}_i}, i = 1 : K, where ‖x − μ'_i‖_{Σ'^{-1}_i} = (x − μ'_i)^T Σ'^{-1}_i (x − μ'_i)). The reason behind using the covariance norm is that it captures the details of the density of the distribution in all dimensions. Hence, it better models p(x | D_i), indicating a better prediction of x on D_i.

(2) Closest subdomains prediction. Having determined the closest subdomains, predictions are made for each of the closest clusters. We denote these predictions by {Y^i_{x_*}}_{i=1}^{K′}. Each of these predictions is computed according to the selected kernel machine. For GPR, the predictive mean and variance are O(M·d_X) and O(M^2·d_Y), respectively. For TGP, the prediction is O(l_2·M^2·d_Y); see Eq. 2.

(3) Subdomain weighting and final prediction. The final prediction is formulated as Y(x_*) = Σ_{i=1}^{K′} a_i Y^i_{x_*}, with a_i > 0 and Σ_{i=1}^{K′} a_i = 1. The weights {a_i}_{i=1}^{K′} are computed as follows. Let {D^i_{x_*} = ‖x − μ'_i‖_{Σ'^{-1}_i}}_{i=1}^{K′} denote the distances to the closest subdomains; then, with {L^i_{x_*} = 1 / D^i_{x_*}}_{i=1}^{K′}, a_i = L^i_{x_*} / Σ_{i=1}^{K′} L^i_{x_*}.

It is not hard to see that when K′ = 1, the prediction step reduces to regression using the closest subdomain to the test point. While it is common in most of the prior work to make the prediction using only the closest model, we generalized it to the K′ closest kernel machines and combine their predictions, so as to study how the consistency of the combined prediction behaves as the overlap p increases; see the experiments.
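Putting the three steps together, a minimal sketch (ours) could look like the following; it reuses the per-subdomain models produced by a training routine like the one above and assumes GPR-style local predictors passed in through local_predict.

```python
import numpy as np

def odc_predict(models, x_star, rho, k_prime=1, local_predict=None):
    """Three-step ODC prediction: rank subdomains by the covariance norm, predict with the
    K' closest local machines, and combine with inverse-distance weights."""
    # Step 1: covariance (Mahalanobis) norm to every subdomain distribution.
    d = np.array([(x_star - m["mu"]) @ m["Sigma_inv"] @ (x_star - m["mu"]) for m in models])
    closest = np.argsort(d)[:k_prime]

    # Step 2: local predictions of the K' closest kernel machines.
    preds = np.stack([local_predict(models[i], x_star, rho) for i in closest])

    # Step 3: inverse-distance weighting, a_i = L_i / sum_j L_j with L_i = 1 / d_i.
    L = 1.0 / np.maximum(d[closest], 1e-12)
    a = L / L.sum()
    return a @ preds

# Usage with the local GPR machines from the previous sketch:
# y_hat = odc_predict(models, x_star, rho=1.0, k_prime=2, local_predict=local_gpr_mean)
```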
6 Experimental Results

Equal-Size Kmeans Step Experiment. We also tried another variant of EKmeans that we call Iterative Minimum-Distance Assignments EKmeans (IMDA-EKmeans). Note that the algorithm presented earlier in the paper is denoted Assign and Balance EKmeans (AB-EKmeans). The IMDA-EKmeans algorithm works as follows. We initialize a pool of unassigned points X̃ = X and initialize all clusters as empty. Given the means computed from the previous update step, we compute the distances d(x_i, μ_j) for all point/center pairs. We iteratively pick the minimum-distance pair (x_p, μ_l): d(x_p, μ_l) ≤ d(x_i, μ_j) ∀ x_i ∈ X̃ and |C_l| < N/K, and assign point x_p to cluster l. The point is then removed from the pool of unassigned points. If |C_l| = N/K, the cluster is marked as balanced and no longer considered. The process is repeated until the pool is empty; see Algorithm 2.

Input: X (N × d_X), {μ_i}_{i=1}^{K}
Output: labels
1- Create a matrix D ∈ R^{N×K}, where D[i, j] is the distance between the i-th point and the j-th cluster center.
2- Get the coordinates (i*, j*) of the smallest distance in D.
3- Remove the i*-th row from matrix D and mark the point as assigned to the j*-th cluster.
4- If the size of cluster j* reaches the ideal size (i.e., N/K), remove the j*-th column from matrix D.
5- Go to step 2 if there are still unassigned points.
Algorithm 2: Iterative Minimum-Distance Assignments (IMDA) k-means: Assignment Step

Table 3 presents the average cost over 10 runs of the IMDA-EKmeans and AB-EKmeans algorithms. We initialize both the AB-EKmeans and IMDA-EKmeans algorithms with the cluster centers computed by running standard k-means.
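The quantity compared in Table 3 is just the within-cluster sum of squared distances given the final assignment; a minimal sketch (ours):

```python
import numpy as np

def kmeans_cost(X, labels, centers):
    # J(C) = sum over all points of the squared Euclidean distance to the assigned center.
    return float(((X - centers[labels]) ** 2).sum())
```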
As illustrated in Table 3, AB-EKmeans outperforms IMDA-EKmeans in these experiments, which motivated us to use AB-EKmeans, presented earlier in the paper, rather than IMDA-EKmeans under our ODC prediction framework. Our interpretation of these results is that AB-EKmeans initializes the assignment with an assignment that minimizes the cost J(C) = min Σ_{j=1}^{K} Σ_{x_i ∈ C_j} d(x_i, μ_j) given the cluster centers, and then balances the clusters. In all the following experiments, we use AB-EKmeans due to its clearly superior performance compared to IMDA-EKmeans.

Table 3: J(C) of AB-kmeans and IMDA-kmeans on a dataset of 10,000 random 2D points, averaged over 10 runs

                   K=5       K=10      K=50
AB-kmeans          1077.3    540.241   105.505
IMDA-kmeans        1290.6    657.446   122.006
Error Reduction    16.53%    17.83%    13.52%

Fig. 5: Datasets, Representations, and Features

Datasets and Setup. We evaluated our framework on three human pose estimation datasets: Poser, HumanEva, and Human3.6M; see Fig. 5 for a summary of the setup and representation for each. The Poser dataset [1] consists of 1927 training and 418 test images. The image features correspond to a bag-of-words representation with silhouette-based shape-context features. The error is measured by the root mean-square error (in degrees), averaged over all joint angles, and is given by Error(ŷ, y*) = (1/54) Σ_{m=1}^{54} ‖(ŷ_m − y*_m) mod 360°‖, where ŷ ∈ R^54 is an estimated pose vector and y* ∈ R^54 is the true pose vector. The HumanEva dataset [20] contains synchronized multi-view video and MoCap data of 3 subjects performing multiple activities. We use the HOG features [7] (∈ R^270) proposed in [2]. We use the training and validation subsets of HumanEva-I and only utilize data from the 3 color cameras, with a total of 9630 image-pose frames for each camera. This is consistent with the experiments in [2] and [26]. We use half of the data for training and half for testing. Human3.6M [4] is a dataset of millions of human poses. We managed to evaluate our proposed ODC framework on six subjects (S1, S2, S6, S7, S8, S9) from it, which is ≈ 0.5 million poses. We split them into 67% training and 33% testing. HOG features are extracted from 4 image views for each pose and concatenated into a 3060-dimensional vector. The error for each pose, in both HumanEva (in mm) and Human3.6M (in cm), is measured as Error(ŷ, y*) = (1/L) Σ_{m=1}^{L} ‖ŷ_m − y*_m‖.

There are four control parameters in our ODC framework: M, p, t, and K′. Figure 6 shows our parameter analysis with different values of p, t, and K′ on the HumanEva dataset for GPR and TGP as local regression machines, where M = 800. Each sub-figure consists of six plots in two rows. The first row shows the results using the AB-EKmeans clustering scheme, while the second row shows the results for the RPC clustering scheme. Each row has three plots, one for K′ = 1, 2, and 3, respectively. Each plot shows the error for different values of t against p from 0 to 0.95; i.e., it shows how the overlap affects the performance for different values of t. Each plot shows, in its top caption, the minimum-overlap and maximum-overlap regression errors for t → 1. Looking at these plots, there are a number of observations:

(1) As t → 1 (the solid red line), the error tends to decrease as p, i.e., the overlap, increases.

(2) Comparing different K′, the behavior of the error indicates that combining multiple predictions (i.e., K′ = 2 and K′ = 3) gives poor performance, compared with K′ = 1, when the overlap is small. However, all of them, K′ = 1, 2, and 3, perform well as p → 1; see columns 2 and 3 in Fig. 6 and Fig. 8. This indicates consistent predictions of neighboring subdomains as p increases; see also Fig. 7 for a side-by-side comparison of different K′. The main reason behind this behavior is that, as p increases, the local models of the neighboring subdomains share more training points on their boundaries, which acts as shared constraints during the training of these models, making them more consistent in prediction.

(3) Comparing the first row to the second row in each sub-figure, it is not hard to see that our AB-EKmeans partitioning scheme consistently outperforms RPC [5]; e.g., the error in the case of GPR (M=800) is 47.48 mm for AB-EKmeans and 50.66 mm for RPC, and for TGP (M=800) it is 38.8 mm for AB-EKmeans and 39.8 mm for RPC. This gap is even bigger when using a smaller M; e.g., the error in the case of TGP (M=400) is 39.5 mm for EKmeans and 47.5 mm for RPC; see a detailed plot for M=400 in Fig. 9.

(4) TGP gives better predictions than GPR (i.e., 38 mm using TGP compared with 47 mm using GPR).

(5) As M increases, the prediction error decreases. For instance, when M = 200, the best-performance error for TGP increased to 43.88 mm, compared with 38.9 mm for M = 800.
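For reference, the two error measures used in the comparisons above and in Table 4 below are straightforward to compute; a minimal sketch (ours) follows, where the angular variant wraps each joint-angle difference into a proper angular distance, in the spirit of the Poser error definition.

```python
import numpy as np

def mean_joint_error(y_hat, y_true):
    # HumanEva / Human3.6M style error: mean Euclidean distance over the L joints,
    # with poses given as (L, 3) arrays of joint positions (mm or cm) -- an assumed layout.
    return np.linalg.norm(y_hat - y_true, axis=1).mean()

def mean_angular_error(y_hat, y_true):
    # Poser style error: mean absolute joint-angle difference (degrees), wrapped modulo 360.
    diff = np.abs(y_hat - y_true) % 360.0
    return np.minimum(diff, 360.0 - diff).mean()
```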
[Figure 6 plots omitted; only the panel titles and caption are recoverable. Each panel plots Error against p for t = 1.0, 1.5625, 2, 3, 4.]
(a) GPR-ODC (M=800). AB EKmeans panels: K'=1, Err=[51.60, 48.52]; K'=2, Err=[130.23, 47.66]; K'=3, Err=[173.13, 47.49]. RPC panels: K'=1, Err=[54.39, 50.93]; K'=2, Err=[67.75, 50.54]; K'=3, Err=[128.13, 50.67].
(b) TGP-ODC (M=800). AB EKmeans panels: K'=1, Err=[41.12, 38.79]; K'=2, Err=[123.75, 38.79]; K'=3, Err=[166.79, 38.85]. RPC panels: K'=1, Err=[41.62, 39.49]; K'=2, Err=[58.60, 39.44]; K'=3, Err=[122.05, 39.80].
Fig. 6: ODC framework parameter analysis of GPR and TGP on the HumanEva dataset
Table 4: Error & time on the Poser and HumanEva datasets (Intel Core i7, 2.6 GHz), M = 800

                                            Poser                                           HumanEva
                                            Error (deg)  Training time     Prediction time  Error (mm)  Training time      Prediction time
TGP   NN [2]                                5.43         -                 188.99 sec       38.1        -                  6364 sec
      ODC (p=0.9, t=1, K'=1)-EKmeans        5.40         (3.7+25.1) sec    16.5 sec         38.9        (2001+45.4) sec    298 sec
      ODC (p=0,   t=1, K'=1)-EKmeans        7.60         (3.9+1.33) sec    14.8 sec         41.87       (240+4.9) sec      257 sec
      ODC (p=0.9, t=1, K'=1)-RPC            5.60         (0.23+41.6) sec   15.8 sec         39.9        (0.45+49.1) sec    277 sec
      ODC (p=0,   t=1, K'=1)-RPC            7.70         (0.15+1.7) sec    13.89 sec        42.32       (0.19+5.2) sec     242 sec
GPR   NN                                    6.77         -                 24 sec           54.8        -                  618 sec
      ODC (p=0.9, t=1, K'=1)-EKmeans        6.27         (3.7+11.1) sec    0.56 sec         49.3        (2001+42.85) sec   79 sec
      ODC (p=0.0, t=1, K'=1)-EKmeans        7.54         (3.9+1.38) sec    0.35 sec         49.6        (240+6.4) sec      48 sec
      ODC (p=0.9, t=1, K'=1)-RPC            6.45         (0.23+17.3) sec   0.52 sec         52.8        (0.49+46.06) sec   64 sec
      ODC (p=0.0, t=1, K'=1)-RPC = [5]      7.46         (0.15+1.5) sec    0.27 sec         54.6        (0.26+4.6) sec     44 sec
      FIC [22]                              7.63         (-+20.63) sec     0.3106 sec       68.36       -                  102 sec
We found these observations to also be consistent on the Poser dataset.

This analysis led us to recommend choosing t close to 1 and a big overlap (p close to 1); K′ = 1 is sufficient for accurate prediction.

Having completed the performance analysis, which comprehensively interprets our parameters, we used the recommended settings to compare the performance with other methods and show the benefits of this framework. Figure 10 shows the speedup gained by retrieving the precomputed matrix inverses at test time, compared with computing them at test time as in the NN scheme. The figure shows a significant speedup from precomputing the local kernel machines.

Table 4 shows the error, training time, and prediction time of NN, FIC, and different variations of ODC on the Poser and HumanEva datasets. Training time is formatted as (t_c + t_p), where t_c is the clustering time and t_p is the remaining training time excluding clustering. As indicated in the top part of Table 4, TGP under our ODC framework can significantly speed up the prediction compared with the NN scheme