On Competitive Algorithms for Approximations of Top-k-Position Monitoring of Distributed Streams

Alexander Mäcker, Manuel Malatyali, Friedhelm Meyer auf der Heide
Heinz Nixdorf Institute & Computer Science Department, Paderborn University, Germany
{amaecker, malatya, fmadh}@hni.upb.de

arXiv:1601.04448v3 [cs.DS] 27 Oct 2016

Abstract

Consider the continuous distributed monitoring model in which n distributed nodes, receiving individual data streams, are connected to a designated server. The server is asked to continuously monitor a function defined over the values observed across all streams while minimizing the communication. We study a variant in which the server is equipped with a broadcast channel and is supposed to keep track of an approximation of the set of nodes currently observing the k largest values. Such an approximate set is exact except for some imprecision in an ε-neighborhood of the k-th largest value. This approximation of the Top-k-Position Monitoring Problem is of interest in cases where marginal changes (e.g. due to noise) in observed values can be ignored so that monitoring an approximation is sufficient and can reduce communication.

This paper extends our results from [6], where we have developed a filter-based online algorithm for the (exact) Top-k-Position Monitoring Problem. There we have presented a competitive analysis of our algorithm against an offline adversary that also is restricted to filter-based algorithms. Our new algorithms as well as their analyses use new methods. We analyze their competitiveness against adversaries that use both exact and approximate filter-based algorithms, and observe severe differences between the respective powers of these adversaries.

1 Introduction

We consider a setting in which n distributed nodes are connected to a central server. Each node continuously observes a data stream and the server is asked to keep track of the value of some function defined over all streams. In order to fulfill this task, nodes can communicate to the server, while the server can employ a broadcast channel to send a message to all nodes.
In an earlier paper [6], we introduced and studied a problem called Top-k-Position Monitoring in which, at any time t, the server is interested in monitoring the k nodes that are observing the largest values at this particular time t. As a motivating example, picture a scenario in which a central load balancer within a local cluster of webservers is interested in keeping track of those nodes which are facing the highest loads. We proposed an algorithm based on the notion of filters and analyzed its competitiveness with respect to an optimal filter-based offline algorithm. Filters are assigned by the server and tell the nodes when they can refrain from sending updates; this particularly reduces communication when observed values are "similar" to the values observed in the previous time steps. In this paper, we broaden the problem and investigate the monitoring of an approximation of the Top-k-Positions. We study the problem of ε-Top-k-Position Monitoring, in which the server is supposed to maintain a subset of k nodes such that all nodes observing "clearly larger" values than the node which observed the k-th largest value are within this set and no node observing a "clearly smaller" value belongs to this set. Here, smaller/larger is meant to be understood with respect to ε and the k-th largest value observed. A detailed definition is given in Sect. 2. Relaxing the problem in this direction can reduce communication while, in many cases, marginal or insignificant changes (e.g. due to noise) in observed values can be ignored and justify the sufficiency of an approximation. Examples are situations where lots of nodes observe values oscillating around the k-th largest value and where this observation is not of any qualitative relevance for the server.

(This work was partially supported by the German Research Foundation (DFG) within the Priority Program "Algorithms for Big Data" (SPP 1736) and by the EU within FET project MULTIPLEX under contract no. 317532.)

We design and analyze algorithms for ε-Top-k-Position Monitoring and, although we use these very tools of filters and competitive analysis [6], the imprecision/approximation requires fundamentally different online strategies for defining filters in order to obtain efficient solutions.

1.1 Our Contribution

In this paper we investigate a class of algorithms that are based on using filters and study their efficiency in terms of competitive analysis.

As a first technical contribution we analyze an algorithm (Sect. 3) which allows the server to decide the logical disjunction of the (binary) values observed by the distributed nodes. It uses a logarithmic number of rounds and a constant number of messages on expectation. As a by-product, using this algorithm, the result on the competitiveness of the filter-based online algorithm in [6] can be reduced from $O(k \log n + \log \Delta \log n)$ to $O(k \log n + \log \Delta)$, for observed values from $\{0, 1, \ldots, \Delta\}$.

Second, we also propose an online algorithm (Sect. 4) that is allowed to introduce an error of $\varepsilon \in (0, 1/2]$ in the output and compare it to an offline algorithm that solves the exact Top-k-Position Monitoring problem. We show that this algorithm is $O(k \log n + \log \log \Delta + \log \frac{1}{\varepsilon})$-competitive. Note that this imprecision allows us to bring the $\log \Delta$ in the upper bound down to $\log \log \Delta$ for any constant ε.

We also investigate the setting in which the offline algorithm, too, is allowed to have an error in the output (Sect. 5). We first show that these results are not comparable to previous results; we prove a lower bound on the competitiveness of $\Omega(n/k)$. Our third and main technical contribution is an algorithm with a competitiveness of $O(n^2 \log(\varepsilon\Delta) + n \log^2(\varepsilon\Delta) + \log \log \Delta + \log \frac{1}{\varepsilon})$ if the online and the offline algorithm may use an error of ε.

However, if we slightly decrease the allowed error for the offline algorithm, the lower bound on the competitiveness of $\Omega(n/k)$ still holds, while the upper bound is reduced to $O(n + k \log n + \log \log \Delta + \log \frac{1}{\varepsilon})$.

1.2 Related Work

Efficient computation of functions on big datasets in terms of streams has turned out to be an important topic of research with applications in network traffic analysis, text mining or databases (e.g. [9] and [7]).
The Continuous Monitoring Model, which we consider in this paper, was introduced by Cormode et al. [2] to model systems comprised of a server and n nodes observing distributed data streams. The primary goal addressed within this model is the continuous computation of a function depending on the information available across all n data streams up to the current time at a dedicated server. Subject to this main concern, the minimization of the overall number of messages exchanged between the nodes and the server usually determines the efficiency of a streaming algorithm. We refer to this model and enhance it by a broadcast channel as proposed by Cormode et al. in [3].

An important class of problems investigated in the literature are threshold computations where the server is supposed to decide whether the current function value has reached some given threshold τ. For monotone functions such as monitoring the number of distinct values or the sum over all values, exact characterizations in the deterministic case are known [2, 3]. However, non-monotone functions, e.g., the entropy [1], turned out to be much more complex to handle.

A general approach to reduce the communication when monitoring distributed streams is proposed in [12]. Zhang et al. introduce the notion of filters, which are also an integral part of our algorithms. They consider the problem of continuous skyline maintenance, in which a server is supposed to continuously maintain the skyline of dynamic objects. As they aim at minimizing the communication overhead between the server and the objects, they use a filter method that helps in avoiding the transmission of updates in case these updates cannot influence the skyline. More precisely, the objects are points of a d-dimensional space and filters are hyper-rectangles assigned by the server to the objects such that as long as these points are within the assigned hyper-rectangle, updates need not be communicated to the server.
Despite their online nature, streaming algorithms have so far barely been studied in terms of competitiveness. In their work [11], Yi and Zhang were the first to study streaming algorithms with respect to their competitiveness and recently this approach was also applied in a few papers ([5, 10, 6, 4]). In their model [11], there is one node and one server and the goal is to keep the server informed about the current value of a function $f : \mathbb{Z}^+ \to \mathbb{Z}^d$ that is observed by the node and changes its value over time, while minimizing the number of messages. Yi and Zhang present an algorithm that is $O(d^2 \log(d \cdot \delta))$-competitive if the last value received by the server might deviate by δ from the current value of f. Recently, Tang et al. [10] extended this work by Yi and Zhang for the two-party setting to the distributed case. They consider a model in which the server is supposed to track the current value of a (one-dimensional) function that is defined over a set of n functions observed at the distributed nodes. Among other things, they propose an algorithm for the case of a tree-topology in which the distributed nodes are the leaves of a tree connecting them to the server. They show that on any instance I their algorithm incurs communication cost that is by a factor of $O(h_{\max} \log \delta)$, where $h_{\max}$ represents the maximum length of a path in the tree, larger than those of the best solution obtained by an online algorithm on I.

Following the idea of studying competitive algorithms for monitoring streams and the notion of filters, Lam et al. [5] present an algorithm for online dominance tracking of distributed streams. In this problem a server always has to be informed about the dominance relationship between n distributed nodes each observing an online stream of d-dimensional values. Their algorithm is based on the idea of filters and they show that a mid-point strategy, which sets filters to be the mid-point between neighboring nodes, is $O(d \log U)$-competitive with respect to the number of messages sent in comparison to an offline algorithm that sets filters optimally.
While we loosely motivated our search for approximate solutions by noise in the introduction, in other problems noise is a major concern and explicitly addressed. For example, consider streaming algorithms for estimating statistical parameters like frequency moments [13]. In such problems, certain elements from the universe may appear in different forms due to noise and thus, should actually be treated as the same element.

2 Preliminaries

In our setting there are n distributed nodes $\{1, \ldots, n\}$. Each node i receives a continuous data stream $(v_i^1, v_i^2, v_i^3, \ldots)$, which can be exclusively observed by node i. At time t, $v_i^t \in \mathbb{N}$ is observed and no $v_i^{t'}$, $t' > t$, is known. We omit the index t if it is clear from the context.

Following the model in [3], we allow that between any two consecutive time steps, a communication protocol exchanging messages between the server and the nodes may take place. The communication protocol is allowed to use a number of rounds which is polylogarithmic in n and $\max_{1 \le i \le n}(v_i^t)$. The nodes can communicate to the server while the server can communicate to single nodes or utilize a broadcast channel to communicate a message that is received by all nodes at the same time. These communication methods incur unit communication cost per message, we assume instant delivery, and a message at time t is allowed to have a size at most logarithmic in n and $\max_{1 \le i \le n}(v_i^t)$.

Problem Description. Consider the Top-k-Position Monitoring problem [6], in which the server is asked to keep track of the set of nodes currently holding the k largest values. We relax this definition and study an approximate variant of the problem in which this set is exact except for nodes in a small neighborhood around the k-th largest value. We denote by $\pi(k,t)$ the node which observes the k-th largest value at time t and denote by top-k $:= \{\pi(i,t) : i \in \{1, \ldots, k\}\}$ the nodes observing the k largest values.
Given an error $0 < \varepsilon < 1$, for a time t we denote by $E(t) := \left(\frac{1}{1-\varepsilon} v^t_{\pi(k,t)}, \infty\right]$ the range of values that are clearly larger than the k-th largest value and by $A(t) := \left[(1-\varepsilon) v^t_{\pi(k,t)}, \frac{1}{1-\varepsilon} v^t_{\pi(k,t)}\right]$ the ε-neighborhood around the k-th largest value. Furthermore, we denote by $K(t) := \{i : v_i^t \in A(t)\}$ the nodes in the ε-neighborhood around the k-th largest value. Then, at any time t, the server is supposed to know the nodes $F(t) = F_E(t) \cup F_A(t) = \{i_1, \ldots, i_k\}$ according to the following properties:

1. $F_E(t) = \{i : v_i^t \in E(t)\}$ and
2. $F_A(t) \subseteq K(t) = \{i : v_i^t \in A(t)\}$, such that $|F_A(t)| = k - |F_E(t)|$ holds.

Denote by ∆ the maximal value observed by some node (which may not be known beforehand). We use $F_1 = F(t)$ if t is clear from the context, $F_2 = \{1, \ldots, n\} \setminus F(t)$, and call $F^*$ the output of an optimal offline algorithm. If the k-th and the (k+1)-st largest value differ by more than $\varepsilon v^t_{\pi(k,t)}$, F(t) coincides with the set in the (exact) Top-k-Position Monitoring problem and hence, F(t) is unique. We denote by $\sigma(t) := |K(t)|$ the number of nodes at time t which are in the ε-neighborhood of the k-th largest value and $\sigma := \max_t \sigma(t)$. Note that $|K(t)| = 1$ implies that F(t) is unique. Furthermore, for solving the exact Top-k-Position Monitoring problem we assume that the values are distinct (at least by using the nodes' identifiers to break ties in case the same value is observed by several nodes).

2.1 Filter-Based Algorithms & Competitive Analysis

A set of filters is a collection of intervals, one assigned to each node, such that as long as the observed values at each node are within its respective interval, the output F(t) need not change. For the problem at hand, this general idea of filters translates to the following definition.

Definition 2.1. [6] For a fixed time t, a set of filters is defined as an n-tuple of intervals $(F_1^t, \ldots, F_n^t)$, $F_i \subseteq \mathbb{N} \cup \{\infty\}$ and $v_i \in F_i$, such that as long as the value of node i only changes within its interval (i.e. $v_i \in F_i$), the value of the output F need not change.

Observe that each pair of filters $(F_i, F_j)$ of nodes $i \in F(t)$ and $j \notin F(t)$ must be disjoint except for a small overlap. This observation can be stated formally as follows.

Observation 2.2.
For a fixed time t, an n-tuple of intervals is a set of filters if and only if for all pairs $i \in F(t)$ and $j \notin F(t)$ the following holds: $v_i \in F_i = [\ell_i, u_i]$, $v_j \in F_j = [\ell_j, u_j]$ and $\ell_i \ge (1-\varepsilon) u_j$.

In our model, we assume that nodes are assigned such filters by the server. If a node observes a value that is larger than the upper bound of its filter, we say the node violates its filter from below. A violation from above is defined analogously. If such a violation occurs, the node may report it and its current value to the server. In contrast to [6], we allow the server to assign "invalid" filters, i.e., there are affected nodes that directly observe a filter-violation. However, for such an algorithm to be correct, we demand that the intervals assigned to the nodes at the end of the protocol at time t and thus, before observations at time t+1, constitute a (valid) set of filters. We call such an algorithm filter-based. Note that the fact that we allow invalid filters (in contrast to [6]) simplifies the presentation of the algorithms in the following. However, using a constant overhead the protocols can be changed such that only (valid) filters are sent to the nodes.

Competitiveness. To analyze the quality of our online algorithms, we use analysis based on competitiveness and compare the communication induced by the algorithms to that of an adversary's offline algorithm.

Similar to [5] and [6], we consider adversaries that are restricted to use filter-based offline algorithms and hence, OPT is lower bounded by the number of filter updates. However, we compare our algorithms against several adversaries which differ in terms of whether their offline algorithm solves the exact Top-k-Position Monitoring Problem or ε-Top-k-Position Monitoring. The adversaries are assumed to be adaptive, i.e., values observed by a node are given by an adversary who knows the algorithm's code, the current state of each node and the server and the results of random experiments.

An online algorithm is said to have a competitiveness of c if the number of messages is at most by a factor of c larger than that of the adversary's offline algorithm.
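
To make the output condition of the problem description and the filter condition of Observation 2.2 concrete, the following Python sketch checks both for a single time step. It is only an illustration under our own assumptions: nodes are indexed from 0, a filter is a pair (lower, upper), and the function names `neighborhood_sets`, `is_valid_output` and `is_valid_filter_set` are hypothetical and do not appear in the paper. Passing ε as a `Fraction` keeps the comparisons exact.

```python
from fractions import Fraction
from math import inf


def neighborhood_sets(values, k, eps):
    """Return (F_E, K) for one time step.

    F_E: nodes whose value lies in E(t), i.e. clearly above the k-th largest value.
    K:   nodes whose value lies in A(t), the eps-neighborhood of the k-th largest value.
    """
    vk = sorted(values, reverse=True)[k - 1]        # k-th largest value
    lower, upper = (1 - eps) * vk, vk / (1 - eps)   # endpoints of A(t)
    f_e = {i for i, v in enumerate(values) if v > upper}
    k_set = {i for i, v in enumerate(values) if lower <= v <= upper}
    return f_e, k_set


def is_valid_output(values, k, eps, output):
    """Check properties 1 and 2: F = F_E plus a subset of K, with |F| = k."""
    f_e, k_set = neighborhood_sets(values, k, eps)
    output = set(output)
    return len(output) == k and f_e <= output and (output - f_e) <= k_set


def is_valid_filter_set(filters, values, output, eps):
    """Check Observation 2.2 for filters given as (lower, upper) pairs."""
    nodes = range(len(filters))
    # Every node must currently be inside its own filter.
    if any(not (filters[i][0] <= values[i] <= filters[i][1]) for i in nodes):
        return False
    # Lower end of every output node vs. upper end of every other node.
    return all(filters[i][0] >= (1 - eps) * filters[j][1]
               for i in nodes if i in output
               for j in nodes if j not in output)
```

For example, with values [100, 95, 91, 50], k = 2 and ε = 1/10, no node is clearly larger than the second largest value 95, and the first three nodes all lie in its ε-neighborhood, so any two of them form a valid output.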
2.2 Observations and Lemmas

Define for some fixed set $S \subseteq \{1, \ldots, n\}$ the minimum of the values observed by nodes in S during a time period $[t, t']$ as $\mathrm{MIN}_S(t,t')$ and the maximum of the values observed during the same period as $\mathrm{MAX}_S(t,t')$.

Definition 2.3. Let $t, t'$ be given times with $t' \ge t$. For a subset of nodes $S \subseteq \{1, \ldots, n\}$ the value $\mathrm{MAX}_S(t,t') := \max_{t \le t^* \le t'} \max_{i \in S}(v_i^{t^*})$ and, analogously, $\mathrm{MIN}_S(t,t')$ are defined.

Observe that it is sufficient for an optimal offline algorithm to only make use of two different filters $F_1$ and $F_2$.

Proposition 2.4. Without loss of generality, we may assume that an optimal offline algorithm only uses two different filters at any time.

Proof. Let $[t, t']$ be an interval during which OPT does not communicate. We fix its output $F_1^*$ and define $F_2^* := \{1, \ldots, n\} \setminus F_1^*$. If OPT only uses two different filters throughout the interval, we are done. Otherwise, using $F_1^*$ as output throughout the interval $[t, t']$ and filters $F_1 = [\mathrm{MIN}_{F_1^*}(t,t'), \infty]$ and $F_2 = [0, \mathrm{MAX}_{F_2^*}(t,t')]$, which must be feasible due to the assumption that OPT originally assigned filters that lead to no communication, leads to no communication within the considered interval.

The following lemma generalizes a lemma in [6] to ε-Top-k-Position Monitoring. Assuming the optimal offline algorithm did not change the set of filters during a time period $[t, t']$, the minimum value observed by nodes in $F_1^*$ can only be slightly smaller than the maximum value observed by nodes in $F_2^*$.

Lemma 2.5. If OPT uses the same set of filters $F_1, F_2$ during $[t, t']$, then it holds $\mathrm{MIN}_{F_1^*}(t,t') \ge (1-\varepsilon)\,\mathrm{MAX}_{F_2^*}(t,t')$.

Proof. Assume to the contrary that OPT uses the same set of filters throughout the interval $[t, t']$ and outputs $F_1^*$, but $\mathrm{MIN}_{F_1^*}(t,t') < (1-\varepsilon)\,\mathrm{MAX}_{F_2^*}(t,t')$ holds. Then there are two nodes, $i \in F_1^*$ and $j \notin F_1^*$, and two times $t_1, t_2 \in [t, t']$, such that $v_i^{t_1} = \mathrm{MIN}_{F_1^*}(t,t')$ and $v_j^{t_2} = \mathrm{MAX}_{F_2^*}(t,t')$. Due to the definition of a set of filters and the fact that OPT has not communicated during $[t, t']$, OPT must have set the filter for node i to $[s_1, \infty]$, $s_1 \le v_i^{t_1}$, and for node j to $[-\infty, s_2]$, $s_2 \ge v_j^{t_2}$. Hence $s_1 \le v_i^{t_1} < (1-\varepsilon) v_j^{t_2} \le (1-\varepsilon) s_2$, which is a contradiction to the definition of a set of filters and Observation 2.2.
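
The quantities of Definition 2.3 and the necessary condition of Lemma 2.5 can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: `streams[i][t]` holds the value of node i at time t, nodes and times are indexed from 0, and the helper names `min_over`, `max_over` and `two_filters_feasible` are hypothetical.

```python
from fractions import Fraction


def min_over(streams, nodes, t0, t1):
    """MIN_S(t0, t1): smallest value seen by any node in `nodes` during [t0, t1]."""
    return min(streams[i][t] for i in nodes for t in range(t0, t1 + 1))


def max_over(streams, nodes, t0, t1):
    """MAX_S(t0, t1): largest value seen by any node in `nodes` during [t0, t1]."""
    return max(streams[i][t] for i in nodes for t in range(t0, t1 + 1))


def two_filters_feasible(streams, f1, eps, t0, t1):
    """Condition of Lemma 2.5 for a fixed output f1 kept throughout [t0, t1]."""
    f2 = set(range(len(streams))) - set(f1)
    return min_over(streams, f1, t0, t1) >= (1 - eps) * max_over(streams, f2, t0, t1)
```

If the check fails for an interval, Lemma 2.5 implies that an offline algorithm keeping the output `f1` must have changed its filters, and therefore communicated, somewhere within that interval.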
At last a result from [6] is restated in order to calculate the (exact) top-k set for one time step.

Lemma 2.6. [6] There is an algorithm that computes the node holding the largest value using $O(\log n)$ messages on expectation.

3 Auxiliary Problem: Existence

In our competitive algorithms designed and analyzed in the following, we will frequently make use of a protocol for a subproblem which we call EXISTENCE: Assume all nodes observe only binary values, i.e. $\forall i \in \{1, \ldots, n\} : v_i \in \{0, 1\}$. The server is asked to decide the logical disjunction for one fixed time step t.

It is known that for n nodes each holding a bit vector of length m the communication complexity to decide the bit-wise disjunction is $\Omega(nm)$ in the server model [8]. Observe that in our model 1 message is sufficient to decide the problem assuming the nodes have a unique identifier between 1 and n and the protocol uses n rounds.

We prove that it is sufficient to use a constant number of messages on expectation and a logarithmic number of rounds. Note that the algorithm in the following lemma is a Las Vegas algorithm, i.e. the algorithm is always correct and the number of messages needed is based on a random process.

Lemma 3.1. There is an algorithm EXISTENCEPROTOCOL that uses $O(1)$ messages on expectation to solve the problem EXISTENCE.

Proof. Initially all nodes are active. All nodes i deactivate themselves, if $v_i = 0$ holds, that is, these nodes do not take part in the following process. In each round $r = 0, 1, \ldots, \log n$ the active nodes send messages independently at random with probability $p_r := 2^r/n$. Consequently, if the last round $\gamma = \log n$ is reached, all active nodes i with $v_i = 1$ send a message with probability 1. As soon as at least one message was sent or the γ-th round ends, the protocol is terminated and the server can decide EXISTENCE.

Next, we analyze the above protocol and show that the bound on the expected number of messages is fulfilled. Let X be the random variable for the number of messages used by the protocol and b be the number of nodes i with $v_i = 1$. Note that the expected number of messages sent in round r is $b \cdot p_r$ and the probability that no node has sent a message before is $\prod_{k=0}^{r-1}(1-p_k)^b$.
Observing that the function $f(r) = b \cdot p_r (1-p_{r-1})^b$ has only one extreme point and $0 \le f(r) < 2$ for $r \in [0, \log n]$, it is easy to verify that the series can be upper bounded by simple integration:

\begin{align*}
E[X] &\le \frac{b}{n} + \sum_{r=1}^{\log n} \frac{b\,2^r}{n} \prod_{k=0}^{r-1} \left(1 - \frac{2^k}{n}\right)^b \\
&\le 1 + \sum_{r=1}^{\log n} \frac{b\,2^r}{n} \left(1 - \frac{2^{r-1}}{n}\right)^b \\
&\le 1 + \int_0^{\log n} \frac{b\,2^r}{n} \left(1 - \frac{2^{r-1}}{n}\right)^b dr + 2 \\
&\le 3 + \left[ \frac{b}{(b+1)\,n \ln(2)} \,(2^r - 2n) \left(1 - \frac{2^{r-1}}{n}\right)^b \right]_0^{\log n} \\
&\le 3 + \frac{1}{n \ln(2)} \cdot \left( (2^{\log n} - 2n) \left(1 - \frac{2^{\log n - 1}}{n}\right)^b + 2n \left(1 - \frac{2^{0-1}}{n}\right)^b \right) \\
&\le 3 + \frac{1}{n \ln(2)} \left[ (n - 2n) \left(1 - \frac{1}{2}\right)^b + 2n \left(1 - \frac{1}{2n}\right)^b \right] \\
&\le 3 + \frac{1}{n \ln(2)} \left[ (-n)\, \frac{1}{2^b} + 2n \left(1 - \frac{1}{2n}\right)^b \right] \\
&\le 3 + \frac{1}{\ln(2)} \left( 2 \left(1 - \frac{1}{2n}\right)^b - \frac{1}{2^b} \right) \\
&\le 3 + \frac{1}{\ln(2)} \left( 2 - \frac{1}{2^b} \right) \le 3 + \frac{2}{\ln(2)} \le 6 .
\end{align*}

This protocol can be used for a variety of subtasks, e.g. validating that all nodes are within their filters, identifying that there is some filter-violation or whether there are nodes that have a higher value than a certain threshold.

Corollary 3.2. Given a time t. There is an algorithm which decides whether there are nodes which observed a filter-violation using $O(1)$ messages on expectation.

Proof. For the distributed nodes to report filter-violations we use an approach based on the EXISTENCEPROTOCOL to reduce the number of messages sent in case several nodes observe filter-violations at the same time. The nodes apply the EXISTENCEPROTOCOL as follows: Each node that is still within its filter applies the protocol using a 0 as its value and each node that observes a filter-violation uses a 1. Note that by this approach the server definitely gets informed if there is some filter-violation and otherwise no communication takes place.

The EXISTENCEPROTOCOL can be used in combination with the relaxed definition of filters to strengthen the result for Top-k-Position Monitoring from $O(k \log n + \log \Delta \log n)$ to $O(k \log n + \log \Delta)$. We first introduce a generic framework and then show how to achieve this bound.
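
The EXISTENCEPROTOCOL from the proof of Lemma 3.1 can be simulated centrally as in the following Python sketch. It is an illustration under our own assumptions rather than the protocol's distributed implementation: the function name `existence_protocol` and its return triple are hypothetical, the random source `rng` is passed in explicitly, and the last round is taken as $\lceil \log_2 n \rceil$ so that the sketch also works when n is not a power of two.

```python
import math
import random


def existence_protocol(bits, rng):
    """Simulate one run; return (disjunction, messages_sent, last_round_used)."""
    n = len(bits)
    active = [i for i, b in enumerate(bits) if b == 1]  # nodes with v_i = 0 deactivate
    last_round = math.ceil(math.log2(n))
    for r in range(last_round + 1):
        p = min(1.0, 2 ** r / n)                        # sending probability p_r = 2^r / n
        # Each active node sends independently with probability p.
        senders = sum(1 for _ in active if rng.random() < p)
        if senders > 0:
            return True, senders, r                     # server hears a message and stops
    # In the last round p = 1, so silence means that no node holds a 1.
    return False, 0, last_round
```

Averaging the second component over many runs, for instance with all bits set to 1, gives an empirical estimate that can be compared with the bound $E[X] \le 6$ derived above.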
A generic approach. Throughout the paper, several of our algorithms feature similar structural properties in the sense that they can be defined within a common framework. Hence, we now define a generic approach to describe the calculation and communication of filters, which we then refine later. The general idea is to only use two different filters that are basically defined by one value separating nodes in F(t) from the remaining nodes. Whenever a filter-violation is reported, this value is recalculated and used to set filters properly.

The approach proceeds in rounds. In the first round we define an initial interval $L_0$. In the r-th round, based on interval $L_r$, we compute a value m that is broadcast and is used to set the filters to $[0, m]$ and $[m, \infty]$. As soon as node i reports a filter-violation observing the value $v_i$, the server redefines the interval $L_{r+1} := L_r \cap [0, v_i]$ if the violation is from above and $L_{r+1} := L_r \cap [v_i, \infty]$ otherwise. The approach finishes as soon as some (predefined) condition is satisfied.

Corollary 3.3. There is an algorithm that is $O(k \log n + \log \Delta)$-competitive for (exact) Top-k-Position Monitoring.

Proof. Our algorithm proceeds in phases that are designed such that we can show that an optimal algorithm needs to communicate at least once during a phase and additionally, we can upper bound the number of messages sent by the online algorithm according to the bound on the competitiveness.

We apply the generic approach with parameters described as follows. The initial interval is defined as $L_0 := [\ell, u]$, where $\ell = v^t_{\pi(k+1,t)}$, $u = v^t_{\pi(k,t)}$. This can be done by determining the values of the nodes holding the k+1 largest values using $O(k \log n)$ messages on expectation. In the r-th round, based on interval $L_r$, we compute the midpoint of $L_r$ as the value m which is broadcast and used to set the filters. As soon as a filter-violation is reported, the generic framework is applied. In case $L_r$ is empty the phase ends.

Note that the distance between u and ℓ is at least halved every time a node violates its filter, leading to $O(\log(u_0 - \ell_0)) = O(\log \Delta)$ messages on expectation per phase. Also, it is not hard to see that during a phase OPT has communicated at least once and hence, we obtain the claimed bound on the competitiveness.
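
One phase of the midpoint strategy from the proof of Corollary 3.3 can be sketched as a centralized Python simulation. The sketch rests on our own assumptions: `stream[t]` is the list of all node values at time t, `top` is the fixed output set, the reporting node is simply the first violating node in index order (the paper selects a reporter via the EXISTENCEPROTOCOL instead), and the function name `midpoint_phase` is hypothetical.

```python
def midpoint_phase(stream, top, lo, hi):
    """Run one phase; return (time at which L became empty or None, reported violations)."""
    violations = 0
    for t, values in enumerate(stream):
        while lo <= hi:
            m = (lo + hi) / 2  # broadcast value: filters are [m, inf] for top, [0, m] otherwise
            violator = next(
                (i for i, v in enumerate(values)
                 if (i in top and v < m) or (i not in top and v > m)),
                None,
            )
            if violator is None:
                break                               # all nodes are inside their filters
            violations += 1
            if violator in top:
                hi = min(hi, values[violator])      # violation from above: L := L ∩ [0, v_i]
            else:
                lo = max(lo, values[violator])      # violation from below: L := L ∩ [v_i, inf]
        if lo > hi:
            return t, violations                    # L is empty, the phase ends
    return None, violations
```

Since every reported value lies strictly on the far side of the current midpoint, each violation removes more than half of L, which is the halving argument used in the proof.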
4 Competing against an Exact Adversary

In this section, we propose an algorithm based on the strategy to choose the nodes holding the k largest values as an output and use this set as long as it is feasible. It will turn out that this algorithm is suitable in two scenarios: First, it performs well against an adversary who solves the Top-k-Position Monitoring problem (cf. Theorem 4.5); second, we can use it in situations in which an adversary is allowed to introduce some error but cannot exploit this error because the observed data leads to a unique output (cf. Sect. 5).

In particular, we develop an algorithm started at t that computes the output set $F_1 := F(t)$ using the protocol from Lemma 2.6 and for all consecutive times witnesses whether $F_1$ is correct or not. Recall that while computing the set F(t) from scratch (cf. Lemma 2.6) is expensive in terms of communication, witnessing its correctness in consecutive rounds is cheap since it suffices to observe filter-violations (cf. Definition 2.1 and Corollary 3.2).

The algorithm tries to find a value m which partitions $F_1$ from $F_2$ according to the generic framework, such that for all nodes $i \in F_1$ it holds $v_i \ge m$ and for all nodes $i \in F_2$ it holds $v_i \le m$. We call such a value m a certificate.

Guessing OPT's Filters. In the following we consider a time period $[t, t'']$ during which the output F(t) need not change. Consider a time $t' \in [t, t'']$. The online strategy to choose a certificate at this time depends on the size of some interval $L^*$ from which an offline algorithm must have chosen the lower bound $\ell^*$ of the upper filter at time t such that the filters are valid throughout $[t, t']$. The algorithm TOP-K-PROTOCOL keeps track of (an approximation of) $L^*$ at time $t'$ denoted by $L = [\ell, u]$ for which $L^* \subseteq L$ holds. The online algorithm tries to improve the guess where OPT must have set filters by gradually reducing the size of interval L (while maintaining the invariant $L^* \subseteq L$) at times it observes filter-violations.

Initially u and ℓ are defined as follows: $u := v^t_{\pi(k,t)} = \mathrm{MIN}_{F_1}(t,t)$ and $\ell := v^t_{\pi(k+1,t)} = \mathrm{MAX}_{F_2}(t,t)$ and are redefined over time.
Although defining the certificate as the midpoint of $L = [\ell, u]$ intuitively seems to be the best way to choose m, the algorithm is based on four consecutive phases, each defining a different strategy. In detail, the first phase is executed as long as the property

$\log \log u > \log \log \ell + 1$   (P1)

holds. In this phase, m is defined as $\ell + 2^{2^r}$ after r filter-violations observed. If the property

$\log \log u \le \log \log \ell + 1 \;\wedge\; u > 4\ell$   (P2)

holds, the value m is chosen to be $2^{mid}$ where mid is the midpoint of $[\log \ell, \log u]$. Observe that $2^{mid} \in L = [\ell, u]$ holds. The third phase is executed if property

$u \le 4\ell \;\wedge\; u > \frac{1}{1-\varepsilon}\,\ell$   (P3)

holds and employs the intuitive approach of choosing m as the midpoint of L. The last phase contains the remaining case of

$u \le \frac{1}{1-\varepsilon}\,\ell$   (P4)

and is simply executed until the next filter-violation is observed using the filters $F_1 = [\ell, \infty]$ and $F_2 = [0, u]$.

In the following we propose three algorithms $A_1$, $A_2$, and $A_3$ which are executed if the respective property holds and analyze the correctness and the number of messages needed.

Lemma 4.1. Given time t, an output F(t), and an interval $L = [\ell, u]$ for which (P1) holds, there is an algorithm $A_1$ that witnesses the correctness of F(t) until a time $t'$ at which it outputs $L' = [\ell', u']$ for which (P1) does not hold. The algorithm uses $O(\log \log \Delta)$ messages on expectation.

Proof. The algorithm $A_1$ applies the generic framework and defines the value m, which the server broadcasts, as $m := \ell_0 + 2^{2^r}$, where $\ell_0$ is the initial value of ℓ. If $\log \log u' - \log \log \ell' \le 1$ holds, the algorithm terminates and outputs $L' = [\ell', u']$ with $\ell'$ and $u'$ defined as the redefinition of ℓ and u respectively.

To analyze the number of messages needed and express it in terms of ∆, observe that in the worst case the server only observes filter-violations from nodes $i \in F_2$. In case there is a filter-violation from above, i.e. a node $i \in F_1$ reports a filter-violation, the condition $\log \log u' - \log \log \ell' \le 1$ holds. At the latest in round $r = \log \log(u - \ell)$, which is by definition upper bounded by $\log \log \Delta$, the algorithm terminates.

If F(t) is not valid at time $t'$, there are nodes $i_1 \in F_1$, $i_2 \in F_2$ and time points $t_1, t_2$ ($t_1 = t' \vee t_2 = t'$) for which $v_{i_1}^{t_1} < v_{i_2}^{t_2}$ holds.
Thus, $A_1$ observed a filter-violation by either $i_1$ or $i_2$ followed by a sequence alternating between filter-violations and filter-updates. At some point (but still at time $t'$) $\log \log u' - \log \log \ell' \le 1$ holds and the algorithm outputs $(\ell', u')$, proving $A_1$'s correctness for time $t'$.

Lemma 4.2. For a given F(t) and a given interval $L = [\ell, u]$ for which (P2) holds, there is an algorithm $A_2$ that witnesses the correctness of F(t) until a time $t'$ at which it outputs $L' = [\ell', u']$ for which (P2) does not hold. The algorithm uses $O(1)$ messages on expectation.

Proof. We apply the generic approach and choose the value m to be broadcast as $2^{mid}$, where mid is the midpoint of $[\log \ell, \log u]$.

To analyze the number of messages needed, bound $L = [\ell, u]$ in terms of values that are double exponential in 2. To this end, let $a \in \mathbb{N}$ be the largest number such that $\ell \ge 2^{2^a}$ holds. Now observe that since (P2) holds, $u \le 2^{2^{a+2}}$ follows. Since the algorithm chooses the midpoint of the interval $[\log \ell, \log u]$ in order to get m and halves this interval after every filter-violation, one can upper bound the number of rounds by analyzing how often the interval $[\log \ell, \log u]$ gets halved. That is, $[\log \ell, \log u] \subseteq \left[\log\left(2^{2^a}\right), \log\left(2^{2^{a+3}}\right)\right] = [2^a, 8 \cdot 2^a]$ can be halved at most a constant number of times until it contains only one value, which implies that $4 \cdot \ell > u$ holds.

Lemma 4.3. For a given F(t) and a given interval $L = [\ell, u]$ for which (P3) holds, there is an algorithm $A_3$ that witnesses the correctness of F(t) until a time $t'$ at which it outputs $L' = [\ell', u']$ for which (P3) does not hold. The algorithm uses $O(\log \frac{1}{\varepsilon})$ messages on expectation.

Proof. The algorithm applies the generic framework and uses the midpoint strategy starting with the interval $L_0 := [\ell, u]$. Observe that it takes at most $O(\log \frac{1}{\varepsilon})$ redefinitions of L to have the final size, no matter whether the algorithm observes only filter-violations from nodes $i \in F(t)$ or $i \notin F(t)$. This together with the use of the EXISTENCEPROTOCOL for handling filter-violations yields the needed number of messages on expectation. The correctness follows similarly as shown for Lemma 4.1.
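
The case distinction (P1) to (P4) and the corresponding choice of the certificate m can be sketched in Python as follows. This is an illustration under our own assumptions: logarithms are taken to base 2, $1 < \ell \le u$ is assumed so that $\log \log \ell$ is defined, and the names `phase` and `certificate` as well as the parameters `lo0` (the initial value $\ell_0$) and `r` (the number of violations seen so far in phase one) are hypothetical. For $\ell > 1$, (P1) is equivalent to $\log u > 2 \log \ell$, i.e. $u > \ell^2$, which lets the sketch avoid nested floating-point logarithms.

```python
import math


def phase(lo, hi, eps):
    """Classify the interval L = [lo, hi] (with 1 < lo <= hi) as P1, P2, P3 or P4."""
    if hi > lo * lo:            # (P1): log log u > log log l + 1  <=>  u > l^2
        return "P1"
    if hi > 4 * lo:             # (P2): not (P1) and u > 4 l
        return "P2"
    if hi * (1 - eps) > lo:     # (P3): u <= 4 l and u > l / (1 - eps)
        return "P3"
    return "P4"                 # (P4): u <= l / (1 - eps)


def certificate(lo, hi, eps, lo0=None, r=0):
    """Return the value m to broadcast, or None in (P4) where filters [lo, inf], [0, hi] are used."""
    p = phase(lo, hi, eps)
    if p == "P1":
        base = lo if lo0 is None else lo0
        return base + 2 ** (2 ** r)                         # doubly exponential search from l_0
    if p == "P2":
        return 2 ** ((math.log2(lo) + math.log2(hi)) / 2)   # 2^mid, the geometric mean of l and u
    if p == "P3":
        return (lo + hi) / 2                                # plain midpoint of L
    return None
```

As an example, `phase(100, 5000, 0.1)` falls into (P2), and the returned certificate is the geometric mean of 100 and 5000, roughly 707.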
Now we propose an algorithm started at a time t which computes the output F(t) and witnesses its correctness until some (not predefined) time $t'$ at which the TOP-K-PROTOCOL terminates using a combination of the algorithms stated above. Precisely, the TOP-K-PROTOCOL is defined as follows:

Algorithm TOP-K-PROTOCOL

1. Compute the nodes holding the (k+1) largest values and define $\ell := v^t_{\pi(k+1,t)}$, $u := v^t_{\pi(k,t)}$ and F(t).
2. If (P1) holds, call $A_1$ with the arguments F(t) and $L = [\ell, u]$. At the time $t'$ at which $A_1$ outputs $L' = [\ell', u']$ set $\ell := \ell'$ and $u := u'$.
3. If (P2) holds, call $A_2$ with the arguments F(t) and $L = [\ell, u]$. At the time $t'$ at which $A_2$ outputs $L' = [\ell', u']$ set $\ell := \ell'$ and $u := u'$.
4. If (P3) holds, call $A_3$ with the arguments F(t) and $L = [\ell, u]$. At the time $t'$ at which $A_3$ outputs $L' = [\ell', u']$ set $\ell := \ell'$ and $u := u'$.
5. If $u \ge \ell$ and $u \le \frac{1}{1-\varepsilon}\,\ell$ holds, set the filters to $F_1 := [\ell, \infty]$, $F_2 := [0, u]$. At the time $t'$ at which node $i \in F_2$ reports a filter-violation from below define $\ell := v_i^{t'}$. In case node $i \in F_1$ reports a filter-violation from above, define $u := v_i^{t'}$.
6. Terminate and output $(\ell, u)$.

Lemma 4.4. Consider a time t. The algorithm TOP-K-PROTOCOL computes the top-k set and witnesses its correctness until a time $t'$ at which it outputs $L = [\ell, u]$, where $\ell \le \mathrm{MAX}_{F_2}(t,t')$, $\mathrm{MIN}_{F_1}(t,t') \le u$, and $\ell > u$ holds (i.e. L is empty). The algorithm uses $O(k \log n + \log \log \Delta + \log \frac{1}{\varepsilon})$ messages on expectation.

Proof. We first argue on the correctness of TOP-K-PROTOCOL and afterwards briefly analyze the number of messages used.

The algorithm computes in step 1 a correct output $F_1$ at time t by using the algorithm from Lemma 2.6 for k times. In consecutive time steps $t' > t$ the correctness of TOP-K-PROTOCOL follows from the correctness of algorithms $A_1$, $A_2$, and $A_3$ in steps 2 to 4. For the correctness of step 5 observe that by setting the filters to $F_1 = [\ell, \infty]$ and $F_2 = [0, u]$ and the fact that $u \le \frac{1}{1-\varepsilon}\,\ell$ holds the filters are valid. Thus, as long as all nodes observe values which are inside their respective filters the output need not change.

At the time step $t'$ at which the protocol terminates and outputs $L = [\ell, u]$, it holds $u < \ell$. Thus, there are nodes $i_1 \in F_1$ and $i_2 \in F_2$ and time steps $t_1, t_2 \in [t, t']$ with $v_{i_1}^{t_1} \le u$ and $v_{i_2}^{t_2} \ge \ell$, and thus, $v_{i_1}^{t_1} < v_{i_2}^{t_2}$.
To argue on the number of messages observe that the first step can be executed using $O(k \log n)$ messages. At the time the conditions of steps 2 to 5 are checked, these steps can be performed using $O(k \log n)$ messages, by computing the nodes holding the k+1 largest values. The algorithms $A_1$, $A_2$, and $A_3$ are called at most once each, thus the conditions are also checked at most once. After executing step 5 the algorithm terminates, which leads to the result on the number of messages as stated above.

Theorem 4.5. The algorithm TOP-K-PROTOCOL has a competitiveness of $O(k \log n + \log \log \Delta + \log \frac{1}{\varepsilon})$ allowing an error of ε compared to an optimal offline algorithm that solves the exact Top-k-Position Monitoring problem.

Proof. The correctness of TOP-K-PROTOCOL and the number of messages follow from Lemma 4.4. Now we argue that OPT had to communicate at least once in the interval $[t, t']$ during which TOP-K-PROTOCOL was applied. If OPT communicated, the bound on the competitiveness directly follows. Now assume that OPT did not communicate in the interval $[t, t']$. We claim that the interval L maintained during TOP-K-PROTOCOL always satisfies the invariant $L^* \subseteq L$. If this claim is true, we directly obtain a contradiction to the fact that OPT did not communicate because of the following reasons. On the one hand, because OPT has to monitor the exact Top-k-Positions, OPT chooses the same set of nodes $F^* = F_1$ which was chosen by the online algorithm. On the other hand, at the time $t'$ at which the algorithm TOP-K-PROTOCOL terminates, $u' < \ell'$ holds. Thus, the interval $L'$ is empty and since $L^* \subseteq L'$ holds, it follows that $L^*$ is empty and hence, OPT must have communicated.

We now prove the claim. Recall that TOP-K-PROTOCOL is started with an interval L that fulfills $L^* \subseteq L$ by definition. To show that $L^* \subseteq L$ holds during the entire interval $[t, t']$, it suffices to argue that each of the previous algorithms makes sure that when started with an interval L such that $L^* \subseteq L$, it outputs $L'$ with $L^* \subseteq L'$. Our following reasoning is generic and can be applied to the previous algorithms.
Consider the cases in which filter-violations are observed and hence the interval $L$ is modified: If a filter-violation from below happened at a time $t_1 > t$, there is a node $i \in F_2$ with a value $v^{t_1}_i > \ell'$ and thus $\ell^* > \ell'$ holds. If a filter-violation from above happened at a time $t'$, there is a node $i \in F_1$ with a value $v^{t'}_i < u'$ and thus $u^* < u'$ holds. This case distinction leads to the result that $L^*$ has to be a subset of $[\ell', u']$.

5 Competing against an Approximate Adversary

In this section, we study the case in which the adversary is allowed to use an approximate filter-based offline algorithm, i.e. one that solves $\varepsilon$-Top-k-Position Monitoring. Not surprisingly, it turns out that it is much more challenging for online than for offline algorithms to cope with or exploit the allowed error in the output. This fact is formalized in the lower bound in Theorem 5.1, which is larger than previous upper bounds for the exact problem. However, we also propose two online algorithms that are competitive against offline algorithms that are allowed to have the same error $\varepsilon$ and a smaller error $\varepsilon' \leq \frac{\varepsilon}{2}$, respectively.

5.1 Lower Bound for Competitive Algorithms

We show a lower bound on the competitiveness, proving that any online algorithm has to communicate at least $(\sigma - k)$ times, in contrast to an offline algorithm which only uses $k+1$ messages. Recall that the adversary generates the data streams and can see the filters communicated by the server. Note that as long as the online and the offline algorithm are allowed to make use of an error $\varepsilon \in (0,1)$, the lower bound holds, even if the errors are different.

Theorem 5.1. Any filter-based online algorithm which solves the $\varepsilon$-Top-k-Position Monitoring problem and is allowed to make use of an error of $\varepsilon \in (0,1)$ has a competitiveness of $\Omega(\sigma/k)$ compared to an optimal offline algorithm which is allowed to use a (potentially different) error of $\varepsilon' \in (0,1)$.

Proof.
Consider an instance in which the observed values of $\sigma \in [k+1, n]$ nodes are equal to some value $y_0$ (the remaining $n - \sigma$ nodes observe smaller values) at time $t = 0$, and the following adversary: In time step $r = 0, 1, \ldots, n-k$, the adversary decides to change the value of one node $i$ with $v^r_i = y_0$ to be $v^{r+1}_i = y_1 < (1-\varepsilon) \cdot y_0$ such that a filter-violation occurs. Observe that such a value $y_1$ exists if $\varepsilon < 1$ holds, and a node $i$ always exists since otherwise the filters assigned by the online algorithm cannot be feasible. Hence, the number of messages sent by the online algorithm until time step $n-k$ is at least $n-k$. In contrast, the offline algorithm knows the $n-k$ nodes whose values change over time and hence can set the filters such that no filter-violation happens. The offline algorithm sets two different filters: one filter $F_1 = [y_0, \infty]$ for those $k$ nodes which have a value of $y_0$ at time step $n-k$, using $k$ messages, and one filter $F_2 = [0, y_0]$ for the remaining $n-k$ nodes, using one broadcast message. By essentially repeating these ideas, the input stream can be extended to an arbitrary length, obtaining the lower bound as stated.

5.2 Upper Bounds for Competitive Algorithms

Now we propose an algorithm DENSEPROTOCOL and analyze the competitiveness against an optimal offline algorithm in the setting that both algorithms are allowed to use an error of $\varepsilon$.

The algorithm DENSEPROTOCOL is started at a time $t$. For the sake of simplicity we assume that the $k$-th and the $(k+1)$-st node observe the same value $z$, that is, $z := v^t_{\pi(k,t)} = v^t_{\pi(k+1,t)}$. However, if this does not hold, we can define the filters to be $F_1 = [v^t_{\pi(k+1,t)}, \infty]$ and $F_2 = [0, v^t_{\pi(k,t)}]$ until a filter-violation is observed at some time $t'$, using $O(k \log n)$ messages on expectation. If the filter-violation occurred from below, define $z := v^t_{\pi(k,t)}$, and if a filter-violation from above is observed, define $z := v^t_{\pi(k+1,t)}$.

The high-level idea of DENSEPROTOCOL is similar to that of the TOP-K-PROTOCOL: compute a guess $L$ on the lower endpoint of the filter of the output $F^*$ of OPT (assuming OPT did not communicate during $[t, t']$) for which the invariant $\ell^* \in L^* \subseteq L_r$ holds.
The goal of DENSEPROTOCOL is to halve the interval $L$ while maintaining $\ell^* \in L$ until $L = \emptyset$, and thus to show that no value exists which could be used by OPT.

To this end, the algorithm partitions the nodes into three sets. Intuitively speaking, the first set, which we call $V_1$, contains those nodes which have to be part of the optimal output, $V_3$ those nodes that cannot be part of any optimal output, and $V_2$ the remaining nodes. The sets change over time as follows. Initially $V^t_1$ contains those nodes that observe a value $v^t_i > \frac{1}{1-\varepsilon}\,z$. The algorithm may discover at a time $t' > t$ that some node $i$ has to be moved to $V^{t'+1}_1$, which also contains all nodes from previous rounds, i.e. $V^{t'}_1 \subseteq V^{t'+1}_1$. On the other hand, $V^t_3$ initially contains the nodes which observed a value $v^t_i < (1-\varepsilon)\,z$. Here also the algorithm may discover at a time $t' > t$ that some node $i$ has to be moved to $V^{t'+1}_3$, which (similar to $V_1$) contains nodes from previous rounds. At the time $t$ the set $V^t_2$ simply contains the remaining nodes $\{1, \ldots, n\} \setminus (V^t_1 \cup V^t_3)$, and its cardinality will only decrease over time.

In the following we make use of sets $S_1$ and $S_2$ to indicate that nodes in $V_2$ may be moved to $V_1$ or $V_3$ depending on the values observed by the remaining nodes in $V_2$. Nodes in $S_1$ observed a value larger than $z$, but not large enough to decide to move them to $V_1$; similarly, nodes in $S_2$ observed values smaller than $z$, but not small enough to move them to $V_3$.

Next we propose the algorithm DENSEPROTOCOL, in which we make use of an algorithm SUBPROTOCOL for the scenario in which some node $i$ exists that is in $S_1$ and in $S_2$. At a time at which the SUBPROTOCOL terminates, it either outputs that $\ell^*$ has to be in the lower half of $L$ or in the upper half of $L$, so that the interval $L$ gets halved (which initiates the next round), or it moves one node from $V_2$ to $V_1$ or $V_3$. Intuitively speaking, SUBPROTOCOL is designed such that, if OPT did not communicate during $[t, t']$, where $t$ is the time the DENSEPROTOCOL is started and $t'$ is the current time step, the movement of one node $i \in V_2$ to $V_1$ or $V_3$ implies that $i$ necessarily has to be part of $F^*$ or not, respectively. For now we assume the algorithm SUBPROTOCOL to work correctly as a black box using $\mathrm{SUB}(n, |L|)$ messages.
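The initial partition into $V_1$, $V_2$ and $V_3$ described above can be sketched in a few lines of Python. The function name `partition_nodes` and the dictionary-based input (node id mapped to its currently observed value) are assumptions made only for this illustration; the sketch covers the initial assignment at time $t$ and none of the later node movements performed by DENSEPROTOCOL or SUBPROTOCOL.

```python
def partition_nodes(values, z, eps):
    """Return the initial sets (V1, V2, V3) for a pivot value z.

    values: dict mapping a node id to the value it observes at time t
            (hypothetical input format used only in this sketch).
    V1: nodes with value >  z / (1 - eps)
    V3: nodes with value <  (1 - eps) * z
    V2: all remaining nodes
    """
    upper = z / (1.0 - eps)
    lower = (1.0 - eps) * z
    v1 = {i for i, v in values.items() if v > upper}
    v3 = {i for i, v in values.items() if v < lower}
    v2 = set(values) - v1 - v3
    return v1, v2, v3
```

With $z = 100$ and $\varepsilon = 0.1$, the thresholds are roughly 111.1 and 90, so a node observing 200 lands in $V_1$, a node observing 50 lands in $V_3$, and nodes observing 105, 100 or 95 stay in $V_2$.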
Note that in case $L_r$ contains one value and gets halved, the interval $L_{r+1}$ is defined to be empty. In case the algorithm observes multiple nodes reporting a filter-violation, the server processes one violation at a time in an arbitrary
