
Tighter Estimates for ε-nets for Disks

Norbert Bus
Université Paris-Est, Laboratoire d'Informatique Gaspard-Monge, Equipe A3SI, ESIEE Paris.
[email protected]

Shashwat Garg
IIT Delhi
[email protected]

Nabil H. Mustafa
Université Paris-Est, Laboratoire d'Informatique Gaspard-Monge, Equipe A3SI, ESIEE Paris.
[email protected]

Saurabh Ray
Computer Science, New York University, Abu Dhabi.
[email protected]

arXiv:1501.03246v1 [cs.CG] 14 Jan 2015

Abstract

The geometric hitting set problem is one of the basic geometric combinatorial optimization problems: given a set $P$ of points, and a set $\mathcal{D}$ of geometric objects in the plane, the goal is to compute a small-sized subset of $P$ that hits all objects in $\mathcal{D}$. In 1994, Bronnimann and Goodrich [5] made an important connection of this problem to the size of fundamental combinatorial structures called ε-nets, showing that small-sized ε-nets imply approximation algorithms with correspondingly small approximation ratios. Very recently, Agarwal-Pan [2] showed that their scheme can be implemented in near-linear time for disks in the plane. Altogether this gives $O(1)$-factor approximation algorithms in $\tilde{O}(n)$ time for hitting sets for disks in the plane.

This constant factor depends on the sizes of ε-nets for disks; unfortunately, the current state-of-the-art bounds are large: at least $24/\epsilon$ and most likely larger than $40/\epsilon$. Thus the approximation factor of the Agarwal-Pan algorithm ends up being more than 40. The best lower-bound is $2/\epsilon$, which follows from the Pach-Woeginger construction [26] for halfspaces in two dimensions. Thus there is a large gap between the best-known upper and lower bounds. Besides being of independent interest, finding precise bounds is important since this immediately implies an improved linear-time algorithm for the hitting-set problem.

The main goal of this paper is to improve the upper-bound to $13.4/\epsilon$ for disks in the plane. The proof is constructive, giving a simple algorithm that uses only Delaunay triangulations.
We have implemented the algorithm, which is available as a public open-source module. Experimental results show that the sizes of ε-nets for a variety of data-sets are lower, around $9/\epsilon$.

1 Introduction

The minimum hitting set problem is one of the most fundamental combinatorial optimization problems: given a range space $(P, \mathcal{D})$ consisting of a set $P$ and a set $\mathcal{D}$ of subsets of $P$ called the ranges, the task is to compute the smallest subset $Q \subseteq P$ that has a non-empty intersection with each of the ranges in $\mathcal{D}$. This problem is strongly NP-hard. If there are no restrictions on the set system $\mathcal{D}$, then it is known that it is NP-hard to approximate the minimum hitting set within a logarithmic factor of the optimal [28]. The problem is NP-complete even for the case where each range has exactly two points, since this problem is equivalent to the vertex cover problem, which is known to be NP-complete [20, 14].

The hitting set problem arises naturally when the range space $\mathcal{D}$ is derived from geometry: e.g., given a set $P$ of $n$ points in $\mathbb{R}^2$, and a set $\mathcal{D}$ of $m$ triangles containing points of $P$, compute the minimum-sized subset of $P$ that hits all the triangles in $\mathcal{D}$. Unfortunately, for most natural geometric range spaces, computing the minimum-sized hitting set remains NP-hard. For example, even the (relatively) simple case where $\mathcal{D}$ is a set of unit disks in the plane is strongly NP-hard [19]. Therefore fast algorithms for computing provably good approximate hitting sets for geometric range spaces have been intensively studied for the past three decades (e.g., see the two recent PhD theses on this topic [12, 13]).

The case studied in this paper, hitting sets for disks in the plane, has been the subject of a long line of research. The case when all the disks have the same radius is easier, and has been studied in a series of works: Călinescu et al. [7] proposed a 108-approximation algorithm, which was subsequently improved by Ambühl et al. [3] to 72. Carmi et al. [8] further improved that to a 38-approximation algorithm, though with a running time of $O(n^6)$.
Claude et al. [10] were able to achieve a 22-approximation algorithm running in time $O(n^6)$. More recently Fraser et al. [15] presented an 18-approximation algorithm in time $O(n^2)$.

So far, besides ad-hoc approaches, all progress on the hitting-set problem for geometric ranges has relied on two systematic lines: rounding via ε-nets, and local-search. The local-search approach starts with any hitting set $S \subseteq P$, and repeatedly decreases the size of $S$, if possible, by replacing $k$ points of $S$ with at most $k-1$ points of $P \setminus S$. Call such an algorithm a $k$-local search algorithm. It has been shown [24] that a $k$-local search algorithm for the hitting set problem for disks in the plane gives a PTAS. Unfortunately the running time of their algorithm to compute a $(1+\epsilon)$-approximation is $O(n^{O(1/\epsilon^2)})$. Very recently Bus et al. [6] were able to improve the analysis and algorithm of the local-search approach to design an 8-approximation running in time $O(n^{2.33})$. However, at this moment, a near-linear time algorithm based on local-search seems beyond reach. We currently do not even know how to compute the most trivial case of local-search, namely $k = 1$, in near-linear time: given the set of disks $\mathcal{D}$ and a set of points $P$, compute a minimal hitting set in $P$ of $\mathcal{D}$.

Rounding via ε-nets. Given a range space $(P, \mathcal{D})$ with $n = |P|$ and a parameter $\epsilon > 0$, an ε-net is a subset $S \subseteq P$ such that $D \cap S \neq \emptyset$ for all $D \in \mathcal{D}$ with $|D \cap P| \geq \epsilon n$. The famous "ε-net theorem" of Haussler and Welzl [18] states that for range spaces with VC-dimension $d$, there exists an ε-net of size $O\left(\frac{d}{\epsilon} \log \frac{d}{\epsilon}\right)$ (this bound was later improved to $O\left(\frac{d}{\epsilon} \log \frac{1}{\epsilon}\right)$, which was shown to be optimal in general [25, 21]). Sometimes, weighted versions of the problem are considered in which each $p \in P$ has some positive weight associated with it so that the total weight of all elements of $P$ is 1. The weight of each range is the sum of the weights of the elements in it. The aim is to hit all ranges with weight more than $\epsilon$.
The condition of having finite VC-dimension is satisfied by many geometric set systems: disks, half-spaces, $k$-sided polytopes, $r$-admissible set of regions etc. in $\mathbb{R}^d$. For certain range spaces, one can even show the existence of ε-nets of size $O(1/\epsilon)$, an important case being disks in $\mathbb{R}^2$ [27].

In 1994, Bronnimann and Goodrich [5] proved the following interesting connection between the hitting-set problem and ε-nets: let $(P, \mathcal{D})$ be a range-space for which we want to compute a minimum hitting set. If one can compute an ε-net of size $c/\epsilon$ for the ε-net problem for $(P, \mathcal{D})$ in polynomial time, then one can compute a hitting set of size at most $c \cdot \mathrm{OPT}$ for $(P, \mathcal{D})$ in polynomial time, where $\mathrm{OPT}$ is the size of the optimal (smallest) hitting set. A shorter, simpler proof was given by Even et al. [11]. Both these proofs construct an assignment of weights to points in $P$ such that the total weight of each range $D \in \mathcal{D}$ (i.e., the sum of the weights of the points in $D$) is at least a $(1/\mathrm{OPT})$-th fraction of the total weight. Then a $(1/\mathrm{OPT})$-net with these weights is a hitting set. Until very recently, the best such rounding algorithms had running times of $\Omega(n^2)$, and it had been a long-standing open problem to compute an $O(1)$-approximation to the hitting-set problem for disks in the plane in near-linear time. In a recent break-through, Agarwal-Pan [2] presented an algorithm that is able to do the required rounding efficiently for a broad set of geometric objects. In particular, they are able to get the first near-linear algorithm for computing $O(1)$-approximations for hitting sets for disks.

Bounds on ε-nets. The result of Agarwal-Pan [2] opens the way, for the first time, for near linear-time algorithms for the geometric hitting set problem. The catch is that the approximation factor depends on the sizes of ε-nets for disks; despite over 7 different proofs of $O(1/\epsilon)$-sized ε-nets for disks, the precise bounds are not very encouraging.
The paper containing the earliest proof, Matousek et al. [22], appeared over twenty-two years ago and summarized its result thus:

"Note that in principle the ε-net construction presented in this paper can be transformed into a deterministic algorithm that runs in polynomial time, $O(n^3)$ at worst. However, we certainly would not advocate this algorithm as being practical. We find the resulting constant of proportionality also not particularly flattering." [22]

So far, the best constants for the ε-nets come from the proofs in [27] and [17]. The latter paper presents five proofs for the existence of linear size ε-nets for halfspaces in $\mathbb{R}^3$. The best constant for disks is obtained by using their first proof. A lifting of the problem of disks to $\mathbb{R}^3$ gives an ε-net problem with lower halfspaces in $\mathbb{R}^3$, for which [17] obtains a bound of $\frac{4}{\epsilon} f(\alpha)$, where $\alpha < \frac{1}{3}$ and $f(\alpha)$ is the best bound on the size of an $\alpha$-net for lower halfspaces in $\mathbb{R}^3$. Using the lower bound of [26] for halfspaces in $\mathbb{R}^2$, $f(\alpha) \geq \lceil 2/\alpha \rceil - 1 \geq 6$, although we believe that it is at least 10 since even for $\epsilon = 1/2$, no ε-net construction of size less than 10 is known. Thus, the best constructions so far give a bound that is at least $24/\epsilon$ and most likely more than $40/\epsilon$. Furthermore, there is no implementation or software solution available that can even compute such ε-nets efficiently.

Our Contributions

We prove new improved bounds on sizes of ε-nets and present efficient algorithms to compute such nets. Our approach is simple: we will show that modifications to a well-known technique for computing ε-nets, the sample-and-refine approach of Chazelle-Friedman [9], together with additional structural properties of Delaunay triangulations, in fact result in ε-nets of surprisingly low size:

Theorem 1.1. Given a set $P$ of $n$ points in $\mathbb{R}^2$, there exists an ε-net under disk ranges of size at most $13.4/\epsilon$. Furthermore it can be computed in expected time $O(n \log n)$.
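To make the ε-net condition in Theorem 1.1 concrete, the following minimal Python sketch tests a candidate net against a finite, user-supplied list of disks. It is an illustration only, not the construction of this paper: the function names `points_in_disk` and `is_eps_net` and the sample data are hypothetical, and checking a finite list of disks cannot certify the property for all disks in the plane.

```python
def points_in_disk(points, center, radius):
    """Indices of the points lying in the closed disk (center, radius)."""
    cx, cy = center
    r2 = radius * radius
    return [i for i, (x, y) in enumerate(points)
            if (x - cx) ** 2 + (y - cy) ** 2 <= r2]


def is_eps_net(points, net_indices, disks, eps):
    """True if every listed disk with >= eps*n points contains a net point.

    `disks` is a finite list of (center, radius) pairs chosen by the caller;
    disks containing fewer than eps*n points impose no constraint.
    """
    n = len(points)
    net = set(net_indices)
    for center, radius in disks:
        inside = points_in_disk(points, center, radius)
        if len(inside) >= eps * n and not net.intersection(inside):
            return False
    return True


# Hypothetical toy data: four collinear points and four test disks.
POINTS = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
DISKS = [
    ((0.5, 0.0), 0.6),  # contains points 0 and 1
    ((1.5, 0.0), 0.6),  # contains points 1 and 2
    ((2.5, 0.0), 0.6),  # contains points 2 and 3
    ((0.0, 0.0), 0.1),  # contains only point 0, so it is not "heavy" for eps = 1/2
]
```

With `eps = 0.5`, the index set `[1, 2]` hits every heavy disk in `DISKS`, whereas `[1]` misses the disk around `(2.5, 0.0)`.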
A major advantage of Delaunay triangulations is that their behavior has been extensively studied, there are many efficient implementations available, and they exhibit good behavior for various real-world data-sets as well as random point sets. The algorithm, using CGAL, is furthermore simple to implement. We have implemented it, and present the sizes of ε-nets for various real-world data-sets; the results indicate that our theoretical analysis closely tracks the actual size of the nets. This can additionally be seen as continuing the program for better analysis of basic geometric tools; see, e.g., Har-Peled [16] for analysis of algorithms and Matousek [23] for detailed analysis, both for a related structure called cuttings in the plane.

Together with the result of Agarwal-Pan, this immediately implies the following:

Corollary 1.1. For any $\delta > 0$, one can compute a $(13.4 + \delta)$-approximation to the minimum hitting set for $(P, \mathcal{D})$ in time $\tilde{O}(n)$.

2 A near linear time algorithm for computing ε-nets for disks in the plane

Through a more careful analysis, we present an algorithm for computing an ε-net of size $13.4/\epsilon$, running in near linear time. The method, shown in Algorithm 1, computes a random sample and then solves certain subproblems involving subsets located in pairs of Delaunay disks circumscribing adjacent triangles in the Delaunay triangulation of the random sample. The key to improved bounds is i) considering edges in the Delaunay triangulation instead of faces in the analysis, and ii) new improved constructions for large values of $\epsilon$.

Let $\Delta(abc)$ denote the triangle defined by the three points $a$, $b$ and $c$. $D_{abc}$ denotes the disk through $a$, $b$ and $c$, while $D_{ab\bar{c}}$ denotes the halfspace defined by $a$ and $b$ not containing the point $c$. Let $c(D)$ denote the center of the disk $D$.

Let $\Xi(R)$ be the Delaunay triangulation of a set of points $R \subseteq P$ in the plane. We will use $\Xi$ when $R$ is clear from the context. For any triangle $\Delta \in \Xi$, let $D_\Delta$ be the Delaunay disk of $\Delta$, and let $P_\Delta$ be the set of points of $P$ contained in $D_\Delta$.
Similarly, for any edge $e \in \Xi$, let $\Delta^1_e$ and $\Delta^2_e$ be the two triangles in $\Xi$ adjacent to $e$, and $P_e = P_{\Delta^1_e} \cup P_{\Delta^2_e}$. If $e$ is on the convex-hull, then one of the triangles is taken to be the halfspace defined by $e$ not containing $R$.

Algorithm 1: Compute ε-nets
Data: Compute ε-net, given $P$: set of $n$ points in $\mathbb{R}^2$, $\epsilon > 0$ and $c_1$.
 1  if $\epsilon n < 13$ then
 2      Return $P$
 3  Pick each point $p \in P$ into $R$ independently with probability $\frac{c_1}{\epsilon n}$.
 4  if $|R| \leq c_1/(2\epsilon)$ then
 5      restart algorithm.
 6  Compute the Delaunay triangulation $\Xi$ of $R$.
 7  for triangles $\Delta \in \Xi$ do
 8      Compute the set of points $P_\Delta \subseteq P$ in Delaunay disk $D_\Delta$ of $\Delta$.
 9  for edges $e \in \Xi$ do
10      Let $\Delta^1_e$ and $\Delta^2_e$ be the two triangles adjacent to $e$, $P_e = P_{\Delta^1_e} \cup P_{\Delta^2_e}$.
11      Let $\epsilon' = \frac{\epsilon n}{|P_e|}$ and compute an $\epsilon'$-net $R_e$ for $P_e$ depending on the cases below:
12      if $\frac{2}{3} < \epsilon' < 1$ then
13          compute using Lemma 2.1.
14      if $\frac{1}{2} < \epsilon' \leq \frac{2}{3}$ then
15          compute using Lemma 2.2.
16      if $\epsilon' \leq \frac{1}{2}$ then
17          compute recursively.
18  Return $\left(\bigcup_e R_e\right) \cup R$.

In order to prove that the algorithm gives the desired result, the following statements regarding the size of an ε-net will be useful. Let $f(\epsilon)$ be the size of the smallest ε-net for any set $P$ of points in $\mathbb{R}^2$ under disk ranges.

Lemma 2.1 ([4]). For $\frac{2}{3} < \epsilon < 1$, $f(\epsilon) \leq 2$, and such a net can be computed in $O(n \log n)$ time.

Lemma 2.2. For $\frac{1}{2} < \epsilon \leq \frac{2}{3}$, $f(\epsilon) \leq 10$, and such a net can be computed in $O(n \log n)$ time.

Figure 1: Setup around $q$.

Proof. Divide the plane into 4 quadrants with 2 lines, intersecting at a point $q$, such that each quadrant contains $n/4$ points. Using the Ham-Sandwich theorem, this can be done in linear time [21]. Create a $\frac{2}{3}$-net for each quadrant, using Lemma 2.1. Add these 8 points to the ε-net of $P$. If $q \in P$ then add $q$ to the ε-net; otherwise let $\Delta$ be the triangle in the Delaunay triangulation of $P$ that contains the point $q$. Add the two vertices of $\Delta$ that are in the opposite quadrants to the ε-net. The resulting size of the net is at most 10.
Denote the quadrant without a vertex of the Delaunay triangle inside it by $Q$ and its opposite quadrant by $R$. If a disk $D$ intersects at most 3 quadrants and does not contain any of the points from the $\frac{2}{3}$-net in each of those quadrants, it can contain at most $3 \cdot \frac{2}{3} \cdot \frac{n}{4} = \frac{n}{2}$ points. On the other hand, if $D$ contains points from each of the 4 quadrants, then it must contain points from $Q$ and $R$ that are outside of the Delaunay disk $D_\Delta$ of $\Delta$ (as $D_\Delta$ is empty of points of $P$). Then if $D$ does not contain either of the two vertices of $\Delta$ in the opposite quadrants (already added to the ε-net), it must pierce $D_\Delta$, a contradiction.

Call a tuple $(\{p,q\},\{r,s\})$, where $p, q, r, s \in P$, a Delaunay quadruple if $\mathrm{int}(\Delta(pqr)) \cap \mathrm{int}(\Delta(pqs)) = \emptyset$. Define its weight, denoted $W_{(\{p,q\},\{r,s\})}$, to be the number of points of $P$ in $D_{pqr} \cup D_{pqs}$. Let $T_{\leq k}$ be the set of Delaunay quadruples of $P$ of weight at most $k$, and similarly let $T_k$ denote the set of Delaunay quadruples of weight exactly $k$. Similarly, a Delaunay triple is given by $(\{p,q\},\{r\})$, where $p, q, r \in P$. Define its weight, denoted $W_{(\{p,q\},\{r\})}$, to be the number of points of $P$ in $D_{pqr} \cup D_{pq\bar{r}}$. Let $S_{\leq k}$ be the set of Delaunay triples of $P$ of weight at most $k$, and let $S_k$ denote the set of Delaunay triples of weight exactly $k$.

One can upper bound the sizes of $T_{\leq k}$ and $S_{\leq k}$, and from these bounds we derive an upper bound on the expected number of sub-problems with a certain number of points.

Claim 2.3. $|T_{\leq k}| \leq (e^3/9)\, n k^3$ asymptotically and $|T_{\leq k}| \leq (3.1)\, n k^3$ for $k \geq 13$.

Proof. The proof is an application of the Clarkson-Shor technique [21]. Pick each point in $P$ independently with probability $p_{cs}$ to get a random sample $R_{cs}$. Count the expected number of edges in the Delaunay triangulation of $R_{cs}$ in two ways. On one hand, it is simply less than $3\,E[|R_{cs}|] = 3 n p_{cs}$.
On the other hand, it is:

$$
\begin{aligned}
3 n p_{cs} &\geq E[\text{Number of Delaunay edges in } R_{cs}] = \sum_{p,q \in P} \Pr[\{p,q\} \text{ is a Delaunay edge of } R_{cs}] \\
&\geq \sum_{p,q \in P} \sum_{r,s \in P} \Pr[(D_{pqr} \cup D_{pqs}) \cap R_{cs} = \emptyset] \quad \text{(disjoint events)} \\
&\geq \sum_{(\{p,q\},\{r,s\}) \in T_{\leq k}} \Pr[(D_{pqr} \cup D_{pqs}) \cap R_{cs} = \emptyset] \\
&\geq \sum_{(\{p,q\},\{r,s\}) \in T_{\leq k}} p_{cs}^4 (1 - p_{cs})^k = |T_{\leq k}| \cdot p_{cs}^4 (1 - p_{cs})^k .
\end{aligned}
$$

Therefore $|T_{\leq k}| \leq 3 n p_{cs} / \left(p_{cs}^4 (1 - p_{cs})^k\right)$, and a simple calculation gives that setting $p_{cs} = \frac{3}{k+3}$ minimizes the right hand side. Then

$$
|T_{\leq k}| \leq 3n \cdot \frac{3}{k+3} \Big/ \left( \left(\frac{3}{k+3}\right)^4 \left(1 - \frac{3}{k+3}\right)^k \right) = \frac{n k^3}{9} \left(1 + \frac{3}{k}\right)^{k+3},
$$

and the claim follows.

Claim 2.4. $|S_{\leq k}| \leq (e^2/4)\, n k^2$ asymptotically and $|S_{\leq k}| \leq (2.14)\, n k^2$ for $k \geq 13$.

Proof. Pick each point in $P$ independently with probability $p_{cs}$ to get a random sample $R_{cs}$. Count the expected number of edges in the Delaunay triangulation of $R_{cs}$ that lie on the boundary of the Delaunay triangulation, i.e., adjacent to exactly one triangle, in two ways. On one hand, it is exactly the number of edges in the convex-hull of $R_{cs}$, therefore at most $E[|R_{cs}|] = n p_{cs}$. Counted another way, it is:

$$
\begin{aligned}
n p_{cs} &\geq E[\text{Number of boundary Delaunay edges in } R_{cs}] = \sum_{p,q \in P} \Pr[\{p,q\} \text{ is a boundary Delaunay edge of } R_{cs}] \\
&\geq \sum_{p,q \in P} \sum_{r \in P} \Pr[(D_{pqr} \cup D_{pq\bar{r}}) \cap R_{cs} = \emptyset] \quad \text{(disjoint events)} \\
&\geq \sum_{(\{p,q\},\{r\}) \in S_{\leq k}} \Pr[(D_{pqr} \cup D_{pq\bar{r}}) \cap R_{cs} = \emptyset] \\
&\geq \sum_{(\{p,q\},\{r\}) \in S_{\leq k}} p_{cs}^3 (1 - p_{cs})^k = |S_{\leq k}| \cdot p_{cs}^3 (1 - p_{cs})^k .
\end{aligned}
$$

Setting $p_{cs} = \frac{2}{k+2}$ gives the required result.

Claim 2.5.

$$
E\big[\,|\{e \in \Xi \mid k_1 \epsilon n \leq |P_e| \leq k_2 \epsilon n\}|\,\big] \leq \frac{(3.1)\, c_1^3}{\epsilon\, e^{k_1 c_1}} \left(k_1^3 c_1 + 3.7\, k_2^2\right) \quad \text{if } \epsilon n \geq 13 .
$$

Proof. The crucial observation is that two points $\{p,q\}$ form an edge in $\Xi$ with two adjacent triangles $\Delta(pqr), \Delta(pqs) \in \Xi$ iff $\{p,q,r,s\} \subseteq R$ and none of the points of $P$ in $D_{pqr} \cup D_{pqs}$ are picked in $R$ (i.e., the points $p,q,r,s$ form the Delaunay tuple $(\{p,q\},\{r,s\})$). Or $\{p,q\}$ form an edge on the convex-hull of $\Xi$ with one adjacent triangle $\Delta(pqr)$ iff $\{p,q,r\} \subseteq R$ and none of the points of $P$ in $D_{pqr} \cup D_{pq\bar{r}}$ are picked in $R$.
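Let $\chi_{(\{p,q\},\{r,s\})}$ be the random variable that is 1 iff $\{p,q\}$ form an edge in $\Xi$ and their two adjacent triangles are $\Delta(pqr)$ and $\Delta(pqs)$. Let $\chi_{(\{p,q\},\{r\})}$ be the random variable that is 1 iff $\{p,q\}$ form an edge in $\Xi$ with exactly one adjacent triangle $\Delta(pqr)$. Noting that every edge in $\Xi$ must come from either a Delaunay quadruple or a Delaunay triple,

$$
E\big[\,|\{e \mid k_1 \epsilon n \leq |P_e| \leq k_2 \epsilon n\}|\,\big] = \sum_{\substack{p,q,r,s \in P \\ k_1 \epsilon n \leq W_{(\{p,q\},\{r,s\})} \leq k_2 \epsilon n}} \Pr[\chi_{(\{p,q\},\{r,s\})} = 1] \;+ \sum_{\substack{p,q,r \in P \\ k_1 \epsilon n \leq W_{(\{p,q\},\{r\})} \leq k_2 \epsilon n}} \Pr[\chi_{(\{p,q\},\{r\})} = 1] .
$$

The second term is asymptotically smaller, so we bound it somewhat loosely:

$$
\begin{aligned}
\sum_{\substack{p,q,r \in P \\ k_1 \epsilon n \leq W_{(\{p,q\},\{r\})} \leq k_2 \epsilon n}} \Pr[\chi_{(\{p,q\},\{r\})} = 1]
&\leq \sum_{\substack{p,q,r \\ k_1 \epsilon n \leq W_{(\{p,q\},\{r\})} \leq k_2 \epsilon n}} \left(\frac{c_1}{\epsilon n}\right)^3 \left(1 - \frac{c_1}{\epsilon n}\right)^{W_{(\{p,q\},\{r\})}} \\
&\leq |S_{\leq k_2 \epsilon n}| \cdot \left(\frac{c_1}{\epsilon n}\right)^3 \left(1 - \frac{c_1}{\epsilon n}\right)^{k_1 \epsilon n} \\
&\leq (2.14)\, n (k_2 \epsilon n)^2 \cdot \left(\frac{c_1}{\epsilon n}\right)^3 \cdot e^{-c_1 k_1} = \frac{(2.14)\, k_2^2 c_1^3}{\epsilon\, e^{c_1 k_1}} .
\end{aligned}
$$

Now we carefully bound the first term:

$$
\begin{aligned}
\sum_{\substack{p,q,r,s \in P \\ k_1 \epsilon n \leq W_{(\{p,q\},\{r,s\})} \leq k_2 \epsilon n}} \Pr[\chi_{(\{p,q\},\{r,s\})} = 1]
&\leq \sum_{i = k_1 \epsilon n}^{k_2 \epsilon n} \; \sum_{\substack{p,q,r,s \\ W_{(\{p,q\},\{r,s\})} = i}} \Pr[\chi_{(\{p,q\},\{r,s\})} = 1] \\
&\leq \sum_{i = k_1 \epsilon n}^{k_2 \epsilon n} \; \sum_{\substack{p,q,r,s \\ W_{(\{p,q\},\{r,s\})} = i}} \left(\frac{c_1}{\epsilon n}\right)^4 \left(1 - \frac{c_1}{\epsilon n}\right)^i \\
&\leq \sum_{i = k_1 \epsilon n}^{k_2 \epsilon n} |T_i| \left(\frac{c_1}{\epsilon n}\right)^4 \left(1 - \frac{c_1}{\epsilon n}\right)^i .
\end{aligned}
$$

As the above summation is exponentially decreasing as a function of $i$, it is maximized when $|T_{i_0}| = \max |T_{\leq i_0}|$ where $i_0 = k_1 \epsilon n$, and $|T_i| = \max |T_{\leq i}| - \max |T_{\leq i-1}|$ and so on.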
Using Claim 2.3 we obtain:

$$
\begin{aligned}
&\leq |T_{\leq k_1 \epsilon n}| \cdot \left(\frac{c_1}{\epsilon n}\right)^4 \left(1 - \frac{c_1}{\epsilon n}\right)^{k_1 \epsilon n} + \sum_{i = k_1 \epsilon n + 1}^{k_2 \epsilon n} \left(|T_{\leq i}| - |T_{\leq i-1}|\right) \cdot \left(\frac{c_1}{\epsilon n}\right)^4 \left(1 - \frac{c_1}{\epsilon n}\right)^i \\
&\leq (3.1)\, n (k_1 \epsilon n)^3 \cdot \left(\frac{c_1}{\epsilon n}\right)^4 \left(1 - \frac{c_1}{\epsilon n}\right)^{k_1 \epsilon n} + \sum_{i = k_1 \epsilon n + 1}^{k_2 \epsilon n} (3.1)\, n \cdot 3 i^2 \cdot \left(\frac{c_1}{\epsilon n}\right)^4 \left(1 - \frac{c_1}{\epsilon n}\right)^i \\
&\leq (3.1)\, \frac{k_1^3 c_1^4 e^{-k_1 c_1}}{\epsilon} + (3.1)\, \frac{3 k_2^2 c_1^4}{\epsilon^2 n} \sum_{i = k_1 \epsilon n + 1}^{k_2 \epsilon n} \left(1 - \frac{c_1}{\epsilon n}\right)^i \\
&\leq (3.1)\, \frac{k_1^3 c_1^4 e^{-k_1 c_1}}{\epsilon} + (3.1)\, \frac{3 k_2^2 c_1^4}{\epsilon^2 n} \cdot \frac{\left(1 - c_1/\epsilon n\right)^{k_1 \epsilon n}}{c_1/\epsilon n}
\leq \frac{(3.1)\, c_1^3}{\epsilon\, e^{k_1 c_1}} \left(k_1^3 c_1 + 3 k_2^2\right) .
\end{aligned}
$$

The proof follows by summing up the two terms.

Using the above facts we can prove the main result.

Lemma 2.6. Algorithm COMPUTE ε-NET computes an ε-net of expected size $13.4/\epsilon$.

Proof. First we show that the algorithm computes an ε-net. Take any disk $D$ with center $c$ containing $\epsilon n$ points of $P$, and not hit by the initial random sample $R$. Increase its radius while keeping its center $c$ fixed until it passes through a point, say $p_1$, of $R$. Now further expand the disk by moving $c$ in the direction $\vec{p_1 c}$ until its boundary passes through a second point $p_2$ of $R$. The edge $e$ defined by $p_1$ and $p_2$ belongs to $\Xi$, and the two extreme disks in the pencil of empty disks through $p_1$ and $p_2$ are the disks $D_{\Delta^1_e}$ and $D_{\Delta^2_e}$. Their union covers $D$, and so $D$ contains $\epsilon n$ points out of the set $P_e$. Then the net $R_e$ computed for $P_e$ must hit $D$, as $\epsilon n = (\epsilon n / |P_e|) \cdot |P_e|$.

For the expected size, clearly, if $\epsilon n < 13$ then the returned set $P$ is an ε-net of size less than $13/\epsilon$. Otherwise we can calculate the expected number of points added to the ε-net while solving the sub-problems. We simply group them by the number of points in them. Set $E_i = \{e \mid 2^i \epsilon n \leq |P_e| < 2^{i+1} \epsilon n\}$, and let us denote the size of the ε-net returned by our algorithm by $f'(\epsilon)$.
Then

$$
\begin{aligned}
E\left[f'(\epsilon)\right] = E[|R|] + E\Big[\Big|\bigcup_{e \in \Xi} R_e\Big|\Big] = \frac{c_1}{\epsilon}
&+ E\big[\,|\{e \mid \epsilon n \leq |P_e| < 3\epsilon n/2\}|\,\big] \cdot f(2/3) \\
&+ E\big[\,|\{e \mid 3\epsilon n/2 \leq |P_e| < 2\epsilon n\}|\,\big] \cdot f(1/2) \\
&+ E\Big[\sum_{i \geq 1} \sum_{e \in E_i} f'\Big(\frac{\epsilon n}{|P_e|}\Big)\Big] .
\end{aligned}
$$

Noting that $E\big[\sum_{e \in E_i} f'\big(\frac{\epsilon n}{|P_e|}\big) \,\big|\, |E_i| = t\big] \leq t\, E[f'(1/2^{i+1})]$, we get

$$
E\Big[\sum_{e \in E_i} f'\Big(\frac{\epsilon n}{|P_e|}\Big)\Big] = E\Big[E\Big[\sum_{e \in E_i} f'\Big(\frac{\epsilon n}{|P_e|}\Big) \,\Big|\, E_i\Big]\Big] \leq E\big[\,|E_i| \cdot E[f'(1/2^{i+1})]\,\big] = E[|E_i|] \cdot E[f'(1/2^{i+1})]
$$

as $|E_i|$ and $f'(\cdot)$ are independent. As $\epsilon' = \frac{\epsilon n}{|P_e|} > \epsilon$, by induction, assume $E[f'(\epsilon')] \leq \frac{13.4}{\epsilon'}$. Then

$$
\begin{aligned}
E\left[f'(\epsilon)\right] \leq \frac{c_1}{\epsilon}
&+ \frac{(3.1) \cdot c_1^3 (c_1 + 8.34)}{\epsilon\, e^{c_1}} \cdot 2
+ \frac{(3.1) \cdot c_1^3 \left((3/2)^3 c_1 + 14.8\right)}{\epsilon\, e^{3 c_1/2}} \cdot 10 \\
&+ \sum_{i \geq 1} \frac{(3.1) \cdot c_1^3 \left(2^{3i} c_1 + 3.7 \cdot 2^{2i+2}\right)}{\epsilon\, e^{c_1 2^i}} \cdot 13.4 \cdot 2^{i+1} \;\leq\; \frac{13.4}{\epsilon}
\end{aligned}
$$

by setting $c_1 = 12$.

Finally, we bound the expected running time of the algorithm.

Lemma 2.7. Algorithm COMPUTE ε-NET runs in expected time $O(n \log n)$.

Proof. Note that $E[|R|] = c_1/\epsilon$. First we bound the expected total size of all the sets $P_e$:

$$
\begin{aligned}
E\Big[\sum_{e \in \Xi} |P_e|\Big] &\leq E\big[\,|\{e \mid 0 \leq |P_e| < \epsilon n\}|\,\big] \cdot \epsilon n + \sum_{i \geq 0} E\big[\,|\{e \mid 2^i \epsilon n \leq |P_e| < 2^{i+1} \epsilon n\}|\,\big] \cdot 2^{i+1} \epsilon n \\
&\leq O\Big(\frac{\epsilon n}{\epsilon}\Big) + \sum_{i \geq 0} O\Big(\frac{(2^i)^3}{\epsilon\, e^{2^i c_1}}\Big) \cdot 2^{i+1} \epsilon n = O(n),
\end{aligned}
$$

as the last summation is dominated by a geometric series. This implies that the expected total number of incidences between points in $P$ and Delaunay disks in $\Xi$ is $O(n)$. The Delaunay triangulation of $R$ can be computed in expected time $O\left(\frac{1}{\epsilon} \log \frac{1}{\epsilon}\right)$. Lines 7-8 of Algorithm 1 compute, for each Delaunay disk $D \in \Xi$, the list of points contained in $D$. This can be computed in $O\left(n \log \frac{1}{\epsilon}\right)$ time by instead finding, for each $p \in P$, the list of Delaunay disks in $\Xi$ containing $p$, as follows.
First do point-location in $\Xi$ to locate the triangle $\Delta$ containing $p$, in expected time $O\left(\log \frac{1}{\epsilon}\right)$. Clearly $D_\Delta$ contains $p$. Now starting from $\Delta$, do a breadth-first search in the dual planar graph of the Delaunay triangulation to find the maximal connected subset of triangles (vertices in the dual graph) whose Delaunay disks contain $p$. As each vertex in the dual graph has degree at most 3, this takes time proportional to the discovered list of triangles, which as shown earlier is $O(n)$ over all $p \in P$. The correctness follows from the following:

Fact 2.8. Given a Delaunay triangulation $\Xi$ on $R$ and any point $p \in \mathbb{R}^2$, the set of triangles in $\Xi$ whose Delaunay disks contain $p$ forms a connected sub-graph in the dual graph to $\Xi$.

Proof. This can be seen by lifting $P$ to $\mathbb{R}^3$ via the Veronese mapping, where it follows from the fact that the faces of a convex polyhedron that are visible from any exterior point are connected.

Note that by the ε-net theorem, the probability of restarting the algorithm (lines 4-5) at any call is at most a constant. Therefore it is restarted at most a constant number of times in expectation, and so the expected running time, denoted by $T(n)$, satisfies:

$$
E[T(n)] = O\Big(\frac{1}{\epsilon} \log \frac{1}{\epsilon}\Big) + O\Big(n \log \frac{1}{\epsilon}\Big) + \sum_{e \in \Xi} E[T(|P_e|)] \leq O\Big(n \log \frac{1}{\epsilon}\Big) + \sum_{e \in \Xi} E[T(|P_e|)] .
$$

Similarly to previous calculations we have that

$$
\begin{aligned}
E[T(n)] \leq\; & O\Big(n \log \frac{1}{\epsilon}\Big) + \frac{(3.1) \cdot c_1^3 (c_1 + 8.34)}{\epsilon\, e^{c_1}} \cdot O\big(\tfrac{3\epsilon n}{2} \log \tfrac{3\epsilon n}{2}\big) \\
&+ \frac{(3.1) \cdot c_1^3 \left((3/2)^3 c_1 + 14.8\right)}{\epsilon\, e^{3 c_1/2}} \cdot O\big(2\epsilon n \log(2\epsilon n)\big) \\
&+ \sum_{i \geq 1} \frac{(3.1) \cdot c_1^3 \left(2^{3i} c_1 + 3.7 \cdot 2^{2i+2}\right)}{\epsilon\, e^{c_1 2^i}} \cdot E[T(2^{i+1} \epsilon n)] \\
\leq\; & d\, n \log n + \sum_{i \geq 1} \frac{(3.1) \cdot c_1^3 \left(2^{3i} c_1 + 3.7 \cdot 2^{2i+2}\right)}{\epsilon\, e^{c_1 2^i}} \cdot E[T(2^{i+1} \epsilon n)]
\end{aligned}
$$

for a constant $d$ coming from the constants above, as well as from the Delaunay triangulation, point-location and list-construction computations. Setting $E[T(k)] = c\, k \log k$ satisfies the above inequality for $c \geq 2d$, since

$$
\begin{aligned}
E[T(n)] &\leq d\, n \log n + \sum_{i \geq 1} \frac{(3.1) \cdot c_1^3 \left(2^{3i} c_1 + 3.7 \cdot 2^{2i+2}\right)}{\epsilon\, e^{c_1 2^i}} \cdot c\, (2^{i+1} \epsilon n) \log(2^{i+1} \epsilon n) \\
&\leq d\, n \log n + (c\, n \log n) \sum_{i \geq 1} \frac{2^{i+1} \cdot (3.1) \cdot 12^3 \left(2^{3i} \cdot 12 + 3.7 \cdot 2^{2i+2}\right)}{e^{12 \cdot 2^i}} \\
&\leq d\, n \log n + c\, n \log n \cdot \frac{1}{2} \leq c\, n \log n, \quad \text{for } c \geq 2d .
\end{aligned}
$$
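The numerical constants used in the proofs of this section can be sanity-checked with a few lines of Python. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the infinite sums are truncated after 20 terms (an arbitrary cutoff, since the terms decay doubly exponentially), and the formulas are transcribed from the inequalities of Claims 2.3 and 2.4 and Lemmas 2.6 and 2.7 with $c_1 = 12$ and all quantities multiplied by $\epsilon$.

```python
from math import exp

C1 = 12.0  # the value of c_1 chosen in the proof of Lemma 2.6


def quadruple_constant(k):
    """(1/9) * (1 + 3/k)^(k+3): the factor of n*k^3 in the proof of Claim 2.3."""
    return (1 + 3 / k) ** (k + 3) / 9


def triple_constant(k):
    """(1/4) * (1 + 2/k)^(k+2): the factor of n*k^2 obtained with p_cs = 2/(k+2)."""
    return (1 + 2 / k) ** (k + 2) / 4


def tail_term(i, c1=C1):
    """Claim 2.5 bound, times eps, for k1 = 2^i and k2 = 2^(i+1)."""
    # exp of a large negative argument underflows to 0.0 instead of overflowing.
    return 3.1 * c1**3 * (2 ** (3 * i) * c1 + 3.7 * 2 ** (2 * i + 2)) * exp(-c1 * 2**i)


def expected_size_times_eps(c1=C1, terms=20):
    """Right-hand side of the final inequality in Lemma 2.6, multiplied by eps."""
    total = c1                                                  # E[|R|]
    total += 3.1 * c1**3 * (c1 + 8.34) * exp(-c1) * 2           # Lemma 2.1 cases
    total += 3.1 * c1**3 * (1.5**3 * c1 + 14.8) * exp(-1.5 * c1) * 10  # Lemma 2.2 cases
    total += sum(tail_term(i, c1) * 13.4 * 2 ** (i + 1)         # recursive cases
                 for i in range(1, terms + 1))
    return total


def recurrence_sum(c1=C1, terms=20):
    """The sum multiplying c*n*log(n) in the last step of Lemma 2.7."""
    return sum(2 ** (i + 1) * tail_term(i, c1) for i in range(1, terms + 1))


if __name__ == "__main__":
    print("Claim 2.3 factor at k = 13:", quadruple_constant(13))
    print("Claim 2.4 factor at k = 13:", triple_constant(13))
    print("Lemma 2.6 bound times eps :", expected_size_times_eps())
    print("Lemma 2.7 recurrence sum  :", recurrence_sum())
```

Because $(1 + a/k)^{k+a}$ is decreasing in $k$, checking the two Clarkson-Shor factors at $k = 13$ covers all $k \geq 13$.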
3 Implementation and Experiments

In this section we present experimental results for our algorithm running on a machine equipped with an Intel Core i7 870 processor with 4 cores each running at 2.93 GHz and with 16 GB main memory. All our implementations are single threaded in order to have a fair comparison. For nearest-neighbors and Delaunay triangulations, we use the well-known geometry library CGAL. It computes Delaunay triangulations in expected $O(n \log n)$ time. Instead of computing centerpoints, we recurse for all values of $\epsilon'$; this results in simple efficient code, at the cost of slightly larger constants.

In order to empirically validate the size of the ε-net obtained by our random sampling algorithm we have utilized several datasets in [1]. The MOPSI Finland dataset contains 13,467 locations of users in Finland. The KDDCUP04Bio dataset contains the first 2 dimensions of a protein dataset with 145,751 entries. The Europe and Birch3 datasets have 169,308 and 100,000 entries respectively. We have created two random data sets, Uniform and Gauss9, with 50,000 and 90,000 points. The former is sampled from a uniform distribution while the latter is sampled from 9 different Gaussian distributions whose means and covariance matrices are randomly generated.

Setting the probability for random sampling to $\frac{12}{\epsilon \cdot n}$ results in nets of size approximately $\frac{12}{\epsilon}$ for nearly all datasets, as expected by our analysis. We note however that in practice setting $c_1$ to 7 gives smaller ε-nets, of size around $\frac{9}{\epsilon}$. See Figure 2 for the dependency of the net size on $c_1$ while setting $\epsilon$ to 0.01. In Table 1 we list ε-net sizes for different values of $\epsilon$ while setting $c_1$ to 12.

Figure 2: ε-net size multiplied by ε for 4 sets, ε = 0.01. (Horizontal axis: $c_1$; vertical axis: ε-net size multiplied by ε; series: MOPSI, KDDCU, Europe, Uniform, Gauss9, Birch3.)
                    ε-net size
Dataset             ε = 0.2    ε = 0.1    ε = 0.01    ε = 0.001
MOPSI Finland       83         128        1226        12011
KDDCUP04Bio         55         118        1176        11902
Europe              69         119        1205        12043
Birch3              58         125        1198        11878
Uniform             70         109        1245        12034
Gauss9              58         120        1275        12011

Table 1: ε-net sizes for various point sets, $c_1 = 12$.

4 Conclusion

In this paper we have improved upon the constants in the previous construction of ε-nets for disks in the plane. Our method gives an efficient practical algorithm for computing such ε-nets, which we have implemented and tested on a variety of data-sets. We conclude with a list of open problems:

• Currently the best known lower-bound is the $2/\epsilon$ bound for halfspaces in $\mathbb{R}^2$. It remains an interesting question to improve this lower-bound, or improve the upper-bounds given in this paper.

• Currently the algorithm of Agarwal and Pan [2] uses a number of heavy tools (dynamic range reporting, dynamic approximate range counting) that hinder an efficient and practical implementation of their algorithm. It would be considerable progress to derive a more practical method with provable guarantees.