Smoothed Analysis of Algorithms: Why the Simplex Algorithm Usually Takes Polynomial Time

DANIEL A. SPIELMAN
Massachusetts Institute of Technology, Boston, Massachusetts

AND

SHANG-HUA TENG
Boston University, Boston, Massachusetts, and Akamai Technologies, Inc.

Abstract. We introduce the smoothed analysis of algorithms, which continuously interpolates between the worst-case and average-case analyses of algorithms. In smoothed analysis, we measure the maximum over inputs of the expected performance of an algorithm under small random perturbations of that input. We measure this performance in terms of both the input size and the magnitude of the perturbations. We show that the simplex algorithm has smoothed complexity polynomial in the input size and the standard deviation of Gaussian perturbations.

Categories and Subject Descriptors: F.2.1 [Analysis of Algorithms and Problems Complexity]: Numerical Algorithms and Problems; G.1.6 [Numerical Analysis]: Optimization - linear programming

General Terms: Algorithms, Theory

Additional Key Words and Phrases: Simplex method, smoothed analysis, complexity, perturbation

A preliminary version of this article was published in the Proceedings of the 33rd Annual ACM Symposium on Theory of Computing (Hersonissos, Crete, Greece, July 6–8). ACM, New York, 2001, pp. 296–305.

D. Spielman's work at M.I.T. was partially supported by an Alfred P. Sloan Foundation Fellowship, NSF grants No. CCR-9701304 and CCR-0112487, and a Junior Faculty Research Leave sponsored by the M.I.T. School of Science.

S.-H. Teng's work was done at the University of Illinois at Urbana-Champaign, Boston University, and while visiting the Department of Mathematics at M.I.T. His work was partially supported by an Alfred P. Sloan Foundation Fellowship, and NSF grant No. CCR:99-72532.

Authors' addresses: D. A. Spielman, Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, e-mail: [email protected]; S.-H. Teng, Department of Computer Science, Boston University, Boston, MA 02215.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or [email protected].

© 2004 ACM 0004-5411/04/0500-0385 $5.00

Journal of the ACM, Vol. 51, No. 3, May 2004, pp. 385–463.

1. Introduction

The Analysis of Algorithms community has been challenged by the existence of remarkable algorithms that are known by scientists and engineers to work well in practice, but whose theoretical analyses are negative or inconclusive. The root of this problem is that algorithms are usually analyzed in one of two ways: by worst-case or average-case analysis. Worst-case analysis can improperly suggest that an algorithm will perform poorly by examining its performance under the most contrived circumstances. Average-case analysis was introduced to provide a less pessimistic measure of the performance of algorithms, and many practical algorithms perform well on the random inputs considered in average-case analysis. However, average-case analysis may be unconvincing as the inputs encountered in many application domains may bear little resemblance to the random inputs that dominate the analysis.

We propose an analysis that we call smoothed analysis which can help explain the success of algorithms that have poor worst-case complexity and whose inputs look sufficiently different from random that average-case analysis cannot be convincingly applied. In smoothed analysis, we measure the performance of an algorithm under slight random perturbations of arbitrary inputs.
In particular, we consider Gaussian perturbations of inputs to algorithms that take real inputs, and we measure the running times of algorithms in terms of their input size and the standard deviation of the Gaussian perturbations.

We show that the simplex method has polynomial smoothed complexity. The simplex method is the classic example of an algorithm that is known to perform well in practice but which takes exponential time in the worst case [Klee and Minty 1972; Murty 1980; Goldfarb and Sit 1979; Goldfarb 1983; Avis and Chvátal 1978; Jeroslow 1973; Amenta and Ziegler 1999]. In the late 1970s and early 1980s the simplex method was shown to converge in expected polynomial time on various distributions of random inputs by researchers including Borgwardt, Smale, Haimovich, Adler, Karp, Shamir, Megiddo, and Todd [Borgwardt 1980; Borgwardt 1977; Smale 1983; Haimovich 1983; Adler et al. 1987; Adler and Megiddo 1985; Todd 1986]. These works introduced novel probabilistic tools to the analysis of algorithms, and provided some intuition as to why the simplex method runs so quickly. However, these analyses are dominated by "random looking" inputs: even if one were to prove very strong bounds on the higher moments of the distributions of running times on random inputs, one could not prove that an algorithm performs well in any particular small neighborhood of inputs.

To bound expected running times on small neighborhoods of inputs, we consider linear programming problems in the form

$$\text{maximize } \mathbf{z}^T \mathbf{x} \quad \text{subject to } \mathbf{A}\mathbf{x} \le \mathbf{y}, \qquad (1)$$

and prove that for every vector $\mathbf{z}$ and every matrix $\bar{\mathbf{A}}$ and vector $\bar{\mathbf{y}}$, the expectation over standard deviation $\sigma \left(\max_i \|(\bar{y}_i, \bar{\mathbf{a}}_i)\|\right)$ Gaussian perturbations $\mathbf{A}$ and $\mathbf{y}$ of $\bar{\mathbf{A}}$ and $\bar{\mathbf{y}}$ of the time taken by a two-phase shadow-vertex simplex method to solve such a linear program is polynomial in $1/\sigma$ and the dimensions of $\mathbf{A}$.

1.1. LINEAR PROGRAMMING AND THE SIMPLEX METHOD. It is difficult to overstate the importance of linear programming to optimization.
Linear programming problems arise in innumerable industrial contexts. Moreover, linear programming is often used as a fundamental step in other optimization algorithms. In a linear programming problem, one is asked to maximize or minimize a linear function over a polyhedral region.

Perhaps one reason we see so many linear programs is that we can solve them efficiently. In 1947, Dantzig introduced the simplex method (see Dantzig [1951]), which was the first practical approach to solving linear programs and which remains widely used today. To state it roughly, the simplex method proceeds by walking from one vertex to another of the polyhedron defined by the inequalities in (1). At each step, it walks to a vertex that is better with respect to the objective function. The algorithm will either determine that the constraints are unsatisfiable, determine that the objective function is unbounded, or reach a vertex from which it cannot make progress, which necessarily optimizes the objective function.

Because of its great importance, other algorithms for linear programming have been invented. Khachiyan [1979] applied the ellipsoid algorithm to linear programming and proved that it always converged in time polynomial in $d$, $n$, and $L$, the number of bits needed to represent the linear program. However, the ellipsoid algorithm has not been competitive with the simplex method in practice. In contrast, the interior-point method introduced by Karmarkar [1984], which also runs in time polynomial in $d$, $n$, and $L$, has performed very well: variations of the interior point method are competitive with and occasionally superior to the simplex method in practice.

In spite of half a century of attempts to unseat it, the simplex method remains the most popular method for solving linear programs. However, there has been no satisfactory theoretical explanation of its excellent performance.
A fascinating approach to understanding the performance of the simplex method has been the attempt to prove that there always exists a short walk from each vertex to the optimal vertex. The Hirsch conjecture states that there should always be a walk of length at most $n - d$. Significant progress on this conjecture was made by Kalai and Kleitman [1992], who proved that there always exists a walk of length at most $n^{\log_2 d + 2}$. However, the existence of such a short walk does not imply that the simplex method will find it.

A simplex method is not completely defined until one specifies its pivot rule: the method by which it decides which vertex to walk to when it has many to choose from. There is no deterministic pivot rule under which the simplex method is known to take a subexponential number of steps. In fact, for almost every deterministic pivot rule there is a family of polytopes on which it is known to take an exponential number of steps [Klee and Minty 1972; Murty 1980; Goldfarb and Sit 1979; Goldfarb 1983; Avis and Chvátal 1978; Jeroslow 1973]. (See Amenta and Ziegler [1999] for a survey and a unified construction of these polytopes.) The best present analysis of randomized pivot rules shows that they take expected time $n^{O(\sqrt{d})}$ [Kalai 1992; Matoušek et al. 1996], which is quite far from the polynomial complexity observed in practice. This inconsistency between the exponential worst-case behavior of the simplex method and its everyday practicality leaves us wanting a more reasonable theoretical analysis.
Various average-case analyses of the simplex method have been performed. Most relevant to this article is the analysis of Borgwardt [1977, 1980], who proved that the simplex method with the shadow vertex pivot rule runs in expected polynomial time for polytopes whose constraints are drawn independently from spherically symmetric distributions (e.g., Gaussian distributions centered at the origin). Independently, Smale [1983, 1982] proved bounds on the expected running time of Lemke's self-dual parametric simplex algorithm on linear programming problems chosen from a spherically-symmetric distribution. Smale's analysis was substantially improved by Megiddo [1986].

While these average-case analyses are significant accomplishments, it is not clear whether they actually provide intuition for what happens on typical inputs. Edelman [1992] writes on this point:

    What is a mistake is to psychologically link a random matrix with the intuitive notion of a "typical" matrix or the vague concept of "any old matrix."

Another model of random linear programs was studied in a line of research initiated independently by Haimovich [1983] and Adler [1983]. Their works considered the maximum over matrices, $\mathbf{A}$, of the expected time taken by parametric simplex methods to solve linear programs over these matrices in which the directions of the inequalities are chosen at random. As this framework considers the maximum of an average, it may be viewed as a precursor to smoothed analysis, the distinction being that the random choice of inequalities cannot be viewed as a perturbation, as different choices yield radically different linear programs. Haimovich and Adler both proved that parametric simplex methods would take an expected linear number of steps to go from the vertex minimizing the objective function to the vertex maximizing the objective function, even conditioned on the program being feasible.
While their theorems confirmed the intuitions of many practitioners, they were geometric rather than algorithmic¹ as it was not clear how an algorithm would locate either vertex. Building on these analyses, Todd [1986], Adler and Megiddo [1985], and Adler et al. [1987] analyzed parametric algorithms for linear programming under this model and proved quadratic bounds on their expected running time. While the random inputs considered in these analyses are not as special as the random inputs obtained from spherically symmetric distributions, the model of randomly flipped inequalities provokes some similar objections.

¹ Our results in Section 4 are analogous to these results.

1.2. SMOOTHED ANALYSIS OF ALGORITHMS AND RELATED WORK. We introduce the smoothed analysis of algorithms in the hope that it will help explain the good practical performance of many algorithms that worst-case analysis does not explain and for which average-case analysis is unconvincing. Our first application of the smoothed analysis of algorithms will be to the simplex method. We will consider the maximum over $\bar{\mathbf{A}}$ and $\bar{\mathbf{y}}$ of the expected running time of the simplex method on inputs of the form

$$\text{maximize } \mathbf{z}^T \mathbf{x} \quad \text{subject to } (\bar{\mathbf{A}} + \mathbf{G})\mathbf{x} \le (\bar{\mathbf{y}} + \mathbf{h}), \qquad (2)$$

where we let $\bar{\mathbf{A}}$ and $\bar{\mathbf{y}}$ be arbitrary and $\mathbf{G}$ and $\mathbf{h}$ be a matrix and a vector of independently chosen Gaussian random variables of mean 0 and standard deviation $\sigma \left(\max_i \|(\bar{y}_i, \bar{\mathbf{a}}_i)\|\right)$. If we let $\sigma$ go to 0, then we obtain the worst-case complexity of the simplex method; whereas, if we let $\sigma$ be so large that $\mathbf{G}$ swamps out $\mathbf{A}$, we obtain the average-case analyzed by Borgwardt. By choosing polynomially small $\sigma$, this analysis combines advantages of worst-case and average-case analysis, and roughly corresponds to the notion of imprecision in low-order digits.

In a smoothed analysis of an algorithm, we assume that the inputs to the algorithm are subject to slight random perturbations, and we measure the complexity of the algorithm in terms of the input size and the standard deviation of the perturbations.
If an algorithm has low smoothed complexity, then one should expect it to work well in practice since most real-world problems are generated from data that is inherently noisy. Another way of thinking about smoothed complexity is to observe that if an algorithm has low smoothed complexity, then one must be unlucky to choose an input instance on which it performs poorly.

We now provide some definitions for the smoothed analysis of algorithms that take real or complex inputs. For an algorithm $A$ and input $\mathbf{x}$, let

$$C_A(\mathbf{x})$$

be a complexity measure of $A$ on input $\mathbf{x}$. Let $X$ be the domain of inputs to $A$, and let $X_n$ be the set of inputs of size $n$. The size of an input can be measured in various ways. Standard measures are the number of real variables contained in the input and the sums of the bit-lengths of the variables. Using this notation, one can say that $A$ has worst-case $C$-complexity $f(n)$ if

$$\max_{\mathbf{x} \in X_n} C_A(\mathbf{x}) = f(n).$$

Given a family of distributions $\mu_n$ on $X_n$, we say that $A$ has average-case $C$-complexity $f(n)$ under $\mu$ if

$$\mathop{\mathbf{E}}_{\mathbf{x} \leftarrow_{\mu_n} X_n} \left[ C_A(\mathbf{x}) \right] = f(n).$$

Similarly, we say that $A$ has smoothed $C$-complexity $f(n, \sigma)$ if

$$\max_{\mathbf{x} \in X_n} \mathop{\mathbf{E}}_{\mathbf{g}} \left[ C_A\left(\mathbf{x} + (\sigma \|\mathbf{x}\|_{\star})\, \mathbf{g}\right) \right] = f(n, \sigma), \qquad (3)$$

where $(\sigma \|\mathbf{x}\|_{\star})\, \mathbf{g}$ is a vector of Gaussian random variables of mean 0 and standard deviation $\sigma \|\mathbf{x}\|_{\star}$ and $\|\mathbf{x}\|_{\star}$ is a measure of the magnitude of $\mathbf{x}$, such as the largest element or the norm. We say that an algorithm has polynomial smoothed complexity if its smoothed complexity is polynomial in $n$ and $1/\sigma$. In Section 6, we present some generalizations of the definition of smoothed complexity that might prove useful.

To further contrast smoothed analysis with average-case analysis, we note that the probability mass in (3) is concentrated in a region of radius $O(\sigma \sqrt{n})$ and volume at most $O(\sigma \sqrt{n})^n$, and so, when $\sigma$ is small, this region contains an exponentially small fraction of the probability mass in an average-case analysis. Thus, even an extension of average-case analysis to higher moments will not imply meaningful bounds on smoothed complexity.
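As a rough illustration of definition (3), the following Python sketch estimates the inner expectation by Monte Carlo sampling for a toy algorithm. Everything in it is an assumption made for illustration and is not taken from the article: the algorithm is insertion sort, the complexity measure is its number of element shifts, the magnitude of the input is taken to be its largest absolute entry, and the maximum over all inputs is replaced by a maximum over a small hand-picked candidate set, so the result is only a lower estimate of the smoothed complexity.

```python
import random


def insertion_sort_shifts(values):
    """Toy complexity measure C_A(x): element shifts made by insertion sort."""
    a = list(values)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            shifts += 1
            j -= 1
        a[j + 1] = key
    return shifts


def expected_cost_under_perturbation(x, sigma, trials, rng):
    """Monte Carlo estimate of E_g[C_A(x + sigma * |x|_max * g)]."""
    scale = sigma * max(abs(v) for v in x)  # assumed magnitude: largest entry
    total = 0
    for _ in range(trials):
        perturbed = [v + rng.gauss(0.0, scale) for v in x]
        total += insertion_sort_shifts(perturbed)
    return total / trials


def smoothed_cost_estimate(candidates, sigma, trials=200, seed=0):
    """Max over a finite candidate set only, hence a lower estimate of (3)."""
    rng = random.Random(seed)
    return max(
        expected_cost_under_perturbation(x, sigma, trials, rng)
        for x in candidates
    )


n = 20
reversed_input = [float(n - i) for i in range(n)]  # worst case for insertion sort
sorted_input = [float(i + 1) for i in range(n)]    # best case for insertion sort
candidates = [reversed_input, sorted_input]
worst_case = n * (n - 1) // 2

for sigma in (0.0, 0.01, 0.1, 1.0, 10.0):
    print(sigma, smoothed_cost_estimate(candidates, sigma))
```

With sigma equal to 0 the estimate coincides with the worst case over the candidates, and as sigma grows the perturbation swamps the input and the estimate moves toward the average-case cost, which mirrors the interpolation described above.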
A discrete analog of smoothed analysis has been studied in a collection of works inspired by Santha and Vazirani's semi-random source model [Santha and Vazirani 1986]. In this model, an adversary generates an input, and each bit of this input has some probability of being flipped. Blum and Spencer [1995] design a polynomial-time algorithm that $k$-colors $k$-colorable graphs generated by this model. Feige and Krauthgamer [1998] analyze a model in which the adversary is more powerful, and use it to show that Turner's algorithm [Turner 1986] for approximating the bandwidth performs well on semi-random inputs. They also improve Turner's analysis. Feige and Kilian [1998] present polynomial-time algorithms that recover large independent sets, $k$-colorings, and optimal bisections in semi-random graphs. They also demonstrate that significantly better results would lead to surprising collapses of complexity classes.

1.3. OUR RESULTS. We consider the maximum over $\mathbf{z}$, $\bar{\mathbf{y}}$, and $\bar{\mathbf{a}}_1, \ldots, \bar{\mathbf{a}}_n$ of the expected time taken by a two-phase shadow vertex simplex method to solve linear programming problems of the form

$$\text{maximize } \mathbf{z}^T \mathbf{x} \quad \text{subject to } \langle \mathbf{a}_i \mid \mathbf{x} \rangle \le y_i, \text{ for } 1 \le i \le n, \qquad (4)$$

where each $\mathbf{a}_i$ is a Gaussian random vector of standard deviation $\sigma \max_i \|(\bar{y}_i, \bar{\mathbf{a}}_i)\|$ centered at $\bar{\mathbf{a}}_i$, and each $y_i$ is a Gaussian random variable of standard deviation $\sigma \max_i \|(\bar{y}_i, \bar{\mathbf{a}}_i)\|$ centered at $\bar{y}_i$.

We begin by considering the case in which $\mathbf{y} = \mathbf{1}$, $\|\bar{\mathbf{a}}_i\| \le 1$, and $\sigma < 1/3\sqrt{d \ln n}$. In this case, our first result, Theorem 4.1, says that for every vector $\mathbf{t}$ the expected size of the shadow of the polytope (the projection of the polytope defined by the equations (4) onto the plane spanned by $\mathbf{t}$ and $\mathbf{z}$) is polynomial in $n$, the dimension, and $1/\sigma$. This result is the geometric foundation of our work, but it does not directly bound the running time of an algorithm, as the shadow relevant to the analysis of an algorithm depends on the perturbed program and cannot be specified beforehand as the vector $\mathbf{t}$ must be.
In Section 3.3, we describe a two-phase shadow-vertex simplex algorithm, and in Section 5, we use Theorem 4.1 as a black box to show that it takes expected time polynomial in $n$, $d$, and $1/\sigma$ in the case described above.

Efforts have been made to analyze how much the solution of a linear program can change as its data is perturbed. For an introduction to such analyses, and an analysis of the complexity of interior point methods in terms of the resulting condition number, we refer the reader to the work of Renegar [1995b, 1995a, 1994].

1.4. INTUITION THROUGH CONDITION NUMBERS. For those already familiar with the simplex method and condition numbers, we include this section to provide some intuition for why our results should be true.

Our analysis will exploit geometric properties of the condition number of a matrix, rather than of a linear program. We start with the observation that if a corner of a polytope is specified by the equation $\mathbf{A}_I \mathbf{x} = \mathbf{y}_I$, where $I$ is a $d$-set, then the condition number of the matrix $\mathbf{A}_I$ provides a good measure of how far the corner is from being flat. Moreover, it is relatively easy to show that if $\mathbf{A}$ is subject to perturbation, then it is unlikely that $\mathbf{A}_I$ has poor condition number. So, it seems intuitive that if $\mathbf{A}$ is perturbed, then most corners of the polytope should have angles bounded away from being flat. This already provides some intuition as to why the simplex method should run quickly: one should make reasonable progress as one rounds a corner if it is not too flat.

There are two difficulties in making the above intuition rigorous: the first is that even if $\mathbf{A}_I$ is well conditioned for most sets $I$, it is not clear that $\mathbf{A}_I$ will be well conditioned for most sets $I$ that are bases of corners of the polytope. The second difficulty is that even if most corners of the polytope have reasonable condition number, it is not clear that a simplex method will actually encounter many of these corners. By analyzing the shadow vertex pivot rule, it is possible to resolve both of these difficulties.
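The claim that a perturbed matrix is unlikely to be badly conditioned can be probed numerically. The Python sketch below is only an illustration under stated assumptions, not the argument used in the article: it restricts attention to 2-by-2 matrices so that the smallest singular value has a closed form, starts from an exactly singular center matrix chosen for the example, adds independent Gaussian noise of standard deviation sigma to each entry, and counts how often the smallest singular value falls below a chosen threshold. The function names and the threshold are hypothetical.

```python
import math
import random


def smallest_singular_value_2x2(a, b, c, d):
    """Smallest singular value of [[a, b], [c, d]], computed as |det| / s_max."""
    frob_sq = a * a + b * b + c * c + d * d   # s_max^2 + s_min^2
    det = a * d - b * c                       # |det| = s_max * s_min
    disc = max(frob_sq * frob_sq - 4.0 * det * det, 0.0)
    s_max = math.sqrt((frob_sq + math.sqrt(disc)) / 2.0)
    return abs(det) / s_max if s_max > 0.0 else 0.0


def fraction_ill_conditioned(center, sigma, threshold, trials=5000, seed=0):
    """Fraction of Gaussian perturbations of `center` with s_min < threshold."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        a, b, c, d = (v + rng.gauss(0.0, sigma) for v in center)
        if smallest_singular_value_2x2(a, b, c, d) < threshold:
            bad += 1
    return bad / trials


# An exactly singular center matrix [[1, 1], [1, 1]], i.e. a "flat" corner.
singular_center = (1.0, 1.0, 1.0, 1.0)
sigma = 0.01

print(smallest_singular_value_2x2(*singular_center))
print(fraction_ill_conditioned(singular_center, sigma, threshold=sigma / 100.0))
```

Although the unperturbed matrix has smallest singular value exactly 0, only a small fraction of the perturbed copies should have a smallest singular value below sigma/100 in this experiment, which is the qualitative behavior the intuition above relies on.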
The first advantage of studying the shadow vertex pivot rule is that its analysis comes down to studying the expected sizes of shadows of the polytope. From the specification of the plane onto which the polytope will be projected, one obtains a characterization of all the corners that will be in the shadow, thereby avoiding the complication of an iterative characterization. The second advantage is that these corners are specified by the property that they optimize a particular objective function, and using this property one can actually bound the probability that they are ill-conditioned. While the results of Section 4 are not stated in these terms, this is the intuition behind them.

Condition numbers also play a fundamental role in our analysis of the shadow-vertex algorithm. The analysis of the algorithm differs from the mere analysis of the sizes of shadows in that, in the study of an algorithm, the plane onto which the polytope is projected depends upon the polytope itself. This correlation of the plane with the polytope complicates the analysis, but is also resolved through the help of condition numbers. In our analysis, we view the perturbation as the composition of two perturbations, where the second is small relative to the first. We show that our choice of the plane onto which we project the shadow is well-conditioned with high probability after the first perturbation. That is, we show that the second perturbation is unlikely to substantially change the plane onto which we project, and therefore unlikely to substantially change the shadow. Thus, it suffices to measure the expected size of the shadow obtained after the second perturbation onto the plane that would have been chosen after just the first perturbation.

The technical lemma that enables this analysis, Lemma 5.3, is a concentration result that proves that it is highly unlikely that almost all of the minors of a random matrix have poor condition number. This analysis also enables us to show that it is highly unlikely that we will need a large "big-M" in phase I of our algorithm.
We note that the condition numbers of the $\mathbf{A}_I$s have been studied before in the complexity of linear programming algorithms. The condition number $\bar{\chi}_A$ of Vavasis and Ye [1996] measures the condition number of the worst submatrix $\mathbf{A}_I$, and their algorithm runs in time proportional to $\ln(\bar{\chi}_A)$. Todd et al. [2001] have shown that for a Gaussian random matrix the expectation of $\ln(\bar{\chi}_A)$ is $O(\min(d \ln n, n))$. That is, they show that it is unlikely that any $\mathbf{A}_I$ is exponentially ill-conditioned. It is relatively simple to apply the techniques of Section 5.1 to obtain a similar result in the smoothed case. We wonder whether our concentration result that it is exponentially unlikely that many $\mathbf{A}_I$ are even polynomially ill-conditioned could be used to obtain a better smoothed analysis of the Vavasis–Ye algorithm.

1.5. DISCUSSION. One can debate whether the definition of polynomial smoothed complexity should be that an algorithm have complexity polynomial in $1/\sigma$ or $\log(1/\sigma)$. We believe that the choice of being polynomial in $1/\sigma$ will prove more useful as the other definition is too strong and quite similar to the notion of being polynomial in the worst case. In particular, one can convert any algorithm for linear programming whose smoothed complexity is polynomial in $d$, $n$ and $\log(1/\sigma)$ into an algorithm whose worst-case complexity is polynomial in $d$, $n$, and $L$. That said, one should certainly prefer complexity bounds that are lower as a function of $1/\sigma$, $d$ and $n$.

We also remark that a simple examination of the constructions that provide exponential lower bounds for various pivot rules [Klee and Minty 1972; Murty 1980; Goldfarb and Sit 1979; Goldfarb 1983; Avis and Chvátal 1978; Jeroslow 1973] reveals that none of these pivot rules have smoothed complexity polynomial in $n$ and subpolynomial in $1/\sigma$. That is, these constructions are unaffected by exponentially small perturbations.

2. Notation and Mathematical Preliminaries

In this section, we define the notation that will be used in the article. We will also review some background from mathematics and derive a few simple statements that we will need. The reader should probably skim this section now, and save a more detailed examination for when the relevant material is referenced.

- $[n]$ denotes the set of integers between 1 and $n$, and $\binom{[n]}{k}$ denotes the subsets of $[n]$ of size $k$.
- Subsets of $[n]$ are denoted by the capital Roman letters $I, J, L, K$. M will denote a subset of integers, and K will denote a set of subsets of $[n]$.
- Subsets of $\mathbb{R}^{\star}$ are denoted by the capital Roman letters $A, B, P, Q, R, S, T, U, V$.
- Vectors in $\mathbb{R}^{\star}$ are denoted by bold lower-case Roman letters, such as $\mathbf{a}_i, \bar{\mathbf{a}}_i, \tilde{\mathbf{a}}_i, \mathbf{b}_i, \mathbf{c}_i, \mathbf{d}_i, \mathbf{h}, \mathbf{t}, \mathbf{q}, \mathbf{z}, \mathbf{y}$.
- Whenever a vector, say $\mathbf{a} \in \mathbb{R}^d$, is present, its components will be denoted by lower-case Roman letters with subscripts, such as $a_1, \ldots, a_d$.
- Whenever a collection of vectors, such as $\mathbf{a}_1, \ldots, \mathbf{a}_n$, are present, the similar bold upper-case letter, such as $\mathbf{A}$, will denote the matrix of these vectors. For $I \in \binom{[n]}{k}$, $\mathbf{A}_I$ will denote the matrix of those $\mathbf{a}_i$ for which $i \in I$.
- Matrices are denoted by bold upper-case Roman letters, such as $\mathbf{A}, \bar{\mathbf{A}}, \tilde{\mathbf{A}}, \mathbf{B}, \mathbf{M}$ and $\mathbf{R}_{\omega}$.
- $S^{d-1}$ denotes the unit sphere in $\mathbb{R}^d$.
- Vectors in $S^{\star}$ will be denoted by bold Greek letters, such as $\boldsymbol{\omega}, \boldsymbol{\psi}, \boldsymbol{\tau}$.
- Generally speaking, univariate quantities with scale, such as lengths or heights, will be represented by lower case Roman letters such as $c$, $h$, $l$, $r$, $s$, and $t$. The principal exceptions are that $\kappa$ and $M$ will also denote such quantities.
- Quantities without scale, such as the ratios of quantities with scale or affine coordinates, will be represented by lower case Greek letters such as $\alpha, \beta, \lambda, \xi, \zeta$. $\boldsymbol{\alpha}$ will denote a vector of such quantities such as $(\alpha_1, \ldots, \alpha_d)$.
- Density functions are denoted by lower case Greek letters such as $\mu$ and $\nu$.
- The standard deviations of Gaussian random variables are denoted by lower-case Greek letters such as $\sigma$, $\tau$ and $\rho$.
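- Indicator random variables are denoted by upper case Roman letters, such as $A$, $B$, $E$, $F$, $V$, $W$, $X$, $Y$, and $Z$.
- Functions into the reals or integers will be denoted by calligraphic upper-case letters, such as $\mathcal{F}, \mathcal{G}, \mathcal{S}^{+}, \mathcal{S}', \mathcal{T}$.
- Functions into $\mathbb{R}^{\star}$ are denoted by upper-case Greek letters, such as $\Phi_{\epsilon}, \Upsilon, \Psi$.
- $\langle \mathbf{x} \mid \mathbf{y} \rangle$ denotes the inner product of vectors $\mathbf{x}$ and $\mathbf{y}$.
- For vectors $\boldsymbol{\omega}$ and $\mathbf{z}$, we let $\mathrm{angle}(\boldsymbol{\omega}, \mathbf{z})$ denote the angle between these vectors at the origin.
- The logarithm base 2 is written $\lg$ and the natural logarithm is written $\ln$.
- The probability of an event $A$ is written $\Pr[A]$, and the expectation of a variable $X$ is written $\mathbf{E}[X]$.
- The indicator random variable for an event $A$ is written $[A]$.

2.1. GEOMETRIC DEFINITIONS. For the following definitions, we let $\mathbf{a}_1, \ldots, \mathbf{a}_k$ denote a set of vectors in $\mathbb{R}^d$.

- $\mathrm{Span}(\mathbf{a}_1, \ldots, \mathbf{a}_k)$ denotes the subspace spanned by $\mathbf{a}_1, \ldots, \mathbf{a}_k$.
- $\mathrm{Aff}(\mathbf{a}_1, \ldots, \mathbf{a}_k)$ denotes the hyperplane that is the affine span of $\mathbf{a}_1, \ldots, \mathbf{a}_k$: the set of points $\sum_i \alpha_i \mathbf{a}_i$, where $\sum_i \alpha_i = 1$, for all $i$.
- $\mathrm{ConvHull}(\mathbf{a}_1, \ldots, \mathbf{a}_k)$ denotes the convex hull of $\mathbf{a}_1, \ldots, \mathbf{a}_k$.
- $\mathrm{Cone}(\mathbf{a}_1, \ldots, \mathbf{a}_k)$ denotes the positive cone through $\mathbf{a}_1, \ldots, \mathbf{a}_k$: the set of points $\sum_i \alpha_i \mathbf{a}_i$, for $\alpha_i \ge 0$.
- $\triangle(\mathbf{a}_1, \ldots, \mathbf{a}_d)$ denotes the simplex $\mathrm{ConvHull}(\mathbf{a}_1, \ldots, \mathbf{a}_d)$.

For a linear program specified by $\mathbf{a}_1, \ldots, \mathbf{a}_n$, $\mathbf{y}$ and $\mathbf{z}$, we will say that the linear program is in general position if

- The points $\mathbf{a}_1, \ldots, \mathbf{a}_n$ are in general position with respect to $\mathbf{y}$, which means that for all $I \in \binom{[n]}{d}$ and $\mathbf{x} = \mathbf{A}_I^{-1} \mathbf{y}_I$, and all $j \notin I$, $\langle \mathbf{a}_j \mid \mathbf{x} \rangle \ne y_j$.
- For all $I \in \binom{[n]}{d-1}$, $\mathbf{z} \notin \mathrm{Cone}(\mathbf{A}_I)$.

Furthermore, we will say that the linear program is in general position with respect to a vector $\mathbf{t}$ if the set of $\lambda$ for which there exists an $I \in \binom{[n]}{d-1}$ such that

$$(1 - \lambda)\mathbf{t} + \lambda \mathbf{z} \in \mathrm{Cone}(\mathbf{A}_I)$$

is finite and does not contain 0.

2.2. VECTOR AND MATRIX NORMS. The material of this section is principally used in Sections 3.3 and 5.1. The following definitions and propositions are standard, and may be found in standard texts on Numerical Linear Algebra.

Definition 2.1 (Vector Norms). For a vector $\mathbf{x}$, we define

- $\|\mathbf{x}\| = \sqrt{\sum_i x_i^2}$.
- $\|\mathbf{x}\|_1 = \sum_i |x_i|$.
- $\|\mathbf{x}\|_{\infty} = \max_i |x_i|$.

PROPOSITION 2.2 (VECTOR NORMS). For a vector $\mathbf{x} \in \mathbb{R}^d$,

$$\|\mathbf{x}\| \le \|\mathbf{x}\|_1 \le \sqrt{d}\, \|\mathbf{x}\|.$$

Definition 2.3 (Matrix Norm). For a matrix $\mathbf{A}$, we define

$$\|\mathbf{A}\| \stackrel{\mathrm{def}}{=} \max_{\mathbf{x}} \|\mathbf{A}\mathbf{x}\| / \|\mathbf{x}\|.$$

PROPOSITION 2.4 (PROPERTIES OF MATRIX NORM). For $d$-by-$d$ matrices $\mathbf{A}$ and $\mathbf{B}$, and a $d$-vector $\mathbf{x}$,

(a) $\|\mathbf{A}\mathbf{x}\| \le \|\mathbf{A}\| \|\mathbf{x}\|$.
(b) $\|\mathbf{A}\mathbf{B}\| \le \|\mathbf{A}\| \|\mathbf{B}\|$.
(c) $\|\mathbf{A}\| = \|\mathbf{A}^T\|$.
(d) $\|\mathbf{A}\| \le \sqrt{d} \max_i \|\mathbf{a}_i\|$, where $\mathbf{A} = (\mathbf{a}_1, \ldots, \mathbf{a}_d)$.
(e) $\det(\mathbf{A}) \le \|\mathbf{A}\|^d$.

Definition 2.5 ($s_{\min}()$). For a matrix $\mathbf{A}$, we define

$$s_{\min}(\mathbf{A}) \stackrel{\mathrm{def}}{=} \left\|\mathbf{A}^{-1}\right\|^{-1}.$$

We recall that $s_{\min}(\mathbf{A})$ is the smallest singular value of the matrix $\mathbf{A}$, and that it is not a norm.

PROPOSITION 2.6 (PROPERTIES OF $s_{\min}()$). For $d$-by-$d$ matrices $\mathbf{A}$ and $\mathbf{B}$,

(a) $s_{\min}(\mathbf{A}) = \min_{\mathbf{x}} \|\mathbf{A}\mathbf{x}\| / \|\mathbf{x}\|$.
(b) $s_{\min}(\mathbf{B}) \ge s_{\min}(\mathbf{A}) - \|\mathbf{A} - \mathbf{B}\|$.

2.3. PROBABILITY. For an event, $A$, we let $[A]$ denote the indicator random variable for the event. We generally describe random variables by their density functions. If $\mathbf{x}$ has density $\mu$, then

$$\Pr[A(\mathbf{x})] \stackrel{\mathrm{def}}{=} \int [A(\mathbf{x})]\, \mu(\mathbf{x})\, d\mathbf{x}.$$

If $B$ is another event, then

$$\Pr_{B}[A(\mathbf{x})] \stackrel{\mathrm{def}}{=} \Pr[A(\mathbf{x}) \mid B(\mathbf{x})] \stackrel{\mathrm{def}}{=} \frac{\int [B(\mathbf{x})][A(\mathbf{x})]\, \mu(\mathbf{x})\, d\mathbf{x}}{\int [B(\mathbf{x})]\, \mu(\mathbf{x})\, d\mathbf{x}}.$$

In a context where multiple densities are present, we will use the notation $\Pr_{\mu}[A(\mathbf{x})]$ to indicate the probability of $A$ when $\mathbf{x}$ is distributed according to $\mu$.

In many situations, we will not know the density $\mu$ of a random variable $\mathbf{x}$, but rather a function $\nu$ such that $\nu(\mathbf{x}) = c\,\mu(\mathbf{x})$ for some constant $c$. In this case, we will say that $\mathbf{x}$ has density proportional to $\nu$.

The following Propositions and Lemmas will play a prominent role in the proofs in this article. The only one of these which might not be intuitively obvious is Lemma 2.11.

PROPOSITION 2.7 (AVERAGE ≤ MAXIMUM). Let $\mu(\mathbf{x}, \mathbf{y})$ be a density function, and let $\mathbf{x}$ and $\mathbf{y}$ be distributed according to $\mu(\mathbf{x}, \mathbf{y})$. If $A(\mathbf{x}, \mathbf{y})$ is an event and $X(\mathbf{x}, \mathbf{y})$ is a random variable, then

$$\Pr_{\mathbf{x}, \mathbf{y}}[A(\mathbf{x}, \mathbf{y})] \le \max_{\mathbf{x}} \Pr_{\mathbf{y}}[A(\mathbf{x}, \mathbf{y})], \quad \text{and}$$

$$\mathop{\mathbf{E}}_{\mathbf{x}, \mathbf{y}}[X(\mathbf{x}, \mathbf{y})] \le \max_{\mathbf{x}} \mathop{\mathbf{E}}_{\mathbf{y}}[X(\mathbf{x}, \mathbf{y})],$$

where in the right-hand terms, $\mathbf{y}$ is distributed in accordance with the induced distribution $\mu(\mathbf{x}, \mathbf{y})$.

PROPOSITION 2.8 (EXPECTATION ON SUBDOMAIN). Let $\mathbf{x}$ be a random variable and $A(\mathbf{x})$ an event. Let $P$ be a measurable subset of the domain of $\mathbf{x}$. Then,

$$\Pr_{\mathbf{x} \in P}[A(\mathbf{x})] \le \Pr[A(\mathbf{x})] / \Pr[\mathbf{x} \in P].$$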
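The subdomain inequality can be checked empirically. The Python sketch below is an illustrative sanity check rather than a proof, and every concrete choice in it is an assumption made for the example: x is a standard Gaussian scalar, the subdomain P is the nonnegative half-line, and the event A is that x has absolute value greater than 1. It reads the left-hand side as the probability of A conditioned on x lying in P, following the conditional-probability notation of Section 2.3.

```python
import random


def check_subdomain_bound(trials=100_000, seed=1):
    """Return (Pr[A | x in P], Pr[A] / Pr[x in P]) estimated from samples."""
    rng = random.Random(seed)
    in_p = 0      # samples with x in P = [0, infinity)
    in_a = 0      # samples for which the event A = {|x| > 1} holds
    in_both = 0   # samples with x in P and A holding
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        p_holds = x >= 0.0
        a_holds = abs(x) > 1.0
        in_p += p_holds
        in_a += a_holds
        in_both += p_holds and a_holds
    conditional = in_both / in_p
    bound = (in_a / trials) / (in_p / trials)
    return conditional, bound


conditional, bound = check_subdomain_bound()
print(conditional, bound)
```

The bound holds for the empirical frequencies as well, because the samples that satisfy both A and the membership in P are a subset of those that satisfy A; the same observation is what makes the inequality hold for the true probabilities.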