Journal of Machine Learning Research 9 (2008) 235-284. Submitted 9/06; Revised 9/07; Published 2/08

Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies

Andreas Krause
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213

Ajit Singh
Machine Learning Department
Carnegie Mellon University
Pittsburgh, PA 15213

Carlos Guestrin
Computer Science Department and Machine Learning Department
Carnegie Mellon University
Pittsburgh, PA 15213

Editor: Chris Williams

Abstract

When monitoring spatial phenomena, which can often be modeled as Gaussian processes (GPs), choosing sensor locations is a fundamental task. There are several common strategies to address this task, for example, geometry or disk models, placing sensors at the points of highest entropy (variance) in the GP model, and A-, D-, or E-optimal design. In this paper, we tackle the combinatorial optimization problem of maximizing the mutual information between the chosen locations and the locations which are not selected. We prove that the problem of finding the configuration that maximizes mutual information is NP-complete. To address this issue, we describe a polynomial-time approximation that is within $(1 - 1/e)$ of the optimum by exploiting the submodularity of mutual information. We also show how submodularity can be used to obtain online bounds, and design branch and bound search procedures. We then extend our algorithm to exploit lazy evaluations and local structure in the GP, yielding significant speedups. We also extend our approach to find placements which are robust against node failures and uncertainties in the model. These extensions are again associated with rigorous theoretical approximation guarantees, exploiting the submodularity of the objective function. We demonstrate the advantages of our approach towards optimizing mutual information in a very extensive empirical study on two real-world data sets.

Keywords: Gaussian processes, experimental design, active learning, spatial learning, sensor networks

© 2008 Andreas Krause, Ajit Singh and Carlos Guestrin.

1. Introduction

When monitoring spatial phenomena, such as temperatures in an indoor environment as shown in Figure 1(a), using a limited number of sensing devices, deciding where to place the sensors is a fundamental task. One approach is to assume that sensors have a fixed sensing radius and to solve the task as an instance of the art-gallery problem (cf. Hochbaum and Maas, 1985; Gonzalez-Banos and Latombe, 2001). In practice, however, this geometric assumption is too strong; sensors make noisy measurements about the nearby environment, and this “sensing area” is not usually characterized by a regular disk, as illustrated by the temperature correlations in Figure 1(b). In addition, note that correlations can be both positive and negative, as shown in Figure 1(c), which again is not well-characterized by a disk model. Fundamentally, the notion that a single sensor needs to predict values in a nearby region is too strong. Often, correlations may be too weak to enable prediction from a single sensor. In other settings, a location may be “too far” from existing sensors to enable good prediction if we only consider one of them, but by combining data from multiple sensors we can obtain accurate predictions. This notion of combination of data from multiple sensors in complex spaces is not easily characterized by existing geometric models.

An alternative approach from spatial statistics (Cressie, 1991; Caselton and Zidek, 1984), making weaker assumptions than the geometric approach, is to use a pilot deployment or expert knowledge to learn a Gaussian process (GP) model for the phenomena, a non-parametric generalization of linear regression that allows for the representation of uncertainty about predictions made over the sensed field.
We can use data from a pilot study or expert knowledge to learn the (hyper-)parameters of this GP. The learned GP model can then be used to predict the effect of placing sensors at particular locations, and thus optimize their positions.¹

Given a GP model, many criteria have been proposed for characterizing the quality of placements, including placing sensors at the points of highest entropy (variance) in the GP model, and A-, D-, or E-optimal design, and mutual information (cf. Shewry and Wynn, 1987; Caselton and Zidek, 1984; Cressie, 1991; Zhu and Stein, 2006; Zimmerman, 2006). A typical sensor placement technique is to greedily add sensors where uncertainty about the phenomena is highest, that is, the highest entropy location of the GP (Cressie, 1991; Shewry and Wynn, 1987). Unfortunately, this criterion suffers from a significant flaw: entropy is an indirect criterion, not considering the prediction quality of the selected placements. The highest entropy set, that is, the sensors that are most uncertain about each other’s measurements, is usually characterized by sensor locations that are as far as possible from each other. Thus, the entropy criterion tends to place sensors along the borders of the area of interest (Ramakrishnan et al., 2005), for example, Figure 4. Since a sensor usually provides information about the area around it, a sensor on the boundary “wastes” sensed information.

An alternative criterion, proposed by Caselton and Zidek (1984), mutual information, seeks to find sensor placements that are most informative about unsensed locations. This optimization criterion directly measures the effect of sensor placements on the posterior uncertainty of the GP. In this paper, we consider the combinatorial optimization problem of selecting placements which maximize this criterion. We first prove that maximizing mutual information is an NP-complete problem. Then, by exploiting the fact that mutual information is a submodular function (cf. Nemhauser et al., 1978), we design the first approximation algorithm that guarantees a constant-factor approximation of the best set of sensor locations in polynomial time.
To the best of our knowledge, no such guarantee exists for any other GP-based sensor placement approach, and for any other criterion. This guarantee holds both for placing a fixed number of sensors, and in the case where each sensor location can have a different cost.

¹ This initial GP is, of course, a rough model, and a sensor placement strategy can be viewed as an inner-loop step for an active learning algorithm (MacKay, 2003). Alternatively, if we can characterize the uncertainty about the parameters of the model, we can explicitly optimize the placements over possible models (Zidek et al., 2000; Zimmerman, 2006; Zhu and Stein, 2006).

Though polynomial, the complexity of our basic algorithm is relatively high: $O(kn^4)$ to select $k$ out of $n$ possible sensor locations. We address this problem in two ways: First, we develop a lazy evaluation technique that exploits submodularity to reduce significantly the number of sensor locations that need to be checked, thus speeding up computation. Second, we show that if we exploit locality in sensing areas by trimming low covariance entries, we reduce the complexity to $O(kn)$.

We furthermore show how the submodularity of mutual information can be used to derive tight online bounds on the solutions obtained by any algorithm. Thus, if an algorithm performs better than our simple proposed approach, our analysis can be used to bound how far the solution obtained by this alternative approach is from the optimal solution. Submodularity and these online bounds also allow us to formulate a mixed integer programming approach to compute the optimal solution using Branch and Bound. Finally, we show how mutual information can be made robust against node failures and model uncertainty, and how submodularity can again be exploited in these settings.

We provide a very extensive experimental evaluation, showing that data-driven placements outperform placements based on geometric considerations only.
We also show that the mutual information criterion leads to improved prediction accuracies with a reduced number of sensors compared to several more commonly considered experimental design criteria, such as an entropy-based criterion, and A-optimal, D-optimal and E-optimal design criteria.

In summary, our main contributions are:

• We tackle the problem of maximizing the information-theoretic mutual information criterion of Caselton and Zidek (1984) for optimizing sensor placements, empirically demonstrating its advantages over more commonly used criteria.
• Even though we prove NP-hardness of the optimization problem, we present a polynomial time approximation algorithm with constant factor approximation guarantee, by exploiting submodularity. To the best of our knowledge, no such guarantee exists for any other GP-based sensor placement approach, and for any other criterion.
• We also show that submodularity provides online bounds for the quality of our solution, which can be used in the development of efficient branch-and-bound search techniques, or to bound the quality of the solutions obtained by other algorithms.
• We provide two practical techniques that significantly speed up the algorithm, and prove that they have no or minimal effect on the quality of the answer.
• We extend our analysis of mutual information to provide theoretical guarantees for placements that are robust against failures of nodes and uncertainties in the model.
• Extensive empirical evaluation of our methods on several real-world sensor placement problems and comparisons with several classical design criteria.

[Figure 1 panels: (a) 54 node sensor network deployment; (b) Temperature correlations; (c) Precipitation correlations]
Figure 1: (a)
A deployment of a sensor network with 54 nodes at the Intel Berkeley Lab. Correlations are often nonstationary as illustrated by (b) temperature data from the sensor network deployment in Figure 1(a), showing the correlation between a sensor placed on the blue square and other possible locations; (c) precipitation data from measurements made across the Pacific Northwest, Figure 11(b).

The paper is organized as follows. In Section 2, we introduce Gaussian Processes. We review the mutual information criterion in Section 3, and describe our approximation algorithm to optimize mutual information in Section 4. Section 5 presents several approaches towards making the optimization more computationally efficient. In Section 6, we discuss how we can extend mutual information to be robust against node failures and uncertainty in the model. Section 7 describes related work, and Section 8 relates our approach to other possible optimization criteria. Section 9 presents our experiments.

2. Gaussian Processes

In this section, we review Gaussian Processes, the probabilistic model for spatial phenomena that forms the basis of our sensor placement algorithms.

[Figure 2 panels: (a) Temperature prediction using GP; (b) Variance of temperature prediction]
Figure 2: Posterior mean and variance of the temperature GP estimated using all sensors: (a) Predicted temperature; (b) predicted variance.

2.1 Modeling Sensor Data Using the Multivariate Normal Distribution

Consider, for example, the sensor network we deployed as shown in Figure 1(a) that measures a temperature field at 54 discrete locations. In order to predict the temperature at one of these locations from the other sensor readings, we need the joint distribution over temperatures at the 54 locations. A simple, yet often effective (cf. Deshpande et al., 2004), approach is to assume that the temperatures have a (multivariate) Gaussian joint distribution.
Denoting the set of locations as $V$, in our sensor network example $|V| = 54$, we have a set of $n = |V|$ corresponding random variables $X_V$ with joint distribution:

$$P(X_V = x_V) = \frac{1}{(2\pi)^{n/2} |\Sigma_{VV}|^{1/2}} \exp\left(-\frac{1}{2} (x_V - \mu_V)^T \Sigma_{VV}^{-1} (x_V - \mu_V)\right),$$

where $\mu_V$ is the mean vector and $\Sigma_{VV}$ is the covariance matrix. Interestingly, if we consider a subset, $A \subseteq V$, of our random variables, denoted by $X_A$, then their joint distribution is also Gaussian.

2.2 Modeling Sensor Data Using Gaussian Processes

In our sensor network example, we are not just interested in temperatures at sensed locations, but also at locations where no sensors were placed. In such cases, we can use regression techniques to perform prediction (Golub and Van Loan, 1989; Hastie et al., 2003). Although linear regression often gives excellent predictions, there is usually no notion of uncertainty about these predictions; for example, in Figure 1(a), we are likely to have better temperature estimates at points near existing sensors than in the two central areas that were not instrumented. A Gaussian process (GP) is a natural generalization of linear regression that allows us to consider uncertainty about predictions.

[Figure 3 panels: (a) Example kernel function; (b) Data from the empirical covariance matrix. Both panels mark the sensor location.]
Figure 3: Example kernel function learned from the Berkeley Lab temperature data: (a) learned covariance function $K(x, \cdot)$, where $x$ is the location of sensor 41; (b) “ground truth”, interpolated empirical covariance values for the same sensors. Observe the close match between predicted and measured covariances.

Intuitively, a GP generalizes multivariate Gaussians to an infinite number of random variables. In analogy to the multivariate Gaussian above where the index set $V$ was finite, we now have a (possibly uncountably) infinite index set $V$. In our temperature example, $V$ would be a subset of $\mathbb{R}^2$, and each index would correspond to a position in the lab.
GPs have been widely studied (cf. MacKay, 2003; Paciorek, 2003; Seeger, 2004; O’Hagan, 1978; Shewry and Wynn, 1987; Lindley and Smith, 1972), and generalize Kriging estimators commonly used in geostatistics (Cressie, 1991).

An important property of GPs is that for every finite subset $A$ of the indices $V$, which we can think about as locations in the plane, the joint distribution over the corresponding random variables $X_A$ is Gaussian, for example, the joint distribution over temperatures at a finite number of sensor locations is Gaussian. In order to specify this distribution, a GP is associated with a mean function $M(\cdot)$, and a symmetric positive-definite kernel function $K(\cdot, \cdot)$, often called the covariance function. For each random variable with index $u \in V$, its mean $\mu_u$ is given by $M(u)$. Analogously, for each pair of indices $u, v \in V$, their covariance $\sigma_{uv}$ is given by $K(u, v)$. For simplicity of notation, we denote the mean vector of some set of variables $X_A$ by $\mu_A$, where the entry for element $u$ of $\mu_A$ is $M(u)$. Similarly, we denote their covariance matrix by $\Sigma_{AA}$, where the entry for $u, v$ is $K(u, v)$.

The GP representation is extremely powerful. For example, if we observe a set of sensor measurements $X_A = x_A$ corresponding to the finite subset $A \subset V$, we can predict the value at any point $y \in V$ conditioned on these measurements, $P(X_y \mid x_A)$. The distribution of $X_y$ given these observations is a Gaussian whose conditional mean $\mu_{y|A}$ and variance $\sigma^2_{y|A}$ are given by:

$$\mu_{y|A} = \mu_y + \Sigma_{yA} \Sigma_{AA}^{-1} (x_A - \mu_A), \tag{1}$$

$$\sigma^2_{y|A} = K(y, y) - \Sigma_{yA} \Sigma_{AA}^{-1} \Sigma_{Ay}, \tag{2}$$

where $\Sigma_{yA}$ is a covariance vector with one entry for each $u \in A$ with value $K(y, u)$, and $\Sigma_{Ay} = \Sigma_{yA}^T$. Figure 2(a) and Figure 2(b) show the posterior mean and variance derived using these equations on 54 sensors at Intel Labs Berkeley. Note that two areas in the center of the lab were not instrumented. These areas have higher posterior variance, as expected. An important property of GPs is that the posterior variance (2) does not depend on the actual observed values $x_A$.
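
To make Equations (1) and (2) concrete, the following is a minimal Python sketch; it is an illustration only, not the implementation used for the experiments in this paper. The function names (`sq_exp_kernel`, `solve`, `gp_posterior`), the squared-exponential kernel, the zero default mean function, and the small `jitter` added to the diagonal of $\Sigma_{AA}$ for numerical stability are all assumptions introduced for this example.

```python
import math

def sq_exp_kernel(u, v, length_scale=1.0):
    """Illustrative isotropic kernel: exp(-||u - v||^2 / length_scale^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / length_scale ** 2)

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def gp_posterior(y, locations, observations, kernel,
                 mean=lambda u: 0.0, jitter=1e-9):
    """Return (mu_{y|A}, sigma^2_{y|A}) as in Equations (1) and (2).

    `jitter` is a hypothetical stabilizer added to the diagonal of Sigma_AA;
    it is not part of the equations themselves.
    """
    cov_AA = [[kernel(u, v) + (jitter if i == j else 0.0)
               for j, v in enumerate(locations)]
              for i, u in enumerate(locations)]
    cov_yA = [kernel(y, u) for u in locations]
    residual = [x - mean(u) for x, u in zip(observations, locations)]
    # Sigma_AA is symmetric, so Sigma_yA Sigma_AA^{-1} equals the transpose
    # of weights = Sigma_AA^{-1} Sigma_Ay.
    weights = solve(cov_AA, cov_yA)
    post_mean = mean(y) + sum(w * r for w, r in zip(weights, residual))
    post_var = kernel(y, y) - sum(w * c for w, c in zip(weights, cov_yA))
    return post_mean, post_var
```

Evaluating `gp_posterior` at the same query point with two different observation vectors returns different means but an identical variance, because the variance computation never reads `observations`.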
Thus, for a given kernel function, the variances in Figure 2(b) will not depend on the observed temperatures.

2.3 Nonstationarity

In order to compute predictive distributions using (1) and (2), the mean and kernel functions have to be known. The mean function can usually be estimated using regression techniques. Estimating kernel functions is difficult, and usually, strongly limiting assumptions are made. For example, it is commonly assumed that the kernel $K(u, v)$ is stationary, which means that the kernel depends only on the difference between the locations, considered as vectors $v$, $u$, that is, $K(u, v) = K_\theta(u - v)$. Hereby, $\theta$ is a set of parameters. Very often, the kernel is even assumed to be isotropic, which means that the covariance only depends on the distance between locations, that is, $K(u, v) = K_\theta(\|u - v\|_2)$. Common choices for isotropic kernels are the exponential kernel, $K_\theta(\delta) = \exp(-|\delta| / \theta)$, and the Gaussian kernel, $K_\theta(\delta) = \exp(-\delta^2 / \theta^2)$. These assumptions are frequently strongly violated in practice, as illustrated in the real sensor data shown in Figures 1(b) and 1(c). In Section 8.1, we discuss how placements optimized from models with isotropic kernels reduce to geometric covering and packing problems.

In this paper, we do not assume that $K(\cdot, \cdot)$ is stationary or isotropic. Our approach is general, and can use any kernel function. In our experiments, we use the approach of Nott and Dunsmuir (2002) to estimate nonstationary kernels from data collected by an initial deployment. More specifically, their assumption is that an estimate of the empirical covariance $\Sigma_{AA}$ at a set of observed locations is available, and that the process can be locally described by a collection of isotropic processes, associated with a set of reference points. An example of a kernel function estimated using this method is presented in Figure 3(a). In Section 9.2, we show that placements based on such nonstationary GPs lead to far better prediction accuracies than those obtained from isotropic kernels.

3. Optimizing Sensor Placements

Usually, we are limited to deploying a small number of sensors, and thus must carefully choose where to place them. In spatial statistics this optimization is called sampling or experimental design: finding the $k$ best sensor locations out of a finite subset $V$ of possible locations, for example, out of a grid discretization of $\mathbb{R}^2$.

3.1 The Entropy Criterion

We first have to define what a good design is. Intuitively, we want to place sensors which are most informative with respect to the entire design space. A natural notion of uncertainty is the conditional entropy of the unobserved locations $V \setminus A$ after placing sensors at locations $A$,

$$H(X_{V \setminus A} \mid X_A) = -\int p(x_{V \setminus A}, x_A) \log p(x_{V \setminus A} \mid x_A) \, dx_{V \setminus A} \, dx_A, \tag{3}$$

where we use $X_A$ and $X_{V \setminus A}$ to refer to sets of random variables at the locations $A$ and $V \setminus A$. Intuitively, minimizing this quantity aims at finding the placement which results in the lowest uncertainty about all uninstrumented locations $V \setminus A$ after observing the placed sensors $A$. A good placement would therefore minimize this conditional entropy, that is, we want to find

$$A^* = \operatorname*{argmin}_{A \subset V : |A| = k} H(X_{V \setminus A} \mid X_A).$$

Using the identity $H(X_{V \setminus A} \mid X_A) = H(X_V) - H(X_A)$, we can see that

$$A^* = \operatorname*{argmin}_{A \subset V : |A| = k} H(X_{V \setminus A} \mid X_A) = \operatorname*{argmax}_{A \subset V : |A| = k} H(X_A).$$

So we can see that we need to find a set of sensors $A$ which is most uncertain about each other. Unfortunately, this optimization problem, often also referred to as D-optimal design in the experiment design literature (cf. Currin et al., 1991), has been shown to be NP-hard (Ko et al., 1995):

Theorem 1 (Ko et al., 1995) Given rational $M$ and rational covariance matrix $\Sigma_{VV}$ over Gaussian random variables $V$, deciding whether there exists a subset $A \subseteq V$ of cardinality $k$ such that $H(X_A) \geq M$ is NP-complete.

Therefore, the following greedy heuristic has found common use (McKay et al., 1979; Cressie, 1991): One starts from an empty set of locations, $A_0 = \emptyset$, and greedily adds placements until $|A| = k$.
At each iteration, starting with set $A_i$, the greedy rule used is to add the location $y^*_H \in V \setminus A$ that has highest conditional entropy,

$$y^*_H = \operatorname*{argmax}_{y} H(X_y \mid X_{A_i}), \tag{4}$$

that is, the location we are most uncertain about given the sensors placed thus far. If the set of selected locations at iteration $i$ is $A_i = \{y_1, \dots, y_i\}$, using the chain-rule of entropies, we have that:

$$H(X_{A_i}) = H(X_{y_i} \mid X_{A_{i-1}}) + \dots + H(X_{y_2} \mid X_{A_1}) + H(X_{y_1} \mid X_{A_0}).$$

Note that the (differential) entropy of a Gaussian random variable $X_y$ conditioned on some set of variables $X_A$ is a monotonic function of its variance:

$$H(X_y \mid X_A) = \frac{1}{2} \log\left(2 \pi e \, \sigma^2_{X_y | X_A}\right) = \frac{1}{2} \log \sigma^2_{X_y | X_A} + \frac{1}{2} (\log(2\pi) + 1), \tag{5}$$

which can be computed in closed form using Equation (2). Since for a fixed kernel function, the variance does not depend on the observed values, this optimization can be done before deploying the sensors, that is, a sequential, closed-loop design taking into account previous measurements bears no advantages over an open-loop design, performed before any measurements are made.

3.2 An Improved Design Criterion: Mutual Information

The entropy criterion described above is intuitive for finding sensor placements, since the sensors that are most uncertain about each other should cover the space well. Unfortunately, this entropy criterion suffers from the problem shown in Figure 4, where sensors are placed far apart along the boundary of the space. Since we expect predictions made from a sensor measurement to be most precise in a region around it, such placements on the boundary are likely to “waste” information. This phenomenon has been noticed previously by Ramakrishnan et al. (2005), who proposed a weighting heuristic. Intuitively, this problem arises because the entropy criterion is indirect: the criterion only considers the entropy of the selected sensor locations, rather than considering prediction quality over the space of interest.
This indirect quality of the entropy criterion is surprising, since the criterion was derived from the “predictive” formulation $H(V \setminus A \mid A)$ in Equation (3), which is equivalent to maximizing $H(A)$.

Figure 4: An example of placements chosen using entropy and mutual information criteria on a subset of the temperature data from the Intel deployment. Diamonds indicate the positions chosen using entropy; squares the positions chosen using MI.

Caselton and Zidek (1984) proposed a different optimization criterion, which searches for the subset of sensor locations that most significantly reduces the uncertainty about the estimates in the rest of the space. More formally, we consider our space as a discrete set of locations $V = S \cup U$ composed of two parts: a set $S$ of possible positions where we can place sensors, and another set $U$ of positions of interest, where no sensor placements are possible. The goal is to place a set of $k$ sensors that will give us good predictions at all uninstrumented locations $V \setminus A$. Specifically, we want to find

$$A^* = \operatorname*{argmax}_{A \subseteq S : |A| = k} H(X_{V \setminus A}) - H(X_{V \setminus A} \mid X_A),$$

that is, the set $A^*$ that maximally reduces the entropy over the rest of the space $V \setminus A^*$. Note that this criterion $H(X_{V \setminus A}) - H(X_{V \setminus A} \mid X_A)$ is equivalent to finding the set that maximizes the mutual information $I(X_A; X_{V \setminus A})$ between the locations $A$ and the rest of the space $V \setminus A$. In their follow-up work, Caselton et al. (1992) and Zidek et al. (2000) argue against the use of mutual information in a setting where the entropy $H(X_A)$ in the observed locations constitutes a significant part of the total uncertainty $H(X_V)$. Caselton et al. (1992) also argue that, in order to compute $MI(A)$, one needs an accurate model of $P(X_V)$. Since then, the entropy criterion has been dominantly used as a placement criterion. Nowadays, however, the estimation of complex nonstationary models for $P(X_V)$, as well as computational aspects, are very well understood and handled. Furthermore, we
Furthermore, we V showempirically,thateveninthesensorselectioncase,mutualinformationoutperformsentropyon severalpracticalplacementproblems. On the same simple example in Figure 4, this mutual information criterion leads to intuitively appropriate central sensor placements that do not have the “wasted information” property of the entropy criterion. Our experimental results in Section 9 further demonstrate the advantages in performance of the mutual information criterion. For simplicity of notation, we will often use MI(A)=I(X ;X )todenotethemutualinformationobjectivefunction. Noticethatinthisno- A V A n 243 KRAUSE,SINGHANDGUESTRIN 9 Upper bound 8 on optimal solution on7 ati Greedy orm6 Best solution nf solution al i5 found u Mut4 3 2 1 2 3 4 5 Number of sensors placed Figure5: Comparisonof the greedyalgorithmwith the optimalsolutionson a small problem. We selectfrom1to5sensorlocationsoutof16,ontheIntelBerkeleytemperaturedatasetas discussedinSection9. Thegreedyalgorithmisalwayswithin95percentoftheoptimal solution. tation the process X and the set of locations V is implicit. We will also write H(A) instead of H(X ). A Themutualinformationisalsohardtooptimize: Theorem2 GivenrationalM andarationalcovariancematrixS overGaussianrandomvari- VV ablesV =S U,decidingwhetherthereexistsasubsetA S ofcardinalityksuchthatMI(A) M [ (cid:18) (cid:21) isNP-complete. Proofs of all results are given in Appendix A. Due to the problem complexity, we cannot expect to find optimal solutions in polynomial time. However, if we implement the simple greedy algo- rithmforthemutualinformationcriterion(detailsgivenbelow),andoptimizedesignsonreal-world placementproblems,weseethatthegreedyalgorithmgivesalmostoptimalsolutions,aspresented inFigure5. Inthissmallexample,wherewecouldcomputetheoptimalsolution,theperformance of the greedy algorithm was at most five percent worse than the optimal solution. 
In the follow- ing sections, we will give theoretical bounds and empirical evidence justifying this near-optimal behavior. 4. ApproximationAlgorithm OptimizingthemutualinformationcriterionisanNP-completeproblem. Wenowdescribeapoly- nomialtimealgorithmwithaconstant-factorapproximationguarantee. 4.1 TheAlgorithm Our algorithm is greedy, simply adding sensors in sequence, choosing the next sensor which pro- vides the maximum increase in mutual information. More formally, using MI(A)=I(X ;X ), A V A n 244

