Journal of Machine Learning Research 9 (2008) 235-284. Submitted 9/06; Revised 9/07; Published 2/08
Near-Optimal Sensor Placements in Gaussian Processes:
Theory, Efficient Algorithms and Empirical Studies
Andreas Krause KRAUSEA@CS.CMU.EDU
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213

Ajit Singh AJIT@CS.CMU.EDU
Machine Learning Department
Carnegie Mellon University
Pittsburgh, PA 15213

Carlos Guestrin GUESTRIN@CS.CMU.EDU
Computer Science Department and Machine Learning Department
Carnegie Mellon University
Pittsburgh, PA 15213

Editor: Chris Williams
Abstract
When monitoring spatial phenomena, which can often be modeled as Gaussian processes (GPs), choosing sensor locations is a fundamental task. There are several common strategies to address this task, for example, geometry or disk models, placing sensors at the points of highest entropy (variance) in the GP model, and A-, D-, or E-optimal design. In this paper, we tackle the combinatorial optimization problem of maximizing the mutual information between the chosen locations and the locations which are not selected. We prove that the problem of finding the configuration that maximizes mutual information is NP-complete. To address this issue, we describe a polynomial-time approximation that is within (1 − 1/e) of the optimum by exploiting the submodularity of mutual information. We also show how submodularity can be used to obtain online bounds, and design branch and bound search procedures. We then extend our algorithm to exploit lazy evaluations and local structure in the GP, yielding significant speedups. We also extend our approach to find placements which are robust against node failures and uncertainties in the model. These extensions are again associated with rigorous theoretical approximation guarantees, exploiting the submodularity of the objective function. We demonstrate the advantages of our approach towards optimizing mutual information in a very extensive empirical study on two real-world data sets.

Keywords: Gaussian processes, experimental design, active learning, spatial learning, sensor networks
1. Introduction
When monitoring spatial phenomena, such as temperatures in an indoor environment as shown
in Figure 1(a), using a limited number of sensing devices, deciding where to place the sensors is
a fundamental task. One approach is to assume that sensors have a fixed sensing radius and to solve the task as an instance of the art-gallery problem (cf. Hochbaum and Maas, 1985; Gonzalez-Banos and Latombe, 2001). In practice, however, this geometric assumption is too strong; sensors make noisy measurements about the nearby environment, and this "sensing area" is not usually characterized by a regular disk, as illustrated by the temperature correlations in Figure 1(b). In addition, note that correlations can be both positive and negative, as shown in Figure 1(c), which again is not well-characterized by a disk model. Fundamentally, the notion that a single sensor needs to predict values in a nearby region is too strong. Often, correlations may be too weak to enable prediction from a single sensor. In other settings, a location may be "too far" from existing sensors to enable good prediction if we only consider one of them, but by combining data from multiple sensors we can obtain accurate predictions. This notion of combination of data from multiple sensors in complex spaces is not easily characterized by existing geometric models.
An alternative approach from spatial statistics (Cressie, 1991; Caselton and Zidek, 1984), making weaker assumptions than the geometric approach, is to use a pilot deployment or expert knowledge to learn a Gaussian process (GP) model for the phenomena, a non-parametric generalization of linear regression that allows for the representation of uncertainty about predictions made over the sensed field. We can use data from a pilot study or expert knowledge to learn the (hyper-)parameters of this GP. The learned GP model can then be used to predict the effect of placing sensors at particular locations, and thus optimize their positions.¹
Given a GP model, many criteria have been proposed for characterizing the quality of placements, including placing sensors at the points of highest entropy (variance) in the GP model, A-, D-, or E-optimal design, and mutual information (cf. Shewry and Wynn, 1987; Caselton and Zidek, 1984; Cressie, 1991; Zhu and Stein, 2006; Zimmerman, 2006). A typical sensor placement technique is to greedily add sensors where uncertainty about the phenomena is highest, that is, the highest entropy location of the GP (Cressie, 1991; Shewry and Wynn, 1987). Unfortunately, this criterion suffers from a significant flaw: entropy is an indirect criterion, not considering the prediction quality of the selected placements. The highest entropy set, that is, the sensors that are most uncertain about each other's measurements, is usually characterized by sensor locations that are as far as possible from each other. Thus, the entropy criterion tends to place sensors along the borders of the area of interest (Ramakrishnan et al., 2005); see, for example, Figure 4. Since a sensor usually provides information about the area around it, a sensor on the boundary "wastes" sensed information.
An alternative criterion, proposed by Caselton and Zidek (1984), mutual information, seeks to find sensor placements that are most informative about unsensed locations. This optimization criterion directly measures the effect of sensor placements on the posterior uncertainty of the GP. In this paper, we consider the combinatorial optimization problem of selecting placements which maximize this criterion. We first prove that maximizing mutual information is an NP-complete problem. Then, by exploiting the fact that mutual information is a submodular function (cf. Nemhauser et al., 1978), we design the first approximation algorithm that guarantees a constant-factor approximation of the best set of sensor locations in polynomial time. To the best of our knowledge, no such guarantee exists for any other GP-based sensor placement approach, and for any other criterion. This guarantee
1. This initial GP is, of course, a rough model, and a sensor placement strategy can be viewed as an inner-loop step for an active learning algorithm (MacKay, 2003). Alternatively, if we can characterize the uncertainty about the parameters of the model, we can explicitly optimize the placements over possible models (Zidek et al., 2000; Zimmerman, 2006; Zhu and Stein, 2006).
holds both for placing a fixed number of sensors, and in the case where each sensor location can
have a different cost.
Though polynomial, the complexity of our basic algorithm is relatively high: $O(kn^4)$ to select $k$ out of $n$ possible sensor locations. We address this problem in two ways: First, we develop a lazy evaluation technique that exploits submodularity to significantly reduce the number of sensor locations that need to be checked, thus speeding up computation. Second, we show that if we exploit locality in sensing areas by trimming low covariance entries, we reduce the complexity to $O(kn)$.
We furthermore show how the submodularity of mutual information can be used to derive tight
online bounds on the solutions obtained by any algorithm. Thus, if an algorithm performs better
than our simple proposed approach, our analysis can be used to bound how far the solution ob-
tained by this alternative approach is from the optimal solution. Submodularity and these online
bounds also allow us to formulate a mixed integer programming approach to compute the optimal
solution using Branch and Bound. Finally, we show how mutual information can be made robust
againstnodefailuresandmodeluncertainty,andhowsubmodularitycanagainbeexploitedinthese
settings.
We provide a very extensive experimental evaluation, showing that data-driven placements outperform placements based on geometric considerations only. We also show that the mutual information criterion leads to improved prediction accuracies with a reduced number of sensors compared to several more commonly considered experimental design criteria, such as an entropy-based criterion, and A-optimal, D-optimal and E-optimal design criteria.
In summary, our main contributions are:
• We tackle the problem of maximizing the information-theoretic mutual information criterion of Caselton and Zidek (1984) for optimizing sensor placements, empirically demonstrating its advantages over more commonly used criteria.

• Even though we prove NP-hardness of the optimization problem, we present a polynomial time approximation algorithm with constant factor approximation guarantee, by exploiting submodularity. To the best of our knowledge, no such guarantee exists for any other GP-based sensor placement approach, and for any other criterion.

• We also show that submodularity provides online bounds for the quality of our solution, which can be used in the development of efficient branch-and-bound search techniques, or to bound the quality of the solutions obtained by other algorithms.

• We provide two practical techniques that significantly speed up the algorithm, and prove that they have no or minimal effect on the quality of the answer.

• We extend our analysis of mutual information to provide theoretical guarantees for placements that are robust against failures of nodes and uncertainties in the model.

• Extensive empirical evaluation of our methods on several real-world sensor placement problems and comparisons with several classical design criteria.
(a) 54-node sensor network deployment    (b) Temperature correlations    (c) Precipitation correlations

Figure 1: (a) A deployment of a sensor network with 54 nodes at the Intel Berkeley Lab. Correlations are often nonstationary as illustrated by (b) temperature data from the sensor network deployment in Figure 1(a), showing the correlation between a sensor placed on the blue square and other possible locations; (c) precipitation data from measurements made across the Pacific Northwest, Figure 11(b).
The paper is organized as follows. In Section 2, we introduce Gaussian processes. We review the mutual information criterion in Section 3, and describe our approximation algorithm to optimize mutual information in Section 4. Section 5 presents several approaches towards making the optimization more computationally efficient. In Section 6, we discuss how we can extend mutual information to be robust against node failures and uncertainty in the model. Section 7 describes related work, and Section 8 relates our approach to other possible optimization criteria. Section 9 presents our experiments.
2. Gaussian Processes
In this section, we review Gaussian processes, the probabilistic model for spatial phenomena that forms the basis of our sensor placement algorithms.
(a) Temperature prediction using GP    (b) Variance of temperature prediction

Figure 2: Posterior mean and variance of the temperature GP estimated using all sensors: (a) predicted temperature; (b) predicted variance.
2.1 Modeling Sensor Data Using the Multivariate Normal Distribution
Consider, for example, the sensor network we deployed as shown in Figure 1(a) that measures a temperature field at 54 discrete locations. In order to predict the temperature at one of these locations from the other sensor readings, we need the joint distribution over temperatures at the 54 locations. A simple, yet often effective (cf. Deshpande et al., 2004), approach is to assume that the temperatures have a (multivariate) Gaussian joint distribution. Denoting the set of locations as $V$, in our sensor network example $|V| = 54$, we have a set of $n = |V|$ corresponding random variables $X_V$ with joint distribution

$$P(X_V = x_V) = \frac{1}{(2\pi)^{n/2}\,|\Sigma_{VV}|^{1/2}}\, e^{-\frac{1}{2}(x_V - \mu_V)^T \Sigma_{VV}^{-1} (x_V - \mu_V)},$$

where $\mu_V$ is the mean vector and $\Sigma_{VV}$ is the covariance matrix. Interestingly, if we consider a subset, $A \subseteq V$, of our random variables, denoted by $X_A$, then their joint distribution is also Gaussian.
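To make this concrete, here is a small numpy sketch with a hypothetical three-location mean and covariance (values are illustrative, not from the data set), showing that restricting the joint Gaussian to a subset $A$ simply selects the corresponding sub-vector and sub-matrix:

```python
import numpy as np

# Hypothetical toy example: n = 3 locations with an illustrative mean and covariance.
mu_V = np.array([20.0, 21.0, 19.5])
Sigma_VV = np.array([[1.0, 0.6, 0.2],
                     [0.6, 1.0, 0.4],
                     [0.2, 0.4, 1.0]])

# The marginal distribution of the subset A is Gaussian with the
# corresponding entries of mu_V and the corresponding block of Sigma_VV.
A = [0, 2]
mu_A = mu_V[A]
Sigma_AA = Sigma_VV[np.ix_(A, A)]
print(mu_A)       # mean of X_A
print(Sigma_AA)   # covariance of X_A
```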
2.2 Modeling Sensor Data Using Gaussian Processes
In our sensor network example, we are not just interested in temperatures at sensed locations, but also at locations where no sensors were placed. In such cases, we can use regression techniques to perform prediction (Golub and Van Loan, 1989; Hastie et al., 2003). Although linear regression often gives excellent predictions, there is usually no notion of uncertainty about these predictions, for example, for Figure 1(a), we are likely to have better temperature estimates at points near existing sensors, than in the two central areas that were not instrumented. A Gaussian process (GP) is a natural generalization of linear regression that allows us to consider uncertainty about predictions.

Intuitively, a GP generalizes multivariate Gaussians to an infinite number of random variables. In analogy to the multivariate Gaussian above where the index set $V$ was finite, we now have a (possibly uncountably) infinite index set $V$. In our temperature example, $V$ would be a subset of $\mathbb{R}^2$, and
(a) Example kernel function    (b) Data from the empirical covariance matrix

Figure 3: Example kernel function learned from the Berkeley Lab temperature data: (a) learned covariance function $K(x, \cdot)$, where $x$ is the location of sensor 41; (b) "ground truth", interpolated empirical covariance values for the same sensors. Observe the close match between predicted and measured covariances.
each index would correspond to a position in the lab. GPs have been widely studied (cf. MacKay, 2003; Paciorek, 2003; Seeger, 2004; O'Hagan, 1978; Shewry and Wynn, 1987; Lindley and Smith, 1972), and generalize Kriging estimators commonly used in geostatistics (Cressie, 1991).

An important property of GPs is that for every finite subset $A$ of the indices $V$, which we can think about as locations in the plane, the joint distribution over the corresponding random variables $X_A$ is Gaussian, for example, the joint distribution over temperatures at a finite number of sensor locations is Gaussian. In order to specify this distribution, a GP is associated with a mean function $M(\cdot)$, and a symmetric positive-definite kernel function $K(\cdot,\cdot)$, often called the covariance function. For each random variable with index $u \in V$, its mean $\mu_u$ is given by $M(u)$. Analogously, for each pair of indices $u, v \in V$, their covariance $\sigma_{uv}$ is given by $K(u,v)$. For simplicity of notation, we denote the mean vector of some set of variables $X_A$ by $\mu_A$, where the entry for element $u$ of $\mu_A$ is $M(u)$. Similarly, we denote their covariance matrix by $\Sigma_{AA}$, where the entry for $u, v$ is $K(u,v)$.
The GP representation is extremely powerful. For example, if we observe a set of sensor measurements $X_A = x_A$ corresponding to the finite subset $A \subset V$, we can predict the value at any point $y \in V$ conditioned on these measurements, $P(X_y \mid x_A)$. The distribution of $X_y$ given these observations is a Gaussian whose conditional mean $\mu_{y\mid A}$ and variance $\sigma^2_{y\mid A}$ are given by:

$$\mu_{y\mid A} = \mu_y + \Sigma_{yA} \Sigma_{AA}^{-1} (x_A - \mu_A), \qquad (1)$$
$$\sigma^2_{y\mid A} = K(y,y) - \Sigma_{yA} \Sigma_{AA}^{-1} \Sigma_{Ay}, \qquad (2)$$

where $\Sigma_{yA}$ is a covariance vector with one entry for each $u \in A$ with value $K(y,u)$, and $\Sigma_{Ay} = \Sigma_{yA}^T$. Figure 2(a) and Figure 2(b) show the posterior mean and variance derived using these equations on 54 sensors at Intel Labs Berkeley. Note that two areas in the center of the lab were not instrumented. These areas have higher posterior variance, as expected. An important property of GPs is that the posterior variance (2) does not depend on the actual observed values $x_A$. Thus, for a given kernel function, the variances in Figure 2(b) will not depend on the observed temperatures.
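As a concrete illustration, the following is a minimal numpy sketch of Equations (1) and (2). The function name and the assumption of a precomputed dense mean vector and covariance matrix over all locations are ours, not part of the paper:

```python
import numpy as np

def gp_posterior(y, A, x_A, mu, Sigma):
    """Posterior mean and variance of X_y given observations x_A at a
    non-empty index set A, following Equations (1) and (2).
    mu, Sigma: prior mean vector and covariance matrix over all locations V."""
    A = list(A)
    Sigma_AA = Sigma[np.ix_(A, A)]   # covariance among observed locations
    Sigma_yA = Sigma[y, A]           # covariance vector between y and A

    # Solve the linear systems instead of forming Sigma_AA^{-1} explicitly.
    alpha = np.linalg.solve(Sigma_AA, x_A - mu[A])   # Sigma_AA^{-1} (x_A - mu_A)
    v = np.linalg.solve(Sigma_AA, Sigma_yA)          # Sigma_AA^{-1} Sigma_Ay

    mean = mu[y] + Sigma_yA @ alpha                  # Equation (1)
    var = Sigma[y, y] - Sigma_yA @ v                 # Equation (2); independent of x_A
    return mean, var
```

Note how the variance computation never touches the observed values $x_A$, matching the remark above that the posterior variance depends only on the kernel and the chosen locations.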
2.3 Nonstationarity
In order to compute predictive distributions using (1) and (2), the mean and kernel functions have to be known. The mean function can usually be estimated using regression techniques. Estimating kernel functions is difficult, and usually, strongly limiting assumptions are made. For example, it is commonly assumed that the kernel $K(u,v)$ is stationary, which means that the kernel depends only on the difference between the locations, considered as vectors $v$, $u$, that is, $K(u,v) = K_\theta(u - v)$. Hereby, $\theta$ is a set of parameters. Very often, the kernel is even assumed to be isotropic, which means that the covariance only depends on the distance between locations, that is, $K(u,v) = K_\theta(\|u - v\|_2)$. Common choices for isotropic kernels are the exponential kernel, $K_\theta(\delta) = \exp(-\frac{|\delta|}{\theta})$, and the Gaussian kernel, $K_\theta(\delta) = \exp(-\frac{\delta^2}{\theta^2})$. These assumptions are frequently strongly violated in practice, as illustrated in the real sensor data shown in Figures 1(b) and 1(c). In Section 8.1, we discuss how placements optimized from models with isotropic kernels reduce to geometric covering and packing problems.
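For reference, the two isotropic kernels mentioned above can be written as follows. This is only a sketch; the single bandwidth parameter theta mirrors the $\theta$ in the text and its default value is illustrative:

```python
import numpy as np

def exponential_kernel(u, v, theta=1.0):
    """Exponential kernel: covariance decays with the distance ||u - v||_2."""
    d = np.linalg.norm(np.asarray(u) - np.asarray(v))
    return np.exp(-d / theta)

def gaussian_kernel(u, v, theta=1.0):
    """Gaussian (squared-exponential) kernel."""
    d = np.linalg.norm(np.asarray(u) - np.asarray(v))
    return np.exp(-(d ** 2) / theta ** 2)
```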
In this paper, we do not assume that $K(\cdot,\cdot)$ is stationary or isotropic. Our approach is general, and can use any kernel function. In our experiments, we use the approach of Nott and Dunsmuir (2002) to estimate nonstationary kernels from data collected by an initial deployment. More specifically, their assumption is that an estimate of the empirical covariance $\Sigma_{AA}$ at a set of observed locations is available, and that the process can be locally described by a collection of isotropic processes, associated with a set of reference points. An example of a kernel function estimated using this method is presented in Figure 3(a). In Section 9.2, we show that placements based on such nonstationary GPs lead to far better prediction accuracies than those obtained from isotropic kernels.
3. Optimizing Sensor Placements
Usually, we are limited to deploying a small number of sensors, and thus must carefully choose where to place them. In spatial statistics this optimization is called sampling or experimental design: finding the $k$ best sensor locations out of a finite subset $V$ of possible locations, for example, out of a grid discretization of $\mathbb{R}^2$.
3.1 The Entropy Criterion
We first have to define what a good design is. Intuitively, we want to place sensors which are most informative with respect to the entire design space. A natural notion of uncertainty is the conditional entropy of the unobserved locations $V \setminus A$ after placing sensors at locations $A$,

$$H(X_{V\setminus A} \mid X_A) = -\int p(x_{V\setminus A}, x_A) \log p(x_{V\setminus A} \mid x_A)\, dx_{V\setminus A}\, dx_A, \qquad (3)$$

where we use $X_A$ and $X_{V\setminus A}$ to refer to sets of random variables at the locations $A$ and $V \setminus A$. Intuitively, minimizing this quantity aims at finding the placement which results in the lowest uncertainty about all uninstrumented locations $V \setminus A$ after observing the placed sensors $A$. A good placement would therefore minimize this conditional entropy, that is, we want to find

$$A^* = \operatorname*{argmin}_{A \subset V : |A| = k} H(X_{V\setminus A} \mid X_A).$$
Using the identity $H(X_{V\setminus A} \mid X_A) = H(X_V) - H(X_A)$, we can see that

$$A^* = \operatorname*{argmin}_{A \subset V : |A| = k} H(X_{V\setminus A} \mid X_A) = \operatorname*{argmax}_{A \subset V : |A| = k} H(X_A).$$

So we can see that we need to find a set of sensors $A$ which is most uncertain about each other. Unfortunately, this optimization problem, often also referred to as D-optimal design in the experiment design literature (cf. Currin et al., 1991), has been shown to be NP-hard (Ko et al., 1995):

Theorem 1 (Ko et al., 1995) Given rational $M$ and rational covariance matrix $\Sigma_{VV}$ over Gaussian random variables $V$, deciding whether there exists a subset $A \subseteq V$ of cardinality $k$ such that $H(X_A) \geq M$ is NP-complete.
Therefore, the following greedy heuristic has found common use (McKay et al., 1979; Cressie, 1991): One starts from an empty set of locations, $A_0 = \emptyset$, and greedily adds placements until $|A| = k$. At each iteration, starting with set $A_i$, the greedy rule used is to add the location $y^*_H \in V \setminus A_i$ that has highest conditional entropy,

$$y^*_H = \operatorname*{argmax}_{y} H(X_y \mid X_{A_i}), \qquad (4)$$

that is, the location we are most uncertain about given the sensors placed thus far. If the set of selected locations at iteration $i$ is $A_i = \{y_1, \ldots, y_i\}$, using the chain-rule of entropies, we have that:

$$H(X_{A_i}) = H(X_{y_i} \mid X_{A_{i-1}}) + \ldots + H(X_{y_2} \mid X_{A_1}) + H(X_{y_1} \mid X_{A_0}).$$
Note that the (differential) entropy of a Gaussian random variable $X_y$ conditioned on some set of variables $X_A$ is a monotonic function of its variance:

$$H(X_y \mid X_A) = \frac{1}{2}\log\left(2\pi e\, \sigma^2_{X_y \mid X_A}\right) = \frac{1}{2}\log \sigma^2_{X_y \mid X_A} + \frac{1}{2}\left(\log(2\pi) + 1\right), \qquad (5)$$

which can be computed in closed form using Equation (2). Since for a fixed kernel function, the variance does not depend on the observed values, this optimization can be done before deploying the sensors, that is, a sequential, closed-loop design taking into account previous measurements bears no advantages over an open-loop design, performed before any measurements are made.
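A minimal sketch of this greedy entropy heuristic follows, assuming a covariance matrix Sigma over the discretized candidate locations; the function and variable names are ours. Since, by Equation (5), the conditional entropy is monotone in the conditional variance, each step simply picks the location with the largest variance given the locations chosen so far, using Equation (2):

```python
import numpy as np

def greedy_entropy_placement(Sigma, k):
    """Greedily pick k locations; at each step add the location with the highest
    conditional variance (equivalently, conditional entropy) given those chosen."""
    n = Sigma.shape[0]
    A = []
    for _ in range(k):
        best_y, best_var = None, -np.inf
        for y in range(n):
            if y in A:
                continue
            if A:
                S_AA = Sigma[np.ix_(A, A)]
                S_yA = Sigma[y, A]
                var = Sigma[y, y] - S_yA @ np.linalg.solve(S_AA, S_yA)  # Eq. (2)
            else:
                var = Sigma[y, y]
            if var > best_var:
                best_y, best_var = y, var
        A.append(best_y)
    return A
```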
3.2 An Improved Design Criterion: Mutual Information
The entropy criterion described above is intuitive for finding sensor placements, since the sensors
that are most uncertain about each other should cover the space well. Unfortunately, this entropy
criterion suffers from the problem shown in Figure 4, where sensors are placed far apart along the
boundary of the space. Since we expect predictions made from a sensor measurement to be most
precise in a region around it, such placements on the boundary are likely to “waste” information.
This phenomenon has been noticed previously by Ramakrishnan et al. (2005), who proposed a
weighting heuristic. Intuitively, this problem arises because the entropy criterion is indirect: the criterion only considers the entropy of the selected sensor locations, rather than considering prediction
quality over the space of interest. This indirect quality of the entropy criterion is surprising, since
Figure 4: An example of placements chosen using entropy and mutual information criteria on a subset of the temperature data from the Intel deployment. Diamonds indicate the positions chosen using entropy; squares the positions chosen using MI.
the criterion was derived from the "predictive" formulation $H(X_{V\setminus A} \mid X_A)$ in Equation (3), which is equivalent to maximizing $H(X_A)$.
Caselton and Zidek (1984) proposed a different optimization criterion, which searches for the subset of sensor locations that most significantly reduces the uncertainty about the estimates in the rest of the space. More formally, we consider our space as a discrete set of locations $V = S \cup U$ composed of two parts: a set $S$ of possible positions where we can place sensors, and another set $U$ of positions of interest, where no sensor placements are possible. The goal is to place a set of $k$ sensors that will give us good predictions at all uninstrumented locations $V \setminus A$. Specifically, we want to find

$$A^* = \operatorname*{argmax}_{A \subseteq S : |A| = k} H(X_{V\setminus A}) - H(X_{V\setminus A} \mid X_A),$$

that is, the set $A^*$ that maximally reduces the entropy over the rest of the space $V \setminus A^*$. Note that this criterion $H(X_{V\setminus A}) - H(X_{V\setminus A} \mid X_A)$ is equivalent to finding the set that maximizes the mutual information $I(X_A; X_{V\setminus A})$ between the locations $A$ and the rest of the space $V \setminus A$. In their follow-up work, Caselton et al. (1992) and Zidek et al. (2000) argue against the use of mutual information in a setting where the entropy $H(X_A)$ in the observed locations constitutes a significant part of the total uncertainty $H(X_V)$. Caselton et al. (1992) also argue that, in order to compute $MI(A)$, one needs an accurate model of $P(X_V)$. Since then, the entropy criterion has been predominantly used as a placement criterion. Nowadays, however, both the estimation of complex nonstationary models for $P(X_V)$ and the associated computational aspects are well understood. Furthermore, we show empirically that, even in the sensor selection case, mutual information outperforms entropy on several practical placement problems.
On the same simple example in Figure 4, this mutual information criterion leads to intuitively
appropriate central sensor placements that do not have the “wasted information” property of the
entropy criterion. Our experimental results in Section 9 further demonstrate the advantages in
performance of the mutual information criterion. For simplicity of notation, we will often use
$MI(A) = I(X_A; X_{V\setminus A})$ to denote the mutual information objective function.
Figure 5: Comparison of the greedy algorithm with the optimal solutions on a small problem (mutual information versus number of sensors placed, showing the greedy solution, the best solution found, and an upper bound on the optimal solution). We select from 1 to 5 sensor locations out of 16, on the Intel Berkeley temperature data set as discussed in Section 9. The greedy algorithm is always within 95 percent of the optimal solution.
Notice that in this notation the process $X_V$ and the set of locations $V$ is implicit. We will also write $H(A)$ instead of $H(X_A)$.
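Because all variables are jointly Gaussian, $MI(A) = H(X_A) + H(X_{V\setminus A}) - H(X_V)$ can be evaluated in closed form from log-determinants of submatrices of $\Sigma_{VV}$ (the $(2\pi e)$ terms in the Gaussian entropies cancel). The following helper is our own sketch, not the authors' code:

```python
import numpy as np

def mutual_information(Sigma, A):
    """MI(A) = I(X_A; X_{V \\ A}) for a Gaussian with covariance Sigma over V,
    computed as H(X_A) + H(X_{V \\ A}) - H(X_V)."""
    n = Sigma.shape[0]
    A = list(A)
    Abar = [i for i in range(n) if i not in A]
    if not A or not Abar:
        return 0.0
    logdet = lambda M: np.linalg.slogdet(M)[1]  # log|M| for positive-definite M
    return 0.5 * (logdet(Sigma[np.ix_(A, A)])
                  + logdet(Sigma[np.ix_(Abar, Abar)])
                  - logdet(Sigma))
```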
The mutual information is also hard to optimize:
Theorem 2 Given rational $M$ and a rational covariance matrix $\Sigma_{VV}$ over Gaussian random variables $V = S \cup U$, deciding whether there exists a subset $A \subseteq S$ of cardinality $k$ such that $MI(A) \geq M$ is NP-complete.
Proofs of all results are given in Appendix A. Due to the problem complexity, we cannot expect
to find optimal solutions in polynomial time. However, if we implement the simple greedy algo-
rithm for the mutual information criterion (details given below), and optimize designs on real-world placement problems, we see that the greedy algorithm gives almost optimal solutions, as presented in Figure 5. In this small example, where we could compute the optimal solution, the performance
of the greedy algorithm was at most five percent worse than the optimal solution. In the follow-
ing sections, we will give theoretical bounds and empirical evidence justifying this near-optimal
behavior.
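For concreteness, here is a sketch of the simple greedy selection referred to above and developed in Section 4: at each step, add the candidate location that most increases $MI(A)$. It reuses the mutual_information helper sketched in Section 3.2; this direct difference computation is written for clarity rather than efficiency, whereas the algorithm below evaluates the marginal gain more cheaply.

```python
def greedy_mi_placement(Sigma, candidates, k):
    """Greedily select k sensor locations from `candidates` (the set S),
    each time adding the location with the largest increase in MI(A)."""
    A = []
    for _ in range(k):
        base = mutual_information(Sigma, A)
        best_y, best_gain = None, -float("inf")
        for y in candidates:
            if y in A:
                continue
            gain = mutual_information(Sigma, A + [y]) - base
            if gain > best_gain:
                best_y, best_gain = y, gain
        A.append(best_y)
    return A
```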
4. Approximation Algorithm
Optimizing the mutual information criterion is an NP-complete problem. We now describe a polynomial time algorithm with a constant-factor approximation guarantee.
4.1 The Algorithm
Our algorithm is greedy, simply adding sensors in sequence, choosing the next sensor which pro-
vides the maximum increase in mutual information. More formally, using $MI(A) = I(X_A; X_{V\setminus A})$,