Table Of ContentN91-15622
A Method for Classification of Muitisource Data using
Interval-Valued Probabilities and Its Application to HIRIS Data
H. Kim and P.H. Swain
School of Electrical Engineering
and
Laboratory for Applications of Remote Sensing
Purdue University
West Lafayette, IN 47907, U.S.A.
Tel: (317)494-0212, FAX: (317)494-0776
E-mail address: swain@ecn.purdue.edu
tent information as "evidential informa-
ABSTRACT tion."
The objective of the current research is
A method of classifying multisource data to develop a mathematical framework
in remote sensing is presented. The pro- within which various applications can be
posed method considers each data source as made with multisource data in remote
an information source providing a body of sensing and geographic information sys-
evidence, represents statistical evidence by tems. The methodology described in this
interval-valued probabilities, and uses paper has evolved from "evidential reason-
Dempster's rule to integrate information ing," where each data source is considered
based on multiple data sources. as providing a body of evidence with a cer-
The method is applied to the problems of tain degree of belief. The degrees of belief
ground-cover classification of multispectral based on the body of evidence are repre-
data combined with digital terrain data such sented by "interval-valued (IV) probabili-
as elevation, slope, and aspect. Then this ties" rather than by conventional point-
method is applied to simulated 201 -band valued probabilities so that uncertainty can
High Resolution Imaging Spectrometer be embedded in the measures.
(HIRIS) data by dividing the dimensionally There are three fundamental problems
huge data source into smaller and more in the multisource data analysis based on IV
manageable pieces based on the global sta- probabilities: (1) how to represent bodies of
tistical correlation information. It produces evidence by IV probabilities, (2) how to
higher classification accuracy than the combine IV probabilities to give an overall
Maximum Likelihood (ML) classification assessment of the combined body of evi-
method when the Hughes phenomenon is dence, and (3) how to make decisions based
apparent. on IV probabilities.
The paper describes a formal method of
representing statistical evidence by IV
1 INTRODUCTION probabilities based on the Likelihood
Principle. In order to integrate informa-
The importance of utilizing multisource tion obtained from individual data sources,
data in ground-cover classification lies in the method presented in the paper uses
the fact that it is generally correct to as- Dempster's rule for combining multiple
sume that improvements in terms of classi- bodies of evidence. Although IV probabili-
fication accuracy can be achieved at the ex- ties together with Dempster's rule provide
pense of additional independent features an innovative means for the representation
provided by separate sensors. However, it and combination of evidential information,
should be recognized that information and they make the decision process rather
knowledge from most available data sources complicated. We need more intelligent
in the real world are neither certain nor strategies for making decisions. This paper
complete. We refer to such a body of uncer- also focuses on the development of decision
tain, incomplete, and sometimes inconsis- rules over IV probabilities.
75
2 AXIOMATIC DEFINITION OF when there is absolutely no knowledge of
IV PROBABILITY the probability of A.
The basic probability assignment m de-
Interval -valued probabilities can be fined in Shafer's mathematical theory of
thought as a generalization of ordinary evidence[6] has the following relations with
point-valued probabilities. The endpoints of the IV probabilities:
IV probabilities are called the "upper prob-
ability" and the "lower probability."
£(A) = 2 re(B) (2.8)
There have been various works intro-
ducing the concepts of IV probabilities in Beat
the areas of philosophy of science and
re(A) =Z (_I)IA-BI L(B) for all AcO (2.9)
statistics [1][2][3][4]. Although the mathe-
Bff.A
matical rationales behind those approaches
are different, there are some properties of
IV probabilities which are commonly re- U(A) = Z re(B) (2.10)
quired. The axiomatic approach to IV prob- BnA,O
abilities is based on those common proper-
ties, so that it can avoid conceptual ambi-
guities.
3 REPRESENTATION OF STATISTICAL
DEFINITION [5] Suppose O is a finite set of EVIDENCE BY IV PROBABILITY
exhaustive and mutually exclusive events.
Let _ denote a Boolean algebra of the subsets When a body of evidence is based on the
of O. The IV probability [L, U] is defined by outcomes of statistical experiments known
the set-theoretic functions: to be governed by any probability model, it
is called "statistical evidence." One of the
L" 13-* [0, 11 (2.1)
basic problems for any theory of IV prob-
U'I3 -->[0, 1] (2.2) abilities is how to represent a given body of
statistical evidence by IV probabilities.
satisfying the following properties:
I ) U(A) _>L(A) _>0 for any Ae 13 (2.3) DEFINITION [6] An upper probability
function U is said to be "consonant" if its
I I) U(O) = L(O) = 1 (2.4)
focal elements are nested, i.e., if for A i_O
III) U(A) + L(A) = 1 for any Ae_ (2.5) (i=l ..... r) such that m(Ai) > 0 for all i and
IV) For any A, Be15 and AnB=O, r
L(A)+L(B) <_L(AuB) < L(A)+U(B) m(Ai)=l, AicA j for any i < j, where m is the
i=l
< U(AuB) <_U(A)+U(B) (2.6)
basic probability assignment of u.
Given a system of IV probabilities over 13,
Suppose the observed data in a statistical
the actual probability measure, P(A), of any
subset A of @ is assumed to lie in the interval experiment are governed by a probability
model {P0:0eO}, where P0 is a conditional
[L, U] such that
probability density function on a sample
L(A) < P(A) < U(A) (2.7)
space X given 0. Our intuitive feeling is that
The degree of uncertainty about the actual an observation xe X seems to more likely
probability of A is represented by the width, belong to those elements of O which assign
U(A)-L(A), of the interval. In particular, the greater chance to x.
U(A)=L(A)=P(A) when there is complete Based on the above intuition along with
the consonance assumption of the upper
knowledge of the probability of A. In this
probability function, Shafer[6] proposed the
case, the IV probability becomes an ordi-
linear plausibility function defined as:
nary additive probability. And L(A)+L(A)=0
76
tion is obtained by the sum of m's committed
max p0.(x) to X k and its subsets. The denominator (l-f0
0'_A
_(AIx)-
for all A_ I] and A#_ (3.1) normalizes the result to compensate for the
max pe(x)
_0 measure committed to the empty set so that
the total probability mass has measure one.
The corresponding lower probability func-
Consequently, Dempster's rule discards the
tion is given as:
conflict between E 1 and E2 and carries their
consensus to the new belief function.
max po.(x)
Dempster's rule is both commutative and
0"eA
associative. Therefore, the order or group-
L(AIx) = 1- max po(x) for all A_ 13 (3.2)
ing of evidence in combination does not
affect the result, and a sequence of infor-
In particular, when the set A is singleton, mation sources can be combined either
say {e'}, the function in Eq.(3.1) gives the
sequentially or pairwise.
relative likelihood of 0' to the most likely
element in O.
5 DECISION RULES FOR
4 DEMPSTER'S RULE FOR IV PROBABILITIES [7]
COMBINING IV PROBABILITIES
Consider a classification problem where
an arbitrary pattern xe X from an unknown
Dempster's rule is a generalized scheme
class is assigned to one of n classes in O. Let
of Bayesian inference to aggregate bodies of
evidence provided by multiple information k(0il0 j) be a measure of the "loss" incurred
sources. Let m 1 and m2 be the basic prob- when the decision 0i is made and the true
ability assignments associated respec-tively pattern class is in fact 0j, where i, j = 1..... n.
with the belief functions Bel 1 and Bel 2 Also, let _(x) denote a decision rule that tells
which are inferred from two entirely dis- which class to choose for every pattern x.
tinct bodies of evidence E1 and E2. For all Ai, We define the "upper expected loss" and the
"lower expected loss" of making a decision
Bj, and XkCO, Dempster's rule (or Dempster's
O(x)=0 i as:
orthogonal sum) gives a new belief func-
tion denoted by n
Bel= Bel l _ Bel2 (4.1) fi(x) = 2 2L(OilOj) Ux(Oj) (5.1)
j=l
The basic probability assignment associated
with the new belief function is defined as: n
hi(x) = Z k(0il0j) Lx(0j) (5.2)
m(Xk )=(1"/0"1 2 ml(Ai)'m2(Bj) j=l
AinBj=X k
where U x and L x are respectively the upper
for any Xk¢ O (4.2) and the lower probabilities for x being
where k is the measure of conflict between actually from 0j.
The "Bayes-like rule" is the one which
Bel 1 and Bel 2 defined as:
minimizes both the upper and the lower
expected losses, i.e.,
k= Z ml (Ai)'m2(Sj) (4.3)
AinBj=_ _(x)=0 i if _(x)_<q(x) and l,i(x)_<l,j(x)
Dempster's rule computes the basic for j=l ..... n (5.3)
probability of X k, m(Xk), from the product
A problem with the above decision rule is
of ml(Ai)and m2(Bj) by considering all A i that there does not always exist 0 which sat-
and Bjwhose intersection is Xk. Once m is isfies the condition in Eq.(5.3), which can
computed for every XkCO, the belief func- lead to ambiguity. In such an ambiguous
77
situation, one may withhold the decision Radar (SAR) imagery in Shallow mode and
and wait for a new piece of information. Steep mode, respectively. Sources 4 through
Otherwise, the ambiguity may be resolved 6 provide digital terrain data.
by resorting to the following rule, so-called In this experiment, 6 classes were de-
"minimum average expected loss rule": fined as listed in Table 2, and 100 pixels per
class were used for training data, which is
_i(x)+ (,i(x) /_i(x)+ [.j(x) between 4% and 8% of the total pixels of the
O(x)=0 i if 2 _< 2 classes in the test fields. The training sam-
ples are uniformly distributed over the test
for j=l ..... n (5.4)
As an alternative to the Bayes-like rule, Table 1. Anderson River Data Set.
there are two other rules by which a deci-
sion is made according to individual mea- Source Data Spectral Input Spectral
sures of the interval, that is, either the up- Index Type Region Channel Band(Ixm)
per expected loss or the lower expected loss: 1 .38 - .42
2 .42 - .45
(A) minimum upper expected loss rule:
3 .45 - .50
Visible 4 .50 - .55
d(x)=0 i if _i(x) </_(x) for j=l,.:., n (5.5)
5 .55 - .60
(B) minimum lower expected loss rule: 1 A/B 6 .60 - .65
MSS
§(x)=0 i if /.i(x) < [.j(x) for j=l ..... n (5.6) 7 .65 - .69
8 .70 - .79
Although the above two rules always pro-
duce decisions and there is no ambiguous Near IR 9 .80 - .89
10 .92 - 1.10
situation in making a decision according to
the rules, they do not utilize all of the in- Thermal 11 8 - 14
formation represented by the IV probabili- XHV
ties. The performance of these rules are 2 SAIl Shallow XHH
compared with the minimum average ex- LHV
pected loss rule in the experiments by LHH
applying them to problems of ground-cover XHV
classification based on remotely sensed and 3 SAR Steep XHH
geographic data. LHV
I.HH
4 Topo- Elevation
6 EXPERIMENTAL RESULTS
5 graphic Aspect
6 Slope
The experiments have been performed
over two different image data sets. In the
experiments, the classification accuracies of Table 2. Information Classes for Test of
the multisource data (MSD) classification Anderson River Data Set.
based on the proposed method were com-
pared with those of Maximum Likelihood Class Cover Tree No. of % of
(ML) classifications based on the stacked Inde_ T_,pes Sizes ?ixels Total
vector approach. 1 Douglas Fir 2 (df2) 31 - 40m 2246 21.72
Table 1 describes the set of data sources 2 Douglas Fir 3 (df3) 21 - 30m 1501 14.52
for the first experiment. The image in this 3 DF+Other Species 2 31 - 40m 1352 13.08
data set consists of 256 lines by 256 columns (df+os2)
and covers a forestry site around the DF+Lodgepole Pine 2 21 - 30m 1589 15.37
Anderson River area of British Columbia, (df+lp2)
Canada. Source 1 is ll-band Airborne 5 Hemlock+Cedar (hc) 31 - 40m 1587 15.35
Multispectral Scanner (A/B MSS) data. 6 Sorest Clearings _fc) 2064 19.96
Sources 2 and 3 are Synthetic Aperture :Total 1033_ I00.0
78
fields so that they may be considered as good general which rule is the best. Further
representatives of the total samples. research is needed to determine whether
We have observed that some of the guidelines can be devised for selection of
classes defined in Table 2 cannot be assumed the decision rule.
to be normally distributed in the topo-
graphic data. Thus it was decided to adopt a Table 4. Results of Classifications over Test
nonparametric approach such as the Samples of Anderson River Data.
"Nearest Neighbor" (NN) method [8] in
computing probability measures while the Decision Sources
optical and radar data sources were assumed Rule
1 1, 4 1,2,41 -4 1 -5 1 -6
to have Gaussian probability density func-
tions. ML 74.1677.77 79.13 78.9379.80 ;1.0
First, the ML classification based on the
stacked vector approach was carried out for MUEL 80.60 82.39 82.6983.02 _4.54
various sets of the data sources, adding one
MSD MLEL 78.45 81.42 81.6782.24_3.65
source at a time to the A/B MSS data in the
order Elevation, SAR-Shallow, SAR-Steep, MAEL 78.21 80.9582.0581.88_3.1(
Aspect, and Slope. Then the MSD classifica-
tion based on the proposed method was per-
In the second experiment, the proposed
formed using different decision rules.
method was applied to the classification of
Tables 3 and 4 compare the results for the
HIRIS data by decomposing the data into
training samples and the test samples, re-
smaller pieces, i.e., subsets of spectral
spectively. Even though the compounded
bands. The data set used in this experiment
data in the ML classification were treated as
is simulated HIRIS data obtained by RSSIM
having Gaussian distributions, the ML and
[9]. RSSIM is a simulation tool for the study
the MSD methods produced similar results
of multispectral remotely sensed images and
for the training samples. This is not sur-
associated system parameters. It creates
prising because the ML method uses con-
realistic multispectral images based on de-
ventional additive probabilities assuming
tailed models of the ground surface, the at-
that the knowledge concerning the actual
mosphere, and the sensor. Table 5 provides
unknown probabilities is complete, which is
a description of the simulated HIRIS data set.
reasonable as far as the training samples
are concerned. Figure 1 is a visual representation of the
global statistical correlation coefficient
matrix of the data. The image is produced by
Table 3. Results of Classifications over
converting the absolute values of coeffi-
Training Samples of Anderson River Data.
cients to gray values between 0 and 255.
Based on the correlation image, the 201
Decision Sources
bands were divided into 3 groups in such a
Rule
1 1,4 1, 2,4 1 -411 - 5 1-6 way that intra-correlation is maximized and
inter-correlation is minimized. Table 6 de-
ML g2.513 ]8.67 91.67 92.00 _2.83 _3.5(
scribes the multisource data set after divi-
MUEL 89.83 92.00 92.5093.17_4.32 sion. Note that the spectral regions of the
input channels in Source 3 coincide with
MSD MLEL 88.67 91.17 91.3392.33_3.65 the water absorption bands.
With 225 training samples (a third of the
MAEL 88.50 91.00 91.6791.67_3.5C
total samples) for each class, the ML classi-
fication and the MSD classification using
Comparing the performance of the three the minimum upper expected loss rule were
decision rules, the minimum upper expected performed over the total samples for vari-
loss (MUEL) rule was superior to the other ous sets of the sources, and the results are
rules, the minimum lower expected loss listed in Table 7.
(MLEL) rule and the minimum average ex- The results of the ML method apparently
pected loss (MAEL) rule. It is not known in show effects of the Hughes phenomenon;
79
the accuracygoes down as the dimensional- Table 6. Divided Sources of HIRIS Data Set.
ity of the source increases while the num-
Source Input No. of
ber of training samples is fixed. In particu-
Index Channels Features
lar, the accuracy decreases by a consider-
able amount when all features are used. Source 1 1- 35T 107 - 141_ 157 - 201 115
Presence of the Hughes phenomenon causes Source 2 36 - 95 60
the ML method to be particularly sensitive Source 3 96- 106 (1.35 - 1.451.tm) 26
to a bad source, Source 3 in this case. 142- 156 (1.81 - 1.951am)
Meanwhile, the proposed MSD classification
method always shows robust performance
Table 7. Results of Classifications over Test
and gives consistent results.
Samples of Simulated HIRIS Data Set.
Table 5. Description of Simulated Sources
HIRIS Data Set.
S1 $2 $3 S1, $2 All
ML 75.75 75.60 45.83 74.56 65.14
Name Finney County Data Set
MSD - 77.83 77.63
Data Type 201-band HIRIS data simulated by
RSSIM
Spectral Region 0.4 - 2.4m m
6 CONCLUSIONS
Spectral 0.01ram
Resolution
In this paper we have investigated how
Image Size 45 lines x 45 columns (2025 interval-valued probabilities can be used to
samples) represent and aggregate evidential infor-
Information Winter Wheat, Summer Fallow, mation obtained from various data sources.
Classes Unknown Overall concepts of interval-valued proba-
bilities have been employed to develop a
new method of classifying multisource data
in remote sensing and geographic infor-
mation systems. The experiments demon-
strate the ability of our method to capture
1 uncertain information based on inexact and
incomplete multiple bodies of evidence. The
basic strategy of this method is to decompose
36 the relatively large size of evidence into
smaller, more manageable pieces, to assess
52 plausibilities and supports based on each
piece, and to combine the assessments by
75 Dempster's rule. In this scheme, we are able
95 to overcome the difficulty of precisely
estimating statistical parameters, and to
107 integrate statistical information as much as
possible.
141
ACKNOWLEDGEMENTS
157
This research is partially supported by
the National Aeronautics and Space
Administration under Contract No. NAGW-
201
925.
Figure 1. Global Statistical Correlation The SAR/MSS Anderson River data set
Coefficient Image of Simulated HIRIS Data. was acquired, processed and loaned to
Purdue University by the Canadian Center
80
ORIGINAL PAGE
BLACK AND WHITE PHOTOGRAPH
for Remote Sensing, Department of Energy, mapping", Ann. Math. Stat., vol.38,
Mines and Resources, of the Government of pp325-339, 1967.
Canada. [5] P. Suppes, "The measurement of be-lief",
J. Roy. Stat. Soc., Ser.B, vol.36, pp160-191,
1974.
REFERENCES [6] G. Shafer, A Mathematical Theory of
Evidence, Princeton University Press,
[1] B.O. Koopman, "The axioms and algebra 1976.
of intuitive probability", The Annals of [7] H. Kim, "A Method of Classification for
Mathematics, vol.41, pp269-292, 1940. Multisource Data in Remote Sensing
[2] I.J. Good, "Subjective probability as the based on Interval-Valued Probabilities",
measure of a non-measurable set", in Ph.D. Thesis, School of Electrical
E.Nagel, Suppes and Tarski (eds.), Logic, Engineering, Purdue University, West
Methodology and the Philosophy of Lafayette, IN 47907 U.S.A., August 1990.
Science, pp319-329, Stanford University [8] Fukunaga, K., Intro. to Statistical Pattern
Press, 1962. Recognition, Academic Press, 1972.
[3] C.A.B. Smith, "Consistency in statistical [9] Kerekes, J.P., and D.A. Landgrebe,
inference and decision", J. Roy Stat. Soc., "Modeling, Simulation, and Analysis of
Ser.B, vol.23, ppl-25, 1961. Optical Remote Sensing Systems", TR-EE
[4] A.P. Dempster, "Upper and lower pro- 89-49, School of Electrical Engineering,
babilities induced by a multivalued Purdue University, West Lafayette, IN
U.S.A., 1989.
81