Combination of uncertain ordinal expert statements: The combination rule EIDMR and its application to low-voltage grid classification with SVM

Sebastian Breker (EnergieNetz Mitte GmbH, Kassel, Germany; Email: sebastian.breker@energienetz-mitte.de)
Bernhard Sick (Intelligent Embedded Systems, University of Kassel, Germany; Email: bsick@uni-kassel.de)
Abstract—The use of expert knowledge is always more or less afflicted with uncertainties for many reasons: Expert knowledge may be imprecise, imperfect, or erroneous, for instance. If we ask several experts to label data (e.g., to assign class labels to given data objects, i.e., samples), we often find that these experts make different, sometimes conflicting statements. The problem of labeling data for classification tasks is a serious one in many technical applications where it is rather easy to gather unlabeled data, but the task of labeling requires substantial effort regarding time and, consequently, money. In this article, we address the problem of combining several, potentially wrong class labels. We assume that we have an ordinal class structure (i.e., three or more classes are arranged such as "light", "medium-weight", and "heavy") and only a few expert statements are available. We propose a novel combination rule, the Extended Imprecise Dirichlet Model Rule (EIDMR), which is based on a k-nearest-neighbor approach and Dirichlet distributions, i.e., second-order distributions for multinomial distributions. In addition, experts may assess the difficulty of the labeling task, which may optionally be considered in the combination. The combination rule EIDMR is compared to others such as a standard Imprecise Dirichlet Model Rule, the Dempster-Shafer Rule, and Murphy's Rule. In our evaluation of EIDMR we first use artificial data where we know the data characteristics and true class labels. Then, we present results of a case study where we classify low-voltage grids with Support Vector Machines (SVM). Here, the task is to assess the expandability of these grids with additional photovoltaic generators (or other distributed generators) by assigning these grids to one of five ordinal classes. It can be shown that the use of our new EIDMR leads to better classifiers in cases where ordinal class labels are used and only a few, uncertain expert statements are available.

I. INTRODUCTION

In general, human judgment can be gathered either in a quantitative or in a qualitative way. Often, humans do not feel comfortable with the task to express their opinion or belief quantitatively, because they worry that a concrete numeric value could give the (false) impression that there is more confidence in a judgment than they really have. Even for application experts it is difficult to cope with high-dimensional and complex tasks (see, e.g., [1]). But using words instead of concrete values is also ambiguous and imprecise [2]. Thus, making a statement based on predefined ordinal classes (e.g., "low", "medium", and "high") can be seen as a reasonable trade-off between eliciting a quantitative and a qualitative judgment, for instance.

In the following, we assume that we want to gather knowledge of experts in an application task, and only refer to "experts" as human knowledge sources. These domain experts assess data objects (samples), e.g., they label data using predefined ordinal classes to solve machine learning problems with supervised learning techniques. In addition, the experts may assess the difficulty of providing that label. This optional value can be seen as a kind of uncertainty of a single expert regarding that statement.

Basically, the statement of experts using an ordinal scale is influenced by many subjective parameters, e.g.,
1) experts have individual experience levels,
2) their forms of the day may vary,
3) they have different notions of "strictness", and
4) experts have an individual tendency not to opt for extremes.

On an abstract level, experience level and form of the day can be regarded as affecting the probability that a correct statement is provided by an expert. The individual use of strictness can be seen as a probability that the expert tends to rate a sample with a higher (or lower) class label compared to the true class. Furthermore, the individual tendency not to opt for extremes leads to a probability that an expert tends to assess a sample whose true class is near a "boundary" class (e.g., the "lowest" or "highest" class) with a label which is nearer to a "middle" class. Depending on this tendency, the bandwidth of the actually exploited ordinal classes may be low.

Assessing the difficulty of the labeling tasks, the fraction of statements rated as either being "easy" or "difficult" by an expert will be larger or smaller for an expert with a high experience level compared to a less experienced expert. Likewise, these fractions will differ for a single expert with different forms of the day. Because of all these reasons, a decision regarding an ordinal classification made by an expert is more or less uncertain. Hence, a final classification decision in a concrete application should not only be made using statements from a single expert. As experts might disagree, uncertain statements have to be combined in an appropriate way. The aim of all combination rules is to generate fused class labels which predict the true class with a higher accuracy than with the more uncertain individual class labels.
In this article, we present a novel combination rule for class labels which is shown to be superior to some existing ones if (1) we have to cope with ordinal classes as discussed above and (2) the number of labels (expert statements) for single samples is quite low. This rule, the Extended Imprecise Dirichlet Model Rule (EIDMR), is based on a k-nearest-neighbor (knn) approach and Dirichlet distributions, i.e., second-order distributions for multinomial distributions. Basically, it extends a standard Imprecise Dirichlet Model Rule (IDMR) by considering (1) the class order and (2) additional, also uncertain labels for similar samples (measured in an appropriate feature space). The uncertainty accompanied with the combination can be determined with the help of two probability boundaries. The usage of the resulting uncertainty values for the combined statement is optional. They can be used, e.g., for a comparison of the uncertainty of different data objects or for a comparison to another expert group in a second classification approach. In addition, experts may assess the difficulty of the labeling task, which may optionally be considered by EIDMR.

The accuracy of different rules (including EIDMR, IDMR, the Dempster-Shafer rule, and Murphy's rule) in predicting a class label is first investigated using three artificial data sets for which the true classes are known. Then, we compare the combination rules using real data from the field of low-voltage grid classification. With combined class labels, we train support vector machine (SVM) classifiers. The empirical data contain 300 samples of rural and suburban low-voltage grids with ten features. They were gathered in an extensive grid survey and labeled according to the grids' hosting capacity for distributed generators (e.g., photovoltaic generators) by five experts in distribution grid planning.

The remainder of the article is organized as follows: In Section II, we first discuss related work and IDMR. Then, we present EIDMR and illustrate its properties with an example. In Section III, the accuracy of these combination rules is investigated in more detail using three artificial data sets and compared to two other well-known rules derived from Dempster-Shafer theory. Section IV presents the results of our study on the classification of low-voltage grids. Section V summarizes the key findings and sketches our future research.

II. COMBINATION OF CLASS LABELS

In this section, we propose the new EIDMR, which is based on Imprecise Dirichlet Models (IDM) and can be used to fuse uncertain ordinal class labels. We start with a presentation of the current state of the art (Section II-A). After that, the methodical foundations of IDM and IDMR are given in Section II-B. Next, we propose our new EIDMR in Section II-C. The application of IDMR and EIDMR is illustrated with a simple example in Section II-D.

A. Combination Rules – The State of the Art

Before we present the two combination rules which are based on IDM in a formal way, some important, general properties of combination rules are identified. Four of these requirements on combination rules are claimed in [3], [4] and extended by a fifth requirement in [5]. In summary, these requirements are [5]:
1) irrelevance of order of statements in knowledge fusion (commutativity and associativity),
2) decrease of ignorance with an increasing number of statements,
3) concordance of knowledge increases the belief in a statement,
4) conflicting knowledge decreases the belief in a statement, and
5) persistent conflict is reflected.

Most well-known combination rules are based on Dempster-Shafer theory (DST) [6], e.g., the Dempster-Shafer Rule (DSR) [6] or Murphy's Rule (MR) [3]. A survey on combination rules based on DST has been presented in [7]. Most of the alternative DST based rules consider conflict redistribution. The counter-intuitive behavior concerning paradox problems [8] caused by compatible evidence is rarely considered [9]. Beside the DST based rules, there exist some other rules which are derived from Dezert-Smarandache theory [10] and correspond to a non-Bayesian reasoning approach, e.g., the PCR5 [11]. Additionally, Smarandache et al. give several counterexamples where the DSR fails to provide coherent results (or provides no results at all) [12].

In summary, most of the known DST based rules do not fulfill, as they were not designed to, the additional fifth requirement. Because of this fact, Andrade et al. [5] propose an IDM based rule (IDMR) and show that this rule accomplishes all of the properties concerning the above requirements. At this point, it should be mentioned that none of the mentioned known rules considers (1) the class order and (2) available uncertain labels for similar samples (measured in an appropriate feature space). We overcome these points with our EIDMR.

B. Methodical Foundations: IDM and IDMR

To motivate the IDM, we assume a set Ω = {1, ..., C} of C mutually exclusive events, in our case a number of C different classes. Let the probability θ_c for the choice of a class c during a labeling process for a sample be denoted as an element of a vector θ. Usually, these probabilities θ_c are unknown. If we suppose that an object is labeled by altogether N_E experts, the result can be denoted by the classes' occurrence frequencies n = (n_1, ..., n_C)^T with Σ_{c=1}^C n_c = N_E. Furthermore, we model the likelihood of a result n with [13]:

$$p(\mathbf{n} \mid \boldsymbol{\theta}) = \prod_{c=1}^{C} (\theta_c)^{n_c}. \qquad (1)$$

Because of the fact that θ is usually unknown, we now use the Dirichlet distribution, which is the prior distribution in a Bayesian parameter estimation approach with hyperparameters h and t = (t_1, ..., t_C)^T, to model a distribution of the vector θ, which is assumed to be a random variable itself [14]:

$$p(\boldsymbol{\theta}) = \frac{\Gamma(h)}{\prod_{c=1}^{C}\Gamma(h t_c)} \cdot \prod_{c=1}^{C} (\theta_c)^{h t_c - 1} \qquad (2)$$
with h > 0, 0 < t_c < 1 for c = 1, ..., C, and Σ_{c=1}^C t_c = 1. Γ is the gamma function [15]. The vector t represents the prior knowledge about θ. In a next step, the combination of (1) and (2) according to p(θ | n) ∝ p(n | θ) · p(θ) yields the following posterior probability:

$$p(\boldsymbol{\theta} \mid \mathbf{n}) = \frac{\Gamma\!\left(\sum_{c=1}^{C}(n_c + h t_c)\right)}{\prod_{c=1}^{C}\Gamma(n_c + h t_c)} \cdot \prod_{c=1}^{C} (\theta_c)^{n_c + h t_c - 1}. \qquad (3)$$

We get a Dirichlet distribution again, because the Dirichlet is the conjugate distribution of a multinomial distribution. The influence of the prior knowledge t on the posterior probability can be controlled with the hyperparameter h [13]. In a real classification application, there is often no information about the hyperparameters h and t available.
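As a small worked instance of the update in (3), with illustrative numbers of our own (not taken from the paper): for C = 3 classes, N_E = 3 expert statements with counts n = (2, 1, 0), and hyperparameters h = 2, t = (1/3, 1/3, 1/3), the posterior is again a Dirichlet:

```latex
% Worked instance of (3); the numbers are illustrative only.
% C = 3, n = (2, 1, 0), N_E = 3, h = 2, t = (1/3, 1/3, 1/3):
\begin{align*}
p(\boldsymbol{\theta} \mid \mathbf{n})
  &\propto \theta_1^{\,2 + 2/3 - 1}\;\theta_2^{\,1 + 2/3 - 1}\;\theta_3^{\,2/3 - 1},\\[2pt]
\mathbb{E}[\theta_1 \mid \mathbf{n}]
  &= \frac{n_1 + h\,t_1}{N_E + h} = \frac{2 + 2/3}{3 + 2} \approx 0.53 .
\end{align*}
% Letting t_1 -> 0 or t_1 -> 1 moves this posterior mean to 2/5 or 4/5,
% which is precisely the pair of bounds introduced next.
```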
The idea of the IDMR is now not only to consider a single one, but to start with a set of Dirichlet distributions for a fixed value of h [13]. Thereby, the set of Dirichlet distributions is chosen in such a way that Σ_{c=1}^C t_c = 1 is adhered to [16]. Under these conditions, the following upper and lower bounds for the probability that a sample is assigned to class c can be determined by a maximization and minimization of t_c, respectively [14]. That is, for each t, the corresponding Dirichlet distribution is updated using (1) [13]. If there is no prior information on θ available, the bounds are determined with the one-sided limits t_c → 0 and t_c → 1 (because of the linear updating step) and result according to [16] in:

$$\underline{p}(c \mid n_c, N_E) = \frac{n_c}{N_E + h}, \qquad (4)$$

and

$$\overline{p}(c \mid n_c, N_E) = \frac{n_c + h}{N_E + h}, \qquad (5)$$

where c is regarded here as the random variable for the combined statement (i.e., the fused class label). The hyperparameter h determines how quickly these upper and lower probabilities converge with an increasing number of observations. Because of this reason, Walley defined h as the number of observations needed to reduce the imprecision to half its initial value [16]. Typically, h is set to either 1 or 2 [16]. Using the derived bounds, the vector t does not need to be specified.

IDMR (not related to DST, cf. [5], [16]) is an alternative to the fusion rules based on DST. Andrade et al. used the probability boundaries from (4) and (5) to combine a collection of N_E classification statements c_j (c_j ∈ Ω is the statement of expert j, j = 1, ..., N_E) among which some experts vote for class c. Above, n_c is the number of experts that assign a given sample to class c. With N_E experts, we have 0 ≤ n_c ≤ N_E.

This approach does not consider a difficulty assessment of the experts. To account for this, we introduce weights w_j for each classification statement c_j of an expert (with w_j ≥ 1, as suggested in [5]). Then, we need an indicator function I_{j,c} which is 1 if expert j assigns label c to the sample under consideration (i.e., c_j = c), and 0 otherwise. We replace the number of experts that assign a given sample to a class c by

$$n_c = \sum_{j=1}^{N_E} w_j \cdot I_{j,c}. \qquad (6)$$

Thus, the higher the weight w_j of an expert, the more influence the corresponding statement has. With this n_c, we may determine the boundaries according to (4) and (5). The boundaries can be interpreted similar to belief and plausibility in DST. Their difference depends on the number of classification statements with their weights and can be compared to the degree of ignorance in DST.

In an application of IDMR, the boundaries can be used in different ways. If we want to come to a sharp decision for a certain class, we may act carefully and consider the lower boundary p̲(c | n_c, N_E). Alternatively, we may choose the class

$$c^{*} = \underset{c}{\operatorname{argmax}}\; n_c, \qquad (7)$$

i.e., the class with the highest number of weighted observations. Then, the uncertainty of the decision can be determined by analyzing the differences p̄(c | n_c, N_E) − p̲(c | n_c, N_E). The uncertainty values can be used, e.g., for a comparison of the uncertainty for different samples or for a comparison to another expert group in a second labeling approach.
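For concreteness, the weighted IDMR combination of (4)-(7) can be sketched in a few lines of Python; this is our own illustration, not code from the paper. With difficulty weights, we normalize by the total weighted count plus h, which reduces to N_E + h in (4)-(5) when all weights are 1 (and which matches the normalization of (9)-(10) below):

```python
import numpy as np

def idmr_combine(statements, weights, num_classes, h=1.0):
    """Weighted IDMR for one sample. `statements` holds 1-based class
    labels c_j, `weights` the difficulty weights w_j >= 1 of Eq. (6)."""
    n = np.zeros(num_classes)
    for c_j, w_j in zip(statements, weights):
        n[c_j - 1] += w_j                 # weighted counts, Eq. (6)
    total = n.sum() + h                   # equals N_E + h for unit weights
    lower = n / total                     # Eq. (4)
    upper = (n + h) / total               # Eq. (5)
    c_star = int(np.argmax(n)) + 1        # sharp decision, Eq. (7);
    return c_star, lower, upper           # ties are broken arbitrarily

# Two experts label one sample with classes 1 and 2, both rated "hard":
print(idmr_combine([1, 2], [1.0, 1.0], num_classes=3))
# -> lower = [1/3, 1/3, 0], upper = [2/3, 2/3, 1/3]; these bounds
#    reappear for sample 0 in the illustrative example of Section II-D.
```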
C. EIDMR

In a next step, we propose a new combination rule, EIDMR, which is based on the same idea as IDMR. But the key idea is to decrease the uncertainty of the fused result by means of additional information which is available from similar samples. Here, "similar" means that their feature vectors are similar, determined with an appropriate similarity measure (which may be based on a metric). To choose these additional samples, the IDMR is extended by a k-nearest-neighbor (knn) technique applied in the feature space. Then, the expert statements for the sample under consideration and the statements of its k nearest neighbors are considered. The application of knn ensures that only samples which have a characteristic that is similar to the one of the considered sample are involved in a weighted combination. Assuming that N_E statements are available for each sample, (k+1) · N_E statements are considered by EIDMR to assess one specific sample. EIDMR weights the statements not only depending on the experts' difficulty assessment as described above, but also depending on differences between the class assignments for the sample under consideration and its k neighbors. EIDMR also considers the fact that we have ordinal classes: Larger differences in class numbers lead to lower weighting factors.

To describe the approach in a formal way, we first need an additional index i for several variables as we have to consider not only one sample, but also its k neighbors. Thus, the sample under consideration gets the index 0, and its neighbors i = 1, ..., k. Then, for a sample i (i = 0, ..., k) we have N_E expert statements (assignments to classes) summarized in c_i = (c_{i,1}, ..., c_{i,N_E})^T with corresponding difficulty assessments
w_i = (w_{i,1}, ..., w_{i,N_E})^T. As we have ordinal classes, the dissimilarity between a vector c_i and the vector c_0 can be measured using a standard metric, e.g., the Euclidean distance ‖c_i − c_0‖. Then, we are able to define weights g_i for the samples (with i = 0, ..., k) that consider differences in their class assignments as follows:

$$g_i = \frac{1}{k} \cdot \left(1 - \frac{\lVert \mathbf{c}_i - \mathbf{c}_0 \rVert}{\sum_{l=1}^{k} \lVert \mathbf{c}_l - \mathbf{c}_0 \rVert}\right). \qquad (8)$$

To avoid dividing by zero, we have to exclude the very special case where all class vectors c_i are equal (in this case we would have g_0 = 1 and g_i = 0 for i = 1, ..., k). These weights have the property Σ_{i=0}^k g_i = 1.

With this spadework, we formally extend the two probability boundaries in the following way:

$$\underline{p}(c \mid n_{c,0}, \ldots, n_{c,k}, N_E) = \frac{\sum_{i=0}^{k} g_i \cdot n_{c,i}}{\sum_{c'=1}^{C}\sum_{i=0}^{k} g_i \cdot n_{c',i} + h}, \qquad (9)$$

$$\overline{p}(c \mid n_{c,0}, \ldots, n_{c,k}, N_E) = \frac{\sum_{i=0}^{k} g_i \cdot n_{c,i} + h}{\sum_{c'=1}^{C}\sum_{i=0}^{k} g_i \cdot n_{c',i} + h} \qquad (10)$$

with

$$n_{c,i} = \sum_{j=1}^{N_E} w_{i,j} \cdot I_{i,j,c}, \qquad (11)$$

where now I_{i,j,c} indicates whether sample i has been assigned to class c by expert j (cf. (6) above).

Such as with IDMR, the probability boundaries can be used directly, or a sharp decision for a single class can be made by choosing c* with

$$c^{*} = \underset{c}{\operatorname{argmax}} \sum_{i=0}^{k} g_i \cdot n_{c,i}. \qquad (12)$$

That is, the class c with the highest number of weighted observations is chosen as the combined label. The uncertainty accompanied by this decision can be determined with p̄(c | n_{c,0}, ..., n_{c,k}, N_E) − p̲(c | n_{c,0}, ..., n_{c,k}, N_E). The uncertainty values may vary more between different samples using EIDMR because of the additional k samples and the additional weights g_i.

The EIDMR procedure is set out in Algorithm 1.

Algorithm 1: EIDMR algorithm for one sample.
Input: set of samples with feature vectors, experts' classification statements, and difficulty assessments; sample 0 that has to be labeled.
1: Search the k nearest neighbors of sample 0 in the feature space with a Euclidean distance measure;
2: Compute the similarity weights g_i;
3: for i = 0 to k do: compute the weighted counts n_{c,i} according to (11);
4: for c = 1 to C do:
5:   compute the lower boundary p̲(c | n_{c,0}, ..., n_{c,k}, N_E);
6:   compute the upper boundary p̄(c | n_{c,0}, ..., n_{c,k}, N_E);
7: Choose class c* with c* = argmax_c Σ_{i=0}^k g_i · n_{c,i};
Output: class c* and the upper and lower boundaries p̄ and p̲.
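The complete combination of (8)-(12) for one sample can also be summarized in a short Python sketch; it is our own reading of Algorithm 1, with NumPy conventions (0-based arrays, 1-based class labels) as assumptions:

```python
import numpy as np

def eidmr_combine(c, w, h=1.0, num_classes=None):
    """EIDMR for one sample, following Eqs. (8)-(12).

    c : (k+1, N_E) int array; row 0 holds the 1-based class statements
        for the sample itself, rows 1..k those of its nearest neighbors.
    w : (k+1, N_E) float array of difficulty weights w_{i,j} >= 1.
    """
    k = c.shape[0] - 1
    if num_classes is None:
        num_classes = int(c.max())

    # Similarity weights, Eq. (8); the special case of identical class
    # vectors (zero denominator) is excluded, as in the text.
    d = np.linalg.norm((c - c[0]).astype(float), axis=1)
    assert d[1:].sum() > 0, "all class vectors equal: excluded case"
    g = (1.0 - d / d[1:].sum()) / k

    # Weighted occurrence counts n_{c,i}, Eq. (11).
    n = np.zeros((num_classes, k + 1))
    for i in range(k + 1):
        for j in range(c.shape[1]):
            n[c[i, j] - 1, i] += w[i, j]

    s = (g * n).sum(axis=1)            # per-class evidence sum_i g_i n_{c,i}
    lower = s / (s.sum() + h)          # lower boundary, Eq. (9)
    upper = (s + h) / (s.sum() + h)    # upper boundary, Eq. (10)
    return int(np.argmax(s)) + 1, lower, upper   # sharp decision, Eq. (12)
```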
D. Illustrative Example

To illustrate EIDMR and its differences to IDMR, we now elaborate a simple example with three samples in a feature space as shown in Figure 1. We assume that each sample was classified by two experts into one of the three ordinal classes 1, 2, and 3. Furthermore, we suggest that a difficulty assessment is known for each statement (either "easy" (with weight w_{i,j} = 1.5), "medium" (w_{i,j} = 1.25), or "hard" (w_{i,j} = 1.0)). In the figure, the classification statements and weights are set out as vectors c_i and w_i (i = 0, 1, 2). It can be seen that the weights of the experts' statements are lower for the data sample 0 under consideration (red) compared to the weights of its two nearest neighbors (green). This simulates the case when a sample is hard to classify and there exist some similar samples for which the classification task is easier.

Fig. 1. Sample 0 and its k = 2 neighbors in the feature space.

Using IDMR for sample 0 results in n_1 = 1, n_2 = 1, and n_3 = 0. Obviously, a clear decision concerning the combined class label cannot be made. The values for the lower and upper boundaries are p̲(1) = p̲(2) = 1/3 and p̄(1) = p̄(2) = 2/3 for the classes 1 and 2, as well as p̲(3) = 0 and p̄(3) = 1/3 for class 3 if we use h = 1. That is, the uncertainty regarding the class decision, estimated by the difference of the boundaries, is 1/3 for each class.

Now, we proceed with EIDMR. First, the three dissimilarities of the samples are computed with the Euclidean distance: ‖c_0 − c_0‖ = 0, ‖c_1 − c_0‖ = 1, and ‖c_2 − c_0‖ = 1. In a next step, the similarity weights are computed: g_0 = (1/2)·(1 − 0) = 1/2, g_1 = (1/2)·(1 − 1/2) = 1/4, g_2 = (1/2)·(1 − 1/2) = 1/4. Then, we calculate the n_{c,i} using (11): n_{1,0} = 1, n_{2,0} = 1, n_{2,1} = 3, n_{2,2} = 3. All other n_{c,i} are 0 in this example. These values finally lead to the class decision argmax_c Σ_{i=0}^k g_i · n_{c,i} = argmax_c {1/2, 2, 0} = 2. Hence, the additional information of the two nearest neighbors considered in EIDMR leads
to the rather clear decision that class 2 should be taken as the label. The computation of the boundaries with h = 1 yields p̲(1) = 1/7, p̄(1) = 3/7, p̲(2) = 4/7, p̄(2) = 6/7, and p̲(3) = 0, p̄(3) = 2/7. In comparison to IDMR, the uncertainty is reduced to 2/7.

Altogether, EIDMR leads to a clearer class decision with reduced uncertainty in this very simple example.
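To connect the sketch from Section II-C with these numbers: the following assignment of statements and weights reproduces the stated n_{c,i} (the exact vectors shown in Figure 1 are not legible in this copy, so this particular assignment is our reconstruction):

```python
import numpy as np
# uses eidmr_combine from the sketch in Section II-C

c = np.array([[1, 2],      # sample 0: two conflicting statements
              [2, 2],      # neighbor 1: both experts say class 2
              [2, 2]])     # neighbor 2: both experts say class 2
w = np.array([[1.0, 1.0],  # "hard" for sample 0
              [1.5, 1.5],  # "easy" for neighbor 1
              [1.5, 1.5]]) # "easy" for neighbor 2
c_star, lower, upper = eidmr_combine(c, w, h=1.0, num_classes=3)
print(c_star)   # 2
print(lower)    # [1/7, 4/7, 0]   ~ [0.143, 0.571, 0.000]
print(upper)    # [3/7, 6/7, 2/7] ~ [0.429, 0.857, 0.286]
```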
III. EVALUATION ON ARTIFICIAL DATA SETS

In this section, the accuracy of the combination rules proposed in Section II is investigated and compared, on three artificial data sets for which the true classes are known, to that of two further well-known rules, DSR and MR, which are derived from DST. The generation of the artificial data sets that we used in our experiments is described in Section III-A. Then, the simulation of artificial experts is sketched in Section III-B. The accuracy of the four combination rules on these artificial data is investigated in Section III-C.

A. Generation of Artificial Data Sets

Before we apply the four combination rules to a real data set, we want to assess these rules with regard to their ability to detect the true classes under different conditions. To do this, we generated three data sets which contain 1000 samples with five ordinal classes. The feature space consists of two dimensions, which allows for an illustrative presentation of the data sets. The samples were generated with the use of Gaussian mixture models (GMM) with five components, each assigned to one of the five ordinal classes. All mixture coefficients of the GMM were set to 0.2. Thus, each class label occurred approximately 200 times in a data set. The three data sets were generated with similar GMM but differ regarding the location of the samples in the feature space, because the distances of the expectations (mean values) are highest in data set I and lowest in data set III. That is, the overlap of the component densities increases from data set I to III and, thus, the samples belonging to different classes become more difficult to separate. In Figure 2, the data sets with highest and lowest overlap are set out.
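This data generation is easy to replicate; a minimal sketch follows, in which the component means, covariances, and random seed are our own choices, since the paper does not state its exact values:

```python
import numpy as np

def make_gmm_data(n_samples=1000, spread=6.0, seed=0):
    """2-D GMM with five components, one per ordinal class, and equal
    mixture coefficients of 0.2. Decreasing `spread` increases the
    component overlap (data set I -> data set III)."""
    rng = np.random.default_rng(seed)
    means = np.array([[i * spread, (i % 2) * spread] for i in range(5)])
    labels = rng.integers(1, 6, size=n_samples)   # approx. 200 per class
    X = means[labels - 1] + rng.normal(scale=2.0, size=(n_samples, 2))
    return X, labels

X1, y1 = make_gmm_data(spread=6.0)   # low overlap, like data set I
X3, y3 = make_gmm_data(spread=2.0)   # high overlap, like data set III
```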
B. Generation of Artificial Expert Statements

Having prepared the data sets, we generated 30 artificial, ordinal expert statements (class labels) for every sample by altering the true class labels with a random process according to the influences identified in Section I. Each of the influences was modeled separately and, hence, could be chosen individually for every simulated expert. Particularly, every simulated expert was allotted an individual level of experience (the form of the day was not modeled separately), an individual notion of strictness, and an individual tendency not to opt for extremes. Each of these influences leads to a probability that a simulated expert labels a sample with an altered (i.e., wrong) class. An expert with a distinct notion of strictness will, e.g., have a high probability to choose a class label which is lower than the true class. To illustrate the results of this process, the statements for the samples in data sets I and III are shown for one of the experts in Figure 3.

Additionally, the generation of a difficulty statement reflecting the certainty of an expert concerning the label for a specific sample was implemented with three levels "easy", "medium", and "difficult" (with numerical values of 1.50, 1.25, and 1.00, respectively, cf. also the example in Section II-D). For the application of the DSR and MR, simple support functions (cf. [5]) were used. We had to transform the difficulty values to an appropriate interval by multiplying with 0.6 (i.e., to 0.90, 0.75, and 0.60).
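A minimal sketch of one simulated expert statement follows; the concrete probabilities and the one-step alteration model are our assumptions (the paper does not publish its exact error model), while the modeled influences are the ones named above:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_expert(true_label, p_error=0.3, strictness=0.7,
                    p_center=0.2, num_classes=5):
    """One artificial, ordinal expert statement for a sample."""
    label = true_label
    if rng.random() < p_error:                            # experience level
        step = -1 if rng.random() < strictness else 1     # strictness
        label = min(max(label + step, 1), num_classes)
    if label in (1, num_classes) and rng.random() < p_center:
        label = 2 if label == 1 else num_classes - 1      # avoid extremes
    return label

print([simulate_expert(3) for _ in range(10)])   # ten statements, true class 3
```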
C. Properties of the Combination Rules

In this section, we apply each of the four combination rules DSR, MR, IDMR, and EIDMR to the artificial data sets to combine the experts' label statements. To measure the degree of agreement between the results of the combination rules and the true labels, the inter-rater agreement is used as an accuracy measure. Cohen's weighted kappa statistic κ_w is a standard measure for this purpose [17]. It allows for the use of weights to reflect the extent of similarity between ordinal classes. Hence, it incorporates the magnitude of each disagreement and provides partial credit for disagreement when agreement is not perfect [18]. The two most widely used weighting schemes are symmetric linear weights and symmetric quadratic weights [19]. For our analysis, we use symmetric linear weights. Cohen's weighted kappa statistic κ_w can be interpreted as a chance-corrected index of agreement. It yields 1 for perfect agreement. If κ_w is 0, the agreement is equal to that expected under independence. A negative value indicates that the agreement is less than expected by chance [20].
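Computing κ_w with symmetric linear weights is a one-liner in common libraries; a small sketch with scikit-learn (the paper names no toolbox for this step):

```python
from sklearn.metrics import cohen_kappa_score

# Linearly weighted kappa between fused and true ordinal labels.
true_labels  = [1, 2, 2, 3, 4, 5, 5, 3]
fused_labels = [1, 2, 3, 3, 4, 4, 5, 3]
print(cohen_kappa_score(true_labels, fused_labels, weights="linear"))
```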
To apply EIDMR, the data had to be scaled using a z-transform because features with larger values would dominate those with smaller values in the knn approach [21]. This scaling is not needed for the other three combination rules.
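The z-transform and the subsequent neighbor search can be sketched as follows (brute-force distance computation for clarity; this is our own illustration):

```python
import numpy as np

def knn_indices(X, k):
    """z-transform the features and return, per sample, the indices of
    its k nearest neighbors (Euclidean distance). Any knn
    implementation would serve equally well."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)       # a sample is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

X = np.array([[1.0, 200.0], [2.0, 210.0], [10.0, 190.0], [11.0, 205.0]])
print(knn_indices(X, k=2))            # neighbor indices per sample
```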
In our first experiment, the number of additionally considered samples in EIDMR was set to k = 3, 5, 7, or 9. In order to investigate the influence of the number of experts on the accuracy, we increased this number step by step. To obtain statistically significant results, we repeated the generation of the artificial class labels and the application of the combination rules ten times for each data set. The fact that the true classes are known for the generated data sets enables us to evaluate the accuracy of the rules.
Fig. 2. Two of the data sets, I (left) and III (right), generated with different GMM (samples with true labels).
Fig. 3. Statements of an artificial expert for samples in data set I (left) and III (right).
Fig. 4. Combination accuracy for data set I: mean (top) and standard deviation (bottom).

Fig. 5. Combination accuracy for data set II: mean (top) and standard deviation (bottom).
In Figure 4, the mean values μ(κ_w) and the standard deviations σ(κ_w) for the ten repetitions of the experiment are outlined for data set I. Different values of k have been used applying EIDMR. It can be seen that EIDMR outperforms the other combination rules for every considered number of experts and every k with regard to μ(κ_w) and, thus, with regard to the accuracy in detecting the true classes. This is due to the additional statements in the knn approach. Because of the low overlap of the Gaussian components, there are no differences in the combination accuracy no matter which of the considered values is used for k if there is more than one expert statement available for a sample. In comparison to the other rules, the standard deviations of EIDMR are higher if the number of experts is lower than 8. This can be reasoned with the influence of the often differing expert statements gathered with the knn approach. The results of MR and IDMR coincide strongly; only minor differences can be stated. This is reasoned by the combination process using MR, where an average belief function is computed and then combined N_E − 1 times using DSR. This yields very similar results compared to IDMR.

Results for data set II are shown in Figure 5. Compared to the other two data sets, this data set is based on a medium overlap of the five components in the feature space that are assigned to different classes. Comparing the three combination rules DSR, MR, and IDMR, there are no significant differences in their accuracies if the number of experts is lower than five. As for data set I, the curves of MR and IDMR coincide strongly. Due to additional information gathered with the knn approach, EIDMR outperforms the other rules if the number of fused statements (i.e., experts) is lower than 12. However, the standard deviations are higher, too. The highest accuracy using EIDMR is reached with k = 9. If the number of experts exceeds 12, DSR performs best in detecting the true classes. Thus, the additional benefit of the knn approach decreases with an increasing number of experts.

Figure 6 outlines the results for data set III, which is the data set with the highest overlap of components. In the cases with a low number of expert statements (i.e., where the number of experts is at most 3), EIDMR outperforms the other rules. In comparison to the other data sets, a variation of k has the highest influence on the combination accuracy. The highest accuracy using EIDMR is reached for k = 3. For an increasing number of experts (> 3), the influence of the knn approach leads to a lower accuracy of EIDMR compared to the other
combination rules. Both effects can be reasoned by the high overlap of the components of the mixture model underlying data set III. Then, DSR has the highest average accuracy and also the lowest standard deviation of all combination rules.

Fig. 6. Combination accuracy for data set III: mean (top) and standard deviation (bottom).

We now investigate some properties of EIDMR in two additional experiments.

In our second experiment, we varied the number of samples in data set III to examine the influence of the sample density on the combination accuracy of EIDMR. This is a very interesting aspect as we can expect that the sample density influences the values of κ_w. Figure 7 outlines the results for 250, 750, and all 1000 samples contained in data set III which result from using EIDMR with k = 3 and 9. It can be seen that the mean μ(κ_w) (ten repetitions again, see above) actually increases significantly with the number of samples. The increasing sample density will lead to a selection of more similar k samples in the knn approach.

Fig. 7. Combination accuracy for different sample numbers (data set III).

In our third experiment, we analyzed the influence of the difficulty weights w_{i,j} and the use of information about the class order in EIDMR. To do this, we may set the value of each weight w_{i,j} to one, which is equivalent to ignoring the experts' difficulty assessments for each sample. To ignore the class order information contained in the class labels, we may set the value of each similarity weight g_i to 1/(k+1). In this experiment, we investigate both cases and also their combination. The latter case leads to a simple majority decision considering the k+1 considered samples with EIDMR. The results, outlined in Figure 8, show that both the difficulty weights and the class order information have a noticeable positive effect on the combination accuracy.

Fig. 8. Influence of class order information and difficulty weights on combination accuracy for data set III.

Considering the results of our experiments for all three data sets, the DSR has a strong performance with regard to the detection of the true class labels for every data set. However, the combination accuracy can be further improved using EIDMR in many cases where the number of available expert statements is rather low. The actual benefit from EIDMR depends on the particular data set. The accuracy of EIDMR increases with more available samples in a data set. Furthermore, the use of the ordinal information given by the class labels and the consideration of reliability weights lead to an additional improvement in the application of EIDMR.

IV. CLASSIFICATION OF LOW-VOLTAGE GRIDS

In the preceding two sections we formally elaborated a new combination rule, EIDMR, and compared it to IDMR, DSR, and MR on three artificial data sets. To further investigate and analyze the combination approaches with regard to a real application, we now present a case study where we apply the combination rules to classify 300 low-voltage grids. In Section IV-A, we describe the background of our case study as well as the collection of the grid data. Additionally, our expert knowledge based approach to gather labels for the grids under investigation is briefly presented. The previously discussed combination rules are compared with regard to their accuracy in an application of SVM classifiers in Section IV-B.

A. Background and Collection of Empirical Data

The low-voltage distribution level is the one at which most of the end users, for example households, are connected to the electric power system. From the beginning of the supply with electricity, the power system was designed in a hierarchical way to transport electric energy from central power plants to consumers. With changing political circumstances, upcoming new market trends, increasing environmental loading, and decreasing availability of fossil energy sources, a paradigm change can be recognized [22], [23], [24], [25]. Especially
in rural and suburban areas, because of their high potential for Renewable Energies (RE), the installation of distributed generators (DG) has been forced in the past decade. Without appropriate counteraction, the high amount of installed DG in the low-voltage grids, especially photovoltaic generators (PV), may cause overloading of the electrical equipment and violation of voltage limits. The emergence of such bottlenecks is highly dependent on the grid structure and the configuration of the DG within the grid. Regarding these problems, the responsible distribution system operator (DSO) has to consider specific enhancement and long-acting development of low-voltage grids. But regarding the limited financial possibilities in regulated markets (e.g., incentive regulation), the decision in which low-voltage grids an investment is placed becomes increasingly important. The discrimination of low-voltage grids into different ordinal classes with regard to their hosting capacity for DG can support the investment decision, but it is a difficult task because various and complex grid structures exist. This is due to the fact that low-voltage grids have historically grown structures with local and geographic dependencies (e.g., rivers, composition of the ground). To cope with the challenge of classifying grids into different ordinal classes, expert knowledge can be elicited. Our aim is to build a system which is based on expert knowledge and allows for an automatic classification of a grid.

In our grid survey, the ten grid parameters shown in Table I were gathered for 300 real rural and suburban low-voltage grids. In order to label these data, we accomplished an expert based procedure. Admittedly, a real ground truth to validate the experts' classification results could only be obtained by an actual increase of the DG power in these grids, which is obviously not possible. A number of five experts from distribution grid planning practice (DSO staff) was chosen as a trade-off between reliability of the combined statements and costs of the inquiry. The classification was intended to yield information on the DG capacity of the low-voltage grids under consideration. For that, we used five distinct ordinal classes with an ascending order describing the strength of the grid structure: (1) "very weak", (2) "weak", (3) "average", (4) "strong", and (5) "very strong".

TABLE I
GRID PARAMETERS OF TWO SAMPLE GRIDS.

  # Characteristic Grid Parameter             Grid I   Grid II
  1 No. of transformer stations                    2         2
  2 Sum of rated transformer power [kVA]         880      1260
  3 No. of cable distribution boxes                5         9
  4 Sum of wired line length [m]                7475      5375
  5 Sum of intermeshed line length [m]          1764      3105
  6 Portion of intermeshing [%]                 23.6      57.8
  7 Portion of new line type NAYY 150 [%]       50.3      62.8
  8 Max. straight-line dist. to transf. [m]     1082       354
  9 Avg. straight-line dist. to transf. [m]      840       344
 10 No. of house connections                      92       100

The design of the questionnaire was oriented to hold the quality criteria validity, reliability, and objectivity for the measurement. Hence, we provided the gathered grid parameters and a plan for every grid as well as some supplementary information for optional usage by the experts. The first supplementary information consisted of five prototype grids, one for each class, which we selected from the 300 real grids. To provide more information, we divided the range of the parameters into five intervals so that every interval contains 20% of the grids. This results in indicator functions assigning one of the five grid classes for the manifestation of every particular parameter (e.g., if the percentage of intermeshing lies between 0% and 31%, the corresponding indicator function yields class 1 for this parameter). Additionally, the experts were asked for a global ranking and weighting of the grid parameters concerning their importance for the classification decision at the beginning of the inquiry. With regard to the ranking and weighting of the parameters, we provided a classification indicator for every grid using the first five parameters and their weighting. The indicator represents the rounded weighted average of the first five indicator functions in the ranking.
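As a small sketch of this indicator (the survey's concrete parameter weights are not given in the paper, so the inputs here are placeholders):

```python
import numpy as np

def classification_indicator(param_classes, param_weights):
    """Rounded weighted average of the first five indicator-function
    classes (each in 1..5); the weights stem from the experts' global
    ranking and are placeholders here."""
    return int(round(np.average(param_classes, weights=param_weights)))

# E.g., parameter-wise classes (3, 4, 2, 3, 5) with descending weights:
print(classification_indicator([3, 4, 2, 3, 5], [5, 4, 3, 2, 1]))  # -> 3
```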
Although we provided supplementary information to the five experts, they, unsurprisingly, often made conflicting statements. We also asked every expert for a difficulty statement according to three difficulty levels ("easy", "medium", "hard") while assessing a grid. This information was considered by EIDMR. The numeric values for the weights were set as described in Section III-B.

B. Classification with Support Vector Machines

We now apply the four combination rules IDMR, EIDMR, DSR, and MR to the data of our case study. That is, we aim for a combination of the five experts' statements into one combined class label for every sample using each of the four combination rules. Using EIDMR, the number of considered additional samples was set to k = 5 and the features were z-transformed. The accuracy of the combination results cannot be evaluated such as in Section III-C. This is due to the fact that the true classes are not known for the data of our case study. We train a classifier on training data, evaluate it on test data, and do so in a cross-validation approach using the combined class labels. Thus, we aim to compare the classification accuracy regarding the combined class labels of each combination rule. Our assumption is that a high accuracy of a combination rule concerning the detection of the true classes is accompanied by a high accuracy of the trained classifier.

In our experiments, we evaluate two different classification concepts predicting the classes. In the first concept, we treated the expert statements individually by extending the training data in the following way: Each feature vector occurs five times, and as targets we used the five expert statements. This extension of the training data can be seen as an interpretation of the set of the five expert statements as one gradual label. Only the testing is done with the combined label in this concept. The second concept already uses the combined label in the training phase. The testing is, such as in the first concept, done with the combined label. To implement the classifiers, the LIBSVM library with a standard Gaussian kernel [26] was
used. The feature space was built by the characteristic grid parameters set out in Table I.

To estimate good parameter values for the SVM classifiers and to prevent them from overfitting, we used a stratified 3-fold cross-validation. The stratification is essential here because the 300 combined class labels were not equally distributed, no matter which of the four combination rules is applied. Thus, the process of randomly rearranging the data into 3 folds has to ensure that each fold represents the distribution of the class labels. Furthermore, the selection of the SVM parameters C and γ was done using a grid search. Because the test data of the 3-fold cross-validation process must not be used to find the parameters, we implemented another 3-fold cross-validation within the actual validation procedure to robustly search the parameters on the training data. One problem is that the class labels have an ordinal character but are interpreted as nominal classes by the SVM. Thus, to estimate the model performance, we do not only make use of the classification error e (the proportion of incorrectly classified samples) but optimized the model parameters with regard to κ_w between the test data and the output of the model (we use κ_w to measure the classification accuracy; the corresponding error measure is e′ = 1 − κ_w; cf. Section III-B). The resulting parameter combination which we used for all implemented SVM is C = 10 and γ = 0.03.
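The validation scheme just described can be sketched with scikit-learn in place of raw LIBSVM (both wrap the same SVM implementation); the data below are random placeholders, not the grid data:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))        # placeholder for the 10 grid features
y = rng.integers(1, 6, size=300)      # placeholder for combined class labels

kappa = make_scorer(cohen_kappa_score, weights="linear")
inner = GridSearchCV(                 # inner 3-fold search for C and gamma
    SVC(kernel="rbf"),                # standard Gaussian kernel
    {"C": [1, 10, 100], "gamma": [0.01, 0.03, 0.1]},
    scoring=kappa,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
scores = []                           # outer stratified 3-fold evaluation
for tr, te in outer.split(X, y):
    model = inner.fit(X[tr], y[tr])
    scores.append(cohen_kappa_score(y[te], model.predict(X[te]),
                                    weights="linear"))
print(np.mean(scores))                # kappa_w averaged over the test folds
```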
Having found good parameter values for the SVM, we made ten 3-fold cross-validations for each classification concept. In each of these cross-validations, the data is randomly rearranged into 3 stratified folds. As a consequence, the samples which are assigned to the folds will differ in each repetition. A number of ten repetitions of the cross-validations is chosen to get statistically significant results and to consider the influence of the random rearrangement of the folds.

TABLE II
CLASSIFICATION RESULTS OF THE FIRST SVM CLASSIFICATION CONCEPT PREDICTING EIDMR-COMBINED CLASS LABELS.

  #   κw,train   κw,test   etrain [%]   etest [%]
  1     0.456     0.736        42.8        23.7
  2     0.456     0.744        43.1        23.7
  3     0.449     0.737        43.3        21.7
  4     0.459     0.713        42.8        24.7
  5     0.459     0.768        42.9        20.7
  6     0.457     0.752        43.1        22.7
  7     0.459     0.767        42.8        20.7
  8     0.453     0.738        43.1        23.3
  9     0.457     0.763        43.0        20.7
 10     0.453     0.755        43.1        22.0
  μ     0.456     0.747        43.0        22.4
  σ     0.003     0.017         0.2         1.5

In Table II, the classification error and the classification accuracy κ_w of the first classification concept predicting EIDMR-combined labels are set out for each of the ten cross-validation repetitions. The results of all single cross-validations are outlined to illustrate the influence of the random rearrangement of the folds on the classification error and classification accuracy. The last two columns show the classification error for the training samples and the test data. In the preceding two columns, the values of κ_w are outlined for the training samples as well as the test data. We show both training and test accuracies to illustrate the differences between training and test. In addition to the single values of the cross-validations, the overall mean values (μ) and the overall standard deviations (σ) are presented for each accuracy measure. It can be seen that the overall test accuracy represented by the mean value μ(κw,test) is approximately 0.75. The corresponding overall test error μ(etest) is 22.4%. The overall training accuracy and the overall training error attain significantly worse values of μ(κw,train) ≈ 0.46 and μ(etrain) = 43.0%, respectively. This is reasoned by the fact that in the first classification concept the same feature vector of a training set is contained multiple times (here, five times) in the training data (often with different, i.e., conflicting expert statements).

TABLE III
CLASSIFICATION RESULTS OF THE SECOND SVM CLASSIFICATION CONCEPT PREDICTING EIDMR-COMBINED CLASS LABELS.

  #   κw,train   κw,test   etrain [%]   etest [%]
  1     0.912     0.799         8.2        18.0
  2     0.914     0.805         8.0        17.7
  3     0.910     0.786         8.3        19.4
  4     0.909     0.818         8.5        16.6
  5     0.917     0.783         7.7        19.7
  6     0.910     0.780         8.3        19.6
  7     0.905     0.807         8.8        17.3
  8     0.912     0.789         8.2        19.3
  9     0.905     0.819         8.8        16.3
 10     0.908     0.814         8.5        16.7
  μ     0.910     0.800         8.3        18.1
  σ     0.004     0.015         0.4         1.3

Table III shows the results of the second classification concept which directly uses the combined labels to train the SVM. Because of the direct use of the combined labels, the overall training accuracy as well as the overall training error are noticeably better compared to the first classification concept. The test values result in μ(κw,test) = 0.80 and μ(etest) = 18.1% and are also better than the respective values of the first classification concept.

In summary, the second classification concept outperforms the first one and, thus, is more advantageous to predict the combined labels in this experiment. Furthermore, our results show that the use of EIDMR has a high potential to yield a good classification accuracy. But is this accuracy significantly better compared to the use of the other three combination rules? To investigate the influence of the combination rule on the classification results, we conducted another experiment. We realized the better performing second classification concept for each combination rule. The overall classification results are set out in Figure 9. Using the labels combined with the three well-known approaches DSR, MR, and IDMR yields an overall classification accuracy of approximately μ(κw,test) ≈ 0.61 for each of these three rules. These values are significantly worse compared to the discussed results based on EIDMR. Using EIDMR labels leads to an enhancement of the overall classification accuracy to about 0.8 on test data. Additionally, we observe that the use of EIDMR reduces the standard
deviations of the classification accuracy.

Fig. 9. Overall classification accuracy μ(κ_w) for each combination rule using the second classification concept.

V. CONCLUSION AND FUTURE RESEARCH

In this article, we proposed a novel combination rule for expert statements and compared it to the well-known combination rules DSR, MR, and IDMR. The results show that, especially if there are only a small number of expert statements available (e.g., due to high elicitation costs), the additional information gathered from similar samples by means of a knn approach leads to significantly better combination results with the new EIDMR. The use of the ordinal information given by the class labels and the consideration of reliability weights lead to an additional improvement in the application of EIDMR.

If desired, EIDMR could be developed further by considering the distance (measured in the feature space) of similar samples in the knn approach by means of additional weights.

To further validate our new combination rule, we presented a comprehensive case study by applying all combination rules to the data of 300 real low-voltage grids. The grids were assessed with ordinal labels by five experts from a regional distribution system operator. In this case study, the combination rules were compared with regard to the prediction accuracy in a classification approach with SVM. Our results show that EIDMR noticeably outperforms the other rules concerning the prediction of the combined labels.

Our future research activities in the field of low-voltage grid classification will further consolidate the concluded results, and we will investigate other ways to predict the combined class labels such as, e.g., ensemble learning. Furthermore, we will improve the classification accuracy by integrating additional features and using classification results from stochastic load flow simulations [27].

REFERENCES

[1] S. A. Sandri, D. Dubois, and H. W. Kalfsbeek, "Elicitation, Assessment, and Pooling of Expert Judgments Using Possibility Theory," IEEE Transactions on Fuzzy Systems, vol. 3, no. 3, pp. 313–335, 1995.
[2] S. S. Stevens, "On the use of Expert Judgement on Complex Technical Problems," IEEE Transactions on Engineering Management, vol. 36, no. 2, pp. 83–86, 1989.
[3] C. Murphy, "Combining Belief Functions When Evidence Conflicts," Decision Support Systems, vol. 29, no. 1, pp. 1–9, 2000.
[4] K. Sentz and S. Ferson, "Combination of Evidence in Dempster-Shafer-Theory," Sandia National Laboratories, Technical Report, 2002.
[5] D. Andrade, T. Horeis, and B. Sick, "Knowledge Fusion Using Dempster-Shafer Theory and the Imprecise Dirichlet Model," IEEE Conference on Soft Computing in Industrial Applications, Muroran, pp. 142–148, 2008.
[6] G. Shafer, A Mathematical Theory of Evidence. Princeton University Press, Princeton, 1976.
[7] G. Hua-wei, S. Wen-kang, and D. Yong, "Evidential Conflict and its 3D Strategy: Discard, Discover and Disassemble," Systems Engineering and Electronics, vol. 29, no. 6, pp. 890–898, 2007.
[8] L. A. Zadeh, "Review of Books: A Mathematical Theory of Evidence," The AI Magazine, vol. 5, no. 3, pp. 81–83, 1984.
[9] D. Yi-jie, W. She-liang, Z. Xin-dong, L. Miao, and Y. Xi-qin, "A Modified Dempster-Shafer Combination Rule Based on Evidence Ullage," International Conference on Artificial Intelligence and Computational Intelligence, Shanghai, pp. 387–391, 2009.
[10] F. Smarandache and J. Dezert, Advances and Applications of DSmT for Information Fusion: Collected Works. American Research Press, Rehoboth, 2009.
[11] A. Tchamova and J. Dezert, "On the Behavior of Dempster's Rule of Combination and the Foundations of Dempster-Shafer Theory," IEEE International Conference on Intelligent Systems, Sofia, pp. 108–113, 2012.
[12] F. Smarandache, J. Dezert, and V. Kroumov, "Examples Where the Conjunctive and Dempster's Rules are Insensitive," Proceedings of the 2013 International Conference on Advanced Mechatronic Systems, Luoyang, pp. 7–9, 2013.
[13] M. C. M. Troffaes and F. Coolen, "On the Use of the Imprecise Dirichlet Model With Fault Trees," Technical Report, Durham University, pp. 1–8, 2007.
[14] L. V. Utkin, "Extensions of Belief Functions and Possibility Distributions by Using the Imprecise Dirichlet Model," Fuzzy Sets and Systems, vol. 154, no. 3, pp. 413–431, 2005.
[15] E. T. Whitaker and G. N. Watson, A Course Of Modern Analysis. Cambridge University Press, Cambridge, 1996.
[16] P. Walley, "Inferences From Multinomial Data: Learning About a Bag of Marbles," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, pp. 3–57, 1996.
[17] M. J. Warrens, "Equivalences of Weighted Kappas for Multiple Raters," Statistical Methodology, vol. 9, no. 3, pp. 407–422, 2012.
[18] M. Maclure and W. C. Willett, "Misinterpretation and Misuse of the Kappa Statistic," American Journal of Epidemiology, vol. 126, no. 2, pp. 161–169, 1987.
[19] P. W. Mielke and K. J. Berry, "A Note on Cohen's Weighted Kappa Coefficient Of Agreement With Linear Weights," Statistical Methodology, vol. 6, no. 5, pp. 439–446, 2009.
[20] M. J. Warrens, "Cohen's Quadratically Weighted Kappa is Higher Than Linearly Weighted Kappa for Tridiagonal Agreement Tables," Statistical Methodology, vol. 9, no. 3, pp. 440–444, 2012.
[21] C.-C. Chang and C.-J. Lin, "A Practical Guide to Support Vector Classification," Technical Report, National Taiwan University, pp. 1–16, 2010.
[22] G. W. Ault, J. R. McDonald, and G. M. Burt, "Strategic Analysis Framework for Evaluating Distributed Generation and Utility Strategies," IEEE Proceedings on Generation, Transmission and Distribution, vol. 150, no. 4, pp. 475–481, 2003.
[23] H. B. Puttgen, P. R. MacGregor, and F. C. Lambert, "Distributed Generation: Semantic Hype or the Dawn of a New Era?" IEEE Power and Energy Magazine, vol. 1, no. 1, pp. 1–5, 2003.
[24] P. Chiradeja, "Benefit of Distributed Generation: A Line Loss Reduction Analysis," Transmission and Distribution Conference and Exhibition: Asia and Pacific, Yokohama, pp. 1–5, 2005.
[25] J. Driesen and R. Belmans, "Distributed Generation: Challenges and Possible Solutions," Power Engineering Society General Meeting, Montreal, pp. 1–5, 2006.
[26] C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 1–27, 2011, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm (13.01.2015).
[27] S. Breker, A. Claudi, and B. Sick, "Capacity of Low-Voltage Grids for Distributed Generation: Classification by Means of Stochastic Simulations," IEEE Transactions on Power Systems, vol. 30, no. 2, pp. 689–700, 2015.