
Combination of Uncertain Ordinal Expert Statements: The Combination Rule EIDMR and Its Application to Low-Voltage Grid Classification with SVM

Sebastian Breker (EnergieNetz Mitte GmbH, Kassel, Germany, Email: [email protected])
Bernhard Sick (Intelligent Embedded Systems, University of Kassel, Germany, Email: [email protected])

Abstract: The use of expert knowledge is always more or less afflicted with uncertainties, for many reasons: expert knowledge may be imprecise, imperfect, or erroneous, for instance. If we ask several experts to label data (e.g., to assign class labels to given data objects, i.e., samples), we often find that these experts make different, sometimes conflicting statements. The problem of labeling data for classification tasks is a serious one in many technical applications where it is rather easy to gather unlabeled data, but the task of labeling requires substantial effort regarding time and, consequently, money. In this article, we address the problem of combining several, potentially wrong class labels. We assume that we have an ordinal class structure (i.e., three or more classes are arranged, such as "light", "medium-weight", and "heavy") and that only a few expert statements are available. We propose a novel combination rule, the Extended Imprecise Dirichlet Model Rule (EIDMR), which is based on a k-nearest-neighbor approach and Dirichlet distributions, i.e., second-order distributions for multinomial distributions. In addition, experts may assess the difficulty of the labeling task, which may optionally be considered in the combination. The combination rule EIDMR is compared to others such as a standard Imprecise Dirichlet Model Rule, the Dempster-Shafer Rule, and Murphy's Rule. In our evaluation of EIDMR, we first use artificial data where we know the data characteristics and true class labels. Then, we present results of a case study where we classify low-voltage grids with Support Vector Machines (SVM). Here, the task is to assess the expandability of these grids with additional photovoltaic generators (or other distributed generators) by assigning these grids to one of five ordinal classes. It can be shown that the use of our new EIDMR leads to better classifiers in cases where ordinal class labels are used and only a few, uncertain expert statements are available.

I. INTRODUCTION

In general, human judgment can be gathered either in a quantitative or in a qualitative way. Often, humans do not feel comfortable with the task of expressing their opinion or belief quantitatively, because they worry that a concrete numeric value could give the (false) impression that there is more confidence in a judgment than they really have. Even for application experts it is difficult to cope with high-dimensional and complex tasks (see, e.g., [1]). But using words instead of concrete values is also ambiguous and imprecise [2]. Thus, making a statement based on predefined ordinal classes (e.g., "low", "medium", and "high") can be seen as a reasonable trade-off between eliciting a quantitative and a qualitative judgment.

In the following, we assume that we want to gather knowledge of experts in an application task, and we refer to "experts" only as human knowledge sources. These domain experts assess data objects (samples), e.g., they label data using predefined ordinal classes to solve machine learning problems with supervised learning techniques. In addition, the experts may assess the difficulty of providing a label. This optional value can be seen as a kind of uncertainty of a single expert regarding that statement.

Basically, a statement of an expert on an ordinal scale is influenced by many subjective parameters, e.g.,
1) experts have individual experience levels,
2) their forms of the day may vary,
3) they have different notions of "strictness", and
4) experts have an individual tendency not to opt for extremes.
On an abstract level, experience level and form of the day can be regarded as affecting the probability that a correct statement is provided by an expert. The individual use of strictness can be seen as a probability that the expert tends to rate a sample with a higher (or lower) class label compared to the true class. Furthermore, the individual tendency not to opt for extremes leads to a probability that an expert assesses a sample whose true class is near a "boundary" class (e.g., the "lowest" or "highest" class) with a label which is nearer to a "middle" class. Depending on this tendency, the bandwidth of the actually exploited ordinal classes may be low.

Regarding the assessment of the difficulty of the labeling tasks, the fraction of statements rated as either "easy" or "difficult" will be larger or smaller for an expert with a high experience level compared to one with low experience. Likewise, these fractions will differ for a single expert with different forms of the day. For all these reasons, a decision regarding an ordinal classification made by an expert is more or less uncertain. Hence, a final classification decision in a concrete application should not be made using statements from a single expert only. As experts might disagree, uncertain statements have to be combined in an appropriate way. The aim of all combination rules is to generate fused class labels which predict the true class with a higher accuracy than the more uncertain individual class labels.

In this article, we present a novel combination rule for class labels which is shown to be superior to some existing ones if (1) we have to cope with ordinal classes as discussed above and (2) the number of labels (expert statements) for single samples is quite low. This rule, the Extended Imprecise Dirichlet Model Rule (EIDMR), is based on a k-nearest-neighbor (knn) approach and Dirichlet distributions, i.e., second-order distributions for multinomial distributions. Basically, it extends a standard Imprecise Dirichlet Model Rule (IDMR) by considering (1) the class order and (2) additional, also uncertain labels for similar samples (measured in an appropriate feature space). The uncertainty accompanying the combination can be determined with the help of two probability boundaries. The usage of the resulting uncertainty values for the combined statement is optional. They can be used, e.g., for a comparison of the uncertainty of different data objects or for a comparison to another expert group in a second classification approach. In addition, experts may assess the difficulty of the labeling task, which may optionally be considered by EIDMR.

The accuracy of different rules (including EIDMR, IDMR, the Dempster-Shafer Rule, and Murphy's Rule) in predicting a class label is first investigated using three artificial data sets for which the true classes are known. Then, we compare the combination rules using real data from the field of low-voltage grid classification. With combined class labels, we train support vector machine (SVM) classifiers. The empirical data contain 300 samples of rural and suburban low-voltage grids with ten features. They were gathered in an extensive grid survey and labeled according to the grids' hosting capacity for distributed generators (e.g., photovoltaic generators) by five experts in distribution grid planning.

The remainder of the article is organized as follows: In Section II, we first discuss related work and IDMR; then, we present EIDMR and illustrate its properties with an example. In Section III, the accuracy of these combination rules is investigated in more detail using three artificial data sets and compared to two other well-known rules derived from Dempster-Shafer theory. Section IV presents the results of our study on the classification of low-voltage grids. Section V summarizes the key findings and sketches our future research.
II. COMBINATION OF CLASS LABELS

In this section, we propose the new EIDMR, which is based on Imprecise Dirichlet Models (IDM) and can be used to fuse uncertain ordinal class labels. We start with a presentation of the current state of the art (Section II-A). After that, the methodical foundations of IDM and IDMR are given in Section II-B. Next, we propose our new EIDMR in Section II-C. The application of IDMR and EIDMR is illustrated with a simple example in Section II-D.

A. Combination Rules: The State of the Art

Before we present the two combination rules which are based on IDM in a formal way, some important, general properties of combination rules are identified. Four requirements on combination rules are claimed in [3], [4] and extended by a fifth requirement in [5]. In summary, these requirements are [5]:
1) irrelevance of the order of statements in knowledge fusion (commutativity and associativity),
2) decrease of ignorance with an increasing number of statements,
3) concordance of knowledge increases the belief in a statement,
4) conflicting knowledge decreases the belief in a statement, and
5) persistent conflict is reflected.

Most well-known combination rules are based on Dempster-Shafer theory (DST) [6], e.g., the Dempster-Shafer Rule (DSR) [6] or Murphy's Rule (MR) [3]. A survey of combination rules based on DST has been presented in [7]. Most of the alternative DST-based rules consider conflict redistribution. The counter-intuitive behavior concerning paradox problems [8] caused by compatible evidence is rarely considered [9]. Besides the DST-based rules, there exist some other rules which are derived from Dezert-Smarandache theory [10] and correspond to a non-Bayesian reasoning approach, e.g., PCR5 [11]. Additionally, Smarandache et al. give several counterexamples where the DSR fails to provide coherent results (or provides no results at all) [12].

In summary, most of the known DST-based rules do not fulfill the additional fifth requirement (they were not designed to). For this reason, Andrade et al. [5] propose an IDM-based rule (IDMR) and show that this rule satisfies all of the above requirements. At this point, it should be mentioned that none of the mentioned rules considers (1) the class order and (2) available uncertain labels for similar samples (measured in an appropriate feature space). We overcome both points with our EIDMR.

B. Methodical Foundations: IDM and IDMR

To motivate the IDM, we assume a set Ω = {1, ..., C} of C mutually exclusive events, in our case a number of C different classes. Let the probability θ_c for the choice of a class c during a labeling process for a sample be denoted as an element of a vector θ. Usually, these probabilities θ_c are unknown. If we suppose that an object is labeled by altogether N_E experts, the result can be denoted by the classes' occurrence frequencies n = (n_1, ..., n_C)^T with Σ_{c=1}^{C} n_c = N_E. Furthermore, we model the likelihood of a result n with [13]:

    p(n | θ) = ∏_{c=1}^{C} (θ_c)^{n_c}.   (1)

Because θ is usually unknown, we use the Dirichlet distribution, which is the prior distribution in a Bayesian parameter estimation approach with hyperparameters h and t = (t_1, ..., t_C)^T, to model a distribution of the vector θ, which is assumed to be a random variable itself [14]:

    p(θ) = ( Γ(h) / ∏_{c=1}^{C} Γ(h·t_c) ) · ∏_{c=1}^{C} (θ_c)^{h·t_c − 1}   (2)

with h > 0, 0 < t_c < 1 for c = 1, ..., C, and Σ_{c=1}^{C} t_c = 1. Γ is the gamma function [15]. The vector t represents the prior knowledge about θ. In a next step, the combination of (1) and (2) according to p(θ | n) ∝ p(n | θ) · p(θ) yields the following posterior distribution:

    p(θ | n) = ( Γ( Σ_{c=1}^{C} (n_c + h·t_c) ) / ∏_{c=1}^{C} Γ(n_c + h·t_c) ) · ∏_{c=1}^{C} (θ_c)^{n_c + h·t_c − 1}.   (3)

We get a Dirichlet distribution again, because the Dirichlet is the conjugate distribution of the multinomial distribution. The influence of the prior knowledge t on the posterior can be controlled with the hyperparameter h [13]. In a real classification application, there is often no information about the hyperparameters h and t available.

The idea of the IDMR is to start not with a single Dirichlet distribution but with a set of Dirichlet distributions for a fixed value of h [13]. Thereby, the set of Dirichlet distributions is chosen in such a way that Σ_{c=1}^{C} t_c = 1 is adhered to [16]. Under these conditions, upper and lower bounds for the probability that a sample is assigned to class c can be determined by a maximization and a minimization over t_c, respectively [14]. That is, for each t, the corresponding Dirichlet distribution is updated using (1) [13]. If there is no prior information on θ available, the bounds are determined with the one-sided limits t_c → 0 and t_c → 1 (because of the linear updating step) and result, according to [16], in

    p̲(c | n_c, N_E) = n_c / (N_E + h)   (4)

and

    p̄(c | n_c, N_E) = (n_c + h) / (N_E + h),   (5)

where c is regarded here as the random variable for the combined statement (i.e., the fused class label). The hyperparameter h determines how quickly these upper and lower probabilities converge with an increasing number of observations. For this reason, Walley defined h as the number of observations needed to reduce the imprecision to half its initial value [16]. Typically, h is set to either 1 or 2 [16]. Using the derived bounds, the vector t need not be specified.

IDMR (not related to DST, cf. [5], [16]) is an alternative to the fusion rules based on DST. Andrade et al. used the probability boundaries from (4) and (5) to combine a collection of N_E classification statements c_j (c_j ∈ Ω is the statement of expert j, j = 1, ..., N_E) among which some experts vote for class c. Above, n_c is the number of experts that assign a given sample to class c; with N_E experts, we have 0 ≤ n_c ≤ N_E. This approach does not consider a difficulty assessment by the experts. To account for this, we introduce a weight w_j for each classification statement c_j of an expert (with w_j ≥ 1, as suggested in [5]). Then, we need an indicator function I_{j,c} which is 1 if expert j assigns label c to the sample under consideration (i.e., c_j = c), and 0 otherwise. We replace the number of experts that assign a given sample to class c by

    n_c = Σ_{j=1}^{N_E} w_j · I_{j,c}.   (6)

Thus, the higher the weight w_j of an expert, the more influence his statement has on the boundaries according to (4) and (5). The boundaries can be interpreted similarly to belief and plausibility in DST. Their difference depends on the number of classification statements and their weights and can be compared to the degree of ignorance in DST.

In an application of IDMR, the boundaries can be used in different ways. If we want to come to a sharp decision for a certain class, we may act carefully and consider the lower boundary p̲(c | n_c, N_E). Alternatively, we may choose the class

    c* = argmax_c n_c,   (7)

i.e., the class with the highest number of weighted observations. Then, the uncertainty of the decision can be determined by analyzing the difference p̄(c | n_c, N_E) − p̲(c | n_c, N_E). The uncertainty values can be used, e.g., for a comparison of the uncertainty of different samples or for a comparison to another expert group in a second labeling approach.
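To make the weighted IDMR concrete, the following minimal Python sketch computes the weighted counts (6) and the boundaries (4), (5), and (7). The function and variable names are our own, not code from the paper; for unit weights the denominator reduces to N_E + h as in (4) and (5):

```python
import numpy as np

def idmr_bounds(statements, weights, n_classes, h=1.0):
    """Weighted IDMR boundaries, cf. Eqs. (4)-(7).

    statements: class labels c_j in {1, ..., C}, one per expert
    weights:    difficulty weights w_j >= 1, one per expert
    """
    n_c = np.zeros(n_classes)
    for c_j, w_j in zip(statements, weights):
        n_c[c_j - 1] += w_j               # weighted counts, Eq. (6)
    total = n_c.sum()                     # equals N_E for unit weights
    lower = n_c / (total + h)             # Eq. (4)
    upper = (n_c + h) / (total + h)       # Eq. (5)
    c_star = int(np.argmax(n_c)) + 1      # sharp decision, Eq. (7)
    return lower, upper, c_star

# Example (cf. Sec. II-D): two experts, unit weights, three classes, h = 1:
# idmr_bounds([1, 2], [1.0, 1.0], 3)
#   -> lower = [1/3, 1/3, 0], upper = [2/3, 2/3, 1/3], c_star = 1 or 2 (tie)
```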
C. EIDMR

In a next step, we propose the new combination rule EIDMR, which is based on the same idea as IDMR. The key idea is to decrease the uncertainty of the fused result by means of additional information which is available from similar samples. Here, "similar" means that the feature vectors are similar, determined with an appropriate similarity measure (which may be based on a metric). To choose these additional samples, the IDMR is extended by a k-nearest-neighbor (knn) technique applied in the feature space. Then, the expert statements for the sample under consideration and the statements for its k nearest neighbors are considered. The application of knn ensures that only samples with characteristics similar to those of the considered sample are involved in the weighted combination. Assuming that N_E statements are available for each sample, (k+1)·N_E statements are considered by EIDMR to assess one specific sample. EIDMR weights the statements not only depending on the experts' difficulty assessments as described above, but also depending on differences between the class assignments for the sample under consideration and those for its k neighbors. EIDMR also considers the fact that we have ordinal classes: larger differences in class numbers lead to lower weighting factors.

To describe the approach in a formal way, we first need an additional index i for several variables, as we have to consider not only one sample but also its k neighbors. Thus, the sample under consideration gets the index 0, and its neighbors the indices i = 1, ..., k. Then, for a sample i (i = 0, ..., k) we have N_E expert statements (assignments to classes) summarized in c_i = (c_{i,1}, ..., c_{i,N_E})^T with corresponding difficulty assessments w_i = (w_{i,1}, ..., w_{i,N_E})^T. As we have ordinal classes, the dissimilarity between a vector c_i and the vector c_0 can be measured using a standard metric, e.g., the Euclidean distance ‖c_i − c_0‖. Then, we are able to define weights g_i (with i = 0, ..., k) for the samples that consider differences in their class assignments as follows:

    g_i = (1/k) · ( 1 − ‖c_i − c_0‖ / Σ_{l=1}^{k} ‖c_l − c_0‖ ).   (8)

To avoid dividing by zero, we have to exclude the very special case where all class vectors c_i are equal (in this case we would proceed with g_0 = 1 and g_i = 0 for i = 1, ..., k). These weights have the property Σ_{i=0}^{k} g_i = 1.

With this spadework, we formally extend the two probability boundaries in the following way:

    p̲(c | n_{c,0}, ..., n_{c,k}, N_E) = Σ_{i=0}^{k} g_i·n_{c,i} / ( Σ_{c=1}^{C} Σ_{i=0}^{k} g_i·n_{c,i} + h ),   (9)

    p̄(c | n_{c,0}, ..., n_{c,k}, N_E) = ( Σ_{i=0}^{k} g_i·n_{c,i} + h ) / ( Σ_{c=1}^{C} Σ_{i=0}^{k} g_i·n_{c,i} + h )   (10)

with

    n_{c,i} = Σ_{j=1}^{N_E} w_{i,j} · I_{i,j,c},   (11)

where now I_{i,j,c} indicates whether sample i has been assigned to class c by expert j (cf. (6) above).

As with IDMR, the probability boundaries can be used directly, or a sharp decision for a single class can be made by choosing c* with

    c* = argmax_c Σ_{i=0}^{k} g_i·n_{c,i}.   (12)

That is, the class c with the highest number of weighted observations is chosen as the combined label. The uncertainty accompanying this decision can be determined with p̄(c | n_{c,0}, ..., n_{c,k}, N_E) − p̲(c | n_{c,0}, ..., n_{c,k}, N_E). The uncertainty values may vary more between different samples with EIDMR because of the additional k samples and the additional weights g_i.

The EIDMR procedure is set out in Algorithm 1.

Algorithm 1: EIDMR algorithm for one sample.
  Input: set of samples with feature vectors, experts' classification statements, and difficulty assessments; sample 0 that has to be labeled.
  1: search the k nearest neighbors of sample 0 in the feature space with a Euclidean distance measure
  2: for i = 0 to k do
  3:     compute similarity weight g_i
  4: for c = 1 to C do
  5:     compute lower boundary p̲(c | n_{c,0}, ..., n_{c,k}, N_E)
  6:     compute upper boundary p̄(c | n_{c,0}, ..., n_{c,k}, N_E)
  7: choose class c* with c* = argmax_c Σ_{i=0}^{k} g_i·n_{c,i}
  Output: class c* and the upper and lower boundaries p̄ and p̲.
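A compact Python sketch of Algorithm 1 follows. The function name, array layout, and the use of NumPy are our assumptions, not the authors' implementation; it expects the features to be z-transformed beforehand (cf. Section III-C):

```python
import numpy as np

def eidmr(X, C_stmts, W, idx0, k, n_classes, h=1.0):
    """EIDMR for one sample, a sketch of Algorithm 1.

    X:       (N, d) feature matrix (z-transformed)
    C_stmts: (N, N_E) expert class labels in {1, ..., C}
    W:       (N, N_E) difficulty weights w_{i,j} >= 1
    idx0:    index of the sample to be labeled
    """
    # step 1: k nearest neighbors of sample 0 in the feature space
    d = np.linalg.norm(X - X[idx0], axis=1)
    nn = np.argsort(d)[1:k + 1]               # exclude the sample itself
    idx = np.concatenate(([idx0], nn))        # samples i = 0, ..., k

    # steps 2-3: similarity weights g_i from class-vector distances, Eq. (8)
    c0 = C_stmts[idx0].astype(float)
    dc = np.array([np.linalg.norm(C_stmts[i] - c0) for i in idx])
    if dc[1:].sum() == 0.0:                   # all class vectors equal
        g = np.zeros(k + 1)
        g[0] = 1.0
    else:
        g = (1.0 - dc / dc[1:].sum()) / k     # note g_0 = 1/k since dc[0] = 0

    # steps 4-6: weighted counts n_{c,i}, Eq. (11), and bounds (9), (10)
    n_ci = np.zeros((n_classes, k + 1))
    for col, i in enumerate(idx):
        for c_ij, w_ij in zip(C_stmts[i], W[i]):
            n_ci[int(c_ij) - 1, col] += w_ij
    wc = n_ci @ g                             # per class: sum_i g_i * n_{c,i}
    denom = wc.sum() + h
    lower, upper = wc / denom, (wc + h) / denom

    # step 7: sharp decision, Eq. (12)
    c_star = int(np.argmax(wc)) + 1
    return c_star, lower, upper
```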
D. Illustrative Example

To illustrate EIDMR and its differences to IDMR, we now elaborate a simple example with three samples in a feature space as shown in Figure 1. We assume that each sample was classified by two experts into one of the three ordinal classes 1, 2, and 3. Furthermore, we assume that a difficulty assessment is known for each statement: either "easy" (with weight w_{i,j} = 1.5), "medium" (w_{i,j} = 1.25), or "hard" (w_{i,j} = 1.0). In the figure, the classification statements and weights are set out as vectors c_i and w_i (i = 0, 1, 2). The weights of the experts' statements are lower for the data sample 0 under consideration than for its two nearest neighbors. This simulates the case where a sample is hard to classify while there exist some similar samples for which the classification task is easier.

[Fig. 1. Sample 0 and its k = 2 neighbors in the feature space.]

Using IDMR for sample 0 results in n_1 = 1, n_2 = 1, and n_3 = 0. Obviously, a clear decision concerning the combined class label cannot be made. If we use h = 1, the values for the lower and upper boundaries are p̲(1) = p̲(2) = 1/3 and p̄(1) = p̄(2) = 2/3 for the classes 1 and 2, as well as p̲(3) = 0 and p̄(3) = 1/3 for class 3. That is, the uncertainty regarding the class decision, estimated by the difference of the boundaries, is 1/3 for each class.

Now, we proceed with EIDMR. First, the three dissimilarities of the class vectors are computed with the Euclidean distance: ‖c_0 − c_0‖ = 0, ‖c_1 − c_0‖ = 1, and ‖c_2 − c_0‖ = 1. In a next step, the similarity weights are computed: g_0 = (1/2)·(1 − 0) = 1/2, g_1 = (1/2)·(1 − 1/2) = 1/4, and g_2 = (1/2)·(1 − 1/2) = 1/4. Then, we calculate the n_{c,i} using (11): n_{1,0} = 1, n_{2,0} = 1, n_{2,1} = 3, and n_{2,2} = 3; all other n_{c,i} are 0 in this example. These values finally lead to the class decision argmax_c Σ_{i=0}^{k} g_i·n_{c,i} = argmax_c {1/2, 2, 0} = 2. Hence, the additional information of the two nearest neighbors considered in EIDMR leads to the rather clear decision that class 2 should be taken as the label. The computation of the boundaries with h = 1 yields p̲(1) = 1/7, p̄(1) = 3/7, p̲(2) = 4/7, p̄(2) = 6/7, and p̲(3) = 0, p̄(3) = 2/7. In comparison to IDMR, the uncertainty is reduced to 2/7.

Altogether, EIDMR leads to a clearer class decision with reduced uncertainty in this very simple example.
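The numbers of this example can be reproduced with the eidmr() sketch above. The feature vectors below are dummies of our choosing, picked only so that samples 1 and 2 are the two nearest neighbors of sample 0:

```python
X = np.array([[0., 0.], [1., 0.], [0., 1.]])         # placeholder features
C_stmts = np.array([[1, 2], [2, 2], [2, 2]])         # two experts per sample
W = np.array([[1.0, 1.0], [1.5, 1.5], [1.5, 1.5]])   # "hard" vs. "easy" weights

c_star, lower, upper = eidmr(X, C_stmts, W, idx0=0, k=2, n_classes=3, h=1.0)
# c_star == 2
# lower  == [1/7, 4/7, 0]
# upper  == [3/7, 6/7, 2/7]
```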
III. EVALUATION ON ARTIFICIAL DATA SETS

In this section, the accuracy of the combination rules presented in Section II is investigated on three artificial data sets for which the true classes are known, and compared to the two further well-known rules DSR and MR, which are derived from DST. The generation of the artificial data sets that we used in our experiments is described in Section III-A. Then, the simulation of artificial experts is sketched in Section III-B. The accuracy of the four combination rules on these artificial data is investigated in Section III-C.

A. Generation of Artificial Data Sets

Before we apply the four combination rules to a real data set, we want to assess these rules with regard to their ability to detect the true classes under different conditions. To do this, we generated three data sets which contain 1000 samples with five ordinal classes. The feature space consists of two dimensions, which allows for an illustrative presentation of the data sets. The samples were generated with Gaussian mixture models (GMM) with five components, each assigned to one of the five ordinal classes. All mixture coefficients of the GMM were set to 0.2; thus, each class label occurred approximately 200 times in a data set. The three data sets were generated with similar GMM but differ regarding the location of the samples in the feature space: the distances between the expectations (mean values) of the components are highest in data set I and lowest in data set III. That is, the overlap of the component densities increases from data set I to III and, thus, the samples belonging to different classes become more difficult to separate. In Figure 2, the data sets with the highest and the lowest overlap are set out.

[Fig. 2. Two of the data sets, I (left) and III (right), generated with different GMM (samples with true labels).]
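A data set of this kind can be generated, for instance, as follows. The component means and the common covariance below are placeholders, since the paper does not report the exact GMM parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes = 1000, 5

# assumed component means and covariance; moving the means closer
# together would produce the higher-overlap data sets II and III
means = np.array([[-8., -8.], [-4., -4.], [0., 0.], [4., 4.], [8., 8.]])
cov = 4.0 * np.eye(2)

# equal mixture coefficients of 0.2 -> each class occurs ~200 times
y_true = rng.integers(1, n_classes + 1, size=n_samples)
X = np.array([rng.multivariate_normal(means[c - 1], cov) for c in y_true])
```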
B. Generation of Artificial Expert Statements

Having prepared the data sets, we generated 30 artificial, ordinal expert statements (class labels) for every sample by altering the true class labels with a random process according to the influences identified in Section I. Each of the influences was modeled separately and, hence, could be chosen individually for every simulated expert. In particular, every simulated expert was allotted an individual level of experience (the form of the day was not modeled separately), an individual notion of strictness, and an individual tendency not to opt for extremes. Each of these influences leads to a probability that a simulated expert labels a sample with an altered (i.e., wrong) class. An expert with a distinct notion of strictness will, e.g., have a high probability of choosing a class label which is lower than the true class. To illustrate the results of this process, the statements for the samples in data sets I and III are shown for one of the experts in Figure 3.

[Fig. 3. Statements of an artificial expert for samples in data set I (left) and III (right).]

Additionally, the generation of a difficulty statement reflecting the certainty of an expert concerning the label for a specific sample was implemented with the three levels "easy", "medium", and "difficult" (with numerical values of 1.50, 1.25, and 1.00, respectively; cf. the example in Section II-D). For the application of DSR and MR, simple support functions (cf. [5]) were used; here, we had to transform the difficulty values to an appropriate interval by multiplying them by 0.6 (i.e., to 0.90, 0.75, and 0.60, respectively).
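The paper does not give the exact random alteration process. A minimal sketch of one possible reading follows; the error probabilities, the bias direction, and the interaction of the three influences are our assumptions, not the authors' model:

```python
import numpy as np

def simulate_expert(true_c, p_err=0.2, strict_bias=-1, p_extreme=0.3,
                    n_classes=5, rng=None):
    """One possible reading of the label-alteration process (parameters assumed)."""
    rng = rng or np.random.default_rng()
    c = true_c
    if rng.random() < p_err:                              # experience level: wrong label
        c = int(np.clip(c + strict_bias, 1, n_classes))   # strictness: shift down (or up)
    if c in (1, n_classes) and rng.random() < p_extreme:  # tendency to avoid extremes
        c += 1 if c == 1 else -1
    return c
```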
C. Properties of the Combination Rules

In this section, we apply each of the four combination rules DSR, MR, IDMR, and EIDMR to the artificial data sets to combine the experts' label statements. To measure the degree of agreement between the results of the combination rules and the true labels, the inter-rater agreement is used as an accuracy measure. Cohen's weighted kappa statistic κ_w is a standard measure for this purpose [17]. It allows for the use of weights to reflect the extent of similarity between ordinal classes. Hence, it incorporates the magnitude of each disagreement and provides partial credit for disagreement when agreement is not perfect [18]. The two most widely used weighting schemes are symmetric linear weights and symmetric quadratic weights [19]; for our analysis, we use symmetric linear weights. Cohen's weighted kappa statistic κ_w can be interpreted as a chance-corrected index of agreement: it yields 1 for a perfect agreement, a value of 0 indicates agreement equal to that expected under independence, and a negative value indicates that the agreement is less than expected by chance [20].
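With scikit-learn, for instance, κ_w with linear weights is available directly (fused and true below are assumed label arrays):

```python
from sklearn.metrics import cohen_kappa_score

# fused: labels produced by a combination rule; true: known true labels
kappa_w = cohen_kappa_score(fused, true, weights="linear")
```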
To apply EIDMR, the data had to be scaled using a z-transform, because features with larger values would otherwise dominate those with smaller values in the knn approach [21]. This scaling is not needed for the other three combination rules. In our first experiment, the number of additionally considered samples in EIDMR was set to k = 3, 5, 7, or 9. In order to investigate the influence of the number of experts on the accuracy, we increased this number step by step. To obtain statistically significant results, we repeated the generation of the artificial class labels and the application of the combination rules ten times for each data set. The fact that the true classes are known for the generated data sets enables us to evaluate the accuracy of the rules.

In Figure 4, the mean values μ(κ_w) and the standard deviations σ(κ_w) over the ten repetitions of the experiment are outlined for data set I; different values of k have been used for EIDMR. It can be seen that EIDMR outperforms the other combination rules for every considered number of experts and every k with regard to μ(κ_w) and, thus, with regard to the accuracy in detecting the true classes. This is due to the additional statements gathered in the knn approach. Because of the low overlap of the Gaussian components, the combination accuracy does not depend on which of the considered values is used for k as long as more than one expert statement is available for a sample. In comparison to the other rules, the standard deviations of EIDMR are higher if the number of experts is lower than 8. This can be explained by the influence of the often differing expert statements gathered with the knn approach. The results of MR and IDMR coincide strongly; only minor differences can be observed. This is explained by the combination process of MR, where an average belief function is computed and then combined N_E − 1 times using DSR, which yields very similar results compared to IDMR.

[Fig. 4. Combination accuracy for data set I: mean (top) and standard deviation (bottom).]
Results for data set II are shown in Figure 5. Compared to the other two data sets, this data set is based on a medium overlap of the five components in the feature space that are assigned to different classes. Comparing the three combination rules DSR, MR, and IDMR, there are no significant differences in their accuracies if the number of experts is lower than five. As for data set I, the curves of MR and IDMR coincide strongly. Due to the additional information gathered with the knn approach, EIDMR outperforms the other rules if the number of fused statements (i.e., experts) is lower than 12; however, its standard deviations are higher, too. The highest accuracy using EIDMR is reached with k = 9. If the number of experts exceeds 12, DSR performs best in detecting the true classes. Thus, the additional benefit of the knn approach decreases with an increasing number of experts.

[Fig. 5. Combination accuracy for data set II: mean (top) and standard deviation (bottom).]

Figure 6 outlines the results for data set III, which is the data set with the highest overlap of components. In the cases with a low number of expert statements (i.e., where the number of experts is at most 3), EIDMR outperforms the other rules. In comparison to the other data sets, a variation of k has the highest influence on the combination accuracy here. The highest accuracy using EIDMR is reached for k = 3. For an increasing number of experts (> 3), the influence of the knn approach leads to a lower accuracy of EIDMR compared to the other combination rules. Both effects can be explained by the high overlap of the components of the mixture model underlying data set III. Here, DSR has the highest average accuracy and also the lowest standard deviation of all combination rules.

[Fig. 6. Combination accuracy for data set III: mean (top) and standard deviation (bottom).]
We now investigate some properties of EIDMR in two additional experiments. In our second experiment, we varied the number of samples in data set III to examine the influence of the sample density on the combination accuracy of EIDMR. This is an interesting aspect, as we can expect the sample density to influence the values of κ_w. Figure 7 outlines the results for 250, 750, and all 1000 samples contained in data set III, obtained with EIDMR for k = 3 and k = 9. It can be seen that the mean μ(κ_w) (ten repetitions again, see above) increases significantly with the number of samples. The increasing sample density leads to the selection of more similar samples in the knn approach.

[Fig. 7. Combination accuracy for different sample numbers (data set III).]

In our third experiment, we analyzed the influence of the difficulty weights w_{i,j} and of the class order information used in EIDMR. To do this, we may set the value of each weight w_{i,j} to one, which is equivalent to ignoring the experts' difficulty assessments for each sample. To ignore the class order information contained in the class labels, we may set the value of each similarity weight g_i to 1/(k+1). In this experiment, we investigate both cases and also their combination; the latter reduces EIDMR to a simple majority decision over the k+1 considered samples (both settings are sketched below). The results, outlined in Figure 8, show that both the difficulty weights and the class order information have a noticeable positive effect on the combination accuracy.

[Fig. 8. Influence of class order information and difficulty weights on combination accuracy for data set III.]
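In terms of the eidmr() sketch from Section II-C, the two ablation settings can be expressed as follows (a sketch; variable names as before):

```python
# (a) ignore the difficulty assessments: all weights w_{i,j} = 1
W_ablation = np.ones_like(W)

# (b) ignore the class order: uniform similarity weights g_i = 1/(k+1);
# inside eidmr(), the computed g would be replaced by
g_uniform = np.full(k + 1, 1.0 / (k + 1))

# applying (a) and (b) together reduces EIDMR to a simple majority
# vote over the (k+1) * N_E statements
```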
Considering the results of our experiments for all three data sets, DSR shows a strong performance with regard to the detection of the true class labels for every data set. However, the combination accuracy can be further improved using EIDMR in many cases where the number of available expert statements is rather low. The actual benefit from EIDMR depends on the particular data set. The accuracy of EIDMR increases with more available samples in a data set. Furthermore, the use of the ordinal information given by the class labels and the consideration of reliability weights lead to an additional improvement in the application of EIDMR.

IV. CLASSIFICATION OF LOW-VOLTAGE GRIDS

In the preceding two sections, we formally elaborated the new combination rule EIDMR and compared it to IDMR, DSR, and MR on three artificial data sets. To further investigate and analyze the combination approaches with regard to a real application, we now present a case study where we apply the combination rules to classify 300 low-voltage grids. In Section IV-A, we describe the background of our case study as well as the collection of the grid data; additionally, our expert-knowledge-based approach to gathering labels for the grids under investigation is briefly presented. The previously discussed combination rules are compared with regard to their accuracy in an application of SVM classifiers in Section IV-B.

A. Background and Collection of Empirical Data

The low-voltage distribution level is the one at which most of the end users (for example, households) are connected to the electric power system. From the beginning of the supply with electricity, the power system was designed in a hierarchical way to transport electric energy from central power plants to consumers. With changing political circumstances, upcoming new market trends, increasing environmental loading, and decreasing availability of fossil energy sources, a paradigm change can be recognized [22], [23], [24], [25]. Especially in rural and suburban areas, because of their high potential for Renewable Energies (RE), the installation of distributed generators (DG) has been pushed forward in the past decade. Without appropriate counteraction, the high amount of DG installed in the low-voltage grids, especially photovoltaic generators (PV), may cause overloading of the electrical equipment and violation of voltage limits. The emergence of such bottlenecks is highly dependent on the grid structure and the configuration of the DG within the grid. Regarding these problems, the responsible distribution system operator (DSO) has to consider specific enhancement and long-term development of low-voltage grids. But given the limited financial possibilities in regulated markets (e.g., incentive regulation), the decision in which low-voltage grids an investment is placed becomes increasingly important. The discrimination of low-voltage grids into different ordinal classes with regard to their hosting capacity for DG can support the investment decision, but it is a difficult task because various and complex grid structures exist. This is due to the fact that low-voltage grids have historically grown structures with local and geographic dependencies (e.g., rivers, composition of the ground). To cope with the challenge of classifying grids into different ordinal classes, expert knowledge can be elicited. Our aim is to build a system which is based on expert knowledge and allows for an automatic classification of a grid.
In our grid survey, the ten grid parameters shown in Table I were gathered for 300 real rural and suburban low-voltage grids.

TABLE I. GRID PARAMETERS OF TWO SAMPLE GRIDS.

  #  Characteristic grid parameter              Grid I   Grid II
  1  No. of transformer stations                     2         2
  2  Sum of rated transformer power [kVA]          880      1260
  3  No. of cable distribution boxes                 5         9
  4  Sum of wired line length [m]                 7475      5375
  5  Sum of intermeshed line length [m]           1764      3105
  6  Portion of intermeshing [%]                  23.6      57.8
  7  Portion of new line type NAYY 150 [%]        50.3      62.8
  8  Max. straight-line dist. to transf. [m]      1082       354
  9  Avg. straight-line dist. to transf. [m]       840       344
 10  No. of house connections                       92       100

In order to label these data, we accomplished an expert-based procedure. Admittedly, a real ground truth to validate the experts' classification results could only be obtained by an actual increase of the DG power in these grids, which is obviously not possible. A number of five experts from distribution grid planning practice (DSO staff) was chosen as a trade-off between the reliability of the combined statements and the costs of the inquiry. The classification was intended to yield information on the DG capacity of the low-voltage grids under consideration. For that, we used five distinct ordinal classes with an ascending order describing the strength of the grid structure: (1) "very weak", (2) "weak", (3) "average", (4) "strong", and (5) "very strong".

The design of the questionnaire was oriented towards the quality criteria of validity, reliability, and objectivity for the measurement. Hence, we provided the gathered grid parameters and a plan for every grid as well as some supplementary information for optional usage by the experts. The first supplementary information consisted of five prototype grids, one for each class, which we selected from the 300 real grids. To provide more information, we divided the range of each parameter into five intervals so that every interval contains 20% of the grids. This results in indicator functions assigning one of the five grid classes to the manifestation of every particular parameter (e.g., if the percentage of intermeshing lies between 0% and 31%, the corresponding indicator function yields class 1 for this parameter). Additionally, the experts were asked at the beginning of the inquiry for a global ranking and weighting of the grid parameters concerning their importance for the classification decision. With regard to this ranking and weighting, we provided a classification indicator for every grid using the first five parameters of the ranking and their weighting. The indicator represents the rounded weighted average of the first five indicator functions in the ranking.

Although we provided supplementary information to the five experts, they, unsurprisingly, often made conflicting statements. We also asked every expert for a difficulty statement according to three difficulty levels ("easy", "medium", "hard") while assessing a grid. This information was considered by EIDMR; the numeric values for the weights were set as described in Section III-B.
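The quintile-based indicator functions and the classification indicator can be sketched as follows. The function names and the weighting input are our own; the paper does not publish code:

```python
import numpy as np

def indicator_class(param_values, x):
    """Class 1-5 for one grid parameter; each interval holds 20% of the grids."""
    edges = np.quantile(param_values, [0.2, 0.4, 0.6, 0.8])
    return int(np.searchsorted(edges, x)) + 1

def classification_indicator(indicator_classes, weights):
    """Rounded weighted average of the first five indicator functions."""
    w = np.asarray(weights, dtype=float)
    return int(round(np.dot(indicator_classes, w) / w.sum()))
```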
B. Classification with Support Vector Machines

We now apply the four combination rules IDMR, EIDMR, DSR, and MR to the data of our case study. That is, we aim for a combination of the five experts' statements into one combined class label for every sample using each of the four combination rules. Using EIDMR, the number of considered additional samples was set to k = 5, and the features were z-transformed. The accuracy of the combination results cannot be evaluated as in Section III-C, since the true classes are not known for the data of our case study. Instead, we train a classifier on training data, evaluate it on test data, and do so in a cross-validation approach using the combined class labels. Thus, we aim to compare the classification accuracy with regard to the combined class labels of each combination rule. Our assumption is that a high accuracy of a combination rule concerning the detection of the true classes is accompanied by a high accuracy of the trained classifier.

In our experiments, we evaluate two different classification concepts for predicting the classes. In the first concept, we treated the expert statements individually by extending the training data in the following way: each feature vector occurs five times, and as targets we used the five expert statements (see the sketch below). This extension of the training data can be seen as an interpretation of the set of the five expert statements as one gradual label. Only the testing is done with the combined label in this concept. The second concept uses the combined label already in the training phase; the testing is, as in the first concept, done with the combined label. To implement the classifiers, the LIBSVM library with a standard Gaussian kernel [26] was used. The feature space was built from the characteristic grid parameters set out in Table I.
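The training-set extension of the first concept is a simple replication; the array names below are ours:

```python
import numpy as np

# first concept: each feature vector occurs five times, once per expert label;
# expert_labels_train has shape (n_train, 5), X_train has shape (n_train, 10)
X_expanded = np.repeat(X_train, 5, axis=0)     # row i repeated 5 times in a row
y_expanded = expert_labels_train.reshape(-1)   # grid-major order matches the repeat

# second concept: train directly on the fused labels (no replication)
```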
To estimate good parameter values for the SVM classifiers and to prevent them from overfitting, we used a stratified 3-fold cross-validation. The stratification is essential here because the 300 combined class labels were not equally distributed, no matter which of the four combination rules is applied. Thus, the process of randomly rearranging the data into 3 folds has to ensure that each fold represents the distribution of the class labels. Furthermore, the selection of the SVM parameters C and γ was done using a grid search. Because the test data of the 3-fold cross-validation process must not be used to find the parameters, we implemented another 3-fold cross-validation within the actual validation procedure to robustly search the parameters on the training data. One problem is that the class labels have an ordinal character but are interpreted as nominal classes by the SVM. Thus, to estimate the model performance, we do not use the classification error e (the fraction of incorrectly classified samples) alone, but optimized the model parameters with regard to κ_w between the test data and the output of the model (cf. Section III-C); note that, in general, e ≠ 1 − κ_w. The resulting parameter combination, which we used for all implemented SVM, is C = 10 and γ = 0.03.

Having found good parameter values for the SVM, we ran ten 3-fold cross-validations for each classification concept. In each of these cross-validations, the data is randomly rearranged into 3 stratified folds; as a consequence, the samples assigned to the folds differ in each repetition. A number of ten repetitions of the cross-validations was chosen to obtain statistically significant results and to consider the influence of the random rearrangement of the folds.
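The paper uses LIBSVM directly; since scikit-learn's SVC wraps LIBSVM, the nested, stratified, κ_w-optimized grid search can be sketched as follows (the candidate grids for C and γ are our assumptions):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.metrics import cohen_kappa_score, make_scorer

# X: (300, 10) z-transformed grid features, y: combined class labels (arrays)
kappa = make_scorer(cohen_kappa_score, weights="linear")
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.003, 0.03, 0.3]}  # assumed grid

outer = StratifiedKFold(n_splits=3, shuffle=True)
for train, test in outer.split(X, y):
    # inner stratified 3-fold grid search on the training part only
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring=kappa,
                          cv=StratifiedKFold(n_splits=3, shuffle=True))
    search.fit(X[train], y[train])
    y_pred = search.predict(X[test])
    print(search.best_params_,
          cohen_kappa_score(y[test], y_pred, weights="linear"))
```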
In Table II, the classification error and the classification accuracy κ_w of the first classification concept predicting EIDMR-combined labels are set out for each of the ten cross-validation repetitions. The results of all single cross-validations are outlined to illustrate the influence of the random rearrangement of the folds on the classification error and classification accuracy. The last two columns show the classification error for the training samples and the test data; the preceding two columns show the values of κ_w for the training samples as well as the test data. We show both training and test accuracies to illustrate the differences between training and test. In addition to the single values of the cross-validations, the overall mean values (μ) and the overall standard deviations (σ) are presented for each accuracy measure.

TABLE II. CLASSIFICATION RESULTS OF THE FIRST SVM CLASSIFICATION CONCEPT PREDICTING EIDMR-COMBINED CLASS LABELS.

  #   κ_w,train   κ_w,test   e_train [%]   e_test [%]
  1     0.456       0.736        42.8          23.7
  2     0.456       0.744        43.1          23.7
  3     0.449       0.737        43.3          21.7
  4     0.459       0.713        42.8          24.7
  5     0.459       0.768        42.9          20.7
  6     0.457       0.752        43.1          22.7
  7     0.459       0.767        42.8          20.7
  8     0.453       0.738        43.1          23.3
  9     0.457       0.763        43.0          20.7
 10     0.453       0.755        43.1          22.0
  μ     0.456       0.747        43.0          22.4
  σ     0.003       0.017         0.2           1.5

It can be seen that the overall test accuracy, represented by the mean value μ(κ_w,test), is approximately 0.75; the corresponding overall test error μ(e_test) is 22.4%. The overall training accuracy and the overall training error attain significantly worse values of μ(κ_w,train) ≈ 0.46 and μ(e_train) = 43.0%, respectively. This is explained by the fact that in the first classification concept the same feature vector of a training set is contained multiple times (here, five times) in the training data, often with different, i.e., conflicting expert statements.

Table III shows the results of the second classification concept, which directly uses the combined labels to train the SVM. Because of the direct use of the combined labels, the overall training accuracy as well as the overall training error are noticeably better compared to the first classification concept. The test values, μ(κ_w,test) = 0.80 and μ(e_test) = 18.1%, are also better than the respective values of the first classification concept.

TABLE III. CLASSIFICATION RESULTS OF THE SECOND SVM CLASSIFICATION CONCEPT PREDICTING EIDMR-COMBINED CLASS LABELS.

  #   κ_w,train   κ_w,test   e_train [%]   e_test [%]
  1     0.912       0.799         8.2          18.0
  2     0.914       0.805         8.0          17.7
  3     0.910       0.786         8.3          19.4
  4     0.909       0.818         8.5          16.6
  5     0.917       0.783         7.7          19.7
  6     0.910       0.780         8.3          19.6
  7     0.905       0.807         8.8          17.3
  8     0.912       0.789         8.2          19.3
  9     0.905       0.819         8.8          16.3
 10     0.908       0.814         8.5          16.7
  μ     0.910       0.800         8.3          18.1
  σ     0.004       0.015         0.4           1.3

In summary, the second classification concept outperforms the first one and is thus more advantageous for predicting the combined labels in this experiment. Furthermore, our results show that the use of EIDMR has a high potential to yield a good classification accuracy. But is this accuracy significantly better compared to the use of the other three combination rules? To investigate the influence of the combination rule on the classification results, we conducted another experiment: we realized the better-performing second classification concept for each combination rule. The overall classification results are set out in Figure 9. Using the labels combined with the three well-known approaches DSR, MR, and IDMR yields an overall classification accuracy of approximately μ(κ_w,test) ≈ 0.61 for each of these three rules. These values are significantly worse compared to the discussed results based on EIDMR: using EIDMR labels leads to an enhancement of the overall classification accuracy to about 0.8 on test data. Additionally, we observe that the use of EIDMR reduces the standard deviations of the classification accuracy.

[Fig. 9. Overall classification accuracy μ(κ_w) for each combination rule using the second classification concept.]
V. CONCLUSION AND FUTURE RESEARCH

In this article, we proposed a novel combination rule for expert statements and compared it to the well-known combination rules DSR, MR, and IDMR. The results show that, especially if only a small number of expert statements is available (e.g., due to high elicitation costs), the additional information gathered from similar samples by means of a knn approach leads to significantly better combination results with the new EIDMR. The use of the ordinal information given by the class labels and the consideration of reliability weights lead to an additional improvement in the application of EIDMR. If desired, EIDMR could be developed further by considering the distance (measured in the feature space) of similar samples in the knn approach by means of additional weights.

To further validate our new combination rule, we presented a comprehensive case study applying all combination rules to the data of 300 real low-voltage grids. The grids were assessed with ordinal labels by five experts from a regional distribution system operator. In this case study, the combination rules were compared with regard to the prediction accuracy in a classification approach with SVM. Our results show that EIDMR noticeably outperforms the other rules concerning the prediction of the combined labels.

Our future research activities in the field of low-voltage grid classification will further consolidate these results, and we will investigate other ways to predict the combined class labels, e.g., ensemble learning. Furthermore, we will improve the classification accuracy by integrating additional features and by using classification results from stochastic load flow simulations [27].
REFERENCES

[1] S. A. Sandri, D. Dubois, and H. W. Kalfsbeek, "Elicitation, Assessment, and Pooling of Expert Judgments Using Possibility Theory," IEEE Transactions on Fuzzy Systems, vol. 3, no. 3, pp. 313–335, 1995.
[2] S. S. Stevens, "On the use of Expert Judgement on Complex Technical Problems," IEEE Transactions on Engineering Management, vol. 36, no. 2, pp. 83–86, 1989.
[3] C. Murphy, "Combining Belief Functions When Evidence Conflicts," Decision Support Systems, vol. 29, no. 1, pp. 1–9, 2000.
[4] K. Sentz and S. Ferson, "Combination of Evidence in Dempster-Shafer Theory," Sandia National Laboratories, Technical Report, 2002.
[5] D. Andrade, T. Horeis, and B. Sick, "Knowledge Fusion Using Dempster-Shafer Theory and the Imprecise Dirichlet Model," IEEE Conference on Soft Computing in Industrial Applications, Muroran, pp. 142–148, 2008.
[6] G. Shafer, A Mathematical Theory of Evidence. Princeton University Press, Princeton, 1976.
[7] G. Hua-wei, S. Wen-kang, and D. Yong, "Evidential Conflict and its 3D Strategy: Discard, Discover and Disassemble," Systems Engineering and Electronics, vol. 29, no. 6, pp. 890–898, 2007.
[8] L. A. Zadeh, "Review of Books: A Mathematical Theory of Evidence," The AI Magazine, vol. 5, no. 3, pp. 81–83, 1984.
[9] D. Yi-jie, W. She-liang, Z. Xin-dong, L. Miao, and Y. Xi-qin, "A Modified Dempster-Shafer Combination Rule Based on Evidence Ullage," International Conference on Artificial Intelligence and Computational Intelligence, Shanghai, pp. 387–391, 2009.
[10] F. Smarandache and J. Dezert, Advances and Applications of DSmT for Information Fusion: Collected Works. American Research Press, Rehoboth, 2009.
[11] A. Tchamova and J. Dezert, "On the Behavior of Dempster's Rule of Combination and the Foundations of Dempster-Shafer Theory," IEEE International Conference on Intelligent Systems, Sofia, pp. 108–113, 2012.
[12] F. Smarandache, J. Dezert, and V. Kroumov, "Examples Where the Conjunctive and Dempster's Rules are Insensitive," Proceedings of the 2013 International Conference on Advanced Mechatronic Systems, Luoyang, pp. 7–9, 2013.
[13] M. C. M. Troffaes and F. Coolen, "On the Use of the Imprecise Dirichlet Model With Fault Trees," Technical Report, Durham University, pp. 1–8, 2007.
[14] L. V. Utkin, "Extensions of Belief Functions and Possibility Distributions by Using the Imprecise Dirichlet Model," Fuzzy Sets and Systems, vol. 154, no. 3, pp. 413–431, 2005.
[15] E. T. Whittaker and G. N. Watson, A Course of Modern Analysis. Cambridge University Press, Cambridge, 1996.
[16] P. Walley, "Inferences From Multinomial Data: Learning About a Bag of Marbles," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, pp. 3–57, 1996.
[17] M. J. Warrens, "Equivalences of Weighted Kappas for Multiple Raters," Statistical Methodology, vol. 9, no. 3, pp. 407–422, 2012.
[18] M. Maclure and W. C. Willett, "Misinterpretation and Misuse of the Kappa Statistic," American Journal of Epidemiology, vol. 126, no. 2, pp. 161–169, 1987.
[19] P. W. Mielke and K. J. Berry, "A Note on Cohen's Weighted Kappa Coefficient of Agreement With Linear Weights," Statistical Methodology, vol. 6, no. 5, pp. 439–446, 2009.
[20] M. J. Warrens, "Cohen's Quadratically Weighted Kappa is Higher Than Linearly Weighted Kappa for Tridiagonal Agreement Tables," Statistical Methodology, vol. 9, no. 3, pp. 440–444, 2012.
[21] C.-C. Chang and C.-J. Lin, "A Practical Guide to Support Vector Classification," Technical Report, National Taiwan University, pp. 1–16, 2010.
[22] G. W. Ault, J. R. McDonald, and G. M. Burt, "Strategic Analysis Framework for Evaluating Distributed Generation and Utility Strategies," IEEE Proceedings on Generation, Transmission and Distribution, vol. 150, no. 4, pp. 475–481, 2003.
[23] H. B. Puttgen, P. R. MacGregor, and F. C. Lambert, "Distributed Generation: Semantic Hype or the Dawn of a New Era?" IEEE Power and Energy Magazine, vol. 1, no. 1, pp. 1–5, 2003.
[24] P. Chiradeja, "Benefit of Distributed Generation: A Line Loss Reduction Analysis," Transmission and Distribution Conference and Exhibition: Asia and Pacific, Yokohama, pp. 1–5, 2005.
[25] J. Driesen and R. Belmans, "Distributed Generation: Challenges and Possible Solutions," Power Engineering Society General Meeting, Montreal, pp. 1–5, 2006.
[26] C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 1–27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm (accessed 13.01.2015).
[27] S. Breker, A. Claudi, and B. Sick, "Capacity of Low-Voltage Grids for Distributed Generation: Classification by Means of Stochastic Simulations," IEEE Transactions on Power Systems, vol. 30, no. 2, pp. 689–700, 2015.
