Performance Benchmarking and Analysis of Prognostic Methods for C-MAPSS Datasets

Emmanuel Ramasso (1) and Abhinav Saxena (2)

(1) FEMTO-ST Institute, Dep. AS2M/DMA, UMR CNRS 6174 - UFC/ENSMM/UTBM, 25000 Besançon, France, [email protected]
(2) SGT Inc., NASA Ames Research Center, Intelligent Systems Division, Moffett Field, CA 94035-1000, USA, [email protected]

(Emmanuel Ramasso et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. International Journal of Prognostics and Health Management, ISSN 2153-2648, 2014 014.)

ABSTRACT

Six years and more than seventy publications later, this paper looks back at and analyzes the development of prognostic algorithms using the C-MAPSS datasets generated and disseminated by the Prognostics Center of Excellence at NASA Ames Research Center. Among those datasets are five run-to-failure C-MAPSS datasets that have been popular due to various characteristics applicable to prognostics. The C-MAPSS datasets pose several challenges that are inherent to general prognostics applications. In particular, management of high variability due to sensor noise, the effects of operating conditions, and the presence of multiple simultaneous fault modes are some factors that have a great impact on the generalization capabilities of prognostic algorithms. More than seventy publications have used the C-MAPSS datasets for developing data-driven prognostic algorithms. However, in the absence of performance benchmarking results and due to common misunderstandings in interpreting the relationships between these datasets, it has been difficult for users to suitably compare their results. In addition to identifying the differentiating characteristics of these datasets, this paper also provides performance results for the PHM'08 data challenge winning entries to serve as a performance baseline. The paper summarizes various prognostic modeling efforts that used C-MAPSS datasets and provides guidelines and references for further usage of these datasets in a manner that allows clear and consistent comparison between different approaches.

1. INTRODUCTION

Run-to-failure datasets from a turbofan engine simulation model were first published by NASA's Prognostics Center of Excellence (PCoE) in 2008. This dataset was originally used in a data challenge competition at the PHM'08 conference, where PHM researchers were invited to develop prognostic methods as part of the competition. The competition was well received, with a good set of winning methods. These data were further made available after the competition through the NASA PCoE data repository (Saxena & Goebel, 2008) so that researchers could use them to build, test, benchmark, and compare data-driven prognostic methods. Since then, these datasets have been widely used by researchers around the world, with results published in over 70 publications. This paper reviews these approaches, published in the last five years, and analyzes them to understand why some approaches worked better than others, how researchers used these datasets to compare their methods, and what difficulties were faced, so that necessary improvements can be made to make these datasets more useful.

1.1. Background

Prognostics has gained significant attention for its promise to improve systems health management through advance warnings about performance degradation and impending failures. Predicting with confidence, however, has posed its own challenges due to the various uncertainties involved in the process. Several government and industry supported programs have helped push the thrust in prognostics technology development around the globe. The science of prognostics has fairly matured, and the general understanding of the health prediction problem and its applications has greatly improved in the past decade. Both data-driven and physics-based methods have been shown to possess unique advantages that are specific to application contexts.
However, until very recently, a common bottleneck in the development of data-driven methods was the lack of availability of run-to-failure data sets. In most cases real-world data contain fault signatures for a growing fault at various severity levels, but little or no data capture fault evolution all the way through failure. Procuring actual system fault progression data is typically time consuming and expensive. Fielded systems are, most of the time, not properly instrumented for the collection of relevant data, or such data cannot be distributed due to proprietary constraints. The lack of common data sets, which researchers can use to compare their approaches, has been an impediment to progress in the field of prognostics. To tackle this problem, several datasets have been published on a prognostics data repository (Saxena & Goebel, 2008) and have been used by many researchers. Among these are five datasets from a turbofan engine simulation model, C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) (Frederick, DeCastro, & Litt, 2007). By simulating a variety of operational conditions and injecting faults of varying degrees of degradation, datasets were generated for prognostics development (Saxena, Goebel, Simon, & Eklund, 2008a). One of the first datasets was used for a prognostics data challenge at the PHM'08 conference. A subsequent set was released later with varying degrees of complexity. These datasets have since been used very widely in publications for benchmarking prognostic algorithms.

1.2. Motivation

The turbofan degradation datasets have received over seven thousand unique downloads in the last five years and yielded about seventy publications with various algorithms. However, the results obtained by those algorithms can generally not be compared, due to confusion and inconsistencies in how these datasets have been interpreted and used.
Therefore, this paper intends to analyze the various approaches that researchers have taken to implement prognostics using the turbofan datasets. Some unique characteristics of these datasets are also identified that led certain methods to be used more often than others. Specifically, various differences among these datasets are pointed out, and a commentary is provided on how these approaches fared compared to the winners of the data challenge. Furthermore, this paper attempts to clear up several issues so that researchers, in the future, can take these factors into account when comparing their approaches with the benchmarks.

The paper is organised as follows. In Section 2, the C-MAPSS datasets are presented. Section 3 is dedicated to the literature review. Section 4 presents a taxonomy of prognostic approaches for the C-MAPSS datasets. Finally, Section 5 provides some guidelines to help future users develop new prognostic algorithms applied to these datasets and to facilitate algorithm benchmarking.

2. TURBOFAN SIMULATION DATASETS

C-MAPSS is a tool, coded in the MATLAB-Simulink environment, for simulating an engine model of the 90,000 lb thrust class (Frederick et al., 2007). Using a number of editable input parameters, it is possible to specify the operational profile, closed-loop controllers, environmental conditions (various altitudes and temperatures), etc. Additionally, there are provisions to modify some efficiency parameters to simulate various degradations in different sections of the engine system.

2.1. Dataset characteristics

Using this simulation environment, five datasets were generated. By creating a custom code wrapper, as described in (Saxena, Goebel, et al., 2008a), selected fault injection parameters were varied to simulate continuous degradation trends. Data from various parts of the system were collected to record the effects of degradations on sensor measurements and to provide time series exhibiting degradation behaviors in multiple units. These datasets possess unique characteristics that make them very useful and suitable for developing prognostic algorithms:

1. Data represent a multi-dimensional response from a complex non-linear system, generated by a high-fidelity simulation that very closely models a real system.
2. The simulations incorporated high levels of noise, introduced at various stages, to reproduce the variability generally encountered in practice.
3. The effects of faults are masked by the operating conditions, which is yet another common trait of most operational systems.
4. Data from a large number of units are provided, allowing algorithms to extract trends and build associations for learning system behavior useful for predicting RULs.

These datasets were geared towards data-driven approaches, with very little or no system information made available to PHM developers.

As described in detail in Section 3, the analysis of the publications using these datasets shows that many researchers have tried to make comparisons between results obtained from these similar yet different datasets. This section briefly describes and distinguishes the five datasets and explains why it may or may not be appropriate to make such comparisons.

Table 1 summarizes the five datasets. The fundamental difference between them is the number of simultaneous fault modes and the number of operational conditions simulated in the experiments. Datasets #1 through #4 incorporate an increasing level of complexity and may be used to incrementally learn the effects of faults and operational conditions. Furthermore, what sets these four datasets apart from the challenge datasets is the availability of ground truth to measure performance. Datasets #1-4 consist of a training set that users can use to train their algorithms and a test set to test them; the ground truth RUL values for the test sets are also given, to assess prediction errors and compute any metrics for comparison purposes. Results between these datasets may not always be comparable, as the data simulate different levels of complexity, unless a universal generalized model is available that regards datasets #1-3 as special cases of dataset #4.

Table 1. Description of the five turbofan degradation datasets available from the NASA repository.

                              Dataset   #Fault Modes   #Conditions   #Train Units   #Test Units
  Turbofan data from          #1        1              1             100            100
  NASA repository             #2        1              6             260            259
                              #3        2              1             100            100
                              #4        2              6             249            248
  PHM 2008 Data Challenge     #5T       1              6             218            218
                              #5V       1              6             218            435

The PHM challenge datasets are designed in a slightly different way and are divided into three parts. Dataset #5T contains a train set and a test set just like datasets #1-4, with one difference: the ground truth RULs for the test set are not revealed. The challenge participants were asked to upload their results (only once per day) to receive a score based on an asymmetric scoring function (Saxena, Goebel, et al., 2008a); a sketch of this function is given at the end of Section 2.2. Users can still get their results evaluated with the same scoring function by uploading them on the repository page, but otherwise it is not possible to compute any other metric on the results in the absence of ground truth. The third part of the challenge set is dataset #5V, the final validation set that was used to rank the challenge participants, who were allowed only one chance to submit their results. The challenge has remained open since then, and a participant may submit final results (only once) for evaluation per the instructions posted with the dataset on the NASA repository (Saxena & Goebel, 2008).
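Before turning to the performance baseline, a minimal loading sketch (not from the paper) may help readers new to these files. It assumes the 26-column layout documented in the repository readme (unit number, cycle count, three operational settings, 21 sensor channels) and the file naming used in the NASA archive (e.g., train_FD001.txt for dataset #1); both should be verified against your local copy. Because training units run to failure, RUL labels for training rows can be derived from each unit's end of life.

    # Minimal sketch: load one C-MAPSS dataset and derive training RUL labels.
    # Column layout and file names follow the repository readme/archive and
    # should be checked against your local copy.
    import pandas as pd

    cols = (["unit", "cycle"]
            + [f"op_setting_{i}" for i in range(1, 4)]    # 3 operational settings
            + [f"sensor_{i}" for i in range(1, 22)])      # 21 sensor channels

    # Note: trailing blanks in the raw files can add empty columns; drop them if present.
    train = pd.read_csv("train_FD001.txt", sep=r"\s+", header=None, names=cols)
    test = pd.read_csv("test_FD001.txt", sep=r"\s+", header=None, names=cols)
    rul_test = pd.read_csv("RUL_FD001.txt", header=None, names=["RUL"])  # ground truth per test unit

    # Training units run to failure: RUL at a cycle is the number of cycles
    # remaining until that unit's last recorded cycle (its end of life).
    eol = train.groupby("unit")["cycle"].transform("max")
    train["RUL"] = eol - train["cycle"]

The same frames and column names are reused in the later sketches in this paper.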
2.2. Establishing Performance Baseline

For performance comparison purposes, the data challenge winning entries can be regarded as the benchmark performance, with scores published on the challenge website in 2008. Since that webpage was taken down in subsequent years, these scores are no longer available except as reported in the winners' publications; they are therefore included here for future reference. It is recognized that a direct comparison with the winners has consequently not been possible, and researchers have inconsistently compared performance across these different datasets. To alleviate this problem, this paper provides a number of error-based metrics, listed in (Saxena, Celaya, et al., 2008), computed for the top ten entries, to formally establish a performance benchmark for datasets #5T and #5V. Since there is no common record of results for datasets #1-4, no such benchmark can easily be established for them; it can only be partially collected from published papers. It is, however, expected that the performance obtained on dataset #2 may be comparable to that on #5T, due to the similarity in fault mode and operational conditions.

Figures 1(a) and 1(b) graphically present the top thirty scores from the challenge datasets. The scores were calculated using the asymmetric scoring function described in (Saxena, Goebel, Simon, & Eklund, 2008b); a code sketch of this function follows Tables 2 and 3. Similarly, a number of error-based performance metrics were computed for the participant entries and are presented in Tables 2 and 3 for the test and validation sets, respectively. While interpreting these results, the following must be kept in mind:

• Scores obtained on dataset #5V are expectedly poorer in general than those obtained on dataset #5T. This is due to two reasons: (1) the PHM score is an aggregate over all units and is not normalized by the number of units (note that #5V has almost twice the number of units of #5T), and (2) the RUL statistics were intentionally changed for dataset #5V to check against overfitting and against methods that impose thresholds to avoid the overpredictions that are penalized more severely by the scoring function.

• Owing to the above two reasons, the top ranking algorithms on #5T did not necessarily perform equally well on #5V. Therefore, it must not be assumed that the top ranked algorithm in Figure 1(a) is also the top ranking one in Figure 1(b). Both tables should instead be regarded as separate benchmarks for the two datasets.

• For the sake of common ground, the ranks in Tables 2 and 3 are assigned based on the PHM'08 scoring function. As can be observed, lower-ranking algorithms may perform better when evaluated on other metrics. This illustrates a key point: metrics must be chosen based on what is important for the goals of a specific application, and there may not be a universal metric for performance.

• These tables do not include prognostic metrics such as those proposed in (Saxena, Celaya, Saha, Saha, & Goebel, 2010). The primary reason is that for these datasets predictions are made for multiple units and only once, a different scenario from the continuous-prediction setting for which those prognostic metrics are applicable.

[Figure 1. Top thirty scores from the PHM'08 data challenge, computed using the competition scoring function: (a) dataset #5T, PHM'08 score versus ranked participants (final result); (b) dataset #5V, PHM'08 score versus ranked participants (PHM'08 leaderboard).]

Table 2. Comprehensive set of metrics as computed from the PHM'08 Challenge leaderboard based on the test set (dataset #5T). The ranking in column 1 is established based on PHM scores.

  Rank  Score    MSE     FPR(%)  FNR(%)  MAPE(%)  MAE    Corr.  Std.Dev.  MAD    MdAD
  1     512.12   152.71  56.35   43.65   15.81    8.67   0.96   0.64      8.68   5.69
  2     740.31   224.79  57.73   38.12   18.92    10.77  0.94   0.76      10.72  7.00
  3     873.36   265.62  53.59   28.18   19.19    11.47  0.93   0.81      11.75  8.00
  4     1218.43  269.68  51.93   44.20   20.15    11.87  0.93   0.85      12.05  8.00
  5     1218.76  331.30  50.55   49.45   33.14    13.81  0.91   0.95      14.03  10.87
  6     1232.27  334.52  42.27   57.73   32.90    14.14  0.91   0.96      14.28  10.37
  7     1568.98  394.46  50.55   47.24   36.75    15.37  0.89   1.03      15.48  12.00
  8     1645.77  330.02  47.24   50.28   30.00    13.47  0.91   0.95      13.59  10.00
  9     1816.60  359.97  50.28   49.17   26.47    13.82  0.90   0.99      14.07  9.75
  10    1839.06  377.01  45.86   53.59   27.72    14.31  0.89   1.02      14.43  9.10

Table 3. Comprehensive set of metrics as computed from the PHM'08 Challenge leaderboard based on the validation set (dataset #5V). The ranking in column 1 is established based on PHM scores.

  Rank  Score     MSE      FPR(%)  FNR(%)  MAPE(%)  MAE    Corr.  Std.Dev.  MAD    MdAD
  1     5636.06   546.60   64.83   31.72   19.99    16.23  0.93   1.01      16.33  11.00
  2     6691.86   560.12   63.68   36.32   17.92    15.38  0.94   1.03      16.29  8.08
  3     8637.57   672.17   61.38   23.45   20.72    17.69  0.92   1.09      17.79  11.00
  4     9530.35   741.20   58.39   39.54   34.93    20.19  0.90   1.22      20.17  15.00
  5     10571.58  764.82   58.85   41.15   32.60    20.05  0.91   1.22      20.41  14.23
  6     14275.60  716.65   59.77   37.01   21.61    18.16  0.90   1.17      18.57  11.00
  7     19148.88  822.06   56.09   41.84   30.25    20.23  0.88   1.29      20.89  13.00
  8     20471.33  1000.06  51.95   48.05   33.63    22.44  0.88   1.42      24.05  14.78
  9     22755.85  1078.19  62.53   35.40   39.90    24.51  0.86   1.45      24.08  20.00
  10    25921.26  854.57   34.25   64.83   51.38    22.66  0.86   1.36      21.49  16.00
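For completeness, a sketch of the asymmetric scoring function in the form commonly reported in the challenge literature (Saxena, Goebel, Simon, & Eklund, 2008): for each unit, the error d = estimated RUL - true RUL is mapped through an exponential penalty, with time constant 13 for early predictions and 10 for late ones, so that late predictions cost more; the per-unit penalties are then summed.

    import numpy as np

    def phm08_score(rul_pred, rul_true, a_early=13.0, a_late=10.0):
        # Asymmetric PHM'08 challenge score (lower is better).
        d = np.asarray(rul_pred, float) - np.asarray(rul_true, float)
        per_unit = np.where(d < 0,
                            np.exp(-d / a_early) - 1.0,   # early prediction
                            np.exp(d / a_late) - 1.0)     # late prediction, penalized more
        # Aggregate over all units, deliberately NOT normalized by their
        # number -- one reason #5V scores run much higher than #5T scores.
        return float(per_unit.sum())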
3. C-MAPSS DATASET LITERATURE REVIEW

To analyze the various approaches that have been used to solve the C-MAPSS dataset problem, all the publications that either cite these datasets or cite the reference recommended by the repository were collected through a standard web search. The search returned over seventy publications, which were then preprocessed to identify overlapping efforts by the same authors, or publications that only cite the datasets but perceivably did not use them for algorithm development. This resulted in forty unique publications that were considered for review and analysis in this work. For the sake of readability, each of these publications was assigned a unique ID, used in the various tables summarizing the results presented in this section; the mapping between publications and IDs is presented in Table 16. Furthermore, to keep the paper length short, a detailed review of each of the forty publications is not included, only the summarized findings. However, the three winning methods from the PHM'08 challenge are reviewed and summarized here to provide an appreciation of the approaches used by the top scorers. These publications also reported many observations and other relevant information about the data, with several insights that may be useful in future developments.

3.1. The Three Winning Methods

Similarity Based Approach (T. Wang, Yu, Siegel, & Lee, 2008) (Publication IDs 1 and 7) → Approach: The authors' winning approach implemented a similarity-based prognostics algorithm. The method included several data analysis techniques to preprocess the data, such as PCA and kernel smoothing (a detailed version can be found in the PhD dissertation (T. Wang, 2010)). These data were then divided into six bins corresponding to the six operating conditions, using K-means clustering applied to channels 3, 4, and 5, the columns that represent the operating conditions. (The partitioning into operating conditions can also be obtained directly from the results in (Richter, 2012).) Sensors 7, 8, 9, 12, 16, 17, and 20 were selected manually as relevant features because they exhibit continuous and consistent trends suitable for regression and generalization. The first 5% of each training instance (trajectory) was considered healthy data, and the portion beyond 95% of life was considered failure data. For each operating regime, the data were used to estimate the parameters of an exponential regression model describing the evolution of the health indicator (HI) from the healthy state to the failure state.
These regression models were then used to estimate the HI for a given test trajectory under each operating regime individually; the estimates were then fused into one HI using specific fusion rules. A threshold on maximum RUL estimates was applied to decrease the risk of being penalized for late predictions. As part of the competition, this approach was applied to datasets #4, #5T, and #5V, and most of the results were published in the resulting publications.
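The two preprocessing ideas of this entry, operating-regime binning and a health index anchored at the healthy and failed extremes of each trajectory, can be illustrated as follows. This is not the authors' code: a plain linear regression stands in for their exponential HI model, and the train frame and column names come from the loading sketch in Section 2.1.

    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression

    op_cols = ["op_setting_1", "op_setting_2", "op_setting_3"]        # operating-condition columns
    sensor_cols = [f"sensor_{i}" for i in (7, 8, 9, 12, 16, 17, 20)]  # channels kept in the paper

    # (i) Bin rows into the six operating regimes.
    train["regime"] = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(train[op_cols])

    # (ii) Per regime, regress an HI from sensors, using early-life rows as
    # "healthy" (HI = 1) and end-of-life rows as "failed" (HI = 0).
    train["life_frac"] = train["cycle"] / train.groupby("unit")["cycle"].transform("max")
    hi_models = {}
    for r, grp in train.groupby("regime"):
        ends = grp[(grp["life_frac"] <= 0.05) | (grp["life_frac"] >= 0.95)]
        hi_target = (ends["life_frac"] <= 0.05).astype(float)
        hi_models[r] = LinearRegression().fit(ends[sensor_cols], hi_target)

Applying each regime's model to a test trajectory and fusing the outputs yields the one-dimensional HI on which, for example, the similarity matching of Section 4.3 operates.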
Recurrent Neural Network Approach (Heimes, 2008) (Publication ID 2) → Approach: The author implemented an advanced proprietary algorithm initially developed at BAE Systems for modeling the complex dynamics of aircraft components (such as turbines). This algorithm was based on the work of (Feldkamp & Puskorius, 1998), which is able to solve various problems in adaptation, filtering, and classification. The first part of the publication is dedicated to the data exploration phase. A classifier based on an MLP was first trained to distinguish between healthy and faulty states, yielding an error of 1%. The author then focused on regression methods such as the MLP and the RNN to cope with truncated instances, as is the case for the data challenge. The RNN was preferred to the MLP because it can cope with time-dependent data. The data exploration phase ends with some observations that may be useful for other algorithms: the degradation is made of four phases (steady, knee, acceleration of degradation, and failure), and a threshold on maximum RUL estimates is set to 130. The second part of the publication focuses on the implemented algorithm. The parameters of the RNN were estimated using all sensor data and operating conditions. Gradients were computed by a truncated back-propagation through time algorithm, jointly with an extended Kalman filter to refine the weights. An evolutionary approach based on differential evolution was also used to maintain an ensemble of solutions with varied and efficient parameterizations. The optimization was performed using cross-validation, by splitting the training data into distinct training and validation sets.

Multi Layer Perceptron and Kalman Filter Based Approach (Peel, 2008) (Publication ID 3) → Approach: The author presented an architecture based on MLPs and RBFs (implemented with Netlab) combined with a Kalman filter. The first part of the publication is dedicated to the data exploration phase, and in particular to effective visualization techniques such as NeuroScale mapping. This technique allowed the author to identify the six operating conditions from sensor data and to conclude that these conditions cannot be used alone for predictions. Similarly to (Heimes, 2008), the various degradation levels were also pointed out as one important difficulty (due to operating conditions (Saxena, Goebel, et al., 2008b)). The second part of the publication is dedicated to the algorithm. The data processing included data standardization defined specifically for each operating condition, to obtain features on similar scales. The author investigated a tournament heuristic approach to select various sensor subsets by minimizing the error on RUL prediction. The RUL is estimated by an ensemble of RBFs and MLPs with multiple parameterizations. The final step is the Bayes-optimal combination of the RUL estimates using a Kalman filter, which also allowed the author to reduce sensor noise and to treat instances as time series by integrating past information.

The analysis of the collected publications reveals several important observations, which are summarized here. The publications are first binned into various categories and then analyzed. These categories and the corresponding findings are presented in the sequel.

3.2. C-MAPSS Dataset Used

Table 4 identifies the specific publications that use one or more of the five datasets. It can be observed that dataset #1 was the most used (55% of the forty publications), followed by the test set (#5T) from the PHM'08 challenge (35%), whereas the rest of the datasets are relatively underutilized. Three publications report generating their own datasets using the C-MAPSS simulator.

Table 4. List of publications for each dataset.

  Datasets             Publication ID                                                      Rate
  Turbofan data   #1   5, 6, 10, 13, 14, 15, 19, 20, 23, 24, 25, 26, 27, 28, 31, 32,     22/40
  from NASA            33, 34, 36, 37, 38, 40
  repository      #2   13, 22, 34, 40                                                     4/40
                  #3   34, 40                                                             2/40
                  #4   7, 34, 40                                                          3/40
  PHM'08 Data     #5T  1, 2, 3, 4, 8, 12, 16, 17, 21, 29, 30, 34, 35, 40                 14/40
  challenge       #5V  1, 2, 3, 40                                                        4/40
  Simulator       OWN  9, 11, 39                                                          3/40
  Other           -    18                                                                 1/40

The heavy usage of dataset #1 (≈ 70% among the publications using the four NASA repository datasets) may be attributed to its simplicity compared to the rest (see Figure 2), whereas the high usage of dataset #5T is attributed to the PHM'08 challenge, where several teams had already used these data extensively, gaining significant familiarity with the dataset, as well as to the availability of the corresponding benchmark performance from the challenge leaderboard.

[Figure 2. Relative usage of various C-MAPSS datasets in the literature: the number of papers per data source (NASA repository, PHM'08 challenge, own simulator runs, other), and the usage distribution among the repository datasets #1-#4, dominated by dataset #1 at roughly 70%.]

Several publications mentioned in Table 4 have used only the training datasets, which contain complete (run-to-failure) trajectories. Using data with complete trajectories gives access to the true End-of-Life (EOL), so the RUL can be computed at any time point of a degradation trajectory, which could be used to generate a larger set of training data. This approach is also relevant to estimating RULs at different time points and allows the usage of prognostic metrics (Saxena, Celaya, et al., 2008) such as the prognostic horizon, the α−λ metric, or the convergence measure. However, in a true learning sense the algorithm, once trained, should be tested on unseen data for proper validation, as was required for the PHM'08 challenge datasets. Table 5 shows that 11 different publications used the "full" training/testing datasets, meaning the training dataset for estimating the parameters and the full testing dataset for performance evaluation.

Table 5. List of publications using only full training/testing datasets.

  Datasets                  Publication ID              Rate
  Turbofan dataset   #1     20, 27, 28, 40              5/40
  from NASA          #2     40                          1/40
  repository         #3     40                          1/40
                     #4     40                          1/40
  PHM'08 Data        #5T    1, 2, 3, 4, 16, 21, 40      7/40
  challenge          #5V    1, 2, 3, 40                 4/40
3.3. Target Problem to Solve

As expected, there is a wide variety of approaches to interpreting the datasets, formulating the problem, and modeling the system to solve it. However, contrary to expectations, a significant number of publications have utilized these datasets for a "fault detection" purpose, by considering "multi-class classification" or "clustering" rather than prognostics. Table 6 identifies and distinguishes the publications that focus on detection versus prognostics.

Table 6. List of publications focused on detection and prediction.

  Purpose      Publication ID                                                          Rate
  Detection    6, 10, 11, 12, 13, 31, 33, 34, 35, 37                                  10/40
  Prediction   1, 2, 3, 4, 5, 7, 8, 9, 10, 13, 14, 16, 17, 19, 20, 21, 22, 23, 24,   31/40
               25, 27, 28, 29, 30, 32, 34, 36, 37, 38, 39, 40
  Other        15, 18, 26                                                              3/40

By posing a multi-class classification problem, these publications attempt to solve mainly three types of problems:

• Supervised classification: the training dataset is labeled (the class of each feature vector is known);
• Unsupervised classification: the classes are not known a priori and the data are not labeled;
• Partially supervised classification: some classes are precisely known, while others are unknown or are attached a confidence value expressing a belief in that class.

Publications 1, 7, 10, 20, 24, 27, 32 use classification as a preprocessing step towards solving a prognostics problem. Specifically, unsupervised classification algorithms are used in publications 1, 7 to segment the dataset into the six operating conditions. For reference, detailed information about the various simulated operating conditions in C-MAPSS is given in (Richter, 2012), which can also be used to label these datasets. Supervised and unsupervised classification algorithms are also used in publications 6, 10, 20, 27, 32 to assign a degradation level according to sensor measurements. The sequence of discrete degradation stages is indeed relevant for the estimation of the current health state and its prediction (Kim, 2010).

Health assessment, anomaly detection (seen as a one-class classification problem), and fault identification are tackled in publications 6, 11, 12, 13, 26, 31, 35 using supervised classification methods, and with partially supervised classification techniques in publications 12, 27, 33. For these approaches, a known target (or degradation level) is required to evaluate the classification rate. For instance, four degradation levels were defined for labeling data in publications 6, 10, 27, 33: normal degradation (class 1), a knee corresponding to noticeable degradation (class 2, viewed as a transition between classes 1 and 3), accelerated degradation (class 3), and failure (class 4); a labeling sketch is given after Table 7. Some hand-made segmentations have also been provided by authors (publication 13). Using these segmented data (clusters) as a proxy for ground truth, some level of classification performance can be evaluated for comparison purposes. These classification methods span all three classes of learning mentioned above, and the corresponding publications are summarized in Table 7.

Table 7. Methods and references for state/fault classification.

  Classification method    Publication ID
  Supervised               6, 10, 11, 12, 13, 26, 31, 35
  Unsupervised             1, 7, 11, 20, 24, 32
  Partially supervised     27, 33

Similarly, many different approaches were employed for solving the prognostics problem of predicting RULs. In order to give due attention to the analysis of prognostic methods, that discussion is presented separately in Section 4.
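As an illustration of the four-stage labeling mentioned above, the sketch below maps a health-index series to stage labels. The thresholds are hypothetical: the reviewed publications derive stage boundaries from their own segmentations or clustering rather than from fixed values.

    import numpy as np

    def stage_labels(hi, knee=0.8, accel=0.5, fail=0.2):
        # Label an HI series (1 = healthy .. 0 = failed) with classes 1-4.
        hi = np.asarray(hi, float)
        stages = np.full(hi.shape, 1)   # class 1: normal degradation
        stages[hi < knee] = 2           # class 2: knee (transition)
        stages[hi < accel] = 3          # class 3: accelerated degradation
        stages[hi < fail] = 4           # class 4: failure
        return stages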
3.4. Method for Treatment of Uncertainty

Given the inherent nature of the datasets, which include several noise factors, and the lack of specific information on the effects of operational conditions, it is important for algorithms to model and account for uncertainty in the system. Different publications have dealt with uncertainty at various stages of processing, as described below:

1. Signal processing step, such as noise filtering using a Kalman filter as in publications 2, 3, 20, Gaussian kernel smoothing in publications 1, 7, and functional principal component analysis in publication 15.

2. Feature extraction/selection step, such as using principal component analysis and its variants as suggested in publications 1, 7, 13, grey correlation in publication 22, relevance of features for prediction in publication 23, and a greedy search with results visualization in publication 40.

3. Health estimation step, such as assessing the operating conditions to normalize or factor out their effects, as proposed in publications 1, 7, 21, 40, using non-linear regression.

4. Classification step, where uncertainty modeling plays a role in data labeling using noisy and imprecise degradation levels, as shown in publications 12, 27, 33, or in the inference of a sequence of degradation levels, such as using Markov models or multi-models as in publications 6, 10, 24, 32, 34.

5. Prediction step, such as gradually incorporating prior knowledge during estimation in the presence of noise as proposed in publications 4, 14, 16, 17, 19, 21, 30, in determining failure thresholds as in publications 10, 27, 32, or in representing the health indicator used for prediction, as in publication 40.

6. Information fusion step, merging multiple RUL estimates through Bayesian updating as in publications 4, 21, or through similarity-based matching as in publications 1, 27, 40; a minimal fusion sketch is given after Table 8.

A variety of uncertainty representation theories are found to be used. Table 8 classifies the publications according to the theory of uncertainty treatment used in the corresponding analysis (Klir & Wierman, 1999). As shown in the table, probability theory is the most popular (65%), followed by set-membership approaches (in particular fuzzy sets, 15%), Dempster-Shafer's theory of belief functions (13%), and other measures (such as polygon area and the Choquet integral).

Table 8. Methods for uncertainty management used on C-MAPSS datasets.

  Theories             Publication ID                                                        Rate
  Probability theory   1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 15, 16, 17, 19, 20, 21, 22, 26,    26/40
                       28, 29, 30, 31, 32, 33, 34, 35
  Set-membership       10, 14, 23, 25, 36, 39                                                6/40
  Belief functions     6, 10, 24, 27, 33                                                     5/40
  Other measures       10, 40                                                                2/40
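The fusion step of item 6 can be illustrated with the simplest case: if the individual RUL estimates are treated as independent Gaussian variables, the Bayes-optimal combination is a precision-weighted average, which is essentially what the Kalman-filter combination of publication 3 computes. The numbers below are hypothetical.

    import numpy as np

    def fuse_rul(estimates, variances):
        # Precision-weighted fusion of independent Gaussian RUL estimates.
        w = 1.0 / np.asarray(variances, float)                         # precision weights
        fused_mean = float(np.sum(w * np.asarray(estimates, float)) / np.sum(w))
        fused_var = float(1.0 / np.sum(w))                             # fused uncertainty shrinks
        return fused_mean, fused_var

    print(fuse_rul([120.0, 100.0, 110.0], [400.0, 100.0, 225.0]))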
3.5. Methods used for Performance Evaluation

Table 9 summarizes the performance measures that have been used in the prognostics-oriented publications. A taxonomy of performance measures for RUL estimation was proposed in (Saxena, Celaya, et al., 2008; Saxena et al., 2010), with different categories: accuracy-based, precision-based, robustness-based, trajectory-based, computational performance and cost/benefit measures, as well as some measures dedicated specifically to prognostics (PHM metrics). Since this problem involves predictions on multiple units, it is expected that the majority of publications would use error-based accuracy and precision metrics. A metric like the Mean Squared Error (MSE) has been used in two different ways: to estimate the goodness of fit between a predicted and a real signal, and as an accuracy-based metric to aggregate errors in RUL estimation. Only the publications that fall under the latter category are included in the table. The table clearly shows that accuracy-based measures were the most widely used, in particular the scoring function from the PHM'08 challenge, which also weighs accuracy by the timeliness of predictions. The broader usage of this metric is also explained by the fact that it is the only metric for which scores from the data challenge were available to benchmark any new development against. However, one may also compute additional measures when using only the training datasets, where full trajectories are available. In that case, approaches like leave-one-out validation become applicable, where all training instances but one are used for training each time, the remaining one is used for performance evaluation, and the performance measures are averaged over all runs (a sketch follows Table 9). Publication 27 presents this approach for dataset #1, and a cross-validation procedure for dataset #5T is used in publication 21. Note that publications 19, 20, 32 provide only the RUL estimates for all testing instances (without computing any metrics), and publications 10, 27 present distributions of errors.

Table 9. Performance measures used in prognostics-oriented publications applied on C-MAPSS.

  Categories    Measures       Publication ID                            Rate
  Accuracy      PHM'08 Score   1, 2, 4, 5, 8, 16, 21, 29, 30, 40        10/40
                FPR, FNR       8, 10, 27, 40                             4/40
                MSE            3, 8, 15, 17, 29, 40                      6/40
                MAPE           4, 23, 28, 32, 34, 39, 40                 7/40
                MAE            5, 13, 38, 40                             4/40
  Precision     ME             25, 28, 32, 39                            4/40
                MAD            25                                        1/40
  Prognostics   PH             7, 22                                     2/40
                α−λ            7, 22                                     2/40
                RA             7, 22, 34                                 3/40
                CV             7, 22, 34                                 3/40
                AB             34                                        1/40
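The leave-one-out scheme just described can be sketched as below; fit, predict_rul, and metric are placeholders for any of the reviewed modeling approaches, and the truncation point (here half of the held-out unit's life) is an arbitrary choice.

    import numpy as np

    def leave_one_out(trajectories, fit, predict_rul, metric):
        # Average a chosen error metric over hold-one-unit-out runs.
        errors = []
        for held_out in trajectories:
            rest = [t for t in trajectories if t is not held_out]
            model = fit(rest)
            t_cut = len(held_out) // 2          # predict from mid-life...
            true_rul = len(held_out) - t_cut    # ...where the remaining life is known
            errors.append(metric(predict_rul(model, held_out[:t_cut]), true_rul))
        return float(np.mean(errors))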
4. PROGNOSTIC APPROACHES

As shown previously, the C-MAPSS datasets have been used for the development and benchmarking of various detection and prognostic approaches. This section focuses specifically on the prognostic approaches, which can be divided into three broad categories, as described in the sequel.

4.1. Category 1: Using functional mappings between a set of inputs and RUL

Methods in this category (see Table 10) first transform the training data (trajectories) into a multidimensional feature space and use the RULs to label the feature vectors. Then, using supervised learning methods, a mapping between feature vectors and RULs is developed. Methods within this category are mostly based on neural networks with various architectures. Different sensor channels were used to generate the corresponding features. However, it was observed that the approaches yielding good performance also included a feature selection step through advanced parameter optimization, such as using a genetic algorithm and Kalman filtering as described in publications 2 and 3, which ranked 2nd and 3rd respectively in the competition.

Table 10. Category 1 methods using a mapping learned between a subset of sensor measurements as inputs and RUL as output.

  Methods                            Publication ID
  RNN, EKF                           2
  MLP, RBF, KF, Ensemble             3
  MLP                                8
  ANN                                9
  ESN                                20
  Fuzzy rules, genetic algorithm     36
  MLP, adaboost                      38

Adaptation to new situations can be difficult with such approaches, as it would be necessary to adapt the model parameters if the process evolves, as opposed to keeping a static mapping. Some further developments have considered these issues: for instance, Echo State Networks and Extreme Learning Machines are known to reduce learning time and complexity (Jaeger & Haas, 2004; Huang, Zhu, & Siew, 2004; Siqueira, Boccato, Attux, & Lyra, 2012; Butcher, Verstraeten, Schrauwen, Day, & Haycock, 2013), while evolving models (Angelov, Filev, & Kasabov, 2010) are well suited for adaptive real-time learning. However, these latter approaches were evaluated only on a few instances of the C-MAPSS training datasets, in publications 24, 25, 32. Therefore, it was difficult to evaluate whether reducing learning time, making models gray boxes, or even using evolving models are efficient strategies to perform well on C-MAPSS datasets. More experiments using the full training/testing datasets may be needed to evaluate these approaches better.
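A minimal Category 1 sketch, for illustration only: an off-the-shelf MLP regressor maps selected sensor channels directly to RUL, with the 130-cycle RUL ceiling noted in publication 2. The reviewed methods add recurrent structure, Kalman filtering, or evolutionary feature selection on top of such a baseline; the train/test frames and sensor_cols come from the earlier sketches.

    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X = train[sensor_cols].to_numpy()
    y = train["RUL"].clip(upper=130)   # cap the regression targets, as several entries did

    model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500))
    model.fit(X, y)

    # Predict at each test unit's last observed cycle, to be scored against
    # the ground-truth RUL vector for datasets #1-4.
    last_rows = test.groupby("unit").last()
    rul_pred = model.predict(last_rows[sensor_cols].to_numpy())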
4.2. Category 2: Functional mapping between health index (HI) and RUL

Methods listed in Table 11 are based on the estimation of two mapping functions: the first maps sensor measurements to a health index (a one-dimensional variable) for each training unit; the second links health index values to the RUL. These approaches construct a library of degradation models. Inferring the RUL for a given test instance involves using the library as prior knowledge to update the parameters of the model corresponding to the new test instance. Updating can be done using Bayes' rule, as proposed in publication 4, or with other model averaging or ensemble techniques designed to take into account the uncertainty inherent to the model selection process (Raftery, Gneiting, Balabdaoui, & Polakowski, 2003).

Table 11. Category 2 methods using health index as input and RUL as output.

  Methods                                                     Publication ID
  Quadratic fit, Bayesian updating                            4
  Logistic regression                                         5
  Kernel regression, RVM                                      7
  RVM                                                         16
  Gamma process                                               17
  Linear, Bayesian updating                                   19
  RVM, SVM, RNN, exponential and quadratic fit,               21
  Bayesian updating
  Exponential fit                                             28
  Wiener process                                              29
  Copula                                                      30
  HMM, LS-SVR                                                 34

Table 12 lists some other approaches that use approximation functions to represent the evolution of individual sensor measurements through time. Given a test instance, as many predictions are made as there are sensors. These predictions are then fed to a classifier that assigns a class label corresponding to an identified degradation level. Some of these approaches also update the classifier parameters with new measurements using Bayesian updating rules, as mentioned previously. These methods were, however, applied only on dataset #1, in which the sensors depict clear monotonic trends.

Table 12. Category 2 methods based on individual sensor modeling and classification.

  Methods                              Publication ID
  exTS, supervised classification      10
  SVR                                  13
  exTS, ARX                            14
  ANN, ANFIS                           23
  Piece-wise linear (multi-models)     24
  exTS                                 25
  ELM, unsupervised classification     32

Category 2 methods using the health index to RUL mapping are flexible, since new instances are easily incorporated into the library. The main drawback concerns the generalization capability of the updating techniques (with performance estimation on the full training/testing datasets). The performance of RUL prediction depends on both mapping functions, which are excellent research topics. The approaches for health index (degradation) estimation and modeling are of great importance, not only for the C-MAPSS datasets but for PHM in general. The method used in publications 1, 7 was used the most for estimating the health index (the first mapping function), with some variants proposed in publications 21, 17, 40. The health index in the C-MAPSS datasets is a temporal hidden variable, strongly occluded by the operating conditions, and has to be inferred from noisy sensor measurements. The estimation of the first mapping function is thus challenging, and advanced machine learning and pattern recognition tools have to be considered. Since the operating conditions represent the usage load profile, as pointed out in publications 9, 18, their sequence is important; the estimation of the health index can therefore be improved by taking the sequence of operating conditions into account.

The second mapping function generally takes the form of a regression model (Table 11), for which new methods of uncertainty estimation and propagation can be developed (Table 8). Nonlinear methods demonstrated better performance, since the degradation of the units exhibits changing trends - gradual and slow, sudden and steep, or regular - as shown in publications 1, 2, 3. Approaches based on sequences of multiple states (Moghaddass & Zuo, 2014) could be considered to cope with slope changes in the health index. Moreover, adaptation to complex noise appears to be a critical issue, since in practice noise is generally not simply additive and Gaussian but multimodal and dependent on operating conditions (Saxena, Goebel, et al., 2008b). Finally, new model averaging algorithms can be a relevant source of improvement, as illustrated in publications 1, 2, 3, 4, 7, 21, 38, 40, for example by considering information fusion tools and ensemble techniques (Kuncheva, Bezdek, & Duin, 2001; Francois, Grandvalet, Denoeux, & Roger, 2003; Kuncheva, 2004; Monteith, Carroll, Seppi, & Martinez, 2011; Hu, Youn, Wang, & Yoon, 2012).

4.3. Category 3: Similarity-based matching

In these methods (Table 13), historical instances of the system (sensor measurement trajectories labeled with known failure times) are used to create a library. For a given test instance, the similarity with the instances in the library is evaluated, generating a set of Remaining Useful Life (RUL) estimates that are eventually aggregated using different methods. Compared to category 2 methods, these methods do not abstract the training trajectories into features; the trajectory data themselves (possibly filtered) are stored. Similarity is computed in the sensor space, as in publication 27, or using health indices, as in publications 1, 7, 17, 21, 40.

As mentioned in publications 1, 7, a test instance and a training instance may take different times to reach a particular degradation level from the initial healthy state. Therefore, similarity-based matching must accommodate this difference in the early phases of the degradation curves. In publication 40, this problem was tackled by assuming a constant initial wear for all instances, yielding an offset on the health indices. Efficient similarity measures are also necessary to cope with noise and varied degradation paths. For instance, in publications 1, 7 three different similarity measures were used, and in publication 40 computational geometry tools were used for instance representation and similarity evaluation.

Table 13. Category 3 methods using similarity-based matching.

  Methods                                                         Publication ID
  HI-based, 3 similarity measures and kernel smoothing            1, 7
  Similar to 1 and 7, using 1 similarity measure                  22
  Feature-based similarity, 1 similarity measure, ensemble,       27
  degradation levels classification
  HI-based similarity, polygon coverage similarity, ensemble      40

An advantage of the approaches in this category is that new instances can be easily incorporated. Moreover, similarity-based matching approaches have demonstrated good generalization capability on all C-MAPSS datasets, as shown in publications 1, 7, 40, despite the high level of noise, the multiple simultaneous fault modes, and the number of operating conditions.
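The core matching step of this category can be sketched as follows: a test HI segment is slid along each library (training) HI curve, the best alignment supplies a candidate RUL, and candidates from the closest units are aggregated. Euclidean distance and mean aggregation are deliberate simplifications of the similarity measures and fusion rules used in publications 1, 7, 27 and 40.

    import numpy as np

    def similarity_rul(test_hi, library, k=5):
        # Estimate the RUL of a test HI segment from a library of full HI curves.
        test_hi = np.asarray(test_hi, float)
        n = len(test_hi)
        candidates = []
        for lib_hi in library:                      # one full run-to-failure curve per unit
            lib_hi = np.asarray(lib_hi, float)
            if len(lib_hi) < n:
                continue
            # Distance of the test segment at every possible alignment.
            d = [np.linalg.norm(test_hi - lib_hi[s:s + n])
                 for s in range(len(lib_hi) - n + 1)]
            s_best = int(np.argmin(d))
            candidates.append((d[s_best], len(lib_hi) - (s_best + n)))  # cycles left after match
        candidates.sort(key=lambda c: c[0])
        return float(np.mean([rul for _, rul in candidates[:k]]))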
This category of algorithms is also relatively easy to parallelize, reducing the computational time needed for inference.

Similarity-based approaches are generally said to be sensitive to the training dataset. However, to our knowledge, this sensitivity has not been studied for the C-MAPSS datasets. The large number of training and testing instances in C-MAPSS should help draw interesting conclusions about the behavior of an algorithm and its ability to map features to RUL with respect to both the quantity and the quality of the data (Gouriveau, Ramasso, & Zerhouni, 2013). An interesting direction would be to study the performance of similarity-based approaches using multiple metrics and under various operating conditions and fault modes. Such a study may help future end-users of prognostic algorithms select the appropriate one according to their ability to gather sufficient and representative data.

5. SOME GUIDELINES TO USING C-MAPSS DATASETS

Another contribution of this paper is a summary of guidelines for using the C-MAPSS datasets that may help future users understand and utilize these datasets better. It gathers information from the literature review and from the authors' own experiences, which in many cases go beyond the documentation provided with the datasets. Specifically, it offers some general processing steps and lists the publications that describe implementations of these preprocessing steps, which could be useful in developing a prognostic algorithm (Figure 3).

[Figure 3. Guidelines to using C-MAPSS datasets.]

Based on the analysis presented in Section 3, five general data processing and algorithmic steps are considered:

[Step 1:] Understanding C-MAPSS datasets - Comprehensive background information on turbofan engines and the C-MAPSS datasets is well presented in three publications: (Saxena, Goebel, et al., 2008b), (Richter, 2012), and (T. Wang, 2010). More details about the hierarchical decomposition of the simulated system into critical components can be found in (Frederick et al., 2007; Abbas, 2010), which provide valuable domain knowledge. These publications do not focus on the physics of failure of turbofan engines, but describe the generation of the datasets and various practical aspects of using them for prognostics. They include descriptions of the sensor measurements, illustrations of the operating conditions, the impact of fault modes, etc., which can play an important role in improving data-driven prognostic algorithms as well. Going from dataset #1 to #4 represents increasing degrees of complexity; it is therefore recommended to use them in that order, incrementally developing methods that accommodate each complexity one by one. The challenge datasets fall somewhere in the middle as far as complexity goes, but lack the ground truth information that would give quicker feedback during algorithm development. These datasets may therefore be used as validation examples and should be compared to other approaches using the benchmarks presented in Section 2.2.

[Step 2:] Defining the problem - Given the nature of these datasets, several types of problems can be defined. As mentioned in Section 3.3, in addition to prediction, a multi-class classification problem can be defined in a multidimensional feature space. However, the intent behind these data was to promote prognostic algorithm development. Since the data consist of multiple trajectories, the problem of predicting the RUL for all trajectories can be constructed just as it was posed in the data challenge. One could also define the problem at a higher granularity, by modeling the degradation of each trajectory individually and predicting the RUL at multiple time instances, which would be closer to a condition-based prognostics context.
[Step 3:] Data preparation - After a dataset (turbofan or data challenge) is selected, it is suggested to split the original training dataset into two subsets: a training dataset for model parameter estimation (learning) and a testing dataset to test the learned model (see for example publications 21, 40). For datasets #1-4, the corresponding RUL vectors are provided for the test sets, so users can validate their algorithms. For the challenge datasets, however, evaluations can only be obtained by uploading the RULs to the data repository website. It may therefore be desirable to split the training set itself for training, testing, and validation purposes during algorithm development; a unit-wise split is sketched below. The next step is to down-select sensors to reduce the problem dimensionality. Some data exploration and preparation approaches for the data challenge (datasets #5T and #5V) are well described in publications 1, 2 and 7. Some "heuristic rules" to avoid over-predictions are also presented in publication 40 and applied on all five C-MAPSS datasets. Some of the better performing methods are based on a PCA, as in publication 1, and on other sensor selection procedures.
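A sketch of this split: units, not individual rows, are partitioned, so that complete trajectories stay together in one subset or the other; the 80/20 proportion is arbitrary. The train frame comes from the Section 2.1 loading sketch.

    import numpy as np

    rng = np.random.default_rng(0)
    units = train["unit"].unique()
    val_units = rng.choice(units, size=int(0.2 * len(units)), replace=False)

    learn_set = train[~train["unit"].isin(val_units)]   # parameter estimation
    val_set = train[train["unit"].isin(val_units)]      # held-out validation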