
An Introduction to Data Analysis using Aggregation Functions in R PDF

205 Pages·2016·4.564 MB·English

Preview An Introduction to Data Analysis using Aggregation Functions in R

Simon James
An Introduction to Data Analysis using Aggregation Functions in R

Simon James
School of Information Technology
Deakin University
Burwood, VIC, Australia

ISBN 978-3-319-46761-0
ISBN 978-3-319-46762-7 (eBook)
DOI 10.1007/978-3-319-46762-7
Library of Congress Control Number: 2016953188

© Springer International Publishing AG 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

This introduction to the area of aggregation and data analysis is intended for computer science or business students who are interested in tools for summarizing and interpreting data, but who do not necessarily have a formal mathematics background. It was motivated by our need for such a text in postgraduate data analytics subjects at my university; however it could also be of interest to undergraduate data science students. By reading through the chapters and completing the questions, you will be introduced to some of the issues that can arise when we try to make sense of data. It is hoped that you will gain an appreciation for the usefulness of the results established in the field of data aggregation and their wide applicability. I personally find the study of aggregation functions to be very enjoyable and fulfilling, as it allows for a nice mix of theoretically interesting results and practical applications. The rise of data analytics and the growing need for companies to learn more from their data also make the need for intelligent data aggregation techniques more indispensable than ever.

The overall aim of this book is to allow future data analysts to become aware of aggregation functions theory and methods in an accessible way, focusing on a fundamental understanding of the data and summarization tools that complements any study in statistical or machine learning techniques. To this end, included in each of the chapters is an R tutorial giving an introduction to commands and techniques relevant to the topics studied. These tutorials assume no programming background and will give you some exposure to one of the most widely adopted (and freely available) computing languages used in data analysis.

For a more in-depth overview of aggregation functions, there are some great existing monographs, including the following to name a few.

• Beliakov, G., Bustince, H. and Calvo, T.: A Practical Guide to Averaging Functions. Springer, Berlin, New York (2015)
• Beliakov, G., Pradera, A. and Calvo, T.: Aggregation Functions: A Guide for Practitioners. Springer, Heidelberg (2007)
• Gagolewski, M.: Data Fusion. Theory, Methods and Applications. Institute of Computer Science, Polish Academy of Sciences (2015)
• Grabisch, M., Marichal, J.-L., Mesiar, R. and Pap, E.: Aggregation Functions. Cambridge University Press, Cambridge (2009)
• Torra, V. and Narukawa, Y.: Modeling Decisions. Information Fusion and Aggregation Operators. Springer, Berlin, Heidelberg (2007)

Whilst focusing on established results, I hope that the following pages will offer a fresh perspective on the recent trends in aggregation research. In this respect I am indebted to a number of colleagues and researchers who lead this field, many of whom I have been fortunate enough to meet or collaborate with over the span of my short career. In particular, I would like to acknowledge Humberto Bustince, Radko Mesiar, and Michal Baczynski, who have all made me feel very welcome when visiting their institutions and participating in conferences overseas. Special mention should be made regarding the works of Michel Grabisch and Ronald R. Yager on fuzzy integrals and OWA operators respectively, which have been a huge influence in my understanding of these areas (and so I hope I do them justice in the short introductions provided). I am grateful to my friend and ecology research expert, Dale Nimmo, who has helped identify a number of interesting applications of aggregation theory. A huge thank-you to Marek Gagolewski for our recent discussions and for his extensive comments on drafts of this text; I am looking forward to our future research endeavors! Finally (on the academic side), I would like to thank Gleb Beliakov for continuing to be a mentor to my research. Certainly the majority of these chapters have been influenced by what you have taught me and our work together.

Lastly I would like to acknowledge the support of family (in particular my Mum, who proofread some chapters of this book), friends (especially Rachel, who has been a great support while I’ve been working on this book in addition to her usual honest and constructive comments), and my work colleagues (a very busy 2016 was made much more bearable by having people like Lauren, Elicia, Guy, Tim, Lei, Crystal, Menuri, Gang, Shui, Vicky, Sutharshan, and Michelle to talk to).

Melbourne, VIC, Australia
Simon James
August 2016

Contents

1 Aggregating Data with Averaging Functions
  1.1 The Problem in Data: Average Returns
  1.2 Background Concepts
    1.2.1 What is a Function?
    1.2.2 Multivariate Functions and Vectors
  1.3 The Arithmetic Mean
    1.3.1 Definition
    1.3.2 Properties
  1.4 The Median
  1.5 The Geometric and Harmonic Means
    1.5.1 The Geometric Mean
    1.5.2 More Properties
    1.5.3 The Harmonic Mean
  1.6 Averaging Aggregation Functions
  1.7 Uses of Aggregation Functions
    1.7.1 Sporting Statistics
    1.7.2 Simple Forecasting
    1.7.3 Indices of Diversity and Equity
  1.8 Summary of Formulas
  1.9 Practice Questions
  1.10 R Tutorial
    1.10.1 Entering Commands in the Console
    1.10.2 Basic Mathematical Operations
    1.10.3 Assignment of Variables
    1.10.4 Assigning a Vector to a Variable
    1.10.5 Basic Operations on Vectors
    1.10.6 Existing Functions
    1.10.7 Creating New Functions
  1.11 Practice Questions Using R
  References

2 Transforming Data
  2.1 The Problem in Data: Multicriteria Evaluation
    2.1.1 Which is Better: Higher or Lower?
    2.1.2 Consistent Scales
    2.1.3 Differences in Distribution
  2.2 Background Concepts
    2.2.1 Arrays and Matrices (X)
    2.2.2 Matrix/Array Entries (x_ij)
    2.2.3 Matrix/Array Rows and Columns (x_i; x_j)
  2.3 Negations and Utility Transformations
  2.4 Scaling, Standardization and Normalization
  2.5 Log and Polynomial Transformations
  2.6 Piecewise-Linear Transformations
  2.7 Functions Built from Transformations
  2.8 Power Means
  2.9 Quasi-Arithmetic Means
  2.10 Summary of Formulas
  2.11 Practice Questions
  2.12 R Tutorial
    2.12.1 Replacing Values
    2.12.2 Arrays and Matrices
    2.12.3 Reading a Table
    2.12.4 Transforming Variables
    2.12.5 Rank-Based Scores
    2.12.6 Using if() for Cases
    2.12.7 Plotting in Two Variables
    2.12.8 Defining Power Means
  2.13 Practice Questions Using R
  References

3 Weighted Averaging
  3.1 The Problem in Data: Group Decision Making
  3.2 Background Concepts
    3.2.1 Regression Parameters
  3.3 Weighting Vectors
    3.3.1 Interpreting Weights
    3.3.2 Example: Welfare Functions
  3.4 Weighted Power Means
  3.5 Weighted Medians
  3.6 Examples of Other Weighting Conventions
    3.6.1 Simpson’s Dominance Index
    3.6.2 Entropy
  3.7 The Borda Count
  3.8 Summary of Formulas
  3.9 Practice Questions
  3.10 R Tutorial
    3.10.1 Weighted Arithmetic Means
    3.10.2 Weighted Power Means
    3.10.3 Default Values for Functions
    3.10.4 Weighted Median
    3.10.5 Borda Counts
  3.11 Practice Questions Using R
  References

4 Averaging with Interaction
  4.1 The Problem in Data: Supplementary Analytics
  4.2 Background Concepts
    4.2.1 Trimmed and Winsorized Mean
  4.3 Ordered Weighted Averaging
    4.3.1 Definition
    4.3.2 Properties
    4.3.3 Special Cases
    4.3.4 Orness
    4.3.5 Defining Weighting Vectors
  4.4 The Choquet Integral
    4.4.1 Fuzzy Measures
    4.4.2 Definition
    4.4.3 Special Cases
    4.4.4 Calculation
    4.4.5 Examples
    4.4.6 Properties
    4.4.7 The Shapley Value
  4.5 Summary of Formulas
  4.6 Practice Questions
  4.7 R Tutorial
    4.7.1 OWA Operator
    4.7.2 Choquet Integral
    4.7.3 Orness
    4.7.4 Shapley Values
  4.8 Practice Questions Using R
  References

5 Fitting Aggregation Functions to Empirical Data
  5.1 The Problem in Data: Recommender Systems
  5.2 Background Concepts
    5.2.1 Optimization and Linear Constraints
    5.2.2 Evaluating and Interpreting a Model’s Accuracy
    5.2.3 Flexibility and Overfitting
  5.3 Learning Weights for Aggregation Functions
    5.3.1 Fitting a Weighted Power (or Quasi-Arithmetic) Mean
    5.3.2 Fitting an OWA Function
    5.3.3 Fitting the Choquet Integral
  5.4 Using Aggregation Models for Analysis and Prediction
    5.4.1 Comparing Different Averaging Functions
    5.4.2 Making Inferences About the Importance of Each Variable
    5.4.3 Make Inferences About Tendency Toward Lower or Higher Inputs
    5.4.4 Predicting the Outputs for Unknown/New Data
  5.5 Reliability
    5.5.1 Do Our y_i Values Represent the ‘Ground Truth’?
    5.5.2 Are We Evaluating an Approach or the Model Itself?
  5.6 Conclusions
  5.7 Summary of Formulas
  5.8 Practice Questions Using R
  References

6 Solutions
  Aggregating Data with Averaging Functions: Solutions
    1.9 Practice Questions
    1.10 Practice Questions Using R
  Transforming Data: Solutions
    2.11 Practice Questions
    2.13 Practice Questions Using R
  Weighted Aggregation: Solutions
    3.9 Practice Questions
    3.11 Practice Questions Using R
  Averaging with Interaction: Solutions
    4.6 Practice Questions
    4.7 Practice Questions Using R
  Fitting Aggregation Functions to Empirical Data: Solutions
    5.6 Practice Questions Using R

Index
