
Wineinformatics: A New Data Science Application PDF

76 Pages·2022·2.268 MB·English

Preview Wineinformatics: A New Data Science Application

SpringerBriefs in Computer Science

Bernard Chen
Wineinformatics: A New Data Science Application

Series Editors
Stan Zdonik, Brown University, Providence, RI, USA
Shashi Shekhar, University of Minnesota, Minneapolis, MN, USA
Xindong Wu, University of Vermont, Burlington, VT, USA
Lakhmi C. Jain, University of South Australia, Adelaide, SA, Australia
David Padua, University of Illinois Urbana-Champaign, Urbana, IL, USA
Xuemin Sherman Shen, University of Waterloo, Waterloo, ON, Canada
Borko Furht, Florida Atlantic University, Boca Raton, FL, USA
V. S. Subrahmanian, University of Maryland, College Park, MD, USA
Martial Hebert, Carnegie Mellon University, Pittsburgh, PA, USA
Katsushi Ikeuchi, University of Tokyo, Tokyo, Japan
Bruno Siciliano, Università di Napoli Federico II, Napoli, Italy
Sushil Jajodia, George Mason University, Fairfax, VA, USA
Newton Lee, Institute for Education, Research and Scholarships, Los Angeles, CA, USA

SpringerBriefs present concise summaries of cutting-edge research and practical applications across a wide spectrum of fields. Featuring compact volumes of 50 to 125 pages, the series covers a range of content from professional to academic. Typical topics might include:

• A timely report of state-of-the-art analytical techniques
• A bridge between new research results, as published in journal articles, and a contextual literature review
• A snapshot of a hot or emerging topic
• An in-depth case study or clinical example
• A presentation of core concepts that students must understand in order to make independent contributions

Briefs allow authors to present their ideas and readers to absorb them with minimal time investment. Briefs will be published as part of Springer’s eBook collection, with millions of users worldwide. In addition, Briefs will be available for individual print and electronic purchase. Briefs are characterized by fast, global electronic dissemination, standard publishing contracts, easy-to-use manuscript preparation and formatting guidelines, and expedited production schedules. We aim for publication 8–12 weeks after acceptance. Both solicited and unsolicited manuscripts are considered for publication in this series.

Indexing: This series is indexed in Scopus, Ei-Compendex, and zbMATH

Bernard Chen
University of Central Arkansas
Conway, AR, USA

ISSN 2191-5768
ISSN 2191-5776 (electronic)
SpringerBriefs in Computer Science
ISBN 978-981-19-7368-0
ISBN 978-981-19-7369-7 (eBook)
https://doi.org/10.1007/978-981-19-7369-7

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

Wine is an ancient beverage produced and enjoyed by humans for several thousand years, and it has become more popular and affordable nowadays.
Hundreds of millions of hectoliters of wine are produced across more than 30 countries each year. During the grape growing, wine making, and wine evaluating processes, lots of data are generated and stored. With the right data mining/data science techniques, hidden knowledge and information can be retrieved from the large amount of wine-related data.

This book introduces a new data science application domain named Wineinformatics, which combines wine-related data with data mining/data science techniques to discover new types of knowledge in wine. Supervised learning, one of the four learning types of algorithms, applied to professional wine reviews is the main focus of this book. With the development and utilization of the Computational Wine Wheel, large volumes of wine reviews in human language format are converted into computer-understandable binary encodings. New information can be mined through a properly designed dataset to answer some tough and interesting questions on wine.

Chapter 1 provides an introduction to data science and Wineinformatics. Chapter 2 introduces the development and usage of the Computational Wine Wheel and the three datasets used in this book. Chapter 3 presents a set of experiments to answer the question “How does a wine achieve 90+ scores?” using classification algorithms. Chapter 4 discusses how to evaluate wine judges and addresses the question “Are wine reviewers reliable and consistent?” Chapter 5 targets the question “Can actual wine grade and price be predicted through their reviews?” using regression analysis. Chapter 6 introduces newer and more complicated computer science techniques, Multi-Label and Multi-Target, to Wineinformatics. These techniques are useful for working on the problem “Can wine grade, price and region be predicted altogether with higher accuracy?” Chapter 7 discusses how to extract more information from the Computational Wine Wheel to respond to the question “How can computers understand wine reviews even more?” Finally, Chapter 8 draws the conclusion and outlines some of the many intriguing directions for future work.

Conway, AR, USA
Bernard Chen

Contents

1 Introduction
    1.1 Data Science
    1.2 Wineinformatics
    1.3 Book Overview
    References
2 Data Collection and Preprocessing
    2.1 Wine Sensory Reviews
    2.2 Wine Spectator
    2.3 Computational Wine Wheel
        2.3.1 Wine Aroma Wheel
        2.3.2 Bag of Words in Natural Language Processing
        2.3.3 Development of the Computational Wine Wheel (CWW)
        2.3.4 How to Use the CWW 2.0
    2.4 Dataset
        2.4.1 Dataset 1: The Big Dataset
        2.4.2 Dataset 2: Twenty-First Century Bordeaux Dataset
        2.4.3 Dataset 3: Twenty-First Century Elite Bordeaux
    References
3 Classification in Wineinformatics
    3.1 Classification and Classification Label
    3.2 Classification Algorithms
        3.2.1 Support Vector Machines (SVM)
        3.2.2 Naïve Bayes Classifier
    3.3 Classification Evaluations
        3.3.1 Fivefold Cross Validation
        3.3.2 Evaluation Metrics
    3.4 How Does a Wine Achieve 90+ Scores?
        3.4.1 Dataset 1: The Big Dataset Results
        3.4.2 Dataset 2: Twenty-First Century Bordeaux Dataset
        3.4.3 Dataset 3: Twenty-First Century Elite Bordeaux
    References
4 Evaluation of Wine Judges
    4.1 Wine Reviewers in Wine Spectator
    4.2 How to Evaluate Wine Reviewers
    4.3 Ranking of the Wine Reviewers Using Naïve Bayes Classifier
    4.4 Ranking of the Wine Reviewers Using SVM
    4.5 Comparison of Naïve Bayes and SVM
    References
5 Regression in Wineinformatics
    5.1 Regression Analysis Data
    5.2 Methods
        5.2.1 Support Vector Regression (SVR)
        5.2.2 Evaluation Metrics
    5.3 Regression Analysis
        5.3.1 Can Wine Grade Be Predicted Through Their Reviews?
        5.3.2 Can Wine Price Be Predicted Through Their Reviews?
    References
6 Multi-Class, Multi-Label and Multi-Target in Wineinformatics
    6.1 Multi-Class, Multi-Label and Multi-Target
    6.2 Wine Labels for Multi-Target Prediction
    6.3 How to Predict Multi-Label and Multi-Target Problems?
        6.3.1 Binary Relevance
        6.3.2 Label Powerset
        6.3.3 Classifier Chains
        6.3.4 Bayesian Classifier Chains and Conditional Dependency Networks
    6.4 Evaluation Metrics
    6.5 Implementation of Multi-Label and Multi-Target in Wineinformatics
        6.5.1 Apply Multi-Label on the Big Dataset
        6.5.2 Apply Multi-Target on the Big Dataset
    References
7 Advanced Usage of the Computational Wine Wheel
    7.1 Additional Information Captured by the Computational Wine Wheel
    7.2 Naïve Bayes Classification Algorithm for Mixed Data Type
    7.3 Does Additional Attributes Help?
        7.3.1 Case Study on the Twenty-First Century Bordeaux Dataset
        7.3.2 Case Study on the Twenty-First Century Elite Bordeaux Dataset
        7.3.3 The More Attributes, the Better?
    References
8 Conclusion and Future Works
    8.1 Conclusion
    8.2 Future Works
    References
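
The preface describes converting wine reviews written in human language into binary encodings through the Computational Wine Wheel. The preview does not show the wheel itself, so the following is only a minimal Python sketch of the general idea, assuming a tiny made-up vocabulary. The names TOY_WHEEL, ATTRIBUTES, and encode_review are hypothetical and are not taken from the book.

```python
import re
from typing import List

# Hypothetical miniature vocabulary mapping review phrases to normalized
# attributes. This is an illustration only, not the book's actual
# Computational Wine Wheel.
TOY_WHEEL = {
    "black cherry": "CHERRY",
    "cherry": "CHERRY",
    "blackberry": "BLACKBERRY",
    "vanilla": "VANILLA",
    "toasty oak": "OAK",
    "oak": "OAK",
    "firm tannins": "TANNINS_FIRM",
}

# Fixed, sorted attribute order so every review maps to a vector of equal length.
ATTRIBUTES = sorted(set(TOY_WHEEL.values()))


def encode_review(review: str) -> List[int]:
    """Return a 0/1 vector marking which normalized attributes appear in the review."""
    text = review.lower()
    found = set()
    for phrase, attribute in TOY_WHEEL.items():
        # Whole-word match so that "oak" does not fire inside "soaked".
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(attribute)
    return [1 if attribute in found else 0 for attribute in ATTRIBUTES]


if __name__ == "__main__":
    sample = "Ripe black cherry and vanilla notes, framed by toasty oak."
    print(ATTRIBUTES)
    print(encode_review(sample))
```

Several phrases map to one attribute here ("black cherry" and "cherry" both set CHERRY), which is one plausible way to normalize varied reviewer wording into a fixed attribute set that a classifier can consume.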
