Data Analysis with Machine Learning for Psychologists: Crash Course to Learn Python 3 and Machine Learning in 10 hours

169 Pages·2022·9.285 MB·English

Data Analysis with Machine Learning for Psychologists
Crash Course to Learn Python 3 and Machine Learning in 10 hours

Chandril Ghosh
Department of Psychology
Bath Spa University
Newton Park Campus, Bath, UK

ISBN 978-3-031-14633-6
ISBN 978-3-031-14634-3 (eBook)
https://doi.org/10.1007/978-3-031-14634-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

This book is dedicated to the Psychologists of the future.

Acknowledgement

This book is a collection of ideas from the fascinating field of machine learning. I am thankful to all those researchers and practitioners of machine learning who conducted the research that I have picked and introduced in this book. Without their work, this book would not have been possible. There are also several anonymous authors on websites like Stack Overflow whom I could not cite, but their advice, ideas and code continue to inspire me and this book. I must acknowledge their contribution too.

I have also been fortunate to receive insights and feedback from my colleagues and students. In particular, I would like to thank Mr Richard Joltes, Mr Varun Thakre and Prof Deepak Padmanabhan for reviewing and providing feedback on various chapters of the book. I should thank my parents Chandan and Papiya Ghosh, and my partner Madhumanti Das, for their support and faith in me. Finally, I must value and appreciate the constant support of the Project Coordinator Ms. Olivia Ramya Chitranjan, (ex) Senior Editor Lilith Dorko and the team at Springer. They have helped me make this book a reality.

Contents

1 Introduction
  1.1 Before Getting Started
    1.1.1 Overview (of Old Ways to Analyse Data and Some Problems Related to Them)
    1.1.2 Who Am I?
    1.1.3 How Did I Get There?
    1.1.4 Who Is This Book For?
    1.1.5 Who Is This Book NOT For?
    1.1.6 What’s Special You Get in This Book?
    1.1.7 So, What Does This Book Have?
    1.1.8 How to Best Make Use of This Book?
  1.2 Types of Research Studies
    1.2.1 Explanatory Research
    1.2.2 Predictive Research
    1.2.3 Exploratory Research
  1.3 Data
    1.3.1 To Collect or Not Collect Your Own Data
    1.3.2 Where to Get the Data From?
    1.3.3 Ways in Which Data Is Divided
    1.3.4 Five Lessons
  1.4 Statistics: A Refresher Before Getting into Machine Learning
  References
2 Python Programming
  2.1 But Do I Have to Learn to Code for Data Analysis?
  2.2 How to Install Python?
  2.3 Variables
  2.4 Operators
    2.4.1 Arithmetic Operators
    2.4.2 Comparison Operators
  2.5 Statements
  2.6 Loops
  2.7 Data Structure
  2.8 Methods and Functions (Built-Ins) in Python
    2.8.1 Methods
    2.8.2 Function
  2.9 Error Resolution
  2.10 Last Words
3 Data Pre-processing
  3.1 Introduction
  3.2 Data Cleaning
    3.2.1 Problem 1: Duplicate Columns and Categorical Variables
    3.2.2 Problem 2: Outliers
    3.2.3 Problem 3: Missing Values
  3.3 Data Transformation
    3.3.1 Converting Categorical Variables into Numeric Variables
    3.3.2 Converting Continuous Variables into Categorical Variables
    3.3.3 Feature Scaling
  3.4 Data Reduction
    3.4.1 Strategy 1
    3.4.2 Strategy 2
    3.4.3 Strategy 3
    3.4.4 Strategy 4
    3.4.5 Strategy 5
  3.5 Final Words
  References
4 Machine Learning
  4.1 Introduction
  4.2 Classification
    4.2.1 Getting Started with Supervised Machine Learning
    4.2.2 Machine Learning (Classifier): The Leak-Proof Approach
    4.2.3 Confidence Interval
    4.2.4 Choosing the Best Model for Classification
    4.2.5 Optimising the Predictive Accuracies of the Model with Hyperparameter Tuning
  4.3 Regression
    4.3.1 Regression Using Machine Learning and How to Interpret the Results
    4.3.2 Feature Importance
    4.3.3 Exploratory Research Using Unsupervised Machine Learning
  4.4 Clustering
    4.4.1 Hierarchical Clustering
    4.4.2 K-Means Clustering
  4.5 Principal Component Analysis (PCA)
  4.6 Rule Mining
  References
5 End Note
Index

About the Author

Chandril Ghosh is a UK-based chartered psychologist and is currently working as a Lecturer in Clinical/Counselling Psychology at Bath Spa University. He completed his BSc in psychology (with honours) and MSc in Clinical Psychology in India. After completing his MSc, Ghosh began to study machine learning and Python programming through books and online materials on the subject. He had no background or prior experience with coding or computer science back then. During his doctoral studies, he used this knowledge to apply machine learning techniques to the exploration of psychopathology. You will find his work published in the journal BMC Psychiatry. Around the same time, he was hired multiple times to design and deliver a crash course on Python 3 and machine learning for postgraduate students at Queen’s University Belfast. Furthermore, he continues to run online courses on the subject outside the University and teaches students from about 56 countries. This book is the product of his hundreds of hours of teaching and of feedback from students with social science backgrounds.
