Mastering Unlabeled Data (MEAP V5) PDF

315 Pages·2023·3.982 MB·English

Preview Mastering Unlabeled Data (MEAP V5)

Mastering Unlabeled Data MEAP V05

Copyright 2022 Manning Publications
welcome
1 Introduction to machine learning
2 Clustering techniques
3 Dimensionality reduction
4 Association rules
5 Clustering (advanced)
6 Dimensionality reduction (advanced)
7 Unsupervised learning for text data

MEAP Edition
Manning Early Access Program
Mastering Unlabeled Data
Version 5
Copyright 2022 Manning Publications

©Manning Publications Co. To comment go to liveBook https://livebook.manning.com/#!/book/mastering-unlabeled-data/discussion

For more information on this and other Manning titles go to manning.com

welcome

Thank you for purchasing the MEAP for Mastering Unlabeled Data.

It is a cliché in today’s world: “Data is the new oil and electricity.” Businesses need to analyze patterns and trends in data, detect anomalies, reduce the complexity of very high-dimensional datasets and then make sound decisions. There is an ever-growing need to draw meaningful insights, which are sometimes quite difficult for us to find unaided. This book is an attempt to equip you with unsupervised learning techniques that perform the complex modelling for you.

Throughout the book, we introduce each algorithm, examine its mathematical and statistical foundations and study its various forms and types. This book is a step towards bridging the gap between complex mathematical and statistical concepts and pragmatic real-world case studies. Real-world cases from retail, telecommunication, banking, manufacturing and other sectors are discussed at length, and Python implementations on the datasets complete the picture.

We explore clustering methods, dimensionality reduction methods and advanced concepts of machine learning. You will also work on text and images along with structured datasets. We examine the best practices to be followed and the common issues faced. We also deal with end-to-end model development, including model deployment.
You will develop a thorough knowledge and understanding of algorithms based on unsupervised learning.

This book is for both budding and experienced data analysts and data scientists. Its target readers include a researcher or student who wishes to explore unsupervised algorithms, and a manager who wants to feel confident when talking to clients. A curious mind and the attitude to conquer a mountain are required.

To get the most benefit from this book, you’ll need a basic understanding of Python and Jupyter Notebook. Some basic understanding of data and data science might prove handy. We work on real-world datasets that are freely available online.

Please let me know your thoughts in the liveBook Discussion forum on what's been written so far and what you'd like to see in the rest of the book. Your feedback will be invaluable in improving Mastering Unlabeled Data.

Thanks again for your interest and for purchasing the MEAP!

Vaibhav Verdhan

1 Introduction to machine learning

“There are only patterns, patterns on top of patterns, patterns that affect other patterns. Patterns hidden by patterns. Patterns within patterns.”
- Chuck Palahniuk

We love patterns. Be it our business or our life, we find patterns and (generally) tend to stick to them. We have our preferences in the groceries we buy, the telecom operators and calling packs we use, the news articles we follow, and the movie genres and audio tracks we like. These are all examples of patterns in our preferences. We love patterns, and more than patterns we love finding them, arranging them and maybe getting used to them!

Then there is the cliché: “Data is the new electricity”. Data is indeed precious; nobody can deny that.
But data in its raw form is of little use. We have to clean the data, analyse and visualise it, and only then can we develop insights from it. Data science, machine learning and artificial intelligence help us uncover these patterns, so that we can take more insightful and balanced decisions in our activities and business.

In this book, we are going to solve some of these mysteries. We will study a branch of machine learning referred to as unsupervised learning. Unsupervised learning solutions are among the most influential approaches changing the face of the industry. They are used in banking and finance, retail, insurance, manufacturing, aviation, medical sciences, telecom and almost every other sector.

Throughout the book, we discuss the concepts of unsupervised learning: the building blocks of the algorithms, their nuts and bolts, background processes and mathematical foundations. The concepts are examined, best practices are studied, common errors and pitfalls are analysed, and a case-study-based approach complements the learning. At the same time, we develop actual Python code for solving such problems. All the code is accompanied by step-by-step explanations and comments.

This book is divided into three parts. The first part explores the basics of unsupervised learning and covers easier concepts such as k-means clustering, hierarchical clustering and principal component analysis. This part gently prepares you for the journey ahead. If you are already well versed in these topics, you can start directly with the second part of the book, though it is advisable to give the chapters a quick read to refresh the concepts.

The second part is at an intermediate level. We start with association rule algorithms such as Apriori, ECLAT and sequence rule mining. We then increase the pace and study more complex algorithms and concepts: spectral clustering, GMM clustering, t-SNE, multidimensional scaling (MDS) and others.
We then work on text data in the next chapter.

The third and final part is advanced. We discuss complex topics such as the Restricted Boltzmann Machine, autoencoders and GANs. In the last chapter of the book, we also examine end-to-end model development, including model deployment, best practices and common pitfalls.

By the time you finish this book, you will have a very good understanding of unsupervised machine learning: the various algorithms, the mathematical and statistical foundations on which they rest, business use cases, Python implementation and the best practices followed.

This book is suitable for students and researchers who want to develop an in-depth understanding of unsupervised learning algorithms. It is recommended for professionals pursuing data science careers who wish to learn the best practices followed and the solutions to common challenges. The content is well suited for managers and leaders who want to be confident while communicating with teams and clients. Above all, it suits a curious person who wants to learn about unsupervised learning algorithms and gain Python experience by solving the case studies.

It is advisable that you have a basic understanding of programming in object-oriented languages like C++, Java, Objective-C etc. We are going to use Python throughout the book, so experience with Python will surely help. A basic understanding of mathematics and geometry will help in visualising the results, and some knowledge of data-related use cases will help you relate to the business use cases. Most important of all, an open mindset to absorb knowledge is necessary throughout the chapters of the book.

The first chapter is designed to introduce the concepts of machine learning to you. In this opening chapter, we are going to cover the following topics:

1. Data, data types, data management and quality
2. Data Analysis, Machine Learning, Artificial Intelligence and Deep Learning
3. Nuts and bolts of Machine Learning
4. Different types of machine learning algorithms
5. Technical tool kit available
6. Summary

Let’s first understand the smallest grain we have, “data”, as the first topic. Welcome to the first chapter and all the very best!

1.1 Data, data types, data management and quality

We start with the protagonist of everything: “data”. Data can be described as facts and statistics that are collected for performing any kind of analysis or study. But data has its own traits, attributes, quality measures and management principles. It is stored, exported, loaded, transformed and measured.

We are going to study all of this now, starting with the definition of data. We will then proceed to the different types of data, their respective examples and the attributes that make data useful and of good quality.

1.1.1 What is Data

“DATA” is ubiquitous. When you make a phone call using a mobile network, you are generating data. When you book a flight ticket and a hotel for an upcoming vacation, data is being created. Making a bank transaction, surfing social media and shopping websites, buying an insurance policy or buying a car: data originates everywhere. It is transformed from one form to another, stored, cleaned, managed and analysed.

Formally put, data is a collection of facts, observations, measures, text, numbers, images and videos. It might be clean or unclean, ordered or unordered, of mixed data types or completely pure, and historical or real-time.

Figure 1-1 How we can transform raw data to become information, knowledge and finally insights which can be used in business to drive decisions and actions

Data in itself is not useful until we clean it, arrange it, analyse it and draw insights from it. We can visualise the transition in Figure 1-1. Raw data is converted to information when we are able to find distinctions in it. When we relate the terms and “connect the dots”, the same piece of information becomes knowledge.
Insight is the stage at which we are able to find the major centres and significant points. An insight has to be actionable, succinct and direct. For example, if the customer retention team of a telecom operator is told that customers who do not make a call for 9 days have a 30% higher chance of churning than those who do, that is a useful insight on which they can act. Similarly, if a line technician in a manufacturing plant is informed that using Mould ABC together with Mould PQR results in 60% more defects, they will refrain from using this combination. Such an insight is useful to a business team because it lets them take corrective measures.

We now know what data is. Let us study the various types of data and their attributes, and go deeper into data.

1.1.2 Various types of Data

Data is generated across all the transactions we make, whether online or offline, as we discussed at the start of the section. We can broadly classify the data as shown in Figure 1-2.
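
Before moving on, the telecom churn insight from section 1.1.1 can be made concrete with a few lines of Python. What follows is only a minimal sketch, not the book's implementation: the Customer record, the toy data and the function names are all hypothetical, and it assumes that "30% more chances of churn" means a 30% relative increase in churn rate for customers who have gone 9 or more days without a call.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical customer record, invented for this illustration."""
    customer_id: str
    days_since_last_call: int
    churned: bool


def churn_rate(customers):
    """Fraction of the given customers who churned (0.0 for an empty group)."""
    customers = list(customers)
    if not customers:
        return 0.0
    return sum(c.churned for c in customers) / len(customers)


def relative_churn_lift(customers, inactivity_days=9):
    """Relative increase in churn rate for inactive versus active customers.

    A customer counts as inactive when they have not made a call for
    `inactivity_days` days or more (an assumed reading of the text).
    A return value of 0.30 means "30% more likely to churn".
    """
    inactive = [c for c in customers if c.days_since_last_call >= inactivity_days]
    active = [c for c in customers if c.days_since_last_call < inactivity_days]
    active_rate = churn_rate(active)
    if active_rate == 0:
        raise ValueError("no churn among active customers; lift is undefined")
    return churn_rate(inactive) / active_rate - 1.0


# Toy data, made up for the example: 20 active customers of whom 10 churn,
# and 20 inactive customers of whom 13 churn.
customers = (
    [Customer(f"A{i}", days_since_last_call=2, churned=i < 10) for i in range(20)]
    + [Customer(f"I{i}", days_since_last_call=12, churned=i < 13) for i in range(20)]
)

lift = relative_churn_lift(customers)
print(f"Inactive customers are {lift:.0%} more likely to churn")
```

The raw records are the data, the two churn rates are the information, and the single comparison between them is the kind of succinct, actionable statement the retention team can use.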
