Table of Contents

MINING SOCIAL MEDIA
Finding Stories in Internet Data
by Lam Thuy Vo
San Francisco

MINING SOCIAL MEDIA. Copyright © 2020 by Lam Thuy Vo.

Some rights reserved. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

ISBN-10: 1-59327-916-7
ISBN-13: 978-1-59327-916-5

Publisher: William Pollock
Production Editor: Meg Sneeringer
Cover Illustration: Gina Redman
Developmental Editors: Jan Cash and Alex Freed
Technical Reviewer: Melissa Lewis
Copyeditor: Rachel Monaghan
Compositor: Danielle Foster
Proofreader: Emelie Burnette
Indexer: Beth Nauman-Montana

For information on distribution, translations, or bulk sales, please contact No Starch Press, Inc. directly:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 1.415.863.9900; [email protected]
www.nostarch.com

Library of Congress Cataloging-in-Publication Data:
Names: Vo, Lam Thuy, author.
Title: Mining social media : finding stories in Internet data / Lam Thuy Vo.
Description: San Francisco : No Starch Press, Inc., 2019. | Includes bibliographical references and index.
Identifiers: LCCN 2019030568 (print) | LCCN 2019030569 (ebook) | ISBN 9781593279165 (paperback) | ISBN 9781593279172 (ebook)
Subjects: LCSH: Social sciences--Research--Methodology. | Internet research. | Data mining. | Social media--Research. | Quantitative research. | Qualitative research.
Classification: LCC H61.95 .V63 2019 (print) | LCC H61.95 (ebook) | DDC 302.23/1072--dc23
LC record available at https://lccn.loc.gov/2019030568
LC ebook record available at https://lccn.loc.gov/2019030569

No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners.
Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The information in this book is distributed on an “As Is” basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.

To má Lua, ba Liem, and anh Luan

About the Author

Lam Thuy Vo is a senior reporter at BuzzFeed News where her area of expertise is the intersection of technology, society, and social media data, and where she covers the spread of misinformation, hatred online, and platform-related accountability. Previously, she led teams and reported for The Wall Street Journal, Al Jazeera America, and NPR’s Planet Money, telling economic stories across the US and throughout Asia. She has also worked as an educator for a decade, developing newsroom-wide training programs, workshops for journalists around the world, and semester-long courses for the Craig Newmark CUNY Graduate School of Journalism. She has also spoken at Pop-Up Magazine, the Tribeca Film Festival’s Interactive Day, and TEDxNYC, among other larger events.

About the Technical Reviewer

Melissa Lewis is a data reporter for Reveal from The Center for Investigative Reporting. Prior to joining Reveal, she was a data editor at The Oregonian, a data engineer at Simple, a data analyst at Periscopic, and a neuroscience research assistant at Oregon Health & Science University. She is an organizer for PyLadies Portland and the Portland chapter of the Asian American Journalists Association.
BRIEF CONTENTS

Acknowledgments
Introduction
Part I: Data Mining
  Chapter 1: The Programming Languages You’ll Need to Know
  Chapter 2: Where to Get Your Data
  Chapter 3: Getting Data with Code
  Chapter 4: Scraping Your Own Facebook Data
  Chapter 5: Scraping a Live Site
Part II: Data Analysis
  Chapter 6: Introduction to Data Analysis
  Chapter 7: Visualizing Your Data
  Chapter 8: Advanced Tools for Data Analysis
  Chapter 9: Finding Trends in Reddit Data
  Chapter 10: Measuring the Twitter Activity of Political Actors
  Chapter 11: Where to Go from Here
Index

CONTENTS IN DETAIL

Acknowledgments

Introduction
  What Is Data Analysis?
  Who Is This Book For?
  Conventions Used in This Book
  What This Book Covers
  Part I: Data Mining
  Part II: Data Analysis
  Downloading and Installing Python
  Installing on Windows
  Installing on macOS
  Getting Help When You’re Stuck
  Summary

PART I: DATA MINING

1 THE PROGRAMMING LANGUAGES YOU’LL NEED TO KNOW
  Frontend Languages
  How HTML Works
  How CSS Works
  How JavaScript Works
  Backend Languages
  Using Python
  Getting Started with Python
  Working with Numbers
  Working with Strings
  Storing Values in Variables
  Storing Multiple Values in Lists
  Working with Functions
  Creating Your Own Functions
  Using Loops
  Using Conditionals
  Summary

2 WHERE TO GET YOUR DATA
  What Is an API?
  Using an API to Get Data
  Getting a YouTube API Key
  Retrieving JSON Objects Using Your Credentials
  Answering a Research Question Using Data
  Refining the Data That Your API Returns
  Summary

3 GETTING DATA WITH CODE
  Writing Your First Script
  Running a Script
  Planning Out a Script
  Libraries and pip
  Creating a URL-based API Call
  Storing Data in a Spreadsheet
  Converting JSON into a Dictionary
  Going Back to the Script
  Running the Finished Script
  Dealing with API Pagination
  Templates: How to Make Your Code Reusable
  Storing Values That Change in Variables
  Storing Code in a Reusable Function
  Summary

4 SCRAPING YOUR OWN FACEBOOK DATA
  Your Data Sources
  Downloading Your Facebook Data
  Reviewing the Data and Inspecting the Code
  Structuring Information as Data
  Scraping Automatically
  Analyzing HTML Code to Recognize Patterns
  Grabbing the Elements You Need
  Extracting the Contents