Data Mashups in R: A Case Study in Real-World Data Analysis

38 Pages·2011·4.931 MB·English

Preview of Data Mashups in R: A Case Study in Real-World Data Analysis

Data Mashups in R
Jeremy Leipzig and Xiao-Yi Li

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

Data Mashups in R
by Jeremy Leipzig and Xiao-Yi Li

Copyright © 2011 Jeremy Leipzig and Xiao-Yi Li. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or [email protected].

Editor: Mike Loukides
Production Editor: Kristen Borg
Proofreader: Kristen Borg
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History:
March 2011: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Data Mashups in R, the image of a black-billed Australian bustard, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-30353-2
[LSI]
1299253461

Table of Contents

Introduction

1. Mapping Foreclosures
    Messy Address Parsing
    Exploring “streets”
    Obtaining Latitude and Longitude Using Yahoo
    Shaking the XML Tree
    The Many Ways to Philly (Latitude)
    Using Data Structures
    Using Helper Methods
    Using Internal Class Methods
    Exceptional Circumstances
    The Unmappable Fake Street
    No Connection
    Taking Shape
    Finding a Usable Map
    PBSmapping
    Developing the Plot
    Preparing to Add Points to Our Map
    Exploring R Data Structures: geoTable
    Making Events of Our Foreclosures
    Turning Up the Heat
    Factors When You Need Them
    Filling with Color Gradients

2. Statistics of Foreclosure
    Importing Census Data
    Descriptive Statistics
    Descriptive Plots
    Correlation
    Final Thoughts

Appendix: Getting Started

Introduction

Programmers may spend a good part of their careers scripting code to conform to commercial statistics packages, visualization tools, and domain-specific third-party software. The same tasks can force end users to spend countless hours in copy-paste purgatory, each minor change necessitating another grueling round of formatting tabs and screenshots. Luckily, R scripting offers some reprieve. Because this open source project garners the support of a large community of package developers, the R statistical programming environment provides an amazing level of extensibility. Data from a multitude of sources can be imported into R and processed using R packages to aid statistical analysis and visualization. R scripts can also be configured to produce high-quality reports in an automated fashion, saving time, energy, and frustration. This book will demonstrate how real-world data is imported, managed, visualized, and analyzed within R.
Spatial mashups provide an excellent way to explore the capabilities of R, encompassing R packages, R syntax, and data structures. Instead of canned sample data, we will be plotting and analyzing actual current home foreclosure auctions. Through this exercise, we hope to provide a general idea of how the R environment works with R packages as well as its own capabilities in statistical analysis. We will be accessing spatial data in several formats (HTML, XML, shapefiles, and text), both locally and over the web, to produce a map of home foreclosures and perform statistical analysis on these events.
