Google BigQuery Analytics PDF

530 Pages·2014·8.22 MB·English
by Jordan Tigani, Siddartha Naidu

Preview Google BigQuery Analytics

Google® BigQuery Analytics
Jordan Tigani
Siddartha Naidu

Published by
John Wiley & Sons, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com

Copyright © 2014 by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-1-118-82482-5
ISBN: 978-1-118-82487-0 (ebk)
ISBN: 978-1-118-82479-5 (ebk)
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought.
Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2014931958

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. Google is a registered trademark of Google, Inc. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
Executive Editor: Robert Elliott
Project Editors: Tom Dinse, Kevin Kent
Technical Editor: Jeremy Condit
Production Editor: Christine Mugnolo
Copy Editor: San Dee Phillips
Manager of Content Development and Assembly: Mary Beth Wakefield
Director of Community Marketing: David Mayhew
Marketing Manager: Lorna Mein
Business Manager: Amy Knies
Vice President and Executive Group Publisher: Richard Swadley
Associate Publisher: Jim Minatel
Project Coordinator, Cover: Todd Klemme
Proofreader: Nancy Carrasco
Technical Proofreader: Bruce Chhay
Indexer: Robert Swanson
Cover Design and Image: Wiley

About the Authors

Jordan Tigani has more than 15 years of professional software development experience, the last 4 of which have been spent building BigQuery. Prior to joining Google, Jordan worked at a number of star-crossed startups. The startup experience made him realize that you don’t need to be a big company to have Big Data. Other past jobs have been in Microsoft Research and the Windows kernel team. When not writing code, Jordan is usually either running or playing soccer. He lives in Seattle with his wife, Tegan, where they both can walk to work.

Siddartha Naidu joined Google after finishing his doctorate degree in Physics. At Google he has worked on Ad targeting, newspaper digitization, and for the past 4 years on building BigQuery. Most of his work at Google has revolved around data; analyzing it, modeling it, and manipulating large amounts of it. When he is not working on SQL recipes, he enjoys inventing and trying the kitchen variety. He currently lives in Seattle with his wife, Nitya, and son, Vivaan, who are the subjects of his kitchen experiments, and when they are not traveling, they are planning where to travel to next.
About the Technical Editor

Jeremy Condit is one of the founding engineers of the BigQuery project at Google, where he has contributed to the design and implementation of BigQuery's API, query engine, and client tools. Prior to joining Google in 2010, he was a researcher in computer science, focusing on programming languages and operating systems, and he has published and presented his research in a number of ACM and Usenix conferences. Jeremy has a bachelor's degree in computer science from Harvard and a Ph.D. in computer science from U.C. Berkeley.

About the Technical Proofreader

Bruce Chhay is an engineer on the Google BigQuery team. Previously he was at Microsoft, working on large-scale data analytics such as Windows error reporting and Windows usage telemetry. He also spent time as co-founder of a startup. He has a BE in computer engineering and MBA from the University of Washington.

Acknowledgments

First, we would like to thank the Dremel and BigQuery teams for building and running a service worth writing about. The last four years since the offsite at Barry’s house, where we decided we weren’t going to build what management suggested but were going to build BigQuery instead, have been an exciting time. More generally, thanks to the Google tech infrastructure group that is home to many amazing people and projects. These are the type of people who say, “Only a petabyte?” and don’t mean it ironically. It is always a pleasure to come to work.

There were a number of people who made this book possible: Robert Elliott, who approached us about writing the book and conveniently didn’t mention how much work would be involved; and Kevin Kent, Tom Dinse, and others from Wiley who helped shepherd us through the process. A very special thank you to our tech editor and colleague Jeremy Condit who showed us he can review a book just as carefully as he reviews code.
Readers should thank him as well, because the book has been much improved by his suggestions. Other well-deserved thanks go to Bruce Chhay, another BigQuery team member, who volunteered on short notice to handle the final edit. Jing Jing Long, one of the inventors of Dremel, read portions of the book to make sure our descriptions at least came close to matching his implementation. Craig Citro provided moral support with the Python programming language. And we’d like to thank the BigQuery users, whose feedback, suggestions, and even complaints have made BigQuery a better product.

- The Authors

It has been a great experience working on this project with Siddartha; he’s one of the best engineers I’ve worked with, and his technical judgment has formed the backbone of this book. I’d like to thank my parents, who helped inspire the Shakespeare examples, and my wife, Tegan, who inspires me in innumerable other ways. Tegan also lent us her editing skills, improving clarity and making sure I didn’t make too many embarrassing mistakes. Finally, I’d like to thank the Google Cafe staff, who provided much of the raw material for this book.

- Jordan Tigani

When I was getting started on this project, I was excited to have Jordan as my collaborator. In retrospect, it would have been impossible without him. His productivity can be a bit daunting, but it comes in handy when you need to slack off. I would like to thank my wife, Nitya, for helping me take on this project in addition to my day job. She had to work hard at keeping Vivaan occupied, who otherwise was my excuse for procrastinating. Lastly, I want to thank my parents for their tireless encouragement.
- Siddartha Naidu

Contents

Introduction

Part I BigQuery Fundamentals

Chapter 1 The Story of Big Data at Google
  Big Data Stack 1.0
  Big Data Stack 2.0 (and Beyond)
  Open Source Stack
  Google Cloud Platform
  Cloud Processing
  Cloud Storage
  Cloud Analytics
  Problem Statement
  What Is Big Data?
  Why Big Data?
  Why Do You Need New Ways to Process Big Data?
  How Can You Read a Terabyte in a Second?
  What about MapReduce?
  How Can You Ask Questions of Your Big Data and Quickly Get Answers?
  Summary

Chapter 2 BigQuery Fundamentals
  What Is BigQuery?
  SQL Queries over Big Data
  Cloud Storage System
  Distributed Cloud Computing
  Analytics as a Service (AaaS?)
  What BigQuery Isn’t
  BigQuery Technology Stack
  Google Cloud Platform
  BigQuery Service History
  BigQuery Sensors Application
  Sensor Client Android App
  BigQuery Sensors AppEngine App
  Running Ad-Hoc Queries
  Summary

Chapter 3 Getting Started with BigQuery
  Creating a Project
  Google APIs Console
  Free Tier Limitations and Billing
  Running Your First Query
  Loading Data
  Using the Command-Line Client
  Install and Setup
  Using the Client
  Service Account Access
  Setting Up Google Cloud Storage
  Development Environment
  Python Libraries
  Java Libraries
  Additional Tools
  Summary

Chapter 4 Understanding the BigQuery Object Model
  Projects
  Project Names
  Project Billing
  Project Access Control
  Projects and AppEngine
  BigQuery Data
  Naming in BigQuery
  Schemas
  Tables
  Datasets
  Jobs
  Job Components
  BigQuery Billing and Quotas
  Storage Costs
  Processing Costs
  Query RPCs
  TableData.insertAll() RPCs
  Data Model for End-to-End Application
  Project
  Datasets
  Tables
  Summary

Description:
How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets. Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. Th