Spark™: Big Data Cluster Computing in Production

Ilya Ganelin
Ema Orhian
Kai Sasaki
Brennon York

Published by
John Wiley & Sons, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com

Copyright © 2016 by John Wiley & Sons, Inc., Indianapolis, Indiana

Published simultaneously in Canada

ISBN: 978-1-119-25401-0
ISBN: 978-1-119-25404-1 (ebk)
ISBN: 978-1-119-25405-8 (ebk)

Manufactured in the United States of America

10 9 8 7 6 5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2016932284

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. Spark is a trademark of The Apache Software Foundation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

About the Authors

Ilya Ganelin is a roboticist turned data engineer. After a few years at the University of Michigan building self‐discovering robots and another few years working on embedded DSP software with cell phones and radios at Boeing, he landed in the world of Big Data at the Capital One Data Innovation Lab. Ilya is an active contributor to the core components of Apache Spark and a committer to Apache Apex, with the goal of learning what it takes to build a next‐generation distributed computing platform. Ilya is an avid bread maker and cook, skier, and race‐car driver.

Ema Orhian is a passionate Big Data Engineer interested in scaling algorithms. She is actively involved in the Big Data community, organizing and speaking at conferences, and contributing to open source projects. She is the main committer on jaws‐spark‐sql‐rest, a data warehouse explorer on top of Spark SQL. Ema has been working on bringing Big Data analytics into healthcare, developing an end‐to‐end pipeline for computing statistical metrics on top of large datasets.

Kai Sasaki is a Japanese software engineer who is interested in distributed computing and machine learning. Although the beginning of his career didn’t start with Hadoop or Spark, his original interest toward middleware and fundamental technologies that support a lot of these services and the Internet drives him toward this field. He has been a Spark contributor who develops mainly MLlib and ML libraries. Nowadays, he is trying to research the great potential of combining deep learning and Big Data. He believes that Spark can play a significant role even in artificial intelligence in the Big Data era. GitHub: https://github.com/Lewuathe.

Brennon York is an aerobatic pilot moonlighting as a computer scientist. His true loves are distributed computing, scalable architectures, and programming languages. He has been a core contributor to Apache Spark since 2014 with the goal of developing a stronger community and inspiring collaboration through development on GraphX and the core build environment. He has had a relationship with Spark since his contributions began and has been taking applications into production with the framework since that time.

About the Technical Editors

Ted Yu is a Staff Engineer at HortonWorks. He is also an HBase PMC and Spark contributor and has been using/contributing to Spark for more than one year.

Dan Osipov is a Principal Consultant at Applicative, LLC. He has been working with Spark for the last two years, and has been working in Scala for about four years, primarily with data tools and applications. Previously he was involved in mobile development and content management systems.

Jeff Thompson is a neuro‐scientist turned data scientist with a PhD from UC Berkeley in vision science (primarily neuroscience and brain imaging), and a post‐doc at Boston University’s bio‐medical imaging center. He has spent a few years working at a homeland security startup as an algorithms engineer building next‐gen cargo screening systems. For the last two years he has been a senior data scientist at Bosch, a global engineering and manufacturing company.

Anant Asthana is a Big Data consultant and Data Scientist at Pythian. He has a background in device drivers and high availability/critical load database systems.

Bernardo Palacio Gomez is a Consulting Member of the Technical Staff at Oracle on the Big Data Cloud Service Team.

Gaspar Munoz works for Stratio (http://www.stratio.com) as a product architect. Stratio was the first Big Data platform based on Spark, so he has worked with Spark since it was in the incubator. He has put into production several projects using Spark core, Streaming, and SQL for some of the most important banks in Spain. He has also contributed to Spark and the spark‐csv projects.

Brian Gawalt received a Ph.D. in electrical engineering from UC Berkeley in 2012. Since then he has been working in Silicon Valley as a data scientist, specializing in machine learning over large datasets.

Adamos Loizou is a Java/Scala Developer at OVO Energy.

Spark: Big Data Cluster Computing in Production PDF

219 Pages·2016·4.884 MB·English

by Ilya Ganelin, Ema Orhian, Kai Sasaki, Brennon York

Detailed Information

Author:	Ilya Ganelin, Ema Orhian, Kai Sasaki, Brennon York
Publication Year:	2016
ISBN:	9781119254010
Pages:	219
Language:	English
File Size:	4.884 MB
Format:	PDF