
Streaming Data: Understanding the real-time pipeline PDF

219 Pages·2017·1.55 MB·English
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Streaming Data: Understanding the real-time pipeline

Streaming Data: Understanding the real-time pipeline
Andrew G. Psaltis
Manning, Shelter Island

[Figure: The streaming data architectural blueprint. Clients (browser, device, vending machine, etc.); collection tier; message queuing tier; analysis tier; in-memory data store; data access tier; long-term storage. Callout: "Sometimes we need to reach back to get data that has just been analyzed. We will not be covering this in detail. But you may want to persist analyzed data for future use."]

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: [email protected]

©2017 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Development editor: Karen Miller
Technical development editor: Gregor Zurowski
Project editor: Janet Vail
Copyeditor: Corbin Collins
Proofreader: Elizabeth Martin
Technical proofreader: Al Krinker
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781617292286
Printed in the United States of America

Brief contents

PART 1 A NEW HOLISTIC APPROACH
1 Introducing streaming data
2 Getting data from clients: data ingestion
3 Transporting the data from collection tier: decoupling the data pipeline
4 Analyzing streaming data
5 Algorithms for data analysis
6 Storing the analyzed or collected data
7 Making the data available
8 Consumer device capabilities and limitations accessing the data
PART 2 TAKING IT REAL WORLD
9 Analyzing Meetup RSVPs in real time

Contents

preface
acknowledgments
about this book
PART 1 A NEW HOLISTIC APPROACH
1 Introducing streaming data
1.1 What is a real-time system?
1.2 Differences between real-time and streaming systems
1.3 The architectural blueprint
1.4 Security for streaming systems
1.5 How do we scale?
1.6 Summary
2 Getting data from clients: data ingestion
2.1 Common interaction patterns
Request/response pattern ■ Request/acknowledge pattern ■ Publish/subscribe pattern ■ One-way pattern ■ Stream pattern
2.2 Scaling the interaction patterns
Request/response optional pattern ■ Scaling the stream pattern
2.3 Fault tolerance
Receiver-based message logging ■ Sender-based message logging ■ Hybrid message logging
2.4 A dose of reality
2.5 Summary
3 Transporting the data from collection tier: decoupling the data pipeline
3.1 Why we need a message queuing tier
3.2 Core concepts
The producer, the broker, and the consumer ■ Isolating producers from consumers ■ Durable messaging ■ Message delivery semantics
3.3 Security
3.4 Fault tolerance
3.5 Applying the core concepts to business problems
3.6 Summary
4 Analyzing streaming data
4.1 Understanding in-flight data analysis
4.2 Distributed stream-processing architecture
4.3 Key features of stream-processing frameworks
Message delivery semantics
4.4 Summary
5 Algorithms for data analysis
5.1 Accepting constraints and relaxing
5.2 Thinking about time
Sliding window ■ Tumbling window
5.3 Summarization techniques
Random sampling ■ Counting distinct elements ■ Frequency ■ Membership
5.4 Summary

Description:
Summary
Streaming Data introduces the concepts and requirements of streaming and real-time data systems. The book is an idea-rich tutorial that teaches you to think about how to efficiently interact with fast-flowing data. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub for
