
Event Streams in Action: Real-time event systems with Kafka and Kinesis PDF

343 pages · 2019 · 14.135 MB

Preview Event Streams in Action: Real-time event systems with Kafka and Kinesis

Event Streams in Action
Real-time event systems with Kafka and Kinesis
Alexander Dean
Valentin Crettaz
Manning, Shelter Island

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: [email protected]

©2019 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Acquisitions editors: Mike Stephens and Frank Pohlmann
Development editors: Jennifer Stout and Cynthia Kane
Technical development editor: Kostas Passadis
Review editor: Aleks Dragosavljević
Production editor: Anthony Calcara
Copy editor: Sharon Wilkey
Proofreader: Melody Dolab
Technical proofreader: Michiel Trimpe
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781617292347
Printed in the United States of America

brief contents

PART 1 EVENT STREAMS AND UNIFIED LOGS
1 Introducing event streams
2 The unified log
3 Event stream processing with Apache Kafka
4 Event stream processing with Amazon Kinesis
5 Stateful stream processing

PART 2 DATA ENGINEERING WITH STREAMS
6 Schemas
7 Archiving events
8 Railway-oriented processing
9 Commands

PART 3 EVENT ANALYTICS
10 Analytics-on-read
11 Analytics-on-write

contents

preface
acknowledgments
about this book
about the authors
about the cover illustration

PART 1 EVENT STREAMS AND UNIFIED LOGS

1 Introducing event streams
  1.1 Defining our terms
      Events
      Continuous event streams
  1.2 Exploring familiar event streams
      Application-level logging
      Web analytics
      Publish/subscribe messaging
  1.3 Unifying continuous event streams
      The classic era
      The hybrid era
      The unified era
  1.4 Introducing use cases for the unified log
      Customer feedback loops
      Holistic systems monitoring
      Hot-swapping data application versions

2 The unified log
  2.1 Understanding the anatomy of a unified log
      Unified
      Append-only
      Distributed
      Ordered
  2.2 Introducing our application
      Identifying our key events
      Unified log, e-commerce style
      Modeling our first event
  2.3 Setting up our unified log
      Downloading and installing Apache Kafka
      Creating our stream
      Sending and receiving events

3 Event stream processing with Apache Kafka
  3.1 Event stream processing 101
      Why process event streams?
      Single-event processing
      Multiple-event processing
  3.2 Designing our first stream-processing app
      Using Kafka as our company’s glue
      Locking down our requirements
  3.3 Writing a simple Kafka worker
      Setting up our development environment
      Configuring our application
      Reading from Kafka
      Writing to Kafka
      Stitching it all together
      Testing
  3.4 Writing a single-event processor
      Writing our event processor
      Updating our main function
      Testing, redux

4 Event stream processing with Amazon Kinesis
  4.1 Writing events to Kinesis
      Systems monitoring and the unified log
      Terminology differences from Kafka
      Setting up our stream
      Modeling our events
      Writing our agent
  4.2 Reading from Kinesis
      Kinesis frameworks and SDKs
      Reading events with the AWS CLI
      Monitoring our stream with boto

5 Stateful stream processing
  5.1 Detecting abandoned shopping carts
      What management wants
      Defining our algorithm
      Introducing our derived events stream
  5.2 Modeling our new events
      Shopper adds item to cart
      Shopper places order
      Shopper abandons cart
  5.3 Stateful stream processing
      Introducing state management
      Stream windowing
      Stream processing frameworks and their capabilities
      Stream processing frameworks
      Choosing a stream processing framework for Nile
  5.4 Detecting abandoned carts
      Designing our Samza job
      Preparing our project
      Configuring our job
      Writing our job’s Java task
  5.5 Running our Samza job
      Introducing YARN
      Submitting our job
      Testing our job
      Improving our job

PART 2 DATA ENGINEERING WITH STREAMS

6 Schemas
  6.1 An introduction to schemas
      Introducing Plum
      Event schemas as contracts
      Capabilities of schema technologies
      Some schema technologies
      Choosing a schema technology for Plum
  6.2 Modeling our event in Avro
      Setting up a development harness
      Writing our health check event schema
      From Avro to Java, and back again
      Testing
  6.3 Associating events with their schemas
      Some modest proposals
      A self-describing event for Plum
      Plum’s schema registry

7 Archiving events
  7.1 The archivist’s manifesto
      Resilience
      Reprocessing
      Refinement
  7.2 A design for archiving
      What to archive
      Where to archive
      How to archive
  7.3 Archiving Kafka with Secor
      Warming up Kafka
      Creating our event archive
      Setting up Secor
