Scala High Performance Programming

Leverage Scala and the functional paradigm to build performant software

Vincent Theron
Michael Diamant

BIRMINGHAM - MUMBAI

Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: May 2016
Production reference: 1250516

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78646-604-4

www.packtpub.com

Credits

Authors: Vincent Theron, Michael Diamant
Reviewer: Nermin Šerifović
Commissioning Editor: Edward Gordon
Acquisition Editor: Chaitanya Nair
Content Development Editor: Nikhil Borkar
Technical Editor: Madhunikita Sunil Chindarkar
Copy Editor: Priyanka Ravi
Project Coordinator: Francina Pinto
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Coordinator: Manu Joseph
Cover Work: Manu Joseph

About the Authors

Vincent Theron is a professional software engineer with 9 years of experience. He discovered Scala 6 years ago and uses it to build highly scalable and reliable applications.
He designs software to solve business problems in various industries, including online gambling, financial trading, and, most recently, advertising. He earned a master's degree in computer science and engineering from Université Paris-Est Marne-la-Vallée. Vincent lives in the Boston area with his wife, his son, and two furry cats.

To everybody at Packt Publishing, thanks for working so hard to make this book a reality. To Chaitanya Nair, thanks for reaching out to me with this project. To Nikhil Borkar, thanks for providing us with guidance along the way. To Michael Diamant, my coauthor, coworker, and friend, thanks for the knowledge you have brought to this book and for being an inspiration every day. To my parents, thanks for your love and support and for buying me my first computer. And finally, to my wife, Julie, thanks for your constant encouragement and for giving me such a wonderful son.

Michael Diamant is a professional software engineer and functional programming enthusiast. He began his career in 2009 focused on Java and the object-oriented programming paradigm. After learning about Scala in 2011, he has focused on using Scala and the functional programming paradigm to build software systems in the financial trading and advertising domains. Michael is a graduate of Worcester Polytechnic Institute and lives in the Boston area.

The knowledge I am able to share in this book is the result of a lifetime of support and teaching from others. I want to recognize my coauthor, Vincent, for pushing me to take on this effort and for all the hours spent together developing the thoughts contained in our book. All of my current and former colleagues have helped me sharpen my engineering skills, and without their generosity of sharing their learning, I would not have been able to write this book. In addition to Vincent, I want to call out several colleagues that I feel particularly indebted to: Dave Stevens, Gary Malouf, Eugene Kolnick, and Johnny Everson.
Thank you to my parents and my brother for supporting me and shaping me into the individual I am today. I am deeply appreciative of the support my girlfriend, Anna, gave me throughout the writing process. And last but not least, thank you to Packt Publishing for helping us write our first book.

About the Reviewer

Nermin Šerifović has been a Scala enthusiast since 2009, practicing it professionally since 2011. For most of his career, he has focused on building backend platforms using JVM technologies. Most recently, as VP Engineering at Pingup, he has been leading the development efforts of a local services booking system.

Nermin is an instructor at Harvard Extension School, where he co-teaches the Concurrent Programming in Scala course and has also given talks at various conferences.

An active Scala community member, Nermin organized the Boston Area Scala Enthusiasts user group and was part of the Northeast Scala Symposium founding team. He is a co-author of the Scala Puzzlers book and co-creator of the Scala Puzzlers website.

Nermin holds an M.Eng in computer science from Cornell University, and his areas of interest include distributed systems, along with concurrent, reactive, and functional programming.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

    Fully searchable across every book published by Packt
    Copy and paste, print, and bookmark content
    On demand and accessible via a web browser

Free access for Packt account holders

Get notified! Find out when new books are published by following @PacktEnterprise on Twitter or the Packt Enterprise Facebook page.

Table of Contents

Preface

Chapter 1: The Road to Performance
    Defining performance
    Performant software
    Hardware resources
    Latency and throughput
    Bottlenecks
    Summarizing performance
    The problem with averages
    Percentiles to the rescue
    Collecting measurements
    Using benchmarks to measure performance
    Profiling to locate bottlenecks
    Pairing benchmarks and profiling
    A case study
    Tooling
    Summary

Chapter 2: Measuring Performance on the JVM
    A peek into the financial domain
    Unexpected volatility crushes profits
    Reproducing the problem
    Throughput benchmark
    Latency benchmark
    The first latency benchmark
    The coordinated omission problem
    The second latency benchmark
    The final latency benchmark
    Locating bottlenecks
    Did I test with the expected set of resources?
    Was the system environment clean during the profiling?
    Are the JVM's internal resources performing to expectations?
    Where are the CPU bottlenecks?
    What are the memory allocation patterns?
    Trying to save the day
    A word of caution
    A profiling checklist
    Taking big steps with microbenchmarks
    Microbenchmarking the order book
    Summary

Chapter 3: Unleashing Scala Performance
    Value classes
    Bytecode representation
    Performance considerations
    Tagged types – an alternative to value classes
    Specialization
    Bytecode representation
    Performance considerations
    Tuples
    Bytecode representation
    Performance considerations
    Pattern matching
    Bytecode representation
    Performance considerations
    Tail recursion
    Bytecode representation
    Performance considerations
    The Option data type
    Bytecode representation
    Performance considerations
    Case study – a more performant option
    Summary

Chapter 4: Exploring the Collection API
    High-throughput systems – improving the order book
    Understanding historical trade-offs – list implementation
    List
    TreeMap
    Adding limit orders
    Canceling orders
    The current order book – queue implementation
    Queue
    Improved cancellation performance through lazy evaluation
    Set
    Benchmarking LazyCancelOrderBook
    Lessons learned
    Historical data analysis
    Lagged time series returns
    Vector
    Data clean up
    Handling multiple return series
    Array
    Looping with the Spire cfor macro
    Summary

Chapter 5: Lazy Collections and Event Sourcing
    Improving the client report generation speed
    Diving into the reporting code
    Using views to speed up report generation time
    Constructing a custom view
    Applying views to improve report generation performance
    View caveats
    SeqView extends Seq
    Views are not memoizers
    Zipping up report generation
    Rethinking reporting architecture
    An overview of Stream
    Transforming events
    Building the event sourcing pipeline
    Streaming Markov chains
    Stream caveats
    Streams are memoizers
    Stream can be infinite
    Summary

Chapter 6: Concurrency in Scala
    Parallelizing backtesting strategies
    Exploring Future
    Future and crazy ideas
    Future usage considerations
    Performing side-effects
    Blocking execution
    Handling failures
    Hampering performance through executor submissions
    Handling blocking calls and callbacks
    ExecutionContext and blocking calls
    Asynchronous versus nonblocking
    Using a dedicated ExecutionContext to block calls