Elasticsearch: A Complete Guide

End-to-end Search and Analytics

A course in three modules

BIRMINGHAM - MUMBAI

Copyright © 2017 Packt Publishing

All rights reserved. No part of this course may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this course to ensure the accuracy of the information presented. However, the information contained in this course is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this course.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this course by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Published on: January 2017
Production reference: 1190117

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78728-854-6
www.packtpub.com

Credits

Authors
Bharvi Dixit
Rafał Kuć
Marek Rogoziński
Saurabh Chhajed

Reviewers
Alberto Paro
Hüseyin Akdoğan
Julien Duponchelle
Marcelo Ochoa
Isra El Isa
Anthony Lapenna
Blake Praharaj

Content Development Editor
Mayur Pawanikar

Production Coordinator
Nilesh Mohite

Preface

Elasticsearch is a modern, fast, distributed, scalable, fault-tolerant, open source search and analytics engine. It provides a new level of control over how you can index and search even huge sets of data.

This course will take you from the basics of Elasticsearch to using Elasticsearch in the Elastic Stack and in production. You will start with the very basics: understanding Elasticsearch terminology, installation, and configuration.
After this, you will learn the basics of analytics, indexing, search, and querying. You will also learn how to create various maps and visualizations, and get a quick understanding of cluster scaling, search and bulk operations, backups, security, and more.

Next, you will dig deeper into Elasticsearch's internals, including caches, the Apache Lucene library, and monitoring capabilities. You'll learn about the practical usage of Elasticsearch configuration parameters and how to use the monitoring API. You will learn how to improve the user search experience, and about index distribution, segment statistics, merging, and more.

Once you have mastered these topics, it will be time to move on. You will dive into end-to-end visualize-analyze-log techniques with the Elastic Stack (also known as the ELK stack). You will look at Elasticsearch, Logstash, and Kibana, and how to make them work together to derive insights and business metrics from data. You will learn how to use Elasticsearch effectively with other de facto components and get the most out of it. By the end of this course, you will have developed a full-fledged data pipeline.

What this learning path covers

Module 1, Elasticsearch Essentials, provides complete coverage of working with Elasticsearch using the Python and Java APIs to perform CRUD operations, run aggregation-based analytics, handle document relationships, work with geospatial data, and control search relevancy.

Module 2, Mastering Elasticsearch, starts with an introduction to the world of Lucene and Elasticsearch. We discuss topics such as the different scoring algorithms, choosing the right store mechanism, the differences between the store mechanisms, and why choosing the proper one matters. We also touch on the administration side of Elasticsearch by discussing the discovery and recovery modules and the human-friendly Cat API.

Module 3, Learning ELK Stack, introduces building your own ELK Stack data pipeline using the open source technology stack of Elasticsearch, Logstash, and Kibana. It covers the core concepts of each component of the stack and shows how to quickly use them to build your own log analytics solutions.

What you need for this learning path

Module 1: This book was written using Elasticsearch version 2.0.0, and all the examples and functions should work with it. Oracle Java 1.7u55 or above is recommended for creating Elasticsearch clusters. You'll also need a command that allows you to send HTTP requests, such as curl, which is available for most operating systems.

This book covers all the examples using Python and Java. For the Java examples, you will need the Java JDK (Java Development Kit) installed and an editor that allows you to develop your code (or a Java IDE such as Eclipse). Apache Maven has been used to build the Java code. For running the Python examples, you will need Python 2.7 or above and Elasticsearch-Py, the official Python client for Elasticsearch.

Some chapters may require additional software, such as Elasticsearch plugins, but this is explicitly mentioned wherever such software is needed.

Module 2: This book was written for Elasticsearch users and enthusiasts who are already familiar with the basic concepts of this great search server and want to extend their knowledge of Elasticsearch itself, as well as of topics such as how Apache Lucene or the JVM garbage collector works. Readers who want to see how to improve their query relevancy and learn how to extend Elasticsearch with their own plugin may also find this book interesting and useful.
If you are new to Elasticsearch and are not familiar with basic concepts such as querying and data indexing, you may find it hard to use this book, as most of the chapters assume that you already have this knowledge. In that case, we suggest that you look at our previous book about Elasticsearch: Elasticsearch Server, Second Edition, Packt Publishing.

Module 3: You will need the following for this module:
• Unix operating system (any flavor)
• Elasticsearch 1.5.2
• Logstash 1.5.0
• Kibana 4.0.2

Who this learning path is for

This course appeals to anyone who wants to build efficient search and analytics applications. Some development experience is expected.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this course: what you liked or disliked. Reader feedback is important to us, as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail [email protected], and mention the course's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a course, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt course, we have a number of things to help you get the most from your purchase.

Downloading the example code

You can download the example code files for this course from your account at http://www.packtpub.com. If you purchased this course elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:
1. Log in to or register on our website using your e-mail address and password.
2. Hover the mouse pointer over the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the course in the Search box.
5. Select the course for which you're looking to download the code files.
6. From the drop-down menu, choose where you purchased this course.
7. Click on Code Download.

You can also download the code files by clicking on the Code Files button on the course's webpage on the Packt Publishing website. This page can be accessed by entering the course's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
• WinRAR / 7-Zip for Windows
• Zipeg / iZip / UnRarX for Mac
• 7-Zip / PeaZip for Linux

The code bundle for the course is also hosted on GitHub at https://github.com/PacktPublishing/ElasticSearch-A-Complete-Guide. We also have other code bundles from our rich catalog of books, videos, and courses available at https://github.com/PacktPublishing/. Check them out!

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, whether in the text or the code, we would be grateful if you could report it to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this course. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your course, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously.
If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at [email protected] with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this course, you can contact us at [email protected], and we will do our best to address the problem.