Table Of Content

> m o ok.c o b e w o w w. w w < ok o B e w! o W m o d fr a o nl w o D For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to access them. Contents at a Glance About the Author ..........................................................................................................xv About the Technical Reviewer ....................................................................................xvii Acknowledgments .......................................................................................................xix Introduction .................................................................................................................xxi ■ Chapter 1: Index Fundamentals ...................................................................................1 ■ Chapter 2: Index Storage Fundamentals ...................................................................15 ■ Chapter 3: Index Statistics ........................................................................................51 ■ Chapter 4: XML, Spatial, and Full-Text Indexing ........................................................91 ■ Chapter 5: Index Myths and Best Practices ............................................................121 ■ Chapter 6: Index Maintenance .................................................................................135 ■ Chapter 7: Indexing Tools ........................................................................................165 ■ Chapter 8: Index Strategies .....................................................................................187 ■ Chapter 9: Query Strategies ....................................................................................235 ■ Chapter 10: Index Analysis ......................................................................................249 Index ...........................................................................................................................325 v Introduction Indexes are important. Not only that, they are vastly important. No single structure aids in retrieving data from a database more than an index. Indexes represent both how data is stored and the access paths by which data can be retrieved from your database. Without indexes, a database is an unordered mess minus the roadmap to find the information you seek. Throughout my experience with customers, one of the most common resolutions that I provide for performance tuning and application outages is to add indexes to their databases. Often, the effort of adding an index or two to the primary tables within a database provides significant performance improvements—much more so than tuning the database on statement. This is because an index can affect the many SQL statements that are being run against the database. Managing indexes may seem like an easy task. Unfortunately, their seeming simplicity is often the key to why they are overlooked. Often there is an assumption from developers that the database administrators will take care of indexing. Or there is an assumption by the database administrators that the developers are building the necessary indexes as they develop features in their applications. While these are primarily cases of miscommunication, people need to know how to determine what indexes are necessary and the value of those indexes. This book provides that information. Outside of the aforementioned scenarios is the fact that applications and how they are used changes over time. Features created and used to tune the database may not be as useful as expected, or a small change may lead to a big change in how the application and underlying database are used. All of this change affects the database and what needs to be accessed. As time goes on, databases and their indexes need to be reviewed to determine if the current indexing is accurate for the new load. This book also provides information in this regard. From beginning to end, this book provides information that can take you from an indexing novice to an indexing expert. The chapters are laid out such that you can start at any place to fill in the gaps in your knowledge and build out from there. Whether you need to understand the fundamentals or you need to start building out indexes, the information is available here. Chapter 1 covers index fundamentals. It lays the ground work for all of the following chapters. This chapter provides information regarding the types of indexes available in SQL Server. It covers some of the primary index types and defines what these are and how to build them. The chapter also explores the options available that can change the structure of indexes. From fill factor to included columns, the available attributes are defined and explained. Chapter 2 picks up where the previous chapter left off. Going beyond defining the indexes available, the chapter looks at the physical structure of indexes and the components that make up indexes. This internal understanding of indexes provides the basis for grasping why indexes behave in certain ways in certain situations. As you examine the physical structures of indexes, you’ll become familiar with the tools you can use to begin digging into these structures on your own. Armed with an understanding of the indexes available and how they are built, Chapter 3 explores the statistics that are stored on the indexes and how to use this information; these statistics provide insight into how SQL Server is utilizing indexes. The chapter also provides information necessary to decipher why an index may not be selected and why it is behaving in a certain way. You will gain a deeper understanding of how this information is collected by SQL Server through dynamic management views and what data is worthwhile to review. xxi ■ IntroduCtIon Not every index type was fully discussed in the first chapter; those types not discussed are covered in Chapter 4. Beyond the classic index structure, there are a few other index types that should also be considered when performance tuning. These indexes are applicable to specific situations. In this chapter, you’ll look into these other index types to understand what they have to offer. You’ll also look at situations where they should be implemented. Chapter 5 identifies and debunks some commonly held myths about indexes. Also, it outlines some best practices in regards to indexing a table. As you move into using tools and strategies to build indexes in the chapters that follow, this information will be important to remember. With a firm grasp of the options for indexing, the next thing that needs to be addressed is maintenance. In Chapter 6, you’ll look at what needs to be considered when maintaining indexes in your environment. First you’ll look at fragmentation. SQL Server is not without tools to automate your ability to build indexes. Chapter 7 explores these tools and looks at ways that you can begin build indexes in your environment today with minimal effort. The two tools discussed are the Missing Index DMVs and the Database Engine Tuning Advisor. You’ll look at the benefits and issues regarding both tools and get some guidance on how to use them effectively in your environment. The tools alone won’t give you everything you need to index your databases. In Chapter 8, you’ll begin to look at how to determine the indexes that are needed for a database and a table. There are a number of strategies for selecting what indexes to build within a database. They can be built according to recommendations by the Query Optimizer. They can also be built to support metadata structures such as foreign keys. For each strategy of indexing there are a number of considerations to take into account when deciding whether or not to build the index. Part of effective indexing is writing queries that can utilize an index on a query. Chapter 9 discusses a number of strategies for indexing. Sometimes when querying data the indexes that you assume will be used are not used after all. These situations are usually tied into how a query is structured or the data that is being retrieved. Indexes can be skipped due to SARGability issues (where the query isn’t being properly selective on the index). They can also be skipped over due to tipping point issues, such as when the number of reads to retrieve data from an index potentially exceeds the reads to scan that or another index. These issues effect index selection as well as the effectiveness and justification for some indexes. Today’s DBA isn’t in a position where they have only a single table to index. A database can have tens, hundred, or thousands of tables, and all of them need to have the proper indexes. In Chapter 10, you’ll learn some methods to approach indexing for a single database but also for all of the databases on a server and servers within your environment. As mentioned, indexes are important. Through the chapters in this book you will become armed with what you need to know about the indexes in your environment. You will also learn how to find the information you need to improve the performance of your environment. xxii Chapter 1 Index Fundamentals The end goal of this book is to help you improve the performance of your databases through the use of indexes. Before we can move toward that end, we must first understand what indexes are and why we need them. We need to understand the differences between how information on a clustered index and heap table is stored. We’ll also look at how nonclustered and column store indexes are laid out and how they rely on other indexes. This chapter will provide the building blocks to understanding the logical design of indexes. Why Build Indexes? Databases exist to provide data. A key piece in providing the data is delivering it efficiently. Indexes are the means to providing an efficient access path between the user and the data. By providing this access path, the user can ask for data from the database and the database will know where to go to retrieve the data. Why not just have all of the data in a table and return it when it is needed? Why go through the exercise of creating indexes? Returning data when needed is actually the point of indexes; they provide that path that is necessary to get to the data in the quickest manner possible. To illustrate, let’s consider an analogy that is often used to describe indexes—a library. When you go to the library, there are shelves upon shelves of books. In this library, a common task repeated over and over is finding a book. Most often we are particular on the book that we need, and we have a few options for finding that book. In the library, books are stored on the shelves using the Dewey Decimal Classification system. This system assigns a number to a book based on its subject. Once the value is assigned, the book is stored in numerical order within the library. For instance, books on science are in the range of 500 to 599. From there, if you wanted a book on mathematics, you would look for books with a classification of 510 to 519. Then to find a book on geometry, you’d look for books numbered 516. With this classification system, finding a book on any subject is easy and very efficient. Once you know the number of the book you are looking for, you can go directly to the stack in the library where the books with 516 are located, instead of wandering through the library until you happen upon geometry books. This is exactly how indexes work; they provide an ordered manner to store information that allows users to easily find the data. What happens, though, if you want to find all of the books in a library written by Jason Strate? You could make an educated guess, that they are all categorized under databases, but you would have to know that for certain. The only way to do that would be to walk through the library and check every stack. The library has a solution for this problem—the card catalog. The card catalog in the library lists books by author, title, subject, and category. Through this, you would be able to find the Dewey Decimal number for all books written by Jason Strate. Instead of wandering through the stacks and checking each book to see if I wrote it, you could instead go to the specific books in the library written by me. This is also how indexes work. The index provides a location of data so that the users can go directly to the data. Without these mechanisms, finding books in a library, or information in a database, would be difficult. Instead of going straight to the information, you’d need to browse through the library from beginning to end to 1 CHAPTER 1 ■ IndEx FundAmEnTAls find what you need. In smaller libraries, such as book mobiles, this wouldn’t be much of a problem. But as the library gets larger and settles into a building, it just isn’t efficient to browse all of the stacks. And when there is research that needs to be done and books need to be found, there isn’t time to browse through everything. This analogy has hopefully provided you with the basis that you need in order to understand the purpose and the need for indexes. In the following sections, we’ll dissect this analogy a bit more and pair it with the different indexing options that are available in SQL Server 2012 databases. Major Index Types You can categorize indexes in different ways. However, it’s essential to understand the three categories described in this particular section: heaps, clustered indexes, and nonclustered indexes. Heap and clustered indexes directly affect how data in their underlying tables are stored. Nonclustered indexes are independent of row storage. The first step toward understanding indexing is to grasp this categorization scheme. Heap Tables As mentioned in the library analogy, in a book mobile library the books available may change often or there may only be a few shelves of books. In these cases the librarian may not need to spend much time organizing the books under the Dewey Decimal system. Instead, the librarian may just number each book and place the books on the shelves as they are acquired. In this case, there is no real order to how the books are stored in the library. This lack of a structured and searchable indexing scheme is referred to as a heap. In a heap, the first row added to the index is the first record in the table, the second row is the second record in the table, the third row is the third record in the table, and so on. There is nothing in the data that is used to specify the order in which the data has been added. The data and records are in the table without any particular order. When a table is first created, the initial storage structure is called a heap. This is probably the simplest storage structure. Rows are inserted into the table in the order in which they are added. A table will use a heap until a clustered index is created on the table (we’ll discuss clustered indexes in the next section). A table can either be a heap or a clustered index, but not both. Also, there is only a single heap structure allowed per table. Clustered Indexes In the library analogy, we reviewed how the Dewey Decimal system defines how books are sorted and stored in the library. Regardless of when the book is added to the library, with the Dewey Decimal system it is assigned a number based on its subject and placed on the shelf between other books of the same subject. The subject of the book, not when it is added, determines the location of the book. This structure is the most direct method to find a book within the library. In the context of a table, the index that provides this functionality in a database is called a clustered index. With a clustered index, one or more columns are selected as the key columns for the index. These columns are used to sort and store the data in the table. Where a library stores books based on their Dewey Decimal number, a clustered index stores the records in the table based on the order of the key columns of the index. The column(s) used as the key columns for a clustered index are selected based on the most frequently used data path to the records in the table. For instance, in a table with states listed, the most common method of finding a record in the table would likely be through the state’s abbreviation. In that situation, using the state abbreviation for the clustering key would be best. With many tables, the primary key or business key will often function as the clustered index clustering key. Both heaps and clustered indexes affect how records are stored in a table. In a clustered index, the data outside the key columns is stored alongside the key columns. This equates to the clustered index as being the physical table itself, just as a heap defines the table. For this reason, a table cannot be both a heap and a clustered index. Also, since a clustered index defines how the data in a table is stored, a table cannot have more than one clustered index. 2 CHAPTER 1 ■ IndEx FundAmEnTAls Nonclustered Indexes As was noted in our analogy, the Dewey Decimal system doesn’t account for every way in which a person may need to search for a book. If the author or title is known, but not the subject, then the classification doesn’t really provide any value. Libraries solve this problem with card catalogs, which provide a place to cross reference the classification number of a book with the name of the author or the book title. Databases are also able to solve this problem with nonclustered indexes. In a nonclustered index, columns are selected and sorted based on their values. These columns contain a reference to the clustered index or heap location of the data they are related to. This is nearly identical to how a card catalog works in a library. The order of the books, or the records in the tables, doesn’t change, but a shortcut to the data is created based on the other search values. Nonclustered indexes do not have the same restrictions as heaps and clustered indexes. There can be many nonclustered indexes on a table, in fact up to 999 nonclustered indexes. This allows alternative routes to be created for users to get to the data they need without having to traverse all records in a table. Just because a table can have many indexes doesn’t mean that it should, as we’ll discuss later in this book. Column Store Indexes One of the problems with card catalogs in large libraries is that there could be dozens or hundreds of index cards that match a title of a book. Each of these index cards contains information such as the author, subject, title, International Standard Book Number (ISBN), page count, and publishing date; along with the Dewey Decimal number. In nearly all cases this additional information is not needed, but it’s there to help filter out index cards if needed. Imagine if instead of dozens or hundreds of index cards to look at, you had a few pieces of paper that only had the title and Dewey Decimal number. Where you previously would have had to look through dozens or hundreds of index cards, you instead are left with a few consolidated index cards. This type of index would be called a column store index. Column store indexes are completely new to SQL Server 2012. Traditionally, indexes are stored in row- based organization, also known as row store. This form of storage is extremely efficient when one row or a small range is requested. When a large range or all rows are returned, this organization can become inefficient. The column store index favors the return of large ranges of rows by storing data in column-wise organization. When you create a column store index, you typically include all the columns in a table. This ensures that all columns are included in the enhanced performance benefits of the column store organization. In a column store index, instead of storing all of the columns for a record together, each column is stored separately with all of the other rows in an index. The benefit of this type of index is that only the columns and rows required for a query need to be read. In data warehousing scenarios, often less than 15 percent of the columns in an index are needed for the results of a query.1 Column store indexes do have a few restrictions on them when compared to other indexes. To begin with, data modifications, such as those through INSERT, UPDATE, and DELETE statements, are disallowed. For this reason, column store indexes are ideally situated for large data warehouses where the data is not changed that frequently. They also take significantly longer to create; at the time of this writing, they average two to three times longer than the time to create a similar nonclustered index. Even with the restrictions above, column store indexes can provide significant value. Consider first that the index only loads the columns from the query that are required. Next consider the compression improvements that similar data on the same page can provide. Between these two aspects, column store indexes can provide significant performance improvements. We’ll discuss these in more depth in later chapters. 1http://download.microsoft.com/download/8/C/1/8C1CE06B-DE2F-40D1-9C5C-3EE521C25CE9/Columnstore% 20Indexes%20for%20Fast%20DW%20QP%20SQL%20Server%2011.pdf. 3 CHAPTER 1 ■ IndEx FundAmEnTAls Other Index Types Besides the index types just discussed, there are a number of other index types available. These are XML, spatial, and full-text search indexes. These don’t necessarily fit into the library scenario that has been outlined so far, but they are important options. To help illustrate, we’ll be adding some new functionality to the library. Chapter 4 will expand on the information presented here. XML Indexes Suppose we needed a method to be able to search the table of contents for all of the books in the library. A table of contents provides a hierarchical view of a book. There are chapters that outline the main sections for the book; which are followed by subchapter heads that provide more detail of the contents of the chapter. This relationship model is similar to how XML documents are designed; there are nodes and a relation between them that define the structure of the information. As discussed with the card catalog, it would not be very efficient to look through every book in the library to find those that were written by Jason Strate. It would be even less efficient to look through all of the books in the library to find out if any of the chapters in any of the books were written by Ted Krueger. There are probably more than one chapter in each book, resulting in multiple values that would need to be checked for each book and no certainty as to how many chapters would need to be looked at before checking. One method of solving this problem would be to make a list of every book in the library and list all of the chapters for each book. Each book would have one or more chapter entries in the list. This provides the same benefit that a card catalog provides, but for some less than standard information. In a database, this is what an XML index does. For every node in an XML document an entry is made in the XML index. This information is persisted in internal tables that SQL Server can use to determine whether the XML document contains the data that is being queried. Creating and maintaining XML indexes can be quite costly. Every time the index is updated, it needs to shred all of the nodes of the XML document into the XML index. The larger the XML document, the more costly this process will be. However, if data in an XML column will be queried often, the cost of creating and maintaining an XML index can be offset quickly by removing the need to shred all of the XML documents at runtime. Spatial Indexes Every library has maps. Some maps cover the oceans; others are for continents, countries, states, or cities. Various maps can be found in a library, each providing a different view and information of perhaps the same areas. There are two basic challenges that exist with all of these maps. First, you may want to know which maps overlap or include the same information. For instance, you may be interested in all of the maps that include Minnesota. The second challenge is when you want to find all of the books in the library that where written or published at a specific place. Again in this case, how many books were written within 25 miles of Minneapolis? Both of these present a problem because, traditionally, data in a database is fairly one dimensional, meaning that data represent discrete facts. In the physical world, data often exist in more than one dimension. Maps are two dimensional and buildings and floor plans are three dimensional. To solve this problem, SQL Server provides the capabilities for spatial indexes. Spatial indexes dissect the spatial information that is provided into a four-level representation of the data. This representation allows SQL Server to plot out the spatial information, both geometry and geography, in the record to determine where rows overlap and the proximity of one point to another point. There are a few restrictions that exist with spatial indexes. The main restriction is that spatial indexes must be created on tables that have primary keys. Without a primary key, the spatial index creation will not succeed. When creating spatial indexes, they are restricted utilizing parallel processing, and only a single spatial index can 4 CHAPTER 1 ■ IndEx FundAmEnTAls be built at a time. Also, spatial indexes cannot be used on indexed views. These and other restrictions are covered in Chapter 4. Similar to XML indexes, spatial indexes have upfront and maintenance costs associate with their sizes. The benefit is that when spatial data needs to be queried using specific methods for querying spatial data, the value of the spatial index can be quickly realized. Full-Text Search The last scenario to consider is the idea of finding specific terms within books. Card catalogs do a good job of providing information on find books by author, title, or subject. The subject of a book isn’t the only keyword you may want to use to search for books. At the back of many books are keyword indexes to help you find other subjects within a book. When this book is completed, there will be an index and it will have the entry full-text search in it with a reference to this page and other pages where this is discussed in this book. Consider for a moment if every book in the library had a keyword index. Furthermore, let’s take all of those keywords and place them in their own card catalog. With this card catalog, you’d be able to find every book in the library with references to every page that discusses full-text searches. Generally speaking, this is what an implementation of a full-text search provides. Index Variations Up to this point, we’ve looked at the different types of indexes available within a SQL Server. These aren’t the only ways in which indexes can be defined. There are a few index properties that can be used to create variations on the types of indexes discussed previously. Implementing these variations can assist in implementing business rules associated with the data or to help improve the performance of the index. Primary Key In the library analogy, we discussed how all of the books have a Dewey Decimal number. This number identifies each book and where it is in the library. In a similar fashion, an index can be defined to identify a record within a table. To do this, an index is created with a primary key to identify a record within a table. There are some differences between the Dewey Decimal number and a primary key, but conceptually they are the same. A primary key is used to identify a record within a table. For this reason none of the records in a table can have the same primary key value. Typically, a primary key will be created on a single column, though it can be composed of multiple columns. There are a few other things that need to be remembered when using a primary key. First, a primary key is a unique value that identifies each record in a table. Because of this, all values within a primary key must be populated. No null values are allowed in a primary key. Also, there can only be one primary key on a table. There may be other identifying information in a table, but only a single column or set of columns can be identified as the primary key. Lastly, although it is not required, a primary key will typically be built on a clustered index. The primary key will be clustered by default, but this behavior can be overridden and will be ignored if a clustered index already exists. More information on why this is done will be included in Chapter 5. Unique Index As mentioned previously, there can be more than a single column or set of columns that can be used to uniquely identify a record in a table. This is similar to the fact that there is more than one way to uniquely identify a book in a library. Besides the Dewey Decimal number, a book can also be identified through its ISBN. Within a database, this is represented as a unique index. 5