® Greenplum Database 4.2 Administrator Guide P/N: 300-013-163 Rev: A04 Copyright © 2012 EMC Corporation. All rights reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com All other trademarks used herein are the property of their respective owners. Greenplum Database Administrator Guide 4.2 - Contents Greenplum Database Administrator Guide 4.2 - Contents Preface...............................................................................................1 About This Guide..............................................................................1 Document Conventions....................................................................2 Text Conventions........................................................................2 Command Syntax Conventions...................................................3 Getting Support ...............................................................................3 Product information....................................................................3 Technical support .......................................................................4 Section I: Introduction to Greenplum Chapter 1: About the Greenplum Architecture........................6 About the Greenplum Master............................................................7 About the Greenplum Segments.......................................................7 About the Greenplum Interconnect..................................................7 About Redundancy and Failover in Greenplum Database..................8 About Segment Mirroring............................................................8 About Master Mirroring...............................................................9 About Interconnect Redundancy.................................................9 About Parallel Data Loading............................................................10 About Management and Monitoring.................................................10 Chapter 2: About Distributed Databases.................................12 Understanding How Data is Stored..................................................12 Understanding Greenplum Distribution Policies................................13 Chapter 3: Summary of Greenplum Features.........................14 Greenplum SQL Standard Conformance..........................................14 Core SQL Conformance..............................................................14 SQL 1992 Conformance.............................................................15 SQL 1999 Conformance.............................................................16 SQL 2003 Conformance.............................................................16 SQL 2008 Conformance.............................................................17 Greenplum and PostgreSQL Compatibility.......................................18 Chapter 4: About Greenplum Query Processing ....................25 Understanding Query Planning and Dispatch...................................25 Understanding Greenplum Query Plans...........................................26 Understanding Parallel Query Execution..........................................27 Section II: Access Control and Security Chapter 5: Managing Roles and Privileges..............................30 Security Best Practices for Roles and Privileges...............................30 Creating New Roles (Users).............................................................31 Altering Role Attributes..............................................................31 Creating Groups (Role Membership)................................................32 Managing Object Privileges .............................................................33 Simulating Row and Column Level Access Control .....................34 Encrypting Data..............................................................................34 Encrypting Passwords......................................................................35 Table of Contents iii Greenplum Database Administrator Guide 4.2 - Contents Enabling SHA-256 Encryption....................................................35 Time-based Authentication..............................................................37 Required Permissions.................................................................37 How to Add Time-based Constraints..........................................37 Chapter 6: Configuring Client Authentication.........................41 Allowing Connections to Greenplum Database.................................41 Editing the pg_hba.conf File.......................................................42 Limiting Concurrent Connections.....................................................43 Encrypting Client/Server Connections .............................................44 Chapter 7: Accessing the Database...........................................46 Establishing a Database Session .....................................................46 Supported Client Applications..........................................................47 Greenplum Database Client Applications....................................48 pgAdmin III for Greenplum Database........................................49 Database Application Interfaces.................................................52 Third-Party Client Tools.............................................................53 Troubleshooting Connection Problems.............................................54 Chapter 8: Managing Workload and Resources .....................55 Overview of Greenplum Workload Management..............................55 How Resource Queues Work in Greenplum Database.................55 Steps to Enable Workload Management.....................................59 Configuring Workload Management.................................................60 Creating Resource Queues..............................................................61 Creating Queues with an Active Query Limit..............................61 Creating Queues with Memory Limits.........................................62 Creating Queues with a Query Planner Cost Limits ....................62 Setting Priority Levels................................................................63 Assigning Roles (Users) to a Resource Queue..................................64 Removing a Role from a Resource Queue..................................64 Modifying Resource Queues.............................................................64 Altering a Resource Queue.........................................................65 Dropping a Resource Queue ......................................................65 Checking Resource Queue Status....................................................65 Viewing Queued Statements and Resource Queue Status..........66 Viewing Resource Queue Statistics............................................66 Viewing the Roles Assigned to a Resource Queue......................66 Viewing the Waiting Queries for a Resource Queue....................67 Clearing a Waiting Statement From a Resource Queue..............67 Viewing the Priority of Active Statements..................................68 Resetting the Priority of an Active Statement.............................68 Section III: Database Administration Chapter 9: Defining Database Objects......................................70 Creating and Managing Databases..................................................70 About Template Databases........................................................70 Creating a Database..................................................................70 Viewing the List of Databases....................................................71 Altering a Database...................................................................71 Table of Contents iv Greenplum Database Administrator Guide 4.2 - Contents Dropping a Database.................................................................71 Creating and Managing Tablespaces................................................72 Creating a Filespace...................................................................72 Moving the Location of Temporary or Transaction Files..............73 Creating a Tablespace ...............................................................74 Using a Tablespace to Store Database Objects..........................74 Viewing Existing Tablespaces and Filespaces.............................75 Dropping Tablespaces and Filespaces........................................75 Creating and Managing Schemas.....................................................76 The Default “Public” Schema......................................................76 Creating a Schema....................................................................76 Schema Search Paths................................................................76 Dropping a Schema...................................................................77 System Schemas.......................................................................77 Creating and Managing Tables ........................................................78 Creating a Table........................................................................78 Altering a Table.........................................................................90 Dropping a Table.......................................................................92 Partitioning Large Tables.................................................................92 Understanding Table Partitioning in Greenplum Database..........92 Deciding on a Table Partitioning Strategy..................................93 Creating Partitioned Tables........................................................94 Loading Partitioned Tables.........................................................97 Verifying Your Partition Strategy................................................98 Viewing Your Partition Design....................................................99 Maintaining Partitioned Tables...................................................99 Creating and Using Sequences......................................................103 Creating a Sequence................................................................103 Using a Sequence....................................................................104 Altering a Sequence.................................................................104 Dropping a Sequence...............................................................104 Using Indexes in Greenplum Database..........................................104 Index Types.............................................................................106 Creating an Index....................................................................108 Examining Index Usage...........................................................108 Managing Indexes ...................................................................109 Dropping an Index...................................................................109 Creating and Managing Views........................................................109 Creating Views.........................................................................109 Dropping Views........................................................................109 Chapter 10: Managing Data.......................................................110 About Concurrency Control in Greenplum Database......................110 Inserting New Rows......................................................................111 Updating Existing Rows.................................................................112 Deleting Rows...............................................................................112 Truncating a Table...................................................................112 Working With Transactions............................................................113 Transaction Isolation Levels.....................................................113 Vacuuming the Database ..............................................................114 Table of Contents v Greenplum Database Administrator Guide 4.2 - Contents Configuring the Free Space Map..............................................114 Chapter 11: Querying Data........................................................116 Defining Queries ...........................................................................116 SQL Lexicon.............................................................................116 SQL Value Expressions ............................................................116 Using Functions and Operators......................................................127 Using Functions in Greenplum Database..................................127 User-Defined Functions............................................................128 Built-in Functions and Operators..............................................128 Query Performance.......................................................................144 Query Profiling..............................................................................144 Reading EXPLAIN Output.........................................................145 Reading EXPLAIN ANALYZE Output..........................................146 What to Look for in a Query Plan.............................................147 Chapter 12: Loading and Unloading Data..............................149 Greenplum Database Loading Tools Overview...............................149 About External Tables..............................................................149 About gpload...........................................................................150 About COPY ..............................................................................150 Loading Data into Greenplum Database........................................150 Accessing File-Based External Tables.......................................151 Defining External Tables - Examples.............................................154 Using the Greenplum Parallel File Server (gpfdist)...................157 Using Hadoop Distributed File System (HDFS) Tables..............160 Creating and Using Web External Tables..................................167 Loading Data Using an External Table......................................169 Loading and Writing Custom Data............................................169 Handling Load Errors...............................................................172 Loading Data into the Greenplum Database..................................175 Loading Data with gpload........................................................175 Loading Data with the gphdfs Protocol.....................................176 Loading Data with COPY..........................................................176 Data Loading Performance Tips ...............................................177 Unloading Data from Greenplum Database....................................177 Defining a File-Based Writable External Table..........................178 Defining a Command-Based Writable External Web Table........179 Unloading Data Using a Writable External Table ......................181 Unloading Data Using COPY.....................................................181 Transforming XML Data.................................................................182 Formatting Data Files....................................................................193 Formatting Rows......................................................................193 Formatting Columns................................................................193 Representing NULL Values.......................................................193 Escaping..................................................................................194 Character Encoding..................................................................195 Table of Contents vi Greenplum Database Administrator Guide 4.2 - Contents Section IV: System Administration Chapter 13: Starting and Stopping Greenplum....................197 Overview.......................................................................................197 Starting Greenplum Database.......................................................197 Restarting Greenplum Database..............................................197 Uploading Configuration File Changes Only..............................198 Starting the Master in Maintenance Mode................................198 Stopping Greenplum Database......................................................198 Chapter 14: Configuring Your Greenplum System..............200 About Greenplum Master and Local Parameters............................200 Setting Configuration Parameters..................................................200 Setting a Local Configuration Parameter..................................201 Setting a Master Configuration Parameter...............................201 Viewing Settings of Server Configuration Parameters ...................202 Configuration Parameter Categories..............................................202 Connection and Authentication Parameters..............................203 System Resource Consumption Parameters.............................203 Query Tuning Parameters........................................................205 Error Reporting and Logging Parameters.................................207 System Monitoring Parameters................................................207 Runtime Statistics Collection Parameters.................................208 Automatic Statistics Collection Parameters..............................208 Client Connection Default Parameters......................................209 Lock Management Parameters.................................................209 Workload Management Parameters..........................................209 External Table Parameters.......................................................210 Append-Only Table Parameters................................................210 Database and Tablespace/Filespace Parameters......................210 Past PostgreSQL Version Compatibility Parameters..................210 Greenplum Array Configuration Parameters.............................210 Chapter 15: Enabling High Availability Features.................212 Overview of High Availability in Greenplum Database....................212 Overview of Segment Mirroring...............................................212 Overview of Master Mirroring...................................................213 Overview of Fault Detection and Recovery...............................214 Enabling Mirroring in Greenplum Database....................................215 Enabling Segment Mirroring.....................................................215 Enabling Master Mirroring........................................................216 Knowing When a Segment is Down...............................................217 Enabling Alerts and Notifications..............................................217 Checking for Failed Segments..................................................217 Checking the Log Files.............................................................218 Recovering a Failed Segment........................................................218 Recovering From Segment Failures..........................................219 Recovering a Failed Master............................................................222 Restoring Master Mirroring After a Recovery............................223 Table of Contents vii Greenplum Database Administrator Guide 4.2 - Contents Chapter 16: Backing Up and Restoring Databases..............225 Overview of Backup and Restore Operations.................................225 About Parallel Backups............................................................225 About Non-Parallel Backups.....................................................226 About Parallel Restores............................................................226 About Non-Parallel Restores....................................................227 Backing Up a Database.................................................................228 Using Data Domain Boost........................................................228 Backing Up a Database with gp_dump.....................................231 Automating Parallel Backups with gpcrondump........................232 Restoring From Parallel Backup Files.............................................233 Restoring a Database with gp_restore.....................................233 Restoring a Database Using gpdbrestore.................................235 Restoring to a Different Greenplum System Configuration.......235 Chapter 17: Expanding a Greenplum System.......................237 Planning Greenplum System Expansion.........................................237 System Expansion Overview....................................................237 System Expansion Checklist ....................................................239 Planning New Hardware Platforms...........................................240 Planning Initialization of New Segments..................................240 Planning Table Redistribution...................................................242 Preparing and Adding Nodes.........................................................244 Adding New Nodes to the Trusted Host Environment...............244 Verifying OS Settings...............................................................246 Validating Disk I/O and Memory Bandwidth.............................246 Integrating New Hardware into the System.............................247 Initializing New Segments.............................................................247 Creating an Input File for System Expansion...........................247 Running gpexpand to Initialize New Segments ........................250 Rolling Back an Failed Expansion Setup...................................251 Redistributing Tables.....................................................................251 Ranking Tables for Redistribution ............................................251 Redistributing Tables Using gpexpand......................................252 Monitoring Table Redistribution................................................252 Removing the Expansion Schema..................................................253 Chapter 18: Monitoring a Greenplum System.......................254 Monitoring Database Activity and Performance..............................254 Monitoring System State...............................................................254 Enabling System Alerts and Notifications.................................255 Checking System State............................................................261 Checking Disk Space Usage.....................................................262 Checking for Data Distribution Skew........................................263 Viewing Metadata Information about Database Objects...........264 Viewing the Database Server Log Files..........................................265 Log File Format........................................................................265 Searching the Greenplum Database Server Log Files...............266 Using gp_toolkit............................................................................266 Table of Contents viii Greenplum Database Administrator Guide 4.2 - Contents Chapter 19: Routine System Maintenance Tasks.................268 Routine Vacuum and Analyze........................................................268 Transaction ID Management....................................................268 System Catalog Maintenance...................................................268 Vacuum and Analyze for Query Optimization...........................269 Routine Reindexing.......................................................................269 Managing Greenplum Database Log Files......................................270 Database Server Log Files.......................................................270 Management Utility Log Files...................................................270 Section V: Performance Tuning Chapter 20: Defining Database Performance.......................272 Understanding the Performance Factors........................................272 System Resources...................................................................272 Workload.................................................................................272 Throughput..............................................................................272 Contention...............................................................................273 Optimization............................................................................273 Determining Acceptable Performance............................................273 Baseline Hardware Performance..............................................273 Performance Benchmarks........................................................273 Chapter 21: Common Causes of Performance Issues.........275 Identifying Hardware and Segment Failures..................................275 Managing Workload.......................................................................276 Avoiding Contention......................................................................276 Maintaining Database Statistics.....................................................276 Identifying Statistics Problems in Query Plans.........................276 Tuning Statistics Collection......................................................277 Optimizing Data Distribution.........................................................277 Optimizing Your Database Design..................................................277 Greenplum Database Maximum Limits.....................................278 Chapter 22: Investigating a Performance Problem............279 Checking System State.................................................................279 Checking Database Activity...........................................................279 Checking for Active Sessions (Workload).................................279 Checking for Locks (Contention)..............................................279 Checking Query Status and System Utilization.........................280 Troubleshooting Problem Queries..................................................280 Investigating Error Messages........................................................280 Gathering Information for Greenplum Support.........................281 Section VI: Extending Greenplum Database Chapter 23: Using Greenplum MapReduce............................283 About Greenplum MapReduce.......................................................283 The Basics of MapReduce.........................................................283 How Greenplum MapReduce Works..........................................284 Programming Greenplum MapReduce............................................285 Defining Inputs........................................................................285 Table of Contents ix Greenplum Database Administrator Guide 4.2 - Contents Defining Map Functions............................................................288 Defining Reduce Functions.......................................................290 Defining Outputs......................................................................293 Defining Tasks.........................................................................294 Putting Together a Complete MapReduce Specification............295 Submitting MapReduce Jobs for Execution ....................................295 Troubleshooting Problems with MapReduce Jobs...........................296 Language Does Not Exist.........................................................296 Generic Python Iterator Error ..................................................297 Function Defined Using Wrong MODE.......................................297 Chapter 24: Greenplum PostGIS Extension ..........................301 About PostGIS...............................................................................301 Greenplum PostGIS Extension.......................................................301 Greenplum PostGIS Limitations...............................................301 Enabling PostGIS Support.............................................................301 Usage............................................................................................302 Spatial Indexes........................................................................303 Section VII: References Appendix A: SQL Command Reference.......................................305 SQL Syntax Summary...................................................................307 ABORT..........................................................................................335 ALTER AGGREGATE.......................................................................336 ALTER CONVERSION.....................................................................338 ALTER DATABASE..........................................................................339 ALTER DOMAIN.............................................................................341 ALTER EXTERNAL TABLE...............................................................343 ALTER FILESPACE .........................................................................345 ALTER FUNCTION..........................................................................346 ALTER GROUP...............................................................................349 ALTER INDEX................................................................................350 ALTER LANGUAGE.........................................................................352 ALTER OPERATOR.........................................................................353 ALTER OPERATOR CLASS..............................................................354 ALTER PROTOCOL.........................................................................355 ALTER RESOURCE QUEUE.............................................................357 ALTER ROLE..................................................................................360 ALTER SCHEMA.............................................................................364 ALTER SEQUENCE.........................................................................365 ALTER TABLE ................................................................................368 ALTER TABLESPACE......................................................................380 ALTER TRIGGER............................................................................381 ALTER TYPE...................................................................................382 ALTER USER..................................................................................384 ANALYZE.......................................................................................385 BEGIN...........................................................................................387 CHECKPOINT.................................................................................389 CLOSE...........................................................................................390 Table of Contents x
Description: