ebook img

Advanced SQL with SAS PDF

428 Pages·2022·9.042 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Advanced SQL with SAS

The correct bibliographic citation for this manual is as follows: Schendera, Christian FG. 2022. Advanced SQL with SAS®. Cary, NC: SAS Institute Inc. Advanced SQL with SAS® Copyright © 2022, SAS Institute Inc., Cary, NC, USA ISBN 978-1-955977-90-6 (Hardcover) ISBN 978-1-955977-87-6 (Paperback) ISBN 978-1-955977-88-3 (Web PDF) ISBN 978-1-955977-89-0 (EPUB) All Rights Reserved. Produced in the United States of America. For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc. For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication. The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others’ rights is appreciated. U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4, and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government’s rights in Software and documentation shall be only those set forth in this Agreement. SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414 May 2022 SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. SAS software may be provided with certain third-party software, including but not limited to open-source software, which is licensed under its applicable third-party software license agreement. For license information about third-party software distributed with SAS software, refer to http://support.sas.com/thirdpartylicenses. Contents Preface ................................................................................................................................................................ix About This Book........................................................................................................................................................x What Is SAS, for Me? ...............................................................................................................................................xi Acknowledgments ...................................................................................................................................................xi Acknowledgments for the 2012 Edition ................................................................................................................. xii About This Book .................................................................................................................................................xv What Does This Book Cover? .................................................................................................................................. xv Is This Book for You? ............................................................................................................................................... xv What Should You Know about the Examples? ....................................................................................................... xvi Software Used to Develop the Book’s Content ................................................................................................. xvi Example Code and Data ....................................................................................................................................xvi SAS OnDemand for Academics ......................................................................................................................... xvi We Want to Hear from You ................................................................................................................................... xvii Author Feedback .................................................................................................................................................. xvii About The Author ..............................................................................................................................................xix Chapter 1: Overview ............................................................................................................................................ 1 1.1 Detailed Description of This Book ...................................................................................................................... 1 1.2 Which SAS Language for the CAS Environment? ............................................................................................... 3 Chapter 2: Missing Values .................................................................................................................................... 5 2.1 From the Start: Defining Missing Values ............................................................................................................ 6 2.1.1 Missing Values in Numeric Variables ......................................................................................................... 6 2.1.2 Missing Values in String Variables .............................................................................................................. 7 2.1.3 Example: Test Data for Handling Missing Values ........................................................................................ 7 2.1.4 Defining Missing Entries when Creating Empty Tables .............................................................................. 8 2.1.5 SQL Approach: Creating an Empty Table Including Integrity Constraints for Missing Values .................... 8 2.1.6 DATA Step Approach ................................................................................................................................. 9 2.2 Queries for Missing Values ................................................................................................................................ 9 2.2.1 Query Missing Values in a Numeric Column ........................................................................................... 10 2.2.2 Exclusion of Missing Values from Two Numerical Columns ..................................................................... 10 2.2.3 Query of User-defined Missing Values .................................................................................................... 11 2.2.4 Query of Missing Values in a String Column ............................................................................................ 11 2.2.5 Variables, Accesses, and Possibly Undesired Results of Queries for Missing Values ............................... 11 2.3 Missing Values in Aggregating Functions ......................................................................................................... 12 2.3.1 Counting Existing, Duplicate, and Missing Values with COUNT ............................................................... 13 2.3.2 Variations in Aggregating Functions ......................................................................................................... 13 2.3.3 Adjusting an Aggregating Function Using CASE ...................................................................................... 14 2.3.4 Final Remarks on DISTINCT ...................................................................................................................... 16 2.4 Possibly Undesirable Results with WHERE, GROUP, and ORDER ..................................................................... 17 2.4.1 Unwanted Results with WHERE ............................................................................................................... 17 2.4.2 Undesired Results when Sorting Data ...................................................................................................... 18 2.4.3 Undesirable Results when Grouping Data ............................................................................................... 19 2.4.4 Undesired Results when Joining Tables ................................................................................................... 21 2.5 Possibly Undesirable Results with Joins ........................................................................................................... 21 2.5.1 Examples I: Self-Joins on a Single Table ................................................................................................... 21 2.5.2 Examples II: Missing Values in Multiple Tables ........................................................................................ 23 iv Advanced SQL with SAS 2.6 Searching and Replacing Missing Values ......................................................................................................... 27 2.6.1 Searching for Missing Values (Screening)................................................................................................. 27 2.6.2 Searching and Replacing of Missing Values (Conversion) ........................................................................ 29 2.7 Predicates ....................................................................................................................................................... 37 2.7.1 ALL, ANY, and SOME ................................................................................................................................. 37 2.7.2 EXISTS or NOT EXISTS ............................................................................................................................... 38 2.7.3 IN and NOT IN .......................................................................................................................................... 38 2.7.4 IS NULL, IS NOT NULL, IS MISSING, and IS NOT MISSING ......................................................................... 38 2.7.5 LIKE and NOT LIKE .................................................................................................................................... 39 Chapter 3: Data Quality with PROC SQL ............................................................................................................. 41 3.1 Integrity Constraints and Audit Trails ............................................................................................................... 41 3.1.1 Integrity Constraints (Check Rules) .......................................................................................................... 42 3.1.2 Audit Trails .............................................................................................................................................. 51 3.2 How to Identify and Filter Multiple Values ..................................................................................................... 58 3.2.1 Approach 1: Displaying Duplicate IDs (Univariate) ................................................................................... 60 3.2.2 Approach 2: Filtering of Duplicate Values (HAVING COUNT) ................................................................... 62 3.2.3 Approach 3: Finding Duplicate Values (Multivariate) .............................................................................. 63 3.2.4 Approach 4: Creating Lists for Duplicate Rows (Macro Variable) ............................................................. 63 3.2.5 Approach 5: Checking for Duplicate Entries (Macro) ............................................................................... 65 3.2.6 Approach 6: Identifying Duplicates in Multiple Tables ............................................................................. 66 3.3 Identifying and Filtering Outliers ..................................................................................................................... 66 3.3.1 Approach 1: Checking for Outliers Using Descriptive Statistics ............................................................... 67 3.3.2 Approach 2: Checking for Outliers Using Statistical Tests (David Test) ..................................................... 68 3.3.3 Approach 3: Filtering Outliers Using Conditions ..................................................................................... 69 3.4 Uniformity: Identify, Filter, and Replace Characters ........................................................................................ 71 3.4.1 Approach 1: Checking for Strings in Terms of Longer Character Strings .................................................. 72 3.4.2 Approach 2: Complete, Partially, or Not at All: Checking for Multiple Characters .................................. 74 3.4.3 Approach 3: Details: Checking for Single Characters ............................................................................... 75 Chapter 4: Macro Programming with PROC SQL ................................................................................................. 79 4.1 Macro Variables ............................................................................................................................................... 80 4.1.1 Automatic SAS Macro Variables ............................................................................................................... 81 4.1.2 Automatic SQL Macro Variables ............................................................................................................... 83 4.1.3 User-defined SAS Macro Variables (INTO) ............................................................................................... 86 4.1.4 Macro Variables, INTO and Possible Loss of Precision ............................................................................. 89 4.1.5 The Many Roads Leading to a Macro Variable ........................................................................................ 90 4.2 Macro Programs with PROC SQL ..................................................................................................................... 91 4.2.1 What Are Macro Programs? ..................................................................................................................... 92 4.2.2 Let’s Do It: SAS Macros with the %LET Statement ................................................................................... 95 4.2.3 Listwise Execution of Commands ............................................................................................................. 97 4.2.4 Condition-based Execution of Commands ............................................................................................. 103 4.2.5 Tips for the Use of Macros ..................................................................................................................... 107 4.3 Elements of the SAS Macro Language ........................................................................................................... 109 4.3.1 SAS Macro Functions ............................................................................................................................. 109 4.3.2 SAS Macro Statements .......................................................................................................................... 121 4.3.3 Interfaces I: From the SAS Macro Facility to the DATA Step ................................................................... 127 4.3.4 Interfaces II: From PROC SQL to the SAS Macro Facility (INTO) ............................................................. 133 4.4 Application 1: Rowwise Data Update Including Security Check................................................................................................................................................. 138 4.5 Application 2: Working with Multiple Files (Splitting) ................................................................................... 140 4.5.1 Splitting a Data Set into Uniformly Filtered Subsets (Split Variable is of Type “String”) ..................................................................................................................................... 140 4.5.2 Splitting a Data Set into Uniformly Filtered Subsets (Split Variable is of Type “Numeric”) .................... 142 Contents v 4.6 Application 3: Transposing a SAS Table (Stack and Unstack) ......................................................................... 144 4.6.1 Stack ....................................................................................................................................................... 144 4.6.2 Unstack .................................................................................................................................................. 144 4.6.3 From Stack to Unstack (From 1 to 3) ...................................................................................................... 145 4.6.4 From Unstack to Stack (From 3 to 1) ...................................................................................................... 151 4.7 Application 4: Macros to Retrieve System Information ................................................................................. 155 4.7.1 Query the Contents of SAS tables (VARLIST, VARLIST2) ......................................................................... 155 4.7.2 Searching a SAS Table for a Specific Column (DO_VAR_EX and DO_VAR_EX2) ...................................... 156 4.7.3 Search in the Dictionary.Options (IS_TERM) .......................................................................................... 158 4.8 Application 5: Creating Folders for Data Storage ........................................................................................... 160 4.8.1 Macro MYSTORAGE and Call .................................................................................................................. 161 4.9 Application 6: Consecutive “Exotic” Names for SAS Columns (“2010”, “2011”, ...) ....................................... 161 4.9.1 Sample Output ....................................................................................................................................... 162 4.9.2 Macro EXOTICS ....................................................................................................................................... 162 4.9.3 Explanation ............................................................................................................................................ 163 4.9.4 Sample Call ............................................................................................................................................. 163 4.10 Application 7: Converting Entire Lists of String Variables to Numeric Variables .......................................... 164 4.10.1 Explanation .......................................................................................................................................... 164 4.10.2 Application ........................................................................................................................................... 164 4.10.3 Requesting Outputs Before and After the Conversion (PROC CONTENTS) .......................................... 166 4.10.4 Further Notes on the Six Steps Presented ........................................................................................... 166 Chapter 5: SQL for Geodata ..............................................................................................................................169 5.1 Geographical Data and Distances .................................................................................................................. 169 5.1.1 Distances in Two-Dimensional Space (Basis: Metric Coordinates) ......................................................... 170 5.1.2 Distances in Spherical Space (Basis: Longitudes and Latitudes) ............................................................ 174 5.1.3 SAS Functions for the Calculation of Geodetic Distances Plus Notes about EUCLID .............................. 178 5.2 SQL Queries for Coordinates .......................................................................................................................... 179 5.3 SQL and Maps ................................................................................................................................................ 183 5.3.1 SAS Program (Five Steps) ....................................................................................................................... 184 5.3.2 Notes and Explanations ......................................................................................................................... 186 Chapter 6: Hash Programming as an Alternative to SQL ....................................................................................187 6.1 What Is Hash Programming? ......................................................................................................................... 187 6.1.1 Hash Programming from Three Angles .................................................................................................. 187 6.1.2 Rowwise Explanation of an Easy Introductory Example......................................................................... 189 6.1.3 Step-by-step Variations of the Introductory Example ............................................................................ 191 6.1.4 Sample Data for Benchmark Tests ......................................................................................................... 193 6.2 Working with One Table ................................................................................................................................ 194 6.2.1 Aggregating with SQL, SUMMARY, and Hash ......................................................................................... 194 6.2.2 Sorting with SORT, SQL, and Hash .......................................................................................................... 198 6.2.3 Subsetting: Filtering and Random Sampling .......................................................................................... 199 6.2.4 Eliminating Duplicate Keys ..................................................................................................................... 203 6.2.5 Querying Values (Retrieval) .................................................................................................................... 205 6.3 Working with Multiple Tables ........................................................................................................................ 209 6.3.1 Inner and Outer Joins with Two Tables .................................................................................................. 209 6.3.2 Fuzzy Join with Two Tables ..................................................................................................................... 213 6.3.3 Splitting of an Unsorted SAS Table ......................................................................................................... 218 6.4 Overview: Elements of Hash Programming ................................................................................................... 220 Chapter 7: FedSQL ............................................................................................................................................223 7.1 Benefits and Specifics of FedSQL Language ................................................................................................... 225 7.1.1 FedSQL Compared to PROC SQL ............................................................................................................ 226 7.1.2 Specifics of FedSQL: SAS versus CAS ...................................................................................................... 231 vi Advanced SQL with SAS 7.1.3 FEDSQL Syntax: Outline and Supported Statements .............................................................................. 233 7.1.4 The FedSQL Environment: Components of SAS Viya: SAS and CAS ........................................................ 234 7.2 Using FEDSQL Syntax in SAS and CAS ............................................................................................................. 235 7.2.1 FEDSQL on SAS: Replacing SQL with FEDSQL Successfully ..................................................................... 236 7.2.2 FEDSQL on SAS: Replacing PROC SQL with FEDSQL with Tuning ............................................................ 238 7.2.3 Using FedSQL in CAS: PROC FEDSQL and PROC CAS ............................................................................. 245 7.2.4 Not Just FEDSQL on CAS: DATA Step and Others ................................................................................... 256 7.2.5 CAS Procedures: Statistics, Data Mining, and Machine Learning .......................................................... 258 7.3 FEDSQL in DS2 Programs ............................................................................................................................... 259 7.3.1 Building Blocks of DS2 ............................................................................................................................ 259 7.3.2 Applications of DS2 Programs ................................................................................................................ 260 7.4 FEDSQL Syntax (Details) ................................................................................................................................. 270 7.4.1 Outline of FEDSQL Syntax ...................................................................................................................... 271 7.4.2 Short Overview of FEDSQL Statements .................................................................................................. 271 7.4.3 Short Description of FEDSQL Connection Options ................................................................................. 272 7.4.4 Short Description of FEDSQL Processing Options .................................................................................. 273 7.4.5 Attributes for the CONN= Option (Connection Option) ......................................................................... 274 7.4.6 Parameters for the CNTL= Option (Processing Option) ......................................................................... 278 7.5 Summary: When to Use FedSQL and When to Use SQL ................................................................................ 279 Chapter 8: Performance and Efficiency .............................................................................................................281 8.1 Introduction to Performance and Efficiency .................................................................................................. 281 8.1.1 The Price of Performance (Efficiency) .................................................................................................... 282 8.1.2 The Need for Speed: Considerations and Tips for Better Performance ................................................. 282 8.1.3 A Strategy as a SAS Program .................................................................................................................. 283 8.2 Less is More: Narrowing Large Tables Down to the Essential ........................................................................ 284 8.2.1 Reduce Columns: KEEP, DROP, and SELECT Statements ......................................................................... 285 8.2.2 Reduce Rows: WHERE Statement, Subsetting IF, and Data Set Options ................................................ 286 8.2.3 Reduce Tables by Deleting (DELETE) ...................................................................................................... 287 8.2.4 Reduce Structures: Transposing Tables .................................................................................................. 287 8.2.5 Eliminate Unnecessary Characters by Means of Tag Sets ...................................................................... 288 8.2.6 Aggregate ............................................................................................................................................... 288 8.3 Squeeze Even More Air Out of Data: Shortening and Compression .............................................................. 289 8.3.1 Shortening the Variable Length (LENGTH Statement)............................................................................ 289 8.3.2 To Compress or Not Compress? COMPRESS Function ........................................................................... 290 8.4 Sorting? The Fewer the Better ....................................................................................................................... 291 8.4.1 No Sorting: Do Without ORDER BY ........................................................................................................ 291 8.4.2 Minimal Sorting Using CREATE INDEX: Indexing Instead of Sorting ....................................................... 292 8.4.3 Minimal Reading (BY Processing) .......................................................................................................... 294 8.4.4 Sorting Supported by Hardware (Multi-Threading) ............................................................................... 294 8.5 Accelerate: Special Tricks for Special Occasions (SQL and More) ............................................................................................................................................... 295 8.5.1 Fundamental Techniques ....................................................................................................................... 295 8.5.2 SQL-specific Techniques ......................................................................................................................... 297 8.5.3 Other Techniques ................................................................................................................................... 300 8.6 Data Processing in SAS or in the DBMS: Tuning of SQL to DBMS ................................................................................................................................................... 301 8.6.1 Transfer of SQL to the DBMS .................................................................................................................. 301 8.6.2 Tracing of SQL Passed to the DBMS (SASTRACE) .................................................................................... 302 8.6.3 PROC SQL Operations that SAS can Pass to the DBMS ........................................................................... 302 8.6.4 Passing an SQL Statement Directly to a DBMS (DIRECT_EXE) ............................................................... 304 8.6.5 More Tips and Tricks for Performance: VIO Method ............................................................................. 304 8.7 Step by Step to Greater Performance: Performance as a Strategy ................................................................ 305 8.7.1 Optimization of I/O ................................................................................................................................ 305 8.7.2 Optimization of Memory Usage ............................................................................................................. 305 Contents vii 8.7.3 Optimization of CPU Performance ......................................................................................................... 305 8.7.4 Plan the Program, Program the Plan, Test the Program ......................................................................... 306 8.7.5 Go Beyond SQL and Programming ......................................................................................................... 306 8.8 Beyond SQL and Programming: Environments .............................................................................................. 307 8.8.1 SAS Programming on a SAS Grid ............................................................................................................ 307 8.8.2 SAS Cloud Analytic Services (CAS) ......................................................................................................... 308 Chapter 9: Tips, Tricks, and More ......................................................................................................................309 9.1 Runtime as the Key to Performance .............................................................................................................. 309 9.1.1 Overview ................................................................................................................................................ 310 9.1.2 Scenario I: Default Setting: CPU Time versus Real time ......................................................................... 310 9.1.3 Scenario II: FULLSTIMER Option: CPU time, Real time, and Memory .................................................... 311 9.1.4 Scenario III: Aggregating Calculation of the Runtime (Macro) ............................................................... 312 9.1.5 Scenario IV: Differentiated Calculation of the Runtime (ARM Macros) ................................................. 313 9.2 To Be in the Know: SAS Dictionaries .............................................................................................................. 317 9.2.1 Query a Dictionary as a Table and View ................................................................................................. 319 9.2.2 Refining the Query of a Dictionary (WHERE) ......................................................................................... 319 9.2.3 Example 1: Finding the Storage Location of Certain Variables (Macro WHR_IS_VAR) ........................... 320 9.2.4 Example 2: Query the Number of Rows in Specific Tables (Macro N_ROWS) ........................................ 322 9.2.5 Example 3: Renaming Complete Variable Lists Using Dictionaries ........................................................ 322 9.3 Data Handling and Data Structuring .............................................................................................................. 323 9.3.1 Topic 1: Creating “Exotic” Column Names ............................................................................................. 324 9.3.2 Topic 2: Creating a Primary Key (MONOTONIC and Safer Options) ........................................................ 324 9.3.3 Topic 3: Analyze and Structure: Segmenting a SAS Table (MOD Function) ............................................ 326 9.3.4 Topic 4: Defining a Tag Set for the Export of SAS Tables into CSV Format (PROC TEMPLATE) ................ 328 9.3.5 Topic 5: Protecting Contents of SAS Tables (Passwords) ........................................................................ 329 9.4 Updating Tables (SQL versus DATA Step) ....................................................................................................... 330 9.4.1 Scenario I: MASTER/UPDATE without Multiple IDs: Problem: System-defined Missing Values ........... 331 9.4.2 Scenario II: MASTER/UPDATE without Multiple IDs: Problem: User-defined Missing Values ............. 332 9.4.3 Scenario III: MASTER/UPDATE with Multiple IDs: Problems with Multiple IDs ..................................... 334 Chapter 10: SAS Syntax—PROC SQL, SAS Functions, and SAS CALL Routines ......................................................339 10.1 PROC SQL Syntax Overview ......................................................................................................................... 339 10.1.1 Schema of PROC SQL Syntax ................................................................................................................ 339 10.1.2 Short Description of PROC SQL Options ............................................................................................... 341 10.1.3 Selected SAS System Options for PROC SQL ........................................................................................ 344 10.2 SAS Functions and CALL Routines (Overview) ............................................................................................. 345 10.2.1 A Small Selection of SAS Functions and SAS CALL Routines ................................................................. 346 10.2.2 Quick Finder: Categories of SAS Functions and CALL Routines ............................................................ 347 10.2.3 Categories and Descriptions of SAS Functions and SAS CALL Routines ................................................ 348 10.3 Pass-Through Facility (Features) .................................................................................................................. 370 10.3.1 Implicit Pass-Through: Options ............................................................................................................ 373 10.3.2 Explicit Pass-Through: Statements and Component ............................................................................ 373 10.3.3 Explicit Pass-Through: Examples for DBMS-specific Features .............................................................. 377 References .......................................................................................................................................................383 Syntax Index .....................................................................................................................................................385 Subject Index ....................................................................................................................................................401 Preface* SQL (Structured Query Language) is the programming language for the definition, query, and manipulation of very large data sets in relational databases. Worldwide. SQL is a quasi-industry standard. In June 2010, a web search for the term “SQL” with Google resulted in 135 million hits. In February 2021, the same Google search found 307 million hits. SQL has been a standard at ANSI (American National Standards Institute) since 1986 and at ISO since 1987. With PROC SQL, it is possible to query data from tables or views; create tables, views, new variables, and indexes; merge data from tables or views; create summary parameters or statistics; perform complex calculations (new variables); update values in SAS tables; or create and design reports with the Output Delivery System (ODS) (Schendera, 2011). This volume shows what else PROC SQL can do. There are areas where PROC SQL goes far beyond the ANSI standard, for example: • PROC SQL handles missing values differently than the ANSI standard. While entries represent the presence of information, missing values indicate the opposite, the absence of information. Chapter 2 presents the handling of missing values: defining (system- and user-defined), querying, as well as searching and replacing alphanumerical missing values. • Automatic data integrity checks using integrity constraints and audit trails, as well as multiple value checks, outliers, filters, and unification of undesired strings (Chapter 3). • PROC SQL also supports the use of the SAS macro language. This difference is of such importance that Chapter 4, the most extensive chapter, is dedicated to the interaction between SQL and the SAS Macro Facility. This chapter presents how to automate and accelerate processes using macro variables (automatic, user-defined) and macro programs including the listwise execution of commands and the use of loops. There are some other “SAS extras” that go beyond the ANSI standard. However, no separate chapter covers them. These include countless SAS functions and function calls (see the overview in Subsection 10.2.), and other functionalities like CALCULATED, Boolean expressions, or remerging. Chapters 6 through 9 extend PROC SQL focusing on performance and efficiency, which are covered from multiple angles: programming languages (hash objects, FedSQL, and DS2), programming environments (client/server, grid, or cloud), or special techniques (for example, in-memory or distributed processing). SAS syntax for PROC SQL, SAS functions and function calls, and DBMS accesses are covered in Chapter 10. PROC SQL supports many more evaluation functions than required by the ANSI standard. (See Chapter 10.) In addition to functionality and programming, Base SAS users might need to get used to a slightly different terminology. (See Table 0.1.) In Base SAS, for example, a data file is called a SAS file, a data line is called an observation, and a data column is called a variable. In PROC SQL, however, a data file is called a table, a data row is called a row, and a data column is called a column. The names in RDBMS are again slightly different. Other models or modeling languages (ERM, UML) use different terminology. Table 0.1 Terminology in PROC SQL, Base SAS, and RDBMS Structure/Language SQL Base SAS RDBMS Data File Table SAS Data Set Relation Data Row Row Observation / Record Tuple / Record Data Column Column Variable Attribute / Field Data Cell Value Value (Attribute) Value x Advanced SQL with SAS Base SAS and PROC SQL elements are often used simultaneously in the same SAS program as this book will show in many places. Experience has shown that a strict terminological differentiation is therefore unnecessary. Although this book prefers SQL terminology, it will also use Base SAS terminology. This book therefore uses the terms “table”, “SAS file”, or “data set” interchangeably for a data set. In contrast to the RDBMS term, “data set” refers to the totality of all rows (even if a file should only contain one row). PROC SQL creates empty tables, views, or “only” performs queries. What is the difference between tables, views, and queries? • A table is nothing more than a SAS data set. SAS tables are often a consequence of queries. A SAS file (a) describes data, (b) contains data, and (c) requires storage space. The last two points are important; they serve to delimit views. • A view (a) describes data, (b) contains no data itself, and therefore (c) does not require any storage space. SAS views are only a view on data or results, without having to physically contain the data itself. Views and tables have in common that they are each a result of a query or a sequence of queries. Tables and views could also be rewritten as “stored queries,” for example, as the result of a SELECT statement or more complex queries. • Queries are a collection of conditions (for example, “take all data from table A”, “query only rows with a value in column X greater than 5 from view B”, and so on). Queries in their basic form only query data and write their result immediately to the SAS output. Depending on the programming (including the addition of CREATE, AS, and QUIT), queries can output the result of their query in (a) tables, (b) views, (c) directly to the SAS output, or (d) to SAS macro variables. Queries can also contain one or more subqueries. A subquery is a query (in parentheses) that is embedded within another query. If there are several subqueries, the innermost one is executed first followed by all the others in the direction of the outermost query. All concepts assume that a data row, observation, or tuple must be uniquely identifiable by at least one so-called key (also called a primary key or ID). A key is generally unique and thus uniquely identifies a row in a table, not its relative position in the table, which can always change by sorting. A key is generally used less for describing a data row than for managing it. A primary key is usually used to identify the rows in your own table, a foreign key is used to identify the rows in another table, usually its primary key. Keys are essential for joining two or more tables and therefore must not be deleted or changed. These remarks so far about joins, scenarios, and keys were mainly concerned with data in the sense of valid or existing values. A whole chapter of this book is dedicated to the absence of information, the handling of missing values (also called NULL values, NULL, or missings). In contrast to the ANSI standard, the handling of missing values is based on a different definition. In the ANSI standard, expressions such as “5 > NULL”, “0 > NULL” or “-5 > NULL” become NULL. In Boolean and comparison operators, however, these and other expressions become true, and this is also true when working with SAS. About This Book This book provides an in-depth look at working with SQL from SAS. SQL is, especially in the SAS variant, a very powerful programming language. This book presents the topics of missing values, data quality including integrity constraints and audit trails, macro variables and programs, the interplay of PROC SQL with geodata and even spherical distances, performance and efficiency, alternative programming languages, numerous other aids, tips and tricks, or the use of the numerous SAS functions and function calls. From time to time, users are explicitly warned of potentially undesirable results. Undesirable results when sorting, grouping, or joining tables with missing values often occur when one does not know how PROC SQL or SAS work in detail. This book is intended as an in-depth consolidation and extension of ANSI SQL and is intended for advanced users. The chapters are organized so that they each illustrate different facets of programming and using PROC SQL. By the end of this book, you should be an advanced user of PROC SQL, capable of thinking conceptually beyond SQL topics and implementing data quality, performance, and automation including integrity constraints, performance

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.