ebook img

Introduction to Regular Expressions in SAS PDF

120 Pages·2014·3.171 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Introduction to Regular Expressions in SAS

Introduction to Regular Expressions in SAS® K. Matthew Windham support.sas.com/bookstore The correct bibliographic citation for this manual is as follows: Windham, K. Matthew. 2014. Introduction to Regular Expressions in SAS®. Cary, NC: SAS Institute Inc. Introduction to Regular Expressions in SAS® Copyright © 2014, SAS Institute Inc., Cary, NC, USA ISBN 978-1-61290-904-2 (Hardcopy) ISBN 978-1-62959-498-9 (EPUB) ISBN 978-1-62959-499-6 (MOBI) ISBN 978-1-62959-500-9 (PDF) All rights reserved. Produced in the United States of America. For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc. For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication. The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others’ rights is appreciated. U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a) and DFAR 227.7202-4 and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government's rights in Software and documentation shall be only those set forth in this Agreement. SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414. December 2014 SAS provides a complete selection of books and electronic products to help customers use SAS® software to its fullest potential. For more information about our offerings, visit support.sas.com/bookstore or call 1-800-727-0025. SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Contents About This Book ........................................................................................ vii About The Author ...................................................................................... xi Acknowledgments .................................................................................... xiii Chapter 1: Introduction .............................................................................. 1 1.1 Purpose of This Book .............................................................................................................. 1 1.2 Layout of This Book ................................................................................................................. 1 1.3 Defining Regular Expressions ................................................................................................. 2 1.4 Motivational Examples ............................................................................................................ 3 1.4.1 Extract, Transform, and Load (ETL) .............................................................................. 3 1.4.2 Data Manipulation .......................................................................................................... 4 1.4.3 Data Enrichment ............................................................................................................. 5 Chapter 2: Getting Started with Regular Expressions ................................. 9 2.1 Introduction ............................................................................................................................ 10 2.1.1 RegEx Test Code .......................................................................................................... 11 2.2 Special Characters ................................................................................................................. 13 2.3 Basic Metacharacters ............................................................................................................ 15 2.3.1 Wildcard ......................................................................................................................... 15 2.3.2 Word ............................................................................................................................... 15 2.3.3 Non-word ....................................................................................................................... 16 2.3.4 Tab.................................................................................................................................. 16 2.3.5 Whitespace .................................................................................................................... 17 2.3.6 Non-whitespace ............................................................................................................ 17 2.3.7 Digit ................................................................................................................................ 17 2.3.8 Non-digit ........................................................................................................................ 18 2.3.9 Newline .......................................................................................................................... 18 2.3.10 Bell ............................................................................................................................... 19 iv 2.3.11 Control Character ....................................................................................................... 20 2.3.12 Octal ............................................................................................................................. 20 2.3.13 Hexadecimal ................................................................................................................ 21 2.4 Character Classes .................................................................................................................. 21 2.4.1 List .................................................................................................................................. 21 2.4.2 Not List........................................................................................................................... 22 2.4.3 Range ............................................................................................................................. 22 2.5 Modifiers ................................................................................................................................. 23 2.5.1 Case Modifiers .............................................................................................................. 23 2.5.2 Repetition Modifiers ..................................................................................................... 25 2.6 Options .................................................................................................................................... 32 2.6.1 Ignore Case ................................................................................................................... 32 2.6.2 Single Line ..................................................................................................................... 32 2.6.3 Multiline ......................................................................................................................... 33 2.6.4 Compile Once ................................................................................................................ 33 2.6.5 Substitution Operator ................................................................................................... 34 2.7 Zero-width Metacharacters .................................................................................................. 34 2.7.1 Start of Line ................................................................................................................... 35 2.7.2 End of Line ..................................................................................................................... 35 2.7.3 Word Boundary ............................................................................................................. 35 2.7.4 Non-word Boundary ..................................................................................................... 36 2.7.5 String Start .................................................................................................................... 36 2.8 Summary ................................................................................................................................. 37 Chapter 3: Using Regular Expressions in SAS ........................................... 39 3.1 Introduction ............................................................................................................................ 39 3.1.1 Capture Buffer............................................................................................................... 39 3.2 Built-in SAS Functions ........................................................................................................... 40 3.2.1 PRXPARSE .................................................................................................................... 40 3.2.2 PRXMATCH ................................................................................................................... 42 3.2.3 PRXCHANGE ................................................................................................................. 43 3.2.4 PRXPOSN ...................................................................................................................... 46 3.2.5 PRXPAREN .................................................................................................................... 47 v 3.3 Built-in SAS Call Routines ..................................................................................................... 49 3.3.1 CALL PRXCHANGE ...................................................................................................... 50 3.3.2 CALL PRXPOSN ............................................................................................................ 54 3.3.3 CALL PRXSUBSTR ....................................................................................................... 56 3.3.4 CALL PRXNEXT ............................................................................................................ 57 3.3.5 CALL PRXDEBUG ......................................................................................................... 59 3.3.6 CALL PRXFREE ............................................................................................................. 62 3.4 Summary ................................................................................................................................. 63 Chapter 4: Applications of Regular Expressions in SAS ............................ 65 4.1 Introduction ............................................................................................................................ 65 4.1.1 Random PII Generator ................................................................................................. 66 4.2 Data Cleansing and Standardization .................................................................................... 72 4.3 Information Extraction ........................................................................................................... 77 4.4 Search and Replacement ...................................................................................................... 80 4.5 Summary ................................................................................................................................. 83 4.5.1 Start Small ..................................................................................................................... 83 4.5.2 Think Big ........................................................................................................................ 83 Appendix A: Perl Version Notes ................................................................ 85 Appendix B: ASCII Code Lookup Tables .................................................... 87 Non-Printing Characters ............................................................................................................. 87 Printing Characters ...................................................................................................................... 89 Appendix C: POSIX Metacharacters .......................................................... 97 Index ...................................................................................................... 101 vi About This Book Purpose This book is intended for a wide audience of SAS users, from novice programmer to the very advanced. As not much has previously been published on this topic, many different skill levels can benefit from the content herein. However, the book has been written to ensure that novice programmers can immediately implement every element discussed. Is This Book for You? Of course, it is! Do you wish you could process unstructured data sources? Would you like to more effectively process semi-structured data sources? Do you want to one day leverage advanced text mining concepts within your Base SAS code? Of course, you do! This book lays the foundation for all of this and more, making it the ideal text for anyone wanting to enhance their programming prowess. Prerequisites Readers should be comfortable using and applying the SAS DATA step, basic PROCs (e.g., PROC PRINT), DO loops, and conditional processing concepts. Readers should be familiar with SAS arrays and the RETAIN statement. Scope of This Book This book covers all PRX functions and call routines. This book does NOT cover advanced concepts requiring MACRO programming, PROC SQL, or system automation. About the Examples Software Used to Develop the Book's Content Base SAS (Microsoft Windows) viii Example Code and Data You can access the example code and data for this book by linking to its author page at http://support.sas.com/publishing/authors. Select the name of the author. Then, look for the cover thumbnail of this book, and select Example Code and Data to display the SAS programs that are included in this book. For an alphabetical listing of all books for which example code and data is available, see http://support.sas.com/bookcode. Select a title to display the book’s example code. If you are unable to access the code through the website, e-mail [email protected]. Output and Graphics Used in This Book All output used in this book was generated via the SAS log and PROC PRINT. Additional Help Although this book illustrates many analyses regularly performed in businesses across industries, questions specific to your aims and issues may arise. To fully support you, SAS Institute and SAS Press offer you the following help resources: • About topics covered in this book, contact the author through SAS Press: ◦ Send questions by e-mail to [email protected]; include the book title in your correspondence. ◦ Submit feedback on the author’s page at http://support.sas.com/author_feedback. • About topics in or beyond this book, post questions to the relevant SAS Support Communities at https://communities.sas.com/welcome. • SAS Institute maintains a comprehensive website with up-to-date information. One page that is particularly useful to both the novice and the seasoned SAS user is its Knowledge Base. Search for relevant notes in the “Samples and SAS Notes” section of the Knowledge Base at http://support.sas.com/resources. • Registered SAS users or their organizations can access SAS Customer Support at http://support.sas.com. Here you can pose specific questions to SAS Customer Support: Under Support, click Submit a Problem. You will need to provide an e-mail address to which replies can be sent, identify your organization, and provide a customer site number or license information. This information can be found in your SAS logs. ix Keep in Touch We look forward to hearing from you. We invite questions, comments, and concerns. If you want to contact us about a specific book, please include the book title in your correspondence to [email protected]. To Contact the Author through SAS Press By e-mail: [email protected] Via the Web: http://support.sas.com/author_feedback SAS Books For a complete list of books available through SAS, visit http://support.sas.com/bookstore. Phone: 1-800-727-0025 E-mail: [email protected] SAS Book Report Receive up-to-date information about all new SAS publications via e-mail by subscribing to the SAS Book Report monthly eNewsletter. Visit http://support.sas.com/sbr. Publish with SAS SAS is recruiting authors! Are you interested in writing a book? Visit http://support.sas.com/saspress for more information. x

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.