Effective awk Programming
FOURTH EDITION
A GNU Manual

When processing text files, the awk language is ideal for handling data extraction, reporting, and data-reformatting jobs. This practical guide serves as both a reference and tutorial for POSIX-standard awk and for the GNU implementation, called gawk. This book is useful for novices and awk experts alike.

In this thoroughly revised edition, author and gawk lead developer Arnold Robbins describes the awk language and gawk program in detail, shows you how to use awk and gawk for problem solving, and then dives into specific features of gawk. System administrators, programmers, webmasters, and other power users will find everything they need to know about awk and gawk. You will learn how to:

■ Format text and use regular expressions in awk and gawk
■ Process data using awk's operators and built-in functions
■ Manage data relationships using associative arrays
■ Define your own functions
■ “Think in awk” with two full chapters of sample functions and programs
■ Take advantage of gawk's many advanced features
■ Debug awk programs with the gawk built-in debugger

“Arnold has distilled over two and a half decades of experience writing and using awk programs, and developing gawk, into this book. If you use awk or want to learn how, then read this book.”
Michael Brennan, author of mawk

This book is published under the terms of the GNU Free Documentation License. You have the freedom to copy and modify this GNU manual. Royalties from the sales of this book go to the Free Software Foundation and to the author.

Arnold Robbins, a professional programmer and technical author, has worked with Unix systems since 1980 and has used awk since 1987. Arnold is the maintainer of gawk and its documentation. As a member of the POSIX 1003.2 balloting group, he helped shape the POSIX standard for awk.
UNIVERSAL TEXT PROCESSING AND PATTERN MATCHING
TEXT PROCESSING
Twitter: @oreillymedia
facebook.com/oreilly
US $44.99 CAN $51.99
ISBN: 978-1-491-90461-9

FOURTH EDITION
Effective awk Programming
Arnold Robbins

Effective awk Programming, Fourth Edition
by Arnold Robbins

Copyright © 1989, 1991, 1992, 1993, 1996–2005, 2007, 2009–2015 Free Software Foundation, Inc. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. Phone: (617) 542-5942, Fax: (617) 542-2652, Email: [email protected], URL: http://www.gnu.org

Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being “GNU General Public License”, with the Front-Cover Texts being “A GNU Manual”, and with the Back-Cover Texts as in (a) below. A copy of the license may be found on the Internet at the GNU Project’s web site: http://www.gnu.org/software/gawk/manual/html_node/GNU-Free-Documentation-License.html.

a. The FSF’s Back-Cover Text is: “You have the freedom to copy and modify this GNU manual.”

The GNU Free Documentation License does not apply to or include the O’Reilly Media, Inc. trademarks or trade dress.
Editors: Andy Oram and Heather Scherer
Production Editor: Colleen Lobner
Copyeditor: Jasmine Kwityn
Proofreader: Rachel Head
Indexer: Ellen Troutman-Zaig
Cover Designer: Ellie Volckhausen
Interior Designer: David Futato
Illustrator: Rebecca Demarest

March 2015: Fourth Edition

Revision History for the Fourth Edition:
2015-02-27: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781491904619 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. The cover image and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

ISBN: 978-1-491-90461-9
[LSI]

To my parents, for their love, and for the wonderful example they set for me.

To my wife Miriam, for making me complete. Thank you for building your life together with me.

To our children Chana, Rivka, Nachum, and Malka, for enrichening our lives in innumerable ways.

Table of Contents

Foreword to the Third Edition
Foreword to the Fourth Edition
Preface

Part I. The awk Language

1. Getting Started with awk
    How to Run awk Programs
        One-Shot Throwaway awk Programs
        Running awk Without Input Files
        Running Long Programs
        Executable awk Programs
        Comments in awk Programs
        Shell Quoting Issues
    Datafiles for the Examples
    Some Simple Examples
    An Example with Two Rules
    A More Complex Example
    awk Statements Versus Lines
    Other Features of awk
    When to Use awk
    Summary

2. Running awk and gawk
    Invoking awk
    Command-Line Options
    Other Command-Line Arguments
    Naming Standard Input
    The Environment Variables gawk Uses
        The AWKPATH Environment Variable
        The AWKLIBPATH Environment Variable
        Other Environment Variables
    gawk’s Exit Status
    Including Other Files into Your Program
    Loading Dynamic Extensions into Your Program
    Obsolete Options and/or Features
    Undocumented Options and Features
    Summary

3. Regular Expressions
    How to Use Regular Expressions
    Escape Sequences
    Regular Expression Operators
    Using Bracket Expressions
    How Much Text Matches?
    Using Dynamic Regexps
    gawk-Specific Regexp Operators
    Case Sensitivity in Matching
    Summary

4. Reading Input Files
    How Input Is Split into Records
        Record Splitting with Standard awk
        Record Splitting with gawk
    Examining Fields
    Nonconstant Field Numbers
    Changing the Contents of a Field
    Specifying How Fields Are Separated
        Whitespace Normally Separates Fields
        Using Regular Expressions to Separate Fields
        Making Each Character a Separate Field
        Setting FS from the Command Line
        Making the Full Line Be a Single Field
        Field-Splitting Summary
    Reading Fixed-Width Data
    Defining Fields by Content
    Multiple-Line Records
    Explicit Input with getline
        Using getline with No Arguments
        Using getline into a Variable
        Using getline from a File
        Using getline into a Variable from a File
        Using getline from a Pipe
        Using getline into a Variable from a Pipe
        Using getline from a Coprocess
        Using getline into a Variable from a Coprocess
        Points to Remember About getline
        Summary of getline Variants
    Reading Input with a Timeout
    Directories on the Command Line
    Summary

5. Printing Output
    The print Statement
    print Statement Examples
    Output Separators
    Controlling Numeric Output with print
    Using printf Statements for Fancier Printing
        Introduction to the printf Statement
        Format-Control Letters
        Modifiers for printf Formats
        Examples Using printf
    Redirecting Output of print and printf
    Special Files for Standard Preopened Data Streams
    Special Filenames in gawk
        Accessing Other Open Files with gawk
        Special Files for Network Communications
        Special Filename Caveats
    Closing Input and Output Redirections
    Summary

6. Expressions
    Constants, Variables, and Conversions
        Constant Expressions
        Using Regular Expression Constants
        Variables
        Conversion of Strings and Numbers
    Operators: Doing Something with Values
        Arithmetic Operators
        String Concatenation
        Assignment Expressions
        Increment and Decrement Operators
    Truth Values and Conditions
        True and False in awk
        Variable Typing and Comparison Expressions
        Boolean Expressions
        Conditional Expressions
    Function Calls
    Operator Precedence (How Operators Nest)
    Where You Are Makes a Difference
    Summary

7. Patterns, Actions, and Variables
    Pattern Elements
        Regular Expressions as Patterns
        Expressions as Patterns
        Specifying Record Ranges with Patterns
        The BEGIN and END Special Patterns
        The BEGINFILE and ENDFILE Special Patterns
        The Empty Pattern
    Using Shell Variables in Programs
    Actions
    Control Statements in Actions
        The if-else Statement
        The while Statement
        The do-while Statement
        The for Statement
        The switch Statement
        The break Statement
        The continue Statement
        The next Statement
        The nextfile Statement
        The exit Statement
    Predefined Variables
        Built-in Variables That Control awk
        Built-in Variables That Convey Information
        Using ARGC and ARGV
    Summary

8. Arrays in awk
    The Basics of Arrays
        Introduction to Arrays
        Referring to an Array Element
        Assigning Array Elements
        Basic Array Example
Description: