"Fossies" - the Fresh Open Source Software Archive

Member "gawk-5.1.0/doc/gawktexi.in" (13 Apr 2020, 1628077 Bytes) of package /linux/misc/gawk-5.1.0.tar.xz:


\input texinfo   @c -*-texinfo-*-
@c vim: filetype=texinfo
@c %**start of header (This is for running Texinfo on a region.)
@setfilename gawk.info
@settitle The GNU Awk User's Guide
@c %**end of header (This is for running Texinfo on a region.)

@dircategory Text creation and manipulation
@direntry
* Gawk: (gawk).                 A text scanning and processing language.
@end direntry
@dircategory Individual utilities
@direntry
* awk: (gawk)Invoking Gawk.                     Text scanning and processing.
@end direntry

@ifset FOR_PRINT
@tex
\gdef\xrefprintnodename#1{``#1''}
@end tex
@end ifset

@ifclear FOR_PRINT
@c With early 2014 texinfo.tex, restore PDF links and colors
@tex
\gdef\linkcolor{0.5 0.09 0.12} % Dark Red
\gdef\urlcolor{0.5 0.09 0.12} % Also
\global\urefurlonlylinktrue
@end tex
@end ifclear

@ifnotdocbook
@set BULLET @bullet{}
@set MINUS @minus{}
@end ifnotdocbook

@ifdocbook
@set BULLET
@set MINUS
@end ifdocbook

@iftex
@set TIMES @times
@end iftex
@ifnottex
@set TIMES *
@end ifnottex

@c Let texinfo.tex give us full section titles
@xrefautomaticsectiontitle on

@c The following information should be updated here only!
@c This sets the edition of the document, the version of gawk it
@c applies to and all the info about who's publishing this edition

@c These apply across the board.
@set UPDATE-MONTH March, 2020
@set VERSION 5.1
@set PATCHLEVEL 0

@set GAWKINETTITLE TCP/IP Internetworking with @command{gawk}
@ifset FOR_PRINT
@set TITLE Effective awk Programming
@end ifset
@ifclear FOR_PRINT
@set TITLE GAWK: Effective AWK Programming
@end ifclear
@set SUBTITLE A User's Guide for GNU Awk
@set EDITION 5.1

@iftex
@set DOCUMENT book
@set CHAPTER chapter
@set APPENDIX appendix
@set SECTION section
@set SUBSECTION subsection
@set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}}
@set COMMONEXT (c.e.)
@set PAGE page
@end iftex
@ifinfo
@set DOCUMENT Info file
@set CHAPTER major node
@set APPENDIX major node
@set SECTION minor node
@set SUBSECTION node
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@set PAGE screen
@end ifinfo
@ifhtml
@set DOCUMENT Web page
@set CHAPTER chapter
@set APPENDIX appendix
@set SECTION section
@set SUBSECTION subsection
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@set PAGE screen
@end ifhtml
@ifdocbook
@set DOCUMENT book
@set CHAPTER chapter
@set APPENDIX appendix
@set SECTION section
@set SUBSECTION subsection
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@set PAGE page
@end ifdocbook
@ifxml
@set DOCUMENT book
@set CHAPTER chapter
@set APPENDIX appendix
@set SECTION section
@set SUBSECTION subsection
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@set PAGE page
@end ifxml
@ifplaintext
@set DOCUMENT book
@set CHAPTER chapter
@set APPENDIX appendix
@set SECTION section
@set SUBSECTION subsection
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@set PAGE page
@end ifplaintext

@ifdocbook
@c empty on purpose
@set PART1
@set PART2
@set PART3
@set PART4
@end ifdocbook

@ifnotdocbook
@set PART1 Part I:@*
@set PART2 Part II:@*
@set PART3 Part III:@*
@set PART4 Part IV:@*
@end ifnotdocbook

@c some special symbols
@iftex
@set LEQ @math{@leq}
@set PI @math{@pi}
@end iftex
@ifdocbook
@set LEQ @inlineraw{docbook, &le;}
@set PI @inlineraw{docbook, &pgr;}
@end ifdocbook
@ifnottex
@ifnotdocbook
@set LEQ <=
@set PI @i{pi}
@end ifnotdocbook
@end ifnottex

@ifnottex
@ifnotdocbook
@macro ii{text}
@i{\text\}
@end macro
@end ifnotdocbook
@end ifnottex

@ifdocbook
@macro ii{text}
@inlineraw{docbook,<lineannotation>\text\</lineannotation>}
@end macro
@end ifdocbook

@ifclear FOR_PRINT
@set FN file name
@set FFN File name
@set DF data file
@set DDF Data file
@set PVERSION version
@end ifclear
@ifset FOR_PRINT
@set FN filename
@set FFN Filename
@set DF datafile
@set DDF Datafile
@set PVERSION version
@end ifset

@c For HTML, spell out email addresses, to avoid problems with
@c address harvesters for spammers.
@ifhtml
@macro EMAIL{real,spelled}
``\spelled\''
@end macro
@end ifhtml
@ifnothtml
@macro EMAIL{real,spelled}
@email{\real\}
@end macro
@end ifnothtml

@c Indexing macros
@ifinfo

@macro cindexawkfunc{name}
@cindex @code{\name\}
@end macro

@macro cindexgawkfunc{name}
@cindex @code{\name\}
@end macro

@end ifinfo

@ifnotinfo

@macro cindexawkfunc{name}
@cindex @code{\name\()} function
@end macro

@macro cindexgawkfunc{name}
@cindex @code{\name\()} function (@command{gawk})
@end macro
@end ifnotinfo

@ignore
Some comments on the layout for TeX.
1. Use at least texinfo.tex 2016-02-05.07.
@end ignore

@c merge the function and variable indexes into the concept index
@ifinfo
@synindex fn cp
@synindex vr cp
@end ifinfo
@iftex
@syncodeindex fn cp
@syncodeindex vr cp
@end iftex
@ifxml
@syncodeindex fn cp
@syncodeindex vr cp
@end ifxml
@ifdocbook
@synindex fn cp
@synindex vr cp
@end ifdocbook

@c If "finalout" is commented out, the printed output will show
@c black boxes that mark lines that are too long.  Thus, it is
@c unwise to comment it out when running a master in case there are
@c overfulls which are deemed okay.

@iftex
@finalout
@end iftex

@c Enabled '-quotes in PDF files so that cut/paste works in
@c more places.

@codequoteundirected on
@codequotebacktick on

@copying
@docbook
<para>
&ldquo;To boldly go where no man has gone before&rdquo; is a
Registered Trademark of Paramount Pictures Corporation.</para>

<para>Published by:</para>

<literallayout class="normal">Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA  02110-1301 USA
Phone: +1-617-542-5942
Fax: +1-617-542-2652
Email: <email>gnu@@gnu.org</email>
URL: <ulink url="https://www.gnu.org">https://www.gnu.org/</ulink></literallayout>

<literallayout class="normal">Copyright &copy; 1989, 1991, 1992, 1993, 1996&ndash;2005, 2007, 2009&ndash;2020
Free Software Foundation, Inc.
All Rights Reserved.</literallayout>
@end docbook

@ifnotdocbook
Copyright @copyright{} 1989, 1991, 1992, 1993, 1996--2005, 2007, 2009--2020 @*
Free Software Foundation, Inc.
@end ifnotdocbook
@sp 2

This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}},
for the @value{VERSION}.@value{PATCHLEVEL} (or later) version of the GNU
implementation of AWK.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'', with the
Front-Cover Texts being ``A GNU Manual'', and with the Back-Cover Texts
as in (a) below.
@ifclear FOR_PRINT
A copy of the license is included in the section entitled
``GNU Free Documentation License''.
@end ifclear
@ifset FOR_PRINT
A copy of the license
may be found on the Internet at
@uref{https://www.gnu.org/software/gawk/manual/html_node/GNU-Free-Documentation-License.html,
the GNU Project's website}.
@end ifset

@enumerate a
@item
The FSF's Back-Cover Text is: ``You have the freedom to
copy and modify this GNU manual.''
@end enumerate
@end copying

@c Comment out the "smallbook" for technical review.  Saves
@c considerable paper.  Remember to turn it back on *before*
@c starting the page-breaking work.

@c 4/2002: Karl Berry recommends commenting out this and the
@c `@setchapternewpage odd', and letting users use `texi2dvi -t'
@c if they want to waste paper.
@c @smallbook


@c Uncomment this for the release.  Leaving it off saves paper
@c during editing and review.
@setchapternewpage odd

@shorttitlepage GNU Awk
@titlepage
@title @value{TITLE}
@subtitle @value{SUBTITLE}
@subtitle Edition @value{EDITION}
@subtitle @value{UPDATE-MONTH}
@author Arnold D. Robbins

@ifnotdocbook
@c Include the Distribution inside the titlepage environment so
@c that headings are turned off.  Headings on and off do not work.

@page
@vskip 0pt plus 1filll
``To boldly go where no man has gone before'' is a
Registered Trademark of Paramount Pictures Corporation. @*
@c sorry, i couldn't resist
@sp 3
Published by:
@sp 1

Free Software Foundation @*
51 Franklin Street, Fifth Floor @*
Boston, MA  02110-1301 USA @*
Phone: +1-617-542-5942 @*
Fax: +1-617-542-2652 @*
Email: @email{gnu@@gnu.org} @*
URL: @uref{https://www.gnu.org/} @*

@c This one is correct for gawk 3.1.0 from the FSF
ISBN 1-882114-28-0 @*
@sp 2
@insertcopying
@end ifnotdocbook
@end titlepage

@c Thanks to Bob Chassell for directions on doing dedications.
@iftex
@headings off
@page
@w{ }
@sp 9
@center @i{To my parents, for their love, and for the wonderful example they set for me.}
@sp 1
@center @i{To my wife, Miriam, for making me complete.
Thank you for building your life together with me.}
@sp 1
@center @i{To our children, Chana, Rivka, Nachum, and Malka, for enrichening our lives in innumerable ways.}
@sp 1
@w{ }
@page
@w{ }
@page
@headings on
@end iftex

@docbook
<dedication>
<para>To my parents, for their love, and for the wonderful
example they set for me.</para>
<para>To my wife Miriam, for making me complete.
Thank you for building your life together with me.</para>
<para>To our children Chana, Rivka, Nachum and Malka,
for enrichening our lives in innumerable ways.</para>
</dedication>
@end docbook

@iftex
@headings off
@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
@oddheading  @| @| @strong{@thischapter}@ @ @ @thispage
@end iftex

@ifnottex
@ifnotxml
@ifnotdocbook
@node Top
@top General Introduction
@c Preface node should come right after the Top
@c node, in `unnumbered' sections, then the chapter, `What is gawk'.
@c Licensing nodes are appendices, they're not central to AWK.

This file documents @command{awk}, a program that you can use to select
particular records in a file and perform operations upon them.

@insertcopying

@end ifnotdocbook
@end ifnotxml
@end ifnottex

@menu
* Foreword3::                      Some nice words about this
                                   @value{DOCUMENT}.
* Foreword4::                      More nice words.
* Preface::                        What this @value{DOCUMENT} is about; brief
                                   history and acknowledgments.
* Getting Started::                A basic introduction to using
                                   @command{awk}. How to run an @command{awk}
                                   program. Command-line syntax.
* Invoking Gawk::                  How to run @command{gawk}.
* Regexp::                         All about matching things using regular
                                   expressions.
* Reading Files::                  How to read files and manipulate fields.
* Printing::                       How to print using @command{awk}. Describes
                                   the @code{print} and @code{printf}
                                   statements. Also describes redirection of
                                   output.
* Expressions::                    Expressions are the basic building blocks
                                   of statements.
* Patterns and Actions::           Overviews of patterns and actions.
* Arrays::                         The description and use of arrays. Also
                                   includes array-oriented control statements.
* Functions::                      Built-in and user-defined functions.
* Library Functions::              A Library of @command{awk} Functions.
* Sample Programs::                Many @command{awk} programs with complete
                                   explanations.
* Advanced Features::              Stuff for advanced users, specific to
                                   @command{gawk}.
* Internationalization::           Getting @command{gawk} to speak your
                                   language.
* Debugger::                       The @command{gawk} debugger.
* Namespaces::                     How namespaces work in @command{gawk}.
* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
                                   @command{gawk}.
* Dynamic Extensions::             Adding new built-in functions to
                                   @command{gawk}.
* Language History::               The evolution of the @command{awk}
                                   language.
* Installation::                   Installing @command{gawk} under various
                                   operating systems.
* Notes::                          Notes about adding things to @command{gawk}
                                   and possible future work.
* Basic Concepts::                 A very quick introduction to programming
                                   concepts.
* Glossary::                       An explanation of some unfamiliar terms.
* Copying::                        Your right to copy and distribute
                                   @command{gawk}.
* GNU Free Documentation License:: The license for this @value{DOCUMENT}.
* Index::                          Concept and Variable Index.

@detailmenu
* History::                             The history of @command{gawk} and
                                        @command{awk}.
* Names::                               What name to use to find
                                        @command{awk}.
* This Manual::                         Using this @value{DOCUMENT}. Includes
                                        sample input files that you can use.
* Conventions::                         Typographical Conventions.
* Manual History::                      Brief history of the GNU project and
                                        this @value{DOCUMENT}.
* How To Contribute::                   Helping to save the world.
* Acknowledgments::                     Acknowledgments.
* Running gawk::                        How to run @command{gawk} programs;
                                        includes command-line syntax.
* One-shot::                            Running a short throwaway
                                        @command{awk} program.
* Read Terminal::                       Using no input files (input from the
                                        keyboard instead).
* Long::                                Putting permanent @command{awk}
                                        programs in files.
* Executable Scripts::                  Making self-contained @command{awk}
                                        programs.
* Comments::                            Adding documentation to @command{gawk}
                                        programs.
* Quoting::                             More discussion of shell quoting
                                        issues.
* DOS Quoting::                         Quoting in Windows Batch Files.
* Sample Data Files::                   Sample data files for use in the
                                        @command{awk} programs illustrated in
                                        this @value{DOCUMENT}.
* Very Simple::                         A very simple example.
* Two Rules::                           A less simple one-line example using
                                        two rules.
* More Complex::                        A more complex example.
* Statements/Lines::                    Subdividing or combining statements
                                        into lines.
* Other Features::                      Other Features of @command{awk}.
* When::                                When to use @command{gawk} and when to
                                        use other things.
* Intro Summary::                       Summary of the introduction.
* Command Line::                        How to run @command{awk}.
* Options::                             Command-line options and their
                                        meanings.
* Other Arguments::                     Input file names and variable
                                        assignments.
* Naming Standard Input::               How to specify standard input with
                                        other files.
* Environment Variables::               The environment variables
                                        @command{gawk} uses.
* AWKPATH Variable::                    Searching directories for
                                        @command{awk} programs.
* AWKLIBPATH Variable::                 Searching directories for
                                        @command{awk} shared libraries.
* Other Environment Variables::         The environment variables.
* Exit Status::                         @command{gawk}'s exit status.
* Include Files::                       Including other files into your
                                        program.
* Loading Shared Libraries::            Loading shared libraries into your
                                        program.
* Obsolete::                            Obsolete Options and/or features.
* Undocumented::                        Undocumented Options and Features.
* Invoking Summary::                    Invocation summary.
* Regexp Usage::                        How to Use Regular Expressions.
* Escape Sequences::                    How to write nonprinting characters.
* Regexp Operators::                    Regular Expression Operators.
* Regexp Operator Details::             The actual details.
* Interval Expressions::                Notes on interval expressions.
* Bracket Expressions::                 What can go between @samp{[...]}.
* Leftmost Longest::                    How much text matches.
* Computed Regexps::                    Using Dynamic Regexps.
* GNU Regexp Operators::                Operators specific to GNU software.
* Case-sensitivity::                    How to do case-insensitive matching.
* Regexp Summary::                      Regular expressions summary.
* Records::                             Controlling how data is split into
                                        records.
* awk split records::                   How standard @command{awk} splits
                                        records.
* gawk split records::                  How @command{gawk} splits records.
* Fields::                              An introduction to fields.
* Nonconstant Fields::                  Nonconstant Field Numbers.
* Changing Fields::                     Changing the Contents of a Field.
* Field Separators::                    The field separator and how to change
                                        it.
* Default Field Splitting::             How fields are normally separated.
* Regexp Field Splitting::              Using regexps as the field separator.
* Single Character Fields::             Making each character a separate
                                        field.
* Command Line Field Separator::        Setting @code{FS} from the command
                                        line.
* Full Line Fields::                    Making the full line be a single
                                        field.
* Field Splitting Summary::             Some final points and a summary table.
* Constant Size::                       Reading constant width data.
* Fixed width data::                    Processing fixed-width data.
* Skipping intervening::                Skipping intervening fields.
* Allowing trailing data::              Capturing optional trailing data.
* Fields with fixed data::              Field values with fixed-width data.
* Splitting By Content::                Defining Fields By Content
* More CSV::                            More on CSV files.
* Testing field creation::              Checking how @command{gawk} is
                                        splitting records.
* Multiple Line::                       Reading multiline records.
* Getline::                             Reading files under explicit program
                                        control using the @code{getline}
                                        function.
* Plain Getline::                       Using @code{getline} with no
                                        arguments.
* Getline/Variable::                    Using @code{getline} into a variable.
* Getline/File::                        Using @code{getline} from a file.
* Getline/Variable/File::               Using @code{getline} into a variable
                                        from a file.
* Getline/Pipe::                        Using @code{getline} from a pipe.
* Getline/Variable/Pipe::               Using @code{getline} into a variable
                                        from a pipe.
* Getline/Coprocess::                   Using @code{getline} from a coprocess.
* Getline/Variable/Coprocess::          Using @code{getline} into a variable
                                        from a coprocess.
* Getline Notes::                       Important things to know about
                                        @code{getline}.
* Getline Summary::                     Summary of @code{getline} Variants.
* Read Timeout::                        Reading input with a timeout.
* Retrying Input::                      Retrying input after certain errors.
* Command-line directories::            What happens if you put a directory on
                                        the command line.
* Input Summary::                       Input summary.
* Input Exercises::                     Exercises.
* Print::                               The @code{print} statement.
* Print Examples::                      Simple examples of @code{print}
                                        statements.
* Output Separators::                   The output separators and how to
                                        change them.
* OFMT::                                Controlling Numeric Output With
                                        @code{print}.
* Printf::                              The @code{printf} statement.
* Basic Printf::                        Syntax of the @code{printf} statement.
* Control Letters::                     Format-control letters.
* Format Modifiers::                    Format-specification modifiers.
* Printf Examples::                     Several examples.
* Redirection::                         How to redirect output to multiple
                                        files and pipes.
* Special FD::                          Special files for I/O.
* Special Files::                       File name interpretation in
                                        @command{gawk}. @command{gawk} allows
                                        access to inherited file descriptors.
* Other Inherited Files::               Accessing other open files with
                                        @command{gawk}.
* Special Network::                     Special files for network
                                        communications.
* Special Caveats::                     Things to watch out for.
* Close Files And Pipes::               Closing Input and Output Files and
                                        Pipes.
* Nonfatal::                            Enabling Nonfatal Output.
* Output Summary::                      Output summary.
* Output Exercises::                    Exercises.
* Values::                              Constants, Variables, and Regular
                                        Expressions.
* Constants::                           String, numeric and regexp constants.
* Scalar Constants::                    Numeric and string constants.
* Nondecimal-numbers::                  What are octal and hex numbers.
* Regexp Constants::                    Regular Expression constants.
* Using Constant Regexps::              When and how to use a regexp constant.
* Standard Regexp Constants::           Regexp constants in standard
                                        @command{awk}.
* Strong Regexp Constants::             Strongly typed regexp constants.
* Variables::                           Variables give names to values for
                                        later use.
* Using Variables::                     Using variables in your programs.
* Assignment Options::                  Setting variables on the command line
                                        and a summary of command-line syntax.
                                        This is an advanced method of input.
* Conversion::                          The conversion of strings to numbers
                                        and vice versa.
* Strings And Numbers::                 How @command{awk} Converts Between
                                        Strings And Numbers.
* Locale influences conversions::       How the locale may affect conversions.
* All Operators::                       @command{gawk}'s operators.
* Arithmetic Ops::                      Arithmetic operations (@samp{+},
                                        @samp{-}, etc.)
* Concatenation::                       Concatenating strings.
* Assignment Ops::                      Changing the value of a variable or a
                                        field.
* Increment Ops::                       Incrementing the numeric value of a
                                        variable.
* Truth Values and Conditions::         Testing for true and false.
* Truth Values::                        What is ``true'' and what is
                                        ``false''.
* Typing and Comparison::               How variables acquire types and how
                                        this affects comparison of numbers and
                                        strings with @samp{<}, etc.
* Variable Typing::                     String type versus numeric type.
* Comparison Operators::                The comparison operators.
* POSIX String Comparison::             String comparison with POSIX rules.
* Boolean Ops::                         Combining comparison expressions using
                                        boolean operators @samp{||} (``or''),
                                        @samp{&&} (``and'') and @samp{!}
                                        (``not'').
* Conditional Exp::                     Conditional expressions select between
                                        two subexpressions under control of a
                                        third subexpression.
* Function Calls::                      A function call is an expression.
* Precedence::                          How various operators nest.
* Locales::                             How the locale affects things.
* Expressions Summary::                 Expressions summary.
* Pattern Overview::                    What goes into a pattern.
* Regexp Patterns::                     Using regexps as patterns.
* Expression Patterns::                 Any expression can be used as a
                                        pattern.
* Ranges::                              Pairs of patterns specify record
                                        ranges.
* BEGIN/END::                           Specifying initialization and cleanup
                                        rules.
* Using BEGIN/END::                     How and why to use BEGIN/END rules.
* I/O And BEGIN/END::                   I/O issues in BEGIN/END rules.
* BEGINFILE/ENDFILE::                   Two special patterns for advanced
                                        control.
* Empty::                               The empty pattern, which matches every
                                        record.
* Using Shell Variables::               How to use shell variables with
                                        @command{awk}.
* Action Overview::                     What goes into an action.
* Statements::                          Describes the various control
                                        statements in detail.
* If Statement::                        Conditionally execute some
                                        @command{awk} statements.
* While Statement::                     Loop until some condition is
                                        satisfied.
* Do Statement::                        Do specified action while looping
                                        until some condition is satisfied.
* For Statement::                       Another looping statement, that
                                        provides initialization and increment
                                        clauses.
* Switch Statement::                    Switch/case evaluation for conditional
                                        execution of statements based on a
                                        value.
* Break Statement::                     Immediately exit the innermost
                                        enclosing loop.
* Continue Statement::                  Skip to the end of the innermost
                                        enclosing loop.
* Next Statement::                      Stop processing the current input
                                        record.
* Nextfile Statement::                  Stop processing the current file.
* Exit Statement::                      Stop execution of @command{awk}.
* Built-in Variables::                  Summarizes the predefined variables.
* User-modified::                       Built-in variables that you change to
                                        control @command{awk}.
* Auto-set::                            Built-in variables where @command{awk}
                                        gives you information.
* ARGC and ARGV::                       Ways to use @code{ARGC} and
  725                                         @code{ARGV}.
  726 * Pattern Action Summary::              Patterns and Actions summary.
  727 * Array Basics::                        The basics of arrays.
  728 * Array Intro::                         Introduction to Arrays
  729 * Reference to Elements::               How to examine one element of an
  730                                         array.
  731 * Assigning Elements::                  How to change an element of an array.
  732 * Array Example::                       Basic Example of an Array
  733 * Scanning an Array::                   A variation of the @code{for}
  734                                         statement. It loops through the
  735                                         indices of an array's existing
  736                                         elements.
  737 * Controlling Scanning::                Controlling the order in which arrays
  738                                         are scanned.
  739 * Numeric Array Subscripts::            How to use numbers as subscripts in
  740                                         @command{awk}.
  741 * Uninitialized Subscripts::            Using uninitialized variables as
  742                                         subscripts.
  743 * Delete::                              The @code{delete} statement removes an
  744                                         element from an array.
  745 * Multidimensional::                    Emulating multidimensional arrays in
  746                                         @command{awk}.
  747 * Multiscanning::                       Scanning multidimensional arrays.
  748 * Arrays of Arrays::                    True multidimensional arrays.
  749 * Arrays Summary::                      Summary of arrays.
  750 * Built-in::                            Summarizes the built-in functions.
  751 * Calling Built-in::                    How to call built-in functions.
  752 * Numeric Functions::                   Functions that work with numbers,
  753                                         including @code{int()}, @code{sin()}
  754                                         and @code{rand()}.
  755 * String Functions::                    Functions for string manipulation,
  756                                         such as @code{split()}, @code{match()}
  757                                         and @code{sprintf()}.
  758 * Gory Details::                        More than you want to know about
  759                                         @samp{\} and @samp{&} with
  760                                         @code{sub()}, @code{gsub()}, and
  761                                         @code{gensub()}.
  762 * I/O Functions::                       Functions for files and shell
  763                                         commands.
  764 * Time Functions::                      Functions for dealing with timestamps.
  765 * Bitwise Functions::                   Functions for bitwise operations.
  766 * Type Functions::                      Functions for type information.
  767 * I18N Functions::                      Functions for string translation.
  768 * User-defined::                        Describes User-defined functions in
  769                                         detail.
  770 * Definition Syntax::                   How to write definitions and what they
  771                                         mean.
  772 * Function Example::                    An example function definition and
  773                                         what it does.
  774 * Function Calling::                    Calling user-defined functions.
  775 * Calling A Function::                  Don't use spaces.
  776 * Variable Scope::                      Controlling variable scope.
  777 * Pass By Value/Reference::             Passing parameters.
  778 * Function Caveats::                    Other points to know about functions.
  779 * Return Statement::                    Specifying the value a function
  780                                         returns.
  781 * Dynamic Typing::                      How variable types can change at
  782                                         runtime.
  783 * Indirect Calls::                      Choosing the function to call at
  784                                         runtime.
  785 * Functions Summary::                   Summary of functions.
  786 * Library Names::                       How to best name private global
  787                                         variables in library functions.
  788 * General Functions::                   Functions that are of general use.
  789 * Strtonum Function::                   A replacement for the built-in
  790                                         @code{strtonum()} function.
  791 * Assert Function::                     A function for assertions in
  792                                         @command{awk} programs.
  793 * Round Function::                      A function for rounding if
  794                                         @code{sprintf()} does not do it
  795                                         correctly.
  796 * Cliff Random Function::               The Cliff Random Number Generator.
  797 * Ordinal Functions::                   Functions for using characters as
  798                                         numbers and vice versa.
  799 * Join Function::                       A function to join an array into a
  800                                         string.
  801 * Getlocaltime Function::               A function to get formatted times.
  802 * Readfile Function::                   A function to read an entire file at
  803                                         once.
  804 * Shell Quoting::                       A function to quote strings for the
  805                                         shell.
  806 * Data File Management::                Functions for managing command-line
  807                                         data files.
  808 * Filetrans Function::                  A function for handling data file
  809                                         transitions.
  810 * Rewind Function::                     A function for rereading the current
  811                                         file.
  812 * File Checking::                       Checking that data files are readable.
  813 * Empty Files::                         Checking for zero-length files.
  814 * Ignoring Assigns::                    Treating assignments as file names.
  815 * Getopt Function::                     A function for processing command-line
  816                                         arguments.
  817 * Passwd Functions::                    Functions for getting user
  818                                         information.
  819 * Group Functions::                     Functions for getting group
  820                                         information.
  821 * Walking Arrays::                      A function to walk arrays of arrays.
  822 * Library Functions Summary::           Summary of library functions.
  823 * Library Exercises::                   Exercises.
  824 * Running Examples::                    How to run these examples.
  825 * Clones::                              Clones of common utilities.
  826 * Cut Program::                         The @command{cut} utility.
  827 * Egrep Program::                       The @command{egrep} utility.
  828 * Id Program::                          The @command{id} utility.
  829 * Split Program::                       The @command{split} utility.
  830 * Tee Program::                         The @command{tee} utility.
  831 * Uniq Program::                        The @command{uniq} utility.
  832 * Wc Program::                          The @command{wc} utility.
  833 * Miscellaneous Programs::              Some interesting @command{awk}
  834                                         programs.
  835 * Dupword Program::                     Finding duplicated words in a
  836                                         document.
  837 * Alarm Program::                       An alarm clock.
  838 * Translate Program::                   A program similar to the @command{tr}
  839                                         utility.
  840 * Labels Program::                      Printing mailing labels.
  841 * Word Sorting::                        A program to produce a word usage
  842                                         count.
  843 * History Sorting::                     Eliminating duplicate entries from a
  844                                         history file.
  845 * Extract Program::                     Pulling out programs from Texinfo
  846                                         source files.
  847 * Simple Sed::                          A Simple Stream Editor.
  848 * Igawk Program::                       A wrapper for @command{awk} that
  849                                         includes files.
  850 * Anagram Program::                     Finding anagrams from a dictionary.
  851 * Signature Program::                   People do amazing things with too much
  852                                         time on their hands.
  853 * Programs Summary::                    Summary of programs.
  854 * Programs Exercises::                  Exercises.
  855 * Nondecimal Data::                     Allowing nondecimal input data.
  856 * Array Sorting::                       Facilities for controlling array
  857                                         traversal and sorting arrays.
  858 * Controlling Array Traversal::         How to use PROCINFO["sorted_in"].
  859 * Array Sorting Functions::             How to use @code{asort()} and
  860                                         @code{asorti()}.
  861 * Two-way I/O::                         Two-way communications with another
  862                                         process.
  863 * TCP/IP Networking::                   Using @command{gawk} for network
  864                                         programming.
  865 * Profiling::                           Profiling your @command{awk} programs.
  866 * Advanced Features Summary::           Summary of advanced features.
  867 * I18N and L10N::                       Internationalization and Localization.
  868 * Explaining gettext::                  How GNU @command{gettext} works.
  869 * Programmer i18n::                     Features for the programmer.
  870 * Translator i18n::                     Features for the translator.
  871 * String Extraction::                   Extracting marked strings.
  872 * Printf Ordering::                     Rearranging @code{printf} arguments.
  873 * I18N Portability::                    @command{awk}-level portability
  874                                         issues.
  875 * I18N Example::                        A simple i18n example.
  876 * Gawk I18N::                           @command{gawk} is also
  877                                         internationalized.
  878 * I18N Summary::                        Summary of I18N stuff.
  879 * Debugging::                           Introduction to @command{gawk}
  880                                         debugger.
  881 * Debugging Concepts::                  Debugging in General.
  882 * Debugging Terms::                     Additional Debugging Concepts.
  883 * Awk Debugging::                       Awk Debugging.
  884 * Sample Debugging Session::            Sample debugging session.
  885 * Debugger Invocation::                 How to Start the Debugger.
  886 * Finding The Bug::                     Finding the Bug.
  887 * List of Debugger Commands::           Main debugger commands.
  888 * Breakpoint Control::                  Control of Breakpoints.
  889 * Debugger Execution Control::          Control of Execution.
  890 * Viewing And Changing Data::           Viewing and Changing Data.
  891 * Execution Stack::                     Dealing with the Stack.
  892 * Debugger Info::                       Obtaining Information about the
  893                                         Program and the Debugger State.
  894 * Miscellaneous Debugger Commands::     Miscellaneous Commands.
  895 * Readline Support::                    Readline support.
  896 * Limitations::                         Limitations and future plans.
  897 * Debugging Summary::                   Debugging summary.
  898 * Global Namespace::                    The global namespace in standard
  899                                         @command{awk}.
  900 * Qualified Names::                     How to qualify names with a namespace.
  901 * Default Namespace::                   The default namespace.
  902 * Changing The Namespace::              How to change the namespace.
  903 * Naming Rules::                        Namespace and Component Naming Rules.
  904 * Internal Name Management::            How names are stored internally.
  905 * Namespace Example::                   An example of code using a namespace.
  906 * Namespace And Features::              Namespaces and other @command{gawk}
  907                                         features.
  908 * Namespace Summary::                   Summarizing namespaces.
  909 * Computer Arithmetic::                 A quick intro to computer math.
  910 * Math Definitions::                    Defining terms used.
  911 * MPFR features::                       The MPFR features in @command{gawk}.
  912 * FP Math Caution::                     Things to know.
  913 * Inexactness of computations::         Floating point math is not exact.
  914 * Inexact representation::              Numbers are not exactly represented.
  915 * Comparing FP Values::                 How to compare floating point values.
  916 * Errors accumulate::                   Errors get bigger as they go.
  917 * Getting Accuracy::                    Getting more accuracy takes some work.
  918 * Try To Round::                        Add digits and round.
  919 * Setting precision::                   How to set the precision.
  920 * Setting the rounding mode::           How to set the rounding mode.
  921 * Arbitrary Precision Integers::        Arbitrary Precision Integer Arithmetic
  922                                         with @command{gawk}.
  923 * Checking for MPFR::                   How to check if MPFR is available.
  924 * POSIX Floating Point Problems::       Standards Versus Existing Practice.
  925 * Floating point summary::              Summary of floating point discussion.
  926 * Extension Intro::                     What is an extension.
  927 * Plugin License::                      A note about licensing.
  928 * Extension Mechanism Outline::         An outline of how it works.
  929 * Extension API Description::           A full description of the API.
  930 * Extension API Functions Introduction:: Introduction to the API functions.
  931 * General Data Types::                  The data types.
  932 * Memory Allocation Functions::         Functions for allocating memory.
  933 * Constructor Functions::               Functions for creating values.
  934 * Registration Functions::              Functions to register things with
  935                                         @command{gawk}.
  936 * Extension Functions::                 Registering extension functions.
  937 * Exit Callback Functions::             Registering an exit callback.
  938 * Extension Version String::            Registering a version string.
  939 * Input Parsers::                       Registering an input parser.
  940 * Output Wrappers::                     Registering an output wrapper.
  941 * Two-way processors::                  Registering a two-way processor.
  942 * Printing Messages::                   Functions for printing messages.
  943 * Updating @code{ERRNO}::               Functions for updating @code{ERRNO}.
  944 * Requesting Values::                   How to get a value.
  945 * Accessing Parameters::                Functions for accessing parameters.
  946 * Symbol Table Access::                 Functions for accessing global
  947                                         variables.
  948 * Symbol table by name::                Accessing variables by name.
  949 * Symbol table by cookie::              Accessing variables by ``cookie''.
  950 * Cached values::                       Creating and using cached values.
  951 * Array Manipulation::                  Functions for working with arrays.
  952 * Array Data Types::                    Data types for working with arrays.
  953 * Array Functions::                     Functions for working with arrays.
  954 * Flattening Arrays::                   How to flatten arrays.
  955 * Creating Arrays::                     How to create and populate arrays.
  956 * Redirection API::                     How to access and manipulate
  957                                         redirections.
  958 * Extension API Variables::             Variables provided by the API.
  959 * Extension Versioning::                API Version information.
  960 * Extension GMP/MPFR Versioning::       Version information about GMP and
  961                                         MPFR.
  962 * Extension API Informational Variables:: Variables providing information about
  963                                         @command{gawk}'s invocation.
  964 * Extension API Boilerplate::           Boilerplate code for using the API.
  965 * Changes from API V1::                 Changes from V1 of the API.
  966 * Finding Extensions::                  How @command{gawk} finds compiled
  967                                         extensions.
  968 * Extension Example::                   Example C code for an extension.
  969 * Internal File Description::           What the new functions will do.
  970 * Internal File Ops::                   The code for internal file operations.
  971 * Using Internal File Ops::             How to use an external extension.
  972 * Extension Samples::                   The sample extensions that ship with
  973                                         @command{gawk}.
  974 * Extension Sample File Functions::     The file functions sample.
  975 * Extension Sample Fnmatch::            An interface to @code{fnmatch()}.
  976 * Extension Sample Fork::               An interface to @code{fork()} and
  977                                         other process functions.
  978 * Extension Sample Inplace::            Enabling in-place file editing.
  979 * Extension Sample Ord::                Character to value to character
  980                                         conversions.
  981 * Extension Sample Readdir::            An interface to @code{readdir()}.
  982 * Extension Sample Revout::             A sample output wrapper that reverses
  983                                         its output.
  984 * Extension Sample Rev2way::            A sample two-way processor that
  985                                         reverses data.
  986 * Extension Sample Read write array::   Serializing an array to a file.
  987 * Extension Sample Readfile::           Reading an entire file into a string.
  988 * Extension Sample Time::               An interface to @code{gettimeofday()}
  989                                         and @code{sleep()}.
  990 * Extension Sample API Tests::          Tests for the API.
  991 * gawkextlib::                          The @code{gawkextlib} project.
  992 * Extension summary::                   Extension summary.
  993 * Extension Exercises::                 Exercises.
  994 * V7/SVR3.1::                           The major changes between V7 and
  995                                         System V Release 3.1.
  996 * SVR4::                                Minor changes between System V
  997                                         Releases 3.1 and 4.
  998 * POSIX::                               New features from the POSIX standard.
  999 * BTL::                                 New features from Brian Kernighan's
 1000                                         version of @command{awk}.
 1001 * POSIX/GNU::                           The extensions in @command{gawk} not
 1002                                         in POSIX @command{awk}.
 1003 * Feature History::                     The history of the features in
 1004                                         @command{gawk}.
 1005 * Common Extensions::                   Common Extensions Summary.
 1006 * Ranges and Locales::                  How locales used to affect regexp
 1007                                         ranges.
 1008 * Contributors::                        The major contributors to
 1009                                         @command{gawk}.
 1010 * History summary::                     History summary.
 1011 * Gawk Distribution::                   What is in the @command{gawk}
 1012                                         distribution.
 1013 * Getting::                             How to get the distribution.
 1014 * Extracting::                          How to extract the distribution.
 1015 * Distribution contents::               What is in the distribution.
 1016 * Unix Installation::                   Installing @command{gawk} under
 1017                                         various versions of Unix.
 1018 * Quick Installation::                  Compiling @command{gawk} under Unix.
 1019 * Shell Startup Files::                 Shell convenience functions.
 1020 * Additional Configuration Options::    Other compile-time options.
 1021 * Configuration Philosophy::            How it's all supposed to work.
 1022 * Non-Unix Installation::               Installation on Other Operating
 1023                                         Systems.
 1024 * PC Installation::                     Installing and Compiling
 1025                                         @command{gawk} on Microsoft Windows.
 1026 * PC Binary Installation::              Installing a prepared distribution.
 1027 * PC Compiling::                        Compiling @command{gawk} for
 1028                                         Windows32.
 1029 * PC Using::                            Running @command{gawk} on Windows32.
 1030 * Cygwin::                              Building and running @command{gawk}
 1031                                         for Cygwin.
 1032 * MSYS::                                Using @command{gawk} In The MSYS
 1033                                         Environment.
 1034 * VMS Installation::                    Installing @command{gawk} on VMS.
 1035 * VMS Compilation::                     How to compile @command{gawk} under
 1036                                         VMS.
 1037 * VMS Dynamic Extensions::              Compiling @command{gawk} dynamic
 1038                                         extensions on VMS.
 1039 * VMS Installation Details::            How to install @command{gawk} under
 1040                                         VMS.
 1041 * VMS Running::                         How to run @command{gawk} under VMS.
 1042 * VMS GNV::                             The VMS GNV Project.
 1043 * VMS Old Gawk::                        An old version comes with some VMS
 1044                                         systems.
 1045 * Bugs::                                Reporting Problems and Bugs.
 1046 * Bug address::                         Where to send reports to.
 1047 * Usenet::                              Where not to send reports to.
 1048 * Maintainers::                         Maintainers of non-*nix ports.
 1049 * Other Versions::                      Other freely available @command{awk}
 1050                                         implementations.
 1051 * Installation summary::                Summary of installation.
 1052 * Compatibility Mode::                  How to disable certain @command{gawk}
 1053                                         extensions.
 1054 * Additions::                           Making Additions To @command{gawk}.
 1055 * Accessing The Source::                Accessing the Git repository.
 1056 * Adding Code::                         Adding code to the main body of
 1057                                         @command{gawk}.
 1058 * New Ports::                           Porting @command{gawk} to a new
 1059                                         operating system.
 1060 * Derived Files::                       Why derived files are kept in the Git
 1061                                         repository.
 1062 * Future Extensions::                   New features that may be implemented
 1063                                         one day.
 1064 * Implementation Limitations::          Some limitations of the
 1065                                         implementation.
 1066 * Extension Design::                    Design notes about the extension API.
 1067 * Old Extension Problems::              Problems with the old mechanism.
 1068 * Extension New Mechanism Goals::       Goals for the new mechanism.
 1069 * Extension Other Design Decisions::    Some other design decisions.
 1070 * Extension Future Growth::             Some room for future growth.
 1071 * Notes summary::                       Summary of implementation notes.
 1072 * Basic High Level::                    The high level view.
 1073 * Basic Data Typing::                   A very quick intro to data types.
 1074 @end detailmenu
 1075 @end menu
 1076 
 1077 @c dedication for Info file
 1078 @ifinfo
 1079 To my parents, for their love, and for the wonderful
 1080 example they set for me.
 1081 @sp 1
 1082 To my wife Miriam, for making me complete.
 1083 Thank you for building your life together with me.
 1084 @sp 1
 1085 To our children Chana, Rivka, Nachum and Malka,
 1086 for enrichening our lives in innumerable ways.
 1087 @end ifinfo
 1088 
 1089 @summarycontents
 1090 @contents
 1091 
 1092 @node Foreword3
 1093 @unnumbered Foreword to the Third Edition
 1094 
 1095 @c This bit is post-processed by a script which turns the chapter
 1096 @c tag into a preface tag, and moves this stuff to before the title.
 1097 @c Bleah.
 1098 @docbook
 1099   <prefaceinfo>
 1100     <author>
 1101       <firstname>Michael</firstname>
 1102       <surname>Brennan</surname>
 1103       <!-- can't put mawk into command tags. sigh. -->
 1104       <affiliation><jobtitle>Author of mawk</jobtitle></affiliation>
 1105     </author>
 1106     <date>March 2001</date>
 1107    </prefaceinfo>
 1108 @end docbook
 1109 
 1110 Arnold Robbins and I are good friends. We were introduced
 1111 @c 11 years ago
 1112 in 1990
 1113 by circumstances---and our favorite programming language, AWK.
 1114 The circumstances started a couple of years
 1115 earlier. I was working at a new job and noticed an unplugged
 1116 Unix computer sitting in the corner.  No one knew how to use it,
 1117 and neither did I.  However,
 1118 a couple of days later, it was running, and
 1119 I was @code{root} and the one-and-only user.
 1120 That day, I began the transition from statistician to Unix programmer.
 1121 
 1122 On one of many trips to the library or bookstore in search of
 1123 books on Unix, I found the gray AWK book, a.k.a.@:
 1124 Alfred V.@: Aho, Brian W.@: Kernighan, and
 1125 Peter J.@: Weinberger's @cite{The AWK Programming Language} (Addison-Wesley,
 1126 1988).  @command{awk}'s simple programming paradigm---find a pattern in the
 1127 input and then perform an action---often reduced complex or tedious
 1128 data manipulations to a few lines of code.  I was excited to try my
 1129 hand at programming in AWK.
 1130 
 1131 Alas, the @command{awk} on my computer was a limited version of the
 1132 language described in the gray book.  I discovered that my computer
 1133 had ``old @command{awk}'' and the book described
 1134 ``new @command{awk}.''
 1135 I learned that this was typical; the old version refused to step
 1136 aside or relinquish its name.  If a system had a new @command{awk}, it was
 1137 invariably called @command{nawk}, and few systems had it.
 1138 The best way to get a new @command{awk} was to @command{ftp} the source code for
 1139 @command{gawk} from @code{prep.ai.mit.edu}.  @command{gawk} was a version of
 1140 new @command{awk} written by David Trueman and Arnold, and available under
 1141 the GNU General Public License.
 1142 
 1143 (Incidentally,
 1144 it's no longer difficult to find a new @command{awk}. @command{gawk} ships with
 1145 GNU/Linux, and you can download binaries or source code for almost
 1146 any system; my wife uses @command{gawk} on her VMS box.)
 1147 
 1148 My Unix system started out unplugged from the wall; it certainly was not
 1149 plugged into a network.  So, oblivious to the existence of @command{gawk}
 1150 and the Unix community in general, and desiring a new @command{awk}, I wrote
 1151 my own, called @command{mawk}.
 1152 Before I was finished, I knew about @command{gawk},
 1153 but it was too late to stop, so I eventually posted it
 1154 to a @code{comp.sources} newsgroup.
 1155 
 1156 A few days after my posting, I got a friendly email
 1157 from Arnold introducing
 1158 himself.  He suggested we share design and algorithms and
 1159 attached a draft of the POSIX standard so
 1160 that I could update @command{mawk} to support language extensions added
 1161 after publication of @cite{The AWK Programming Language}.
 1162 
 1163 Frankly, if our roles had
 1164 been reversed, I would not have been so open, and we probably would
 1165 never have met.  I'm glad we did meet.
 1166 He is an AWK expert's AWK expert and a genuinely nice person.
 1167 Arnold contributes significant amounts of his
 1168 expertise and time to the Free Software Foundation.
 1169 
 1170 This book is the @command{gawk} reference manual, but at its core it
 1171 is a book about AWK programming that
 1172 will appeal to a wide audience.
 1173 It is a definitive reference to the AWK language as defined by the
 1174 1987 Bell Laboratories release and codified in the 1992 POSIX Utilities
 1175 standard.
 1176 
 1177 On the other hand, the novice AWK programmer can study
 1178 a wealth of practical programs that emphasize
 1179 the power of AWK's basic idioms:
 1180 data-driven control flow, pattern matching with regular expressions,
 1181 and associative arrays.
 1182 Those looking for something new can try out @command{gawk}'s
 1183 interface to network protocols via special @file{/inet} files.
 1184 
 1185 The programs in this book make clear that an AWK program is
 1186 typically much smaller and faster to develop than
 1187 a counterpart written in C.
 1188 Consequently, there is often a payoff to prototyping an
 1189 algorithm or design in AWK to get it running quickly and expose
 1190 problems early. Often, the interpreted performance is adequate
 1191 and the AWK prototype becomes the product.
 1192 
 1193 The new @command{pgawk} (profiling @command{gawk}) produces
 1194 program execution counts.
 1195 I recently experimented with an algorithm that for
 1196 @ifnotdocbook
 1197 @math{n}
 1198 @end ifnotdocbook
 1199 @ifdocbook
 1200 @i{n}
 1201 @end ifdocbook
 1202 lines of input, exhibited
 1203 @tex
 1204 $\sim\! Cn^2$
 1205 @end tex
 1206 @ifnottex
 1207 @ifnotdocbook
 1208 ~ C n^2
 1209 @end ifnotdocbook
 1210 @end ifnottex
 1211 @docbook
 1212 <emphasis>&sim; Cn<superscript>2</superscript></emphasis>
 1213 @end docbook
 1214 performance, while
 1215 theory predicted
 1216 @tex
 1217 $\sim\! Cn\log n$
 1218 @end tex
 1219 @ifnottex
 1220 @ifnotdocbook
 1221 ~ C n log n
 1222 @end ifnotdocbook
 1223 @end ifnottex
 1224 @docbook
 1225 <emphasis>&sim; Cn log n</emphasis>
 1226 @end docbook
 1227 behavior. A few minutes poring
 1228 over the @file{awkprof.out} profile pinpointed the problem to
 1229 a single line of code.  @command{pgawk} is a welcome addition to
 1230 my programmer's toolbox.
 1231 
 1232 Arnold has distilled over a decade of experience writing and
 1233 using AWK programs, and developing @command{gawk}, into this book.  If you use
 1234 AWK or want to learn how, then read this book.
 1235 
 1236 @ifnotdocbook
 1237 @cindex Brennan, Michael
 1238 @display
 1239 Michael Brennan
 1240 Author of @command{mawk}
 1241 March 2001
 1242 @end display
 1243 @end ifnotdocbook
 1244 
 1245 @node Foreword4
 1246 @unnumbered Foreword to the Fourth Edition
 1247 
 1248 @c This bit is post-processed by a script which turns the chapter
 1249 @c tag into a preface tag, and moves this stuff to before the title.
 1250 @c Bleah.
 1251 @docbook
 1252   <prefaceinfo>
 1253     <author>
 1254       <firstname>Michael</firstname>
 1255       <surname>Brennan</surname>
 1256       <!-- can't put mawk into command tags. sigh. -->
 1257       <affiliation><jobtitle>Author of mawk</jobtitle></affiliation>
 1258     </author>
 1259     <date>October 2014</date>
 1260    </prefaceinfo>
 1261 @end docbook
 1262 
 1263 Some things don't change.  Thirteen years ago I wrote:
 1264 ``If you use AWK or want to learn how, then read this book.''
 1265 True then, and still true today.
 1266 
 1267 Learning to use a programming language is about more than mastering the
 1268 syntax.  One needs to acquire an understanding of how to use the
 1269 features of the language to solve practical programming problems.
 1270 A focus of this book is its many examples that show how to use AWK.
 1271 
 1272 Some things do change. Our computers are much faster and have more memory.
 1273 Consequently, speed and storage inefficiencies of a high-level language
 1274 matter less.  Prototyping in AWK and then rewriting in C for performance
 1275 reasons happens less, because more often the prototype is fast enough.
 1276 
 1277 Of course, there are computing operations that are best done in C or C++.
 1278 With @command{gawk} 4.1 and later, you do not have to choose between writing
 1279 your program in AWK or in C/C++.  You can write most of your
 1280 program in AWK and the aspects that require C/C++ capabilities
 1281 in C/C++; the pieces are then glued together when the @command{gawk} module loads
 1282 the C/C++ module as a dynamic plug-in.
 1283 @c Chapter 16
 1284 @ref{Dynamic Extensions},
 1285 has all the
 1286 details, and, as expected, many examples to help you learn the ins and outs.
 1287 
 1288 I enjoy programming in AWK and had fun (re)reading this book.
 1289 I think you will too.
 1290 
 1291 @ifnotdocbook
 1292 @cindex Brennan, Michael
 1293 @display
 1294 Michael Brennan
 1295 Author of @command{mawk}
 1296 October 2014
 1297 @end display
 1298 @end ifnotdocbook
 1299 
 1300 @node Preface
 1301 @unnumbered Preface
 1302 @c I saw a comment somewhere that the preface should describe the book itself,
 1303 @c and the introduction should describe what the book covers.
 1304 @c
 1305 @c 12/2000: Chuck wants the preface & intro combined.
 1306 
 1307 @c This bit is post-processed by a script which turns the chapter
 1308 @c tag into a preface tag, and moves this stuff to before the title.
 1309 @c Bleah.
 1310 @docbook
 1311   <prefaceinfo>
 1312     <author>
 1313       <firstname>Arnold</firstname>
 1314       <surname>Robbins</surname>
 1315       <affiliation><jobtitle>Nof Ayalon</jobtitle></affiliation>
 1316       <affiliation><jobtitle>Israel</jobtitle></affiliation>
 1317     </author>
 1318     <date>February 2015</date>
 1319    </prefaceinfo>
 1320 @end docbook
 1321 
 1322 @cindex @command{awk}
 1323 Several kinds of tasks occur repeatedly when working with text files.
 1324 You might want to extract certain lines and discard the rest.  Or you
 1325 may need to make changes wherever certain patterns appear, but leave the
 1326 rest of the file alone.  Such jobs are often easy with @command{awk}.
 1327 The @command{awk} utility interprets a special-purpose programming
 1328 language that makes it easy to handle simple data-reformatting jobs.
 1329 
 1330 @cindex @command{gawk}
 1331 The GNU implementation of @command{awk} is called @command{gawk}; if you
 1332 invoke it with the proper options or environment variables,
 1333 it is fully compatible with
 1334 the POSIX@footnote{The 2018 POSIX standard is accessible online at
 1335 @w{@url{https://pubs.opengroup.org/onlinepubs/9699919799/}.}}
 1336 specification of the @command{awk} language
 1337 and with the Unix version of @command{awk} maintained
 1338 by Brian Kernighan.
 1339 This means that all
 1340 properly written @command{awk} programs should work with @command{gawk}.
 1341 So most of the time, we don't distinguish between @command{gawk} and other
 1342 @command{awk} implementations.
 1343 
 1344 @cindex @command{awk} @subentry POSIX and @seealso{POSIX @command{awk}}
 1345 @cindex @command{awk} @subentry POSIX and
 1346 @cindex POSIX @subentry @command{awk} and
 1347 @cindex @command{gawk} @subentry @command{awk} and
 1348 @cindex @command{awk} @subentry @command{gawk} and
 1349 @cindex @command{awk} @subentry uses for
 1350 Using @command{awk} you can:
 1351 
 1352 @itemize @value{BULLET}
 1353 @item
 1354 Manage small, personal databases
 1355 
 1356 @item
 1357 Generate reports
 1358 
 1359 @item
 1360 Validate data
 1361 
 1362 @item
 1363 Produce indexes and perform other document-preparation tasks
 1364 
 1365 @item
 1366 Experiment with algorithms that you can adapt later to other computer
 1367 languages
 1368 @end itemize
 1369 
 1370 @cindex @command{awk} @seealso{@command{gawk}}
 1371 @cindex @command{gawk} @seealso{@command{awk}}
 1372 @cindex @command{gawk} @subentry uses for
 1373 In addition,
 1374 @command{gawk}
 1375 provides facilities that make it easy to:
 1376 
 1377 @itemize @value{BULLET}
 1378 @item
 1379 Extract bits and pieces of data for processing
 1380 
 1381 @item
 1382 Sort data
 1383 
 1384 @item
 1385 Perform simple network communications
 1386 
 1387 @item
 1388 Profile and debug @command{awk} programs
 1389 
 1390 @item
 1391 Extend the language with functions written in C or C++
 1392 @end itemize
 1393 
 1394 This @value{DOCUMENT} teaches you about the @command{awk} language and
 1395 how you can use it effectively.  You should already be familiar with basic
 1396 system commands, such as @command{cat} and @command{ls},@footnote{These utilities
 1397 are available on POSIX-compliant systems, as well as on traditional
 1398 Unix-based systems. If you are using some other operating system, you still need to
 1399 be familiar with the ideas of I/O redirection and pipes.} as well as basic shell
 1400 facilities, such as input/output (I/O) redirection and pipes.
 1401 
 1402 @cindex GNU @command{awk} @seeentry{@command{gawk}}
 1403 Implementations of the @command{awk} language are available for many
 1404 different computing environments.  This @value{DOCUMENT}, while describing
 1405 the @command{awk} language in general, also describes the particular
 1406 implementation of @command{awk} called @command{gawk} (which stands for
 1407 ``GNU @command{awk}'').  @command{gawk} runs on a broad range of Unix systems,
 1408 from Intel-architecture PC-based computers
 1409 up through large-scale systems.
 1410 @command{gawk} has also been ported to Mac OS X,
 1411 Microsoft Windows
 1412 (all versions),
 1413 and OpenVMS.@footnote{Some other, obsolete systems to which @command{gawk}
 1414 was once ported are no longer supported and the code for those systems
 1415 has been removed.}
 1416 
 1417 @menu
 1418 * History::                     The history of @command{gawk} and
 1419                                 @command{awk}.
 1420 * Names::                       What name to use to find @command{awk}.
 1421 * This Manual::                 Using this @value{DOCUMENT}. Includes sample
 1422                                 input files that you can use.
 1423 * Conventions::                 Typographical Conventions.
 1424 * Manual History::              Brief history of the GNU project and this
 1425                                 @value{DOCUMENT}.
 1426 * How To Contribute::           Helping to save the world.
 1427 * Acknowledgments::             Acknowledgments.
 1428 @end menu
 1429 
 1430 @node History
 1431 @unnumberedsec History of @command{awk} and @command{gawk}
 1432 @cindex recipe for a programming language
 1433 @cindex programming language, recipe for
 1434 @sidebar Recipe for a Programming Language
 1435 
 1436 @multitable {2 parts} {1 part  @code{egrep}} {1 part  @code{snobol}}
 1437 @item @tab 1 part  @code{egrep} @tab 1 part  @code{snobol}
 1438 @item @tab 2 parts @code{ed} @tab 3 parts C
 1439 @end multitable
 1440 
 1441 Blend all parts well using @code{lex} and @code{yacc}.
 1442 Document minimally and release.
 1443 
 1444 After eight years, add another part @code{egrep} and two
 1445 more parts C.  Document very well and release.
 1446 @end sidebar
 1447 
 1448 @cindex Aho, Alfred
 1449 @cindex Weinberger, Peter
 1450 @cindex Kernighan, Brian
 1451 @cindex @command{awk} @subentry history of
 1452 The name @command{awk} comes from the initials of its designers: Alfred V.@:
 1453 Aho, Peter J.@: Weinberger, and Brian W.@: Kernighan.  The original version of
 1454 @command{awk} was written in 1977 at AT&T Bell Laboratories.
 1455 In 1985, a new version made the programming
 1456 language more powerful, introducing user-defined functions, multiple input
 1457 streams, and computed regular expressions.
 1458 This new version became widely available with Unix System V
 1459 Release 3.1 (1987).
 1460 The version in System V Release 4 (1989) added some new features and cleaned
 1461 up the behavior in some of the ``dark corners'' of the language.
 1462 The specification for @command{awk} in the POSIX Command Language
 1463 and Utilities standard further clarified the language.
 1464 Both the @command{gawk} designers and the original @command{awk} designers at Bell Laboratories
 1465 provided feedback for the POSIX specification.
 1466 
 1467 @cindex Rubin, Paul
 1468 @cindex Fenlason, Jay
 1469 @cindex Trueman, David
 1470 Paul Rubin wrote @command{gawk} in 1986.
 1471 Jay Fenlason completed it, with advice from Richard Stallman.  John Woods
 1472 contributed parts of the code as well.  In 1988 and 1989, David Trueman, with
 1473 help from me, thoroughly reworked @command{gawk} for compatibility
 1474 with the newer @command{awk}.
 1475 Circa 1994, I became the primary maintainer.
 1476 Current development focuses on bug fixes,
 1477 performance improvements, standards compliance, and, occasionally, new features.
 1478 
 1479 In May 1997, J@"urgen Kahrs felt the need for network access
 1480 from @command{awk}, and with a little help from me, set about adding
 1481 features to do this for @command{gawk}.  At that time, he also
 1482 wrote the bulk of
 1483 @cite{@value{GAWKINETTITLE}}
 1484 (a separate document, available as part of the @command{gawk} distribution).
 1485 His code finally became part of the main @command{gawk} distribution
 1486 with @command{gawk} @value{PVERSION} 3.1.
 1487 
 1488 John Haque rewrote the @command{gawk} internals, in the process providing
 1489 an @command{awk}-level debugger. This version became available as
 1490 @command{gawk} @value{PVERSION} 4.0 in 2011.
 1491 
 1492 @xref{Contributors}
 1493 for a full list of those who have made important contributions to @command{gawk}.
 1494 
 1495 @node Names
 1496 @unnumberedsec A Rose by Any Other Name
 1497 
 1498 @cindex @command{awk} @subentry new vs.@: old
 1499 The @command{awk} language has evolved over the years. Full details are
 1500 provided in @ref{Language History}.
 1501 The language described in this @value{DOCUMENT}
 1502 is often referred to as ``new @command{awk}.''
 1503 By analogy, the original version of @command{awk} is
 1504 referred to as ``old @command{awk}.''
 1505 
 1506 On most current systems, when you run the @command{awk} utility
 1507 you get some version of new @command{awk}.@footnote{Only
 1508 Solaris systems still use an old @command{awk} for the
 1509 default @command{awk} utility. A more modern @command{awk} lives in
 1510 @file{/usr/xpg6/bin} on these systems.} If your system's standard
 1511 @command{awk} is the old one, you will see something like this
 1512 if you try the following test program:
 1513 
 1514 @example
 1515 @group
 1516 $ @kbd{awk 1 /dev/null}
 1517 @error{} awk: syntax error near line 1
 1518 @error{} awk: bailing out near line 1
 1519 @end group
 1520 @end example
 1521 
 1522 @noindent
 1523 In this case, you should find a version of new @command{awk},
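 1524 or just install @command{gawk}!  (With a new @command{awk}, the same test prints nothing, because @file{/dev/null} supplies no input records.)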
 1525 
 1526 Throughout this @value{DOCUMENT}, whenever we refer to a language feature
 1527 that should be available in any complete implementation of POSIX @command{awk},
 1528 we simply use the term @command{awk}.  When referring to a feature that is
 1529 specific to the GNU implementation, we use the term @command{gawk}.
 1530 
 1531 @node This Manual
 1532 @unnumberedsec Using This Book
 1533 @cindex @command{awk} @subentry terms describing
 1534 
 1535 The term @command{awk} refers to a particular program as well as to the language you
 1536 use to tell this program what to do.  When we need to be careful, we call
 1537 the language ``the @command{awk} language,''
 1538 and the program ``the @command{awk} utility.''
 1539 This @value{DOCUMENT} explains
 1540 both how to write programs in the @command{awk} language and how to
 1541 run the @command{awk} utility.
 1542 The term ``@command{awk} program'' refers to a program written by you in
 1543 the @command{awk} programming language.
 1544 
 1545 @cindex @command{gawk} @subentry @command{awk} and
 1546 @cindex @command{awk} @subentry @command{gawk} and
 1547 @cindex POSIX @command{awk}
 1548 Primarily, this @value{DOCUMENT} explains the features of @command{awk}
 1549 as defined in the POSIX standard.  It does so in the context of the
 1550 @command{gawk} implementation.  While doing so, it also
 1551 attempts to describe important differences between @command{gawk}
 1552 and other @command{awk}
 1553 @ifclear FOR_PRINT
 1554 implementations.@footnote{All such differences
 1555 appear in the index under the
 1556 entry ``differences in @command{awk} and @command{gawk}.''}
 1557 @end ifclear
 1558 @ifset FOR_PRINT
 1559 implementations.
 1560 @end ifset
 1561 Finally, it notes any @command{gawk} features that are not in
 1562 the POSIX standard for @command{awk}.
 1563 
 1564 @ifnotinfo
 1565 This @value{DOCUMENT} has the difficult task of being both a tutorial and a reference.
 1566 If you are a novice, feel free to skip over details that seem too complex.
 1567 You should also ignore the many cross-references; they are for the
 1568 expert user and for the Info and
 1569 @uref{https://www.gnu.org/software/gawk/manual/, HTML}
 1570 versions of the @value{DOCUMENT}.
 1571 @end ifnotinfo
 1572 
 1573 There are sidebars
 1574 scattered throughout the @value{DOCUMENT}.
 1575 They add a more complete explanation of points that are relevant, but not likely
 1576 to be of interest on first reading.
 1577 @ifclear FOR_PRINT
 1578 All appear in the index, under the heading ``sidebar.''
 1579 @end ifclear
 1580 
 1581 Most of the time, the examples use complete @command{awk} programs.
 1582 Some of the more advanced @value{SECTION}s show only the part of the @command{awk}
 1583 program that illustrates the concept being described.
 1584 
 1585 Although this @value{DOCUMENT} is aimed principally at people who have not been
 1586 exposed
 1587 to @command{awk}, there is a lot of information here that even the @command{awk}
 1588 expert should find useful.  In particular, the description of POSIX
 1589 @command{awk} and the example programs in
 1590 @ref{Library Functions}, and
 1591 @ifnotdocbook
 1592 in
 1593 @end ifnotdocbook
 1594 @ref{Sample Programs},
 1595 should be of interest.
 1596 
 1597 This @value{DOCUMENT} is split into several parts, as follows:
 1598 
 1599 @c FULLXREF ON
 1600 
 1601 @itemize @value{BULLET}
 1602 @item
 1603 Part I describes the @command{awk} language and the @command{gawk} program in detail.
 1604 It starts with the basics, and continues through all of the features of @command{awk}.
 1605 It contains the following chapters:
 1606 
 1607 @c nested
 1608 @itemize @value{MINUS}
 1609 @item
 1610 @ref{Getting Started},
 1611 provides the essentials you need to know to begin using @command{awk}.
 1612 
 1613 @item
 1614 @ref{Invoking Gawk},
 1615 describes how to run @command{gawk}, the meaning of its
 1616 command-line options, and how it finds @command{awk}
 1617 program source files.
 1618 
 1619 @item
 1620 @ref{Regexp},
 1621 introduces regular expressions in general, and in particular the flavors
 1622 supported by POSIX @command{awk} and @command{gawk}.
 1623 
 1624 @item
 1625 @ref{Reading Files},
 1626 describes how @command{awk} reads your data.
 1627 It introduces the concepts of records and fields, as well
 1628 as the @code{getline} command.
 1629 I/O redirection is first described here.
 1630 Network I/O is also briefly introduced here.
 1631 
 1632 @item
 1633 @ref{Printing},
 1634 describes how @command{awk} programs can produce output with
 1635 @code{print} and @code{printf}.
 1636 
 1637 @item
 1638 @ref{Expressions},
 1639 describes expressions, which are the basic building blocks
 1640 for getting most things done in a program.
 1641 
 1642 @item
 1643 @ref{Patterns and Actions},
 1644 describes how to write patterns for matching records, actions for
 1645 doing something when a record is matched, and the predefined variables
 1646 @command{awk} and @command{gawk} use.
 1647 
 1648 @item
 1649 @ref{Arrays},
 1650 covers @command{awk}'s one-and-only data structure: the associative array.
 1651 Deleting array elements and whole arrays is described, as well as
 1652 sorting arrays in @command{gawk}.  The @value{CHAPTER} also describes how
 1653 @command{gawk} provides arrays of arrays.
 1654 
 1655 @item
 1656 @ref{Functions},
 1657 describes the built-in functions @command{awk} and @command{gawk} provide,
 1658 as well as how to define your own functions.  It also discusses how
 1659 @command{gawk} lets you call functions indirectly.
 1660 @end itemize
 1661 
 1662 @item
 1663 Part II shows how to use @command{awk} and @command{gawk} for problem solving.
 1664 There is lots of code here for you to read and learn from.
 1665 This part contains the following chapters:
 1666 
 1667 @c nested
 1668 @itemize @value{MINUS}
 1669 @item
 1670 @ref{Library Functions}, provides a number of functions meant to
 1671 be used from main @command{awk} programs.
 1672 
 1673 @item
 1674 @ref{Sample Programs},
 1675 provides many sample @command{awk} programs.
 1676 @end itemize
 1677 
 1678 Reading these two chapters allows you to see @command{awk}
 1679 solving real problems.
 1680 
 1681 @item
 1682 Part III focuses on features specific to @command{gawk}.
 1683 It contains the following chapters:
 1684 
 1685 @c nested
 1686 @itemize @value{MINUS}
 1687 @item
 1688 @ref{Advanced Features},
 1689 describes a number of advanced features.
 1690 Of particular note
 1691 are the abilities to control the order of array traversal,
 1692 have two-way communications with another process,
 1693 perform TCP/IP networking, and
 1694 profile your @command{awk} programs.
 1695 
 1696 @item
 1697 @ref{Internationalization},
 1698 describes special features for translating program
 1699 messages into different languages at runtime.
 1700 
 1701 @item
 1702 @ref{Debugger}, describes the @command{gawk} debugger.
 1703 
 1704 @item
 1705 @ref{Namespaces}, describes how @command{gawk} allows variables and/or
 1706 functions of the same name to be in different namespaces.
 1707 
 1708 @item
 1709 @ref{Arbitrary Precision Arithmetic},
 1710 describes advanced arithmetic facilities.
 1711 
 1712 @item
 1713 @ref{Dynamic Extensions}, describes how to add new variables and
 1714 functions to @command{gawk} by writing extensions in C or C++.
 1715 @end itemize
 1716 
 1717 @item
 1718 @ifclear FOR_PRINT
 1719 Part IV provides the appendices, the Glossary, and two licenses that cover
 1720 the @command{gawk} source code and this @value{DOCUMENT}, respectively.
 1721 It contains the following appendices:
 1722 @end ifclear
 1723 @ifset FOR_PRINT
 1724 Part IV provides the following appendices,
 1725 including the GNU General Public License:
 1726 @end ifset
 1727 
 1728 @itemize @value{MINUS}
 1729 @item
 1730 @ref{Language History},
 1731 describes how the @command{awk} language has evolved since
 1732 its first release to the present.  It also describes how @command{gawk}
 1733 has acquired features over time.
 1734 
 1735 @item
 1736 @ref{Installation},
 1737 describes how to get @command{gawk}, how to compile it
 1738 on POSIX-compatible systems,
 1739 and how to compile and use it on different
 1740 non-POSIX systems.  It also describes how to report bugs
 1741 in @command{gawk} and where to get other freely
 1742 available @command{awk} implementations.
 1743 
 1744 @ifset FOR_PRINT
 1745 @item
 1746 @ref{Copying},
 1747 presents the license that covers the @command{gawk} source code.
 1748 @end ifset
 1749 
 1750 @ifclear FOR_PRINT
 1751 @item
 1752 @ref{Notes},
 1753 describes how to disable @command{gawk}'s extensions, as
 1754 well as how to contribute new code to @command{gawk},
 1755 and some possible future directions for @command{gawk} development.
 1756 
 1757 @item
 1758 @ref{Basic Concepts},
 1759 provides some very cursory background material for those who
 1760 are completely unfamiliar with computer programming.
 1761 
 1762 @item
 1763 The @ref{Glossary}, defines most, if not all, of the significant terms used
 1764 throughout the @value{DOCUMENT}.  If you find terms that you aren't familiar with,
 1765 try looking them up here.
 1766 
 1767 @item
 1768 @ref{Copying}, and
 1769 @ref{GNU Free Documentation License},
 1770 present the licenses that cover the @command{gawk} source code
 1771 and this @value{DOCUMENT}, respectively.
 1772 @end ifclear
 1773 @end itemize
 1774 @end itemize
 1775 
 1776 @ifset FOR_PRINT
 1777 The version of this @value{DOCUMENT} distributed with @command{gawk}
 1778 contains additional appendices and other end material.
 1779 To save space, we have omitted them from the
 1780 printed edition. You may find them online, as follows:
 1781 
 1782 @itemize @value{BULLET}
 1783 @item
 1784 @uref{https://www.gnu.org/software/gawk/manual/html_node/Notes.html,
 1785 The appendix on implementation notes}
 1786 describes how to disable @command{gawk}'s extensions, how to contribute
 1787 new code to @command{gawk}, where to find information on some possible
 1788 future directions for @command{gawk} development, and the design decisions
 1789 behind the extension API.
 1790 
 1791 @item
 1792 @uref{https://www.gnu.org/software/gawk/manual/html_node/Basic-Concepts.html,
 1793 The appendix on basic concepts}
 1794 provides some very cursory background material for those who
 1795 are completely unfamiliar with computer programming.
 1796 
 1797 @item
 1798 @uref{https://www.gnu.org/software/gawk/manual/html_node/Glossary.html,
 1799 The Glossary}
 1800 defines most, if not all, of the significant terms used
 1801 throughout the @value{DOCUMENT}.  If you find terms that you aren't familiar with,
 1802 try looking them up here.
 1803 
 1804 @item
 1805 @uref{https://www.gnu.org/software/gawk/manual/html_node/GNU-Free-Documentation-License.html,
 1806 The GNU FDL}
 1807 is the license that covers this @value{DOCUMENT}.
 1808 @end itemize
 1809 
 1810 @c ok not to use CHAPTER / SECTION here
 1811 Some of the chapters have exercise sections; these have also been
 1812 omitted from the print edition but are available online.
 1813 @end ifset
 1814 
 1815 @c FULLXREF OFF
 1816 
 1817 @node Conventions
 1818 @unnumberedsec Typographical Conventions
 1819 
 1820 @cindex Texinfo
 1821 This @value{DOCUMENT} is written in @uref{https://www.gnu.org/software/texinfo/, Texinfo},
 1822 the GNU documentation formatting language.
 1823 A single Texinfo source file is used to produce both the printed and online
 1824 versions of the documentation.
 1825 @ifnotinfo
 1826 Because of this, the typographical conventions
 1827 are slightly different from those in other books you may have read.
 1828 @end ifnotinfo
 1829 @ifinfo
 1830 This @value{SECTION} briefly documents the typographical conventions used in Texinfo.
 1831 @end ifinfo
 1832 
 1833 Examples you would type at the command line are preceded by the common
 1834 shell primary and secondary prompts, @samp{$} and @samp{>}, respectively.
 1835 Input that you type is shown @kbd{like this}.
 1836 @c 8/2014: @print{} is stripped from the texi to make docbook.
 1837 @ifclear FOR_PRINT
 1838 Output from the command is preceded by the glyph ``@print{}''.
 1839 This typically represents the command's standard output.
 1840 @end ifclear
 1841 @ifset FOR_PRINT
 1842 Output from the command, usually its standard output, appears
 1843 @code{like this}.
 1844 @end ifset
 1845 Error messages and other output on the command's standard error are preceded
 1846 by the glyph ``@error{}''.  For example:
 1847 
 1848 @example
 1849 $ @kbd{echo hi on stdout}
 1850 @print{} hi on stdout
 1851 $ @kbd{echo hello on stderr 1>&2}
 1852 @error{} hello on stderr
 1853 @end example
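
The secondary prompt appears when a command continues onto a following
line.  As a small sketch (the two-line program here is purely
illustrative and is not taken from elsewhere in this @value{DOCUMENT}):

@example
$ @kbd{awk 'BEGIN @{ print "line one"}
> @kbd{        print "line two" @}'}
@print{} line one
@print{} line two
@end example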
 1854 
 1855 @ifnotinfo
 1856 In the text, almost anything related to programming, such as
 1857 command names,
 1858 variable and function names, and string, numeric and regexp constants
 1859 appear in @code{this font}. Code fragments
 1860 appear in the same font and quoted, @samp{like this}.
 1861 Things that are replaced by the user or programmer
 1862 appear in @var{this font}.
 1863 Options look like this: @option{-f}.
 1864 @value{FFN}s are indicated like this: @file{/path/to/ourfile}.
 1865 @ifclear FOR_PRINT
 1866 Some things are
 1867 emphasized @emph{like this}, and if a point needs to be made
 1868 strongly, it is done @strong{like this}.
 1869 @end ifclear
 1870 The first occurrence of
 1871 a new term is usually its @dfn{definition} and appears in the same
 1872 font as the previous occurrence of ``definition'' in this sentence.
 1873 @end ifnotinfo
 1874 
 1875 Characters that you type at the keyboard look @kbd{like this}.  In particular,
 1876 there are special characters called ``control characters.''  These are
 1877 characters that you type by holding down both the @kbd{CONTROL} key and
 1878 another key, at the same time.  For example, a @kbd{Ctrl-d} is typed
 1879 by first pressing and holding the @kbd{CONTROL} key, next
 1880 pressing the @kbd{d} key, and finally releasing both keys.
 1881 
 1882 For the sake of brevity, throughout this @value{DOCUMENT}, we refer to
 1883 Brian Kernighan's version of @command{awk} as ``BWK @command{awk}.''
 1884 (@xref{Other Versions} for information on his and other versions.)
 1885 
 1886 @ifset FOR_PRINT
 1887 @quotation NOTE
 1888 Notes of interest look like this.
 1889 @end quotation
 1890 
 1891 @quotation CAUTION
 1892 Cautionary or warning notes look like this.
 1893 @end quotation
 1894 @end ifset
 1895 
 1896 @c fakenode --- for prepinfo
 1897 @unnumberedsubsec Dark Corners
 1898 @cindex Kernighan, Brian
 1899 @quotation
 1900 @i{Dark corners are basically fractal---no matter how much
 1901 you illuminate, there's always a smaller but darker one.}
 1902 @author Brian Kernighan
 1903 @end quotation
 1904 
 1905 @cindex d.c. @seeentry{dark corner}
 1906 @cindex dark corner
 1907 Until the POSIX standard (and @cite{@value{TITLE}}),
 1908 many features of @command{awk} were either poorly documented or not
 1909 documented at all.  Descriptions of such features
 1910 (often called ``dark corners'') are noted in this @value{DOCUMENT} with
 1911 @iftex
 1912 the picture of a flashlight in the margin, as shown here.
 1913 @value{DARKCORNER}
 1914 @end iftex
 1915 @ifnottex
 1916 ``(d.c.).''
 1917 @end ifnottex
 1918 @ifclear FOR_PRINT
 1919 They also appear in the index under the heading ``dark corner.''
 1920 @end ifclear
 1921 
 1922 But, as noted by the opening quote, any coverage of dark
 1923 corners is by definition incomplete.
 1924 
 1925 @cindex c.e. @seeentry{common extensions}
 1926 Extensions to the standard @command{awk} language that are supported by
 1927 more than one @command{awk} implementation are marked
 1928 @ifclear FOR_PRINT
 1929 ``@value{COMMONEXT},'' and listed in the index under ``common extensions''
 1930 and ``extensions, common.''
 1931 @end ifclear
 1932 @ifset FOR_PRINT
 1933 ``@value{COMMONEXT}'' for ``common extension.''
 1934 @end ifset
 1935 
 1936 @node Manual History
 1937 @unnumberedsec The GNU Project and This Book
 1938 
 1939 @cindex FSF (Free Software Foundation)
 1940 @cindex Free Software Foundation (FSF)
 1941 @cindex Stallman, Richard
 1942 The Free Software Foundation (FSF) is a nonprofit organization dedicated
 1943 to the production and distribution of freely distributable software.
 1944 It was founded by Richard M.@: Stallman, the author of the original
 1945 Emacs editor.  GNU Emacs is the most widely used version of Emacs today.
 1946 
 1947 @cindex GNU Project
 1948 @cindex GPL (General Public License)
 1949 @cindex GNU General Public License @seeentry{GPL}
 1950 @cindex General Public License @seeentry{GPL}
 1951 @cindex documentation @subentry online
 1952 The GNU@footnote{GNU stands for ``GNU's Not Unix.''}
 1953 Project is an ongoing effort on the part of the Free Software
 1954 Foundation to create a complete, freely distributable, POSIX-compliant
 1955 computing environment.
 1956 The FSF uses the GNU General Public License (GPL) to ensure that
 1957 its software's
 1958 source code is always available to the end user.
 1959 @ifclear FOR_PRINT
 1960 A copy of the GPL is included
 1961 @ifnotinfo
 1962 in this @value{DOCUMENT}
 1963 @end ifnotinfo
 1964 for your reference
 1965 (@pxref{Copying}).
 1966 @end ifclear
 1967 The GPL applies to the C language source code for @command{gawk}.
 1968 To find out more about the FSF and the GNU Project online,
 1969 see @uref{https://www.gnu.org, the GNU Project's home page}.
 1970 This @value{DOCUMENT} may also be read from
 1971 @uref{https://www.gnu.org/software/gawk/manual/, GNU's website}.
 1972 
 1973 @ifclear FOR_PRINT
 1974 A shell, an editor (Emacs), highly portable optimizing C, C++, and
 1975 Objective-C compilers, a symbolic debugger and dozens of large and
 1976 small utilities (such as @command{gawk}), have all been completed and are
 1977 freely available.  The GNU operating
 1978 system kernel (the HURD), has been released but remains in an early
 1979 stage of development.
 1980 
 1981 @cindex Linux @seeentry{GNU/Linux}
 1982 @cindex GNU/Linux
 1983 @cindex operating systems @subentry BSD-based
 1984 Until the GNU operating system is more fully developed, you should
 1985 consider using GNU/Linux, a freely distributable, Unix-like operating
 1986 system for Intel,
 1987 Power Architecture,
 1988 Sun SPARC, IBM S/390, and other
 1989 systems.@footnote{The terminology ``GNU/Linux'' is explained
 1990 in the @ref{Glossary}.}
 1991 Many GNU/Linux distributions are
 1992 available for download from the Internet.
 1993 @end ifclear
 1994 
 1995 @ifnotinfo
 1996 The @value{DOCUMENT} you are reading is actually free---at least, the
 1997 information in it is free to anyone.  The machine-readable
 1998 source code for the @value{DOCUMENT} comes with @command{gawk}.
 1999 @ifclear FOR_PRINT
 2000 (Take a moment to check the Free Documentation
 2001 License in @ref{GNU Free Documentation License}.)
 2002 @end ifclear
 2003 @end ifnotinfo
 2004 
 2005 @cindex Close, Diane
 2006 The @value{DOCUMENT} itself has gone through multiple previous editions.
 2007 Paul Rubin wrote the very first draft of @cite{The GAWK Manual};
 2008 it was around 40 pages long.
 2009 Diane Close and Richard Stallman improved it, yielding a
 2010 version that was
 2011 around 90 pages and barely described the original, ``old''
 2012 version of @command{awk}.
 2013 
 2014 I started working with that version in the fall of 1988.
 2015 As work on it progressed,
 2016 the FSF published several preliminary versions (numbered 0.@var{x}).
 2017 In 1996, edition 1.0 was released with @command{gawk} 3.0.0.
 2018 The FSF published the first two editions under
 2019 the title @cite{The GNU Awk User's Guide}.
 2020 @ifset FOR_PRINT
 2021 SSC published two editions of the @value{DOCUMENT} under the
 2022 title @cite{Effective awk Programming}, and O'Reilly published
 2023 the third edition in 2001.
 2024 @end ifset
 2025 
 2026 This edition maintains the basic structure of the previous editions.
 2027 For FSF edition 4.0, the content was thoroughly reviewed and updated. All
 2028 references to @command{gawk} versions prior to 4.0 were removed.
 2029 Of significant note for that edition was the addition of @ref{Debugger}.
 2030 
 2031 For FSF edition
 2032 @ifclear FOR_PRINT
 2033 5.0,
 2034 @end ifclear
 2035 @ifset FOR_PRINT
 2036 @value{EDITION}
 2037 (the fourth edition as published by O'Reilly),
 2038 @end ifset
 2039 the content has been reorganized into parts,
 2040 and the major new additions are @ref{Arbitrary Precision Arithmetic},
 2041 and @ref{Dynamic Extensions}.
 2042 
 2043 This @value{DOCUMENT} will undoubtedly continue to evolve.  If you
 2044 find an error in the @value{DOCUMENT}, please report it!  @xref{Bugs}
 2045 for information on submitting problem reports electronically.
 2046 
 2047 @ifset FOR_PRINT
 2048 @c fakenode --- for prepinfo
 2049 @unnumberedsec How to Stay Current
 2050 
 2051 You may have a newer version of @command{gawk} than the
 2052 one described here.  To find out what has changed,
 2053 you should first look at the @file{NEWS} file in the @command{gawk}
 2054 distribution, which provides a high-level summary of the changes in
 2055 each release.
 2056 
 2057 You can then look at the @uref{https://www.gnu.org/software/gawk/manual/,
 2058 online version} of this @value{DOCUMENT} to read about any new features.
 2059 @end ifset
 2060 
 2061 @ifclear FOR_PRINT
 2062 @node How To Contribute
 2063 @unnumberedsec How to Contribute
 2064 
 2065 As the maintainer of GNU @command{awk}, I once thought that I would be
 2066 able to manage a collection of publicly available @command{awk} programs
 2067 and I even solicited contributions.  Making things available on the Internet
 2068 helps keep the @command{gawk} distribution down to manageable size.
 2069 
 2070 The initial collection of material, such as it is, is still available
 2071 at @uref{ftp://ftp.freefriends.org/arnold/Awkstuff}.
 2072 
 2073 In the hopes of doing something more broad, I acquired the
 2074 @code{awklang.org} domain.  Late in 2017, a volunteer took on the task
 2075 of managing it.
 2076 
 2077 If you have written an interesting @command{awk} program that
 2078 you would like to share with the rest of the world, please see
 2079 @uref{http://www.awklang.org} and use the ``Contact'' link.
 2080 
 2081 If you have written a @command{gawk} extension, please see
 2082 @ref{gawkextlib}.
 2083 @end ifclear
 2084 
 2085 @node Acknowledgments
 2086 @unnumberedsec Acknowledgments
 2087 
 2088 The initial draft of @cite{The GAWK Manual} had the following acknowledgments:
 2089 
 2090 @quotation
 2091 Many people need to be thanked for their assistance in producing this
 2092 manual.  Jay Fenlason contributed many ideas and sample programs.  Richard
 2093 Mlynarik and Robert Chassell gave helpful comments on drafts of this
 2094 manual.  The paper @cite{A Supplemental Document for AWK} by John W.@:
 2095 Pierce of the Chemistry Department at UC San Diego, pinpointed several
 2096 issues relevant both to @command{awk} implementation and to this manual, that
 2097 would otherwise have escaped us.
 2098 @end quotation
 2099 
 2100 @cindex Stallman, Richard
 2101 I would like to acknowledge Richard M.@: Stallman, for his vision of a
 2102 better world and for his courage in founding the FSF and starting the
 2103 GNU Project.
 2104 
 2105 @ifclear FOR_PRINT
 2106 Earlier editions of this @value{DOCUMENT} had the following acknowledgments:
 2107 @end ifclear
 2108 @ifset FOR_PRINT
 2109 The previous edition of this @value{DOCUMENT} had
 2110 the following acknowledgments:
 2111 @end ifset
 2112 
 2113 @quotation
 2114 The following people (in alphabetical order)
 2115 provided helpful comments on various
 2116 versions of this book:
 2117 Rick Adams,
 2118 Dr.@: Nelson H.F. Beebe,
 2119 Karl Berry,
 2120 Dr.@: Michael Brennan,
 2121 Rich Burridge,
 2122 Claire Cloutier,
 2123 Diane Close,
 2124 Scott Deifik,
 2125 Christopher (``Topher'') Eliot,
 2126 Jeffrey Friedl,
 2127 Dr.@: Darrel Hankerson,
 2128 Michal Jaegermann,
 2129 Dr.@: Richard J.@: LeBlanc,
 2130 Michael Lijewski,
 2131 Pat Rankin,
 2132 Miriam Robbins,
 2133 Mary Sheehan,
 2134 and
 2135 Chuck Toporek.
 2136 
 2137 @cindex Berry, Karl
 2138 @cindex Chassell, Robert J.@:
 2139 @c @cindex Texinfo
 2140 Robert J.@: Chassell provided much valuable advice on
 2141 the use of Texinfo.
 2142 He also deserves special thanks for
 2143 convincing me @emph{not} to title this @value{DOCUMENT}
 2144 @cite{How to Gawk Politely}.
 2145 Karl Berry helped significantly with the @TeX{} part of Texinfo.
 2146 
 2147 @cindex Hartholz @subentry Marshall
 2148 @cindex Hartholz @subentry Elaine
 2149 @cindex Schreiber @subentry Bert
 2150 @cindex Schreiber @subentry Rita
 2151 I would like to thank Marshall and Elaine Hartholz of Seattle and
 2152 Dr.@: Bert and Rita Schreiber of Detroit for large amounts of quiet vacation
 2153 time in their homes, which allowed me to make significant progress on
 2154 this @value{DOCUMENT} and on @command{gawk} itself.
 2155 
 2156 @cindex Hughes, Phil
 2157 Phil Hughes of SSC
 2158 contributed in a very important way by loaning me his laptop GNU/Linux
 2159 system, not once, but twice, which allowed me to do a lot of work while
 2160 away from home.
 2161 
 2162 @cindex Trueman, David
 2163 David Trueman deserves special credit; he has done a yeoman job
 2164 of evolving @command{gawk} so that it performs well and without bugs.
 2165 Although he is no longer involved with @command{gawk},
 2166 working with him on this project was a significant pleasure.
 2167 
 2168 @cindex Drepper, Ulrich
 2169 @cindex GNITS mailing list
 2170 @cindex mailing list, GNITS
 2171 The intrepid members of the GNITS mailing list, and most notably Ulrich
 2172 Drepper, provided invaluable help and feedback for the design of the
 2173 internationalization features.
 2174 
 2175 Chuck Toporek, Mary Sheehan, and Claire Cloutier of O'Reilly & Associates contributed
 2176 significant editorial help for this @value{DOCUMENT} for the
 2177 3.1 release of @command{gawk}.
 2178 @end quotation
 2179 
 2180 @cindex Beebe, Nelson H.F.@:
 2181 @cindex Buening, Andreas
 2182 @cindex Collado, Manuel
 2183 @cindex Colombo, Antonio
 2184 @cindex Davies, Stephen
 2185 @cindex Deifik, Scott
 2186 @cindex Demaille, Akim
 2187 @cindex G., Daniel Richard
 2188 @cindex Guerrero, Juan Manuel
 2189 @cindex Hankerson, Darrel
 2190 @cindex Jaegermann, Michal
 2191 @cindex Kahrs, J@"urgen
 2192 @cindex Kasal, Stepan
 2193 @cindex Malmberg, John
 2194 @cindex Ramey, Chet
 2195 @cindex Rankin, Pat
 2196 @cindex Schorr, Andrew
 2197 @cindex Vinschen, Corinna
 2198 @cindex Zaretskii, Eli
 2199 
 2200 Dr.@: Nelson Beebe,
 2201 Andreas Buening,
 2202 Dr.@: Manuel Collado,
 2203 Antonio Colombo,
 2204 Stephen Davies,
 2205 Scott Deifik,
 2206 Akim Demaille,
 2207 Daniel Richard G.,
 2208 Juan Manuel Guerrero,
 2209 Darrel Hankerson,
 2210 Michal Jaegermann,
 2211 J@"urgen Kahrs,
 2212 Stepan Kasal,
 2213 John Malmberg,
 2214 Chet Ramey,
 2215 Pat Rankin,
 2216 Andrew Schorr,
 2217 Corinna Vinschen,
 2218 and Eli Zaretskii
 2219 (in alphabetical order)
 2220 make up the current @command{gawk} ``crack portability team.''  Without
 2221 their hard work and help, @command{gawk} would not be nearly the robust,
 2222 portable program it is today.  It has been and continues to be a pleasure
 2223 working with this team of fine people.
 2224 
 2225 Notable code and documentation contributions were made by
 2226 a number of people. @xref{Contributors} for the full list.
 2227 
 2228 @ifset FOR_PRINT
 2229 @cindex Oram, Andy
 2230 Thanks to Andy Oram of O'Reilly Media for initiating
 2231 the fourth edition and for his support during the work.
 2232 Thanks to Jasmine Kwityn for her copyediting work.
 2233 @end ifset
 2234 
 2235 Thanks to Michael Brennan for the Forewords.
 2236 
 2237 @cindex Dumas, Patrice
 2238 @cindex Berry, Karl
 2239 @cindex Smith, Gavin
 2240 Thanks to Patrice Dumas for the new @command{makeinfo} program.
 2241 Thanks to Karl Berry for his past work on Texinfo, and
 2242 to Gavin Smith, who continues to work to improve
 2243 the Texinfo markup language.
 2244 
 2245 @cindex Kernighan, Brian
 2246 @cindex Brennan, Michael
 2247 @cindex Day, Robert P.J.@:
 2248 Robert P.J.@: Day, Michael Brennan, and Brian Kernighan kindly acted as
 2249 reviewers for the 2015 edition of this @value{DOCUMENT}. Their feedback
 2250 helped improve the final work.
 2251 
 2252 I would also like to thank Brian Kernighan for his invaluable assistance during the
 2253 testing and debugging of @command{gawk}, and for his ongoing
 2254 help and advice in clarifying numerous points about the language.
 2255 We could not have done nearly as good a job on either @command{gawk}
 2256 or its documentation without his help.
 2257 
 2258 Brian is in a class by himself as a programmer and technical
 2259 author.  I have to thank him (yet again) for his ongoing friendship
 2260 and for being a role model to me for over 30 years!
 2261 Having him as a reviewer is an exciting privilege. It has also
 2262 been extremely humbling@enddots{}
 2263 
 2264 @cindex Robbins @subentry Miriam
 2265 @cindex Robbins @subentry Jean
 2266 @cindex Robbins @subentry Harry
 2267 @cindex G-d
 2268 I must thank my wonderful wife, Miriam, for her patience through
 2269 the many versions of this project, for her proofreading,
 2270 and for sharing me with the computer.
 2271 I would like to thank my parents for their love, and for the grace with
 2272 which they raised and educated me.
 2273 Finally, I also must acknowledge my gratitude to G-d, for the many opportunities
 2274 He has sent my way, as well as for the gifts He has given me with which to
 2275 take advantage of those opportunities.
 2276 @ifnotdocbook
 2277 @sp 2
 2278 @noindent
 2279 Arnold Robbins @*
 2280 Nof Ayalon @*
 2281 Israel @*
 2282 March, 2020
 2283 @end ifnotdocbook
 2284 
 2285 @ifnotinfo
 2286 @part @value{PART1}The @command{awk} Language
 2287 @end ifnotinfo
 2288 
 2289 @ifdocbook
 2290 
 2291 Part I describes the @command{awk} language and @command{gawk} program
 2292 in detail.  It starts with the basics, and continues through all of
 2293 the features of @command{awk}. Included also are many, but not all,
 2294 of the features of @command{gawk}.  This part contains the
 2295 following chapters:
 2296 
 2297 @itemize @value{BULLET}
 2298 @item
 2299 @ref{Getting Started}
 2300 
 2301 @item
 2302 @ref{Invoking Gawk}
 2303 
 2304 @item
 2305 @ref{Regexp}
 2306 
 2307 @item
 2308 @ref{Reading Files}
 2309 
 2310 @item
 2311 @ref{Printing}
 2312 
 2313 @item
 2314 @ref{Expressions}
 2315 
 2316 @item
 2317 @ref{Patterns and Actions}
 2318 
 2319 @item
 2320 @ref{Arrays}
 2321 
 2322 @item
 2323 @ref{Functions}
 2324 @end itemize
 2325 @end ifdocbook
 2326 
 2327 @node Getting Started
 2328 @chapter Getting Started with @command{awk}
 2329 @c @cindex script, definition of
 2330 @c @cindex rule, definition of
 2331 @c @cindex program, definition of
 2332 @c @cindex basic function of @command{awk}
 2333 @cindex @command{awk} @subentry function of
 2334 
 2335 The basic function of @command{awk} is to search files for lines (or other
 2336 units of text) that contain certain patterns.  When a line matches one
 2337 of the patterns, @command{awk} performs specified actions on that line.
 2338 @command{awk} continues to process input lines in this way until it reaches
 2339 the end of the input files.
 2340 
 2341 @cindex @command{awk} @subentry uses for
 2342 @cindex programming languages @subentry data-driven vs.@: procedural
 2343 @cindex @command{awk} programs
 2344 Programs in @command{awk} are different from programs in most other languages,
 2345 because @command{awk} programs are @dfn{data driven} (i.e., you describe
 2346 the data you want to work with and then what to do when you find it).
 2347 Most other languages are @dfn{procedural}; you have to describe, in great
 2348 detail, every step the program should take.  When working with procedural
 2349 languages, it is usually much
 2350 harder to clearly describe the data your program will process.
 2351 For this reason, @command{awk} programs are often refreshingly easy to
 2352 read and write.
 2353 
 2354 @cindex program, definition of
 2355 @cindex rule, definition of
 2356 When you run @command{awk}, you specify an @command{awk} @dfn{program} that
 2357 tells @command{awk} what to do.  The program consists of a series of
 2358 @dfn{rules} (it may also contain @dfn{function definitions},
 2359 an advanced feature that we will ignore for now;
 2360 @pxref{User-defined}).  Each rule specifies one
 2361 pattern to search for and one action to perform
 2362 upon finding the pattern.
 2363 
 2364 Syntactically, a rule consists of a @dfn{pattern} followed by an
 2365 @dfn{action}.  The action is enclosed in braces to separate it from the
 2366 pattern.  Newlines usually separate rules.  Therefore, an @command{awk}
 2367 program looks like this:
 2368 
 2369 @example
 2370 @var{pattern} @{ @var{action} @}
 2371 @var{pattern} @{ @var{action} @}
 2372 @dots{}
 2373 @end example
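As a minimal sketch of this layout, the following shell fragment runs a
two-rule program on three lines of made-up input.  The sample data and the
shell variable @code{out} are assumptions chosen for illustration; they are
not part of the @command{gawk} distribution.

```shell
# Rule 1 has a pattern and an action; rule 2 uses the special END pattern.
out=$(printf 'apple 3\nbanana 12\ncherry 7\n' |
      awk '$2 > 5 { print $1 }
           END    { print NR " lines read" }')
printf '%s\n' "$out"
```

The first rule prints the first field of every line whose second field
exceeds 5.  The second rule runs once, after all of the input has been read.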
 2374 
 2375 @menu
 2376 * Running gawk::                How to run @command{gawk} programs; includes
 2377                                 command-line syntax.
 2378 * Sample Data Files::           Sample data files for use in the @command{awk}
 2379                                 programs illustrated in this @value{DOCUMENT}.
 2380 * Very Simple::                 A very simple example.
 2381 * Two Rules::                   A less simple one-line example using two
 2382                                 rules.
 2383 * More Complex::                A more complex example.
 2384 * Statements/Lines::            Subdividing or combining statements into
 2385                                 lines.
 2386 * Other Features::              Other Features of @command{awk}.
 2387 * When::                        When to use @command{gawk} and when to use
 2388                                 other things.
 2389 * Intro Summary::               Summary of the introduction.
 2390 @end menu
 2391 
 2392 @node Running gawk
 2393 @section How to Run @command{awk} Programs
 2394 
 2395 @cindex @command{awk} programs @subentry running
 2396 There are several ways to run an @command{awk} program.  If the program is
 2397 short, it is easiest to include it in the command that runs @command{awk},
 2398 like this:
 2399 
 2400 @example
 2401 awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
 2402 @end example
 2403 
 2404 @cindex command line @subentry formats
 2405 When the program is long, it is usually more convenient to put it in a file
 2406 and run it with a command like this:
 2407 
 2408 @example
 2409 awk -f @var{program-file} @var{input-file1} @var{input-file2} @dots{}
 2410 @end example
 2411 
 2412 This @value{SECTION} discusses both mechanisms, along with several
 2413 variations of each.
 2414 
 2415 @menu
 2416 * One-shot::                    Running a short throwaway @command{awk}
 2417                                 program.
 2418 * Read Terminal::               Using no input files (input from the keyboard
 2419                                 instead).
 2420 * Long::                        Putting permanent @command{awk} programs in
 2421                                 files.
 2422 * Executable Scripts::          Making self-contained @command{awk} programs.
 2423 * Comments::                    Adding documentation to @command{gawk}
 2424                                 programs.
 2425 * Quoting::                     More discussion of shell quoting issues.
 2426 @end menu
 2427 
 2428 @node One-shot
 2429 @subsection One-Shot Throwaway @command{awk} Programs
 2430 
 2431 Once you are familiar with @command{awk}, you will often type in simple
 2432 programs the moment you want to use them.  In that case, you can write the
 2433 program as the first argument of the @command{awk} command, like this:
 2434 
 2435 @example
 2436 awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
 2437 @end example
 2438 
 2439 @noindent
 2440 where @var{program} consists of a series of patterns and
 2441 actions, as described earlier.
 2442 
 2443 @cindex single quote (@code{'})
 2444 @cindex @code{'} (single quote)
 2445 This command format instructs the @dfn{shell}, or command interpreter,
 2446 to start @command{awk} and use the @var{program} to process records in the
 2447 input file(s).  There are single quotes around @var{program} so
 2448 the shell won't interpret any @command{awk} characters as special shell
 2449 characters.  The quotes also cause the shell to treat all of @var{program} as
 2450 a single argument for @command{awk}, and allow @var{program} to be more
 2451 than one line long.
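The effect of the quotes can be sketched as follows, assuming a
POSIX-style shell.  The input text, the variable names, and the
positional parameters set with @code{set --} are hypothetical values
used only to make the difference visible.

```shell
# The single quotes pass the whole multi-line program to awk as one
# argument and keep the shell from expanding $1 and $2.
swapped=$(echo 'alpha beta' | awk '{
    print $2, $1
}')
echo "$swapped"

# With double quotes the shell substitutes its own $2 before awk runs.
set -- one two      # hypothetical positional parameters
leaked=$(echo 'alpha beta' | awk "{ print \"$2\" }")
echo "$leaked"
```

In the second command @command{awk} never sees a field reference at all;
the shell has already replaced it with the string @samp{two}.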
 2452 
 2453 @cindex shells @subentry scripts
 2454 @cindex @command{awk} programs @subentry running @subentry from shell scripts
 2455 This format is also useful for running short or medium-sized @command{awk}
 2456 programs from shell scripts, because it avoids the need for a separate
 2457 file for the @command{awk} program.  A self-contained shell script is more
 2458 reliable because there are no other files to misplace.
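One possible way to embed such a program in a shell script is to wrap it
in a shell function, as in this sketch.  The function name
@code{count_fields} is a hypothetical helper invented for this example.

```shell
# count_fields: hypothetical helper that prints the number of fields on
# each input line.  File names, if any, are passed straight through to awk;
# with no file names, awk reads standard input.
count_fields() {
    awk '{ print NF }' "$@"
}

result=$(printf 'a b c\nd e\n' | count_fields)
printf '%s\n' "$result"
```

Because the @command{awk} program lives inside the script itself, there is
no second file that has to be installed alongside it.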
 2459 
 2460 Later in this chapter, in
 2461 @ifdocbook
 2462 the @value{SECTION}
 2463 @end ifdocbook
 2464 @ref{Very Simple},
 2465 we'll see examples of several short,
 2466 self-contained programs.
 2467 
 2468 @node Read Terminal
 2469 @subsection Running @command{awk} Without Input Files
 2470 
 2471 @cindex standard input
 2472 @cindex input @subentry standard
 2473 @cindex input files @subentry running @command{awk} without
 2474 You can also run @command{awk} without any input files.  If you type the
 2475 following command line:
 2476 
 2477 @example
 2478 awk '@var{program}'
 2479 @end example
 2480 
 2481 @noindent
 2482 @command{awk} applies the @var{program} to the @dfn{standard input},
 2483 which usually means whatever you type on the keyboard.  This continues
 2484 until you indicate end-of-file by typing @kbd{Ctrl-d}.
 2485 (On non-POSIX operating systems, the end-of-file character may be different.)
 2486 
 2487 @cindex files @subentry input @seeentry{input files}
 2488 @cindex input files @subentry running @command{awk} without
 2489 @cindex @command{awk} programs @subentry running @subentry without input files
 2490 As an example, the following program prints a friendly piece of advice
 2491 (from Douglas Adams's @cite{The Hitchhiker's Guide to the Galaxy}),
 2492 to keep you from worrying about the complexities of computer
 2493 programming:
 2494 
 2495 @example
 2496 $ @kbd{awk 'BEGIN @{ print "Don\47t Panic!" @}'}
 2497 @print{} Don't Panic!
 2498 @end example
 2499 
 2500 @command{awk} executes statements associated with @code{BEGIN} before
 2501 reading any input.  If there are no other statements in your program,
 2502 as is the case here, @command{awk} just stops, instead of trying to read
 2503 input it doesn't know how to process.
 2504 The @samp{\47} is an octal escape sequence (explained later) that puts a single quote into
 2505 the program, without having to engage in ugly shell quoting tricks.
 2506 
 2507 @quotation NOTE
 2508 If you use Bash as your shell, you should execute the
 2509 command @samp{set +H} before running this program interactively, to
 2510 disable the C shell-style command history, which treats @samp{!} as a
 2511 special character. We recommend putting this command into your personal
 2512 startup file.
 2513 @end quotation
 2514 
 2515 This next simple @command{awk} program
 2516 emulates the @command{cat} utility; it copies whatever you type on the
 2517 keyboard to its standard output (why this works is explained shortly):
 2518 
 2519 @example
 2520 $ @kbd{awk '@{ print @}'}
 2521 @kbd{Now is the time for all good men}
 2522 @print{} Now is the time for all good men
 2523 @kbd{to come to the aid of their country.}
 2524 @print{} to come to the aid of their country.
 2525 @kbd{Four score and seven years ago, ...}
 2526 @print{} Four score and seven years ago, ...
 2527 @kbd{What, me worry?}
 2528 @print{} What, me worry?
 2529 @kbd{Ctrl-d}
 2530 @end example
 2531 
 2532 @node Long
 2533 @subsection Running Long Programs
 2534 
 2535 @cindex @command{awk} programs @subentry running
 2536 @cindex @command{awk} programs @subentry lengthy
 2537 @cindex files @subentry @command{awk} programs in
 2538 Sometimes @command{awk} programs are very long.  In these cases, it is
 2539 more convenient to put the program into a separate file.  In order to tell
 2540 @command{awk} to use that file for its program, you type:
 2541 
 2542 @example
 2543 awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{}
 2544 @end example
 2545 
 2546 @cindex @option{-f} option
 2547 @cindex command line @subentry option @option{-f}
 2548 The @option{-f} instructs the @command{awk} utility to get the
 2549 @command{awk} program from the file @var{source-file} (@pxref{Options}).
 2550 Any @value{FN} can be used for @var{source-file}.  For example, you
 2551 could put the program:
 2552 
 2553 @example
 2554 BEGIN @{ print "Don't Panic!" @}
 2555 @end example
 2556 
 2557 @noindent
 2558 into the file @file{advice}.  Then this command:
 2559 
 2560 @example
 2561 awk -f advice
 2562 @end example
 2563 
 2564 @noindent
 2565 does the same thing as this one:
 2566 
 2567 @example
 2568 awk 'BEGIN @{ print "Don\47t Panic!" @}'
 2569 @end example
 2570 
 2571 @cindex quoting @subentry in @command{gawk} command lines
 2572 @noindent
 2573 This was explained earlier
 2574 (@pxref{Read Terminal}).
 2575 Note that you don't usually need single quotes around the @value{FN} that you
 2576 specify with @option{-f}, because most @value{FN}s don't contain any of the shell's
 2577 special characters.  Notice that in @file{advice}, the @command{awk}
 2578 program did not have single quotes around it.  The quotes are only needed
 2579 for programs that are provided on the @command{awk} command line.
 2580 (Also, placing the program in a file allows us to use a literal single quote in the program
 2581 text, instead of the @samp{\47} escape sequence.)
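The equivalence of the two forms can be checked with a short sketch like
the following.  It assumes that the @command{mktemp} utility is available
for creating a scratch directory; the variable names are chosen only for
this example.

```shell
# Create a scratch directory and put the program into a file named advice.
dir=$(mktemp -d)
cat > "$dir/advice" <<'EOF'
BEGIN { print "Don't Panic!" }
EOF

# Run the program from the file, then from the command line.
from_file=$(awk -f "$dir/advice")
from_cmdline=$(awk 'BEGIN { print "Don\47t Panic!" }')
rm -r "$dir"

printf '%s\n%s\n' "$from_file" "$from_cmdline"
```

Both commands print the same line.  Only the command-line version needs
the @samp{\47} escape in place of the apostrophe.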
 2582 
 2583 @cindex single quote (@code{'}) @subentry in @command{gawk} command lines
 2584 @cindex @code{'} (single quote) @subentry in @command{gawk} command lines
 2585 If you want to clearly identify an @command{awk} program file as such,
 2586 you can add the extension @file{.awk} to the @value{FN}.  This doesn't
 2587 affect the execution of the @command{awk} program but it does make
 2588 ``housekeeping'' easier.
 2589 
 2590 @node Executable Scripts
 2591 @subsection Executable @command{awk} Programs
 2592 @cindex @command{awk} programs
 2593 @cindex @code{#} (number sign) @subentry @code{#!} (executable scripts)
 2594 @cindex Unix @subentry @command{awk} scripts and
 2595 @cindex number sign (@code{#}) @subentry @code{#!} (executable scripts)
 2596 
 2597 Once you have learned @command{awk}, you may want to write self-contained
 2598 @command{awk} scripts, using the @samp{#!} script mechanism.  You can do
 2599 this on many systems.@footnote{The @samp{#!} mechanism works on
 2600 GNU/Linux systems, BSD-based systems, and commercial Unix systems.}
 2601 For example, you could update the file @file{advice} to look like this:
 2602 
 2603 @example
 2604 #! /bin/awk -f
 2605 
 2606 BEGIN @{ print "Don't Panic!" @}
 2607 @end example
 2608 
 2609 @noindent
 2610 After making this file executable (with the @command{chmod} utility),
 2611 simply type @samp{./advice}
 2612 at the shell and the system arranges to run @command{awk} as if you had
 2613 typed @samp{awk -f advice}:
 2614 
 2615 @example
 2616 $ @kbd{chmod +x advice}
 2617 $ @kbd{./advice}
 2618 @print{} Don't Panic!
 2619 @end example
 2620 
 2621 @noindent
 2622 Self-contained @command{awk} scripts are useful when you want to write a
 2623 program that users can invoke without their having to know that the program is
 2624 written in @command{awk}.
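The whole procedure might be scripted as in the following sketch.  Because
the location of @command{awk} differs from system to system, the sketch
asks the shell for it with @samp{command -v awk} instead of assuming
@file{/bin/awk}.  It also assumes that @command{mktemp} is available and
that files in the scratch directory are allowed to be executed.

```shell
# Look up the full path of awk and create a scratch directory.
awk_path=$(command -v awk)
dir=$(mktemp -d)

# Write the script; the unquoted here-document expands $awk_path.
cat > "$dir/advice" <<EOF
#! $awk_path -f

BEGIN { print "Don't Panic!" }
EOF

# Make it executable and run it like any other program.
chmod +x "$dir/advice"
script_out=$("$dir/advice")
rm -r "$dir"

printf '%s\n' "$script_out"
```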
 2625 
 2626 @sidebar Understanding @samp{#!}
 2627 @cindex portability @subentry @code{#!} (executable scripts)
 2628 
 2629 @command{awk} is an @dfn{interpreted} language. This means that the
 2630 @command{awk} utility reads your program and then processes your data
 2631 according to the instructions in your program. (This is different
 2632 from a @dfn{compiled} language such as C, where your program is first
 2633 compiled into machine code that is executed directly by your system's
 2634 processor.)  The @command{awk} utility is thus termed an @dfn{interpreter}.
 2635 Many modern languages are interpreted.
 2636 
 2637 The line beginning with @samp{#!} lists the full @value{FN} of an
 2638 interpreter to run and a single optional initial command-line argument
 2639 to pass to that interpreter.  The operating system then runs the
 2640 interpreter with the given argument and the full argument list of the
 2641 executed program.  The first argument in the list is the full @value{FN}
 2642 of the @command{awk} program.  The rest of the argument list contains
 2643 either options to @command{awk}, or @value{DF}s, or both. (Note that on
 2644 many systems @command{awk} is found in @file{/usr/bin} instead of
 2645 in @file{/bin}.)
 2646 
 2647 Some systems limit the length of the interpreter name to 32 characters.
 2648 Often, this can be dealt with by using a symbolic link.
 2649 
 2650 You should not put more than one argument on the @samp{#!}
 2651 line after the path to @command{awk}. It does not work. The operating system
 2652 treats the rest of the line as a single argument and passes it to @command{awk}.
 2653 Doing this leads to confusing behavior---most likely a usage diagnostic
 2654 of some sort from @command{awk}.
 2655 
 2656 @cindex @code{ARGC}/@code{ARGV} variables @subentry portability and
 2657 @cindex portability @subentry @code{ARGV} variable
 2658 @cindex dark corner @subentry @code{ARGV} variable, value of
 2659 Finally, the value of @code{ARGV[0]}
 2660 (@pxref{Built-in Variables})
 2661 varies depending upon your operating system.
 2662 Some systems put @samp{awk} there, some put the full pathname
 2663 of @command{awk} (such as @file{/bin/awk}), and some put the name
 2664 of your script (@samp{advice}).  @value{DARKCORNER}
 2665 Don't rely on the value of @code{ARGV[0]}
 2666 to provide your script name.
 2667 @end sidebar
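To see what a particular system puts into @code{ARGV[0]}, a quick check
along these lines may help.  The variable names and the arguments
@samp{one two three} are placeholders made up for the example; since the
program has only a @code{BEGIN} rule, @command{awk} never tries to open them.

```shell
# The value printed here varies from one system to another.
argv0=$(awk 'BEGIN { print ARGV[0] }')
printf 'ARGV[0] is: %s\n' "$argv0"

# ARGC counts ARGV[0] as well, so subtract one to get the argument count.
nargs=$(awk 'BEGIN { print ARGC - 1 }' one two three)
printf 'arguments: %s\n' "$nargs"
```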
 2668 
 2669 @node Comments
 2670 @subsection Comments in @command{awk} Programs
 2671 @cindex @code{#} (number sign) @subentry commenting
 2672 @cindex number sign (@code{#}) @subentry commenting
 2673 @cindex commenting
 2674 @cindex @command{awk} programs @subentry documenting
 2675 
 2676 A @dfn{comment} is some text that is included in a program for the sake
 2677 of human readers; it is not really an executable part of the program.  Comments
 2678 can explain what the program does and how it works.  Nearly all
 2679 programming languages have provisions for comments, as programs are
 2680 typically hard to understand without them.
 2681 
 2682 In the @command{awk} language, a comment starts with the number sign
 2683 character (@samp{#}) and continues to the end of the line.
 2684 The @samp{#} does not have to be the first character on the line. The
 2685 @command{awk} language ignores the rest of a line following a number sign.
 2686 For example, we could have put the following into @file{advice}:
 2687 
 2688 @example
 2689 # This program prints a nice, friendly message.  It helps
 2690 # keep novice users from being afraid of the computer.
 2691 BEGIN    @{ print "Don't Panic!" @}
 2692 @end example
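A comment may also follow code on the same line, as this small sketch
shows.  The message text and the variable name @code{commented} are
invented for the example.  Note that a @samp{#} inside a string constant
is ordinary data, not the start of a comment.

```shell
commented=$(awk '
# A full-line comment.
BEGIN {
    print "Issue #42"   # a trailing comment; the # in the string is data
}')
printf '%s\n' "$commented"
```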
 2693 
 2694 You can put comment lines into keyboard-composed throwaway @command{awk}
 2695 programs, but this usually isn't very useful; the purpose of a
 2696 comment is to help you or another person understand the program
 2697 when reading it at a later time.
 2698 
 2699 @cindex quoting @subentry for small awk programs
 2700 @cindex single quote (@code{'}) @subentry vs.@: apostrophe
 2701 @cindex @code{'} (single quote) @subentry vs.@: apostrophe
 2702 @quotation CAUTION
 2703 As mentioned in
 2704 @ref{One-shot},
 2705 you can enclose short to medium-sized programs in single quotes,
 2706 in order to keep
 2707 your shell scripts self-contained.  When doing so, @emph{don't} put
 2708 an apostrophe (i.e., a single quote) into a comment (or anywhere else
 2709 in your program). The shell interprets the quote as the closing
 2710 quote for the entire program. As a result, usually the shell
 2711 prints a message about mismatched quotes, and if @command{awk} actually
 2712 runs, it will probably print strange messages about syntax errors.
 2713 For example, look at the following:
 2714 
 2715 @example
 2716 $ @kbd{awk 'BEGIN @{ print "hello" @} # let's be cute'}
 2717 >
 2718 @end example
 2719 
 2720 The shell sees that the first two quotes match, and that
 2721 a new quoted object begins at the end of the command line.
 2722 It therefore prompts with the secondary prompt, waiting for more input.
 2723 With Unix @command{awk}, closing the quoted string produces this result:
 2724 
 2725 @example
 2726 $ @kbd{awk '@{ print "hello" @} # let's be cute'}
 2727 > @kbd{'}
 2728 @error{} awk: can't open file be
 2729 @error{}  source line number 1
 2730 @end example
 2731 
 2732 @cindex @code{\} (backslash)
 2733 @cindex backslash (@code{\})
 2734 Putting a backslash before the single quote in @samp{let's} wouldn't help,
 2735 because backslashes are not special inside single quotes.
 2736 The next @value{SUBSECTION} describes the shell's quoting rules.
 2737 @end quotation
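
To see why the file name @samp{be} shows up in the error message, it can
help to look at the arguments the shell actually builds.  The following
sketch assumes a POSIX-compliant shell; @code{show_args} is a hypothetical
helper of our own, and an extra pair of single quotes is appended so that
the quotes balance and the command runs without a secondary prompt.  (As a
consequence, the last argument here is @samp{cute} with no trailing newline,
unlike in the interactive session above.)

```shell
# show_args is a hypothetical helper: it prints each argument it receives
# on its own line, wrapped in brackets.
show_args() { printf '[%s]\n' "$@"; }

# The apostrophe in the comment text ends the first quoted string early.
# The pair of single quotes at the very end only balances the quoting.
split=$(show_args 'BEGIN { print "hello" } # let's be cute'')
printf '%s\n' "$split"
```

The helper receives three arguments: the program text (ending in
@samp{lets}), then @samp{be}, then @samp{cute}.  An @command{awk} given
these arguments would treat the last two as input file names.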
 2738 
 2739 @node Quoting
 2740 @subsection Shell Quoting Issues
 2741 @cindex shell quoting, rules for
 2742 
 2743 @menu
 2744 * DOS Quoting::                 Quoting in MS-Windows Batch Files.
 2745 @end menu
 2746 
 2747 For short to medium-length @command{awk} programs, it is most convenient
 2748 to enter the program on the @command{awk} command line.
 2749 This is best done by enclosing the entire program in single quotes.
 2750 This is true whether you are entering the program interactively at
 2751 the shell prompt, or writing it as part of a larger shell script:
 2752 
 2753 @example
 2754 awk '@var{program text}' @var{input-file1} @var{input-file2} @dots{}
 2755 @end example
 2756 
 2757 @cindex shells @subentry quoting @subentry rules for
 2758 @cindex Bourne shell, quoting rules for
 2759 Once you are working with the shell, it is helpful to have a basic
 2760 knowledge of shell quoting rules.  The following rules apply only to
 2761 POSIX-compliant, Bourne-style shells (such as Bash, the GNU Bourne-Again
 2762 Shell).  If you use the C shell, you're on your own.
 2763 
 2764 Before diving into the rules, we introduce a concept that appears
 2765 throughout this @value{DOCUMENT}, which is that of the @dfn{null},
 2766 or empty, string.
 2767 
 2768 The null string is character data that contains no characters.
 2769 In other words, it is empty.  It is written in @command{awk} programs
 2770 like this: @code{""}. In the shell, it can be written using single
 2771 or double quotes: @code{""} or @code{''}. Although the null string has
 2772 no characters in it, it does exist. For example, consider this command:
 2773 
 2774 @example
 2775 $ @kbd{echo ""}
 2776 @end example
 2777 
 2778 @noindent
 2779 Here, the @command{echo} utility receives a single argument, even
 2780 though that argument has no characters in it. In the rest of this
 2781 @value{DOCUMENT}, we use the terms @dfn{null string} and @dfn{empty string}
 2782 interchangeably.  Now, on to the quoting rules:
 2783 
 2784 @itemize @value{BULLET}
 2785 @item
 2786 Quoted items can be concatenated with nonquoted items as well as with other
 2787 quoted items.  The shell turns everything into one argument for
 2788 the command.
 2789 
 2790 @item
 2791 Preceding any single character with a backslash (@samp{\}) quotes
 2792 that character.  The shell removes the backslash and passes the quoted
 2793 character on to the command.
 2794 
 2795 @item
 2796 @cindex @code{\} (backslash) @subentry in shell commands
 2797 @cindex backslash (@code{\}) @subentry in shell commands
 2798 @cindex single quote (@code{'}) @subentry in shell commands
 2799 @cindex @code{'} (single quote) @subentry in shell commands
 2800 Single quotes protect everything between the opening and closing quotes.
 2801 The shell does no interpretation of the quoted text, passing it on verbatim
 2802 to the command.
 2803 It is @emph{impossible} to embed a single quote inside single-quoted text.
 2804 Refer back to
 2805 @ref{Comments}
 2806 for an example of what happens if you try.
 2807 
 2808 @item
 2809 @cindex double quote (@code{"}) @subentry in shell commands
 2810 @cindex @code{"} (double quote) @subentry in shell commands
 2811 Double quotes protect most things between the opening and closing quotes.
 2812 The shell does at least variable and command substitution on the quoted text.
 2813 Different shells may do additional kinds of processing on double-quoted text.
 2814 
 2815 Because certain characters within double-quoted text are processed by the shell,
 2816 they must be @dfn{escaped} within the text.  Of note are the characters
 2817 @samp{$}, @samp{`}, @samp{\}, and @samp{"}, all of which must be preceded by
 2818 a backslash within double-quoted text if they are to be passed on literally
 2819 to the program.  (The leading backslash is stripped first.)
 2820 Thus, the example seen
 2821 @ifnotinfo
 2822 previously
 2823 @end ifnotinfo
 2824 in @ref{Read Terminal}:
 2825 
 2826 @example
 2827 awk 'BEGIN @{ print "Don\47t Panic!" @}'
 2828 @end example
 2829 
 2830 @noindent
 2831 could instead be written this way:
 2832 
 2833 @example
 2834 $ @kbd{awk "BEGIN @{ print \"Don't Panic!\" @}"}
 2835 @print{} Don't Panic!
 2836 @end example
 2837 
 2838 @cindex single quote (@code{'}) @subentry with double quotes
 2839 @cindex @code{'} (single quote) @subentry with double quotes
 2840 Note that the single quote is not special within double quotes.
 2841 
 2842 @item
 2843 Null strings are removed when they occur as part of a non-null
 2844 command-line argument, while explicit null objects are kept.
 2845 For example, to specify that the field separator @code{FS} should
 2846 be set to the null string, use:
 2847 
 2848 @example
 2849 awk -F "" '@var{program}' @var{files} # correct
 2850 @end example
 2851 
 2852 @noindent
 2853 @cindex null strings @subentry in @command{gawk} arguments, quoting and
 2854 Don't use this:
 2855 
 2856 @example
 2857 awk -F"" '@var{program}' @var{files}  # wrong!
 2858 @end example
 2859 
 2860 @noindent
 2861 In the second case, @command{awk} attempts to use the text of the program
 2862 as the value of @code{FS}, and the first @value{FN} as the text of the program!
 2863 This results in syntax errors at best, and confusing behavior at worst.
 2864 @end itemize
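
The rules above can be checked without involving @command{awk} at all.
The following is a sketch for a POSIX-compliant shell; @code{show_args}
and the variable names are hypothetical and chosen only for illustration:

```shell
# show_args is a hypothetical helper that prints one bracketed line
# per argument it receives.
show_args() { printf '[%s]\n' "$@"; }

# Quoted and unquoted pieces that touch are joined into a single argument.
joined=$(show_args 'single'"double"plain)

# A backslash quotes the next character; the shell removes the backslash.
escaped=$(show_args a\ b \$HOME)

# Single quotes pass text through verbatim.  Inside double quotes, a dollar
# sign, backquote, backslash, or double quote needs a leading backslash
# in order to stay literal.
verbatim=$(show_args '$HOME `date` \n' "\$HOME \`date\` \\n \"quoted\"")

# An explicit null argument is kept, but a null string attached to other
# text vanishes, so the last argument below is just the option by itself.
nulls=$(show_args -F "" -F"")

printf '%s\n' "$joined" "$escaped" "$verbatim" "$nulls"
```

The last case is the reason the @samp{-F""} form fails: the option arrives
without a value, so @command{awk} takes the next argument as the field
separator.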
 2865 
 2866 @cindex quoting @subentry in @command{gawk} command lines @subentry tricks for
 2867 Mixing single and double quotes is difficult.  You have to resort
 2868 to shell quoting tricks, like this:
 2869 
 2870 @example
 2871 $ @kbd{awk 'BEGIN @{ print "Here is a single quote <'"'"'>" @}'}
 2872 @print{} Here is a single quote <'>
 2873 @end example
 2874 
 2875 @noindent
 2876 This program consists of three concatenated quoted strings.  The first and the
 2877 third are single-quoted, and the second is double-quoted.
 2878 
 2879 This can be ``simplified'' to:
 2880 
 2881 @example
 2882 $ @kbd{awk 'BEGIN @{ print "Here is a single quote <'\''>" @}'}
 2883 @print{} Here is a single quote <'>
 2884 @end example
 2885 
 2886 @noindent
 2887 Judge for yourself which of these two is the more readable.
 2888 
 2889 Another option is to use double quotes, escaping the embedded, @command{awk}-level
 2890 double quotes:
 2891 
 2892 @example
 2893 $ @kbd{awk "BEGIN @{ print \"Here is a single quote <'>\" @}"}
 2894 @print{} Here is a single quote <'>
 2895 @end example
 2896 
 2897 @noindent
 2898 This option is also painful, because double quotes, backslashes, and dollar signs
 2899 are very common in more advanced @command{awk} programs.
 2900 
 2901 A third option is to use the octal escape sequence equivalents
 2902 (@pxref{Escape Sequences})
 2903 for the
 2904 single- and double-quote characters, like so:
 2905 
 2906 @example
 2907 @group
 2908 $ @kbd{awk 'BEGIN @{ print "Here is a single quote <\47>" @}'}
 2909 @print{} Here is a single quote <'>
 2910 $ @kbd{awk 'BEGIN @{ print "Here is a double quote <\42>" @}'}
 2911 @print{} Here is a double quote <">
 2912 @end group
 2913 @end example
 2914 
 2915 @noindent
 2916 This works nicely, but you should comment clearly what the
 2917 escape sequences mean.
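
One way to follow that advice, sketched here with variable names of our
own choosing (@code{sq} and @code{dq}), is to assign each escape sequence
to a variable once and comment the assignment:

```shell
out=$(awk 'BEGIN {
    sq = "\47"    # octal 47 is the ASCII code for a single quote
    dq = "\42"    # octal 42 is the ASCII code for a double quote
    print "single <" sq "> double <" dq ">"
}')
printf '%s\n' "$out"
```

The rest of the program can then use the variables, and the octal codes
appear in only one commented place.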
 2918 
 2919 A fourth option is to use command-line variable assignment, like this:
 2920 
 2921 @example
 2922 $ @kbd{awk -v sq="'" 'BEGIN @{ print "Here is a single quote <" sq ">" @}'}
 2923 @print{} Here is a single quote <'>
 2924 @end example
 2925 
 2926 (Here, the two string constants and the value of @code{sq} are concatenated
 2927 into a single string that is printed by @code{print}.)
 2928 
 2929 If you really need both single and double quotes in your @command{awk}
 2930 program, it is probably best to move it into a separate file, where
 2931 the shell won't be part of the picture and you can say what you mean.
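
As a sketch of the separate-file approach, the following fragment writes
a program to a temporary file and runs it with @option{-f}.  It assumes
that the @command{mktemp} utility is available (it is common but not
required by POSIX); the variable names are ours:

```shell
# Write the program to a temporary file.  Quoting the here-document
# delimiter keeps the shell from interpreting anything in the body.
prog=$(mktemp)
cat > "$prog" <<'EOF'
# Both kinds of quotes are fine here; the shell never sees this text.
BEGIN { print "It's easy to say \"hello\" from a file" }
EOF

out=$(awk -f "$prog")
rm -f "$prog"
printf '%s\n' "$out"
```

Only the @command{awk}-level escaping of the embedded double quotes
remains; no shell quoting tricks are needed.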
 2932 
 2933 @node DOS Quoting
 2934 @subsubsection Quoting in MS-Windows Batch Files
 2935 
 2936 @ignore
 2937 Date: Wed, 21 May 2008 09:58:43 +0200 (CEST)
 2938 From: jeroen.brink@inter.NL.net
 2939 Subject: (g)awk "contribution"
 2940 To: arnold@skeeve.com
 2941 Message-id: <42220.193.172.132.34.1211356723.squirrel@webmail.internl.net>
 2942 
 2943 Hello Arnold,
 2944 
 2945 maybe you can help me out. Found your email on the GNU/awk online manual
 2946 pages.
 2947 
 2948 I've searched hard to figure out how, on Windows, to print double quotes.
 2949 Couldn't find it in the Quotes area, nor on google or elsewhere. Finally i
 2950 figured out how to do this myself.
 2951 
 2952 How to print all lines in a file surrounded by double quotes (on Windows):
 2953 
 2954 gawk "{ print \"\042\" $0 \"\042\" }" <file>
 2955 
 2956 Maybe this is a helpfull tip for other (Windows) gawk users. However, i
 2957 don't have a clue as to where to "publish" this tip! Do you?
 2958 
 2959 Kind regards,
 2960 
 2961 Jeroen Brink
 2962 @end ignore
 2963 
 2964 Although this @value{DOCUMENT} generally only worries about POSIX systems and the
 2965 POSIX shell, the following issue arises often enough for many users that
 2966 it is worth addressing.
 2967 
 2968 @cindex Brink, Jeroen
 2969 The ``shells'' on Microsoft Windows systems use the double-quote
 2970 character for quoting, and make it difficult or impossible to include an
 2971 escaped double-quote character in a command-line script.  The following
 2972 example, courtesy of Jeroen Brink, shows how to escape the double quotes
 2973 in this one-liner script, which prints every line of a file surrounded by
 2974 double quotes:
 2975 
 2976 @example
 2977 @{ print "\"" $0 "\"" @}
 2978 @end example
 2979 
 2980 @noindent
 2981 On an MS-Windows command line, the one-liner script above may be passed as
 2982 follows:
 2983 
 2984 @example
 2985 gawk "@{ print \"\042\" $0 \"\042\" @}" @var{file}
 2986 @end example
 2987 
 2988 In this example the @samp{\042} is the octal code for a double-quote;
 2989 @command{gawk} converts it into a real double-quote for output by
 2990 the @code{print} statement.
 2991 
 2992 In MS-Windows, escaping double quotes is a little tricky: you use
 2993 backslashes to escape double quotes, but backslashes themselves are not
 2994 escaped in the usual way. Instead, they are either doubled or left alone,
 2995 depending upon whether a double quote follows them.  The MS-Windows
 2996 rule for double-quoting a string is the following:
 2997 
 2998 @enumerate
 2999 @item
 3000 For each double quote in the original string, let @var{N} be the number
 3001 of backslashes immediately before it (@var{N} may be zero). Replace these
 3002 @var{N} backslashes with @math{2@value{TIMES}@var{N}+1} backslashes.
 3003 
 3004 @item
 3005 Let @var{N} be the number of backslashes at the end of the original string
 3006 (@var{N} may be zero). Replace these @var{N} backslashes with
 3007 @math{2@value{TIMES}@var{N}} backslashes.
 3008 
 3009 @item
 3010 Surround the resulting string by double-quotes.
 3011 @end enumerate
 3012 
 3013 So to double-quote the one-liner script @samp{@{ print "\"" $0 "\"" @}}
 3014 from the previous example you would do it this way:
 3015 
 3016 @example
 3017 gawk "@{ print \"\\\"\" $0 \"\\\"\" @}" @var{file}
 3018 @end example
 3019 
 3020 @noindent
 3021 However, the use of @samp{\042} instead of @samp{\\\"} is also possible
 3022 and easier to read, because backslashes that are not followed by a
 3023 double-quote don't need duplication.
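
The three-step rule can be sketched as an @command{awk} function.  This
is only an illustration run from a POSIX shell, not something shipped with
@command{gawk}; the names @code{rep}, @code{winquote}, and
@code{winquote_prog} are hypothetical.  The string to be quoted is read
from standard input rather than passed with @option{-v}, because
@option{-v} would process the backslashes as escape sequences:

```shell
# The awk program is kept in a shell variable so it can be reused.
winquote_prog='
function rep(ch, k,    r) {         # return ch repeated k times
    r = ""
    while (k-- > 0)
        r = r ch
    return r
}
function winquote(s,    out, i, c, n) {
    n = 0                           # length of the current run of backslashes
    for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\") {
            n++
            continue
        }
        # Step 1: N backslashes before a double quote become 2N+1.
        # Backslashes before any other character are copied unchanged.
        out = out rep("\\", (c == "\"") ? 2 * n + 1 : n) c
        n = 0
    }
    # Step 2: N trailing backslashes become 2N.  Step 3: add the quotes.
    return "\"" out rep("\\", 2 * n) "\""
}
{ print winquote($0) }'

quoted=$(printf '%s\n' '{ print "\"" $0 "\"" }' | awk "$winquote_prog")
printf '%s\n' "$quoted"
```

For the one-liner used in this @value{SUBSECTION}, the result is the same
double-quoted argument given in the example above.  Feeding it the
@samp{\042} version instead leaves those backslashes alone, because none of
them precedes a double quote.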
 3024 
 3025 @node Sample Data Files
 3026 @section @value{DDF}s for the Examples
 3027 
 3028 @cindex input files @subentry examples
 3029 @cindex @code{mail-list} file
 3030 Many of the examples in this @value{DOCUMENT} take their input from two sample
 3031 @value{DF}s.  The first, @file{mail-list}, represents a list of people's names
 3032 together with their email addresses and information about those people.
 3033 The second @value{DF}, called @file{inventory-shipped}, contains
 3034 information about monthly shipments.  In both files,
 3035 each line is considered to be one @dfn{record}.
 3036 
 3037 In @file{mail-list}, each record contains the name of a person,
 3038 his/her phone number, his/her email address, and a code for his/her relationship
 3039 with the author of the list.
 3040 The columns are aligned using spaces.
 3041 An @samp{A} in the last column
 3042 means that the person is an acquaintance.  An @samp{F} in the last
 3043 column means that the person is a friend.
 3044 An @samp{R} means that the person is a relative:
 3045 
 3046 @example
 3047 @c system if test ! -d eg      ; then mkdir eg      ; fi
 3048 @c system if test ! -d eg/lib  ; then mkdir eg/lib  ; fi
 3049 @c system if test ! -d eg/data ; then mkdir eg/data ; fi
 3050 @c system if test ! -d eg/prog ; then mkdir eg/prog ; fi
 3051 @c system if test ! -d eg/misc ; then mkdir eg/misc ; fi
 3052 @c file eg/data/mail-list
 3053 Amelia       555-5553     amelia.zodiacusque@@gmail.com    F
 3054 Anthony      555-3412     anthony.asserturo@@hotmail.com   A
 3055 Becky        555-7685     becky.algebrarum@@gmail.com      A
 3056 Bill         555-1675     bill.drowning@@hotmail.com       A
 3057 Broderick    555-0542     broderick.aliquotiens@@yahoo.com R
 3058 Camilla      555-2912     camilla.infusarum@@skynet.be     R
 3059 Fabius       555-1234     fabius.undevicesimus@@ucb.edu    F
 3060 Julie        555-6699     julie.perscrutabor@@skeeve.com   F
 3061 Martin       555-6480     martin.codicibus@@hotmail.com    A
 3062 Samuel       555-3430     samuel.lanceolis@@shu.edu        A
 3063 Jean-Paul    555-2127     jeanpaul.campanorum@@nyu.edu     R
 3064 @c endfile
 3065 @end example
 3066 
 3067 @cindex @code{inventory-shipped} file
 3068 The @value{DF} @file{inventory-shipped} represents
 3069 information about shipments during the year.
 3070 Each record contains the month, the number
 3071 of green crates shipped, the number of red boxes shipped, the number of
 3072 orange bags shipped, and the number of blue packages shipped,
 3073 respectively.  There are 16 entries, covering the 12 months of last year
 3074 and the first four months of the current year.
 3075 An empty line separates the data for the two years:
 3076 
 3077 @example
 3078 @c file eg/data/inventory-shipped
 3079 Jan  13  25  15 115
 3080 Feb  15  32  24 226
 3081 Mar  15  24  34 228
 3082 Apr  31  52  63 420
 3083 May  16  34  29 208
 3084 Jun  31  42  75 492
 3085 Jul  24  34  67 436
 3086 Aug  15  34  47 316
 3087 Sep  13  55  37 277
 3088 Oct  29  54  68 525
 3089 Nov  20  87  82 577
 3090 Dec  17  35  61 401
 3091 
 3092 Jan  21  36  64 620
 3093 Feb  26  58  80 652
 3094 Mar  24  75  70 495
 3095 Apr  21  70  74 514
 3096 @c endfile
 3097 @end example
 3098 
 3099 The sample files are included in the @command{gawk} distribution,
 3100 in the directory @file{awklib/eg/data}.
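
If the distribution is not at hand, the two files can be recreated from
the listings above.  The following sketch does so in a temporary directory
(assuming @command{mktemp -d} is available) and checks a few of the
properties just described.  The variable names are our own, and the
@samp{@@} in each address is a single character in the real file:

```shell
dir=$(mktemp -d)

cat > "$dir/mail-list" <<'EOF'
Amelia       555-5553     amelia.zodiacusque@gmail.com    F
Anthony      555-3412     anthony.asserturo@hotmail.com   A
Becky        555-7685     becky.algebrarum@gmail.com      A
Bill         555-1675     bill.drowning@hotmail.com       A
Broderick    555-0542     broderick.aliquotiens@yahoo.com R
Camilla      555-2912     camilla.infusarum@skynet.be     R
Fabius       555-1234     fabius.undevicesimus@ucb.edu    F
Julie        555-6699     julie.perscrutabor@skeeve.com   F
Martin       555-6480     martin.codicibus@hotmail.com    A
Samuel       555-3430     samuel.lanceolis@shu.edu        A
Jean-Paul    555-2127     jeanpaul.campanorum@nyu.edu     R
EOF

cat > "$dir/inventory-shipped" <<'EOF'
Jan  13  25  15 115
Feb  15  32  24 226
Mar  15  24  34 228
Apr  31  52  63 420
May  16  34  29 208
Jun  31  42  75 492
Jul  24  34  67 436
Aug  15  34  47 316
Sep  13  55  37 277
Oct  29  54  68 525
Nov  20  87  82 577
Dec  17  35  61 401

Jan  21  36  64 620
Feb  26  58  80 652
Mar  24  75  70 495
Apr  21  70  74 514
EOF

# Each line is one record; the fourth field of mail-list is the code.
records=$(awk 'END { print NR }' "$dir/mail-list")
friends=$(awk '$4 == "F" { print $1 }' "$dir/mail-list")

# The empty separator line is a record too, but it has no fields.
lines=$(awk 'END { print NR }' "$dir/inventory-shipped")
entries=$(awk 'NF > 0 { n++ } END { print n }' "$dir/inventory-shipped")

rm -rf "$dir"
printf '%s\n' "$records" "$friends" "$lines" "$entries"
```

The constructs used here (@code{NR}, @code{NF}, and field references such
as @code{$4}) are explained later in the @value{DOCUMENT}.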
 3101 
 3102 @node Very Simple
 3103 @section Some Simple Examples
 3104 
 3105 The following command runs a simple @command{awk} program that searches the
 3106 input file @file{mail-list} for the character string @samp{li} (a
 3107 grouping of characters is usually called a @dfn{string};
 3108 the term @dfn{string} is based on similar usage in English, such
 3109 as ``a string of pearls'' or ``a string of cars in a train''):
 3110 
 3111 @example
 3112 awk '/li/ @{ print $0 @}' mail-list
 3113 @end example
 3114 
 3115 @noindent
 3116 When lines containing @samp{li} are found, they are printed because
 3117 @w{@samp{print $0}} means print the current line.  (Just @samp{print} by
 3118 itself means the same thing, so we could have written that
 3119 instead.)
 3120 
 3121 You will notice that slashes (@samp{/}) surround the string @samp{li}
 3122 in the @command{awk} program.  The slashes indicate that @samp{li}
 3123 is the pattern to search for.  This type of pattern is called a
 3124 @dfn{regular expression}, which is covered in more detail later
 3125 (@pxref{Regexp}).
 3126 The pattern is allowed to match parts of words.
 3127 There are
 3128 single quotes around the @command{awk} program so that the shell won't
 3129 interpret any of it as special shell characters.
 3130 
 3131 Here is what this program prints:
 3132 
 3133 @example
 3134 $ @kbd{awk '/li/ @{ print $0 @}' mail-list}
 3135 @print{} Amelia       555-5553     amelia.zodiacusque@@gmail.com    F
 3136 @print{} Broderick    555-0542     broderick.aliquotiens@@yahoo.com R
 3137 @print{} Julie        555-6699     julie.perscrutabor@@skeeve.com   F
 3138 @print{} Samuel       555-3430     samuel.lanceolis@@shu.edu        A
 3139 @end example
 3140 
 3141 @cindex actions @subentry default
 3142 @cindex patterns @subentry default
 3143 In an @command{awk} rule, either the pattern or the action can be omitted,
 3144 but not both.  If the pattern is omitted, then the action is performed
 3145 for @emph{every} input line.  If the action is omitted, the default
 3146 action is to print all lines that match the pattern.
 3147 
 3148 @cindex actions @subentry empty
 3149 Thus, we could leave out the action (the @code{print} statement and the
 3150 braces) in the previous example and the result would be the same:
 3151 @command{awk} prints all lines matching the pattern @samp{li}.  By comparison,
 3152 omitting the @code{print} statement but retaining the braces makes an
 3153 empty action that does nothing (i.e., no lines are printed).
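
The following sketch compares these cases on three lines of input typed
inline, so that it does not depend on @file{mail-list}.  The variable
names are ours:

```shell
input='Amelia
Bill
Julie'

# Pattern and action, pattern only, and pattern with an empty action.
with_action=$(printf '%s\n' "$input" | awk '/li/ { print $0 }')
no_action=$(printf '%s\n' "$input" | awk '/li/')
empty_action=$(printf '%s\n' "$input" | awk '/li/ { }')

# Action only: it runs for every input line.
no_pattern=$(printf '%s\n' "$input" | awk '{ print "saw: " $0 }')

printf '%s\n' "$with_action" "$no_action" "$empty_action" "$no_pattern"
```

The first two commands print @samp{Amelia} and @samp{Julie}; the third
prints nothing at all; the fourth prints one line per input line.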
 3154 
 3155 @cindex @command{awk} programs @subentry one-line examples
 3156 Many practical @command{awk} programs are just a line or two long.  Following is a
 3157 collection of useful, short programs to get you started.  Some of these
 3158 programs contain constructs that haven't been covered yet. (The description
 3159 of the program will give you a good idea of what is going on, but you'll
 3160 need to read the rest of the @value{DOCUMENT} to become an @command{awk} expert!)
 3161 Most of the examples use a @value{DF} named @file{data}.  This is just a
 3162 placeholder; if you use these programs yourself, substitute
 3163 your own @value{FN}s for @file{data}.
 3164 For future reference, note that there is often more than
 3165 one way to do things in @command{awk}.  At some point, you may want
 3166 to look back at these examples and see if
 3167 you can come up with different ways to do the same things shown here:
 3168 
 3169 @itemize @value{BULLET}
 3170 @item
 3171 Print every line that is longer than 80 characters:
 3172 
 3173 @example
 3174 awk 'length($0) > 80' data
 3175 @end example
 3176 
 3177 The sole rule has a relational expression as its pattern and has no
 3178 action---so it uses the default action, printing the record.
 3179 
 3180 @item
 3181 Print the length of the longest input line:
 3182 
 3183 @example
 3184 @group
 3185 awk '@{ if (length($0) > max) max = length($0) @}
 3186      END @{ print max @}' data
 3187 @end group
 3188 @end example
 3189 
 3190 The code associated with @code{END} executes after all
 3191 input has been read; it's the other side of the coin to @code{BEGIN}.
 3192 
 3193 @cindex @command{expand} utility
 3194 @item
 3195 Print the length of the longest line in @file{data}:
 3196 
 3197 @example
 3198 expand data | awk '@{ if (x < length($0)) x = length($0) @}
 3199                    END @{ print "maximum line length is " x @}'
 3200 @end example
 3201 
 3202 This example differs slightly from the previous one:
 3203 the input is processed by the @command{expand} utility to change TABs
 3204 into spaces, so the widths compared are actually the right-margin columns,
 3205 as opposed to the number of input characters on each line.
 3206 
 3207 @item
 3208 Print every line that has at least one field:
 3209 
 3210 @example
 3211 awk 'NF > 0' data
 3212 @end example
 3213 
 3214 This is an easy way to delete blank lines from a file (or rather, to
 3215 create a new file similar to the old file but from which the blank lines
 3216 have been removed).
 3217 
 3218 @item
 3219 Print seven random numbers from 0 to 100, inclusive:
 3220 
 3221 @example
 3222 awk 'BEGIN @{ for (i = 1; i <= 7; i++)
 3223                  print int(101 * rand()) @}'
 3224 @end example
 3225 
 3226 @item
 3227 Print the total number of bytes used by @var{files}:
 3228 
 3229 @example
 3230 ls -l @var{files} | awk '@{ x += $5 @}
 3231                    END @{ print "total bytes: " x @}'
 3232 @end example
 3233 
 3234 @item
 3235 Print the total number of kilobytes used by @var{files}:
 3236 
 3237 @c Don't use \ continuation, not discussed yet
 3238 @c Remember that awk does floating point division,
 3239 @c no need for (x+1023) / 1024
 3240 @example
 3241 ls -l @var{files} | awk '@{ x += $5 @}
 3242    END @{ print "total K-bytes:", x / 1024 @}'
 3243 @end example
 3244 
 3245 @item
 3246 Print a sorted list of the login names of all users:
 3247 
 3248 @example
 3249 awk -F: '@{ print $1 @}' /etc/passwd | sort
 3250 @end example
 3251 
 3252 @item
 3253 Count the lines in a file:
 3254 
 3255 @example
 3256 awk 'END @{ print NR @}' data
 3257 @end example
 3258 
 3259 @item
 3260 Print the even-numbered lines in the @value{DF}:
 3261 
 3262 @example
 3263 awk 'NR % 2 == 0' data
 3264 @end example
 3265 
 3266 If you used the expression @samp{NR % 2 == 1} instead,
 3267 the program would print the odd-numbered lines.
 3268 @end itemize
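
To try several of these programs without hunting for a suitable
@value{DF}, the following sketch builds a four-line file (the third line
is empty) and runs them on it.  It assumes @command{mktemp} and
@command{expand} are available; the threshold of 9 stands in for the 80
used above so that the tiny sample produces output, and all variable names
are ours:

```shell
data=$(mktemp)
printf 'one\ntwo words\n\nthree short words\n' > "$data"

long_lines=$(awk 'length($0) > 9' "$data")
longest=$(awk '{ if (length($0) > max) max = length($0) }
               END { print max }' "$data")
nonblank=$(awk 'NF > 0' "$data")
count=$(awk 'END { print NR }' "$data")
even=$(awk 'NR % 2 == 0' "$data")

# A TAB counts as one character until expand turns it into spaces
# (with the default tab stops every eight columns).
raw=$(printf 'a\tb\n' | awk '{ print length($0) }')
expanded=$(printf 'a\tb\n' | expand | awk '{ print length($0) }')

rm -f "$data"
printf '%s\n' "$longest" "$count" "$raw" "$expanded"
```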
 3269 
 3270 @node Two Rules
 3271 @section An Example with Two Rules
 3272 @cindex @command{awk} programs
 3273 
 3274 The @command{awk} utility reads the input files one line at a
 3275 time.  For each line, @command{awk} tries the patterns of each rule.
 3276 If several patterns match, then several actions execute in the order in
 3277 which they appear in the @command{awk} program.  If no patterns match, then
 3278 no actions run.
 3279 
 3280 After processing all the rules that match the line (and perhaps there are none),
 3281 @command{awk} reads the next line.  (However,
 3282 @pxref{Next Statement}
 3283 @ifdocbook
 3284 and @ref{Nextfile Statement}.)
 3285 @end ifdocbook
 3286 @ifnotdocbook
 3287 and also @pxref{Nextfile Statement}.)
 3288 @end ifnotdocbook
 3289 This continues until the program reaches the end of the file.
 3290 For example, the following @command{awk} program contains two rules:
 3291 
 3292 @example
 3293 /12/  @{ print $0 @}
 3294 /21/  @{ print $0 @}
 3295 @end example
 3296 
 3297 @noindent
 3298 The first rule has the string @samp{12} as the
 3299 pattern and @samp{print $0} as the action.  The second rule has the
 3300 string @samp{21} as the pattern and also has @samp{print $0} as the
 3301 action.  Each rule's action is enclosed in its own pair of braces.
 3302 
 3303 This program prints every line that contains the string
 3304 @samp{12} @emph{or} the string @samp{21}.  If a line contains both
 3305 strings, it is printed twice, once by each rule.
 3306 
 3307 This is what happens if we run this program on our two sample @value{DF}s,
 3308 @file{mail-list} and @file{inventory-shipped}:
 3309 
 3310 @example
 3311 $ @kbd{awk '/12/ @{ print $0 @}}
 3312 >      @kbd{/21/ @{ print $0 @}' mail-list inventory-shipped}
 3313 @print{} Anthony      555-3412     anthony.asserturo@@hotmail.com   A
 3314 @print{} Camilla      555-2912     camilla.infusarum@@skynet.be     R
 3315 @print{} Fabius       555-1234     fabius.undevicesimus@@ucb.edu    F
 3316 @print{} Jean-Paul    555-2127     jeanpaul.campanorum@@nyu.edu     R
 3317 @print{} Jean-Paul    555-2127     jeanpaul.campanorum@@nyu.edu     R
 3318 @print{} Jan  21  36  64 620
 3319 @print{} Apr  21  70  74 514
 3320 @end example
 3321 
 3322 @noindent
 3323 Note how the line beginning with @samp{Jean-Paul}
 3324 in @file{mail-list} was printed twice, once for each rule.
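
To make it visible which rule printed which line, here is a variation of
our own that labels the output of each rule and reads three of the phone
numbers from standard input instead of the sample files:

```shell
out=$(printf '%s\n' 555-3412 555-1675 555-2127 |
      awk '/12/ { print "rule 1: " $0 }
           /21/ { print "rule 2: " $0 }')
printf '%s\n' "$out"
```

The first number matches only the first rule, the second matches neither,
and @samp{555-2127} contains both @samp{21} and @samp{12}, so it is printed
by both rules, in the order in which the rules appear in the program.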
 3325 
 3326 @node More Complex
 3327 @section A More Complex Example
 3328 
 3329 Now that we've mastered some simple tasks, let's look at
 3330 what typical @command{awk}
 3331 programs do.  This example shows how @command{awk} can be used to
 3332 summarize, select, and rearrange the output of another utility.  It uses
 3333 features that haven't been covered yet, so don't worry if you don't
 3334 understand all the details:
 3335 
 3336 @example
 3337 ls -l | awk '$6 == "Nov" @{ sum += $5 @}
 3338              END @{ print sum @}'
 3339 @end example
 3340 
 3341 @cindex @command{ls} utility
 3342 This command prints the total number of bytes in all the files in the
 3343 current directory that were last modified in November (of any year).
 3344 The @w{@samp{ls -l}} part of this example is a system command that gives
 3345 you a listing of the files in a directory, including each file's size and the date
 3346 the file was last modified. Its output looks like this:
 3347 
 3348 @example
 3349 -rw-r--r--  1 arnold   user   1933 Nov  7 13:05 Makefile
 3350 -rw-r--r--  1 arnold   user  10809 Nov  7 13:03 awk.h
 3351 -rw-r--r--  1 arnold   user    983 Apr 13 12:14 awk.tab.h
 3352 -rw-r--r--  1 arnold   user  31869 Jun 15 12:20 awkgram.y
 3353 -rw-r--r--  1 arnold   user  22414 Nov  7 13:03 awk1.c
 3354 -rw-r--r--  1 arnold   user  37455 Nov  7 13:03 awk2.c
 3355 -rw-r--r--  1 arnold   user  27511 Dec  9 13:07 awk3.c
 3356 -rw-r--r--  1 arnold   user   7989 Nov  7 13:03 awk4.c
 3357 @end example
 3358 
 3359 @noindent
 3360 @cindex line continuations @subentry with C shell
 3361 The first field contains read-write permissions, the second field contains
 3362 the number of links to the file, and the third field identifies the file's owner.
 3363 The fourth field identifies the file's group.
 3364 The fifth field contains the file's size in bytes.  The
 3365 sixth, seventh, and eighth fields contain the month, day, and time,
 3366 respectively, that the file was last modified.  Finally, the ninth field
 3367 contains the @value{FN}.
 3368 
 3369 @c @cindex automatic initialization
 3370 @cindex initialization, automatic
 3371 The @samp{$6 == "Nov"} in our @command{awk} program is an expression that
 3372 tests whether the sixth field of the output from @w{@samp{ls -l}}
 3373 matches the string @samp{Nov}.  Each time a line has the string
 3374 @samp{Nov} for its sixth field, @command{awk} performs the action
 3375 @samp{sum += $5}.  This adds the fifth field (the file's size) to the variable
 3376 @code{sum}.  As a result, when @command{awk} has finished reading all the
 3377 input lines, @code{sum} is the total of the sizes of the files whose
 3378 lines matched the pattern.  (This works because @command{awk} variables
 3379 are automatically initialized to zero.)
 3380 
 3381 After the last line of output from @command{ls} has been processed, the
 3382 @code{END} rule executes and prints the value of @code{sum}.
 3383 In this example, the value of @code{sum} is 80600.
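
That total can be reproduced without depending on the contents of any
real directory by feeding the sample listing to the same program.  This
is a sketch; the variable names are ours, and the second command is an
addition that selects a month with no files in the listing:

```shell
listing='-rw-r--r--  1 arnold   user   1933 Nov  7 13:05 Makefile
-rw-r--r--  1 arnold   user  10809 Nov  7 13:03 awk.h
-rw-r--r--  1 arnold   user    983 Apr 13 12:14 awk.tab.h
-rw-r--r--  1 arnold   user  31869 Jun 15 12:20 awkgram.y
-rw-r--r--  1 arnold   user  22414 Nov  7 13:03 awk1.c
-rw-r--r--  1 arnold   user  37455 Nov  7 13:03 awk2.c
-rw-r--r--  1 arnold   user  27511 Dec  9 13:07 awk3.c
-rw-r--r--  1 arnold   user   7989 Nov  7 13:03 awk4.c'

total=$(printf '%s\n' "$listing" |
        awk '$6 == "Nov" { sum += $5 }
             END { print sum }')

# No line matches here, so sum is never assigned.  Adding zero makes
# the END rule print 0 instead of an empty line.
none=$(printf '%s\n' "$listing" |
       awk '$6 == "Feb" { sum += $5 }
            END { print sum + 0 }')

printf '%s\n' "$total" "$none"
```

An unassigned variable acts as zero in arithmetic but as the null string
when printed, which is why the second program uses @samp{sum + 0}.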
 3384 
 3385 These more advanced @command{awk} techniques are covered in later
 3386 @value{SECTION}s
 3387 (@pxref{Action Overview}).  Before you can move on to more
 3388 advanced @command{awk} programming, you have to know how @command{awk} interprets
 3389 your input and displays your output.  By manipulating fields and using
 3390 @code{print} statements, you can produce some very useful and
 3391 impressive-looking reports.
 3392 
 3393 @node Statements/Lines
 3394 @section @command{awk} Statements Versus Lines
 3395 @cindex line breaks
 3396 @cindex newlines
 3397 
 3398 Most often, each line in an @command{awk} program is a separate statement or
 3399 separate rule, like this:
 3400 
 3401 @example
 3402 awk '/12/  @{ print $0 @}
 3403      /21/  @{ print $0 @}' mail-list inventory-shipped
 3404 @end example
 3405 
 3406 @cindex @command{gawk} @subentry newlines in
 3407 However, @command{gawk} ignores newlines after any of the following
 3408 symbols and keywords:
 3409 
 3410 @example
 3411 ,    @{    ?    :    ||    &&    do    else
 3412 @end example
 3413 
 3414 @noindent
 3415 A newline at any other point is considered the end of the
 3416 statement.@footnote{The @samp{?} and @samp{:} referred to here are the
 3417 operators of the three-operand conditional expression described in
 3418 @ref{Conditional Exp}.
 3419 Splitting lines after @samp{?} and @samp{:} is a minor @command{gawk}
 3420 extension; if @option{--posix} is specified
 3421 (@pxref{Options}), then this extension is disabled.}
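
As a small sketch of this rule, the following program breaks lines after
@samp{@{}, @samp{&&}, and a comma without using any continuation
character.  It avoids @samp{?} and @samp{:}, so it does not depend on the
@command{gawk} extension; the variable names are ours:

```shell
out=$(awk 'BEGIN {
    x = 5
    if (x > 0 &&
        x < 10)
        print "x is",
              "a single digit"
}')
printf '%s\n' "$out"
```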
 3422 
 3423 @cindex @code{\} (backslash) @subentry continuing lines and
 3424 @cindex backslash (@code{\}) @subentry continuing lines and
 3425 If you would like to split a single statement into two lines at a point
 3426 where a newline would terminate it, you can @dfn{continue} it by ending the
 3427 first line with a backslash character (@samp{\}).  The backslash must be
 3428 the final character on the line in order to be recognized as a continuation
 3429 character.  A backslash followed by a newline is allowed anywhere in the statement, even
 3430 in the middle of a string or regular expression.  For example:
 3431 
 3432 @example
 3433 awk '/This regular expression is too long, so continue it\
 3434  on the next line/ @{ print $1 @}'
 3435 @end example
 3436 
 3437 @noindent
 3438 @cindex portability @subentry backslash continuation and
 3439 We have generally not used backslash continuation in our sample programs.
 3440 @command{gawk} places no limit on the
 3441 length of a line, so backslash continuation is never strictly necessary;
 3442 it just makes programs more readable.  For this same reason, as well as
 3443 for clarity, we have kept most statements short in the programs
 3444 presented throughout the @value{DOCUMENT}.
 3445 
 3446 Backslash continuation is
 3447 most useful when your @command{awk} program is in a separate source file
 3448 instead of entered from the command line.  You should also note that
 3449 many @command{awk} implementations are more particular about where you
 3450 may use backslash continuation. For example, they may not allow you to
 3451 split a string constant using backslash continuation.  Thus, for maximum
 3452 portability of your @command{awk} programs, it is best not to split your
 3453 lines in the middle of a regular expression or a string.
 3454 @c 10/2000: gawk, mawk, and current bell labs awk allow it,
 3455 @c solaris 2.7 nawk does not. Solaris /usr/xpg4/bin/awk does though!  sigh.
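
A sketch of the more portable style: the backslash comes between two
string constants rather than inside one, and @command{awk} concatenates
the two pieces.  This assumes a POSIX-compliant shell, where a backslash
inside single quotes is passed through to @command{awk} untouched:

```shell
out=$(awk 'BEGIN {
    msg = "hello, " \
          "world"
    print msg
}')
printf '%s\n' "$out"
```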
 3456 
 3457 @cindex @command{csh} utility
 3458 @cindex backslash (@code{\}) @subentry continuing lines and @subentry in @command{csh}
 3459 @cindex @code{\} (backslash) @subentry continuing lines and @subentry in @command{csh}
 3460 @quotation CAUTION
 3461 @emph{Backslash continuation does not work as described
 3462 with the C shell.}  It works for @command{awk} programs in files and
 3463 for one-shot programs, @emph{provided} you are using a POSIX-compliant
 3464 shell, such as the Unix Bourne shell or Bash.  But the C shell behaves
 3465 differently!  There you must use two backslashes in a row, followed by
 3466 a newline.  Note also that when using the C shell, @emph{every} newline
 3467 in your @command{awk} program must be escaped with a backslash. To illustrate:
 3468 
 3469 @example
 3470 % @kbd{awk 'BEGIN @{ \}
 3471 ? @kbd{  print \\}
 3472 ? @kbd{      "hello, world" \}
 3473 ? @kbd{@}'}
 3474 @print{} hello, world
 3475 @end example
 3476 
 3477 @noindent
 3478 Here, the @samp{%} and @samp{?} are the C shell's primary and secondary
 3479 prompts, analogous to the standard shell's @samp{$} and @samp{>}.
 3480 
 3481 Compare the previous example to how it is done with a POSIX-compliant shell:
 3482 
 3483 @example
 3484 $ @kbd{awk 'BEGIN @{}
 3485 >   @kbd{print \}
 3486 >       @kbd{"hello, world"}
 3487 > @kbd{@}'}
 3488 @print{} hello, world
 3489 @end example
 3490 @end quotation
 3491 
 3492 @command{awk} is a line-oriented language.  Each rule's action has to
 3493 begin on the same line as the pattern.  To have the pattern and action
 3494 on separate lines, you @emph{must} use backslash continuation; there
 3495 is no other option.
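
As a minimal sketch of this rule, the following one-shot program, typed at a
POSIX-compliant shell, puts the action on the line after the pattern.  The
two input records are invented for the illustration; the backslash at the
end of the first program line is what joins the pattern to its action:

```shell
# Pattern on one line, action on the next, joined by backslash continuation.
# The two input records are made up for this illustration.
out=$(printf 'a 12\nb 34\n' | awk '/12/ \
{ print $1 }')
printf '%s\n' "$out"
```

Without the trailing backslash, @command{awk} would read @samp{/12/} as a
complete rule that uses the default action, and the braces on the next line
as a second rule with no pattern.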
 3496 
 3497 @cindex backslash (@code{\}) @subentry continuing lines and @subentry comments and
 3498 @cindex @code{\} (backslash) @subentry continuing lines and @subentry comments and
 3499 @cindex commenting @subentry backslash continuation and
 3500 Another thing to keep in mind is that backslash continuation and
 3501 comments do not mix. As soon as @command{awk} sees the @samp{#} that
 3502 starts a comment, it ignores @emph{everything} on the rest of the
 3503 line. For example:
 3504 
 3505 @example
 3506 @group
 3507 $ @kbd{gawk 'BEGIN @{ print "dont panic" # a friendly \}
 3508 > @kbd{                                   BEGIN rule}
 3509 > @kbd{@}'}
 3510 @error{} gawk: cmd. line:2:                BEGIN rule
 3511 @error{} gawk: cmd. line:2:                ^ syntax error
 3512 @end group
 3513 @end example
 3514 
 3515 @noindent
 3516 In this case, it looks like the backslash would continue the comment onto the
 3517 next line. However, the backslash-newline combination is never even
 3518 noticed because it is ``hidden'' inside the comment. Thus, the
 3519 @code{BEGIN} is noted as a syntax error.
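
One way around the problem, sketched here on the assumption that the comment
fits on a single line, is to let the comment end at the newline and close the
action on the following line:

```shell
# The comment runs to the end of its own line, so no continuation is needed.
out=$(awk 'BEGIN { print "dont panic" # a friendly BEGIN rule
}')
printf '%s\n' "$out"
```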
 3520 
 3521 @cindex statements @subentry multiple
 3522 @cindex @code{;} (semicolon) @subentry separating statements in actions
 3523 @cindex semicolon (@code{;}) @subentry separating statements in actions
 3524 @cindex @code{;} (semicolon) @subentry separating rules
 3525 @cindex semicolon (@code{;}) @subentry separating rules
 3526 When @command{awk} statements within one rule are short, you might want to put
 3527 more than one of them on a line.  This is accomplished by separating the statements
 3528 with a semicolon (@samp{;}).
 3529 This also applies to the rules themselves.
 3530 Thus, the program shown at the start of this @value{SECTION}
 3531 could also be written this way:
 3532 
 3533 @example
 3534 /12/ @{ print $0 @} ; /21/ @{ print $0 @}
 3535 @end example
 3536 
 3537 @quotation NOTE
 3538 The requirement that rules on the same line must be
 3539 separated with a semicolon was not in the original @command{awk}
 3540 language; it was added for consistency with the treatment of statements
 3541 within an action.
 3542 @end quotation
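
To see that the two rules stay independent when joined with a semicolon, the
sketch below feeds the one-line program a few invented records.  A record
that matches both patterns is printed once by each rule:

```shell
# Two rules on one line, separated by a semicolon.
# The input is made up; the record 1221 matches both patterns.
out=$(printf '12\n21\n33\n1221\n' |
      awk '/12/ { print $0 } ; /21/ { print $0 }')
printf '%s\n' "$out"
```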
 3543 
 3544 @node Other Features
 3545 @section Other Features of @command{awk}
 3546 
 3547 @cindex variables
 3548 The @command{awk} language provides a number of predefined, or
 3549 @dfn{built-in}, variables that your programs can use to get information
 3550 from @command{awk}.  There are other variables your program can set
 3551 as well to control how @command{awk} processes your data.
 3552 
 3553 In addition, @command{awk} provides a number of built-in functions for doing
 3554 common computational and string-related operations.
 3555 @command{gawk} provides built-in functions for working with timestamps,
 3556 performing bit manipulation, translating strings at runtime (internationalization),
 3557 determining the type of a variable,
 3558 and sorting arrays.
 3559 
 3560 As we develop our presentation of the @command{awk} language, we will introduce
 3561 most of the variables and many of the functions. They are described
 3562 systematically in @ref{Built-in Variables} and in
 3563 @ref{Built-in}.
 3564 
 3565 @node When
 3566 @section When to Use @command{awk}
 3567 
 3568 @cindex @command{awk} @subentry uses for
 3569 Now that you've seen some of what @command{awk} can do,
 3570 you might wonder how @command{awk} could be useful for you.  By using
 3571 utility programs, advanced patterns, field separators, arithmetic
 3572 statements, and other selection criteria, you can produce much more
 3573 complex output.  The @command{awk} language is very useful for producing
 3574 reports from large amounts of raw data, such as summarizing information
 3575 from the output of other utility programs like @command{ls}.
 3576 (@xref{More Complex}.)
 3577 
 3578 Programs written with @command{awk} are usually much smaller than they would
 3579 be in other languages.  This makes @command{awk} programs easy to compose and
 3580 use.  Often, @command{awk} programs can be quickly composed at your keyboard,
 3581 used once, and thrown away.  Because @command{awk} programs are interpreted, you
 3582 can avoid the (usually lengthy) compilation part of the typical
 3583 edit-compile-test-debug cycle of software development.
 3584 
 3585 @cindex Brian Kernighan's @command{awk}
 3586 Complex programs have been written in @command{awk}, including a complete
 3587 retargetable assembler for
 3588 @ifclear FOR_PRINT
 3589 eight-bit microprocessors (@pxref{Glossary}, for more information),
 3590 @end ifclear
 3591 @ifset FOR_PRINT
 3592 eight-bit microprocessors,
 3593 @end ifset
 3594 and a microcode assembler for a special-purpose Prolog
 3595 computer.
 3596 The original @command{awk}'s capabilities were strained by tasks
 3597 of such complexity, but modern versions are more capable.
 3598 
 3599 @cindex @command{awk} programs @subentry complex
 3600 If you find yourself writing @command{awk} scripts of more than, say,
 3601 a few hundred lines, you might consider using a different programming
 3602 language.  The shell is good at string and pattern matching; in addition,
 3603 it allows powerful use of the system utilities.  Python offers a nice
 3604 balance between high-level ease of programming and access to system
 3605 facilities.@footnote{Other popular scripting languages include Ruby
 3606 and Perl.}
 3607 
 3608 @node Intro Summary
 3609 @section Summary
 3610 
 3611 @c FIXME: Review this chapter for summary of builtin functions called.
 3612 @itemize @value{BULLET}
 3613 @item
 3614 Programs in @command{awk} consist of @var{pattern}--@var{action} pairs.
 3615 
 3616 @item
 3617 An @var{action} without a @var{pattern} always runs.  The default
 3618 @var{action} for a pattern without one is @samp{@{ print $0 @}}.
 3619 
 3620 @item
 3621 Use either
 3622 @samp{awk '@var{program}' @var{files}}
 3623 or
 3624 @samp{awk -f @var{program-file} @var{files}}
 3625 to run @command{awk}.
 3626 
 3627 @item
 3628 You may use the special @samp{#!} header line to create @command{awk}
 3629 programs that are directly executable.
 3630 
 3631 @item
 3632 Comments in @command{awk} programs start with @samp{#} and continue to
 3633 the end of the same line.
 3634 
 3635 @item
 3636 Be aware of quoting issues when writing @command{awk} programs as
 3637 part of a larger shell script (or MS-Windows batch file).
 3638 
 3639 @item
 3640 You may use backslash continuation to continue a source line.
 3641 Lines are automatically continued after
 3642 a comma, open brace, question mark, colon,
 3643 @samp{||}, @samp{&&}, @code{do}, and @code{else}.
 3644 @end itemize
 3645 
 3646 @node Invoking Gawk
 3647 @chapter Running @command{awk} and @command{gawk}
 3648 
 3649 This @value{CHAPTER} covers how to run @command{awk}, both POSIX-standard
 3650 and @command{gawk}-specific command-line options, and what
 3651 @command{awk} and
 3652 @command{gawk} do with nonoption arguments.
 3653 It then proceeds to cover how @command{gawk} searches for source files,
 3654 reading standard input along with other files, @command{gawk}'s
 3655 environment variables, @command{gawk}'s exit status, using include files,
 3656 and obsolete and undocumented options and/or features.
 3657 
 3658 Many of the options and features described here are discussed in
 3659 more detail later in the @value{DOCUMENT}; feel free to skip over
 3660 things in this @value{CHAPTER} that don't interest you right now.
 3661 
 3662 @menu
 3663 * Command Line::                How to run @command{awk}.
 3664 * Options::                     Command-line options and their meanings.
 3665 * Other Arguments::             Input file names and variable assignments.
 3666 * Naming Standard Input::       How to specify standard input with other
 3667                                 files.
 3668 * Environment Variables::       The environment variables @command{gawk} uses.
 3669 * Exit Status::                 @command{gawk}'s exit status.
 3670 * Include Files::               Including other files into your program.
 3671 * Loading Shared Libraries::    Loading shared libraries into your program.
 3672 * Obsolete::                    Obsolete options and/or features.
 3673 * Undocumented::                Undocumented Options and Features.
 3674 * Invoking Summary::            Invocation summary.
 3675 @end menu
 3676 
 3677 @node Command Line
 3678 @section Invoking @command{awk}
 3679 @cindex command line @subentry invoking @command{awk} from
 3680 @cindex @command{awk} @subentry invoking
 3681 @cindex arguments @subentry command-line @subentry invoking @command{awk}
 3682 @cindex options @subentry command-line @subentry invoking @command{awk}
 3683 
 3684 There are two ways to run @command{awk}---with an explicit program or with
 3685 one or more program files.  Here are templates for both of them; items
 3686 enclosed in [@dots{}] in these templates are optional:
 3687 
 3688 @display
 3689 @command{awk} [@var{options}] @option{-f} @var{progfile} [@option{--}] @var{file} @dots{}
 3690 @command{awk} [@var{options}] [@option{--}] @code{'@var{program}'} @var{file} @dots{}
 3691 @end display
 3692 
 3693 @cindex GNU long options
 3694 @cindex long options
 3695 @cindex options @subentry long
 3696 In addition to traditional one-letter POSIX-style options, @command{gawk} also
 3697 supports GNU long options.
 3698 
 3699 @cindex dark corner @subentry invoking @command{awk}
 3700 @cindex lint checking @subentry empty programs
 3701 It is possible to invoke @command{awk} with an empty program:
 3702 
 3703 @example
 3704 awk '' datafile1 datafile2
 3705 @end example
 3706 
 3707 @cindex @option{--lint} option
 3708 @cindex dark corner @subentry empty programs
 3709 @noindent
 3710 Doing so makes little sense, though; @command{awk} exits
 3711 silently when given an empty program.
 3712 @value{DARKCORNER}
 3713 If @option{--lint} has
 3714 been specified on the command line, @command{gawk} issues a
 3715 warning that the program is empty.
 3716 
 3717 @node Options
 3718 @section Command-Line Options
 3719 @cindex options @subentry command-line
 3720 @cindex command line @subentry options
 3721 @cindex GNU long options
 3722 @cindex options @subentry long
 3723 
 3724 Options begin with a dash and consist of a single character.
 3725 GNU-style long options consist of two dashes and a keyword.
 3726 The keyword can be abbreviated, as long as the abbreviation allows the option
 3727 to be uniquely identified.  If the option takes an argument, either the
 3728 keyword is immediately followed by an equals sign (@samp{=}) and the
 3729 argument's value, or the keyword and the argument's value are separated
 3730 by whitespace (spaces or TABs).
 3731 If a particular option with a value is given more than once, it is the
 3732 last value that counts.
 3733 
 3734 @cindex POSIX @command{awk} @subentry GNU long options and
 3735 Each long option for @command{gawk} has a corresponding
 3736 POSIX-style short option.
 3737 The long and short options are
 3738 interchangeable in all contexts.
 3739 The following list describes options mandated by the POSIX standard:
 3740 
 3741 @table @code
 3742 @item -F @var{fs}
 3743 @itemx --field-separator @var{fs}
 3744 @cindex @option{-F} option
 3745 @cindex @option{--field-separator} option
 3746 @cindex @code{FS} variable @subentry @code{--field-separator} option and
 3747 Set the @code{FS} variable to @var{fs}
 3748 (@pxref{Field Separators}).
 3749 
 3750 @item -f @var{source-file}
 3751 @itemx --file @var{source-file}
 3752 @cindex @option{-f} option
 3753 @cindex @option{--file} option
 3754 @cindex @command{awk} programs @subentry location of
 3755 Read the @command{awk} program source from @var{source-file}
 3756 instead of in the first nonoption argument.
 3757 This option may be given multiple times; the @command{awk}
 3758 program consists of the concatenation of the contents of
 3759 each specified @var{source-file}.
 3760 
 3761 Files named with @option{-f} are treated as if they had @samp{@@namespace "awk"}
 3762 at their beginning. @xref{Changing The Namespace}, for more information
 3763 on this advanced feature.
 3764 
 3765 @item -v @var{var}=@var{val}
 3766 @itemx --assign @var{var}=@var{val}
 3767 @cindex @option{-v} option
 3768 @cindex @option{--assign} option
 3769 @cindex variables @subentry setting
 3770 Set the variable @var{var} to the value @var{val} @emph{before}
 3771 execution of the program begins.  Such variable values are available
 3772 inside the @code{BEGIN} rule
 3773 (@pxref{Other Arguments}).
 3774 
 3775 The @option{-v} option can only set one variable, but it can be used
 3776 more than once, setting another variable each time, like this:
 3777 @samp{awk @w{-v foo=1} @w{-v bar=2} @dots{}}.
 3778 
 3779 @cindex predefined variables @subentry @code{-v} option, setting with
 3780 @cindex variables @subentry predefined @subentry @code{-v} option, setting with
 3781 @quotation CAUTION
 3782 Using @option{-v} to set the values of the built-in
 3783 variables may lead to surprising results.  @command{awk} will reset the
 3784 values of those variables as it needs to, possibly ignoring any
 3785 initial value you may have given.
 3786 @end quotation
 3787 
 3788 @item -W @var{gawk-opt}
 3789 @cindex @option{-W} option
 3790 Provide an implementation-specific option.
 3791 This is the POSIX convention for providing implementation-specific options.
 3792 These options
 3793 also have corresponding GNU-style long options.
 3794 Note that the long options may be abbreviated, as long as
 3795 the abbreviations remain unique.
 3796 The full list of @command{gawk}-specific options is provided next.
 3797 
 3798 @item --
 3799 @cindex command line @subentry options @subentry end of
 3800 @cindex options @subentry command-line @subentry end of
 3801 Signal the end of the command-line options.  The following arguments
 3802 are not treated as options even if they begin with @samp{-}.  This
 3803 interpretation of @option{--} follows the POSIX argument parsing
 3804 conventions.
 3805 
 3806 @cindex @code{-} (hyphen) @subentry file names beginning with
 3807 @cindex hyphen (@code{-}) @subentry file names beginning with
 3808 This is useful if you have @value{FN}s that start with @samp{-},
 3809 or in shell scripts, if you have @value{FN}s that will be specified
 3810 by the user that could start with @samp{-}.
 3811 It is also useful for passing options on to the @command{awk}
 3812 program; see @ref{Getopt Function}.
 3813 @end table
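
The following sketch exercises the POSIX options together.  The variable
name @code{label}, the function @code{twice()}, the @value{FN}s, and the
sample data are all invented for the illustration; only the options
themselves come from the preceding list.  It also assumes that the
@command{mktemp} utility is available:

```shell
# Scratch directory for the illustrative files.
dir=$(mktemp -d)

# -F sets FS, and -v assigns a variable before the program starts.
out_fs=$(printf 'alice:30\nbob:25\n' |
         awk -F: -v label=user '{ print label, $1 }')

# Two -f options: the program is the concatenation of both files.
printf 'function twice(x) { return 2 * x }\n' > "$dir/lib.awk"
printf 'BEGIN { print twice(21) }\n' > "$dir/main.awk"
out_f=$(awk -f "$dir/lib.awk" -f "$dir/main.awk")

# -- ends the options, so -dash.txt is read as a data file
# instead of being parsed as options.
printf '{ print }\n' > "$dir/print.awk"
printf 'hello\n' > "$dir/-dash.txt"
out_dash=$(cd "$dir" && awk -f print.awk -- -dash.txt)

printf '%s\n' "$out_fs" "$out_f" "$out_dash"
rm -rf "$dir"
```

The @option{--} matters in the last command because the program comes from
@option{-f}: with no program text on the command line, an argument that
begins with @samp{-} would otherwise be taken as an option.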
 3814 
 3815 The following list describes @command{gawk}-specific options:
 3816 
 3817 @c Have to use @asis here to get docbook to come out right.
 3818 @table @asis
 3819 @item @option{-b}
 3820 @itemx @option{--characters-as-bytes}
 3821 @cindex @option{-b} option
 3822 @cindex @option{--characters-as-bytes} option
 3823 Cause @command{gawk} to treat all input data as single-byte characters.
 3824 In addition, all output written with @code{print} or @code{printf}
 3825 is treated as single-byte characters.
 3826 
 3827 Normally, @command{gawk} follows the POSIX standard and attempts to process
 3828 its input data according to the current locale (@pxref{Locales}). This can often involve
 3829 converting multibyte characters into wide characters (internally), and
 3830 can lead to problems or confusion if the input data does not contain valid
 3831 multibyte characters. This option is an easy way to tell @command{gawk},
 3832 ``Hands off my data!''
 3833 
 3834 @item @option{-c}
 3835 @itemx @option{--traditional}
 3836 @cindex @option{-c} option
 3837 @cindex @option{--traditional} option
 3838 @cindex compatibility mode (@command{gawk}) @subentry specifying
 3839 Specify @dfn{compatibility mode}, in which the GNU extensions to
 3840 the @command{awk} language are disabled, so that @command{gawk} behaves just
 3841 like BWK @command{awk}.
 3842 @xref{POSIX/GNU},
 3843 which summarizes the extensions.
 3844 @ifclear FOR_PRINT
 3845 Also see
 3846 @ref{Compatibility Mode}.
 3847 @end ifclear
 3848 
 3849 @item @option{-C}
 3850 @itemx @option{--copyright}
 3851 @cindex @option{-C} option
 3852 @cindex @option{--copyright} option
 3853 @cindex GPL (General Public License) @subentry printing
 3854 Print the short version of the General Public License and then exit.
 3855 
 3856 @item @option{-d}[@var{file}]
 3857 @itemx @option{--dump-variables}[@code{=}@var{file}]
 3858 @cindex @option{-d} option
 3859 @cindex @option{--dump-variables} option
 3860 @cindex dump all variables of a program
 3861 @cindex @file{awkvars.out} file
 3862 @cindex files @subentry @file{awkvars.out}
 3863 @cindex variables @subentry global @subentry printing list of
 3864 Print a sorted list of global variables, their types, and final values
 3865 to @var{file}.  If no @var{file} is provided, print this
 3866 list to a file named @file{awkvars.out} in the current directory.
 3867 No space is allowed between the @option{-d} and @var{file}, if
 3868 @var{file} is supplied.
 3869 
 3870 @cindex troubleshooting @subentry typographical errors, global variables
 3871 Having a list of all global variables is a good way to look for
 3872 typographical errors in your programs.
 3873 You would also use this option if you have a large program with a lot of
 3874 functions, and you want to be sure that your functions don't
 3875 inadvertently use global variables that you meant to be local.
 3876 (This is a particularly easy mistake to make with simple variable
 3877 names like @code{i}, @code{j}, etc.)
 3878 
 3879 @item @option{-D}[@var{file}]
 3880 @itemx @option{--debug}[@code{=}@var{file}]
 3881 @cindex @option{-D} option
 3882 @cindex @option{--debug} option
 3883 @cindex @command{awk} programs @subentry debugging, enabling
 3884 Enable debugging of @command{awk} programs
 3885 (@pxref{Debugging}).
 3886 By default, the debugger reads commands interactively from the keyboard
 3887 (standard input).
 3888 The optional @var{file} argument allows you to specify a file with a list
 3889 of commands for the debugger to execute noninteractively.
 3890 No space is allowed between the @option{-D} and @var{file}, if
 3891 @var{file} is supplied.
 3892 
 3893 @item @option{-e} @var{program-text}
 3894 @itemx @option{--source} @var{program-text}
 3895 @cindex @option{-e} option
 3896 @cindex @option{--source} option
 3897 @cindex source code @subentry mixing
 3898 Provide program source code in the @var{program-text}.
 3899 This option allows you to mix source code in files with source
 3900 code that you enter on the command line.
 3901 This is particularly useful
 3902 when you have library functions that you want to use from your command-line
 3903 programs (@pxref{AWKPATH Variable}).
 3904 
 3905 Note that @command{gawk} treats each string as if it ended with
 3906 a newline character (even if it doesn't). This makes building
 3907 the total program easier.
 3908 
 3909 @quotation CAUTION
 3910 Prior to @value{PVERSION} 5.0, there was
 3911 no requirement that each @var{program-text}
 3912 be a full syntactic unit. I.e., the following worked:
 3913 
 3914 @example
 3915 $ @kbd{gawk -e 'BEGIN @{ a = 5 ;' -e 'print a @}'}
 3916 @print{} 5
 3917 @end example
 3918 
 3919 @noindent
 3920 However, this is no longer true. If you have any scripts that
 3921 rely upon this feature, you should revise them.
 3922 
 3923 This is because each @var{program-text} is treated as if it had
 3924 @samp{@@namespace "awk"} at its beginning. @xref{Changing The Namespace},
 3925 for more information.
 3926 @end quotation
 3927 
 3928 @item @option{-E} @var{file}
 3929 @itemx @option{--exec} @var{file}
 3930 @cindex @option{-E} option
 3931 @cindex @option{--exec} option
 3932 @cindex @command{awk} programs @subentry location of
 3933 @cindex CGI, @command{awk} scripts for
 3934 Similar to @option{-f}, read @command{awk} program text from @var{file}.
 3935 There are two differences from @option{-f}:
 3936 
 3937 @itemize @value{BULLET}
 3938 @item
 3939 This option terminates option processing; anything
 3940 else on the command line is passed on directly to the @command{awk} program.
 3941 
 3942 @item
 3943 Command-line variable assignments of the form
 3944 @samp{@var{var}=@var{value}} are disallowed.
 3945 @end itemize
 3946 
 3947 This option is particularly necessary for World Wide Web CGI applications
 3948 that pass arguments through the URL; using this option prevents a malicious
 3949 (or other) user from passing in options, assignments, or @command{awk} source
 3950 code (via @option{-e}) to the CGI application.@footnote{For more detail,
 3951 please see Section 4.4 of @uref{http://www.ietf.org/rfc/rfc3875,
 3952 RFC 3875}. Also see the
 3953 @uref{https://lists.gnu.org/archive/html/bug-gawk/2014-11/msg00022.html,
 3954 explanatory note sent to the @command{gawk} bug
 3955 mailing list}.}
 3956 This option should be used
 3957 with @samp{#!} scripts (@pxref{Executable Scripts}), like so:
 3958 
 3959 @example
 3960 #! /usr/local/bin/gawk -E
 3961 
 3962 @var{awk program here @dots{}}
 3963 @end example
 3964 
 3965 @item @option{-g}
 3966 @itemx @option{--gen-pot}
 3967 @cindex @option{-g} option
 3968 @cindex @option{--gen-pot} option
 3969 @cindex portable object @subentry files @subentry generating
 3970 @cindex files @subentry portable object @subentry generating
 3971 Analyze the source program and
 3972 generate a GNU @command{gettext} portable object template file on standard
 3973 output for all string constants that have been marked for translation.
 3974 @xref{Internationalization},
 3975 for information about this option.
 3976 
 3977 @item @option{-h}
 3978 @itemx @option{--help}
 3979 @cindex @option{-h} option
 3980 @cindex @option{--help} option
 3981 @cindex GNU long options @subentry printing list of
 3982 @cindex options @subentry printing list of
 3983 @cindex printing @subentry list of options
 3984 Print a ``usage'' message summarizing the short- and long-style options
 3985 that @command{gawk} accepts and then exit.
 3986 
 3987 @item @option{-i} @var{source-file}
 3988 @itemx @option{--include} @var{source-file}
 3989 @cindex @option{-i} option
 3990 @cindex @option{--include} option
 3991 @cindex @command{awk} programs @subentry location of
 3992 Read an @command{awk} source library from @var{source-file}.  This option
 3993 is completely equivalent to using the @code{@@include} directive inside
 3994 your program.  It is very similar to the @option{-f} option,
 3995 but there are two important differences.  First, when @option{-i} is
 3996 used, the program source is not loaded if it has been previously
 3997 loaded, whereas with @option{-f}, @command{gawk} always loads the file.
 3998 Second, because this option is intended to be used with code libraries,
 3999 @command{gawk} does not recognize such files as constituting main program
 4000 input.  Thus, after processing an @option{-i} argument, @command{gawk}
 4001 still expects to find the main source code via the @option{-f} option
 4002 or on the command line.
 4003 
 4004 Files named with @option{-i} are treated as if they had @samp{@@namespace "awk"}
 4005 at their beginning.  @xref{Changing The Namespace}, for more information.
 4006 
 4007 @item @option{-l} @var{ext}
 4008 @itemx @option{--load} @var{ext}
 4009 @cindex @option{-l} option
 4010 @cindex @option{--load} option
 4011 @cindex loading extensions
 4012 Load a dynamic extension named @var{ext}. Extensions
 4013 are stored as system shared libraries.
 4014 This option searches for the library using the @env{AWKLIBPATH}
 4015 environment variable.  The correct library suffix for your platform will be
 4016 supplied by default, so it need not be specified in the extension name.
 4017 The extension initialization routine should be named @code{dl_load()}.
 4018 An alternative is to use the @code{@@load} keyword inside the program to load
 4019 a shared library.  This advanced feature is described in detail in @ref{Dynamic Extensions}.
 4020 
 4021 @item @option{-L}[@var{value}]
 4022 @itemx @option{--lint}[@code{=}@var{value}]
 4023 @cindex @option{-L} option
 4024 @cindex @option{--lint} option
 4025 @cindex lint checking @subentry issuing warnings
 4026 @cindex warnings, issuing
 4027 Warn about constructs that are dubious or nonportable to
 4028 other @command{awk} implementations.
 4029 No space is allowed between the @option{-L} and @var{value}, if
 4030 @var{value} is supplied.
 4031 Some warnings are issued when @command{gawk} first reads your program.  Others
 4032 are issued at runtime, as your program executes. The optional
 4033 argument may be one of the following:
 4034 
 4035 @table @code
 4036 @item fatal
 4037 Cause lint warnings to become fatal errors.
 4038 This may be drastic, but its use will certainly encourage the
 4039 development of cleaner @command{awk} programs.
 4040 
 4041 @item invalid
 4042 Only issue warnings about things
 4043 that are actually invalid. (This is not fully implemented yet.)
 4044 
 4045 @item no-ext
 4046 Disable warnings about @command{gawk} extensions.
 4047 @end table
 4048 
 4049 Some warnings are only printed once, even if the dubious constructs they
 4050 warn about occur multiple times in your @command{awk} program.  Thus,
 4051 when eliminating problems pointed out by @option{--lint}, you should take
 4052 care to search for all occurrences of each inappropriate construct. As
 4053 @command{awk} programs are usually short, doing so is not burdensome.
 4054 
 4055 @item @option{-M}
 4056 @itemx @option{--bignum}
 4057 @cindex @option{-M} option
 4058 @cindex @option{--bignum} option
 4059 Select arbitrary-precision arithmetic on numbers. This option has no effect
 4060 if @command{gawk} is not compiled to use the GNU MPFR and MP libraries
 4061 (@pxref{Arbitrary Precision Arithmetic}).
 4062 
 4063 @item @option{-n}
 4064 @itemx @option{--non-decimal-data}
 4065 @cindex @option{-n} option
 4066 @cindex @option{--non-decimal-data} option
 4067 @cindex hexadecimal values, enabling interpretation of
 4068 @cindex octal values, enabling interpretation of
 4069 @cindex troubleshooting @subentry @code{--non-decimal-data} option
 4070 Enable automatic interpretation of octal and hexadecimal
 4071 values in input data
 4072 (@pxref{Nondecimal Data}).
 4073 
 4074 @quotation CAUTION
 4075 This option can severely break old programs.  Use with care.  Also note
 4076 that this option may disappear in a future version of @command{gawk}.
 4077 @end quotation
 4078 
 4079 @item @option{-N}
 4080 @itemx @option{--use-lc-numeric}
 4081 @cindex @option{-N} option
 4082 @cindex @option{--use-lc-numeric} option
 4083 Force the use of the locale's decimal point character
 4084 when parsing numeric input data (@pxref{Locales}).
 4085 
 4086 @cindex pretty printing
 4087 @item @option{-o}[@var{file}]
 4088 @itemx @option{--pretty-print}[@code{=}@var{file}]
 4089 @cindex @option{-o} option
 4090 @cindex @option{--pretty-print} option
 4091 Enable pretty-printing of @command{awk} programs.
 4092 Implies @option{--no-optimize}.
 4093 By default, the output program is created in a file named @file{awkprof.out}
 4094 (@pxref{Profiling}).
 4095 The optional @var{file} argument allows you to specify a different
 4096 @value{FN} for the output.
 4097 No space is allowed between the @option{-o} and @var{file}, if
 4098 @var{file} is supplied.
 4099 
 4100 @quotation NOTE
 4101 In the past, this option would also execute your program.
 4102 This is no longer the case.
 4103 @end quotation
 4104 
 4105 @item @option{-O}
 4106 @itemx @option{--optimize}
 4107 @cindex @option{--optimize} option
 4108 @cindex @option{-O} option
 4109 Enable @command{gawk}'s default optimizations on the internal
 4110 representation of the program.  At the moment, this includes just simple
 4111 constant folding.
 4112 
 4113 Optimization is enabled by default.
 4114 This option remains primarily for backwards compatibility. However, it may
 4115 be used to cancel the effect of an earlier @option{-s} option
 4116 (see later in this list).
 4117 
 4118 @item @option{-p}[@var{file}]
 4119 @itemx @option{--profile}[@code{=}@var{file}]
 4120 @cindex @option{-p} option
 4121 @cindex @option{--profile} option
 4122 @cindex @command{awk} @subentry profiling, enabling
 4123 Enable profiling of @command{awk} programs
 4124 (@pxref{Profiling}).
 4125 Implies @option{--no-optimize}.
 4126 By default, profiles are created in a file named @file{awkprof.out}.
 4127 The optional @var{file} argument allows you to specify a different
 4128 @value{FN} for the profile file.
 4129 No space is allowed between the @option{-p} and @var{file}, if
 4130 @var{file} is supplied.
 4131 
 4132 The profile contains execution counts for each statement in the program
 4133 in the left margin, and function call counts for each function.
 4134 
 4135 @item @option{-P}
 4136 @itemx @option{--posix}
 4137 @cindex @option{-P} option
 4138 @cindex @option{--posix} option
 4139 @cindex POSIX mode
 4140 @cindex @command{gawk} @subentry extensions, disabling
 4141 Operate in strict POSIX mode.  This disables all @command{gawk}
 4142 extensions (just like @option{--traditional}) and
 4143 disables all extensions not allowed by POSIX.
 4144 @xref{Common Extensions}, for a summary of the extensions
 4145 in @command{gawk} that are disabled by this option.
 4146 Also,
 4147 the following additional
 4148 restrictions apply:
 4149 
 4150 @itemize @value{BULLET}
 4151 
 4152 @cindex newlines
 4153 @cindex whitespace @subentry newlines as
 4154 @item
 4155 Newlines are not allowed after @samp{?} or @samp{:}
 4156 (@pxref{Conditional Exp}).
 4157 
 4158 
 4159 @cindex @code{FS} variable @subentry TAB character as
 4160 @item
 4161 Specifying @samp{-Ft} on the command line does not set the value
 4162 of @code{FS} to be a single TAB character
 4163 (@pxref{Field Separators}).
 4164 
 4165 @cindex locale decimal point character
 4166 @cindex decimal point character, locale specific
 4167 @item
 4168 The locale's decimal point character is used for parsing input
 4169 data (@pxref{Locales}).
 4170 @end itemize
 4171 
 4172 @c @cindex automatic warnings
 4173 @c @cindex warnings, automatic
 4174 @cindex @option{--traditional} option @subentry @code{--posix} option and
 4175 @cindex @option{--posix} option @subentry @code{--traditional} option and
 4176 If you supply both @option{--traditional} and @option{--posix} on the
 4177 command line, @option{--posix} takes precedence. @command{gawk}
 4178 issues a warning if both options are supplied.
 4179 
 4180 @item @option{-r}
 4181 @itemx @option{--re-interval}
 4182 @cindex @option{-r} option
 4183 @cindex @option{--re-interval} option
 4184 @cindex regular expressions @subentry interval expressions and
 4185 Allow interval expressions
 4186 (@pxref{Regexp Operators})
 4187 in regexps.
 4188 This is now @command{gawk}'s default behavior.
 4189 Nevertheless, this option remains (both for backward compatibility
 4190 and for use in combination with @option{--traditional}).
 4191 
 4192 @item @option{-s}
 4193 @itemx @option{--no-optimize}
 4194 @cindex @option{--no-optimize} option
 4195 @cindex @option{-s} option
 4196 Disable @command{gawk}'s default optimizations on the internal
 4197 representation of the program.
 4198 
 4199 @item @option{-S}
 4200 @itemx @option{--sandbox}
 4201 @cindex @option{-S} option
 4202 @cindex @option{--sandbox} option
 4203 @cindex sandbox mode
 4204 @cindex @code{ARGV} array
 4205 Disable the @code{system()} function,
 4206 input redirections with @code{getline},
 4207 output redirections with @code{print} and @code{printf},
 4208 and dynamic extensions.
 4209 Also, disallow adding filenames to @code{ARGV} that were
 4210 not there when @command{gawk} started running.
 4211 This is particularly useful when you want to run @command{awk} scripts
 4212 from questionable sources and need to make sure the scripts
 4213 can't access your system (other than the specified input @value{DF}s).
 4214 
 4215 @item @option{-t}
 4216 @itemx @option{--lint-old}
 4217 @cindex @option{-t} option
 4218 @cindex @option{--lint-old} option
 4219 Warn about constructs that are not available in the original version of
 4220 @command{awk} from Version 7 Unix
 4221 (@pxref{V7/SVR3.1}).
 4222 
 4223 @item @option{-V}
 4224 @itemx @option{--version}
 4225 @cindex @option{-V} option
 4226 @cindex @option{--version} option
 4227 @cindex @command{gawk} @subentry version of @subentry printing information about
 4228 Print version information for this particular copy of @command{gawk}.
 4229 This allows you to determine if your copy of @command{gawk} is up to date
 4230 with respect to whatever the Free Software Foundation is currently
 4231 distributing.
 4232 It is also useful for bug reports
 4233 (@pxref{Bugs}).
 4234 
 4235 @cindex @code{-} (hyphen) @subentry @code{--} end of options marker
 4236 @cindex hyphen (@code{-}) @subentry @code{--} end of options marker
 4237 @item @code{--}
 4238 Mark the end of all options.
 4239 Any command-line arguments following @code{--} are placed in @code{ARGV},
 4240 even if they start with a minus sign.
 4241 @end table
 4242 
 4243 As long as program text has been supplied,
 4244 any other options are flagged as invalid with a warning message but
 4245 are otherwise ignored.
 4246 
 4247 @cindex @option{-F} option @subentry @option{-Ft} sets @code{FS} to TAB
 4248 In compatibility mode, as a special case, if the value of @var{fs} supplied
 4249 to the @option{-F} option is @samp{t}, then @code{FS} is set to the TAB
 4250 character (@code{"\t"}).  This is true only for @option{--traditional} and not
 4251 for @option{--posix}
 4252 (@pxref{Field Separators}).
 4253 
 4254 @cindex @option{-f} option @subentry multiple uses
 4255 The @option{-f} option may be used more than once on the command line.
 4256 If it is, @command{awk} reads its program source from all of the named files, as
 4257 if they had been concatenated together into one big file.  This is
 4258 useful for creating libraries of @command{awk} functions.  These functions
 4259 can be written once and then retrieved from a standard place, instead
 4260 of having to be included in each individual program.
 4261 The @option{-i} option is similar in this regard.
 4262 (As mentioned in
 4263 @ref{Definition Syntax},
 4264 function names must be unique.)
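
As a minimal sketch of such a library, the following shell session
builds two small source files and runs them together.  The file names
@file{lib.awk} and @file{main.awk} and the function @code{double()} are
made up for illustration; they are not part of any @command{awk}
distribution:

```shell
# Work in a scratch directory so the sketch leaves nothing behind.
dir=$(mktemp -d)

# lib.awk: a hypothetical one-function library.
cat > "$dir/lib.awk" <<'EOF'
function double(x) { return 2 * x }
EOF

# main.awk: a program that relies on the library function.
cat > "$dir/main.awk" <<'EOF'
BEGIN { print double(21) }
EOF

# awk reads both files as if they were one concatenated program.
result=$(awk -f "$dir/lib.awk" -f "$dir/main.awk")
echo "$result"

rm -rf "$dir"
```

The order of the two @option{-f} options does not matter here, because
function definitions may appear anywhere in the program text.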
 4265 
 4266 With standard @command{awk}, library functions can still be used, even
 4267 if the program is entered at the keyboard,
 4268 by specifying @samp{-f /dev/tty}.  After typing your program,
 4269 type @kbd{Ctrl-d} (the end-of-file character) to terminate it.
 4270 (You may also use @samp{-f -} to read program source from the standard
 4271 input, but then you will not be able to also use the standard input as a
 4272 source of data.)
 4273 
 4274 Because it is clumsy using the standard @command{awk} mechanisms to mix
 4275 source file and command-line @command{awk} programs, @command{gawk}
 4276 provides the @option{-e} option.  This does not require you to
 4277 preempt the standard input for your source code, and it allows you to easily
 4278 mix command-line and library source code (@pxref{AWKPATH Variable}).
 4279 As with @option{-f}, the @option{-e} and @option{-i}
 4280 options may also be used multiple times on the command line.
 4281 
 4282 @cindex @option{-e} option
 4283 If no @option{-f} option (or @option{-e} option for @command{gawk})
 4284 is specified, then @command{awk} uses the first nonoption command-line
 4285 argument as the text of the program source code.  Arguments on
 4286 the command line that follow the program text are entered into the
 4287 @code{ARGV} array; @command{awk} does @emph{not} continue to parse the
 4288 command line looking for options.
 4289 
 4290 @cindex @env{POSIXLY_CORRECT} environment variable
 4291 @cindex environment variables @subentry @env{POSIXLY_CORRECT}
 4292 @cindex lint checking @subentry @env{POSIXLY_CORRECT} environment variable
 4293 @cindex POSIX mode
 4294 If the environment variable @env{POSIXLY_CORRECT} exists,
 4295 then @command{gawk} behaves in strict POSIX mode, exactly as if
 4296 you had supplied @option{--posix}.
 4297 Many GNU programs look for this environment variable to suppress
 4298 extensions that conflict with POSIX, but @command{gawk} behaves
 4299 differently: it suppresses all extensions, even those that do not
 4300 conflict with POSIX, and behaves in
 4301 strict POSIX mode. If @option{--lint} is supplied on the command line
 4302 and @command{gawk} turns on POSIX mode because of @env{POSIXLY_CORRECT},
 4303 then it issues a warning message indicating that POSIX
 4304 mode is in effect.
 4305 You would typically set this variable in your shell's startup file.
 4306 For a Bourne-compatible shell (such as Bash), you would add these
 4307 lines to the @file{.profile} file in your home directory:
 4308 
 4309 @example
 4310 POSIXLY_CORRECT=true
 4311 export POSIXLY_CORRECT
 4312 @end example
 4313 
 4314 @cindex @command{csh} utility @subentry @env{POSIXLY_CORRECT} environment variable
 4315 For a C shell-compatible
 4316 shell,@footnote{Not recommended.}
 4317 you would add this line to the @file{.login} file in your home directory:
 4318 
 4319 @example
 4320 setenv POSIXLY_CORRECT true
 4321 @end example
 4322 
 4323 @cindex portability @subentry @env{POSIXLY_CORRECT} environment variable
 4324 Having @env{POSIXLY_CORRECT} set is not recommended for daily use,
 4325 but it is good for testing the portability of your programs to other
 4326 environments.
 4327 
 4328 @node Other Arguments
 4329 @section Other Command-Line Arguments
 4330 @cindex command line @subentry arguments
 4331 @cindex arguments @subentry command-line
 4332 
 4333 Any additional arguments on the command line are normally treated as
 4334 input files to be processed in the order specified.   However, an
 4335 argument that has the form @code{@var{var}=@var{value}} assigns
 4336 the value @var{value} to the variable @var{var}---it does not specify a
 4337 file at all.  (See @ref{Assignment Options}.) In the following example,
 4338 @code{count=1} is a variable assignment, not a @value{FN}:
 4339 
 4340 @example
 4341 awk -f program.awk file1 count=1 file2
 4342 @end example
 4343 
 4344 @noindent
 4345 As a side point, should you really need to have @command{awk}
 4346 process a file named @file{count=1} (or any file whose name looks like
 4347 a variable assignment), precede the file name with @samp{./}, like so:
 4348 
 4349 @example
 4350 awk -f program.awk file1 ./count=1 file2
 4351 @end example
 4352 
 4353 @cindex @command{gawk} @subentry @code{ARGIND} variable in
 4354 @cindex @code{ARGIND} variable @subentry command-line arguments
 4355 @cindex @code{ARGV} array, indexing into
 4356 @cindex @code{ARGC}/@code{ARGV} variables @subentry command-line arguments
 4357 @cindex @command{gawk} @subentry @code{PROCINFO} array in
 4358 All the command-line arguments are made available to your @command{awk} program in the
 4359 @code{ARGV} array (@pxref{Built-in Variables}).  Command-line options
 4360 and the program text (if present) are omitted from @code{ARGV}.
 4361 All other arguments, including variable assignments, are
 4362 included.   As each element of @code{ARGV} is processed, @command{gawk}
 4363 sets @code{ARGIND} to the index in @code{ARGV} of the
 4364 current element.  (@command{gawk} makes the full command line,
 4365 including program text and options, available in @code{PROCINFO["argv"]};
 4366 @pxref{Auto-set}.)
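
The following sketch prints the contents of @code{ARGV} from a
@code{BEGIN} rule.  The names @file{file1} and @file{file2} are
placeholders; because the program consists only of a @code{BEGIN} rule,
@command{awk} never tries to open them:

```shell
# -F: is an option and the quoted text is the program, so neither
# appears in ARGV; the three remaining arguments do.
args=$(awk -F: 'BEGIN {
    for (i = 1; i < ARGC; i++)
        printf "ARGV[%d] = %s\n", i, ARGV[i]
}' file1 count=1 file2)
echo "$args"
```

The loop starts at one because @code{ARGV[0]} holds the name of the
@command{awk} command itself, which varies between installations.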
 4367 
 4368 @c FIXME: One day, move the ARGC and ARGV node closer to here.
 4369 Changing @code{ARGC} and @code{ARGV} in your @command{awk} program lets
 4370 you control how @command{awk} processes the input files; this is described
 4371 in more detail in @ref{ARGC and ARGV}.
 4372 
 4373 @cindex input files @subentry variable assignments and
 4374 @cindex variable assignments and input files
 4375 The distinction between @value{FN} arguments and variable-assignment
 4376 arguments is made when @command{awk} is about to open the next input file.
 4377 At that point in execution, it checks the @value{FN} to see whether
 4378 it is really a variable assignment; if so, @command{awk} sets the variable
 4379 instead of reading a file.
 4380 
 4381 Therefore, the variables actually receive the given values after all
 4382 previously specified files have been read.  In particular, the values of
 4383 variables assigned in this fashion are @emph{not} available inside a
 4384 @code{BEGIN} rule
 4385 (@pxref{BEGIN/END}),
 4386 because such rules are run before @command{awk} begins scanning the argument list.
 4387 
 4388 @cindex dark corner @subentry escape sequences
 4389 The variable values given on the command line are processed for escape
 4390 sequences (@pxref{Escape Sequences}).
 4391 @value{DARKCORNER}
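
A small sketch of this behavior follows.  The variable name @code{sep}
is arbitrary, and @file{/dev/null} serves only as an empty input file so
that the @code{END} rule runs after the assignment has been processed:

```shell
# The two characters \t in the argument become a single TAB in sep.
value=$(awk 'END { print sep }' 'sep=a\tb' /dev/null)
printf '%s\n' "$value"
```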
 4392 
 4393 In some very early implementations of @command{awk}, when a variable assignment
 4394 occurred before any @value{FN}s, the assignment would happen @emph{before}
 4395 the @code{BEGIN} rule was executed.  @command{awk}'s behavior was thus
 4396 inconsistent; some command-line assignments were available inside the
 4397 @code{BEGIN} rule, while others were not.  Unfortunately,
 4398 some applications came to depend
 4399 upon this ``feature.''  When @command{awk} was changed to be more consistent,
 4400 the @option{-v} option was added to accommodate applications that depended
 4401 upon the old behavior.
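
The difference in timing can be seen with a short experiment.  This is
only a sketch; the variable name @code{n} is arbitrary, and
@file{/dev/null} stands in for a real @value{DF}:

```shell
# An assignment given as an argument is applied only when awk reaches
# it in the argument list, so BEGIN sees an empty value and END sees 5.
late=$(awk 'BEGIN { print "BEGIN: [" n "]" }
            END   { print "END: [" n "]" }' n=5 /dev/null)
echo "$late"

# With -v, the assignment happens before the BEGIN rule runs.
early=$(awk -v n=5 'BEGIN { print "BEGIN: [" n "]" }')
echo "$early"
```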
 4402 
 4403 The variable assignment feature is most useful for assigning to variables
 4404 such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and
 4405 output formats, before scanning the @value{DF}s.  It is also useful for
 4406 controlling state if multiple passes are needed over a @value{DF}.  For
 4407 example:
 4408 
 4409 @cindex files @subentry multiple passes over
 4410 @example
 4411 awk 'pass == 1  @{ @var{pass 1 stuff} @}
 4412      pass == 2  @{ @var{pass 2 stuff} @}' pass=1 mydata pass=2 mydata
 4413 @end example
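
As one concrete (and purely illustrative) instance of this pattern, the
following sketch sums a column during the first pass and prints each
value as a percentage of that sum during the second.  The sample data
and the variable name @code{total} are invented for the example:

```shell
# Create a small two-column data file.
data=$(mktemp)
printf '%s\n' 'apples 30' 'pears 10' 'plums 60' > "$data"

# Pass 1 accumulates the total; pass 2 reads the same file again
# and reports each line's share of it.
report=$(awk 'pass == 1 { total += $2 }
              pass == 2 { printf "%s %d%%\n", $1, 100 * $2 / total }' \
    pass=1 "$data" pass=2 "$data")
echo "$report"

rm -f "$data"
```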
 4414 
 4415 Given the variable assignment feature, the @option{-F} option for setting
 4416 the value of @code{FS} is not
 4417 strictly necessary.  It remains for historical compatibility.
 4418 
 4419 @node Naming Standard Input
 4420 @section Naming Standard Input
 4421 
 4422 Often, you may wish to read standard input together with other files.
 4423 For example, you may wish to read one file, read standard input coming
 4424 from a pipe, and then read another file.
 4425 
 4426 The way to name the standard input, with all versions of @command{awk},
 4427 is to use a single, standalone minus sign or dash, @samp{-}.  For example:
 4428 
 4429 @example
 4430 @var{some_command} | awk -f myprog.awk file1 - file2
 4431 @end example
 4432 
 4433 @noindent
 4434 Here, @command{awk} first reads @file{file1}, then it reads
 4435 the output of @var{some_command}, and finally it reads
 4436 @file{file2}.
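
The following sketch makes the reading order visible by numbering the
records as they arrive.  The two files are created in a scratch
directory, and @command{echo} plays the role of @var{some_command}:

```shell
dir=$(mktemp -d)
echo 'from file1' > "$dir/file1"
echo 'from file2' > "$dir/file2"

# The lone "-" marks where standard input is read in the sequence.
merged=$(echo 'from the pipe' |
    awk '{ print NR ": " $0 }' "$dir/file1" - "$dir/file2")
echo "$merged"

rm -rf "$dir"
```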
 4437 
 4438 You may also use @code{"-"} to name standard input when reading
 4439 files with @code{getline} (@pxref{Getline/File}).
 4440 And, you can even use @code{"-"} with the @option{-f} option
 4441 to read program source code from standard input (@pxref{Options}).
 4442 
 4443 In addition, @command{gawk} allows you to specify the special
 4444 @value{FN} @file{/dev/stdin}, both on the command line and
 4445 with @code{getline}.
 4446 Some other versions of @command{awk} also support this, but it
 4447 is not standard.
 4448 (Some operating systems provide a @file{/dev/stdin} file
 4449 in the filesystem; however, @command{gawk} always processes
 4450 this @value{FN} itself.)
 4451 
 4452 @node Environment Variables
 4453 @section The Environment Variables @command{gawk} Uses
 4454 @cindex environment variables @subentry used by @command{gawk}
 4455 
 4456 A number of environment variables influence how @command{gawk}
 4457 behaves.
 4458 
 4459 @menu
 4460 * AWKPATH Variable::            Searching directories for @command{awk}
 4461                                 programs.
 4462 * AWKLIBPATH Variable::         Searching directories for @command{awk} shared
 4463                                 libraries.
 4464 * Other Environment Variables:: Other environment variables.
 4465 @end menu
 4466 
 4467 @node AWKPATH Variable
 4468 @subsection The @env{AWKPATH} Environment Variable
 4469 @cindex @env{AWKPATH} environment variable
 4470 @cindex environment variables @subentry @env{AWKPATH}
 4471 @cindex directories @subentry searching @subentry for source files
 4472 @cindex search paths @subentry for source files
 4473 @cindex differences in @command{awk} and @command{gawk} @subentry @env{AWKPATH} environment variable
 4474 @ifinfo
 4475 The previous @value{SECTION} described how @command{awk} program files can be named
 4476 on the command line with the @option{-f} option.
 4477 @end ifinfo
 4478 In most @command{awk}
 4479 implementations, you must supply a precise pathname for each program
 4480 file, unless the file is in the current directory.
 4481 But with @command{gawk}, if the @value{FN} supplied to the @option{-f}
 4482 or @option{-i} options
 4483 does not contain a directory separator @samp{/}, then @command{gawk} searches a list of
 4484 directories (called the @dfn{search path}) one by one, looking for a
 4485 file with the specified name.
 4486 
 4487 The search path is a string consisting of directory names
 4488 separated by colons.@footnote{Semicolons on MS-Windows.}
 4489 @command{gawk} gets its search path from the
 4490 @env{AWKPATH} environment variable.  If that variable does not exist,
 4491 or if it has an empty value,
 4492 @command{gawk} uses a default path (described shortly).
 4493 
 4494 The search path feature is particularly helpful for building libraries
 4495 of useful @command{awk} functions.  The library files can be placed in a
 4496 standard directory in the default path and then specified on
 4497 the command line with a short @value{FN}.  Otherwise, you would have to
 4498 type the full @value{FN} for each file.
 4499 
 4500 By using the @option{-i} or @option{-f} options, your command-line
 4501 @command{awk} programs can use facilities in @command{awk} library files
 4502 (@pxref{Library Functions}).
 4503 Path searching is not done if @command{gawk} is in compatibility mode.
 4504 This is true for both @option{--traditional} and @option{--posix}.
 4505 @xref{Options}.
 4506 
 4507 If the source code file is not found after the initial search, the path is searched
 4508 again after adding the suffix @samp{.awk} to the @value{FN}.
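
The search, including the retry with the @samp{.awk} suffix, can be
sketched as follows.  The library file @file{greet.awk} and its
@code{greet()} function are hypothetical, and the example assumes that
@command{gawk} is installed under that name; it does nothing otherwise:

```shell
dir=$(mktemp -d)
cat > "$dir/greet.awk" <<'EOF'
function greet(name) { return "Hello, " name }
EOF

if command -v gawk >/dev/null 2>&1; then
    # "greet" contains no slash, so gawk looks in AWKPATH; the first
    # search fails and the second one finds "greet.awk".
    greeting=$(AWKPATH="$dir" gawk -f greet -e 'BEGIN { print greet("world") }')
    echo "$greeting"
fi

rm -rf "$dir"
```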
 4509 
 4510 @command{gawk}'s path search mechanism is similar
 4511 to the shell's.
 4512 (See @uref{https://www.gnu.org/software/bash/manual/,
 4513 @cite{The Bourne-Again SHell manual}}.)
 4514 It treats a null entry in the path as indicating the current
 4515 directory.
 4516 (A null entry is indicated by starting or ending the path with a
 4517 colon or by placing two colons next to each other [@samp{::}].)
 4518 
 4519 @quotation NOTE
 4520 To include the current directory in the path, either place @file{.}
 4521 as an entry in the path or write a null entry in the path.
 4522 
 4523 Different past versions of @command{gawk} would also look explicitly in
 4524 the current directory, either before or after the path search.  As of
 4525 @value{PVERSION} 4.1.2, this no longer happens; if you wish to look
 4526 in the current directory, you must include @file{.} either as a separate
 4527 entry or as a null entry in the search path.
 4528 @end quotation
 4529 
 4530 The default value for @env{AWKPATH} is
 4531 @samp{.:/usr/local/share/awk}.@footnote{Your version of @command{gawk}
 4532 may use a different directory; it
 4533 will depend upon how @command{gawk} was built and installed. The actual
 4534 directory is the value of @code{$(pkgdatadir)} generated when
 4535 @command{gawk} was configured.
 4536 (For more detail, see the @file{INSTALL} file in the source distribution,
 4537 and see @ref{Quick Installation}.
 4538 You probably don't need to worry about this,
 4539 though.)}  Since @file{.} is included at the beginning, @command{gawk}
 4540 searches first in the current directory and then in @file{/usr/local/share/awk}.
 4541 In practice, this means that you will rarely need to change the
 4542 value of @env{AWKPATH}.
 4543 
 4544 @xref{Shell Startup Files}, for information on functions that help to
 4545 manipulate the @env{AWKPATH} variable.
 4546 
 4547 @command{gawk} places the value of the search path that it used into
 4548 @code{ENVIRON["AWKPATH"]}. This provides access to the actual search
 4549 path value from within an @command{awk} program.
 4550 
 4551 Although you can change @code{ENVIRON["AWKPATH"]} within your @command{awk}
 4552 program, this has no effect on the running program's behavior.  This makes
 4553 sense: the @env{AWKPATH} environment variable is used to find the program
 4554 source files.  Once your program is running, all the files have been
 4555 found, and @command{gawk} no longer needs to use @env{AWKPATH}.
 4556 
 4557 @node AWKLIBPATH Variable
 4558 @subsection The @env{AWKLIBPATH} Environment Variable
 4559 @cindex @env{AWKLIBPATH} environment variable
 4560 @cindex environment variables @subentry @env{AWKLIBPATH}
 4561 @cindex directories @subentry searching @subentry for loadable extensions
 4562 @cindex search paths @subentry for loadable extensions
 4563 @cindex differences in @command{awk} and @command{gawk} @subentry @code{AWKLIBPATH} environment variable
 4564 
 4565 The @env{AWKLIBPATH} environment variable is similar to the @env{AWKPATH}
 4566 variable, but it is used to search for loadable extensions (stored as
 4567 system shared libraries) specified with the @option{-l} option rather
 4568 than for source files.  If the extension is not found, the path is
 4569 searched again after adding the appropriate shared library suffix for
 4570 the platform.  For example, on GNU/Linux systems, the suffix @samp{.so}
 4571 is used.  The search path specified is also used for extensions loaded
 4572 via the @code{@@load} keyword (@pxref{Loading Shared Libraries}).
 4573 
 4574 If @env{AWKLIBPATH} does not exist in the environment, or if it has
 4575 an empty value, @command{gawk} uses a default path; this
 4576 is typically @samp{/usr/local/lib/gawk}, although it can vary depending
 4577 upon how @command{gawk} was built.@footnote{Your version of @command{gawk}
 4578 may use a different directory; it
 4579 will depend upon how @command{gawk} was built and installed. The actual
 4580 directory is the value of @code{$(pkgextensiondir)} generated when
 4581 @command{gawk} was configured.
 4582 (For more detail, see the @file{INSTALL} file in the source distribution,
 4583 and see @ref{Quick Installation}.
 4584 You probably don't need to worry about this,
 4585 though.)}
 4586 
 4587 @xref{Shell Startup Files}, for information on functions that help to
 4588 manipulate the @env{AWKLIBPATH} variable.
 4589 
 4590 @command{gawk} places the value of the search path that it used into
 4591 @code{ENVIRON["AWKLIBPATH"]}. This provides access to the actual search
 4592 path value from within an @command{awk} program.
 4593 
 4594 Although you can change @code{ENVIRON["AWKLIBPATH"]} within your
 4595 @command{awk} program, this has no effect on the running program's
 4596 behavior.  This makes sense: the @env{AWKLIBPATH} environment variable
 4597 is used to find any requested extensions, and they are loaded before
 4598 the program starts to run.  Once your program is running, all the
 4599 extensions have been found, and @command{gawk} no longer needs to use
 4600 @env{AWKLIBPATH}.
 4601 
 4602 @node Other Environment Variables
 4603 @subsection Other Environment Variables
 4604 
 4605 A number of other environment variables affect @command{gawk}'s
 4606 behavior, but they are more specialized. Those in the following
 4607 list are meant to be used by regular users:
 4608 
 4609 @table @env
 4610 @item GAWK_MSEC_SLEEP
 4611 Specifies the interval between connection retries,
 4612 in milliseconds. On systems that do not support
 4613 the @code{usleep()} system call,
 4614 the value is rounded up to an integral number of seconds.
 4615 
 4616 @item GAWK_READ_TIMEOUT
 4617 Specifies the time, in milliseconds, for @command{gawk} to
 4618 wait for input before returning with an error.
 4619 @xref{Read Timeout}.
 4620 
 4621 @item GAWK_SOCK_RETRIES
 4622 Controls the number of times @command{gawk} attempts to
 4623 retry a two-way TCP/IP (socket) connection before giving up.
 4624 @xref{TCP/IP Networking}.
 4625 Note that when nonfatal I/O is enabled (@pxref{Nonfatal}),
 4626 @command{gawk} only tries to open a TCP/IP socket once.
 4627 
 4628 @item POSIXLY_CORRECT
 4629 Causes @command{gawk} to switch to POSIX-compatibility
 4630 mode, disabling all traditional and GNU extensions.
 4631 @xref{Options}.
 4632 @end table
 4633 
 4634 The environment variables in the following list are meant
 4635 for use by the @command{gawk} developers for testing and tuning.
 4636 They are subject to change. The variables are:
 4637 
 4638 @table @env
 4639 @item AWKBUFSIZE
 4640 This variable only affects @command{gawk} on POSIX-compliant systems.
 4641 With a value of @samp{exact}, @command{gawk} uses the size of each input
 4642 file as the size of the memory buffer to allocate for I/O. Otherwise,
 4643 the value should be a number, and @command{gawk} uses that number as
 4644 the size of the buffer to allocate.  (When this variable is not set,
 4645 @command{gawk} uses the smaller of the file's size and the ``default''
 4646 blocksize, which is usually the filesystem's I/O blocksize.)
 4647 
 4648 @item AWK_HASH
 4649 If this variable exists with a value of @samp{gst}, @command{gawk}
 4650 switches to using the hash function from GNU Smalltalk for
 4651 managing arrays.
 4652 This function may be marginally faster than the standard function.
 4653 
 4654 @item AWKREADFUNC
 4655 If this variable exists, @command{gawk} switches to reading source
 4656 files one line at a time, instead of reading in blocks. This exists
 4657 for debugging problems on filesystems on non-POSIX operating systems
 4658 where I/O is performed in records, not in blocks.
 4659 
 4660 @item GAWK_MSG_SRC
 4661 If this variable exists, @command{gawk} includes the @value{FN}
 4662 and line number within the @command{gawk} source code
 4663 from which warning and/or fatal messages
 4664 are generated.  Its purpose is to help isolate the source of a
 4665 message, as there are multiple places that produce the
 4666 same warning or error message.
 4667 
 4668 @item GAWK_LOCALE_DIR
 4669 Specifies the location of compiled message object files
 4670 for @command{gawk} itself. This is passed to the @code{bindtextdomain()}
 4671 function when @command{gawk} starts up.
 4672 
 4673 @item GAWK_NO_DFA
 4674 If this variable exists, @command{gawk} does not use the DFA regexp matcher
 4675 for ``does it match'' kinds of tests. This can cause @command{gawk}
 4676 to be slower. Its purpose is to help isolate differences between the
 4677 two regexp matchers that @command{gawk} uses internally. (There aren't
 4678 supposed to be differences, but occasionally theory and practice don't
 4679 coordinate with each other.)
 4680 
 4681 @item GAWK_STACKSIZE
 4682 This specifies the amount by which @command{gawk} should grow its
 4683 internal evaluation stack, when needed.
 4684 
 4685 @item INT_CHAIN_MAX
 4686 This specifies the intended maximum number of items @command{gawk} will maintain on a
 4687 hash chain for managing arrays indexed by integers.
 4688 
 4689 @item STR_CHAIN_MAX
 4690 This specifies the intended maximum number of items @command{gawk} will maintain on a
 4691 hash chain for managing arrays indexed by strings.
 4692 
 4693 @item TIDYMEM
 4694 If this variable exists, @command{gawk} uses the @code{mtrace()} library
 4695 calls from the GNU C library to help track down possible memory leaks.
 4696 @end table
 4697 
 4698 @node Exit Status
 4699 @section @command{gawk}'s Exit Status
 4700 
 4701 @cindex exit status, of @command{gawk}
 4702 If the @code{exit} statement is used with a value
 4703 (@pxref{Exit Statement}), then @command{gawk} exits with
 4704 the numeric value given to it.
 4705 
 4706 Otherwise, if there were no problems during execution,
 4707 @command{gawk} exits with the value of the C constant
 4708 @code{EXIT_SUCCESS}.  This is usually zero.
 4709 
 4710 If an error occurs, @command{gawk} exits with the value of
 4711 the C constant @code{EXIT_FAILURE}.  This is usually one.
 4712 
 4713 If @command{gawk} exits because of a fatal error, the exit
 4714 status is two.  On non-POSIX systems, this value may be mapped
 4715 to @code{EXIT_FAILURE}.
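
From the shell, the first two cases can be observed through the
@code{$?} parameter.  The following is a minimal sketch; the value 3 is
an arbitrary choice:

```shell
# An explicit exit value is passed back to the shell unchanged.
explicit=0
awk 'BEGIN { exit 3 }' || explicit=$?

# A run without problems yields a zero (successful) status.
awk 'BEGIN { print "ok" }' > /dev/null
clean=$?

echo "explicit=$explicit clean=$clean"
```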
 4716 
 4717 @node Include Files
 4718 @section Including Other Files into Your Program
 4719 
 4720 @c Panos Papadopoulos <panos1962@gmail.com> contributed the original
 4721 @c text for this section.
 4722 
 4723 This @value{SECTION} describes a feature that is specific to @command{gawk}.
 4724 
 4725 @cindex @code{@@} (at-sign) @subentry @code{@@include} directive
 4726 @cindex at-sign (@code{@@}) @subentry @code{@@include} directive
 4727 @cindex file inclusion, @code{@@include} directive
 4728 @cindex including files, @code{@@include} directive
 4729 @cindex @code{@@include} directive @sortas{include directive}
 4730 The @code{@@include} keyword can be used to read external @command{awk} source
 4731 files.  This gives you the ability to split large @command{awk} source files
 4732 into smaller, more manageable pieces, and also lets you reuse common @command{awk}
 4733 code from various @command{awk} scripts.  In other words, you can group
 4734 together @command{awk} functions used to carry out specific tasks
 4735 into external files. These files can be used just like function libraries,
 4736 using the @code{@@include} keyword in conjunction with the @env{AWKPATH}
 4737 environment variable.  Note that source files may also be included
 4738 using the @option{-i} option.
 4739 
 4740 Let's see an example.
 4741 We'll start with two (trivial) @command{awk} scripts, namely
 4742 @file{test1} and @file{test2}. Here is the @file{test1} script:
 4743 
 4744 @example
 4745 BEGIN @{
 4746     print "This is script test1."
 4747 @}
 4748 @end example
 4749 
 4750 @noindent
 4751 and here is @file{test2}:
 4752 
 4753 @example
 4754 @@include "test1"
 4755 BEGIN @{
 4756     print "This is script test2."
 4757 @}
 4758 @end example
 4759 
 4760 Running @command{gawk} with @file{test2}
 4761 produces the following result:
 4762 
 4763 @example
 4764 $ @kbd{gawk -f test2}
 4765 @print{} This is script test1.
 4766 @print{} This is script test2.
 4767 @end example
 4768 
 4769 @command{gawk} runs the @file{test2} script, which includes @file{test1}
 4770 using the @code{@@include}
 4771 keyword.  So, to include external @command{awk} source files, you just
 4772 use @code{@@include} followed by the name of the file to be included,
 4773 enclosed in double quotes.
 4774 
 4775 @quotation NOTE
 4776 Keep in mind that this is a language construct and the @value{FN} cannot
 4777 be a string variable, but rather just a literal string constant in double quotes.
 4778 @end quotation
 4779 
 4780 The files to be included may be nested; e.g., given a third
 4781 script, namely @file{test3}:
 4782 
 4783 @example
 4784 @group
 4785 @@include "test2"
 4786 BEGIN @{
 4787     print "This is script test3."
 4788 @}
 4789 @end group
 4790 @end example
 4791 
 4792 @noindent
 4793 Running @command{gawk} with the @file{test3} script produces the
 4794 following results:
 4795 
 4796 @example
 4797 $ @kbd{gawk -f test3}
 4798 @print{} This is script test1.
 4799 @print{} This is script test2.
 4800 @print{} This is script test3.
 4801 @end example
 4802 
 4803 The @value{FN} can, of course, be a pathname. For example:
 4804 
 4805 @example
 4806 @@include "../io_funcs"
 4807 @end example
 4808 
 4809 @noindent
 4810 and:
 4811 
 4812 @example
 4813 @@include "/usr/awklib/network"
 4814 @end example
 4815 
 4816 @noindent
 4817 are both valid. The @env{AWKPATH} environment variable can be of great
 4818 value when using @code{@@include}. The same rules for the use
 4819 of the @env{AWKPATH} variable in command-line file searches
 4820 (@pxref{AWKPATH Variable}) apply to
 4821 @code{@@include} also.
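
The following sketch combines the two.  The file names
@file{mathlib.awk} and @file{prog.awk} and the function @code{square()}
are invented for the example, which assumes that @command{gawk} is
installed under that name and does nothing otherwise:

```shell
dir=$(mktemp -d)

# A hypothetical library file kept in its own directory.
cat > "$dir/mathlib.awk" <<'EOF'
function square(x) { return x * x }
EOF

# The main program names the library without a directory or suffix.
cat > "$dir/prog.awk" <<'EOF'
@include "mathlib"
BEGIN { print square(7) }
EOF

if command -v gawk >/dev/null 2>&1; then
    # AWKPATH tells gawk where to look for the included file.
    squared=$(AWKPATH="$dir" gawk -f "$dir/prog.awk")
    echo "$squared"
fi

rm -rf "$dir"
```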
 4822 
 4823 This is very helpful in constructing @command{gawk} function libraries.
 4824 If you have a large script with useful, general-purpose @command{awk}
 4825 functions, you can break it down into library files and put those files
 4826 in a special directory.  You can then include those ``libraries,''
 4827 either by using the full pathnames of the files, or by setting the @env{AWKPATH}
 4828 environment variable accordingly and then using @code{@@include} with
 4829 just the file part of the full pathname. Of course,
 4830 you can keep library files in more than one directory;
 4831 the more complex the working
 4832 environment is, the more directories you may need to organize the files
 4833 to be included.
 4834 
 4835 Given the ability to specify multiple @option{-f} options, the
 4836 @code{@@include} mechanism is not strictly necessary.
 4837 However, the @code{@@include} keyword
 4838 can help you in constructing self-contained @command{gawk} programs,
 4839 thus reducing the need for writing complex and tedious command lines.
 4840 In particular, @code{@@include} is very useful for writing CGI scripts
 4841 to be run from web pages.
 4842 
 4843 The rules for finding a source file described in @ref{AWKPATH Variable} also
 4844 apply to files loaded with @code{@@include}.
 4845 
 4846 Finally, files included with @code{@@include}
 4847 are treated as if they had @samp{@@namespace "awk"}
 4848 at their beginning.  @xref{Changing The Namespace}, for more information.
 4849 
 4850 @node Loading Shared Libraries
 4851 @section Loading Dynamic Extensions into Your Program
 4852 
 4853 This @value{SECTION} describes a feature that is specific to @command{gawk}.
 4854 
 4855 @cindex @code{@@} (at-sign) @subentry @code{@@load} directive
 4856 @cindex at-sign (@code{@@}) @subentry @code{@@load} directive
 4857 @cindex loading extensions @subentry @code{@@load} directive
 4858 @cindex extensions @subentry loadable @subentry loading, @code{@@load} directive
 4859 @cindex @code{@@load} directive @sortas{load directive}
 4860 The @code{@@load} keyword can be used to read external @command{awk} extensions
 4861 (stored as system shared libraries).
 4862 This allows you to link in compiled code that may offer superior
 4863 performance and/or give you access to extended capabilities not supported
 4864 by the @command{awk} language.  The @env{AWKLIBPATH} variable is used to
 4865 search for the extension.  Using @code{@@load} is completely equivalent
 4866 to using the @option{-l} command-line option.
 4867 
 4868 If the extension is not initially found in @env{AWKLIBPATH}, another
 4869 search is conducted after appending the platform's default shared library
 4870 suffix to the @value{FN}.  For example, on GNU/Linux systems, the suffix
 4871 @samp{.so} is used:
 4872 
 4873 @example
 4874 $ @kbd{gawk '@@load "ordchr"; BEGIN @{print chr(65)@}'}
 4875 @print{} A
 4876 @end example
 4877 
 4878 @noindent
 4879 This is equivalent to the following example:
 4880 
 4881 @example
 4882 @group
 4883 $ @kbd{gawk -lordchr 'BEGIN @{print chr(65)@}'}
 4884 @print{} A
 4885 @end group
 4886 @end example
 4887 
 4888 @noindent
 4889 For command-line usage, the @option{-l} option is more convenient,
 4890 but @code{@@load} is useful for embedding inside an @command{awk} source file
 4891 that requires access to an extension.
 4892 
 4893 @ref{Dynamic Extensions}, describes how to write extensions (in C or C++)
 4894 that can be loaded with either @code{@@load} or the @option{-l} option.
 4895 It also describes the @code{ordchr} extension.
 4896 
 4897 @node Obsolete
 4898 @section Obsolete Options and/or Features
 4899 
 4900 @c update this section for each release!
 4901 
 4902 @cindex options @subentry deprecated
 4903 @cindex features @subentry deprecated
 4904 @cindex obsolete features
 4905 This @value{SECTION} describes features and/or command-line options from
 4906 previous releases of @command{gawk} that either are not available in the
 4907 current version or are still supported but deprecated (meaning that
 4908 they will @emph{not} be in the next release).
 4909 
 4910 The process-related special files @file{/dev/pid}, @file{/dev/ppid},
 4911 @file{/dev/pgrpid}, and @file{/dev/user} were deprecated in @command{gawk}
 4912 3.1, but still worked.  As of @value{PVERSION} 4.0, they are no longer
 4913 interpreted specially by @command{gawk}.  (Use @code{PROCINFO} instead;
 4914 see @ref{Auto-set}.)
 4915 
 4916 @ignore
 4917 This @value{SECTION}
 4918 is thus essentially a place holder,
 4919 in case some option becomes obsolete in a future version of @command{gawk}.
 4920 @end ignore
 4921 
 4922 @node Undocumented
 4923 @section Undocumented Options and Features
 4924 @cindex undocumented features
 4925 @cindex features @subentry undocumented
 4926 @cindex Skywalker, Luke
 4927 @cindex Kenobi, Obi-Wan
 4928 @cindex jedi knights
 4929 @cindex knights, jedi
 4930 @quotation
 4931 @i{Use the Source, Luke!}
 4932 @author Obi-Wan
 4933 @end quotation
 4934 
 4935 @cindex shells @subentry sea
 4936 This @value{SECTION} intentionally left
 4937 blank.
 4938 
 4939 @ignore
 4940 @c If these came out in the Info file or TeX document, then they wouldn't
 4941 @c be undocumented, would they?
 4942 
 4943 @command{gawk} has one undocumented option:
 4944 
 4945 @table @code
 4946 @item -W nostalgia
 4947 @itemx --nostalgia
 4948 Print the message @samp{awk: bailing out near line 1} and dump core.
 4949 This option was inspired by the common behavior of very early versions of
 4950 Unix @command{awk} and by a T-shirt.
 4951 The message is @emph{not} subject to translation in non-English locales.
 4952 @c so there! nyah, nyah.
 4953 @end table
 4954 
 4955 Early versions of @command{awk} used to not require any separator (either
 4956 a newline or @samp{;}) between the rules in @command{awk} programs.  Thus,
 4957 it was common to see one-line programs like:
 4958 
 4959 @example
 4960 awk '@{ sum += $1 @} END @{ print sum @}'
 4961 @end example
 4962 
 4963 @command{gawk} actually supports this but it is purposely undocumented
 4964 because it is bad style.  The correct way to write such a program
 4965 is either:
 4966 
 4967 @example
 4968 awk '@{ sum += $1 @} ; END @{ print sum @}'
 4969 @end example
 4970 
 4971 @noindent
 4972 or:
 4973 
 4974 @example
 4975 awk '@{ sum += $1 @}
 4976      END @{ print sum @}' data
 4977 @end example
 4978 
 4979 @noindent
 4980 @xref{Statements/Lines}, for a fuller explanation.
 4981 
 4982 You can insert newlines after the @samp{;} in @code{for} loops.
 4983 This seems to have been a long-undocumented feature in Unix @command{awk}.
 4984 
 4985 Similarly, you may use @code{print} or @code{printf} statements in the
 4986 @var{init} and @var{increment} parts of a @code{for} loop.  This is another
 4987 long-undocumented ``feature'' of Unix @command{awk}.
 4988 
 4989 @command{gawk} lets you use the names of built-in functions that are
 4990 @command{gawk} extensions as the names of parameters in user-defined functions.
 4991 This is intended to ``future-proof'' old code that happens to use
 4992 function names added by @command{gawk} after the code was written.
 4993 Standard @command{awk} built-in functions, such as @code{sin()} or
 4994 @code{substr()} are @emph{not} shadowed in this way.
 4995 
 4996 You can use a @samp{P} modifier for the @code{printf()} floating-point
 4997 format control letters to use the underlying C library's result for
 4998 NaN and Infinity values, instead of the special values @command{gawk}
 4999 usually produces, as described in @ref{POSIX Floating Point Problems}.
 5000 This is mainly useful for the included unit tests.
 5001 
 5002 The @code{typeof()} built-in function
 5003 (@pxref{Type Functions})
 5004 takes an optional second array argument that, if present, will be cleared
 5005 and populated with some information about the internal implementation of
 5006 the variable. This can be useful for debugging. At the moment, this
 5007 returns a textual version of the flags for scalar variables, and the
 5008 array back-end implementation type for arrays. This interface is subject
 5009 to change and may not be stable.
 5010 
 5011 When not in POSIX or compatibility mode, if you set @code{LINENO} to a
 5012 numeric value using the @option{-v} option, @command{gawk} adds that value
 5013 to the real line number for use in error messages.  This is intended for
 5014 use within Bash shell scripts, such that the error message will reflect
 5015 the line number in the shell script, instead of in the @command{awk}
 5016 program. To demonstrate:
 5017 
 5018 @example
 5019 $ @kbd{gawk -v LINENO=10 'BEGIN @{ print("hi" @}'}
 5020 @error{} gawk: cmd. line:11: BEGIN @{ print("hi" @}
 5021 @error{} gawk: cmd. line:11:                    ^ syntax error
 5022 @end example
 5023 
 5024 @end ignore
 5025 
 5026 @node Invoking Summary
 5027 @section Summary
 5028 
 5029 @itemize @value{BULLET}
 5030 
 5031 @c From Neil R. Ormos
 5032 @item
 5033 @command{gawk} parses arguments on the command line, left to right, to
 5034 determine if they should be treated as options or as non-option arguments.
 5035 
 5036 @item
 5037 @command{gawk} recognizes several options which control its operation,
 5038 as described in @ref{Options}.  All options begin with @samp{-}.
 5039 
 5040 @item
 5041 Any argument that is not recognized as an option is treated as a
 5042 non-option argument, even if it begins with @samp{-}.
 5043 
 5044 @itemize @value{MINUS}
 5045 @item
 5046 However, when an option itself requires an argument, and the option is separated
 5047 from that argument on the command line by at least one space, the space 
 5048 is ignored, and the argument is considered to be related to the option.  Thus, in
 5049 the invocation, @samp{gawk -F x}, the @samp{x} is treated as belonging to the
 5050 @option{-F} option, not as a separate non-option argument.
 5051 @end itemize
 5052 
 5053 @item
 5054 Once @command{gawk} finds a non-option argument, it stops looking for
 5055 options. Therefore, all following arguments are also non-option arguments,
 5056 even if they resemble recognized options.
 5057 
 5058 @item
 5059 If no @option{-e} or @option{-f} options are present, @command{gawk}
 5060 expects the program text to be in the first non-option argument.
 5061 
 5062 @item
 5063 All non-option arguments, except program text provided in the first
 5064 non-option argument, are placed in @code{ARGV} as explained in
 5065 @ref{ARGC and ARGV}, and are processed as described in @ref{Other Arguments}.
 5066 @c And I wrote:
 5067 Adjusting @code{ARGC} and @code{ARGV}
 5068 affects how @command{awk} processes input.
 5069 
 5070 @c ----------------------------------------
 5071 
 5072 @item
 5073 The three standard options for all versions of @command{awk} are
 5074 @option{-f}, @option{-F}, and @option{-v}.  @command{gawk} supplies these
 5075 and many others, as well as corresponding GNU-style long options.
 5076 
 5077 @item
 5078 Non-option command-line arguments are usually treated as @value{FN}s,
 5079 unless they have the form @samp{@var{var}=@var{value}}, in which case
 5080 they are taken as variable assignments to be performed at that point
 5081 in processing the input.
 5082 
 5083 @item
 5084 You can use a single minus sign (@samp{-}) to refer to standard input
 5085 on the command line. @command{gawk} also lets you use the special
 5086 @value{FN} @file{/dev/stdin}.
 5087 
 5088 @item
 5089 @command{gawk} pays attention to a number of environment variables.
 5090 @env{AWKPATH}, @env{AWKLIBPATH}, and @env{POSIXLY_CORRECT} are the
 5091 most important ones.
 5092 
 5093 @item
 5094 @command{gawk}'s exit status conveys information to the program
 5095 that invoked it. Use the @code{exit} statement from within
 5096 an @command{awk} program to set the exit status.
 5097 
 5098 @item
 5099 @command{gawk} allows you to include other @command{awk} source files into
 5100 your program using the @code{@@include} statement and/or the @option{-i}
 5101 and @option{-f} command-line options.
 5102 
 5103 @item
 5104 @command{gawk} allows you to load additional functions written in C
 5105 or C++ using the @code{@@load} statement and/or the @option{-l} option.
 5106 (This advanced feature is described later, in @ref{Dynamic Extensions}.)
 5107 @end itemize
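
The points about variable assignments and the @samp{-} argument can be
tried out with a short sketch.  It assumes a POSIX-compatible
@command{awk} installed under that name; @code{demo_file} is a
hypothetical scratch file created only for this illustration:

```shell
# Sketch; assumes a POSIX-compatible awk is installed as "awk".
# demo_file is a hypothetical scratch file used only for this illustration.
demo_file=$(mktemp)
printf 'one\ntwo\n' > "$demo_file"

# n=1 takes effect before the first read of the file, n=2 before the second.
awk '{ print n, $0 }' n=1 "$demo_file" n=2 "$demo_file"

# A lone "-" stands for standard input.
printf 'x\n' | awk '{ print n, $0 }' n=3 -

rm -f "$demo_file"
```

The first command prints @samp{1 one}, @samp{1 two}, @samp{2 one}, and
@samp{2 two}, because each assignment is performed when @command{awk}
reaches it in the argument list.  The second command prints @samp{3 x}.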
 5108 
 5109 @node Regexp
 5110 @chapter Regular Expressions
 5111 @cindex regexp
 5112 @cindex regular expressions
 5113 
 5114 A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a
 5115 set of strings.
 5116 Because regular expressions are such a fundamental part of @command{awk}
 5117 programming, their format and use deserve a separate @value{CHAPTER}.
 5118 
 5119 @cindex forward slash (@code{/}) @subentry to enclose regular expressions
 5120 @cindex @code{/} (forward slash) @subentry to enclose regular expressions
 5121 A regular expression enclosed in slashes (@samp{/})
 5122 is an @command{awk} pattern that matches every input record whose text
 5123 belongs to that set.
 5124 The simplest regular expression is a sequence of letters, numbers, or
 5125 both.  Such a regexp matches any string that contains that sequence.
 5126 Thus, the regexp @samp{foo} matches any string containing @samp{foo}.
 5127 Consequently, the pattern @code{/foo/} matches any input record containing
 5128 the three adjacent characters @samp{foo} @emph{anywhere} in the record.  Other
 5129 kinds of regexps let you specify more complicated classes of strings.
 5130 
 5131 @ifnotinfo
 5132 Initially, the examples in this @value{CHAPTER} are simple.
 5133 As we explain more about how
 5134 regular expressions work, we present more complicated instances.
 5135 @end ifnotinfo
 5136 
 5137 @menu
 5138 * Regexp Usage::                How to Use Regular Expressions.
 5139 * Escape Sequences::            How to write nonprinting characters.
 5140 * Regexp Operators::            Regular Expression Operators.
 5141 * Bracket Expressions::         What can go between @samp{[...]}.
 5142 * Leftmost Longest::            How much text matches.
 5143 * Computed Regexps::            Using Dynamic Regexps.
 5144 * GNU Regexp Operators::        Operators specific to GNU software.
 5145 * Case-sensitivity::            How to do case-insensitive matching.
 5146 * Regexp Summary::              Regular expressions summary.
 5147 @end menu
 5148 
 5149 @node Regexp Usage
 5150 @section How to Use Regular Expressions
 5151 
 5152 @cindex patterns @subentry regexp constants as
 5153 @cindex regular expressions @subentry as patterns
 5154 A regular expression can be used as a pattern by enclosing it in
 5155 slashes.  Then the regular expression is tested against the
 5156 entire text of each record.  (Normally, it only needs
 5157 to match some part of the text in order to succeed.)  For example, the
 5158 following prints the second field of each record where the string
 5159 @samp{li} appears anywhere in the record:
 5160 
 5161 @example
 5162 $ @kbd{awk '/li/ @{ print $2 @}' mail-list}
 5163 @print{} 555-5553
 5164 @print{} 555-0542
 5165 @print{} 555-6699
 5166 @print{} 555-3430
 5167 @end example
 5168 
 5169 @cindex regular expressions @subentry operators
 5170 @cindex operators @subentry string-matching
 5171 @c @cindex operators, @code{~}
 5172 @cindex string-matching operators
 5173 @cindex @code{~} (tilde), @code{~} operator
 5174 @cindex tilde (@code{~}), @code{~} operator
 5175 @cindex @code{!} (exclamation point) @subentry @code{!~} operator
 5176 @cindex exclamation point (@code{!}) @subentry @code{!~} operator
 5177 @c @cindex operators, @code{!~}
 5178 @cindex @code{if} statement @subentry use of regexps in
 5179 @cindex @code{while} statement @subentry use of regexps in
 5180 @cindex @code{do}-@code{while} statement @subentry use of regexps in
 5181 @c @cindex statements, @code{if}
 5182 @c @cindex statements, @code{while}
 5183 @c @cindex statements, @code{do}
 5184 Regular expressions can also be used in matching expressions.  These
 5185 expressions allow you to specify the string to match against; it need
 5186 not be the entire current input record.  The two operators @samp{~}
 5187 and @samp{!~} perform regular expression comparisons.  Expressions
 5188 using these operators can be used as patterns, or in @code{if},
 5189 @code{while}, @code{for}, and @code{do} statements.
 5190 (@xref{Statements}.)
 5191 For example, the following is true if the expression @var{exp} (taken
 5192 as a string) matches @var{regexp}:
 5193 
 5194 @example
 5195 @var{exp} ~ /@var{regexp}/
 5196 @end example
 5197 
 5198 @noindent
 5199 This example matches, or selects, all input records with the uppercase
 5200 letter @samp{J} somewhere in the first field:
 5201 
 5202 @example
 5203 $ @kbd{awk '$1 ~ /J/' inventory-shipped}
 5204 @print{} Jan  13  25  15 115
 5205 @print{} Jun  31  42  75 492
 5206 @print{} Jul  24  34  67 436
 5207 @print{} Jan  21  36  64 620
 5208 @end example
 5209 
 5210 So does this:
 5211 
 5212 @example
 5213 awk '@{ if ($1 ~ /J/) print @}' inventory-shipped
 5214 @end example
 5215 
 5216 This next example is true if the expression @var{exp}
 5217 (taken as a character string)
 5218 does @emph{not} match @var{regexp}:
 5219 
 5220 @example
 5221 @var{exp} !~ /@var{regexp}/
 5222 @end example
 5223 
 5224 The following example matches,
 5225 or selects, all input records whose first field @emph{does not} contain
 5226 the uppercase letter @samp{J}:
 5227 
 5228 @example
 5229 $ @kbd{awk '$1 !~ /J/' inventory-shipped}
 5230 @print{} Feb  15  32  24 226
 5231 @print{} Mar  15  24  34 228
 5232 @print{} Apr  31  52  63 420
 5233 @print{} May  16  34  29 208
 5234 @dots{}
 5235 @end example
 5236 
 5237 @cindex regexp constants
 5238 @cindex constants @subentry regexp
 5239 @cindex regular expressions, constants @seeentry{regexp constants}
 5240 When a regexp is enclosed in slashes, such as @code{/foo/}, we call it
 5241 a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and
 5242 @code{"foo"} is a string constant.
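
If the sample @value{DF}s are not at hand, the two matching operators can
be tried on inline data.  The following is a minimal sketch that assumes
a POSIX-compatible @command{awk}; the three input lines are made up for
the illustration and are not taken from @file{inventory-shipped}:

```shell
# Sketch; assumes a POSIX-compatible awk is installed as "awk".
# The input lines are invented sample data.
printf 'Jan 13\nFeb 15\nJun 31\n' |
awk '$1 ~ /J/  { print "J:", $0 }      # first field contains an uppercase J
     $1 !~ /J/ { print "no J:", $0 }   # first field does not'
```

Each record is tested by both rules, so the output is @samp{J: Jan 13},
@samp{no J: Feb 15}, and @samp{J: Jun 31}, in that order.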
 5243 
 5244 @node Escape Sequences
 5245 @section Escape Sequences
 5246 
 5247 @cindex escape sequences
 5248 @cindex escape sequences @seealso{backslash}
 5249 @cindex backslash (@code{\}) @subentry in escape sequences
 5250 @cindex @code{\} (backslash) @subentry in escape sequences
 5251 Some characters cannot be included literally in string constants
 5252 (@code{"foo"}) or regexp constants (@code{/foo/}).
 5253 Instead, they should be represented with @dfn{escape sequences},
 5254 which are character sequences beginning with a backslash (@samp{\}).
 5255 One use of an escape sequence is to include a double-quote character in
 5256 a string constant.  Because a plain double quote ends the string, you
 5257 must use @samp{\"} to represent an actual double-quote character as a
 5258 part of the string.  For example:
 5259 
 5260 @example
 5261 $ @kbd{awk 'BEGIN @{ print "He said \"hi!\" to her." @}'}
 5262 @print{} He said "hi!" to her.
 5263 @end example
 5264 
 5265 The backslash character itself is another character that cannot be
 5266 included normally; you must write @samp{\\} to put one backslash in the
 5267 string or regexp.  Thus, the string whose contents are the two characters
 5268 @samp{"} and @samp{\} must be written @code{"\"\\"}.
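
A quick way to check this is to print the string together with its length.
This is only a sketch and assumes a POSIX-compatible @command{awk}; the
variable name @code{s} is arbitrary:

```shell
# Sketch; assumes a POSIX-compatible awk is installed as "awk".
# The constant "\"\\" holds two characters: a double quote and a backslash.
awk 'BEGIN { s = "\"\\"; print s, length(s) }'
```

The command prints the two characters followed by the length, @samp{2}.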
 5269 
 5270 Other escape sequences represent unprintable characters
 5271 such as TAB or newline.  There is nothing to stop you from entering most
 5272 unprintable characters directly in a string constant or regexp constant,
 5273 but they may look ugly.
 5274 
 5275 The following list presents
 5276 all the escape sequences used in @command{awk} and
 5277 what they represent. Unless noted otherwise, all these escape
 5278 sequences apply to both string constants and regexp constants:
 5279 
 5280 @cindex ASCII
 5281 @table @code
 5282 @item \\
 5283 A literal backslash, @samp{\}.
 5284 
 5285 @c @cindex @command{awk} language, V.4 version
 5286 @cindex @code{\} (backslash) @subentry @code{\a} escape sequence
 5287 @cindex backslash (@code{\}) @subentry @code{\a} escape sequence
 5288 @item \a
 5289 The ``alert'' character, @kbd{Ctrl-g}, ASCII code 7 (BEL).
 5290 (This often makes some sort of audible noise.)
 5291 
 5292 @cindex @code{\} (backslash) @subentry @code{\b} escape sequence
 5293 @cindex backslash (@code{\}) @subentry @code{\b} escape sequence
 5294 @item \b
 5295 Backspace, @kbd{Ctrl-h}, ASCII code 8 (BS).
 5296 
 5297 @cindex @code{\} (backslash) @subentry @code{\f} escape sequence
 5298 @cindex backslash (@code{\}) @subentry @code{\f} escape sequence
 5299 @item \f
 5300 Formfeed, @kbd{Ctrl-l}, ASCII code 12 (FF).
 5301 
 5302 @cindex @code{\} (backslash) @subentry @code{\n} escape sequence
 5303 @cindex backslash (@code{\}) @subentry @code{\n} escape sequence
 5304 @item \n
 5305 Newline, @kbd{Ctrl-j}, ASCII code 10 (LF).
 5306 
 5307 @cindex @code{\} (backslash) @subentry @code{\r} escape sequence
 5308 @cindex backslash (@code{\}) @subentry @code{\r} escape sequence
 5309 @item \r
 5310 Carriage return, @kbd{Ctrl-m}, ASCII code 13 (CR).
 5311 
 5312 @cindex @code{\} (backslash) @subentry @code{\t} escape sequence
 5313 @cindex backslash (@code{\}) @subentry @code{\t} escape sequence
 5314 @item \t
 5315 Horizontal TAB, @kbd{Ctrl-i}, ASCII code 9 (HT).
 5316 
 5317 @c @cindex @command{awk} language, V.4 version
 5318 @cindex @code{\} (backslash) @subentry @code{\v} escape sequence
 5319 @cindex backslash (@code{\}) @subentry @code{\v} escape sequence
 5320 @item \v
 5321 Vertical TAB, @kbd{Ctrl-k}, ASCII code 11 (VT).
 5322 
 5323 @cindex @code{\} (backslash) @subentry @code{\}@var{nnn} escape sequence
 5324 @cindex backslash (@code{\}) @subentry @code{\}@var{nnn} escape sequence
 5325 @item \@var{nnn}
 5326 The octal value @var{nnn}, where @var{nnn} stands for 1 to 3 digits
 5327 between @samp{0} and @samp{7}.  For example, the code for the ASCII ESC
 5328 (escape) character is @samp{\033}.
 5329 
 5330 @c @cindex @command{awk} language, V.4 version
 5331 @c @cindex @command{awk} language, POSIX version
 5332 @cindex @code{\} (backslash) @subentry @code{\x} escape sequence
 5333 @cindex backslash (@code{\}) @subentry @code{\x} escape sequence
 5334 @cindex common extensions @subentry @code{\x} escape sequence
 5335 @cindex extensions @subentry common @subentry @code{\x} escape sequence
 5336 @item \x@var{hh}@dots{}
 5337 The hexadecimal value @var{hh}, where @var{hh} stands for a sequence
 5338 of hexadecimal digits (@samp{0}--@samp{9}, and either @samp{A}--@samp{F}
 5339 or @samp{a}--@samp{f}).  A maximum of two digits is allowed after
 5340 the @samp{\x}. Any further hexadecimal digits are treated as simple
 5341 letters or numbers.  @value{COMMONEXT}
 5342 (The @samp{\x} escape sequence is not allowed in POSIX @command{awk}.)
 5343 
 5344 @quotation CAUTION
 5345 In ISO C, the escape sequence continues until the first non-hexadecimal
 5346 digit is seen.
 5347 For many years, @command{gawk} would continue incorporating
 5348 hexadecimal digits into the value until a non-hexadecimal digit
 5349 or the end of the string was encountered.
 5350 However, using more than two hexadecimal digits produced
 5351 undefined results.
 5352 As of @value{PVERSION} 4.2, only two digits
 5353 are processed.
 5354 @end quotation
 5355 
 5356 @cindex @code{\} (backslash) @subentry @code{\/} escape sequence
 5357 @cindex backslash (@code{\}) @subentry @code{\/} escape sequence
 5358 @item \/
 5359 A literal slash (should be used for regexp constants only).
 5360 This sequence is used when you want to write a regexp
 5361 constant that contains a slash
 5362 (such as @code{/.*:\/home\/[[:alnum:]]+:.*/}; the @samp{[[:alnum:]]}
 5363 notation is discussed in @ref{Bracket Expressions}).
 5364 Because the regexp is delimited by
 5365 slashes, you need to escape any slash that is part of the pattern,
 5366 in order to tell @command{awk} to keep processing the rest of the regexp.
 5367 
 5368 @cindex @code{\} (backslash) @subentry @code{\"} escape sequence
 5369 @cindex backslash (@code{\}) @subentry @code{\"} escape sequence
 5370 @item \"
 5371 A literal double quote (should be used for string constants only).
 5372 This sequence is used when you want to write a string
 5373 constant that contains a double quote
 5374 (such as @code{"He said \"hi!\" to her."}).
 5375 Because the string is delimited by
 5376 double quotes, you need to escape any quote that is part of the string,
 5377 in order to tell @command{awk} to keep processing the rest of the string.
 5378 @end table
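
Several of the sequences in the preceding list can be exercised together.
The following sketch assumes a POSIX-compatible @command{awk} and uses only
the portable octal, TAB, and slash escapes; the path @samp{/home/user} is a
made-up test string:

```shell
# Sketch; assumes a POSIX-compatible awk is installed as "awk".
awk 'BEGIN {
    # \101 and \102 are the octal codes for "A" and "B"; \t is a TAB.
    printf "%s%s%s\n", "\101", "\t", "\102"
    # \/ places a literal slash inside a regexp constant.
    if ("/home/user" ~ /^\/home\//)
        print "slash matched"
}'
```

The first output line is @samp{A} and @samp{B} separated by a TAB, and the
second is @samp{slash matched}.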
 5379 
 5380 In @command{gawk}, a number of additional two-character sequences that begin
 5381 with a backslash have special meaning in regexps.
 5382 @xref{GNU Regexp Operators}.
 5383 
 5384 In a regexp, a backslash before any character that is not in the previous list
 5385 and not listed in
 5386 @ref{GNU Regexp Operators}
 5387 means that the next character should be taken literally, even if it would
 5388 normally be a regexp operator.  For example, @code{/a\+b/} matches the three
 5389 characters @samp{a+b}.
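
The difference between the escaped and unescaped @samp{+} can be seen by
running both forms over the same input.  This is a sketch that assumes a
POSIX-compatible @command{awk}; the three input lines are invented for
the illustration:

```shell
# Sketch; assumes a POSIX-compatible awk is installed as "awk".
# Escaped: the plus sign is literal, so only the line "a+b" is printed.
printf 'a+b\naab\nab\n' | awk '/a\+b/'

# Unescaped: "+" is an operator (one or more "a"), so "aab" and "ab" are printed.
printf 'a+b\naab\nab\n' | awk '/a+b/'
```

The first command prints @samp{a+b}.  The second prints @samp{aab} and
@samp{ab}, and skips @samp{a+b} because no @samp{a} in that line is
immediately followed by a @samp{b}.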
 5390 
 5391 @cindex backslash (@code{\}) @subentry in escape sequences
 5392 @cindex @code{\} (backslash) @subentry in escape sequences
 5393 @cindex portability
 5394 For complete portability, do not use a backslash before any character not
 5395 shown in the previous list or that is not an operator.
 5396 
 5397 @c 11/2014: Moved so as to not stack sidebars
 5398 @sidebar Backslash Before Regular Characters
 5399 @cindex portability @subentry backslash in escape sequences
 5400 @cindex POSIX @command{awk} @subentry backslashes in string constants
 5401 @cindex backslash (@code{\}) @subentry in escape sequences @subentry POSIX and
 5402 @cindex @code{\} (backslash) @subentry in escape sequences @subentry POSIX and
 5403 
 5404 @cindex troubleshooting @subentry backslash before nonspecial character
 5405 If you place a backslash in a string constant before something that is
 5406 not one of the characters previously listed, POSIX @command{awk} purposely
 5407 leaves what happens as undefined.  There are two choices:
 5408 
 5409 @c @cindex automatic warnings
 5410 @c @cindex warnings, automatic
 5411 @cindex Brian Kernighan's @command{awk}
 5412 @table @asis
 5413 @item Strip the backslash out
 5414 This is what BWK @command{awk} and @command{gawk} both do.
 5415 For example, @code{"a\qc"} is the same as @code{"aqc"}.
 5416 (Because this is such an easy bug both to introduce and to miss,
 5417 @command{gawk} warns you about it.)
 5418 For example, consider using @samp{FS = @w{"[ \t]+\|[ \t]+"}} to make vertical
 5419 bars surrounded by whitespace the field separator. There should be
 5420 two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.
 5421 @c I did this!  This is why I added the warning.
 5422 
 5423 @cindex @command{gawk} @subentry escape sequences
 5424 @cindex @command{gawk} @subentry escape sequences @seealso{backslash}
 5425 @cindex Unix @command{awk} @subentry backslashes in escape sequences
 5426 @cindex @command{mawk} utility
 5427 @item Leave the backslash alone
 5428 Some other @command{awk} implementations do this.
 5429 In such implementations, typing @code{"a\qc"} is the same as typing
 5430 @code{"a\\qc"}.
 5431 @end table
 5432 @end sidebar
 5433 
 5434 To summarize:
 5435 
 5436 @itemize @value{BULLET}
 5437 @item
 5438 The escape sequences in the preceding list are always processed first,
 5439 for both string constants and regexp constants. This happens very early,
 5440 as soon as @command{awk} reads your program.
 5441 
 5442 @item
 5443 @command{gawk} processes both regexp constants and dynamic regexps
 5444 (@pxref{Computed Regexps}),
 5445 for the special operators listed in
 5446 @ref{GNU Regexp Operators}.
 5447 
 5448 @item
 5449 A backslash before any other character means to treat that character
 5450 literally.
 5451 @end itemize
 5452 
 5453 @sidebar Escape Sequences for Metacharacters
 5454 @cindex metacharacters @subentry escape sequences for
 5455 
 5456 Suppose you use an octal or hexadecimal
 5457 escape to represent a regexp metacharacter.
 5458 (See @ref{Regexp Operators}.)
 5459 Does @command{awk} treat the character as a literal character or as a regexp
 5460 operator?
 5461 
 5462 @cindex dark corner @subentry escape sequences @subentry for metacharacters
 5463 Historically, such characters were taken literally.
 5464 @value{DARKCORNER}
 5465 However, the POSIX standard indicates that they should be treated
 5466 as real metacharacters, which is what @command{gawk} does.
 5467 In compatibility mode (@pxref{Options}),
 5468 @command{gawk} treats the characters represented by octal and hexadecimal
 5469 escape sequences literally when used in regexp constants. Thus,
 5470 @code{/a\52b/} is equivalent to @code{/a\*b/}.
 5471 @end sidebar
 5472 
 5473 @node Regexp Operators
 5474 @section Regular Expression Operators
 5475 @cindex regular expressions @subentry operators
 5476 @cindex metacharacters @subentry in regular expressions
 5477 
 5478 You can combine regular expressions with special characters,
 5479 called @dfn{regular expression operators} or @dfn{metacharacters}, to
 5480 increase the power and versatility of regular expressions.
 5481 
 5482 @menu
 5483 * Regexp Operator Details::     The actual details.
 5484 * Interval Expressions::        Notes on interval expressions.
 5485 @end menu
 5486 
 5487 @node Regexp Operator Details
 5488 @subsection Regexp Operators in @command{awk}
 5489 
 5490 The escape sequences described
 5491 @ifnotinfo
 5492 earlier
 5493 @end ifnotinfo
 5494 in @ref{Escape Sequences}
 5495 are valid inside a regexp.  They are introduced by a @samp{\} and
 5496 are recognized and converted into corresponding real characters as
 5497 the very first step in processing regexps.
 5498 
 5499 Here is a list of metacharacters.  All characters that are not escape
 5500 sequences and that are not listed here stand for themselves:
 5501 
 5502 @c Use @asis so the docbook comes out ok. Sigh.
 5503 @table @asis
 5504 @cindex backslash (@code{\}) @subentry regexp operator
 5505 @cindex @code{\} (backslash) @subentry regexp operator
 5506 @item @code{\}
 5507 This suppresses the special meaning of a character when
 5508 matching.  For example, @samp{\$}
 5509 matches the character @samp{$}.
 5510 
 5511 @cindex regular expressions @subentry anchors in
 5512 @cindex Texinfo @subentry chapter beginnings in files
 5513 @cindex @code{^} (caret) @subentry regexp operator
 5514 @cindex caret (@code{^}) @subentry regexp operator
 5515 @item @code{^}
 5516 This matches the beginning of a string.  @samp{^@@chapter}
 5517 matches @samp{@@chapter} at the beginning of a string,
 5518 for example, and can be used
 5519 to identify chapter beginnings in Texinfo source files.
 5520 The @samp{^} is known as an @dfn{anchor}, because it anchors the pattern to
 5521 match only at the beginning of the string.
 5522 
 5523 It is important to realize that @samp{^} does not match the beginning of
 5524 a line (the point right after a @samp{\n} newline character) embedded in a string.
 5525 The condition is not true in the following example:
 5526 
 5527 @example
 5528 if ("line1\nLINE 2" ~ /^L/) @dots{}
 5529 @end example
 5530 
 5531 @cindex @code{$} (dollar sign) @subentry regexp operator
 5532 @cindex dollar sign (@code{$}) @subentry regexp operator
 5533 @item @code{$}
 5534 This is similar to @samp{^}, but it matches only at the end of a string.
 5535 For example, @samp{p$}
 5536 matches a record that ends with a @samp{p}.  The @samp{$} is an anchor
 5537 and does not match the end of a line
 5538 (the point right before a @samp{\n} newline character)
 5539 embedded in a string.
 5540 The condition in the following example is not true:
 5541 
 5542 @example
 5543 if ("line1\nLINE 2" ~ /1$/) @dots{}
 5544 @end example
 5545 
 5546 @cindex @code{.} (period), regexp operator
 5547 @cindex period (@code{.}), regexp operator
 5548 @item @code{.} (period)
 5549 This matches any single character,
 5550 @emph{including} the newline character.  For example, @samp{.P}
 5551 matches any single character followed by a @samp{P} in a string.  Using
 5552 concatenation, we can make a regular expression such as @samp{U.A}, which
 5553 matches any three-character sequence that begins with @samp{U} and ends
 5554 with @samp{A}.
 5555 
 5556 @cindex POSIX mode
 5557 @cindex POSIX @command{awk} @subentry period (@code{.}), using
 5558 In strict POSIX mode (@pxref{Options}),
 5559 @samp{.} does not match the @sc{nul}
 5560 character, which is a character with all bits equal to zero.
 5561 Otherwise, @sc{nul} is just another character. Other versions of @command{awk}
 5562 may not be able to match the @sc{nul} character.
 5563 
 5564 @cindex @code{[]} (square brackets), regexp operator
 5565 @cindex square brackets (@code{[]}), regexp operator
 5566 @cindex bracket expressions
 5567 @cindex character sets (in regular expressions) @seeentry{bracket expressions}
 5568 @cindex character lists @seeentry{bracket expressions}
 5569 @cindex character classes @seeentry{bracket expressions}
 5570 @item @code{[}@dots{}@code{]}
 5571 This is called a @dfn{bracket expression}.@footnote{In other literature,
 5572 you may see a bracket expression referred to as either a
 5573 @dfn{character set}, a @dfn{character class}, or a @dfn{character list}.}
 5574 It matches any @emph{one} of the characters that are enclosed in
 5575 the square brackets.  For example, @samp{[MVX]} matches any one of
 5576 the characters @samp{M}, @samp{V}, or @samp{X} in a string.  A full
 5577 discussion of what can be inside the square brackets of a bracket expression
 5578 is given in
 5579 @ref{Bracket Expressions}.
 5580 
 5581 @cindex bracket expressions @subentry complemented
 5582 @item @code{[^}@dots{}@code{]}
 5583 This is a @dfn{complemented bracket expression}.  The first character after
 5584 the @samp{[} @emph{must} be a @samp{^}.  It matches any single character
 5585 @emph{except} those in the square brackets.  For example, @samp{[^awk]}
 5586 matches any character that is not an @samp{a}, @samp{w},
 5587 or @samp{k}.
 5588 
 5589 @cindex @code{|} (vertical bar)
 5590 @cindex vertical bar (@code{|})
 5591 @item @code{|}
 5592 This is the @dfn{alternation operator} and it is used to specify
 5593 alternatives.  The @samp{|} has the lowest precedence of all the regular
 5594 expression operators.  For example, @samp{^P|[aeiouy]} matches any string
 5595 that matches either @samp{^P} or @samp{[aeiouy]}.  This means it matches
 5596 any string that starts with @samp{P} or contains (anywhere within it)
 5597 a lowercase English vowel.
 5598 
 5599 The alternation applies to the largest possible regexps on either side.
 5600 
 5601 @cindex @code{()} (parentheses) @subentry regexp operator
 5602 @cindex parentheses @code{()} @subentry regexp operator
 5603 @item @code{(}@dots{}@code{)}
 5604 Parentheses are used for grouping in regular expressions, as in
 5605 arithmetic.  They can be used to concatenate regular expressions
 5606 containing the alternation operator, @samp{|}.  For example,
 5607 @samp{@@(samp|code)\@{[^@}]+\@}} matches both @samp{@@code@{foo@}} and
 5608 @samp{@@samp@{bar@}}.
 5609 (These are Texinfo formatting control sequences. The @samp{+} is
 5610 explained further on in this list.)
 5611 
 5612 The left or opening parenthesis is always a metacharacter; to match
 5613 one literally, precede it with a backslash. However, the right or
 5614 closing parenthesis is only special when paired with a left parenthesis;
 5615 an unpaired right parenthesis is (silently) treated as a regular character.
 5616 
 5617 @cindex @code{*} (asterisk) @subentry @code{*} operator @subentry as regexp operator
 5618 @cindex asterisk (@code{*}) @subentry @code{*} operator @subentry as regexp operator
 5619 @item @code{*}
 5620 This symbol means that the preceding regular expression should be
 5621 repeated as many times as necessary to find a match.  For example, @samp{ph*}
 5622 applies the @samp{*} symbol to the preceding @samp{h} and looks for matches
 5623 of one @samp{p} followed by any number of @samp{h}s.  This also matches
 5624 just @samp{p} if no @samp{h}s are present.
 5625 
 5626 There are two subtle points to understand about how @samp{*} works.
 5627 First, the @samp{*} applies only to the single preceding regular expression
 5628 component (e.g., in @samp{ph*}, it applies just to the @samp{h}).
 5629 To cause @samp{*} to apply to a larger subexpression, use parentheses:
 5630 @samp{(ph)*} matches @samp{ph}, @samp{phph}, @samp{phphph}, and so on.
 5631 
 5632 Second, @samp{*} finds as many repetitions as possible. If the text
 5633 to be matched is @samp{phhhhhhhhhhhhhhooey}, @samp{ph*} matches all of
 5634 the @samp{h}s.
 5635 
 5636 @cindex @code{+} (plus sign) @subentry regexp operator
 5637 @cindex plus sign (@code{+}) @subentry regexp operator
 5638 @item @code{+}
 5639 This symbol is similar to @samp{*}, except that the preceding expression must be
 5640 matched at least once.  This means that @samp{wh+y}
 5641 would match @samp{why} and @samp{whhy}, but not @samp{wy}, whereas
 5642 @samp{wh*y} would match all three.
 5643 
 5644 @cindex @code{?} (question mark) @subentry regexp operator
 5645 @cindex question mark (@code{?}) @subentry regexp operator
 5646 @item @code{?}
 5647 This symbol is similar to @samp{*}, except that the preceding expression can be
 5648 matched either once or not at all.  For example, @samp{fe?d}
 5649 matches @samp{fed} and @samp{fd}, but nothing else.
 5650 
 5651 @cindex @code{@{@}} (braces) @subentry regexp operator
 5652 @cindex braces (@code{@{@}}) @subentry regexp operator
 5653 @cindex interval expressions, regexp operator
 5654 @item @code{@{}@var{n}@code{@}}
 5655 @itemx @code{@{}@var{n}@code{,@}}
 5656 @itemx @code{@{}@var{n}@code{,}@var{m}@code{@}}
 5657 One or two numbers inside braces denote an @dfn{interval expression}.
 5658 If there is one number in the braces, the preceding regexp is repeated
 5659 @var{n} times.
 5660 If there are two numbers separated by a comma, the preceding regexp is
 5661 repeated @var{n} to @var{m} times.
 5662 If there is one number followed by a comma, then the preceding regexp
 5663 is repeated at least @var{n} times:
 5664 
 5665 @table @code
 5666 @item wh@{3@}y
 5667 Matches @samp{whhhy}, but not @samp{why} or @samp{whhhhy}.
 5668 
 5669 @item wh@{3,5@}y
 5670 Matches @samp{whhhy}, @samp{whhhhy}, or @samp{whhhhhy} only.
 5671 
 5672 @item wh@{2,@}y
 5673 Matches @samp{whhy}, @samp{whhhy}, and so on.
 5674 @end table
 5675 @end table
 5676 
 5677 @cindex precedence @subentry regexp operators
 5678 @cindex regular expressions @subentry operators @subentry precedence of
 5679 In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators,
 5680 as well as the braces @samp{@{} and @samp{@}},
 5681 have
 5682 the highest precedence, followed by concatenation, and finally by @samp{|}.
 5683 As in arithmetic, parentheses can change how operators are grouped.
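
The grouping rules are easy to check from the shell.  The following is
only a sketch: it assumes a POSIX shell with some @command{awk} on the
search path, and the sample words are invented for the purpose.  Because
concatenation binds more tightly than @samp{|}, the regexp
@samp{^ab|cd$} is read as @samp{(^ab)|(cd$)}; parentheses are needed to
confine the alternation to a single character position:

@verbatim
```shell
# Sample words, one per line (invented for this sketch).
words='abxx
xxcd
abd
acd
bc'

# '|' has the lowest precedence, so this reads as (^ab)|(cd$).
loose=$(printf '%s\n' "$words" | awk '/^ab|cd$/')

# Parentheses confine the alternation to the single letter 'b' or 'c'.
grouped=$(printf '%s\n' "$words" | awk '/^a(b|c)d$/')

printf 'loose:\n%s\n' "$loose"
printf 'grouped:\n%s\n' "$grouped"
```
@end verbatim

@noindent
The first pattern selects every record that begins with @samp{ab} or
ends with @samp{cd}; the second selects only @samp{abd} and @samp{acd}.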
 5684 
 5685 @cindex POSIX @command{awk} @subentry regular expressions and
 5686 @cindex @command{gawk} @subentry regular expressions @subentry precedence
 5687 In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and
 5688 @samp{?} operators stand for themselves when there is nothing in the
 5689 regexp that precedes them.  For example, @code{/+/} matches a literal
 5690 plus sign.  However, many other versions of @command{awk} treat such a
 5691 usage as a syntax error.
 5692 
 5693 @node Interval Expressions
 5694 @subsection Some Notes On Interval Expressions
 5695 
 5696 @cindex POSIX @command{awk} @subentry interval expressions in
 5697 Interval expressions were not traditionally available in @command{awk}.
 5698 They were added as part of the POSIX standard to make @command{awk}
 5699 and @command{egrep} consistent with each other.
 5700 
 5701 @cindex @command{gawk} @subentry interval expressions and
 5702 Initially, because old programs may use @samp{@{} and @samp{@}} in regexp
 5703 constants,
 5704 @command{gawk} did @emph{not} match interval expressions
 5705 in regexps.
 5706 
 5707 However, beginning with @value{PVERSION} 4.0,
 5708 @command{gawk} does match interval expressions by default.
 5709 This is because compatibility with POSIX has become more
 5710 important to most @command{gawk} users than compatibility with
 5711 old programs.
 5712 
 5713 For programs that use @samp{@{} and @samp{@}} in regexp constants,
 5714 it is good practice to always escape them with a backslash.  Then the
 5715 regexp constants are valid and work the way you want them to, using
 5716 any version of @command{awk}.@footnote{Use two backslashes if you're
 5717 using a string constant with a regexp operator or function.}
 5718 
 5719 Finally, when @samp{@{} and @samp{@}} appear in regexp constants
 5720 in a way that cannot be interpreted as an interval expression
 5721 (such as @code{/q@{a@}/}), then they stand for themselves.
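
As a rough illustration of the escaping advice, the following sketch
(assuming a POSIX shell and some @command{awk} on the search path; the
two sample records are made up) matches a literal brace pair, first with
a regexp constant and then with a string constant.  Whether the
unescaped @samp{/a@{2@}/} acts as an interval expression depends on the
@command{awk} version and options in use, so it is deliberately left out:

@verbatim
```shell
# Two sample records: a doubled letter and a literal brace pair.
input='aa
a{2}'

# Escaped braces in a regexp constant match the brace characters themselves.
from_constant=$(printf '%s\n' "$input" | awk '/a\{2\}/')

# In a string constant each backslash has to be doubled.
from_string=$(printf '%s\n' "$input" | awk '$0 ~ "a\\{2\\}"')

printf '%s\n' "$from_constant" "$from_string"
```
@end verbatim

@noindent
Both commands select only the record @samp{a@{2@}}; the record
@samp{aa} is not matched, because the escaped braces are not treated as
a repetition count.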
 5722 
 5723 As mentioned, interval expressions were not traditionally available
 5724 in @command{awk}. In March of 2019, BWK @command{awk} (finally) acquired them.
 5725 
 5726 Nonetheless, because they were not available for
 5727 so many decades, @command{gawk} continues to not supply them
 5728 when in compatibility mode (@pxref{Options}).
 5729 
 5730 @node Bracket Expressions
 5731 @section Using Bracket Expressions
 5732 @cindex bracket expressions
 5733 @cindex bracket expressions @subentry range expressions
 5734 @cindex range expressions (regexps)
 5735 @cindex bracket expressions @subentry character lists
 5736 
 5737 As mentioned earlier, a bracket expression matches any one character among
 5738 those listed between the opening and closing square brackets.
 5739 
 5740 Within a bracket expression, a @dfn{range expression} consists of two
 5741 characters separated by a hyphen.  It matches any single character that
 5742 sorts between the two characters, based upon the system's native character
 5743 set.  For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}.
 5744 (See @ref{Ranges and Locales} for an explanation of how the POSIX
 5745 standard and @command{gawk} have changed over time.  This is mainly
 5746 of historical interest.)
 5747 
 5748 With the increasing popularity of the
 5749 @uref{http://www.unicode.org, Unicode character standard},
 5750 there is an additional wrinkle to consider. Octal and hexadecimal
 5751 escape sequences inside bracket expressions are taken to represent
 5752 only single-byte characters (characters whose values fit within
 5753 the range 0--255).  To match a range of characters where the endpoints
 5754 of the range are larger than 255, enter the multibyte encodings of
 5755 the characters directly.
 5756 
 5757 @cindex @code{\} (backslash) @subentry in bracket expressions
 5758 @cindex backslash (@code{\}) @subentry in bracket expressions
 5759 @cindex @code{^} (caret) @subentry in bracket expressions
 5760 @cindex caret (@code{^}) @subentry in bracket expressions
 5761 @cindex @code{-} (hyphen) @subentry in bracket expressions
 5762 @cindex hyphen (@code{-}) @subentry in bracket expressions
 5763 To include one of the characters @samp{\}, @samp{]}, @samp{-}, or @samp{^} in a
 5764 bracket expression, put a @samp{\} in front of it.  For example:
 5765 
 5766 @example
 5767 [d\]]
 5768 @end example
 5769 
 5770 @noindent
 5771 matches either @samp{d} or @samp{]}.
 5772 Additionally, if you place @samp{]} right after the opening
 5773 @samp{[}, the closing bracket is treated as one of the
 5774 characters to be matched.
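
Both ways of including @samp{]} can be tried side by side.  This is a
minimal sketch, assuming a POSIX shell and an @command{awk} that follows
the POSIX rules for backslashes in bracket expressions; the
one-character sample records are invented:

@verbatim
```shell
# One character per record.
chars='d
]
x'

# Closing bracket escaped with a backslash inside the list.
escaped=$(printf '%s\n' "$chars" | awk '/^[d\]]$/')

# Closing bracket placed first, right after the opening bracket.
leading=$(printf '%s\n' "$chars" | awk '/^[]d]$/')

printf 'escaped:\n%s\n' "$escaped"
printf 'leading:\n%s\n' "$leading"
```
@end verbatim

@noindent
Each command selects the records @samp{d} and @samp{]} and skips
@samp{x}.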
 5775 
 5776 @cindex POSIX @command{awk} @subentry bracket expressions and
 5777 @cindex Extended Regular Expressions (EREs)
 5778 @cindex EREs (Extended Regular Expressions)
 5779 @cindex @command{egrep} utility
 5780 The treatment of @samp{\} in bracket expressions
 5781 is compatible with other @command{awk}
 5782 implementations and is also mandated by POSIX.
 5783 The regular expressions in @command{awk} are a superset
 5784 of the POSIX specification for Extended Regular Expressions (EREs).
 5785 POSIX EREs are based on the regular expressions accepted by the
 5786 traditional @command{egrep} utility.
 5787 
 5788 @cindex bracket expressions @subentry character classes
 5789 @cindex POSIX @command{awk} @subentry bracket expressions and @subentry character classes
 5790 @dfn{Character classes} are a feature introduced in the POSIX standard.
 5791 A character class is a special notation for describing
 5792 lists of characters that have a specific attribute, but the
 5793 actual characters can vary from country to country and/or
 5794 from character set to character set.  For example, the notion of what
 5795 is an alphabetic character differs between the United States and France.
 5796 
 5797 A character class is only valid in a regexp @emph{inside} the
 5798 brackets of a bracket expression.  Character classes consist of @samp{[:},
 5799 a keyword denoting the class, and @samp{:]}.
 5800 @ref{table-char-classes} lists the character classes defined by the
 5801 POSIX standard.
 5802 
 5803 @float Table,table-char-classes
 5804 @caption{POSIX character classes}
 5805 @multitable @columnfractions .15 .85
 5806 @headitem Class @tab Meaning
 5807 @item @code{[:alnum:]} @tab Alphanumeric characters
 5808 @item @code{[:alpha:]} @tab Alphabetic characters
 5809 @item @code{[:blank:]} @tab Space and TAB characters
 5810 @item @code{[:cntrl:]} @tab Control characters
 5811 @item @code{[:digit:]} @tab Numeric characters
 5812 @item @code{[:graph:]} @tab Characters that are both printable and visible
 5813 (a space is printable but not visible, whereas an @samp{a} is both)
 5814 @item @code{[:lower:]} @tab Lowercase alphabetic characters
 5815 @item @code{[:print:]} @tab Printable characters (characters that are not control characters)
 5816 @item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits,
 5817 control characters, or space characters)
 5818 @item @code{[:space:]} @tab Space characters (these are: space, TAB, newline, carriage return, formfeed and vertical tab)
 5819 @item @code{[:upper:]} @tab Uppercase alphabetic characters
 5820 @item @code{[:xdigit:]} @tab Characters that are hexadecimal digits
 5821 @end multitable
 5822 @end float
 5823 
 5824 For example, before the POSIX standard, you had to write @code{/[A-Za-z0-9]/}
 5825 to match alphanumeric characters.  If your
 5826 character set had other alphabetic characters in it, this would not
 5827 match them.
 5828 With the POSIX character classes, you can write
 5829 @code{/[[:alnum:]]/} to match the alphabetic
 5830 and numeric characters in your character set.
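
The following sketch shows character classes at work.  It assumes a
POSIX shell and an @command{awk} that supports POSIX character classes;
the sample records are invented, and @env{LC_ALL} is set to @samp{C}
only to keep the result independent of the local environment:

@verbatim
```shell
# Sample records (invented for this sketch).
input='abc123
abc 123
---'

# Records made up entirely of alphanumeric characters.
alnum_only=$(printf '%s\n' "$input" | LC_ALL=C awk '/^[[:alnum:]]+$/')

# Letters, then one space or TAB, then digits.
spaced=$(printf '%s\n' "$input" |
         LC_ALL=C awk '/^[[:alpha:]]+[[:blank:]][[:digit:]]+$/')

printf '%s\n' "$alnum_only" "$spaced"
```
@end verbatim

@noindent
Note that each class name appears @emph{inside} an enclosing pair of
brackets, which is why the brackets are doubled.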
 5831 
 5832 @ignore
 5833 From eliz@gnu.org  Fri Feb 15 03:38:41 2019
 5834 Date: Fri, 15 Feb 2019 12:38:23 +0200
 5835 From: Eli Zaretskii <eliz@gnu.org>
 5836 To: arnold@skeeve.com
 5837 CC: pengyu.ut@gmail.com, bug-gawk@gnu.org
 5838 Subject: Re: [bug-gawk] Does gawk character classes follow this?
 5839 
 5840 > From: arnold@skeeve.com
 5841 > Date: Fri, 15 Feb 2019 03:01:34 -0700
 5842 > Cc: pengyu.ut@gmail.com, bug-gawk@gnu.org
 5843 > 
 5844 > I get the feeling that there's something really bothering you, but
 5845 > I don't understand what.
 5846 > 
 5847 > Can you clarify, please?
 5848 
 5849 I thought I already did: we cannot be expected to provide a definitive
 5850 description of what the named classes stand for, because the answer
 5851 depends on various factors out of our control.
 5852 @end ignore
 5853 
 5854 @c Thanks to
 5855 @c Date: Tue, 01 Jul 2014 07:39:51 +0200
 5856 @c From: Hermann Peifer <peifer@gmx.eu>
 5857 @cindex ASCII
 5858 Some utilities that match regular expressions provide a nonstandard
 5859 @samp{[:ascii:]} character class; @command{awk} does not. However, you
 5860 can simulate such a construct using @samp{[\x00-\x7F]}.  This matches
 5861 all values numerically between zero and 127, which is the defined
 5862 range of the ASCII character set.  Use a complemented bracket expression
 5863 (@samp{[^\x00-\x7F]}) to match any single-byte characters that are not
 5864 in the ASCII range.
 5865 
 5866 @quotation NOTE
 5867 Some older versions of Unix @command{awk}
 5868 treat @code{[:blank:]} like @code{[:space:]}, incorrectly matching
 5869 more characters than they should.  Caveat Emptor.
 5870 @end quotation
 5871 
 5872 @cindex bracket expressions @subentry collating elements
 5873 @cindex bracket expressions @subentry non-ASCII
 5874 @cindex collating elements
 5875 Two additional special sequences can appear in bracket expressions.
 5876 These apply to non-ASCII character sets, which can have single symbols
 5877 (called @dfn{collating elements}) that are represented with more than one
 5878 character. They can also have several characters that are equivalent for
 5879 @dfn{collating}, or sorting, purposes.  (For example, in French, a plain ``e''
 5880 and a grave-accented ``@`e'' are equivalent.)
 5881 These sequences are:
 5882 
 5883 @table @asis
 5884 @cindex bracket expressions @subentry collating symbols
 5885 @cindex collating symbols
 5886 @item Collating symbols
 5887 Multicharacter collating elements enclosed between
 5888 @samp{[.} and @samp{.]}.  For example, if @samp{ch} is a collating element,
 5889 then @samp{[[.ch.]]} is a regexp that matches this collating element, whereas
 5890 @samp{[ch]} is a regexp that matches either @samp{c} or @samp{h}.
 5891 
 5892 @cindex bracket expressions @subentry equivalence classes
 5893 @item Equivalence classes
 5894 Locale-specific names for a list of
 5895 characters that are equivalent. The name is enclosed between
 5896 @samp{[=} and @samp{=]}.
 5897 For example, the name @samp{e} might be used to represent all of
 5898 ``e,'' ``@^e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp
 5899 that matches any of @samp{e}, @samp{@^e}, @samp{@'e}, or @samp{@`e}.
 5900 @end table
 5901 
 5902 These features are very valuable in non-English-speaking locales.
 5903 
 5904 @cindex internationalization @subentry localization @subentry character classes
 5905 @cindex @command{gawk} @subentry character classes and
 5906 @cindex POSIX @command{awk} @subentry bracket expressions and @subentry character classes
 5907 @quotation CAUTION
 5908 The library functions that @command{gawk} uses for regular
 5909 expression matching currently recognize only POSIX character classes;
 5910 they do not recognize collating symbols or equivalence classes.
 5911 @end quotation
 5912 @c maybe one day ...
 5913 
 5914 Inside a bracket expression, an opening bracket (@samp{[}) that does
 5915 not start a character class, collating element or equivalence class is
 5916 taken literally. This is also true of @samp{.} and @samp{*}.
 5917 
 5918 @node Leftmost Longest
 5919 @section How Much Text Matches?
 5920 
 5921 @cindex regular expressions @subentry leftmost longest match
 5922 @c @cindex matching, leftmost longest
 5923 Consider the following:
 5924 
 5925 @example
 5926 echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
 5927 @end example
 5928 
 5929 This example uses the @code{sub()} function to make a change to the input
 5930 record.  (@code{sub()} replaces the first instance of any text matched
 5931 by the first argument with the string provided as the second argument;
 5932 @pxref{String Functions}.)  Here, the regexp @code{/a+/} indicates ``one
 5933 or more @samp{a} characters,'' and the replacement text is @samp{<A>}.
 5934 
 5935 The input contains four @samp{a} characters.
 5936 @command{awk} (and POSIX) regular expressions always match
 5937 the leftmost, @emph{longest} sequence of input characters that can
 5938 match.  Thus, all four @samp{a} characters are
 5939 replaced with @samp{<A>} in this example:
 5940 
 5941 @example
 5942 $ @kbd{echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'}
 5943 @print{} <A>bcd
 5944 @end example
 5945 
 5946 For simple match/no-match tests, this is not so important. But when doing
 5947 text matching and substitutions with the @code{match()}, @code{sub()}, @code{gsub()},
 5948 and @code{gensub()} functions, it is very important.
 5949 @ifinfo
 5950 @xref{String Functions},
 5951 for more information on these functions.
 5952 @end ifinfo
 5953 Understanding this principle is also important for regexp-based record
 5954 and field splitting (@pxref{Records},
 5955 and also @pxref{Field Separators}).
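
The ``leftmost'' half of the rule takes priority over the ``longest''
half.  In the following sketch (assuming a POSIX shell and some
@command{awk}; the input string is invented), the later run of four
@samp{a} characters is longer, but the earlier run of two is the one
that is matched:

@verbatim
```shell
# The first run of a's is replaced, even though a longer run follows.
first=$(echo xaayaaaa | awk '{ sub(/a+/, "<A>"); print }')

# match() sets RSTART and RLENGTH to the position and length of the match.
where=$(echo xaayaaaa | awk '{ match($0, /a+/); print RSTART, RLENGTH }')

echo "$first"    # x<A>yaaaa
echo "$where"    # 2 2
```
@end verbatim

@noindent
Within that leftmost position, the match is then extended as far as
possible, which is why both @samp{a} characters are consumed rather than
just one.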
 5956 
 5957 @node Computed Regexps
 5958 @section Using Dynamic Regexps
 5959 
 5960 @cindex regular expressions @subentry computed
 5961 @cindex regular expressions @subentry dynamic
 5962 @cindex @code{~} (tilde), @code{~} operator
 5963 @cindex tilde (@code{~}), @code{~} operator
 5964 @cindex @code{!} (exclamation point) @subentry @code{!~} operator
 5965 @cindex exclamation point (@code{!}) @subentry @code{!~} operator
 5966 @c @cindex operators, @code{~}
 5967 @c @cindex operators, @code{!~}
 5968 The righthand side of a @samp{~} or @samp{!~} operator need not be a
 5969 regexp constant (i.e., a string of characters between slashes).  It may
 5970 be any expression.  The expression is evaluated and converted to a string
 5971 if necessary; the contents of the string are then used as the
 5972 regexp.  A regexp computed in this way is called a @dfn{dynamic
 5973 regexp} or a @dfn{computed regexp}:
 5974 
 5975 @example
 5976 BEGIN @{ digits_regexp = "[[:digit:]]+" @}
 5977 $0 ~ digits_regexp    @{ print @}
 5978 @end example
 5979 
 5980 @noindent
 5981 This sets @code{digits_regexp} to a regexp that describes one or more digits,
 5982 and tests whether the input record matches this regexp.
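
A dynamic regexp does not have to be written in the program text at all.
In the following sketch (assuming a POSIX shell and some @command{awk};
the variable name @code{pat} and the sample records are made up), the
pattern is supplied from the command line with the @option{-v} option
and used on the righthand side of @samp{~}:

@verbatim
```shell
# Sample records (invented for this sketch).
records='42
x42
007'

# The regexp arrives as the value of the awk variable "pat".
numeric=$(printf '%s\n' "$records" | awk -v pat='^[0-9]+$' '$0 ~ pat')

printf '%s\n' "$numeric"
```
@end verbatim

@noindent
Only the records consisting entirely of digits (@samp{42} and
@samp{007}) are printed.  The pattern here contains no backslashes, so
the double scanning of string values does not come into play.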
 5983 
 5984 @quotation NOTE
 5985 When using the @samp{~} and @samp{!~}
 5986 operators, be aware that there is a difference between a regexp constant
 5987 enclosed in slashes and a string constant enclosed in double quotes.
 5988 If you are going to use a string constant, you have to understand that
 5989 the string is, in essence, scanned @emph{twice}: the first time when
 5990 @command{awk} reads your program, and the second time when it goes to
 5991 match the string on the lefthand side of the operator with the pattern
 5992 on the right.  This is true of any string-valued expression (such as
 5993 @code{digits_regexp}, shown in the previous example), not just string constants.
 5994 @end quotation
 5995 
 5996 @cindex regexp constants @subentry slashes vs.@: quotes
 5997 @cindex @code{\} (backslash) @subentry in regexp constants
 5998 @cindex backslash (@code{\}) @subentry in regexp constants
 5999 @cindex @code{"} (double quote) @subentry in regexp constants
 6000 @cindex double quote (@code{"}) @subentry in regexp constants
 6001 What difference does it make if the string is
 6002 scanned twice? The answer has to do with escape sequences, and particularly
 6003 with backslashes.  To get a backslash into a regular expression inside a
 6004 string, you have to type two backslashes.
 6005 
 6006 For example, @code{/\*/} is a regexp constant for a literal @samp{*}.
 6007 Only one backslash is needed.  To do the same thing with a string,
 6008 you have to type @code{"\\*"}.  The first backslash escapes the
 6009 second one so that the string actually contains the
 6010 two characters @samp{\} and @samp{*}.
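
The two spellings can be compared directly.  This is a small sketch,
assuming a POSIX shell and some @command{awk}; the sample input lines
are invented:

@verbatim
```shell
line='2 * 3'

# Regexp constant: one backslash makes the asterisk literal.
with_constant=$(echo "$line" | awk '$0 ~ /\*/  { print "constant matched" }')

# String constant: two backslashes are needed for the same regexp.
with_string=$(echo "$line" | awk '$0 ~ "\\*" { print "string matched" }')

# A record without an asterisk is not matched.
no_star=$(echo 'abc' | awk '$0 ~ "\\*" { print "string matched" }')

printf '[%s] [%s] [%s]\n' "$with_constant" "$with_string" "$no_star"
```
@end verbatim

@noindent
The single quotes keep the shell from interpreting the backslashes, so
@command{awk} sees them exactly as typed.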
 6011 
 6012 @cindex troubleshooting @subentry regexp constants vs.@: string constants
 6013 @cindex regexp constants @subentry vs.@: string constants
 6014 @cindex string @subentry constants @subentry vs.@: regexp constants
 6015 Given that you can use both regexp and string constants to describe
 6016 regular expressions, which should you use?  The answer is ``regexp
 6017 constants,'' for several reasons:
 6018 
 6019 @itemize @value{BULLET}
 6020 @item
 6021 String constants are more complicated to write and
 6022 more difficult to read. Using regexp constants makes your programs
 6023 less error-prone.  Not understanding the difference between the two
 6024 kinds of constants is a common source of errors.
 6025 
 6026 @item
 6027 It is more efficient to use regexp constants. @command{awk} can note
 6028 that you have supplied a regexp and store it internally in a form that
 6029 makes pattern matching more efficient.  When using a string constant,
 6030 @command{awk} must first convert the string into this internal form and
 6031 then perform the pattern matching.
 6032 
 6033 @item
 6034 Using regexp constants is better form; it shows clearly that you
 6035 intend a regexp match.
 6036 @end itemize
 6037 
 6038 @sidebar Using @code{\n} in Bracket Expressions of Dynamic Regexps
 6039 @cindex regular expressions @subentry dynamic @subentry with embedded newlines
 6040 @cindex newlines @subentry in dynamic regexps
 6041 
 6042 Some older versions of @command{awk} do not allow the newline
 6043 character to be used inside a bracket expression for a dynamic regexp:
 6044 
 6045 @example
 6046 $ @kbd{awk '$0 ~ "[ \t\n]"'}
 6047 @error{} awk: newline in character class [
 6048 @error{} ]...
 6049 @error{}  source line number 1
 6050 @error{}  context is
 6051 @error{}        $0 ~ "[ >>>  \t\n]" <<<
 6052 @end example
 6053 
 6054 @cindex newlines @subentry in regexp constants
 6055 But a newline in a regexp constant works with no problem:
 6056 
 6057 @example
 6058 $ @kbd{awk '$0 ~ /[ \t\n]/'}
 6059 @kbd{here is a sample line}
 6060 @print{} here is a sample line
 6061 @kbd{Ctrl-d}
 6062 @end example
 6063 
 6064 @command{gawk} does not have this problem, and it isn't likely to
 6065 occur often in practice, but it's worth noting for future reference.
 6066 @end sidebar
 6067 
 6068 @node GNU Regexp Operators
 6069 @section @command{gawk}-Specific Regexp Operators
 6070 
 6071 @c This section adapted (long ago) from the regex-0.12 manual
 6072 
 6073 @cindex regular expressions @subentry operators @subentry @command{gawk}
 6074 @cindex @command{gawk} @subentry regular expressions @subentry operators
 6075 @cindex operators @subentry GNU-specific
 6076 @cindex regular expressions @subentry operators @subentry for words
 6077 @cindex word, regexp definition of
 6078 GNU software that deals with regular expressions provides a number of
 6079 additional regexp operators.  These operators are described in this
 6080 @value{SECTION} and are specific to @command{gawk};
 6081 they are not available in other @command{awk} implementations.
 6082 Most of the additional operators deal with word matching.
 6083 For our purposes, a @dfn{word} is a sequence of one or more letters, digits,
 6084 or underscores (@samp{_}):
 6085 
 6086 @table @code
 6087 @c @cindex operators, @code{\s} (@command{gawk})
 6088 @cindex backslash (@code{\}) @subentry @code{\s} operator (@command{gawk})
 6089 @cindex @code{\} (backslash) @subentry @code{\s} operator (@command{gawk})
 6090 @item \s
 6091 Matches any space character as defined by the current locale.
 6092 Think of it as shorthand for
 6093 @w{@samp{[[:space:]]}}.
 6094 
 6095 @c @cindex operators, @code{\S} (@command{gawk})
 6096 @cindex backslash (@code{\}) @subentry @code{\S} operator (@command{gawk})
 6097 @cindex @code{\} (backslash) @subentry @code{\S} operator (@command{gawk})
 6098 @item \S
 6099 Matches any character that is not a space, as defined by the current locale.
 6100 Think of it as shorthand for
 6101 @w{@samp{[^[:space:]]}}.
 6102 
 6103 @c @cindex operators, @code{\w} (@command{gawk})
 6104 @cindex backslash (@code{\}) @subentry @code{\w} operator (@command{gawk})
 6105 @cindex @code{\} (backslash) @subentry @code{\w} operator (@command{gawk})
 6106 @item \w
 6107 Matches any word-constituent character---that is, it matches any
 6108 letter, digit, or underscore. Think of it as shorthand for
 6109 @w{@samp{[[:alnum:]_]}}.
 6110 
 6111 @c @cindex operators, @code{\W} (@command{gawk})
 6112 @cindex backslash (@code{\}) @subentry @code{\W} operator (@command{gawk})
 6113 @cindex @code{\} (backslash) @subentry @code{\W} operator (@command{gawk})
 6114 @item \W
 6115 Matches any character that is not word-constituent.
 6116 Think of it as shorthand for
 6117 @w{@samp{[^[:alnum:]_]}}.
 6118 
 6119 @c @cindex operators, @code{\<} (@command{gawk})
 6120 @cindex backslash (@code{\}) @subentry @code{\<} operator (@command{gawk})
 6121 @cindex @code{\} (backslash) @subentry @code{\<} operator (@command{gawk})
 6122 @item \<
 6123 Matches the empty string at the beginning of a word.
 6124 For example, @code{/\<away/} matches @samp{away} but not
 6125 @samp{stowaway}.
 6126 
 6127 @c @cindex operators, @code{\>} (@command{gawk})
 6128 @cindex backslash (@code{\}) @subentry @code{\>} operator (@command{gawk})
 6129 @cindex @code{\} (backslash) @subentry @code{\>} operator (@command{gawk})
 6130 @item \>
 6131 Matches the empty string at the end of a word.
 6132 For example, @code{/stow\>/} matches @samp{stow} but not @samp{stowaway}.
 6133 
 6134 @c @cindex operators, @code{\y} (@command{gawk})
 6135 @cindex backslash (@code{\}) @subentry @code{\y} operator (@command{gawk})
 6136 @cindex @code{\} (backslash) @subentry @code{\y} operator (@command{gawk})
 6137 @cindex word boundaries, matching
 6138 @item \y
 6139 Matches the empty string at either the beginning or the
 6140 end of a word (i.e., the word boundar@strong{y}).  For example, @samp{\yballs?\y}
 6141 matches either @samp{ball} or @samp{balls}, as a separate word.
 6142 
 6143 @c @cindex operators, @code{\B} (@command{gawk})
 6144 @cindex backslash (@code{\}) @subentry @code{\B} operator (@command{gawk})
 6145 @cindex @code{\} (backslash) @subentry @code{\B} operator (@command{gawk})
 6146 @item \B
 6147 Matches the empty string that occurs between two
 6148 word-constituent characters. For example,
 6149 @code{/\Brat\B/} matches @samp{crate}, but it does not match @samp{dirty rat}.
 6150 @samp{\B} is essentially the opposite of @samp{\y}.
 6151 @end table
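
The word-matching operators can be exercised as follows.  This sketch
assumes a POSIX shell and is guarded so that it does nothing useful when
no program named @command{gawk} is installed; the sample words and the
sample sentence are invented:

@verbatim
```shell
if command -v gawk >/dev/null 2>&1; then
    words='away
stowaway
stow'

    # \< anchors at the beginning of a word, \> at the end of one.
    starts=$(printf '%s\n' "$words" | gawk '/\<away/')
    ends=$(printf '%s\n' "$words" | gawk '/stow\>/')

    # \y matches at either edge; gsub() returns the number of matches.
    count=$(echo 'a ball, two balls, a ballroom' |
            gawk '{ print gsub(/\yballs?\y/, "&") }')

    # \B requires word-constituent characters on both sides.
    inside=$(printf '%s\n' 'crate' 'dirty rat' | gawk '/\Brat\B/')

    printf '%s\n' "$starts" "$ends" "$count" "$inside"
else
    echo 'gawk not found; skipping' >&2
fi
```
@end verbatim

@noindent
With @command{gawk} in its default mode, this selects @samp{away},
then @samp{stow}, counts two whole-word occurrences of @samp{ball} or
@samp{balls} (@samp{ballroom} does not count), and finally selects
@samp{crate}.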
 6152 
 6153 @cindex buffers @subentry operators for
 6154 @cindex regular expressions @subentry operators @subentry for buffers
 6155 @cindex operators @subentry string-matching @subentry for buffers
 6156 There are two other operators that work on buffers.  In Emacs, a
 6157 @dfn{buffer} is, naturally, an Emacs buffer.
 6158 Other GNU programs, including @command{gawk},
 6159 consider the entire string to match as the buffer.
 6160 The operators are:
 6161 
 6162 @table @code
 6163 @item \`
 6164 @c @cindex operators, @code{\`} (@command{gawk})
 6165 @cindex backslash (@code{\}) @subentry @code{\`} operator (@command{gawk})
 6166 @cindex @code{\} (backslash) @subentry @code{\`} operator (@command{gawk})
 6167 Matches the empty string at the
 6168 beginning of a buffer (string)
 6169 
 6170 @c @cindex operators, @code{\'} (@command{gawk})
 6171 @cindex backslash (@code{\}) @subentry @code{\'} operator (@command{gawk})
 6172 @cindex @code{\} (backslash) @subentry @code{\'} operator (@command{gawk})
 6173 @item \'
 6174 Matches the empty string at the
 6175 end of a buffer (string)
 6176 @end table
 6177 
 6178 @cindex @code{^} (caret) @subentry regexp operator
 6179 @cindex caret (@code{^}) @subentry regexp operator
 6180 @cindex @code{$} (dollar sign) @subentry regexp operator
 6181 @cindex dollar sign (@code{$}) @subentry regexp operator
 6182 Because @samp{^} and @samp{$} always work in terms of the beginning
 6183 and end of strings, these operators don't add any new capabilities
 6184 for @command{awk}.  They are provided for compatibility with other
 6185 GNU software.
 6186 
 6187 @cindex @command{gawk} @subentry word-boundary operator
 6188 @cindex word-boundary operator (@command{gawk})
 6189 @cindex operators @subentry word-boundary (@command{gawk})
 6190 In other GNU software, the word-boundary operator is @samp{\b}. However,
 6191 that conflicts with the @command{awk} language's definition of @samp{\b}
 6192 as backspace, so @command{gawk} uses a different letter.
 6193 An alternative method would have been to require two backslashes in the
 6194 GNU operators, but this was deemed too confusing. The current
 6195 method of using @samp{\y} for the GNU @samp{\b} appears to be the
 6196 lesser of two evils.
 6197 
 6198 @cindex regular expressions @subentry @command{gawk}, command-line options
 6199 @cindex @command{gawk} @subentry command-line options, regular expressions and
 6200 The various command-line options
 6201 (@pxref{Options})
 6202 control how @command{gawk} interprets characters in regexps:
 6203 
 6204 @table @asis
 6205 @item No options
 6206 In the default case, @command{gawk} provides all the facilities of
 6207 POSIX regexps and the
 6208 @ifnotinfo
 6209 previously described
 6210 GNU regexp operators.
 6211 @end ifnotinfo
 6212 @ifnottex
 6213 @ifnotdocbook
 6214 GNU regexp operators described
 6215 in @ref{Regexp Operators}.
 6216 @end ifnotdocbook
 6217 @end ifnottex
 6218 
 6219 @item @code{--posix}
 6220 Match only POSIX regexps; the GNU operators are not special
 6221 (e.g., @samp{\w} matches a literal @samp{w}).  Interval expressions
 6222 are allowed.
 6223 
 6224 @cindex Brian Kernighan's @command{awk}
 6225 @item @code{--traditional}
 6226 Match traditional Unix @command{awk} regexps. The GNU operators
 6227 are not special, and interval expressions are not available.
 6228 Because BWK @command{awk} supports them,
 6229 the POSIX character classes (@samp{[[:alnum:]]}, etc.) are available.
 6230 Characters described by octal and hexadecimal escape sequences are
 6231 treated literally, even if they represent regexp metacharacters.
 6232 
 6233 @item @code{--re-interval}
 6234 Allow interval expressions in regexps, if @option{--traditional}
 6235 has been provided.
 6236 Otherwise, interval expressions are available by default.
 6237 @end table
 6238 
 6239 @node Case-sensitivity
 6240 @section Case Sensitivity in Matching
 6241 
 6242 @cindex regular expressions @subentry case sensitivity
 6243 @cindex case sensitivity @subentry regexps and
 6244 Case is normally significant in regular expressions, both when matching
 6245 ordinary characters (i.e., not metacharacters) and inside bracket
 6246 expressions.  Thus, a @samp{w} in a regular expression matches only a lowercase
 6247 @samp{w} and not an uppercase @samp{W}.
 6248 
 6249 The simplest way to do a case-independent match is to use a bracket
 6250 expression---for example, @samp{[Ww]}.  However, this can be cumbersome if
 6251 you need to use it often, and it can make the regular expressions harder
 6252 to read.  There are two alternatives that you might prefer.
 6253 
 6254 One way to perform a case-insensitive match at a particular point in the
 6255 program is to convert the data to a single case, using the
 6256 @code{tolower()} or @code{toupper()} built-in string functions (which we
 6257 haven't discussed yet;
 6258 @pxref{String Functions}).
 6259 For example:
 6260 
 6261 @example
 6262 tolower($1) ~ /foo/  @{ @dots{} @}
 6263 @end example
 6264 
 6265 @noindent
 6266 converts the first field to lowercase before matching against it.
 6267 This works in any POSIX-compliant @command{awk}.
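
As a small runnable sketch of this technique, the following pipeline
feeds three made-up input lines to @command{awk} and keeps those whose
first field contains @samp{foo} in any mix of case.  The sample text and
the shell variable @code{out} are assumptions chosen only for illustration:

```shell
# Sample input is invented; only the first field is lowercased and tested.
out=$(printf 'FOO bar\nbaz foo\nFood court\n' |
      awk 'tolower($1) ~ /foo/ { print $0 }')
printf '%s\n' "$out"
```

The second line is not printed, because its first field is @samp{baz};
the @samp{foo} in its second field is never examined.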
 6268 
 6269 @cindex @command{gawk} @subentry regular expressions @subentry case sensitivity
 6270 @cindex case sensitivity @subentry @command{gawk}
 6271 @cindex differences in @command{awk} and @command{gawk} @subentry regular expressions
 6272 @cindex @code{~} (tilde), @code{~} operator
 6273 @cindex tilde (@code{~}), @code{~} operator
 6274 @cindex @code{!} (exclamation point) @subentry @code{!~} operator
 6275 @cindex exclamation point (@code{!}) @subentry @code{!~} operator
 6276 @cindex @code{IGNORECASE} variable @subentry with @code{~} and @code{!~} operators
 6277 @cindex @command{gawk} @subentry @code{IGNORECASE} variable in
 6278 @c @cindex variables, @code{IGNORECASE}
 6279 Another method, specific to @command{gawk}, is to set the variable
 6280 @code{IGNORECASE} to a nonzero value (@pxref{Built-in Variables}).
 6281 When @code{IGNORECASE} is not zero, @emph{all} regexp and string
 6282 operations ignore case.
 6283 
 6284 Changing the value of @code{IGNORECASE} dynamically controls the
 6285 case sensitivity of the program as it runs.  Case is significant by
 6286 default because @code{IGNORECASE} (like most variables) is initialized
 6287 to zero:
 6288 
 6289 @example
 6290 x = "aB"
 6291 if (x ~ /ab/) @dots{}   # this test will fail
 6292 
 6293 IGNORECASE = 1
 6294 if (x ~ /ab/) @dots{}   # now it will succeed
 6295 @end example
 6296 
 6297 In general, you cannot use @code{IGNORECASE} to make certain rules
 6298 case insensitive and other rules case sensitive, as there is no
 6299 straightforward way
 6300 to set @code{IGNORECASE} just for the pattern of
 6301 a particular rule.@footnote{Experienced C and C++ programmers will note
 6302 that it is possible, using something like
 6303 @samp{IGNORECASE = 1 && /foObAr/ @{ @dots{} @}}
 6304 and
 6305 @samp{IGNORECASE = 0 || /foobar/ @{ @dots{} @}}.
 6306 However, this is somewhat obscure and we don't recommend it.}
To mix case-sensitive and case-insensitive rules in one program, use either
bracket expressions or @code{tolower()}.  However, one thing that only
@code{IGNORECASE} lets you do is dynamically turn
case sensitivity on or off for all the rules at once.
 6310 
 6311 @code{IGNORECASE} can be set on the command line or in a @code{BEGIN} rule
 6312 (@pxref{Other Arguments}; also
 6313 @pxref{Using BEGIN/END}).
 6314 Setting @code{IGNORECASE} from the command line is a way to make
 6315 a program case insensitive without having to edit it.
 6316 
 6317 @c @cindex ISO 8859-1
 6318 @c @cindex ISO Latin-1
 6319 In multibyte locales, the equivalences between upper- and lowercase
 6320 characters are tested based on the wide-character values of the locale's
 6321 character set.  Prior to @value{PVERSION} 5.0, single-byte characters were
 6322 tested based on the ISO-8859-1 (ISO Latin-1) character set.  However, as
 6323 of @value{PVERSION} 5.0, single-byte characters are also tested based on
 6324 the values of the locale's character set.@footnote{If you don't understand
 6325 this, don't worry about it; it just means that @command{gawk} does the
 6326 right thing.}
 6327 
 6328 The value of @code{IGNORECASE} has no effect if @command{gawk} is in
 6329 compatibility mode (@pxref{Options}).
 6330 Case is always significant in compatibility mode.
 6331 
 6332 @node Regexp Summary
 6333 @section Summary
 6334 
 6335 @itemize @value{BULLET}
 6336 @item
 6337 Regular expressions describe sets of strings to be matched.
 6338 In @command{awk}, regular expression constants are written enclosed
 6339 between slashes: @code{/}@dots{}@code{/}.
 6340 
 6341 @item
 6342 Regexp constants may be used standalone in patterns and
 6343 in conditional expressions, or as part of matching expressions
 6344 using the @samp{~} and @samp{!~} operators.
 6345 
 6346 @item
 6347 Escape sequences let you represent nonprintable characters and
 6348 also let you represent regexp metacharacters as literal characters
 6349 to be matched.
 6350 
 6351 @item
 6352 Regexp operators provide grouping, alternation, and repetition.
 6353 
 6354 @item
 6355 Bracket expressions give you a shorthand for specifying sets
 6356 of characters that can match at a particular point in a regexp.
 6357 Within bracket expressions, POSIX character classes let you specify
 6358 certain groups of characters in a locale-independent fashion.
 6359 
 6360 @item
 6361 Regular expressions match the leftmost longest text in the string being
 6362 matched.  This matters for cases where you need to know the extent of
 6363 the match, such as for text substitution and when the record separator
 6364 is a regexp.
 6365 
 6366 @item
 6367 Matching expressions may use dynamic regexps (i.e., string values
 6368 treated as regular expressions).
 6369 
 6370 @item
 6371 @command{gawk}'s @code{IGNORECASE} variable lets you control the
 6372 case sensitivity of regexp matching.  In other @command{awk}
 6373 versions, use @code{tolower()} or @code{toupper()}.
 6374 
 6375 @end itemize
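
To make the leftmost-longest point above concrete, here is a minimal
sketch using @code{sub()} on an invented input string.  The regexp
@samp{a+} could match one @samp{a}, but it actually matches the whole
leading run:

```shell
# sub() replaces the leftmost, longest match of /a+/ with "<A>".
out=$(echo 'aaaabcd' | awk '{ sub(/a+/, "<A>"); print }')
printf '%s\n' "$out"    # <A>bcd
```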
 6376 
 6377 
 6378 @node Reading Files
 6379 @chapter Reading Input Files
 6380 
 6381 @cindex reading input files
 6382 @cindex input files @subentry reading
 6383 @cindex input files
 6384 @cindex @code{FILENAME} variable
 6385 In the typical @command{awk} program,
 6386 @command{awk} reads all input either from the
 6387 standard input (by default, this is the keyboard, but often it is a pipe from another
 6388 command) or from files whose names you specify on the @command{awk}
 6389 command line.  If you specify input files, @command{awk} reads them
 6390 in order, processing all the data from one before going on to the next.
 6391 The name of the current input file can be found in the predefined variable
 6392 @code{FILENAME}
 6393 (@pxref{Built-in Variables}).
 6394 
 6395 @cindex records
 6396 @cindex fields
 6397 The input is read in units called @dfn{records}, and is processed by the
 6398 rules of your program one record at a time.
 6399 By default, each record is one line.  Each
 6400 record is automatically split into chunks called @dfn{fields}.
 6401 This makes it more convenient for programs to work on the parts of a record.
 6402 
 6403 @cindex @code{getline} command
 6404 On rare occasions, you may need to use the @code{getline} command.
The @code{getline} command is valuable both because it
 6406 can do explicit input from any number of files, and because the files
 6407 used with it do not have to be named on the @command{awk} command line
 6408 (@pxref{Getline}).
 6409 
 6410 @menu
 6411 * Records::                     Controlling how data is split into records.
 6412 * Fields::                      An introduction to fields.
 6413 * Nonconstant Fields::          Nonconstant Field Numbers.
 6414 * Changing Fields::             Changing the Contents of a Field.
 6415 * Field Separators::            The field separator and how to change it.
 6416 * Constant Size::               Reading constant width data.
 6417 * Splitting By Content::        Defining Fields By Content
 6418 * Testing field creation::      Checking how @command{gawk} is splitting
 6419                                 records.
 6420 * Multiple Line::               Reading multiline records.
 6421 * Getline::                     Reading files under explicit program control
 6422                                 using the @code{getline} function.
 6423 * Read Timeout::                Reading input with a timeout.
 6424 * Retrying Input::              Retrying input after certain errors.
 6425 * Command-line directories::    What happens if you put a directory on the
 6426                                 command line.
 6427 * Input Summary::               Input summary.
 6428 * Input Exercises::             Exercises.
 6429 @end menu
 6430 
 6431 @node Records
 6432 @section How Input Is Split into Records
 6433 
 6434 @cindex input @subentry splitting into records
 6435 @cindex records @subentry splitting input into
 6436 @cindex @code{NR} variable
 6437 @cindex @code{FNR} variable
 6438 @command{awk} divides the input for your program into records and fields.
 6439 It keeps track of the number of records that have been read so far from
 6440 the current input file.  This value is stored in a predefined variable
 6441 called @code{FNR}, which is reset to zero every time a new file is started.
 6442 Another predefined variable, @code{NR}, records the total number of input
 6443 records read so far from all @value{DF}s.  It starts at zero, but is
 6444 never automatically reset to zero.
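
The difference between @code{FNR} and @code{NR} is easiest to see with
two input files.  The sketch below creates two small throwaway files
(their names, @file{one} and @file{two}, and their contents are
assumptions made for this example) and prints @code{FILENAME},
@code{FNR}, and @code{NR} for every record:

```shell
# Build two scratch files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n'    > "$dir/one"
printf 'c\nd\ne\n' > "$dir/two"

# FNR restarts for each file; NR keeps counting across files.
out=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' one two)
rm -rf "$dir"
printf '%s\n' "$out"
```

On the first record of @file{two}, @code{FNR} is back to 1 while
@code{NR} has reached 3.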
 6445 
 6446 Normally, records are separated by newline characters.  You can control how
 6447 records are separated by assigning values to the built-in variable @code{RS}.
 6448 If @code{RS} is any single character, that character separates records.
 6449 Otherwise (in @command{gawk}), @code{RS} is treated as a regular expression.
 6450 This mechanism is explained in greater detail shortly.
 6451 
 6452 @menu
 6453 * awk split records::           How standard @command{awk} splits records.
 6454 * gawk split records::          How @command{gawk} splits records.
 6455 @end menu
 6456 
 6457 @node awk split records
 6458 @subsection Record Splitting with Standard @command{awk}
 6459 
 6460 @cindex separators @subentry for records
 6461 @cindex record separators
 6462 Records are separated by a character called the @dfn{record separator}.
 6463 By default, the record separator is the newline character.
 6464 This is why records are, by default, single lines.
 6465 To use a different character for the record separator,
 6466 simply assign that character to the predefined variable @code{RS}.
 6467 
 6468 @cindex record separators @subentry newlines as
 6469 @cindex newlines @subentry as record separators
 6470 @cindex @code{RS} variable
 6471 Like any other variable,
 6472 the value of @code{RS} can be changed in the @command{awk} program
 6473 with the assignment operator, @samp{=}
 6474 (@pxref{Assignment Ops}).
 6475 The new record-separator character should be enclosed in quotation marks,
 6476 which indicate a string constant.  Often, the right time to do this is
 6477 at the beginning of execution, before any input is processed,
 6478 so that the very first record is read with the proper separator.
 6479 To do this, use the special @code{BEGIN} pattern
 6480 (@pxref{BEGIN/END}).
 6481 For example:
 6482 
 6483 @example
 6484 awk 'BEGIN @{ RS = "u" @}
 6485      @{ print $0 @}' mail-list
 6486 @end example
 6487 
 6488 @noindent
 6489 changes the value of @code{RS} to @samp{u}, before reading any input.
 6490 The new value is a string whose first character is the letter ``u''; as a result, records
 6491 are separated by the letter ``u''.  Then the input file is read, and the second
 6492 rule in the @command{awk} program (the action with no pattern) prints each
 6493 record.  Because each @code{print} statement adds a newline at the end of
 6494 its output, this @command{awk} program copies the input
 6495 with each @samp{u} changed to a newline.  Here are the results of running
 6496 the program on @file{mail-list}:
 6497 
 6498 @example
 6499 @group
 6500 $ @kbd{awk 'BEGIN @{ RS = "u" @}}
 6501 >      @kbd{@{ print $0 @}' mail-list}
 6502 @end group
 6503 @print{} Amelia       555-5553     amelia.zodiac
 6504 @print{} sq
 6505 @print{} e@@gmail.com    F
 6506 @print{} Anthony      555-3412     anthony.assert
 6507 @print{} ro@@hotmail.com   A
 6508 @print{} Becky        555-7685     becky.algebrar
 6509 @print{} m@@gmail.com      A
 6510 @print{} Bill         555-1675     bill.drowning@@hotmail.com       A
 6511 @print{} Broderick    555-0542     broderick.aliq
 6512 @print{} otiens@@yahoo.com R
 6513 @print{} Camilla      555-2912     camilla.inf
 6514 @print{} sar
 6515 @print{} m@@skynet.be     R
 6516 @print{} Fabi
 6517 @print{} s       555-1234     fabi
 6518 @print{} s.
 6519 @print{} ndevicesim
 6520 @print{} s@@
 6521 @print{} cb.ed
 6522 @print{}     F
 6523 @print{} J
 6524 @print{} lie        555-6699     j
 6525 @print{} lie.perscr
 6526 @print{} tabor@@skeeve.com   F
 6527 @print{} Martin       555-6480     martin.codicib
 6528 @print{} s@@hotmail.com    A
 6529 @print{} Sam
 6530 @print{} el       555-3430     sam
 6531 @print{} el.lanceolis@@sh
 6532 @print{} .ed
 6533 @print{}         A
 6534 @print{} Jean-Pa
 6535 @print{} l    555-2127     jeanpa
 6536 @print{} l.campanor
 6537 @print{} m@@ny
 6538 @print{} .ed
 6539 @print{}      R
 6540 @print{}
 6541 @end example
 6542 
 6543 @noindent
 6544 Note that the entry for the name @samp{Bill} is not split.
 6545 In the original @value{DF}
 6546 (@pxref{Sample Data Files}),
 6547 the line looks like this:
 6548 
 6549 @example
 6550 Bill         555-1675     bill.drowning@@hotmail.com       A
 6551 @end example
 6552 
 6553 @noindent
 6554 It contains no @samp{u}, so there is no reason to split the record,
 6555 unlike the others, which each have one or more occurrences of the @samp{u}.
 6556 In fact, this record is treated as part of the previous record;
 6557 the newline separating them in the output
 6558 is the original newline in the @value{DF}, not the one added by
 6559 @command{awk} when it printed the record!
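
A much smaller input shows the same effect.  In this sketch (the input
text is invented), each record is printed inside square brackets so that
any newline stored within a record is visible:

```shell
# With RS = "u", the newlines in the input are ordinary record data.
out=$(printf 'xuy\nz\n' |
      awk 'BEGIN { RS = "u" } { printf "[%s]\n", $0 }')
printf '%s\n' "$out"
```

There are only two records.  The first is @samp{x}; the second contains
@samp{y}, a newline, @samp{z}, and the final newline, so its closing
bracket appears on a line by itself.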
 6560 
 6561 @cindex record separators @subentry changing
 6562 @cindex separators @subentry for records
 6563 Another way to change the record separator is on the command line,
 6564 using the variable-assignment feature
 6565 (@pxref{Other Arguments}):
 6566 
 6567 @example
 6568 awk '@{ print $0 @}' RS="u" mail-list
 6569 @end example
 6570 
 6571 @noindent
 6572 This sets @code{RS} to @samp{u} before processing @file{mail-list}.
 6573 
 6574 Using an alphabetic character such as @samp{u} for the record separator
 6575 is highly likely to produce strange results.
 6576 Using an unusual character such as @samp{/} is more likely to
 6577 produce correct behavior in the majority of cases, but there
 6578 are no guarantees. The moral is: Know Your Data.
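
As an illustration of the command-line form with a less common
separator, the following sketch writes a short slash-separated string to
a scratch file and then assigns @code{RS} as an argument placed before
the @value{FN}.  The file contents and variable names are assumptions
made for the example:

```shell
# A scratch file whose "records" are separated by slashes.
f=$(mktemp)
printf 'one/two/three' > "$f"

# The assignment RS="/" is processed before the file is opened.
out=$(awk '{ print NR ": " $0 }' RS="/" "$f")
rm -f "$f"
printf '%s\n' "$out"
```

This works only because the sample data is known not to contain a
@samp{/} anywhere else.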
 6579 
 6580 @command{gawk} allows @code{RS} to be a full regular expression
 6581 (discussed shortly; @pxref{gawk split records}).  Even so, using
 6582 a regular expression metacharacter, such as @samp{.} as the single
 6583 character in the value of @code{RS} has no special effect: it is
 6584 treated literally. This is required for backwards compatibility with
 6585 both Unix @command{awk} and with POSIX.
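
A quick way to check this is to use a period as the separator on
invented input.  If the period were treated as a regexp metacharacter,
every character would be a separator; instead, the input is split only
at the literal periods:

```shell
# A single "." in RS is a literal period, not "any character".
out=$(printf 'a.b.c' | awk 'BEGIN { RS = "." } { print NR ": " $0 }')
printf '%s\n' "$out"
```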
 6586 
 6587 When using regular characters as the record separator,
 6588 there is one unusual case that occurs when @command{gawk} is
 6589 being fully POSIX-compliant (@pxref{Options}).
 6590 Then, the following (extreme) pipeline prints a surprising @samp{1}:
 6591 
 6592 @example
 6593 $ @kbd{echo | gawk --posix 'BEGIN @{ RS = "a" @} ; @{ print NF @}'}
 6594 @print{} 1
 6595 @end example
 6596 
 6597 There is one field, consisting of a newline.  The value of the built-in
 6598 variable @code{NF} is the number of fields in the current record.
 6599 (In the normal case, @command{gawk} treats the newline as whitespace,
 6600 printing @samp{0} as the result. Most other versions of @command{awk}
 6601 also act this way.)
 6602 
 6603 @cindex dark corner @subentry input files
 6604 Reaching the end of an input file terminates the current input record,
 6605 even if the last character in the file is not the character in @code{RS}.
 6606 @value{DARKCORNER}
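
The following sketch demonstrates this with input that deliberately
lacks a final newline (the input is invented for the example):

```shell
# The input ends right after "b", with no trailing newline.
out=$(printf 'a\nb' |
      awk '{ print NR ": " $0 } END { print "total: " NR }')
printf '%s\n' "$out"
```

The text @samp{b} is still delivered as the second record.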
 6607 
 6608 @cindex empty strings @seeentry{null strings}
 6609 @cindex null strings
 6610 @cindex strings @subentry empty @seeentry{null strings}
 6611 The empty string @code{""} (a string without any characters)
 6612 has a special meaning
 6613 as the value of @code{RS}. It means that records are separated
 6614 by one or more blank lines and nothing else.
 6615 @xref{Multiple Line} for more details.
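
As a brief preview, this sketch reads two invented two-line paragraphs
separated by blank lines.  It relies on the fact that when @code{RS} is
the empty string, a newline also separates fields:

```shell
# Two paragraphs separated by two blank lines.
out=$(printf 'a\nb\n\n\nc\nd\n' |
      awk 'BEGIN { RS = "" } { print NR ": " $1, $2 }')
printf '%s\n' "$out"
```

Each paragraph becomes one record, and each of its lines becomes a field.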
 6616 
 6617 If you change the value of @code{RS} in the middle of an @command{awk} run,
 6618 the new value is used to delimit subsequent records, but the record
 6619 currently being processed, as well as records already processed, are not
 6620 affected.
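
A minimal sketch of this behavior follows; the input is invented.  The
first record is read using the default separator, and the program then
switches to a semicolon for the rest of the input:

```shell
# Record 1 ends at the newline; later records end at semicolons.
out=$(printf 'a\nb;c;d' |
      awk 'NR == 1 { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$out"
```

The first record is unchanged by the assignment, even though the
assignment happens before that record is printed.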
 6621 
 6622 @cindex @command{gawk} @subentry @code{RT} variable in
 6623 @cindex @code{RT} variable
 6624 @cindex records @subentry terminating
 6625 @cindex terminating records
 6626 @cindex differences in @command{awk} and @command{gawk} @subentry record separators
 6627 @cindex differences in @command{awk} and @command{gawk} @subentry @code{RS}/@code{RT} variables
 6628 @cindex regular expressions @subentry as record separators
 6629 @cindex record separators @subentry regular expressions as
 6630 @cindex separators @subentry for records @subentry regular expressions as
 6631 After the end of the record has been determined, @command{gawk}
 6632 sets the variable @code{RT} to the text in the input that matched
 6633 @code{RS}.
 6634 
 6635 @node gawk split records
 6636 @subsection Record Splitting with @command{gawk}
 6637 
 6638 @cindex common extensions @subentry @code{RS} as a regexp
 6639 @cindex extensions @subentry common @subentry @code{RS} as a regexp
 6640 When using @command{gawk}, the value of @code{RS} is not limited to a
 6641 one-character string.  If it contains more than one character, it is
 6642 treated as a regular expression
 6643 (@pxref{Regexp}). @value{COMMONEXT}
 6644 In general, each record
 6645 ends at the next string that matches the regular expression; the next
 6646 record starts at the end of the matching string.  This general rule is
 6647 actually at work in the usual case, where @code{RS} contains just a
 6648 newline: a record ends at the beginning of the next matching string (the
 6649 next newline in the input), and the following record starts just after
 6650 the end of this string (at the first character of the following line).
 6651 The newline, because it matches @code{RS}, is not part of either record.
 6652 
 6653 When @code{RS} is a single character, @code{RT}
 6654 contains the same single character. However, when @code{RS} is a
 6655 regular expression, @code{RT} contains
 6656 the actual input text that matched the regular expression.
 6657 
 6658 If the input file ends without any text matching @code{RS},
 6659 @command{gawk} sets @code{RT} to the null string.
 6660 
 6661 The following example illustrates both of these features.
 6662 It sets @code{RS} equal to a regular expression that
 6663 matches either a newline or a series of one or more uppercase letters
 6664 with optional leading and/or trailing whitespace:
 6665 
 6666 @example
 6667 @group
 6668 $ @kbd{echo record 1 AAAA record 2 BBBB record 3 |}
 6669 > @kbd{gawk 'BEGIN @{ RS = "\n|( *[[:upper:]]+ *)" @}}
 6670 >             @kbd{@{ print "Record =", $0,"and RT = [" RT "]" @}'}
 6671 @end group
 6672 @print{} Record = record 1 and RT = [ AAAA ]
 6673 @print{} Record = record 2 and RT = [ BBBB ]
 6674 @print{} Record = record 3 and RT = [
 6675 @print{} ]
 6676 @end example
 6677 
 6678 @noindent
 6679 The square brackets delineate the contents of @code{RT}, letting you
 6680 see the leading and trailing whitespace. The final value of
 6681 @code{RT} is a newline.
 6682 @xref{Simple Sed} for a more useful example
 6683 of @code{RS} as a regexp and @code{RT}.
 6684 
 6685 If you set @code{RS} to a regular expression that allows optional
 6686 trailing text, such as @samp{RS = "abc(XYZ)?"}, it is possible, due
 6687 to implementation constraints, that @command{gawk} may match the leading
 6688 part of the regular expression, but not the trailing part, particularly
 6689 if the input text that could match the trailing part is fairly long.
 6690 @command{gawk} attempts to avoid this problem, but currently, there's
 6691 no guarantee that this will never happen.
 6692 
 6693 @quotation NOTE
 6694 Remember that in @command{awk}, the @samp{^} and @samp{$} anchor
 6695 metacharacters match the beginning and end of a @emph{string}, and not
 6696 the beginning and end of a @emph{line}.  As a result, something like
 6697 @samp{RS = "^[[:upper:]]"} can only match at the beginning of a file.
 6698 This is because @command{gawk} views the input file as one long string
 6699 that happens to contain newline characters.
 6700 It is thus best to avoid anchor metacharacters in the value of @code{RS}.
 6701 @end quotation
 6702 
 6703 @cindex @command{gawk} @subentry @code{RT} variable in
 6704 @cindex @code{RT} variable
 6705 @cindex differences in @command{awk} and @command{gawk} @subentry @code{RS}/@code{RT} variables
 6706 The use of @code{RS} as a regular expression and the @code{RT}
 6707 variable are @command{gawk} extensions; they are not available in
 6708 compatibility mode
 6709 (@pxref{Options}).
 6710 In compatibility mode, only the first character of the value of
 6711 @code{RS} determines the end of the record.
 6712 
 6713 @cindex Brian Kernighan's @command{awk}
 6714 @command{mawk} has allowed @code{RS} to be a regexp for decades.
 6715 As of October, 2019, BWK @command{awk} also supports it.  Neither
 6716 version supplies @code{RT}, however.
 6717 
 6718 @sidebar @code{RS = "\0"} Is Not Portable
 6719 @cindex portability @subentry data files as single record
 6720 There are times when you might want to treat an entire @value{DF} as a
 6721 single record.  The only way to make this happen is to give @code{RS}
 6722 a value that you know doesn't occur in the input file.  This is hard
 6723 to do in a general way, such that a program always works for arbitrary
 6724 input files.
 6725 
You might think that for text files, the @sc{nul} character (the
character whose bits are all zero) is a good
value to use for @code{RS} in this case:
 6729 
 6730 @example
 6731 BEGIN @{ RS = "\0" @}  # whole file becomes one record?
 6732 @end example
 6733 
 6734 @cindex differences in @command{awk} and @command{gawk} @subentry strings @subentry storing
 6735 @command{gawk} in fact accepts this, and uses the @sc{nul}
 6736 character for the record separator.
This works for certain special files, such as @file{/proc/self/environ} on
GNU/Linux systems, where the @sc{nul} character is in fact the record separator.
 6739 However, this usage is @emph{not} portable
 6740 to most other @command{awk} implementations.
 6741 
 6742 @cindex dark corner @subentry strings, storing
 6743 Almost all other @command{awk} implementations@footnote{At least that we know
 6744 about.} store strings internally as C-style strings.  C strings use the
 6745 @sc{nul} character as the string terminator.  In effect, this means that
 6746 @samp{RS = "\0"} is the same as @samp{RS = ""}.
 6747 @value{DARKCORNER}
 6748 
 6749 It happens that recent versions of @command{mawk} can use the @sc{nul}
 6750 character as a record separator. However, this is a special case:
 6751 @command{mawk} does not allow embedded @sc{nul} characters in strings.
 6752 (This may change in a future version of @command{mawk}.)
 6753 
 6754 @cindex records @subentry treating files as
 6755 @cindex treating files, as single records
 6756 @cindex single records, treating files as
 6757 @xref{Readfile Function} for an interesting way to read
 6758 whole files.  If you are using @command{gawk}, see @ref{Extension Sample
 6759 Readfile} for another option.
 6760 @end sidebar
 6761 
 6762 @node Fields
 6763 @section Examining Fields
 6764 
 6765 @cindex examining fields
 6766 @cindex fields
 6767 @cindex accessing fields
 6768 @cindex fields @subentry examining
 6769 @cindex whitespace @subentry definition of
 6770 When @command{awk} reads an input record, the record is
 6771 automatically @dfn{parsed} or separated by the @command{awk} utility into chunks
 6772 called @dfn{fields}.  By default, fields are separated by @dfn{whitespace},
 6773 like words in a line.
 6774 Whitespace in @command{awk} means any string of one or more spaces,
 6775 TABs, or newlines; other characters
 6776 that are considered whitespace by other languages
 6777 (such as formfeed, vertical tab, etc.) are @emph{not} considered
 6778 whitespace by @command{awk}.
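
The following sketch shows default field splitting on an invented line
that mixes single spaces, multiple spaces, and a TAB:

```shell
# A run of spaces and TABs counts as a single separator.
out=$(printf 'alpha \t  beta gamma\n' |
      awk '{ print NF ": " $1 "|" $2 "|" $3 }')
printf '%s\n' "$out"    # 3: alpha|beta|gamma
```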
 6779 
 6780 The purpose of fields is to make it more convenient for you to refer to
 6781 these pieces of the record.  You don't have to use them---you can
 6782 operate on the whole record if you want---but fields are what make
 6783 simple @command{awk} programs so powerful.
 6784 
 6785 @cindex field operator @code{$}
 6786 @cindex @code{$} (dollar sign) @subentry @code{$} field operator
 6787 @cindex dollar sign (@code{$}) @subentry @code{$} field operator
 6788 @cindex field operators, dollar sign as
 6789 You use a dollar sign (@samp{$})
 6790 to refer to a field in an @command{awk} program,
 6791 followed by the number of the field you want.  Thus, @code{$1}
 6792 refers to the first field, @code{$2} to the second, and so on.
 6793 (Unlike in the Unix shells, the field numbers are not limited to single digits.
 6794 @code{$127} is the 127th field in the record.)
 6795 For example, suppose the following is a line of input:
 6796 
 6797 @example
 6798 This seems like a pretty nice example.
 6799 @end example
 6800 
 6801 @noindent
 6802 Here the first field, or @code{$1}, is @samp{This}, the second field, or
 6803 @code{$2}, is @samp{seems}, and so on.  Note that the last field,
 6804 @code{$7}, is @samp{example.}.  Because there is no space between the
 6805 @samp{e} and the @samp{.}, the period is considered part of the seventh
 6806 field.
 6807 
 6808 @cindex @code{NF} variable
 6809 @cindex fields @subentry number of
 6810 @code{NF} is a predefined variable whose value is the number of fields
 6811 in the current record.  @command{awk} automatically updates the value
 6812 of @code{NF} each time it reads a record.  No matter how many fields
 6813 there are, the last field in a record can be represented by @code{$NF}.
 6814 So, @code{$NF} is the same as @code{$7}, which is @samp{example.}.
 6815 If you try to reference a field beyond the last
 6816 one (such as @code{$8} when the record has only seven fields), you get
 6817 the empty string.  (If used in a numeric operation, you get zero.)
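
These points can be checked directly against the sample line.  The
sketch below prints @code{NF}, then @code{$NF}, then the nonexistent
@code{$8} as a string (inside brackets) and as a number:

```shell
out=$(echo 'This seems like a pretty nice example.' |
      awk '{ print NF; print $NF; print "[" $8 "]"; print $8 + 0 }')
printf '%s\n' "$out"
```

The four output lines are @samp{7}, @samp{example.}, @samp{[]}, and
@samp{0}.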
 6818 
 6819 The use of @code{$0}, which looks like a reference to the ``zeroth'' field, is
 6820 a special case: it represents the whole input record. Use it
 6821 when you are not interested in specific fields.
 6822 Here are some more examples:
 6823 
 6824 @example
 6825 $ @kbd{awk '$1 ~ /li/ @{ print $0 @}' mail-list}
 6826 @print{} Amelia       555-5553     amelia.zodiacusque@@gmail.com    F
 6827 @print{} Julie        555-6699     julie.perscrutabor@@skeeve.com   F
 6828 @end example
 6829 
 6830 @noindent
 6831 This example prints each record in the file @file{mail-list} whose first
 6832 field contains the string @samp{li}.
 6833 
 6834 By contrast, the following example looks for @samp{li} in @emph{the
 6835 entire record} and prints the first and last fields for each matching
 6836 input record:
 6837 
 6838 @example
 6839 $ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list}
 6840 @print{} Amelia F
 6841 @print{} Broderick R
 6842 @print{} Julie F
 6843 @print{} Samuel A
 6844 @end example
 6845 
 6846 @node Nonconstant Fields
 6847 @section Nonconstant Field Numbers
 6848 @cindex fields @subentry numbers
 6849 @cindex field numbers
 6850 
 6851 A field number need not be a constant.  Any expression in
 6852 the @command{awk} language can be used after a @samp{$} to refer to a
 6853 field.  The value of the expression specifies the field number.  If the
 6854 value is a string, rather than a number, it is converted to a number.
 6855 Consider this example:
 6856 
 6857 @example
 6858 awk '@{ print $NR @}'
 6859 @end example
 6860 
 6861 @noindent
 6862 Recall that @code{NR} is the number of records read so far: one in the
 6863 first record, two in the second, and so on.  So this example prints the first
 6864 field of the first record, the second field of the second record, and so
 6865 on.  For the twentieth record, field number 20 is printed; most likely,
 6866 the record has fewer than 20 fields, so this prints a blank line.
 6867 Here is another example of using expressions as field numbers:
 6868 
 6869 @example
 6870 awk '@{ print $(2*2) @}' mail-list
 6871 @end example
 6872 
 6873 @command{awk} evaluates the expression @samp{(2*2)} and uses
 6874 its value as the number of the field to print.  The @samp{*}
 6875 represents multiplication, so the expression @samp{2*2} evaluates to four.
 6876 The parentheses are used so that the multiplication is done before the
 6877 @samp{$} operation; they are necessary whenever there is a binary
 6878 operator@footnote{A @dfn{binary operator}, such as @samp{*} for
 6879 multiplication, is one that takes two operands. The distinction
 6880 is required because @command{awk} also has unary (one-operand)
 6881 and ternary (three-operand) operators.}
 6882 in the field-number expression.  This example, then, prints the
 6883 type of relationship (the fourth field) for every line of the file
 6884 @file{mail-list}.  (All of the @command{awk} operators are listed, in
 6885 order of decreasing precedence, in
 6886 @ref{Precedence}.)
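
The following sketch, on invented input, illustrates both points made
so far in this @value{SECTION}: using @code{NR} as a field number, and
the effect of leaving out the parentheses around a field-number
expression.  The shell variable names are arbitrary:

```shell
# $NR picks field 1 of record 1, field 2 of record 2, and so on.
by_nr=$(printf 'a b c\nd e f\ng h i\n' | awk '{ print $NR }')

# With parentheses: field number (1+1), that is, the second field.
grouped=$(echo '10 20 30' | awk '{ print $(1+1) }')

# Without parentheses: the first field, plus one.
ungrouped=$(echo '10 20 30' | awk '{ print $1+1 }')

printf '%s\n' "$by_nr" "$grouped" "$ungrouped"
```

The first command prints the diagonal @samp{a}, @samp{e}, @samp{i}.
The second prints @samp{20}, and the third prints @samp{11}.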
 6887 
 6888 If the field number you compute is zero, you get the entire record.
 6889 Thus, @samp{$(2-2)} has the same value as @code{$0}.  Negative field
 6890 numbers are not allowed; trying to reference one usually terminates
 6891 the program.  (The POSIX standard does not define
 6892 what happens when you reference a negative field number.  @command{gawk}
 6893 notices this and terminates your program.  Other @command{awk}
 6894 implementations may behave differently.)
 6895 
 6896 As mentioned in @ref{Fields},
 6897 @command{awk} stores the current record's number of fields in the built-in
 6898 variable @code{NF} (also @pxref{Built-in Variables}).  Thus, the expression
 6899 @code{$NF} is not a special feature---it is the direct consequence of
 6900 evaluating @code{NF} and using its value as a field number.
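
Because @code{NF} is an ordinary value, it can take part in a larger
field-number expression.  This sketch, on an invented record, prints the
next-to-last and last fields:

```shell
# NF is 4 here, so $(NF-1) is $3 and $NF is $4.
out=$(echo 'a b c d' | awk '{ print $(NF-1), $NF }')
printf '%s\n' "$out"    # c d
```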
 6901 
 6902 @node Changing Fields
 6903 @section Changing the Contents of a Field
 6904 
 6905 @cindex fields @subentry changing contents of
 6906 The contents of a field, as seen by @command{awk}, can be changed within an
 6907 @command{awk} program; this changes what @command{awk} perceives as the
 6908 current input record.  (The actual input is untouched; @command{awk} @emph{never}
 6909 modifies the input file.)
 6910 Consider the following example and its output:
 6911 
 6912 @example
 6913 $ @kbd{awk '@{ nboxes = $3 ; $3 = $3 - 10}
 6914 >        @kbd{print nboxes, $3 @}' inventory-shipped}
 6915 @print{} 25 15
 6916 @print{} 32 22
 6917 @print{} 24 14
 6918 @dots{}
 6919 @end example
 6920 
 6921 @noindent
 6922 The program first saves the original value of field three in the variable
 6923 @code{nboxes}.
 6924 The @samp{-} sign represents subtraction, so this program reassigns
 6925 field three, @code{$3}, as the original value of field three minus ten:
 6926 @samp{$3 - 10}.  (@xref{Arithmetic Ops}.)
 6927 Then it prints the original and new values for field three.
 6928 (Someone in the warehouse made a consistent mistake while inventorying
 6929 the red boxes.)
 6930 
 6931 For this to work, the text in @code{$3} must make sense
 6932 as a number; the string of characters must be converted to a number
 6933 for the computer to do arithmetic on it.  The number resulting
 6934 from the subtraction is converted back to a string of characters that
 6935 then becomes field three.
 6936 @xref{Conversion}.
 6937 
 6938 When the value of a field is changed (as perceived by @command{awk}), the
 6939 text of the input record is recalculated to contain the new field where
 6940 the old one was.  In other words, @code{$0} changes to reflect the altered
 6941 field.  Thus, this program
 6942 prints a copy of the input file, with 10 subtracted from the second
 6943 field of each line:
 6944 
 6945 @example
 6946 $ @kbd{awk '@{ $2 = $2 - 10; print $0 @}' inventory-shipped}
 6947 @print{} Jan 3 25 15 115
 6948 @print{} Feb 5 32 24 226
 6949 @print{} Mar 5 24 34 228
 6950 @dots{}
 6951 @end example
 6952 
 6953 It is also possible to assign contents to fields that are out
 6954 of range.  For example:
 6955 
 6956 @example
 6957 $ @kbd{awk '@{ $6 = ($5 + $4 + $3 + $2)}
 6958 > @kbd{       print $6 @}' inventory-shipped}
 6959 @print{} 168
 6960 @print{} 297
 6961 @print{} 301
 6962 @dots{}
 6963 @end example
 6964 
 6965 @cindex adding @subentry fields
 6966 @cindex fields @subentry adding
 6967 @noindent
 6968 We've just created @code{$6}, whose value is the sum of fields
 6969 @code{$2}, @code{$3}, @code{$4}, and @code{$5}.  The @samp{+} sign
 6970 represents addition.  For the file @file{inventory-shipped}, @code{$6}
 6971 represents the total number of parcels shipped for a particular month.
 6972 
 6973 Creating a new field changes @command{awk}'s internal copy of the current
 6974 input record, which is the value of @code{$0}.  Thus, if you do @samp{print $0}
 6975 after adding a field, the record printed includes the new field, with
 6976 the appropriate number of field separators between it and the previously
 6977 existing fields.
 6978 
 6979 @cindex @code{OFS} variable
 6980 @cindex output field separator @seeentry{@code{OFS} variable}
 6981 @cindex field separator @seealso{@code{OFS}}
 6982 This recomputation affects and is affected by
 6983 @code{NF} (the number of fields; @pxref{Fields}).
 6984 For example, the value of @code{NF} is set to the number of the highest
 6985 field you create.
 6986 The exact format of @code{$0} is also affected by a feature that has not been discussed yet:
 6987 the @dfn{output field separator}, @code{OFS},
 6988 used to separate the fields (@pxref{Output Separators}).
 6989 
 6990 Note, however, that merely @emph{referencing} an out-of-range field
 6991 does @emph{not} change the value of either @code{$0} or @code{NF}.
 6992 Referencing an out-of-range field only produces an empty string.  For
 6993 example:
 6994 
 6995 @example
 6996 if ($(NF+1) != "")
 6997     print "can't happen"
 6998 else
 6999     print "everything is normal"
 7000 @end example
 7001 
 7002 @noindent
 7003 should print @samp{everything is normal}, because @code{NF+1} is certain
 7004 to be out of range.  (@xref{If Statement}
 7005 for more information about @command{awk}'s @code{if-else} statements.
 7006 @xref{Typing and Comparison}
 7007 for more information about the @samp{!=} operator.)
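
The fragment above is not a complete program.  As a minimal sketch, the
following shell commands wrap the same idea in a runnable form.  The input
record @samp{a b c} and the shell variable @code{out} are made up for
illustration:

```shell
# Reference a field beyond NF, then check that NF and $0 are unchanged.
out=$(echo 'a b c' | awk '{
    x = $(NF + 2)          # out-of-range reference: yields the empty string
    print "x is [" x "]"
    print "NF =", NF       # NF is not changed by the reference
    print $0               # neither is the record
}')
printf '%s\n' "$out"
```

This should print @samp{x is []}, @samp{NF = 3}, and @samp{a b c}, showing
that only an @emph{assignment} to an out-of-range field extends the record.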
 7008 
 7009 It is important to note that making an assignment to an existing field
 7010 changes the
 7011 value of @code{$0} but does not change the value of @code{NF},
 7012 even when you assign the empty string to a field.  For example:
 7013 
 7014 @example
 7015 $ @kbd{echo a b c d | awk '@{ OFS = ":"; $2 = ""}
 7016 >                       @kbd{print $0; print NF @}'}
 7017 @print{} a::c:d
 7018 @print{} 4
 7019 @end example
 7020 
 7021 @noindent
 7022 The field is still there; it just has an empty value, delimited by
 7023 the two colons between @samp{a} and @samp{c}.
 7024 This example shows what happens if you create a new field:
 7025 
 7026 @example
 7027 $ @kbd{echo a b c d | awk '@{ OFS = ":"; $2 = ""; $6 = "new"}
 7028 >                       @kbd{print $0; print NF @}'}
 7029 @print{} a::c:d::new
 7030 @print{} 6
 7031 @end example
 7032 
 7033 @noindent
 7034 The intervening field, @code{$5}, is created with an empty value
 7035 (indicated by the second pair of adjacent colons),
 7036 and @code{NF} is updated with the value six.
 7037 
 7038 @cindex dark corner @subentry @code{NF} variable, decrementing
 7039 @cindex @code{NF} variable @subentry decrementing
 7040 Decrementing @code{NF} throws away the values of the fields
 7041 after the new value of @code{NF} and recomputes @code{$0}.
 7042 @value{DARKCORNER}
 7043 Here is an example:
 7044 
 7045 @example
 7046 $ @kbd{echo a b c d e f | awk '@{ print "NF =", NF;}
 7047 > @kbd{                          NF = 3; print $0 @}'}
 7048 @print{} NF = 6
 7049 @print{} a b c
 7050 @end example
 7051 
 7052 @cindex portability @subentry @code{NF} variable, decrementing
 7053 @quotation CAUTION
 7054 Some versions of @command{awk} don't
 7055 rebuild @code{$0} when @code{NF} is decremented.
 7056 Until August, 2018, this included BWK @command{awk}; fortunately
 7057 that version now handles this correctly.
 7058 @end quotation
 7059 
 7060 Finally, there are times when it is convenient to force
 7061 @command{awk} to rebuild the entire record, using the current
 7062 values of the fields and @code{OFS}.  To do this, use the
 7063 seemingly innocuous assignment:
 7064 
 7065 @example
 7066 @group
 7067 $1 = $1   # force record to be reconstituted
 7068 print $0  # or whatever else with $0
 7069 @end group
 7070 @end example
 7071 
 7072 @noindent
 7073 This forces @command{awk} to rebuild the record.  It does help
 7074 to add a comment, as we've shown here.
 7075 
 7076 There is a flip side to the relationship between @code{$0} and
 7077 the fields.  Any assignment to @code{$0} causes the record to be
 7078 reparsed into fields using the @emph{current} value of @code{FS}.
 7079 This also applies to any built-in function that updates @code{$0},
 7080 such as @code{sub()} and @code{gsub()}
 7081 (@pxref{String Functions}).
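
As a small sketch of this reparsing (the input @samp{a:b:c} and the shell
variable @code{out} are assumptions chosen for the demonstration), the
following changes @code{FS} inside a rule and then assigns @code{$0} to
itself:

```shell
# Assigning to $0 splits the record again, using the current FS.
out=$(echo 'a:b:c' | awk '{
    print "before:", NF    # default FS, so the record is one field
    FS = ":"
    $0 = $0                # reparse with the new separator
    print "after:", NF, $2
}')
printf '%s\n' "$out"
```

We would expect @samp{before: 1} followed by @samp{after: 3 b}: the text of
the record is the same, but it has been split again with the colon as the
separator.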
 7082 
 7083 @sidebar Understanding @code{$0}
 7084 
 7085 It is important to remember that @code{$0} is the @emph{full}
 7086 record, exactly as it was read from the input.  This includes
 7087 any leading or trailing whitespace, and the exact whitespace (or other
 7088 characters) that separates the fields.
 7089 
 7090 It is a common error to try to change the field separators
 7091 in a record simply by setting @code{FS} and @code{OFS}, and then
 7092 expecting a plain @samp{print} or @samp{print $0} to print the
 7093 modified record.
 7094 
 7095 But this does not work, because nothing was done to change the record
 7096 itself.  Instead, you must force the record to be rebuilt, typically
 7097 with a statement such as @samp{$1 = $1}, as described earlier.
 7098 @end sidebar
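
The point of the sidebar can be sketched with a one-line input.  The record
@samp{a:b:c} and the shell variable @code{out} are invented for this
illustration:

```shell
# Setting FS and OFS alone does not change $0; assigning a field does.
out=$(echo 'a:b:c' | awk -F: -v OFS=, '{
    print                  # record text is untouched
    $1 = $1                # rebuild $0 from the fields, joined by OFS
    print                  # now the separators are commas
}')
printf '%s\n' "$out"
```

The first @code{print} should show @samp{a:b:c} and the second
@samp{a,b,c}.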
 7099 
 7100 
 7101 @node Field Separators
 7102 @section Specifying How Fields Are Separated
 7103 
 7104 @menu
 7105 * Default Field Splitting::      How fields are normally separated.
 7106 * Regexp Field Splitting::       Using regexps as the field separator.
 7107 * Single Character Fields::      Making each character a separate field.
 7108 * Command Line Field Separator:: Setting @code{FS} from the command line.
 7109 * Full Line Fields::             Making the full line be a single field.
 7110 * Field Splitting Summary::      Some final points and a summary table.
 7111 @end menu
 7112 
 7113 @cindex @code{FS} variable
 7114 @cindex fields @subentry separating
 7115 @cindex field separator
 7117 The @dfn{field separator}, which is either a single character or a regular
 7118 expression, controls the way @command{awk} splits an input record into fields.
 7119 @command{awk} scans the input record for character sequences that
 7120 match the separator; the fields themselves are the text between the matches.
 7121 
 7122 In the examples that follow, we use the bullet symbol (@bullet{}) to
 7123 represent spaces in the output.
 7124 If the field separator is @samp{oo}, then the following line:
 7125 
 7126 @example
 7127 moo goo gai pan
 7128 @end example
 7129 
 7130 @noindent
 7131 is split into three fields: @samp{m}, @samp{@bullet{}g}, and
 7132 @samp{@bullet{}gai@bullet{}pan}.
 7133 Note the leading spaces in the values of the second and third fields.
 7134 
 7135 @cindex troubleshooting @subentry @command{awk} uses @code{FS} not @code{IFS}
 7136 The field separator is represented by the predefined variable @code{FS}.
 7137 Shell programmers take note:  @command{awk} does @emph{not} use the
 7138 name @code{IFS} that is used by the POSIX-compliant shells (such as
 7139 the Unix Bourne shell, @command{sh}, or Bash).
 7140 
 7141 @cindex @code{FS} variable @subentry changing value of
 7142 The value of @code{FS} can be changed in the @command{awk} program with the
 7143 assignment operator, @samp{=} (@pxref{Assignment Ops}).
 7144 Often, the right time to do this is at the beginning of execution
 7145 before any input has been processed, so that the very first record
 7146 is read with the proper separator.  To do this, use the special
 7147 @code{BEGIN} pattern
 7148 (@pxref{BEGIN/END}).
 7149 For example, here we set the value of @code{FS} to the string
 7150 @code{","}:
 7151 
 7152 @example
 7153 awk 'BEGIN @{ FS = "," @} ; @{ print $2 @}'
 7154 @end example
 7155 
 7156 @cindex @code{BEGIN} pattern
 7157 @noindent
 7158 Given the input line:
 7159 
 7160 @example
 7161 John Q. Smith, 29 Oak St., Walamazoo, MI 42139
 7162 @end example
 7163 
 7164 @noindent
 7165 this @command{awk} program extracts and prints the string
 7166 @samp{@bullet{}29@bullet{}Oak@bullet{}St.}.
 7167 
 7168 @cindex field separator @subentry choice of
 7169 @cindex regular expressions @subentry as field separators
 7170 @cindex field separator @subentry regular expression as
 7171 Sometimes the input data contains separator characters that don't
 7172 separate fields the way you thought they would.  For instance, the
 7173 person's name in the example we just used might have a title or
 7174 suffix attached, such as:
 7175 
 7176 @example
 7177 John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
 7178 @end example
 7179 
 7180 @noindent
 7181 The same program would extract @samp{@bullet{}LXIX} instead of
 7182 @samp{@bullet{}29@bullet{}Oak@bullet{}St.}.
 7183 If you were expecting the program to print the
 7184 address, you would be surprised.  The moral is to choose your data layout and
 7185 separator characters carefully to prevent such problems.
 7186 (If the data is not in a form that is easy to process, perhaps you
 7187 can massage it first with a separate @command{awk} program.)
 7188 
 7189 
 7190 @node Default Field Splitting
 7191 @subsection Whitespace Normally Separates Fields
 7192 
 7193 @cindex field separator @subentry whitespace as
 7194 @cindex whitespace @subentry as field separators
 7195 @cindex field separator @subentry @code{FS} variable and
 7196 @cindex separators @subentry field @subentry @code{FS} variable and
 7197 Fields are normally separated by whitespace sequences
 7198 (spaces, TABs, and newlines), not by single spaces.  Two spaces in a row do not
 7199 delimit an empty field.  The default value of the field separator @code{FS}
 7200 is a string containing a single space, @w{@code{" "}}.  If @command{awk}
 7201 interpreted this value in the usual way, each space character would separate
 7202 fields, so two spaces in a row would make an empty field between them.
 7203 The reason this does not happen is that a single space as the value of
 7204 @code{FS} is a special case---it is taken to specify the default manner
 7205 of delimiting fields.
 7206 
 7207 If @code{FS} is any other single character, such as @code{","}, then
 7208 each occurrence of that character separates two fields.  Two consecutive
 7209 occurrences delimit an empty field.  If the character occurs at the
 7210 beginning or the end of the line, that too delimits an empty field.  The
 7211 space character is the only single character that does not follow these
 7212 rules.
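
To see these rules at work, here is a short sketch.  The sample record
@samp{,a,,b,} and the shell variable @code{out} are made up; the record has
a leading comma, a trailing comma, and two adjacent commas:

```shell
# Every comma separates two fields, so empty fields appear at the
# start, in the middle, and at the end of the record.
out=$(echo ',a,,b,' | awk -F, '{
    print "NF =", NF
    for (i = 1; i <= NF; i++)
        printf "field %d = [%s]\n", i, $i
}')
printf '%s\n' "$out"
```

This should report five fields, of which the first, third, and fifth are
empty.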
 7213 
 7214 @node Regexp Field Splitting
 7215 @subsection Using Regular Expressions to Separate Fields
 7216 
 7217 @cindex regular expressions @subentry as field separators
 7218 @cindex field separator @subentry regular expression as
 7219 The previous @value{SUBSECTION}
 7220 discussed the use of single characters or simple strings as the
 7221 value of @code{FS}.
 7222 More generally, the value of @code{FS} may be a string containing any
 7223 regular expression.  In this case, each match in the record for the regular
 7224 expression separates fields.  For example, the assignment:
 7225 
 7226 @example
 7227 FS = ", \t"
 7228 @end example
 7229 
 7230 @noindent
 7231 makes every area of an input line that consists of a comma followed by a
 7232 space and a TAB into a field separator.
 7233 @ifinfo
 7234 (@samp{\t}
 7235 is an @dfn{escape sequence} that stands for a TAB;
 7236 @pxref{Escape Sequences},
 7237 for the complete list of similar escape sequences.)
 7238 @end ifinfo
 7239 
 7240 For a less trivial example of a regular expression, try using
 7241 single spaces to separate fields the way single commas are used.
 7242 @code{FS} can be set to @w{@code{"[@ ]"}} (left bracket, space, right
 7243 bracket).  This regular expression matches a single space and nothing else
 7244 (@pxref{Regexp}).
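
A quick way to compare the two behaviors is sketched next.  The input
@samp{a  b} (two spaces between the letters) and the shell variable names
are assumptions made for the example:

```shell
# Default splitting treats the run of two spaces as one separator;
# the bracket expression treats each space as its own separator.
default_nf=$(echo 'a  b' | awk '{ print NF }')
single_nf=$(echo 'a  b' | awk -F'[ ]' '{ print NF }')
echo "default: $default_nf, single-space regexp: $single_nf"
```

With the default @code{FS} there should be two fields; with
@w{@code{"[@ ]"}} there should be three, the middle one empty.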
 7245 
 7246 There is an important difference between the two cases of @samp{FS = @w{" "}}
 7247 (a single space) and @samp{FS = @w{"[ \t\n]+"}}
 7248 (a regular expression matching one or more spaces, TABs, or newlines).
 7249 For both values of @code{FS}, fields are separated by @dfn{runs}
 7250 (multiple adjacent occurrences) of spaces, TABs,
 7251 and/or newlines.  However, when the value of @code{FS} is @w{@code{" "}},
 7252 @command{awk} first strips leading and trailing whitespace from
 7253 the record and then decides where the fields are.
 7254 For example, the following pipeline prints @samp{b}:
 7255 
 7256 @example
 7257 $ @kbd{echo ' a b c d ' | awk '@{ print $2 @}'}
 7258 @print{} b
 7259 @end example
 7260 
 7261 @noindent
 7262 However, this pipeline prints @samp{a} (note the extra spaces around
 7263 each letter):
 7264 
 7265 @example
 7266 $ @kbd{echo ' a  b  c  d ' | awk 'BEGIN @{ FS = "[ \t\n]+" @}}
 7267 >                                  @kbd{@{ print $2 @}'}
 7268 @print{} a
 7269 @end example
 7270 
 7271 @noindent
 7272 @cindex null strings
 7273 @cindex strings @subentry null
 7274 In this case, the first field is null, or empty.
 7275 
 7276 The stripping of leading and trailing whitespace also comes into
 7277 play whenever @code{$0} is recomputed.  For instance, study this pipeline:
 7278 
 7279 @example
 7280 $ @kbd{echo '   a b c d' | awk '@{ print; $2 = $2; print @}'}
 7281 @print{}    a b c d
 7282 @print{} a b c d
 7283 @end example
 7284 
 7285 @noindent
 7286 The first @code{print} statement prints the record as it was read,
 7287 with leading whitespace intact.  The assignment to @code{$2} rebuilds
 7288 @code{$0} by concatenating @code{$1} through @code{$NF} together,
 7289 separated by the value of @code{OFS} (which is a space by default).
 7290 Because the leading whitespace was ignored when finding @code{$1},
 7291 it is not part of the new @code{$0}.  Finally, the last @code{print}
 7292 statement prints the new @code{$0}.
 7293 
 7294 @cindex @code{FS} variable @subentry containing @code{^}
 7295 @cindex @code{^} (caret) @subentry in @code{FS}
 7296 @cindex dark corner @subentry @code{^}, in @code{FS}
 7297 There is an additional subtlety to be aware of when using regular expressions
 7298 for field splitting.
 7299 It is not well specified in the POSIX standard, or anywhere else, what @samp{^}
 7300 means when splitting fields.  Does the @samp{^} match only at the beginning of
 7301 the entire record? Or is each field separator a new string?  It turns out that
 7302 different @command{awk} versions answer this question differently, and you
 7303 should not rely on any specific behavior in your programs.
 7304 @value{DARKCORNER}
 7305 
 7306 @cindex Brian Kernighan's @command{awk}
 7307 As a point of information, BWK @command{awk} allows @samp{^}
 7308 to match only at the beginning of the record. @command{gawk}
 7309 also works this way. For example:
 7310 
 7311 @example
 7312 $ @kbd{echo 'xxAA  xxBxx  C' |}
 7313 > @kbd{gawk -F '(^x+)|( +)' '@{ for (i = 1; i <= NF; i++)}
 7314 > @kbd{                            printf "-->%s<--\n", $i @}'}
 7315 @print{} --><--
 7316 @print{} -->AA<--
 7317 @print{} -->xxBxx<--
 7318 @print{} -->C<--
 7319 @end example
 7320 
 7321 @node Single Character Fields
 7322 @subsection Making Each Character a Separate Field
 7323 
 7324 @cindex common extensions @subentry single character fields
 7325 @cindex extensions @subentry common @subentry single character fields
 7326 @cindex differences in @command{awk} and @command{gawk} @subentry single-character fields
 7327 @cindex single-character fields
 7328 @cindex fields @subentry single-character
 7329 There are times when you may want to examine each character
 7330 of a record separately.  This can be done in @command{gawk} by
 7331 simply assigning the null string (@code{""}) to @code{FS}. @value{COMMONEXT}
 7332 In this case,
 7333 each individual character in the record becomes a separate field.
 7334 For example:
 7335 
 7336 @example
 7337 $ @kbd{echo a b | gawk 'BEGIN @{ FS = "" @}}
 7338 >                  @kbd{@{}
 7339 >                      @kbd{for (i = 1; i <= NF; i = i + 1)}
 7340 >                          @kbd{print "Field", i, "is", $i}
 7341 >                  @kbd{@}'}
 7342 @print{} Field 1 is a
 7343 @print{} Field 2 is
 7344 @print{} Field 3 is b
 7345 @end example
 7346 
 7347 @cindex dark corner @subentry @code{FS} as null string
 7348 @cindex @code{FS} variable @subentry null string as
 7349 Traditionally, the behavior of @code{FS} equal to @code{""} was not defined.
 7350 In this case, most versions of Unix @command{awk} simply treat the entire record
 7351 as only having one field.
 7352 @value{DARKCORNER}
 7353 In compatibility mode
 7354 (@pxref{Options}),
 7355 if @code{FS} is the null string, then @command{gawk} also
 7356 behaves this way.
 7357 
 7358 @node Command Line Field Separator
 7359 @subsection Setting @code{FS} from the Command Line
 7360 @cindex @option{-F} option @subentry command-line
 7361 @cindex field separator @subentry on command line
 7362 @cindex command line @subentry @code{FS} on, setting
 7363 @cindex @code{FS} variable @subentry setting from command line
 7364 
 7365 @code{FS} can be set on the command line.  Use the @option{-F} option to
 7366 do so.  For example:
 7367 
 7368 @example
 7369 awk -F, '@var{program}' @var{input-files}
 7370 @end example
 7371 
 7372 @noindent
 7373 sets @code{FS} to the @samp{,} character.  Notice that the option uses
 7374 an uppercase @samp{F} instead of a lowercase @samp{f}. The latter
 7375 option (@option{-f}) specifies a file containing an @command{awk} program.
 7376 
 7377 The value used for the argument to @option{-F} is processed in exactly the
 7378 same way as assignments to the predefined variable @code{FS}.
 7379 Any special characters in the field separator must be escaped
 7380 appropriately.  For example, to use a @samp{\} as the field separator
 7381 on the command line, you would have to type:
 7382 
 7383 @example
 7384 # same as FS = "\\"
 7385 awk -F\\\\ '@dots{}' files @dots{}
 7386 @end example
 7387 
 7388 @noindent
 7389 @cindex field separator @subentry backslash (@code{\}) as
 7390 @cindex @code{\} (backslash) @subentry as field separator
 7391 @cindex backslash (@code{\}) @subentry as field separator
 7392 Because @samp{\} is used for quoting in the shell, @command{awk} sees
 7393 @samp{-F\\}.  Then @command{awk} processes the @samp{\\} for escape
 7394 characters (@pxref{Escape Sequences}), finally yielding
 7395 a single @samp{\} to use for the field separator.
 7396 
 7397 @c @cindex historical features
 7398 As a special case, in compatibility mode
 7399 (@pxref{Options}),
 7400 if the argument to @option{-F} is @samp{t}, then @code{FS} is set to
 7401 the TAB character.  If you type @samp{-F\t} at the
 7402 shell, without any quotes, the @samp{\} gets deleted, so @command{awk}
 7403 figures that you really want your fields to be separated with TABs and
 7404 not @samp{t}s.  Use @samp{-v FS="t"} or @samp{-F"[t]"} on the command line
 7405 if you really do want to separate your fields with @samp{t}s.
 7406 Use @samp{-F '\t'} when not in compatibility mode to specify that TABs
 7407 separate fields.
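
For instance, the following sketch (outside of compatibility mode, with
input invented for the purpose) splits a record on TABs while leaving the
spaces inside each field alone:

```shell
# The record has two TAB-separated fields, each containing a space.
out=$(printf 'one two\tthree four\n' | awk -F'\t' '{ print NF; print $2 }')
printf '%s\n' "$out"
```

We would expect @samp{2} and then @samp{three four}.  The single quotes
keep the shell from removing the backslash before @command{awk} sees it.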
 7408 
 7409 As an example, let's use an @command{awk} program file called @file{edu.awk}
 7410 that contains the pattern @code{/edu/} and the action @samp{print $1}:
 7411 
 7412 @example
 7413 /edu/   @{ print $1 @}
 7414 @end example
 7415 
 7416 Let's also set @code{FS} to be the @samp{-} character and run the
 7417 program on the file @file{mail-list}.  The following command prints a
 7418 list of the names of the people that work at or attend a university, and
 7419 the first three digits of their phone numbers:
 7420 
 7421 @example
 7422 $ @kbd{awk -F- -f edu.awk mail-list}
 7423 @print{} Fabius       555
 7424 @print{} Samuel       555
 7425 @print{} Jean
 7426 @end example
 7427 
 7428 @noindent
 7429 Note the third line of output.  The corresponding line
 7430 in the original file looked like this:
 7431 
 7432 @example
 7433 Jean-Paul    555-2127     jeanpaul.campanorum@@nyu.edu     R
 7434 @end example
 7435 
 7436 The @samp{-} as part of the person's name was used as the field
 7437 separator, instead of the @samp{-} in the phone number that was
 7438 originally intended.  This demonstrates why you have to be careful in
 7439 choosing your field and record separators.
 7440 
 7441 @cindex Unix @command{awk} @subentry password files, field separators and
 7442 Perhaps the most common use of a single character as the field separator
 7443 occurs when processing the Unix system password file.  On many Unix
 7444 systems, each user has a separate entry in the system password file, with one
 7445 line per user.  The information in these lines is separated by colons.
 7446 The first field is the user's login name and the second is the user's
 7447 encrypted or shadow password.  (A shadow password is indicated by the
 7448 presence of a single @samp{x} in the second field.)  A password file
 7449 entry might look like this:
 7450 
 7451 @cindex Robbins @subentry Arnold
 7452 @example
 7453 arnold:x:2076:10:Arnold Robbins:/home/arnold:/bin/bash
 7454 @end example
 7455 
 7456 The following program searches the system password file and prints
 7457 the entries for users whose full name is not indicated:
 7458 
 7459 @example
 7460 awk -F: '$5 == ""' /etc/passwd
 7461 @end example
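
To try the same test without reading the real password file, the sketch
below feeds @command{awk} two sample entries.  The @samp{guest} entry is
hypothetical, and the program prints only the login name instead of the
whole entry:

```shell
# Two colon-separated entries; the second has an empty fifth field.
out=$(printf '%s\n' \
    'arnold:x:2076:10:Arnold Robbins:/home/arnold:/bin/bash' \
    'guest:x:2077:10::/home/guest:/bin/sh' |
    awk -F: '$5 == "" { print $1 }')
printf '%s\n' "$out"
```

Only @samp{guest} should be printed, because the two adjacent colons in
that entry delimit an empty full-name field.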
 7462 
 7463 @node Full Line Fields
 7464 @subsection Making the Full Line Be a Single Field
 7465 
 7466 Occasionally, it's useful to treat the whole input line as a
 7467 single field.  This can be done easily and portably simply by
 7468 setting @code{FS} to @code{"\n"} (a newline):@footnote{Thanks to
 7469 Andrew Schorr for this tip.}
 7470 
 7471 @example
 7472 awk -F'\n' '@var{program}' @var{files @dots{}}
 7473 @end example
 7474 
 7475 @noindent
 7476 When you do this, @code{$1} is the same as @code{$0}.
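
A short check of this claim might look like the following sketch.  The
sample record and the variable names @code{same} and @code{out} are made up
for illustration:

```shell
# With a newline as FS, nothing inside the line acts as a separator,
# so leading, trailing, and embedded spaces all stay in $1.
out=$(echo '  a b  c ' | awk -F'\n' '{
    same = ($1 == $0) ? "yes" : "no"
    print "NF =", NF
    print "identical:", same
}')
printf '%s\n' "$out"
```

This should print @samp{NF = 1} and @samp{identical: yes}.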
 7477 
 7478 @sidebar Changing @code{FS} Does Not Affect the Fields
 7479 
 7480 @cindex POSIX @command{awk} @subentry field separators and
 7481 @cindex field separator @subentry POSIX and
 7482 According to the POSIX standard, @command{awk} is supposed to behave
 7483 as if each record is split into fields at the time it is read.
 7484 In particular, this means that if you change the value of @code{FS}
 7485 after a record is read, the values of the fields (i.e., how they were split)
 7486 should reflect the old value of @code{FS}, not the new one.
 7487 
 7488 @cindex dark corner @subentry field separators
 7489 @cindex @command{sed} utility
 7490 @cindex stream editors
 7491 However, many older implementations of @command{awk} do not work this way.  Instead,
 7492 they defer splitting the fields until a field is actually
 7493 referenced.  The fields are split
 7494 using the @emph{current} value of @code{FS}!
 7495 @value{DARKCORNER}
 7496 This behavior can be difficult
 7497 to diagnose. The following example illustrates the difference
 7498 between the two methods:
 7499 
 7500 @example
 7501 sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}'
 7502 @end example
 7503 
 7504 @noindent
 7505 which usually prints:
 7506 
 7507 @example
 7508 root
 7509 @end example
 7510 
 7511 @noindent
 7512 on an incorrect implementation of @command{awk}, while @command{gawk}
 7513 prints the full first line of the file, something like:
 7514 
 7515 @example
 7516 root:x:0:0:Root:/:
 7517 @end example
 7518 
 7519 (The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.''
 7520 Its behavior is also defined by the POSIX standard.}
 7521 command prints just the first line of @file{/etc/passwd}.)
 7522 @end sidebar
 7523 
 7524 @node Field Splitting Summary
 7525 @subsection Field-Splitting Summary
 7526 
 7527 It is important to remember that when you assign a string constant
 7528 as the value of @code{FS}, it undergoes normal @command{awk} string
 7529 processing.  For example, with Unix @command{awk} and @command{gawk},
 7530 the assignment @samp{FS = "\.."} assigns the character string @code{".."}
 7531 to @code{FS} (the backslash is stripped).  This creates a regexp meaning
 7532 ``fields are separated by occurrences of any two characters.''
 7533 If instead you want fields to be separated by a literal period followed
 7534 by any single character, use @samp{FS = "\\.."}.
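
As a sketch of the doubled-backslash form (the input @samp{a.xb.yc} is
invented so that each period is followed by one throwaway character):

```shell
# FS is the regexp \.. : a literal period followed by any one character.
out=$(echo 'a.xb.yc' | awk 'BEGIN { FS = "\\.." } { print NF; print $1, $2, $3 }')
printf '%s\n' "$out"
```

The separators here are @samp{.x} and @samp{.y}, so we would expect
@samp{3} followed by @samp{a b c}.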
 7535 
 7536 The following list summarizes how fields are split, based on the value
 7537 of @code{FS} (@samp{==} means ``is equal to''):
 7538 
 7539 @table @code
 7540 @item FS == " "
 7541 Fields are separated by runs of whitespace.  Leading and trailing
 7542 whitespace are ignored.  This is the default.
 7543 
 7544 @item FS == @var{any other single character}
 7545 Fields are separated by each occurrence of the character.  Multiple
 7546 successive occurrences delimit empty fields, as do leading and
 7547 trailing occurrences.
 7548 The character can even be a regexp metacharacter; it does not need
 7549 to be escaped.
 7550 
 7551 @item FS == @var{regexp}
 7552 Fields are separated by occurrences of characters that match @var{regexp}.
 7553 Leading and trailing matches of @var{regexp} delimit empty fields.
 7554 
 7555 @item FS == ""
 7556 Each individual character in the record becomes a separate field.
 7557 (This is a common extension; it is not specified by the POSIX standard.)
 7558 @end table
 7559 
 7560 @sidebar @code{FS} and @code{IGNORECASE}
 7561 
 7562 The @code{IGNORECASE} variable
 7563 (@pxref{User-modified})
 7564 affects field splitting @emph{only} when the value of @code{FS} is a regexp.
 7565 It has no effect when @code{FS} is a single character, even if
 7566 that character is a letter.  Thus, in the following code:
 7567 
 7568 @example
 7569 FS = "c"
 7570 IGNORECASE = 1
 7571 $0 = "aCa"
 7572 print $1
 7573 @end example
 7574 
 7575 @noindent
 7576 the output is @samp{aCa}.  If you really want to split fields on an
 7577 alphabetic character while ignoring case, use a regexp that will
 7578 do it for you (e.g., @samp{FS = "[c]"}).  In this case, @code{IGNORECASE}
 7579 will take effect.
 7580 @end sidebar
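
The two cases can be compared side by side with a sketch such as the
following.  It requires @command{gawk}, because @code{IGNORECASE} is a
@command{gawk} extension; the shell variable names are assumptions:

```shell
# Single-character FS: IGNORECASE is not consulted, so "C" does not split.
plain=$(echo 'aCa' | gawk 'BEGIN { IGNORECASE = 1; FS = "c" }   { print NF, $1 }')
# Bracket expression: FS is a regexp, so IGNORECASE applies.
regex=$(echo 'aCa' | gawk 'BEGIN { IGNORECASE = 1; FS = "[c]" } { print NF, $1 }')
echo "single character: $plain"
echo "regexp:           $regex"
```

Based on the description above, the first command should report one field
(@samp{aCa}) and the second should report two, the first of which is
@samp{a}.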
 7581 
 7582 
 7583 @node Constant Size
 7584 @section Reading Fixed-Width Data
 7585 
 7586 @cindex data, fixed-width
 7587 @cindex fixed-width data
 7588 @cindex advanced features @subentry fixed-width data
 7589 
 7590 @c O'Reilly doesn't like it as a note the first thing in the section.
 7591 This @value{SECTION} discusses an advanced
 7592 feature of @command{gawk}.  If you are a novice @command{awk} user,
 7593 you might want to skip it on the first reading.
 7594 
 7595 @command{gawk} provides a facility for dealing with fixed-width fields
 7596 with no distinctive field separator. We discuss this feature in
 7597 the following @value{SUBSECTION}s.
 7598 
 7599 @menu
 7600 * Fixed width data::            Processing fixed-width data.
 7601 * Skipping intervening::        Skipping intervening fields.
 7602 * Allowing trailing data::      Capturing optional trailing data.
 7603 * Fields with fixed data::      Field values with fixed-width data.
 7604 @end menu
 7605 
 7606 @node Fixed width data
 7607 @subsection Processing Fixed-Width Data
 7608 
 7609 An example of fixed-width data would be the input for old Fortran programs
 7610 where numbers are run together, or the output of programs that did not
 7611 anticipate the use of their output as input for other programs.
 7612 
 7613 An example of the latter is a table where all the columns are lined up
 7614 by the use of a variable number of spaces and @emph{empty fields are
 7615 just spaces}.  Clearly, @command{awk}'s normal field splitting based
 7616 on @code{FS} does not work well in this case.  Although a portable
 7617 @command{awk} program can use a series of @code{substr()} calls on
 7618 @code{$0} (@pxref{String Functions}), this is awkward and inefficient
 7619 for a large number of fields.
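
For comparison, here is a minimal sketch of the portable @code{substr()}
approach.  The column layout (9, 10, and the remaining characters) and the
sample values are assumptions made for this illustration; the shell's
@command{printf} builds the fixed-width record:

```shell
# Build one fixed-width record, then cut it apart by column position.
out=$(printf '%-9s%-10s%s\n' dave ttyD4 9:47pm |
    awk '{
        user  = substr($0, 1, 9)     # columns 1-9
        tty   = substr($0, 10, 10)   # columns 10-19
        login = substr($0, 20)       # rest of the record
        printf "[%s][%s][%s]\n", user, tty, login
    }')
printf '%s\n' "$out"
```

Each field keeps its padding, which the brackets make visible.  Every
additional column needs another @code{substr()} call with hand-computed
offsets, which is why this approach scales poorly.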
 7620 
 7621 @cindex troubleshooting @subentry fatal errors @subentry field widths, specifying
 7622 @cindex @command{w} utility
 7623 @cindex @code{FIELDWIDTHS} variable
 7624 @cindex @command{gawk} @subentry @code{FIELDWIDTHS} variable in
 7625 The splitting of an input record into fixed-width fields is specified by
 7626 assigning a string containing space-separated numbers to the built-in
 7627 variable @code{FIELDWIDTHS}.  Each number specifies the width of the
 7628 field, @emph{including} columns between fields.  If you want to ignore
 7629 the columns between fields, you can specify the width as a separate
 7630 field that is subsequently ignored.  It is a fatal error to supply a
 7631 field width that has a negative value.
 7632 
 7633 The following data is the output of the Unix @command{w} utility.  It is useful
 7634 to illustrate the use of @code{FIELDWIDTHS}:
 7635 
 7636 @example
 7637 @group
 7638  10:06pm  up 21 days, 14:04,  23 users
 7639 User     tty       login@  idle   JCPU   PCPU  what
 7640 hzuo     ttyV0     8:58pm            9      5  vi p24.tex
 7641 hzang    ttyV3     6:37pm    50                -csh
 7642 eklye    ttyV5     9:53pm            7      1  em thes.tex
 7643 dportein ttyV6     8:17pm  1:47                -csh
 7644 gierd    ttyD3    10:00pm     1                elm
 7645 dave     ttyD4     9:47pm            4      4  w
 7646 brent    ttyp0    26Jun91  4:46  26:46   4:41  bash
 7647 dave     ttyq4    26Jun9115days     46     46  wnewmail
 7648 @end group
 7649 @end example
 7650 
 7651 The following program takes this input, converts the idle time to
 7652 number of seconds, and prints out the first two fields and the calculated
 7653 idle time:
 7654 
 7655 @example
 7656 BEGIN  @{ FIELDWIDTHS = "9 6 10 6 7 7 35" @}
 7657 NR > 2 @{
 7658     idle = $4
 7659     sub(/^ +/, "", idle)   # strip leading spaces
 7660     if (idle == "")
 7661         idle = 0
 7662     if (idle ~ /:/) @{      # hh:mm
 7663         split(idle, t, ":")
 7664         idle = t[1] * 60 + t[2]
 7665     @}
 7666     if (idle ~ /days/)
 7667         idle *= 24 * 60 * 60
 7668 
 7669     print $1, $2, idle
 7670 @}
 7671 @end example
 7672 
 7673 @quotation NOTE
 7674 The preceding program uses a number of @command{awk} features that
 7675 haven't been introduced yet.
 7676 @end quotation
 7677 
 7678 Running the program on the data produces the following results:
 7679 
 7680 @example
 7681 hzuo      ttyV0  0
 7682 hzang     ttyV3  50
 7683 eklye     ttyV5  0
 7684 dportein  ttyV6  107
 7685 gierd     ttyD3  1
 7686 dave      ttyD4  0
 7687 brent     ttyp0  286
 7688 dave      ttyq4  1296000
 7689 @end example
 7690 
 7691 Another (possibly more practical) example of fixed-width input data
 7692 is the input from a deck of balloting cards.  In some parts of
 7693 the United States, voters mark their choices by punching holes in computer
 7694 cards.  These cards are then processed to count the votes for any particular
 7695 candidate or on any particular issue.  Because a voter may choose not to
 7696 vote on some issue, any column on the card may be empty.  An @command{awk}
 7697 program for processing such data could use the @code{FIELDWIDTHS} feature
 7698 to simplify reading the data.  (Of course, getting @command{gawk} to run on
 7699 a system with card readers is another story!)
 7700 
 7701 @node Skipping intervening
 7702 @subsection Skipping Intervening Fields
 7703 
 7704 Starting in @value{PVERSION} 4.2, each field width may optionally be
 7705 preceded by a colon-separated value specifying the number of characters
 7706 to skip before the field starts.  Thus, the preceding program could be
 7707 rewritten to specify @code{FIELDWIDTHS} like so:
 7708 
 7709 @example
 7710 BEGIN  @{ FIELDWIDTHS = "8 1:5 4:7 6 1:6 1:6 2:33" @}
 7711 @end example
 7712 
 7713 This strips away some of the white space separating the fields. With such
 7714 a change, the program produces the following results:
 7715 
 7716 @example
hzuo     ttyV0 0
 7717 hzang    ttyV3 50
 7718 eklye    ttyV5 0
 7719 dportein ttyV6 107
 7720 gierd    ttyD3 1
 7721 dave     ttyD4 0
 7722 brent    ttyp0 286
 7723 dave     ttyq4 1296000
 7724 @end example
 7725 
 7726 @node Allowing trailing data
 7727 @subsection Capturing Optional Trailing Data
 7728 
 7729 There are times when fixed-width data may be followed by additional data
 7730 that has no fixed length.  Such data may or may not be present, but if
 7731 it is, it should be possible to get at it from an @command{awk} program.
 7732 
 7733 Starting with @value{PVERSION} 4.2, in order to provide a way to say ``anything
 7734 else in the record after the defined fields,'' @command{gawk}
 7735 allows you to add a final @samp{*} character to the value of
 7736 @code{FIELDWIDTHS}. There can only be one such character, and it must
 7737 be the final non-whitespace character in @code{FIELDWIDTHS}.
 7738 For example:
 7739 
 7740 @example
 7741 $ @kbd{cat fw.awk}                         @ii{Show the program}
 7742 @print{} BEGIN @{ FIELDWIDTHS = "2 2 *" @}
 7743 @print{} @{ print NF, $1, $2, $3 @}
 7744 $ @kbd{cat fw.in}                          @ii{Show sample input}
 7745 @print{} 1234abcdefghi
 7746 $ @kbd{gawk -f fw.awk fw.in}               @ii{Run the program}
 7747 @print{} 3 12 34 abcdefghi
 7748 @end example
 7749 
 7750 @node Fields with fixed data
 7751 @subsection Field Values With Fixed-Width Data
 7752 
 7753 So far, so good.  But what happens if there isn't as much data as there
 7754 should be based on the contents of @code{FIELDWIDTHS}? Or, what happens
 7755 if there is more data than expected?
 7756 
 7757 For many years, the behavior in these cases was not well defined. Starting
 7758 with @value{PVERSION} 4.2, the rules are as follows:
 7759 
 7760 @table @asis
 7761 @item Enough data for some fields
 7762 For example, suppose @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the
 7763 input record is @samp{aabbb}.  In this case, @code{NF} is set to two.
 7764 
 7765 @item Not enough data for a field
 7766 For example, suppose @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the
 7767 input record is @samp{aab}.  In this case, @code{NF} is set to two and
 7768 @code{$2} has the value @code{"b"}. The idea is that even though there
 7769 aren't as many characters as were expected, there are some, so the data
 7770 should be made available to the program.
 7771 
 7772 @item Too much data
 7773 For example, suppose @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the
 7774 input record is @samp{aabbbccccddd}.  In this case, @code{NF} is set to
 7775 three and the extra characters (@samp{ddd}) are ignored.  If you want
 7776 @command{gawk} to capture the extra characters, supply a final @samp{*}
 7777 in the value of @code{FIELDWIDTHS}.
 7778 
 7779 @item Too much data, but with @samp{*} supplied
 7780 For example, suppose @code{FIELDWIDTHS} is set to @code{"2 3 4 *"} and the
 7781 input record is @samp{aabbbccccddd}.  In this case, @code{NF} is set to
 7782 four, and @code{$4} has the value @code{"ddd"}.
 7783 
 7784 @end table
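
The following minimal sketch (not part of the @command{gawk} distribution)
can be used to try these rules out.  It assumes @command{gawk} 4.2 or
later, and the @value{FN} @file{fwtest.awk} is only an illustration:

@example
# fwtest.awk: show how FIELDWIDTHS splits short and long records
BEGIN @{ FIELDWIDTHS = "2 3 4" @}
@{
    printf("NF = %d:", NF)
    for (i = 1; i <= NF; i++)
        printf(" <%s>", $i)     # angle brackets make empty fields visible
    print ""
@}
@end example

Given the records @samp{aabbb}, @samp{aab}, and @samp{aabbbccccddd},
the rules just listed imply that it should report two, two, and three
fields, respectively.  Changing the value to @code{"2 3 4 *"} should
make the last record yield a fourth field.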
 7785 
 7786 @node Splitting By Content
 7787 @section Defining Fields by Content
 7788 
 7789 @menu
 7790 * More CSV::                    More on CSV files.
 7791 @end menu
 7792 
 7793 @c O'Reilly doesn't like it as a note the first thing in the section.
 7794 This @value{SECTION} discusses an advanced
 7795 feature of @command{gawk}.  If you are a novice @command{awk} user,
 7796 you might want to skip it on the first reading.
 7797 
 7798 @cindex advanced features @subentry specifying field content
 7799 Normally, when using @code{FS}, @command{gawk} defines the fields as the
 7800 parts of the record that occur in between each field separator. In other
 7801 words, @code{FS} defines what a field @emph{is not}, instead of what a field
 7802 @emph{is}.
 7803 However, there are times when you really want to define the fields by
 7804 what they are, and not by what they are not.
 7805 
 7806 The most notorious such case
 7807 is so-called @dfn{comma-separated values} (CSV) data. Many spreadsheet programs,
 7808 for example, can export their data into text files, where each record is
 7809 terminated with a newline, and fields are separated by commas. If
 7810 commas only separated the data, there wouldn't be an issue. The problem comes when
 7811 one of the fields contains an @emph{embedded} comma.
 7812 In such cases, most programs embed the field in double quotes.@footnote{The
 7813 CSV format lacked a formal standard definition for many years.
 7814 @uref{http://www.ietf.org/rfc/rfc4180.txt, RFC 4180}
 7815 standardizes the most common practices.}
 7816 So, we might have data like this:
 7817 
 7818 @example
 7819 @c file eg/misc/addresses.csv
 7820 Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA
 7821 @c endfile
 7822 @end example
 7823 
 7824 @cindex @command{gawk} @subentry @code{FPAT} variable in
 7825 @cindex @code{FPAT} variable
 7826 The @code{FPAT} variable offers a solution for cases like this.
 7827 The value of @code{FPAT} should be a string that provides a regular expression.
 7828 This regular expression describes the contents of each field.
 7829 
 7830 In the case of CSV data as presented here, each field is either ``anything that
 7831 is not a comma,'' or ``a double quote, anything that is not a double quote, and a
 7832 closing double quote.''  (There are more complicated definitions of CSV data,
 7833 treated shortly.)
 7834 If written as a regular expression constant
 7835 (@pxref{Regexp}),
 7836 we would have @code{/([^,]+)|("[^"]+")/}.
 7837 Writing this as a string requires us to escape the double quotes, leading to:
 7838 
 7839 @example
 7840 FPAT = "([^,]+)|(\"[^\"]+\")"
 7841 @end example
 7842 
 7843 Putting this to use, here is a simple program to parse the data:
 7844 
 7845 @example
 7846 @c file eg/misc/simple-csv.awk
 7847 @group
 7848 BEGIN @{
 7849     FPAT = "([^,]+)|(\"[^\"]+\")"
 7850 @}
 7851 @end group
 7852 
 7853 @group
 7854 @{
 7855     print "NF = ", NF
 7856     for (i = 1; i <= NF; i++) @{
 7857         printf("$%d = <%s>\n", i, $i)
 7858     @}
 7859 @}
 7860 @end group
 7861 @c endfile
 7862 @end example
 7863 
 7864 When run, we get the following:
 7865 
 7866 @example
 7867 $ @kbd{gawk -f simple-csv.awk addresses.csv}
 7868 @print{} NF =  7
 7869 @print{} $1 = <Robbins>
 7870 @print{} $2 = <Arnold>
 7871 @print{} $3 = <"1234 A Pretty Street, NE">
 7872 @print{} $4 = <MyTown>
 7873 @print{} $5 = <MyState>
 7874 @print{} $6 = <12345-6789>
 7875 @print{} $7 = <USA>
 7876 @end example
 7877 
 7878 Note the embedded comma in the value of @code{$3}.
 7879 
 7880 A straightforward improvement when processing CSV data of this sort
 7881 would be to remove the quotes when they occur, with something like this:
 7882 
 7883 @example
 7884 if (substr($i, 1, 1) == "\"") @{
 7885     len = length($i)
 7886     $i = substr($i, 2, len - 2)    # Get text within the two quotes
 7887 @}
 7888 @end example
 7889 
 7890 @quotation NOTE
 7891 Some programs export CSV data that contains embedded newlines between
 7892 the double quotes.  @command{gawk} provides no way to deal with this.
 7893 Even though a formal specification for CSV data exists, there isn't much
 7894 more to be done;
 7895 the @code{FPAT} mechanism provides an elegant solution for the majority
 7896 of cases, and the @command{gawk} developers are satisfied with that.
 7897 @end quotation
 7898 
 7899 As written, the regexp used for @code{FPAT} requires that each field
 7900 contain at least one character.  A straightforward modification
 7901 (changing the first @samp{+} to @samp{*}) allows fields to be empty:
 7902 
 7903 @example
 7904 FPAT = "([^,]*)|(\"[^\"]+\")"
 7905 @end example
 7906 
 7907 @c FIXME: 4/2015
 7908 @c Consider use of FPAT = "([^,]*)|(\"[^\"]*\")"
 7909 @c (star in latter part of value) to allow quoted strings to be empty.
 7910 @c Per email from Ed Morton <mortoneccc@comcast.net>
 7911 
 7912 As with @code{FS}, the @code{IGNORECASE} variable (@pxref{User-modified})
 7913 affects field splitting with @code{FPAT}.
 7914 
 7915 Assigning a value to @code{FPAT} overrides field splitting
 7916 with @code{FS} and with @code{FIELDWIDTHS}.
 7917 
 7918 Finally, the @code{patsplit()} function makes the same functionality
 7919 available for splitting regular strings (@pxref{String Functions}).
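
As a small sketch of that last point, the following applies the first
@code{FPAT} regexp from this @value{SECTION} to an ordinary string.
The sample string and the array name @code{parts} are made up for
illustration:

@example
BEGIN @{
    line = "Robbins,Arnold,\"1234 A Pretty Street, NE\",MyTown"
    # patsplit() returns the number of pieces that matched the regexp
    n = patsplit(line, parts, /([^,]+)|("[^"]+")/)
    for (i = 1; i <= n; i++)
        printf("parts[%d] = <%s>\n", i, parts[i])
@}
@end example

Here @code{n} should be four, with the quoted street address
(embedded comma included) kept as a single element.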
 7920 
 7921 @node More CSV
 7922 @subsection More on CSV Files
 7923 
 7924 @cindex Collado, Manuel
 7925 Manuel Collado notes that in addition to commas, a CSV field can also
 7926 contain quotes, which have to be escaped by doubling them. The previously
 7927 described regexps fail to accept quoted fields with both commas and
 7928 quotes inside. He suggests that the simplest @code{FPAT} expression that
 7929 recognizes this kind of field is @code{/([^,]*)|("([^"]|"")+")/}. He
 7930 provides the following input data to test these variants:
 7931 
 7932 @example
 7933 @c file eg/misc/sample.csv
 7934 p,"q,r",s
 7935 p,"q""r",s
 7936 p,"q,""r",s
 7937 p,"",s
 7938 p,,s
 7939 @c endfile
 7940 @end example
 7941 
 7942 @noindent
 7943 And here is his test program:
 7944 
 7945 @example
 7946 @c file eg/misc/test-csv.awk
 7947 @group
 7948 BEGIN @{
 7949      fp[0] = "([^,]+)|(\"[^\"]+\")"
 7950      fp[1] = "([^,]*)|(\"[^\"]+\")"
 7951      fp[2] = "([^,]*)|(\"([^\"]|\"\")+\")"
 7952      FPAT = fp[fpat+0]
 7953 @}
 7954 @end group
 7955 
 7956 @group
 7957 @{
 7958      print "<" $0 ">"
 7959      printf("NF = %s ", NF)
 7960      for (i = 1; i <= NF; i++) @{
 7961          printf("<%s>", $i)
 7962      @}
 7963      print ""
 7964 @}
 7965 @end group
 7966 @c endfile
 7967 @end example
 7968 
 7969 When run on the third variant, it produces:
 7970 
 7971 @example
 7972 $ @kbd{gawk -v fpat=2 -f test-csv.awk sample.csv}
 7973 @print{} <p,"q,r",s>
 7974 @print{} NF = 3 <p><"q,r"><s>
 7975 @print{} <p,"q""r",s>
 7976 @print{} NF = 3 <p><"q""r"><s>
 7977 @print{} <p,"q,""r",s>
 7978 @print{} NF = 3 <p><"q,""r"><s>
 7979 @print{} <p,"",s>
 7980 @print{} NF = 3 <p><""><s>
 7981 @print{} <p,,s>
 7982 @print{} NF = 3 <p><><s>
 7983 @end example
 7984 
 7985 @node Testing field creation
 7986 @section Checking How @command{gawk} Is Splitting Records
 7987 
 7988 @cindex @command{gawk} @subentry splitting fields and
 7989 As we've seen, @command{gawk} provides three independent methods to split
 7990 input records into fields.  The mechanism used is based on which of the
 7991 three variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was
 7992 last assigned to. In addition, an API input parser may choose to override
 7993 the record parsing mechanism; please refer to @ref{Input Parsers} for
 7994 further information about this feature.
 7995 
 7996 To restore normal field splitting after using @code{FIELDWIDTHS}
 7997 and/or @code{FPAT}, simply assign a value to @code{FS}.
 7998 You can use @samp{FS = FS} to do this,
 7999 without having to know the current value of @code{FS}.
 8000 
 8001 In order to tell which kind of field splitting is in effect,
 8002 use @code{PROCINFO["FS"]} (@pxref{Auto-set}).
 8003 The value is @code{"FS"} if regular field splitting is being used,
 8004 @code{"FIELDWIDTHS"} if fixed-width field splitting is being used,
 8005 or @code{"FPAT"} if content-based field splitting is being used:
 8006 
 8007 @example
 8008 if (PROCINFO["FS"] == "FS")
 8009     @var{regular field splitting} @dots{}
 8010 else if (PROCINFO["FS"] == "FIELDWIDTHS")
 8011     @var{fixed-width field splitting} @dots{}
 8012 else if (PROCINFO["FS"] == "FPAT")
 8013     @var{content-based field splitting} @dots{}
 8014 else
 8015     @var{API input parser field splitting} @dots{} @ii{(advanced feature)}
 8016 @end example
 8017 
 8018 This information is useful when writing a function that needs to
 8019 temporarily change @code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}, read some records,
 8020 and then restore the original settings (@pxref{Passwd Functions} for an
 8021 example of such a function).
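
The general shape of such a function might look like the following
sketch.  The function name and the colon-separated file it reads are
hypothetical, and the sketch does not try to handle the case where an
API input parser is doing the splitting:

@example
# first_colon_field: return field one of the next record in FILE,
# splitting on colons, then put field splitting back the way it was
function first_colon_field(file,    mode, oldfs, oldrec, result)
@{
    mode = PROCINFO["FS"]       # which mechanism is in effect now?
    oldfs = FS
    oldrec = $0                 # getline < file overwrites $0

    FS = ":"                    # switch to regular field splitting
    if ((getline < file) > 0)
        result = $1

    FS = oldfs
    if (mode == "FIELDWIDTHS")
        FIELDWIDTHS = FIELDWIDTHS   # the last assignment wins
    else if (mode == "FPAT")
        FPAT = FPAT
    $0 = oldrec                 # resplit using the restored mechanism
    return result
@}
@end example

Because the file is not closed, each call reads the following record;
a caller that wants to start over would call @code{close(file)} first.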
 8022 
 8023 @node Multiple Line
 8024 @section Multiple-Line Records
 8025 
 8026 @cindex multiple-line records
 8027 @cindex records @subentry multiline
 8028 @cindex input @subentry multiline records
 8029 @cindex files @subentry reading @subentry multiline records
 8030 @cindex input, files @seeentry{input files}
 8031 In some databases, a single line cannot conveniently hold all the
 8032 information in one entry.  In such cases, you can use multiline
 8033 records.  The first step in doing this is to choose your data format.
 8034 
 8035 @cindex record separators @subentry with multiline records
 8036 One technique is to use an unusual character or string to separate
 8037 records.  For example, you could use the formfeed character (written
 8038 @samp{\f} in @command{awk}, as in C) to separate them, making each record
 8039 a page of the file.  To do this, just set the variable @code{RS} to
 8040 @code{"\f"} (a string containing the formfeed character).  Any
 8041 other character could equally well be used, as long as it won't be part
 8042 of the data in a record.
 8043 
 8044 @cindex @code{RS} variable @subentry multiline records and
 8045 Another technique is to have blank lines separate records.  By a special
 8046 dispensation, an empty string as the value of @code{RS} indicates that
 8047 records are separated by one or more blank lines.  When @code{RS} is set
 8048 to the empty string, each record always ends at the first blank line
 8049 encountered.  The next record doesn't start until the first nonblank
 8050 line that follows.  No matter how many blank lines appear in a row, they
 8051 all act as one record separator.
 8052 (Blank lines must be completely empty; lines that contain only
 8053 whitespace do not count.)
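
For instance, this short sketch counts the lines in each
blank-line-separated record.  The array name @code{lines} is arbitrary:

@example
BEGIN @{ RS = "" @}
@{
    # split() returns the number of pieces, here the lines of the record
    n = split($0, lines, "\n")
    printf("record %d has %d line(s)\n", NR, n)
@}
@end example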
 8054 
 8055 @cindex leftmost longest match
 8056 @cindex matching @subentry leftmost longest
 8057 You can achieve the same effect as @samp{RS = ""} by assigning the
 8058 string @code{"\n\n+"} to @code{RS}. This regexp matches the newline
 8059 at the end of the record and one or more blank lines after the record.
 8060 In addition, a regular expression always matches the longest possible
 8061 sequence when there is a choice
 8062 (@pxref{Leftmost Longest}).
 8063 So, the next record doesn't start until
 8064 the first nonblank line that follows---no matter how many blank lines
 8065 appear in a row, they are considered one record separator.
 8066 
 8067 @cindex dark corner @subentry multiline records
 8068 However, there is an important difference between @samp{RS = ""} and
 8069 @samp{RS = "\n\n+"}. In the first case, leading newlines in the input
 8070 @value{DF} are ignored, and if a file ends without extra blank lines
 8071 after the last record, the final newline is removed from the record.
 8072 In the second case, this special processing is not done.
 8073 @value{DARKCORNER}
 8074 
 8075 @cindex field separator @subentry in multiline records
 8076 @cindex @code{FS} variable @subentry in multiline records
 8077 Now that the input is separated into records, the second step is to
 8078 separate the fields in the records.  One way to do this is to divide each
 8079 of the lines into fields in the normal manner.  This happens by default
 8080 as the result of a special feature.  When @code{RS} is set to the empty
 8081 string @emph{and} @code{FS} is set to a single character,
 8082 the newline character @emph{always} acts as a field separator.
 8083 This is in addition to whatever field separations result from
 8084 @code{FS}.
 8085 
 8086 @quotation NOTE
 8087 When @code{FS} is the null string (@code{""})
 8088 or a regexp, this special feature of @code{RS} does not apply.
 8089 It does apply to the default field separator of a single space:
 8090 @samp{FS = @w{" "}}.
 8091 
 8092 Note that language in the POSIX specification implies that
 8093 this special feature should apply when @code{FS} is a regexp.
 8094 However, Unix @command{awk} has never behaved that way, nor has
 8095 @command{gawk}. This is essentially a bug in POSIX.
 8096 @c Noted as of 4/2019; working to get the standard fixed.
 8097 @end quotation
 8098 
 8099 The original motivation for this special exception was probably to provide
 8100 useful behavior in the default case (i.e., @code{FS} is equal
 8101 to @w{@code{" "}}).  This feature can be a problem if you really don't
 8102 want the newline character to separate fields, because there is no way to
 8103 prevent it.  However, you can work around this by using the @code{split()}
 8104 function to break up the record manually
 8105 (@pxref{String Functions}).
 8106 If you have a single-character field separator, you can work around
 8107 the special feature in a different way, by making @code{FS} into a
 8108 regexp for that single character.  For example, if the field
 8109 separator is a percent character, instead of
 8110 @samp{FS = "%"}, use @samp{FS = "[%]"}.
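
A minimal sketch of the difference, assuming input whose records
contain percent signs and span several lines:

@example
# With FS = "%", newline would also separate fields.
# With the bracket expression, only the percent sign does.
BEGIN @{ RS = ""; FS = "[%]" @}
@{
    printf("record %d: NF = %d\n", NR, NF)
    for (i = 1; i <= NF; i++)
        printf("  $%d = <%s>\n", i, $i)
@}
@end example

For a two-line record consisting of @samp{a%b} followed by @samp{c%d},
this should report three fields, the second of which contains an
embedded newline.  With @samp{FS = "%"}, the same record would have
four fields.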
 8111 
 8112 Another way to separate fields is to
 8113 put each field on a separate line: to do this, just set the
 8114 variable @code{FS} to the string @code{"\n"}.
 8115 (This single-character separator matches a single newline.)
 8116 A practical example of a @value{DF} organized this way might be a mailing
 8117 list, where blank lines separate the entries.  Consider a mailing
 8118 list in a file named @file{addresses}, which looks like this:
 8119 
 8120 @example
 8121 Jane Doe
 8122 123 Main Street
 8123 Anywhere, SE 12345-6789
 8124 
 8125 John Smith
 8126 456 Tree-lined Avenue
 8127 Smallville, MW 98765-4321
 8128 @dots{}
 8129 @end example
 8130 
 8131 @noindent
 8132 A simple program to process this file is as follows:
 8133 
 8134 @example
 8135 # addrs.awk --- simple mailing list program
 8136 
 8137 # Records are separated by blank lines.
 8138 # Each line is one field.
 8139 BEGIN @{ RS = "" ; FS = "\n" @}
 8140 
 8141 @{
 8142       print "Name is:", $1
 8143       print "Address is:", $2
 8144       print "City and State are:", $3
 8145       print ""
 8146 @}
 8147 @end example
 8148 
 8149 Running the program produces the following output:
 8150 
 8151 @example
 8152 $ @kbd{awk -f addrs.awk addresses}
 8153 @print{} Name is: Jane Doe
 8154 @print{} Address is: 123 Main Street
 8155 @print{} City and State are: Anywhere, SE 12345-6789
 8156 @print{}
 8157 @print{} Name is: John Smith
 8158 @print{} Address is: 456 Tree-lined Avenue
 8159 @print{} City and State are: Smallville, MW 98765-4321
 8160 @print{}
 8161 @dots{}
 8162 @end example
 8163 
 8164 @xref{Labels Program} for a more realistic program dealing with
 8165 address lists.  The following list summarizes how records are split,
 8166 based on the value of
 8167 @ifinfo
 8168 @code{RS}.
 8169 (@samp{==} means ``is equal to.'')
 8170 @end ifinfo
 8171 @ifnotinfo
 8172 @code{RS}:
 8173 @end ifnotinfo
 8174 
 8175 @table @code
 8176 @item RS == "\n"
 8177 Records are separated by the newline character (@samp{\n}).  In effect,
 8178 every line in the @value{DF} is a separate record, including blank lines.
 8179 This is the default.
 8180 
 8181 @item RS == @var{any single character}
 8182 Records are separated by each occurrence of the character.  Multiple
 8183 successive occurrences delimit empty records.
 8184 
 8185 @item RS == ""
 8186 Records are separated by runs of blank lines.
 8187 When @code{FS} is a single character, then
 8188 the newline character
 8189 always serves as a field separator, in addition to whatever value
 8190 @code{FS} may have. Leading and trailing newlines in a file are ignored.
 8191 
 8192 @item RS == @var{regexp}
 8193 Records are separated by occurrences of characters that match @var{regexp}.
 8194 Leading and trailing matches of @var{regexp} delimit empty records.
 8195 (This is a @command{gawk} extension; it is not specified by the
 8196 POSIX standard.)
 8197 @end table
 8198 
 8199 @cindex @command{gawk} @subentry @code{RT} variable in
 8200 @cindex @code{RT} variable
 8201 @cindex differences in @command{awk} and @command{gawk} @subentry @code{RS}/@code{RT} variables
 8202 If not in compatibility mode (@pxref{Options}), @command{gawk} sets
 8203 @code{RT} to the input text that matched the value specified by @code{RS}.
 8204 But if the input file ended without any text that matches @code{RS},
 8205 then @command{gawk} sets @code{RT} to the null string.
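
The following sketch shows one way to look at @code{RT}.  The choice of
separators is arbitrary, and the program assumes that @command{gawk}
is not running in compatibility mode:

@example
BEGIN @{ RS = "[,;]" @}
@{ printf("record <%s> was terminated by <%s>\n", $0, RT) @}
@end example

@noindent
Given the input @samp{a,b;c} with no trailing newline, the first two
records should show @samp{,} and @samp{;} as their terminators, and
the last one should show an empty @code{RT}.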
 8206 
 8207 @node Getline
 8208 @section Explicit Input with @code{getline}
 8209 
 8210 @cindex @code{getline} command @subentry explicit input with
 8211 @cindex input @subentry explicit
 8212 So far we have been getting our input data from @command{awk}'s main
 8213 input stream---either the standard input (usually your keyboard, sometimes
 8214 the output from another program) or the
 8215 files specified on the command line.  The @command{awk} language has a
 8216 special built-in command called @code{getline} that
 8217 can be used to read input under your explicit control.
 8218 
 8219 The @code{getline} command is used in several different ways and should
 8220 @emph{not} be used by beginners.
 8221 The examples that follow the explanation of the @code{getline} command
 8222 include material that has not been covered yet.  Therefore, come back
 8223 and study the @code{getline} command @emph{after} you have reviewed the
 8224 rest of
 8225 @ifinfo
 8226 this @value{DOCUMENT}
 8227 @end ifinfo
 8228 @ifhtml
 8229 this @value{DOCUMENT}
 8230 @end ifhtml
 8231 @ifnotinfo
 8232 @ifnothtml
 8233 Parts I and II
 8234 @end ifnothtml
 8235 @end ifnotinfo
 8236 and have a good knowledge of how @command{awk} works.
 8237 
 8238 @cindex @command{gawk} @subentry @code{ERRNO} variable in
 8239 @cindex @code{ERRNO} variable @subentry with @command{getline} command
 8240 @cindex differences in @command{awk} and @command{gawk} @subentry @code{getline} command
 8241 @cindex @code{getline} command @subentry return values
 8242 @cindex @option{--sandbox} option @subentry input redirection with @code{getline}
 8243 
 8244 The @code{getline} command returns 1 if it finds a record and 0 if
 8245 it encounters the end of the file.  If there is some error in getting
 8246 a record, such as a file that cannot be opened, then @code{getline}
 8247 returns @minus{}1.  In this case, @command{gawk} sets the variable
 8248 @code{ERRNO} to a string describing the error that occurred.
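
A common idiom, sketched here with a made-up @value{FN}, is to keep the
return value so that end-of-file and errors can be told apart.  It uses
the @samp{getline @var{var} < @var{file}} form, which is described
later in this @value{SECTION}:

@example
BEGIN @{
    file = "/no/such/file"      # hypothetical name, for illustration
    while ((status = (getline line < file)) > 0)
        print line
    if (status < 0)             # -1: the file could not be read
        print "cannot read " file ": " ERRNO > "/dev/stderr"
    else                        # 0: normal end of file
        close(file)
@}
@end example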
 8249 
 8250 If @code{ERRNO} indicates that the I/O operation may be
 8251 retried, and @code{PROCINFO["@var{input}", "RETRY"]} is set,
 8252 then @code{getline} returns @minus{}2
 8253 instead of @minus{}1, and further calls to @code{getline}
 8254 may be attempted.  @xref{Retrying Input} for further information about
 8255 this feature.
 8256 
 8257 In the following examples, @var{command} stands for a string value that
 8258 represents a shell command.
 8259 
 8260 @quotation NOTE
 8261 When @option{--sandbox} is specified (@pxref{Options}),
 8262 reading lines from files, pipes, and coprocesses is disabled.
 8263 @end quotation
 8264 
 8265 @menu
 8266 * Plain Getline::               Using @code{getline} with no arguments.
 8267 * Getline/Variable::            Using @code{getline} into a variable.
 8268 * Getline/File::                Using @code{getline} from a file.
 8269 * Getline/Variable/File::       Using @code{getline} into a variable from a
 8270                                 file.
 8271 * Getline/Pipe::                Using @code{getline} from a pipe.
 8272 * Getline/Variable/Pipe::       Using @code{getline} into a variable from a
 8273                                 pipe.
 8274 * Getline/Coprocess::           Using @code{getline} from a coprocess.
 8275 * Getline/Variable/Coprocess::  Using @code{getline} into a variable from a
 8276                                 coprocess.
 8277 * Getline Notes::               Important things to know about @code{getline}.
 8278 * Getline Summary::             Summary of @code{getline} Variants.
 8279 @end menu
 8280 
 8281 @node Plain Getline
 8282 @subsection Using @code{getline} with No Arguments
 8283 
 8284 The @code{getline} command can be used without arguments to read input
 8285 from the current input file.  All it does in this case is read the next
 8286 input record and split it up into fields.  This is useful if you've
 8287 finished processing the current record, but want to do some special
 8288 processing on the next record @emph{right now}.  For example:
 8289 
 8290 @c 6/2019: Thanks to Mark Krauze <daburashka@ya.ru> for suggested
 8291 @c improvements (the inner while loop).
 8292 @example
 8293 # Remove text between /* and */, inclusive
 8294 @{
 8295     while ((start = index($0, "/*")) != 0) @{
 8296         out = substr($0, 1, start - 1)  # leading part of the string
 8297         rest = substr($0, start + 2)    # ... */ ...
 8298         while ((end = index(rest, "*/")) == 0) @{  # is */ in trailing part?
 8299             # get more text
 8300             if (getline <= 0) @{
 8301                 print("unexpected EOF or error:", ERRNO) > "/dev/stderr"
 8302                 exit
 8303             @}
 8304             # build up the line using string concatenation
 8305             rest = rest $0
 8306         @}
 8307         rest = substr(rest, end + 2)  # remove comment
 8308         # build up the output line using string concatenation
 8309         $0 = out rest
 8310     @}
 8311     print $0
 8312 @}
 8313 @end example
 8314 
 8315 This @command{awk} program deletes C-style comments (@samp{/* @dots{}
 8316 */}) from the input.
 8317 It uses a number of features we haven't covered yet, including
 8318 string concatenation
 8319 (@pxref{Concatenation})
 8320 and the @code{index()} and @code{substr()} built-in
 8321 functions
 8322 (@pxref{String Functions}).
 8323 By replacing the @samp{print $0} with other
 8324 statements, you could perform more complicated processing on the
 8325 decommented input, such as searching for matches of a regular
 8326 expression.
 8327 
 8328 Here is some sample input:
 8329 
 8330 @example
 8331 mon/*comment*/key
 8332 rab/*commen
 8333 t*/bit
 8334 horse /*comment*/more text
 8335 part 1 /*comment*/part 2 /*comment*/part 3
 8336 no comment
 8337 @end example
 8338 
 8339 When run, the output is:
 8340 
 8341 @example
 8342 $ @kbd{awk -f strip_comments.awk example_text}
 8343 @print{} monkey
 8344 @print{} rabbit
 8345 @print{} horse more text
 8346 @print{} part 1 part 2 part 3
 8347 @print{} no comment
 8348 @end example
 8349 
 8350 This form of the @code{getline} command sets @code{NF},
 8351 @code{NR}, @code{FNR}, @code{RT}, and the value of @code{$0}.
 8352 
 8353 @quotation NOTE
 8354 The new value of @code{$0} is used to test
 8355 the patterns of any subsequent rules.  The original value
 8356 of @code{$0} that triggered the rule that executed @code{getline}
 8357 is lost.
 8358 By contrast, the @code{next} statement reads a new record
 8359 but immediately begins processing it normally, starting with the first
 8360 rule in the program.  @xref{Next Statement}.
 8361 @end quotation
 8362 
 8363 @node Getline/Variable
 8364 @subsection Using @code{getline} into a Variable
 8365 @cindex @code{getline} command @subentry into a variable
 8366 @cindex variables @subentry @code{getline} command into, using
 8367 
 8368 You can use @samp{getline @var{var}} to read the next record from
 8369 @command{awk}'s input into the variable @var{var}.  No other processing is
 8370 done.
 8371 For example, suppose the next line is a comment or a special string,
 8372 and you want to read it without triggering
 8373 any rules.  This form of @code{getline} allows you to read that line
 8374 and store it in a variable so that the main
 8375 read-a-line-and-check-each-rule loop of @command{awk} never sees it.
 8376 The following example swaps every two lines of input:
 8377 
 8378 @example
 8379 @group
 8380 @{
 8381      if ((getline tmp) > 0) @{
 8382           print tmp
 8383           print $0
 8384      @} else
 8385           print $0
 8386 @}
 8387 @end group
 8388 @end example
 8389 
 8390 @noindent
 8391 It takes the following list:
 8392 
 8393 @example
 8394 wan
 8395 tew
 8396 free
 8397 phore
 8398 @end example
 8399 
 8400 @noindent
 8401 and produces these results:
 8402 
 8403 @example
 8404 tew
 8405 wan
 8406 phore
 8407 free
 8408 @end example
 8409 
 8410 The @code{getline} command used in this way sets only the variables
 8411 @code{NR}, @code{FNR}, and @code{RT} (and, of course, @var{var}).
 8412 The record is not
 8413 split into fields, so the values of the fields (including @code{$0}) and
 8414 the value of @code{NF} do not change.
 8415 
 8416 @node Getline/File
 8417 @subsection Using @code{getline} from a File
 8418 
 8419 @cindex @code{getline} command @subentry from a file
 8420 @cindex input redirection
 8421 @cindex redirection @subentry of input
 8422 @cindex @code{<} (left angle bracket) @subentry @code{<} operator (I/O)
 8423 @cindex left angle bracket (@code{<}) @subentry @code{<} operator (I/O)
 8424 @cindex operators @subentry input/output
 8425 Use @samp{getline < @var{file}} to read the next record from @var{file}.
 8426 Here, @var{file} is a string-valued expression that
 8427 specifies the @value{FN}.  @samp{< @var{file}} is called a @dfn{redirection}
 8428 because it directs input to come from a different place.
 8429 For example, the following
 8430 program reads its input record from the file @file{secondary.input} when it
 8431 encounters a first field with a value equal to 10 in the current input
 8432 file:
 8433 
 8434 @example
 8435 @{
 8436     if ($1 == 10) @{
 8437          getline < "secondary.input"
 8438          print
 8439     @} else
 8440          print
 8441 @}
 8442 @end example
 8443 
 8444 Because the main input stream is not used, the values of @code{NR} and
 8445 @code{FNR} are not changed. However, the record it reads is split into fields in
 8446 the normal manner, so the values of @code{$0} and the other fields are
 8447 changed, resulting in a new value of @code{NF}.
 8448 @code{RT} is also set.
 8449 
 8450 @cindex POSIX @command{awk} @subentry @code{<} operator and
 8451 @c Thanks to Paul Eggert for initial wording here
 8452 According to POSIX, @samp{getline < @var{expression}} is ambiguous if
 8453 @var{expression} contains unparenthesized operators other than
 8454 @samp{$}; for example, @samp{getline < dir "/" file} is ambiguous
 8455 because the concatenation operator (not discussed yet; @pxref{Concatenation})
 8456 is not parenthesized.  You should write it as @samp{getline < (dir "/" file)} if
 8457 you want your program to be portable to all @command{awk} implementations.
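
For example, a portable way to read a file whose name is assembled from
two variables might look like this sketch (the values of @code{dir} and
@code{file} are placeholders):

@example
BEGIN @{
    dir = "/tmp"; file = "data.txt"     # placeholder values
    # the parentheses around the concatenation remove the ambiguity
    while ((getline < (dir "/" file)) > 0)
        print
    close(dir "/" file)
@}
@end example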
 8458 
 8459 @node Getline/Variable/File
 8460 @subsection Using @code{getline} into a Variable from a File
 8461 @cindex variables @subentry @code{getline} command into, using
 8462 
 8463 Use @samp{getline @var{var} < @var{file}} to read input
 8464 from the file
 8465 @var{file}, and put it in the variable @var{var}.  As earlier, @var{file}
 8466 is a string-valued expression that specifies the file from which to read.
 8467 
 8468 In this version of @code{getline}, none of the predefined variables are
 8469 changed and the record is not split into fields.  The only variable
 8470 changed is @var{var}.@footnote{This is not quite true. @code{RT} could
 8471 be changed if @code{RS} is a regular expression.}
 8472 For example, the following program copies all the input files to the
 8473 output, except for records that say @w{@samp{@@include @var{filename}}}.
 8474 Such a record is replaced by the contents of the file
 8475 @var{filename}:
 8476 
 8477 @example
 8478 @{
 8479      if (NF == 2 && $1 == "@@include") @{
 8480           while ((getline line < $2) > 0)
 8481                print line
 8482           close($2)
 8483      @} else
 8484           print
 8485 @}
 8486 @end example
 8487 
 8488 Note here how the name of the extra input file is not built into
 8489 the program; it is taken directly from the data, specifically from the second field on
 8490 the @code{@@include} line.
 8491 
 8492 The @code{close()} function is called to ensure that if two identical
 8493 @code{@@include} lines appear in the input, the entire specified file is
 8494 included twice.
 8495 @xref{Close Files And Pipes}.
 8496 
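The effect of @code{close()} can be seen with a small experiment.
This is a sketch only: the files @file{main.txt} and @file{part.txt}
and their contents are assumptions made for illustration.  It runs the
program above on input with two identical include lines, once as written
and once with the @code{close()} call removed.  In the second run the
included file is still open and positioned at its end, so the second
include line should contribute nothing:

```shell
dir=$(mktemp -d)
printf 'shared line\n' > "$dir/part.txt"
printf 'top\n@include part.txt\n@include part.txt\nbottom\n' > "$dir/main.txt"

# The program from the text, unchanged.
with_close=$(cd "$dir" && awk '{
    if (NF == 2 && $1 == "@include") {
        while ((getline line < $2) > 0)
            print line
        close($2)
    } else
        print
}' main.txt)

# Same program without close(): part.txt stays open at end-of-file,
# so the second include line has nothing left to read.
without_close=$(cd "$dir" && awk '{
    if (NF == 2 && $1 == "@include") {
        while ((getline line < $2) > 0)
            print line
    } else
        print
}' main.txt)

printf '%s\n---\n%s\n' "$with_close" "$without_close"
rm -rf "$dir"
```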
 8497 One deficiency of this program is that it does not process nested
 8498 @code{@@include} statements
 8499 (i.e., @code{@@include} statements in included files)
 8500 the way a true macro preprocessor would.
 8501 @xref{Igawk Program} for a program
 8502 that does handle nested @code{@@include} statements.
 8503 
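One possible way to handle nesting, far short of the complete program
referenced above, is to move the reading loop into a recursive
user-defined function.  The following is only a sketch under stated
assumptions: the function name @code{include} and the three sample files
are hypothetical, and a file that includes itself would recurse without
end.

```shell
dir=$(mktemp -d)
printf 'inner line\n' > "$dir/inner.txt"
printf 'outer line\n@include inner.txt\n' > "$dir/outer.txt"
printf 'start\n@include outer.txt\nend\n' > "$dir/main.txt"

out=$(cd "$dir" && awk '
# include() is a hypothetical helper; line and parts are local variables.
function include(file,    line, parts) {
    while ((getline line < file) > 0) {
        if (split(line, parts) == 2 && parts[1] == "@include")
            include(parts[2])        # recurse into the nested file
        else
            print line
    }
    close(file)
}
{
    if (NF == 2 && $1 == "@include")
        include($2)
    else
        print
}' main.txt)
printf '%s\n' "$out"
rm -rf "$dir"
```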
 8504 @node Getline/Pipe
 8505 @subsection Using @code{getline} from a Pipe
 8506 
 8507 @c From private email, dated October 2, 1988. Used by permission, March 2013.
 8508 @cindex Kernighan, Brian
 8509 @quotation
 8510 @i{Omniscience has much to recommend it.
 8511 Failing that, attention to details would be useful.}
 8512 @author Brian Kernighan
 8513 @end quotation
 8514 
 8515 @cindex @code{|} (vertical bar) @subentry @code{|} operator (I/O)
 8516 @cindex vertical bar (@code{|}) @subentry @code{|} operator (I/O)
 8517 @cindex input pipeline
 8518 @cindex pipe @subentry input
 8519 @cindex operators @subentry input/output
 8520 The output of a command can also be piped into @code{getline}, using
 8521 @samp{@var{command} | getline}.  In
 8522 this case, the string @var{command} is run as a shell command and its output
 8523 is piped into @command{awk} to be used as input.  This form of @code{getline}
 8524 reads one record at a time from the pipe.
 8525 For example, the following program copies its input to its output, except for
 8526 lines that begin with @samp{@@execute}, which are replaced by the output
 8527 produced by running the rest of the line as a shell command:
 8528 
 8529 @example
 8530 @group
 8531 @{
 8532      if ($1 == "@@execute") @{
 8533           tmp = substr($0, 10)        # Remove "@@execute"
 8534           while ((tmp | getline) > 0)
 8535                print
 8536           close(tmp)
 8537      @} else
 8538           print
 8539 @}
 8540 @end group
 8541 @end example
 8542 
 8543 @noindent
 8544 The @code{close()} function is called to ensure that if two identical
 8545 @samp{@@execute} lines appear in the input, the command is run for
 8546 each one.
 8547 @ifnottex
 8548 @ifnotdocbook
 8549 @xref{Close Files And Pipes}.
 8550 @end ifnotdocbook
 8551 @end ifnottex
 8552 @c This example is unrealistic, since you could just use system
 8553 Given the input:
 8554 
 8555 @example
 8556 foo
 8557 bar
 8558 baz
 8559 @@execute who
 8560 bletch
 8561 @end example
 8562 
 8563 @noindent
 8564 the program might produce:
 8565 
 8566 @cindex Robbins @subentry Bill
 8567 @cindex Robbins @subentry Miriam
 8568 @cindex Robbins @subentry Arnold
 8569 @example
 8570 foo
 8571 bar
 8572 baz
 8573 arnold     ttyv0   Jul 13 14:22
 8574 miriam     ttyp0   Jul 13 14:23     (murphy:0)
 8575 bill       ttyp1   Jul 13 14:23     (murphy:0)
 8576 bletch
 8577 @end example
 8578 
 8579 @noindent
 8580 Notice that this program ran the command @command{who} and printed the result.
 8581 (If you try this program yourself, you will of course get different results,
 8582 depending upon who is logged in on your system.)
 8583 
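For a repeatable variant of this experiment, the following sketch
replaces @command{who} with @command{echo}, whose output does not depend
on who is logged in.  The input text is made up for illustration; the
program is the one shown above.

```shell
out=$(printf 'foo\n@execute echo hello from the shell\nbar\n' | awk '{
    if ($1 == "@execute") {
        tmp = substr($0, 10)        # the text after "@execute "
        while ((tmp | getline) > 0)
            print
        close(tmp)
    } else
        print
}')
printf '%s\n' "$out"
```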
 8584 This variation of @code{getline} splits the record into fields, sets the
 8585 value of @code{NF}, and recomputes the value of @code{$0}.  The values of
 8586 @code{NR} and @code{FNR} are not changed.
 8587 @code{RT} is set.
 8588 
 8589 @cindex POSIX @command{awk} @subentry @code{|} I/O operator and
 8590 @c Thanks to Paul Eggert for initial wording here
 8591 According to POSIX, @samp{@var{expression} | getline} is ambiguous if
 8592 @var{expression} contains unparenthesized operators other than
 8593 @samp{$}---for example, @samp{@w{"echo "} "date" | getline} is ambiguous
 8594 because the concatenation operator is not parenthesized.  You should
 8595 write it as @samp{(@w{"echo "} "date") | getline} if you want your program
 8596 to be portable to all @command{awk} implementations.
 8597 
 8598 @cindex Brian Kernighan's @command{awk}
 8599 @cindex @command{mawk} utility
 8600 @quotation NOTE
 8601 Unfortunately, @command{gawk} has not been consistent in its treatment
 8602 of a construct like @samp{@w{"echo "} "date" | getline}.
 8603 Most versions, including the current version, treat it as
 8604 @samp{@w{("echo "} "date") | getline}.
 8605 (This is also how BWK @command{awk} behaves.)
 8606 Some versions instead treat it as
 8607 @samp{@w{"echo "} ("date" | getline)}.
 8608 (This is how @command{mawk} behaves.)
 8609 In short, @emph{always} use explicit parentheses, and then you won't
 8610 have to worry.
 8611 @end quotation
 8612 
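A minimal sketch of the parenthesized form follows.  With the explicit
grouping, the command that is run is @samp{echo date}, so the record
that is read should be the word @samp{date} and not the output of the
@command{date} utility:

```shell
out=$(awk 'BEGIN {
    # Explicit parentheses: the command string is "echo date".
    if ((("echo " "date") | getline) > 0)
        print "read:", $0
    close("echo date")
}')
printf '%s\n' "$out"
```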
 8613 @node Getline/Variable/Pipe
 8614 @subsection Using @code{getline} into a Variable from a Pipe
 8615 @cindex variables @subentry @code{getline} command into, using
 8616 
 8617 When you use @samp{@var{command} | getline @var{var}}, the
 8618 output of @var{command} is sent through a pipe to
 8619 @code{getline} and into the variable @var{var}.  For example, the
 8620 following program reads the current date and time into the variable
 8621 @code{current_time}, using the @command{date} utility, and then
 8622 prints it:
 8623 
 8624 @example
 8625 BEGIN @{
 8626      "date" | getline current_time
 8627      close("date")
 8628      print "Report printed on " current_time
 8629 @}
 8630 @end example
 8631 
 8632 In this version of @code{getline}, the record is not split into fields, and
 8633 none of the predefined variables are changed except @code{RT}, which is set.
 8634 
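Because only @var{var} is assigned, this form can collect the output of
a command while a record is being processed.  The following is a minimal
sketch; the command string, the array name @code{colors}, and the input
line are made up for illustration:

```shell
out=$(printf 'current record\n' | awk '{
    cmd = "echo red; echo green; echo blue"
    n = 0
    while ((cmd | getline color) > 0)   # only the variable color is assigned
        colors[++n] = color
    close(cmd)
    print n, colors[1], colors[n]
    print NF, $0                        # the current record is untouched
}')
printf '%s\n' "$out"
```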
 8635 @ifinfo
 8636 @c Thanks to Paul Eggert for initial wording here
 8637 According to POSIX, @samp{@var{expression} | getline @var{var}} is ambiguous if
 8638 @var{expression} contains unparenthesized operators other than
 8639 @samp{$}; for example, @samp{@w{"echo "} "date" | getline @var{var}} is ambiguous
 8640 because the concatenation operator is not parenthesized. You should
 8641 write it as @samp{(@w{"echo "} "date") | getline @var{var}} if you want your
 8642 program to be portable to other @command{awk} implementations.
 8643 @end ifinfo
 8644 
 8645 @node Getline/Coprocess
 8646 @subsection Using @code{getline} from a Coprocess
 8647 @cindex coprocesses @subentry @code{getline} from
 8648 @cindex @code{getline} command @subentry coprocesses, using from
 8649 @cindex @code{|} (vertical bar) @subentry @code{|&} operator (I/O)
 8650 @cindex vertical bar (@code{|}) @subentry @code{|&} operator (I/O)
 8651 @cindex operators @subentry input/output
 8652 @cindex differences in @command{awk} and @command{gawk} @subentry input/output operators
 8653 
 8654 Reading input into @code{getline} from a pipe is a one-way operation.
 8655 The command that is started with @samp{@var{command} | getline} only
 8656 sends data @emph{to} your @command{awk} program.
 8657 
 8658 On occasion, you might want to send data to another program
 8659 for processing and then read the results back.
 8660 @command{gawk} allows you to start a @dfn{coprocess}, with which two-way
 8661 communications are possible.  This is done with the @samp{|&}
 8662 operator.
 8663 Typically, you write data to the coprocess first and then
 8664 read the results back, as shown in the following:
 8665 
 8666 @example
 8667 print "@var{some query}" |& "db_server"
 8668 "db_server" |& getline
 8669 @end example
 8670 
 8671 @noindent
 8672 which sends a query to @command{db_server} and then reads the results.
 8673 
 8674 The values of @code{NR} and
 8675 @code{FNR} are not changed,
 8676 because the main input stream is not used.
 8677 However, the record is split into fields in
 8678 the normal manner, thus changing the values of @code{$0}, of the other fields,
 8679 and of @code{NF} and @code{RT}.
 8680 
 8681 Coprocesses are an advanced feature. They are discussed here only because
 8682 this is the @value{SECTION} on @code{getline}.
 8683 @xref{Two-way I/O},
 8684 where coprocesses are discussed in more detail.
 8685 
 8686 @node Getline/Variable/Coprocess
 8687 @subsection Using @code{getline} into a Variable from a Coprocess
 8688 @cindex variables @subentry @code{getline} command into, using
 8689 
 8690 When you use @samp{@var{command} |& getline @var{var}}, the output from
 8691 the coprocess @var{command} is sent through a two-way pipe to @code{getline}
 8692 and into the variable @var{var}.
 8693 
 8694 In this version of @code{getline}, none of the predefined variables are
 8695 changed and the record is not split into fields.  The only variable
 8696 changed is @var{var}.
 8697 However, @code{RT} is set.
 8698 
 8699 @ifinfo
 8700 Coprocesses are an advanced feature. They are discussed here only because
 8701 this is the @value{SECTION} on @code{getline}.
 8702 @xref{Two-way I/O},
 8703 where coprocesses are discussed in more detail.
 8704 @end ifinfo
 8705 
 8706 @node Getline Notes
 8707 @subsection Points to Remember About @code{getline}
 8708 Here are some miscellaneous points about @code{getline} that
 8709 you should bear in mind:
 8710 
 8711 @itemize @value{BULLET}
 8712 @item
 8713 When @code{getline} changes the value of @code{$0} and @code{NF},
 8714 @command{awk} does @emph{not} automatically jump to the start of the
 8715 program and start testing the new record against every pattern.
 8716 However, the new record is tested against any subsequent rules.
 8717 
 8718 @cindex differences in @command{awk} and @command{gawk} @subentry implementation limitations
 8719 @cindex implementation issues, @command{gawk} @subentry limits
 8720 @cindex @command{awk} @subentry implementations @subentry limits
 8721 @cindex @command{gawk} @subentry implementation issues @subentry limits
 8722 @item
 8723 Some very old @command{awk} implementations limit the number of pipelines that an @command{awk}
 8724 program may have open to just one.  In @command{gawk}, there is no such limit.
 8725 You can open as many pipelines (and coprocesses) as the underlying operating
 8726 system permits.
 8727 
 8728 @cindex side effects @subentry @code{FILENAME} variable
 8729 @cindex @code{FILENAME} variable @subentry @code{getline}, setting with
 8730 @cindex dark corner @subentry @code{FILENAME} variable
 8731 @cindex @code{getline} command @subentry @code{FILENAME} variable and
 8732 @cindex @code{BEGIN} pattern @subentry @code{getline} and
 8733 @item
 8734 An interesting side effect occurs if you use @code{getline} without a
 8735 redirection inside a @code{BEGIN} rule. Because an unredirected @code{getline}
 8736 reads from the command-line @value{DF}s, the first @code{getline} command
 8737 causes @command{awk} to set the value of @code{FILENAME}. Normally,
 8738 @code{FILENAME} does not have a value inside @code{BEGIN} rules, because you
 8739 have not yet started to process the command-line @value{DF}s.
 8740 @value{DARKCORNER}
 8741 (See @ref{BEGIN/END};
 8742 also @pxref{Auto-set}.)
 8743 
 8744 @item
 8745 Using @code{FILENAME} with @code{getline}
 8746 (@samp{getline < FILENAME})
 8747 is likely to be a source of
 8748 confusion.  @command{awk} opens a separate input stream from the
 8749 current input file.  However, because no variable is given, @code{$0}
 8750 and @code{NF} are still updated.  If you're doing this, it's
 8751 probably by accident, and you should reconsider what it is you're
 8752 trying to accomplish.
 8753 
 8754 @item
 8755 @ifdocbook
 8756 The next @value{SECTION}
 8757 @end ifdocbook
 8758 @ifnotdocbook
 8759 @ref{Getline Summary},
 8760 @end ifnotdocbook
 8761 presents a table summarizing the
 8762 @code{getline} variants and which variables they can affect.
 8763 It is worth noting that those variants that do not use redirection
 8764 can cause @code{FILENAME} to be updated if they cause
 8765 @command{awk} to start reading a new input file.
 8766 
 8767 @item
 8768 @cindex Moore, Duncan
 8769 If the variable being assigned is an expression with side effects,
 8770 different versions of @command{awk} behave differently upon encountering
 8771 end-of-file.  Some versions don't evaluate the expression; many versions
 8772 (including @command{gawk}) do.  Here is an example, courtesy of Duncan Moore:
 8773 
 8774 @ignore
 8775 Date: Sun, 01 Apr 2012 11:49:33 +0100
 8776 From: Duncan Moore <duncan.moore@@gmx.com>
 8777 @end ignore
 8778 
 8779 @example
 8780 BEGIN @{
 8781     system("echo 1 > f")
 8782     while ((getline a[++c] < "f") > 0) @{ @}
 8783     print c
 8784 @}
 8785 @end example
 8786 
 8787 @noindent
 8788 Here, the side effect is the @samp{++c}.  Is @code{c} incremented if
 8789 end-of-file is encountered before the element in @code{a} is assigned?
 8790 
 8791 @command{gawk} treats @code{getline} like a function call, and evaluates
 8792 the expression @samp{a[++c]} before attempting to read from @file{f}.
 8793 However, some versions of @command{awk} only evaluate the expression once they
 8794 know that there is a string value to be assigned.
 8795 @end itemize
 8796 
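The first point in the preceding list can be seen with a short
experiment.  This is a sketch with made-up input: the second rule calls
@code{getline}, and only the rule that follows it is tested against the
new record.

```shell
out=$(printf 'skip\nkeep\n' | awk '
/keep/         { print "rule 1:", $0 }
$0 == "skip"   { getline }             # $0 becomes "keep"
/keep/         { print "rule 3:", $0 }
')
printf '%s\n' "$out"
```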
 8797 @node Getline Summary
 8798 @subsection Summary of @code{getline} Variants
 8799 @cindex @code{getline} command @subentry variants
 8800 
 8801 @ref{table-getline-variants}
 8802 summarizes the eight variants of @code{getline},
 8803 listing which predefined variables are set by each one,
 8804 and whether the variant is standard or a @command{gawk} extension.
 8805 Note: for each variant, @command{gawk} sets the @code{RT} predefined variable.
 8806 
 8807 @float Table,table-getline-variants
 8808 @caption{@code{getline} variants and what they set}
 8809 @multitable @columnfractions .33 .38 .27
 8810 @headitem Variant @tab Effect @tab @command{awk} / @command{gawk}
 8811 @item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, @code{NR}, and @code{RT} @tab @command{awk}
 8812 @item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR}, @code{NR}, and @code{RT} @tab @command{awk}
 8813 @item @code{getline <} @var{file} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab @command{awk}
 8814 @item @code{getline @var{var} < @var{file}} @tab Sets @var{var} and @code{RT} @tab @command{awk}
 8815 @item @var{command} @code{| getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab @command{awk}
 8816 @item @var{command} @code{| getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{awk}
 8817 @item @var{command} @code{|& getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab @command{gawk}
 8818 @item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{gawk}
 8819 @end multitable
 8820 @end float
 8821 
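The first two rows of the table can be checked with a short sketch.  The
three input records and the variable name @code{tmp} are arbitrary
choices for illustration:

```shell
out=$(printf 'a b\nc d e\nf\n' | awk 'NR == 1 {
    getline                   # sets $0, NF, FNR, and NR
    print NR, NF, $0
    getline tmp               # sets tmp, FNR, and NR; $0 and NF are kept
    print NR, NF, $0, tmp
}')
printf '%s\n' "$out"
```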
 8822 @node Read Timeout
 8823 @section Reading Input with a Timeout
 8824 @cindex timeout, reading input
 8825 
 8826 @cindex differences in @command{awk} and @command{gawk} @subentry read timeouts
 8827 This @value{SECTION} describes a feature that is specific to @command{gawk}.
 8828 
 8829 You may specify a timeout in milliseconds for reading input from the keyboard,
 8830 a pipe, or two-way communication, including TCP/IP sockets. This can be done
 8831 on a per-input, per-command, or per-connection basis, by setting a special
 8832 element in the @code{PROCINFO} array (@pxref{Auto-set}):
 8833 
 8834 @example
 8835 PROCINFO["@var{input_name}", "READ_TIMEOUT"] = @var{timeout in milliseconds}
 8836 @end example
 8837 
 8838 When set, this causes @command{gawk} to time out and return failure
 8839 if no data is available to read within the specified timeout period.
 8840 For example, a TCP client can decide to give up on receiving
 8841 any response from the server after a certain amount of time:
 8842 
 8843 @example
 8844 @group
 8845 Service = "/inet/tcp/0/localhost/daytime"
 8846 PROCINFO[Service, "READ_TIMEOUT"] = 100
 8847 if ((Service |& getline) > 0)
 8848     print $0
 8849 else if (ERRNO != "")
 8850     print ERRNO
 8851 @end group
 8852 @end example
 8853 
 8854 Here is how to read interactively from the user@footnote{This assumes
 8855 that standard input is the keyboard.} without waiting
 8856 for more than five seconds:
 8857 
 8858 @example
 8859 PROCINFO["/dev/stdin", "READ_TIMEOUT"] = 5000
 8860 while ((getline < "/dev/stdin") > 0)
 8861     print $0
 8862 @end example
 8863 
 8864 @command{gawk} terminates the read operation if input does not
 8865 arrive after waiting for the timeout period, returns failure,
 8866 and sets @code{ERRNO} to an appropriate string value.
 8867 A negative or zero value for the timeout is the same as specifying
 8868 no timeout at all.
 8869 
 8870 A timeout can also be set for reading from the keyboard in the implicit
 8871 loop that reads input records and matches them against patterns,
 8872 like so:
 8873 
 8874 @example
 8875 $ @kbd{gawk 'BEGIN @{ PROCINFO["-", "READ_TIMEOUT"] = 5000 @}}
 8876 > @kbd{@{ print "You entered: " $0 @}'}
 8877 @kbd{gawk}
 8878 @print{} You entered: gawk
 8879 @end example
 8880 
 8881 In this case, failure to respond within five seconds results in the following
 8882 error message:
 8883 
 8884 @example
 8885 @error{} gawk: cmd. line:2: (FILENAME=- FNR=1) fatal: error reading input file `-': Connection timed out
 8886 @end example
 8887 
 8888 The timeout can be set or changed at any time, and will take effect on the
 8889 next attempt to read from the input device. In the following example,
 8890 we start with a timeout value of one second and reduce it by
 8891 one-tenth of a second after each record, until it reaches zero
 8892 and we wait indefinitely for the input to arrive:
 8893 
 8894 @example
 8895 PROCINFO[Service, "READ_TIMEOUT"] = 1000
 8896 while ((Service |& getline) > 0) @{
 8897     print $0
 8898     PROCINFO[Service, "READ_TIMEOUT"] -= 100
 8899 @}
 8900 @end example
 8901 
 8902 @quotation NOTE
 8903 You should not assume that the read operation will block
 8904 exactly after the tenth record has been printed. It is possible that
 8905 @command{gawk} will read and buffer more than one record's
 8906 worth of data the first time. Because of this, changing the timeout
 8907 value as in the preceding example is not very useful.
 8908 @end quotation
 8909 
 8910 @cindex @env{GAWK_READ_TIMEOUT} environment variable
 8911 @cindex environment variables @subentry @env{GAWK_READ_TIMEOUT}
 8912 If the @code{PROCINFO} element is not present and the
 8913 @env{GAWK_READ_TIMEOUT} environment variable exists,
 8914 @command{gawk} uses its value to initialize the timeout value.
 8915 Relying solely on the environment variable to specify the timeout
 8916 has the disadvantage that you cannot control it
 8917 on a per-command or per-connection basis.
 8918 
 8919 @command{gawk} considers a timeout event to be an error even though
 8920 a later attempt to read from the underlying device might
 8921 succeed. This is a limitation, and it also
 8922 means that you cannot use read timeouts to multiplex input from
 8923 two or more sources.  @xref{Retrying Input} for a way to enable
 8924 later I/O attempts to succeed.
 8925 
 8926 Assigning a timeout value prevents read operations from
 8927 blocking indefinitely. But bear in mind that there are other ways
 8928 @command{gawk} can stall waiting for an input device to be ready.
 8929 A network client can sometimes take a long time to establish
 8930 a connection before it can start reading any data,
 8931 or the attempt to open a FIFO special file for reading can block
 8932 indefinitely until some other process opens it for writing.
 8933 
 8934 @node Retrying Input
 8935 @section Retrying Reads After Certain Input Errors
 8936 @cindex retrying input
 8937 
 8938 @cindex differences in @command{awk} and @command{gawk} @subentry retrying input
 8939 This @value{SECTION} describes a feature that is specific to @command{gawk}.
 8940 
 8941 When @command{gawk} encounters an error while reading input, by
 8942 default @code{getline} returns @minus{}1, and subsequent attempts to
 8943 read from that file result in an end-of-file indication.  However, you
 8944 may optionally instruct @command{gawk} to allow I/O to be retried when
 8945 certain errors are encountered by setting a special element in
 8946 the @code{PROCINFO} array (@pxref{Auto-set}):
 8947 
 8948 @example
 8949 PROCINFO["@var{input_name}", "RETRY"] = 1
 8950 @end example
 8951 
 8952 When this element exists, @command{gawk} checks the value of the system
 8953 (C language)
 8954 @code{errno} variable when an I/O error occurs.  If @code{errno} indicates
 8955 that a subsequent I/O attempt may succeed, @code{getline} instead returns
 8956 @minus{}2, and
 8957 further calls to @code{getline} may succeed.  This applies to the @code{errno}
 8958 values @code{EAGAIN}, @code{EWOULDBLOCK}, @code{EINTR}, and @code{ETIMEDOUT}.
 8959 
 8960 This feature is useful in conjunction with
 8961 @code{PROCINFO["@var{input_name}", "READ_TIMEOUT"]} or in situations where a file
 8962 descriptor has been configured to behave in a non-blocking fashion.
 8963 
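The following sketch combines the two features.  It requires a version
of @command{gawk} that supports the @code{RETRY} element, and it is
skipped when @command{gawk} is not installed.  The command string, the
200-millisecond timeout, and the variable names are arbitrary choices
for illustration.  Each time the read times out, @code{getline} is
expected to return @minus{}2, and the loop simply tries again; the
number of retries depends on timing:

```shell
if command -v gawk >/dev/null 2>&1; then
    out=$(gawk 'BEGIN {
        cmd = "sleep 1; echo done"            # slow command, made up for the demo
        PROCINFO[cmd, "READ_TIMEOUT"] = 200   # give up on each read after 200 ms
        PROCINFO[cmd, "RETRY"] = 1            # but allow the read to be retried
        while ((rc = (cmd | getline line)) == -2)
            retries++                         # timed out; try again
        if (rc > 0)
            print line
        print (retries > 0 ? "retried" : "no retries")
        close(cmd)
    }')
    printf '%s\n' "$out"
else
    echo "gawk not found; skipping" >&2
fi
```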
 8964 @node Command-line directories
 8965 @section Directories on the Command Line
 8966 @cindex differences in @command{awk} and @command{gawk} @subentry command-line directories
 8967 @cindex directories @subentry command-line
 8968 @cindex command line @subentry directories on
 8969 
 8970 According to the POSIX standard, files named on the @command{awk}
 8971 command line must be text files; it is a fatal error if they are not.
 8972 Most versions of @command{awk} treat a directory on the command line as
 8973 a fatal error.
 8974 
 8975 By default, @command{gawk} produces a warning for a directory on the
 8976 command line, but otherwise ignores it.  This makes it easier to use
 8977 shell wildcards with your @command{awk} program:
 8978 
 8979 @example
 8980 $ @kbd{gawk -f whizprog.awk *}        @ii{Directories could kill this program}
 8981 @end example
 8982 
 8983 If either of the @option{--posix}
 8984 or @option{--traditional} options is given, then @command{gawk} reverts
 8985 to treating a directory on the command line as a fatal error.
 8986 
 8987 @xref{Extension Sample Readdir} for a way to treat directories
 8988 as usable data from an @command{awk} program.
 8989 
 8990 @node Input Summary
 8991 @section Summary
 8992 
 8993 @itemize @value{BULLET}
 8994 @item
 8995 Input is split into records based on the value of @code{RS}.
 8996 The possibilities are as follows:
 8997 
 8998 @multitable @columnfractions .25 .35 .40
 8999 @headitem Value of @code{RS} @tab Records are split on @dots{} @tab @command{awk} / @command{gawk}
 9000 @item Any single character @tab That character @tab @command{awk}
 9001 @item The empty string (@code{""}) @tab Runs of two or more newlines @tab @command{awk}
 9002 @item A regexp @tab Text that matches the regexp @tab @command{gawk}
 9003 @end multitable
 9004 
 9005 @item
 9006 @code{FNR} indicates how many records have been read from the current input file;
 9007 @code{NR} indicates how many records have been read in total.
 9008 
 9009 @item
 9010 @command{gawk} sets @code{RT} to the text matched by @code{RS}.
 9011 
 9012 @item
 9013 After splitting the input into records, @command{awk} further splits
 9014 the records into individual fields, named @code{$1}, @code{$2}, and so
 9015 on. @code{$0} is the whole record, and @code{NF} indicates how many
 9016 fields there are.  By default, fields are separated by runs of
 9017 whitespace.
 9018 
 9019 @item
 9020 Fields may be referenced using a variable, as in @code{$NF}.  Fields
 9021 may also be assigned values, which causes the value of @code{$0} to be
 9022 recomputed when it is later referenced. Assigning to a field with a number
 9023 greater than @code{NF} creates the field and rebuilds the record, using
 9024 @code{OFS} to separate the fields.  Incrementing @code{NF} does the same
 9025 thing. Decrementing @code{NF} throws away fields and rebuilds the record.
 9026 
 9027 @item
 9028 Field splitting is more complicated than record splitting:
 9029 
 9030 @multitable @columnfractions .40 .40 .20
 9031 @headitem Field separator value @tab Fields are split @dots{} @tab @command{awk} / @command{gawk}
 9032 @item @code{FS == " "} @tab On runs of whitespace @tab @command{awk}
 9033 @item @code{FS == @var{any single character}} @tab On that character @tab @command{awk}
 9034 @item @code{FS == @var{regexp}} @tab On text matching the regexp @tab @command{awk}
 9035 @item @code{FS == ""}  @tab Such that each individual character is a separate field @tab @command{gawk}
 9036 @item @code{FIELDWIDTHS == @var{list of columns}} @tab Based on character position @tab @command{gawk}
 9037 @item @code{FPAT == @var{regexp}} @tab On the text surrounding text matching the regexp @tab @command{gawk}
 9038 @end multitable
 9039 
 9040 @item
 9041 Using @samp{FS = "\n"} causes the entire record to be a single field
 9042 (assuming that newlines separate records).
 9043 
 9044 @item
 9045 @code{FS} may be set from the command line using the @option{-F} option.
 9046 This can also be done using command-line variable assignment.
 9047 
 9048 @item
 9049 Use @code{PROCINFO["FS"]} to see how fields are being split.
 9050 
 9051 @item
 9052 Use @code{getline} in its various forms to read additional records
 9053 from the default input stream, from a file, or from a pipe or coprocess.
 9054 
 9055 @item
 9056 Use @code{PROCINFO[@var{file}, "READ_TIMEOUT"]} to cause reads to time out
 9057 for @var{file}.
 9058 
 9059 @cindex POSIX mode
 9060 @item
 9061 Directories on the command line are fatal for standard @command{awk};
 9062 @command{gawk} warns about them and otherwise ignores them unless @option{--posix} or @option{--traditional} is given.
 9063 
 9064 @end itemize
 9065 
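Two of the points in this summary can be checked quickly from the shell.
The following sketch uses made-up input to show a field reference
through a variable, and how assigning to a field beyond @code{NF}
rebuilds the record using @code{OFS}:

```shell
out=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" } {
    print $NF          # last field, referenced through NF
    $5 = "e"           # creates fields 4 and 5 and rebuilds $0 using OFS
    print
    print NF
}')
printf '%s\n' "$out"
```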
 9066 @c EXCLUDE START
 9067 @node Input Exercises
 9068 @section Exercises
 9069 
 9070 @enumerate
 9071 @item
 9072 Using the @code{FIELDWIDTHS} variable (@pxref{Constant Size}),
 9073 write a program to read election data, where each record represents
 9074 one voter's votes.  Come up with a way to define which columns are
 9075 associated with each ballot item, and print the total votes,
 9076 including abstentions, for each item.
 9077 
 9078 @end enumerate
 9079 @c EXCLUDE END
 9080 
 9081 @node Printing
 9082 @chapter Printing Output
 9083 
 9084 @cindex printing
 9085 @cindex output, printing @seeentry{printing}
 9086 One of the most common programming actions is to @dfn{print}, or output,
 9087 some or all of the input.  Use the @code{print} statement
 9088 for simple output, and the @code{printf} statement
 9089 for fancier formatting.
 9090 The @code{print} statement is not limited when
 9091 computing @emph{which} values to print. However, with two exceptions,
 9092 you cannot specify @emph{how} to print them---how many
 9093 columns, whether to use exponential notation or not, and so on.
 9094 (For the exceptions, @pxref{Output Separators} and
 9095 @ref{OFMT}.)
 9096 For printing with specifications, you need the @code{printf} statement
 9097 (@pxref{Printf}).
 9098 
 9099 @cindex @code{print} statement
 9100 @cindex @code{printf} statement
 9101 Besides basic and formatted printing, this @value{CHAPTER}
 9102 also covers I/O redirections to files and pipes, introduces
 9103 the special @value{FN}s that @command{gawk} processes internally,
 9104 and discusses the @code{close()} built-in function.
 9105 
 9106 @menu
 9107 * Print::                       The @code{print} statement.
 9108 * Print Examples::              Simple examples of @code{print} statements.
 9109 * Output Separators::           The output separators and how to change them.
 9110 * OFMT::                        Controlling Numeric Output With @code{print}.
 9111 * Printf::                      The @code{printf} statement.
 9112 * Redirection::                 How to redirect output to multiple files and
 9113                                 pipes.
 9114 * Special FD::                  Special files for I/O.
 9115 * Special Files::               File name interpretation in @command{gawk}.
 9116                                 @command{gawk} allows access to inherited file
 9117                                 descriptors.
 9118 * Close Files And Pipes::       Closing Input and Output Files and Pipes.
 9119 * Nonfatal::                    Enabling Nonfatal Output.
 9120 * Output Summary::              Output summary.
 9121 * Output Exercises::            Exercises.
 9122 @end menu
 9123 
 9124 @node Print
 9125 @section The @code{print} Statement
 9126 
 9127 Use the @code{print} statement to produce output with simple, standardized
 9128 formatting.  You specify only the strings or numbers to print, in a
 9129 list separated by commas.  They are output, separated by single spaces,
 9130 followed by a newline.  The statement looks like this:
 9131 
 9132 @example
 9133 print @var{item1}, @var{item2}, @dots{}
 9134 @end example
 9135 
 9136 @noindent
 9137 The entire list of items may be optionally enclosed in parentheses.  The
 9138 parentheses are necessary if any of the item expressions uses the @samp{>}
 9139 relational operator; otherwise it could be confused with an output redirection
 9140 (@pxref{Redirection}).
 9141 
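The difference is easy to miss, so here is a small sketch of it.  The
variable names and values are arbitrary, and the program is run in a
scratch directory because the unparenthesized form creates a file:

```shell
dir=$(mktemp -d)
shown=$(cd "$dir" && awk 'BEGIN {
    a = 5; b = 3
    print (a > b)      # comparison: prints 1 on standard output
    print a > b        # redirection: writes 5 to a file named "3"
}')
saved=$(cat "$dir/3")
printf 'stdout: %s, file 3: %s\n' "$shown" "$saved"
rm -rf "$dir"
```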
 9142 The items to print can be constant strings or numbers, fields of the
 9143 current record (such as @code{$1}), variables, or any @command{awk}
 9144 expression.  Numeric values are converted to strings and then printed.
 9145 
 9146 @cindex records @subentry printing
 9147 @cindex lines @subentry blank, printing
 9148 @cindex text, printing
 9149 The simple statement @samp{print} with no items is equivalent to
 9150 @samp{print $0}: it prints the entire current record.  To print a blank
 9151 line, use @samp{print ""}.
 9152 To print a fixed piece of text, use a string constant, such as
 9153 @w{@code{"Don't Panic"}}, as one item.  If you forget to use the
 9154 double-quote characters, your text is taken as an @command{awk}
 9155 expression, and you will probably get an error.  Keep in mind that a
 9156 space is printed between any two items.
 9157 
 9158 Note that the @code{print} statement is a statement and not an
 9159 expression---you can't use it in the pattern part of a
 9160 pattern--action statement, for example.
 9161 
 9162 @node Print Examples
 9163 @section @code{print} Statement Examples
 9164 
 9165 Each @code{print} statement makes at least one line of output.  However, it
 9166 isn't limited to only one line.  If an item value is a string containing a
 9167 newline, the newline is output along with the rest of the string.  A
 9168 single @code{print} statement can make any number of lines this way.
 9169 
 9170 @cindex newlines @subentry printing
 9171 The following is an example of printing a string that contains embedded
 9172 @ifinfo
 9173 newlines
 9174 (the @samp{\n} is an escape sequence, used to represent the newline
 9175 character; @pxref{Escape Sequences}):
 9176 @end ifinfo
 9177 @ifhtml
 9178 newlines
 9179 (the @samp{\n} is an escape sequence, used to represent the newline
 9180 character; @pxref{Escape Sequences}):
 9181 @end ifhtml
 9182 @ifnotinfo
 9183 @ifnothtml
 9184 newlines:
 9185 @end ifnothtml
 9186 @end ifnotinfo
 9187 
 9188 @example
 9189 @group
 9190 $ @kbd{awk 'BEGIN @{ print "line one\nline two\nline three" @}'}
 9191 @print{} line one
 9192 @print{} line two
 9193 @print{} line three
 9194 @end group
 9195 @end example
 9196 
 9197 @cindex fields @subentry printing
 9198 The next example, which is run on the @file{inventory-shipped} file,
 9199 prints the first two fields of each input record, with a space between
 9200 them:
 9201 
 9202 @example
 9203 $ @kbd{awk '@{ print $1, $2 @}' inventory-shipped}
 9204 @print{} Jan 13
 9205 @print{} Feb 15
 9206 @print{} Mar 15
 9207 @dots{}
 9208 @end example
 9209 
 9210 @cindex @code{print} statement @subentry commas, omitting
 9211 @cindex troubleshooting @subentry @code{print} statement, omitting commas
 9212 A common mistake in using the @code{print} statement is to omit the comma
 9213 between two items.  This often has the effect of making the items run
 9214 together in the output, with no space.  The reason for this is that
 9215 juxtaposing two string expressions in @command{awk} means to concatenate
 9216 them.  Here is the same program, without the comma:
 9217 
 9218 @example
 9219 $ @kbd{awk '@{ print $1 $2 @}' inventory-shipped}
 9220 @print{} Jan13
 9221 @print{} Feb15
 9222 @print{} Mar15
 9223 @dots{}
 9224 @end example
 9225 
 9226 @cindex @code{BEGIN} pattern @subentry headings, adding
 9227 To someone unfamiliar with the @file{inventory-shipped} file, neither
 9228 example's output makes much sense.  A heading line at the beginning
 9229 would make it clearer.  Let's add some headings to our table of months
 9230 (@code{$1}) and green crates shipped (@code{$2}).  We do this using
 9231 a @code{BEGIN} rule (@pxref{BEGIN/END}) so that the headings are only
 9232 printed once:
 9233 
 9234 @example
 9235 awk 'BEGIN @{  print "Month Crates"
 9236               print "----- ------" @}
 9237            @{  print $1, $2 @}' inventory-shipped
 9238 @end example
 9239 
 9240 @noindent
 9241 When run, the program prints the following:
 9242 
 9243 @example
 9244 Month Crates
 9245 ----- ------
 9246 Jan 13
 9247 Feb 15
 9248 Mar 15
 9249 @dots{}
 9250 @end example
 9251 
 9252 @noindent
 9253 The only problem, however, is that the headings and the table data
 9254 don't line up!  We can fix this by printing some spaces between the
 9255 two fields:
 9256 
 9257 @example
 9258 @group
 9259 awk 'BEGIN @{ print "Month Crates"
 9260              print "----- ------" @}
 9261            @{ print $1, "     ", $2 @}' inventory-shipped
 9262 @end group
 9263 @end example
 9264 
 9265 @cindex @code{printf} statement @subentry columns, aligning
 9266 @cindex columns @subentry aligning
 9267 Lining up columns this way can get pretty
 9268 complicated when there are many columns to fix.  Counting spaces for two
 9269 or three columns is simple, but any more than this can take up
 9270 a lot of time. This is why the @code{printf} statement was
 9271 created (@pxref{Printf});
 9272 one of its specialties is lining up columns of data.
 9273 
 9274 @cindex line continuations @subentry in @code{print} statement
 9275 @cindex @code{print} statement @subentry line continuations and
 9276 @quotation NOTE
 9277 You can continue either a @code{print} or
 9278 @code{printf} statement simply by putting a newline after any comma
 9279 (@pxref{Statements/Lines}).
 9280 @end quotation
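
As a small illustration of the preceding note, the following sketch breaks
a @code{print} statement after a comma.  The shell function name
@code{continued_print} is hypothetical; it exists only to make the command
easy to rerun and is not part of @command{awk}:

```shell
# continued_print is a hypothetical wrapper name used only for this sketch.
continued_print() {
  # The newline after the comma continues the print statement.
  awk 'BEGIN { print "Month",
                     "Crates" }'
}
continued_print
```

The two items still form a single output record, so this should print
@samp{Month Crates} on one line.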
 9281 
 9282 @node Output Separators
 9283 @section Output Separators
 9284 
 9285 @cindex @code{OFS} variable
 9286 As mentioned previously, a @code{print} statement contains a list
 9287 of items separated by commas.  In the output, the items are normally
 9288 separated by single spaces.  However, this doesn't need to be the case;
 9289 a single space is simply the default.  Any string of
 9290 characters may be used as the @dfn{output field separator} by setting the
 9291 predefined variable @code{OFS}.  The initial value of this variable
 9292 is the string @w{@code{" "}} (i.e., a single space).
 9293 
 9294 The output from an entire @code{print} statement is called an @dfn{output
 9295 record}.  Each @code{print} statement outputs one output record, and
 9296 then outputs a string called the @dfn{output record separator} (or
 9297 @code{ORS}).  The initial value of @code{ORS} is the string @code{"\n"}
 9298 (i.e., a newline character).  Thus, each @code{print} statement normally
 9299 makes a separate line.
 9300 
 9301 @cindex output @subentry records
 9302 @cindex output record separator @seeentry{@code{ORS} variable}
 9303 @cindex @code{ORS} variable
 9304 @cindex @code{BEGIN} pattern @subentry @code{OFS}/@code{ORS} variables, assigning values to
 9305 In order to change how output fields and records are separated, assign
 9306 new values to the variables @code{OFS} and @code{ORS}.  The usual
 9307 place to do this is in the @code{BEGIN} rule
 9308 (@pxref{BEGIN/END}), so
 9309 that it happens before any input is processed.  It can also be done
 9310 with assignments on the command line, before the names of the input
 9311 files, or using the @option{-v} command-line option
 9312 (@pxref{Options}).
 9313 The following example prints the first and second fields of each input
 9314 record, separated by a semicolon, with a blank line added after each
 9315 output record:
 9316 
 9317 
 9318 @example
 9319 $ @kbd{awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @}}
 9320 >            @kbd{@{ print $1, $2 @}' mail-list}
 9321 @print{} Amelia;555-5553
 9322 @print{}
 9323 @print{} Anthony;555-3412
 9324 @print{}
 9325 @print{} Becky;555-7685
 9326 @print{}
 9327 @print{} Bill;555-1675
 9328 @print{}
 9329 @print{} Broderick;555-0542
 9330 @print{}
 9331 @print{} Camilla;555-2912
 9332 @print{}
 9333 @print{} Fabius;555-1234
 9334 @print{}
 9335 @print{} Julie;555-6699
 9336 @print{}
 9337 @print{} Martin;555-6480
 9338 @print{}
 9339 @print{} Samuel;555-3430
 9340 @print{}
 9341 @print{} Jean-Paul;555-2127
 9342 @print{}
 9343 @end example
 9344 
 9345 If the value of @code{ORS} does not contain a newline, the program's output
 9346 runs together on a single line.
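
The following sketch illustrates two of the points made above, assuming a
two-record stand-in for @file{mail-list}: the separators are assigned with
@option{-v} instead of in a @code{BEGIN} rule, and @code{ORS} contains no
newline, so the records run together.  The function name
@code{join_records} is hypothetical, and the @code{printf} in the
@code{END} rule is there only to terminate the line:

```shell
# join_records is a hypothetical wrapper name used only for this sketch.
join_records() {
  # Two sample records stand in for the mail-list file.
  printf 'Amelia 555-5553\nAnthony 555-3412\n' |
    awk -v OFS=':' -v ORS='; ' '
      { print $1, $2 }       # fields joined by OFS, record ended by ORS
      END { printf "\n" }    # ORS has no newline, so add one at the end
    '
}
join_records
```

Both records should appear on one line, each followed by @samp{; }.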
 9347 
 9348 @node OFMT
 9349 @section Controlling Numeric Output with @code{print}
 9350 @cindex numeric @subentry output format
 9351 @cindex formats, numeric output
 9352 When printing numeric values with the @code{print} statement,
 9353 @command{awk} internally converts each number to a string of characters
 9354 and prints that string.  @command{awk} uses the @code{sprintf()} function
 9355 to do this conversion
 9356 (@pxref{String Functions}).
 9357 For now, it suffices to say that the @code{sprintf()}
 9358 function accepts a @dfn{format specification} that tells it how to format
 9359 numbers (or strings), and that there are a number of different ways in which
 9360 numbers can be formatted.  The different format specifications are discussed
 9361 more fully in
 9362 @ref{Control Letters}.
 9363 
 9364 @cindexawkfunc{sprintf}
 9365 @cindex @code{OFMT} variable
 9366 @cindex output @subentry format specifier, @code{OFMT}
 9367 The predefined variable @code{OFMT} contains the format specification
 9368 that @code{print} uses with @code{sprintf()} when it wants to convert a
 9369 number to a string for printing.
 9370 The default value of @code{OFMT} is @code{"%.6g"}.
 9371 The way @code{print} prints numbers can be changed
 9372 by supplying a different format specification
 9373 for the value of @code{OFMT}, as shown in the following example:
 9374 
 9375 @example
 9376 $ @kbd{awk 'BEGIN @{}
 9377 >   @kbd{OFMT = "%.0f"  # print numbers as integers (rounds)}
 9378 >   @kbd{print 17.23, 17.54 @}'}
 9379 @print{} 17 18
 9380 @end example
 9381 
 9382 @noindent
 9383 @cindex dark corner @subentry @code{OFMT} variable
 9384 @cindex POSIX @command{awk} @subentry @code{OFMT} variable and
 9385 @cindex @code{OFMT} variable @subentry POSIX @command{awk} and
 9386 According to the POSIX standard, @command{awk}'s behavior is undefined
 9387 if @code{OFMT} contains anything but a floating-point conversion specification.
 9388 @value{DARKCORNER}
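
As a further sketch, the following compares the default @code{OFMT} with
a two-decimal setting.  It also relies on a detail not spelled out above:
as specified by POSIX, a value that is exactly an integer is converted as
an integer, so @code{OFMT} is not applied to it.  The function name
@code{ofmt_demo} is hypothetical:

```shell
# ofmt_demo is a hypothetical wrapper name used only for this sketch.
ofmt_demo() {
  awk 'BEGIN {
    print 3.14159265        # default OFMT is "%.6g"
    OFMT = "%.2f"
    print 3.14159265, 17    # 17 is an integer, so OFMT is not applied to it
  }'
}
ofmt_demo
```

The first line of output should be @samp{3.14159} and the second
@samp{3.14 17}.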
 9389 
 9390 @node Printf
 9391 @section Using @code{printf} Statements for Fancier Printing
 9392 
 9393 @cindex @code{printf} statement
 9394 @cindex output @subentry formatted
 9395 @cindex formatting @subentry output
 9396 For more precise control over the output format than what is
 9397 provided by @code{print}, use @code{printf}.
 9398 With @code{printf} you can
 9399 specify the width to use for each item, as well as various
 9400 formatting choices for numbers (such as what output base to use, whether to
 9401 print an exponent, whether to print a sign, and how many digits to print
 9402 after the decimal point).
 9403 
 9404 @menu
 9405 * Basic Printf::                Syntax of the @code{printf} statement.
 9406 * Control Letters::             Format-control letters.
 9407 * Format Modifiers::            Format-specification modifiers.
 9408 * Printf Examples::             Several examples.
 9409 @end menu
 9410 
 9411 @node Basic Printf
 9412 @subsection Introduction to the @code{printf} Statement
 9413 
 9414 @cindex @code{printf} statement @subentry syntax of
 9415 A simple @code{printf} statement looks like this:
 9416 
 9417 @example
 9418 printf @var{format}, @var{item1}, @var{item2}, @dots{}
 9419 @end example
 9420 
 9421 @noindent
 9422 As for @code{print}, the entire list of arguments may optionally be
 9423 enclosed in parentheses. Here too, the parentheses are necessary if any
 9424 of the item expressions uses the @samp{>} relational operator; otherwise,
 9425 it can be confused with an output redirection (@pxref{Redirection}).
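
A minimal sketch of the parenthesized form follows; the function name
@code{paren_demo} is hypothetical.  Only the safe, parenthesized variant
is run, because the unparenthesized one would create a file:

```shell
# paren_demo is a hypothetical wrapper name used only for this sketch.
paren_demo() {
  # Inside the parentheses, ">" is a comparison: 1 > 2 is false (0)
  # and 2 > 1 is true (1).  Without the parentheses, awk would treat
  # "> 1" as a redirection to a file named "1".
  awk 'BEGIN { printf("%d %d\n", 1 > 2, 2 > 1) }'
}
paren_demo
```

This should print @samp{0 1}.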
 9426 
 9427 @cindex format specifiers
 9428 The difference between @code{printf} and @code{print} is the @var{format}
 9429 argument.  This is an expression whose value is taken as a string; it
 9430 specifies how to output each of the other arguments.  It is called the
 9431 @dfn{format string}.
 9432 
 9433 The format string is very similar to that in the ISO C library function
 9434 @code{printf()}.  Most of @var{format} is text to output verbatim.
 9435 Scattered among this text are @dfn{format specifiers}---one per item.
 9436 Each format specifier says to output the next item in the argument list
 9437 at that place in the format.
 9438 
 9439 The @code{printf} statement does not automatically append a newline
 9440 to its output.  It outputs only what the format string specifies.
 9441 So if a newline is needed, you must include one in the format string.
 9442 The output separator variables @code{OFS} and @code{ORS} have no effect
 9443 on @code{printf} statements. For example:
 9444 
 9445 @example
 9446 @group
 9447 $ @kbd{awk 'BEGIN @{}
 9448 >    @kbd{ORS = "\nOUCH!\n"; OFS = "+"}
 9449 >    @kbd{msg = "Don\47t Panic!"}
 9450 >    @kbd{printf "%s\n", msg}
 9451 > @kbd{@}'}
 9452 @print{} Don't Panic!
 9453 @end group
 9454 @end example
 9455 
 9456 @noindent
 9457 Here, neither the @samp{+} nor the @samp{OUCH!} appears in
 9458 the output message.
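
The absence of an automatic newline can be seen in the following sketch,
in which two @code{printf} statements build a single line of output.  The
function name @code{no_newline_demo} is hypothetical:

```shell
# no_newline_demo is a hypothetical wrapper name used only for this sketch.
no_newline_demo() {
  awk 'BEGIN {
    ORS = "!\n"        # has no effect on printf
    printf "one"       # no newline is added here ...
    printf "two\n"     # ... so "two" continues the same line
  }'
}
no_newline_demo
```

The output should be the single line @samp{onetwo}.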
 9459 
 9460 @node Control Letters
 9461 @subsection Format-Control Letters
 9462 @cindex @code{printf} statement @subentry format-control characters
 9463 @cindex format specifiers @subentry @code{printf} statement
 9464 
 9465 A format specifier starts with the character @samp{%} and ends with
 9466 a @dfn{format-control letter}---it tells the @code{printf} statement
 9467 how to output one item.  The format-control letter specifies what @emph{kind}
 9468 of value to print.  The rest of the format specifier is made up of
 9469 optional @dfn{modifiers} that control @emph{how} to print the value, such as
 9470 the field width.  Here is a list of the format-control letters:
 9471 
 9472 @c @asis for docbook to come out right
 9473 @table @asis
 9474 @item @code{%a}, @code{%A}
 9475 A floating point number of the form
 9476 [@code{-}]@code{0x@var{h}.@var{hhhh}p+-@var{dd}}
 9477 (C99 hexadecimal floating point format).
 9478 For @code{%A},
 9479 uppercase letters are used instead of lowercase ones.
 9480 
 9481 @quotation NOTE
 9482 The current POSIX standard requires support for @code{%a} and @code{%A} in
 9483 @command{awk}. As far as we know, besides @command{gawk}, the only other
 9484 version of @command{awk} that actually implements them is BWK @command{awk}.
 9485 Their use is thus highly nonportable!
 9486 
 9487 Furthermore, these formats are not available on any system where the
 9488 underlying C library @code{printf()} function does not support them. As
 9489 of this writing, among current systems, only OpenVMS is known to not
 9490 support them.
 9491 @end quotation
 9492 
 9493 @item @code{%c}
 9494 Print a number as a character; thus, @samp{printf "%c",
 9495 65} outputs the letter @samp{A}. The output for a string value is
 9496 the first character of the string.
 9497 
 9498 @cindex dark corner @subentry format-control characters
 9499 @cindex @command{gawk} @subentry format-control characters
 9500 @quotation NOTE
 9501 The POSIX standard says the first character of a string is printed.
 9502 In locales with multibyte characters, @command{gawk} attempts to
 9503 convert the leading bytes of the string into a valid wide character
 9504 and then to print the multibyte encoding of that character.
 9505 Similarly, when printing a numeric value, @command{gawk} allows the
 9506 value to be within the numeric range of values that can be held
 9507 in a wide character.
 9508 If the conversion to multibyte encoding fails, @command{gawk}
 9509 uses the low eight bits of the value as the character to print.
 9510 
 9511 Other @command{awk} versions generally restrict themselves to printing
 9512 the first byte of a string or to numeric values within the range of
 9513 a single byte (0--255).
 9514 @value{DARKCORNER}
 9515 @end quotation
 9516 
 9517 
 9518 @item @code{%d}, @code{%i}
 9519 Print a decimal integer.
 9520 The two control letters are equivalent.
 9521 (The @samp{%i} specification is for compatibility with ISO C.)
 9522 
 9523 @item @code{%e}, @code{%E}
 9524 Print a number in scientific (exponential) notation.
 9525 For example:
 9526 
 9527 @example
 9528 printf "%4.3e\n", 1950
 9529 @end example
 9530 
 9531 @noindent
 9532 prints @samp{1.950e+03}, with a total of four significant figures, three of
 9533 which follow the decimal point.
 9534 (The @samp{4.3} represents two modifiers,
 9535 discussed in the next @value{SUBSECTION}.)
 9536 @samp{%E} uses @samp{E} instead of @samp{e} in the output.
 9537 
 9538 @item @code{%f}
 9539 Print a number in floating-point notation.
 9540 For example:
 9541 
 9542 @example
 9543 printf "%4.3f", 1950
 9544 @end example
 9545 
 9546 @noindent
 9547 prints @samp{1950.000}, in a field at least four characters wide, with three
 9548 digits following the decimal point.
 9549 (The @samp{4.3} represents two modifiers,
 9550 discussed in the next @value{SUBSECTION}.)
 9551 
 9552 On systems supporting IEEE 754 floating-point format, values
 9553 representing negative
 9554 infinity are formatted as
 9555 @samp{-inf} or @samp{-infinity},
 9556 and positive infinity as
 9557 @samp{inf} or @samp{infinity}.
 9558 The special ``not a number'' value formats as @samp{-nan} or @samp{nan}
 9559 (@pxref{Math Definitions}).
 9560 
 9561 @item @code{%F}
 9562 Like @samp{%f}, but the infinity and ``not a number'' values are spelled
 9563 using uppercase letters.
 9564 
 9565 The @samp{%F} format is a POSIX extension to ISO C; not all systems
 9566 support it.  On those that don't, @command{gawk} uses @samp{%f} instead.
 9567 
 9568 @item @code{%g}, @code{%G}
 9569 Print a number in either scientific notation or in floating-point
 9570 notation, whichever uses fewer characters; if the result is printed in
 9571 scientific notation, @samp{%G} uses @samp{E} instead of @samp{e}.
 9572 
 9573 @item @code{%o}
 9574 Print an unsigned octal integer
 9575 (@pxref{Nondecimal-numbers}).
 9576 
 9577 @item @code{%s}
 9578 Print a string.
 9579 
 9580 @item @code{%u}
 9581 Print an unsigned decimal integer.
 9582 (This format is of marginal use, because all numbers in @command{awk}
 9583 are floating point; it is provided primarily for compatibility with C.)
 9584 
 9585 @item @code{%x}, @code{%X}
 9586 Print an unsigned hexadecimal integer;
 9587 @samp{%X} uses the letters @samp{A} through @samp{F}
 9588 instead of @samp{a} through @samp{f}
 9589 (@pxref{Nondecimal-numbers}).
 9590 
 9591 @item @code{%%}
 9592 Print a single @samp{%}.
 9593 This does not consume an
 9594 argument and it ignores any modifiers.
 9595 @end table
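
The following sketch exercises several of the control letters from the
preceding list, using only the widely supported ones.  The function name
@code{control_letters_demo} and the sample values are ours; the exponent
is assumed to be printed with two digits, as common C libraries do:

```shell
# control_letters_demo is a hypothetical wrapper name used only for this sketch.
control_letters_demo() {
  awk 'BEGIN {
    printf "%c %c\n", 65, "hello"          # number as a character; first character of a string
    printf "%d\n", 42.9                    # integer conversion drops the fraction
    printf "%o %x %X\n", 8, 255, 255       # octal and hexadecimal
    printf "%e\n", 1950                    # scientific notation, default precision
    printf "%.2f %s %%\n", 3.14159, "str"  # fixed point, string, literal percent sign
  }'
}
control_letters_demo
```

The five output lines should be @samp{A h}, @samp{42}, @samp{10 ff FF},
@samp{1.950000e+03}, and @samp{3.14 str %}.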
 9596 
 9597 @cindex dark corner @subentry format-control characters
 9598 @cindex @command{gawk} @subentry format-control characters
 9599 @quotation NOTE
 9600 When using the integer format-control letters for values that are
 9601 outside the range of the widest C integer type, @command{gawk} switches to
 9602 the @samp{%g} format specifier. If @option{--lint} is provided on the
 9603 command line (@pxref{Options}), @command{gawk}
 9604 warns about this.  Other versions of @command{awk} may print invalid
 9605 values or do something else entirely.
 9606 @value{DARKCORNER}
 9607 @end quotation
 9608 
 9609 @quotation NOTE
 9610 The IEEE 754 standard for floating-point arithmetic allows for special
 9611 values that represent ``infinity'' (positive and negative) and values
 9612 that are ``not a number'' (NaN).
 9613 
 9614 Input and output of these values occurs as text strings. This is
 9615 somewhat problematic for the @command{awk} language, which predates
 9616 the IEEE standard.  For further details, see
 9617 @ref{POSIX Floating Point Problems}.
 9618 @end quotation
 9619 
 9620 @node Format Modifiers
 9621 @subsection Modifiers for @code{printf} Formats
 9622 
 9623 @cindex @code{printf} statement @subentry modifiers
 9624 @cindex modifiers, in format specifiers
 9625 A format specification can also include @dfn{modifiers} that can control
 9626 how much of the item's value is printed, as well as how much space it gets.
 9627 The modifiers come between the @samp{%} and the format-control letter.
 9628 We use the bullet symbol ``@bullet{}'' in the following examples to
 9629 represent
 9630 spaces in the output. Here are the possible modifiers, in the order in
 9631 which they may appear:
 9632 
 9633 @table @asis
 9634 @cindex differences in @command{awk} and @command{gawk} @subentry @code{print}/@code{printf} statements
 9635 @cindex @code{printf} statement @subentry positional specifiers
 9636 @c the code{} does NOT start a secondary
 9637 @cindex positional specifiers, @code{printf} statement
 9638 @item @code{@var{N}$}
 9639 An integer constant followed by a @samp{$} is a @dfn{positional specifier}.
 9640 Normally, format specifications are applied to arguments in the order
 9641 given in the format string.  With a positional specifier, the format
 9642 specification is applied to a specific argument, instead of what
 9643 would be the next argument in the list.  Positional specifiers begin
 9644 counting with one. Thus:
 9645 
 9646 @example
 9647 printf "%s %s\n", "don't", "panic"
 9648 printf "%2$s %1$s\n", "panic", "don't"
 9649 @end example
 9650 
 9651 @noindent
 9652 prints the famous friendly message twice.
 9653 
 9654 At first glance, this feature doesn't seem to be of much use.
 9655 It is in fact a @command{gawk} extension, intended for use in translating
 9656 messages at runtime.
 9657 @xref{Printf Ordering},
 9658 which describes how and why to use positional specifiers.
 9659 For now, we ignore them.
 9660 
 9661 @item @code{-} (Minus)
 9662 The minus sign, used before the width modifier (see later on in
 9663 this list),
 9664 says to left-justify
 9665 the argument within its specified width.  Normally, the argument
 9666 is printed right-justified in the specified width.  Thus:
 9667 
 9668 @example
 9669 printf "%-4s", "foo"
 9670 @end example
 9671 
 9672 @noindent
 9673 prints @samp{foo@bullet{}}.
 9674 
 9675 @item @var{space}
 9676 For numeric conversions, prefix positive values with a space and
 9677 negative values with a minus sign.
 9678 
 9679 @item @code{+}
 9680 The plus sign, used before the width modifier (see later on in
 9681 this list),
 9682 says to always supply a sign for numeric conversions, even if the data
 9683 to format is positive. The @samp{+} overrides the space modifier.
 9684 
 9685 @item @code{#}
 9686 Use an ``alternative form'' for certain control letters.
 9687 For @samp{%o}, supply a leading zero.
 9688 For @samp{%x} and @samp{%X}, supply a leading @samp{0x} or @samp{0X} for
 9689 a nonzero result.
 9690 For @samp{%e}, @samp{%E}, @samp{%f}, and @samp{%F}, the result always
 9691 contains a decimal point.
 9692 For @samp{%g} and @samp{%G}, trailing zeros are not removed from the result.
 9693 
 9694 @item @code{0}
 9695 A leading @samp{0} (zero) acts as a flag indicating that output should be
 9696 padded with zeros instead of spaces.
 9697 This applies only to the numeric output formats.
 9698 This flag only has an effect when the field width is wider than the
 9699 value to print.
 9700 
 9701 @item @code{'}
 9702 A single quote or apostrophe character is a POSIX extension to ISO C.
 9703 It indicates that the integer part of a floating-point value, or the
 9704 whole of a decimal integer value, should have a thousands-separator
 9705 character in it.  This only works in locales that support such characters.
 9706 For example:
 9707 
 9708 @example
 9709 $ @kbd{cat thousands.awk}          @ii{Show source program}
 9710 @print{} BEGIN @{ printf "%'d\n", 1234567 @}
 9711 $ @kbd{LC_ALL=C gawk -f thousands.awk}
 9712 @print{} 1234567                   @ii{Results in} "C" @ii{locale}
 9713 $ @kbd{LC_ALL=en_US.UTF-8 gawk -f thousands.awk}
 9714 @print{} 1,234,567                 @ii{Results in US English UTF-8 locale}
 9715 @end example
 9716 
 9717 @noindent
 9718 For more information about locales and internationalization issues,
 9719 see @ref{Locales}.
 9720 
 9721 @quotation NOTE
 9722 The @samp{'} flag is a nice feature, but its use complicates things: it
 9723 becomes difficult to use it in command-line programs.  For information
 9724 on appropriate quoting tricks, see @ref{Quoting}.
 9725 @end quotation
 9726 
 9727 @item @var{width}
 9728 This is a number specifying the desired minimum width of a field.  Inserting any
 9729 number between the @samp{%} sign and the format-control character forces the
 9730 field to expand to this width.  The default way to do this is to
 9731 pad with spaces on the left.  For example:
 9732 
 9733 @example
 9734 printf "%4s", "foo"
 9735 @end example
 9736 
 9737 @noindent
 9738 prints @samp{@bullet{}foo}.
 9739 
 9740 The value of @var{width} is a minimum width, not a maximum.  If the item
 9741 value requires more than @var{width} characters, it can be as wide as
 9742 necessary.  Thus, the following:
 9743 
 9744 @example
 9745 printf "%4s", "foobar"
 9746 @end example
 9747 
 9748 @noindent
 9749 prints @samp{foobar}.
 9750 
 9751 Preceding the @var{width} with a minus sign causes the output to be
 9752 padded with spaces on the right, instead of on the left.
 9753 
 9754 @item @code{.@var{prec}}
 9755 A period followed by an integer constant
 9756 specifies the precision to use when printing.
 9757 The meaning of the precision varies by control letter:
 9758 
 9759 @table @asis
 9760 @item @code{%d}, @code{%i}, @code{%o}, @code{%u}, @code{%x}, @code{%X}
 9761 Minimum number of digits to print.
 9762 
 9763 @item @code{%e}, @code{%E}, @code{%f}, @code{%F}
 9764 Number of digits to the right of the decimal point.
 9765 
 9766 @item @code{%g}, @code{%G}
 9767 Maximum number of significant digits.
 9768 
 9769 @item @code{%s}
 9770 Maximum number of characters from the string that should print.
 9771 @end table
 9772 
 9773 Thus, the following:
 9774 
 9775 @example
 9776 printf "%.4s", "foobar"
 9777 @end example
 9778 
 9779 @noindent
 9780 prints @samp{foob}.
 9781 @end table
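
The following sketch combines several of the modifiers from the preceding
list.  Square brackets are used here, instead of the bullet symbol, to
make the padding visible.  The function name @code{modifiers_demo} is
hypothetical, and only modifiers that are not @command{gawk}-specific are
used:

```shell
# modifiers_demo is a hypothetical wrapper name used only for this sketch.
modifiers_demo() {
  awk 'BEGIN {
    printf "[%-6s] [%6s] [%.2s]\n", "foo", "foo", "foo"  # left-justify, width, precision
    printf "[%+d] [% d] [%05d]\n", 5, 5, 42              # sign, space, zero padding
    printf "[%#o] [%#x] [%.3d]\n", 8, 255, 7             # alternative forms, minimum digits
  }'
}
modifiers_demo
```

The output should be @samp{[foo   ] [   foo] [fo]}, then
@samp{[+5] [ 5] [00042]}, then @samp{[010] [0xff] [007]}.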
 9782 
 9783 The C library @code{printf}'s dynamic @var{width} and @var{prec}
 9784 capability (e.g., @code{"%*.*s"}) is supported.  Instead of
 9785 supplying explicit @var{width} and/or @var{prec} values in the format
 9786 string, they are passed in the argument list.  For example:
 9787 
 9788 @example
 9789 w = 5
 9790 p = 3
 9791 s = "abcdefg"
 9792 printf "%*.*s\n", w, p, s
 9793 @end example
 9794 
 9795 @noindent
 9796 is exactly equivalent to:
 9797 
 9798 @example
 9799 s = "abcdefg"
 9800 printf "%5.3s\n", s
 9801 @end example
 9802 
 9803 @noindent
 9804 Both programs output @samp{@w{@bullet{}@bullet{}abc}}.
 9805 Earlier versions of @command{awk} did not support this capability.
 9806 If you must use such a version, you may simulate this feature by using
 9807 concatenation to build up the format string, like so:
 9808 
 9809 @example
 9810 w = 5
 9811 p = 3
 9812 s = "abcdefg"
 9813 printf "%" w "." p "s\n", s
 9814 @end example
 9815 
 9816 @noindent
 9817 This is not particularly easy to read, but it does work.
 9818 
 9819 @c @cindex lint checks
 9820 @cindex troubleshooting @subentry fatal errors @subentry @code{printf} format strings
 9821 @cindex POSIX @command{awk} @subentry @code{printf} format strings and
 9822 C programmers may be used to supplying additional modifiers (@samp{h},
 9823 @samp{j}, @samp{l}, @samp{L}, @samp{t}, and @samp{z}) in @code{printf}
 9824 format strings. These are not valid in @command{awk}.  Most @command{awk}
 9825 implementations silently ignore them.  If @option{--lint} is provided
 9826 on the command line (@pxref{Options}), @command{gawk} warns about their
 9827 use. If @option{--posix} is supplied, their use is a fatal error.
 9828 
 9829 @node Printf Examples
 9830 @subsection Examples Using @code{printf}
 9831 
 9832 The following simple example shows
 9833 how to use @code{printf} to make an aligned table:
 9834 
 9835 @example
 9836 awk '@{ printf "%-10s %s\n", $1, $2 @}' mail-list
 9837 @end example
 9838 
 9839 @noindent
 9840 This command
 9841 prints the names of the people (@code{$1}) in the file
 9842 @file{mail-list} as a string of 10 characters that are left-justified.  It also
 9843 prints the phone numbers (@code{$2}) next on the line.  This
 9844 produces an aligned two-column table of names and phone numbers,
 9845 as shown here:
 9846 
 9847 @example
 9848 $ @kbd{awk '@{ printf "%-10s %s\n", $1, $2 @}' mail-list}
 9849 @print{} Amelia     555-5553
 9850 @print{} Anthony    555-3412
 9851 @print{} Becky      555-7685
 9852 @print{} Bill       555-1675
 9853 @print{} Broderick  555-0542
 9854 @print{} Camilla    555-2912
 9855 @print{} Fabius     555-1234
 9856 @print{} Julie      555-6699
 9857 @print{} Martin     555-6480
 9858 @print{} Samuel     555-3430
 9859 @print{} Jean-Paul  555-2127
 9860 @end example
 9861 
 9862 In this case, the phone numbers had to be printed as strings because
 9863 the numbers are separated by dashes.  Printing the phone numbers as
 9864 numbers would have produced just the first three digits: @samp{555}.
 9865 This would have been pretty confusing.
 9866 
 9867 It wasn't necessary to specify a width for the phone numbers because
 9868 they are last on their lines.  They don't need to have spaces
 9869 after them.
 9870 
 9871 The table could be made to look even nicer by adding headings to the
 9872 tops of the columns.  This is done using a @code{BEGIN} rule
 9873 (@pxref{BEGIN/END})
 9874 so that the headers are only printed once, at the beginning of
 9875 the @command{awk} program:
 9876 
 9877 @example
 9878 awk 'BEGIN @{ print "Name      Number"
 9879              print "----      ------" @}
 9880            @{ printf "%-10s %s\n", $1, $2 @}' mail-list
 9881 @end example
 9882 
 9883 The preceding example mixes @code{print} and @code{printf} statements in
 9884 the same program.  Using just @code{printf} statements can produce the
 9885 same results:
 9886 
 9887 @example
 9888 awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number"
 9889              printf "%-10s %s\n", "----", "------" @}
 9890            @{ printf "%-10s %s\n", $1, $2 @}' mail-list
 9891 @end example
 9892 
 9893 @noindent
 9894 Printing each column heading with the same format specification
 9895 used for the column elements ensures that the headings
 9896 are aligned just like the columns.
 9897 
 9898 The fact that the same format specification is used three times can be
 9899 emphasized by storing it in a variable, like this:
 9900 
 9901 @example
 9902 awk 'BEGIN @{ format = "%-10s %s\n"
 9903              printf format, "Name", "Number"
 9904              printf format, "----", "------" @}
 9905            @{ printf format, $1, $2 @}' mail-list
 9906 @end example
 9907 
 9908 
 9909 @node Redirection
 9910 @section Redirecting Output of @code{print} and @code{printf}
 9911 
 9912 @cindex output redirection
 9913 @cindex redirection @subentry of output
 9914 @cindex @option{--sandbox} option @subentry output redirection with @code{print} @subentry @code{printf}
 9915 So far, the output from @code{print} and @code{printf} has gone
 9916 to the standard
 9917 output, usually the screen.  Both @code{print} and @code{printf} can
 9918 also send their output to other places.
 9919 This is called @dfn{redirection}.
 9920 
 9921 @quotation NOTE
 9922 When @option{--sandbox} is specified (@pxref{Options}),
 9923 redirecting output to files, pipes, and coprocesses is disabled.
 9924 @end quotation
 9925 
 9926 A redirection appears after the @code{print} or @code{printf} statement.
 9927 Redirections in @command{awk} are written just like redirections in shell
 9928 commands, except that they are written inside the @command{awk} program.
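
As a minimal sketch of the general shape, the following sends
@code{printf} output to a file instead of to the standard output.  The
function name @code{redirect_demo}, the variable @code{out}, and the two
sample records are assumptions made for illustration; the output file is
a scratch file created with @command{mktemp}:

```shell
# redirect_demo is a hypothetical wrapper name used only for this sketch.
redirect_demo() {
  out=$(mktemp)    # scratch file; its name is arbitrary
  # Two sample records stand in for the mail-list file.
  printf 'Amelia 555-5553\nAnthony 555-3412\n' |
    awk -v out="$out" '{ printf "%-10s %s\n", $1, $2 > out }'
  cat "$out"       # show what awk wrote to the file
  rm -f "$out"
}
redirect_demo
```

Note that the redirection is written inside the quoted @command{awk}
program, and that its target is an @command{awk} expression (here, the
variable @code{out}) rather than a word interpreted by the shell.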
 9929 
 9930 @c the commas here are part of the see also
 9931 @cindex @code{print} statement @seealso{redirection of output}
 9932 @cindex @code{printf} statement @seealso{redirection of output}
 9933 There are four forms of output redirection: output to a file, output
 9934 appended to a file, output through a pipe to another command, and output
 9935 to a coprocess.  We show them all for the @code{print} statement,
 9936 but they work identically for @code{printf}:
 9937 
 9938 @table @code
 9939 @cindex @code{>} (right angle bracket) @subentry @code{>} operator (I/O)
 9940 @cindex right angle bracket (@code{>}) @subentry @code{>} operator (I/O)
 9941 @cindex operators @subentry input/output
 9942 @item print @var{items} > @var{output-file}
 9943 This redirection prints the items into the output file named
 9944 @var{output-file}.  The @value{FN} @var{output-file} can be any
 9945 expression.  Its value is changed to a string and then used as a
 9946 @value{FN} (@pxref{Expressions}).
 9947 
 9948 When this type of redirection is used, the @var{output-file} is erased
 9949 before the first output is written to it.  Subsequent writes to the same
 9950 @var{output-file} do not erase @var{output-file}, but append to it.
 9951 (This is different from how you use redirections in shell scripts.)
 9952 If @var{output-file} does not exist, it is created.  For example, here
 9953 is how an @command{awk} program can write a list of people's names to one
 9954 file named @file{name-list}, and a list of phone numbers to another file
 9955 named @file{phone-list}:
 9956 
 9957 @example
 9958 $ @kbd{awk '@{ print $2 > "phone-list"}
 9959 >        @kbd{print $1 > "name-list" @}' mail-list}
 9960 $ @kbd{cat phone-list}
 9961 @print{} 555-5553
 9962 @print{} 555-3412
 9963 @dots{}
 9964 $ @kbd{cat name-list}
 9965 @print{} Amelia
 9966 @print{} Anthony
 9967 @dots{}
 9968 @end example
 9969 
 9970 @noindent
 9971 Each output file contains one name or number per line.
 9972 
 9973 @cindex @code{>} (right angle bracket) @subentry @code{>>} operator (I/O)
 9974 @cindex right angle bracket (@code{>}) @subentry @code{>>} operator (I/O)
 9975 @item print @var{items} >> @var{output-file}
 9976 This redirection prints the items into the output file
 9977 named @var{output-file}.  The difference between this and the
 9978 single-@samp{>} redirection is that the old contents (if any) of
 9979 @var{output-file} are not erased.  Instead, the @command{awk} output is
 9980 appended to the file.
 9981 If @var{output-file} does not exist, then it is created.
 9982 
 9983 @cindex @code{|} (vertical bar) @subentry @code{|} operator (I/O)
 9984 @cindex pipe @subentry output
 9985 @cindex output @subentry pipes
 9986 @item print @var{items} | @var{command}
 9987 It is possible to send output to another program through a pipe
 9988 instead of into a file.   This redirection opens a pipe to
 9989 @var{command}, and writes the values of @var{items} through this pipe
 9990 to another process created to execute @var{command}.
 9991 
 9992 The redirection argument @var{command} is actually an @command{awk}
 9993 expression.  Its value is converted to a string whose contents give
 9994 the shell command to be run.  For example, the following produces two
 9995 files, one unsorted list of people's names, and one list sorted in reverse
 9996 alphabetical order:
 9997 
 9998 @ignore
 9999 10/2000:
10000 This isn't the best style, since COMMAND is assigned for each
10001 record.  It's done to avoid overfull hboxes in TeX.  Leave it
10002 alone for now and let's hope no-one notices.
10003 @end ignore
10004 
10005 @example
10006 @group
10007 awk '@{ print $1 > "names.unsorted"
10008        command = "sort -r > names.sorted"
10009        print $1 | command @}' mail-list
10010 @end group
10011 @end example
10012 
10013 The unsorted list is written with an ordinary redirection, while
10014 the sorted list is written by piping through the @command{sort} utility.
10015 
10016 The next example uses redirection to mail a message to the mailing
10017 list @code{bug-system}.  This might be useful when trouble is encountered
10018 in an @command{awk} script run periodically for system maintenance:
10019 
10020 @example
10021 report = "mail bug-system"
10022 print("Awk script failed:", $0) | report
10023 print("at record number", FNR, "of", FILENAME) | report
10024 close(report)
10025 @end example
10026 
10027 The @code{close()} function is called here because it's a good idea to close
10028 the pipe as soon as all the intended output has been sent to it.
10029 @xref{Close Files And Pipes}
10030 for more information.
10031 
10032 This example also illustrates the use of a variable to represent
10033 a @var{file} or @var{command}---it is not necessary to always
10034 use a string constant.  Using a variable is generally a good idea,
10035 because (if you mean to refer to that same file or command)
10036 @command{awk} requires that the string value be written identically
10037 every time.
10038 
10039 @cindex coprocesses
10040 @cindex @code{|} (vertical bar) @subentry @code{|&} operator (I/O)
10041 @cindex operators @subentry input/output
10042 @cindex differences in @command{awk} and @command{gawk} @subentry input/output operators
10043 @item print @var{items} |& @var{command}
10044 This redirection prints the items to the input of @var{command}.
10045 The difference between this and the
10046 single-@samp{|} redirection is that the output from @var{command}
10047 can be read with @code{getline}.
10048 Thus, @var{command} is a @dfn{coprocess}, which works together with
10049 but is subsidiary to the @command{awk} program.
10050 
10051 This feature is a @command{gawk} extension, and is not available in
10052 POSIX @command{awk}.
10053 @ifnotdocbook
10054 @xref{Getline/Coprocess},
10055 for a brief discussion.
10056 @xref{Two-way I/O},
10057 for a more complete discussion.
10058 @end ifnotdocbook
10059 @ifdocbook
10060 @xref{Getline/Coprocess}
10061 for a brief discussion and
10062 @ref{Two-way I/O}
10063 for a more complete discussion.
10064 @end ifdocbook
10065 @end table
10066 
10067 Redirecting output using @samp{>}, @samp{>>}, @samp{|}, or @samp{|&}
10068 asks the system to open a file, pipe, or coprocess only if the particular
10069 @var{file} or @var{command} you specify has not already been written
10070 to by your program or if it has been closed since it was last written to.
10071 
10072 @cindex troubleshooting @subentry printing
10073 It is a common error to use @samp{>} redirection for the first @code{print}
10074 to a file, and then to use @samp{>>} for subsequent output:
10075 
10076 @example
10077 # clear the file
10078 print "Don't panic" > "guide.txt"
10079 @dots{}
10080 # append
10081 print "Avoid improbability generators" >> "guide.txt"
10082 @end example
10083 
10084 @noindent
10085 This is indeed how redirections must be used from the shell.  But in
10086 @command{awk}, it isn't necessary.  In this kind of case, a program should
10087 use @samp{>} for all the @code{print} statements, because the output file
10088 is only opened once. (It happens that if you mix @samp{>} and @samp{>>},
10089 output is produced in the expected order. However, mixing the operators
10090 for the same file is definitely poor style, and is confusing to readers
10091 of your program.)
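
The following is a minimal sketch of this behavior, not part of the
original example set.  The scratch file and the variable names
(@code{out}, @code{after_first_run}, @code{after_second_run}) are
illustrative assumptions.  It shows that @samp{>} discards old contents
only when the file is first opened, and that a later, separate run
using @samp{>>} keeps what is already there:

```shell
# Scratch file with pre-existing contents (the path is illustrative only).
out=$(mktemp)
printf 'old contents\n' > "$out"

# '>' truncates the file once, when it is first opened; the second
# print appends because the file is already open.
awk -v out="$out" 'BEGIN {
    print "first line"  > out
    print "second line" > out
}'
after_first_run=$(cat "$out")

# A separate run that uses '>>' keeps what is already in the file.
awk -v out="$out" 'BEGIN { print "third line" >> out }'
after_second_run=$(cat "$out")

printf '%s\n' "$after_second_run"
rm -f "$out"
```

Passing the @value{FN} in with @option{-v} keeps the string identical
in every redirection, which is what lets @command{awk} recognize the
file as already open.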
10092 
10093 @cindex differences in @command{awk} and @command{gawk} @subentry implementation limitations
10094 @cindex implementation issues, @command{gawk} @subentry limits
10095 @cindex @command{awk} @subentry implementation issues @subentry pipes
10096 @cindex @command{gawk} @subentry implementation issues @subentry pipes
10097 @ifnotinfo
10098 As mentioned earlier
10099 (@pxref{Getline Notes}),
10100 many
10101 @end ifnotinfo
10102 @ifnottex
10103 @ifnotdocbook
10104 Many
10105 @end ifnotdocbook
10106 @end ifnottex
10107 older
10108 @command{awk} implementations limit the number of pipelines that an @command{awk}
10109 program may have open to just one!  In @command{gawk}, there is no such limit.
10110 @command{gawk} allows a program to
10111 open as many pipelines as the underlying operating system permits.
10112 
10113 @sidebar Piping into @command{sh}
10114 @cindex shells @subentry piping commands into
10115 
10116 A particularly powerful way to use redirection is to build command lines
10117 and pipe them into the shell, @command{sh}.  For example, suppose you
10118 have a list of files brought over from a system where all the @value{FN}s
10119 are stored in uppercase, and you wish to rename them to have names in
10120 all lowercase.  The following program is both simple and efficient:
10121 
10122 @c @cindex @command{mv} utility
10123 @example
10124 @{ printf("mv %s %s\n", $0, tolower($0)) | "sh" @}
10125 
10126 END @{ close("sh") @}
10127 @end example
10128 
10129 The @code{tolower()} function returns its argument string with all
10130 uppercase characters converted to lowercase
10131 (@pxref{String Functions}).
10132 The program builds up a list of command lines,
10133 using the @command{mv} utility to rename the files.
10134 It then sends the list to the shell for execution.
10135 
10136 @xref{Shell Quoting} for a function that can help in generating
10137 command lines to be fed to the shell.
10138 @end sidebar
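
As a hedged sketch of the sidebar's technique, the following runs the
renaming program in a scratch directory and quotes each @value{FN}
before handing it to the shell.  The @code{shell_quote()} function
here is a simplified, illustrative helper written for this sketch, and
the @value{FN}s are invented test data:

```shell
# Work in a scratch directory so that nothing real is renamed.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 'README.TXT' 'MY NOTES.DAT'

printf '%s\n' 'README.TXT' 'MY NOTES.DAT' | awk '
# shell_quote() is an illustrative helper, not a built-in function:
# it wraps s in single quotes and protects embedded single quotes.
function shell_quote(s,    parts, n, i, ret) {
    n = split(s, parts, "\047")          # "\047" is a single quote
    ret = "\047" parts[1] "\047"
    for (i = 2; i <= n; i++)
        ret = ret "\"\047\"\047" parts[i] "\047"
    return ret
}
{ printf("mv %s %s\n", shell_quote($0), shell_quote(tolower($0))) | "sh" }
END { close("sh") }'

renamed=$(ls)
printf '%s\n' "$renamed"
```

Without the quoting, the name containing a space would be split into
two arguments by the shell and the @command{mv} command would fail.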
10139 
10140 @node Special FD
10141 @section Special Files for Standard Preopened Data Streams
10142 @cindex standard input
10143 @cindex input @subentry standard
10144 @cindex standard output
10145 @cindex output @subentry standard
10146 @cindex error output
10147 @cindex standard error
10148 @cindex file descriptors
10149 @cindex files @subentry descriptors @seeentry{file descriptors}
10150 
10151 Running programs conventionally have three input and output streams
10152 already available to them for reading and writing.  These are known
10153 as the @dfn{standard input}, @dfn{standard output}, and @dfn{standard
10154 error output}.  These open streams (and any other open files or pipes)
10155 are often referred to by the technical term @dfn{file descriptors}.
10156 
10157 These streams are, by default, connected to your keyboard and screen, but
10158 they are often redirected with the shell, via the @samp{<}, @samp{<<},
10159 @samp{>}, @samp{>>}, @samp{>&}, and @samp{|} operators.  Standard error
10160 is typically used for writing error messages; the reason there are two separate
10161 streams, standard output and standard error, is so that they can be
10162 redirected separately.
10163 
10164 @cindex differences in @command{awk} and @command{gawk} @subentry error messages
10165 @cindex error handling
10166 In traditional implementations of @command{awk}, the only way to write an error
10167 message to standard error in an @command{awk} program is as follows:
10168 
10169 @example
10170 print "Serious error detected!" | "cat 1>&2"
10171 @end example
10172 
10173 @noindent
10174 This works by opening a pipeline to a shell command that can access the
10175 standard error stream that it inherits from the @command{awk} process.
10176 @c 8/2014: Mike Brennan says not to cite this as inefficient. So, fixed.
10177 This is far from elegant, and it also requires a
10178 separate process.  So people writing @command{awk} programs often
10179 don't do this.  Instead, they send the error messages to the
10180 screen, like this:
10181 
10182 @example
10183 print "Serious error detected!" > "/dev/tty"
10184 @end example
10185 
10186 @noindent
10187 (@file{/dev/tty} is a special file supplied by the operating system
10188 that is connected to your keyboard and screen. It represents the
10189 ``terminal,''@footnote{The ``tty'' in @file{/dev/tty} stands for
10190 ``Teletype,'' a serial terminal.} which on modern systems is a keyboard
10191 and screen, not a serial console.)
10192 This generally has the same effect, but not always: although the
10193 standard error stream is usually the screen, it can be redirected; when
10194 that happens, writing to the screen is not correct.  In fact, if
10195 @command{awk} is run from a background job, it may not have a
10196 terminal at all.
10197 Then opening @file{/dev/tty} fails.
10198 
10199 @command{gawk}, BWK @command{awk}, and @command{mawk} provide
10200 special @value{FN}s for accessing the three standard streams.
10201 If the @value{FN} matches one of these special names when @command{gawk}
10202 (or one of the others) redirects input or output, then it directly uses
10203 the descriptor that the @value{FN} stands for.  These special
10204 @value{FN}s work for all operating systems that @command{gawk}
10205 has been ported to, not just those that are POSIX-compliant:
10206 
10207 @cindex common extensions @subentry @code{/dev/stdin} special file
10208 @cindex common extensions @subentry @code{/dev/stdout} special file
10209 @cindex common extensions @subentry @code{/dev/stderr} special file
10210 @cindex extensions @subentry common @subentry @code{/dev/stdin} special file
10211 @cindex extensions @subentry common @subentry @code{/dev/stdout} special file
10212 @cindex extensions @subentry common @subentry @code{/dev/stderr} special file
10213 @cindex file names @subentry standard streams in @command{gawk}
10214 @cindex @code{/dev/@dots{}} special files
10215 @cindex files @subentry @code{/dev/@dots{}} special files
10216 @cindex @code{/dev/fd/@var{N}} special files (@command{gawk})
10217 @table @file
10218 @item /dev/stdin
10219 The standard input (file descriptor 0).
10220 
10221 @item /dev/stdout
10222 The standard output (file descriptor 1).
10223 
10224 @item /dev/stderr
10225 The standard error output (file descriptor 2).
10226 @end table
10227 
10228 With these facilities,
10229 the proper way to write an error message then becomes:
10230 
10231 @example
10232 print "Serious error detected!" > "/dev/stderr"
10233 @end example
10234 
10235 @cindex troubleshooting @subentry quotes with file names
10236 Note the use of quotes around the @value{FN}.
10237 As with any other redirection, the value must be a string.
10238 It is a common error to omit the quotes, which leads
10239 to confusing results.
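
The benefit of a separate error stream can be seen with a short
sketch.  The program text and the shell variable names
(@code{prog}, @code{stdout_only}, @code{stderr_only}) are
illustrative.  The same program is run twice, with the shell keeping
only one of the two streams each time:

```shell
prog='BEGIN {
    print "normal output"
    print "Serious error detected!" > "/dev/stderr"
}'

# Keep only standard output; discard standard error.
stdout_only=$(awk "$prog" 2>/dev/null)

# Keep only standard error; discard standard output.
stderr_only=$(awk "$prog" 2>&1 >/dev/null)

printf 'stdout: %s\nstderr: %s\n' "$stdout_only" "$stderr_only"
```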
10240 
10241 @command{gawk} does not treat these @value{FN}s as special when
10242 in POSIX-compatibility mode. However, because BWK @command{awk}
10243 supports them, @command{gawk} does support them even when
10244 invoked with the @option{--traditional} option (@pxref{Options}).
10245 
10246 @node Special Files
10247 @section Special @value{FFN}s in @command{gawk}
10248 @cindex @command{gawk} @subentry file names in
10249 
10250 Besides access to standard input, standard output, and standard error,
10251 @command{gawk} provides access to any open file descriptor.
10252 Additionally, there are special @value{FN}s reserved for
10253 TCP/IP networking.
10254 
10255 @menu
10256 * Other Inherited Files::       Accessing other open files with
10257                                 @command{gawk}.
10258 * Special Network::             Special files for network communications.
10259 * Special Caveats::             Things to watch out for.
10260 @end menu
10261 
10262 @node Other Inherited Files
10263 @subsection Accessing Other Open Files with @command{gawk}
10264 
10265 Besides the @code{/dev/stdin}, @code{/dev/stdout}, and @code{/dev/stderr}
10266 special @value{FN}s mentioned earlier, @command{gawk} provides syntax
10267 for accessing any other inherited open file:
10268 
10269 @table @file
10270 @item /dev/fd/@var{N}
10271 The file associated with file descriptor @var{N}.  Such a file must
10272 be opened by the program initiating the @command{awk} execution (typically
10273 the shell).  Unless special pains are taken in the shell from which
10274 @command{gawk} is invoked, only descriptors 0, 1, and 2 are available.
10275 @end table
10276 
10277 The @value{FN}s @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
10278 are essentially aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and
10279 @file{/dev/fd/2}, respectively. However, those names are more self-explanatory.
10280 
10281 Note that using @code{close()} on a @value{FN} of the
10282 form @code{"/dev/fd/@var{N}"}, for file descriptor numbers
10283 above two, does actually close the given file descriptor.
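
Here is a small sketch of the ``special pains'' mentioned above, with
the shell opening file descriptor 3 before the program starts.  The
log file and the variable names are illustrative.  The sketch prefers
@command{gawk} and otherwise falls back to whatever @command{awk} is
installed, which is an assumption that only holds on systems where
the operating system itself supplies @file{/dev/fd}:

```shell
# Prefer gawk; otherwise rely on the operating system's own /dev/fd.
AWK=$(command -v gawk || command -v awk)
log=$(mktemp)

# The shell opens descriptor 3 on the log file before starting awk;
# inside the program, "/dev/fd/3" names that inherited descriptor.
"$AWK" 'BEGIN {
    print "to standard output"
    print "to descriptor 3" > "/dev/fd/3"
}' 3> "$log" > /dev/null

fd3_contents=$(cat "$log")
printf '%s\n' "$fd3_contents"
rm -f "$log"
```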
10284 
10285 @node Special Network
10286 @subsection Special Files for Network Communications
10287 @cindex networks @subentry support for
10288 @cindex TCP/IP @subentry support for
10289 
10290 @command{gawk} programs
10291 can open a two-way
10292 TCP/IP connection, acting as either a client or a server.
10293 This is done using a special @value{FN} of the form:
10294 
10295 @example
10296 @file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}
10297 @end example
10298 
10299 The @var{net-type} is one of @samp{inet}, @samp{inet4}, or @samp{inet6}.
10300 The @var{protocol} is one of @samp{tcp} or @samp{udp},
10301 and the other fields represent the other essential pieces of information
10302 for making a networking connection.
10303 These @value{FN}s are used with the @samp{|&} operator for communicating
10304 with @w{a coprocess}
10305 (@pxref{Two-way I/O}).
10306 This is an advanced feature, mentioned here only for completeness.
10307 Full discussion is delayed until
10308 @ref{TCP/IP Networking}.
10309 
10310 @node Special Caveats
10311 @subsection Special @value{FFN} Caveats
10312 
10313 Here are some things to bear in mind when using the
10314 special @value{FN}s that @command{gawk} provides:
10315 
10316 @itemize @value{BULLET}
10317 @cindex compatibility mode (@command{gawk}) @subentry file names
10318 @cindex file names @subentry in compatibility mode
10319 @cindex POSIX mode
10320 @item
10321 Recognition of the @value{FN}s for the three standard preopened
10322 files is disabled only in POSIX mode.
10323 
10324 @item
10325 Recognition of the other special @value{FN}s is disabled if @command{gawk} is in
10326 compatibility mode (either @option{--traditional} or @option{--posix};
10327 @pxref{Options}).
10328 
10329 @item
10330 @command{gawk} @emph{always}
10331 interprets these special @value{FN}s.
10332 For example, using @samp{/dev/fd/4}
10333 for output actually writes on file descriptor 4, and not on a new
10334 file descriptor that is @code{dup()}ed from file descriptor 4.  Most of
10335 the time this does not matter; however, it is important to @emph{not}
10336 close any of the files related to file descriptors 0, 1, and 2.
10337 Doing so results in unpredictable behavior.
10338 @end itemize
10339 
10340 @node Close Files And Pipes
10341 @section Closing Input and Output Redirections
10342 @cindex files @subentry output @seeentry{output files}
10343 @cindex input files @subentry closing
10344 @cindex output @subentry files, closing
10345 @cindex pipe @subentry closing
10346 @cindex coprocesses @subentry closing
10347 @cindex @code{getline} command @subentry coprocesses, using from
10348 
10349 If the same @value{FN} or the same shell command is used with @code{getline}
10350 more than once during the execution of an @command{awk} program
10351 (@pxref{Getline}),
10352 the file is opened (or the command is executed) the first time only.
10353 At that time, the first record of input is read from that file or command.
10354 The next time the same file or command is used with @code{getline},
10355 another record is read from it, and so on.
10356 
10357 Similarly, when a file or pipe is opened for output, @command{awk} remembers
10358 the @value{FN} or command associated with it, and subsequent
10359 writes to the same file or command are appended to the previous writes.
10360 The file or pipe stays open until @command{awk} exits.
10361 
10362 @cindexawkfunc{close}
10363 This implies that special steps are necessary in order to read the same
10364 file again from the beginning, or to rerun a shell command (rather than
10365 reading more output from the same command).  The @code{close()} function
10366 makes these things possible:
10367 
10368 @example
10369 close(@var{filename})
10370 @end example
10371 
10372 @noindent
10373 or:
10374 
10375 @example
10376 close(@var{command})
10377 @end example
10378 
10379 The argument @var{filename} or @var{command} can be any expression.  Its
10380 value must @emph{exactly} match the string that was used to open the file or
10381 start the command (spaces and other ``irrelevant'' characters
10382 included). For example, if you open a pipe with this:
10383 
10384 @example
10385 "sort -r names" | getline foo
10386 @end example
10387 
10388 @noindent
10389 then you must close it with this:
10390 
10391 @example
10392 close("sort -r names")
10393 @end example
10394 
10395 Once this function call is executed, the next @code{getline} from that
10396 file or command, or the next @code{print} or @code{printf} to that
10397 file or command, reopens the file or reruns the command.
10398 Because the expression that you use to close a file or pipeline must
10399 exactly match the expression used to open the file or run the command,
10400 it is good practice to use a variable to store the @value{FN} or command.
10401 The previous example becomes the following:
10402 
10403 @example
10404 @group
10405 sortcom = "sort -r names"
10406 sortcom | getline foo
10407 @end group
10408 @group
10409 @dots{}
10410 close(sortcom)
10411 @end group
10412 @end example
10413 
10414 @noindent
10415 This helps avoid hard-to-find typographical errors in your @command{awk}
10416 programs.  Here are some of the reasons for closing an output file:
10417 
10418 @itemize @value{BULLET}
10419 @item
10420 To write a file and read it back later on in the same @command{awk}
10421 program.  Close the file after writing it, then
10422 begin reading it with @code{getline}.
10423 
10424 @item
10425 To write numerous files, successively, in the same @command{awk}
10426 program.  If the files aren't closed, eventually @command{awk} may exceed a
10427 system limit on the number of open files in one process.  It is best to
10428 close each one when the program has finished writing it.
10429 
10430 @item
10431 To make a command finish.  When output is redirected through a pipe,
10432 the command reading the pipe normally continues to try to read input
10433 as long as the pipe is open.  Often this means the command cannot
10434 really do its work until the pipe is closed.  For example, if
10435 output is redirected to the @command{mail} program, the message is not
10436 actually sent until the pipe is closed.
10437 
10438 @item
10439 To run the same program a second time, with the same arguments.
10440 This is not the same thing as giving more input to the first run!
10441 
10442 For example, suppose a program pipes output to the @command{mail} program.
10443 If it outputs several lines redirected to this pipe without closing
10444 it, they make a single message of several lines.  By contrast, if the
10445 program closes the pipe after each line of output, then each line makes
10446 a separate message.
10447 @end itemize
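
The first reason in the list can be sketched as follows.  The scratch
file and the variable names are illustrative assumptions.  The program
writes two lines, closes the file, and then reads the same file back
with @code{getline}:

```shell
scratch=$(mktemp)

result=$(awk -v file="$scratch" 'BEGIN {
    print "alpha" > file
    print "beta"  > file
    close(file)            # finish writing; the next use reopens the file

    while ((getline line < file) > 0)
        count++
    close(file)            # done reading

    print count, "lines; last was", line
}')

printf '%s\n' "$result"
rm -f "$scratch"
```

Keeping the @value{FN} in one variable ensures that the strings given
to the redirections and to @code{close()} match exactly.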
10448 
10449 @cindex differences in @command{awk} and @command{gawk} @subentry @code{close()} function
10450 @cindex portability @subentry @code{close()} function and
10451 @cindex @code{close()} function @subentry portability
10452 If you use more files than the system allows you to have open,
10453 @command{gawk} attempts to multiplex the available open files among
10454 your @value{DF}s.  @command{gawk}'s ability to do this depends upon the
10455 facilities of your operating system, so it may not always work.  It is
10456 therefore both good practice and good portability advice to always
10457 use @code{close()} on your files when you are done with them.
10458 In fact, if you are using a lot of pipes, it is essential that
10459 you close commands when done. For example, consider something like this:
10460 
10461 @example
10462 @{
10463     @dots{}
10464     command = ("grep " $1 " /some/file | my_prog -q " $3)
10465     while ((command | getline) > 0) @{
10466         @var{process output of} command
10467     @}
10468     # need close(command) here
10469 @}
10470 @end example
10471 
10472 This example creates a new pipeline based on data in @emph{each} record.
10473 Without the call to @code{close()} indicated in the comment, @command{awk}
10474 creates child processes to run the commands, until it eventually
10475 runs out of file descriptors for more pipelines.
10476 
10477 Even though each command has finished (as indicated by the end-of-file
10478 return status from @code{getline}), the child process is not
10479 terminated;@footnote{The technical terminology is rather morbid.
10480 The finished child is called a ``zombie,'' and cleaning up after
10481 it is referred to as ``reaping.''}
10482 @c Good old UNIX: give the marketing guys fits, that's the ticket
10483 more importantly, the file descriptor for the pipe
10484 is not closed and released until @code{close()} is called or
10485 @command{awk} exits.
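
A runnable miniature of this pattern might look like the following
sketch.  The use of @command{expr} as the command and the variable
names are illustrative choices, not taken from the example above:

```shell
result=$(printf '3\n1\n3\n' | awk '{
    command = "expr " $1 " + 10"
    while ((command | getline answer) > 0)
        sum += answer
    close(command)         # release the pipe before the next record
}
END { print sum }')

printf '%s\n' "$result"
```

The input deliberately repeats the value 3.  Without the call to
@code{close()}, the third record would find the pipe for the same
command string still open and at end-of-file, so it would contribute
nothing to the sum.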
10486 
10487 @code{close()} silently does nothing if given an argument that
10488 does not represent a file, pipe, or coprocess that was opened with
10489 a redirection.  In such a case, it returns a negative value,
10490 indicating an error. In addition, @command{gawk} sets @code{ERRNO}
10491 to a string indicating the error.
10492 
10493 Note also that @samp{close(FILENAME)} has no ``magic'' effects on the
10494 implicit loop that reads through the files named on the command line.
10495 It is, more likely, a close of a file that was never opened with a
10496 redirection, so @command{awk} silently does nothing, except return
10497 a negative value.
10498 
10499 @cindex @code{|} (vertical bar) @subentry @code{|&} operator (I/O) @subentry pipes, closing
10500 When using the @samp{|&} operator to communicate with a coprocess,
10501 it is occasionally useful to be able to close one end of the two-way
10502 pipe without closing the other.
10503 This is done by supplying a second argument to @code{close()}.
10504 As in any other call to @code{close()},
10505 the first argument is the name of the command or special file used
10506 to start the coprocess.
10507 The second argument should be a string, with either of the values
10508 @code{"to"} or @code{"from"}.  Case does not matter.
10509 As this is an advanced feature, discussion is
10510 delayed until
10511 @ref{Two-way I/O},
10512 which describes it in more detail and gives an example.
10513 
10514 @sidebar Using @code{close()}'s Return Value
10515 @cindex dark corner @subentry @code{close()} function
10516 @cindex @code{close()} function @subentry return value
10517 @cindex return value, @code{close()} function
10518 @cindex differences in @command{awk} and @command{gawk} @subentry @code{close()} function
10519 @cindex Unix @command{awk} @subentry @code{close()} function and
10520 
10521 In many older versions of Unix @command{awk}, the @code{close()} function
10522 is actually a statement.
10523 @value{DARKCORNER}
10524 It is a syntax error to try to use the return
10525 value from @code{close()}:
10526 
10527 @example
10528 command = "@dots{}"
10529 command | getline info
10530 retval = close(command)  # syntax error in many Unix awks
10531 @end example
10532 
10533 @cindex @command{gawk} @subentry @code{ERRNO} variable in
10534 @cindex @code{ERRNO} variable @subentry with @command{close()} function
10535 @command{gawk} treats @code{close()} as a function.
10536 The return value is @minus{}1 if the argument names something
10537 that was never opened with a redirection, or if there is
10538 a system problem closing the file or process.
10539 In these cases, @command{gawk} sets the predefined variable
10540 @code{ERRNO} to a string describing the problem.
10541 
10542 In @command{gawk}, starting with @value{PVERSION} 4.2, when closing a pipe or
10543 coprocess (input or output), the return value is the exit status of the
10544 command, as described in @ref{table-close-pipe-return-values}.@footnote{Prior
10545 to @value{PVERSION} 4.2, the return value from closing a pipe or coprocess
10546 was the full 16-bit exit value as defined by the @code{wait()} system
10547 call.} Otherwise, it is the return value from the system's @code{close()}
10548 or @code{fclose()} C functions when closing input or output files,
10549 respectively.  This value is zero if the close succeeds, or @minus{}1
10550 if it fails.
10551 
10552 @float Table,table-close-pipe-return-values
10553 @caption{Return values from @code{close()} of a pipe}
10554 @multitable @columnfractions .50 .50
10555 @headitem Situation @tab Return value from @code{close()}
10556 @item Normal exit of command @tab Command's exit status
10557 @item Death by signal of command @tab 256 + number of murderous signal
10558 @item Death by signal of command with core dump @tab 512 + number of murderous signal
10559 @item Some kind of error @tab @minus{}1
10560 @end multitable
10561 @end float
10562 
10563 @cindex POSIX mode
10564 The POSIX standard is very vague; it says that @code{close()}
10565 returns zero on success and a nonzero value otherwise.  In general,
10566 different implementations vary in what they report when closing
10567 pipes; thus, the return value cannot be used portably.
10568 @value{DARKCORNER}
10569 In POSIX mode (@pxref{Options}), @command{gawk} just returns zero
10570 when closing a pipe.
10571 @end sidebar
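
The following sketch illustrates the return values described in the
sidebar for @command{gawk} 4.2 and later.  The command string and the
variable names are illustrative.  The sketch prefers @command{gawk}
and falls back to the installed @command{awk}; as noted above, other
implementations may report different values when closing a pipe:

```shell
# Prefer gawk, whose behavior the sidebar documents.
AWK=$(command -v gawk || command -v awk)

result=$("$AWK" 'BEGIN {
    cmd = "echo hello; exit 3"
    cmd | getline line
    status = close(cmd)           # exit status of the command in gawk 4.2+
    never  = close("not-opened")  # nothing by this name was ever opened
    print line, status, never
}')

printf '%s\n' "$result"
```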
10572 
10573 @node Nonfatal
10574 @section Enabling Nonfatal Output
10575 
10576 This @value{SECTION} describes a @command{gawk}-specific feature.
10577 
10578 In standard @command{awk}, a failure of @code{print} or @code{printf}
10579 output, such as a file that cannot be opened or some other I/O error
10580 (for example, filling up the disk), is a fatal error:
10581 
10582 @example
10583 $ @kbd{gawk 'BEGIN @{ print "hi" > "/no/such/file" @}'}
10584 @error{} gawk: cmd. line:1: fatal: can't redirect to `/no/such/file' (No
10585 @error{} such file or directory)
10586 @end example
10587 
10588 @command{gawk} makes it possible to detect that an error has
10589 occurred, allowing you to possibly recover from the error, or
10590 at least print an error message of your choosing before exiting.
10591 You can do this in one of two ways:
10592 
10593 @itemize @bullet
10594 @item
10595 For all output files, by assigning any value to @code{PROCINFO["NONFATAL"]}.
10596 
10597 @item
10598 On a per-file basis, by assigning any value to
10599 @code{PROCINFO[@var{filename}, "NONFATAL"]}.
10600 Here, @var{filename} is the name of the file to which
10601 you wish output to be nonfatal.
10602 @end itemize
10603 
10604 Once you have enabled nonfatal output, you must check @code{ERRNO}
10605 after every relevant @code{print} or @code{printf} statement to
10606 see if something went wrong.  It is also a good idea to initialize
10607 @code{ERRNO} to zero before attempting the output. For example:
10608 
10609 @example
10610 $ @kbd{gawk '}
10611 > @kbd{BEGIN @{}
10612 > @kbd{    PROCINFO["NONFATAL"] = 1}
10613 > @kbd{    ERRNO = 0}
10614 > @kbd{    print "hi" > "/no/such/file"}
10615 > @kbd{    if (ERRNO) @{}
10616 > @kbd{        print("Output failed:", ERRNO) > "/dev/stderr"}
10617 > @kbd{        exit 1}
10618 > @kbd{    @}}
10619 > @kbd{@}'}
10620 @error{} Output failed: No such file or directory
10621 @end example
10622 
10623 Here, @command{gawk} did not produce a fatal error; instead
10624 it let the @command{awk} program code detect the problem and handle it.
10625 
10626 This mechanism also works for standard output and standard error.
10627 For standard output, you may use @code{PROCINFO["-", "NONFATAL"]}
10628 or @code{PROCINFO["/dev/stdout", "NONFATAL"]}.  For standard error, use
10629 @code{PROCINFO["/dev/stderr", "NONFATAL"]}.
10630 
10631 @cindex @env{GAWK_SOCK_RETRIES} environment variable
10632 @cindex environment variables @subentry @env{GAWK_SOCK_RETRIES}
10633 When attempting to open a TCP/IP socket (@pxref{TCP/IP Networking}),
10634 @command{gawk} tries multiple times. The @env{GAWK_SOCK_RETRIES}
10635 environment variable (@pxref{Other Environment Variables}) allows you to
10636 override @command{gawk}'s built-in default number of attempts.  However,
10637 once nonfatal I/O is enabled for a given socket, @command{gawk} only
10638 retries once, relying on @command{awk}-level code to notice that there
10639 was a problem.
10640 
10641 @node Output Summary
10642 @section Summary
10643 
10644 @itemize @value{BULLET}
10645 @item
10646 The @code{print} statement prints comma-separated expressions. The
10647 expressions are separated by the value of @code{OFS}, and the output is
10648 terminated by the value of @code{ORS}.  @code{OFMT} provides the conversion
10649 format for numeric values for the @code{print} statement.
10650 
10651 @item
10652 The @code{printf} statement provides finer-grained control over output,
10653 with format-control letters for different data types and various flags
10654 that modify the behavior of the format-control letters.
10655 
10656 @item
10657 Output from both @code{print} and @code{printf} may be redirected to
10658 files, pipes, and coprocesses.
10659 
10660 @item
10661 @command{gawk} provides special @value{FN}s for access to standard input,
10662 output, and error, and for network communications.
10663 
10664 @item
10665 Use @code{close()} to close open file, pipe, and coprocess redirections.
10666 For coprocesses, it is possible to close only one direction of the
10667 communications.
10668 
10669 @item
10670 Normally, errors with @code{print} or @code{printf} are fatal.
10671 @command{gawk} lets you make output errors nonfatal either for
10672 all files or on a per-file basis. You must then check for errors
10673 after every relevant output statement.
10674 
10675 @end itemize
10676 
10677 @c EXCLUDE START
10678 @node Output Exercises
10679 @section Exercises
10680 
10681 @enumerate
10682 @item
10683 Rewrite the program:
10684 
10685 @example
10686 awk 'BEGIN @{ print "Month Crates"
10687              print "----- ------" @}
10688            @{ print $1, "     ", $2 @}' inventory-shipped
10689 @end example
10690 
10691 @noindent
10692 from @ref{Output Separators}, by using a new value of @code{OFS}.
10693 
10694 @item
10695 Use the @code{printf} statement to line up the headings and table data
10696 for the @file{inventory-shipped} example that was covered in @ref{Print}.
10697 
10698 @item
10699 What happens if you forget the double quotes when redirecting
10700 output, as follows:
10701 
10702 @example
10703 BEGIN @{ print "Serious error detected!" > /dev/stderr @}
10704 @end example
10705 
10706 @end enumerate
10707 @c EXCLUDE END
10708 
10709 
10710 @node Expressions
10711 @chapter Expressions
10712 @cindex expressions
10713 
10714 Expressions are the basic building blocks of @command{awk} patterns
10715 and actions.  An expression evaluates to a value that you can print, test,
10716 or pass to a function.  Additionally, an expression
10717 can assign a new value to a variable or a field by using an assignment operator.
10718 
10719 An expression can serve as a pattern or action statement on its own.
10720 Most other kinds of
10721 statements contain one or more expressions that specify the data on which to
10722 operate.  As in other languages, expressions in @command{awk} can include
10723 variables, array references, constants, and function calls, as well as
10724 combinations of these with various operators.
10725 
10726 @menu
* Values::                      Constants, Variables, and Conversions.
10728 * All Operators::               @command{gawk}'s operators.
10729 * Truth Values and Conditions:: Testing for true and false.
10730 * Function Calls::              A function call is an expression.
10731 * Precedence::                  How various operators nest.
10732 * Locales::                     How the locale affects things.
10733 * Expressions Summary::         Expressions summary.
10734 @end menu
10735 
10736 @node Values
10737 @section Constants, Variables, and Conversions
10738 
10739 Expressions are built up from values and the operations performed
10740 upon them. This @value{SECTION} describes the elementary objects
10741 that provide the values used in expressions.
10742 
10743 @menu
10744 * Constants::                   String, numeric and regexp constants.
10745 * Using Constant Regexps::      When and how to use a regexp constant.
10746 * Variables::                   Variables give names to values for later use.
10747 * Conversion::                  The conversion of strings to numbers and vice
10748                                 versa.
10749 @end menu
10750 
10751 @node Constants
10752 @subsection Constant Expressions
10753 
10754 @cindex constants @subentry types of
10755 
10756 The simplest type of expression is the @dfn{constant}, which always has
10757 the same value.  There are three types of constants: numeric,
10758 string, and regular expression.
10759 
10760 Each is used in the appropriate context when you need a data
10761 value that isn't going to change.  Numeric constants can
10762 have different forms, but are internally stored in an identical manner.
10763 
10764 @menu
10765 * Scalar Constants::            Numeric and string constants.
10766 * Nondecimal-numbers::          What are octal and hex numbers.
10767 * Regexp Constants::            Regular Expression constants.
10768 @end menu
10769 
10770 @node Scalar Constants
10771 @subsubsection Numeric and String Constants
10772 
10773 @cindex constants @subentry numeric
10774 @cindex numeric @subentry constants
10775 A @dfn{numeric constant} stands for a number.  This number can be an
10776 integer, a decimal fraction, or a number in scientific (exponential)
10777 notation.@footnote{The internal representation of all numbers,
10778 including integers, uses double-precision floating-point numbers.
10779 On most modern systems, these are in IEEE 754 standard format.
10780 @xref{Arbitrary Precision Arithmetic}, for much more information.}
10781 Here are some examples of numeric constants that all
10782 have the same value:
10783 
10784 @example
10785 105
10786 1.05e+2
10787 1050e-1
10788 @end example
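
The equality of these three spellings can be checked directly.  The
following is a minimal sketch, assuming a POSIX-compatible @command{awk}
on the search path; the shell variable @code{same_value} is a name chosen
only for this illustration:

@verbatim
```shell
# Compare the three spellings of the same constant numerically.
same_value=$(awk 'BEGIN {
    if (105 == 1.05e+2 && 105 == 1050e-1)
        print "same"
    else
        print "different"
}')
echo "$same_value"
```
@end verbatim

All three constants are converted to the same internal number when the
program is parsed, so the comparison succeeds.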
10789 
10790 @cindex string @subentry constants
10791 @cindex constants @subentry string
10792 A @dfn{string constant} consists of a sequence of characters enclosed in
10793 double quotation marks.  For example:
10794 
10795 @example
10796 "parrot"
10797 @end example
10798 
10799 @noindent
10800 @cindex differences in @command{awk} and @command{gawk} @subentry strings
10801 @cindex strings @subentry length limitations
10802 @cindex ASCII
10803 represents the string whose contents are @samp{parrot}.  Strings in
@command{gawk} can be of any length, and they can contain any of the possible
eight-bit character values, including ASCII @sc{nul} (character code zero).
10806 Other @command{awk}
10807 implementations may have difficulty with some character codes.
10808 
Some languages allow you to continue long strings across
multiple lines by ending the line with a backslash. For example, in C:
10811 
10812 @example
10813 #include <stdio.h>
10814 
10815 int main()
10816 @{
10817     printf("hello, \
10818 world\n");
10819     return 0;
10820 @}
10821 @end example
10822 
10823 @noindent
10824 In such a case, the C compiler removes both the backslash and the newline,
10825 producing a string as if it had been typed @samp{"hello, world\n"}.
10826 This is useful when a single string needs to contain a large amount of text.
10827 
10828 The POSIX standard says explicitly that newlines are not allowed inside string
10829 constants.  And indeed, all @command{awk} implementations report an error
10830 if you try to do so. For example:
10831 
10832 @example
10833 $ @kbd{gawk 'BEGIN @{ print "hello, }
10834 > @kbd{world" @}'}
10835 @print{} gawk: cmd. line:1: BEGIN @{ print "hello,
10836 @print{} gawk: cmd. line:1:               ^ unterminated string
10837 @print{} gawk: cmd. line:1: BEGIN @{ print "hello,
10838 @print{} gawk: cmd. line:1:               ^ syntax error
10839 @end example
10840 
10841 @cindex dark corner @subentry string continuation
10842 @cindex strings @subentry continuation across lines
10843 @cindex differences in @command{awk} and @command{gawk} @subentry strings
10844 Although POSIX doesn't define what happens if you use an escaped
10845 newline, as in the previous C example, all known versions of
10846 @command{awk} allow you to do so.  Unfortunately, what each one
10847 does with such a string varies.  @value{DARKCORNER} @command{gawk},
10848 @command{mawk}, and the OpenSolaris POSIX @command{awk}
10849 (@pxref{Other Versions}) elide the backslash and newline, as in C:
10850 
10851 @example
10852 $ @kbd{gawk 'BEGIN @{ print "hello, \}
10853 > @kbd{world" @}'}
10854 @print{} hello, world
10855 @end example
10856 
10857 @cindex POSIX mode
10858 In POSIX mode (@pxref{Options}), @command{gawk} does not
10859 allow escaped newlines.  Otherwise, it behaves as just described.
10860 
10861 Brian Kernighan's @command{awk} and BusyBox @command{awk}
10862 remove the backslash but leave the newline
10863 intact, as part of the string:
10864 
10865 @example
10866 $ @kbd{nawk 'BEGIN @{ print "hello, \}
10867 > @kbd{world" @}'}
10868 @print{} hello, 
10869 @print{} world
10870 @end example
10871 
10872 @node Nondecimal-numbers
10873 @subsubsection Octal and Hexadecimal Numbers
10874 @cindex octal numbers
10875 @cindex hexadecimal numbers
10876 @cindex numbers @subentry octal
10877 @cindex numbers @subentry hexadecimal
10878 
10879 In @command{awk}, all numbers are in decimal (i.e., base 10).  Many other
10880 programming languages allow you to specify numbers in other bases, often
10881 octal (base 8) and hexadecimal (base 16).
10882 In octal, the numbers go 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, and so on.
10883 Just as @samp{11} in decimal is 1 times 10 plus 1, so
10884 @samp{11} in octal is 1 times 8 plus 1. This equals 9 in decimal.
10885 In hexadecimal, there are 16 digits. Because the everyday decimal
10886 number system only has ten digits (@samp{0}--@samp{9}), the letters
10887 @samp{a} through @samp{f} represent the rest.
10888 (Case in the letters is usually irrelevant; hexadecimal @samp{a} and @samp{A}
10889 have the same value.)
10890 Thus, @samp{11} in
10891 hexadecimal is 1 times 16 plus 1, which equals 17 in decimal.
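
The positional arithmetic just described can be spelled out using
decimal constants only.  This is a small sketch, assuming any
POSIX-compatible @command{awk}; @code{decimal_values} is an illustrative
shell variable name:

@verbatim
```shell
# "11" read as octal is 1*8 + 1; read as hexadecimal it is 1*16 + 1.
decimal_values=$(awk 'BEGIN { print 1 * 8 + 1, 1 * 16 + 1 }')
echo "$decimal_values"
```
@end verbatim

The two results are 9 and 17, matching the values given in the text.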
10892 
10893 Just by looking at plain @samp{11}, you can't tell what base it's in.
10894 So, in C, C++, and other languages derived from C,
10895 @c such as PERL, but we won't mention that....
10896 there is a special notation to signify the base.
10897 Octal numbers start with a leading @samp{0},
10898 and hexadecimal numbers start with a leading @samp{0x} or @samp{0X}:
10899 
10900 @table @code
10901 @item 11
10902 Decimal value 11
10903 
10904 @item 011
10905 Octal 11, decimal value 9
10906 
10907 @item 0x11
10908 Hexadecimal 11, decimal value 17
10909 @end table
10910 
10911 This example shows the difference:
10912 
10913 @example
10914 $ @kbd{gawk 'BEGIN @{ printf "%d, %d, %d\n", 011, 11, 0x11 @}'}
10915 @print{} 9, 11, 17
10916 @end example
10917 
10918 Being able to use octal and hexadecimal constants in your programs is most
10919 useful when working with data that cannot be represented conveniently as
10920 characters or as regular numbers, such as binary data of various sorts.
10921 
10922 @cindex @command{gawk} @subentry octal numbers and
10923 @cindex @command{gawk} @subentry hexadecimal numbers and
10924 @command{gawk} allows the use of octal and hexadecimal
10925 constants in your program text.  However, such numbers in the input data
10926 are not treated differently; doing so by default would break old
10927 programs.
10928 (If you really need to do this, use the @option{--non-decimal-data}
10929 command-line option;
10930 @pxref{Nondecimal Data}.)
10931 If you have octal or hexadecimal data,
10932 you can use the @code{strtonum()} function
10933 (@pxref{String Functions})
10934 to convert the data into a number.
10935 Most of the time, you will want to use octal or hexadecimal constants
10936 when working with the built-in bit-manipulation functions;
10937 see @ref{Bitwise Functions}
10938 for more information.
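
As a brief illustration of converting hexadecimal input data, here is a
sketch that assumes @command{gawk} is installed under that name (it
falls back to a marker string otherwise).  The sample input and the
shell variable @code{converted} are made up for this example:

@verbatim
```shell
if command -v gawk >/dev/null 2>&1; then
    # strtonum() is a gawk extension; it honors the 0x prefix in data.
    converted=$(echo "0x11 0xff" | gawk '{ print strtonum($1), strtonum($2) }')
else
    converted="gawk not found"
fi
echo "$converted"
```
@end verbatim

With @command{gawk} available, the two fields should convert to 17 and
255.  Without @code{strtonum()}, the same fields would simply have the
numeric value zero.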
10939 
10940 Unlike in some early C implementations, @samp{8} and @samp{9} are not
10941 valid in octal constants.  For example, @command{gawk} treats @samp{018}
10942 as decimal 18:
10943 
10944 @example
10945 $ @kbd{gawk 'BEGIN @{ print "021 is", 021 ; print 018 @}'}
10946 @print{} 021 is 17
10947 @print{} 18
10948 @end example
10949 
10950 @cindex compatibility mode (@command{gawk}) @subentry octal numbers
10951 @cindex compatibility mode (@command{gawk}) @subentry hexadecimal numbers
10952 Octal and hexadecimal source code constants are a @command{gawk} extension.
10953 If @command{gawk} is in compatibility mode
10954 (@pxref{Options}),
10955 they are not available.
10956 
10957 @sidebar A Constant's Base Does Not Affect Its Value
10958 
10959 Once a numeric constant has
10960 been converted internally into a number,
10961 @command{gawk} no longer remembers
10962 what the original form of the constant was; the internal value is
10963 always used.  This has particular consequences for conversion of
10964 numbers to strings:
10965 
10966 @example
10967 $ @kbd{gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}'}
10968 @print{} 0x11 is <17>
10969 @end example
10970 @end sidebar
10971 
10972 @node Regexp Constants
10973 @subsubsection Regular Expression Constants
10974 
10975 @cindex regexp constants
10976 @cindex @code{~} (tilde), @code{~} operator
10977 @cindex tilde (@code{~}), @code{~} operator
10978 @cindex @code{!} (exclamation point) @subentry @code{!~} operator
10979 @cindex exclamation point (@code{!}) @subentry @code{!~} operator
10980 A @dfn{regexp constant} is a regular expression description enclosed in
10981 slashes, such as @code{@w{/^beginning and end$/}}.  Most regexps used in
10982 @command{awk} programs are constant, but the @samp{~} and @samp{!~}
10983 matching operators can also match computed or dynamic regexps
10984 (which are typically just ordinary strings or variables that contain a regexp,
10985 but could be more complex expressions).
10986 
10987 @node Using Constant Regexps
10988 @subsection Using Regular Expression Constants
10989 
10990 Regular expression constants consist of text describing
10991 a regular expression enclosed in slashes (such as @code{/the +answer/}).
10992 This @value{SECTION} describes how such constants work in
10993 POSIX @command{awk} and @command{gawk}, and then goes on to describe
10994 @dfn{strongly typed regexp constants}, which are a @command{gawk} extension.
10995 
10996 @menu
10997 * Standard Regexp Constants::   Regexp constants in standard @command{awk}.
10998 * Strong Regexp Constants::     Strongly typed regexp constants.
10999 @end menu
11000 
11001 @node Standard Regexp Constants
11002 @subsubsection Standard Regular Expression Constants
11003 
11004 @cindex dark corner @subentry regexp constants
11005 When used on the righthand side of the @samp{~} or @samp{!~}
11006 operators, a regexp constant merely stands for the regexp that is to be
11007 matched.
11008 However, regexp constants (such as @code{/foo/}) may be used like simple expressions.
11009 When a
11010 regexp constant appears by itself, it has the same meaning as if it appeared
11011 in a pattern (i.e., @samp{($0 ~ /foo/)}).
11012 @value{DARKCORNER}
11013 @xref{Expression Patterns}.
11014 This means that the following two code segments:
11015 
11016 @example
11017 if ($0 ~ /barfly/ || $0 ~ /camelot/)
11018     print "found"
11019 @end example
11020 
11021 @noindent
11022 and:
11023 
11024 @example
11025 if (/barfly/ || /camelot/)
11026     print "found"
11027 @end example
11028 
11029 @noindent
11030 are exactly equivalent.
11031 One rather bizarre consequence of this rule is that the following
11032 Boolean expression is valid, but does not do what its author probably
11033 intended:
11034 
11035 @example
11036 # Note that /foo/ is on the left of the ~
11037 if (/foo/ ~ $1) print "found foo"
11038 @end example
11039 
11040 @c @cindex automatic warnings
11041 @c @cindex warnings, automatic
11042 @cindex @command{gawk} @subentry regexp constants and
11043 @cindex regexp constants @subentry in @command{gawk}
11044 @noindent
11045 This code is ``obviously'' testing @code{$1} for a match against the regexp
11046 @code{/foo/}.  But in fact, the expression @samp{/foo/ ~ $1} really means
11047 @samp{($0 ~ /foo/) ~ $1}.  In other words, first match the input record
against the regexp @code{/foo/}.  The result is either one or zero,
depending upon the success or failure of the match.  That result
is then matched against the first field in the record.
11051 Because it is unlikely that you would ever really want to make this kind of
11052 test, @command{gawk} issues a warning when it sees this construct in
11053 a program.
11054 Another consequence of this rule is that the assignment statement:
11055 
11056 @example
11057 matches = /foo/
11058 @end example
11059 
11060 @noindent
11061 assigns either zero or one to the variable @code{matches}, depending
11062 upon the contents of the current input record.
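
The effect of such an assignment is easy to observe.  The following
sketch assumes a POSIX-compatible @command{awk}; the two input lines and
the shell variable @code{flags} are invented for the illustration:

@verbatim
```shell
# The first record contains "foo"; the second does not.
flags=$(printf 'foo bar\nbaz\n' | awk '{ matches = /foo/; print matches }')
echo "$flags"
```
@end verbatim

The variable receives one for the record that matches and zero for the
record that does not.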
11063 
11064 @cindex differences in @command{awk} and @command{gawk} @subentry regexp constants
11065 @cindex dark corner @subentry regexp constants @subentry as arguments to user-defined functions
11066 @cindexgawkfunc{gensub}
11067 @cindexawkfunc{sub}
11068 @cindexawkfunc{gsub}
11069 Constant regular expressions are also used as the first argument for
11070 the @code{gensub()}, @code{sub()}, and @code{gsub()} functions, as the
11071 second argument of the @code{match()} function,
11072 and as the third argument of the @code{split()} and @code{patsplit()} functions
11073 (@pxref{String Functions}).
11074 Modern implementations of @command{awk}, including @command{gawk}, allow
11075 the third argument of @code{split()} to be a regexp constant, but some
11076 older implementations do not.
11077 @value{DARKCORNER}
11078 Because some built-in functions accept regexp constants as arguments,
11079 confusion can arise when attempting to use regexp constants as arguments
11080 to user-defined functions (@pxref{User-defined}).  For example:
11081 
11082 @example
11083 @group
11084 function mysub(pat, repl, str, global)
11085 @{
11086     if (global)
11087         gsub(pat, repl, str)
11088     else
11089         sub(pat, repl, str)
11090     return str
11091 @}
11092 @end group
11093 
11094 @group
11095 @{
11096     @dots{}
11097     text = "hi! hi yourself!"
11098     mysub(/hi/, "howdy", text, 1)
11099     @dots{}
11100 @}
11101 @end group
11102 @end example
11103 
11104 @c @cindex automatic warnings
11105 @c @cindex warnings, automatic
11106 In this example, the programmer wants to pass a regexp constant to the
11107 user-defined function @code{mysub()}, which in turn passes it on to
11108 either @code{sub()} or @code{gsub()}.  However, what really happens is that
11109 the @code{pat} parameter is assigned a value of either one or zero, depending upon whether
11110 or not @code{$0} matches @code{/hi/}.
11111 @command{gawk} issues a warning when it sees a regexp constant used as
11112 a parameter to a user-defined function, because passing a truth value in
11113 this way is probably not what was intended.
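
One portable way to get the intended effect is to pass the pattern as a
string, which @code{sub()} and @code{gsub()} then treat as a dynamic
regexp.  This is only a sketch of that approach, assuming a
POSIX-compatible @command{awk}; the shell variable @code{greeting} is an
illustrative name, and the function prints its return value because
@code{str} is modified only inside @code{mysub()}:

@verbatim
```shell
greeting=$(awk '
function mysub(pat, repl, str, global)
{
    if (global)
        gsub(pat, repl, str)
    else
        sub(pat, repl, str)
    return str
}

BEGIN {
    text = "hi! hi yourself!"
    # Pass the pattern as a string, not as a regexp constant.
    print mysub("hi", "howdy", text, 1)
}')
echo "$greeting"
```
@end verbatim

Because @code{"hi"} is an ordinary string, it reaches @code{gsub()}
unchanged instead of being evaluated against @code{$0} first.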
11114 
11115 @node Strong Regexp Constants
11116 @subsubsection Strongly Typed Regexp Constants
11117 
11118 This @value{SECTION} describes a @command{gawk}-specific feature.
11119 
11120 As we saw in the previous @value{SECTION},
11121 regexp constants (@code{/@dots{}/}) hold a strange position in the
11122 @command{awk} language. In most contexts, they act like an expression:
11123 @samp{$0 ~ /@dots{}/}. In other contexts, they denote only a regexp to
11124 be matched. In no case are they really a ``first class citizen'' of the
11125 language. That is, you cannot define a scalar variable whose type is
11126 ``regexp'' in the same sense that you can define a variable to be a
11127 number or a string:
11128 
11129 @example
11130 num = 42        @ii{Numeric variable}
11131 str = "hi"      @ii{String variable}
11132 re = /foo/      @ii{Wrong!} re @ii{is the result of} $0 ~ /foo/
11133 @end example
11134 
11135 For a number of more advanced use cases,
11136 it would be nice to have regexp constants that
11137 are @dfn{strongly typed}; in other words, that denote a regexp useful
11138 for matching, and not an expression.
11139 
11140 @cindex values @subentry regexp
11141 @command{gawk} provides this feature.  A strongly typed regexp constant
11142 looks almost like a regular regexp constant, except that it is preceded
11143 by an @samp{@@} sign:
11144 
11145 @example
11146 re = @@/foo/     @ii{Regexp variable}
11147 @end example
11148 
11149 Strongly typed regexp constants @emph{cannot} be used everywhere that a
11150 regular regexp constant can, because this would make the language even more
11151 confusing.  Instead, you may use them only in certain contexts:
11152 
11153 @itemize @bullet
11154 @item
11155 On the righthand side of the @samp{~} and @samp{!~} operators: @samp{some_var ~ @@/foo/}
11156 (@pxref{Regexp Usage}).
11157 
11158 @item
11159 In the @code{case} part of a @code{switch} statement
11160 (@pxref{Switch Statement}).
11161 
11162 @item
11163 As an argument to one of the built-in functions that accept regexp constants:
11164 @code{gensub()},
11165 @code{gsub()},
11166 @code{match()},
11167 @code{patsplit()},
11168 @code{split()},
11169 and
11170 @code{sub()}
11171 (@pxref{String Functions}).
11172 
11173 @item
11174 As a parameter in a call to a user-defined function
11175 (@pxref{User-defined}).
11176 
11177 @item
11178 On the righthand side of an assignment to a variable: @samp{some_var = @@/foo/}.
11179 In this case, the type of @code{some_var} is regexp. Additionally, @code{some_var}
11180 can be used with @samp{~} and @samp{!~}, passed to one of the built-in functions
11181 listed above, or passed as a parameter to a user-defined function.
11182 @end itemize
11183 
11184 You may use the @code{typeof()} built-in function
11185 (@pxref{Type Functions})
11186 to determine if a variable or function parameter is
11187 a regexp variable.
11188 
11189 The true power of this feature comes from the ability to create variables that
11190 have regexp type. Such variables can be passed on to user-defined functions,
11191 without the confusing aspects of computed regular expressions created from
11192 strings or string constants. They may also be passed through indirect function
11193 calls (@pxref{Indirect Calls})
11194 and on to the built-in functions that accept regexp constants.
11195 
11196 When used in numeric conversions, strongly typed regexp variables convert
11197 to zero. When used in string conversions, they convert to the string
11198 value of the original regexp text.
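
The following sketch shows a regexp-typed variable being created,
inspected, and used for matching.  It assumes a @command{gawk} recent
enough to support strongly typed regexp constants and @code{typeof()};
if that is not the case, the shell variable @code{regexp_info} (a name
chosen for this example) is set to a marker string instead:

@verbatim
```shell
if command -v gawk >/dev/null 2>&1; then
    # Fall back to a marker string if this gawk predates @/.../ constants.
    regexp_info=$(gawk 'BEGIN {
        re = @/foo/
        print typeof(re), ("seafood" ~ re)
    }' 2>/dev/null) || regexp_info="unsupported"
else
    regexp_info="unsupported"
fi
echo "$regexp_info"
```
@end verbatim

On a supporting @command{gawk}, the type is reported as @code{regexp},
and the match against @samp{seafood} yields one.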
11199 
11200 @node Variables
11201 @subsection Variables
11202 
11203 @cindex variables @subentry user-defined
11204 @cindex user-defined @subentry variables
11205 @dfn{Variables} are ways of storing values at one point in your program for
11206 use later in another part of your program.  They can be manipulated
11207 entirely within the program text, and they can also be assigned values
11208 on the @command{awk} command line.
11209 
11210 @menu
11211 * Using Variables::             Using variables in your programs.
11212 * Assignment Options::          Setting variables on the command line and a
11213                                 summary of command-line syntax. This is an
11214                                 advanced method of input.
11215 @end menu
11216 
11217 @node Using Variables
11218 @subsubsection Using Variables in a Program
11219 
11220 Variables let you give names to values and refer to them later.  Variables
11221 have already been used in many of the examples.  The name of a variable
11222 must be a sequence of letters, digits, or underscores, and it may not begin
11223 with a digit.
11224 Here, a @dfn{letter} is any one of the 52 upper- and lowercase
11225 English letters.  Other characters that may be defined as letters
11226 in non-English locales are not valid in variable names.
11227 Case is significant in variable names; @code{a} and @code{A}
11228 are distinct variables.
11229 
11230 A variable name is a valid expression by itself; it represents the
11231 variable's current value.  Variables are given new values with
11232 @dfn{assignment operators}, @dfn{increment operators}, and
11233 @dfn{decrement operators}
11234 (@pxref{Assignment Ops}).
11235 In addition, the @code{sub()} and @code{gsub()} functions can
11236 change a variable's value, and the @code{match()}, @code{split()},
11237 and @code{patsplit()} functions can change the contents of their
11238 array parameters (@pxref{String Functions}).
11239 
11240 @cindex variables @subentry built-in
11241 @cindex variables @subentry initializing
11242 A few variables have special built-in meanings, such as @code{FS} (the
11243 field separator) and @code{NF} (the number of fields in the current input
11244 record).  @xref{Built-in Variables} for a list of the predefined variables.
11245 These predefined variables can be used and assigned just like all other
11246 variables, but their values are also used or changed automatically by
11247 @command{awk}.  All predefined variables' names are entirely uppercase.
11248 
11249 Variables in @command{awk} can be assigned either numeric or string values.
11250 The kind of value a variable holds can change over the life of a program.
By default, variables are initialized to the empty string, which
is zero if converted to a number.  Unlike in C and most other traditional
languages, there is no need to explicitly initialize a variable in
@command{awk}.
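
The dual nature of an unassigned variable can be seen in a short
sketch, assuming a POSIX-compatible @command{awk}.  The variable
@code{x} is deliberately never assigned, and @code{unset_demo} is an
illustrative shell variable name:

@verbatim
```shell
# x is never assigned: it is "" as a string and 0 as a number.
unset_demo=$(awk 'BEGIN { print "[" x "]", x + 0, length(x) }')
echo "$unset_demo"
```
@end verbatim

The brackets enclose nothing, the sum is zero, and the length is zero.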
11255 
11256 @node Assignment Options
11257 @subsubsection Assigning Variables on the Command Line
11258 @cindex variables @subentry assigning on command line
11259 @cindex command line @subentry variables, assigning on
11260 
11261 Any @command{awk} variable can be set by including a @dfn{variable assignment}
11262 among the arguments on the command line when @command{awk} is invoked
11263 (@pxref{Other Arguments}).
11264 Such an assignment has the following form:
11265 
11266 @example
11267 @var{variable}=@var{text}
11268 @end example
11269 
11270 @cindex @option{-v} option
11271 @noindent
11272 With it, a variable is set either at the beginning of the
11273 @command{awk} run or in between input files.
11274 When the assignment is preceded with the @option{-v} option,
11275 as in the following:
11276 
11277 @example
11278 -v @var{variable}=@var{text}
11279 @end example
11280 
11281 @noindent
11282 the variable is set at the very beginning, even before the
11283 @code{BEGIN} rules execute.  The @option{-v} option and its assignment
11284 must precede all the @value{FN} arguments, as well as the program text.
11285 (@xref{Options} for more information about
11286 the @option{-v} option.)
11287 Otherwise, the variable assignment is performed at a time determined by
11288 its position among the input file arguments---after the processing of the
11289 preceding input file argument.  For example:
11290 
11291 @example
11292 awk '@{ print $n @}' n=4 inventory-shipped n=2 mail-list
11293 @end example
11294 
11295 @noindent
11296 prints the value of field number @code{n} for all input records.  Before
11297 the first file is read, the command line sets the variable @code{n}
11298 equal to four.  This causes the fourth field to be printed in lines from
11299 @file{inventory-shipped}.  After the first file has finished,
11300 but before the second file is started, @code{n} is set to two, so that the
11301 second field is printed in lines from @file{mail-list}:
11302 
11303 @example
11304 $ @kbd{awk '@{ print $n @}' n=4 inventory-shipped n=2 mail-list}
11305 @print{} 15
11306 @print{} 24
11307 @dots{}
11308 @print{} 555-5553
11309 @print{} 555-3412
11310 @dots{}
11311 @end example
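
The timing rules can also be tried without the sample @value{DF}s.  The
sketch below assumes a POSIX-compatible @command{awk} and the
@command{mktemp} utility; the temporary files and all shell variable
names are made up for the illustration:

@verbatim
```shell
tmpdir=$(mktemp -d)
printf 'a1 a2 a3\n' > "$tmpdir/first.txt"
printf 'b1 b2 b3\n' > "$tmpdir/second.txt"

# n=3 applies to first.txt; n=1 takes effect before second.txt is read.
between_files=$(awk '{ print $n }' n=3 "$tmpdir/first.txt" n=1 "$tmpdir/second.txt")

# With -v the value is already visible in BEGIN; without it, it is not.
with_v=$(awk -v n=3 'BEGIN { print "[" n "]" }')
without_v=$(awk 'BEGIN { print "[" n "]" }' n=3)

rm -r "$tmpdir"
printf '%s\n' "$between_files" "$with_v" "$without_v"
```
@end verbatim

The first command prints the third field of one file and the first
field of the other.  The last two commands show that only the
@option{-v} form sets the variable before the @code{BEGIN} rule runs.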
11312 
11313 @cindex dark corner @subentry command-line arguments
11314 Command-line arguments are made available for explicit examination by
11315 the @command{awk} program in the @code{ARGV} array
11316 (@pxref{ARGC and ARGV}).
11317 @command{awk} processes the values of command-line assignments for escape
11318 sequences
11319 (@pxref{Escape Sequences}).
11320 @value{DARKCORNER}
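
A quick way to observe this escape processing is to count the
characters in an assigned value.  This is a sketch assuming a
POSIX-compatible @command{awk}; the variable names @code{sep} and
@code{char_count} are illustrative:

@verbatim
```shell
# The shell passes backslash-t literally; awk turns it into one TAB.
char_count=$(awk -v 'sep=a\tb' 'BEGIN { print length(sep) }')
echo "$char_count"
```
@end verbatim

The value has three characters rather than four, because the
two-character escape sequence becomes a single TAB.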
11321 
11322 Normally, variables assigned on the command line (with or without the
11323 @option{-v} option) are treated as strings.  When such variables are
11324 used as numbers, @command{awk}'s normal automatic conversion of strings
11325 to numbers takes place, and everything ``just works.''
11326 
11327 However, @command{gawk} supports variables whose types are ``regexp''.
11328 You can assign variables of this type using the following syntax:
11329 
11330 @example
11331 gawk -v 're1=@@/foo|bar/' '@dots{}' /path/to/file1 're2=@@/baz|quux/' /path/to/file2
11332 @end example
11333 
11334 @noindent
11335 Strongly typed regexps are an advanced feature (@pxref{Strong Regexp Constants}).
11336 We mention them here only for completeness.
11337 
11338 @node Conversion
11339 @subsection Conversion of Strings and Numbers
11340 
Number-to-string and string-to-number conversion are generally
straightforward.  However, there are subtleties to be aware of;
this @value{SECTION} discusses this important facet of @command{awk}.
11344 
11345 @menu
11346 * Strings And Numbers::         How @command{awk} Converts Between Strings And
11347                                 Numbers.
11348 * Locale influences conversions:: How the locale may affect conversions.
11349 @end menu
11350 
11351 @node Strings And Numbers
11352 @subsubsection How @command{awk} Converts Between Strings and Numbers
11353 
11354 @cindex converting @subentry string to numbers
11355 @cindex strings @subentry converting
11356 @cindex numbers @subentry converting
11357 @cindex converting @subentry numbers to strings
11358 Strings are converted to numbers and numbers are converted to strings, if the context
11359 of the @command{awk} program demands it.  For example, if the value of
11360 either @code{foo} or @code{bar} in the expression @samp{foo + bar}
11361 happens to be a string, it is converted to a number before the addition
11362 is performed.  If numeric values appear in string concatenation, they
11363 are converted to strings.  Consider the following:
11364 
11365 @example
11366 @group
11367 two = 2; three = 3
11368 print (two three) + 4
11369 @end group
11370 @end example
11371 
11372 @noindent
11373 This prints the (numeric) value 27.  The numeric values of
11374 the variables @code{two} and @code{three} are converted to strings and
11375 concatenated together.  The resulting string is converted back to the
11376 number 23, to which 4 is then added.
11377 
11378 @cindex null strings @subentry converting numbers to strings
11379 @cindex type @subentry conversion
11380 If, for some reason, you need to force a number to be converted to a
11381 string, concatenate that number with the empty string, @code{""}.
11382 To force a string to be converted to a number, add zero to that string.
11383 A string is converted to a number by interpreting any numeric prefix
11384 of the string as numerals:
11385 @code{"2.5"} converts to 2.5, @code{"1e3"} converts to 1,000, and @code{"25fix"}
11386 has a numeric value of 25.
11387 Strings that can't be interpreted as valid numbers convert to zero.
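
The conversion rules above can be checked by adding zero to each string.
The following sketch assumes a POSIX-compatible @command{awk}; the string
@code{"fix"} is an extra, made-up case with no numeric prefix, and
@code{numeric_values} is an illustrative shell variable name:

@verbatim
```shell
# Adding zero forces each string to be converted to a number.
numeric_values=$(awk 'BEGIN { print "2.5" + 0, "1e3" + 0, "25fix" + 0, "fix" + 0 }')
echo "$numeric_values"
```
@end verbatim

Only the leading numeric portion of each string contributes to the
result; a string without one converts to zero.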
11388 
11389 @cindex @code{CONVFMT} variable
11390 The exact manner in which numbers are converted into strings is controlled
11391 by the @command{awk} predefined variable @code{CONVFMT} (@pxref{Built-in Variables}).
11392 Numbers are converted using the @code{sprintf()} function
11393 with @code{CONVFMT} as the format
11394 specifier
11395 (@pxref{String Functions}).
11396 
11397 @code{CONVFMT}'s default value is @code{"%.6g"}, which creates a value with
11398 at most six significant digits.  For some applications, you might want to
11399 change it to specify more precision.
11400 On most modern machines,
11401 17 digits is usually enough to capture a floating-point number's
11402 value exactly.@footnote{Pathological cases can require up to
11403 752 digits (!), but we doubt that you need to worry about this.}
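
To see the effect of @code{CONVFMT}, the same number can be converted to
a string before and after the variable is changed.  This is a minimal
sketch, assuming a POSIX-compatible @command{awk}; the sample value, the
format @code{"%.10g"}, and the shell variable @code{conversions} are
choices made only for this example:

@verbatim
```shell
conversions=$(awk 'BEGIN {
    x = 3.14159265358979
    a = x ""              # converted with the default CONVFMT, "%.6g"
    CONVFMT = "%.10g"
    b = x ""              # converted again under the new CONVFMT
    print a
    print b
}')
echo "$conversions"
```
@end verbatim

The first string keeps six significant digits and the second keeps ten.
The number stored in @code{x} is the same in both cases.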
11404 
11405 @cindex dark corner @subentry @code{CONVFMT} variable
11406 Strange results can occur if you set @code{CONVFMT} to a string that doesn't
11407 tell @code{sprintf()} how to format floating-point numbers in a useful way.
11408 For example, if you forget the @samp{%} in the format, @command{awk} converts
11409 all numbers to the same constant string.
11410 
11411 As a special case, if a number is an integer, then the result of converting
11412 it to a string is @emph{always} an integer, no matter what the value of
11413 @code{CONVFMT} may be.  Given the following code fragment:
11414 
11415 @example
11416 CONVFMT = "%2.2f"
11417 a = 12
11418 b = a ""
11419 @end example
11420 
11421 @noindent
11422 @code{b} has the value @code{"12"}, not @code{"12.00"}.
11423 @value{DARKCORNER}
11424 
11425 @sidebar Pre-POSIX @command{awk} Used @code{OFMT} for String Conversion
11426 @cindex POSIX @command{awk} @subentry @code{OFMT} variable and
11427 @cindex @code{OFMT} variable
11428 @cindex portability @subentry new @command{awk} vs.@: old @command{awk}
11429 @cindex @command{awk} @subentry new vs.@: old @subentry @code{OFMT} variable
11430 Prior to the POSIX standard, @command{awk} used the value
11431 of @code{OFMT} for converting numbers to strings.  @code{OFMT}
11432 specifies the output format to use when printing numbers with @code{print}.
11433 @code{CONVFMT} was introduced in order to separate the semantics of
11434 conversion from the semantics of printing.  Both @code{CONVFMT} and
11435 @code{OFMT} have the same default value: @code{"%.6g"}.  In the vast majority
11436 of cases, old @command{awk} programs do not change their behavior.
11437 @xref{Print} for more information on the @code{print} statement.
11438 @end sidebar
11439 
11440 @node Locale influences conversions
11441 @subsubsection Locales Can Influence Conversion
11442 
11443 Where you are can matter when it comes to converting between numbers and
11444 strings.  The local character set and language---the @dfn{locale}---can
11445 affect numeric formats.  In particular, for @command{awk} programs,
11446 it affects the decimal point character and the thousands-separator
11447 character.  The @code{"C"} locale, and most English-language locales,
11448 use the period character (@samp{.}) as the decimal point and don't
11449 have a thousands separator.  However, many (if not most) European and
11450 non-English locales use the comma (@samp{,}) as the decimal point
11451 character.  European locales often use either a space or a period as
11452 the thousands separator, if they have one.
11453 
11454 @cindex dark corner @subentry locale's decimal point character
11455 The POSIX standard says that @command{awk} always uses the period as the decimal
11456 point when reading the @command{awk} program source code, and for
11457 command-line variable assignments (@pxref{Other Arguments}).  However,
11458 when interpreting input data, for @code{print} and @code{printf} output,
11459 and for number-to-string conversion, the local decimal point character
11460 is used.  @value{DARKCORNER} In all cases, numbers in source code and
11461 in input data cannot have a thousands separator.  Here are some examples
11462 indicating the difference in behavior, on a GNU/Linux system:
11463 
11464 @example
11465 $ @kbd{export POSIXLY_CORRECT=1}                        @ii{Force POSIX behavior}
11466 $ @kbd{gawk 'BEGIN @{ printf "%g\n", 3.1415927 @}'}
11467 @print{} 3.14159
11468 $ @kbd{LC_ALL=en_DK.utf-8 gawk 'BEGIN @{ printf "%g\n", 3.1415927 @}'}
11469 @print{} 3,14159
11470 $ @kbd{echo 4,321 | gawk '@{ print $1 + 1 @}'}
11471 @print{} 5
11472 $ @kbd{echo 4,321 | LC_ALL=en_DK.utf-8 gawk '@{ print $1 + 1 @}'}
11473 @print{} 5,321
11474 @end example
11475 
11476 @noindent
11477 The @code{en_DK.utf-8} locale is for English in Denmark, where the comma acts as
11478 the decimal point separator.  In the normal @code{"C"} locale, @command{gawk}
11479 treats @samp{4,321} as 4, while in the Danish locale, it's treated
11480 as the full number including the fractional part, 4.321.
11481 
11482 @cindex POSIX mode
11483 Some earlier versions of @command{gawk} fully complied with this aspect
11484 of the standard.  However, many users in non-English locales complained
11485 about this behavior, because their data used a period as the decimal
11486 point, so the default behavior was restored to use a period as the
11487 decimal point character.  You can use the @option{--use-lc-numeric}
11488 option (@pxref{Options}) to force @command{gawk} to use the locale's
11489 decimal point character.  (@command{gawk} also uses the locale's decimal
11490 point character when in POSIX mode, either via @option{--posix} or the
11491 @env{POSIXLY_CORRECT} environment variable, as shown previously.)
11492 
11493 @ref{table-locale-affects} describes the cases in which the locale's decimal
11494 point character is used and those in which a period is used.  Some of these
11495 features have not been described yet.
11496 
11497 @float Table,table-locale-affects
11498 @caption{Locale decimal point versus a period}
11499 @multitable @columnfractions .15 .20 .45
11500 @headitem Feature @tab Default @tab @option{--posix} or @option{--use-lc-numeric}
11501 @item @code{%'g} @tab Use locale @tab Use locale
11502 @item @code{%g} @tab Use period @tab Use locale
11503 @item Input @tab Use period @tab Use locale
11504 @item @code{strtonum()} @tab Use period @tab Use locale
11505 @end multitable
11506 @end float
11507 
11508 Finally, modern-day formal standards and the IEEE standard floating-point
11509 representation can have an unusual but important effect on the way
11510 @command{gawk} converts some special string values to numbers.  The details
11511 are presented in @ref{POSIX Floating Point Problems}.
11512 
11513 @node All Operators
11514 @section Operators: Doing Something with Values
11515 
11516 This @value{SECTION} introduces the @dfn{operators} that make use
11517 of the values provided by constants and variables.
11518 
11519 @menu
11520 * Arithmetic Ops::              Arithmetic operations (@samp{+}, @samp{-},
11521                                 etc.)
11522 * Concatenation::               Concatenating strings.
11523 * Assignment Ops::              Changing the value of a variable or a field.
11524 * Increment Ops::               Incrementing the numeric value of a variable.
11525 @end menu
11526 
11527 @node Arithmetic Ops
11528 @subsection Arithmetic Operators
11529 @cindex arithmetic operators
11530 @cindex operators @subentry arithmetic
11531 @c @cindex addition
11532 @c @cindex subtraction
11533 @c @cindex multiplication
11534 @c @cindex division
11535 @c @cindex remainder
11536 @c @cindex quotient
11537 @c @cindex exponentiation
11538 
11539 The @command{awk} language uses the common arithmetic operators when
11540 evaluating expressions.  All of these arithmetic operators follow normal
11541 precedence rules and work as you would expect them to.
11542 
11543 The following example uses a file named @file{grades}, which contains
11544 a list of student names as well as three test scores per student (it's
11545 a small class):
11546 
11547 @example
11548 Pat   100 97 58
11549 Sandy  84 72 93
11550 Chris  72 92 89
11551 @end example
11552 
11553 @noindent
11554 This program takes the file @file{grades} and prints the average
11555 of the scores:
11556 
11557 @example
11558 $ @kbd{awk '@{ sum = $2 + $3 + $4 ; avg = sum / 3}
11559 >        @kbd{print $1, avg @}' grades}
11560 @print{} Pat 85
11561 @print{} Sandy 83
11562 @print{} Chris 84.3333
11563 @end example
11564 
11565 The following list provides the arithmetic operators in @command{awk},
11566 in order from the highest precedence to the lowest:
11567 
11568 @table @code
11569 @cindex common extensions @subentry @code{**} operator
11570 @cindex extensions @subentry common @subentry @code{**} operator
11571 @cindex POSIX @command{awk} @subentry arithmetic operators and
11572 @item @var{x} ^ @var{y}
11573 @itemx @var{x} ** @var{y}
11574 Exponentiation; @var{x} raised to the @var{y} power.  @samp{2 ^ 3} has
11575 the value eight; the character sequence @samp{**} is equivalent to
11576 @samp{^}. @value{COMMONEXT}
11577 
11578 @item - @var{x}
11579 Negation.
11580 
11581 @item + @var{x}
11582 Unary plus; the expression is converted to a number.
11583 
11584 @item @var{x} * @var{y}
11585 Multiplication.
11586 
11587 @cindex troubleshooting @subentry division
11588 @cindex division
11589 @item @var{x} / @var{y}
11590 Division; because all numbers in @command{awk} are floating-point
11591 numbers, the result is @emph{not} rounded to an integer---@samp{3 / 4} has
11592 the value 0.75.  (It is a common mistake, especially for C programmers,
11593 to forget that @emph{all} numbers in @command{awk} are floating point,
11594 and that division of integer-looking constants produces a real number,
11595 not an integer.)
11596 
11597 @item @var{x} % @var{y}
11598 Remainder; further discussion is provided in the text, just
11599 after this list.
11600 
11601 @item @var{x} + @var{y}
11602 Addition.
11603 
11604 @item @var{x} - @var{y}
11605 Subtraction.
11606 @end table
11607 
11608 Unary plus and minus have the same precedence;
11609 multiplication, division, and remainder all have the same precedence; and
11610 addition and subtraction have the same precedence.
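
A few one-line expressions illustrate the ordering in the list.  This is
a sketch only, assuming a POSIX-compatible @command{awk}; the particular
operands are arbitrary:

```shell
result=$(awk 'BEGIN {
    print -2 ^ 2        # exponentiation binds tighter than negation
    print ((-2) ^ 2)    # parentheses force the negation first
    print 2 + 3 * 4     # multiplication before addition
    print 7 / 2         # division is floating point
    print int(7 / 2)    # int() truncates toward zero
}')
printf '%s\n' "$result"
```

The values printed are @minus{}4, 4, 14, 3.5, and 3, in that order.
When in doubt about how an expression groups, adding parentheses costs
nothing and documents the intent.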
11611 
11612 @cindex differences in @command{awk} and @command{gawk} @subentry trunc-mod operation
11613 @cindex trunc-mod operation
11614 When computing the remainder of @samp{@var{x} % @var{y}},
11615 the quotient is rounded toward zero to an integer and
11616 multiplied by @var{y}. This result is subtracted from @var{x};
11617 this operation is sometimes known as ``trunc-mod.''  The following
11618 relation always holds:
11619 
11620 @example
11621 b * int(a / b) + (a % b) == a
11622 @end example
11623 
11624 One possibly undesirable effect of this definition of remainder is that
11625 @samp{@var{x} % @var{y}} is negative if @var{x} is negative.  Thus:
11626 
11627 @example
11628 -17 % 8 = -1
11629 @end example
11630 
11631 In other @command{awk} implementations, the signedness of the remainder
11632 may be machine-dependent.
11633 @c FIXME !!! what does posix say?
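
When a nonnegative remainder is wanted, one common workaround is to add
the divisor and take the remainder again.  The following sketch assumes
a POSIX-compatible @command{awk} and a positive divisor;
@code{floormod()} is a hypothetical helper defined here, not a built-in
function:

```shell
result=$(awk '
# floormod is a hypothetical helper, not a built-in function.
# It assumes that y is positive.
function floormod(x, y) {
    return ((x % y) + y) % y
}
BEGIN {
    print -17 % 8                         # trunc-mod: sign follows x
    print floormod(-17, 8)                # nonnegative for positive y
    print floormod(17, 8)
    print 8 * int(-17 / 8) + (-17 % 8)    # the relation above gives x back
}')
printf '%s\n' "$result"
```

The four lines printed are @minus{}1, 7, 1, and @minus{}17.  The last
one confirms the relation given earlier for @code{a} equal to
@minus{}17 and @code{b} equal to 8.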
11634 
11635 @cindex portability @subentry @code{**} operator and
11636 @cindex @code{*} (asterisk) @subentry @code{**} operator
11637 @cindex asterisk (@code{*}) @subentry @code{**} operator
11638 @quotation NOTE
11639 The POSIX standard only specifies the use of @samp{^}
11640 for exponentiation.
11641 For maximum portability, do not use the @samp{**} operator.
11642 @end quotation
11643 
11644 @node Concatenation
11645 @subsection String Concatenation
11646 @cindex Kernighan, Brian
11647 @quotation
11648 @i{It seemed like a good idea at the time.}
11649 @author Brian Kernighan
11650 @end quotation
11651 
11652 @cindex string @subentry operators
11653 @cindex operators @subentry string
11654 @cindex concatenating
11655 There is only one string operation: concatenation.  It does not have a
11656 specific operator to represent it.  Instead, concatenation is performed by
11657 writing expressions next to one another, with no operator.  For example:
11658 
11659 @example
11660 $ @kbd{awk '@{ print "Field number one: " $1 @}' mail-list}
11661 @print{} Field number one: Amelia
11662 @print{} Field number one: Anthony
11663 @dots{}
11664 @end example
11665 
11666 Without the space in the string constant after the @samp{:}, the line
11667 runs together.  For example:
11668 
11669 @example
11670 $ @kbd{awk '@{ print "Field number one:" $1 @}' mail-list}
11671 @print{} Field number one:Amelia
11672 @print{} Field number one:Anthony
11673 @dots{}
11674 @end example
11675 
11676 @cindex troubleshooting @subentry string concatenation
11677 Because string concatenation does not have an explicit operator, it is
11678 often necessary to ensure that it happens at the right time by using
11679 parentheses to enclose the items to concatenate.  For example,
11680 you might expect that the
11681 following code fragment concatenates @code{file} and @code{name}:
11682 
11683 @example
11684 file = "file"
11685 name = "name"
11686 print "something meaningful" > file name
11687 @end example
11688 
11689 @cindex Brian Kernighan's @command{awk}
11690 @cindex @command{mawk} utility
11691 @noindent
11692 This produces a syntax error with some versions of Unix
11693 @command{awk}.@footnote{It happens that BWK
11694 @command{awk}, @command{gawk}, and @command{mawk} all ``get it right,''
11695 but you should not rely on this.}
11696 It is necessary to use the following:
11697 
11698 @example
11699 print "something meaningful" > (file name)
11700 @end example
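
The parenthesized form can be tried end to end.  The following sketch
assumes a POSIX shell with the common @command{mktemp} utility
available; the scratch directory and the shell variables @code{dir} and
@code{result} are illustrative additions:

```shell
# Work in a scratch directory so that no existing file is overwritten.
dir=$(mktemp -d)
(
    cd "$dir" &&
    awk 'BEGIN {
        file = "file"
        name = "name"
        # The parentheses make the concatenation the redirection target.
        print "something meaningful" > (file name)
        close(file name)
    }'
)
result=$(cat "$dir/filename")
printf '%s\n' "$result"
rm -r "$dir"
```

The output goes to a single file named @file{filename}, which is what
the parentheses are there to guarantee.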
11701 
11702 @cindex order of evaluation, concatenation
11703 @cindex evaluation order @subentry concatenation
11704 @cindex side effects
11705 Parentheses should be used around concatenation in all but the
11706 most common contexts, such as on the right-hand side of @samp{=}.
11707 Be careful about the kinds of expressions used in string concatenation.
11708 In particular, the order of evaluation of expressions used for concatenation
11709 is undefined in the @command{awk} language.  Consider this example:
11710 
11711 @example
11712 BEGIN @{
11713     a = "don't"
11714     print (a " " (a = "panic"))
11715 @}
11716 @end example
11717 
11718 @noindent
11719 It is not defined whether the second assignment to @code{a} happens
11720 before or after the value of @code{a} is retrieved for producing the
11721 concatenated value.  The result could be either @samp{don't panic},
11722 or @samp{panic panic}.
11723 @c see test/nasty.awk for a worse example
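
One way to avoid depending on the evaluation order is to move the side
effect out of the concatenation.  The following sketch assumes a
POSIX-compatible @command{awk}; the temporary variable @code{first} is
an illustrative addition, and the apostrophe is written as an octal
escape only because the program is quoted on a shell command line:

```shell
result=$(awk 'BEGIN {
    a = "don\047t"        # \047 is the octal escape for an apostrophe
    first = a             # fetch the old value before reassigning
    a = "panic"
    print (first " " a)
}')
printf '%s\n' "$result"
```

Because each step is a separate statement, the result is
@samp{don't panic} regardless of how a given implementation orders the
operands of a concatenation.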
11724 
11725 The precedence of concatenation, when mixed with other operators, is often
11726 counter-intuitive.  Consider this example:
11727 
11728 @ignore
11729 > To: bug-gnu-utils@@gnu.org
11730 > CC: arnold@@gnu.org
11731 > Subject: gawk 3.0.4 bug with {print -12 " " -24}
11732 > From: Russell Schulz <Russell_Schulz@locutus.ofB.ORG>
11733 > Date: Tue, 8 Feb 2000 19:56:08 -0700
11734 >
11735 > gawk 3.0.4 on NT gives me:
11736 >
11737 > prompt> cat bad.awk
11738 > BEGIN { print -12 " " -24; }
11739 >
11740 > prompt> gawk -f bad.awk
11741 > -12-24
11742 >
11743 > when I would expect
11744 >
11745 > -12 -24
11746 >
11747 > I have not investigated the source, or other implementations.  The
11748 > bug is there on my NT and DOS versions 2.15.6 .
11749 @end ignore
11750 
11751 @example
11752 $ @kbd{awk 'BEGIN @{ print -12 " " -24 @}'}
11753 @print{} -12-24
11754 @end example
11755 
11756 This ``obviously'' is concatenating @minus{}12, a space, and @minus{}24.
11757 But where did the space disappear to?
11758 The answer lies in the combination of operator precedences and
11759 @command{awk}'s automatic conversion rules.  To get the desired result,
11760 write the program this way:
11761 
11762 @example
11763 $ @kbd{awk 'BEGIN @{ print -12 " " (-24) @}'}
11764 @print{}