"Fossies" - the Fresh Open Source Software Archive  

Source code changes of the file "doc/gawk.texi" between
gawk-5.0.1.tar.xz and gawk-5.1.0.tar.xz

About: GNU awk - pattern scanning and processing language.

gawk.texi  (gawk-5.0.1.tar.xz):gawk.texi  (gawk-5.1.0.tar.xz)
skipping to change at page 110, line ? skipping to change at page 110, line ?
@end ifnottex @end ifnottex
@c Let texinfo.tex give us full section titles @c Let texinfo.tex give us full section titles
@xrefautomaticsectiontitle on @xrefautomaticsectiontitle on
@c The following information should be updated here only! @c The following information should be updated here only!
@c This sets the edition of the document, the version of gawk it @c This sets the edition of the document, the version of gawk it
@c applies to and all the info about who's publishing this edition @c applies to and all the info about who's publishing this edition
@c These apply across the board. @c These apply across the board.
@set UPDATE-MONTH June, 2019 @set UPDATE-MONTH March, 2020
@set VERSION 5.0 @set VERSION 5.1
@set PATCHLEVEL 1 @set PATCHLEVEL 0
@set GAWKINETTITLE TCP/IP Internetworking with @command{gawk} @set GAWKINETTITLE TCP/IP Internetworking with @command{gawk}
@ifset FOR_PRINT @ifset FOR_PRINT
@set TITLE Effective awk Programming @set TITLE Effective awk Programming
@end ifset @end ifset
@ifclear FOR_PRINT @ifclear FOR_PRINT
@set TITLE GAWK: Effective AWK Programming @set TITLE GAWK: Effective AWK Programming
@end ifclear @end ifclear
@set SUBTITLE A User's Guide for GNU Awk @set SUBTITLE A User's Guide for GNU Awk
@set EDITION 5.0 @set EDITION 5.1
@iftex @iftex
@set DOCUMENT book @set DOCUMENT book
@set CHAPTER chapter @set CHAPTER chapter
@set APPENDIX appendix @set APPENDIX appendix
@set SECTION section @set SECTION section
@set SUBSECTION subsection @set SUBSECTION subsection
@set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}} @set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}}
@set COMMONEXT (c.e.) @set COMMONEXT (c.e.)
@set PAGE page @set PAGE page
skipping to change at page 110, line ? skipping to change at page 110, line ?
@c If "finalout" is commented out, the printed output will show @c If "finalout" is commented out, the printed output will show
@c black boxes that mark lines that are too long. Thus, it is @c black boxes that mark lines that are too long. Thus, it is
@c unwise to comment it out when running a master in case there are @c unwise to comment it out when running a master in case there are
@c overfulls which are deemed okay. @c overfulls which are deemed okay.
@iftex @iftex
@finalout @finalout
@end iftex @end iftex
@c Enabled '-quotes in PDF files so that cut/paste works in
@c more places.
@codequoteundirected on
@codequotebacktick on
@copying @copying
@docbook @docbook
<para> <para>
&ldquo;To boldly go where no man has gone before&rdquo; is a &ldquo;To boldly go where no man has gone before&rdquo; is a
Registered Trademark of Paramount Pictures Corporation.</para> Registered Trademark of Paramount Pictures Corporation.</para>
<para>Published by:</para> <para>Published by:</para>
<literallayout class="normal">Free Software Foundation <literallayout class="normal">Free Software Foundation
51 Franklin Street, Fifth Floor 51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA Boston, MA 02110-1301 USA
Phone: +1-617-542-5942 Phone: +1-617-542-5942
Fax: +1-617-542-2652 Fax: +1-617-542-2652
Email: <email>gnu@@gnu.org</email> Email: <email>gnu@@gnu.org</email>
URL: <ulink url="https://www.gnu.org">https://www.gnu.org/</ulink></literallayout> URL: <ulink url="https://www.gnu.org">https://www.gnu.org/</ulink></literallayout>
<literallayout class="normal">Copyright &copy; 1989, 1991, 1992, 1993, 1996&ndash;2005, 2007, 2009&ndash;2019 <literallayout class="normal">Copyright &copy; 1989, 1991, 1992, 1993, 1996&ndash;2005, 2007, 2009&ndash;2020
Free Software Foundation, Inc. Free Software Foundation, Inc.
All Rights Reserved.</literallayout> All Rights Reserved.</literallayout>
@end docbook @end docbook
@ifnotdocbook @ifnotdocbook
Copyright @copyright{} 1989, 1991, 1992, 1993, 1996--2005, 2007, 2009--2019 @* Copyright @copyright{} 1989, 1991, 1992, 1993, 1996--2005, 2007, 2009--2020 @*
Free Software Foundation, Inc. Free Software Foundation, Inc.
@end ifnotdocbook @end ifnotdocbook
@sp 2 @sp 2
This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}}, This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}},
for the @value{VERSION}.@value{PATCHLEVEL} (or later) version of the GNU for the @value{VERSION}.@value{PATCHLEVEL} (or later) version of the GNU
implementation of AWK. implementation of AWK.
Permission is granted to copy, distribute and/or modify this document Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or under the terms of the GNU Free Documentation License, Version 1.3 or
skipping to change at page 110, line ? skipping to change at page 110, line ?
line. line.
* Full Line Fields:: Making the full line be a single * Full Line Fields:: Making the full line be a single
field. field.
* Field Splitting Summary:: Some final points and a summary table. * Field Splitting Summary:: Some final points and a summary table.
* Constant Size:: Reading constant width data. * Constant Size:: Reading constant width data.
* Fixed width data:: Processing fixed-width data. * Fixed width data:: Processing fixed-width data.
* Skipping intervening:: Skipping intervening fields. * Skipping intervening:: Skipping intervening fields.
* Allowing trailing data:: Capturing optional trailing data. * Allowing trailing data:: Capturing optional trailing data.
* Fields with fixed data:: Field values with fixed-width data. * Fields with fixed data:: Field values with fixed-width data.
* Splitting By Content:: Defining Fields By Content * Splitting By Content:: Defining Fields By Content
* More CSV:: More on CSV files.
* Testing field creation:: Checking how @command{gawk} is * Testing field creation:: Checking how @command{gawk} is
splitting records. splitting records.
* Multiple Line:: Reading multiline records. * Multiple Line:: Reading multiline records.
* Getline:: Reading files under explicit program * Getline:: Reading files under explicit program
control using the @code{getline} control using the @code{getline}
function. function.
* Plain Getline:: Using @code{getline} with no * Plain Getline:: Using @code{getline} with no
arguments. arguments.
* Getline/Variable:: Using @code{getline} into a variable. * Getline/Variable:: Using @code{getline} into a variable.
* Getline/File:: Using @code{getline} from a file. * Getline/File:: Using @code{getline} from a file.
skipping to change at page 110, line ? skipping to change at page 110, line ?
* Breakpoint Control:: Control of Breakpoints. * Breakpoint Control:: Control of Breakpoints.
* Debugger Execution Control:: Control of Execution. * Debugger Execution Control:: Control of Execution.
* Viewing And Changing Data:: Viewing and Changing Data. * Viewing And Changing Data:: Viewing and Changing Data.
* Execution Stack:: Dealing with the Stack. * Execution Stack:: Dealing with the Stack.
* Debugger Info:: Obtaining Information about the * Debugger Info:: Obtaining Information about the
Program and the Debugger State. Program and the Debugger State.
* Miscellaneous Debugger Commands:: Miscellaneous Commands. * Miscellaneous Debugger Commands:: Miscellaneous Commands.
* Readline Support:: Readline support. * Readline Support:: Readline support.
* Limitations:: Limitations and future plans. * Limitations:: Limitations and future plans.
* Debugging Summary:: Debugging summary. * Debugging Summary:: Debugging summary.
* Global Namespace:: The global namespace in standard @command{awk}. * Global Namespace:: The global namespace in standard
@command{awk}.
* Qualified Names:: How to qualify names with a namespace. * Qualified Names:: How to qualify names with a namespace.
* Default Namespace:: The default namespace. * Default Namespace:: The default namespace.
* Changing The Namespace:: How to change the namespace. * Changing The Namespace:: How to change the namespace.
* Naming Rules:: Namespace and Component Naming Rules. * Naming Rules:: Namespace and Component Naming Rules.
* Internal Name Management:: How names are stored internally. * Internal Name Management:: How names are stored internally.
* Namespace Example:: An example of code using a namespace. * Namespace Example:: An example of code using a namespace.
* Namespace And Features:: Namespaces and other @command{gawk} features. * Namespace And Features:: Namespaces and other @command{gawk}
features.
* Namespace Summary:: Summarizing namespaces. * Namespace Summary:: Summarizing namespaces.
* Computer Arithmetic:: A quick intro to computer math. * Computer Arithmetic:: A quick intro to computer math.
* Math Definitions:: Defining terms used. * Math Definitions:: Defining terms used.
* MPFR features:: The MPFR features in @command{gawk}. * MPFR features:: The MPFR features in @command{gawk}.
* FP Math Caution:: Things to know. * FP Math Caution:: Things to know.
* Inexactness of computations:: Floating point math is not exact. * Inexactness of computations:: Floating point math is not exact.
* Inexact representation:: Numbers are not exactly represented. * Inexact representation:: Numbers are not exactly represented.
* Comparing FP Values:: How to compare floating point values. * Comparing FP Values:: How to compare floating point values.
* Errors accumulate:: Errors get bigger as they go. * Errors accumulate:: Errors get bigger as they go.
* Getting Accuracy:: Getting more accuracy takes some work. * Getting Accuracy:: Getting more accuracy takes some work.
skipping to change at page 110, line ? skipping to change at page 110, line ?
<author> <author>
<firstname>Arnold</firstname> <firstname>Arnold</firstname>
<surname>Robbins</surname> <surname>Robbins</surname>
<affiliation><jobtitle>Nof Ayalon</jobtitle></affiliation> <affiliation><jobtitle>Nof Ayalon</jobtitle></affiliation>
<affiliation><jobtitle>Israel</jobtitle></affiliation> <affiliation><jobtitle>Israel</jobtitle></affiliation>
</author> </author>
<date>February 2015</date> <date>February 2015</date>
</prefaceinfo> </prefaceinfo>
@end docbook @end docbook
@cindex @command{awk}
Several kinds of tasks occur repeatedly when working with text files. Several kinds of tasks occur repeatedly when working with text files.
You might want to extract certain lines and discard the rest. Or you You might want to extract certain lines and discard the rest. Or you
may need to make changes wherever certain patterns appear, but leave the may need to make changes wherever certain patterns appear, but leave the
rest of the file alone. Such jobs are often easy with @command{awk}. rest of the file alone. Such jobs are often easy with @command{awk}.
The @command{awk} utility interprets a special-purpose programming The @command{awk} utility interprets a special-purpose programming
language that makes it easy to handle simple data-reformatting jobs. language that makes it easy to handle simple data-reformatting jobs.
@cindex @command{gawk}
The GNU implementation of @command{awk} is called @command{gawk}; if you The GNU implementation of @command{awk} is called @command{gawk}; if you
invoke it with the proper options or environment variables, invoke it with the proper options or environment variables,
it is fully compatible with it is fully compatible with
the POSIX@footnote{The 2008 POSIX standard is accessible online at the POSIX@footnote{The 2018 POSIX standard is accessible online at
@w{@url{http://www.opengroup.org/onlinepubs/9699919799/}.}} @w{@url{https://pubs.opengroup.org/onlinepubs/9699919799/}.}}
specification of the @command{awk} language specification of the @command{awk} language
and with the Unix version of @command{awk} maintained and with the Unix version of @command{awk} maintained
by Brian Kernighan. by Brian Kernighan.
This means that all This means that all
properly written @command{awk} programs should work with @command{gawk}. properly written @command{awk} programs should work with @command{gawk}.
So most of the time, we don't distinguish between @command{gawk} and other So most of the time, we don't distinguish between @command{gawk} and other
@command{awk} implementations. @command{awk} implementations.
@cindex @command{awk}, POSIX and, See Also POSIX @command{awk} @cindex @command{awk} @subentry POSIX and @seealso{POSIX @command{awk}}
@cindex @command{awk}, POSIX and @cindex @command{awk} @subentry POSIX and
@cindex POSIX, @command{awk} and @cindex POSIX @subentry @command{awk} and
@cindex @command{gawk}, @command{awk} and @cindex @command{gawk} @subentry @command{awk} and
@cindex @command{awk}, @command{gawk} and @cindex @command{awk} @subentry @command{gawk} and
@cindex @command{awk}, uses for @cindex @command{awk} @subentry uses for
Using @command{awk} you can: Using @command{awk} you can:
@itemize @value{BULLET} @itemize @value{BULLET}
@item @item
Manage small, personal databases Manage small, personal databases
@item @item
Generate reports Generate reports
@item @item
Validate data Validate data
@item @item
Produce indexes and perform other document-preparation tasks Produce indexes and perform other document-preparation tasks
@item @item
Experiment with algorithms that you can adapt later to other computer Experiment with algorithms that you can adapt later to other computer
languages languages
@end itemize @end itemize
@cindex @command{awk}, See Also @command{gawk} @cindex @command{awk} @seealso{@command{gawk}}
@cindex @command{gawk}, See Also @command{awk} @cindex @command{gawk} @seealso{@command{awk}}
@cindex @command{gawk}, uses for @cindex @command{gawk} @subentry uses for
In addition, In addition,
@command{gawk} @command{gawk}
provides facilities that make it easy to: provides facilities that make it easy to:
@itemize @value{BULLET} @itemize @value{BULLET}
@item @item
Extract bits and pieces of data for processing Extract bits and pieces of data for processing
@item @item
Sort data Sort data
skipping to change at page 110, line ? skipping to change at page 110, line ?
@end itemize @end itemize
This @value{DOCUMENT} teaches you about the @command{awk} language and This @value{DOCUMENT} teaches you about the @command{awk} language and
how you can use it effectively. You should already be familiar with basic how you can use it effectively. You should already be familiar with basic
system commands, such as @command{cat} and @command{ls},@footnote{These utilities system commands, such as @command{cat} and @command{ls},@footnote{These utilities
are available on POSIX-compliant systems, as well as on traditional are available on POSIX-compliant systems, as well as on traditional
Unix-based systems. If you are using some other operating system, you still need to Unix-based systems. If you are using some other operating system, you still need to
be familiar with the ideas of I/O redirection and pipes.} as well as basic shell be familiar with the ideas of I/O redirection and pipes.} as well as basic shell
facilities, such as input/output (I/O) redirection and pipes. facilities, such as input/output (I/O) redirection and pipes.
@cindex GNU @command{awk}, See @command{gawk} @cindex GNU @command{awk} @seeentry{@command{gawk}}
Implementations of the @command{awk} language are available for many Implementations of the @command{awk} language are available for many
different computing environments. This @value{DOCUMENT}, while describing different computing environments. This @value{DOCUMENT}, while describing
the @command{awk} language in general, also describes the particular the @command{awk} language in general, also describes the particular
implementation of @command{awk} called @command{gawk} (which stands for implementation of @command{awk} called @command{gawk} (which stands for
``GNU @command{awk}''). @command{gawk} runs on a broad range of Unix systems, ``GNU @command{awk}''). @command{gawk} runs on a broad range of Unix systems,
ranging from Intel-architecture PC-based computers ranging from Intel-architecture PC-based computers
up through large-scale systems. up through large-scale systems.
@command{gawk} has also been ported to Mac OS X, @command{gawk} has also been ported to Mac OS X,
Microsoft Windows Microsoft Windows
(all versions), (all versions),
skipping to change at page 110, line ? skipping to change at page 110, line ?
* Manual History:: Brief history of the GNU project and this * Manual History:: Brief history of the GNU project and this
@value{DOCUMENT}. @value{DOCUMENT}.
* How To Contribute:: Helping to save the world. * How To Contribute:: Helping to save the world.
* Acknowledgments:: Acknowledgments. * Acknowledgments:: Acknowledgments.
@end menu @end menu
@node History @node History
@unnumberedsec History of @command{awk} and @command{gawk} @unnumberedsec History of @command{awk} and @command{gawk}
@cindex recipe for a programming language @cindex recipe for a programming language
@cindex programming language, recipe for @cindex programming language, recipe for
@cindex sidebar, Recipe for a Programming Language @cindex sidebar @subentry Recipe for a Programming Language
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>Recipe for a Programming Language</title> <sidebar><title>Recipe for a Programming Language</title>
@end docbook @end docbook
@multitable {2 parts} {1 part @code{egrep}} {1 part @code{snobol}} @multitable {2 parts} {1 part @code{egrep}} {1 part @code{snobol}}
@item @tab 1 part @code{egrep} @tab 1 part @code{snobol} @item @tab 1 part @code{egrep} @tab 1 part @code{snobol}
@item @tab 2 parts @code{ed} @tab 3 parts C @item @tab 2 parts @code{ed} @tab 3 parts C
@end multitable @end multitable
skipping to change at page 110, line ? skipping to change at page 110, line ?
Document minimally and release. Document minimally and release.
After eight years, add another part @code{egrep} and two After eight years, add another part @code{egrep} and two
more parts C. Document very well and release. more parts C. Document very well and release.
@end cartouche @end cartouche
@end ifnotdocbook @end ifnotdocbook
@cindex Aho, Alfred @cindex Aho, Alfred
@cindex Weinberger, Peter @cindex Weinberger, Peter
@cindex Kernighan, Brian @cindex Kernighan, Brian
@cindex @command{awk}, history of @cindex @command{awk} @subentry history of
The name @command{awk} comes from the initials of its designers: Alfred V.@: The name @command{awk} comes from the initials of its designers: Alfred V.@:
Aho, Peter J.@: Weinberger, and Brian W.@: Kernighan. The original version of Aho, Peter J.@: Weinberger, and Brian W.@: Kernighan. The original version of
@command{awk} was written in 1977 at AT&T Bell Laboratories. @command{awk} was written in 1977 at AT&T Bell Laboratories.
In 1985, a new version made the programming In 1985, a new version made the programming
language more powerful, introducing user-defined functions, multiple input language more powerful, introducing user-defined functions, multiple input
streams, and computed regular expressions. streams, and computed regular expressions.
This new version became widely available with Unix System V This new version became widely available with Unix System V
Release 3.1 (1987). Release 3.1 (1987).
The version in System V Release 4 (1989) added some new features and cleaned The version in System V Release 4 (1989) added some new features and cleaned
up the behavior in some of the ``dark corners'' of the language. up the behavior in some of the ``dark corners'' of the language.
skipping to change at page 110, line ? skipping to change at page 110, line ?
John Haque rewrote the @command{gawk} internals, in the process providing John Haque rewrote the @command{gawk} internals, in the process providing
an @command{awk}-level debugger. This version became available as an @command{awk}-level debugger. This version became available as
@command{gawk} @value{PVERSION} 4.0 in 2011. @command{gawk} @value{PVERSION} 4.0 in 2011.
@xref{Contributors} @xref{Contributors}
for a full list of those who have made important contributions to @command{gawk}. for a full list of those who have made important contributions to @command{gawk}.
@node Names @node Names
@unnumberedsec A Rose by Any Other Name @unnumberedsec A Rose by Any Other Name
@cindex @command{awk}, new vs.@: old @cindex @command{awk} @subentry new vs.@: old
The @command{awk} language has evolved over the years. Full details are The @command{awk} language has evolved over the years. Full details are
provided in @ref{Language History}. provided in @ref{Language History}.
The language described in this @value{DOCUMENT} The language described in this @value{DOCUMENT}
is often referred to as ``new @command{awk}.'' is often referred to as ``new @command{awk}.''
By analogy, the original version of @command{awk} is By analogy, the original version of @command{awk} is
referred to as ``old @command{awk}.'' referred to as ``old @command{awk}.''
On most current systems, when you run the @command{awk} utility On most current systems, when you run the @command{awk} utility
you get some version of new @command{awk}.@footnote{Only you get some version of new @command{awk}.@footnote{Only
Solaris systems still use an old @command{awk} for the Solaris systems still use an old @command{awk} for the
default @command{awk} utility. A more modern @command{awk} lives in default @command{awk} utility. A more modern @command{awk} lives in
@file{/usr/xpg6/bin} on these systems.} If your system's standard @file{/usr/xpg6/bin} on these systems.} If your system's standard
@command{awk} is the old one, you will see something like this @command{awk} is the old one, you will see something like this
if you try the test program: if you try the following test program:
@example @example
@group @group
$ @kbd{awk 1 /dev/null} $ @kbd{awk 1 /dev/null}
@error{} awk: syntax error near line 1 @error{} awk: syntax error near line 1
@error{} awk: bailing out near line 1 @error{} awk: bailing out near line 1
@end group @end group
@end example @end example
@noindent @noindent
In this case, you should find a version of new @command{awk}, In this case, you should find a version of new @command{awk},
or just install @command{gawk}! or just install @command{gawk}!
Throughout this @value{DOCUMENT}, whenever we refer to a language feature Throughout this @value{DOCUMENT}, whenever we refer to a language feature
that should be available in any complete implementation of POSIX @command{awk}, that should be available in any complete implementation of POSIX @command{awk},
we simply use the term @command{awk}. When referring to a feature that is we simply use the term @command{awk}. When referring to a feature that is
specific to the GNU implementation, we use the term @command{gawk}. specific to the GNU implementation, we use the term @command{gawk}.
@node This Manual @node This Manual
@unnumberedsec Using This Book @unnumberedsec Using This Book
@cindex @command{awk}, terms describing @cindex @command{awk} @subentry terms describing
The term @command{awk} refers to a particular program as well as to the language you The term @command{awk} refers to a particular program as well as to the language you
use to tell this program what to do. When we need to be careful, we call use to tell this program what to do. When we need to be careful, we call
the language ``the @command{awk} language,'' the language ``the @command{awk} language,''
and the program ``the @command{awk} utility.'' and the program ``the @command{awk} utility.''
This @value{DOCUMENT} explains This @value{DOCUMENT} explains
both how to write programs in the @command{awk} language and how to both how to write programs in the @command{awk} language and how to
run the @command{awk} utility. run the @command{awk} utility.
The term ``@command{awk} program'' refers to a program written by you in The term ``@command{awk} program'' refers to a program written by you in
the @command{awk} programming language. the @command{awk} programming language.
@cindex @command{gawk}, @command{awk} and @cindex @command{gawk} @subentry @command{awk} and
@cindex @command{awk}, @command{gawk} and @cindex @command{awk} @subentry @command{gawk} and
@cindex POSIX @command{awk} @cindex POSIX @command{awk}
Primarily, this @value{DOCUMENT} explains the features of @command{awk} Primarily, this @value{DOCUMENT} explains the features of @command{awk}
as defined in the POSIX standard. It does so in the context of the as defined in the POSIX standard. It does so in the context of the
@command{gawk} implementation. While doing so, it also @command{gawk} implementation. While doing so, it also
attempts to describe important differences between @command{gawk} attempts to describe important differences between @command{gawk}
and other @command{awk} and other @command{awk}
@ifclear FOR_PRINT @ifclear FOR_PRINT
implementations.@footnote{All such differences implementations.@footnote{All such differences
appear in the index under the appear in the index under the
entry ``differences in @command{awk} and @command{gawk}.''} entry ``differences in @command{awk} and @command{gawk}.''}
skipping to change at page 110, line ? skipping to change at page 110, line ?
versions of the documentation. versions of the documentation.
@ifnotinfo @ifnotinfo
Because of this, the typographical conventions Because of this, the typographical conventions
are slightly different than in other books you may have read. are slightly different than in other books you may have read.
@end ifnotinfo @end ifnotinfo
@ifinfo @ifinfo
This @value{SECTION} briefly documents the typographical conventions used in Texinfo. This @value{SECTION} briefly documents the typographical conventions used in Texinfo.
@end ifinfo @end ifinfo
Examples you would type at the command line are preceded by the common Examples you would type at the command line are preceded by the common
shell primary and secondary prompts, @samp{$} and @samp{>}. shell primary and secondary prompts, @samp{$} and @samp{>}, respectively.
Input that you type is shown @kbd{like this}. Input that you type is shown @kbd{like this}.
@c 8/2014: @print{} is stripped from the texi to make docbook. @c 8/2014: @print{} is stripped from the texi to make docbook.
@ifclear FOR_PRINT @ifclear FOR_PRINT
Output from the command is preceded by the glyph ``@print{}''. Output from the command is preceded by the glyph ``@print{}''.
This typically represents the command's standard output. This typically represents the command's standard output.
@end ifclear @end ifclear
@ifset FOR_PRINT @ifset FOR_PRINT
Output from the command, usually its standard output, appears Output from the command, usually its standard output, appears
@code{like this}. @code{like this}.
@end ifset @end ifset
skipping to change at page 110, line ? skipping to change at page 110, line ?
@c fakenode --- for prepinfo @c fakenode --- for prepinfo
@unnumberedsubsec Dark Corners @unnumberedsubsec Dark Corners
@cindex Kernighan, Brian @cindex Kernighan, Brian
@quotation @quotation
@i{Dark corners are basically fractal---no matter how much @i{Dark corners are basically fractal---no matter how much
you illuminate, there's always a smaller but darker one.} you illuminate, there's always a smaller but darker one.}
@author Brian Kernighan @author Brian Kernighan
@end quotation @end quotation
@cindex d.c., See dark corner @cindex d.c. @seeentry{dark corner}
@cindex dark corner @cindex dark corner
Until the POSIX standard (and @cite{@value{TITLE}}), Until the POSIX standard (and @cite{@value{TITLE}}),
many features of @command{awk} were either poorly documented or not many features of @command{awk} were either poorly documented or not
documented at all. Descriptions of such features documented at all. Descriptions of such features
(often called ``dark corners'') are noted in this @value{DOCUMENT} with (often called ``dark corners'') are noted in this @value{DOCUMENT} with
@iftex @iftex
the picture of a flashlight in the margin, as shown here. the picture of a flashlight in the margin, as shown here.
@value{DARKCORNER} @value{DARKCORNER}
@end iftex @end iftex
@ifnottex @ifnottex
``(d.c.).'' ``(d.c.).''
@end ifnottex @end ifnottex
@ifclear FOR_PRINT @ifclear FOR_PRINT
They also appear in the index under the heading ``dark corner.'' They also appear in the index under the heading ``dark corner.''
@end ifclear @end ifclear
But, as noted by the opening quote, any coverage of dark But, as noted by the opening quote, any coverage of dark
corners is by definition incomplete. corners is by definition incomplete.
@cindex c.e., See common extensions @cindex c.e. @seeentry{common extensions}
Extensions to the standard @command{awk} language that are supported by Extensions to the standard @command{awk} language that are supported by
more than one @command{awk} implementation are marked more than one @command{awk} implementation are marked
@ifclear FOR_PRINT @ifclear FOR_PRINT
``@value{COMMONEXT},'' and listed in the index under ``common extensions'' ``@value{COMMONEXT},'' and listed in the index under ``common extensions''
and ``extensions, common.'' and ``extensions, common.''
@end ifclear @end ifclear
@ifset FOR_PRINT @ifset FOR_PRINT
``@value{COMMONEXT}'' for ``common extension.'' ``@value{COMMONEXT}'' for ``common extension.''
@end ifset @end ifset
skipping to change at page 110, line ? skipping to change at page 110, line ?
@cindex FSF (Free Software Foundation) @cindex FSF (Free Software Foundation)
@cindex Free Software Foundation (FSF) @cindex Free Software Foundation (FSF)
@cindex Stallman, Richard @cindex Stallman, Richard
The Free Software Foundation (FSF) is a nonprofit organization dedicated The Free Software Foundation (FSF) is a nonprofit organization dedicated
to the production and distribution of freely distributable software. to the production and distribution of freely distributable software.
It was founded by Richard M.@: Stallman, the author of the original It was founded by Richard M.@: Stallman, the author of the original
Emacs editor. GNU Emacs is the most widely used version of Emacs today. Emacs editor. GNU Emacs is the most widely used version of Emacs today.
@cindex GNU Project @cindex GNU Project
@cindex GPL (General Public License) @cindex GPL (General Public License)
@cindex General Public License, See GPL @cindex GNU General Public License @seeentry{GPL}
@cindex documentation, online @cindex General Public License @seeentry{GPL}
@cindex documentation @subentry online
The GNU@footnote{GNU stands for ``GNU's Not Unix.''} The GNU@footnote{GNU stands for ``GNU's Not Unix.''}
Project is an ongoing effort on the part of the Free Software Project is an ongoing effort on the part of the Free Software
Foundation to create a complete, freely distributable, POSIX-compliant Foundation to create a complete, freely distributable, POSIX-compliant
computing environment. computing environment.
The FSF uses the GNU General Public License (GPL) to ensure that The FSF uses the GNU General Public License (GPL) to ensure that
its software's its software's
source code is always available to the end user. source code is always available to the end user.
@ifclear FOR_PRINT @ifclear FOR_PRINT
A copy of the GPL is included A copy of the GPL is included
@ifnotinfo @ifnotinfo
skipping to change at page 110, line ? skipping to change at page 110, line ?
@uref{https://www.gnu.org/software/gawk/manual/, GNU's website}. @uref{https://www.gnu.org/software/gawk/manual/, GNU's website}.
@ifclear FOR_PRINT @ifclear FOR_PRINT
A shell, an editor (Emacs), highly portable optimizing C, C++, and A shell, an editor (Emacs), highly portable optimizing C, C++, and
Objective-C compilers, a symbolic debugger and dozens of large and Objective-C compilers, a symbolic debugger and dozens of large and
small utilities (such as @command{gawk}), have all been completed and are small utilities (such as @command{gawk}), have all been completed and are
freely available. The GNU operating freely available. The GNU operating
system kernel (the HURD), has been released but remains in an early system kernel (the HURD), has been released but remains in an early
stage of development. stage of development.
@cindex Linux @cindex Linux @seeentry{GNU/Linux}
@cindex GNU/Linux @cindex GNU/Linux
@cindex operating systems, BSD-based @cindex operating systems @subentry BSD-based
Until the GNU operating system is more fully developed, you should Until the GNU operating system is more fully developed, you should
consider using GNU/Linux, a freely distributable, Unix-like operating consider using GNU/Linux, a freely distributable, Unix-like operating
system for Intel, system for Intel,
Power Architecture, Power Architecture,
Sun SPARC, IBM S/390, and other Sun SPARC, IBM S/390, and other
systems.@footnote{The terminology ``GNU/Linux'' is explained systems.@footnote{The terminology ``GNU/Linux'' is explained
in the @ref{Glossary}.} in the @ref{Glossary}.}
Many GNU/Linux distributions are Many GNU/Linux distributions are
available for download from the Internet. available for download from the Internet.
@end ifclear @end ifclear
skipping to change at page 110, line ? skipping to change at page 110, line ?
the third edition in 2001. the third edition in 2001.
@end ifset @end ifset
This edition maintains the basic structure of the previous editions. This edition maintains the basic structure of the previous editions.
For FSF edition 4.0, the content was thoroughly reviewed and updated. All For FSF edition 4.0, the content was thoroughly reviewed and updated. All
references to @command{gawk} versions prior to 4.0 were removed. references to @command{gawk} versions prior to 4.0 were removed.
Of significant note for that edition was the addition of @ref{Debugger}. Of significant note for that edition was the addition of @ref{Debugger}.
For FSF edition For FSF edition
@ifclear FOR_PRINT @ifclear FOR_PRINT
@value{EDITION}, 5.0,
@end ifclear @end ifclear
@ifset FOR_PRINT @ifset FOR_PRINT
@value{EDITION} @value{EDITION}
(the fourth edition as published by O'Reilly), (the fourth edition as published by O'Reilly),
@end ifset @end ifset
the content has been reorganized into parts, the content has been reorganized into parts,
and the major new additions are @ref{Arbitrary Precision Arithmetic}, and the major new additions are @ref{Arbitrary Precision Arithmetic},
and @ref{Dynamic Extensions}. and @ref{Dynamic Extensions}.
This @value{DOCUMENT} will undoubtedly continue to evolve. If you This @value{DOCUMENT} will undoubtedly continue to evolve. If you
skipping to change at page 110, line ? skipping to change at page 110, line ?
@cindex Berry, Karl @cindex Berry, Karl
@cindex Chassell, Robert J.@: @cindex Chassell, Robert J.@:
@c @cindex Texinfo @c @cindex Texinfo
Robert J.@: Chassell provided much valuable advice on Robert J.@: Chassell provided much valuable advice on
the use of Texinfo. the use of Texinfo.
He also deserves special thanks for He also deserves special thanks for
convincing me @emph{not} to title this @value{DOCUMENT} convincing me @emph{not} to title this @value{DOCUMENT}
@cite{How to Gawk Politely}. @cite{How to Gawk Politely}.
Karl Berry helped significantly with the @TeX{} part of Texinfo. Karl Berry helped significantly with the @TeX{} part of Texinfo.
@cindex Hartholz, Marshall @cindex Hartholz @subentry Marshall
@cindex Hartholz, Elaine @cindex Hartholz @subentry Elaine
@cindex Schreiber, Bert @cindex Schreiber @subentry Bert
@cindex Schreiber, Rita @cindex Schreiber @subentry Rita
I would like to thank Marshall and Elaine Hartholz of Seattle and I would like to thank Marshall and Elaine Hartholz of Seattle and
Dr.@: Bert and Rita Schreiber of Detroit for large amounts of quiet vacation Dr.@: Bert and Rita Schreiber of Detroit for large amounts of quiet vacation
time in their homes, which allowed me to make significant progress on time in their homes, which allowed me to make significant progress on
this @value{DOCUMENT} and on @command{gawk} itself. this @value{DOCUMENT} and on @command{gawk} itself.
@cindex Hughes, Phil @cindex Hughes, Phil
Phil Hughes of SSC Phil Hughes of SSC
contributed in a very important way by loaning me his laptop GNU/Linux contributed in a very important way by loaning me his laptop GNU/Linux
system, not once, but twice, which allowed me to do a lot of work while system, not once, but twice, which allowed me to do a lot of work while
away from home. away from home.
skipping to change at page 110, line ? skipping to change at page 110, line ?
@cindex Davies, Stephen @cindex Davies, Stephen
@cindex Deifik, Scott @cindex Deifik, Scott
@cindex Demaille, Akim @cindex Demaille, Akim
@cindex G., Daniel Richard @cindex G., Daniel Richard
@cindex Guerrero, Juan Manuel @cindex Guerrero, Juan Manuel
@cindex Hankerson, Darrel @cindex Hankerson, Darrel
@cindex Jaegermann, Michal @cindex Jaegermann, Michal
@cindex Kahrs, J@"urgen @cindex Kahrs, J@"urgen
@cindex Kasal, Stepan @cindex Kasal, Stepan
@cindex Malmberg, John @cindex Malmberg, John
@cindex Pitts, Dave
@cindex Ramey, Chet @cindex Ramey, Chet
@cindex Rankin, Pat @cindex Rankin, Pat
@cindex Schorr, Andrew @cindex Schorr, Andrew
@cindex Vinschen, Corinna @cindex Vinschen, Corinna
@cindex Zaretskii, Eli @cindex Zaretskii, Eli
Dr.@: Nelson Beebe, Dr.@: Nelson Beebe,
Andreas Buening, Andreas Buening,
Dr.@: Manuel Collado, Dr.@: Manuel Collado,
Antonio Colombo, Antonio Colombo,
Stephen Davies, Stephen Davies,
Scott Deifik, Scott Deifik,
Akim Demaille, Akim Demaille,
Daniel Richard G., Daniel Richard G.,
Juan Manuel Guerrero, Juan Manuel Guerrero,
Darrel Hankerson, Darrel Hankerson,
Michal Jaegermann, Michal Jaegermann,
J@"urgen Kahrs, J@"urgen Kahrs,
Stepan Kasal, Stepan Kasal,
John Malmberg, John Malmberg,
Dave Pitts,
Chet Ramey, Chet Ramey,
Pat Rankin, Pat Rankin,
Andrew Schorr, Andrew Schorr,
Corinna Vinschen, Corinna Vinschen,
and Eli Zaretskii and Eli Zaretskii
(in alphabetical order) (in alphabetical order)
make up the current @command{gawk} ``crack portability team.'' Without make up the current @command{gawk} ``crack portability team.'' Without
their hard work and help, @command{gawk} would not be nearly the robust, their hard work and help, @command{gawk} would not be nearly the robust,
portable program it is today. It has been and continues to be a pleasure portable program it is today. It has been and continues to be a pleasure
working with this team of fine people. working with this team of fine people.
skipping to change at page 110, line ? skipping to change at page 110, line ?
@cindex Oram, Andy @cindex Oram, Andy
Thanks to Andy Oram of O'Reilly Media for initiating Thanks to Andy Oram of O'Reilly Media for initiating
the fourth edition and for his support during the work. the fourth edition and for his support during the work.
Thanks to Jasmine Kwityn for her copyediting work. Thanks to Jasmine Kwityn for her copyediting work.
@end ifset @end ifset
Thanks to Michael Brennan for the Forewords. Thanks to Michael Brennan for the Forewords.
@cindex Duman, Patrice @cindex Duman, Patrice
@cindex Berry, Karl @cindex Berry, Karl
@cindex Smith, Gavin
Thanks to Patrice Dumas for the new @command{makeinfo} program. Thanks to Patrice Dumas for the new @command{makeinfo} program.
Thanks to Karl Berry, who continues to work to keep Thanks to Karl Berry for his past work on Texinfo, and
the Texinfo markup language sane. to Gavin Smith, who continues to work to improve
the Texinfo markup language.
@cindex Kernighan, Brian @cindex Kernighan, Brian
@cindex Brennan, Michael @cindex Brennan, Michael
@cindex Day, Robert P.J.@: @cindex Day, Robert P.J.@:
Robert P.J.@: Day, Michael Brennan, and Brian Kernighan kindly acted as Robert P.J.@: Day, Michael Brennan, and Brian Kernighan kindly acted as
reviewers for the 2015 edition of this @value{DOCUMENT}. Their feedback reviewers for the 2015 edition of this @value{DOCUMENT}. Their feedback
helped improve the final work. helped improve the final work.
I would also like to thank Brian Kernighan for his invaluable assistance during the I would also like to thank Brian Kernighan for his invaluable assistance during the
testing and debugging of @command{gawk}, and for his ongoing testing and debugging of @command{gawk}, and for his ongoing
help and advice in clarifying numerous points about the language. help and advice in clarifying numerous points about the language.
We could not have done nearly as good a job on either @command{gawk} We could not have done nearly as good a job on either @command{gawk}
or its documentation without his help. or its documentation without his help.
Brian is in a class by himself as a programmer and technical Brian is in a class by himself as a programmer and technical
author. I have to thank him (yet again) for his ongoing friendship author. I have to thank him (yet again) for his ongoing friendship
and for being a role model to me for close to 30 years! and for being a role model to me for over 30 years!
Having him as a reviewer is an exciting privilege. It has also Having him as a reviewer is an exciting privilege. It has also
been extremely humbling@enddots{} been extremely humbling@enddots{}
@cindex Robbins, Miriam @cindex Robbins @subentry Miriam
@cindex Robbins, Jean @cindex Robbins @subentry Jean
@cindex Robbins, Harry @cindex Robbins @subentry Harry
@cindex G-d @cindex G-d
I must thank my wonderful wife, Miriam, for her patience through I must thank my wonderful wife, Miriam, for her patience through
the many versions of this project, for her proofreading, the many versions of this project, for her proofreading,
and for sharing me with the computer. and for sharing me with the computer.
I would like to thank my parents for their love, and for the grace with I would like to thank my parents for their love, and for the grace with
which they raised and educated me. which they raised and educated me.
Finally, I also must acknowledge my gratitude to G-d, for the many opportunities Finally, I also must acknowledge my gratitude to G-d, for the many opportunities
He has sent my way, as well as for the gifts He has given me with which to He has sent my way, as well as for the gifts He has given me with which to
take advantage of those opportunities. take advantage of those opportunities.
@ifnotdocbook @ifnotdocbook
@sp 2 @sp 2
@noindent @noindent
Arnold Robbins @* Arnold Robbins @*
Nof Ayalon @* Nof Ayalon @*
Israel @* Israel @*
February 2015 March, 2020
@end ifnotdocbook @end ifnotdocbook
@ifnotinfo @ifnotinfo
@part @value{PART1}The @command{awk} Language @part @value{PART1}The @command{awk} Language
@end ifnotinfo @end ifnotinfo
@ifdocbook @ifdocbook
Part I describes the @command{awk} language and @command{gawk} program Part I describes the @command{awk} language and @command{gawk} program
in detail. It starts with the basics, and continues through all of in detail. It starts with the basics, and continues through all of
skipping to change at page 110, line ? skipping to change at page 110, line ?
@ref{Functions} @ref{Functions}
@end itemize @end itemize
@end ifdocbook @end ifdocbook
@node Getting Started @node Getting Started
@chapter Getting Started with @command{awk} @chapter Getting Started with @command{awk}
@c @cindex script, definition of @c @cindex script, definition of
@c @cindex rule, definition of @c @cindex rule, definition of
@c @cindex program, definition of @c @cindex program, definition of
@c @cindex basic function of @command{awk} @c @cindex basic function of @command{awk}
@cindex @command{awk}, function of @cindex @command{awk} @subentry function of
The basic function of @command{awk} is to search files for lines (or other The basic function of @command{awk} is to search files for lines (or other
units of text) that contain certain patterns. When a line matches one units of text) that contain certain patterns. When a line matches one
of the patterns, @command{awk} performs specified actions on that line. of the patterns, @command{awk} performs specified actions on that line.
@command{awk} continues to process input lines in this way until it reaches @command{awk} continues to process input lines in this way until it reaches
the end of the input files. the end of the input files.
@cindex @command{awk}, uses for @cindex @command{awk} @subentry uses for
@cindex programming languages@comma{} data-driven vs.@: procedural @cindex programming languages @subentry data-driven vs.@: procedural
@cindex @command{awk} programs @cindex @command{awk} programs
Programs in @command{awk} are different from programs in most other languages, Programs in @command{awk} are different from programs in most other languages,
because @command{awk} programs are @dfn{data driven} (i.e., you describe because @command{awk} programs are @dfn{data driven} (i.e., you describe
the data you want to work with and then what to do when you find it). the data you want to work with and then what to do when you find it).
Most other languages are @dfn{procedural}; you have to describe, in great Most other languages are @dfn{procedural}; you have to describe, in great
detail, every step the program should take. When working with procedural detail, every step the program should take. When working with procedural
languages, it is usually much languages, it is usually much
harder to clearly describe the data your program will process. harder to clearly describe the data your program will process.
For this reason, @command{awk} programs are often refreshingly easy to For this reason, @command{awk} programs are often refreshingly easy to
read and write. read and write.
skipping to change at page 110, line ? skipping to change at page 110, line ?
lines. lines.
* Other Features:: Other Features of @command{awk}. * Other Features:: Other Features of @command{awk}.
* When:: When to use @command{gawk} and when to use * When:: When to use @command{gawk} and when to use
other things. other things.
* Intro Summary:: Summary of the introduction. * Intro Summary:: Summary of the introduction.
@end menu @end menu
@node Running gawk @node Running gawk
@section How to Run @command{awk} Programs @section How to Run @command{awk} Programs
@cindex @command{awk} programs, running @cindex @command{awk} programs @subentry running
There are several ways to run an @command{awk} program. If the program is There are several ways to run an @command{awk} program. If the program is
short, it is easiest to include it in the command that runs @command{awk}, short, it is easiest to include it in the command that runs @command{awk},
like this: like this:
@example @example
awk '@var{program}' @var{input-file1} @var{input-file2} @dots{} awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
@end example @end example
@cindex command line, formats @cindex command line @subentry formats
When the program is long, it is usually more convenient to put it in a file When the program is long, it is usually more convenient to put it in a file
and run it with a command like this: and run it with a command like this:
@example @example
awk -f @var{program-file} @var{input-file1} @var{input-file2} @dots{} awk -f @var{program-file} @var{input-file1} @var{input-file2} @dots{}
@end example @end example
This @value{SECTION} discusses both mechanisms, along with several This @value{SECTION} discusses both mechanisms, along with several
variations of each. variations of each.
skipping to change at page 110, line ? skipping to change at page 110, line ?
@cindex single quote (@code{'}) @cindex single quote (@code{'})
@cindex @code{'} (single quote) @cindex @code{'} (single quote)
This command format instructs the @dfn{shell}, or command interpreter, This command format instructs the @dfn{shell}, or command interpreter,
to start @command{awk} and use the @var{program} to process records in the to start @command{awk} and use the @var{program} to process records in the
input file(s). There are single quotes around @var{program} so input file(s). There are single quotes around @var{program} so
the shell won't interpret any @command{awk} characters as special shell the shell won't interpret any @command{awk} characters as special shell
characters. The quotes also cause the shell to treat all of @var{program} as characters. The quotes also cause the shell to treat all of @var{program} as
a single argument for @command{awk}, and allow @var{program} to be more a single argument for @command{awk}, and allow @var{program} to be more
than one line long. than one line long.
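As a minimal sketch of the quoting behavior just described (the sample input and the variable name @code{result} are made up for illustration), the single quotes below keep the shell from expanding @samp{$2} and let the program span several lines:

```shell
# The single quotes pass $2 to awk untouched and let the program span lines.
result=$(printf 'one two\nthree four\n' | awk '
    { print $2 }
    END { print NR " records" }')
echo "$result"
```

Without the quotes, the shell would try to substitute its own second positional parameter for @samp{$2} before @command{awk} ever saw the program.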
@cindex shells, scripts @cindex shells @subentry scripts
@cindex @command{awk} programs, running, from shell scripts @cindex @command{awk} programs @subentry running @subentry from shell scripts
This format is also useful for running short or medium-sized @command{awk} This format is also useful for running short or medium-sized @command{awk}
programs from shell scripts, because it avoids the need for a separate programs from shell scripts, because it avoids the need for a separate
file for the @command{awk} program. A self-contained shell script is more file for the @command{awk} program. A self-contained shell script is more
reliable because there are no other files to misplace. reliable because there are no other files to misplace.
Later in this chapter, in Later in this chapter, in
@ifdocbook @ifdocbook
the @value{SECTION} the @value{SECTION}
@end ifdocbook @end ifdocbook
@ref{Very Simple}, @ref{Very Simple},
we'll see examples of several short, we'll see examples of several short,
self-contained programs. self-contained programs.
@node Read Terminal @node Read Terminal
@subsection Running @command{awk} Without Input Files @subsection Running @command{awk} Without Input Files
@cindex standard input @cindex standard input
@cindex input, standard @cindex input @subentry standard
@cindex input files, running @command{awk} without @cindex input files @subentry running @command{awk} without
You can also run @command{awk} without any input files. If you type the You can also run @command{awk} without any input files. If you type the
following command line: following command line:
@example @example
awk '@var{program}' awk '@var{program}'
@end example @end example
@noindent @noindent
@command{awk} applies the @var{program} to the @dfn{standard input}, @command{awk} applies the @var{program} to the @dfn{standard input},
which usually means whatever you type on the keyboard. This continues which usually means whatever you type on the keyboard. This continues
until you indicate end-of-file by typing @kbd{Ctrl-d}. until you indicate end-of-file by typing @kbd{Ctrl-d}.
(On non-POSIX operating systems, the end-of-file character may be different.) (On non-POSIX operating systems, the end-of-file character may be different.)
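The same behavior can be sketched without a keyboard. In the example below, a pipe stands in for the terminal, and end-of-file arrives when @command{printf} finishes writing. The input text and the variable name @code{count} are illustrative assumptions only:

```shell
# With no input files named, awk reads standard input; here a pipe
# stands in for the keyboard, and end-of-file arrives when printf exits.
count=$(printf 'first line\nsecond line\n' | awk 'END { print NR }')
echo "$count"
```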
@cindex files, input, See input files @cindex files @subentry input @seeentry{input files}
@cindex input files, running @command{awk} without @cindex input files @subentry running @command{awk} without
@cindex @command{awk} programs, running, without input files @cindex @command{awk} programs @subentry running @subentry without input files
As an example, the following program prints a friendly piece of advice As an example, the following program prints a friendly piece of advice
(from Douglas Adams's @cite{The Hitchhiker's Guide to the Galaxy}), (from Douglas Adams's @cite{The Hitchhiker's Guide to the Galaxy}),
to keep you from worrying about the complexities of computer to keep you from worrying about the complexities of computer
programming: programming:
@example @example
$ @kbd{awk 'BEGIN @{ print "Don\47t Panic!" @}'} $ @kbd{awk 'BEGIN @{ print "Don\47t Panic!" @}'}
@print{} Don't Panic! @print{} Don't Panic!
@end example @end example
skipping to change at page 110, line ? skipping to change at page 110, line ?
@kbd{Four score and seven years ago, ...} @kbd{Four score and seven years ago, ...}
@print{} Four score and seven years ago, ... @print{} Four score and seven years ago, ...
@kbd{What, me worry?} @kbd{What, me worry?}
@print{} What, me worry? @print{} What, me worry?
@kbd{Ctrl-d} @kbd{Ctrl-d}
@end example @end example
@node Long @node Long
@subsection Running Long Programs @subsection Running Long Programs
@cindex @command{awk} programs, running @cindex @command{awk} programs @subentry running
@cindex @command{awk} programs, lengthy @cindex @command{awk} programs @subentry lengthy
@cindex files, @command{awk} programs in @cindex files @subentry @command{awk} programs in
Sometimes @command{awk} programs are very long. In these cases, it is Sometimes @command{awk} programs are very long. In these cases, it is
more convenient to put the program into a separate file. In order to tell more convenient to put the program into a separate file. In order to tell
@command{awk} to use that file for its program, you type: @command{awk} to use that file for its program, you type:
@example @example
awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{} awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{}
@end example @end example
@cindex @option{-f} option @cindex @option{-f} option
@cindex command line, option @option{-f} @cindex command line @subentry option @option{-f}
The @option{-f} instructs the @command{awk} utility to get the The @option{-f} instructs the @command{awk} utility to get the
@command{awk} program from the file @var{source-file} (@pxref{Options}). @command{awk} program from the file @var{source-file} (@pxref{Options}).
Any @value{FN} can be used for @var{source-file}. For example, you Any @value{FN} can be used for @var{source-file}. For example, you
could put the program: could put the program:
@example @example
BEGIN @{ print "Don't Panic!" @} BEGIN @{ print "Don't Panic!" @}
@end example @end example
@noindent @noindent
skipping to change at page 110, line ? skipping to change at page 110, line ?
awk -f advice awk -f advice
@end example @end example
@noindent @noindent
does the same thing as this one: does the same thing as this one:
@example @example
awk 'BEGIN @{ print "Don\47t Panic!" @}' awk 'BEGIN @{ print "Don\47t Panic!" @}'
@end example @end example
@cindex quoting, in @command{gawk} command lines @cindex quoting @subentry in @command{gawk} command lines
@noindent @noindent
This was explained earlier This was explained earlier
(@pxref{Read Terminal}). (@pxref{Read Terminal}).
Note that you don't usually need single quotes around the @value{FN} that you Note that you don't usually need single quotes around the @value{FN} that you
specify with @option{-f}, because most @value{FN}s don't contain any of the shell's specify with @option{-f}, because most @value{FN}s don't contain any of the shell's
special characters. Notice that in @file{advice}, the @command{awk} special characters. Notice that in @file{advice}, the @command{awk}
program did not have single quotes around it. The quotes are only needed program did not have single quotes around it. The quotes are only needed
for programs that are provided on the @command{awk} command line. for programs that are provided on the @command{awk} command line.
(Also, placing the program in a file allows us to use a literal single quote in the program (Also, placing the program in a file allows us to use a literal single quote in the program
text, instead of the magic @samp{\47}.) text, instead of the magic @samp{\47}.)
@cindex single quote (@code{'}) in @command{gawk} command lines @cindex single quote (@code{'}) @subentry in @command{gawk} command lines
@cindex @code{'} (single quote) in @command{gawk} command lines @cindex @code{'} (single quote) @subentry in @command{gawk} command lines
If you want to clearly identify an @command{awk} program file as such, If you want to clearly identify an @command{awk} program file as such,
you can add the extension @file{.awk} to the @value{FN}. This doesn't you can add the extension @file{.awk} to the @value{FN}. This doesn't
affect the execution of the @command{awk} program but it does make affect the execution of the @command{awk} program but it does make
``housekeeping'' easier. ``housekeeping'' easier.
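The following sketch pulls these points together. It assumes a scratch file created with @command{mktemp} in place of @file{advice}; the variable names are hypothetical. The program file holds a literal single quote, while the command-line version needs @samp{\47}:

```shell
# Hypothetical scratch file; mktemp picks a unique name for it.
prog=$(mktemp)
# The quoted here-document delimiter stops the shell from touching the text,
# so the program can contain a literal single quote.
cat > "$prog" <<'EOF'
BEGIN { print "Don't Panic!" }
EOF
from_file=$(awk -f "$prog")                        # program read from a file
inline=$(awk 'BEGIN { print "Don\47t Panic!" }')   # program on the command line
rm -f "$prog"
echo "$from_file"
echo "$inline"
```

Both commands should print the same line, because only the place where @command{awk} finds its program text differs.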
@node Executable Scripts @node Executable Scripts
@subsection Executable @command{awk} Programs @subsection Executable @command{awk} Programs
@cindex @command{awk} programs @cindex @command{awk} programs
@cindex @code{#} (number sign), @code{#!} (executable scripts) @cindex @code{#} (number sign) @subentry @code{#!} (executable scripts)
@cindex Unix, @command{awk} scripts and @cindex Unix @subentry @command{awk} scripts and
@cindex number sign (@code{#}), @code{#!} (executable scripts) @cindex number sign (@code{#}) @subentry @code{#!} (executable scripts)
Once you have learned @command{awk}, you may want to write self-contained Once you have learned @command{awk}, you may want to write self-contained
@command{awk} scripts, using the @samp{#!} script mechanism. You can do @command{awk} scripts, using the @samp{#!} script mechanism. You can do
this on many systems.@footnote{The @samp{#!} mechanism works on this on many systems.@footnote{The @samp{#!} mechanism works on
GNU/Linux systems, BSD-based systems, and commercial Unix systems.} GNU/Linux systems, BSD-based systems, and commercial Unix systems.}
For example, you could update the file @file{advice} to look like this: For example, you could update the file @file{advice} to look like this:
@example @example
#! /bin/awk -f #! /bin/awk -f
skipping to change at page 110, line ? skipping to change at page 110, line ?
@end example @end example
@noindent @noindent
After making this file executable (with the @command{chmod} utility), After making this file executable (with the @command{chmod} utility),
simply type @samp{advice} simply type @samp{advice}
at the shell and the system arranges to run @command{awk} as if you had at the shell and the system arranges to run @command{awk} as if you had
typed @samp{awk -f advice}: typed @samp{awk -f advice}:
@example @example
$ @kbd{chmod +x advice} $ @kbd{chmod +x advice}
$ @kbd{advice} $ @kbd{./advice}
@print{} Don't Panic! @print{} Don't Panic!
@end example @end example
@noindent @noindent
(We assume you have the current directory in your shell's search
path variable [typically @code{$PATH}]. If not, you may need
to type @samp{./advice} at the shell.)
Self-contained @command{awk} scripts are useful when you want to write a Self-contained @command{awk} scripts are useful when you want to write a
program that users can invoke without their having to know that the program is program that users can invoke without their having to know that the program is
written in @command{awk}. written in @command{awk}.
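A hedged sketch of the whole sequence follows. Because the location of @command{awk} varies between systems, it asks the shell for the path with @samp{command -v} instead of hard-coding @file{/bin/awk}. The temporary directory and variable names are assumptions made for the example, and it presumes that the directory allows execution and that the path contains no spaces:

```shell
# Hypothetical scratch directory for the script.
dir=$(mktemp -d)
# Look up where awk lives on this system instead of assuming /bin/awk.
awk_path=$(command -v awk)
# Unquoted delimiter: the shell expands $awk_path into the #! line.
cat > "$dir/advice" <<EOF
#! $awk_path -f
BEGIN { print "Don't Panic!" }
EOF
chmod +x "$dir/advice"
# Run the script by an explicit path, so $PATH does not matter.
out=$("$dir/advice")
rm -rf "$dir"
echo "$out"
```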
@cindex sidebar, Understanding @samp{#!} @cindex sidebar @subentry Understanding @samp{#!}
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>Understanding @samp{#!}</title> <sidebar><title>Understanding @samp{#!}</title>
@end docbook @end docbook
@cindex portability, @code{#!} (executable scripts) @cindex portability @subentry @code{#!} (executable scripts)
@command{awk} is an @dfn{interpreted} language. This means that the @command{awk} is an @dfn{interpreted} language. This means that the
@command{awk} utility reads your program and then processes your data @command{awk} utility reads your program and then processes your data
according to the instructions in your program. (This is different according to the instructions in your program. (This is different
from a @dfn{compiled} language such as C, where your program is first from a @dfn{compiled} language such as C, where your program is first
compiled into machine code that is executed directly by your system's compiled into machine code that is executed directly by your system's
processor.) The @command{awk} utility is thus termed an @dfn{interpreter}. processor.) The @command{awk} utility is thus termed an @dfn{interpreter}.
Many modern languages are interpreted. Many modern languages are interpreted.
The line beginning with @samp{#!} lists the full @value{FN} of an The line beginning with @samp{#!} lists the full @value{FN} of an
interpreter to run and a single optional initial command-line argument interpreter to run and a single optional initial command-line argument
to pass to that interpreter. The operating system then runs the to pass to that interpreter. The operating system then runs the
interpreter with the given argument and the full argument list of the interpreter with the given argument and the full argument list of the
executed program. The first argument in the list is the full @value{FN} executed program. The first argument in the list is the full @value{FN}
of the @command{awk} program. The rest of the argument list contains of the @command{awk} program. The rest of the argument list contains
either options to @command{awk}, or @value{DF}s, or both. (Note that on either options to @command{awk}, or @value{DF}s, or both. (Note that on
many systems @command{awk} may be found in @file{/usr/bin} instead of many systems @command{awk} is found in @file{/usr/bin} instead of
in @file{/bin}.) in @file{/bin}.)
Some systems limit the length of the interpreter name to 32 characters. Some systems limit the length of the interpreter name to 32 characters.
Often, this can be dealt with by using a symbolic link. Often, this can be dealt with by using a symbolic link.
You should not put more than one argument on the @samp{#!} You should not put more than one argument on the @samp{#!}
line after the path to @command{awk}. It does not work. The operating system line after the path to @command{awk}. It does not work. The operating system
treats the rest of the line as a single argument and passes it to @command{awk}. treats the rest of the line as a single argument and passes it to @command{awk}.
Doing this leads to confusing behavior---most likely a usage diagnostic Doing this leads to confusing behavior---most likely a usage diagnostic
of some sort from @command{awk}. of some sort from @command{awk}.
@cindex @code{ARGC}/@code{ARGV} variables, portability and @cindex @code{ARGC}/@code{ARGV} variables @subentry portability and
@cindex portability, @code{ARGV} variable @cindex portability @subentry @code{ARGV} variable
@cindex dark corner, @code{ARGV} variable, value of @cindex dark corner @subentry @code{ARGV} variable, value of
Finally, the value of @code{ARGV[0]} Finally, the value of @code{ARGV[0]}
(@pxref{Built-in Variables}) (@pxref{Built-in Variables})
varies depending upon your operating system. varies depending upon your operating system.
Some systems put @samp{awk} there, some put the full pathname Some systems put @samp{awk} there, some put the full pathname
of @command{awk} (such as @file{/bin/awk}), and some put the name of @command{awk} (such as @file{/bin/awk}), and some put the name
of your script (@samp{advice}). @value{DARKCORNER} of your script (@samp{advice}). @value{DARKCORNER}
Don't rely on the value of @code{ARGV[0]} Don't rely on the value of @code{ARGV[0]}
to provide your script name. to provide your script name.
@docbook @docbook
</sidebar> </sidebar>
@end docbook @end docbook
@end ifdocbook @end ifdocbook
@ifnotdocbook @ifnotdocbook
@cartouche @cartouche
@center @b{Understanding @samp{#!}} @center @b{Understanding @samp{#!}}
@cindex portability, @code{#!} (executable scripts) @cindex portability @subentry @code{#!} (executable scripts)
@command{awk} is an @dfn{interpreted} language. This means that the @command{awk} is an @dfn{interpreted} language. This means that the
@command{awk} utility reads your program and then processes your data @command{awk} utility reads your program and then processes your data
according to the instructions in your program. (This is different according to the instructions in your program. (This is different
from a @dfn{compiled} language such as C, where your program is first from a @dfn{compiled} language such as C, where your program is first
compiled into machine code that is executed directly by your system's compiled into machine code that is executed directly by your system's
processor.) The @command{awk} utility is thus termed an @dfn{interpreter}. processor.) The @command{awk} utility is thus termed an @dfn{interpreter}.
Many modern languages are interpreted. Many modern languages are interpreted.
The line beginning with @samp{#!} lists the full @value{FN} of an The line beginning with @samp{#!} lists the full @value{FN} of an
interpreter to run and a single optional initial command-line argument interpreter to run and a single optional initial command-line argument
to pass to that interpreter. The operating system then runs the to pass to that interpreter. The operating system then runs the
interpreter with the given argument and the full argument list of the interpreter with the given argument and the full argument list of the
executed program. The first argument in the list is the full @value{FN} executed program. The first argument in the list is the full @value{FN}
of the @command{awk} program. The rest of the argument list contains of the @command{awk} program. The rest of the argument list contains
either options to @command{awk}, or @value{DF}s, or both. (Note that on either options to @command{awk}, or @value{DF}s, or both. (Note that on
many systems @command{awk} may be found in @file{/usr/bin} instead of many systems @command{awk} is found in @file{/usr/bin} instead of
in @file{/bin}.) in @file{/bin}.)
Some systems limit the length of the interpreter name to 32 characters. Some systems limit the length of the interpreter name to 32 characters.
Often, this can be dealt with by using a symbolic link. Often, this can be dealt with by using a symbolic link.
You should not put more than one argument on the @samp{#!} You should not put more than one argument on the @samp{#!}
line after the path to @command{awk}. It does not work. The operating system line after the path to @command{awk}. It does not work. The operating system
treats the rest of the line as a single argument and passes it to @command{awk}. treats the rest of the line as a single argument and passes it to @command{awk}.
Doing this leads to confusing behavior---most likely a usage diagnostic Doing this leads to confusing behavior---most likely a usage diagnostic
of some sort from @command{awk}. of some sort from @command{awk}.
@cindex @code{ARGC}/@code{ARGV} variables, portability and @cindex @code{ARGC}/@code{ARGV} variables @subentry portability and
@cindex portability, @code{ARGV} variable @cindex portability @subentry @code{ARGV} variable
@cindex dark corner, @code{ARGV} variable, value of @cindex dark corner @subentry @code{ARGV} variable, value of
Finally, the value of @code{ARGV[0]} Finally, the value of @code{ARGV[0]}
(@pxref{Built-in Variables}) (@pxref{Built-in Variables})
varies depending upon your operating system. varies depending upon your operating system.
Some systems put @samp{awk} there, some put the full pathname Some systems put @samp{awk} there, some put the full pathname
of @command{awk} (such as @file{/bin/awk}), and some put the name of @command{awk} (such as @file{/bin/awk}), and some put the name
of your script (@samp{advice}). @value{DARKCORNER} of your script (@samp{advice}). @value{DARKCORNER}
Don't rely on the value of @code{ARGV[0]} Don't rely on the value of @code{ARGV[0]}
to provide your script name. to provide your script name.
@end cartouche @end cartouche
@end ifnotdocbook @end ifnotdocbook
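As a minimal sketch of the @samp{#!} rules described above, the following shell commands build an executable script whose first line names the interpreter and passes exactly one argument, @option{-f}. The script name @file{advice} follows the text; the temporary directory and the use of @code{command -v} to find @command{awk} are assumptions made only for illustration.

```shell
# Sketch only: the temporary directory and `command -v` lookup are
# illustrative assumptions, not part of the manual's own example.
dir=$(mktemp -d)
awk_path=$(command -v awk)

# The '#!' line holds the interpreter's full file name plus ONE argument.
cat > "$dir/advice" <<EOF
#! $awk_path -f
BEGIN { print "Don't Panic!" }
EOF
chmod +x "$dir/advice"

# The system effectively runs: awk -f "$dir/advice"
"$dir/advice"

rm -rf "$dir"
```

The sketch deliberately does not print @code{ARGV[0]}, because its value differs from system to system.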
@node Comments @node Comments
@subsection Comments in @command{awk} Programs @subsection Comments in @command{awk} Programs
@cindex @code{#} (number sign), commenting @cindex @code{#} (number sign) @subentry commenting
@cindex number sign (@code{#}), commenting @cindex number sign (@code{#}) @subentry commenting
@cindex commenting @cindex commenting
@cindex @command{awk} programs, documenting @cindex @command{awk} programs @subentry documenting
A @dfn{comment} is some text that is included in a program for the sake A @dfn{comment} is some text that is included in a program for the sake
of human readers; it is not really an executable part of the program. Comments of human readers; it is not really an executable part of the program. Comments
can explain what the program does and how it works. Nearly all can explain what the program does and how it works. Nearly all
programming languages have provisions for comments, as programs are programming languages have provisions for comments, as programs are
typically hard to understand without them. typically hard to understand without them.
In the @command{awk} language, a comment starts with the number sign In the @command{awk} language, a comment starts with the number sign
character (@samp{#}) and continues to the end of the line. character (@samp{#}) and continues to the end of the line.
The @samp{#} does not have to be the first character on the line. The The @samp{#} does not have to be the first character on the line. The
skipping to change at page 110, line ? skipping to change at page 110, line ?
# This program prints a nice, friendly message. It helps # This program prints a nice, friendly message. It helps
# keep novice users from being afraid of the computer. # keep novice users from being afraid of the computer.
BEGIN @{ print "Don't Panic!" @} BEGIN @{ print "Don't Panic!" @}
@end example @end example
You can put comment lines into keyboard-composed throwaway @command{awk} You can put comment lines into keyboard-composed throwaway @command{awk}
programs, but this usually isn't very useful; the purpose of a programs, but this usually isn't very useful; the purpose of a
comment is to help you or another person understand the program comment is to help you or another person understand the program
when reading it at a later time. when reading it at a later time.
@cindex quoting, for small awk programs @cindex quoting @subentry for small awk programs
@cindex single quote (@code{'}), vs.@: apostrophe @cindex single quote (@code{'}) @subentry vs.@: apostrophe
@cindex @code{'} (single quote), vs.@: apostrophe @cindex @code{'} (single quote) @subentry vs.@: apostrophe
@quotation CAUTION @quotation CAUTION
As mentioned in As mentioned in
@ref{One-shot}, @ref{One-shot},
you can enclose short to medium-sized programs in single quotes, you can enclose short to medium-sized programs in single quotes,
in order to keep in order to keep
your shell scripts self-contained. When doing so, @emph{don't} put your shell scripts self-contained. When doing so, @emph{don't} put
an apostrophe (i.e., a single quote) into a comment (or anywhere else an apostrophe (i.e., a single quote) into a comment (or anywhere else
in your program). The shell interprets the quote as the closing in your program). The shell interprets the quote as the closing
quote for the entire program. As a result, usually the shell quote for the entire program. As a result, usually the shell
prints a message about mismatched quotes, and if @command{awk} actually prints a message about mismatched quotes, and if @command{awk} actually
skipping to change at page 110, line ? skipping to change at page 110, line ?
For short to medium-length @command{awk} programs, it is most convenient For short to medium-length @command{awk} programs, it is most convenient
to enter the program on the @command{awk} command line. to enter the program on the @command{awk} command line.
This is best done by enclosing the entire program in single quotes. This is best done by enclosing the entire program in single quotes.
This is true whether you are entering the program interactively at This is true whether you are entering the program interactively at
the shell prompt, or writing it as part of a larger shell script: the shell prompt, or writing it as part of a larger shell script:
@example @example
awk '@var{program text}' @var{input-file1} @var{input-file2} @dots{} awk '@var{program text}' @var{input-file1} @var{input-file2} @dots{}
@end example @end example
@cindex shells, quoting, rules for @cindex shells @subentry quoting @subentry rules for
@cindex Bourne shell, quoting rules for @cindex Bourne shell, quoting rules for
Once you are working with the shell, it is helpful to have a basic Once you are working with the shell, it is helpful to have a basic
knowledge of shell quoting rules. The following rules apply only to knowledge of shell quoting rules. The following rules apply only to
POSIX-compliant, Bourne-style shells (such as Bash, the GNU Bourne-Again POSIX-compliant, Bourne-style shells (such as Bash, the GNU Bourne-Again
Shell). If you use the C shell, you're on your own. Shell). If you use the C shell, you're on your own.
Before diving into the rules, we introduce a concept that appears Before diving into the rules, we introduce a concept that appears
throughout this @value{DOCUMENT}, which is that of the @dfn{null}, throughout this @value{DOCUMENT}, which is that of the @dfn{null},
or empty, string. or empty, string.
skipping to change at page 110, line ? skipping to change at page 110, line ?
Quoted items can be concatenated with nonquoted items as well as with other Quoted items can be concatenated with nonquoted items as well as with other
quoted items. The shell turns everything into one argument for quoted items. The shell turns everything into one argument for
the command. the command.
@item @item
Preceding any single character with a backslash (@samp{\}) quotes Preceding any single character with a backslash (@samp{\}) quotes
that character. The shell removes the backslash and passes the quoted that character. The shell removes the backslash and passes the quoted
character on to the command. character on to the command.
@item @item
@cindex @code{\} (backslash), in shell commands @cindex @code{\} (backslash) @subentry in shell commands
@cindex backslash (@code{\}), in shell commands @cindex backslash (@code{\}) @subentry in shell commands
@cindex single quote (@code{'}), in shell commands @cindex single quote (@code{'}) @subentry in shell commands
@cindex @code{'} (single quote), in shell commands @cindex @code{'} (single quote) @subentry in shell commands
Single quotes protect everything between the opening and closing quotes. Single quotes protect everything between the opening and closing quotes.
The shell does no interpretation of the quoted text, passing it on verbatim The shell does no interpretation of the quoted text, passing it on verbatim
to the command. to the command.
It is @emph{impossible} to embed a single quote inside single-quoted text. It is @emph{impossible} to embed a single quote inside single-quoted text.
Refer back to Refer back to
@ref{Comments} @ref{Comments}
for an example of what happens if you try. for an example of what happens if you try.
@item @item
@cindex double quote (@code{"}), in shell commands @cindex double quote (@code{"}) @subentry in shell commands
@cindex @code{"} (double quote), in shell commands @cindex @code{"} (double quote) @subentry in shell commands
Double quotes protect most things between the opening and closing quotes. Double quotes protect most things between the opening and closing quotes.
The shell does at least variable and command substitution on the quoted text. The shell does at least variable and command substitution on the quoted text.
Different shells may do additional kinds of processing on double-quoted text. Different shells may do additional kinds of processing on double-quoted text.
Because certain characters within double-quoted text are processed by the shell, Because certain characters within double-quoted text are processed by the shell,
they must be @dfn{escaped} within the text. Of note are the characters they must be @dfn{escaped} within the text. Of note are the characters
@samp{$}, @samp{`}, @samp{\}, and @samp{"}, all of which must be preceded by @samp{$}, @samp{`}, @samp{\}, and @samp{"}, all of which must be preceded by
a backslash within double-quoted text if they are to be passed on literally a backslash within double-quoted text if they are to be passed on literally
to the program. (The leading backslash is stripped first.) to the program. (The leading backslash is stripped first.)
Thus, the example seen Thus, the example seen
skipping to change at page 110, line ? skipping to change at page 110, line ?
@end example @end example
@noindent @noindent
could instead be written this way: could instead be written this way:
@example @example
$ @kbd{awk "BEGIN @{ print \"Don't Panic!\" @}"} $ @kbd{awk "BEGIN @{ print \"Don't Panic!\" @}"}
@print{} Don't Panic! @print{} Don't Panic!
@end example @end example
@cindex single quote (@code{'}), with double quotes @cindex single quote (@code{'}) @subentry with double quotes
@cindex @code{'} (single quote), with double quotes @cindex @code{'} (single quote) @subentry with double quotes
Note that the single quote is not special within double quotes. Note that the single quote is not special within double quotes.
@item @item
Null strings are removed when they occur as part of a non-null Null strings are removed when they occur as part of a non-null
command-line argument, while explicit null objects are kept. command-line argument, while explicit null objects are kept.
For example, to specify that the field separator @code{FS} should For example, to specify that the field separator @code{FS} should
be set to the null string, use: be set to the null string, use:
@example @example
awk -F "" '@var{program}' @var{files} # correct awk -F "" '@var{program}' @var{files} # correct
@end example @end example
@noindent @noindent
@cindex null strings in @command{gawk} arguments, quoting and @cindex null strings @subentry in @command{gawk} arguments, quoting and
Don't use this: Don't use this:
@example @example
awk -F"" '@var{program}' @var{files} # wrong! awk -F"" '@var{program}' @var{files} # wrong!
@end example @end example
@noindent @noindent
In the second case, @command{awk} attempts to use the text of the program In the second case, @command{awk} attempts to use the text of the program
as the value of @code{FS}, and the first @value{FN} as the text of the program! as the value of @code{FS}, and the first @value{FN} as the text of the program!
This results in syntax errors at best, and confusing behavior at worst. This results in syntax errors at best, and confusing behavior at worst.
@end itemize @end itemize
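The quoting rules in the preceding list can be sketched with three spellings of the same one-line program. This is an illustration under the assumption of a POSIX-compliant, Bourne-style shell; the text @samp{cost: $5} is a made-up string chosen because it contains a character the shell treats specially.

```shell
# Single quotes: the shell passes the program text through verbatim.
awk 'BEGIN { print "cost: $5" }'

# Double quotes: $ and " must each be preceded by a backslash,
# otherwise the shell interprets them before awk sees the program.
awk "BEGIN { print \"cost: \$5\" }"

# Backslash: quotes one character (here, a space). The unquoted word,
# the escaped space, and the single-quoted text are concatenated by
# the shell into a single argument for awk.
awk BEGIN\ '{ print "cost: $5" }'
```

All three commands hand @command{awk} the same program text, so each prints the line @samp{cost: $5}.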
@cindex quoting, in @command{gawk} command lines, tricks for @cindex quoting @subentry in @command{gawk} command lines @subentry tricks for
Mixing single and double quotes is difficult. You have to resort Mixing single and double quotes is difficult. You have to resort
to shell quoting tricks, like this: to shell quoting tricks, like this:
@example @example
$ @kbd{awk 'BEGIN @{ print "Here is a single quote <'"'"'>" @}'} $ @kbd{awk 'BEGIN @{ print "Here is a single quote <'"'"'>" @}'}
@print{} Here is a single quote <'> @print{} Here is a single quote <'>
@end example @end example
@noindent @noindent
This program consists of three concatenated quoted strings. The first and the This program consists of three concatenated quoted strings. The first and the
skipping to change at page 110, line ? skipping to change at page 110, line ?
@group @group
$ @kbd{awk 'BEGIN @{ print "Here is a single quote <\47>" @}'} $ @kbd{awk 'BEGIN @{ print "Here is a single quote <\47>" @}'}
@print{} Here is a single quote <'> @print{} Here is a single quote <'>
$ @kbd{awk 'BEGIN @{ print "Here is a double quote <\42>" @}'} $ @kbd{awk 'BEGIN @{ print "Here is a double quote <\42>" @}'}
@print{} Here is a double quote <"> @print{} Here is a double quote <">
@end group @end group
@end example @end example
@noindent @noindent
This works nicely, but you should comment clearly what the This works nicely, but you should comment clearly what the
escapes mean. escape sequences mean.
A fourth option is to use command-line variable assignment, like this: A fourth option is to use command-line variable assignment, like this:
@example @example
$ @kbd{awk -v sq="'" 'BEGIN @{ print "Here is a single quote <" sq ">" @}'} $ @kbd{awk -v sq="'" 'BEGIN @{ print "Here is a single quote <" sq ">" @}'}
@print{} Here is a single quote <'> @print{} Here is a single quote <'>
@end example @end example
(Here, the two string constants and the value of @code{sq} are concatenated (Here, the two string constants and the value of @code{sq} are concatenated
into a single string that is printed by @code{print}.) into a single string that is printed by @code{print}.)
skipping to change at page 110, line ? skipping to change at page 110, line ?
@end example @end example
@noindent @noindent
However, the use of @samp{\042} instead of @samp{\\\"} is also possible However, the use of @samp{\042} instead of @samp{\\\"} is also possible
and easier to read, because backslashes that are not followed by a and easier to read, because backslashes that are not followed by a
double-quote don't need duplication. double-quote don't need duplication.
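A small sketch of the two spellings, assuming the whole program is enclosed in double quotes in a POSIX-compliant shell. The message text is borrowed from the earlier single-quote examples and is not taken from the example this paragraph refers to.

```shell
# Inside shell double quotes, \\ becomes \ and \" becomes ", so awk
# receives <\"> and prints a literal double quote.
awk "BEGIN { print \"Here is a double quote <\\\">\" }"

# \042 reaches awk unchanged, because a backslash followed by a digit
# is not special inside shell double quotes; awk then reads the octal
# escape as a double quote.
awk "BEGIN { print \"Here is a double quote <\042>\" }"
```

Both commands print the same line; the second simply needs fewer backslashes.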
@node Sample Data Files @node Sample Data Files
@section @value{DDF}s for the Examples @section @value{DDF}s for the Examples
@cindex input files, examples @cindex input files @subentry examples
@cindex @code{mail-list} file @cindex @code{mail-list} file
Many of the examples in this @value{DOCUMENT} take their input from two sample Many of the examples in this @value{DOCUMENT} take their input from two sample
@value{DF}s. The first, @file{mail-list}, represents a list of people's names @value{DF}s. The first, @file{mail-list}, represents a list of people's names
together with their email addresses and information about those people. together with their email addresses and information about those people.
The second @value{DF}, called @file{inventory-shipped}, contains The second @value{DF}, called @file{inventory-shipped}, contains
information about monthly shipments. In both files, information about monthly shipments. In both files,
each line is considered to be one @dfn{record}. each line is considered to be one @dfn{record}.
In @file{mail-list}, each record contains the name of a person, In @file{mail-list}, each record contains the name of a person,
his/her phone number, his/her email address, and a code for his/her relationship his/her phone number, his/her email address, and a code for his/her relationship
skipping to change at page 110, line ? skipping to change at page 110, line ?
Here is what this program prints: Here is what this program prints:
@example @example
$ @kbd{awk '/li/ @{ print $0 @}' mail-list} $ @kbd{awk '/li/ @{ print $0 @}' mail-list}
@print{} Amelia 555-5553 amelia.zodiacusque@@gmail.com F @print{} Amelia 555-5553 amelia.zodiacusque@@gmail.com F
@print{} Broderick 555-0542 broderick.aliquotiens@@yahoo.com R @print{} Broderick 555-0542 broderick.aliquotiens@@yahoo.com R
@print{} Julie 555-6699 julie.perscrutabor@@skeeve.com F @print{} Julie 555-6699 julie.perscrutabor@@skeeve.com F
@print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A @print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A
@end example @end example
@cindex actions, default @cindex actions @subentry default
@cindex patterns, default @cindex patterns @subentry default
In an @command{awk} rule, either the pattern or the action can be omitted, In an @command{awk} rule, either the pattern or the action can be omitted,
but not both. If the pattern is omitted, then the action is performed but not both. If the pattern is omitted, then the action is performed
for @emph{every} input line. If the action is omitted, the default for @emph{every} input line. If the action is omitted, the default
action is to print all lines that match the pattern. action is to print all lines that match the pattern.
@cindex actions, empty @cindex actions @subentry empty
Thus, we could leave out the action (the @code{print} statement and the Thus, we could leave out the action (the @code{print} statement and the
braces) in the previous example and the result would be the same: braces) in the previous example and the result would be the same:
@command{awk} prints all lines matching the pattern @samp{li}. By comparison, @command{awk} prints all lines matching the pattern @samp{li}. By comparison,
omitting the @code{print} statement but retaining the braces makes an omitting the @code{print} statement but retaining the braces makes an
empty action that does nothing (i.e., no lines are printed). empty action that does nothing (i.e., no lines are printed).
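The four combinations described above can be compared side by side. This sketch uses a hypothetical three-line temporary file as a stand-in for @file{mail-list}; the names in it are chosen so that two of the three lines contain @samp{li}.

```shell
# Sketch only: the temporary file is a made-up stand-in for mail-list.
tmp=$(mktemp)
printf 'Amelia\nBill\nJulie\n' > "$tmp"

awk '/li/ { print $0 }' "$tmp"   # pattern and action: Amelia, Julie
awk '/li/' "$tmp"                # action omitted: the same two lines
awk '{ print $0 }' "$tmp"        # pattern omitted: all three lines
awk '/li/ { }' "$tmp"            # empty action: prints nothing

rm -f "$tmp"
```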
@cindex @command{awk} programs, one-line examples @cindex @command{awk} programs @subentry one-line examples
Many practical @command{awk} programs are just a line or two long. Following is a Many practical @command{awk} programs are just a line or two long. Following is a
collection of useful, short programs to get you started. Some of these collection of useful, short programs to get you started. Some of these
programs contain constructs that haven't been covered yet. (The description programs contain constructs that haven't been covered yet. (The description
of the program will give you a good idea of what is going on, but you'll of the program will give you a good idea of what is going on, but you'll
need to read the rest of the @value{DOCUMENT} to become an @command{awk} expert!) need to read the rest of the @value{DOCUMENT} to become an @command{awk} expert!)
Most of the examples use a @value{DF} named @file{data}. This is just a Most of the examples use a @value{DF} named @file{data}. This is just a
placeholder; if you use these programs yourself, substitute placeholder; if you use these programs yourself, substitute
your own @value{FN}s for @file{data}. your own @value{FN}s for @file{data}.
For future reference, note that there is often more than For future reference, note that there is often more than
one way to do things in @command{awk}. At some point, you may want one way to do things in @command{awk}. At some point, you may want
skipping to change at page 110, line ? skipping to change at page 110, line ?
-rw-r--r-- 1 arnold user 10809 Nov 7 13:03 awk.h -rw-r--r-- 1 arnold user 10809 Nov 7 13:03 awk.h
-rw-r--r-- 1 arnold user 983 Apr 13 12:14 awk.tab.h -rw-r--r-- 1 arnold user 983 Apr 13 12:14 awk.tab.h
-rw-r--r-- 1 arnold user 31869 Jun 15 12:20 awkgram.y -rw-r--r-- 1 arnold user 31869 Jun 15 12:20 awkgram.y
-rw-r--r-- 1 arnold user 22414 Nov 7 13:03 awk1.c -rw-r--r-- 1 arnold user 22414 Nov 7 13:03 awk1.c
-rw-r--r-- 1 arnold user 37455 Nov 7 13:03 awk2.c -rw-r--r-- 1 arnold user 37455 Nov 7 13:03 awk2.c
-rw-r--r-- 1 arnold user 27511 Dec 9 13:07 awk3.c -rw-r--r-- 1 arnold user 27511 Dec 9 13:07 awk3.c
-rw-r--r-- 1 arnold user 7989 Nov 7 13:03 awk4.c -rw-r--r-- 1 arnold user 7989 Nov 7 13:03 awk4.c
@end example @end example
@noindent @noindent
@cindex line continuations, with C shell @cindex line continuations @subentry with C shell
The first field contains read-write permissions, the second field contains The first field contains read-write permissions, the second field contains
the number of links to the file, and the third field identifies the file's owner. the number of links to the file, and the third field identifies the file's owner.
The fourth field identifies the file's group. The fourth field identifies the file's group.
The fifth field contains the file's size in bytes. The The fifth field contains the file's size in bytes. The
sixth, seventh, and eighth fields contain the month, day, and time, sixth, seventh, and eighth fields contain the month, day, and time,
respectively, that the file was last modified. Finally, the ninth field respectively, that the file was last modified. Finally, the ninth field
contains the @value{FN}. contains the @value{FN}.
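As a brief illustration of the field numbering just described, the following sketch feeds two lines copied from the listing above to @command{awk} through a here-document and selects fields by position. The choice of which fields to print is an assumption made for the example.

```shell
# $5 is the size in bytes, $6 the month, and $9 the file name.
awk '$6 == "Nov" { print $9, $5 }' <<'EOF'
-rw-r--r-- 1 arnold user 10809 Nov 7 13:03 awk.h
-rw-r--r-- 1 arnold user 983 Apr 13 12:14 awk.tab.h
EOF
```

Only the first line has @samp{Nov} in its sixth field, so the command prints @samp{awk.h 10809}.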
@c @cindex automatic initialization @c @cindex automatic initialization
@cindex initialization, automatic @cindex initialization, automatic
skipping to change at page 110, line ? skipping to change at page 110, line ?
@cindex newlines @cindex newlines
Most often, each line in an @command{awk} program is a separate statement or Most often, each line in an @command{awk} program is a separate statement or
separate rule, like this: separate rule, like this:
@example @example
awk '/12/ @{ print $0 @} awk '/12/ @{ print $0 @}
/21/ @{ print $0 @}' mail-list inventory-shipped /21/ @{ print $0 @}' mail-list inventory-shipped
@end example @end example
@cindex @command{gawk}, newlines in @cindex @command{gawk} @subentry newlines in
However, @command{gawk} ignores newlines after any of the following However, @command{gawk} ignores newlines after any of the following
symbols and keywords: symbols and keywords:
@example @example
, @{ ? : || && do else , @{ ? : || && do else
@end example @end example
@noindent @noindent
A newline at any other point is considered the end of the A newline at any other point is considered the end of the
statement.@footnote{The @samp{?} and @samp{:} referred to here is the statement.@footnote{The @samp{?} and @samp{:} referred to here is the
three-operand conditional expression described in three-operand conditional expression described in
@ref{Conditional Exp}. @ref{Conditional Exp}.
Splitting lines after @samp{?} and @samp{:} is a minor @command{gawk} Splitting lines after @samp{?} and @samp{:} is a minor @command{gawk}
extension; if @option{--posix} is specified extension; if @option{--posix} is specified
(@pxref{Options}), then this extension is disabled.} (@pxref{Options}), then this extension is disabled.}
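A short sketch of this rule: the program below breaks lines after @samp{@{}, @samp{&&}, and a comma, none of which ends the statement. The variable @code{x} and the message are made up for the example.

```shell
awk 'BEGIN {
    x = 5
    # The newline after && does not end the condition.
    if (x > 1 &&
        x < 10)
        # The newline after the comma does not end the print statement.
        print "in range",
              x
}'
```

The program prints @samp{in range 5}. Breaks after @samp{?} and @samp{:} are left out of the sketch because, as the footnote says, they are a @command{gawk} extension.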
@cindex @code{\} (backslash), continuing lines and @cindex @code{\} (backslash) @subentry continuing lines and
@cindex backslash (@code{\}), continuing lines and @cindex backslash (@code{\}) @subentry continuing lines and
If you would like to split a single statement into two lines at a point If you would like to split a single statement into two lines at a point
where a newline would terminate it, you can @dfn{continue} it by ending the where a newline would terminate it, you can @dfn{continue} it by ending the
first line with a backslash character (@samp{\}). The backslash must be first line with a backslash character (@samp{\}). The backslash must be
the final character on the line in order to be recognized as a continuation the final character on the line in order to be recognized as a continuation
character. A backslash is allowed anywhere in the statement, even character. A backslash followed by a newline is allowed anywhere in the statement, even
in the middle of a string or regular expression. For example: in the middle of a string or regular expression. For example:
@example @example
awk '/This regular expression is too long, so continue it\ awk '/This regular expression is too long, so continue it\
on the next line/ @{ print $1 @}' on the next line/ @{ print $1 @}'
@end example @end example
@noindent @noindent
@cindex portability, backslash continuation and @cindex portability @subentry backslash continuation and
We have generally not used backslash continuation in our sample programs. We have generally not used backslash continuation in our sample programs.
@command{gawk} places no limit on the @command{gawk} places no limit on the
length of a line, so backslash continuation is never strictly necessary; length of a line, so backslash continuation is never strictly necessary;
it just makes programs more readable. For this same reason, as well as it just makes programs more readable. For this same reason, as well as
for clarity, we have kept most statements short in the programs for clarity, we have kept most statements short in the programs
presented throughout the @value{DOCUMENT}. Backslash continuation is presented throughout the @value{DOCUMENT}.
Backslash continuation is
most useful when your @command{awk} program is in a separate source file most useful when your @command{awk} program is in a separate source file
instead of entered from the command line. You should also note that instead of entered from the command line. You should also note that
many @command{awk} implementations are more particular about where you many @command{awk} implementations are more particular about where you
may use backslash continuation. For example, they may not allow you to may use backslash continuation. For example, they may not allow you to
split a string constant using backslash continuation. Thus, for maximum split a string constant using backslash continuation. Thus, for maximum
portability of your @command{awk} programs, it is best not to split your portability of your @command{awk} programs, it is best not to split your
lines in the middle of a regular expression or a string. lines in the middle of a regular expression or a string.
@c 10/2000: gawk, mawk, and current bell labs awk allow it, @c 10/2000: gawk, mawk, and current bell labs awk allow it,
@c solaris 2.7 nawk does not. Solaris /usr/xpg4/bin/awk does though! sigh. @c solaris 2.7 nawk does not. Solaris /usr/xpg4/bin/awk does though! sigh.
@cindex @command{csh} utility @cindex @command{csh} utility
@cindex backslash (@code{\}), continuing lines and, in @command{csh} @cindex backslash (@code{\}) @subentry continuing lines and @subentry in @command{csh}
@cindex @code{\} (backslash), continuing lines and, in @command{csh} @cindex @code{\} (backslash) @subentry continuing lines and @subentry in @command{csh}
@quotation CAUTION @quotation CAUTION
@emph{Backslash continuation does not work as described @emph{Backslash continuation does not work as described
with the C shell.} It works for @command{awk} programs in files and with the C shell.} It works for @command{awk} programs in files and
for one-shot programs, @emph{provided} you are using a POSIX-compliant for one-shot programs, @emph{provided} you are using a POSIX-compliant
shell, such as the Unix Bourne shell or Bash. But the C shell behaves shell, such as the Unix Bourne shell or Bash. But the C shell behaves
differently! There you must use two backslashes in a row, followed by differently! There you must use two backslashes in a row, followed by
a newline. Note also that when using the C shell, @emph{every} newline a newline. Note also that when using the C shell, @emph{every} newline
in your @command{awk} program must be escaped with a backslash. To illustrate: in your @command{awk} program must be escaped with a backslash. To illustrate:
@example @example
skipping to change at page 110, line ? skipping to change at page 110, line ?
> @kbd{@}'} > @kbd{@}'}
@print{} hello, world @print{} hello, world
@end example @end example
@end quotation @end quotation
@command{awk} is a line-oriented language. Each rule's action has to @command{awk} is a line-oriented language. Each rule's action has to
begin on the same line as the pattern. To have the pattern and action begin on the same line as the pattern. To have the pattern and action
on separate lines, you @emph{must} use backslash continuation; there on separate lines, you @emph{must} use backslash continuation; there
is no other option. is no other option.
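The difference can be sketched with a two-line input, assuming a POSIX-compliant shell (not the C shell, per the caution above). The input words and message strings are made up for the example.

```shell
# With backslash continuation, the pattern and the action on the next
# line form ONE rule, so only the matching line is reported.
printf 'alpha\nbeta\n' | awk '/alpha/ \
{ print "matched:", $0 }'

# Without the backslash, awk sees TWO rules: a pattern with the default
# action (print the line), and an action that runs for every line.
printf 'alpha\nbeta\n' | awk '/alpha/
{ print "seen:", $0 }'
```

The first command prints only @samp{matched: alpha}. The second prints @samp{alpha}, then @samp{seen: alpha}, then @samp{seen: beta}.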
@cindex backslash (@code{\}), continuing lines and, comments and @cindex backslash (@code{\}) @subentry continuing lines and @subentry comments and
@cindex @code{\} (backslash), continuing lines and, comments and @cindex @code{\} (backslash) @subentry continuing lines and @subentry comments and
@cindex commenting, backslash continuation and @cindex commenting @subentry backslash continuation and
Another thing to keep in mind is that backslash continuation and Another thing to keep in mind is that backslash continuation and
comments do not mix. As soon as @command{awk} sees the @samp{#} that comments do not mix. As soon as @command{awk} sees the @samp{#} that
starts a comment, it ignores @emph{everything} on the rest of the starts a comment, it ignores @emph{everything} on the rest of the
line. For example: line. For example:
@example @example
@group @group
$ @kbd{gawk 'BEGIN @{ print "dont panic" # a friendly \} $ @kbd{gawk 'BEGIN @{ print "dont panic" # a friendly \}
> @kbd{ BEGIN rule} > @kbd{ BEGIN rule}
> @kbd{@}'} > @kbd{@}'}
skipping to change at page 110, line ? skipping to change at page 110, line ?
@error{} gawk: cmd. line:2: ^ syntax error @error{} gawk: cmd. line:2: ^ syntax error
@end group @end group
@end example @end example
@noindent @noindent
In this case, it looks like the backslash would continue the comment onto the In this case, it looks like the backslash would continue the comment onto the
next line. However, the backslash-newline combination is never even next line. However, the backslash-newline combination is never even
noticed because it is ``hidden'' inside the comment. Thus, the noticed because it is ``hidden'' inside the comment. Thus, the
@code{BEGIN} is noted as a syntax error. @code{BEGIN} is noted as a syntax error.
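One way to avoid the problem, sketched here as an illustration rather than quoted from the manual, is to let the first comment end at its newline and start a second comment on the following line, with no backslash at all. The shell variable @code{out} is an assumption of this sketch; it exists only to capture the output.

```shell
# Each comment ends at its own newline, so no continuation is needed.
out=$(awk 'BEGIN { print "dont panic" # a friendly
                   # BEGIN rule
}')
printf '%s\n' "$out"
```

Because the @code{print} statement is already complete when the first comment starts, the newline simply terminates it and the closing brace is parsed normally.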
@cindex statements, multiple @cindex statements @subentry multiple
@cindex @code{;} (semicolon), separating statements in actions @cindex @code{;} (semicolon) @subentry separating statements in actions
@cindex semicolon (@code{;}), separating statements in actions @cindex semicolon (@code{;}) @subentry separating statements in actions
@cindex @code{;} (semicolon), separating rules @cindex @code{;} (semicolon) @subentry separating rules
@cindex semicolon (@code{;}), separating rules @cindex semicolon (@code{;}) @subentry separating rules
When @command{awk} statements within one rule are short, you might want to put When @command{awk} statements within one rule are short, you might want to put
more than one of them on a line. This is accomplished by separating the statements more than one of them on a line. This is accomplished by separating the statements
with a semicolon (@samp{;}). with a semicolon (@samp{;}).
This also applies to the rules themselves. This also applies to the rules themselves.
Thus, the program shown at the start of this @value{SECTION} Thus, the program shown at the start of this @value{SECTION}
could also be written this way: could also be written this way:
@example @example
/12/ @{ print $0 @} ; /21/ @{ print $0 @} /12/ @{ print $0 @} ; /21/ @{ print $0 @}
@end example @end example
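The following sketch combines both uses of the semicolon: inside each action it separates two statements, and between the actions it separates the rules. The counter @code{n}, the @code{END} rule, and the three input lines are assumptions added for illustration only.

```shell
# Two statements share each action; semicolons also separate the rules.
out=$(printf '12\n21\n33\n' |
      awk '/12/ { n++; print $0 } ; /21/ { n++; print $0 } ; END { print n }')
printf '%s\n' "$out"
```

With this input, the lines @samp{12} and @samp{21} are each printed once, and the final line reports that two records matched.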
skipping to change at page 110, line ? skipping to change at page 110, line ?
and array sorting. and array sorting.
As we develop our presentation of the @command{awk} language, we will introduce As we develop our presentation of the @command{awk} language, we will introduce
most of the variables and many of the functions. They are described most of the variables and many of the functions. They are described
systematically in @ref{Built-in Variables} and in systematically in @ref{Built-in Variables} and in
@ref{Built-in}. @ref{Built-in}.
@node When @node When
@section When to Use @command{awk} @section When to Use @command{awk}
@cindex @command{awk}, uses for @cindex @command{awk} @subentry uses for
Now that you've seen some of what @command{awk} can do, Now that you've seen some of what @command{awk} can do,
you might wonder how @command{awk} could be useful for you. By using you might wonder how @command{awk} could be useful for you. By using
utility programs, advanced patterns, field separators, arithmetic utility programs, advanced patterns, field separators, arithmetic
statements, and other selection criteria, you can produce much more statements, and other selection criteria, you can produce much more
complex output. The @command{awk} language is very useful for producing complex output. The @command{awk} language is very useful for producing
reports from large amounts of raw data, such as summarizing information reports from large amounts of raw data, such as summarizing information
from the output of other utility programs like @command{ls}. from the output of other utility programs like @command{ls}.
(@xref{More Complex}.) (@xref{More Complex}.)
Programs written with @command{awk} are usually much smaller than they would Programs written with @command{awk} are usually much smaller than they would
skipping to change at page 110, line ? skipping to change at page 110, line ?
eight-bit microprocessors (@pxref{Glossary}, for more information), eight-bit microprocessors (@pxref{Glossary}, for more information),
@end ifclear @end ifclear
@ifset FOR_PRINT @ifset FOR_PRINT
eight-bit microprocessors, eight-bit microprocessors,
@end ifset @end ifset
and a microcode assembler for a special-purpose Prolog and a microcode assembler for a special-purpose Prolog
computer. computer.
The original @command{awk}'s capabilities were strained by tasks The original @command{awk}'s capabilities were strained by tasks
of such complexity, but modern versions are more capable. of such complexity, but modern versions are more capable.
@cindex @command{awk} programs, complex @cindex @command{awk} programs @subentry complex
If you find yourself writing @command{awk} scripts of more than, say, If you find yourself writing @command{awk} scripts of more than, say,
a few hundred lines, you might consider using a different programming a few hundred lines, you might consider using a different programming
language. The shell is good at string and pattern matching; in addition, language. The shell is good at string and pattern matching; in addition,
it allows powerful use of the system utilities. Python offers a nice it allows powerful use of the system utilities. Python offers a nice
balance between high-level ease of programming and access to system balance between high-level ease of programming and access to system
facilities.@footnote{Other popular scripting languages include Ruby facilities.@footnote{Other popular scripting languages include Ruby
and Perl.} and Perl.}
@node Intro Summary @node Intro Summary
@section Summary @section Summary
skipping to change at page 110, line ? skipping to change at page 110, line ?
* Exit Status:: @command{gawk}'s exit status. * Exit Status:: @command{gawk}'s exit status.
* Include Files:: Including other files into your program. * Include Files:: Including other files into your program.
* Loading Shared Libraries:: Loading shared libraries into your program. * Loading Shared Libraries:: Loading shared libraries into your program.
* Obsolete:: Obsolete Options and/or features. * Obsolete:: Obsolete Options and/or features.
* Undocumented:: Undocumented Options and Features. * Undocumented:: Undocumented Options and Features.
* Invoking Summary:: Invocation summary. * Invoking Summary:: Invocation summary.
@end menu @end menu
@node Command Line @node Command Line
@section Invoking @command{awk} @section Invoking @command{awk}
@cindex command line, invoking @command{awk} from @cindex command line @subentry invoking @command{awk} from
@cindex @command{awk}, invoking @cindex @command{awk} @subentry invoking
@cindex arguments, command-line, invoking @command{awk} @cindex arguments @subentry command-line @subentry invoking @command{awk}
@cindex options, command-line, invoking @command{awk} @cindex options @subentry command-line @subentry invoking @command{awk}
There are two ways to run @command{awk}---with an explicit program or with There are two ways to run @command{awk}---with an explicit program or with
one or more program files. Here are templates for both of them; items one or more program files. Here are templates for both of them; items
enclosed in [@dots{}] in these templates are optional: enclosed in [@dots{}] in these templates are optional:
@display @display
@command{awk} [@var{options}] @option{-f} @var{progfile} [@option{--}] @var{file} @dots{} @command{awk} [@var{options}] @option{-f} @var{progfile} [@option{--}] @var{file} @dots{}
@command{awk} [@var{options}] [@option{--}] @code{'@var{program}'} @var{file} @dots{} @command{awk} [@var{options}] [@option{--}] @code{'@var{program}'} @var{file} @dots{}
@end display @end display
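As a minimal sketch of the two templates, the same small program can be run once from the command line and once from a program file. The names @file{data.txt} and @file{sum.awk}, the temporary directory, and the shell variables are hypothetical and chosen only for this illustration.

```shell
tmp=$(mktemp -d)
printf 'a 1\nb 2\n' > "$tmp/data.txt"

# Template 2: the program text is the first nonoption argument.
inline=$(awk '{ sum += $2 } END { print sum }' "$tmp/data.txt")

# Template 1: the same program stored in a file and named with -f.
printf '%s\n' '{ sum += $2 }' 'END { print sum }' > "$tmp/sum.awk"
fromfile=$(awk -f "$tmp/sum.awk" "$tmp/data.txt")

printf '%s %s\n' "$inline" "$fromfile"
rm -rf "$tmp"
```

Both invocations should report the same total, because only the location of the program text differs.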
@cindex GNU long options @cindex GNU long options
@cindex long options @cindex long options
@cindex options, long @cindex options @subentry long
In addition to traditional one-letter POSIX-style options, @command{gawk} also In addition to traditional one-letter POSIX-style options, @command{gawk} also
supports GNU long options. supports GNU long options.
@cindex dark corner, invoking @command{awk} @cindex dark corner @subentry invoking @command{awk}
@cindex lint checking, empty programs @cindex lint checking @subentry empty programs
It is possible to invoke @command{awk} with an empty program: It is possible to invoke @command{awk} with an empty program:
@example @example
awk '' datafile1 datafile2 awk '' datafile1 datafile2
@end example @end example
@cindex @option{--lint} option @cindex @option{--lint} option
@cindex dark corner, empty programs @cindex dark corner @subentry empty programs
@noindent @noindent
Doing so makes little sense, though; @command{awk} exits Doing so makes little sense, though; @command{awk} exits
silently when given an empty program. silently when given an empty program.
@value{DARKCORNER} @value{DARKCORNER}
If @option{--lint} has If @option{--lint} has
been specified on the command line, @command{gawk} issues a been specified on the command line, @command{gawk} issues a
warning that the program is empty. warning that the program is empty.
@node Options @node Options
@section Command-Line Options @section Command-Line Options
@cindex options, command-line @cindex options @subentry command-line
@cindex command line, options @cindex command line @subentry options
@cindex GNU long options @cindex GNU long options
@cindex options, long @cindex options @subentry long
Options begin with a dash and consist of a single character. Options begin with a dash and consist of a single character.
GNU-style long options consist of two dashes and a keyword. GNU-style long options consist of two dashes and a keyword.
The keyword can be abbreviated, as long as the abbreviation allows the option The keyword can be abbreviated, as long as the abbreviation allows the option
to be uniquely identified. If the option takes an argument, either the to be uniquely identified. If the option takes an argument, either the
keyword is immediately followed by an equals sign (@samp{=}) and the keyword is immediately followed by an equals sign (@samp{=}) and the
argument's value, or the keyword and the argument's value are separated argument's value, or the keyword and the argument's value are separated
by whitespace. by whitespace (spaces or TABs).
If a particular option with a value is given more than once, it is the If a particular option with a value is given more than once, it is the
last value that counts. last value that counts.
@cindex POSIX @command{awk}, GNU long options and @cindex POSIX @command{awk} @subentry GNU long options and
Each long option for @command{gawk} has a corresponding Each long option for @command{gawk} has a corresponding
POSIX-style short option. POSIX-style short option.
The long and short options are The long and short options are
interchangeable in all contexts. interchangeable in all contexts.
The following list describes options mandated by the POSIX standard: The following list describes options mandated by the POSIX standard:
@table @code @table @code
@item -F @var{fs} @item -F @var{fs}
@itemx --field-separator @var{fs} @itemx --field-separator @var{fs}
@cindex @option{-F} option @cindex @option{-F} option
@cindex @option{--field-separator} option @cindex @option{--field-separator} option
@cindex @code{FS} variable, @code{--field-separator} option and @cindex @code{FS} variable @subentry @code{--field-separator} option and
Set the @code{FS} variable to @var{fs} Set the @code{FS} variable to @var{fs}
(@pxref{Field Separators}). (@pxref{Field Separators}).
@item -f @var{source-file} @item -f @var{source-file}
@itemx --file @var{source-file} @itemx --file @var{source-file}
@cindex @option{-f} option @cindex @option{-f} option
@cindex @option{--file} option @cindex @option{--file} option
@cindex @command{awk} programs, location of @cindex @command{awk} programs @subentry location of
Read the @command{awk} program source from @var{source-file} Read the @command{awk} program source from @var{source-file}
instead of in the first nonoption argument. instead of in the first nonoption argument.
This option may be given multiple times; the @command{awk} This option may be given multiple times; the @command{awk}
program consists of the concatenation of the contents of program consists of the concatenation of the contents of
each specified @var{source-file}. each specified @var{source-file}.
Files named with @option{-f} are treated as if they had @samp{@@namespace "awk"} Files named with @option{-f} are treated as if they had @samp{@@namespace "awk"}
at their beginning. @xref{Changing The Namespace}, for more information. at their beginning. @xref{Changing The Namespace}, for more information
on this advanced feature.
@item -v @var{var}=@var{val} @item -v @var{var}=@var{val}
@itemx --assign @var{var}=@var{val} @itemx --assign @var{var}=@var{val}
@cindex @option{-v} option @cindex @option{-v} option
@cindex @option{--assign} option @cindex @option{--assign} option
@cindex variables, setting @cindex variables @subentry setting
Set the variable @var{var} to the value @var{val} @emph{before} Set the variable @var{var} to the value @var{val} @emph{before}
execution of the program begins. Such variable values are available execution of the program begins. Such variable values are available
inside the @code{BEGIN} rule inside the @code{BEGIN} rule
(@pxref{Other Arguments}). (@pxref{Other Arguments}).
The @option{-v} option can only set one variable, but it can be used The @option{-v} option can only set one variable, but it can be used
more than once, setting another variable each time, like this: more than once, setting another variable each time, like this:
@samp{awk @w{-v foo=1} @w{-v bar=2} @dots{}}. @samp{awk @w{-v foo=1} @w{-v bar=2} @dots{}}.
@cindex predefined variables, @code{-v} option@comma{} setting with @cindex predefined variables @subentry @code{-v} option, setting with
@cindex variables, predefined, @code{-v} option@comma{} setting with @cindex variables @subentry predefined @subentry @code{-v} option, setting with
@quotation CAUTION @quotation CAUTION
Using @option{-v} to set the values of the built-in Using @option{-v} to set the values of the built-in
variables may lead to surprising results. @command{awk} will reset the variables may lead to surprising results. @command{awk} will reset the
values of those variables as it needs to, possibly ignoring any values of those variables as it needs to, possibly ignoring any
initial value you may have given. initial value you may have given.
@end quotation @end quotation
@item -W @var{gawk-opt} @item -W @var{gawk-opt}
@cindex @option{-W} option @cindex @option{-W} option
Provide an implementation-specific option. Provide an implementation-specific option.
This is the POSIX convention for providing implementation-specific options. This is the POSIX convention for providing implementation-specific options.
These options These options
also have corresponding GNU-style long options. also have corresponding GNU-style long options.
Note that the long options may be abbreviated, as long as Note that the long options may be abbreviated, as long as
the abbreviations remain unique. the abbreviations remain unique.
The full list of @command{gawk}-specific options is provided next. The full list of @command{gawk}-specific options is provided next.
@item -- @item --
@cindex command line, options, end of @cindex command line @subentry options @subentry end of
@cindex options, command-line, end of @cindex options @subentry command-line @subentry end of
Signal the end of the command-line options. The following arguments Signal the end of the command-line options. The following arguments
are not treated as options even if they begin with @samp{-}. This are not treated as options even if they begin with @samp{-}. This
interpretation of @option{--} follows the POSIX argument parsing interpretation of @option{--} follows the POSIX argument parsing
conventions. conventions.
@cindex @code{-} (hyphen), file names beginning with @cindex @code{-} (hyphen) @subentry file names beginning with
@cindex hyphen (@code{-}), file names beginning with @cindex hyphen (@code{-}) @subentry file names beginning with
This is useful if you have @value{FN}s that start with @samp{-}, This is useful if you have @value{FN}s that start with @samp{-},
or in shell scripts, if you have @value{FN}s that will be specified or in shell scripts, if you have @value{FN}s that will be specified
by the user that could start with @samp{-}. by the user that could start with @samp{-}.
It is also useful for passing options on to the @command{awk} It is also useful for passing options on to the @command{awk}
program; see @ref{Getopt Function}. program; see @ref{Getopt Function}.
@end table @end table
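The POSIX options can be combined in a single invocation. The sketch below is illustrative only: the file names @file{-users} and @file{pick.awk} and the variables @code{min} and @code{label} are hypothetical. It sets the field separator with @option{-F}, assigns two variables with separate @option{-v} options, reads the program with @option{-f}, and uses @option{--} so that a file name beginning with @samp{-} is not mistaken for an option.

```shell
tmp=$(mktemp -d)
# Hypothetical data file whose name begins with a hyphen.
printf 'root:0\nalice:1000\n' > "$tmp/-users"
# Hypothetical program file; min and label are assigned with -v,
# so they are already available inside the BEGIN rule.
printf '%s\n' 'BEGIN { print "min=" min }' \
              '$2 >= min { print label, $1 }' > "$tmp/pick.awk"
# -F sets FS, each -v sets one variable, and -- ends option processing
# so that -users is taken as a file name.
out=$(cd "$tmp" && awk -F: -v min=1000 -v label=user -f pick.awk -- -users)
printf '%s\n' "$out"
rm -rf "$tmp"
```

Without the @option{--}, the argument @samp{-users} would be examined as an option rather than opened as a data file.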
The following list describes @command{gawk}-specific options: The following list describes @command{gawk}-specific options:
@c Have to use @asis here to get docbook to come out right. @c Have to use @asis here to get docbook to come out right.
skipping to change at page 110, line ? skipping to change at page 110, line ?
its input data according to the current locale (@pxref{Locales}). This can often involve its input data according to the current locale (@pxref{Locales}). This can often involve
converting multibyte characters into wide characters (internally), and converting multibyte characters into wide characters (internally), and
can lead to problems or confusion if the input data does not contain valid can lead to problems or confusion if the input data does not contain valid
multibyte characters. This option is an easy way to tell @command{gawk}, multibyte characters. This option is an easy way to tell @command{gawk},
``Hands off my data!'' ``Hands off my data!''
@item @option{-c} @item @option{-c}
@itemx @option{--traditional} @itemx @option{--traditional}
@cindex @option{-c} option @cindex @option{-c} option
@cindex @option{--traditional} option @cindex @option{--traditional} option
@cindex compatibility mode (@command{gawk}), specifying @cindex compatibility mode (@command{gawk}) @subentry specifying
Specify @dfn{compatibility mode}, in which the GNU extensions to Specify @dfn{compatibility mode}, in which the GNU extensions to
the @command{awk} language are disabled, so that @command{gawk} behaves just the @command{awk} language are disabled, so that @command{gawk} behaves just
like BWK @command{awk}. like BWK @command{awk}.
@xref{POSIX/GNU}, @xref{POSIX/GNU},
which summarizes the extensions. which summarizes the extensions.
@ifclear FOR_PRINT @ifclear FOR_PRINT
Also see Also see
@ref{Compatibility Mode}. @ref{Compatibility Mode}.
@end ifclear @end ifclear
@item @option{-C} @item @option{-C}
@itemx @option{--copyright} @itemx @option{--copyright}
@cindex @option{-C} option @cindex @option{-C} option
@cindex @option{--copyright} option @cindex @option{--copyright} option
@cindex GPL (General Public License), printing @cindex GPL (General Public License) @subentry printing
Print the short version of the General Public License and then exit. Print the short version of the General Public License and then exit.
@item @option{-d}[@var{file}] @item @option{-d}[@var{file}]
@itemx @option{--dump-variables}[@code{=}@var{file}] @itemx @option{--dump-variables}[@code{=}@var{file}]
@cindex @option{-d} option @cindex @option{-d} option
@cindex @option{--dump-variables} option @cindex @option{--dump-variables} option
@cindex dump all variables of a program @cindex dump all variables of a program
@cindex @file{awkvars.out} file @cindex @file{awkvars.out} file
@cindex files, @file{awkvars.out} @cindex files @subentry @file{awkvars.out}
@cindex variables, global, printing list of @cindex variables @subentry global @subentry printing list of
Print a sorted list of global variables, their types, and final values Print a sorted list of global variables, their types, and final values
to @var{file}. If no @var{file} is provided, print this to @var{file}. If no @var{file} is provided, print this
list to a file named @file{awkvars.out} in the current directory. list to a file named @file{awkvars.out} in the current directory.
No space is allowed between the @option{-d} and @var{file}, if No space is allowed between the @option{-d} and @var{file}, if
@var{file} is supplied. @var{file} is supplied.
@cindex troubleshooting, typographical errors@comma{} global variables @cindex troubleshooting @subentry typographical errors, global variables
Having a list of all global variables is a good way to look for Having a list of all global variables is a good way to look for
typographical errors in your programs. typographical errors in your programs.
You would also use this option if you have a large program with a lot of You would also use this option if you have a large program with a lot of
functions, and you want to be sure that your functions don't functions, and you want to be sure that your functions don't
inadvertently use global variables that you meant to be local. inadvertently use global variables that you meant to be local.
(This is a particularly easy mistake to make with simple variable (This is a particularly easy mistake to make with simple variable
names like @code{i}, @code{j}, etc.) names like @code{i}, @code{j}, etc.)
@item @option{-D}[@var{file}] @item @option{-D}[@var{file}]
@itemx @option{--debug}[@code{=}@var{file}] @itemx @option{--debug}[@code{=}@var{file}]
@cindex @option{-D} option @cindex @option{-D} option
@cindex @option{--debug} option @cindex @option{--debug} option
@cindex @command{awk} debugging, enabling @cindex @command{awk} programs @subentry debugging, enabling
Enable debugging of @command{awk} programs Enable debugging of @command{awk} programs
(@pxref{Debugging}). (@pxref{Debugging}).
By default, the debugger reads commands interactively from the keyboard By default, the debugger reads commands interactively from the keyboard
(standard input). (standard input).
The optional @var{file} argument allows you to specify a file with a list The optional @var{file} argument allows you to specify a file with a list
of commands for the debugger to execute noninteractively. of commands for the debugger to execute noninteractively.
No space is allowed between the @option{-D} and @var{file}, if No space is allowed between the @option{-D} and @var{file}, if
@var{file} is supplied. @var{file} is supplied.
@item @option{-e} @var{program-text} @item @option{-e} @var{program-text}
@itemx @option{--source} @var{program-text} @itemx @option{--source} @var{program-text}
@cindex @option{-e} option @cindex @option{-e} option
@cindex @option{--source} option @cindex @option{--source} option
@cindex source code, mixing @cindex source code @subentry mixing
Provide program source code in the @var{program-text}. Provide program source code in the @var{program-text}.
This option allows you to mix source code in files with source This option allows you to mix source code in files with source
code that you enter on the command line. code that you enter on the command line.
This is particularly useful This is particularly useful
when you have library functions that you want to use from your command-line when you have library functions that you want to use from your command-line
programs (@pxref{AWKPATH Variable}). programs (@pxref{AWKPATH Variable}).
Note that @command{gawk} treats each string as if it ended with Note that @command{gawk} treats each string as if it ended with
a newline character (even if it doesn't). This makes building a newline character (even if it doesn't). This makes building
the total program easier. the total program easier.
skipping to change at page 110, line ? skipping to change at page 110, line ?
This is because each @var{program-text} is treated as if it had This is because each @var{program-text} is treated as if it had
@samp{@@namespace "awk"} at its beginning. @xref{Changing The Namespace}, @samp{@@namespace "awk"} at its beginning. @xref{Changing The Namespace},
for more information. for more information.
@end quotation @end quotation
@item @option{-E} @var{file} @item @option{-E} @var{file}
@itemx @option{--exec} @var{file} @itemx @option{--exec} @var{file}
@cindex @option{-E} option @cindex @option{-E} option
@cindex @option{--exec} option @cindex @option{--exec} option
@cindex @command{awk} programs, location of @cindex @command{awk} programs @subentry location of
@cindex CGI, @command{awk} scripts for @cindex CGI, @command{awk} scripts for
Similar to @option{-f}, read @command{awk} program text from @var{file}. Similar to @option{-f}, read @command{awk} program text from @var{file}.
There are two differences from @option{-f}: There are two differences from @option{-f}:
@itemize @value{BULLET} @itemize @value{BULLET}
@item @item
This option terminates option processing; anything This option terminates option processing; anything
else on the command line is passed on directly to the @command{awk} program. else on the command line is passed on directly to the @command{awk} program.
@item @item
skipping to change at page 110, line ? skipping to change at page 110, line ?
@example @example
#! /usr/local/bin/gawk -E #! /usr/local/bin/gawk -E
@var{awk program here @dots{}} @var{awk program here @dots{}}
@end example @end example
@item @option{-g} @item @option{-g}
@itemx @option{--gen-pot} @itemx @option{--gen-pot}
@cindex @option{-g} option @cindex @option{-g} option
@cindex @option{--gen-pot} option @cindex @option{--gen-pot} option
@cindex portable object files, generating @cindex portable object @subentry files @subentry generating
@cindex files, portable object, generating @cindex files @subentry portable object @subentry generating
Analyze the source program and Analyze the source program and
generate a GNU @command{gettext} portable object template file on standard generate a GNU @command{gettext} portable object template file on standard
output for all string constants that have been marked for translation. output for all string constants that have been marked for translation.
@xref{Internationalization}, @xref{Internationalization},
for information about this option. for information about this option.
@item @option{-h} @item @option{-h}
@itemx @option{--help} @itemx @option{--help}
@cindex @option{-h} option @cindex @option{-h} option
@cindex @option{--help} option @cindex @option{--help} option
@cindex GNU long options, printing list of @cindex GNU long options @subentry printing list of
@cindex options, printing list of @cindex options @subentry printing list of
@cindex printing, list of options @cindex printing @subentry list of options
Print a ``usage'' message summarizing the short- and long-style options Print a ``usage'' message summarizing the short- and long-style options
that @command{gawk} accepts and then exit. that @command{gawk} accepts and then exit.
@item @option{-i} @var{source-file} @item @option{-i} @var{source-file}
@itemx @option{--include} @var{source-file} @itemx @option{--include} @var{source-file}
@cindex @option{-i} option @cindex @option{-i} option
@cindex @option{--include} option @cindex @option{--include} option
@cindex @command{awk} programs, location of @cindex @command{awk} programs @subentry location of
Read an @command{awk} source library from @var{source-file}. This option Read an @command{awk} source library from @var{source-file}. This option
is completely equivalent to using the @code{@@include} directive inside is completely equivalent to using the @code{@@include} directive inside
your program. It is very similar to the @option{-f} option, your program. It is very similar to the @option{-f} option,
but there are two important differences. First, when @option{-i} is but there are two important differences. First, when @option{-i} is
used, the program source is not loaded if it has been previously used, the program source is not loaded if it has been previously
loaded, whereas with @option{-f}, @command{gawk} always loads the file. loaded, whereas with @option{-f}, @command{gawk} always loads the file.
Second, because this option is intended to be used with code libraries, Second, because this option is intended to be used with code libraries,
@command{gawk} does not recognize such files as constituting main program @command{gawk} does not recognize such files as constituting main program
input. Thus, after processing an @option{-i} argument, @command{gawk} input. Thus, after processing an @option{-i} argument, @command{gawk}
still expects to find the main source code via the @option{-f} option still expects to find the main source code via the @option{-f} option
or on the command line. or on the command line.
Files named with @option{-i} are treated as if they had @samp{@@namespace "awk"} Files named with @option{-i} are treated as if they had @samp{@@namespace "awk"}
at their beginning. @xref{Changing The Namespace}, for more information. at their beginning. @xref{Changing The Namespace}, for more information.
@item @option{-l} @var{ext} @item @option{-l} @var{ext}
@itemx @option{--load} @var{ext} @itemx @option{--load} @var{ext}
@cindex @option{-l} option @cindex @option{-l} option
@cindex @option{--load} option @cindex @option{--load} option
@cindex loading, extensions @cindex loading extensions
Load a dynamic extension named @var{ext}. Extensions Load a dynamic extension named @var{ext}. Extensions
are stored as system shared libraries. are stored as system shared libraries.
This option searches for the library using the @env{AWKLIBPATH} This option searches for the library using the @env{AWKLIBPATH}
environment variable. The correct library suffix for your platform will be environment variable. The correct library suffix for your platform will be
supplied by default, so it need not be specified in the extension name. supplied by default, so it need not be specified in the extension name.
The extension initialization routine should be named @code{dl_load()}. The extension initialization routine should be named @code{dl_load()}.
An alternative is to use the @code{@@load} keyword inside the program to load An alternative is to use the @code{@@load} keyword inside the program to load
a shared library. This advanced feature is described in detail in @ref{Dynamic Extensions}. a shared library. This advanced feature is described in detail in @ref{Dynamic Extensions}.
@item @option{-L}[@var{value}] @item @option{-L}[@var{value}]
@itemx @option{--lint}[@code{=}@var{value}] @itemx @option{--lint}[@code{=}@var{value}]
@cindex @option{-L} option @cindex @option{-L} option
@cindex @option{--lint} option @cindex @option{--lint} option
@cindex lint checking, issuing warnings @cindex lint checking @subentry issuing warnings
@cindex warnings, issuing @cindex warnings, issuing
Warn about constructs that are dubious or nonportable to Warn about constructs that are dubious or nonportable to
other @command{awk} implementations. other @command{awk} implementations.
No space is allowed between the @option{-L} and @var{value}, if No space is allowed between the @option{-L} and @var{value}, if
@var{value} is supplied. @var{value} is supplied.
Some warnings are issued when @command{gawk} first reads your program. Others Some warnings are issued when @command{gawk} first reads your program. Others
are issued at runtime, as your program executes. are issued at runtime, as your program executes. The optional
With an optional argument of @samp{fatal}, argument may be one of the following:
lint warnings become fatal errors.
@table @code
@item fatal
Cause lint warnings to become fatal errors.
This may be drastic, but its use will certainly encourage the This may be drastic, but its use will certainly encourage the
development of cleaner @command{awk} programs. development of cleaner @command{awk} programs.
With an optional argument of @samp{invalid}, only warnings about things
@item invalid
Only warnings about things
that are actually invalid are issued. (This is not fully implemented yet.) that are actually invalid are issued. (This is not fully implemented yet.)
With an optional argument of @samp{no-ext}, warnings about @command{gawk}
extensions are disabled. @item no-ext
Disable warnings about @command{gawk} extensions.
@end table
Some warnings are only printed once, even if the dubious constructs they Some warnings are only printed once, even if the dubious constructs they
warn about occur multiple times in your @command{awk} program. Thus, warn about occur multiple times in your @command{awk} program. Thus,
when eliminating problems pointed out by @option{--lint}, you should take when eliminating problems pointed out by @option{--lint}, you should take
care to search for all occurrences of each inappropriate construct. As care to search for all occurrences of each inappropriate construct. As
@command{awk} programs are usually short, doing so is not burdensome. @command{awk} programs are usually short, doing so is not burdensome.
@item @option{-M} @item @option{-M}
@itemx @option{--bignum} @itemx @option{--bignum}
@cindex @option{-M} option @cindex @option{-M} option
@cindex @option{--bignum} option @cindex @option{--bignum} option
Select arbitrary-precision arithmetic on numbers. This option has no effect Select arbitrary-precision arithmetic on numbers. This option has no effect
if @command{gawk} is not compiled to use the GNU MPFR and MP libraries if @command{gawk} is not compiled to use the GNU MPFR and MP libraries
(@pxref{Arbitrary Precision Arithmetic}). (@pxref{Arbitrary Precision Arithmetic}).
@item @option{-n} @item @option{-n}
@itemx @option{--non-decimal-data} @itemx @option{--non-decimal-data}
@cindex @option{-n} option @cindex @option{-n} option
@cindex @option{--non-decimal-data} option @cindex @option{--non-decimal-data} option
@cindex hexadecimal values@comma{} enabling interpretation of @cindex hexadecimal values, enabling interpretation of
@cindex octal values@comma{} enabling interpretation of @cindex octal values, enabling interpretation of
@cindex troubleshooting, @code{--non-decimal-data} option @cindex troubleshooting @subentry @code{--non-decimal-data} option
Enable automatic interpretation of octal and hexadecimal Enable automatic interpretation of octal and hexadecimal
values in input data values in input data
(@pxref{Nondecimal Data}). (@pxref{Nondecimal Data}).
@quotation CAUTION @quotation CAUTION
This option can severely break old programs. Use with care. Also note This option can severely break old programs. Use with care. Also note
that this option may disappear in a future version of @command{gawk}. that this option may disappear in a future version of @command{gawk}.
@end quotation @end quotation
@item @option{-N} @item @option{-N}
skipping to change at page 110, line ? skipping to change at page 110, line ?
Optimization is enabled by default. Optimization is enabled by default.
This option remains primarily for backwards compatibility. However, it may This option remains primarily for backwards compatibility. However, it may
be used to cancel the effect of an earlier @option{-s} option be used to cancel the effect of an earlier @option{-s} option
(see later in this list). (see later in this list).
@item @option{-p}[@var{file}] @item @option{-p}[@var{file}]
@itemx @option{--profile}[@code{=}@var{file}] @itemx @option{--profile}[@code{=}@var{file}]
@cindex @option{-p} option @cindex @option{-p} option
@cindex @option{--profile} option @cindex @option{--profile} option
@cindex @command{awk} profiling, enabling @cindex @command{awk} @subentry profiling, enabling
Enable profiling of @command{awk} programs Enable profiling of @command{awk} programs
(@pxref{Profiling}). (@pxref{Profiling}).
Implies @option{--no-optimize}. Implies @option{--no-optimize}.
By default, profiles are created in a file named @file{awkprof.out}. By default, profiles are created in a file named @file{awkprof.out}.
The optional @var{file} argument allows you to specify a different The optional @var{file} argument allows you to specify a different
@value{FN} for the profile file. @value{FN} for the profile file.
No space is allowed between the @option{-p} and @var{file}, if No space is allowed between the @option{-p} and @var{file}, if
@var{file} is supplied. @var{file} is supplied.
The profile contains execution counts for each statement in the program The profile contains execution counts for each statement in the program
in the left margin, and function call counts for each function. in the left margin, and function call counts for each function.
@item @option{-P} @item @option{-P}
@itemx @option{--posix} @itemx @option{--posix}
@cindex @option{-P} option @cindex @option{-P} option
@cindex @option{--posix} option @cindex @option{--posix} option
@cindex POSIX mode @cindex POSIX mode
@cindex @command{gawk}, extensions@comma{} disabling @cindex @command{gawk} @subentry extensions, disabling
Operate in strict POSIX mode. This disables all @command{gawk} Operate in strict POSIX mode. This disables all @command{gawk}
extensions (just like @option{--traditional}) and extensions (just like @option{--traditional}) and
disables all extensions not allowed by POSIX. disables all extensions not allowed by POSIX.
@xref{Common Extensions} for a summary of the extensions @xref{Common Extensions} for a summary of the extensions
in @command{gawk} that are disabled by this option. in @command{gawk} that are disabled by this option.
Also, Also,
the following additional the following additional
restrictions apply: restrictions apply:
@itemize @value{BULLET} @itemize @value{BULLET}
@cindex newlines @cindex newlines
@cindex whitespace, newlines as @cindex whitespace @subentry newlines as
@item @item
Newlines are not allowed after @samp{?} or @samp{:} Newlines are not allowed after @samp{?} or @samp{:}
(@pxref{Conditional Exp}). (@pxref{Conditional Exp}).
@cindex @code{FS} variable, TAB character as @cindex @code{FS} variable @subentry TAB character as
@item @item
Specifying @samp{-Ft} on the command line does not set the value Specifying @samp{-Ft} on the command line does not set the value
of @code{FS} to be a single TAB character of @code{FS} to be a single TAB character
(@pxref{Field Separators}). (@pxref{Field Separators}).
@cindex locale decimal point character @cindex locale decimal point character
@cindex decimal point character, locale specific @cindex decimal point character, locale specific
@item @item
The locale's decimal point character is used for parsing input The locale's decimal point character is used for parsing input
data (@pxref{Locales}). data (@pxref{Locales}).
@end itemize @end itemize
@c @cindex automatic warnings @c @cindex automatic warnings
@c @cindex warnings, automatic @c @cindex warnings, automatic
@cindex @option{--traditional} option, @code{--posix} option and @cindex @option{--traditional} option @subentry @code{--posix} option and
@cindex @option{--posix} option, @code{--traditional} option and @cindex @option{--posix} option @subentry @code{--traditional} option and
If you supply both @option{--traditional} and @option{--posix} on the If you supply both @option{--traditional} and @option{--posix} on the
command line, @option{--posix} takes precedence. @command{gawk} command line, @option{--posix} takes precedence. @command{gawk}
issues a warning if both options are supplied. issues a warning if both options are supplied.
@item @option{-r} @item @option{-r}
@itemx @option{--re-interval} @itemx @option{--re-interval}
@cindex @option{-r} option @cindex @option{-r} option
@cindex @option{--re-interval} option @cindex @option{--re-interval} option
@cindex regular expressions, interval expressions and @cindex regular expressions @subentry interval expressions and
Allow interval expressions Allow interval expressions
(@pxref{Regexp Operators}) (@pxref{Regexp Operators})
in regexps. in regexps.
This is now @command{gawk}'s default behavior. This is now @command{gawk}'s default behavior.
Nevertheless, this option remains (both for backward compatibility Nevertheless, this option remains (both for backward compatibility
and for use in combination with @option{--traditional}). and for use in combination with @option{--traditional}).
@item @option{-s} @item @option{-s}
@itemx @option{--no-optimize} @itemx @option{--no-optimize}
@cindex @option{--no-optimize} option @cindex @option{--no-optimize} option
skipping to change at page 110, line ? skipping to change at page 110, line ?
@cindex sandbox mode @cindex sandbox mode
@cindex @code{ARGV} array @cindex @code{ARGV} array
Disable the @code{system()} function, Disable the @code{system()} function,
input redirections with @code{getline}, input redirections with @code{getline},
output redirections with @code{print} and @code{printf}, output redirections with @code{print} and @code{printf},
and dynamic extensions. and dynamic extensions.
Also, disallow adding filenames to @code{ARGV} that were Also, disallow adding filenames to @code{ARGV} that were
not there when @command{gawk} started running. not there when @command{gawk} started running.
This is particularly useful when you want to run @command{awk} scripts This is particularly useful when you want to run @command{awk} scripts
from questionable sources and need to make sure the scripts from questionable sources and need to make sure the scripts
can't access your system (other than the specified input @value{DF}). can't access your system (other than the specified input @value{DF}s).
@item @option{-t} @item @option{-t}
@itemx @option{--lint-old} @itemx @option{--lint-old}
@cindex @option{-t} option @cindex @option{-t} option
@cindex @option{--lint-old} option @cindex @option{--lint-old} option
Warn about constructs that are not available in the original version of Warn about constructs that are not available in the original version of
@command{awk} from Version 7 Unix @command{awk} from Version 7 Unix
(@pxref{V7/SVR3.1}). (@pxref{V7/SVR3.1}).
@item @option{-V} @item @option{-V}
@itemx @option{--version} @itemx @option{--version}
@cindex @option{-V} option @cindex @option{-V} option
@cindex @option{--version} option @cindex @option{--version} option
@cindex @command{gawk}, versions of, information about@comma{} printing @cindex @command{gawk} @subentry version of @subentry printing information about
Print version information for this particular copy of @command{gawk}. Print version information for this particular copy of @command{gawk}.
This allows you to determine if your copy of @command{gawk} is up to date This allows you to determine if your copy of @command{gawk} is up to date
with respect to whatever the Free Software Foundation is currently with respect to whatever the Free Software Foundation is currently
distributing. distributing.
It is also useful for bug reports It is also useful for bug reports
(@pxref{Bugs}). (@pxref{Bugs}).
@cindex @code{-} (hyphen) @subentry @code{--} end of options marker
@cindex hyphen (@code{-}) @subentry @code{--} end of options marker
@item @code{--}
Mark the end of all options.
Any command-line arguments following @code{--} are placed in @code{ARGV},
even if they start with a minus sign.
@end table @end table
As long as program text has been supplied, As long as program text has been supplied,
any other options are flagged as invalid with a warning message but any other options are flagged as invalid with a warning message but
are otherwise ignored. are otherwise ignored.
@cindex @option{-F} option, @option{-Ft} sets @code{FS} to TAB @cindex @option{-F} option @subentry @option{-Ft} sets @code{FS} to TAB
In compatibility mode, as a special case, if the value of @var{fs} supplied In compatibility mode, as a special case, if the value of @var{fs} supplied
to the @option{-F} option is @samp{t}, then @code{FS} is set to the TAB to the @option{-F} option is @samp{t}, then @code{FS} is set to the TAB
character (@code{"\t"}). This is true only for @option{--traditional} and not character (@code{"\t"}). This is true only for @option{--traditional} and not
for @option{--posix} for @option{--posix}
(@pxref{Field Separators}). (@pxref{Field Separators}).
@cindex @option{-f} option, multiple uses @cindex @option{-f} option @subentry multiple uses
The @option{-f} option may be used more than once on the command line. The @option{-f} option may be used more than once on the command line.
If it is, @command{awk} reads its program source from all of the named files, as If it is, @command{awk} reads its program source from all of the named files, as
if they had been concatenated together into one big file. This is if they had been concatenated together into one big file. This is
useful for creating libraries of @command{awk} functions. These functions useful for creating libraries of @command{awk} functions. These functions
can be written once and then retrieved from a standard place, instead can be written once and then retrieved from a standard place, instead
of having to be included in each individual program. of having to be included in each individual program.
The @option{-i} option is similar in this regard. The @option{-i} option is similar in this regard.
(As mentioned in (As mentioned in
@ref{Definition Syntax}, @ref{Definition Syntax},
function names must be unique.) function names must be unique.)
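As a minimal sketch of the library idea, the following keeps a function in one file and the main program in another, then names both with @option{-f}. The file names @file{lib.awk} and @file{main.awk} and the function @code{twice()} are made up for this illustration:

```shell
# Hypothetical scratch directory; lib.awk, main.awk, and twice() are
# illustrative names, not part of any awk distribution.
tmpdir=$(mktemp -d)

# A tiny "library" holding one reusable function.
cat > "$tmpdir/lib.awk" <<'EOF'
function twice(x) { return 2 * x }
EOF

# The main program, which calls the library function.
cat > "$tmpdir/main.awk" <<'EOF'
BEGIN { print twice(21) }
EOF

# awk reads both files as if they were one concatenated program.
result=$(awk -f "$tmpdir/lib.awk" -f "$tmpdir/main.awk")
echo "$result"    # prints 42

rm -rf "$tmpdir"
```

The order of the two @option{-f} options does not matter here, because function definitions may appear anywhere in the program text.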
skipping to change at page 110, line ? skipping to change at page 110, line ?
if the program is entered at the keyboard, if the program is entered at the keyboard,
by specifying @samp{-f /dev/tty}. After typing your program, by specifying @samp{-f /dev/tty}. After typing your program,
type @kbd{Ctrl-d} (the end-of-file character) to terminate it. type @kbd{Ctrl-d} (the end-of-file character) to terminate it.
(You may also use @samp{-f -} to read program source from the standard (You may also use @samp{-f -} to read program source from the standard
input, but then you will not be able to also use the standard input as a input, but then you will not be able to also use the standard input as a
source of data.) source of data.)
Because it is clumsy using the standard @command{awk} mechanisms to mix Because it is clumsy using the standard @command{awk} mechanisms to mix
source file and command-line @command{awk} programs, @command{gawk} source file and command-line @command{awk} programs, @command{gawk}
provides the @option{-e} option. This does not require you to provides the @option{-e} option. This does not require you to
preempt the standard input for your source code; it allows you to easily preempt the standard input for your source code, and it allows you to easily
mix command-line and library source code (@pxref{AWKPATH Variable}). mix command-line and library source code (@pxref{AWKPATH Variable}).
As with @option{-f}, the @option{-e} and @option{-i} As with @option{-f}, the @option{-e} and @option{-i}
options may also be used multiple times on the command line. options may also be used multiple times on the command line.
@cindex @option{-e} option @cindex @option{-e} option
If no @option{-f} option (or @option{-e} option for @command{gawk}) If no @option{-f} option (or @option{-e} option for @command{gawk})
is specified, then @command{awk} uses the first nonoption command-line is specified, then @command{awk} uses the first nonoption command-line
argument as the text of the program source code. Arguments on argument as the text of the program source code. Arguments on
the command line that follow the program text are entered into the the command line that follow the program text are entered into the
@code{ARGV} array; @command{awk} does @emph{not} continue to parse the @code{ARGV} array; @command{awk} does @emph{not} continue to parse the
command line looking for options. command line looking for options.
@cindex @env{POSIXLY_CORRECT} environment variable @cindex @env{POSIXLY_CORRECT} environment variable
@cindex lint checking, @env{POSIXLY_CORRECT} environment variable @cindex environment variables @subentry @env{POSIXLY_CORRECT}
@cindex lint checking @subentry @env{POSIXLY_CORRECT} environment variable
@cindex POSIX mode @cindex POSIX mode
If the environment variable @env{POSIXLY_CORRECT} exists, If the environment variable @env{POSIXLY_CORRECT} exists,
then @command{gawk} behaves in strict POSIX mode, exactly as if then @command{gawk} behaves in strict POSIX mode, exactly as if
you had supplied @option{--posix}. you had supplied @option{--posix}.
Many GNU programs look for this environment variable to suppress Many GNU programs look for this environment variable to suppress
extensions that conflict with POSIX, but @command{gawk} behaves extensions that conflict with POSIX, but @command{gawk} behaves
differently: it suppresses all extensions, even those that do not differently: it suppresses all extensions, even those that do not
conflict with POSIX, and behaves in conflict with POSIX, and behaves in
strict POSIX mode. If @option{--lint} is supplied on the command line strict POSIX mode. If @option{--lint} is supplied on the command line
and @command{gawk} turns on POSIX mode because of @env{POSIXLY_CORRECT}, and @command{gawk} turns on POSIX mode because of @env{POSIXLY_CORRECT},
skipping to change at page 110, line ? skipping to change at page 110, line ?
mode is in effect. mode is in effect.
You would typically set this variable in your shell's startup file. You would typically set this variable in your shell's startup file.
For a Bourne-compatible shell (such as Bash), you would add these For a Bourne-compatible shell (such as Bash), you would add these
lines to the @file{.profile} file in your home directory: lines to the @file{.profile} file in your home directory:
@example @example
POSIXLY_CORRECT=true POSIXLY_CORRECT=true
export POSIXLY_CORRECT export POSIXLY_CORRECT
@end example @end example
@cindex @command{csh} utility, @env{POSIXLY_CORRECT} environment variable @cindex @command{csh} utility @subentry @env{POSIXLY_CORRECT} environment variable
For a C shell-compatible For a C shell-compatible
shell,@footnote{Not recommended.} shell,@footnote{Not recommended.}
you would add this line to the @file{.login} file in your home directory: you would add this line to the @file{.login} file in your home directory:
@example @example
setenv POSIXLY_CORRECT true setenv POSIXLY_CORRECT true
@end example @end example
@cindex portability, @env{POSIXLY_CORRECT} environment variable @cindex portability @subentry @env{POSIXLY_CORRECT} environment variable
Having @env{POSIXLY_CORRECT} set is not recommended for daily use, Having @env{POSIXLY_CORRECT} set is not recommended for daily use,
but it is good for testing the portability of your programs to other but it is good for testing the portability of your programs to other
environments. environments.
@node Other Arguments @node Other Arguments
@section Other Command-Line Arguments @section Other Command-Line Arguments
@cindex command line, arguments @cindex command line @subentry arguments
@cindex arguments, command-line @cindex arguments @subentry command-line
Any additional arguments on the command line are normally treated as Any additional arguments on the command line are normally treated as
input files to be processed in the order specified. However, an input files to be processed in the order specified. However, an
argument that has the form @code{@var{var}=@var{value}}, assigns argument that has the form @code{@var{var}=@var{value}}, assigns
the value @var{value} to the variable @var{var}---it does not specify a the value @var{value} to the variable @var{var}---it does not specify a
file at all. (See @ref{Assignment Options}.) In the following example, file at all. (See @ref{Assignment Options}.) In the following example,
@var{count=1} is a variable assignment, not a @value{FN}: @var{count=1} is a variable assignment, not a @value{FN}:
@example @example
awk -f program.awk file1 count=1 file2 awk -f program.awk file1 count=1 file2
@end example @end example
@cindex @command{gawk}, @code{ARGIND} variable in @noindent
@cindex @code{ARGIND} variable, command-line arguments As a side point, should you really need to have @command{awk}
process a file named @file{count=1} (or any file whose name looks like
a variable assignment), precede the file name with @samp{./}, like so:
@example
awk -f program.awk file1 ./count=1 file2
@end example
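The difference between the two spellings can be checked with a short experiment. This is only a sketch: the scratch directory and the file contents are invented for the demonstration.

```shell
# Hypothetical scratch directory and file contents, for illustration only.
tmpdir=$(mktemp -d)
printf 'from file1\n' > "$tmpdir/file1"
printf 'from count=1\n' > "$tmpdir/count=1"

# Here count=1 is taken as a variable assignment, so only file1 is read
# and the variable count has the value 1 while file1 is processed.
as_assignment=$(cd "$tmpdir" && awk '{ print count ": " $0 }' count=1 file1)

# ./count=1 does not have the form var=value (./count is not a valid
# variable name), so awk opens it as an input file.
as_file=$(cd "$tmpdir" && awk '{ print FILENAME ": " $0 }' ./count=1)

printf '%s\n%s\n' "$as_assignment" "$as_file"

rm -rf "$tmpdir"
```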
@cindex @command{gawk} @subentry @code{ARGIND} variable in
@cindex @code{ARGIND} variable @subentry command-line arguments
@cindex @code{ARGV} array, indexing into @cindex @code{ARGV} array, indexing into
@cindex @code{ARGC}/@code{ARGV} variables, command-line arguments @cindex @code{ARGC}/@code{ARGV} variables @subentry command-line arguments
@cindex @command{gawk} @subentry @code{PROCINFO} array in
All the command-line arguments are made available to your @command{awk} program in the All the command-line arguments are made available to your @command{awk} program in the
@code{ARGV} array (@pxref{Built-in Variables}). Command-line options @code{ARGV} array (@pxref{Built-in Variables}). Command-line options
and the program text (if present) are omitted from @code{ARGV}. and the program text (if present) are omitted from @code{ARGV}.
All other arguments, including variable assignments, are All other arguments, including variable assignments, are
included. As each element of @code{ARGV} is processed, @command{gawk} included. As each element of @code{ARGV} is processed, @command{gawk}
sets @code{ARGIND} to the index in @code{ARGV} of the sets @code{ARGIND} to the index in @code{ARGV} of the
current element. current element. (@command{gawk} makes the full command line,
including program text and options, available in @code{PROCINFO["argv"]};
@pxref{Auto-set}.)
@c FIXME: One day, move the ARGC and ARGV node closer to here. @c FIXME: One day, move the ARGC and ARGV node closer to here.
Changing @code{ARGC} and @code{ARGV} in your @command{awk} program lets Changing @code{ARGC} and @code{ARGV} in your @command{awk} program lets
you control how @command{awk} processes the input files; this is described you control how @command{awk} processes the input files; this is described
in more detail in @ref{ARGC and ARGV}. in more detail in @ref{ARGC and ARGV}.
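To see what ends up in @code{ARGV}, a small sketch such as the following can help. It uses only a @code{BEGIN} rule, so the named files are never opened and need not exist; the names @file{file1} and @file{file2} and the variables @code{x} and @code{count} are placeholders:

```shell
# The -v option and the program text are left out of ARGV; the remaining
# arguments, including the assignment count=1, are kept in order.
# (ARGV[0] is skipped because its exact value depends on how awk was invoked.)
args=$(awk -v x=1 'BEGIN { for (i = 1; i < ARGC; i++) print i ": " ARGV[i] }' \
       file1 count=1 file2)
printf '%s\n' "$args"
```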
@cindex input files, variable assignments and @cindex input files @subentry variable assignments and
@cindex variable assignments and input files @cindex variable assignments and input files
The distinction between @value{FN} arguments and variable-assignment The distinction between @value{FN} arguments and variable-assignment
arguments is made when @command{awk} is about to open the next input file. arguments is made when @command{awk} is about to open the next input file.
At that point in execution, it checks the @value{FN} to see whether At that point in execution, it checks the @value{FN} to see whether
it is really a variable assignment; if so, @command{awk} sets the variable it is really a variable assignment; if so, @command{awk} sets the variable
instead of reading a file. instead of reading a file.
Therefore, the variables actually receive the given values after all Therefore, the variables actually receive the given values after all
previously specified files have been read. In particular, the values of previously specified files have been read. In particular, the values of
variables assigned in this fashion are @emph{not} available inside a variables assigned in this fashion are @emph{not} available inside a
@code{BEGIN} rule @code{BEGIN} rule
(@pxref{BEGIN/END}), (@pxref{BEGIN/END}),
because such rules are run before @command{awk} begins scanning the argument list. because such rules are run before @command{awk} begins scanning the argument list.
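The timing described above can be observed directly. In this sketch (the file names @file{f1} and @file{f2} and the variable @code{n} are invented), the assignment sits between two input files, so it takes effect only after the first file has been read:

```shell
# Hypothetical scratch directory and input files, for illustration only.
tmpdir=$(mktemp -d)
printf 'a\n' > "$tmpdir/f1"
printf 'b\n' > "$tmpdir/f2"

# n is still unset in BEGIN and while f1 is read; it becomes 5 when awk
# reaches the n=5 argument, just before opening f2.
timing=$(cd "$tmpdir" && awk 'BEGIN { print "BEGIN: n=[" n "]" }
                              { print FILENAME ": n=[" n "]" }' f1 n=5 f2)
printf '%s\n' "$timing"

rm -rf "$tmpdir"
```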
@cindex dark corner, escape sequences @cindex dark corner @subentry escape sequences
The variable values given on the command line are processed for escape The variable values given on the command line are processed for escape
sequences (@pxref{Escape Sequences}). sequences (@pxref{Escape Sequences}).
@value{DARKCORNER} @value{DARKCORNER}
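As a small sketch of this behavior, the assignment below contains the two characters backslash and @samp{t}; the shell passes them through unchanged, and @command{awk} turns them into a single TAB when it performs the assignment. The variable name @code{v} is arbitrary:

```shell
# With no file operands, awk reads the standard input after making the
# assignment. The single quotes keep the shell from touching the backslash.
escaped=$(printf 'x\n' | awk '{ print "[" v "]" }' 'v=a\tb')
printf '%s\n' "$escaped"
```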
In some very early implementations of @command{awk}, when a variable assignment In some very early implementations of @command{awk}, when a variable assignment
occurred before any @value{FN}s, the assignment would happen @emph{before} occurred before any @value{FN}s, the assignment would happen @emph{before}
the @code{BEGIN} rule was executed. @command{awk}'s behavior was thus the @code{BEGIN} rule was executed. @command{awk}'s behavior was thus
inconsistent; some command-line assignments were available inside the inconsistent; some command-line assignments were available inside the
@code{BEGIN} rule, while others were not. Unfortunately, @code{BEGIN} rule, while others were not. Unfortunately,
some applications came to depend some applications came to depend
upon this ``feature.'' When @command{awk} was changed to be more consistent, upon this ``feature.'' When @command{awk} was changed to be more consistent,
the @option{-v} option was added to accommodate applications that depended the @option{-v} option was added to accommodate applications that depended
upon the old behavior. upon the old behavior.
The variable assignment feature is most useful for assigning to variables The variable assignment feature is most useful for assigning to variables
such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and
output formats, before scanning the @value{DF}s. It is also useful for output formats, before scanning the @value{DF}s. It is also useful for
controlling state if multiple passes are needed over a @value{DF}. For controlling state if multiple passes are needed over a @value{DF}. For
example: example:
@cindex files, multiple passes over @cindex files @subentry multiple passes over
@example @example
awk 'pass == 1 @{ @var{pass 1 stuff} @} awk 'pass == 1 @{ @var{pass 1 stuff} @}
pass == 2 @{ @var{pass 2 stuff} @}' pass=1 mydata pass=2 mydata pass == 2 @{ @var{pass 2 stuff} @}' pass=1 mydata pass=2 mydata
@end example @end example
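A concrete instance of the two-pass pattern might look like the following sketch, which totals a column on the first pass and prints each value's share of the total on the second. The data values and the variable @code{total} are invented for the illustration:

```shell
# Hypothetical scratch directory and data file, for illustration only.
tmpdir=$(mktemp -d)
printf '10\n30\n60\n' > "$tmpdir/mydata"

# Pass 1 accumulates the total; pass 2 rereads the same file and prints
# each value together with its percentage of that total.
shares=$(awk 'pass == 1 { total += $1 }
              pass == 2 { printf "%d %d%%\n", $1, 100 * $1 / total }' \
         pass=1 "$tmpdir/mydata" pass=2 "$tmpdir/mydata")
printf '%s\n' "$shares"

rm -rf "$tmpdir"
```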
Given the variable assignment feature, the @option{-F} option for setting Given the variable assignment feature, the @option{-F} option for setting
the value of @code{FS} is not the value of @code{FS} is not
strictly necessary. It remains for historical compatibility. strictly necessary. It remains for historical compatibility.
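The following sketch shows the two forms side by side on a made-up, colon-separated input line. Both set @code{FS} before the first record is read, so both print the same first field:

```shell
# Setting the field separator with the -F option ...
with_F=$(printf 'root:x:0\n' | awk -F: '{ print $1 }')

# ... and with a command-line variable assignment. With no file operands,
# awk makes the assignment and then reads the standard input.
with_FS=$(printf 'root:x:0\n' | awk '{ print $1 }' FS=:)

printf '%s %s\n' "$with_F" "$with_FS"
```

One difference remains: a value given with @option{-F} is already in effect inside a @code{BEGIN} rule, whereas the assignment form takes effect only when @command{awk} reaches that argument.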
@node Naming Standard Input @node Naming Standard Input
skipping to change at page 110, line ? skipping to change at page 110, line ?
@value{FN} @file{/dev/stdin}, both on the command line and @value{FN} @file{/dev/stdin}, both on the command line and
with @code{getline}. with @code{getline}.
Some other versions of @command{awk} also support this, but it Some other versions of @command{awk} also support this, but it
is not standard. is not standard.
(Some operating systems provide a @file{/dev/stdin} file (Some operating systems provide a @file{/dev/stdin} file
in the filesystem; however, @command{gawk} always processes in the filesystem; however, @command{gawk} always processes
this @value{FN} itself.) this @value{FN} itself.)
@node Environment Variables @node Environment Variables
@section The Environment Variables @command{gawk} Uses @section The Environment Variables @command{gawk} Uses
@cindex environment variables used by @command{gawk} @cindex environment variables @subentry used by @command{gawk}
A number of environment variables influence how @command{gawk} A number of environment variables influence how @command{gawk}
behaves. behaves.
@menu @menu
* AWKPATH Variable:: Searching directories for @command{awk} * AWKPATH Variable:: Searching directories for @command{awk}
programs. programs.
* AWKLIBPATH Variable:: Searching directories for @command{awk} shared * AWKLIBPATH Variable:: Searching directories for @command{awk} shared
libraries. libraries.
* Other Environment Variables:: The environment variables. * Other Environment Variables:: The environment variables.
@end menu @end menu
@node AWKPATH Variable @node AWKPATH Variable
@subsection The @env{AWKPATH} Environment Variable @subsection The @env{AWKPATH} Environment Variable
@cindex @env{AWKPATH} environment variable @cindex @env{AWKPATH} environment variable
@cindex directories, searching for source files @cindex environment variables @subentry @env{AWKPATH}
@cindex search paths, for source files @cindex directories @subentry searching @subentry for source files
@cindex differences in @command{awk} and @command{gawk}, @env{AWKPATH} environment variable @cindex search paths @subentry for source files
@cindex differences in @command{awk} and @command{gawk} @subentry @env{AWKPATH} environment variable
@ifinfo @ifinfo
The previous @value{SECTION} described how @command{awk} program files can be named The previous @value{SECTION} described how @command{awk} program files can be named
on the command line with the @option{-f} option. on the command line with the @option{-f} option.
@end ifinfo @end ifinfo
In most @command{awk} In most @command{awk}
implementations, you must supply a precise pathname for each program implementations, you must supply a precise pathname for each program
file, unless the file is in the current directory. file, unless the file is in the current directory.
But with @command{gawk}, if the @value{FN} supplied to the @option{-f} But with @command{gawk}, if the @value{FN} supplied to the @option{-f}
or @option{-i} options or @option{-i} options
does not contain a directory separator @samp{/}, then @command{gawk} searches a list of does not contain a directory separator @samp{/}, then @command{gawk} searches a list of
skipping to change at page 110, line ? skipping to change at page 110, line ?
the current directory, either before or after the path search. As of the current directory, either before or after the path search. As of
@value{PVERSION} 4.1.2, this no longer happens; if you wish to look @value{PVERSION} 4.1.2, this no longer happens; if you wish to look
in the current directory, you must include @file{.} either as a separate in the current directory, you must include @file{.} either as a separate
entry or as a null entry in the search path. entry or as a null entry in the search path.
@end quotation @end quotation
The default value for @env{AWKPATH} is The default value for @env{AWKPATH} is
@samp{.:/usr/local/share/awk}.@footnote{Your version of @command{gawk} @samp{.:/usr/local/share/awk}.@footnote{Your version of @command{gawk}
may use a different directory; it may use a different directory; it
will depend upon how @command{gawk} was built and installed. The actual will depend upon how @command{gawk} was built and installed. The actual
directory is the value of @code{$(datadir)} generated when directory is the value of @code{$(pkgdatadir)} generated when
@command{gawk} was configured. You probably don't need to worry about this, @command{gawk} was configured.
though.} Since @file{.} is included at the beginning, @command{gawk} (For more detail, see the @file{INSTALL} file in the source distribution,
and see @ref{Quick Installation}.
You probably don't need to worry about this,
though.)} Since @file{.} is included at the beginning, @command{gawk}
searches first in the current directory and then in @file{/usr/local/share/awk}. searches first in the current directory and then in @file{/usr/local/share/awk}.
In practice, this means that you will rarely need to change the In practice, this means that you will rarely need to change the
value of @env{AWKPATH}. value of @env{AWKPATH}.
@xref{Shell Startup Files}, for information on functions that help to @xref{Shell Startup Files}, for information on functions that help to
manipulate the @env{AWKPATH} variable. manipulate the @env{AWKPATH} variable.
@command{gawk} places the value of the search path that it used into @command{gawk} places the value of the search path that it used into
@code{ENVIRON["AWKPATH"]}. This provides access to the actual search @code{ENVIRON["AWKPATH"]}. This provides access to the actual search
path value from within an @command{awk} program. path value from within an @command{awk} program.
Although you can change @code{ENVIRON["AWKPATH"]} within your @command{awk} Although you can change @code{ENVIRON["AWKPATH"]} within your @command{awk}
program, this has no effect on the running program's behavior. This makes program, this has no effect on the running program's behavior. This makes
sense: the @env{AWKPATH} environment variable is used to find the program sense: the @env{AWKPATH} environment variable is used to find the program
source files. Once your program is running, all the files have been source files. Once your program is running, all the files have been
found, and @command{gawk} no longer needs to use @env{AWKPATH}. found, and @command{gawk} no longer needs to use @env{AWKPATH}.
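As a minimal sketch of the point above, the following shell fragment sets
@env{AWKPATH} for a single invocation and prints the value that the program
sees in @code{ENVIRON["AWKPATH"]}. The directory @file{/tmp/awklib} and the
shell variable @code{path_seen} are made up for illustration. The sketch
assumes that @command{awk} on your system is @command{gawk}; other
implementations simply pass the environment value through unchanged.

```shell
# Set AWKPATH for this one invocation only.  /tmp/awklib is a
# hypothetical directory; it does not need to exist for this check.
path_seen=$(AWKPATH=".:/tmp/awklib" awk 'BEGIN { print ENVIRON["AWKPATH"] }')

# Show the search path as the awk program saw it.
echo "$path_seen"
```

When @env{AWKPATH} is unset or empty, @command{gawk} reports its built-in
default path here instead.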
@node AWKLIBPATH Variable @node AWKLIBPATH Variable
@subsection The @env{AWKLIBPATH} Environment Variable @subsection The @env{AWKLIBPATH} Environment Variable
@cindex @env{AWKLIBPATH} environment variable @cindex @env{AWKLIBPATH} environment variable
@cindex directories, searching for loadable extensions @cindex environment variables @subentry @env{AWKLIBPATH}
@cindex search paths, for loadable extensions @cindex directories @subentry searching @subentry for loadable extensions
@cindex differences in @command{awk} and @command{gawk}, @code{AWKLIBPATH} environment variable @cindex search paths @subentry for loadable extensions
@cindex differences in @command{awk} and @command{gawk} @subentry @code{AWKLIBPATH} environment variable
The @env{AWKLIBPATH} environment variable is similar to the @env{AWKPATH} The @env{AWKLIBPATH} environment variable is similar to the @env{AWKPATH}
variable, but it is used to search for loadable extensions (stored as variable, but it is used to search for loadable extensions (stored as
system shared libraries) specified with the @option{-l} option rather system shared libraries) specified with the @option{-l} option rather
than for source files. If the extension is not found, the path is than for source files. If the extension is not found, the path is
searched again after adding the appropriate shared library suffix for searched again after adding the appropriate shared library suffix for
the platform. For example, on GNU/Linux systems, the suffix @samp{.so} the platform. For example, on GNU/Linux systems, the suffix @samp{.so}
is used. The search path specified is also used for extensions loaded is used. The search path specified is also used for extensions loaded
via the @code{@@load} keyword (@pxref{Loading Shared Libraries}). via the @code{@@load} keyword (@pxref{Loading Shared Libraries}).
If @env{AWKLIBPATH} does not exist in the environment, or if it has If @env{AWKLIBPATH} does not exist in the environment, or if it has
an empty value, @command{gawk} uses a default path; this an empty value, @command{gawk} uses a default path; this
is typically @samp{/usr/local/lib/gawk}, although it can vary depending is typically @samp{/usr/local/lib/gawk}, although it can vary depending
upon how @command{gawk} was built. upon how @command{gawk} was built.@footnote{Your version of @command{gawk}
may use a different directory; it
will depend upon how @command{gawk} was built and installed. The actual
directory is the value of @code{$(pkgextensiondir)} generated when
@command{gawk} was configured.
(For more detail, see the @file{INSTALL} file in the source distribution,
and see @ref{Quick Installation}.
You probably don't need to worry about this,
though.)}
@xref{Shell Startup Files}, for information on functions that help to @xref{Shell Startup Files}, for information on functions that help to
manipulate the @env{AWKLIBPATH} variable. manipulate the @env{AWKLIBPATH} variable.
@command{gawk} places the value of the search path that it used into @command{gawk} places the value of the search path that it used into
@code{ENVIRON["AWKLIBPATH"]}. This provides access to the actual search @code{ENVIRON["AWKLIBPATH"]}. This provides access to the actual search
path value from within an @command{awk} program. path value from within an @command{awk} program.
Although you can change @code{ENVIRON["AWKLIBPATH"]} within your Although you can change @code{ENVIRON["AWKLIBPATH"]} within your
@command{awk} program, this has no effect on the running program's @command{awk} program, this has no effect on the running program's
skipping to change at page 110, line ? skipping to change at page 110, line ?
to @code{EXIT_FAILURE}. to @code{EXIT_FAILURE}.
@node Include Files @node Include Files
@section Including Other Files into Your Program @section Including Other Files into Your Program
@c Panos Papadopoulos <panos1962@gmail.com> contributed the original @c Panos Papadopoulos <panos1962@gmail.com> contributed the original
@c text for this section. @c text for this section.
This @value{SECTION} describes a feature that is specific to @command{gawk}. This @value{SECTION} describes a feature that is specific to @command{gawk}.
@cindex @code{@@include} directive @cindex @code{@@} (at-sign) @subentry @code{@@include} directive
@cindex at-sign (@code{@@}) @subentry @code{@@include} directive
@cindex file inclusion, @code{@@include} directive @cindex file inclusion, @code{@@include} directive
@cindex including files, @code{@@include} directive @cindex including files, @code{@@include} directive
@cindex @code{@@include} directive @sortas{include directive}
The @code{@@include} keyword can be used to read external @command{awk} source The @code{@@include} keyword can be used to read external @command{awk} source
files. This gives you the ability to split large @command{awk} source files files. This gives you the ability to split large @command{awk} source files
into smaller, more manageable pieces, and also lets you reuse common @command{awk} into smaller, more manageable pieces, and also lets you reuse common @command{awk}
code from various @command{awk} scripts. In other words, you can group code from various @command{awk} scripts. In other words, you can group
together @command{awk} functions used to carry out specific tasks together @command{awk} functions used to carry out specific tasks
into external files. These files can be used just like function libraries, into external files. These files can be used just like function libraries,
using the @code{@@include} keyword in conjunction with the @env{AWKPATH} using the @code{@@include} keyword in conjunction with the @env{AWKPATH}
environment variable. Note that source files may also be included environment variable. Note that source files may also be included
using the @option{-i} option. using the @option{-i} option.
skipping to change at page 110, line ? skipping to change at page 110, line ?
Finally, files included with @code{@@include} Finally, files included with @code{@@include}
are treated as if they had @samp{@@namespace "awk"} are treated as if they had @samp{@@namespace "awk"}
at their beginning. @xref{Changing The Namespace}, for more information. at their beginning. @xref{Changing The Namespace}, for more information.
@node Loading Shared Libraries @node Loading Shared Libraries
@section Loading Dynamic Extensions into Your Program @section Loading Dynamic Extensions into Your Program
This @value{SECTION} describes a feature that is specific to @command{gawk}. This @value{SECTION} describes a feature that is specific to @command{gawk}.
@cindex @code{@@load} directive @cindex @code{@@} (at-sign) @subentry @code{@@load} directive
@cindex loading extensions, @code{@@load} directive @cindex at-sign (@code{@@}) @subentry @code{@@load} directive
@cindex extensions, loading, @code{@@load} directive @cindex loading extensions @subentry @code{@@load} directive
@cindex extensions @subentry loadable @subentry loading, @code{@@load} directive
@cindex @code{@@load} directive @sortas{load directive}
The @code{@@load} keyword can be used to read external @command{awk} extensions The @code{@@load} keyword can be used to read external @command{awk} extensions
(stored as system shared libraries). (stored as system shared libraries).
This allows you to link in compiled code that may offer superior This allows you to link in compiled code that may offer superior
performance and/or give you access to extended capabilities not supported performance and/or give you access to extended capabilities not supported
by the @command{awk} language. The @env{AWKLIBPATH} variable is used to by the @command{awk} language. The @env{AWKLIBPATH} variable is used to
search for the extension. Using @code{@@load} is completely equivalent search for the extension. Using @code{@@load} is completely equivalent
to using the @option{-l} command-line option. to using the @option{-l} command-line option.
If the extension is not initially found in @env{AWKLIBPATH}, another If the extension is not initially found in @env{AWKLIBPATH}, another
search is conducted after appending the platform's default shared library search is conducted after appending the platform's default shared library
skipping to change at page 110, line ? skipping to change at page 110, line ?
@ref{Dynamic Extensions}, describes how to write extensions (in C or C++) @ref{Dynamic Extensions}, describes how to write extensions (in C or C++)
that can be loaded with either @code{@@load} or the @option{-l} option. that can be loaded with either @code{@@load} or the @option{-l} option.
It also describes the @code{ordchr} extension. It also describes the @code{ordchr} extension.
@node Obsolete @node Obsolete
@section Obsolete Options and/or Features @section Obsolete Options and/or Features
@c update this section for each release! @c update this section for each release!
@cindex options, deprecated @cindex options @subentry deprecated
@cindex features, deprecated @cindex features @subentry deprecated
@cindex obsolete features @cindex obsolete features
This @value{SECTION} describes features and/or command-line options from This @value{SECTION} describes features and/or command-line options from
previous releases of @command{gawk} that either are not available in the previous releases of @command{gawk} that either are not available in the
current version or are still supported but deprecated (meaning that current version or are still supported but deprecated (meaning that
they will @emph{not} be in the next release). they will @emph{not} be in the next release).
The process-related special files @file{/dev/pid}, @file{/dev/ppid}, The process-related special files @file{/dev/pid}, @file{/dev/ppid},
@file{/dev/pgrpid}, and @file{/dev/user} were deprecated in @command{gawk} @file{/dev/pgrpid}, and @file{/dev/user} were deprecated in @command{gawk}
3.1, but still worked. As of @value{PVERSION} 4.0, they are no longer 3.1, but still worked. As of @value{PVERSION} 4.0, they are no longer
interpreted specially by @command{gawk}. (Use @code{PROCINFO} instead; interpreted specially by @command{gawk}. (Use @code{PROCINFO} instead;
skipping to change at page 110, line ? skipping to change at page 110, line ?
@ignore @ignore
This @value{SECTION} This @value{SECTION}
is thus essentially a place holder, is thus essentially a place holder,
in case some option becomes obsolete in a future version of @command{gawk}. in case some option becomes obsolete in a future version of @command{gawk}.
@end ignore @end ignore
@node Undocumented @node Undocumented
@section Undocumented Options and Features @section Undocumented Options and Features
@cindex undocumented features @cindex undocumented features
@cindex features, undocumented @cindex features @subentry undocumented
@cindex Skywalker, Luke @cindex Skywalker, Luke
@cindex Kenobi, Obi-Wan @cindex Kenobi, Obi-Wan
@cindex jedi knights @cindex jedi knights
@cindex knights, jedi @cindex knights, jedi
@quotation @quotation
@i{Use the Source, Luke!} @i{Use the Source, Luke!}
@author Obi-Wan @author Obi-Wan
@end quotation @end quotation
@cindex shells, sea @cindex shells @subentry sea
This @value{SECTION} intentionally left This @value{SECTION} intentionally left
blank. blank.
@ignore @ignore
@c If these came out in the Info file or TeX document, then they wouldn't @c If these came out in the Info file or TeX document, then they wouldn't
@c be undocumented, would they? @c be undocumented, would they?
@command{gawk} has one undocumented option: @command{gawk} has one undocumented option:
@table @code @table @code
skipping to change at page 110, line ? skipping to change at page 110, line ?
@error{} gawk: cmd. line:11: BEGIN @{ print("hi" @} @error{} gawk: cmd. line:11: BEGIN @{ print("hi" @}
@error{} gawk: cmd. line:11: ^ syntax error @error{} gawk: cmd. line:11: ^ syntax error
@end example @end example
@end ignore @end ignore
@node Invoking Summary @node Invoking Summary
@section Summary @section Summary
@itemize @value{BULLET} @itemize @value{BULLET}
@c From Neil R. Ormos
@item @item
Use either @command{gawk} parses arguments on the command line, left to right, to
@samp{awk '@var{program}' @var{files}} determine if they should be treated as options or as non-option arguments.
or
@samp{awk -f @var{program-file} @var{files}} @item
to run @command{awk}. @command{gawk} recognizes several options which control its operation,
as described in @ref{Options}. All options begin with @samp{-}.
@item
Any argument that is not recognized as an option is treated as a
non-option argument, even if it begins with @samp{-}.
@itemize @value{MINUS}
@item
However, when an option itself requires an argument, and the option is separated
from that argument on the command line by at least one space, the space
is ignored, and the argument is considered to be related to the option. Thus, in
the invocation, @samp{gawk -F x}, the @samp{x} is treated as belonging to the
@option{-F} option, not as a separate non-option argument.
@end itemize
@item
Once @command{gawk} finds a non-option argument, it stops looking for
options. Therefore, all following arguments are also non-option arguments,
even if they resemble recognized options.
@item
If no @option{-e} or @option{-f} options are present, @command{gawk}
expects the program text to be in the first non-option argument.
@item
All non-option arguments, except program text provided in the first
non-option argument, are placed in @code{ARGV} as explained in
@ref{ARGC and ARGV}, and are processed as described in @ref{Other Arguments}.
@c And I wrote:
Adjusting @code{ARGC} and @code{ARGV}
affects how @command{awk} processes input.
@c ----------------------------------------
@item @item
The three standard options for all versions of @command{awk} are The three standard options for all versions of @command{awk} are
@option{-f}, @option{-F}, and @option{-v}. @command{gawk} supplies these @option{-f}, @option{-F}, and @option{-v}. @command{gawk} supplies these
and many others, as well as corresponding GNU-style long options. and many others, as well as corresponding GNU-style long options.
@item @item
Nonoption command-line arguments are usually treated as @value{FN}s, Nonoption command-line arguments are usually treated as @value{FN}s,
unless they have the form @samp{@var{var}=@var{value}}, in which case unless they have the form @samp{@var{var}=@var{value}}, in which case
they are taken as variable assignments to be performed at that point they are taken as variable assignments to be performed at that point
in processing the input. in processing the input.
@item @item
All nonoption command-line arguments, excluding the program text,
are placed in the @code{ARGV} array. Adjusting @code{ARGC} and @code{ARGV}
affects how @command{awk} processes input.
@item
You can use a single minus sign (@samp{-}) to refer to standard input You can use a single minus sign (@samp{-}) to refer to standard input
on the command line. @command{gawk} also lets you use the special on the command line. @command{gawk} also lets you use the special
@value{FN} @file{/dev/stdin}. @value{FN} @file{/dev/stdin}.
@item @item
@command{gawk} pays attention to a number of environment variables. @command{gawk} pays attention to a number of environment variables.
@env{AWKPATH}, @env{AWKLIBPATH}, and @env{POSIXLY_CORRECT} are the @env{AWKPATH}, @env{AWKLIBPATH}, and @env{POSIXLY_CORRECT} are the
most important ones. most important ones.
@item @item
skipping to change at page 110, line ? skipping to change at page 110, line ?
@node Regexp @node Regexp
@chapter Regular Expressions @chapter Regular Expressions
@cindex regexp @cindex regexp
@cindex regular expressions @cindex regular expressions
A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a
set of strings. set of strings.
Because regular expressions are such a fundamental part of @command{awk} Because regular expressions are such a fundamental part of @command{awk}
programming, their format and use deserve a separate @value{CHAPTER}. programming, their format and use deserve a separate @value{CHAPTER}.
@cindex forward slash (@code{/}) to enclose regular expressions @cindex forward slash (@code{/}) @subentry to enclose regular expressions
@cindex @code{/} (forward slash) to enclose regular expressions @cindex @code{/} (forward slash) @subentry to enclose regular expressions
A regular expression enclosed in slashes (@samp{/}) A regular expression enclosed in slashes (@samp{/})
is an @command{awk} pattern that matches every input record whose text is an @command{awk} pattern that matches every input record whose text
belongs to that set. belongs to that set.
The simplest regular expression is a sequence of letters, numbers, or The simplest regular expression is a sequence of letters, numbers, or
both. Such a regexp matches any string that contains that sequence. both. Such a regexp matches any string that contains that sequence.
Thus, the regexp @samp{foo} matches any string containing @samp{foo}. Thus, the regexp @samp{foo} matches any string containing @samp{foo}.
Thus, the pattern @code{/foo/} matches any input record containing Thus, the pattern @code{/foo/} matches any input record containing
the three adjacent characters @samp{foo} @emph{anywhere} in the record. Other the three adjacent characters @samp{foo} @emph{anywhere} in the record. Other
kinds of regexps let you specify more complicated classes of strings. kinds of regexps let you specify more complicated classes of strings.
skipping to change at page 110, line ? skipping to change at page 110, line ?
* Leftmost Longest:: How much text matches. * Leftmost Longest:: How much text matches.
* Computed Regexps:: Using Dynamic Regexps. * Computed Regexps:: Using Dynamic Regexps.
* GNU Regexp Operators:: Operators specific to GNU software. * GNU Regexp Operators:: Operators specific to GNU software.
* Case-sensitivity:: How to do case-insensitive matching. * Case-sensitivity:: How to do case-insensitive matching.
* Regexp Summary:: Regular expressions summary. * Regexp Summary:: Regular expressions summary.
@end menu @end menu
@node Regexp Usage @node Regexp Usage
@section How to Use Regular Expressions @section How to Use Regular Expressions
@cindex patterns, regular expressions as @cindex patterns @subentry regexp constants as
@cindex regular expressions, as patterns @cindex regular expressions @subentry as patterns
A regular expression can be used as a pattern by enclosing it in A regular expression can be used as a pattern by enclosing it in
slashes. Then the regular expression is tested against the slashes. Then the regular expression is tested against the
entire text of each record. (Normally, it only needs entire text of each record. (Normally, it only needs
to match some part of the text in order to succeed.) For example, the to match some part of the text in order to succeed.) For example, the
following prints the second field of each record where the string following prints the second field of each record where the string
@samp{li} appears anywhere in the record: @samp{li} appears anywhere in the record:
@example @example
$ @kbd{awk '/li/ @{ print $2 @}' mail-list} $ @kbd{awk '/li/ @{ print $2 @}' mail-list}
@print{} 555-5553 @print{} 555-5553
@print{} 555-0542 @print{} 555-0542
@print{} 555-6699 @print{} 555-6699
@print{} 555-3430 @print{} 555-3430
@end example @end example
@cindex regular expressions, operators @cindex regular expressions @subentry operators
@cindex operators, string-matching @cindex operators @subentry string-matching
@c @cindex operators, @code{~} @c @cindex operators, @code{~}
@cindex string-matching operators @cindex string-matching operators
@cindex @code{~} (tilde), @code{~} operator @cindex @code{~} (tilde), @code{~} operator
@cindex tilde (@code{~}), @code{~} operator @cindex tilde (@code{~}), @code{~} operator
@cindex @code{!} (exclamation point), @code{!~} operator @cindex @code{!} (exclamation point) @subentry @code{!~} operator
@cindex exclamation point (@code{!}), @code{!~} operator @cindex exclamation point (@code{!}) @subentry @code{!~} operator
@c @cindex operators, @code{!~} @c @cindex operators, @code{!~}
@cindex @code{if} statement, use of regexps in @cindex @code{if} statement @subentry use of regexps in
@cindex @code{while} statement, use of regexps in @cindex @code{while} statement @subentry use of regexps in
@cindex @code{do}-@code{while} statement, use of regexps in @cindex @code{do}-@code{while} statement @subentry use of regexps in
@c @cindex statements, @code{if} @c @cindex statements, @code{if}
@c @cindex statements, @code{while} @c @cindex statements, @code{while}
@c @cindex statements, @code{do} @c @cindex statements, @code{do}
Regular expressions can also be used in matching expressions. These Regular expressions can also be used in matching expressions. These
expressions allow you to specify the string to match against; it need expressions allow you to specify the string to match against; it need
not be the entire current input record. The two operators @samp{~} not be the entire current input record. The two operators @samp{~}
and @samp{!~} perform regular expression comparisons. Expressions and @samp{!~} perform regular expression comparisons. Expressions
using these operators can be used as patterns, or in @code{if}, using these operators can be used as patterns, or in @code{if},
@code{while}, @code{for}, and @code{do} statements. @code{while}, @code{for}, and @code{do} statements.
(@xref{Statements}.) (@xref{Statements}.)
skipping to change at page 110, line ? skipping to change at page 110, line ?
@example @example
$ @kbd{awk '$1 !~ /J/' inventory-shipped} $ @kbd{awk '$1 !~ /J/' inventory-shipped}
@print{} Feb 15 32 24 226 @print{} Feb 15 32 24 226
@print{} Mar 15 24 34 228 @print{} Mar 15 24 34 228
@print{} Apr 31 52 63 420 @print{} Apr 31 52 63 420
@print{} May 16 34 29 208 @print{} May 16 34 29 208
@dots{} @dots{}
@end example @end example
@cindex regexp constants @cindex regexp constants
@cindex constant regexps @cindex constants @subentry regexp
@cindex regular expressions, constants, See regexp constants @cindex regular expressions, constants @seeentry{regexp constants}
When a regexp is enclosed in slashes, such as @code{/foo/}, we call it When a regexp is enclosed in slashes, such as @code{/foo/}, we call it
a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and
@code{"foo"} is a string constant. @code{"foo"} is a string constant.
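The matching operators described above can also be used inside an
@code{if} statement rather than as a pattern. The following is a small
sketch of that usage; the three input records and the shell variable
@code{matches} are invented for the illustration and are not taken from
the sample @value{DF}s:

```shell
# Feed three made-up records to awk and test the first field of each
# one against the regexp constant /J/ inside an if statement.
matches=$(printf 'Jan 13\nFeb 15\nJun 31\n' |
  awk '{
         if ($1 ~ /J/)
             print $1, "matches"
         else
             print $1, "does not match"
       }')

echo "$matches"
```

Replacing @samp{~} with @samp{!~} in the condition swaps which branch
each record takes.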
@node Escape Sequences @node Escape Sequences
@section Escape Sequences @section Escape Sequences
@cindex escape sequences, in strings @cindex escape sequences
@cindex backslash (@code{\}), in escape sequences @cindex escape sequences @seealso{backslash}
@cindex @code{\} (backslash), in escape sequences @cindex backslash (@code{\}) @subentry in escape sequences
@cindex @code{\} (backslash) @subentry in escape sequences
Some characters cannot be included literally in string constants Some characters cannot be included literally in string constants
(@code{"foo"}) or regexp constants (@code{/foo/}). (@code{"foo"}) or regexp constants (@code{/foo/}).
Instead, they should be represented with @dfn{escape sequences}, Instead, they should be represented with @dfn{escape sequences},
which are character sequences beginning with a backslash (@samp{\}). which are character sequences beginning with a backslash (@samp{\}).
One use of an escape sequence is to include a double-quote character in One use of an escape sequence is to include a double-quote character in
a string constant. Because a plain double quote ends the string, you a string constant. Because a plain double quote ends the string, you
must use @samp{\"} to represent an actual double-quote character as a must use @samp{\"} to represent an actual double-quote character as a
part of the string. For example: part of the string. For example:
@example @example
skipping to change at page 110, line ? skipping to change at page 110, line ?
Other escape sequences represent unprintable characters Other escape sequences represent unprintable characters
such as TAB or newline. There is nothing to stop you from entering most such as TAB or newline. There is nothing to stop you from entering most
unprintable characters directly in a string constant or regexp constant, unprintable characters directly in a string constant or regexp constant,
but they may look ugly. but they may look ugly.
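As a brief sketch of this, the following fragment uses the @samp{\t}
escape sequence once in a string constant and once in a regexp constant.
The input records and the shell variables @code{tabbed} and
@code{with_tab} are made up for the illustration:

```shell
# "\t" inside the awk string constant becomes a real TAB character.
tabbed=$(awk 'BEGIN { print "name\tvalue" }')
echo "$tabbed"

# The same escape works in a regexp constant: print the first field of
# every record that contains a TAB.  Only the first record has one.
with_tab=$(printf 'a\tb\nc d\n' | awk '/\t/ { print $1 }')
echo "$with_tab"
```

Writing the escape sequence keeps the TAB visible in the program text,
which a literal TAB typed into the source would not be.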
The following list presents The following list presents
all the escape sequences used in @command{awk} and all the escape sequences used in @command{awk} and
what they represent. Unless noted otherwise, all these escape what they represent. Unless noted otherwise, all these escape
sequences apply to both string constants and regexp constants: sequences apply to both string constants and regexp constants:
@cindex ASCII
@table @code @table @code
@item \\ @item \\
A literal backslash, @samp{\}. A literal backslash, @samp{\}.
@c @cindex @command{awk} language, V.4 version @c @cindex @command{awk} language, V.4 version
@cindex @code{\} (backslash), @code{\a} escape sequence @cindex @code{\} (backslash) @subentry @code{\a} escape sequence
@cindex backslash (@code{\}), @code{\a} escape sequence @cindex backslash (@code{\}) @subentry @code{\a} escape sequence
@item \a @item \a
The ``alert'' character, @kbd{Ctrl-g}, ASCII code 7 (BEL). The ``alert'' character, @kbd{Ctrl-g}, ASCII code 7 (BEL).
(This often makes some sort of audible noise.) (This often makes some sort of audible noise.)
@cindex @code{\} (backslash), @code{\b} escape sequence @cindex @code{\} (backslash) @subentry @code{\b} escape sequence
@cindex backslash (@code{\}), @code{\b} escape sequence @cindex backslash (@code{\}) @subentry @code{\b} escape sequence
@item \b @item \b
Backspace, @kbd{Ctrl-h}, ASCII code 8 (BS). Backspace, @kbd{Ctrl-h}, ASCII code 8 (BS).
@cindex @code{\} (backslash), @code{\f} escape sequence @cindex @code{\} (backslash) @subentry @code{\f} escape sequence
@cindex backslash (@code{\}), @code{\f} escape sequence @cindex backslash (@code{\}) @subentry @code{\f} escape sequence
@item \f @item \f
Formfeed, @kbd{Ctrl-l}, ASCII code 12 (FF). Formfeed, @kbd{Ctrl-l}, ASCII code 12 (FF).
@cindex @code{\} (backslash), @code{\n} escape sequence @cindex @code{\} (backslash) @subentry @code{\n} escape sequence
@cindex backslash (@code{\}), @code{\n} escape sequence @cindex backslash (@code{\}) @subentry @code{\n} escape sequence
@item \n @item \n
Newline, @kbd{Ctrl-j}, ASCII code 10 (LF). Newline, @kbd{Ctrl-j}, ASCII code 10 (LF).
@cindex @code{\} (backslash), @code{\r} escape sequence @cindex @code{\} (backslash) @subentry @code{\r} escape sequence
@cindex backslash (@code{\}), @code{\r} escape sequence @cindex backslash (@code{\}) @subentry @code{\r} escape sequence
@item \r @item \r
Carriage return, @kbd{Ctrl-m}, ASCII code 13 (CR). Carriage return, @kbd{Ctrl-m}, ASCII code 13 (CR).
@cindex @code{\} (backslash), @code{\t} escape sequence @cindex @code{\} (backslash) @subentry @code{\t} escape sequence
@cindex backslash (@code{\}), @code{\t} escape sequence @cindex backslash (@code{\}) @subentry @code{\t} escape sequence
@item \t @item \t
Horizontal TAB, @kbd{Ctrl-i}, ASCII code 9 (HT). Horizontal TAB, @kbd{Ctrl-i}, ASCII code 9 (HT).
@c @cindex @command{awk} language, V.4 version @c @cindex @command{awk} language, V.4 version
@cindex @code{\} (backslash), @code{\v} escape sequence @cindex @code{\} (backslash) @subentry @code{\v} escape sequence
@cindex backslash (@code{\}), @code{\v} escape sequence @cindex backslash (@code{\}) @subentry @code{\v} escape sequence
@item \v @item \v
Vertical TAB, @kbd{Ctrl-k}, ASCII code 11 (VT). Vertical TAB, @kbd{Ctrl-k}, ASCII code 11 (VT).
@cindex @code{\} (backslash), @code{\}@var{nnn} escape sequence @cindex @code{\} (backslash) @subentry @code{\}@var{nnn} escape sequence
@cindex backslash (@code{\}), @code{\}@var{nnn} escape sequence @cindex backslash (@code{\}) @subentry @code{\}@var{nnn} escape sequence
@item \@var{nnn} @item \@var{nnn}
The octal value @var{nnn}, where @var{nnn} stands for 1 to 3 digits The octal value @var{nnn}, where @var{nnn} stands for 1 to 3 digits
between @samp{0} and @samp{7}. For example, the code for the ASCII ESC between @samp{0} and @samp{7}. For example, the code for the ASCII ESC
(escape) character is @samp{\033}. (escape) character is @samp{\033}.
@c @cindex @command{awk} language, V.4 version @c @cindex @command{awk} language, V.4 version
@c @cindex @command{awk} language, POSIX version @c @cindex @command{awk} language, POSIX version
@cindex @code{\} (backslash), @code{\x} escape sequence @cindex @code{\} (backslash) @subentry @code{\x} escape sequence
@cindex backslash (@code{\}), @code{\x} escape sequence @cindex backslash (@code{\}) @subentry @code{\x} escape sequence
@cindex common extensions, @code{\x} escape sequence @cindex common extensions @subentry @code{\x} escape sequence
@cindex extensions, common@comma{} @code{\x} escape sequence @cindex extensions @subentry common @subentry @code{\x} escape sequence
@item \x@var{hh}@dots{} @item \x@var{hh}@dots{}
The hexadecimal value @var{hh}, where @var{hh} stands for a sequence The hexadecimal value @var{hh}, where @var{hh} stands for a sequence
of hexadecimal digits (@samp{0}--@samp{9}, and either @samp{A}--@samp{F} of hexadecimal digits (@samp{0}--@samp{9}, and either @samp{A}--@samp{F}
or @samp{a}--@samp{f}). A maximum of two digits are allowed after or @samp{a}--@samp{f}). A maximum of two digits are allowed after
the @samp{\x}. Any further hexadecimal digits are treated as simple the @samp{\x}. Any further hexadecimal digits are treated as simple
letters or numbers. @value{COMMONEXT} letters or numbers. @value{COMMONEXT}
(The @samp{\x} escape sequence is not allowed in POSIX awk.) (The @samp{\x} escape sequence is not allowed in POSIX awk.)
@quotation CAUTION @quotation CAUTION
In ISO C, the escape sequence continues until the first nonhexadecimal In ISO C, the escape sequence continues until the first nonhexadecimal
digit is seen. digit is seen.
For many years, @command{gawk} would continue incorporating For many years, @command{gawk} would continue incorporating
hexadecimal digits into the value until a non-hexadecimal digit hexadecimal digits into the value until a non-hexadecimal digit
or the end of the string was encountered. or the end of the string was encountered.
However, using more than two hexadecimal digits produced However, using more than two hexadecimal digits produced
undefined results. undefined results.
As of @value{PVERSION} 4.2, only two digits As of @value{PVERSION} 4.2, only two digits
are processed. are processed.
@end quotation @end quotation
@cindex @code{\} (backslash), @code{\/} escape sequence @cindex @code{\} (backslash) @subentry @code{\/} escape sequence
@cindex backslash (@code{\}), @code{\/} escape sequence @cindex backslash (@code{\}) @subentry @code{\/} escape sequence
@item \/ @item \/
A literal slash (necessary for regexp constants only). A literal slash (should be used for regexp constants only).
This sequence is used when you want to write a regexp This sequence is used when you want to write a regexp
constant that contains a slash constant that contains a slash
(such as @code{/.*:\/home\/[[:alnum:]]+:.*/}; the @samp{[[:alnum:]]} (such as @code{/.*:\/home\/[[:alnum:]]+:.*/}; the @samp{[[:alnum:]]}
notation is discussed in @ref{Bracket Expressions}). notation is discussed in @ref{Bracket Expressions}).
Because the regexp is delimited by Because the regexp is delimited by
slashes, you need to escape any slash that is part of the pattern, slashes, you need to escape any slash that is part of the pattern,
in order to tell @command{awk} to keep processing the rest of the regexp. in order to tell @command{awk} to keep processing the rest of the regexp.
@cindex @code{\} (backslash), @code{\"} escape sequence @cindex @code{\} (backslash) @subentry @code{\"} escape sequence
@cindex backslash (@code{\}), @code{\"} escape sequence @cindex backslash (@code{\}) @subentry @code{\"} escape sequence
@item \" @item \"
A literal double quote (necessary for string constants only). A literal double quote (should be used for string constants only).
This sequence is used when you want to write a string This sequence is used when you want to write a string
constant that contains a double quote constant that contains a double quote
(such as @code{"He said \"hi!\" to her."}). (such as @code{"He said \"hi!\" to her."}).
Because the string is delimited by Because the string is delimited by
double quotes, you need to escape any quote that is part of the string, double quotes, you need to escape any quote that is part of the string,
in order to tell @command{awk} to keep processing the rest of the string. in order to tell @command{awk} to keep processing the rest of the string.
@end table @end table
In @command{gawk}, a number of additional two-character sequences that begin In @command{gawk}, a number of additional two-character sequences that begin
with a backslash have special meaning in regexps. with a backslash have special meaning in regexps.
@xref{GNU Regexp Operators}. @xref{GNU Regexp Operators}.
In a regexp, a backslash before any character that is not in the previous list In a regexp, a backslash before any character that is not in the previous list
and not listed in and not listed in
@ref{GNU Regexp Operators} @ref{GNU Regexp Operators}
means that the next character should be taken literally, even if it would means that the next character should be taken literally, even if it would
normally be a regexp operator. For example, @code{/a\+b/} matches the three normally be a regexp operator. For example, @code{/a\+b/} matches the three
characters @samp{a+b}. characters @samp{a+b}.
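As a minimal sketch of this rule (the three sample lines are invented for illustration), the following shell snippet runs the same input through an escaped and an unescaped plus sign. It assumes only that some @command{awk} is on the search path:

```shell
# Hypothetical sample input: three short lines.
# With the backslash, "+" is a literal plus, so only "a+b" is printed.
literal=$(printf 'a+b\naab\nab\n' | awk '/a\+b/')
# Without it, "a+" means one or more "a"s, so "aab" and "ab" are printed
# and the line "a+b" is not.
repeat=$(printf 'a+b\naab\nab\n' | awk '/a+b/')
printf 'escaped plus matched:\n%s\n' "$literal"
printf 'unescaped plus matched:\n%s\n' "$repeat"
```

The two results are disjoint, which is why a forgotten backslash tends to go unnoticed until the data contains the operator character.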
@cindex backslash (@code{\}), in escape sequences @cindex backslash (@code{\}) @subentry in escape sequences
@cindex @code{\} (backslash), in escape sequences @cindex @code{\} (backslash) @subentry in escape sequences
@cindex portability @cindex portability
For complete portability, use a backslash only before a character that is For complete portability, use a backslash only before a character that is
shown in the previous list or that is a regexp operator. shown in the previous list or that is a regexp operator.
@c 11/2014: Moved so as to not stack sidebars @c 11/2014: Moved so as to not stack sidebars
@cindex sidebar, Backslash Before Regular Characters @cindex sidebar @subentry Backslash Before Regular Characters
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>Backslash Before Regular Characters</title> <sidebar><title>Backslash Before Regular Characters</title>
@end docbook @end docbook
@cindex portability, backslash in escape sequences @cindex portability @subentry backslash in escape sequences
@cindex POSIX @command{awk}, backslashes in string constants @cindex POSIX @command{awk} @subentry backslashes in string constants
@cindex backslash (@code{\}), in escape sequences, POSIX and @cindex backslash (@code{\}) @subentry in escape sequences @subentry POSIX and
@cindex @code{\} (backslash), in escape sequences, POSIX and @cindex @code{\} (backslash) @subentry in escape sequences @subentry POSIX and
@cindex troubleshooting, backslash before nonspecial character @cindex troubleshooting @subentry backslash before nonspecial character
If you place a backslash in a string constant before something that is If you place a backslash in a string constant before something that is
not one of the characters previously listed, POSIX @command{awk} purposely not one of the characters previously listed, POSIX @command{awk} purposely
leaves what happens as undefined. There are two choices: leaves what happens as undefined. There are two choices:
@c @cindex automatic warnings @c @cindex automatic warnings
@c @cindex warnings, automatic @c @cindex warnings, automatic
@cindex Brian Kernighan's @command{awk} @cindex Brian Kernighan's @command{awk}
@table @asis @table @asis
@item Strip the backslash out @item Strip the backslash out
This is what BWK @command{awk} and @command{gawk} both do. This is what BWK @command{awk} and @command{gawk} both do.
For example, @code{"a\qc"} is the same as @code{"aqc"}. For example, @code{"a\qc"} is the same as @code{"aqc"}.
(Because this is such an easy bug both to introduce and to miss, (Because this is such an easy bug both to introduce and to miss,
@command{gawk} warns you about it.) @command{gawk} warns you about it.)
Consider @samp{FS = @w{"[ \t]+\|[ \t]+"}} to use vertical bars Consider @samp{FS = @w{"[ \t]+\|[ \t]+"}} to use vertical bars
surrounded by whitespace as the field separator. There should be surrounded by whitespace as the field separator. There should be
two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}. two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.
@c I did this! This is why I added the warning. @c I did this! This is why I added the warning.
@cindex @command{gawk}, escape sequences @cindex @command{gawk} @subentry escape sequences
@cindex Unix @command{awk}, backslashes in escape sequences @cindex @command{gawk} @subentry escape sequences @seealso{backslash}
@cindex Unix @command{awk} @subentry backslashes in escape sequences
@cindex @command{mawk} utility @cindex @command{mawk} utility
@item Leave the backslash alone @item Leave the backslash alone
Some other @command{awk} implementations do this. Some other @command{awk} implementations do this.
In such implementations, typing @code{"a\qc"} is the same as typing In such implementations, typing @code{"a\qc"} is the same as typing
@code{"a\\qc"}. @code{"a\\qc"}.
@end table @end table
@docbook @docbook
</sidebar> </sidebar>
@end docbook @end docbook
@end ifdocbook @end ifdocbook
@ifnotdocbook @ifnotdocbook
@cartouche @cartouche
@center @b{Backslash Before Regular Characters} @center @b{Backslash Before Regular Characters}
@cindex portability, backslash in escape sequences @cindex portability @subentry backslash in escape sequences
@cindex POSIX @command{awk}, backslashes in string constants @cindex POSIX @command{awk} @subentry backslashes in string constants
@cindex backslash (@code{\}), in escape sequences, POSIX and @cindex backslash (@code{\}) @subentry in escape sequences @subentry POSIX and
@cindex @code{\} (backslash), in escape sequences, POSIX and @cindex @code{\} (backslash) @subentry in escape sequences @subentry POSIX and
@cindex troubleshooting, backslash before nonspecial character @cindex troubleshooting @subentry backslash before nonspecial character
If you place a backslash in a string constant before something that is If you place a backslash in a string constant before something that is
not one of the characters previously listed, POSIX @command{awk} purposely not one of the characters previously listed, POSIX @command{awk} purposely
leaves what happens as undefined. There are two choices: leaves what happens as undefined. There are two choices:
@c @cindex automatic warnings @c @cindex automatic warnings
@c @cindex warnings, automatic @c @cindex warnings, automatic
@cindex Brian Kernighan's @command{awk} @cindex Brian Kernighan's @command{awk}
@table @asis @table @asis
@item Strip the backslash out @item Strip the backslash out
This is what BWK @command{awk} and @command{gawk} both do. This is what BWK @command{awk} and @command{gawk} both do.
For example, @code{"a\qc"} is the same as @code{"aqc"}. For example, @code{"a\qc"} is the same as @code{"aqc"}.
(Because this is such an easy bug both to introduce and to miss, (Because this is such an easy bug both to introduce and to miss,
@command{gawk} warns you about it.) @command{gawk} warns you about it.)
Consider @samp{FS = @w{"[ \t]+\|[ \t]+"}} to use vertical bars Consider @samp{FS = @w{"[ \t]+\|[ \t]+"}} to use vertical bars
surrounded by whitespace as the field separator. There should be surrounded by whitespace as the field separator. There should be
two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}. two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.
@c I did this! This is why I added the warning. @c I did this! This is why I added the warning.
@cindex @command{gawk}, escape sequences @cindex @command{gawk} @subentry escape sequences
@cindex Unix @command{awk}, backslashes in escape sequences @cindex @command{gawk} @subentry escape sequences @seealso{backslash}
@cindex Unix @command{awk} @subentry backslashes in escape sequences
@cindex @command{mawk} utility @cindex @command{mawk} utility
@item Leave the backslash alone @item Leave the backslash alone
Some other @command{awk} implementations do this. Some other @command{awk} implementations do this.
In such implementations, typing @code{"a\qc"} is the same as typing In such implementations, typing @code{"a\qc"} is the same as typing
@code{"a\\qc"}. @code{"a\\qc"}.
@end table @end table
@end cartouche @end cartouche
@end ifnotdocbook @end ifnotdocbook
To summarize: To summarize:
skipping to change at page 110, line ? skipping to change at page 110, line ?
@command{gawk} processes both regexp constants and dynamic regexps @command{gawk} processes both regexp constants and dynamic regexps
(@pxref{Computed Regexps}), (@pxref{Computed Regexps}),
for the special operators listed in for the special operators listed in
@ref{GNU Regexp Operators}. @ref{GNU Regexp Operators}.
@item @item
A backslash before any other character means to treat that character A backslash before any other character means to treat that character
literally. literally.
@end itemize @end itemize
@cindex sidebar, Escape Sequences for Metacharacters @cindex sidebar @subentry Escape Sequences for Metacharacters
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>Escape Sequences for Metacharacters</title> <sidebar><title>Escape Sequences for Metacharacters</title>
@end docbook @end docbook
@cindex metacharacters, escape sequences for @cindex metacharacters @subentry escape sequences for
Suppose you use an octal or hexadecimal Suppose you use an octal or hexadecimal
escape to represent a regexp metacharacter. escape to represent a regexp metacharacter.
(See @ref{Regexp Operators}.) (See @ref{Regexp Operators}.)
Does @command{awk} treat the character as a literal character or as a regexp Does @command{awk} treat the character as a literal character or as a regexp
operator? operator?
@cindex dark corner, escape sequences, for metacharacters @cindex dark corner @subentry escape sequences @subentry for metacharacters
Historically, such characters were taken literally. Historically, such characters were taken literally.
@value{DARKCORNER} @value{DARKCORNER}
However, the POSIX standard indicates that they should be treated However, the POSIX standard indicates that they should be treated
as real metacharacters, which is what @command{gawk} does. as real metacharacters, which is what @command{gawk} does.
In compatibility mode (@pxref{Options}), In compatibility mode (@pxref{Options}),
@command{gawk} treats the characters represented by octal and hexadecimal @command{gawk} treats the characters represented by octal and hexadecimal
escape sequences literally when used in regexp constants. Thus, escape sequences literally when used in regexp constants. Thus,
@code{/a\52b/} is equivalent to @code{/a\*b/}. @code{/a\52b/} is equivalent to @code{/a\*b/}.
@docbook @docbook
</sidebar> </sidebar>
@end docbook @end docbook
@end ifdocbook @end ifdocbook
@ifnotdocbook @ifnotdocbook
@cartouche @cartouche
@center @b{Escape Sequences for Metacharacters} @center @b{Escape Sequences for Metacharacters}
@cindex metacharacters, escape sequences for @cindex metacharacters @subentry escape sequences for
Suppose you use an octal or hexadecimal Suppose you use an octal or hexadecimal
escape to represent a regexp metacharacter. escape to represent a regexp metacharacter.
(See @ref{Regexp Operators}.) (See @ref{Regexp Operators}.)
Does @command{awk} treat the character as a literal character or as a regexp Does @command{awk} treat the character as a literal character or as a regexp
operator? operator?
@cindex dark corner, escape sequences, for metacharacters @cindex dark corner @subentry escape sequences @subentry for metacharacters
Historically, such characters were taken literally. Historically, such characters were taken literally.
@value{DARKCORNER} @value{DARKCORNER}
However, the POSIX standard indicates that they should be treated However, the POSIX standard indicates that they should be treated
as real metacharacters, which is what @command{gawk} does. as real metacharacters, which is what @command{gawk} does.
In compatibility mode (@pxref{Options}), In compatibility mode (@pxref{Options}),
@command{gawk} treats the characters represented by octal and hexadecimal @command{gawk} treats the characters represented by octal and hexadecimal
escape sequences literally when used in regexp constants. Thus, escape sequences literally when used in regexp constants. Thus,
@code{/a\52b/} is equivalent to @code{/a\*b/}. @code{/a\52b/} is equivalent to @code{/a\*b/}.
@end cartouche @end cartouche
@end ifnotdocbook @end ifnotdocbook
@node Regexp Operators @node Regexp Operators
@section Regular Expression Operators @section Regular Expression Operators
@cindex regular expressions, operators @cindex regular expressions @subentry operators
@cindex metacharacters in regular expressions @cindex metacharacters @subentry in regular expressions
You can combine regular expressions with special characters, You can combine regular expressions with special characters,
called @dfn{regular expression operators} or @dfn{metacharacters}, to called @dfn{regular expression operators} or @dfn{metacharacters}, to
increase the power and versatility of regular expressions. increase the power and versatility of regular expressions.
@menu @menu
* Regexp Operator Details:: The actual details. * Regexp Operator Details:: The actual details.
* Interval Expressions:: Notes on interval expressions. * Interval Expressions:: Notes on interval expressions.
@end menu @end menu
skipping to change at page 110, line ? skipping to change at page 110, line ?
in @ref{Escape Sequences} in @ref{Escape Sequences}
are valid inside a regexp. They are introduced by a @samp{\} and are valid inside a regexp. They are introduced by a @samp{\} and
are recognized and converted into corresponding real characters as are recognized and converted into corresponding real characters as
the very first step in processing regexps. the very first step in processing regexps.
Here is a list of metacharacters. All characters that are not escape Here is a list of metacharacters. All characters that are not escape
sequences and that are not listed here stand for themselves: sequences and that are not listed here stand for themselves:
@c Use @asis so the docbook comes out ok. Sigh. @c Use @asis so the docbook comes out ok. Sigh.
@table @asis @table @asis
@cindex backslash (@code{\}), regexp operator @cindex backslash (@code{\}) @subentry regexp operator
@cindex @code{\} (backslash), regexp operator @cindex @code{\} (backslash) @subentry regexp operator
@item @code{\} @item @code{\}
This suppresses the special meaning of a character when This suppresses the special meaning of a character when
matching. For example, @samp{\$} matching. For example, @samp{\$}
matches the character @samp{$}. matches the character @samp{$}.
@cindex regular expressions, anchors in @cindex regular expressions @subentry anchors in
@cindex Texinfo, chapter beginnings in files @cindex Texinfo @subentry chapter beginnings in files
@cindex @code{^} (caret), regexp operator @cindex @code{^} (caret) @subentry regexp operator
@cindex caret (@code{^}), regexp operator @cindex caret (@code{^}) @subentry regexp operator
@item @code{^} @item @code{^}
This matches the beginning of a string. @samp{^@@chapter} This matches the beginning of a string. @samp{^@@chapter}
matches @samp{@@chapter} at the beginning of a string, matches @samp{@@chapter} at the beginning of a string,
for example, and can be used for example, and can be used
to identify chapter beginnings in Texinfo source files. to identify chapter beginnings in Texinfo source files.
The @samp{^} is known as an @dfn{anchor}, because it anchors the pattern to The @samp{^} is known as an @dfn{anchor}, because it anchors the pattern to
match only at the beginning of the string. match only at the beginning of the string.
It is important to realize that @samp{^} does not match the beginning of It is important to realize that @samp{^} does not match the beginning of
a line (the point right after a @samp{\n} newline character) embedded in a string. a line (the point right after a @samp{\n} newline character) embedded in a string.
The condition is not true in the following example: The condition is not true in the following example:
@example @example
if ("line1\nLINE 2" ~ /^L/) @dots{} if ("line1\nLINE 2" ~ /^L/) @dots{}
@end example @end example
@cindex @code{$} (dollar sign), regexp operator @cindex @code{$} (dollar sign) @subentry regexp operator
@cindex dollar sign (@code{$}), regexp operator @cindex dollar sign (@code{$}) @subentry regexp operator
@item @code{$} @item @code{$}
This is similar to @samp{^}, but it matches only at the end of a string. This is similar to @samp{^}, but it matches only at the end of a string.
For example, @samp{p$} For example, @samp{p$}
matches a record that ends with a @samp{p}. The @samp{$} is an anchor matches a record that ends with a @samp{p}. The @samp{$} is an anchor
and does not match the end of a line and does not match the end of a line
(the point right before a @samp{\n} newline character) (the point right before a @samp{\n} newline character)
embedded in a string. embedded in a string.
The condition in the following example is not true: The condition in the following example is not true:
@example @example
skipping to change at page 110, line ? skipping to change at page 110, line ?
@cindex @code{.} (period), regexp operator @cindex @code{.} (period), regexp operator
@cindex period (@code{.}), regexp operator @cindex period (@code{.}), regexp operator
@item @code{.} (period) @item @code{.} (period)
This matches any single character, This matches any single character,
@emph{including} the newline character. For example, @samp{.P} @emph{including} the newline character. For example, @samp{.P}
matches any single character followed by a @samp{P} in a string. Using matches any single character followed by a @samp{P} in a string. Using
concatenation, we can make a regular expression such as @samp{U.A}, which concatenation, we can make a regular expression such as @samp{U.A}, which
matches any three-character sequence that begins with @samp{U} and ends matches any three-character sequence that begins with @samp{U} and ends
with @samp{A}. with @samp{A}.
@cindex POSIX @command{awk}, period (@code{.})@comma{} using @cindex POSIX mode
@cindex POSIX @command{awk} @subentry period (@code{.}), using
In strict POSIX mode (@pxref{Options}), In strict POSIX mode (@pxref{Options}),
@samp{.} does not match the @sc{nul} @samp{.} does not match the @sc{nul}
character, which is a character with all bits equal to zero. character, which is a character with all bits equal to zero.
Otherwise, @sc{nul} is just another character. Other versions of @command{awk} Otherwise, @sc{nul} is just another character. Other versions of @command{awk}
may not be able to match the @sc{nul} character. may not be able to match the @sc{nul} character.
@cindex @code{[]} (square brackets), regexp operator @cindex @code{[]} (square brackets), regexp operator
@cindex square brackets (@code{[]}), regexp operator @cindex square brackets (@code{[]}), regexp operator
@cindex bracket expressions @cindex bracket expressions
@cindex character sets, See Also bracket expressions @cindex character sets (in regular expressions) @seeentry{bracket expressions}
@cindex character lists, See bracket expressions @cindex character lists @seeentry{bracket expressions}
@cindex character classes, See bracket expressions @cindex character classes @seeentry{bracket expressions}
@item @code{[}@dots{}@code{]} @item @code{[}@dots{}@code{]}
This is called a @dfn{bracket expression}.@footnote{In other literature, This is called a @dfn{bracket expression}.@footnote{In other literature,
you may see a bracket expression referred to as either a you may see a bracket expression referred to as either a
@dfn{character set}, a @dfn{character class}, or a @dfn{character list}.} @dfn{character set}, a @dfn{character class}, or a @dfn{character list}.}
It matches any @emph{one} of the characters that are enclosed in It matches any @emph{one} of the characters that are enclosed in
the square brackets. For example, @samp{[MVX]} matches any one of the square brackets. For example, @samp{[MVX]} matches any one of
the characters @samp{M}, @samp{V}, or @samp{X} in a string. A full the characters @samp{M}, @samp{V}, or @samp{X} in a string. A full
discussion of what can be inside the square brackets of a bracket expression discussion of what can be inside the square brackets of a bracket expression
is given in is given in
@ref{Bracket Expressions}. @ref{Bracket Expressions}.
@cindex bracket expressions, complemented @cindex bracket expressions @subentry complemented
@item @code{[^}@dots{}@code{]} @item @code{[^}@dots{}@code{]}
This is a @dfn{complemented bracket expression}. The first character after This is a @dfn{complemented bracket expression}. The first character after
the @samp{[} @emph{must} be a @samp{^}. It matches any characters the @samp{[} @emph{must} be a @samp{^}. It matches any characters
@emph{except} those in the square brackets. For example, @samp{[^awk]} @emph{except} those in the square brackets. For example, @samp{[^awk]}
matches any character that is not an @samp{a}, @samp{w}, matches any character that is not an @samp{a}, @samp{w},
or @samp{k}. or @samp{k}.
@cindex @code{|} (vertical bar) @cindex @code{|} (vertical bar)
@cindex vertical bar (@code{|}) @cindex vertical bar (@code{|})
@item @code{|} @item @code{|}
This is the @dfn{alternation operator} and it is used to specify This is the @dfn{alternation operator} and it is used to specify
alternatives. The @samp{|} has the lowest precedence of all the regular alternatives. The @samp{|} has the lowest precedence of all the regular
expression operators. For example, @samp{^P|[aeiouy]} matches any string expression operators. For example, @samp{^P|[aeiouy]} matches any string
that matches either @samp{^P} or @samp{[aeiouy]}. This means it matches that matches either @samp{^P} or @samp{[aeiouy]}. This means it matches
any string that starts with @samp{P} or contains (anywhere within it) any string that starts with @samp{P} or contains (anywhere within it)
a lowercase English vowel. a lowercase English vowel.
The alternation applies to the largest possible regexps on either side. The alternation applies to the largest possible regexps on either side.
@cindex @code{()} (parentheses), regexp operator @cindex @code{()} (parentheses) @subentry regexp operator
@cindex parentheses @code{()}, regexp operator @cindex parentheses @code{()} @subentry regexp operator
@item @code{(}@dots{}@code{)} @item @code{(}@dots{}@code{)}
Parentheses are used for grouping in regular expressions, as in Parentheses are used for grouping in regular expressions, as in
arithmetic. They can be used to concatenate regular expressions arithmetic. They can be used to concatenate regular expressions
containing the alternation operator, @samp{|}. For example, containing the alternation operator, @samp{|}. For example,
@samp{@@(samp|code)\@{[^@}]+\@}} matches both @samp{@@code@{foo@}} and @samp{@@(samp|code)\@{[^@}]+\@}} matches both @samp{@@code@{foo@}} and
@samp{@@samp@{bar@}}. @samp{@@samp@{bar@}}.
(These are Texinfo formatting control sequences. The @samp{+} is (These are Texinfo formatting control sequences. The @samp{+} is
explained further on in this list.) explained further on in this list.)
The left or opening parenthesis is always a metacharacter; to match The left or opening parenthesis is always a metacharacter; to match
one literally, precede it with a backslash. However, the right or one literally, precede it with a backslash. However, the right or
closing parenthesis is only special when paired with a left parenthesis; closing parenthesis is only special when paired with a left parenthesis;
an unpaired right parenthesis is (silently) treated as a regular character. an unpaired right parenthesis is (silently) treated as a regular character.
@cindex @code{*} (asterisk), @code{*} operator, as regexp operator @cindex @code{*} (asterisk) @subentry @code{*} operator @subentry as regexp operator
@cindex asterisk (@code{*}), @code{*} operator, as regexp operator @cindex asterisk (@code{*}) @subentry @code{*} operator @subentry as regexp operator
@item @code{*} @item @code{*}
This symbol means that the preceding regular expression should be This symbol means that the preceding regular expression should be
repeated as many times as necessary to find a match. For example, @samp{ph*} repeated as many times as necessary to find a match. For example, @samp{ph*}
applies the @samp{*} symbol to the preceding @samp{h} and looks for matches applies the @samp{*} symbol to the preceding @samp{h} and looks for matches
of one @samp{p} followed by any number of @samp{h}s. This also matches of one @samp{p} followed by any number of @samp{h}s. This also matches
just @samp{p} if no @samp{h}s are present. just @samp{p} if no @samp{h}s are present.
There are two subtle points to understand about how @samp{*} works. There are two subtle points to understand about how @samp{*} works.
First, the @samp{*} applies only to the single preceding regular expression First, the @samp{*} applies only to the single preceding regular expression
component (e.g., in @samp{ph*}, it applies just to the @samp{h}). component (e.g., in @samp{ph*}, it applies just to the @samp{h}).
To cause @samp{*} to apply to a larger subexpression, use parentheses: To cause @samp{*} to apply to a larger subexpression, use parentheses:
@samp{(ph)*} matches @samp{ph}, @samp{phph}, @samp{phphph}, and so on. @samp{(ph)*} matches @samp{ph}, @samp{phph}, @samp{phphph}, and so on.
Second, @samp{*} finds as many repetitions as possible. If the text Second, @samp{*} finds as many repetitions as possible. If the text
to be matched is @samp{phhhhhhhhhhhhhhooey}, @samp{ph*} matches all of to be matched is @samp{phhhhhhhhhhhhhhooey}, @samp{ph*} matches all of
the @samp{h}s. the @samp{h}s.
@cindex @code{+} (plus sign), regexp operator @cindex @code{+} (plus sign) @subentry regexp operator
@cindex plus sign (@code{+}), regexp operator @cindex plus sign (@code{+}) @subentry regexp operator
@item @code{+} @item @code{+}
This symbol is similar to @samp{*}, except that the preceding expression must be This symbol is similar to @samp{*}, except that the preceding expression must be
matched at least once. This means that @samp{wh+y} matched at least once. This means that @samp{wh+y}
would match @samp{why} and @samp{whhy}, but not @samp{wy}, whereas would match @samp{why} and @samp{whhy}, but not @samp{wy}, whereas
@samp{wh*y} would match all three. @samp{wh*y} would match all three.
@cindex @code{?} (question mark), regexp operator @cindex @code{?} (question mark) @subentry regexp operator
@cindex question mark (@code{?}), regexp operator @cindex question mark (@code{?}) @subentry regexp operator
@item @code{?} @item @code{?}
This symbol is similar to @samp{*}, except that the preceding expression can be This symbol is similar to @samp{*}, except that the preceding expression can be
matched either once or not at all. For example, @samp{fe?d} matched either once or not at all. For example, @samp{fe?d}
matches @samp{fed} and @samp{fd}, but nothing else. matches @samp{fed} and @samp{fd}, but nothing else.
@cindex @code{@{@}} (braces) @subentry regexp operator
@cindex braces (@code{@{@}}) @subentry regexp operator
@cindex interval expressions, regexp operator @cindex interval expressions, regexp operator
@item @code{@{}@var{n}@code{@}} @item @code{@{}@var{n}@code{@}}
@itemx @code{@{}@var{n}@code{,@}} @itemx @code{@{}@var{n}@code{,@}}
@itemx @code{@{}@var{n}@code{,}@var{m}@code{@}} @itemx @code{@{}@var{n}@code{,}@var{m}@code{@}}
One or two numbers inside braces denote an @dfn{interval expression}. One or two numbers inside braces denote an @dfn{interval expression}.
If there is one number in the braces, the preceding regexp is repeated If there is one number in the braces, the preceding regexp is repeated
@var{n} times. @var{n} times.
If there are two numbers separated by a comma, the preceding regexp is If there are two numbers separated by a comma, the preceding regexp is
repeated @var{n} to @var{m} times. repeated @var{n} to @var{m} times.
If there is one number followed by a comma, then the preceding regexp If there is one number followed by a comma, then the preceding regexp
skipping to change at page 110, line ? skipping to change at page 110, line ?
Matches @samp{whhhy}, but not @samp{why} or @samp{whhhhy}. Matches @samp{whhhy}, but not @samp{why} or @samp{whhhhy}.
@item wh@{3,5@}y @item wh@{3,5@}y
Matches @samp{whhhy}, @samp{whhhhy}, or @samp{whhhhhy} only. Matches @samp{whhhy}, @samp{whhhhy}, or @samp{whhhhhy} only.
@item wh@{2,@}y @item wh@{2,@}y
Matches @samp{whhy}, @samp{whhhy}, and so on. Matches @samp{whhy}, @samp{whhhy}, and so on.
@end table @end table
@end table @end table
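Several of the operators in the preceding list can be checked from the shell. This is a rough sketch rather than part of the manual's own examples: the variable names are made up, and each test prints 1 for a match and 0 otherwise. The regexps are anchored with @samp{^} and @samp{$} so that they must match the whole string:

```shell
result=$(awk 'BEGIN {
    # "^" anchors at the start of the whole string, not after an
    # embedded newline, so the first test fails and the second succeeds.
    caret_mid   = ("line1\nLINE 2" ~ /^L/)
    caret_start = ("LINE 1\nline2" ~ /^L/)
    # "+" needs at least one "h"; "*" accepts none.
    plus = ("wy" ~ /^wh+y$/)
    star = ("wy" ~ /^wh*y$/)
    # "?" accepts zero or one "e", so "feed" is rejected.
    opt_none = ("fd" ~ /^fe?d$/)
    opt_two  = ("feed" ~ /^fe?d$/)
    print caret_mid, caret_start, plus, star, opt_none, opt_two
}')
printf '%s\n' "$result"
```

Interval expressions are left out of this sketch on purpose, because not every @command{awk} implementation supports them.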
@cindex precedence, regexp operators @cindex precedence @subentry regexp operators
@cindex regular expressions, operators, precedence of @cindex regular expressions @subentry operators @subentry precedence of
In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators, In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators,
as well as the braces @samp{@{} and @samp{@}}, as well as the braces @samp{@{} and @samp{@}},
have have
the highest precedence, followed by concatenation, and finally by @samp{|}. the highest precedence, followed by concatenation, and finally by @samp{|}.
As in arithmetic, parentheses can change how operators are grouped. As in arithmetic, parentheses can change how operators are grouped.
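The precedence rules can be illustrated with a small sketch. The test strings and variable names here are hypothetical; each test prints 1 for a match and 0 otherwise:

```shell
result=$(awk 'BEGIN {
    # Concatenation binds tighter than "|": this reads as (^ab)|(cd$),
    # so a string that merely starts with "ab" matches.
    loose = ("abxx" ~ /^ab|cd$/)
    # Parentheses regroup it as "a", then "b" or "c", then "d".
    tight = ("abxx" ~ /^a(b|c)d$/)
    # Repetition binds tighter than concatenation: "ab*" is a(b*),
    # not (ab)*, so it cannot match "abab" as a whole.
    rep_plain = ("abab" ~ /^ab*$/)
    rep_group = ("abab" ~ /^(ab)*$/)
    print loose, tight, rep_plain, rep_group
}')
printf '%s\n' "$result"
```

When in doubt about how a regexp groups, adding explicit parentheses costs little and makes the intent clear.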
@cindex POSIX @command{awk}, regular expressions and @cindex POSIX @command{awk} @subentry regular expressions and
@cindex @command{gawk}, regular expressions, precedence @cindex @command{gawk} @subentry regular expressions @subentry precedence
In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and
@samp{?} operators stand for themselves when there is nothing in the @samp{?} operators stand for themselves when there is nothing in the
regexp that precedes them. For example, @code{/+/} matches a literal regexp that precedes them. For example, @code{/+/} matches a literal
plus sign. However, many other versions of @command{awk} treat such a plus sign. However, many other versions of @command{awk} treat such a
usage as a syntax error. usage as a syntax error.
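As a minimal sketch of the precedence rules described above, the following shell commands run three patterns over a few made-up input lines. The sample strings and the shell variable names are illustrative assumptions, and any POSIX @command{awk} should behave the same way:

```shell
# Alternation binds loosest, so /^ab|cd$/ groups as (^ab)|(cd$).
loose=$(printf 'abxx\nxxcd\nacd\n' | awk '/^ab|cd$/')
# Parentheses regroup the operators, as in arithmetic.
grouped=$(printf 'abxx\nxxcd\nacd\n' | awk '/^a(b|c)d$/')
# Repetition binds tighter than concatenation: /ab+/ is a(b+), not (ab)+.
tight=$(printf 'abb\nabab\n' | awk '/^ab+$/')
printf '%s\n' "$loose" "$grouped" "$tight"
```

The first pattern selects all three lines, the second selects only @samp{acd}, and the third selects only @samp{abb}.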
@node Interval Expressions @node Interval Expressions
@subsection Some Notes On Interval Expressions @subsection Some Notes On Interval Expressions
@cindex POSIX @command{awk}, interval expressions in @cindex POSIX @command{awk} @subentry interval expressions in
Interval expressions were not traditionally available in @command{awk}. Interval expressions were not traditionally available in @command{awk}.
They were added as part of the POSIX standard to make @command{awk} They were added as part of the POSIX standard to make @command{awk}
and @command{egrep} consistent with each other. and @command{egrep} consistent with each other.
@cindex @command{gawk}, interval expressions and @cindex @command{gawk} @subentry interval expressions and
Initially, because old programs may use @samp{@{} and @samp{@}} in regexp Initially, because old programs may use @samp{@{} and @samp{@}} in regexp
constants, constants,
@command{gawk} did @emph{not} match interval expressions @command{gawk} did @emph{not} match interval expressions
in regexps. in regexps.
However, beginning with @value{PVERSION} 4.0, However, beginning with @value{PVERSION} 4.0,
@command{gawk} does match interval expressions by default. @command{gawk} does match interval expressions by default.
This is because compatibility with POSIX has become more This is because compatibility with POSIX has become more
important to most @command{gawk} users than compatibility with important to most @command{gawk} users than compatibility with
old programs. old programs.
skipping to change at page 110, line ? skipping to change at page 110, line ?
it is good practice to always escape them with a backslash. Then the it is good practice to always escape them with a backslash. Then the
regexp constants are valid and work the way you want them to, using regexp constants are valid and work the way you want them to, using
any version of @command{awk}.@footnote{Use two backslashes if you're any version of @command{awk}.@footnote{Use two backslashes if you're
using a string constant with a regexp operator or function.} using a string constant with a regexp operator or function.}
Finally, when @samp{@{} and @samp{@}} appear in regexp constants Finally, when @samp{@{} and @samp{@}} appear in regexp constants
in a way that cannot be interpreted as an interval expression in a way that cannot be interpreted as an interval expression
(such as @code{/q@{a@}/}), then they stand for themselves. (such as @code{/q@{a@}/}), then they stand for themselves.
As mentioned, interval expressions were not traditionally available As mentioned, interval expressions were not traditionally available
in @command{awk}. In March of 2019, Brian Kernighan's in @command{awk}. In March of 2019, BWK @command{awk} (finally) acquired them.
@command{awk} (finally) acquired them.
Nonetheless, because they were not available for Nonetheless, because they were not available for
so many decades, @command{gawk} continues to not supply them so many decades, @command{gawk} continues to not supply them
when in compatibility mode (@pxref{Options}). when in compatibility mode (@pxref{Options}).
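The escaping advice above can be sketched as follows. The input lines and shell variable names are made up for illustration. The point is only that escaped braces are matched literally whether or not the @command{awk} in use supports interval expressions:

```shell
# Escaped braces are literal in a regexp constant.
literal=$(printf 'a{b}\nab\n' | awk '/a\{b\}/')
# In a string constant used as a regexp, the backslashes must be doubled.
dynamic=$(printf 'a{b}\nab\n' | awk '$0 ~ "a\\{b\\}"')
printf '%s\n' "$literal" "$dynamic"
```

Both commands select only the line containing the literal braces.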
@node Bracket Expressions @node Bracket Expressions
@section Using Bracket Expressions @section Using Bracket Expressions
@cindex bracket expressions @cindex bracket expressions
@cindex bracket expressions, range expressions @cindex bracket expressions @subentry range expressions
@cindex range expressions (regexps) @cindex range expressions (regexps)
@cindex character lists in regular expressions @cindex bracket expressions @subentry character lists
As mentioned earlier, a bracket expression matches any character among As mentioned earlier, a bracket expression matches any character among
those listed between the opening and closing square brackets. those listed between the opening and closing square brackets.
Within a bracket expression, a @dfn{range expression} consists of two Within a bracket expression, a @dfn{range expression} consists of two
characters separated by a hyphen. It matches any single character that characters separated by a hyphen. It matches any single character that
sorts between the two characters, based upon the system's native character sorts between the two characters, based upon the system's native character
set. For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}. set. For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}.
(See @ref{Ranges and Locales} for an explanation of how the POSIX (See @ref{Ranges and Locales} for an explanation of how the POSIX
standard and @command{gawk} have changed over time. This is mainly standard and @command{gawk} have changed over time. This is mainly
skipping to change at page 110, line ? skipping to change at page 110, line ?
With the increasing popularity of the With the increasing popularity of the
@uref{http://www.unicode.org, Unicode character standard}, @uref{http://www.unicode.org, Unicode character standard},
there is an additional wrinkle to consider. Octal and hexadecimal there is an additional wrinkle to consider. Octal and hexadecimal
escape sequences inside bracket expressions are taken to represent escape sequences inside bracket expressions are taken to represent
only single-byte characters (characters whose values fit within only single-byte characters (characters whose values fit within
the range 0--255). To match a range of characters where the endpoints the range 0--255). To match a range of characters where the endpoints
of the range are larger than 255, enter the multibyte encodings of of the range are larger than 255, enter the multibyte encodings of
the characters directly. the characters directly.
@cindex @code{\} (backslash), in bracket expressions @cindex @code{\} (backslash) @subentry in bracket expressions
@cindex backslash (@code{\}), in bracket expressions @cindex backslash (@code{\}) @subentry in bracket expressions
@cindex @code{^} (caret), in bracket expressions @cindex @code{^} (caret) @subentry in bracket expressions
@cindex caret (@code{^}), in bracket expressions @cindex caret (@code{^}) @subentry in bracket expressions
@cindex @code{-} (hyphen), in bracket expressions @cindex @code{-} (hyphen) @subentry in bracket expressions
@cindex hyphen (@code{-}), in bracket expressions @cindex hyphen (@code{-}) @subentry in bracket expressions
To include one of the characters @samp{\}, @samp{]}, @samp{-}, or @samp{^} in a To include one of the characters @samp{\}, @samp{]}, @samp{-}, or @samp{^} in a
bracket expression, put a @samp{\} in front of it. For example: bracket expression, put a @samp{\} in front of it. For example:
@example @example
[d\]] [d\]]
@end example @end example
@noindent @noindent
matches either @samp{d} or @samp{]}. matches either @samp{d} or @samp{]}.
Additionally, if you place @samp{]} right after the opening Additionally, if you place @samp{]} right after the opening
@samp{[}, the closing bracket is treated as one of the @samp{[}, the closing bracket is treated as one of the
characters to be matched. characters to be matched.
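A short sketch of both ways to include a closing bracket in a bracket expression follows. The three input lines and the shell variable names are assumptions chosen for illustration, and the behavior of the backslash form is the one POSIX mandates for @command{awk}:

```shell
# A backslash makes the closing bracket an ordinary member of the list.
escaped=$(printf 'd\n]\nx\n' | awk '/^[d\]]$/')
# Placing the closing bracket first in the list has the same effect.
leading=$(printf 'd\n]\nx\n' | awk '/^[]d]$/')
printf '%s\n' "$escaped" "$leading"
```

Each pattern selects the @samp{d} line and the @samp{]} line, but not @samp{x}.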
@cindex POSIX @command{awk}, bracket expressions and @cindex POSIX @command{awk} @subentry bracket expressions and
@cindex Extended Regular Expressions (EREs) @cindex Extended Regular Expressions (EREs)
@cindex EREs (Extended Regular Expressions) @cindex EREs (Extended Regular Expressions)
@cindex @command{egrep} utility @cindex @command{egrep} utility
The treatment of @samp{\} in bracket expressions The treatment of @samp{\} in bracket expressions
is compatible with other @command{awk} is compatible with other @command{awk}
implementations and is also mandated by POSIX. implementations and is also mandated by POSIX.
The regular expressions in @command{awk} are a superset The regular expressions in @command{awk} are a superset
of the POSIX specification for Extended Regular Expressions (EREs). of the POSIX specification for Extended Regular Expressions (EREs).
POSIX EREs are based on the regular expressions accepted by the POSIX EREs are based on the regular expressions accepted by the
traditional @command{egrep} utility. traditional @command{egrep} utility.
@cindex bracket expressions, character classes @cindex bracket expressions @subentry character classes
@cindex POSIX @command{awk}, bracket expressions and, character classes @cindex POSIX @command{awk} @subentry bracket expressions and @subentry character classes
@dfn{Character classes} are a feature introduced in the POSIX standard. @dfn{Character classes} are a feature introduced in the POSIX standard.
A character class is a special notation for describing A character class is a special notation for describing
lists of characters that have a specific attribute, but the lists of characters that have a specific attribute, but the
actual characters can vary from country to country and/or actual characters can vary from country to country and/or
from character set to character set. For example, the notion of what from character set to character set. For example, the notion of what
is an alphabetic character differs between the United States and France. is an alphabetic character differs between the United States and France.
A character class is only valid in a regexp @emph{inside} the A character class is only valid in a regexp @emph{inside} the
brackets of a bracket expression. Character classes consist of @samp{[:}, brackets of a bracket expression. Character classes consist of @samp{[:},
a keyword denoting the class, and @samp{:]}. a keyword denoting the class, and @samp{:]}.
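As a rough sketch, assuming an @command{awk} that supports POSIX character classes (some older implementations do not), the class notation is written inside the brackets of a bracket expression. It can also be combined with other classes in the same list. The input lines and shell variable names here are illustrative only:

```shell
# [:alpha:] must itself sit inside a bracket expression.
alpha=$(printf 'abc\nab1\n_x\n' | awk '/^[[:alpha:]]+$/')
# Several classes may share one bracket expression.
mixed=$(printf 'abc\nab1\n_x\n' | awk '/^[[:alpha:][:digit:]]+$/')
printf '%s\n' "$alpha" "$mixed"
```

With ASCII input such as this, the first pattern selects only @samp{abc}, while the second also selects @samp{ab1}.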
skipping to change at page 110, line ? skipping to change at page 110, line ?
> Can you clarify, please? > Can you clarify, please?
I thought I already did: we cannot be expected to provide a definitive I thought I already did: we cannot be expected to provide a definitive
description of what the named classes stand for, because the answer description of what the named classes stand for, because the answer
depends on various factors out of our control. depends on various factors out of our control.
@end ignore @end ignore
@c Thanks to @c Thanks to
@c Date: Tue, 01 Jul 2014 07:39:51 +0200 @c Date: Tue, 01 Jul 2014 07:39:51 +0200
@c From: Hermann Peifer <peifer@gmx.eu> @c From: Hermann Peifer <peifer@gmx.eu>
@cindex ASCII
Some utilities that match regular expressions provide a nonstandard Some utilities that match regular expressions provide a nonstandard
@samp{[:ascii:]} character class; @command{awk} does not. However, you @samp{[:ascii:]} character class; @command{awk} does not. However, you
can simulate such a construct using @samp{[\x00-\x7F]}. This matches can simulate such a construct using @samp{[\x00-\x7F]}. This matches
all values numerically between zero and 127, which is the defined all values numerically between zero and 127, which is the defined
range of the ASCII character set. Use a complemented character list range of the ASCII character set. Use a complemented character list
(@samp{[^\x00-\x7F]}) to match any single-byte characters that are not (@samp{[^\x00-\x7F]}) to match any single-byte characters that are not
in the ASCII range. in the ASCII range.
@quotation NOTE @quotation NOTE
Some older versions of Unix @command{awk} Some older versions of Unix @command{awk}
treat @code{[:blank:]} like @code{[:space:]}, incorrectly matching treat @code{[:blank:]} like @code{[:space:]}, incorrectly matching
more characters than they should. Caveat Emptor. more characters than they should. Caveat Emptor.
@end quotation @end quotation
@cindex bracket expressions, collating elements @cindex bracket expressions @subentry collating elements
@cindex bracket expressions, non-ASCII @cindex bracket expressions @subentry non-ASCII
@cindex collating elements @cindex collating elements
Two additional special sequences can appear in bracket expressions. Two additional special sequences can appear in bracket expressions.
These apply to non-ASCII character sets, which can have single symbols These apply to non-ASCII character sets, which can have single symbols
(called @dfn{collating elements}) that are represented with more than one (called @dfn{collating elements}) that are represented with more than one
character. They can also have several characters that are equivalent for character. They can also have several characters that are equivalent for
@dfn{collating}, or sorting, purposes. (For example, in French, a plain ``e'' @dfn{collating}, or sorting, purposes. (For example, in French, a plain ``e''
and a grave-accented ``@`e'' are equivalent.) and a grave-accented ``@`e'' are equivalent.)
These sequences are: These sequences are:
@table @asis @table @asis
@cindex bracket expressions, collating symbols @cindex bracket expressions @subentry collating symbols
@cindex collating symbols @cindex collating symbols
@item Collating symbols @item Collating symbols
Multicharacter collating elements enclosed between Multicharacter collating elements enclosed between
@samp{[.} and @samp{.]}. For example, if @samp{ch} is a collating element, @samp{[.} and @samp{.]}. For example, if @samp{ch} is a collating element,
then @samp{[[.ch.]]} is a regexp that matches this collating element, whereas then @samp{[[.ch.]]} is a regexp that matches this collating element, whereas
@samp{[ch]} is a regexp that matches either @samp{c} or @samp{h}. @samp{[ch]} is a regexp that matches either @samp{c} or @samp{h}.
@cindex bracket expressions, equivalence classes @cindex bracket expressions @subentry equivalence classes
@item Equivalence classes @item Equivalence classes
Locale-specific names for a list of Locale-specific names for a list of
characters that are equal. The name is enclosed between characters that are equal. The name is enclosed between
@samp{[=} and @samp{=]}. @samp{[=} and @samp{=]}.
For example, the name @samp{e} might be used to represent all of For example, the name @samp{e} might be used to represent all of
``e,'' ``@^e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp ``e,'' ``@^e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp
that matches any of @samp{e}, @samp{@^e}, @samp{@'e}, or @samp{@`e}. that matches any of @samp{e}, @samp{@^e}, @samp{@'e}, or @samp{@`e}.
@end table @end table
These features are very valuable in non-English-speaking locales. These features are very valuable in non-English-speaking locales.
@cindex internationalization, localization, character classes @cindex internationalization @subentry localization @subentry character classes
@cindex @command{gawk}, character classes and @cindex @command{gawk} @subentry character classes and
@cindex POSIX @command{awk}, bracket expressions and, character classes @cindex POSIX @command{awk} @subentry bracket expressions and @subentry character classes
@quotation CAUTION @quotation CAUTION
The library functions that @command{gawk} uses for regular The library functions that @command{gawk} uses for regular
expression matching currently recognize only POSIX character classes; expression matching currently recognize only POSIX character classes;
they do not recognize collating symbols or equivalence classes. they do not recognize collating symbols or equivalence classes.
@end quotation @end quotation
@c maybe one day ... @c maybe one day ...
Inside a bracket expression, an opening bracket (@samp{[}) that does Inside a bracket expression, an opening bracket (@samp{[}) that does
not start a character class, collating element or equivalence class is not start a character class, collating element or equivalence class is
taken literally. This is also true of @samp{.} and @samp{*}. taken literally. This is also true of @samp{.} and @samp{*}.
@node Leftmost Longest @node Leftmost Longest
@section How Much Text Matches? @section How Much Text Matches?
@cindex regular expressions, leftmost longest match @cindex regular expressions @subentry leftmost longest match
@c @cindex matching, leftmost longest @c @cindex matching, leftmost longest
Consider the following: Consider the following:
@example @example
echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}' echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
@end example @end example
This example uses the @code{sub()} function to make a change to the input This example uses the @code{sub()} function to make a change to the input
record. (@code{sub()} replaces the first instance of any text matched record. (@code{sub()} replaces the first instance of any text matched
by the first argument with the string provided as the second argument; by the first argument with the string provided as the second argument;
skipping to change at page 110, line ? skipping to change at page 110, line ?
@xref{String Functions}, @xref{String Functions},
for more information on these functions. for more information on these functions.
@end ifinfo @end ifinfo
Understanding this principle is also important for regexp-based record Understanding this principle is also important for regexp-based record
and field splitting (@pxref{Records}, and field splitting (@pxref{Records},
and also @pxref{Field Separators}). and also @pxref{Field Separators}).
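The following sketch shows the leftmost-longest rule at work in both substitution and field splitting. The comma-separated sample record and the shell variable names are assumptions added for illustration:

```shell
# sub() replaces the leftmost, longest run of "a"s, not just the first "a".
longest=$(echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }')
# As a field separator, the regexp ,+ likewise swallows a whole run of commas.
fields=$(echo 'a,,,b' | awk -F',+' '{ print NF }')
# A single-character separator splits at every comma instead.
single=$(echo 'a,,,b' | awk -F',' '{ print NF }')
printf '%s\n' "$longest" "$fields" "$single"
```

The substitution yields @samp{<A>bcd}. The regexp separator produces two fields, whereas the single comma produces four.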
@node Computed Regexps @node Computed Regexps
@section Using Dynamic Regexps @section Using Dynamic Regexps
@cindex regular expressions, computed @cindex regular expressions @subentry computed
@cindex regular expressions, dynamic @cindex regular expressions @subentry dynamic
@cindex @code{~} (tilde), @code{~} operator @cindex @code{~} (tilde), @code{~} operator
@cindex tilde (@code{~}), @code{~} operator @cindex tilde (@code{~}), @code{~} operator
@cindex @code{!} (exclamation point), @code{!~} operator @cindex @code{!} (exclamation point) @subentry @code{!~} operator
@cindex exclamation point (@code{!}), @code{!~} operator @cindex exclamation point (@code{!}) @subentry @code{!~} operator
@c @cindex operators, @code{~} @c @cindex operators, @code{~}
@c @cindex operators, @code{!~} @c @cindex operators, @code{!~}
The righthand side of a @samp{~} or @samp{!~} operator need not be a The righthand side of a @samp{~} or @samp{!~} operator need not be a
regexp constant (i.e., a string of characters between slashes). It may regexp constant (i.e., a string of characters between slashes). It may
be any expression. The expression is evaluated and converted to a string be any expression. The expression is evaluated and converted to a string
if necessary; the contents of the string are then used as the if necessary; the contents of the string are then used as the
regexp. A regexp computed in this way is called a @dfn{dynamic regexp. A regexp computed in this way is called a @dfn{dynamic
regexp} or a @dfn{computed regexp}: regexp} or a @dfn{computed regexp}:
@example @example
skipping to change at page 110, line ? skipping to change at page 110, line ?
operators, be aware that there is a difference between a regexp constant operators, be aware that there is a difference between a regexp constant
enclosed in slashes and a string constant enclosed in double quotes. enclosed in slashes and a string constant enclosed in double quotes.
If you are going to use a string constant, you have to understand that If you are going to use a string constant, you have to understand that
the string is, in essence, scanned @emph{twice}: the first time when the string is, in essence, scanned @emph{twice}: the first time when
@command{awk} reads your program, and the second time when it goes to @command{awk} reads your program, and the second time when it goes to
match the string on the lefthand side of the operator with the pattern match the string on the lefthand side of the operator with the pattern
on the right. This is true of any string-valued expression (such as on the right. This is true of any string-valued expression (such as
@code{digits_regexp}, shown in the previous example), not just string constants. @code{digits_regexp}, shown in the previous example), not just string constants.
@end quotation @end quotation
@cindex regexp constants, slashes vs.@: quotes @cindex regexp constants @subentry slashes vs.@: quotes
@cindex @code{\} (backslash), in regexp constants @cindex @code{\} (backslash) @subentry in regexp constants
@cindex backslash (@code{\}), in regexp constants @cindex backslash (@code{\}) @subentry in regexp constants
@cindex @code{"} (double quote), in regexp constants @cindex @code{"} (double quote) @subentry in regexp constants
@cindex double quote (@code{"}), in regexp constants @cindex double quote (@code{"}) @subentry in regexp constants
What difference does it make if the string is What difference does it make if the string is
scanned twice? The answer has to do with escape sequences, and particularly scanned twice? The answer has to do with escape sequences, and particularly
with backslashes. To get a backslash into a regular expression inside a with backslashes. To get a backslash into a regular expression inside a
string, you have to type two backslashes. string, you have to type two backslashes.
For example, @code{/\*/} is a regexp constant for a literal @samp{*}. For example, @code{/\*/} is a regexp constant for a literal @samp{*}.
Only one backslash is needed. To do the same thing with a string, Only one backslash is needed. To do the same thing with a string,
you have to type @code{"\\*"}. The first backslash escapes the you have to type @code{"\\*"}. The first backslash escapes the
second one so that the string actually contains the second one so that the string actually contains the
two characters @samp{\} and @samp{*}. two characters @samp{\} and @samp{*}.
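A small sketch of the two spellings, using made-up input lines and shell variable names, may make the difference concrete:

```shell
# Regexp constant: one backslash makes the star literal.
constant=$(printf 'a*b\nab\n' | awk '/\*/')
# String constant: the string is scanned twice, so the backslash is doubled.
dynamic=$(printf 'a*b\nab\n' | awk '$0 ~ "\\*"')
printf '%s\n' "$constant" "$dynamic"
```

Both commands select only the line that contains a literal @samp{*}.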
@cindex troubleshooting, regexp constants vs.@: string constants @cindex troubleshooting @subentry regexp constants vs.@: string constants
@cindex regexp constants, vs.@: string constants @cindex regexp constants @subentry vs.@: string constants
@cindex string constants, vs.@: regexp constants @cindex string @subentry constants @subentry vs.@: regexp constants
Given that you can use both regexp and string constants to describe Given that you can use both regexp and string constants to describe
regular expressions, which should you use? The answer is ``regexp regular expressions, which should you use? The answer is ``regexp
constants,'' for several reasons: constants,'' for several reasons:
@itemize @value{BULLET} @itemize @value{BULLET}
@item @item
String constants are more complicated to write and String constants are more complicated to write and
more difficult to read. Using regexp constants makes your programs more difficult to read. Using regexp constants makes your programs
less error-prone. Not understanding the difference between the two less error-prone. Not understanding the difference between the two
kinds of constants is a common source of errors. kinds of constants is a common source of errors.
skipping to change at page 110, line ? skipping to change at page 110, line ?
that you have supplied a regexp and store it internally in a form that that you have supplied a regexp and store it internally in a form that
makes pattern matching more efficient. When using a string constant, makes pattern matching more efficient. When using a string constant,
@command{awk} must first convert the string into this internal form and @command{awk} must first convert the string into this internal form and
then perform the pattern matching. then perform the pattern matching.
@item @item
Using regexp constants is better form; it shows clearly that you Using regexp constants is better form; it shows clearly that you
intend a regexp match. intend a regexp match.
@end itemize @end itemize
@cindex sidebar, Using @code{\n} in Bracket Expressions of Dynamic Regexps @cindex sidebar @subentry Using @code{\n} in Bracket Expressions of Dynamic Regexps
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>Using @code{\n} in Bracket Expressions of Dynamic Regexps</title> <sidebar><title>Using @code{\n} in Bracket Expressions of Dynamic Regexps</title>
@end docbook @end docbook
@cindex regular expressions, dynamic, with embedded newlines @cindex regular expressions @subentry dynamic @subentry with embedded newlines
@cindex newlines, in dynamic regexps @cindex newlines @subentry in dynamic regexps
Some older versions of @command{awk} do not allow the newline Some older versions of @command{awk} do not allow the newline
character to be used inside a bracket expression for a dynamic regexp: character to be used inside a bracket expression for a dynamic regexp:
@example @example
$ @kbd{awk '$0 ~ "[ \t\n]"'} $ @kbd{awk '$0 ~ "[ \t\n]"'}
@error{} awk: newline in character class [ @error{} awk: newline in character class [
@error{} ]... @error{} ]...
@error{} source line number 1 @error{} source line number 1
@error{} context is @error{} context is
@error{} $0 ~ "[ >>> \t\n]" <<< @error{} $0 ~ "[ >>> \t\n]" <<<
@end example @end example
@cindex newlines, in regexp constants @cindex newlines @subentry in regexp constants
But a newline in a regexp constant works with no problem: But a newline in a regexp constant works with no problem:
@example @example
$ @kbd{awk '$0 ~ /[ \t\n]/'} $ @kbd{awk '$0 ~ /[ \t\n]/'}
@kbd{here is a sample line} @kbd{here is a sample line}
@print{} here is a sample line @print{} here is a sample line
@kbd{Ctrl-d} @kbd{Ctrl-d}
@end example @end example
@command{gawk} does not have this problem, and it isn't likely to @command{gawk} does not have this problem, and it isn't likely to
skipping to change at page 110, line ? skipping to change at page 110, line ?
@docbook @docbook
</sidebar> </sidebar>
@end docbook @end docbook
@end ifdocbook @end ifdocbook
@ifnotdocbook @ifnotdocbook
@cartouche @cartouche
@center @b{Using @code{\n} in Bracket Expressions of Dynamic Regexps} @center @b{Using @code{\n} in Bracket Expressions of Dynamic Regexps}
@cindex regular expressions, dynamic, with embedded newlines @cindex regular expressions @subentry dynamic @subentry with embedded newlines
@cindex newlines, in dynamic regexps @cindex newlines @subentry in dynamic regexps
Some older versions of @command{awk} do not allow the newline Some older versions of @command{awk} do not allow the newline
character to be used inside a bracket expression for a dynamic regexp: character to be used inside a bracket expression for a dynamic regexp:
@example @example
$ @kbd{awk '$0 ~ "[ \t\n]"'} $ @kbd{awk '$0 ~ "[ \t\n]"'}
@error{} awk: newline in character class [ @error{} awk: newline in character class [
@error{} ]... @error{} ]...
@error{} source line number 1 @error{} source line number 1
@error{} context is @error{} context is
@error{} $0 ~ "[ >>> \t\n]" <<< @error{} $0 ~ "[ >>> \t\n]" <<<
@end example @end example
@cindex newlines, in regexp constants @cindex newlines @subentry in regexp constants
But a newline in a regexp constant works with no problem: But a newline in a regexp constant works with no problem:
@example @example
$ @kbd{awk '$0 ~ /[ \t\n]/'} $ @kbd{awk '$0 ~ /[ \t\n]/'}
@kbd{here is a sample line} @kbd{here is a sample line}
@print{} here is a sample line @print{} here is a sample line
@kbd{Ctrl-d} @kbd{Ctrl-d}
@end example @end example
@command{gawk} does not have this problem, and it isn't likely to @command{gawk} does not have this problem, and it isn't likely to
occur often in practice, but it's worth noting for future reference. occur often in practice, but it's worth noting for future reference.
@end cartouche @end cartouche
@end ifnotdocbook @end ifnotdocbook
@node GNU Regexp Operators @node GNU Regexp Operators
@section @command{gawk}-Specific Regexp Operators @section @command{gawk}-Specific Regexp Operators
@c This section adapted (long ago) from the regex-0.12 manual @c This section adapted (long ago) from the regex-0.12 manual
@cindex regular expressions, operators, @command{gawk} @cindex regular expressions @subentry operators @subentry @command{gawk}
@cindex @command{gawk}, regular expressions, operators @cindex @command{gawk} @subentry regular expressions @subentry operators
@cindex operators, GNU-specific @cindex operators @subentry GNU-specific
@cindex regular expressions, operators, for words @cindex regular expressions @subentry operators @subentry for words
@cindex word, regexp definition of @cindex word, regexp definition of
GNU software that deals with regular expressions provides a number of GNU software that deals with regular expressions provides a number of
additional regexp operators. These operators are described in this additional regexp operators. These operators are described in this
@value{SECTION} and are specific to @command{gawk}; @value{SECTION} and are specific to @command{gawk};
they are not available in other @command{awk} implementations. they are not available in other @command{awk} implementations.
Most of the additional operators deal with word matching. Most of the additional operators deal with word matching.
For our purposes, a @dfn{word} is a sequence of one or more letters, digits, For our purposes, a @dfn{word} is a sequence of one or more letters, digits,
or underscores (@samp{_}): or underscores (@samp{_}):
@table @code @table @code
@c @cindex operators, @code{\s} (@command{gawk}) @c @cindex operators, @code{\s} (@command{gawk})
@cindex backslash (@code{\}), @code{\s} operator (@command{gawk}) @cindex backslash (@code{\}) @subentry @code{\s} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\s} operator (@command{gawk}) @cindex @code{\} (backslash) @subentry @code{\s} operator (@command{gawk})
@item \s @item \s
Matches any whitespace character. Matches any space character as defined by the current locale.
Think of it as shorthand for Think of it as shorthand for
@w{@samp{[[:space:]]}}. @w{@samp{[[:space:]]}}.
@c @cindex operators, @code{\S} (@command{gawk}) @c @cindex operators, @code{\S} (@command{gawk})
@cindex backslash (@code{\}), @code{\S} operator (@command{gawk}) @cindex backslash (@code{\}) @subentry @code{\S} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\S} operator (@command{gawk}) @cindex @code{\} (backslash) @subentry @code{\S} operator (@command{gawk})
@item \S @item \S
Matches any character that is not whitespace. Matches any character that is not a space, as defined by the current locale.
Think of it as shorthand for Think of it as shorthand for
@w{@samp{[^[:space:]]}}. @w{@samp{[^[:space:]]}}.
@c @cindex operators, @code{\w} (@command{gawk}) @c @cindex operators, @code{\w} (@command{gawk})
@cindex backslash (@code{\}), @code{\w} operator (@command{gawk}) @cindex backslash (@code{\}) @subentry @code{\w} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\w} operator (@command{gawk}) @cindex @code{\} (backslash) @subentry @code{\w} operator (@command{gawk})
@item \w @item \w
Matches any word-constituent character---that is, it matches any Matches any word-constituent character---that is, it matches any
letter, digit, or underscore. Think of it as shorthand for letter, digit, or underscore. Think of it as shorthand for
@w{@samp{[[:alnum:]_]}}. @w{@samp{[[:alnum:]_]}}.
@c @cindex operators, @code{\W} (@command{gawk}) @c @cindex operators, @code{\W} (@command{gawk})
@cindex backslash (@code{\}), @code{\W} operator (@command{gawk}) @cindex backslash (@code{\}) @subentry @code{\W} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\W} operator (@command{gawk}) @cindex @code{\} (backslash) @subentry @code{\W} operator (@command{gawk})
@item \W @item \W
Matches any character that is not word-constituent. Matches any character that is not word-constituent.
Think of it as shorthand for Think of it as shorthand for
@w{@samp{[^[:alnum:]_]}}. @w{@samp{[^[:alnum:]_]}}.
@c @cindex operators, @code{\<} (@command{gawk}) @c @cindex operators, @code{\<} (@command{gawk})
@cindex backslash (@code{\}), @code{\<} operator (@command{gawk}) @cindex backslash (@code{\}) @subentry @code{\<} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\<} operator (@command{gawk}) @cindex @code{\} (backslash) @subentry @code{\<} operator (@command{gawk})
@item \< @item \<
Matches the empty string at the beginning of a word. Matches the empty string at the beginning of a word.
For example, @code{/\<away/} matches @samp{away} but not For example, @code{/\<away/} matches @samp{away} but not
@samp{stowaway}. @samp{stowaway}.
@c @cindex operators, @code{\>} (@command{gawk}) @c @cindex operators, @code{\>} (@command{gawk})
@cindex backslash (@code{\}), @code{\>} operator (@command{gawk}) @cindex backslash (@code{\}) @subentry @code{\>} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\>} operator (@command{gawk}) @cindex @code{\} (backslash) @subentry @code{\>} operator (@command{gawk})
@item \> @item \>
Matches the empty string at the end of a word. Matches the empty string at the end of a word.
For example, @code{/stow\>/} matches @samp{stow} but not @samp{stowaway}. For example, @code{/stow\>/} matches @samp{stow} but not @samp{stowaway}.
@c @cindex operators, @code{\y} (@command{gawk}) @c @cindex operators, @code{\y} (@command{gawk})
@cindex backslash (@code{\}), @code{\y} operator (@command{gawk}) @cindex backslash (@code{\}) @subentry @code{\y} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\y} operator (@command{gawk}) @cindex @code{\} (backslash) @subentry @code{\y} operator (@command{gawk})
@cindex word boundaries@comma{} matching @cindex word boundaries, matching
@item \y @item \y
Matches the empty string at either the beginning or the Matches the empty string at either the beginning or the
end of a word (i.e., the word boundar@strong{y}). For example, @samp{\yballs?\y } end of a word (i.e., the word boundar@strong{y}). For example, @samp{\yballs?\y }
matches either @samp{ball} or @samp{balls}, as a separate word. matches either @samp{ball} or @samp{balls}, as a separate word.
@c @cindex operators, @code{\B} (@command{gawk}) @c @cindex operators, @code{\B} (@command{gawk})
@cindex backslash (@code{\}), @code{\B} operator (@command{gawk}) @cindex backslash (@code{\}) @subentry @code{\B} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\B} operator (@command{gawk}) @cindex @code{\} (backslash) @subentry @code{\B} operator (@command{gawk})
@item \B @item \B
Matches the empty string that occurs between two Matches the empty string that occurs between two
word-constituent characters. For example, word-constituent characters. For example,
@code{/\Brat\B/} matches @samp{crate}, but it does not match @samp{dirty rat}. @code{/\Brat\B/} matches @samp{crate}, but it does not match @samp{dirty rat}.
@samp{\B} is essentially the opposite of @samp{\y}. @samp{\B} is essentially the opposite of @samp{\y}.
@end table @end table
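As an illustrative sketch (not part of the original text), the following
session exercises three of the operators just described.  It assumes
@command{gawk} running in its default mode and a POSIX-style
@command{printf} utility; the sample words are the ones used in the
descriptions above, plus @samp{football} as an extra, made-up
non-matching case.  A pattern with no action prints each matching line:

@example
$ @kbd{printf 'crate\ndirty rat\n' | gawk '/\Brat\B/'}
@print{} crate
$ @kbd{printf 'away\nstowaway\n' | gawk '/\<away/'}
@print{} away
$ @kbd{printf 'ball\nballs\nfootball\n' | gawk '/\yballs?\y/'}
@print{} ball
@print{} balls
@end example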
@cindex buffers, operators for @cindex buffers @subentry operators for
@cindex regular expressions, operators, for buffers @cindex regular expressions @subentry operators @subentry for buffers
@cindex operators, string-matching, for buffers @cindex operators @subentry string-matching @subentry for buffers
There are two other operators that work on buffers. In Emacs, a There are two other operators that work on buffers. In Emacs, a
@dfn{buffer} is, naturally, an Emacs buffer. @dfn{buffer} is, naturally, an Emacs buffer.
Other GNU programs, including @command{gawk}, Other GNU programs, including @command{gawk},
consider the entire string to match as the buffer. consider the entire string to match as the buffer.
The operators are: The operators are:
@table @code @table @code
@item \` @item \`
@c @cindex operators, @code{\`} (@command{gawk}) @c @cindex operators, @code{\`} (@command{gawk})
@cindex backslash (@code{\}), @code{\`} operator (@command{gawk}) @cindex backslash (@code{\}) @subentry @code{\`} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\`} operator (@command{gawk}) @cindex @code{\} (backslash) @subentry @code{\`} operator (@command{gawk})
Matches the empty string at the Matches the empty string at the
beginning of a buffer (string) beginning of a buffer (string)
@c @cindex operators, @code{\'} (@command{gawk}) @c @cindex operators, @code{\'} (@command{gawk})
@cindex backslash (@code{\}), @code{\'} operator (@command{gawk}) @cindex backslash (@code{\}) @subentry @code{\'} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\'} operator (@command{gawk}) @cindex @code{\} (backslash) @subentry @code{\'} operator (@command{gawk})
@item \' @item \'
Matches the empty string at the Matches the empty string at the
end of a buffer (string) end of a buffer (string)
@end table @end table
@cindex @code{^} (caret), regexp operator @cindex @code{^} (caret) @subentry regexp operator
@cindex caret (@code{^}), regexp operator @cindex caret (@code{^}) @subentry regexp operator
@cindex @code{?} (question mark), regexp operator @cindex @code{?} (question mark) @subentry regexp operator
@cindex question mark (@code{?}), regexp operator @cindex question mark (@code{?}) @subentry regexp operator
Because @samp{^} and @samp{$} always work in terms of the beginning Because @samp{^} and @samp{$} always work in terms of the beginning
and end of strings, these operators don't add any new capabilities and end of strings, these operators don't add any new capabilities
for @command{awk}. They are provided for compatibility with other for @command{awk}. They are provided for compatibility with other
GNU software. GNU software.
@cindex @command{gawk}, word-boundary operator @cindex @command{gawk} @subentry word-boundary operator
@cindex word-boundary operator (@command{gawk}) @cindex word-boundary operator (@command{gawk})
@cindex operators, word-boundary (@command{gawk}) @cindex operators @subentry word-boundary (@command{gawk})
In other GNU software, the word-boundary operator is @samp{\b}. However, In other GNU software, the word-boundary operator is @samp{\b}. However,
that conflicts with the @command{awk} language's definition of @samp{\b} that conflicts with the @command{awk} language's definition of @samp{\b}
as backspace, so @command{gawk} uses a different letter. as backspace, so @command{gawk} uses a different letter.
An alternative method would have been to require two backslashes in the An alternative method would have been to require two backslashes in the
GNU operators, but this was deemed too confusing. The current GNU operators, but this was deemed too confusing. The current
method of using @samp{\y} for the GNU @samp{\b} appears to be the method of using @samp{\y} for the GNU @samp{\b} appears to be the
lesser of two evils. lesser of two evils.
@cindex regular expressions, @command{gawk}, command-line options @cindex regular expressions @subentry @command{gawk}, command-line options
@cindex @command{gawk}, command-line options, and regular expressions @cindex @command{gawk} @subentry command-line options, regular expressions and
The various command-line options The various command-line options
(@pxref{Options}) (@pxref{Options})
control how @command{gawk} interprets characters in regexps: control how @command{gawk} interprets characters in regexps:
@table @asis @table @asis
@item No options @item No options
In the default case, @command{gawk} provides all the facilities of In the default case, @command{gawk} provides all the facilities of
POSIX regexps and the POSIX regexps and the
@ifnotinfo @ifnotinfo
previously described previously described
skipping to change at page 110, line ? skipping to change at page 110, line ?
@item @code{--re-interval} @item @code{--re-interval}
Allow interval expressions in regexps, if @option{--traditional} Allow interval expressions in regexps, if @option{--traditional}
has been provided. has been provided.
Otherwise, interval expressions are available by default. Otherwise, interval expressions are available by default.
@end table @end table
@node Case-sensitivity @node Case-sensitivity
@section Case Sensitivity in Matching @section Case Sensitivity in Matching
@cindex regular expressions, case sensitivity @cindex regular expressions @subentry case sensitivity
@cindex case sensitivity, regexps and @cindex case sensitivity @subentry regexps and
Case is normally significant in regular expressions, both when matching Case is normally significant in regular expressions, both when matching
ordinary characters (i.e., not metacharacters) and inside bracket ordinary characters (i.e., not metacharacters) and inside bracket
expressions. Thus, a @samp{w} in a regular expression matches only a lowercase expressions. Thus, a @samp{w} in a regular expression matches only a lowercase
@samp{w} and not an uppercase @samp{W}. @samp{w} and not an uppercase @samp{W}.
The simplest way to do a case-independent match is to use a bracket The simplest way to do a case-independent match is to use a bracket
expression---for example, @samp{[Ww]}. However, this can be cumbersome if expression---for example, @samp{[Ww]}. However, this can be cumbersome if
you need to use it often, and it can make the regular expressions harder you need to use it often, and it can make the regular expressions harder
to read. There are two alternatives that you might prefer. to read. There are two alternatives that you might prefer.
skipping to change at page 110, line ? skipping to change at page 110, line ?
For example: For example:
@example @example
tolower($1) ~ /foo/ @{ @dots{} @} tolower($1) ~ /foo/ @{ @dots{} @}
@end example @end example
@noindent @noindent
converts the first field to lowercase before matching against it. converts the first field to lowercase before matching against it.
This works in any POSIX-compliant @command{awk}. This works in any POSIX-compliant @command{awk}.
@cindex @command{gawk}, regular expressions, case sensitivity @cindex @command{gawk} @subentry regular expressions @subentry case sensitivity
@cindex case sensitivity, @command{gawk} @cindex case sensitivity @subentry @command{gawk}
@cindex differences in @command{awk} and @command{gawk}, regular expressions @cindex differences in @command{awk} and @command{gawk} @subentry regular expressions
@cindex @code{~} (tilde), @code{~} operator @cindex @code{~} (tilde), @code{~} operator
@cindex tilde (@code{~}), @code{~} operator @cindex tilde (@code{~}), @code{~} operator
@cindex @code{!} (exclamation point), @code{!~} operator @cindex @code{!} (exclamation point) @subentry @code{!~} operator
@cindex exclamation point (@code{!}), @code{!~} operator @cindex exclamation point (@code{!}) @subentry @code{!~} operator
@cindex @code{IGNORECASE} variable, with @code{~} and @code{!~} operators @cindex @code{IGNORECASE} variable @subentry with @code{~} and @code{!~} operators
@cindex @command{gawk}, @code{IGNORECASE} variable in @cindex @command{gawk} @subentry @code{IGNORECASE} variable in
@c @cindex variables, @code{IGNORECASE} @c @cindex variables, @code{IGNORECASE}
Another method, specific to @command{gawk}, is to set the variable Another method, specific to @command{gawk}, is to set the variable
@code{IGNORECASE} to a nonzero value (@pxref{Built-in Variables}). @code{IGNORECASE} to a nonzero value (@pxref{Built-in Variables}).
When @code{IGNORECASE} is not zero, @emph{all} regexp and string When @code{IGNORECASE} is not zero, @emph{all} regexp and string
operations ignore case. operations ignore case.
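The two approaches can be compared in a short session.  This is only a
sketch: the input line @samp{FOO bar} is invented for illustration, and
the first command assumes @command{gawk}, because @code{IGNORECASE} has
no special meaning in other @command{awk} versions.  The second command
prints nothing, because the match is case-sensitive by default:

@example
$ @kbd{echo 'FOO bar' | gawk 'BEGIN @{ IGNORECASE = 1 @} $1 ~ /foo/'}
@print{} FOO bar
$ @kbd{echo 'FOO bar' | gawk '$1 ~ /foo/'}
$ @kbd{echo 'FOO bar' | awk 'tolower($1) ~ /foo/'}
@print{} FOO bar
@end example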
Changing the value of @code{IGNORECASE} dynamically controls the Changing the value of @code{IGNORECASE} dynamically controls the
case sensitivity of the program as it runs. Case is significant by case sensitivity of the program as it runs. Case is significant by
default because @code{IGNORECASE} (like most variables) is initialized default because @code{IGNORECASE} (like most variables) is initialized
to zero: to zero:
skipping to change at page 110, line ? skipping to change at page 110, line ?
@command{gawk}'s @code{IGNORECASE} variable lets you control the @command{gawk}'s @code{IGNORECASE} variable lets you control the
case sensitivity of regexp matching. In other @command{awk} case sensitivity of regexp matching. In other @command{awk}
versions, use @code{tolower()} or @code{toupper()}. versions, use @code{tolower()} or @code{toupper()}.
@end itemize @end itemize
@node Reading Files @node Reading Files
@chapter Reading Input Files @chapter Reading Input Files
@cindex reading input files @cindex reading input files
@cindex input files, reading @cindex input files @subentry reading
@cindex input files @cindex input files
@cindex @code{FILENAME} variable @cindex @code{FILENAME} variable
In the typical @command{awk} program, In the typical @command{awk} program,
@command{awk} reads all input either from the @command{awk} reads all input either from the
standard input (by default, this is the keyboard, but often it is a pipe from another standard input (by default, this is the keyboard, but often it is a pipe from another
command) or from files whose names you specify on the @command{awk} command) or from files whose names you specify on the @command{awk}
command line. If you specify input files, @command{awk} reads them command line. If you specify input files, @command{awk} reads them
in order, processing all the data from one before going on to the next. in order, processing all the data from one before going on to the next.
The name of the current input file can be found in the predefined variable The name of the current input file can be found in the predefined variable
@code{FILENAME} @code{FILENAME}
skipping to change at page 110, line ? skipping to change at page 110, line ?
* Retrying Input:: Retrying input after certain errors. * Retrying Input:: Retrying input after certain errors.
* Command-line directories:: What happens if you put a directory on the * Command-line directories:: What happens if you put a directory on the
command line. command line.
* Input Summary:: Input summary. * Input Summary:: Input summary.
* Input Exercises:: Exercises. * Input Exercises:: Exercises.
@end menu @end menu
@node Records @node Records
@section How Input Is Split into Records @section How Input Is Split into Records
@cindex input, splitting into records @cindex input @subentry splitting into records
@cindex records, splitting input into @cindex records @subentry splitting input into
@cindex @code{NR} variable @cindex @code{NR} variable
@cindex @code{FNR} variable @cindex @code{FNR} variable
@command{awk} divides the input for your program into records and fields. @command{awk} divides the input for your program into records and fields.
It keeps track of the number of records that have been read so far from It keeps track of the number of records that have been read so far from
the current input file. This value is stored in a predefined variable the current input file. This value is stored in a predefined variable
called @code{FNR}, which is reset to zero every time a new file is started. called @code{FNR}, which is reset to zero every time a new file is started.
Another predefined variable, @code{NR}, records the total number of input Another predefined variable, @code{NR}, records the total number of input
records read so far from all @value{DF}s. It starts at zero, but is records read so far from all @value{DF}s. It starts at zero, but is
never automatically reset to zero. never automatically reset to zero.
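A minimal sketch of the difference between the two counters follows.
The scratch files @file{f1} and @file{f2} are hypothetical names created
only for this illustration.  @code{FNR} restarts for the second file,
while @code{NR} keeps counting:

@example
$ @kbd{printf 'a\nb\n' > f1; printf 'c\n' > f2}
$ @kbd{awk '@{ print FILENAME, FNR, NR @}' f1 f2}
@print{} f1 1 1
@print{} f1 2 2
@print{} f2 1 3
@end example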
skipping to change at page 110, line ? skipping to change at page 110, line ?
This mechanism is explained in greater detail shortly. This mechanism is explained in greater detail shortly.
@menu @menu
* awk split records:: How standard @command{awk} splits records. * awk split records:: How standard @command{awk} splits records.
* gawk split records:: How @command{gawk} splits records. * gawk split records:: How @command{gawk} splits records.
@end menu @end menu
@node awk split records @node awk split records
@subsection Record Splitting with Standard @command{awk} @subsection Record Splitting with Standard @command{awk}
@cindex separators, for records @cindex separators @subentry for records
@cindex record separators @cindex record separators
Records are separated by a character called the @dfn{record separator}. Records are separated by a character called the @dfn{record separator}.
By default, the record separator is the newline character. By default, the record separator is the newline character.
This is why records are, by default, single lines. This is why records are, by default, single lines.
To use a different character for the record separator, To use a different character for the record separator,
simply assign that character to the predefined variable @code{RS}. simply assign that character to the predefined variable @code{RS}.
@cindex record separators, newlines as @cindex record separators @subentry newlines as
@cindex newlines, as record separators @cindex newlines @subentry as record separators
@cindex @code{RS} variable @cindex @code{RS} variable
Like any other variable, Like any other variable,
the value of @code{RS} can be changed in the @command{awk} program the value of @code{RS} can be changed in the @command{awk} program
with the assignment operator, @samp{=} with the assignment operator, @samp{=}
(@pxref{Assignment Ops}). (@pxref{Assignment Ops}).
The new record-separator character should be enclosed in quotation marks, The new record-separator character should be enclosed in quotation marks,
which indicate a string constant. Often, the right time to do this is which indicate a string constant. Often, the right time to do this is
at the beginning of execution, before any input is processed, at the beginning of execution, before any input is processed,
so that the very first record is read with the proper separator. so that the very first record is read with the proper separator.
To do this, use the special @code{BEGIN} pattern To do this, use the special @code{BEGIN} pattern
skipping to change at page 110, line ? skipping to change at page 110, line ?
@end example @end example
@noindent @noindent
It contains no @samp{u}, so there is no reason to split the record, It contains no @samp{u}, so there is no reason to split the record,
unlike the others, which each have one or more occurrences of the @samp{u}. unlike the others, which each have one or more occurrences of the @samp{u}.
In fact, this record is treated as part of the previous record; In fact, this record is treated as part of the previous record;
the newline separating them in the output the newline separating them in the output
is the original newline in the @value{DF}, not the one added by is the original newline in the @value{DF}, not the one added by
@command{awk} when it printed the record! @command{awk} when it printed the record!
@cindex record separators, changing @cindex record separators @subentry changing
@cindex separators, for records @cindex separators @subentry for records
Another way to change the record separator is on the command line, Another way to change the record separator is on the command line,
using the variable-assignment feature using the variable-assignment feature
(@pxref{Other Arguments}): (@pxref{Other Arguments}):
@example @example
awk '@{ print $0 @}' RS="u" mail-list awk '@{ print $0 @}' RS="u" mail-list
@end example @end example
@noindent @noindent
This sets @code{RS} to @samp{u} before processing @file{mail-list}. This sets @code{RS} to @samp{u} before processing @file{mail-list}.
skipping to change at page 110, line ? skipping to change at page 110, line ?
$ @kbd{echo | gawk --posix 'BEGIN @{ RS = "a" @} ; @{ print NF @}'} $ @kbd{echo | gawk --posix 'BEGIN @{ RS = "a" @} ; @{ print NF @}'}
@print{} 1 @print{} 1
@end example @end example
There is one field, consisting of a newline. The value of the built-in There is one field, consisting of a newline. The value of the built-in
variable @code{NF} is the number of fields in the current record. variable @code{NF} is the number of fields in the current record.
(In the normal case, @command{gawk} treats the newline as whitespace, (In the normal case, @command{gawk} treats the newline as whitespace,
printing @samp{0} as the result. Most other versions of @command{awk} printing @samp{0} as the result. Most other versions of @command{awk}
also act this way.) also act this way.)
@cindex dark corner, input files @cindex dark corner @subentry input files
Reaching the end of an input file terminates the current input record, Reaching the end of an input file terminates the current input record,
even if the last character in the file is not the character in @code{RS}. even if the last character in the file is not the character in @code{RS}.
@value{DARKCORNER} @value{DARKCORNER}
@cindex empty strings @cindex empty strings @seeentry{null strings}
@cindex null strings @cindex null strings
@cindex strings, empty, See null strings @cindex strings @subentry empty @seeentry{null strings}
The empty string @code{""} (a string without any characters) The empty string @code{""} (a string without any characters)
has a special meaning has a special meaning
as the value of @code{RS}. It means that records are separated as the value of @code{RS}. It means that records are separated
by one or more blank lines and nothing else. by one or more blank lines and nothing else.
@xref{Multiple Line} for more details. @xref{Multiple Line} for more details.
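As a small sketch of this behavior, with made-up input, the following
command reads two records separated by blank lines and prints each
record number with its field count.  With @code{RS} set to the empty
string, a newline inside a record also separates fields, so the first
record has two fields:

@example
$ @kbd{printf 'a\nb\n\n\nc\n' | awk 'BEGIN @{ RS = "" @} @{ print NR ": " NF @}'}
@print{} 1: 2
@print{} 2: 1
@end example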
If you change the value of @code{RS} in the middle of an @command{awk} run, If you change the value of @code{RS} in the middle of an @command{awk} run,
the new value is used to delimit subsequent records, but the record the new value is used to delimit subsequent records, but the record
currently being processed, as well as records already processed, are not currently being processed, as well as records already processed, are not
affected. affected.
@cindex @command{gawk}, @code{RT} variable in @cindex @command{gawk} @subentry @code{RT} variable in
@cindex @code{RT} variable @cindex @code{RT} variable
@cindex records, terminating @cindex records @subentry terminating
@cindex terminating records @cindex terminating records
@cindex differences in @command{awk} and @command{gawk}, record separators @cindex differences in @command{awk} and @command{gawk} @subentry record separators
@cindex differences in @command{awk} and @command{gawk}, @code{RS}/@code{RT} variables @cindex differences in @command{awk} and @command{gawk} @subentry @code{RS}/@code{RT} variables
@cindex regular expressions, as record separators @cindex regular expressions @subentry as record separators
@cindex record separators, regular expressions as @cindex record separators @subentry regular expressions as
@cindex separators, for records, regular expressions as @cindex separators @subentry for records @subentry regular expressions as
After the end of the record has been determined, @command{gawk} After the end of the record has been determined, @command{gawk}
sets the variable @code{RT} to the text in the input that matched sets the variable @code{RT} to the text in the input that matched
@code{RS}. @code{RS}.
@node gawk split records @node gawk split records
@subsection Record Splitting with @command{gawk} @subsection Record Splitting with @command{gawk}
@cindex common extensions, @code{RS} as a regexp @cindex common extensions @subentry @code{RS} as a regexp
@cindex extensions, common@comma{} @code{RS} as a regexp @cindex extensions @subentry common @subentry @code{RS} as a regexp
When using @command{gawk}, the value of @code{RS} is not limited to a When using @command{gawk}, the value of @code{RS} is not limited to a
one-character string. If it contains more than one character, it is one-character string. If it contains more than one character, it is
treated as a regular expression treated as a regular expression
(@pxref{Regexp}). @value{COMMONEXT} (@pxref{Regexp}). @value{COMMONEXT}
In general, each record In general, each record
ends at the next string that matches the regular expression; the next ends at the next string that matches the regular expression; the next
record starts at the end of the matching string. This general rule is record starts at the end of the matching string. This general rule is
actually at work in the usual case, where @code{RS} contains just a actually at work in the usual case, where @code{RS} contains just a
newline: a record ends at the beginning of the next matching string (the newline: a record ends at the beginning of the next matching string (the
next newline in the input), and the following record starts just after next newline in the input), and the following record starts just after
skipping to change at page 110, line ? skipping to change at page 110, line ?
@quotation NOTE @quotation NOTE
Remember that in @command{awk}, the @samp{^} and @samp{$} anchor Remember that in @command{awk}, the @samp{^} and @samp{$} anchor
metacharacters match the beginning and end of a @emph{string}, and not metacharacters match the beginning and end of a @emph{string}, and not
the beginning and end of a @emph{line}. As a result, something like the beginning and end of a @emph{line}. As a result, something like
@samp{RS = "^[[:upper:]]"} can only match at the beginning of a file. @samp{RS = "^[[:upper:]]"} can only match at the beginning of a file.
This is because @command{gawk} views the input file as one long string This is because @command{gawk} views the input file as one long string
that happens to contain newline characters. that happens to contain newline characters.
It is thus best to avoid anchor metacharacters in the value of @code{RS}. It is thus best to avoid anchor metacharacters in the value of @code{RS}.
@end quotation @end quotation
@cindex @command{gawk}, @code{RT} variable in @cindex @command{gawk} @subentry @code{RT} variable in
@cindex @code{RT} variable @cindex @code{RT} variable
@cindex differences in @command{awk} and @command{gawk}, @code{RS}/@code{RT} variables @cindex differences in @command{awk} and @command{gawk} @subentry @code{RS}/@code{RT} variables
The use of @code{RS} as a regular expression and the @code{RT} The use of @code{RS} as a regular expression and the @code{RT}
variable are @command{gawk} extensions; they are not available in variable are @command{gawk} extensions; they are not available in
compatibility mode compatibility mode
(@pxref{Options}). (@pxref{Options}).
In compatibility mode, only the first character of the value of In compatibility mode, only the first character of the value of
@code{RS} determines the end of the record. @code{RS} determines the end of the record.
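The following sketch, which assumes @command{gawk} in its default
(noncompatibility) mode and uses invented input, shows a regexp
@code{RS} together with @code{RT}.  Each record ends at a run of
digits, and @code{RT} holds the digits that matched.  The last record
is ended by the end of the input, so @code{RT} is empty there:

@example
$ @kbd{printf 'a1b22c' | gawk 'BEGIN @{ RS = "[0-9]+" @} @{ print $0 "|" RT @}'}
@print{} a|1
@print{} b|22
@print{} c|
@end example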
@cindex sidebar, @code{RS = "\0"} Is Not Portable @cindex Brian Kernighan's @command{awk}
@command{mawk} has allowed @code{RS} to be a regexp for decades.
As of October, 2019, BWK @command{awk} also supports it. Neither
version supplies @code{RT}, however.
@cindex sidebar @subentry @code{RS = "\0"} Is Not Portable
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>@code{RS = "\0"} Is Not Portable</title> <sidebar><title>@code{RS = "\0"} Is Not Portable</title>
@end docbook @end docbook
@cindex portability, data files as single record @cindex portability @subentry data files as single record
There are times when you might want to treat an entire @value{DF} as a There are times when you might want to treat an entire @value{DF} as a
single record. The only way to make this happen is to give @code{RS} single record. The only way to make this happen is to give @code{RS}
a value that you know doesn't occur in the input file. This is hard a value that you know doesn't occur in the input file. This is hard
to do in a general way, such that a program always works for arbitrary to do in a general way, such that a program always works for arbitrary
input files. input files.
You might think that for text files, the @sc{nul} character, which You might think that for text files, the @sc{nul} character, which
consists of a character with all bits equal to zero, is a good consists of a character with all bits equal to zero, is a good
value to use for @code{RS} in this case: value to use for @code{RS} in this case:
@example @example
BEGIN @{ RS = "\0" @} # whole file becomes one record? BEGIN @{ RS = "\0" @} # whole file becomes one record?
@end example @end example
@cindex differences in @command{awk} and @command{gawk}, strings, storing @cindex differences in @command{awk} and @command{gawk} @subentry strings @subentry storing
@command{gawk} in fact accepts this, and uses the @sc{nul} @command{gawk} in fact accepts this, and uses the @sc{nul}
character for the record separator. character for the record separator.
This works for certain special files, such as @file{/proc/environ} on This works for certain special files, such as @file{/proc/environ} on
GNU/Linux systems, where the @sc{nul} character is in fact the record separator. GNU/Linux systems, where the @sc{nul} character is in fact the record separator.
However, this usage is @emph{not} portable However, this usage is @emph{not} portable
to most other @command{awk} implementations. to most other @command{awk} implementations.
@cindex dark corner, strings, storing @cindex dark corner @subentry strings, storing
Almost all other @command{awk} implementations@footnote{At least that we know Almost all other @command{awk} implementations@footnote{At least that we know
about.} store strings internally as C-style strings. C strings use the about.} store strings internally as C-style strings. C strings use the
@sc{nul} character as the string terminator. In effect, this means that @sc{nul} character as the string terminator. In effect, this means that
@samp{RS = "\0"} is the same as @samp{RS = ""}. @samp{RS = "\0"} is the same as @samp{RS = ""}.
@value{DARKCORNER} @value{DARKCORNER}
It happens that recent versions of @command{mawk} can use the @sc{nul} It happens that recent versions of @command{mawk} can use the @sc{nul}
character as a record separator. However, this is a special case: character as a record separator. However, this is a special case:
@command{mawk} does not allow embedded @sc{nul} characters in strings. @command{mawk} does not allow embedded @sc{nul} characters in strings.
(This may change in a future version of @command{mawk}.) (This may change in a future version of @command{mawk}.)
@cindex records, treating files as @cindex records @subentry treating files as
@cindex treating files, as single records @cindex treating files, as single records
@cindex single records, treating files as @cindex single records, treating files as
@xref{Readfile Function} for an interesting way to read @xref{Readfile Function} for an interesting way to read
whole files. If you are using @command{gawk}, see @ref{Extension Sample whole files. If you are using @command{gawk}, see @ref{Extension Sample
Readfile} for another option. Readfile} for another option.
@docbook @docbook
</sidebar> </sidebar>
@end docbook @end docbook
@end ifdocbook @end ifdocbook
@ifnotdocbook @ifnotdocbook
@cartouche @cartouche
@center @b{@code{RS = "\0"} Is Not Portable} @center @b{@code{RS = "\0"} Is Not Portable}
@cindex portability, data files as single record @cindex portability @subentry data files as single record
There are times when you might want to treat an entire @value{DF} as a There are times when you might want to treat an entire @value{DF} as a
single record. The only way to make this happen is to give @code{RS} single record. The only way to make this happen is to give @code{RS}
a value that you know doesn't occur in the input file. This is hard a value that you know doesn't occur in the input file. This is hard
to do in a general way, such that a program always works for arbitrary to do in a general way, such that a program always works for arbitrary
input files. input files.
You might think that for text files, the @sc{nul} character, which You might think that for text files, the @sc{nul} character, which
consists of a character with all bits equal to zero, is a good consists of a character with all bits equal to zero, is a good
value to use for @code{RS} in this case: value to use for @code{RS} in this case:
@example @example
BEGIN @{ RS = "\0" @} # whole file becomes one record? BEGIN @{ RS = "\0" @} # whole file becomes one record?
@end example @end example
@cindex differences in @command{awk} and @command{gawk}, strings, storing @cindex differences in @command{awk} and @command{gawk} @subentry strings @subentry storing
@command{gawk} in fact accepts this, and uses the @sc{nul} @command{gawk} in fact accepts this, and uses the @sc{nul}
character for the record separator. character for the record separator.
This works for certain special files, such as @file{/proc/environ} on This works for certain special files, such as @file{/proc/environ} on
GNU/Linux systems, where the @sc{nul} character is in fact the record separator. GNU/Linux systems, where the @sc{nul} character is in fact the record separator.
However, this usage is @emph{not} portable However, this usage is @emph{not} portable
to most other @command{awk} implementations. to most other @command{awk} implementations.
@cindex dark corner, strings, storing @cindex dark corner @subentry strings, storing
Almost all other @command{awk} implementations@footnote{At least that we know Almost all other @command{awk} implementations@footnote{At least that we know
about.} store strings internally as C-style strings. C strings use the about.} store strings internally as C-style strings. C strings use the
@sc{nul} character as the string terminator. In effect, this means that @sc{nul} character as the string terminator. In effect, this means that
@samp{RS = "\0"} is the same as @samp{RS = ""}. @samp{RS = "\0"} is the same as @samp{RS = ""}.
@value{DARKCORNER} @value{DARKCORNER}
It happens that recent versions of @command{mawk} can use the @sc{nul} It happens that recent versions of @command{mawk} can use the @sc{nul}
character as a record separator. However, this is a special case: character as a record separator. However, this is a special case:
@command{mawk} does not allow embedded @sc{nul} characters in strings. @command{mawk} does not allow embedded @sc{nul} characters in strings.
(This may change in a future version of @command{mawk}.) (This may change in a future version of @command{mawk}.)
@cindex records, treating files as @cindex records @subentry treating files as
@cindex treating files, as single records @cindex treating files, as single records
@cindex single records, treating files as @cindex single records, treating files as
@xref{Readfile Function} for an interesting way to read @xref{Readfile Function} for an interesting way to read
whole files. If you are using @command{gawk}, see @ref{Extension Sample whole files. If you are using @command{gawk}, see @ref{Extension Sample
Readfile} for another option. Readfile} for another option.
@end cartouche @end cartouche
@end ifnotdocbook @end ifnotdocbook
@node Fields @node Fields
@section Examining Fields @section Examining Fields
@cindex examining fields @cindex examining fields
@cindex fields @cindex fields
@cindex accessing fields @cindex accessing fields
@cindex fields, examining @cindex fields @subentry examining
@cindex whitespace @subentry definition of
When @command{awk} reads an input record, the record is When @command{awk} reads an input record, the record is
automatically @dfn{parsed} or separated by the @command{awk} utility into chunks automatically @dfn{parsed} or separated by the @command{awk} utility into chunks
called @dfn{fields}. By default, fields are separated by @dfn{whitespace}, called @dfn{fields}. By default, fields are separated by @dfn{whitespace},
like words in a line. like words in a line.
Whitespace in @command{awk} means any string of one or more spaces, Whitespace in @command{awk} means any string of one or more spaces,
TABs, or newlines; other characters TABs, or newlines; other characters
that are considered whitespace by other languages that are considered whitespace by other languages
(such as formfeed, vertical tab, etc.) are @emph{not} considered (such as formfeed, vertical tab, etc.) are @emph{not} considered
whitespace by @command{awk}. whitespace by @command{awk}.
The purpose of fields is to make it more convenient for you to refer to The purpose of fields is to make it more convenient for you to refer to
these pieces of the record. You don't have to use them---you can these pieces of the record. You don't have to use them---you can
operate on the whole record if you want---but fields are what make operate on the whole record if you want---but fields are what make
simple @command{awk} programs so powerful. simple @command{awk} programs so powerful.
@cindex field operator @code{$} @cindex field operator @code{$}
@cindex @code{$} (dollar sign), @code{$} field operator @cindex @code{$} (dollar sign) @subentry @code{$} field operator
@cindex dollar sign (@code{$}), @code{$} field operator @cindex dollar sign (@code{$}) @subentry @code{$} field operator
@cindex field operators@comma{} dollar sign as @cindex field operators, dollar sign as
You use a dollar sign (@samp{$}) You use a dollar sign (@samp{$})
to refer to a field in an @command{awk} program, to refer to a field in an @command{awk} program,
followed by the number of the field you want. Thus, @code{$1} followed by the number of the field you want. Thus, @code{$1}
refers to the first field, @code{$2} to the second, and so on. refers to the first field, @code{$2} to the second, and so on.
(Unlike in the Unix shells, the field numbers are not limited to single digits. (Unlike in the Unix shells, the field numbers are not limited to single digits.
@code{$127} is the 127th field in the record.) @code{$127} is the 127th field in the record.)
For example, suppose the following is a line of input: For example, suppose the following is a line of input:
@example @example
This seems like a pretty nice example. This seems like a pretty nice example.
@end example @end example
@noindent @noindent
Here the first field, or @code{$1}, is @samp{This}, the second field, or Here the first field, or @code{$1}, is @samp{This}, the second field, or
@code{$2}, is @samp{seems}, and so on. Note that the last field, @code{$2}, is @samp{seems}, and so on. Note that the last field,
@code{$7}, is @samp{example.}. Because there is no space between the @code{$7}, is @samp{example.}. Because there is no space between the
@samp{e} and the @samp{.}, the period is considered part of the seventh @samp{e} and the @samp{.}, the period is considered part of the seventh
field. field.
@cindex @code{NF} variable @cindex @code{NF} variable
@cindex fields, number of @cindex fields @subentry number of
@code{NF} is a predefined variable whose value is the number of fields @code{NF} is a predefined variable whose value is the number of fields
in the current record. @command{awk} automatically updates the value in the current record. @command{awk} automatically updates the value
of @code{NF} each time it reads a record. No matter how many fields of @code{NF} each time it reads a record. No matter how many fields
there are, the last field in a record can be represented by @code{$NF}. there are, the last field in a record can be represented by @code{$NF}.
So, @code{$NF} is the same as @code{$7}, which is @samp{example.}. So, @code{$NF} is the same as @code{$7}, which is @samp{example.}.
If you try to reference a field beyond the last If you try to reference a field beyond the last
one (such as @code{$8} when the record has only seven fields), you get one (such as @code{$8} when the record has only seven fields), you get
the empty string. (If used in a numeric operation, you get zero.) the empty string. (If used in a numeric operation, you get zero.)
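As a minimal sketch of the points above, the following commands feed the sample sentence to @command{awk} and print @code{NF}, @code{$NF}, and the nonexistent field @code{$8}. The shell variable @code{line} is only a convenience for this illustration.

```shell
# Hypothetical shell variable holding the sample input line.
line='This seems like a pretty nice example.'

# NF is the number of fields, so $NF is the last field.
echo "$line" | awk '{ print NF }'          # prints 7
echo "$line" | awk '{ print $NF }'         # prints example.

# $8 does not exist: it is empty as a string and zero in arithmetic.
echo "$line" | awk '{ print "[" $8 "]" }'  # prints []
echo "$line" | awk '{ print $8 + 0 }'      # prints 0
```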
The use of @code{$0}, which looks like a reference to the ``zeroth'' field, is The use of @code{$0}, which looks like a reference to the ``zeroth'' field, is
skipping to change at page 110, line ? skipping to change at page 110, line ?
@example @example
$ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list} $ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list}
@print{} Amelia F @print{} Amelia F
@print{} Broderick R @print{} Broderick R
@print{} Julie F @print{} Julie F
@print{} Samuel A @print{} Samuel A
@end example @end example
@node Nonconstant Fields @node Nonconstant Fields
@section Nonconstant Field Numbers @section Nonconstant Field Numbers
@cindex fields, numbers @cindex fields @subentry numbers
@cindex field numbers @cindex field numbers
A field number need not be a constant. Any expression in A field number need not be a constant. Any expression in
the @command{awk} language can be used after a @samp{$} to refer to a the @command{awk} language can be used after a @samp{$} to refer to a
field. The value of the expression specifies the field number. If the field. The value of the expression specifies the field number. If the
value is a string, rather than a number, it is converted to a number. value is a string, rather than a number, it is converted to a number.
Consider this example: Consider this example:
@example @example
awk '@{ print $NR @}' awk '@{ print $NR @}'
skipping to change at page 110, line ? skipping to change at page 110, line ?
As mentioned in @ref{Fields}, As mentioned in @ref{Fields},
@command{awk} stores the current record's number of fields in the built-in @command{awk} stores the current record's number of fields in the built-in
variable @code{NF} (also @pxref{Built-in Variables}). Thus, the expression variable @code{NF} (also @pxref{Built-in Variables}). Thus, the expression
@code{$NF} is not a special feature---it is the direct consequence of @code{$NF} is not a special feature---it is the direct consequence of
evaluating @code{NF} and using its value as a field number. evaluating @code{NF} and using its value as a field number.
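Because any expression may follow the @samp{$}, arithmetic on @code{NF} works as well. The commands below are an illustrative sketch with made-up input. Note the parentheses: @samp{$} binds more tightly than @samp{-}, so @code{$NF-1} subtracts one from the value of the last field rather than selecting the next-to-last field.

```shell
# $(NF-1) selects the next-to-last field.
echo 'a b c d' | awk '{ print $(NF-1) }'            # prints c

# Without parentheses, this is "the last field, minus one".
echo '10 20 30' | awk '{ print $NF-1 }'             # prints 29

# An ordinary variable can hold the field number too.
echo 'a b c d' | awk '{ i = 2; print $i, $(i+1) }'  # prints b c
```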
@node Changing Fields @node Changing Fields
@section Changing the Contents of a Field @section Changing the Contents of a Field
@cindex fields, changing contents of @cindex fields @subentry changing contents of
The contents of a field, as seen by @command{awk}, can be changed within an The contents of a field, as seen by @command{awk}, can be changed within an
@command{awk} program; this changes what @command{awk} perceives as the @command{awk} program; this changes what @command{awk} perceives as the
current input record. (The actual input is untouched; @command{awk} @emph{never} current input record. (The actual input is untouched; @command{awk} @emph{never}
modifies the input file.) modifies the input file.)
Consider the following example and its output: Consider the following example and its output:
@example @example
$ @kbd{awk '@{ nboxes = $3 ; $3 = $3 - 10} $ @kbd{awk '@{ nboxes = $3 ; $3 = $3 - 10}
> @kbd{print nboxes, $3 @}' inventory-shipped} > @kbd{print nboxes, $3 @}' inventory-shipped}
@print{} 25 15 @print{} 25 15
skipping to change at page 110, line ? skipping to change at page 110, line ?
@example @example
$ @kbd{awk '@{ $6 = ($5 + $4 + $3 + $2)} $ @kbd{awk '@{ $6 = ($5 + $4 + $3 + $2)}
> @kbd{ print $6 @}' inventory-shipped} > @kbd{ print $6 @}' inventory-shipped}
@print{} 168 @print{} 168
@print{} 297 @print{} 297
@print{} 301 @print{} 301
@dots{} @dots{}
@end example @end example
@cindex adding, fields @cindex adding @subentry fields
@cindex fields, adding @cindex fields @subentry adding
@noindent @noindent
We've just created @code{$6}, whose value is the sum of fields We've just created @code{$6}, whose value is the sum of fields
@code{$2}, @code{$3}, @code{$4}, and @code{$5}. The @samp{+} sign @code{$2}, @code{$3}, @code{$4}, and @code{$5}. The @samp{+} sign
represents addition. For the file @file{inventory-shipped}, @code{$6} represents addition. For the file @file{inventory-shipped}, @code{$6}
represents the total number of parcels shipped for a particular month. represents the total number of parcels shipped for a particular month.
Creating a new field changes @command{awk}'s internal copy of the current Creating a new field changes @command{awk}'s internal copy of the current
input record, which is the value of @code{$0}. Thus, if you do @samp{print $0} input record, which is the value of @code{$0}. Thus, if you do @samp{print $0}
after adding a field, the record printed includes the new field, with after adding a field, the record printed includes the new field, with
the appropriate number of field separators between it and the previously the appropriate number of field separators between it and the previously
existing fields. existing fields.
@cindex @code{OFS} variable @cindex @code{OFS} variable
@cindex output field separator, See @code{OFS} variable @cindex output field separator @seeentry{@code{OFS} variable}
@cindex field separators, See Also @code{OFS} @cindex field separator @seealso{@code{OFS}}
This recomputation affects and is affected by This recomputation affects and is affected by
@code{NF} (the number of fields; @pxref{Fields}). @code{NF} (the number of fields; @pxref{Fields}).
For example, the value of @code{NF} is set to the number of the highest For example, the value of @code{NF} is set to the number of the highest
field you create. field you create.
The exact format of @code{$0} is also affected by a feature that has not been discussed yet: The exact format of @code{$0} is also affected by a feature that has not been discussed yet:
the @dfn{output field separator}, @code{OFS}, the @dfn{output field separator}, @code{OFS},
used to separate the fields (@pxref{Output Separators}). used to separate the fields (@pxref{Output Separators}).
Note, however, that merely @emph{referencing} an out-of-range field Note, however, that merely @emph{referencing} an out-of-range field
does @emph{not} change the value of either @code{$0} or @code{NF}. does @emph{not} change the value of either @code{$0} or @code{NF}.
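The difference between referencing and assigning a field can be sketched with two short commands. The input and the value of @code{OFS} are chosen only for illustration. The first command merely reads @code{$5}, so nothing changes. The second assigns to it, so @code{NF} grows and @code{$0} is rebuilt using @code{OFS}.

```shell
# Reading a nonexistent field leaves NF and $0 alone.
echo 'a b c' | awk '{ x = $5; print NF ": " $0 }'
# prints 3: a b c

# Assigning to it creates fields 4 and 5 and rebuilds $0 with OFS.
echo 'a b c' | awk 'BEGIN { OFS = "-" } { $5 = "new"; print NF ": " $0 }'
# prints 5: a-b-c--new
```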
skipping to change at page 110, line ? skipping to change at page 110, line ?
> @kbd{print $0; print NF @}'} > @kbd{print $0; print NF @}'}
@print{} a::c:d::new @print{} a::c:d::new
@print{} 6 @print{} 6
@end example @end example
@noindent @noindent
The intervening field, @code{$5}, is created with an empty value The intervening field, @code{$5}, is created with an empty value
(indicated by the second pair of adjacent colons), (indicated by the second pair of adjacent colons),
and @code{NF} is updated with the value six. and @code{NF} is updated with the value six.
@cindex dark corner, @code{NF} variable, decrementing @cindex dark corner @subentry @code{NF} variable, decrementing
@cindex @code{NF} variable, decrementing @cindex @code{NF} variable @subentry decrementing
Decrementing @code{NF} throws away the values of the fields Decrementing @code{NF} throws away the values of the fields
after the new value of @code{NF} and recomputes @code{$0}. after the new value of @code{NF} and recomputes @code{$0}.
@value{DARKCORNER} @value{DARKCORNER}
Here is an example: Here is an example:
@example @example
$ @kbd{echo a b c d e f | awk '@{ print "NF =", NF;} $ @kbd{echo a b c d e f | awk '@{ print "NF =", NF;}
> @kbd{ NF = 3; print $0 @}'} > @kbd{ NF = 3; print $0 @}'}
@print{} NF = 6 @print{} NF = 6
@print{} a b c @print{} a b c
@end example @end example
@cindex portability, @code{NF} variable@comma{} decrementing @cindex portability @subentry @code{NF} variable, decrementing
@quotation CAUTION @quotation CAUTION
Some versions of @command{awk} don't Some versions of @command{awk} don't
rebuild @code{$0} when @code{NF} is decremented. rebuild @code{$0} when @code{NF} is decremented.
Until August, 2018, this included BWK @command{awk}; fortunately Until August, 2018, this included BWK @command{awk}; fortunately
his version now handles this correctly. his version now handles this correctly.
@end quotation @end quotation
Finally, there are times when it is convenient to force Finally, there are times when it is convenient to force
@command{awk} to rebuild the entire record, using the current @command{awk} to rebuild the entire record, using the current
values of the fields and @code{OFS}. To do this, use the values of the fields and @code{OFS}. To do this, use the
skipping to change at page 110, line ? skipping to change at page 110, line ?
This forces @command{awk} to rebuild the record. It does help This forces @command{awk} to rebuild the record. It does help
to add a comment, as we've shown here. to add a comment, as we've shown here.
There is a flip side to the relationship between @code{$0} and There is a flip side to the relationship between @code{$0} and
the fields. Any assignment to @code{$0} causes the record to be the fields. Any assignment to @code{$0} causes the record to be
reparsed into fields using the @emph{current} value of @code{FS}. reparsed into fields using the @emph{current} value of @code{FS}.
This also applies to any built-in function that updates @code{$0}, This also applies to any built-in function that updates @code{$0},
such as @code{sub()} and @code{gsub()} such as @code{sub()} and @code{gsub()}
(@pxref{String Functions}). (@pxref{String Functions}).
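A small sketch of this reparsing, using made-up input: the first command assigns @code{$0} to itself after changing @code{FS}, which forces the record to be split again on colons. The second relies on @code{sub()} modifying @code{$0}, after which @code{NF} and the fields reflect the edited record.

```shell
# Assigning $0 to itself re-splits the record with the current FS.
echo 'a:b:c' | awk '{ FS = ":"; $0 = $0; print $2 }'   # prints b

# sub() changes $0, so the fields are recomputed as well.
echo 'a b c' | awk '{ sub(/b /, ""); print NF, $2 }'   # prints 2 c
```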
@cindex sidebar, Understanding @code{$0} @cindex sidebar @subentry Understanding @code{$0}
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>Understanding @code{$0}</title> <sidebar><title>Understanding @code{$0}</title>
@end docbook @end docbook
It is important to remember that @code{$0} is the @emph{full} It is important to remember that @code{$0} is the @emph{full}
record, exactly as it was read from the input. This includes record, exactly as it was read from the input. This includes
any leading or trailing whitespace, and the exact whitespace (or other any leading or trailing whitespace, and the exact whitespace (or other
characters) that separates the fields. characters) that separates the fields.
skipping to change at page 110, line ? skipping to change at page 110, line ?
@menu @menu
* Default Field Splitting:: How fields are normally separated. * Default Field Splitting:: How fields are normally separated.
* Regexp Field Splitting:: Using regexps as the field separator. * Regexp Field Splitting:: Using regexps as the field separator.
* Single Character Fields:: Making each character a separate field. * Single Character Fields:: Making each character a separate field.
* Command Line Field Separator:: Setting @code{FS} from the command line. * Command Line Field Separator:: Setting @code{FS} from the command line.
* Full Line Fields:: Making the full line be a single field. * Full Line Fields:: Making the full line be a single field.
* Field Splitting Summary:: Some final points and a summary table. * Field Splitting Summary:: Some final points and a summary table.
@end menu @end menu
@cindex @code{FS} variable @cindex @code{FS} variable
@cindex fields, separating @cindex fields @subentry separating
@cindex field separators @cindex field separator
@cindex fields, separating @cindex fields @subentry separating
The @dfn{field separator}, which is either a single character or a regular The @dfn{field separator}, which is either a single character or a regular
expression, controls the way @command{awk} splits an input record into fields. expression, controls the way @command{awk} splits an input record into fields.
@command{awk} scans the input record for character sequences that @command{awk} scans the input record for character sequences that
match the separator; the fields themselves are the text between the matches. match the separator; the fields themselves are the text between the matches.
In the examples that follow, we use the bullet symbol (@bullet{}) to In the examples that follow, we use the bullet symbol (@bullet{}) to
represent spaces in the output. represent spaces in the output.
If the field separator is @samp{oo}, then the following line: If the field separator is @samp{oo}, then the following line:
@example @example
moo goo gai pan moo goo gai pan
@end example @end example
@noindent @noindent
is split into three fields: @samp{m}, @samp{@bullet{}g}, and is split into three fields: @samp{m}, @samp{@bullet{}g}, and
@samp{@bullet{}gai@bullet{}pan}. @samp{@bullet{}gai@bullet{}pan}.
Note the leading spaces in the values of the second and third fields. Note the leading spaces in the values of the second and third fields.
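To see this split directly, the following sketch sets @code{FS} in a @code{BEGIN} rule and prints each field inside square brackets so that the leading spaces are visible. The brackets are just a display aid for this illustration.

```shell
# Split on the string "oo" and show every field, one per line.
echo 'moo goo gai pan' |
awk 'BEGIN { FS = "oo" }
     { for (i = 1; i <= NF; i++) printf "[%s]\n", $i }'
# prints three lines: [m], then [ g], then [ gai pan]
```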
@cindex troubleshooting, @command{awk} uses @code{FS} not @code{IFS} @cindex troubleshooting @subentry @command{awk} uses @code{FS} not @code{IFS}
The field separator is represented by the predefined variable @code{FS}. The field separator is represented by the predefined variable @code{FS}.
Shell programmers take note: @command{awk} does @emph{not} use the Shell programmers take note: @command{awk} does @emph{not} use the
name @code{IFS} that is used by the POSIX-compliant shells (such as name @code{IFS} that is used by the POSIX-compliant shells (such as
the Unix Bourne shell, @command{sh}, or Bash). the Unix Bourne shell, @command{sh}, or Bash).
@cindex @code{FS} variable, changing value of @cindex @code{FS} variable @subentry changing value of
The value of @code{FS} can be changed in the @command{awk} program with the The value of @code{FS} can be changed in the @command{awk} program with the
assignment operator, @samp{=} (@pxref{Assignment Ops}). assignment operator, @samp{=} (@pxref{Assignment Ops}).
Often, the right time to do this is at the beginning of execution Often, the right time to do this is at the beginning of execution
before any input has been processed, so that the very first record before any input has been processed, so that the very first record
is read with the proper separator. To do this, use the special is read with the proper separator. To do this, use the special
@code{BEGIN} pattern @code{BEGIN} pattern
(@pxref{BEGIN/END}). (@pxref{BEGIN/END}).
For example, here we set the value of @code{FS} to the string For example, here we set the value of @code{FS} to the string
@code{","}: @code{","}:
skipping to change at page 110, line ? skipping to change at page 110, line ?
Given the input line: Given the input line:
@example @example
John Q. Smith, 29 Oak St., Walamazoo, MI 42139 John Q. Smith, 29 Oak St., Walamazoo, MI 42139
@end example @end example
@noindent @noindent
this @command{awk} program extracts and prints the string this @command{awk} program extracts and prints the string
@samp{@bullet{}29@bullet{}Oak@bullet{}St.}. @samp{@bullet{}29@bullet{}Oak@bullet{}St.}.
@cindex field separators, choice of @cindex field separator @subentry choice of
@cindex regular expressions, as field separators @cindex regular expressions @subentry as field separators
@cindex field separators, regular expressions as @cindex field separator @subentry regular expression as
Sometimes the input data contains separator characters that don't Sometimes the input data contains separator characters that don't
separate fields the way you thought they would. For instance, the separate fields the way you thought they would. For instance, the
person's name in the example we just used might have a title or person's name in the example we just used might have a title or
suffix attached, such as: suffix attached, such as:
@example @example
John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139 John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
@end example @end example
@noindent @noindent
skipping to change at page 110, line ? skipping to change at page 110, line ?
@samp{@bullet{}29@bullet{}Oak@bullet{}St.}. @samp{@bullet{}29@bullet{}Oak@bullet{}St.}.
If you were expecting the program to print the If you were expecting the program to print the
address, you would be surprised. The moral is to choose your data layout and address, you would be surprised. The moral is to choose your data layout and
separator characters carefully to prevent such problems. separator characters carefully to prevent such problems.
(If the data is not in a form that is easy to process, perhaps you (If the data is not in a form that is easy to process, perhaps you
can massage it first with a separate @command{awk} program.) can massage it first with a separate @command{awk} program.)
@node Default Field Splitting @node Default Field Splitting
@subsection Whitespace Normally Separates Fields @subsection Whitespace Normally Separates Fields
@cindex field separators, whitespace as @cindex field separator @subentry whitespace as
@cindex whitespace, as field separators @cindex whitespace @subentry as field separators
@cindex field separator @subentry @code{FS} variable and
@cindex separators @subentry field @subentry @code{FS} variable and
Fields are normally separated by whitespace sequences Fields are normally separated by whitespace sequences
(spaces, TABs, and newlines), not by single spaces. Two spaces in a row do not (spaces, TABs, and newlines), not by single spaces. Two spaces in a row do not
delimit an empty field. The default value of the field separator @code{FS} delimit an empty field. The default value of the field separator @code{FS}
is a string containing a single space, @w{@code{" "}}. If @command{awk} is a string containing a single space, @w{@code{" "}}. If @command{awk}
interpreted this value in the usual way, each space character would separate interpreted this value in the usual way, each space character would separate
fields, so two spaces in a row would make an empty field between them. fields, so two spaces in a row would make an empty field between them.
The reason this does not happen is that a single space as the value of The reason this does not happen is that a single space as the value of
@code{FS} is a special case---it is taken to specify the default manner @code{FS} is a special case---it is taken to specify the default manner
of delimiting fields. of delimiting fields.
If @code{FS} is any other single character, such as @code{","}, then If @code{FS} is any other single character, such as @code{","}, then
each occurrence of that character separates two fields. Two consecutive each occurrence of that character separates two fields. Two consecutive
occurrences delimit an empty field. If the character occurs at the occurrences delimit an empty field. If the character occurs at the
beginning or the end of the line, that too delimits an empty field. The beginning or the end of the line, that too delimits an empty field. The
space character is the only single character that does not follow these space character is the only single character that does not follow these
rules. rules.
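The contrast between the default separator and any other single character can be checked by counting fields. The two input lines below are invented for the purpose of this sketch.

```shell
# Default FS: runs of whitespace separate fields, and leading
# and trailing whitespace is ignored.
echo '  a   b  ' | awk '{ print NF }'                   # prints 2

# FS = ",": every comma separates two fields, including the
# commas at the beginning and end of the line.
echo ',a,,b,' | awk 'BEGIN { FS = "," } { print NF }'   # prints 5
```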
@node Regexp Field Splitting @node Regexp Field Splitting
@subsection Using Regular Expressions to Separate Fields @subsection Using Regular Expressions to Separate Fields
@cindex regular expressions, as field separators @cindex regular expressions @subentry as field separators
@cindex field separators, regular expressions as @cindex field separator @subentry regular expression as
The previous @value{SUBSECTION} The previous @value{SUBSECTION}
discussed the use of single characters or simple strings as the discussed the use of single characters or simple strings as the
value of @code{FS}. value of @code{FS}.
More generally, the value of @code{FS} may be a string containing any More generally, the value of @code{FS} may be a string containing any
regular expression. In this case, each match in the record for the regular regular expression. In this case, each match in the record for the regular
expression separates fields. For example, the assignment: expression separates fields. For example, the assignment:
@example @example
FS = ", \t" FS = ", \t"
@end example @end example
skipping to change at page 110, line ? skipping to change at page 110, line ?
each letter): each letter):
@example @example
$ @kbd{echo ' a b c d ' | awk 'BEGIN @{ FS = "[ \t\n]+" @}} $ @kbd{echo ' a b c d ' | awk 'BEGIN @{ FS = "[ \t\n]+" @}}
> @kbd{@{ print $2 @}'} > @kbd{@{ print $2 @}'}
@print{} a @print{} a
@end example @end example
@noindent @noindent
@cindex null strings @cindex null strings
@cindex strings, null @cindex strings @subentry null
@cindex empty strings, See null strings
In this case, the first field is null, or empty. In this case, the first field is null, or empty.
The stripping of leading and trailing whitespace also comes into The stripping of leading and trailing whitespace also comes into
play whenever @code{$0} is recomputed. For instance, study this pipeline: play whenever @code{$0} is recomputed. For instance, study this pipeline:
@example @example
$ @kbd{echo ' a b c d' | awk '@{ print; $2 = $2; print @}'} $ @kbd{echo ' a b c d' | awk '@{ print; $2 = $2; print @}'}
@print{} a b c d @print{} a b c d
@print{} a b c d @print{} a b c d
@end example @end example
@noindent @noindent
The first @code{print} statement prints the record as it was read, The first @code{print} statement prints the record as it was read,
with leading whitespace intact. The assignment to @code{$2} rebuilds with leading whitespace intact. The assignment to @code{$2} rebuilds
@code{$0} by concatenating @code{$1} through @code{$NF} together, @code{$0} by concatenating @code{$1} through @code{$NF} together,
separated by the value of @code{OFS} (which is a space by default). separated by the value of @code{OFS} (which is a space by default).
Because the leading whitespace was ignored when finding @code{$1}, Because the leading whitespace was ignored when finding @code{$1},
it is not part of the new @code{$0}. Finally, the last @code{print} it is not part of the new @code{$0}. Finally, the last @code{print}
statement prints the new @code{$0}. statement prints the new @code{$0}.
@cindex @code{FS}, containing @code{^} @cindex @code{FS} variable @subentry containing @code{^}
@cindex @code{^} (caret), in @code{FS} @cindex @code{^} (caret) @subentry in @code{FS}
@cindex dark corner, @code{^}, in @code{FS} @cindex dark corner @subentry @code{^}, in @code{FS}
There is an additional subtlety to be aware of when using regular expressions There is an additional subtlety to be aware of when using regular expressions
for field splitting. for field splitting.
It is not well specified in the POSIX standard, or anywhere else, what @samp{^} It is not well specified in the POSIX standard, or anywhere else, what @samp{^}
means when splitting fields. Does the @samp{^} match only at the beginning of means when splitting fields. Does the @samp{^} match only at the beginning of
the entire record? Or is each field separator a new string? It turns out that the entire record? Or is each field separator a new string? It turns out that
different @command{awk} versions answer this question differently, and you different @command{awk} versions answer this question differently, and you
should not rely on any specific behavior in your programs. should not rely on any specific behavior in your programs.
@value{DARKCORNER} @value{DARKCORNER}
@cindex Brian Kernighan's @command{awk} @cindex Brian Kernighan's @command{awk}
skipping to change at page 110, line ? skipping to change at page 110, line ?
> @kbd{ printf "-->%s<--\n", $i @}'} > @kbd{ printf "-->%s<--\n", $i @}'}
@print{} --><-- @print{} --><--
@print{} -->AA<-- @print{} -->AA<--
@print{} -->xxBxx<-- @print{} -->xxBxx<--
@print{} -->C<-- @print{} -->C<--
@end example @end example
@node Single Character Fields @node Single Character Fields
@subsection Making Each Character a Separate Field @subsection Making Each Character a Separate Field
@cindex common extensions, single character fields @cindex common extensions @subentry single character fields
@cindex extensions, common@comma{} single character fields @cindex extensions @subentry common @subentry single character fields
@cindex differences in @command{awk} and @command{gawk}, single-character fields @cindex differences in @command{awk} and @command{gawk} @subentry single-character fields
@cindex single-character fields @cindex single-character fields
@cindex fields, single-character @cindex fields @subentry single-character
There are times when you may want to examine each character There are times when you may want to examine each character
of a record separately. This can be done in @command{gawk} by of a record separately. This can be done in @command{gawk} by
simply assigning the null string (@code{""}) to @code{FS}. @value{COMMONEXT} simply assigning the null string (@code{""}) to @code{FS}. @value{COMMONEXT}
In this case, In this case,
each individual character in the record becomes a separate field. each individual character in the record becomes a separate field.
For example: For example:
@example @example
$ @kbd{echo a b | gawk 'BEGIN @{ FS = "" @}} $ @kbd{echo a b | gawk 'BEGIN @{ FS = "" @}}
> @kbd{@{} > @kbd{@{}
> @kbd{for (i = 1; i <= NF; i = i + 1)} > @kbd{for (i = 1; i <= NF; i = i + 1)}
> @kbd{print "Field", i, "is", $i} > @kbd{print "Field", i, "is", $i}
> @kbd{@}'} > @kbd{@}'}
@print{} Field 1 is a @print{} Field 1 is a
@print{} Field 2 is @print{} Field 2 is
@print{} Field 3 is b @print{} Field 3 is b
@end example @end example
@cindex dark corner, @code{FS} as null string @cindex dark corner @subentry @code{FS} as null string
@cindex @code{FS} variable, as null string @cindex @code{FS} variable @subentry null string as
Traditionally, the behavior of @code{FS} equal to @code{""} was not defined. Traditionally, the behavior of @code{FS} equal to @code{""} was not defined.
In this case, most versions of Unix @command{awk} simply treat the entire record In this case, most versions of Unix @command{awk} simply treat the entire record
as only having one field. as only having one field.
@value{DARKCORNER} @value{DARKCORNER}
In compatibility mode In compatibility mode
(@pxref{Options}), (@pxref{Options}),
if @code{FS} is the null string, then @command{gawk} also if @code{FS} is the null string, then @command{gawk} also
behaves this way. behaves this way.
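Because the null-string @code{FS} is an extension, a program that must also run under other @command{awk} versions can walk the record with the standard @code{length()} and @code{substr()} functions instead. The following is a minimal sketch, not taken from the manual; the shell function name @code{each_char} is made up for illustration:

```shell
# Hypothetical helper: print every character of each input line,
# one per output line, without relying on FS = "".
each_char() {
    awk '{
        n = length($0)
        for (i = 1; i <= n; i++)
            printf "Char %d is <%s>\n", i, substr($0, i, 1)
    }'
}

echo 'a b' | each_char
```

For the input @samp{a b}, this reports three characters (the letter, the space, and the second letter), which mirrors the three fields shown in the @command{gawk} example above.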
@node Command Line Field Separator @node Command Line Field Separator
@subsection Setting @code{FS} from the Command Line @subsection Setting @code{FS} from the Command Line
@cindex @option{-F} option, command-line @cindex @option{-F} option @subentry command-line
@cindex field separator, on command line @cindex field separator @subentry on command line
@cindex command line, @code{FS} on@comma{} setting @cindex command line @subentry @code{FS} on, setting
@cindex @code{FS} variable, setting from command line @cindex @code{FS} variable @subentry setting from command line
@code{FS} can be set on the command line. Use the @option{-F} option to @code{FS} can be set on the command line. Use the @option{-F} option to
do so. For example: do so. For example:
@example @example
awk -F, '@var{program}' @var{input-files} awk -F, '@var{program}' @var{input-files}
@end example @end example
@noindent @noindent
sets @code{FS} to the @samp{,} character. Notice that the option uses sets @code{FS} to the @samp{,} character. Notice that the option uses
skipping to change at page 110, line ? skipping to change at page 110, line ?
Any special characters in the field separator must be escaped Any special characters in the field separator must be escaped
appropriately. For example, to use a @samp{\} as the field separator appropriately. For example, to use a @samp{\} as the field separator
on the command line, you would have to type: on the command line, you would have to type:
@example @example
# same as FS = "\\" # same as FS = "\\"
awk -F\\\\ '@dots{}' files @dots{} awk -F\\\\ '@dots{}' files @dots{}
@end example @end example
@noindent @noindent
@cindex field separator, backslash (@code{\}) as @cindex field separator @subentry backslash (@code{\}) as
@cindex @code{\} (backslash), as field separator @cindex @code{\} (backslash) @subentry as field separator
@cindex backslash (@code{\}), as field separator @cindex backslash (@code{\}) @subentry as field separator
Because @samp{\} is used for quoting in the shell, @command{awk} sees Because @samp{\} is used for quoting in the shell, @command{awk} sees
@samp{-F\\}. Then @command{awk} processes the @samp{\\} for escape @samp{-F\\}. Then @command{awk} processes the @samp{\\} for escape
characters (@pxref{Escape Sequences}), finally yielding characters (@pxref{Escape Sequences}), finally yielding
a single @samp{\} to use for the field separator. a single @samp{\} to use for the field separator.
@c @cindex historical features @c @cindex historical features
As a special case, in compatibility mode As a special case, in compatibility mode
(@pxref{Options}), (@pxref{Options}),
if the argument to @option{-F} is @samp{t}, then @code{FS} is set to if the argument to @option{-F} is @samp{t}, then @code{FS} is set to
the TAB character. If you type @samp{-F\t} at the the TAB character. If you type @samp{-F\t} at the
skipping to change at page 110, line ? skipping to change at page 110, line ?
@example @example
Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R
@end example @end example
The @samp{-} as part of the person's name was used as the field The @samp{-} as part of the person's name was used as the field
separator, instead of the @samp{-} in the phone number that was separator, instead of the @samp{-} in the phone number that was
originally intended. This demonstrates why you have to be careful in originally intended. This demonstrates why you have to be careful in
choosing your field and record separators. choosing your field and record separators.
@cindex Unix @command{awk}, password files@comma{} field separators and @cindex Unix @command{awk} @subentry password files, field separators and
Perhaps the most common use of a single character as the field separator Perhaps the most common use of a single character as the field separator
occurs when processing the Unix system password file. On many Unix occurs when processing the Unix system password file. On many Unix
systems, each user has a separate entry in the system password file, with one systems, each user has a separate entry in the system password file, with one
line per user. The information in these lines is separated by colons. line per user. The information in these lines is separated by colons.
The first field is the user's login name and the second is the user's The first field is the user's login name and the second is the user's
encrypted or shadow password. (A shadow password is indicated by the encrypted or shadow password. (A shadow password is indicated by the
presence of a single @samp{x} in the second field.) A password file presence of a single @samp{x} in the second field.) A password file
entry might look like this: entry might look like this:
@cindex Robbins, Arnold @cindex Robbins @subentry Arnold
@example @example
arnold:x:2076:10:Arnold Robbins:/home/arnold:/bin/bash arnold:x:2076:10:Arnold Robbins:/home/arnold:/bin/bash
@end example @end example
The following program searches the system password file and prints The following program searches the system password file and prints
the entries for users whose full name is not indicated: the entries for users whose full name is not indicated:
@example @example
awk -F: '$5 == ""' /etc/passwd awk -F: '$5 == ""' /etc/passwd
@end example @end example
skipping to change at page 110, line ? skipping to change at page 110, line ?
setting @code{FS} to @code{"\n"} (a newline):@footnote{Thanks to setting @code{FS} to @code{"\n"} (a newline):@footnote{Thanks to
Andrew Schorr for this tip.} Andrew Schorr for this tip.}
@example @example
awk -F'\n' '@var{program}' @var{files @dots{}} awk -F'\n' '@var{program}' @var{files @dots{}}
@end example @end example
@noindent @noindent
When you do this, @code{$1} is the same as @code{$0}. When you do this, @code{$1} is the same as @code{$0}.
@cindex sidebar, Changing @code{FS} Does Not Affect the Fields @cindex sidebar @subentry Changing @code{FS} Does Not Affect the Fields
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>Changing @code{FS} Does Not Affect the Fields</title> <sidebar><title>Changing @code{FS} Does Not Affect the Fields</title>
@end docbook @end docbook
@cindex POSIX @command{awk}, field separators and @cindex POSIX @command{awk} @subentry field separators and
@cindex field separator, POSIX and @cindex field separator @subentry POSIX and
According to the POSIX standard, @command{awk} is supposed to behave According to the POSIX standard, @command{awk} is supposed to behave
as if each record is split into fields at the time it is read. as if each record is split into fields at the time it is read.
In particular, this means that if you change the value of @code{FS} In particular, this means that if you change the value of @code{FS}
after a record is read, the values of the fields (i.e., how they were split) after a record is read, the values of the fields (i.e., how they were split)
should reflect the old value of @code{FS}, not the new one. should reflect the old value of @code{FS}, not the new one.
@cindex dark corner, field separators @cindex dark corner @subentry field separators
@cindex @command{sed} utility @cindex @command{sed} utility
@cindex stream editors @cindex stream editors
However, many older implementations of @command{awk} do not work this way. Instead, However, many older implementations of @command{awk} do not work this way. Instead,
they defer splitting the fields until a field is actually they defer splitting the fields until a field is actually
referenced. The fields are split referenced. The fields are split
using the @emph{current} value of @code{FS}! using the @emph{current} value of @code{FS}!
@value{DARKCORNER} @value{DARKCORNER}
This behavior can be difficult This behavior can be difficult
to diagnose. The following example illustrates the difference to diagnose. The following example illustrates the difference
between the two methods: between the two methods:
skipping to change at page 110, line ? skipping to change at page 110, line ?
@docbook @docbook
</sidebar> </sidebar>
@end docbook @end docbook
@end ifdocbook @end ifdocbook
@ifnotdocbook @ifnotdocbook
@cartouche @cartouche
@center @b{Changing @code{FS} Does Not Affect the Fields} @center @b{Changing @code{FS} Does Not Affect the Fields}
@cindex POSIX @command{awk}, field separators and @cindex POSIX @command{awk} @subentry field separators and
@cindex field separator, POSIX and @cindex field separator @subentry POSIX and
According to the POSIX standard, @command{awk} is supposed to behave According to the POSIX standard, @command{awk} is supposed to behave
as if each record is split into fields at the time it is read. as if each record is split into fields at the time it is read.
In particular, this means that if you change the value of @code{FS} In particular, this means that if you change the value of @code{FS}
after a record is read, the values of the fields (i.e., how they were split) after a record is read, the values of the fields (i.e., how they were split)
should reflect the old value of @code{FS}, not the new one. should reflect the old value of @code{FS}, not the new one.
@cindex dark corner, field separators @cindex dark corner @subentry field separators
@cindex @command{sed} utility @cindex @command{sed} utility
@cindex stream editors @cindex stream editors
However, many older implementations of @command{awk} do not work this way. Instead, However, many older implementations of @command{awk} do not work this way. Instead,
they defer splitting the fields until a field is actually they defer splitting the fields until a field is actually
referenced. The fields are split referenced. The fields are split
using the @emph{current} value of @code{FS}! using the @emph{current} value of @code{FS}!
@value{DARKCORNER} @value{DARKCORNER}
This behavior can be difficult This behavior can be difficult
to diagnose. The following example illustrates the difference to diagnose. The following example illustrates the difference
between the two methods: between the two methods:
skipping to change at page 110, line ? skipping to change at page 110, line ?
@item FS == @var{regexp} @item FS == @var{regexp}
Fields are separated by occurrences of characters that match @var{regexp}. Fields are separated by occurrences of characters that match @var{regexp}.
Leading and trailing matches of @var{regexp} delimit empty fields. Leading and trailing matches of @var{regexp} delimit empty fields.
@item FS == "" @item FS == ""
Each individual character in the record becomes a separate field. Each individual character in the record becomes a separate field.
(This is a common extension; it is not specified by the POSIX standard.) (This is a common extension; it is not specified by the POSIX standard.)
@end table @end table
@cindex sidebar, @code{FS} and @code{IGNORECASE} @cindex sidebar @subentry @code{FS} and @code{IGNORECASE}
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>@code{FS} and @code{IGNORECASE}</title> <sidebar><title>@code{FS} and @code{IGNORECASE}</title>
@end docbook @end docbook
The @code{IGNORECASE} variable The @code{IGNORECASE} variable
(@pxref{User-modified}) (@pxref{User-modified})
affects field splitting @emph{only} when the value of @code{FS} is a regexp. affects field splitting @emph{only} when the value of @code{FS} is a regexp.
It has no effect when @code{FS} is a single character, even if It has no effect when @code{FS} is a single character, even if
that character is a letter. Thus, in the following code: that character is a letter. Thus, in the following code:
skipping to change at page 110, line ? skipping to change at page 110, line ?
do it for you (e.g., @samp{FS = "[c]"}). In this case, @code{IGNORECASE} do it for you (e.g., @samp{FS = "[c]"}). In this case, @code{IGNORECASE}
will take effect. will take effect.
@end cartouche @end cartouche
@end ifnotdocbook @end ifnotdocbook
@node Constant Size @node Constant Size
@section Reading Fixed-Width Data @section Reading Fixed-Width Data
@cindex data, fixed-width @cindex data, fixed-width
@cindex fixed-width data @cindex fixed-width data
@cindex advanced features, fixed-width data @cindex advanced features @subentry fixed-width data
@c O'Reilly doesn't like it as a note the first thing in the section. @c O'Reilly doesn't like it as a note the first thing in the section.
This @value{SECTION} discusses an advanced This @value{SECTION} discusses an advanced
feature of @command{gawk}. If you are a novice @command{awk} user, feature of @command{gawk}. If you are a novice @command{awk} user,
you might want to skip it on the first reading. you might want to skip it on the first reading.
@command{gawk} provides a facility for dealing with fixed-width fields @command{gawk} provides a facility for dealing with fixed-width fields
with no distinctive field separator. We discuss this feature in with no distinctive field separator. We discuss this feature in
the following @value{SUBSECTION}s. the following @value{SUBSECTION}s.
skipping to change at page 110, line ? skipping to change at page 110, line ?
anticipate the use of their output as input for other programs. anticipate the use of their output as input for other programs.
An example of the latter is a table where all the columns are lined up An example of the latter is a table where all the columns are lined up
by the use of a variable number of spaces and @emph{empty fields are by the use of a variable number of spaces and @emph{empty fields are
just spaces}. Clearly, @command{awk}'s normal field splitting based just spaces}. Clearly, @command{awk}'s normal field splitting based
on @code{FS} does not work well in this case. Although a portable on @code{FS} does not work well in this case. Although a portable
@command{awk} program can use a series of @code{substr()} calls on @command{awk} program can use a series of @code{substr()} calls on
@code{$0} (@pxref{String Functions}), this is awkward and inefficient @code{$0} (@pxref{String Functions}), this is awkward and inefficient
for a large number of fields. for a large number of fields.
@cindex troubleshooting, fatal errors, field widths@comma{} specifying @cindex troubleshooting @subentry fatal errors @subentry field widths, specifying
@cindex @command{w} utility @cindex @command{w} utility
@cindex @code{FIELDWIDTHS} variable @cindex @code{FIELDWIDTHS} variable
@cindex @command{gawk}, @code{FIELDWIDTHS} variable in @cindex @command{gawk} @subentry @code{FIELDWIDTHS} variable in
The splitting of an input record into fixed-width fields is specified by The splitting of an input record into fixed-width fields is specified by
assigning a string containing space-separated numbers to the built-in assigning a string containing space-separated numbers to the built-in
variable @code{FIELDWIDTHS}. Each number specifies the width of the variable @code{FIELDWIDTHS}. Each number specifies the width of the
field, @emph{including} columns between fields. If you want to ignore field, @emph{including} columns between fields. If you want to ignore
the columns between fields, you can specify the width as a separate the columns between fields, you can specify the width as a separate
field that is subsequently ignored. It is a fatal error to supply a field that is subsequently ignored. It is a fatal error to supply a
field width that has a negative value. field width that has a negative value.
The following data is the output of the Unix @command{w} utility. It is useful The following data is the output of the Unix @command{w} utility. It is useful
to illustrate the use of @code{FIELDWIDTHS}: to illustrate the use of @code{FIELDWIDTHS}:
skipping to change at page 110, line ? skipping to change at page 110, line ?
@item Too much data, but with @samp{*} supplied @item Too much data, but with @samp{*} supplied
For example, suppose @code{FIELDWIDTHS} is set to @code{"2 3 4 *"} and the For example, suppose @code{FIELDWIDTHS} is set to @code{"2 3 4 *"} and the
input record is @samp{aabbbccccddd}. In this case, @code{NF} is set to input record is @samp{aabbbccccddd}. In this case, @code{NF} is set to
four, and @code{$4} has the value @code{"ddd"}. four, and @code{$4} has the value @code{"ddd"}.
@end table @end table
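The last case in the table can be checked directly from the shell. This sketch reuses the values given there; it assumes a @command{gawk} recent enough to accept @samp{*} in @code{FIELDWIDTHS}, and the function name @code{split_fixed} is invented here:

```shell
# Hypothetical wrapper: split each input line into fixed-width fields of
# 2, 3, and 4 characters, with "*" collecting whatever is left over.
split_fixed() {
    gawk 'BEGIN { FIELDWIDTHS = "2 3 4 *" }
    {
        printf "NF = %d\n", NF
        for (i = 1; i <= NF; i++)
            printf "$%d = <%s>\n", i, $i
    }'
}

echo aabbbccccddd | split_fixed
```

Printing each field between angle brackets makes it easy to see where one field ends and the next begins, including any trailing whitespace that a fixed-width field may carry.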
@node Splitting By Content @node Splitting By Content
@section Defining Fields by Content @section Defining Fields by Content
@menu
* More CSV:: More on CSV files.
@end menu
@c O'Reilly doesn't like it as a note the first thing in the section. @c O'Reilly doesn't like it as a note the first thing in the section.
This @value{SECTION} discusses an advanced This @value{SECTION} discusses an advanced
feature of @command{gawk}. If you are a novice @command{awk} user, feature of @command{gawk}. If you are a novice @command{awk} user,
you might want to skip it on the first reading. you might want to skip it on the first reading.
@cindex advanced features, specifying field content @cindex advanced features @subentry specifying field content
Normally, when using @code{FS}, @command{gawk} defines the fields as the Normally, when using @code{FS}, @command{gawk} defines the fields as the
parts of the record that occur in between each field separator. In other parts of the record that occur in between each field separator. In other
words, @code{FS} defines what a field @emph{is not}, instead of what a field words, @code{FS} defines what a field @emph{is not}, instead of what a field
@emph{is}. @emph{is}.
However, there are times when you really want to define the fields by However, there are times when you really want to define the fields by
what they are, and not by what they are not. what they are, and not by what they are not.
The most notorious such case The most notorious such case
is so-called @dfn{comma-separated values} (CSV) data. Many spreadsheet programs, is so-called @dfn{comma-separated values} (CSV) data. Many spreadsheet programs,
for example, can export their data into text files, where each record is for example, can export their data into text files, where each record is
skipping to change at page 110, line ? skipping to change at page 110, line ?
@uref{http://www.ietf.org/rfc/rfc4180.txt, RFC 4180} @uref{http://www.ietf.org/rfc/rfc4180.txt, RFC 4180}
standardizes the most common practices.} standardizes the most common practices.}
So, we might have data like this: So, we might have data like this:
@example @example
@c file eg/misc/addresses.csv @c file eg/misc/addresses.csv
Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA
@c endfile @c endfile
@end example @end example
@cindex @command{gawk}, @code{FPAT} variable in @cindex @command{gawk} @subentry @code{FPAT} variable in
@cindex @code{FPAT} variable @cindex @code{FPAT} variable
The @code{FPAT} variable offers a solution for cases like this. The @code{FPAT} variable offers a solution for cases like this.
The value of @code{FPAT} should be a string that provides a regular expression. The value of @code{FPAT} should be a string that provides a regular expression.
This regular expression describes the contents of each field. This regular expression describes the contents of each field.
In the case of CSV data as presented here, each field is either ``anything that In the case of CSV data as presented here, each field is either ``anything that
is not a comma,'' or ``a double quote, anything that is not a double quote, and a is not a comma,'' or ``a double quote, anything that is not a double quote, and a
closing double quote.'' If written as a regular expression constant closing double quote.'' (There are more complicated definitions of CSV data,
treated shortly.)
If written as a regular expression constant
(@pxref{Regexp}), (@pxref{Regexp}),
we would have @code{/([^,]+)|("[^"]+")/}. we would have @code{/([^,]+)|("[^"]+")/}.
Writing this as a string requires us to escape the double quotes, leading to: Writing this as a string requires us to escape the double quotes, leading to:
@example @example
FPAT = "([^,]+)|(\"[^\"]+\")" FPAT = "([^,]+)|(\"[^\"]+\")"
@end example @end example
Putting this to use, here is a simple program to parse the data: Putting this to use, here is a simple program to parse the data:
skipping to change at page 110, line ? skipping to change at page 110, line ?
A straightforward improvement when processing CSV data of this sort A straightforward improvement when processing CSV data of this sort
would be to remove the quotes when they occur, with something like this: would be to remove the quotes when they occur, with something like this:
@example @example
if (substr($i, 1, 1) == "\"") @{ if (substr($i, 1, 1) == "\"") @{
len = length($i) len = length($i)
$i = substr($i, 2, len - 2) # Get text within the two quotes $i = substr($i, 2, len - 2) # Get text within the two quotes
@} @}
@end example @end example
As with @code{FS}, the @code{IGNORECASE} variable (@pxref{User-modified})
affects field splitting with @code{FPAT}.
Assigning a value to @code{FPAT} overrides field splitting
with @code{FS} and with @code{FIELDWIDTHS}.
@quotation NOTE @quotation NOTE
Some programs export CSV data that contains embedded newlines between Some programs export CSV data that contains embedded newlines between
the double quotes. @command{gawk} provides no way to deal with this. the double quotes. @command{gawk} provides no way to deal with this.
Even though a formal specification for CSV data exists, there isn't much Even though a formal specification for CSV data exists, there isn't much
more to be done; more to be done;
the @code{FPAT} mechanism provides an elegant solution for the majority the @code{FPAT} mechanism provides an elegant solution for the majority
of cases, and the @command{gawk} developers are satisfied with that. of cases, and the @command{gawk} developers are satisfied with that.
@end quotation @end quotation
As written, the regexp used for @code{FPAT} requires that each field As written, the regexp used for @code{FPAT} requires that each field
skipping to change at page 110, line ? skipping to change at page 110, line ?
@example @example
FPAT = "([^,]*)|(\"[^\"]+\")" FPAT = "([^,]*)|(\"[^\"]+\")"
@end example @end example
@c FIXME: 4/2015 @c FIXME: 4/2015
@c Consider use of FPAT = "([^,]*)|(\"[^\"]*\")" @c Consider use of FPAT = "([^,]*)|(\"[^\"]*\")"
@c (star in latter part of value) to allow quoted strings to be empty. @c (star in latter part of value) to allow quoted strings to be empty.
@c Per email from Ed Morton <mortoneccc@comcast.net> @c Per email from Ed Morton <mortoneccc@comcast.net>
As with @code{FS}, the @code{IGNORECASE} variable (@pxref{User-modified})
affects field splitting with @code{FPAT}.
Assigning a value to @code{FPAT} overrides field splitting
with @code{FS} and with @code{FIELDWIDTHS}.
Finally, the @code{patsplit()} function makes the same functionality Finally, the @code{patsplit()} function makes the same functionality
available for splitting regular strings (@pxref{String Functions}). available for splitting regular strings (@pxref{String Functions}).
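As a rough illustration of @code{patsplit()}, the sketch below applies the first @code{FPAT} regexp from this @value{SECTION} to a string held in an ordinary variable rather than to an input record. The sample string is a shortened form of the address line shown earlier, and the names @code{split_csv_string}, @code{line}, and @code{parts} are assumptions made for this example:

```shell
# Hypothetical wrapper: split a CSV-like string with patsplit(), using
# the regexp "([^,]+)|(\"[^\"]+\")" to describe what a field looks like.
split_csv_string() {
    gawk 'BEGIN {
        line = "Robbins,Arnold,\"1234 A Pretty Street, NE\",MyTown"
        # patsplit() fills parts[] and returns the number of pieces found
        n = patsplit(line, parts, "([^,]+)|(\"[^\"]+\")")
        printf "n = %d\n", n
        for (i = 1; i <= n; i++)
            printf "parts[%d] = <%s>\n", i, parts[i]
    }'
}

split_csv_string
```

The quoted address stays together as one element, quotes included, just as it does when the same regexp is assigned to @code{FPAT}.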
@node More CSV
@subsection More on CSV Files
@cindex Collado, Manuel
Manuel Collado notes that in addition to commas, a CSV field can also
contain quotes, which have to be escaped by doubling them. The previously
described regexps fail to accept quoted fields with both commas and
quotes inside. He suggests that the simplest @code{FPAT} expression that
recognizes this kind of field is @code{/([^,]*)|("([^"]|"")+")/}. He
provides the following input data to test these variants:
@example
@c file eg/misc/sample.csv
p,"q,r",s
p,"q""r",s
p,"q,""r",s
p,"",s
p,,s
@c endfile
@end example
@noindent
And here is his test program:
@example
@c file eg/misc/test-csv.awk
@group
BEGIN @{
fp[0] = "([^,]+)|(\"[^\"]+\")"
fp[1] = "([^,]*)|(\"[^\"]+\")"
fp[2] = "([^,]*)|(\"([^\"]|\"\")+\")"
FPAT = fp[fpat+0]
@}
@end group
@group
@{
print "<" $0 ">"
printf("NF = %s ", NF)
for (i = 1; i <= NF; i++) @{
printf("<%s>", $i)
@}
print ""
@}
@end group
@c endfile
@end example
When run on the third variant, it produces:
@example
$ @kbd{gawk -v fpat=2 -f test-csv.awk sample.csv}
@print{} <p,"q,r",s>
@print{} NF = 3 <p><"q,r"><s>
@print{} <p,"q""r",s>
@print{} NF = 3 <p><"q""r"><s>
@print{} <p,"q,""r",s>
@print{} NF = 3 <p><"q,""r"><s>
@print{} <p,"",s>
@print{} NF = 3 <p><""><s>
@print{} <p,,s>
@print{} NF = 3 <p><><s>
@end example
@node Testing field creation @node Testing field creation
@section Checking How @command{gawk} Is Splitting Records @section Checking How @command{gawk} Is Splitting Records
@cindex @command{gawk}, splitting fields and @cindex @command{gawk} @subentry splitting fields and
As we've seen, @command{gawk} provides three independent methods to split As we've seen, @command{gawk} provides three independent methods to split
input records into fields. The mechanism used is based on which of the input records into fields. The mechanism used is based on which of the
three variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was three variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was
last assigned to. In addition, an API input parser may choose to override last assigned to. In addition, an API input parser may choose to override
the record parsing mechanism; please refer to @ref{Input Parsers} for the record parsing mechanism; please refer to @ref{Input Parsers} for
further information about this feature. further information about this feature.
To restore normal field splitting after using @code{FIELDWIDTHS} To restore normal field splitting after using @code{FIELDWIDTHS}
and/or @code{FPAT}, simply assign a value to @code{FS}. and/or @code{FPAT}, simply assign a value to @code{FS}.
You can use @samp{FS = FS} to do this, You can use @samp{FS = FS} to do this,
skipping to change at page 110, line ? skipping to change at page 110, line ?
This information is useful when writing a function that needs to This information is useful when writing a function that needs to
temporarily change @code{FS} or @code{FIELDWIDTHS}, read some records, temporarily change @code{FS} or @code{FIELDWIDTHS}, read some records,
and then restore the original settings (@pxref{Passwd Functions} for an and then restore the original settings (@pxref{Passwd Functions} for an
example of such a function). example of such a function).
@node Multiple Line @node Multiple Line
@section Multiple-Line Records @section Multiple-Line Records
@cindex multiple-line records @cindex multiple-line records
@cindex records, multiline @cindex records @subentry multiline
@cindex input, multiline records @cindex input @subentry multiline records
@cindex files, reading, multiline records @cindex files @subentry reading @subentry multiline records
@cindex input, files, See input files @cindex input, files @seeentry{input files}
In some databases, a single line cannot conveniently hold all the In some databases, a single line cannot conveniently hold all the
information in one entry. In such cases, you can use multiline information in one entry. In such cases, you can use multiline
records. The first step in doing this is to choose your data format. records. The first step in doing this is to choose your data format.
@cindex record separators, with multiline records @cindex record separators @subentry with multiline records
One technique is to use an unusual character or string to separate One technique is to use an unusual character or string to separate
records. For example, you could use the formfeed character (written records. For example, you could use the formfeed character (written
@samp{\f} in @command{awk}, as in C) to separate them, making each record @samp{\f} in @command{awk}, as in C) to separate them, making each record
a page of the file. To do this, just set the variable @code{RS} to a page of the file. To do this, just set the variable @code{RS} to
@code{"\f"} (a string containing the formfeed character). Any @code{"\f"} (a string containing the formfeed character). Any
other character could equally well be used, as long as it won't be part other character could equally well be used, as long as it won't be part
of the data in a record. of the data in a record.
@cindex @code{RS} variable, multiline records and @cindex @code{RS} variable @subentry multiline records and
Another technique is to have blank lines separate records. By a special Another technique is to have blank lines separate records. By a special
dispensation, an empty string as the value of @code{RS} indicates that dispensation, an empty string as the value of @code{RS} indicates that
records are separated by one or more blank lines. When @code{RS} is set records are separated by one or more blank lines. When @code{RS} is set
to the empty string, each record always ends at the first blank line to the empty string, each record always ends at the first blank line
encountered. The next record doesn't start until the first nonblank encountered. The next record doesn't start until the first nonblank
line that follows. No matter how many blank lines appear in a row, they line that follows. No matter how many blank lines appear in a row, they
all act as one record separator. all act as one record separator.
(Blank lines must be completely empty; lines that contain only (Blank lines must be completely empty; lines that contain only
whitespace do not count.) whitespace do not count.)
@cindex leftmost longest match @cindex leftmost longest match
@cindex matching, leftmost longest @cindex matching @subentry leftmost longest
You can achieve the same effect as @samp{RS = ""} by assigning the You can achieve the same effect as @samp{RS = ""} by assigning the
string @code{"\n\n+"} to @code{RS}. This regexp matches the newline string @code{"\n\n+"} to @code{RS}. This regexp matches the newline
at the end of the record and one or more blank lines after the record. at the end of the record and one or more blank lines after the record.
In addition, a regular expression always matches the longest possible In addition, a regular expression always matches the longest possible
sequence when there is a choice sequence when there is a choice
(@pxref{Leftmost Longest}). (@pxref{Leftmost Longest}).
So, the next record doesn't start until So, the next record doesn't start until
the first nonblank line that follows---no matter how many blank lines the first nonblank line that follows---no matter how many blank lines
appear in a row, they are considered one record separator. appear in a row, they are considered one record separator.
@cindex dark corner, multiline records @cindex dark corner @subentry multiline records
However, there is an important difference between @samp{RS = ""} and However, there is an important difference between @samp{RS = ""} and
@samp{RS = "\n\n+"}. In the first case, leading newlines in the input @samp{RS = "\n\n+"}. In the first case, leading newlines in the input
@value{DF} are ignored, and if a file ends without extra blank lines @value{DF} are ignored, and if a file ends without extra blank lines
after the last record, the final newline is removed from the record. after the last record, the final newline is removed from the record.
In the second case, this special processing is not done. In the second case, this special processing is not done.
@value{DARKCORNER} @value{DARKCORNER}
@cindex field separator, in multiline records @cindex field separator @subentry in multiline records
@cindex @code{FS}, in multiline records @cindex @code{FS} variable @subentry in multiline records
Now that the input is separated into records, the second step is to Now that the input is separated into records, the second step is to
separate the fields in the records. One way to do this is to divide each separate the fields in the records. One way to do this is to divide each
of the lines into fields in the normal manner. This happens by default of the lines into fields in the normal manner. This happens by default
as the result of a special feature. When @code{RS} is set to the empty as the result of a special feature. When @code{RS} is set to the empty
string @emph{and} @code{FS} is set to a single character, string @emph{and} @code{FS} is set to a single character,
the newline character @emph{always} acts as a field separator. the newline character @emph{always} acts as a field separator.
This is in addition to whatever field separations result from This is in addition to whatever field separations result from
@code{FS}. @code{FS}.
@quotation NOTE @quotation NOTE
skipping to change at page 110, line ? skipping to change at page 110, line ?
always serves as a field separator, in addition to whatever value always serves as a field separator, in addition to whatever value
@code{FS} may have. Leading and trailing newlines in a file are ignored. @code{FS} may have. Leading and trailing newlines in a file are ignored.
@item RS == @var{regexp} @item RS == @var{regexp}
Records are separated by occurrences of characters that match @var{regexp}. Records are separated by occurrences of characters that match @var{regexp}.
Leading and trailing matches of @var{regexp} delimit empty records. Leading and trailing matches of @var{regexp} delimit empty records.
(This is a @command{gawk} extension; it is not specified by the (This is a @command{gawk} extension; it is not specified by the
POSIX standard.) POSIX standard.)
@end table @end table
@cindex @command{gawk}, @code{RT} variable in @cindex @command{gawk} @subentry @code{RT} variable in
@cindex @code{RT} variable @cindex @code{RT} variable
@cindex differences in @command{awk} and @command{gawk}, @code{RS}/@code{RT} variables @cindex differences in @command{awk} and @command{gawk} @subentry @code{RS}/@code{RT} variables
If not in compatibility mode (@pxref{Options}), @command{gawk} sets If not in compatibility mode (@pxref{Options}), @command{gawk} sets
@code{RT} to the input text that matched the value specified by @code{RS}. @code{RT} to the input text that matched the value specified by @code{RS}.
But if the input file ended without any text that matches @code{RS}, But if the input file ended without any text that matches @code{RS},
then @command{gawk} sets @code{RT} to the null string. then @command{gawk} sets @code{RT} to the null string.
@node Getline @node Getline
@section Explicit Input with @code{getline} @section Explicit Input with @code{getline}
@cindex @code{getline} command, explicit input with @cindex @code{getline} command @subentry explicit input with
@cindex input, explicit @cindex input @subentry explicit
So far we have been getting our input data from @command{awk}'s main So far we have been getting our input data from @command{awk}'s main
input stream---either the standard input (usually your keyboard, sometimes input stream---either the standard input (usually your keyboard, sometimes
the output from another program) or the the output from another program) or the
files specified on the command line. The @command{awk} language has a files specified on the command line. The @command{awk} language has a
special built-in command called @code{getline} that special built-in command called @code{getline} that
can be used to read input under your explicit control. can be used to read input under your explicit control.
The @code{getline} command is used in several different ways and should The @code{getline} command is used in several different ways and should
@emph{not} be used by beginners. @emph{not} be used by beginners.
The examples that follow the explanation of the @code{getline} command The examples that follow the explanation of the @code{getline} command
skipping to change at page 110, line ? skipping to change at page 110, line ?
@ifhtml @ifhtml
this @value{DOCUMENT} this @value{DOCUMENT}
@end ifhtml @end ifhtml
@ifnotinfo @ifnotinfo
@ifnothtml @ifnothtml
Parts I and II Parts I and II
@end ifnothtml @end ifnothtml
@end ifnotinfo @end ifnotinfo
and have a good knowledge of how @command{awk} works. and have a good knowledge of how @command{awk} works.
@cindex @command{gawk}, @code{ERRNO} variable in @cindex @command{gawk} @subentry @code{ERRNO} variable in
@cindex @code{ERRNO} variable, with @command{getline} command @cindex @code{ERRNO} variable @subentry with @command{getline} command
@cindex differences in @command{awk} and @command{gawk}, @code{getline} command @cindex differences in @command{awk} and @command{gawk} @subentry @code{getline} command
@cindex @code{getline} command, return values @cindex @code{getline} command @subentry return values
@cindex @option{--sandbox} option, input redirection with @code{getline} @cindex @option{--sandbox} option @subentry input redirection with @code{getline}
The @code{getline} command returns 1 if it finds a record and 0 if The @code{getline} command returns 1 if it finds a record and 0 if
it encounters the end of the file. If there is some error in getting it encounters the end of the file. If there is some error in getting
a record, such as a file that cannot be opened, then @code{getline} a record, such as a file that cannot be opened, then @code{getline}
returns @minus{}1. In this case, @command{gawk} sets the variable returns @minus{}1. In this case, @command{gawk} sets the variable
@code{ERRNO} to a string describing the error that occurred. @code{ERRNO} to a string describing the error that occurred.
If @code{ERRNO} indicates that the I/O operation may be If @code{ERRNO} indicates that the I/O operation may be
retried, and @code{PROCINFO["@var{input}", "RETRY"]} is set, retried, and @code{PROCINFO["@var{input}", "RETRY"]} is set,
then @code{getline} returns @minus{}2 then @code{getline} returns @minus{}2
skipping to change at page 110, line ? skipping to change at page 110, line ?
the patterns of any subsequent rules. The original value the patterns of any subsequent rules. The original value
of @code{$0} that triggered the rule that executed @code{getline} of @code{$0} that triggered the rule that executed @code{getline}
is lost. is lost.
By contrast, the @code{next} statement reads a new record By contrast, the @code{next} statement reads a new record
but immediately begins processing it normally, starting with the first but immediately begins processing it normally, starting with the first
rule in the program. @xref{Next Statement}. rule in the program. @xref{Next Statement}.
@end quotation @end quotation
@node Getline/Variable @node Getline/Variable
@subsection Using @code{getline} into a Variable @subsection Using @code{getline} into a Variable
@cindex @code{getline} into a variable @cindex @code{getline} command @subentry into a variable
@cindex variables, @code{getline} command into@comma{} using @cindex variables @subentry @code{getline} command into, using
You can use @samp{getline @var{var}} to read the next record from You can use @samp{getline @var{var}} to read the next record from
@command{awk}'s input into the variable @var{var}. No other processing is @command{awk}'s input into the variable @var{var}. No other processing is
done. done.
For example, suppose the next line is a comment or a special string, For example, suppose the next line is a comment or a special string,
and you want to read it without triggering and you want to read it without triggering
any rules. This form of @code{getline} allows you to read that line any rules. This form of @code{getline} allows you to read that line
and store it in a variable so that the main and store it in a variable so that the main
read-a-line-and-check-each-rule loop of @command{awk} never sees it. read-a-line-and-check-each-rule loop of @command{awk} never sees it.
The following example swaps every two lines of input: The following example swaps every two lines of input:
skipping to change at page 110, line ? skipping to change at page 110, line ?
The @code{getline} command used in this way sets only the variables The @code{getline} command used in this way sets only the variables
@code{NR}, @code{FNR}, and @code{RT} (and, of course, @var{var}). @code{NR}, @code{FNR}, and @code{RT} (and, of course, @var{var}).
The record is not The record is not
split into fields, so the values of the fields (including @code{$0}) and split into fields, so the values of the fields (including @code{$0}) and
the value of @code{NF} do not change. the value of @code{NF} do not change.
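The effect on the predefined variables can be checked with a small sketch. The two-line input and the names @code{nxt} and @code{out} are assumptions chosen for illustration. After @samp{getline nxt}, @code{NR} has advanced to 2, but @code{$0} and @code{NF} still describe the first record:

```shell
# Read the second line into a variable while processing the first.
out=$(printf 'a b c\nd e\n' |
    awk '{ if ((getline nxt) > 0) print $0 "|" NF "|" nxt "|" NR }')
printf '%s\n' "$out"    # expected: a b c|3|d e|2
```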
@node Getline/File @node Getline/File
@subsection Using @code{getline} from a File @subsection Using @code{getline} from a File
@cindex @code{getline} from a file @cindex @code{getline} command @subentry from a file
@cindex input redirection @cindex input redirection
@cindex redirection of input @cindex redirection @subentry of input
@cindex @code{<} (left angle bracket), @code{<} operator (I/O) @cindex @code{<} (left angle bracket) @subentry @code{<} operator (I/O)
@cindex left angle bracket (@code{<}), @code{<} operator (I/O) @cindex left angle bracket (@code{<}) @subentry @code{<} operator (I/O)
@cindex operators, input/output @cindex operators @subentry input/output
Use @samp{getline < @var{file}} to read the next record from @var{file}. Use @samp{getline < @var{file}} to read the next record from @var{file}.
Here, @var{file} is a string-valued expression that Here, @var{file} is a string-valued expression that
specifies the @value{FN}. @samp{< @var{file}} is called a @dfn{redirection} specifies the @value{FN}. @samp{< @var{file}} is called a @dfn{redirection}
because it directs input to come from a different place. because it directs input to come from a different place.
For example, the following For example, the following
program reads its input record from the file @file{secondary.input} when it program reads its input record from the file @file{secondary.input} when it
encounters a first field with a value equal to 10 in the current input encounters a first field with a value equal to 10 in the current input
file: file:
@example @example
skipping to change at page 110, line ? skipping to change at page 110, line ?
print print
@} @}
@end example @end example
Because the main input stream is not used, the values of @code{NR} and Because the main input stream is not used, the values of @code{NR} and
@code{FNR} are not changed. However, the record it reads is split into fields in @code{FNR} are not changed. However, the record it reads is split into fields in
the normal manner, so the values of @code{$0} and the other fields are the normal manner, so the values of @code{$0} and the other fields are
changed, resulting in a new value of @code{NF}. changed, resulting in a new value of @code{NF}.
@code{RT} is also set. @code{RT} is also set.
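The following sketch illustrates this behavior under stated assumptions: a temporary file created with @command{mktemp} stands in for the secondary input, and the @command{awk} variable @code{file} and shell variable @code{out} are hypothetical names. The record read from the file replaces @code{$0} and @code{NF}, while @code{NR} keeps the value it had for the main input:

```shell
tmp=$(mktemp)
printf 'alpha beta gamma\n' > "$tmp"
out=$(printf '10 x\n' | awk -v file="$tmp" '
    $1 == 10 { getline < file }   # replaces $0 and NF; NR stays at 1
    { print NR ": " $0 " (NF=" NF ")" }')
rm -f "$tmp"
printf '%s\n' "$out"    # expected: 1: alpha beta gamma (NF=3)
```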
@cindex POSIX @command{awk}, @code{<} operator and @cindex POSIX @command{awk} @subentry @code{<} operator and
@c Thanks to Paul Eggert for initial wording here @c Thanks to Paul Eggert for initial wording here
According to POSIX, @samp{getline < @var{expression}} is ambiguous if According to POSIX, @samp{getline < @var{expression}} is ambiguous if
@var{expression} contains unparenthesized operators other than @var{expression} contains unparenthesized operators other than
@samp{$}; for example, @samp{getline < dir "/" file} is ambiguous @samp{$}; for example, @samp{getline < dir "/" file} is ambiguous
because the concatenation operator (not discussed yet; @pxref{Concatenation}) because the concatenation operator (not discussed yet; @pxref{Concatenation})
is not parenthesized. You should write it as @samp{getline < (dir "/" file)} if is not parenthesized. You should write it as @samp{getline < (dir "/" file)} if
you want your program to be portable to all @command{awk} implementations. you want your program to be portable to all @command{awk} implementations.
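A short sketch of the portable, parenthesized form follows. The directory, the @value{FN} @file{data.txt}, and the variable names are illustrative assumptions. The loop also tests the return value of @code{getline}, so it stops both at end-of-file and on an error:

```shell
dir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$dir/data.txt"
out=$(awk -v dir="$dir" -v name="data.txt" 'BEGIN {
    path = dir "/" name
    # Parenthesize the concatenation so every awk parses it the same way.
    while ((getline < (dir "/" name)) > 0)
        n++
    close(path)
    print n " lines"
}')
rm -rf "$dir"
printf '%s\n' "$out"    # expected: 3 lines
```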
@node Getline/Variable/File @node Getline/Variable/File
@subsection Using @code{getline} into a Variable from a File @subsection Using @code{getline} into a Variable from a File
@cindex variables, @code{getline} command into@comma{} using @cindex variables @subentry @code{getline} command into, using
Use @samp{getline @var{var} < @var{file}} to read input Use @samp{getline @var{var} < @var{file}} to read input
from the file from the file
@var{file}, and put it in the variable @var{var}. As earlier, @var{file} @var{file}, and put it in the variable @var{var}. As earlier, @var{file}
is a string-valued expression that specifies the file from which to read. is a string-valued expression that specifies the file from which to read.
In this version of @code{getline}, none of the predefined variables are In this version of @code{getline}, none of the predefined variables are
changed and the record is not split into fields. The only variable changed and the record is not split into fields. The only variable
changed is @var{var}.@footnote{This is not quite true. @code{RT} could changed is @var{var}.@footnote{This is not quite true. @code{RT} could
be changed if @code{RS} is a regular expression.} be changed if @code{RS} is a regular expression.}
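As a hedged illustration (the temporary file and the names @code{extra}, @code{file}, and @code{out} are assumptions), the sketch below reads one line from a file into a variable while a main input record is being processed. The current record, @code{NF}, and @code{NR} should be left untouched:

```shell
tmp=$(mktemp)
printf 'from the file\n' > "$tmp"
out=$(printf 'main input line\n' | awk -v file="$tmp" '{
    if ((getline extra < file) > 0)
        print $0 "|" NF "|" NR "|" extra
}')
rm -f "$tmp"
printf '%s\n' "$out"    # expected: main input line|3|1|from the file
```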
skipping to change at page 110, line ? skipping to change at page 110, line ?
@subsection Using @code{getline} from a Pipe @subsection Using @code{getline} from a Pipe
@c From private email, dated October 2, 1988. Used by permission, March 2013. @c From private email, dated October 2, 1988. Used by permission, March 2013.
@cindex Kernighan, Brian @cindex Kernighan, Brian
@quotation @quotation
@i{Omniscience has much to recommend it. @i{Omniscience has much to recommend it.
Failing that, attention to details would be useful.} Failing that, attention to details would be useful.}
@author Brian Kernighan @author Brian Kernighan
@end quotation @end quotation
@cindex @code{|} (vertical bar), @code{|} operator (I/O) @cindex @code{|} (vertical bar) @subentry @code{|} operator (I/O)
@cindex vertical bar (@code{|}), @code{|} operator (I/O) @cindex vertical bar (@code{|}) @subentry @code{|} operator (I/O)
@cindex input pipeline @cindex input pipeline
@cindex pipe, input @cindex pipe @subentry input
@cindex operators, input/output @cindex operators @subentry input/output
The output of a command can also be piped into @code{getline}, using The output of a command can also be piped into @code{getline}, using
@samp{@var{command} | getline}. In @samp{@var{command} | getline}. In
this case, the string @var{command} is run as a shell command and its output this case, the string @var{command} is run as a shell command and its output
is piped into @command{awk} to be used as input. This form of @code{getline} is piped into @command{awk} to be used as input. This form of @code{getline}
reads one record at a time from the pipe. reads one record at a time from the pipe.
For example, the following program copies its input to its output, except for For example, the following program copies its input to its output, except for
lines that begin with @samp{@@execute}, which are replaced by the output lines that begin with @samp{@@execute}, which are replaced by the output
produced by running the rest of the line as a shell command: produced by running the rest of the line as a shell command:
@example @example
skipping to change at page 110, line ? skipping to change at page 110, line ?
foo foo
bar bar
baz baz
@@execute who @@execute who
bletch bletch
@end example @end example
@noindent @noindent
the program might produce: the program might produce:
@cindex Robbins, Bill @cindex Robbins @subentry Bill
@cindex Robbins, Miriam @cindex Robbins @subentry Miriam
@cindex Robbins, Arnold @cindex Robbins @subentry Arnold
@example @example
foo foo
bar bar
baz baz
arnold ttyv0 Jul 13 14:22 arnold ttyv0 Jul 13 14:22
miriam ttyp0 Jul 13 14:23 (murphy:0) miriam ttyp0 Jul 13 14:23 (murphy:0)
bill ttyp1 Jul 13 14:23 (murphy:0) bill ttyp1 Jul 13 14:23 (murphy:0)
bletch bletch
@end example @end example
@noindent @noindent
Notice that this program ran the command @command{who} and printed the result. Notice that this program ran the command @command{who} and printed the result.
(If you try this program yourself, you will of course get different results, (If you try this program yourself, you will of course get different results,
depending upon who is logged in on your system.) depending upon who is logged in on your system.)
This variation of @code{getline} splits the record into fields, sets the This variation of @code{getline} splits the record into fields, sets the
value of @code{NF}, and recomputes the value of @code{$0}. The values of value of @code{NF}, and recomputes the value of @code{$0}. The values of
@code{NR} and @code{FNR} are not changed. @code{NR} and @code{FNR} are not changed.
@code{RT} is set. @code{RT} is set.
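A minimal sketch of this variation, using @command{echo} as a stand-in command (the command string and variable names are assumptions), shows that the piped record is split into fields while @code{NR} is not incremented. Inside a @code{BEGIN} rule @code{NR} is still zero:

```shell
out=$(awk 'BEGIN {
    cmd = "echo one two three"
    if ((cmd | getline) > 0)      # sets $0 and NF from the command output
        print $0 "|" NF "|" NR
    close(cmd)
}')
printf '%s\n' "$out"    # expected: one two three|3|0
```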
@cindex POSIX @command{awk}, @code{|} I/O operator and @cindex POSIX @command{awk} @subentry @code{|} I/O operator and
@c Thanks to Paul Eggert for initial wording here @c Thanks to Paul Eggert for initial wording here
According to POSIX, @samp{@var{expression} | getline} is ambiguous if According to POSIX, @samp{@var{expression} | getline} is ambiguous if
@var{expression} contains unparenthesized operators other than @var{expression} contains unparenthesized operators other than
@samp{$}---for example, @samp{@w{"echo "} "date" | getline} is ambiguous @samp{$}---for example, @samp{@w{"echo "} "date" | getline} is ambiguous
because the concatenation operator is not parenthesized. You should because the concatenation operator is not parenthesized. You should
write it as @samp{(@w{"echo "} "date") | getline} if you want your program write it as @samp{(@w{"echo "} "date") | getline} if you want your program
to be portable to all @command{awk} implementations. to be portable to all @command{awk} implementations.
@cindex Brian Kernighan's @command{awk} @cindex Brian Kernighan's @command{awk}
@cindex @command{mawk} utility @cindex @command{mawk} utility
@quotation NOTE @quotation NOTE
Unfortunately, @command{gawk} has not been consistent in its treatment Unfortunately, @command{gawk} has not been consistent in its treatment
of a construct like @samp{@w{"echo "} "date" | getline}. of a construct like @samp{@w{"echo "} "date" | getline}.
Most versions, including the current version, treat it at as Most versions, including the current version, treat it as
@samp{@w{("echo "} "date") | getline}. @samp{@w{("echo "} "date") | getline}.
(This is also how BWK @command{awk} behaves.) (This is also how BWK @command{awk} behaves.)
Some versions instead treat it as Some versions instead treat it as
@samp{@w{"echo "} ("date" | getline)}. @samp{@w{"echo "} ("date" | getline)}.
(This is how @command{mawk} behaves.) (This is how @command{mawk} behaves.)
In short, @emph{always} use explicit parentheses, and then you won't In short, @emph{always} use explicit parentheses, and then you won't
have to worry. have to worry.
@end quotation @end quotation
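The advice above can be sketched as follows; the command text is an arbitrary assumption. With the concatenation in parentheses, the whole string @samp{echo hello} is run as the command in any @command{awk} implementation:

```shell
out=$(awk 'BEGIN {
    # Explicit parentheses: the concatenated string is the command.
    ("echo " "hello") | getline
    print $0
    close("echo hello")
}')
printf '%s\n' "$out"    # expected: hello
```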
@node Getline/Variable/Pipe @node Getline/Variable/Pipe
@subsection Using @code{getline} into a Variable from a Pipe @subsection Using @code{getline} into a Variable from a Pipe
@cindex variables, @code{getline} command into@comma{} using @cindex variables @subentry @code{getline} command into, using
When you use @samp{@var{command} | getline @var{var}}, the When you use @samp{@var{command} | getline @var{var}}, the
output of @var{command} is sent through a pipe to output of @var{command} is sent through a pipe to
@code{getline} and into the variable @var{var}. For example, the @code{getline} and into the variable @var{var}. For example, the
following program reads the current date and time into the variable following program reads the current date and time into the variable
@code{current_time}, using the @command{date} utility, and then @code{current_time}, using the @command{date} utility, and then
prints it: prints it:
@example @example
BEGIN @{ BEGIN @{
skipping to change at page 110, line ? skipping to change at page 110, line ?
According to POSIX, @samp{@var{expression} | getline @var{var}} is ambiguous if According to POSIX, @samp{@var{expression} | getline @var{var}} is ambiguous if
@var{expression} contains unparenthesized operators other than @var{expression} contains unparenthesized operators other than
@samp{$}; for example, @samp{@w{"echo "} "date" | getline @var{var}} is ambiguous @samp{$}; for example, @samp{@w{"echo "} "date" | getline @var{var}} is ambiguous
because the concatenation operator is not parenthesized. You should because the concatenation operator is not parenthesized. You should
write it as @samp{(@w{"echo "} "date") | getline @var{var}} if you want your write it as @samp{(@w{"echo "} "date") | getline @var{var}} if you want your
program to be portable to other @command{awk} implementations. program to be portable to other @command{awk} implementations.
@end ifinfo @end ifinfo
@node Getline/Coprocess @node Getline/Coprocess
@subsection Using @code{getline} from a Coprocess @subsection Using @code{getline} from a Coprocess
@cindex coprocesses, @code{getline} from @cindex coprocesses @subentry @code{getline} from
@cindex @code{getline} command, coprocesses@comma{} using from @cindex @code{getline} command @subentry coprocesses, using from
@cindex @code{|} (vertical bar), @code{|&} operator (I/O) @cindex @code{|} (vertical bar) @subentry @code{|&} operator (I/O)
@cindex vertical bar (@code{|}), @code{|&} operator (I/O) @cindex vertical bar (@code{|}) @subentry @code{|&} operator (I/O)
@cindex operators, input/output @cindex operators @subentry input/output
@cindex differences in @command{awk} and @command{gawk}, input/output operators @cindex differences in @command{awk} and @command{gawk} @subentry input/output operators
Reading input into @code{getline} from a pipe is a one-way operation. Reading input into @code{getline} from a pipe is a one-way operation.
The command that is started with @samp{@var{command} | getline} only The command that is started with @samp{@var{command} | getline} only
sends data @emph{to} your @command{awk} program. sends data @emph{to} your @command{awk} program.
On occasion, you might want to send data to another program On occasion, you might want to send data to another program
for processing and then read the results back. for processing and then read the results back.
@command{gawk} allows you to start a @dfn{coprocess}, with which two-way @command{gawk} allows you to start a @dfn{coprocess}, with which two-way
communications are possible. This is done with the @samp{|&} communications are possible. This is done with the @samp{|&}
operator. operator.
skipping to change at page 110, line ? skipping to change at page 110, line ?
the normal manner, thus changing the values of @code{$0}, of the other fields, the normal manner, thus changing the values of @code{$0}, of the other fields,
and of @code{NF} and @code{RT}. and of @code{NF} and @code{RT}.
Coprocesses are an advanced feature. They are discussed here only because Coprocesses are an advanced feature. They are discussed here only because
this is the @value{SECTION} on @code{getline}. this is the @value{SECTION} on @code{getline}.
@xref{Two-way I/O}, @xref{Two-way I/O},
where coprocesses are discussed in more detail. where coprocesses are discussed in more detail.
@node Getline/Variable/Coprocess @node Getline/Variable/Coprocess
@subsection Using @code{getline} into a Variable from a Coprocess @subsection Using @code{getline} into a Variable from a Coprocess
@cindex variables, @code{getline} command into@comma{} using @cindex variables @subentry @code{getline} command into, using
When you use @samp{@var{command} |& getline @var{var}}, the output from When you use @samp{@var{command} |& getline @var{var}}, the output from
the coprocess @var{command} is sent through a two-way pipe to @code{getline} the coprocess @var{command} is sent through a two-way pipe to @code{getline}
and into the variable @var{var}. and into the variable @var{var}.
In this version of @code{getline}, none of the predefined variables are In this version of @code{getline}, none of the predefined variables are
changed and the record is not split into fields. The only variable changed and the record is not split into fields. The only variable
changed is @var{var}. changed is @var{var}.
However, @code{RT} is set. However, @code{RT} is set.
skipping to change at page 110, line ? skipping to change at page 110, line ?
Here are some miscellaneous points about @code{getline} that Here are some miscellaneous points about @code{getline} that
you should bear in mind: you should bear in mind:
@itemize @value{BULLET} @itemize @value{BULLET}
@item @item
When @code{getline} changes the value of @code{$0} and @code{NF}, When @code{getline} changes the value of @code{$0} and @code{NF},
@command{awk} does @emph{not} automatically jump to the start of the @command{awk} does @emph{not} automatically jump to the start of the
program and start testing the new record against every pattern. program and start testing the new record against every pattern.
However, the new record is tested against any subsequent rules. However, the new record is tested against any subsequent rules.
@cindex differences in @command{awk} and @command{gawk}, implementation limitations @cindex differences in @command{awk} and @command{gawk} @subentry implementation limitations
@cindex implementation issues, @command{gawk}, limits @cindex implementation issues, @command{gawk} @subentry limits
@cindex @command{awk}, implementations, limits @cindex @command{awk} @subentry implementations @subentry limits
@cindex @command{gawk}, implementation issues, limits @cindex @command{gawk} @subentry implementation issues @subentry limits
@item @item
Some very old @command{awk} implementations limit the number of pipelines that an @command{awk} Some very old @command{awk} implementations limit the number of pipelines that an @command{awk}
program may have open to just one. In @command{gawk}, there is no such limit. program may have open to just one. In @command{gawk}, there is no such limit.
You can open as many pipelines (and coprocesses) as the underlying operating You can open as many pipelines (and coprocesses) as the underlying operating
system permits. system permits.
@cindex side effects, @code{FILENAME} variable @cindex side effects @subentry @code{FILENAME} variable
@cindex @code{FILENAME} variable, @code{getline}@comma{} setting with @cindex @code{FILENAME} variable @subentry @code{getline}, setting with
@cindex dark corner, @code{FILENAME} variable @cindex dark corner @subentry @code{FILENAME} variable
@cindex @code{getline} command, @code{FILENAME} variable and @cindex @code{getline} command @subentry @code{FILENAME} variable and
@cindex @code{BEGIN} pattern, @code{getline} and @cindex @code{BEGIN} pattern @subentry @code{getline} and
@item @item
An interesting side effect occurs if you use @code{getline} without a An interesting side effect occurs if you use @code{getline} without a
redirection inside a @code{BEGIN} rule. Because an unredirected @code{getline} redirection inside a @code{BEGIN} rule. Because an unredirected @code{getline}
reads from the command-line @value{DF}s, the first @code{getline} command reads from the command-line @value{DF}s, the first @code{getline} command
causes @command{awk} to set the value of @code{FILENAME}. Normally, causes @command{awk} to set the value of @code{FILENAME}. Normally,
@code{FILENAME} does not have a value inside @code{BEGIN} rules, because you @code{FILENAME} does not have a value inside @code{BEGIN} rules, because you
have not yet started to process the command-line @value{DF}s. have not yet started to process the command-line @value{DF}s.
@value{DARKCORNER} @value{DARKCORNER}
(See @ref{BEGIN/END}; (See @ref{BEGIN/END};
also @pxref{Auto-set}.) also @pxref{Auto-set}.)
skipping to change at page 110, line ? skipping to change at page 110, line ?
end-of-file is encountered before the element in @code{a} is assigned? end-of-file is encountered before the element in @code{a} is assigned?
@command{gawk} treats @code{getline} like a function call, and evaluates @command{gawk} treats @code{getline} like a function call, and evaluates
the expression @samp{a[++c]} before attempting to read from @file{f}. the expression @samp{a[++c]} before attempting to read from @file{f}.
However, some versions of @command{awk} only evaluate the expression once they However, some versions of @command{awk} only evaluate the expression once they
know that there is a string value to be assigned. know that there is a string value to be assigned.
@end itemize @end itemize
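The first point in the preceding list can be illustrated with a small sketch; the input lines and the @samp{HEADER} marker are assumptions made for the example. When the first rule calls @code{getline}, @command{awk} does not restart rule matching. The new record is seen only by the rule that follows:

```shell
out=$(printf 'HEADER\nfirst\nsecond\n' |
    awk '$1 == "HEADER" { getline }      # replaces the current record in place
         { print NR ": " $0 }')
printf '%s\n' "$out"
```

Here the @samp{HEADER} line itself is never printed: by the time the second rule runs, @code{$0} holds @samp{first} and @code{NR} is 2.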
@node Getline Summary @node Getline Summary
@subsection Summary of @code{getline} Variants @subsection Summary of @code{getline} Variants
@cindex @code{getline} command, variants @cindex @code{getline} command @subentry variants
@ref{table-getline-variants} @ref{table-getline-variants}
summarizes the eight variants of @code{getline}, summarizes the eight variants of @code{getline},
listing which predefined variables are set by each one, listing which predefined variables are set by each one,
and whether the variant is standard or a @command{gawk} extension. and whether the variant is standard or a @command{gawk} extension.
Note: for each variant, @command{gawk} sets the @code{RT} predefined variable. Note: for each variant, @command{gawk} sets the @code{RT} predefined variable.
@float Table,table-getline-variants @float Table,table-getline-variants
@caption{@code{getline} variants and what they set} @caption{@code{getline} variants and what they set}
@multitable @columnfractions .33 .38 .27 @multitable @columnfractions .33 .38 .27
skipping to change at page 110, line ? skipping to change at page 110, line ?
@item @var{command} @code{| getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{awk} @item @var{command} @code{| getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{awk}
@item @var{command} @code{|& getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab @command{gawk} @item @var{command} @code{|& getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab @command{gawk}
@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{gawk} @item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{gawk}
@end multitable @end multitable
@end float @end float
@node Read Timeout @node Read Timeout
@section Reading Input with a Timeout @section Reading Input with a Timeout
@cindex timeout, reading input @cindex timeout, reading input
@cindex differences in @command{awk} and @command{gawk}, read timeouts @cindex differences in @command{awk} and @command{gawk} @subentry read timeouts
This @value{SECTION} describes a feature that is specific to @command{gawk}. This @value{SECTION} describes a feature that is specific to @command{gawk}.
You may specify a timeout in milliseconds for reading input from the keyboard, You may specify a timeout in milliseconds for reading input from the keyboard,
a pipe, or two-way communication, including TCP/IP sockets. This can be done a pipe, or two-way communication, including TCP/IP sockets. This can be done
on a per-input, per-command, or per-connection basis, by setting a special on a per-input, per-command, or per-connection basis, by setting a special
element in the @code{PROCINFO} array (@pxref{Auto-set}): element in the @code{PROCINFO} array (@pxref{Auto-set}):
@example @example
PROCINFO["input_name", "READ_TIMEOUT"] = @var{timeout in milliseconds} PROCINFO["input_name", "READ_TIMEOUT"] = @var{timeout in milliseconds}
@end example @end example
skipping to change at page 110, line ? skipping to change at page 110, line ?
@end example @end example
@quotation NOTE @quotation NOTE
You should not assume that the read operation will block You should not assume that the read operation will block
exactly after the tenth record has been printed. It is possible that exactly after the tenth record has been printed. It is possible that
@command{gawk} will read and buffer more than one record's @command{gawk} will read and buffer more than one record's
worth of data the first time. Because of this, changing the value worth of data the first time. Because of this, changing the value
of timeout like in the preceding example is not very useful. of timeout like in the preceding example is not very useful.
@end quotation @end quotation
@cindex @env{GAWK_READ_TIMEOUT} environment variable
@cindex environment variables @subentry @env{GAWK_READ_TIMEOUT}
If the @code{PROCINFO} element is not present and the If the @code{PROCINFO} element is not present and the
@env{GAWK_READ_TIMEOUT} environment variable exists, @env{GAWK_READ_TIMEOUT} environment variable exists,
@command{gawk} uses its value to initialize the timeout value. @command{gawk} uses its value to initialize the timeout value.
The exclusive use of the environment variable to specify timeout The exclusive use of the environment variable to specify timeout
has the disadvantage of not being able to control it has the disadvantage of not being able to control it
on a per-command or per-connection basis. on a per-command or per-connection basis.
@command{gawk} considers a timeout event to be an error even though @command{gawk} considers a timeout event to be an error even though
the attempt to read from the underlying device may the attempt to read from the underlying device may
succeed in a later attempt. This is a limitation, and it also succeed in a later attempt. This is a limitation, and it also
skipping to change at page 110, line ? skipping to change at page 110, line ?
@command{gawk} can stall waiting for an input device to be ready. @command{gawk} can stall waiting for an input device to be ready.
A network client can sometimes take a long time to establish A network client can sometimes take a long time to establish
a connection before it can start reading any data, a connection before it can start reading any data,
or the attempt to open a FIFO special file for reading can block or the attempt to open a FIFO special file for reading can block
indefinitely until some other process opens it for writing. indefinitely until some other process opens it for writing.
@node Retrying Input @node Retrying Input
@section Retrying Reads After Certain Input Errors @section Retrying Reads After Certain Input Errors
@cindex retrying input @cindex retrying input
@cindex differences in @command{awk} and @command{gawk}, retrying input @cindex differences in @command{awk} and @command{gawk} @subentry retrying input
This @value{SECTION} describes a feature that is specific to @command{gawk}. This @value{SECTION} describes a feature that is specific to @command{gawk}.
When @command{gawk} encounters an error while reading input, by When @command{gawk} encounters an error while reading input, by
default @code{getline} returns @minus{}1, and subsequent attempts to default @code{getline} returns @minus{}1, and subsequent attempts to
read from that file result in an end-of-file indication. However, you read from that file result in an end-of-file indication. However, you
may optionally instruct @command{gawk} to allow I/O to be retried when may optionally instruct @command{gawk} to allow I/O to be retried when
certain errors are encountered by setting a special element in certain errors are encountered by setting a special element in
the @code{PROCINFO} array (@pxref{Auto-set}): the @code{PROCINFO} array (@pxref{Auto-set}):
@example @example
skipping to change at page 110, line ? skipping to change at page 110, line ?
@minus{}2 and @minus{}2 and
further calls to @code{getline} may succeed. This applies to the @code{errno} further calls to @code{getline} may succeed. This applies to the @code{errno}
values @code{EAGAIN}, @code{EWOULDBLOCK}, @code{EINTR}, or @code{ETIMEDOUT}. values @code{EAGAIN}, @code{EWOULDBLOCK}, @code{EINTR}, or @code{ETIMEDOUT}.
This feature is useful in conjunction with This feature is useful in conjunction with
@code{PROCINFO["@var{input_name}", "READ_TIMEOUT"]} or situations where a file @code{PROCINFO["@var{input_name}", "READ_TIMEOUT"]} or situations where a file
descriptor has been configured to behave in a non-blocking fashion. descriptor has been configured to behave in a non-blocking fashion.
@node Command-line directories @node Command-line directories
@section Directories on the Command Line @section Directories on the Command Line
@cindex differences in @command{awk} and @command{gawk}, command-line directories @cindex differences in @command{awk} and @command{gawk} @subentry command-line directories
@cindex directories, command-line @cindex directories @subentry command-line
@cindex command line, directories on @cindex command line @subentry directories on
According to the POSIX standard, files named on the @command{awk} According to the POSIX standard, files named on the @command{awk}
command line must be text files; it is a fatal error if they are not. command line must be text files; it is a fatal error if they are not.
Most versions of @command{awk} treat a directory on the command line as Most versions of @command{awk} treat a directory on the command line as
a fatal error. a fatal error.
By default, @command{gawk} produces a warning for a directory on the By default, @command{gawk} produces a warning for a directory on the
command line, but otherwise ignores it. This makes it easier to use command line, but otherwise ignores it. This makes it easier to use
shell wildcards with your @command{awk} program: shell wildcards with your @command{awk} program:
skipping to change at page 110, line ? skipping to change at page 110, line ?
Use @code{PROCINFO["FS"]} to see how fields are being split. Use @code{PROCINFO["FS"]} to see how fields are being split.
@item @item
Use @code{getline} in its various forms to read additional records Use @code{getline} in its various forms to read additional records
from the default input stream, from a file, or from a pipe or coprocess. from the default input stream, from a file, or from a pipe or coprocess.
@item @item
Use @code{PROCINFO[@var{file}, "READ_TIMEOUT"]} to cause reads to time out Use @code{PROCINFO[@var{file}, "READ_TIMEOUT"]} to cause reads to time out
for @var{file}. for @var{file}.
@cindex POSIX mode
@item @item
Directories on the command line are fatal for standard @command{awk}; Directories on the command line are fatal for standard @command{awk};
@command{gawk} ignores them if not in POSIX mode. @command{gawk} ignores them if not in POSIX mode.
@end itemize @end itemize
@c EXCLUDE START @c EXCLUDE START
@node Input Exercises @node Input Exercises
@section Exercises @section Exercises
@enumerate @enumerate
@item @item
Using the @code{FIELDWIDTHS} variable (@pxref{Constant Size}), Using the @code{FIELDWIDTHS} variable (@pxref{Constant Size}),
write a program to read election data, where each record represents write a program to read election data, where each record represents
one voter's votes. Come up with a way to define which columns are one voter's votes. Come up with a way to define which columns are
associated with each ballot item, and print the total votes, associated with each ballot item, and print the total votes,
including abstentions, for each item. including abstentions, for each item.
@item
@ref{Plain Getline}, presented a program to remove C-style
comments (@samp{/* @dots{} */}) from the input. That program
does not work if one comment ends on one line and another one
starts later on the same line.
That can be fixed by making one simple change. What is it?
@end enumerate @end enumerate
@c EXCLUDE END @c EXCLUDE END
@node Printing @node Printing
@chapter Printing Output @chapter Printing Output
@cindex printing @cindex printing
@cindex output, printing, See printing @cindex output, printing @seeentry{printing}
One of the most common programming actions is to @dfn{print}, or output, One of the most common programming actions is to @dfn{print}, or output,
some or all of the input. Use the @code{print} statement some or all of the input. Use the @code{print} statement
for simple output, and the @code{printf} statement for simple output, and the @code{printf} statement
for fancier formatting. for fancier formatting.
The @code{print} statement is not limited when The @code{print} statement is not limited when
computing @emph{which} values to print. However, with two exceptions, computing @emph{which} values to print. However, with two exceptions,
you cannot specify @emph{how} to print them---how many you cannot specify @emph{how} to print them---how many
columns, whether to use exponential notation or not, and so on. columns, whether to use exponential notation or not, and so on.
(For the exceptions, @pxref{Output Separators} and (For the exceptions, @pxref{Output Separators} and
@ref{OFMT}.) @ref{OFMT}.)
skipping to change at page 110, line ? skipping to change at page 110, line ?
@noindent @noindent
The entire list of items may be optionally enclosed in parentheses. The The entire list of items may be optionally enclosed in parentheses. The
parentheses are necessary if any of the item expressions uses the @samp{>} parentheses are necessary if any of the item expressions uses the @samp{>}
relational operator; otherwise it could be confused with an output redirection relational operator; otherwise it could be confused with an output redirection
(@pxref{Redirection}). (@pxref{Redirection}).
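As a minimal sketch of why the parentheses matter, the following shell function runs both forms in a scratch directory. The function name @code{demo_paren} and the use of @command{mktemp} are illustrative assumptions, not part of the manual's text.

```shell
# Hypothetical demo: contrasts a parenthesized comparison with a redirection.
# It works in a temporary directory so the stray output file is cleaned up.
demo_paren() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        # Parenthesized: '>' is the relational operator, so 1 (true) is printed.
        awk 'BEGIN { print (2 > 1) }'
        # Unparenthesized: '>' is a redirection, so "2" lands in a file named "1".
        awk 'BEGIN { print 2 > 1 }'
        printf 'file 1 contains: '
        cat 1
    )
    rm -rf "$dir"
}
```

Calling @code{demo_paren} should show the comparison result on standard output, followed by the contents of the file that the unparenthesized form created.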
The items to print can be constant strings or numbers, fields of the The items to print can be constant strings or numbers, fields of the
current record (such as @code{$1}), variables, or any @command{awk} current record (such as @code{$1}), variables, or any @command{awk}
expression. Numeric values are converted to strings and then printed. expression. Numeric values are converted to strings and then printed.
@cindex records, printing @cindex records @subentry printing
@cindex lines, blank, printing @cindex lines @subentry blank, printing
@cindex text, printing @cindex text, printing
The simple statement @samp{print} with no items is equivalent to The simple statement @samp{print} with no items is equivalent to
@samp{print $0}: it prints the entire current record. To print a blank @samp{print $0}: it prints the entire current record. To print a blank
line, use @samp{print ""}. line, use @samp{print ""}.
To print a fixed piece of text, use a string constant, such as To print a fixed piece of text, use a string constant, such as
@w{@code{"Don't Panic"}}, as one item. If you forget to use the @w{@code{"Don't Panic"}}, as one item. If you forget to use the
double-quote characters, your text is taken as an @command{awk} double-quote characters, your text is taken as an @command{awk}
expression, and you will probably get an error. Keep in mind that a expression, and you will probably get an error. Keep in mind that a
space is printed between any two items. space is printed between any two items.
skipping to change at page 110, line ? skipping to change at page 110, line ?
pattern--action statement, for example. pattern--action statement, for example.
@node Print Examples @node Print Examples
@section @code{print} Statement Examples @section @code{print} Statement Examples
Each @code{print} statement makes at least one line of output. However, it Each @code{print} statement makes at least one line of output. However, it
isn't limited to only one line. If an item value is a string containing a isn't limited to only one line. If an item value is a string containing a
newline, the newline is output along with the rest of the string. A newline, the newline is output along with the rest of the string. A
single @code{print} statement can make any number of lines this way. single @code{print} statement can make any number of lines this way.
@cindex newlines, printing @cindex newlines @subentry printing
The following is an example of printing a string that contains embedded The following is an example of printing a string that contains embedded
@ifinfo @ifinfo
newlines newlines
(the @samp{\n} is an escape sequence, used to represent the newline (the @samp{\n} is an escape sequence, used to represent the newline
character; @pxref{Escape Sequences}): character; @pxref{Escape Sequences}):
@end ifinfo @end ifinfo
@ifhtml @ifhtml
newlines newlines
(the @samp{\n} is an escape sequence, used to represent the newline (the @samp{\n} is an escape sequence, used to represent the newline
character; @pxref{Escape Sequences}): character; @pxref{Escape Sequences}):
skipping to change at page 110, line ? skipping to change at page 110, line ?
@example @example
@group @group
$ @kbd{awk 'BEGIN @{ print "line one\nline two\nline three" @}'} $ @kbd{awk 'BEGIN @{ print "line one\nline two\nline three" @}'}
@print{} line one @print{} line one
@print{} line two @print{} line two
@print{} line three @print{} line three
@end group @end group
@end example @end example
@cindex fields, printing @cindex fields @subentry printing
The next example, which is run on the @file{inventory-shipped} file, The next example, which is run on the @file{inventory-shipped} file,
prints the first two fields of each input record, with a space between prints the first two fields of each input record, with a space between
them: them:
@example @example
$ @kbd{awk '@{ print $1, $2 @}' inventory-shipped} $ @kbd{awk '@{ print $1, $2 @}' inventory-shipped}
@print{} Jan 13 @print{} Jan 13
@print{} Feb 15 @print{} Feb 15
@print{} Mar 15 @print{} Mar 15
@dots{} @dots{}
@end example @end example
@cindex @code{print} statement, commas, omitting @cindex @code{print} statement @subentry commas, omitting
@cindex troubleshooting, @code{print} statement@comma{} omitting commas @cindex troubleshooting @subentry @code{print} statement, omitting commas
A common mistake in using the @code{print} statement is to omit the comma A common mistake in using the @code{print} statement is to omit the comma
between two items. This often has the effect of making the items run between two items. This often has the effect of making the items run
together in the output, with no space. The reason for this is that together in the output, with no space. The reason for this is that
juxtaposing two string expressions in @command{awk} means to concatenate juxtaposing two string expressions in @command{awk} means to concatenate
them. Here is the same program, without the comma: them. Here is the same program, without the comma:
@example @example
$ @kbd{awk '@{ print $1 $2 @}' inventory-shipped} $ @kbd{awk '@{ print $1 $2 @}' inventory-shipped}
@print{} Jan13 @print{} Jan13
@print{} Feb15 @print{} Feb15
@print{} Mar15 @print{} Mar15
@dots{} @dots{}
@end example @end example
@cindex @code{BEGIN} pattern, headings@comma{} adding @cindex @code{BEGIN} pattern @subentry headings, adding
To someone unfamiliar with the @file{inventory-shipped} file, neither To someone unfamiliar with the @file{inventory-shipped} file, neither
example's output makes much sense. A heading line at the beginning example's output makes much sense. A heading line at the beginning
would make it clearer. Let's add some headings to our table of months would make it clearer. Let's add some headings to our table of months
(@code{$1}) and green crates shipped (@code{$2}). We do this using (@code{$1}) and green crates shipped (@code{$2}). We do this using
a @code{BEGIN} rule (@pxref{BEGIN/END}) so that the headings are only a @code{BEGIN} rule (@pxref{BEGIN/END}) so that the headings are only
printed once: printed once:
@example @example
awk 'BEGIN @{ print "Month Crates" awk 'BEGIN @{ print "Month Crates"
print "----- ------" @} print "----- ------" @}
skipping to change at page 110, line ? skipping to change at page 110, line ?
two fields: two fields:
@example @example
@group @group
awk 'BEGIN @{ print "Month Crates" awk 'BEGIN @{ print "Month Crates"
print "----- ------" @} print "----- ------" @}
@{ print $1, " ", $2 @}' inventory-shipped @{ print $1, " ", $2 @}' inventory-shipped
@end group @end group
@end example @end example
@cindex @code{printf} statement, columns@comma{} aligning @cindex @code{printf} statement @subentry columns, aligning
@cindex columns, aligning @cindex columns @subentry aligning
Lining up columns this way can get pretty Lining up columns this way can get pretty
complicated when there are many columns to fix. Counting spaces for two complicated when there are many columns to fix. Counting spaces for two
or three columns is simple, but any more than this can take up or three columns is simple, but any more than this can take up
a lot of time. This is why the @code{printf} statement was a lot of time. This is why the @code{printf} statement was
created (@pxref{Printf}); created (@pxref{Printf});
one of its specialties is lining up columns of data. one of its specialties is lining up columns of data.
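As a rough sketch of the @code{printf} alternative, the function below formats the same three records with fixed field widths instead of hand-counted spaces. The function name @code{demo_columns} and the chosen widths (5 and 6) are assumptions for illustration; the records are piped in rather than read from @file{inventory-shipped} so that the sketch stands alone.

```shell
# Hypothetical demo: align two columns with printf field widths.
demo_columns() {
    printf 'Jan 13\nFeb 15\nMar 15\n' |
    awk 'BEGIN { printf "%-5s %6s\n", "Month", "Crates" }   # left-justified name, right-justified count
               { printf "%-5s %6d\n", $1, $2 }'
}
```

Every output line is then exactly 12 characters wide, so the counts stay right-aligned under the heading however many digits they have (up to six).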
@cindex line continuations, in @code{print} statement @cindex line continuations @subentry in @code{print} statement
@cindex @code{print} statement, line continuations and @cindex @code{print} statement @subentry line continuations and
@quotation NOTE @quotation NOTE
You can continue either a @code{print} or You can continue either a @code{print} or
@code{printf} statement simply by putting a newline after any comma @code{printf} statement simply by putting a newline after any comma
(@pxref{Statements/Lines}). (@pxref{Statements/Lines}).
@end quotation @end quotation
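A small sketch of such a continuation follows; the function name @code{demo_continuation} is hypothetical.

```shell
# Hypothetical demo: a print statement continued across lines after a comma.
demo_continuation() {
    awk 'BEGIN {
        print "Month",
              "Crates"
    }'
}
```

The two items are still printed as one output record, separated by a single space.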
@node Output Separators @node Output Separators
@section Output Separators @section Output Separators
@cindex @code{OFS} variable @cindex @code{OFS} variable
skipping to change at page 110, line ? skipping to change at page 110, line ?
predefined variable @code{OFS}. The initial value of this variable predefined variable @code{OFS}. The initial value of this variable
is the string @w{@code{" "}} (i.e., a single space). is the string @w{@code{" "}} (i.e., a single space).
The output from an entire @code{print} statement is called an @dfn{output The output from an entire @code{print} statement is called an @dfn{output
record}. Each @code{print} statement outputs one output record, and record}. Each @code{print} statement outputs one output record, and
then outputs a string called the @dfn{output record separator} (or then outputs a string called the @dfn{output record separator} (or
@code{ORS}). The initial value of @code{ORS} is the string @code{"\n"} @code{ORS}). The initial value of @code{ORS} is the string @code{"\n"}
(i.e., a newline character). Thus, each @code{print} statement normally (i.e., a newline character). Thus, each @code{print} statement normally
makes a separate line. makes a separate line.
@cindex output, records @cindex output @subentry records
@cindex output record separator, See @code{ORS} variable @cindex output record separator @seeentry{@code{ORS} variable}
@cindex @code{ORS} variable @cindex @code{ORS} variable
@cindex @code{BEGIN} pattern, @code{OFS}/@code{ORS} variables, assigning values to @cindex @code{BEGIN} pattern @subentry @code{OFS}/@code{ORS} variables, assigning values to
In order to change how output fields and records are separated, assign In order to change how output fields and records are separated, assign
new values to the variables @code{OFS} and @code{ORS}. The usual new values to the variables @code{OFS} and @code{ORS}. The usual
place to do this is in the @code{BEGIN} rule place to do this is in the @code{BEGIN} rule
(@pxref{BEGIN/END}), so (@pxref{BEGIN/END}), so
that it happens before any input is processed. It can also be done that it happens before any input is processed. It can also be done
with assignments on the command line, before the names of the input with assignments on the command line, before the names of the input
files, or using the @option{-v} command-line option files, or using the @option{-v} command-line option
(@pxref{Options}). (@pxref{Options}).
The following example prints the first and second fields of each input The following example prints the first and second fields of each input
record, separated by a semicolon, with a blank line added after each record, separated by a semicolon, with a blank line added after each
skipping to change at page 110, line ? skipping to change at page 110, line ?
@print{} @print{}
@print{} Jean-Paul;555-2127 @print{} Jean-Paul;555-2127
@print{} @print{}
@end example @end example
If the value of @code{ORS} does not contain a newline, the program's output If the value of @code{ORS} does not contain a newline, the program's output
runs together on a single line. runs together on a single line.
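To illustrate, here is a minimal sketch that sets both separators with the @option{-v} option and uses an @code{ORS} with no newline in it. The function name @code{demo_ors}, the separator strings, and the two piped-in records are assumptions chosen for the demonstration.

```shell
# Hypothetical demo: an ORS without a newline makes all records share one line.
demo_ors() {
    printf 'Jan 13\nFeb 15\n' |
    awk -v 'OFS=;' -v 'ORS= | ' '
        { print $1, $2 }        # fields joined by OFS, record ended by ORS
        END { printf "\n" }     # finish the line explicitly, since ORS has no newline
    '
}
```

Both records appear on a single line, each followed by the @samp{ | } separator.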
@node OFMT @node OFMT
@section Controlling Numeric Output with @code{print} @section Controlling Numeric Output with @code{print}
@cindex numeric, output format @cindex numeric @subentry output format
@cindex formats@comma{} numeric output @cindex formats, numeric output
When printing numeric values with the @code{print} statement, When printing numeric values with the @code{print} statement,
@command{awk} internally converts each number to a string of characters @command{awk} internally converts each number to a string of characters
and prints that string. @command{awk} uses the @code{sprintf()} function and prints that string. @command{awk} uses the @code{sprintf()} function
to do this conversion to do this conversion
(@pxref{String Functions}). (@pxref{String Functions}).
For now, it suffices to say that the @code{sprintf()} For now, it suffices to say that the @code{sprintf()}
function accepts a @dfn{format specification} that tells it how to format function accepts a @dfn{format specification} that tells it how to format
numbers (or strings), and that there are a number of different ways in which numbers (or strings), and that there are a number of different ways in which
numbers can be formatted. The different format specifications are discussed numbers can be formatted. The different format specifications are discussed
more fully in more fully in
@ref{Control Letters}. @ref{Control Letters}.
@cindexawkfunc{sprintf} @cindexawkfunc{sprintf}
@cindex @code{OFMT} variable @cindex @code{OFMT} variable
@cindex output, format specifier@comma{} @code{OFMT} @cindex output @subentry format specifier, @code{OFMT}
The predefined variable @code{OFMT} contains the format specification The predefined variable @code{OFMT} contains the format specification
that @code{print} uses with @code{sprintf()} when it wants to convert a that @code{print} uses with @code{sprintf()} when it wants to convert a
number to a string for printing. number to a string for printing.
The default value of @code{OFMT} is @code{"%.6g"}. The default value of @code{OFMT} is @code{"%.6g"}.
The way @code{print} prints numbers can be changed The way @code{print} prints numbers can be changed
by supplying a different format specification by supplying a different format specification
for the value of @code{OFMT}, as shown in the following example: for the value of @code{OFMT}, as shown in the following example:
@example @example
$ @kbd{awk 'BEGIN @{} $ @kbd{awk 'BEGIN @{}
> @kbd{OFMT = "%.0f" # print numbers as integers (rounds)} > @kbd{OFMT = "%.0f" # print numbers as integers (rounds)}
> @kbd{print 17.23, 17.54 @}'} > @kbd{print 17.23, 17.54 @}'}
@print{} 17 18 @print{} 17 18
@end example @end example
@noindent @noindent
@cindex dark corner, @code{OFMT} variable @cindex dark corner @subentry @code{OFMT} variable
@cindex POSIX @command{awk}, @code{OFMT} variable and @cindex POSIX @command{awk} @subentry @code{OFMT} variable and
@cindex @code{OFMT} variable, POSIX @command{awk} and @cindex @code{OFMT} variable @subentry POSIX @command{awk} and
According to the POSIX standard, @command{awk}'s behavior is undefined According to the POSIX standard, @command{awk}'s behavior is undefined
if @code{OFMT} contains anything but a floating-point conversion specification. if @code{OFMT} contains anything but a floating-point conversion specification.
@value{DARKCORNER} @value{DARKCORNER}
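One further point is worth a small sketch: in POSIX-conforming versions of @command{awk}, @code{OFMT} applies only to numbers that are not integral, and values that are integers are printed as integers. The function name @code{demo_ofmt} and the sample values are illustrative assumptions.

```shell
# Hypothetical demo: OFMT affects 17.239 but not the integer 17.
demo_ofmt() {
    awk 'BEGIN {
        OFMT = "%.2f"      # two digits after the decimal point
        print 17, 17.239
    }'
}
```

Here the first item is printed unchanged, while the second is rounded to two decimal places.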
@node Printf @node Printf
@section Using @code{printf} Statements for Fancier Printing @section Using @code{printf} Statements for Fancier Printing
@cindex @code{printf} statement @cindex @code{printf} statement
@cindex output, formatted @cindex output @subentry formatted
@cindex formatting output @cindex formatting @subentry output
For more precise control over the output format than what is For more precise control over the output format than what is
provided by @code{print}, use @code{printf}. provided by @code{print}, use @code{printf}.
With @code{printf} you can With @code{printf} you can
specify the width to use for each item, as well as various specify the width to use for each item, as well as various
formatting choices for numbers (such as what output base to use, whether to formatting choices for numbers (such as what output base to use, whether to
print an exponent, whether to print a sign, and how many digits to print print an exponent, whether to print a sign, and how many digits to print
after the decimal point). after the decimal point).
@menu @menu
* Basic Printf:: Syntax of the @code{printf} statement. * Basic Printf:: Syntax of the @code{printf} statement.
* Control Letters:: Format-control letters. * Control Letters:: Format-control letters.
* Format Modifiers:: Format-specification modifiers. * Format Modifiers:: Format-specification modifiers.
* Printf Examples:: Several examples. * Printf Examples:: Several examples.
@end menu @end menu
@node Basic Printf @node Basic Printf
@subsection Introduction to the @code{printf} Statement @subsection Introduction to the @code{printf} Statement
@cindex @code{printf} statement, syntax of @cindex @code{printf} statement @subentry syntax of
A simple @code{printf} statement looks like this: A simple @code{printf} statement looks like this:
@example @example
printf @var{format}, @var{item1}, @var{item2}, @dots{} printf @var{format}, @var{item1}, @var{item2}, @dots{}
@end example @end example
@noindent @noindent
As for @code{print}, the entire list of arguments may optionally be As for @code{print}, the entire list of arguments may optionally be
enclosed in parentheses. Here too, the parentheses are necessary if any enclosed in parentheses. Here too, the parentheses are necessary if any
of the item expressions uses the @samp{>} relational operator; otherwise, of the item expressions uses the @samp{>} relational operator; otherwise,
skipping to change at page 110, line ? skipping to change at page 110, line ?
@print{} Don't Panic! @print{} Don't Panic!
@end group @end group
@end example @end example
@noindent @noindent
Here, neither the @samp{+} nor the @samp{OUCH!} appears in Here, neither the @samp{+} nor the @samp{OUCH!} appears in
the output message. the output message.
@node Control Letters @node Control Letters
@subsection Format-Control Letters @subsection Format-Control Letters
@cindex @code{printf} statement, format-control characters @cindex @code{printf} statement @subentry format-control characters
@cindex format specifiers, @code{printf} statement @cindex format specifiers @subentry @code{printf} statement
A format specifier starts with the character @samp{%} and ends with A format specifier starts with the character @samp{%} and ends with
a @dfn{format-control letter}---it tells the @code{printf} statement a @dfn{format-control letter}---it tells the @code{printf} statement
how to output one item. The format-control letter specifies what @emph{kind} how to output one item. The format-control letter specifies what @emph{kind}
of value to print. The rest of the format specifier is made up of of value to print. The rest of the format specifier is made up of
optional @dfn{modifiers} that control @emph{how} to print the value, such as optional @dfn{modifiers} that control @emph{how} to print the value, such as
the field width. Here is a list of the format-control letters: the field width. Here is a list of the format-control letters:
@c @asis for docbook to come out right @c @asis for docbook to come out right
@table @asis @table @asis
skipping to change at page 110, line ? skipping to change at page 110, line ?
underlying C library @code{printf()} function does not support them. As underlying C library @code{printf()} function does not support them. As
of this writing, among current systems, only OpenVMS is known to not of this writing, among current systems, only OpenVMS is known to not
support them. support them.
@end quotation @end quotation
@item @code{%c} @item @code{%c}
Print a number as a character; thus, @samp{printf "%c", Print a number as a character; thus, @samp{printf "%c",
65} outputs the letter @samp{A}. The output for a string value is 65} outputs the letter @samp{A}. The output for a string value is
the first character of the string. the first character of the string.
@cindex dark corner, format-control characters @cindex dark corner @subentry format-control characters
@cindex @command{gawk}, format-control characters @cindex @command{gawk} @subentry format-control characters
@quotation NOTE @quotation NOTE
The POSIX standard says the first character of a string is printed. The POSIX standard says the first character of a string is printed.
In locales with multibyte characters, @command{gawk} attempts to In locales with multibyte characters, @command{gawk} attempts to
convert the leading bytes of the string into a valid wide character convert the leading bytes of the string into a valid wide character
and then to print the multibyte encoding of that character. and then to print the multibyte encoding of that character.
Similarly, when printing a numeric value, @command{gawk} allows the Similarly, when printing a numeric value, @command{gawk} allows the
value to be within the numeric range of values that can be held value to be within the numeric range of values that can be held
in a wide character. in a wide character.
If the conversion to multibyte encoding fails, @command{gawk} If the conversion to multibyte encoding fails, @command{gawk}
uses the low eight bits of the value as the character to print. uses the low eight bits of the value as the character to print.
skipping to change at page 110, line ? skipping to change at page 110, line ?
@samp{%X} uses the letters @samp{A} through @samp{F} @samp{%X} uses the letters @samp{A} through @samp{F}
instead of @samp{a} through @samp{f} instead of @samp{a} through @samp{f}
(@pxref{Nondecimal-numbers}). (@pxref{Nondecimal-numbers}).
@item @code{%%} @item @code{%%}
Print a single @samp{%}. Print a single @samp{%}.
This does not consume an This does not consume an
argument and it ignores any modifiers. argument and it ignores any modifiers.
@end table @end table
@cindex dark corner, format-control characters @cindex dark corner @subentry format-control characters
@cindex @command{gawk}, format-control characters @cindex @command{gawk} @subentry format-control characters
@quotation NOTE @quotation NOTE
When using the integer format-control letters for values that are When using the integer format-control letters for values that are
outside the range of the widest C integer type, @command{gawk} switches to outside the range of the widest C integer type, @command{gawk} switches to
the @samp{%g} format specifier. If @option{--lint} is provided on the the @samp{%g} format specifier. If @option{--lint} is provided on the
command line (@pxref{Options}), @command{gawk} command line (@pxref{Options}), @command{gawk}
warns about this. Other versions of @command{awk} may print invalid warns about this. Other versions of @command{awk} may print invalid
values or do something else entirely. values or do something else entirely.
@value{DARKCORNER} @value{DARKCORNER}
@end quotation @end quotation
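The format-control letters described above can be tried directly from the shell. The following is a minimal sketch rather than part of the manual; it assumes an ASCII-compatible locale and an `awk` on the search path, and the variable name `out` is purely illustrative.

```shell
# Run a few format-control letters through printf and capture the result.
out=$(awk 'BEGIN {
    printf "%c\n", 65        # numeric value printed as a character
    printf "%c\n", "hello"   # string value: first character only
    printf "%X\n", 255       # uppercase hexadecimal digits
    printf "100%%\n"         # literal percent sign; consumes no argument
}')
printf '%s\n' "$out"
```

The four lines printed should be `A`, `h`, `FF`, and `100%`.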
skipping to change at page 110, line ? skipping to change at page 110, line ?
Input and output of these values occurs as text strings. This is Input and output of these values occurs as text strings. This is
somewhat problematic for the @command{awk} language, which predates somewhat problematic for the @command{awk} language, which predates
the IEEE standard. Further details are provided in the IEEE standard. Further details are provided in
@ref{POSIX Floating Point Problems}; please see there. @ref{POSIX Floating Point Problems}; please see there.
@end quotation @end quotation
@node Format Modifiers @node Format Modifiers
@subsection Modifiers for @code{printf} Formats @subsection Modifiers for @code{printf} Formats
@cindex @code{printf} statement, modifiers @cindex @code{printf} statement @subentry modifiers
@cindex modifiers@comma{} in format specifiers @cindex modifiers, in format specifiers
A format specification can also include @dfn{modifiers} that can control A format specification can also include @dfn{modifiers} that can control
how much of the item's value is printed, as well as how much space it gets. how much of the item's value is printed, as well as how much space it gets.
The modifiers come between the @samp{%} and the format-control letter. The modifiers come between the @samp{%} and the format-control letter.
We use the bullet symbol ``@bullet{}'' in the following examples to We use the bullet symbol ``@bullet{}'' in the following examples to
represent represent
spaces in the output. Here are the possible modifiers, in the order in spaces in the output. Here are the possible modifiers, in the order in
which they may appear: which they may appear:
@table @asis @table @asis
@cindex differences in @command{awk} and @command{gawk}, @code{print}/@code{printf} statements @cindex differences in @command{awk} and @command{gawk} @subentry @code{print}/@code{printf} statements
@cindex @code{printf} statement, positional specifiers @cindex @code{printf} statement @subentry positional specifiers
@c the code{} does NOT start a secondary @c the code{} does NOT start a secondary
@cindex positional specifiers, @code{printf} statement @cindex positional specifiers, @code{printf} statement
@item @code{@var{N}$} @item @code{@var{N}$}
An integer constant followed by a @samp{$} is a @dfn{positional specifier}. An integer constant followed by a @samp{$} is a @dfn{positional specifier}.
Normally, format specifications are applied to arguments in the order Normally, format specifications are applied to arguments in the order
given in the format string. With a positional specifier, the format given in the format string. With a positional specifier, the format
specification is applied to a specific argument, instead of what specification is applied to a specific argument, instead of what
would be the next argument in the list. Positional specifiers begin would be the next argument in the list. Positional specifiers begin
counting with one. Thus: counting with one. Thus:
skipping to change at page 110, line ? skipping to change at page 110, line ?
w = 5 w = 5
p = 3 p = 3
s = "abcdefg" s = "abcdefg"
printf "%" w "." p "s\n", s printf "%" w "." p "s\n", s
@end example @end example
@noindent @noindent
This is not particularly easy to read, but it does work. This is not particularly easy to read, but it does work.
@c @cindex lint checks @c @cindex lint checks
@cindex troubleshooting, fatal errors, @code{printf} format strings @cindex troubleshooting @subentry fatal errors @subentry @code{printf} format strings
@cindex POSIX @command{awk}, @code{printf} format strings and @cindex POSIX @command{awk} @subentry @code{printf} format strings and
C programmers may be used to supplying additional modifiers (@samp{h}, C programmers may be used to supplying additional modifiers (@samp{h},
@samp{j}, @samp{l}, @samp{L}, @samp{t}, and @samp{z}) in @code{printf} @samp{j}, @samp{l}, @samp{L}, @samp{t}, and @samp{z}) in @code{printf}
format strings. These are not valid in @command{awk}. Most @command{awk} format strings. These are not valid in @command{awk}. Most @command{awk}
implementations silently ignore them. If @option{--lint} is provided implementations silently ignore them. If @option{--lint} is provided
on the command line (@pxref{Options}), @command{gawk} warns about their on the command line (@pxref{Options}), @command{gawk} warns about their
use. If @option{--posix} is supplied, their use is a fatal error. use. If @option{--posix} is supplied, their use is a fatal error.
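The run-time format construction shown earlier in this subsection can be checked with a short script. This is only a sketch, assuming any POSIX-style `awk`; the shell variable `out` is a hypothetical name used to capture the result.

```shell
# Build a printf format from variables and compare it with the literal form.
out=$(awk 'BEGIN {
    w = 5; p = 3; s = "abcdefg"
    # Concatenation builds the format string "%5.3s\n" at run time.
    printf "%" w "." p "s\n", s
    # The same format written out literally, for comparison.
    printf "%5.3s\n", s
}')
printf '%s\n' "$out"
```

Both lines should be identical: the first three characters of the string, right-justified in a field five characters wide.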
@node Printf Examples @node Printf Examples
@subsection Examples Using @code{printf} @subsection Examples Using @code{printf}
skipping to change at page 110, line ? skipping to change at page 110, line ?
awk 'BEGIN @{ format = "%-10s %s\n" awk 'BEGIN @{ format = "%-10s %s\n"
printf format, "Name", "Number" printf format, "Name", "Number"
printf format, "----", "------" @} printf format, "----", "------" @}
@{ printf format, $1, $2 @}' mail-list @{ printf format, $1, $2 @}' mail-list
@end example @end example
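A self-contained variant of the preceding example may be easier to experiment with. In this sketch the two input records are supplied through a here-document and stand in for the `mail-list` file; they are sample data chosen for illustration, and `out` is a hypothetical variable name.

```shell
# Print a two-column table with a header, reusing one format string throughout.
out=$(awk 'BEGIN { format = "%-10s %s\n"
             printf format, "Name", "Number"
             printf format, "----", "------" }
           { printf format, $1, $2 }' <<'EOF'
Amelia 555-5553
Anthony 555-3412
EOF
)
printf '%s\n' "$out"
```

Keeping the format in a variable means that changing the column width requires editing only one place.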
@node Redirection @node Redirection
@section Redirecting Output of @code{print} and @code{printf} @section Redirecting Output of @code{print} and @code{printf}
@cindex output redirection @cindex output redirection
@cindex redirection of output @cindex redirection @subentry of output
@cindex @option{--sandbox} option, output redirection with @code{print}, @code{printf} @cindex @option{--sandbox} option @subentry output redirection with @code{print} @subentry @code{printf}
So far, the output from @code{print} and @code{printf} has gone So far, the output from @code{print} and @code{printf} has gone
to the standard to the standard
output, usually the screen. Both @code{print} and @code{printf} can output, usually the screen. Both @code{print} and @code{printf} can
also send their output to other places. also send their output to other places.
This is called @dfn{redirection}. This is called @dfn{redirection}.
@quotation NOTE @quotation NOTE
When @option{--sandbox} is specified (@pxref{Options}), When @option{--sandbox} is specified (@pxref{Options}),
redirecting output to files, pipes, and coprocesses is disabled. redirecting output to files, pipes, and coprocesses is disabled.
@end quotation @end quotation
A redirection appears after the @code{print} or @code{printf} statement. A redirection appears after the @code{print} or @code{printf} statement.
Redirections in @command{awk} are written just like redirections in shell Redirections in @command{awk} are written just like redirections in shell
commands, except that they are written inside the @command{awk} program. commands, except that they are written inside the @command{awk} program.
@c the commas here are part of the see also @c the commas here are part of the see also
@cindex @code{print} statement, See Also redirection@comma{} of output @cindex @code{print} statement @seealso{redirection of output}
@cindex @code{printf} statement, See Also redirection@comma{} of output @cindex @code{printf} statement @seealso{redirection of output}
There are four forms of output redirection: output to a file, output There are four forms of output redirection: output to a file, output
appended to a file, output through a pipe to another command, and output appended to a file, output through a pipe to another command, and output
to a coprocess. We show them all for the @code{print} statement, to a coprocess. We show them all for the @code{print} statement,
but they work identically for @code{printf}: but they work identically for @code{printf}:
@table @code @table @code
@cindex @code{>} (right angle bracket), @code{>} operator (I/O) @cindex @code{>} (right angle bracket) @subentry @code{>} operator (I/O)
@cindex right angle bracket (@code{>}), @code{>} operator (I/O) @cindex right angle bracket (@code{>}) @subentry @code{>} operator (I/O)
@cindex operators, input/output @cindex operators @subentry input/output
@item print @var{items} > @var{output-file} @item print @var{items} > @var{output-file}
This redirection prints the items into the output file named This redirection prints the items into the output file named
@var{output-file}. The @value{FN} @var{output-file} can be any @var{output-file}. The @value{FN} @var{output-file} can be any
expression. Its value is changed to a string and then used as a expression. Its value is changed to a string and then used as a
@value{FN} (@pxref{Expressions}). @value{FN} (@pxref{Expressions}).
When this type of redirection is used, the @var{output-file} is erased When this type of redirection is used, the @var{output-file} is erased
before the first output is written to it. Subsequent writes to the same before the first output is written to it. Subsequent writes to the same
@var{output-file} do not erase @var{output-file}, but append to it. @var{output-file} do not erase @var{output-file}, but append to it.
(This is different from how you use redirections in shell scripts.) (This is different from how you use redirections in shell scripts.)
skipping to change at page 110, line ? skipping to change at page 110, line ?
@dots{} @dots{}
$ @kbd{cat name-list} $ @kbd{cat name-list}
@print{} Amelia @print{} Amelia
@print{} Anthony @print{} Anthony
@dots{} @dots{}
@end example @end example
@noindent @noindent
Each output file contains one name or number per line. Each output file contains one name or number per line.
@cindex @code{>} (right angle bracket), @code{>>} operator (I/O) @cindex @code{>} (right angle bracket) @subentry @code{>>} operator (I/O)
@cindex right angle bracket (@code{>}), @code{>>} operator (I/O) @cindex right angle bracket (@code{>}) @subentry @code{>>} operator (I/O)
@item print @var{items} >> @var{output-file} @item print @var{items} >> @var{output-file}
This redirection prints the items into the preexisting output file This redirection prints the items into the preexisting output file
named @var{output-file}. The difference between this and the named @var{output-file}. The difference between this and the
single-@samp{>} redirection is that the old contents (if any) of single-@samp{>} redirection is that the old contents (if any) of
@var{output-file} are not erased. Instead, the @command{awk} output is @var{output-file} are not erased. Instead, the @command{awk} output is
appended to the file. appended to the file.
If @var{output-file} does not exist, then it is created. If @var{output-file} does not exist, then it is created.
@cindex @code{|} (vertical bar), @code{|} operator (I/O) @cindex @code{|} (vertical bar) @subentry @code{|} operator (I/O)
@cindex pipe, output @cindex pipe @subentry output
@cindex output, pipes @cindex output @subentry pipes
@item print @var{items} | @var{command} @item print @var{items} | @var{command}
It is possible to send output to another program through a pipe It is possible to send output to another program through a pipe
instead of into a file. This redirection opens a pipe to instead of into a file. This redirection opens a pipe to
@var{command}, and writes the values of @var{items} through this pipe @var{command}, and writes the values of @var{items} through this pipe
to another process created to execute @var{command}. to another process created to execute @var{command}.
The redirection argument @var{command} is actually an @command{awk} The redirection argument @var{command} is actually an @command{awk}
expression. Its value is converted to a string whose contents give expression. Its value is converted to a string whose contents give
the shell command to be run. For example, the following produces two the shell command to be run. For example, the following produces two
files, one unsorted list of people's names, and one list sorted in reverse files, one unsorted list of people's names, and one list sorted in reverse
skipping to change at page 110, line ? skipping to change at page 110, line ?
for more information. for more information.
This example also illustrates the use of a variable to represent This example also illustrates the use of a variable to represent
a @var{file} or @var{command}---it is not necessary to always a @var{file} or @var{command}---it is not necessary to always
use a string constant. Using a variable is generally a good idea, use a string constant. Using a variable is generally a good idea,
because (if you mean to refer to that same file or command) because (if you mean to refer to that same file or command)
@command{awk} requires that the string value be written identically @command{awk} requires that the string value be written identically
every time. every time.
@cindex coprocesses @cindex coprocesses
@cindex @code{|} (vertical bar), @code{|&} operator (I/O) @cindex @code{|} (vertical bar) @subentry @code{|&} operator (I/O)
@cindex operators, input/output @cindex operators @subentry input/output
@cindex differences in @command{awk} and @command{gawk}, input/output operators @cindex differences in @command{awk} and @command{gawk} @subentry input/output operators
@item print @var{items} |& @var{command} @item print @var{items} |& @var{command}
This redirection prints the items to the input of @var{command}. This redirection prints the items to the input of @var{command}.
The difference between this and the The difference between this and the
single-@samp{|} redirection is that the output from @var{command} single-@samp{|} redirection is that the output from @var{command}
can be read with @code{getline}. can be read with @code{getline}.
Thus, @var{command} is a @dfn{coprocess}, which works together with Thus, @var{command} is a @dfn{coprocess}, which works together with
but is subsidiary to the @command{awk} program. but is subsidiary to the @command{awk} program.
This feature is a @command{gawk} extension, and is not available in This feature is a @command{gawk} extension, and is not available in
POSIX @command{awk}. POSIX @command{awk}.
skipping to change at page 110, line ? skipping to change at page 110, line ?
@ref{Two-way I/O} @ref{Two-way I/O}
for a more complete discussion. for a more complete discussion.
@end ifdocbook @end ifdocbook
@end table @end table
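The file and pipe redirections in the table above can be exercised together. The sketch below assumes a POSIX-style `awk`, `sort`, and `mktemp`; the three input names and the shell variables `tmp`, `sorted`, and `unsorted` are illustrative only.

```shell
# Write each name to a file and, in parallel, through a pipe to sort.
tmp=$(mktemp -d)
printf '%s\n' Amelia Broderick Anthony > "$tmp/input"
( cd "$tmp" && awk '
BEGIN { cmd = "sort -r > names.sorted" }   # one variable, so the command string never varies
{
    print $1 > "names.unsorted"            # truncated on first write, appended to afterwards
    print $1 | cmd                         # the same pipe is reused for every record
}
END { close(cmd) }                         # flush the pipe and wait for sort to finish
' input )
unsorted=$(cat "$tmp/names.unsorted")
sorted=$(cat "$tmp/names.sorted")
printf '%s\n' "$sorted"
```

Holding the command in `cmd` guarantees that every `print` refers to the same open pipe, and the explicit `close()` makes sure `sort` has finished before the results are read.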
Redirecting output using @samp{>}, @samp{>>}, @samp{|}, or @samp{|&} Redirecting output using @samp{>}, @samp{>>}, @samp{|}, or @samp{|&}
asks the system to open a file, pipe, or coprocess only if the particular asks the system to open a file, pipe, or coprocess only if the particular
@var{file} or @var{command} you specify has not already been written @var{file} or @var{command} you specify has not already been written
to by your program or if it has been closed since it was last written to. to by your program or if it has been closed since it was last written to.
@cindex troubleshooting, printing @cindex troubleshooting @subentry printing
It is a common error to use @samp{>} redirection for the first @code{print} It is a common error to use @samp{>} redirection for the first @code{print}
to a file, and then to use @samp{>>} for subsequent output: to a file, and then to use @samp{>>} for subsequent output:
@example @example
# clear the file # clear the file
print "Don't panic" > "guide.txt" print "Don't panic" > "guide.txt"
@dots{} @dots{}
# append # append
print "Avoid improbability generators" >> "guide.txt" print "Avoid improbability generators" >> "guide.txt"
@end example @end example
@noindent @noindent
This is indeed how redirections must be used from the shell. But in This is indeed how redirections must be used from the shell. But in
@command{awk}, it isn't necessary. In this kind of case, a program should @command{awk}, it isn't necessary. In this kind of case, a program should
use @samp{>} for all the @code{print} statements, because the output file use @samp{>} for all the @code{print} statements, because the output file
is only opened once. (It happens that if you mix @samp{>} and @samp{>>}, is only opened once. (It happens that if you mix @samp{>} and @samp{>>},
output is produced in the expected order. However, mixing the operators output is produced in the expected order. However, mixing the operators
for the same file is definitely poor style, and is confusing to readers for the same file is definitely poor style, and is confusing to readers
of your program.) of your program.)
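The behavior described here, where the file is erased once when first opened and then appended to, can be observed with a small experiment. This is a sketch under the assumption of a POSIX-style `awk` and `mktemp`; the file contents and the variable names `tmp` and `result` are made up for the illustration.

```shell
# Use > for every print: only the first one truncates the file.
tmp=$(mktemp -d)
printf 'stale contents\n' > "$tmp/guide.txt"
awk -v f="$tmp/guide.txt" 'BEGIN {
    print "line one" > f     # opens the file and erases the stale contents
    print "line two" > f     # file is already open, so this appends
}'
result=$(cat "$tmp/guide.txt")
printf '%s\n' "$result"
```

After the run, the file should hold both new lines and none of the old text.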
@cindex differences in @command{awk} and @command{gawk}, implementation limitations @cindex differences in @command{awk} and @command{gawk} @subentry implementation limitations
@cindex implementation issues, @command{gawk}, limits @cindex implementation issues, @command{gawk} @subentry limits
@cindex @command{awk}, implementation issues, pipes @cindex @command{awk} @subentry implementation issues @subentry pipes
@cindex @command{gawk}, implementation issues, pipes @cindex @command{gawk} @subentry implementation issues @subentry pipes
@ifnotinfo @ifnotinfo
As mentioned earlier As mentioned earlier
(@pxref{Getline Notes}), (@pxref{Getline Notes}),
many many
@end ifnotinfo @end ifnotinfo
@ifnottex @ifnottex
@ifnotdocbook @ifnotdocbook
Many Many
@end ifnotdocbook @end ifnotdocbook
@end ifnottex @end ifnottex
older older
@command{awk} implementations limit the number of pipelines that an @command{awk} @command{awk} implementations limit the number of pipelines that an @command{awk}
program may have open to just one! In @command{gawk}, there is no such limit. program may have open to just one! In @command{gawk}, there is no such limit.
@command{gawk} allows a program to @command{gawk} allows a program to
open as many pipelines as the underlying operating system permits. open as many pipelines as the underlying operating system permits.
@cindex sidebar, Piping into @command{sh} @cindex sidebar @subentry Piping into @command{sh}
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>Piping into @command{sh}</title> <sidebar><title>Piping into @command{sh}</title>
@end docbook @end docbook
@cindex shells, piping commands into @cindex shells @subentry piping commands into
A particularly powerful way to use redirection is to build command lines A particularly powerful way to use redirection is to build command lines
and pipe them into the shell, @command{sh}. For example, suppose you and pipe them into the shell, @command{sh}. For example, suppose you
have a list of files brought over from a system where all the @value{FN}s have a list of files brought over from a system where all the @value{FN}s
are stored in uppercase, and you wish to rename them to have names in are stored in uppercase, and you wish to rename them to have names in
all lowercase. The following program is both simple and efficient: all lowercase. The following program is both simple and efficient:
@c @cindex @command{mv} utility @c @cindex @command{mv} utility
@example @example
@{ printf("mv %s %s\n", $0, tolower($0)) | "sh" @} @{ printf("mv %s %s\n", $0, tolower($0)) | "sh" @}
skipping to change at page 110, line ? skipping to change at page 110, line ?
@docbook @docbook
</sidebar> </sidebar>
@end docbook @end docbook
@end ifdocbook @end ifdocbook
@ifnotdocbook @ifnotdocbook
@cartouche @cartouche
@center @b{Piping into @command{sh}} @center @b{Piping into @command{sh}}
@cindex shells, piping commands into @cindex shells @subentry piping commands into
A particularly powerful way to use redirection is to build command lines A particularly powerful way to use redirection is to build command lines
and pipe them into the shell, @command{sh}. For example, suppose you and pipe them into the shell, @command{sh}. For example, suppose you
have a list of files brought over from a system where all the @value{FN}s have a list of files brought over from a system where all the @value{FN}s
are stored in uppercase, and you wish to rename them to have names in are stored in uppercase, and you wish to rename them to have names in
all lowercase. The following program is both simple and efficient: all lowercase. The following program is both simple and efficient:
@c @cindex @command{mv} utility @c @cindex @command{mv} utility
@example @example
@{ printf("mv %s %s\n", $0, tolower($0)) | "sh" @} @{ printf("mv %s %s\n", $0, tolower($0)) | "sh" @}
skipping to change at page 110, line ? skipping to change at page 110, line ?
It then sends the list to the shell for execution. It then sends the list to the shell for execution.
@xref{Shell Quoting} for a function that can help in generating @xref{Shell Quoting} for a function that can help in generating
command lines to be fed to the shell. command lines to be fed to the shell.
@end cartouche @end cartouche
@end ifnotdocbook @end ifnotdocbook
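The renaming technique from the sidebar can be tried safely in a scratch directory. The sketch below assumes that the file names contain no spaces or shell metacharacters (the simple `mv %s %s` format does no quoting), and the file names and the variables `tmp` and `names` are invented for the example.

```shell
# Generate one mv command per file name and let sh execute them.
tmp=$(mktemp -d)
touch "$tmp/ALPHA.TXT" "$tmp/BETA.TXT"
( cd "$tmp" && ls | awk '
    { printf("mv %s %s\n", $0, tolower($0)) | "sh" }
END { close("sh") }          # wait for the shell to run every command
' )
names=$(ls "$tmp")
printf '%s\n' "$names"
```

For names that may contain spaces or quotes, each argument would have to be quoted for the shell before being placed in the command line.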
@node Special FD @node Special FD
@section Special Files for Standard Preopened Data Streams @section Special Files for Standard Preopened Data Streams
@cindex standard input @cindex standard input
@cindex input, standard @cindex input @subentry standard
@cindex standard output @cindex standard output
@cindex output, standard @cindex output @subentry standard
@cindex error output @cindex error output
@cindex standard error @cindex standard error
@cindex file descriptors @cindex file descriptors
@cindex files, descriptors, See file descriptors @cindex files @subentry descriptors @seeentry{file descriptors}
Running programs conventionally have three input and output streams Running programs conventionally have three input and output streams
already available to them for reading and writing. These are known already available to them for reading and writing. These are known
as the @dfn{standard input}, @dfn{standard output}, and @dfn{standard as the @dfn{standard input}, @dfn{standard output}, and @dfn{standard
error output}. These open streams (and any other open files or pipes) error output}. These open streams (and any other open files or pipes)
are often referred to by the technical term @dfn{file descriptors}. are often referred to by the technical term @dfn{file descriptors}.
These streams are, by default, connected to your keyboard and screen, but These streams are, by default, connected to your keyboard and screen, but
they are often redirected with the shell, via the @samp{<}, @samp{<<}, they are often redirected with the shell, via the @samp{<}, @samp{<<},
@samp{>}, @samp{>>}, @samp{>&}, and @samp{|} operators. Standard error @samp{>}, @samp{>>}, @samp{>&}, and @samp{|} operators. Standard error
is typically used for writing error messages; the reason there are two separate is typically used for writing error messages; the reason there are two separate
streams, standard output and standard error, is so that they can be streams, standard output and standard error, is so that they can be
redirected separately. redirected separately.
@cindex differences in @command{awk} and @command{gawk}, error messages @cindex differences in @command{awk} and @command{gawk} @subentry error messages
@cindex error handling @cindex error handling
In traditional implementations of @command{awk}, the only way to write an error In traditional implementations of @command{awk}, the only way to write an error
message to standard error in an @command{awk} program is as follows: message to standard error in an @command{awk} program is as follows:
@example @example
print "Serious error detected!" | "cat 1>&2" print "Serious error detected!" | "cat 1>&2"
@end example @end example
@noindent @noindent
This works by opening a pipeline to a shell command that can access the This works by opening a pipeline to a shell command that can access the
skipping to change at page 110, line ? skipping to change at page 110, line ?
Then opening @file{/dev/tty} fails. Then opening @file{/dev/tty} fails.
@command{gawk}, BWK @command{awk}, and @command{mawk} provide @command{gawk}, BWK @command{awk}, and @command{mawk} provide
special @value{FN}s for accessing the three standard streams. special @value{FN}s for accessing the three standard streams.
If the @value{FN} matches one of these special names when @command{gawk} If the @value{FN} matches one of these special names when @command{gawk}
(or one of the others) redirects input or output, then it directly uses (or one of the others) redirects input or output, then it directly uses
the descriptor that the @value{FN} stands for. These special the descriptor that the @value{FN} stands for. These special
@value{FN}s work for all operating systems that @command{gawk} @value{FN}s work for all operating systems that @command{gawk}
has been ported to, not just those that are POSIX-compliant: has been ported to, not just those that are POSIX-compliant:
@cindex common extensions, @code{/dev/stdin} special file @cindex common extensions @subentry @code{/dev/stdin} special file
@cindex common extensions, @code{/dev/stdout} special file @cindex common extensions @subentry @code{/dev/stdout} special file
@cindex common extensions, @code{/dev/stderr} special file @cindex common extensions @subentry @code{/dev/stderr} special file
@cindex extensions, common@comma{} @code{/dev/stdin} special file @cindex extensions @subentry common @subentry @code{/dev/stdin} special file
@cindex extensions, common@comma{} @code{/dev/stdout} special file @cindex extensions @subentry common @subentry @code{/dev/stdout} special file
@cindex extensions, common@comma{} @code{/dev/stderr} special file @cindex extensions @subentry common @subentry @code{/dev/stderr} special file
@cindex file names, standard streams in @command{gawk} @cindex file names @subentry standard streams in @command{gawk}
@cindex @code{/dev/@dots{}} special files @cindex @code{/dev/@dots{}} special files
@cindex files, @code{/dev/@dots{}} special files @cindex files @subentry @code{/dev/@dots{}} special files
@cindex @code{/dev/fd/@var{N}} special files (@command{gawk}) @cindex @code{/dev/fd/@var{N}} special files (@command{gawk})
@table @file @table @file
@item /dev/stdin @item /dev/stdin
The standard input (file descriptor 0). The standard input (file descriptor 0).
@item /dev/stdout @item /dev/stdout
The standard output (file descriptor 1). The standard output (file descriptor 1).
@item /dev/stderr @item /dev/stderr
The standard error output (file descriptor 2). The standard error output (file descriptor 2).
@end table @end table
With these facilities, With these facilities,
the proper way to write an error message then becomes: the proper way to write an error message then becomes:
@example @example
print "Serious error detected!" > "/dev/stderr" print "Serious error detected!" > "/dev/stderr"
@end example @end example
@cindex troubleshooting, quotes with file names @cindex troubleshooting @subentry quotes with file names
Note the use of quotes around the @value{FN}. Note the use of quotes around the @value{FN}.
Like with any other redirection, the value must be a string. Like with any other redirection, the value must be a string.
It is a common error to omit the quotes, which leads It is a common error to omit the quotes, which leads
to confusing results. to confusing results.
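The separation of the two output streams is easy to confirm from the shell. This sketch assumes an `awk` that recognizes `/dev/stderr` (as @command{gawk}, BWK @command{awk}, and @command{mawk} are said to above); the variables `prog`, `out`, and `err` are illustrative names.

```shell
# Send one line to standard output and one to standard error.
prog='BEGIN {
    print "normal output"
    print "Serious error detected!" > "/dev/stderr"
}'
out=$(awk "$prog" 2>/dev/null)        # keep standard output only
err=$(awk "$prog" 2>&1 >/dev/null)    # keep standard error only
printf 'stdout: %s\nstderr: %s\n' "$out" "$err"
```

Because the file name is a quoted string, the redirection goes to the error stream rather than to a file named after the value of some variable.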
@command{gawk} does not treat these @value{FN}s as special when @command{gawk} does not treat these @value{FN}s as special when
in POSIX-compatibility mode. However, because BWK @command{awk} in POSIX-compatibility mode. However, because BWK @command{awk}
supports them, @command{gawk} does support them even when supports them, @command{gawk} does support them even when
invoked with the @option{--traditional} option (@pxref{Options}). invoked with the @option{--traditional} option (@pxref{Options}).
@node Special Files @node Special Files
@section Special @value{FFN}s in @command{gawk} @section Special @value{FFN}s in @command{gawk}
@cindex @command{gawk}, file names in @cindex @command{gawk} @subentry file names in
Besides access to standard input, standard output, and standard error, Besides access to standard input, standard output, and standard error,
@command{gawk} provides access to any open file descriptor. @command{gawk} provides access to any open file descriptor.
Additionally, there are special @value{FN}s reserved for Additionally, there are special @value{FN}s reserved for
TCP/IP networking. TCP/IP networking.
@menu @menu
* Other Inherited Files:: Accessing other open files with * Other Inherited Files:: Accessing other open files with
@command{gawk}. @command{gawk}.
* Special Network:: Special files for network communications. * Special Network:: Special files for network communications.
skipping to change at page 110, line ? skipping to change at page 110, line ?
The @value{FN}s @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr} The @value{FN}s @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
are essentially aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and are essentially aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and
@file{/dev/fd/2}, respectively. However, those names are more self-explanatory. @file{/dev/fd/2}, respectively. However, those names are more self-explanatory.
Note that using @code{close()} on a @value{FN} of the Note that using @code{close()} on a @value{FN} of the
form @code{"/dev/fd/@var{N}"}, for file descriptor numbers form @code{"/dev/fd/@var{N}"}, for file descriptor numbers
above two, does actually close the given file descriptor. above two, does actually close the given file descriptor.
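Writing to an inherited descriptor through a `/dev/fd/N` name might look like the following sketch. It assumes that the shell opens descriptor 3 before starting the program; with an `awk` other than @command{gawk}, it additionally assumes that the operating system itself provides `/dev/fd`. The variables `tmp` and `fd3` are hypothetical.

```shell
# Open descriptor 3 in the shell, then write to it from the awk program.
tmp=$(mktemp)
awk 'BEGIN { print "written via descriptor 3" > "/dev/fd/3" }' 3> "$tmp"
fd3=$(cat "$tmp")
printf '%s\n' "$fd3"
```

The program never names the destination file; it only knows the descriptor number that the shell set up for it.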
@node Special Network @node Special Network
@subsection Special Files for Network Communications @subsection Special Files for Network Communications
@cindex networks, support for @cindex networks @subentry support for
@cindex TCP/IP, support for @cindex TCP/IP @subentry support for
@command{gawk} programs @command{gawk} programs
can open a two-way can open a two-way
TCP/IP connection, acting as either a client or a server. TCP/IP connection, acting as either a client or a server.
This is done using a special @value{FN} of the form: This is done using a special @value{FN} of the form:
@example @example
@file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}} @file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}
@end example @end example
skipping to change at page 110, line ? skipping to change at page 110, line ?
Full discussion is delayed until Full discussion is delayed until
@ref{TCP/IP Networking}. @ref{TCP/IP Networking}.
@node Special Caveats @node Special Caveats
@subsection Special @value{FFN} Caveats @subsection Special @value{FFN} Caveats
Here are some things to bear in mind when using the Here are some things to bear in mind when using the
special @value{FN}s that @command{gawk} provides: special @value{FN}s that @command{gawk} provides:
@itemize @value{BULLET} @itemize @value{BULLET}
@cindex compatibility mode (@command{gawk}), file names @cindex compatibility mode (@command{gawk}) @subentry file names
@cindex file names, in compatibility mode @cindex file names @subentry in compatibility mode
@cindex POSIX mode
@item @item
Recognition of the @value{FN}s for the three standard preopened Recognition of the @value{FN}s for the three standard preopened
files is disabled only in POSIX mode. files is disabled only in POSIX mode.
@item @item
Recognition of the other special @value{FN}s is disabled if @command{gawk} is in Recognition of the other special @value{FN}s is disabled if @command{gawk} is in
compatibility mode (either @option{--traditional} or @option{--posix}; compatibility mode (either @option{--traditional} or @option{--posix};
@pxref{Options}). @pxref{Options}).
@item @item
skipping to change at page 110, line ? skipping to change at page 110, line ?
For example, using @samp{/dev/fd/4} For example, using @samp{/dev/fd/4}
for output actually writes on file descriptor 4, and not on a new for output actually writes on file descriptor 4, and not on a new
file descriptor that is @code{dup()}ed from file descriptor 4. Most of file descriptor that is @code{dup()}ed from file descriptor 4. Most of
the time this does not matter; however, it is important to @emph{not} the time this does not matter; however, it is important to @emph{not}
close any of the files related to file descriptors 0, 1, and 2. close any of the files related to file descriptors 0, 1, and 2.
Doing so results in unpredictable behavior. Doing so results in unpredictable behavior.
@end itemize @end itemize
@node Close Files And Pipes @node Close Files And Pipes
@section Closing Input and Output Redirections @section Closing Input and Output Redirections
@cindex files, output, See output files @cindex files @subentry output @seeentry{output files}
@cindex input files, closing @cindex input files @subentry closing
@cindex output, files@comma{} closing @cindex output @subentry files, closing
@cindex pipe, closing @cindex pipe @subentry closing
@cindex coprocesses, closing @cindex coprocesses @subentry closing
@cindex @code{getline} command, coprocesses@comma{} using from @cindex @code{getline} command @subentry coprocesses, using from
If the same @value{FN} or the same shell command is used with @code{getline} If the same @value{FN} or the same shell command is used with @code{getline}
more than once during the execution of an @command{awk} program more than once during the execution of an @command{awk} program
(@pxref{Getline}), (@pxref{Getline}),
the file is opened (or the command is executed) the first time only. the file is opened (or the command is executed) the first time only.
At that time, the first record of input is read from that file or command. At that time, the first record of input is read from that file or command.
The next time the same file or command is used with @code{getline}, The next time the same file or command is used with @code{getline},
another record is read from it, and so on. another record is read from it, and so on.
Similarly, when a file or pipe is opened for output, @command{awk} remembers Similarly, when a file or pipe is opened for output, @command{awk} remembers
skipping to change at page 110, line ? skipping to change at page 110, line ?
To run the same program a second time, with the same arguments. To run the same program a second time, with the same arguments.
This is not the same thing as giving more input to the first run! This is not the same thing as giving more input to the first run!
For example, suppose a program pipes output to the @command{mail} program. For example, suppose a program pipes output to the @command{mail} program.
If it outputs several lines redirected to this pipe without closing If it outputs several lines redirected to this pipe without closing
it, they make a single message of several lines. By contrast, if the it, they make a single message of several lines. By contrast, if the
program closes the pipe after each line of output, then each line makes program closes the pipe after each line of output, then each line makes
a separate message. a separate message.
@end itemize @end itemize
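As a minimal sketch of the point above, the following shell snippet pipes two batches of lines to @command{sort}, closing the pipe after each batch. The function name @code{close_demo} and the sample data are illustrative assumptions, not part of @command{awk}. Because the pipe is closed between batches, @command{sort} should run twice, and each batch should be sorted on its own rather than merged into one run:

```shell
# Illustrative helper; the name close_demo is an assumption of this sketch.
close_demo() {
    awk 'BEGIN {
        cmd = "sort"
        print "z" | cmd
        print "y" | cmd
        close(cmd)      # the first sort finishes here and emits y, z
        print "b" | cmd
        print "a" | cmd
        close(cmd)      # a second, separate sort emits a, b
    }'
}

close_demo
```

Without the first @code{close()}, all four lines would go to a single @command{sort}, and the output would instead be one merged, sorted list.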
@cindex differences in @command{awk} and @command{gawk}, @code{close()} function @cindex differences in @command{awk} and @command{gawk} @subentry @code{close()} function
@cindex portability, @code{close()} function and @cindex portability @subentry @code{close()} function and
@cindex @code{close()} function, portability @cindex @code{close()} function @subentry portability
If you use more files than the system allows you to have open, If you use more files than the system allows you to have open,
@command{gawk} attempts to multiplex the available open files among @command{gawk} attempts to multiplex the available open files among
your @value{DF}s. @command{gawk}'s ability to do this depends upon the your @value{DF}s. @command{gawk}'s ability to do this depends upon the
facilities of your operating system, so it may not always work. It is facilities of your operating system, so it may not always work. It is
therefore both good practice and good portability advice to always therefore both good practice and good portability advice to always
use @code{close()} on your files when you are done with them. use @code{close()} on your files when you are done with them.
In fact, if you are using a lot of pipes, it is essential that In fact, if you are using a lot of pipes, it is essential that
you close commands when done. For example, consider something like this: you close commands when done. For example, consider something like this:
@example @example
skipping to change at page 110, line ? skipping to change at page 110, line ?
a redirection. In such a case, it returns a negative value, a redirection. In such a case, it returns a negative value,
indicating an error. In addition, @command{gawk} sets @code{ERRNO} indicating an error. In addition, @command{gawk} sets @code{ERRNO}
to a string indicating the error. to a string indicating the error.
Note also that @samp{close(FILENAME)} has no ``magic'' effects on the Note also that @samp{close(FILENAME)} has no ``magic'' effects on the
implicit loop that reads through the files named on the command line. implicit loop that reads through the files named on the command line.
It is, more likely, a close of a file that was never opened with a It is, more likely, a close of a file that was never opened with a
redirection, so @command{awk} silently does nothing, except return redirection, so @command{awk} silently does nothing, except return
a negative value. a negative value.
@cindex @code{|} (vertical bar), @code{|&} operator (I/O), pipes@comma{} closing @cindex @code{|} (vertical bar) @subentry @code{|&} operator (I/O) @subentry pipes, closing
When using the @samp{|&} operator to communicate with a coprocess, When using the @samp{|&} operator to communicate with a coprocess,
it is occasionally useful to be able to close one end of the two-way it is occasionally useful to be able to close one end of the two-way
pipe without closing the other. pipe without closing the other.
This is done by supplying a second argument to @code{close()}. This is done by supplying a second argument to @code{close()}.
As in any other call to @code{close()}, As in any other call to @code{close()},
the first argument is the name of the command or special file used the first argument is the name of the command or special file used
to start the coprocess. to start the coprocess.
The second argument should be a string, with either of the values The second argument should be a string, with either of the values
@code{"to"} or @code{"from"}. Case does not matter. @code{"to"} or @code{"from"}. Case does not matter.
As this is an advanced feature, discussion is As this is an advanced feature, discussion is
delayed until delayed until
@ref{Two-way I/O}, @ref{Two-way I/O},
which describes it in more detail and gives an example. which describes it in more detail and gives an example.
@cindex sidebar, Using @code{close()}'s Return Value @cindex sidebar @subentry Using @code{close()}'s Return Value
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>Using @code{close()}'s Return Value</title> <sidebar><title>Using @code{close()}'s Return Value</title>
@end docbook @end docbook
@cindex dark corner, @code{close()} function @cindex dark corner @subentry @code{close()} function
@cindex @code{close()} function, return value @cindex @code{close()} function @subentry return value
@cindex return value@comma{} @code{close()} function @cindex return value, @code{close()} function
@cindex differences in @command{awk} and @command{gawk}, @code{close()} function @cindex differences in @command{awk} and @command{gawk} @subentry @code{close()} function
@cindex Unix @command{awk}, @code{close()} function and @cindex Unix @command{awk} @subentry @code{close()} function and
In many older versions of Unix @command{awk}, the @code{close()} function In many older versions of Unix @command{awk}, the @code{close()} function
is actually a statement. is actually a statement.
@value{DARKCORNER} @value{DARKCORNER}
It is a syntax error to try and use the return It is a syntax error to try and use the return
value from @code{close()}: value from @code{close()}:
@example @example
command = "@dots{}" command = "@dots{}"
command | getline info command | getline info
retval = close(command) # syntax error in many Unix awks retval = close(command) # syntax error in many Unix awks
@end example @end example
@cindex @command{gawk}, @code{ERRNO} variable in @cindex @command{gawk} @subentry @code{ERRNO} variable in
@cindex @code{ERRNO} variable, with @command{close()} function @cindex @code{ERRNO} variable @subentry with @command{close()} function
@command{gawk} treats @code{close()} as a function. @command{gawk} treats @code{close()} as a function.
The return value is @minus{}1 if the argument names something The return value is @minus{}1 if the argument names something
that was never opened with a redirection, or if there is that was never opened with a redirection, or if there is
a system problem closing the file or process. a system problem closing the file or process.
In these cases, @command{gawk} sets the predefined variable In these cases, @command{gawk} sets the predefined variable
@code{ERRNO} to a string describing the problem. @code{ERRNO} to a string describing the problem.
In @command{gawk}, starting with @value{PVERSION} 4.2, when closing a pipe or In @command{gawk}, starting with @value{PVERSION} 4.2, when closing a pipe or
coprocess (input or output), the return value is the exit status of the coprocess (input or output), the return value is the exit status of the
command, as described in @ref{table-close-pipe-return-values}.@footnote{Prior command, as described in @ref{table-close-pipe-return-values}.@footnote{Prior
skipping to change at page 110, line ? skipping to change at page 110, line ?
@caption{Return values from @code{close()} of a pipe} @caption{Return values from @code{close()} of a pipe}
@multitable @columnfractions .50 .50 @multitable @columnfractions .50 .50
@headitem Situation @tab Return value from @code{close()} @headitem Situation @tab Return value from @code{close()}
@item Normal exit of command @tab Command's exit status @item Normal exit of command @tab Command's exit status
@item Death by signal of command @tab 256 + number of murderous signal @item Death by signal of command @tab 256 + number of murderous signal
@item Death by signal of command with core dump @tab 512 + number of murderous signal @item Death by signal of command with core dump @tab 512 + number of murderous signal
@item Some kind of error @tab @minus{}1 @item Some kind of error @tab @minus{}1
@end multitable @end multitable
@end float @end float
@cindex POSIX mode
The POSIX standard is very vague; it says that @code{close()} The POSIX standard is very vague; it says that @code{close()}
returns zero on success and a nonzero value otherwise. In general, returns zero on success and a nonzero value otherwise. In general,
different implementations vary in what they report when closing different implementations vary in what they report when closing
pipes; thus, the return value cannot be used portably. pipes; thus, the return value cannot be used portably.
@value{DARKCORNER} @value{DARKCORNER}
In POSIX mode (@pxref{Options}), @command{gawk} just returns zero In POSIX mode (@pxref{Options}), @command{gawk} just returns zero
when closing a pipe. when closing a pipe.
@docbook @docbook
</sidebar> </sidebar>
@end docbook @end docbook
@end ifdocbook @end ifdocbook
@ifnotdocbook @ifnotdocbook
@cartouche @cartouche
@center @b{Using @code{close()}'s Return Value} @center @b{Using @code{close()}'s Return Value}
@cindex dark corner, @code{close()} function @cindex dark corner @subentry @code{close()} function
@cindex @code{close()} function, return value @cindex @code{close()} function @subentry return value
@cindex return value@comma{} @code{close()} function @cindex return value, @code{close()} function
@cindex differences in @command{awk} and @command{gawk}, @code{close()} function @cindex differences in @command{awk} and @command{gawk} @subentry @code{close()} function
@cindex Unix @command{awk}, @code{close()} function and @cindex Unix @command{awk} @subentry @code{close()} function and
In many older versions of Unix @command{awk}, the @code{close()} function In many older versions of Unix @command{awk}, the @code{close()} function
is actually a statement. is actually a statement.
@value{DARKCORNER} @value{DARKCORNER}
It is a syntax error to try and use the return It is a syntax error to try and use the return
value from @code{close()}: value from @code{close()}:
@example @example
command = "@dots{}" command = "@dots{}"
command | getline info command | getline info
retval = close(command) # syntax error in many Unix awks retval = close(command) # syntax error in many Unix awks
@end example @end example
@cindex @command{gawk}, @code{ERRNO} variable in @cindex @command{gawk} @subentry @code{ERRNO} variable in
@cindex @code{ERRNO} variable, with @command{close()} function @cindex @code{ERRNO} variable @subentry with @command{close()} function
@command{gawk} treats @code{close()} as a function. @command{gawk} treats @code{close()} as a function.
The return value is @minus{}1 if the argument names something The return value is @minus{}1 if the argument names something
that was never opened with a redirection, or if there is that was never opened with a redirection, or if there is
a system problem closing the file or process. a system problem closing the file or process.
In these cases, @command{gawk} sets the predefined variable In these cases, @command{gawk} sets the predefined variable
@code{ERRNO} to a string describing the problem. @code{ERRNO} to a string describing the problem.
In @command{gawk}, starting with @value{PVERSION} 4.2, when closing a pipe or In @command{gawk}, starting with @value{PVERSION} 4.2, when closing a pipe or
coprocess (input or output), the return value is the exit status of the coprocess (input or output), the return value is the exit status of the
command, as described in @ref{table-close-pipe-return-values}.@footnote{Prior command, as described in @ref{table-close-pipe-return-values}.@footnote{Prior
skipping to change at page 110, line ? skipping to change at page 110, line ?
@caption{Return values from @code{close()} of a pipe} @caption{Return values from @code{close()} of a pipe}
@multitable @columnfractions .50 .50 @multitable @columnfractions .50 .50
@headitem Situation @tab Return value from @code{close()} @headitem Situation @tab Return value from @code{close()}
@item Normal exit of command @tab Command's exit status @item Normal exit of command @tab Command's exit status
@item Death by signal of command @tab 256 + number of murderous signal @item Death by signal of command @tab 256 + number of murderous signal
@item Death by signal of command with core dump @tab 512 + number of murderous signal @item Death by signal of command with core dump @tab 512 + number of murderous signal
@item Some kind of error @tab @minus{}1 @item Some kind of error @tab @minus{}1
@end multitable @end multitable
@end float @end float
@cindex POSIX mode
The POSIX standard is very vague; it says that @code{close()} The POSIX standard is very vague; it says that @code{close()}
returns zero on success and a nonzero value otherwise. In general, returns zero on success and a nonzero value otherwise. In general,
different implementations vary in what they report when closing different implementations vary in what they report when closing
pipes; thus, the return value cannot be used portably. pipes; thus, the return value cannot be used portably.
@value{DARKCORNER} @value{DARKCORNER}
In POSIX mode (@pxref{Options}), @command{gawk} just returns zero In POSIX mode (@pxref{Options}), @command{gawk} just returns zero
when closing a pipe. when closing a pipe.
@end cartouche @end cartouche
@end ifnotdocbook @end ifnotdocbook
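The return values described in the sidebar can be observed with a short sketch such as the following. It assumes @command{gawk} 4.2 or later, not running in POSIX mode. The function name @code{close_status_demo}, the command string @code{"exit 3"}, and the name @code{"never-opened"} are illustrative only. Under those assumptions, the first @code{close()} should report the command's exit status (3), and the second should report @minus{}1 because nothing was ever opened under that name:

```shell
# Illustrative helper; requires gawk (assumed 4.2 or later, non-POSIX mode).
close_status_demo() {
    gawk 'BEGIN {
        cmd = "exit 3"              # a command that exits with status 3
        cmd | getline line          # open the input pipe; there is nothing to read
        print close(cmd)            # exit status of the command
        print close("never-opened") # never opened with a redirection
    }'
}

# Run the demo only when gawk is installed.
if command -v gawk >/dev/null 2>&1; then
    close_status_demo
fi
```

Other @command{awk} implementations may report different values for the first call, which is why the return value cannot be relied upon portably.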
skipping to change at page 110, line ? skipping to change at page 110, line ?
@end example @end example
Here, @command{gawk} did not produce a fatal error; instead Here, @command{gawk} did not produce a fatal error; instead
it let the @command{awk} program code detect the problem and handle it. it let the @command{awk} program code detect the problem and handle it.
This mechanism works also for standard output and standard error. This mechanism works also for standard output and standard error.
For standard output, you may use @code{PROCINFO["-", "NONFATAL"]} For standard output, you may use @code{PROCINFO["-", "NONFATAL"]}
or @code{PROCINFO["/dev/stdout", "NONFATAL"]}. For standard error, use or @code{PROCINFO["/dev/stdout", "NONFATAL"]}. For standard error, use
@code{PROCINFO["/dev/stderr", "NONFATAL"]}. @code{PROCINFO["/dev/stderr", "NONFATAL"]}.
@cindex @env{GAWK_SOCK_RETRIES} environment variable
@cindex environment variables @subentry @env{GAWK_SOCK_RETRIES}
When attempting to open a TCP/IP socket (@pxref{TCP/IP Networking}), When attempting to open a TCP/IP socket (@pxref{TCP/IP Networking}),
@command{gawk} tries multiple times. The @env{GAWK_SOCK_RETRIES} @command{gawk} tries multiple times. The @env{GAWK_SOCK_RETRIES}
environment variable (@pxref{Other Environment Variables}) allows you to environment variable (@pxref{Other Environment Variables}) allows you to
override @command{gawk}'s builtin default number of attempts. However, override @command{gawk}'s builtin default number of attempts. However,
once nonfatal I/O is enabled for a given socket, @command{gawk} only once nonfatal I/O is enabled for a given socket, @command{gawk} only
retries once, relying on @command{awk}-level code to notice that there retries once, relying on @command{awk}-level code to notice that there
was a problem. was a problem.
@node Output Summary @node Output Summary
@section Summary @section Summary
skipping to change at page 110, line ? skipping to change at page 110, line ?
* Constants:: String, numeric and regexp constants. * Constants:: String, numeric and regexp constants.
* Using Constant Regexps:: When and how to use a regexp constant. * Using Constant Regexps:: When and how to use a regexp constant.
* Variables:: Variables give names to values for later use. * Variables:: Variables give names to values for later use.
* Conversion:: The conversion of strings to numbers and vice * Conversion:: The conversion of strings to numbers and vice
versa. versa.
@end menu @end menu
@node Constants @node Constants
@subsection Constant Expressions @subsection Constant Expressions
@cindex constants, types of @cindex constants @subentry types of
The simplest type of expression is the @dfn{constant}, which always has The simplest type of expression is the @dfn{constant}, which always has
the same value. There are three types of constants: numeric, the same value. There are three types of constants: numeric,
string, and regular expression. string, and regular expression.
Each is used in the appropriate context when you need a data Each is used in the appropriate context when you need a data
value that isn't going to change. Numeric constants can value that isn't going to change. Numeric constants can
have different forms, but are internally stored in an identical manner. have different forms, but are internally stored in an identical manner.
@menu @menu
* Scalar Constants:: Numeric and string constants. * Scalar Constants:: Numeric and string constants.
* Nondecimal-numbers:: What are octal and hex numbers. * Nondecimal-numbers:: What are octal and hex numbers.
* Regexp Constants:: Regular Expression constants. * Regexp Constants:: Regular Expression constants.
@end menu @end menu
@node Scalar Constants @node Scalar Constants
@subsubsection Numeric and String Constants @subsubsection Numeric and String Constants
@cindex constants, numeric @cindex constants @subentry numeric
@cindex numeric constants @cindex numeric @subentry constants
A @dfn{numeric constant} stands for a number. This number can be an A @dfn{numeric constant} stands for a number. This number can be an
integer, a decimal fraction, or a number in scientific (exponential) integer, a decimal fraction, or a number in scientific (exponential)
notation.@footnote{The internal representation of all numbers, notation.@footnote{The internal representation of all numbers,
including integers, uses double-precision floating-point numbers. including integers, uses double-precision floating-point numbers.
On most modern systems, these are in IEEE 754 standard format. On most modern systems, these are in IEEE 754 standard format.
@xref{Arbitrary Precision Arithmetic}, for much more information.} @xref{Arbitrary Precision Arithmetic}, for much more information.}
Here are some examples of numeric constants that all Here are some examples of numeric constants that all
have the same value: have the same value:
@example @example
105 105
1.05e+2 1.05e+2
1050e-1 1050e-1
@end example @end example
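A quick way to check that these forms denote the same number is to compare them directly. The sketch below is illustrative (the function name @code{same_value_demo} is an assumption); each comparison yields 1 when the two constants are equal:

```shell
# Illustrative helper; compares the three spellings of the same number.
same_value_demo() {
    awk 'BEGIN {
        a = (105 == 1.05e+2)        # integer form vs. scientific notation
        b = (1.05e+2 == 1050e-1)    # two different exponents, same value
        print a, b
    }'
}

same_value_demo
```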
@cindex string constants @cindex string @subentry constants
@cindex constants @subentry string
A @dfn{string constant} consists of a sequence of characters enclosed in A @dfn{string constant} consists of a sequence of characters enclosed in
double quotation marks. For example: double quotation marks. For example:
@example @example
"parrot" "parrot"
@end example @end example
@noindent @noindent
@cindex differences in @command{awk} and @command{gawk}, strings @cindex differences in @command{awk} and @command{gawk} @subentry strings
@cindex strings, length limitations @cindex strings @subentry length limitations
@cindex ASCII
represents the string whose contents are @samp{parrot}. Strings in represents the string whose contents are @samp{parrot}. Strings in
@command{gawk} can be of any length, and they can contain any of the possible @command{gawk} can be of any length, and they can contain any of the possible
eight-bit ASCII characters, including ASCII @sc{nul} (character code zero). eight-bit ASCII characters, including ASCII @sc{nul} (character code zero).
Other @command{awk} Other @command{awk}
implementations may have difficulty with some character codes. implementations may have difficulty with some character codes.
Some languages allow you to continue long strings across Some languages allow you to continue long strings across
multiple lines by ending the line with a backslash. For example in C: multiple lines by ending the line with a backslash. For example in C:
@example @example
#include <stdio.h> #include <stdio.h>
int main() int main()
@{ @{
printf "hello, \ printf("hello, \
world\n"); world\n");
return 0; return 0;
@} @}
@end example @end example
@noindent @noindent
In such a case, the C compiler removes both the backslash and the newline, In such a case, the C compiler removes both the backslash and the newline,
producing a string as if it had been typed @samp{"hello, world\n"}. producing a string as if it had been typed @samp{"hello, world\n"}.
This is useful when a single string needs to contain a large amount of text. This is useful when a single string needs to contain a large amount of text.
skipping to change at page 110, line ? skipping to change at page 110, line ?
@example @example
$ @kbd{gawk 'BEGIN @{ print "hello, } $ @kbd{gawk 'BEGIN @{ print "hello, }
> @kbd{world" @}'} > @kbd{world" @}'}
@print{} gawk: cmd. line:1: BEGIN @{ print "hello, @print{} gawk: cmd. line:1: BEGIN @{ print "hello,
@print{} gawk: cmd. line:1: ^ unterminated string @print{} gawk: cmd. line:1: ^ unterminated string
@print{} gawk: cmd. line:1: BEGIN @{ print "hello, @print{} gawk: cmd. line:1: BEGIN @{ print "hello,
@print{} gawk: cmd. line:1: ^ syntax error @print{} gawk: cmd. line:1: ^ syntax error
@end example @end example
@cindex dark corner, string continuation @cindex dark corner @subentry string continuation
@cindex strings, continuation across lines @cindex strings @subentry continuation across lines
@cindex differences in @command{awk} and @command{gawk}, strings @cindex differences in @command{awk} and @command{gawk} @subentry strings
Although POSIX doesn't define what happens if you use an escaped Although POSIX doesn't define what happens if you use an escaped
newline, as in the previous C example, all known versions of newline, as in the previous C example, all known versions of
@command{awk} allow you to do so. Unfortunately, what each one @command{awk} allow you to do so. Unfortunately, what each one
does with such a string varies. @value{DARKCORNER} @command{gawk}, does with such a string varies. @value{DARKCORNER} @command{gawk},
@command{mawk}, and the OpenSolaris POSIX @command{awk} @command{mawk}, and the OpenSolaris POSIX @command{awk}
(@pxref{Other Versions}) elide the backslash and newline, as in C: (@pxref{Other Versions}) elide the backslash and newline, as in C:
@example @example
$ @kbd{gawk 'BEGIN @{ print "hello, \} $ @kbd{gawk 'BEGIN @{ print "hello, \}
> @kbd{world" @}'} > @kbd{world" @}'}
@print{} hello, world @print{} hello, world
@end example @end example
@cindex POSIX mode
In POSIX mode (@pxref{Options}), @command{gawk} does not In POSIX mode (@pxref{Options}), @command{gawk} does not
allow escaped newlines. Otherwise, it behaves as just described. allow escaped newlines. Otherwise, it behaves as just described.
Brian Kernighan's @command{awk} and BusyBox @command{awk} Brian Kernighan's @command{awk} and BusyBox @command{awk}
remove the backslash but leave the newline remove the backslash but leave the newline
intact, as part of the string: intact, as part of the string:
@example @example
$ @kbd{nawk 'BEGIN @{ print "hello, \} $ @kbd{nawk 'BEGIN @{ print "hello, \}
> @kbd{world" @}'} > @kbd{world" @}'}
@print{} hello, @print{} hello,
@print{} world @print{} world
@end example @end example
@node Nondecimal-numbers @node Nondecimal-numbers
@subsubsection Octal and Hexadecimal Numbers @subsubsection Octal and Hexadecimal Numbers
@cindex octal numbers @cindex octal numbers
@cindex hexadecimal numbers @cindex hexadecimal numbers
@cindex numbers, octal @cindex numbers @subentry octal
@cindex numbers, hexadecimal @cindex numbers @subentry hexadecimal
In @command{awk}, all numbers are in decimal (i.e., base 10). Many other In @command{awk}, all numbers are in decimal (i.e., base 10). Many other
programming languages allow you to specify numbers in other bases, often programming languages allow you to specify numbers in other bases, often
octal (base 8) and hexadecimal (base 16). octal (base 8) and hexadecimal (base 16).
In octal, the numbers go 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, and so on. In octal, the numbers go 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, and so on.
Just as @samp{11} in decimal is 1 times 10 plus 1, so Just as @samp{11} in decimal is 1 times 10 plus 1, so
@samp{11} in octal is 1 times 8 plus 1. This equals 9 in decimal. @samp{11} in octal is 1 times 8 plus 1. This equals 9 in decimal.
In hexadecimal, there are 16 digits. Because the everyday decimal In hexadecimal, there are 16 digits. Because the everyday decimal
number system only has ten digits (@samp{0}--@samp{9}), the letters number system only has ten digits (@samp{0}--@samp{9}), the letters
@samp{a} through @samp{f} represent the rest. @samp{a} through @samp{f} represent the rest.
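The positional arithmetic described above can be sketched in portable @command{awk}. The function @code{to_decimal()} below is a hypothetical helper written for this illustration, not a built-in. It multiplies the running total by the base and adds each digit in turn, so @samp{11} evaluates to 9 in octal, 11 in decimal, and 17 in hexadecimal:

```shell
# Illustrative helper; to_decimal() is a hypothetical function, not a built-in.
to_decimal_demo() {
    awk '
    # Return the value of digit string s interpreted in the given base (2..16),
    # or -1 if s contains a character that is not a digit in that base.
    function to_decimal(s, base,    i, d, n) {
        s = tolower(s)
        n = 0
        for (i = 1; i <= length(s); i++) {
            d = index("0123456789abcdef", substr(s, i, 1)) - 1
            if (d < 0 || d >= base)
                return -1
            n = n * base + d
        }
        return n
    }
    BEGIN {
        print to_decimal("11", 8), to_decimal("11", 10), to_decimal("11", 16), to_decimal("ff", 16)
    }'
}

to_decimal_demo
```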
skipping to change at page 110, line ? skipping to change at page 110, line ?
@example @example
$ @kbd{gawk 'BEGIN @{ printf "%d, %d, %d\n", 011, 11, 0x11 @}'} $ @kbd{gawk 'BEGIN @{ printf "%d, %d, %d\n", 011, 11, 0x11 @}'}
@print{} 9, 11, 17 @print{} 9, 11, 17
@end example @end example
Being able to use octal and hexadecimal constants in your programs is most Being able to use octal and hexadecimal constants in your programs is most
useful when working with data that cannot be represented conveniently as useful when working with data that cannot be represented conveniently as
characters or as regular numbers, such as binary data of various sorts. characters or as regular numbers, such as binary data of various sorts.
@cindex @command{gawk}, octal numbers and @cindex @command{gawk} @subentry octal numbers and
@cindex @command{gawk}, hexadecimal numbers and @cindex @command{gawk} @subentry hexadecimal numbers and
@command{gawk} allows the use of octal and hexadecimal @command{gawk} allows the use of octal and hexadecimal
constants in your program text. However, such numbers in the input data constants in your program text. However, such numbers in the input data
are not treated differently; doing so by default would break old are not treated differently; doing so by default would break old
programs. programs.
(If you really need to do this, use the @option{--non-decimal-data} (If you really need to do this, use the @option{--non-decimal-data}
command-line option; command-line option;
@pxref{Nondecimal Data}.) @pxref{Nondecimal Data}.)
If you have octal or hexadecimal data, If you have octal or hexadecimal data,
you can use the @code{strtonum()} function you can use the @code{strtonum()} function
(@pxref{String Functions}) (@pxref{String Functions})
skipping to change at page 110, line ? skipping to change at page 110, line ?
Unlike in some early C implementations, @samp{8} and @samp{9} are not Unlike in some early C implementations, @samp{8} and @samp{9} are not
valid in octal constants. For example, @command{gawk} treats @samp{018} valid in octal constants. For example, @command{gawk} treats @samp{018}
as decimal 18: as decimal 18:
@example @example
$ @kbd{gawk 'BEGIN @{ print "021 is", 021 ; print 018 @}'} $ @kbd{gawk 'BEGIN @{ print "021 is", 021 ; print 018 @}'}
@print{} 021 is 17 @print{} 021 is 17
@print{} 18 @print{} 18
@end example @end example
@cindex compatibility mode (@command{gawk}), octal numbers @cindex compatibility mode (@command{gawk}) @subentry octal numbers
@cindex compatibility mode (@command{gawk}), hexadecimal numbers @cindex compatibility mode (@command{gawk}) @subentry hexadecimal numbers
Octal and hexadecimal source code constants are a @command{gawk} extension. Octal and hexadecimal source code constants are a @command{gawk} extension.
If @command{gawk} is in compatibility mode If @command{gawk} is in compatibility mode
(@pxref{Options}), (@pxref{Options}),
they are not available. they are not available.
@cindex sidebar, A Constant's Base Does Not Affect Its Value @cindex sidebar @subentry A Constant's Base Does Not Affect Its Value
@ifdocbook @ifdocbook
@docbook @docbook
<sidebar><title>A Constant's Base Does Not Affect Its Value</title> <sidebar><title>A Constant's Base Does Not Affect Its Value</title>
@end docbook @end docbook
Once a numeric constant has Once a numeric constant has
been converted internally into a number, been converted internally into a number,
@command{gawk} no longer remembers @command{gawk} no longer remembers
what the original form of the constant was; the internal value is what the original form of the constant was; the internal value is
always used. This has particular consequences for conversion of always used. This has particular consequences for conversion of
skipping to change at page 110, line ? skipping to change at page 110, line ?
@end example @end example
@end cartouche @end cartouche
@end ifnotdocbook @end ifnotdocbook
@node Regexp Constants @node Regexp Constants
@subsubsection Regular Expression Constants @subsubsection Regular Expression Constants
@cindex regexp constants @cindex regexp constants
@cindex @code{~} (tilde), @code{~} operator @cindex @code{~} (tilde), @code{~} operator
@cindex tilde (@code{~}), @code{~} operator @cindex tilde (@code{~}), @code{~} operator
@cindex @code{!} (exclamation point), @code{!~} operator @cindex @code{!} (exclamation point) @subentry @code{!~} operator
@cindex exclamation point (@code{!}), @code{!~} operator @cindex exclamation point (@code{!}) @subentry @code{!~} operator
A @dfn{regexp constant} is a regular expression description enclosed in A @dfn{regexp constant} is a regular expression description enclosed in
slashes, such as @code{@w{/^beginning and end$/}}. Most regexps used in slashes, such as @code{@w{/^beginning and end$/}}. Most regexps used in
@command{awk} programs are constant, but the @samp{~} and @samp{!~} @command{awk} programs are constant, but the @samp{~} and @samp{!~}
matching operators can also match computed or dynamic regexps matching operators can also match computed or dynamic regexps
(which are typically just ordinary strings or variables that contain a regexp, (which are typically just ordinary strings or variables that contain a regexp,
but could be more complex expressions). but could be more complex expressions).
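
As a minimal sketch of a dynamic regexp (the variable name @code{re} and the test strings are chosen only for illustration), the following snippet stores the regexp text in an ordinary string variable and uses it with both matching operators:

```shell
# A dynamic regexp: the pattern lives in a string variable, not between slashes.
out=$(awk 'BEGIN {
    re = "^b.*g$"                           # hypothetical pattern, held as a string
    if ("beginning" ~ re) print "match"     # string is matched against the regexp in re
    if ("end" !~ re)      print "no match"  # negated match with the same variable
}')
echo "$out"
```

Both tests succeed here, so the snippet prints @samp{match} followed by @samp{no match}.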
@node Using Constant Regexps @node Using Constant Regexps
@subsection Using Regular Expression Constants @subsection Using Regular Expression Constants
skipping to change at page 110, line ? skipping to change at page 110, line ?
@dfn{strongly typed regexp constants}, which are a @command{gawk} extension. @dfn{strongly typed regexp constants}, which are a @command{gawk} extension.
@menu @menu
* Standard Regexp Constants:: Regexp constants in standard @command{awk}. * Standard Regexp Constants:: Regexp constants in standard @command{awk}.
* Strong Regexp Constants:: Strongly typed regexp constants. * Strong Regexp Constants:: Strongly typed regexp constants.
@end menu @end menu
@node Standard Regexp Constants @node Standard Regexp Constants
@subsubsection Standard Regular Expression Constants @subsubsection Standard Regular Expression Constants
@cindex dark corner, regexp constants @cindex dark corner @subentry regexp constants
When used on the righthand side of the @samp{~} or @samp{!~} When used on the righthand side of the @samp{~} or @samp{!~}
operators, a regexp constant merely stands for the regexp that is to be operators, a regexp constant merely stands for the regexp that is to be
matched. matched.
However, regexp constants (such as @code{/foo/}) may be used like simple expressions. However, regexp constants (such as @code{/foo/}) may be used like simple expressions.
When a When a
regexp constant appears by itself, it has the same meaning as if it appeared regexp constant appears by itself, it has the same meaning as if it appeared
in a pattern (i.e., @samp{($0 ~ /foo/)}). in a pattern (i.e., @samp{($0 ~ /foo/)}).
@value{DARKCORNER} @value{DARKCORNER}
@xref{Expression Patterns}. @xref{Expression Patterns}.
This means that the following two code segments: This means that the following two code segments:
skipping to change at page 110, line ? skipping to change at page 110, line ?
Boolean expression is valid, but does not do what its author probably Boolean expression is valid, but does not do what its author probably
intended: intended:
@example @example
# Note that /foo/ is on the left of the ~ # Note that /foo/ is on the left of the ~
if (/foo/ ~ $1) print "found foo" if (/foo/ ~ $1) print "found foo"
@end example @end example
@c @cindex automatic warnings @c @cindex automatic warnings
@c @cindex warnings, automatic @c @cindex warnings, automatic
@cindex @command{gawk}, regexp constants and @cindex @command{gawk} @subentry regexp constants and
@cindex regexp constants, in @command{gawk} @cindex regexp constants @subentry in @command{gawk}
@noindent @noindent
This code is ``obviously'' testing @code{$1} for a match against the regexp This code is ``obviously'' testing @code{$1} for a match against the regexp
@code{/foo/}. But in fact, the expression @samp{/foo/ ~ $1} really means @code{/foo/}. But in fact, the expression @samp{/foo/ ~ $1} really means
@samp{($0 ~ /foo/) ~ $1}. In other words, first match the input record @samp{($0 ~ /foo/) ~ $1}. In other words, first match the input record
against the regexp @code{/foo/}. The result is either zero or one, against the regexp @code{/foo/}. The result is either zero or one,
depending upon the success or failure of the match. That result depending upon the success or failure of the match. That result
is then matched against the first field in the record. is then matched against the first field in the record.
Because it is unlikely that you would ever really want to make this kind of Because it is unlikely that you would ever really want to make this kind of
test, @command{gawk} issues a warning when it sees this construct in test, @command{gawk} issues a warning when it sees this construct in
a program. a program.
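
The pitfall can be seen with a small sketch. The two input records below are made up for illustration: the first has @samp{foo} in @code{$1}, and the second has @samp{1} in @code{$1} and @samp{foo} elsewhere in the record.

```shell
# gawk may report the suspicious construct on stderr, so stderr is discarded.
out=$(printf 'foo 1\n1 foo\n' |
      awk '{ if (/foo/ ~ $1) print "found foo in record " NR }' 2>/dev/null)
echo "$out"
```

Only the second record is reported. In both records @samp{$0 ~ /foo/} yields one, and that one is then matched against @code{$1} used as a regexp; it matches @samp{1} but not @samp{foo}.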
Another consequence of this rule is that the assignment statement: Another consequence of this rule is that the assignment statement:
@example @example
matches = /foo/ matches = /foo/
@end example @end example
@noindent @noindent
assigns either zero or one to the variable @code{matches}, depending assigns either zero or one to the variable @code{matches}, depending
upon the contents of the current input record. upon the contents of the current input record.
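
A short sketch of this behavior, assuming two made-up input records of which only the first contains @samp{foo}:

```shell
# matches receives the result of $0 ~ /foo/ for each record.
out=$(printf 'some foo here\nnothing\n' |
      awk '{ matches = /foo/; print matches }')
echo "$out"
```

This prints @samp{1} for the first record and @samp{0} for the second.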
@cindex differences in @command{awk} and @command{gawk}, regexp constants @cindex differences in @command{awk} and @command{gawk} @subentry regexp constants
@cindex dark corner, regexp constants, as arguments to user-defined functions @cindex dark corner @subentry regexp constants @subentry as arguments to user-defined functions
@cindexgawkfunc{gensub} @cindexgawkfunc{gensub}
@cindexawkfunc{sub} @cindexawkfunc{sub}
@cindexawkfunc{gsub} @cindexawkfunc{gsub}
Constant regular expressions are also used as the first argument for Constant regular expressions are also used as the first argument for
the @code{gensub()}, @code{sub()}, and @code{gsub()} functions, as the the @code{gensub()}, @code{sub()}, and @code{gsub()} functions, as the
second argument of the @code{match()} function, second argument of the @code{match()} function,
and as the third argument of the @code{split()} and @code{patsplit()} functions and as the third argument of the @code{split()} and @code{patsplit()} functions
(@pxref{String Functions}). (@pxref{String Functions}).
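
The argument positions listed above can be sketched as follows. The strings and the names @code{s}, @code{n}, and @code{parts} are arbitrary, and only functions available in standard @command{awk} are used:

```shell
out=$(awk 'BEGIN {
    s = "a1b22c"
    n = gsub(/[0-9]+/, "-", s)         # regexp constant as the first argument
    print n, s
    print match("xxfoo", /foo/)        # regexp constant as the second argument
    n = split("a:b;c", parts, /[:;]/)  # regexp constant as the third argument
    print n, parts[3]
}')
echo "$out"
```

The output is @samp{2 a-b-c}, then @samp{3}, then @samp{3 c}.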
Modern implementations of @command{awk}, including @command{gawk}, allow Modern implementations of @command{awk}, including @command{gawk}, allow
the third argument of @code{split()} to be a regexp constant, but some the third argument of @code{split()} to be a regexp constant, but some
skipping to change at page 110, line ? skipping to change at page 110, line ?
num = 42 @ii{Numeric variable} num = 42 @ii{Numeric variable}
str = "hi" @ii{String variable} str = "hi" @ii{String variable}
re = /foo/ @ii{Wrong!} re @ii{is the result of} $0 ~ /foo/ re = /foo/ @ii{Wrong!} re @ii{is the result of} $0 ~ /foo/
@end example @end example
For a number of more advanced use cases, For a number of more advanced use cases,
it would be nice to have regexp constants that it would be nice to have regexp constants that
are @dfn{strongly typed}; in other words, that denote a regexp useful are @dfn{strongly typed}; in other words, that denote a regexp useful
for matching, and not an expression. for matching, and not an expression.
@cindex values, regexp @cindex values @subentry regexp
@command{gawk} provides this feature. A strongly typed regexp constant @command{gawk} provides this feature. A strongly typed regexp constant
looks almost like a regular regexp constant, except that it is preceded looks almost like a regular regexp constant, except that it is preceded
by an @samp{@@} sign: by an @samp{@@} sign:
@example @example
re = @@/foo/ @ii{Regexp variable} re = @@/foo/ @ii{Regexp variable}
@end example @end example
Strongly typed regexp constants @emph{cannot} be used everywhere that a Strongly typed regexp constants @emph{cannot} be used everywhere that a
regular regexp constant can, because this would make the language even more regular regexp constant can, because this would make the language even more
skipping to change at page 110, line ? skipping to change at page 110, line ?
calls (@pxref{Indirect Calls}) calls (@pxref{Indirect Calls})
and on to the built-in functions that accept regexp constants. and on to the built-in functions that accept regexp constants.
When used in numeric conversions, strongly typed regexp variables convert When used in numeric conversions, strongly typed regexp variables convert
to zero. When used in string conversions, they convert to the string to zero. When used in string conversions, they convert to the string
value of the original regexp text. value of the original regexp text.
@node Variables @node Variables
@subsection Variables @subsection Variables
@cindex variables, user-defined @cindex variables @subentry user-defined
@cindex user-defined, variables @cindex user-defined @subentry variables
@dfn{Variables} are ways of storing values at one point in your program for @dfn{Variables} are ways of storing values at one point in your program for
use later in another part of your program. They can be manipulated use later in another part of your program. They can be manipulated
entirely within the program text, and they can also be assigned values entirely within the program text, and they can also be assigned values
on the @command{awk} command line. on the @command{awk} command line.
@menu @menu
* Using Variables:: Using variables in your programs. * Using Variables:: Using variables in your programs.
* Assignment Options:: Setting variables on the command line and a * Assignment Options:: Setting variables on the command line and a
summary of command-line syntax. This is an summary of command-line syntax. This is an
advanced method of input. advanced method of input.
skipping to change at page 110, line ? skipping to change at page 110, line ?
A variable name is a valid expression by itself; it represents the A variable name is a valid expression by itself; it represents the
variable's current value. Variables are given new values with variable's current value. Variables are given new values with
@dfn{assignment operators}, @dfn{increment operators}, and @dfn{assignment operators}, @dfn{increment operators}, and
@dfn{decrement operators} @dfn{decrement operators}
(@pxref{Assignment Ops}). (@pxref{Assignment Ops}).
In addition, the @code{sub()} and @code{gsub()} functions can In addition, the @code{sub()} and @code{gsub()} functions can
change a variable's value, and the @code{match()}, @code{split()}, change a variable's value, and the @code{match()}, @code{split()},
and @code{patsplit()} functions can change the contents of their and @code{patsplit()} functions can change the contents of their
array parameters (@pxref{String Functions}). array parameters (@pxref{String Functions}).
@cindex variables, built-in @cindex variables @subentry built-in
@cindex variables, initializing @cindex variables @subentry initializing
A few variables have special built-in meanings, such as @code{FS} (the A few variables have special built-in meanings, such as @code{FS} (the
field separator) and @code{NF} (the number of fields in the current input field separator) and @code{NF} (the number of fields in the current input
record). @xref{Built-in Variables} for a list of the predefined variables. record). @xref{Built-in Variables} for a list of the predefined variables.
These predefined variables can be used and assigned just like all other These predefined variables can be used and assigned just like all other
variables, but their values are also used or changed automatically by variables, but their values are also used or changed automatically by
@command{awk}. All predefined variables' names are entirely uppercase. @command{awk}. All predefined variables' names are entirely uppercase.
Variables in @command{awk} can be assigned either numeric or string values. Variables in @command{awk} can be assigned either numeric or string values.
The kind of value a variable holds can change over the life of a program. The kind of value a variable holds can change over the life of a program.
By default, variables are initialized to the empty string, which By default, variables are initialized to the empty string, which
is zero if converted to a number. There is no need to explicitly is zero if converted to a number. There is no need to explicitly
initialize a variable in @command{awk}, initialize a variable in @command{awk},
which is what you would do in C and in most other traditional languages. which is what you would do in C and in most other traditional languages.
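
A minimal sketch of these rules, using an arbitrary variable name @code{x} that is never initialized before its first use:

```shell
out=$(awk 'BEGIN {
    print "[" x "]"                      # uninitialized: the empty string
    print x + 0                          # ... which is zero as a number
    if (x == 0 && x == "") print "both"  # compares equal to both forms
    x = 42                               # numeric value
    x = "now a string"                   # the kind of value may change later
    print x
}')
echo "$out"
```

This prints @samp{[]}, @samp{0}, @samp{both}, and @samp{now a string}, one per line.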
@node Assignment Options @node Assignment Options
@subsubsection Assigning Variables on the Command Line @subsubsection Assigning Variables on the Command Line
@cindex variables, assigning on command line @cindex variables @subentry assigning on command line
@cindex command line, variables@comma{} assigning on @cindex command line @subentry variables, assigning on
Any @command{awk} variable can be set by including a @dfn{variable assignment} Any @command{awk} variable can be set by including a @dfn{variable assignment}
among the arguments on the command line when @command{awk} is invoked among the arguments on the command line when @command{awk} is invoked
(@pxref{Other Arguments}). (@pxref{Other Arguments}).
Such an assignment has the following form: Such an assignment has the following form:
@example @example
@var{variable}=@var{text} @var{variable}=@var{text}
@end example @end example
skipping to change at page 110, line ? skipping to change at page 110, line ?
@example @example
$ @kbd{awk '@{ print $n @}' n=4 inventory-shipped n=2 mail-list} $ @kbd{awk '@{ print $n @}' n=4 inventory-shipped n=2 mail-list}
@print{} 15 @print{} 15
@print{} 24 @print{} 24
@dots{} @dots{}
@print{} 555-5553 @print{} 555-5553
@print{} 555-3412 @print{} 555-3412
@dots{} @dots{}
@end example @end example
@cindex dark corner, command-line arguments @cindex dark corner @subentry command-line arguments
Command-line arguments are made available for explicit examination by Command-line arguments are made available for explicit examination by
the @command{awk} program in the @code{ARGV} array the @command{awk} program in the @code{ARGV} array
(@pxref{ARGC and ARGV}). (@pxref{ARGC and ARGV}).
@command{awk} processes the values of command-line assignments for escape @command{awk} processes the values of command-line assignments for escape
sequences sequences
(@pxref{Escape Sequences}). (@pxref{Escape Sequences}).
@value{DARKCORNER} @value{DARKCORNER}
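
As a sketch of the escape-sequence processing, the following assigns a value containing @samp{\t} both with @option{-v} and as an ordinary command-line argument. The variable name @code{v} is arbitrary, and @samp{-} is used to name the standard input as the data file:

```shell
# Assignment with -v: the \t in the value becomes a TAB character.
out=$(awk -v 'v=a\tb' 'BEGIN { print v }')
echo "$out"

# Assignment among the file arguments: processed the same way,
# just before the following file (here standard input) is read.
out2=$(printf 'x\n' | awk '{ print v }' 'v=c\td' -)
echo "$out2"
```

In both cases the printed value contains a real TAB between the two letters.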
Normally, variables assigned on the command line (with or without the Normally, variables assigned on the command line (with or without the
@option{-v} option) are treated as strings. When such variables are @option{-v} option) are treated as strings. When such variables are
used as numbers, @command{awk}'s normal automatic conversion of strings used as numbers, @command{awk}'s normal automatic conversion of strings
to numbers takes place, and everything ``just works.'' to numbers takes place, and everything ``just works.''
However, @command{gawk} supports variables whose types are ``regexp''. However, @command{gawk} supports variables whose types are ``regexp''.
You can assign variables of this type using the following syntax: You can assign variables of this type using the following syntax:
@example @example
gawk -v 're1=@/foo|bar/' '@dots{}' /path/to/file1 're2=@/baz|quux/' /path/to/file2 gawk -v 're1=@@/foo|bar/' '@dots{}' /path/to/file1 're2=@@/baz|quux/' /path/to/file2
@end example @end example
@noindent @noind