.\" $OpenBSD: awk.1,v 1.40 2011/05/02 11:14:11 jmc Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd $Mdocdate: May 2 2011 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl V
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl V
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
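.Pp
As a minimal sketch of this multi-line mode,
assuming a hypothetical input in which each record is a block of lines
and blocks are separated by blank lines,
the following prints the first line of each block
together with the number of lines in it:
.Bd -literal -offset indent
# A null RS selects blank-line-separated records.
# FS is set to newline here so that each line is exactly one field.
BEGIN { RS = ""; FS = "\en" }
{ print NR ": " $1 " (" NF " lines)" }
.Ed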
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Xo Ic if ( Ar expression ) Ar statement
.Op Ic else Ar statement
.Xc
.It Ic while ( Ar expression ) Ar statement
.It Xo Ic for
.No ( Ar expression ; expression ; expression ) statement
.Xc
.It Xo Ic for
.No ( Ar var Ic in Ar array ) statement
.Xc
.It Xo Ic do
.Ar statement Ic while ( Ar expression )
.Xc
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ns Ar file
or
.Pf >> Ns Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ns Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
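.Pp
Such a range pattern can be sketched as follows;
the markers
.Dq begin
and
.Dq end
are hypothetical and stand for whatever delimits the region of interest:
.Bd -literal -offset indent
# Print, with record numbers, every line from one whose first field
# is "begin" through the next one whose first field is "end".
$1 == "begin", $1 == "end" { print NR ": " $0 }
.Ed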
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default octal 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
as follows:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if an array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets the seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Fa fs
or with the field separator
.Va FS
if
.Fa fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced by the text in
.Fa s
that was matched by
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument x.
.It Fn and x y
Performs a bitwise AND on integer arguments x and y.
.It Fn or x y
Performs a bitwise OR on integer arguments x and y.
.It Fn xor x y
Performs a bitwise Exclusive-OR on integer arguments x and y.
.It Fn lshift x n
Returns integer argument x shifted by n bits to the left.
.It Fn rshift x n
Returns integer argument x shifted by n bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
But note that the
.Ic exit
expression can modify the exit status.
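.Pp
For instance, the following sketch
(the pattern
.Sq /error/
and the variable
.Va found
are illustrative assumptions, not part of
.Nm )
exits with status 1 if any input line matched and 0 otherwise:
.Bd -literal -offset indent
# Remember whether any line matched; found is unset (zero) until then.
/error/ { found = 1 }
END     { exit(found ? 1 : 0) }
.Ed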
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate echo(1):
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7 ,
.Xr script 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
.Pp
The flags
.Op Fl \&dV
and
.Op Fl safe ,
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Pp
.Nm
does not support {n,m} pattern matching.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
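.Pp
As a minimal sketch of the two conversion idioms described at the start
of this section
(the variable names are illustrative only),
the first comparison below is made on strings and the second on numbers,
so both messages are printed:
.Bd -literal -offset indent
BEGIN {
	x = "10"; y = "9"
	# Both operands are strings, so they are compared as strings.
	if (x < y) print "as strings, 10 sorts before 9"
	# Adding 0 forces a numeric comparison.
	if (x+0 > y+0) print "as numbers, 10 is greater than 9"
	# Concatenating "" forces the string form of a number.
	n = 42; s = n ""
}
.Ed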