Published Book on Amazon
All of IOT Starting with the Latest Raspberry Pi from Beginner to Advanced – Volume 1
All of IOT Starting with the Latest Raspberry Pi from Beginner to Advanced – Volume 2
Published Korean Edition
최신 라즈베리파이(Raspberry Pi)로 시작하는 사물인터넷(IOT)의 모든 것 – 초보에서 고급까지 (상)
최신 라즈베리파이(Raspberry Pi)로 시작하는 사물인터넷(IOT)의 모든 것 – 초보에서 고급까지 (하)
Original Book Contents
10.9 Commands Related to Data Manipulation
10.9.1 "awk" Command
The "awk" command reads the specified file line by line, finds each line that matches the specified pattern, and performs the specified operation on it. Here, the operation means manipulating the fields of a line or performing arithmetic on field values. The name "awk" comes from the surnames of its three developers: Aho, Weinberger, and Kernighan.
Awk is a programming language that combines features of Shell programming, bc, and the C programming language. Like bc, it can perform arithmetic, and on each input line it provides field variables named $1, $2, and $3, much like Shell arguments. It also has print and control statements similar to those of the C language.
[Command Format]
awk program <directory/file>
[Command Overview]
■ Finds each line in the specified file that matches the specified pattern and performs the specified operation on it.
■ User privilege -- Normal user.
[Detail Description]
● How to read "awk" input line
When reading data from a standard input or file, each line is divided into several fields based on whitespace. In this command, you can use a variable that represents a field. $1 represents the first field and $2 represents the second field. $0 represents the entire line.
To use a delimiter other than whitespace, specify an arbitrary character as the field delimiter with the "-F" (field) option. For example, to use ":" (colon) as the delimiter, type the following:
■ awk -F: program data-files
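The field-splitting rules above can be tried with a small sketch like the following. The sample input strings are made up for illustration and do not come from the book.

```shell
# Default splitting: fields are separated by runs of blanks or tabs.
default_split=$(printf 'hello   goodbye again\n' | awk '{print $2}')
echo "$default_split"

# With -F: the colon becomes the field separator.
colon_split=$(printf 'pi:x:1000:/home/pi\n' | awk -F: '{print $1, $4}')
echo "$colon_split"

# $0 always holds the whole, unsplit input line.
whole_line=$(printf 'pi:x:1000:/home/pi\n' | awk -F: '{print $0}')
echo "$whole_line"
```

Here the program is passed directly on the command line in single quotes, so that the Shell does not try to expand $1 or $2 itself.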
● Program
The program describes the operations to perform on the lines and fields that the command reads. A "program" consists of one or more program lines. Each program line is defined as a pair of pattern and action, and the overall format is as follows:
■ pattern {action}
■ pattern {action}
Consider the example below. Using this program, you can search for lines containing the string "rotate" and print them. This has the same effect as using "grep rotate filename" command:
■ /rotate/ {print} → pattern is "rotate", and action is "print".
A simple string pattern is defined by enclosing the string between two "/" characters. You can also use the following more specific patterns:
■ /fish/ line containing the string "fish"
■ $1 ~ /fish/ line where the first field contains the string "fish"
■ $1 !~ /fish/ line where the first field does not contain the string "fish"
In the above, "~" means that the field on the left contains a match for the pattern on the right. The combination "!~" means that the field on the left does not contain a match for the pattern on the right.
The action part is enclosed in curly braces to distinguish it from the pattern. If the action part is omitted, the matching line is printed. A commonly used action is "print", which prints its arguments to standard output. When several actions are included in { }, separate them with semicolons or put each one on its own line:
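The three pattern forms listed above can be compared side by side in a short sketch. The three sample lines are invented for this illustration.

```shell
# Sample data (hypothetical): two whitespace-separated fields per line.
data='fish chips
tuna fish
bread butter'

# /fish/ selects every line that contains "fish" anywhere.
any_field=$(printf '%s\n' "$data" | awk '/fish/ {print}')
echo "$any_field"

# $1 ~ /fish/ selects only lines whose first field contains "fish".
first_match=$(printf '%s\n' "$data" | awk '$1 ~ /fish/ {print}')
echo "$first_match"

# $1 !~ /fish/ selects only lines whose first field does not contain "fish".
first_nomatch=$(printf '%s\n' "$data" | awk '$1 !~ /fish/ {print}')
echo "$first_nomatch"
```

Note that "tuna fish" is selected by /fish/ but not by $1 ~ /fish/, because its match is in the second field.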
■ {print $2} Print the second field
■ {print $4,$2} Print the fourth field, then the second field
■ {print $2,$2+$4} Print the second field, then the sum of the second and fourth fields
■ {s=$2+$4; print s} Sum the second and fourth fields, and print the sum
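As a rough check of the actions listed above, the following sketch runs three of them against a single input line. The line "pens 10 blue 5" is a made-up example chosen so that the second and fourth fields are numbers.

```shell
# Hypothetical input line with numeric second and fourth fields.
line='pens 10 blue 5'

# Print the fourth field, then the second field.
reordered=$(printf '%s\n' "$line" | awk '{print $4,$2}')
echo "$reordered"

# Print the second field, then the sum of the second and fourth fields.
field_and_sum=$(printf '%s\n' "$line" | awk '{print $2,$2+$4}')
echo "$field_and_sum"

# Store the sum in the variable s, then print it (two actions, one semicolon).
sum_only=$(printf '%s\n' "$line" | awk '{s=$2+$4; print s}')
echo "$sum_only"
```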
● Examples of various actions
The following examples show various actions, mostly built on {print $2, $1}. The file created below is the input used for testing.
pi@raspberrypi ~ $ vi in.file
hello goodbye again
111 222
thirty forty
The following example prints the first two input fields in reverse order. The arguments of "print" are separated by a comma, which causes a blank (the current output field separator) to be inserted between the output items. If the comma is omitted, $2 and $1 are printed consecutively with nothing between them.
pi@raspberrypi ~ $ vi awk.prog1
{print $2, $1}
pi@raspberrypi ~ $ awk -f awk.prog1 in.file
goodbye hello
222 111
forty thirty
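The effect of the comma in "print" can be seen in a minimal sketch like this one, which feeds a single line to the same action with and without the comma. The program is given on the command line here rather than in a program file.

```shell
# With a comma, print inserts the output field separator (a blank by default).
with_comma=$(printf 'hello goodbye again\n' | awk '{print $2, $1}')
echo "$with_comma"

# Without a comma, the two fields are concatenated into one string.
without_comma=$(printf 'hello goodbye again\n' | awk '{print $2 $1}')
echo "$without_comma"
```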
In the following example, only one line of the input file matches the "/hello/" pattern, so the "print" action is executed only for that line.
pi@raspberrypi ~ $ vi awk.prog2
/hello/ {print $2, $1}
pi@raspberrypi ~ $ awk -f awk.prog2 in.file
goodbye hello
In the following example, the extra argument of the second "print" statement is a string enclosed in quotation marks. The string is printed as-is after the field values.
pi@raspberrypi ~ $ vi awk.prog3
/hello/ {print $2, $1}
/thirty/ {print $1, $2, "and more"}
pi@raspberrypi ~ $ awk -f awk.prog3 in.file
goodbye hello
thirty forty and more
In the following example, the "||" operator executes the specified operation if at least one of the two regular expressions matches, whereas the "&&" operator executes it only when both regular expressions match. The "!" operator is written before a regular expression and executes the specified operation if the regular expression does not match.
pi@raspberrypi ~ $ vi awk.prog4
/hello/||/111/ {print "hit", $1, $2}
pi@raspberrypi ~ $ awk -f awk.prog4 in.file
hit hello goodbye
hit 111 222
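The example above only exercises "||". The sketch below also tries "&&" and "!" on three sample lines modeled on in.file. The labels "both" and "not" are arbitrary strings chosen for this illustration.

```shell
# Three sample lines modeled on the test input.
data='hello goodbye again
111 222
thirty forty'

# || : the action runs when at least one pattern matches.
either=$(printf '%s\n' "$data" | awk '/hello/ || /111/ {print "hit", $1, $2}')
echo "$either"

# && : the action runs only when both patterns match the same line.
both=$(printf '%s\n' "$data" | awk '/hello/ && /again/ {print "both", $1}')
echo "$both"

# ! : the action runs for lines that do not match the pattern.
negated=$(printf '%s\n' "$data" | awk '!/hello/ {print "not", $1}')
echo "$negated"
```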
In the following example, only the first input line matches the pattern "/hello/", so only that line is processed. The action consists of three "print" statements, each written on its own line.
pi@raspberrypi ~ $ vi awk.prog6
/hello/ {
print $2
print "another"
print $1
}
pi@raspberrypi ~ $ awk -f awk.prog6 in.file
goodbye
another
hello
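Statements inside one action can be separated either by newlines or by semicolons. The following sketch, which passes the program on the command line instead of through a program file, suggests that both spellings behave the same.

```shell
# Statements separated by semicolons on a single line.
one_line=$(printf 'hello goodbye again\n' |
    awk '/hello/ {print $2; print "another"; print $1}')
echo "$one_line"

# The same statements, one per line.
multi_line=$(printf 'hello goodbye again\n' | awk '/hello/ {
    print $2
    print "another"
    print $1
}')
echo "$multi_line"
```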
● Arithmetic operations in "awk" actions
The difference between arithmetic in "awk" and arithmetic in bc is that "awk" can use patterns to select only some lines of the input file. The built-in function "length" returns the length of an input field treated as a string. A numeric variable can be assigned the value of a field treated as a number, and a string that cannot be converted to a number has the value "0".
In the following example, the strings "goodbye" and "forty" cannot be converted to numbers, so they count as "0". The string "222", however, is converted to a number correctly. You can also assign a value to a variable explicitly, in the form "s=0".
pi@raspberrypi ~ $ vi awk.prog7
{
s += $2
print $2, "length=" length($2), "s=" s
}
pi@raspberrypi ~ $ awk -f awk.prog7 in.file
goodbye length=7 s=0
222 length=3 s=222
forty length=5 s=222
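The string-to-number conversion described above can be isolated in a small sketch. Adding 0 to a field forces it to be treated as a number, while "length" treats the same field as a string. The label "number=" is an arbitrary string used for this illustration.

```shell
# For each line, print the field, its numeric value, and its string length.
# The parentheses make the addition happen before the string concatenation.
converted=$(printf 'goodbye\n222\n' |
    awk '{print $1, "number=" ($1 + 0), "length=" length($1)}')
echo "$converted"
```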
● "awk" command and variables
You can use variables in "awk". Variables do not need to be declared or initialized before use. "awk" initializes variables itself, and you can store strings or numbers in them as needed. The following are valid examples of "awk" variables.
s, S, SS, S1, qwerty[42]
Because the type of a variable is converted automatically, as the following example shows, variables are easy to use.
pi@raspberrypi ~ $ vi awk.prog8
/hello/ {
SSS=34
print "SSS is", SSS
SSS="hello"
print "SSS is", SSS
}
pi@raspberrypi ~ $ awk -f awk.prog8 in.file
SSS is 34
SSS is hello
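The following sketch probes how "awk" treats variables in a few situations: a number, a string, a variable that was never assigned, and a bare word on the right-hand side of an assignment. The variable names v, w, and never_set are hypothetical and chosen only for this illustration.

```shell
result=$(printf 'hello\n' | awk '{
    v = 34;       print v + 1     # v holds a number, so arithmetic works
    v = "hello";  print v         # the same variable now holds a string
    print "[" never_set "]", never_set + 0   # unassigned: empty string and 0
    w = hello;    print "[" w "]" # bare word hello is an unassigned variable
}')
echo "$result"
```

The last line suggests why string constants need quotation marks: without them, "awk" reads the word as a variable name, and an unassigned variable is empty.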
[Main Option]
-f file | Program text is read from file instead of from the command line. Multiple -f options are allowed. file: name of the file that contains the program instructions
-F value (field) | Sets the field separator, FS, to value. If you specify a character, that character becomes the field separator
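The two options can be combined, as in the following sketch. It assumes that "mktemp -d" is available to create a scratch directory. The file names first.awk, second.awk, and data.txt are hypothetical.

```shell
# Create a scratch directory with two program files and one data file.
workdir=$(mktemp -d)
printf '/left/ {print "first:", $1}\n' > "$workdir/first.awk"
printf '/left/ {print "second:", $2}\n' > "$workdir/second.awk"
printf 'left:right\nup:down\n' > "$workdir/data.txt"

# -F: splits fields on colons; the two -f files are read as one program.
combined=$(awk -F: -f "$workdir/first.awk" -f "$workdir/second.awk" "$workdir/data.txt")
echo "$combined"

# Remove the scratch directory.
rm -rf "$workdir"
```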
[Used Example]
The file "sales" has six columns of information: the item name, the item selling price, and the item sales quantity for each of the four quarters. Here we use the "awk" command to calculate the total sales quantity and the total sales amount for each item and append them to each line of the output. To do this, create a file named "addup" as follows.
pi@raspberrypi ~ $ vi sales
carts 29.99 45 13 55 22
corks 0.02 30 20 25 93
doors 49.99 40 15 20 25
geese 15.00 2 8 1 128
nighties 50.00 11 23 15 82
pi@raspberrypi ~ $ vi addup
{total=$3+$4+$5+$6;print $0, total, total*$2}
pi@raspberrypi ~ $ awk -f addup sales
carts 29.99 45 13 55 22 135 4048.65
corks 0.02 30 20 25 93 168 3.36
doors 49.99 40 15 20 25 100 4999
geese 15.00 2 8 1 128 139 2085
nighties 50.00 11 23 15 82 131 6550
The above action consists of two statements separated by ";". The first adds the four quarterly sales quantities and stores the sum in the "total" variable. The second prints the original line ($0), followed by the total sales quantity, followed by the value of total*$2, which means "total sales quantity x second column value (selling price)".
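As a variation on the calculation above, the sketch below prints only the item name, the total quantity, and the amount with two decimal places. It uses awk's "printf" statement, which is not covered in this section and is introduced here as an assumption about how one might format the amounts. Only two rows of the sales data are used.

```shell
# Two rows of the sales data, piped in instead of read from a file.
report=$(printf 'carts 29.99 45 13 55 22\ngeese 15.00 2 8 1 128\n' |
    awk '{total=$3+$4+$5+$6; printf "%s %d %.2f\n", $1, total, total*$2}')
echo "$report"
```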