One good use of awk is splitting a file into sub-files based on different conditions.
Let's look at some examples:
Example 1:
Input file log1.txt:
- The line starting with "H" is the main header line.
- A line starting with "h" and the subsequent lines starting with "s" (up to the next "h" line) form one entry/section.
$ cat log1.txt
H,555,etho0
h,1,3
s,1233,456456,1212
s,4251,452456,7215
s,6283,851456,1219
h,9,2
s,2233,156456,1912
s,9233,256456,8212
h,2,4
s,4233,456456,1212
s,7251,252456,7215
s,1288,851456,9219
s,9183,851456,6219
Required:
- Split the above file into sub-files, one per entry (an entry being an "h" line and the "s" lines that follow it, up to the next "h" line).
- Each sub-file should start with the main header line (the "H" line).
- The required output is 3 sub-files with the following contents and file naming convention (second field of the "H" line, second field of the "h" line, running entry number).
$ cat 555.1.1.log
H,555,etho0
h,1,3
s,1233,456456,1212
s,4251,452456,7215
s,6283,851456,1219
$ cat 555.9.2.log
H,555,etho0
h,9,2
s,2233,156456,1912
s,9233,256456,8212
$ cat 555.2.3.log
H,555,etho0
h,2,4
s,4233,456456,1212
s,7251,252456,7215
s,1288,851456,9219
s,9183,851456,6219
The awk program:
$ awk -F "," '
$1=="H" {mainH=$0;id=$2;next}
/^h/{
# close the previous sub-file (if any) before hid changes to the new entry
if (f) close(id"."hid"."f".log")
hid=$2
f++
print mainH > (id"."hid"."f".log")
}
{print $0 > (id"."hid"."f".log")}
' log1.txt
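The same idea can be sketched with the output file name held in a single variable, so that the name passed to close() is always exactly the name that was written to. This is only an illustrative variant: demo1.txt is a hypothetical, shortened stand-in for log1.txt, and the variable name out is an assumption, not part of the original program.

```shell
# Hypothetical, shortened stand-in for log1.txt
printf '%s\n' 'H,555,etho0' 'h,1,3' 's,1233,456456,1212' \
              'h,9,2' 's,2233,156456,1912' > demo1.txt

awk -F "," '
$1 == "H" { mainH = $0; id = $2; next }   # remember the main header line
$1 == "h" {                               # a new entry starts here
    if (out != "") close(out)             # finish the previous sub-file
    out = id "." $2 "." (++f) ".log"      # e.g. 555.1.1.log
    print mainH > out                     # each sub-file starts with the H line
}
out != "" { print > out }                 # h and s lines go to the current sub-file
' demo1.txt
```

With this input the sketch should leave two files, 555.1.1.log and 555.9.2.log, each starting with the "H" line.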
Example 2:
Input file:
$ cat log.txt
1252468812,yahoo,3.5
1252468812,hotmail,2.4
1252468819,yahoo,1.2
1252468812,msn,8.9
1252468923,gmail,12
1252468819,live,3.4
1252468929,yahoo,9.0
1252468929,msn,1.2
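Required:
a) Split the above file based on the first field (i.e. lines with the same first field should go to the same file).
The awk one-liner (the output files are deliberately left open: the same first field recurs on non-adjacent lines, and reopening a closed file with ">" would truncate it):
$ awk -F "," '{print > ($1".txt")}' log.txt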
Output: The above file is split into the following sub-files.
$ cat 1252468812.txt
1252468812,yahoo,3.5
1252468812,hotmail,2.4
1252468812,msn,8.9
$ cat 1252468819.txt
1252468819,yahoo,1.2
1252468819,live,3.4
$ cat 1252468923.txt
1252468923,gmail,12
$ cat 1252468929.txt
1252468929,yahoo,9.0
1252468929,msn,1.2
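If the input has a very large number of distinct first fields, keeping every output file open can hit the limit on simultaneously open files in some awk implementations. A possible workaround, sketched here under assumptions, is to append with ">>" and close the file after each line. The names demo2.txt and the by_key_ prefix are hypothetical and chosen so that the files shown above are not touched.

```shell
# Hypothetical, shortened stand-in for log.txt
printf '%s\n' '1252468812,yahoo,3.5' '1252468819,yahoo,1.2' \
              '1252468812,msn,8.9' > demo2.txt

# ">>" appends, so leftovers from an earlier run must be removed first
rm -f by_key_*.txt

awk -F "," '{
    f = "by_key_" $1 ".txt"
    print >> f    # append: reopening the file later does not truncate it
    close(f)      # at most one output file is open at any time
}' demo2.txt
```

The trade-off is speed: opening and closing a file for every input line is slower than keeping the files open, so this form is mainly worth it when the simpler one-liner fails.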
b) Send every 3 lines of the above file to a separate sub-file.
The awk code:
$ awk '{print >("log_" int((NR+2)/3))}' log.txt
Output: The sub-files generated.
$ cat log_1
1252468812,yahoo,3.5
1252468812,hotmail,2.4
1252468819,yahoo,1.2
$ cat log_2
1252468812,msn,8.9
1252468923,gmail,12
1252468819,live,3.4
$ cat log_3
1252468929,yahoo,9.0
1252468929,msn,1.2
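The expression int((NR+2)/3) is the n=3 case of int((NR+n-1)/n), which maps lines 1..n to 1, lines n+1..2n to 2, and so on. A minimal sketch with the chunk size passed in as a variable follows; demo3.txt, the part_ prefix and the variable names n, out and prev are assumptions made for illustration.

```shell
# Hypothetical five-line input
printf '%s\n' a b c d e > demo3.txt

awk -v n=2 '{
    out = "part_" int((NR + n - 1) / n)   # chunk number for this line
    if (out != prev) {                    # a new chunk starts
        if (prev != "") close(prev)       # the previous chunk is complete
        prev = out
    }
    print > out
}' demo3.txt
```

With n=2 the five lines should end up as part_1 (a, b), part_2 (c, d) and part_3 (e). Closing each finished chunk is safe here because a chunk is never written to again.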