Monday, November 24, 2008

Print range of columns using awk - exclude range


This is most useful when a file has a large number of columns and you want to print only a range of them, or exclude a range of columns and print the rest.

Let's look at a small example.

Input file:

$ cat details.txt
Name Age Sex Add DOB CARD
AXU 12 M IN 12-Jul Y
ANI 13 F IN 10-Jan N
JCK 16 M JP 03-Frb Y
LON 12 M IN 12-Oct Y

A) Print a range of columns using awk:

e.g. Print columns 2 through 4 using awk

$ awk -v f=2 -v t=4 '{ for (i=f; i<=t;i++) printf("%s%s", $i,(i==t) ? "\n" : OFS) }' details.txt

Age Sex Add
12 M IN
13 F IN
16 M JP
12 M IN
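
The one-liner above assumes every line has at least t fields; on a shorter line it prints empty strings for the missing columns. Here is a minimal sketch of a more defensive wrapper. The function name print_range is hypothetical (it is not part of the original one-liner), and clamping the range to NF is an assumption about the desired behaviour.

```shell
# print_range FROM TO [FILE...]
# Hypothetical helper: print fields FROM..TO of every input line,
# clamping TO to the number of fields actually present (NF).
print_range() {
    from=$1
    to=$2
    shift 2
    awk -v f="$from" -v t="$to" '{
        last = (t + 0 > NF) ? NF : t + 0    # never run past the last field
        out = ""
        for (i = f + 0; i <= last; i++)
            out = out (i > f ? OFS : "") $i # join the selected fields with OFS
        print out
    }' "$@"
}
```

It can be called as: print_range 2 4 details.txt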

B) Exclude column range from printing in awk:

e.g. Exclude column range 2-4 and print the rest of the columns from the above file details.txt

$ awk -v f=2 -v t=4 '
{ for (i=1; i<=NF;i++)
if( i>=f && i<=t) continue;
else
printf("%s%s", $i,(i!=NF) ? OFS : ORS) }' details.txt

Name DOB CARD
AXU 12-Jul Y
ANI 10-Jan N
JCK 03-Frb Y
LON 12-Oct Y
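
One caveat with the command above: if the excluded range contains the last column, the ORS branch is never reached and the output lines run together. A minimal sketch that builds each line first and then prints it avoids this. The function name exclude_range is hypothetical, not something from the original command.

```shell
# exclude_range FROM TO [FILE...]
# Hypothetical helper: print every field except fields FROM..TO.
exclude_range() {
    from=$1
    to=$2
    shift 2
    awk -v f="$from" -v t="$to" '{
        out = ""
        sep = ""
        for (i = 1; i <= NF; i++) {
            if (i >= f && i <= t) continue  # skip the excluded columns
            out = out sep $i
            sep = OFS                       # separator only between kept fields
        }
        print out                           # print always ends the line with ORS
    }' "$@"
}
```

For example, exclude_range 5 6 details.txt drops the last two columns and still ends every line properly.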

Wednesday, November 19, 2008

Awk FNR variable usage example

I have already discussed the awk FNR variable in many of my previous posts. Here is one more example of the use of NR==FNR in awk.

Description:

NR is the number of the current record (line) counted across all input files.
FNR is the number of the current record within the current file.

So, while the first file is being processed, NR and FNR are equal; from the first line of the second file onward, FNR restarts at 1 while NR keeps counting.

In the usual NR==FNR idiom, when NR==FNR (i.e. while processing the first file) an associative array is built up, storing the first field in an element that is also indexed by the first field. When NR!=FNR (i.e. while processing the second and subsequent files) the array is checked for an element indexed by the first field and, if one exists, the default action (printing the whole line) is carried out. The example in this post uses the same two-pass structure, but accumulates a sum on the first pass instead of building an array.
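
The array-based idiom described above can be sketched as follows. The function name match_first_field and the array name seen are hypothetical, chosen only for illustration.

```shell
# match_first_field KEYS_FILE DATA_FILE
# Hypothetical helper: print the lines of DATA_FILE whose first field
# also occurs as a first field in KEYS_FILE.
match_first_field() {
    awk '
        NR == FNR { seen[$1] = $1; next }   # first file: remember each first field
        $1 in seen                          # later files: default action prints the line
    ' "$1" "$2"
}
```

The next statement matters here: without it, the lines of the first file would also fall through to the second rule and be printed.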


Input file:

$ cat file.txt
10 AD
20 NA
30 PS
50 KR

Required:

Add a third column to the above file holding the sum of the first field over all lines,
i.e. required output:

10 AD 110
20 NA 110
30 PS 110
50 KR 110

awk solution:

$ awk 'NR==FNR{sum+=$1;next}$0=$0 FS sum' file.txt file.txt

10 AD 110
20 NA 110
30 PS 110
50 KR 110
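
How the one-liner works: the file is named twice, so awk reads it twice. On the first pass (NR==FNR) only the sum is accumulated. On the second pass the sum is appended to $0, and because that assignment yields a non-empty string, the pattern is true and the line is printed. Here is a more explicit sketch of the same two-pass idea, wrapped in a hypothetical function named append_sum:

```shell
# append_sum FILE
# Hypothetical helper: append the sum of column 1 to every line of FILE.
append_sum() {
    awk '
        NR == FNR { sum += $1; next }   # pass 1: accumulate the first field
        { print $0, sum }               # pass 2: append the total as a new column
    ' "$1" "$1"
}
```

Because the file is read twice, this needs a regular file; it will not work on data arriving through a pipe.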

Related posts for NR==FNR in awk:

- Match words between two file using awk
- Perform join using awk
- Update a file based on another file using sed
- Delete lines based on another file using awk
- Update file based on another file using awk

Monday, November 17, 2008

Concatenate lines using awk in bash

Input file:

$ cat rft01.txt
data set 01
unid=ef023; pid=34
data set 03
data set 09
unid=ef028; pid=36
data set 02
unid=ef021; pid=54

Required output:
Concatenate the lines of the above file so that the output looks like this:

data set 01 unid=ef023; pid=34
data set 03
data set 09 unid=ef028; pid=36
data set 02 unid=ef021; pid=54

Awk solution:

$ awk 'END{print RS}$0=(/^data set/?NR==1?_:RS:FS)$0' ORS= rft01.txt

data set 01 unid=ef023; pid=34
data set 03
data set 09 unid=ef028; pid=36
data set 02 unid=ef021; pid=54
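
The one-liner is terse. It sets ORS to the empty string and prefixes each line with a newline (RS) if it starts a new "data set" line, with nothing (the unset variable _) if it is the very first line, or with a space (FS) if it is a continuation line. The END block prints the final newline. A more readable sketch of the same logic follows; the function name join_continuations is hypothetical.

```shell
# join_continuations [FILE...]
# Hypothetical helper: append every line that does not start with
# "data set" to the preceding "data set" line.
join_continuations() {
    awk '
        /^data set/ {
            if (NR > 1) printf "\n"     # finish the previous output line
            printf "%s", $0             # start a new output line
            next
        }
        { printf " %s", $0 }            # continuation: append after a space
        END { if (NR > 0) printf "\n" } # terminate the last output line
    ' "$@"
}
```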

And if you want the output like this (dropping "data set" lines that have no continuation line):

data set 01 unid=ef023; pid=34
data set 09 unid=ef028; pid=36
data set 02 unid=ef021; pid=54

The awk solution would be:

$ awk '/^data set/{s=$0;next}{print s " "$0}' rft01.txt

Related post:

- Merging lines using awk
- Merge previous line using sed

Wednesday, November 12, 2008

List empty directories using find in bash

The Linux find command provides a test called -empty, with which we can list empty regular files or empty directories.

e.g.

To list all the empty directories

$ find . -type d -empty

Output:

./bdb/prac
./sim/old/data
./prac/testdir
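
The same test works for regular files by switching -type d to -type f. A small sketch with two hypothetical wrapper functions (list_empty_dirs and list_empty_files are names made up for this example). Note that -empty is not part of POSIX find; it is assumed here that GNU or BSD find is in use.

```shell
# Hypothetical helpers around find's -empty test (GNU/BSD find assumed).
# Both take an optional starting directory, defaulting to the current one.
list_empty_dirs() {
    find "${1:-.}" -type d -empty
}

list_empty_files() {
    find "${1:-.}" -type f -empty
}
```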

Related post:

- find file names with only digits, no text
- Use logical operator with linux find command
- exclude directory from find command

Friday, November 7, 2008

Print individual records using awk array - bash

Office lunch time. JSingh, Vis and KKR were playing a game and they wanted me to keep their scores.
I made a rough text file, which looked like this after 2 rounds were complete.

$ cat officegame.txt
Name|Round1|Round2
JSingh|0|20
Vis|50|0
KKR|20|20
JSingh|10|40
Vis|50|20
KKR|40|10
JSingh|40|60
Vis|30|20
KKR|90|20
JSingh|0|60
Vis|20|20
KKR|50|50

After 2 rounds, they asked me for their individual total scores in each round. This is what I did:


$ awk -F"|" '
BEGIN {OFS="|"}
NR==1 {print; next}
{a[$1]+=$2; b[$1]+=$3}
END {for (i in a) print i, a[i], b[i]}
' officegame.txt


Output:
Name|Round1|Round2
Vis|150|60
JSingh|50|180
KKR|200|100
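
The order in which for (i in a) visits the array elements is unspecified in awk, which is why the names above do not come out in input order. Here is a minimal sketch that keeps the players in order of first appearance. The function name score_totals and the extra order array are additions for illustration, not part of the original command.

```shell
# score_totals [FILE...]
# Hypothetical helper: per-player totals, printed in order of first appearance.
score_totals() {
    awk '
        BEGIN { FS = OFS = "|" }
        NR == 1 { print; next }                 # pass the header through
        !($1 in r1) { order[++n] = $1 }         # remember first appearance
        { r1[$1] += $2; r2[$1] += $3 }          # accumulate per-player totals
        END {
            for (k = 1; k <= n; k++)
                print order[k], r1[order[k]], r2[order[k]]
        }
    ' "$@"
}
```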

I sent them the output. JSingh then asked me for a breakdown of each individual's scores in each of the rounds.

I had to write this to meet his requirement:


$ awk -F "|" 'NR > 1 {
    if (n[$1] == $1) {
        # player already seen: append this score to the breakdown
        r1[$1] = r1[$1] "+" $2
        r2[$1] = r2[$1] "+" $3
    } else {
        # first time this player is seen: start the breakdown
        n[$1] = $1
        r1[$1] = $2
        r2[$1] = $3
    }
}

END {
    for (i in n) {
        printf "%s [Round1={%s}, Round2={%s}]\n", n[i], r1[i], r2[i]
    }
}' officegame.txt


Output:

Vis [Round1={50+50+30+20}, Round2={0+20+20+20}]
JSingh [Round1={0+10+40+0}, Round2={20+40+60+60}]
KKR [Round1={20+40+90+50}, Round2={20+10+20+50}]

Related post:
- sum of and group by using awk
- group-by clause functionality in awk
- awk associative array examples

Sunday, November 2, 2008

Reverse order of few lines awk - bash

Input file:

$ cat details.txt
line1
line2
line3
line4
line5
line6
line7
line8
line9
line10

Required: Reverse the order of the lines from line5 to line8. i.e. required output:

line1
line2
line3
line4
line8
line7
line6
line5
line9
line10

The awk solution:

$ awk -v from=5 -v to=8 'NR==from {
    s=$0
    for(i=from+1;i<to;i++){
        getline;s=$0"\n"s
    }
    getline;print;print s
    next
}1' details.txt

The trailing 1 in the awk one-liner above is shorthand for {print}.
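
The getline calls in that one-liner do not check their return value, so as far as I can tell, if the file ends before line "to" is reached, $0 is left unchanged and the last line is repeated. Here is a sketch of an alternative that buffers the range in an array instead of using getline. The function name reverse_range and the array name buf are hypothetical.

```shell
# reverse_range FROM TO [FILE...]
# Hypothetical helper: print the input with lines FROM..TO in reverse order.
reverse_range() {
    from=$1
    to=$2
    shift 2
    awk -v from="$from" -v to="$to" '
        NR < from || NR > to { print; next }        # outside the range: print as-is
        { buf[++n] = $0 }                           # inside the range: buffer the line
        NR == to { while (n > 0) print buf[n--] }   # range complete: flush in reverse
        END      { while (n > 0) print buf[n--] }   # input ended inside the range
    ' "$@"
}
```

It can be called as: reverse_range 5 8 details.txt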

To reverse the order of the lines of a whole file, there is the tac command, which prints files in reverse.

$ tac details.txt

The same can be achieved using sed and awk as mentioned below:

$ sed -n '1!G;h;$p' details.txt
$ awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' details.txt

© Jadu Saikia http://unstableme.blogspot.com