Wednesday, March 24, 2010

Awk - count and sum on multiple columns


I have got the idea for this post from one of my reader "manimaran" who have posted a comment on one of my post related to group by using awk. Thanks "manimaran".

Input file:
      
$ cat details.txt
SA 0 2345
AF 0 123
SA 1 89
OC 0 890
EU 1 24
AF 1 90
NA 0 5678
AF 1 345
OC 1 90
OC 0 23
SA 1 1234
EU 0 90
AF 0 12
SA 0 909


Output required :

- Group by continent
- For each continent count the followings:

Count of 0
Count of 1
Sum of last fields where 2nd field is 0
Sum of last fields where 2nd field is 1

i.e. required output:

Continent count0 count1 sum0 sum1
NA 1 0 5678 0
OC 2 1 913 90
AF 2 2 135 435
SA 2 2 3254 1323
EU 1 1 90 24

The awk solution using awk associative array and awk if-else

$ awk '
BEGIN {print "Continent count0 count1 sum0 sum1"} {
n[$1]++;
C0_[$1] += ($2 == "0" ? 1 : 0); S0_[$1] += ($2 == "0" ? $NF : 0)
C1_[$1] += ($2 == "1" ? 1 : 0); S1_[$1] += ($2 == "1" ? $NF : 0)
}
END {
for (i in n) {
print i,C0_[i],C1_[i],S0_[i],S1_[i]
}
}' details.txt

Output:

Continent count0 count1 sum0 sum1
NA 1 0 5678 0
OC 2 1 913 90
AF 2 2 135 435
SA 2 2 3254 1323
EU 1 1 90 24

And if you need the output of the above awk script in a well formatted way, here is the one:

$ awk '
BEGIN {
FORMAT="%-10s%-8s%-8s%-8s%s\n"
{printf FORMAT,"Continent","count0","count1","sum0","sum1"}
}
{ n[$1]++
C0_[$1] += ($2 == "0" ? 1 : 0)
S0_[$1] += ($2 == "0" ? $NF : 0)
C1_[$1] += ($2 == "1" ? 1 : 0)
S1_[$1] += ($2 == "1" ? $NF : 0)
}
END {
for (i in n) {
printf FORMAT,i,C0_[i],C1_[i],S0_[i],S1_[i]
}
}' details.txt

Output:

Continent count0 count1 sum0 sum1
NA 1 0 5678 0
OC 2 1 913 90
AF 2 2 135 435
SA 2 2 3254 1323
EU 1 1 90 24

This post explains how we can align text using awk printf function.

Related posts:

- Group by clause functionality in awk
- Sum of and group by using awk in bash
- Grouping similar files using awk in bash

© Jadu Saikia http://unstableme.blogspot.com