Sunday, September 28, 2008

Sum of and group by using awk


$ cat cont_details.txt
continent|mval|cval|kval
SA|2345|10|2.3
AF|123|12|4.5
SA|89|12.67
OC|890|10|2.3
EU|24|45|2.4
AF|90|10|10
NA|5678|12|89
AF|345|12|3.5
OC|90|78|5.6
OC|23|12|4.5
SA|1234|12|6.7
EU|90|12|10
AF|12|12|34
SA|909|12|56

Output required:

Select continent, sum(mval),sum(cval),sum(kval) group by continent

i.e. required output:

continent|mval|cval|kval
NA|5678|12|89
OC|1003|100|12.4
AF|570|46|52
SA|4577|46.67|77.67
EU|114|57|12.4

Awk solution:

$ awk -F"|" '
NR==1 {print}
NR!=1 {OFS="|";a[$1]+=$2;b[$1]+=$3;c[$1]+=$NF}
END{for (i in a){print i,a[i],b[i],c[i]}}
' cont_details.txt

Output:

continent|mval|cval|kval
NA|5678|12|89
OC|1003|100|12.4
AF|570|46|52
SA|4577|46.67|77.67
EU|114|57|12.4

Related post:
Implement group by clause functionality using awk

15 comments:

Ruturaj Vartak said...

that is great! thanks a lot

Prashant Joshi said...

Nice one

yoosfg said...

Excellent.. Much needed help indeed..Thanks

awkseeker said...

I used two of your tricks and perfect work for me. I just wonder how does your script work, because you refer to $1 and then your add with $2, it is a big question in my head. Could you please explain more clear for me. Thank you very much for your contribution here. Hope to hear from you, my professor.

Jadu Saikia said...

@awkseeker,
Thanks for reading this blog.
I am putting an example below that will help you to understand awk associative-array. Please let me know if you still have any doubt on this.

$ cat file1.txt
Alex,147,1
Brian,056,7
Nanda,559,2
Lisa,221,0
Nanda,268,2
Alex,200,5
Alex,408,8


i)
#a[$1] registers the first field of the lines seen in the associative-array "a" (arrays are always associative in Awk)

$ awk -F "," '{a[$1]} END {for (i in a) print i}' file.txt

Nanda
Lisa
Alex
Brian

ii)
#Count of first field, sum of 2nd field, sum of 3rd field group by each first field

$ awk -F "," '{a[$1]++; b[$1]+=$2; c[$1]+=$3 } END {for (i in a) print i,a[i],b[i],c[i]}' file1.txt

Nanda 2 827 4
Lisa 1 221 0
Alex 3 755 14
Brian 1 56 7

iii)
#Sum of 2nd field, sum of 3rd field group by each first field

$ awk -F "," '{a[$1]+=$2; b[$1]+=$3 } END {for (i in a) print i,a[i],b[i]}' file1.txt

Nanda 827 4
Lisa 221 0
Alex 755 14
Brian 56 7

iv)
#Sum of 2nd field, sum of 3rd field group by each first field, same as ii)

$ awk -F "," '{a[$1]+=$2; b[$1]+=$3 } END {for (i in b) print i,a[i],b[i]}' file1.txt

Nanda 827 4
Lisa 221 0
Alex 755 14
Brian 56 7

Hope this helps.

awkseeker said...

Thank Juda, your blog is so valuable and save huge time to my work. I use you trick to do tabular text file and much more efficiency compare to traditional sql.

I am new to awk and have some basic of turbo c in previous time. however, your good explanation make me more clear and I would bother you if some doubts.

Best regards,

Jadu Saikia said...

@Awkseeker : thanks so much for your comment. Its really inspiring. Feel free to ask any doubt on Unix one liner, awk, sed and I will definitely try to help.

awkseeker said...
This comment has been removed by the author.
awkseeker said...
This comment has been removed by the author.
cef_luxan said...

nice tutor,
How about grouping with more than a column, i.e:
Data:
a, b, c, d
Require:
select a, b, sum(c), sum(d) group by a,b;

Thnk's

cef_luxan said...

nice tutor,
How about grouping with more than a column, i.e:
Data:
a, b, c, d
Require:
select a, b, sum(c), sum(d) group by a,b;

Thnk's

Jadu Saikia said...

@cef_luxan, something like this?

$ cat file.txt
US|A|23
IN|A|12
US|B|12
LK|C|20
US|B|12

$ awk -F"|" '
{OFS="|";a[$1]+=$NF}
END{for (i in a){print i,a[i]}}
' file.txt

LK|20
IN|12
US|47

$ awk -F"|" '
{OFS="|";a[$1"|"$2]+=$NF}
END{for (i in a){print i,a[i]}}
' file.txt

IN|A|12
LK|C|20
US|A|23
US|B|24

cef_luxan said...

Hi Mr. Jadu,
Thank you for your excellent answer

awkseeker said...

Hi Juda, love to bother you again. I tried some SOAP and got the result in big XML. I want to use awk to archive the needed parameters only. Could you help me.

Jadu Saikia said...

@awkseeker, sure, could you please put your input and expected output. Thanks.

© Jadu Saikia http://unstableme.blogspot.com