Tuesday, September 30, 2008

Calculate percentage using awk in bash


$ cat myde.txt
OC:4343
AF:5600
AS:1233
NA:6546
SA:3243

Output required: print each number followed by its percentage of the total of all the numbers.
i.e. required output:

4343(20.72 %)
5600(26.71 %)
1233(5.88 %)
6546(31.22 %)
3243(15.47 %)

Awk solution:


$ awk '{a[NR] = $2; sum += $2}   # remember each value and keep a running total
END {
  for (i = 1; i <= NR; i++)
    printf "%s(%.2f %%)\n", a[i], (100 * a[i]) / sum   # %% prints a literal percent sign
}
' FS=":" myde.txt
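The one-liner above keeps every value in an array until the END block. A minimal sketch of an alternative, assuming the input is a regular file that can be read twice: the first pass computes the total, the second prints each line. The function name pct_of_total is hypothetical, and unlike the original it also keeps the label from the first field.

```shell
# pct_of_total FILE   (hypothetical helper name)
# FILE holds "label:value" lines and is passed to awk twice.
pct_of_total() {
  awk -F: '
    NR == FNR { sum += $2; next }   # pass 1: accumulate the total
    sum > 0   { printf "%s:%s(%.2f %%)\n", $1, $2, 100 * $2 / sum }   # pass 2: print
  ' "$1" "$1"
}
```

Usage: pct_of_total myde.txt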

Sunday, September 28, 2008

Sum of and group by using awk

$ cat cont_details.txt
continent|mval|cval|kval
SA|2345|10|2.3
AF|123|12|4.5
SA|89|12.67
OC|890|10|2.3
EU|24|45|2.4
AF|90|10|10
NA|5678|12|89
AF|345|12|3.5
OC|90|78|5.6
OC|23|12|4.5
SA|1234|12|6.7
EU|90|12|10
AF|12|12|34
SA|909|12|56

Output required:

Select continent, sum(mval),sum(cval),sum(kval) group by continent

i.e. required output:

continent|mval|cval|kval
NA|5678|12|89
OC|1003|100|12.4
AF|570|46|52
SA|4577|46.67|77.67
EU|114|57|12.4

Awk solution:

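$ awk -F"|" '
BEGIN {OFS="|"}
NR==1 {print; next}                          # pass the header line through
{a[$1]+=$2; b[$1]+=$3; c[$1]+=$NF}           # $NF is the last field, so the 3-field line SA|89|12.67 adds 12.67 to both cval and kval
END {for (i in a) print i, a[i], b[i], c[i]}
' cont_details.txt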

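Output (the order of the groups may differ, because awk's "for (i in a)" visits keys in an unspecified order):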

continent|mval|cval|kval
NA|5678|12|89
OC|1003|100|12.4
AF|570|46|52
SA|4577|46.67|77.67
EU|114|57|12.4
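If a stable order matters, one option is to sort the grouped rows. This is only a sketch: the function name group_sum_sorted is hypothetical, it assumes the first line is the header, and it keeps the $NF behaviour of the solution above.

```shell
# group_sum_sorted FILE   (hypothetical helper name)
# Prints the header, then one summed row per group, sorted by group name.
group_sum_sorted() {
  head -n 1 "$1"                               # header line, unchanged
  tail -n +2 "$1" |
    awk -F'|' -v OFS='|' '
      { m[$1] += $2; c[$1] += $3; k[$1] += $NF }
      END { for (g in m) print g, m[g], c[g], k[g] }
    ' | LC_ALL=C sort
}
```

Usage: group_sum_sorted cont_details.txt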

Related post:
Implement group by clause functionality using awk

Saturday, September 27, 2008

Exclude directory from bash find command

Find command to search all the *.sh files starting from current directory.

$ find . -name "*.sh" -print
./dir1/dir11/help.sh
./dir1/sol.sh
./dir2/dir22/a.sh
./dir2/g.sh
./dir3/dert.sh
./dir3/tty.sh


Now, to skip the directory './dir2' and all files and directories under it, use the -prune option of find.

$ find . -path './dir2' -prune -o -name "*.sh" -print
./dir1/dir11/help.sh
./dir1/sol.sh
./dir3/dert.sh
./dir3/tty.sh
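The same pattern extends to more than one directory by grouping the -path tests with escaped parentheses. A small sketch, using the directory names from the listing above; the function name is hypothetical. The explicit -print matters: without it, find would also print the pruned directory names themselves.

```shell
# find_sh_skip_dir2_dir3 ROOT   (hypothetical helper name)
# Lists *.sh files under ROOT, skipping ROOT/dir2 and ROOT/dir3 entirely.
find_sh_skip_dir2_dir3() {
  ( cd "$1" &&
    find . \( -path ./dir2 -o -path ./dir3 \) -prune -o -name '*.sh' -print )
}
```

Usage: find_sh_skip_dir2_dir3 .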

Thursday, September 25, 2008

Bash desk calculator dc help manual

The dc (desk calculator) utility is stack-oriented and uses RPN ("Reverse Polish Notation").

Some of the uses I explored today:

1) Hex, Binary, Decimal conversion:

Hex of 10

$ echo 10 16 o p | dc
A

Binary of 10

$ echo 10 2 o p | dc
1010

Decimal of C

$ echo 16 i C p | dc
12

here:

"i" sets the input radix, so that C is read as a hex number.
"o" sets the output radix (numerical base).
"p" prints the top of the stack.
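For a quick cross-check of these conversions without dc, bash built-ins can do the same job. This is a different technique (printf and arithmetic expansion, not dc); the to_binary helper is a hypothetical name and assumes a non-negative integer.

```shell
# Decimal 10 as hex, using the printf builtin.
printf '%X\n' 10

# Hex C as decimal, using the base#number arithmetic syntax.
echo $((16#C))

# Decimal to binary needs a small loop; to_binary is a hypothetical helper.
to_binary() {
  local n=$1 bits=
  while :; do
    bits=$((n % 2))$bits          # prepend the lowest bit
    n=$((n / 2))
    [ "$n" -gt 0 ] || break
  done
  echo "$bits"
}
to_binary 10
```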

2) Notation:

(500 % 34) + 234 in the infix notation of bc:

$ echo "500 % 34 + 234" | bc
258

The same expression in the postfix (RPN) notation of dc:

$ echo 500 34 % 234 + p | dc
258

p Prints the value on the top of the stack, without altering the stack. A newline is printed after the value.

+ Pops two values off the stack, adds them, and pushes the result. The precision of the result is determined only by the values of the arguments, and is enough to be exact.

% Pops two values, computes the remainder of the division that the / command would do, and pushes that. The value computed is the same as that computed by the sequence Sd dld/ Ld*- .


3) Generate a random number within a range using dc:

To generate a random number between 200 and 300, both inclusive (note that $RANDOM is expanded by bash before dc sees it):

$ echo $RANDOM 101 % 200 + p | dc

The same can be done more simply with bash arithmetic expansion alone:

$ echo $((RANDOM%101+200))
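The arithmetic generalises to any range: the modulus is the number of values in the range and the offset is its lower bound. A minimal sketch, with rand_between as a hypothetical helper name; it assumes lo <= hi and a span below 32768, since $RANDOM only yields values from 0 to 32767.

```shell
# rand_between LO HI   (hypothetical helper name)
# Prints a pseudo-random integer in the inclusive range LO..HI.
rand_between() {
  local lo=$1 hi=$2
  echo $(( RANDOM % (hi - lo + 1) + lo ))
}
```

Usage: rand_between 200 300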

Related post:

- Simple front end for bash bc calculator
- Bash float comparison using bc calculator
- Bash float comparison using bc and awk

Saturday, September 20, 2008

GROUP BY clause functionality in awk - bash

Q: For the sample two-column data below (cont_bd.txt), how do we sum the second column, grouped by the first column?

Input file:

$ cat cont_bd.txt
continent:mval
SA:2345
AF:123
SA:89
OC:890
EU:24
AF:90
NA:5678
AF:345
OC:90
OC:23
SA:1234
EU:90
AF:12
SA:909

Awk solution for group by clause implementation:

$ awk 'BEGIN{FS=":"; print "continent count total avg"} NR!=1 {a[$1]++;b[$1]=b[$1]+$2}END{for (i in a) printf("%s %10.0f %10.0f %10.2f\n", i, a[i], b[i], b[i]/a[i])} ' cont_bd.txt

Output:

continent count total avg
NA          1       5678    5678.00
OC          3       1003     334.33
AF          4        570     142.50
SA          4       4577    1144.25
EU          2        114      57.00
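The SQL analogy can be taken one step further with an "ORDER BY total DESC", by piping the grouped rows through sort. A sketch only: group_stats_by_total is a hypothetical name, the header line is assumed to be the first line, and the columns are separated by single spaces instead of the fixed widths used above.

```shell
# group_stats_by_total FILE   (hypothetical helper name)
# Prints "group count total avg" rows, largest total first.
group_stats_by_total() {
  tail -n +2 "$1" |                            # skip the header line
    awk -F: '
      { n[$1]++; t[$1] += $2 }
      END { for (g in n) printf "%s %d %d %.2f\n", g, n[g], t[g], t[g] / n[g] }
    ' | sort -k3,3nr                           # numeric, descending, on the total column
}
```

Usage: group_stats_by_total cont_bd.txt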

Friday, September 19, 2008

Bash rename command - rename multiple files

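The rename command is very useful for renaming files in bulk. The examples below use the Perl-based rename, which takes a substitution expression; the rename shipped with util-linux uses a different syntax.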

e.g.

$ ls
a.cpp b.cpp c.cpp d.cpp

Now to rename all the *.cpp to *.cpp.bak using rename

$ rename 's/\.cpp$/.cpp.bak/' *.cpp

$ ls
a.cpp.bak b.cpp.bak c.cpp.bak d.cpp.bak


Now to rename all files matching "*.bak" to strip the extension,

$ rename 's/\.bak$//' *.bak

$ ls
a.cpp b.cpp c.cpp d.cpp
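Where the Perl-based rename is not installed, a plain bash loop with mv and parameter expansion gives the same result. A minimal sketch; add_bak and strip_bak are hypothetical helper names that take the directory to work in.

```shell
# add_bak DIR: a.cpp -> a.cpp.bak   (hypothetical helper name)
add_bak() {
  local f
  for f in "$1"/*.cpp; do
    [ -e "$f" ] || continue        # the glob matched nothing
    mv -- "$f" "$f.bak"
  done
}

# strip_bak DIR: a.cpp.bak -> a.cpp   (hypothetical helper name)
strip_bak() {
  local f
  for f in "$1"/*.bak; do
    [ -e "$f" ] || continue
    mv -- "$f" "${f%.bak}"         # ${f%.bak} drops the trailing .bak
  done
}
```

Usage: add_bak . and later strip_bak .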

Friday, September 12, 2008

Matrix addition using awk in bash

This is how to do matrix addition using awk (basically adding the corresponding columns of two files).

Input files:

$ cat mat1.txt
1 5 6
2 2 6
4 1 8

$ cat mat2.txt
4 5 3
2 4 5
2 4 6

Output required:
5 10 9
4 6 11
6 5 14

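$ awk '
FNR==NR {                       # first file: store every cell, keyed by row and column
  for(i=1; i<=NF; i++)
    _[FNR,i]=$i
  next
}
{                               # second file: add the stored cell to the current one
  for(i=1; i<=NF; i++)
    printf("%d%s", $i+_[FNR,i], (i==NF) ? "\n" : FS);
}' mat1.txt mat2.txt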
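Another way to get the same sums, without storing the first matrix in memory, is to join the two files line by line with paste and add the left and right halves of each joined row. A sketch under the assumption that both files have the same dimensions and hold integers; add_matrices is a hypothetical name.

```shell
# add_matrices FILE1 FILE2   (hypothetical helper name)
add_matrices() {
  paste -d' ' "$1" "$2" |
    awk '{
      n = NF / 2                                 # columns per matrix
      for (i = 1; i <= n; i++)
        printf "%d%s", $i + $(i + n), (i == n) ? "\n" : " "
    }'
}
```

Usage: add_matrices mat1.txt mat2.txt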

Delete lines based on another file - awk

$ cat main.txt
ID1:A:45
ID2:B:12
ID4:C:12
ID3:D:56
ID7:F:90
ID9:K:14
ID5:P:32

$ cat filter.txt
ID7:0
ID3:0
ID4:0

Required output: Delete from "main.txt" every line whose ID (first field) also appears as the first field of a line in "filter.txt". In other words, the output file (say rest.txt) is main.txt minus the IDs listed in filter.txt.

The awk solution:

$ awk >rest.txt 'NR==FNR{arr[$1];next}!($1 in arr)' FS=":" filter.txt main.txt

or

$ awk >rest.txt 'NR==FNR{_[$1];next}!($1 in _)' FS=":" filter.txt main.txt

Result:

$ cat rest.txt
ID1:A:45
ID2:B:12
ID9:K:14
ID5:P:32
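Dropping the negation gives the opposite selection: only the lines of main.txt whose ID is listed in filter.txt. A minimal sketch; keep_listed is a hypothetical name, and like the solution above it relies on NR==FNR, so it assumes filter.txt is not empty.

```shell
# keep_listed FILTER MAIN   (hypothetical helper name)
# Prints the lines of MAIN whose first field appears as a first field in FILTER.
keep_listed() {
  awk -F: 'NR == FNR { ids[$1]; next } $1 in ids' "$1" "$2"
}
```

Usage: keep_listed filter.txt main.txt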

Tuesday, September 9, 2008

Linux find command and logical operators

Some examples showing how to use logical operators with the Linux find command.

To find any file whose name ends with either '.sh' or '.pl'

$ find . -type f \( -iname "*.sh" -or -iname "*.pl" \)

To find .txt files that are writeable by "others"

$ find . -type f \( -iname "*.txt" -and -perm -o=w \)

To find .txt files but exclude the ones which are writeable by "others"

$ find . -type f \( -iname "*.txt" ! -perm -o=w \)

or, since some find versions support "-not" as well:

$ find . -type f \( -iname "*.txt" -not -perm -o=w \)

Remember: The parentheses must be escaped with a backslash, "\(" and "\)", to prevent them from being interpreted as special shell characters.
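The operators can also be combined. Since "and" binds more tightly than "or" in find, the parentheses decide what the negated permission test applies to. A sketch using the portable spellings -o and !; the function name is hypothetical.

```shell
# find_scripts_not_world_writable DIR   (hypothetical helper name)
# Lists *.sh and *.pl files under DIR that are NOT writeable by "others".
find_scripts_not_world_writable() {
  find "$1" -type f \( -name '*.sh' -o -name '*.pl' \) ! -perm -o=w
}
```

Usage: find_scripts_not_world_writable .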

Monday, September 8, 2008

Calculate sum and average of multiple lines - awk in bash

Input file:

$ cat details.txt
line1|5002|1200|90
line2|3002|4200|80
line3|5052|1600|90
line4|2006|3260|10

Required: Calculate and print the sum and the average of each numeric field (columns 2 onwards) across all the lines of details.txt

$ awk 'BEGIN {FS=OFS="|"} { print; for (i=2; i<=NF; ++i) sum[i] += $i; j=NF }
END { printf "%s%s", "------------------", "\ntotal"; for (i=2; i <= j; ++i) printf "%s%s", OFS, sum[i]; printf "\n"; }' details.txt

Output:
line1|5002|1200|90
line2|3002|4200|80
line3|5052|1600|90
line4|2006|3260|10
------------------
total|15062|10260|270

$ awk 'BEGIN {FS=OFS="|"} { print; for (i=2; i<=NF; ++i) sum[i] += $i; j=NF }
END { printf "%s%s", "------------------", "\nAvg"; for (i=2; i <= j; ++i) printf "%s%s", OFS, sum[i]/NR; printf "\n"; }' details.txt

Output:
line1|5002|1200|90
line2|3002|4200|80
line3|5052|1600|90
line4|2006|3260|10
------------------
Avg|3765.5|2565|67.5
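Both summary rows can also be produced in a single run, since the END block already has the sums and NR. A sketch only; sum_and_avg is a hypothetical name, and it assumes every field from the second onwards is numeric.

```shell
# sum_and_avg FILE   (hypothetical helper name)
# Echoes the input, then prints a "total" row and an "Avg" row.
sum_and_avg() {
  awk 'BEGIN { FS = OFS = "|" }
    { print; for (i = 2; i <= NF; i++) sum[i] += $i; if (NF > n) n = NF }
    END {
      if (NR == 0) exit                          # empty input: nothing to summarise
      print "------------------"
      printf "total"; for (i = 2; i <= n; i++) printf "%s%s", OFS, sum[i]; printf "\n"
      printf "Avg";   for (i = 2; i <= n; i++) printf "%s%s", OFS, sum[i] / NR; printf "\n"
    }' "$1"
}
```

Usage: sum_and_avg details.txt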

© Jadu Saikia http://unstableme.blogspot.com