Wednesday, September 30, 2009

Subtract from total amount - Awk example


Input file:

$ cat expense.txt
Particulars,Item1,Item2,Item3
BudgetAmount,12000,4560,5000
Expense@2006,1800,3000,250
Expense@2007,2210,2100,3000
Expense@2008,100,1500,320
Expense@2009,0,100,20

Output required:

For all the items, calculate the amount left after expense i.e.

For an item:
Amount Left = (BudgetAmount - (Expense@2006 + Expense@2007 + Expense@2008 + Expense@2009))

i.e. required output:

BudgetAmount,12000,4560,5000
Expense@2006,1800,3000,250
Expense@2007,2210,2100,3000
Expense@2008,100,1500,320
Expense@2009,0,100,20
Amount Left,7890,-2140,1410



The awk program:

$ awk 'BEGIN { FS=OFS="," }
$1 == "BudgetAmount" {
bI1 = $2
bI2 = $3
bI3 = $4
print
}
/^Expense@/ {
bI1 -= $2
bI2 -= $3
bI3 -= $4
print
}
END {
print "Amount Left",bI1, bI2, bI3
}' expense.txt
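
The program above hard-codes three item columns. Here is a rough sketch of the same idea for any number of item columns, assuming the same file layout (a "BudgetAmount" row followed by "Expense@" rows). The function name amount_left is made up for this example.

```shell
# amount_left FILE: print the budget and expense rows, then the amount left per item.
# Hypothetical helper; assumes the BudgetAmount row comes before the Expense@ rows.
amount_left() {
  awk 'BEGIN { FS = OFS = "," }
  $1 == "BudgetAmount" {
    n = NF                                   # remember how many columns there are
    for (i = 2; i <= NF; i++) left[i] = $i   # start from the budget
    print
  }
  /^Expense@/ {
    for (i = 2; i <= NF; i++) left[i] -= $i  # subtract each expense
    print
  }
  END {
    printf "Amount Left"
    for (i = 2; i <= n; i++) printf "%s%s", OFS, left[i]
    print ""
  }' "$1"
}
```

Calling it as "amount_left expense.txt" should end with the same "Amount Left" line as above.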

Related post:

- Bash script for sequential subtraction of numbers

Tuesday, September 29, 2009

If else examples in awk - bash

Input file: Each line of 'num.txt' contains 2 numbers (say A and B).

$ cat num.txt
34,140
190,140
89,120
110,110
210,115

Required: Calculate and print the percentage (A/B)*100, subject to the following conditions:

- If the percentage is less than 100, print the actual calculated percentage
- If the percentage is 100 or more, print the percentage as 100

First solution:

$ awk '
BEGIN {FS=OFS=","}
{if($1>$2) {print $0,100}
else {print $0,($1/$2)*100}
}' num.txt

Output:
34,140,24.2857
190,140,100
89,120,74.1667
110,110,100
210,115,100

Let's do some text alignment and formatting using awk.

$ awk '
BEGIN {FS="," ; {printf "%-10s%-8s%s\n","A","B","% age"}}
{if($1>=$2) {printf "%-10s%-8s%s\n",$1,$2,100}
else {printf "%-10s%-8s%2.2f\n",$1,$2,($1/$2)*100}
}' num.txt

Output:
A         B       % age
34        140     24.29
190       140     100
89        120     74.17
110       110     100
210       115     100

Or a different look of the above script:

$ awk '
BEGIN {
FS="," ; FORMAT="%-10s%-8s%s\n" ;
{printf FORMAT,"A","B","% age"}
}
{
if($1>=$2) {printf FORMAT,$1,$2,100}
else {printf FORMAT,$1,$2,($1/$2)*100}
}' num.txt

Output:
A         B       % age
34        140     24.2857
190       140     100
89        120     74.1667
110       110     100
210       115     100

Another way of writing if else in awk, using the conditional (ternary) operator.

$ awk '
{printf("%-10s%-8s%2.2f\n",\
$1,$2, ($1<=$2) ? ($1/$2)*100 : 100)
}' FS="," num.txt

Output:
34        140     24.29
190       140     100.00
89        120     74.17
110       110     100.00
210       115     100.00
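
One more variation, just as a sketch: the cap at 100 can be moved into a small user-defined awk function. The names pct_capped and min are made up for this example, and B is assumed to be non-zero.

```shell
# pct_capped [FILE]: append (A/B)*100, capped at 100, to each "A,B" line.
# Hypothetical helper; reads stdin when no file is given. Assumes B is never 0.
pct_capped() {
  awk -F, '
  function min(a, b) { return (a < b) ? a : b }
  { printf "%s,%s,%.2f\n", $1, $2, min(($1 / $2) * 100, 100) }' "$@"
}
```

It can be run as "pct_capped num.txt".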

Related post:

- Calculate percentage using awk in bash
- Align text with awk printf function

Thursday, September 24, 2009

Insert after certain characters - awk and sed


$ add="20010db885a3000000008a2e03707334"
$ echo $add
20010db885a3000000008a2e03707334

Required output: Insert a colon ':' after every 4 characters in the above line.
So the output required:

2001:0db8:85a3:0000:0000:8a2e:0370:7334

Using awk:

$ echo $add | awk -F "" '
{for(i=1;i<=NF;i++){printf("%s%s",$i,i%4?"":":")}}'|awk '{sub(/:$/,"")};1'

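Note: Mind the use of "" as the field separator. With an empty field separator gawk treats every character as a separate field; POSIX leaves this behaviour unspecified, so other awk versions may differ.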

Using sed:

$ echo $add | sed 's/..../&:/g;s/:$//'
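
A third possibility, sketched here with the standard fold and paste utilities: break the string into 4-character lines and then join the lines with a colon. The function name add_colons is made up for this example.

```shell
# add_colons STRING: insert ':' after every 4 characters of STRING.
# Hypothetical helper built from fold (split into 4-character lines) and paste (join lines).
add_colons() {
  printf '%s\n' "$1" | fold -w 4 | paste -s -d ':' -
}
```

For example "add_colons $add" should print the same colon-separated address as the awk and sed versions.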

Related post:

- Break a line into multiple lines using awk and sed

Tuesday, September 22, 2009

Replace asterisk using sed - bash

For example:

$ countries="**India **South Africa **Sri Lanka **West Indies"
$ echo $countries
**India **South Africa **Sri Lanka **West Indies

Output required:
Insert a newline before each "**" (in place of the space in front of it), so that the above line becomes:

**India
**South Africa
**Sri Lanka
**West Indies

The sed replacement:

$ echo $countries | sed 's! **!\n**!g'
sed: -e expression #1, char 12: Invalid preceding regular expression

In a regular expression the asterisk is a repetition operator, so the literal asterisks in the pattern need to be escaped.
i.e.

$ echo $countries | sed 's! \*\*!\n**!g'

Another way using sed:

$ echo $countries | sed 's! \*\*!\
\*\*!g'

Similarly:

$ echo "a,b,c,d" | sed 's!,!\n!g'

Output:
a
b
c
d
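
The "\n" in the replacement part is understood by GNU sed; on systems where it is not, awk's gsub can do the same job. A small sketch under that assumption (the function name split_on_stars is made up here):

```shell
# split_on_stars STRING: put each " **" item on its own line.
# Hypothetical helper; the asterisks are escaped in the regex but not in the replacement.
split_on_stars() {
  printf '%s\n' "$1" | awk '{ gsub(/ \*\*/, "\n**"); print }'
}
```

For the single-character case, "tr ',' '\n'" is enough.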

Related post:

1)
Suppose your input line is:

1 b 3 4 e 6 g 8 i j k

And you wish to split the above line into multiple lines (each line with say 3 entries)

i.e.

1 b 3
4 e 6
g 8 i
j k

here is a post

2)
One more related post of breaking a line into multiple lines based on length

Monday, September 21, 2009

Print except few columns using awk - bash

Input file:

$ cat details.txt
AX|23.45|1932323|A|VI|-|Y|0
TY|93.45|2932323|B|VI|-|Y|1
RE|63.25|8932323|A|VI|0|N|1
AY|83.85|0932323|C|VI|-|Y|0

Required:
Print all columns from the above file except column number 2 and 7.
i.e. required output:

AX|1932323|A|VI|-|0
TY|2932323|B|VI|-|1
RE|8932323|A|VI|0|1
AY|0932323|C|VI|-|0

Basically for the above file we have to print column # 1,3,4,5,6,8

i.e.

$ awk '
BEGIN{FS=OFS="|"}{print $1,$3,$4,$5,$6,$8}
' details.txt

But if the input file has a large number of fields, listing every field to print becomes impractical. So here is another technique, which loops over the fields and skips the unwanted ones.

$ awk '
BEGIN{FS=OFS="|"}
{ for (i=1; i<=NF;i++)
if( i==2 || i==7 ) continue
else
printf("%s%s", $i,(i!=NF) ? OFS : ORS)}
' details.txt
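
Note that the loop above prints the line ending only when it reaches the last field, so it relies on the last column not being one of the excluded ones. Here is a sketch that takes the list of columns to drop as an argument and builds the whole output line before printing it. The function name drop_cols is made up, and the "|" separator is assumed as in details.txt.

```shell
# drop_cols LIST FILE: print FILE without the columns named in the comma-separated LIST.
# Hypothetical helper; assumes "|" as the field separator.
drop_cols() {
  awk -v drop="$1" '
  BEGIN {
    FS = OFS = "|"
    n = split(drop, d, ",")
    for (k = 1; k <= n; k++) skip[d[k]] = 1   # mark the columns to leave out
  }
  {
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) {
      if (i in skip) continue
      out = out sep $i
      sep = OFS
    }
    print out
  }' "$2"
}
```

e.g. "drop_cols 2,7 details.txt".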

And if you want to exclude a range of column numbers (say exclude column 3 to column 6) here is my earlier post

An additional tip:
Suppose you need to generate the print 'statement' for printing a number of consecutive fields for an awk program, here is a quick way:

$ seq -s ",$" 1 8 | sed 's/.*/{print $&}/'

Output:

{print $1,$2,$3,$4,$5,$6,$7,$8}

Or you can use the for loop mentioned above.

Friday, September 18, 2009

Find process running time in UNIX

Question. How to find out how long a process has been running on a UNIX system ?

Ans: Here are some tips to find the process' running time on a UNIX system.

----------------
For 2.6 kernels:
----------------

Identify your process Id

and then do a

ls -ld /proc/PID-OF-YOUR-PROCESS

So the modification time listed for the above directory is the time at which the process started.

e.g. I started a process, say "sleep 10000", a few minutes back

$ ps -ef | grep "[s]leep 10000"
jsaikia 24375 23306 0 22:13 pts/10 00:00:00 sleep 10000

$ ls -ld /proc/24375
dr-xr-xr-x 6 jsaikia staff 0 2009-09-18 22:14 /proc/24375

So, "2009-09-18 22:14" is the start time of the above sleep process; if I subtract this time from the current time I can find how long this process has been running.

For subtraction you can have a script like this:

#!/bin/sh
# Usage: sh cal-tdiff.sh LATER-TIME EARLIER-TIME
# Needs GNU date (-d parses a date string).

T1=$(date +%s -d "$1")
T2=$(date +%s -d "$2")
diffsec=$((T1 - T2))
echo - \
| awk -v D=$diffsec '{printf "%d:%d:%d\n",D/(60*60),D%(60*60)/60,D%60}'

So that you can execute like this:

$ sh cal-tdiff.sh "$(date)" "2009-09-18 22:14"
0:22:56
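
The seconds-to-H:M:S conversion can also be done with plain shell arithmetic, without awk. A minimal sketch (the function name fmt_hms is made up for this example; minutes and seconds are zero-padded here, which the script above does not do):

```shell
# fmt_hms SECONDS: print a duration in seconds as hours:minutes:seconds.
# Hypothetical helper using only POSIX shell arithmetic and printf.
fmt_hms() {
  s=$1
  printf '%d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}
```

e.g. "fmt_hms 1376" corresponds to the 22 minutes 56 seconds shown above.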

----------------
For 2.4 kernels:
----------------
For 2.4 kernels, the modification time on the "/proc/PID-OF-YOUR-PROCESS" will be the current system time (unlike 2.6 kernels, where it is the actual process start time)

So how to find the running time of a process on a 2.4 kernel UNIX system ?

Here is the way (this is going to work for 2.6 also)

e.g.

$ ps -ef | grep [s]leep
root 7702 7689 0 17:34 pts/0 00:00:00 sleep 100000

so 7702 is the pid of the above process.

$ pd=7702
$ expr $(awk '{print $1}' FS=\. /proc/uptime) - $(awk '{printf ("%10d\n",$22/100)}' /proc/$pd/stat)

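The output will show the number of seconds the process (with pid=$pd) has been running. The division by 100 assumes 100 clock ticks per second; you can check the value on your system with "getconf CLK_TCK".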

Thursday, September 17, 2009

Bash script to copy required files

Contents of my "inputdir" is a set of files with filename like this:

$ ls -1 inputdir/
log.10.16.1253168140.txt
log.11.5.1253168345.txt
log.11.9.1253168347.txt
log.12.1.1253168347.txt
log.19.1.1253168140.txt

Directory "testcfgs" contains a set of config xmls.

$ ls -1 testcfgs/
cfg_10_16.xml
cfg_10_5.xml
cfg_11_5.xml
cfg_11_9.xml
cfg_12_1.xml
cfg_19_1.xml
cfg_19_2.xml
cfg_91_9.xml

Required:

For each file of name "log.X.Y.timestamp.txt" in "inputdir", copy the corresponding "cfg_X_Y.xml" config file from "testcfgs" to a directory say "requiredcfgs".

A simple, practical bash loop on the command prompt:

$ for filename in $(ls -1 inputdir/)
> do
> X=$(echo "$filename" | cut -d"." -f2)
> Y=$(echo "$filename" | cut -d"." -f3)
> cp testcfgs/cfg_$X\_$Y.xml requiredcfgs/
> done

The two lines above for finding X and Y value can be replaced by a single line using 'eval with awk', like this:

$ for filename in $(ls -1 inputdir/)
> do
> eval $(echo "$filename" | awk -F "." '{print "X="$2";Y="$3}')
> cp testcfgs/cfg_$X\_$Y.xml requiredcfgs/
> done

Contents of "requiredcfgs" directory after execution of the above bash script.

$ ls -1 requiredcfgs/
cfg_10_16.xml
cfg_11_5.xml
cfg_11_9.xml
cfg_12_1.xml
cfg_19_1.xml
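
The same job can be sketched without parsing ls output and without eval, using a glob and shell parameter expansion to pick X and Y out of the filename. The function name copy_cfgs and its three arguments are made up for this example; config files that do not exist are silently skipped.

```shell
# copy_cfgs LOGDIR CFGDIR DESTDIR: for each LOGDIR/log.X.Y.timestamp.txt,
# copy CFGDIR/cfg_X_Y.xml into DESTDIR. Hypothetical helper.
copy_cfgs() {
  mkdir -p "$3"
  for path in "$1"/log.*.*.*.txt; do
    [ -e "$path" ] || continue     # nothing matched: the pattern is left as-is
    rest=${path##*/}               # strip the directory part
    rest=${rest#log.}              # strip the leading "log."
    X=${rest%%.*}                  # text up to the next dot
    rest=${rest#*.}
    Y=${rest%%.*}
    cfg="$2/cfg_${X}_${Y}.xml"
    if [ -f "$cfg" ]; then
      cp "$cfg" "$3/"
    fi
  done
}
```

e.g. "copy_cfgs inputdir testcfgs requiredcfgs".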

Related post on eval with awk:

- Subdivide an ip address - assign each part to an variable using awk

Tuesday, September 15, 2009

Linux seq command format option example

I have already posted about the Linux/UNIX seq command, using which we can generate a sequence of numbers. seq is very useful for generating loop arguments in UNIX bash scripting.

One very useful seq command line option is -f

-f, --format=FORMAT
use printf style floating-point FORMAT

Let's see some simple examples of the same.


$ seq -f "%04g" 3
Output:
0001
0002
0003


$ seq -f "logfile%02g.txt" 10
Output:
logfile01.txt
logfile02.txt
logfile03.txt
logfile04.txt
logfile05.txt
logfile06.txt
logfile07.txt
logfile08.txt
logfile09.txt
logfile10.txt

Now, to create 10 files with names logfile01.txt, logfile02.txt ,....., logfile10.txt

$ touch $(seq -f "logfile%02g.txt" 10)

$ ls -1
logfile01.txt
logfile02.txt
logfile03.txt
logfile04.txt
logfile05.txt
logfile06.txt
logfile07.txt
logfile08.txt
logfile09.txt
logfile10.txt
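
On a system without seq, the same zero-padded names can be produced with a shell loop and printf. A minimal sketch (the function name padded_names is made up for this example):

```shell
# padded_names COUNT: print logfile01.txt .. logfileNN.txt, one per line.
# Hypothetical helper; printf's %02d does the zero padding.
padded_names() {
  i=1
  while [ "$i" -le "$1" ]; do
    printf 'logfile%02d.txt\n' "$i"
    i=$((i + 1))
  done
}
```

e.g. "touch $(padded_names 10)".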

seq also accepts its arguments in the form FIRST INCREMENT LAST.
Sequence numbers between 1.0003 and 1.0012 with an increment of .0002

$ seq -f "1.%04g" 3 2 12
1.0003
1.0005
1.0007
1.0009
1.0011

-s is to specify separator between sequence numbers.

$ seq -s "+" -f "1.%04g" 3 2 12
1.0003+1.0005+1.0007+1.0009+1.0011

$ seq -s "+" -f "1.%04g" 3 2 12 | bc
5.0035

Another good command for printing sequential and random data is jot, read here

Related post:
- Ways of writing for loops in bash scripting
- Print text within style box in bash scripting

Thursday, September 10, 2009

Replace duplicate line with blank - awk

I just received a query as a comment on one of my older posts on "removing duplicates based on fields using awk"

Question was:

Any Idea on how to replace duplicate line with blank line instead of deleting them?
e.g.

Input:

test1
test1
test2
test2
test2
test3

Output:

test1

test2


test3

Thought of making it a separate post here.

The solution using awk:

$ awk 'x[$0]++ {$0=""} {print}' file.txt
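
If the duplicates are to be decided on one field rather than on the whole line, the same idea can be sketched as below. Comma-separated input is assumed, and the function name blank_dups_by_field is made up for this example.

```shell
# blank_dups_by_field N: read comma-separated lines on stdin and print a blank
# line in place of any line whose Nth field was already seen. Hypothetical helper.
blank_dups_by_field() {
  awk -F, -v k="$1" 'seen[$k]++ { print ""; next } { print }'
}
```

e.g. "blank_dups_by_field 1 < file.txt".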


Related post:
- Remove duplicate without sorting file using awk

Wednesday, September 9, 2009

Split file using awk - few examples

One good use of awk is splitting files (based on different conditions) into sub-files.
Let's see some examples:

Example 1:

Input file log1.txt:
- Line starting with H is the main header line.
- Line starting with "h" and subsequent lines starting with "s" (till the next "h" line) are part of the same entry/section.

$ cat log1.txt
H,555,etho0
h,1,3
s,1233,456456,1212
s,4251,452456,7215
s,6283,851456,1219
h,9,2
s,2233,156456,1912
s,9233,256456,8212
h,2,4
s,4233,456456,1212
s,7251,252456,7215
s,1288,851456,9219
s,9183,851456,6219

Required:

- Split or subdivide the above file into sub files corresponding to each entry (one entry being the section starting with "h" and "s" lines till the next "h" line)
- Each sub-file should contain(start with) the main header line ("H" line).
- The required output is 3 sub files with the following contents and filename convention.

$ cat 555.1.1.log
H,555,etho0
h,1,3
s,1233,456456,1212
s,4251,452456,7215
s,6283,851456,1219

$ cat 555.9.2.log
H,555,etho0
h,9,2
s,2233,156456,1912
s,9233,256456,8212

$ cat 555.2.3.log
H,555,etho0
h,2,4
s,4233,456456,1212
s,7251,252456,7215
s,1288,851456,9219
s,9183,851456,6219

The awk program:

$ awk -F "," '
$1=="H" {mainH=$0;id=$2;next}
/^h/{
if (out != "") close(out)
out = id "." $2 "." (++f) ".log"
print mainH > out
}
{print $0 > out}
' log1.txt

Example 2:

Input file:

$ cat log.txt
1252468812,yahoo,3.5
1252468812,hotmail,2.4
1252468819,yahoo,1.2
1252468812,msn,8.9
1252468923,gmail,12
1252468819,live,3.4
1252468929,yahoo,9.0
1252468929,msn,1.2

Required:

a) Split the above file based on the first field (i.e. lines with the same first field should go to the same file)

The awk one liner:

$ awk -F "," '{print > ($1 ".txt")}' log.txt

Output:
The above file is split into the following sub-files.

$ cat 1252468812.txt
1252468812,yahoo,3.5
1252468812,hotmail,2.4
1252468812,msn,8.9

$ cat 1252468819.txt
1252468819,yahoo,1.2
1252468819,live,3.4

$ cat 1252468923.txt
1252468923,gmail,12

$ cat 1252468929.txt
1252468929,yahoo,9.0
1252468929,msn,1.2
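
With a very large number of distinct first fields, awk may run out of open files, because every sub-file stays open until awk exits. A sketch of a variant that appends to the sub-file and closes it after every line; since ">>" appends, the output directory is assumed to be new or empty. The function name split_by_key and its arguments are made up for this example.

```shell
# split_by_key FILE OUTDIR: write each line of FILE to OUTDIR/<first field>.txt.
# Hypothetical helper; appends (>>) and closes after each line, so OUTDIR should
# not already contain files from an earlier run.
split_by_key() {
  mkdir -p "$2"
  awk -F, -v dir="$2" '{ out = dir "/" $1 ".txt"; print >> out; close(out) }' "$1"
}
```

e.g. "split_by_key log.txt parts".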

b) Send every 3 lines of above file into a sub file.

The awk code:

$ awk '{print >("log_" int((NR+2)/3))}' log.txt

Output:
The sub-files generated.

$ cat log_1
1252468812,yahoo,3.5
1252468812,hotmail,2.4
1252468819,yahoo,1.2

$ cat log_2
1252468812,msn,8.9
1252468923,gmail,12
1252468819,live,3.4

$ cat log_3
1252468929,yahoo,9.0
1252468929,msn,1.2

Tuesday, September 8, 2009

Print first character of a field - awk substr

Input file:

$ cat file.txt
8965,1212,c32,1
1221,9000,d90,0
1222,7823,,2
9012,1901,c12,7
9012,1342,t90,9

Output required: If the 3rd field is non-empty, print only the first character of its value; otherwise (i.e. when the field is blank) print "NA" in the 3rd field.
i.e. required output:

8965,1212,c,1
1221,9000,d,0
1222,7823,NA,2
9012,1901,c,7
9012,1342,t,9


The awk program:

$ awk '
BEGIN {FS=OFS=","}
{ if ( length($3) ) { $3 = substr($3, 1, 1) }
else { $3 = "NA" }
print
}
' file.txt
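
Remember that awk's substr counts character positions from 1. The same logic can be sketched with the conditional operator and the column number passed in as a variable (the function name first_char is made up for this example):

```shell
# first_char N: read comma-separated lines on stdin; replace field N by its
# first character, or by "NA" when the field is empty. Hypothetical helper.
first_char() {
  awk -v c="$1" 'BEGIN { FS = OFS = "," }
  { $c = ($c == "") ? "NA" : substr($c, 1, 1); print }'
}
```

e.g. "first_char 3 < file.txt".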

Related post:
- Brief about awk substr function
- Blank column in file - awk newbie
- Count non empty field in file using awk

Monday, September 7, 2009

Truncate string using bash script

Input file:

$ cat spears.txt
Baby-One-More-Time.mp3
Autumn-Goodbye.mp3
Baby-One-More-Time.mp3
Cant-Make-You-Love-Me.wmv
Crazy.mp3
Crazy---Stop-Remix.mp3
Dont-Go-Knocking-on-My-Door.mp3
Dont-Let-Me-Be-The-Last-to-Know.flv
From-The-Bottom-of-My-Broken-Heart.mp3
Im-Not-a-Girl-Not-Yet-a-Woman.mp3


Required: Truncate each line of the above file (the filename part, not the extension) to 15 characters.
Also insert the string "..." between the filename part and the extension if the line is truncated.

The bash script:

#!/bin/sh
# Script to truncate the name part of each filename to 15 characters
#

while read -r filename
do
name=${filename%%.*}
extn=${filename##*.}
if [ ${#name} -gt 15 ]
then
nfile=$(echo "$name" | cut -c1-15)
fullname=${nfile}...${extn}
echo "$fullname"
else
echo "$filename"
fi
done < spears.txt > spears.txt.truncated

The output file produced after execution of the above bash script:

$ cat spears.txt.truncated
Baby-One-More-T...mp3
Autumn-Goodbye.mp3
Baby-One-More-T...mp3
Cant-Make-You-L...wmv
Crazy.mp3
Crazy---Stop-Re...mp3
Dont-Go-Knockin...mp3
Dont-Let-Me-Be-...flv
From-The-Bottom...mp3
Im-Not-a-Girl-N...mp3
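
The cut call can also be replaced by a printf precision ("%.15s" prints at most 15 characters of a string). A small sketch with the length passed as an argument; the function name truncate_name is made up, and the filename is assumed to contain a dot.

```shell
# truncate_name FILENAME MAXLEN: shorten the name part to MAXLEN characters and
# put "..." before the extension when something was cut off. Hypothetical helper;
# splits at the last dot and assumes FILENAME contains one.
truncate_name() {
  name=${1%.*}
  extn=${1##*.}
  if [ "${#name}" -gt "$2" ]; then
    printf "%.${2}s...%s\n" "$name" "$extn"
  else
    printf '%s\n' "$1"
  fi
}
```

e.g. "while read -r f; do truncate_name "$f" 15; done < spears.txt".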

Saturday, September 5, 2009

Execute multiple commands with exec and find

Is it possible to execute multiple commands using -exec on the output of the Linux/UNIX find command ?

The answer is "yes". Each -exec action has to be terminated with its own escaped semicolon (\;)

e.g.

I had to find files named "1251936000.log" and then perform two actions on each of them:

- Count number of lines in the file.
- Do a "ls -l" listing of the file.

And the find command I wrote with exec:

$ find . -name 1251936000.log -exec wc -l {} \; -exec ls -l {} \;

Output:

6924 ./lv1/1251936000.log
-rw-r--r-- 1 root root 977264 Sep 4 00:17 ./lv1/1251936000.log
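
Another way, shown here only as a sketch, is to hand the found files to a small inline shell script with a single -exec, so that any number of commands can be run per file. The function name report_logs and its arguments are made up for this example.

```shell
# report_logs DIR NAME: for each regular file called NAME under DIR,
# print its line count and then its "ls -l" listing. Hypothetical helper.
report_logs() {
  find "$1" -type f -name "$2" -exec sh -c '
    for f in "$@"; do
      wc -l < "$f"
      ls -l "$f"
    done' sh {} +
}
```

e.g. "report_logs . 1251936000.log". Here wc reads the file on its standard input, so only the count is printed, without the file name.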

Related post:
- A set of posts on Linux/UNIX find command.

Friday, September 4, 2009

Rename file to uppercase except extension - Bash

My present working directory contains the following files.

$ ls -1
new.py
readme.txt
sl.pl
test.py

Required: Make all file names in this directory uppercase, but not their extensions (ex: image.jpg becoming IMAGE.jpg)

The shell script on the command prompt:

$ ls | while read -r file
> do
> name=${file%%.*}
> extn=${file##*.}
> newfilename=$(echo "$name" | tr 'a-z' 'A-Z').$extn
> echo "Moving $file to $newfilename"
> mv "$file" "$newfilename"
> done

Note: Shell secondary prompt (PS2)
The > (greater than sign) is the shell's secondary prompt (PS2).
When you enter a command that is incomplete, the shell displays this prompt and waits for you to complete the command and hit Enter again.

Output of the above bash script:

Moving new.py to NEW.py
Moving readme.txt to README.txt
Moving sl.pl to SL.pl
Moving test.py to TEST.py

Now

$ ls -1
NEW.py
README.txt
SL.pl
TEST.py
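
To try out the new names before moving anything, the name conversion can be kept in a small function of its own. A sketch (the function name upper_name is made up; it splits at the last dot and assumes the file name contains one):

```shell
# upper_name FILENAME: print FILENAME with the name part in uppercase and the
# extension unchanged. Hypothetical helper; assumes FILENAME contains a dot.
upper_name() {
  name=${1%.*}
  extn=${1##*.}
  printf '%s.%s\n' "$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')" "$extn"
}
```

e.g. "upper_name readme.txt" only prints the new name; a loop such as for f in *.*; do mv "$f" "$(upper_name "$f")"; done would do the actual renaming.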


Related post:
- Renaming file from lowercase to uppercase in Bash

© Jadu Saikia http://unstableme.blogspot.com