Tuesday, August 26, 2008

Sort date in ddmmyyyy format - awk and bash script


The input file has a date in ddmmyyyy format as its first field.

$ cat myf.dat
12082008;pull done;ret=34;Y
08072008;push hanged;s=3;N
15082008;pull done;ret=34;Y
01062008;psuh done;ret=23;Y
18082007;old entry;old;N

Required output: sort the file above on its first field, a date in ddmmyyyy format, so that the final output is:

18082007;old entry;old;N
01062008;psuh done;ret=23;Y
08072008;push hanged;s=3;N
12082008;pull done;ret=34;Y
15082008;pull done;ret=34;Y

The solution is divided into 3 steps:

1) Adding a temporary field to the beginning. This field is nothing but the yyyymmdd format of the corresponding first field.

$ awk '{
tempfield=sprintf("%s%s%s",substr($1,5),substr($1,3,2),substr($1,1,2))
print tempfield","$0
}' FS=";" myf.dat

20080812,12082008;pull done;ret=34;Y
20080708,08072008;push hanged;s=3;N
20080815,15082008;pull done;ret=34;Y
20080601,01062008;psuh done;ret=23;Y
20070818,18082007;old entry;old;N

2) Now doing a numeric sort.

$ awk '{
tempfield=sprintf("%s%s%s",substr($1,5),substr($1,3,2),substr($1,1,2))
print tempfield","$0
}' FS=";" myf.dat | sort -n

20070818,18082007;old entry;old;N
20080601,01062008;psuh done;ret=23;Y
20080708,08072008;push hanged;s=3;N
20080812,12082008;pull done;ret=34;Y
20080815,15082008;pull done;ret=34;Y

3) Removing the temporary field from the beginning.

$ awk '{
tempfield=sprintf("%s%s%s",substr($1,5),substr($1,3,2),substr($1,1,2))
print tempfield","$0
}' FS=";" myf.dat | sort -n | cut -d"," -f2-

18082007;old entry;old;N
01062008;psuh done;ret=23;Y
08072008;push hanged;s=3;N
12082008;pull done;ret=34;Y
15082008;pull done;ret=34;Y

The above is the required output.
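
As an alternative sketch, sort can often do the whole job on its own by keying on character positions inside the first field, so no temporary field is needed. The -k field.char syntax is standard sort; the temporary file and the variable name sorted are only there for illustration.

```shell
# Recreate the sample input in a temporary file (illustration only).
input=$(mktemp)
cat > "$input" <<'EOF'
12082008;pull done;ret=34;Y
08072008;push hanged;s=3;N
15082008;pull done;ret=34;Y
01062008;psuh done;ret=23;Y
18082007;old entry;old;N
EOF

# Sort keys, all inside field 1 (fields are separated by ";"):
#   characters 5-8 = yyyy, characters 3-4 = mm, characters 1-2 = dd.
# The digits are fixed width, so plain byte-wise comparison orders them correctly.
sorted=$(LC_ALL=C sort -t ';' -k 1.5,1.8 -k 1.3,1.4 -k 1.1,1.2 "$input")
printf '%s\n' "$sorted"
rm -f "$input"
```

This only works because every date has exactly eight digits; with variable-width dates the awk approach above is the safer choice.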

Monday, August 25, 2008

Redirect both stderr and stdout to a single file - linux

If you want two files, one for the output (mylog) and one for the errors (mylog.err):

$ ./command 2>mylog.err 1>mylog


And if you want to redirect both stderr and stdout to a single logfile:

$ ./command >mylog 2>&1

so "mylog" will contain both stderr and stdout.
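
The order of the two redirections matters. Here is a small sketch of why, using a hypothetical helper function emit (not part of the original tip) that writes one line to each stream:

```shell
# emit is a hypothetical helper that writes one line to stdout and one to stderr.
emit() { echo "to stdout"; echo "to stderr" >&2; }

log1=$(mktemp)
log2=$(mktemp)

# File first, then 2>&1: stderr is duplicated onto the file, so both lines land in log1.
emit >"$log1" 2>&1
both=$(cat "$log1")

# 2>&1 first: stderr is duplicated onto the old stdout (captured here by the
# command substitution), and only stdout is then moved to log2.
leaked=$(emit 2>&1 >"$log2")
only_out=$(cat "$log2")

printf 'log1:\n%s\nlog2:\n%s\nnot in log2:\n%s\n' "$both" "$only_out" "$leaked"
rm -f "$log1" "$log2"
```

Redirections are processed left to right, which is why ">mylog 2>&1" collects both streams while "2>&1 >mylog" does not.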

Saturday, August 23, 2008

Subdivide an ip address - awk and eval

The purpose is quite simple! We need to assign each field of an IP address to a separate variable.

e.g.

$ IP="172.21.60.1"

The way:

$ eval $(echo "$IP" | awk '{print "IP1="$1";IP2="$2";IP3="$3";IP4="$4}' FS=.)

So, just to confirm.

$ echo $IP1
172

$ echo $IP2
21

$ echo $IP3
60

$ echo $IP4
1

Making it more generic:

$ eval $(echo "$IP" | awk '{for(i=1;i<=NF;i++) printf "IP%s=%s\n",i,$i}' FS=.)
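
If the address comes from input you do not control, eval will run whatever ends up in the string. A minimal sketch that avoids eval, assuming exactly four fields and a POSIX shell, uses read with a temporary IFS:

```shell
IP="172.21.60.1"

# IFS=. applies only to this read; the here-document feeds $IP to it on stdin.
IFS=. read -r IP1 IP2 IP3 IP4 <<EOF
$IP
EOF

echo "$IP1 $IP2 $IP3 $IP4"
```

Unlike the generic awk loop above, this version needs the variable names listed up front.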

Related post: awk and eval

Thursday, August 21, 2008

Split a file, add headers - awk and bash script

Input file:

$ cat master.dat
h1|323|0|v1
l3|2121|MOD
k|1|53453
k|2|312312
k|3|12121
k|4|76577
k|5|76577
k|6|96557
k|7|76577
k|8|26579
k|9|96532
k|10|76577
k|11|6577
k|12|96577
k|13|16577

Output required: split the file above into subfiles of 4 data lines each, with the first two lines (beginning with h1 and l3) repeated at the top of every subfile.

The script:

$ cat split.sh

#!/bin/sh

NOARG=64
[ -z "$1" ] && echo "one file please" && exit $NOARG || FILE=$1
numlines=4
echo "Operation on: $FILE"

awk 'NR==1{h=$0} NR==2{t=$0} NR==L*(n+1)+3 {close(F); n++; print h RS t > "'"$FILE"'"n+1".sub"} {print > (F="'"$FILE"'"n+1".sub")}' L=$numlines "$FILE"

mkdir -p backup; mv "$FILE" backup/.
echo "Done for: $FILE"

Executing:

$ ./split.sh master.dat
Operation on: master.dat
Done for: master.dat

Output:

$ ls
backup master.dat1.sub master.dat2.sub master.dat3.sub master.dat4.sub split.sh

$ cat master.dat1.sub
h1|323|0|v1
l3|2121|MOD
k|1|53453
k|2|312312
k|3|12121
k|4|76577

$ cat master.dat2.sub
h1|323|0|v1
l3|2121|MOD
k|5|76577
k|6|96557
k|7|76577
k|8|26579

$ cat master.dat3.sub
h1|323|0|v1
l3|2121|MOD
k|9|96532
k|10|76577
k|11|6577
k|12|96577

$ cat master.dat4.sub
h1|323|0|v1
l3|2121|MOD
k|13|16577
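
The same split can be sketched with the file name passed in through awk -v instead of being spliced into the program with shell quotes. The variable names base, out, workdir and file are illustrative, and the scratch directory exists only so the sketch can run on its own:

```shell
# Work in a scratch directory with a copy of the sample master.dat (illustration only).
workdir=$(mktemp -d)
file="$workdir/master.dat"
cat > "$file" <<'EOF'
h1|323|0|v1
l3|2121|MOD
k|1|53453
k|2|312312
k|3|12121
k|4|76577
k|5|76577
k|6|96557
k|7|76577
k|8|26579
k|9|96532
k|10|76577
k|11|6577
k|12|96577
k|13|16577
EOF

# base and L arrive through -v, so the awk program stays inside one pair of single quotes.
awk -v L=4 -v base="$file" '
NR == 1 { h = $0; next }            # first header line
NR == 2 { t = $0; next }            # second header line
(NR - 3) % L == 0 {                 # every L-th data line starts a new subfile
    if (out != "") close(out)
    out = base (++n) ".sub"
    print h > out
    print t > out
}
{ print > out }                     # data line goes to the current subfile
' "$file"

count=$(ls "$workdir"/*.sub | wc -l | tr -d ' ')
last=$(cat "$workdir/master.dat4.sub")
echo "subfiles: $count"
printf '%s\n' "$last"
rm -rf "$workdir"
```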

Monday, August 18, 2008

Numeric int function in awk

wrt.log contains consolidated log information of an application for a whole day (00:00:00 IST to 23:59:59 IST on a particular day). The format of the log lines is:

Timestamp IP Status

We need to find the IPs whose status is "ACT" and whose timestamp falls in the present hour.

i.e.

input file:

$ cat wrt.log
05:18:37 IST 2008 172.21.45.2 ACT
09:18:27 IST 2008 172.21.45.12 ACT
06:18:37 IST 2008 172.21.45.22 DES
08:18:37 IST 2008 172.21.45.3 ACT
00:18:37 IST 2008 172.21.45.23 DES
09:18:39 IST 2008 172.21.45.9 DES

And present date:

$ date
Mon Aug 18 09:23:21 IST 2008

$ date +%H
09

so output required is:

172.21.45.12

The awk solution:

$ awk -v hour=$(date +%H) '
int($1) == hour && /ACT/ {
print $4
}
' wrt.log

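Things to digest: the numeric int function in awk. int($1) keeps only the leading number of a timestamp such as "09:18:27", giving 9, which then compares equal to the hour "09" passed in with -v.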
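
A slightly stricter sketch of the same idea compares the status column exactly instead of matching /ACT/ anywhere in the line. The hour is fixed at 09 here so the result is reproducible; the temporary file and the variable names log, hour and ips are illustrative.

```shell
# Recreate the sample log in a temporary file (illustration only).
log=$(mktemp)
cat > "$log" <<'EOF'
05:18:37 IST 2008 172.21.45.2 ACT
09:18:27 IST 2008 172.21.45.12 ACT
06:18:37 IST 2008 172.21.45.22 DES
08:18:37 IST 2008 172.21.45.3 ACT
00:18:37 IST 2008 172.21.45.23 DES
09:18:39 IST 2008 172.21.45.9 DES
EOF

# Hour fixed to match the sample run; in real use: hour=$(date +%H)
hour=09

# int() keeps the leading number of the timestamp; $5 is the status column, $4 the IP.
ips=$(awk -v hour="$hour" 'int($1) == int(hour) && $5 == "ACT" { print $4 }' "$log")
echo "$ips"
rm -f "$log"
```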

Saturday, August 16, 2008

Row to column transpose - bash scripting

I already discussed transposing with awk in one of my older posts. Here is a different awk solution for the same problem.

Input file:

$ cat mtf.txt
a:b:cde:f:g
1:2
I:II:III:IV

Awk code:


$ awk 'BEGIN {FS=OFS=":"}
{
for (i=1;i<=NF;i++)
{
arr[NR,i]=$i;
if(big <= NF)
big=NF;
}
}

END {
for(i=1;i<=big;i++)
{
for(j=1;j<=NR;j++)
{
printf("%s%s",arr[j,i], (j==NR ? "" : OFS));
}
print "";
}
}' mtf.txt



One more solution:

$ awk -F ":" '{
for (f = 1; f <= NF; f++)
a[NR, f] = $f
}
NF > nf { nf = NF }
END {
for (f = 1; f <= nf; f++)
for (r = 1; r <= NR; r++)
printf "%s%s", a[r, f], (r==NR ? RS : FS)
}' mtf.txt


Output:
a:1:I
b:2:II
cde::III
f::IV
g::

Thursday, August 7, 2008

"Reserved" Exit Codes - bash shell

1:: catchall for general errors [miscellaneous errors, such as "divide by zero"]
--------------------------------------------------------------------------------
$ let "var1 = 1/0"
-bash: let: var1 = 1/0: division by 0 (error token is "0")

$ echo $?
1

2:: misuse of shell builtins, according to Bash documentation [Seldom seen; ordinary failures, like the ls below, usually return exit code 1]
------------------------------------------------------------------------------------------
$ ls /tmp/notexi
ls: /tmp/notexi: No such file or directory

$ echo $?
1

126:: "command invoked cannot execute" [permission problem or command is not an executable]
------------------------------------------------------------------------------------------------------------------
$ ./asd
-bash: ./asd: Permission denied

$ echo $?
126

127:: "command not found" [possible problem with $PATH or a typo]
-----------------------------------------------------------------------------------
$ pks -ef
-bash: pks: command not found

$ echo $?
127
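
Codes 126 and 127 are handy in scripts for telling "found but not runnable" apart from "not found at all". A small sketch, assuming a throwaway script without execute permission and a command name that does not exist in $PATH (both made up for illustration):

```shell
tmpdir=$(mktemp -d)

# A script without execute permission (stand-in for ./asd in the example above).
printf '#!/bin/sh\necho hi\n' > "$tmpdir/asd"
chmod 644 "$tmpdir/asd"
rc_noexec=0
"$tmpdir/asd" 2>/dev/null || rc_noexec=$?

# A command name that is assumed not to exist anywhere in $PATH.
rc_missing=0
pks_no_such_command_here 2>/dev/null || rc_missing=$?

for rc in "$rc_noexec" "$rc_missing"; do
    case $rc in
        126) echo "$rc: found, but could not be executed" ;;
        127) echo "$rc: command not found" ;;
        *)   echo "$rc: some other status" ;;
    esac
done
rm -rf "$tmpdir"
```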


130:: "script terminated by Control-C"
--------------------------------------------
$ sleep 100

(pressed ^C)

$ echo $?
130
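
The value 130 is 128 plus the signal number of SIGINT (2), which is what Control-C sends. As a rough sketch, kill -l can map such an exit status back to a signal name; the variable names status, signum and signame are illustrative:

```shell
status=130
signum=$((status - 128))        # 130 - 128 = 2
signame=$(kill -l "$status")    # kill -l maps an exit status back to a signal name
echo "exit status $status = 128 + signal $signum ($signame)"
```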


Read in details

Saturday, August 2, 2008

Print lines with more than two occurrences of a pattern using sed, awk, grep

Input file:

$ cat myfile.txt
"AA "345" SDF"
"BB DERT"
CC"123" "DERA
"12"DD"TYU
ASD""123

Output required:
Print only lines which contain more than two " (double quote)
i.e.

"AA "345" SDF"
CC"123" "DERA
"12"DD"TYU

There are a number of ways to do this:

$ grep '".*".*"' myfile.txt

$ awk -F'"' 'NF>=4' myfile.txt

$ awk '/".*".*"/' myfile.txt

$ sed -n '/".*".*"/'p myfile.txt

$ sed -n '/\("\).*\1.*\1/p' myfile.txt
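
One more variation, sketched here with an adjustable threshold: awk's gsub returns the number of replacements it made, so it can count the quotes directly. The variable min is a hypothetical parameter, and the temporary file only recreates the sample input.

```shell
# Recreate the sample input in a temporary file (illustration only).
input=$(mktemp)
cat > "$input" <<'EOF'
"AA "345" SDF"
"BB DERT"
CC"123" "DERA
"12"DD"TYU
ASD""123
EOF

# gsub returns the number of replacements; "&" puts each quote back unchanged.
# min is a hypothetical parameter: print lines with more than min quotes.
matches=$(awk -v min=2 'gsub(/"/, "&") > min' "$input")
printf '%s\n' "$matches"
rm -f "$input"
```

Changing min is easier here than extending the regular expressions above with one more ".*" per extra quote.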

Friday, August 1, 2008

Edit a remote file using vi and scp - linux

unstableme.log is a file in the /root directory on server 172.21.20.104. To edit this file from your local system, the command is:

$ vim scp://root@172.21.20.104//root/unstableme.log

** This works best with key-based ssh access to the remote server 172.21.20.104; without it, expect a password prompt each time the file is read or written.

For more details on editing remote files via scp in vim read the vim tip from vim.org

© Jadu Saikia http://unstableme.blogspot.com