Friday, October 22, 2010

Check equality of multiple numbers - awk


My input file 'file.txt' contains 4 values of a certain metric for each of the following 'Continents'.

$ cat file.txt
Continent Val1 Val2 Val3 Val4
AS 440518 440518 440516 440516
AF 253317 253317 253315 253317
EU 245397 245397 245397 245397
OC 226410 226410 226410 226410
NA 221961 221961 221962 221961

Required: Print only those 'Continents' for which all four values are the same.

Solutions:

1) Using awk:

$ awk '
/^Continent/ {print $1; next}
$2==$3 && $3==$4 && $4==$5 {print $1}
' file.txt

Output:

Continent
EU
OC

2) Using Python: the same check with a Python set (an unordered collection of unique elements). If the set built from a row's values has exactly one element, all the values are equal. Something like:

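# printequal.py: the built-in set replaces the deprecated 'sets' module
with open("file.txt") as infile:
    for line in infile:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if line.startswith('Continent'):
            print(fields[0])  # echo the header label
            continue
        # A set keeps unique elements only: one element means all values match
        if len(set(fields[1:])) == 1:
            print(fields[0])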

Executing it:

$ python printequal.py
Continent
EU
OC
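
One difference between the two solutions is worth noting: awk compares fields that look like numbers numerically, while the set-based script compares them as strings, so '0100' and '100' would count as different values there. Below is a minimal sketch of a numeric variant, assuming every value is an integer. The function name continents_with_equal_values is only illustrative and is not part of the original script.

```python
def continents_with_equal_values(lines):
    """Return the first field of each data row whose remaining fields are numerically equal."""
    matches = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "Continent":
            continue
        # Assumption: all values are integers. Converting them makes
        # "0100" and "100" compare equal, as they would in awk.
        values = {int(v) for v in fields[1:]}
        if len(values) == 1:
            matches.append(fields[0])
    return matches


sample = """Continent Val1 Val2 Val3 Val4
AS 440518 440518 440516 440516
AF 253317 253317 253315 253317
EU 245397 245397 245397 245397
OC 226410 226410 226410 226410
NA 221961 221961 221962 221961"""

print(continents_with_equal_values(sample.splitlines()))  # prints ['EU', 'OC']
```

Unlike the script above, this version does not echo the header label; it only returns the matching rows.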

3) Any other solutions using Bash, awk, or another scripting language? Readers, please post your solutions in the comments section. Much appreciated.

Related:
- Bash function to compare multiple numbers equality

Wednesday, October 20, 2010

Print future date using unix date command

I have already discussed in an earlier post how to print future and past dates using the UNIX date command. I am revisiting it to show a very interesting date and time that we are going to witness today.

From DATE(1) man pages, some of the options used in this example are:

%H hour (00..23)
%M minute (00..59)
%d day of month (e.g., 01)
%m month (01..12)
%Y year

There is an interesting moment in 2010: at 8:10 PM on 20th Oct 2010, the time and date will read 20:10 20/10 2010. Let's try to print this using the UNIX date command.
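
The date command used in the original post is not reproduced here. As a rough illustration of the format string only, Python's strftime accepts the same five codes listed above, so the layout can be sketched like this (the variable name moment is just illustrative):

```python
from datetime import datetime

# The moment described above: 20 Oct 2010, 8:10 PM.
moment = datetime(2010, 10, 20, 20, 10)

# %H:%M gives the time, %d/%m the day and month, %Y the year.
print(moment.strftime("%H:%M %d/%m %Y"))  # prints 20:10 20/10 2010
```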




Hope you found it interesting!

Tuesday, October 12, 2010

Awk - Print particular instances of a file

I would like to ask readers to suggest a better alternative (in any scripting language) for this problem. Thanks in advance.

My input file has the following format:
- An instance is a combination of one 'h' line, one 'v' line, and one or more 'i' lines.
- All lines starting with 'h' are header lines and the third field in that line is the 'header number'.
- All lines starting with 'v' are version lines and the second field in that line is the 'version number'.

$ cat file.txt
h,1,100
v,1
i,rt,200
i,rt,210
i,rt,810
h,1,101
v,5
i,rt,500
i,rt,700
h,1,100
v,2
i,rt,100
i,rt,910
h,1,500
v,1
i,rt,190
h,1,100
v,1
i,rt,900
i,rt,210
h,1,300
v,1
i,rt,800
i,rt,210

Required:
- Print all the 'i' lines associated with header number '100' and version number '1',
i.e. the required output is:

i,rt,200
i,rt,210
i,rt,810
i,rt,900
i,rt,210

The quickest solution I can think of is to prefix every 'i' line with its 'header number' and 'version number':

$ awk -F "," '$1=="h" {h_value=$NF}
$1=="v" {v_value=$NF}
$1=="i" {print h_value,v_value,$0}
' file.txt

Output:

100 1 i,rt,200
100 1 i,rt,210
100 1 i,rt,810
101 5 i,rt,500
101 5 i,rt,700
100 2 i,rt,100
100 2 i,rt,910
500 1 i,rt,190
100 1 i,rt,900
100 1 i,rt,210
300 1 i,rt,800
300 1 i,rt,210

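Then print only the lines with header number 100 and version number 1. The second awk uses the default whitespace separator, so $1 and $2 are the prefixed numbers and $NF is the original 'i' line: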

$ awk -F "," '$1=="h" {h_value=$NF}
$1=="v" {v_value=$NF}
$1=="i" {print h_value,v_value,$0}
' file.txt | awk '$1==100 && $2==1 {print $NF}'

Output:

i,rt,200
i,rt,210
i,rt,810
i,rt,900
i,rt,210

I am sure there is a better solution to this problem. Readers, please post your solutions in the comments section. Much appreciated.
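
One possible alternative is a single pass that remembers the most recent header and version numbers and keeps an 'i' line only when both match, which avoids the second awk. The following is a minimal Python sketch of that idea, not a definitive answer. The function name select_item_lines and its parameters are illustrative, and it assumes well-formed input in which every 'i' line is preceded by its 'h' and 'v' lines.

```python
def select_item_lines(lines, header="100", version="1"):
    """Return the 'i' lines whose most recent header and version numbers match."""
    current_header = None
    current_version = None
    selected = []
    for line in lines:
        line = line.strip()
        fields = line.split(",")
        if fields[0] == "h":
            current_header = fields[-1]   # header number is the last field
        elif fields[0] == "v":
            current_version = fields[-1]  # version number is the last field
        elif fields[0] == "i":
            if current_header == header and current_version == version:
                selected.append(line)
    return selected


sample = """h,1,100
v,1
i,rt,200
i,rt,210
i,rt,810
h,1,101
v,5
i,rt,500
i,rt,700
h,1,100
v,2
i,rt,100
i,rt,910
h,1,500
v,1
i,rt,190
h,1,100
v,1
i,rt,900
i,rt,210
h,1,300
v,1
i,rt,800
i,rt,210"""

for item in select_item_lines(sample.splitlines()):
    print(item)
```

The numbers are compared as strings here, so the header and version arguments should be written exactly as they appear in the file.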

Related posts:
- Count instances without specific line in UNIX using Python
- Print last instance of a file in UNIX
- Print first few instances of a file - python

© Jadu Saikia http://unstableme.blogspot.com