Wednesday, February 10, 2010

Bash - find diff based on a field


Input files:

$ cat file1.txt
timestamp,records,attempts
1264723200,96,6117
1264809600,856,7298
1264896000,904,8628
1264982400,0,0
1265155200,644,1786

$ cat file2.txt
timestamp,records,attempts
1264723200,96,6117
1264809600,859,7298
1264896000,904,8628
1264982400,0,0
1265155200,641,1786

Required:

If file1.txt and file2.txt do not have same values in the 2nd field (i.e 'records' field) in any one of the lines (both the files are sorted on first field), print "BAD", else "GOOD".

As the above two files are already sorted based on first field (i.e. timestamp), we can use join command to join the lines of the two files and then use awk to perform the comparison between two fields of the joined output.

$ join -t, -1 1 -2 1 file1.txt file2.txt

Output:
timestamp,records,attempts,records,attempts
1264723200,96,6117,96,6117
1264809600,856,7298,859,7298
1264896000,904,8628,904,8628
1264982400,0,0,0,0
1265155200,644,1786,641,1786

Lines for which the 2nd and 4th fields in the above output do not match.

$ join -t, -1 1 -2 1 file1.txt file2.txt | awk '$2 != $4' FS=\,
1264809600,856,7298,859,7298
1265155200,644,1786,641,1786

How many lines like that ?

$ join -t, -1 1 -2 1 file1.txt file2.txt | awk '$2 != $4 {++c} END {print c}' FS=\,
2

or you can use 'wc -l' for the count

$ join -t, -1 1 -2 1 file1.txt file2.txt | awk '$2 != $4' FS=\, | wc -l
2

Now based on the above count, you can conclude if there is difference between file1.txt and file2.txt based on the 2nd field (i.e. records).

$ unmatched_count=$(join -t, -1 1 -2 1 file1.txt file2.txt | awk '$2 != $4 {++c} END {print c}' FS=\,)
$ [ -z $unmatched_count ] && echo "OK" || echo "BAD"

Output:
BAD

or

$ unmatched_count=$(join -t, -1 1 -2 1 file1.txt file2.txt | awk '$2 != $4' FS=\, | wc -l)
$ [ $unmatched_count -eq 0 ] && echo "OK" || echo "BAD"

Output:
BAD

Related posts:

- Bash join command explained
- Perform outer join using awk
- Performing join using awk
- Search and print output using awk

© Jadu Saikia http://unstableme.blogspot.com