UNIX BASH scripting

Dedicated to all BASH newbies and Linux one-liner lovers. Useful AWK, SED and BASH one-liners.

Tuesday, November 10, 2009

Sum of numbers in file - UNIX alternatives

Input file:


$ cat /tmp/file.txt
286
255564800
609
146
671290

Required: Add (Sum) all the numbers present in the above file.

Way#1: This is probably the most popular way of adding up the numbers present in a particular field of a file (here the whole line, $0).

$ awk '{s+=$0} END {print s}' /tmp/file.txt
256237131
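
The same idiom works for any field, not just the whole line. A minimal sketch, assuming a made-up two-column sample file (the file contents and the variable names 'data' and 'total' are only for illustration):

```shell
# Hypothetical sample: a label in column 1, a number in column 2.
data=$(mktemp)
printf '%s\n' 'a 10' 'b 20' 'c 12' > "$data"

# Sum the second field; "s + 0" makes awk print 0 (not an empty line)
# when the input file is empty.
total=$(awk '{ s += $2 } END { print s + 0 }' "$data")
echo "$total"

rm -f "$data"
```

Changing $2 to another field number sums that column instead.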

Way#2: Using UNIX/Linux 'paste' command and 'bc'

$ paste -sd+ /tmp/file.txt
286+255564800+609+146+671290

$ paste -sd+ /tmp/file.txt | bc
256237131

Way#3: Using UNIX/Linux 'tr' command and 'bc'

$ tr -s '\n' '+' < /tmp/file.txt
286+255564800+609+146+671290+

$ echo $(tr -s '\n' '+' < /tmp/file.txt)
286+255564800+609+146+671290+

# The output above ends with an extra '+', so append a '0' to make it a valid expression:
$ echo $(tr -s '\n' '+' < /tmp/file.txt)0
286+255564800+609+146+671290+0

$ echo $(tr -s '\n' '+' < /tmp/file.txt)0 | bc
256237131

Way#4: Same as above but doing the arithmetic without using 'bc'

$ printf "%d\n" $(( $(tr -s '\n' '+' < /tmp/file.txt) 0 ))
256237131

Way#5: Using sed and 'bc'

$ sed 's/$/+/' /tmp/file.txt
286+
255564800+
609+
146+
671290+

$ echo $(sed 's/$/+/' /tmp/file.txt) 0
286+ 255564800+ 609+ 146+ 671290+ 0

$ echo $(sed 's/$/+/' /tmp/file.txt) 0 | bc
256237131

Way#6: A basic bash script using a for loop

sum=0
for num in $(cat /tmp/file.txt)
do
    ((sum+=num))
done
echo "$sum"
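
A variant of the loop above that reads the file line by line instead of expanding $(cat ...) could look like this. It is only a sketch: it builds its own temporary copy of the sample numbers (the 'tmpfile' name is an assumption) and uses POSIX arithmetic so it also runs under plain sh.

```shell
# Build a small sample file with the same numbers as /tmp/file.txt above.
tmpfile=$(mktemp)
printf '%s\n' 286 255564800 609 146 671290 > "$tmpfile"

sum=0
# Redirecting the file into the loop keeps the loop in the current shell,
# so 'sum' is still set after 'done'.
while IFS= read -r num
do
    # Skip blank lines so they do not break the arithmetic.
    [ -n "$num" ] || continue
    sum=$((sum + num))
done < "$tmpfile"

echo "$sum"
rm -f "$tmpfile"
```

For the sample numbers this prints 256237131, the same total as the other ways.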

Way#7: Using python

>>> total = 0
>>> lines = open("/tmp/file.txt", "r").readlines()
>>> lines
['286\n', '255564800\n', '609\n', '146\n', '671290\n']
>>> for line in lines:
...     total += int(line)    # int() ignores the trailing newline
...
>>> total
256237131

Related posts:

- 'Sum of' and 'group by' using awk
- Sum using awk substr function in bash
- Bash 'while loop' sum issue explained
- 'Exponential' value in awk sum output
- Python - adding numbers in a list

Saturday, November 7, 2009

Construct range from numbers - awk

Required:
With the numbers between 100 and 139 whose last digit is between 0 and 3, construct the following output:


100-103
110-113
120-123
130-133

Step by step solution:
1) Numbers between 100 and 139 whose last digit is between 0-3

$ seq 100 139 | grep '[0-3]$'

Output:

100
101
102
103
110
111
112
113
120
121
122
123
130
131
132
133

2) Join them into a single comma-separated line (the trailing '-' tells paste to read stdin, which some paste implementations require)

$ seq 100 139 | grep '[0-3]$' | paste -sd, -

Output:

100,101,102,103,110,111,112,113,120,121,122,123,130,131,132,133

3) Split the above into multiple sub-lines with each line containing 4 numbers

$ seq 100 139 | grep '[0-3]$' | paste -sd, | awk -F, '
{ for (i = 1; i <= NF; i++)
    printf("%s%s", $i, i % 4 ? "," : "\n")
}'

Output:

100,101,102,103
110,111,112,113
120,121,122,123
130,131,132,133

4) Print the first and last field

$ seq 100 139 | grep '[0-3]$' | paste -sd, | awk -F, '
{ for (i = 1; i <= NF; i++)
    printf("%s%s", $i, i % 4 ? "," : "\n")
}' | awk -F, '{print $1"-"$NF}'

Output:

100-103
110-113
120-123
130-133

I am sure there are better ways to achieve this; please comment.
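
One possible alternative, offered only as a sketch: let a single awk pass collapse each run of consecutive numbers into a "start-end" range, so no fixed group size of 4 is assumed. The variable names ('start', 'prev', 'ranges') are my own choice.

```shell
# Remember where the current run started; when the next number is not
# prev+1, print the finished run and start a new one.
ranges=$(seq 100 139 | grep '[0-3]$' | awk '
NR == 1        { start = prev = $1; next }
$1 != prev + 1 { print start "-" prev; start = $1 }
               { prev = $1 }
END            { if (NR) print start "-" prev }')

echo "$ranges"
```

With the input from step 1 this prints the same four ranges, 100-103 through 130-133.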

Related post:

- Break a line into multiple lines using awk and sed

Monday, November 2, 2009

Bash - numbering lines in file using awk

The input file 'file.txt' contains the names of a few students.


$ cat file.txt
Sam G
Ashok Niak
Rosy M
Peter K
Sid Thom
Rasi Yad
Papu S
Niaraj J
Aloh N K
Nipu H
Quam L

Required output:

For the entries of the above file,
- add a serial number to each line
- also add a 'House' number so that the students are grouped into 4 houses in total, in the following fashion:

Sl No,Name,House
1,Sam G,House1
2,Ashok Niak,House2
3,Rosy M,House3
4,Peter K,House4
5,Sid Thom,House1
6,Rasi Yad,House2
7,Papu S,House3
8,Niaraj J,House4
9,Aloh N K,House1
10,Nipu H,House2
11,Quam L,House3

The awk solution using the awk NR variable:

$ awk '
BEGIN {OFS=","; print "Sl No,Name,House"}
{print NR, $0, "House" ((NR-1)%4 + 1)}
' file.txt
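
The expression (NR-1)%4+1 is what makes the house numbers cycle 1,2,3,4,1,...; a plain NR%4 would give 0 for every fourth line. Here is a small sketch of the same idea with the number of houses passed in as an awk variable. The 'houses' variable and the temporary sample file are assumptions made for the example.

```shell
# Five sample names are enough to see the cycle wrap around.
names=$(mktemp)
printf '%s\n' 'Sam G' 'Ashok Niak' 'Rosy M' 'Peter K' 'Sid Thom' > "$names"

# (NR - 1) % houses runs 0..houses-1; adding 1 shifts it to 1..houses.
out=$(awk -v houses=4 '
BEGIN { OFS = "," }
{ print NR, $0, "House" ((NR - 1) % houses + 1) }' "$names")

printf '%s\n' "$out"
rm -f "$names"
```

The fifth name lands back in House1, as in the required output above.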

Let's format the output for a better look:

$ awk '
BEGIN {
    FORMAT = "%-8s%-18s%s\n"
    printf FORMAT, "Sl No", "Name", "House"
}
{ printf FORMAT, NR, $0, "House" ((NR-1)%4 + 1) }
' file.txt

Output:

Sl No   Name              House
1       Sam G             House1
2       Ashok Niak        House2
3       Rosy M            House3
4       Peter K           House4
5       Sid Thom          House1
6       Rasi Yad          House2
7       Papu S            House3
8       Niaraj J          House4
9       Aloh N K          House1
10      Nipu H            House2
11      Quam L            House3

Read about text alignment using awk printf function here

A Bash script for the same would be something like this (it needs bash, since a bare 'read' and 'i++' are not guaranteed to work in a plain POSIX sh):

#!/bin/bash
i=0
while read
do
    echo "$((i+1)),$REPLY,House$((i++ % 4 + 1))"
done < file.txt

Output:

$ bash numbering.sh
1,Sam G,House1
2,Ashok Niak,House2
3,Rosy M,House3
4,Peter K,House4
5,Sid Thom,House1
6,Rasi Yad,House2
7,Papu S,House3
8,Niaraj J,House4
9,Aloh N K,House1
10,Nipu H,House2
11,Quam L,House3

Now a question:
What is that '$REPLY' in the above script ?

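Answer: REPLY is the variable that bash's read builtin fills in when no variable name is supplied to it.

So the above script is almost the same as the one below. The only difference is that 'read line' strips leading and trailing whitespace from each line, while a bare 'read' keeps it.

#!/bin/bash
i=0
while read line
do
    echo "$((i+1)),$line,House$((i++ % 4 + 1))"
done < file.txt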
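
If the names may contain leading spaces or backslashes, a more defensive form of the loop is 'IFS= read -r line', which keeps each line exactly as it is in the file. A sketch, using a two-line sample file invented for the demonstration and POSIX arithmetic instead of 'i++':

```shell
input=$(mktemp)
# Sample lines: one with leading spaces, one containing a backslash.
printf '%s\n' '  Sam G' 'Ashok\Niak' > "$input"

i=0
out=$(
    # IFS= stops read from trimming whitespace; -r stops it from
    # treating backslashes as escape characters.
    while IFS= read -r line
    do
        i=$((i + 1))
        printf '%s,%s,House%s\n' "$i" "$line" "$(( (i - 1) % 4 + 1 ))"
    done < "$input"
)

printf '%s\n' "$out"
rm -f "$input"
```

Both the leading spaces and the backslash survive in the numbered output.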


In general, the lines of a file can be numbered in several ways:

Using the UNIX/Linux nl(1) command (number lines of files):

$ nl file.txt
1 Sam G
2 Ashok Niak
3 Rosy M
4 Peter K
5 Sid Thom
6 Rasi Yad
7 Papu S
8 Niaraj J
9 Aloh N K
10 Nipu H
11 Quam L

Using awk NR:

$ awk '{print "\t"NR"\t"$0}' file.txt
1 Sam G
2 Ashok Niak
3 Rosy M
4 Peter K
5 Sid Thom
6 Rasi Yad
7 Papu S
8 Niaraj J
9 Aloh N K
10 Nipu H
11 Quam L

Using sed syntax:

$ sed = file.txt | sed 'N;s/\n/\t/'
1 Sam G
2 Ashok Niak
3 Rosy M
4 Peter K
5 Sid Thom
6 Rasi Yad
7 Papu S
8 Niaraj J
9 Aloh N K
10 Nipu H
11 Quam L

Saturday, October 31, 2009

Extract range of lines using sed awk bash

Below are a few different ways to print or extract a section of a file based on line numbers.

Let's try to extract lines 27 to 99 (inclusive) of the input file 'file.txt'

Using sed editor:


$ sed -n '27,99 p' file.txt > /tmp/file1

Which is same as:

$ sed '27,99 !d' file.txt > /tmp/file2

Awk alternative: you can make use of the awk NR variable

$ awk 'NR >= 27 && NR <= 99' file.txt > /tmp/file3

Using the Linux/UNIX 'head' and 'tail' commands (with the standard -n option rather than the obsolete -NUM form):

$ head -n 99 file.txt | tail -n 73 > /tmp/file4

Which is basically:

$ head -n 99 file.txt | tail -n $(((99-27)+1)) > /tmp/file5
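
The head/tail arithmetic above can be wrapped in a small function. This is only a sketch; 'extract_range' is a hypothetical helper name and the sample file is generated with seq so that each line's content equals its line number.

```shell
# extract_range START END FILE: print lines START..END (inclusive) of FILE.
extract_range() {
    start=$1
    end=$2
    file=$3
    # Take the first END lines, then keep the last (END - START + 1) of them.
    head -n "$end" "$file" | tail -n "$((end - start + 1))"
}

sample=$(mktemp)
seq 1 120 > "$sample"

first=$(extract_range 27 99 "$sample" | head -n 1)
last=$(extract_range 27 99 "$sample" | tail -n 1)
count=$(extract_range 27 99 "$sample" | wc -l | tr -d ' ')
echo "first=$first last=$last count=$count"

rm -f "$sample"
```

Note that this approach assumes the file has at least END lines; on a shorter file 'tail' would count back from the real end of the file and return the wrong lines.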

In vi editor, we can use the following command in ex mode (open the main file 'file.txt' in vi):

:27,99 w! /tmp/file6

i.e. Write lines between line number 27 and line number 99 of main file 'file.txt' to file '/tmp/file6'

Perl alternative would be:

$ perl -ne 'print if 27..99' file.txt > /tmp/file7

And the solution using python:

$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2

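>>> fp = open("/tmp/file8", "w")
>>> for i, line in enumerate(open("file.txt")):
...     if 26 <= i < 99:    # enumerate() counts from 0, so this is lines 27..99
...         fp.write(line)
...
>>> fp.close()    # make sure the output is flushed to disk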

So the contents of all the output files produced (i.e /tmp/file[1-8]) will be the same (i.e. line number 27 to line number 99 of 'file.txt')
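
On a very large file it can be worth stopping as soon as line 99 has been printed instead of reading the rest of the input. A sketch of that idea for sed and awk, tried here on a generated sample file (the names 'sample', 'sed_out' and 'awk_out' are assumptions for the example):

```shell
sample=$(mktemp)
seq 1 1000 > "$sample"

# sed: print lines 27..99, then quit at line 99.
sed_out=$(sed -n -e '27,99p' -e '99q' "$sample")

# awk: exit once past line 99; otherwise print from line 27 on.
awk_out=$(awk 'NR > 99 { exit } NR >= 27' "$sample")

printf '%s\n' "$sed_out" | head -n 1
printf '%s\n' "$sed_out" | tail -n 1
rm -f "$sample"
```

Both commands produce the same 73 lines as the versions above; they just stop reading earlier.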

Friday, October 30, 2009

Bash while loop sum issue explained

In one of my directories I had a lot of log files, and I had to count the total number of lines starting with 's' (i.e. ^s).
My first approach was (using xargs -I {} in place of the deprecated -i option):


$ ls | xargs -I {} grep -c '^s' {} | awk '{sum+=$0} END {print sum}'
190978

And I got my result. Then I decided to do the same with a bash script, first using a for loop and then a while loop. This is what I tried:

#!/bin/bash

sum=0
DIR=~/original
for file in $(ls $DIR)
do
    Slines=$(grep -c ^s "$DIR/$file")
    ((sum+=Slines))
    # You can also use
    # sum=$(expr $sum + $Slines)
    # sum=`expr $sum + $Slines`
done
echo $sum

Executing it:

$ ./usingfor.sh
190978

Cool, correct result.

And then I modified the above script to use a bash while loop:

#!/bin/bash
sum=0
DIR=~/original
ls $DIR | while read file
do
    Slines=$(grep -c ^s "$DIR/$file")
    ((sum+=Slines))
done
echo $sum

Executing it:

$ ./usingwhile.sh
0


Oops! What went wrong?

In Bash, every command in a pipeline runs in its own subshell, so piping into the while loop makes the loop body run in a subshell.
So in the above example the 'sum' variable that the loop updates belongs to that subshell, and the modified value is lost when the loop exits. Back in the main script, 'sum' still holds the 0 we assigned at the beginning.
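
The effect is easy to reproduce without any log files. A minimal sketch, assuming bash with its default settings (some other shells, such as ksh or zsh, run the last part of a pipeline in the current shell and would behave differently); the variable names are made up for the demonstration:

```shell
# Case 1: loop fed by a pipe. The loop runs in a subshell, so the
# increments are made on a copy of 'count'.
count=0
printf 'a\nb\nc\n' | while read -r item
do
    count=$((count + 1))
done
piped=$count

# Case 2: loop fed by a redirection (here-document). The loop runs in
# the current shell, so 'count' keeps its new value.
count=0
while read -r item
do
    count=$((count + 1))
done <<EOF
a
b
c
EOF
redirected=$count

echo "piped=$piped redirected=$redirected"
```

Under bash the piped counter stays at 0 while the redirected one reaches 3.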

One solution to this variable scoping problem is to avoid the pipe:

Write the file names under the '~/original' directory to a temporary file and redirect that file into the while loop's stdin, as shown below.

#!/bin/bash
sum=0
DIR=~/original
ls $DIR > /tmp/filelist

while read file
do
    Slines=$(grep -c ^s "$DIR/$file")
    ((sum+=Slines))
done < /tmp/filelist
echo $sum

Executing it:

$ ./usingwhile_1.sh
190978

And the result is correct.
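
The temporary file list can also be avoided altogether by looping over a shell glob, which keeps everything in the current shell. This is a sketch of that alternative, not the script from the post: it creates its own scratch directory with two tiny sample logs (names and contents invented) instead of using '~/original'.

```shell
# Scratch directory with two made-up log files.
dir=$(mktemp -d)
printf 'start\nstop\nother\n' > "$dir/a.log"
printf 'sun\nmoon\n' > "$dir/b.log"

sum=0
for file in "$dir"/*
do
    # Skip anything that is not a regular file (e.g. sub-directories).
    [ -f "$file" ] || continue
    # grep -c exits non-zero when the count is 0; '|| true' keeps that
    # from aborting scripts that run with 'set -e'.
    slines=$(grep -c '^s' "$file" || true)
    sum=$((sum + slines))
done
echo "$sum"

# Cross-check: concatenate everything and count once.
total=$(cat "$dir"/* | grep -c '^s')
echo "$total"

rm -rf "$dir"
```

Both numbers come out as 3 for the sample files (two matching lines in a.log, one in b.log).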


© Jadu Saikia http://unstableme.blogspot.com