Using awk to sum rows of numbers

I have a script which takes a tab-delimited file for regression tests and converts it to XML. I want to do a sanity check to make sure that the number of utterances in my XML files matches the number in the tab-delimited .txt file. I can do this in 2 lines in UNIX:

robert_felty$ wc -l samples2.txt  
72148 samples2.txt  
robert_felty$ find . -name '*.xml' | xargs grep -c "<utterance lang='pt-br'" | cut -f 2 -d ':' | awk ' { sum +=$1 } END { print sum }'  
72147

In the first line, I count the number of lines in the .txt file (there is a header line, so I expect the XML count to be one fewer).
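
As a side note, the off-by-one can be avoided by skipping the header before counting. A minimal variation, assuming the header is just the first line of samples2.txt:

tail -n +2 samples2.txt | wc -l

which should print 72147 here, matching the XML total.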

In the next line, I find all the .xml files using find, then pipe that to xargs, where I use “grep -c” to count the number of matches to the utterance pattern. grep -c outputs rows like this:
filename:count
I want to sum up all the counts, so I cut out just the count field using cut, then use awk to add them up (though awk could handle both steps, as shown below).
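
Incidentally, awk can do the field splitting itself, which makes the cut step optional. Here is a sketch of the same pipeline with awk splitting on the colon and summing the last field:

find . -name '*.xml' | xargs grep -c "<utterance lang='pt-br'" | awk -F ':' ' { sum += $NF } END { print sum }'

Using $NF (the last field) means the count is picked up even if a file name happens to contain a colon.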

I love UNIX pipelines!
