Category Archives: UNIX

UNIX tip of the day – duplicate and replace lines with awk

Today I got some data I wanted to add to my machine learning training datasets for named entity recognition. My system is designed to be used with output from automatic speech recognition (ASR). It is frequently difficult to be certain whether ASR output will contain hyphens or not (e.g. email vs. e-mail), so frequently I […]
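The excerpt is cut off before the actual command, so the following is only a minimal sketch of the general idea, not necessarily the approach from the full post. It assumes a hypothetical input with one term per line, and prints each line as-is plus a de-hyphenated copy for any line containing a hyphen.

```shell
# Hypothetical sample input: one term per line.
# Rule 1 prints every line unchanged.
# Rule 2 fires only on lines containing a hyphen: it strips the hyphens
# and prints the modified copy right after the original.
expanded=$(printf 'e-mail\nphone\nwi-fi\n' |
  awk '{ print } /-/ { gsub(/-/, ""); print }')

printf '%s\n' "$expanded"
```

With this input, the hyphenated terms appear twice (once with and once without the hyphen), while `phone` appears only once.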

Posted in linguistics, UNIX

UNIX tip of the day – grep -P is slow

Unless you really need some advanced regular expressions only supported by PCRE, using POSIX regular expressions with grep is usually an order of magnitude faster – that’s because the default grep engine uses finite automata, whereas PCRE uses a backtracking algorithm (the main features you gain from the backtracking algorithm […]
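As a rough illustration, here is the same simple pattern written for both engines. The sample data is made up, and `-P` is a GNU grep extension that may not be available in every build, so the sketch falls back to a placeholder string in that case.

```shell
# Hypothetical sample data: two of the three lines contain digits.
sample=$(printf 'foo123\nbar\nbaz42\n')

# POSIX extended regular expression (default, automata-based engine).
count_ere=$(printf '%s\n' "$sample" | grep -cE '[0-9]+')

# The same match via PCRE. If this grep was built without -P support,
# record "unsupported" instead of failing.
count_pcre=$(printf '%s\n' "$sample" | grep -cP '\d+' 2>/dev/null || echo "unsupported")

echo "ERE matches:  $count_ere"
echo "PCRE matches: $count_pcre"
```

To see the speed difference for yourself, run the two grep commands against a large file of your own and time each one; a three-line sample like this is far too small to show anything.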

Posted in perl, regex, UNIX

UNIX tip – xargs with multiple commands

Xargs is an extremely powerful complement to the awesome find command. One downside is that you usually need to have a single pipeline. By default you can’t put together a bunch of commands which are not piped. However, it is possible to call a shell with xargs. In this way, you can execute multiple commands […]
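A minimal sketch of the shell-via-xargs idea, using throwaway files in a temporary directory (the file names are made up for the example). Each file name is handed to `sh -c` as `$1`, and the quoted script runs several unpiped commands on it.

```shell
# Create a scratch directory with two small hypothetical files.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/one.txt"
printf 'c\n' > "$workdir/two.txt"

# For each file found, start a shell that runs three separate commands.
# The "_" fills $0, so the file name passed by xargs lands in $1.
report=$(find "$workdir" -name '*.txt' -print0 |
  xargs -0 -n 1 sh -c '
    name=$(basename "$1")
    lines=$(wc -l < "$1")
    printf "%s %d\n" "$name" "$lines"
  ' _ | sort)

printf '%s\n' "$report"
rm -rf "$workdir"
```

Using `-print0` with `-0` keeps file names with spaces intact, and passing the name as a positional argument avoids splicing it into the script text.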

Posted in UNIX

Using awk to sum rows of numbers

I have a script which takes a tab-delimited file for regression tests, and converts it to XML. I want to do a sanity check, to make sure that the number of utterances in my XML files matches the number in the tab-delimited.txt file. I can do this in 2 lines in UNIX: robert_felty$ wc -l samples2.txt […]
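The second command is cut off in this excerpt, so here is just the generic awk summing idiom as a sketch. The three numbers are made-up stand-ins for per-file counts, one per line.

```shell
# Hypothetical per-file counts, one number per line.
# awk adds the first field of each line to a running total
# and prints the total once all input has been read.
total=$(printf '3\n5\n10\n' | awk '{ sum += $1 } END { print sum }')

echo "$total"
```

The total can then be compared by eye (or with `test`) against the line count reported by `wc -l` for the tab-delimited file.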

Posted in bash, linux, UNIX