Category: Computers
-
UNIX tip of the day – duplicate and replace lines with awk
Today I got some data I wanted to add to my machine learning training datasets for named entity recognition. My system is designed to be used with output from automatic speech recognition (ASR). It is often hard to be certain whether ASR output will contain hyphens (e.g. email vs. e-mail), so frequently I…
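One possible reading of "duplicate and replace" is sketched below: every line is kept, and any line containing a hyphen is followed by a copy with the hyphens removed. This is only an illustration under that assumption, not necessarily the command from the post; the sample sentences and the variable names `data` and `augmented` are made up.

```shell
#!/bin/sh
# Hypothetical training sentences, one per line.
data=$(printf 'send an e-mail\nhello world\n')

# Print every line unchanged; for lines that contain a hyphen,
# also print a duplicate in which all hyphens have been removed.
augmented=$(printf '%s\n' "$data" |
    awk '{ print } /-/ { copy = $0; gsub(/-/, "", copy); print copy }')

printf '%s\n' "$augmented"
```

With this sample, the hyphenated sentence appears twice (once as "e-mail", once as "email") and the other line is passed through untouched.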
-
Git tip – restoring “lost” commits
I ran into a git issue today where I thought I was ready to push a recent commit, and the push failed, saying that I was in the middle of a rebase. I don’t remember starting a rebase, but maybe I did. I tried git rebase --continue, but that didn’t work, so then I tried…
-
UNIX tip of the day: two file processing with AWK
I recently came across some AWK code from a work colleague that I did not understand at all:
awk -F'\t' -v OFS='\t' 'FNR==NR{a[$1]=$1;next};$1 in a{print $1,$2,$3}' file1 file2
I usually like to understand code instead of blindly copying and pasting, so I did a little research into what this was doing. Searching for “awk FNR…
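A small way to see what the one-liner above does is to run it on toy input. In the sketch below the file contents are invented for illustration, and `tmpdir` and `result` are hypothetical names. While awk reads the first file, FNR (the per-file record number) equals NR (the overall record number), so the first rule stores each key and skips ahead; for the second file only the second rule applies.

```shell
#!/bin/sh
# Made-up sample data: file1 holds keys, file2 holds tab-separated rows.
tmpdir=$(mktemp -d)
printf 'a\nc\n' > "$tmpdir/file1"
printf 'a\t1\tx\nb\t2\ty\nc\t3\tz\n' > "$tmpdir/file2"

# Rule 1 (FNR==NR): true only for the first file; remember column 1 as a key.
# Rule 2 ($1 in a): for the second file, print rows whose key was remembered.
result=$(awk -F'\t' -v OFS='\t' \
    'FNR==NR{a[$1]=$1;next};$1 in a{print $1,$2,$3}' \
    "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Here the rows keyed by a and c are printed and the row keyed by b is dropped, because b never appears in file1.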
-
UNIX tip of the day – trap EXIT
I was reading a shell script today and came across the trap command, which I was not aware of. Some googling led me to this article: How “Exit Traps” Can Make Your Bash Scripts Way More Robust And Reliable, which has a really nice explanation. Basically, trap acts sort of like a finally block in…
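As a minimal sketch of the "finally"-like behaviour, the snippet below registers a cleanup command with trap ... EXIT inside a subshell and then exits with a failure status. The names `marker`, `scratch`, `status`, and `scratch_path` are hypothetical, and the subshell stands in for a whole script so that the effect can be inspected afterwards.

```shell
#!/bin/sh
marker=$(mktemp)   # records the scratch directory path for later inspection
status=0

(
    scratch=$(mktemp -d)
    printf '%s\n' "$scratch" > "$marker"
    # Runs when this (sub)shell exits, whether it succeeds or fails.
    trap 'rm -rf "$scratch"' EXIT
    touch "$scratch/work.tmp"
    exit 1         # simulate a failure partway through the work
) || status=$?

scratch_path=$(cat "$marker")
rm -f "$marker"

if [ -e "$scratch_path" ]; then
    echo "scratch directory still exists"
else
    echo "scratch directory was removed (exit status $status)"
fi
```

The cleanup runs even though the subshell bailed out early, which is the property that makes an exit trap convenient for temporary files.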
-
UNIX tip of the day – grep -P is slow
Unless you really need advanced regular expression features that only PCRE supports, using POSIX regular expressions with grep is usually an order of magnitude faster. That’s because grep’s default engine uses finite automata, as opposed to the backtracking algorithm that PCRE uses (the main features you gain from the backtracking algorithm…
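For patterns that need no PCRE-only features, the same expression can usually be handed to grep -E unchanged. The sketch below uses an invented sample and the hypothetical names `sample`, `posix_hits`, and `pcre_hits`; it checks whether the local grep accepts -P at all, since not every grep build includes PCRE support.

```shell
#!/bin/sh
# Made-up input lines.
sample=$(printf 'e-mail\nemail\nfemale\n')

# POSIX extended regular expression: optional hyphen, anchored at both ends.
posix_hits=$(printf '%s\n' "$sample" | grep -E '^e-?mail$')
printf '%s\n' "$posix_hits"

# Only try -P if this grep was built with PCRE support.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    pcre_hits=$(printf '%s\n' "$sample" | grep -P '^e-?mail$')
    [ "$pcre_hits" = "$posix_hits" ] && echo "grep -P finds the same lines"
else
    echo "grep -P is not available here"
fi
```

To compare speed on your own data, run both forms under time against a large file; the difference depends on the pattern, the input, and the grep version.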