Category: linguistics
-
UNIX tip of the day: duplicate and replace lines with awk
Today I got some data I wanted to add to my machine learning training datasets for named entity recognition. My system is designed to be used with output from automatic speech recognition (ASR). It is often hard to know in advance whether ASR output will contain hyphens (e.g. email vs. e-mail), so frequently I…
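The excerpt is cut off before the actual awk command, so the following is only a minimal sketch of the general idea the title suggests: keep every line and, for lines that contain a hyphen, also emit a hyphen-free copy. It is written in Python rather than awk, the function name `add_hyphen_variants` and the sample sentences are hypothetical, and it assumes plain text lines with no label column (NER tags such as B-PER contain hyphens of their own and would need to be left alone).

```python
def add_hyphen_variants(lines):
    """Yield each line, followed by a hyphen-free copy if it contains a hyphen.

    Assumption: lines are plain text with no tag column, so every hyphen
    on a line can be dropped safely.
    """
    for line in lines:
        yield line
        if "-" in line:
            # Duplicate the line with all hyphens removed (e-mail -> email).
            yield line.replace("-", "")


# Hypothetical sample data standing in for ASR-style training sentences.
sample = ["send an e-mail to john", "call me tomorrow"]
augmented = list(add_hyphen_variants(sample))

for line in augmented:
    print(line)
```

Lines without a hyphen pass through unchanged, so the augmented set only grows by the number of hyphenated lines.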
-
Vim regex-fu for LaTeX
When writing a beamer presentation with LaTeX, I organize my presentation into sections and subsections. Frequently, the title of the first frame (slide) in a subsection has the same name as the subsection. Let’s say I start off with the following structure:
\section[corpora]{Accessing text corpora}
\subsection[gutenberg]{The Gutenberg Corpus}
\subsection[chat]{The web and chat Corpus}
\subsection[brown]{The Brown…
-
Why doesn’t Mac update standard UNIX utilities?
I am currently teaching a course on programming for linguists. We are using Python, but for the first few classes, I have been going over some standard UNIX utilities like cd and ls, plus using regular expressions with grep and sed. I actually don’t use sed that much. I tend to reach for perl,…
-
Bash one-liners to the rescue
Lately I find myself using handy bash one-liners more and more. I think this is where Unix/Linux can really start to shine. There are so many little programs that each do one thing and do it well. But the ability to combine these together through pipes means you have extremely flexible and powerful…