Category Archives: bash

Unicode block names in regular expressions

Frequently, I find myself wanting to do some simple language detection. For Chinese, Japanese, and Korean, this can easily be done by looking at the types of characters in some text. The simplest and most robust way to do this is to use Unicode block names. It is very simple to write a regular expression […]
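As a rough illustration of the idea, here is a minimal sketch in Python. Perl and Java regular expressions accept block names such as \p{InHiragana} directly, but Python's built-in re module does not, so this sketch spells out the code point ranges of the relevant blocks instead. The function name guess_cjk_language and the returned language codes are hypothetical, chosen only for this example; the kana check runs first on the assumption that Japanese text mixes kana with Han characters.

```python
import re

# Code point ranges of the Unicode blocks used below.
# Python's built-in `re` has no \p{In...} syntax, so the ranges are written out.
KANA = re.compile(r"[\u3040-\u309F\u30A0-\u30FF]")   # Hiragana + Katakana
HANGUL = re.compile(r"[\uAC00-\uD7AF]")              # Hangul Syllables
HAN = re.compile(r"[\u4E00-\u9FFF]")                 # CJK Unified Ideographs


def guess_cjk_language(text):
    """Return 'ja', 'ko', 'zh', or None, based on which blocks appear in text.

    This is a heuristic sketch: kana is checked first because Japanese
    text usually contains Han characters as well.
    """
    if KANA.search(text):
        return "ja"
    if HANGUL.search(text):
        return "ko"
    if HAN.search(text):
        return "zh"
    return None


if __name__ == "__main__":
    for sample in ["こんにちは", "안녕하세요", "你好", "hello"]:
        print(sample, guess_cjk_language(sample))
```

Text that contains only Han characters is reported as Chinese here, which will misclassify a Japanese string written entirely in kanji; a real detector would need more than block membership.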

Posted in bash, java, perl, python, regex

Pretty printing json

Here is a really simple way to pretty print some unformatted JSON:

    $ echo '{"foo": "lorem", "bar": "ipsum"}' | python -mjson.tool
    {
        "bar": "ipsum",
        "foo": "lorem"
    }
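The same formatting can be produced from inside a Python script with the standard json module. This is a small sketch rather than a description of what json.tool does internally; the indent and sort_keys arguments are chosen here to give four-space indentation and alphabetically ordered keys.

```python
import json

# Unformatted JSON text, as it might arrive from a file or a pipe.
raw = '{"foo": "lorem", "bar": "ipsum"}'

# Parse the text, then serialize it again with indentation and sorted keys.
pretty = json.dumps(json.loads(raw), indent=4, sort_keys=True)
print(pretty)
```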

Posted in bash, python

Using awk to sum rows of numbers

I have a script which takes a tab-delimited file for regression tests and converts it to XML. As a sanity check, I want to make sure that the number of utterances in my XML files matches the number in the tab-delimited .txt file. I can do this in 2 lines in UNIX: robert_felty$ wc -l samples2.txt […]

Posted in bash, linux, UNIX