Monthly Archives: November 2014

Unicode block names in regular expressions

Frequently, I find myself wanting to do some simple language detection. For Chinese, Japanese, and Korean, this can easily be done by looking at which kinds of characters appear in the text. The simplest and most robust way to do this is to use Unicode block names. It is very easy to write a regular expression […]
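As an illustration of the idea, here is a minimal sketch in Java, whose `java.util.regex` engine accepts Unicode block names with an `In` prefix (for example `\p{InHiragana}`). The class name `CjkDetector`, the method `guess`, and the decision order (any Hangul means Korean, any kana means Japanese, Han ideographs alone mean Chinese) are hypothetical choices made for this example, not the code from the full post.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical, rough CJK language guesser based on Unicode block names.
 * The decision order is an illustrative heuristic, not the original post's code.
 */
final class CjkDetector {
    // Hangul syllables indicate Korean (Hangul Jamo blocks are ignored for simplicity).
    private static final Pattern HANGUL = Pattern.compile("\\p{InHangulSyllables}");
    // Hiragana and Katakana appear in Japanese but not in Chinese or Korean running text.
    private static final Pattern KANA = Pattern.compile("[\\p{InHiragana}\\p{InKatakana}]");
    // Han ideographs are shared by Chinese and Japanese, so they are checked last.
    private static final Pattern HAN = Pattern.compile("\\p{InCJKUnifiedIdeographs}");

    private CjkDetector() {}

    /** Returns "ko", "ja", "zh", or "unknown" (assumed labels for this sketch). */
    static String guess(String text) {
        if (HANGUL.matcher(text).find()) {
            return "ko";
        }
        if (KANA.matcher(text).find()) {
            return "ja";
        }
        if (HAN.matcher(text).find()) {
            return "zh";
        }
        return "unknown";
    }
}
```

Checking kana before Han matters because Japanese text mixes kanji with kana, so only text containing Han ideographs and no kana is treated as Chinese. This is a rough heuristic: a short Japanese string written entirely in kanji would be misclassified, and only the main blocks are checked (extension blocks such as CJK Unified Ideographs Extension A, half-width katakana, and Hangul Jamo are ignored). For example, `CjkDetector.guess("日本語のテキスト")` returns `"ja"`.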

Posted in bash, java, perl, python, regex