Yearly Archives: 2014

Unicode block names in regular expressions

Frequently, I find myself wanting to do some simple language detection. For Chinese, Japanese, and Korean, this can easily be done by looking at the types of characters in some text. The simplest and most robust way to do this is to use Unicode block names. It is very simple to write a regular expression […]
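As a rough illustration of the character-type idea, here is a minimal Python sketch. It is not the regular expression from the post: Python's standard `re` module does not accept block names such as `\p{InHiragana}`, so the sketch spells out the code point ranges of the relevant Unicode blocks by hand. The helper name `guess_cjk_language` and its decision order are assumptions made for this example.

```python
import re

# Block ranges written out by hand, because the standard-library re module
# does not understand block names such as \p{InHiragana}.
KANA = re.compile(r"[\u3040-\u309F\u30A0-\u30FF]")  # Hiragana + Katakana blocks
HANGUL = re.compile(r"[\uAC00-\uD7AF]")             # Hangul Syllables block
HAN = re.compile(r"[\u4E00-\u9FFF]")                # CJK Unified Ideographs block


def guess_cjk_language(text):
    """Return 'ja', 'ko', 'zh', or None (hypothetical helper, crude heuristic)."""
    if KANA.search(text):
        return "ja"   # kana only occurs in Japanese text
    if HANGUL.search(text):
        return "ko"
    if HAN.search(text):
        return "zh"   # Han characters with no kana: assume Chinese
    return None


print(guess_cjk_language("こんにちは"))
```

Kana is checked first because Japanese text usually mixes kana with Han characters. A Japanese string written only in kanji would still be guessed as Chinese, so treat this as a first approximation.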

Posted in bash, java, perl, python, regex

Monkey patching in python

I was just reading an article about Martijn Pieters, who is a Python expert, and he mentioned monkey patching. I did not know what monkey patching was, so I googled it and found a great answer on Stack Overflow. Basically, it takes advantage of Python’s class access philosophy. Unlike Java, which has a strict access […]
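A minimal sketch of what monkey patching looks like in practice may help. The `Greeter` class and the `shout` function are made up for this example; the point is only that a method on an existing class can be rebound at runtime.

```python
class Greeter:
    def greet(self):
        return "hello"


def shout(self):
    # Replacement behaviour, defined outside the class.
    return "HELLO!"


g = Greeter()
before = g.greet()

# The monkey patch: rebind the attribute on the class at runtime.
Greeter.greet = shout

# Instances created before the patch pick up the new method too,
# because the method is looked up on the class at call time.
after = g.greet()

print(before, after)
```

Because the patch changes the class for every caller in the process, it is usually kept for tests or for working around code you cannot edit.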

Posted in python

Java anchored regex

I just discovered this today when doing some regex in Java. When I first started doing regex in Java, I was surprised to learn that Java seems to treat all regular expressions as anchored. That is, if you have the string “foobar” and search for “foo”, it will not match. This is different from grep, […]
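The difference between matching a whole string and searching inside it can be sketched in Python, used here only as an analogy and not as the Java code from the post. In Java, `String.matches` and `Matcher.matches` require the pattern to cover the entire input, while `Matcher.find` looks for a match anywhere. Python's `re.fullmatch` and `re.search` behave in roughly the same two ways.

```python
import re

text = "foobar"

# Whole-string match: the pattern must consume the entire input,
# so "foo" alone does not match "foobar".
whole = re.fullmatch(r"foo", text)

# Substring search: the pattern may match anywhere, like grep.
anywhere = re.search(r"foo", text)

# Padding the pattern with .* makes the whole-string match succeed.
padded = re.fullmatch(r".*foo.*", text)

print(whole, anywhere, padded)
```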

Posted in java, regex

Solr DataImportHandler preImportDeleteQuery gotcha

One handy feature of the DataImportHandler in Solr is that you can group documents by different entities. In the MKB we have a couple of different kinds of entities we import – songs, albums, tvshows, etc. Sometimes we make a change or improvement to the underlying data of one type of entity, and want to test […]

Posted in lucene, solr

Pretty printing json

Here is a really simple way to pretty print some unformatted json:

  $ echo '{"foo": "lorem", "bar": "ipsum"}' | python -mjson.tool
  {
      "bar": "ipsum",
      "foo": "lorem"
  }
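The same result can be had from inside a Python script, since `json.tool` is a thin command-line wrapper around the standard `json` module. This is a small sketch; the variable names are arbitrary, and `sort_keys=True` is chosen here only to reproduce the alphabetical key order shown in the output above.

```python
import json

raw = '{"foo": "lorem", "bar": "ipsum"}'

# Parse the compact JSON, then serialise it again with indentation.
pretty = json.dumps(json.loads(raw), indent=4, sort_keys=True)

print(pretty)
```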

Posted in bash, python