I am now an Automattician!

Today is my first official day at Automattic. I am excited!

Posted in wordpress | 1 Comment

UNIX tip – xargs with multiple commands

xargs is an extremely powerful complement to the awesome find command. One downside is that xargs normally runs a single command for each set of arguments. By default you can’t give it a bunch of separate commands which are not piped together. However, it is possible to call a shell with xargs. That way you can execute multiple commands inside the shell, while from xargs’s point of view it is still calling a single command: the shell interpreter. More details here:

bash – xargs with multiple commands as argument – Stack Overflow

Here is an example for copying some log files:

find ../mca/logs -name '*20140713*' -type d | xargs -I XXX bash -c 'source=$1; target=${source/foo/bar}; rsync -rltvu "${source}/" "$target"' _ XXX
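For comparison, the same idea can be sketched in Python: derive the target from each matching directory and build one rsync command per directory. This is only a sketch under assumptions. The function name rsync_commands and its parameters are made up for illustration, it only builds the argument lists rather than running rsync, and unlike find, Path.rglob never matches the starting directory itself.

```python
from pathlib import Path


def rsync_commands(root, pattern="*20140713*", old="foo", new="bar"):
    """Build one rsync argument list per matching directory (nothing is run)."""
    commands = []
    for source in sorted(Path(root).rglob(pattern)):
        if not source.is_dir():  # same filter as `find -type d`
            continue
        # Like bash's ${source/foo/bar}: replace only the first occurrence.
        target = str(source).replace(old, new, 1)
        commands.append(["rsync", "-rltvu", f"{source}/", target])
    return commands


if __name__ == "__main__":
    for cmd in rsync_commands("../mca/logs"):
        print(" ".join(cmd))
```

Each returned list could then be passed to subprocess.run(cmd, check=True) to do the actual copy.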
Posted in UNIX | Comments Off on UNIX tip – xargs with multiple commands

Postgres tip of the day – show size of all databases

Here is a handy little query to show the size of all the databases on a particular Postgres server:

SELECT pg_database.datname,  
  pg_size_pretty(pg_database_size(pg_database.datname)) AS size  
FROM pg_database;  
  
   datname    |  size  
--------------+---------  
 template1    | 6369 kB  
 template0    | 6361 kB  
 postgres     | 6589 kB  
 foo          | 55 MB  
 bar          | 5129 MB  
 foobar       | 85 GB  
(6 rows)  
Posted in sql | Comments Off on Postgres tip of the day – show size of all databases

Unicode block names in regular expressions

Frequently, I find myself wanting to do some simple language detection. For Chinese, Japanese, and Korean, this can easily be done by looking at the types of characters in some text. The simplest and most robust way to do this is to use Unicode block names. It is very simple to write a regular expression which will test if a character is contained in a certain block.
For all the different possible blocks, see here:
Unicode block names for use in XSD regular expressions

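Here are some very simple one-liners for detecting katakana, hiragana and kanji. (Strictly speaking, Perl treats \p{Katakana}, \p{Hiragana} and \p{Han} as Unicode script properties. The block versions take an In prefix, as in \p{InKatakana}.)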

robert_felty$ echo "ア" | perl -CIO -nle 'if (/\p{Katakana}/) { print "this contains katakana\n";}'
this contains katakana

robert_felty$ echo "あ" | perl -CIO -nle 'if (/\p{Hiragana}/) { print "this contains hiragana\n";}'
this contains hiragana

robert_felty$ echo "安" | perl -CIO -nle 'if (/\p{Han}/) { print "this contains kanji\n";}'
this contains kanji

This style of character class for regex is supported in many languages, including Java and Perl. Note that it is not supported in Python’s default “re” module. There is an alternative module called “regex”, which does support it:
regex 2014.02.19 : Python Package Index
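
If installing a third-party module is not an option, a rough approximation is possible with the Python standard library alone. The sketch below is a different technique from \p{...} matching: it looks at the official Unicode character names via unicodedata.name. The function name scripts_in and the choice of name prefixes are my own assumptions, and the prefix test is deliberately crude (it ignores half-width katakana, for example).

```python
import unicodedata

# Assumption of this sketch: a character's Unicode name prefix is a good
# enough hint for its script. This is not an official script test.
SCRIPT_PREFIXES = {
    "katakana": "KATAKANA",
    "hiragana": "HIRAGANA",
    "kanji": "CJK UNIFIED IDEOGRAPH",
}


def scripts_in(text):
    """Return the set of labels whose name prefix matches any character."""
    found = set()
    for ch in text:
        name = unicodedata.name(ch, "")  # "" for unnamed characters
        for label, prefix in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                found.add(label)
    return found


if __name__ == "__main__":
    for sample in ["ア", "あ", "安", "abc"]:
        print(sample, sorted(scripts_in(sample)))
```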

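One final thought: don’t try to use hand-typed code point ranges, like [\x{4E00}-\x{9FBF}]. This is prone to error. The range above, for example, stops short of the end of the CJK Unified Ideographs block (U+9FFF) and leaves out the extension blocks entirely (such as Extension A, U+3400 to U+4DBF).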
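
To make the problem concrete, here is a small Python sketch that compares the range quoted above with a lookup based on Unicode character names. The helper is_han_by_name is a hypothetical name, and the name-prefix check is only a stand-in for a proper \p{Han} test, which Python's built-in re module does not offer.

```python
import re
import unicodedata

# The hand-typed range from the text above.
HAND_TYPED_RANGE = re.compile("[\u4E00-\u9FBF]")


def is_han_by_name(ch):
    """Crude stand-in for \\p{Han}: check the official character name."""
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


if __name__ == "__main__":
    # U+5B89 is inside the range; U+3400 and U+9FC3 are ideographs outside it.
    for ch in ["\u5B89", "\u3400", "\u9FC3"]:
        in_range = bool(HAND_TYPED_RANGE.search(ch))
        print(f"U+{ord(ch):04X} range={in_range} name={is_han_by_name(ch)}")
```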

Posted in bash, java, perl, python, regex | Comments Off on Unicode block names in regular expressions

Monkey patching in python

I was just reading an article about Martijn Pieters, who is a Python expert, and he mentioned monkey patching.

I did not know what monkey patching was, so I googled it and found a great answer on Stack Overflow.

Basically, it takes advantage of Python’s class access philosophy. Unlike Java, which has a strict access policy, Python lets you reassign the attributes and methods of a class at runtime. So it is possible to write code like this:

from SomeOtherProduct.SomeModule import SomeClass  
 
def speak(self):  
   return "ook ook eee eee eee!"  
 
SomeClass.speak = speak

This could be particularly useful for unit testing, which is also mentioned in the Stack Overflow answer:

For instance, consider a class that has a method get_data. This method does an external lookup (on a database or web API, for example), and various other methods in the class call it. However, in a unit test, you don’t want to depend on the external data source, so you dynamically replace the get_data method with a stub that returns some fixed data.
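
Here is a minimal sketch of that unit-testing scenario. Only the get_data name comes from the quoted answer. The Report class, its total method and the stub are hypothetical names invented for illustration.

```python
class Report:
    """Hypothetical class; only the get_data name comes from the text above."""

    def get_data(self):
        # Stand-in for a database or web API call we do not want in tests.
        raise RuntimeError("external lookup not available in tests")

    def total(self):
        return sum(self.get_data())


def fake_get_data(self):
    # Stub that returns fixed data instead of doing the external lookup.
    return [1, 2, 3]


def test_total_with_stub():
    original = Report.get_data
    Report.get_data = fake_get_data  # the monkey patch
    try:
        assert Report().total() == 6
    finally:
        Report.get_data = original  # always undo the patch


if __name__ == "__main__":
    test_total_with_stub()
    print("ok")
```

In real test suites, unittest.mock.patch.object(Report, "get_data", fake_get_data) does the same replacement and restores the original automatically, which is usually safer than patching by hand.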

Posted in python | Comments Off on Monkey patching in python