Monkey patching in python

I was just reading an article about Martijn Pieters, who is a Python expert, and he mentioned monkey patching.

I did not know what monkey patching was, so I googled it and found a great answer on Stack Overflow.

Basically, it takes advantage of Python's open attribute access. Unlike Java, which enforces strict access control, Python leaves all attributes and methods of a class mutable at runtime. So it is possible to write code like this:

from SomeOtherProduct.SomeModule import SomeClass  
def speak(self):  
    return "ook ook eee eee eee!"  
SomeClass.speak = speak

This can be particularly useful for unit testing, which the Stack Overflow answer also mentions:

For instance, consider a class that has a method get_data. This method does an external lookup (on a database or web API, for example), and various other methods in the class call it. However, in a unit test, you don’t want to depend on the external data source – so you dynamically replace the get_data method with a stub that returns some fixed data.
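To make that concrete, here is a minimal self-contained sketch of the idea – the class and method names are invented for illustration:

```python
class Reporter:
    """Hypothetical class whose get_data normally does an external lookup."""

    def get_data(self):
        # imagine a database or web API call here
        raise RuntimeError("no external data source in tests")

    def report(self):
        # other methods in the class call get_data
        return "report: " + self.get_data()


# In the unit test, monkey patch get_data with a stub returning fixed data
def stub_get_data(self):
    return "fixed data"

Reporter.get_data = stub_get_data

print(Reporter().report())  # prints: report: fixed data
```

Since the patch replaces the method on the class itself, every existing and future instance picks up the stub.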

Posted in python | Comments Off on Monkey patching in python

Java anchored regex

I just discovered this today when doing some regex work in Java. When I first started doing regex in Java, I was surprised to learn that Java (at least via String.matches()) treats all regular expressions as anchored. That is, if you have the string foobar and match it against "foo", it will not match. This is different from grep, perl, and other tools. In other words, for Java, the regexes "foo" and "^foo$" are equivalent.

If you want to find foo within foobar you need to use ".*foo.*" instead.
I discovered one more interesting tidbit. If you put explicit anchors in, leading and trailing parts of the regex that can match the empty string are effectively ignored.

Here are some examples:

// some tests which illustrate implicit anchoring  
"foobar".matches("foo"); //false - rewrite = "^foo$"  
"foobar".matches("bar"); //false - rewrite = "^bar$"  
"foobar".matches("foo.*"); //true - rewrite = "^foo.*$"  
"foobar".matches("bar.*"); //false - rewrite = "^bar.*$"  
"foobar".matches(".*foo.*"); //true - rewrite = "^.*foo.*$"  
"foobar".matches(".*bar.*"); //true - rewrite = "^.*bar.*$"  
"foobar".matches(".*oo.*"); //true - rewrite = "^.*oo.*$"  
// now some tests with optional characters before or after explicit anchors  
// optional characters before or after initial/final anchors have no effect  
"foobar".matches(".*^foo"); //false - rewrite = "^foo$"  
"foobar".matches(".*^foo.*"); //true - rewrite = "^foo.*$"  
"foobar".matches(".*^foo$.*"); //false - rewrite = "^foo$"  
"foobar".matches(".*^foobar$.*"); //true - rewrite = "^foobar$"  
"foobar".matches("[a-z]*^foobar$.*"); //true - rewrite = "^foobar$"  
"foobar".matches(".+^foobar$.*"); //false can't match a character before the beginning of the string
Posted in java, regex | Comments Off on Java anchored regex

Solr DataImportHandler preImportDeleteQuery gotcha

One handy feature of the DataImportHandler in Solr is that you can group documents by different entities. In the MKB we have a couple of different kinds of entities we import – songs, albums, tvshows, etc. Sometimes we make a change or improvement to the underlying data of one type of entity and want to test it out. Instead of reimporting all the data, we can reimport just that one entity.

To do this correctly, we need to define a preImportDeleteQuery. When Solr does a data import with the “clean” option selected, it removes all matching documents before importing new ones, so you don’t end up with duplicates. By default, Solr deletes documents using the query *:*, which deletes all documents. That is not what we want – we only want to delete documents of the type matching our entity. We have a type field in Solr for every document, so all I have to do is specify a preImportDeleteQuery for an entity in the DataImportHandler configuration file, like so:

 <entity name="sportTeams" dataSource="sports"  
 transformer="RegexTransformer" query="

Today I discovered a gotcha though – I left this off for one of my entities, and when doing a full import of all entities, I ended up with only the last entity. This was happening because that entity’s clean step was deleting all the documents which had just been imported. I was surprised by this, since I thought that the preImportDeleteQuery only applied when importing one entity at a time, not when importing all entities. Apparently I was wrong – so this is the gotcha:

If you use preImportDeleteQuery on any entity in your DataImportHandler configuration file, you should use it for all entities.
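Concretely, the safe configuration applies it to every entity – a sketch, where the entity names and type values are made up and the SQL queries are elided:

```xml
<document>
  <!-- every entity gets its own preImportDeleteQuery, so a clean
       import only ever deletes documents of that entity's type -->
  <entity name="songs" dataSource="music"
          preImportDeleteQuery="type:song"
          query="..."/>
  <entity name="sportTeams" dataSource="sports"
          preImportDeleteQuery="type:sportTeam"
          query="..."/>
</document>
```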

Posted in lucene, solr | Comments Off on Solr DataImportHandler preImportDeleteQuery gotcha

Pretty printing json

Here is a really simple way to pretty print some unformatted JSON:

$ echo '{"foo": "lorem", "bar": "ipsum"}' | python -mjson.tool  
    "bar": "ipsum",  
    "foo": "lorem"  
Posted in bash, python | Comments Off on Pretty printing json

Using awk to sum rows of numbers

I have a script which takes a tab-delimited file for regression tests and converts it to xml. I want to do a sanity check, to make sure that the number of utterances in my xml files matches the number in the tab-delimited .txt file. I can do this in two lines in UNIX:

robert_felty$ wc -l samples2.txt  
72148 samples2.txt  
robert_felty$ find . -name '*.xml' | xargs grep -c "<utterance lang='pt-br'" | cut -f 2 -d ':' | awk ' { sum +=$1 } END { print sum }'  

In the first line, I count the number of lines (there is a header line, so I will be expecting the sum to be one less than the line count).

In the next line, I find all the .xml files using find, then pipe that to xargs, where I use “grep -c” to count the number of matches to the utterance pattern in each file. grep -c outputs rows like filename:count.
I want to sum up all the counts, so I cut out just the count field using cut, then I use awk to sum up all the counts.
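For comparison, the same sanity check could live inside a script – a Python sketch, assuming the same directory layout and utterance pattern as above:

```python
import re
from pathlib import Path

PATTERN = re.compile(r"<utterance lang='pt-br'")

def count_utterances(root="."):
    """Sum matching lines across all .xml files under root,
    like the grep -c | cut | awk pipeline above."""
    total = 0
    for path in Path(root).rglob("*.xml"):
        with open(path) as f:
            total += sum(1 for line in f if PATTERN.search(line))
    return total
```

I still prefer the one-liner for interactive use, though.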

I love UNIX pipelines!

Posted in bash, linux, UNIX | Comments Off on Using awk to sum rows of numbers