lucene, solr

Solr DataImportHandler preImportDeleteQuery gotcha

One handy feature of the DataImportHandler in Solr is that you can group documents by entity. In the MKB we import a few different kinds of entities: songs, albums, tvshows, etc. Sometimes we make a change or improvement to the underlying data for one type of entity and want to test it out. Instead of reimporting all the data, we can reimport just that one entity.

To do this correctly, we need to define a preImportDeleteQuery. When Solr runs a dataimport with the “clean” option selected, it removes existing documents before importing new ones, so you don’t end up with duplicates. By default, Solr deletes documents using the query “*:*”, which matches every document in the index. That is not what we want. We only want to delete the documents that belong to the entity being reimported. Every document in our index has a type field that identifies its entity, so all I have to do is specify a preImportDeleteQuery on that field for the entity in the DataImportHandler configuration file.
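As a rough sketch, an entity definition with a preImportDeleteQuery might look something like the following. The data source settings, the entity name, the SQL query, and the column names are all made up for illustration; only the type field and the preImportDeleteQuery attribute come from the description above.

```xml
<dataConfig>
  <!-- Hypothetical JDBC data source; driver, url, and user are placeholders -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mkb"
              user="solr"/>
  <document>
    <!-- With clean=true, only documents matching type:song are deleted
         before this entity is imported, instead of the default *:* -->
    <entity name="song"
            query="SELECT id, title, 'song' AS type FROM songs"
            preImportDeleteQuery="type:song">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="type" name="type"/>
    </entity>
  </document>
</dataConfig>
```

Assuming a configuration along these lines, a request to the handler with command=full-import&entity=song&clean=true should reimport only the song entity and leave the other types in the index alone.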
