Elasticsearch type gotcha

I was recently updating some Elasticsearch tests we have for WordPress.com. We have so-called isolated tests, which use a real MySQL database and a real Elasticsearch test cluster. We create sample blogs, posts, and so forth, and store them in the database, which triggers our hooks for indexing into Elasticsearch. Then we make search queries and check that we get the expected results. This is all done with PHPUnit.

While creating a new test for a boolean field, I first tried to get the test to fail before getting it to pass, so that I would know the test was actually testing the right thing. I was surprised to find that it wasn’t failing. It was telling me that true == false. Strange. I then discovered that the test was using assertEquals instead of assertSame, and under that loose comparison the string "false" is equal to true through type conversion. In PHP, any non-empty string other than "0" evaluates to true. After switching to assertSame, my test started working as expected.

This change was made inside a function used by many other tests, so I expected many of them to start failing too. Surprisingly, fewer than a dozen tests out of thousands failed, and they were mostly string-to-number comparisons, e.g. "1" === 1.

Digging further, I discovered several places in our indexing code where numbers were being sent as strings. But when I looked at the field mapping, these fields were clearly defined as float or integer fields. I would have expected an Elasticsearch exception when indexing a string into an integer field. Apparently not. Does that mean the value is being stored as a string? I then tried some range queries, which I expected to fail. They passed as well. Then I ran a query with both a fields parameter and _source=true, and I saw the difference. The value returned from fields was an integer, while the value from _source was a string. The _source is the exact JSON passed in during indexing.

And this is how I learned that there is a coerce setting for Elasticsearch, which is true by default. Thus the coercion is done for the stored field, but the source remains unchanged. I am a bit surprised that I didn’t already know this, but I didn’t. Now I do, and knowing is half the battle. Hopefully this helps other people who run into the same issue.
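
To make the coerce behavior concrete, here is a minimal Python sketch. It is not Elasticsearch code and not our indexing code: the field name like_count is made up, the mapping bodies follow the typeless format used by recent Elasticsearch versions, and simulate_index is a toy model of what I observed (an unchanged _source next to a coerced indexed value). It only handles integer strings.

```python
import json

# Hypothetical mappings for illustration; "like_count" is an assumed field name.
# Leaving out "coerce" means the default (true) applies.
LENIENT_MAPPING = {
    "mappings": {"properties": {"like_count": {"type": "integer"}}}
}

# Same field, but with coercion switched off.
STRICT_MAPPING = {
    "mappings": {
        "properties": {"like_count": {"type": "integer", "coerce": False}}
    }
}


def simulate_index(doc, mapping):
    """Toy model of indexing one document.

    Returns (source, indexed): source is the document exactly as sent,
    indexed holds the values after coercion. Raises ValueError when a
    string is sent to an integer field whose mapping disables coerce.
    """
    props = mapping["mappings"]["properties"]
    indexed = {}
    for field, value in doc.items():
        spec = props.get(field, {})
        if spec.get("type") == "integer" and isinstance(value, str):
            if not spec.get("coerce", True):
                raise ValueError(f"{field}: string rejected, coerce is false")
            value = int(value)  # the coercion step
        indexed[field] = value
    return dict(doc), indexed


if __name__ == "__main__":
    source, indexed = simulate_index({"like_count": "5"}, LENIENT_MAPPING)
    print("_source:", json.dumps(source))
    print("indexed:", json.dumps(indexed))
    print("strict mapping body:", json.dumps(STRICT_MAPPING))
```

Setting coerce to false on a numeric field, as in STRICT_MAPPING, is how you would get the exception I originally expected when a string is sent to an integer field.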
