UNIX tip of the day – grep -P is slow

Unless you really need advanced regular-expression features that only PCRE supports, using POSIX regular expressions with grep is usually an order of magnitude faster. That’s because grep’s default engine uses finite automata, whereas PCRE uses a backtracking algorithm. The main features you gain from backtracking are lookahead/lookbehind and backreferences.
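To make the feature trade-off concrete, here is a minimal sketch. It assumes GNU grep built with PCRE support (`grep -P` is not available everywhere; BSD/macOS grep, for instance, lacks it). The price lines and the function names `usd_amounts` and `amounts_after_label` are made up for illustration. POSIX basic and extended regular expressions have no lookaround syntax, so patterns like these are the kind of case where -P is worth its cost.

```bash
#!/usr/bin/env bash
# Assumes GNU grep compiled with PCRE support (grep -P).
# Sample input and function names are hypothetical.
sample=$'price: 100 USD\nprice: 200 EUR\n'

# Lookahead: print only numbers that are followed by " USD".
usd_amounts() { grep -oP '\d+(?= USD)'; }

# Lookbehind: print only numbers that are preceded by "price: ".
amounts_after_label() { grep -oP '(?<=price: )\d+'; }

printf '%s' "$sample" | usd_amounts           # prints: 100
printf '%s' "$sample" | amounts_after_label   # prints: 100 and 200, one per line
```

If your pattern uses none of these constructs, as with the `post:content.*facebook` pattern in this post, there is nothing to gain from -P.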

Here’s a small example, running the same pattern over the same files with each engine:


$ time grep -E 'post:content.*facebook' a_bunch_of_files* | wc -l
1643
real 0m2.643s
user 0m1.304s
sys 0m1.306s


$ time grep -P 'post:content.*facebook' a_bunch_of_files* | wc -l
1643
real 0m29.542s
user 0m28.365s
sys 0m1.04s
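If you want to repeat the comparison on your own machine, here is a rough sketch. It assumes bash (for the `time` keyword) and GNU grep with -P support. The synthetic file, its 200,000 lines and the one-in-ten match rate are made up, and absolute timings depend on your grep and PCRE versions, so only the relative difference is meaningful. It uses `grep -c` instead of piping to `wc -l`; both count matching lines.

```bash
#!/usr/bin/env bash
# Rough re-run of the -E vs -P comparison on synthetic data.
# Assumptions: bash, GNU grep with -P; file contents are made up.
set -euo pipefail

file=$(mktemp)
trap 'rm -f "$file"' EXIT

# 200,000 lines; every 10th line mentions "facebook" after "post:content".
awk 'BEGIN {
  for (i = 0; i < 200000; i++) {
    via = (i % 10 == 0) ? "facebook" : "email"
    print "post:content id=" i " shared via " via
  }
}' > "$file"

echo "ERE (-E):"
time grep -cE 'post:content.*facebook' "$file"

echo "PCRE (-P):"
time grep -cP 'post:content.*facebook' "$file"
```

Both commands should report the same count. As in the numbers above, any gap should show up mostly in user time, where the matching work happens, while sys time stays roughly the same.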

Note that the -E flag selects “extended” regular expressions (ERE) instead of the default “basic” ones (BRE). It only changes the syntax, that is, which characters are special by default; it does not switch grep to the PCRE engine. With -E, a pipe “|” means OR. Without -E, a pipe “|” is an ordinary character, and to get the “special” meaning of OR you have to escape it as “\|” (a GNU grep extension).
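A quick way to see the syntax difference, assuming GNU grep (the escaped `\|` form in basic mode is a GNU extension, not POSIX). The sample lines are made up.

```bash
#!/usr/bin/env bash
# Assumes GNU grep. Sample input is hypothetical: three words plus a line with a literal pipe.
sample=$'cat\ndog\nbird\ncat|dog\n'

# ERE: "|" is alternation, so this prints cat, dog and cat|dog.
printf '%s' "$sample" | grep -E 'cat|dog'

# BRE: "|" is an ordinary character, so only the literal line cat|dog matches.
printf '%s' "$sample" | grep 'cat|dog'

# BRE with an escaped pipe: alternation again, same output as the -E version.
printf '%s' "$sample" | grep 'cat\|dog'
```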
