• UNIX tip of the day: two file processing with AWK

    I recently came across some AWK code from a work colleague that I did not understand at all:

        awk -F'\t' -v OFS='\t' 'FNR==NR{a[$1]=$1;next};$1 in a{print $1,$2,$3}' file1 file2

    I usually like to understand code instead of blindly copying and pasting, so I did a little research into what this was doing. Searching for “awk FNR…
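
    As a rough illustration of the one-liner quoted above, here is a minimal sketch run against two small made-up files. The file contents, the temporary directory, and the array name seen are assumptions for the example, not taken from the original post. While awk reads the first file, FNR (the per-file line number) equals NR (the overall line number), so the first rule records each key and skips ahead; for the second file the condition is false, and only rows whose first field was recorded get printed.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'a\nc\n' > "$tmpdir/file1"                      # keys to keep
printf 'a\t1\tx\nb\t2\ty\nc\t3\tz\n' > "$tmpdir/file2" # tab-separated rows

result=$(awk -F'\t' -v OFS='\t' '
  # First file only: FNR == NR holds, so remember the key and skip to the next line.
  FNR == NR { seen[$1] = 1; next }
  # Second file: print the first three fields of rows whose key was seen.
  $1 in seen { print $1, $2, $3 }
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

    With this sample data, the rows keyed a and c are printed and the row keyed b is dropped. One caveat of the idiom: if the first file is empty, FNR == NR also holds for the second file, so nothing is filtered as intended.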


  • UNIX tip of the day – trap EXIT

    I was reading a shell script today and came across the trap command, which I was not aware of. Some googling led me to this article: How “Exit Traps” Can Make Your Bash Scripts Way More Robust And Reliable, which has a really nice explanation. Basically, trap acts sort of like a finally block in…
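
    To make the finally-block comparison concrete, here is a minimal sketch, not taken from the linked article. The scratch file name and the exit status of 3 are arbitrary choices for the example. A subshell registers an EXIT trap that deletes its scratch file, then exits with an error; the cleanup still runs, and the subshell's exit status is preserved.

```shell
workdir=$(mktemp -d)
scratch="$workdir/scratch.txt"   # hypothetical temporary file
status=0

(
  # Register the cleanup first; it runs whenever this subshell exits.
  trap 'rm -f "$scratch"' EXIT
  echo "temporary data" > "$scratch"
  exit 3                         # simulate a failure partway through
) || status=$?

# Check whether the trap removed the scratch file.
if [ -e "$scratch" ]; then leftover=yes; else leftover=no; fi
echo "exit status: $status, scratch file left behind: $leftover"
rmdir "$workdir"
```

    The same pattern applies to a whole script: setting the trap near the top means the cleanup command runs however the script ends up exiting.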


  • UNIX tip of the day – grep -P is slow

    Unless you really need some advanced regular expressions only supported by PCRE, using POSIX regular expressions with grep is usually an order of magnitude faster – that’s because the default engine with grep uses finite automata, as opposed to a backtracking algorithm which PCRE uses (the main features you gain from the backtracking algorithm…


  • Improving my coding efficiency in vim

    I have been using Vim for most editing for about 12 years now. I think I tried it for the first time in about 2003, and quickly gave up. Then in 2004 my roommate at the time convinced me to give it another try, and I quickly got hooked. I wrote my dissertation completely in…


  • Exploring querying parquet with Hive, Impala, and Spark

    At Automattic, we have a lot of data from WordPress.com, our flagship product. We have over 90 million users, and 100 million blogs. Our data team is constantly analyzing our data to discover how we can better serve our users. In 2015, one of our big focuses has been to improve the new user experience.…


