Tag Archives: hadoop

Exploring querying parquet with Hive, Impala, and Spark

At Automattic, we have a lot of data from WordPress.com, our flagship product. We have over 90 million users, and 100 million blogs. Our data team is constantly analyzing our data to discover how we can better serve our users. In 2015, one of our big focuses has been to improve the new user experience. […]

Posted in wordpress | Tagged , , , , , | Comments Off on Exploring querying parquet with Hive, Impala, and Spark