Yearly Archives: 2015

Exploring querying parquet with Hive, Impala, and Spark

At Automattic, we have a lot of data from WordPress.com, our flagship product. We have over 90 million users, and 100 million blogs. Our data team is constantly analyzing our data to discover how we can better serve our users. In 2015, one of our big focuses has been to improve the new user experience. […]

Posted in wordpress | Comments Off on Exploring querying parquet with Hive, Impala, and Spark

(Un)verified

According to my city’s website to pay my water bill, I am both a verified and an unverified user. Not sure how that is possible.

Posted in wordpress | Comments Off on (Un)verified

I am now an automattician!

Today is my first official day at Automattic. I am excited!

Posted in wordpress | Comments Off on I am now an automattician!

UNIX tip – xargs with multiple commands

Xargs is an extremely powerful complement to the awesome find command. One downside is that you usually need to have a single pipeline. By default you can’t put together a bunch of commands which are not piped. However, it is possible to call a shell with xargs. In this way, you can execute multiple commands […]
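The excerpt cuts off before showing a command, so here is a minimal sketch of the general idea, not necessarily the exact command from the full post. It assumes a scratch directory with two made-up files, and runs two unpiped commands per file by handing each file name to `sh -c`:

```shell
# Create a scratch directory with two sample files (hypothetical names).
workdir=$(mktemp -d)
printf 'alpha\n' > "$workdir/a.txt"
printf 'beta\ngamma\n' > "$workdir/b.txt"

# For each file found, run two separate commands inside one sh -c call.
# -print0 / -0 keep file names with spaces intact; sort -z gives a stable order.
# The "_" fills $0 of the inner shell, so the file name arrives as $1.
result=$(find "$workdir" -name '*.txt' -print0 | sort -z |
  xargs -0 -n 1 sh -c 'echo "file: $(basename "$1")"; wc -l < "$1" | tr -d " "' _)

echo "$result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Passing the file name as a positional argument (`$1`) rather than splicing it into the command string with `-I {}` avoids surprises when a file name contains quotes or other shell metacharacters.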

Posted in UNIX | Comments Off on UNIX tip – xargs with multiple commands

Postgres tip of the day – show size of all databases

Here is a handy little query to show the size of all the databases on a particular Postgres server:

SELECT pg_database.datname, pg_size_pretty(pg_database_size(pg_database.datname)) AS size FROM pg_database;

  datname  |  size
-----------+---------
 template1 | 6369 kB
 template0 | 6361 kB
 postgres  | 6589 kB
 foo       | 55 MB
 bar       | 5129 MB
 foobar    | 85 GB
(6 […]

Posted in sql | Comments Off on Postgres tip of the day – show size of all databases