Big Data ETL and Analysis with Google BigQuery

Google App Engine Pipeline Screenshot

So, after watching a few cool videos from Google I/O 2011 & 2012, and reading a bit of documentation, I wanted to try to do the following in Python:

  1. Use Google App Engine’s MapReduce and Pipeline APIs to transform (i.e., ETL) some data in parallel into a CSV format compatible with Google BigQuery.
  2. Store the transformed results of my MapReduce job in Google Cloud Storage.
  3. Ingest (i.e., load) the transformed results into a new table inside a BigQuery dataset.
  4. Use BigQuery to run blazingly fast queries across my data.
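
To make the first three steps a bit more concrete, here is a minimal sketch in plain Python 3 using only the standard library. It is not the code from the samples I found, and it does not call App Engine or BigQuery at all: `to_csv_line` stands in for the map step of the MapReduce job, and `build_load_job` only builds the kind of request body a BigQuery load job expects (the `configuration.load` section with `sourceUris`, `schema`, and `destinationTable`). The project, dataset, table, bucket, and column names are all made up for illustration.

```python
import csv
import io


def to_csv_line(record, columns):
    """Map step stand-in: turn one input record (a dict) into one CSV line."""
    buf = io.StringIO()
    # The csv module handles quoting of commas and double quotes for us.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([record.get(col, "") for col in columns])
    return buf.getvalue()


def build_load_job(project_id, dataset_id, table_id, gcs_uris, schema_fields):
    """Build a request body for a BigQuery load job. No network call is made."""
    return {
        "configuration": {
            "load": {
                "sourceUris": list(gcs_uris),
                "sourceFormat": "CSV",
                "schema": {"fields": schema_fields},
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


# Hypothetical columns, schema, and records, for illustration only.
COLUMNS = ["user_id", "event", "note"]
SCHEMA = [
    {"name": "user_id", "type": "INTEGER"},
    {"name": "event", "type": "STRING"},
    {"name": "note", "type": "STRING"},
]
records = [
    {"user_id": 1, "event": "read", "note": "plain"},
    {"user_id": 2, "event": "share", "note": 'has, comma and "quotes"'},
]

# Step 1: transform every record into a CSV line.
csv_text = "".join(to_csv_line(r, COLUMNS) for r in records)

# Steps 2 and 3: assume csv_text was written to this (made-up) Cloud Storage
# object, then describe the load job that would ingest it.
job = build_load_job(
    "my-project", "my_dataset", "events",
    ["gs://my-bucket/output/part-0.csv"], SCHEMA,
)
print(csv_text)
```

In a real setup the map function would run inside the MapReduce job, the output writer would put the CSV files in Cloud Storage, and the job body would be submitted to BigQuery as a load job. The sketch only shows the shape of the data at each hand-off.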

The only hard part was finding a working “hello world” style code sample that would teach me how to do all of that. After a bit of searching, I found what I was looking for and was quickly able to do what I wanted. Here are the links in case anyone is interested:

Enjoy!

5 thoughts on “Big Data ETL and Analysis with Google BigQuery”

  1. Hi Michael,

    I’d love to help you improve the codelab. We’re big users of App Engine, MapReduce, Cloud Storage, and BigQuery for a lot of projects, ranging from our API to our Big Data analytics at YouVersion, so I’d be happy to gather feedback from the team and get you as much useful info as possible. Could you follow me on Twitter (@ashvs) so that I can DM you my email address? Unless of course you feel like risking it and want to just put your email address in a comment on this blog post :-)

  2. Awesome Michael, thanks! I just DM’d you my email address.
    Incidentally, we use Google+ quite a bit on our team as well, almost daily for Hangouts, since our little team within the bigger YouVersion team consists of folks living in OKC, Dallas, Houston, Florida, and France.
