So, after watching a few cool videos from Google I/O 2011 & 2012, and reading a bit of documentation, I wanted to try to do the following in Python:
- Use Google App Engine's MapReduce and Pipeline APIs to transform (i.e., ETL) some data in parallel into a CSV format compatible with Google BigQuery.
- Store the transformed results of my MapReduce job in Google Cloud Storage.
- Ingest (i.e., load) the transformed results into a new table inside a BigQuery dataset.
- Use BigQuery to run blazingly fast queries across my data.
The only hard part was finding a working “hello world” code sample that would teach me how to do all of that. After a bit of searching, I found what I was looking for and was quickly able to do what I wanted. Here are the links in case anyone is interested:
- Sample Python code from Google’s open source code repository:
- A tutorial that walks you through that sample code step by step at a high level
- Caveat: There were one or two typos in the tutorial’s version of the code, so when in doubt, rely on the code in the repository.
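For a rough idea of what the transform step boils down to, here is a minimal sketch in plain Python. It is not the code from the sample repository and it does not call the App Engine MapReduce API. The column names in `FIELDS` and the functions `record_to_csv_line` and `map_records` are made up for illustration. The only real point is that each input record becomes one properly quoted CSV line, in a fixed column order that would have to match the schema of the BigQuery table.

```python
import csv
import io

# Hypothetical column order (an assumption for this sketch); the BigQuery
# table schema would need to list its columns in the same order.
FIELDS = ["user_id", "event", "timestamp"]


def record_to_csv_line(record, fields=FIELDS):
    """Serialize one dict as a single CSV line with no header row."""
    buf = io.StringIO()
    # csv.writer handles quoting of commas and quotes inside values.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([record.get(name, "") for name in fields])
    return buf.getvalue()


def map_records(records):
    """Generator in the style of a map function: one CSV line per record."""
    for record in records:
        yield record_to_csv_line(record)


if __name__ == "__main__":
    sample = [
        {"user_id": 1, "event": "read", "timestamp": "2012-07-01 12:00:00"},
        {"user_id": 2, "event": "share, then read", "timestamp": "2012-07-01 12:05:00"},
    ]
    print("".join(map_records(sample)), end="")
```

In the real job, the MapReduce library would call the map function once per input entity and write the yielded lines to Cloud Storage, so treat this only as a picture of the per-record work.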
Enjoy!

Hi Ash, I am glad you had a chance to test out the codelab. I’m in the process of improving the codelab documentation. Can you let me know which parts of the tutorial code gave you problems?
Hi Michael,
I’d love to help you improve the codelab. We’re big users of App Engine, MapReduce, Cloud Storage, and BigQuery for a lot of projects at YouVersion, ranging from our API to our Big Data analytics, so I’d be happy to gather feedback from the team and get you as much useful info as possible. Could you follow me on Twitter (@ashvs) so that I can DM you my email address? Unless, of course, you feel like risking it and want to just put your email address in a comment on this blog post :-)
Sounds great, Ash! I followed you on Twitter, and anyone can contact me from the link on my public G+ profile: https://plus.google.com/106641576811513429422/posts
Awesome Michael, thanks! I just DM’d you my email address.
Incidentally, we use Google+ quite a bit on our team as well, almost daily for Hangouts, since our little team within the bigger YouVersion team consists of folks living in OKC, Dallas, Houston, Florida, and France.