Friday, April 27, 2018

How to get Snowplow-Mini running on AWS

While looking at various analytics engines, we came across Snowplow Analytics. We wanted to give it a shot and experience it first hand. Luckily, they have something called Snowplow-Mini. It's an easily deployable, single-instance version of Snowplow. It essentially gives us a taste of what Snowplow can do for us as far as data collection, processing and analytics are concerned!

We started with the quick start guide and usage guide and performed all the steps mentioned there to get the Snowplow-Mini instance working. However, we did face two annoying issues, and investigating and fixing them wasted a few hours. This post is about those two issues, so that my fellow developers do not have to waste any time on them.

Unable to: Generate a pair of read/write API keys for the local Iglu schema registry

We followed all the steps mentioned in the usage guide, but we were unable to generate the keys.
  • Navigate to http://<public dns>/iglu-server
  • Input the super API key set up in the previous step in the input box in the top right corner
  • Expand the keygen section
  • Expand the POST /api/auth/keygen operation
  • Input the appropriate vendor_prefix for this API key
  • Click Try it out!
At this point, it should have generated the read and write keys for us. Instead, all it did was show a progress bar that ran forever without returning.

Investigating in the Chrome Developer Console revealed that the calls were failing with 401 Unauthorized. After googling this error a bit, I found that someone else had faced a similar problem. Their solution was to do the HTTP POST via curl, and that seemed to work for them. However, curl didn't work for us either.
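The same 401 can also be confirmed from the command line, without the browser. The snippet below is only a sketch: the function names keygen_status and is_unauthorized are made up for this post, and it assumes curl is available on the machine you run it from.

```shell
#!/bin/sh
# Hypothetical helper: print only the HTTP status code of a keygen request.
# $1 = server host or IP, $2 = API key, $3 = vendor prefix
keygen_status() {
    curl -s -o /dev/null -w '%{http_code}' \
        -X POST -H "apikey: $2" -d "vendor_prefix=$3" \
        "http://$1/api/auth/keygen"
}

# Hypothetical helper: succeeds when the status code means the key was rejected.
is_unauthorized() {
    [ "$1" = "401" ]
}
```

Usage would be something like: status=$(keygen_status "<IP address of your server>" "<your API key>" "com.makkajai"), followed by is_unauthorized "$status" && echo "key rejected".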

I looked around for ways to debug the problem.
  • I connected to the Snowplow-Mini instance via SSH (refer to the AWS documentation on how to do this)
  • Checked the config under the "snowplow" directory on the instance. Could not spot anything unusual there -- not that I knew much about it anyway :D
  • Checked the logs under the "/var/log" directory. Found a few things but could not really solve the problem.
  • Connected to PostgreSQL DB on the instance using the following command
    • psql --host=localhost --port=5432 --username=snowplow --dbname=iglu
      # Password is "snowplow"
  • Ran the query to check the API key
    • select * from apikeys;
  • What I saw next made my jaw drop in disbelief!
  • The API key is case-sensitive, and the key Snowplow-Mini had saved was all in lowercase, even though I had entered it in all caps.
  • Passing the key in lowercase in the following call did generate the read/write API keys for the local Iglu schema registry
    • curl http://<IP address of your server>/api/auth/keygen -X POST -H "apikey: <your case sensitive API key>" -d "vendor_prefix=com.makkajai"
  • Duh! Yeah, I know.

  • I learned how to connect to the Snowplow-Mini PostgreSQL DB from here
I must have easily wasted an hour trying to fix this problem. I hope others can save that time!
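The check and the fix above can be wrapped in a small shell sketch. The function names are hypothetical and only for illustration. The psql connection details and the keygen endpoint are the ones used earlier in this post. Lower-casing the key with tr reflects what we saw on our instance, not documented Snowplow behaviour.

```shell
#!/bin/sh
# Hypothetical helper: show what the Iglu server actually stored for the API keys.
# Uses the same connection details as the psql command in this post.
show_iglu_apikeys() {
    PGPASSWORD=snowplow psql --host=localhost --port=5432 \
        --username=snowplow --dbname=iglu \
        --command='select * from apikeys;'
}

# Hypothetical helper: lower-case a key, since our stored key was all lowercase.
lowercase_key() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: request read/write keys using the lower-cased super key.
# $1 = server host or IP, $2 = super API key as originally typed, $3 = vendor prefix
generate_iglu_keys() {
    curl "http://$1/api/auth/keygen" \
        -X POST \
        -H "apikey: $(lowercase_key "$2")" \
        -d "vendor_prefix=$3"
}
```

For example: generate_iglu_keys "<IP address of your server>" "<your API key>" "com.makkajai". Running show_iglu_apikeys on the instance first is still worthwhile, so you can compare the stored key with what you typed.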

Unable to: See events in Kibana Dashboard

This was a tricky one. After raising sample events, I was unable to see them in the Kibana dashboard. This happens mainly because "snowplow_stream_enrich" is not able to connect to the Elasticsearch service.

How did I figure it out?
  • SSH into the Snowplow-Mini instance
  • I checked the logs under the "/var/log" directory.
  • The logs seemed to be filled with exceptions like
    • Exception in thread "main" ip-xx-xx-xx-xx: ip-xx-xx-xx-xx: unknown
  • Googled it a bit and found the solution here
  • Edit the file "/etc/hosts" and add the IP address information in that file as follows.
    • sudo vim /etc/hosts 
    • xx.xx.xx.xx ip-xx-xx-xx-xx localhost
  • xx.xx.xx.xx is the instance's local (private) AWS IP address.
  • Save and exit, then restart all services from the Snowplow-Mini console.
  • Generate a few events and open the Kibana dashboard. It worked this time!
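If you would rather not edit the file by hand, the /etc/hosts change can be sketched as a small script. The function names are hypothetical. The line format copies the entry shown above. The check before appending is my own addition, so that running the script twice does not duplicate the entry.

```shell
#!/bin/sh
# Hypothetical helper: build the hosts line in the same format as the manual fix.
# $1 = private IP, $2 = hostname (e.g. ip-xx-xx-xx-xx)
hosts_entry() {
    printf '%s %s localhost\n' "$1" "$2"
}

# Hypothetical helper: append the entry only if the hostname is not already present.
# $1 = hosts file, $2 = private IP, $3 = hostname
add_hosts_entry() {
    if ! grep -qwF -- "$3" "$1"; then
        hosts_entry "$2" "$3" >> "$1"
    fi
}
```

On the instance this would be run as root, for example add_hosts_entry /etc/hosts xx.xx.xx.xx ip-xx-xx-xx-xx, with the placeholders replaced by your own private IP and hostname. The services still need to be restarted afterwards.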
After these two problems were out of the way, my Snowplow-Mini instance was fully up and running on AWS!
Have some Fun!