Six reasons to start learning SQL

This is a guest post from Tomi Mester of Data36.

Back in 2011, during my college years, I got my first-ever job (an internship) at a cool IT startup. I got to do all kinds of interesting tasks but mostly it was data analysis.

The process was simple:

  • I received data from the developers in .csv format
  • I imported it into Excel
  • I crunched the data there
  • I sent the beautiful charts to my manager

We learned a lot about user behaviour, trends, and how we reached (or didn’t reach) our goals. I loved it – except for one thing: I always had to wait days (sometimes weeks) for the developers to export the data for me from the company’s SQL database.

That was my #1 reason to learn SQL!

I did so. And ever since then I’ve been a huge fan of this simple and elegant data language.


Because knowing SQL allowed me to do better, more detailed data analyses – more easily and faster. And I strongly believe that it can be extremely useful for everyone else who works in data analytics and who uses different spreadsheet tools in their day-to-day job.

In this article, I’ll give you 6 reasons why you should learn SQL too!
Continue reading Six reasons to start learning SQL

Refugees resettling in the U.S. in 2015: where from, where to and who?

2015 was a challenging year for humanity. The plight of refugees was never far from the headlines, as multiple catastrophes unfolded across the globe.

Origin and state maps

This case study has two aims: first, to show facts on the numbers and details of refugees resettled into the U.S. in 2015, and, second, to discuss the methodology to help and inspire others to explore the data.


  1. Introduction
  2. Where From?
  3. Where To?
  4. What Religious Affiliation?
  5. Appendix 1: Sourcing the Raw Data
  6. Appendix 2: Setting up the Data in SQL
  7. Appendix 3: Analysis in SQL
  8. Appendix 4: Visualizing in Tableau

Continue reading Refugees resettling in the U.S. in 2015: where from, where to and who?

Hang on a Median, whaddya Mean there’s no Mode function in MySQL?

Mean Median and Mode

Calculating median and mode values for a dataset in MySQL is more complicated than you might expect.

Sure, there is the avg function to calculate mean, which is simple to use, but what about finding the median or mode average values? There are no equivalent functions in MySQL, so we have to work a bit harder to compute these values.

There are many ways to do this. Below I show the approach I take to calculate mean, median and mode values, along with sharing some resources at the end of this post that continue the discussion on the merits of different approaches.
Continue reading Hang on a Median, whaddya Mean there’s no Mode function in MySQL?

Survival and retention analysis using MySQL

I’m working my way through this book, Data Analysis Using SQL and Excel, at the moment and chapter 6 is all about survival and retention modelling. I learn best when I can attack real life problems, so I took some of the lessons from this chapter and applied them to the MailChimp email data I already had in a MySQL database.

Previously I looked at setting up a database and doing some basic MySQL analysis on MailChimp data, as well as some more in-depth analysis such as creating a histogram of email subscribers.

This post takes it a step further by looking at subscriber behaviour within the email campaign data. What can we say about how long people remain active subscribers? For a subscriber who has been active for a given length of time, how likely are they to continue being active?

Continue reading Survival and retention analysis using MySQL

Creating a histogram from MailChimp data using MySQL

This post digs deeper into the MailChimp data which I analyzed in my previous post. In this analysis, I’ve focused on the distribution of subscribers and what was the most recent date they opened an email.

Specifically, I wanted to answer:

  • What is the distribution of subscribers by campaign date that opened an email in 2014?
  • And what are their email addresses?

The way I approached this problem was to break it down into its constituent parts, tackle each of those and then build that back up into a single query.

Continue reading Creating a histogram from MailChimp data using MySQL