2015 was a challenging year for humanity. The plight of refugees was never far from the headlines, as multiple catastrophes unfolded across the globe.
This case study has two aims: first, to show facts on the numbers and details of refugees resettled into the U.S. in 2015, and, second, to discuss the methodology to help and inspire others to explore the data.
- Where From?
- Where To?
- What Religious Affiliation?
- Appendix 1: Sourcing the Raw Data
- Appendix 2: Setting up the Data in SQL
- Appendix 3: Analysis in SQL
- Appendix 4: Visualizing in Tableau
Continue reading Refugees resettling in the U.S. in 2015: where from, where to and who?
Calculating median and mode values for a dataset in MySQL is more complicated than you might expect.
Sure, there is the avg function to calculate mean, which is simple to use, but what about finding the median or mode average values? There are no equivalent functions in MySQL, so we have to work a bit harder to compute these values.
There are many ways to do this. Below I show the approach I take to calculate mean, median and mode values, along with sharing some resources at the end of this post that continue the discussion on the merits of different approaches.
Continue reading Hang on a Median, whaddya Mean there’s no Mode function in MySQL?
I’m working my way through this book, Data Analysis Using SQL and Excel, at the moment and chapter 6 is all about survival and retention modelling. I learn best when I can attack real life problems, so I took some of the lessons from this chapter and applied them to the MailChimp email data I already had in a MySQL database.
Previously I looked at setting up a database and doing some basic MySQL analysis on MailChimp data, as well as some more in-depth analysis such as creating a histogram of email subscribers.
This post takes it a step further by looking at subscriber behaviour within the email campaign data. What can we say about how long people remain active subscribers? For a subscriber who has been active for a given length of time, how likely are they to continue being active?
Continue reading Survival and retention analysis using MySQL
This post digs deeper into the MailChimp data which I analyzed in my previous post. In this analysis, I’ve focused on the distribution of subscribers and what was the most recent date they opened an email.
Specifically, I wanted to answer:
- What is the distribution of subscribers by campaign date that opened an email in 2014?
- And what are their email addresses?
The way I approached this problem was to break it down into its constituent parts, tackle each of those and then build that back up into a single query.
Continue reading Creating a histogram from MailChimp data using MySQL
This post shows how I analyzed a MailChimp email list, including all the data from weekly newsletters for 2014, using MySQL. It was inspired by this excellent tutorial: Performing cohort analysis by Micheal Herman. The email list I’m working with consists of about 18,000 subscribers.
I wanted to answer questions such as:
- How many emails do active subscribers open on average?
- How active are the subset of users who bought a product during the digital flash sale in March of this year?
- Who hasn’t opened an email at all in 2014?
Continue reading Deep dive into Mailchimp email data using MySQL