Refugees resettling in the U.S. in 2015: where from, where to and who?

2015 was a challenging year for humanity. The plight of refugees was never far from the headlines, as multiple catastrophes unfolded across the globe.

Origin and state maps

This case study has two aims: first, to show facts on the numbers and details of refugees resettled into the U.S. in 2015, and, second, to discuss the methodology to help and inspire others to explore the data.


  1. Introduction
  2. Where From?
  3. Where To?
  4. What Religious Affiliation?
  5. Appendix 1: Sourcing the Raw Data
  6. Appendix 2: Setting up the Data in SQL
  7. Appendix 3: Analysis in SQL
  8. Appendix 4: Visualizing in Tableau


The Refugee Processing Center (RPC) provides the technical systems to process refugees resettling in the U.S. and they release public data on the numbers, countries of origin, religious affiliations and destinations of refugees resettled into the U.S.. I’ve investigated the 2015 data and present the analysis and methodology below.

I came across the RPC datasets through the excellent Data Is Plural newsletter – a weekly list of curated data sources.

I wanted to look at the data and answer some basic questions about the origins, destinations and religious affiliations of the refugees.

In the second half of this post, I delve into the methodology for this case study, from sourcing the data to creating the final charts in Tableau.

Where From?

In 2015, the U.S. accepted a total of 66,517 refugees from 74 different countries.

Here’s a chart showing a breakdown by country, ordered by largest to smallest number (I’m only showing the first 50 of the 74 countries, to fit on the screen):

Refugees by Country of Origin

And here are just the 20 largest, to better see the details:

Origin Largest 20

What’s interesting is that the first 5 countries accounted for almost 75% of the total refugees resettled in the U.S. in 2015, the first 10 countries for just over 90% of the refugees and the first 20 countries for nearly 99% of the total number. The remaining 54 countries therefore accounted for only around 1% of the total.

The following table shows the 10 countries from where the largest number of refugees were resettled in the U.S. in 2015:

# Country of Origin Number of Refugees Total % of Total Cumulative %
1 Burma 17,483 66,517 26.3 26.3
2 Iraq 10,169 66,517 15.3 41.6
3 Dem. Rep. Congo 8,206 66,517 12.3 53.9
4 Somalia 7,779 66,517 11.7 65.6
5 Bhutan 5,764 66,517 8.7 74.3
6 Iran 3,121 66,517 4.7 79.0
7 Syria 2,192 66,517 3.3 82.3
8 Eritrea 1,783 66,517 2.7 85.0
9 Ukraine 1,760 66,517 2.6 87.6
10 Sudan 1,688 66,517 2.5 90.1

To me, Bhutan was the most surprising country on this list. It’s famed for measuring Gross National Happiness and appears to be getting happier, so the large number of resettled refugees seems to contradict the idea of a mountain paradise, a veritable Shangri-La. It seems likely that at least some of these refugees being resettled are from Bhutan’s forgotten people: those of Nepalese ethnicity that lost their citizenship in the 1990’s and have been living in limbo since.

Where To?

This map shows the Lower 48 U.S. states shaded according to how many refugees they resettled in 2015. The darker colors correspond to larger numbers of refugees, the lighter shades to fewer numbers (so Texas resettled the most, and Montana and Wyoming, did not resettle any).

State map in Tableau

Note: Alaska and Hawaii were not shown on the map to achieve a better fit on the screen, but both did resettle a small number of refugees in 2015.

The following chart shows the numbers by state:

Refugee numbers by state

Zooming in to show just the larger states:

10 largest states

As you’d expect, it’s the largest and most populous U.S. states resettling the most number of refugees.

By combining the state and origin data into a single visualization, I created this tiled map showing which states in the U.S. resettled refugees, from the largest nine origin countries (by count). As before, the darker the shade of the map, the larger the number of refugees resettled.

Origin and state maps

See the visualization at a larger scale on Tableau here.

What Religious Affiliation?

The following chart shows the major religious groups of the resettled refugees, as listed in the RPC dataset:

Refugee religious affiliation

At first glance it looks like approximately equal numbers of refugees claim Christian and Moslem affiliations, but look a little closer and you see there are other sub-groups of Moslems and Christians listed (e.g. Moslems who explicitly list Suni or Shiite). So let’s classify the religions into broader groups and see what the totals look like:

Group Count Total % of Total
Christian 30,706 66,517 46.2
Islam 26,636 66,517 40.0
Other 9,175 66,517 13.8

So in fact, the largest group of refugees by religious affiliation were Christian, representing 46% of the refugees resettled. Not far behind, around 40% of refugees identified with a branch of Islam. That left approximately 14% who identified with various other different religious groups, or were atheists, or the information was not available.

Case Study Methodology Notes

Appendix 1: Sourcing the Raw Data

I originally found out about this dataset via the excellent Data is Plural newsletter, which is a weekly collection of interesting and esoteric datasets and recommended to anyone interested in data.

The raw data for refugee numbers is publicly available from the Refugee Processing Center website.

Go to the Reports > Interactive Reporting menu, where you’ll see the following menu:

Refugee Processing Center Menu

For this case study, I downloaded datasets 2 and 3 out of the 4 options shown above. Clicking each in turn takes you to the next menu where you can choose the specifics for the dataset. To get the 2015 data, I entered the start and end dates of 2015, selected all of the Nationalities and all of the Religions:

Refugee Processing Center dataset

This will create the report in your browser, from where you can download it to your desktop. Tip: Download as CSV, not Excel, because the format is tabular and therefore much easier to work with off the bat.

Rename the two CSV files to remove the spaces, as these can cause issues when we try to import into PostgreSQL later. I went with:

  • mx_arrivals_destination_nationality.csv
  • mx_arrivals_nationality_religion.csv

At this stage, you should now have two CSV files: i) Arrivals by Destination and Nationality, and ii) Arrivals by Nationality and Religion:

Raw refugee CSV data

As a final clean-up step before importing to PostgreSQL, delete the top 3 rows of each Excel sheet (you can see them in the image above). The table column headings start in row 4, so we want to get rid of what’s above.

Appendix 2: Setting up the data in SQL

Let’s now look at importing and analyzing that raw data in SQL. In this case, I’ve used PosgreSQL, although any other SQL environment would suffice.

First, I need to create two tables which I will load my data into. I’ve added id numbers to both (since we didn’t have one in the CSV files) and set all of the fields to “text” datatype for the time being. We’ll adjust this as required in the next step. The code:

creates two new tables in your database. They will show in the object browser window when you refresh.

Next, let’s import the two CSV files, using the following code (replace the filepath with your own, pointing to the CSV files you downloaded in Step 1):

Check your data has imported correctly with the following code:

You should get an output looking like this now:

SELECT statement SQL

It looks great at first glance!

But take a look down at rows 2845 and 2846 of the ref_destination_data table or rows 336 and 337 of the ref_religion_data table. Something doesn’t look right. Turns out the reports from the Refugee Processing Center each contain two tables, which are duplicated but with different ordering (for example first ordered by Nationality, then by Destination).

So let’s delete these duplicate records from each table, as follows:

Great, that fixed that issue. Now our data tables are in much better shape, but not quite ready for analysis.

Before we can start, there’s a couple of final preparation steps we need to do. Firstly, we need to convert the datatypes of the number columns from “text” to a suitable number datatype, so that we may perform mathematical operations on them. Secondly, we’ll rename the column headings to something more meaningful.

So, let’s deal with the datatype issue. I need to remove the commas from the text values, with the REPLACE function. Then I cast the column as an integer with the USING clause, because the text columns cannot automatically be cast to integer. The code is as follows:

Lastly, let’s change the column headings to something more meaningful, with the following code:

Well, that’s all the prep we need to do with the tables, so let’s start the analysis.

Appendix 3: Analysis in SQL

To calculate refugee numbers by Country of Origin, we use a SQL query and group by the Country of Origin, as follows:

which gives an output like this:

Origin Query in SQL

To calculate a cumulative % total (so I can easily see how “big” the top 10 or 20 countries are) I used this more complex query, which uses a WITH CTE and a cross-join (cartesian product) to create a running total:

The LIMIT clause just restricts the output table to the first 20 results, which is what I’m interested in here. The output from this query is this table shown at the top of the blog post here.

To find the number of refugees resettled into each State, we use a similar SQL query to the first one above, grouping by state this time, as follows:

We can modify this slightly to show the states ordered by number of refugees:

which gives an output like this:

State Query in SQL

Finally, if we want, we can easily calculate the mean and standard deviation for the number of refugees admitted to each state, and again for the number of refugees from each country of origin:

From the Religion data, we can summarize the data to see which are the most common religions amongst the refugees:

The output is:

Religion data in SQL

To create the summary table of religions by group, I used a rather cumbersome CASE statement to categorize the religions into broad groups:

View the entire SQL file here on GitHub.

That’s all the analysis in SQL, so let’s jump over to Tableau and chart this data.

Appendix 4: Visualizing in Tableau

Firstly, we need to connect Tableau Desktop to our PostgreSQL database so we can get hold of the data to chart.

So open up Tableau and click PostgreSQL on the menu item to get to the following Server Connection menu:

Connect to Server in Tableau

For me, the credentials I used were:

Server: localhost
Port: 5432
Database: Test (but use whatever name you’ve given your database in PostgreSQL)

and then whatever username and password I use in PostgreSQL.

The first chart I created was the column chart showing the breakdown by country of origin. This is based off the ref_destination_data dataset.

I put the “Origin” dimension into the Columns shelf, added the measure “Num Unique” to the Rows shelf and made sure it was set to “SUM”. This gives me a basic column chart, so I changed the color to red, changed the axis names, added data labels and added a title. The following screenshot shows the chart:

Origin chart in Tableau

I created the column charts for the breakdown by state and by religion the same way.

To create the more detailed column charts showing only the 10 or 20 largest results, I added a filter. Drag the Dimension onto the Filter shelf, click on it to bring up the drop down menu and select Filter…. Next select Top from the four menu items across the top of the pop-up pane and set the conditions to select only the Top 20 based on the value field (Num Unique), as follows:

Filter on Top 20 in Tableau

Next, create a map of the U.S. states showing number of resettled refugees.

Drag Longitude(generated) onto the Columns shelf, and Latitude(generated) onto the Rows shelf. Then drag the State Dimension onto the bottom of the Marks shelf and use the drop down menu to select “Filled Map”, which should give you a map of the U.S. with states shaded blue.

Next add State Dimension to the Filter shelf and filter out Alaska and Hawaii, to focus on the lower 48. Drag the Num Unique Measure onto the Color icon within the Marks shelf, and finally change the color to a red scale, to get the following map:

Tableau basic map

I created the small-multiples chart for the Top 9 Countries of Origin to show in which states the refugees found new homes. Hat tip to Ben Jones at Data Remixed for his excellent tutorial, which served as a launchpad for my own explorations.

I won’t repeat all of the steps here (I encourage you to check out the Data Remixed tutorial above), but in essence what I did was create a duplicate of the basic map I created above, and then add Calculated Column and Row fields to divide the chart into small multiples.

The calculated fields are added by right-clicking in the Measures pane, then selecting Create Calculated Field… from the menu.

The code for the Row Calculated Field is:

And the corresponding code for the Column Calculated Field is:

It’s critical when you add these two calculated fields to their respective Column and Row shelves, that they go before the Longitude and Latitude fields, and that they are set to be Dimensions and Discrete (they should have blue shading).

The following screenshot shows the final chart and the Calculated Row code:

Tableau multiple map

The final step was to create a blank dashboard sheet, then copy this tiled map into it. Then I added titles for each tile, simply as manual text boxes.

This Tableau chart is available for viewing and downloading here.

Parting Thoughts

Phew, it’s a long post, covering a lot of ground. Even so, we’ve only done a relatively basic analysis of the datasets and there is much more we could do.

It would be interesting to download prior year data and compare and contrast findings year-on-year. It would be interesting to cross-reference these findings against the dates of natural or humanitarian crises to calculate the length of time from crisis to resettlement, and see if this is changing with time.

There is also a demographic dataset, including Age, Gender, Ethnicity, Language and Education, available on the RPC website too, so it would be interesting to download and analyze that data.

Update 1/20/2016: The Refugee Processing Center are hiring a Data Mining Specialist – check it out here!

Thoughts, comments or corrections? Leave a comment below.

One thought on “Refugees resettling in the U.S. in 2015: where from, where to and who?”

  1. Awesome. I was actually right in the middle of doing this exact same thing and thought, “I bet someone’s already done this.” Thanks for the work!

Leave a Reply

Your email address will not be published. Required fields are marked *