Refugees resettling in the U.S. in 2015: where from, where to and who?

2015 was a challenging year for humanity. The plight of refugees was never far from the headlines, as multiple catastrophes unfolded across the globe.

Origin and state maps

This case study has two aims: first, to show facts on the numbers and details of refugees resettled into the U.S. in 2015, and, second, to discuss the methodology to help and inspire others to explore the data.

Contents

  1. Introduction
  2. Where From?
  3. Where To?
  4. What Religious Affiliation?
  5. Appendix 1: Sourcing the Raw Data
  6. Appendix 2: Setting up the Data in SQL
  7. Appendix 3: Analysis in SQL
  8. Appendix 4: Visualizing in Tableau

Continue reading Refugees resettling in the U.S. in 2015: where from, where to and who?

Google Sheets Query function: Learn the most powerful function in Google Sheets

Have you heard of the Google Sheets query formula?

It allows you to use pseudo-SQL (Structured Query Language, the code used to communicate with databases) code to manipulate your data in Google Sheets and it’s SUPER powerful.

It’s arguably the most powerful function in Google Sheets.

This video is lesson 14 of 30 from my free Google Sheets course: Advanced Formulas 30 Day Challenge.

Setting up your Google sheet

1. If you want to follow along and have the same data and see all of the formula solutions, click this link to make your own copy of the Google Sheet:

Grab the data and solution file for this tutorial:
Click here to get your own copy >>

This is what our starting data looks like:

Google Sheets query function data

2. Ensure you have the whole table selected (Pro-tip: click somewhere in your table and hit Ctrl + A (on PC) or Cmd + A (on Mac) to highlight the whole table):

Google Sheets highlight whole table with Cmd A

3. Go to the menu: Data > Named ranges… and click this menu. A new pane will show in the right side of your spreadsheet as follows:

Google Sheets Named range menu

4. In the first input box, enter the word “countries”, as shown in the image below. This names our table of data so we can refer to it easily.

Google Sheets Named range menu detail

5. In Google sheets we use the Google Sheets QUERY function and write our pseudo-SQL code inside this function. So, we’ll enter all of our SQL code inside a QUERY function in cell G1.

=QUERY(countries,"our SQL code goes here between the quotes",1)

Ok, now we’re set up, let’s start writing SQL code!

SELECT all the data with the Google Sheets QUERY function

The SQL code SELECT * retrieves all of the columns from our data table.

6. To the right side of the table, type the following Google Sheets QUERY function into cell G1:

=QUERY(countries,"SELECT *",1)

7. The output from this query is our full table again, because SELECT * retrieves all of the columns from the countries table:

Google Sheets query select star

Wow, there you go! You’ve written your first piece of SQL code! Pat yourself on the back.

The equivalent real world SQL code would be:

SELECT *
FROM countries

Spot the difference: We don’t include a FROM clause with the Google Sheets QUERY function. This is different to SQL in the real world, where we must specify a FROM clause to select our table.

SELECT specific columns only

8. What if we don’t want to select every column, but only certain ones? Modify your Google Sheets QUERY function to read:

=QUERY(countries,"SELECT B, D",1)

9. This time we’ve selected only columns B and D, so our output will look like this:

Google Sheets query select specific columns

Equivalent real world SQL code:

SELECT country, population
FROM countries

Grab the data and solution file for this tutorial:
Click here to get your own copy >>

WHERE clause

The WHERE clause specifies a condition that must be satisfied. It filters our data. It comes after the SELECT clause.

10. Modify your Google Sheets QUERY function in cell G2 to select only countries that have a population greater than 100 million:

=QUERY(countries,"SELECT B, D WHERE D > 100000000",1)

11. Our output table is:

Google Sheets query select where clause

Equivalent real world SQL code:

SELECT country, population
FROM countries
WHERE population > 100000000

12. Let’s see another WHERE clause example, this time selecting only European countries. Modify your formula to:

=QUERY(countries,"SELECT B, C, D WHERE C = 'Europe' ",1)

13. Now the output table is:

Google Sheets query select where clause

Equivalent real world SQL code:

SELECT country, continent, population
FROM countries
WHERE continent = 'Europe'

ORDER BY clause

The ORDER BY clause sorts our data. We can specify column(s) and direction (ascending or descending). It comes after the SELECT and WHERE clauses.

14. Let’s sort our data by population from smallest to largest. Modify your formula to add the following ORDER BY clause, specifying an ascending direction with ASC:

=QUERY(countries,"SELECT B, C, D ORDER BY D ASC",1)

15. The output table:

Google Sheets query Select with order by ascending

Equivalent real world SQL code:

SELECT country, continent, population
FROM countries
ORDER BY population ASC

16. Modify your formula in cell G1 to sort the data by country in descending order, Z – A:

=QUERY(countries,"SELECT B, C, D ORDER BY B DESC",1)

17. Output table:

Google Sheets query Select with order by descending

Equivalent real world SQL code:

SELECT country, continent, population
FROM countries
ORDER BY country DESC

LIMIT clause

The LIMIT clause restricts the number of results returned. It comes after the SELECT, WHERE and ORDER BY clauses.

18. Let’s add a LIMIT clause to our formula in G1 and return only 10 results:

=QUERY(countries,"SELECT B, C, D ORDER BY D ASC LIMIT 10",1)

19. This now returns only 10 results from our data:

Google Sheets query limit clause

Equivalent real world SQL code:

SELECT country, continent, population
FROM countries
ORDER BY population ASC
LIMIT 10

Arithmetic functions

We can perform standard math operations on numeric columns.

So let’s figure out what percentage of the total world population (7.16 billion) each country accounts for.

20. We’re going to divide the population column by the total (7,162,119,434) and multiply by 100 to calculate percentages. So, modify our formula to read:

=QUERY(countries,"SELECT B, C, (D / 7162119434) * 100",1)

I’ve divided the values in column D by the total population (inside the parentheses), then multiplied by 100 to get a percentage.

21. The output table this time is:

Google Sheets query arithmetic operations

Note – I’ve applied formatting to the output column in Google Sheets to only show 2 decimal places.

Equivalent real world SQL code:

SELECT country, continent, (population / 7162119434) * 100
FROM countries

LABEL clause:

22. That heading for the arithmetic column is pretty ugly right? Well, we can rename it using the LABEL clause (however, careful as this is not part of SQL syntax). Try this out:

=QUERY(countries,"SELECT B, C, (D / 7162119434) * 100 LABEL (D / 7162119434) * 100 'Percentage'",1)

=QUERY(countries,"SELECT B, C, (D / 7162119434) * 100 LABEL (D / 7162119434) * 100 'Percentage' ",1)

Equivalent real world SQL code:

SELECT country, continent, (population / 7162119434) * 100 AS Percentage
FROM countries

Grab the data and solution file for this tutorial:
Click here to get your own copy >>

Aggregation functions

We can use other functions in our calculations, for example min, max and average.

23. To calculate the min, max and average populations in your country dataset, use aggregate functions in your query as follows:

=QUERY(countries,"SELECT max(D), min(D), avg(D)",1)

24. The output returns three values – the max, min and average populations of the dataset, as follows:

Google Sheets query aggregate functions

Equivalent real world SQL code:

SELECT max(population), min(population), avg(population)
FROM countries

GROUP BY clause

Ok, take a deep breath. This is the most challenging concept to understand. However, if you’ve ever used pivot tables in Google Sheets (or Excel) then you should be fine with this.

The GROUP BY clause is used with aggregate functions to summarize data into groups, in the same way a pivot table does.

25. Let’s summarize by continent and count out how many countries per continent. Change your query formula to include a GROUP BY clause and use the COUNT aggregate function to count how many countries, as follows:

=QUERY(countries,"SELECT C, count(B) GROUP BY C",1)

Note, every column in the SELECT clause (i.e. before the GROUP BY) must either be aggregated (e.g. counted, min, max) or appear after the GROUP BY clause (e.g. column C in this case).

26. The output for this query is:

Google Sheets select query with group by

Equivalent real world SQL code:

SELECT continent, count(country)
FROM countries
GROUP BY continent

27. Let’s see a more complex example, incorporating many different types of clause. Modify the formula in G1 to read:

=QUERY(countries,"SELECT C, count(B), min(D), max(D), avg(D) GROUP BY C ORDER BY avg(D) DESC LIMIT 3",1)

This may be easier to read broken out onto multiple lines:

=QUERY(countries,
"SELECT C, count(B), min(D), max(D), avg(D)
GROUP BY C
ORDER BY avg(D) DESC
LIMIT 3"
,1)

This summarizes our data for each continent, sorts by highest to lowest average population and finally limits the results to just the top 3.

The code is very similar to real SQL and it’s looking pretty complex now! Good job!

28. The output of this query is:

Google Sheets complex query

Equivalent real world SQL code:

SELECT continent, count(country), min(population), max(population), avg(population)
FROM countries
GROUP BY continent
ORDER BY avg(population) DESC
LIMIT 3

Well, that’s all for this tutorial folks!

I hope this tutorial allowed you to understand and write some basic SQL code, and see that it needn’t be intimidating.

And I hope you feel inspired to keep learning!

Please feel free to leave comments or questions below.

Advanced techniques

This video is lesson 15 of 30 from my free Google Sheets course: Advanced Formulas 30 Day Challenge.

See also:

> How to add a total row to your Query formulas

> How to use dates as filters in your Query formulas

Resources

My free Advanced Formulas course for Google Sheets: Learn 30 Advanced Formulas in 30 Days

Official documentation for the QUERY() function.

Official documentation for Google’s Visualization API Query Language.

Want to keep learning SQL? Try this online tutorial from W3 Schools.

Prefer video? Try this free 1 hour introduction to the SQL language.

Excel tutorial: Building a dynamic, animated dashboard for U.S. political data

Excel dynamic dashboard screenshot

Who doesn’t love a dynamic, animated Excel dashboard?

Here’s one I’ve been working on recently, a data visualization of historical U.S. political data, showing party trends, state populations and sitting presidents over time:

Excel Dynamic Dashboard - US Political data
Click to see full size

In the following post, I delve into the details of how I created this dashboard. It’s not a full cell-by-cell account of how I did it, because that would require an article at least twice as long, but rather a look at the various steps and thought processes along the way.

If it appears a little ragged, that’s because it probably is! Most likely because I’m writing this bleary eyed at 1am, between feeds and diaper changes of my 6 week old son. 😉

Continue reading Excel tutorial: Building a dynamic, animated dashboard for U.S. political data