Beyond Sheets: Get Started With Google BigQuery

This tutorial is for Google Sheets users whose datasets have become too big or too slow to work with in Google Sheets. It will help you get started with Google BigQuery.

If you’re experiencing slow Google Sheets that no amount of clever tricks will fix, or you work with datasets that are outgrowing the 5-million cell limit of Google Sheets, then you need to think about moving your data into a database.

As a Google user, probably the best and most logical next step is to get started with Google BigQuery and move your data out of Google Sheets and into BigQuery.


We’ll explore five topics:

  1. What is BigQuery?
  2. Google BigQuery Setup
  3. How to get your data from Google Sheets into BigQuery
  4. How to analyze your data in BigQuery
  5. How to get your data out of BigQuery back into Google Sheets for reporting

By the end of this tutorial, you will have created a BigQuery account, uploaded a dataset from Google Sheets, written some queries to analyze the data and exported the results back to Google Sheets to create a chart.

You’ll also do the same analysis side-by-side in a Google Sheet, so you can understand exactly what’s happening in BigQuery.

I’ve highlighted the action steps throughout the tutorial, to make it super easy for you to follow along:

Google BigQuery exercise steps are shown in blue.

Actions for you to do in Google BigQuery.

Google Sheet exercise steps are shown in green.

Actions for you to do in Google Sheets.

Section 1: What is BigQuery?

Google BigQuery is a data warehouse for storing and analyzing huge amounts of data.

Officially, BigQuery is a serverless, highly-scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and machine learning built in.

This is a formal way of saying that BigQuery:

  • Works with any size data (thousands, millions, billions of rows…)
  • Is easy to set up because Google handles the infrastructure
  • Grows as your data grows
  • Is good value for money, with a generous free tier and pay-as-you-go beyond that
  • Is lightning fast
  • Integrates seamlessly with other Google tools, like Sheets and Data Studio
  • Can import and export data from and to many sources
  • Has built-in machine learning, so predictive modeling can be set up quickly

What’s the difference between BigQuery and a “regular” database?

BigQuery is a database optimized for storing and analyzing data, not for updating or deleting data.

It’s ideal for data that’s generated by e-commerce, operations, digital marketing, engineering sensors etc. Basically, transactional data that you want to analyze to gain insights.

A regular database is suitable for data that is stored, but also updated or deleted. Think of your social media profile or customer database. Names, emails, addresses, etc. are stored in a relational database. They frequently need to be updated as details change.

Section 2: Google BigQuery Setup

It’s super easy to get started with Google BigQuery!

There are two ways to get started: 1) use the free sandbox account (no billing details required), or 2) use the free tier (requires you to enter billing details, but you’ll also get $300 free Cloud credits).

In either case, this tutorial won’t cost you anything in BigQuery, since the volume of data is so tiny.

We’ll proceed using the sandbox account, so that you don’t have to enter any billing details.

Step 1: Set up BigQuery

Follow these steps:

  1. Go to the Google Cloud BigQuery homepage
  2. Click “Sign in” in the top right corner
  3. Click on “Console” in the top right corner
  4. A new project called “My First Project” is automatically created
  5. In the left side pane, scroll down until you see BigQuery and click it

Here’s that process shown as a GIF:

BigQuery Login Process

You’re ready for Step 2 below.

BigQuery Console

BigQuery console

Here’s what you can see in the console:

  1. The SANDBOX tag to tell you you’re in the sandbox environment
  2. Message to upgrade to the free trial and $300 credit (may or may not show)
  3. UPGRADE button to upgrade out of the Sandbox account
  4. ACTIVATE button to claim the free $300 credit
  5. The current project and where to create new projects
  6. The Query editor window where you type your SQL code
  7. Current project resource
  8. Button to create a new dataset for this project (see below)
  9. Query outputs and table information window

What is the free Sandbox Account?

The sandbox account is an option that lets you use BigQuery without having to enter any credit card information. There are limits to what you can do, but it gives you peace of mind that you won’t run up any charges whilst you’re learning.

In the sandbox account:

  • Tables or views last 60 days
  • You get 10 GB of storage per month for free
  • And 1 TB of data processing each month

It’s more than enough to do everything in this tutorial today.

How to set up the BigQuery sandbox (YouTube video from Google Cloud)

BigQuery Pricing for Regular Accounts

Unlike Google Sheets, you have to pay to use BigQuery based on your storage and processing needs.

However, there is a sandbox account for free experimentation (see above) and then a generous free tier to continue using BigQuery.

In fact, if you’re working with datasets that are only just too big for Sheets, it’ll probably be free to use BigQuery or very cheap.

BigQuery charges for data storage, streaming inserts, and for querying data, but loading and exporting data are free of charge.

The first 1 TB (1,000 GB) of data processed by your queries each month is free.

Full BigQuery pricing information can be found here.

Clicking on the blue “Try BigQuery free” button on the BigQuery homepage will let you register your account with billing details and claim the free $300 cloud credits.

Section 3: How to get your data into BigQuery

Extracting, loading and transforming (ELT) is sometimes the most challenging and time consuming part of a data analysis project. It’s the most engineering-heavy stage, where the heavy lifting happens.

You can load data into BigQuery in a number of ways:

  1. From a readable data source (such as your local machine)
  2. From Google Sheets
  3. From other Google services, such as Google Ad Manager and Google Ads
  4. Use a third-party data integration tool, e.g. Supermetrics, Stitch
  5. Use the CIFL BigQuery connector
  6. Write Apps Script to upload data
  7. From other Google Cloud services, such as Cloud Storage or Cloud SQL
  8. Other advanced methods specific to Google Cloud

In this tutorial, we’ll look at loading data from a Google Sheet into BigQuery.

Get started with Google BigQuery: Dataset For This Tutorial

Step 2: Make a copy of the datasets for this tutorial

Make a copy of these Google Sheets in your Drive folder:

Brooklyn Bridge pedestrian traffic

Bicycle Crossings Of New York City Bridges

You might want to make a SECOND copy in your Drive folder too, so you can keep one copy untouched for the upload to BigQuery and use the second copy for doing the follow-along analysis in Google Sheets.

The first dataset is a record of pedestrian traffic crossing Brooklyn Bridge in New York city (source).

It’s only 7,000 rows, so it could be easily analyzed in Sheets of course, but we’ll use it here so that you can do the same steps in BigQuery and in Sheets.

The second dataset is a daily total of bike counts for New York’s East River bridges (source).

There’s nothing inherently wrong with putting “small” data into BigQuery. Yes, it’s designed for truly gigantic datasets (billions of rows+) but it works equally well on data of any size.

Back in the BigQuery Console, you need to set up a project before you can add data to it.

Get started with Google BigQuery: Loading data From A Google Sheet

Think of the Project as a folder in Google Drive, the Dataset as a Google Sheet and the Table as an individual sheet (tab) within that Google Sheet.

The first step to get started with Google BigQuery is to create a project.

In step 1, BigQuery will have automatically generated a new project for you, called “My First Project”.

If it didn’t, or you want to create another new project, here’s how.

Step 3: Create a new Project

In the top bar, to the right of where it says “Google Cloud Platform”, click on the Project drop-down menu.

In the popup window, click NEW PROJECT.

Give it a name, organization (your domain) and location (parent organization or folder).

Optionally, you can choose to bookmark this project in the Resources section of the sidebar. Click “PIN PROJECT” to do this.

Step 4: Create a new Dataset

Next you need to create a dataset by clicking “CREATE DATASET”.

Name it “start_bigquery”. You’re not allowed to have any spaces or special characters apart from the underscore.

Set the data location to your locale, leave the other settings alone and then click “Create dataset”.

This new dataset will show up underneath your project name in the sidebar.

Step 5: Create a new Table

With the dataset selected, click on the “+ CREATE TABLE” or big blue plus button.

You want to select “Drive”, add the URL and set the file format to Google Sheets.

Name your table “brooklyn_bridge_pedestrians”.

Choose Auto detect schema.

Under Advanced settings, tell BigQuery you have a single header row to skip by entering the value 1.

Your settings should look like this:

Google BigQuery create table

If you make a mistake, you can simply delete the table and start again.

Section 4: Analyzing Data in BigQuery

Google BigQuery uses Structured Query Language (SQL) to analyze data.

The Google Sheets Query function uses a similar SQL style syntax to parse data. So if you know how to use the Query function then you basically know enough SQL to get started with Google BigQuery!

Basic SQL Syntax for BigQuery

The basic SQL syntax to write queries looks like this:

SELECT these columns
FROM this table
WHERE these filter conditions are true
GROUP BY these aggregate conditions
HAVING these filters on aggregates
ORDER BY i.e. sort by these columns
LIMIT restrict answer to X number of rows

You’ll see all of these keywords and more in the exercises below.
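To see all seven keywords working together before you reach the exercises, here's a minimal sketch you can run on your own machine. It uses Python's built-in sqlite3 module as a local stand-in for BigQuery (an assumption for illustration: SQLite's SQL dialect differs from BigQuery's in places), and the six rows of data are invented, not taken from the tutorial dataset.

```python
import sqlite3

# Hypothetical sample rows, invented for illustration only.
SAMPLE = [
    ("2017-10-01 08:00:00", "clear-day", 120),
    ("2017-10-01 09:00:00", "clear-day", 300),
    ("2017-10-01 10:00:00", "rain", 40),
    ("2017-10-02 08:00:00", "rain", 15),
    ("2017-10-02 09:00:00", "cloudy", 210),
    ("2017-10-02 10:00:00", "cloudy", 190),
]

# An in-memory SQLite database stands in for a BigQuery table here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pedestrians "
    "(hour_beginning TEXT, weather_summary TEXT, Pedestrians INTEGER)"
)
conn.executemany("INSERT INTO pedestrians VALUES (?, ?, ?)", SAMPLE)

query = """
SELECT weather_summary, SUM(Pedestrians) AS total  -- columns to return
FROM pedestrians                                   -- table to read
WHERE Pedestrians > 20                             -- filter the raw rows
GROUP BY weather_summary                           -- roll up by category
HAVING SUM(Pedestrians) > 100                      -- filter the groups
ORDER BY total DESC                                -- sort the result
LIMIT 2                                            -- cap the row count
"""
result = conn.execute(query).fetchall()
print(result)  # [('clear-day', 420), ('cloudy', 400)]
```

The clauses must be written in this order, even if you only use some of them.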

Get started with Google BigQuery: First Query

The BigQuery console provides a button that gives you a starter query.

Step 6: Write your first query

Click on “QUERY TABLE” and this query shows up in your editor window:

SELECT  FROM `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians` LIMIT 1000

Modify it by adding a * between the SELECT and FROM, and reducing the number after LIMIT to 10:

SELECT * FROM `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians` LIMIT 10

Then format your query across multiple lines through the menu: More > Format

SELECT
  *
FROM
  `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians`
LIMIT
  10

Click “▶️ Run” to execute the query.

The output of this query will be 10 rows of data showing under the query editor:

Google BigQuery first query

Woohoo!

You just wrote your first query in Google BigQuery.

Let’s continue and analyze the dataset:

Exercise 2: Analyzing Data In BigQuery

Run through the following steps:

Step 7: tell the story of one row

I always advocate doing this with any new dataset.

Write a query that selects all the columns (SELECT *) and a limited number of rows (e.g. LIMIT 10), as you did in step 6 above.

Run that query and look at the output. Scan across one whole row. Look at every column and think about what data is stored there.

Think about doing the equivalent step in Google Sheets. Look at your dataset and scroll to the right, telling the story of a single row.

We do this step to understand our data, before getting too immersed in the weeds.

Select Specific Columns

Step 8: Select specific columns

Select specific columns by writing the column names into your query.

You can also click on column names in the schema view (click on the table name in the left sidebar to access this) to add them to the query directly.

SELECT
  hour_beginning,
  location,
  Pedestrians,
  weather_summary
FROM
  `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians`
LIMIT
  10

Math Operations

Let’s find out the total number of pedestrians that crossed the Brooklyn Bridge across the whole time period.

Step 9: Calculate total in Google Sheets

Open the Google Sheet you copied in Step 2, called “Copy of Brooklyn Bridge pedestrian count dataset”

Add this simple SUM function to cell C7298 to calculate the total:

=SUM(C2:C7297)

This gives an answer of 5,021,692

Let’s see how to do that in BigQuery:

Step 10: Math operations in BigQuery

Write a query with the pedestrians column and wrap it with a SUM function:

SELECT
  SUM(Pedestrians) AS total_pedestrians
FROM
  `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians`

This gives the same answer of 5,021,692

You’ll notice that I gave the output a new column name using the code “AS total_pedestrians”. This is similar to using the LABEL clause in the QUERY function in Google Sheets.

Filtering Data

In SQL, the WHERE clause is used to filter rows of data.

It acts in the same way as the filter operation on a dataset in Google Sheets.

Step 11: Filtering data in Google Sheets

Back in your Google Sheet with the pedestrian data, add a filter to the dataset: Data > Create a filter

Click on the filter on the weather_summary column to open the filter menu.

Click “Clear” to deselect all the items.

Then choose “sleet” and “snow” as your filter values.

Google Sheets filter

Hit OK to implement the filter.

You end up with 61 rows of data showing only the “sleet” or “snow” rows.

Now let’s see that same filter in BigQuery.

Step 12: WHERE filter keyword

Add the WHERE clause after the FROM line, and use the OR statement to filter on two conditions.

SELECT
  *
FROM
  `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians`
WHERE
  weather_summary = 'snow' OR weather_summary = 'sleet'

Check the count of the rows output by this query. It’s 61, which matches the row count from your Google Sheet.
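When you filter one column on several values, the IN operator is a shorter way to write the same condition. Here's a small sketch, again using Python's sqlite3 module as a local stand-in for BigQuery and a handful of invented rows, showing that both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pedestrians (weather_summary TEXT, Pedestrians INTEGER)")
# Hypothetical rows, invented for illustration only.
conn.executemany(
    "INSERT INTO pedestrians VALUES (?, ?)",
    [("snow", 12), ("rain", 40), ("sleet", 7), ("clear-day", 300), ("snow", 25)],
)

# The filter written with OR, as in the step above.
with_or = conn.execute(
    "SELECT * FROM pedestrians "
    "WHERE weather_summary = 'snow' OR weather_summary = 'sleet'"
).fetchall()

# The same filter written with IN.
with_in = conn.execute(
    "SELECT * FROM pedestrians WHERE weather_summary IN ('snow', 'sleet')"
).fetchall()

print(len(with_or), len(with_in))  # 3 3
```

IN becomes more convenient as the list of values grows, because you only name the column once.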

Ordering Data

Another common operation we want to do to understand our data is sort it. In Sheets we can either sort through the filter menu options or through the Data menu.

Step 13: Sorting data in Google Sheets

Remove the sleet and snow filter you applied above.

On the temperature column, click the Sort A → Z option, to sort the lowest temperature records to the top.

(Quick aside: it’s amazing to still see so many people walking across the bridge in sub-zero temps!)

Let’s recreate this sort in BigQuery.

Step 14: ORDER BY sort keyword

Add the ORDER BY clause to your query, after the FROM clause:

SELECT
  *
FROM
  `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians`
ORDER BY
  temperature ASC;

Use the keyword ASC to sort ascending (A – Z) or the keyword DESC to sort descending (Z – A).

You might notice that the first two records that show up have “null” in the temperature column, which means that no temperature value was recorded for those rows or it’s missing.

Let’s filter them out with the WHERE clause, so you can see how the WHERE and ORDER BY fit together.

Step 15: Filter out null values

The WHERE clause comes after the FROM clause but before the ORDER BY.

Remove the nulls by using the keyword phrase “IS NOT NULL”.

SELECT
  *
FROM
  `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians`
WHERE
  temperature IS NOT NULL
ORDER BY
  temperature ASC;
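Here's a minimal sketch of why those null rows appear first and how the filter removes them. It uses Python's sqlite3 module as a local stand-in for BigQuery (both place nulls first in an ascending sort) and five invented rows, where Python's None plays the role of null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pedestrians (hour_beginning TEXT, temperature REAL)")
# Hypothetical rows, invented for illustration; None is stored as NULL.
conn.executemany(
    "INSERT INTO pedestrians VALUES (?, ?)",
    [
        ("2017-10-01 00:00:00", 52.0),
        ("2017-12-28 06:00:00", None),
        ("2017-12-31 23:00:00", 9.0),
        ("2017-11-05 12:00:00", None),
        ("2017-10-15 14:00:00", 71.0),
    ],
)

# Ascending sort with no filter: the NULL rows come first.
unfiltered = [
    row[0]
    for row in conn.execute(
        "SELECT temperature FROM pedestrians ORDER BY temperature ASC"
    )
]

# WHERE runs before ORDER BY, so the NULL rows are gone before sorting.
filtered = [
    row[0]
    for row in conn.execute(
        "SELECT temperature FROM pedestrians "
        "WHERE temperature IS NOT NULL ORDER BY temperature ASC"
    )
]

print(unfiltered)  # [None, None, 9.0, 52.0, 71.0]
print(filtered)    # [9.0, 52.0, 71.0]
```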

Aggregating Data

In Google Sheets, we group data with a pivot table.

Typically you choose a category for the rows and aggregate (summarize) the data into each category.

In this dataset, we have a row of data for each hour of each day. We want to group all 24 rows into a single summary row for each day.

Step 16: Pivot tables in Google Sheets

With your cursor somewhere in the pedestrian dataset, click Data > Pivot table

In the pivot table, add hour_beginning to the Rows.

Uncheck the “Show totals” checkbox.

Right click on one of the dates in the pivot table and choose “Create pivot date group“.

Select “Day of the month” from the list of options.

Add hour_beginning to Rows again, and move it so it’s the top category in Rows.

Check the “Repeat row labels” checkbox.

Right click on one of the dates in the pivot table, choose “Create pivot date group” again, and select “Year-Month” from the list of options.

Add Pedestrians field to the Values section, and leave it set to the default SUM.

Your pivot table should look like this, with the total pedestrian counts for each day:

Google Sheets pivot table

Now let’s recreate this in BigQuery.

If you’ve ever used the QUERY function in Google Sheets then you’re probably familiar with the GROUP BY keyword. It does exactly what the pivot table in Sheets does and “rolls up” the data into the summary categories.

Step 17: GROUP BY in BigQuery to aggregate data

First off, you need to use the EXTRACT function to extract the date from the timestamp in BigQuery.

This query selects the extracted date and the original timestamp, so you can see them side-by-side:

SELECT
  EXTRACT(DATE FROM hour_beginning) AS bb_date,
  hour_beginning
FROM
  `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians`

The EXTRACT DATE function turns “2017-10-01 00:00:00 UTC” into “2017-10-01”, which lets us aggregate by the date.

Modify the query above: add the SUM(Pedestrians) column, remove the “hour_beginning” column you no longer need, and add the GROUP BY clause. Reference the grouping column by the alias you gave it, “bb_date”:

SELECT
  EXTRACT(DATE FROM hour_beginning) AS bb_date,
  SUM(Pedestrians) AS bb_pedestrians
FROM
  `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians`
GROUP BY 
  bb_date

The output of this query will be a table that matches the data in your pivot table in Google Sheets. Great work!
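If you'd like to watch the roll-up happen on a tiny example, here's a sketch using Python's sqlite3 module as a local stand-in for BigQuery. Two things are assumptions made for illustration: the five hourly rows are invented, and SQLite's date() function is swapped in for BigQuery's EXTRACT(DATE FROM ...), which SQLite doesn't have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pedestrians (hour_beginning TEXT, Pedestrians INTEGER)")
# Hypothetical hourly rows, invented for illustration only.
conn.executemany(
    "INSERT INTO pedestrians VALUES (?, ?)",
    [
        ("2017-10-01 00:00:00", 44),
        ("2017-10-01 01:00:00", 30),
        ("2017-10-01 02:00:00", 25),
        ("2017-10-02 00:00:00", 10),
        ("2017-10-02 01:00:00", 5),
    ],
)

# date() is SQLite's stand-in for BigQuery's EXTRACT(DATE FROM hour_beginning).
daily_totals = conn.execute(
    """
    SELECT
      date(hour_beginning) AS bb_date,
      SUM(Pedestrians) AS bb_pedestrians
    FROM pedestrians
    GROUP BY bb_date
    ORDER BY bb_date
    """
).fetchall()

print(daily_totals)  # [('2017-10-01', 99), ('2017-10-02', 15)]
```

Five hourly rows collapse into two daily rows, one per distinct date, just like the rows of the pivot table.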

Functions in BigQuery

You’ll notice we used a special function (EXTRACT) in that previous query.

Like Google Sheets, BigQuery has a huge library of built-in functions. As you make progress on your BigQuery journey, you’ll find more and more of these functions to use.

For more information on functions in BigQuery, have a look at the function reference.

There’s also this handy tool from Analytics Canvas that converts Google Sheets functions into their BigQuery SQL code equivalent.

Filtering Aggregated Data

We saw the WHERE clause earlier, which lets you filter rows in your dataset.

However, if you aggregate your data with a GROUP BY clause and you want to filter this grouped data, you need to use the HAVING keyword.

Remember:

  • WHERE = filter original rows of data in dataset
  • HAVING = filter aggregated data after a GROUP BY operation

To conceptualize this, let’s apply the filter to our aggregate data in the Google Sheet pivot table.

Step 18: Pivot table filter in Google Sheets

Add hour_beginning to the filter section of your pivot table in Google Sheets.

Filter by condition and set it to Date is before > exact date > 11/01/2017

This filter removes rows of data in your Pivot Table where the data is on or after 1 November 2017. It leaves just the October 2017 data.

By now, I think you know what’s coming next.

Let’s apply that same filter condition in BigQuery using the HAVING keyword.

Step 19: HAVING filter keyword

Add the HAVING clause to your existing query, to filter out data on or after 1 November 2017.

Only data that satisfies the HAVING condition (less than 2017-11-01) is included.

SELECT
  EXTRACT(DATE FROM hour_beginning) AS bb_date,
  SUM(Pedestrians) AS bb_pedestrians
FROM
  `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians`
GROUP BY 
  bb_date
HAVING 
  bb_date < '2017-11-01'

The output of this query is 31 rows of data, one for each day of October.
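The difference between WHERE and HAVING is easiest to see when you filter on the numbers themselves. Here's a minimal sketch using Python's sqlite3 module as a local stand-in for BigQuery, with invented rows and SQLite's date() in place of EXTRACT(DATE FROM ...). The threshold of 100 is also just an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pedestrians (hour_beginning TEXT, Pedestrians INTEGER)")
# Hypothetical hourly rows, invented for illustration only.
conn.executemany(
    "INSERT INTO pedestrians VALUES (?, ?)",
    [
        ("2017-10-01 08:00:00", 60),
        ("2017-10-01 09:00:00", 70),
        ("2017-10-02 08:00:00", 150),
        ("2017-10-02 09:00:00", 20),
        ("2017-10-03 08:00:00", 30),
        ("2017-10-03 09:00:00", 40),
    ],
)

# WHERE: drop hourly rows under 100 BEFORE the daily totals are calculated.
where_result = conn.execute(
    """
    SELECT date(hour_beginning) AS bb_date, SUM(Pedestrians) AS bb_pedestrians
    FROM pedestrians
    WHERE Pedestrians >= 100
    GROUP BY bb_date
    ORDER BY bb_date
    """
).fetchall()

# HAVING: calculate the daily totals first, THEN drop days under 100.
having_result = conn.execute(
    """
    SELECT date(hour_beginning) AS bb_date, SUM(Pedestrians) AS bb_pedestrians
    FROM pedestrians
    GROUP BY bb_date
    HAVING SUM(Pedestrians) >= 100
    ORDER BY bb_date
    """
).fetchall()

print(where_result)   # [('2017-10-02', 150)]
print(having_result)  # [('2017-10-01', 130), ('2017-10-02', 170)]
```

The same threshold gives different answers because WHERE looks at each hourly row, while HAVING looks at each daily total.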

Get started with Google BigQuery: Joining Data

A SQL Query walks into a bar.
In one corner of the bar are two tables.
The Query walks up to the tables and asks:
Mind if I join you?

JOIN pulls multiple tables together, like the VLOOKUP function in Google Sheets. Let's start in your Google Sheet.

Step 20: Vlookup to join data tables in Google Sheets

Create a new blank Sheet inside your Google Sheet.

Add this formula to import the bicycle bridge data:

=IMPORTRANGE("https://docs.google.com/spreadsheets/d/1TvebfUaO03fkzB0GGMw07mnpzrprTubixmgCMdyMRXo/edit#gid=1409549390","Sheet1!A1:J32")

Back in the pivot table sheet, use a VLOOKUP to bring the Brooklyn Bridge bicycle data next to the pedestrian data.

Put the VLOOKUP in column D, next to the pedestrian count values:

=VLOOKUP( DATE(2017,10,B2) , Sheet2!A1:F , 6 , false )

Drag the formula down the rows to complete the dataset.

The data in your Sheet now looks like this:

Google Sheets Pivot Table Results

That's great!

We summarized the pedestrian data by day and joined the bicycle data to it, so you can compare the two numbers.

As you can see, there are around 10k - 20k pedestrian crossings/day and about 2k - 3k bike crossings/day.

Joining tables in BigQuery

Let's recreate this table in BigQuery, using a JOIN.

Step 21: Upload bicycle data to BigQuery

Following step 5 above, create a new table in your start_bigquery dataset and upload the second dataset, of bike data for NYC bridges from October 2017.

Name your table "nyc_bridges_bikes"

Your project should now look like this in the Resources pane in the left sidebar:

BigQuery Project hierarchy

What we want to do now is take the table that you created above, with pedestrian data per day, and add the bike counts for each day to it.

To do that we use an INNER JOIN.

There are several different types of JOIN available in SQL, but we'll only look at the INNER JOIN in this article. It creates a new table with only the rows from each of the constituent tables that meet the join condition.

In our case the join condition is matching dates from the pedestrian table and the bike table.

We'll end up with a table consisting of the date, the pedestrian data and the bike data.

Ready? Let's go.

Step 22: JOIN the datasets in BigQuery

First, wrap the query you wrote above with the WITH clause, so you can refer to the temporary table that's created by the name "pedestrian_table".

WITH pedestrian_table AS (
  SELECT
    EXTRACT(DATE FROM hour_beginning) AS bb_date,
    SUM(Pedestrians) AS bb_pedestrians
  FROM
    `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians`
  GROUP BY 
    bb_date
  HAVING 
    bb_date < '2017-11-01'
)

Next, select both columns from the pedestrian table and one column from the bike table:

SELECT 
  pedestrian_table.bb_date,
  pedestrian_table.bb_pedestrians,
  bike_table.Brooklyn_Bridge AS bb_bikes
FROM
  pedestrian_table

Of course, you need to add in the bike table to the query so the bike data can be retrieved:

INNER JOIN 
  `start-bigquery-294922.start_bigquery.nyc_bridges_bikes` AS bike_table

Finally, specify the join condition, which tells the query what columns to match:

ON 
  pedestrian_table.bb_date = bike_table.Date

Phew, that's a lot!

Here's the full query:

WITH pedestrian_table AS (
  SELECT
    EXTRACT(DATE FROM hour_beginning) AS bb_date,
    SUM(Pedestrians) AS bb_pedestrians
  FROM
    `start-bigquery-294922.start_bigquery.brooklyn_bridge_pedestrians`
  GROUP BY 
    bb_date
  HAVING 
    bb_date < '2017-11-01'
)
SELECT 
  pedestrian_table.bb_date,
  pedestrian_table.bb_pedestrians,
  bike_table.Brooklyn_Bridge AS bb_bikes
FROM
  pedestrian_table
INNER JOIN 
  `start-bigquery-294922.start_bigquery.nyc_bridges_bikes` AS bike_table
ON 
  pedestrian_table.bb_date = bike_table.Date

You'll notice that the names of the columns in our SELECT clause are preceded by the table name, e.g. "pedestrian_table.bb_date".

This ensures there is no confusion over which columns from which tables are being requested. It’s also necessary when you join tables that have common column headings.

The output of this query is the same as the table you created in Google Sheets in step 20 (using the pivot table and VLOOKUP).

Google BigQuery join query
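To see how the INNER JOIN treats dates that appear in only one table, here's a small sketch using Python's sqlite3 module as a local stand-in for BigQuery. The table names follow the tutorial, but the rows are invented, and SQLite's date() replaces EXTRACT(DATE FROM ...), so treat it as an illustration of the join logic rather than the exact BigQuery query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE brooklyn_bridge_pedestrians "
    "(hour_beginning TEXT, Pedestrians INTEGER)"
)
conn.execute("CREATE TABLE nyc_bridges_bikes (Date TEXT, Brooklyn_Bridge INTEGER)")

# Hypothetical rows, invented for illustration only.
conn.executemany(
    "INSERT INTO brooklyn_bridge_pedestrians VALUES (?, ?)",
    [
        ("2017-10-01 08:00:00", 100),
        ("2017-10-01 09:00:00", 200),
        ("2017-10-02 08:00:00", 50),   # no bike row for this date
        ("2017-11-01 08:00:00", 75),   # removed by the HAVING filter
    ],
)
conn.executemany(
    "INSERT INTO nyc_bridges_bikes VALUES (?, ?)",
    [
        ("2017-10-01", 2000),
        ("2017-10-03", 2500),          # no pedestrian rows for this date
        ("2017-11-01", 1800),
    ],
)

joined = conn.execute(
    """
    WITH pedestrian_table AS (
      SELECT
        date(hour_beginning) AS bb_date,
        SUM(Pedestrians) AS bb_pedestrians
      FROM brooklyn_bridge_pedestrians
      GROUP BY bb_date
      HAVING bb_date < '2017-11-01'
    )
    SELECT
      pedestrian_table.bb_date,
      pedestrian_table.bb_pedestrians,
      bike_table.Brooklyn_Bridge AS bb_bikes
    FROM pedestrian_table
    INNER JOIN nyc_bridges_bikes AS bike_table
      ON pedestrian_table.bb_date = bike_table.Date
    ORDER BY pedestrian_table.bb_date
    """
).fetchall()

print(joined)  # [('2017-10-01', 300, 2000)]
```

Only 1 October survives, because it's the only date present in both tables after the HAVING filter. A VLOOKUP would show #N/A for an unmatched date, whereas an INNER JOIN silently leaves the row out.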

Formatting Your Queries

The last couple of things to mention with SQL syntax are how to add comments and how to format your queries.

Step 23: Formatting Your Queries

You can add comments in SQL two ways, with a double dash "--" or forward slash and star combination "/*...*/".

-- single line comment, ignored when the program is run

or

/* multi-line comment
everything between the slash-stars 
is ignored by the program when it's run */

It's also a good habit to put SQL keywords on separate lines, to make it more readable.

Use the menu More > Format to do this automatically.

Section 5: Export Data Out Of BigQuery

You have a few options to export data out of BigQuery.

In the Query results section of the editor, click on the "SAVE RESULTS" button to:

  • Save as a CSV file
  • Save as a JSON file
  • Export query results to Google Sheets (up to 16,000 rows)
  • Copy to Clipboard

In this tutorial, we're going to export the data out of BigQuery and back into a Google Sheet, to create a chart. We're able to do this because the summary dataset we've created is small (it's aggregated data we want to use to create a chart, not the row-by-row data).

Explore BigQuery Data in Sheets or Data Studio

If you want to create a chart based on hundreds of thousands or millions of rows of data, then you can explore the data in Google Sheets or Data Studio directly, without taking it out of BigQuery.

Click on the "EXPLORE DATA" option in the Query results section of the editor:

  • Explore in Google Sheets using Connected Sheets (Enterprise customers only)
  • Explore directly in Data Studio

Get started with Google BigQuery: Export to Google Sheets

In this tutorial, the output table is easily small enough to fit in Google Sheets, so let's export the data out of BigQuery and into Sheets.

There, we'll create a chart showing the pedestrian and bike traffic across the Brooklyn Bridge.

Step 24: Export Data Out Of BigQuery

Run your query from step 22 above, which outputs a table with date, pedestrian count and bike count.

Click on the "SAVE RESULTS" button and select Google Sheets.

Hit Save.

Select Open in the toast popup that tells you a new Sheet has been created, or find it in your Drive root folder (the top folder).

The data now looks like this in the new Sheet:

Export data from BigQuery to Google Sheets

Yay! Back on familiar territory!

From here, you can do whatever you want with your data.

I chose to create a simple line chart to compare the daily foot and bike traffic across Brooklyn Bridge:

Step 25: Display the data in a chart in Google Sheets

Highlight your dataset and go to Insert > Chart

Select the line chart (if it isn't selected as the default).

Fix the title and change the column names so they display better in the chart.

Under the Horizontal Axis option, check the "Treat labels as text" checkbox.

Brooklyn Bridge pedestrian and bike traffic

See how much information this chart gives you, compared to thousands of rows of raw data.

It tells you the story of the pedestrian and bike traffic crossing the Brooklyn Bridge.

Congratulations!

You've completed your first end-to-end BigQuery + Google Sheets data analysis project.

Seriously, well done!

Get started with Google BigQuery: Resources

BigQuery Documentation

Explore the public datasets in BigQuery for query practise.

Google BigQuery: The Definitive Guide book is quite advanced for a beginner but extremely comprehensive.

The full code for this tutorial "Get started with Google BigQuery" is also available here on GitHub.

How To Merge Cells In Google Sheets And When To Be Careful

In this tutorial, you’ll learn how to merge cells in Google Sheets, when to use merged cells in Google Sheets, the pros and cons of using merged cells, and finally, how to identify them with Apps Script.

When Your Formula Doesn’t Work: Formula Parse Errors in Google Sheets

Whether you’re just starting out with Google Sheets or are a seasoned pro, sooner or later one of your formulas will give you a formula parse error message rather than the result you want.

It can be frustrating, especially if it’s a longer formula where the formula parse error may not be obvious.

In this post, I’ll explain what a Google Sheets formula parse error is, how to identify what’s causing the problem, and how to fix it.

What is a formula parse error?

Before we get into the different types of errors, you might be wondering what a formula parse error actually is.

Essentially, it means Google Sheets can’t interpret your formula. It can’t fulfill the formula request so it returns an error message.

There are a variety of ways this can happen — everything from typos to mathematical impossibilities — and we’ll explore them all in detail below.

Understanding the meaning behind the error messages, and learning how to fix them, is a crucial step to becoming a formula pro in Google Sheets.

Auditing and Debugging Formula Parse Errors in Google Sheets

Match the error message in your Google Sheet to the sections below, and find out what might be causing your error.

  1. A formula parse error message popup prevents me entering my formula
  2. I’m getting an #N/A error message
  3. I’m getting a #DIV/0! error message
  4. I’m getting a #VALUE! error message
  5. I’m getting a #REF! error message
  6. I’m getting a #NAME? error message
  7. I’m getting a #NUM! error message
  8. I’m getting an #ERROR! error message
  9. I’m getting a #NULL! error message
  10. Other strategies for dealing with errors
  11. Functions to help deal with formula errors in Google Sheets
  12. Help! My formula is STILL not working

Here’s a Google Sheet with all these examples in.

1. A formula parse error message popup prevents me entering my formula

You think you’ve finished your formula, so you hit enter and boom! You get slapped with a popup message box "Houston, we have a problem" or similar:

Formula parse error in Google Sheets

It’s reasonably rare that you’ll experience this, and it usually points to some fundamental problem with your formula.

For example, imagine that as you hit the Enter key, you also accidentally struck the “\” key (which is right above the Enter key) and inadvertently added that to the end of your formula:

Unwanted character causes formula parse error

This will result in the popup error message. It’s easily corrected by removing the unwanted character.

How to correct this error?

Try to avoid these in the first place by checking your formula prior to hitting enter. Make sure you’re not missing a cell reference and you don’t have any unwanted characters lurking.

2. I’m getting an #N/A error message. How do I fix it?

The #N/A formula parse error signifies that a value is not available.

#N/A error in Google Sheets

It happens most frequently when you’re using a lookup function (e.g. the VLOOKUP function) and the search term isn’t found. This is exactly what has happened in the exact match VLOOKUP in the image above. The search term A-051 is not in our data table so the formula returns #N/A.

This formula is not wrong or broken, so we don’t want to delete it. However, it would be cool if you could display a custom message, something like “Result not found”, instead of the #N/A error message, especially if you have a lot of these errors showing. It gives the spreadsheet user much more information and reduces confusion.

Thankfully we can:

How to correct an #N/A error?

Well, there’s this super handy IFERROR function in Google Sheets:

=IFERROR(original formula, value to display if the original formula gives an error)

In this VLOOKUP example, the full formula would look like this:

=IFERROR(VLOOKUP(Search Term, Table, Column Index, FALSE),"Search term not found")

as shown in this example:

iferror and vlookup Formula parse error example

Instead of showing the #N/A formula parse error when a value is not found, the formula will output our custom message, “Search term not found”.

3. I’m getting a #DIV/0! error message

This formula parse error happens when a number is divided by zero, which can occur when you have a zero or a blank cell reference in the denominator.

In layman’s terms, what this means is that we’re trying to compute something like this:

= A / 0

which has no meaning because you can’t divide by 0.

Read more about division by 0 here, although it gets super technical super quickly.

Division by 0 error

Another example is using a formula like AVERAGE with a blank range.

So, = AVERAGE(A1:A10)   will cause a #DIV/0! error if the range A1:A10 contains no numerical values.

How to correct a #DIV/0! error?

Well the first thing to do is determine why your denominator is evaluating to zero.

You can select the denominator and see what it is evaluating to by highlighting it in the formula bar, and seeing what the result is in the little popup box, as shown in this image:

Divide by 0 error evaluation

In this case, the formula in the denominator SUM(A1:A7) evaluates to 0, which causes the error. So check whether your denominator result is 0.

Next, check whether you have linked to blank cells or a blank range in your denominator. Then you can either fill in the blank cell or range, or select a different cell or range for your formula.

If your formula is correct and your cell/ranges are not unintentionally blank, then you’ll want to handle the #DIV/0! error. It looks unsightly and makes your spreadsheet look unfinished if you leave these errors floating around.

As with the #N/A error example, use the IFERROR formula to wrap your current formula and specify a result for when a #DIV/0! error occurs. You might want to output an error message, e.g. “Division by 0 error”, or maybe a specific value, e.g. 0:

Iferror to handle div 0 error

4. I’m getting a #VALUE! error message

This formula parse error typically occurs when your formula is expecting a certain data type as an input but receives the wrong type, for example trying to do math operations on a text value instead of a numerical value.

Spaces in your cells can also cause this error message.

In this example, cell B1 contains a space, which is a string value and causes the #VALUE! error because Google Sheets can’t perform a math operation on it, as seen in this error message:

value error in google Sheets

In general, Google Sheets does a pretty good job of coercing text into numbers when needed. If you enter a value into a cell with some spaces, format it as text and then try to do math on it, Google Sheets will actually force the text into a number and still perform the calculation.

Another cause of #VALUE! errors is mixing US and Rest of World date formats.

US dates have the form MM/DD/YYYY whilst the Rest of the World goes for DD/MM/YYYY. If you have a mix of the two and try to subtract them to get the number of days between them for example, you’ll get the #VALUE! error.

(In fact, it’s the same text/number issue happening underneath the surface. Dates are stored as numbers, but if your date is in the wrong format for the country setting for your spreadsheet, it’ll be stored as a text string and Google Sheets won’t know it’s meant to be a date.)

Value error caused by dates

Here the correct answer should have been 59, the number of days between 31 Dec 2016 and 28 Feb 2017.
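As a quick sanity check on that figure, here is a minimal Python sketch (Python is used purely for illustration, it is not part of Google Sheets). It assumes the two dates are 31 Dec 2016 and 28 Feb 2017, and the variable names are made up for this example. It also shows the same idea as the #VALUE! error: subtracting two pieces of text fails, whereas subtracting two real dates works.

```python
from datetime import date

# The two dates from the example, written unambiguously as (year, month, day).
start = date(2016, 12, 31)
end = date(2017, 2, 28)

# Subtracting two real dates gives a number of days.
days_between = (end - start).days
print(days_between)  # 59

# Subtracting two text strings is not meaningful, which is roughly what
# happens when a date is stored as text instead of as a date.
try:
    "28/02/2017" - "31/12/2016"
    text_subtraction_failed = False
except TypeError:
    text_subtraction_failed = True
print(text_subtraction_failed)  # True
```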

How to correct a #VALUE! error?

The error message should give you some information on which part of your formula is causing the problem.

Search for any possible text/number mismatches, or cells containing errant spaces. If you click into a cell and the flashing cursor has a gap between itself and the element it’s next to, then you’ll have a space there.

Cells can look empty but still contain spaces:

Value error explained

Dates with spaces in the middle won’t work either:

Date Value error explained

5. I’m getting a #REF! error message

The #REF! formula parse error occurs when you have an invalid reference.

Missing reference: For example when you reference a cell in your formula that has since been deleted (not the value inside the cell, but the whole cell has been deleted, typically when you’ve deleted a row or column in your worksheet).

In this example, the original formula was = A1 * B1, but when I deleted column A, the formula went haywire because of the missing reference:

Ref error message

Another way that a formula can refer to missing references is when you copy a formula with a relative range at the edge of your sheet. When you copy and paste, it’s possible the relative range moves as if it were outside the bounds of the sheet, which is not allowed and will cause a #REF! error.

In this example, the sum function adds the cells in the 3 rows above. When I try to copy-paste the sum function into a new cell with fewer than 3 rows above, it’ll give me the #REF! error:

Ref Formula parse error caused by copy

Lookup out of bounds: You’ve probably seen the #REF! error if you use lookup formulas frequently, when you’ve tried to return a value outside of ranges you’ve specified. In this VLOOKUP example, I’m trying to return an answer from the 3rd column of a search table that only has 2 columns:

Ref error message lookup out of bounds

Circular dependency: You’ll also get a #REF! error when a circular dependency is detected (when the formula refers to itself).

Ref error message circular dependence

In this example, I have numbers in the range A1 to A3, but the SUM formula in cell A4 tries to sum from A1 to A4, which includes itself. Hence, we have a circular argument where cell A4 is trying to be both an input and output cell, which is not allowed.

How to correct a #REF! error?

First of all, read the error message to determine what kind of #REF! error you’re dealing with. This should give you a big hint on how to correct the error.

For deleted references, look for the #REF! error inside your formula, and replace the #REF! with the correct reference to a cell or range.

For out-of-bound lookup errors, look through your formula carefully and check your range sizes against any row or column indexes you’re using.

For circular dependencies, find the reference that’s causing the problem (i.e. where you refer to the current cell inside your formula too) and modify it.

6. I’m getting a #NAME? error message

The #NAME? formula parse error signifies a problem with your formula syntax.

The most common reason for this error is a misspelling in one of your function names.

In this example, I misspelt the SUM function as SUMM, which Google Sheets didn’t recognize, so returned an error:

Sum error from misspelling

Another reason for a #NAME? error is referencing a named range which doesn’t actually exist, or is misspelt.

So

=SUM(profit)

will give you a #NAME? error if the named range profit does not exist.

Missing quotation marks around a text value, as shown in this simple formula, will also cause a #NAME? error:

=CONCAT("First",Second)

(The word Second is missing quotation marks.)

How to correct a #NAME? error?

Check your function names are correct. Use the function helper wizard to reduce the chances of errors happening, especially for the functions with longer names. As you start typing your formula, you’ll see a menu of functions, which you can select with the up and down arrows and Tab.

Check you have defined all named ranges before using them in your formulas and that they all have the correct spellings.

Check any text values are entered with the required quotation marks.

Lastly, have you missed the colon in your range references? It’ll be obvious because it won’t be highlighted correctly.

This formula =SUM(A1A10)

is missing the colon between A1 and A10 and will throw a #NAME? error.

It should of course read =SUM(A1:A10)

7. I’m getting a #NUM! error message

The #NUM! formula parse error is shown when your formula contains numeric values that aren’t valid.

The classic example is trying to find the square root of a negative number, which isn’t allowed:

Num error in google sheets

(For any math geeks out there, you’ll know that you can resolve square roots of negative numbers with complex (imaginary) numbers.)

Some other functions that can result in #NUM! error messages are the SMALL and LARGE functions. If you try to find the n-th smallest value in your dataset, where n is outside the count of values in your dataset, you’ll get a #NUM! error.

For example, you ask Google Sheets to find the 10th smallest number in a dataset that only has 5 values in it:

Num error caused by small function

(Why this doesn’t return a #REF! error like the VLOOKUP out of bounds example, I don’t know.)

How to correct a #NUM! error?

You need to check the numeric arguments in your formula. The error message should give you some hints about which part of the formula is causing the issue.

8. I’m getting an #ERROR! formula parse error message

This formula parse error message is unique to Google Sheets and doesn’t have a direct equivalent in Excel. It means that Google Sheets can’t understand the formula you’ve entered, because it can’t parse the formula to execute it.

For example, if you manually type in a $ symbol to refer to an amount, but Google Sheets thinks you’re referring to an absolute reference:

Error Formula parse error

or you’ve missed a “&” when concatenating text and numerical values:

Error error concatenation

In this case the formula should be: ="Total "&sum(A1:A3)

Another case is when the closing brackets of a formula have been messed up:

Error Formula parse error

How to correct an #ERROR! error?

Carefully check your formula for accuracy.

You want to ensure you’ve got the correct number of brackets and correct join syntax between text and numerical values (e.g. using “&”).

When you want to show values with currency symbols or as percentages, don’t manually type in the “$” or the “%”. Instead enter a plain number and then use the formatting options to change it to the style you want.

9. I’m getting a #NULL! error message

I haven’t been able to recreate a #NULL! formula parse error in the wild but theoretically, it exists!

Null Formula parse error

(If you have one showing in your sheet, let me know! I’d love to update this article with an example here.)

10. Other strategies for dealing with a formula parse error

Look for red highlighting in your formula as this will help identify the source of your error e.g. in the case of too many brackets, the extra, superfluous ones will be highlighted in red.

Peeling back the onion: this is a technique to debug errors for long, complex formulas. Unwrap the outer functions in your formula one-by-one, until you get it working again. Then you can start to add them back one-by-one again, and see exactly which step is causing the issue and fix that.

Different syntax in different countries: Some European countries will use semi-colons “;” in place of commas “,” so this could be a cause of your error. Compare these two formulas, which have identical inputs and outputs but use different syntax for users in different countries (locales).

=ArrayFormula(VLOOKUP(A1;Sheet2!A:I;{2\3\4\5\6\7\8};FALSE))

is the same formula as this:

=ArrayFormula(VLOOKUP(A1,Sheet2!A:I,{2,3,4,5,6,7,8},FALSE))

(This is an example of a VLOOKUP returning multiple values (an array) instead of just a single value.)

Pro tip:

Use an apostrophe at the start of a formula to turn it into a text string, which won’t execute. This is sometimes useful for seeing your whole formula for debugging, or for keeping a copy of your formula so you can copy and paste bits of it elsewhere for testing.

11. Functions to help deal with formula parse errors in Google Sheets

A few other functions related to formula parse errors are worth knowing about.

In fact, there is even a function to generate #N/A errors. It’s of limited use, but can be helpful for doing data validation in more complex formulas.

=NA()    will output an #N/A error. (Google Docs Help on NA)

=ERROR.TYPE(value)    will return a number corresponding to the error type:

  • 1 for #NULL!
  • 2 for #DIV/0!
  • 3 for #VALUE!
  • 4 for #REF!
  • 5 for #NAME?
  • 6 for #NUM!
  • 7 for #N/A
  • 8 for all other errors

(Google Docs Help on ERROR.TYPE)

=ISNA(value)
checks whether a value is the error #N/A, and will give the output TRUE for a #N/A error and FALSE otherwise. (Google Docs Help on ISNA)

=ISERR(value)
checks whether a value is any error other than the #N/A error. (Google Docs Help on ISERR)

=ISERROR(value)
checks whether a value is an error, and will give the output TRUE for any error. (Google Docs Help on ISERROR)

These functions can be summarized in the following table:

#N/A error functions
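To see how these checks relate to each other, here is a small Python model of them. It’s only a sketch: the function names are made up for illustration, and it assumes (as a simplification) that an error is any text value starting with “#”. The ERROR.TYPE codes are the ones listed above.

```python
# Codes returned by ERROR.TYPE for the named errors, as listed above.
ERROR_TYPE_CODES = {
    "#NULL!": 1, "#DIV/0!": 2, "#VALUE!": 3, "#REF!": 4,
    "#NAME?": 5, "#NUM!": 6, "#N/A": 7,
}

def is_error_value(value):
    # Simplifying assumption for this sketch: errors are strings starting with "#".
    return isinstance(value, str) and value.startswith("#")

def error_type(value):
    # Known errors map to 1-7, any other error maps to 8.
    if not is_error_value(value):
        return None  # not an error, so no code applies in this model
    return ERROR_TYPE_CODES.get(value, 8)

def isna(value):
    return value == "#N/A"

def iserr(value):
    # Any error EXCEPT #N/A.
    return is_error_value(value) and value != "#N/A"

def iserror(value):
    # Any error at all, including #N/A.
    return is_error_value(value)

for v in ["#N/A", "#DIV/0!", "#ERROR!", 42]:
    print(v, error_type(v), isna(v), iserr(v), iserror(v))
```

The key thing to notice is that ISNA and ISERR never both return TRUE for the same value, and ISERROR is TRUE whenever either of them is.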

13. Help! My formula is STILL not working

Take a deep breath, don’t panic! There’s an army of Google Sheets super users out there who would love to help you fix your issue, free of charge, in the active help forums.

Try posting your problem into the forum and someone will likely help you out.

To make it easier for people to help you, please share your Google Sheet (either view-only or create a redacted copy if sharing is a concern), what error message you’re getting and what you were expecting the correct answer to be.

Google Sheets Help Forum

Goal Seek in Google Sheets

Goal Seek for Sheets is an Add-On for Google Sheets for doing Goal Seek type data analysis.

In October 2019, Google launched an official Add-On, called “Goal Seek for Sheets”, and it is that Add-On that this tutorial references.

1. What is Goal Seek?

It’s a tremendously powerful and useful technique in data analysis. It’s a process where you set an output you want to achieve (e.g. break even, sell 10k units, save $1m) and let the computer find the input value that will get you there (e.g. 500 attendees, $100k capital lump sum, save $8k/year).

There are three components: 1) the unknown input variable, 2) the equation or calculation that is performed on the input variables to get the output, and 3) the known output.

The Goal Seek algorithm performs a series of “what-if” calculations by plugging in different input values. Each guess (hopefully) gets closer and closer to the solution.

For example, a classic use case of Goal Seek is to determine the number of sales required to break even, given other variables like fixed costs etc.

2. How do you use Goal Seek in Google Sheets?

Imagine Jennifer runs an annual conference for Google Sheet developers called “Sheet Freakz”.

She has a great venue picked out with room for 500 and she’s confident she can fill it. She knows what her costs are (the rental fee for the room, the cost of catering, the cost of promoting the conference) and she has agreed a fee of $1,500 each with 15 Google Sheets experts to come and talk about the latest and greatest in Sheets developments.

(Editor note: I wish this was a real conference!! It is! It’s called SheetsCon. Check out 2020’s wrap up or watch the replays.)

What price must she charge to cover her costs?

This is a classic break even cost analysis example that the Goal Seek Add-On is ideally suited for solving.

Setting up the Sheet

The first step is to simply add all of the known variables into a sheet, like so:

Goal Seek Variables in Google Sheet

These are the variables that Jennifer knows at the start of her problem.

Next, add a line for the registration fee per attendee, but set it to 0 for now. I’ve highlighted the cell yellow to indicate that it’s the solution cell that I want Goal Seek to solve for:

Goal Seek Variables

Finally, add a profit line, which is my revenue (# of attendees * registration fee) less expenses (fixed costs + (# of speakers * speaker fee)):

= ( B7 * B8 ) - ( B4 + ( B5 * B6 ) )

Goal Seek setup

Of course, initially, my profit is -$47,500 because the registration fee is still set to 0 and hence I have $0 revenue.

It’s time to use Goal Seek and let it find the break even registration fee for us.

How do you add Goal Seek in Google Sheets?

Goal Seek is an Add-On, which means you need to add it to your Google Sheet before you can use it.

Search for “Goal Seek” in the Add-Ons marketplace, found under the menu Add-ons > Get add-ons

The official Google Add-On information page will appear.

Click to install. That’s all it takes to add it to your Sheets.

Goal Seek for Sheets

You can also find it in the G Suite marketplace directly by clicking here.

Using Goal Seek

Open the Goal Seek sidebar: Add-ons > Goal Seek > Open

There are three pieces of information you need to enter:

i) Set Cell

Jennifer wants to know what price to charge to break even. In other words, what’s the minimum ticket price that ensures her profit is $0 and she doesn’t lose any money on the conference?

The “Set Cell” is the one we want to specify a value for. It’s the target we’re aiming for.

It’s the cell that contains the calculation formula.

With the Goal Seek sidebar open, click on the cell with the formula, which is cell B10 in this case (1). Add that reference to the Goal Seek solver by clicking on the grid icon next to the Set Cell box (2). This will auto-populate with the cell B10 reference.

Select cells in Goal Seek for Sheets

ii) To Value

Next, type the value you want the “Set Cell” to reach into the “To Value” box.

In this example, we want to set the profit value to 0, so we simply type 0 into this input box.

iii) By Changing Cell

What input variable do we want to vary to solve our equation?

In this case it’s the registration fee. It’s the variable that Jennifer is trying to find, such that her profit is $0 and she breaks even.

Select the cell that holds this variable and then click on the grid icon next to the input field to auto-populate it.

Read it out loud to understand what you’re asking the application to do: “Set Cell X To Value Y By Changing Cell Z”.

In our case, “Set Cell Profit To $0 By Changing Cell Registration Fee” or even cleaner “Set Profit To $0 By Changing Registration Fee”.

When you have all three inputs filled in for the Goal Seek application, the Solve button will turn blue and become active.

Press it.

The registration fee value will start jumping around all over the place as the computer tries different guesses to see what brings the profit value closer to 0.

Eventually it’ll find a solution and notify you that it’s done!

Goal Seek to determine break even cost

In this example, it’s found that the break even registration fee is $94.99999, or $95 when rounded. Great!

Click here to open a Google Sheet template with all the examples from this tutorial >>

(Feel free to make your own copy: File > Make a copy…

If the file won’t open without permission, please open in an incognito window and copy from there.)

Manual Checks

It’s always a good idea to check the final solution that the Add-On finds, and not just trust it blindly.

In our example, it’s very simple to check that the two sides of the equation balance.

Jennifer’s expenses to run the conference are:

Expenses
$25,000 fixed costs + ( 15 speakers @ $1,500 each ) = 25,000 + ( 15 * 1,500 ) = $47,500

And her revenue on the other side of the equation will be:

Revenue
500 attendees * $94.99999 registration fee (Goal Seek solution) = $47,499.995

The difference of half a cent is simply a rounding error.

This is good.

The result from the Goal Seek is indeed a solution that gets Jennifer the break even registration price.
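Because this particular model is so simple, you can also work out the exact answer directly and compare it to what Goal Seek found. Here’s a minimal Python sketch of that check (Python is just a convenient calculator here). It uses the numbers from the example above, and the variable and function names are my own.

```python
# Known inputs from the conference example.
fixed_costs = 25_000
speakers = 15
speaker_fee = 1_500
attendees = 500

def profit(registration_fee):
    # Same calculation as the Sheet formula: revenue less expenses.
    revenue = attendees * registration_fee
    expenses = fixed_costs + speakers * speaker_fee
    return revenue - expenses

# Break even means revenue equals expenses, so fee = expenses / attendees.
expenses = fixed_costs + speakers * speaker_fee
break_even_fee = expenses / attendees

print(expenses)                # 47500
print(break_even_fee)          # 95.0
print(profit(break_even_fee))  # 0.0
print(profit(0))               # -47500
```

The exact break even fee is $95, which agrees with the Goal Seek answer of $94.99999 to within a fraction of a cent.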

Another way to look at how the Goal Seek solver works is to visualize it. A very simplified version might look something like this:

Attempt 1

The computer has no idea what the solution is, so it makes a guess. For example, it might overestimate the result:

Guess 1: Overestimate

Attempt 2

The computer makes another guess. This time it might underestimate the result:

Guess 2: Underestimate

(It won’t always be a neat over/under/over/under guess. For example, if it guesses low, it might take many guesses before the first “over” guess happens. So this over/under flow is just to illustrate the concept.)

Attempt 3

With each additional “guess” the computer gets more accurate, because it uses the information from prior guesses to get closer to the solution. It might still overestimate, as shown here, but it’s getting closer to the solution:

Guess 3: Overestimate

Attempts 4 onwards

So on and so forth, as the computer makes guesses that get closer and closer, under, over, under, over, under, over, etc. until the solution is found:

Converging on solution

The program converges on the solution.
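To make the over/under guessing idea concrete, here’s a minimal Python sketch that uses bisection (repeatedly halving the gap between a guess that’s too low and one that’s too high). This is an assumption for illustration only: I don’t know which method the Goal Seek Add-On actually uses internally, and the function name goal_seek and its parameters are made up. It’s applied to the conference example, where profit depends on the registration fee.

```python
def goal_seek(func, target, low, high, tolerance=1e-6, max_iterations=100):
    """Find x between low and high so that func(x) is within tolerance of target.

    Assumes func is continuous and that low and high land on opposite
    sides of the target (one guess is "under", the other is "over").
    """
    miss_low = func(low) - target
    miss_high = func(high) - target
    if abs(miss_low) <= tolerance:
        return low, 0
    if abs(miss_high) <= tolerance:
        return high, 0
    if miss_low * miss_high > 0:
        raise ValueError("low and high must land on opposite sides of the target")

    for iteration in range(1, max_iterations + 1):
        guess = (low + high) / 2          # try the midpoint
        miss = func(guess) - target       # how far off is this guess?
        if abs(miss) <= tolerance:
            return guess, iteration
        if miss_low * miss < 0:
            high = guess                  # solution lies between low and guess
        else:
            low, miss_low = guess, miss   # solution lies between guess and high
    raise RuntimeError("no solution found within max_iterations")

def profit(registration_fee):
    # 500 attendees, $25,000 fixed costs, 15 speakers at $1,500 each.
    return 500 * registration_fee - (25_000 + 15 * 1_500)

fee, iterations = goal_seek(profit, target=0, low=0, high=1_000)
print(round(fee, 2), iterations)
```

The tolerance and max_iterations arguments play the same role as the accuracy and iteration limits you’d expect any solver like this to have: they decide when a guess is “close enough” and when to give up.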

3. Other features of the Goal Seek Add-On

These features are all found in the sidebar underneath the input section for the Goal Seek variables.

Options

Under the Options menu, you can adjust the default settings for the Goal Seek solver.

You can change the 1) max number of iterations, 2) the tolerance (how accurate you need to be) and 3) the maximum time limit for the process, in seconds.

Goal Seek Settings

I’d suggest that the default settings will suffice for the majority of scenarios, but it’s good to know that you can make these changes should you need to.

Solve Status

The Solve Status box displays helpful information about the current Goal Seek solution.

It lets you know when the algorithm is finished, what the final status is, how many iterations it required and how long it took.

Solve Status in Goal Seek for Sheets

History

Access previous runs of the Goal Seek solver under the History menu.

Use the drop down menu to choose a prior solution based on a timestamp.

Goal Seek History

Error Messages

Occasionally the Goal Seek solver fails to find a solution.

One reason might be that the computer’s guesses get successively further away from the actual solution, i.e. they diverge.

Should this happen, you’ll see an error message like this:

Goal Seek error message

4. More Goal Seek examples

Conference Break Even Example For Attendee Numbers

In the conference example above, instead of knowing how many people would attend, suppose Jennifer knew the registration fee.

She charges $299 for the registration fee and wants to know how many attendees she requires to break even.

The “changing cell” in this case becomes the number of attendees, rather than the registration fee.

The setup would be:

Goal Seek conference Example 2

The formula in cell B10 does not change from the original example. It’s still:

= ( B7 * B8 ) - ( B4 + ( B5 * B6 ))

The Goal Seek settings are:

Settings for conference example 2

Run Goal Seek and the answer comes back as 158.86.

In other words Jennifer needs 159 attendees paying a registration fee of $299 to break even.
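You can double check this one by hand too. Here’s a short Python sketch of the arithmetic, using the figures from the example (the variable names are mine). The one extra step is rounding up, because you can’t sell a fraction of a ticket.

```python
import math

# Known inputs from the conference example.
fixed_costs = 25_000
speakers = 15
speaker_fee = 1_500
registration_fee = 299

expenses = fixed_costs + speakers * speaker_fee

# Break even means attendees * fee = expenses, so attendees = expenses / fee.
exact_attendees = expenses / registration_fee

# Round UP to the next whole person, otherwise revenue falls just short.
attendees_needed = math.ceil(exact_attendees)

print(round(exact_attendees, 2))  # 158.86
print(attendees_needed)           # 159
```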

Mortgage Calculation Example

Suppose you’re looking to buy a new house. You have an upper limit for your monthly payments of $1,500.

The other known variables in this case are: it’s a 30 year term with an annual interest rate of 4.5%.

What’s the maximum amount you can borrow?

Goal Seek can solve this for you.

First, set up your Sheet with the known variables:

Mortgage Goal Seek Example

The payment equation uses the PMT function in Google Sheets in cell B8:

= -PMT( B6 / 12 , B5 , B7 )

In Goal Seek, set the “Set Cell” to be this equation in cell B8.

Set the “To Value” to $1,500 (the maximum monthly payment that can be tolerated).

Set the “By Changing Cell” to the Amount Borrowed in cell B7 (currently $0.00).

Click Solve and let Goal Seek find your solution.

The algorithm will churn through the possible solutions until it settles on one that satisfies your tolerance (accuracy) setting.

In this case, the maximum amount you could borrow is $296,041.75.

Mortgage Solver Solution
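As a cross-check on the Goal Seek answer, the standard loan payment formula can be rearranged to give the amount borrowed directly. Here’s a minimal Python sketch of that calculation. It assumes monthly payments over 30 years (so 360 payments) at a monthly rate of 4.5% / 12, with payments made at the end of each month. The variable and function names are my own.

```python
monthly_payment = 1_500
annual_rate = 0.045
years = 30

monthly_rate = annual_rate / 12   # interest rate per payment period
payments = years * 12             # total number of monthly payments

# Present value of a stream of equal payments (standard annuity formula):
# loan = payment * (1 - (1 + r)^-n) / r
max_loan = monthly_payment * (1 - (1 + monthly_rate) ** -payments) / monthly_rate

def payment_for(loan):
    # The same formula the other way round: the payment for a given loan.
    return loan * monthly_rate / (1 - (1 + monthly_rate) ** -payments)

print(round(max_loan, 2))
print(round(payment_for(max_loan), 2))
```

The result lands on roughly $296,042, in line with the Goal Seek solution above.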

Retirement Calculation Example

Suppose you want to retire with a pot of $1.5m in 40 years’ time. You’re confident of getting a 5% annual return on your investments.

What’s the annual contribution you need to make each year to hit this target?

Let’s use Goal Seek to find out.

First, set up your Sheet with the known variables:

Retirement Goal Seek Example

The calculation of the retirement pot value uses the Future Value function, the FV function, in Google Sheets in cell B7:

= FV( B5 , B4 , -B6 , 0 , 0 )

In Goal Seek, set the “Set Cell” to be this equation in cell B7.

Set the “To Value” to $1,500,000 (your target retirement pot).

Set the “By Changing Cell” to the Annual Contribution in cell B6 (currently $0.00).

Click Solve and let Goal Seek find your solution.

In this case, you need to contribute $12,417 each year to hit your retirement pot target of $1.5m.

Retirement Solution
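Again, it’s worth confirming the answer independently. Here’s a minimal Python sketch that rearranges the standard future value formula to get the contribution directly. It assumes one contribution at the end of each year for 40 years, a 5% annual return and a starting balance of $0, which matches the arguments in the FV formula above. The variable names are my own.

```python
target_pot = 1_500_000
annual_return = 0.05
years = 40

# Future value of $1 contributed at the end of every year:
# ((1 + r)^n - 1) / r
growth_factor = ((1 + annual_return) ** years - 1) / annual_return

# The contribution needed is the target divided by that factor.
annual_contribution = target_pot / growth_factor

# Check by simulating the pot year by year.
pot = 0.0
for _ in range(years):
    pot = pot * (1 + annual_return) + annual_contribution

print(round(annual_contribution))
print(round(pot))
```

The contribution comes out at about $12,417 a year, matching the Goal Seek result, and the year-by-year simulation ends up back at the $1.5m target.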


Resources

Google Documentation on the Goal Seek feature.