Connected Sheets: Analyze Big Data In Google Sheets

The future of Google Sheets is analyzing millions, even billions, of rows of data with regular formulas and pivot tables.

The future of Google Sheets is Connected Sheets.

That’s a bold claim, so let’s explore it.

Check out this formula operating on 1.5 billion rows of data:

Billion Row Formula Connected Sheets

I just ran a COUNT function across 1.5 Billion rows of data. Inside my Google Sheet.

1.5 Billion rows of data!


Fzzzzzzt……that’s the sound of all the circuit boards in my head frying simultaneously 🤯

Connected Sheets

I think we can all agree there’s a lot of data out there. We live in a data-rich world. And it’s only ever going to grow from here as more and more people and devices come online.

And we can also agree that “big data” is too large to store and analyze in your Google Sheets.

Until now…

Connected Sheets is a new data feature in Google Sheets for customers of G Suite Enterprise, G Suite Enterprise Essentials and G Suite Enterprise for Education.

You use the Connected Sheets feature to analyze millions or billions of rows of data inside your Google Sheets, using regular functions, pivot tables and charts.

The data lives in Google BigQuery, which is a big data analysis product from Google Cloud.

BigQuery is an incredible tool but does require knowledge of SQL code to use it.

Connected Sheets circumvents that need and gives Google Sheets users the ability to analyze big data inside their Sheets, using the familiar interface.

What Can Connected Sheets Do?

Connected Sheets can do similar operations with big datasets that you’re used to doing with regular datasets in Google Sheets, namely:

  1. Running a defined set of functions on big datasets*
  2. Creating calculated columns with big datasets*
  3. Extracting small data tables, called Extracts, from big datasets*, which can be used like regular Google Sheets tables
  4. Creating pivot tables from big datasets*
  5. Creating charts from big datasets*
  6. Scheduling automatic data refresh jobs to keep data current

* i.e. tables with millions or even billions of rows of data in BigQuery

I think the best way to explore these features is to run through a quick demo data analysis project.

Analysis Of New York Citibike Data

You access Connected Sheets through the Data menu of your Google Sheet.

You need to have a project setup in BigQuery with billing enabled before you can connect to it from the Connected Sheets menu.

When you create a new Connected Sheet, you’re prompted through a series of options to select a BigQuery project and table that you want to connect to.

In this example, I’ve connected to the New York Citibike public dataset, which has 59 million rows of data and 16 columns, which would be nearly a billion cells of data if we could put it into a spreadsheet.

Connected Sheets Data Connection

When it opens, you’re shown a preview of the dataset, i.e. all of the columns and the first 500 rows.

New York citibike dataset Connected Sheets

It looks like an ordinary Google Sheet, but there are five action buttons above the Sheet: Chart, Pivot Table, Function, Extract and Calculated Column.

You’ll also notice the word “Refresh options” next to the table name and “Refresh” down at the bottom in the preview note. You need to refresh periodically to include any new data from BigQuery in your Connected Sheets results. You can schedule these refreshes to happen automatically.

Now let’s explore each of the features in turn, starting with functions:

Functions with Connected Sheets

Functions in Connected Sheets are useful for getting specific answers from your data, i.e in situations when you know what you’re looking for.

For example, if I want to count how many rows of data there are in a dataset then I would use a simple COUNT function.

It’s exactly the same here with the Big Dataset in the Connected Sheet. I use a simple function in Connected Sheets.

The flow is a little different, as there is an extra click to activate the function, but otherwise it’s a regular ole’ formula:

Connected Sheets function Apply Button

The formula looks like this in the Sheet:


And gives the answer:


53 million rows of data, each representing a bike trip with one of New York City’s Citibike bikes.

Boom! It’s as easy as that!

Comparison: To do the same calculation in BigQuery, you need to write a simple query in the BigQuery editor:


Which gives the same answer of 53,108,721

As you can see, it requires knowing your way around BigQuery and of course, how to write basic SQL.

Wait a minute though!

This dataset is supposed to have 59m rows of data, not 53m.

There must be some blank rows (null values) in the data.

Let’s use Connected Sheets to test that hypothesis, using the Extract feature.

Connected Sheets Extracts

The Extract feature lets you extract smaller tables from the larger BigQuery dataset (remember the data shown in the Sheet is just a preview and the actual dataset lives in BigQuery).

Extracts are useful for things like extracting all records pertaining to a certain customer, or looking at the largest X transactions for example. Another use case might be investigating issues with datasets.

For example, I have a strong hunch there are null values (blanks) in the dataset.

So I create an Extract using a filter set to “Value is empty” to return a subset of empty rows.

The Extract Editor is similar to the pivot table Editor, allowing you to select which columns to include, what filters or sorts to apply and what the row limit is (how many to return).

Here’s the Extract editor prior to pressing the Apply button.

You can see the blue cross shading applied to the data area to show it hasn’t been queried yet.

Connected Sheets Table Extract

With every calculation in Connected Sheets, you need to press the “Apply” button to run the query in BigQuery to return the data.

The Extract is returned with blank rows as I expected. So there are null values in the dataset!

Presumably there are about 6 million of them, but let’s confirm that using a calculated column and function in Connected Sheets to count the null values.

Calculated Columns in Connected Sheets

Much the same way as you’d insert a column into a regular Google Sheet table and add in a formula to perform some calculation, you can do the same with Connected Sheets datasets.

Calculated columns are added to the right side of the Connected Sheets preview table (although the order doesn’t matter in a database, since you access a column by its name).

You can use a limited set of regular Google Sheets functions in calculated columns, which are then applied to the entire big dataset in BigQuery.

Let’s continue with the New York Citibike dataset.

Choose the “+ Calculated Column” button above the data preview.

The Calculated Column editor pops up, where you can enter a formula using column names instead of range references.

I’m going to multiply the start station ID by 1, because this will transform any blanks (nulls) into 0 values, which I can count.

Connected Sheets calculated column

I can now create a COUNTIF function using this calculated column, to count these null rows:

Connected Sheets Countif function

That looks about right!

Added together, those two values give us the 59 million rows in our full dataset.

Let’s add another calculated column to calculate a rider’s age.

The dataset contains the birth year for every rider, which we can use to calculate the age:

Connected Sheets calculated column

There are some very old people in the data!

To me this seems unlikely (people over 100 years old riding citibikes in NYC? Hmm, I don’t know about that). These are probably errors in the data and would merit further exploration.

But for now, let’s draw a chart of rides by age.

Connected Sheets Charts

You can showcase your results with Charts, created directly from the data in BigQuery.

Clicking on the Chart button above the dataset creates a chart and opens the chart editor.

It’s more akin to creating a pivot table than a regular Google Sheets chart. The same customization options are all there however.

Connected Sheets chart

Two things jumped out at me from this chart:

Firstly, it’s an example of a classic right-skewed distribution. It’s unsurprising to see the peak age of riders is in the mid- to late-thirties and then it gradually decreases from there. This is what you’d expect.

Secondly though, what on earth is that spike doing around age 51? It seems unlikely there would be a sudden, huge increase of ridership going from 50 to 51, and then equally precipitous drop-off going from 51 to 52. Perhaps it’s a strange mid-life crisis unique to New York City.

Let’s use a pivot table to investigate further.

Connected Sheets Pivot Tables

Pivot tables are an incredible tool that allow you to explore your datasets in depth.

In essence, pivot tables aggregate (i.e. roll-up or summarize) your data.

It’s quick and easy to change the variables shown in your pivot tables, so they’re a great tool to understand what’s going on with your data.

Pivot tables in Connected Sheets condense millions or billions of rows of data into concrete insights. Insights that would be impossible to find any other way.

In this example, the pivot table shows the number of rides summarized by the age (rows) and gender (column) of each rider. It includes the full 59m rows of data.

Pivot table on BigQuery dataset

Using this pivot table data, let’s recreate the rides-by-age chart again, but this time split by gender as well:

Connected Sheets Chart

And there we go!

The mystery is revealed.

There’s a group of riders in the data whose gender is unknown and who all have the same age around 51.

Perhaps there’s a ‘prefer not to say option’ for both “How old are you?” and “Gender?” questions during the sign up process.

That would explain the unknown gender designation, but what about the spike of riders aged 51?

Well, databases use a system called Unix time or Epoch time to measure points in time. This begins on 1/1/1970. So if some riders didn’t enter a date of birth, maybe the system sets the date of birth to the start of Epoch time, causing a spike at age 51. Very interesting!

More Connected Sheets Examples

Proportion of round trips

Suppose we want to know how many trips are round-trips, where the bike is dropped off at the starting station?

In other words, we want to identify all rows of data where the start station and end station match.

Here’s the formula to identify the round-trip rows, which will show as TRUE values.

= start_station_id = end_station_id

Connected Sheets Calculated Column

What’s super cool is that we can then use these calculated columns elsewhere too, for example in a pivot table where we can now answer the original question about identifying round-trips:

Connected Sheets Calculated Column in a Pivot Table

Out of the 59 million bike trips in the dataset, approximately 6.9 million started and finished at the same station (the TRUE values). About 12% of the trips.

Trips per month

Here’s a chart showing the number of bike trips per month. Although this looks like a regular Google Sheets chart, it includes all 59m rows of data from BigQuery (click to enlarge):

Big Data line chart

It clearly shows the increasing number of trips undertaken with Citibike, as well as clear seasonal cycles (the dips in winter).

It also illustrates another important point.

The chart is a static representation of your data at the time you created it.

You can see the timestamp in the bottom left, underneath the chart.

There’s also the option to refresh the data, which updates the chart.

Daily rides by gender

Here’s another chart example — this time drawn from the data in a Connected Sheets pivot table, rather than directly — which looks at the entire population of bike trips and counts the number on each day, split by gender.

Big Data chart

It’s clear to see that the demand is driven by commuters during the week, as evidenced by the peak during the week.

It also clearly shows there have been about three times as many male riders as female riders, to date.

One final feature to showcase is the refresh pane.

Refreshing Components

It’s important to note that the components aren’t live in the way the equivalent Google Sheets component is.

A Connected Sheets function is calculated at a given moment in time and has to be refreshed for new data to be included (whereas in a plain Google Sheet the formula will just reflect whatever data is included).

Similarly other components need to be refreshed. You can control them individually or collectively from the Refresh pane.

There’s also an option to schedule a data refresh, so that the data under consideration in your Connected Sheet is up-to-date. This can be done daily, weekly or monthly.

Connected Sheets Refresh

Availability of Connected Sheets

Connected Sheets is available to customers of G Suite Enterprise, G Suite Enterprise Essentials and G Suite Enterprise for Education.

A BigQuery account is required to use it and it will fall under the existing BigQuery pricing plan (prices are based on both the volume of data stored and the consumption of that data).


Connected Sheets represents a huge leap forward for Google Sheets.

It opens up the world of Big Data to the regular Google Sheets user, and that is very exciting.

Through the examples in this article, I hope you’ve seen how easy it is to work with big datasets. If you can analyze small data in your Google Sheets already, then you’ll be able to apply those same skills to big datasets.

Now you won’t have to rely on a developer to write custom queries for you. You can just dive straight into the data yourself.

I can’t wait to see how this feature is used over the next few years. After all, datasets are only getting bigger from here on!

Further Reading

For more information, keep an eye out for announcements on the G Suite section of the Google Cloud blog and the G Suites Updates blog.

Check out this Connected Sheets YouTube tutorial from the G Suite team.

How to import social media statistics into Google Sheets: The Import Cookbook

Google Sheets has a powerful and versatile set of IMPORT formulas that can import social media statistics.

This article looks at importing social media statistics from popular social media channels into a Google sheet, for social network analysis or social media management. If you manage a lot of different channels then you could use these techniques to set up a master view (dashboard) to display all your metrics in one place.

The formulas below are generally set up to return the number of followers (or page likes) for a given channel, but you could adapt them to return other metrics (such as follows) with a little extra work.

Caveats: these formulas occasionally stop working when the underlying website changes, but I will try to keep this post updated with working versions for the major social media statistics.

Example workbooks: Each example has a link to an associated Google Sheet workbook, so feel free to make your own copy: File > Make a copy....


  1. Facebook
  2. Twitter
  3. Instagram
  4. Youtube
  5. Pinterest
  6. Alexa rank
  7. Quora
  8. Reddit
  9. Spotify
  10. Soundcloud
  11. GTmetrix
  12. Bitly
  13. Linkedin
  14. Sites that don’t work and why not
  15. Closing thoughts
  16. Resources

How to import social media statistics into Google Sheets with formulas

facebook icon Import Facebook data

November 2018 update: The Facebook formula is working again! The trick is to use the mobile URL 😉 Thanks to reader Mark O. for this discovery.

Start with the mobile Facebook page URL in cell A1, e.g. this url

or this variation of it:

Here is the formula to extract page likes:

=INDEX(REGEXEXTRACT(REGEXEXTRACT(LOWER(INDEX(IMPORTXML(A1,"//@content"),2)),"([0-9km,.]+)(?: likes)"),"([0-9,.]+)([km]?)"),,1) * SWITCH(INDEX(REGEXEXTRACT(REGEXEXTRACT(LOWER(INDEX(IMPORTXML(A1,"//@content"),2)),"([0-9km,.]+)(?: likes)"),"([0-9,.]+)([km]?)"),,2),"k",1000,"m",1000000,1)

The following screenshot shows these formulas:

Facebook Data

See the Facebook Import Sheet.

^ Back to Contents

twitter logo Import Twitter data

Start with the mobile Twitter handle URL in cell A1, e.g.

Here is the formula to extract follower count:

=VALUE(REGEXEXTRACT(IMPORTXML(A1,"/"),"(?:Following )([\d,]+)(?: Followers)"))

The following screenshot shows this formula:

Import twitter data

Note 1: This Twitter formula seems to be particularly volatile, working fine one minute, then not at all the next. I have two Sheets open where it’ll work in one, but not the other!

See the Twitter Import Sheet.

^ Back to Contents

Build Business Dashboards With Google Sheets and Data Studio

Digital marketing dashboard in Google Sheets
  • Learn how to build beautiful, interactive dashboards in my online course.
  • 9+ hours of video content, lifetime access and copies of all the finished dashboard templates.
  • Learn more

instagram logo Import Instagram data

Start with the Instagram page URL in cell A1:

Then, this formula in cell B1 to extract the follower metadata (this may or may not work):


This extracts the following info: “230 Followers, 259 Following, 465 Posts – See Instagram photos and videos from Ben Collins (@benlcollins)”

Next step is to combine this with REGEX to extract the followers for example. Here’s the formula to do that (still assuming url in cell A1):

=INDEX(REGEXEXTRACT(REGEXEXTRACT(LOWER(IMPORTXML(A1,"//meta[@name='description']/@content")),"([0-9km,.]+)( followers)"),"([0-9,.]+)([km]?)"),,1) * SWITCH(INDEX(REGEXEXTRACT(REGEXEXTRACT(LOWER(IMPORTXML(A1,"//meta[@name='description']/@content")),"([0-9km,.]+)( followers)"),"([0-9,.]+)([km]?)"),,2),"k",1000,"m",1000000,1)

This deals with any accounts that have abbreviated thousands (k) or millions (m) notations.

Alternative Approach

The following formulas to extract account metrics appear to only work for the instagram account when you are logged in.

Here’s the number of followers:

=REGEXEXTRACT(INDEX(SPLIT(QUERY(IMPORTDATA(A1),"select Col1 limit 1 offset 181"),""""),1,2),"[\d,]+")

Here’s the number following:

=REGEXEXTRACT(QUERY(IMPORTDATA(A1),"select Col2 limit 1 offset 181"),"[\d,]+")

Here’s the number of posts:

=REGEXEXTRACT(QUERY(IMPORTDATA(A1),"select Col3 limit 1 offset 181"),"[\d,]+")

The following screenshot shows these formulas:

instagram data

See the Instagram Import Sheet.

^ Back to Contents

youtube logo Import YouTube data

Start with the YouTube channel URL in cell A1:

To get the number of subscribers to a YouTube channel, use this formula in cell B1:

=VALUE(INDEX(REGEXEXTRACT(LOWER(INDEX(REGEXEXTRACT(INDEX(IMPORTXML(A1,"//div[@class='primary-header-actions']"),1,1),"(Unsubscribe)([0-9kmKM.]+)"),1,2)),"([0-9,.]+)([km]?)"),,1) * SWITCH(INDEX(REGEXEXTRACT(LOWER(INDEX(REGEXEXTRACT(INDEX(IMPORTXML(A1,"//div[@class='primary-header-actions']"),1,1),"(Unsubscribe)([0-9kmKM.]+)"),1,2)),"([0-9,.]+)([km]?)"),,2),"k",1000,"m",1000000,1))

See the YouTube Import Sheet.

^ Back to Contents

pinterest logo Import Pinterest data

In cell A1, enter the following URL, again replacing benlcollins with the profile you’re interested in:

Then in the adjacent cell, B1, enter the following formula:


to get the following output (screenshot shows older version of the formula, latest one is above and in the template file):

pinterest data

Note, you can also get hold of the profile metadata with the import formulas, as follows:


See the Pinterest Import Sheet.

^ Back to Contents

alexa logo Import Alexa ranking data

Here there are two metrics I’m interested in – a site’s Global rank and a site’s US rank.

Global Rank

To get the Global rank for your site, enter your URL into cell A1 (replace

and use the following helper formula in cell B1:

=QUERY(ArrayFormula(QUERY(IMPORTDATA(A1),"select Col1") & QUERY(IMPORTDATA(A1),"select Col2")),"select * limit 1 offset " & MATCH(FALSE,ArrayFormula(ISNA(REGEXEXTRACT(QUERY(IMPORTDATA(A1),"select Col1") & QUERY(IMPORTDATA(A1),"select Col2"),"Global rank icon.{10,}"))),0)+1)

and then extract the rank in cell C1:


US Rank

Assuming you have the Alexa URL in cell A1 again, then the US rank is extracted with this helper formula:

=QUERY(ArrayFormula(QUERY(IMPORTDATA(A1),"select Col1") & QUERY(IMPORTDATA(A1),"select Col2")),"select * limit 1 offset " & MATCH(FALSE,ArrayFormula(ISNA(REGEXEXTRACT(QUERY(IMPORTDATA(A1),"select Col1") & QUERY(IMPORTDATA(A1),"select Col2"),"title='United States Flag'.alt.{50,}"))),0))

and this formula to extract the actual rank value:


The following screenshot shows these formulas:

alexa data

See the Alexa Ranking Import Sheet.

^ Back to Contents

Build Business Dashboards With Google Sheets and Data Studio

Digital marketing dashboard in Google Sheets
  • Learn how to build beautiful, interactive dashboards in my online course.
  • 9+ hours of video content, lifetime access and copies of all the finished dashboard templates.
  • Learn more

quora logo Import Quora data

In this instance, I’ve imported the number followers Barack Obama has on Quora.

Quora is a little bit different because I need to use the URL and the profile name in my formula, so I’ve kept them in separate cells for that purpose. So in cell A1, add the generic Quora URL:

And then in cell B1, add the profile name:


Then the formula in C1 to get the number of followers is:

=VALUE(QUERY(IMPORTXML(A1&B1,"//a[@href='/profile/"&B1&"/followers']"),"select Col2"))

The following screenshot shows this formula:

quora data

See the Quora Import Sheet.

^ Back to Contents

reddit logo Import Reddit data

Here, I’m using the funny subreddit as my example.

In A1:

To get the number of followers of this subreddit, use this formula in cell B1:


Bonus: To get the number of active viewers of this subreddit:


The following screenshot shows these formulas:

reddit data

See the Reddit Import Sheet.

^ Back to Contents

spotify logo Import Spotify monthly listeners

Here’s a method for extracting the number of followers an artist has on the music streaming site Spotify.

First, find your favorite artist on Spotify:

Copy the URL into cell A1 (it’ll look like this):

(This is Metallica, yeah 🤘)

Then put the following formula into cell A2 to extract the monthly listeners:


See the Spotify Import Sheet.

^ Back to Contents

soundcloud logo Import Soundcloud data

Start with the Soundcloud page URL in cell A1, e.g.

Here is the formula to extract page likes:

=ArrayFormula(VALUE(REGEXEXTRACT(QUERY(SORT(IFERROR(REGEXEXTRACT( IMPORTXML(A1,"//script"),"followers_count..\d{1,}"),""),1,FALSE),"select * limit 1"),"\d{1,}")))

Alternative formula:

Here is an alternative formula to extract the page metadata, which includes the likes:


the formula to extract likes is:

=VALUE(REGEXEXTRACT(REGEXEXTRACT(SUBSTITUTE( IMPORTXML(A1,"//meta[@name='description']/@content"),",",""),"\d{1,}.Followers"),"\d{1,}"))

and to extract the “talking about” number:

=VALUE(REGEXEXTRACT(REGEXEXTRACT(SUBSTITUTE( IMPORTXML(A1,"//meta[@name='description']/@content"),",",""),"\d{1,}.Tracks"),"\d{1,}"))

The following screenshot shows these formulas:

soundcloud data

See the Soundcloud Import Sheet.

^ Back to Contents

gtmetrix_logo Import GTmetrix data

GTmetrix is a website that analyzes website performance.

You need to grab the correct URL before you can start scraping the data. So navigate to the GTmetrix site and enter the URL and hit analyze. You’ll end up with a URL like this:

Those last 8 characters (“BcHv78bP”) appear to be unique each time you run an analysis, so you’ll have to do this step manually.

Then in column B, I use this formula to extract the Page Speed Score and YSlow Score, into cells B1 and B2:


and this formula in cell B3, to get the page details (Fully Loaded Time, Total Page Size and Requests) in cells B3, B4 and B5:


The following screenshot shows these formulas:

gtmetrix data

See the GTmetrix Import Sheet.

^ Back to Contents

Bitly logoImport Bitly click data

Bitly is a service for shortening urls. They provide metrics for how many clicks you’ve had on each bitly link, e,g.

Bitly link metric data for import to Google Sheets

Taking a standard Bitly link (e.g. and appending a “+” to it will take you to the dashboard page, with the metrics. Then we can use the import data function, a query function and a REGEX function to extract the click metrics.

User clicks are:

=VALUE(REGEXEXTRACT(QUERY(IMPORTDATA(A9&"+"),"select Col1 limit 1"),"(?:user_clicks...)([0-9]+)"))

and global clicks are:

=VALUE(REGEXEXTRACT(QUERY(IMPORTDATA(A9&"+"),"select Col5"),"(?:global_clicks:.)([0-9]+)"))

Clicks from the Bitly network are then simply the user clicks subtracted from the global clicks.

The following screenshot shows these formulas:

Bitly data import in Google Sheets

See the Bitly Import Sheet.

^ Back to Contents

linkedin logo Import Linkedin data

This formula is no longer working for extracting Linkedin followers and I have not found an alternative.

In cell A1:

This formula used to work to get the number of Linkedin followers, but no longer:

=QUERY(IMPORTXML(A1,"//div[@class='member-connections']"),"select Col1")

and the output:

linkedin import

There is no example sheet for Linkedin since the formula is no longer working.

^ Back to Contents

cancel icon Sites that don’t work and why not

I’ve tried the following sites but the IMPORT formulas are unable to extract the social media statistics:

  • Linkedin (see above)
  • Similar Web
  • Twitch
  • Mobcrush
  • Crunchbase
  • Majestic SEO

These are all modern sites built using front-end, client-side Javascript frameworks, so the IMPORT formulas can’t extract any data because the page is built dynamically in browser as it’s loaded up. The IMPORT formulas work fine on sites built in the traditional fashion, with lots of well formed HTML tags, where the social media statistics are embedded into the site markup that is passed from the server.

Compare this screenshot of the source code for Mobcrush, built using Angular JS it looks like (click to enlarge):

Angular front end

versus what the source code looks for this page on my website (click to enlarge):

benlcollins code

You can see the code for my site has lots of tags which the IMPORT formulas can parse, whereas the other site’s code does not.

If anyone knows of any clever way to get around this, do share!

Otherwise, you’re next option is to venture down the API route. Yes, this involves coding, but it’s not as hard as you think.

I’ll be posting some API focussed articles soon. In the meantime, check out my post on how to get started with APIs, or for a peak at what’s coming, take a look at my Apps Script + API repo on GitHub.

Loading error

Also, even when these formulas are working, they can be temperamental. If you work with them a lot, sooner or later you’ll find yourself hitting this loading issue all the time, where the formulas stop displaying any results:

Loading error

^ Back to Contents

settings logo Closing thoughts

These formulas are unstable and will sometimes display an error message.

I’ve found that adding or removing the final “/” from the URL can sometimes get the formula working again (the issue is to do with caching).

I can make no guarantee that these will work for you or into the future. Whilst researching this article, I came across several older articles where many of the formulas no longer work. So things change!

To summarize: Caveat Emptor!

^ Back to Contents

link icon Resources

^ Back to Contents

As always, leave any comments, corrections or request other social media statistics below.

Icons from Freepik.