Humanlytics

Mastering Google Analytics Reporting API Requests

Chapter 3— Google Analytics Guide for Absolute Beginners

As promised in the last post in our series, we’re going to go through the structure of requests in the Google Analytics Reporting API and show you a few pointers on how to create the most awesome request that satisfies your business needs.

More specifically, we will walk through the fields of a Google Analytics Reporting API request and talk about:

  1. What each field means, and

  2. How you can adjust fields to satisfy your business needs

Because Google Analytics request fields are all-encompassing (and because of our obsessive desire for detail) we won’t go into every field in this post, but instead will only introduce enough fields to get you started submitting requests to Google Analytics.

At the end of this guide, we will also show you some resources you can use to practice constructing and submitting Google Analytics API requests without coding to help you further solidify the concepts introduced in this article.

Throughout the guide, we will be mostly referencing the documentation of Google Analytics Reporting API. Feel free to have it open in another tab as you read, but it would be useful to at least skim it before getting started.

Ready?

Let’s start by looking at requests as a whole

Displayed below are all the fields you can submit as part of a Google Analytics Reporting API request:

"viewId": string,
"dateRanges": [{object(DateRange)}],
"samplingLevel": enum(Sampling),
"dimensions": [{object(Dimension)}],
"dimensionFilterClauses": [{object(DimensionFilterClause)}],
"metrics": [{object(Metric)}],
"metricFilterClauses": [{object(MetricFilterClause)}],
"filtersExpression": string,
"orderBys": [{object(OrderBy)}],
"segments": [{object(Segment)}],
"pivots": [{object(Pivot)}],
"cohortGroup": {object(CohortGroup)},
"pageToken": string,
"pageSize": number,
"includeEmptyRows": boolean,
"hideTotals": boolean,
"hideValueRanges": boolean

There are some fields that would probably take an entire article to explain (such as segments), whereas other fields are self-explanatory (such as hideTotals).

Let’s start by walking through the easy-to-understand fields and look at what each one is for and how it might be relevant to your business needs.

Then, we’ll spend some time talking through the key fields that will truly make you a reporting API master, and teach you how to extract the most value from their functionalities.

We’ll also end up writing some more on these fields later!

The self-explanatory fields

In this section, fields are presented by their rough order of importance.

viewId (Mandatory)

Input: String

This is essentially the ID of the Google Analytics view that you want to get your data from.

If you are not familiar with the Account -> Property -> View structure of Google Analytics, use this guide to learn more about it.

Essentially, a view is a copy of your website data, filtered by whatever pre-set conditions you have configured for it (if any).

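dateRanges (Optional, but you will almost always set it)

Input: A list of (max 2) DateRange objects

This is another essential request field that you will work with very frequently when you are submitting Google Analytics requests. If you leave it out, the API falls back to a default range covering the last seven days (seven days ago through yesterday).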

If you want to, for example, analyze your data on a day-to-day or week-to-week basis, this is NOT the field you should change to make that analysis happen (you should add day or week as a dimension instead).

This field only reflects the range of dates you want to pull your data from in Google Analytics. The ways in which you choose to slice and dice it will be reflected in other fields, such as dimensions and metrics.

Another detail to notice is that instead of inputting a string or number, we are actually dealing with an object here (think of it as an input field that holds several pieces of information). So let’s look at it in more detail:

{
"startDate": string,
"endDate": string
}

This object is rather simple and only contains two fields: startDate and endDate. All you need to do is to replace the “string” in the object with the actual string representation of the dates.

Google Analytics uses the format “YYYY-MM-DD” to represent dates (e.g., 2018-03-05). To read more about date formatting and date representation, check out this quick guide.

Google Analytics also accepts up to two date ranges in one request, mostly to make comparing metrics across time much easier for you.

For this reason, the two dateRanges you submit should ALMOST ALWAYS be the same length, with the most common lengths being 7 days (a week) and 28 days (roughly a month).

Why those two? Both cover whole weeks, so each range contains the same mix of weekdays and weekends. That smooths out day-of-week fluctuation while still giving your business insights that are timely enough to act on.
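To make the "two ranges of the same length" idea concrete, here is a minimal Python sketch that builds a current period and the period right before it. The helper name comparison_date_ranges is my own invention for illustration; only the startDate/endDate keys and the YYYY-MM-DD format come from the API.

```python
from datetime import date, timedelta


def comparison_date_ranges(end: date, days: int = 28) -> list[dict]:
    """Return two back-to-back date ranges of equal length, newest first.

    Hypothetical helper: both start and end dates are treated as inclusive.
    """
    current_start = end - timedelta(days=days - 1)
    previous_end = current_start - timedelta(days=1)
    previous_start = previous_end - timedelta(days=days - 1)
    return [
        {"startDate": current_start.isoformat(), "endDate": end.isoformat()},
        {"startDate": previous_start.isoformat(), "endDate": previous_end.isoformat()},
    ]


# date.isoformat() already produces the YYYY-MM-DD strings the API expects.
ranges = comparison_date_ranges(date(2018, 3, 5), days=28)
print(ranges)
```

The resulting list can be dropped straight into the dateRanges field. Swapping days=28 for days=7 gives you the week-over-week version.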

pageToken and pageSize (Both optional; you need them once your results no longer fit in a single response)

Input: String for pageToken and Number for pageSize

To avoid situations in which a novice user sends a million-row request and crashes the server, Google Analytics limits the number of rows a single request can return. According to the v4 reference, the default is 1,000 rows and the hard cap is 100,000 rows per request, no matter how many you ask for.

Therefore, if your query matches 300,000 rows, you have to send at least three requests back-to-back (they have to be separate requests).

The pageToken field is how you chain those requests together. Each response that has more rows waiting includes a nextPageToken value; copy it into the pageToken field of your next request to tell Google that this request is a continuation of the previous, uncompleted one.

pageSize determines the maximum number of rows each response will contain. This number is 1,000 by default (if you don’t submit anything) and can range from 0 to 100,000.

Usually, 1,000 rows should be plenty for most of the analysis you want to conduct (unless you submit too many dimensions, something we will touch on later).

For advanced usage — such as running machine learning models on your Google Analytics data — this feature is crucial to know to get enough data for your models.
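If you do need every row, the paging loop might look roughly like the sketch below. Treat it as an illustration, not a finished client: fetch_page is a hypothetical callable that sends one request and returns the matching report as a dict, and fake_fetch_page is a made-up stand-in so the loop can run without credentials. The only things taken from the API are the pageSize and pageToken request fields and the nextPageToken and data.rows response fields.

```python
from typing import Callable, Iterator, Optional


def iter_report_rows(
    fetch_page: Callable[[dict], dict],
    report_request: dict,
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield every row of one report, following nextPageToken until it runs out."""
    page_token: Optional[str] = None
    while True:
        # Copy the base request so the caller's dict is never mutated.
        request = dict(report_request, pageSize=page_size)
        if page_token is not None:
            request["pageToken"] = page_token
        report = fetch_page(request)
        yield from report.get("data", {}).get("rows", [])
        page_token = report.get("nextPageToken")
        if not page_token:
            break


def fake_fetch_page(request: dict) -> dict:
    """Stand-in for a real API call: serves 2,500 invented rows page by page."""
    total = 2500
    start = int(request.get("pageToken", "0"))
    end = min(start + request["pageSize"], total)
    report = {"data": {"rows": [{"dimensions": [str(i)]} for i in range(start, end)]}}
    if end < total:
        report["nextPageToken"] = str(end)
    return report


rows = list(iter_report_rows(fake_fetch_page, {"viewId": "XXXXXXXX"}, page_size=1000))
print(len(rows))
```

With the fake transport, the loop makes three calls and collects all 2,500 rows. In real use you would replace fake_fetch_page with a function that actually calls the Reporting API.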

includeEmptyRows (Optional, default False)

Input: True or False

Let’s say that you want to get the number of sessions generated by your website this month on a day-to-day basis.

However, you know for a fact that you didn’t have any sessions on some days of this month.

By default, Google will omit those days and only return the days with sessions on your website. However, if you set this field to true, those days will not be omitted.

For many data analytics use cases such as the one explained in this section, you should set this field to true to avoid data gaps in your analysis.

hideTotals and hideValueRanges (Optional, both default to False)

Input: True or False for both

By default, Google Analytics returns a little additional information along with its responses, such as the totals and the value ranges (minimum and maximum) of your metrics.

You can turn either of those functions off by setting the corresponding field to true. That said, there is really no point in doing so unless you have to submit a LOT of requests and want to save marginal processing time.

samplingLevel (Optional)

Input: Choose between [DEFAULT, SMALL, LARGE]

If you have a large enough number of sessions for your website (500k+ for free users), Google Analytics will only run the analysis of your request on part of your data, instead of its entirety.

This is used to ensure that your analysis is returned in a reasonable amount of time, but it can be annoying and confusing for novice users.

My principle for this field is: if you need your report to return really quickly, choose SMALL, which trades accuracy for speed. The difference will be a matter of only seconds per request, but it adds up in advanced use cases. If not, choose LARGE for a bigger, more accurate sample, or don’t worry about it and just leave it as DEFAULT.

The important fields that require detailed explanations

Brace yourself for a heavy dose of Google Analytics Awesomeness

Now we are getting into the fields that are, in my opinion, really interesting and exciting to talk about.

For all of the fields to come, I want to go into great depth and detail on how to best leverage them to obtain analytical value for your business. They’re incredibly important, and I promise the time taken to read through them now will pay off in the long run!

With that said, there’s a LOT to be said about these fields, so we’ll only talk about the two essential ones (Metrics and Dimensions) so that we can keep this article readable! We’ll go into the others in an upcoming article. (Plus, keeping articles short helps us with SEO :D)

metrics

Input: A list of Metric objects

Let’s start our discussion with metrics, your bread and butter for any Google Analytics analysis.

Metrics are the actual numbers Google Analytics measures from your website, whether that’s the number of sessions, time on page, or bounce rate of your homepage.

In the API, all metrics need to be presented in the “metric object” format specified below:

{
"expression": string,
"alias": string (optional),
"formattingType": enum(MetricType) (optional)
}

Let’s start with expression, the primary field in which you define what specific metric you want to include in your Google Analytics API report.

The interesting part of the “expression” field is that you can actually create metrics on the fly in addition to using the default metrics (more on this here).

For example, if you want to know the number of sessions per user on your website, you can either use the pre-computed metric expression “ga:sessionsPerUser” or create an expression on the fly with the string “ga:sessions/ga:users”.

As a novice user, you might not find an immediate need for this “metric on the fly” use case, as Google Analytics is pretty comprehensive in providing a list of pre-computed metrics, but there are a lot of interesting advanced use cases for this.

For example, if you have 3 major business objectives (all set up as goals) on your website and want to create a metric that represents the overall conversion number for all three goals, you can accomplish this by creating the calculated expression “ga:goal01Completions + ga:goal02Completions + ga:goal03Completions”.

As long as you keep in mind that these things are possible, inspiration will come to help you use them when you actually need them.

Now let’s move on to alias.

When submitting requests in the API, you will sometimes need to add filters and orderBys clauses to narrow the data down to only what you need.

It would be a pain to have to type “ga:goal01Completions + ga:goal02Completions + ga:goal03Completions” every time you include this metric in a filter or orderBys clause. Instead, you can create an alias, such as “goalTotal”, and use it in all your filters and sorts, making the request much easier to manage.

Note that the alias field is completely optional even if you are using custom expressions, so use it at your own discretion when you foresee that you will be using the same expression multiple times in the request.
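Here is a small Python sketch of a Metric object that combines an on-the-fly expression with an alias, and then reuses the alias in a sort clause. The sum_expression helper and the alias name are my own picks for illustration, and the goal metric names are copied as written in this article. The expression, alias, and formattingType keys, plus the fieldName and sortOrder keys of an orderBys entry, come from the API.

```python
def sum_expression(metric_names: list[str]) -> str:
    """Join metric names into one calculated expression, e.g. 'ga:a + ga:b'."""
    return " + ".join(metric_names)


goal_metric = {
    "expression": sum_expression(
        ["ga:goal01Completions", "ga:goal02Completions", "ga:goal03Completions"]
    ),
    "alias": "goalTotal",         # illustrative alias name
    "formattingType": "INTEGER",  # completions are whole numbers
}

# Later clauses can refer to the short alias instead of the long expression.
order_bys = [{"fieldName": goal_metric["alias"], "sortOrder": "DESCENDING"}]

print(goal_metric["expression"])
print(order_bys)
```

Building the expression from a list keeps it easy to add a fourth goal later without retyping the whole string.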

Finally, let’s talk about formattingType, our last optional field.

Just like samplingLevel, you only have a few options here: INTEGER, FLOAT, CURRENCY, TIME, and PERCENT.

Default metrics provided by Google Analytics come with a pre-configured format. However, you will frequently need to adjust the format here to better reflect your reporting needs.

For example, Google Analytics returns your bounce rate as a float by default, which is essentially a number with several digits after the decimal point (0.23232323, for example).

It is probably a good idea to change that to the PERCENT format, which makes it an actual percentage (23.23%) and trims some of the trailing decimals (you can adjust this after getting the response too, which is what I usually do).

This becomes more and more important as you create your own metrics, as Google Analytics does a really poor job predicting the format of your new metric.
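If you prefer to tidy the number up after the response arrives, a tiny Python helper is enough. This is only a sketch: as_percent is a hypothetical name, and it assumes the value arrives as a fraction like the 0.23232323 example above. If your response already holds a number on the 0 to 100 scale, you would skip the percent conversion.

```python
def as_percent(raw, decimals: int = 2) -> str:
    """Format a fraction (string or float) as a percentage string.

    Assumes `raw` is a fraction such as 0.2323, not a value already scaled to 100.
    """
    # The '%' format type multiplies by 100 and appends the percent sign.
    return f"{float(raw):.{decimals}%}"


print(as_percent("0.23232323"))
```

Accepting strings as well as floats is deliberate, since metric values come back from the API as text.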

dimensions

Input: A list of Dimension objects

Metrics and dimensions are like best friends: they cannot go without each other.

Whereas metrics are the actual measurements produced on your website, dimensions are the rules by which you group those measurements and turn them into readable numbers relevant to your business.

Here’s an example.

With metrics alone, you can only get a summary of your data over a fixed date range, such as your total number of sessions in the past month. You cannot dig deeper to find insights that are actionable for your business, such as how your traffic trended over the past month or who made up that traffic.

To accomplish that, you need dimensions.

What a dimension does is divide your data into pieces based on an attribute of the dataset, whether that’s date/time, acquisition channel, or the cohort group of users, so you can drill into your data and answer important business questions such as:

  • What is my traffic trend this week?

  • Who are my best customers?

  • How do my best customers behave?

With that primer on the importance of dimensions, let’s proceed to look at the request structure of the dimension object:

{
"name": string,
"histogramBuckets": [string]
}

The name of the dimension is rather straightforward, and you can find a list of them here.

The histogramBuckets field, on the other hand, will help you group your continuous dimensions (such as session count, days since the last session, session duration) into discrete buckets that will help you with your plotting and analysis needs.

For example, if you want to use the session count dimension but don’t want to have EVERY session count returned in a separate row, you can set histogramBuckets to [“1”, “11”, “21”]. Each string is the lower bound of a bucket, listed in increasing order, so this neatly groups your session counts into the categories “<1”, “1-10”, “11-20”, and “21+”, making it easier to manage and analyze the data.
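Bucket boundaries are easy to get off by one, so here is a small Python sketch that mimics the labeling locally. It reflects my reading of the v4 reference (boundaries are ascending integers sent as strings, each bucket is closed at the bottom and open at the top); bucket_label is a hypothetical helper for checking your boundaries, not something the API provides.

```python
def bucket_label(value: int, boundaries: list[int]) -> str:
    """Return the bucket label for `value`, given ascending lower bounds."""
    if value < boundaries[0]:
        return f"<{boundaries[0]}"
    for low, high in zip(boundaries, boundaries[1:]):
        if low <= value < high:
            # A bucket holding a single value is labeled with just that value.
            return str(low) if high - low == 1 else f"{low}-{high - 1}"
    return f"{boundaries[-1]}+"


boundaries = [1, 11, 21]
dimension = {
    "name": "ga:sessionCount",
    "histogramBuckets": [str(b) for b in boundaries],  # the API expects strings
}

for count in (5, 15, 40):
    print(count, "->", bucket_label(count, boundaries))
```

Running your planned boundaries through a helper like this before sending the request is a cheap way to confirm the groups are the ones you meant.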

While the dimension request field only has those two inputs, there are a few more things we need to touch on to help you get the most useful results from your requests.

First: the number of dimensions and metrics in a single request is capped (the v4 reference allows up to 10 metrics and 9 dimensions, and every request needs at least one metric), so you need to plan your dimensions and metrics carefully.

Second: you need to be careful when selecting how many dimensions you want to include in your data.

This is because one additional dimension will create a significant increase in the number of rows returned in your responses.

Consider the number of sessions within 28 days as an example. If you want to divide this data up by day, you will use “ga:date” as a dimension and end up with 28 rows of data, with one row representing each day.

However, if you want to add one more dimension (let’s say the channel of your sessions on each of those days), you would add the “ga:channelGrouping” dimension, and the total number of rows increases to as many as 28*7 = 196 if you have 7 channels in your GA account.

Each additional dimension multiplies that number by the number of distinct values of that dimension, which quickly grows into a very large number.

At the same time, if you don’t have enough traffic on your website, the number of sessions per row will decrease drastically with each added dimension, as the data needs to be split up in more ways. Eventually you might have only a single-digit number of sessions in each of the rows that you request.

If the number of sessions in most of your rows is lower than a certain number (50 is a common rule of thumb), it becomes difficult to draw concrete conclusions from any of your discoveries, simply because the sample size is too small to tell whether the differences between two dimension values are statistically significant.

Therefore, it is generally recommended not to go beyond 3 dimensions in any Google Analytics API request unless you really need to. That helps make sure the conclusions you draw from your data are not due to statistical noise.

You can always switch around the 3 dimensions that you want to include in your analysis, but keeping the number at 3 or below will help make sure your data is easy to understand and statistically valid.
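The arithmetic behind that advice is easy to sketch in Python. The 5,000-session total and the three device categories below are made-up numbers for illustration, and both helper names are my own. The row count is an upper bound, since combinations with no traffic are not returned by default.

```python
from math import prod


def max_rows(cardinalities: dict[str, int]) -> int:
    """Upper bound on rows: the product of distinct values per dimension."""
    return prod(cardinalities.values())


def avg_sessions_per_row(total_sessions: int, cardinalities: dict[str, int]) -> float:
    """Average sessions per row if every combination actually appears."""
    return total_sessions / max_rows(cardinalities)


two_dims = {"ga:date": 28, "ga:channelGrouping": 7}
three_dims = dict(two_dims, **{"ga:deviceCategory": 3})

total_sessions = 5000  # hypothetical monthly traffic
for dims in (two_dims, three_dims):
    print(len(dims), "dimensions:", max_rows(dims), "rows,",
          round(avg_sessions_per_row(total_sessions, dims), 1), "sessions per row")
```

With these invented numbers, two dimensions leave about 25 sessions per row and a third drops that below 10, which is already well under the 50-session rule of thumb.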

And here is where we are going to stop our discussion today. With all the fields introduced today, you are already able to submit a Google Analytics Reporting API request that can return plenty of data related to your business.

You can actually play with the request fields introduced in this article at this link.
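If you would rather see everything from this article in one place first, the Python sketch below assembles a request body from the fields covered so far. The view ID is a placeholder and the dates are arbitrary. The reportRequests wrapper is what the API’s batchGet method expects around one or more requests. Nothing is sent anywhere; the script only prints the JSON.

```python
import json

report_request = {
    "viewId": "XXXXXXXX",  # placeholder: use your own view ID
    "dateRanges": [
        {"startDate": "2018-02-06", "endDate": "2018-03-05"},
        {"startDate": "2018-01-09", "endDate": "2018-02-05"},
    ],
    "metrics": [
        {"expression": "ga:sessions"},
        {
            "expression": "ga:sessions/ga:users",
            "alias": "sessionsPerUser",
            "formattingType": "FLOAT",
        },
    ],
    "dimensions": [{"name": "ga:date"}],
    "includeEmptyRows": True,  # keep days with zero sessions
    "samplingLevel": "DEFAULT",
    "pageSize": 1000,
}

# batchGet takes a list of report requests under the "reportRequests" key.
body = {"reportRequests": [report_request]}
print(json.dumps(body, indent=2))
```

You can paste the printed JSON into whichever tool you use to try out requests, then swap fields in and out to see how the response changes.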

While this guide was for the most basic requests, we’ve left a lot of stones unturned. Our next chapter will cover:

  • How to pre-sort your results so they return in your preferred format

  • How filtering works in the reporting API

  • How to use metric and dimension filters to exclude unnecessary traffic

  • How to use existing segments or create dynamic segments for advanced analysis

  • How to use the pivot function in Google Analytics to create pivot tables and pivot analysis.

  • How to use cohortGroup function to conduct cohort analysis on your website users.

As you can see, there are a lot more exciting discussions to be had on all those topics, and I am more than thrilled to talk through them all with you.

It will probably be a long article like this one, since we have a lot to cover and I want to be very thorough in discussing this topic, so please be patient while we produce it :D

With all that said, take care and until next time!

This article was produced by Humanlytics. Looking for more content just like this? Check us out on Twitter and Medium, and join our Analytics for Humans Facebook community to discuss more ideas and topics like this!