Question

We are a distribution platform planning to build an analytics service on top of our own Google Analytics (GA) data for our high-traffic website, as a service for our users. Users can see how the assets they uploaded to our site perform over time.

For this purpose we built a small web app that lets our users query our own GA data through a Node.js app (OAuth2), which makes requests to the GA API under one Service Account with our access token.

After using the app for a while and starting to scale it, we discovered that the GA API has quite strict limitations, which make it hard to scale.

Requests under one Service Account are limited to 10 requests per second per IP, and never more than 4 requests at the same time. This is a showstopper that makes it impossible for us to roll out our app to hundreds of users.
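To stay under the concurrency limit client-side, the app could gate its outgoing GA calls. A minimal sketch, assuming the limits above (`RequestGate` is a hypothetical helper, not part of any GA client library; a real implementation would also queue waiters and pace calls to stay under 10 per second):

```javascript
// Caps the number of in-flight GA API requests at a fixed maximum.
// Callers that fail to get a slot should retry later.
class RequestGate {
  constructor(maxConcurrent = 4) {
    this.maxConcurrent = maxConcurrent;
    this.inFlight = 0;
  }
  // Returns true and takes a slot if one is free, false otherwise.
  tryAcquire() {
    if (this.inFlight >= this.maxConcurrent) return false;
    this.inFlight += 1;
    return true;
  }
  // Frees a slot once the GA request has completed (or failed).
  release() {
    if (this.inFlight > 0) this.inFlight -= 1;
  }
}
```

This only keeps one process polite toward the API; it does nothing about the daily quota, which is the deeper problem discussed below.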

Do you have any suggestions on how to scale a Google API app which makes requests under one Service Account?


Solution

You are forgetting one more limitation: you can make a maximum of 10,000 requests per view (profile) per day. You will blow through that quota very quickly, and there is no way to extend it or the 10-requests-per-second quota.
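A quick back-of-envelope check shows why (the user count and per-user request rate here are assumptions for illustration, not figures from the question):

```javascript
// Even modest per-user usage exhausts the 10,000-requests-per-view-per-day quota.
const users = 300;                 // "hundreds of users"
const requestsPerUserPerDay = 40;  // a dashboard firing a few GA queries per visit
const totalRequestsPerDay = users * requestsPerUserPerDay;
console.log(totalRequestsPerDay);          // 12000
console.log(totalRequestsPerDay > 10000);  // true: quota blown
```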

I suggest you create a script that extracts your data onto your own server and then serve that data to your users. That way you request the data from GA only once and bypass all the quota limits.
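The extract-once, serve-many pattern can be sketched like this (a minimal in-memory version; `ReportStore` and `fetchFromGa` are hypothetical names standing in for your extraction script and a real GA API call, and production code would persist to a database rather than a `Map`):

```javascript
// Pulls each (asset, date) report from GA at most once, then serves
// every subsequent user request from the local copy.
class ReportStore {
  constructor(fetchFromGa) {
    this.fetchFromGa = fetchFromGa;       // injected GA call, hit only on cache misses
    this.cache = new Map();               // key: "assetId:date" -> report rows
  }
  getReport(assetId, date) {
    const key = `${assetId}:${date}`;
    if (!this.cache.has(key)) {
      this.cache.set(key, this.fetchFromGa(assetId, date)); // one GA hit, ever
    }
    return this.cache.get(key);           // all later reads are quota-free
  }
}
```

With this in place, the number of GA requests per day depends only on how much new data you extract, not on how many users you have.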

A second option would be to create multiple client IDs and assign a different client ID to each group of users. In my opinion, though, this isn't very scalable and will be very hard to administer, so I wouldn't advise it.


There is no way to ask Google to raise the 10-requests-per-second or the 10,000-requests-per-view-per-day limits. Because you will always be requesting from the same view, I think the best option for you is to extract the data every night onto your own server and then serve it to your users from there.

Since you haven't stated what platform or language you are using, I will give you an example of what can be done, and what I have personally done to solve this problem.

I created a custom SSIS connection manager that uses OAuth2 to connect to Google's authentication servers. Then I created a custom data flow task that uses the connection manager to connect to the Google Analytics API. Finally, I built an SSIS package that requests the information I need into SQL Server; this job runs every night to ensure I have all the data I need.

A few things to remember:

  1. Data less than 24 hours old hasn't finished processing, so don't bother extracting yesterday's data. After that, the data is stable and you will never need to request it again.
  2. Depending upon how much data is in your Google Analytics view, make sure you request it in smaller chunks, or you will end up with sampling or timeouts. How small depends on how busy the site is: I have one site with 1.5 million records in a month, and for that site I am forced to extract day by day to prevent timeouts.
  3. There is a maximum of 7 dimensions you can select at a time with the GA API, but with creative filtering you can get around that a little.
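Point 2 above can be sketched as a helper that splits a reporting window into one-day chunks, one GA request each (`splitIntoDays` is a hypothetical name; real code would feed each chunk to the extraction job):

```javascript
// Splits an inclusive date range into "YYYY-MM-DD" strings, one per day,
// so each GA request stays small enough to avoid sampling and timeouts.
function splitIntoDays(startDate, endDate) {
  const chunks = [];
  const cur = new Date(startDate + 'T00:00:00Z');
  const end = new Date(endDate + 'T00:00:00Z');
  while (cur <= end) {
    chunks.push(cur.toISOString().slice(0, 10)); // "YYYY-MM-DD"
    cur.setUTCDate(cur.getUTCDate() + 1);        // Date handles month rollover
  }
  return chunks;
}
```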

If you post some more information about what type of system you have and what programming language you have access to, I might be able to help more. This was just an example.

Other tips

As an alternative, you can try Piwik.org, where you can get the same metrics as in GA but run into no limitations other than your hardware (see Piwik's Data Limitations page for more information). It has a REST API which is very easy to use and implement (see the API usage examples there).

Licensed under: CC-BY-SA with attribution