Question

I have a list of YouTube videos from different playlists (around 1000 of them) and I need to check whether they are still valid. At the moment I am querying YouTube through its API v2 from Groovy with this simple script:

import groovyx.net.http.HTTPBuilder
import static groovyx.net.http.Method.GET

http = new HTTPBuilder('http://gdata.youtube.com')

myVideoIds.each { id ->
    if (!isValidYoutubeUrl(id)) {
        // do stuff
    }
}

boolean isValidYoutubeUrl (id) {
    boolean valid = true
    http.request(GET) {
        uri.path = "feeds/api/videos/${id}"

        headers.'User-Agent' = 'Mozilla/5.0 Ubuntu/8.10 Firefox/3.0.4'

        response.failure = { resp ->
            valid = false
        }
    }
    valid
}

But after a few seconds it starts returning 403 for every single id (probably because it is sending too many requests in quick succession). The problem is reduced if I insert something like Thread.sleep(3000) between requests. Is there a better solution than just delaying the requests?

Solution

In v2 of the API, there are time-based limits on how many requests you can make, but they aren't hard and fast: the threshold depends on a number of under-the-hood factors and may not always be the same. Here's what the documentation says:

The YouTube API enforces quotas to prevent problems associated with irregular API usage. Specifically, a too_many_recent_calls error indicates that the API servers have received too many calls from the same caller in a short amount of time. If you receive this type of error, then we recommend that you wait a few minutes and then try your request again.

You can avoid this by putting in a sleep like you do, but you'd want it to be 10-15 seconds or so.
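
If you do stay with delays, one option is to wait only after a request has actually been rejected, and to wait longer each time. Here is a minimal sketch of that idea. The helper name requestWithBackoff, its parameters, and the doubling schedule are my own assumptions, not something the API prescribes. It also treats every 403 as throttling, which matches what you are seeing but is not guaranteed in general.

```groovy
// Hypothetical helper (not part of HTTPBuilder or the YouTube API).
// `attempt` performs one request and returns its HTTP status code.
// `pause` is injectable so the wait can be stubbed out when testing.
int requestWithBackoff(Closure<Integer> attempt,
                       long initialDelayMs = 10000,
                       int maxAttempts = 5,
                       Closure pause = { long ms -> Thread.sleep(ms) }) {
    long delay = initialDelayMs
    int attempts = 1
    int status = attempt()
    // Assumption: a 403 here means "too many recent calls".
    while (status == 403 && attempts < maxAttempts) {
        pause(delay)
        delay *= 2          // wait twice as long before the next retry
        status = attempt()
        attempts++
    }
    status                  // last status seen, throttled or not
}
```

The 10 second starting delay follows the 10-15 second suggestion above. The caller still has to decide what to do if the last status is a 403 after all attempts are used up.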

It's more important, though, to implement batch processing. With this, you can make up to 50 requests at once (this counts as 50 requests against your overall requests-per-day quota, but only as one against your time-based quota). Batch processing with v2 of the API is a little involved, as you make a POST request to a batch endpoint first, and then based on those results you can send in the multiple requests. Here's the documentation:

https://developers.google.com/youtube/2.0/developers_guide_protocol?hl=en#Batch_processing

If you use v3 of the API, batch processing becomes quite a bit easier, as you just send 50 IDs at a time in the request. Change:

http = new HTTPBuilder('http://gdata.youtube.com')

to:

http = new HTTPBuilder('https://www.googleapis.com')

Then set your uri.path to

youtube/v3/videos?part=id&key={your API key}&id={variable here that represents up to 50 YouTube IDs, comma separated}

For 1000 videos, then, you'll only need to make 20 calls. Any video that doesn't come back in the list doesn't exist anymore. (If you need to get video details, change the part parameter to id,snippet,contentDetails or something appropriate for your needs.)
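
To make the batching concrete, here is a rough sketch of the bookkeeping around those calls. All of the function names are made up for illustration. It assumes that fetch is a closure you supply that performs one videos.list request and returns the parsed JSON as a Map, and that the response has an items list whose entries carry an id field.

```groovy
// Split the IDs into groups of at most 50, one group per API call.
List<List<String>> batchesOf50(List<String> ids) {
    ids.collate(50)
}

// Query parameters for one videos.list call (IDs are comma separated).
Map<String, String> videosListQuery(List<String> batch, String apiKey) {
    [part: 'id', id: batch.join(','), key: apiKey]
}

// IDs that were requested but are absent from the response's `items`.
List<String> missingIds(List<String> requested, Map response) {
    Set<String> found = (response.items ?: []).collect { it.id } as Set
    requested.findAll { !(it in found) }
}

// `fetch` is a hypothetical closure: query Map in, parsed JSON Map out.
List<String> findInvalidVideos(List<String> ids, String apiKey, Closure<Map> fetch) {
    batchesOf50(ids).collectMany { batch ->
        missingIds(batch, fetch(videosListQuery(batch, apiKey)))
    }
}

// One possible `fetch`, assuming the HTTPBuilder instance from above
// and its default JSON parsing (untested against the live API):
// def fetch = { query -> http.get(path: '/youtube/v3/videos', query: query) }
```

Passing the parameters as a query map rather than inside the path string leaves the URL encoding to the HTTP client. Note that the IDs should be plain Strings rather than GStrings, otherwise the set lookup in missingIds will not match.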

Here's the documentation:

https://developers.google.com/youtube/v3/docs/videos/list#id

Licensed under: CC-BY-SA with attribution