Question

I am testing out Google BigQuery and would like to run some queries on the full-text Wikipedia dump. Google's sample data doesn't include the full-text dump (only the revision history).

There are a few sources for Wikipedia dumps, such as this one on Amazon: http://aws.amazon.com/datasets/2506

My questions are: Is there a way to query these datasets without transferring them to a Google BigQuery project? Equivalently, is there a way for BigQuery to communicate with one of these datasets directly?

If it is not possible for BigQuery, then is there an equivalent service in Amazon EC2 that can do the same thing?

Thank you.


Solution

Is there a way to query these datasets without transferring them to a Google BigQuery project?

No. BigQuery operates against BigQuery projects and datasets.
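In practice, that means loading the dump into a BigQuery table first. BigQuery can load newline-delimited JSON, so a minimal preparation step might look like the Python sketch below. It assumes the dump is a standard MediaWiki XML export (`<page>` elements containing `<title>`, `<id>` and `<revision><text>`); the function names and the `id`/`title`/`text` columns are hypothetical choices. Uploading the resulting file (for example with `bq load --source_format=NEWLINE_DELIMITED_JSON`) is not shown.

```python
# Sketch: convert a MediaWiki XML export into newline-delimited JSON.
# Assumes the standard export layout; function and column names are hypothetical.
import io
import json
import xml.etree.ElementTree as ET
from typing import IO, Iterator


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that MediaWiki exports put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_stream: IO[bytes]) -> Iterator[dict]:
    """Yield one dict per <page>, streaming instead of loading the whole file."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        row = {"id": None, "title": None, "text": None}
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                row["title"] = child.text
            elif name == "id" and row["id"] is None:
                # The first <id> under <page> is the page id; later ones are revision ids.
                row["id"] = int(child.text)
            elif name == "text":
                # With several revisions, the last <text> seen wins.
                row["text"] = child.text or ""
        yield row
        elem.clear()  # drop the processed page subtree to save memory


def dump_to_ndjson(xml_stream: IO[bytes], out: IO[str]) -> int:
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    for row in iter_pages(xml_stream):
        out.write(json.dumps(row, ensure_ascii=False) + "\n")
        count += 1
    return count


SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <id>42</id>
    <revision>
      <id>1001</id>
      <text xml:space="preserve">Hello, wiki.</text>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    buf = io.StringIO()
    print(dump_to_ndjson(io.BytesIO(SAMPLE), buf))  # 1
    print(buf.getvalue(), end="")
```

Using `iterparse` and clearing each `<page>` after it is written keeps memory use far lower than parsing a multi-gigabyte dump in one go.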

Equivalently, is there a way for BigQuery to communicate with one of these datasets directly?

Equivalently, no. For exactly the same reason.

If it is not possible for BigQuery, then is there an equivalent service in Amazon EC2 that can do the same thing?

No, not really. There's Amazon CloudSearch, but it operates on basically the same principle and requires that you upload the data to be searched. So unless someone has already uploaded that data into an Amazon CloudSearch account, no, there's no easy way to do it without uploading the data.
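To give a sense of what "uploading the data" to CloudSearch involves, here is a minimal Python sketch that groups pages into JSON document batches. It assumes the 2013-01-01 batch format (a JSON array of `{"type": "add", "id": ..., "fields": {...}}` entries) and a 5 MB per-batch limit; the helper name and the `title`/`content` index fields are hypothetical and would have to match how the search domain is configured. Sending each batch (for example with `aws cloudsearchdomain upload-documents`) is not shown.

```python
# Sketch: build CloudSearch-style JSON document batches from page dicts.
# Assumes the 2013-01-01 batch format; helper and field names are hypothetical.
import json
from typing import Iterable, Iterator


def to_cloudsearch_batches(
    pages: Iterable[dict], max_bytes: int = 5 * 1024 * 1024
) -> Iterator[str]:
    """Yield JSON array strings, each at most max_bytes (UTF-8) long.

    Each page dict is assumed to have 'id', 'title' and 'text' keys.
    A single document larger than max_bytes is still emitted alone.
    """
    batch = []
    size = 2  # bytes for the enclosing "[" and "]"
    for page in pages:
        op = json.dumps({
            "type": "add",
            "id": str(page["id"]),
            "fields": {"title": page["title"], "content": page["text"]},
        })
        op_len = len(op.encode("utf-8"))
        extra = op_len + (1 if batch else 0)  # +1 for the separating comma
        if batch and size + extra > max_bytes:
            yield "[" + ",".join(batch) + "]"
            batch, size = [], 2
            extra = op_len
        batch.append(op)
        size += extra
    if batch:
        yield "[" + ",".join(batch) + "]"


if __name__ == "__main__":
    pages = [{"id": 1, "title": "Example", "text": "Hello, wiki."}]
    for b in to_cloudsearch_batches(pages):
        print(b)
```

Tracking the byte size incrementally avoids re-serializing the whole batch after every document.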

Licensed under: CC-BY-SA with attribution