Question

DBpedia just released their data as tables, suitable for import into a relational database. How can I query this data online using SQL?

Dataset: http://wiki.dbpedia.org/DBpediaAsTables

Solution

I took the raw data, uploaded it to BigQuery, and made it public. So far I've done this with the 'person' and 'place' tables. Check them out at https://bigquery.cloud.google.com/table/fh-bigquery:dbpedia.person.

Now it is easy to find the most popular alma maters, for example:

SELECT COUNT(*), almaMater_label
FROM [fh-bigquery:dbpedia.person]
WHERE almaMater_label != 'NULL'
GROUP BY 2
ORDER BY 1 DESC

It's a little more complicated than that, because some people have more than one alma mater, and DBpedia encodes multiple values in its own particular way. I left the complete query at http://www.reddit.com/r/bigquery/comments/1rjee7/query_wikipedia_in_bigquery_the_dbpedia_dataset/.
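
To illustrate why multi-valued cells complicate the count, here is a minimal Python sketch. It assumes (this is not stated above) that a multi-valued cell is written as `{value1|value2}` and that a missing value is the literal string 'NULL', as the queries in this answer suggest. The helper names `split_multi` and `count_values` and the sample cells are made up for illustration; this is not the complete query linked above.

```python
from collections import Counter

def split_multi(cell):
    """Split one cell into its values.

    Assumption: multi-valued cells look like '{a|b}' and missing
    values are the literal string 'NULL'.
    """
    if cell == "NULL":
        return []
    if cell.startswith("{") and cell.endswith("}"):
        return cell[1:-1].split("|")
    return [cell]

def count_values(cells):
    """Count every individual value across all cells."""
    counts = Counter()
    for cell in cells:
        counts.update(split_multi(cell))
    return counts

# Made-up sample cells, for illustration only (not real DBpedia rows).
sample = [
    "Harvard University",
    "{Harvard University|Yale University}",
    "NULL",
    "Yale University",
    "{University of Michigan|Harvard University}",
]

for label, n in count_values(sample).most_common():
    print(n, label)
```

Grouping on the raw column, as in the simple query above, would treat `{Harvard University|Yale University}` as a separate label instead of crediting each school, which is why the counts need the extra splitting step.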

Btw, the top alma maters are:

   494     Harvard University
   320     University of Cambridge
   314     University of Michigan
   267     Yale University
   216     Trinity College Cambridge

You can also do joins between tables.

For example, for each building (from the place table) that has an architect: What year was that architect born? How many buildings with an architect born that year are listed in DBpedia?

SELECT COUNT(*), LEFT(b.birthDate, 4) birthYear
FROM [fh-bigquery:dbpedia.place] a
JOIN EACH [fh-bigquery:dbpedia.person] b
ON a.architect = b.URI
WHERE a.architect != 'NULL'
AND b.birthDate != 'NULL'
GROUP BY 2
ORDER BY 2

Results:

...
8   1934
13  1935
9   1937
7   1938
17  1939
7   1941
1   1943
15  1944
10  1945
12  1946
7   1947
9   1950
20  1951
1   1952
...
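
To try the join logic without a BigQuery account, here is a small local sketch using Python's built-in sqlite3 module. It is only an approximation: the rows are made up, the tables contain just the columns the query touches, and SQLite's `substr` stands in for `LEFT`, so it shows the shape of the query rather than the real data.

```python
import sqlite3

# In-memory database with toy stand-ins for the two DBpedia tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place  (URI TEXT, architect TEXT);
CREATE TABLE person (URI TEXT, birthDate TEXT);
""")

# Made-up rows for illustration; missing values use the literal string 'NULL'.
conn.executemany("INSERT INTO place VALUES (?, ?)", [
    ("building/1", "person/a"),
    ("building/2", "person/a"),
    ("building/3", "person/b"),
    ("building/4", "NULL"),      # no architect recorded
    ("building/5", "person/c"),  # architect with unknown birth date
])
conn.executemany("INSERT INTO person VALUES (?, ?)", [
    ("person/a", "1934-05-01"),
    ("person/b", "1951-11-20"),
    ("person/c", "NULL"),
])

# Same structure as the BigQuery query: join buildings to their architect,
# drop 'NULL' placeholders, then count buildings per architect birth year.
rows = conn.execute("""
    SELECT COUNT(*) AS buildings, substr(b.birthDate, 1, 4) AS birthYear
    FROM place a
    JOIN person b ON a.architect = b.URI
    WHERE a.architect != 'NULL' AND b.birthDate != 'NULL'
    GROUP BY birthYear
    ORDER BY birthYear
""").fetchall()

for buildings, year in rows:
    print(buildings, year)
```

Note that each row counts buildings, not architects: an architect with two buildings contributes two to their birth year.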

(Google BigQuery has a free monthly query quota of up to 100 GB.)

(DBpedia data from version 3.4 on is licensed under the terms of the Creative Commons Attribution-ShareAlike 3.0 license and the GNU Free Documentation License. http://dbpedia.org/Datasets#h338-24)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow