Question

Wikipedia is geotagging a lot of its articles. (Look in the top right corner of the page.)

Is there any API for querying all geotagged pages within a specified radius of a geographical position?

Update

Okay, so based on lost-theory's answer I tried this (on DBpedia query explorer):

PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?subject ?label ?lat ?long WHERE {
    ?subject geo:lat ?lat.
    ?subject geo:long ?long.
    ?subject rdfs:label ?label.
    FILTER(xsd:float(?lat) - 57.03185 <= 0.05 && 57.03185 - xsd:float(?lat) <= 0.05
        && xsd:float(?long) - 9.94513 <= 0.05 && 9.94513 - xsd:float(?long) <= 0.05
        && lang(?label) = "en"
    ).
} LIMIT 20

This is very close to what I want, except that it returns results within a (local) square around the point rather than a circle. I would also like the results to be sorted by distance from the point, if possible.
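One way to turn the square into a circle is to post-process the rows on the client side. The following is only a sketch under that assumption: it takes the rows from the bounding-box query as (label, lat, long) tuples, computes the great-circle distance with the haversine formula, drops everything outside the radius and sorts the rest by distance. The function names and the tuple layout are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(rows, lat, lon, radius_km):
    """Keep the rows inside the circle and return (distance_km, label) pairs, nearest first.

    rows is an iterable of (label, lat, long) tuples (hypothetical layout),
    for example the result of the bounding-box query.
    """
    hits = []
    for label, row_lat, row_lon in rows:
        distance = haversine_km(lat, lon, row_lat, row_lon)
        if distance <= radius_km:
            hits.append((distance, label))
    return sorted(hits)
```

The bounding-box FILTER still does the heavy lifting on the server; the client only has to discard the corners of the square.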

Update 2

I am trying to use the Euclidean distance as an approximation of the true distance, but I am having trouble squaring a number in SPARQL. (Question opened here.) When I get something useful I will update the question; in the meantime I would appreciate any suggestions on alternative approaches.
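For reference, a rough sketch of the Euclidean approximation in Python. It assumes the points are close together, scales the longitude difference by the cosine of the mean latitude (otherwise the "circle" is stretched east-west at 57° N), and squares by multiplying a value by itself, so no power operator is needed. The constant of about 111.195 km per degree follows from a mean Earth radius of 6371 km; the function name is made up.

```python
import math

KM_PER_DEGREE = 111.195  # approximate length of one degree of latitude


def approx_distance_km(lat1, lon1, lat2, lon2):
    """Planar (equirectangular) approximation of the distance in kilometres."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    # East-west degrees shrink with latitude, so scale them by cos(latitude).
    dx = (lon2 - lon1) * math.cos(mean_lat) * KM_PER_DEGREE
    dy = (lat2 - lat1) * KM_PER_DEGREE
    # Squaring is written as plain multiplication.
    return math.sqrt(dx * dx + dy * dy)
```

For filtering alone the square root can be skipped by comparing dx * dx + dy * dy against the squared radius.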

Update 3

A final update. I gave up on using SPARQL through DBpedia. I have written a simple parser which fetches the nightly Wikipedia article-text database dump and parses all articles for geocodes. It works rather nicely, and it allows me to store information about geotagged articles however I wish.
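The parser itself is not shown here. As an illustration only, extracting a geocode from article wikitext might look like the sketch below. It assumes coordinates appear in a {{coord}} template, either as two decimal numbers or as degrees/minutes/seconds followed by N/S/E/W, and it ignores every other geotagging template; the function name is hypothetical.

```python
import re

# Matches a {{coord|...}} template that contains no nested templates.
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^{}]*)\}\}", re.IGNORECASE)


def parse_coord(wikitext):
    """Return (lat, lon) in decimal degrees from the first {{coord}} template, or None."""
    match = COORD_RE.search(wikitext)
    if match is None:
        return None
    pending, values = [], []
    for part in (p.strip() for p in match.group(1).split("|")):
        if part.upper() in ("N", "S", "E", "W") and pending:
            # The numbers collected so far are degrees, minutes, seconds.
            value = sum(n / 60 ** i for i, n in enumerate(pending))
            values.append(-value if part.upper() in ("S", "W") else value)
            pending = []
            if len(values) == 2:
                break
        else:
            try:
                pending.append(float(part))
            except ValueError:
                break  # a parameter such as "display=title" ends the coordinates
    if len(values) == 2:
        return values[0], values[1]
    if not values and len(pending) == 2:
        return pending[0], pending[1]  # plain decimal form: lat|long
    return None
```

For example, parse_coord("{{Coord|57|01|55|N|9|56|42|E|display=title}}") yields roughly (57.0319, 9.945).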

This is probably the solution I will continue using, and if I get around to creating a nice interface for it, I might consider allowing public API access and/or publishing the source of the parser.

Solution

The OpenLink Virtuoso server used by the DBpedia endpoint has several additional query features, including geospatial functions. I found the information on http://docs.openlinksw.com/virtuoso/rdfsparqlgeospat.html useful for a similar problem.

I ended up with a query such as this:

SELECT ?page ?lat ?long (bif:st_distance(?geo, bif:st_point(15.560278, 58.394167)))
WHERE{
    ?m foaf:page ?page.
    ?m geo:geometry ?geo.
    ?m geo:lat ?lat.
    ?m geo:long ?long.
    FILTER (bif:st_intersects (?geo, bif:st_point(15.560278, 58.394167), 30))
}
ORDER BY ASC 4 LIMIT 15

This example retrieves the geotagged locations within 30 km of the origin position.
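To reuse the query for other positions, it can be generated from a template. The sketch below simply fills the origin, radius and limit into the query above and builds a request URL. It assumes the public endpoint at https://dbpedia.org/sparql accepts the query text in a "query" parameter and a result format in a "format" parameter; the function names are made up.

```python
import urllib.parse

SPARQL_ENDPOINT = "https://dbpedia.org/sparql"  # assumed public DBpedia endpoint


def nearby_query(lat, lon, radius_km, limit):
    """Fill the origin, radius and limit into the query shown above."""
    point = f"bif:st_point({lon}, {lat})"  # longitude first, as in the original query
    return (
        f"SELECT ?page ?lat ?long (bif:st_distance(?geo, {point}))\n"
        "WHERE{\n"
        "    ?m foaf:page ?page.\n"
        "    ?m geo:geometry ?geo.\n"
        "    ?m geo:lat ?lat.\n"
        "    ?m geo:long ?long.\n"
        f"    FILTER (bif:st_intersects (?geo, {point}, {radius_km}))\n"
        "}\n"
        f"ORDER BY ASC 4 LIMIT {limit}"
    )


def sparql_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(params)
```

Calling sparql_url(nearby_query(58.394167, 15.560278, 30, 15)) reproduces the request for the example above.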

OTHER TIPS

You should be able to query for latitude/longitude using SPARQL and dbpedia. An example (from here):

SELECT DISTINCT ?s ?la ?lo ?name ?country WHERE {
    ?s dbpedia2:latitude ?la .
    ?s dbpedia2:longitude ?lo .
    ?s dbpedia2:officialName ?name .
    ?s dbpedia2:country ?country .
    FILTER (
        regex(?country, 'England|Scotland|Wales|Ireland')
        && regex(?name, '^[Aa]')
    )
}

You can run your own queries here.

There are a couple of tools listed on Tools and applications based on coordinates from Wikipedia. I'm not sure if it's what you're looking for, but the Geosearch.py tool looks pretty cool.

Not an API, but you can also download this nice set of all geotagged Wikipedia articles and query it directly in a local database: http://www.google.com/fusiontables/DataSource?dsrcid=423292

The free GeoNames.org FindNearbyWikipedia service can fetch geotagged articles for a given postal code or coordinates (latitude, longitude).

The service has a daily limit of 30,000 credits per application (identified by the 'username' parameter) and an hourly limit of 2,000 credits. For most services, one web service request costs one credit. An exception is thrown when the limit is exceeded.
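A minimal sketch of calling the service from Python, assuming the JSON variant at http://api.geonames.org/findNearbyWikipediaJSON with the parameters lat, lng, radius (in km), maxRows and username, and assuming the articles come back in a "geonames" list. The function names are made up, and "your_username" stands for a registered GeoNames account.

```python
import json
import urllib.parse
import urllib.request

GEONAMES_URL = "http://api.geonames.org/findNearbyWikipediaJSON"


def build_nearby_url(lat, lng, username, radius_km=10, max_rows=15):
    """Build the request URL; every call made with it costs credits."""
    params = {
        "lat": lat,
        "lng": lng,
        "radius": radius_km,
        "maxRows": max_rows,
        "username": username,
    }
    return GEONAMES_URL + "?" + urllib.parse.urlencode(params)


def fetch_nearby(lat, lng, username, **kwargs):
    """Fetch nearby articles (assumed to be listed under the "geonames" key)."""
    with urllib.request.urlopen(build_nearby_url(lat, lng, username, **kwargs)) as response:
        return json.load(response).get("geonames", [])
```

Usage would be along the lines of fetch_nearby(57.03185, 9.94513, "your_username", radius_km=5).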

I'm not familiar enough with SPARQL, but if it can use powers in its filter, then it's easy to compute the distance of a given article from a given point using Pythagoras' theorem (a^2 + b^2 = c^2), and that would give you all the articles within a radius.

Another option would be to get a Wikipedia data dump and process it yourself; this is what I did when I needed to do some linguistic analysis on Wikipedia articles.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow