Question

Let's say you have a document that mentions "Turkey" and "Istanbul", and you want to extract those keywords and match each of them to a Wikipedia article. "Turkey", however, could mean either Turkey the country or turkey the bird. Is it possible to use the second keyword, Istanbul, to measure the "distance" between it and each candidate "Turkey", and so pick the right one? For example:

Istanbul -> Turkey the country -> close.

Istanbul -> turkey the bird -> distant.

To explain further what I mean by distance: as I understand it, SPARQL can traverse graphs, and DBpedia is a kind of (knowledge) graph, so the distance I am looking for could probably be computed in the graph.

Solution

You can find the length of a path between two resources in SPARQL if there's a unique path between them. (This has been described in a number of places now; e.g., this answer to Calculate length of path between nodes?.) However, that technique does not work when multiple paths join the endpoints: it works by counting the nodes that lie on the path(s) between the resources, so with several paths the count mixes nodes from all of them and no longer equals the length of any single path.
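To make the limitation concrete, here is a minimal Python sketch that mimics the counting idea on two toy graphs. It assumes the SPARQL pattern in question counts every node ?mid that is reachable from the start in zero or more steps and that reaches the end in one or more steps; the graphs and the function names (reachable, counted_length) are made up for illustration and are not part of any library.

```python
from collections import deque

def reachable(graph, start):
    """Return every node reachable from start in zero or more steps."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def counted_length(graph, start, end):
    """Count nodes mid with start -(0+ steps)-> mid and mid -(1+ steps)-> end."""
    count = 0
    for mid in reachable(graph, start):
        # "One or more steps" from mid to end: some direct successor of mid
        # reaches end in zero or more steps.
        if any(end in reachable(graph, nxt) for nxt in graph.get(mid, ())):
            count += 1
    return count

# Hypothetical graph with a unique path: a -> b -> c -> d (length 3).
chain = {"a": ["b"], "b": ["c"], "c": ["d"]}

# Hypothetical graph with two paths of length 2: a -> b -> d and a -> c -> d.
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}

chain_length = counted_length(chain, "a", "d")      # 3, the true path length
diamond_length = counted_length(diamond, "a", "d")  # 3, although each path has length 2

print(chain_length, diamond_length)
```

On the chain the count matches the path length. On the diamond the nodes a, b and c are all counted, so the result overstates the length of both actual paths.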

In DBpedia, there can be many paths between any pair of resources, so that sort of metric is hard to use. An alternative is to find the closest common superclass of the two resources' classes and use a metric based on that. That approach has been discussed in this answer to finding common superclass and length of path in class hierarchies.
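As a rough illustration of the superclass idea, the following Python sketch finds the closest common superclass in a small, made-up class hierarchy and uses the summed number of superclass steps as the distance. The class names, the parents mapping and the function names are hypothetical; they are not the actual DBpedia ontology, and a real implementation would query the hierarchy from the endpoint instead.

```python
from collections import deque

def ancestor_depths(parents, cls):
    """Map cls and each of its superclasses to the minimum number of steps from cls."""
    depths = {cls: 0}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in depths:
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths

def closest_common_superclass(parents, a, b):
    """Return (superclass, distance) minimising steps from a plus steps from b."""
    da = ancestor_depths(parents, a)
    db = ancestor_depths(parents, b)
    common = set(da) & set(db)
    if not common:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    best = min(common, key=lambda c: (da[c] + db[c], c))
    return best, da[best] + db[best]

# Hypothetical toy hierarchy: class -> list of direct superclasses.
parents = {
    "City": ["PopulatedPlace"],
    "Country": ["PopulatedPlace"],
    "PopulatedPlace": ["Place"],
    "Place": ["Thing"],
    "Bird": ["Animal"],
    "Animal": ["Species"],
    "Species": ["Thing"],
}

city_country = closest_common_superclass(parents, "City", "Country")
city_bird = closest_common_superclass(parents, "City", "Bird")

print(city_country)  # ('PopulatedPlace', 2)
print(city_bird)     # ('Thing', 6)
```

Under these assumptions, a city (Istanbul) is 2 steps from a country but 6 steps from a bird, so the country reading of "Turkey" would be the closer match.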

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow