Question

I'm writing a C library that parses and serializes domain-specific data as RDF/XML. After I parse a file and write it back out again I'd like to be able to check that the content hasn't changed.

I could parse everything again, write it out as N-Triples, and compare the two N-Triples files line by line, but there must be a better way, right?

Thanks!


Solution

You could use a hash function (such as MD5 or SHA-1), but the same RDF graph can be serialized in many different ways, in RDF/XML as well as in the other RDF serialization formats.

Moreover, if you use Turtle or N-Triples, blank node labels will be different each time. For these reasons, a hash function might not be the best option.

Graph isomorphism is an 'interesting' problem. ;-)

It's not written in C, but you could try to decipher what Apache Jena does in GraphMatcher.java.

As an alternative, as you said, if you do not have blank nodes, you can serialize the data as N-Triples, then sort and compare the two files. Or you can keep your own sorted data structure and compare that instead, avoiding the serialization step.
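A minimal sketch of the sort-and-compare idea in C, working on lines already held in memory. The function names (`ntriples_same_set`, `sort_unique`) are made up for illustration and are not part of any RDF library. It assumes there are no blank nodes and that both sides write each triple in exactly the same textual form (same escaping and whitespace), so that a plain `strcmp` is enough. Duplicate lines are dropped because an RDF graph is a set of triples.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings (elements are const char *). */
static int cmp_line(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort the lines in place and drop adjacent duplicates.
 * Returns the number of distinct lines left at the front of the array. */
static size_t sort_unique(const char **lines, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;

    qsort(lines, n, sizeof *lines, cmp_line);
    for (size_t i = 1; i < n; i++) {
        if (strcmp(lines[out], lines[i]) != 0)
            lines[++out] = lines[i];
    }
    return out + 1;
}

/* Hypothetical helper: returns 1 if both arrays contain the same set of
 * N-Triples lines, 0 otherwise. Both arrays are reordered in place.
 * Assumes no blank nodes and identical textual form for equal triples. */
int ntriples_same_set(const char **a, size_t na, const char **b, size_t nb)
{
    na = sort_unique(a, na);
    nb = sort_unique(b, nb);

    if (na != nb)
        return 0;

    for (size_t i = 0; i < na; i++) {
        if (strcmp(a[i], b[i]) != 0)
            return 0;
    }
    return 1;
}
```

You would fill the two arrays from the parsed-then-serialized output of each file (or straight from your in-memory triples) and call `ntriples_same_set` on them. As soon as blank nodes are involved this is no longer enough, and you are back to the graph isomorphism problem.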

OTHER TIPS

You can calculate a strong hash (e.g., MD5 or SHA1) of both files. The hashes will match if the files are equal byte for byte.

Licensed under: CC-BY-SA with attribution