How to improve an RDF document?

https://stackoverflow.com/questions/21605988

08-10-2022

Question

I am building a catalog of chemicals. The example below is 2,4-D (a pesticide).

The long-term goal is to combine many RDF files into an OWL "catalog" with reasoning capabilities.

The short-term objective is to build one RDF file at a time.

Then, when the RDF files are combined into an OWL catalog, I will add new relationships and rules for reasoning.

My short-term approach, shown below, is to:

Declare the ontologies (namespaces).
Use SKOS to define a Concept.
Use DBpedia to set the broad term.
Use DBpedia to set the exact term.
Identify exact and similar items.
Define an ORE ResourceMap and Aggregation.
Itemize the aggregation.

Here is the RDF:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:agrontology="http://aims.fao.org/aos/agrontology#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:cheminf="http://semantiscience.org/resource/"
  xmlns:dbp="http://dbpedia.org/resource/"
  xmlns:ore="http://www.openarchives.org/ore/terms/"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:schema="http://schema.org/">
  <skos:Concept rdf:about="http://dbpedia.org/resource/Chemical_substance">
    <ore:aggregatedBy rdf:resource="http://dbpedia.org/resource/Chemical_substance#Aggregation" />
  </skos:Concept>
  <ore:aggregates rdf:about="http://dbpedia.org/resource/Chemical_substance">
    <cheminf:CHEMINF_000266>chemical substance</cheminf:CHEMINF_000266>
  </ore:aggregates>
  <ore:ResourceMap rdf:about="http://dbpedia.org/resource/Chemical_substance#ResourceMap">
    <ore:describes rdf:resource="http://dbpedia.org/resource/Chemical_substance#Aggregation" />
  </ore:ResourceMap>
  <dbp:Chemical rdf:about="http://dbpedia.org/resource/2,4-Dichlorophenoxyacetic_acid">
    <schema:name>2,4-Dichlorophenoxyacetic acid</schema:name>
  </dbp:Chemical>
  <cheminf:CHEMINF_000140 rdf:about="http://pubchem.ncbi.nlm.nih.gov/rest/rdf/compound/CID1486">
    <schema:name>PubChem CID:1486</schema:name>
    <rdf:type rdf:resource="http://dbpedia.org/resource/Chemical"/>
  </cheminf:CHEMINF_000140>
  <dbp:Chemical rdf:about="http://id.loc.gov/authorities/subjects/sh85037669">
    <schema:name>LCSH:85037669</schema:name>
    <skos:closeMatch>
      <cheminf:CHEMINF_000140 rdf:about="http://pubchem.ncbi.nlm.nih.gov/rest/rdf/compound/CID1486">
        <schema:name>PubChem CID:1486</schema:name>
        <rdf:type rdf:resource="http://dbpedia.org/resource/Chemical"/>
      </cheminf:CHEMINF_000140>
    </skos:closeMatch>
  </dbp:Chemical>
  <dbp:Chemical rdf:about="http://lod.nal.usda.gov/nalt/1353">
    <schema:name>NALT:1353</schema:name>
    <schema:sameAs rdf:resource="http://pubchem.ncbi.nlm.nih.gov/rest/rdf/compound/CID1486"/>
  </dbp:Chemical>
  <dbp:Chemical rdf:about="http://aims.fao.org/aos/agrovoc/c_8543">
    <schema:name>Agrovoc:8543</schema:name>
    <schema:sameAs rdf:resource="http://pubchem.ncbi.nlm.nih.gov/rest/rdf/compound/CID1486"/>
    <rdf:type rdf:resource="http://aims.fao.org/aos/agrontology#hasCodeAgrovoc"/>
  </dbp:Chemical>
  <ore:Aggregation rdf:about="http://dbpedia.org/resource/Chemical_substance#Aggregation">
    <ore:describedBy rdf:resource="http://dbpedia.org/resource/Chemical_substance#ResourceMap" />
    <ore:aggregates rdf:resource="http://pubchem.ncbi.nlm.nih.gov/rest/rdf/compound/CID1486" />
    <ore:aggregates rdf:resource="http://aims.fao.org/aos/agrovoc/c_8543" />
    <ore:aggregates rdf:resource="http://dbpedia.org/resource/2,4-Dichlorophenoxyacetic_acid" />
    <ore:aggregates rdf:resource="http://id.loc.gov/authorities/subjects/sh85037669" />
    <ore:aggregates rdf:resource="http://lod.nal.usda.gov/nalt/1353" />
  </ore:Aggregation>
</rdf:RDF>

How can I improve on this approach? For example, how can I:

1. Add more expressive relationships?
2. Make it more compact?
3. Establish provenance (using http://www.w3.org/TR/prov-o/) for an ontology, an authority, or an individual item?
4. Better organize the file for combining with other RDF files into an OWL environment?

Thank you for your help here.

Solution

I believe that questions #2 and #4 can be addressed by removing schema-level elements from your instance data. The data you provided seems to mix the definition of domain concepts (i.e., that Chemical_substance is a skos:Concept) with instance information (http://dbpedia.org/resource/2,4-Dichlorophenoxyacetic_acid http://schema.org/name "2,4-Dichlorophenoxyacetic acid"). A best practice here is to keep a single vocabulary/ontology that describes the backbone structures you'll use to represent the instance data for individual chemicals.
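A minimal sketch of such a shared vocabulary, assuming a hypothetical namespace http://example.org/chem/ontology# (nothing in the question defines it). The classes are declared once here, as owl:Class rather than as skos:Concept instances, and each chemical's document would only refer to them. The rdfs:seeAlso link to DBpedia's ontology class is an optional assumption about how you might link out.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical shared vocabulary (for example chem-ontology.rdf). Class
     definitions live here once; per-chemical documents only use them. -->
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://example.org/chem/ontology">
    <rdfs:label xml:lang="en">Chemical catalog vocabulary</rdfs:label>
  </owl:Ontology>
  <!-- Backbone class: an owl:Class rather than a skos:Concept instance -->
  <owl:Class rdf:about="http://example.org/chem/ontology#ChemicalSubstance">
    <rdfs:label xml:lang="en">chemical substance</rdfs:label>
    <!-- Optional pointer to DBpedia's ontology class; verify the IRI before relying on it -->
    <rdfs:seeAlso rdf:resource="http://dbpedia.org/ontology/ChemicalSubstance"/>
  </owl:Class>
  <!-- A narrower class, related to the backbone class by rdfs:subClassOf -->
  <owl:Class rdf:about="http://example.org/chem/ontology#Pesticide">
    <rdfs:label xml:lang="en">pesticide</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://example.org/chem/ontology#ChemicalSubstance"/>
  </owl:Class>
</rdf:RDF>
```

Keeping this file separate means that adding a class later changes one document instead of every chemical's file.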

I'd also note, at a superficial level, that you may not be using rdf:type consistently to define class membership. Additionally, in places you may be using rdf:type where you would probably prefer rdfs:subClassOf.
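To illustrate the distinction: rdfs:subClassOf relates one class to another, while rdf:type relates an individual to a class. In the question, for instance, `rdf:type` points at `http://aims.fao.org/aos/agrontology#hasCodeAgrovoc`, whose name suggests a property rather than a class, and the `dbp:Chemical` element expands to http://dbpedia.org/resource/Chemical, a DBpedia resource rather than a class under http://dbpedia.org/ontology/. A small sketch using the same hypothetical `http://example.org/chem/ontology#` namespace (the Herbicide and Pesticide classes are illustrative assumptions):

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- Class-to-class: rdfs:subClassOf, stated in the vocabulary document -->
  <owl:Class rdf:about="http://example.org/chem/ontology#Herbicide">
    <rdfs:subClassOf rdf:resource="http://example.org/chem/ontology#Pesticide"/>
  </owl:Class>
  <!-- Individual-to-class: rdf:type, stated in the chemical's own document -->
  <rdf:Description rdf:about="http://dbpedia.org/resource/2,4-Dichlorophenoxyacetic_acid">
    <rdf:type rdf:resource="http://example.org/chem/ontology#Herbicide"/>
  </rdf:Description>
</rdf:RDF>
```

From these two statements an RDFS-aware reasoner can infer that 2,4-D is also a member of the Pesticide class, without that triple being written out.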

I would suggest storing each of these documents in its own named graph within an RDF dataset, and creating a meaningful IRI for each chemical's graph using a consistent convention.
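A sketch of what one per-chemical document could look like, reusing the external IRIs from the question. RDF/XML serializes a single graph and has no syntax for graph names (TriG and N-Quads do), so the graph IRI is assigned when the file is loaded into the store. The naming convention in the comment and the choice of skos:exactMatch versus skos:closeMatch for each link are assumptions, not something the answer prescribes.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical per-chemical document. The graph name is chosen at load
     time; this sketch assumes the convention
     http://example.org/chem/graph/CID{PubChem CID}, i.e.
     http://example.org/chem/graph/CID1486 for this file. -->
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:schema="http://schema.org/">
  <!-- One subject per chemical; class membership via rdf:type only -->
  <rdf:Description rdf:about="http://dbpedia.org/resource/2,4-Dichlorophenoxyacetic_acid">
    <rdf:type rdf:resource="http://example.org/chem/ontology#ChemicalSubstance"/>
    <schema:name>2,4-Dichlorophenoxyacetic acid</schema:name>
    <!-- Links to the external records from the question, each stated once -->
    <skos:exactMatch rdf:resource="http://pubchem.ncbi.nlm.nih.gov/rest/rdf/compound/CID1486"/>
    <skos:closeMatch rdf:resource="http://id.loc.gov/authorities/subjects/sh85037669"/>
    <skos:closeMatch rdf:resource="http://lod.nal.usda.gov/nalt/1353"/>
    <skos:closeMatch rdf:resource="http://aims.fao.org/aos/agrovoc/c_8543"/>
  </rdf:Description>
</rdf:RDF>
```

Hanging every link off one subject also removes the duplicated CID1486 description present in the original file.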

The simplest approach to #3 (provenance information) could be to store, with each document (named graph), triples describing an owl:Ontology (the x in these examples). This follows existing conventions, but it can break down if you combine named graphs, because the triples from each graph would no longer necessarily be annotated with the ontology that contained them. A minor variation, largely a matter of preference, would be to store the provenance information in the default graph of your data store, keeping metadata about each chemical's document separate from the chemical descriptions themselves.
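A sketch of such a provenance header for the 2,4-D document, assuming the hypothetical graph IRI http://example.org/chem/graph/CID1486 doubles as the owl:Ontology IRI. It uses the PROV-O properties prov:wasDerivedFrom, prov:wasAttributedTo and prov:generatedAtTime; the agent IRI and the timestamp are placeholders, not real data.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xmlns:prov="http://www.w3.org/ns/prov#">
  <!-- Hypothetical document IRI, reused as the named-graph IRI so the
       provenance can still be matched to its graph after merging -->
  <owl:Ontology rdf:about="http://example.org/chem/graph/CID1486">
    <!-- Reuse the shared (hypothetical) vocabulary instead of redefining classes -->
    <owl:imports rdf:resource="http://example.org/chem/ontology"/>
    <!-- Sources this description was derived from -->
    <prov:wasDerivedFrom rdf:resource="http://pubchem.ncbi.nlm.nih.gov/rest/rdf/compound/CID1486"/>
    <prov:wasDerivedFrom rdf:resource="http://dbpedia.org/resource/2,4-Dichlorophenoxyacetic_acid"/>
    <!-- Placeholder agent and timestamp -->
    <prov:wasAttributedTo rdf:resource="http://example.org/agents/catalog-editor"/>
    <prov:generatedAtTime rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2000-01-01T00:00:00Z</prov:generatedAtTime>
  </owl:Ontology>
</rdf:RDF>
```

Under the default-graph variation, this same description would be loaded into the default graph rather than into the chemical's named graph, so it survives even if the named graphs are later merged.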

Licensed under: CC-BY-SA with attribution
