Question

I have a problem with HXT. i wanna parse an owl file and i have a problem with my arrow, because he dont wanna parse a tree ! I saw that the problem is that one: Firstly , the code :

import System.Environment  --para uso do getArgs

import Data.List.Split (splitOn)


data Class = Class {
                    name ::String,
                    subClassOf ::String
               } deriving (Show,Eq)


main = do
   [src]<- getArgs
   parser <- runX(readDocument [ withValidate no] src  >>> getClass)
   print parser


parseClass = ifA (hasAttr "rdf:about")  (getAttrValue "rdf:about")  (getAttrValue "rdf:ID")

parseSubClass = getAttrValue "rdf:resource"



split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l


atTag tag = deep (isElem >>> hasName tag)

getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    s <- atTag "rdfs:subClassOf" -< l
    subClass <- parseSubClass -< s
    returnA -< Class { name = (split className), subClassOf = (split subClass) }

with that i should be able to parse on the owl file each nodes where it exists this example :

<owl:Class rdf:about="Damien">
    <rdfs:subClassOf rdf:resource="PurchaseableItem"/>
</owl:Class>

But, when i want to parse a tree like that , it simply doesnt compute and throw it away !

<owl:Class rdf:about="&camera;BodyWithNonAdjustableShutterSpeed">
    <owl:equivalentClass>
        <owl:Class>
            <owl:intersectionOf rdf:parseType="Collection">
                <rdf:Description rdf:about="&camera;Body"/>
                <owl:Restriction>
                    <owl:onProperty rdf:resource="&camera;shutter-speed"/>
                    <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">0</owl:cardinality>
                </owl:Restriction>
            </owl:intersectionOf>
        </owl:Class>
    </owl:equivalentClass>
</owl:Class>

Why ? Because the Subclass node doesnt exist ! But i want the Class available there and put it on my data even if the subclass doesnt exists ! So, how is that possible to do it ?


My newest version:

import System.Environment  --para uso do getArgs
import Data.List.Split (splitOn)

data Class = Class {
                    name ::String,
                    subClassOf :: String
               } deriving (Show,Eq)

main = do
   [src]<- getArgs
   parser <- runX(readDocument [ withValidate no] src  >>> getClass)
   print parser

parseClass = ifA (hasAttr "rdf:about")  (getAttrValue "rdf:about")  (getAttrValue "rdf:ID")
parseSubClass = (getAttrValue "rdf:resource") `orElse` arr (const "" )

--Test  (é preciso rever esta definição) uma falha se o nome tiver o "#"
split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l

atTag tag = deep (isElem >>> hasName tag)
getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    s <- atTag "rdfs:subClassOf" -< l
    subClass <- parseSubClass -< s
    returnA -< Class { name = (split className), subClassOf = split subClass }
Was it helpful?

Solution

You need to decide what you want when the SubClass node doesn't exist. As I see it, you have two choices:

  • A missing SubClass node means that subClass is the empty string. In that case, simply change your parser to fall back to the empty string when the arrow built around atTag "rdfs:subClassOf" fails:

    getClass = atTag "owl:Class" >>>
        proc l -> do
        className <- parseClass -< l
        subClass <- getSubClass -< l
        returnA -< Class { name = split className, subClassOf = split subClass }
        where
          getSubClass =
            (atTag "rdfs:subClassOf" >>> parseSubClass) `orElse` arr (const "")
    
  • A missing SubClass node means that subClass is Nothing. This requires changing your data definition so that subClassOf is of type Maybe String, but after that it's fairly similar to the previous answer:

    getClass = atTag "owl:Class" >>>
        proc l -> do
        className <- parseClass -< l
        subClass <- getSubClass -< l
        returnA -< Class { name = split className, subClassOf = fmap split subClass }
        where
          getSubClass =
            (atTag "rdfs:subClassOf" >>> parseSubClass >>> arr Just)
            `orElse` arr (const Nothing)
    

Just so we're clear, because you say that this isn't working in the comments, here's exactly the full program I'm running, which works fine for me:

{-# LANGUAGE Arrows #-}
import System.Environment  --para uso do getArgs
import Data.List.Split (splitOn)
import Text.XML.HXT.Core

data Class = Class {
                    name ::String,
                    subClassOf ::String
               } deriving (Show,Eq)

main = do
   [src]<- getArgs
   parser <- runX(readDocument [ withValidate no] src  >>> getClass)
   print parser

parseClass = ifA (hasAttr "rdf:about")
             (getAttrValue "rdf:about")
             (getAttrValue "rdf:ID")

parseSubClass = getAttrValue "rdf:resource"

split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l

atTag tag = deep (isElem >>> hasName tag)

getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    subClass <- getSubClass -< l
    returnA -< Class { name = split className, subClassOf = split subClass }
    where
      getSubClass =
        (atTag "rdfs:subClassOf" >>> parseSubClass)
        `orElse` arr (const "")

Note that if you really don't want to combine multple arrow steps with >>> or <<<, another possibility is to use an inner proc:

getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    subClass <- (proc l' -> do
      s <- atTag "rdfs:subClassOf" -< l'
      parseSubClass -< s)
      `orElse` constA "" -< l
    returnA -< Class { name = split className, subClassOf = split subClass}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top