openrdf Sesame: Is it possible to parse single lines?

https://stackoverflow.com/questions/17628962

03-06-2022

Question

Is it possible to use the parsers from the openrdf framework to parse single lines of text into the openrdf model? I would like to parse huge N-Quads files and would like to use the

org.openrdf.rio.nquads.NQuadsParser

for this task. My ideal solution would return an org.openrdf.model.Statement with proper instances of subject, predicate, object and context. I know the class itself does not have a method to do this. Since the files I am parsing are very large, I cannot load them completely into a repository. I could probably parse chunks of the file into a repository, evaluate them and then clear the repository to make room for the next chunk. Is there a better way to get Statements from the lines of a text file?

For some context, I want to gather statistics on huge N-Quads files, for which I need to evaluate each statement but do not need to store most of them.

Solution

As far as I know it isn't possible to parse single lines, but Sesame does have an API (the RDFHandler interface) that lets you control what is done with each parsed statement, which avoids the need for your code to store statements in a repository at all.

See the documentation for a simple example that just counts triples; you can easily do much more complex processing this way.
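A minimal sketch of that streaming approach, assuming Sesame 2.7.x with the sesame-rio-nquads module on the classpath (so Rio.createParser(RDFFormat.NQUADS) can find the N-Quads parser). The class and method names (QuadStats, ContextCounter, countQuads) are hypothetical. The parser pushes each Statement into the handler as it reads the file, so only the statistics you choose to keep are held in memory:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

import org.openrdf.model.Resource;
import org.openrdf.model.Statement;
import org.openrdf.rio.RDFFormat;
import org.openrdf.rio.RDFHandlerException;
import org.openrdf.rio.RDFParseException;
import org.openrdf.rio.RDFParser;
import org.openrdf.rio.Rio;
import org.openrdf.rio.helpers.RDFHandlerBase;

public class QuadStats {

    /** Hypothetical handler: receives every parsed Statement and keeps only counters. */
    static class ContextCounter extends RDFHandlerBase {
        long total = 0;
        // Statements per named graph; a null key would stand for the default graph.
        final Map<Resource, Long> perContext = new HashMap<Resource, Long>();

        @Override
        public void handleStatement(Statement st) {
            total++;
            Resource ctx = st.getContext();
            Long n = perContext.get(ctx);
            perContext.put(ctx, n == null ? 1L : n + 1);
        }
    }

    /** Streams N-Quads from the input; no repository is involved. */
    static ContextCounter countQuads(InputStream in)
            throws IOException, RDFParseException, RDFHandlerException {
        RDFParser parser = Rio.createParser(RDFFormat.NQUADS);
        ContextCounter counter = new ContextCounter();
        parser.setRDFHandler(counter);
        // Base URI is only used to resolve relative IRIs.
        parser.parse(in, "http://example.org/");
        return counter;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java QuadStats data.nq
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            ContextCounter c = countQuads(in);
            System.out.println("statements: " + c.total);
            for (Map.Entry<Resource, Long> e : c.perContext.entrySet()) {
                System.out.println(e.getKey() + "\t" + e.getValue());
            }
        }
    }
}
```

Any per-statement statistic (predicate frequencies, literal datatypes, and so on) can be collected the same way by changing what handleStatement does.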

OTHER TIPS

Upon further investigation, I realized that there are more ParserSettings apart from the ones in

org.openrdf.rio.helpers.BasicParserSettings

Specifically, the

NTriplesParserSettings.FAIL_ON_NTRIPLES_INVALID_LINES

setting can prevent a parser from failing when it encounters invalid lines. For example, calling

parser.getParserConfig().addNonFatalError(NTriplesParserSettings.FAIL_ON_NTRIPLES_INVALID_LINES);

configures your parser to be more error-tolerant, which lets you parse the 'good' triples in a noisy N-Quads/N-Triples file.
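A minimal sketch of that configuration in context, again assuming Sesame 2.7.x with sesame-rio-nquads available; TolerantQuads and countValid are hypothetical names. The parser is told to treat invalid lines as non-fatal, so it skips them and keeps handing valid statements to the handler:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import org.openrdf.model.Statement;
import org.openrdf.rio.RDFFormat;
import org.openrdf.rio.RDFParser;
import org.openrdf.rio.Rio;
import org.openrdf.rio.helpers.NTriplesParserSettings;
import org.openrdf.rio.helpers.RDFHandlerBase;

public class TolerantQuads {

    /** Counts the statements on valid lines, skipping lines the parser cannot read. */
    static long countValid(InputStream in) throws Exception {
        RDFParser parser = Rio.createParser(RDFFormat.NQUADS);
        // Downgrade "invalid line" from a fatal error to a non-fatal one.
        parser.getParserConfig().addNonFatalError(NTriplesParserSettings.FAIL_ON_NTRIPLES_INVALID_LINES);

        final long[] count = {0};
        parser.setRDFHandler(new RDFHandlerBase() {
            @Override
            public void handleStatement(Statement st) {
                count[0]++;
            }
        });
        parser.parse(in, "http://example.org/");
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TolerantQuads noisy.nq
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println("valid statements: " + countValid(in));
        }
    }
}
```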

Licensed under: CC-BY-SA with attribution
