Question

I want to extract labels, abstracts, categories, and relevant dates for each article from a DBpedia dump file.

I'm using dotNetRDF and I want to save the extracted data to an MS SQL database (I don't want to use a triple store like Virtuoso).

Due to its size, I can't load the dump file into memory.

Is there a way to extract the statements? The only approach I can think of is to split the dump file into smaller chunks. Is that the only option?


Solution

Actually, everything in dotNetRDF is designed to support streaming parsing. The most common use case happens to be loading data into our in-memory structures, but even that uses the streaming parser subsystem under the hood.

See the Advanced Parsing section of the Reading RDF documentation, which introduces the Handlers API. This API gives you complete control over what happens to the data as the parser produces it, so you can write a custom handler that receives each triple as it is parsed and puts it into your database, without the whole file ever being held in memory.
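As a rough illustration, here is a minimal sketch of such a handler. It subclasses `BaseRdfHandler` (from `VDS.RDF.Parsing.Handlers`) and inserts each triple into SQL Server as it arrives; the `Triples(Subject, Predicate, Object)` table, the connection string, and the `labels_en.nt` file name are all assumptions for the example, and the exact base-class members can vary between dotNetRDF versions, so check the documentation for the version you're on:

```csharp
using System;
using System.Data.SqlClient;
using VDS.RDF;
using VDS.RDF.Parsing;
using VDS.RDF.Parsing.Handlers;

// Hypothetical handler: receives triples one at a time from the streaming
// parser and writes them to a SQL Server table Triples(Subject, Predicate,
// Object). Table name and schema are assumptions for this sketch.
public class SqlTripleHandler : BaseRdfHandler, IDisposable
{
    private readonly SqlConnection _connection;

    public SqlTripleHandler(string connectionString)
    {
        _connection = new SqlConnection(connectionString);
        _connection.Open();
    }

    // Called once per triple as the parser produces it; the dump file
    // is streamed, never loaded whole.
    protected override bool HandleTripleInternal(Triple t)
    {
        // Filter here for the predicates you care about (rdfs:label,
        // dbo:abstract, dcterms:subject, date properties, ...) before
        // inserting; for dump-scale loads consider batching or SqlBulkCopy
        // instead of one INSERT per triple.
        using (var cmd = new SqlCommand(
            "INSERT INTO Triples (Subject, Predicate, Object) VALUES (@s, @p, @o)",
            _connection))
        {
            cmd.Parameters.AddWithValue("@s", t.Subject.ToString());
            cmd.Parameters.AddWithValue("@p", t.Predicate.ToString());
            cmd.Parameters.AddWithValue("@o", t.Object.ToString());
            cmd.ExecuteNonQuery();
        }
        return true; // returning false tells the parser to stop early
    }

    public override bool AcceptsAll
    {
        get { return true; }
    }

    public void Dispose()
    {
        _connection.Dispose();
    }
}

public class Program
{
    public static void Main()
    {
        // DBpedia dumps are N-Triples, so NTriplesParser is the natural
        // reader; Load(IRdfHandler, string) streams the file through the
        // handler. Connection string and file name are placeholders.
        using (var handler = new SqlTripleHandler(
            "Server=.;Database=DBpedia;Trusted_Connection=True;"))
        {
            var parser = new NTriplesParser();
            parser.Load(handler, "labels_en.nt");
        }
    }
}
```

Because the handler sees triples one at a time, memory use stays flat regardless of dump size, so there is no need to split the file into chunks.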

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow