How does Apache TDB store RDF data?

https://stackoverflow.com/questions/21083954

27-09-2022

Question

According to the Apache website, TDB 'can be used as a high performance RDF store on a single machine'. Reading the documentation, I don't see where it stores anything. Is it simply storing each resource in its own file within a defined directory, as outlined in this tutorial? If so, that seems as if it will not scale very well.

Solution

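TDB does not simply copy the file into the specified folder. It parses the file and indexes its triples. Several indexes are built over the same data: one in S P O (subject, predicate, object) order, another in P O S order, and so on. By default TDB keeps three such orderings for triples (SPO, POS and OSP) rather than every possible permutation, which is enough to look up a triple pattern efficiently whichever of its positions are known.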

Those indexes are stored in the specified folder. Depending on the query, the index whose ordering best matches the known terms of each triple pattern is used.

If you load an RDF file into a TDB store, you will see that many files are created. Although this means that the data is stored several times (once per index), it speeds up query execution, which is usually preferred over minimal storage usage.
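To make the idea of several orderings over the same triples concrete, here is a minimal in-memory sketch in Python. It is not how TDB is implemented (TDB uses the on-disk structures described in its design documentation); the class `ToyTripleStore`, the method `choose_index` and the sample data are all hypothetical names invented for illustration. It only shows why keeping SPO, POS and OSP copies lets a lookup start from whichever terms are known.

```python
from bisect import bisect_left

# Toy model only: each "index" is a sorted Python list, not an on-disk structure.
# Each entry maps an ordering name to the (s, p, o) positions it stores, in order.
ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}


def bound_prefix(pattern, perm):
    """Return the leading terms of `pattern`, in `perm` order, that are bound."""
    prefix = []
    for i in perm:
        if pattern[i] is None:
            break
        prefix.append(pattern[i])
    return tuple(prefix)


class ToyTripleStore:
    def __init__(self):
        # Every triple is stored once per ordering, hence the extra space.
        self.indexes = {name: [] for name in ORDERS}

    def add(self, s, p, o):
        triple = (s, p, o)
        for name, perm in ORDERS.items():
            key = tuple(triple[i] for i in perm)
            idx = self.indexes[name]
            pos = bisect_left(idx, key)
            if pos == len(idx) or idx[pos] != key:  # skip duplicates
                idx.insert(pos, key)

    @staticmethod
    def choose_index(pattern):
        # Pick the ordering whose leading positions cover the most bound terms.
        return max(ORDERS, key=lambda n: len(bound_prefix(pattern, ORDERS[n])))

    def find(self, s=None, p=None, o=None):
        pattern = (s, p, o)
        name = self.choose_index(pattern)
        perm = ORDERS[name]
        prefix = bound_prefix(pattern, perm)
        idx = self.indexes[name]
        results = []
        # Jump to the first key with this prefix, then scan while it still matches.
        for key in idx[bisect_left(idx, prefix):]:
            if key[:len(prefix)] != prefix:
                break
            triple = [None, None, None]
            for slot, i in enumerate(perm):
                triple[i] = key[slot]  # undo the permutation
            if all(w is None or w == g for w, g in zip(pattern, triple)):
                results.append(tuple(triple))
        return results


store = ToyTripleStore()
store.add("alice", "knows", "bob")
store.add("alice", "age", "30")
store.add("bob", "knows", "carol")

print(store.choose_index((None, "knows", None)))  # ordering picked for "? knows ?"
print(store.find(p="knows"))
print(store.find(o="carol"))
```

In this sketch a pattern with only the predicate known is answered from the POS list and one with only the object known from the OSP list, so neither needs a full scan. The price is that each triple occupies three entries instead of one, which mirrors the space-for-speed trade-off described above.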

OTHER TIPS

The documentation you linked includes a TDB Design link.

This page covers the technical details of which data structures are used internally and how they are stored on disk.

Licensed under: CC-BY-SA with attribution