Question

I created a dynamic C++ library which depends on some 30 MB or more of data. Now I'm trying to figure out the best way to store that data.

The data is essentially one big array with over a million elements.

I want installing/uninstalling the library to be as simple as possible. The library can be referenced by other interface programs, such as a terminal program, an R program, etc. It only needs to support UNIX.

One idea I had was to hardcode the data into one big array and compile that file into the library, but that doesn't seem like the correct or efficient way to do things. Also, if the file grows beyond 1 GB, things get out of hand.

Another idea I had was to copy the file with the data to a predefined path and hardcode a reference to that path in the library. However, some users don't want to install everything to the default installation path.

Another idea I had was to let each interface provide the path to the data file, but that seems like a hassle for the interface, and why should the interface know where the library's data is?

Is there any well-known practice for such a case?


Solution

I don't think there is one "right" answer to this.

Storing the data in the library file itself is fine, as long as the data doesn't change more often than you wish to release a new library. You need that amount of storage one way or another anyway, so as long as the compiler doesn't do a terrible job of storing the data in the shared library, it's no worse than any other option, as far as I can see.
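For illustration, one common way to do this is to generate a C array from the binary blob at build time, e.g. with `xxd -i`, and link the generated file into the library. The file and symbol names below are placeholders:

```cpp
#include <cstddef>

// Generated once at build time with:
//   xxd -i blob.bin > blob_data.c
// which produces roughly:
//   unsigned char blob_bin[]   = { 0x12, 0x34, ... };
//   unsigned int  blob_bin_len = 31457280;

// In the library, declare the generated symbols and expose accessors:
extern "C" {
    extern unsigned char blob_bin[];
    extern unsigned int  blob_bin_len;
}

const unsigned char* library_data()      { return blob_bin; }
std::size_t          library_data_size() { return blob_bin_len; }
```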

Having a secondary file is only useful if you expect the data to change more often than you wish to release a new shared library. It adds the extra complication of opening and reading the secondary file, and the drawback that you then also need code to check that the file is present and correct, and to deal with it not being there.
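A minimal sketch of that extra handling, assuming the data lives in a flat binary file whose expected size is known (a cheap sanity check that the file is intact):

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Load the whole data file, failing loudly if it is missing or truncated.
std::vector<char> load_data_file(const std::string& path, std::size_t expected_size)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        throw std::runtime_error("data file not found: " + path);

    const std::size_t size = static_cast<std::size_t>(in.tellg());
    if (size != expected_size)
        throw std::runtime_error("data file has unexpected size: " + path);

    std::vector<char> buffer(size);
    in.seekg(0);
    if (!in.read(buffer.data(), static_cast<std::streamsize>(size)))
        throw std::runtime_error("failed to read data file: " + path);
    return buffer;
}
```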

If you do have a secondary file, having SOME way to redefine the location would definitely be beneficial.
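One conventional UNIX approach is to compile in a default install path but let an environment variable override it at run time. The variable name and default path here are just examples:

```cpp
#include <cstdlib>
#include <string>

// Compiled-in default location, overridable via an environment
// variable (MYLIB_DATA_PATH is a hypothetical name).
std::string data_file_path()
{
    if (const char* override_path = std::getenv("MYLIB_DATA_PATH"))
        return override_path;
    return "/usr/local/share/mylib/data.bin";  // compiled-in default
}
```

This keeps the interfaces (terminal program, R, etc.) out of the picture entirely: they never need to know where the data lives, and users who install to a non-default prefix just set the variable.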

If the data is really large, you may want to use a compressed format. You can still store the compressed data in your shared library, and use a compression library to expand it from there. Or you can use a library that reads from an external file...
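As a sketch, zlib can inflate a compressed blob embedded the same way as above. The `blob_zlib` symbols are assumed to come from a build step, and the uncompressed size must be recorded at build time:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>
#include <zlib.h>

extern "C" {
    extern unsigned char blob_zlib[];      // compressed blob, embedded at build time
    extern unsigned int  blob_zlib_len;
}

// Inflate the embedded blob into a heap buffer.
std::vector<unsigned char> decompress_embedded(std::size_t uncompressed_size)
{
    std::vector<unsigned char> out(uncompressed_size);
    uLongf dest_len = static_cast<uLongf>(uncompressed_size);
    if (uncompress(out.data(), &dest_len, blob_zlib, blob_zlib_len) != Z_OK)
        throw std::runtime_error("failed to decompress embedded data");
    out.resize(dest_len);
    return out;
}
```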

In the end, it really comes down to:

  1. How you are using the data - do you always need ALL of it, or do you just need some of it at times? If the latter, how do you know which bits?
  2. How often the data changes.
  3. Whether the data can be compressed, and if so, by what method.

I'm not sure there are any direct size limits on a shared library. If you need 1 GB of data, then you need 1 GB of space in memory either way, so it's not like you are saving memory [assuming you always need ALL the data and/or can't determine which parts you need].
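If, on the other hand, you only need parts of the data at a time, memory-mapping an external data file lets the OS page in just the portions you actually touch. A minimal POSIX sketch, assuming for illustration that the array holds doubles:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <stdexcept>
#include <string>

// Map the data file read-only; pages are loaded lazily on first access,
// so untouched parts of a large file never consume physical memory.
const double* map_data_file(const std::string& path, std::size_t& count)
{
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0)
        throw std::runtime_error("cannot open data file: " + path);

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        throw std::runtime_error("fstat failed for: " + path);
    }

    void* addr = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                      PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after close
    if (addr == MAP_FAILED)
        throw std::runtime_error("mmap failed for: " + path);

    count = static_cast<std::size_t>(st.st_size) / sizeof(double);
    return static_cast<const double*>(addr);
}
```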

OTHER TIPS

You can use a separate data file and save the data in it in a compressed binary format, then distribute that file and the dll/lib together.
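A sketch of that approach with zlib's gzip file API, assuming the data file was written with gzip compression:

```cpp
#include <stdexcept>
#include <string>
#include <vector>
#include <zlib.h>

// Read a gzip-compressed data file, decompressing on the fly.
std::vector<char> read_gz_data(const std::string& path)
{
    gzFile gz = gzopen(path.c_str(), "rb");
    if (!gz)
        throw std::runtime_error("cannot open compressed data file: " + path);

    std::vector<char> data;
    char chunk[1 << 16];
    int n;
    while ((n = gzread(gz, chunk, sizeof chunk)) > 0)
        data.insert(data.end(), chunk, chunk + n);
    gzclose(gz);

    if (n < 0)
        throw std::runtime_error("error while decompressing: " + path);
    return data;
}
```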

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow