Question

I am involved in a project in which I need to run many experiments with different settings. Since each experiment takes 2 days to finish, I need to store as much information as possible for later analysis, namely:

learner (string) / Kernel (string) / gamma (float) / features (string) / train_size (int) / test_size (int) / train_Index (vector<int>) / test_index (vector<int>) / predictions (array of pointers (float)) / run_time (float)

I should also be able to append to the file. One way to do that would be to store everything in a text file with a specific format and read it later, but that is slow and not memory friendly (I need to run about 500-1000 experiments). Another way is to use an XML file, which makes reading the file easier. But I am still not sure what the best way to go is. Can you give me some hints (or, even better, some examples)?

Solution

I recommend writing to tab-separated text files. Use a different separator for list items (e.g. ',' or ';').
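As a rough illustration of that layout, here is a minimal Python sketch that writes one experiment per line and appends it to a file. The column order, the helper names `format_record` and `append_record`, and the choice of ',' as the list separator are assumptions made for this example, not something fixed by the question. Predictions are assumed to be a flat list of floats.

```python
FIELD_SEP = "\t"   # separates columns
LIST_SEP = ","     # separates items inside a list-valued column


def format_record(learner, kernel, gamma, features, train_index, test_index,
                  predictions, run_time):
    """Return one experiment as a single tab-separated line (no newline).

    Assumed column order: learner, kernel, gamma, features, train_size,
    test_size, train_index, test_index, predictions, run_time.
    """
    columns = [
        learner,
        kernel,
        repr(float(gamma)),
        features,
        str(len(train_index)),   # train_size is derived from the index list
        str(len(test_index)),    # test_size likewise
        LIST_SEP.join(str(int(i)) for i in train_index),
        LIST_SEP.join(str(int(i)) for i in test_index),
        LIST_SEP.join(repr(float(p)) for p in predictions),
        repr(float(run_time)),
    ]
    return FIELD_SEP.join(columns)


def append_record(path, line):
    """Append one record to the results file; mode "a" creates it if missing."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

Because the file is opened in append mode, each finished experiment only adds one line, and nothing written earlier has to be read or rewritten. The string columns must not themselves contain a tab or newline for this to stay parseable.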

This has comparatively little overhead, both space- and complexity-wise. (Personally, I like to print the results of experiments run in C++ to files and analyze them in Python, which has awesome built-in support for things like reading line by line and splitting strings into lists at given separators. But all languages offer that with more or less short syntax, and you can even write it yourself in a few lines.)
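To give an idea of how little code the reading side needs, here is a minimal Python sketch that splits each line at tabs and each list column at ','. The column order, the separator, and the helper names (`split_list`, `parse_record`, `read_records`) are assumptions for illustration; adapt them to whatever order your own writer uses.

```python
FIELD_SEP = "\t"   # separates columns
LIST_SEP = ","     # separates items inside a list-valued column


def split_list(text, convert):
    """Split a list-valued column and convert each item; empty text gives []."""
    return [convert(item) for item in text.split(LIST_SEP)] if text else []


def parse_record(line):
    """Turn one tab-separated line into a dict with typed values.

    Assumed column order: learner, kernel, gamma, features, train_size,
    test_size, train_index, test_index, predictions, run_time.
    """
    (learner, kernel, gamma, features, train_size, test_size,
     train_index, test_index, predictions, run_time) = (
        line.rstrip("\n").split(FIELD_SEP))
    return {
        "learner": learner,
        "kernel": kernel,
        "gamma": float(gamma),
        "features": features,
        "train_size": int(train_size),
        "test_size": int(test_size),
        "train_index": split_list(train_index, int),
        "test_index": split_list(test_index, int),
        "predictions": split_list(predictions, float),
        "run_time": float(run_time),
    }


def read_records(lines):
    """Yield one parsed record per non-empty line; `lines` may be an open file."""
    for line in lines:
        if line.strip():
            yield parse_record(line)
```

Passing an open file handle to `read_records` processes the file one line at a time, so even a file with 1000 experiments never has to be held in memory all at once.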

It should also be fast, but if you're training and running SVMs, writing and reading the output will be a non-issue time-wise anyway.

There is very little to gain from using XML or the like: the strengths of such formats are that they allow interchanging files, enforcing rules via a schema, etc. You can also give proper names to each attribute. But if it's only your own experiments, you do not really need those features. You'll know which column corresponds to what.

Tab-separated files have another great benefit: you can use GNU programs to access the data and look something up quickly. cut/sed/grep/etc. work nicely on them, and they don't with XML.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow