Question

I have a GUI application which works with point cloud data and a quadtree data structure behind it to handle the data. As the point format I'm working with has changed recently I had to modify my point class to hold the new attributes, which cause Point objects to grow in size significantly and in effect reducing the performance of my quadtree. Some of this attributes are not needed for displaying and processing the data, but they still need to be preserved in the output. This is roughly how my point class looks at the moment:

class Point {
public:
    /* ... */
private:
    /* Used data members */
    double x;
    double y;
    double z;
    double time;
    int attr1;
    int attr2;

    /* Unused data members */
    int atr3;
    double atr4;
    float atr5;
    float atr6;
    float atr7;
}

When the data is loaded from the file the Points are stored in a Point* array and then handled by the quadtree. Similarly when they are savedan array of points is passed from the quadtree and saved to a file. Note that Point objects I'm using in my quadtree are different fomr those stored in the file, but I'm using a library that provides reader and writer objects which I use to create my points. Here's an example:

int PointLoader::load(int n, Point* points) {

    Point temp;
    int pointCounter = 0;

    /* reader object is provided by the library and declared elsewhere */        
    while (pointCounter < n && reader->read_point()) {
        temp = Point(reader->get_x(), reader->get_y(), reader->get_z(), /* ... */ )

        points[pointCounter] = temp;
        ++pointCounter;        
    }
    return pointCounter;
}

Now, my idea is to reduce size of the Point class, and store unused attributes in another class (or struct) called PointData on the hard drive. This is necessary because the data usually doesn't fit in memory and there's a caching system in place, which again would benefit from smaller point objects. So given the example it would look something like this:

int PointLoader::load(int n, Point* points) {

    Point temp;
    PointData tempData;
    int pointCounter = 0;    

    while (pointCounter < n && reader->read_point()) {
        temp = Point(reader->get_x(), reader->get_y(), reader->get_z(), /* ... */ )
        pointData = (reader->get_attr3(), reader->get_attr4(), /* ... */)

        temp.dataHandle = /* some kind of handle to the data object */
        points[pointCounter] = temp;

        /* Save pointData to file to retrieve when saving points */

        ++pointCounter;        
    }
    return pointCounter;
}

Then when I save my modified points I'd simply used the dataHandle (file offset? an index in memory mapped array?) to retrieve the pointData of each point and write it back to the file.

Does that sound like a good idea? What would be the most sensible approach of achieving this?

Was it helpful?

Solution

I would suggest you use mapped files for storing the additional data. This will automatically cause them to be flushed to disk and removed from RAM if there is memory pressure, but they will stay resident in RAM most of the time if there is enough memory.

In your Point class, storing offsets in the file is better than storing direct pointers into the mapped memory region as offsets will still be correct if you have to remap the file in order to grow it (you have to grow the file using e.g. lseek() yourself, as you can only map as much as the size of the file).

This mechanism is very convenient to code to, but you must have enough address space to map the whole file - no issue in a 64-bit app, but possibly a problem if you're 32-bit and need more than a few hundred MB of data in the file. You can of course map and unmap multiple files, but it requires more coding work and is less performant (there is some cost to mapping and unmapping files).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top