Question

I'm working on a DNA database class. I currently associate each row in the database with both a match score (based on edit distance) and the DNA sequence itself. Is it safe to modify first this way inside an iteration loop?

typedef std::pair<int, DnaDatabaseRow> DnaPairT;
typedef std::vector<DnaPairT>          DnaDatabaseT;

// ....

for(DnaDatabaseT::iterator it = database.begin();
    it != database.end(); it++)
{
    int score = it->second.query(query);
    it->first = score;
}

The reason I am doing this is so that I can sort them by score later. I have tried maps and received a compilation error about modifying first, but is there perhaps a better way than this to store all the information for sorting later?


Solution

To answer your first question, yes. It is perfectly safe to modify the members of your pair: assigning to first changes neither the size nor the layout of the vector, so no iterators are invalidated.

edit: I have a feeling that you were getting an error when using a map because you tried to modify the first value of the map's internal pair. That would not be allowed because that value is part of the map's inner workings.

As stated by dribeas:

In maps you cannot change first as it would break the invariant of the map being a sorted balanced tree
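The difference between the two containers can be checked at compile time. The following is only a sketch, assuming C++11 for static_assert; std::string stands in for DnaDatabaseRow, and DnaMapT and scoreByLength are hypothetical names that do not appear in the question.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// std::string is a stand-in for DnaDatabaseRow (an assumption for this sketch).
typedef std::pair<int, std::string> DnaPairT;
typedef std::vector<DnaPairT>       DnaDatabaseT;
typedef std::map<int, std::string>  DnaMapT;   // hypothetical map-based variant

// In a vector the element is a plain pair, so first is an ordinary int.
static_assert(!std::is_const<DnaPairT::first_type>::value,
              "vector element: first is mutable");

// A map stores std::pair<const Key, T>, so first is const there.
static_assert(std::is_const<DnaMapT::value_type::first_type>::value,
              "map element: first is const");

// Hypothetical helper: overwrites every score in place, using the
// sequence length as a placeholder for the real edit-distance score.
void scoreByLength(DnaDatabaseT& database)
{
    for (DnaDatabaseT::iterator it = database.begin(); it != database.end(); ++it)
    {
        // Assigning to first does not resize or reorder the vector.
        it->first = static_cast<int>(it->second.size());
    }
}
```

Writing it->first = ... through a DnaMapT::iterator would fail to compile, which is most likely the error you saw with maps.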

edit: To answer your second question, I see nothing wrong with the way you are structuring the data. If DnaDatabaseRow is expensive to copy, though, I would have the database hold pointers to DnaPairT objects instead of the objects themselves, so that the sort only rearranges pointers rather than copying whole rows. Keep in mind that the vector then no longer owns its elements: whoever allocates the DnaPairT objects must also delete them.

#include <vector>
#include <utility>
#include <algorithm> 

typedef std::pair<int, DnaDatabaseRow> DnaPairT;
typedef std::vector<DnaPairT *>       DnaDatabaseT;

// ...

// your scoring code, modified to use pointers
void calculateScoresForQuery(DnaDatabaseT& database, queryT& query)
{
    for(DnaDatabaseT::iterator it = database.begin(); it != database.end(); it++)
    {
        int score = (*it)->second.query(query);
        (*it)->first = score;
    }
}

// custom sorting function to handle DnaPairT pointers
bool sortByScore(DnaPairT * A, DnaPairT * B) { return (A->first < B->first); }

// function to sort the database
void sortDatabaseByScore(DnaDatabaseT& database)
{
    // qualify the call: the iterators of a pointer vector may not pull in std::sort via ADL
    std::sort(database.begin(), database.end(), sortByScore);
}

// main
int main()
{
    DnaDatabaseT database;

    // code to load the database with DnaPairT pointers ...

    queryT query; // assumed default-constructible here; build it however your code requires
    calculateScoresForQuery(database, query);
    sortDatabaseByScore(database);

    // code that uses the sorted database ...
}

The only reason you might need to look into more efficient methods is if your database is so enormous that the sorting loop takes too long to complete. If that is the case, though, I would imagine that your query function would be the one taking up most of the processing time.
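If you would rather not manage raw pointers, sorting the pairs by value is also an option. The sketch below assumes C++11 or later, where std::sort moves or swaps elements instead of copying them; whether that is cheap depends on how DnaDatabaseRow is implemented. std::string stands in for DnaDatabaseRow, and lowerScoreFirst and sortRowsByScore are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// std::string is a stand-in for DnaDatabaseRow (an assumption for this sketch).
typedef std::pair<int, std::string> DnaPairT;
typedef std::vector<DnaPairT>       DnaDatabaseT;

// Compare scores only, so the row type never needs its own operator<.
bool lowerScoreFirst(const DnaPairT& a, const DnaPairT& b)
{
    return a.first < b.first;
}

// Sorts the rows in ascending order of score.
void sortRowsByScore(DnaDatabaseT& database)
{
    std::sort(database.begin(), database.end(), lowerScoreFirst);
}
```

Without the comparator, std::sort would fall back to the pair's own operator<, which also compares second and therefore requires DnaDatabaseRow to be comparable.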

OTHER TIPS

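In a std::map you can't modify first, because the map's element type is std::pair<const Key, T> and the key is declared const. In a plain std::pair<int, DnaDatabaseRow> stored in a vector, first is not const and can be modified.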

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow