Custom indexes with C++ matrix

Question 1

If you want to switch columns, rows of matrices, you can use some indirection:

 indexTable[0][0] = 0; // map row index 0 to 0
 indexTable[1][0] = 5; // map column index 0 to 5

and use it like this:

 value = matrix[indexTable[0][RowIndex]][indexTable[1][ColumnIndex];

or you can write a class to handle this indirection for you.

Question 2

A data.frame in R is essentially just a fancy wrapper for a list of columns, and a list is – illogically – a close equivalent of std::map¹ in C++ (rather than, as its name suggests, of std::list).

So in other words you can just use a typedef similar to this to approximate data.frame:

typedef std::map<int, std::vector<int>> data_frame;

… but R classes are actually quite a bit more powerful since they are to some extent generic, support different types inside the data frame, check that all rows have the same length and allow named access for columns and rows. And finally let’s not forget that R has support for working with data frames, pretty-printing them and loading and saving them efficiently.

Depending on what you need it’s of course unnecessary to replicate all this in C++ but it’s definitely beneficial to wrap your structure inside a class and provide a proper interface to access it.

¹ Actually, a std::unordered_map. This requires C++11.

Question 3

I would create a class which

converts arbitrary specified indices to 0-based incremental indices
wraps a 1D array such that you can access it as a 2D array using the converted indices

A working example would look something like this

#include <iostream>
#include <vector>
#include <algorithm>
#include <stdexcept>
#include <iterator>
#include <cassert>

using namespace std;

class DataFrame
{
  vector<int> data;

public:
  typedef vector<ssize_t> idx_t;
private:
  idx_t rowIdx;
  idx_t colIdx;

public:
  DataFrame(const idx_t &rowIdx, const idx_t &colIdx)
    : data(rowIdx.size() * colIdx.size())
    , rowIdx(rowIdx)
    , colIdx(colIdx)
  {
    assert(is_sorted(rowIdx.begin(), rowIdx.end()));
    assert(is_sorted(colIdx.begin(), colIdx.end()));
  }

  int& operator()(int i, int j)
  {
    idx_t::iterator itI, itJ;

    itI = lower_bound(rowIdx.begin(), rowIdx.end(), i);
    if(rowIdx.end() == itI || i != *itI) throw out_of_range("could not find specified row");
    itJ = lower_bound(colIdx.begin(), colIdx.end(), j);
    if(colIdx.end() == itJ || j != *itJ) throw out_of_range("could not find specified col");

    return data[distance(rowIdx.begin(), itI)*colIdx.size() +
      distance(colIdx.begin(), itJ)];
  }

  vector<int> & getData() { return data; }
};


int main()
{
  DataFrame::idx_t rI, cI;
  rI.push_back(3);
  rI.push_back(5);

  cI.push_back(2);
  cI.push_back(3);
  cI.push_back(10);

  DataFrame df(rI, cI);

  df(3,2) = 1;
  df(3,3) = 2;
  df(3,10) = 3;
  df(5,2) = 4;
  df(5,3) = 5;
  df(5,10) = 6;

  ostream_iterator<int> out_it(cout, ", ");
  copy(df.getData().begin(), df.getData().end(), out_it);
  cout << endl;

  return 0;
}

The arbitrary indices of each row/column are specified in a vector. To maintain some performance, the code requires that indices are monotonically increasing. (If you have C++11, this is checked in the ctor; if you don't have C++11, then you don't have the is_sorted function. Also, this code doesn't verify uniqueness of arbitrary indices.)

When you access the data, it just does a binary search on each vector of indices to find the position in the vector that matches the arbitrary index, and uses that position as the corresponding index to the underlying data. There is a simple conversion from 2D indices to a 1D index.

You may want to check that my index error checking is correct for all combinations of bad/good indices if you need to worry about that.

I'll leave it to you to add more robustness/functionality in terms of const accessors, various constructors, etc. If you want to generalize this to arrays of dimension other than 2, I recommend you create a class for just converting arbitrary index to 0-based index, and that will get rid of some of the code repetition I have. There are also other ways to do the conversion of arbitrary index to 0-based index, such as with a map like others have suggested. In that case though, there are some problems such as the creator of the map would have to ensure that if there are say 10 columns, each index in [0, 10) appears exactly once as a value in the map.