how to reindex a sparse associative array

https://stackoverflow.com/questions/13013750

13-07-2021
|

Вопрос

first off, this question is not related to a specific language - I use Haxe to target multiple platforms - so a pseudo code will be more than enough.

here's my problem : I have a sparse Matrix description in this form:

edges = 
[
1,1,2,1,3,1,4,1,
2,2,3,2,
3,3,4,3,5,3,
4,4,5,4,6,4,
5,5,6,5,7,5,25,5,27,5,28,5,29,5,30,5
];

this describes edges associations:

point 1 is linked to points 1, 2, 3 & 4
point 2 is linked to points 2 & 3
point 3 is linked to points 3, 4 & 5
point 4 is linked to points 4, 5 & 6
point 5 is linked to points 5, 6, 7, 25, 27, 28, 29 & 30

now I need to render this in 3D and to do so, I need to "compress" the data into an index buffer without "gaps". say with the above example, I'd need to get :

newEdges = 
[ 
1,2, 1, 3, 1, 4,
2,3,
3,4, 3,5,
4,5, 4,6,
5,6, 5,7, 5,8, 5,9, 5,10, 5,11, 5,12
]

so, the edge linking themselves (edge 1-1, 2-2, 3-3 etc. ) must be removed (easy).

as the point order is not important ( the edge 1-2 = edges 2-1 ) we shall also remove the duplicate edges (sort of easy).

now the tricky part is to remove the "gaps": as 7 was the highest consecutive value and 25 is the one right after, 25 must become 8, 27 must become 9, 28 must become 10 and so on.

for now I'm using a BitmapData in which I plot all the values as XY coordinates. then I recursively copy the non empty vertical stripes (1 pixel wide rect) of this bitmap next to each other into a temporary bitmap. then I do the same for the horizontal stripes and finally scan my bitmap and store X & Y values of the pixels as edges' IDs.

and it works!( at least it seems to :) ) but the overhead is terrible and depending on the input Matrix, I might just not be able to generate the Bitmaps (for instance flash is limited to 4092 pixels max, JS doesn't support copyPixels very well).

so the question is how would you do this "gap removal" without bitmaps and without a language-specific method?

hoping this was explicit enough, thanks for your attention.

Nicolas

Решение

Let E[m+1][m+1] be the 2-dimensional adjacency matrix corresponding to edges, where the range of point indices is [0..m].

Let f[n] be a sorted array of the n points visited in edges. By creating the f[n] array we're creating a mapping between the non-contiguous point indices in the range [0..m] and the contiguous point indices from [0..n-1].

Create a new adjacency matrix G as follows:

for i = 0..(n-1)
    for j = 0..(n-1)    // or, j = (i+1)..(n-1) for upper triangular portion
        G[i][j] = E[f[i]][f[j]]
    end
end

This will only take O(n^2) rather than O(m^2) time.

Edit: Removed the if statement. If both E and G are initialized to all 0's it's unnecessary.

Другие советы

Since your matrix is sparse, I suggest you use a sorted list data structure to build a sparse structure from your edge list. For every row you need to create a dynamic sorted list (ascending order) to which you add the edges. For example, for an edge (1,2) you will insert column 2 into a sorted list sorted_lists{1}. For small number of entries per row (a few hundred) this is best done using a linear search in sorted lists, followed by moving the larger elements to the end of the list. For larger number of entries per row you can use bisection to find the correct position. I routinely use this approach for sparse matrices arising in the Finite Element Method. It is in my experience the fastest approach, and it can be trivially parallelized! (split row ranges among threads)

Here is an example MATLAB code that implements a sorted list:

function list = sorted_list_insert(list, col)

% special case for empty list
if isempty(list)
    list = col;
    return;
end

% search for col position in the row
% can be done with bisection,
% but linear search is much faster for small number of entries per row
it = 1;
while it<length(list) && list(it)<col
    it = it+1;
end

% duplicate entry - do not add
if list(it)==col
    return;
end

% insert col in proper position, move other elements in the list
list = [list(1:it) col list(it+1:end)];
end

The complexity of adding all entries in a row to this sorted list is O(number of entries per row ^ 2).

The next thing you have to do is to go through your edge list and add columns to correct row sorted lists (sorted_lists{row}). In the below, edges is assumed to be a 2D array, where edges(1,i) is the column, and edges(2,i) is the row:

% find maximum row id
max_row = number of rows in the matrix

% initialize sorted list structures for all rows - max_row empty lists
sorted_lists = cell(max_row, 1);

% create sorted rows
nedges = total number of edges
for it=1:nedges
    row = edges(2,it);
    col = edges(1,it);
    sorted_lists{row} = sorted_list_insert(sorted_lists{row}, col);
end

The complexity of the above step is O(number of rows * number of entries per row ^ 2).

The last thing is to remove gaps. With the sorted lists it is trivially done by finding the position of col in the sorted lists. You also have to add the offset. From your data it looks like you deal with upper-triangular part of the matrix (you said the order of nodes in the edges does not matter). So the offset is simply the row number (-1 in MATLAB since it has 1-based numbering)

% the positions of col in every row (plus offset)
% are the new col id with removed gaps
for it=1:nedges
    offset = edges(2,it)-1;
    edges(1,it) = offset + find(sorted_lists{edges(2,it)}==edges(1,it));
end

This is how the edges look after processing with the above code:

edges =

Columns 1 through 13

 1     2     3     4     2     3     3     4     5     4     5     6     5
 1     1     1     1     2     2     3     3     3     4     4     4     5

Columns 14 through 20

 6     7     8     9    10    11    12
 5     5     5     5     5     5     5

The procedure works fine for sorted and unsorted edges. It only assumes that col >= row. That you can easily achieve. You can also easily add removal of the diagonal (i,i) edges.

Лицензировано под: CC-BY-SA с атрибуция

Не связан с StackOverflow