Question

I have an array, which is filled with numbers so that it contains empty and non-empty elements and in fact consists of several arrays. It may look like this:

0 х х х 0 х х х 0 0 0 0 0 0
0 x х х х x х х 0 0 0 х х х
0 0 х х х х 0 х 0 0 0 х х х
0 х х 0 х х х х 0 0 х 0 х х
0 х х х 0 0 0 х 0 0 х х х 0
0 0 0 0 0 0 0 0 0 0 0 х х х
0 0 0 0 0 0 0 0 0 0 х х х х
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 x x x x x 0 0 0
0 0 0 0 0 x x x x 0 x 0 0 0

"0" means empty element
"x" means non-empty element

A man easily finds 3 subarrays here, but I wonder how to find them using a program. The difficulty is that the subarrays are not full, i.e. there are empty elements. Is there an algorithm to cut the initial array into subarrays or maybe anyone can help me?

P.S. The subarrays' coordinates (x1,y1,x2,y2): {2,1,8,5} {11,2,14,7} {6,9,11,10}. Also highlighted by bold

Was it helpful?

Solution

You can solve this problem in O(NM) time by using a union find data structure. Follow these steps:

Step1: Generate continuous horizontal lines in the matrix for adjacent filled elements. And label these lines with different numbers.

For this altered matrix

0 х х х 0 х х х 0 0 0 0 0 0
0 x х х х x х х 0 0 0 х х х
0 0 х х х х 0 0 0 0 0 х х х
0 0 0 0 0 0 0 0 0 0 х х х х
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 x x x x x 0 0 0
0 0 0 0 0 x x x x 0 0 0 0 0

The different lines of the matrix will become:

0 1 1 1 0 2 2 2 0 0 0 0 0 0
0 3 3 3 3 3 3 3 0 0 0 4 4 4
0 0 5 5 5 5 0 0 0 0 0 6 6 6
0 0 0 0 0 0 0 0 0 0 7 7 7 7
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 8 8 8 8 8 0 0 0
0 0 0 0 0 9 9 9 9 0 0 0 0 0

Step2: Now initialize a Union Find data structure with all the different lines as separate clusters/sets in the data structure.

Step3: Start iterating over all the columns of the matrix. Whenever you encounter the beginning of a new line, check if there is some line exactly above or below that line. If there exists such a line above or below, then union the two sets that correspond to the adjacent lines in the data-structure.

Step4: Now that you have all the different clusters of lines at the end of the algorithm, you can easily calculate the boundaries of the clusters. And hence you have all the different sub arrays.

Running time: O(NM) considering that we use a weighted quick union-find data structure with path compression providing O(α(n)) time (where α is the inverse Ackermann function) for union operations. And since the inverse ackermann function grows very slowly, I am treating it as a constant.

Edit: The algorithm described will fail for this case (as told by user beaker in the comments)

1 1 1 1 1
2 0 0 0 3
0 0 4 0 0

To handle these types of cases, apart from the mentioned data structures, we will maintain an array that will contain the occupied rectangles of each row. So when we merge line 2 with line 1, the value for the 2nd row will become 0...4 indicating that there is a rectangle that covers indices 0 to 4 of the second row. Now when line 4 is considered in step 3 it will check if the points above or below are covered by some rectangle or not. And hence it will merge line 4's cluster with line 2's cluster.

The other parts of the algorithm will remain the same except that we now check if the adjacent points are covered by some rectangle or not.

The extra information that we are maintaing for all the rows will increase the running time. Whenever we consider the matrix column wise in Step3, we may have to do O(n) extra work. So the running time rises to O(N^2 * M)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top