Question

I have a large array of numbers, about 150 MB, and I need to find occurrences of a sequence of consecutive numbers in it; a sequence can be anywhere from 3 to 160 numbers long. To keep things simple, I decided that each thread should start at the cell matching its own index, i.e. ThreadID = CellID.

So thread0 looks at cell0. If the number in cell0 matches the first number of my sequence, thread0 moves on to cell1, and so on. If a number does not match, the thread stops. I do this with my 20000 threads.

That works fine, but I want to know how to reuse threads, because the array I'm searching is much bigger than the number of threads.

Should I divide my array into smaller arrays, load them into shared memory, and loop over the smaller arrays (padding the last one if needed)? Or should I keep the big array in global memory and have each thread start at ThreadID = CellID, then CellID + 20000, and so on? Or is there a better way to traverse it?
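The second option can be made concrete with a little index arithmetic. Assume (for illustration) that the array holds $N$ elements and the grid has $T$ threads in total. Thread $t$, with $0 \le t < T$, would check the start positions

$$
i = t + kT, \qquad k = 0, 1, 2, \dots, \qquad i < N,
$$

so it runs $\lceil (N - t)/T \rceil$ iterations, and every position $i$ is visited exactly once, by thread $t = i \bmod T$ at step $k = \lfloor i/T \rfloor$. As a worked example, assuming 4-byte integers and 150 MB $= 150 \cdot 2^{20}$ bytes, $N = 39{,}321{,}600$; with $T = 20000$, threads 0 to 1599 run $\lceil 1966.08 \rceil = 1967$ iterations and threads 1600 to 19999 run 1966.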

To clarify: at the moment I use 20000 threads, one array of numbers in global memory (150 MB), and a sequence of numbers in shared memory (e.g. 1,2,3,4,5), stored as an array. Thread0 starts at cell0 and checks whether cell0 in global memory equals cell0 in shared memory; if so, thread0 compares cell1 in global memory with cell1 in shared memory, and so on until there is a full match.

If the numbers in the two cells (global and shared memory) are not equal, that thread is simply discarded. Since most of the numbers in the global memory array will not match the first number of my sequence, I thought it was a good idea to use one thread per starting cell (thread N starts at cell N in global memory and compares against the sequence in shared memory) and let the threads overlap. This technique also gives coalesced memory access on the first comparison, since threads 0 to 19999 all access contiguous memory.
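A minimal sketch of the approach as described, assuming 32-bit `int` data and a search sequence small enough for shared memory. The names `match_once`, `count_matches_once` and the one-flag-per-position output are hypothetical, not from the question. It shows the limitation being asked about: without thread reuse, only the first `blocks * threads` start positions are ever checked.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical sketch of the question's approach: one thread per candidate
// start position (ThreadID = CellID) and no thread reuse.
__global__ void match_once(const int *data, int n, const int *seq, int seq_len,
                           int *match_flags)
{
    extern __shared__ int s_seq[];
    // Cooperatively copy the search sequence into shared memory.
    for (int j = threadIdx.x; j < seq_len; j += blockDim.x)
        s_seq[j] = seq[j];
    __syncthreads();

    int start = threadIdx.x + blockIdx.x * blockDim.x;  // ThreadID = CellID
    if (start + seq_len > n) return;  // window would run past the end of the array

    int j = 0;
    while (j < seq_len && data[start + j] == s_seq[j])  // loop ends at the first mismatch
        ++j;
    match_flags[start] = (j == seq_len);
}

// Hypothetical host helper: returns the number of matching start positions.
// Error checking is omitted for brevity.
int count_matches_once(const int *h_data, int n, const int *h_seq, int seq_len,
                       int blocks, int threads)
{
    int *d_data, *d_seq, *d_flags;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_seq, seq_len * sizeof(int));
    cudaMalloc(&d_flags, n * sizeof(int));
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_seq, h_seq, seq_len * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_flags, 0, n * sizeof(int));  // positions never checked stay 0

    match_once<<<blocks, threads, seq_len * sizeof(int)>>>(d_data, n, d_seq, seq_len, d_flags);

    int *h_flags = new int[n];
    cudaMemcpy(h_flags, d_flags, n * sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    int count = 0;
    for (int i = 0; i < n; ++i) count += h_flags[i];
    delete[] h_flags;
    cudaFree(d_data);
    cudaFree(d_seq);
    cudaFree(d_flags);
    return count;
}
```

For example, with data `{1,2,3,0,1,2,3,0,1,2,3}` and sequence `{1,2,3}`, a launch of 1 block of 4 threads finds only the match at position 0, while 1 block of 32 threads finds all three.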

But what I would like to know is the best way to reuse the threads that have been discarded, or that have finished matching, so that I can search the entire 150 MB array instead of only 20000 numbers + (sequence length - 1).


Solution

"what would be the best way to re-use the threads" that have been discarded, or the threads that finished to match. To be able to match the entire array of 150MB instead of simply match (20000 numbers + (length of sequence -1)).

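You can reuse threads in a fashion similar to the canonical CUDA reduction sample (using its final implementation as a reference): each thread starts at its global index and then advances by the total number of threads in the grid, a pattern usually called a grid-stride loop.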

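int idx = threadIdx.x + blockDim.x * blockIdx.x;  // this thread's first start position
while (idx < DSIZE) {                             // DSIZE: total number of elements
  perform_sequence_matching(idx);                 // check for a match starting at idx
  idx += gridDim.x * blockDim.x;                  // advance by the grid's total thread count
}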

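In this way, a grid with an arbitrary number of threads can cover an arbitrary problem size (DSIZE): each thread checks positions idx, idx + (total threads), idx + 2 × (total threads), and so on, until it passes the end of the array.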
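A fuller sketch of the same grid-stride loop applied to the problem in the question, assuming 32-bit `int` data and a search sequence that fits in shared memory (160 ints is only 640 bytes). The names `match_all`, `count_matches` and the one-flag-per-position output are hypothetical; `perform_sequence_matching` from the snippet above is written out inline.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical grid-stride kernel: a fixed-size grid scans an array of any length.
__global__ void match_all(const int *data, int n, const int *seq, int seq_len,
                          int *match_flags)
{
    extern __shared__ int s_seq[];
    // Load the search sequence into shared memory once per block.
    for (int j = threadIdx.x; j < seq_len; j += blockDim.x)
        s_seq[j] = seq[j];
    __syncthreads();

    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    while (idx < n) {
        int j = 0;
        // Compare only if the whole window fits inside the array (no padding needed).
        if (idx + seq_len <= n)
            while (j < seq_len && data[idx + j] == s_seq[j])
                ++j;
        match_flags[idx] = (j == seq_len);  // every idx < n gets written
        idx += gridDim.x * blockDim.x;      // jump by the total number of threads
    }
}

// Hypothetical host helper: returns the number of matching start positions.
// Error checking is omitted for brevity.
int count_matches(const int *h_data, int n, const int *h_seq, int seq_len,
                  int blocks, int threads)
{
    int *d_data, *d_seq, *d_flags;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_seq, seq_len * sizeof(int));
    cudaMalloc(&d_flags, n * sizeof(int));
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_seq, h_seq, seq_len * sizeof(int), cudaMemcpyHostToDevice);

    match_all<<<blocks, threads, seq_len * sizeof(int)>>>(d_data, n, d_seq, seq_len, d_flags);

    int *h_flags = new int[n];
    cudaMemcpy(h_flags, d_flags, n * sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    int count = 0;
    for (int i = 0; i < n; ++i) count += h_flags[i];
    delete[] h_flags;
    cudaFree(d_data);
    cudaFree(d_seq);
    cudaFree(d_flags);
    return count;
}
```

With the same data `{1,2,3,0,1,2,3,0,1,2,3}` and sequence `{1,2,3}`, even 1 block of 4 threads now finds all three matches, e.g. `assert(count_matches(data, 11, seq, 3, 1, 4) == 3);`. Because consecutive threads start at consecutive idx values on every pass, the first comparison of each pass still reads contiguous global memory, so the coalescing mentioned in the question is kept, and the bounds check replaces padding the last chunk.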

Licensed under: CC-BY-SA with attribution