Ideas Related to Subset Sum with 2,3 and more integers

https://stackoverflow.com/questions/10964708

13-06-2021
|

Pergunta

I've been struggling with this problem just like everyone else and I'm quite sure there has been more than enough posts to explain this problem. However in terms of understanding it fully, I wanted to share my thoughts and get more efficient solutions from all the great people in here related to Subset Sum problem.

I've searched it over the Internet and there is actually a lot sources but I'm really willing to re-implement an algorithm or finding my own in order to understand fully.

The key thing I'm struggling with is the efficiency considering the set size will be large. (I do not have a limit, just conceptually large). The two phases I'm trying to implement ideas on is finding two numbers that are equal to given integer T, finding three numbers and eventually K numbers. Some ideas I've though;

For the two integer part I'm thing basically sorting the array O(nlogn) and for each element in the array searching for its negative value. (i.e if the array element is 3 searching for -3). Maybe a hash table inclusion could be better, providing a O(1) indexing the element?

For the three or more integers I've found an amazing blog post;http://www.skorks.com/2011/02/algorithms-a-dropbox-challenge-and-dynamic-programming/. However even the author itself states that it is not applicable for large numbers.

So I was for 2 and 3 and more integers what ideas could be applied for the subset problem. I'm struggling with setting up a dynamic programming method that will be efficient for the large inputs as well.

Solução

That blog post you linked to looked pretty great, actually. This is, after all, an NP-complete problem...

But I bet you could speed it up even further. I haven't done any benchmarks, but I'm gonna guess that his use of a matrix is his single biggest time sink. First, it'll take a huge amount of memory for some really trivial inputs (For example: [-1000, 1000] will need 2001 columns! Good grief!), and then you're wasting a ton of cycles scanning through each row looking for "T"s, which are often gonna be pretty sparse.

So instead: Use a "set" data structure. That'll keep space and iteration time to a minimum,* but store values just as well: If it's in the set, it's a "T"; otherwise, it's an "F".

Hope that helps!

*: Of course, "minimum" doesn't necessarily = "small."

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow