如何提高我的算法来查找阵列的最佳平衡的P-Way划分

https://cs.stackexchange.com/questions/125821

29-09-2020
|

题

我有一个 $ n $ 权重 $ w_i $ ，例如 $ w_i={4,5,12,16,3,10,1 \ \} $ ，我需要将此数组划分为 $ P $ 分区使得分区是最佳平衡的，即任何分区的最大权重和尽可能小。幸运的是，这个问题受到无法重新排序的权重的事实。如果分区的数量是三个，则上面的示例将提供最佳分区： $ \ {4,5,12 \}，\ {16 \}，\ {3,10， 1 \} $ 。

我已经找到了高效的食谱（例如分区问题，子集合，最佳分区的书籍章节，分区算法， K-Way数组分区的算法）对于权重的情况下的许多类似问题和/或分区数量在2或3处固定，但是似乎无法完全解决我的问题，其中分区数是任意的。

我已经使用鸿沟和征服算法（在下面的Python中写入）解决了问题，但对于许多分区似乎非常慢（例如，n= 100，p= 8）。所以我认为必须使用动态编程或其他一些聪明的技巧来更好的方式？

有人有任何建议吗？

慢Python划分 - 征管算法：

def findOptimalPartitions(weights, num_partitions):
    if num_partitions == 1:
        # If there is only one partition, it must start at the first index
        # and have a size equal to the sum of all weights.
        return numpy.array([0], dtype=int), sum(weights)

    # Initially we let all partitions start at zero, meaning that all but the
    # last partition gets zero elements, and the last gets them all.
    partition_offsets = numpy.array([0] * num_partitions)
    max_partition_size = sum(weights)

    # We now divide the weigths into two partitions that split at index n.
    # We know that each partition should have at least one element, so there
    # is no point in looping over all elements.
    for n in range(1, len(weights) - num_partitions):
        first_partition_size = sum(weights[:n])

        if first_partition_size > max_partition_size:
            # If the first partition size is larger than the best currently
            # found, there is no point in searching further.
            break

        # The second partition that starts at n we now further split into
        # subpartitions in a recursive manner.
        subpartition_offsets, best_subpartition_size = \
            findOptimalPartitions(weights[n:], num_partitions - 1)

        # If the maximum size of any of the current partitions is smaller
        # than the current best partitioning, we update the best partitions.
        if ((first_partition_size < max_partition_size)
                and (best_subpartition_size < max_partition_size)):
            # The first partition always start at 0. The others start at
            # ones from the subpartition relative to the current index, so
            # add the current index to those.
            partition_offsets[1:] = n + subpartition_offsets
            # Find the maximum partition size.
            max_partition_size = max(first_partition_size, best_subpartition_size)

    return partition_offsets, max_partition_size

编辑：一个琐碎的贪婪算法，其中最后一个分区通常太大。

def greedyPartition(weights, num_partitions):
    target_size = sum(weights) / num_partitions

    partition_offsets = numpy.zeros(num_partitions, dtype=int)
    partition_sizes = numpy.zeros(num_partitions, dtype=int)
    current_divider = 0
    for p in range(0, num_partitions - 1):
        partition_size = 0
        for n in range(current_divider, len(weights)):
            if partition_size + weights[n] > target_size:
                current_divider = n
                partition_offsets[p + 1] = current_divider
                partition_sizes[p] = partition_size
                break

            partition_size += weights[n]
    partition_sizes[-1] = sum(weights) - sum(partition_sizes[:-1])
    max_partition_size = max(partition_sizes)

    return partition_offsets, max_partition_size

解决方案

二进制搜索是一种常规策略，通常可以应用于寻求作为答案的单个优化数字的问题。

let $ a= [a_1，\ cdots，a_n] $ 是给定的权重阵列。假设 $ 1 \ lt p \ lt n $ ;否则，问题变得微不足道。

给定权重 $ w $ 这样 $ 1 \ le w \ le \ sum a_i $ ，我们可以关联分区 $ \ mathcal p（w）$ ， $ a= p_1 \，p_2 \，\ cdots p_m $ 这样的

子阵列 $ p_1 $ 只要它的所有重量的总和都不多于 $ W $ 。
然后子阵列 $ p_2 $ 只要它的所有权重的总和都不多于 $ w $ 。
然后子阵列 $ p_3 $ 只要它的所有权重的总和都不多于 $ w $ 。
等等。
最后，我们留下了非空的子阵列 $ p_m $ ，所有权重的总和不超过 $ w $ 。

注意， $ m= \＃\ mathcal p（w）$ 是一个分区中的最小子阵列数，每个子阵列中的所有权重的总和最多 $ w $ 。我们想找到 $ w $ 这样 $ \＃\ mathcal p（w）= p $ 和 $ \＃\ mathcal p（w + 1）

。由于来自 $ w $ 到 $ \ mathcal p（w）$ 的映射，我们可以使用二进制搜索找到它。

这里是一个算法的概要，它找到了最大的 $ w $ ，使得 $ \ mathcal p（w）= p $ 。假设给定权重是整数;否则，我们需要调整算法一点。。

let $ ma=max \ {a_i \} $ 。如果 $ p \ ge \＃\ mathcal p（ma）$ ，则最小总和是 $ ma $ 。处理这种简单的案例并返回。

let $ low= ma $ 和 $ high=sum a_i $ 。

如果 $ low \ lt高1 $ ， $ mid=（低+高）// 2 $ 和compute $ m= \＃\ mathcal p（mid）$ 。
- 如果 $ m \ le p $ ，较低 $ high $ to $ Mid $ 。
- 如果 $ m \ gt p $ ，请提升 $ low $ 到 $ Mid $ 。
返回此步骤的开始。

let $ h $ be partition $ \ mathcal p（high）$ 。如果 $ \＃h \ not= p $ 则将其中的一些 $ h $ 划分为较小的子阵列所以通过 $ p - \＃（p）$ 来增加子阵列的数量。返回 $ h $ 。

步骤3的循环不变是 $ \＃\ mathcal p（low）> p $ ， $ \＃ \ Mathcal p（high）\ Le P $ 和 $ low \ le high-1 $ 。当循环结束时，即，当 $ low== high-1 $ 时，我们仍然必须有 $ \＃\ mathcal p（低）> p $ 和 $ \＃\ mathcal p（high）\ le p $ 。这意味着， $ high $ 是大小分区中的任何子阵列中的最大权重的最小值，不超过 $ p $ ，这在很大程度上解释了这种算法是正确的。

此算法在 $ O中运行（n \ log（\ sum a_i））$ 时间，因为大多数时间都花在计算 $ \ mathcal p（w）$ ，它需要 $ o（n）$ 时间来计算 $ \ mathcal p（w）$ 对于任何给定权重 $ w $ 。

锻炼（双重问题）。给定一个 $ n $ 正整数= span class=“math-container”> $ w_i $ 和一个整数 $ P $ ，将数组分成 $ p $ 零件，使得部件最佳地平衡

D，即任何部分的最小，任何部分的权重和都是如大。

许可以下： CC-BY-SA 和归因

不隶属于 cs.stackexchange