Using numpy to build an array of all combinations of two arrays

https://stackoverflow.com/questions/1208118

05-07-2019

Question

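I'm trying to run over the parameter space of a 6-parameter function to study its numerical behavior before trying to do anything complex with it, so I'm looking for an efficient way to do this.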

My function takes a 6-dim numpy array of float values as input. What I tried to do initially was this:

First, I created a function that takes 2 arrays and generates an array with all combinations of values from the two arrays:

import numpy as np
from functools import reduce  # reduce is not a builtin in Python 3

def comb(a, b):
    c = []
    for i in a:
        for j in b:
            c.append(np.r_[i, j])
    return c

Then I used reduce() to apply that to m copies of the same array:

def combs(a,m):
    return reduce(comb,[a]*m)

And then I evaluate my function like this:

values = combs(np.arange(0,1,0.1),6)
for val in values:
    print(F(val))

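This works, but it's way too slow. I know the parameter space is huge, but this shouldn't be so slow. I have only sampled 10⁶ (a million) points in this example, and it took more than 15 seconds just to create the array values.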

Do you know a more efficient way of doing this with numpy?

I can modify the way the function F takes its arguments if necessary.

Solution

In newer versions of numpy (>1.8.x), numpy.meshgrid() provides a much faster implementation:

@pv's solution:

In [113]:

%timeit cartesian(([1, 2, 3], [4, 5], [6, 7]))
10000 loops, best of 3: 135 µs per loop
In [114]:

cartesian(([1, 2, 3], [4, 5], [6, 7]))

Out[114]:
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])

numpy.meshgrid() used to be 2D only; it is now capable of ND. In this case, 3D:

In [115]:

%timeit np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)
10000 loops, best of 3: 74.1 µs per loop
In [116]:

np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)

Out[116]:
array([[1, 4, 6],
       [1, 5, 6],
       [2, 4, 6],
       [2, 5, 6],
       [3, 4, 6],
       [3, 5, 6],
       [1, 4, 7],
       [1, 5, 7],
       [2, 4, 7],
       [2, 5, 7],
       [3, 4, 7],
       [3, 5, 7]])

Note that the order of the final result is slightly different.
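
If the row order matters, one option is to pass indexing='ij' to meshgrid and stack the grids along a new last axis. The sketch below uses a helper name, cartesian_ij, that is made up for illustration; under that assumption it should produce rows in the same order as the cartesian() call timed above, with the last column varying fastest.

```python
import numpy as np

def cartesian_ij(*arrays):
    # indexing='ij' keeps the grid axes in the same order as the inputs
    grids = np.meshgrid(*arrays, indexing='ij')
    # Stack along a new last axis, then flatten to one row per combination
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))

result = cartesian_ij([1, 2, 3], [4, 5], [6, 7])
print(result[:4])
```

No timing is claimed for this variant; it is only meant to show where the ordering difference comes from.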

Other tips

Here's a pure-numpy implementation. It's about 5× faster than using itertools.


import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = n // arrays[0].size  # integer division so m can be used as a slice bound
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m,1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m,1:] = out[0:m,1:]
    return out

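itertools.combinations is in general the fastest way to get combinations from a Python container (if you do in fact want combinations, i.e., arrangements without repetitions and independent of order; that's not what your code appears to be doing, but I can't tell whether that's because your code is buggy or because you're using the wrong terminology).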

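If you want something different from combinations, other iterators in itertools, such as product or permutations, might serve you better. For example, it looks like your code is roughly the same as: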

for val in itertools.product(np.arange(0, 1, 0.1), repeat=6):
    print(F(val))

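All of these iterators yield tuples, not lists or numpy arrays, so if your F is picky about getting specifically a numpy array, you'll have to accept the extra overhead of constructing one, or clearing and re-filling one, at each step.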
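
To illustrate the conversion step mentioned above, here is a small sketch. The function F below is a stand-in invented for this example (the real F is not given in the question), and a coarse 3-point grid is assumed so that the loop finishes quickly.

```python
import itertools
import numpy as np

def F(x):
    # Hypothetical placeholder for the real 6-parameter function
    return float(np.sum(x ** 2))

grid = np.linspace(0.0, 1.0, 3)  # 3 points per axis, so 3**6 = 729 combinations

results = []
for val in itertools.product(grid, repeat=6):
    # product yields plain tuples, so convert each one before calling F
    results.append(F(np.asarray(val)))

print(len(results))  # 729
```

The np.asarray call inside the loop is the per-step overhead the answer refers to.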

The following numpy implementation should be approx. 2x the speed of the given answer:

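def cartesian2(arrays):
    arrays = [np.asarray(a) for a in arrays]
    shape = tuple(len(x) for x in arrays)

    # Integer index grid, one row per combination
    ix = np.indices(shape, dtype=int)
    ix = ix.reshape(len(arrays), -1).T

    # Fill a separate output array so non-integer inputs are not truncated
    out = np.empty(ix.shape, dtype=np.result_type(*arrays))
    for n, arr in enumerate(arrays):
        out[:, n] = arr[ix[:, n]]

    return out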

It looks like you want a grid to evaluate your function, in which case you can use numpy.ogrid (open) or numpy.mgrid (fleshed out):

import numpy
my_grid = numpy.mgrid[tuple([slice(0,1,0.1)]*6)]
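
As a sketch of how such a grid could be turned into a list of points (assuming F can be adapted to take one row of a 2-D array at a time), the result of mgrid can be reshaped so that each row holds one set of 6 parameters:

```python
import numpy as np

# mgrid returns an array of shape (6, 10, 10, 10, 10, 10, 10)
my_grid = np.mgrid[tuple([slice(0, 1, 0.1)] * 6)]

# Flatten the six grid axes and transpose: one row per grid point
points = my_grid.reshape(6, -1).T
print(points.shape)
```

The full grid holds a million rows of six floats, so it needs roughly 48 MB of memory.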

You can do something like this:

import numpy as np

def cartesian_coord(*arrays):
    grid = np.meshgrid(*arrays)        
    coord_list = [entry.ravel() for entry in grid]
    points = np.vstack(coord_list).T
    return points

a = np.arange(4)  # fake data
print(cartesian_coord(*6*[a]))

which gives

array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 2],
       ...,
       [3, 3, 3, 3, 3, 1],
       [3, 3, 3, 3, 3, 2],
       [3, 3, 3, 3, 3, 3]])

You can use np.array(list(itertools.product(a, b))). The list() call is needed because np.array does not consume an iterator.
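
A minimal sketch of this approach with two small made-up lists. Note that np.array does not consume an iterator directly, so the product is wrapped in list() first:

```python
import itertools
import numpy as np

a = [1, 2, 3]  # illustrative data
b = [4, 5]

# Materialize the iterator, then convert to a 2-D array of pairs
pairs = np.array(list(itertools.product(a, b)))
print(pairs.shape)  # (6, 2)
```

Building the intermediate list of tuples costs time and memory, so this is convenient for small inputs rather than a fast option for large grids.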

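Here's yet another way, using pure NumPy, no recursion, no list comprehension, and no explicit for loops. It's about 20% slower than the original answer, and it's based on np.meshgrid.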

def cartesian(*arrays):
    mesh = np.meshgrid(*arrays)  # standard numpy meshgrid
    dim = len(mesh)  # number of dimensions
    elements = mesh[0].size  # number of elements, any index will do
    flat = np.concatenate(mesh).ravel()  # flatten the whole meshgrid
    reshape = np.reshape(flat, (dim, elements)).T  # reshape and transpose
    return reshape

For example,

x = np.arange(3)
a = cartesian(x, x, x, x, x)
print(a)

gives

[[0 0 0 0 0]
 [0 0 0 0 1]
 [0 0 0 0 2]
 ..., 
 [2 2 2 2 0]
 [2 2 2 2 1]
 [2 2 2 2 2]]

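For a pure numpy implementation of the Cartesian product of 1D arrays (or flat Python lists), just use meshgrid(), roll the axes with transpose(), and reshape to the desired output: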

def cartprod(*arrays):
    N = len(arrays)
    return np.transpose(np.meshgrid(*arrays, indexing='ij'),
                        np.roll(np.arange(N + 1), -1)).reshape(-1, N)

Note that this follows the convention of the last axis changing fastest ("C style" or "row-major").

In [88]: cartprod([1,2,3], [4,8], [100, 200, 300, 400], [-5, -4])
Out[88]: 
array([[  1,   4, 100,  -5],
       [  1,   4, 100,  -4],
       [  1,   4, 200,  -5],
       [  1,   4, 200,  -4],
       [  1,   4, 300,  -5],
       [  1,   4, 300,  -4],
       [  1,   4, 400,  -5],
       [  1,   4, 400,  -4],
       [  1,   8, 100,  -5],
       [  1,   8, 100,  -4],
       [  1,   8, 200,  -5],
       [  1,   8, 200,  -4],
       [  1,   8, 300,  -5],
       [  1,   8, 300,  -4],
       [  1,   8, 400,  -5],
       [  1,   8, 400,  -4],
       [  2,   4, 100,  -5],
       [  2,   4, 100,  -4],
       [  2,   4, 200,  -5],
       [  2,   4, 200,  -4],
       [  2,   4, 300,  -5],
       [  2,   4, 300,  -4],
       [  2,   4, 400,  -5],
       [  2,   4, 400,  -4],
       [  2,   8, 100,  -5],
       [  2,   8, 100,  -4],
       [  2,   8, 200,  -5],
       [  2,   8, 200,  -4],
       [  2,   8, 300,  -5],
       [  2,   8, 300,  -4],
       [  2,   8, 400,  -5],
       [  2,   8, 400,  -4],
       [  3,   4, 100,  -5],
       [  3,   4, 100,  -4],
       [  3,   4, 200,  -5],
       [  3,   4, 200,  -4],
       [  3,   4, 300,  -5],
       [  3,   4, 300,  -4],
       [  3,   4, 400,  -5],
       [  3,   4, 400,  -4],
       [  3,   8, 100,  -5],
       [  3,   8, 100,  -4],
       [  3,   8, 200,  -5],
       [  3,   8, 200,  -4],
       [  3,   8, 300,  -5],
       [  3,   8, 300,  -4],
       [  3,   8, 400,  -5],
       [  3,   8, 400,  -4]])

If you want the first axis to change fastest ("FORTRAN style" or "column-major"), just change the order parameter of reshape() like this: reshape((-1, N), order='F')
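
For illustration, a column-major variant might look like the sketch below. The name cartprod_f is chosen for this example and is not part of the answer above, and np.moveaxis is used here in place of the transpose/roll combination to move the parameter axis to the end.

```python
import numpy as np

def cartprod_f(*arrays):
    N = len(arrays)
    # Shape (N, n1, ..., nN): one coordinate grid per input array
    grid = np.array(np.meshgrid(*arrays, indexing='ij'))
    # Move the parameter axis to the end: shape (n1, ..., nN, N)
    grid = np.moveaxis(grid, 0, -1)
    # Flatten in Fortran order so the first column varies fastest
    return grid.reshape((-1, N), order='F')

result = cartprod_f([1, 2, 3], [4, 8])
print(result)
```

With these inputs the first column cycles through 1, 2, 3 before the second column moves from 4 to 8.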

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow