Using numpy to build an array of all combinations of two arrays

https://stackoverflow.com/questions/1208118

05-07-2019

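Question

I'm trying to run over the parameter space of a 6-parameter function to study its numerical behavior before doing anything complex with it, so I'm looking for an efficient way to do this.

My function returns a float value given a 6-dim numpy array as input. What I tried initially was this:

First, I created a function that takes 2 arrays and generates an array with all combinations of values from the two arrays: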

from numpy import *
def comb(a,b):
    c = []
    for i in a:
        for j in b:
            c.append(r_[i,j])
    return c

Then I used reduce() to apply that to m copies of the same array:

def combs(a,m):
    return reduce(comb,[a]*m)

And then I evaluate my function like this:

values = combs(np.arange(0,1,0.1),6)
for val in values:
    print F(val)

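This works, but it's way too slow. I know the parameter space is huge, but this shouldn't be so slow. I have only sampled 10⁶ (a million) points in this example, and it took more than 15 seconds just to create the array values.

Do you know a more efficient way of doing this with numpy?

I can modify the way the function F takes its arguments if necessary.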

Solution

In newer versions of numpy (>1.8.x), numpy.meshgrid() provides a much faster implementation:

@pv's solution:

In [113]:

%timeit cartesian(([1, 2, 3], [4, 5], [6, 7]))
10000 loops, best of 3: 135 µs per loop
In [114]:

cartesian(([1, 2, 3], [4, 5], [6, 7]))

Out[114]:
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])

numpy.meshgrid() used to be 2D only; now it supports ND. In this case, 3D:

In [115]:

%timeit np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)
10000 loops, best of 3: 74.1 µs per loop
In [116]:

np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)

Out[116]:
array([[1, 4, 6],
       [1, 5, 6],
       [2, 4, 6],
       [2, 5, 6],
       [3, 4, 6],
       [3, 5, 6],
       [1, 4, 7],
       [1, 5, 7],
       [2, 4, 7],
       [2, 5, 7],
       [3, 4, 7],
       [3, 5, 7]])

Note that the order of the final result is slightly different.
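
If the row order matters, one option is to pass indexing='ij' to meshgrid and stack the grids along a new last axis, which should reproduce the row order of cartesian() above. This is only a minimal sketch; the helper name cartesian_ij is made up for illustration:

```python
import itertools
import numpy as np

def cartesian_ij(*arrays):
    # indexing='ij' keeps the axes in argument order, so after stacking
    # along a new last axis a C-order reshape varies the last column fastest.
    grids = np.meshgrid(*arrays, indexing='ij')
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))

points = cartesian_ij([1, 2, 3], [4, 5], [6, 7])
# Compare against the rows itertools.product yields, in the same order.
expected = np.array(list(itertools.product([1, 2, 3], [4, 5], [6, 7])))
print(np.array_equal(points, expected))
```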

Other tips

Here's a pure-numpy implementation. It's ca. 5× faster than using itertools.


import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = n // arrays[0].size  # integer division, so m can be used as a slice index
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m,1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m,1:] = out[0:m,1:]
    return out

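itertools.combinations is in general the fastest way to get combinations from a Python container (if you do in fact want combinations, i.e., arrangements without repetition and independent of order; that's not what your code appears to be doing, but I can't tell whether that's because your code is buggy or because you're using the wrong terminology).

If you want something other than combinations, perhaps other iterators in itertools, such as product or permutations, might serve you better. For example, it looks like your code is roughly the same as: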

for val in itertools.product(np.arange(0, 1, 0.1), repeat=6):
    print F(val)

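All of these iterators yield tuples, not lists or numpy arrays, so if your F is picky about getting specifically a numpy array, you'll have to accept the extra overhead of constructing one, or clearing and refilling one, at each step.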
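
To illustrate that overhead, here is a rough sketch that converts each tuple to an array before calling the function. F below is only a stand-in (a plain sum), since the real function is not shown in the question, and a coarser grid is used to keep the run short:

```python
import itertools
import numpy as np

def F(x):
    # Placeholder for the real 6-parameter function (assumption).
    return float(np.sum(x))

axis = np.arange(0, 1, 0.25)  # 4 samples per parameter
# Each tuple is converted to a numpy array before F sees it.
results = [F(np.asarray(val)) for val in itertools.product(axis, repeat=6)]
print(len(results))  # 4**6 = 4096 evaluations
```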

The following numpy implementation should be approx. 2x the speed of the given answer:

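def cartesian2(arrays):
    arrays = [np.asarray(a) for a in arrays]
    shape = tuple(len(x) for x in arrays)

    # One row of integer indices per point of the product.
    ix = np.indices(shape, dtype=int)
    ix = ix.reshape(len(arrays), -1).T

    # Fill a separate output array so that non-integer inputs are not
    # truncated by the integer index array.
    out = np.empty(ix.shape, dtype=np.result_type(*arrays))
    for n, arr in enumerate(arrays):
        out[:, n] = arr[ix[:, n]]

    return out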

It looks like you want a grid to evaluate your function, in which case you can use numpy.ogrid (open) or numpy.mgrid (fleshed out):

import numpy
my_grid = numpy.mgrid[[slice(0,1,0.1)]*6]
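
As a sketch of how such a grid might be used: mgrid returns an array with one leading axis per parameter, which can be flattened into a list of points, while ogrid returns broadcastable axes that suit functions written with array arithmetic. The function F here is a hypothetical stand-in, and three parameters are used instead of six to keep things small:

```python
import numpy as np

# Dense grid: shape (3, n, n, n); the first axis indexes the parameter.
grid = np.mgrid[0:1:0.25, 0:1:0.25, 0:1:0.25]
points = grid.reshape(3, -1).T  # one row per grid point

# Open grid: three arrays shaped (n, 1, 1), (1, n, 1) and (1, 1, n).
x, y, z = np.ogrid[0:1:0.25, 0:1:0.25, 0:1:0.25]

def F(x, y, z):
    # Placeholder function (assumption); broadcasting fills the full grid.
    return x + y * z

values = F(x, y, z)
print(points.shape, values.shape)
```

The open grid avoids storing every coordinate explicitly, which may matter for a 6-parameter space.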

You can do something like this:

import numpy as np

def cartesian_coord(*arrays):
    grid = np.meshgrid(*arrays)        
    coord_list = [entry.ravel() for entry in grid]
    points = np.vstack(coord_list).T
    return points

a = np.arange(4)  # fake data
print(cartesian_coord(*6*[a]))

which gives

array([[0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 2],
   ..., 
   [3, 3, 3, 3, 3, 1],
   [3, 3, 3, 3, 3, 2],
   [3, 3, 3, 3, 3, 3]])

You can use np.array(list(itertools.product(a, b))). The list() call is needed because np.array does not consume an iterator.
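
One caveat is that itertools.product returns an iterator, so it has to be materialised (for example with list) before np.array can build a 2-D array from it. A small sketch with made-up inputs a and b:

```python
import itertools
import numpy as np

a = [1, 2, 3]
b = [4, 5]

# Passing the iterator directly yields a 0-d object array, not a 2-D array.
wrapped = np.array(itertools.product(a, b))

# Materialising it first gives the expected (len(a) * len(b), 2) array.
pairs = np.array(list(itertools.product(a, b)))
print(wrapped.shape, pairs.shape)
```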

Here's yet another way, using pure NumPy, with no recursion, no list comprehension, and no explicit for loops. It's about 20% slower than the original answer, and it's based on np.meshgrid.

def cartesian(*arrays):
    mesh = np.meshgrid(*arrays)  # standard numpy meshgrid
    dim = len(mesh)  # number of dimensions
    elements = mesh[0].size  # number of elements, any index will do
    flat = np.concatenate(mesh).ravel()  # flatten the whole meshgrid
    reshape = np.reshape(flat, (dim, elements)).T  # reshape and transpose
    return reshape

For example,

x = np.arange(3)
a = cartesian(x, x, x, x, x)
print(a)

gives

[[0 0 0 0 0]
 [0 0 0 0 1]
 [0 0 0 0 2]
 ..., 
 [2 2 2 2 0]
 [2 2 2 2 1]
 [2 2 2 2 2]]

For a pure numpy implementation of the Cartesian product of 1D arrays (or flat Python lists), just use meshgrid(), roll the axes with transpose(), and reshape to the desired output:

import numpy as np

def cartprod(*arrays):
    N = len(arrays)
    return np.transpose(np.meshgrid(*arrays, indexing='ij'),
                        np.roll(np.arange(N + 1), -1)).reshape(-1, N)

This follows the convention of the last axis changing fastest ("C style" or "row-major").

In [88]: cartprod([1,2,3], [4,8], [100, 200, 300, 400], [-5, -4])
Out[88]: 
array([[  1,   4, 100,  -5],
       [  1,   4, 100,  -4],
       [  1,   4, 200,  -5],
       [  1,   4, 200,  -4],
       [  1,   4, 300,  -5],
       [  1,   4, 300,  -4],
       [  1,   4, 400,  -5],
       [  1,   4, 400,  -4],
       [  1,   8, 100,  -5],
       [  1,   8, 100,  -4],
       [  1,   8, 200,  -5],
       [  1,   8, 200,  -4],
       [  1,   8, 300,  -5],
       [  1,   8, 300,  -4],
       [  1,   8, 400,  -5],
       [  1,   8, 400,  -4],
       [  2,   4, 100,  -5],
       [  2,   4, 100,  -4],
       [  2,   4, 200,  -5],
       [  2,   4, 200,  -4],
       [  2,   4, 300,  -5],
       [  2,   4, 300,  -4],
       [  2,   4, 400,  -5],
       [  2,   4, 400,  -4],
       [  2,   8, 100,  -5],
       [  2,   8, 100,  -4],
       [  2,   8, 200,  -5],
       [  2,   8, 200,  -4],
       [  2,   8, 300,  -5],
       [  2,   8, 300,  -4],
       [  2,   8, 400,  -5],
       [  2,   8, 400,  -4],
       [  3,   4, 100,  -5],
       [  3,   4, 100,  -4],
       [  3,   4, 200,  -5],
       [  3,   4, 200,  -4],
       [  3,   4, 300,  -5],
       [  3,   4, 300,  -4],
       [  3,   4, 400,  -5],
       [  3,   4, 400,  -4],
       [  3,   8, 100,  -5],
       [  3,   8, 100,  -4],
       [  3,   8, 200,  -5],
       [  3,   8, 200,  -4],
       [  3,   8, 300,  -5],
       [  3,   8, 300,  -4],
       [  3,   8, 400,  -5],
       [  3,   8, 400,  -4]])

If you want the first axis to change fastest ("Fortran style" or "column-major"), just change the order parameter of reshape() like this: reshape((-1, N), order='F')
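
As a sketch, a column-major variant might look like this. The name cartprod_f is hypothetical, and the function is restated with explicit np. prefixes so the snippet runs on its own:

```python
import numpy as np

def cartprod_f(*arrays):
    N = len(arrays)
    grid = np.transpose(np.meshgrid(*arrays, indexing='ij'),
                        np.roll(np.arange(N + 1), -1))
    # order='F' makes the first column change fastest.
    return grid.reshape((-1, N), order='F')

print(cartprod_f([1, 2, 3], [4, 8]))
```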

License: CC-BY-SA with attribution

Not affiliated with StackOverflow