NumPy: comparing elements in two arrays

https://stackoverflow.com/questions/1613249

python
numpy

06-07-2019

Question

Has anyone come across this problem? Say you have two arrays like the following

a = array([1,2,3,4,5,6])
b = array([1,4,5])

Is there a way to find which elements of a exist in b? For example,

c = a == b # Wishful example here
print c
array([1,4,5])
# Or even better
array([True, False, False, True, True, False])

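I am trying to avoid loops, since they would take ages with millions of elements. Any ideas?

Cheers

Solution

Actually, there's an even simpler solution than any of these: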

import numpy as np

a = np.array([1,2,3,4,5,6])
b = np.array([1,4,5])

# Newer NumPy releases deprecate np.in1d in favor of np.isin(a, b)
c = np.in1d(a,b)

The result c is:

array([ True, False, False,  True,  True, False], dtype=bool)
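
For readers on a recent NumPy, here is a minimal sketch of the same lookup using np.isin, which the NumPy documentation recommends over np.in1d for new code. The names mask, present and absent are illustrative only and do not come from the original answer.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([1, 4, 5])

# Boolean membership mask with the same shape as a
mask = np.isin(a, b)
print(mask)        # [ True False False  True  True False]

# The mask doubles as an index to pull out either group of values
present = a[mask]  # elements of a that occur in b
absent = a[~mask]  # elements of a that do not occur in b
print(present)     # [1 4 5]
print(absent)      # [2 3 6]
```

The mask answers the "even better" form asked for in the question, and indexing with it recovers the first form.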

Other tips

Use np.intersect1d.

#!/usr/bin/env python
import numpy as np
a = np.array([1,2,3,4,5,6])
b = np.array([1,4,5])
c=np.intersect1d(a,b)
print(c)
# [1 4 5]

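Note that np.intersect1d gives the wrong answer if a or b have non-unique elements. In that case use np.intersect1d_nu. (This describes the NumPy of the time. Current releases deduplicate the inputs unless assume_unique=True is passed, and np.intersect1d_nu no longer exists.)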
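
As a quick check of the duplicate case on a current NumPy, a small sketch with made-up inputs that contain repeated values (the arrays below are hypothetical, not from the thread):

```python
import numpy as np

# Hypothetical inputs with repeated values
a = np.array([1, 1, 2, 4, 4, 5])
b = np.array([4, 1, 1, 5, 7])

# With the default assume_unique=False the inputs are deduplicated first,
# so the result is the sorted set of common values
common = np.intersect1d(a, b)
print(common)  # [1 4 5]
```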

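There are also np.setdiff1d, setxor1d, setmember1d and union1d. See the Numpy Example List With Doc. (setmember1d was later removed from NumPy; np.in1d and then np.isin took its place.)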
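
A short sketch of what the other set routines return, assuming a current NumPy. The extra value 9 in b is added here only so that the results differ from one another.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([1, 4, 5, 9])

only_in_a = np.setdiff1d(a, b)  # values in a that are not in b
in_one = np.setxor1d(a, b)      # values in exactly one of the two arrays
either = np.union1d(a, b)       # values in at least one of the two arrays

print(only_in_a)  # [2 3 6]
print(in_one)     # [2 3 6 9]
print(either)     # [1 2 3 4 5 6 9]
```

All three return sorted, deduplicated values rather than a boolean mask, so they fit the first form of output in the question, not the second.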

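Thanks for your reply kaizer.se. It's not quite what I was looking for, but with a suggestion from a friend and what you said I came up with the following.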

import numpy as np

a = np.array([1,4,5]).astype(np.float32)
b = np.arange(10).astype(np.float32)

# Assigning matching values from a in b as np.nan
b[b.searchsorted(a)] = np.nan

# Now generating Boolean arrays
match = np.isnan(b)
nonmatch = ~match

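It's a bit of a tedious process, but it beats writing loops or using weave with loops.

Cheers

Numpy has a set function numpy.setmember1d() that works on sorted and uniqued arrays and returns exactly the boolean array that you want. If the input arrays don't meet those criteria, you need to convert them to the set format first and then invert the transformation on the result.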

import numpy as np
a = np.array([6,1,2,3,4,5,6])
b = np.array([1,4,5])

# convert to the uniqued form
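# (np.unique1d and np.setmember1d are names from old NumPy releases;
#  current releases spell them np.unique and np.isin)
a_set, a_inv = np.unique1d(a, return_inverse=True)
b_set = np.unique1d(b)
# calculate matching elements
matches = np.setmember1d(a_set, b_set)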
# invert the transformation
result = matches[a_inv]
print(result)
# [False  True False False  True  True False]
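
The same unique / match / invert steps can be sketched against a current NumPy, assuming np.unique stands in for unique1d and np.isin with assume_unique=True stands in for setmember1d. This is an adaptation, not the original answer's code.

```python
import numpy as np

a = np.array([6, 1, 2, 3, 4, 5, 6])
b = np.array([1, 4, 5])

# Convert to sorted, unique form; a_inv maps each element of a
# to its position in a_set
a_set, a_inv = np.unique(a, return_inverse=True)
b_set = np.unique(b)

# Membership test on the unique values only
matches = np.isin(a_set, b_set, assume_unique=True)

# Map the per-unique-value answer back onto the original layout of a
result = matches[a_inv]
print(result)  # [False  True False False  True  True False]
```

In practice np.isin(a, b) alone gives the same result; the explicit steps are shown only to mirror the answer.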

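Edit: Unfortunately the setmember1d method in numpy is really inefficient. The searchsorted-and-assign method you proposed works faster, but if you can assign directly you might as well assign directly to the result and avoid a lot of unnecessary copying. Your method will also fail if b contains anything that is not in a. The following corrects those errors: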

# Assumes a is sorted and holds no repeated values
result = np.zeros(a.shape, dtype=bool)
idxs = a.searchsorted(b)
in_range = idxs < a.shape[0]  # Filter out out-of-range values
idxs, b_in = idxs[in_range], b[in_range]
idxs = idxs[a[idxs] == b_in]  # Filter out where there isn't an actual match
result[idxs] = True
print(result)

My benchmarks show this at 91us, versus 6.6ms for your approach and 109ms for numpy setmember1d, on a 1M-element a and a 100-element b.
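
To rerun a comparison of this kind on a current NumPy, a rough timing sketch follows. It assumes np.isin as the stand-in for setmember1d and a sorted, duplicate-free a; the helper names with_isin and with_searchsorted are made up for this example. Timings depend on machine and NumPy version, so no figures are claimed here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = np.arange(1_000_000)                            # sorted, no duplicates
b = rng.choice(2_000_000, size=100, replace=False)  # some values lie outside a


def with_isin():
    return np.isin(a, b)


def with_searchsorted():
    result = np.zeros(a.shape, dtype=bool)
    idxs = a.searchsorted(b)
    in_range = idxs < a.shape[0]              # drop out-of-range positions
    idxs, b_in = idxs[in_range], b[in_range]
    result[idxs[a[idxs] == b_in]] = True      # keep only real matches
    return result


# Both approaches must agree before their timings are worth comparing
assert np.array_equal(with_isin(), with_searchsorted())

for fn in (with_isin, with_searchsorted):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e6:.0f} us per call")
```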

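ebresset, your answer won't work unless a is a subset of b (and a and b are sorted). Otherwise searchsorted will return false indices. I had to do something similar, and combined it with your code: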

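import numpy

# Assume a and b are sorted, and b has a float dtype so it can hold NaN
idxs = numpy.mod(b.searchsorted(a), len(b))  # wrap out-of-range indices back into b
idxs = idxs[b[idxs] == a]                    # keep only positions that really match
b[idxs] = numpy.nan
match = numpy.isnan(b)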
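
A small worked sketch of why the wrap and filter steps matter, using made-up inputs where a contains values that are absent from b. As a variation on the code above, it writes into a separate boolean mask instead of overwriting b with NaN; that choice is an assumption of this example, not part of the answer.

```python
import numpy as np

# Hypothetical sorted inputs: 4.5 and 12.0 do not occur in b
a = np.array([1.0, 4.5, 5.0, 12.0], dtype=np.float32)
b = np.arange(10, dtype=np.float32)

raw = b.searchsorted(a)      # insertion points, not guaranteed matches
print(raw)                   # [ 1  5  5 10]  (10 is out of range for b)

idxs = np.mod(raw, len(b))   # wrap 10 back to 0 so indexing is safe
idxs = idxs[b[idxs] == a]    # drop 4.5 -> 5 and 12.0 -> 0, which are not matches

match = np.zeros(b.shape, dtype=bool)
match[idxs] = True
print(np.flatnonzero(match))  # [1 5]
```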

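Your example implies set-like behavior, caring more about existence in the array than about having the right element in the right place. Numpy treats this differently with its mathematical arrays and matrices: it will only tell you about items at the exact same position. Can you make that work for you?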

>>> import numpy
>>> a = numpy.array([1,2,3])
>>> b = numpy.array([1,3,3])
>>> a == b
array([ True, False,  True], dtype=bool)
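
If position-wise comparison is the starting point, one way to get from there to the membership mask in the question is broadcasting, sketched below. This is an addition rather than something proposed in the thread, and it builds a len(a) by len(b) intermediate array, so it only suits a small b.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([1, 4, 5])

# Compare every element of a against every element of b: shape (6, 3)
pairwise = a[:, None] == b

# An element of a is "in b" if it equals at least one element of b
mask = pairwise.any(axis=1)
print(mask)  # [ True False False  True  True False]
```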

Licensed under: CC-BY-SA with attribution
