如何从列表列表中制作平面列表

https://stackoverflow.com/questions/952914

11-09-2019
|

题

我想知道是否有一种快捷方式可以在Python中从列表列表中创建一个简单的列表。

我可以在 for 循环，但也许有一些很酷的“一行”？我尝试过减少, ，但我收到错误。

代码

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
reduce(lambda x, y: x.extend(y), l)

错误信息

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'extend'

解决方案

鉴于列表l的列表，

flat_list = [item for sublist in l for item in sublist]

这意味着：

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

比迄今发布快捷方式更快。（l是变平的列表。）

下面是相应的功能：

flatten = lambda l: [item for sublist in l for item in sublist]

作为证据，则可以使用timeit模块中的标准库：

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

说明：基于+快捷方式（包括在sum隐含使用）是，必要性，O(L**2)当存在L子列表 - 作为中间结果列表保持越来越长，在每一步新的中间结果列表对象获取分配的，在之前的中间结果中的所有项目必须被复制（以及在末尾添加了几个新的）。因此，为了简单起见，没有一般性的实际损失，说你的我的项目大号子列表每个：第一我项目复制来回L-1次，第二我的项目L-2倍，依此类推;副本的总数为x的对于x从1到L排除余时间的总和，即，I * (L**2)/2。

在列表理解只是生成一个列表，一次，并将其复制在每个项目（从原来的居住地点到结果列表中的）也正好一次。

其他提示

可以使用 itertools.chain() ：

>>> import itertools
>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> merged = list(itertools.chain(*list2d))

或，关于Python> = 2.6，使用 itertools.chain.from_iterable() 不需要拆包列表：

>>> import itertools
>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> merged = list(itertools.chain.from_iterable(list2d))

本方法是可以说比[item for sublist in l for item in sublist]更易读并且似乎更快太：

[me@home]$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99;import itertools' 'list(itertools.chain.from_iterable(l))'
10000 loops, best of 3: 24.2 usec per loop
[me@home]$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 45.2 usec per loop
[me@home]$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 488 usec per loop
[me@home]$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 522 usec per loop
[me@home]$ python --version
Python 2.7.3

<强>从作者注意：这是低效的。但有趣的，因为幺是真棒。这不适合于生产Python代码。

>>> sum(l, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

这只是概括的迭代中第一个参数传递的元素，处理第二个参数为所述和的初始值（如果没有给出，0来代替与该情况下会给出错误）。

由于你正在总结嵌套列表时，实际上得到作为[1,3]+[2,4] sum([[1,3],[2,4]],[])的结果，它等于[1,3,2,4]。

请注意，仅适用于列表的列表。对于名单列表的列表，你需要另一种解决方案。

我测试与 perfplot （我的宠物项目，基本上围绕timeit一个包装）最建议的解决方案之后，发现

functools.reduce(operator.iconcat, a, [])

是最快的溶液。（operator.iadd同样快。）

代码重现情节：

import functools
import itertools
import numpy
import operator
import perfplot


def forfor(a):
    return [item for sublist in a for item in sublist]


def sum_brackets(a):
    return sum(a, [])


def functools_reduce(a):
    return functools.reduce(operator.concat, a)


def functools_reduce_iconcat(a):
    return functools.reduce(operator.iconcat, a, [])


def itertools_chain(a):
    return list(itertools.chain.from_iterable(a))


def numpy_flat(a):
    return list(numpy.array(a).flat)


def numpy_concatenate(a):
    return list(numpy.concatenate(a))


perfplot.show(
    setup=lambda n: [list(range(10))] * n,
    kernels=[
        forfor, sum_brackets, functools_reduce, functools_reduce_iconcat,
        itertools_chain, numpy_flat, numpy_concatenate
        ],
    n_range=[2**k for k in range(16)],
    logx=True,
    logy=True,
    xlabel='num lists'
    )

from functools import reduce #python 3

>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(lambda x,y: x+y,l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

在你的实施例中的extend()方法修改，而不是返回一个有用的值（其x预计）reduce()。

一个更快的方式做reduce版本将是

>>> import operator
>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.concat, l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

这是适用于的通用方法数字, 字符串, 嵌套的 列表和 混合的 容器。

代码

#from typing import Iterable 
from collections import Iterable                            # < py38


def flatten(items):
    """Yield items from any nested iterable; see Reference."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            for sub_x in flatten(x):
                yield sub_x
        else:
            yield x

笔记:

在Python 3中， yield from flatten(x) 可以替代 for sub_x in flatten(x): yield sub_x
在Python 3.8中，抽象基类是搬家了从 collection.abc 到 typing 模块。

演示

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(flatten(lst))                                         # nested lists
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

mixed = [[1, [2]], (3, 4, {5, 6}, 7), 8, "9"]              # numbers, strs, nested & mixed
list(flatten(mixed))
# [1, 2, 3, 4, 5, 6, 7, 8, '9']

参考

该解决方案是根据中的配方修改的 比兹利，D.和B。琼斯.食谱 4.14，Python Cookbook 第三版，O'Reilly Media Inc.加利福尼亚州塞瓦斯托波尔：2013年。
发现一个更早的所以帖子, ，可能是原始演示。

如果你想展平一个不知道嵌套深度的数据结构，你可以使用 iteration_utilities.deepflatten¹

>>> from iteration_utilities import deepflatten

>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(deepflatten(l, depth=1))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> l = [[1, 2, 3], [4, [5, 6]], 7, [8, 9]]
>>> list(deepflatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

它是一个生成器，因此您需要将结果转换为 list 或明确地迭代它。

要仅展平一个级别，并且如果每个项目本身都是可迭代的，您也可以使用 iteration_utilities.flatten 它本身只是一个薄薄的包装纸 itertools.chain.from_iterable:

>>> from iteration_utilities import flatten
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(flatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

只是添加一些计时（基于 Nico Schlömer 的答案，不包括此答案中提供的功能）：

这是一个双对数图，可以容纳大范围的值。对于定性推理：越低越好。

结果表明，如果可迭代对象仅包含几个内部可迭代对象，则 sum 将是最快的，但是对于长迭代只有 itertools.chain.from_iterable, iteration_utilities.deepflatten 或者嵌套理解具有合理的性能 itertools.chain.from_iterable 是最快的（Nico Schlömer 已经注意到）。

from itertools import chain
from functools import reduce
from collections import Iterable  # or from collections.abc import Iterable
import operator
from iteration_utilities import deepflatten

def nested_list_comprehension(lsts):
    return [item for sublist in lsts for item in sublist]

def itertools_chain_from_iterable(lsts):
    return list(chain.from_iterable(lsts))

def pythons_sum(lsts):
    return sum(lsts, [])

def reduce_add(lsts):
    return reduce(lambda x, y: x + y, lsts)

def pylangs_flatten(lsts):
    return list(flatten(lsts))

def flatten(items):
    """Yield items from any nested iterable; see REF."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

def reduce_concat(lsts):
    return reduce(operator.concat, lsts)

def iteration_utilities_deepflatten(lsts):
    return list(deepflatten(lsts, depth=1))


from simple_benchmark import benchmark

b = benchmark(
    [nested_list_comprehension, itertools_chain_from_iterable, pythons_sum, reduce_add,
     pylangs_flatten, reduce_concat, iteration_utilities_deepflatten],
    arguments={2**i: [[0]*5]*(2**i) for i in range(1, 13)},
    argument_name='number of inner lists'
)

b.plot()

^{1 免责声明：我是那个图书馆的作者}

我把我的发言了。和不赢家。虽然这是更快，当列表很小。但性能具有较大列表显著下降。

>>> timeit.Timer( '[item for sublist in l for item in sublist]', 'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000' ).timeit(100) 2.0440959930419922

总和版本仍在运行时间超过一分钟，还没有做到尚未处理！

有关介质列表：

>>> timeit.Timer( '[item for sublist in l for item in sublist]', 'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10' ).timeit() 20.126545906066895 >>> timeit.Timer( 'reduce(lambda x,y: x+y,l)', 'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10' ).timeit() 22.242258071899414 >>> timeit.Timer( 'sum(l, [])', 'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10' ).timeit() 16.449732065200806

使用小列表和timeit：号码= 1000000

>>> timeit.Timer( '[item for sublist in l for item in sublist]', 'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]' ).timeit() 2.4598159790039062 >>> timeit.Timer( 'reduce(lambda x,y: x+y,l)', 'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]' ).timeit() 1.5289170742034912 >>> timeit.Timer( 'sum(l, [])', 'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]' ).timeit() 1.0598428249359131

有似乎是一个混淆operator.add！当您添加两个列表一起，对于正确的说法是concat，而不是增加。 operator.concat是你需要使用的东西。

如果你正在考虑的功能，它是那么容易，因为这::

>>> from functools import reduce >>> list2d = ((1, 2, 3), (4, 5, 6), (7,), (8, 9)) >>> reduce(operator.concat, list2d) (1, 2, 3, 4, 5, 6, 7, 8, 9)

您看到降低方面的序列类型，所以当你提供一个元组，你回来的元组。让我们尝试用一个列表：

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]] >>> reduce(operator.concat, list2d) [1, 2, 3, 4, 5, 6, 7, 8, 9]

哈哈，你回来的列表。

如何性能::

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]] >>> %timeit list(itertools.chain.from_iterable(list2d)) 1000000 loops, best of 3: 1.36 µs per loop

from_iterable是相当快！但它没有比较，以减少与concat。

>>> list2d = ((1, 2, 3),(4, 5, 6), (7,), (8, 9)) >>> %timeit reduce(operator.concat, list2d) 1000000 loops, best of 3: 492 ns per loop

为什么使用扩展？

reduce(lambda x, y: x+y, l)

此应该正常工作。

考虑安装 more_itertools 包裹。

> pip install more_itertools

它附带了一个实现 flatten (来源, ，来自 itertools 食谱):

import more_itertools lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] list(more_itertools.flatten(lst)) # [1, 2, 3, 4, 5, 6, 7, 8, 9]

从版本 2.4 开始，您可以使用以下命令展平更复杂的嵌套迭代 more_itertools.collapse (来源, ，由 abarnet 贡献）。

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] list(more_itertools.collapse(lst)) # [1, 2, 3, 4, 5, 6, 7, 8, 9] lst = [[1, 2, 3], [[4, 5, 6]], [[[7]]], 8, 9] # complex nesting list(more_itertools.collapse(lst)) # [1, 2, 3, 4, 5, 6, 7, 8, 9]

您的功能没有工作的原因是：扩展延伸就地阵列和不会返回。您仍然可以从拉姆达返回X，使用一些技巧：

reduce(lambda x,y: x.extend(y) or x, l)

注意：延伸大于在列表+更有效率

如果您使用的是不要重新发明轮子的 Django的：

>>> from django.contrib.admin.utils import flatten >>> l = [[1,2,3], [4,5], [6]] >>> flatten(l) >>> [1, 2, 3, 4, 5, 6]

... <强>熊猫：

>>> from pandas.core.common import flatten >>> list(flatten(l))

... <强> Itertools ：

>>> import itertools >>> flatten = itertools.chain.from_iterable >>> list(flatten(l))

... <强> Matplotlib

>>> from matplotlib.cbook import flatten >>> list(flatten(l))

... <强> Unipath ：

>>> from unipath.path import flatten >>> list(flatten(l))

... <强> setuptools的：

>>> from setuptools.namespaces import flatten >>> list(flatten(l))

def flatten(l, a): for i in l: if isinstance(i, list): flatten(i, a) else: a.append(i) return a print(flatten([[[1, [1,1, [3, [4,5,]]]], 2, 3], [4, 5],6], [])) # [1, 1, 1, 3, 4, 5, 2, 3, 4, 5, 6]

以上Anil的函数的一个不良特征是，它需要用户总是手动指定的第二个参数是一个空列表[]。这应该不是是默认。由于方式Python对象工作，这些应该在函数内部中设置，而不是在参数。

这里的一个工作功能：

def list_flatten(l, a=None): #check a if a is None: #initialize with empty list a = [] for i in l: if isinstance(i, list): list_flatten(i, a) else: a.append(i) return a

测试：

In [2]: lst = [1, 2, [3], [[4]],[5,[6]]] In [3]: lst Out[3]: [1, 2, [3], [[4]], [5, [6]]] In [11]: list_flatten(lst) Out[11]: [1, 2, 3, 4, 5, 6]

，即使它们巢matplotlib.cbook.flatten()将用于嵌套列表的工作深度大于的例子中

import matplotlib l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] print(list(matplotlib.cbook.flatten(l))) l2 = [[1, 2, 3], [4, 5, 6], [7], [8, [9, 10, [11, 12, [13]]]]] print list(matplotlib.cbook.flatten(l2))

结果：

[1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

这比下划线._ 18倍更快弄平：

Average time over 1000 trials of matplotlib.cbook.flatten: 2.55e-05 sec Average time over 1000 trials of underscore._.flatten: 4.63e-04 sec (time for underscore._)/(time for matplotlib.cbook) = 18.1233394636

可变长度的基于文本的列表打交道时接受的答案并没有为我工作。这里是做了工作为我的替代方法。

l = ['aaa', 'bb', 'cccccc', ['xx', 'yyyyyyy']]

接受的答案是没有的不的工作：

flat_list = [item for sublist in l for item in sublist] print(flat_list) ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'xx', 'yyyyyyy']

新提出的解决方案，它的没有的对我的工作：

flat_list = [] _ = [flat_list.extend(item) if isinstance(item, list) else flat_list.append(item) for item in l if item] print(flat_list) ['aaa', 'bb', 'cccccc', 'xx', 'yyyyyyy']

递归版本

x = [1,2,[3,4],[5,[6,[7]]],8,9,[10]] def flatten_list(k): result = list() for i in k: if isinstance(i,list): #The isinstance() function checks if the object (first argument) is an #instance or subclass of classinfo class (second argument) result.extend(flatten_list(i)) #Recursive call else: result.append(i) return result flatten_list(x) #result = [1,2,3,4,5,6,7,8,9,10]

随着似乎最简单的对我说：

>>> import numpy as np >>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] >>> print (np.concatenate(l)) [1 2 3 4 5 6 7 8 9]

一还可以使用NumPy的的平坦：

import numpy as np list(np.array(l).flat)

编辑2016年11月2日：只有当工作子列表具有相同的尺寸
。

您可以使用numpy的：点击 flat_list = list(np.concatenate(list_of_list))

为简单代码 underscore.py 包风扇

from underscore import _ _.flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]]) # [1, 2, 3, 4, 5, 6, 7, 8, 9]

它解决了所有问题弄平（无列表项或复杂的嵌套）

from underscore import _ # 1 is none list item # [2, [3]] is complex nesting _.flatten([1, [2, [3]], [4, 5, 6], [7], [8, 9]]) # [1, 2, 3, 4, 5, 6, 7, 8, 9]

可以安装 underscore.py 与PIP

pip install underscore.py

def flatten(alist): if alist == []: return [] elif type(alist) is not list: return [alist] else: return flatten(alist[0]) + flatten(alist[1:])

flat_list = [] for i in list_of_list: flat_list+=i

此代码也可以罚款，它只是扩展列表中的所有方式。虽然它是非常相似的，但只有一个for循环。所以它具有较少的复杂性比加入2 for循环。

如果你愿意放弃速度的微小量简洁的外观，那么你可以使用numpy.concatenate().tolist()或numpy.concatenate().ravel().tolist()：

import numpy l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99 %timeit numpy.concatenate(l).ravel().tolist() 1000 loops, best of 3: 313 µs per loop %timeit numpy.concatenate(l).tolist() 1000 loops, best of 3: 312 µs per loop %timeit [item for sublist in l for item in sublist] 1000 loops, best of 3: 31.5 µs per loop

您可以找到更多在这里的文档 numpy.concatenate 和 numpy.ravel

最快的解决方案，我发现（用于大列表反正）：

import numpy as np #turn list into an array and flatten() np.array(l).flatten()

完成！可以通过执行列表当然将其重新插入到列表（L）

这可能不是最有效的方式，但我认为把一个一行（实际上是一个两衬垫）。这两个版本将工作于任意级嵌套的列表，并利用语言特性（Python3.5）和递归。

def make_list_flat (l): flist = [] flist.extend ([l]) if (type (l) is not list) else [flist.extend (make_list_flat (e)) for e in l] return flist a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]] flist = make_list_flat(a) print (flist)

的输出是

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

这个工作在一个深度优先的方式。递归下降，直到它找到一个非列表元素，然后延伸局部变量flist然后辊它回父。每当返回flist，将其推广到列表中理解父母的flist。因此，在根，则返回一个平坦的列表。

上面的一个创建几个本地列表和返回它们被用于扩展父列表。我想的方式围绕该可创建gloabl flist，象下面。

a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]] flist = [] def make_list_flat (l): flist.extend ([l]) if (type (l) is not list) else [make_list_flat (e) for e in l] make_list_flat(a) print (flist)

在再次输出是

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

虽然我不知道在这个时候大约效率。

笔记: ：以下适用于 Python 3.3+，因为它使用 yield_from. six 也是第三方包，虽然稳定。或者，您可以使用 sys.version.

如果是 obj = [[1, 2,], [3, 4], [5, 6]], ，这里的所有解决方案都很好，包括列表理解和 itertools.chain.from_iterable.

然而，考虑一下这个稍微复杂的情况：

>>> obj = [[1, 2, 3], [4, 5], 6, 'abc', [7], [8, [9, 10]]]

这里有几个问题：

一个元素， 6, 只是一个标量；它是不可迭代的，所以上面的路线在这里会失败。

一个元素， 'abc', 是技术上可迭代（所有 str是）。然而，仔细阅读一下字里行间，您不想将其视为这样——您希望将其视为单个元素。

最后的元素， [8, [9, 10]] 本身就是一个嵌套的可迭代对象。基本列表理解和 chain.from_iterable 只提取“下一级”。

您可以按如下方式解决此问题：

>>> from collections import Iterable >>> from six import string_types >>> def flatten(obj): ... for i in obj: ... if isinstance(i, Iterable) and not isinstance(i, string_types): ... yield from flatten(i) ... else: ... yield i >>> list(flatten(obj)) [1, 2, 3, 4, 5, 6, 'abc', 7, 8, 9, 10]

在这里，您检查子元素 (1) 是否可迭代 Iterable, ABC 从 itertools, ，但还要确保 (2) 元素是不是 “像绳子一样”。

另一个不寻常的方法对杂合子和整数的均匀列表如下：

from typing import List def flatten(l: list) -> List[int]: """Flatten an arbitrary deep nested list of lists of integers. Examples: >>> flatten([1, 2, [1, [10]]]) [1, 2, 1, 10] Args: l: Union[l, Union[int, List[int]] Returns: Flatted list of integer """ return [int(i.strip('[ ]')) for i in str(l).split(',')]

使用从reduce functools和上列出了add运营商A简单的递归方法：

>>> from functools import reduce >>> from operator import add >>> flatten = lambda lst: [lst] if type(lst) is int else reduce(add, [flatten(ele) for ele in lst]) >>> flatten(l) [1, 2, 3, 4, 5, 6, 7, 8, 9]

该函数flatten发生在lst作为参数。它循环lst的所有元素，直到到达整数（也可以改变int到float，str等对其他数据类型），其被添加到最外层的递归的返回值。

递归，不同于像for循环和单子方法，是它的不被列表深度限定一个通用的解决方案即可。例如，用5深度的列表可以被平坦化的相同方式l：

>>> l2 = [[3, [1, 2], [[[6], 5], 4, 0], 7, [[8]], [9, 10]]] >>> flatten(l2) [3, 1, 2, 6, 5, 4, 0, 7, 8, 9, 10]

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow

如何从列表列表中制作平面列表

如果您使用的是不要重新发明轮子的 Django的：

接受的答案是没有的不的工作：

新提出的解决方案，它的没有的对我的工作：