Is len(text.split()) faster than text.count(" ")? And why? (Python)

https://stackoverflow.com//questions/22036918

21-12-2019

Question

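Given that I have 1,000,000,000 lines of ~20-100 tokens per line delimited by whitespace, counting the length of each line becomes sort of non-trivial.

Assuming that there is never a double whitespace between two tokens,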

is len(text.split()) faster than text.count(" ")?

And why so?

Solution

It is easy to check which one is faster:

>python -m timeit -s "s='q w e r t y u i o p a s d f g h j k l'" "s.count(' ')+1"
1000000 loops, best of 3: 0.272 usec per loop

>python -m timeit -s "s='q w e r t y u i o p a s d f g h j k l'" "len(s.split())"
1000000 loops, best of 3: 0.653 usec per loop

split is slower, probably because it has to build the list of split tokens.
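
The same comparison can also be run from a script using the standard timeit module. This is a minimal sketch, assuming a single-spaced sample line like the one above; the helper names count_by_spaces and count_by_split are made up for illustration, and absolute timings will vary with machine and Python version.

```python
import timeit

# Sample line: 19 single-character tokens separated by single spaces.
s = "q w e r t y u i o p a s d f g h j k l"


def count_by_spaces(line):
    """Token count via space counting; only valid for single-spaced input."""
    return line.count(" ") + 1


def count_by_split(line):
    """Token count via split(); this allocates a list of substrings first."""
    return len(line.split())


# Both approaches agree on this well-formed input.
assert count_by_spaces(s) == count_by_split(s)

n = 10_000  # number of calls per measurement (arbitrary choice)
t_count = timeit.timeit(lambda: count_by_spaces(s), number=n)
t_split = timeit.timeit(lambda: count_by_split(s), number=n)
print(f"count: {t_count:.4f}s  split: {t_split:.4f}s  ({n} calls each)")
```

The lambda wrappers add the same call overhead to both measurements, so the numbers are best read as a relative comparison only.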

Other tips

text.count(" ") is wrong; see the following:

In [706]: t='a  b    c'

In [707]: t.split()
Out[707]: ['a', 'b', 'c']

In [708]: t.count(' ')
Out[708]: 6

In this case, you do not want to get 6.

Your premise is incorrect. The two operations do not give the same result. Let's use your question as an example:
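
If runs of spaces can occur and the goal is still to avoid building a list, one option is to count the positions where a token starts. The following is a rough sketch with a hypothetical helper, count_tokens. It treats only the space character as a separator (unlike split(), which also splits on tabs and newlines), and in CPython a pure-Python loop like this is likely slower than split(), so it illustrates the logic rather than a speed-up.

```python
def count_tokens(line):
    """Count space-separated tokens without building a list of substrings."""
    tokens = 0
    in_token = False
    for ch in line:
        if ch == " ":
            in_token = False
        elif not in_token:
            # First non-space character after a gap starts a new token.
            in_token = True
            tokens += 1
    return tokens


t = "a  b    c"
print(t.count(" ") + 1)  # 7, overcounts because of the repeated spaces
print(len(t.split()))    # 3
print(count_tokens(t))   # 3
```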

>>> text = "Given that I have 1,000,000,000 lines of ~20-100 tokens per line delimited by whitespace, counting the length of each line becomes sort of non-trival."
>>> len(text.split())
24
>>> text.count(" ")
23
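
Adding 1 to the space count closes the gap for this particular sentence, but the two expressions still disagree on some inputs. A small sketch of the edge cases, assuming a hypothetical helper named compare that returns both counts:

```python
def compare(line):
    """Return (space count + 1, number of split tokens) for one line."""
    return line.count(" ") + 1, len(line.split())


print(compare("a b c"))   # (3, 3): single spaces, no padding, so they agree
print(compare(""))        # (1, 0): an empty line has no tokens
print(compare(" a b"))    # (3, 2): a leading space is counted as a separator
print(compare("a b "))    # (3, 2): the same happens with a trailing space
print(compare("a\tb"))    # (1, 2): split() also splits on tabs
```

So count(" ") + 1 is only a safe substitute when every line is non-empty, unpadded, and separated by exactly one space character.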

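Given that your question is about "counting the length of each line", neither of these operations does that either.

To compute the length of each line you need to do: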

line_lengths = [len(line) for line in text.splitlines()]

But it might be better to keep track of the line numbers as well:

line_lengths = [(idx, len(line)) for idx, line in enumerate(text.splitlines())]
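
With a billion lines, holding the whole text in memory for splitlines() may not be practical. A minimal sketch of a lazy variant, assuming the data can be read line by line from a file-like object; iter_line_lengths is a hypothetical name, and io.StringIO stands in for a real file here.

```python
import io


def iter_line_lengths(lines):
    """Yield (line_number, length) pairs lazily, one line at a time."""
    for idx, line in enumerate(lines):
        # Strip only the line terminator so lengths match splitlines().
        yield idx, len(line.rstrip("\n"))


# io.StringIO stands in for a real file object returned by open().
fake_file = io.StringIO("one two\nthree four five\n\nsix\n")
for idx, length in iter_line_lengths(fake_file):
    print(idx, length)
```

Because the generator never materializes the full list, memory use stays roughly constant regardless of how many lines the input has.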

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow