How to efficiently process a string in Python line by line?
28-05-2021
Question
I received some multi-line data via HTTP and have it in one string. I need to keep only the lines containing specific keywords and write them to a file.
How do I process these individual lines without consuming excessive memory? That is, without splitting the input string at newlines and then processing the resulting list?
Jython-specific solutions are welcome, too.
Solution 4
I have now actually measured the memory requirements of data.split('\n'), re.finditer('.*?\n', data) and StringIO.readline() in Jython. I was surprised to find that split() didn't increase used memory (PS Old Gen) at all. In Jython 2.5.1+, StringIO came second and re third; in Jython 2.2.1 those two swapped places.
Jython 2.5.1+:
split() +0 x data
StringIO +2 x data
re +4 x data
Jython 2.2.1:
split() +0 x data
re +2 x data
StringIO +7 x data
StringIO didn't use additional memory after the .write() call, i.e. it seems to be backed by the same string in Jython.
I didn't test speed.
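The figures above are specific to Jython and its JVM heap, so they may not carry over to other interpreters. As a rough sketch only, a similar comparison could be repeated on CPython 3 with the standard tracemalloc module (which is not available in Jython). The helper names peak_extra_bytes, via_split, via_re and via_stringio are made up for this illustration, and io.StringIO stands in for the Python 2 StringIO module.

```python
import io
import re
import tracemalloc


def peak_extra_bytes(consume, data):
    """Return the peak number of bytes allocated while consume(data) runs."""
    tracemalloc.start()
    try:
        consume(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def via_split(data):
    # Builds the full list of lines before iterating.
    for line in data.split('\n'):
        pass


def via_re(data):
    # Yields one match object at a time.
    for match in re.finditer(r'.*?\n', data):
        pass


def via_stringio(data):
    # Iterates over the string as if it were a file.
    for line in io.StringIO(data):
        pass


data = 'some line of text\n' * 10000
results = {
    func.__name__: peak_extra_bytes(func, data)
    for func in (via_split, via_re, via_stringio)
}
for name, peak in results.items():
    print(name, peak)
```

The absolute numbers depend on the interpreter and version, so they should be read as a relative comparison at best.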
Other tips
Since there is no iterator version of str.split, your best bet is to emulate it using the re module:
    import re
    for match in re.finditer(r'.*?\n', data):
        line = match.group()  # finditer yields match objects, not strings
        # do stuff
However, note that this will leave the trailing newlines in place, unlike the regular split method. It also skips a final line that does not end with a newline, because the pattern requires one.
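If the trailing newlines and a possibly unterminated last line matter, the regex approach can be wrapped in a small generator. This is only a sketch: the name iter_lines and the pattern are my own choices, not part of the answer above, and it handles '\n' line endings only.

```python
import re

# Either a (possibly empty) line ending in '\n', or a non-empty final
# line that has no trailing newline.
_LINE = re.compile(r'[^\n]*\n|[^\n]+')


def iter_lines(data):
    """Lazily yield the lines of data without their trailing newline."""
    for match in _LINE.finditer(data):
        line = match.group()
        yield line[:-1] if line.endswith('\n') else line


lines = list(iter_lines('a\nb\n\nc'))
print(lines)
```

For input that uses only '\n' as a separator this yields the same lines as str.splitlines(), so unlike data.split('\n') it does not produce an extra empty string when the data ends with a newline.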
You can try using compiled regular expressions from Python's re module.
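As a sketch of what that might look like for the keyword filter in the question: the keywords are escaped, joined into one alternation, and compiled once. The function name make_keyword_filter and the sample keywords are invented for this example, and it assumes at least one keyword is given (an empty list would compile to a pattern that matches every line).

```python
import re


def make_keyword_filter(keywords):
    """Return a predicate that is True for lines containing any keyword."""
    # re.escape keeps characters such as '.' or '*' in a keyword literal.
    pattern = re.compile('|'.join(re.escape(k) for k in keywords))
    return lambda line: pattern.search(line) is not None


has_keyword = make_keyword_filter(['ERROR', 'WARN'])

data = 'INFO start\nERROR disk full\nWARN low memory\nINFO done\n'
matching = [line for line in data.split('\n') if has_keyword(line)]
print(matching)
```

Compiling once mainly saves the repeated pattern lookup on every line; whether it is measurably faster than a plain substring test would need to be checked on real data.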
Use the StringIO module to access your string as a file-like object. Then you can iterate over lines as you would do for a file.
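A minimal sketch of this approach, applied to the filtering task from the question. It is written for Python 3, where the class lives in io.StringIO; in Python 2 and Jython 2.x it is the StringIO module instead. The function name write_matching_lines, the sample data and the temporary output path are all assumptions made for the example.

```python
import io
import os
import tempfile


def write_matching_lines(data, keywords, path):
    """Write every line of data that contains a keyword to path.

    Returns the number of lines written.
    """
    count = 0
    with open(path, 'w') as out:
        # Iterating over StringIO yields one line at a time,
        # each with its trailing newline still attached.
        for line in io.StringIO(data):
            if any(keyword in line for keyword in keywords):
                out.write(line)
                count += 1
    return count


sample = 'INFO start\nERROR disk full\nWARN low memory\nINFO done\n'
out_path = os.path.join(tempfile.mkdtemp(), 'filtered.txt')
written = write_matching_lines(sample, ['ERROR', 'WARN'], out_path)
print(written)
```

Because the lines keep their newlines, they can be written back out unchanged.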