How to efficiently process a string line by line in Python?

https://stackoverflow.com/questions/9939860

28-05-2021

Question

I received some multi-line data via HTTP and have it in one string. I need to keep only the lines containing specific keywords and write them to a file.

How do I process these individual lines without consuming excessive memory, i.e. without splitting the input string at newlines and then processing the resulting list?

Jython-specific solutions are welcome, too.

Solution 4

I have now actually tested the memory requirements of data.split('\n'), re.finditer('.*?\n', data) and StringIO.readline() in Jython. I was surprised to find that split() didn't increase used memory (PS Old Gen) at all; StringIO came second and re third.

Additional memory used, as a multiple of the size of data:

Jython 2.5.1+:
  split()  +0 x data
  StringIO +2 x data
  re       +4 x data

Jython 2.2.1:
  split()  +0 x data
  re       +2 x data
  StringIO +7 x data

StringIO didn't use additional memory after the .write() call, i.e. it seems to be backed by the same string in Jython.

I didn't test speed.

Other tips

Since there is no iterator version of str.split, your best bet is to emulate it using the re module:

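import re
for match in re.finditer('.*?\n', data):
    line = match.group()  # finditer yields match objects, not strings
    # do stuff with line

However, note that this leaves the trailing newlines in place, unlike the regular split method, and that it skips a final line that does not end in a newline.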
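The loop above can be wrapped in a generator that yields plain strings and also picks up a final line with no trailing newline. This is only a rough sketch: the helper name iter_lines and the sample data are made up for illustration, and the pattern '[^\n]*\n|[^\n]+' is swapped in for '.*?\n' so that an unterminated last line is matched too.

```python
import re

def iter_lines(data):
    """Lazily yield the lines of data without their trailing newline.

    Hypothetical helper, not part of the standard library.
    """
    # First alternative: a (possibly empty) line ending in a newline.
    # Second alternative: a non-empty final line with no newline.
    for match in re.finditer(r'[^\n]*\n|[^\n]+', data):
        yield match.group().rstrip('\n')

sample = "alpha\nbeta\n\ngamma"  # illustrative input only
lines = list(iter_lines(sample))
```

Only one line is held as a separate string at a time while iterating; list() is used here just to show the result. Carriage returns are left in place, so data with '\r\n' line endings would need an extra rstrip('\r').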

You can also try a compiled regular expression (re.compile in Python's re module).
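One possible reading of this tip, sketched under assumptions: compile a single pattern that matches only whole lines containing a keyword, so finditer never creates string objects for the lines you would throw away. The keywords ERROR and WARN and the sample data are invented for the example.

```python
import re

# Hypothetical keywords; re.escape protects any regex metacharacters.
keywords = ['ERROR', 'WARN']
alternation = '|'.join(re.escape(k) for k in keywords)

# With re.MULTILINE, ^ and $ anchor at line boundaries, so each match
# is one complete line that contains at least one of the keywords.
line_with_keyword = re.compile(r'^.*(?:%s).*$' % alternation, re.MULTILINE)

data = "INFO start\nERROR disk full\nINFO ok\nWARN slow\n"  # illustrative
matching = [m.group() for m in line_with_keyword.finditer(data)]
```

Each match can then be written straight to the output file with out.write(m.group() + '\n') instead of being collected in a list. I haven't measured whether this beats the other approaches in memory or speed.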

Use the StringIO module to access your string as a file-like object. Then you can iterate over lines as you would do for a file.
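A minimal sketch of the StringIO approach, assuming the same kind of keyword filter as in the question. The keywords and sample data are made up; the import fallback is there because the StringIO module was folded into io in Python 3.

```python
try:
    from StringIO import StringIO  # Python 2 / Jython
except ImportError:
    from io import StringIO        # Python 3

data = "INFO start\nERROR disk full\nINFO ok\nWARN slow\n"  # illustrative
keywords = ('ERROR', 'WARN')  # hypothetical keywords

kept = []
for line in StringIO(data):
    # Lines keep their trailing newline, just as when reading a file.
    if any(k in line for k in keywords):
        kept.append(line)
```

In real use you would replace kept.append(line) with out.write(line) on an open file, so the filtered lines are never accumulated in memory.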

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow