Question

I am new to this forum and to programming and apologize in advance if I violate any of the forum rules. I have researched this extensively, but I couldn't find a solution for my problem.

So I have a very long file that has this general structure:

data="""
   20.020001    563410   9
   20.520001    577410  20
   21.022001    591466   9
   21.522001    605466 120
   23.196001    652338   2
   25.278001    710634   7
   25.780001    724690 144
   26.280001    738690   9
   26.782001    752746  40
   27.282001    766746   9
   27.784001    780802 140
   29.372001    825266   2
   31.458001    883674   7
   31.958002    897674   8
   32.458002    911674   9
   32.958002    925674  10

"""

I imported the file using

with open(r"C:\blablabla\text.txt", 'r+') as infile:
    data = infile.read()

Now I am trying to use a regular expression to find all lines that end with 140 through 146, so I did this:

items=re.findall('.......................14[0-6]\n',data,re.MULTILINE)
for x in items:
    print x

This works, but when I now try to copy the lines that match the regular expression,

for x in items:
    if items in data:
        data.write(items)

I get the following error:

if items in data:
TypeError: 'in <string>' requires string as left operand, not list

I understand what the problem is, but I don't know how to solve it. How can I feed the left operand a string when the outcome of my regex is a list?

Any help is much appreciated!

Solution

You should simply handle each line separately:

import re

with open(r"C:\blablabla\text.txt", 'r') as infile:
    data = infile.readlines()
for line in data:
    if re.match('.......................14[0-6]\n', line):
        print line[:-1]

The slice line[:-1] drops the last character of each line, which is the trailing newline. Without it, every line would be followed by two newlines, because the print statement adds one of its own.
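
Since the original goal was to copy the matching lines rather than just print them, here is a minimal sketch of the same line-by-line idea that writes the matches to a second file. The function name copy_matching_lines, the PATTERN constant and both path parameters are hypothetical names chosen for illustration. The pattern is also a small variation on the one above: it checks only the last column instead of counting characters, and assumes the columns are separated by spaces or tabs.

```python
import re

# Matches a line whose last whitespace-separated column is 140..146.
# Trailing whitespace (including the newline) is allowed before the end.
PATTERN = re.compile(r'\s14[0-6]\s*$')


def copy_matching_lines(src_path, dst_path):
    """Copy lines whose last column is 140-146 from src_path to dst_path.

    Returns the number of lines written.
    """
    count = 0
    with open(src_path, 'r') as infile, open(dst_path, 'w') as outfile:
        for line in infile:
            if PATTERN.search(line):
                # line still carries its own newline, so write it unchanged
                outfile.write(line)
                count += 1
    return count
```

Writing to a separate output file avoids the data.write(items) call from the question, which cannot work because data is a string and strings have no write method.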

OTHER TIPS

You can read the file line by line:

data = ""
with open("file.txt", 'r') as infile:
    for line in infile:
        fields = line.split()
        # Skip blank lines; keep lines whose last column is 140..146
        if fields and 140 <= int(fields[-1]) <= 146:
            data = data + line

print data
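
The numeric comparison can also be wrapped in a small reusable function. The following is only a sketch: the name filter_by_last_column and its low/high parameters are made up for illustration, and it assumes that lines which are blank or whose last field is not an integer should simply be skipped.

```python
def filter_by_last_column(lines, low=140, high=146):
    """Return the lines whose last whitespace-separated field is an
    integer between low and high (inclusive)."""
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line, nothing to compare
        try:
            value = int(fields[-1])
        except ValueError:
            continue  # last field is not an integer
        if low <= value <= high:
            kept.append(line)
    return kept


sample = [
    "   25.278001    710634   7\n",
    "   25.780001    724690 144\n",
    "\n",
    "   27.784001    780802 140\n",
]
print("".join(filter_by_last_column(sample)))
```

Because the value is compared as a number, this does not depend on the column width, and the range can be changed without rewriting a pattern. A file object can be passed in place of the sample list, since iterating over a file yields its lines.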

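Your regex can be simplified further:

re.findall('.*?14[0-6]\n', data)

Note that this matches any line ending in 140 through 146, so a longer number such as 1140 in the last column would match as well.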

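To fix the TypeError, work with each matched string rather than with the whole list, for example by concatenating the matches into a single string:

items = re.findall('.*?14[0-6]\n', data)
result = ""
for x in items:
    result += x
print result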
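
As a variation on the whole-string approach, the sketch below uses re.MULTILINE with the ^ and $ anchors and joins the matches with str.join instead of a loop. The names LINE_PATTERN and find_matching_lines are hypothetical, and the pattern assumes the last column is preceded by a space or tab and that lines carry no trailing whitespace.

```python
import re

# With re.MULTILINE, ^ and $ anchor at the start and end of every line,
# so no explicit "\n" is needed in the pattern.
LINE_PATTERN = re.compile(r'^.*[ \t]14[0-6]$', re.MULTILINE)


def find_matching_lines(text):
    """Return a list of the lines in text whose last column is 140-146."""
    return LINE_PATTERN.findall(text)


data = """
   25.278001    710634   7
   25.780001    724690 144
   26.280001    738690   9
   27.784001    780802 140
"""

items = find_matching_lines(data)

# Each element of items is a string, so it can be the left operand of "in";
# the list itself cannot, which is what raised the TypeError.
assert all(x in data for x in items)

# Join the matches into one string, one line per match.
result = "\n".join(items)
print(result)
```

Requiring a space or tab before 14 keeps values such as 1140 from matching, which the shorter pattern above would accept.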
Licensed under: CC-BY-SA with attribution