Question

I have a text file full of data that starts with

#Name
#main

then it's followed by lots of numbers and then the file ends with

#extra
!side

So here's a small snippet

#Name
#main
60258960
33031674
72302403
#extra
!side

I want to read only the numbers. But here's the kicker: I want each of them to be its own individual string.

So I know how to read starting after the headers with

read=f.readlines()[3:]

But I'm stumped on everything else. Any suggestions?

Solution 2

You're pretty close as you are. You just need to change your list slice so that it chops off the last two lines of the file along with the first two. readlines() returns a list in which each item is one line from the file. Each string still ends with its newline character, though, so you will probably want to strip that off.

with open("myfile.txt") as myfile:
    # Get only numbers
    read = myfile.readlines()[2:-2]

# Remove newlines
read = [number.strip() for number in read]
print(read)

OTHER TIPS

Read line by line. Use #main as a flag to start processing. Use #extra as a flag to stop processing.

start = '#main'
end = '#extra'
numbers = []
file_handler = open('read_up_to_a_point.txt')
started = False
for line in file_handler:
    if end in line:
        started = False
    if started:
        numbers.append(line.strip())
    if start in line:
        started = True
file_handler.close()
print(numbers)

Sample output:

python read_up_to_a_point.py
['60258960', '33031674', '72302403']

I would do something like this:

nums = []
for line in f:
  stripped = line.rstrip('\n')
  if stripped.isnumeric():
    nums.append(stripped)

nums will contain only the lines that consist entirely of digits. This works as long as your numbers are well formed, meaning non-negative and not hexadecimal. Matching those other forms precisely would take a regular expression.
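For the signed or hexadecimal case, a regular expression version might look like the sketch below. The pattern and the helper name extract_numbers are assumptions made for illustration (the question never says such values occur), and re.fullmatch requires Python 3.4 or later.

```python
import re

# Assumed definition of "a number": an optional sign followed by either
# a hex literal (0x...) or plain decimal digits.
NUMBER_RE = re.compile(r'[+-]?(?:0[xX][0-9a-fA-F]+|\d+)')

def extract_numbers(lines):
    """Return the stripped lines that match NUMBER_RE in full, as strings."""
    nums = []
    for line in lines:
        stripped = line.strip()
        if NUMBER_RE.fullmatch(stripped):
            nums.append(stripped)
    return nums

sample = ["#Name\n", "#main\n", "60258960\n", "-42\n", "0x1F\n", "#extra\n", "!side\n"]
print(extract_numbers(sample))  # ['60258960', '-42', '0x1F']
```

Because the whole line has to match, the header and footer lines are dropped without needing to know how many of them there are.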

You should only use .readlines() if you know your input files will fit comfortably into memory; it reads all lines at once.

Most of the time you can read one input line at a time, and for that you can just iterate the file handle object.
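As a rough sketch of that line-at-a-time style: here io.StringIO stands in for the opened file so the snippet runs on its own, and the rule for skipping marker lines (anything starting with # or !) is an assumption based on the sample in the question.

```python
import io

# io.StringIO stands in for an open text file; with a real file you would
# write:  with open("myfile.txt") as f:  and indent the loop under it.
f = io.StringIO("#Name\n#main\n60258960\n33031674\n72302403\n#extra\n!side\n")

numbers = []
for line in f:                        # the file object yields one line per iteration
    line = line.strip()
    if line.startswith(("#", "!")):   # skip the header and footer marker lines
        continue
    numbers.append(line)

print(numbers)  # ['60258960', '33031674', '72302403']
```

Only one line is held in memory at a time, apart from whatever you choose to keep in numbers.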

When you want special, tricky input handling, I recommend encapsulating the handling in a generator function like this:

def do_something_with_point(point):
    print(point)

class BadInputFile(ValueError):
    pass

def read_points_data(f):
    try:
        line = next(f)
        if not line.startswith("#Name"):
            raise BadInputFile("file does not start with #Name")

        line = next(f)
        if not line.startswith("#main"):
            raise BadInputFile("second line does not start with #main")
    except StopIteration:
        raise BadInputFile("truncated input file")

    # use enumerate() to count input lines; start at line number 3
    # since we just handled two lines of header
    for line_num, line in enumerate(f, 3):
        if line.startswith("#extra"):
            break
        else:
            try:
                yield int(line)
            except ValueError:
                raise BadInputFile("illegal line %d: %s" % (line_num, line))
            # if you really do want strings: yield line
    else:
        # this code will run if we never see a "#extra" line
        # if break is executed, this doesn't run.
        raise BadInputFile("#extra not seen")

    try:
        line = next(f)
        if not line.startswith("!side"):
            raise BadInputFile("!side not seen after #extra")
    except StopIteration:
        raise BadInputFile("input file truncated after #extra")

with open("points_input_file.txt") as f:
    for point in read_points_data(f):
        do_something_with_point(point)

Note that this input function thoroughly validates the input, raising an exception when anything is incorrect on the input. But the loop using the input data is simple and clean; code using read_points_data() can be uncluttered.

I made read_points_data() convert the input points to int values. If you really want the points as strings, you can modify the code; I left a comment there to remind you.

It's not always a good idea (or perhaps even a feasible one) to use readlines() without an argument, because it reads the entire file into memory and can consume a lot of it. That may be unnecessary if you don't need all of the lines at once, depending on exactly what you're doing.

So, one way to do what you want is to use a Python generator function to extract just the lines or values you need from a file. They're very easy to create: essentially, you use yield statements to hand back values instead of return. From a programming point of view, the main difference is that execution continues with the line following the yield statement the next time a value is requested from the generator, rather than from its first line, as would normally be the case for a function. This means the generator's internal state is saved automatically between requests, which makes complicated processing inside it easier.
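To make the resume-after-yield behaviour concrete, here is a tiny sketch that has nothing to do with files. The generator count_up_to is a made-up name used only for this demonstration.

```python
def count_up_to(limit):
    """Hypothetical demo generator: yields 1, 2, ..., limit."""
    n = 1
    while n <= limit:
        yield n      # pause here and hand n to the caller
        n += 1       # execution resumes here on the next request

gen = count_up_to(3)
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes after the yield; n was remembered
rest = list(gen)     # drains whatever is left

print(first, second, rest)  # 1 2 [3]
```

The local variable n survives between requests, which is the saved state described above.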

Here's a fairly minimal example of using one to get just the data you want out of the file, one line at a time, so it doesn't need enough memory to hold the whole file:

def read_data(filename):
    with open(filename, 'rt') as file:
        next(file); next(file)  # ignore first two lines
        value = next(file).rstrip('\n')  # read what should be the first number
        while value != '#extra':  # not end-of-numbers marker
            yield value
            value = next(file).rstrip('\n')

for number in read_data('mydatafile'):
    # process each number string produced
    print(number)

Of course you can still gather them all together into a list, if you wish, like this:

numbers = list(read_data('mydatafile'))

As you can see, it's possible to do other useful things in the function, such as validating the format of the file data or preprocessing it in other ways. In the example above I've done a little of that by removing the trailing newline from each line, the same newline that readlines() leaves on each line of the list it returns. It would be trivial to also convert each string value into an integer by using yield int(value) instead of just yield value.
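A sketch of that integer variant follows. It is not the code above verbatim: the name read_ints is hypothetical, it accepts any iterable of lines rather than a filename, and it uses a for loop so that a file with no #extra line simply ends the generator instead of letting StopIteration escape from next() (which Python 3.7 and later turn into a RuntimeError).

```python
import io

def read_ints(lines):
    """Yield an int for each line between the two header lines and '#extra'."""
    lines = iter(lines)
    next(lines, None)   # skip '#Name' (the default avoids StopIteration on short input)
    next(lines, None)   # skip '#main'
    for line in lines:
        value = line.rstrip('\n')
        if value == '#extra':   # end-of-numbers marker
            return
        yield int(value)        # raises ValueError on a non-numeric line

# io.StringIO stands in for an opened file so the sketch is self-contained.
sample = io.StringIO("#Name\n#main\n60258960\n33031674\n72302403\n#extra\n!side\n")
print(list(read_ints(sample)))  # [60258960, 33031674, 72302403]
```

With a real file you could pass the open file object directly, for example list(read_ints(f)) inside a with open(...) as f: block.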

Hopefully this will give you enough of an idea of what's possible and the trade-offs involved when deciding on what approach to use to perform the task at hand.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow