Reading multiple lines from an external text file in Python

Question 1

If you just want to count the types of characters contained in the file, I wouldn't use readlines but a regular read, pulling the file in fixed-size chunks.

STEP_BYTES = 1024  # chunk size (in Python 3 text mode, read() counts characters)

def main():
    uppercasecount = 0
    lowercasecount = 0
    digitcount = 0
    spacecount = 0
    # "with" closes the file even if an error occurs
    with open("module3.txt", "r") as infile:
        data = infile.read(STEP_BYTES)
        while data:  # read() returns an empty string at end of file
            for character in data:
                if character.isupper():
                    uppercasecount += 1
                if character.islower():
                    lowercasecount += 1
                if character.isdigit():
                    digitcount += 1
                if character.isspace():
                    spacecount += 1
            data = infile.read(STEP_BYTES)

    print("Total count is %d Upper case, %d Lower case, %d Digit(s) and %d spaces." % (uppercasecount, lowercasecount, digitcount, spacecount))

main()

If you really need to use readlines, keep in mind that it reads every line of the file into memory at once and returns them as a list of lines, which is not so good with very large files.

For instance, assuming that your module3.txt file contains:

this Is a TEST
and this is another line

Using readlines() will return:

['this Is a TEST\n', 'and this is another line']

With that in mind, you can walk the file contents using a double for loop:

def main():
    uppercasecount = 0
    lowercasecount = 0
    digitcount = 0
    spacecount = 0
    # "with" closes the file even if an error occurs
    with open("module3.txt", "r") as infile:
        lines = infile.readlines()
    for line in lines:
        for character in line:
            if character.isupper():
                uppercasecount += 1
            if character.islower():
                lowercasecount += 1
            if character.isdigit():
                digitcount += 1
            if character.isspace():
                spacecount += 1
    print("Total count is %d Upper case, %d Lower case, %d Digit(s) and %d spaces." % (uppercasecount, lowercasecount, digitcount, spacecount))

main()
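
If memory is the concern, a file object can also be iterated directly, which yields one line at a time instead of building the whole list first. Here is a minimal sketch of that idea. The helper name count_characters is made up for illustration; it accepts any iterable of strings, so it works with an open file as well as with the list returned by readlines():

```python
def count_characters(lines):
    """Count upper case, lower case, digit and whitespace characters.

    `lines` is any iterable of strings: an open text file, or a list
    such as the one returned by readlines(). Hypothetical helper.
    """
    counts = {"upper": 0, "lower": 0, "digit": 0, "space": 0}
    for line in lines:
        for character in line:
            if character.isupper():
                counts["upper"] += 1
            if character.islower():
                counts["lower"] += 1
            if character.isdigit():
                counts["digit"] += 1
            if character.isspace():
                counts["space"] += 1
    return counts


# The two example lines from above, as readlines() would return them.
sample = ['this Is a TEST\n', 'and this is another line']
result = count_characters(sample)
print(result)
```

With a real file you would pass the file object itself, for example count_characters(infile) inside a with open("module3.txt") as infile: block.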

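As for the directory question: if your code and your text file (module3.txt) are going to be shipped in the same directory, and the script is run from that directory, you don't need to do the chdir. Relative paths are resolved against the current working directory, which is the directory the script was launched from (not necessarily the directory where the script file lives).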

Let's say you ship it in a directory like:

  |-> Count
     |-> script.py
     |-> module3.txt

You can just use relative paths to open module3.txt from within script.py: the line open("module3.txt", "r") will look for a file called module3.txt within the current working directory (meaning Count\ when the script is launched from there). In that case you don't need the call to os.chdir. If you want it to work no matter where the script is launched from, you can chdir to the directory where the script is located.

Knowing that, change your hardcoded chdir line (os.chdir(r'M:\Project\Count') at the top of your file) to:

print("Changing current directory to %s" % os.path.dirname(os.path.realpath(__file__)))
os.chdir(os.path.dirname(os.path.realpath(__file__)))
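
An alternative that avoids changing the working directory at all is to build the full path to the data file from the script's own location. This is only a sketch: the helper name path_next_to is an assumption for illustration, not something from your code.

```python
import os


def path_next_to(script_file, filename):
    """Return the path of `filename` in the same directory as `script_file`.

    Hypothetical helper: pass __file__ as `script_file` from inside your script.
    """
    script_dir = os.path.dirname(os.path.realpath(script_file))
    return os.path.join(script_dir, filename)


# Inside script.py you would call: path_next_to(__file__, "module3.txt")
# Here a made-up relative path stands in for __file__.
example = path_next_to(os.path.join("Count", "script.py"), "module3.txt")
print(example)
```

The result can be passed straight to open(), and the rest of the program keeps whatever working directory it was started with.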

Question 2

You can use the two-argument form of iter to read the file an arbitrary number of characters at a time, and itertools.chain.from_iterable to treat the chunks as one long stream of characters. Instead of keeping track of several variables, you can use the str methods as keys to a collections.Counter, e.g.:

from collections import Counter
from itertools import chain

counts = Counter()
with open('yourfile') as fin:
    chars = chain.from_iterable(iter(lambda: fin.read(4096), ''))
    for ch in chars:
        for fn in (str.isupper, str.islower, str.isdigit, str.isspace):
            counts[fn] += fn(ch)

#Counter({<method 'islower' of 'str' objects>: 39, <method 'isspace' of 'str' objects>: 10, <method 'isdigit' of 'str' objects>: 0, <method 'isupper' of 'str' objects>: 0})

Then counts[str.islower] will give you 39, for instance.
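
Since the method objects make for unwieldy keys when printed, one variation is to key the Counter by a readable label instead. This is a sketch of the same technique, not a different algorithm. The names CHECKS and count_classes are invented for this example, and io.StringIO stands in for a real file:

```python
import io
from collections import Counter
from itertools import chain

# Hypothetical mapping from a readable label to the str predicate.
CHECKS = {
    "upper": str.isupper,
    "lower": str.islower,
    "digit": str.isdigit,
    "space": str.isspace,
}


def count_classes(fin, chunk_size=4096):
    """Count character classes in an open text file, reading in chunks."""
    counts = Counter()
    # iter(callable, sentinel) keeps calling read() until it returns ''.
    chunks = iter(lambda: fin.read(chunk_size), '')
    for ch in chain.from_iterable(chunks):
        for label, fn in CHECKS.items():
            counts[label] += fn(ch)  # True adds 1, False adds 0
    return counts


# A small chunk size is used here only to exercise the chunked reading.
demo = count_classes(io.StringIO("this Is a TEST\nand this is another line"), chunk_size=8)
print(demo["upper"], demo["lower"], demo["digit"], demo["space"])
```

With a real file you would call count_classes(fin) inside a with open('yourfile') as fin: block and read the totals with counts["lower"] and so on.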