Python - glob.glob doesn't find *.txt in specified filepath within Unix OS

https://stackoverflow.com/questions/18559392

27-06-2022

Question

I am converting some Python scripts I wrote in a Windows environment to run in Unix (Red Hat 5.4), and I'm having trouble converting the lines that deal with filepaths. In Windows, I usually read in all .txt files within a directory using something like:

pathtotxt = "C:\\Text Data\\EJC\\Philosophical Transactions 1665-1678\\*\\*.txt"
for file in glob.glob(pathtotxt):

It seems one can use the glob.glob() method in Unix as well, so I'm trying to implement this method to find all text files within a directory entitled "source" using the following code:

#!/usr/bin/env python
import commands
import sys
import glob
import os

testout = open('testoutput.txt', 'w')
numbers = [1,2,3]
for number in numbers:
    testout.write(str(number + 1) + "\r\n")
testout.close

sourceout = open('sourceoutput.txt', 'w')
pathtosource = "/afs/crc.nd.edu/user/d/dduhaime/data/hill/source/*.txt"
for file in glob.glob(pathtosource):
    with open(file, 'r') as openfile:
        readfile = openfile.read()
        souceout.write (str(readfile))
sourceout.close

When I run this code, testoutput.txt comes out as expected, but sourceoutput.txt is empty. I thought the problem might be solved if I changed the line

pathtosource = "/afs/crc.nd.edu/user/d/dduhaime/data/hill/source/*.txt"

to

pathtosource = "/source/*.txt"

and then run the code from the /hill directory, but that didn't resolve my problem. Do others know how I might be able to read in the text files in the source directory? I would be grateful for any insights others can offer.

EDIT: In case it is relevant, the /afs/ tree of directories referenced above is located on a remote server that I'm ssh-ing into via PuTTY. I'm also using a test.job file to qsub the Python script above. (This is all to prepare myself to submit jobs on the SGE cluster system.) The test.job script looks like:

#!/bin/csh
#$ -M dduhaime@nd.edu
#$ -m abe
#$ -r y
#$ -o tmp.out
#$ -e tmp.err
module load python/2.7.3
echo "Start - `date`"
python tmp.py 
echo "Finish - `date`"

Solution

Got it! I had misspelled the output command. I wrote

souceout.write (str(readfile))

instead of

sourceout.write (str(readfile))

What a dunce. I also added a newline bit to the line:

sourceout.write (str(readfile) + "\r\n")

and it works fine. I think it's time for a new IDE!
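
A quick way to tell a glob problem from a bug elsewhere in the loop is to inspect the matches directly, since glob.glob() returns an empty list without complaint when nothing matches. The sketch below is only an illustration: the helper name find_txt and the decision to raise IOError are assumptions, not part of the original script.

```python
import glob
import os


def find_txt(source_dir):
    """Return the sorted .txt paths in source_dir, or raise if none match.

    find_txt is a hypothetical helper written for this example.
    """
    pattern = os.path.join(source_dir, "*.txt")
    matches = sorted(glob.glob(pattern))  # glob returns [] when nothing matches
    if not matches:
        # Fail loudly so an empty match is not mistaken for a write problem.
        raise IOError("no files matched %r" % pattern)
    return matches
```

Note that a pattern with a leading slash, such as "/source/*.txt", is absolute and is resolved from the filesystem root; a pattern relative to the current directory would be "source/*.txt".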

OTHER TIPS

You haven't actually closed the files. testout.close() is never called, because you have forgotten the parentheses; without them the expression only refers to the method. The same goes for sourceout.close():

testout.close
...
sourceout.close

Has to be:

testout.close()
...
sourceout.close()
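
A small sketch of the difference, using a throwaway temporary file (the path is an assumption for illustration only). Writing testout.close without parentheses evaluates to the method object and discards it, so the file stays open:

```python
import os
import tempfile

# Hypothetical scratch location so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "testoutput.txt")

testout = open(path, "w")
testout.write("2\n")

testout.close                    # only looks up the method; nothing is called
still_open = not testout.closed  # the file is still open at this point

testout.close()                  # actually flushes and closes the file
now_closed = testout.closed

print(still_open, now_closed)
```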

When the program exits, any files still open are normally closed automatically, so the missing call mostly matters if you reopen or read the file again before the script ends.
Even better (the pythonic version) would be to use the with statement. Instead of this:

testout = open('testoutput.txt', 'w')
numbers = [1,2,3]
for number in numbers:
    testout.write(str(number + 1) + "\r\n")
testout.close()

you would write this:

with open('testoutput.txt', 'w') as testout:
    numbers = [1,2,3]
    for number in numbers:
        testout.write(str(number + 1) + "\r\n")

In this case the file will be automatically closed even when an error occurs.
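
Applied to the second half of the script, the same idea might look like the sketch below. The function name concatenate_txt and its parameters are hypothetical; it assumes the goal is simply to append every .txt file in one directory to a single output file.

```python
import glob
import os


def concatenate_txt(source_dir, out_path):
    """Write the contents of each .txt file in source_dir to out_path.

    concatenate_txt is a hypothetical helper, not part of the original script.
    """
    pattern = os.path.join(source_dir, "*.txt")
    matches = sorted(glob.glob(pattern))  # glob order is arbitrary, so sort
    # Both files are closed automatically, even if a read or write fails.
    with open(out_path, "w") as sourceout:
        for path in matches:
            with open(path, "r") as openfile:
                # "\n" is used here; the original script wrote "\r\n".
                sourceout.write(openfile.read() + "\n")
    return matches
```

Called as concatenate_txt("/afs/crc.nd.edu/user/d/dduhaime/data/hill/source", "sourceoutput.txt"), this would presumably do what the original loop was meant to do.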

Licensed under: CC-BY-SA with attribution
