How can I get readlines() to ignore the EOF 0x1A character?

https://stackoverflow.com/questions/21274158

01-10-2022

Question

I am writing a Python script that takes STDIN from TextWrangler and processes it line by line. In TextWrangler, I combine multiple text files using drag and drop. The problem is that the documents retain the ^Z (0x1A) character, which my Python script interprets as an EOF indicator. As a result, my script only "sees" the first of the combined text documents (up to the first EOF character).

I've read about reading in binary mode, buffers and such, but I'm a complete newbie to this kind of stuff and can't figure out how to implement any of those ideas. It seems that readlines() treats the ^Z as the end of the file and stops there. How can I prevent that?

Here is my code:

import sys

for line_number, line in enumerate(sys.stdin.readlines()):
    if len(line) > 4:  # Blank lines are skipped
        if line.split()[0].isdigit():  # Determine if the line begins with an EVENT NUMBER
            print(line.split()[7])
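A minimal sketch of the binary-mode idea mentioned in the question, assuming Python 3 (the script above is Python 2 era) and UTF-8 input, both of which are assumptions. It reads the raw bytes from `sys.stdin.buffer`, deletes every 0x1A byte before decoding, and then applies the same line tests as the original loop. The name `event_fields` is hypothetical, and the check for at least eight fields is an added guard the original lacks.

```python
import sys

SUB = b"\x1a"  # ASCII SUB (Ctrl-Z), the byte DOS-era tools used as an end marker


def event_fields(raw, encoding="utf-8"):
    """Return the 8th whitespace-separated field of each EVENT NUMBER line.

    `raw` is undecoded input. Every 0x1A byte is removed before decoding,
    so lines after an embedded Ctrl-Z are still processed.
    """
    # Assumption: the input is UTF-8; undecodable bytes become U+FFFD.
    text = raw.replace(SUB, b"").decode(encoding, errors="replace")
    results = []
    # keepends=True keeps the trailing newline, so len(line) > 4 behaves
    # like the original test on readlines() output.
    for line in text.splitlines(keepends=True):
        fields = line.split()
        # Skip short lines, keep lines starting with a number, and (added
        # guard) require at least 8 fields so fields[7] cannot fail.
        if len(line) > 4 and fields and fields[0].isdigit() and len(fields) > 7:
            results.append(fields[7])
    return results


if __name__ == "__main__":
    # sys.stdin.buffer is the binary stream underneath sys.stdin.
    for field in event_fields(sys.stdin.buffer.read()):
        print(field)
```

Because the bytes are cleaned before any text decoding happens, nothing downstream ever sees the 0x1A character.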

Solution

Option 1: Since you are generating your source files outside of Python, just add a step after TextWrangler to remove the offending characters. I've become a big fan of sed and grep; ports are available for Windows, and they are available natively on *nix.
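If installing sed is not an option, the same clean-up step can be sketched in Python instead. This is a swapped-in technique, not what the answer proposes: a small filter that copies its binary input to its binary output with every 0x1A byte dropped. The function name `strip_sub` and the chunk size are illustrative assumptions.

```python
import sys

SUB = b"\x1a"  # ASCII SUB (Ctrl-Z)


def strip_sub(src, dst, chunk_size=64 * 1024):
    """Copy binary stream `src` to `dst`, dropping every 0x1A byte.

    A single-byte pattern cannot straddle two chunks, so a plain
    per-chunk replace is enough.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk.replace(SUB, b""))


if __name__ == "__main__":
    # Binary streams, so no text layer gets a chance to act on Ctrl-Z.
    strip_sub(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

With hypothetical file names, it would sit in front of the original script as `python strip_sub.py < combined.txt | python events.py`.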

Option 2: Fix the file in TextWrangler.

Option 3: Convert the TextWrangler steps to a Python script and avoid the issue altogether.
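A rough sketch of option 3, assuming the TextWrangler step is plain concatenation of the dropped files: read each file in binary mode, remove every 0x1A byte, and join the results. The name `combine_files`, the UTF-8 encoding, and the newline added between files that lack a trailing one are all assumptions about how the combined document should look.

```python
import sys
from pathlib import Path

SUB = b"\x1a"  # ASCII SUB (Ctrl-Z)


def combine_files(paths, encoding="utf-8"):
    """Concatenate the given files as text, with every 0x1A byte removed.

    Files are read in binary mode, so Ctrl-Z is just another byte here.
    Assumption: a newline is appended to any file that lacks one, so the
    last line of one file does not merge with the first line of the next.
    """
    parts = []
    for path in paths:
        data = Path(path).read_bytes().replace(SUB, b"")
        if data and not data.endswith(b"\n"):
            data += b"\n"
        parts.append(data)
    return b"".join(parts).decode(encoding, errors="replace")


if __name__ == "__main__":
    # Hypothetical usage: python combine.py file1.txt file2.txt ...
    sys.stdout.write(combine_files(sys.argv[1:]))
```

The combined text can then be processed line by line exactly as in the question, without any 0x1A characters left to stop the read.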

Licensed under: CC-BY-SA with attribution
