Text lines are missed when reading a file line by line in Perl: <CR>/<LF> mismatch

Question 1

The -72[CR] line isn't missing. You're just not seeing it.

This is because it isn't actually a separate line: a lone carriage return isn't recognized as a line break. Perl reads each line up to the input record separator $/, which is "\n" by default, so what is happening is that you're reading this as one line:

-72[CR]&nbsp(dBm)&nbsp(High)</td>[LF]

When you print that line, you first print:

Line No. 101 is -72

Then the carriage return character is printed, which moves the cursor back to the beginning of the line. Then the rest of the line is printed on top of what you had already printed, and thus you see:

&nbsp(High)</td>

because that overwrote the previous text on that line.
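To confirm that a carriage return is hiding in the data, one option is to make the control characters visible before printing them. A minimal sketch; show_line_endings is a hypothetical helper name, and the sample string is modeled on the line above:

```perl
use 5.12.0;
use warnings;

# Replace CR and LF with visible markers so a stray \r
# cannot move the cursor and overwrite what was already printed.
sub show_line_endings {
    my ($text) = @_;          # work on a copy of the argument
    $text =~ s/\r/[CR]/g;
    $text =~ s/\n/[LF]/g;
    return $text;
}

# Hypothetical sample record, modeled on the line from the question.
my $record = "-72\r&nbsp(dBm)&nbsp(High)</td>\n";
say show_line_endings($record);   # -72[CR]&nbsp(dBm)&nbsp(High)</td>[LF]
```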

I used vi to create three files in three different formats ("mac" = "\r", "unix" = "\n", and "dos" = "\r\n"), then I used the Unix cat command to combine them into a single bastardized file.
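If vi isn't at hand, a similar mixed-format test file can be produced directly from Perl. A minimal sketch, assuming the File::Temp core module is available; the section() helper and the sample words are hypothetical, chosen to mirror the output shown below:

```perl
use 5.12.0;
use warnings;
use File::Temp ();

# Hypothetical sample content for each section.
my @words = ('this', 'is', 'a', 'test of my', 'program');

# Build one section: a header plus the sample words, each ended by $eol.
sub section {
    my ($header, $eol) = @_;
    return join('', map { $_ . $eol } $header, @words);
}

my $mixed = section('MAC FILE',     "\r")
          . section('WINDOWS FILE', "\r\n")
          . section('UNIX FILE',    "\n");

# Write the bytes exactly as built; :raw disables any newline translation.
my $tmp = File::Temp->new;    # file is removed when $tmp goes out of scope
binmode $tmp, ':raw';
print {$tmp} $mixed;
close $tmp;
say 'Wrote mixed-format test file to ', $tmp->filename;
```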

Here's my program:

use 5.12.0;
use autodie;

open my $test_fh, "<:crlf", "new_test";

local ($/);               #Enable "slurp" mode
my $file = <$test_fh>;    #Whole file is read in.

$file =~ s/[\r\n]+/\n/g;  #Make all line endings just \n

#
# Now "rewrite" the file
#
my @file = split /\n/, $file;
for my $line (@file) {
    say qq(Line: "$line");
}

This prints out:

Line: "MAC FILE"
Line: "this"
Line: "is"
Line: "a"
Line: "test of my"
Line: "program"
Line: "this"
Line: "WINDOWS FILE"
Line: "is"
Line: "a"
Line: "test of my"
Line: "program"
Line: "UNIX FILE"
Line: "this"
Line: "is"
Line: "a"
Line: "test of my"
Line: "program"

As you can see, every line of the MAC FILE section now prints on its own, with its own Line: prefix. If you read the file line by line with a while loop instead, Perl would read that whole section in as one big line, because it contains no \n at all. The s/[\r\n]+/\n/g substitution is what turns each lone \r into a real line break before the split.
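If slurping the whole file is undesirable, a line-by-line variant is possible: read records with the default $/ ("\n"), then split each record on \R, which (since Perl 5.10) matches \r\n as a single break as well as a lone \r or \n. A minimal sketch; read_any_lines is a hypothetical helper, and an in-memory file handle with hypothetical data stands in for new_test:

```perl
use 5.12.0;
use warnings;

# Read records ending in "\n", then break each one on any line ending.
# A "\r\n" pair can never be split across two records, because records end at "\n".
sub read_any_lines {
    my ($fh) = @_;
    my @lines;
    while (my $record = <$fh>) {
        push @lines, split /\R/, $record;   # trailing empty field is dropped by split
    }
    return @lines;
}

# Hypothetical mixed-format data in an in-memory file.
my $data = "MAC FILE\rthis\ris\rUNIX FILE\nthis\nWINDOWS FILE\r\nthis\r\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @lines = read_any_lines($fh);
close $fh;
say qq(Line: "$_") for @lines;
```

Unlike the slurp approach, this doesn't hold the whole file in memory, although a file that is entirely in old Mac format still arrives as a single record.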

Take a look at my open statement. I use the three-argument form, which avoids some pitfalls of the two-argument form, such as a filename with leading or trailing whitespace, or one beginning with < or >, being misread. The nice thing is that you can also attach I/O layers or encodings to the file. For example, the <:crlf mode converts Windows-style \r\n line endings to just \n as the file is read, but won't touch Unix files. It's a life saver for those who work in mixed Unix/Windows environments.
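A small sketch of what the :crlf layer does and doesn't do, using an in-memory file handle (open on a scalar reference) instead of a real file; the sample data is hypothetical:

```perl
use 5.12.0;
use warnings;

# Hypothetical sample: a Windows line, a Unix line, and a line with a lone \r.
my $data = "dos\r\nunix\nmac\rstill mac\n";

open my $fh, '<:crlf', \$data or die "Cannot open in-memory file: $!";
my @records = <$fh>;
close $fh;

# :crlf turns "\r\n" into "\n", leaves "\n" alone, and keeps the lone "\r".
for my $record (@records) {
    (my $shown = $record) =~ s/\r/[CR]/g;
    $shown =~ s/\n/[LF]/g;
    say $shown;
}
```

The third record still contains its \r, which is why :crlf alone doesn't help with Mac-style line endings.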

I was hoping to find a similar layer for old Mac-style text files (before Mac OS X, Macintosh text files ended each line with just a \r and no \n at all). That would have really solved the issue. Unfortunately, I didn't find any documentation on one. It's been a long time since anyone had to deal with pre-OS X Macintosh text files.
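For a file that is known to use old Mac line endings throughout, one workaround (not a layer) is to change the input record separator to "\r" while reading it. A minimal sketch; read_mac_lines is a hypothetical helper, and it will not handle a mixed file like the one above:

```perl
use 5.12.0;
use warnings;

# Read a file whose lines all end in a bare carriage return.
sub read_mac_lines {
    my ($fh) = @_;
    local $/ = "\r";          # each record now ends at a carriage return
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;          # chomp removes $/, so this strips the \r
        push @lines, $line;
    }
    return @lines;            # $/ is restored when the sub returns
}

my $data = "this\ris\ra\rtest of my\rprogram\r";   # hypothetical sample
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @lines = read_mac_lines($fh);
close $fh;
say qq(Line: "$_") for @lines;
```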

Question 2

Carriage return is \r. It is listed in perldoc perlreref. Removing it from your input, for example in that loop of yours, can be done like so:

while (<hLOGFILE>) { 
    s/\r//g;
    print "Line no $.  Text is $_ ";
}

Alternatives

tr/\r//d;        # same effect as s/\r//g, but without a regex
s/[\r\n]+$//;    # strip trailing \r and \n only; a \r in the middle of the line stays
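A quick comparison of these three forms on a line that has both an embedded \r (as in the question) and a Windows ending. The sample line is hypothetical:

```perl
use 5.12.0;
use warnings;

# Hypothetical line: an embedded \r plus a trailing \r\n.
my $line = "-72\r&nbsp(High)</td>\r\n";

(my $via_s   = $line) =~ s/\r//g;        # every \r removed, the \n is kept
(my $via_tr  = $line) =~ tr/\r//d;       # same result without a regex
(my $via_eol = $line) =~ s/[\r\n]+$//;   # only the trailing \r\n is removed

for my $result ($via_s, $via_tr, $via_eol) {
    (my $shown = $result) =~ s/\r/[CR]/g;
    $shown =~ s/\n/[LF]/g;
    say $shown;
}
```

For the overwrite problem described in Question 1, the first two forms are the ones that help; the third only cleans up the end of the line.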

Question 3

You could chomp() it off...

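open(hLOGFILE, "<", "output.txt") or die "Cannot open output.txt: $!";
while (<hLOGFILE>)
{
    chomp();
    print "Line no $.  Text is $_ \n" if length $_;   # skip empty lines (a line of "0" is kept)
}
close(hLOGFILE);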
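A quick way to check what chomp() actually removes, using in-memory strings so no I/O layer is involved. strip_eol is a hypothetical helper name:

```perl
use 5.12.0;
use warnings;

# Hypothetical helper: remove one trailing "\n" or "\r\n", regardless of $/.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

my $once = "text\r\n";
chomp $once;                 # with the default $/ ("\n") this leaves "text\r"
my $twice = $once;
chomp $twice;                # still "text\r": there is no "\n" left to remove

say length $twice;           # 5
say strip_eol("text\r\n");   # text
```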

Note that chomp() removes only a trailing copy of $/, which is normally "\n", so calling it twice won't get rid of a \r left over from a \r\n line ending; s/\r?\n\z// handles both cases. You may want to add something to strip out all those HTML tags as well. See: How can I strip HTML in a string using Perl?