Question

I'm writing a simple wrapper-class for scanning a stream of characters character-by-character.

Scanner scanner("Hi\r\nYou!");
const char* current = scanner.cchar();
while (*current != 0) {
    printf("Char: %d, Column: %d, Line: %d\n", *current, scanner.column(), scanner.line());
    current = scanner.read();
}

C:\Users\niklas\Desktop>g++ main.cpp -o main.exe
C:\Users\niklas\Desktop>main.exe
Char: 72, Column: 0, Line: 0
Char: 105, Column: 1, Line: 0
Char: 13, Column: 0, Line: 1
Char: 10, Column: 0, Line: 2
Char: 89, Column: 1, Line: 2
Char: 111, Column: 2, Line: 2
Char: 117, Column: 3, Line: 2
Char: 33, Column: 4, Line: 2

This example already shows the problem I'm stuck with. One can interpret \r as a new-line, as well as \n. But together (\r\n) they are just a single new-line as well!

The function that processes line- and column-numbers is this:

void _processChar(int revue) {
    char chr = _source[_position];
    if (chr == '\r' or chr == '\n') {
        _line += revue;
        _column = 0;
    }
    else {
        _column += revue;
    }
}

Sure, I could just look at the character that appears after the character at the current position, but: I do not check for NULL-termination on the source because I want to be able to process character streams that may contain \0 characters without being terminated at that point.

How can I handle CRLF this way?

Edit 1: DOH! This seems to be working fine. Is this safe in any case or do I have an issue somewhere?

void _processChar(int revue) {
    char chr = _source[_position];

    bool is_newline = (chr == '\r' or chr == '\n');
    if (chr == '\n' and _position > 0) {
        is_newline = (_source[_position - 1] != '\r');
    }

    if (is_newline) {
        _line += revue;
        _column = 0;
    }
    else {
        _column += revue;
    }
}

Thanks!

Solution 4

This seems legit to me:

void _processChar() {
    char chr = _source[_position];

    // Treat CRLF as a single new-line: a '\n' that directly follows a
    // '\r' was already counted, so it changes neither line nor column.
    if (chr == '\n' and _position > 0 and _source[_position - 1] == '\r') {
        return;
    }

    if (chr == '\r' or chr == '\n') {
        _line += 1;
        _column = 0;
    }
    else {
        _column += 1;
    }
}

At the point where a \n is processed, it checks whether the previous character is carriage return (\r). If so, the line-number is not increased.

Also, before it checks the previous character, it tests whether there is actually a previous character (and _position > 0).

I've removed the int revue argument because I just noticed that what I wanted to achieve is not possible the way I tried to achieve it. I wanted to be able to go backwards in the source, but then I cannot recover the column number of the previous line.
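
For reference, here is a self-contained sketch of the previous-character check. The Scanner interface shown (explicit length, advance(), at_end(), current()) is an assumption made for illustration, not the original class. It also assumes that the \n of a CRLF pair should leave the column untouched, and that line() and column() describe the character the scanner currently points at.

```cpp
#include <cstddef>

// Hypothetical stand-in for the Scanner in the question. The source is
// passed with an explicit length, so embedded '\0' characters are fine.
class Scanner {
public:
    Scanner(const char* source, std::size_t length)
        : _source(source), _length(length) {}

    bool at_end() const { return _position >= _length; }
    char current() const { return _source[_position]; }  // requires !at_end()
    int line() const { return _line; }
    int column() const { return _column; }

    // Account for the current character, then step past it.
    void advance() {
        if (at_end()) return;
        _processChar();
        ++_position;
    }

private:
    void _processChar() {
        char chr = _source[_position];

        // The '\n' of a CRLF pair was already counted when the '\r' was
        // processed, so it changes neither the line nor the column.
        if (chr == '\n' && _position > 0 && _source[_position - 1] == '\r') {
            return;
        }
        if (chr == '\r' || chr == '\n') {
            _line += 1;
            _column = 0;
        } else {
            _column += 1;
        }
    }

    const char* _source;
    std::size_t _length;
    std::size_t _position = 0;
    int _line = 0;
    int _column = 0;
};
```

Because the end of input is decided by the length rather than by a terminator, the look-behind at _position - 1 never leaves the buffer, and a \0 inside the data is counted like any other character.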

OTHER TIPS

When input is read through a text-mode stream, the C and C++ standard libraries translate the target platform's native newline sequence to \n, so in that case all of this happens automatically and you only need to check for \n.

You may need to keep state inside your stream wrapper -- a stateless wrapper, as you've noticed, simply cannot do this, because every output can (by definition) depend on the previous output.

Your _processChar doesn’t appear to increment the stream read position. Once you change that, you can implement the full newline check:

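void _processChar(int revue) {
    char chr = _source[_position];
    if (chr != '\r' and chr != '\n') {
        _column += revue;
        return;
    }
    // Consume the '\n' of a CRLF pair so it is not counted twice.
    // Note: this reads one character ahead, so the caller must make
    // sure that _position + 1 is still inside the buffer.
    if (chr == '\r' and _source[_position + 1] == '\n')
        ++_position;
    _line += revue;
    _column = 0;
}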
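
The look-ahead at _source[_position + 1] can read past the end of a buffer that is not null-terminated. Below is a hedged sketch of a bounds-checked variant, assuming the buffer length is known. newline_length and count_newlines are hypothetical helpers introduced only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: number of characters occupied by the new-line
// sequence starting at data[pos]; 0 if there is none, 2 for CRLF.
std::size_t newline_length(const char* data, std::size_t length,
                           std::size_t pos) {
    if (pos >= length) return 0;
    if (data[pos] == '\n') return 1;
    if (data[pos] == '\r') {
        // Only look ahead when the next index is still inside the buffer.
        bool has_next = (pos + 1 < length);
        return (has_next && data[pos + 1] == '\n') ? 2 : 1;
    }
    return 0;
}

// Counts new-lines, treating "\r", "\n" and "\r\n" as one new-line each.
std::size_t count_newlines(const char* data, std::size_t length) {
    std::size_t count = 0;
    std::size_t pos = 0;
    while (pos < length) {
        std::size_t n = newline_length(data, length, pos);
        if (n > 0) {
            ++count;
            pos += n;  // skip the whole sequence, including the '\n' of CRLF
        } else {
            ++pos;
        }
    }
    return count;
}
```

The length check replaces the reliance on a terminator, so data containing \0 characters is handled the same way as any other input.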
Licensed under: CC-BY-SA with attribution