How can I be certain how many whitespace characters fmt.Fscanf consumes?

https://stackoverflow.com/questions/15841257

01-04-2022

Question

I am trying to implement a PPM decoder in Go. PPM is an image format that consists of a plaintext header and then some binary image data. The header looks like this (from the spec):

Each PPM image consists of the following:

A "magic number" for identifying the file type. A ppm image's magic number is the two characters "P6".

Whitespace (blanks, TABs, CRs, LFs).

A width, formatted as ASCII characters in decimal.

Whitespace.

A height, again in ASCII decimal.

Whitespace.

The maximum color value (Maxval), again in ASCII decimal. Must be less than 65536 and more than zero.

A single whitespace character (usually a newline).

I am trying to decode this header with fmt.Fscanf. The following call parses the header, ignoring for now the caveat explained below:

var magic string
var width, height, maxVal uint

fmt.Fscanf(input, "%2s %d %d %d", &magic, &width, &height, &maxVal)

The documentation of fmt states:

Note: Fscan etc. can read one character (rune) past the input they return, which means that a loop calling a scan routine may skip some of the input. This is usually a problem only when there is no space between input values. If the reader provided to Fscan implements ReadRune, that method will be used to read characters. If the reader also implements UnreadRune, that method will be used to save the character and successive calls will not lose data. To attach ReadRune and UnreadRune methods to a reader without that capability, use bufio.NewReader.

Since the very next character after the final whitespace is already the beginning of the image data, I have to be certain how many whitespace characters fmt.Fscanf consumed after reading MaxVal. My code must work on whatever reader was provided by the caller, and parts of it must not read past the end of the header. Wrapping the input in a buffered reader is therefore not an option: the buffered reader might read more from the input than I actually want to read.

Some testing suggests that parsing a dummy character at the end solves the issue:

var magic string
var width, height, maxVal uint
var dummy byte

fmt.Fscanf(input, "%2s %d %d %d%c", &magic, &width, &height, &maxVal, &dummy)

Is that guaranteed to work according to the specification?

Solution

No, I would not consider that safe. While it works now, the documentation states that the scan functions may read one character past the value they return unless the reader has an UnreadRune() method.

By wrapping your reader in a bufio.Reader, you can ensure the reader has an UnreadRune() method. You will then need to read the final whitespace yourself.

buf := bufio.NewReader(input)
fmt.Fscanf(buf, "%2s %d %d %d", &magic, &width, &height, &maxVal)
buf.ReadRune() // consume the next rune (the single whitespace) from the buffer

Edit:

As we discussed in the chat, you can assume the dummy char method works and then write a test so you know when it stops working. The test can be something like:

import (
    "bytes"
    "fmt"
    "io"
    "testing"
)

func TestFmtBehavior(t *testing.T) {
    // Wrap the reader in io.MultiReader so that r does not implement io.RuneScanner.
    r := io.MultiReader(bytes.NewReader([]byte("data  ")))

    n, err := fmt.Fscanf(r, "%s%c", new(string), new(byte))
    if n != 2 || err != nil {
        t.Error("failed scan", n, err)
    }

    // The dummy %c consumed exactly one character past "data",
    // so exactly one byte should remain in r.
    if n, err := r.Read(make([]byte, 5)); n != 1 {
        t.Error("unexpected number of bytes left after scan", n, err)
    }
}

Licensed under: CC-BY-SA with attribution