Why does the C stdio 'ungetc' function exist?

https://softwareengineering.stackexchange.com/questions/330181

27-12-2020

Question

In the C programming language (and in many later languages that either interface directly with C's standard I/O functions or reimplement them), there is a function called ungetc: int ungetc(int c, FILE *stream);. It 'puts back' a character at the front of the stream that is being read. This putting back is only virtual: the underlying input is not altered; subsequent 'getc' calls simply return the 'ungetc' values first before continuing with the real next values in the stream.

Why does this function exist? What are examples of use cases that can only be handled by using 'ungetc'?

Solution

The short answer is that ungetc allows you to peek at the next character without consuming it.
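As a minimal sketch of that idea: stdio has no peek function of its own, but one can be built from getc plus ungetc. The helper names peekc and read_uint below are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the next character of `stream` without
   consuming it, or EOF at end of input. */
int peekc(FILE *stream)
{
    assert(stream != NULL);
    int c = getc(stream);
    if (c != EOF)
        ungetc(c, stream);      /* the next getc returns c again */
    return c;
}

/* Hypothetical helper: read an unsigned decimal number. The first
   non-digit ends the number and is pushed back, so the caller still
   sees it on the next read. */
long read_uint(FILE *stream)
{
    assert(stream != NULL);
    long value = 0;
    int c;
    while ((c = getc(stream)) != EOF) {
        if (c < '0' || c > '9') {
            ungetc(c, stream);  /* not ours: leave it for the caller */
            break;
        }
        value = value * 10 + (c - '0');
    }
    return value;
}
```

The second function is the typical tokenizer situation: you only learn that a token has ended by reading one character too many, and ungetc lets you hand that character back.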

Let's say you're reading a packetized data format. It contains, among other things, a frame sync pattern. Frame sync patterns allow you to align data by marking the beginning of a data frame in an otherwise unsynchronized data stream.

To facilitate the discussion, here's a data definition:

[sync pattern] [packet length] [--------------data--------------] [checksum]
|---0xEB25---| |-- 16 bits --| |-- packet length minus 64 bits--| |32 bits |

The sync pattern EB25 is chosen for a number of reasons. Its bit pattern is resistant to false positives, and it's unique enough to serve as a file type "magic number."

The checksum is there to detect transmission errors and to validate the sync pattern (since EB25 has a small chance of actually being valid data). When combined with an accurate packet length, the combination of sync pattern, packet length and checksum virtually guarantees that you have identified a valid data packet.

Now imagine going through this exercise without the ability to back up to a previous point in the data stream. To find the next packet, you must scan bytes until you identify the sync pattern EB25, taking into account that its bytes arrive in reverse order because the spec is little-endian. Once you have identified the sync pattern, you must read the packet length, then the remainder of the packet, and compute a checksum. If the checksum fails, you must start over from the byte following the failed sync pattern. To do that, you must back up to the start of the sync pattern + 2 bytes (the length of the 16-bit sync pattern) and begin scanning again.
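The scanning step on its own can be sketched with a single byte of push-back. This is an illustration under assumptions, not the real decoder: find_sync and the two macros are hypothetical names, and the low byte 0x25 is assumed to arrive first because of the little-endian layout described above.

```c
#include <assert.h>
#include <stdio.h>

#define SYNC_LO 0x25   /* low byte of 0xEB25, first on the wire (assumed) */
#define SYNC_HI 0xEB   /* high byte, second on the wire */

/* Hypothetical helper: advance `stream` to just past the next sync
   pattern. Returns 1 if a pattern was found, 0 if input ran out. */
int find_sync(FILE *stream)
{
    assert(stream != NULL);
    int c;
    while ((c = getc(stream)) != EOF) {
        if (c != SYNC_LO)
            continue;
        int next = getc(stream);
        if (next == SYNC_HI)
            return 1;            /* positioned at the packet length */
        if (next == EOF)
            return 0;
        /* Not a sync pattern, but `next` could itself be the 0x25 that
           starts a real one, so hand it back instead of skipping it. */
        ungetc(next, stream);
    }
    return 0;
}
```

Without the ungetc call, the input 25 25 EB would be missed, because the second 25 would be consumed as the failed second byte. Note that the C standard only guarantees one character of push-back, so rewinding over a whole packet after a failed checksum still needs fseek on a seekable stream or a buffer of your own.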

So far, I haven't described anything that couldn't also be accomplished by buffering the input stream. But what if the sync pattern is not guaranteed to align on a byte boundary? In that situation, the first bit of the sync pattern could occur in the middle of a byte. So to get the first 8 bits, you would have to read two bytes, not just one. Under these conditions, wouldn't it be useful to scrub backwards one byte if no 8 consecutive bits in those two bytes were 0xEB (without standing up a buffered reader)?
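Here is a rough sketch of that unaligned search. Several things are assumptions made for the illustration: the name find_unaligned is hypothetical, bits are taken most-significant-bit first, and only a single 8-bit pattern is searched for rather than the full 16-bit sync word.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: search for an 8-bit `pattern` starting at any
   bit offset (MSB-first, an assumption). Returns the bit offset 0..7
   inside the byte where the match starts, or -1 if input runs out.
   On a match the stream is left at the byte that holds the first bit
   after the pattern. */
int find_unaligned(FILE *stream, unsigned pattern)
{
    assert(stream != NULL);
    int a, b;
    while ((a = getc(stream)) != EOF) {
        b = getc(stream);
        if (b == EOF)                      /* last byte: aligned match only */
            return ((unsigned)a == pattern) ? 0 : -1;
        /* The second byte is needed again either way: as the first byte
           of the next window, or for the bits that follow a match. */
        ungetc(b, stream);
        unsigned window = ((unsigned)a << 8) | (unsigned)b;
        for (int k = 0; k < 8; k++) {
            if (((window >> (8 - k)) & 0xFFu) == pattern)
                return k;
        }
    }
    return -1;
}
```

For example, the bytes 0E B0 contain 0xEB starting at bit 4 of the first byte, so the function returns 4 and the next getc yields B0 again. The sliding two-byte window is exactly the case where one byte of push-back is enough and a separate buffered reader is not needed.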

This isn't just an idle hypothetical. The IRIG 106 Chapter 10 specification works exactly this way, although I've simplified the story somewhat for this demonstration.

Licensed under: CC-BY-SA with attribution
