Question

How does a C/C++ compiler handle the escape character ["\"] in source code? How is the compiler's grammar written to process that character? What does the compiler do after encountering it?

Solution

Most compilers are divided into phases. The first phase of the compiler front end is the lexical analyzer, or scanner. This part of the compiler reads the actual characters and groups them into tokens. It has a state machine which decides, upon seeing a backslash, what the character means in context: inside a string or character literal it does not stand for itself but starts an escape sequence that changes the meaning of the next character. The scanner translates the sequence into the character it denotes (such as a tab or a newline) and passes the resulting token to the next part of the compiler (the parser). In this way the state machine groups several source characters into a single token.
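The state machine described above can be sketched in a few lines of C. This is only an illustration, not how any particular compiler is written: the names decode_simple_escape and scan_string_body are hypothetical, only a handful of escapes are handled, and octal, hexadecimal and universal character escapes are left out.

```c
#include <stddef.h>

/* Hypothetical helper: map the character after a backslash to the
   character the escape sequence denotes. Only a few escapes are handled. */
int decode_simple_escape(int c) {
    switch (c) {
    case 'n':  return '\n';
    case 't':  return '\t';
    case 'a':  return '\a';
    case '0':  return '\0';
    case '\\': return '\\';
    case '"':  return '"';
    case '\'': return '\'';
    default:   return c;   /* unknown escape: pass the character through */
    }
}

/* Scan the body of a string literal, starting just after the opening quote.
   Writes the decoded, NUL-terminated characters to out (capacity cap) and
   returns their count, or -1 if the literal is unterminated or too long. */
int scan_string_body(const char *src, char *out, size_t cap) {
    enum { NORMAL, AFTER_BACKSLASH } state = NORMAL;
    size_t n = 0;

    if (cap == 0) return -1;
    for (; *src != '\0'; ++src) {
        if (state == AFTER_BACKSLASH) {
            if (n + 1 >= cap) return -1;
            out[n++] = (char)decode_simple_escape((unsigned char)*src);
            state = NORMAL;
        } else if (*src == '\\') {
            state = AFTER_BACKSLASH;   /* remember it, emit nothing yet */
        } else if (*src == '"') {
            out[n] = '\0';             /* closing quote: token complete */
            return (int)n;
        } else {
            if (n + 1 >= cap) return -1;
            out[n++] = *src;
        }
    }
    return -1;   /* ran out of input before the closing quote */
}
```

Given the six source characters a, backslash, t, b and a closing quote, the sketch produces the three characters 'a', TAB, 'b'. The parser only ever sees the finished string token.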

OTHER TIPS

An interesting read on this subject is Ken Thompson's Reflections on Trusting Trust [PDF link].

The paper describes one way a compiler can handle exactly this problem. It shows that a C compiler written in C need not contain an explicit translation of the escape codes into ASCII values, and how to bootstrap a new escape code into the compiler so that the knowledge of the ASCII value for the new code also becomes implicit.
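The bootstrapping idea can be sketched roughly as follows. This is loosely modeled on the paper's example of adding \v; the function names decode_escape_stage1 and decode_escape_stage2 are hypothetical, and the value 11 assumes an ASCII execution character set.

```c
/* Stage 1: the compiler used to build this source does not yet know '\v',
   so the numeric code (11 in ASCII) has to be spelled out once. */
int decode_escape_stage1(int c) {
    if (c == 'n')  return '\n';   /* meaning inherited from the old binary */
    if (c == 't')  return '\t';
    if (c == '\\') return '\\';
    if (c == 'v')  return 11;     /* explicit ASCII value */
    return c;
}

/* Stage 2: once a compiler built from stage 1 exists, the source can be
   rewritten in the self-referential style. The number 11 now survives
   only in the compiled binary, not in the source text. */
int decode_escape_stage2(int c) {
    if (c == 'n')  return '\n';
    if (c == 't')  return '\t';
    if (c == '\\') return '\\';
    if (c == 'v')  return '\v';   /* the compiler already "knows" this */
    return c;
}
```

Both functions behave identically on an ASCII system. The difference is only in where the knowledge of the code value is stored.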

The backslash generally escapes the character that follows it:

  • In a string literal or character literal, it escapes the next character. For example, \a means 'alert' (flashing the terminal, beeping or whatever), \n means 'newline' (linefeed), and \xNUM denotes the character whose code is the hexadecimal number NUM.
  • If it is immediately followed by a newline, whether within a string or not (and even within a single-line // comment!), it acts as a line continuation: the backslash and the newline are both removed, and the next line is merged with the current line. This splicing happens in an early translation phase, before tokens are formed.
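Both uses can be observed in a short piece of C. This is a minimal sketch with hypothetical function names. Some compilers warn about the continued comment (GCC reports it as a multi-line comment), but the code is valid.

```c
#include <stddef.h>
#include <string.h>

/* Escape sequences: each one is a single character in the compiled string. */
size_t escaped_length(void) {
    return strlen("a\tb\n");   /* 'a', TAB, 'b', newline */
}

/* Line continuation inside a string literal: the backslash and the newline
   are removed before tokenization, so this is the literal "hello world". */
const char *spliced_string(void) {
    return "hello \
world";
}

/* Line continuation at the end of a // comment: the following line becomes
   part of the comment, so the assignment to x is never compiled. */
int comment_trap(void) {
    int x = 1;
    // the next line is swallowed by this comment \
    x = 2;
    return x;
}
```

Here escaped_length() returns 4, spliced_string() yields "hello world", and comment_trap() returns 1 because the assignment x = 2 has become comment text. Note that the continued line of the string starts in column one. Any leading spaces there would become part of the string.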

An escape character together with the character that follows it (like \n) is a single character to the C compiler. The scanner presents it to the parser as part of a character or string token, so the parser needs no special syntax rules for the escape character.

Licensed under: CC-BY-SA with attribution