Question

I've been doing some research into this topic and haven't found good concrete answers. Let's say you have these expression in your code:

B = 2
…
B = B + 5
…
B = J + B
…

(These are very simple examples, I know they aren't realistic)

B has many different values throughout these lines. In line one it's 2, later on it becomes 7, and more later on it's 7 + J. A compiler would need to keep track of these different values for B, so one way is to rename them. For example, when B is redefined as B = B+5, it could be changed to B1 = B+5. The last redefinition would then look like B2 = J+B1.

The motivation behind this idea involves an optimizing program I'm building. It involves replacing variables with expressions that they're related to. If a variable is redefined, however, then the character 'B' can mean multiple things at once. The method I'm using to keep track of things is what I've described above, redefining variable names.

Is this at all how compiler's work? Is there a name for this?

I'm trying to find out as much as I can about this process of a compiler redefining variables in the case of redefining variables.

If it helps, I believe this would be done at a pre-processing stage of compiling, and I believe it's a similar concept to macro-expansion.

EDIT: I added a little more context to the question.

Was it helpful?

Solution

Your hunch is correct, many modern compilers use flow analysis to rename variables so that each variable is unique. The resulting form is called "single static assignment" or SSA for short.

Input:

B = 2
B = B + 5
B = J + B

Output:

B1 = 2
B2 = B1 + 5
B3 = J + B2

There are additional parts to this for working with branches and loops, such as:

Input:

if X < 5
    B = Y + Z
else
    B = 2
B = B + 1

Output:

if X < 5:
    B1 = Y + Z
else
    B2 = 2
B3 = phi(B1, B2)
B4 = B3 + 1

The "phi" function selects whichever of its inputs is live.

This is NOT done during preprocessing, it is done after the code is compiled to some IR, usually consisting of basic blocks. It is NOT similar to macro-expansion.

OTHER TIPS

What you describe has been formalized as Static Single Assignment (SSA) form. It's a bit more invasive than "rename variables upon assignment", because you must also know the "current" variable to read from in the face of control flow, for example if you rewrite this:

if (a) x = 0;
else x = 1;
print(x);

into this, you have to insert a so-called phi node to select the correct value at the print:

if (a) x0 = 0;
else x1 = 1;
print(<which x?>)

Typically, the IR has SSA built in and thus the code is turned into SSA while it is translated into IR (or shortly thereafter). Macro expansion happens long before that, typically on the token stream or on the AST depending on how powerful your macros are.

Note however that this is in no way mandatory. It's useful for some optimizations, but not required (and some optimizations don't benefit at all). You can perform the same optimizations working with mutable variables (and many compiler IRs featuring SSA leave at least the heap as non-SSA), it's just less convenient and possibly more expensive. For example, when propagating constants you have to make sure there aren't any other assignments in between the constant and the use you're replacing, but you can check that quite easily without SSA.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top