What is referential transparency?

https://softwareengineering.stackexchange.com/questions/254304

05-10-2020
|

Question

I have seen that in imperative paradigms

f(x)+f(x)

might not be the same as:

2*f(x)

But in a functional paradigm it should be the same. I have tried to implement both cases in Python and Scheme, but for me they look pretty straightforward the same.

What would be an example that could point out the difference with the given function?

Solution

Referential transparency, referred to a function, indicates that you can determine the result of applying that function only by looking at the values of its arguments. You can write referentially transparent functions in any programming language, e.g. Python, Scheme, Pascal, C.

On the other hand, in most languages you can also write non referentially transparent functions. For example, this Python function:

counter = 0

def foo(x):
  global counter

  counter += 1
  return x + counter

is not referentially transparent, in fact calling

foo(x) + foo(x)

and

2 * foo(x)

will produce different values, for any argument x. The reason for this is that the function uses and modifies a global variable, therefore the result of each invocation depends on this changing state, and not only on the function's argument.

Haskell, a purely functional language, strictly separates expression evaluation in which pure functions are applied and which is always referentially transparent, from action execution (processing of special values), which is not referentially transparent, i.e. executing the same action can have each time a different result.

So, for any Haskell function

f :: Int -> Int

and any integer x, it is always true that

2 * (f x) == (f x) + (f x)

An example of an action is the result of the library function getLine:

getLine :: IO String

As a result of expression evaluation, this function (actually a constant) first of all produces a pure value of type IO String. Values of this type are values like any other: you can pass them around, put them in data structures, compose them using special functions, and so on. For example you can make a list of actions like so:

[getLine, getLine] :: [IO String]

Actions are special in that you can tell the Haskell runtime to execute them by writing:

main = <some action>

In this case, when your Haskell program is started, the runtime walks through the action bound to main and executes it, possibly producing side-effects. Therefore, action execution is not referentially transparent because executing the same action two times can produce different results depending on what the runtime gets as input.

Thanks to Haskell's type system, an action can never be used in a context where another type is expected, and vice versa. So, if you want to find the length of a string you can use the length function:

length "Hello"

will return 5. But if you want to find the length of a string read from the terminal, you cannot write

length (getLine)

because you get a type error: length expects an input of type list (and a String is, indeed, a list) but getLine is a value of type IO String (an action). In this way, the type system ensures that an action value like getLine (whose execution is carried out outside the core language and which may be non-referentially transparent) cannot be hidden inside a non-action value of type Int.

EDIT

To answer exizt question, here is a small Haskell program that reads a line from the console and prints its length.

main :: IO () -- The main program is an action of type IO ()
main = do
          line <- getLine
          putStrLn (show (length line))

The main action consists of two subactions that are executed sequentially:

getline of type IO String,
the second is constructed by evaluating the function putStrLn of type String -> IO () on its argument.

More precisely, the second action is built by

binding line to the value read by the first action,
evaluating the pure functions length (compute length as an integer) and then show (turn the integer into a string),
building the action by applying function putStrLn to the result of show.

At this point, the second action can be executed. If you have typed "Hello", it will print "5".

Note that if you get a value out of an action using the <- notation, you can only use that value inside another action, e.g. you cannot write:

main = do
          line <- getLine
          show (length line) -- Error:
                             -- Expected type: IO ()
                             --   Actual type: String

because show (length line) has type String whereas the do notation requires that an action (getLine of type IO String) be followed by another action (e.g. putStrLn (show (length line)) of type IO ()).

EDIT 2

Jörg W Mittag's definition of referential transparency is more general than mine (I have upvoted his answer). I have used a restricted definition because the example in the question focuses on the return value of functions and I wanted to illustrate this aspect. However, RT in general refers to the meaning of the whole program, including changes to global state and interactions with the environment (IO) caused by evaluating an expression. So, for a correct, general definition, you should refer to that answer.

OTHER TIPS

def f(x): return x()

from random import random
f(random) + f(random) == 2*f(random)
# => False

However, that's not what Referential Transparency means. RT means that you can replace any expression in the program with the result of evaluating that expression (or vice versa) without changing the meaning of the program.

Take, for example, the following program:

def f(): return 2

print(f() + f())
print(2)

This program is referentially transparent. I can replace one or both occurences of f() with 2 and it will still work the same:

def f(): return 2

print(2 + f())
print(2)

def f(): return 2

print(f() + 2)
print(2)

def f(): return 2

print(2 + 2)
print(f())

will all behave the same.

Well, actually, I cheated. I should be able to replace the call to print with its return value (which is no value at all) without changing the meaning of the program. However, clearly, if I just remove the two print statements, the meaning of the program will change: before, it printed something to the screen, after it doesn't. I/O isn't referentially transparent.

The simple rule of thumb is: if you can replace any expression, sub-expression or subroutine call with the return value of that expression, sub-expression or subroutine call anywhere in the program, without the program changing its meaning, then you have referential transparency. And what this means, practically speaking is that you can't have any I/O, can't have any mutable state, can't have any side-effects. In every expression, the value of the expression must depend solely on the values of the constituent parts of the expression. And in every subroutine call, the return value must depend solely on the arguments.

Parts of this answer are taken directly from an unfinished tutorial on functional programming, hosted on my GitHub account:

A function is said to be referentially transparent if it, given the same input parameters, always produces the same output (return value). If one is looking for a raison d'être for pure functional programming, referential transparency is a good candidate. When reasoning with formulae in algebra, arithmetic, and logic, this property — also called substitutivity of equals for equals — is so fundamentally important that it is usually taken for granted...

Consider a simple example:

x = 42

In a pure functional language, the left-hand and right-hand side of the equals sign are substitutable for each other both ways. That is, unlike in a language like C, the above notation truly asserts an equality. A consequence of this is that we can reason about program code just like mathematical equations.

From Haskell wiki:

Pure computations yield the same value each time they are invoked. This property is called referential transparency and makes possible to conduct equational reasoning on the code...

To contrast this, the type of operation performed by C-like languages is sometimes referred to as a destructive assignment.

The term pure is often used to describe a property of expressions, relevant to this discussion. For a function to be considered pure,

it is not allowed to exhibit any side effects, and
it must be referentially transparent.

According to the black-box metaphor, found in numerous mathematical textbooks, a function's internals are completely sealed off from the outside world. A side-effect is when a function or expression violates this principle — that is, the procedure is allowed to communicate in some way with other program units (e.g. to share and exchange information).

In summary, referential transparency is a must for functions to behave like true, mathematical functions also in the semantics of programming languages.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange