Question

Sorry for yet another FP + side effects question, but I couldn't find an existing one which quite answered this for me.

My (limited) understanding of functional programming is that state/side effects should be minimised and kept separate from stateless logic.

I also gather Haskell’s approach to this, the IO monad, achieves this by wrapping stateful actions in a container, for later execution, considered outside the scope of the program itself.

I’m trying to understand this pattern, but actually to determine whether to use it in a Python project, so want to avoid Haskell specifics if poss.

Crude example incoming.

If my program converts an XML file to a JSON file:

def main():
    xml_data = read_file('input.xml')  # impure
    json_data = convert(xml_data)  # pure
    write_file('output.json', json_data) # impure

Isn’t the IO monad’s approach effectively to do this:

steps = [
    read_file,
    convert,
    write_file,
]

then absolve itself of responsibility by not actually calling those steps, but letting the interpreter do it?

Or put another way, it’s like writing:

def main():  # pure
    def inner():  # impure
        xml_data = read_file('input.xml')
        json_data = convert(xml_data)
        write_file('output.json', json_data)
    return inner

then expecting someone else to call inner() and saying your job is done because main() is pure.

The whole program is going to end up contained in the IO monad, basically.

When the code is actually executed, everything after reading the file depends on that file’s state so will still suffer from the same state-related bugs as the imperative implementation, so have you actually gained anything, as a programmer who will maintain this?

I totally appreciate the benefit of reducing and isolating stateful behaviour, which is in fact why I structured the imperative version like that: gather inputs, do pure stuff, spit out outputs. Hopefully convert() can be completely pure and reap the benefits of cacheability, thread safety, etc.

I also appreciate that monadic types can be useful, especially in pipelines operating on comparable types, but don’t see why IO should use monads unless already in such a pipeline.

Is there some additional benefit to dealing with side effects the IO monad pattern brings, which I’m missing?

Solution

The whole program is going to end up contained in the IO monad, basically.

That's the bit where I think you're not seeing it from the Haskellers' perspective. So we have a program like this:

module Main where

main :: IO ()
main = do
  xmlData <- readFile "input.xml"
  let jsonData = convert xmlData
  writeFile "output.json" jsonData

convert :: String -> String
convert xml = ...

I think a typical Haskeller's take on this would be that convert, the pure part:

  1. Is probably the bulk of this program, and by far more complicated than the IO parts;
  2. Can be reasoned about and tested without having to deal with IO at all.

So they don't see this as convert being "contained" in IO, but rather, as it being isolated from IO. From its type, whatever convert does can never depend on anything that happens in an IO action.

When the code is actually executed, everything after reading the file depends on that file’s state so will still suffer from the same state-related bugs as the imperative implementation, so have you actually gained anything, as a programmer who will maintain this?

I'd say that this splits into two things:

  1. When the program runs, the value of the argument to convert depends on the state of the file.
  2. But what the convert function does, that doesn't depend on the state of the file. convert is always the same function, even if it is invoked with different arguments at different points.

This is a somewhat abstract point, but it's really key to what Haskellers mean when they talk about this. You want to write convert in such a way that given any valid argument, it will produce a correct result for that argument. When you look at it like that, the fact that reading a file is a stateful operation doesn't enter into the equation; all that matters is that whatever argument is fed to it and wherever that may have come from, convert must handle it correctly. And the fact that purity restricts what convert can do with its input simplifies that reasoning.

So if convert produces incorrect results from some arguments, and readFile feeds it such an argument, we don't see that as a bug introduced by state. It's a bug in a pure function!

OTHER TIPS

It's hard to be sure exactly what you mean by "purely academic", but I think the answer is mostly "no".

As explained in Tackling the Awkward Squad by Simon Peyton Jones (strongly recommended reading!), monadic I/O was meant to solve real problems with the way Haskell used to handle I/O. Read the example of the server with Requests and Responses, which I won't copy here; it's very instructive.

Haskell, unlike Python, encourages a style of "pure" computation which can be enforced by its type system. Of course, you can use self-discipline when programming in Python to comply with this style, but what about modules you didn't write? Without much help from the type system (and common libraries), monadic I/O is probably less useful in Python. The language is simply not designed to enforce a strict pure/impure separation.

Note that this says more about the different philosophies of Haskell and Python than about how academic monadic I/O is. I wouldn't use it for Python.

One other thing. You say:

The whole program is going to end up contained in the IO monad, basically.

It's true the Haskell main function "lives" in IO, but Haskell programmers are encouraged not to use IO wherever it isn't needed. Almost every function you write that doesn't need to do I/O shouldn't have an IO type.

So I'd say in your last example you got it backwards: main is impure (because it reads and writes files) but core functions like convert are pure.

Why is IO impure? Because it may return different values at different times. There is a dependency on time that must be accounted for, one way or another. This is even more crucial with lazy evaluation. Consider the following program:

main = do
    putStrLn "Please enter your name"
    name <- getLine
    putStrLn $ "Hello, " ++ name

Without an IO monad, why would the first prompt ever get output? There is nothing depending on it, so lazy evaluation means it will never get demanded. There is also nothing compelling the prompt to be output before the input is read. As far as the computer is concerned, without an IO monad, those first two expressions are completely independent of each other. Fortunately, name imposes an order on the second two.
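Since the question is about Python, here is a minimal sketch of the same idea there. The IO class, its bind and unsafe_run methods, and the put_line/get_line helpers are hypothetical names invented for illustration; this is not a real library and not how Haskell implements IO. It is meant to show two things: building the description runs nothing, and the order of effects comes from the chain of binds.

```python
from typing import Callable, Generic, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class IO(Generic[A]):
    """Hypothetical wrapper: a description of an effect, not its result."""

    def __init__(self, run: Callable[[], A]) -> None:
        self._run = run

    def bind(self, f: "Callable[[A], IO[B]]") -> "IO[B]":
        # Run this action first, then feed its result to f to get the next one.
        return IO(lambda: f(self._run()).unsafe_run())

    def unsafe_run(self) -> A:
        # The only place where effects actually happen.
        return self._run()


# Stand-ins for a terminal, so the sketch runs without real stdin/stdout.
fake_stdin: List[str] = ["Ada"]
fake_stdout: List[str] = []


def put_line(text: str) -> IO[None]:
    return IO(lambda: fake_stdout.append(text))


def get_line() -> IO[str]:
    return IO(lambda: fake_stdin.pop(0))


# Mirrors the Haskell program above: prompt, read, greet.
program = (
    put_line("Please enter your name")
    .bind(lambda _: get_line())
    .bind(lambda name: put_line("Hello, " + name))
)

assert fake_stdout == []  # building the program performed no effects
program.unsafe_run()
# fake_stdout is now ["Please enter your name", "Hello, Ada"]
```

Unlike in Haskell, nothing here stops other code from calling unsafe_run (or print) wherever it likes, so in Python the wrapper documents intent rather than enforcing it.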

There are other ways to solve the problem of order dependency, but using an IO monad is probably the simplest way (from the language point of view at least) to allow everything to stay in the lazy functional realm, without little sections of imperative code. It's also the most flexible. For example, you can relatively easily build an IO pipeline dynamically at runtime based on user input.
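As a rough Python illustration of building a pipeline at runtime: when actions are plain values (here, zero-argument callables), a program can pick and combine them from data before running anything. The names say, sequence, AVAILABLE and build_pipeline are hypothetical, and the list log stands in for real I/O.

```python
from typing import Callable, Dict, List

Action = Callable[[], None]

log: List[str] = []  # stand-in for real side effects


def say(text: str) -> Action:
    # Returns a description of the effect; nothing is logged until it is called.
    return lambda: log.append(text)


def sequence(actions: List[Action]) -> Action:
    # Combine many actions into one that runs them in list order.
    def run() -> None:
        for action in actions:
            action()
    return run


# Hypothetical menu of steps a user could pick from.
AVAILABLE: Dict[str, Action] = {
    "backup": say("backing up"),
    "convert": say("converting"),
    "upload": say("uploading"),
}


def build_pipeline(choices: List[str]) -> Action:
    # Maps the user's choices to a single combined action without running it.
    return sequence([AVAILABLE[name] for name in choices])


pipeline = build_pipeline(["convert", "upload"])  # e.g. parsed from user input
assert log == []  # building the pipeline performed no effects
pipeline()
# log is now ["converting", "uploading"]
```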

My (limited) understanding of functional programming is that state/side effects should be minimised and kept separate from stateless logic.

That's not just functional programming; that's usually a good idea in any language. If you do unit testing, the way you split apart read_file(), convert() and write_file() comes perfectly naturally because, despite convert() being by far the most complex and largest part of the code, writing tests for it is relatively easy: all you need to set up is the input parameter. Writing tests for read_file() and write_file() is quite a bit harder (even though the functions themselves are almost trivial) because you need to create and/or read things on the file system before and after calling the function. Ideally you'd make such functions so simple that you feel comfortable not testing them and thus save yourself a lot of hassle.
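To make the contrast concrete, here is a sketch using the standard unittest and tempfile modules. The convert body is a toy stand-in (it turns the root element's children into a flat JSON object), since the question does not show the real one, and the read_file implementation is likewise assumed.

```python
import json
import os
import tempfile
import unittest
import xml.etree.ElementTree as ET


def convert(xml_data: str) -> str:
    # Toy stand-in: map each child tag of the root element to its text.
    root = ET.fromstring(xml_data)
    return json.dumps({child.tag: child.text for child in root})


def read_file(path: str) -> str:
    # Assumed implementation of the question's read_file.
    with open(path, encoding="utf-8") as handle:
        return handle.read()


class ConvertTest(unittest.TestCase):
    def test_convert(self):
        # Pure: the only setup is the argument itself.
        self.assertEqual(convert("<doc><a>1</a></doc>"), '{"a": "1"}')


class ReadFileTest(unittest.TestCase):
    def test_read_file(self):
        # Impure: a real file has to be created and cleaned up around the call.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "input.xml")
            with open(path, "w", encoding="utf-8") as handle:
                handle.write("<doc/>")
            self.assertEqual(read_file(path), "<doc/>")
```

Saved as a module, these can be run with python -m unittest. The read_file test carries more setup than the function it checks, which is the argument for keeping such functions trivially small.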

The difference between Python and Haskell here is that Haskell has a type checker that can prove that functions have no side effects. In Python you need to hope that nobody has accidentally dropped a file-reading or file-writing function (say, read_config_file()) into convert(). In Haskell, when you declare convert :: String -> String or similar, with no IO monad, the type checker will guarantee that this is a pure function that relies only on its input parameter and nothing else. If someone tries to modify convert to read a config file, they will quickly see compiler errors showing that they'd be breaking the purity of the function. (And hopefully they'd be sensible enough to move read_config_file out of convert and pass its result into convert, maintaining the purity.)
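In Python the same refactoring is possible, but only by convention. A sketch, with read_config_file taken from the example above and everything else assumed for illustration (the JSON config format, the indent key, the toy conversion, and convert's second parameter):

```python
import json
import xml.etree.ElementTree as ET


def read_config_file(path: str) -> dict:
    # Impure: depends on the file system. Assumes the config is a JSON object.
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def convert_impure(xml_data: str) -> str:
    # Python accepts this without complaint: the signature still looks pure,
    # but the result now depends on a file the caller never mentioned.
    config = read_config_file("config.json")
    root = ET.fromstring(xml_data)
    return json.dumps({c.tag: c.text for c in root}, indent=config.get("indent"))


def convert(xml_data: str, config: dict) -> str:
    # Pure: everything it depends on arrives as an argument.
    root = ET.fromstring(xml_data)
    return json.dumps({c.tag: c.text for c in root}, indent=config.get("indent"))
```

The caller at the edge of the program would then write something like convert(xml_data, read_config_file("config.json")), so the file access stays next to the other I/O.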

Licensed under: CC-BY-SA with attribution