Question

God I hate the term "code smell", but I can't think of anything more accurate.

I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).

During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.

I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).

I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?

Was it helpful?

Solution

I'd say that state in general is not a code smell, so long as it's kept small and well controlled.

This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.

An fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:

  • VM: the virtual machine that ran the simulation program
  • Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
  • Solution: generation of the solution file based on the output of the controllers
  • Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed

Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.

The key point here was that the the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.

All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).

When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.

One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten in to it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly and in one place all of a small amount of code that can change the state, and more easily reason about how it can be changed, again making it easier to deal with.

OTHER TIPS

I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract---don't make it obvious you're using a monad.

Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:

empyGraph    :: Graph
mkLabel      :: Label -> Graph
mkAssignment :: Assignment -> Graph  -- modify a register or memory
mkTransfer   :: ControlTransfer -> Graph   -- any control transfer
(<*>)        :: Graph -> Graph -> Graph

But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:

withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
             -> Graph   -- code in the 'then' branch
             -> Graph   -- code in the 'else' branch 
             -> Graph   -- resulting if-then-else construct

The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.

The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:

type Graph = RealGraph -> [Label] -> (RealGraph, [Label])

or a bit more accurately

type Graph = RealGraph -> State [Label] RealGraph
  -- a Graph is a monadic function from a successor RealGraph to a new RealGraph

With the state monad hidden behind a layer of abstraction, it's not smelly at all!

Have you looked at Attribute grammars (AG)? (More info on wikipedia and an article in the Monad Reader)?

With AG you can add attributes to a syntax tree. These attributes are separated in synthesized and inherited attributes.

Synthesized attributes are things you generate (or synthesize) from your syntax tree, this could be the generated code, or all comments, or whatever else your interested in.

Inherited attributes are input to your syntax tree, this could be the environment, or a list of labels to use during code generation.

At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates haskell code (.hs files) from the provided .ag files.


Although, if you're still learning Haskell, then maybe this is not the time to start learning yet another layer of abstraction over that.

In that case, you could manually write the sort of code that attributes grammars generate for you, for example:

data AbstractSyntax = Literal Int | Block AbstractSyntax
                    | Comment String AbstractSyntax

compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x) _      = (generateCode x, [])
compile (Block ast) (l:ls) = let (code', comments) = compile ast ls
                             in (labelCode l code', comments)
compile (Comment s ast) ls = let (code, comments') = compile ast ls
                             in (code, s : comments')

generateCode :: Int -> Code
labelCode :: Label -> Code -> Code

It's possible that you may want an applicative functor instead of a monad:

http://www.haskell.org/haskellwiki/Applicative_functor

I think the original paper explains it better than the wiki, however:

http://www.soi.city.ac.uk/~ross/papers/Applicative.html

I don't think using the State Monad is a code smell when it used to model state.

If you need to thread state through your functions, you can do this explicitly, taking the the state as an argument and returning it in each function. The State Monad offers a good abstraction: it passes the state along for you and provides lots of useful function to combine functions that require state. In this case, using the State Monad (or Applicatives) is not a code smell.

However, if you use the State Monad to emulate an imperative style of programming while a functional solution would suffice, you are just making things complicated.

In general you should try to avoid state wherever possible, but that's not always practical. Applicative makes effectful code look nicer and more functional, especially tree traversal code can benefit from this style. For the problem of name generation there is now a rather nice package available: value-supply.

Well, don't use monads. The power of functional programming is function purity and their reuse. There's this paper a professor of mine once wrote and he's one of the guys who helped build Haskell.

The paper is called "Why functional programming matters", I suggest you read through it. It's a good read.

let's be careful about the terminology here. State is not per se bad; functional languages have state. What is a "code smell" is when you find yourself wanting to assign variables values and change them.

Of course, the Haskell state monad is there for just that reason -- as with I/O, it's letting you do unsafe and un-functional things in a constrained context.

So, yes, it's probably a code smell.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top