Different, interacting levels of state in haskell

https://stackoverflow.com/questions/9093282

21-04-2021
|

Question

I'm emulating a 4 bit microprocessor. I need to keep track of the registers, the memory and the running output (bonus points for also having a fetch-execute cycle counter). I've managed to do this without monads, but it feels messy passing around that much stuff at once explicitly. Also the function definition is messy, long and hard to read.

I've tried to do this with monads and it just doesn't fit together. I tried treating all the separate state components as a single type, but that left me with the problem of what to make the value.

State Program () -- Represents the state of the processor after a single iteration of the fetch execute cycle

Was the only type that made any sense. But at that point why even bother? I tried breaking it up by pulling the string out of my composite type and treating it as the value

State Program' String

which worked great, except for the fact that I needed RUNNING output. No matter what I did I couldn't hold on to both the string and the state at the same time.

Now I'm trying to grapple with monad transformers. It seems like I have to separate out all the different levels of state. But my head is exploding fast.

StateT Registers (StateT Memory (State Output)) a = 
StateT (registers -> (StateT Memory (State Output)) (a,registers))

StateT Registers (StateT Memory (State Output)) a = 
StateT (registers -> (Memory -> (Output -> (((a,Registers),Memory),Output))))

I haven't even put in the FEcycle counter yet!

Questions:

Am I on the right track?
Seeing as I'm pulling out monad transformers now, is it possible to stop treating "running output" as state and just palm it off to the IO monad? That would be awesome, instead of holding on to it I could just print it.
How many layers should I separate the state into? I can see two distinct layers, but they depend on each other closely (both the memory and the registers depend on the state of both the memory and the registers). Should I keep them together as a single state or separate them out and stack them up? Which approach will produce the most readable code?

Solution

Layering multiple state monads on top of each other is a bad idea: you will have to compose a bunch of lifts to get at each piece of state, identified only by the number of layers down the stack it is. Yuck! Indeed, the mtl library in general is designed to be used, with rare exceptions, with one monad transformer of each "kind" in a stack.

Instead, I would suggest StateT Program IO (). The interface to the state is the same and you can, as you said, do output in IO simply by using liftIO. Sure, the value type is (), but what's wrong with that? There's no relevant value you can return from the top-level emulator. And, of course, you're likely to have smaller, reusable components as part of your emulator, and those will have relevant result types. (Indeed, get is one such component.) There's nothing wrong with having no meaningful return value at the top level.

As far as conveniently accessing each portion of the state, what you're looking for is lenses; this Stack Overflow answer is an excellent introduction. They let you simply and easily access and modify independent parts of your state. For example, with the data-lens implementation, you can easily write something like regA += 1 to increment regA, or stack %= drop 2 to remove the first two elements of the stack.

Sure, it's essentially turning your code into imperative mutation of a set of global variables, but this is actually an advantage, since that's exactly the paradigm the CPU you're emulating is based in. And with the data-lens-template package, you can derive these lenses from a record definition in a single line.

OTHER TIPS

A simple way to do this would be to create a data type that represents the registers and memory:

data Register = ...
data Memory = ...
data Machine = Machine [Register] Memory

Then have some functions that update the registers/memory. Now use this type for your state and the output for your type:

type Simulation = State Machine Output

Now each operation could be in the form:

operation previous = do machine <- get
                        (result, newMachine) <- operate on machine
                        put newMachine
                        return result

Here previous is the previous output of the machine. You can incorporate it into the result as well.

So the Machine type represents the state of the machine; you're threading the output of the previous operations through it.

A more sophisticated way would be to use state threads (Control.Monad.ST). These let you use mutable references and arrays inside a function while being guaranteed purity on the outside.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow