Semantics II: State, Exception

You would be correct to guess that a model for mutable variables looks like the state monad. How simple it is depends on the rest of the language.

If you have a fixed set of variables known up front, then the state can be as simple as a record/tuple, each field for a variable.

But you expect an interesting language to also have local variables that pop up and disappear, and even locally created functions or mutable data structures that can be passed/returned to the caller. This will be the hard part.

Stateful language set up

I use a toy language that's still expression-centred, with local variables and functions:

data Expr = Num Integer
          | Var String
          | Prim2 Op2 Expr Expr         -- Prim2 op operand operand
          | Let [(String, Expr)] Expr   -- Let [(name, rhs), ...] eval-me
          | Lambda String Expr          -- Lambda var body
          | App Expr Expr               -- App func arg
          | ...

in SemanticsState.hs

data Expr = ...
          | Assign String Expr          -- var := expr

in SemanticsState.hs

And to make it really useful, you will like to have a construct that stands for, e.g.,

so I have a sequential execution construct that holds a list of expressions, the semantics being to evaluate or run them in order, the point being most of them are for modifying variables:

data Expr = ...
          | Seq [Expr]                  -- sequential execution

in SemanticsState.hs

Possible values: I have one more case VNone for “no value”. This happens as the result of evaluating “Seq []”. I also use this for assignments, but this is an arbitrary choice, some of you may prefer the value of the RHS.

data Value = ...
           | VNone

in SemanticsState.hs

My language is getting closer to Scheme! But it is also possible to tell the story in this lecture with a command-oriented language like shell scripts, Python, Go, etc.

Two flawed models

Before I present a correct but pretty complex model, let me briefly describe the problem with two simple first-instinct models.

First instinct: Upon “x:=5”, change x's value in the environment. But in the interpreter, environments are only passed downwards in a recursive call chain, never upwards. The change will be lost as soon as the recursive call returns.

Second instinct: The state monad is precisely for passing data upwards (and sideways too)—you experienced this with the tree relabeling lab exercise. So don't use environments, use state for the variable→value dictionary!

The new problem is that it is very hard to avoid either of these two bugs about local variables. Just before entering a “let”, do you save a copy of the state, so that when you leave it, you restore the before-let state?

To be fair, there is a very tricky way to fix it correctly, but it is more like performance optimization than a model, so I won't describe it. But the trick is used by practical interpreters and compilers: basically, they have one state (the whole memory) but it's divided into a stack and a heap. If you're interested, I still think it's better to understand the model below first, and then the trick will make a lot of sense as an implementation technique.

A correct model

Previously we had an immutable language, data didn't change over time, so we could stuff data into the environment. Not anymore. Now we need to de-couple them and add a layer of indirection:

Also don't forget that I can still have runtime errors. So here is my state monad with failure, along with important primitives:

data MutM a = MkMutM (IntMap Value -> Either String (IntMap Value, a))

unMutM (MkMutM f) = f

instance Monad MutM where
    return a = MkMutM (\s -> Right (s, a))
    MkMutM f >>= k = MkMutM (\s -> case f s of
                                     Left msg -> Left msg
                                     Right (s', a) -> unMutM (k a) s')

-- Return the whole memory. Used by new and load below.
getMem = MkMutM (\s -> Right (s, s))

-- Modify the memory by a function, i.e., change s to f s. Used by store below.
modMem f = MkMutM (\s -> Right (f s, ()))

-- The interpreter shouldn't directly use getMem and modMem. Instead, it just
-- needs to know how to new (allocate and init), load from an address, store to
-- an address.

-- Allocate and store initial value, return new address.  What's the new
-- address?  I simply use the largest used address plus one.
new :: Value -> MutM Int
new val =
    fmap IntMap.size getMem
    >>= \n -> store n val
    >> pure n

-- Read value at address.
load :: Int -> MutM Value
load a = fmap (! a) getMem

-- Write value to address.
store :: Int -> Value -> MutM ()
store a val = modMem (IntMap.insert a val)

-- And how to flag errors. Because no longer simply "Left".
raise :: String -> MutM a
raise msg = MkMutM (\_ -> Left msg)

in SemanticsState.hs

The main interpreter calls the helper interpreter interp with initially empty environment and initially empty memory. When all is done, only the error or the final value is extracted. interp is the one that does the recursion and works in my monad.

mainInterp :: Expr -> Either String Value
mainInterp expr = fmap snd (unMutM (interp expr Map.empty) IntMap.empty)
-- This fmap is Either's fmap.

interp :: Expr -> Map String Int -> MutM Value

in SemanticsState.hs

I first show assignment and those constructs affected by the split between environment and memory. Keep in mind that every time you access a variable by name, the environment gives you only an address; then you need to use load or store for the memory content.

interp (Var v) env =
    case Map.lookup v env of
      Just a -> load a
      Nothing -> raise "var not found"

interp (Assign v e) env =
    case Map.lookup v env of
      Nothing -> raise "var not found"
      Just addr ->
          interp e env
          >>= \val -> store addr val
          >> pure VNone

interp (Let eqns evalMe) env =
    extend env eqns
    >>= \env' -> interp evalMe env'
  where
    extend env [] = pure env
    extend env ((v,rhs) : eqns) =
        interp rhs env
        -- Now we need to allocate memory to store val
        >>= \val -> new val
        -- And the environment stores the address, not the value!
        >>= \addr -> let env' = Map.insert v addr env
        in extend env' eqns

interp (App f e) env =
    interp f env
    >>= \c -> case c of
      VClosure fEnv v body ->
          interp e env
          -- Now we need to allocate memory to store val.
          >>= \eVal -> new eVal
          -- And the environment stores the address, not the value!
          >>= \addr -> let bEnv = Map.insert v addr fEnv
          in interp body bEnv
      _ -> raise "type error: not lambda"

in SemanticsState.hs

The rest are straightforward or like the previous functional interpreter. I just show how to do Seq just because it's new, but it's easy after you know the intention:

interp (Seq es) env = go es
  where
    -- The answer of the whole Seq is VNone if the list is empty, else the
    -- answer from the last expression.  But still run all expressions for the
    -- effects.
    go [] = pure VNone
    go [e] = interp e env
    go (e:es) = interp e env *> go es

in SemanticsState.hs

Discussion

Pass by value, pass by reference

Note the App case allocates new memory for y (and later forgets it). x will not be affected. This is “pass by value”.

Pass by value = value of parameter passed to function. Cloning and allocation if necessary (if mutable language).

Note: Not to be confused with call by value. Call by value refers to evaluation order. Pass by value refers to not changing the caller's variable.

Pass by reference = y stands for the same address as x's rather than new memory. “y:=2” will change x! (Also disallows “(\y -> y:=2) (1+2)”.)

Pascal supports both, with pass-by-value as the default, pass-by-reference as an opt-in per parameter. It looks like this:

Exception catching and state

Suppose there is a language that supports state and catching exceptions. Let's try a program like this:

It's going to be 20. When an exception or error happens, the current state must be preserved, not forgotten, because the we expect to see it later if we catch the exception.

This means we cannot use the model of the parser monad or the monad in the previous section:

This model says that the Left case for exceptions does not have the new state. This is OK for parser monads because a parser backtracks to an old state (input string) anyway; it is still OK in the previous section because I was not catching any exception, the whole interpreter aborts anyway.

But if exceptions are catchable, then you always have a new state no matter what. Only the “answer” part is absent when an exception happens. So we switch to this new model:

Stateful exception language set up

For simplicity and minimum distraction, this toy language just has a single global variable, call it x. It has just enough constructs to show my point.

data Expr = Num Integer
          | VarX                -- x
          | AssignX Expr        -- x := expr
          | Seq [Expr]          -- sequential execution
          | Raise Exception
          | CatcMyException Expr Expr -- try { expr } catch myexception { expr }

-- Possible exceptions.
data Exception = MyException | AnotherException
    deriving (Eq, Show)

-- The type of possible answers from the interpreter.
data Value = VN Integer
           | VNone
    deriving (Eq, Show)

in SemanticsException.hs

State and exception model

I have just one global variable, so I just use Value directly for my state. Here is my monad type definition:

data EM a = MkEM (Value -> (Value, Either Exception a))

unEM :: EM a -> Value -> (Value, Either Exception a)
unEM (MkEM f) = f

in SemanticsException.hs

Implementation of connectives and useful primitives: Note how in many places I carefully obtain the new state s1 and pass it on to the next thing to do.

instance Monad EM where
    return a = MkEM (\s -> (s, Right a))
    MkEM f >>= k = MkEM (\s0 -> case f s0 of
                            (s1, Left e) -> (s1, Left e)
                            (s1, Right a) -> unEM (k a) s1)
instance Applicative EM where
    pure = return
    mf <*> ma = mf >>= \f -> ma >>= \a -> return (f a)
instance Functor EM where
    fmap f ma = ma >>= \a -> return (f a)

-- Primitives for use by the interpreter.

getX :: EM Value
getX = MkEM (\s -> (s, Right s))

setX :: Value -> EM ()
setX i = MkEM (\_ -> (i, Right ()))

raise :: Exception -> EM a
raise e = MkEM (\s -> (s, Left e))

-- 1st param: run this first
-- 2nd param: handler if myexception happens
catchMyException :: EM a -> EM a -> EM a
catchMyException (MkEM f) handler = MkEM (\s0 -> case f s0 of
                                             (s1, Left MyException) -> unEM handler s1
                                             othercases -> othercases)

in SemanticsException.hs

With that, the interpreter is almost straightforward (and I don't have an environment because I don't have local variables or function application here):

mainInterp :: Expr -> Either Exception Value
mainInterp expr = snd (unEM (interp expr) VNone)
-- So I'm using VNone to initialize x.

interp :: Expr -> EM Value

interp (Num i) = pure (VN i)

interp VarX = getX

interp (AssignX e) =
    interp e
    >>= \val -> setX val
    >> pure VNone

interp (Seq es) = go es
  where
    -- The answer of the whole Seq is VNone if the list is empty, else the
    -- answer from the last expression.  But still run all expressions for the
    -- effects.
    go [] = pure VNone
    go [e] = interp e
    go (e:es) = interp e *> go es

interp (Raise e) = raise e

interp (CatcMyException expr handler) = catchMyException (interp expr) (interp handler)

in SemanticsException.hs

For CatcMyException, just be careful that it is not catchMyException expr handler.

-- x := 10;
-- try {
--   x := 20;
--   raise myexception;
-- } catch myexception {
-- }
-- read x
example = Seq [ AssignX (Num 10)
              , CatcMyException
                  (Seq [ AssignX (Num 20)
                       , Raise MyException
                       ])
                  (Seq [])
              , VarX
              ]
-- What should you get the end? 10? 20?
-- Test with: mainInterp example

in SemanticsException.hs