You would be correct to guess that a model for mutable variables looks like the state monad. How simple it is depends on the rest of the language.
If you have a fixed set of variables known up front, then the state can be as simple as a record/tuple, each field for a variable.
But you expect an interesting language to also have local variables that pop up and disappear, and even locally created functions or mutable data structures that can be passed/returned to the caller. This will be the hard part.
I use a toy language that's still expression-centred, with local variables and functions:
data Expr = Num Integer | Var String | Prim2 Op2 Expr Expr -- Prim2 op operand operand | Let [(String, Expr)] Expr -- Let [(name, rhs), ...] eval-me | Lambda String Expr -- Lambda var body | App Expr Expr -- App func arg | ...in SemanticsState.hs
but it adds a way to change variable values:
data Expr = ... | Assign String Expr -- var := exprin SemanticsState.hs
And to make it really useful, you will like to have a construct that stands for, e.g.,
x:=x+y; y:=x-y; x:=x-y;
so I have a sequential execution construct that holds a list of expressions, the semantics being to evaluate or run them in order, the point being most of them are for modifying variables:
data Expr = ... | Seq [Expr] -- sequential executionin SemanticsState.hs
Possible values: I have one more case VNone
for “no value”. This
happens as the result of evaluating “Seq []
”. I also use this for
assignments, but this is an arbitrary choice, some of you may prefer the value
of the RHS.
data Value = ... | VNonein SemanticsState.hs
My language is getting closer to Scheme! But it is also possible to tell the story in this lecture with a command-oriented language like shell scripts, Python, Go, etc.
Before I present a correct but pretty complex model, let me briefly describe the problem with two simple first-instinct models.
First instinct: Upon “x:=5”, change x's value in the environment. But in the interpreter, environments are only passed downwards in a recursive call chain, never upwards. The change will be lost as soon as the recursive call returns.
Second instinct: The state monad is precisely for passing data upwards (and sideways too)—you experienced this with the tree relabeling lab exercise. So don't use environments, use state for the variable→value dictionary!
The new problem is that it is very hard to avoid either of these two bugs about local variables. Just before entering a “let”, do you save a copy of the state, so that when you leave it, you restore the before-let state?
To be fair, there is a very tricky way to fix it correctly, but it is more like performance optimization than a model, so I won't describe it. But the trick is used by practical interpreters and compilers: basically, they have one state (the whole memory) but it's divided into a stack and a heap. If you're interested, I still think it's better to understand the model below first, and then the trick will make a lot of sense as an implementation technique.
The two flaws were trying to teach us:
Previously we had an immutable language, data didn't change over time, so we could stuff data into the environment. Not anymore. Now we need to de-couple them and add a layer of indirection:
Also don't forget that I can still have runtime errors. So here is my state monad with failure, along with important primitives:
data MutM a = MkMutM (IntMap Value -> Either String (IntMap Value, a)) unMutM (MkMutM f) = f instance Monad MutM where return a = MkMutM (\s -> Right (s, a)) MkMutM f >>= k = MkMutM (\s -> case f s of Left msg -> Left msg Right (s', a) -> unMutM (k a) s') -- Return the whole memory. Used by new and load below. getMem = MkMutM (\s -> Right (s, s)) -- Modify the memory by a function, i.e., change s to f s. Used by store below. modMem f = MkMutM (\s -> Right (f s, ())) -- The interpreter shouldn't directly use getMem and modMem. Instead, it just -- needs to know how to new (allocate and init), load from an address, store to -- an address. -- Allocate and store initial value, return new address. What's the new -- address? I simply use the largest used address plus one. new :: Value -> MutM Int new val = fmap IntMap.size getMem >>= \n -> store n val >> pure n -- Read value at address. load :: Int -> MutM Value load a = fmap (! a) getMem -- Write value to address. store :: Int -> Value -> MutM () store a val = modMem (IntMap.insert a val) -- And how to flag errors. Because no longer simply "Left". raise :: String -> MutM a raise msg = MkMutM (\_ -> Left msg)in SemanticsState.hs
The main interpreter calls the helper interpreter interp
with
initially empty environment and initially empty memory. When all is done, only
the error or the final value is extracted. interp
is the one that
does the recursion and works in my monad.
mainInterp :: Expr -> Either String Value mainInterp expr = fmap snd (unMutM (interp expr Map.empty) IntMap.empty) -- This fmap is Either's fmap. interp :: Expr -> Map String Int -> MutM Valuein SemanticsState.hs
I first show assignment and those constructs affected by the split between
environment and memory. Keep in mind that every time you access a variable by
name, the environment gives you only an address; then you need to use
load
or store
for the memory content.
interp (Var v) env = case Map.lookup v env of Just a -> load a Nothing -> raise "var not found" interp (Assign v e) env = case Map.lookup v env of Nothing -> raise "var not found" Just addr -> interp e env >>= \val -> store addr val >> pure VNone interp (Let eqns evalMe) env = extend env eqns >>= \env' -> interp evalMe env' where extend env [] = pure env extend env ((v,rhs) : eqns) = interp rhs env -- Now we need to allocate memory to store val >>= \val -> new val -- And the environment stores the address, not the value! >>= \addr -> let env' = Map.insert v addr env in extend env' eqns interp (App f e) env = interp f env >>= \c -> case c of VClosure fEnv v body -> interp e env -- Now we need to allocate memory to store val. >>= \eVal -> new eVal -- And the environment stores the address, not the value! >>= \addr -> let bEnv = Map.insert v addr fEnv in interp body bEnv _ -> raise "type error: not lambda"in SemanticsState.hs
The rest are straightforward or like the previous functional interpreter. I
just show how to do Seq
just because it's new, but it's easy after
you know the intention:
interp (Seq es) env = go es where -- The answer of the whole Seq is VNone if the list is empty, else the -- answer from the last expression. But still run all expressions for the -- effects. go [] = pure VNone go [e] = interp e env go (e:es) = interp e env *> go esin SemanticsState.hs
In the code file there is a swapDemo
sample program that goes
like:
let x=42; y=23 in x:=x+y; y:=x-y; x:=x-y; x*10000 + y
Can you guess what it does?
“foo + bar
” is sensitive to which operand is evaluated first
because either of them may change a variable the other may read. I chose left
to right, but it is not the only sensible choice. The C standard takes
“undefined behaviour” or “up to the compiler”, I forgot which.
What if I have this?
let x=0 in (\y -> x:=y) 4; x
The function will change x! This is like Javascript rather than Java.
What if I have this?
let f = (let x=0 in \y -> x:=x+1; x) in f 0; f 0; f 0
You get 3. f has an internal counter! This is related to objects, their fields, and how methods are modeled by closures over the fields (in fact over the whole object). Also think of the “this” pointer. Right here I have f playing the dual role of a single-method object and its method.
Exercise: I think you can do this in Javascript, without objects, just with functions, but I'm not sure. Can you help me?
I don't free up memory. Memory clean-up is a big interesting topic but outside the scope of this course. Here I want to focus on the need for thinking in terms of an environment and an address-indexed memory, before you go on after this course to study how they are implemented as a stack, a heap, and garbage collection.
What if I do this?
let x=0 in (\y -> y:=2) x; x
Note the App
case allocates new memory for y (and later forgets
it). x will not be affected. This is “pass by value”.
Pass by value = value of parameter passed to function. Cloning and allocation if necessary (if mutable language).
Note: Not to be confused with call by value. Call by value refers to evaluation order. Pass by value refers to not changing the caller's variable.
Pass by reference = y stands for the same address as x's rather than new
memory. “y:=2
” will change x! (Also disallows
“(\y -> y:=2) (1+2)
”.)
Pascal supports both, with pass-by-value as the default, pass-by-reference as an opt-in per parameter. It looks like this:
function foo(m : integer, var r : integer) : integer; begin r := m + 1; return m end;
m is pass-by-value. r is pass-by-reference (the “var” marker).
C++ is similar:
int foo(int m, int &r) { r = m + 1; return m; }
Suppose there is a language that supports state and catching exceptions. Let's try a program like this:
x := 10; try { x := 20; raise myexception; } catch myexception { } read x
What do we expect to be the final value of x?
It's going to be 20. When an exception or error happens, the current state must be preserved, not forgotten, because the we expect to see it later if we catch the exception.
This means we cannot use the model of the parser monad or the monad in the previous section:
MyState -> Either Exception (MyState, a)
This model says that the Left
case for exceptions does not have
the new state. This is OK for parser monads because a parser backtracks to an
old state (input string) anyway; it is still OK in the previous section because
I was not catching any exception, the whole interpreter aborts anyway.
But if exceptions are catchable, then you always have a new state no matter what. Only the “answer” part is absent when an exception happens. So we switch to this new model:
MyState -> (MyState, Either Exception a)
For simplicity and minimum distraction, this toy language just has a single global variable, call it x. It has just enough constructs to show my point.
data Expr = Num Integer | VarX -- x | AssignX Expr -- x := expr | Seq [Expr] -- sequential execution | Raise Exception | CatcMyException Expr Expr -- try { expr } catch myexception { expr } -- Possible exceptions. data Exception = MyException | AnotherException deriving (Eq, Show) -- The type of possible answers from the interpreter. data Value = VN Integer | VNone deriving (Eq, Show)in SemanticsException.hs
I have just one global variable, so I just use Value
directly
for my state. Here is my monad type definition:
data EM a = MkEM (Value -> (Value, Either Exception a)) unEM :: EM a -> Value -> (Value, Either Exception a) unEM (MkEM f) = fin SemanticsException.hs
Implementation of connectives and useful primitives: Note how in many places
I carefully obtain the new state s1
and pass it on to the next
thing to do.
instance Monad EM where return a = MkEM (\s -> (s, Right a)) MkEM f >>= k = MkEM (\s0 -> case f s0 of (s1, Left e) -> (s1, Left e) (s1, Right a) -> unEM (k a) s1) instance Applicative EM where pure = return mf <*> ma = mf >>= \f -> ma >>= \a -> return (f a) instance Functor EM where fmap f ma = ma >>= \a -> return (f a) -- Primitives for use by the interpreter. getX :: EM Value getX = MkEM (\s -> (s, Right s)) setX :: Value -> EM () setX i = MkEM (\_ -> (i, Right ())) raise :: Exception -> EM a raise e = MkEM (\s -> (s, Left e)) -- 1st param: run this first -- 2nd param: handler if myexception happens catchMyException :: EM a -> EM a -> EM a catchMyException (MkEM f) handler = MkEM (\s0 -> case f s0 of (s1, Left MyException) -> unEM handler s1 othercases -> othercases)in SemanticsException.hs
With that, the interpreter is almost straightforward (and I don't have an environment because I don't have local variables or function application here):
mainInterp :: Expr -> Either Exception Value mainInterp expr = snd (unEM (interp expr) VNone) -- So I'm using VNone to initialize x. interp :: Expr -> EM Value interp (Num i) = pure (VN i) interp VarX = getX interp (AssignX e) = interp e >>= \val -> setX val >> pure VNone interp (Seq es) = go es where -- The answer of the whole Seq is VNone if the list is empty, else the -- answer from the last expression. But still run all expressions for the -- effects. go [] = pure VNone go [e] = interp e go (e:es) = interp e *> go es interp (Raise e) = raise e interp (CatcMyException expr handler) = catchMyException (interp expr) (interp handler)in SemanticsException.hs
For CatcMyException
, just be careful that it is
not catchMyException expr handler
.
The code file also includes the example program I used earlier:
-- x := 10; -- try { -- x := 20; -- raise myexception; -- } catch myexception { -- } -- read x example = Seq [ AssignX (Num 10) , CatcMyException (Seq [ AssignX (Num 20) , Raise MyException ]) (Seq []) , VarX ] -- What should you get the end? 10? 20? -- Test with: mainInterp examplein SemanticsException.hs
Do test it out and see what happens.