I use a toy language that's still expression-centred, with local variables and functions:
data Expr = Num Integer | Var String | Prim2 Op2 Expr Expr -- Prim2 op operand operand | Let [(String, Expr)] Expr -- Let [(name, rhs), ...] eval-me | Lambda String Expr -- Lambda var body | App Expr Expr -- App func arg | ...in SemanticsState.hs
but it adds assignment for changing variable values:
data Expr = ... | Assign String Expr -- var := exprin SemanticsState.hs
And to make it really useful, you will like sequential composition, e.g.,
x:=x+y; y:=x-y; x:=x-y;
so I have a sequential construct that holds a list of expressions, the semantics being to evaluate them in order, the point being most of them are for modifying variables:
data Expr = ... | Seq [Expr] -- sequential executionin SemanticsState.hs
Possible values: I have one more case VNone
for “no value”. This
happens as the result of evaluating “Seq []
”. I also use this for
assignments, but this is an arbitrary choice, some of you may prefer the
assigned value.
data Value = ... | VNonein SemanticsState.hs
If you start simple, stateful programs are mathematically modelled by state transition functions S → S, if S is the set of all states. This works if you just have global variables and no local variables.
But complications creep in quickly.
If expressions can change state too, e.g., C has x++ + y--
, then
you need S → (S, a) for both the new state and the result of evaluation.
If runtime errors are possible, then you need S → Either ErrorType (S, a).
(Reminder: data Either a b = Left a | Right b
.)
If there are local variables, then the cleanest model needs a middle person between variables and values:
The environment maps variables to the middle person: addresses.
Reason: If it stores values, you lose updated values when exiting a scope.
The state maps addresses to values. It is memory content.
Reason: If it stores variables, then removal of local variables when exiting a scope becomes tricky. (Doable, and compilers do this, but tricky.)
In retrospect: environments are best for tracking variables, states are best for tracking current values. Hence the separation.
So here are the representation and important connectives and helpers:
data MutM a = MkMutM (Map Int Value -> Either ErrorType (Map Int Value, a)) data ErrorType = TypeError | VarNotFound deriving Show unMutM (MkMutM f) = f -- Give an answer. pure :: a -> MutM a pure a = MkMutM (\s -> Right (s, a)) -- Sequential composition that passes answer from 1st stage to 2nd. (>>=) :: MutM a -> (a -> MutM b) -> MutM b MkMutM f >>= k = MkMutM (\s -> case f s of Left msg -> Left msg Right (s', a) -> unMutM (k a) s') infixl 1 >>= -- Raise an error. raise :: ErrorType -> MutM a raise err = MkMutM (\_ -> Left err) -- Apply a function to the answer. fmap :: (a -> b) -> MutM a -> MutM b fmap f ma = ma >>= \a -> pure (f a) -- Sequential composition that combines answers by a function. liftA2 :: (a -> b -> c) -> MutM a -> MutM b -> MutM c liftA2 f ma mb = ma >>= \a -> mb >>= \b -> pure (f a b) -- Return the whole memory. Used by new and load below. getMem = MkMutM (\s -> Right (s, s)) -- Modify the memory by a function, i.e., change s to f s. Used by store below. modMem f = MkMutM (\s -> Right (f s, ())) -- The interpreter shouldn't directly use getMem and modMem. Instead, use new, -- load, store for individual addresses. -- Allocate and store initial value, return new address. What's the new -- address? I simply use the largest used address plus one. new :: Value -> MutM Int new val = fmap Map.size getMem >>= \n -> store n val >>= \_ -> pure n -- Read value at address. load :: Int -> MutM Value load a = fmap (! a) getMem -- Write value to address. store :: Int -> Value -> MutM () store a val = modMem (Map.insert a val)in SemanticsState.hs
The main interpreter calls the helper interpreter interp
with
initially empty environment and initially empty memory. When all is done, only
the error or the final value is extracted. interp
is the one that
does the recursion and works in the MutM
representation.
mainInterp :: Expr -> Either String Value mainInterp expr = case unMutM (interp expr Map.empty) Map.empty of Left e -> Left e Right (_, v) -> Right v interp :: Expr -> Map String Int -> MutM Valuein SemanticsState.hs
I first show assignment and those constructs affected by the split between
environment and memory. Keep in mind that every time you access a variable by
name, the environment gives you only an address; then you need to use
load
or store
for the memory content.
interp (Var v) env = case Map.lookup v env of Just a -> load a Nothing -> raise VarNotFound interp (Assign v e) env = case Map.lookup v env of Nothing -> raise VarNotFound Just addr -> interp e env >>= \val -> store addr val >>= \_ -> pure VNone interp (Let eqns evalMe) env = extend env eqns >>= \env' -> interp evalMe env' where extend env [] = pure env extend env ((v,rhs) : eqns) = interp rhs env -- Now we need to allocate memory to store val >>= \val -> new val -- And the environment stores the address, not the value! >>= \addr -> let env' = Map.insert v addr env in extend env' eqns interp (App f e) env = interp f env >>= \c -> case c of VClosure fEnv v body -> interp e env -- Now we need to allocate memory to store val. >>= \eVal -> new eVal -- And the environment stores the address, not the value! >>= \addr -> let bEnv = Map.insert v addr fEnv in interp body bEnv _ -> raise TypeErrorin SemanticsState.hs
The rest are straightforward or like the previous functional interpreter. I
just show how to do Seq
just because it's new, but it's easy after
you know the intention:
interp (Seq es) env = go es where -- The answer of the whole Seq is VNone if the list is empty, else the -- answer from the last expression. But still run all expressions for the -- effects. go [] = pure VNone go [e] = interp e env go (e:es) = interp e env >>= \_ -> go esin SemanticsState.hs
In the code file there is a swapDemo
sample program that goes
like:
let x=42; y=23 in x:=x+y; y:=x-y; x:=x-y; x*10000 + y
Can you guess what it does?
“foo + bar
” is sensitive to which operand is evaluated first
because either of them may change a variable the other may read. I chose left
to right, but it is not the only sensible choice. The C standard takes
“undefined behaviour” or “up to the compiler”, I forgot which.
What if I have this?
let x=0 in (\y -> x:=y) 4; x
The function will change x! This is like Javascript rather than Java.
What if I have this?
let f = (let x=0 in \y -> x:=x+1; x) in f 0; f 0; f 0
You get 3. f has an internal counter! This is related to objects, their fields, and how methods are modeled by closures over the fields (in fact over the whole object). Also think of the “this” pointer. Right here I have f playing the dual role of a single-method object and its method.
Exercise: I think you can do this in Javascript, without objects, just with functions, but I'm not sure. Can you help me?
I don't free up memory. Memory clean-up is a big interesting topic but outside the scope of this course. Here I want to focus on the need for thinking in terms of an environment and an address-indexed memory, before you go on after this course to study how they are implemented as a stack, a heap, and garbage collection.
What if I do this?
let x=0 in (\y -> y:=2) x; x
Note the App
case allocates new memory for y (and later forgets
it). x will not be affected. This is “pass by value”.
Pass by value = value of parameter passed to function. Cloning and allocation if necessary (if mutable language), basically creating a local variable.
Note: Not to be confused with call by value. Call by value refers to evaluation order. Pass by value refers to copying the caller's argument value.
Pass by reference = y stands for the same address as x's rather than new
memory. “y:=2
” will change x! (Also disallows
“(\y -> y:=2) (1+2)
”.)
Pascal supports both, with pass-by-value as the default, pass-by-reference as an opt-in per parameter. It looks like this:
function foo(m : integer, var r : integer) : integer; begin r := m + 1; return m end;
m is pass-by-value. r is pass-by-reference (the “var” marker).
C++ is similar, the marker is “&”:
int foo(int m, int &r) { r = m + 1; return m; }
I extend my language and interpreter with a pass-by-reference function call:
data Expr = ... | AppByRef Expr Expr -- AppByRef func arg interp (AppByRef f e) env = interp f env >>= \c -> case c of VClosure fEnv v body -> -- Require e to be Var case e of Var x -> -- Find address of x case Map.lookup x env of Nothing -> raise VarNotFound Just addr -> -- map v to address of x let bEnv = Map.insert v addr fEnv in interp body bEnv _ -> raise TypeError _ -> raise TypeErrorin SemanticsState.hs