I use a toy language that's still expression-centred, with local variables and functions:
data Expr = Num Integer
| Var String
| Prim2 Op2 Expr Expr -- Prim2 op operand operand
| Let [(String, Expr)] Expr -- Let [(name, rhs), ...] eval-me
| Lambda String Expr -- Lambda var body
| App Expr Expr -- App func arg
| ...
in SemanticsState.hs
but it adds assignment for changing variable values:
data Expr = ...
| Assign String Expr -- var := expr
in SemanticsState.hs
And to make it really useful, you will like sequential composition, e.g.,
x:=x+y; y:=x-y; x:=x-y;
so I have a sequential construct that holds a list of expressions, the semantics being to evaluate them in order, the point being most of them are for modifying variables:
data Expr = ...
| Seq [Expr] -- sequential execution
in SemanticsState.hs
Possible values: I have one more case VNone for “no value”. This
happens as the result of evaluating “Seq []”. I also use this for
assignments, but this is an arbitrary choice, some of you may prefer the
assigned value.
data Value = ...
| VNone
in SemanticsState.hs
If you start simple, stateful programs are mathematically modelled by state transition functions S → S, if S is the set of all states. This works if you just have global variables and no local variables.
But complications creep in quickly.
If expressions can change state too, e.g., C has x++ + y--, then
you need S → (S, a) for both the new state and the result of evaluation.
If runtime errors are possible, then you need S → Either ErrorType (S, a).
(Reminder: data Either a b = Left a | Right b.)
If there are local variables, then the cleanest model needs a middle person between variables and values:
The environment maps variables to the middle person: addresses.
Reason: If it stores values, you lose updated values when exiting a scope.
The state maps addresses to values. It is memory content.
Reason: If it stores variables, then removal of local variables when exiting a scope becomes tricky. (Doable, and compilers do this, but tricky.)
In retrospect: environments are best for tracking variables, states are best for tracking current values. Hence the separation.
So here are the representation and important connectives and helpers:
data MutM a = MkMutM (Map Int Value -> Either ErrorType (Map Int Value, a))
data ErrorType = TypeError | VarNotFound
deriving Show
unMutM (MkMutM f) = f
-- Give an answer.
pure :: a -> MutM a
pure a = MkMutM (\s -> Right (s, a))
-- Sequential composition that passes answer from 1st stage to 2nd.
(>>=) :: MutM a -> (a -> MutM b) -> MutM b
MkMutM f >>= k = MkMutM (\s -> case f s of
Left msg -> Left msg
Right (s', a) -> unMutM (k a) s')
infixl 1 >>=
-- Raise an error.
raise :: ErrorType -> MutM a
raise err = MkMutM (\_ -> Left err)
-- Apply a function to the answer.
fmap :: (a -> b) -> MutM a -> MutM b
fmap f ma = ma >>= \a -> pure (f a)
-- Sequential composition that combines answers by a function.
liftA2 :: (a -> b -> c) -> MutM a -> MutM b -> MutM c
liftA2 f ma mb = ma >>= \a -> mb >>= \b -> pure (f a b)
-- Return the whole memory. Used by new and load below.
getMem = MkMutM (\s -> Right (s, s))
-- Modify the memory by a function, i.e., change s to f s. Used by store below.
modMem f = MkMutM (\s -> Right (f s, ()))
-- The interpreter shouldn't directly use getMem and modMem. Instead, use new,
-- load, store for individual addresses.
-- Allocate and store initial value, return new address. What's the new
-- address? I simply use the largest used address plus one.
new :: Value -> MutM Int
new val =
fmap Map.size getMem
>>= \n -> store n val
>>= \_ -> pure n
-- Read value at address.
load :: Int -> MutM Value
load a = fmap (! a) getMem
-- Write value to address.
store :: Int -> Value -> MutM ()
store a val = modMem (Map.insert a val)
in SemanticsState.hs
The main interpreter calls the helper interpreter interp with
initially empty environment and initially empty memory. When all is done, only
the error or the final value is extracted. interp is the one that
does the recursion and works in the MutM representation.
mainInterp :: Expr -> Either String Value mainInterp expr = case unMutM (interp expr Map.empty) Map.empty of Left e -> Left e Right (_, v) -> Right v interp :: Expr -> Map String Int -> MutM Valuein SemanticsState.hs
I first show assignment and those constructs affected by the split between
environment and memory. Keep in mind that every time you access a variable by
name, the environment gives you only an address; then you need to use
load or store for the memory content.
interp (Var v) env =
case Map.lookup v env of
Just a -> load a
Nothing -> raise VarNotFound
interp (Assign v e) env =
case Map.lookup v env of
Nothing -> raise VarNotFound
Just addr ->
interp e env
>>= \val -> store addr val
>>= \_ -> pure VNone
interp (Let eqns evalMe) env =
extend env eqns
>>= \env' -> interp evalMe env'
where
extend env [] = pure env
extend env ((v,rhs) : eqns) =
interp rhs env
-- Now we need to allocate memory to store val
>>= \val -> new val
-- And the environment stores the address, not the value!
>>= \addr -> let env' = Map.insert v addr env
in extend env' eqns
interp (App f e) env =
interp f env
>>= \c -> case c of
VClosure fEnv v body ->
interp e env
-- Now we need to allocate memory to store val.
>>= \eVal -> new eVal
-- And the environment stores the address, not the value!
>>= \addr -> let bEnv = Map.insert v addr fEnv
in interp body bEnv
_ -> raise TypeError
in SemanticsState.hs
The rest are straightforward or like the previous functional interpreter. I
just show how to do Seq just because it's new, but it's easy after
you know the intention:
interp (Seq es) env = go es
where
-- The answer of the whole Seq is VNone if the list is empty, else the
-- answer from the last expression. But still run all expressions for the
-- effects.
go [] = pure VNone
go [e] = interp e env
go (e:es) = interp e env >>= \_ -> go es
in SemanticsState.hs
In the code file there is a swapDemo sample program that goes
like:
let x=42; y=23
in x:=x+y; y:=x-y; x:=x-y;
x*10000 + y
Can you guess what it does?
“foo + bar” is sensitive to which operand is evaluated first
because either of them may change a variable the other may read. I chose left
to right, but it is not the only sensible choice. The C standard takes
“undefined behaviour” or “up to the compiler”, I forgot which.
What if I have this?
let x=0 in (\y -> x:=y) 4; x
The function will change x! This is like Javascript rather than Java.
What if I have this?
let f = (let x=0 in \y -> x:=x+1; x) in f 0; f 0; f 0
You get 3. f has an internal counter! This is related to objects, their fields, and how methods are modeled by closures over the fields (in fact over the whole object). Also think of the “this” pointer. Right here I have f playing the dual role of a single-method object and its method.
Exercise: I think you can do this in Javascript, without objects, just with functions, but I'm not sure. Can you help me?
I don't free up memory. Memory clean-up is a big interesting topic but outside the scope of this course. Here I want to focus on the need for thinking in terms of an environment and an address-indexed memory, before you go on after this course to study how they are implemented as a stack, a heap, and garbage collection.
What if I do this?
let x=0 in (\y -> y:=2) x; x
Note the App case allocates new memory for y (and later forgets
it). x will not be affected. This is “pass by value”.
Pass by value = value of parameter passed to function. Cloning and allocation if necessary (if mutable language), basically creating a local variable.
Note: Not to be confused with call by value. Call by value refers to evaluation order. Pass by value refers to copying the caller's argument value.
Pass by reference = y stands for the same address as x's rather than new
memory. “y:=2” will change x! (Also disallows
“(\y -> y:=2) (1+2)”.)
Pascal supports both, with pass-by-value as the default, pass-by-reference as an opt-in per parameter. It looks like this:
function foo(m : integer, var r : integer) : integer;
begin
r := m + 1;
return m
end;
m is pass-by-value. r is pass-by-reference (the “var” marker).
C++ is similar, the marker is “&”:
int foo(int m, int &r)
{
r = m + 1;
return m;
}
I extend my language and interpreter with a pass-by-reference function call:
data Expr = ...
| AppByRef Expr Expr -- AppByRef func arg
interp (AppByRef f e) env =
interp f env
>>= \c -> case c of
VClosure fEnv v body ->
-- Require e to be Var
case e of
Var x ->
-- Find address of x
case Map.lookup x env of
Nothing -> raise VarNotFound
Just addr ->
-- map v to address of x
let bEnv = Map.insert v addr fEnv
in interp body bEnv
_ -> raise TypeError
_ -> raise TypeError
in SemanticsState.hs