Semantics II: State

Stateful language

I use a toy language that's still expression-centred, with local variables and functions:

data Expr = Num Integer
          | Var String
          | Prim2 Op2 Expr Expr         -- Prim2 op operand operand
          | Let [(String, Expr)] Expr   -- Let [(name, rhs), ...] eval-me
          | Lambda String Expr          -- Lambda var body
          | App Expr Expr               -- App func arg
          | ...

in SemanticsState.hs

but it adds assignment for changing variable values:

data Expr = ...
          | Assign String Expr          -- var := expr

in SemanticsState.hs

And to make it really useful, you will like sequential composition, e.g.,

x:=x+y; y:=x-y; x:=x-y;

so I have a sequential construct that holds a list of expressions, the semantics being to evaluate them in order, the point being most of them are for modifying variables:

data Expr = ...
          | Seq [Expr]                  -- sequential execution

in SemanticsState.hs

Possible values: I have one more case VNone for “no value”. This happens as the result of evaluating “Seq []”. I also use this for assignments, but this is an arbitrary choice, some of you may prefer the assigned value.

data Value = ...
           | VNone

in SemanticsState.hs

Model

If you start simple, stateful programs are mathematically modelled by state transition functions S → S, if S is the set of all states. This works if you just have global variables and no local variables.

But complications creep in quickly.

If expressions can change state too, e.g., C has x++ + y--, then you need S → (S, a) for both the new state and the result of evaluation.

If runtime errors are possible, then you need S → Either ErrorType (S, a). (Reminder: data Either a b = Left a | Right b.)

If there are local variables, then the cleanest model needs a middle person between variables and values:

The environment maps variables to the middle person: addresses.
Reason: If it stores values, you lose updated values when exiting a scope.
The state maps addresses to values. It is memory content.
Reason: If it stores variables, then removal of local variables when exiting a scope becomes tricky. (Doable, and compilers do this, but tricky.)

In retrospect: environments are best for tracking variables, states are best for tracking current values. Hence the separation.

So here are the representation and important connectives and helpers:

data MutM a = MkMutM (Map Int Value -> Either ErrorType (Map Int Value, a))
data ErrorType = TypeError | VarNotFound
    deriving Show
unMutM (MkMutM f) = f

-- Give an answer.
pure :: a -> MutM a
pure a = MkMutM (\s -> Right (s, a))

-- Sequential composition that passes answer from 1st stage to 2nd.
(>>=) :: MutM a -> (a -> MutM b) -> MutM b
MkMutM f >>= k = MkMutM (\s -> case f s of
                                 Left msg -> Left msg
                                 Right (s', a) -> unMutM (k a) s')
infixl 1 >>=

-- Raise an error.
raise :: ErrorType -> MutM a
raise err = MkMutM (\_ -> Left err)

-- Apply a function to the answer.
fmap :: (a -> b) -> MutM a -> MutM b
fmap f ma = ma >>= \a -> pure (f a)

-- Sequential composition that combines answers by a function.
liftA2 :: (a -> b -> c) -> MutM a -> MutM b -> MutM c
liftA2 f ma mb = ma >>= \a -> mb >>= \b -> pure (f a b)

-- Return the whole memory. Used by new and load below.
getMem = MkMutM (\s -> Right (s, s))

-- Modify the memory by a function, i.e., change s to f s. Used by store below.
modMem f = MkMutM (\s -> Right (f s, ()))

-- The interpreter shouldn't directly use getMem and modMem. Instead, use new,
-- load, store for individual addresses.

-- Allocate and store initial value, return new address.  What's the new
-- address?  I simply use the largest used address plus one.
new :: Value -> MutM Int
new val =
    fmap Map.size getMem
    >>= \n -> store n val
    >>= \_ -> pure n

-- Read value at address.
load :: Int -> MutM Value
load a = fmap (! a) getMem

-- Write value to address.
store :: Int -> Value -> MutM ()
store a val = modMem (Map.insert a val)

in SemanticsState.hs

The main interpreter calls the helper interpreter interp with initially empty environment and initially empty memory. When all is done, only the error or the final value is extracted. interp is the one that does the recursion and works in the MutM representation.

mainInterp :: Expr -> Either String Value
mainInterp expr = case unMutM (interp expr Map.empty) Map.empty of
  Left e -> Left e
  Right (_, v) -> Right v

interp :: Expr -> Map String Int -> MutM Value

in SemanticsState.hs

I first show assignment and those constructs affected by the split between environment and memory. Keep in mind that every time you access a variable by name, the environment gives you only an address; then you need to use load or store for the memory content.

interp (Var v) env =
    case Map.lookup v env of
      Just a -> load a
      Nothing -> raise VarNotFound

interp (Assign v e) env =
    case Map.lookup v env of
      Nothing -> raise VarNotFound
      Just addr ->
          interp e env
          >>= \val -> store addr val
          >>= \_ -> pure VNone

interp (Let eqns evalMe) env =
    extend env eqns
    >>= \env' -> interp evalMe env'
  where
    extend env [] = pure env
    extend env ((v,rhs) : eqns) =
        interp rhs env
        -- Now we need to allocate memory to store val
        >>= \val -> new val
        -- And the environment stores the address, not the value!
        >>= \addr -> let env' = Map.insert v addr env
        in extend env' eqns

interp (App f e) env =
    interp f env
    >>= \c -> case c of
      VClosure fEnv v body ->
          interp e env
          -- Now we need to allocate memory to store val.
          >>= \eVal -> new eVal
          -- And the environment stores the address, not the value!
          >>= \addr -> let bEnv = Map.insert v addr fEnv
          in interp body bEnv
      _ -> raise TypeError

in SemanticsState.hs

The rest are straightforward or like the previous functional interpreter. I just show how to do Seq just because it's new, but it's easy after you know the intention:

interp (Seq es) env = go es
  where
    -- The answer of the whole Seq is VNone if the list is empty, else the
    -- answer from the last expression.  But still run all expressions for the
    -- effects.
    go [] = pure VNone
    go [e] = interp e env
    go (e:es) = interp e env >>= \_ -> go es

in SemanticsState.hs

In the code file there is a swapDemo sample program that goes like:

let x=42; y=23
 in x:=x+y; y:=x-y; x:=x-y;
    x*10000 + y

Can you guess what it does?

Discussion

“foo + bar” is sensitive to which operand is evaluated first because either of them may change a variable the other may read. I chose left to right, but it is not the only sensible choice. The C standard takes “undefined behaviour” or “up to the compiler”, I forgot which.
What if I have this?
```
let x=0
 in (\y -> x:=y) 4; x
```
The function will change x! This is like Javascript rather than Java.
What if I have this?
```
let f = (let x=0 in \y -> x:=x+1; x)
 in f 0; f 0; f 0
```
You get 3. f has an internal counter! This is related to objects, their fields, and how methods are modeled by closures over the fields (in fact over the whole object). Also think of the “this” pointer. Right here I have f playing the dual role of a single-method object and its method.
Exercise: I think you can do this in Javascript, without objects, just with functions, but I'm not sure. Can you help me?
I don't free up memory. Memory clean-up is a big interesting topic but outside the scope of this course. Here I want to focus on the need for thinking in terms of an environment and an address-indexed memory, before you go on after this course to study how they are implemented as a stack, a heap, and garbage collection.

Pass by value, pass by reference

What if I do this?

let x=0
in (\y -> y:=2) x; x

Note the App case allocates new memory for y (and later forgets it). x will not be affected. This is “pass by value”.

Pass by value = value of parameter passed to function. Cloning and allocation if necessary (if mutable language), basically creating a local variable.

Note: Not to be confused with call by value. Call by value refers to evaluation order. Pass by value refers to copying the caller's argument value.

Pass by reference = y stands for the same address as x's rather than new memory. “y:=2” will change x! (Also disallows “(\y -> y:=2) (1+2)”.)

Pascal supports both, with pass-by-value as the default, pass-by-reference as an opt-in per parameter. It looks like this:

function foo(m : integer, var r : integer) : integer;
begin
    r := m + 1;
    return m
end;

m is pass-by-value. r is pass-by-reference (the “var” marker).

C++ is similar, the marker is “&”:

int foo(int m, int &r)
{
    r = m + 1;
    return m;
}

I extend my language and interpreter with a pass-by-reference function call:

data Expr = ...
          | AppByRef Expr Expr          -- AppByRef func arg

interp (AppByRef f e) env =
    interp f env
    >>= \c -> case c of
      VClosure fEnv v body ->
        -- Require e to be Var
        case e of
          Var x ->
            -- Find address of x
            case Map.lookup x env of
              Nothing -> raise VarNotFound
              Just addr ->
                -- map v to address of x
                let bEnv = Map.insert v addr fEnv
                in interp body bEnv
          _ -> raise TypeError
      _ -> raise TypeError

in SemanticsState.hs

Contents

Semantics II: State

Stateful language

Model

Discussion

Pass by value, pass by reference