Contents

Top

Semantics I: Expressions, Bindings, Functions

Here is a toy language featuring expressions, variable bindings, and functions. These are basic features present in almost all languages. We will also run into a few topics and discussions.

For least setup, I also start with dynamic checking of types (during evaluation, not upfront) and even dynamic checking that variables exist at use sites. Static checking of those two are two big lessons in their own right.

Vocabulary: Generally, “static X” means doing X by analysing the code, without needing to run it and wait; “dynamic X” and recently “just in time X” means doing X while running the code, and therefore only on code paths actually taken in a run.

Most of my interpreter is meant to be more mathematical than efficient. I am secretly writing mathematical formulas to define the meanings of the constructs.

“A formula is worth a thousand pictures.” – Dijkstra

Setup

I will have several kinds of run-time errors: Type errors, variable not found. If you support division, you will also have division by zero.

I use the ExprInterp type below to represent this possibility. (More general/abstract thinking: This type represents the interpreter and models the possible effects of the language. Not obvious now, makes more sense after modeling more languages.)

data ExprInterp a = Error ErrorType | Success a
data ErrorType = TypeError | VarNotFound
in SemanticsFunctions.hs

So my interpreter type goes as:

mainInterp :: Expr -> ExprInterp Value
in SemanticsFunctions.hs

Connectives that factor out 3 common chores in the interpreter: Give an answer, raise an error, sequential composition:

-- Give an answer. Simply "Success" for ExprInterp, but can be more complex for
-- other interpreters.
pure :: a -> ExprInterp a
pure a = Success a

-- Raise an error. Simply "Error" for ExprInterp, but can be more complex for
-- other interpreters.
raise :: ErrorType -> ExprInterp a
raise e = Error e

-- Sequential composition of two interpreter stages: Check success of
-- 1st stage, and if success, pass answer to 2nd stage.
(>>=) :: ExprInterp a -> (a -> ExprInterp b) -> ExprInterp b
Error msg >>= _ = Error msg
Success a >>= k = k a
in SemanticsFunctions.hs

For reasons explained in a later section (about variables and environments), I will be passing around a dictionary that maps variables to values. So I need this:

mainInterp expr = interp expr Map.empty

interp :: Expr -> Map String Value -> ExprInterp Value
in SemanticsFunctions.hs

And now I implement interp for each construct.

Basic constructs

My language has number literals, boolean literals, and binary operators:

data Expr = Num Integer
          | Bln Bool
          | Prim2 Op2 Expr Expr         -- Prim2 op operand operand
          | ...

data Op2 = Eq | Plus | Mul
in SemanticsFunctions.hs

I will be evaluating them (and more) to number values, boolean values, and later another kind of values. A clean habit is not to re-use the abstract syntax tree type, but to define a separate type, since values have much fewer possibilities, and some possibilities will not correspond well to any abstract syntax tree.

data Value = VN Integer
           | VB Bool
           | ...
in SemanticsFunctions.hs

Here is how I evaluate a number literal. (Boolean literal is similar.)

interp (Num i) _ = pure (VN i)
in SemanticsFunctions.hs

Arithmetic (all operands evaluated)

Here is how I evaluate addition; most other arithmetic operators are similar. The insight is to use structural recursion to evaluate the operands, then you will have number values to add. The annoying part is to check that the values are actually numbers—so I re-factor out the checking to a helper function (learned it the hard way).

interp (Prim2 Plus e1 e2) env =
    interp e1 env
    >>= \a -> intOrDie a
    >>= \i -> interp e2 env
    >>= \b -> intOrDie b
    >>= \j -> return (VN (i+j))

intOrDie :: Value -> ExprInterp Integer
intOrDie (VN i) = pure i
intOrDie _ = raise TypeError
in SemanticsFunctions.hs

I don't support division. If you do, you have one more annoying task to check for zero in the denominator!

Short-circuiting, conditionals (operands selectively evaluated)

I have an if-then-else:

data Expr = ...
          | Cond Expr Expr Expr         -- Cond test then-branch else-branch
in SemanticsFunctions.hs

This is a short-circuiting operator: some operands are selectively evaluated, others skipped. Here it goes:

interp (Cond test eThen eElse) env =
    interp test env
    >>= \a -> case a of
      VB True -> interp eThen env
      VB False -> interp eElse env
      _ -> raise TypeError
in SemanticsFunctions.hs

You can add short-circuting logical operators (and, or), and their semantics will be similar. Also, we could/should re-factor checking that we actually get a boolean value! (Exercise: implement and use a boolOrDie.)

Variables, local bindings, environments (scopes)

data Expr = ...
          | Var String
in SemanticsFunctions.hs

Where do we get the contents of variables from? A nice solution is to maintain a dictionary that maps variables to contents, so we can just look up. This dictionary is called “environment”; and mapping a variable to its content is called “binding”.

Also, the nature of the contents depend on the evaluation strategy of the language, e.g., call by value just needs values (what we do in this lecture), lazy evaluation needs something more complex to cater for partly evaluated, partly unevaluated expressions.

I use Data.Map (internally binary search trees) for dictionaries. Practical interpreters use hash tables; compilers use an array because they compile variable names to addresses.

Two constructs in this lecture are responsible for setting up environments: local bindings in this section later, and function application in a later section. For now we first read from the environment.

So to evaluate a variable, just look up, and give up if not found.

interp (Var v) env = case Map.lookup v env of
  Just a -> pure a
  Nothing -> raise VarNotFound
in SemanticsFunctions.hs

Roast: In some languages, when a variable is not found, it is created on the spot and bound to a “default” value. What's wrong is that rationally there is no such thing. What should be the default boolean value? (Note that either choice is wrong half of the time.) What should be the default number value? (Note that 0 is wrong for many use cases, e.g., when you plan to multiply numbers.) What should be the default value for time? For temperature?

My local binding construct wraps an expression inside a new local context of 0 or more “name = expr” bindings:

data Expr = ...
          | Let [(String, Expr)] Expr   -- Let [(name, rhs), ...] eval-me
in SemanticsFunctions.hs

E.g.,

let { x=1; y=0 }
 in x+y

is represented as

Let [("x", Num 1), ("y", Num 0)] (Prim2 Plus (Var "x") (Var "y"))

There are a number of decisions to make about the semantics of the local binding construct. (It is also possible to offer many different local binding constructs, one for each way of making these decisions. Scheme is a good example.)

Because I chose sequential binding, right after I process one equation, I have to extend the environment to include its new binding, under which I process the remaining equations and eventually the wrapped expression.

interp (Let eqns evalMe) env =
    extend eqns env
    >>= \env' -> interp evalMe env'
    -- Example:
    --    let x=2+3; y=x+4 in x+y
    -- -> x+y   (with x=5, y=9 in the larger environment env')
    -- "extend env eqns" builds env'
  where
    extend [] env = return env
    extend ((v,rhs) : eqns) env =
        interp rhs env
        >>= \a ->
        let env' = Map.insert v a env
        in extend eqns env'
in SemanticsFunctions.hs

E.g., working on let { x=2+3; y=x+4; } in x+y:

extend's nth iteration v rhs env env'
1 x 2+3 {} { x = 5 }
2 y x+4 { x = 5 } { x = 5, y = 9 }

Now we're ready to evaluate x+y under { x = 5, y = 9 }.

How these are local variables unknown to the outside: Suppose I have

(let {x=2+3; y=x+4;} in x+y) * (1+1)

I make a recursive call to handle the “let-in”. Inside that recursive call the new environment { x = 5, y = 9 } is built for internal use, but not returned or passed back to the outside. The outside still uses the outside environment for “1+1” etc.

Function construction (lambda), closures

For simplicity, I just have a lambda construct for anonymous functions; if you want to define a function with a name, use lambda together with “let”. I don't support recursive functions for now.

data Expr = ...
          | Lambda String Expr          -- Lambda argname body
in SemanticsFunctions.hs

E.g.,

\x -> ...

is represented as

Lambda "x" (...)

To evaluate a lambda, the first instinct is that there is almost nothing you can do now, the value is like the lambda itself, simply a tuple/record storing the parameter name and the function body. You're almost there! This is just missing some environment because…

Suppose your lambda is “\y->x+y”. What do you use for the x there?

Terminology: In that expression, y is a “bound variable”, x is a “free variable”.

This also carries to “let”: In “let y=x+1 in x*yy is a bound variable, x is a free variable.

Bound variable (of/in an expression): You can see where the variable is bound (introduced or declared or defined).

Free variable (of/in an expression): You can't, it has to come from the outside.

A free variable like x is probably bound in an outer context, e.g.,

let x=10 in ..... (\y->x+y) ....

When the interpreter runs into the lambda, and if it already knows x=10 from the outer context, it needs to attach “x=10” to the lambda so it is not forgotten!

So the value after evaluating a lambda needs to remember 3 things: parameter name, function body, the environment in scope for this lambda. (In practice people trim the environment down to just what the function body needs. For simplicity I don't.)

Terminology: The combination of “\y->x+y” plus “oh BTW x=10” is called a “closure”.

A closure is a record or data structure that stores an expression (usually a lambda, but generally can be any expression, or pointer to code if asm level) together with the environment for all of its free variables.

In this lecture, my only use of closures is for lambdas, so my closure representation is specialized for that purpose only.

data Value = ...
           | VClosure (Map String Value) String Expr

interp (Lambda v body) env = pure (VClosure env v body)
in SemanticsFunctions.hs

Fun fact: When Javascript, Java, and C++ finally added lambdas, they finally realized how difficult this business of free variables and closures is! They also have an extra issue:

Since x is a mutable variable in their languages, you get to ask: Does it mean the value at the time of creating the lambda, or does it mean a reference?

Function application

data Expr = ...
          | App Expr Expr               -- App func param
in SemanticsFunctions.hs

Exercise: Give some scenerios supporting why the function can be an arbitrary expression, not always a lambda or a name. Most 1970s languages allowed a name only! How dare…

There is a decision to make about the semantics of function application, namely evaluation strategy: call by value, lazy evaluation, call by name. I do call by value.

However, all of them require you to evaluate the function until you get a function closure, sooner or later. (Even if it's a name, you still have to look it up.) Usually people do it first, so do I.

Afterwards, since I do call by value, I evaluate the parameter until I get a value. Now I'm ready to plug and chug.

How do I plug and chug? Just like evaluating let, I can first extend the environment to bind the parameter name to the parameter value, then it makes sense to evaluate the function body under that environment.

Exercise: But extend which environment? Recall I have two now: The one in scope, and the one stored in the closure. Which one?

interp (App f e) env =
    interp f env
    >>= \c -> case c of
      VClosure fEnv v body ->
          interp e env
          >>= \eVal ->
          let bEnv = Map.insert v eVal fEnv  -- fEnv, not env
          in interp body bEnv
          -- E.g.,
          --    (\y -> 10+y) 17
          -- -> 10 + y      (but with y=17 in environment)
          --
      _ -> raise TypeError
in SemanticsFunctions.hs

E.g., for (let x=7 in \y -> x+y) 10:

  1. interp on the function gives:

    VClosure {x = VN 7} "y" («x+y»)
  2. interp on the “10” gives VN 10

  3. Now do

    interp («x+y») {x = VN 7, y = VN 10}

What if we extend the environment in scope, i.e., bEnv extends env rather than fEnv? This was basically the bug made in early implementations of Lisp. Some people like this bug and regard it as a feature: “dynamic scoping”. Roast: It's a bug, not a feature.

Dynamic scoping: a variable name refers to whoever has that name at the time of evaluation. Don't do this. Example:

  let { x=10; f = \y->x+y ; } in
  let { x=5; } in
  f 0
→ 5+0

Lexical/Static scoping: a variable name refers to whoever has that name in the code location. This is the sane way we all do today. Do this

Recursion

I don't have a recursion or iteration construct yet. Do you believe that I can do factorial? Do you even believe that I can do non-termination?

Answer: Try “(\x -> x x) (\x -> x x)”. TEE HEE HEE! “(\x -> x x)” is called “diagonal” and “Delta”. Recall the diagonalization proofs in computability, which talk about running a program with itself as input.

And here is how to do factorial:

let mkFac = \f -> \n -> if n=0 then 1 else n * (f f) (n-1)
in mkFac mkFac 2

The trick is to take an extra parameter for a function to be called. Then, like in diagonalization, call a function with itself as a parameter.

Here is a high-level trace:

   mkFac mkFac 2
→
   (\f -> \n -> if n=0 then 1 else n * (f f) (n-1)) mkFac 2
→
   (\n -> if n=0 then 1 else n * (mkFac mkFac) (n-1)) 2
→
   if 2=0 then 1 else 2 * (mkFac mkFac) (2-1)
→
   2 * (mkFac mkFac) (2-1)
→
   2 * (\f -> \n -> if n=0 then 1 else n * (f f) (n-1)) mkFac (2-1)
→
   2 * (\n -> if n=0 then 1 else n * (mkFac mkFac) (n-1)) (2-1)
→
   2 * (\n -> if n=0 then 1 else n * (mkFac mkFac) (n-1)) 1
→
   2 * if 1=0 then 1 else 1 * (mkFac mkFac) (1-1)
→
   2 * 1 * mkFac mkFac (1-1)
→
   2 * 1 * (\n -> if n=0 then 1 else n * mkFac mkFac (n-1)) (1-1)
→
   2 * 1 * (\n -> if n=0 then 1 else n * mkFac mkFac (n-1)) 0
→
   2 * 1 * if 0=0 then 1 else 0 * ...
→
   2 * 1 * 1

Exercise: Extend this technique to mutual recursion among multiple functions.

Question for advanced theory or philosophy: Where does the recursion come from?!

So why do languages support recursion directly? Why not just make you write code like this? Answer: Language design tries to let you express your intention directly rather than encoding/simulating your intention as something else. Your code is then more comprehensible.

Supporting recursive functions

I support recursive functions by this lambda-like construct that has one more field for a fuction name so that, in the function body, we know which name refers to self:

data Expr = ...
          | Rec String String Expr      -- Rec funcname argname funcbody
in SemanticsFunctions.hs

E.g., if I want \n -> if n==0 then 0 else f (n-1) in which I intend f to refer to the function again, it's represented by:

Rec "f" "n" («if n==0 then 0 else f (n-1)»)

How to extend the interpreter to carry out that intention? Here is a solution. (In previous years I presented a slicker but more mind-bending way.)

This 2nd kind of lambda is evaluated to a 2nd kind of closure, if only to indicate that it stands for a recursive function. I extend the Value type by:

data Value = ...
           | VRecClosure (Map String Value) String String Expr

interp (Rec f v fbody) env = pure (VRecClosure env f v fbody)
in SemanticsFunctions.hs

It has one more field (name of recursive function) than VClosure, but otherwise similar.

The real job is done when interp handles App f e. When f evaluates to VRecClosure, my intention is carried out (I show VClosure too for comparison):

interp (App f e) env =
    interp f env
    >>= \c -> case c of
      VClosure fEnv v body ->
          interp e env
          >>= \eVal ->
          let bEnv = Map.insert v eVal fEnv
          in interp body bEnv
      VRecClosure fEnv fName v body ->
          interp e env
          >>= \eVal ->
          let bEnv = Map.insert v eVal (Map.insert fName c fEnv)
          in interp body bEnv
      _ -> raise TypeError
in SemanticsFunctions.hs

In words: To enable recursive calls, the body is evaluated in an environment that maps the function name to the function again!

Exercise: Extend this technique to mutual recursion among multiple functions.