Haskell Types Part 1

Nested lambdas and nested function applications

Lambda = anonymous function = how to write a function without giving it a name

Python: lambda x : x+1
Javascript: function (x) { return x+1; }
x => x+1
Haskell: \x -> x+1

\ was chosen because it looks like the Greek letter lambda: λ.

Next, if you intend 2 parameters, the Haskell culture is to model it as a nested function: \x -> (\y -> 2*x - 3*y) (those parentheses can be omitted) — a function that maps the 1st parameter to a function that takes the 2nd parameter.

Shorthand: \x y -> 2*x - 3*y

Recall from earlier

diffSq x y = (x - y) * (x + y)

This can be written as

diffSq = \x y -> (x - y) * (x + y)

or even

diffSq x = \y -> (x - y) * (x + y)

What about applying a function to 2 parameters:

diffSq 10 5

is shorthand for

(diffSq 10) 5

It is possible use “diffSq 10” alone (“partial application”). When it is evaluated, here is what happens:

  diffSq 10
→ (\x y -> (x - y) * (x + y)) 10
→ \y -> (10 - y) * (10 + y)

Type-wise:

X -> Y -> A

is shorthand for

X -> (Y -> A)

Recall “function that maps the 1st parameter to function that takes the 2nd parameter”.

Next, if there is X->(Y->A), then it's natural ask about (X->Y)->A, i.e., what if a parameter is a function. Yes you can do that, and we say “higher-order functions”. Toy example: The following atFourPlusAtSeven function is a higher-order function:

atFourPlusAtSeven :: (Integer -> Integer) -> Integer
atFourPlusAtSeven f = f 4 + f 7
in HaskellTypes1.hs

Try using it like these:

atFourPlusAtSeven negate
atFourPlusAtSeven (\x -> x*2)

Realistic example showing both a higher-order function and partial application: The standard library has a function “map”. Its type:

map :: (a -> b) -> [a] -> [b]

or

map :: (a -> b) -> ([a] -> [b])

What it does:

map f [1, 2, 3] = [f 1, f 2, f 3]

This is possible (assume f :: Integer -> Integer):

map (map f) [[1,2,3], [4,5,6]]

One of the parameters is “map f”, a partial application.

Detailed type breakdown:

What it does:

  map (map f) [[1,2,3], [4,5,6]]
= [map f [1,2,3], map f [4,5,6]]
= [[f 1, f 2, f 3], [f 4, f 5, f 6]]

Try these:

map (\x -> 10*x+8) [1,2,3]
map (map (\x -> 10*x+8)) [[1,2,3], [4,5,6]]

Parametric polymorphism

This function takes an integer argument x, builds the list [x, x].
rep2Integer :: Integer -> [Integer]
rep2Integer x = [x, x]
in HaskellTypes1.hs

I have unnecessarily pinned the type to Integer -> [Integer]. rep2Integer 4 is allowed, rep2Integer False isn't. But surely \x -> [x, x] doesn't care about the type of x?

rep2 :: a -> [a]
rep2 x = [x, x]
in HaskellTypes1.hs

All allowed: rep2 4, rep2 False, rep2 "hello", etc.

Similar to {Java, Scala, Rust,...} generics, some C++ templates:

Java signature: <T> SLL<T> rep2(T)
C++ prototype: template<T> SLL<T> rep2(T)
(Pretend I have a type called SLL in Java and C++ for generic singly-linked lists.)

In “a -> [a]”, the “a” there is a “type variable” or “type parameter”. Names of type variables are up to you, doesn't have to be “a”, but does have to start with lower case, e.g., element, myElementType.

(Type constants (names of built-in types and defined types) start with upper case, e.g., Integer, Bool, String.)

The type signature

rep2 :: a -> [a]

is better understood as

rep2 :: forall a. a -> [a]

This means the user (receiver) chooses which type to use for “a”, e.g.,

We say that rep2 is “polymorphic” — can become one of many types: Integer->[Integer], Bool->[Bool], String->[String], etc. rep2Integer is “monomorphic” — stuck with being one single type.

Important: The choice is up to the user, therefore not up to the provider. The provider cannot write:

rep2wrong :: a -> [a]
rep2wrong x = [False, True]

because it is not up to the provider to choose Bool for “a”.

Rant: Rookies (of explaining things) try to explain by repeating “a can be any type” many, many times. This completely misses the point. No one ever doubts what a can be; the real question is who controls it. “Any” fails to explain why rep2wrong is wrong (in fact can give the opposite impression); “up to the user, not up to the provider” is the explanation.

Flash quiz:

xyx :: a -> a -> [a]
xyx x y = [x, y, x]
in HaskellTypes1.hs

Is “xyx 4 False” legal?

Example with more than one type variables: The map function we saw earlier has this type:

map :: (a -> b) -> [a] -> [b]

Again better understood as:

map :: forall a b. (a -> b) -> [a] -> [b]

The user chooses what to use for a and what to use for b. Examples:

“Parametric polymorphism” — the “parametric” part means: the provider is not told what the user chooses. Consequences:

(Two sides of the same coin. Generally, flexibility for the implementer is in direct conflict with predictability for the user. Programming is a dialectic class struggle between the user and the implementer.)

But we also agree there is a beneficial, limited way of varying behaviour by type. Recall my insertion sort (or most sorting algorithms); I pinned the element type at Integer:

insertionSort :: [Integer] -> [Integer]

Surely I only used order comparison (<=), so my code should be good for a lot of element types—those that support order comparisons—not just Integer? How would you write the polymorphic type? Not this:

insertionSort :: [a] -> [a]

because this says that the provider is not told that order comparisons can be done on whichever type the user chooses.

Different types have different meanings of order comparison, so order comparison and all algorithms built on it (e.g., sorting, BST operations) are technically varying by types, but only to a reasonable extent and nowhere near the facetious kind, and definitely beneficial.

Here is one way: The sorting algorithm takes an extra parameter for the comparison function, and the use site supplies it. It solves the simple cases, and you can do it in most languages. (In OOP languages, pass an object that contains the comparison method.)

insertionSort2 :: (a -> a -> Bool) -> [a] -> [a]
insertionSort2 _ [] = []
insertionSort2 leq (x:xt) = insert x (insertionSort2 leq xt)
  where
    insert e [] = [e]
    insert e xs@(x:xt)
        | leq e x = e : xs
        | otherwise = x : insert e xt

exampleInsertionSort2 = insertionSort2 (<=) [3,1,4]
in HaskellTypes1.hs

This approach can be error-prone. Suppose we do it to a polymorphic binary search tree type BST a and a suite of insert, delete, lookup operations:

bstEmpty :: BST a
bstInsert :: (a -> a -> Bool) -> a -> BST a -> BST a
bstDelete :: (a -> a -> Bool) -> a -> BST a -> BST a
bstLookup :: (a -> a -> Bool) -> a -> BST a -> Bool

The problem: If you erroneously use one comparator when inserting, but a different comparator when looking up or deleting, you get nonsense.

In a later lecture (on “type classes”) we will discuss what Haskell and other languages offer to ensure consistency. Teaser preview:

insertionSort3 :: Ord a => [a] -> [a]

bstEmpty :: BST a
bstInsert :: Ord a => a -> BST a -> BST a
bstDelete :: Ord a => a -> BST a -> BST a
bstLookup :: Ord a => a -> BST a -> Bool

For now Ord a means I can use comparisons like <= on values of type a.

User-defined types: Algebraic data types

First cut: Real enumerations. (C has fake enumerations.)

data Direction = North | East | South | West
    deriving (Eq, Show)
    -- To enable equality and printing. More on this when we discuss type classes.
in HaskellTypes1.hs

The name of the type is Direction.

The possible values of that type are North, East, South, West.

North, East, South, West are also the “data constructors” of that type. If you think “tags”, you're right, I think that too.

All 5 names are your choice. Only requirement: capitalized.

Warning: Not OOP constructors. You are only choosing a name for a tag, you are not writing arbitrary initialization code.

Warning: Not OOP subclasses/subtypes either. North is not a type, it's a term and value.

Similarly, the standard library defines Bool by

data Bool = False | True

Real enumeration because you can't mix up False, North, and 0. (Contrast with C.) Everything is not an integer.

Data constructors can be used in patterns. Pattern matching is the fundamental and preferred way of discerning a value of an algebraic data type.

bearing :: Direction -> Integer
bearing North = 0
bearing East = 90
bearing South = 180
bearing West = 270

-- inverse of the above, but has to be a partial function
direction :: Integer -> Direction
direction 0 = North
direction 90 = East
direction 180 = South
direction 270 = West
direction _ = error "unsupported bearing"
in HaskellTypes1.hs

Second cut: But each data constructor can have fields.

Imagine a feudal country Tetrastrata with four kinds of persons:

They can be represented as:

data Tetrastratan
    = Monarch
    | Lord String String Integer
    | Knight String
    | Peasant String
    deriving (Eq, Show)
in HaskellTypes1.hs

The type name is Tetrastratan.

The data constructors (tags) are: Monarch, Lord, Knight, Peasant.

A value of type Tetrastratan is one of:

Mathematically, Tetrastratan is a tagged disjoint union:

(singleton set) ⊎ String×String×Integer ⊎ String ⊎ String

analogous to a sum of products — polynomial — “algebraic”.

Data constructors of 1 or more fields can be used as functions. When you do it, their function types are:

Lord :: String -> String -> Integer -> Tetrastratan
Knight :: String -> Tetrastratan
Peasant :: String -> Tetrastratan

Monarch is not considered a function, but it is a constant, and its type is:

Monarch :: Tetrastratan

(Similarly for Direction earlier, e.g., North :: Direction.)

Below are examples of using Tetrastratan:

ninthDukeOfSussex = Lord "Duke" "Sussex" 9

-- How to address a Tetrastratan:
addressTetra Monarch = "H.M. The Monarch"
addressTetra (Lord d t i) = "The " ++ show i ++ "th " ++ d ++ " of " ++ t
addressTetra (Knight n) = "Sir " ++ n
addressTetra (Peasant n) = n

-- Social order:
superior Monarch (Lord _ _ _) = True
superior Monarch (Knight _) = True
superior Monarch (Peasant _) = True
superior (Lord _ _ _) (Knight _) = True
superior (Lord _ _ _) (Peasant _) = True
superior (Knight _) (Peasant _) = True
superior _ _ = False
in HaskellTypes1.hs

Exercise: String for lord titles is too broad. Replace by an enumeration type. Everything is not a string.

Design guide: Design your data type to avoid invalid data in the first place. This reduces bugs. Not always possible, and when possible not always worthwhile, but you strive to. Disallow invalid data unless you can justify why it is not worthwhile. Read this article from Scala practitioners:

Third cut: Recursion (self, mutual) is also supported.

Example: [singly linked] list of integers. B36-style recursive definition: A value of type MyIntegerList is one of:

data MyIntegerList = INil | ICons Integer MyIntegerList
    deriving (Show, Eq)

exampleMyIntegerList = ICons 4 (ICons (-10) INil)

-- `from0to n` builds a MyIntegerList from 0 to n-1
from0to :: Integer -> MyIntegerList
from0to n = make 0
  where
    make i | i >= n = INil
           | otherwise = ICons i (make (i+1))

myISum :: MyIntegerList -> Integer
myISum INil = 0
myISum (ICons x xs) = x + myISum xs
in HaskellTypes1.hs

Example: binary [search] tree of integers. B36-style recursive definition: A value of type IntegerBST is one of:

data IntegerBST = IEmpty | INode IntegerBST Integer IntegerBST
    deriving Show

exampleIntegerBST = INode (INode IEmpty 3 IEmpty) 7 (INode IEmpty 10 IEmpty)
in HaskellTypes1.hs

BST insert. Since this is functional programming with immutable trees, “insert” means produce a new tree that is like the input tree but with the new key. May be better to say “the tree plus k”.

ibstInsert :: Integer -> IntegerBST -> IntegerBST
ibstInsert k IEmpty =
    -- Base case: empty tree plus k = singleton tree with k
    INode IEmpty k IEmpty

ibstInsert k inp@(INode left key right)
    -- Induction step: The input tree has the form INode left key right.
    -- Induction hypothesis (strong induction):
    --   If t is a smaller tree than the input tree, e.g., t=left or t=right,
    --   then ibstInsert k t computes t plus k.
    -- Can you use this to help compute inp plus k?

    -- If k<key, the correct answer is a node consisting of:
    -- new left child = left plus k = ibstInsert k left (by IH)
    -- new key = key
    -- new right child = right

    | k < key = INode (ibstInsert k left) key right

    -- If k>key, mirror image of the above.

    | k > key = INode left key (ibstInsert k right)

    -- If k==key, nothing new to compute, the correct answer is the input tree.
    | otherwise = inp   -- INode left key right
in HaskellTypes1.hs

Fourth cut: Parametric polymorphism (“generics” in Java, Rust, etc.).

Example: binary [search] tree of arbitrary key type.

data BST a = Empty | Node (BST a) a (BST a)
    deriving Show

exampleBSTChar :: BST Char
exampleBSTChar = Node (Node Empty 'c' Empty) 'g' (Node Empty 'j' Empty)

bstInsert :: Ord a => a -> BST a -> BST a
-- "Ord a =>" to be explained later under Type Classes. For now it means I can
-- do "<", "<=" comparisons.
bstInsert k Empty = Node Empty k Empty
bstInsert k inp@(Node left key right)
    | k < key = Node (bstInsert k left) key right
    | k > key = Node left key (bstInsert k right)
    | otherwise = inp
in HaskellTypes1.hs

Exercise: Rewrite BST and bstInsert for dictionary, not just set. I.e., not just key, but also value. Start like this:

data BST k v = Empty | Node ...

My list type of arbitrary element type. (Just for teaching purpose. You already know about the built-in list type.)

data MyList a = Nil | Cons a (MyList a) deriving (Eq, Show)

exampleMyListI :: MyList Integer
exampleMyListI = Cons 4 (Cons (-10) Nil)

exampleMyListS :: MyList String
exampleMyListS = Cons "albert" (Cons "bart" Nil)
in HaskellTypes1.hs

Homogeneous list --- can't have different item types in the same list, e.g.,

Cons "albert" (Cons True Nil)

is illegal. Because what would be its type, MyList String? MyList Bool?

But look up “Either” and have:

Cons (Left "albert") (Cons (Right True) Nil) :: MyList (Either String Bool)

Similarly for the BST type above. Similarly for the built-in list type.

Why pattern matching instead of a lot of predicates for “is it empty?”, “is it a Node?”… and a lot of “getters” for “what is the head?”, “what is the rest?”, “what is the left child?”, etc etc: Pattern matching is more direct, more cohesive, more obviously correct, and more firmly grounded in formal logic. Elaboration in this article.