There is something common to these examples:
mySumV1 [] = 0
mySumV1 (x:xt) = x + mySumV1 xt
myProductV1 [] = 1
myProductV1 (x:xt) = x * myProductV1 xt
myMap f [] = []
myMap f (x:xt) = f x : myMap f xt
The common theme is a function that looks like
g [] = z
g (x:xt) = a function of x and (g xt)
mySumV1:
z = 0
function: x + mySumV1 xt --- the function is (+)
myProductV1:
z = 1
function: x * myProductV1 xt --- the function is (*)
myMap f:
z = []
function: f x : (myMap f) xt
= (\x r -> f x : r) x ((myMap f) xt) --- the function is \x r -> f x : r
All these are refactored to:
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr op z = g
  where
    g [] = z
    g (x:xt) = op x (g xt)
-- So here are mySumV1, myProductV1, myMap re-expressed with myFoldr:
mySumFoldr xs = myFoldr (+) 0 xs
myProductFoldr xs = myFoldr (*) 1 xs
myMapFoldr f xs = myFoldr (\x r -> f x : r) [] xs
The standard library has foldr for this.
The “r” in “foldr” means “right” and came from very basic examples like
foldr (+) 0 [1,2,3] = 1 + (2 + (3 + 0))
which looks like a right-associative computation. However, I deemphasize this angle. It doesn’t help with recognizing sophisticated cases like map and beyond. And it utterly fails under lazy evaluation, e.g., foldr (&&) True infinite_list does not begin by building an infinite conjunction.
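To make the lazy-evaluation point concrete, here are two examples of my own, using only the Prelude: because (&&) returns False without looking at its second argument, foldr can finish on an infinite list, and foldr can even produce an infinite result incrementally.

```haskell
-- foldr (&&) True unfolds only as far as demanded:
--   foldr (&&) True (False : rest) = False && foldr (&&) True rest
-- and (&&) short-circuits on False, so the recursion stops immediately.
shortCircuitAnd :: Bool
shortCircuitAnd = foldr (&&) True (repeat False)  -- finishes; answer is False

-- foldr can also build an infinite result that we consume lazily:
firstFew :: [Int]
firstFew = take 3 (foldr (\x r -> x * 2 : r) [] [1 ..])  -- [2,4,6]
```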
There is something common to these examples:
-- Sum a list, like the loop-and-accumulator you normally write:
mySumV2 xs = g 0 xs
  where
    -- If you want to use induction to prove g, use this specification:
    -- for all xs, for all a, g a xs = a + sum of xs
    g accum [] = accum
    g accum (x:xt) = g (accum + x) xt
-- Reverse a list, like the loop-and-accumulator you normally write:
myReverse xs = g [] xs
  where
    -- If you want to use induction, the specification is:
    -- for all xs, for all a, g a xs = (reversal of xs) ++ a
    g accum [] = accum
    g accum (x:xt) = g (x : accum) xt
Exercise: slowReverse below is too slow. Why? (How much time does ++ take?)
slowReverse [] = []
slowReverse (x:xs) = slowReverse xs ++ [x]
Common theme: a function f that looks like
f xs = g a0 xs
  where
    g accum [] = accum
    g accum (x:xt) = g (a function of accum and x) xt
mySumV2:
a0 = 0
function: (+)
myReverse:
a0 = []
function: (\a x -> x : a) = flip (:)
This is abstracted into:
myFoldl :: (b -> a -> b) -> b -> [a] -> b
myFoldl op a0 xs = g a0 xs
  where
    g accum [] = accum
    g accum (x:xt) = g (op accum x) xt
mySumFoldl xs = myFoldl (+) 0 xs
myReverseFoldl xs = myFoldl (flip (:)) [] xs
The standard library has foldl for this.
The “l” in “foldl” means “left” and came from very basic examples like
foldl (+) 0 [1,2,3] = ((0 + 1) + 2) + 3
which looks like a left-associative computation. However, I deemphasize this angle. It doesn’t help with recognizing sophisticated cases like reverse and beyond.
Under lazy evaluation, as seen in previous lectures, foldl (+) takes up too much space, and the fix is a version that uses seq to force the accumulator every iteration. This is available as foldl' in Data.List:
foldl' op a0 [] = a0
foldl' op a0 (x:xt) = let a1 = op a0 x in seq a1 (foldl' op a1 xt)
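To contrast the folds, here is a sketch of my own: all three agree on finite lists, and foldl' sums a large list without building a million-deep thunk.

```haskell
import Data.List (foldl')

-- foldl would build (((0 + 1) + 2) + ...) as one huge unevaluated thunk;
-- foldl' forces the accumulator at every step, keeping space constant.
bigSum :: Integer
bigSum = foldl' (+) 0 [1 .. 1000000]  -- 500000500000

-- On finite lists, foldr, foldl, and foldl' with (+) all agree:
sums :: (Integer, Integer, Integer)
sums = (foldr (+) 0 [1 .. 10], foldl (+) 0 [1 .. 10], foldl' (+) 0 [1 .. 10])
```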
A lot of list functions simply traverse a list sequentially and “summarize”: length, sum, minimum, foldr, foldl, foldl'… They should also make sense for other sequence data structures, e.g., these should all make sense:
sum :: Num a => [] a -> a -- "[] a" is another way to write "[a]"
sum :: Num a => Maybe a -> a -- think sequence of length 0 or 1
sum :: Num a => Seq a -> a -- from module Data.Sequence
sum :: Num a => Vector a -> a -- array, https://hackage.haskell.org/package/vector
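With GHC’s base library, the first two of those signatures already work today, since [] and Maybe are Foldable (Seq and Vector need the containers and vector packages installed). A quick sketch:

```haskell
-- sum and length work across containers, not just lists:
listSum :: Int
listSum = sum [1, 2, 3]       -- 6

maybeSum :: Int
maybeSum = sum (Just 5)       -- 5: Just x behaves like a length-1 sequence

nothingSum :: Int
nothingSum = sum Nothing      -- 0: Nothing behaves like an empty sequence

maybeLen :: Int
maybeLen = length (Just 'a')  -- 1
```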
Haskell supports this generalization with a type class, but in a way you haven’t seen in other languages. You have seen this kind of generalization, which abstracts over the element type:
[] Int, [] Char, [] Bool generalize to [] a
But now I’m telling you to do this, abstracting over the container type:
[] Int, Maybe Int, Vector Int, Seq Int generalize to t Int
With that, here is the type class:
class Foldable t where
  length :: t a -> Int
  sum :: Num a => t a -> a
  minimum :: Ord a => t a -> a
  foldr :: (a -> b -> b) -> b -> t a -> b
  foldl :: (b -> a -> b) -> b -> t a -> b
  -- and others
Important: It is not “class Foldable (t a)”. Likewise, instances go like “instance Foldable []”, not “instance Foldable ([] a)”.
But the most fundamental method, which rigorously justifies the intuition of grouping those methods together under “sequentially summarize”, is:
foldMap :: Monoid m => (a -> m) -> t a -> m
So now I have to explain Monoid.
When you have an associative binary operator, mathematicians say you have a “semigroup”. This word puts emphasis on the set of values that your operator works on.
When the associative operator also has an identity element, mathematicians say you have a “monoid”. (Again, the word puts emphasis on the set of values that your operator works on.)
Example: list concatenation, and the empty list is the identity element.
If you allow only non-empty lists, that’s a non-monoid semigroup.
Example: function composition, but take care to pick a function space like “X → X”, i.e., domain and codomain have to be the same, so that “f ∘ f” makes sense. The identity function is then the identity element.
Semigroups and monoids are represented by type classes in the Haskell standard library (replace “set” by “type”):
class Semigroup a where
  (<>) :: a -> a -> a
  -- other methods implementable by (<>)

class Semigroup a => Monoid a where
  mempty :: a
  -- other methods implementable by mempty and (<>)

instance Semigroup [a] where
  (<>) = (++)

instance Monoid [a] where
  mempty = []
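Lists are not the only instance. The standard library’s Data.Monoid module offers wrappers such as Sum (numbers under addition) and Any (Bools under (||)); these names come from base, not from these notes. A sketch:

```haskell
import Data.Monoid (Any (..), Sum (..))

-- Numbers under addition: (<>) is (+), mempty is Sum 0.
sumDemo :: Integer
sumDemo = getSum (Sum 3 <> Sum 4 <> mempty)  -- 7

-- Bools under (||): (<>) is (||), mempty is Any False.
anyDemo :: Bool
anyDemo = getAny (Any False <> Any True)     -- True

-- Lists under (++), as in the instance above:
listDemo :: String
listDemo = "ab" <> mempty <> "cd"            -- "abcd"
```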
class Foldable t where
  foldMap :: Monoid m => (a -> m) -> t a -> m
  ...
The intention of foldMap by list-based examples:
foldMap f [a, b, c, d] = f a <> f b <> f c <> f d
foldMap f [] = mempty
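To make that concrete with examples of my own: picking different monoids for m makes foldMap compute different summaries. (Sum is from Data.Monoid, not defined in these notes.)

```haskell
import Data.Monoid (Sum (..))

-- With the Sum monoid, foldMap Sum [a,b,c,d] = Sum a <> ... <> Sum d, i.e. the sum:
viaSum :: Integer
viaSum = getSum (foldMap Sum [1, 2, 3, 4])  -- 10

-- With the list monoid, foldMap duplicates each element and concatenates:
viaList :: [Int]
viaList = foldMap (\x -> [x, x]) [1, 2, 3]  -- [1,1,2,2,3,3]

-- The empty list gives mempty of the chosen monoid:
viaEmpty :: Integer
viaEmpty = getSum (foldMap Sum [])          -- 0
</imports>
```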
Similarly for other sequence data structures, since they all express the same idea of “store elements a, b, c, d, in that order”.
Note that since <> is supposed to be associative, grouping should not matter, i.e., these should all give the same answer:
f a <> (f b <> (f c <> f d))
((f a <> f b) <> f c) <> f d
(f a <> f b) <> (f c <> f d)
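A tiny sanity check of my own, using the list monoid (so <> on strings is ++):

```haskell
-- Three groupings of "a" <> "b" <> "c" <> "d"; associativity makes them equal:
g1, g2, g3 :: String
g1 = "a" <> ("b" <> ("c" <> "d"))
g2 = (("a" <> "b") <> "c") <> "d"
g3 = ("a" <> "b") <> ("c" <> "d")
-- all three are "abcd"
```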
An instance would choose one way according to efficiency or simplicity. I show a simple version for my own list type, just for concrete understanding, but it likely loses on efficiency:
data MyList a = Nil | Cons a (MyList a) deriving (Eq, Show)
instance Foldable MyList where
  foldMap _ Nil = mempty
  foldMap f (Cons x xt) = f x <> foldMap f xt
All other Foldable methods are mathematically applications of foldMap. (Obvious for some methods, surprising and advanced for others, but all true.) This gives just cause for grouping them under the same class. The class comes with their default implementations in terms of foldMap. So now you can use length, sum, minimum, foldr, foldl, and others on my list type.
However, the default implementations can be very inefficient, so a practical instance custom-implements each method for efficiency.
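As a quick check, here is the MyList instance repeated as a self-contained sketch, exercising the default methods that fall out of foldMap:

```haskell
-- Repeating the definitions from above so this snippet stands alone:
data MyList a = Nil | Cons a (MyList a) deriving (Eq, Show)

instance Foldable MyList where
  foldMap _ Nil = mempty
  foldMap f (Cons x xt) = f x <> foldMap f xt

demo :: MyList Int
demo = Cons 3 (Cons 1 (Cons 2 Nil))

-- length, sum, minimum, foldr all come for free from foldMap:
demoStats :: (Int, Int, Int, [Int])
demoStats = (length demo, sum demo, minimum demo, foldr (:) [] demo)
-- = (3, 6, 1, [3,1,2])
```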