Category Theory Notes

Why

Category theory has been applied to computer science for a long time now, at least since the 1960s. You've heard that category theory is good for you, probably even before the newer buzzword "applied category theory", which usually refers to applications outside pure mathematics.

Resources

The resources recommended here are all introductory.

Quickest introduction for those with a strong math background: Basic Category Theory (2014, pdf, ? MB, ? pages) by Tom Leinster.

Best for a general engineering background: Seven Sketches in Compositionality: An Invitation to Applied Category Theory (2018, pdf, 2.9 MB, 353 pages) by Brendan Fong and David Spivak.

For those with no university math training: The Joy of Abstraction by Eugenia Cheng

Groups, and categories of groups

Definition 1. The objects G equipped with a binary operation * form a group iff

  1. * is associative: for all x, y, z ∈ G, (x*y)*z = x*(y*z);
  2. there is an identity element e ∈ G such that for all x ∈ G, e*x = x = x*e;
  3. every x ∈ G has an inverse x⁻¹ ∈ G such that x*x⁻¹ = e = x⁻¹*x.

A group is abelian iff its binary operation is commutative, i.e. for all x, y ∈ G, x*y = y*x.
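
To make this concrete, here is a minimal Haskell sketch (the class and method names are my own, not from a standard library):

class Group g where
    unit :: g             -- the identity element e
    op   :: g -> g -> g   -- the binary operation *
    inv  :: g -> g        -- the inverse of an element

-- the integers under addition form a group (an abelian one, since + commutes)
instance Group Integer where
    unit = 0
    op   = (+)
    inv  = negate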

Definition 2. A group homomorphism from the group (G, *, e) as source to the group (H, ∘, d) as target is a function f defined over G with values in H such that for every x, y ∈ G, f(x*y) = f(x) ∘ f(y).

In sum, such a homomorphism sends products of objects in the source group to corresponding products in the target group. Although homomorphisms are often described as preserving group structure, this does not mean replicating all structure. A homomorphism from G to H can compress many or most aspects of the group structure on G simply by mapping distinct G-objects to one and the same H-object. It really is better, then, to talk of homomorphisms as respecting group structure.
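
For instance, as a small Haskell sketch (the name double is mine): doubling is a homomorphism from the additive integers to themselves, since it sends sums to sums.

double :: Integer -> Integer
double x = 2 * x

-- homomorphism property: double (x + y) == double x + double y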

Theorem 1.

  1. Any two homomorphisms f : G → H, g : H → J compose to give a homomorphism g ∘ f : G → J.
  2. Composition of homomorphisms is associative: h ∘ (g ∘ f) = (h ∘ g) ∘ f.
  3. For any group G, there is an identity homomorphism 1_G : G → G which sends each object to itself. Then for any f : H → J we have f ∘ 1_H = f = 1_J ∘ f.

In other words, groups and group homomorphisms together have exactly the structure needed to form a category, Grp.

Connections to programming

Type is a primitive notion. It cannot be defined. Instead of calling it a type, we could as well call it an object or a proposition. These are the words used to describe it in different areas of mathematics (type theory, category theory, and logic, respectively).

A type by itself has no meaning. What makes it special is how it connects to other types. An arrow between types is called a function. An arrow between objects is called a morphism. An arrow between propositions is called an entailment. These are just words that are used to describe arrows in different areas of mathematics. You can use them interchangeably.

An object is defined by its connections. An object that has a unique outgoing arrow to every object is called the initial object. Its dual is an object that has a unique incoming arrow from every object. It’s called the terminal object. To be unique means that if you can find two of those, then they must be equal.

In mathematics, the initial object is often denoted by 0 and the terminal object by 1.

The initial object is the source of everything. As a type it’s known in Haskell as Void. Since there is an arrow from Void to everything, there is also an arrow from Void to itself. Thus Void begets Void and everything else. The initial object signifies logical falsehood, contradiction, or a counterfactual. It’s written as False and symbolized by an upside down T, ⊥.

The terminal object unites everything. As a type it’s (), known as Unit. In logic, the terminal object signifies the ultimate truth, symbolized by T.
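
In Haskell this is the observation that every type has exactly one (total) function to (); a sketch, with a name of my own choosing that echoes the conventional symbol ! for the unique arrow:

bang :: a -> ()
bang _ = ()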

There are infinitely many types in Haskell, and there is a unique function/arrow from Void to each one of them. All these functions are known under the same name: absurd. It’s useful in some cases as a type-safe way of exhaustively handling impossible cases. For example:

import Data.Void (Void, absurd)

simple :: Either Void a -> a
simple (Left x)  = absurd x   -- the impossible case, handled type-safely
simple (Right y) = y
Programming   Category theory      Logic
type          object               proposition
function      morphism (arrow)     implication
Void          initial object, 0    False, ⊥
()            terminal object, 1   True, T

Elements

An object has no parts but it may have structure. The structure is defined by the arrows pointing at the object. We can probe the object with arrows.

In programming and in logic we want our initial object to have no structure. So we’ll assume that it has no incoming arrows (other than the one that’s looping back from it). Therefore Void has no structure.

The terminal object has the simplest structure. There is only one incoming arrow from any object to it: there is only one way of probing it from any direction. In this respect, the terminal object behaves like an indivisible point. Its only property is that it exists, and the arrow from any other object proves it.

Because the terminal object is so simple, we can use it to probe other more complex objects. If there is more than one arrow coming from the terminal object to some object a, it means that a has some structure: there is more than one way of looking at it. Since the terminal object behaves like a point, we can visualize each arrow from it as picking a different point or element of its target.

In category theory we say that x is a global element (or just element) of a if it's an arrow

x : 1 → a

In Haskell, x :: A means that x is of type A. Haskell uses capitalized names for concrete types, and lower-cased names for type variables.

We say that x is a term of type A but, categorically, we'll interpret it as an arrow x : 1 → A, a global element of A. Though the Haskell type system distinguishes between x :: A and x :: () -> A, they denote the same thing in categorical semantics.
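
As a sketch, here is the same element written both ways (the primed name is mine):

x :: Bool
x = True

-- the "same" element, as an arrow from the terminal object
x' :: () -> Bool
x' () = True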

In logic, such an x is called a proof of A, since it corresponds to the implication T → A (if True is true then A is true). Notice that there may be many different proofs of A.

Since there are no arrows from any other object to Void, there is no arrow from the terminal object to it. Therefore Void has no elements. This is why we think of Void as empty.

The terminal object has just one element, since there is a unique arrow from it to itself, 1 → 1. This is why we sometimes call it a singleton.

The Object of Arrows

In Haskell we write f :: a -> b to mean that f is of the type "function from a to b". Here a -> b is just the name we are giving to this type.

On the one hand we have arrows which connect two objects a and b. These arrows form a set. On the other hand we have an object of arrows from a to b. An “element” of this object is defined as an arrow from the terminal object () to the object we call a -> b.

The notation we use in programming tends to blur this distinction. This is why in category theory we call the object of arrows an exponential and write it as b^a (the source object is in the exponent). So the statement f :: a -> b is equivalent to f : 1 → b^a.
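
As a sketch, picking out a particular function as a global element of the exponential Bool^Bool (the name notArrow is mine):

notArrow :: () -> (Bool -> Bool)
notArrow () = not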

Composition

In Haskell we write composition as

h = g . f

Composition is the source of two mappings of arrows called pre-composition and post-composition.

For an arrow f : a → b, post-composition by f, written as (f ∘ -), lets us shift focus from the object a to the object b. Pre-composition (- ∘ f) lets us shift the perspective from observer b to observer a.
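
In Haskell, these two mappings are just the two sections of the composition operator; a sketch with names of my own choosing:

-- post-composition by f: turns arrows into a into arrows into b
postComp :: (a -> b) -> (x -> a) -> (x -> b)
postComp f = (f .)

-- pre-composition by f: turns arrows out of b into arrows out of a
preComp :: (a -> b) -> (b -> y) -> (a -> y)
preComp f = (. f)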

In programming, an outgoing arrow is interpreted as extracting data from its source. An incoming arrow is interpreted as producing or constructing the target. Outgoing arrows define the interface, incoming arrows define the constructors.

Function application

Consider a journey from 1 to b whose first step is an arrow from the terminal object 1 to some a. It's an element of a; we can write it as x : 1 → a. The rest of the journey is the arrow f : a → b. These two arrows are composable (they share the object a in the middle) and their composition is the arrow y from 1 to b. In other words, y is an element of b:

1 --x--> a --f--> b   =   1 --y--> b

We can write it as y = f ∘ x.

To translate it to Haskell:

x :: a -- a shorthand for x :: () -> a
f :: a -> b
y :: b
y = f x

Identity

Every object has a special arrow called the identity, which leaves the object unchanged. The identity arrow on the object a is called id_a. So if we have an arrow f : a → b, we can compose it with identities on either side: id_b ∘ f = f = f ∘ id_a.

We can easily check what an identity does to elements. Let's take an element x : 1 → a and compose it with id_a. The result is id_a ∘ x = x.

In Haskell, we use the same name, id, for all identity functions: id x = x.

Monomorphisms

Epimorphisms

Isomorphisms

Isomorphic Objects

Naturality

Reasoning with Arrows

Sum Types

Bool

We have defined 0 (the initial object) and 1 (the terminal object). What is 2 if not 1 plus 1?

2 is an object with two elements: two arrows coming from 1. Let's call one arrow True and the other False. Don't confuse those names with the logical interpretations of the initial and the terminal objects; these two are arrows.

This simple idea can be immediately expressed in Haskell as the definition of a type, traditionally called Bool.

data Bool where
    True :: Bool -- True :: () -> Bool
    False :: Bool -- False :: () -> Bool

-- we can now define a term of the type Bool
x :: Bool
x = True

The functions True and False that we used in the definition of Bool are called data constructors. They can be used to construct specific terms, like in the example above. As a side note, in Haskell, function names start with lower-case letters, except when they are data constructors.

\documentclass{minimal}
\usepackage{amsmath}
\usepackage{tikz-cd}
\begin{document}
 \begin{tikzcd}
1
 \arrow[dd, bend right, "\text{True}"']
 \arrow[dd, bend left, "\text{False}"]
  \arrow[ddd, bend right =90, "x"']
 \arrow[ddd, bend left=90, "y"]
\\
 \\
\text{Bool}
\arrow[d, dashed, "h"]
\\
A
 \end{tikzcd}
\end{document}
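
The diagram shows the elimination principle this definition gives us: any two elements x, y : 1 → A determine a unique arrow h : Bool → A with h ∘ True = x and h ∘ False = y. A Haskell sketch (the name h comes from the diagram; the formulation is mine):

h :: a -> a -> Bool -> a
h x y True  = x
h x y False = y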

Notes

Chapter 1

Preorders

Preorders is short for "preordered sets". Whenever you have a set of things and a reasonable way of deciding when anything in that set is "bigger" than some other thing, or "more expensive", or "taller", or "heavier", or "better" in any well-defined sense, or… anything like that, you've got a preorder.

Why does almost every category theory book start by discussing preorders? Because a preorder is an especially simple kind of category: a category with at most one morphism from any object x to any object y is really just a preorder.

Totally ordered sets are also called linearly ordered sets; they can be pictured as lines.

Roughly: preorders are graphs, partially ordered sets are trees, totally ordered sets are lines.
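
A minimal Haskell sketch of a preorder (class and method names are mine): just a relation that we promise is reflexive and transitive, i.e. a category with at most one arrow between any two objects.

class Preorder p where
    le :: p -> p -> Bool  -- promised to be reflexive and transitive

-- e.g. divisibility preorders the positive integers
instance Preorder Integer where
    le x y = y `mod` x == 0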

Galois Connections

For a monotone map f from A to B, the right adjoint of f is a way of going back from B to A that's related to f in a precise way. So adjoints are a sort of generalization of inverses.

Why do they matter?

Galois connections do something really cool: they tell you the best possible way to recover data that can’t be recovered. More precisely, they tell you the best approximation to reversing a computation that can’t be reversed.

Adjoints give the best approximate ways to solve a problem that has no solution, namely finding the inverse of a monotone function that has no inverse. The left adjoint gives the best approximation from above to the nonexistent inverse of f, and the right adjoint gives the best approximation from below.

So, while f has no inverse, it has two “approximate inverses”. The left adjoint comes as close as possible to the (perhaps nonexistent) correct answer while making sure to never choose a number that’s too small. The right adjoint comes as close as possible while making sure to never choose a number that’s too big.

The two adjoints represent two opposing philosophies of life: make sure you never ask for too little and make sure you never ask for too much. This is why they’re philosophically profound.

If you need a mnemonic to remember which is which, remember left adjoints are “left-wing” or “liberal” or “generous”, while right adjoints are “right-wing” or “conservative” or “cautious”.
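
A concrete sketch in Haskell, using the classic doubling example (the names are mine): f a = 2*a has no inverse on the integers, but it has both adjoints.

f :: Integer -> Integer
f a = 2 * a

-- right adjoint ("cautious"): the largest a with f a <= b, i.e. floor (b/2)
rightAdjoint :: Integer -> Integer
rightAdjoint b = b `div` 2                 -- div rounds toward negative infinity

-- left adjoint ("generous"): the smallest a with b <= f a, i.e. ceiling (b/2)
leftAdjoint :: Integer -> Integer
leftAdjoint b = negate (negate b `div` 2)  -- ceiling as floor of the negation

-- Galois conditions: f a <= b        iff  a <= rightAdjoint b
--                    leftAdjoint b <= a  iff  b <= f a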

If we choose g(b) to be greater than or equal to every element of the set {a : f(a) ≤ b}, then g(b) must be an upper bound of this set. But g(b) also can't be chosen to be any bigger than it needs to be, so it must be a least upper bound of that set.

For a poset (partially ordered set), the right adjoint is computed as a supremum (join), and the left adjoint as an infimum (meet).

They are also related to limits and colimits in categories.

Logic

The first kind of logic is very familiar. We could call it “subset logic”, but it’s part of what people usually call “classical logic”. This is the sort of logic we learn in school, assuming we learn any at all. The second kind of logic is less well known: it’s called “partition logic”.

In the first, propositions correspond to subsets of X. In the second, propositions correspond to partitions of X. In both approaches we get a poset of propositions where the partial order is "implication", written ⟹.

Subset logic

In the first, all the rules obeyed by "subset", "and", and "or" become rules obeyed by "implies", "and", and "or". The left adjoint is defined using "for some", which is called the existential quantifier; the right adjoint is defined using "for all", which is called the universal quantifier.
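
A finite-domain Haskell sketch of how the quantifiers arise (the encoding and names are mine): for a function f from A to B, pulling a proposition back along f sits between "for some" on the left and "for all" on the right.

-- propositions about a type, encoded as predicates
type Prop a = a -> Bool

-- substitution: pull a proposition on b back along f
preimage :: (a -> b) -> Prop b -> Prop a
preimage f q = q . f

-- left adjoint: "p holds for some a mapping to b" (existential quantifier)
exists :: Eq b => [a] -> (a -> b) -> Prop a -> Prop b
exists dom f p b = any (\a -> f a == b && p a) dom

-- right adjoint: "p holds for all a mapping to b" (universal quantifier)
forAll :: Eq b => [a] -> (a -> b) -> Prop a -> Prop b
forAll dom f p b = all p [a | a <- dom, f a == b]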

Partition logic

Each partition gives a proposition saying that two elements in the same part are “equivalent”, or the same in some way. The coarser the partition, the more elements are equivalent.

In partition logic, as you learn more you move to finer partitions.

E.g., at first all wines taste alike, so they're all lumped together. When you learn to distinguish red and white wines, you move to a finer partition.
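
A small sketch of refinement in Haskell (the encoding is mine): represent a partition of a finite domain by a labelling function; p is finer than q when elements that p lumps together are always lumped together by q as well.

-- is p at least as fine as q on the given domain?
finerThan :: (Eq l1, Eq l2) => [a] -> (a -> l1) -> (a -> l2) -> Bool
finerThan dom p q = and [ q x == q y | x <- dom, y <- dom, p x == p y ]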

#category-theory