2025-06-04
For any language \(A\), we have the problem of determining whether a given input is in \(A\) or not.
Deterministic Finite Automata
Computation:
Non-deterministic Finite Automata
Computation:
A language \(A\) is regular if and only if there is a DFA \(M\) such that \(L(M) = A\).
Suppose \(A\) and \(B\) are regular languages, then
\(\overline{A}\),
\(AB\),
\(A \cup B\),
\(A \cap B\),
\(A^n\), and
\(A^*\)
are also regular.
Suppose \(A\) is regular. Let \(M\) be a DFA for \(A\). Let \(M'\) be the DFA with all the states of \(M\) flipped. Then \(L(M') = \overline A\).
Let \(M\) and \(N\) be DFAs for \(A\) and \(B\) respectively. We need to find a DFA \(D\) for \(A \cup B\).
Idea 1: Here’s what we’d like to do. Let \(w\) be the input. \(D\) runs \(M\) on \(w\) to check if \(w \in A\) and then \(D\) runs \(N\) on \(w\) to check if \(w \in B\). Accept if either was accept and reject if both were rejected.
Why does this idea not work?
Idea 2: Instead, we need to run the two machines in parallel. Thus, each state in \(D\) corresponds to a 2-tuple where the first element is the ‘M’ state and the second element is the ‘N’ state.
If we’re at state \((x, y)\) and we read a character \(\sigma\). Then we transition to \((x', y')\) where \(x'\) is the state that \(x\) goes to (in \(M\)) when reading \(\sigma\) and \(y'\) is the state that \(y\) goes to (in \(N\)) when reading \(\sigma\).
What should the accepting states be? \((x, y)\) should be accepting if either \(x\) was an accept state in \(M\) or \(y\) was an accept state in \(N\) (or both).
Homework.
DFAs
A language \(A\) is regular iff there is a DFA \(M\) such that \(L(M)=A\)
Suppose \(A\) and \(B\) are regular languages, then
\(\overline{A}\) (by flipping states),
\(A \cup B\) (by running two DFAs in parallel),
\(A \cap B\) (hw),
\(AB\) (today!),
\(A^n\)(today!), and
\(A^*\)(today!),
are also regular.
The same trick of running the DFAs in parallel doesn’t work.
How would you write code to solve this?On input \(w\):
For \(i = 0,1,...,n:\)
inA = \(M(w[:i])\)
inB = \(M(w[i:])\)
if inA and inB: accept
reject
“If any sequence of choices leads to an accept state, accept”
“Reject only if every sequence of choices leads to reject”
If there are multiple arrows marked with the read character, choose which arrow to take.
If there is an \(\epsilon\)-transition, choose whether or not to take it!
Let \(A\) be any language. There is a DFA \(M\) such that \(A = L(M)\) if and only if there exists a NFA \(N\) such that \(A = L(N)\).
The forward direction is true since every DFA is already a NFA (it just doesn’t use the extra features). The backwards direction might be surprising to you!
Essentially it says all the extra features we give NFAs don’t in fact give them more “power”. If I can solve it with an NFA, I can also solve it with a DFA.
We need to simulate a NFA \(N\) with a DFA \(M\)
High level idea:
Have a state in \(M\) for every subset of states in \(N\).
Let \(S\), \(T\) be two states in \(M\). Then \(S\) transitions to \(T\) reading \(\sigma\) if some choice allows me to go from a state in \(S\) to a state in \(T\). The choice includes an unlimited number of \(\epsilon\) transitions.
The accepts states are the subsets that contain accept states.
The start state is the subset containing the start state in \(N\) and all states reachable from the start state using an \(\epsilon\) transition
Intuitively, if I’m at state \(S\) after having read \(w\). \(S\) should contain all the possible states I could have been in \(N\).
For every NFA \(N\), there exists a (potentially huge) DFA \(M\) such that \(L(N) = L(M)\).
The following are equivalent
\(A\) is regular
There is a DFA \(M\) such that \(L(M) = A\)
There is a NFA \(N\) such that \(L(N) = A\)
The more complex a model of computation, the easier it is to use. I.e. NFAs are more complex so finding NFAs for a language is easier than finding DFAs for language.
The more limited a model of computation, the easier it is to prove things about. Since DFAs are so restrictive, we can prove a lot of nice properties about them!
Since these two models of computation are equivalent, we can pick the right model for the right situation!
Let \(M\) and \(N\) be DFAs for \(A\) and \(B\) respectively. We’ll show that \(AB\) is regular by finding an NFA \(H\) for \(AB\).
This NFA guesses when the string in \(B\) should start. It starts in \(M\) and can only move to \(N\) from accepts states in \(M\) since we need the first part of the string to be part of \(A\).
Suppose \(A\) was regular, how would you show that \(\forall n \in \mathbb{N}. A^n\) is regular?
Regular expressions are all about matching patterns in strings.
Examples:
Working with the terminal (live)
Let \(\Sigma\) be an alphabet. Define the set of regular expressions \(\mathcal{R}_\Sigma\) recursively as follows.
\(\mathcal{R}_\Sigma\) is the smallest set such that
\(\emptyset \in \mathcal{R}_\Sigma\)
\(\epsilon\in \mathcal{R}_\Sigma\)
\(a \in \mathcal{R}_\Sigma\) for each \(a \in \Sigma\)
\(R \in \mathcal{R}_\Sigma \implies (R)^* \in \mathcal{R}_\Sigma\)
\(R_1, R_2 \in \mathcal{R}_\Sigma \implies (R_1R_2) \in \mathcal{R}_\Sigma\)
\(R_1, R_2 \in \mathcal{R}_\Sigma \implies (R_1 | R_2) \in \mathcal{R}_\Sigma\)
Note that \(\mathcal{R}_\Sigma\) is defined inductively.
* comes before concatenation which comes before \(|\).
For example, \(a^*b | bc\) means \(((a^*)b)|(bc)\).
The language if a regular expression \(R\), denoted \(L(R)\) is the set of strings that \(R\) matches. Formally,
\(L(\emptyset) = \emptyset\)
\(L(\epsilon) = \{\epsilon\}\)
\(L(a) = \{a\}\), for \(a \in \Sigma\)
\(L(R^*) = L(R)^*\) for \(R \in \mathcal{R}_\Sigma^*\)
\(L(R_1R_2) = L(R_1)L(R_2)\) for \(R_1, R_2\in \mathcal{R}_\Sigma\)
\(L(R_1 | R_2) = L(R_1) \cup L(R_2)\) for \(R_1, R_2\in \mathcal{R}_\Sigma\)
What are the languages of the following regular expressions? (Assume the alphabet is \(\{0,1\}\))
\(((0|1)(0|1)(0|1))^*\)
\((1^*01^*0)^*\)
\((0|1)^*1(0|1)(0|1)\)
\((0|1)^*010(0|1)^*\)
\((0 | \epsilon)1^*\)
\(L(((0|1)(0|1)(0|1))^*) = \{w: |w| = 0 \mod 3\}\).
\(L((1^*01^*0)^*) = \{w: w \text{ ends with 0 contains an even number of 0s}\}\).
\(L((0|1)^*1(0|1)(0|1)) = \{w: \text{ third last character of $w$ is 1}\}\).
\(L((0|1)^*010(0|1)^*) = \{w: w \text{ contains 010}\}\).
\(L((0 | \epsilon)1^*) = \{w: w \text{ is zero or more 1s or 0 followed by zero or more 1s}\}\).
If \(R\) is a regex, and \(m \in \mathbb{N}\), then
\(R^{m}\) means \(m\) copies of \(R\).
\(R^{+} = RR^*\), i.e. at least one copy of \(R\).
\(R^{m+} = R^mR^*\) means at least \(m\) copies of \(R\).
If \(S = \{s_1,s_2,...,s_n\}\) is a finite set of strings, then \(S\) is shorthand for \(s_1 | s_2 |...| s_n\). A common one is to use the alphabet, \(\Sigma\).
Write regular expressions for the following languages
Starts with 010.
The second and the second last letter of w are the same.
Contains 110.
1s always occur in pairs.
Doesn’t contain more than four 1s in a row.
Starts with 010. \(010\Sigma^*\)
The second and the second last letter are the same. \(\Sigma0\Sigma^*0\Sigma | \Sigma1\Sigma^*1\Sigma\)
Contains 110. \(\Sigma^*110 \Sigma^*\)
1s always occur in pairs. \(\{0,11\}^*\)
Doesn’t contain more than four 1s in a row. \(\{0,01,011,0111, 01111\}^*\)
CSC236 Summer 2025