CSC 341: February 7, 2007
From CSWiki
Instructor's notes for the class session
We may suspect that a language isn't regular when we can't think of any way to recognize it without having to save an unbounded amount of state at some point in the membership computation. But sometimes this only means that there's a trick to the algorithm that we haven't yet thought of, or that the membership problem is just really intricate.
One way to prove that a language isn't regular is to use Theorem 1.70, the pumping lemma for regular languages. The use of the pumping lemma to show that a language L is not regular can be understood as a game between Ms. Con, who believes that L is not regular, and Mr. Pro, who is skeptical about this conclusion and is playing to keep doubt alive. Here are the moves in the game:
- Mr. Pro plays first, choosing a positive integer p (the pumping length).
- Ms. Con replies by choosing a string s. It must be a member of L, and its length must be at least p (and there is no penalty for proposing a string of length greater than p). Ms. Con loses if there is no string that meets these conditions.
- Mr. Pro replies by dividing s into three substrings x, y, and z. The length of y must be positive (in other words: y may not be the null string), and the total length of xy must be less than or equal to p.
- Ms. Con replies by choosing a non-negative integer i. If the string xyiz is a member of the language L, Mr. Pro wins; if not, Ms. Con wins.
The pumping lemma says that, if L really is regular, then there is a winning strategy for Mr. Pro in this game. Consequently, a winning strategy for Ms. Con -- a general method by which she can always win, no matter how Mr. Pro plays -- constitutes a proof that L is not regular.
Proof of Theorem 1.70, using the pigeonhole principle.
Example 1.73:
.
Ms. Con's strategy is to wait until Mr. Pro has identified p and then choose s to be
. Mr. Pro is forced to divide this string in such a way that y is a non-empty string of
s, say
. Ms. Con then wins by choosing any non-negative integer i other than 1, since then
is not a member of
.
Problems 1.51 and 1.52; the Myhill-Nerode theorem.
Study questions
- Show that, for any regular language L, there is some positive integer p such that every member of L of length p or greater must have a pumpable substring within any group of p or more adjacent symbols.
- Is every finite language regular? If so, prove it; if not, give a counterexample.
- Sipser, Problem 1.35.
- What is the index of the language represented by the regular expression
?
- Construct a set of seven strings that is pairwise distinguishable by the language of homework exercise 5 (decimal numerals for multiples of 7).
Proposed Solutions
-
- Every finite language is regular, because it can be recognized by a DFA that is built to recognize each individual string in the language. Every language recognized by a DFA is regular, so every finite language is regular.
-
- To find the index, we must find the largest possible set of strings that are pairwise distinguishable by the language (see problems 1.51 and 1.52 for definitions). We start with
. Next we try a, which is distinguishable from
with ab. Next we add ab, distinguishable from
with aba and from a with ab. Next comes abb, distinguishable from
with ab, from a with ba, and from ab with ab. Finally we add b, which is distinguishable from the other strings by any string that will make the others in the language (since anything appended to b will still never be in the language). At this point our set consists of
. Any string we add at this point must be recognized by the language with some suffix (since otherwise the string would be indistinguishable from b). However, any strings of the form aba * will be indistinguishable from ab, and any strings of the form aba * b * will be indistinguishable from abb, and with
and a already in the set, we are left with no other valid options. This is the maximum size of the set, therefore the index is 5.
- Taking the multiples of 7, we reach 63 before the second digits start repeating themselves. The first digit of each of these will be distinguishable from the others by the second digit that makes it a multiple of 7. In other words, "1" is distinguishable from 2, 3, 4... with "4", because "14" is in the language while "24", "34", "44"... are not. So our set is {7,1,2,3,4,5,6}.

