doi:10.1016/S0166-218X(03)00299-3
Copyright © 2003 Elsevier B.V. All rights reserved.
A characterization of Thompson digraphs
a Dipartimento di matematica, Università di Roma “Tor Vergata”, via della ricerca scientifica, 00133, Rome, Italy
b 25 rue Philippe Lebon, BP 540, 76058, Le Havre Cedex, France
c Department of Computer Science, Hong Kong University of Science & Technology, Clear Water Bay, Kowloon, Hong Kong, China
d L.I.F.A.R. (Laboratoire d'Informatique), Université de Rouen, 76821, Mont-Saint-Aignan Cedex, France
Received 9 February 1999; revised 13 September 2002; accepted 7 February 2003.
Available online 25 July 2003.
Abstract
A finite-state machine is called a Thompson machine if it can be constructed from an empty-free regular expression using the construction of Thompson as modified by Hopcroft and Ullman. We call the underlying digraph of a Thompson machine a Thompson digraph. We characterize Thompson digraphs and give an algorithm that, given a Thompson machine, generates an equivalent regular expression whose size is linear in the total number of states and transitions of the machine. Although the algorithm is simple, it is novel: the usual constructions of equivalent regular expressions from finite-state machines produce, in the worst case, regular expressions whose size is exponential in the size of the given machine. The algorithm provides a tentative first step in the construction of small expressions from finite-state machines.
Author Keywords: Regular expressions; Dyck strings; Thompson digraphs and machines
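As an illustration of the construction the abstract refers to, the following is a minimal Python sketch of a Thompson-style construction, assuming the common textbook form (a null transition joins the two machines in a concatenation; union and star each add a new start and a new final state). The exact cases used in the paper are those of its Fig. 1, which is not reproduced here, so the details may differ. The names `build`, `LAMBDA`, and `EXAMPLE`, and the nested-tuple encoding of expressions, are assumptions made for this sketch.

```python
from itertools import count

LAMBDA = "λ"  # label chosen in this sketch for null transitions


def build(expr):
    """Build a machine for expr and return (start, final, edges).

    expr is a symbol (a str, possibly LAMBDA) or a tuple:
    ("+", e1, e2) for union, (".", e1, e2) for concatenation,
    ("*", e1) for star. Each edge is (source, label, target).
    """
    ids = count()
    edges = []

    def new_pair():
        return next(ids), next(ids)

    def go(e):
        if isinstance(e, str):  # base case: one labelled transition
            s, f = new_pair()
            edges.append((s, e, f))
            return s, f
        op = e[0]
        if op == ".":  # join the two machines with a null transition
            s1, f1 = go(e[1])
            s2, f2 = go(e[2])
            edges.append((f1, LAMBDA, s2))
            return s1, f2
        if op == "+":  # new start branches to both; both finals merge
            s1, f1 = go(e[1])
            s2, f2 = go(e[2])
            s, f = new_pair()
            edges.extend([(s, LAMBDA, s1), (s, LAMBDA, s2),
                          (f1, LAMBDA, f), (f2, LAMBDA, f)])
            return s, f
        if op == "*":  # skip edge plus a back edge forming the loop
            s1, f1 = go(e[1])
            s, f = new_pair()
            edges.extend([(s, LAMBDA, s1), (s, LAMBDA, f),
                          (f1, LAMBDA, s1), (f1, LAMBDA, f)])
            return s, f
        raise ValueError(f"unknown operator: {op!r}")

    start, final = go(expr)
    return start, final, edges


# The example expression from the Fig. 2 caption: (((a+b)*)·((b+λ)·a))
EXAMPLE = (".", ("*", ("+", "a", "b")), (".", ("+", "b", LAMBDA), "a"))
start, final, edges = build(EXAMPLE)
states = {s for s, _, _ in edges} | {t for _, _, t in edges}
print(len(states), len(edges))
```

Under these assumptions the example yields 16 states and 19 transitions. Every operator or symbol contributes a constant number of states and edges, so the machine is linear in the size of the expression; dropping the labels from `edges` gives the underlying digraph.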
Fig. 1. The Thompson construction. The figures (a)–(d) correspond to the cases (2)–(6) in the definition of regular expressions. When a given regular expression E is empty free, (e) is never used by the Thompson construction; we include it for completeness.
Fig. 2. (a) The Thompson machine obtained from the example empty-free regular expression (((a+b)*)·((b+λ)·a)); (b) the corresponding Thompson digraph of this Thompson machine.
Fig. 3. The Thompson-digraph inductive construction. We use the same visual cues to indicate the source and sink vertices as we do to indicate the start and final states of finite-state machines.
Fig. 4. Partitions of a two-hammock used in the proofs of Theorems 5 and 7.
Fig. 5. A depth-first traversal of an example Thompson digraph. Observe the back edge that identifies a cycle and the corresponding star-like unit. We use solid lines to indicate forward edges to vertices not previously visited, dashed lines to indicate back edges, and dotted lines to indicate forward edges to a previously visited vertex that is not an ancestor of the current vertex.
Fig. 6. An illustration of star reduction.
Fig. 7. A star reduction that does not yield a Thompson dag. The reason is that t is the only (1,1)-vertex between x and y, but in a Thompson dag there must be a non-zero, even number of such vertices between x and y.
Fig. 8. A back edge (y, x) that is not star reducible. The reason is that the only candidates for the associated vertices of (y, x) are w and z, but there is no edge from w to z.
Fig. 9. Parsing a Thompson machine: An example.
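The three edge kinds distinguished in the Fig. 5 caption can be computed with a standard depth-first traversal. The sketch below is only an illustration: the function name `classify_edges`, the dictionary encoding of the digraph, and the small example digraph are assumptions made here, not the digraph drawn in Fig. 5, and parallel edges are not handled.

```python
def classify_edges(graph, root):
    """Classify the edges reachable from root by a depth-first traversal.

    graph maps each vertex to a list of its successors. The result maps
    each edge (u, v) to the line style used in the Fig. 5 caption:
      "solid":  v had not been visited before (u, v) was followed
      "dashed": v is an ancestor of u on the current path (back edge)
      "dotted": v was visited earlier and is not an ancestor of u
    """
    state = {}  # vertex -> "active" (on the current path) or "done"
    kinds = {}

    def visit(u):
        state[u] = "active"
        for v in graph.get(u, []):
            if v not in state:
                kinds[(u, v)] = "solid"
                visit(v)
            elif state[v] == "active":
                kinds[(u, v)] = "dashed"
            else:
                kinds[(u, v)] = "dotted"
        state[u] = "done"

    visit(root)
    return kinds


# Hypothetical digraph with one cycle x -> u -> y -> x.
graph = {"s": ["x"], "x": ["u", "t"], "u": ["y"], "y": ["x", "t"], "t": []}
kinds = classify_edges(graph, "s")
for edge, kind in kinds.items():
    print(edge, kind)
```

In this example the edge (y, x) is the only dashed (back) edge, which is how a cycle, and hence a candidate star-like unit, is detected; the edge (x, t) is dotted because t has already been reached through y. The recursion is kept for readability; a very deep digraph would need an explicit stack.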