Two strings at Hamming distance 1 cannot be both quasiperiodic
Introduction
A string is a finite sequence of letters over an alphabet Σ. If w is a string, then by $|w|$ we denote its length, by $w[i]$, for $1 \le i \le |w|$, we denote its $i$-th letter, and by $w[i..j]$ we denote the factor of $w$ composed of the letters $w[i], w[i+1], \dots, w[j]$ (if $i > j$, then it is the empty string). A factor $w[i..j]$ is called a prefix if $i = 1$ and a suffix if $j = |w|$.
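Python strings are 0-indexed, so a small helper that mirrors the 1-indexed, inclusive notation $w[i..j]$ (empty when $i > j$) may make the translation explicit. The names `factor`, `is_prefix` and `is_suffix` are hypothetical, chosen for illustration only.

```python
def factor(w: str, i: int, j: int) -> str:
    """Return w[i..j] in 1-indexed, inclusive notation; empty if i > j."""
    if i > j:
        return ""
    if i < 1 or j > len(w):
        raise IndexError("factor bounds must satisfy 1 <= i and j <= |w|")
    return w[i - 1:j]  # shift the left end to 0-based; Python's right end is exclusive


def is_prefix(u: str, w: str) -> bool:
    # u is a prefix of w iff u == w[1..|u|]
    return len(u) <= len(w) and factor(w, 1, len(u)) == u


def is_suffix(u: str, w: str) -> bool:
    # u is a suffix of w iff u == w[|w|-|u|+1..|w|]
    return len(u) <= len(w) and factor(w, len(w) - len(u) + 1, len(w)) == u


print(factor("abcab", 2, 4))  # bca
```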
An integer $p$ is called a period of $w$ if $w[i] = w[i+p]$ for all $i = 1, \dots, |w| - p$. A string $u$ is called a border of $w$ if it is both a prefix and a suffix of $w$. It is a fundamental fact of string periodicity that a string $w$ has a period $p$ if and only if it has a border of length $|w| - p$; see [5], [9]. If $p$ is a period of $w$, then $w[1..p]$ is called a string period of $w$. If $w$ has a period $p$ such that $2p \le |w|$, then $w$ is called periodic. In this case $w$ has a border of length at least $|w|/2$.
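A minimal sketch of the period/border correspondence, assuming 0-based indices: the KMP failure function gives the longest proper border of every prefix, and following its chain from the whole string lists all borders $u$, hence all periods $|w| - |u|$. The function names are hypothetical.

```python
def border_array(w: str) -> list[int]:
    """B[k] = length of the longest proper border of w[:k + 1] (KMP failure function)."""
    B = [0] * len(w)
    b = 0
    for k in range(1, len(w)):
        while b > 0 and w[k] != w[b]:
            b = B[b - 1]  # fall back to the next shorter border
        if w[k] == w[b]:
            b += 1
        B[k] = b
    return B


def has_period(w: str, p: int) -> bool:
    """Definition of a period, 0-based: w[i] == w[i + p] whenever both exist."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def periods(w: str) -> list[int]:
    """All periods of w in increasing order, read off the borders: p = |w| - |u|."""
    n = len(w)
    if n == 0:
        return []
    B = border_array(w)
    result, b = [], B[-1]
    while b > 0:  # longest proper border, then its longest border, ...
        result.append(n - b)
        b = B[b - 1]
    result.append(n)  # the empty border gives the trivial period |w|
    return result


def is_periodic(w: str) -> bool:
    """w is periodic iff its shortest period p satisfies 2p <= |w|."""
    ps = periods(w)
    return bool(ps) and 2 * ps[0] <= len(w)


print(periods("abaab"))     # [3, 5]
print(is_periodic("abab"))  # True
```

Reading periods off the border chain avoids testing every candidate $p$ against the definition; `has_period` is kept only so the two views can be compared.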
For two strings $w$ and $w'$ of the same length $n$, we write $w \approx_i w'$ if $w[j] = w'[j]$ for all $j \ne i$ and $w[i] \ne w'[i]$. This means that $w$ and $w'$ are at Hamming distance 1, where the Hamming distance of two equal-length strings is the number of positions at which they differ. The following fact states a folklore property of string periodicity that we generalize in this work to string quasiperiodicity.
Fact 1 Let $w$ and $w'$ be two strings of length $n$ and $i \in \{1, \dots, n\}$ be an index. If $w \approx_i w'$, then at most one of the strings $w$, $w'$ is periodic.
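Fact 1 can be sanity-checked exhaustively for short strings over a small alphabet. This is an illustration under those assumptions, not a proof. `differ_exactly_at` encodes the relation "equal length, different at the 1-indexed position $i$ and nowhere else"; all names are hypothetical.

```python
from itertools import product


def differ_exactly_at(w: str, v: str, i: int) -> bool:
    """True iff |w| == |v| and the strings differ at 1-indexed position i only."""
    return (
        len(w) == len(v)
        and 1 <= i <= len(w)
        and w[i - 1] != v[i - 1]
        and all(w[j] == v[j] for j in range(len(w)) if j != i - 1)
    )


def is_periodic(w: str) -> bool:
    """Naive test: does w have a period p with 2p <= |w|?"""
    n = len(w)
    return any(
        all(w[k] == w[k + p] for k in range(n - p)) for p in range(1, n // 2 + 1)
    )


def check_fact1(n: int, alphabet: str = "ab") -> bool:
    """Look for two periodic strings of length n that differ at exactly one position."""
    for letters in product(alphabet, repeat=n):
        w = "".join(letters)
        if not is_periodic(w):
            continue
        for i in range(1, n + 1):
            for c in alphabet:
                if c == w[i - 1]:
                    continue
                v = w[: i - 1] + c + w[i:]  # change only the i-th letter (1-indexed)
                assert differ_exactly_at(w, v, i)
                if is_periodic(v):
                    return False  # counterexample to Fact 1
    return True


print(all(check_fact1(n) for n in range(1, 11)))  # True
```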
Fact 1 is, in particular, a consequence of a variant of Fine and Wilf's periodicity lemma that was proved by Berstel and Boasson [2] in the context of partial words with one hole (a hole is a don't-care symbol). For completeness, we provide its proof in Section 4 without using the terminology of partial words.
We say that a string $c$ covers a string $w$ if for every position $i \in \{1, \dots, |w|\}$ there exists a factor $w[j..k] = c$ such that $j \le i \le k$. Then $c$ is called a cover of $w$; see Fig. 1. A string $w$ is called quasiperiodic if it has a cover of length smaller than $|w|$.
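A direct test of the cover definition, sketched with hypothetical names and 0-based indices: scan the occurrences of $c$ from left to right and track how far the covered region reaches. A position is left uncovered exactly when an occurrence starts beyond that point, or when no occurrence ends at the last position. Since any cover occurs at the first position of $w$, quasiperiodicity only requires trying proper prefixes. This is a naive check, not one of the linear-time algorithms cited below.

```python
def covers(c: str, w: str) -> bool:
    """True iff every position of w lies inside some occurrence of c in w."""
    m, n = len(c), len(w)
    if m == 0 or m > n:
        return False
    reach = 0  # positions w[0:reach] are known to be covered
    for j in range(n - m + 1):
        if w.startswith(c, j):
            if j > reach:
                return False  # position `reach` lies in a gap between occurrences
            reach = j + m
    return reach == n


def is_quasiperiodic(w: str) -> bool:
    # A cover is necessarily a prefix of w, so only proper prefixes need to be tried.
    return any(covers(w[:m], w) for m in range(1, len(w)))


print(covers("aba", "abaababa"))   # True: occurrences start at 0, 3 and 5
print(is_quasiperiodic("abaab"))   # False
```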
A significant amount of work has been devoted to the computation of covers in a string. A linear-time algorithm finding the shortest cover of a string was proposed by Apostolico et al. [1]. Later, a linear-time algorithm computing all the covers of a string was proposed by Moore and Smyth [10]. Breslauer [3] gave an on-line $O(n)$-time algorithm computing the cover array of a string of length $n$, that is, an array specifying the lengths of the shortest covers of all the prefixes of the string. Li and Smyth [8] provided a linear-time algorithm for computing the array of longest covers of all the prefixes of a string, which can be used to list all the covers of every prefix. All these papers exploit various combinatorial properties of covers.
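The cited algorithms run in linear time. Purely as a baseline for small inputs, the sketch below enumerates all covers by testing each border of the string (a cover occurs both as a prefix and as a suffix, so it is a border). It takes $O(n^3)$ time in the worst case, e.g. for $a^n$, and is not the method of any of the cited papers; the function names are hypothetical.

```python
def border_lengths(w: str) -> list[int]:
    """Lengths of all nonempty borders of w (including |w|), in increasing order."""
    n = len(w)
    if n == 0:
        return []
    fail = [0] * n  # KMP failure function
    b = 0
    for k in range(1, n):
        while b > 0 and w[k] != w[b]:
            b = fail[b - 1]
        if w[k] == w[b]:
            b += 1
        fail[k] = b
    lengths, b = [n], fail[-1]
    while b > 0:  # walk the chain of borders
        lengths.append(b)
        b = fail[b - 1]
    return sorted(lengths)


def all_covers(w: str) -> list[int]:
    """Lengths of all covers of w, shortest first (naive check of each border)."""
    result = []
    for m in border_lengths(w):
        c, reach, ok = w[:m], 0, True
        for j in range(len(w) - m + 1):
            if w.startswith(c, j):
                if j > reach:  # gap between consecutive occurrences
                    ok = False
                    break
                reach = j + m
        if ok and reach == len(w):
            result.append(m)
    return result


def shortest_cover(w: str) -> str:
    lengths = all_covers(w)
    return w[:lengths[0]] if lengths else ""


print(all_covers("abaababa"))      # [3, 8]
print(shortest_cover("abaababa"))  # aba
```

A string is quasiperiodic exactly when its shortest cover is shorter than the string itself.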
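Our main contribution is stated as the following theorem. As we have already mentioned, a periodic string $w$ has a border of length at least $|w|/2$; the occurrences of this border as a prefix and as a suffix of $w$ together cover every position, so the border is a cover of $w$ shorter than $w$. Hence a periodic string is also quasiperiodic, and Theorem 2 generalizes Fact 1.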
Theorem 2 Let $w$ and $w'$ be two strings of length $n$ and $i \in \{1, \dots, n\}$ be an index. If $w \approx_i w'$, then at most one of the strings $w$, $w'$ is quasiperiodic.
The proof of Theorem 2 is divided into three sections. In Section 2 we restate several simple preliminary observations. Section 3 contains the proof of a crucial auxiliary lemma, which shows a combinatorial property of seeds that we use extensively in the proof of the main result. Finally, Section 4 contains the main proof.
Preliminaries
We say that a string $s$ is a seed of a string $w$ if $|s| \le |w|$ and $w$ is a factor of some string $u$ covered by $s$; see Fig. 2. Furthermore, $s$ is called a left seed of $w$ if $s$ is both a prefix and a seed of $w$. Thus a cover of $w$ is always a left seed of $w$, and a left seed of $w$ is a seed of $w$. The notion of seed was introduced in [6] and efficient computation of seeds was further considered in [4], [7].
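A seed test can be sketched directly from the definition, assuming 0-based indices. Every occurrence of $s$ in $u$ that meets $w$ is either a full occurrence inside $w$, or hangs off the left end (a prefix of $w$ equal to a suffix of $s$), or hangs off the right end (a suffix of $w$ equal to a prefix of $s$). Because $|s| \le |w|$, no occurrence can stick out on both sides, and the longest overhang on each side covers every shorter one. This is a naive reformulation written for illustration, not an algorithm from [4], [6], or [7]; `is_seed` is a hypothetical name.

```python
def is_seed(s: str, w: str) -> bool:
    """True iff |s| <= |w| and w is a factor of some string covered by s."""
    m, n = len(s), len(w)
    if m == 0 or m > n:
        return False
    covered = [False] * n
    # Full occurrences of s inside w.
    for j in range(n - m + 1):
        if w.startswith(s, j):
            for t in range(j, j + m):
                covered[t] = True
    # Longest occurrence hanging off the left end: w[:k] == suffix of s.
    for k in range(m - 1, 0, -1):
        if w[:k] == s[m - k:]:
            for t in range(k):
                covered[t] = True
            break
    # Longest occurrence hanging off the right end: w[n-k:] == prefix of s.
    for k in range(m - 1, 0, -1):
        if w[n - k:] == s[:k]:
            for t in range(n - k, n):
                covered[t] = True
            break
    return all(covered)


print(is_seed("aba", "baab"))  # True: baab is a factor of abaaba
print(is_seed("ab", "aa"))     # False
```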
In the proof of our main result we use the following easy observations that are immediate consequences of
Auxiliary lemma
In the following lemma we observe a new property of the notion of seed. As we will see in Section 4, this lemma encapsulates the hardness of multiple cases in the proof of the main result.
Before we proceed to the lemma, however, let us introduce an additional notion lying in between periodicity and quasiperiodicity. We say that a string w of length n is almost periodic with period p if there exists an index such that: In this case we
Main result
In this section we first present, for completeness, a proof of the folklore property of string periodicity (Fact 1), and then proceed to the proof of our main result (Theorem 2), which generalizes that fact.
Proof of Fact 1 Assume to the contrary that $w \approx_i w'$ and both strings are periodic. Let $p$ and $p'$ ($p, p' \le n/2$) be the shortest periods of $w$ and $w'$, respectively. Assume w.l.o.g. that $p \le p'$. It suffices to prove the claim in the case that … is a square of length … and … . Let us define … and …
Conclusions
In this note we have proved that any two distinct quasiperiodic strings of the same length differ at more than one position. This bound is tight: for instance, for every even $n$ the strings … and … are both quasiperiodic and differ at exactly two positions.
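As an independent illustration of tightness (not necessarily the family used in the paper), a brute-force search over binary strings finds distinct quasiperiodic strings at Hamming distance 2, for instance aaaa and abab for $n = 4$, and aaaaaa and abaaba for $n = 6$ (abaaba is covered by aba). The function names are hypothetical, and the search is only practical for small $n$.

```python
from itertools import combinations, product


def is_quasiperiodic(w: str) -> bool:
    """True iff some proper prefix c of w covers w."""
    n = len(w)
    for m in range(1, n):
        c = w[:m]
        starts = [j for j in range(n - m + 1) if w.startswith(c, j)]
        # starts[0] == 0 always; c covers w iff an occurrence ends at n
        # and consecutive occurrences leave no gap (start difference <= m).
        if starts[-1] == n - m and all(b - a <= m for a, b in zip(starts, starts[1:])):
            return True
    return False


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def min_distance_quasiperiodic(n: int, alphabet: str = "ab"):
    """Smallest Hamming distance between two distinct quasiperiodic strings of
    length n over `alphabet`, or None if fewer than two such strings exist."""
    qp = [w for w in ("".join(t) for t in product(alphabet, repeat=n)) if is_quasiperiodic(w)]
    return min((hamming(a, b) for a, b in combinations(qp, 2)), default=None)


print(min_distance_quasiperiodic(6))  # 2
```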
Acknowledgements
The authors thank Maxime Crochemore, Solon P. Pissis, and Wojciech Rytter for helpful discussions. We also thank an anonymous referee whose suggestions helped to simplify the proof of Theorem 2. Amihood Amir was partially supported by the ISF grant 571/14 and the Royal Society. Costas S. Iliopoulos was partially supported by the Onassis Foundation and the Royal Society.
References
- Apostolico et al., Optimal superprimitivity testing for strings, Inf. Process. Lett. (1991).
- Berstel and Boasson, Partial words and a theorem of Fine and Wilf, Theor. Comput. Sci. (1999).
- Breslauer, An on-line string superprimitivity test, Inf. Process. Lett. (1992).
- … et al., Efficient seeds computation revisited.
- … et al., Algorithms on Strings (2007).
¹ The author is a Newton International Fellow.