
An exponentially-growing family of universal quantum circuits


Published 30 August 2023 © 2023 The Author(s). Published by IOP Publishing Ltd
Citation: Mo Kordzanganeh et al 2023 Mach. Learn.: Sci. Technol. 4 035036. DOI 10.1088/2632-2153/ace757


Abstract

Quantum machine learning has become an area of growing interest but has certain theoretical and hardware-specific limitations. Notably, the problem of vanishing gradients, or barren plateaus, renders training impossible for circuits with high qubit counts, imposing a limit on the number of qubits that data scientists can use to solve problems. Independently, angle-embedded supervised quantum neural networks were shown to produce truncated Fourier series with a degree directly dependent on two factors: the depth of the encoding and the number of parallel qubits the encoding is applied to. The degree of the Fourier series limits the model expressivity. This work introduces two new architectures whose Fourier degrees grow exponentially: the sequential and parallel exponential quantum machine learning architectures. This is done by efficiently using the available Hilbert space when encoding, increasing the expressivity of the quantum encoding. The exponential growth therefore allows one to stay in the low-qubit limit while creating highly expressive circuits that avoid barren plateaus. Practically, the parallel exponential architecture was shown to outperform the existing linear architectures, reducing their final mean square error by up to 44.7% in a one-dimensional test problem. Furthermore, the feasibility of this technique was demonstrated on a trapped-ion quantum processing unit.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

The successes of quantum computing in the past decade have laid the foundations for the interdisciplinary field of quantum machine learning (QML) [1, 2], where parameterised quantum circuits are used as machine learning models to extract features from the training data. It was argued [3] that such quantum neural networks (QNNs) could have higher trainability, capacity, and generalization bound than their classical counterparts. Practically, hybrid quantum neural networks (HQNNs) have shown promise in small-scale benchmarking tasks [4–7] and larger-scale industrial tasks [8–10]. Nevertheless, the utility, practicality, and scalability of pure QNNs are still unclear. Furthermore, [11] provided a thorough overview of this field, showing that while classical machine learning is solving large real-world problems, QNNs are mostly tried on synthetic, clean datasets and show no immediate real-world advantages in their current state 1 . It also suggested that QNN research should shift its focus from seeking quantum advantage to new research questions, such as finding a new, advantageous quantum neuron. This work explores a new quantum neuron and argues for moving beyond single-qubit Pauli gates for encoding data, instead employing higher-dimensional unitaries through gate decomposition.

Since quantum gates are represented by elements of compact groups, Fourier analysis is a natural tool for analysing QNNs. Schuld et al [15] showed that certain quantum encodings can create an asymptotically universal Fourier estimator. A universal Fourier estimator is a model that can fit a trigonometric series on any given function. As the number of terms in the series approaches infinity, the fit becomes an asymptotically perfect estimate. This estimator can initially infer the coarse correlations in the supplied data, and by increasing the number of Fourier terms, it can incrementally judge the more granular properties of the dataset. This provides an adjustable, highly-expressive machine learning model.

In [15], the authors showed that a QNN could operate as such in two ways: a sequential single-qubit architecture with n repetitions of the same encoding could yield n Fourier bases, which could also arise from an n-qubit architecture with the encoding gates applied to all qubits in parallel.

The sequential architecture is widely shown to be an efficient Fourier estimator, demonstrated both theoretically [15] and empirically [16, 17]. However, sequential circuits are often deep [17], and assuming that near-term quantum computers will have a non-negligible noise associated with each gate, these circuits can experience noise-induced barren plateaus [18]. Barren plateaus are a phenomenon observed in optimization problems where the gradients of the model vanish, rendering them impossible to train. More importantly, a single qubit can be simulated efficiently in a classical setting, so this architecture brings no quantum advantage.

In contrast, the parallel setting offers the exponential space advantage of quantum computing but poses two challenges for large numbers of qubits:

  • An exponentially growing parameter count with the number of qubits required to span the entire group space $\mathcal{SU}(2^n)$ [19], which could also lead to noise-induced barren plateaus. Spanning the full space is especially important for a priori problems, where the best model is known to lie somewhere in the Hilbert space but the machine learning scientist has no prior knowledge of how to parameterise a circuit to reach it. In addition, gradient calculations in QNNs—as of this publication—use the parameter-shift rule discovered and presented in [20]. This method requires two evaluations of the QNN to find the derivative of the circuit with respect to each of the trainable parameters. An exponentially growing number of trainable parameters translates to exponentially increasing resources required for gradient computation. In mitigation, [21, 22] showed that a polynomially-growing number of parameters could generate a similar result based on the quantum t-design limits [23].
  • Strongly-parameterised QNNs have a vanishing variance of gradient, which decreases exponentially with the number of qubits [24]. This means that for a large number of qubits, a randomly initialised QNN will encounter a barren plateau. This happens because the expectation value of the derivative of the loss function with respect to each variable is zero for any well-parameterised quantum circuit, and its variance decreases exponentially with the number of qubits. Mitigation methods have been suggested in [25, 26]; most notably, [27] showed that by relaxing the well-parameterised constraint to circuits whose depth grows only logarithmically with the number of qubits and using local measurements, the circuit is guaranteed to evade barren plateaus. Zhao and Gao [28] developed a platform based on ZX-calculus 2 to explore which QNN architectures are affected by the barren plateau phenomenon and found that strongly-entangling, hardware-efficient circuits suffer from them. In contrast with the previously-mentioned cases, this type of barren plateau is not noise-induced. Thus, the problem must be addressed even in a fault-tolerant future of quantum computing.

Therefore, the practising QML scientist is limited in choosing her QNN architectures for general data science problems: they need to be shallow 3 or employ only a few qubits. This contribution suggests modifying the encoding strategies in [15] to increase the growth of the Fourier bases in a QNN from linear in the number of qubits/number of repetitions to exponential. The proposed encoding is constructed by decomposing large unitary generators into local Pauli-Z rotations. This improves the expressivity of the QNNs without requiring additional qubits or encoding repetitions. The increased expressivity is a product of eliminating the encoding degeneracies of the quantum kernel, making efficient use of the available Hilbert space by assigning a unique wavevector to each of its dimensions. However, such encodings could introduce a greater risk of limiting the model's Fourier accessibility 4 .

Section 2 reviews how angle-embedded QNNs approximate their input distributions by fitting a truncated Fourier series to them. Specifically, section 2 reviews the linear encoding architectures and how their number of Fourier bases grows linearly with the number of repetitions—sequential linear in section 2.1—as well as with the number of qubits—parallel linear in section 2.2. Section 3 then introduces the same two architectures, slightly modified to represent an exponentially-growing number of Fourier bases. To ground these architectures in practice, section 3.3 compares their training performance and shows that the parallel exponential architecture trains better than the other architectures on a synthetic, one-dimensional dataset. Finally, section 3.4 critically evaluates the work and suggests areas for future investigation.

2. Background review—linear architectures

As discussed in [15], all quantum neural networks that use angle embedding 5 as their encoding strategy produce a truncated Fourier series approximation to the dataset. Schuld et al [15] specifically explored two families of quantum-neuron architectures: a single-qubit architecture with a series of sequential $\mathcal{SU}(2)$ gates and a multi-qubit architecture with parallel $\mathcal{SU}(2)$ encoding gates. This section reviews the results and architectures introduced in [15] in depth; section 3 then presents two alternative QNN architectures capable of achieving an exponentially higher Fourier expressivity for the same number of gates.

Consider a quantum neuron that maps a real feature $x \in \mathcal{X}$ onto the quantum circuit via a parametric gate $S(x) = \textrm e^{-\textrm i\mathcal{G}x}$. In most common architectures, the only parametric gates are single-qubit rotations $\{R_x,R_y,R_z\}$. In this work, Pauli-Z generated rotations are used without any loss of generality, $\mathcal{G} = \frac{1}{2}\sigma_z = \frac{1}{2}\big(\begin{smallmatrix} 1 & 0\\ 0 & -1 \end{smallmatrix}\big)$, 6 so the embedding gate takes the simple form $S(x) = \big(\begin{smallmatrix} \textrm e^{-\textrm i x /2 } & 0\\ 0 & \textrm e^{\textrm i x /2} \end{smallmatrix}\big)$. In general, the dependence of the expected value of any observable on the parameter x is then given by

$f(x) = c_0 + c_1\, \textrm e^{\textrm i x} + c^{*}_1\, \textrm e^{-\textrm i x},$

with some complex parameters c0 and c1, which depend on the rest of the circuit and the measurements.
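The frequency content of a single encoding can be checked numerically. The sketch below is an illustration, not code from the paper: the random unitaries stand in for the rest of the circuit, and a discrete Fourier transform of the measured expectation value confirms that only the wavenumbers {−1, 0, 1} appear.

```python
import numpy as np

# Illustration: one Pauli-Z encoding between two fixed random unitaries.
rng = np.random.default_rng(0)

def random_unitary(n):
    # QR of a random complex matrix gives a Haar-ish unitary
    q, r = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q * (np.diag(r) / np.abs(np.diag(r)))

W1, W2 = random_unitary(2), random_unitary(2)
Z = np.diag([1.0, -1.0])

def f(x):
    S = np.diag([np.exp(-1j * x / 2), np.exp(1j * x / 2)])  # encoding gate S(x)
    psi = W2 @ S @ W1 @ np.array([1.0, 0.0])
    return np.real(psi.conj() @ Z @ psi)

# discrete Fourier transform over one period reveals the active wavenumbers
N = 64
coeffs = np.fft.fft([f(2 * np.pi * t / N) for t in range(N)]) / N
active = sorted(k if k < N // 2 else k - N for k in range(N) if abs(coeffs[k]) > 1e-10)
print(active)  # a subset of {-1, 0, 1}
```

Any choice of surrounding unitaries and measurement only changes the coefficients, not the set of available frequencies.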

This expected value is a function of the feature x with a very simple Fourier series. The data re-uploading method [16] is a natural way to construct neurons that give rise to richer Fourier series: architectures in which several parametric gates depend on the same x. It is most straightforward to consider gates that have a hardwired dependence on the feature 7 , in particular such that the expected value of any observable takes the form of a discrete Fourier series

Equation (1): $f_\theta(x) = \sum_{k} c_k(\theta)\, \textrm e^{\textrm i k x}$

where θ denotes the variational parameters, and $c_k \in \mathbb{C}$ with $c^{*}_{-k} = c_k$ for real observables. In sections 2.1 and 2.2, two architectures exhibited in [15] are reviewed. The Fourier expressivities of these architectures are of particular interest, that is, the lists of wavenumbers $\{k_1,k_2,\dots\}$ appearing in the exponents in equation (1).

2.1. Sequential linear

The single-qubit sequential linear method uses repetitions of the same single-qubit encoding gate S(x) interlaced in-between trainable variational layers. Figure 1(a) shows this implementation with generalized variational gates. Since the eigenvalues of each unitary are $\textrm e^{\pm \textrm i \frac{1}{2}x}$, it is straightforward to observe (see, e.g. appendix A.1) that after n encoding layers, the expected value of any observable takes the form

Equation (2): $f_\theta(x) = \sum_{k = -n}^{n} c_k(\theta)\, \textrm e^{\textrm i k x}$

Thus, the repetitions have an additive effect such that for n repetitions, the final list becomes $\{-n,-n+1,\ldots,0,\ldots,n-1,n\}$. Each of the wavenumbers in the list gives rise to a sinusoidal term with the same frequency. Therefore, for n repetitions of the encoding $S(x) = \textrm e^{-\textrm i\mathcal{G}x}$, n distinct Fourier bases are generated.
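This counting argument can be sketched in a few lines; the illustration below enumerates all sums of the per-layer eigenvalues $\pm\frac{1}{2}$ and their pairwise differences to recover the wavenumber list $\{-n,\ldots,n\}$.

```python
from itertools import product

# Illustration: wavenumbers k = sum(lambda_j) - sum(lambda_i) for n
# repetitions of S(x), with per-layer eigenvalues -1/2 and +1/2.
def wavenumbers(n):
    lams = (-0.5, 0.5)
    sums = {sum(c) for c in product(lams, repeat=n)}
    return sorted({round(a - b) for a in sums for b in sums})

print(wavenumbers(3))  # [-3, -2, -1, 0, 1, 2, 3]
```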


Figure 1. The four general circuits under analysis in this paper. In the exponential architectures, the first encoding is kept the same, and the subsequent encoding gates are multiplied by the coefficients in equation (6). The parallel circuits have a CNOT layer at the end to ensure that all qubits are cooperating in the training by propagating the π-measurement through all quantum wires.


2.2. Parallel linear

In the parallel setting, the single-qubit encoding gates are applied in parallel on separate qubits—see figure 1(c). Similarly to the sequential encoding, for n parallel rotations n Fourier bases are produced. This is due to the commutativity between the parallel rotations as they act on separate qubits. The generator $\mathcal{G}$ becomes:

Equation (3): $\mathcal{G} = \frac{1}{2} \sum_{q = 1}^{r} \sigma_z^{(q)}$

where q is the qubit index, r indicates the total number of qubits, and $\sigma_z^{(q)}$ is the Pauli-Z matrix applied to the $q\textrm{th}$ qubit. In appendix A.2, it is shown that $\mathcal{G}$—being a square matrix of dimension $2^r$—has only $r+1$ unique eigenvalues. This suggests a high degree of degeneracy in its eigenspectrum. As before, subtracting these values from themselves yields a list of wavenumbers ranging from −r to r, generating r Fourier bases.
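The degeneracy claim can be verified directly. This sketch, an illustration rather than the paper's code, constructs $\mathcal{G} = \frac{1}{2}\sum_q \sigma_z^{(q)}$ for r = 4 qubits and counts the unique eigenvalues and the resulting wavenumbers.

```python
import numpy as np
from functools import reduce

# Illustration: G = (1/2) * sum_q sigma_z^(q) for r qubits; count unique
# eigenvalues and derive the wavenumbers from their pairwise differences.
sz, I = np.diag([1.0, -1.0]), np.eye(2)

def parallel_linear_generator(r):
    terms = []
    for q in range(r):
        ops = [I] * r
        ops[q] = sz          # sigma_z acting on qubit q, identity elsewhere
        terms.append(reduce(np.kron, ops))
    return 0.5 * sum(terms)

r = 4
G = parallel_linear_generator(r)
eigs = sorted({float(v) for v in np.round(np.diag(G), 10)})
ks = sorted({round(a - b) for a in eigs for b in eigs})
print(len(eigs), ks)  # r+1 = 5 unique eigenvalues; wavenumbers -4..4
```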

2.3. Redundancy

Both in the sequential and parallel linear architectures, there is a lot of redundancy in how the feature is encoded into the circuit. This is the easiest to see for the parallel architecture, where most of the eigenvalues of $\exp(\textrm i x \mathcal{G})$ are largely degenerate as the encoding commutes with qubit permutations.

3. Results—exponential architectures

In this section, two new families of architectures are suggested that can encode an exponential number of Fourier bases for a given number of repetitions/parallel encodings. The basis of this generalization is to modify each 'subsequent' appearance of the encoding gate in the circuit by a re-scaling of the generator $S(x)\to S(m x)$ with an integer m. Keeping the factors m integer guarantees that this procedure results in a discrete Fourier series in the form of equation (1).

3.1. Sequential exponential

It was shown in section 2.1 that the wavenumbers created in the linear models are highly degenerate. By modifying the circuit encoding, this degeneracy can be reduced, adding new wavenumbers to the list. This is accomplished by altering the generators in the individual encoding layers. In the linear case, the diagonal elements of the generator λi always belonged to the list $\{-\frac{1}{2},\frac{1}{2}\}$, but they can be altered by scaling the generator $\mathcal{G}$ in each layer. In practice, this is achieved by scaling the embedded data x and mathematically associating the scale factor with the generator. The resultant function becomes

$f(x) = \sum_{\boldsymbol{i},\boldsymbol{j}} a_{\boldsymbol{i}\boldsymbol{j}}\, \textrm e^{\textrm i \big[ (\lambda^{(1)}_{j_1}+\lambda^{(2)}_{j_2}+\cdots) - (\lambda^{(1)}_{i_1}+\lambda^{(2)}_{i_2}+\cdots) \big] x},$

where $\lambda^{(l)}_i = a_l \lambda_i \in \frac{1}{2}\{-a_l,a_l\}$ for $a_l \in \mathbb{N}$. 8 In this work, al scales as $a_l = \{2^0,2^1,\ldots,2^{n-2},2^{n-1}+1\}$. The motivation behind this choice is the sum of powers of 2, $\sum_{i = 0}^{n-2} 2^i = 2^{n-1} - 1$: the largest wavenumber possible, $2^n$, is obtained by taking all the positive contributions from the list of eigenvalues, i.e. $k_\textrm{max} = \sum_{i = 0}^{n-2} 2^i + 2^{n-1} + 1 = 2^n$. Next, one can switch the signs of the positive values to negative, starting from the smallest term, to produce all integers from $-2^n$ to $2^n$. This generates $2^n$ Fourier frequencies. Figure 1(b) shows a quantum circuit encoded using the sequential exponential strategy with 2 layers. Appendix C demonstrates that this network places extreme constraints on the Fourier accessibility and is thus an undesirable choice for general data modelling. However, this scheme motivates extending the idea to parallel architectures.
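The scaling choice can be sanity-checked numerically. The following sketch assumes the scaling list $a_l = \{2^0,\ldots,2^{n-2},2^{n-1}+1\}$ described above and verifies that the reachable wavenumbers form the full integer range $[-2^n, 2^n]$.

```python
from itertools import product

# Illustration of the scaling choice a_l = {2^0, ..., 2^(n-2), 2^(n-1)+1}.
def scalings(n):
    return [2**l for l in range(n - 1)] + [2**(n - 1) + 1]

def reachable_wavenumbers(n):
    a = scalings(n)
    # each layer contributes an eigenvalue of +a_l/2 or -a_l/2
    sums = {sum(s * al / 2 for s, al in zip(signs, a))
            for signs in product((-1, 1), repeat=n)}
    # wavenumbers are pairwise differences of the eigenvalue sums
    return sorted({round(p - q) for p in sums for q in sums})

n = 3
print(reachable_wavenumbers(n) == list(range(-2**n, 2**n + 1)))  # True
```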

3.2. Parallel exponential

To perform this extension, it is appropriate to proceed with a two-qubit example. The parallel linear method described in section 2.2 produces the generator:

Equation (4): $\mathcal{G} = \frac{1}{2}\left(\sigma_z \otimes \mathbb{I} + \mathbb{I} \otimes \sigma_z\right) = \Big(\begin{smallmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1 \end{smallmatrix}\Big)$

This matrix has three unique eigenvalues, $\lambda \in \{-1,0,1\}$, and when subtracted from itself—yielding wavenumbers $L^{\textrm{(lin)}}_k = \{-2,-1,0,1,2\}$—it can produce 2 Fourier bases with frequencies $\{1,2\}$. One could generate a matrix with more unique values. For example,

Equation (5): $\mathcal{G} = \frac{1}{2}\,\sigma_z \otimes \mathbb{I} + \frac{3}{2}\,\mathbb{I} \otimes \sigma_z = \Big(\begin{smallmatrix} 2 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -2 \end{smallmatrix}\Big)$

is a generator with four unique eigenvalues that generates nine wavenumbers $\{-4, -3, -2, -1, 0, 1, 2, 3, 4\}$. This generator can be constructed using the quantum circuit shown in figure 1(d). In this case, an $\mathcal{SU}(4)$ generator is employed. It is decomposed into two $\mathcal{SU}(2)$ generators, one using the group parameter x and the other 3x. This can be generalized to n qubits: $\mathcal{G}$ would then be a diagonal matrix whose entries run over the non-zero integers from $-2^{n-1}$ up to $2^{n-1}$, producing $2^n$ Fourier bases. The quantum circuit associated with this generator is an application of Pauli-Z rotations of x with frequencies increasing in the following way:

Equation (6): $a_q = 2^{q-1}\ \textrm{for}\ q = 1,\ldots,n-1, \qquad a_n = 2^{n-1}+1$

where n is the number of qubits. Note the similarities between the sequential and parallel encodings and their symmetries in how the circuits are constructed. One also recognizes similarities between the parallel encoding and Kitaev's quantum phase estimation algorithm [33], albeit in this case, x is a classical feature.
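A numerical sketch of the parallel exponential generator follows. The scaling list used here is an assumption for illustration (the same list as the sequential case, which reduces to the x and 3x rotations in the two-qubit case); it reproduces the claimed diagonal of non-zero integers and the full wavenumber range.

```python
import numpy as np
from functools import reduce

# Illustration: qubit q carries a Pauli-Z rotation of a_q * x; the list of
# a_q below is an assumption (it reproduces the x and 3x rotations for n=2).
sz, I = np.diag([1.0, -1.0]), np.eye(2)

def exponential_generator(n):
    a = [2**q for q in range(n - 1)] + [2**(n - 1) + 1]
    G = np.zeros((2**n, 2**n))
    for q in range(n):
        ops = [I] * n
        ops[q] = 0.5 * a[q] * sz   # scaled Pauli-Z generator on qubit q
        G = G + reduce(np.kron, ops)
    return G

n = 3
eigs = sorted(float(v) for v in np.round(np.diag(exponential_generator(n)), 10))
ks = sorted({round(p - q) for p in eigs for q in eigs})
print(eigs)                                # 2^n distinct values, zero excluded
print(ks == list(range(-2**n, 2**n + 1)))  # True
```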

This can be significantly more expressive than the parallel linear method. Still, this advantage needs to be accompanied by Fourier accessibility: if the Fourier coefficients of these newly-acquired bases cannot be altered, there would be no advantage in pursuing this setting. Section 3.3 shows a significant advantage of the parallel exponential encoding in a simple toy example.

3.3. Training

In this section, the training performance of the four architectures on a simple dataset is compared. Each architecture is trained to reproduce a one-dimensional top-hat function using the variational layers shown in figure 2. Figure 3 shows the ground truth as well as the fitting performances of these architectures, and figure 4 shows their training performance. It is clear that the parallel exponential architecture fits the closest function to the ground truth, and, in contrast, the sequential exponential architecture has the worst performance of all models. Furthermore, the Fourier decompositions of the models in figure 5 show that exactly two Fourier terms are accessed by the linear architectures and four by the exponential ones. Additionally, figure 6 demonstrates the performance of the parallel exponential architecture on a trapped-ion quantum processor. IonQ implements a high-fidelity gate-based quantum processing unit using laser-pumped trapped ions, as explained in [36]. This hardware was shown to be among the most accurate in recent benchmarking tests [37]. We specifically used the device introduced in [38], with a single-qubit fidelity of 0.997 and a two-qubit fidelity of 0.9725. The code was run through Amazon Web Services Braket, and the forward pass for 100 data points took four hours and 11 minutes due to delays and queuing times. The low number of shots is the dominant source of noise here, and higher shot counts could yield a smoother curve closer to the simulator. Notably, even though superconductor-based QPUs were not tested here, they are expected to produce a similar result if they match these single- and two-qubit gate fidelities.
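For intuition about why more frequencies help on this target, the following purely classical sketch fits the best truncated Fourier series with two versus four frequencies by least squares. It is an illustration, not the paper's quantum training, and the top-hat width used here is a hypothetical choice.

```python
import numpy as np

# Illustration: best truncated Fourier fit to a top-hat, 2 vs 4 frequencies,
# mimicking what the linear vs exponential circuits can express.
xs = np.linspace(-np.pi, np.pi, 400)
target = np.where(np.abs(xs) < np.pi / 2, 1.0, 0.0)  # hypothetical top-hat

def best_fit_mse(n_freq):
    # design matrix: [1, cos(kx), sin(kx)] for k = 1..n_freq
    cols = [np.ones_like(xs)]
    for k in range(1, n_freq + 1):
        cols += [np.cos(k * xs), np.sin(k * xs)]
    A = np.stack(cols, axis=1)
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.mean((A @ coef - target) ** 2)

mse2, mse4 = best_fit_mse(2), best_fit_mse(4)
print(mse2 > mse4)  # True: four frequencies fit the top-hat strictly better
```

The quantum architectures face the additional constraint of Fourier accessibility, so this classical fit is only an upper bound on what each circuit family could achieve.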


Figure 2. Variational gates used in single—(a) and two-qubit (b) experiments.


Figure 3. The QNNs fit the best possible truncated Fourier series on the top-hat function. The parallel exponential architecture provides the best fit. Even though the sequential exponential architecture has access to the same four Fourier frequencies, it fails to access all of them efficiently, and as a result, it performs sub-optimally. The linear architectures perform similarly to each other, potentially arising from their high Fourier accessibility to the two Fourier frequencies that they can represent.


Figure 4. Training losses indicate a training advantage for the parallel exponential, and the sequential exponential architecture performs only marginally better than the linear architectures. The training was done on QMware hardware [34] using the PennyLane Python package [35]. The Adam optimizer minimises a mean squared loss function with a learning rate of 0.1 and with uniformly-distributed initial parameters $\theta \in [0,2\pi]$.


Figure 5. Fourier decomposition of the four architectures after training to fit the top-hat function. The linear architectures can only access two Fourier frequencies, whereas the exponential ones can access four.


Figure 6. The parallel exponential fit to the top-hat function on a simulator vs on the IonQ Harmony quantum processor. The noisy solid line evaluates this network for 100 equally-spaced points using 100 shots of this device.


3.4. Critical evaluation

While the results for the parallel exponentials are encouraging, it is equally important to understand the limitations of this approach. Firstly, while the exponential growth in the number of Fourier frequencies is evident, this is not the upper limit of Fourier frequency growth. Schuld et al [15] showed that for L repetitions of an encoding gate with a Hilbert space of dimension d, there is an upper limit to this growth of the form

Equation (7): $K \leqslant \frac{d^{2L} - d^{L}}{2}$

where K is the number of Fourier frequencies. This suggests a potential for square-exponential growth, whereas the method discussed in this work grows only exponentially. In appendix D, a mathematical problem is proposed whose solution could unlock the maximum possible Fourier accessibility.

Secondly, it is important to emphasize that the two parallel architectures are identical up to a multiplicative factor in the exponential case, so training them for a fixed number of epochs requires the same computational resources. However, adding more Fourier bases by eliminating the network's degeneracy could result in under-parameterised models. Therefore, it is often necessary to parameterise the exponential architectures more heavily than the linear ones, indirectly affecting the required resources. Every Fourier frequency requires two degrees of freedom (real-valued parameters), and an exponentially-growing Fourier space requires the resources to grow exponentially, too. These resources include the classical memory required to store the parameters and the classical optimizer that needs to calculate the gradients for these parameters. Lastly, extending this approach to many qubits will still result in barren plateaus.

4. Conclusion and future work

This work suggested two new families of QNN architectures, dubbed sequential and parallel exponential circuits, that provide an exponentially growing Fourier space. It was demonstrated that the former struggles to access these frequencies, while the latter shows an advantage in approximating a top-hat function.

Future work could focus on a quantitative understanding of the Fourier accessibility of these networks, such that the optimal variational parameterisation could be chosen for a specific problem. Another possible direction for future work is to depart from hardwired encoding gates. A natural elementary step in this direction is to consider single-qubit gates of the form $S_i(x,w_i) = \exp(- \textrm i \, x w_i \frac{1}{2}\sigma_z )$, where the scaling factor wi is an independent scalar trainable parameter for each occurrence of the encoding gate in the circuit. In this case, the final wavevectors k are linear combinations of the parameters wi that can potentially be trained efficiently. As an added note, the parallel exponential encoding introduced in this work coincides, for up to two qubits, with the commendable work in [39]. That paper came to our attention after we had released the preprint, and we recognise that the parallel exponential architecture resembles its ternary encoding both in the two-qubit case and in the type of growth in Fourier terms, albeit with different scaling strategies. Furthermore, [40] follows a similar example and creates this architecture for an optical setup.

Data availability statement

All data that support the findings of this study are included within the article (and any supplementary files).

Appendix A: Fourier estimation of the linear architectures

A.1. Sequential linear

To see the extraction of the Fourier modes more explicitly, one can write the function represented by this circuit as the expectation value of a measurement operator M:

Equation (A1): $f(x) = \langle 0 |\, U^{\dagger}(x,\theta)\, M\, U(x,\theta)\, | 0 \rangle$

Equation (A2): $U(x,\theta) = W^{(n+1)} S(x)\, W^{(n)} \cdots W^{(2)} S(x)\, W^{(1)}$

Writing this in tensor notation yields

Equation (A3): $f(x) = \big[W^{(1)\dagger} S^{\dagger}(x)\, W^{(2)\dagger} \cdots W^{(n+1)\dagger}\big]_{1 i}\, M_{ij}\, \big[W^{(n+1)} \cdots W^{(2)} S(x)\, W^{(1)}\big]_{j 1}$

where $W = W(\theta)$ and the index summation convention is employed. We can simplify this expression further by employing

$[S(x)]_{i_1 j_1} = \textrm e^{-\textrm i \lambda_{i_1} x}\, \delta_{i_1 j_1},$

where $\lambda_{i_1} \in \{-\frac{1}{2},\frac{1}{2}\}$ represents the eigenvalues of $\mathcal{G}$. This simplifies our function:

Equation (A4): $f(x) = \sum_{\boldsymbol{i},\boldsymbol{j}} a_{\boldsymbol{i}\boldsymbol{j}}\, \textrm e^{\textrm i \left[(\lambda_{j_1}+\lambda_{j_2}+\cdots) - (\lambda_{i_1}+\lambda_{i_2}+\cdots)\right] x}$

This means that for every combination of $\{\{i_1,\ldots\},\{j_1,\ldots\}\}$, the QNN produces a wave-front with a wavenumber 9 $k_{\boldsymbol{i},\boldsymbol{j}} = (\lambda_{j_1}+\lambda_{j_2}+\cdots)-(\lambda_{i_1}+\lambda_{i_2}+\cdots),$ where $\boldsymbol{i},\boldsymbol{j} \in \mathbb{N}^n$. Note that for n repetitions of this encoding, one obtains $2^{2n}$ combinations for k, but this also includes a high level of degeneracy for each resultant wavenumber. Specifically, [17] showed that the degeneracy of the wavenumber of this circuit is given by $\{{2n\choose2n},{2n\choose2n-1},{2n\choose2n-2},\ldots,{2n\choose0}\}$ for the wavenumbers $\{-n, -(n-1) ,\ldots, -1 , 0 , 1 ,\ldots, n - 1, n\}$. Keeping this in mind, one can rewrite equation (A4) as
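The binomial degeneracy pattern can be confirmed by brute force; the sketch below (illustrative only) counts, for n = 3, how often each wavenumber arises from the $2^{2n}$ sign combinations.

```python
from itertools import product
from math import comb

# Illustration: count how many sign combinations produce each wavenumber
# for n sequential encodings with per-layer eigenvalues +-1/2.
n = 3
counts = {}
for signs in product((-0.5, 0.5), repeat=2 * n):
    # first n entries come from the ket side, last n from the bra side
    k = round(sum(signs[:n]) - sum(signs[n:]))
    counts[k] = counts.get(k, 0) + 1

# wavenumber k = -n + m should occur comb(2n, m) times, m = 0..2n
print(all(counts[-n + m] == comb(2 * n, m) for m in range(2 * n + 1)))  # True
```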

Equation (A5): $f(x) = \sum_{m = -n}^{n} c_m\, \textrm e^{\textrm i m x}$

Equation (A6): $f(x) = c_0 + \sum_{m = 1}^{n} \left( c_m\, \textrm e^{\textrm i m x} + c^{*}_{m}\, \textrm e^{-\textrm i m x} \right)$

Equation (A7): $f(x) = c_0 + 2 \sum_{m = 1}^{n} \left( c^r_m \cos(m x) - c^i_m \sin(m x) \right)$

where cm are complex coefficients and $(c^r_m,c^i_m) = (Re(c_m),Im(c_m))$.

A.2. Parallel linear

A parallel linear encoding consists of a two-qubit system where a Pauli-Z rotation of x is applied to each qubit. This means that two encodings are to be considered: the first encoding considers only the rotation applied to the first qubit, making its encoding generator $\mathcal{G}^1 = \frac{1}{2}\,\sigma_z \otimes \mathbb{I}$, and the second $\mathcal{G}^2 = \frac{1}{2}\,\mathbb{I} \otimes \sigma_z$:

Equation (A8): $\mathcal{G}^1 = \frac{1}{2}\Big(\begin{smallmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{smallmatrix}\Big), \qquad \mathcal{G}^2 = \frac{1}{2}\Big(\begin{smallmatrix} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1 \end{smallmatrix}\Big)$

Since the gates are applied to separate qubits, the generators commute with each other, and since the group parameters applied to each gate are the same, the elements of the encoding matrix can be written as:

Equation (A9): $[S(x)]_{ij} = \big[\textrm e^{-\textrm i (\mathcal{G}^1 + \mathcal{G}^2) x}\big]_{ij} = \textrm e^{-\textrm i\, \mathcal{G}^{\textrm{comb}}_{ii} x}\, \delta_{ij}$

where $\mathcal{G}^{\textrm{comb}}$ is the combined generator, and δ is the Kronecker delta which acts as the identity matrix in index notation. Using this expression, one could write

Equation (A10): $\mathcal{G}^{\textrm{comb}} = \mathcal{G}^1 + \mathcal{G}^2 = \Big(\begin{smallmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1 \end{smallmatrix}\Big)$

Similarly to the calculations in appendix A.1, this generator provides us with a list of eigenvalues $\lambda \in \{-1,0,1\}$, which, when subtracted from itself, yields 2 Fourier frequencies and five wavenumbers $k \in \{-2,-1,0,1,2\}$. This calculation can be generalised to any number of qubits, and the number of Fourier frequencies increases linearly with the number of qubits.

Appendix B: The derivation for the exponential architectures

B.1. Sequential exponential

Starting from equation (A4), one can scale the sequential features x by a series of scaling factors $a_l \in \mathbb{N}$, effectively modifying the group generator $\mathcal{G}$ of each layer l, to obtain

$f(x) = \sum_{\boldsymbol{i},\boldsymbol{j}} a_{\boldsymbol{i}\boldsymbol{j}}\, \textrm e^{\textrm i \big[ (\lambda^{(1)}_{j_1}+\lambda^{(2)}_{j_2}+\cdots) - (\lambda^{(1)}_{i_1}+\lambda^{(2)}_{i_2}+\cdots) \big] x},$

where $\lambda^{(l)} \in \frac{1}{2} \{-a_l,+a_l\}$ is the new effective list of eigenvalues. For n encoding layers, this work employs:

Equation (B1): $a_l = \{2^0, 2^1, \ldots, 2^{n-2},\, 2^{n-1}+1\}$

where the index l counts from 0 to n − 1. This parameterisation results in wavenumbers $k_{\boldsymbol{i},\boldsymbol{j}} = (\lambda^{(1)}_{j_1}+\lambda^{(2)}_{j_2}+\cdots)-(\lambda^{(1)}_{i_1}+\lambda^{(2)}_{i_2}+\cdots)$. By choosing the i's and j's such that $\lambda^{(l)}_{j_l} = +\frac{a_l}{2}$ and $\lambda^{(l)}_{i_l} = -\frac{a_l}{2}$ for all l, the largest $k_{\textrm{max}} = \frac{1}{2}\big(\sum^{n-1}_{l = 0}{a_l} - \sum^{n-1}_{l = 0}{(-a_l)}\big) = 2^{n}$ is obtained. Symmetrically, the minimal k is $k_{\textrm{min}} = -2^{n}$, and every whole number in between can be created by first flipping, from the smallest term to the largest, the $\lambda^{(l)}_{j_l}$ to arrive at k = 0, and then performing the same on the $\lambda^{(l)}_{i_l}$ to arrive at the other extreme of the spectrum. The final list of wavenumbers is an equidistant list of size $2\times2^n +1$, $k_{\boldsymbol{i},\boldsymbol{j}} \in \{-2^n, -2^n+1, \ldots, 0,\ldots, 2^n-1,2^n\}$, which grows exponentially in the number of encodings n.

B.2. Parallel exponential

The parallel exponential encoding in the two-qubit limit consists of Pauli-Z rotations of x and 3x applied to, respectively, the first and the second qubits. This encoding uses the same generators as equation (A8), which are multiplied by the rotation angles, x or 3x here, and then exponentiated, i.e. $S(x) = \textrm e^{-\textrm i\mathcal{G}^1 x}\, \textrm e^{-\textrm i\mathcal{G}^2 (3x)}$. $\mathcal{G}^2$ can absorb the scalar 3 to produce:

Equation (B2): $\mathcal{G}^2_{\textrm{new}} = 3\,\mathcal{G}^2 = \frac{3}{2}\Big(\begin{smallmatrix} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1 \end{smallmatrix}\Big)$

where $\mathcal{G}^1_{\textrm{new}} = \mathcal{G}^1$ remains unchanged. As the rotation angles are now equal—both are now simply x—one could add the generators to obtain:

Equation (B3): $\mathcal{G} = \mathcal{G}^1_{\textrm{new}} + \mathcal{G}^2_{\textrm{new}} = \Big(\begin{smallmatrix} 2 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -2 \end{smallmatrix}\Big)$

This generator produces the eigenvalue list $\lambda \in \{-2,-1,1,2\}$; subtracting this list from itself yields the list of wavenumbers $k \in \{-4,-3,\ldots,3,4\}$, whose size grows exponentially with the number of qubits.

Appendix C: Constraints of the sequential exponential

In section 3.3, it was shown that both sequential and parallel exponential architectures represented four Fourier frequencies. However, the latter achieved a lower training loss and a better fit for the top-hat function. This is due to the reduced Fourier accessibility of the sequential architecture, meaning it lacks the freedom to achieve any desired point in the Fourier space. The models with four Fourier frequencies are realised in a nine-dimensional space that includes c0 and the real and imaginary values of ci . Each realisation of the trainable parameters of the quantum circuit produces a 9-dimensional array creating a point in this 9-dimensional space. Realising these two architectures many times makes it possible to analyse the geometry of the Fourier space for each architecture. Still, for manual observation, finding an efficient way to reduce this dimensionality to three dimensions (or four with colour) is essential. Figure 7 shows a choice for this dimensionality reduction by investigating the arguments of the complex Fourier coefficients, which, based on equation (A7), represent the phases of the co-sinusoidal terms. These show that the sequential exponential architecture is dramatically constrained in the collection of phases it can represent and that the parallel exponential is unconstrained in this way.
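The realisation procedure can be sketched numerically. The illustration below uses Haar-random unitaries standing in for the trained variational layers (an assumption; the paper uses the parameterised circuits of figure 2), samples the two-qubit parallel exponential circuit, and reads off the Fourier modes of its output with a DFT.

```python
import numpy as np

# Illustrative sketch: the two-qubit parallel exponential circuit with
# random unitaries in place of the variational layers.
rng = np.random.default_rng(1)

def random_unitary(n):
    q, r = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q * (np.diag(r) / np.abs(np.diag(r)))

G = np.diag([2.0, -1.0, 1.0, -2.0])            # combined generator, as in (B3)
M = np.kron(np.diag([1.0, -1.0]), np.eye(2))   # Pauli-Z measurement on qubit 1
W1, W2 = random_unitary(4), random_unitary(4)

def f(x):
    # the encoding S(x) = exp(-i G x) is diagonal, so exponentiate elementwise
    S = np.diag(np.exp(-1j * np.diag(G) * x))
    psi = W2 @ S @ W1 @ np.eye(4)[:, 0]
    return np.real(psi.conj() @ M @ psi)

# DFT over one period exposes which Fourier modes the circuit output carries
N = 32
cs = np.fft.fft([f(2 * np.pi * t / N) for t in range(N)]) / N
ks = sorted(k if k < N // 2 else k - N for k in range(N) if abs(cs[k]) > 1e-10)
print(ks)  # confined to wavenumbers between -4 and 4
```

Repeating this with fresh random unitaries and collecting the arguments of the coefficients $c_1,\ldots,c_4$ is exactly the realisation procedure described above.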

Figure 7.

Figure 7. Fourier phases of the two exponential architectures: (a) sequential and (b) parallel. Each architecture was realised $10\,000$ times, and their arguments were calculated using the discrete Fourier transform of their outputs. We see that the sequential architecture has a restricted four-dimensional behaviour and that a linear dependence between the phases seems to exist, demonstrating a lack of Fourier accessibility. In contrast, the parallel architecture can fill the space, but still, some constraint is visible between the arg$(c_1)$ and arg$(c_4)$ bases.


Appendix D: Beyond exponential growth

As in appendix A.1, to reach the final list of wavenumbers one needs to subtract the eigenvalues of the Hamiltonian in pairs. This section proposes a mathematical problem leading to the largest possible Fourier series.

Problem 1. For a given $m \in \mathbb{N}$, find a list of integers $L \in \mathbb{Z}^m$ such that when subtracted from itself, it produces a new list $k^\textrm {(max)}_L = \{x - y~|x,y\in L\}$ whose elements are sequential integers and, except for zero, all the elements have a degeneracy of precisely one.

The problem statement produces a list of eigenvalues L from which one can make a list of wavenumbers $k^\textrm {(max)}_L$. After finding this list, it is crucial to check whether one can create a diagonal Hamiltonian using RZ rotations and non-parameterised gates whose diagonal elements are the numbers in L. In appendix C.3 of [41], this problem is equated to the perfect Golomb ruler: for $m \geqslant 5$, a perfect ruler becomes impossible, and the numbers either become nonsequential or degenerate.
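Problem 1 can be checked for small candidate lists with a few lines. The sketch below (illustrative) tests whether all pairwise differences of a sorted integer list are sequential and non-degenerate, i.e. whether the list forms a perfect Golomb ruler.

```python
from itertools import combinations

# Illustration: do the pairwise differences of L cover 1..m(m-1)/2 exactly once?
def is_perfect(L):
    diffs = [b - a for a, b in combinations(sorted(L), 2)]
    want = list(range(1, len(diffs) + 1))
    return sorted(diffs) == want

print(is_perfect([0, 1, 3]))      # True:  differences {1, 2, 3}
print(is_perfect([0, 1, 4, 6]))   # True:  differences {1, 2, 3, 4, 5, 6}
print(is_perfect([0, 1, 2, 6]))   # False: difference 1 appears twice
```

Exhausting all candidate lists with such a check for m = 5 reproduces the impossibility result cited above.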

Footnotes

  • This is true for pure quantum models trained on classical datasets. There are numerous successes in applying quantum models to quantum-native problems [12–14].

  • A graphical language, based on tensor networks, to analyze quantum circuits [29].

  • References [30, 31] show that even in large, shallow circuits, the trainability can be sub-optimal. They claim that the landscape of solutions of shallow circuits has an exponentially increasing number of undesirable local minima.

  • Each of these bases has a Fourier amplitude and a phase angle, which need to be altered to fit a model on the training data. However, depending on the architecture (of both the encoding layers and the trainable layers), these quantities may be limited in the values they can represent. This could significantly limit their ability to represent various functions. This will henceforth be referred to as the Fourier accessibility of the quantum architectures.

  • Schuld [32] explores other embedding strategies, too, such as basis embedding and amplitude embedding. This work primarily focuses on angle embedding. Nevertheless, it is worth noting that in the circuit model, all circuit parameters enter as angles at some level of the description.

  • Extra gates to convert between different generators can be absorbed into the variational gates—see [15].

  • In principle, one may also consider gates $S(x;\theta) = \textrm e^{-\textrm i f_\theta(x) \mathcal{G}}$ with a variational dependence on the feature x, where the parameters fθ are to be learned. However, this could present additional challenges related to the computation of gradients and increased number of parameters.

  • In comparison, for linear architectures $a_l = 1$ for all l.

  • Note that this happens at the level of the density matrices and is independent of the choice of M.
