Riemannian quantum circuit optimization for Hamiltonian simulation

Ayse Kotil; Rahul Banerjee; Qunsheng Huang; Christian B Mendl

doi:10.1088/1751-8121/ad2d6e

1. Introduction

Hamiltonian simulation is a natural and promising application of quantum computing [1, 2]. For example, quantum time evolution gives access to the dynamical behavior of strongly correlated quantum systems, can quickly generate entanglement, and is a central ingredient of several quantum algorithms, like the Harrow-Hassidim-Lloyd (HHL) algorithm and quantum phase estimation.

A detailed numerical analysis of Trotter-Suzuki splitting methods [3] shows that they can approximate the time evolution with circuit depth scaling essentially linearly in simulated time. Recently, the authors of [4–6] have proposed and implemented the idea of optimizing variational circuit Ansätze inspired by Trotterized time evolution for the purpose of Hamiltonian simulation (using parametrized circuit gates). Here, we build upon the same idea and adapt a tensor network perspective as in [6], but take a step further by regarding the circuit gates as general unitary matrices, analogous to [7, 8]. Our main technical innovation is a derivation of how to employ the Riemannian trust-region algorithm [9, chapter 7] for the purpose of optimizing the quantum circuit.

Methods for approximating quantum time evolution on a quantum computer have also been explored in [10–14]. The authors of [13, 14] focus on the best-approximation of the time-dependent quantum state, instead of the overall time evolution operator considered here. Approaches based on quantum signal processing [10, 11] might provide a complexity theoretic advantage in certain situations, but have the disadvantage of requiring additional auxiliary qubits and a block encoding of the Hamiltonian, which can incur a large overhead in practice.

2. Notation and setup

Consider the unitary time evolution operator (in units of $\hbar = 1$ )

$\begin{equation} U\left(t\right) = \mathrm{e}^{-i H t} \end{equation} \tag{ 1 }$

of a quantum system governed by a (time-independent) quantum Hamiltonian H defined on a lattice. We denote the local dimension of each lattice site by d, i.e. the local Hilbert space is $\mathbb{C}^d$ . In our numerical simulations we will set d = 2, but the method works for general d. Translation invariance of the Hamiltonian is assumed throughout.

Our goal is to approximate U by a quantum circuit. We designate the overall unitary transformation effected by the circuit as $W(G_1, \ldots, G_n)$ , with $G_\ell \in \mathcal{U}(d^2)$ , $\ell = 1, \ldots, n$ , to-be optimized quantum gates forming the circuit, as illustrated in figure 1 for a one-dimensional lattice and n = 3. Here and in the following, $\mathcal{U}(m)$ denotes the set of unitary m × m matrices. The circuit has a brick wall layout, and due to translation invariance, the gates within a layer are by construction all the same. We follow the mathematical convention of matrix chain ordering from right to left, i.e. the gates which are applied first are in the rightmost layer.

**Figure 1.** Example of a quantum circuit with brick wall layout and periodic boundary conditions, for approximating the exact time evolution operator.
Download figure:
Standard image High-resolution image

The Ansatz is motivated by the well-studied Trotterized time evolution approximation [3], which assumes that the Hamiltonian is a sum of 'simpler' terms, $H = \sum_{\gamma = 1}^{\Gamma} H_{\gamma}$ , such that each $\mathrm{e}^{-i H_{\gamma} t}$ can be exactly realized as quantum circuit; a basic example is the even–odd splitting of a Hamiltonian with nearest-neighbor interactions on a one-dimensional lattice, $H = H_{\text{even}} + H_{\text{odd}}$ , and the Strang splitting approximation

$\begin{equation} \mathrm{e}^{-i H t} = \mathrm{e}^{-i H_{\text{even}} t/2} \mathrm{e}^{-i H_{\text{odd}} t} \mathrm{e}^{-i H_{\text{even}} t/2} + \mathcal{O}\left(t^3\right). \end{equation} \tag{ 2 }$

A benchmark comparison of our optimized circuits with such splitting methods is presented in section 6.

We will devise a numerical method for optimizing the gates $G_\ell$ in sections 4 and 5. Each gate $G_\ell$ is regarded as general unitary matrix (instead of being parametrized), which opens up the broader range of Riemannian optimization techniques.

The time parameter t is set to a fixed (model dependent) numerical value of order 1. The optimized gates can then be used on a quantum computer to reach times which are integer multiples of t by concatenating copies of the circuit. One expects that the approximation error increases only linearly with the final time [3].

3. Light-cone considerations and generalization to larger systems

For practical reasons, we will perform the circuit optimization for rather small system sizes; nevertheless, it turns out that the circuit is a faithful representation of the time evolution operator for larger systems as well (assuming translation invariance), as already noted in [4, 15], cf the detailed mathematical analysis in [12]. To provide an intuitive argument why this works, we consider the light-cone picture shown in figure 2. The causal correlations spread with a finite velocity, and cannot exceed the Lieb-Robinson bounds [16, 17].

**Figure 2.** Physical light cone of causal correlations, and capability of a brick wall circuit to represent these (red bonds).
Download figure:
Standard image High-resolution image

One observes that the following conditions have to be satisfied to arrive at a faithful approximation of the exact time evolution operator:

(i)
The evolution time t has to be small enough such that the extent of the light cone at t is smaller or equal to the system size. This property ensures that the light cone does not interfere with itself given the periodic boundary conditions.
(ii)
The causal range of influence of the circuit gates (red thick lines in figure 2) has to enclose the physical light cone, to be able to represent the physical information spreading.

Assuming these prerequisites hold, it becomes possible to use the circuit also for larger systems, simply by extending it with copies of the same gate $G_\ell$ in layer $\ell$ . We will test this idea in our numerical experiments in section 6. A practical use-case scenario consists of performing the optimization on a classical computer and small system size, and then using the optimized gates for larger systems on a quantum computer.

4. Mathematical formalism of optimization on the manifold of unitary matrices

In this section we briefly review the mathematical formalism for optimization under unitary constraints [7, 9, 18], following the notation in [9]. This formalism forms the foundation for the numerical method proposed in the next section.

For fixed integer m, the set of unitary m × m matrices,

$\begin{equation} \mathcal{U}\left(m\right) = \left\{V \in \mathbb{C}^{m \times m} \,\vert\, V^{\,\dagger} V = I_m \right\}, \end{equation} \tag{ 3 }$

with I_m the m × m identity matrix, forms a mathematical manifold. $\mathcal{U}(m)$ can be interpreted as Riemannian submanifold of $\mathbb{C}^{m \times m}$ (with metric described below), and is a special case of the complex Stiefel manifold consisting of isometries. For notational simplicity, we omit the parameter m from $\mathcal{U}(m)$ in the following.

A central concept is the tangent space of a manifold. The tangent space at a given $V \in \mathcal{U}$ is parametrized by the set of complex anti-Hermitian matrices [9]:

$\begin{equation} T_V\mathcal{U} = \left\{V A: A \in \mathbb{C}^{m \times m}, A^{\dagger} = -A \right\}, \end{equation} \tag{ 4 }$

where $A^{\dagger}$ denotes the adjoint (conjugate transpose) of A. By construction, $V^{\,\dagger} X$ is anti-Hermitian for any $X \in T_V\mathcal{U}$ .

We introduce the following Riemannian metric on $T_V\mathcal{U}$ , as in [7]:

$\begin{equation} \langle{\cdot, \cdot}\rangle_V: T_V\mathcal{U} \times T_V\mathcal{U} \to \mathbb{R}, \quad \langle{X, Y}\rangle_V = \textrm{Tr}\left[X^{\dagger} Y\right]. \end{equation} \tag{ 5 }$

Note that $\textrm{Tr}[X^{\dagger} Y] = \textrm{Tr}[(V^\dagger X)^{\dagger} (V^\dagger Y)]$ , and thus the trace is real-valued since $V^\dagger X$ and $V^\dagger Y$ are anti-Hermitian.

Viewing $\mathcal{U}$ as embedded into $\mathbb{C}^{m \times m}$ , the corresponding projection onto the tangent space at $V \in \mathcal{U}$ reads [7, 9]:

$\begin{equation} P_V X = V \mathrm{skew}\left(V^{\,\dagger} X\right), \end{equation} \tag{ 6 }$

with $\mathrm{asym}(A) = \frac{1}{2} (A - A^{\dagger})$ the anti-Hermitian part of a matrix.

We define the gradient of a smooth function $f: \mathbb{C} \to \mathbb{R}$ at point $z = x + i y$ with $x, y \in \mathbb{R}$ as composed of the derivatives with respect to the real and imaginary components of z:

$\begin{equation} \mathrm{grad} \ f\left(z\right) = \partial_x \ f\left(z\right) + i \partial_y f\left(z\right). \end{equation} \tag{ 7 }$

This definition is straightforwardly generalized to functions depending on several complex numbers, e.g. $f: \mathbb{C}^m \to \mathbb{R}$ or $f: \mathbb{C}^{m \times m} \to \mathbb{R}$ , by applying the definition entrywise.

We will encounter the situation that the to-be optimized target function $f: \mathcal{U} \to \mathbb{R}$ is the restriction of a function $\bar{f}$ defined on $\mathbb{C}^{m \times m}$ . In this case, the gradient vector of f results from projecting the gradient vector of $\bar{f}$ onto the tangent space:

$\begin{equation} \mathrm{grad} \ f\left(V\right) = P_V \ \mathrm{grad} \ \bar{f}\left(V\right). \end{equation} \tag{ 8 }$

For the purpose of computing second derivatives and Hessian matrices, we need some additional concepts. Let $\mathfrak{X}(\mathcal{U})$ denote the set of smooth vector fields on $\mathcal{U}$ , following the notation of [9]. We will use the unique Riemannian (Levi-Civita) connection $\nabla$ , which is formally defined as a map

$\begin{equation} \nabla: \mathfrak{X}\left(\mathcal{U}\right) \times \mathfrak{X}\left(\mathcal{U}\right) \to \mathfrak{X}\left(\mathcal{U}\right), \quad \left(\eta, \xi\right) \mapsto \nabla_{\eta} \xi \end{equation} \tag{ 9 }$

which is symmetric and compatible with the Riemannian metric. Intuitively, $\nabla_{\eta}$ is the derivative of a vector field in direction η.

As before, we interpret $\mathcal{U}$ as Riemannian submanifold of $\mathbb{C}^{m \times m}$ . Let $\xi \in \mathfrak{X}(\mathcal{U})$ be a vector field. Then the derivative of ξ in gradient direction $X \in T_V \mathcal{U}$ ( $V \in \mathcal{U}$ ) is given by [9, equation (5.15)]

$\begin{equation} \nabla_X \xi = P_V \left(D\xi\left(V\right)\left[X\right]\right), \end{equation} \tag{ 10 }$

where $D\xi(V)[X]$ is the gradient of ξ in direction X at point V.

A retraction on $\mathcal{U}$ [9, chapter 4] is a mapping from the tangent bundle of the unitary matrix manifold into the manifold. We have found it convenient to use the polar decomposition ( $V \in \mathcal{U}$ ) as retraction:

$\begin{equation} R: T\,\mathcal{U} \to \mathcal{U}, \quad R_V\left(\xi\right) = q_{\text{polar}}\left(V + \xi\right), \end{equation} \tag{ 11 }$

where $q_{\text{polar}}(A)$ denotes the unitary matrix $Q \in \mathcal{U}$ from the polar decomposition of $A \in \mathbb{C}^{m \times m}$ as $A = Q P$ , with P a Hermitian positive semi-definite matrix of the same size as A. As a remark, alternative retraction methods have also been studied in the literature, based on QR-decompositions, projections, the Cayley transform and generally geodesic-like schemes [8, 19–21]. We have found the polar decomposition to work well in practice, and leave a thorough exploration of alternative methods for future work.

In our numerical calculations, we will optimize the quantum gates $G_1, \ldots, G_n$ simultaneously (instead of one after another). Matching this procedure with the mathematical formalism requires a generalization to target functions depending on several unitary matrices:

$\begin{equation} f: \underbrace{\mathcal{U} \times \cdots \times \mathcal{U}}_{n \ \text{terms}} \to \mathbb{R}. \end{equation} \tag{ 12 }$

Formally, f is a function from the product manifold $\mathcal{U}^{\times n}$ to the real numbers. The corresponding tangent space is the direct sum of the individual tangent spaces, cf [7]. In practice, the overall gradient vector is thus a concatenation of the individual gradient vectors in equation (8), and the overall retraction results from applying the retraction in equation (11) to the individual isometries and tangent vectors.

For minimizing f in (12), we will use the Riemannian trust-region algorithm [9, chapter 7]. The central idea consists of a quadratic approximation of the target function in the neighborhood of a point $G \in \mathcal{U}^{\times n}$ :

$\begin{equation} \hat{m}_G\left(X\right) = f\left(G\right) + \langle{\mathrm{grad} \ f\left(G\right), X\rangle} + \frac{1}{2} \langle{\mathrm{Hess} \ f\left(G\right)\left[X\right], X\rangle} \end{equation} \tag{ 13 }$

for $X \in T_G \mathcal{U}^{\times n}$ , with the Riemannian Hessian

$\begin{equation} \mathrm{Hess} \ f\left(G\right)\left[X\right] = \nabla_X \ \mathrm{grad} \ f\left(G\right). \end{equation} \tag{ 14 }$

The specific details for computing the gradient and Hessian will be discussed in the next section; with that, we have collected all ingredients for implementing the Riemannian trust-region algorithm [9, Algorithm 10], using the truncated conjugate-gradient method for the trust-region subproblem, see [9, Algorithm 11] and [22].

5. Numerical method for brick wall circuit optimization

As just mentioned, we will use the Riemannian trust-region algorithm [9, chapter 7] for the optimization. Here we describe the details for the specialization to the brick wall circuit Ansatz.

We quantify the approximation error by the Frobenius norm distance ${\Vert W - U\Vert}_{\mathrm{F}}^2$ between $U = \mathrm{e}^{-i H t}$ and the brick wall circuit W. (To shorten notation, we omit the explicit t dependence of U.) We minimize this distance with respect to $G = (G_1, \ldots, G_n)$ , where $G_\ell \in \mathcal{U}(m)$ ( $m = d^2$ ) for all $\ell = 1, \ldots, n$ :

$\begin{equation} G_{\text{opt}} = \mathop{\text{argmin}}_{G \in \mathcal{U}\left(m\right)^{\times n}} {\Vert W\left(G\right) - U\Vert}_{\mathrm{F}}^2. \end{equation} \tag{ 15 }$

Since both U and W(G) are unitary by construction, an expansion of the distance leads to

$\begin{equation} \begin{aligned} {\Vert W - U\Vert}_{\mathrm{F}}^2 & = \textrm{Tr}\left[\left(W - U\right)^{\dagger} \left(W - U\right)\right] \\ & = \textrm{Tr}\left[I\right] - 2 \, \mathrm{Re}\textrm{Tr}\left[U^{\dagger} W\right] + \textrm{Tr}\left[I\right], \end{aligned} \end{equation} \tag{ 16 }$

where I denotes the identity matrix. Thus we may equivalently minimize the following target function:

$\begin{equation} f: \mathcal{U}\left(m\right)^{\times n} \to \mathbb{R}, \quad f\left(G\right) = -\mathrm{Re}\textrm{Tr}\left[U^{\dagger} W\left(G\right)\right]. \end{equation} \tag{ 17 }$

A graphical tensor diagram representation of $\textrm{Tr}[U^{\dagger} W(G)]$ is shown in figure 3.

The gradient of f can be obtained as in equation (8). The straightforward extension of f reads

$\begin{equation} \bar{f}: \left(\mathbb{C}^{m \times m}\right)^{\times n} \to \mathbb{R}, \quad \bar{f}\left(G\right) = -\mathrm{Re}\textrm{Tr}\left[U^{\dagger} W\left(G\right)\right], \end{equation} \tag{ 18 }$

i.e. inserting general matrices into the brick wall diagram. In the present setting, since f depends on several unitary matrices G₁, ..., G_n , the corresponding individual projections $P_{G_\ell}$ ( $\ell = 1, \ldots, n$ ) have to be applied.

We make use of the Wirtinger formalism, summarized in the appendix, to obtain the gradient of $\bar{f}$ . The Wirtinger derivative of $-\mathrm{Re}\textrm{Tr}[U^{\dagger} W]$ with respect to an entry of W is

$\begin{equation} \partial_{W_{jk}} \left(-1\right) \mathrm{Re} \ \textrm{Tr}\left[U^{\dagger} W\right] = \partial_{W_{jk}} \left(-1\right) \frac{1}{2} \left( \textrm{Tr}\left[U^{\dagger} W\right] + \textrm{Tr}\left[W^{\dagger} U\right] \right) = -\frac{1}{2} U_{jk}^*. \end{equation} \tag{ 19 }$

Next, applying the chain rule (A2), with W regarded as function of $G_\ell$ , leads to

$\begin{equation} \begin{aligned} \partial_{G_\ell} \bar{f}\left(G\right) & = -\frac{1}{2} \sum_{j,k} U_{jk}^* \, \partial_{G_\ell} W_{jk}\left(G\right) \\ & = -\frac{1}{2} \textrm{Tr}\!\left[U^{\dagger} \, \partial_{G_\ell} W\left(G\right)\right]. \end{aligned} \end{equation} \tag{ 20 }$

Here we have used that $\partial_{G_\ell^*} W = 0$ , such that the second term of the chain rule vanishes. The derivative of W with respect to $G_\ell$ can be expressed as graphical diagram (shown for three layers) as

Download figure:
Standard image High-resolution image

The uncontracted legs of the dashed 'holes' in the network form the gradient tensor. The summation is due to the product rule. The tensor network diagram of $\textrm{Tr}[U^{\dagger} \, \partial_{G_\ell} W(G)]$ then results from combining figure 3 with equation (21). Note that $\textrm{Tr}[U^{\dagger} \, \partial_{G_\ell} W(G)]$ has the same dimensions as $G_\ell$ (instead of a scalar quantity, as the trace might suggest).

Finally, we can use relation (A4) to obtain the gradient of $\bar{f}$ with respect to the unitary matrices $(G_1, \ldots, G_n)$ :

$\begin{equation} \mathrm{grad} \ \bar{f}\left(G\right) = - \textrm{Tr}\!\left[U^{\dagger} \, \partial_{G_\ell} W\left(G\right)\right]^*_{\ell = 1, \ldots, n}. \end{equation} \tag{ 22 }$

The gradient of f is then, according to equation (8),

$\begin{equation} \mathrm{grad} \ f\left(G\right) = - P_{G_\ell} \textrm{Tr}\!\left[U^{\dagger} \, \partial_{G_\ell} W\left(G\right)\right]^*_{\ell = 1, \ldots, n}. \end{equation} \tag{ 23 }$

In practice, we have found it convenient to work with a real-valued representation of the gradient. For that purpose, we first parametrize the tangent spaces $T_{G_\ell} \mathcal{U}$ , $\ell = 1, \ldots, n$ : let

$\begin{equation} \mathbb{A}_m = \left\{A \in \mathbb{C}^{m \times m} \vert A^{\dagger} = -A \right\} \end{equation} \tag{ 24 }$

denote the set of anti-Hermitian m × m matrices (which is a vector space over the real numbers). Then the following map defines an isometry between this space and the real-valued m × m matrices:

$\begin{equation} \mathfrak{s}: \mathbb{R}^{m \times m} \to \mathbb{A}_m, \quad \mathfrak{s}\left(R\right) = \frac{1}{2} \left(R - R^T\right) + \frac{i}{2} \left(R + R^T\right), \end{equation} \tag{ 25 }$

with inverse $\mathfrak{s}^{-1}(A) = \mathrm{Re}(A) + \mathrm{Im}(A)$ . $\mathfrak{s}$ preserves the inner product $\langle{A, B}\rangle = \textrm{Tr}[A^{\dagger} B]$ . Together with equations (4) and (6), we may thus isometrically map the gradient of f to a list of real-valued matrices:

$\begin{equation} \widetilde{\mathrm{grad}} \ f\left(G\right) =- \mathfrak{s}^{-1}\left(\mathrm{skew}\left({G_\ell}^{\dagger} \textrm{Tr}\!\left[U^{\dagger} \, \partial_{G_\ell} W\left(G\right)\right]^*\right)\right)_{\ell = 1, \ldots, n}. \end{equation} \tag{ 26 }$

For convenience, we finally reshape this gradient into a real vector of length n m² in our calculations.

For calculating the Hessian appearing in equation (14), note that $X \in T_G \mathcal{U}^{\times n}$ consists of a list of tangent vectors: $X = (X_1, \ldots, X_n)$ with $X_\ell \in T_{G_\ell}\mathcal{U}$ for $\ell = 1, \ldots, n$ . Combined with the formula (23), we have to evaluate

$\begin{equation} - \nabla_{X_\ell} P_{G_{\ell^{^{\prime}}}} \textrm{Tr}\!\left[U^{\dagger} \, \partial_{G_{\ell^{^{\prime}}}} W\left(G\right)\right]^* \end{equation} \tag{ 27 }$

for all $\ell, \ell^{^{\prime}} = 1, \ldots, n$ . We use equation (10) for that purpose. In case $\ell \neq \ell^{^{\prime}}$ , the derivative in direction $X_\ell$ takes a similar form as in equation (21), but with the 'holes' in layer $\ell$ filled by $X_\ell$ . In case $\ell = \ell^{^{\prime}}$ , there is a contribution from the trace in equation (27), formed by replacing one of the remaining $G_\ell$ matrices by $X_\ell$ and summing over all occurrences. Another contribution stems from the projector $P_{G_\ell}$ : for this, we first rewrite the definition in equation (6) as

$\begin{equation} P_{G_\ell} Z = \frac{1}{2} Z - \frac{1}{2} G_{\ell} Z^{\dagger} G_{\ell}, \quad Z \in \mathbb{C}^{m \times m}. \end{equation} \tag{ 28 }$

Thus the gradient of $P_{G_\ell} Z$ in direction $X_\ell$ (Z regarded constant) equals

$\begin{equation} D\left(P_{G_\ell} Z\right)\left[X_\ell\right] = -\frac{1}{2} \left( X_{\ell} Z^{\dagger} G_{\ell} + G_{\ell} Z^{\dagger} X_{\ell} \right). \end{equation} \tag{ 29 }$

Analogous to the gradient in equation (26), we have found it convenient to use the map $\mathfrak{s}$ to parametrize the tangent vectors $X_\ell$ in terms of real m $\,\times\,$ m matrices, and accordingly represent the Hessian as real symmetric n m $^2\,\times\,$ n m² matrix. Note that the Hessian matrix is (in general) not positive semidefinite.

As mentioned above, based on the gradient in equation (26) and the real-valued Hessian matrix, we now employ the Riemannian trust-region algorithm [9, Algorithm 10] combined with the truncated conjugate-gradient method for the trust-region subproblem [9, Algorithm 11] [22], to minimize the target function (17) with respect to the unitary matrices $G = (G_1, \ldots, G_n) \in \mathcal{U}(m)^{\times n}$ . The hyperparameters of the algorithm are chosen as: initial radius $\Delta_0 = 0.01$ , maximum radius $\bar{\Delta} = 0.1$ , and $\rho^{^{\prime}} = \frac{1}{8}$ .

The optimization is sensitive to the initial brick wall unitaries used as starting point, and is not always converging to the global optimum according to our numerical experiments. We have found the following two strategies useful for obtaining expedient starting values: (i) A 'bootstrapping' approach, using the optimized gates from a circuit with fewer layers (typically two less) and padding identity layers (i.e. containing identity matrices) on the left and right. All the gates are then optimized simultaneously. (ii) Employing a splitting method from the literature as starting point. Given a Hamiltonian with two-body interactions, a splitting method provides two-qubit gates which form the same brick wall topology as our optimization Ansatz. By using these gates as starting point, the optimized circuit performs at least as good as the splitting method. For comparison, we demonstrate the effect of a more simplistic starting point, namely all identity matrices, for the Heisenberg model (see below).

The Python/NumPy source code of our implementation, including the numerical experiments of the following section, is available at [23]. We have tested the gradient and Hessian computation by comparison with finite difference approximations of derivatives.

6. Numerical simulations

As demonstration, we apply the numerical method in section 5 to several quantum lattice models. To describe the physical setup, first consider a Hamiltonian on the one-dimensional lattice $\mathbb{Z}_{/(L)}$ with L sites (enumerated as $0, 1, \ldots, L-1$ ) and periodic boundary conditions:

$\begin{equation} H = \sum_{j = 0}^{L-1} \hat{h}_{j,j+1}. \end{equation} \tag{ 30 }$

Here the subscripts indicate that the operator $\hat{h}$ acts locally on sites $j, j+1$ .

A natural benchmark is a comparison with Trotterized even–odd splitting schemes. On a one-dimensional lattice, the Hamiltonian is partitioned into an even and odd part, namely $H = H_{\text{even}} + H_{\text{odd}}$ with

$\begin{equation} H_{\text{even}/\text{odd}} = \sum_{\substack{j = 0 \\ j \text{even/odd}}}^{L-1} \hat{h}_{j,j+1}. \end{equation} \tag{ 31 }$

By construction, the summands commute pairwise since the operators act on disjoint sites. A time step Δt of $H_{\text{even}/\text{odd}}$ is exactly realized by a quantum circuit layer consisting of copies of the two-qubit quantum gate $\mathrm{e}^{-i \hat{h} \Delta t}$ . A step of the Strang splitting approximation, equation (2), can then be implemented using three circuit layers with the same layout as in figure 1. The construction works analogously for other splitting methods.

We fix a time t and quantify the approximation error by the spectral norm distance between the numerically exact time evolution operator $\mathrm{e}^{-i H t}$ and the unitary matrix resulting from the splitting method or optimized circuit $W(G_1, \ldots, G_n)$ , respectively. Note that the target function (17) actually minimizes the Frobenius norm distance, since this has a canonical and simpler tensor network representation, as shown in figure 3.

For existing splitting schemes we consider the approximation error for an increasing number of steps $r \in \mathbb{N}_{\unicode{x2A7E} 1}$ . Thus a single time step has size $\Delta t = \frac{t}{r}$ and consists, e.g. of three substeps for the Strang method. We choose the convention that the 'cost' of an integration method is the number n of required circuit layers. Since we can always merge the last substep with the first substep of the next time step, the number of layers for the Strang method is $n = 2 r + 1$ . In general, for a method with s substeps, we have $n = (s - 1) r + 1$ .

Besides the Strang method, we also apply the Suzuki formula of order 4 by Suzuki [24], the method of order 4 by Yoshida [25], the Runge–Kutta–Nyström method of order 4 by McLachlan [26], and the partitioned Runge–Kutta method $\text{S}_6$ of order 4 by Blanes and Moan [27].

6.1. Ising model on a one-dimensional lattice

We first consider the transverse-field Ising model (TFIM) Hamiltonian on the one-dimensional lattice $\mathbb{Z}_{/(L)}$ with L sites (enumerated as $0, 1, \ldots, L-1$ ) and periodic boundary conditions:

$\begin{equation} H^{\text{Ising}} = \sum_{j = 0}^{L-1} \left( J Z_j Z_{j+1} + g X_j + h Z_j \right). \end{equation} \tag{ 32 }$

Here X_j and Z_j are the usual Pauli matrices acting on site j, and $J, g, h \in \mathbb{R}$ are parameters. Without loss of generality, we set J = 1. We will first consider the integrable case h = 0, and then investigate general parameter values for which the model becomes non-integrable.

To demonstrate the behavior of the Riemannian trust-region algorithm, figure 4 visualizes the spectral norm distance and target function during the optimization iterations. For 7 circuit layers, the two curves almost overlap exactly. One also notices that a plateau is reached after around 160 iterations, such that we stop after 200 iterations. We have adopted the strategy of either using the circuit gates of an existing splitting method as starting point for the optimization, or start with the optimized gates from a circuit with less layers and pad identity gates (which are then subject to the optimization as well).

A benchmark evaluation of the numerical optimization for the integrable Ising model and t = 1 is shown in figure 5. For the even–odd splitting methods, we have used the two-qubit gate $\mathrm{e}^{-i J \left( Z \otimes Z + g \frac{1}{2}(X \otimes I_2 + I_2 \otimes X) \right) \Delta t}$ . The approximation error of the optimized circuit is represented by the thick blue curve. One observes a clear advantage: for example, the optimized circuit achieves the same or better accuracy using 9 layers as compared to the $\text{S}_6$ method by Blanes and Moan using 49 layers. The chosen parameter g = 0.75 corresponds to the ordered phase; as indication that the results are not parameter-specific, we have repeated the calculations for g = 1.5 corresponding to the disordered phase, with qualitatively same outcome (data not shown).

**Figure 5.** Approximation error (quantified by the spectral norm distance) of the quantum time evolution operator for t = 1 governed by the Ising model Hamiltonian (32) on a one-dimensional lattice. The circuit gates have been optimized for L = 6 lattice sites. (a) and (c) show a comparison with existing splitting methods from the literature, with the thick blue curves corresponding to the optimized circuits of the present work. Subplot (b) displays the error for larger systems, always using the same optimized brickwall circuit gates $(G_1, \ldots, G_n)$ found in (a).
Download figure:
Standard image High-resolution image

As discussed at the end of section 3, it is possible to use the optimized circuit gates also for larger system sizes L. The corresponding approximation error quantified by the spectral norm distance is shown in figure 5(b). As expected from the light cone picture, the error hardly increases for larger L.

Naturally, one could attribute the large effectiveness of the circuit optimization to the integrable property of the TFIM. In this sense, our method could be regarded as numerical equivalent of circuit compression based on the integrable structure manifest in the Yang-Baxter equation, as studied in [28–31]. Surprisingly, however, the optimization results remain qualitatively unaffected by turning on an integrability-breaking longitudinal field, i.e. setting h to a non-zero value, as shown in figure 5(c).

6.2. Heisenberg model on a one-dimensional lattice

As next example, we consider the Heisenberg-type Hamiltonian

$\begin{equation} H^{\text{Heis}} = \sum_{j = 0}^{L-1} \sum_{\alpha = 1, 2, 3} \left( J_\alpha \sigma^{\alpha}_j \sigma^{\alpha}_{j+1} + h_\alpha \sigma^{\alpha}_j \right), \end{equation} \tag{ 33 }$

with $(\sigma^1, \sigma^2, \sigma^3) = (X, Y, Z)$ the vector of Pauli matrices and $\vec{J} \in \mathbb{R}^3$ , $\vec{h} \in \mathbb{R}^3$ parameters. In our numerical simulations we set $\vec{J} = (1, 1, -\frac{1}{2})$ and $\vec{h} = (\frac{3}{4}, 0, 0)$ . As before, the model is defined on a lattice with L sites and periodic boundary conditions. The final time is set to $t = \frac{1}{4}$ , which is smaller than for the Ising model to compensate for the faster spreading velocity of correlations (data not shown).

The approximation error is plotted in figure 6(a), together with a benchmark comparison of splitting methods from the literature, as before. The advantage of the optimized circuit for the Heisenberg time evolution is less pronounced than for the Ising model, but nevertheless, the error is sill more than an order of magnitude smaller as compared to the Suzuki method. The dashed curve in figure 6(a) shows the results for a more simplistic optimization protocol, namely choosing identity gates as starting points. The gap to the best results widens with increasing number of layers, which is likely due to the higher-dimensional optimization landscape. In general, one observes that the results can sensitively depend on the starting point. As described above, employing a splitting method as starting point guarantees that the optimized circuit performs at least as well as the splitting method.

**Figure 6.** Approximation error of the quantum time evolution operator for $t = \frac{1}{4}$ governed by the Heisenberg model Hamiltonian (33) with $\vec{J} = (1, 1, -\frac{1}{2})$ and $\vec{h} = (\frac{3}{4}, 0, 0)$ on a one-dimensional lattice. (a) Comparison of the circuit optimization for L = 6 sites with splitting methods from the literature. (b) Approximation error for larger system sizes, using the same optimized brickwall circuit gates $(G_1, \ldots, G_n)$ .
Download figure:
Standard image High-resolution image

As for the Ising model, we also probe the generalizability to larger systems, by showing the error depending on L in figure 6(b). As expected, the error increases only slightly with L.

6.3. Ising model on a ladder geometry

As last example, we again consider the Ising model, but now on a lattice with ladder geometry, i.e. of dimension $L_x \times 2$ , with periodic boundary conditions along the ladder direction. The Ising Hamiltonian (without longitudinal field) on a general lattice reads

$\begin{equation} H^{\text{Ising}} = \sum_{\langle j, k \rangle} J Z_j Z_k + \sum_{j = 0}^{L-1} g X_j, \end{equation} \tag{ 34 }$

where the first sum runs over all nearest neighbor lattice sites. For our simulation, we set J = 1, g = 3, and final time $t = \frac{1}{4}$ .

The ladder geometry requires a modification of the splitting scheme: besides the interactions in x-direction, which we can split using an even–odd scheme as before, there are now additional interactions along the steps of the ladder. Thus we split the Hamiltonian into three terms: $H^{\text{Ising}}_{\text{ladder}} = H^{\text{Ising}}_{x,\text{even}} + H^{\text{Ising}}_{x,\text{odd}} + H^{\text{Ising}}_y$ . The Ansatz circuit layout for the optimization is modified accordingly, using three different gate topologies for the layers. Likewise, for the benchmark comparison we use splitting schemes supporting three terms. The Strang scheme and the methods by Suzuki and Yoshida can be adapted for this purpose. For example, the Strang splitting scheme for partitioning $H = H_{\text{a}} + H_{\text{b}} + H_{\text{c}}$ reads

$\begin{equation} \mathrm{e}^{-i H t} = \mathrm{e}^{-i H_{\text{a}} t/2} \mathrm{e}^{-i H_{\text{b}} t/2} \mathrm{e}^{-i H_{\text{c}} t} \mathrm{e}^{-i H_{\text{b}} t/2} \mathrm{e}^{-i H_{\text{a}} t/2} + \mathcal{O}\left(t^3\right). \end{equation} \tag{ 35 }$

In our setting, $H_{\text{a}} = H^{\text{Ising}}_{x,\text{even}}$ , $H_{\text{b}} = H^{\text{Ising}}_{x,\text{odd}}$ and $H_{\text{c}} = H^{\text{Ising}}_y$ . We additionally include the AY 15–6 method of order 6 by Auzinger et al [32] in the comparison. A common feature of the splitting methods and our Ansatz is the sequential ordering of the matrix exponentials and circuit layouts, respectively, namely as 'abcbabcba...'.

The approximation error after numerical optimization and a comparison with the existing splitting methods is shown in figure 7(a). Similar to the Ising model on a one-dimensional lattice, one observes a large advantage of the optimized circuits.

To probe the suitability of the optimized gates for larger systems, we use them for a $6 \times 2$ layout and record the approximation error in figure 7(b). As expected, the error increases only slightly. We cannot reach even larger systems in this comparison due to the difficulty of computing and storing the exact matrix exponential.

7. Conclusions and outlook

In this work we have constructed and explored a numerical scheme for optimizing the two-qubit gates $(G_1, \ldots, G_n)$ in a quantum circuit Ansatz as general unitary matrices. To use these gates in physical quantum computers still requires their further decomposition into hardware-native gates; for this purpose, one can rely on existing algorithms from the literature [33, 34]. For example, if the hardware supports CNOT and arbitrary single-qubit gates, each $G_\ell$ layer can be represented by (at most) seven layers of such gates [34]. In general, a good decomposition strategy and the 'cost' of applying the gates will depend on the specific hardware, cf [35]. We leave a detailed use-case study and comparison with parametrized gates as in [4, 6] for future work.

In the numerical experiments, we have seen that the improvement due to the optimization compared to existing splitting methods strongly depends on the model. It would be interesting to gain a deeper mathematical understanding of these differences, in particular regarding the (ir-)relevance of integrability. A related question is how to find suitable starting gates for the optimization, to arrive at a global optimum in the best case. Conversely, further improvements of the optimized circuits shown in this work might be possible.

The periodic boundary conditions are a tool to avoid finite size effects in the numerical simulations. Using the optimized gates on actual quantum computers requires additional considerations regarding the boundary conditions. We note that a one-dimensional system with periodic boundary conditions (as studied in the present work) could be realized on a physical quantum computer having a two-dimensional topology and nearest-neighbor connectivity, by mapping the logical qubits to a loop formed by physical qubits. Nevertheless, endowing the logical system with open boundary conditions would require a dedicated optimization of the brick wall circuit gates without translation invariance. We leave this task for future work.

We remark that our method can also be applied to quantum models with longer-range interactions, for which an even–odd splitting of the Hamiltonian is not feasible, assuming that the conditions based on the light-cone picture in section 3 are still satisfied. Related to that, our optimized gates could also be of practical use for the time-evolving block decimation (TEBD) algorithm, when simulating quantum systems on a classical computer.

Another natural application is the approximation of the time evolution on a two-dimensional lattice. The reasonably smallest dimension on a square lattice is a window of size $4 \times 4$ , to avoid interference due to the periodic boundary conditions. Performing the numerical optimization for 16 sites is technically challenging, but we plan to tackle this scenario via a tailored implementation, possibly running on a compute cluster. One could avoid very large matrices via a matrix-free application of circuit gates and matrix exponentials to statevectors, or via the 'exponential-free' approach investigated in [36].

The numerical method developed in our work can approximate a general translation invariant unitary target matrix. Thus another use-case could be the decomposition of multiple-qubit gates.

In principle, the Riemannian gate optimization framework is applicable for non-translation invariant systems as well when admitting independent gates within each layer. Since the number of to-be optimized gates then increases with the system size L (or number of 'orbitals' in a chemistry setting), the Riemannian trust-region part of the algorithm becomes computationally more expensive in practice. More specifically, in this scenario the overall number of gates is then $n = D \cdot L/2$ for D layers.

Acknowledgments

We would like to thank Alexander Kemper and Frank Pollmann for insightful discussions.This research is part of the Munich Quantum Valley, which is supported by the Bavarian state government with funds from the Hightech Agenda Bayern Plus. The research is also supported by the Bavarian Ministry of Economic Affairs, Regional Development and Energy via the project BayQS with funds from the Hightech Agenda Bayern.

Data availability statement

The data that support the findings of this study are openly available at the following URL: https://github.com/qc-tum/rqcopt.

Appendix: Wirtinger formalism

The Wirtinger derivatives are an alternative approach (as compared to equation (7) above) to define complex derivatives. Given a complex-valued smooth function $f: \mathbb{C} \to \mathbb{C}$ (not necessarily holomorphic) and a complex number $z = x + i y$ with $x, y \in \mathbb{R}$ , one introduces

$\begin{align} \partial_z \ f\left(z\right) & = \frac{1}{2}\left(\partial_x \ f - i \partial_y f \right), \end{align} \tag{ A1a }$

$\begin{align} \partial_{z^*} f\left(z\right) & = \frac{1}{2}\left(\partial_x \ f + i \partial_y f \right), \end{align} \tag{ A1b }$

where $\partial_x \ f$ and $\partial_y f$ are the conventional partial derivatives when interpreting f as function $f: \mathbb{R}^2 \to \mathbb{C}$ , $f(x, y) = f(x + i y)$ .

In case f is holomorphic, the Cauchy-Riemann equations imply that the Wirtinger derivative $\partial_z \ f$ is equal to the complex derivative of f, whereas the conjugated Wirtinger derivative vanishes: $\partial_{z^*} f = 0$ .

For smooth functions $f, g: \mathbb{C} \to \mathbb{C}$ , the following chain rule holds:

$\begin{equation} \partial_z \left(f \circ g\right) = \left( \partial_w f \circ g \right) \partial_z g + \left( \partial_{w^*} f \circ g \right) \partial_z g^*, \end{equation} \tag{ A2 }$

where ' $\circ$ ' denotes function composition and $w = g(z)$ .

The product rule takes the same form as for real-valued functions:

$\begin{equation} \partial_z \left(f \cdot g\right) = \left(\partial_z \ f\right) \cdot g + f \cdot \left(\partial_z g\right), \end{equation} \tag{ A3 }$

with $f \cdot g$ the pointwise product of f and g.

By definition, we can express the gradient in equation (7) of a smooth function $f: \mathbb{C} \to \mathbb{R}$ in terms of the Wirtinger derivative:

$\begin{equation} \mathrm{grad} \ f\left(z\right) = 2 \left(\partial_z \ f\left(z\right) \right)^*. \end{equation} \tag{ A4 }$

Riemannian quantum circuit optimization for Hamiltonian simulation

Article metrics

Submit

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Notation and setup

3. Light-cone considerations and generalization to larger systems

4. Mathematical formalism of optimization on the manifold of unitary matrices

5. Numerical method for brick wall circuit optimization

6. Numerical simulations

6.1. Ising model on a one-dimensional lattice

6.2. Heisenberg model on a one-dimensional lattice

6.3. Ising model on a ladder geometry

7. Conclusions and outlook

Acknowledgments

Data availability statement

Appendix: Wirtinger formalism

Riemannian quantum circuit optimization for Hamiltonian simulation

Article metrics

Submit

Share this article

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Notation and setup

3. Light-cone considerations and generalization to larger systems

4. Mathematical formalism of optimization on the manifold of unitary matrices

5. Numerical method for brick wall circuit optimization

6. Numerical simulations

6.1. Ising model on a one-dimensional lattice

6.2. Heisenberg model on a one-dimensional lattice

6.3. Ising model on a ladder geometry

7. Conclusions and outlook

Acknowledgments

Data availability statement

Appendix: Wirtinger formalism