Open Access

Quantum Communication Complexity of Linear Regression

Authors:

Ashley Montanaro, School of Mathematics, University of Bristol, Bristol, UK and Phasecraft Ltd., Bristol, UK (ORCID 0000-0001-5640-0343)

Changpeng Shao, School of Mathematics, University of Bristol, Bristol, UK (ORCID 0000-0002-3008-7296)

ACM Transactions on Computation Theory, Volume 16, Issue 1, Article No. 1, pp. 1–30. https://doi.org/10.1145/3625225

Published: 12 March 2024

Abstract

Quantum computers may achieve speedups over their classical counterparts for solving linear algebra problems. However, in some cases—such as for low-rank matrices—dequantized algorithms demonstrate that there cannot be an exponential quantum speedup. In this work, we show that quantum computers have provable polynomial and exponential speedups in terms of communication complexity for some fundamental linear algebra problems if there is no restriction on the rank. We mainly focus on solving linear regression and Hamiltonian simulation. In the quantum case, the task is to prepare the quantum state of the result. To allow for a fair comparison, in the classical case, the task is to sample from the result. We investigate these two problems in two-party and multiparty models, propose near-optimal quantum protocols, and prove quantum/classical lower bounds. In this process, we propose an efficient quantum protocol for quantum singular value transformation, which is a powerful technique for designing quantum algorithms. We feel this will be helpful in developing efficient quantum protocols for many other problems.

1 INTRODUCTION

Quantum computers are designed to solve some problems much faster than classical computers. In particular, quantum computers could be good at solving linear algebra problems. A famous example is the Harrow-Hassidim-Lloyd algorithm for solving linear systems [14], whose complexity is only polylog in the dimension. Over the past 10 years, quantum linear algebra techniques have been extensively developed especially with the discovery of block-encoding [8] and quantum singular value transformation [12]. For many linear algebra problems, the corresponding quantum algorithms have complexity only polylog in the dimension, which was claimed to be exponentially faster than classical algorithms. However, in 2018, Tang’s dequantized algorithm [33] and its development (e.g., References [9, 11, 17, 31]) showed that quantum computers indeed do not have exponential speedups (in terms of time and query complexity) for many linear algebra problems of low-rank assuming certain data structures. In this article, we show that, in the setting of communication complexity and without the low-rank assumption, provable speedups can be obtained for two fundamental problems: linear regression and Hamiltonian simulation.

1.1 Our Results

For linear regression problems, we consider two types of models. In the first model, there are only two parties: Alice and Bob. The communication between Alice and Bob can be 1-way or 2-way. In the second model, there are multiple parties. There is a referee so the communication is 2-way and only between each party and the referee. We call this model the quantum coordinator model, as it is a quantum version of the classical coordinator model [35].

1.1.1 Alice-Bob Model.

The setting here is that Alice has a matrix \(A\in \mathbb {R}^{m\times n}\) and Bob has a vector \({\bf b}\in \mathbb {R}^{m}\). Their goal is to solve the linear regression problem \(\arg \min _{{\bf x}}\Vert A{\bf x}-{\bf b}\Vert\) together using as little communication as possible. More precisely,

—	In the quantum case, their goal is to prepare a quantum state \(| \tilde{{\bf x}}_{\rm opt} \rangle\) that is \(\varepsilon\)-close to \(| {\bf x}_{\rm opt} \rangle = | A^+{\bf b} \rangle\) in trace distance, where \(A^+\) is the pseudoinverse of \(A\),¹ and \(| A^+{\bf b} \rangle\) denotes the normalized state corresponding to \(A^+{\bf b}\). This corresponds to an optimal solution to the problem of minimizing \(\Vert A{\bf x}-{\bf b}\Vert\).
—	In the classical case, the goal is to sample from a distribution \(\widetilde{\mathcal {P}}(| {\bf x}_{\rm opt} \rangle)\) that is \(\varepsilon\)-close to the distribution \(\mathcal {P}(| {\bf x}_{\rm opt} \rangle)\) defined by \(| {\bf x}_{\rm opt} \rangle\) in the total variation distance, i.e., \(\Vert \widetilde{\mathcal {P}}(| {\bf x}_{\rm opt} \rangle) - \mathcal {P}(| {\bf x}_{\rm opt} \rangle)\Vert _{1} \le \varepsilon\).

The problem solved by the quantum computer is at least as hard as the problem solved by the classical computer, since measuring the output state in the computational basis yields a sample from the corresponding distribution. If the communication is 2-way, then Alice and Bob can send quantum/classical information to each other. If the communication is 1-way, then only one of Alice and Bob can send quantum/classical information to the other party. In communication complexity, we are interested in the minimal amount of communication (measured in qubits or bits) that the parties need to achieve their goal. Since our main focus is on the quantum speedup with respect to dimension, throughout, we assume that matrix entries are specified with \(O(\log (mn))\) bits. Our main results are summarized in Table 1.

Table 1. Comparison of Quantum and Classical Communication Complexity for Solving the Linear Regression Problem \({\bf x}_{\rm opt}=\arg \min _{{\bf x}}\Vert A{\bf x}-{\bf b}\Vert\)

	Bound	Alice \(\rightarrow\) Bob	Bob \(\rightarrow\) Alice	Alice \(\leftrightarrow\) Bob
Quantum	Upper	\(\widetilde{O}((\kappa /\gamma)^2 \min (m,n))\)	\(\widetilde{O}((\kappa /\gamma)^2)\)	\(\widetilde{O}(\kappa /\gamma)\)
Quantum	Lower	\(\widetilde{\Omega }(\min (m,n))\)	\(\widetilde{\Omega }(\kappa ^2+1/\gamma ^2)\)	\(\widetilde{\Omega }(\kappa +1/\gamma)\)
Classical	Upper	\(\widetilde{O}(m n)\)	\(\widetilde{O}(m)\)	\(\widetilde{O}(m)\)
Classical	Lower	\(\widetilde{\Omega }(\min (m,n))\)	\(\Omega (\min (m,n))\)	\(\Omega (\min (m,n))\)
Quantum speedups		at most quadratic	can be exponential	can be exponential

Alice has \(A\) and Bob has \({\bf b}\). All the lower bounds hold even if \(m=n, \kappa = O(1), 1/\gamma = O(1)\). Here, \(\kappa\) is the condition number of \(A\), i.e., the ratio of the maximal and minimal nonzero singular values of \(A\), and \(\gamma = {\Vert A {\bf x}_{\rm opt}\Vert }/{\Vert {\bf b}\Vert }\), which describes the overlap of \({\bf b}\) in the column space of \(A\). With \(\widetilde{O}, \widetilde{\Omega }\), we ignore all the polylog factors in the input size, the condition number, and the accuracy. The arrow indicates the direction of communication. The results are presented rigorously in Theorems 7, 10, and 12.

From Table 1, we can see that

—	If the communication is 1-way from Alice to Bob, then the quantum speedup is at most quadratic. The quantum protocol in this case is optimal if the linear regression problem is well-conditioned. The quadratic speedup here is not related to Grover’s algorithm or, more generally, to amplitude amplification [5]. The key point is that Bob holds too little information about the linear regression problem they aim to solve: when the communication is 1-way from Alice to Bob, even in the quantum case Alice still needs to send a lot of information about the matrix to Bob.
—	If the communication is 1-way from Bob to Alice or 2-way, then the quantum speedup is exponential if the linear regression problem is well-conditioned. The quantum protocols in these two cases are optimal up to a polylogarithmic factor.

1.1.2 Coordinator Model.

In the second model, we consider a more general setting. Now, we suppose there are \(r\) parties \(P_0,\ldots ,P_{r-1}\). The party \(P_i\) holds a matrix \(A_i \in \mathbb {R}^{d_i\times n}\) and a vector \({\bf b}_i \in \mathbb {R}^{d_i}\). Their goal is to solve the linear regression problem (1) \(\begin{equation} \text{argmin}_{\bf x}\Vert A{\bf x}-{\bf b}\Vert , \quad \text{ where } A = \begin{pmatrix} A_0 \\ \vdots \\ A_{r-1} \end{pmatrix}, ~ {\bf b}= \begin{pmatrix} {\bf b}_0 \\ \vdots \\ {\bf b}_{r-1} \end{pmatrix} . \end{equation}\) In this case, there is a referee and every party can only communicate with the referee. Their goal is similar, i.e., up to certain errors, outputting the quantum state of the optimal solution quantumly or sampling from the optimal solution classically by one party or by the referee. In the classical case, Vempala, Wang, and Woodruff studied the problem (1) with the goal of outputting the whole vector of the optimal solution [35]. A near-optimal protocol of complexity \(O(r n^2)\) was given. The lower bound is \(\Omega (r n+n^2)\). Here, our goal is different from theirs.

In the quantum case, if we consider the problem in the simultaneous message passing (SMP) model (in which the referee is not allowed to send information to the parties), then we show that \(\Omega (n)\) qubits of communication are required for any quantum protocols to solve Equation (1). Because of this and also inspired by the classical coordinator model [35], we assume that the communication is 2-way between each party and the referee (also known as the coordinator, classically). We call this the quantum coordinator model. Our main result is summarized in Table 2.

Table 2. Comparison of Quantum and Classical Communication Complexity for Solving (1) in the Coordinator Model

Bound	Quantum	Classical
Upper	\(\widetilde{O}(r^{1.5}\kappa /\gamma)\)	\(O(rn^2)\)
Lower	\(\Omega (r\kappa)\)	\(\Omega (rn)\)

Here, \(\kappa\) is the condition number of \(A\) defined in (1) and \(\gamma =\Vert A{\bf x}_{\rm opt}\Vert /\Vert {\bf b}\Vert\). The results are presented rigorously in Theorems 15 and 19.

From Table 2, we have

—	The quantum protocol has an optimal dependence on the condition number. Also, when \(r=O({\rm polylog}(mn))\), quantum protocols use exponentially less communication than classical protocols for well-conditioned linear regression problems.
—	Similar to the result of Reference [35], our result shows that linear regression is hard for classical protocols even for the weaker task of sampling from the solution.

1.2 Summary of Techniques

In the Alice-Bob model, the quantum protocols are straightforward. For example, if the communication is 1-way from Bob to Alice, then Bob just sends the quantum state of \({\bf b}\) to Alice, who applies \(A^{+}\) to this quantum state and performs some measurements. The interesting part is the lower bound analysis, which is based on the hardness of the disjointness problem and the index problem [7, 21, 22, 30]. In the disjointness problem, Alice and Bob, respectively, have a subset \(x\) and \(y\) of \([n]\), and their goal is to determine if \(x\cap y \ne \emptyset\). This problem can be reduced to a linear regression problem by constructing a diagonal matrix \(D\) from \(x\) and a vector \({\bf b}\) from \(y\) as follows: We set \(D_i = 1\) if \(i\in x\) and \(D_i=1/\varepsilon\) if \(i\notin x\) for some small \(\varepsilon\). Similarly, we define \(b_i = 1\) if \(i\in y\) and \(b_i=\varepsilon\) if \(i\notin y\). It is not hard to see that the indices in \(x\cap y\) have large amplitudes in the quantum state \(| D^{-1}{\bf b} \rangle\). The index problem is used in a similar way. This is indeed the main idea of our quantum/classical lower bound analysis. In different settings, we construct appropriate diagonal matrices and vectors.
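The reduction from disjointness can be checked numerically. The following is a minimal sketch, assuming 0-based indices and illustrative function names that do not come from the paper; it builds the diagonal of \(D\) and the vector \({\bf b}\) as described above and measures how much probability the distribution defined by \(| D^{-1}{\bf b} \rangle\) places on \(x\cap y\).

```python
import numpy as np

def regression_instance(x, y, n, eps):
    """Diagonal of D and vector b built from subsets x, y of {0, ..., n-1}."""
    idx = np.arange(n)
    in_x = np.isin(idx, list(x))
    in_y = np.isin(idx, list(y))
    d = np.where(in_x, 1.0, 1.0 / eps)   # D_i = 1 if i in x, else 1/eps
    b = np.where(in_y, 1.0, eps)         # b_i = 1 if i in y, else eps
    return d, b

def solution_distribution(d, b):
    """Measurement distribution of the normalized state |D^{-1} b>."""
    v = b / d
    p = v ** 2
    return p / p.sum()

def mass_on_intersection(x, y, n, eps):
    """Total probability that a sample from |D^{-1} b> lands in x ∩ y."""
    d, b = regression_instance(x, y, n, eps)
    p = solution_distribution(d, b)
    common = sorted(set(x) & set(y))
    return float(p[common].sum()) if common else 0.0

if __name__ == "__main__":
    # Intersecting sets: almost all probability sits on the common indices.
    print(mass_on_intersection({0, 2, 5}, {2, 3, 5, 7}, n=8, eps=0.01))
    # Disjoint sets: there is no common index to land on.
    print(mass_on_intersection({0, 1}, {2, 3}, n=8, eps=0.01))
```

Entries of \(D^{-1}{\bf b}\) equal 1 on \(x\cap y\), \(\varepsilon\) on indices in exactly one of the sets, and \(\varepsilon ^2\) elsewhere, so a few samples are enough to reveal a common index when one exists.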

The quantum protocol for solving Equation (1) is based on quantum singular value transformation (QSVT) [12], which is a useful technique for designing quantum algorithms (e.g., see the survey paper [24]). In the quantum coordinator model, we show that QSVT is still applicable and efficient (see Proposition 14). Unlike time and query complexity, the communication complexity of implementing QSVT can be estimated precisely. For example, if we apply QSVT to solve the linear regression problem (1), then the time complexity is \(\widetilde{O}((T_A+T_b)\alpha /\gamma \sigma _{\min })\) [8, Corollary 31], where \(T_A\) is the complexity of constructing the block-encoding of \(A\) so \(A/\alpha\) is the top-left corner of a unitary, \(T_b\) is the complexity of preparing the quantum state of \({\bf b}\), and \(\sigma _{\min }\) is the minimal singular value of \(A\). Regarding the communication complexity, we can show that \(T_A=T_b=O(r\log n)\) and \(\alpha = O(\sqrt {r}\Vert A\Vert)\). Here, \(T_A, T_b\) should be understood as communication complexity. We can also show that the above formula for time complexity is still true so the communication complexity is \(\widetilde{O}(r^{1.5}\kappa /\gamma)\). This is exactly the result we stated in Table 2. Since QSVT is a powerful technique in designing quantum algorithms, we believe that for many other linear algebra problems, quantum computers still have provable speedups in terms of communication complexity.

Indeed, as an application, we show that quantum computers achieve provable speedups for Hamiltonian simulation. In the Hamiltonian simulation problem, we suppose that the party \(P_i\) holds a Hamiltonian \(H_i\) of dimension \(n\), the referee holds a quantum state \(| \psi \rangle\), and their goal is to prepare the state \(e^{i(H_0+\cdots +H_{r-1})t} | \psi \rangle\) quantumly or sample from it classically. In Propositions 21 and 22, we show that the quantum communication complexity of this Hamiltonian simulation problem is \(\widetilde{O}(r|t|\sum _{i=0}^{r-1} \Vert H_i\Vert)\), whereas \(\Omega (n)\) bits of communication are required for any classical protocol to sample from \(e^{i(H_{0}+\cdots +H_{r-1})t} | \psi \rangle\).

1.3 Related Work

The problem studied in this article is partially inspired by Reference [35], in which Vempala, Wang, and Woodruff studied the classical communication complexity of solving linear regression (and many other optimization problems) and showed that the naive protocol (of sending the whole information to others) is close to optimal. Their goal is to output a vector solution, while our goal is to sample from the solution. Recently, Tang et al. [34] studied quantum communication complexity of solving linear regression problems. Their focus was on the Alice-Bob model and their goal was to output an approximate vector solution. The complexity they proved is \(O(n\kappa /\varepsilon)\), where \(n\) is the dimension, \(\kappa\) is the condition number, and \(\varepsilon\) is the precision. Regarding quantum communication complexity for sampling problems, Ambainis et al. [3] exhibited an exponential gap between the quantum and classical communication required for a sampling problem related to disjointness. In Reference [25], Montanaro showed an exponential gap between the quantum and classical communication for a distributed variant of the Fourier sampling problem. Another linear algebra problem that shows quantum computers are exponentially better than classical computers is the vector-in-subspace problem studied by Raz in Reference [28]. This problem is closely related to the two-party linear regression problem we study in the case where communication is from Bob to Alice and where the matrix \(A\) is unitary. It was proved in Reference [19] that the one-way quantum protocol is exponentially better than any classical protocol, even if the latter is allowed bounded error and two-way communication. When restricted to finite fields, Sun and Wang [32] studied quantum/classical communication complexity of matrix singularity and determinant computation problems. Their results suggest that there is no exponential quantum speedup for those problems in terms of dimension.

2 PRELIMINARIES

2.1 Communication Complexity

Communication complexity has been studied extensively in the field of classical and quantum computing [6, 21, 36, 38, 39]. It usually deals with the following type of problem: Suppose there are two separated parties: Alice and Bob. Alice receives some input \(x\in X\) and Bob receives some input \(y \in Y\). Their goal is to compute \(f(x,y)\) together using as little communication as possible. We usually assume that Alice and Bob have unlimited computational power so they can perform any computation as efficiently as they want. The measure of complexity used is the amount of communication required to solve the problem. All other non-communication operations are treated as free. A protocol is an algorithm where first Alice does some individual computation and then sends a quantum/classical message to Bob, then Bob does some individual computation and sends a quantum/classical message to Alice, and so on. In the end, one of the parties outputs some value that should be \(f(x,y)\). A quantum message usually refers to a quantum state. The cost of sending a quantum state is described by the number of qubits the quantum state occupies. In contrast, the cost of sending a classical message is the number of bits the message uses.

The cost of a protocol is the total number of bits/qubits communicated on the worst-case input. We are more concerned about the minimal amount of communication they need. A deterministic protocol for \(f\) always has to output the right value \(f(x,y)\) for all \((x,y)\). In a bounded-error protocol, the protocol has to output the right value \(f(x,y)\) with probability at least \(2/3\) for all \((x,y)\). In the randomized model, Alice and Bob share an unlimited supply of uniformly random bits, which they can use in deciding what messages to send. Also, an error is allowed in randomized protocols, which means the output of the protocol is correct with probability at least 2/3. In this work, in the classical case, we will focus on randomized communication complexity. In the quantum case, we focus on bounded error communication complexity.

If only one party (say, Alice) can send information to the other party, then this is known as the 1-way communication model. Otherwise, it is a 2-way communication model. More generally, there can be multiple parties. In this case, there are many ways to define the models. For example, in the simultaneous message passing (SMP) model, there is a referee and each party can only send messages to the referee. If the communication between each party and the referee is 2-way, then the model is known classically as the coordinator model [35].

Regarding communication complexity, two fundamental problems are the index problem and the disjointness problem. These problems are well-studied classically and quantumly. They also play significant roles in this article for the lower bounds estimation. In the index problem, Alice has a bit string \((x_1,\ldots ,x_n)\in \lbrace 0,1\rbrace ^n\) and Bob has an index \(j\in [n]\). The goal is to determine \(x_j\). This problem is trivial if Bob can send information to Alice. Namely, Bob just sends the index \(j\) to Alice, and Alice outputs \(x_j\). The index problem is hard if the communication is 1-way from Alice to Bob. In the disjointness problem, Alice has a bit string \((x_1,\ldots ,x_n)\in \lbrace 0,1\rbrace ^n\) and Bob has another bit string \((y_1,\ldots ,y_n)\in \lbrace 0,1\rbrace ^n\). The goal is to determine if there is an index \(j\) such that \(x_j=y_j=1\). We summarize the known results for these two problems into the following proposition:

Proposition 1.

We have the following.

(1)	The classical 1-way communication complexity of the index problem is \(\Theta (n)\) [22].
(2)	The quantum 1-way communication complexity of the index problem is \(\Theta (n)\) [7].
(3)	The classical 2-way communication complexity of the disjointness problem is \(\Theta (n)\) [18, 29].
(4)	The quantum 2-way communication complexity of the disjointness problem is \(\Theta (\sqrt {n})\) [1, 30].

Another known result we will use concerns a problem called Distributed Fourier Sampling [25]. In this problem, Alice is given a function \(f:\lbrace 0,1\rbrace ^n \rightarrow \lbrace \pm 1\rbrace\), Bob is given a function \(g:\lbrace 0,1\rbrace ^n \rightarrow \lbrace \pm 1\rbrace\), and their task is for one party (say, Bob) to approximately sample from the distribution \(P_{fg}\) on \(n\)-bit strings \(s\), where \(\begin{equation*} P_{fg}(s) = \left(\frac{1}{2^n} \sum _{x\in \lbrace 0,1\rbrace ^n} (-1)^{s\cdot x} f(x) g(x) \right)^2, \end{equation*}\) and \(s\cdot x = \sum _i s_ix_i\). That is, Bob must output a sample from any distribution \(\widetilde{P}_{fg}\) such that \(\Vert \widetilde{P}_{fg} - P_{fg}\Vert _1 \le \varepsilon\) for some constant inaccuracy \(\varepsilon\).
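For small \(n\), the distribution \(P_{fg}\) can be computed by brute force. The sketch below is only an illustration of the definition: the function names and the encoding of bit strings as integers are assumptions made here, not notation from Reference [25].

```python
import numpy as np

def parity_sign(s, x):
    """(-1)^{s.x}, where s and x are bit strings encoded as integers."""
    return -1 if bin(s & x).count("1") % 2 else 1

def fourier_sampling_distribution(f, g, n):
    """Return P_fg as an array indexed by the integer encoding of s.

    f and g map an integer in [0, 2^n) to +1 or -1.
    """
    size = 2 ** n
    h = np.array([f(x) * g(x) for x in range(size)], dtype=float)
    dist = np.empty(size)
    for s in range(size):
        signs = np.array([parity_sign(s, x) for x in range(size)], dtype=float)
        dist[s] = (signs @ h / size) ** 2
    return dist

if __name__ == "__main__":
    n = 3
    rng = np.random.default_rng(1)
    f_table = rng.choice([-1, 1], size=2 ** n)
    g_table = rng.choice([-1, 1], size=2 ** n)
    dist = fourier_sampling_distribution(lambda x: f_table[x], lambda x: g_table[x], n)
    # By Parseval's identity the values form a probability distribution.
    print(dist, dist.sum())
```

If \(f(x)=(-1)^{a\cdot x}\) and \(g\equiv 1\), the distribution is concentrated entirely on \(s=a\), which gives a quick sanity check of the definition.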

Proposition 2 (Theorem 1 of [25]).

There exist universal constants \(\varepsilon , \gamma \gt 0\) such that, for sufficiently large \(n\), any 2-way classical communication protocol for Distributed Fourier Sampling with shared randomness and inaccuracy \(\varepsilon\) must communicate at least \(\gamma 2^n\) bits.

The last result that will be used in this work concerns the communication complexity of the multi-player set-disjointness problem [26]. In this problem, player \(P_j\) holds a subset \(T_j \subseteq [n]\), where \(j\in \lbrace 1,\ldots ,k\rbrace\), and their goal is to determine if there is a \(j\in \lbrace 2,\ldots ,k\rbrace\) such that \(T_1\cap T_j \ne \emptyset\). This problem was studied in the coordinator model.

Proposition 3 (Theorem 3.1 of [26]).

Assume that \(n\ge 3{,}200 k\), then the communication complexity of the multi-player set-disjointness problem is lower bounded by \(\Omega (kn)\) in the coordinator model.

2.2 Block-encoding

Our quantum protocols in the multiparty model are based on QSVT [12]. A key ingredient of using QSVT is block-encoding. For convenience, in this section, we list some results about block-encoding that will be used in our quantum protocols.

Definition 4 (Block-encoding, cf. Definition 24 of [12]).

Suppose that \(A\) is an \(s\)-qubit operator, \(\alpha \ge \Vert A\Vert , \varepsilon \in \mathbb {R}^{\gt 0}\), and \(q\in \mathbb {N}\), then we say that the \((s+q)\)-qubit unitary \(U_A\) is an \((\alpha , q, \varepsilon)\) block-encoding of \(A\), if (2) \(\begin{equation} \Vert A - \alpha (\langle 0|^{\otimes q}\otimes I) U_A (| 0 \rangle ^{\otimes q}\otimes I) \Vert \le \varepsilon , \end{equation}\) where \(\Vert \cdot \Vert\) is the operator norm. In matrix form (3) \(\begin{equation} U_A = \begin{pmatrix}A/\alpha & \cdot \\ \cdot & \cdot \\ \end{pmatrix} \end{equation}\) up to an error \(\varepsilon\).

In quantum computing, we hope \(\alpha\) is as small as possible. It is obvious that the optimal choice is \(\alpha = \Vert A\Vert\). In the model of communication complexity, we assume that each party has unlimited computational power, so each party can first compute the singular value decomposition (SVD) of \(A\) and use it to construct the block-encoding with \(\alpha = \Vert A\Vert\). We can even assume that \(\varepsilon =0\). More precisely, if \(A=UDV^T\) is the SVD of \(A\), then (4) \(\begin{equation} \begin{pmatrix} U & \\ & I \end{pmatrix} \begin{pmatrix} D/\Vert A\Vert & \sqrt {I - D^2/\Vert A\Vert ^2} \\ \sqrt {I - D^2/\Vert A\Vert ^2} & -D/\Vert A\Vert \end{pmatrix} \begin{pmatrix} V^T & \\ & I \end{pmatrix} \end{equation}\) is an \((\Vert A\Vert , 1, 0)\) block-encoding of \(A\). In the above, we implicitly assumed that \(A\) is square, otherwise, we can add some zero rows or columns. In this article, we will always use this block-encoding.
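The construction in Equation (4) can be verified numerically. The following is a minimal sketch, assuming a real, square, nonzero matrix \(A\) and using NumPy's SVD; the function name is illustrative.

```python
import numpy as np

def svd_block_encoding(A):
    """Return the unitary of Eq. (4) for a real square nonzero A, and alpha = ||A||."""
    U, s, Vt = np.linalg.svd(A)          # A = U @ diag(s) @ Vt
    alpha = s[0]                         # operator norm = largest singular value
    n = A.shape[0]
    D = np.diag(s / alpha)
    # Clip guards against tiny negative values caused by rounding.
    S = np.diag(np.sqrt(np.clip(1.0 - (s / alpha) ** 2, 0.0, None)))
    I, Z = np.eye(n), np.zeros((n, n))
    left = np.block([[U, Z], [Z, I]])
    middle = np.block([[D, S], [S, -D]])
    right = np.block([[Vt, Z], [Z, I]])
    return left @ middle @ right, alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    W, alpha = svd_block_encoding(A)
    print(np.allclose(W @ W.T, np.eye(8)))      # W is unitary (real orthogonal)
    print(np.allclose(alpha * W[:4, :4], A))    # top-left block equals A / ||A||
```

The middle factor is unitary because \(D\) and \(\sqrt {I-D^2/\Vert A\Vert ^2}\) are diagonal and therefore commute, so the matrix squares to the identity.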

The following result is a direct application of the technique of linear combination of unitaries [4]:

Lemma 5.

For each \(i\in \lbrace 0,1,\ldots ,r-1\rbrace\), let \(U_i\) be an \((\alpha _i, q,0)\) block-encoding of \(A_i \in \mathbb {R}^{d_i\times n}\), where \(\alpha _i\gt 0\). Let \(V\) be a unitary such that (5) \(\begin{equation} V | 0 \rangle = \frac{1}{\alpha } \sum _{i=0}^{r-1} \alpha _i | i \rangle , \end{equation}\) where \(\alpha = \sqrt {\sum _i\alpha _i^2}\). Then, (6) \(\begin{equation} ({\rm SWAP}_{1,2}\otimes I_n)\left(\sum _{i=0}^{r-1} | i \rangle \langle i| \otimes U_i\right) (V\otimes I_{2^q} \otimes I_n) \end{equation}\) is an \((\alpha , q+\log r,0)\) block-encoding of \(\begin{equation*} A := \begin{pmatrix}A_0 \\ \vdots \\ A_{r-1} \end{pmatrix}, \end{equation*}\) where \({\rm SWAP}_{1,2}\) swaps the first two registers containing \(\log r\) and \(q\) qubits, respectively.

Proof.

Denote the unitary (6) as \(W\). We can check that for any state \(| \psi \rangle\) \(\begin{eqnarray*} W | 0 \rangle ^{\otimes \log r} | 0 \rangle ^{\otimes q} | \psi \rangle &=& \frac{1}{\alpha } \sum _{i=0}^{r-1} | 0 \rangle ^{\otimes q} \otimes | i \rangle \otimes A_i | \psi \rangle + \text{orthogonal terms} \\ &=& \frac{1}{\alpha } | 0 \rangle ^{\otimes q} \otimes A | \psi \rangle + \text{orthogonal terms}. \end{eqnarray*}\) This means that \(W\) is a block-encoding of \(A\). □
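As a numerical illustration of Lemma 5, the sketch below stacks several real square blocks of equal size, using the single-ancilla block-encoding of Equation (4) for each \(U_i\) (so \(q=1\)). The choice of a Householder reflection for \(V\), the equal block sizes, and all function names are assumptions made for this example; the number of blocks does not need to be a power of two for the matrix identity to hold.

```python
import numpy as np

def svd_block_encoding(A):
    """(||A||, 1, 0) block-encoding of a real square nonzero A, as in Eq. (4)."""
    U, s, Vt = np.linalg.svd(A)
    alpha, n = s[0], A.shape[0]
    D = np.diag(s / alpha)
    S = np.diag(np.sqrt(np.clip(1.0 - (s / alpha) ** 2, 0.0, None)))
    I, Z = np.eye(n), np.zeros((n, n))
    return (np.block([[U, Z], [Z, I]]) @ np.block([[D, S], [S, -D]])
            @ np.block([[Vt, Z], [Z, I]])), alpha

def unitary_with_first_column(v):
    """Householder reflection H with H e_0 = v for a real unit vector v."""
    v = np.asarray(v, dtype=float)
    w = v.copy()
    w[0] -= 1.0
    if np.linalg.norm(w) < 1e-14:
        return np.eye(len(v))
    return np.eye(len(v)) - 2.0 * np.outer(w, w) / (w @ w)

def select(unitaries):
    """Block-diagonal matrix sum_i |i><i| (x) U_i."""
    r, d = len(unitaries), unitaries[0].shape[0]
    out = np.zeros((r * d, r * d))
    for i, U in enumerate(unitaries):
        out[i * d:(i + 1) * d, i * d:(i + 1) * d] = U
    return out

def swap_first_two(r, a, n):
    """Permutation sending |i>|anc>|k> to |anc>|i>|k>."""
    P = np.zeros((r * a * n, r * a * n))
    for i in range(r):
        for anc in range(a):
            for k in range(n):
                P[(anc * r + i) * n + k, (i * a + anc) * n + k] = 1.0
    return P

def stacked_block_encoding(blocks):
    """Unitary W of Eq. (6) and alpha = sqrt(sum_i alpha_i^2)."""
    encodings = [svd_block_encoding(A) for A in blocks]
    alphas = np.array([al for _, al in encodings])
    alpha = np.sqrt(np.sum(alphas ** 2))
    r, n = len(blocks), blocks[0].shape[0]
    V = unitary_with_first_column(alphas / alpha)
    W = (swap_first_two(r, 2, n) @ select([U for U, _ in encodings])
         @ np.kron(V, np.eye(2 * n)))
    return W, alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = [rng.standard_normal((2, 2)) for _ in range(3)]
    W, alpha = stacked_block_encoding(blocks)
    # Rows with ancilla 0, columns with party index 0 and ancilla 0.
    print(np.allclose(alpha * W[:6, :2], np.vstack(blocks)))
```

Following the proof, the stacked matrix appears in the rows where the ancilla is \(| 0 \rangle\) and in the columns where both the index register and the ancilla start in \(| 0 \rangle\).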

Similarly, we have the following result:

Lemma 6.

For each \(i\in \lbrace 0,1,\ldots ,r-1\rbrace\), let \(U_i\) be an \((\alpha _i, q,0)\) block-encoding of \(A_i \in \mathbb {R}^{m\times n}\), where \(\alpha _i\gt 0\). Let \(V\) be a unitary such that (7) \(\begin{equation} V | 0 \rangle = \frac{1}{\sqrt {\alpha }} \sum _{i=0}^{r-1} \sqrt {\alpha _i} | i \rangle , \end{equation}\) where \(\alpha = \sum _i\alpha _i\). Then, (8) \(\begin{equation} (V^\dagger \otimes I_{2^q} \otimes I_n) \left(\sum _{i=0}^{r-1} | i \rangle \langle i| \otimes U_i\right) (V\otimes I_{2^q} \otimes I_n) \end{equation}\) is an \((\alpha , q+\log r,0)\) block-encoding of \(A_0+\cdots +A_{r-1}\).
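Lemma 6 can be checked in the same way. The sketch below is an illustration under the assumptions that all \(A_i\) are real, square, and nonzero, that each \(U_i\) is the single-ancilla block-encoding of Equation (4), and that \(V\) is realized by a Householder reflection; the helper names are hypothetical.

```python
import numpy as np

def svd_block_encoding(A):
    """(||A||, 1, 0) block-encoding of a real square nonzero A, as in Eq. (4)."""
    U, s, Vt = np.linalg.svd(A)
    alpha, n = s[0], A.shape[0]
    D = np.diag(s / alpha)
    S = np.diag(np.sqrt(np.clip(1.0 - (s / alpha) ** 2, 0.0, None)))
    I, Z = np.eye(n), np.zeros((n, n))
    return (np.block([[U, Z], [Z, I]]) @ np.block([[D, S], [S, -D]])
            @ np.block([[Vt, Z], [Z, I]])), alpha

def unitary_with_first_column(v):
    """Householder reflection H with H e_0 = v for a real unit vector v."""
    v = np.asarray(v, dtype=float)
    w = v.copy()
    w[0] -= 1.0
    if np.linalg.norm(w) < 1e-14:
        return np.eye(len(v))
    return np.eye(len(v)) - 2.0 * np.outer(w, w) / (w @ w)

def sum_block_encoding(mats):
    """Unitary of Eq. (8) and alpha = sum_i alpha_i."""
    encodings = [svd_block_encoding(A) for A in mats]
    alphas = np.array([al for _, al in encodings])
    alpha = alphas.sum()
    r, d = len(mats), 2 * mats[0].shape[0]
    V = unitary_with_first_column(np.sqrt(alphas / alpha))
    sel = np.zeros((r * d, r * d))       # sum_i |i><i| (x) U_i
    for i, (U, _) in enumerate(encodings):
        sel[i * d:(i + 1) * d, i * d:(i + 1) * d] = U
    big_V = np.kron(V, np.eye(d))
    return big_V.T @ sel @ big_V, alpha  # V is real, so V^dagger = V^T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mats = [rng.standard_normal((3, 3)) for _ in range(4)]
    W, alpha = sum_block_encoding(mats)
    print(np.allclose(alpha * W[:3, :3], sum(mats)))  # top-left block is the sum
```

The normalization \(\alpha =\sum _i\alpha _i\) here is larger than the \(\sqrt {\sum _i\alpha _i^2}\) of Lemma 5 because the sum is extracted by projecting the index register back onto \(V| 0 \rangle\).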

Notation. For any matrix \(A\), with \(A^{+}\), we always mean the pseudoinverse of \(A\). The operator norm of \(A\) is denoted as \(\Vert A\Vert\), which is the largest singular value. The Frobenius norm is denoted as \(\Vert A\Vert _F\). It is defined as the square root of the sum of the absolute squares of the elements of \(A\). The condition number \(\kappa\) of matrix \(A\) is defined as the ratio of the largest singular value and the smallest nonzero singular value.

3 TWO PARTIES

In this section, we focus on the Alice-Bob model. In this model, Alice receives a matrix \(A \in \mathbb {R}^{m\times n}\) and Bob receives a vector \({\bf b}\in \mathbb {R}^{m}\), and their goal is to solve the linear regression problem (9) \(\begin{equation} \mathop{\text{argmin}}\limits_{{\bf x}\in \mathbb {R}^n} \quad \Vert A{\bf x}-{\bf b}\Vert ^2 \end{equation}\) through 1-way or 2-way communication.² In the quantum case, their goal is to output the quantum state of the optimal solution. In the classical case, their goal is to sample from the optimal solution.³ This sampling task is partially inspired by the quantum-inspired classical algorithms [33]. Also, sampling is a natural application of measuring the quantum state of the solution. But outputting a quantum state could be a much harder task than sampling, because we can perform many other operations on a quantum state.

If the communication is 2-way, then either Alice or Bob can output the result (i.e., the quantum state or a sample). If the communication is 1-way, then only Alice or Bob can output the result depending on the direction of the communication.

Since our main goal is to demonstrate the quantum advantage in terms of dimension, we assume that entries of \(A\) and \({\bf b}\) can be specified using \(O(\log (mn))\) bits for simplicity. In this work, many of the quantum protocols require the communication of the operator norm \(\Vert A\Vert\) and the 2-norm \(\Vert {\bf b}\Vert\). When entries are specified by \(O(\log (mn))\) bits, these two norms are also specified by \(O(\log (mn))\) bits. This is enough for us, since we indeed only need a good upper bound of these quantities in our quantum protocols. For the sampling task, the norm of \({\bf b}\) is unimportant, so in this article, we do not assume that \({\bf b}\) is a unit vector even if sometimes we use the notation \(| {\bf b} \rangle\).

The optimal solution of Equation (9) is \({\bf x}_{\rm opt}=A^+{\bf b}\). Since Alice can compute \(A^+\) in advance, the linear regression problem is indeed equivalent to the matrix-vector multiplication problem. We define (10) \(\begin{equation} \gamma := \frac{\Vert A {\bf x}_{\rm opt}\Vert }{\Vert {\bf b}\Vert } , \end{equation}\) which describes the fraction of the norm of \({\bf b}\) that lies in the column space of \(A\). It has an interesting geometric explanation. Namely, it is the cosine of the angle between \({\bf b}\) and the column space of \(A\). If \(\gamma =1\), then the linear system \(A{\bf x}={\bf b}\) is consistent, and \({\bf x}_{\rm opt}\) is the solution with the minimum norm. If \(\gamma = 0\), then \({\bf x}_{\rm opt} = 0\).
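The quantities \({\bf x}_{\rm opt}\) and \(\gamma\) are easy to compute classically for small instances. The following sketch uses NumPy's pseudoinverse; the function name and the example data are chosen here for illustration only.

```python
import numpy as np

def regression_overlap(A, b):
    """Return x_opt = A^+ b and gamma = ||A x_opt|| / ||b||."""
    x_opt = np.linalg.pinv(A) @ b
    gamma = np.linalg.norm(A @ x_opt) / np.linalg.norm(b)
    return x_opt, gamma

if __name__ == "__main__":
    # The column space of A is the plane spanned by the first two coordinates.
    A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
    b = np.array([3.0, 0.0, 4.0])
    x_opt, gamma = regression_overlap(A, b)
    projection = A @ x_opt
    cosine = (b @ projection) / (np.linalg.norm(b) * np.linalg.norm(projection))
    print(x_opt, gamma, cosine)  # gamma agrees with the cosine of the angle
```

In this example the projection of \({\bf b}\) onto the column space is \((3,0,0)\), so \(\gamma = 3/5\).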

3.1 The Quantum Protocols

In this section, we present the quantum protocols for solving linear regression problems in the 1-way and 2-way models.

Case 1 (1-way). Only Bob can send quantum information to Alice.

In this case, it is Alice that needs to output the quantum state of the solution. The quantum protocol is straightforward. Namely, Bob just sends the quantum state \(| {\bf b} \rangle\) to Alice, then Alice applies \(A^{+}\) to \(| {\bf b} \rangle\). To this end, Alice constructs a block-encoding of \(A^{+}\), i.e., constructs a unitary \(U_{A}\) using the SVD of \(A\) such that \(\begin{equation*} U_A = \begin{pmatrix}A^{+}/\Vert A^{+}\Vert & \cdot \\ \cdot & \cdot \end{pmatrix}. \end{equation*}\) Then, she applies \(U_{A}\) to \(| 0 \rangle | {\bf b} \rangle\) and obtains (11) \(\begin{equation} \frac{1}{\Vert A^{+}\Vert } | 0 \rangle \otimes A^{+}| {\bf b} \rangle + | 0 \rangle ^\bot =\frac{\Vert A^{+}| {\bf b} \rangle \Vert }{\Vert A^{+}\Vert } | 0 \rangle \otimes | {\bf x}_{\rm opt} \rangle + | 0 \rangle ^\bot , \end{equation}\) where \(| 0 \rangle ^\bot\) refers to some orthogonal terms. Now, Alice can measure the first register. If she receives \(| 0 \rangle\), then the post-selected state is \(| {\bf x}_{\rm opt} \rangle\). The success probability of Alice seeing \(| 0 \rangle\) in the first register is \(\begin{equation*} \frac{\Vert A^{+} | {\bf b} \rangle \Vert ^2}{\Vert A^{+}\Vert ^2}. \end{equation*}\) Since the communication is 1-way, Alice cannot use the quantum amplitude amplification technique. Thus, for Alice to obtain a copy of the state \(| {\bf x}_{\rm opt} \rangle\), they need to repeat the above procedure \(O(\Vert A^{+}\Vert ^2/\Vert A^{+} | {\bf b} \rangle \Vert ^2)\) times, i.e., Bob sends \(O(\Vert A^{+}\Vert ^2/\Vert A^{+} | {\bf b} \rangle \Vert ^2)\) copies of the state \(| {\bf b} \rangle\) to Alice. Therefore, the quantum communication complexity is \(O((\log m)\Vert A^{+}\Vert ^2/\Vert A^{+} | {\bf b} \rangle \Vert ^2)\). Usually, Bob does not know \(\Vert A^{+}\Vert ^2/\Vert A^{+} | {\bf b} \rangle \Vert ^2\) exactly, which depends on \(A\) and \({\bf b}\). 
Here, we assume that Bob knows a good upper bound on it, so he knows how many copies need to be sent to Alice.

To obtain a clear intuition about the complexity, we can bound it in terms of \(\kappa\) (the condition number of \(A\)) and \(\gamma\) (defined in Equation (10)). Suppose the SVD of \(A\) is \(A = \sum _{i=1}^r \sigma _i | u_i \rangle \langle v_i|\) and the normalized state is \(| {\bf b} \rangle = \sum _{i=1}^m \beta _i | u_i \rangle\), where \(r={\rm Rank}(A)\) and \(\sigma _1\ge \cdots \ge \sigma _r\gt 0\). Then, \(\Vert A^+\Vert = 1/\sigma _r\) and (12) \(\begin{equation} \Vert A^{+} | {\bf b} \rangle \Vert ^2 = \sum _{i=1}^r \frac{|\beta _i|^2}{\sigma _i^2} \ge \frac{1}{\sigma _1^2} \sum _{i=1}^r |\beta _i|^2 = \frac{1}{\sigma _1^2} \frac{\Vert A {\bf x}_{\rm opt}\Vert ^2}{\Vert {\bf b}\Vert ^2} = \frac{\gamma ^2}{\sigma _1^2}. \end{equation}\) Hence, \(\Vert A^{+}\Vert ^2/\Vert A^{+} | {\bf b} \rangle \Vert ^2 \le \sigma _1^2/(\sigma _r^2\gamma ^2) = \kappa ^2/\gamma ^2\), so the communication complexity is bounded by \(O((\log m)\kappa ^2/\gamma ^2)\).
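The success probability of Case 1 and the bound in terms of \(\kappa\) and \(\gamma\) can be compared numerically. This is a minimal sketch with an illustrative function name; the relative threshold used to decide which singular values count as nonzero is an assumption of the example.

```python
import numpy as np

def case1_statistics(A, b, tol=1e-12):
    """Return (success probability, gamma, kappa) for Bob -> Alice, Case 1."""
    b_hat = b / np.linalg.norm(b)               # amplitudes of |b>
    A_pinv = np.linalg.pinv(A)
    s = np.linalg.svd(A, compute_uv=False)
    s = s[s > tol * s[0]]                       # nonzero singular values
    kappa = s[0] / s[-1]
    gamma = np.linalg.norm(A @ (A_pinv @ b_hat))
    # ||A^+|| = 1 / sigma_r, so p = ||A^+ |b>||^2 * sigma_r^2.
    p_success = np.linalg.norm(A_pinv @ b_hat) ** 2 * s[-1] ** 2
    return p_success, gamma, kappa

if __name__ == "__main__":
    A = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
    b = np.array([1.0, 1.0, 1.0])
    p, gamma, kappa = case1_statistics(A, b)
    print(p, (gamma / kappa) ** 2)   # p is at least (gamma / kappa)^2
    print(1.0 / p)                   # expected number of copies of |b> needed
```

For this instance \(p=5/12\) while \((\gamma /\kappa)^2=1/6\), which shows that the bound \(\kappa ^2/\gamma ^2\) on the number of copies can be loose.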

It is possible that \(\Vert A^{+} | {\bf b} \rangle \Vert\) (or \(\gamma\)) is very small or even zero. This happens when \({\bf b}\) is far from the column space of \(A\). In this case, the probability of obtaining the solution state by measuring the first register of the state (11) is small. If Alice still does not observe \(| 0 \rangle\) after \(O(m\log (mn))\) measurements, then the success probability must be small. When this happens, Bob can just send the whole vector to Alice. This costs \(O(m\log (mn))\).

Case 2 (1-way). Only Alice can send quantum information to Bob.

In this case, Bob needs to output the quantum state of the solution. The quantum protocol reads as follows: Alice computes \(A^{+}\) and prepares the quantum state of \(A^{+}\): \(\begin{equation*} | A^{+} \rangle := \frac{1}{\Vert A^{+}\Vert _F} \sum _{i\in [n], j \in [m]} (A^{+})_{ij} | i \rangle | j \rangle = \frac{1}{\Vert A^{+}\Vert _F} \sum _{j \in [m]} A^{+}| j \rangle \otimes | j \rangle . \end{equation*}\) Then, she sends \(| A^{+} \rangle\) to Bob. Since Bob has the vector \({\bf b}\), he can construct a unitary \(U_b\) such that \(U_b| 0 \rangle = | \bar{{\bf b}} \rangle\), where \(\bar{{\bf b}}\) is the complex conjugate of \({\bf b}\). Now, he applies \(U_b^\dagger\) to the second register \(| j \rangle\) of \(| A^{+} \rangle\). The resulting state is \(\begin{equation*} \frac{1}{\Vert A^{+}\Vert _F} \sum _{j \in [m]} A^{+}| j \rangle \otimes U_b^\dagger | j \rangle = \frac{1}{\Vert A^{+}\Vert _F} A^{+}| {\bf b} \rangle \otimes | 0 \rangle + | 0 \rangle ^\bot = \frac{\Vert A^{+}| {\bf b} \rangle \Vert }{\Vert A^{+}\Vert _F} | {\bf x}_{\rm opt} \rangle \otimes | 0 \rangle + | 0 \rangle ^\bot . \end{equation*}\) Regarding the first equality, note that \(U_b| 0 \rangle = | \bar{{\bf b}} \rangle\), i.e., the first column of \(U_b\) is \(| \bar{{\bf b}} \rangle\), so we have \(U_b^\dagger | j \rangle = b_j | 0 \rangle + | 0 \rangle ^\bot\). This means \(\begin{equation*} \sum _{j \in [m]} A^{+}| j \rangle \otimes U_b^\dagger | j \rangle =\sum _{j \in [m]} b_j A^{+}| j \rangle \otimes | 0 \rangle + | 0 \rangle ^\bot =A^{+}| {\bf b} \rangle \otimes | 0 \rangle + | 0 \rangle ^\bot . \end{equation*}\) The success probability of obtaining \(| {\bf x}_{\rm opt} \rangle\) is \(\begin{equation*} \frac{\Vert A^{+}| {\bf b} \rangle \Vert ^2}{\Vert A^{+}\Vert _F^2}. \end{equation*}\) This means that Alice needs to send \(O(\Vert A^{+}\Vert _F^2/\Vert A^{+}| {\bf b} \rangle \Vert ^2)\) copies of the state \(| A^{+} \rangle\) to Bob. 
So, the total number of qubits in communication is \(O((\log (mn))\Vert A^{+}\Vert _F^2/\Vert A^{+}| {\bf b} \rangle \Vert ^2)\). Note that \(\begin{equation*} \Vert A^{+}\Vert _F^2=\sum _{i=1}^r \frac{1}{\sigma _i^2} \le \frac{\min (m,n)}{\sigma _r^2}. \end{equation*}\) Together with Equation (12), this shows that the communication complexity is bounded by \(O((\log (mn))\min (m,n)\kappa ^2/\gamma ^2).\)

Similar to the discussion in case 1, if \(\gamma\) is too small, then Alice can just send the whole matrix to Bob, which uses \(O(mn\log (mn))\) qubits in communication.
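The Case 2 protocol can be simulated classically on a small instance. The sketch below assumes NumPy and a real vector \({\bf b}\), so that \(\bar{{\bf b}}={\bf b}\). The helper `orthogonal_with_first_column` is a hypothetical way to realize \(U_b\), and the state \(| A^{+} \rangle\) is stored as an \(n\times m\) array of amplitudes. It checks that post-selecting the second register on \(| 0 \rangle\) leaves \(| {\bf x}_{\rm opt} \rangle\) with probability \(\Vert A^{+}| {\bf b} \rangle \Vert ^2/\Vert A^{+}\Vert _F^2\).

```python
import numpy as np

def orthogonal_with_first_column(v):
    """Real orthogonal matrix whose first column is the unit vector v."""
    M = np.eye(len(v))
    M[:, 0] = v
    Q, R = np.linalg.qr(M)
    if R[0, 0] < 0:              # QR fixes the first column only up to sign
        Q[:, 0] = -Q[:, 0]
    return Q

# Hypothetical random instance, used only for illustration.
rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
b_unit = b / np.linalg.norm(b)

A_pinv = np.linalg.pinv(A)                        # n x m
state = A_pinv / np.linalg.norm(A_pinv, "fro")    # entry (i, j) is the amplitude of |i>|j>

U_b = orthogonal_with_first_column(b_unit)        # U_b|0> = |b>
after = state @ U_b                               # U_b^dagger applied to the second register
post = after[:, 0]                                # component with second register |0>
p_success = np.linalg.norm(post) ** 2

x_opt = A_pinv @ b_unit
print(p_success, np.linalg.norm(x_opt) ** 2 / np.linalg.norm(A_pinv, "fro") ** 2)
```

The reciprocal of `p_success` is the number of copies of \(| A^{+} \rangle\) that Alice sends, up to constant factors.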

Case 3 (2-way). Alice and Bob can send quantum information to each other.

In this case, either one can output the solution state. They still use the protocol designed in the first case. Since it is 2-way quantum communication, they can use the quantum amplitude amplification technique. More precisely, denote the state (11) as \(| \psi \rangle =U_A | 0 \rangle | {\bf b} \rangle\). To apply the quantum amplitude amplification, the main obstacle for Alice and Bob is to perform the reflection \(\begin{equation*} 2| \psi \rangle \langle \psi | - I = U_A (2| 0 \rangle | {\bf b} \rangle \langle 0|\langle {\bf b}| - I) U_A^\dagger . \end{equation*}\) They can achieve this as follows: For any state, say, in Bob’s hand, if he wants to apply the reflection, then he can first send the state to Alice who applies \(U_A^\dagger\) to it. Then, Alice sends the new state back to Bob who can apply the reflection \(2| 0 \rangle | {\bf b} \rangle \langle 0|\langle {\bf b}| - I\). After that, he sends the state to Alice again and asks her to apply \(U_A\) to the state. Finally, Alice sends the resulting state to Bob.
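The effect of this reflection can be illustrated on a small instance. The sketch below assumes NumPy, a square real matrix \(A\), and a Halmos-type unitary dilation as one possible realization of \(U_A\); the helper names are hypothetical. Each `grover_step` applies a local sign flip on the \(| 0 \rangle\) component followed by \(U_A (2| 0 \rangle | {\bf b} \rangle \langle 0|\langle {\bf b}| - I) U_A^\dagger\), and the resulting success probability is compared with the standard amplitude amplification formula \(\sin ^2((2k+1)\theta)\).

```python
import numpy as np

def halmos_dilation(C):
    """Unitary [[C, sqrt(I - C C^T)], [sqrt(I - C^T C), -C^T]] for a real square contraction C."""
    W, s, Vt = np.linalg.svd(C)
    d = np.sqrt(np.clip(1.0 - s ** 2, 0.0, None))    # clip guards against rounding below zero
    top = np.hstack([C, (W * d) @ W.T])
    bottom = np.hstack([(Vt.T * d) @ Vt, -C.T])
    return np.vstack([top, bottom])

# Hypothetical random instance, used only for illustration.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
b_unit = b / np.linalg.norm(b)

A_pinv = np.linalg.pinv(A)
U_A = halmos_dilation(A_pinv / np.linalg.norm(A_pinv, 2))   # block-encoding of A^+ / ||A^+||

start = np.concatenate([b_unit, np.zeros(n)])      # |0>|b>
psi = U_A @ start                                  # state (11)
good = np.concatenate([np.ones(n), np.zeros(n)])   # mask: first register equals |0>

def grover_step(v):
    v = v * (1 - 2 * good)                 # local sign flip on the |0> component
    v = U_A.T @ v                          # Alice applies U_A^dagger
    v = 2 * start * (start @ v) - v        # Bob reflects about |0>|b>
    return U_A @ v                         # Alice applies U_A

theta = np.arcsin(np.linalg.norm(psi[:n]))         # initial success amplitude is sin(theta)
v, probs = psi.copy(), []
for k in range(1, 4):
    v = grover_step(v)
    probs.append(np.linalg.norm(v[:n]) ** 2)
print(probs, [np.sin((2 * k + 1) * theta) ** 2 for k in range(1, 4)])
```

In the two-party setting, each call to `grover_step` corresponds to the four transfers of the state between Alice and Bob described above.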

Therefore, the total number of copies Bob needs to send to Alice is \(O(\Vert A^{+}\Vert /\Vert A^{+} | {\bf b} \rangle \Vert)\). This means that the quantum communication complexity is \(O((\log m)\Vert A^{+}\Vert /\Vert A^{+} | {\bf b} \rangle \Vert)\), which is bounded from above by \(O((\log m)\kappa /\gamma)\). In summary, we have the following result:

Theorem 7.

Suppose Alice has a matrix \(A \in \mathbb {R}^{m\times n}\) and Bob has a vector \({\bf b}\in \mathbb {R}^{m}\). The quantum communication complexity of outputting \(| A^{+}{\bf b} \rangle\) is

(1)	\(O(\min \lbrace (\log m)\Vert A^{+}\Vert ^2 \Vert {\bf b}\Vert ^2/\Vert A^{+} {\bf b}\Vert ^2 , m\log (mn)\rbrace)\) if the communication is 1-way from Bob to Alice.
(2)	\(O(\min \lbrace (\log (mn))\Vert A^{+}\Vert _F^2\Vert {\bf b}\Vert ^2/\Vert A^{+} {\bf b}\Vert ^2,mn\log (mn)\rbrace)\) if the communication is 1-way from Alice to Bob.
(3)	\(O(\min \lbrace (\log m)\Vert A^{+}\Vert \Vert {\bf b}\Vert /\Vert A^{+} {\bf b}\Vert ,m\log (mn)\rbrace)\) if the communication is 2-way.

As a direct corollary, if \(A\) is well-conditioned and \({\bf b}\) lies in the column space of \(A\) (e.g., \(A\) is unitary), then the communication complexity in case 1 and case 3 is \(O(\log m)\), and the communication complexity in case 2 is \(O(\min (m,n)\log (mn))\), since \(\Vert A^{+}\Vert _F^2\Vert {\bf b}\Vert ^2/\Vert A^{+} {\bf b}\Vert ^2=O(\text{rank}(A))=O(\min (m,n))\). From our lower bounds analysis in the next section, these are indeed optimal.

3.2 Lower Bounds

In this section, we show that the quantum protocols given in the previous section are optimal up to a factor of \(\log (mn)\). We also prove lower bounds for classical protocols for the task of sampling from the optimal solution. To this end, we first compute the quantum/classical communication complexity of the permutation-index problem, defined as follows:

Definition 8

(Permutation-index Problem).

Suppose Alice has a permutation \(P=(P_1,\ldots ,P_n)\) of \([n]\), where \(P_i\in [n]\). Suppose Bob has an index \(j\in [n]\). The goal is for Bob to determine \(P_j\), where the communication is 1-way from Alice to Bob.

This problem is a special case of the index problem, in which \(P\) is a multiset [16]. However, we shall show that it is as hard as the index problem.

Proposition 9.

The quantum and classical 1-way communication complexity of the Permutation-index Problem is \(\Theta (n\log n)\).

Proof.

We prove that the index problem can be reduced to the Permutation-index Problem. Let \(S=\lbrace s_1,\ldots ,s_n\rbrace\) be a multiset of \(n\) integers from \([n]\). In the index problem, Alice has \(S\) and Bob has an index \(j\in [n]\); their goal is for Bob to determine \(s_j\). It is known that the quantum and classical 1-way communication complexity of the index problem is \(\Theta (n\log n)\) [15, 16].

The reduction is as follows: In the first step, for any \(i\in [n]\), Alice computes the multiplicity of \(i\) in \(S\), namely, Alice computes \(m_i = \#\lbrace j\in [n]:s_j=i\rbrace\). Then, she sends the information \(M = (m_1,\ldots ,m_n)\) to Bob. In the second step, Alice transforms the multiset \(S\) into a permutation. Suppose \(i_1\lt \cdots \lt i_p\) are the integers whose multiplicities are nonzero. Then, Alice replaces \(i_1\) in \(S\) with \(1,2,\ldots ,m_{i_1}\) (the order is not important in the replacement), replaces \(i_2\) in \(S\) with \(m_{i_1}+1,\ldots ,m_{i_1}+m_{i_2}\), and so on, until she replaces \(i_p\) in \(S\) with \(m_{i_1}+\cdots +m_{i_{p-1}}+1,\ldots ,n\). In the end, Alice obtains a permutation \(P\). It is not hard to see that there is a one-to-one correspondence between \(S\) and the pair \((M, P)\). So, if there is a protocol for Alice and Bob to determine \(P_j\), then from the above construction, they can use this protocol to determine \(s_j\). Here, Alice needs to send \(M\) to Bob first.

Next, we compute the communication complexity. In the first step, the number of bits required to transmit \(M\) is bounded by \(\log \binom{2n-1}{n-1}\le 2n\), where \(\binom{2n-1}{n-1} = \#\lbrace m_1+\cdots +m_n=n:m_i\ge 0\rbrace\) is the number of nonnegative \(n\)-decompositions of \(n\). This means that the communication complexity of the Permutation-index Problem is at least \(\Omega (n\log n) - 2n = \Omega (n\log n)\). The matching upper bound \(O(n\log n)\) is achieved by Alice sending the whole permutation to Bob. □
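The reduction in the proof can be sketched in a few lines of Python. The function names `multiset_to_permutation` and `recover_entry` are hypothetical, and the sketch makes one particular choice for the order of replacement (left to right), which the proof leaves free. Given \(M\) and \(P_j\), Bob recovers \(s_j\) by locating \(P_j\) among the prefix sums of the multiplicities.

```python
from bisect import bisect_left
from itertools import accumulate

def multiset_to_permutation(S):
    """Alice's step: map S = (s_1, ..., s_n), s_j in [n], to multiplicities M and a permutation P."""
    n = len(S)
    M = [S.count(i) for i in range(1, n + 1)]       # m_1, ..., m_n
    used = [0] + list(accumulate(M))[:-1]           # labels consumed by values smaller than i
    P = []
    for s in S:
        used[s - 1] += 1                            # next free label in the range reserved for s
        P.append(used[s - 1])
    return M, P

def recover_entry(M, P_j):
    """Bob's step: recover s_j from M and P_j."""
    prefix = list(accumulate(M))                    # prefix[i-1] = m_1 + ... + m_i
    return bisect_left(prefix, P_j) + 1             # smallest i with P_j <= m_1 + ... + m_i

S = [3, 1, 3, 2, 3]
M, P = multiset_to_permutation(S)
print(M, P)                                         # [1, 1, 3, 0, 0] [3, 1, 4, 2, 5]
print([recover_entry(M, p) for p in P])             # [3, 1, 3, 2, 3]
```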

We are now ready to prove the lower bounds of quantum and classical protocols. We first compute quantum lower bounds in terms of \(\kappa\) and \(\gamma\). Based on it, we then use similar ideas to prove the optimality of our quantum protocols and estimate classical lower bounds.

Theorem 10

(Quantum Lower Bounds (with Respect to κ, γ))

Suppose Alice has a matrix \(A \in \mathbb {R}^{m\times n}\) and Bob has a vector \({\bf b}\in \mathbb {R}^{m}\). To prepare the quantum state \(| A^{+}{\bf b} \rangle\),

(1)	\(\Omega (\kappa ^2 + 1/\gamma ^2 + \log \min (m,n))\) qubits of communication are required in the 1-way case from Bob to Alice.
(2)	\(\Omega (\min (m,n)\log \min (m,n))\) qubits of communication are required in the 1-way case from Alice to Bob.
(3)	\(\Omega (\kappa + 1/\gamma + \log \min (m,n))\) qubits of communication are required in the 2-way case.

Proof.

To prove the claimed lower bounds, our main idea is to reduce the disjointness problem or the index problem to a linear regression problem. In our reductions, the linear regression instances we construct have the property that \(A\) is square. This naturally leads to lower bounds in the general case. Namely, if \(m\gt n\), then we can use the same reduction by focusing on \(\text{argmin}_{{\bf x}} \left\Vert {\left(\begin{matrix} A \\ 0 \end{matrix}\right) {\bf x}- \left(\begin{matrix} {\bf b}\\ 0 \end{matrix}\right)}\right\Vert\). If \(m\le n\), then we can focus on \(\text{argmin}_{{\bf x}} \Vert {\left(\begin{matrix} A & 0 \\ \end{matrix}\right) {\bf x}- \left(\begin{matrix} {\bf b}& 0 \\ \end{matrix}\right)\Vert }\). This explains why the dependence on \(m,n\) is \(\min (m,n)\). Because of this, we assume below that \(m=n\).

We prove the second claim using the hardness of Permutation-index Problem. We can reduce the Permutation-index Problem to a linear regression problem as follows: Alice constructs a permutation matrix \(P\) according to the permutation she has. Bob constructs the quantum state \(| j \rangle\). If Bob can sample from the solution state \(| P_j \rangle = P | j \rangle\), then Bob can solve the Permutation-index Problem. Thus, the quantum lower bound of solving linear regression problems in case 2 is \(\Omega (n\log n)\).

We shall use the hardness of the disjointness problem to prove the first and third claims together. We aim to show that for any \(\kappa\), there is an instance \((A,{\bf b})\) such that at least \(\kappa\) (or \(\kappa ^2\)) bits of communication are required to prepare the quantum state of the optimal solution.

If \(\kappa\) is too large, then the naive protocol of sending the whole matrix or vector will be used, so we assume that \(1\le \kappa \le n\). Denote \(l=\lfloor \kappa \rfloor\) as the integer part of \(\kappa\). Suppose Alice has a subset \(S \subseteq [l]\) and Bob has another subset \(T\subseteq [l]\). Without loss of generality, we assume that \(S,T \ne \emptyset\) and \(S, T\) are proper subsets of \([l]\) of size \(\Theta (l)\).⁴ In the disjointness problem, their goal is to determine if \(S\cap T=\emptyset\). It is known that for this problem, the quantum 1-way communication complexity is \(\Theta (l)\) [7] and the quantum 2-way communication complexity is \(\Theta (\sqrt {l})\) [30] (also see Proposition 1).

The reduction is as follows: Choose \(\varepsilon = 1/\sqrt {l}\). Alice constructs an \(n\times n\) diagonal matrix \(A\) as follows: \(\begin{equation*} A_{ii} = {\left\lbrace \begin{array}{ll}1 & i \in S, \\ 1/\varepsilon & i \in [l] \backslash S, \\ 0 & i \in [n] \backslash [l]. \end{array}\right.} \end{equation*}\) Bob constructs an \(n\) dimensional vector \({\bf b}\) as follows: \(\begin{equation*} b_i = {\left\lbrace \begin{array}{ll}1 & i \in T, \\ \varepsilon & i \in [l] \backslash T, \\ 0 & i \in [n] \backslash [l]. \end{array}\right.} \end{equation*}\) Then, \(\begin{equation*} | A^+{\bf b} \rangle = \frac{1}{\sqrt {L}} \left(\sum _{i\in S \cap T} | i \rangle + \varepsilon \sum _{j\in (S \backslash T) \cup (T \backslash S)} | j \rangle + \varepsilon ^2 \sum _{k\in [l] \backslash (S \cup T) } | k \rangle \right) , \end{equation*}\) where \(\begin{equation*} L = |S \cap T| + \varepsilon ^2 |(S \backslash T) \cup (T \backslash S)| + \varepsilon ^4 |\overline{S \cup T}| \end{equation*}\) and \(\overline{S \cup T} = [l] \backslash (S \cup T)\).

If \(S \cap T \ne \emptyset\), then the probability of getting an \(i\in S\cap T\) from measuring \(| A^{+}{\bf b} \rangle\) is at least 1/2. If \(S \cap T = \emptyset\), then we obtain a uniformly random \(i\in S \cup T\) from measurements. The disjointness problem is also hard even if \(|S\cap T| \le 1\) [29, 30].⁵ Under this setting, if \(S\cap T \ne \emptyset\), then we will see the same index from \(S\cap T\) many times by measuring the state \(| A^{+}{\bf b} \rangle\). If \(S\cap T = \emptyset\), then we will see different indices from \(S\cup T\). So, preparing \(| A^{+}{\bf b} \rangle\) is sufficient to solve the disjointness problem.

It is easy to compute that \(\kappa = \sqrt {l}\) and \(\gamma = \Theta (1)\). So, the quantum communication complexity is at least quadratic in \(\kappa\) in the first claim and at least linear in the third claim.
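The construction can be tried out on a concrete instance. The following sketch uses hypothetical helper names and a small example with \(l=16\), \(n=20\). It builds the diagonal of \(A\) and the vector \({\bf b}\), then computes the outcome distribution of measuring \(| A^{+}{\bf b} \rangle\) for an intersecting and a disjoint pair of sets.

```python
import math

def disjointness_instance(S, T, l, n):
    """Diagonal of A and the vector b, for 1-indexed sets S, T inside [l] with l <= n."""
    eps = 1 / math.sqrt(l)
    A_diag = [1.0 if i in S else (1 / eps if i <= l else 0.0) for i in range(1, n + 1)]
    b = [1.0 if i in T else (eps if i <= l else 0.0) for i in range(1, n + 1)]
    return A_diag, b

def measurement_distribution(A_diag, b):
    """Outcome probabilities when |A^+ b> is measured; A is diagonal."""
    x = [bi / a if a != 0 else 0.0 for a, bi in zip(A_diag, b)]   # A^+ b, entry by entry
    total = sum(v * v for v in x)
    return [v * v / total for v in x]

l, n = 16, 20
S = set(range(1, 9))
A1, b1 = disjointness_instance(S, set(range(8, 16)), l, n)   # S and T share the element 8
A2, b2 = disjointness_instance(S, set(range(9, 17)), l, n)   # S and T are disjoint
p_hit = measurement_distribution(A1, b1)
p_miss = measurement_distribution(A2, b2)

nonzero = [a for a in A1 if a != 0]
kappa = max(nonzero) / min(nonzero)      # ratio of largest to smallest nonzero singular value
print(p_hit[7], p_miss[:16], kappa)
```

In the intersecting instance, index 8 is observed with probability above 1/2, while in the disjoint instance every index of \(S\cup T\) is equally likely; in both cases `kappa` equals \(\sqrt {l}\).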

Regarding the dependence on \(\gamma\), we use the following construction: We also assume that \(|S\cap T|\le 1\). If \(|S\cap T|=1\), then we denote the intersection as \(\lbrace w\rbrace\). Alice constructs an \((n+1)\times (n+1)\) diagonal matrix \(A\) by setting \(A_{ii}=1\) if \(i\in S \cup \lbrace n+1\rbrace\) and 0 otherwise. Bob constructs a vector \({\bf b}\) such that \(b_i=1\) if \(i\in T\cup \lbrace n+1\rbrace\) and 0 otherwise. Now, we have \(A^+ = A\) and \({\bf x}_{\rm opt} = A{\bf b}= \sum _{i\in S\cap T} | i \rangle + | n+1 \rangle\). If \(S\cap T = \emptyset\), then we only see \(n+1\) by measuring \(| {\bf x}_{\rm opt} \rangle\). Otherwise, we will see \(w\) with probability \(1/2\). Now, \(\kappa = 1\) and \(\gamma = \sqrt {1/|T|} = \Theta (\sqrt {1/l})\), since \(|T|=\Theta (l)\) as assumed in the beginning. Hence, the quantum communication complexity is at least quadratic in \(1/\gamma\) in the first claim and at least linear in the third claim.
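A small instance of this second construction is sketched below; the function name `gamma_instance` and the concrete sets are illustrative assumptions. Here \(\gamma\) is evaluated directly from \(\gamma = \Vert AA^{+}{\bf b}\Vert /\Vert {\bf b}\Vert\), which gives \(\sqrt {(|S\cap T|+1)/(|T|+1)}\) and hence scales as \(\Theta (\sqrt {1/|T|})\).

```python
import math

def gamma_instance(S, T, n):
    """Diagonal 0/1 matrix A and 0/1 vector b of size n + 1 (indices 1, ..., n + 1)."""
    A_diag = [1.0 if (i in S or i == n + 1) else 0.0 for i in range(1, n + 2)]
    b = [1.0 if (i in T or i == n + 1) else 0.0 for i in range(1, n + 2)]
    return A_diag, b

n = 12
S, T = {1, 2, 3, 4}, {4, 5, 6, 7}                 # the intersection is {w} with w = 4
A_diag, b = gamma_instance(S, T, n)

x_opt = [a * bi for a, bi in zip(A_diag, b)]      # A^+ = A, so x_opt = A b
norm_sq = sum(v * v for v in x_opt)
prob_w = x_opt[4 - 1] ** 2 / norm_sq              # probability of observing w
Ax = [a * v for a, v in zip(A_diag, x_opt)]       # A x_opt
gamma = math.sqrt(sum(v * v for v in Ax)) / math.sqrt(sum(v * v for v in b))
print(prob_w, gamma)
```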

Finally, the lower bound of \(\log n\) comes from the index problem. In the index problem, Alice has a bit string \((x_1,\ldots ,x_n)\) and Bob has an index \(j\). The goal is to output \(x_j\). If the communication is from Bob to Alice or 2-way, then Bob can just send the index to Alice, and Alice outputs \(x_j\). This costs \(\Theta (\log n)\) communication.⁶ To build the connection between this problem and the linear regression problem, Alice constructs a permutation matrix \(U\in \mathbb {R}^{2n \times 2n}\). It is a block-diagonal matrix, each block has dimension 2. If \(x_j=1\), then the \(j\)th block is \(I_2\). Otherwise, the \(j\)th block is Pauli-\(X\). Bob constructs \(| 0 \rangle | j \rangle\). So, if \(x_j=1\), then \(U| 0 \rangle | j \rangle = | 0 \rangle | j \rangle\). Otherwise, \(U| 0 \rangle | j \rangle = | 1 \rangle | j \rangle\). This means that if we can prepare \(U| 0 \rangle | j \rangle\), then we can solve the index problem. □
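The last reduction is simple enough to write out explicitly. In the sketch below, the function names are hypothetical and the basis state \(| 0 \rangle | j \rangle\) is identified with position \(2j\) (block \(j\), inner index 0, with 0-based \(j\)); this ordering of the two registers is an assumption made so that \(U\) is block-diagonal.

```python
def index_instance(x_bits):
    """2n x 2n block-diagonal permutation matrix: block j is I_2 if x_j = 1, else Pauli-X."""
    n = len(x_bits)
    U = [[0] * (2 * n) for _ in range(2 * n)]
    for j, bit in enumerate(x_bits):
        r = 2 * j
        if bit == 1:
            U[r][r] = U[r + 1][r + 1] = 1          # identity block
        else:
            U[r][r + 1] = U[r + 1][r] = 1          # Pauli-X block
    return U

def decode(U, j):
    """Apply U to the basis state at position 2j and read off x_j from the image."""
    col = 2 * j
    out = next(row for row in range(len(U)) if U[row][col] == 1)   # image basis state
    return 1 if out == col else 0

x_bits = [1, 0, 0, 1, 1]
U = index_instance(x_bits)
recovered = [decode(U, j) for j in range(len(x_bits))]
print(recovered)                                   # [1, 0, 0, 1, 1]
```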

Recall from Theorem 7 that the communication complexity of the quantum protocol we proposed for case 1 is \(O((\log m)\Vert A^{+}\Vert ^2\Vert {\bf b}\Vert ^2/\Vert A^{+} {\bf b}\Vert ^2)\) and for case 3 is \(O((\log m)\Vert A^{+}\Vert \Vert {\bf b}\Vert /\Vert A^{+} {\bf b}\Vert)\). For the constructions in the above proof, we have \(\Vert A^+\Vert = 1\) and \(\Vert A^{+} {\bf b}\Vert ^2/\Vert {\bf b}\Vert ^2 = \Theta (|S\cap T|/|T|)\). So, we indeed proved that the lower bound is \(\Omega (\Vert {\bf b}\Vert ^2/\Vert A^{+} {\bf b}\Vert ^2)\) for case 1 and \(\Omega (\Vert {\bf b}\Vert /\Vert A^{+} {\bf b}\Vert)\) for case 3. However, it is not clear what the dependence on \(\Vert A^+\Vert\) is. To understand this, we can rescale the above construction appropriately to show that the lower bound is \(\Omega (\Vert A^{+}\Vert ^2\Vert {\bf b}\Vert ^2/\Vert A^{+} {\bf b}\Vert ^2)\) for case 1 and \(\Omega (\Vert A^{+}\Vert \Vert {\bf b}\Vert /\Vert A^{+} {\bf b}\Vert)\) for case 3. This suggests that the quantum protocols for case 1 and case 3 are optimal up to a factor of \(\log m\). We can use a similar construction to show the optimality of the quantum protocol for case 2 up to a factor of \(\log (mn)\). We state this in the following theorem. The proof is similar to that of Theorem 10, so we defer it to Appendix A.

Theorem 11 (Quantum Lower Bounds).

Suppose Alice has a matrix \(A \in \mathbb {R}^{m\times n}\) and Bob has a vector \({\bf b}\in \mathbb {R}^{m}\). To prepare the quantum state \(| A^{+}{\bf b} \rangle\),

(1)	\(\Omega (\Vert A^+\Vert ^2\Vert {\bf b}\Vert ^2/{\Vert A^+{\bf b}\Vert ^2})\) qubits of communication are required in the 1-way case from Bob to Alice.
(2)	\(\Omega (\Vert A^+\Vert _F^2\Vert {\bf b}\Vert ^2/\Vert A^+{\bf b}\Vert ^2)\) qubits of communication are required in the 1-way case from Alice to Bob.
(3)	\(\Omega (\Vert A^+\Vert \Vert {\bf b}\Vert /\Vert A^+{\bf b}\Vert)\) qubits of communication are required in the 2-way case.

In Theorem 10, the lower bound is additive with respect to \(\kappa\) and \(\gamma\), which are two quantities with nice explanations. Note that \(\kappa = \Vert A\Vert \Vert A^+\Vert , \gamma = \Vert AA^+{\bf b}\Vert /\Vert {\bf b}\Vert\), so \(\kappa /\gamma \ge \Vert A^+\Vert \Vert {\bf b}\Vert /\Vert A^+{\bf b}\Vert\). Although the lower bound given in Theorem 11 is multiplicative, we cannot say it is a stronger lower bound. We indeed did not prove that \(\Omega (\kappa /\gamma)\) is a lower bound. Actually, Theorem 11 can be viewed as an alternative statement of Theorem 10 using \(A,{\bf b}\) rather than \(\kappa , \gamma\).

The classical communication complexity of the disjointness problem is \(\Theta (n)\) in 2-way communication. So, similar to the proof of Theorem 10, we have the following lower bounds for classical protocols:

Theorem 12 (Classical Lower Bounds).

Suppose Alice has a matrix \(A \in \mathbb {R}^{m\times n}\) and Bob has a vector \({\bf b}\in \mathbb {R}^{m}\). To sample from the solution \(A^{+}{\bf b}\),

(1)	\(\Omega (\min (m,n))\) bits of communication are required in the 1-way case from Bob to Alice.
(2)	\(\Omega (\min (m,n)\log \min (m,n))\) bits of communication are required in the 1-way case from Alice to Bob.
(3)	\(\Omega (\min (m,n))\) bits of communication are required in the 2-way case.

All the lower bounds are also true even if \(A\) is well-conditioned, i.e., \(\kappa =O(1)\).

Proof.

Similar to the analysis at the beginning of the proof of Theorem 10, we only need to consider the case that \(m=n\). The reductions in the proof of Theorem 10 are also valid for classical protocols, so we now only need to prove the claim that \(\Omega (n)\) bits of communication are required in the 2-way case even if \(A\) is well-conditioned. For this, we use the hardness of the Distributed Fourier Sampling problem studied in Reference [25]. For convenience, we assume that \(n=2^d\) for some integer \(d\gt 0\). In the Distributed Fourier Sampling problem, Alice has a function \(f:\lbrace 0,1\rbrace ^d \rightarrow \lbrace \pm 1\rbrace\) and Bob has another function \(g:\lbrace 0,1\rbrace ^d \rightarrow \lbrace \pm 1\rbrace\). Their goal is to sample from the distribution corresponding to the Fourier coefficients of \(fg\), i.e., to sample from the state \(\begin{equation*} | P_{fg} \rangle :=\sum _{s\in \lbrace 0,1\rbrace ^d} \left(\frac{1}{2^d} \sum _{x\in \lbrace 0,1\rbrace ^d} f(x)g(x) (-1)^{s\cdot x}\right) | s \rangle . \end{equation*}\) We can reduce this problem to a linear regression problem as follows: Alice constructs the matrix \(H^{\otimes d} D_f\), where \(H\) is the Hadamard matrix and \(D_f\) is the diagonal matrix whose \(x\)th diagonal entry equals \(f(x)\), where \(x\in \lbrace 0,1\rbrace ^d\). Bob constructs a vector whose quantum state is \(| g \rangle = {2^{-d/2}} \sum _{x\in \lbrace 0,1\rbrace ^d} g(x) | x \rangle\). Then, \(| P_{fg} \rangle = H^{\otimes d} D_f | g \rangle .\) It was shown in Reference [25, Theorem 1] that any classical 2-way communication protocol for this problem must communicate \(\Omega (2^d)=\Omega (n)\) bits (also see Proposition 2). □
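The identity \(| P_{fg} \rangle = H^{\otimes d} D_f | g \rangle\) used in the proof can be verified numerically for small \(d\). The sketch below assumes NumPy, takes \(H\) to be the normalized \(2\times 2\) Hadamard matrix, and uses randomly drawn \(f\) and \(g\) purely for illustration. It compares the matrix-vector product with the Fourier coefficients computed directly from the definition.

```python
import numpy as np
from functools import reduce

d = 3
n = 2 ** d
rng = np.random.default_rng(1)
f = rng.choice([-1, 1], size=n)           # Alice's function, indexed by x in {0, ..., 2^d - 1}
g = rng.choice([-1, 1], size=n)           # Bob's function

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H_d = reduce(np.kron, [H] * d)            # H tensored d times
state = H_d @ np.diag(f) @ (g / np.sqrt(n))

# Fourier coefficients of fg from the definition; s.x is the parity of the bitwise AND.
direct = np.array([
    sum(f[x] * g[x] * (-1) ** bin(s & x).count("1") for x in range(n)) / n
    for s in range(n)
])
print(np.allclose(state, direct), np.sum(state ** 2))
```

Since \(fg\) takes values in \(\lbrace \pm 1\rbrace\), the squared coefficients sum to 1, so they form the distribution that the parties are asked to sample from.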

Theorem 12 indicates that if the communication is 1-way from Bob to Alice or 2-way, then the naive protocol in which Bob sends the whole vector to Alice is optimal. Regarding the second claim, we believe that the lower bound is \(\Omega (n^2)\), but we could not prove this. We therefore make the following conjecture:

Conjecture 13.

Suppose Alice has a matrix \(A \in \mathbb {R}^{m\times n}\) and Bob has a vector \({\bf b}\in \mathbb {R}^{m}\). For Bob to sample from the solution \(A^{+}{\bf b}\), \(\Omega (mn)\) bits of communication are required in the 1-way case, where the communication is from Alice to Bob.

It is possible that one may be able to use the composition of the Distributed Fourier Sampling problem and the index problem to prove the conjecture. In this composed problem, Alice has \(m\) Boolean functions \(f_1,\ldots ,f_m\), and Bob has a Boolean function \(g\) as well as an index \(j\). The goal is to sample from \(| P_{f_jg} \rangle\).

4 MULTIPLE PARTIES

In this section, we consider linear regression problems in a more general setting. Suppose there are \(r\) parties \(P_0, \ldots , P_{r-1}\). For each \(i\), the party \(P_i\) receives a matrix \(A_i \in \mathbb {R}^{d_i \times n}\) and a vector \({\bf b}_i\in \mathbb {R}^{d_i}\). Their goal is to solve the linear regression problem (13) \(\begin{equation} \mathop{\text{argmin}}\limits_{{\bf x}} \quad \Vert A {\bf x}- {\bf b}\Vert , \end{equation}\) where (14) \(\begin{equation} A = \begin{pmatrix}A_0 \\ \vdots \\ A_{r-1} \end{pmatrix}, \quad {\bf b}= \begin{pmatrix}{\bf b}_0 \\ \vdots \\ {\bf b}_{r-1} \end{pmatrix}. \end{equation}\)

We assume that there is a referee such that each party can only send information to the referee. In the SMP model, the communication is 1-way, i.e., the referee is not allowed to send information to other parties. From the second claim of Theorem 10, we have that it is hard to solve the linear regression problem (13) in the SMP model (see Proposition 17 below). So, similar to the classical coordinator model [35], we assume that the communication is 2-way between each party and the referee. We call it the quantum coordinator model. We will discuss this model in more detail in Section 6.

4.1 The Quantum Protocol

In this section, we aim to propose a quantum protocol for solving Equation (13) based on the technique of QSVT. With QSVT, we have a near-optimal quantum algorithm for solving linear regression problems in terms of time and query complexity [12]. Below, we show that this algorithm is still effective in the quantum coordinator model. For completeness, we list all the invoked results in Appendix C.

We first present a general result for QSVT in terms of communication complexity.

Proposition 14.

Suppose \(P_i\) has a matrix \(A_i \in \mathbb {R}^{d_i \times n}\), where \(i\in \lbrace 0,\ldots ,r-1\rbrace\). Let \(A\) be given in Equation (14). Assume that \(A\) is Hermitian. Let \(f\in \mathbb {R}[x]\) be a polynomial of degree \(d\) satisfying \(|f(x)|\le 1/2\) for all \(x \in [-1,1]\). Then, there is a quantum protocol in the quantum coordinator model for the referee to construct a \((1, 3+\log r, 0)\) block-encoding of \(f(A/\alpha)\) with \(O(rd\log n)\) qubits of communication, where \(\alpha = \sqrt {\sum _{i=0}^{r-1} \Vert A_i\Vert ^2}\). Moreover, if \(A=\sum _i A_i\), then the result still holds with \(\alpha = \sum _i \Vert A_i\Vert\).

Proof.

The quantum protocol contains two steps.

Step 1. The referee needs a block-encoding of \(A\). This is achieved by Lemma 5. The party \(P_i\) constructs an \((\Vert A_i\Vert , 1, 0)\) block-encoding of \(A_i\) based on SVD (see Equation (4)), i.e., \(P_i\) computes the following unitary: \(\begin{equation*} U_i = \begin{pmatrix}A_i/\Vert A_i\Vert & \cdot \\ \cdot & \cdot \end{pmatrix}. \end{equation*}\) By Lemma 5, (15) \(\begin{equation} U = ({\rm SWAP}_{1,2}\otimes I_n)\left(\sum _{i=0}^{r-1} | i \rangle \langle i| \otimes U_i\right) (V\otimes I_{2} \otimes I_n) \end{equation}\) is an \((\alpha ,1+\log r,0)\) block-encoding of \(A\), where \(\alpha = \sqrt {\sum _i \Vert A_i\Vert ^2}\), and (16) \(\begin{equation} V | 0 \rangle = \frac{1}{\alpha } \sum _{j=0}^{r-1} \Vert A_j\Vert \, | j \rangle . \end{equation}\) For the referee to use this unitary, each party \(P_i\) sends \(\Vert A_i\Vert\) to the referee so the referee can construct the unitary \(V\) satisfying Equation (16). To apply \(U\) to any state, the referee first applies \(V\otimes I_{2} \otimes I_n\) to it. Next, the referee sends the state to \(P_0\) and asks \(P_0\) to apply \(U_0\) to the second and third registers if the first register is \(| 0 \rangle\). After that, \(P_0\) sends the state back to the referee so the referee can ask \(P_1\) to do a similar controlled operation based on \(U_1\). They repeat this process \(r\) times, once for each party. Finally, the referee applies \({\rm SWAP}_{1,2}\otimes I_n\) to the resulting state. In total, this process uses \(O(r \log n)\) qubits of communication.

Step 2. The referee constructs the block-encoding of \(f(A/\alpha)\). This is achieved by QSVT. By Reference [12, Theorem 56], with the block-encoding of \(A\) and the polynomial \(f\), we can construct a \((1, 3+\log r,0)\) block-encoding \(\widetilde{U}\) of \(f(A/\alpha)\). The interesting part is the quantum circuit of \(\widetilde{U}\), which has a decomposition of the form (see Reference [12, Lemma 19]) (17) \(\begin{equation} \widetilde{U} = W_0 U W_1 U^\dagger W_2 U W_3 U^\dagger \cdots W_d U, \end{equation}\) where \(W_0, W_1,\ldots , W_d\) are generated by one- and two-qubit unitaries, depending on the polynomial \(f\) and some other public unitaries. Since these unitaries and function \(f\) are public, the referee can use \(W_0, W_1,\ldots , W_d\) without any communication. As discussed in step 1, the referee can use \(U\) and \(U^\dagger\) once with \(O(r\log n)\) qubits of communication. In Equation (17), the referee uses \(U\) and \(U^\dagger\) \(O(d)\) times. So, to use \(\widetilde{U}\) once, they communicate \(O(r d \log n)\) qubits in total.

The last claim can be proved similarly based on Lemma 6. □
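Step 1 of the proof can be illustrated numerically. The sketch below assumes NumPy, square real blocks \(A_i\in \mathbb {R}^{n\times n}\), and a Halmos-type dilation as one possible realization of each \(U_i\); all helper names are hypothetical. The register reordering by \({\rm SWAP}_{1,2}\) in Equation (15) is omitted and replaced by an explicit reshape, which is an assumption about the register layout rather than a statement of Lemma 5. The check is that projecting the flag qubit onto \(| 0 \rangle\) yields \(A{\bf x}/\alpha\) for the stacked matrix \(A\).

```python
import numpy as np

def halmos_dilation(C):
    """Unitary [[C, sqrt(I - C C^T)], [sqrt(I - C^T C), -C^T]] for a real square contraction C."""
    W, s, Vt = np.linalg.svd(C)
    d = np.sqrt(np.clip(1.0 - s ** 2, 0.0, None))    # clip guards against rounding below zero
    return np.vstack([np.hstack([C, (W * d) @ W.T]),
                      np.hstack([(Vt.T * d) @ Vt, -C.T])])

def orthogonal_with_first_column(v):
    """Real orthogonal matrix whose first column is the unit vector v."""
    M = np.eye(len(v))
    M[:, 0] = v
    Q, R = np.linalg.qr(M)
    if R[0, 0] < 0:                                  # QR fixes the first column only up to sign
        Q[:, 0] = -Q[:, 0]
    return Q

# Hypothetical random instance, used only for illustration.
rng = np.random.default_rng(0)
r, n = 3, 4
blocks = [rng.standard_normal((n, n)) for _ in range(r)]     # A_0, ..., A_{r-1}
norms = np.array([np.linalg.norm(Ai, 2) for Ai in blocks])   # the values each party sends
alpha = np.sqrt(np.sum(norms ** 2))

V = orthogonal_with_first_column(norms / alpha)              # satisfies Equation (16)
select = np.zeros((2 * n * r, 2 * n * r))                    # sum_i |i><i| (x) U_i
for i, Ai in enumerate(blocks):
    select[2 * n * i:2 * n * (i + 1), 2 * n * i:2 * n * (i + 1)] = halmos_dilation(Ai / norms[i])
U = select @ np.kron(V, np.eye(2 * n))

x = rng.standard_normal(n)
inp = np.zeros(2 * n * r)
inp[:n] = x                                                  # |0>_index |0>_flag |x>
out = (U @ inp).reshape(r, 2, n)[:, 0, :].reshape(-1)        # keep the flag = |0> component
A = np.vstack(blocks)
print(np.allclose(out, A @ x / alpha))
```

In the protocol, the referee applies `V` locally, and each diagonal block of `select` is applied by the corresponding party, which is where the \(O(r\log n)\) qubits of communication per use of \(U\) arise.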

Recall that in terms of time complexity, given an \((\alpha , q, \varepsilon)\) block-encoding of \(A\) in cost \(T\), we can construct a \((1,q+2, 4d\sqrt {\varepsilon /\alpha })\) block-encoding of \(f(A/\alpha)\) in cost \(O(dT)\) [12, Theorem 56]. By Proposition 14, we still have the same formula for the communication complexity of using QSVT. The difference is that we can compute \(T=O(r \log n)\) and \(\alpha =\sqrt {\sum _i \Vert A_i\Vert ^2}\) precisely. If \(A\) is not Hermitian, then the result in Proposition 14 is also true except that the matrix function \(f(A/\alpha)\) is defined with respect to singular value decomposition (see Reference [12, Definition 16]). With the above proposition, we now can propose a quantum protocol for solving linear regressions.

Theorem 15.

For the problem (13) in the quantum coordinator model, there is a quantum protocol for the referee to prepare \(| A^{+}{\bf b} \rangle\) by using \(\widetilde{O}(r^{1.5}\kappa /\gamma)\) qubits of communication.

Proof.

Let \(\delta \in (0,1]\) be a threshold on the singular values of \(A\). Our idea below depends on QSVT. When applying QSVT to a polynomial approximation of \(3\delta /4x\), we obtain \(A^+_{\ge \delta }\) automatically. Here, \(A^+_{\ge \delta }\) is the truncated matrix obtained by removing the singular values of \(A\) that are smaller than \(\delta\). To simplify the complexity analysis below, we assume that \(\delta = \Theta (\sigma _{\min })\), so \(\Vert A\Vert /\delta = \Theta (\kappa)\) and \(A^+_{\ge \delta }=A^+\). Also for convenience, we denote by \(m = d_0+\cdots +d_{r-1}\) the row dimension of \(A\).

When solving linear regression problems, we can assume that \(A\) is Hermitian. Otherwise, we can consider \({\left(\begin{matrix} 0 & A \\ A^\dagger & 0 \end{matrix}\right)}.\) Its block-encoding is \({\left(\begin{matrix} 0 & U \\ U^\dagger & 0 \end{matrix}\right)}\), where \(U\) is given in Equation (15). With a similar argument, the referee can still use this block-encoding with \(O(r \log (mn))\) qubits of communication. So, below, we assume that \(A\) is Hermitian.

To apply Proposition 14, the referee needs a polynomial approximation of \(1/x\) in the interval \([-1,1]\backslash [-\delta ^{\prime },\delta ^{\prime }]\). This function is public and its polynomial approximation is known, e.g., see Reference [12, Corollary 69]. Indeed, Reference [12, Corollary 69] gives a polynomial approximation of \(3\delta ^{\prime }/4x\), which is enough for solving linear regression problems. The degree of the polynomial is \(d = O((1/\delta ^{\prime })\log (1/\varepsilon))\). In Proposition 14, \(f\) will be applied to the singular values of \(A\). However, the singular values of \(A\) can be larger than 1. To overcome this, we can apply Proposition 14 to the matrix \(A/\alpha\), where \(\alpha = \sqrt {\sum _{i=0}^{r-1} \Vert A_i\Vert ^2}=O(\sqrt {r}\Vert A\Vert),\) because \(\Vert A_i\Vert \le \Vert A\Vert\) for all \(i\). This means \(\delta ^{\prime }=\delta /\alpha\). By Proposition 14, there is a quantum protocol for the referee to construct a \((1,3+\log r,0)\) block-encoding \(\widetilde{U}\) of \((3\delta /4)A^{+}\) with \(O((r\alpha /\delta) (\log 1/\varepsilon) \log (mn))\) qubits of communication in total.

Regarding the quantum state of \({\bf b}\), each party sends the norm information of \({\bf b}_i\) to the referee, and then the referee prepares (18) \(\begin{equation} \frac{1}{\Vert {\bf b}\Vert } \sum _{i=0}^{r-1} \Vert {\bf b}_i\Vert \, | i \rangle | 0 \rangle . \end{equation}\) Similar to the application of the block-encoding of \(A\), to prepare the target state \(\begin{equation*} | {\bf b} \rangle = \frac{1}{\Vert {\bf b}\Vert } \sum _{i=0}^{r-1} \Vert {\bf b}_i\Vert \, | i \rangle | {\bf b}_i \rangle \end{equation*}\) the referee can send the state (18) to each party gradually and ask that party to prepare \(| {\bf b}_i \rangle\) using a control operator. This requires \(O(r \log (mn))\) qubits of communication in total.

Finally, the referee applies \(\widetilde{U}\) to \(| 0 \rangle | {\bf b} \rangle\) to prepare \(\begin{equation*} \frac{3\delta }{4} | 0 \rangle \otimes A^{+} | {\bf b} \rangle + | 0 \rangle ^\bot . \end{equation*}\) The success probability is \(\Omega (\delta ^2\gamma ^2/\Vert A\Vert ^2)\), where \(\gamma\) is defined in Equation (10). Also, see a similar analysis in Equation (12). Since the communication is 2-way, they can use amplitude amplification. Therefore, in total, they communicated \(\begin{equation*} O\left(\frac{r \alpha \Vert A\Vert }{\delta ^2\gamma } (\log 1/\varepsilon) \log (mn) \right) = O\left(r^{1.5} (\kappa ^2/\gamma) (\log 1/\varepsilon) \log (mn) \right) \end{equation*}\) qubits.

The dependence on \(\kappa\) can be reduced to be linear by the technique of variable-time amplitude amplification. This technique is still effective in the quantum coordinator model. We defer the analysis of this part to Appendix B. □

Recall that when solving linear regressions on a quantum computer, the time complexity is \(\widetilde{O}((T_A+T_b)\alpha /\delta \gamma)\) [8, Corollary 31], where \(T_A\) is the time complexity to construct the block-encoding of \(A\) and \(T_b\) is the time complexity to prepare the quantum state \(| {\bf b} \rangle\). Using our notation in the communication complexity model, \(\alpha = O(\sqrt {r}\Vert A\Vert)\) and \(T_A =T_b = \widetilde{O}(r)\), where \(T_A,T_b\) should be understood as the communication complexity of constructing the block-encoding and preparing the quantum state, respectively. This leads to a complexity of \(\widetilde{O}(r^{1.5}\kappa /\gamma)\), which is exactly the result described in Theorem 15. This means that the formula \(\widetilde{O}((T_A+T_b)\alpha /\delta \gamma)\) is true for both time and communication complexity. The difference is that, for communication complexity, we can compute \(T_A, T_b\) precisely, while for time complexity \(T_A, T_b\) are usually hard to estimate.

In the quantum case, the dependence of the complexity on \(r\) is \(r^{1.5}\), where \(\sqrt {r}\) comes from the construction of the block-encoding of \(A\) and \(r\) comes from the number of parties. Regarding the time and query complexity, QSVT usually leads to the best algorithm for linear regression. So, in terms of communication complexity, \(r^{1.5}\) might be optimal. In comparison, the classical complexity is linear in \(r\) [35]. It was shown in Reference [35] that for the harder task of outputting a vector solution, the naive protocol, that is, player \(P_i\) sends \(A_i^TA_i, A_i^T{\bf b}_i\) to the referee, is optimal. That is why the dependence on \(r\) is linear. In the quantum case, some other techniques may be required if we aim to reduce the dependence on \(r\).
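The naive classical protocol mentioned above can be sketched as follows. This is a minimal illustration assuming real-valued, row-stacked data with full column rank; the sizes and variable names are hypothetical. Each party sends an \(n\times n\) matrix and a length-\(n\) vector, so the number of transmitted entries grows linearly in \(r\).

```python
import numpy as np

rng = np.random.default_rng(2)
r, d, n = 3, 6, 4                               # hypothetical sizes
A_parts = [rng.standard_normal((d, n)) for _ in range(r)]
b_parts = [rng.standard_normal(d) for _ in range(r)]

# Message of party i: the pair (A_i^T A_i, A_i^T b_i).
messages = [(Ai.T @ Ai, Ai.T @ bi) for Ai, bi in zip(A_parts, b_parts)]

# The referee sums the messages and solves the normal equations.
G = sum(M for M, _ in messages)
v = sum(w for _, w in messages)
x_referee = np.linalg.solve(G, v)

# Reference: least squares on the row-stacked system.
x_direct = np.linalg.lstsq(
    np.vstack(A_parts), np.concatenate(b_parts), rcond=None
)[0]
numbers_sent = r * (n * n + n)                  # entries communicated in total
```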

Finally, we consider a general linear regression problem by setting (19) \(\begin{equation} A = A_0 + \cdots + A_{r-1}, \quad {\bf b}= {\bf b}_0 + \cdots + {\bf b}_{r-1} \end{equation}\) in Equation (13). Here, we have to assume that \(d_0=\cdots =d_{r-1}\). We also assume that \(A,{\bf b}\ne 0\). Note that for \(A,{\bf b}\) defined in Equation (19), the linear regression (13) is equivalent to \(\sum _i A_i^T A_i {\bf x}=\sum _i A_i^T {\bf b}_i\), which is a special case of the setting of Equation (19). For the setting (19), by Proposition 14, for any polynomial \(f\) of degree \(d\), the referee can construct a \((1,3+\log r,0)\) block-encoding of \(f(A/\alpha)\) with \(O(rd\log n)\) qubits of communication, where \(\alpha = \sum _{i=0}^{r-1}\Vert A_i\Vert\). Regarding the quantum state of \({\bf b}\), the referee can prepare \(\begin{equation*} \frac{1}{\sum _{i} \Vert {\bf b}_i\Vert } \sum _{i=0}^{r-1} \Vert {\bf b}_i\Vert \, | 0 \rangle | {\bf b}_i \rangle + | 0 \rangle ^\bot = \frac{\Vert {\bf b}\Vert }{\sum _{i} \Vert {\bf b}_i\Vert } | 0 \rangle | {\bf b} \rangle + | 0 \rangle ^\bot \end{equation*}\) by linear combination of unitaries with \(O(r\log n)\) qubits of communication.⁷ Therefore, similar to the protocol in Theorem 15, we have the following result:

Proposition 16.

For the setting (19), there is a quantum protocol for the referee to prepare \(| A^{+}{\bf b} \rangle\) by using (20) \(\begin{equation} \widetilde{O}\left(\frac{r \sum _{i=0}^{r-1}\Vert A_i\Vert }{\gamma \sigma _{\min }} \frac{\sum _{i=0}^{r-1} \Vert {\bf b}_i\Vert }{\Vert {\bf b}\Vert } \right) \end{equation}\) qubits of communication, where \(\sigma _{\min }\) is the minimal nonzero singular value of \(A\).

Unlike Theorem 15, here, we do not have \(\sum _{i=0}^{r-1}\Vert A_i\Vert \le r\Vert A\Vert\), and we also cannot give a nice bound for \(\sum _{i=0}^{r-1} \Vert {\bf b}_i\Vert /\Vert {\bf b}\Vert\).

4.2 Lower Bounds

In this section, we prove certain quantum/classical lower bounds for solving the linear regression problem (13). First, we show that it is hard to solve the linear regression (13) in the SMP model. This can be seen as evidence of why it is more interesting to consider the quantum coordinator model.

Proposition 17.

Assume that \(\sum _{i=0}^{r-1} d_i \ge n\). In the SMP model, \(\Omega (n\log n)\) qubits of communication are required to prepare the state \(| A^+{\bf b} \rangle\), and \(\Omega (n\log n)\) bits of communication are required to sample from the solution \(A^+{\bf b}\).

Proof.

This is a direct corollary of the second claims of Theorems 10 and 12. We can assume that the party \(P_0\) knows \(A_0,\ldots ,A_{r-1}\) and the referee knows \({\bf b}_0,\ldots ,{\bf b}_{r-1}\). Then, this is equivalent to a linear regression problem in the Alice-Bob model, where the communication is 1-way from Alice to Bob. □

We can similarly prove lower bounds in the quantum/classical coordinator model. But this only gives lower bounds in terms of \(\kappa\) or \(n\). Below, we consider the lower bound with respect to \(r\) and provide a much stronger one. We will use the hardness of a multi-player set-disjointness problem considered in Reference [35]. In this problem, the party \(P_j\) receives a subset \(T_j \subseteq [n]\), and their goal is to determine if \(T_0 \cap T_j \ne \emptyset\) for some \(j\ge 1\). As shown in Reference [26, Theorem 3.1] and Reference [37, Theorem 1], for any classical protocol that succeeds with probability \(1-1/r^3\), the communication complexity is lower bounded by \(\Omega (r n)\). In the quantum case, we have the following result:

Lemma 18.

In the quantum coordinator model, the quantum communication complexity for the multi-player set-disjointness problem is \(\Theta (\sqrt {n} r)\).

Proof.

Since the two-player set-disjointness problem can be solved with \(O(\sqrt {n})\) qubits of communication, \(O(r \sqrt {n})\) provides a natural upper bound. Regarding the lower bound, we prove it in Section 6 (see Theorem 23). □

Theorem 19.

Assume that \(\sum _{i=0}^{r-1} d_i \ge n\), and \(r = O(\sqrt {n})\). In the coordinator model, \(\Omega (r \kappa)\) qubits of communication are required to prepare the state \(| A^+{\bf b} \rangle\), and \(\Omega (r n)\) bits of communication are required to sample from \(A^+{\bf b}\).

Proof.

The proof is based on the hardness of the multi-player set-disjointness problem discussed above. Let \(\varepsilon = 1/\sqrt {n}, \xi = 1/\sqrt {r}\) and \(\eta = 1/(r\sqrt {n})\). We consider the following reduction. The party \(P_0\) constructs a diagonal matrix \(D\) of dimension \(n\) by setting the \(i\)th diagonal entry as \(\begin{equation*} D_i = {\left\lbrace \begin{array}{ll}1 & i \in T_0, \\ 1/\varepsilon & i \notin T_0. \end{array}\right.} \end{equation*}\) For any \(j\ge 1\), the party \(P_j\) constructs a vector \({\bf b}_j\) by setting the \(i\)th entry as \(\begin{equation*} {\bf b}_j(i) = {\left\lbrace \begin{array}{ll}1 & i \in T_j, \\ \eta & i \notin T_j. \end{array}\right.} \end{equation*}\) Using a similar idea to the proof of Theorem 10, we want to construct a linear regression problem such that the optimal solution is close to \(D^{-1}({\bf b}_1+\cdots +{\bf b}_{r-1})\). For this, we consider the following linear regression problem: \(\begin{equation*} \mathop{\text{argmin}}\limits_{{\bf x}} \Vert A{\bf x}-{\bf b}\Vert , \quad \text{ where } A = \begin{pmatrix}D \\ \xi I_n \\ \vdots \\ \xi I_n \\ \end{pmatrix}, \, {\bf b}= \begin{pmatrix}{\bf 0} \\ {\bf b}_1 \\ \vdots \\ {\bf b}_{r-1} \\ \end{pmatrix}. \end{equation*}\) Up to normalization, the optimal solution is (21) \(\begin{equation} | {\bf x}_{\rm opt} \rangle = (D^2 + (r-1)\xi ^2 I_n)^{-1} | {\bf b}_1+\cdots +{\bf b}_{r-1} \rangle . \end{equation}\)

It is easy to see that the \(i\)th diagonal entry of \((D^2 + (r-1)\xi ^2 I_n)^{-1}\) equals (22) \(\begin{equation} {\left\lbrace \begin{array}{ll}\displaystyle \frac{1}{1 + (r-1)\xi ^2} & i \in T_0, \\ \displaystyle \frac{1}{\varepsilon ^{-2} + (r-1)\xi ^2} & i \notin T_0. \end{array}\right.} \end{equation}\) We use \(c_i\) to denote the \(i\)th entry of \({\bf b}_1+\cdots +{\bf b}_{r-1}\). Then, it is easy to check that if \(i\in T_1\cup \cdots \cup T_{r-1}\), then we have \(1\le c_i\le r-1\). Otherwise, \(c_i=(r-1)\eta\).

The quantum state of the optimal solution is \(\begin{equation*} | {\bf x}_{\rm opt} \rangle = \sum _{i\in T_0} \frac{c_i}{1 + (r-1)\xi ^2} \, | i \rangle + \sum _{j\notin T_0} \frac{c_j}{\varepsilon ^{-2} + (r-1)\xi ^2} \, | j \rangle . \end{equation*}\) We can reformulate it more precisely as follows: \(\begin{eqnarray*} | {\bf x}_{\rm opt} \rangle &=& \sum _{i\in T_0\cap (T_1\cup \cdots \cup T_{r-1})} \frac{c_i}{1 + (r-1)\xi ^2} \, | i \rangle +\sum _{j\in T_0\backslash (T_1\cup \cdots \cup T_{r-1})} \frac{(r-1)\eta }{1 + (r-1)\xi ^2} \, | j \rangle \\ && +\, \sum _{k\in (T_1\cup \cdots \cup T_{r-1}) \backslash T_0} \frac{c_k}{\varepsilon ^{-2} + (r-1)\xi ^2} \, | k \rangle + \sum _{l\notin T_0\cup T_1\cup \cdots \cup T_{r-1} } \frac{(r-1)\eta }{\varepsilon ^{-2} + (r-1)\xi ^2} \, | l \rangle , \end{eqnarray*}\) where \(1\le c_i,c_k \le r-1\). The total probability weights before normalization of the last three summations are, respectively, bounded by \(\begin{equation*} n(r-1)^2\eta ^2=O(1), \quad n(r-1)^2\varepsilon ^4=O(1), \quad n(r-1)^2\varepsilon ^4\eta ^2=O(\varepsilon ^4). \end{equation*}\) ⁸ The amplitude of the first summation is at least \(\Omega (1)\) if \(T_0\cap (T_1\cup \cdots \cup T_{r-1}) \ne \emptyset\). In this case, if measuring the state \(| {\bf x}_{\rm opt} \rangle\) in the computational basis, then we will see an index from the intersection with a probability of at least \(1/3\). We can assume that the size of the intersection has order 1, because the disjointness problem remains hard with this promise. So, if the intersection is nonempty, then we will see the same index many times. Otherwise, we will see many different indices uniformly. This reduction shows that the lower bound for any classical protocol of solving linear regression (13) is \(\Omega (r n)\).

From Equation (22), it is easy to see that the condition number of \(A\) is \(\kappa = \Theta (1/\varepsilon) = \Theta (\sqrt {n})\). By Lemma 18, we obtain the claimed lower bound for quantum protocols. □
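The reduction in this proof can be explored numerically on a small instance. The sketch below is only an illustration under our own assumptions: indices are 0-based, the particular sets \(T_j\) and the size \(n=64\), \(r=4\) are arbitrary, and the solution is computed classically with NumPy. It builds \(A\) and \({\bf b}\) from the sets, checks the least-squares solution against the closed form of Equation (21) (which holds up to the scalar \(\xi\)), and returns the measurement distribution.

```python
import numpy as np


def regression_instance(sets, n):
    """Build (A, b) from T_0, ..., T_{r-1} as in the reduction (0-based indices)."""
    r = len(sets)
    eps, xi, eta = 1 / np.sqrt(n), 1 / np.sqrt(r), 1 / (r * np.sqrt(n))
    D = np.full(n, 1 / eps)
    D[list(sets[0])] = 1.0
    b_blocks = []
    for T in sets[1:]:
        bj = np.full(n, eta)
        bj[list(T)] = 1.0
        b_blocks.append(bj)
    A = np.vstack([np.diag(D)] + [xi * np.eye(n)] * (r - 1))
    b = np.concatenate([np.zeros(n)] + b_blocks)
    closed_form = xi * sum(b_blocks) / (D**2 + (r - 1) * xi**2)
    return A, b, closed_form


def outcome_probs(sets, n):
    """Distribution obtained by measuring the normalized solution state."""
    A, b, closed_form = regression_instance(sets, n)
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    assert np.allclose(x, closed_form)   # Equation (21), up to the factor xi
    return x**2 / np.sum(x**2)


n = 64
T0 = set(range(10))
hit = [T0, {0} | set(range(20, 30)), set(range(30, 40)), set(range(40, 50))]
miss = [T0, set(range(20, 30)), set(range(30, 40)), set(range(40, 50))]
p_hit, p_miss = outcome_probs(hit, n), outcome_probs(miss, n)
```

In this toy instance, `p_hit[0]` is the probability of observing the single intersecting index, while `p_miss.max()` is the largest probability of any index when the sets are disjoint.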

In References [9, 17], it was shown that QSVT can be dequantized, which implies that many quantum algorithms based on QSVT do not have exponential speedups in terms of time and query complexity. When studying communication complexity, we can still use QSVT due to Proposition 14; however, the quantum speedups can be exponential in terms of communication complexity. This suggests that it is quite hard to use the techniques for dequantized algorithms to propose efficient classical protocols with low communication complexity.

5 HAMILTONIAN SIMULATION

As a byproduct, in this section, we consider the problem of Hamiltonian simulation in the coordinator model. We define the problem as follows: Suppose \(P_i\) holds a Hamiltonian \(H_i\) of dimension \(n\), the referee holds a quantum state \(| \psi \rangle\), and their goal is to prepare the state \(e^{i(H_0+\cdots +H_{r-1})t} | \psi \rangle\) quantumly or sample from it classically. By Proposition 14, we can use QSVT to achieve the goal. The lower bounds are also corollaries of the lower bounds we obtained previously.

We start from the simple case: the Alice-Bob model. Suppose Alice has a Hamiltonian \(H\) of dimension \(n\), Bob has a quantum state \(| \psi \rangle\), and their goal is to prepare the state \(e^{iHt} | \psi \rangle\) quantumly or sample from it classically. As a corollary of Theorems 7 and 10, we have the following result:

Proposition 20.

Suppose Alice has a Hamiltonian matrix \(H \in \mathbb {C}^{n\times n}\) and Bob has a quantum state \(| \psi \rangle \in \mathbb {C}^{n}\). Then, the quantum communication complexity of outputting \(e^{iHt} | \psi \rangle\) is

(1)	\(\Theta (\log n)\) if the communication is 1-way from Bob to Alice or 2-way.
(2)	\(\Theta (n \log n)\) if the communication is 1-way from Alice to Bob.

Proof.

We apply Theorem 7 to \(A=e^{iHt}\) and \({\bf b}= | \psi \rangle\). Now, \(A\) is unitary. □

In the classical setting, the goal is to sample from the state \(e^{iHt} | \psi \rangle\). Regarding the lower bound for classical protocols, we have the following result by Theorem 12. The result is quite obvious, because we can always write a unitary as \(e^{iHt}\) for some \(H\):

Proposition 21.

Assume that Alice has a Hamiltonian matrix \(H \in \mathbb {C}^{n\times n}\) and Bob has a vector \(| \psi \rangle \in \mathbb {C}^{n}\). Then, \(\Omega (n)\) bits of communication are required to sample from \(e^{iHt} | \psi \rangle\) if the communication is 2-way.

Proof.

We still use the notation defined in the proof of Theorem 12. Note that the Hadamard matrix has the decomposition \(H_2=e^{i\frac{\pi }{2}(I_2-H_2) }\). Let \(\begin{equation*} L = \sum _{j=1}^d I_2^{\otimes (j-1)}\otimes (I_2-H_2)\otimes I_2^{\otimes (d-j)}, \end{equation*}\) then \(H_2^{\otimes d} = e^{i\frac{\pi }{2}L }\). In the proof of Theorem 12, we can also consider the distribution \(D_f H_2^{\otimes d} D_f | g \rangle\), which is equivalent to \(H_2^{\otimes d} D_f | g \rangle\). Now, we have \(D_f H_2^{\otimes d} D_f = e^{i\frac{\pi }{2} D_f L D_f }\). So, similar to the proof of Theorem 12, Alice constructs the Hamiltonian \(D_f L D_f\) and Bob constructs the quantum state \(| g \rangle\). If they can sample from the resulting state, then they can solve the Distributed Fourier Sampling problem. Hence, the lower bound of classical protocols is \(\Omega (n)\). □
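The exponential identities used in this proof are easy to verify numerically. The following sketch assumes that \(D_f\) is a diagonal matrix with \(\pm 1\) entries (so that \(D_f^2=I\)); this form is our assumption for illustration, and a random sign pattern stands in for the actual \(D_f\). The helper `exp_i` is our own and computes the matrix exponential of a Hermitian matrix by eigendecomposition.

```python
import numpy as np
from functools import reduce


def exp_i(hermitian, t):
    """Return exp(i * t * hermitian) via an eigendecomposition."""
    w, V = np.linalg.eigh(hermitian)
    return (V * np.exp(1j * t * w)) @ V.conj().T


def kron_all(mats):
    """Kronecker product of a list of matrices, left to right."""
    return reduce(np.kron, mats)


d = 3
I2 = np.eye(2)
H2 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# L = sum_j I2^{(j-1)} (x) (I2 - H2) (x) I2^{(d-j)}
L = sum(kron_all([I2] * j + [I2 - H2] + [I2] * (d - j - 1)) for j in range(d))
H_d = kron_all([H2] * d)

rng = np.random.default_rng(3)
Df = np.diag(rng.choice([-1.0, 1.0], size=2**d))  # hypothetical stand-in for D_f

one_qubit_ok = np.allclose(exp_i(I2 - H2, np.pi / 2), H2)
tensor_ok = np.allclose(exp_i(L, np.pi / 2), H_d)
conjugated_ok = np.allclose(exp_i(Df @ L @ Df, np.pi / 2), Df @ H_d @ Df)
```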

Finally, as an application of Proposition 14, we consider the communication complexity of Hamiltonian simulation in the coordinator model when there are multiple parties.

Proposition 22.

For any \(i\in \lbrace 0,\ldots ,r-1\rbrace\), suppose the party \(P_i\) receives a Hamiltonian \(H_i\) of dimension \(n\). Suppose the referee receives a quantum state \(| \psi \rangle\). Then, in the quantum coordinator model, there is a quantum protocol that costs (23) \(\begin{equation} O\left((r\log n) \left(\sum _{i=0}^{r-1} \Vert H_i\Vert \, |t| + \frac{\log (1/\varepsilon)}{\log (e+(\sum _{i=0}^{r-1} \Vert H_i\Vert \, |t|)^{-1}\log (1/\varepsilon))} \right) \right) \end{equation}\) qubits of communication to prepare the state \(e^{i(H_0+\cdots +H_{r-1})t} | \psi \rangle\) up to error \(\varepsilon\).

Proof.

By Reference [12, Lemma 59], there is a polynomial that approximates \(e^{itx}\) for \(x\in [-1,1]\) up to error \(\varepsilon\) with degree \(\begin{equation*} d = O\left(|t| + \frac{\log (1/\varepsilon)}{\log (e+|t|^{-1}\log (1/\varepsilon))} \right). \end{equation*}\) By Proposition 14, the referee can construct a \((1,3+\log r,0)\) block-encoding of \(e^{it\sum _i H_i/\alpha }\) with \(O(rd\log n)\) qubits of communication, where \(\alpha = \sum _{i=0}^{r-1} \Vert H_i\Vert\). We replace \(t\) with \(\alpha t\). Putting it all together, we obtain the claimed result. □
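To illustrate the rescaling \(t \mapsto \alpha t\), the sketch below approximates \(e^{i\alpha t x}\) on \([-1,1]\) by a polynomial and applies it to the eigenvalues of \(\sum _i H_i/\alpha\). This is a classical check under our own assumptions: Chebyshev interpolation is used in place of the specific polynomial of Reference [12], the degree is picked generously by hand rather than from the bound above, and the random Hamiltonians are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(4)


def random_hermitian(n):
    """Random n x n Hermitian matrix (illustrative input)."""
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2


r, n, t = 3, 8, 1.5                            # hypothetical sizes and time
H_parts = [random_hermitian(n) for _ in range(r)]
alpha = sum(np.linalg.norm(Hi, 2) for Hi in H_parts)
H = sum(H_parts)

tau = alpha * t                                # rescaled evolution time
deg = int(np.ceil(2 * tau)) + 20               # generous degree, chosen by hand
cos_coef = C.chebinterpolate(lambda x: np.cos(tau * x), deg)
sin_coef = C.chebinterpolate(lambda x: np.sin(tau * x), deg)

w, V = np.linalg.eigh(H)
x = w / alpha                                  # eigenvalues of H / alpha
poly_vals = C.chebval(x, cos_coef) + 1j * C.chebval(x, sin_coef)
U_poly = (V * poly_vals) @ V.conj().T          # polynomial applied to H / alpha
U_exact = (V * np.exp(1j * t * w)) @ V.conj().T
err = np.linalg.norm(U_poly - U_exact, 2)
```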

6 MULTIPARTY QUANTUM COMMUNICATION COMPLEXITY OF DISJOINTNESS

In this section, we complete the proof of our lower bounds in the coordinator model via proving bounds on the quantum communication complexity of the disjointness problem in the multiparty case. We will consider a quantum model that is analogous to the classical coordinator model. Recall that in the coordinator model, there are \(r\) parties \(P_1,\ldots ,P_r\), and there is a coordinator (here, we call it the referee) \(R\). The communication is 2-way between \(P_i\) and \(R\). If \(P_i\) wants to send a message to \(P_j\), then \(P_i\) has to send the message to \(R\) first, then \(R\) will send the message to \(P_j\). In the quantum case, we define a similar model. Different from the previous quantum multiparty model [23] that considers the blackboard model (i.e., if \(P_i\) sends a message, then everyone else can see it), here, we focus on the coordinator model (i.e., if \(P_i\) sends a message, then only the referee can see the message). This model is almost equivalent to the message-passing (“number in hand”) model (i.e., there is no referee in this model, the party \(P_i\) can send a message directly to another party \(P_j\), and only \(P_j\) can see the message) up to a factor of 2.

In the model, we define the input as (24) \(\begin{equation} | {\rm In} \rangle = | \phi (x_1) \rangle _{P_1} \cdots | \phi (x_r) \rangle _{P_r} | \vec{0} \rangle _{C_1} \cdots | \vec{0} \rangle _{C_r} | \phi (y) \rangle _R , \end{equation}\) where \(x_i\) is the initial information in \(P_i\)’s hand, \(y\) is the initial information in the referee’s hand. The states \(| \phi (x_i) \rangle _{P_i}\), \(| \phi (y) \rangle _{R}\) depend on the initial information. The register \(| \vec{0} \rangle _{C_i}\) is the \(i\)th channel. A quantum protocol is a quantum algorithm that applies a series of unitaries of forms (25) \(\begin{equation} U_{P_1,C_1}, \quad \cdots , \quad U_{P_r,C_r}, \quad U_{C_i,R} \end{equation}\) to \(| {\rm In} \rangle\). The unitary \(U_{P_i,C_i}\) operates on the space of \(P_i\) and the channel \(C_i\). The unitary \(U_{C_i,R}\) operates on the \(i\)th channel and the space of the referee. At the beginning of a quantum protocol, \(P_1\) applies a unitary of the form \(U_{P_1,C_1}\) to his space and the channel \(C_1\). This corresponds to his private computation as well as to putting a message on the channel \(C_1\). The length of this first message is the number of channel qubits affected by \(P_1\)’s operation. In the second round, the referee speaks and applies a unitary of the form \(U_{C_1,R}\) to his space and the first channel. Then, \(P_2\) applies \(U_{P_2,C_2}\), and so on. If the referee speaks in the end, then a quantum protocol of \(R:=2rt\) rounds defines an output state of the form (26) \(\begin{equation} | {\rm Out} \rangle = \prod _{i=1}^t \, U_{C_r,R}^{(i,2r)} \, U_{P_r,C_r}^{(i,2r-1)} \, \cdots \, U_{C_1,R}^{(i,2)} \, U_{P_1,C_1}^{(i,1)} \, | {\rm In} \rangle . \end{equation}\) Here, for simplicity, we assume that the number of rounds is an integer multiple of \(2r\). We assume that, at the end of the protocol, the referee’s register contains the answer. A measurement of this register then determines the output of the protocol.
The quantum communication complexity is the number of qubits used in the whole procedure, which is \(t(r+1)T\). Here, \(T\) is the total number of qubits in the channels.

Below, we consider the multiparty disjointness problem in this quantum coordinator model. The disjointness problem we are mainly interested in is defined as follows: \(P_i\) has a subset \(x_i\) of \([n]\), and the players aim to determine if there is an \(i\ge 2\) such that \(x_1 \cap x_i \ne \emptyset\). Equivalently, define the Boolean function that describes the 2-party disjointness problem as (27) \(\begin{equation} f(x,y) = {\left\lbrace \begin{array}{ll} 1 & |x\wedge y| \ge 1, \\ 0 & |x\wedge y| = 0. \end{array}\right.} \end{equation}\) Then, the disjointness problem defined above aims to compute (28) \(\begin{equation} f^r_{{\rm OR}}(x_1,x_2,\ldots ,x_r) = f(x_1,x_2) \vee \cdots \vee f(x_1,x_r) . \end{equation}\)
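For concreteness, the functions in Equations (27) and (28) can be written as a short classical reference implementation. This is only a sketch of the definitions, with inputs represented as Python sets; the function names are our own.

```python
def f(x, y):
    """Two-party disjointness indicator (27): 1 if the sets x and y intersect."""
    return 1 if set(x) & set(y) else 0


def f_or(x1, *others):
    """Multiparty function (28): OR of f(x1, x_i) over all i >= 2."""
    return int(any(f(x1, xi) for xi in others))


example_yes = f_or({1, 4}, {2, 3}, {4, 5})   # x_1 meets x_3 in the element 4
example_no = f_or({1, 4}, {2, 3}, {5, 6})    # x_1 is disjoint from every x_i
```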

We use \(Q_\varepsilon (f^r_{{\rm OR}})\) to denote the quantum communication complexity of computing \(f^r_{{\rm OR}}\) with error \(\varepsilon\). Namely, there is a quantum protocol without prior entanglement that computes \(f^r_{{\rm OR}}\) of cost \(Q_\varepsilon (f^r_{{\rm OR}})\) such that the acceptance probability on every \((x_1,x_2,\ldots ,x_r)\) is at most \(\varepsilon\) whenever \(f^r_{{\rm OR}}(x_1,x_2,\ldots ,x_r) = 0\) and at least \(1-\varepsilon\) whenever \(f^r_{{\rm OR}}(x_1,x_2,\ldots ,x_r) = 1\). We use \(Q^*_\varepsilon (f^r_{{\rm OR}})\) to denote the quantum communication complexity with prior entanglement. The main result we aim to prove is as follows:

Theorem 23.

\(Q^*_{\varepsilon }(f^r_{{\rm OR}}) = \Theta (r\sqrt {n})\).

Proof.

The upper bound is obvious. Below, we focus on the proof of the lower bound. For each \(i\in \lbrace 2,\ldots ,r\rbrace\), let Bob play the role of \(P_i\) and Alice play the role of the remaining parties as well as the referee. If there is a protocol that computes \(f^r_{{\rm OR}}\), then, by setting the other subsets \(x_j\) with \(j\ne 1, i\) to the empty set, the protocol allows us to determine if \(x_1\cap x_i=\emptyset\). This is a two-party disjointness problem, which requires \(\Omega (\sqrt {n})\) qubits of communication. This means that in this protocol, \(P_i\) must exchange at least \(\Omega (\sqrt {n})\) qubits with the referee. Since the channels \(C_i\) are distinct, these costs add up. Therefore, in total, the communication complexity is at least \(\Omega (r\sqrt {n})\). □

7 CONNECTIONS BETWEEN COMMUNICATION COMPLEXITY AND QUANTUM-INSPIRED CLASSICAL ALGORITHMS

Quantum-inspired classical algorithms use a model that allows sampling and query (SQ) access to the input data. Using this model, it was proved that, classically, we could solve some problems, e.g., linear regressions, in cost polylog in the dimension in the low-rank case [9]. Below, we discuss the connection between communication complexity and quantum-inspired classical algorithms. We will mainly focus on the Alice-Bob model.

First, we recall some definitions about quantum-inspired classical algorithms [9, Definitions 2.5 and 2.10]. For a vector \({\bf b}=(b_1,\ldots ,b_n)\in \mathbb {C}^n\), we have \(SQ({\bf b})\) if we can do the following three things: (i) for any \(i\), we can query for \(b_i\); (ii) we can sample from the distribution defined by \(\text{Prob}(i)=|b_i|^2/\Vert {\bf b}\Vert ^2\); (iii) we can query for the norm \(\Vert {\bf b}\Vert\). For a matrix \(A\in \mathbb {C}^{m\times n}\), we have \(SQ(A)\) if (i) we have \(SQ(A_{i*})\) for any \(i\), where \(A_{i*}\) is the \(i\)th row of \(A\); (ii) let \({\bf a}=(\Vert A_{1*}\Vert ,\ldots ,\Vert A_{m*}\Vert)\), then we have \(SQ({\bf a})\).
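The SQ access model can be mimicked by a small data structure. The classes below are a minimal sketch of the definitions above, not an efficient implementation: the class and method names are our own, all vectors are assumed to be nonzero, and the data is stored explicitly rather than in the tree structure typically used to support fast sampling.

```python
import numpy as np


class SQVector:
    """Sampling-and-query access SQ(b) to a fixed nonzero vector b (illustrative)."""

    def __init__(self, b, seed=0):
        self._b = np.asarray(b, dtype=complex)
        self._norm = np.linalg.norm(self._b)
        self._probs = np.abs(self._b) ** 2 / self._norm**2
        self._rng = np.random.default_rng(seed)

    def query(self, i):
        """(i) Return the entry b_i."""
        return self._b[i]

    def sample(self):
        """(ii) Return index i with probability |b_i|^2 / ||b||^2."""
        return int(self._rng.choice(len(self._b), p=self._probs))

    def norm(self):
        """(iii) Return ||b||."""
        return self._norm


class SQMatrix:
    """SQ(A): SQ access to every row, plus SQ access to the vector of row norms."""

    def __init__(self, A, seed=0):
        A = np.asarray(A, dtype=complex)
        self.rows = [SQVector(row, seed + k + 1) for k, row in enumerate(A)]
        self.row_norms = SQVector([row.norm() for row in self.rows], seed)


sq_b = SQVector([3.0, 0.0, 4.0])
samples = [sq_b.sample() for _ in range(200)]
sq_A = SQMatrix([[1.0, 0.0], [0.0, 2.0]])
```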

For the linear regression problem \(\text{argmin}\Vert A{\bf x}-{\bf b}\Vert\), by a quantum-inspired classical algorithm of complexity \(O(T)\), we mean we can compute \(SQ({\bf x}_*)\), where \(\Vert {\bf x}_*-A^+{\bf b}\Vert \le \varepsilon \Vert A^+{\bf b}\Vert\), by using \(SQ(A), SQ({\bf b})\) \(O(T)\) times and \(O(T)\) other arithmetic operations. For example, assuming \({\bf b}\) lies in the column space of \(A\), then there is a quantum-inspired classical algorithm for linear regression with complexity \(\widetilde{O}(\Vert A\Vert _F^4\Vert A\Vert ^2\Vert A^+\Vert ^6/\varepsilon ^2)\) [31]. Without the assumption, the complexity is \(\widetilde{O}(\Vert A\Vert _F^6\Vert A\Vert ^6\Vert A^+\Vert ^{12}/\varepsilon ^4\gamma ^2)\) [11].

In the Alice-Bob model, we assume the communication is 2-way. By communicating with each other once, Alice can use \(SQ({\bf b})\) or Bob can use \(SQ(A)\) once. Therefore, it is easy to obtain the following result:

Proposition 24.

If there is a quantum-inspired classical algorithm for \(\text{argmin}\Vert A{\bf x}-{\bf b}\Vert\) of complexity \(O(T)\), then there is a classical protocol to solve this linear regression in the Alice-Bob model of communication complexity \(O(T)\), where Alice holds \(A\) and Bob holds \({\bf b}\), the communication is 2-way, and the goal is to sample from a distribution \(\varepsilon\)-close to the one defined by \(| A^+{\bf b} \rangle\).

Similar to the proof of Theorem 12, using the hardness of the Distributed Fourier Sampling problem (see Proposition 2), it is easy to conclude that in the low-rank case, the classical communication complexity is lower bounded by the Rank(\(A\)), while the quantum communication is \(O(1)\) for well-conditioned linear regressions.

Next, let us see two examples that suggest that low rank is not the only assumption for the efficiency of quantum-inspired classical algorithms. We consider the disjointness problem. Recall that, in this problem, Alice and Bob, respectively, have \({\bf a}=(0,a_1,\ldots ,a_n), {\bf b}=(0,b_1,\ldots ,b_n)\in \lbrace 0,1\rbrace ^{n+1}\), and they want to determine if there is an \(i\) such that \(a_i=b_i=1\). Without loss of generality, we assume that the Hamming weights \(|{\bf a}|,|{\bf b}|=\Theta (n)\). Consider the following construction: \(\begin{equation*} A=| 0 \rangle \langle 0| + \frac{1}{n} | {\bf a} \rangle \langle {\bf a}|, \quad {\bf b}= | 0 \rangle + | {\bf b} \rangle , \end{equation*}\) where \(| {\bf a} \rangle , | {\bf b} \rangle\) are the quantum states of \({\bf a},{\bf b}\), respectively. Now, \(A\) has rank 2, and the solution is \(\begin{equation*} A^+{\bf b}= | 0 \rangle + n \langle {\bf a}|{\bf b} \rangle | {\bf a} \rangle . \end{equation*}\) If there is no \(i\) such that \(a_i=b_i=1\), then \(\langle {\bf a}|{\bf b} \rangle =0\), so \(A^+{\bf b}= | 0 \rangle\). If there is an \(i\) such that \(a_i=b_i=1\), then we have \(A^+{\bf b}\approx | 0 \rangle + | {\bf a} \rangle\). Here, we assumed that there is only one such \(i\), which is the worst case. Thus, if we can sample from the solution, then in the latter case, we will see some indices from \(\lbrace 1,\ldots ,n\rbrace\) with probability about 1/2. As a result, we can solve the disjointness problem. This means \(\Omega (n)\) bits of communication are required to solve this linear regression. It also means that to solve this linear regression, any quantum-inspired classical algorithm costs \(\Omega (n)\). In this example, \(A\) has a low rank, while the complexity is linear in \(n\). This is indeed not a contradiction. 
In this example, we have \(\Vert A\Vert _F = \Theta (1), \Vert A^+\Vert = n\), and usually quantum-inspired classical algorithms are highly affected by \(\Vert A\Vert _F\Vert A^+\Vert\), which is \(\Theta (n)\) now.
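This first example can be reproduced numerically. The sketch below is a classical illustration with hypothetical inputs (\(n=100\), Hamming weights 50, at most one common index); the function name is ours. It checks the closed form of \(A^+{\bf b}\), computes the probability of observing an index in \(\lbrace 1,\ldots ,n\rbrace\), and evaluates \(\Vert A\Vert _F\Vert A^+\Vert\).

```python
import numpy as np


def first_example(a_bits, b_bits):
    """Return (probability of an index in {1..n}, ||A||_F * ||A^+||)."""
    a = np.concatenate([[0.0], a_bits])
    b = np.concatenate([[0.0], b_bits])
    n = len(a_bits)
    ket_a, ket_b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    e0 = np.zeros(n + 1)
    e0[0] = 1.0
    A = np.outer(e0, e0) + np.outer(ket_a, ket_a) / n
    rhs = e0 + ket_b
    A_pinv = np.linalg.pinv(A)
    x = A_pinv @ rhs
    # Closed form from the text: A^+ b = |0> + n <a|b> |a>.
    assert np.allclose(x, e0 + n * (ket_a @ ket_b) * ket_a)
    probs = x**2 / np.sum(x**2)
    scale = np.linalg.norm(A, "fro") * np.linalg.norm(A_pinv, 2)
    return probs[1:].sum(), scale


n = 100
a_bits = np.r_[np.ones(50), np.zeros(50)]               # ones at positions 1..50
b_hit = np.r_[np.zeros(49), np.ones(50), np.zeros(1)]   # ones at positions 50..99
b_miss = np.r_[np.zeros(50), np.ones(50)]               # ones at positions 51..100
p_hit, scale = first_example(a_bits, b_hit)
p_miss, _ = first_example(a_bits, b_miss)
```

With these particular weights, the probability in the intersecting case is a constant that differs from 1/2, which is consistent with the \(\Theta (n)\) assumption on the Hamming weights.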

Let us consider another example. We set \(\begin{equation*} A=| 0 \rangle \langle 0| + | {\bf a} \rangle \langle {\bf a}|, \quad {\bf b}= \frac{1}{n} | 0 \rangle + | {\bf b} \rangle . \end{equation*}\) Now, the solution is \(\begin{equation*} A^+{\bf b}= \frac{1}{n} | 0 \rangle + \langle {\bf a}|{\bf b} \rangle | {\bf a} \rangle . \end{equation*}\) If there is no \(i\) such that \(a_i=b_i=1\), then \(A^+{\bf b}= \frac{1}{n} | 0 \rangle\). Otherwise, we have \(A^+{\bf b}\approx \frac{1}{n}(| 0 \rangle + | {\bf a} \rangle)\). Similarly, by measuring, we can also solve the disjointness problem. In this example, \(\Vert A\Vert _F\Vert A^+\Vert =\sqrt {2}\). However, now \({\bf b}\) is far away from the column space of \(A\), i.e., \(\gamma :=\Vert AA^+{\bf b}\Vert /\Vert {\bf b}\Vert \approx 1/n\). So, similar to quantum algorithms [8], quantum-inspired classical algorithms for linear regressions are also affected by \(\gamma\).
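The second example admits the same kind of numerical check. In this sketch (again with hypothetical inputs: \(n=100\), Hamming weights 50, one common index), we compute \(\Vert A\Vert _F\Vert A^+\Vert\) and \(\gamma\) directly with NumPy.

```python
import numpy as np

n = 100
a = np.r_[0.0, np.ones(50), np.zeros(50)]               # leading 0, then a_1..a_n
b = np.r_[0.0, np.zeros(49), np.ones(50), np.zeros(1)]  # overlaps a in one position
ket_a, ket_b = a / np.linalg.norm(a), b / np.linalg.norm(b)
e0 = np.eye(n + 1)[0]

A = np.outer(e0, e0) + np.outer(ket_a, ket_a)
rhs = e0 / n + ket_b
A_pinv = np.linalg.pinv(A)
x = A_pinv @ rhs

frob_times_pinv = np.linalg.norm(A, "fro") * np.linalg.norm(A_pinv, 2)
gamma = np.linalg.norm(A @ x) / np.linalg.norm(rhs)     # ||A A^+ b|| / ||b||
```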

Usually, it is not easy to analyze the lower bounds for classical computation, and communication complexity provides us with an efficient tool to prove some nontrivial lower bounds. So, it is possible that we can find some other interesting properties of quantum-inspired classical algorithms through communication complexity.

8 CONCLUSIONS

In this work, we showed that quantum computers have provable polynomial or exponential speedups for solving linear regression problems and Hamiltonian simulation in terms of communication complexity. We also found that in the quantum coordinator model, we can still efficiently use the quantum singular value transformation technique. Because of this, we believe that for many other linear algebra problems, it is possible to obtain provable quantum speedups using this technique in terms of communication complexity.

ACKNOWLEDGMENTS

No new data were created during this study.

APPENDICES

A PROOF OF THEOREM 11

From the proof of Theorem 10, without loss of generality, we can simply assume that \(S,T \subseteq [n]\) and \(|S|,|T|=\Theta (n)\).

We prove the first and third claims together. Alice and Bob, respectively, construct a diagonal matrix \(A\) and a vector \({\bf b}\) by setting \(\begin{equation*} A_{ii} = {\left\lbrace \begin{array}{ll}\sqrt {\varepsilon } & i \in S, \\ 1/\sqrt {\varepsilon } & i \notin S, \end{array}\right.} \quad b_i = {\left\lbrace \begin{array}{ll}1/\sqrt {\varepsilon } & i \in T, \\ \sqrt {\varepsilon } & i \notin T, \end{array}\right.} \end{equation*}\) where \(\varepsilon = 1/\sqrt {n}\). Then, the optimal solution is \(\begin{equation*} {\bf x}_{\rm opt} = \frac{1}{\varepsilon } \sum _{i\in S \cap T} | i \rangle + \sum _{j\in (S \backslash T) \cup (T \backslash S)} | j \rangle + \varepsilon \sum _{k\in \overline{S \cup T} } | k \rangle . \end{equation*}\) So, \(\begin{eqnarray*} \Vert {\bf x}_{\rm opt}\Vert ^2 = \frac{|S \cap T|}{\varepsilon ^2} + |(S \backslash T) \cup (T \backslash S)| + \varepsilon ^2 |\overline{S \cup T}|. \end{eqnarray*}\)

If \(S\cap T \ne \emptyset\), then the norm of \({\bf x}_{\rm opt}\) is dominated by the first term, so \(\Vert {\bf x}_{\rm opt}\Vert ^2 = \Theta (n |S\cap T|)\). We can also compute that \(\Vert A^+\Vert ^2 = 1/\varepsilon = \sqrt {n}\) and \(\begin{equation*} \Vert {\bf b}\Vert ^2 = \frac{|S \cap T|}{\varepsilon } + \varepsilon |S \backslash T| + \frac{|T \backslash S|}{\varepsilon } + \varepsilon |\overline{S \cup T}| = \Theta (\sqrt {n} |T|). \end{equation*}\) Therefore, \(\begin{equation*} \frac{\Vert A^+\Vert ^2 \Vert {\bf b}\Vert ^2}{\Vert {\bf x}_{\rm opt}\Vert ^2} = \Theta \left(\frac{|T|}{|S\cap T|}\right). \end{equation*}\)

If \(S\cap T = \emptyset\), then \(\Vert {\bf x}_{\rm opt}\Vert ^2 = \Theta (|S|+|T|) = \Theta (n)\). So, \(\begin{equation*} \frac{\Vert A^+\Vert ^2 \Vert {\bf b}\Vert ^2}{\Vert {\bf x}_{\rm opt}\Vert ^2} = \Theta (n) =\Theta (|T|). \end{equation*}\) Note that \(\Theta ({|T|}/{\max (1,|S\cap T|)})\) is the quantum communication complexity for the disjointness problem if the communication is 1-way. If it is 2-way, then the complexity is \(\Theta (\sqrt {{|T|}/{\max (1,|S\cap T|)}})\).
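The two cases above can be checked on a concrete instance. The sketch below is illustrative only: it uses 0-based indices, a hypothetical choice of \(n=10^4\) with \(|S|=|T|=n/2\), and a function name of our own. It evaluates \(\Vert A^+\Vert ^2\Vert {\bf b}\Vert ^2/\Vert {\bf x}_{\rm opt}\Vert ^2\) for the diagonal construction.

```python
import numpy as np


def ratio(S, T, n):
    """Return ||A^+||^2 ||b||^2 / ||x_opt||^2 for the diagonal instance above."""
    eps = 1 / np.sqrt(n)
    in_S = np.zeros(n, dtype=bool)
    in_T = np.zeros(n, dtype=bool)
    in_S[list(S)] = True
    in_T[list(T)] = True
    A_diag = np.where(in_S, np.sqrt(eps), 1 / np.sqrt(eps))
    b = np.where(in_T, 1 / np.sqrt(eps), np.sqrt(eps))
    x_opt = b / A_diag                       # A is diagonal and invertible
    pinv_norm_sq = (1 / A_diag.min()) ** 2   # equals 1 / eps when S is nonempty
    return pinv_norm_sq * np.sum(b**2) / np.sum(x_opt**2)


n = 10_000
S = range(0, 5000)
T_hit = range(4990, 9990)      # intersection with S has size 10
T_miss = range(5000, 10_000)   # disjoint from S
r_hit, r_miss = ratio(S, T_hit, n), ratio(S, T_miss, n)
```

Here `r_hit` should be compared with \(|T|/|S\cap T| = 500\) and `r_miss` with \(|T| = 5000\); agreement is expected only up to constant factors.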

Below, we prove the second claim. We will use the hardness of the index problem. Alice has a (0,1)-matrix \(A\in \mathbb {R}^{m\times n}\) and Bob has an index \((i,j)\), and their goal is to determine \(A_{ij}\). It is known that the communication complexity of this problem is \(\Theta (mn)\). Without loss of generality, we assume that the number of 1s in each column of \(A\) is \(\Theta (m)\), and the number of 1s in each row is \(\Theta (n)\).⁹ We can reduce the index problem to a linear regression problem using a similar construction to the above. For the \(j\)th column, Alice constructs a diagonal matrix \(D_j\) as follows: We use \(D_j(k,k)\) to denote the \(k\)th diagonal entry, then define \(\begin{equation*} D_j(k,k) = {\left\lbrace \begin{array}{ll}1/\sqrt {\varepsilon } & A_{kj}=1, \\ \sqrt {\varepsilon } & A_{kj}=0. \end{array}\right.} \end{equation*}\) Now, \(\varepsilon =1/\sqrt {m}\). With the index \((i,j)\), Bob constructs a vector \({\bf b}_j\) as follows: We use \({\bf b}_j(k)\) to denote the \(k\)th entry, then define \(\begin{equation*} {\bf b}_j(k) = {\left\lbrace \begin{array}{ll}1/\sqrt {\varepsilon } & k=i, \\ \sqrt {\varepsilon } & k\ne i. \end{array}\right.} \end{equation*}\) In the end, they consider the linear regression problem \(\min _{{\bf x}} \Vert D{\bf x}-{\bf b}\Vert ,\) where \(\begin{equation*} D = \begin{pmatrix} D_1 \\ \vdots \\ D_n \end{pmatrix}_{mn\times m}, \quad {\bf b}= \begin{pmatrix} {\bf 0}_{m(j-1)} \\ {\bf b}_j \\ {\bf 0}_{m(n-j)} \\ \end{pmatrix}_{mn\times 1}. \end{equation*}\) Here, \({\bf 0}_{m(j-1)}, {\bf 0}_{m(n-j)}\) are zero vectors of length \(m(j-1)\) and \(m(n-j)\), respectively.

Note that the pseudoinverse of \(D\) is \(\begin{equation*} D^+ = \left(\sum _{i=1}^n D_i^2\right)^{-1} \begin{pmatrix} D_1 & \cdots & D_n \end{pmatrix}. \end{equation*}\) Thus, the optimal solution of the above constructed linear regression problem is \(\begin{equation*} {\bf x}_{\rm opt} = \left(\sum _{i=1}^n D_i^2\right)^{-1} D_j {\bf b}_j. \end{equation*}\) For convenience, we denote the \(k\)th diagonal entry of \(\sum _{i=1}^n D_i^2\) as \(d_k\). Then, (29) \(\begin{equation} d_k = \sum _{i=1}^n D_i(k,k)^2 = \frac{1}{\varepsilon } \sum _{i:A_{ki}=1} 1 +\varepsilon \sum _{i:A_{ki}=0} 1 = \Theta (n/\varepsilon) = \Theta (n\sqrt {m}), \end{equation}\) where we used the assumption that each row of \(A\) has \(\Theta (n)\) 1’s. If \(A_{ij}=1\), then we can reformulate \({\bf x}_{\rm opt}\) as follows: \(\begin{equation*} {\bf x}_{\rm opt} = \frac{1}{\varepsilon d_i} | i \rangle + \sum _{k:k\ne i, A_{kj}=1} \frac{1}{d_k} | k \rangle + \varepsilon \sum _{k:A_{kj}=0} \frac{1}{d_k} | k \rangle . \end{equation*}\) By measuring this state, we will see \(i\) with a constant probability. So, we will see \(i\) many times when repeating the measurements. If \(A_{ij}=0\), then the first term does not exist and the second term is summing over all \(k\) with \(A_{kj}=1\). In this case, we will see many different indices by measuring the solution state.

We now estimate the communication complexity of our quantum protocol for this linear regression problem. First, we can compute that \(\begin{eqnarray*} \Vert {\bf x}_{\rm opt}\Vert ^2 &=& \frac{1}{\varepsilon ^2 d_i^2} + \sum _{k:k\ne i, A_{kj}=1} \frac{1}{d_k^2} + \varepsilon ^2 \sum _{k:A_{kj}=0} \frac{1}{d_k^2} = \Theta (1/\varepsilon ^2n^2m) = \Theta (1/n^2), \\ \Vert {\bf b}\Vert ^2 &=& \frac{1}{\varepsilon } + \varepsilon (m-1) = \Theta (\sqrt {m}). \end{eqnarray*}\) Regarding the Frobenius norm of \(D^+\), note that we assumed that each column of \(A\) has \(\Theta (m)\) 1’s, so we have \(\begin{equation*} \Vert D^+\Vert _F^2 = \sum _{j=1}^n \left\Vert \frac{D_j}{\sum _{i=1}^n D_i^2}\right\Vert _F^2 = \Theta \left(n\left\Vert \frac{D_1}{\sum _{i=1}^n D_i^2}\right\Vert _F^2\right) = \Theta \left(\frac{n\Vert D_1\Vert _F^2}{n^2m}\right) = \Theta \left(\frac{nm/\varepsilon }{n^2m}\right) = \Theta \left(\frac{\sqrt {m}}{n}\right). \end{equation*}\) In the above, the second and third equalities follow from the facts that \(d_k = \Theta (n\sqrt {m})\) by Equation (29) and that each column of \(A\) has \(\Theta (m)\) 1’s, so \(\Vert D_j\Vert _F^2 = \Theta (\Vert D_1\Vert _F^2) = \Theta (m/\varepsilon)\). Thus, \(\begin{equation*} \frac{\Vert D^+\Vert _F^2\Vert {\bf b}\Vert ^2}{\Vert {\bf x}_{\rm opt}\Vert ^2} = \Theta (mn). \end{equation*}\) This matches the complexity of the index problem.
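The reduction above can be checked numerically on a small instance. The following is a minimal sketch, not part of the proof: it uses 0-based indices, a checkerboard matrix \(A\) chosen only so that every row and column is half 1’s, and a hypothetical helper name `reduction_statistics`. It uses the fact that \(D\) is a stack of diagonal matrices, so \({\bf x}_{\rm opt}(k) = D_j(k,k){\bf b}_j(k)/d_k\) and \(\Vert D^+\Vert _F^2 = \sum _k 1/d_k\).

```python
import math


def reduction_statistics(A, i, j):
    """Return (probability of measuring i, ||D^+||_F^2 ||b||^2 / ||x_opt||^2).

    A is an m x n 0/1 matrix given as a list of rows; (i, j) is Bob's index
    (0-based here, which is an assumption of this sketch).
    """
    m, n = len(A), len(A[0])
    eps = 1.0 / math.sqrt(m)
    big, small = 1.0 / math.sqrt(eps), math.sqrt(eps)

    # d_k = sum_c D_c(k,k)^2: contributes 1/eps per 1 in row k, eps per 0.
    d = [sum(1.0 / eps if A[k][c] == 1 else eps for c in range(n))
         for k in range(m)]

    # x_opt(k) = D_j(k,k) * b_j(k) / d_k, since every block of D is diagonal.
    x = []
    for k in range(m):
        D_jk = big if A[k][j] == 1 else small
        b_k = big if k == i else small
        x.append(D_jk * b_k / d[k])

    norm_x2 = sum(v * v for v in x)
    prob_i = x[i] ** 2 / norm_x2

    norm_b2 = 1.0 / eps + eps * (m - 1)
    # ||D^+||_F^2 = sum_c sum_k (D_c(k,k) / d_k)^2 = sum_k 1 / d_k.
    frob2 = sum(1.0 / dk for dk in d)
    return prob_i, frob2 * norm_b2 / norm_x2


if __name__ == "__main__":
    m = n = 64
    A = [[(k + c) % 2 for c in range(n)] for k in range(m)]
    print("A_ij = 1:", reduction_statistics(A, 1, 0))
    print("A_ij = 0:", reduction_statistics(A, 0, 0))
```

On such an instance, the probability of observing \(i\) should be bounded away from zero when \(A_{ij}=1\) and small when \(A_{ij}=0\), while the returned ratio stays within a constant factor of \(mn\).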

B FURTHER DETAILS OF THE PROOF OF THEOREM 15

In this Appendix, we briefly describe the variable-time amplitude amplification (VTAA) algorithm for preparing \(| A^{-1}{\bf b} \rangle\) and show that it still works in the quantum coordinator model. The following definition comes from Reference [10, section 5], which is originally from Reference [2, section 3.3]:

Definition 25.

Let \(\mathcal {A}\) be a quantum algorithm on a space \(\mathcal {H}\) that starts in the state \(| 0 \rangle _\mathcal {H}\). We say \(\mathcal {A}\) is a variable-time quantum algorithm if the following conditions hold:

(1)	\(\mathcal {A}\) can be written as the product of \(T\) algorithms \(\mathcal {A}=\mathcal {A}_T \cdots \mathcal {A}_2\mathcal {A}_1\).
(2)	\(\mathcal {H}\) can be written as a product \(\mathcal {H}=\mathcal {H}_C\otimes \mathcal {H}_A\), where \(\mathcal {H}_C\) is a product of \(T\) single qubit registers denoted by \(\mathcal {H}_{C_1},\ldots ,\mathcal {H}_{C_T}\).
(3)	Each \(\mathcal {A}_j\) is a controlled unitary that acts on the registers \(\mathcal {H}_{C_j} \otimes \mathcal {H}_A\) controlled on the first \(j-1\) qubits of \(\mathcal {H}_{C}\) being set to 0.

In VTAA, the two key techniques are performing gapped quantum phase estimation on \(A\) and computing a truncated block-encoding of \(A^{-1}\). We first state these two results and then check that they still work in the quantum coordinator model. The following result comes from Reference [10, Lemma 22]:

Lemma 26 (Gapped Phase Estimation (GPE)).

Let \(U\) be a unitary such that \(U | \psi \rangle = e^{i\lambda } | \psi \rangle\) and \(\lambda \in [-1,1]\). Let \(\phi \in (0,1/4]\) and \(\varepsilon \gt 0\). Then, there is a quantum algorithm that maps \(\begin{equation*} | 0 \rangle | 0 \rangle | \psi \rangle \mapsto \alpha _0 | 0 \rangle | g_0 \rangle | \psi \rangle +\alpha _1 | 1 \rangle | g_1 \rangle | \psi \rangle \end{equation*}\) for some unit vectors \(| g_0 \rangle , | g_1 \rangle\), and

—	if \(0 \le |\lambda | \le \phi\), then \(|\alpha _1| \le \varepsilon\),
—	if \(2\phi \le |\lambda | \le 1\), then \(|\alpha _0| \le \varepsilon\).

If \(T_U\) is the cost of implementing \(U\), then the cost of this quantum algorithm is \(O((\log 1/\varepsilon)T_U/\phi)\).

The quantum algorithm in the above lemma is based on standard phase estimation. To use phase estimation in the quantum coordinator model, the main obstacle is Hamiltonian simulation. In our case, \(U=e^{iA}\), where \(A\) is given in Equation (14), so the referee needs to carry out Hamiltonian simulation of \(A\). This is achieved by QSVT, and a quantum protocol can be given in a similar way to that of Proposition 22. The communication complexity of using \(U\) is \(T_U=\widetilde{O}(r\alpha)\), where \(\alpha = \sqrt {\sum _i \Vert A_i\Vert ^2}\). Note that if we estimate the time complexity, then \(T_U=\widetilde{O}(T_A \tilde{\alpha })\), where \(T_A\) is the cost to construct a block-encoding of \(A\) and \(\tilde{\alpha } \ge \alpha\) generally. So, for communication complexity, we can say \(T_A=\widetilde{O}(r)\).

Another result that will be used in the VTAA is truncated block-encoding of \(A^{-1}\) (see Reference [8, Corollary 29]).

Lemma 27.

Let \(A\) be Hermitian, and let \(U\) be an \((\alpha , a, \varepsilon)\) block-encoding of \(A\) that can be implemented using \(T_A\) elementary unitaries. Then, for any state \(| \psi \rangle\) that is spanned by eigenvectors of \(A\) with eigenvalues in the range \([-1,-\lambda ]\cup [\lambda ,1]\), there exists a unitary \(W(\lambda ,\varepsilon)\) \(\begin{equation*} W(\lambda ,\varepsilon): | 0 \rangle | 0 \rangle | \psi \rangle \mapsto \frac{1}{\alpha _{\max }} | 1 \rangle | 0 \rangle f(A) | \psi \rangle + | 0 \rangle ^\bot , \end{equation*}\) where \(\alpha _{\max } \le \lambda\) is a constant and \(\Vert f(A)| \psi \rangle - A^{-1}| \psi \rangle \Vert \le \varepsilon\). The cost of implementing \(W(\lambda ,\varepsilon)\) is \(\widetilde{O}((a+T_A)\alpha /\lambda)\).

For us, \(A\) is given in Equation (14). The unitary \(W(\lambda ,\varepsilon)\) is obtained in a similar way to the block-encoding of \((3\delta /4) A^+\) defined in Equation (17), where we focused on singular values that are at least \(\delta\). Now, we need to focus on singular values that are at least \(\lambda\). In the communication setting, \(a=\log r\) and \(T_A=O(r\log (n))\) is the number of qubits of communication required to run \(U\). So, for the referee to use \(W(\lambda ,\varepsilon)\), they need to communicate \(\widetilde{O}(r\alpha /\lambda)\) qubits in total, where \(\alpha = \sqrt {\sum _i \Vert A_i\Vert ^2}\).

We next briefly describe the variable-time quantum algorithm \(\mathcal {A}\). For more details, especially on the correctness and complexity analysis, we refer to Reference [10, section 5]. The algorithm \(\mathcal {A}\) is built as a sequence of steps \(\mathcal {A}_1,\ldots ,\mathcal {A}_T\) with \(T = \lceil \log \kappa \rceil +1\), so the algorithm is \(\mathcal {A} = \mathcal {A}_T\cdots \mathcal {A}_1\). The algorithm \(\mathcal {A}\) uses the following registers:

—	a \(T\)-qubit clock register \(C\), labelled \(C_1,\ldots , C_T\), used to determine a region the eigenvalue belongs to (i.e., to store the result of GPE);
—	a single-qubit flag register \(F\) to indicate whether the approximation of \(A^{-1}\) was successfully implemented;
—	a \((\log n)\)-qubit register \(I\), initialized to \(| {\bf b} \rangle\), that finally contains the output state;
—	a register \(P\), divided into registers \(P_1, \ldots , P_T\), to be used as ancilla for GPE;
—	a register \(Q\) to be used as ancilla in the implementation of \(A^{-1}\).

The corresponding Hilbert spaces are denoted by \(\mathcal {H}_C ,\mathcal {H}_F , \mathcal {H}_I , \mathcal {H}_P\), and \(\mathcal {H}_Q\), respectively. All registers are initialized in \(| 0 \rangle\), except for register \(I\), which is initialized in \(| {\bf b} \rangle\). When we write \(| 0 \rangle _X\), we mean that all qubits of register \(X\) are in \(| 0 \rangle\).

We now describe algorithm \(\mathcal {A}_j\). In the algorithm below, each call to GPE uses the unitary operator \(e^{iA}\). For all \(j\in [T]\), let \(\varphi _j = 2^{-j}\), and let \(\delta = \varepsilon /(T \alpha _{\max })\). We define \(\mathcal {A}_j\) as the product of the following two unitary operations:

(1)	Conditional on the first \((j-1)\) qubits of \(\mathcal {H}_C\) being \(| 0 \rangle\), apply GPE(\(\varphi _j,\delta\)) on the input state in \(I\) using \(C_j\) as the output qubit and additional fresh qubits from \(P\) as ancilla (denoted by \(P_j\)).
(2)	Conditional on \(C_j\) (the outcome of the previous step) being \(| 1 \rangle _{C_j}\), apply \(W(\varphi _j,T\delta)\) to the input state in \(I\) using \(F\) as the flag register and register \(Q\) as ancilla.
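The parameters used by the steps \(\mathcal {A}_1,\ldots ,\mathcal {A}_T\) can be tabulated with a short script. This is only an illustrative sketch: the function name `vtaa_schedule` and the dictionary keys are hypothetical, \(\log \kappa\) is assumed to be base 2 (consistent with \(\varphi _j = 2^{-j}\)), and the GPE cost is reported as \(\log (1/\delta)/\varphi _j\) in units of \(T_U\), with the constants hidden by the \(O(\cdot)\) in Lemma 26 dropped.

```python
import math


def vtaa_schedule(kappa, eps, alpha_max):
    """List the GPE and W parameters of each step A_j (sketch only).

    Assumes log is base 2 and ignores all big-O constants.
    """
    T = math.ceil(math.log2(kappa)) + 1
    delta = eps / (T * alpha_max)
    steps = []
    for j in range(1, T + 1):
        phi_j = 2.0 ** (-j)
        steps.append({
            "j": j,
            "phi": phi_j,                   # threshold passed to GPE and W
            "gpe_precision": delta,         # GPE(phi_j, delta)
            "w_precision": T * delta,       # W(phi_j, T * delta)
            # Lemma 26 cost log(1/delta) / phi_j, in units of T_U.
            "gpe_cost_units": math.log(1.0 / delta) / phi_j,
        })
    return T, delta, steps


if __name__ == "__main__":
    T, delta, steps = vtaa_schedule(kappa=8, eps=0.01, alpha_max=2.0)
    for step in steps:
        print(step)
```

The thresholds halve at every step, so the per-step GPE cost doubles, and the last step (with \(\varphi _T\) of order \(1/\kappa\)) dominates.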

As we can see, Lemmas 26 and 27 are the two main tools in VTAA. We have checked that both still work in the quantum coordinator model, and hence so does VTAA. The only difference from the time-complexity analysis is that, for communication complexity, the quantities \(T_A\) and \(\alpha\) in the two lemmas are known explicitly: for time complexity, \(T_A\) may not be easy to compute, whereas for communication complexity, \(T_A=\widetilde{O}(r)\), and the block-encoding parameter \(\alpha\) can likewise be computed precisely. In summary, the time complexity results for the above two lemmas still hold for communication complexity, with known values of \(T_A\) and \(\alpha\).

C SOME PREVIOUS RESULTS ABOUT QUANTUM SINGULAR VALUE TRANSFORMATION

In this Appendix, we collate the results that we will use about quantum singular value transformation.

Definition 28 (Alternating Phase Modulation Sequence, Definition 15 of Reference [12]).

Let \(\mathcal {H}_U\) be a finite dimensional Hilbert space, and let \(U, \Pi , \widetilde{\Pi } \in \text{End}(\mathcal {H}_U)\) be linear operators on \(\mathcal {H}_U\) such that \(U\) is unitary and \(\Pi , \widetilde{\Pi }\) are orthogonal projectors. Let \(\Phi \in \mathbb {R}^n\). Then, we define the phased alternating sequence \(U_\Phi\) as follows: \(\begin{equation*} U_\Phi = {\left\lbrace \begin{array}{ll} e^{i\phi _1(2\widetilde{\Pi }-I)} U \prod _{j=1}^{(n-1)/2} \Big (e^{i\phi _{2j}(2\Pi -I)} U^\dagger e^{i\phi _{2j+1}(2\widetilde{\Pi }-I)} U \Big), & n \text{ is odd,} \\ \prod _{j=1}^{n/2} \Big (e^{i\phi _{2j-1}(2\Pi -I)} U^\dagger e^{i\phi _{2j}(2\widetilde{\Pi }-I)} U \Big), & n \text{ is even.} \end{array}\right.} \end{equation*}\)

Lemma 29 (Efficient Implementation of Alternating Phase Modulation Sequences, Lemma 19 of Reference [12]).

The alternating phased sequence \(U_\Phi\) can be implemented using a single ancilla qubit with \(n\) uses of \(U\) and \(U^\dagger\), \(n\) uses of \(C_{\Pi }NOT\), and \(n\) uses of \(C_{\widetilde{\Pi }}NOT\) gates, and \(n\) single qubit gates.

Definition 30.

Let \(f:\mathbb {R} \rightarrow \mathbb {C}\) be an even or odd function, let \(A\in \mathbb {C}^{m\times n}\) and \(A = \sum _{i=1}^{\min (m,n)} \sigma _i | u_i \rangle \langle v_i|\) be the singular value decomposition of \(A\). If \(f\) is odd, then we define \(f^{\text{(SV)}}(A) = \sum _{i=1}^{\min (m,n)} f(\sigma _i) | u_i \rangle \langle v_i|\). If \(f\) is even, then we define \(f^{\text{(SV)}}(A) = \sum _{i=1}^{n} f(\sigma _i) | v_i \rangle \langle v_i|\), where \(\sigma _i = 0\) for \(i\in [n]\backslash [\min (m,n)]\).

Theorem 31 (Corollary 8 and Theorem 17 of Reference [12]).

Using the same notation as in Definition 28, let \(f(x)\) be any even or odd polynomial of degree \(n\) with the following properties:

—	\(\forall x\in [-1,1]: |f(x)|\le 1\),
—	\(\forall x\in (-\infty ,-1]\cup [1,\infty): |f(x)| \ge 1,\)
—	if \(n\) is even, then \(f(ix)\bar{f}(ix)\ge 1\) for all \(x\in \mathbb {R}.\)

Then, there is an efficiently computable \(\Phi \in \mathbb {R}^n\) such that \(\begin{equation*} f^{\text{(SV)}}(\widetilde{\Pi } U \Pi) = {\left\lbrace \begin{array}{ll} \widetilde{\Pi } U_{\Phi } \Pi & \text{if } n \text{ is odd,} \\ \Pi U_{\Phi } \Pi & \text{if } n \text{ is even.} \end{array}\right.} \end{equation*}\)
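A single-qubit toy case makes the statement concrete. The sketch below rests on several assumptions that are not in the text: \(\Pi = \widetilde{\Pi } = | 0 \rangle \langle 0|\), the unitary \(U\) is the real reflection with \(\langle 0|U| 0 \rangle = a\), the products in Definition 28 are ordered with \(j=1\) leftmost, and the phases are \(\phi _1 = (1-d)\pi /2\), \(\phi _j = \pi /2\) for \(j\ge 2\). With these choices, the top-left entry of \(U_\Phi\) should equal the Chebyshev polynomial \(T_d(a)\), which satisfies the three conditions of the theorem.

```python
import cmath
import math


def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def phase(phi):
    """e^{i phi (2 Pi - I)} for Pi = |0><0|, i.e. diag(e^{i phi}, e^{-i phi})."""
    return [[cmath.exp(1j * phi), 0.0], [0.0, cmath.exp(-1j * phi)]]


def alternating_sequence(a, Phi):
    """U_Phi of Definition 28 for the toy choice Pi = Pi~ = |0><0|."""
    s = math.sqrt(1.0 - a * a)
    U = [[a, s], [s, -a]]          # real symmetric unitary, so U^dagger = U
    n = len(Phi)
    phi = [None] + list(Phi)       # shift to the 1-based indexing of the text
    if n % 2 == 1:
        result = matmul(phase(phi[1]), U)
        for j in range(1, (n - 1) // 2 + 1):
            factor = matmul(matmul(phase(phi[2 * j]), U),
                            matmul(phase(phi[2 * j + 1]), U))
            result = matmul(result, factor)
    else:
        result = [[1.0, 0.0], [0.0, 1.0]]
        for j in range(1, n // 2 + 1):
            factor = matmul(matmul(phase(phi[2 * j - 1]), U),
                            matmul(phase(phi[2 * j]), U))
            result = matmul(result, factor)
    return result


def chebyshev_phases(d):
    """Phase vector assumed in this sketch to realise T_d."""
    return [(1 - d) * math.pi / 2] + [math.pi / 2] * (d - 1)


if __name__ == "__main__":
    a, d = 0.3, 5
    entry = alternating_sequence(a, chebyshev_phases(d))[0][0]
    print(entry, math.cos(d * math.acos(a)))
```

The reason this works in the toy case is that \(U(2\Pi -I)\) is a rotation by \(\arccos a\), so \(d\) alternations accumulate the angle \(d\arccos a\).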

Proposition 32 (Corollary 69 of Reference [12]).

Let \(\varepsilon , \delta \in (0, 1/2 ]\). Then, there is an odd polynomial \(P \in \mathbb {R}[x]\) of degree \(O(\delta ^{-1} \log (1/\varepsilon))\) that \(\varepsilon\)-approximates \(f(x) = \frac{3\delta }{4x}\) on the domain \([-1,1]\backslash [-\delta ,\delta ]\); moreover, it is bounded by 1 in absolute value.

Proposition 33 (Lemmas 57 and 59 of Reference [12]).

Let \(t\in \mathbb {R}\backslash \lbrace 0\rbrace , \varepsilon \in (0,1/e),\) and let \(R=\lfloor r(\frac{e|t|}{2}, \frac{5\varepsilon }{4})/2 \rfloor\). Then, \(\begin{eqnarray*} && \left\Vert \cos (tx) - J_0(t) - 2 \sum _{k=1}^R (-1)^k J_{2k}(t) T_{2k}(x) \right\Vert _{[-1,1]} \le \varepsilon , \\ && \left\Vert \sin (tx) - 2 \sum _{k=0}^R (-1)^k J_{2k+1}(t) T_{2k+1}(x) \right\Vert _{[-1,1]} \le \varepsilon , \end{eqnarray*}\) where \(J_m(t), m\in \mathbb {Z}\) denote Bessel functions of the first kind, \(T_k(x)\) denote Chebyshev polynomials of the first kind, and \(\begin{equation*} r(|t|,\varepsilon) = O\left(|t| + \frac{\log (1/\varepsilon)}{\log (e+|t|^{-1}\log (1/\varepsilon))} \right). \end{equation*}\)
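The truncated expansions can be checked numerically. The sketch below evaluates the Jacobi-Anger series \(\cos (tx) = J_0(t) + 2\sum _{k\ge 1} (-1)^k J_{2k}(t) T_{2k}(x)\) and \(\sin (tx) = 2\sum _{k\ge 0} (-1)^k J_{2k+1}(t) T_{2k+1}(x)\), truncated at \(R\), on a grid in \([-1,1]\). The helper names are hypothetical, the Bessel functions are computed from their power series (adequate only for moderate \(|t|\)), and \(R\) is passed in directly instead of being derived from \(r(\cdot ,\cdot)\).

```python
import math


def bessel_j(m, t, terms=40):
    """J_m(t) for integer m >= 0 via its power series (moderate |t| only)."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(k + m))
               * (t / 2.0) ** (2 * k + m) for k in range(terms))


def cheb(n, x):
    """Chebyshev polynomial of the first kind T_n(x) for x in [-1, 1]."""
    return math.cos(n * math.acos(x))


def cos_approx(t, x, R):
    """Truncated expansion of cos(t x): even-degree Chebyshev terms up to 2R."""
    return bessel_j(0, t) + 2 * sum(
        (-1) ** k * bessel_j(2 * k, t) * cheb(2 * k, x)
        for k in range(1, R + 1))


def sin_approx(t, x, R):
    """Truncated expansion of sin(t x): odd-degree Chebyshev terms up to 2R+1."""
    return 2 * sum(
        (-1) ** k * bessel_j(2 * k + 1, t) * cheb(2 * k + 1, x)
        for k in range(R + 1))


def max_errors(t, R, grid=201):
    """Maximum absolute errors of both truncations on a uniform grid."""
    xs = [-1 + 2 * i / (grid - 1) for i in range(grid)]
    e_cos = max(abs(math.cos(t * x) - cos_approx(t, x, R)) for x in xs)
    e_sin = max(abs(math.sin(t * x) - sin_approx(t, x, R)) for x in xs)
    return e_cos, e_sin


if __name__ == "__main__":
    for R in (1, 3, 10):
        print(R, max_errors(3.0, R))
```

For fixed \(t\), the errors should fall off rapidly once \(R\) exceeds roughly \(|t|\), in line with the bound on \(r(|t|,\varepsilon)\).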

Footnotes

¹ There are many equivalent ways to define pseudoinverse [13]. Here, we recall the one based on singular value decomposition (SVD). If \(A=\sum _{i=1}^r \sigma _i | u_i \rangle \langle v_i|\) is SVD of \(A\), where \(r=\text{Rank}(A)\), then \(A^+=\sum _{i=1}^r \sigma _i^{-1} | v_i \rangle \langle u_i|\).
² Throughout, by an optimal solution of a linear regression problem \(\text{argmin}_{{\bf x}\in \mathbb {R}^n} \Vert A{\bf x}-{\bf b}\Vert ^2\), we always mean the solution \(A^+{\bf b}\), where \(A^+\) is the pseudoinverse of \(A\). It is possible that a linear regression can have infinitely many solutions; however, the solution with minimum norm is unique, which is \(A^+{\bf b}\) [27].
³ Classically, the task of outputting a vector solution has been studied in Reference [35]. In this article, we show that sampling is also hard for classical computers. Moreover, this makes the quantum speedup in solving linear regression problems more convincing.
⁴ If \(|T|=o(l)\), then Bob can send \(T\) directly to Alice, which only uses \(O(|T|)=o(l)\) bits of communication. So, we assume this is not the case.
⁵ This follows from the main theorem proved in Reference [30]. Indeed, in Reference [30], it was shown that given a predicate \(D\) on \(\lbrace 0,1,\ldots ,n\rbrace\), let \(l_0:=\max \lbrace l:1\le l \le n/2, D(l)\ne D(l-1)\rbrace , l_1:=\max \lbrace n-l:n/2\le l \lt n, D(l)\ne D(l+1)\rbrace\), then up to a logarithmic factor the bounded-error quantum communication complexity of \(f(x,y):=D(|x\cap y|)\) is \(\sqrt {n l_0}+l_1\). For the disjointness problem with the promise that \(|S\cap T| \le 1\), we have \(D(l)=1\) if \(l\in \lbrace 1,n\rbrace\) and \(D(l)=0\) otherwise. In this case, we have \(l_0=1,l_1=1\). So, the lower bound is \(\sqrt {n}\). Here, we added \(D(n)=1\) intentionally to ensure \(l_1\) is well-defined. It corresponds to the case that \(S=T=[n]\), which is the trivial case.
⁶ By Reference [22, Theorem 3.7], it is known that the VC-dimension of this function is \(\Theta (\log n)\). By Reference [20, Theorem 3], VC-dimension is a lower bound of one-way quantum communication complexity. Thus, the one-way quantum communication complexity is lower bounded by \(\Omega (\log n)\).
⁷ The proof is basically the same as that of Lemma 6.
⁸ Here, we used the fact that \(r=O(\sqrt {n})\).
⁹ If some columns of \(A\) contain \(o(m)\) 1’s, then Alice can send all these columns to Bob first. This costs \(o(mn)\) in total, which is strictly less than \(mn\). So, removing these columns does not affect the hardness of the index problem. The same analysis is also true for rows. Hence, we can assume that each column has \(\Theta (m)\) 1’s and each row has \(\Theta (n)\) 1’s.

REFERENCES

[1] Aaronson Scott and Ambainis Andris. 2003. Quantum search of spatial regions. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science. IEEE, 200–209.
[2] Ambainis A. 2012. Variable time amplitude amplification and a faster quantum algorithm for solving systems of linear equations. In Proceedings of the 29th International Symposium on Theoretical Aspects of Computer Science (STACS’12), Vol. 14. 636–647.
[3] Ambainis Andris, Schulman Leonard J., Ta-Shma Amnon, Vazirani Umesh, and Wigderson Avi. 2003. The quantum communication complexity of sampling. SIAM J. Comput. 32, 6 (2003), 1570–1585.
[4] Berry Dominic W., Childs Andrew M., Cleve Richard, Kothari Robin, and Somma Rolando D. 2015. Simulating Hamiltonian dynamics with a truncated Taylor series. Phys. Rev. Lett. 114, 9 (2015), 090502.
[5] Brassard Gilles, Hoyer Peter, Mosca Michele, and Tapp Alain. 2002. Quantum amplitude amplification and estimation. Contemp. Math. 305 (2002), 53–74.
[6] Buhrman Harry, Cleve Richard, Massar Serge, and Wolf Ronald de. 2010. Nonlocality and communication complexity. Rev. Mod. Phys. 82, 1 (2010), 665.
[7] Buhrman Harry and Wolf Ronald de. 2001. Communication complexity lower bounds by polynomials. In Proceedings of the 16th Annual IEEE Conference on Computational Complexity. IEEE, 120–130.
[8] Chakraborty Shantanav, Gilyén András, and Jeffery Stacey. 2019. The power of block-encoded matrix powers: Improved regression techniques via faster Hamiltonian simulation. In Proceedings of the 46th International Colloquium on Automata, Languages, and Programming (ICALP’19) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 132), Baier Christel, Chatzigiannakis Ioannis, Flocchini Paola, and Leonardi Stefano (Eds.). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 33:1–33:14.
[9] Chia Nai-Hui, Gilyén András, Li Tongyang, Lin Han-Hsuan, Tang Ewin, and Wang Chunhao. 2020. Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning. In Proceedings of the 52nd Annual ACM Symposium on Theory of Computing. 387–400.
[10] Childs Andrew M., Kothari Robin, and Somma Rolando D. 2017. Quantum algorithm for systems of linear equations with exponentially improved dependence on precision. SIAM J. Comput. 46, 6 (2017), 1920–1950.
[11] Gilyén András, Song Zhao, and Tang Ewin. 2022. An improved quantum-inspired algorithm for linear regression. Quantum 6 (2022), 754.
[12] Gilyén András, Su Yuan, Low Guang Hao, and Wiebe Nathan. 2019. Quantum singular value transformation and beyond: Exponential improvements for quantum matrix arithmetics. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing. 193–204.
[13] Golub Gene H. and Loan Charles F. Van. 2013. Matrix Computations. The Johns Hopkins University Press.
[14] Harrow Aram W., Hassidim Avinatan, and Lloyd Seth. 2009. Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 103, 15 (2009), 150502.
[15] Jain Rahul, Sen Pranab, and Radhakrishnan Jaikumar. 2008. Optimal direct sum and privacy trade-off results for quantum and classical communication complexity. arXiv preprint arXiv:0807.1267 (2008).
[16] Jayram Thathachar S. and Woodruff David P. 2013. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. ACM Trans. Algor. 9, 3 (2013), 1–17.
[17] Jethwani Dhawal, Le Gall François, and Singh Sanjay K. 2020. Quantum-inspired classical algorithms for singular value transformation. In Proceedings of the 45th International Symposium on Mathematical Foundations of Computer Science (MFCS’20). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
[18] Kalyanasundaram Bala and Schnitger Georg. 1992. Communication Complexity and Lower Bounds for Sequential Computation. Vieweg+Teubner Verlag, Wiesbaden, 253–268.
[19] Klartag Bo’az and Regev Oded. 2011. Quantum one-way communication can be exponentially stronger than classical communication. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing. 31–40.
[20] Klauck Hartmut. 2000. On quantum and probabilistic communication: Las Vegas and one-way protocols. In Proceedings of the 32nd Annual ACM Symposium on Theory of Computing. 644–651.
[21] Klauck Hartmut. 2000. Quantum communication complexity. arXiv preprint quant-ph/0005032 (2000).
[22] Kremer Ilan, Nisan Noam, and Ron Dana. 1999. On randomized one-round communication complexity. Comput. Complex. 8, 1 (1999), 21–49.
[23] Lee Troy, Schechtman Gideon, and Shraibman Adi. 2009. Lower bounds on quantum multiparty communication complexity. In Proceedings of the 24th Annual IEEE Conference on Computational Complexity. IEEE, 254–262.
[24] Martyn John M., Rossi Zane M., Tan Andrew K., and Chuang Isaac L. 2021. Grand unification of quantum algorithms. PRX Quant. 2, 4 (2021), 040203.
[25] Montanaro Ashley. 2019. Quantum states cannot be transmitted efficiently classically. Quantum 3 (2019), 154.
[26] Phillips Jeff M., Verbin Elad, and Zhang Qin. 2012. Lower bounds for number-in-hand multiparty communication complexity, made easy. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 486–501.
[27] Planitz M. 1979. Inconsistent systems of linear equations. Math. Gaz. 63, 425 (1979), 181–185.
[28] Raz Ran. 1999. Exponential separation of quantum and classical communication complexity. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing. 358–367.
[29] Razborov Alexander A. 1990. On the distributional complexity of disjointness. In Proceedings of the International Colloquium on Automata, Languages, and Programming. Springer, 249–253.
[30] Razborov Alexander A. 2003. Quantum communication complexity of symmetric predicates. Izvestiya: Math. 67, 1 (2003), 145.
[31] Shao Changpeng and Montanaro Ashley. 2022. Faster quantum-inspired algorithms for solving linear systems. ACM Trans. Quant. Comput. 3, 4 (2022), 1–23.
[32] Sun Xiaoming and Wang Chengu. 2012. Randomized communication complexity for linear algebra problems over finite fields. In Proceedings of the 29th Symposium on Theoretical Aspects of Computer Science (STACS’12). LIPIcs, 477–488.
[33] Tang Ewin. 2019. A quantum-inspired classical algorithm for recommendation systems. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing. 217–228.
[34] Tang Hao, Li Boning, Wang Guoqing, Xu Haowei, Li Changhao, Barr Ariel, Cappellaro Paola, and Li Ju. 2022. Communication-efficient quantum algorithm for distributed machine learning. arXiv preprint arXiv:2209.04888 (2022).
[35] Vempala Santosh S., Wang Ruosong, and Woodruff David P. 2020. The communication complexity of optimization. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 1733–1752.
[36] Wolf Ronald de. 2002. Quantum communication and complexity. Theor. Comput. Sci. 287, 1 (2002), 337–353.
[37] Woodruff David P. and Zhang Qin. 2017. When distributed computation is communication expensive. Distrib. Comput. 30, 5 (2017), 309–323.
[38] Yao Andrew Chi-Chih. 1979. Some complexity questions related to distributive computing (preliminary report). In Proceedings of the 11th Annual ACM Symposium on Theory of Computing. 209–213.
[39] Yao Andrew Chi-Chih. 1993. Quantum circuit complexity. In Proceedings of the IEEE 34th Annual Foundations of Computer Science. IEEE, 352–361.

Index Terms

Quantum Communication Complexity of Linear Regression
1. Theory of computation
  1. Computational complexity and cryptography
    1. Communication complexity
  2. Models of computation
    1. Quantum computation theory

Published in
ACM Transactions on Computation Theory Volume 16, Issue 1
March 2024
105 pages
ISSN:1942-3454
EISSN:1942-3462
DOI:10.1145/3613509
Editor:
Prahladh Harsha
Tata Institute of Fundamental Research, India
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 March 2024
- Online AM: 22 September 2023
- Accepted: 19 September 2023
- Received: 13 October 2022

Author Tags
Quantum computing
communication complexity
linear regression
quantum singular value transformation