
Coupled regularization with multiple data discrepancies


Published 13 June 2018. © 2018 IOP Publishing Ltd.
Special issue on joint reconstruction and multi-modality/multi-spectral imaging. Citation: Martin Holler et al 2018 Inverse Problems 34 084003. DOI: 10.1088/1361-6420/aac539


Abstract

We consider a class of regularization methods for inverse problems where a coupled regularization is employed for the simultaneous reconstruction of data from multiple sources. Applications for such a setting can be found in multi-spectral or multi-modality inverse problems, but also in inverse problems with dynamic data. We consider this setting in a rather general framework and derive stability and convergence results, including convergence rates. In particular, we show how parameter choice strategies adapted to the interplay of the different data channels allow us to improve upon the convergence rates that would be obtained by treating all channels equally. Motivated by concrete applications, our results are obtained under rather general assumptions that allow the Kullback–Leibler divergence to be included as a data discrepancy term. To simplify their application to concrete settings, we further elaborate several practically relevant special cases in detail.

To complement the analytical results, we also provide an algorithmic framework and source code for solving a class of jointly regularized inverse problems with any number of data discrepancies. As concrete applications, we show numerical results for multi-contrast MR and joint MR-PET reconstruction.


Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

In many classical fields of application of inverse problems, the simultaneous or sequential acquisition of multiple data channels, rather than the measurement of a single channel, has recently become increasingly important. Besides making different sources of information available, an advantage of multiple measurements is that correlations between different data channels can be exploited in the inversion process, often leading to a significant improvement for each channel.

Multi-modality and multi-contrast imaging techniques, for instance, deal with the exploitation of such joint structures for improved reconstruction. Applications of such techniques can be found, for instance, in biomedical imaging [2, 11–13, 21, 26, 36, 38, 42, 46], geosciences [45], electron microscopy [20] and beyond. Spatio-temporal imaging techniques can also be interpreted in this way, regarding the measurement at each time point as a different channel, and we refer, for example, to [17, 32, 41] for applications.

The present work focuses on a general framework for coupled regularization of inverse problems where the data is split into multiple channels. In an abstract setting, we consider the following problem setup.

$$\min_{u}\ \sum_{i=1}^{N} \lambda_i\, D_i(T_i u, f_i) + R(u), \qquad (1)$$

where the Ti denote forward operators, the fi the given data, the Di data discrepancy terms and R a regularization functional. Our main interpretation of this setup is that $u= (u_1, \ldots, u_N)$ denotes some unknown multi-channel quantity of interest, each ui corresponds to some measured data fi such that $T_i u = \widetilde{T}_i u_i \approx f_i $ , and R realizes a coupled regularization of all channels.

A first particular example of such a setting is the joint reconstruction of magnetic resonance (MR) and positron emission tomography (PET) images via solving

as has for instance been considered in [13, 26]. Here, the first data fidelity reflects Gaussian noise in MR, while the second one is the Kullback–Leibler divergence that arises since the noise in PET imaging is typically assumed to be Poisson distributed.
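As a concrete illustration of the joint problem described here (a sketch only, under the assumptions just stated: an L2 fidelity for the Gaussian-noise MR channel and a Kullback–Leibler fidelity for the Poisson-noise PET channel; the precise scaling and notation of the original formulation may differ), such a functional has the form

$$\min_{u=(u_{\rm mr},\, u_{\rm pet})}\ R(u_{\rm mr}, u_{\rm pet}) + \lambda_1 \|T_1 u_{\rm mr} - f_1\|_2^2 + \lambda_2\, D_{\rm KL}(T_2 u_{\rm pet}, f_2),$$

with $T_1$ the MR forward operator, $T_2$ the PET forward operator and $D_{\rm KL}$ the Kullback–Leibler divergence.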

A second example is the situation where the components $(\,f_i)_i$ result from a sequential measurement process in which the quality of the measured data differs between measurement times. A practical application of this is spatio-temporal regularization for dynamic magnetic resonance imaging (dMRI) [32, 41], where the fi correspond to measurements of the same object at different time points. There, one aims to solve

$$\min_{u}\ \sum_{i=1}^{N} \lambda_i \|T_i u_i - f_i\|_2^2 + R(u), \qquad (2)$$

where now $u = (u_i)_{i=1}^N$ is a time series of images and R employs a spatio-temporal regularization. Since in dMRI there is always a trade-off between signal quality and measurement time, it is reasonable to adapt the measurement time to the underlying dynamics of the object, leading to measurements with different resolution and noise properties.

We highlight two particular situations of the setting (1) that are captured by standard theory: the first is the situation when the regularization decouples, i.e. $R(u) = \sum\nolimits_{i=1}^N R_i (u_i)$ , which reduces the reconstruction to N decoupled problems. Application-wise, however, many recent works have shown that, in case the different channels share some structure, a coupled regularization is highly beneficial in terms of reconstruction quality. The second particular case is when all data terms are weighted equally, i.e. $\lambda_i = \lambda_j$ for all $i, \,j$ . In this situation, all data terms can be grouped into a single discrepancy and, again, standard theory applies. We believe, however, that there are various good reasons to consider an individual weighting of the data terms: (i) The noise type of the involved measurements might be different and the different data discrepancies might hence scale differently. (ii) The noise level will in general be different, e.g. the SNR of one channel might be significantly better, allowing a very high value to be chosen for the corresponding $\lambda_i$ . (iii) The degree of ill-posedness might be very different for each channel, as is the case for instance with MR-PET reconstruction, and hence again an appropriate adaptation of the regularization parameters is necessary.

Allowing different parameters for different data discrepancies, the following questions arise.

  • What is the convergence behavior of the method if the data of some channels converges to a ground truth? Is there a limit problem, and if so, what does it look like?
  • What about convergence rates in case the noise on all channels approaches zero? What is the interplay of the different data terms with respect to convergence rates?
  • Taking into account different types of data terms, what is the best parameter choice?

We will see that these questions all have rather natural answers. In particular, we will show that in case the data discrepancies have different continuity properties, as is true for instance for MR-PET reconstruction, an appropriate parameter choice allows us to partially compensate for less regular discrepancies.

To the best knowledge of the authors, a convergence theory for coupled regularization approaches to inverse problems with multiple data discrepancies has not been developed so far. Regarding classical Tikhonov regularization with a single regularization term and a single data term, we refer to the books [15, 29, 40, 43] for convergence results in Hilbert and Banach spaces. In particular, we highlight the seminal work [22], which provides appropriate source conditions and convergence rates in Banach spaces and has also been the basis for this work.

While [22] and classical works consider a setting that allows for powers of norms as data discrepancies, there are far fewer works that also cover discrepancies which do not satisfy a (generalized) triangle inequality, as is the case for the Kullback–Leibler divergence included in the present work. For the latter, we refer to [23, 33, 37, 39].

While multi-discrepancy regularization has not been considered in depth so far, there has been a lot of recent research on multi-penalty regularization and convergence behavior for multiple parameters and functionals. We refer to the works [18, 25, 28] for an additive combination of multiple regularization terms, to [16, 19, 31] for an infimal-convolution-type combination and to [5] for stability and convergence results for multiple-parameter-TGV regularization.

1.1. Outline of the paper

The paper is organized as follows: in section 2 we define the variational problem setting under consideration and provide stability and convergence results in the general setting. This is done in a way that captures both powers of norms and the Kullback–Leibler divergence as data discrepancy terms. In section 3 we consider particular classes of data fidelities. We show how our results can be applied to powers of norms and obtain corresponding convergence rates. After some preliminary results on the Kullback–Leibler divergence, we conclude stability and convergence rates for mixed L2-norm and Kullback–Leibler discrepancies, which is particularly motivated by applications to joint MR-PET reconstruction. In section 4 we work out, by way of example, the application of our results to two choices of coupled regularization. In section 5 we first outline an algorithmic framework that allows us to realize coupled TGV regularization with any combination of L2 and Kullback–Leibler divergence discrepancies. We then show exemplary numerical results for coupled regularization in the context of multi-contrast MR and PET imaging.

2. General stability and convergence results

2.1. Notation and variational problem setting

We will, throughout this work and up to further specification, consider the following problem setting: let $(X, \|\cdot \|_X)$ be a Banach space and $R\colon X \to [0, \infty]$ be a regularization term. Further, for $i = 1, \dots, N$ , let $(Y_i, \|\cdot \|_{Y_i})$ be normed spaces, let $D_i\colon Y_i\times Y_i \to [0, \infty]$ be given data discrepancies and let $T_i:{\rm dom}(T_i) \subset X \rightarrow Y_i$ be operators that model the forward problem. For $u \in {\rm dom}(T):=\bigcap\nolimits_{i=1}^N {\rm dom}(T_i) \subset X$ , $f = (\,f_1, \ldots, f_N) \in Y:=Y_1 \times \ldots \times Y_N$ and $\lambda \in [0, \infty]^N$ we define

$$J_\lambda(u, f) := \sum_{i=1}^{N} \lambda_i\, D_i(T_i u, f_i) + R(u), \qquad (3)$$

where, for $\lambda _i = \infty$ , we define $\lambda_i D_i(T_iu, f_i) := 0$ if $D_i(T_iu, f_i) = 0$ and $\lambda_i D_i(T_iu, f_i) := \infty$ otherwise.

We consider the problem

$$\min_{u \in {\rm dom}(T)} J_\lambda(u, f). \qquad (P(\lambda, f))$$
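To make the role of the individual weights and of the convention for $\lambda_i = \infty$ concrete, the following is a minimal Python sketch of the functional $J_\lambda$ (the names J_lambda, forward_ops, discrepancies and reg are illustrative assumptions and not part of the paper or its accompanying source code):

```python
import math

def J_lambda(u, f, lambdas, forward_ops, discrepancies, reg):
    """Evaluate J_lambda(u, f) = sum_i lambda_i * D_i(T_i u, f_i) + R(u).

    A weight lambdas[i] == math.inf encodes the hard constraint
    D_i(T_i u, f_i) = 0: the corresponding term is 0 if the discrepancy
    vanishes and +inf otherwise.
    """
    value = reg(u)
    for lam, T, D, f_i in zip(lambdas, forward_ops, discrepancies, f):
        d = D(T(u), f_i)
        if math.isinf(lam):
            value += 0.0 if d == 0.0 else math.inf
        else:
            value += lam * d
    return value
```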

We will further use $\tau_X$ and $\tau_{Y_i}$ , $\tau_{D_i}$ , $i=1, \ldots, N$ , to denote different topologies on X and Yi, $i=1, \ldots, N$ , respectively, and assume that $\tau_X$ and $\tau_{Y_i}$ satisfy the T2 separation axiom. For τ being any topology, we say that a sequence $(z_n)_n$ τ-converges to z as $n\rightarrow \infty$ , and write $z_n \mathop{\rightarrow}\limits^{\tau} z$ as $n\rightarrow \infty$ , if the sequence converges with respect to the notion of convergence induced by the topology. Further we write $f^n_i \mathop{\rightarrow}\limits^{D_i\,} f_i$ and say $f^n_i$ Di-converges to fi if $D_i(\,f_i, f^n_i) \rightarrow 0$ as well as $f^n_i \mathop{\rightarrow}\limits^{\tau_{D_i}}f_i$ as $n \rightarrow \infty$ . By $z_n \rightarrow z$ we denote norm convergence and by $z_n \rightharpoonup z$ we denote weak convergence.

Note that the notation $f^n_i \mathop{\rightarrow}\limits^{D_i\,} f_i$ will be used to denote convergence of the data, and the topology $\tau_{D_i}$ allows one to require a stronger convergence than just $D_i(\,f_i, f^n_i) \rightarrow 0$ , which is useful for particular applications. In the applications considered in this paper, however, the topology $\tau_{D_i}$ will be chosen to be the trivial topology, such that $f^n_i \mathop{\rightarrow}\limits^{D_i\,} f_i$ is equivalent to $D_i(\,f_i, f^n_i) \rightarrow 0$ .

We will use the following product-space notation: we denote by $\mathop{\rightarrow}\limits^{\tau_{Y\,}}$ and $\mathop{\rightarrow}\limits^{\tau_{D\,}}$ componentwise convergence for sequences in $Y:=\bigoplus\nolimits_{i=1}^N Y_i$ . By $\|\cdot\|_Y= \left(\sum\nolimits_{i=1}^N \|\cdot \|_{Y_i}^2\right) ^{1/2}$ we denote the standard product norm on Y and we define $T:X \rightarrow Y$ as $(Tu)_i = T_i u$ .

Throughout this work, we will use the following assumptions

Assumption (G). 

  • 1.  
    For each i and $a, b\in Y_i$ , $D_i(a, b) = 0 \Rightarrow a=b$ .
  • 2.  
    Each Di is $\tau_{Y_i} \times {D_i}$ lower semi-continuous, i.e.
    $$D_i(v, f) \leqslant \liminf_{n\rightarrow \infty} D_i(v^n, f^n) \quad \text{whenever}\quad v^n \mathop{\rightarrow}\limits^{\tau_{Y_i}} v \ \text{ and }\ f^n \mathop{\rightarrow}\limits^{D_i\,} f. \qquad (4)$$
  • 3.  
    R is lower semi-continuous w.r.t $\tau_X$ .
  • 4.  
    The operators Ti are continuous from $({\rm dom}(T_i), \tau_X)$ to $(Y_i, \tau_{Y_i})$ and ${\rm dom}(T_i)$ is closed w.r.t convergence in $\tau_X$ .
  • 5.  
    For any $\lambda \in (0, \infty){}^N$ , the functional $J_{\lambda}(\cdot, \cdot)$ is uniformly coercive in the sense that, for sequences $(u^n)_n, (\,f^n)_n$ such that $f^n \mathop{\to}\limits^{D} f$ we have that $J_\lambda(u^n, f^n)< c$ implies that $(u^n)_n$ admits a $\tau_X$ -convergent subsequence.

Remark 2.1. The assumptions above are kept rather general with the aim of allowing data discrepancies beyond powers of norms, in particular the Kullback–Leibler divergence used for Poisson noise. Regarding their applicability, we note the following: assumptions 1 and 2 are rather weak and standard; e.g. they are fulfilled in case the Di are powers of norms and $\tau_Y$ is stronger than the corresponding weak topology. Assumptions 3 and 4 are also rather standard. Assumption 5 essentially requires pre-compactness of sublevel sets, which is a standard condition for existence.

It will also be convenient to use the following notation: for a set $I\subset \{1, \dots, N\}$ we define the reduced energy functional

$$J_{\lambda, I^c}(u, f) := \sum_{i \in I^c} \lambda_i\, D_i(T_i u, f_i) + R(u). \qquad (5)$$

Here, the complement Ic is always taken in $\{1, \ldots, N\}$ , i.e. we define $I^c = \{1, \ldots, N\} \setminus I$ . Further, writing $I = \{i_1, \ldots, i_{|I|}\}$ , we will use the notation $T_I=(T_{i_1}, \dots T_{i_{|I|}})$ , $f_I = (\,f_{i_1}, \dots f_{i_{|I|}})$ and $\lambda_I = (\lambda_{i_1}, \dots \lambda_{i_{|I|}})$ . For two sequences $(x^n)_n$ and $(y^n)_n$ we write $x^n \sim y^n$ if there exist constants $c, C>0$ such that $cy^n \leqslant x^n \leqslant Cy^n$ for all n. Further, we use the standard little $o()$ and big $O()$ terminology, that is, we say $x^n = o(y^n)$ and $x^n = O(y^n)$ if $\lim\nolimits_{n\rightarrow \infty} x^n/y^n = 0$ or $x^n/y^n$ is uniformly bounded in n, respectively.

For the functional R and $\xi \in \partial R(u) \subset X^*$ , an element of the subdifferential of R at $u \in X$ , we define the Bregman distance $D_R^\xi:X \times X \rightarrow [0, \infty]$ as $D_R^\xi (v, u) = R(v) - R(u) - \langle \xi, v-u\rangle_{X^*, X}$ .
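As a simple illustration (an example of our own, not taken from the paper): for the quadratic regularizer $R(u) = \frac{1}{2}\|u\|_X^2$ on a Hilbert space X, the subdifferential at u contains only $\xi = u$ and the Bregman distance reduces to

$$D_R^\xi (v, u) = \tfrac{1}{2}\|v\|_X^2 - \tfrac{1}{2}\|u\|_X^2 - \langle u, v-u\rangle_X = \tfrac{1}{2}\|v-u\|_X^2,$$

so that, in this case, convergence rates in the Bregman distance are simply rates in the squared norm.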

2.2. Results in the general setting

Using assumption (G), the following existence result can be obtained by standard arguments.

Proposition 2.2. Let $f\in Y$ and $\lambda \in (0, \infty){}^ N$ be such that $J_\lambda(\cdot, f)$ is proper, i.e. finite in at least one point, and assume that assumption (G) holds. Then there exists a solution to $P(\lambda, f)$ .

Proof. Take $(u^n)_n$ to be a minimizing sequence for which we can assume that $J_\lambda(u^n, f)$ is bounded. Then by assumption $(u^n)_n$ admits a $\tau_X$ -convergent subsequence and, by the lower semi-continuity assumptions on R and Di and by continuity of Ti we obtain that the limit is a minimizer. □

The following proposition shows existence of what we will refer to as a $J_{\lambda, I^c}$ -minimal solution to $T_Iu=f_I^\dagger$ . This notion will be relevant later on when we study limit problems in multi-discrepancy regularization for vanishing noise level.

Proposition 2.3. Assume that assumption (G) holds. Let $\lambda \in (0, \infty){}^N$ , $I\subset \{1, \dots , N\}$ and $f^\dagger \in Y$ be given. Assume that there exists u0 such that $D_i(T_iu_0, f_i^\dagger) = 0$ for $i \in I$ and $J_{\lambda, I^c}(u_0, f^\dagger)<\infty$ . Then there exists a $u^\dagger$ such that

Equation (6)

Remark 2.4. Note that $J_{\lambda, I^c}$ is again a multi-data Tikhonov functional for indices in Ic, and so $J_{\lambda, I^c}$ -minimal solutions to $T_Iu=f_I^\dagger$ can be understood as solutions to a Tikhonov problem for Ic, using $T_Iu=f_I^\dagger$ as prior information.

Proof. It is obvious that the two minimization problems are equivalent, hence we show existence for the second one. Due to our assumptions, the admissible set is non-empty and the infimum is greater than or equal to zero. Consequently, an infimizing sequence exists, which, due to the coercivity assumption, admits a $\tau_X$ -convergent subsequence with limit $u^\dagger$ . Due to continuity of the Ti and lower semi-continuity of the Di in the first component w.r.t. $\tau_{Y_i}$ , we get that the limit $u^\dagger$ is a solution of (6) and the assertion follows. □

Now we consider the situation of varying data, which is relevant for investigating stability of the solution method with respect to data perturbations. To this aim, we will need the following continuity assumption at a particular pair $(u, f^\dagger) \in X\times Y$ and index set $I\subset \{1, \ldots, N\}$ .

Assumption ${S_1(u, f^\dagger, I^c)}$. For each $i \in I^c$ and every sequence $(\,f_i^n)_n$ with $f^n_i \mathop{\rightarrow}\limits^{D_i\,} f_i^\dagger$ it holds that $D_i(T_iu, f_i^n) \rightarrow D_i(T_iu, f_i^\dagger)$ as $n \rightarrow \infty$ .

We remark that this assumption is in particular always satisfied if the data fidelity is a power of a norm, as can be easily deduced from the inverse triangle inequality for norms.
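A brief verification sketch for this remark (assuming $D_i(v, f) = \|v-f\|_{Y_i}^{\,p_i}$ ): by the inverse triangle inequality,

$$\big| \|T_iu - f_i^n\|_{Y_i} - \|T_iu - f_i^\dagger\|_{Y_i} \big| \leqslant \|\,f_i^n - f_i^\dagger\|_{Y_i} \rightarrow 0,$$

and continuity of $x \mapsto x^{\,p_i}$ on $[0, \infty)$ then yields $D_i(T_iu, f_i^n) \rightarrow D_i(T_iu, f_i^\dagger)$ .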

The next theorem considers the rather general situation that the given data converges and we choose the parameters to either converge to infinity, which corresponds to assuming vanishing noise, or to a fixed positive number. From this, simplified results on stability and convergence for vanishing noise level will then be obtained as corollaries. Note further that we allow for powers pi of the data terms, which will be relevant for the particular case of norms later on.

Theorem 2.5 (General convergence result). Suppose that assumption (G) holds. Let $(\,f^n)_n$ in Y and $f^\dagger \in Y$ be such that $f^n \mathop{{\rightarrow}}\limits^{D} f^\dagger$ as $n\rightarrow \infty$ , and, for $i \in \{1, \ldots, N\}$ , let $p_i \in [1, \infty)$ and define $\delta_i ^n = D_i(\,f^\dagger _i, f_i^n){}^{1/p_i} $ such that, by assumption, $\delta_i^n \rightarrow 0$ as $n \rightarrow \infty$ . Choose the parameters $\lambda^n = (\lambda_1^n, \ldots, \lambda_N^n) \in (0, \infty){}^ N$ such that, for a given subset $I \subset \{1, \ldots, N\}$ ,

and denote $\lambda^\dagger = \lim\nolimits_{n \rightarrow \infty} \lambda ^n \in (0, \infty]^N$ .

Assume that there exists a $J_{\lambda^\dagger, I^c}$ -minimal solution to $T_Iu=f_I^\dagger$ , denoted by u0, such that $S_1(u_0, f^\dagger, I^c)$ holds. Then every sequence $(u^n)_n$ of solutions to the problems

Equation (7)

admits a $\tau_X$ -convergent subsequence again denoted by $(u^n)_n$ with limit $u^\dagger$ such that

and every limit of a $\tau_X$ -convergent subsequence of solutions is a solution to

Equation (8)

Proof. Let $(u^n)_n$ be any sequence of solutions and let u0 be a $J_{\lambda^\dagger, I^c}$ -minimal solution to $T_Iu=f_I^\dagger$ satisfying $S_1(u_0, f^\dagger, I^c)$ . By minimality, one obtains

Equation (9)

for C  >  0, where the convergence $ \newcommand{\bi}{\boldsymbol}D_i\big(T_iu_0, f_i^n\big)\to D_i\big(T_iu_0, f_i^\dagger\big)$ is due to the assumption $S_1(u_0, f^\dagger, I^c)$ . Setting $ \newcommand{\N}{\mathbb{N}} \lambda^0_i = \inf\nolimits_{n\in \N} \lambda_i^n>0$ for all i, this implies boundedness of $J_{\lambda^0}(u^n, f^n)$ and consequently, by coercivity, $(u^n)_n$ admits a convergent subsequence.

Assume now that $(u^n)_n$ is any subsequence of a sequence of solutions converging to some $u^\dagger \in X$ . The estimate in (9) implies for each $i\in I$

Equation (10)

and hence, by lower semi-continuity of Di and since $\lambda_i^n \to \infty$ , we obtain $T_iu^\dagger=f_i^\dagger$ .

Further, we get from lower semi-continuity and convergence of the $\lambda_i^n$ for $i \notin I$ that

Combining this with the estimate in (9) we obtain, since $u_0 \in X$ solves (8), that $u^\dagger$ is also a $J_{\lambda^\dagger, I^c}$ -minimizing solution of $T_Iu=f_I^\dagger$ .

Now assume that there is one $i_0 \in I$ such that $\lambda_{i_0}^ n D_{i_0}(T_{i_0}u^ n, f_{i_0}^ n) $ does not converge to zero. This implies that

which, by (9), yields

and hence a contradiction. Consequently, for all $i \in I$ ,

which means that $D_i(T_iu^n, f_i^n) = o((\lambda_i^n){}^{-1})$ . Finally, we assume that either there is an $i_0 \in I^c$ such that $ \newcommand{\bi}{\boldsymbol}\limsup\nolimits_{n\rightarrow \infty} D_{i_0}\big(T_{i_0}u^n, f_{i_0}^n\big) > D_{i_0} \big(T_{i_0}u^\dagger, f_{i_0}^\dagger\big)$ or that $\limsup\nolimits_{n\rightarrow \infty} R(u^n) > R(u^\dagger)$ . In either case this implies that there is a c  >  0 such that

But as before this implies that

which is a contradiction to u0 being optimal. Together with lower semi-continuity, this implies $ \newcommand{\bi}{\boldsymbol}\lim\nolimits_{n\rightarrow \infty} D_i\big(T_iu^n, f_i^n\big) = D_i \big(T_iu^\dagger, f_i^\dagger\big)$ for all $i \in I^c$ as well as $\lim\nolimits_{n\rightarrow \infty} R(u^n) = R(u^\dagger)$ . □

Remark 2.6. We note that, by choosing $\lambda_i^n$ for $i \in I$ such that

with $ \newcommand{\e}{{\rm e}} \epsilon _i \in (0, p_i) $ , the assumptions of theorem 2.5 concerning the choice of $\lambda_i^n$ hold and we obtain the rate

which comes arbitrarily close to the rate $o(\delta^{\,p_i})$ . The rate $O(\delta^{\,p_i})$ will be obtained under additional source conditions later on.

As a first consequence of theorem 2.5 we obtain a basic stability result. That is, keeping all regularization parameters in theorem 2.5 fixed, it follows that the solutions of $P(\lambda, f^ \dagger)$ are stable with respect to perturbations of the data. Note that this could also have been obtained from standard theory (e.g. [22]) without handling the N data terms individually.

Corollary 2.7 (Stability). Suppose that assumption (G) holds. Let $(\,f^n)_n$ in Y and $f^\dagger \in Y$ be such that $f^n \mathop{{\rightarrow}}\limits^{D} f^\dagger$ as $n\rightarrow \infty$ and take $\lambda^\dagger \in (0, \infty){}^ N$ . Assume that, for u0 a solution to $P(\lambda^\dagger, f^ \dagger)$ , assumption $S_1(u_0, f^\dagger, \{1, \ldots, N\})$ holds. Then every sequence $(u^n)_n$ of solutions to

Equation (11)

admits a $\tau_X$ -convergent subsequence again denoted by $(u^n)_n$ with limit $u^\dagger$ such that

and every limit of a $\tau_X$ -convergent subsequence of solutions is a solution to

Equation (12)

Proof. Choose $ \newcommand{\e}{{\rm e}} I = \emptyset$ , $\lambda^n = \lambda^\dagger$ and apply theorem 2.5. □

The next corollary is specific to the multi-data-discrepancy setting and handles the situation when the noise on some components vanishes while the noise on other components remains the same. It shows that in this situation, using an appropriate parameter choice, the limit problem comprises the minimization of the remaining terms subject to hard constraints for the channels with ground truth data. To deduce this result, we set the regularization parameters corresponding to components with a fixed noise level to be constant and assume an appropriate parameter choice for the others. Remarkably, the result holds without assumption $S_1(u, f^\dagger, I^c)$ .

Corollary 2.8 (Convergence for vanishing noise). Assume that assumption (G) holds. Let $I \subset \{1, \ldots, N\}$ be any index set, $f^ \dagger \in Y$ , and the sequence $(\,f^n)_n$ be such that $f^n_{I^c} = f_{I^c}^\dagger$ and $f^n_I \mathop{\rightarrow}\limits^{{D_I}} f_I^\dagger$ as $n\rightarrow \infty $ . Define $\delta^n_i = D_i(\,f^ \dagger_i, f^n_i){}^ {1/p_i}$ with $p_i \in [1, \infty)$ for $i\in I$ and $\delta_i^ n = 0$ else.

Choose the parameters $\lambda^n = (\lambda_1^n, \ldots, \lambda_N^n) \in (0, \infty){}^ N$ such that $\lambda_{i}^n = \lambda^ \dagger _{i}\in (0, \infty)$ for $i \in I^c$ and

and denote $\lambda^\dagger = \lim\nolimits_{n \rightarrow \infty} \lambda ^n \in (0, \infty]^N$ .

Assume that there exists a $J_{\lambda^\dagger, I^c}$ -minimal solution to $T_Iu=f_I^\dagger$ . Then every sequence $(u^n)_n$ of solutions to

Equation (13)

admits a $\tau_X$ -convergent subsequence again denoted by $(u^n)_n$ with limit $u^\dagger$ such that

and every limit of a $\tau_X$ -convergent subsequence of solutions is a solution to

Equation (14)

Proof. Apply theorem 2.5 and note that assumption $S_1(u_0, f^\dagger, I^c)$ was only needed in its proof to show the convergence $D_i\big(T_iu_0, f_i^n\big)\rightarrow D_i\big(T_iu_0, f_i^\dagger\big)$ for $i \in I^c$ , which holds trivially since $f^n_i = f_i^ \dagger$ for $i \in I^c$ . □

Remark 2.9. A particular consequence of the previous corollary is that fixing $f^n_i = f^\dagger_i$ also for $i \in I$ allows for any parameter choice $\lambda_i^n$ with $\lambda_i^n \rightarrow \infty$ and still yields $D_i(T_iu^n, f^\dagger_i)= D_i(T_iu^n, f^n_i) = o((\lambda_i^n){}^{-1}) $ for $i \in I$ . In the context of joint regularization, this analytically confirms that increasing the parameter $\lambda_i^n$ for particular, better-conditioned modalities or channels approximates the situation in which the corresponding components Tiu are fixed to match the data exactly and enter only as prior information in the joint regularization functional R.

In summary, the previous results answer the first question posed in the introduction: multi-data-fidelity regularization is stable with respect to convergence of the data for some channels and, using an appropriate parameter choice strategy, the limit problem in case the data for some channels converges to a ground truth comprises the minimization of the regularization and data discrepancies for the other channels subject to hard constraints for the channels with ground truth data.

Next we consider convergence rates. To this aim, we will need a slightly stronger version of assumption $S_1(u, f^\dagger, I^c)$ , which we denote by $S_2(U, f^\dagger)$ . Essentially we will require that the continuity of assumption $S_1(u, f^\dagger, I^c)$ also holds for all u in a set U and that the modulus of continuity can be estimated by functions $\psi_i$ . The typical situation we have in mind for applications is that $\psi_i(x) = x^{\nu_i}$ with $\nu_i \in (0, 1]$ .

Assumption ${S_2(U, f^ \dagger)}$

Remark 2.10. We note that assumption $S_2(U, f^ \dagger)$ is satisfied trivially in case the data discrepancy is a norm or, more generally, whenever the data terms satisfy an inverse triangle inequality of the type

The reason for using a slightly weaker assumption is that the weaker version as above is satisfied by norms to the power p with p  >  1, for which an inverse triangle inequality does not hold.

We also remark that the assumption $S_2(U, f^ \dagger)$ could be further relaxed for those indices $i \in I$ where $\lambda^n_i \rightarrow \infty$ , by allowing arbitrary constants as factors for $D_i(T_iv, f_i^\dagger)$ , instead of $(1 \pm\psi_i(D_i(\,f^\dagger_i, f_i)))$ , which goes to one. The fact that we can even get the constants to converge to one, which is true but less trivial for powers of norms, is only needed for indices in Ic where $\lambda_i^n$ converges to a finite value. Hence the latter is a particularity of the multi-data-fidelity setting when the noise in some components goes to zero but not in others.

The next theorem provides the main estimates for obtaining convergence rates for all specifications of the data terms that are considered later on. It partially relies on a rather abstract source condition that will be discussed below and simplified in particular applications of the theorem.

Theorem 2.11. With the assumptions of theorem 2.5, for $i \in I^c$ , define $\phi_i:[0, \infty) \rightarrow [0, \infty)$ continuous and monotonically increasing with $\phi_i(0) = 0$ such that

Let $(u^n)_n$ be a sequence of solutions to $P(\lambda^n, f^n)$ and $u^\dagger$ be a $J_{\lambda^\dagger, f^\dagger}$ -minimizing solution of $T_I u = f_I^\dagger$ . Set $U_{n_0} = \{u^n \, |\, n\geqslant n_0\} \cup \{u^\dagger\}$ and assume that, for some $n_0 \geqslant 0$ , $S_2(U_{n_0}, f^\dagger)$ holds.

Then

Equation (15)

If we assume further that there exists

and constants $0<\beta_1<1$ , $0<\beta_2$ such that

for all u satisfying $ \newcommand{\e}{{\rm e}} J_{\lambda^\dagger, I^c}(u, f^\dagger) + \sum\nolimits_{i \in I} D_i(T_i u, f^\dagger_i) \leqslant J_{\lambda^\dagger}(u^\dagger, f^\dagger)+ \epsilon$ , for some $ \newcommand{\e}{{\rm e}} \epsilon > 0$ , then, after at most finitely many indices n we have

Proof. Take $(u^n)_n$ a sequence of solutions and $u^ \dagger$ a $J_{\lambda^\dagger, f^\dagger}$ -minimizing solution of $T_I u = f_I^\dagger$ and, without loss of generality, assume that $S_2(U_0, f^\dagger)$ holds.

At first we note that, from optimality and the fact that $T_i u^\dagger = f_i^\dagger$ for $i \in I$ , it follows that

Equation (16)

Now we aim to change from fn to $f^\dagger$ and from $\lambda^n$ to $\lambda^\dagger$ in the functionals $J_{\lambda^n, I^c}$ of last line above. To this aim, we first note that, by assumption $S_2(U_0, f^\dagger)$ ,

where the last inequality follows from estimating $\lambda_i^n (D_i(T_iu^\dagger, f_i^\dagger)+1) \leqslant C$ for all $i\in I^c$ . Similarly, we get

where the last inequality is obtained by boundedness of $D_i(T_iu^ n, f^\dagger_i)$ , which follows by estimating it from above by $D_i(T_iu^n, f^n_i)$ (up to constants) using $S_2(U_0, f^\dagger)$ , which in turn is bounded by the assertion of theorem 2.5.

Now in order to change from $\lambda^n$ to $\lambda^\dagger$ we sum the two equations above and further estimate

where the last inequality is again due to boundedness of $D_i(T_iu^n, f^\dagger_i)$ .

Combining the last estimate with (16) we get

Equation (17)

which already shows the estimate on the extended objective functional $J_{\lambda ^\dagger, I^ c}(\cdot, f^ \dagger)$ as in (15). Now by the second inequality in $S_2(U_0, f^\dagger)$ we can estimate $ \newcommand{\bi}{\boldsymbol}D_i (T_i u^n, f^\dagger_i) \leqslant C\big(D_i (T_i u^n, f_i^n) + \psi_i((\delta_i^n){}^{\,p_i})\big)$ for $i \in I$ . This, together with the fact that $C < \lambda _i ^n $ for $i \in I$ and n sufficiently large allows to conclude from (17) that, up to finitely many indices, $ \newcommand{\e}{{\rm e}} J_{\lambda^\dagger, I^c}(u^n, f^\dagger) + \sum\nolimits_{i \in I} D_i(T_i u^n, f_i^\dagger) \leqslant J_{\lambda^\dagger}(u^\dagger, f^\dagger)+ \epsilon$ .

Hence the source condition holds for u  =  un and the difference of the reduced objective functionals as above can be estimated by

Plugging this estimate in equation (17), the assertion follows. □

As mentioned above, the parameters $p_i\geqslant 1$ are introduced for the case that the data discrepancies are powers of norms. Setting all pi  =  1, an abstract estimate on convergence rates follows immediately from the previous theorem.

Proposition 2.12. With the assumptions of theorem 2.11, set pi  =  1 for $i \in I$ . Then, with $(u^n)_n$ a sequence of solutions to $P(\lambda^n, f^n)$ we obtain for $j\in I$

Equation (18)

Equation (19)

Proof. First note that, by assumption $S_2(U_0, f^\dagger)$ we can again estimate, for $i \in I$ ,

It then follows from the assertion of theorem 2.11 that after finitely many indices

Using non-negativity of all involved terms (for sufficiently large n), the assertion follows as a direct consequence. □

Before considering a more concrete result for convergence rates, we discuss the source condition (SC), which is a direct extension of a standard source condition for a single norm discrepancy term in the context of non-linear inverse problems in Banach spaces, see [22]. The reason for introducing the power 1/pi in the condition is to obtain compatibility with existing source conditions in the case that $D_i(v, f) = \|v-f\|^ {q_i}_{Y_i}$ , by choosing $p_i = q_i$ . Obviously, when letting the noise on all data components converge to zero (still potentially at different rates), i.e. choosing $I=\{1, \ldots, N\}$ , $J_{\lambda^\dagger, I^ c}(\cdot, f^\dagger)$ reduces to R and the source condition coincides with the one of [22]. As discussed for instance there or in [40], under some additional assumptions, this source condition is then equivalent to more classical source conditions [15].
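For orientation, a standard special case (an illustration of our own, not spelled out in the paper): for a single channel, a bounded linear operator T between Hilbert spaces and $R(u) = \frac{1}{2}\|u\|_X^2$ , a source element of the form

$$\xi = u^\dagger \in {\rm range}(T^*), \qquad \text{i.e. } u^\dagger = T^*w \ \text{ for some } w,$$

recovers a classical range-type source condition as discussed in [15].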

The following remark, whose proof is direct, allows for an interpretation of the source condition also in the most general setting.

Remark 2.13. Take $u^ \dagger\in X$ , $f^ \dagger \in Y$ , $\lambda^ \dagger \in (0, \infty]^ N$ and $I \subset \{1, \ldots, N\}$ such that $\lambda_i ^ \dagger < \infty $ for $i \in I^ c$ . With $ \xi \in \partial \Big(J_{\lambda^\dagger, I^c}(\cdot, f^\dagger)\Big)(u^\dagger)$ and constants $0<\beta_1<1$ , $0<\beta_2$ , $p_i \in [1, \infty)$ for $i \in I$ , we have, for all $u \in X$ ,

if and only if

The second equation can be interpreted as follows: if u deviates from the equality constraints $T_i u = f^ \dagger_i$ for $i \in I$ , the cost of the respective data term needs to increase at least as fast as the Bregman distance of u to $u^\dagger$ with respect to the reduced objective functional.

We now consider more concrete assertions on convergence rates. It is easy to see that, in the assertion of proposition 2.12, the choice $f_i^n = f_i^\dagger$ and $\lambda_i^n = \lambda_i^\dagger$ for $i\in I^c$ allows us to choose $\phi_i \equiv 0$ and improves the convergence rates accordingly. More particularly, in the case that the noise on every component goes to zero, i.e. $I^c = \emptyset$ , we get the following corollary.

Corollary 2.14. With the assumptions of theorem 2.11, set $I= \{1, \ldots, N\}$ , pi  =  1 for all i and let $(u^n)_n$ be a sequence of solutions to $P(\lambda^n, f^n)$ and $u^\dagger$ be an R-minimizing solution of $T u = f^\dagger$ such that $S_2(U_{n_0}, f^\dagger)$ holds for some n0. Then, after at most finitely many $n\geqslant 0$ ,

Equation (20)

If we assume further that there exists

constants $0<\beta_1<1$ , $0<\beta_2$ such that

for all u satisfying $ \newcommand{\e}{{\rm e}} R(u) + \sum\nolimits_{i=1}^N D_i(T_i u, f^\dagger_i) \leqslant R (u^\dagger)+ \epsilon$ for some $ \newcommand{\e}{{\rm e}} \epsilon > 0$ , then we obtain for any $j \in \{1, \ldots, N\}$ ,

Equation (21)

Equation (22)

Proof. This follows from theorem 2.11 and proposition 2.12 by choosing $I = \{1, \ldots, N\}$ . □

The previous result provides, in a general setting, an answer to the second question posed in the introduction: under appropriate assumptions, convergence rates can be obtained for the multi-discrepancy setting and the estimates above provide explicit information on the interplay of the different channels and discrepancy terms with respect to convergence rates. Next, we will discuss a first example, in the general setting, of how this interplay can be exploited with appropriate parameter choice strategies in order to improve convergence rates. This is a first answer to the third question of the introduction, for which a more detailed discussion with concrete examples of data discrepancies is provided in the subsequent section.

Consider the particular case that the noise level for all components is given as $\delta_i^n \sim (\delta^n){}^{\mu_i}$ for some $\delta^n \rightarrow 0$ and $\mu_i \geqslant 1$ and that $\psi_i (x) \sim x$ for all i. It is immediate that using the same $\delta_i^n$ -dependent parameter choice for all $\lambda_i^n$ , say $ \newcommand{\e}{{\rm e}} \lambda_i^n \sim (\delta_i^n){}^{-(1-\epsilon)} $ for $ \newcommand{\e}{{\rm e}} \epsilon \in (0, 1)$ , yields the rates

That is, the rates for both $D_R^\xi$ and Dj are affected by the worst of all $\mu_i$ . In addition, choosing $\epsilon \rightarrow 1$ gives the best rate for $D_R^\xi$ , but worsens the rate for Dj whenever $\mu_j > \min\nolimits_i \{\mu_i\}$ .

As will be shown in the following corollary, adapting the rate of each parameter to the interplay of the $\mu_i$ allows us to improve upon this. In particular, it allows us to obtain the rate $(\delta^ n){}^{\mu_j}$ for each $\mu_j$ and a slightly improved rate for $D_R^\xi$ . This is achieved in the more general setting that $\psi_i (x) \sim x^{\nu_i}$ with potentially different $\nu_i \in (0, 1]$ , which is relevant for applications where different noise characteristics yield different discrepancy terms.

Corollary 2.15. With the assumptions of corollary 2.14, suppose that the $\psi_i$ in assumption $S_2(U, f^ \dagger)$ are such that

and that

with a sequence $(\delta^n)_n$ in $(0, \infty)$ such that $\delta^n \rightarrow 0$ , $\mu_i \geqslant 1$ and $\nu_i \in (0, 1]$ . Define $ \newcommand{\e}{{\rm e}} \eta_i = \mu_i \nu_i$ and $ \newcommand{\e}{{\rm e}} \eta_{\min} = \min\nolimits_i \{\eta_i\} $ and take one i0 such that $ \newcommand{\e}{{\rm e}} \mu_{i_0} = \min \{\mu_i \, |\, \eta_i = \eta_{\min} \}$ . Assume that $\nu_{i_0} < 1$ . Then, with

for all i it follows that $ \newcommand{\e}{{\rm e}} \epsilon_i \in (0, 1)$ and with the parameter choice

we obtain

Equation (23)

If we again assume that the source condition (SC) of corollary 2.14 holds, then we obtain for any $j \in \{1, \ldots, N\}$ ,

Equation (24)

Equation (25)

This choice of the $ \newcommand{\e}{{\rm e}} \epsilon_i$ is optimal for the estimates on Dj and $D_R^\xi$ given in corollary 2.14 in the sense that they cannot be improved by any other parameter choice of the form $ \newcommand{\e}{{\rm e}} \lambda_i^n = (\delta_i^n){}^{-(1-\tilde{\epsilon}_i)}$ with $ \newcommand{\e}{{\rm e}} \tilde{\epsilon} _i \in (0, 1)$ .

Remark 2.16. Before providing the proof, we note that, for the sake of simplicity, we have left out the case that $\nu_{i_0}=1$ with $\nu_{i_0}$ as above. In this case, one has to choose $\epsilon_i = \epsilon \frac{\eta_{\min}}{\mu_i}$ with $\epsilon \in (0, 1)$ , which allows one to approximate the same rates in the limit $\epsilon \rightarrow 1$ .

Proof. At first we show that indeed $ \newcommand{\e}{{\rm e}} \epsilon_i < 1$ : in case i is such that $ \newcommand{\e}{{\rm e}} \eta_i > \eta_{\min}$ this follows from $ \newcommand{\e}{{\rm e}} \epsilon_i= \frac{\eta_{\min}}{\eta_i}\nu_i \leqslant \frac{\eta_{\min}}{\eta_i} < 1$ . In the other case, we note that necessarily $\nu_i<1$ hence $ \newcommand{\e}{{\rm e}} \epsilon_i < \frac{\eta_{\min}}{\eta_i} \leqslant 1$ .

Plugging in the proposed parameter choice in the rates of corollary 2.14 the claimed rates follow by direct computation. Regarding their optimality, we note that for any choice of $ \newcommand{\e}{{\rm e}} \tilde{\epsilon}_i$ , the rate for $ \newcommand{\D}{\mathcal{D}} \D_R^\xi$ is given as

which cannot be better than $(\delta^n){}^{\eta_{\min}}$ .

Similarly, by considering the jth summand in the rate for Dj, it is immediate that the rate cannot be better than $(\delta^n){}^{\mu_j}$ , which is achieved with the proposed parameter choice. □
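To illustrate the adapted parameter choice of corollary 2.15, the following minimal Python sketch (an illustration of our own, not the paper's accompanying source code) computes the exponents $\epsilon_i = \nu_i\, \eta_{\min}/\eta_i$ and, assuming a parameter choice of the form $\lambda_i^n = (\delta_i^n){}^{-(1-\epsilon_i)}$ as in the optimality statement above, the resulting regularization parameters from given $\mu_i$ , $\nu_i$ and overall noise level $\delta^n$ :

```python
def adapted_parameter_choice(delta, mu, nu):
    """Parameter choice adapted to the interplay of the per-channel noise levels.

    delta : overall noise level delta^n > 0
    mu[i] : exponent of the per-channel noise level, delta_i^n = delta**mu[i]
    nu[i] : exponent of the modulus of continuity, psi_i(x) ~ x**nu[i]
    Returns the list of regularization parameters lambda_i^n.
    """
    eta = [m * n for m, n in zip(mu, nu)]
    eta_min = min(eta)
    eps = [n * eta_min / e for n, e in zip(nu, eta)]  # epsilon_i = nu_i * eta_min / eta_i
    delta_i = [delta ** m for m in mu]                # per-channel noise levels
    return [d ** (-(1.0 - e)) for d, e in zip(delta_i, eps)]

# Example: two channels, delta^n = 1e-3, mu = (1, 2), nu = (1/2, 1)
print(adapted_parameter_choice(1e-3, [1.0, 2.0], [0.5, 1.0]))
```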

3. Specification of the data term

3.1. Power-of-norm discrepancies

In this subsection, we consider the particular case that all discrepancy terms Di are powers of norms, i.e. $D_i(u, f) = \|u-f\|_{Y_i}^{\,p_i}$ with $p_i \in [1, \infty)$ and $J_\lambda(u, f) = R(u) + \sum\nolimits_{i=1}^N\lambda_i \|T_iu - f_i\|_{Y_i}^{\,p_i}$ . That is, we consider the minimization problem

$$\min_{u \in {\rm dom}(T)}\ R(u) + \sum_{i=1}^N \lambda_i \|T_iu - f_i\|_{Y_i}^{\,p_i}. \qquad (P_N(\lambda, f))$$

In this particular case, we choose the topologies $\tau_{D_i}$ to be the trivial topologies and hence Di-convergence is equivalent to convergence in the norm $\|\cdot \|_{Y_i}$ . As we will see, assumption (G) then reduces as follows.

Assumption (N). [Norm discrepancies, $\tau_D$ the trivial topology]

  • 1.  
    $\|\cdot \|_{Y_i}$ is sequentially lower semi-continuous w.r.t. $\tau_{Y_i}$ , and addition and scalar multiplication are continuous in the topology $\tau_{Y_i}$ .
  • 2.  
    R is lower semi-continuous w.r.t $\tau_X$ .
  • 3.  
    The operators Ti are continuous from $(X, \tau_X)$ to $(Y_i, \tau_{Y_i})$ and $ \newcommand{\domain}{{\rm dom}} \domain(T_i)$ is closed w.r.t convergence in $\tau_X$ .
  • 4.  
    For any $\lambda \in (0, \infty){}^N$ , $f \in Y$ and any sequence $(u^n)_n$ we have that $J_\lambda(u^n, f)< c$ implies that $(u^n)_n$ admits a $\tau_X$ -convergent subsequence.

Proposition 3.1. For each $i \in \{1, \ldots, N\}$ , set $D_i(v_i,f_i) := \|v_i-f_i\|_{Y_i}^{\,p_i}$ with $p_i \in [1, \infty)$ , and define $\tau_D$ to be the trivial topology. Then, if assumption (N) holds, assumption (G) is also satisfied.

Proof. We consider the particular points of assumption (G) with the numbering given there. Point 1 obviously holds. For two sequences $(a_i^n)_n, (\,f_i^n)_n$ in Yi converging to $a _i, f_i \in Y_i$ with respect to $\tau_{Y_i}$ and norm convergence, respectively, we get that

and hence, by monotonicity and continuity of the mapping $x \mapsto x^{\,p_i}$ on $[0, \infty)$ , point 2 of assumption (G) holds. Points 3 and 4 remain unchanged. Regarding point 5, take $\lambda >0$ , $(\,f^n)_n$ a sequence D-converging to $f \in Y$ and $(u^n)_n $ a sequence such that $J_\lambda(u^n, f^{n}) < c $ , with c  >  0. By the triangle inequality we get that

which is bounded, hence by assumption $(u^ n)_n $ admits a $\tau_X$ -convergent subsequence. □

By the inverse triangle inequality and continuity of $x \mapsto x^{\,p_i}$ it is also clear that assumption $S_1(u, f^\dagger, I^c)$ holds for any $u \in X$ . Hence, in summary, for the case of power-of-norm discrepancies, we get the following simplified version of the general result in theorem 2.5, covering existence, stability and convergence for vanishing noise. Obviously, the corollaries 2.7 and 2.8 then also hold in simplified versions.

Theorem 3.2 (General convergence result for powers of norms). Suppose that assumption (N) holds. Let $(\,f^n)_n$ in Y and $f^\dagger \in Y$ be such that $\|\,f^\dagger_i - f_i^n \|_{Y_i} \rightarrow 0$ for each i as $n\rightarrow \infty$ , and, for $i \in \{1, \ldots, N\}$ , define $\delta_i ^n = \|\,f^\dagger _i - f_i^n\|_{Y_i} $ such that, by assumption, $\delta_i^n \rightarrow 0$ as $n \rightarrow \infty$ . Choose the parameters $\lambda^n = (\lambda_1^n, \ldots, \lambda_N^n) \in (0, \infty){}^ N$ such that, for a given subset $I \subset \{1, \ldots, N\}$ ,

Further denote $\lambda^\dagger = \lim\nolimits_{n \rightarrow \infty} \lambda ^n \in (0, \infty]^N$ .

If ${\rm dom}(R) \cap \left(\bigcap\nolimits_{i=1}^N {\rm dom}(T_i) \right) \neq \emptyset$ , then there exists a sequence $(u^n)_n$ of solutions to $P_N(\lambda^n, f^n)$ ; every such sequence of solutions possesses a $\tau_X$ -convergent subsequence, again denoted by $(u^n)_n$ , with limit $u^\dagger$ such that

and every limit of a $\tau_X$ -convergent subsequence of solutions is a solution to

Equation (26)

Regarding convergence rates, we recall that in the general case they rely on assumption $S_2(U, f^ \dagger)$ . For power-of-norm discrepancies, we now show that this assumption is also always satisfied. To this aim, we first need the following basic facts from analysis.

Lemma 3.3. Let $a, b, \lambda >0$ , $p, q \in (1, \infty)$ such that $\frac{1}{p}+\frac{1}{q}=1$ . Then

Further, there exist constants $c, C>0$ such that for any $\lambda \in (0, 1]$ ,

Proof. The first inequality follows from the Hölder inequality, since

which implies

Using l'Hôpital's rule we further get

and obviously $ \lim\nolimits_{\mu \rightarrow \infty} \frac{(1+\mu^q){}^{\,p-1}}{\mu^{q(\,p-1)}} = 1$ , hence the other two inequalities hold. □

This allows us to obtain the following estimates, which are a particular case of the inequalities required by assumption $S_2(U, f^ \dagger)$ .

Proposition 3.4. Let $(Y, \|\cdot\|)$ be a normed space, $f, f^\dagger, v \in Y$ and $p, q\in (1, \infty)$ with $\frac{1}{p}+\frac{1}{q}=1$ . Define $\delta = \|f - f^\dagger\|$ . Then, there exist $\alpha>0$ and constants $c, C, d, D>0$ such that, whenever $\delta < \alpha$ ,

Proof. Obviously, the result holds for $\delta=0$ , hence we assume $\delta>0$ . First we use lemma 3.3 to estimate, for a general $\lambda \in (0, 1]$ ,

Equation (27)

Now we set $\lambda = \delta^\frac{1}{q}$ and get for δ sufficiently small

Reordering the inequality (27), we obtain

and since $\lim\nolimits_{\lambda \rightarrow 0} \frac{(1+C \lambda){}^{-1}-1}{\lambda}=-C$ , there exists d  >  0 such that for all $\lambda $ sufficiently small $(1+C\lambda){}^{-1} \geqslant (1-d\lambda)$ and hence

Setting again $\lambda = \delta^\frac{1}{q}$ the result follows. □

The previous proposition shows that, in case all Di terms are powers of norms, assumption $S_2(U, f^ \dagger)$ again holds for any $U \subset X$ and $f^\dagger$ , with $\psi_i(x)=x^\frac{1}{p_i}$ . From this, we obtain the following convergence rates.

Theorem 3.5. With the assumptions of theorem 3.2, decompose the index set I such that $I = J_1 \cup J_p$ where $J_1 = \{i \in I \, |\, p_i = 1\}$ , $J_p = \{i \in I \, |\, p_i > 1\}$ . Let the parameter choice for the indices $i\in I$ be such that

Equation (28)

with $ \newcommand{\e}{{\rm e}} \epsilon_i \in (0, 1)$ . Let $(u^n)_n$ be a sequence of solutions to $P_N(\lambda, f^n)$ and $u^\dagger$ be a $J_{\lambda^\dagger, f^\dagger}$ -minimizing solution of $T_I u = f_I^\dagger$ . Then

Equation (29)

Assume further that there exists

and constants $0<\beta_1<1$ , $0<\beta_2$ such that

for all u satisfying $ \newcommand{\e}{{\rm e}} J_{\lambda^\dagger, I^c}(u, f^\dagger) + \sum\nolimits_{i \in I} \|T_iu-f^\dagger_i\|_{Y_i}^{p_i} \leqslant J_{\lambda^\dagger}(u^\dagger, f^\dagger)+ \epsilon$ for some $ \newcommand{\e}{{\rm e}} \epsilon > 0$ . Then, for $j \in I$ , it holds

Equation (30)

and, in case pj  =  1,

Equation (31)

and, in case pj  >  1,

Equation (32)

Proof. At first we note that, by theorem 2.11, for all but finitely many indices n we get

Using that, by Young's inequality, for $i \in J_p$ ,

we obtain

Equation (33)

Estimating the left-hand side from below by the individual terms, the assertion follows. □

We now consider the particular case that the noise on all components vanishes, i.e. $I= \{1, \ldots, N\}$ , and that pi  =  2 for all i, which is typically chosen when the norms $\|\cdot \|_{Y_i}$ are two-norms and the noise is Gaussian.

Assume again that the noise level of the different components is related as $(\delta_i^n) = (\delta^n){}^{\mu_i}$ with $\delta^n \rightarrow 0$ , $\delta^n \geqslant 0$ and $\mu_i \geqslant 1$ with $\mu_{i_0} = 1$ for one i0 (the slowest rate). As a direct consequence of the previous theorem we get the rate $\delta^n$ for $D^\xi_R$ and $(\delta^n){}^{\mu_j + 1}$ for $ \|T_ju^n - f_j\|_{Y_j}^{2} $ . The following corollary shows that, again, by adapting the parameter choice to the interplay of the different noise levels, we can improve the rate for $ \|T_ju^n - f_j\|_{Y_j}^{2} $ without worsening the rate for $D_R^\xi$ .
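As a small worked example (numbers chosen for illustration only): for two channels with $\mu_1 = 1$ and $\mu_2 = 2$ , the rates just stated give $\|T_2u^n - f_2^n\|_{Y_2}^{2} = O\big((\delta^n){}^{\mu_2+1}\big) = O\big((\delta^n){}^{3}\big)$ , whereas the adapted choice of the corollary below yields

$$\|T_2u^n - f_2^n\|_{Y_2}^{2} = O\big((\delta^n){}^{2\mu_2}\big) = O\big((\delta^n){}^{4}\big), \qquad D_R^\xi(u^n, u^\dagger) = O(\delta^n),$$

i.e. the rate for the better channel improves while the rate for the Bregman distance is unchanged.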

Corollary 3.6. With the assumptions of theorem 3.2, set the index set $I= \{1, \ldots, N\}$ and pi  =  2 for all i. Assume that the noise level for each component is such that

with $(0, \infty) \ni \delta^n \rightarrow 0$ , $\mu_i \geqslant 1$ and $\mu_{i_0} = 1$ for some i0. Let the parameter choice for all i be such that

Equation (34)

Let $(u^n)_n$ be a sequence of solutions to (${P_N(\lambda, f)}$ ) and $u^\dagger$ be an R-minimizing solution of $T u = f^\dagger$ . Then

Equation (35)

Assume further that there exists

and constants $0<\beta_1<1$ , $0<\beta_2$ such that

for all u satisfying $R(u) + \sum\nolimits_{i=1}^N D_i(T_iu, f_i^\dagger) \leqslant R(u^\dagger)+ \epsilon$ for some $\epsilon > 0$ . Then, for any j, it holds that

Equation (36)

and

Equation (37)

This parameter choice is optimal for the estimates given in equation (33) in the sense that they cannot be improved by choosing $ \newcommand{\e}{{\rm e}} \lambda_i^n = (\delta_i^n){}^{-(2-\epsilon_i)}$ with $ \newcommand{\e}{{\rm e}} \epsilon_i \in (0, 2)$ .

Proof. First note that the first estimate in equation (33) in the proof of theorem 3.5 holds true for any admissible parameter choice. It is also easy to see that, for the particular case of this corollary, the rate for $D_{R}^\xi$ cannot be better than claimed. Also for the data discrepancy $\|T_j u^n -f_j^n\|_{Y_j}^2$ we see, by considering the jth summand, that the rate cannot be better than $(\delta^n){}^{2\mu_j}$ , which is the claimed rate for the proposed choice. Plugging in the proposed parameter choice, the claimed rates follow by direct computation. □

3.2. Mixed two-norm and Kullback–Leibler-divergence discrepancies

In inverse problems where the measurement process essentially corresponds to counting a number of events, such as photon measurements in PET or electrons in electron tomography, the noise is typically assumed to follow a Poisson distribution. Interpreting the regularization term as a prior in a Bayesian approach, the data fidelity for computing a maximum a posteriori (MAP) estimate for Poisson-noise-corrupted data is the Kullback–Leibler divergence (see for instance [23, 39]).
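A brief sketch of this connection (in a simplified discrete setting, chosen here for illustration): if the data $f = (\,f_j)_j$ are independent Poisson counts with means $v_j = (Tu)_j$ , then the negative log-likelihood is

$$-\log p(\,f\,|\,v) = \sum_j \big(v_j - f_j \log v_j + \log(\,f_j!)\big),$$

which, up to terms independent of u, equals $\sum_j \big(v_j - f_j - f_j\log(v_j/f_j)\big)$ , i.e. a discrete version of the Kullback–Leibler divergence defined in (38) below.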

Since important applications of joint regularization approaches, such as MR-PET reconstruction or multi-spectral electron tomography, include this type of measurement, we also consider the application of the previous results to the Kullback–Leibler divergence as data fidelity. To this aim, we first establish some basic properties of this type of data fidelity.

3.2.1. Basic properties of the Kullback–Leibler divergence.

We define the Kullback–Leibler divergence as follows. Let $(\Sigma, \mathcal{B}_\mu, \mu)$ with $\Sigma \subset \mathbb{R}^m$ be a finite measure space. We consider functions in $L^1(\Sigma, \mu)$ , where typically, for example, either μ is the m-dimensional Lebesgue measure $\mathcal{L}^ m$ and Σ a bounded subset of $\mathbb{R}^m$ , or $\Sigma \subset \mathcal{S}^{m-1} \times \mathbb{R}$ , with $\mathcal{S}^{m-1}$ being the (m − 1)-dimensional unit sphere and μ the (m − 1)-dimensional Hausdorff measure restricted to $\mathcal{S}^{m-1}$ . The latter is the generic image space for the Radon transform.

Then we define $ \newcommand{\dkl}{D_{{\rm KL}}} \dkl:L^1(\Sigma, \mu) \times L^1(\Sigma, \mu) \rightarrow [0, \infty]$ as

$$D_{\rm KL}(v, f) := \int_\Sigma \Big( v - f - f \log\Big(\frac{v}{f}\Big)\Big) \,{\rm d}\mu, \qquad (38)$$

where we set $0 \log \Big(\frac{a} {0}\Big) = 0$ for $a \geqslant 0$ and $-b \log (\frac{0}{b}) = \infty$ for b  >  0. Note that $D_{\rm KL}$ is well defined since, as an easy computation shows, the integrand is always non-negative, but it potentially takes the value infinity even for non-negative arguments.
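For illustration, a discrete counterpart of (38) with the above conventions could be implemented as follows (a sketch of our own, not the paper's accompanying source code):

```python
import numpy as np

def kl_divergence(v, f):
    """Discrete Kullback-Leibler divergence sum_j (v_j - f_j - f_j*log(v_j/f_j)),
    using the conventions 0*log(a/0) = 0 (for a >= 0) and -b*log(0/b) = inf (for b > 0)."""
    v = np.asarray(v, dtype=float)
    f = np.asarray(f, dtype=float)
    if np.any(v < 0) or np.any(f < 0):
        raise ValueError("arguments must be non-negative")
    out = v - f
    pos = f > 0
    # where f > 0 and v = 0 the term -f*log(v/f) is +inf; elsewhere use the formula
    with np.errstate(divide="ignore"):
        out[pos] -= f[pos] * np.log(v[pos] / f[pos])
    return float(np.sum(out))

print(kl_divergence([1.0, 2.0, 0.5], [1.0, 1.5, 0.5]))  # small sanity check, result >= 0
```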

The following properties of $D_{\rm KL}$ are well known, can for instance be found in [3, 37] for the case of Lebesgue measures, and extend directly to more general measures since they rely exclusively on pointwise assertions.

Lemma 3.7 (General properties of Kullback–Leibler). For $v, f \in L^1(\Sigma, \mu)$ , the Kullback–Leibler divergence satisfies the following.

  • 1.  
    The function $D_{\rm KL}(\cdot, \cdot)$ is non-negative, and $D_{\rm KL}(v, f) = 0$ implies $v = f$ .
  • 2.  
    The function $ \newcommand{\dkl}{D_{{\rm KL}}} (v, f)\mapsto \dkl(v, f)$ is convex.
  • 3.  
    The Kullback–Leibler divergence can be estimated as follows
    Equation (39)
  • 4.  
    For $(v^n, f^n)\in L^1(\Sigma, \mu) \times L^1(\Sigma, \mu)$ uniformly bounded in $L^1(\Sigma, \mu)$ ,

Further, we will require suitable continuity results for the Kullback–Leibler divergence as stated in the following proposition.

Proposition 3.8. (Continuity properties)

  • 1.  
    For sequences $(v^n)_n, (\,f^n)_n \in L^1(\Sigma, \mu)$ and $v,f \in L^1(\Sigma,\mu)$ , the Kullback–Leibler divergence is jointly lower semi-continuous in the sense that
    Equation (40)
  • 2.  
    Let $f, v \in L^1(\Sigma, \mu)$ and the sequence $(\,f^n)_n$ be given such that $D_{\rm KL}(\,f, f^n) \to 0$ as $n \rightarrow \infty$ . Assume that there exist constants $\beta_0, \, \beta_1>0$ such that
    Then
    Equation (41)
  • 3.  
    In the setting of assertion 2, define $\Sigma_{v+} = \{x \in \Sigma \, |\, v(x)>0\}$ . Then there exists C  >  0 such that
    Equation (42)

Proof. 

  • Ad 1.  
    We show that $D_{\rm KL}$ is lower semi-continuous with respect to L1 convergence in both components. The desired statement then follows since convergence in $D_{\rm KL}(\,f, f^n)$ implies convergence in L1 due to lemma 3.7, assertion 3, and convexity of the functional yields lower semi-continuity with respect to weak L1 convergence. Without loss of generality we assume that $(v^n)_n$ and $(\,f^n)_n$ are such that $v^n\geqslant 0$ , $f^n \geqslant 0$ pointwise almost everywhere and $\lim\nolimits_{n\to \infty} D_{\rm KL}(v^n, f^n)=\liminf\nolimits_{n\to \infty} D_{\rm KL}(v^n, f^n)$ . The sequences $(v^n)_n$ , $(\,f^n)_n$ are bounded in L1, and thus contain subsequences $(v^{n_i})_i$ , $(\,f^{n_i})_i$ converging pointwise to $v$ and f, respectively. We show that the non-negative integrand $g(a, b)=a-b-b\log \big(\frac{a}{b}\big)$ of $D_{\rm KL}$ is lower semi-continuous on $[0, \infty)\times[0, \infty)$ . Indeed, for $a^n\to a$ , $b^n\to b$ , the case a  >  0, b  >  0 is trivial since g is continuous on $(0, \infty)\times(0, \infty)$ . If a  =  0 and b  =  0, then $g(a, b)=0\leqslant g(a^n, b^n)$ ; if a  =  0, b  >  0, then $-b^n \log \big(\frac{a^n}{b^n}\big) \to \infty$ , implying $\lim g(a^n, b^n)=g(a, b)=\infty$ . In the case a  >  0, b  =  0, the value $b^n \log \big(\frac{a^n}{b^n}\big) \to 0$ , resulting in $g(a, b)=\lim \limits_{n\rightarrow \infty} g(a^n, b^n)$ . Hence, $g\big(v(x), f(x)\big)\leqslant \liminf\nolimits_{i \to \infty} g\big(v^{n_i}(x), f^{n_i}(x)\big)$ a.e. and application of Fatou's lemma yields
  • Ad 2.  
    We assume $ \newcommand{\dkl}{D_{{\rm KL}}} \dkl(\,f, f^n) < \infty $ and $ \newcommand{\dkl}{D_{{\rm KL}}} \dkl(v, f)< \infty$ , since the case $ \newcommand{\dkl}{D_{{\rm KL}}} \dkl(v, f)=\infty$ is trivial due to lower semi-continuity. Define $\Sigma_{v+} = \{x \in \Sigma \, |\, v(x) >0 \}$ and note that $f(x) > 0 $ for $x \in \Sigma_{v+}$ by assumption. Using the conventions for $a\log(\frac{a}{b})$ for the case that a  =  0 or b  =  0, direct computation shows
    Equation (43)
    Indeed, note that the above computations are also valid in the special cases where one of the occurring functions attains the value 0: since both $D_{\rm KL}(\,f, f^n)$ and $D_{\rm KL}(v, f)$ are finite, we get that, up to sets of measure zero, $f(x) = 0 $ implies fn(x)  =  0, and $v(x)=0$ implies $f(x)=0$ . This means that all log terms in the second line above are finite, and a simple case distinction shows that the estimate above also captures all degenerate cases. The claimed convergence then follows immediately.
  • Ad 3.  
    This follows directly from (43) and lemma 3.7.

Our goal is now to identify when the general assumptions (G) as well as assumptions $S_1(u, f^\dagger, I^c)$ , $S_2(U, f^ \dagger)$ can be verified for data fidelities that include $ \newcommand{\dkl}{D_{{\rm KL}}} \dkl$ . To this aim, we summarize the corresponding assertions in the following proposition, which is an immediate consequence of lemma 3.7 and proposition 3.8.

Proposition 3.9 (Properties of $D_{\rm KL}$ ). Define $\tau_{D_{\rm KL}}$ to be the trivial topology on $L^1(\Sigma, \mu)$ , such that $f^n \mathop{\rightarrow}\limits^{D_{\rm KL}} f$ if and only if $D_{\rm KL}(\,f, f^n) \rightarrow 0$ . Then the following holds for each $f, v \in L^1(\Sigma, \mu)$ and each sequence $(\,f^n)_n $ in $L^1(\Sigma, \mu)$ .

  • 1.  
    $f^n \overset{D_{\rm KL}}{\rightarrow} f$ if and only if $ \newcommand{\dkl}{D_{{\rm KL}}} \dkl(\,f, f^n) \rightarrow 0$ and $\|f^n -f \|_1 \rightarrow 0$ .
  • 2.  
    $ \newcommand{\dkl}{D_{{\rm KL}}} \dkl(u, v) \geqslant 0$ , $ \newcommand{\dkl}{D_{{\rm KL}}} \newcommand{\R}{\mathbb{R}} \dkl(u, v) = 0 \Rightarrow u=v$ .
  • 3.  
    $ \newcommand{\dkl}{D_{{\rm KL}}} \dkl$ is lower semi-continuous w.r.t weak-L1 convergence in the first and $ \newcommand{\dkl}{D_{{\rm KL}}} \dkl$ convergence in the second component.
  • 4.  
    If $v, f^\dagger \in L^1(\Sigma, \mu)$ are such that there exist constants $\beta_0, \beta_1$ with $\beta_0 f^\dagger \geqslant v \geqslant \beta_1 f^\dagger$ , then the mapping $f \mapsto D_{{\rm KL}}(v, f)$ is continuous at $f^\dagger$ with respect to $D_{{\rm KL}}$-convergence.
  • 5.  
    If $U \subset L^1 (\Sigma, \mu)$ and $f^\dagger \in L^1(\Sigma, \mu)$ are such that there exist constants $\beta_0, \beta_1$ with $\beta_0 f^\dagger \geqslant v \geqslant \beta_1 f^\dagger$ for all $v \in U$ , then there exists C  >  0 such that, for each $v \in U$ and each f with $D_{{\rm KL}}(\,f^\dagger, f) \leqslant 1$ , we have
    Equation (44)

3.2.2. Stability and convergence for mixed two-norm and Kullback–Leibler discrepancies.

In the previous section we have derived properties of the Kullback–Leibler divergence that allow us to incorporate it in the general framework of section 2. In this section, motivated by concrete applications, we work out how these results can be applied to the particular case of mixed L2 and Kullback–Leibler data discrepancy terms. That is, we decompose the index set $\{1, \ldots, N\}$ as the disjoint union of $L_{\rm nr}$ and $L_{\rm kl}$ and consider the minimization problem

$\min\limits_{u \in L^{\,p}(\Omega){}^N} \ \sum\limits_{i \in L_{\rm nr}} \lambda_i \|T_i u_i - f_i\|_2^2 + \sum\limits_{i \in L_{\rm kl}} \lambda_i D_{{\rm KL}}(T_i u_i + c_i, f_i) + R(u), \qquad\qquad (P_{{\rm NKL}}(\lambda, f))$
where $p \in (1, 2]$ and $\Omega \subset \mathbb{R}^d$ is a bounded Lipschitz domain, $R:L^{\,p}(\Omega){}^N \rightarrow [0, \infty]$ is a regularization term and $\lambda \in (0, \infty){}^N$ . Note that we have further specified the problem by evaluating the data discrepancies only on the components ui of $u = (u_1, \ldots, u_N)$ , which is a particular case of the previous setting and corresponds to the situation that all components of u are obtained from independent measurements. Also, we now assume the operators Ti to be bounded linear operators, that is, for $i \in L_{\rm nr}$ we assume $T_i\in \mathcal{L}(L^{\,p}(\Omega) , L^{\,p}(\Sigma_i))$ with $\Sigma_i \subset \mathbb{R}^{m_i}$ bounded and Lebesgue measurable, and for $i \in L_{\rm kl}$ we assume $T_i \in \mathcal{L}(L^{\,p}(\Omega) , L^1(\Sigma_i, \mu_i))$ with $\Sigma_i \subset \mathbb{R}^{m_i}$ and $(\Sigma_i, \mathcal{B}_{\mu_i}, \mu_i)$ a finite measure space. The fi denote the given data in the respective spaces and, for indices $i \in L_{\rm kl}$ , we have included additional additive terms ci that we assume to be given and such that $c_i \in L^ 1(\Sigma_i)$ , $c_i \geqslant 0$ pointwise. The reason for including the ci is that in PET imaging they are typically used to model (a priori estimated) measurement contributions resulting from scattering and random events (see for instance [26]).

For $i \in L_{\rm kl}$ , the data discrepancies are given by the Kullback–Leibler divergence $D_{{\rm KL}}$ as defined in equation (38). For the norm discrepancies we extend the 2-norm by duality to $v \in L^{\,p}(\Sigma_i)$ as

$\|v\|_2 := \sup \Big\{ \int_{\Sigma_i} v \varphi \,{\rm d}x \ \Big|\ \varphi \in L^{\infty}(\Sigma_i),\ \|\varphi\|_{2} \leqslant 1 \Big\}.$
Note that, as can easily be seen by duality and density arguments, $\|v\|_2 < \infty$ if and only if $v \in L^2(\Sigma_i)$ , and $\|\cdot \|_2$ is lower semi-continuous w.r.t. weak L1 convergence since it is the pointwise supremum of a set of continuous functions [14]. The reason for using $L^{\,p}(\Sigma_i)$ as image space and extending the 2-norm, instead of requiring Ti to map continuously into L2, is that in typical applications we need to choose $p \leqslant \frac{d}{d-1}$ with $d \in \{2, 3, 4\}$ to obtain continuous embeddings; allowing Ti to map continuously into Lp instead of L2 is therefore less restrictive. On the other hand, the 2-norm is the natural choice for measurements corrupted by Gaussian noise.

For the particular setup of equation ($P_{{\rm NKL}}(\lambda, f)$ ), choosing the involved topologies appropriately, the general assumption (G) can be simplified as follows.

Assumption (NKL). [$\tau_X$ and $\tau_{Y_i}$ the weak topologies in the respective spaces, $\tau_{D_i}$ the trivial topologies]

  • 1.  
    R is lower semi-continuous w.r.t weak convergence in $L^{\,p}(\Omega){}^N$ .
  • 2.  
    $T_i\in \mathcal{L}(L^{\,p}(\Omega) , L^{\,p}(\Sigma_i))$ for $ \newcommand{\nr}{{{\rm nr}}} i \in L_\nr$ and $T_i \in \mathcal{L}(L^{\,p}(\Omega) , L^1(\Sigma_i, \mu_i))$ for $ \newcommand{\kl}{{{\rm kl}}} i \in L_\kl$ .
  • 3.  
    For each C  >  0 the set
    $\big\{u \in L^{\,p}(\Omega){}^N \,\big|\, R(u) + \sum\nolimits_{i \in L_{\rm nr}} \|T_i u_i\|_{p} + \sum\nolimits_{i \in L_{\rm kl}} \|T_i u_i\|_{1} \leqslant C \big\}$
    is sequentially compact w.r.t. weak convergence in $L^{\,p}(\Omega) ^N$ .

Proposition 3.10. If assumption (NKL) holds for the particular choices of regularization and data fidelity used in equation ($P_{{\rm NKL}}(\lambda, f)$ ), and if we choose $\tau_X$ to be the topology corresponding to weak Lp convergence, $\tau_{D_i}$ the trivial topologies and $\tau_{Y_i}$ the topologies corresponding to weak Lp and weak L1 convergence for $i \in L_{\rm nr}$ and $i\in L_{\rm kl}$ , respectively, then assumption (G) holds.

Proof. First note that, when considering assumption (G) we include the additional terms ci appearing in the Kullback–Leibler discrepancies in the forward operators Ti, that is, for $ \newcommand{\kl}{{{\rm kl}}} i \in L_\kl$ we consider the affine operators $\tilde{T}_i u = T_i u_i + c_i$ . Given the properties of $ \newcommand{\dkl}{D_{{\rm KL}}} \dkl$ and the 2-norm, the only thing that is then left to show is that the sequential compactness as in Point 3. above implies sequential compactness as in assumption (G), Point 5. To this aim take $\lambda \in (0, \infty) ^N$ and sequences $(u^n)_n$ , $(\,f^n)_n$ , and assume that $f^n \overset{D}{\rightarrow} f$ and $J_\lambda(u^n, f^n)<c$ . Obviously this implies that R(un) is bounded. For $ \newcommand{\nr}{{{\rm nr}}} i \in L_\nr$ , also, $\|T_i u^n_i - f^n_i\|_2$ is bounded and we can estimate

and consequently $\|T_i u_i\|_{p}$ is bounded. Hence, if we can show that $\|T_iu_i^n\|_1$ is bounded for $ \newcommand{\kl}{{{\rm kl}}} i \in L_{\kl}$ the assertion follows. Assume that (up to subsequences) $\|T_i u_i^n\|_1 \rightarrow \infty$ which implies that $\|T_i u_i^n + c_i\|_1 \rightarrow \infty$ . Using lemma 3.7 we get for large enough n

Now estimating all bounded quantities with constants, this implies that

for $C_1, C_2 >0$ , which contradicts $\|T_iu^n_i+ c_i\|_1 \rightarrow \infty$ . Hence the assertion follows. □

Now the continuity assumption $S_1(u, f^\dagger, I^c)$ that was required for stability translates to the following assumption on the data.

Assumption ${S^{{\rm kl}}_1(u, f^\dagger, I^c \cap L_{{\rm kl}})}$. For each $i \in I^c \cap L_{\rm kl}$ there exist constants $\beta_0, \beta_1 > 0$ such that $\beta_0 f_i^\dagger \geqslant T_i u_i + c_i \geqslant \beta_1 f_i^\dagger$ .
Noting that assumption $S_1(u, f^\dagger, I^c)$ was only used for u such that $D_i(T_iu, f_i^\dagger) < \infty$ , the following lemma allows us to relate the assumptions $S_1(u, f^\dagger, I^c)$ and $S^{{\rm kl}}_1(u, f^\dagger, I^c \cap L_{{\rm kl}})$ .

Lemma 3.11. Assume that (NKL) holds and that u is such that $\|T_i u_i - f_i^\dagger\|_2 < \infty$ for $i \in I^c \cap L_{\rm nr}$ . Then assumption $S^{{\rm kl}}_1(u, f^\dagger, I^c \cap L_{{\rm kl}})$ implies $S_1(u, f^\dagger, I^c)$ .

Proof. For indices $ \newcommand{\nr}{{{\rm nr}}} i \in I^c \cap L_\nr$ we note that $\|T_i u_i - f_i^\dagger\|_2 < \infty$ implies for all fi with $\|\,f_i - f_i^\dagger \|_2$ finite that $\|T_i u_i -f_i \|_2 \leqslant \|T_i u_i -f_i^\dagger \|_2 + \|f_i^\dagger -f_i \|_2< \infty$ . Hence the inverse triangle inequality can be employed to show continuity of $f_i \mapsto \|T_i u_i - f_i\|_2 $ w.r.t Di-convergence. For $ \newcommand{\kl}{{{\rm kl}}} i \in I^c \cap L_\kl$ , this is the assertion of proposition 3.9. □

Remark 3.12. Remember that assumption $S^{{\rm kl}}_1(u, f^\dagger, I^c \cap L_{{\rm kl}})$ will be applied with u being a solution to $P_{{\rm NKL}}(\lambda, f^\dagger)$ and hence it appears not too restrictive. In particular, it is not stronger than standard assumptions in the context of inverse problems with Poisson noise, even with a single data discrepancy [37, 39].

Using assumption $S^{{\rm kl}}_1(u, f^\dagger, I^c \cap L_{{\rm kl}})$ , a basic existence and stability result follows as a direct consequence of proposition 2.2 and corollary 2.7.

Proposition 3.13 (Stability). Assume that assumption (NKL) holds. Let $(\,f^n)_n$ in Y and $f^\dagger \in Y$ be such that $f^n \overset{{D}}{\rightarrow}f^\dagger$ as $n\rightarrow \infty$ and take $\lambda \in (0, \infty){}^ N$ . In case $J_\lambda(\cdot, f^n)$ is proper, there exists a solution to $P_{{\rm NKL}}(\lambda, f^n)$ . If we further assume that, for u0 a solution to ($P_{{\rm NKL}}(\lambda, f^\dagger)$ ), assumption $S^{\rm kl}_1(u_0, f^\dagger, \{1, \ldots, N\} \cap L_{\rm kl})$ holds, then every sequence $(u^n)_n$ of solutions to

Equation (45)

admits a $\tau_X$ -convergent subsequence again denoted by $(u^n)_n$ with limit $u^\dagger$ such that

and every limit of a $\tau_X$ -convergent subsequence of solutions is a solution to

Equation (46)

It is immediate that, under assumption (NKL) and even without the assumption $S^{{\rm kl}}_1(u, f^\dagger, I^c \cap L_{{\rm kl}})$ , the counterpart of the convergence result of corollary 2.8 holds for the particular setting of this section. In order to obtain rates, we need to translate assumption $S_2(U, f^ \dagger)$ appropriately as follows.

Assumption ${S^{\rm kl}_2(U, f^ \dagger)}$. For each $i \in L_{\rm kl}$ there exist constants $\beta_0, \beta_1 > 0$ such that $\beta_0 f_i^\dagger \geqslant T_i u_i + c_i \geqslant \beta_1 f_i^\dagger$ for all $u \in U$ .
Again we note that, when assumption $S_2(U, f^ \dagger)$ is applied in the particular setting of this section, we set $U = \{u^n \, |\, n \geqslant n_0 \} \cup \{u^\dagger \}$ with $(u^n)_n$ being a sequence of minimizers and $u^\dagger$ being a minimizer for data fn and $f^\dagger$ , respectively. Hence, as can easily be deduced from the triangle inequality, it will hold that $\|T_iu_i-f_i\|_2 + \|T_iu_i-f_i^\dagger\|_2< \infty$ for all $u \in U$ , all fi with $\|f_i-f_i^\dagger \|_2 < \infty$ and all $i \in L_{\rm nr}$ . Taking this into account, the relation of assumptions $S^{\rm kl}_2(U, f^ \dagger)$ and $S_2(U, f^ \dagger)$ is given as follows.

Lemma 3.14. Assume that assumption (NKL) holds and that U is such that, for each $u \in U$ , $\|T_i u_i - f_i\|_2 + \|T_i u_i - f_i^\dagger\|_2 < \infty$ for each fi with $\|f_i-f_i^\dagger\|_2 <\infty $ and each $i \in L_{\rm nr}$ . Then assumption $S^{\rm kl}_2(U, f^ \dagger)$ implies $S_2(U, f^ \dagger)$ with $\psi_i(x) = x^{1/2}$ for all $i \in \{1, \ldots, N\}$ .

Proof. Given that all involved quantities are finite, we can transfer the result of proposition 3.4 also to the extended 2-norm, which implies that $S_2(U, f^ \dagger)$ holds with $\psi_i (x) = x^{1/2}$ for $ \newcommand{\nr}{{{\rm nr}}} i \in L_\nr$ . For $ \newcommand{\kl}{{{\rm kl}}} i \in L_\kl$ , it is the assertion of proposition 3.9 that $S_2(U, f^ \dagger)$ holds with $\psi_i(x) = x^{1/2}$ . □

Using assumption $S^{\rm kl}_2(U, f^ \dagger)$ , we get the following convergence rates result for ($P_{{\rm NKL}}(\lambda, f)$ ).

Corollary 3.15. Let a sequence $(\,f^n)_n$ be such that $f^n \mathop{\rightarrow}\limits^{{D}} f^\dagger\in Y$ as $n\rightarrow \infty $ . Define $\delta^n_i = \|f_i^n - f^\dagger_i\|_2$ for $i \in L_{nr}$ and $ \newcommand{\dkl}{D_{{\rm KL}}} \delta^n_i = \dkl(\,f^\dagger_i, f^n_i)$ for $i \in L_{kl}$ . Assume that for each i,

with $(0, \infty) \ni \delta^n \rightarrow 0$ and $\mu_i \geqslant 1$ . Define $ \newcommand{\nr}{{{\rm nr}}} \newcommand{\kl}{{{\rm kl}}} \overline{\mu} = \min (\{\mu_i\, |\, i \in L_\nr \} \cup \{\mu_i/2 \, | \, i \in L_\kl \}) $ , set $ \newcommand{\e}{{\rm e}} \epsilon_i = \overline{\mu}/\mu_i$ for all i, let the parameter choice be such that

and denote $\lambda^\dagger = \lim\nolimits_{n \rightarrow \infty} \lambda ^n \in (0, \infty]^N$ . Let $(u^n)_n $ be a sequence of solutions to $P_{{\rm NKL}}(\lambda^n, f^n)$ and let $u^\dagger $ be an R-minimizing solution of $T_i u_i = f_i^\dagger$ , $i = 1, \ldots, N$ . Set $U_{n_0} = \{u^n \, |\, n \geqslant n_0 \} \cup \{u^\dagger \}$ and assume that $S_2^{\rm kl}(U_{n_0}, f^\dagger)$ holds for some n0. Then

Equation (47)

If we assume further that there exists

constants $0<\beta_1<1$ , $0<\beta_2$ such that

for all u satisfying $ \newcommand{\dkl}{D_{{\rm KL}}} \newcommand{\nr}{{{\rm nr}}} \newcommand{\kl}{{{\rm kl}}} \newcommand{\e}{{\rm e}} R(u)+ \sum\nolimits_{i \in L_\nr} \|T_i u_i - f^\dagger _i\|_2 ^2 + \sum\nolimits_{i \in L_\kl} \dkl(T_iu_i+c_i, f^\dagger_i) \leqslant R(u^\dagger)+ \epsilon$ for some $ \newcommand{\e}{{\rm e}} \epsilon > 0$ , then we obtain for any $j \in \{1, \ldots, N\}$ ,

Equation (48)

Equation (49)

Equation (50)

Proof. The estimate on R is an immediate consequence of theorem 2.11 and the parameter choice. In case the source condition holds, the theorem also implies that

Now, using Young's inequality, we can estimate, for $i \in L_{\rm nr}$ and a generic constant C  >  0,

Further, by $ \newcommand{\kl}{{{\rm kl}}} S^\kl_2(U_{n_0}, f^\dagger)$ we can estimate

Together, this implies that

Plugging in the parameter choice this implies the rates as claimed. □

We note that the rates for the data terms above are optimal in the sense that, even for regularization with only the corresponding single data term, the rate would be the same. Since the Bregman distance involves all components of the unknown, we naturally obtain the worst-case rate amongst all data terms.
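As a brief illustration of the parameter-choice interplay (a worked instance of the definitions in corollary 3.15, not an additional result): consider a setting as in the joint MR-PET experiments of section 5 with N  =  2, $L_{\rm nr} = \{1\}$ , $L_{\rm kl} = \{2\}$ and noise levels decaying at the same speed, i.e. $\mu_1 = \mu_2 = 1$ . Then $\overline{\mu} = \min\{\mu_1, \mu_2/2\} = 1/2$ and $\epsilon_1 = \epsilon_2 = 1/2$ , so the Kullback–Leibler channel determines the joint exponent; only if its noise level decays at least twice as fast as that of the norm channel, i.e. $\mu_2 \geqslant 2\mu_1$ , does the norm channel become the limiting one.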

4. Particular choices for coupled regularization

In this section, we show how the previous results on mixed L2 and Kullback–Leibler discrepancies can be applied to concrete regularization functionals. First we consider a rather simple example of joint wavelet-sparsity regularization, where the regularization term is coercive in L2. As a second application, we consider coupled second order total generalized variation (TGV) [9] regularization, where coercivity can only be established up to a finite-dimensional subspace, and show how our results can also be applied in this situation.

4.1. Joint wavelet sparsity

Wavelets are well-known and heavily used in image processing and beyond for their property of sparse approximation of data. This is exploited for example in image compression and denoising applications [30]. When dealing with the regularization of multi-spectral data where the individual components are correlated, a natural approach for coupled regularization is to enforce joint sparsity of wavelet coefficients, see for instance [27]. This is realized by the regularization functional

$R(u) = \|W_N u\|_{2, 1},$

where $W_Nu = (Wu_1, \ldots, Wu_N)$ with $W:L^2(\Omega) \rightarrow \ell^2$ a biorthogonal wavelet transform, Ω a bounded set and, for $z= (z_1, \ldots, z_N) = ((z_{i, 1})_i, \ldots, (z_{i, N})_i) \in (\ell^2){}^N$ we define

$\|z\|_{2, 1} = \sum\nolimits_{i} \Big(\sum\nolimits_{j=1}^N |z_{i, j}|^2\Big)^{1/2}.$
With the definitions of section 3.2.2, we then consider

Then, with $R(u) = \|W_Nu\|_{2, 1} $ , it is easy to see that R is lower semi-continuous w.r.t. weak L2 convergence and, setting p  =  2, that also the compactness requirement of assumption (NKL) is satisfied, since $c \|u\|_{L^2} \leqslant \|W_Nu\|_2 \leqslant \|W_Nu\|_{2, 1}$ for all $u \in L^2(\Omega){}^N$ and some c  >  0 (see [30]).

Hence if all the forward operators Ti are continuous, assumption (NKL) holds and all results of section 3.2.2 apply.
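For illustration, the following numpy sketch evaluates the joint sparsity norm $\|\cdot\|_{2, 1}$ on an array of stacked coefficients (an $\ell^2$ norm across the N channels followed by an $\ell^1$ sum over the coefficient index); array shapes and names are illustrative and not taken from the reference implementation [24].

```python
import numpy as np

def joint_l21_norm(z):
    """||z||_{2,1} for coefficients z of shape (num_coefficients, N):
    l2 norm across the N channels, then l1 sum over the coefficient index."""
    return np.sqrt((np.abs(z) ** 2).sum(axis=1)).sum()

# coefficients sharing the same support across channels are penalized less
# than coefficients of equal energy with disjoint supports
shared = np.zeros((100, 2)); shared[:5, :] = 1.0
disjoint = np.zeros((100, 2)); disjoint[:5, 0] = 1.0; disjoint[5:10, 1] = 1.0
print(joint_l21_norm(shared), joint_l21_norm(disjoint))   # ~7.07 versus 10.0
```

This is the mechanism by which the functional favours wavelet coefficients that are active in all channels simultaneously.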

4.2. Joint total generalized variation regularization

The Total Generalized Variation functional was introduced in 2010 [9] with the aim of overcoming the well-known staircasing effect of the Total Variation (TV) functional while still allowing for jump discontinuities in the reconstruction. It has been analyzed in [5, 8] in the context of regularization of linear inverse problems and has been used in diverse applications, including the joint reconstruction of magnetic resonance (MR) and positron emission tomography (PET) images [26]. The latter employs an extension of second order TGV for vector-valued data where the coupling of the individual channels is carried out via a nuclear norm and a Frobenius norm at the level of first and second order derivatives, respectively. In this section, we show how the general results of section 3.2.2 can be applied to an extended version of this setting. Numerical examples for concrete applications to multi-contrast MR and PET-MR regularization will then be considered in the next section.

For parameters $\alpha_0, \alpha_1 > 0$ and $ \newcommand{\BV}{{\rm BV}} u=(u_1, \ldots, u_N) \in \BV(\Omega){}^N$ with Ω a bounded Lipschitz domain we define the second order TGV functional as

$\mathrm{TGV}_\alpha^2(u) = \min\limits_{w \in \mathrm{BD}(\Omega, \mathbb{R}^{d})^N} \ \alpha_1 \|\mathrm{D}u - w\|_{\mathcal{M}} + \alpha_0 \|\mathcal{E}w\|_{\mathcal{M}}, \qquad\qquad (51)$

noting that by the results in [5] this is equivalent to the original definition of $ \newcommand{\TGV}{{\rm TGV}} \newcommand{\TGVat}{\TGV_\alpha ^2} \newcommand{\T}{\mathcal{T}} \TGVat$ as in [9]. Here, $ \newcommand{\Wrt}{\mathrm{D}} \Wrt u = (\Wrt u_1, \ldots, \Wrt u_N)$ and $ \newcommand{\symgrad}{\mathcal{E}} \newcommand{\sym}{{\rm Sym}} \symgrad w = (\symgrad w_1, \ldots, \symgrad w_N) $ denote the distributional derivative and symmetrized derivative of u and $w=(w_1, \ldots, w_N)$ , respectively, and $ \newcommand{\BD}{{\rm BD}} \newcommand{\R}{\mathbb{R}} \BD(\Omega, \R^{d}){}^N$ denotes the space of functions of bounded deformation, i.e. the space of all $ \newcommand{\R}{\mathbb{R}} w \in L^1(\Omega, \R^d){}^N$ such that $ \newcommand{\symgrad}{\mathcal{E}} \newcommand{\sym}{{\rm Sym}} \symgrad w$ can be represented by a finite Radon measure. For $ \newcommand{\M}{\mathcal{M}} z \in \M(\Omega, Z){}^N$ , the space of N-tuples of finite Radon measures with values in $ \newcommand{\R}{\mathbb{R}} Z \in \{\R^{d}, S^{d \times d}\} $ where $S^{d \times d}$ is the space of symmetric $d\times d$ matrices, we define

where $|\cdot |$ can be any pointwise norm on ZN. We note that in fact this pointwise norm defines the coupling of the different components of Du and $ \newcommand{\symgrad}{\mathcal{E}} \newcommand{\sym}{{\rm Sym}} \symgrad w$ , e.g. choosing $|\cdot|$ to be a Frobenius norm results in a Frobenius-norm-type coupling of $ \newcommand{\Wrt}{\mathrm{D}} \Wrt u$ and $ \newcommand{\symgrad}{\mathcal{E}} \newcommand{\sym}{{\rm Sym}} \symgrad w$ . In [26], $|\cdot|$ was chosen to be the spectral norm for $ \newcommand{\R}{\mathbb{R}} Z = \R^{d}$ , yielding a nuclear-norm coupling of the components of $ \newcommand{\Wrt}{\mathrm{D}} \Wrt u$ .
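To illustrate the effect of the pointwise coupling norm, the following sketch evaluates, at a single grid point, the Frobenius and nuclear norms of the d × N matrix formed by stacking the gradients of N channels (our own illustrative example; the pointwise SVD implementation actually used is part of [24]).

```python
import numpy as np

def frobenius_norm(A):
    """Frobenius coupling: square root of the sum of squares of all entries."""
    return np.sqrt((np.abs(A) ** 2).sum())

def nuclear_norm(A):
    """Nuclear-norm coupling: sum of the singular values of the stacked
    d x N gradient matrix, computed via a (pointwise) SVD."""
    return np.linalg.svd(A, compute_uv=False).sum()

# gradients of N = 2 channels at one point of a 2D image (d = 2):
parallel = np.array([[1.0, 2.0],     # columns are parallel, i.e. the two
                     [0.5, 1.0]])    # channels have aligned edges here
independent = np.eye(2)              # gradients pointing in different directions
print(frobenius_norm(parallel), nuclear_norm(parallel))        # 2.5, 2.5
print(frobenius_norm(independent), nuclear_norm(independent))  # ~1.41, 2.0
```

For a rank-one stack (aligned gradients) the two norms coincide, while for gradients pointing in different directions the nuclear norm is strictly larger; this is the sense in which the nuclear-norm coupling favours a shared edge set across the channels.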

With the definitions of section 3.2.2 and $p \in (1, d/(d-1)]$ , we now consider the minimization problem

Here, $\mathcal{I}_{{\rm KL}_+}(u)$ is the indicator function of the set ${\rm KL}_+:= \{u \in L^{\,p}(\Omega){}^ N \, |\, u_i \geqslant 0 {\rm ~a.e.~for~all~} i \in L_{\rm kl}\}$ , which constrains all ui for $i\in L_{\rm kl}$ to be non-negative, i.e.

$\mathcal{I}_{{\rm KL}_+}(u) = \begin{cases} 0 & \text{if } u \in {\rm KL}_+, \\ \infty & \text{else.} \end{cases}$
Our goal is to show that assumption (NKL) holds for this setting. However, since $\mathrm{TGV}_\alpha^2$ itself is coercive only up to a finite-dimensional subspace, we have to employ a slight modification of this setup to obtain coercivity (without imposing further assumptions on the Ti), which is equivalent in the sense that solutions to the modified problem are still solutions to the original problem. That is, with $\mathcal{P}^1(\Omega){}^N$ the set of $\mathbb{R}^N$-valued polynomials of order less or equal to one, we define the finite dimensional subspace

$Z := \{ u \in \mathcal{P}^1(\Omega){}^N \, |\, Tu = 0 \},$
where $T = (T_1, \ldots, T_N)$ . Further, we define

$Z^\perp := \Big\{ u \in L^{\,p}(\Omega){}^N \ \Big|\ \int_\Omega \big(u(x), z(x)\big) \,{\rm d}x = 0 {\rm ~for~all~} z \in Z \Big\},$
where $(\cdot, \cdot)$ denotes the standard inner product in $ \newcommand{\R}{\mathbb{R}} \R^N$ , and note that, as can be easily shown, $Z^\perp$ is a complement of Z in $L^{\,p}(\Omega)$ , i.e. $Z^\perp$ is closed, $Z\cap Z^\perp = \{0\}$ and $L^{\,p}(\Omega) = Z + Z^\perp$ .

We then consider the modified problem

which is equivalent to the original one in the sense that any solution of ($P_{{\rm NKL-TGV}}(\lambda, f)$ ) is a solution of the original one and, with $u = u_1 + u_2 \in Z^\perp + Z$ being a solution of the original problem, u1 is a solution of ($P_{{\rm NKL-TGV}}(\lambda, f)$ ).

For this setting, we get the following assertion.

Proposition 4.1. With the setting of section 3.2.2, let $p\in (1, d/(d-1)]$ and assume that $T_i\in \mathcal{L}(L^{\,p}(\Omega) , L^{\,p}(\Sigma_i))$ for $ \newcommand{\nr}{{{\rm nr}}} i \in L_\nr$ and $T_i \in \mathcal{L}(L^{\,p}(\Omega) , L^1(\Sigma_i, \mu_i))$ for $ \newcommand{\kl}{{{\rm kl}}} i \in L_\kl$ . Set $ \newcommand{\TGV}{{\rm TGV}} \newcommand{\TGVat}{\TGV_\alpha ^2} \newcommand{\I}{\mathcal{I}} \newcommand{\T}{\mathcal{T}} R = \TGVat +\I_{{\rm ~KL}_+} +\I_{Z^\perp}$ . Then assumption (NKL) holds for ($P_{{\rm NKL-TGV}}(\lambda, f)$ ) and all results of section 3.2.2 apply.

Proof. First note that $ \newcommand{\TGV}{{\rm TGV}} \newcommand{\TGVat}{\TGV_\alpha ^2} \newcommand{\T}{\mathcal{T}} \TGVat$ , $ \newcommand{\I}{\mathcal{I}} \I_{{\rm KL}_+}$ and $ \newcommand{\I}{\mathcal{I}} \I_{Z^\perp}$ are lower semi-continuous w.r.t weak Lp convergence (see for instance [6] for $ \newcommand{\TGV}{{\rm TGV}} \newcommand{\T}{\mathcal{T}} \TGV$ in the vector-valued case). Hence we are left to show that any sequence $(u^n)_n $ in $L^{\,p}(\Omega){}^N$ such that

admits an Lp-weakly convergent subsequence. To this aim, we first define the operator $P_{\mathcal{P}^1}:L^{\,p}(\Omega) \rightarrow L^{\,p}(\Omega) $ as

It is easy to see that $P_{\mathcal{P}^1}$ is well-defined and a continuous linear projection. Then, by results in [5, 6], there exists C  >  0 such that

Hence, in order to obtain existence of a weakly convergent subsequence in $L^{\,p}(\Omega){}^N$ , we are left to show boundedness of $P_{\mathcal{P}^1}(u^n)$ . For this, note that since $u^n \in Z^\perp$ , also $P_{\mathcal{P}^1}(u^n) \in Z^\perp$ for all n. Since T is injective on the finite dimensional space $\mathcal{P}^1(\Omega){}^N \cap Z^\perp$ , we get from the equivalence of norms on finite dimensional spaces and the boundedness of $u^n_i - P_{\mathcal{P}^1}(u^n)_i$ in Lp that

for some C  >  0 and the proof is complete. □

5. Numerical realization and examples

In this section we sketch how coupled second order TGV regularization with multiple data discrepancies can be realized numerically and provide some examples at the end. Source code that realizes the outlined algorithmic framework for any number of data terms in dimensions two and three is provided online [24].

With $\Omega_h$ being a discretized pixel grid in $\mathbb{R}^d$ and the discrete vector spaces $U:= \{u:\Omega_h \rightarrow \mathbb{R}^N\}$ and $V:= \{v:\Omega_h \rightarrow (\mathbb{R}^d){}^N\}$ , we consider the following minimization problem.

$\min\limits_{u \in U,\, v \in V} \ \sum\limits_{i \in L_{\rm nr}} \lambda_i \|T_i u_i - f_i\|_2^2 + \sum\limits_{i \in L_{\rm kl}} \lambda_i D_{{\rm KL}}(T_i u_i + c_i, f_i) + \alpha_1 \|\nabla u - v\|_1 + \alpha_0 \|\mathcal{E} v\|_1 + \mathcal{I}_{{\rm KL}_+}(u), \qquad\qquad (52)$

where we now simultaneously minimize over the unknown u and the balancing variable $v$ appearing in the definition of $\mathrm{TGV}_\alpha^2$ . We equip the discrete vector spaces with the standard inner products $(u, s):= \sum\nolimits_{x \in \Omega_h} \sum\nolimits_{i=1}^N u_i(x)s_i(x)$ for $u, s \in U$ and similarly for $V$ , and assume that all involved operators and functionals are now appropriately discretized (see for instance [4, 7] for a detailed explanation of a possible discretization of $\mathrm{TGV}_\alpha^2$ ). In particular, $\nabla :U \rightarrow V$ is a discrete gradient operator and $\mathcal{E}: V \rightarrow W:= \{w:\Omega_h \rightarrow (\mathbb{R}^{d \times d}){}^N\}$ a discrete symmetrized gradient operator, and the terms $\|z\|_1$ for $z \in V$ or $z \in W$ denote discrete L1 norms such that $\|z\|_1 = \sum\nolimits_{x \in \Omega_h} |z(x)|$ with $|\cdot|$ appropriate pointwise norms on $(\mathbb{R}^d){}^N$ or $(\mathbb{R}^{d \times d}){}^N$ that define the coupling of the individual components.
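As a minimal sketch of such a discretization, the following numpy code implements forward-difference versions of $\nabla$ and $\mathcal{E}$ for a single channel on a 2D grid; the multi-channel operators act channel-wise. The concrete stencils are our illustrative choice, see [4, 7] and the source code [24] for the discretization actually used.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient of a 2D image u; returns shape u.shape + (2,),
    with zeros at the boundary (a homogeneous Neumann-type choice)."""
    g = np.zeros(u.shape + (2,))
    g[:-1, :, 0] = u[1:, :] - u[:-1, :]   # difference in the first direction
    g[:, :-1, 1] = u[:, 1:] - u[:, :-1]   # difference in the second direction
    return g

def sym_grad(v):
    """Symmetrized gradient E(v) = (J + J^T) / 2 of a 2D vector field v of
    shape (n1, n2, 2); returns the pointwise 2 x 2 matrices, shape (n1, n2, 2, 2)."""
    jac = np.zeros(v.shape[:2] + (2, 2))
    for k in range(2):                    # row k of the Jacobian: gradient of v_k
        jac[..., k, :] = grad(v[..., k])
    return 0.5 * (jac + np.swapaxes(jac, -1, -2))

u = np.random.rand(8, 8)
v = grad(u)
print(v.shape, sym_grad(v).shape)         # (8, 8, 2) (8, 8, 2, 2)
```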

For simplicity, we assume that $T = (T_1, \ldots, T_N)$ does not vanish on affine functions, hence existence can be ensured without the additional constraint on $Z^\perp$ . Re-writing the discrete minimization problem (52) in the form

$\min\limits_{x} \ F(Kx) + G(x)$
with $Kx = K(u, v) = (\nabla u - v, \mathcal{E} v, Tu)$ , $G=\mathcal{I}_{{\rm KL}_+}$ and F chosen appropriately, in the discrete setting it follows from standard arguments from convex analysis ([14, theorem III.4.1 and proposition III.3.1]) that (52) is equivalent to a saddle-point problem of the form

$\min\limits_{x} \max\limits_{y} \ (Kx, y) + G(x) - F^*(y),$
where $F^*(y^*): = \sup\nolimits_y (y, y^*) - F(y)$ denotes the convex conjugate of F. More explicitly, this yields equivalence of (52) to the saddle-point problem

Equation (53)

where the variables $p, q, r$ are defined in the appropriate image spaces of the operators $\nabla$ , $\mathcal{E}$ and T, respectively, and $(\lambda_i D_{{\rm KL}}){}^*(r_i, f_i) := \sup\nolimits_{s} (r_i, s) - \lambda_i D_{{\rm KL}}(s, f_i)$ denotes the convex conjugate of the Kullback–Leibler term with respect to its first argument. Applying the general algorithm of [10] to this saddle-point reformulation, we arrive at algorithm 1 for obtaining a solution of (53).

Algorithm 1. Scheme of implementation for solving (53).

1: function coupled-reconstruction $(\,f_i)_{i}, (c_i)_i$
2:    Initialize $u, \overline{u}, v, \overline{v}, p, q, r, \sigma, \tau$
3:    Repeat
4:       $p \gets {\rm prox}_{\alpha_1} \big(\,p + \sigma (\nabla \overline{u} - \overline{v})\big)$
5:       $ \newcommand{\symgrad}{\mathcal{E}} \newcommand{\sym}{{\rm Sym}} q \gets {\rm prox}_{\alpha_0} (q + \sigma \symgrad \overline{v})$
6:       $ \newcommand{\nr}{{{\rm nr}}} r_i \gets {\rm prox} _{\|\cdot\|_2 ^2, \lambda_i} (r_i + \sigma T_i \overline{u}_i - \sigma f_i) \quad {\rm for}~i \in L_\nr $
7:       $ \newcommand{\dkl}{D_{{\rm KL}}} \newcommand{\kl}{{{\rm kl}}} r_i \gets {\rm prox} _{\dkl, \lambda_i} (r_i + \sigma T_i \overline{u}_i + \sigma c_i) \quad {\rm for}~i \in L_\kl $
8:       $ \newcommand{\dive}{{\rm div}} \newcommand{\bi}{\boldsymbol} u_+ \gets {\rm prox}_{{\rm KL}_+} \big(u - \tau (-\dive p + T^* r) \big)$
9:       $ \newcommand{\dive}{{\rm div}} v_+ \gets v - \tau (-p - \dive q) $
10:       $ (\overline{u}_+, \overline{v}_+) \gets 2 (u_+, v_+) - (u, v) $
11:       $ (\sigma_+, \tau_+) \gets \mathcal{S}(\sigma, \tau, u_+, v_+, u, v) $
12:       $ (u, \overline{u}) \gets(u_+, \overline{u}_+) $
13:       $ (v, \overline{v}) \gets (v_+, \overline{v}_+) $
14:       $ (\sigma, \tau) \gets (\sigma_+, \tau_+) $
15:    Until Stopping criterion fulfilled
16:    Return u
17: end function

Note that there, the operation ${\rm prox}$ corresponds to proximal mappings which are, for a proper, convex function $g\colon H \to (-\infty, \infty]$ on a Hilbert space H and $\alpha>0$ , defined as

${\rm prox}_{\alpha g}(\hat{x}) = \mathop{\rm argmin}\limits_{x \in H} \ \tfrac{1}{2}\|x - \hat{x}\|^2 + \alpha g(x).$
Here, at any point $x \in \Omega_h$ in the discrete spatial grid and for $i \in \{0, 1\}$ , the mapping ${\rm prox}_{\alpha_i}$ can be given explicitly as

${\rm prox}_{\alpha_i}(\hat{z})(x) = {\rm proj}_{\{z \, :\, |z| \leqslant \alpha_i \}}\big(\hat{z}(x)\big),$
that is, it reduces to a pointwise projection with respect to the pointwise norm. In case $|\cdot|$ is the Frobenius norm (i.e. the square root of the sum of squares of all entries of an element of $(\mathbb{R}^d){}^N$ or $(S^{d\times d}){}^N$ ), the projection is given as $ {\rm proj}_{\{z \, :\, |z| \leqslant \alpha_i \}}(\hat{z}) = \frac{\hat{z}}{\max\{1, |\hat{z}|/\alpha_i\}}$ . In the case that $|\cdot|$ is the nuclear norm on $(\mathbb{R}^d){}^N$ , the computation requires a pointwise SVD of $2\times2$ or $3\times3$ matrices (depending on the spatial dimension and the number of channels) at every point, and efficient code for this is provided in the source code [24]. The other proximal mappings can be given explicitly and again pointwise for any $x \in \Omega_h$ as

The operator $\mathcal{S}(\cdot)$ realizes an adaptive stepsize update and we refer to [26] for more details on the discretization for a particular case and to the source code [24] for more details on the general setup.
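As an illustration of the projection formula above, the following numpy sketch implements the pointwise Frobenius-ball projection used in the updates of p and q, together with the non-negativity projection realizing ${\rm prox}_{{\rm KL}_+}$ in line 8 of algorithm 1; function names, array layouts and the channel indexing are our own and may differ from the reference implementation [24].

```python
import numpy as np

def proj_frobenius_ball(z, alpha):
    """Pointwise projection onto {z : |z| <= alpha} for the Frobenius norm:
    z has shape (n1, n2, ...) with the vector/matrix entries in the trailing
    axes; each grid point is scaled by 1 / max(1, |z(x)| / alpha)."""
    axes = tuple(range(2, z.ndim))
    norm = np.sqrt((np.abs(z) ** 2).sum(axis=axes, keepdims=True))
    return z / np.maximum(1.0, norm / alpha)

def prox_kl_plus(u, kl_channels):
    """prox_{KL_+}: project the channels carrying Kullback-Leibler data terms
    onto the non-negative orthant, leave the remaining channels unchanged;
    u has shape (N, n1, n2)."""
    u = u.copy()
    u[kl_channels] = np.maximum(u[kl_channels], 0.0)
    return u

q = np.random.randn(8, 8, 2, 2)                     # a dual variable as in line 5
print(np.sqrt((proj_frobenius_ball(q, 0.1) ** 2).sum(axis=(2, 3))).max())  # <= 0.1

u = np.random.randn(2, 8, 8)                        # N = 2, channel 1 is PET-like
print(prox_kl_plus(u, [1]).min(axis=(1, 2)))        # second entry is >= 0
```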

5.1. Exemplary numerical results

In this section we provide exemplary numerical results that have been obtained with algorithm 1 described in the previous section and the publicly available source code [24].

In the first experiment, we test the reconstruction of a 2D in vivo coronal knee MRI exam from a clinical MR system. This is an interesting test case for our approach because part of the clinical protocol consists of two acquisition sequences that obtain data for the same orientation but with two different contrasts. In the first sequence, the spin signal from fat tissue is suppressed; in the second sequence, fat tissue is not suppressed. We employ the setting of (52) with N  =  2, $L_{\rm nr} = \{1, 2\}$ and $L_{\rm kl} = \emptyset$ .

A reduced number of k-space lines, at a rate of 4 below the Nyquist limit, was acquired to shorten the scan time. Figure 1 shows results from conventional CG-SENSE parallel imaging [34] and from the jointly TGV-regularized reconstruction. Since no ground truth is available, the parameters for both methods were chosen by visual inspection. As can be seen in the figure, the regularized reconstruction exhibits less noise and fewer artifacts, and anatomical details are better visible.


Figure 1. In vivo multi-contrast knee measurement with a data reduction factor of 4. Joint TGV reconstruction is compared to conjugate gradient SENSE.


In the next experiment, we consider the situation of combining data from two different imaging modalities, which is an interesting problem in the context of recent developments in hybrid PET-MRI systems [35]. This case is particularly instructive because the different imaging physics lead to different noise properties: MR data is corrupted by Gaussian noise, while PET data follows a Poisson process. We first performed a numerical simulation study using a human brain phantom [1], where we have a ground truth available to evaluate our results. The PET acquisition was simulated such that the total number of counts corresponded to a 10 min FDG PET head scan. Radial MR acquisitions with 16 projections were simulated for three commonly used contrasts in brain imaging (MPRAGE, FLAIR, T2 weighted). Acquisition with an 8-channel receive coil was simulated, and the MR k-space data were corrupted by adding complex Gaussian noise. The simulated matrix size was 176  ×  176  ×  30.

We reconstructed the combined dataset by solving (52) in a three-dimensional setting with N  =  4, $ \newcommand{\nr}{{{\rm nr}}} L_\nr = \{1, 2, 3\}$ and $ \newcommand{\kl}{{{\rm kl}}} L_\kl = \{4\}$ , and using nuclear norm coupling [26] of the four channels.

Results of the simulation are shown in figure 2. For reference, we also present results obtained with the current standard methods for reconstructing such data, namely expectation maximization [44] for PET and iterative conjugate gradient (CG) SENSE [34] for MR. Since PET is a quantitative imaging modality, it is important that the quantitative signal values are preserved when coupling the different contrasts and modalities. Table 1 presents the PET signal values (in Bq cm−3) of the ground truth, the conventional reconstruction and the joint reconstruction for gray matter, white matter and cerebrospinal fluid. Our results demonstrate that the coupling improves image quality both in terms of RMSE to the ground truth and in terms of the fidelity of the quantitative signal values.


Figure 2. Numerical simulation of multiple MR contrasts and PET. Joint TGV reconstructions show pronounced reduction of streaking artifacts and higher resolution for PET. This is also reflected in notable reductions of RMSE (displayed at bottom left of each image) to the ground truth.


Table 1. PET signal activity (Bq cm−3) for three different brain tissues.

                Gray matter   White matter   Cerebrospinal fluid
Ground truth       22 990          8450                 0
EM                 17 390         11 470              8097
Joint TGV          20 255         10 368              3193

Finally, we performed an experiment with in vivo PET-MR data from a clinical PET-MR system (figure 3). The PET scan duration was 10 min, and the MR exam was performed using an MPRAGE contrast at an acceleration factor of 4 below the Nyquist limit. The variational reconstruction was obtained by solving (52), again in a three-dimensional setting, with N  =  2, $L_{\rm nr} = \{1\}$ and $L_{\rm kl} = \{2\}$ . Reconstructions using EM and CG-SENSE are again shown for reference. While no ground truth is available, one can see that the joint reconstruction method generally reduces noise and yields a sharper reconstruction of the PET image.


Figure 3. In vivo PET-MRI measurement. Joint TGV reconstruction is compared to EM (PET) and conjugate gradient SENSE (MRI).


Acknowledgments

MH acknowledges support by the Austrian Science Fund (FWF) (Grants J 4112 and P 29192). He would also like to thank the Isaac Newton Institute for Mathematical Sciences for support and hospitality during the program 'Variational Methods and Effective Algorithms for Imaging and Vision', which was supported by EPSRC grant number EP/K032208/1, when part of the work on this paper was undertaken. RH acknowledges support by the Austrian Science Fund (FWF) (Grant P 29192). FK acknowledges support by the National Institutes of Health (NIH) under grant P41 EB017183.
