Optica Publishing Group

Deep learning based digital backpropagation demonstrating SNR gain at low complexity in a 1200 km transmission link

Open Access

Abstract

A deep learning (DL) based digital backpropagation (DBP) method with a 1 dB SNR gain over a conventional 1 step per span DBP is demonstrated in a 32 GBd 16QAM transmission across 1200 km. The new DL-DBP is shown to require 6 times less computational power than the conventional DBP scheme. The achievement is possible due to a novel training method in which the DL-DBP is blind to timing error, state-of-polarization rotation, frequency offset and phase offset. An analysis of the underlying mechanism is given. The applied method first undoes the dispersion, compensates for nonlinear effects in a distributed fashion and, by having a low-pass characteristic, reduces the out-of-band nonlinear modulation caused by the compensation of the nonlinearities. We also show that it is sufficient to update the elements of the DL network using a signal with high nonlinearity when the dispersion or nonlinearity conditions change. Lastly, simulation results indicate that the proposed scheme is suitable to deal with impairments from transmission over longer distances.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

Corrections

23 November 2020: A typographical correction was made to the author listing.

24 September 2020: A typographical correction was made to the author listing.

1. Introduction

Nonlinear compensation (NLC) techniques are likely to become an indispensable component of today’s long-haul, high-capacity coherent transmission systems. They mitigate the fiber Kerr nonlinearity, an effect that limits the maximum capacity [1–3]. An ideal NLC technique should effectively compensate the nonlinearity using the lowest computational effort possible.

Digital backpropagation (DBP) is, in terms of accuracy, among the best solutions to remove fiber nonlinearity [4–7]. It works by numerically approximating the solution of the nonlinear Schrödinger equation (NLSE) using the split-step Fourier method (SSFM). The performance of DBP improves with the number of steps per span (SpSs) used in the SSFM. In order to achieve good performance, multiple SpSs are necessary, which come at the cost of additional footprint and power consumption for the extra digital circuits needed to implement the Fourier transforms and inverse Fourier transforms. The number of SpSs can be reduced by using the filtered DBP (FDBP), where at each nonlinear step an additional filtering operation is applied to the intensity signal to limit the overcompensation of the nonlinearity [6,8,9]. Another option to mitigate fiber nonlinearities is to use Volterra series transfer functions. In this approach, nonlinearities are modeled and compensated by using a set of nth-order nonlinear transfer functions. It has been shown that inverting a 3rd-order Volterra transfer function has a performance comparable to high-complexity DBP [10]. The Volterra series can be implemented either in the time domain [3,11,12] or in the frequency domain [10,13]. However, Volterra series also require considerable computational power, and they suffer from a performance penalty when the computational complexity is reduced [14].

Recently, neural networks (NNs) have become a favored approach to solve various machine learning tasks [15–17]. One of the reasons is their universal approximation property [18]. This ability makes NNs an attractive solution for NLC because they can achieve good performance with lower complexity [19–25]. Furthermore, it has been shown that DBP has a similar mathematical structure as NNs in the time domain and that they can be placed at the very beginning of the receiver digital signal processing (DSP) chain [20]. This method is different from that of [22], where a NN is used in a perturbative compensation scheme. However, using a NN or deep learning based DBP (DL-DBP) as a direct DBP replacement and placing it at the front of the receiver DSP chain is not trivial. This is because the symbols, which are needed for the training of the NN, are only available after going through the conventional DSP at the receiver end. To overcome this hurdle, a new symbol-level training procedure must be developed.

In this paper we introduce a new training method for dispersion and nonlinear compensation in fiber networks via deep learning based digital backpropagation (DL-DBP). Using our new training method, the DL-DBP experimentally achieves a 2 dB SNR improvement over uncompensated transmission, a 1 dB SNR gain with respect to 1 SpSs DBP, and a 0.7 dB SNR gain with respect to 1 SpSs FDBP on a 32 GBd 16QAM signal transmitted across 1200 km of single mode fiber. This SNR gain is obtained using 6 times less computational power than that of a conventional DBP. The DL-DBP uses static hidden layers so that it can be trained at the symbol level. This way the algorithm is blind not only to the polarization rotation state, frequency offset and phase offset but also to the timing error. Insights into the operation mechanism are offered in Section 5. It is shown that the DL-DBP uses a low-pass filter in its learned hidden layers to reduce out-of-band modulation due to the compensation of the nonlinearities. Additionally, it not only applies phase rotations, but also influences the amplitude of the complex signal passing through it. Further, we show that the DL-DBP can be trained with a signal exhibiting a high nonlinearity. The trained network can then be applied to low-power signals. Training only needs to be redone, using a signal with high nonlinearity, when the dispersion or nonlinearity conditions change. Lastly, using simulation results, we show that the proposed method is suitable for compensating nonlinearity at longer distances.

The DL network approach in this paper is an extension of our previous work presented at the OFC conference 2020 in Ref. [24]. Here, we have introduced a new timing-recovery static layer so that the DL-DBP now operates blindly with respect to all relevant parameters. This training method is also a further improvement on the training methods used in [21,23], since they only include polarization state rotation and phase error in their training loops, so that their networks are only blind to these impairments. Further, in this paper, we use two different lasers instead of one, making our setup and training method closer to a real-world application than that of [25]. In addition, we discuss the operation mechanism of the DL-based nonlinear compensation scheme, which complements the independent results from the recently published paper [25]. This paper is organized as follows: we first introduce the mathematical concept of DL-DBP in Section 2. We then elaborate the new training method in Section 3. In Section 4, we explain our experimental setup, and we discuss the results of our proposed method in Section 5. Lastly, we give the conclusion in Section 6.

2. Deep learning based digital backpropagation

The DBP has a similar mathematical structure as a NN so that it can be realized and optimized using the NN training algorithm [20]. DBP undoes the fiber channel by approximating the solution of the nonlinear Schrödinger equation (NLSE) [4]. The DBP algorithm splits the fiber channel into multiple sections and applies a dispersion compensation filter and power dependent phase rotation to each section. Using ${ \star }$ as the convolution operator, we can write the DBP for M sections of fiber with a length $\mathrm{\Delta} = {L_{\textrm{fiber}}}/M$ as

$${{\textbf u}_{M + 1,p}} = {{\textbf h}_p}\; ({{\gamma_M},{{\textbf w}_{M,p}}{ \star }{{\textbf h}_p}({{\gamma_{M - 1}},{{\textbf w}_{M - 1,p}}{ \star } \ldots {{\textbf h}_p}({{\gamma_1},{{\textbf w}_{1,\textrm{p}}}{ \star }{{\textbf u}_{1,\textrm{p}}}} )} )} )$$
where ${{\textbf u}_{m,p}} \in {{{\mathbb C}}^{N \times 1}}$ is the signal vector of size N at the $m$th section of the fiber, and ${{\textbf w}_{m,p}} \in {{{\mathbb C}}^{K \times 1}}$ is the dispersion filter with $K$ taps for a fiber section of length $\mathrm{\Delta}$. p is $\textrm{X}$ for the signal in the x-polarization and $\textrm{Y}$ for the signal in the y-polarization. For a dual polarized signal, ${{\textbf h}_p} \in {{{\mathbb C}}^{N \times 1}},p \in \{{X,Y} \}$ is a vector function whose elements ${h_p} \in {{\mathbb C}}$ are
$${h_\textrm{X}}({\gamma ,x,y} )= x \exp \left( {\textrm{j}\frac{8}{9}\gamma ({{{|x |}^2} + {{|y |}^2}} )} \right),$$
$${h_\textrm{Y}}({\gamma ,x,y} )= y \exp \left( {\textrm{j}\frac{8}{9}\gamma ({{{|x |}^2} + {{|y |}^2}} )} \right),$$
where $x,y$ are the signals in the x- and y-polarizations respectively and $\gamma $ is the nonlinear Kerr coefficient of a fiber. The factor $\textrm{8/9}$ follows the Manakov version of DBP [5] where we average the nonlinearity contributions from each polarization, since the length scale of the nonlinearity is much larger than the length scale of random polarization rotations. The nonlinear coefficient is determined by the effective area of the fiber and the nonlinear refractive index of the fiber’s material [26].
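As a small illustration of the Manakov nonlinear operator of (2), consider the following NumPy sketch (the function name is ours and, for compactness, the effective section length is assumed to be absorbed into $\gamma$):

```python
import numpy as np

def manakov_nonlinear_step(ux, uy, gamma):
    """Apply the Manakov nonlinear operator of Eq. (2) to both polarizations.

    gamma is assumed to already include the effective section length.
    The 8/9 factor averages the nonlinear contributions of the two
    polarizations, as in the Manakov version of DBP.
    """
    # Common power-dependent phase, shared by the x- and y-polarizations
    phi = (8.0 / 9.0) * gamma * (np.abs(ux) ** 2 + np.abs(uy) ** 2)
    return ux * np.exp(1j * phi), uy * np.exp(1j * phi)
```

Note that for a real-valued $\gamma$ both polarizations receive the same phase rotation, so the operator leaves the instantaneous power untouched.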

In a NN, we alternate between applying a linear operation $(\mathrm{\Sigma} )$ and a nonlinear operation, the activation function $({g(x )} )$. This is similar to DBP (see Fig. 1) [20]. Using this similarity, the architecture of the NN is systematically realized by first choosing the number of hidden layers to be the same as the number of sections in the DBP. For a NN with M hidden layers, we set the values of the operator ${\mathrm{\Sigma}_{m,p}}$ with the help of the dispersion filter ${{\textbf w}_{m,p}}$; more precisely ${\mathrm{\Sigma}_{m,p}}({{{\textbf u}_{m,p}}} )= {\textbf w}_{m,p}^H{ \star }{{\textbf u}_{m,p}},\; \forall m = \{{1, \ldots ,M} \};p = \{{\textrm{X},\textrm{Y}} \}$. The linear operation can be done by arranging the weight vectors into a banded matrix. The activation function of the $m$th hidden layer ${g_{m,p}}(x )$ is defined by (2) with a factor $\gamma _{\textrm{NN}}^m$ instead of the Kerr coefficient $\gamma $. The factor $\gamma _{\textrm{NN}}^m$ regulates the phase rotation in each layer and is therefore denoted as the rotational strength throughout the text. Subsequently, the weight elements of the NN in the operator ${\mathrm{\Sigma}_{m,p}}$ and the $\gamma _{\textrm{NN}}^m$ in the activation function of each hidden layer are optimized using a NN training method. The experimental results later show that the optimized DL-DBP has an improved performance at lower complexity than a conventional DBP.
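One hidden layer then amounts to a complex-tap convolution followed by the power-dependent activation. The sketch below uses our own naming and folds the Hermitian-transposed filtering into the stored taps; as discussed later, a complex-valued $\gamma_{\textrm{NN}}^m$ additionally scales the amplitude:

```python
import numpy as np

def dldbp_hidden_layer(ux, uy, wx, wy, gamma_nn):
    """One DL-DBP hidden layer: linear operation (learned dispersion taps)
    followed by the nonlinear activation of Eq. (2) with gamma_nn."""
    # Linear part Sigma_{m,p}: convolution with the learned complex taps
    vx = np.convolve(ux, wx, mode="same")
    vy = np.convolve(uy, wy, mode="same")
    # Activation g_{m,p}: gamma_nn rotates the phase; a nonzero imaginary
    # part additionally attenuates or amplifies the signal
    phi = (8.0 / 9.0) * gamma_nn * (np.abs(vx) ** 2 + np.abs(vy) ** 2)
    return vx * np.exp(1j * phi), vy * np.exp(1j * phi)
```

Stacking M such calls, with per-layer taps and rotational strengths, reproduces the structure of Fig. 1.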


Fig. 1. Comparison between the structure of a digital backpropagation (DBP) and a neural network (NN). ${{\textbf w}_{m,p}}{ \star }$ is the dispersion filter operation at the $m$th section of the fiber. $\; p$ is $\textrm{X}$ for the signal in the x-polarization and Y for the signal in the y-polarization. ${{\textbf h}_p}$ is a vector function whose elements ${h_p}$ are defined by (2). ${\mathrm{\Sigma}_{m,p}}$ is the linear operation of a NN at hidden layer $m\; $and polarization p. $g(x )$ is the chosen activation function. Lastly, ${{\textbf u}_p} \in {{{\mathbb C}}^{N \times 1}}$ is the input vector for polarization p and ${\hat{d}_{\textrm{DBP},p}}$ and ${\hat{d}_{\textrm{NN},p}}$ are the output symbols for DBP and NN respectively.


The NN is trained using a backpropagation gradient descent algorithm [18]. This algorithm adjusts the parameters of the network in order to minimize a cost function. In this paper, the cost function is defined as the mean squared error (MSE) between the output symbol of the network ${\hat{d}_{\textrm{NN,p}}}$ and the known symbol ${d_p}$. The minimization is done by calculating the gradient of the cost function with respect to the parameters of the network, i.e. ${{\textbf w}_{1,p}}, \ldots ,{{\textbf w}_{M,p}}$ as well as $\gamma _{\textrm{NN}}^1,\; \gamma _{\textrm{NN}}^2, \ldots ,\; \gamma _{\textrm{NN}}^M$. Here, we consider the complex valued signal ${{\textbf u}_p}$ and use the complex-valued version of neural networks similar to what has been described in [27–30]. Utilizing a complex-valued NN complicates the training procedure, since the real-valued cost function (such as the MSE) becomes nonanalytic, i.e. it is not differentiable on the whole complex plane [31]. Furthermore, the chosen activation function of (2) is also nonanalytic in the complex plane for both x and y, but it is analytic with respect to $\gamma _{\textrm{NN}}^m$. To overcome this limitation, one can still calculate the real increments of the real and imaginary parts of the nonanalytic function (the real-valued cost function and the activation function as a function of x and $y$) when calculating the total change of the nonanalytic function. This requires one to separate the complex-valued function $f(z )\in {{\mathbb C}}$ into two real-valued functions $U,V \in {{\mathbb R}}$ (real part and imaginary part), calculate their derivatives with respect to both ${\Re }\{z \}$ and ${\Im }\{z \}$, and then convert the solutions back into the complex domain [31]. However, this becomes cumbersome as the network grows. Wirtinger calculus offers a framework to do this process compactly. It should be noted that in the framework of Wirtinger calculus, the function $f(z )\in {{\mathbb C}}$ is still nonanalytic with respect to $z \in {{\mathbb C}}$. However, it is assumed that both the real and imaginary parts of the function $f(z )$ are differentiable with respect to both ${\Re }\{z \}$ and ${\Im }\{z \}$. Therefore, we use Wirtinger calculus to implement the backpropagation gradient descent compactly for the NN with nonanalytic complex functions.

For the sake of clarity, we now outline the use of the Wirtinger calculus to find the differentials of the nonanalytic complex functions. As stated in the previous paragraph and following [31], a complex-valued function $f(z )$ can be written as a combination of real-valued functions

$$f(z )= U({a,b} )+ \textrm{j}V({a,b} )$$
where $z = a + b\textrm{j} \in {{\mathbb C}}$, with $a,b \in {{\mathbb R}}$, and $U,V \in {{\mathbb R}}$ are real-valued functions. $U,V$ are assumed to be differentiable with respect to $a,b$. The Wirtinger derivatives of a complex-valued function (3) are defined as the partial derivatives of the function f with respect to z and its complex conjugate ${z^\ast }$ [31]
$$\frac{{\partial f(z )}}{{\partial z}} \buildrel \Delta \over = \frac{1}{2}\left( {\frac{{\partial f}}{{\partial a}} - \textrm{j}\frac{{\partial f}}{{\partial b}}} \right),$$
$$\frac{{\partial f(z )}}{{\partial {z^{\ast }}}} \buildrel \Delta \over = \frac{1}{2}\left( {\frac{{\partial f}}{{\partial a}} + \textrm{j}\frac{{\partial f}}{{\partial b}}} \right). $$
Here, we have assumed that f is differentiable with respect to a and b. The derivatives (4) are implemented by treating $f(z )$ as if it was a multivariate function $f({z,{z^\ast }} )$ where z and ${z^\ast }$ are assumed to be independent variables [31]. Hence, we work with the total differential. The total differential $\textrm{d}f$ of a complex function $f(z )$ is defined as [30]
$$\textrm{d}f = \frac{{\partial f(z )}}{{\partial z}}\textrm{d}z + \frac{{\partial f(z )}}{{\partial {z^{\ast }}}}\textrm{d}{z^{\ast }}.$$

For a real valued function $f(z )$, such as MSE, (5) simplifies to

$$\textrm{d}f = 2{\Re }\left\{ {\frac{{\partial f(z )}}{{\partial z}}\textrm{d}z} \right\}. $$
Since $\partial {f^\ast }\; (z )/\partial {z^\ast } = {({\partial f(z )/\partial z} )^\ast }$ holds, the simplification in (6) can be shown by using (5),
$${\left( {\frac{{\partial f(z )}}{{\partial z}}\textrm{d}z} \right)^{\ast }} = \frac{{\partial {f^{\ast }}(z )}}{{\partial {z^{\ast }}}}\textrm{d}{z^{\ast }} = \frac{{\partial f(z )}}{{\partial {z^{\ast }}}}\textrm{d}{z^{\ast }}, $$
and $z + {z^\ast } = 2{\Re }\{z \}$. The last equality in (7) is only valid for a real-valued function $f(z )$. $\textrm{d}f$ in (6) is maximized if the result of the multiplication is a real number, implying that $\textrm{d}f$ is maximum if $\textrm{d}z \propto {({\partial f(z )/\partial z} )^\ast }$. Therefore, the complex parameters $({{{\textbf w}_m},\; \gamma_{\textrm{NN}}^m} )$ need to be updated in the direction of ${({\partial f(z )/\partial z} )^\ast }$ to follow the steepest gradient of the real valued cost function. Using (5), (6) and the chain rule of derivatives is sufficient to describe the backpropagation gradient descent of any complex-valued neural network [30].
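To make this update direction concrete, consider a toy example of our own (a single complex weight rather than the full network): minimizing the MSE $f(w)$ between $wu$ and known symbols $d$, the steepest-descent step follows ${({\partial f/\partial w} )^\ast }$, exactly as derived above.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)  # input samples
w_true = 0.7 - 0.3j
d = w_true * u                                                  # known symbols

w = 0.0 + 0.0j
lr = 0.05
for _ in range(300):
    e = w * u - d                      # per-sample error
    grad = np.mean(np.conj(e) * u)    # Wirtinger derivative df/dw of the MSE
    w -= lr * np.conj(grad)           # step along (df/dw)*, cf. Eq. (6)

# w converges toward w_true
```

This is the familiar complex LMS update; the DL-DBP training applies the same rule, via the chain rule, to every ${{\textbf w}_m}$ and $\gamma _{\textrm{NN}}^m$.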

Further, we optimize the gradient descent algorithm using a complex-number compatible ADAM optimizer [30] and use different learning rates for ${{\textbf w}_m}$ and $\gamma _{\textrm{NN}}^m$, since the gradients of $\gamma _{\textrm{NN}}^m\; \forall \; m = \{{1, \ldots ,M} \}$ have different magnitudes than those of the ${{\textbf w}_m}$. Lastly, it should be noted that all the parameters of DL-DBP (${{\textbf w}_m}$ and $\gamma _{\textrm{NN}}^m$) are complex numbers. Depending on the sign of the imaginary part of $\gamma _{\textrm{NN}}^m$, the activation function also acts as an attenuator/amplifier to the signal propagating through the layers.
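A complex-number compatible ADAM step can be sketched as follows (a minimal variant of our own in the spirit of [30]: the first moment tracks the conjugate gradient, i.e. the descent direction, while the second moment uses the squared magnitude so that it stays real-valued):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update for a complex parameter w, given the Wirtinger
    derivative grad = dJ/dw. Returns the new (w, m, v, t)."""
    t += 1
    m = beta1 * m + (1 - beta1) * np.conj(grad)       # first moment (complex)
    v = beta2 * v + (1 - beta2) * np.abs(grad) ** 2   # second moment (real)
    m_hat = m / (1 - beta1 ** t)                      # bias corrections
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v, t
```

The different learning rates mentioned above are obtained simply by calling the step with one `lr` for the taps ${{\textbf w}_m}$ and another for the rotational strengths $\gamma _{\textrm{NN}}^m$.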

3. Impairments insensitive parametric optimization

In this section, we discuss the proposed training method where we introduce several static hidden layers and separate the training phase of the network into two stages as shown in Fig. 2(a). The term static hidden layer refers to a hidden layer that is not updated during the training process. In the first stage, we estimate the parameters of the static hidden layers, i.e. the timing error, polarization state rotation and phase correction, on the training set. In the second stage, we train the weights and rotational strengths of our network using the complex-valued backpropagation and the estimated static hidden layers. Once the network is trained, we remove the static hidden layers and apply the trained networks to unseen datasets without the static hidden layers. During the inference stage, the static hidden layers are replaced by conventional DSP, see Fig. 2(b).


Fig. 2. (a) Training phase of DL-DBP. In the first stage, we use a timing recovery block, a polarization demultiplexer block, a carrier frequency recovery (CFR) block and a carrier phase recovery block (CPR) to estimate timing error, polarization state rotation, frequency offset and phase offset respectively. Those estimations are then fed as input to the static hidden layers. The training loop then takes those estimations into account during the second stage. (b) The inference setup for both DL-DBP and DBP. During the inference stage, we remove all static hidden layers and add least mean square (LMS) equalizer block in the digital signal processing chain.


As a direct replacement of DBP, we place the DL-DBP at the front of the receiver DSP chain and train the DL-DBP by minimizing the MSE between the output symbol of the DL-DBP (${d_{\textrm{out}}}[k ]$ in Fig. 2(a)) and the known, transmitted symbols. However, our network cannot be trained on the symbol level directly, since several DSP steps must be taken before we can compare any symbol to a known one. To circumvent this, we include several static hidden layers during the training loop to ensure that the network is trainable on the symbol level as shown in Fig. 2(a). These static hidden layers remove timing errors, demultiplex and downsample the dual polarization signals, and compensate for frequency and phase errors. The timing errors arise from the asynchronous clock sources in the transmitter and receiver, while the phase error arises mainly from the laser linewidth and the frequency offset between the transmitter and receiver lasers. As stated before, the static hidden layers are not updated during the training loop and their parameters are inferred from conventional receiver DSP, i.e. a “linear DSP scheme” performing chromatic dispersion compensation, timing recovery, polarization rotation compensation and phase estimation. The additional static hidden layers ensure that the network, i.e. layers 1 to M, compensates the fiber impairments, mainly chromatic dispersion and fiber nonlinearity, and is blind to the timing error, polarization state rotation and phase noise. For longer distances, we need to include the previously trained shorter-distance DL-DBP and chromatic dispersion compensation in the first training stage to ensure that the first stage adequately estimates the static hidden layers’ parameters. Without the previously trained DL-DBP, the accumulated nonlinearity at longer fiber lengths hinders the efficiency of the first stage in estimating the parameters of the static hidden layers.
However, for a transmission distance of 1200 km, chromatic dispersion compensation alone is sufficient.

In conventional DSP, the time shift operation can be implemented in the frequency domain by a phase shift or in the time domain by a finite impulse response (FIR) interpolating filter. Since such a filter is a linear operation, we integrate the static timing recovery layer as a static hidden layer with a linear activation function. For the static hidden layer that corrects for the timing error, we employ a set of interpolation filters, as the timing recovery is done on a block-by-block basis [32,33]. Similarly, polarization demultiplexing and phase rotation are also linear operations, which we incorporate into the training structure using additional static hidden layers with a linear activation function. The former uses a 2-by-2 butterfly FIR filter, whereas the latter has a fixed complex-number weight determined by the estimated phases. Note that the weight for the phase rotation layer changes from sample to sample.
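Since all three corrections are linear, the cascade of static hidden layers can be sketched as fixed linear operations (the naming and the single-timing-filter simplification are ours; in the experiment a set of block-wise interpolation filters is used and the phase weight changes from sample to sample):

```python
import numpy as np

def static_layers(ux, uy, h_t, hxx, hxy, hyx, hyy, phase):
    """Fixed (non-trainable) layers used only inside the training loop:
    timing-recovery FIR, 2-by-2 butterfly FIR, and phase rotation."""
    # 1) Timing recovery: interpolating FIR filter (linear activation)
    tx = np.convolve(ux, h_t, mode="same")
    ty = np.convolve(uy, h_t, mode="same")
    # 2) Polarization demultiplexing: 2-by-2 butterfly FIR filter
    px = np.convolve(tx, hxx, mode="same") + np.convolve(ty, hxy, mode="same")
    py = np.convolve(tx, hyx, mode="same") + np.convolve(ty, hyy, mode="same")
    # 3) Carrier frequency/phase correction: fixed per-sample phase weights
    return px * np.exp(-1j * phase), py * np.exp(-1j * phase)
```

All of these parameters are estimated once by the conventional DSP of the first stage and then frozen, so the gradient of the second training stage flows through them unchanged into layers 1 to M.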

4. Experimental setup

Figure 3 shows the experimental setup comprising a transmitter, a transmission setup, and a receiver. At the transmitter, a dual-polarization 32 GBd 16QAM signal with square-root raised-cosine pulse shaping and a roll-off factor of 0.1 was generated using a 64 GSa/s arbitrary waveform generator (AWG), a dual-polarization IQ modulator with a 3 dB bandwidth of 38 GHz, and a tunable laser source (TLS). The AWG produces a single randomly generated waveform of size $2^{18}$. The outputs of the AWG were then amplified by four parallel driver amplifiers (DAs) with a 3 dB bandwidth of 65 GHz. Prior to amplification, the output signals of the AWG were digitally delayed to compensate the skews due to different RF cable lengths and the internal skews of the dual-polarization IQ modulator. The transmission setup comprises a recirculating loop. To ensure power stability inside the recirculating loop, we loaded the rest of the C-band with an amplified spontaneous emission (ASE) noise source in a similar manner to [34]. The noise loading ensures that fluctuations on individual wavelengths are less impactful to the overall power level inside the loop. To minimize the crosstalk between the noise and the channel under test (CUT), we spectrally shaped the noise to have a 1.6 nm wide notch around 1550 nm, where the CUT is located. Furthermore, we adjusted the CUT-to-noise power ratio to −10 dB and monitored the quality of the signal using an optical spectrum analyzer (OSA). Effectively, this is a quasi-single channel.


Fig. 3. Experimental setup for quasi-single channel 32 GBd 16QAM. We used a 64 GSa/s arbitrary waveform generator (AWG), a dual-polarization IQ modulator and a tunable laser source (TLS) to create our single channel. Spectrally shaped amplified spontaneous emission (ASE) noise was loaded onto the rest of the C-band to ensure power stability of our recirculating loop. Minimal crosstalk was ensured by having a 1.6 nm wide notch around 1550 nm and controlling the power ratio between the channel under test (CUT) and the noise. The quality of the signal was monitored using an optical spectrum analyzer (OSA). A recirculating fiber loop consisting of three 100 km fiber spools, four erbium doped fiber amplifiers (EDFAs), two variable optical attenuators (VOAs) and a wave shaper was used to emulate 1200 km transmission. The power inside the loop was monitored using a 5 GHz photodiode and a mixed signal oscilloscope (MSO). A 90° optical hybrid, four 43 GHz balanced photodiodes, and an 80 GSa/s digital storage oscilloscope (DSO) were used to receive the signal. We then processed the digitized signal offline using MATLAB.


We used a recirculating loop consisting of three 100 km fiber spools, four erbium doped fiber amplifiers (EDFAs), two variable optical attenuators (VOAs) and a wave shaper to emulate a transmission distance of 1200 km. The wave shaper is used as a gain-equalizing filter. We monitored the power inside the loop using a 5 GHz photodiode and a mixed signal oscilloscope (MSO). The fiber spools have a dispersion parameter of 17 ps/(km nm), an attenuation ($\alpha$) of 0.183 dB/km and an effective area of 83.3 $\mathrm{\mu}{\textrm{m}^2}$, which gives a nonlinearity ($\gamma$) of 1.0703 ${\textrm{W}^{ - 1}}\textrm{k}{\textrm{m}^{ - 1}}$ according to [26]. Prior to launching the quasi-single channel into the recirculating loop, we amplified the signal to compensate for any loss from the optical switches. At the receiver, we used two band pass filters and an EDFA to recover the CUT and bring it to the optimal operational power of our 90° hybrid and four 43 GHz balanced photodiodes. The received electrical signals were then sampled using a 33 GHz bandwidth, 80 GSa/s digital storage oscilloscope (DSO) and processed offline using MATLAB.

For the DL-DBP offline processing, we first resampled the received signal to an oversampling of two and normalized the signal to the CUT power level. We used an oversampling of two because the DL-DBP, due to its training procedure, only operates at integer oversampling ratios. The chosen oversampling ratio also ensures that the time resolution is sufficient for the joint compensation of chromatic dispersion and Kerr nonlinearity in a post compensation scheme. We then employed ${N_{\textrm{hid}}} = 12$ hidden layers with the activation function ${g_{m,p}}(x )$ as defined in Section 2 and varied the number of FIR taps in the linear part of the hidden layers. The DL-DBP is then equivalent to a one step per span DBP (cf. Fig. 1). We initialized the weights of the hidden layers with the impulse response of a dispersion filter [35] and set the rotational strengths to zero. This initialization is chosen because the parameters of the subsequent static hidden layers were approximated under the assumption that the chromatic dispersion is already mitigated. Although we start with a well-known solution for chromatic dispersion, the DL-DBP, as will be shown in Section 5, still needs to learn the proper nonlinear parameters in terms of $\gamma _{\textrm{NN}}^m$ and the amplitude responses. A set of 195-tap FIR filters was deployed in the timing recovery static hidden layer, while a 21-tap 2-by-2 FIR filter was deployed in the polarization demultiplexing static hidden layer. We use 195 taps to ensure that the timing recovery does not impose any limitations on the training loop. The taps of the timing recovery filters were estimated using the modified Godard algorithm [32,33], while the taps of the polarization demultiplexing filter were estimated using the constant modulus algorithm (CMA) [36] and the radius-directed equalizer (RDE) [37].
In the phase correction static hidden layer, we employed the 4th-power algorithm [38] to estimate the frequency offset and blind phase search (BPS) [39] to estimate the laser phase noise. As depicted in Fig. 2(b), while inferring on an unseen dataset, we applied the same algorithms, i.e., modified Godard, RDE, CMA, 4th power, and BPS, to estimate the timing information, state of polarization, frequency offset and phase noise. In addition to these algorithms, we also utilized a least mean square (LMS) equalizer to remove any residual linear impairment. For the conventional DBP or FDBP offline processing, after resampling to an oversampling of two and normalizing the signal to the CUT power level, we applied either the Manakov DBP [5] or FDBP [6] followed by the same algorithms (modified Godard, RDE, CMA, 4th power, BPS, and LMS) as in DL-DBP. In the filtered DBP, we used a Gaussian filter [8,9], chosen to be a 1st-order filter. Finally, for the DL-DBP, we utilized a complex-number compatible ADAM optimizer and 1.5 million iterations to train the network. The training comprises a set of five different measurements, each consisting of 150’000 symbols. During the training phase, we run the DL-DBP twice over this measurement set, where in each run we randomly shuffle the order of the measurements. The shuffling was done to ensure that the DL-DBP does not learn the repetition. It should be noted that since we only shuffle the order of the measurement set, the symbol order within each measurement is not scrambled. During the inference stage, we use a different set of measurements. Further, for this work, we trained with one seed and verified the results with another one. In Fig. 4, it can be seen that both the smoothed mean square error and the SNR converge within 1.5 million iterations.
Following [9], the conventional DBP was optimized by sweeping $\alpha$ and $\gamma$ until the SNRs at the end of the DSP chain saturate, while the filtered DBP was optimized by sweeping $\alpha$, $\gamma$ and the bandwidth of the filter.
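The dispersion-filter initialization of the hidden-layer weights can be sketched as follows (our own frequency-domain construction for illustration only; the experiment uses the filter of Ref. [35], and the sign of the quadratic phase depends on the chosen convention):

```python
import numpy as np

def cd_init_taps(n_taps, beta2, dz, ts):
    """Truncated FIR approximation of the dispersion-compensating
    response exp(+j*beta2/2*omega^2*dz) for one fiber section of
    length dz, sampled with period ts."""
    n_fft = 4096
    omega = 2 * np.pi * np.fft.fftfreq(n_fft, d=ts)
    H = np.exp(0.5j * beta2 * omega ** 2 * dz)  # inverse of the fiber CD
    h = np.fft.fftshift(np.fft.ifft(H))         # centered impulse response
    mid = n_fft // 2
    k = n_taps // 2
    return h[mid - k: mid + k + 1]

# 93 taps per layer for one 100 km section, 2 samples/symbol at 32 GBd;
# beta2 ~ -21.7 ps^2/km corresponds to D = 17 ps/(km nm) at 1550 nm
w0 = cd_init_taps(93, beta2=-21.7e-27, dz=100e3, ts=1 / 64e9)
```

Starting every layer from these taps (with zero rotational strengths) means the network begins as a plain 1 SpSs chromatic dispersion compensator and only has to learn the nonlinear corrections on top.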


Fig. 4. (a) Smoothed average of the x- and y-polarization mean square error (MSE) as a function of the number of iterations for a DL-DBP optimized for 4 dBm of launch power. (b) SNR of the x- and y-polarization and the average of the SNRs as a function of the number of iterations for a DL-DBP optimized for 4 dBm of launch power. Both the MSE and the SNR were calculated using the training dataset.


5. Results and discussion

Figure 5(a) shows the average estimated SNR of the suggested DL-DBP, a 1 SpSs DBP, a 1 SpSs FDBP, and a linear compensation. The linear compensation uses only a chromatic dispersion filter. It can be seen that DL-DBP does not deliver any significant gain in the low launch power regime. However, in the high launch power regime, a 1.9 dB improvement over linear compensation was observed. Further, a 1 dB improvement compared to 1 SpSs DBP and a 0.7 dB improvement compared to 1 SpSs FDBP were observed. Figure 5(a) also shows a 2 dB shift in the optimal launch power, suggesting that a transmission distance extension is possible. We also show the constellation diagrams of the techniques at the very end of the DSP chain for one of the polarizations at a launch power of 5 dBm in Fig. 5(b). These results are based on the average of the estimated SNRs of both polarizations at the end of the receiver DSP as a function of the launch power of the CUT in each fiber span. The DL-DBP uses 93 taps per hidden layer and its training was repeated for each launch power. The Gaussian filter in FDBP has a 3 dB bandwidth of 10 GHz. The value was obtained by sweeping the parameters of FDBP ($\alpha$, $\gamma$ and the bandwidth of the filter) until the gain saturates. Both DBP and FDBP use frequency domain filtering with a large Fourier transform size ($>2^{19}$).


Fig. 5. (a) Average estimated SNR as a function of launch power in the channel under test (CUT) for four different techniques, namely the deep learning based digital backpropagation (DL-DBP), the digital backpropagation (DBP) with 1 step per span (SpSs), the filtered digital backpropagation (FDBP) with 1 SpSs, and a linear DSP scheme (Linear). (b) The constellation diagrams of the four different techniques at the very end of the DSP chain for one of the polarizations at a launch power of 5 dBm. (c) Received SNR gain using DL-DBP, DBP and FDBP with respect to a linear compensation scheme, plotted as a function of complexity. (d) Average estimated SNR as a function of launch power in the CUT and the number of hidden layers of DL-DBP.


Furthermore, we varied the number of hidden layers of the DL-DBP. In Fig. 5(d), we plot the SNR versus launch power for three numbers of hidden layers, i.e. ${N_{\textrm{hid}}} = 6,\; 12,\; 24$, with 169, 93, and 65 filter taps per hidden layer, respectively. These numbers were chosen as a compromise between the minimum number of taps for a given fiber section and the training time. The performance of the DL-DBP improves as the number of hidden layers increases. However, the relative improvement becomes smaller when going from 12 to 24 hidden layers. We attribute this to the fact that an NN with a large number of hidden layers is harder to train [18]. In general, the gain will eventually saturate as the number of hidden layers increases. Based on Fig. 5(d), we stay with 12 hidden layers for the rest of this work, since they already offer better performance than both 1 SpSs DBP and 1 SpSs FDBP. Additionally, it can be seen that the performance of a DL-DBP employing 6 hidden layers is comparable to that of 1 SpSs DBP.

A study of the inference complexity was done by calculating the number of real multiplications needed to process one symbol per polarization (mul/sym/pol). We only studied the inference complexity, since the training can be done offline prior to deployment. Following [4], the number of mul/sym/pol for DBP is $({4 \cdot K + 9} )\cdot {f_\textrm{s}}/R \cdot {N_{\textrm{sps}}}$, whereas for DL-DBP it is $({4 \cdot K + 12} )\cdot {f_\textrm{s}}/R \cdot {N_{\textrm{hid}}},$ where K is the dispersion filter length per span (number of FIR taps), ${N_{\textrm{sps}}}$ is the number of steps per span, ${N_{\textrm{hid}}}$ is the number of hidden layers and ${f_\textrm{s}}/R$ is the oversampling ratio. We use two different constants in the complexity calculation because the DL-DBP uses an additional multiplication, as $\gamma _{\textrm{NN}}^m$ is a complex number. The constant 9 for the DBP is the sum of 2 absolute values (4 real multiplications), 1 real multiplication with the system parameter, and 4 real multiplications with the input signal. In contrast, the constant 12 for the DL-DBP is the sum of 2 absolute values (4 real multiplications), 2 real multiplications with $\gamma _{\textrm{NN}}^m \in {{\mathbb C}}$, and 6 real multiplications with the input signal. We also assume that the exponential operation is implemented using a lookup table. For the complexity of the FDBP, we follow [4,8]. The complexity of the FDBP is given as

$$\left( {4 \cdot K + \frac{{{N_{\textrm{FFT}}}{{\log }_2}{N_{\textrm{FFT}}} + 10{N_{\textrm{FFT}}}}}{{{N_{\textrm{FFT}}} - {n_{\textrm{tap}}} + 1}}} \right) \cdot \frac{{{f_\textrm{s}}}}{R} \cdot {N_{\textrm{sps}}}$$
where ${N_{\textrm{FFT}}}$ is the size of the Fourier transform and ${n_{\textrm{tap}}}$ is the length of the Gaussian filter. ${n_{\textrm{tap}}} \approx 2\sqrt {2\ln 2} \cdot R/{B_{3\textrm{dB}}}$ is the minimum filter length [8], where R is the symbol rate and ${B_{3\textrm{dB}}}$ is the 3 dB bandwidth of the Gaussian filter. The complexity of the FDBP assumes that the Gaussian filtering operation uses the overlap-and-save method while the dispersion compensation uses convolution-based filtering in the time domain. Since the Gaussian filter uses overlap-and-save for every ${N_{\textrm{FFT}}}$ input samples, we count 9 real multiplications for the exponential function and one extra from the filtering operation. In this work, we assume ${N_{\textrm{FFT}}} = 4096$ so that there is enough frequency resolution to filter adequately. Figure 5(c) shows the gain with respect to the linear compensation as a function of complexity. We fixed the number of hidden layers inside the DL-DBP and varied K. Further, for different SpSs in both DBP and FDBP, we swept their parameters until the gain saturated. The gain was calculated at the optimum launch power, cf. Fig. 5(a). For the same number of multiplications, the DL-DBP delivers a higher gain than both a conventional DBP and a filtered DBP, since the DL-DBP can jointly optimize its parameters ${{\textbf w}_m},\;\gamma _{\textrm{NN}}^m\;\forall m = \{{1, \ldots ,M} \}$. This is crucial especially when complexity is at a premium. A conventional DBP needs 6 times the computational power to achieve the same gain as the DL-DBP. It should be noted that the time domain filtering in the inference stage of DL-DBP, DBP and FDBP can be efficiently implemented in the frequency domain using the overlap-and-save method, so that the number of real multiplications can be further reduced while maintaining the same SNR gain [40].
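For concreteness, the three complexity expressions can be evaluated numerically. The sketch below plugs in the parameter values quoted in this section; the oversampling ratio ${f_\textrm{s}}/R = 2$ is an assumption made for illustration, chosen because it reproduces the 9216 mul/sym/pol quoted later for the network of Fig. 6.

```python
import math

def dbp_mults(K, n_sps, os=2):
    """DBP: (4K + 9) * f_s/R * N_sps real multiplications per symbol per polarization."""
    return (4 * K + 9) * os * n_sps

def dl_dbp_mults(K, n_hid, os=2):
    """DL-DBP: (4K + 12) * f_s/R * N_hid; the 12 accounts for the complex gamma_NN."""
    return (4 * K + 12) * os * n_hid

def fdbp_mults(K, n_sps, R, B3dB, n_fft=4096, os=2):
    """FDBP per the equation above, with the minimum Gaussian filter length from [8]."""
    n_tap = 2 * math.sqrt(2 * math.log(2)) * R / B3dB   # ~7.5 taps for 32 GBd / 10 GHz
    fft_part = (n_fft * math.log2(n_fft) + 10 * n_fft) / (n_fft - n_tap + 1)
    return (4 * K + fft_part) * os * n_sps

print(dl_dbp_mults(K=93, n_hid=12))   # 9216, the complexity quoted for the Fig. 6 network
print(dbp_mults(K=93, n_sps=12))      # 9144 for a 12-span DBP with the same filter length
print(round(fdbp_mults(K=93, n_sps=12, R=32e9, B3dB=10e9)))
```

With these numbers, a 12-layer DL-DBP and a 12-span 1 SpSs DBP land at nearly identical multiplication counts, which is the regime in which Fig. 5(c) compares the schemes.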

In Fig. 6(a), we plot the x- and y-polarization frequency responses of the linear part at each layer of the DL-DBP. The network was optimized for 4 dBm launch power and a complexity of 9216 mul/sym/pol. There is amplitude ringing in the initial frequency response because we approximate the initial weights using a square window [41]. As backpropagation uses this filter multiple times, the amplitude distortion accumulates [42]. Furthermore, all the layers in the DL-DBP have a low pass filter characteristic. During the revision of this paper, a similar low pass characteristic was shown independently in [25]. The low pass filtering implies that the DL-DBP learns to suppress the out-of-band modulation caused by the compensation of fiber nonlinearity, as already pointed out in previous works [6,43]. From the phase response of the linear part at each layer, we can see that the quadratic phase relation with respect to frequency is still dominant and that one layer corrects the chromatic dispersion of one span. In addition to chromatic dispersion, the linear part at each layer also adds a group delay $(\tau )$ and a negligible constant phase shift. Table 1 summarizes the extracted parameters for both polarizations. From the combined frequency response in Fig. 6(b), the DL-DBP also attenuates the DC component of the signal. However, this is done in a distributed manner, since there are multiple layers that are either attenuating or amplifying the DC component. This behavior arises because the DL-DBP was trained on a signal with a flat spectrum around DC (as one would expect from an ideal transmitter). In a practical system, however, transmitters often deviate from the ideal operation point and DC components emerge, which are then corrected by the DL-DBP. This shows that the DL-DBP may compensate for other impairments in the system, such as device imperfections.
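To illustrate how parameters like those in Table 1 can be extracted, the sketch below fits a quadratic to a layer phase response: the curvature yields the dispersion parameter $D$ and the linear term the group delay $\tau$. The synthetic phase profile and the fitting procedure are illustrative assumptions, not the measured layer responses.

```python
import numpy as np

# Physical constants and illustrative span/target values (assumptions, not Table 1's data)
c = 299792458.0                  # speed of light [m/s]
lam = 1550e-9                    # carrier wavelength [m]
L = 100e3                        # 100 km, the normalization length used in Table 1
D_true = 17e-6                   # 17 ps/(nm km) expressed in SI units [s/m^2]
tau_true = 5e-12                 # 5 ps group delay

beta2 = -D_true * lam**2 / (2 * np.pi * c)             # group-velocity dispersion
omega = 2 * np.pi * np.linspace(-16e9, 16e9, 1001)     # baseband angular frequency
phase = 0.5 * beta2 * L * omega**2 + tau_true * omega  # model layer phase response

# Quadratic fit on a normalized axis for numerical conditioning;
# curvature -> beta2 * L, slope -> group delay tau
s = 1e11
a2, a1, _ = np.polyfit(omega / s, phase, 2)
beta2_est = 2 * a2 / s**2 / L
D_est = -beta2_est * 2 * np.pi * c / lam**2
print(D_est * 1e6, (a1 / s) * 1e12)   # recovers ~17 ps/(nm km) and ~5 ps
```

The same fit applied to each learned layer's unwrapped phase would yield one $(D, \tau)$ pair per layer and polarization, as tabulated in Table 1.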


Fig. 6. (a) Initial, x- and y-polarization frequency and phase responses of the linear part at each layer of the DL-DBP. The network was optimized for 4 dBm channel under test launch power and a complexity of 9216 mul/sym/pol. (b) The combined frequency and phase response of the 12-layer DL-DBP.



Table 1. Extracted dispersion parameters $({D} )$ in ps/(nm km) and group delays $({\tau } )$ in ps for both polarizations. The dispersion parameter is normalized to 100 km.

We list the obtained $\gamma _{\textrm{NN}}^m$ of the DL-DBP of Fig. 6 (a DL-DBP optimized for 4 dBm launch power and 9216 mul/sym/pol complexity) in Table 2. The DL-DBP applies a different rotational strength in each layer. From the table it can be seen that large phase corrections are applied in the first couple of layers and diminish towards the last layer. This is different from the conventional DBP, where the rotational strength is uniform across the spans. It can also be seen that the DL-DBP shapes the amplitude of the signal using the imaginary part of $\gamma _{\textrm{NN}}^m\; \; ({{\Im }\{{{\; }\gamma_{\textrm{NN}}^m} \}} )$. The sign of ${\Im }\{{{\; }\gamma_{\textrm{NN}}^m} \}$ determines whether ${\gamma _{\textrm{NN}}}$ increases or decreases the amplitude of the signal. This capability allows the DL-DBP to complement the attenuation or amplification coming from its linear part. In summary, the frequency responses in Fig. 6(a) and $\gamma _{\textrm{NN}}^m$ imply that the DL-DBP first corrects the dispersion using the phase response of its linear part, then corrects for nonlinearity by adjusting the signal amplitude using both the amplitude response of the linear part and $\gamma _{\textrm{NN}}^m$ prior to undoing the Kerr nonlinearity induced phase rotation. The low pass characteristic of the DL-DBP further improves the signal by reducing the residual out-of-band modulation. This method is different from that of the FDBP [6,8,9] or the perturbation-based backpropagation [43], which reduce the overcompensation by filtering the intensity signal used in the nonlinear compensation. Additionally, we also observe polarization-dependent group delay corrections, which implies that the DL-DBP alleviates the accumulated polarization mode dispersion.
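The amplitude-shaping role of ${\Im }\{\gamma_{\textrm{NN}}^m\}$ follows directly from the nonlinear activation: writing $\gamma = a + jb$, the factor $\exp(j\tfrac{8}{9}\gamma P)$ splits into a phase rotation $\exp(j\tfrac{8}{9}aP)$ and a real gain $\exp(-\tfrac{8}{9}bP)$. A minimal sketch, with illustrative (not learned) $\gamma$ values and an assumed sign convention:

```python
import numpy as np

def h_x(gamma, x, y):
    # Nonlinear activation for the x-polarization; gamma may be complex
    power = np.abs(x)**2 + np.abs(y)**2
    return x * np.exp(1j * (8 / 9) * gamma * power)

x, y = 1.0 + 0.0j, 0.5 + 0.5j               # sample field values, total power P = 1.5
attenuated = h_x(0.10 + 0.05j, x, y)        # Im{gamma} > 0: amplitude shrinks
amplified = h_x(0.10 - 0.05j, x, y)         # Im{gamma} < 0: amplitude grows
print(abs(attenuated) < abs(x) < abs(amplified))   # True
```

Only the sign of the imaginary part decides between attenuation and amplification; the real part contributes purely to the nonlinear phase rotation.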


Table 2. Learned ${\gamma }_{{\textrm{NN}}}^{m}$ of DL-DBP from Fig. 6

We show the generality of the DL-DBP by applying the $\gamma _{\textrm{NN}}^m$ obtained at one power level to another power level. Here, generality means that the DL-DBP learns the underlying mechanism of nonlinear fiber propagation and can be applied independently of the power level. We either multiply $\gamma _{\textrm{NN}}^m$ by a scale factor $\eta $ or apply $\gamma _{\textrm{NN}}^m$ directly without scaling. Following [22], the scaling $\eta $ is defined as

$$\eta = {10^{0.1 \cdot ({{P_{\textrm{ch}}} - {P_{\textrm{ref}}}} )}},$$
where ${P_{\textrm{ch}}}$ is the power in the channel in dBm and ${P_{\textrm{ref}}}$ is the reference power in dBm. Figures 7(a) and (b) show that a DL-DBP trained at one power level provides a performance similar to DL-DBPs trained at the other power levels. This implies that the DL-DBP learns the underlying mechanism of fiber nonlinearity, which allows us to train the DL-DBP on a signal with high nonlinearity without the need to retrain it for every power level. Furthermore, it can be seen that it is better to use $\gamma _{\textrm{NN}}^m$ directly without scaling. It should be noted that only $\gamma _{\textrm{NN}}^m$ is scaled; the learned weights are always used directly.
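The rescaling of Eq. (11) amounts to multiplying the learned nonlinear coefficients by the linear power ratio between the operating and training launch powers. A small sketch, with hypothetical placeholder coefficients rather than the learned values of Table 2:

```python
def eta(p_ch_dbm, p_ref_dbm):
    # Eq. (11): linear power ratio between operating and training launch power
    return 10 ** (0.1 * (p_ch_dbm - p_ref_dbm))

# Hypothetical learned coefficients, trained at P_ref = 4 dBm, reused at P_ch = 2 dBm
gamma_trained = [0.80 + 0.10j, 0.55 - 0.05j]
gamma_scaled = [eta(2, 4) * g for g in gamma_trained]
print(round(eta(2, 4), 3))   # 0.631
```

In the experiment the unscaled reuse performed better, so in practice one would skip the multiplication by $\eta$ altogether.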


Fig. 7. Average estimated SNR as a function of launch power in the channel under test (${P_{\textrm{ch}}}$) for DL-DBP using (a) scaled $\gamma _{\textrm{NN}}^m$ and (b) unscaled $\gamma _{\textrm{NN}}^m$. It should be noted that only $\gamma _{\textrm{NN}}^m$ is scaled; the learned weights are used directly.


Finally, Fig. 8 indicates that the DL-DBP is suitable for compensating nonlinearity over longer distances. It also shows that the DL-DBP increases the transmission distance by 900 km with respect to linear compensation, while an increase of 300 km over 1 SpSs DBP is possible. The plots in Fig. 8 are based on simulated transmission using the same parameters as in [24]. For longer distances, we use more hidden layers inside the DL-DBP. For all distances, the number of hidden layers was chosen such that the DL-DBP is equivalent to 1 SpSs DBP, with 97 taps per hidden layer. However, it should be noted that as the distance grows, the DL-DBP needs more hidden layers and training becomes less efficient. One reason is that the gradient of the cost function must propagate deeper into the network; as it does, its absolute value can diminish, so the parameters in the deeper layers change only little [18].
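The vanishing-gradient effect can be made concrete with a toy calculation: if each layer's Jacobian attenuates the backpropagated gradient by a constant factor below one (0.8 here, an arbitrary illustrative value), the update seen by the earliest layers shrinks geometrically with depth [18].

```python
# Toy model of gradient magnitude decay during backpropagation through a deep DL-DBP;
# the per-layer attenuation factor 0.8 is an arbitrary illustrative choice
grad = 1.0
for _ in range(24):        # backpropagating through 24 layers
    grad *= 0.8
print(round(grad, 4))      # 0.0047: the deepest layers receive only tiny updates
```

This is why the training of networks matched to longer links (more hidden layers) converges less efficiently.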


Fig. 8. Average of the optimum x- and y-polarization SNR as a function of transmission distance for four different techniques, namely the deep learning based digital backpropagation (DL-DBP), the digital backpropagation (DBP) with 1 step per span (SpSs), the filtered digital backpropagation (FDBP) with 1 SpSs, and a linear DSP scheme (Linear). The SNRs were calculated from simulated transmissions using the parameters from [24]. The DL-DBP is suitable for compensating nonlinearity at longer distances.


6. Conclusion

A DL-DBP was applied to 1200 km transmission data. This is possible thanks to a novel training method that allows the DL-DBP to compensate fiber nonlinearity in the presence of timing error, polarization state rotation and phase noise. We have demonstrated a 1.9 dB gain with respect to conventional linear compensation, a 1 dB gain with respect to 1 SpSs DBP and a 0.7 dB improvement with respect to 1 SpSs FDBP. Furthermore, a complexity study showed that our DL-DBP requires much less computational power to achieve a 1.9 dB gain; a conventional DBP needs 6 times the computational power to achieve the same gain. We discussed the effect of the DL-DBP on the signal passing through it. It was shown that the DL-DBP undoes the dispersion of one span per layer, compensates the Kerr nonlinearity in a distributed fashion and reduces the residual out-of-band modulation. Another advantage of the DL-DBP is that it may also be capable of compensating other impairments, such as device imperfections. We also showed that the proposed method generalizes well and can be retrained on a signal with high nonlinearity when the dispersion or nonlinearity conditions change. Lastly, applying the DL-DBP to a simulated transmission indicates that it also operates well over longer distances.

Funding

European Research Council (670478, PLASILOR).

Acknowledgement

This work was supported in part by Sterlite Technology Limited and in part by Oclaro (now Lumentum).

The authors would like to thank Ueli Koch and George Christidis for helpful insights in preparing this manuscript. The authors would also like to thank the reviewers for their constructive feedback. Special thanks to Alyssa for proofreading this work.

Disclosures

The authors declare no conflicts of interest.

References

1. R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity Limits of Optical Fiber Networks,” J. Lightwave Technol. 28(4), 662–701 (2010). [CrossRef]  

2. A. D. Ellis, Z. Jian, and D. Cotter, “Approaching the Non-Linear Shannon Limit,” J. Lightwave Technol. 28(4), 423–433 (2010). [CrossRef]  

3. A. Ghazisaeidi, I. F. d. J. Ruiz, L. Schmalen, P. Tran, C. Simonneau, E. Awwad, B. Uscumlic, P. Brindel, and G. Charlet, “Submarine Transmission Systems Using Digital Nonlinear Compensation and Adaptive Rate Forward Error Correction,” J. Lightwave Technol. 34(8), 1886–1895 (2016). [CrossRef]  

4. E. Ip and J. M. Kahn, “Compensation of Dispersion and Nonlinear Impairments Using Digital Backpropagation,” J. Lightwave Technol. 26(20), 3416–3425 (2008). [CrossRef]  

5. E. Ip, “Nonlinear Compensation Using Backpropagation for Polarization-Multiplexed Transmission,” J. Lightwave Technol. 28(6), 939–951 (2010). [CrossRef]  

6. L. B. Du and A. J. Lowery, “Improved single channel backpropagation for intra-channel fiber nonlinearity compensation in long-haul optical communication systems,” Opt. Express 18(16), 17075–17088 (2010). [CrossRef]  

7. N. V. Irukulapati, H. Wymeersch, P. Johannisson, and E. Agrell, “Stochastic Digital Backpropagation,” IEEE Trans. Commun. Technol. 62(11), 3956–3968 (2014). [CrossRef]  

8. Y. Gao, J. H. Ke, K. P. Zhong, J. C. Cartledge, and S. S. Yam, “Assessment of Intrachannel Nonlinear Compensation for 112 Gb/s Dual-Polarization 16QAM Systems,” J. Lightwave Technol. 30(24), 3902–3910 (2012). [CrossRef]  

9. I. F. d. J. Ruiz, A. Ghazisaeidi, and G. Charlet, “Optimization rules and performance analysis of filtered digital backpropagation,” in 2015 European Conference on Optical Communication (ECOC), (IEEE, 2015), pp. 1–3.

10. F. P. Guiomar, J. D. Reis, A. L. Teixeira, and A. N. Pinto, “Digital Postcompensation Using Volterra Series Transfer Function,” IEEE Photonics Technol. Lett. 23(19), 1412–1414 (2011). [CrossRef]  

11. Z. Tao, L. Dou, W. Yan, L. Li, T. Hoshida, and J. C. Rasmussen, “Multiplier-Free Intrachannel Nonlinearity Compensating Algorithm Operating at Symbol Rate,” J. Lightwave Technol. 29(17), 2570–2576 (2011). [CrossRef]  

12. A. Ghazisaeidi and R. Essiambre, “Calculation of coefficients of perturbative nonlinear pre-compensation for Nyquist pulses,” in 2014 The European Conference on Optical Communication (ECOC), (IEEE, 2014), pp. 1–3.

13. F. P. Guiomar, J. D. Reis, A. L. Teixeira, and A. N. Pinto, “Mitigation of intra-channel nonlinearities using a frequency-domain Volterra series equalizer,” Opt. Express 20(2), 1360–1369 (2012). [CrossRef]  

14. L. Liu, L. Li, Y. Huang, K. Cui, Q. Xiong, F. N. Hauske, C. Xie, and Y. Cai, “Intrachannel Nonlinearity Compensation by Inverse Volterra Series Transfer Function,” J. Lightwave Technol. 30(3), 310–316 (2012). [CrossRef]  

15. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

16. Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “DeepFace: Closing the Gap to Human-Level Performance in Face Verification,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2014), pp. 1701–1708.

17. I. Sutskever, J. Martens, and G. Hinton, “Generating text with recurrent neural networks,” in Proceedings of the 28th International Conference on International Conference on Machine Learning, (Omnipress, 2011), pp. 1017–1024.

18. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, (MIT Press, 2016).

19. V. Kamalov, L. Jovanovski, V. Vusirikala, S. Zhang, F. Yaman, K. Nakamura, T. Inoue, E. Mateo, and Y. Inada, “Evolution from 8QAM live traffic to PS 64-QAM with Neural-Network Based Nonlinearity Compensation on 11000 km Open Subsea Cable,” in OFC, OSA Technical Digest (online) (Optical Society of America, 2018), pp. Th4D.5.

20. C. Häger and H. D. Pfister, “Nonlinear Interference Mitigation via Deep Neural Networks,” in Optical Fiber Communication Conference, OSA Technical Digest (online) (Optical Society of America, 2018), pp. W3A.4.

21. E. Sillekens, W. Yi, D. Semrau, A. Ottino, B. Karanov, S. Zhou, K. Law, J. Chen, D. Lavery, L. Galdino, P. Bayvel, and R. I. Killey, “Experimental Demonstration of Learned Time-Domain Digital Back-Propagation,” arXiv e-prints, arXiv:1912.12197 (2019).

22. S. Zhang, F. Yaman, K. Nakamura, T. Inoue, V. Kamalov, L. Jovanovski, V. Vusirikala, E. Mateo, Y. Inada, and T. Wang, “Field and lab experimental demonstration of nonlinear impairment compensation using neural networks,” Nat. Commun. 10(1), 3033–3040 (2019). [CrossRef]  

23. V. Oliari, S. Goossens, C. Häger, G. Liga, R. M. Bütler, M. v. d. Hout, S. v. d. Heide, H. D. Pfister, C. Okonkwo, and A. Alvarado, “Revisiting Efficient Multi-Step Nonlinearity Compensation With Machine Learning: An Experimental Demonstration,” J. Lightwave Technol. 38(12), 3114–3124 (2020). [CrossRef]  

24. B. I. Bitachon, A. Ghazisaeidi, B. Baeuerle, M. Eppenberger, and J. Leuthold, “Deep Learning Based Digital Back Propagation with Polarization State Rotation & Phase Noise Invariance,” in Optical Fiber Communication Conference (OFC) 2020, OSA Technical Digest (Optical Society of America, 2020), pp. M1G.2.

25. Q. Fan, G. Zhou, T. Gui, C. Lu, and A. P. T. Lau, “Advancing theoretical understanding and practical performance of signal processing for nonlinear optical communications through machine learning,” Nat. Commun. 11(1), 3694–3704 (2020). [CrossRef]  

26. G. P. Agrawal, Nonlinear Fiber Optics, (Academic Press, 2013).

27. H. Leung and S. Haykin, “The complex backpropagation algorithm,” IEEE Trans. Signal Process. 39(9), 2101–2104 (1991). [CrossRef]  

28. G. M. Georgiou and C. Koutsougeras, “Complex domain backpropagation,” IEEE Trans. Circuits Syst. 39(5), 330–334 (1992). [CrossRef]  

29. T. Kim and T. Adali, “Fully Complex Multi-Layer Perceptron Network for Nonlinear Signal Processing,” J. Signal Process. Syst. 32(1/2), 29–43 (2002). [CrossRef]  

30. A. Sarroff, “Complex Neural Networks for Audio,” Doctoral thesis (Dartmouth College, 2018).

31. T. Adali, P. J. Schreier, and L. L. Scharf, “Complex-Valued Signal Processing: The Proper Way to Deal With Impropriety,” IEEE Trans. Signal Process. 59(11), 5101–5125 (2011). [CrossRef]  

32. A. Josten, B. Baeuerle, B. I. Bitachon, G. Stalder, D. Hillerkuss, and J. Leuthold, “400G Probabilistic Shaped PDM-64QAM Synchronization in the Frequency Domain,” IEEE Photonics Technol. Lett. 31(9), 697–700 (2019). [CrossRef]  

33. A. Josten, B. Baeuerle, E. Dornbierer, J. Boesser, D. Hillerkuss, and J. Leuthold, “Modified Godard Timing Recovery for Non Integer Oversampling Receivers,” Appl. Sci. 7(7), 655–668 (2017). [CrossRef]  

34. D. J. Elson, G. Saavedra, K. Shi, D. Semrau, L. Galdino, R. Killey, B. C. Thomsen, and P. Bayvel, “Investigation of bandwidth loading in optical fibre transmission using amplified spontaneous emission noise,” Opt. Express 25(16), 19529–19537 (2017). [CrossRef]  

35. S. J. Savory, “Digital filters for coherent optical receivers,” Opt. Express 16(2), 804–817 (2008). [CrossRef]  

36. S. J. Savory, “Digital Coherent Optical Receivers: Algorithms and Subsystems,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1164–1179 (2010). [CrossRef]  

37. M. J. Ready and R. P. Gooch, “Blind equalization based on radius directed adaptation,” in International Conference on Acoustics, Speech, and Signal Processing, (IEEE, 1990), pp. 1699–1702.

38. M. Selmi, Y. Jaouen, and P. Ciblat, “Accurate digital frequency offset estimator for coherent PolMux QAM transmission systems,” in 2009 35th European Conference on Optical Communication, (IEEE, 2009), pp. 1–2.

39. T. Pfau, S. Hoffmann, and R. Noe, “Hardware-Efficient Coherent Digital Receiver Concept With Feedforward Carrier Recovery for M-QAM Constellations,” J. Lightwave Technol. 27(8), 989–999 (2009). [CrossRef]  

40. S. W. Smith, The scientist and engineer's guide to digital signal processing, (California Technical Publishing, 1997).

41. A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 3rd ed., (Pearson Education, 2013).

42. G. Goldfarb and G. Li, “Efficient backward-propagation using wavelet-based filtering for fiber backward-propagation,” Opt. Express 17(11), 8815–8821 (2009). [CrossRef]  

43. W. Yan, Z. Tao, L. Dou, L. Li, S. Oda, T. Tanimura, T. Hoshida, and J. C. Rasmussen, “Low Complexity Digital Perturbation Back-propagation,” in 37th European Conference and Exposition on Optical Communications, OSA Technical Digest (CD) (Optical Society of America, 2011), pp. Tu.3.A.2.



