# Design of Low-Power DSP-Free Coherent Receivers for Data Center Links

Jose Krause Perin, Anujit Shastri, and Joseph M. Kahn

Abstract—Coherent detection offers high spectral efficiency and receiver sensitivity, but digital signal processing (DSP)-based coherent receivers may be prohibitively power hungry for data centers, even when optimized for short-reach applications, where fiber propagation impairments are less severe. We propose and evaluate low-power DSP-free homodyne coherent receiver architectures for dual-polarization quadrature phase shift keying (DP-QPSK) for inter- and intra-data center links. We propose a novel optical polarization demultiplexing technique, for DP-QPSK and higher-order modulation formats, with three cascaded phase shifters driven by marker tone detection circuitry. We consider carrier recovery based on either optical or electrical phase-locked loops (PLLs). We propose a novel multiplier-free phase detector based on XOR gates, which exhibits less than 0.5 dB power penalty relative to a conventional Costas loop phase detector. We also study the relative performance of homodyne DP-differential QPSK, for which carrier phase recovery is unnecessary. Our proposed DSP-free architectures exhibit  $\sim 1$  dB power penalty at small chromatic dispersion compared to their DSP-based counterparts. We estimate conservatively that the high-speed analog electronics of an electrical PLL-based coherent receiver consume nearly 4 W for 200 Gbit/s DP-QPSK, assuming a 90-nm complementary metal-oxide semiconductor process.

*Index Terms*—Carrier phase recovery, coherent detection, data centers, DQPSK, DSP-free, electrical phase-locked loop, low power, optical phase-locked loop, polarization controller, polarization demultiplexing, QPSK.

## I. INTRODUCTION

**R**ESEARCH on spectrally efficient modulation formats compatible with intensity modulation and direct detection (IM-DD), e.g., [1], [2], has led to the adoption of four-level pulse amplitude modulation (4-PAM) by the IEEE P802.3bs task force to enable 50 and 100 Gbit/s-per-wavelength data center interconnects [3]. However, scaling the bit rate beyond 100 Gbit/s is challenging, as these IM-DD systems exploit only one degree of freedom of the optical fiber channel: optical intensity. Moreover, these systems already face tight implementation constraints. Indeed, 50 Gbit/s 4-PAM links for inter-data center applications, i.e., amplified links near 1550 nm with reach up to ~80 km, require optical signal-to-noise ratio (OSNR) greater than 29 dB to operate below the 7% forward error correction (FEC) threshold [4]. 100 Gbit/s 4-PAM links for intra-data center applications, i.e., unamplified links near 1310 nm with reach up to ~10 km, have limited power margin, as their receiver sensitivity is only -7 dBm [1]. The receiver sensitivity may be improved by 4.5 dB using avalanche photodiodes or by 6 dB using semiconductor optical amplifiers [5].

The authors are with the Edward L. Ginzton Laboratory, Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: jkperin@stanford.edu; anujit@stanford.edu; jmk@ee.stanford.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JLT.2017.2752079

Next-generation data center links will likely demand innovative low-power solutions that scale to bit rates beyond 100 Gbit/s while accommodating increased optical losses due to the fiber plant, wavelength demultiplexing, and possibly optical switches. Recently proposed techniques based on Stokes vector detection [6], [7] and single-sideband discrete multitone (SSB-DMT) [8] are spectrally efficient, but they rely on power-hungry analog-to-digital converters (ADCs) and digital signal processing (DSP), and they do not address the high required OSNR in inter-data center links or the limited power margin in intra-data center links. Coherent detection is more scalable, as it exploits all four degrees of freedom of the single-mode fiber (SMF), namely two quadratures in two polarizations, and improves sensitivity by up to 20 dB by mixing a weak signal with a strong local oscillator (LO) [9].

Coherent detection based on high-speed DSP is a mature technology in long-haul systems, but it may be unsuitable for data center links. In long-haul systems, the high cost and power consumption of high-speed DSP are amortized, as a 3-dB sensitivity improvement, for instance, may double the reach and nearly halve the number of required repeaters. Data center applications, however, have other design priorities, such as cost, power consumption, and port density, and they face fewer propagation impairments, as polarization mode dispersion (PMD) and nonlinearities are negligible. These fundamental differences may favor low-power architectures based on analog signal processing that avoid high-speed ADCs and DSP altogether. DSP-based coherent receivers optimized for short-reach applications [10], [11] will inevitably require high-speed ADCs and DSP for basic operations such as polarization demultiplexing, carrier recovery (CR), and timing recovery, which, combined, consume roughly 17 W in 40-nm complementary metal-oxide semiconductor (CMOS) for a 100 Gbit/s dual-polarization (DP) quadrature phase-shift keying (QPSK) receiver [12].

In this paper, we propose and evaluate homodyne DSP-free coherent receiver architectures for DP-QPSK. We propose polarization demultiplexing based on optical phase shifters that are controlled by low-frequency marker tone detection circuitry. CR is based on either an optical or an electrical phase-locked loop (PLL). We propose a novel multiplier-free phase detector based on exclusive-OR (XOR) gates. We also study the relative performance of homodyne DP-differential QPSK (DP-DQPSK), whereby information is encoded in phase transitions, hence avoiding CR circuitry.

0733-8724 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Manuscript received March 17, 2017; revised June 14, 2017 and August 23, 2017; accepted September 10, 2017. Date of publication September 12, 2017; date of current version October 12, 2017. This work was supported in part by Maxim Integrated, in part by Google, in part by the National Science Foundation under Award ECCS-1740291, and in part by the CAPES Fellowship Proc. 13318/13-6. (*Corresponding author: Jose Krause Perin.*)

The estimated power consumption of the high-speed analog electronics of our most power-hungry architecture is nearly 4 W for 200 Gbit/s DP-QPSK, assuming 90-nm CMOS. Moreover, near zero chromatic dispersion (CD), the proposed DSP-free systems exhibit  $\sim 1$  dB power penalty compared to their DSP-based counterparts. The DSP-based receiver used as a benchmark employs a newly proposed 2  $\times$  2 multiple-input multiple-output (MIMO) equalizer based on a small differential group delay (DGD) approximation, which halves the number of required real operations.

The remainder of this paper is organized as follows. In Section II, we present the proposed architecture for a DP-QPSK receiver based on analog signal processing and describe polarization demultiplexing, CR, and a startup protocol. In Section III, we present a homodyne DP-DQPSK receiver architecture that does not require CR. Section IV compares the performance of these different analog receivers to a simplified DSP-based receiver. Section V concludes the paper.

## II. HOMODYNE DP-QPSK RECEIVER

Fig. 1 shows the overall block diagram of the DP-QPSK coherent receiver based on analog signal processing. In this implementation, a polarization controller is driven by a low-speed microcontroller that performs marker tone detection, as discussed in Section II-A.

After balanced photodetection, transimpedance amplifiers (TIAs) with automatic gain control (AGC), and low-pass filtering (LPF) to reduce noise, the signals reach the high-speed analog electronics stage, where CR, timing recovery and detection are performed. Timing recovery and detection may be realized using conventional clock and data recovery (CDR) techniques [13]; thus, we do not discuss them further herein. Polarization recovery and CR are performed using only analog waveforms and do not depend on timing information. The high-speed analog electronics stage is detailed in Fig. 2 for CR based on optical PLL (OPLL) and electrical PLL (EPLL).

In an OPLL (Fig. 2(a)), the LO laser is frequency-modulated by the frequency correction signal generated by the CR stage. Hence, an OPLL requires an LO laser with a wideband frequency modulation (FM) response and a short propagation delay in the LO path to minimize the overall loop delay. Minimizing the loop delay is one of the main challenges in OPLL design, since the loop includes the LO laser, 90° hybrid, photodiodes, and all the subsequent electronics in CR, which may not be realized within the same chip. Notably, Park et al. have demonstrated loop delays of only 120 ps for a highly integrated 40 Gbit/s binary PSK (BPSK) coherent receiver [14].


Fig. 1. Block diagram of DP-QPSK receiver based on analog signal processing. The polarization controller is composed of optical phase shifters detailed in Fig. 3. The block diagram corresponding to carrier recovery, timing recovery and detection is detailed in Fig. 2 for carrier recovery based on EPLL and OPLL. This diagram is also used for DP-DQPSK, but the polarization controller and carrier recovery blocks are replaced by those shown in Fig. 10.

An EPLL (Fig. 2(b)) implementation eliminates the requirements on LO laser FM response and on propagation delay at the cost of more complex analog electronics. Specifically, an EPLL requires a single-sideband mixer in each polarization to de-rotate the incoming signals (see Fig. 2(b)), since the transmitter and LO lasers are not phase locked. Additionally, the frequency offset between the transmitter and LO lasers must always be within the lock-in and hold-in ranges of the EPLL, which are typically limited by the tuning range of the voltage-controlled oscillator (VCO) [15]. The VCO tuning range can be on the order of several GHz (e.g., 11.8 GHz for a ring oscillator VCO [16]). The constraint on frequency offset can be satisfied by strict laser temperature control, whose cost and power consumption could be shared among several channels by using frequency combs [17] for both the transmitter and LO. Alternatively, a frequency error estimation stage (Fig. 2(b)), based on relatively simple frequency discriminator circuitry [18], may be used to keep the LO laser frequency sufficiently close to that of the transmitter laser.
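As a sketch of what the de-rotation stage computes, assume the quadrature VCO supplies $\cos\hat{\theta}$ and $\sin\hat{\theta}$, where $\hat{\theta}$ is the loop's current phase estimate (a symbol introduced here for illustration). The single-sideband mixer then forms the complex product $(X_I + jX_Q)e^{-j\hat{\theta}}$ with four real multiplications and two additions. The Python function below (`ssb_derotate` is a hypothetical name) reproduces only that arithmetic on one sample; it is not a circuit model.

```python
import math


def ssb_derotate(i_in: float, q_in: float, theta_hat: float) -> tuple[float, float]:
    """De-rotate the sample i_in + j*q_in by -theta_hat.

    Assumes the QVCO provides cos(theta_hat) and sin(theta_hat); the four
    products and two sums below are what a single-sideband mixer computes.
    """
    c, s = math.cos(theta_hat), math.sin(theta_hat)
    # (I + jQ)(cos - j*sin) = (I*cos + Q*sin) + j(Q*cos - I*sin)
    i_out = i_in * c + q_in * s
    q_out = q_in * c - i_in * s
    return i_out, q_out


# QPSK symbol at pi/4 rotated by a 0.3-rad residual carrier phase (illustrative value)
offset = 0.3
rx_i = math.cos(math.pi / 4 + offset)
rx_q = math.sin(math.pi / 4 + offset)
print(ssb_derotate(rx_i, rx_q, theta_hat=offset))  # approximately (0.7071, 0.7071)
```

When $\hat{\theta}$ tracks the actual carrier phase, the output returns to the nominal QPSK point, which is why only one QVCO and loop filter are needed while each polarization still needs its own mixer.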

We restrict our analysis to the feedback CR techniques OPLL and EPLL, which are governed by the same underlying theory, as described in Section II-B. Feedforward CR (FFCR) has been widely used in DSP-based coherent receivers [19], and it is also feasible in analog signal processing [20]. However, analog FFCR has several implementation drawbacks. First, phase estimation in analog FFCR is limited to non-data-aided (NDA) methods, e.g., raising the signal to the Mth power (for M-PSK), which perform worse than decision-directed methods [21] and restrict modulation to PSK. Second, compared to feedback techniques, FFCR requires more complex analog circuitry to implement the Mth-power operation and frequency division. Furthermore, analog FFCR would offer virtually no improvement over EPLL, since commercial distributed feedback (DFB) lasers already have narrow linewidths on the order of 300 kHz [22], and the loop delay in an EPLL is very small, as the loop can be realized within a single chip.
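To make the Mth-power NDA method concrete, the sketch below applies it to QPSK (M = 4) in discrete time: raising each sample to the fourth power removes the modulation, and the phase of the averaged result divided by 4 estimates the carrier phase, with a π/2 ambiguity. It is written in Python purely for illustration; an analog FFCR would have to realize the fourth-power and divide-by-four operations with nonlinear circuits and frequency division, which is the complexity noted above. The block length, noise level, and phase offset are arbitrary assumptions.

```python
import cmath
import math
import random


def mth_power_phase_estimate(samples: list[complex], m: int = 4) -> float:
    """Non-data-aided carrier phase estimate for M-PSK, ambiguous modulo 2*pi/m.

    Assumes constellation points at odd multiples of pi/m (e.g., QPSK at
    +/-pi/4, +/-3pi/4), so that each noiseless symbol raised to the m-th
    power equals -exp(j*m*phi). Negating the sum removes that -1.
    """
    acc = sum(s**m for s in samples)
    return cmath.phase(-acc) / m


rng = random.Random(1)
true_phase = 0.2  # rad, illustrative
rx = []
for _ in range(1000):
    k = rng.randrange(4)
    symbol = cmath.exp(1j * (math.pi / 4 + k * math.pi / 2 + true_phase))
    noise = complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1))
    rx.append(symbol + noise)
print(f"estimated phase: {mth_power_phase_estimate(rx):.3f} rad")
```

The estimator never uses decisions, which is what makes it non-data-aided, but it also only works because every QPSK symbol maps to the same point after the fourth power; this is the restriction to PSK mentioned above.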


Fig. 2. Block diagram of carrier recovery based on (a) OPLL and (b) EPLL (shown for one polarization only). Phase estimates in the two polarizations may be optionally combined in the adder depicted in dashed lines in both diagrams. In an OPLL implementation, the delay of the frequency correction signal must be as short as possible, which means that the LO laser must be physically close to that output. Note that an EPLL implementation requires a de-rotation stage in each polarization, since the transmitter and LO lasers are not phase locked. However, only one quadrature VCO (QVCO) and loop filter are necessary. The EPLL implementation may also require a frequency error estimator if the laser frequency drift exceeds the VCO frequency range.

#### A. Polarization Demultiplexing

In DSP-based coherent receivers, a  $2 \times 2$  MIMO equalizer performs polarization demultiplexing and compensates for PMD and polarization-dependent loss (PDL) [23].

Fortunately, PMD effects are negligible up to 80 km at 56 Gbaud with modern standard SMF [24]. Since PDL causes only small power penalties at these distances, polarization rotation is the only impairment that needs to be compensated. Polarization rotation in fiber varies on a time scale on the order of milliseconds [25], and more slowly over shorter links [26], so it can be compensated at the receiver by an optical polarization controller driven by low-speed (< 100 kHz) circuitry.

Polarization demultiplexing using optical polarization control has been implemented using fiber squeezers [27], [28] or interferometers with variable phase shifters [29]. Both solutions rely on the same underlying principle of cascading birefringent elements to transform the incoming state of polarization. The latter allows for integration and is the method chosen for our proposed analog coherent receiver. Fig. 1 shows the polarization controller, which is detailed in Fig. 3(a). A polarization beam splitter (PBS) and polarization beam rotator (PBR) first separate the two incoming, rotated polarizations and rotate one, such that both are aligned to integrated waveguides supporting a single polarization [30]. Once the incoming signal's rotated polarizations have been separated into two waveguides, they are cascaded through three phase shifters and two 50/50 couplers. By controlling the relative phase shift through each phase shifter, the polarizations are demultiplexed into the signals transmitted on the X and Y polarizations at the two output ports of the polarization controller, at which point they are guided to the  $90^{\circ}$  hybrid.

Fig. 3. Block diagram of the polarization recovery method. A marker tone is placed only on the in-phase component of the X polarization at the transmitter. Propagation through the fiber results in polarization rotation, while the polarization controller compensates for this rotation using three phase shifters and two 50/50 couplers (fixed phase shifts are not shown). Three phase shifting sections are used in (a), while the architecture in (b) uses only two phase shifting sections, corresponding to systems where CR is done independently in each polarization.

Polarization demultiplexing based on minimization of the radio frequency (RF) power spectral density (PSD), as previously proposed for DP-DQPSK [31] and discussed in Section III, requires independent CR in each polarization, which is not feasible for OPLL-based receivers, and adds significant complexity to EPLL-based receivers. Our proposed algorithm is based on marker tone detection and can be used for QPSK, higher-order quadrature amplitude modulation (QAM), and intensity-modulated (IM) signals. As illustrated in Fig. 3, the DP-QPSK transmitter has a low-frequency (< 50 kHz) marker tone added to the XI tributary. The phase shifters are adjusted to minimize the marker tone's presence in the XQ, YI, and YQ tributaries at the receiver, so that the polarization rotation through the fiber is compensated completely.

To show how the phase shifters are adjusted to demultiplex the incoming, rotated polarizations, we begin with the arbitrary, unitary matrix for polarization transformation due to fiber propagation in the absence of PDL and PMD, written as [32]:

$$T_{Fiber} = \begin{bmatrix} e^{j\alpha_1} & 0\\ 0 & e^{-j\alpha_1} \end{bmatrix} \begin{bmatrix} \cos(\zeta) & -j\sin(\zeta)\\ -j\sin(\zeta) & \cos(\zeta) \end{bmatrix} \begin{bmatrix} e^{j\alpha_0} & 0\\ 0 & e^{-j\alpha_0} \end{bmatrix},$$
(1)

where the variables  $\alpha_0$ ,  $\zeta$ , and  $\alpha_1$  represent random, timevarying rotation variables of the unitary matrix corresponding to the rotation undergone by propagation through the fiber, as shown in Fig. 3. Unless otherwise specified, throughout the text we omit (t) from time-varying variables to simplify notation. To compensate for this rotation, a similar matrix can be obtained by setting up a sequence of three phase shifters separated by two couplers, as shown in Fig. 3. The matrix corresponding to this sequence is

$$T_{Controller} = \begin{bmatrix} e^{j\varphi_1} & 0\\ 0 & e^{-j\varphi_1} \end{bmatrix} \begin{bmatrix} \cos(\theta) & -j\sin(\theta)\\ -j\sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} e^{j\varphi_0} & 0\\ 0 & e^{-j\varphi_0} \end{bmatrix},$$
(2)

where the variables  $\varphi_1$ ,  $\theta$ , and  $\varphi_0$  correspond to the differential phase shifts of the three phase shifters, as shown in Fig. 3. Choosing the controller matrix to mirror the fiber matrix simplifies the analysis. The output electric fields after the polarization controller are

$$\begin{bmatrix} E_{o,x} \\ E_{o,y} \end{bmatrix} = T_{Controller} T_{Fiber} \begin{bmatrix} E_{i,x} \\ E_{i,y} \end{bmatrix},$$
(3)

where  $E_{o,x}$  and  $E_{o,y}$  are the output electric fields in the X and Y polarizations, and  $E_{i,x}$  and  $E_{i,y}$  are the input electric fields in the X and Y polarizations. When the polarization controller is close to compensating the fiber-induced polarization rotation, the following approximations hold:

$$\begin{aligned}
\varphi_0 &\approx -\alpha_1 \\
\varphi_1 &\approx -\alpha_0 \\
\theta &\approx -\zeta.
\end{aligned}$$
(4)

Equality holds when the polarization rotation has been perfectly compensated. Using (3) and the small-angle approximations that follow from (4), we can express the output field components in terms of the input field component that carries the marker tone:

$$\begin{aligned}
\operatorname{Im}\{E_{o,x}\} &= \left[(\varphi_1 + \alpha_0) + \cos(2\theta)(\varphi_0 + \alpha_1)\right] \operatorname{Re}\{E_{i,x}\} \\
\operatorname{Re}\{E_{o,y}\} &= \left[-\sin(2\varphi_1)(\theta + \zeta) + \sin(2\theta)\cos(2\varphi_1)(\varphi_0 + \alpha_1)\right] \operatorname{Re}\{E_{i,x}\} \\
\operatorname{Im}\{E_{o,y}\} &= \left[-\cos(2\varphi_1)(\theta + \zeta) - \sin(2\theta)\sin(2\varphi_1)(\varphi_0 + \alpha_1)\right] \operatorname{Re}\{E_{i,x}\},
\end{aligned}$$
(5)

where  $\operatorname{Re}\{\cdot\}$  and  $\operatorname{Im}\{\cdot\}$  denote the real and imaginary parts, respectively. By convention, the marker tone is in  $\operatorname{Re}\{E_{i,x}\}$ . We omit the terms due to the other input field components, which do not carry the marker tone. Hence, the right-hand sides of (5) vanish when the polarization rotation has been perfectly compensated.
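The first-order expressions in (5) can be checked numerically. The sketch below builds (1) and (2) as explicit 2 × 2 matrices, sets the controller slightly away from the compensating values in (4), and compares the exact marker-carrying components of $T_{Controller}T_{Fiber}$ (its first column, i.e., the response to a unit, real $E_{i,x}$) with the predictions of (5). The random fiber angles and the milliradian offsets are arbitrary test values, and the helper names are hypothetical; the remaining differences are second order in the offsets.

```python
import cmath
import math
import random


def rot(a: float) -> list[list[complex]]:
    """Differential phase section diag(e^{ja}, e^{-ja}), as in (1) and (2)."""
    return [[cmath.exp(1j * a), 0j], [0j, cmath.exp(-1j * a)]]


def coupler(z: float) -> list[list[complex]]:
    """Middle matrix of (1) and (2): [[cos z, -j sin z], [-j sin z, cos z]]."""
    c, s = math.cos(z), math.sin(z)
    return [[complex(c), -1j * s], [-1j * s, complex(c)]]


def matmul(a, b):
    """2 x 2 complex matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def t_matrix(p1: float, th: float, p0: float):
    """R(p1) C(th) R(p0): the common structure of T_Fiber in (1) and T_Controller in (2)."""
    return matmul(rot(p1), matmul(coupler(th), rot(p0)))


rng = random.Random(0)
alpha1, zeta, alpha0 = (rng.uniform(-math.pi, math.pi) for _ in range(3))
eps1, delta, eps0 = 1e-3, -2e-3, 1.5e-3  # phi1 + alpha0, theta + zeta, phi0 + alpha1
phi1, theta, phi0 = -alpha0 + eps1, -zeta + delta, -alpha1 + eps0

# First column of T_Controller T_Fiber: response to a unit, real E_{i,x} (the marker tone)
m = matmul(t_matrix(phi1, theta, phi0), t_matrix(alpha1, zeta, alpha0))
exact = (m[0][0].imag, m[1][0].real, m[1][0].imag)
approx = (
    eps1 + math.cos(2 * theta) * eps0,
    -math.sin(2 * phi1) * delta + math.sin(2 * theta) * math.cos(2 * phi1) * eps0,
    -math.cos(2 * phi1) * delta - math.sin(2 * theta) * math.sin(2 * phi1) * eps0,
)
for name, e, a in zip(("Im{E_o,x}", "Re{E_o,y}", "Im{E_o,y}"), exact, approx):
    print(f"{name}: exact {e:+.6f}, eq. (5) {a:+.6f}")
```

Setting the offsets to zero makes the product exactly the identity matrix, which is the perfectly compensated case described after (4).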

In practice, the output field components in (5) are represented by the downconverted signals XQ, YI, and YQ shown in Fig. 2. The polarization signal-processing block shown in Fig. 2 can be implemented by a circuit that low- or band-pass filters the downconverted signals XQ, YI, and YQ. It then synchronously detects the filtered signals to estimate the amplitude and sign of the marker tone in each, enabling low-speed signal processing to adjust the controller phase shifts  $\varphi_1$ ,  $\theta$ , and  $\varphi_0$  to minimize the unwanted marker tone amplitudes, thus compensating for fiber polarization rotation. Various algorithms can be used; for example, the phase shift variables can be adjusted in a round-robin manner to minimize the sum of the squares of the unwanted marker tone amplitudes in the various tributaries. Alternatively, the sign of the marker tone in each tributary can be used in conjunction with (5) to adjust the phase shifter variables. For example, if the detected marker tone in  $\operatorname{Im}\{E_{o,x}\}$  has a positive sign,  $\varphi_1$  should be decreased and/or  $\varphi_0$  should be increased or decreased, depending on the sign of  $\cos(2\theta)$ . Appendix I shows that minimizing the unwanted marker tone amplitudes results in polarization demultiplexing with a 180° phase ambiguity, i.e.,  $T_{Controller}T_{Fiber} = \pm I$ , where I is the identity matrix. This 180° phase ambiguity is not critical, however, since the receiver already has to resolve a 90° phase ambiguity introduced by CR. As discussed in Section II-B, the phase ambiguity is typically resolved by transmitting a training sequence or by differentially decoding the bits.
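The filtering and synchronous detection step can be sketched as a lock-in measurement: multiply a tributary by a reference cosine at the marker frequency and average over an integer number of tone periods, which acts as a narrow band-pass filter and returns the signed tone amplitude. In the sketch below, the 10-kHz marker frequency (within the < 50 kHz range above), the 1-MS/s sampling rate, the data model (random ±0.5 levels), and the assumption that the reference is phase-aligned with the received marker are all illustrative; the function names are hypothetical.

```python
import math
import random


def lock_in_amplitude(signal: list[float], f_tone: float, fs: float) -> float:
    """Signed amplitude of the A*cos(2*pi*f_tone*t) component of `signal`.

    Multiplying by an in-phase reference and averaging over an integer number
    of tone periods acts as a narrow band-pass around f_tone; the broadband
    data averages toward zero. The reference is assumed phase-aligned with the
    received marker tone.
    """
    acc = sum(x * math.cos(2 * math.pi * f_tone * k / fs) for k, x in enumerate(signal))
    return 2.0 * acc / len(signal)


FS, F_TONE, N = 1e6, 10e3, 100_000  # 1 MS/s, 10-kHz marker, exactly 1000 tone periods
rng = random.Random(7)


def tributary(marker_amp: float) -> list[float]:
    """Hypothetical tributary: residual marker tone plus data modeled as random +/-0.5."""
    return [marker_amp * math.cos(2 * math.pi * F_TONE * k / FS) + rng.choice((-0.5, 0.5))
            for k in range(N)]


for amp in (0.05, -0.03):
    est = lock_in_amplitude(tributary(amp), F_TONE, FS)
    print(f"residual marker {amp:+.3f}: estimate {est:+.4f}, sign {'+' if est > 0 else '-'}")
```

Longer averaging windows reduce the data-induced error roughly as one over the square root of the window length, which is compatible with the millisecond time scale of fiber polarization drift.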

Through simulation, we have verified several different methods for adjusting these phase shifts. Fig. 4 illustrates one method, showing the phase shifter variables and bit-error rate (BER) vs. iteration step. In this process, the signs of the marker tone amplitudes in the XQ, YI and YQ tributaries are measured after bandpass filtering the signals and synchronously detecting them. By using (5) and these signs, the phase shifter variables are adjusted in the directions that minimize the marker tone amplitudes in the unwanted tributaries. Using this method, the phase shifts reliably converge to the corresponding fiber matrix variables, as shown in (4), and reliably track them, as shown in Fig. 4. In this simulation, the fiber matrix variables are changing at rotation rates on the Poincaré sphere up to 700 rad/s.
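One possible adjustment loop, shown here as an assumption rather than the exact rule behind Fig. 4, is gradient descent on $J = a^2 + b^2 + c^2$, where $a$, $b$, and $c$ are the marker amplitudes in XQ, YI, and YQ; the required partial derivatives follow directly from the coefficients in (5). The Python sketch below applies this rule to the exact matrices (1) and (2) for a static, hypothetical fiber rotation, starting from a nearby controller state (the small-angle regime). The step size, iteration count, and angles are illustrative, and the marker amplitudes are read from the matrix instead of from marker tone detectors.

```python
import cmath
import math


def rot(a):
    """diag(e^{ja}, e^{-ja})"""
    return [[cmath.exp(1j * a), 0j], [0j, cmath.exp(-1j * a)]]


def coupler(z):
    """[[cos z, -j sin z], [-j sin z, cos z]]"""
    c, s = math.cos(z), math.sin(z)
    return [[complex(c), -1j * s], [-1j * s, complex(c)]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def t_matrix(p1, th, p0):
    """R(p1) C(th) R(p0), the form of both (1) and (2)."""
    return matmul(rot(p1), matmul(coupler(th), rot(p0)))


def marker_amplitudes(ctrl, fiber):
    """Marker tone amplitude (per unit transmitted tone) in XQ, YI and YQ.

    In hardware these would come from the marker tone detectors; here they are
    read from the first column of T_Controller T_Fiber.
    """
    m = matmul(t_matrix(*ctrl), t_matrix(*fiber))
    return m[0][0].imag, m[1][0].real, m[1][0].imag


def adapt(ctrl, fiber, mu=0.2, steps=200):
    """Gradient descent on J = a^2 + b^2 + c^2 using the first-order sensitivities in (5)."""
    phi1, theta, phi0 = ctrl
    for _ in range(steps):
        a, b, c = marker_amplitudes((phi1, theta, phi0), fiber)
        s1, c1 = math.sin(2 * phi1), math.cos(2 * phi1)
        s2, c2 = math.sin(2 * theta), math.cos(2 * theta)
        phi1 -= mu * a                                     # (dJ/dphi1) / 2
        theta -= mu * (-s1 * b - c1 * c)                   # (dJ/dtheta) / 2
        phi0 -= mu * (c2 * a + s2 * c1 * b - s2 * s1 * c)  # (dJ/dphi0) / 2
    return phi1, theta, phi0


fiber = (0.7, 0.9, -1.2)    # hypothetical (alpha1, zeta, alpha0)
start = (1.3, -0.95, -0.6)  # (phi1, theta, phi0), near the target (1.2, -0.9, -0.7) from (4)
final = adapt(start, fiber)
print("controller:", [round(x, 6) for x in final])
print("residual marker amplitudes:", marker_amplitudes(final, fiber))
```

A sign-based rule, as described in the text, replaces the amplitudes $a$, $b$, $c$ with their signs, which simplifies the detector at the cost of a fixed-size dither around the optimum.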

When the small-angle approximations that follow from (4) do not hold, such as during startup, a different procedure must be used before adaptation based on (5) can begin. First, the same data should be transmitted on both polarizations. This results in QPSK constellations at the receiver regardless of the polarization rotation through the fiber and allows CR phase estimation, as discussed in Section II-C, producing the constellations shown in Fig. 5(a). In the transmitted signals, the marker tone excursion (the line of circles in Fig. 5) extends only along the real, or in-phase, axis of the X polarization, whereas in the received signals it appears in all four tributaries. Next, only the variables  $\theta$  and  $\varphi_0$  are adjusted, to minimize the marker tone in the Y polarization alone. Leaving  $\varphi_1$  free results only in a residual phase offset between the X and Y polarizations, as shown in Fig. 5(b). At this point, the marker tone ideally exists only in the X polarization, the approximations in (4) hold, and adaptation based on (5) can resume.

Fig. 4. Convergence of the phase shifter variables  $\varphi_0$ ,  $\varphi_1$  and  $\theta$  to the fiber propagation matrix phases  $\alpha_1$ ,  $\alpha_0$  and  $\zeta$  indicates that the system locks to a demultiplexed state.

Fig. 5. Transmitting the same data in the X and Y polarizations results in QPSK constellations even if the rotation through the fiber is not compensated at all, as in (a). When the first two phase shifters converge, as shown in (b), the Y polarization is recovered, but there is a residual phase offset, shown as a rotation of the X polarization constellation. Red circles indicate transmitted signals; blue circles indicate received signals. The marker tone excursion is exaggerated and is shown as a line of circles.

The above assumes that phase estimation in the CR stage uses only the Y polarization, as indicated in Fig. 5(b) by the zero absolute phase rotation of the Y-polarization constellation. This is an important detail when CR phase estimation is performed using only one polarization, as discussed in Section II-B. If the transmitted marker tone is on the same polarization used for CR phase estimation, there will be a residual phase offset between the two polarizations that cannot be compensated using marker tone detection, and the error probability in the orthogonal polarization could become as high as 0.5. This is because the marker tone is then confined to the X polarization, so the polarization controller's logic has no knowledge of the relative rotation between the constellations in the X and Y polarizations. When the transmitted marker tone is in the X polarization and CR phase estimation uses only the Y polarization, the residual phase offset between the two polarizations appears to the polarization controller as marker tone in the imaginary, or quadrature, part of the X polarization, as shown in Fig. 5(b).

The number of phase shifters in the polarization controller determines the receiver optics complexity. For a previously proposed method for DP-DQPSK, the minimum number of phase shifters was shown to be two [33], since the two polarizations are recovered and detected separately. Our method requires three phase shifters if the two polarization branches share the same CR stage. This is the case in an OPLL (Fig. 2(a)), since the LO laser is shared by both polarizations. It is also the case for EPLL implementations sharing CR between polarization branches, regardless of whether one or two polarizations are used for phase estimation (Fig. 2(b)). If, however, CR is performed separately for each polarization branch, then only two phase shifters are required, as shown in Fig. 3(b). Using two phase shifters results in residual phase offsets, denoted by  $Xe^{j\alpha_0}$  and  $Ye^{-j\alpha_0}$ , at the outputs of the polarization controller, but these time-varying residual phase shifts are compensated later by CR.

The underlying material for the polarization controller can be silica, lithium niobate, or another low-loss material that allows integration of multiple phase-shifting sections. The waveguides do not necessarily need to support multiple polarizations, since the input polarizations are demultiplexed solely by coupling and phase shifts. Commercially available polarization controllers using discrete components [34] can track rotation rates up to thousands of radians per second without resets and with minimal insertion loss, but are too large and power-hungry to be integrated into a transceiver module.

Fig. 6. Block diagram of carrier phase estimators for QPSK inputs based on (a) a Costas loop and (b) a multiplier-free approach based on XORs. LIA denotes limiting amplifier, and ABS denotes full-wave rectifier.

Endless polarization control may be achieved by cascading more phase-shifting sections, so that phase shifters can alternate and provide endless phase excursion despite their individual phase excursion limits [27], [28], [35], [36]. Alternatively, endless polarization control can be achieved by resetting the phase shifters when one of them is close to its excursion limit. Resetting will cause burst errors during the switching period. For phase shifting speeds on the order of 1 ns for  $\pi$  phase shifts, typical of phase shifters used for high-speed data modulation [37], the burst errors can be corrected by 7% FEC with current interleaving standards at 56 Gbaud [38]. With phase shifting speeds on the order of 1  $\mu$s for  $\pi$  phase shifts, typical of thermally tuned phase shifters [39], additional buffering of ~200 kbits would be required at 56 Gbaud, increasing latency by an amount on the order of the shifting time.
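The buffering estimate follows from simple arithmetic: the number of bits arriving during one reset is the reset time multiplied by the symbol rate and by 4 bits/symbol for DP-QPSK (2 bits per polarization). The sketch below assumes that every bit received during the switching window must be buffered and ignores FEC framing details; the function name is hypothetical.

```python
def burst_bits(shift_time_s: float, symbol_rate: float, bits_per_symbol: int = 4) -> float:
    """Bits received during one phase-shifter reset.

    DP-QPSK carries 2 bits per polarization per symbol, i.e., 4 bits/symbol.
    """
    return shift_time_s * symbol_rate * bits_per_symbol


for label, t_shift in (("electro-optic, ~1 ns", 1e-9), ("thermal, ~1 us", 1e-6)):
    print(f"{label}: {burst_bits(t_shift, 56e9):,.0f} bits")
```

At 56 Gbaud this gives about 224 bits for a 1-ns reset and about 224 kbits for a 1-μs reset, consistent with the ~200 kbits quoted above.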

#### B. Carrier Recovery

CR architectures based on an OPLL or an EPLL consist of three basic stages: phase estimator, loop filter, and oscillator. The oscillator is the LO laser in an OPLL, and an electronic VCO in an EPLL. The phase estimator stage wipes off the modulated data in order to estimate the phase error, which is then filtered by the loop filter, producing a control signal for the oscillator frequency. We consider a second-order loop filter [15] described by the transfer function

$$F(s) = 2\zeta\omega_n + \omega_n^2 / s, \tag{6}$$

where  $\zeta$  is the damping coefficient, typically chosen to be  $1/\sqrt{2}$ as a compromise between fast response and small overshoot. Here  $\omega_n = 2\pi f_n$  is the loop natural frequency, which must be optimized to minimize the phase error variance. A second-order loop filter is typically preferred, as it has ideally infinite d.c. gain, resulting in zero steady-state error for a frequency step input.
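The zero steady-state error property can be illustrated with a linearized loop ($\sin\phi_e \approx \phi_e$) driven by a frequency step. The sketch below integrates the loop with the filter (6) using forward Euler and, for contrast, a proportional-only (first-order) loop, whose steady-state error is $\Delta\omega/(2\zeta\omega_n)$. The 5-MHz frequency step, 20-MHz natural frequency, time step, and the unit oscillator gain are illustrative assumptions; loop delay is ignored.

```python
import math


def pll_phase_error(delta_f_hz: float, f_n_hz: float, integral_path: bool = True,
                    zeta: float = 1 / math.sqrt(2), dt: float = 1e-10,
                    t_end: float = 2e-6) -> list[float]:
    """Phase-error history of a linearized PLL driven by a frequency step at t = 0.

    With integral_path=True the loop filter is F(s) = 2*zeta*w_n + w_n**2/s, as in (6);
    with integral_path=False only the proportional term 2*zeta*w_n is kept
    (a first-order loop, for comparison). Forward-Euler integration; the
    oscillator phase is the integral of the loop-filter output.
    """
    w_n = 2 * math.pi * f_n_hz
    integ = 0.0      # state of the w_n**2/s branch (rad/s)
    theta_osc = 0.0  # oscillator phase (rad)
    errors = []
    for k in range(int(round(t_end / dt))):
        theta_in = 2 * math.pi * delta_f_hz * k * dt  # input phase ramp
        e = theta_in - theta_osc                       # phase detector, sin(e) ~ e
        if integral_path:
            integ += w_n**2 * e * dt
        theta_osc += (2 * zeta * w_n * e + integ) * dt
        errors.append(e)
    return errors


# Illustrative values: 5-MHz frequency step, 20-MHz loop natural frequency
second_order = pll_phase_error(5e6, 20e6)
first_order = pll_phase_error(5e6, 20e6, integral_path=False)
print(f"final error, second-order loop: {second_order[-1]:.2e} rad")
print(f"final error, proportional-only loop: {first_order[-1]:.3f} rad")
```

The integrator absorbs the frequency offset, so the second-order loop's phase error decays to zero after a transient, while the proportional-only loop settles at a constant offset of about 0.18 rad for these values.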

Fig. 6 shows two possible implementations of a phase estimator for QPSK inputs. Fig. 6(a) shows the block diagram of a conventional Costas loop [21], which requires two linear and wideband analog multipliers per polarization. We propose a novel multiplier-free phase detector based on XOR gates, as shown in Fig. 6(b). Multiplier-free Costas loop alternatives based on XOR gates have been proposed for BPSK [40] and for QPSK [41]. The latter relies on precisely delaying and adding the in-phase and quadrature components prior to the XOR operation. Using simple operations, our proposed phase detector estimates the sign of the phase error rather than its actual value. When XI and XQ form a QPSK signal, the output of the second XOR,  $O_{XOR2}$ , reduces to the sign of the phase error:  $O_{XOR2} = \text{sgn}(\phi_e)$ . After loop filtering and negative feedback, this output counteracts the phase error. When the loop has made the phase error small,  $O_{XOR2}$  oscillates very fast, but these fast oscillations are virtually eliminated by the low-pass loop filter.

Fig. 7. Equivalent block diagram of the Costas loop (without the sign operation  $sgn(\cdot)$ ) and the XOR-based loop (including  $sgn(\cdot)$ ).
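The exact wiring of Fig. 6(b) is not spelled out in the text, so the sketch below shows one realization consistent with the caption (limiting amplifiers, full-wave rectifiers, and two XORs) that outputs $\text{sgn}(\phi_e)$ for QPSK. It relies on the identity $\text{sgn}(\phi_e) = \text{sgn}(I)\,\text{sgn}(Q)\,\text{sgn}(|Q| - |I|)$, valid for $|\phi_e| < \pi/4$, where a product of signs becomes an XOR of sign bits. For comparison, it also computes the decision-directed QPSK Costas error $\text{sgn}(I)Q - \text{sgn}(Q)I$, assumed here as a representative of Fig. 6(a); for unit-amplitude symbols it equals $\sqrt{2}\sin\phi_e$. Both function names are hypothetical.

```python
import cmath
import math


def costas_qpsk_error(i: float, q: float) -> float:
    """Decision-directed QPSK Costas error sgn(I)*Q - sgn(Q)*I (two multipliers)."""
    def sgn(x: float) -> float:
        return 1.0 if x >= 0 else -1.0
    return sgn(i) * q - sgn(q) * i


def xor_phase_detector(i: float, q: float) -> int:
    """Multiplier-free sign-of-phase-error detector (one possible wiring).

    b_i = (I < 0) and b_q = (Q < 0) come from limiting amplifiers;
    b_d = (|Q| < |I|) comes from full-wave rectifiers and a comparator.
    XOR1 = b_i ^ b_q and XOR2 = XOR1 ^ b_d; output +1 when XOR2 is 0, else -1.
    """
    b_i, b_q = i < 0, q < 0
    b_d = abs(q) < abs(i)
    xor2 = (b_i ^ b_q) ^ b_d
    return -1 if xor2 else 1


# Second-quadrant QPSK symbol with phase error +0.1 rad
r = cmath.exp(1j * (3 * math.pi / 4 + 0.1))
print(xor_phase_detector(r.real, r.imag), round(costas_qpsk_error(r.real, r.imag), 4))  # 1 and ~0.141
```

Both detectors agree in sign for every QPSK symbol as long as $|\phi_e| < \pi/4$; the XOR version simply discards the magnitude, which is the $\text{sgn}(\sin\phi_e)$ nonlinearity of Fig. 7.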

Fig. 7 shows an equivalent block diagram of the Costas and XOR-based loops of Fig. 6. They differ only in the nonlinear characteristic within the loop. While the Costas loop nonlinear function is simply  $\sin \phi_e(t)$ , for the XOR-based loop it is  $\operatorname{sgn}(\sin \phi_e(t))$ . The delay  $\tau_d$  accounts for lumped and distributed delays of components and signal paths in the EPLL or OPLL.

As in [42], we use the small-signal approximation  $\sin \varepsilon \approx \varepsilon$  to linearize the loop transfer function in Fig. 7 and obtain the phase error variance:

$$\sigma_e^2 = \Delta v_{tot} \int_{-\infty}^{\infty} \left| j\omega + e^{-j\omega\tau_d} F(j\omega) \right|^{-2} d\omega + 2(2\pi)^2 k_a \int_0^{\infty} |\omega|^{-1} \left| j\omega + e^{-j\omega\tau_d} F(j\omega) \right|^{-2} d\omega + \frac{T_s}{2N_{PE}\gamma_s} \frac{1}{2\pi} \int_{-\infty}^{\infty} \left| \frac{F(j\omega)}{j\omega + e^{-j\omega\tau_d} F(j\omega)} \right|^2 d\omega, \quad (7)$$

where  $\Delta v_{tot}$  denotes the sum of the transmitter laser and LO laser linewidths,  $k_a$  characterizes the magnitude of flicker noise [43],  $T_s$  is the symbol time, and  $\gamma_s$  is the signal-to-noise ratio (SNR).  $N_{PE} = 1$ , if phase estimation is performed using only one polarization, and  $N_{PE} = 2$ , if phase estimation is performed in both polarizations and summed, as illustrated in Fig. 2. The terms in (7) account for phase error contribution due to the intrinsic laser phase noise caused by spontaneous emission, flicker noise and additive white Gaussian noise (AWGN), respectively. The loop filter, and in particular  $f_n$ , should be optimized to minimize (7).
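As an illustration of how $f_n$ can be optimized, the sketch below numerically evaluates the laser phase-noise and AWGN terms of (7) as written, omitting the flicker-noise term (i.e., assuming $k_a = 0$). For $\tau_d = 0$, these two terms have the closed forms $\pi\Delta v_{tot}/(2\zeta\omega_n)$ and $\omega_n(\zeta + 1/(4\zeta))\,T_s/(2N_{PE}\gamma_s)$, which the sketch uses as a check. The symbol rate (56 GBd), SNR ($\gamma_s = 10$), combined linewidth (two 300-kHz lasers), 100-ps delay, and frequency grid are illustrative assumptions, not values from this paper's figures.

```python
import cmath
import math


def phase_error_variance(f_n, dnu_tot, t_s, gamma_s, n_pe=1, tau_d=0.0,
                         zeta=1 / math.sqrt(2), n_points=4000):
    """Laser phase-noise and AWGN terms of (7); flicker term omitted (k_a = 0 assumed).

    Both integrands are even in w, so the integral over (-inf, inf) is twice the
    integral over (0, inf), evaluated with a trapezoidal rule in u = ln(w) plus
    asymptotic corrections below w_lo and above w_hi.
    """
    w_n = 2 * math.pi * f_n
    w_lo, w_hi = 1e-4 * w_n, 1e4 * w_n
    ws = [w_lo * (w_hi / w_lo) ** (k / (n_points - 1)) for k in range(n_points)]

    def integrands(w):
        f = 2 * zeta * w_n + w_n**2 / (1j * w)        # loop filter F(jw), eq. (6)
        d = 1j * w + cmath.exp(-1j * w * tau_d) * f   # jw + e^{-jw tau_d} F(jw)
        return w / abs(d) ** 2, w * abs(f / d) ** 2   # times w because dw = w du

    vals = [integrands(w) for w in ws]
    du = math.log(w_hi / w_lo) / (n_points - 1)
    pn = sum(0.5 * (vals[k][0] + vals[k + 1][0]) * du for k in range(n_points - 1))
    awgn = sum(0.5 * (vals[k][1] + vals[k + 1][1]) * du for k in range(n_points - 1))
    pn += 1 / w_hi                                    # tail: integrand ~ 1/w**2
    awgn += (2 * zeta * w_n) ** 2 / w_hi + w_lo       # tails: ~(2 zeta w_n / w)**2 and ~1
    sigma2_pn = dnu_tot * 2 * pn
    sigma2_awgn = t_s / (2 * n_pe * gamma_s) * (2 * awgn) / (2 * math.pi)
    return sigma2_pn + sigma2_awgn


def closed_form_tau0(f_n, dnu_tot, t_s, gamma_s, n_pe=1, zeta=1 / math.sqrt(2)):
    """Same two terms for tau_d = 0, where both integrals have closed forms."""
    w_n = 2 * math.pi * f_n
    return (math.pi * dnu_tot / (2 * zeta * w_n)
            + t_s / (2 * n_pe * gamma_s) * w_n * (zeta + 1 / (4 * zeta)))


# Illustrative assumptions: 56 GBd, gamma_s = 10 (10 dB), two 300-kHz lasers
T_S, GAMMA_S, DNU = 1 / 56e9, 10.0, 600e3
grid = [10e6 * 1.2**k for k in range(19)]  # f_n from 10 MHz to about 266 MHz
for tau_d in (0.0, 100e-12):
    best = min(grid, key=lambda fn: phase_error_variance(fn, DNU, T_S, GAMMA_S, tau_d=tau_d))
    print(f"tau_d = {tau_d * 1e12:.0f} ps: best f_n on grid ~ {best / 1e6:.0f} MHz, "
          f"sigma_e^2 = {phase_error_variance(best, DNU, T_S, GAMMA_S, tau_d=tau_d):.2e} rad^2")
```

The trade-off is visible in the closed form: a larger $f_n$ suppresses laser phase noise as $1/\omega_n$ but admits AWGN in proportion to $\omega_n$, so the optimum scales as $\sqrt{\Delta v_{tot}\,\gamma_s / T_s}$ when $\tau_d = 0$.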

It is important to highlight that  $\Delta v_{tot}$  refers to the intrinsic laser linewidth due to spontaneous emission. Low-frequency flicker noise caused by electrical noise in the tuning sections of tunable lasers may lead to an apparently broader linewidth. Indeed, as reported in [44], a typical sampled grating (SG) distributed Bragg reflector (DBR) laser with linewidth below 1 MHz had an apparent linewidth ranging from 10 to 50 MHz. However, as indicated in (7), the flicker-noise contribution to the phase error variance is smaller than the intrinsic phase noise contribution, since the flicker-noise integrand decays with an additional  $|\omega|^{-1}$  factor. Treating flicker noise as an equivalent broadening of  $\Delta v_{tot}$  would therefore lead to a suboptimal choice of  $f_n$ .

Fig. 8. Maximum loop delay for 0.5-dB SNR penalty as a function of the combined linewidth. Curves are shown for the loop filter natural frequency optimized at every point and for a natural frequency twice the optimum.

The SNR depends on whether the receiver is shot-noise limited, e.g., in unamplified intra-data center links, or ASE-limited, e.g., in amplified inter-data center links:

$$\gamma_s = \begin{cases} \dfrac{RP_{rx}}{2qR_s}, & \text{shot-noise limited} \\ \dfrac{RP_{rx}}{N_A n_{sp} h \nu R_s}, & \text{ASE-limited} \end{cases}, \quad (8)$$

where  $P_{rx}$  is the received power,  $R$  is the photodiode responsivity,  $q$  is the electron charge,  $h$  is Planck's constant,  $\nu$  is the optical signal frequency,  $N_A$  is the number of amplifiers,  $n_{sp}$  is the amplifier spontaneous emission factor, and  $R_s$  is the symbol rate. Since  $\gamma_s$  is proportional to  $P_{rx}$  in both cases, a 1-dB penalty in SNR corresponds to a 1-dB penalty in receiver sensitivity.
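For the shot-noise-limited case of (8), the 1-dB-for-1-dB relation between received power and SNR can be checked directly. This sketch assumes $R = 1$ A/W (Table I) and $R_s = 56$ GBd, i.e., 224 Gbit/s divided by the 4 bits carried by each DP-QPSK symbol; the function names are illustrative, and the ASE-limited case is not coded here.

```python
import math

Q_E = 1.602176634e-19  # electron charge (C)


def dbm_to_watt(p_dbm):
    return 1e-3 * 10 ** (p_dbm / 10)


def snr_shot_limited(p_rx_w, responsivity=1.0, symbol_rate=56e9):
    """Shot-noise-limited case of (8): gamma_s = R * P_rx / (2 * q * R_s)."""
    return responsivity * p_rx_w / (2 * Q_E * symbol_rate)


def db(x):
    return 10 * math.log10(x)


# A 1-dB change in received power maps to exactly a 1-dB change in SNR.
delta_db = db(snr_shot_limited(dbm_to_watt(-29.0))) - db(snr_shot_limited(dbm_to_watt(-30.0)))
```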

As shown in [45], the bit error probability of a QPSK signal with phase error distributed according to  $N(0, \sigma_e^2)$  is

$$P_{b} = \frac{1}{2} \operatorname{erfc}\left(\sqrt{\gamma_{s}}\right) + \sum_{l=0}^{\infty} (-1)^{l} H_{l}\left(1 - \cos\left(\left(2l+1\right)\frac{\pi}{4}\right) e^{-0.5\left(2l+1\right)^{2}\sigma_{e}^{2}}\right), \quad (9)$$

where  $\sigma_e^2$  is given in (7) and

$$H_{l} = \frac{\sqrt{\gamma_{s}}e^{-\gamma_{s}/2}}{\sqrt{\pi}\left(2l+1\right)} \left(I_{l}\left(\frac{\gamma_{s}}{2}\right) + I_{l+1}\left(\frac{\gamma_{s}}{2}\right)\right) \ge 0, \quad (10)$$

where  $I_l(x)$  is the modified Bessel function of the first kind of order  $l$. Using (7)–(10), we can compute the receiver sensitivity penalty as a function of  $f_n$,  $\tau_d$, and  $\Delta \nu_{tot}$. Fig. 8 shows the maximum delay for a 0.5-dB SNR penalty, relative to a system with no phase noise, as a function of the combined linewidth for  $N_{PE} = 1$, 2. The loop natural frequency is optimized at each point. The maximum delay is significantly reduced at wider linewidths or when the natural frequency is suboptimal.

Fig. 9. Comparison of SNR penalty vs combined linewidth for Costas loop and XOR-based loop. Simulation curves include thermal noise and ISI penalties, while theory curves do not.

An example of this is shown in Fig. 8 by the curve for which the natural frequency is twice the optimal value. Notably, there is virtually no penalty for using only one of the polarizations for phase estimation in CR, since the optimal value of  $f_n$  is reached when the phase noise component in (7) is dominant. Fig. 8 assumes  $k_a = 1.7 \cdot 10^{10}$  Hz$^2$, which is typical of DFB lasers [22], but for  $k_a = 3.4 \cdot 10^{11}$  Hz$^2$, observed in digital supermode DBR (DS-DBR) lasers [22], the flicker noise effects become significant for  $\Delta \nu_{tot} < 1$  MHz.

Although (7) was derived using the small-signal approximation  $\sin \varepsilon \approx \varepsilon$  for the Costas loop, the performance of the XOR-based loop is similar to that of the Costas loop for the same loop filter parameters optimized using (7)–(9). Fig. 9 compares the performance of Costas and XOR-based loops as a function of  $\Delta \nu_{tot}$. The analysis curves were obtained using (7)–(10), while the curves for the Costas loop and the XOR-based loop were obtained through Monte Carlo simulations. Even though the XOR-based loop does not admit a small-signal approximation in the analysis, its performance is very similar to that of the Costas loop: the two differ by less than 0.5 dB for  $N_{PE} = 1, 2$.
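Equations (9) and (10) can be evaluated by truncating the infinite sum. The sketch below is a minimal, dependency-free version that computes the exponentially scaled Bessel function $e^{-x}I_l(x)$ from its power series, which is adequate for the moderate SNRs of interest here; the truncation length `n_terms` and the function names are assumptions. As a consistency check, with $\sigma_e^2 = 0$ the series should reduce to the familiar Gray-coded QPSK result $\frac{1}{2}\operatorname{erfc}(\sqrt{\gamma_s/2})$.

```python
import math


def _bessel_i_scaled(order, x, terms=200):
    """exp(-x) * I_order(x) from the power series; adequate for moderate x > 0."""
    total = 0.0
    for k in range(terms):
        log_term = ((2 * k + order) * math.log(x / 2)
                    - math.lgamma(k + 1) - math.lgamma(k + order + 1) - x)
        total += math.exp(log_term)
    return total


def qpsk_ber_phase_error(gamma_s, var_e, n_terms=60):
    """BER of (9) with H_l from (10); the infinite sum is truncated at n_terms."""
    pb = 0.5 * math.erfc(math.sqrt(gamma_s))
    for l in range(n_terms):
        m = 2 * l + 1
        h_l = (math.sqrt(gamma_s) / (math.sqrt(math.pi) * m)
               * (_bessel_i_scaled(l, gamma_s / 2) + _bessel_i_scaled(l + 1, gamma_s / 2)))
        pb += (-1) ** l * h_l * (1 - math.cos(m * math.pi / 4) * math.exp(-0.5 * m**2 * var_e))
    return pb
```

For example, `qpsk_ber_phase_error(10 ** 1.1, 0.01)` gives the BER at an 11 dB SNR with a 0.1-rad rms phase error.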

Both Costas and XOR-based phase estimators exhibit a  $90^{\circ}$  phase ambiguity. This ambiguity is typically resolved by either transmitting a known training sequence at the beginning of transmission, or by differentially decoding the bits. Although differentially decoding the bits doubles the bit-error ratio (BER) [46], near the FEC threshold this corresponds to less than 0.5 dB SNR penalty. Moreover, using a training sequence would require retraining whenever there is a cycle slip. If the bits are differentially decoded, however, a cycle slip only causes a few more error events that could be corrected by the FEC.
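The effect of differential decoding on the 90° ambiguity and on cycle slips can be seen with a quadrant-index model, in which each QPSK symbol is represented only by its quadrant 0 to 3 and the bit mapping is abstracted away. In this sketch (function names are illustrative), a constant 90° rotation and a cycle slip each cost a single decoded symbol, whereas an isolated symbol error becomes two decoded errors, which is the error doubling noted above.

```python
import numpy as np


def diff_encode(data):
    """Quadrant index q[n] = (q[n-1] + d[n]) mod 4, with q[-1] = 0."""
    return np.cumsum(data) % 4


def diff_decode(quadrants):
    """d[n] = (q[n] - q[n-1]) mod 4, with q[-1] = 0."""
    return np.diff(quadrants, prepend=0) % 4


rng = np.random.default_rng(0)
data = rng.integers(0, 4, size=1000)
q = diff_encode(data)

rotated = (q + 1) % 4                   # unresolved 90-degree ambiguity
slipped = q.copy()
slipped[500:] = (slipped[500:] + 1) % 4  # cycle slip at n = 500
single = q.copy()
single[200] = (single[200] + 2) % 4      # one isolated symbol error

errors = {name: int(np.count_nonzero(diff_decode(x) != data))
          for name, x in [("rotated", rotated), ("slipped", slipped), ("single", single)]}
```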

### C. Proposed Startup Protocol

At startup, the receiver cannot perform polarization demultiplexing and CR simultaneously. For instance, marker tone detection is only possible after CR, so that the marker tone is at the expected frequency. CR, in turn, requires the received signal in each polarization branch to be QPSK, which is not the case for an arbitrary received state of polarization. To circumvent these problems, we have devised a startup protocol, which can also be used to recover from a persistent loss of the marker tone in the relevant tributary caused by a discontinuous polarization change.

Fig. 10. Block diagram of (a) polarization recovery and (b) detection of DQPSK signal. Detection is shown for only one polarization; it is identical in the second.

First, the transmitter sends the same data in both polarizations so that the received signal in each polarization branch is QPSK regardless of the received state of polarization. The transmitted sequence needs to be known at the receiver only if the bits are not differentially decoded, in which case a training sequence is required to resolve the 90° phase ambiguity. Once phase lock is acquired, the marker tone is at the appropriate frequency, and the polarization estimation algorithm can adjust the phase shifters to demultiplex the two polarizations, as described in Section II-A. Once the polarization recovery processing detects sufficiently low marker tone amplitudes in the XQ, YI, and YQ tributaries, data transmission in both polarizations can start.
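The startup protocol can be summarized as a small state machine. The sketch below is hypothetical: the state names, the `next_state` interface, and the scalar `threshold` applied to the XQ, YI, and YQ marker-tone amplitudes are illustrative choices, not part of the described receiver. A persistent loss of the marker tone returns the link to the identical-data state, as the protocol allows.

```python
from enum import Enum, auto


class RxState(Enum):
    SAME_DATA = auto()   # Tx sends identical data on both polarizations; CR acquires lock
    POL_DEMUX = auto()   # phase locked; marker-tone minimization drives the phase shifters
    DUAL_POL = auto()    # independent data on both polarizations


def next_state(state, phase_locked, unwanted_tones, threshold, tone_lost=False):
    """One step of the startup protocol (hypothetical interface).

    unwanted_tones: measured marker-tone amplitudes in the XQ, YI, and YQ tributaries.
    """
    if state is RxState.SAME_DATA:
        return RxState.POL_DEMUX if phase_locked else state
    if state is RxState.POL_DEMUX:
        if all(a < threshold for a in unwanted_tones):
            return RxState.DUAL_POL
        return state
    # DUAL_POL: a persistent loss of the marker tone restarts the protocol.
    return RxState.SAME_DATA if tone_lost else state
```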

## III. HOMODYNE DP-DQPSK RECEIVER

In DQPSK transmission, the information is encoded in the phase transitions between two consecutive symbols. Hence, DQPSK detection does not require an absolute phase reference and CR is not necessary, which significantly simplifies the receiver. Homodyne DQPSK, however, has some disadvantages compared to homodyne QPSK. First, DQPSK has an inherent  $\sim$ 2.4 dB SNR penalty due to differential detection compared to coherent detection [47]. Second, differential detection restricts modulation to PSK, which limits its spectral efficiency compared to quadrature-amplitude modulation (QAM).

Compared to the overall block diagram of DP-QPSK receiver shown in Fig. 1, the DP-DQPSK receiver differs in polarization recovery and in the high-speed analog electronics. Polarization recovery for DP-DQPSK is shown in Fig. 10(a). As discussed earlier in Section II-A, since detection is separate for the X and Y polarizations, only two phase shifters are needed. A small portion of the optical signal is split off and detected. By changing the phase shifts of the two regions until the RF PSD of this signal is minimized, the incoming, rotated polarizations will be demultiplexed [31].

The high-speed analog operations are shown in Fig. 10(b), which depicts the block diagram of differential decoding for one polarization. Two analog multiplications, phase shifts, time delays, and low-pass filtering are the only operations performed in the high-speed analog signal processing of the receiver.
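In discrete time, the delay-and-multiply operation of Fig. 10(b) amounts to forming $z[n] = r[n]\,r^*[n-1]$ and deciding the nearest multiple of $\pi/2$. The sketch below is a digital stand-in for the analog circuit, with illustrative names and a noise level chosen only for the demonstration; it shows that an arbitrary, unknown carrier phase cancels in $z[n]$, which is why no CR is required.

```python
import numpy as np


def dqpsk_modulate(data, phase0=np.pi / 4):
    """Map increments data[n] in {0, 1, 2, 3} to phase steps of data[n] * pi/2."""
    return np.exp(1j * (phase0 + np.cumsum(data) * np.pi / 2))


def dqpsk_detect(rx):
    """Delay-and-multiply: z[n] = r[n] * conj(r[n-1]), then nearest multiple of pi/2."""
    z = rx[1:] * np.conj(rx[:-1])
    return np.round(np.angle(z) / (np.pi / 2)).astype(int) % 4


rng = np.random.default_rng(1)
data = rng.integers(0, 4, size=2000)
tx = dqpsk_modulate(data)
noise = 0.05 * (rng.standard_normal(tx.size) + 1j * rng.standard_normal(tx.size))
rx = tx * np.exp(1j * 1.234) + noise   # arbitrary, unknown carrier phase
decided = dqpsk_detect(rx)             # decisions for data[1:]
```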

For inter-data center applications using inline optical amplification, where receiver sensitivity is less critical, DQPSK may be implemented using delay interferometers [9]. This implementation is particularly attractive, since virtually all signal processing is done in the optical domain; high-speed electronics are employed only to perform CDR. Nevertheless, we restrict our focus to LO-based DQPSK, which can also be used in intra-data center links that do not employ inline optical amplification.

### A. Frequency Offset Penalty

Although the DQPSK receiver does not require CR, the frequency offset between the transmitter and LO lasers may lead to a significant penalty. The error probability of M-DPSK was studied in [48]. Assuming that the SNR is time invariant, the BER of M-DPSK in the presence of a frequency offset is given by

$$P_{b} = \frac{2}{\log_{2} M} \left( F\left(\pi\right) - F\left(\pi/M\right) \right), \quad F\left(\phi\right) = \frac{\gamma_{s} \sin\left(\Delta\Psi - \phi\right)}{4\pi} \int_{-\pi/2}^{\pi/2} \frac{\exp\left(-\left(\gamma_{s} - \gamma_{s} \cos\left(\Delta\Psi - \phi\right) \cos t\right)\right)}{\gamma_{s} - \gamma_{s} \cos\left(\Delta\Psi - \phi\right) \cos t}\, dt, \quad (11)$$

where  $\Delta \Psi = 2\pi f_{off} T_s$  is the phase error due to frequency offset  $f_{off}$  during a symbol period.

Fig. 11 shows the SNR penalty as a function of the frequency offset. The SNR penalty grows roughly quadratically with frequency offset and reaches 3 dB at  $f_{off} = 2$  GHz. As in the EPLL-based receivers discussed in Section II, frequency combs [17] at both the transmitter and LO can be used to amortize the high cost and power consumption of strict laser temperature control. Alternatively, frequency-locking techniques based on frequency discriminators can be employed [18].
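The rotation $\Delta\Psi$ in (11) is simple to evaluate. Assuming a 56-GBd symbol rate (224 Gbit/s DP-DQPSK), a 2 GHz offset gives $\Delta\Psi = \pi/14$ rad (about 12.9°), which is 2/7 of the 45° half-width of each DQPSK decision region. The short simulation below, with illustrative names, confirms that the offset adds the same rotation to every differential product.

```python
import numpy as np


def delta_psi(f_off_hz, symbol_rate_hz):
    """Per-symbol phase rotation from a Tx/LO frequency offset: 2*pi*f_off*T_s."""
    return 2 * np.pi * f_off_hz / symbol_rate_hz


rs = 56e9                           # assumed DP-DQPSK symbol rate for 224 Gbit/s
dpsi = delta_psi(2e9, rs)
margin_used = dpsi / (np.pi / 4)    # fraction of the +/-45 degree decision half-width

# The offset appears as a linear phase ramp; every r[n] r*[n-1] is rotated by dpsi.
rng = np.random.default_rng(2)
data = rng.integers(0, 4, size=1000)
tx = np.exp(1j * (np.pi / 4 + np.cumsum(data) * np.pi / 2))
rx = tx * np.exp(1j * dpsi * np.arange(tx.size))
z = rx[1:] * np.conj(rx[:-1])
residual = np.angle(z * np.exp(-1j * data[1:] * np.pi / 2))
```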

## IV. COMPARISON AND DISCUSSION

In this section, we compare the performance of the proposed receiver architectures based on analog signal processing with their DSP-based counterparts. In the DSP-based receiver, equalization and polarization demultiplexing are simplified, as discussed in Appendix II. CR is performed using the Viterbi-Viterbi method [49], a feedforward method that uses a simple averaging filter rather than the optimal Wiener filter [19].
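For the DSP-based baseline, a Viterbi-Viterbi estimator raises each QPSK sample to the fourth power to remove the modulation, averages the result, and divides the phase by four. The sketch below assumes symbols at odd multiples of $\pi/4$ and a 32-sample moving-average window; the window length, names, and test values are illustrative and are not taken from Table I. Like the analog loops, it leaves a $\pi/2$ ambiguity.

```python
import numpy as np


def viterbi_viterbi(rx, block=32):
    """Feedforward 4th-power phase estimate for QPSK with a moving-average filter.

    Assumes symbols at odd multiples of pi/4; the estimate has a pi/2 ambiguity.
    """
    p4 = -(rx ** 4)                        # (exp(j(pi/4 + k*pi/2)))**4 = -1 strips the data
    avg = np.convolve(p4, np.ones(block) / block, mode="same")
    return np.unwrap(np.angle(avg)) / 4    # unwrap 4*phi before dividing by 4


rng = np.random.default_rng(3)
sym = np.exp(1j * (np.pi / 4 + rng.integers(0, 4, size=4000) * np.pi / 2))
noise = 0.05 * (rng.standard_normal(sym.size) + 1j * rng.standard_normal(sym.size))
rx = sym * np.exp(1j * 0.3) + noise        # constant carrier phase of 0.3 rad
phi_hat = viterbi_viterbi(rx)
derotated = rx * np.exp(-1j * phi_hat)
```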



Fig. 11. SNR penalty as a function of frequency offset between transmitter and LO lasers for a 224 Gbit/s DP-DQPSK system.



Fig. 12. Power penalty versus dispersion for several receiver architectures. The dispersion here could refer to the total dispersion in an intra-data center link or the residual dispersion after optical CD compensation in an inter-data center link.

We target a bit rate of 200 Gbit/s per wavelength, resulting in 224 Gbit/s after including 7% hard-decision FEC overhead [50] and 5% Ethernet overhead. The FEC is assumed to be hard-decision RS(255, 239) or similar, which leads to an FEC threshold of  $1.8 \times 10^{-4}$  [50].
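The line-rate bookkeeping can be checked with a few lines of arithmetic. Applying both overheads to 200 Gbit/s gives about 224.7 Gbit/s, quoted as 224 Gbit/s; RS(255, 239) adds 16/239, or about 6.7%, consistent with the 7% figure; and at 4 bits per DP-QPSK symbol the symbol rate is 56 GBd, so a $0.7R_s$ antialiasing bandwidth is the 39.2 GHz listed in Table I.

```python
payload_bps = 200e9
line_rate_bps = payload_bps * 1.07 * 1.05   # 7 % FEC and 5 % Ethernet overhead
rs_overhead = 255 / 239 - 1                 # RS(255, 239) redundancy, about 6.7 %
bits_per_symbol = 4                         # DP-QPSK: 2 polarizations x 2 bits
symbol_rate = 224e9 / bits_per_symbol       # 56 GBd at the quoted 224 Gbit/s
aa_bandwidth = 0.7 * symbol_rate            # 0.7 R_s Bessel filter bandwidth
```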

Fig. 12 shows the SNR penalty vs CD for several receiver architectures. Table I summarizes the simulation parameters. The reference SNR is roughly 11 dB, which corresponds to the SNR required to achieve the target BER with DP-QPSK in an ISI-free channel with matched filtering at the receiver.
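The roughly 11 dB reference can be reproduced by inverting the AWGN BER of Gray-coded QPSK, $\frac{1}{2}\operatorname{erfc}(\sqrt{\gamma_s/2})$, at the FEC threshold. The bisection below is a minimal sketch with illustrative names; for a target BER of $1.8\times10^{-4}$ it returns about 11.0 dB.

```python
import math


def qpsk_ber(gamma_s):
    """Gray-coded QPSK BER in AWGN with matched filtering."""
    return 0.5 * math.erfc(math.sqrt(gamma_s / 2))


def required_snr_db(target_ber, lo_db=0.0, hi_db=20.0, iterations=60):
    """Bisection on SNR in dB; qpsk_ber decreases monotonically with SNR."""
    for _ in range(iterations):
        mid = 0.5 * (lo_db + hi_db)
        if qpsk_ber(10 ** (mid / 10)) > target_ber:
            lo_db = mid
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db)


snr_ref_db = required_snr_db(1.8e-4)
```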

At small CD, DSP-free systems exhibit a ~1 dB SNR penalty compared to their DSP-based counterparts due to ISI from component bandwidth limitations and suboptimal receiver filtering, which in our simulation was a 5th-order Bessel filter with bandwidth equal to 0.7  $R_s$. Increasing CD incurs very little penalty in DSP-based systems owing to equalization. For DSP-free systems, the SNR penalty increases quadratically with CD and reaches 5 dB at roughly  $\pm 35$  ps/nm. Note that, as expected, DQPSK systems exhibit a penalty of  $\sim 2.4$  dB compared to QPSK systems.

TABLE I SIMULATION PARAMETERS

| Block                   | Parameter                            | Value                            |
|-------------------------|--------------------------------------|----------------------------------|
|                         | Modulation format                    | (D)QPSK                          |
|                         | Bit rate                             | 224 Gbit/s                       |
| Tx                      | Laser linewidth                      | 200 kHz                          |
|                         | Relative intensity noise             | -150 dB/Hz                       |
|                         | Modulator bandwidth                  | 30 GHz                           |
| Rx                      | Photodiode responsivity              | 1 A/W                            |
|                         | Input-referred noise                 | $30 \text{ pA/}\sqrt{\text{Hz}}$ |
|                         | Antialiasing filter type             | 5th-order Bessel                 |
|                         | Antialiasing filter bandwidth        | 39.2 GHz                         |
| LO laser                | Linewidth*                           | 200 kHz                          |
|                         | Output power                         | 15 dBm                           |
|                         | Relative intensity noise             | -150 dB/Hz                       |
| Analog carrier recovery | Loop filter damping factor           | $\sqrt{2}/2$                     |
|                         | Loop delay ($\tau_d$)                | 250 ps                           |
|                         | Optimal natural frequency* ($f_n^*$) | 116 MHz                          |
|                         | $N_{PE}$                             | 1                                |
| DSP                     | ADC effective resolution             | 5 bits                           |
|                         | Oversampling ratio                   | 5/4                              |
|                         | Equalizer number of taps             | 7                                |
|                         | Filter adaptation algorithm          | CMA                              |
|                         | Feedforward filter number of taps    | 9                                |

\* Except for the set of curves indicated in Fig. 12.

The penalty of using an XOR-based loop as opposed to a Costas loop is less than 0.5 dB, even when  $\Delta \nu_{tot} = 2$  MHz. The two scenarios of  $\Delta \nu_{tot} = 400$  kHz and  $\Delta \nu_{tot} = 2$  MHz represent likely realizations of EPLL and OPLL, respectively.

An OPLL implementation requires phase tunable lasers, which typically exhibit linewidth on the order of a few MHz [44], [51]. An EPLL implementation can use standard DFB lasers, which exhibit linewidths of several hundred kHz [22].

An ideal, shot-noise-limited DP-QPSK receiver exhibits a receiver sensitivity of -35 dBm. Assuming a realistic polarization demultiplexing loss of 2 dB [52], [53], a 90° hybrid loss of 1.5 dB [54], and a 5 dB SNR penalty due to  $\pm 35$  ps/nm dispersion, the receiver sensitivity becomes -26.5 dBm, which is nearly 13 dB better than that of an amplified 4-PAM system at half the bit rate. Note that these values are for devices optimized for operation near 1550 nm; similar values are expected for devices optimized for operation near 1310 nm, though coherent detection and DWDM components for the O-band are not as commercially mature as C-band components. This sensitivity would allow eye-safe systems near 1310 nm to achieve a reach of up to 40 km. In fact, systems with 100 GHz wavelength spacing could support 49 channels with 5 dB of margin, and systems with 200 GHz wavelength spacing could support 25 channels with 8 dB of margin.
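The -26.5 dBm figure is simple dB bookkeeping of the quoted losses and penalties on top of the ideal sensitivity:

```python
ideal_sensitivity_dbm = -35.0
penalties_db = {
    "polarization demultiplexing loss": 2.0,
    "90-degree hybrid loss": 1.5,
    "CD penalty at +/-35 ps/nm": 5.0,
}
sensitivity_dbm = ideal_sensitivity_dbm + sum(penalties_db.values())
```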

The SNR penalty in Fig. 12 is equivalent to an OSNR penalty in amplified systems. The actual values are related by the well-known expression [47]:

$$OSNR = \frac{\Delta f}{\Delta \nu_{opt}} SNR, \qquad (12)$$

where  $\Delta \nu_{opt} = 12.5$  GHz is the reference bandwidth for measuring OSNR, and  $\Delta f$  is the one-sided noise bandwidth of the electrical signal before detection, which in the analog implementation, due to imperfect filtering, is  $\Delta f \approx 38$  GHz. Hence, the reference system achieves the target BER when OSNR  $\approx 16$  dB. Moreover, amplified systems near 1550 nm require optical CD compensation, as  $\pm 35$  ps/nm of CD corresponds to just a few kilometers of dispersion-uncompensated transmission.
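Equation (12) in dB, with the values above, gives the quoted OSNR. The dispersion parameter of 17 ps/(nm·km) used below to translate ±35 ps/nm into a distance is an assumption (a typical value for standard single-mode fiber near 1550 nm), not a value given in this paper.

```python
import math

snr_ref_db = 11.0
delta_f_hz = 38e9       # one-sided noise bandwidth of the analog receiver
ref_bw_hz = 12.5e9      # OSNR reference bandwidth
osnr_db = snr_ref_db + 10 * math.log10(delta_f_hz / ref_bw_hz)   # (12) in dB, about 15.8

# Distance for +/-35 ps/nm, ASSUMING D = 17 ps/(nm km) near 1550 nm.
cd_limit_km = 35.0 / 17.0
```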

We restrict the power consumption comparison to the polarization demultiplexing and high-speed electronics for the DSP-free architectures, and ADCs and DSP for the DSP-based receiver. Other components such as the LO laser, photodiodes, TIA-AGCs, and FEC decoding are the same in both systems. Using the models listed in [12] for power consumption of ADC and DSP of long-haul coherent systems, and the simplifications from Appendix II, the power consumption of the DSP-based receivers including only ADC and DSP for 224 Gbit/s DP-QPSK amounts to 37.3 W in 28-nm CMOS. In 7-nm CMOS, this estimate drops to 12.4 W. These calculations assume that a complex multiplication is performed using three real multiplications. Moreover, we assume that the receiver DSP implements carrier recovery, timing recovery, and a simplified MIMO equalizer as described in Appendix II with 7 taps per filter. All parameters are identical to [12].

Power consumption of the analog receiver is harder to estimate, since there is more variability in the choice of the functional block implementation and transistor technology. For instance, CMOS transistors would offer lower manufacturing costs, while bipolar transistors would offer improved linearity and lower power consumption. The most complex and power-hungry parts of the proposed analog circuitry are the analog mixers and XORs. Both can be realized using Gilbert cells [41], [55]. A 9-to-50-GHz Gilbert-cell down-conversion mixer built in 130-nm CMOS had a total power consumption of 97 mW [56], while a 25-75 GHz broadband Gilbert-cell mixer using 90-nm CMOS had a total power consumption of 93 mW [57]. Passive mixers would exhibit even lower power consumption. An EPLL implementation requires eight analog mixers, two XORs, four adders, two limiting amplifiers, two full-wave rectifiers, one comparator, one loop filter, and one QVCO. Under the conservative assumption that the power consumption of each individual component is equal to that of a Gilbert cell (93 mW in 90-nm CMOS), the aggregate power consumption of all functional blocks would be nearly 2 W. This estimate does not account for layout and interconnects, which typically double the power consumption of high-speed analog integrated circuits. Hence, we estimate that the power consumption of the high-speed analog electronics for an EPLL implementation would be nearly 4 W. More accurate estimates can only be obtained after circuit-level design, which is beyond the scope of this work. An OPLL-based DP-QPSK receiver and a DP-DQPSK receiver have even lower power consumption, as they do not require a de-rotation stage.
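The EPLL estimate is a tally of functional blocks, each conservatively assigned the 93 mW of the 90-nm Gilbert-cell mixer of [57], then doubled for layout and interconnect. The 21 blocks come to about 1.95 W, or about 3.9 W after doubling:

```python
blocks = {
    "analog mixer": 8, "XOR": 2, "adder": 4, "limiting amplifier": 2,
    "full-wave rectifier": 2, "comparator": 1, "loop filter": 1, "QVCO": 1,
}
gilbert_cell_w = 0.093                 # per-block power, assumed equal to [57]
n_blocks = sum(blocks.values())        # 21 functional blocks
core_w = n_blocks * gilbert_cell_w     # "nearly 2 W"
total_w = 2 * core_w                   # layout/interconnect doubling, "nearly 4 W"
```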

Other receiver operations such as polarization demultiplexing and CDR are also power-efficient. For instance, three phase shifting sections can have a total power consumption of approximately 75 mW [52]. Moreover, a 40 Gb/s CDR in 90 nm CMOS consumes 48 mW [13], excluding output buffers.

Although transmitter architectures are beyond the scope of this paper, DP transmitters can be significantly simplified, and their power consumption reduced, by leveraging advances in modulator materials and technologies [58]–[60]. Similar to the tradeoffs made for our data center receiver architecture, transmitters can further reduce power consumption and complexity by avoiding digital pulse shaping, digital pre-emphasis, and digital-to-analog converters (DACs).

## V. CONCLUSION

We proposed and evaluated DSP-free analog coherent receiver architectures for unamplified intra-data center links and amplified inter-data center links. We showed that, using a marker tone-based polarization demultiplexing scheme with an optical polarization controller, the analog coherent receiver can recover and track the transmitted polarization-multiplexed signals for a receiver operating at baseband. This technique can be extended to higher-order QAM formats such as 16-QAM and above, as well as to higher-order IM formats such as 4-PAM and above. We also showed how CR can be performed using a multiplier-free phase detector based on XOR gates and that its performance is within 0.5 dB of a Costas loop-based phase detector. Our proposed multiplier-free phase estimator is, however, limited to QPSK inputs. Finally, we showed that DSP-free analog coherent receivers would have a  $\sim 1$  dB penalty at small CD relative to their DSP-based counterparts. The SNR penalty for DSP-free systems increases quadratically with CD and reaches 5 dB at roughly  $\pm$ 35 ps/nm. The power consumption of polarization demultiplexing and high-speed electronics is estimated to be nearly 4 W in 90-nm CMOS. Moreover, the improved receiver sensitivity due to coherent detection would allow 40-km unamplified and eye-safe transmission of up to 49 DWDM channels near 1310 nm, potentially blending intra- and inter-data center applications.

## APPENDIX I MARKER TONE POLARIZATION DEMULTIPLEXING

To properly demultiplex the incoming, rotated polarizations, the polarization controller must essentially invert the fiber transfer matrix, so that  $T_{Controller}T_{Fiber} = I$, where  $I$  is the identity matrix. Minimization of the marker tone in the tributaries in which it is not transmitted leads to solutions that satisfy one of two pairs of equations,

$$\begin{aligned} \cos(\zeta)e^{j(\alpha_1+\alpha_0)} &= \cos(\theta)e^{-j(\varphi_1+\varphi_0)} \\ \sin(\zeta)e^{-j(\alpha_1-\alpha_0)} &= -\sin(\theta)e^{-j(\varphi_1-\varphi_0)}, \end{aligned} \qquad (13)$$

or,

$$\begin{aligned} \cos(\zeta)e^{j(\alpha_1+\alpha_0)} &= -\cos(\theta)e^{-j(\varphi_1+\varphi_0)} \\ \sin(\zeta)e^{-j(\alpha_1-\alpha_0)} &= \sin(\theta)e^{-j(\varphi_1-\varphi_0)}. \end{aligned} \qquad (14)$$

TABLE II POLARIZATION CONTROLLER VARIABLES THAT LEAD TO MARKER TONE DETECTION AS FUNCTIONS OF THE FIBER TRANSFER MATRIX VARIABLES

| $\theta$ ($T_{Controller}T_{Fiber} = I$) | $\varphi_0$ | $\varphi_1$ | $\theta$ ($T_{Controller}T_{Fiber} = -I$) | $\varphi_0$ | $\varphi_1$ |
|---|---|---|---|---|---|
| $-\zeta$ | $-\alpha_1$ | $-\alpha_0$ | $-\zeta$ | $-\alpha_1$ | $-\alpha_0 + \pi$ |
| $-\zeta$ | $-\alpha_1 + \pi$ | $-\alpha_0 + \pi$ | $-\zeta$ | $-\alpha_1 + \pi$ | $-\alpha_0$ |
| $-\zeta + \pi$ | $-\alpha_1$ | $-\alpha_0 + \pi$ | $-\zeta + \pi$ | $-\alpha_1$ | $-\alpha_0$ |
| $-\zeta + \pi$ | $-\alpha_1 + \pi$ | $-\alpha_0$ | $-\zeta + \pi$ | $-\alpha_1 + \pi$ | $-\alpha_0 + \pi$ |
| $\zeta$ | $-\alpha_1 - \frac{\pi}{2}$ | $-\alpha_0 + \frac{\pi}{2}$ | $\zeta$ | $-\alpha_1 + \frac{\pi}{2}$ | $-\alpha_0 + \frac{\pi}{2}$ |
| $\zeta$ | $-\alpha_1 + \frac{\pi}{2}$ | $-\alpha_0 - \frac{\pi}{2}$ | $\zeta$ | $-\alpha_1 - \frac{\pi}{2}$ | $-\alpha_0 - \frac{\pi}{2}$ |
| $\zeta + \pi$ | $-\alpha_1 + \frac{\pi}{2}$ | $-\alpha_0 + \frac{\pi}{2}$ | $\zeta + \pi$ | $-\alpha_1 - \frac{\pi}{2}$ | $-\alpha_0 + \frac{\pi}{2}$ |
| $\zeta + \pi$ | $-\alpha_1 - \frac{\pi}{2}$ | $-\alpha_0 - \frac{\pi}{2}$ | $\zeta + \pi$ | $-\alpha_1 + \frac{\pi}{2}$ | $-\alpha_0 - \frac{\pi}{2}$ |

Note that marker tone detection leads to the overall channel matrix being  $T_{Controller}T_{Fiber} = \pm I.$

The first pair of equations, (13), leads to the solutions shown in the first three columns of Table II. Each of these solutions properly inverts  $T_{Fiber}$  and leads to polarization demultiplexing, i.e.,  $T_{Controller}T_{Fiber} = I$.

The second pair of equations, shown by (14), leads to solutions shown in the second three columns of Table II. In this case, the overall transfer matrix is  $T_{Controller}T_{Fiber} = -I$ . This corresponds to the constellation in each polarization being rotated by 180°. Therefore, minimizing the unwanted marker tone amplitude results in polarization demultiplexing with 180° phase ambiguity. Nevertheless, as discussed in Section II-A, this phase ambiguity is not critical since the receiver already has to resolve a 90° phase ambiguity introduced by CR.

Changing any one of the three polarization controller variables by  $\pm \pi$  also switches the overall transfer matrix between  $-I$  and  $I$. Changing any two of the three polarization controller variables by  $\pm \pi$  preserves the overall transfer matrix, allowing phase shifters with finite excursion to be reset.
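The entries of Table II and the ±π reset rule can be checked numerically using only (13) and (14). The sketch below draws random fiber parameters $\zeta$, $\alpha_0$, $\alpha_1$, builds each table row from its offsets, and measures how well it satisfies the corresponding pair of equations; the row encoding and all names are illustrative.

```python
import cmath
import math
import random

P, H = math.pi, math.pi / 2
# Each row: (s, c_theta, c0, c1) with theta = s*zeta + c_theta,
# phi0 = -alpha1 + c0, phi1 = -alpha0 + c1, transcribed from Table II.
ROWS_PLUS_I = [(-1, 0, 0, 0), (-1, 0, P, P), (-1, P, 0, P), (-1, P, P, 0),
               (1, 0, -H, H), (1, 0, H, -H), (1, P, H, H), (1, P, -H, -H)]
ROWS_MINUS_I = [(-1, 0, 0, P), (-1, 0, P, 0), (-1, P, 0, 0), (-1, P, P, P),
                (1, 0, H, H), (1, 0, -H, -H), (1, P, -H, H), (1, P, H, -H)]


def residual(theta, phi0, phi1, zeta, a0, a1, sign):
    """Mismatch in (13) for sign=+1, or in (14) for sign=-1."""
    r1 = (math.cos(zeta) * cmath.exp(1j * (a1 + a0))
          - sign * math.cos(theta) * cmath.exp(-1j * (phi1 + phi0)))
    r2 = (math.sin(zeta) * cmath.exp(-1j * (a1 - a0))
          + sign * math.sin(theta) * cmath.exp(-1j * (phi1 - phi0)))
    return abs(r1) + abs(r2)


def controller(row, zeta, a0, a1):
    s, c_theta, c0, c1 = row
    return s * zeta + c_theta, -a1 + c0, -a0 + c1


random.seed(4)
zeta, a0, a1 = (random.uniform(-math.pi, math.pi) for _ in range(3))
worst_plus = max(residual(*controller(r, zeta, a0, a1), zeta, a0, a1, +1) for r in ROWS_PLUS_I)
worst_minus = max(residual(*controller(r, zeta, a0, a1), zeta, a0, a1, -1) for r in ROWS_MINUS_I)

# Shifting one variable by pi turns a +I solution into a -I one; shifting two keeps +I.
theta, phi0, phi1 = controller(ROWS_PLUS_I[0], zeta, a0, a1)
one_shift = residual(theta + P, phi0, phi1, zeta, a0, a1, -1)
two_shifts = residual(theta + P, phi0 - P, phi1, zeta, a0, a1, +1)
```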



Fig. 13. Block diagram of (a) CD and  $2 \times 2$  MIMO equalizers used in conventional coherent receivers, and (b) simplified equalizer for short-reach applications assuming small CD and small-DGD approximation.

## APPENDIX II SIMPLIFIED DSP-BASED COHERENT RECEIVER

Fig. 13(a) shows the block diagram of the equalization and polarization demultiplexing stages typically used in long-haul systems [61]. First, CD equalization is performed using nearly static frequency-domain equalizers with hundreds of taps. This is followed by a  $2 \times 2$  MIMO equalizer composed of filters with typically fewer than 15 taps, which is updated frequently in order to mitigate PMD and track changes in the received state of polarization.

The CD equalizers may be omitted if CD is small enough that the filters in the 2 × 2 MIMO equalizer can compensate for it. Moreover, note that if the skew between the two polarizations is much smaller than the sampling period, the coefficients of filter  $h_{11}$  are approximately proportional to those of  $h_{12}$, and similarly for filters  $h_{21}$  and  $h_{22}$. Hence, we can simplify the 2 × 2 MIMO equalizer as shown in Fig. 13(b), which nearly halves the required number of DSP operations compared to the 2 × 2 MIMO equalizer of Fig. 13(a). This simplification only holds when the mean differential group delay (DGD) between the two polarizations is much smaller than the sampling period, so that the two polarizations appear synchronized at the receiver. Assuming a sampling rate of 70 GS/s (oversampling ratio of 5/4) and a PMD coefficient of 0.1 ps/ $\sqrt{km}$  [9], the small-DGD approximation holds up to ~200 km.
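The ~200 km figure can be checked by comparing the mean DGD, which grows as the PMD coefficient times the square root of the fiber length, with the sample period at 70 GS/s. At 200 km the mean DGD is about 1.4 ps, roughly one tenth of the 14.3 ps sample period:

```python
import math

pmd_coeff_s_per_sqrt_km = 0.1e-12   # 0.1 ps/sqrt(km)
length_km = 200.0
sample_rate = 56e9 * 5 / 4          # 56 GBd x 5/4 oversampling = 70 GS/s

mean_dgd_s = pmd_coeff_s_per_sqrt_km * math.sqrt(length_km)   # about 1.41 ps
sample_period_s = 1 / sample_rate                              # about 14.3 ps
ratio = mean_dgd_s / sample_period_s                           # about 0.1
```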

In Fig. 13(b), the filters  $h_{11}$  and  $h_{22}$  mitigate ISI caused by CD, PMD, and component bandwidth limitations. The cross terms  $h_{12}$,  $h_{21}$  remove the Y component from X and vice versa. The filter coefficients can be updated using either the least-mean-squares (LMS) algorithm or the constant-modulus algorithm (CMA). The update equations are shown in Table III. Note that these equations assume a time-domain implementation. Since these filters are very short (7 taps in the simulations of Section IV), there is virtually no difference in efficiency between time-domain and frequency-domain implementations. Note that for systems with large CD, such as inter-data center links reaching up to 80 km, the CD equalizers cannot be omitted.

TABLE III UPDATE EQUATIONS USING CMA OR LMS ALGORITHM

| Algorithm | Error measure | Update equations |
|-----------|---------------|------------------|
| CMA | $e_1[n] = 2 - \lvert y_1[n] \rvert^2$ | $\boldsymbol{h}_{11} \leftarrow \boldsymbol{h}_{11} + \mu e_1[n] y_1[n] \boldsymbol{x}_1^*$; $\boldsymbol{h}_{12} \leftarrow \boldsymbol{h}_{12} + \mu e_1[n] y_1[n] \boldsymbol{h}_{11}^H \boldsymbol{x}_1^*$ |
| LMS | $e_1[n] = y_1 - [y_1]_D$ | $\boldsymbol{h}_{11} \leftarrow \boldsymbol{h}_{11} - 2\mu e_1[n] \boldsymbol{x}_1^*$; $\boldsymbol{h}_{12} \leftarrow \boldsymbol{h}_{12} - 2\mu e_1[n] \boldsymbol{h}_{11}^H \boldsymbol{x}_1^*$ |

Variables in boldface are vectors,  $[\cdot]_D$  denotes the decision operator,  $\mathbf{x}^*$  denotes the element-wise complex conjugate, and  $\mathbf{x}^H$  denotes the Hermitian transpose of a vector.
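The CMA update for $\boldsymbol{h}_{11}$ in Table III can be sketched for a single polarization as below. This is a minimal illustration under stated assumptions, not the simplified structure of Fig. 13(b): the cross filter $\boldsymbol{h}_{12}$ is omitted, the QPSK constellation is assumed to be $\pm1\pm j$ so that the CMA radius is 2 as in Table III, and the test channel $1 + 0.3z^{-1}$, the step size, and the function name are hypothetical. Once the filter converges, the constant-modulus dispersion $E[(\lvert y\rvert^2 - 2)^2]$ should fall well below its unequalized value.

```python
import numpy as np


def cma_equalize(x, n_taps=7, mu=1e-3, radius=2.0):
    """Time-domain CMA FIR equalizer for one polarization.

    y[n] = h^T x_n with x_n holding the newest sample first;
    e[n] = radius - |y[n]|^2;  h <- h + mu * e[n] * y[n] * conj(x_n).
    """
    h = np.zeros(n_taps, dtype=complex)
    h[n_taps // 2] = 1.0                    # center-spike initialization
    y = np.zeros(x.size - n_taps + 1, dtype=complex)
    for n in range(y.size):
        xn = x[n:n + n_taps][::-1]          # newest sample first
        y[n] = h @ xn
        e = radius - abs(y[n]) ** 2
        h += mu * e * y[n] * np.conj(xn)
    return y, h


rng = np.random.default_rng(5)
N = 20000
sym = rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)   # |s|^2 = 2
x = np.convolve(sym, [1.0, 0.3])[:N]        # hypothetical ISI channel 1 + 0.3 z^-1
y, h = cma_equalize(x)
disp_before = np.mean((np.abs(x) ** 2 - 2) ** 2)
disp_after = np.mean((np.abs(y[-5000:]) ** 2 - 2) ** 2)
```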

## ACKNOWLEDGMENT

The authors thank Prof. B. Murmann for helpful discussions and the anonymous reviewers for their comments and suggestions.

## REFERENCES

- M. Sharif, J. K. Perin, and J. M. Kahn, "Modulation schemes for singlelaser 100 Gb/s links: Single-carrier," *J. Lightw. Technol.*, vol. 33, no. 20, pp. 4268–4277, Oct. 2015.
- [2] J. K. Perin, M. Sharif, and J. M. Kahn, "Modulation schemes for singlewavelength 100 Gbit/s links: Multicarrier," *J. Lightw. Technol.*, vol. 33, no. 24, pp. 5122–5132, Dec. 2015.
- J. D'Ambrosia, "IEEE P802.3bs baseline summary," 2015. [Online]. Available: http://www.ieee802.org/3/bs/index.html
- [4] N. Eiselt *et al.*, "First real-time 400G PAM-4 demonstration for inter-data center transmission over 100 km of SSMF at 1550 nm," in *Proc. Opt. Fiber Commun. Conf.*, 2016, Paper W1K.5.
- [5] J. Krause Perin, M. Sharif, and J. M. Kahn, "Sensitivity improvement in 100 Gbit/s-per- wavelength links using semiconductor optical amplifiers or avalanche photodiodes," *J. Lightw. Technol.*, vol. 34, no. 33, pp. 5542–5553, Dec. 2016.
- [6] D. Che, Q. Hu, and W. Shieh, "High-spectral-efficiency optical direct detection using the stokes vector receiver," in *Proc. Eur. Conf. Opt. Commun.*, 2015, pp. 1–3.
- [7] M. Morsy-Osman, M. Chagnon, M. Poulin, S. Lessard, and D. V. Plant, "224-Gb/s 10-km transmission of PDM PAM-4 at 1.3 um using a single intensity-modulated laser and a direct-detection MIMO DSP-based receiver," *J. Lightw. Technol.*, vol. 33, no. 7, pp. 1417–1424, Apr. 2015.
- [8] L. Zhang *et al.*, "Beyond 100-Gb/s transmission over 80-km SMF using direct-detection SSB-DMT at C-Band," *J. Lightw. Technol.*, vol. 34, no. 2, pp. 723–729, Jan. 2016.
- [9] E. Ip, A. P. T. Lau, D. J. F. Barros, and J. M. Kahn, "Coherent detection in optical fiber systems," *Opt. Express*, vol. 16, no. 2, pp. 753–791, 2008.
- [10] K. Roberts *et al.*, "High capacity transport—100G and beyond," J. Lightw. Technol., vol. 33, no. 3, pp. 563–578, Feb. 2015.
- [11] C. Laperle and M. Osullivan, "Advances in high-speed DACs, ADCs, and DSP for optical coherent transceivers," *J. Lightw. Technol.*, vol. 32, no. 4, pp. 629–643, Feb. 2014.
- [12] B. S. G. Pillai *et al.*, "End-to-end energy modeling and analysis of longhaul coherent transmission systems," *J. Lightw. Technol.*, vol. 32, no. 18, pp. 3093–3111, Sep. 2014.
- [13] C. F. Liao and S. I. Liu, "40 Gb/s transimpedance-AGC amplifier and CDR circuit for broadband data receivers in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 43, no. 3, pp. 642–655, Mar. 2008.
- [14] M. Lu, H.-C. Park, E. Bloch, L. A. Johansson, M. J. Rodwell, and L. A. Coldren, "Highly integrated homodyne receiver for short-reach coherent communication," *Int. Photon. Optoelectron.*, 2015, Paper OT2A.4, doi: 0.1364/OEDI.2015.OT2A.4.
- [15] F. Gardner, *Phaselock techniques*, 3rd ed., John Wiley & Sons Inc., Hoboken, NJ, USA. doi: 10.1002/0471732699.

- [16] K. AbuGharbieh, M. Abdelfattah, T. Al-Maaita, and A. Tahboub, "A wide tuning range 11.8 GHz ring oscillator VCO with temperature and process compensation," in *Proc. IEEE EuroCon 2013*, Jul. 2013, pp. 1844–1848.
- [17] T. N. Huynh *et al.*, "200-Gb/s baudrate-pilot-aided QPSK/direct detection with single-section quantum-well mode-locked laser," *IEEE Photon. J.*, vol. 8, no. 2, Apr. 2016, Art. no. 7903107.
- [18] H. R. Rideout, J. S. Seregelyi, S. Paquet, and J. Yao, "Discriminator-aided optical phase-lock loop incorporating a frequency down-conversion module," *IEEE Photon. Technol. Lett.*, vol. 18, no. 22, pp. 2344–2346, Nov. 2006.
- [19] E. Ip and J. M. Kahn, "Feedforward carrier recovery for coherent optical communications," *J. Lightw. Technol.*, vol. 25, no. 9, pp. 2675–2692, Sep. 2007.
- [20] R. Noé, "Phase noise-tolerant synchronous QPSK/BPSK baseband-type intradyne receiver concept with feedforward carrier recovery," *J. Lightw. Technol.*, vol. 23, no. 2, pp. 802–817, Feb. 2005.
- [21] J. R. Barry and J. M. Kahn, "Carrier synchronization for homodyne and heterodyne detection of optical quadriphase-shift keying," *J. Lightw. Technol.*, vol. 10, no. 12, pp. 1939–1951, Dec. 1992.
- [22] I. Fatadin, D. Ives, and S. J. Savory, "Differential carrier phase recovery for QPSK optical coherent systems with integrated tunable lasers," *Opt. Express*, vol. 21, no. 8, pp. 10166–10171, 2013.
- [23] M. Kuschnerov *et al.*, "DSP for coherent single-carrier receivers," *J. Lightw. Technol.*, vol. 27, no. 16, pp. 3614–3622, Aug. 2009.
- [24] "ITU-T Recommendation G.652: Characteristics of a single-mode optical fibre and cable," 2009. [Online]. Available: https://www.itu.int/rec/T-REC-G.652/en.
- [25] H. Bülow *et al.*, "Measurement of the maximum speed of PMD fluctuation in installed field fiber," in *Proc. Tech. Digest. Opt. Fiber Commun. Conf. 1999, Int. Conf. Integr. Opt. Opt. Fiber Commun.*, 1999, pp. 83–85.
- [26] K. Choutagunta and J. M. Kahn, "Dynamic channel modeling for mode-division multiplexing," *J. Lightw. Technol.*, vol. 35, no. 12, pp. 2451–2463, Jun. 2017.
- [27] R. Noé, H. Heidrich, and D. Hoffmann, "Endless polarization control systems for coherent optics," *J. Lightw. Technol.*, vol. 6, no. 7, pp. 1199–1208, Jul. 1988.
- [28] N. G. Walker and G. R. Walker, "Endless polarisation control using four fibre squeezers," *Electron. Lett.*, vol. 23, no. 6, pp. 290–292, Mar. 1987.
- [29] H. Heidrich, C. H. von Helmolt, D. Hoffmann, H. J. Hensel, and A. Kleinwächter, "Polarisation transformer on Ti:LiNbO<sub>3</sub> with reset-free optical operation for heterodyne/homodyne receivers," *Electron. Lett.*, vol. 23, no. 7, pp. 335–336, Mar. 1987.
- [30] H. Fukuda, K. Yamada, T. Tsuchizawa, T. Watanabe, H. Shinojima, and S. Itabashi, "Polarization beam splitter and rotator for polarization-independent silicon photonic circuit," in *Proc. 4th IEEE Int. Conf. Group IV Photon.*, 2007, pp. 1–3.
- [31] C. R. Doerr, N. K. Fontaine, and L. L. Buhl, "PDM-DQPSK silicon receiver with integrated monitor and minimum number of controls," *IEEE Photon. Technol. Lett.*, vol. 24, no. 8, pp. 697–699, Apr. 2012.
- [32] C. K. Madsen *et al.*, "Reset-free integrated polarization controller using phase shifters," *IEEE J. Sel. Topics Quantum Electron.*, vol. 11, no. 2, pp. 431–438, Mar./Apr. 2005.
- [33] C. R. Doerr and L. Chen, "Monolithic PDM-DQPSK receiver in silicon," in *Proc. Eur. Conf. Opt. Commun.*, 2010, pp. 1–3.
- [34] "General Photonics: Reset-free polarization stabilizer PolaStay<sup>TM</sup> specifications." [Online]. Available: http://www.generalphotonics.com/wpcontent/uploads/2015/04/POS-20X.pdf
- [35] R. Noé, "Endless polarisation control in coherent optical communications," *Electron. Lett.*, vol. 22, no. 15, pp. 772–773, Jul. 1986.
- [36] H. Shimizu and K. Kaede, "Endless polarisation controller using electro-optic waveplates," *Electron. Lett.*, vol. 24, no. 7, pp. 412–413, Mar. 1988.
- [37] P. Dong, C. Xie, L. Chen, L. L. Buhl, and Y. Chen, "112-Gb/s monolithic PDM-QPSK modulator in silicon," *Opt. Express*, vol. 20, no. 26, pp. 624–629, 2012.
- [38] "ITU-T Recommendation G.709: Interfaces for the optical transport network," 2016. [Online]. Available: http://www.itu.int/rec/T-REC-G.709/
- [39] Y. Zhou *et al.*, "Thermo-optic switch based on photonic crystal nanobeam cavities," *Photon. Res.*, vol. 5, no. 2, pp. 108–112, 2017.
- [40] H. Park *et al.*, "40 Gbit/s coherent optical receiver using a Costas loop," *Opt. Express*, vol. 20, no. 26, pp. 197–203, 2012.
- [41] E. Bloch, "Millimeter-wave CMOS and InP front-end ICs for optical and wireless high data-rate communication," Ph.D. dissertation, Dept. Elect. Eng., Technion - Israel Institute of Technology, Haifa, Israel, 2014.
- [42] M. A. Grant, W. C. Michie, and M. J. Fletcher, "The performance of optical phase-locked loops in the presence of nonnegligible loop propagation delay," *J. Lightw. Technol.*, vol. 5, no. 4, pp. 592–597, Apr. 1987.
- [43] L. G. Kazovsky, "Balanced phase-locked loops for optical homodyne receivers: Performance analysis, design considerations, and laser linewidth requirements," *J. Lightw. Technol.*, vol. 4, no. 2, pp. 182–195, Feb. 1986.
- [44] S. Ristic, A. Bhardwaj, M. J. Rodwell, L. A. Coldren, and L. A. Johansson, "An optical phase-locked loop photonic integrated circuit," *J. Lightw. Technol.*, vol. 28, no. 4, pp. 526–538, Feb. 2010.
- [45] V. K. Prabhu, "PSK performance with imperfect carrier phase recovery," *IEEE Trans. Aerosp. Electron. Syst.*, vol. AES-12, no. 2, pp. 275–286, Mar. 1976.
- [46] M. E. Frerking, *Digital Signal Processing in Communication Systems*. Norwell, MA, USA: Kluwer, 1993.
- [47] G. P. Agrawal, *Fiber-Optic Communication Systems*. New York, NY, USA: Wiley, 2002.
- [48] R. F. Pawula, S. O. Rice, and J. H. Roberts, "Distribution of the phase angle between two vectors perturbed by Gaussian noise," *IEEE Trans. Veh. Technol.*, vol. 50, no. 2, pp. 576–583, Mar. 2001.
- [49] S. J. Savory, "Digital coherent optical receivers: Algorithms and subsystems," *IEEE J. Sel. Topics Quantum Electron.*, vol. 16, no. 5, pp. 1164–1179, Sep./Oct. 2010.
- [50] *Optical Fibres, Cables and Systems*. Geneva, Switzerland: International Telecommunication Union, 2009.
- [51] L. N. Langley *et al.*, "Packaged semiconductor laser optical phase-locked loop (OPLL) for photonic generation, processing and transmission of microwave signals," *IEEE Trans. Microw. Theory Tech.*, vol. 47, no. 7, pp. 1257–1264, Jul. 1999.
- [52] N. C. Harris *et al.*, "Efficient, compact and low loss thermo-optic phase shifter in silicon," *Opt. Express*, vol. 22, no. 9, pp. 83–85, 2014.
- [53] Y. Zhang *et al.*, "Ultra-compact and highly efficient silicon polarization splitter and rotator," *APL Photonics*, vol. 1, no. 9, pp. 091304-1–091304-6, 2016, doi: 10.1063/1.4965832.
- [54] S. Farwell *et al.*, "InP coherent receiver chip with high performance and manufacturability for CFP2 modules," in *Proc. Opt. Fiber Commun. Conf.*, 2014, Paper W1I.6.
- [55] H. Park, "High speed integrated circuits for high speed coherent optical communications," Ph.D. dissertation, Dept. Elect. Eng., University of California, Santa Barbara, CA, 2014.
- [56] C.-S. Lin, P.-S. Wu, H.-Y. Chang, and H. Wang, "A 9–50-GHz Gilbert-cell down-conversion mixer in 0.13-μm CMOS technology," *IEEE Microw. Wireless Components Lett.*, vol. 16, no. 5, pp. 293–295, May 2006.
- [57] J. H. Tsai, P. S. Wu, C. S. Lin, T. W. Huang, J. G. J. Chern, and W. C. Huang, "A 25–75 GHz broadband Gilbert-cell mixer using 90-nm CMOS technology," *IEEE Microw. Wireless Components Lett.*, vol. 17, no. 4, pp. 247–249, Apr. 2007.
- [58] B. Milivojevic *et al.*, "112-Gb/s DP-QPSK transmission over 2427 km SSMF using small-size silicon photonic IQ modulator and low-power CMOS driver," in *Proc. Opt. Fiber Commun. Conf.*, Anaheim, CA, 2013, pp. 1–3, doi: 10.1364/OFC.2013.OTh1D.1.
- [59] R. Palmer *et al.*, "Low power Mach-Zehnder modulator in silicon-organic hybrid technology," *IEEE Photon. Technol. Lett.*, vol. 25, no. 13, pp. 1226–1229, Jul. 2013.
- [60] S. Lange *et al.*, "Low power InP-based monolithic DFB-laser IQ modulator with SiGe differential driver for 32-GBd QPSK modulation," *J. Lightw. Technol.*, vol. 34, no. 8, pp. 1678–1682, Apr. 2016.
- [61] E. Ip and J. M. Kahn, "Digital equalization of chromatic dispersion and polarization mode dispersion," *J. Lightw. Technol.*, vol. 25, no. 8, pp. 2033–2043, Aug. 2007.

Authors' biographies not available at the time of publication.