Comparison of Advanced Control Strategies Applied to a Multiple-Degrees-of-Freedom Wave Energy Converter: Nonlinear Model Predictive Controller versus Reinforcement Learning

Haider, Ali S.; Bubbar, Kush; McCall, Alan

doi:10.3390/jmse11112120

Open AccessArticle

Comparison of Advanced Control Strategies Applied to a Multiple-Degrees-of-Freedom Wave Energy Converter: Nonlinear Model Predictive Controller versus Reinforcement Learning

by

Ali S. Haider

^1,*

,

Kush Bubbar

¹

and

Alan McCall

²

¹

System-Model Development Engineering Lab, University of New Brunswick, Fredericton, NB E3B 5A3, Canada

²

Dehlsen Associates, LLC, Santa Barbra, CA 93101, USA

^*

Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2023, 11(11), 2120; https://doi.org/10.3390/jmse11112120

Submission received: 2 October 2023 / Revised: 30 October 2023 / Accepted: 1 November 2023 / Published: 6 November 2023

(This article belongs to the Special Issue Optimization and Energy Maximizing Control Systems for Wave Energy Converters II)

Download

Browse Figures

Versions Notes

Abstract

:

Achieving energy maximizing control of a Wave Energy Converter (WEC) not only needs a comprehensive dynamic model of the system—including nonlinear hydrodynamic effects and nonlinear characteristics of Power Take-Off (PTO)—but to treat the entire system using an integrated approach, i.e., as a cyber–physical system considering the WEC dynamics, control strategy, and communication interface. The resulting energy-maximizing optimization formulation leads to a non-quadratic and nonstandard cost function. This article compares the (1) Nonlinear Model Predictive Controller (NMPC) and (2) Reinforcement Learning (RL) techniques as applied to a class of multiple-degrees-of-freedom nonlinear WEC–PTO systems subjected to linear as well as nonlinear hydrodynamic conditions in simulation, using the WEC-Sim™ toolbox. The results show that with an optimal choice of RL agent and hyperparameters, as well as suitable training conditions, the RL algorithm is more robust under more stringent operating requirements, for which the NMPC algorithm fails to converge. Further, RL agents are computationally efficient on real-time target machines with a significantly reduced Task Execution Time (TET).

Keywords:

energy maximizing control; nonlinear model predictive control; cyber–physical modeling; wave energy converter; reinforcement learning; nonlinear viscous drag; non-ideal power take-off

1. Introduction

Renewable energy technologies offer a feasible, sustainable, and green solution to increasing global energy needs, and the ocean offers an immense, untapped resource of energy with the potential to become an integral part of the world’s energy mix [1,2]. The prospect of ocean wave energy has triggered researchers to explore techniques to maximize energy capture [3] for wave energy converters under operating conditions deviating from ideality, to include practical PTO system constraints [4] and the nonlinear hydrodynamics effects of ocean waves. Energy maximization for a WEC system is in practice a multi-objective optimization problem, requiring considerations of the physical geometry of the WEC, the PTO system design, the mooring system design, the ocean conditions of the deployment site, the communication interface, and the control methodology.

On the control front, Model Predictive Control (MPC) yields superior overall system performance for wave energy converters because it optimizes energy capture while enforcing the electro-mechanical operating limits of the system [5]. MPC is a constrained online optimal control strategy that forecasts future trajectories of the system dynamics to solve an optimization program over a receding horizon window and determine the best instantaneous control action to maximize the output power of the WEC. The MPC algorithm uses an internal model of the plant to predict the system’s future states. However, WEC systems are increasingly growing in complexity [6], and there is a need for the control algorithm to handle the resulting non-ideal operating conditions. MPC algorithms suffer convergence issues under stringent non-ideal operating conditions, due to the limitations of the complex online optimization algorithm. These convergence issues become more prominent as the complexity of the optimization problem increases, due to the inclusion of multiple-DoF PTO mechanisms. The performance of the MPC algorithm is also vulnerable when incorporating nonlinearities such as viscous drag effects and nonlinear hydrodynamic forces.

Moreover, the MPC controller typically does not consider cyber-related issues, such as communication latency and packet loss between the real-time target machine that implements the controller and the WEC hardware. These factors contribute not only to the degree of optimality of the MPC solution but also to the degree of convergence of the optimal control problem. There is always an intrinsic limitation of the mathematical model of a WEC when simulating a real-world system, and if the internal plant prediction is too simple, the MPC optimization algorithm generates a poor solution under non-ideal conditions and may even become unstable.

Reinforcement Learning (RL) is a data-driven, goal-oriented, computational technique. In the RL approach, a computer interacts with a given unknown dynamic system through the RL inputs (i.e., observations and reward) and RL outputs (i.e., actions). During these interactions, the RL approach trains an agent to perform a task based on a reward from the environment [7]. Given a suitable training environment and RL agent structure, the agent can be trained for any practical environment. For a given energy-maximizing problem for a WEC, if the training environment includes effects such as non-ideal PTO behavior, communications interface latencies, and nonlinear hydrodynamic responses, then the trained RL agent learns to maximize the reward (i.e., optimization objective) in the presence of these effects. The adaptability of the RL approach to a given environment has led to an uptake in the usage of this technique for energy-maximizing problems for WEC systems. For example, an RL approach based on the Q-learning approach is presented to maximize the energy extraction in regular and irregular sea states for a point-absorber-type wave energy converter in [8], where the controller damping and stiffness are adjusted based on a reward function. Resistive control of a realistic WEC model using an RL approach based on a least-squares policy iteration is presented in [9]. A nonlinear reactive control strategy for a two-body point-absorber wave energy converter using the Q actor–critic learning method is presented for a two-body 1-DoF point absorber in [10]. A deep-RL-agent-based real-time control is presented in [11] for a 1-DoF heaving point absorber under a linear environment and is compared with a linear MPC.

This work presents the energy-maximizing control of a 2-DoF WEC array device for the digital twin of Dehlsen Associates’ three-pod CENTIPOD™ device [12]. The optimizing objective is to maximize the energy harnessed by PTO machines in heave and pitch axes, subject to the electro-mechanical constraints of the system. The objective function is a nonlinear and non-quadratic function of PTO current, heave velocity, and pitch velocity, considering the practical electric machine loss characteristics of the PTOs. Moreover, the wave energy converter model includes nonlinear hydrodynamic effects due to the quadratic drag of fluid, yielding a WEC model with nonlinear dynamics. To enact the energy-maximizing control of the WEC plant, we designed two controllers: (1) a Nonlinear MPC (NMPC) and (2) an RL-agent-based controller. For NMPC design, we extended the approach in [13] to two degrees of freedom, exploiting the technique of pseudo-quadratization using the ACADO Toolkit [14]. The WEC plant is modeled in surge–heave–pitch degrees of freedom using Cummin’s equation, where the radiation force convolution terms are approximated by state-space models [15]. For array devices, more thorough energy-based modeling approaches are possible, such as the port-Hamiltonian approach [16]; however, for this study, array effects and body-to-body interactions are neglected. On the RL side, we trained a Deep Deterministic Policy Gradient (DDPG) RL agent for the heave and pitch degrees of freedom. The simulation results of (1) NMPC and (2) RL are compared under the operation of the device in linear sea conditions as well as with the nonlinear hydrodynamics effects enabled in WECSim™ [17]. The WEC digital twin is simulated on an emulator machine and interfaced with the controller/training machine over EtherCAT and Universal Datagram Port (UDP) buses.

2. Developing Time-Domain Equations of the WEC

This work is related to the investigation of the power capture performance of advanced controllers for three-pod Centipod devices made by Dehlsen Associates, LLC (the multi-pod CENTIPOD) [12]. Figure 1 shows a 35th-Froude-scaled model of the WEC. However, for this work, a full-scale WECSIM [17] based digital twin of the Centipod device is considered, as shown in Figure 2; this is an array of three floating bodies (pods) that are free to heave and pitch against reaction bodies (spars) attached to a single submerged backbone structure, which is moored with three taut lines. The backbone structure is the main contributor of reaction damping to the PTO, as well as providing a stable common junction point for multiple pod–spars mechanisms. The backbone is taut-moored to the seabed, as shown in Figure 2. The pods have linear direct-drive permanent magnetic AC generator PTO machines in the heave axis and rotatory direct-drive permanent magnetic AC generator PTO machines in the pitch axes. For this study, body-to-body radiation coupling between pods is ignored; it will be evaluated in future research.

As per the multi-body dynamics convention for the floating pods, subscripts “1”, “3”, and “5” denote surge, heave, and pitch axes, respectively. Table 1 lists the variables and their descriptions, which are used in WEC dynamics.

2.1. Dynamic Model of WEC in Surge, Heave, and Pitch Axes

The orientation of the Centipod device in Figure 2 with respect to incoming waves in the surge direction results in negligible roll, sway, and yaw displacements of the pods; hence, it is adequate to consider the surge–pitch–heave model of each pod for energy capture considerations. The floating pods in Figure 2 are modeled as point-absorber bodies. Heave motion is very weakly coupled to surge and pitch; hence, this coupling effect can be ignored. In the local frame of reference, the Cummins equations for the three axes of freedom (surge–pitch–heave) are

M_{11} {\dot{v}}_{1} + A_{15} (\infty) {\dot{v}}_{5} = - F_{r, 11} (t) - F_{r, 15} (t) - F_{v, 1} (t) + F_{e, 1} (t),

(1)

M_{33} {\dot{v}}_{3} (t) = - F_{r, 33} (t) - F_{h s, 3} (t) - F_{v, 3} (t) - F_{p, 3} (t) + F_{e, 3} (t),

(2)

M_{55} {\dot{v}}_{5} + A_{51} (\infty) {\dot{v}}_{1} = - F_{r, 55} (t) - F_{r, 51} (t) - F_{v, 5} (t) - F_{h s, 5} (t) - F_{p, 5} (t) + F_{e, 5} (t)

(3)

Here,

M_{i i} = (m + A_{i i} (\infty))

. The radiation force, buoyancy restoring force, and fluid quadratic damping terms in (1) through (3), respectively, are given by

F_{r, i j} (t) = \int_{- \infty}^{t} K_{i j} (t - τ) v_{j} d τ,

(4)

F_{h s, i} (t) = C_{i} x_{i},

(5)

F_{v, i} (t) = C_{d, i} v_{i} |v_{i}| .

(6)

The time-domain convolution integral term in (4) can be transformed into the frequency-domain expression

Z_{p q} (j ω) V_{q} (j ω)

through the application of the Fourier transform. The frequency-domain hydrodynamic parameters for the Centipod hull geometry without the mooring system are determined using the WAMIT™ (Version 7.201-x64) software package [18]. A single-body WEC intrinsic impedance

Z_{p q} (j ω)

[19] is calculated using these hydrodynamic parameters plotted in Figure 3, Figure 4 and Figure 5. A minimal-order transfer function in Laplace space is approximated for

Z_{p q} (j ω)

using system identification techniques, and an equivalent state-space representation can be formulated as in [15,20]. After performing algebraic manipulations, the final state-space model of the plant developed in [20] is given by

\dot{X} = A X + B_{p} F_{p} + B_{v} F_{v} + B_{e} F_{e},

(7)

where

X

is the state vector and

F_{p} = {[\begin{matrix} F_{p, 5} & F_{p, 3} \end{matrix}]}^{T}

(8)

F_{v} = {[\begin{matrix} F_{v, 1} & F_{v, 5} & F_{v, 3} \end{matrix}]}^{T}

(9)

F_{e} = {[\begin{matrix} F_{e, 1} & F_{e, 5} & F_{e, 3} \end{matrix}]}^{T}

(10)

The state matrix and input matrices in (7) are given by (11) and (12) with appropriate systems parameter constants

m_{i j}, a_{i}

, and

b_{i}

.

2.2. Non-Ideal Power Take-Off Model

The power take-off machine for the heave axis is a Linear Universal Modular Actuator/Absorber (LUMA) machine [21]. For the pitch-axis, PTO comprises a direct-drive permanent magnet AC generator. The non-ideal power take-off model is taken from the case study in [20], where the PTO power capture is a function of PTO force and velocity with system parameter constants

c_{i}

given by (13).

\begin{array}{l} A = [\begin{matrix} \begin{matrix} 0 & 0 & - m_{15} C_{5} & - m_{11} & 0 & - m_{11} & 0 & - m_{15} & 0 & - m_{15} & 0 \\ 0 & 0 & - m_{55} C_{5} & - m_{51} & 0 & - m_{51} & 0 & - m_{55} & 0 & - m_{55} & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ b_{3} & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ b_{4} & 0 & 0 & a_{3} & a_{4} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & b_{5} & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & b_{6} & 0 & 0 & 0 & a_{5} & a_{6} & 0 & 0 & 0 & 0 \\ 0 & b_{7} & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & b_{8} & 0 & 0 & 0 & 0 & 0 & a_{7} & a_{8} & 0 & 0 \\ b_{9} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ b_{10} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & a_{9} & a_{10} \end{matrix} & 0_{11 \times 4} \\ 0_{4 \times 11} & \begin{matrix} 0 & \frac{{- C}_{3}}{M_{33}} & \frac{- 1}{M_{33}} & 0 \\ 1 & 0 & 0 & 0 \\ b_{1} & 0 & 0 & 1 \\ b_{2} & 0 & a_{1} & a_{2} \end{matrix} \end{matrix}] \\ = [\begin{matrix} 0 & 0 & 1.4 & - 3.8 e^{- 6} & 0 & - 3.8 e^{- 6} & 0 & 6.7 e^{- 8} & 0 & 6.7 e^{- 8} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & - 2.8 & 6.7 e^{- 8} & 0 & 6.7 e^{- 8} & 0 & - 1.3 e^{- 7} & 0 & - 1.3 e^{- 7} & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 2.2 e^{5} & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ - 4.7 e^{5} & 0 & 0 & - 5.3 & - 2.2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 5.4 e^{5} & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & - 7 e^{5} & 0 & 0 & 0 & - 3.3 & - 1.3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 2.4 e^{6} & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & - 2.4 e^{6} & 0 & 0 & 0 & 0 & 0 & - 2.6 & - 1.1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 5.4 e^{5} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ - 6.9 e^{5} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & - 3.3 & - 1.3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & - 2.1 & - 1.2 e^{- 6} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 6.3 e^{5} & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & - 8.9 e^{5} & 0 & - 1.5 & - 1.5 \end{matrix}] \end{array}

(11)

B_{p} = [\begin{matrix} {- m}_{15} & 0 \\ {- m}_{55} & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & \frac{- 1}{M_{33}} \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{matrix}] = [\begin{matrix} 6.7 e^{- 8} & 0 \\ - 1.3 e^{- 7} & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1.2 e^{- 6} \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{matrix}], B_{e} = - B_{v} = [\begin{matrix} m_{11} & m_{15} & 0 \\ m_{51} & m_{55} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \frac{1}{M_{33}} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}] = [\begin{matrix} 3.8 e^{- 6} & - 6.7 e^{- 8} & 0 \\ - 6.8 e^{- 8} & 1.3 e^{- 7} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1.2 e^{- 6} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}]

(12)

\begin{matrix} P_{E, i} = c_{0, i} F_{p, i} v_{i} - (\begin{matrix} c_{1, i} F_{p, i}^{6} + c_{2, i} F_{p, i}^{5} + c_{3, i} F_{p, i}^{4} + c_{4, i} F_{p, i}^{3} + c_{5, i} F_{p, i}^{2} + c_{6, i} F_{p, i} + c_{7, i}) \end{matrix} \end{matrix},

(13)

3. Nonlinear MPC Design for WEC

A given NMPC problem optimizes a manipulated variable

u \subseteq w

to maximize some cost function

P

of a set of system variables

“ w ”

while respecting the given system constraints. A general class of NMPC problems has been formulated in [13], in which the cost function takes on a nonlinear piecewise polynomial form. Considering the case of finite-horizon optimization, we can mathematically describe the NMPC problem of such a class as (Table 2 and Table 3)

\underset{u}{m a x} P (w) = \{\begin{matrix} P_{1} (w) + ρ_{N, 1} (w), & w_{k} < R_{1} \\ P_{2} (w) + ρ_{N, 2} (w), & R_{1} \leq w_{k} \leq R_{2} \\ ⋮ & ⋮ \\ P_{j} (w) + ρ_{N, j} (w), & R_{j - 1} \leq w_{k} \leq R_{j} \end{matrix},

(14)

4. RL Agent Design for 2-DoF Heave–Pitch PTOs

Similar to Section 3, we propose a method to optimize the overall electrical power captured by the 2-DoF PTO, in this case through designing an appropriate RL agent for our problem. For continuous action and observation spaces, the typical options for the candidate agents are

Deep Deterministic Policy Gradient (DDPG).
Twin-Delayed Deep Deterministic policy gradient (TD3).
Proximal Policy Optimization (PPO).
Soft Actor-Critic (SAC).

Regarding increasing complexity, the DDPG is the simplest compatible agent, followed by TD3, PPO, and SAC. TD3 is an improved, more complex version of DDPG, while PPO has more stable updates but requires more training [22]. On the other hand, SAC is an improved and more complex version of DDPG that generates stochastic policies. We utilize the DDPG for our problem, typically the first choice for problems with continuous action and observation spaces [7].

4.1. RL DDPG Agent Reward Function and Properties

DDPG-based control aims to maximize the PTO power capture while respecting the PTO velocity limits. The observation consists of the pod velocity, and the action is the PTO force. Training is performed offline for this study. To specify a reward function to train our RL DDPG, we propose using a modified version of (14) below, which includes a penalty term for the agent for exceeding the velocity limits of the PTO mechanisms:

R e w a r d_{i} = k_{p} \frac{1}{2} h_{i}^{T} (2 W_{i}) h_{i} - k_{v_{i}} (|v_{i}| > v_{i, m a x})

(15)

Here,

k_{p}

and

k_{v_{i}}

are some appropriate scaling factors. We have designed two separate RL agents for the pitch and heave control because these DoFs are decoupled. The DDPG agent options for both DoFs are given in Table 4.

4.2. Design of Actor and Critic Deep Networks for RL Training

The DDPG-based RL algorithm requires the critic and actor neural networks to implement the optimal policy by generating actions in response to the given observations. A critic neural network predicts the discounted value of the cumulative long-term reward by looking at the observations and actions, and an agent neural network implements the RL policy to produce actions to maximize the predicted discounted cumulative long-term reward [22]. An experience-based design choice of the deep network structures for RL actor and critic is shown in Figure 6. Based on extensive training trials, an RL-Q-value representation for the critic network is finally selected for both heave and pitch, and the other hyperparameter choice for the critic network is given in Table 5.

An RL-deterministic representation is chosen for the actor network for both heave and pitch, based on extensive training trials; the other hyperparameters chosen for the actor network are provided in Table 6.

4.3. RL Agent Training

To train the RL agent, we generate training data by simulating a full-scale version of the 2-DoF version of Dehlsen’s three-pod CENTIPOD WEC of Figure 1 in an emulator machine, by using its WEC-Sim model, as shown in Figure 7. The mean of a cluster of sea states used to execute the WEC-Sim simulation are described in Table 7, which were selected based on the geographic location and wave resources at the deployment site at PacWave [23,24]; the corresponding wave spectrum is shown in Figure 8.

The strategy to train the RL agent is shown schematically in Figure 9. An emulator machine simulates the real-time digital twin of the plant in Simulink/WEC-Sim., which is connected to another real-time controller target machine via an Ethernet/Universal Datagram Packet (UDP) link which runs the RL training algorithm in MATLAB/Simulink. A MATLAB script to train the RL algorithm in the controller machine establishes the connection between the two machines for each training episode and trains the agent for the environment marked in Figure 9.

The Simulink model used to implement the RL agent is shown in Figure 10. A separate RL agent is implemented for heave and pitch DoFs for each of the three pods of Centipod WEC in Figure 2. For this study, body-to-body interactions and array effects are neglected, and all pods are assumed to be identical; therefore, the DDPG RL agent is trained for the heave and pitch axes for one pod, and the trained policy is deployed for each PTO in Figure 10. The training policy is implemented in a Speedgoat Performance real-time target machine (Intel Core 3.1 GHz, 4-core, 8 GB). The velocity data (observations) of each pod are collected over the UDP link, parsed, and observed by respective policies to generate control actions, which are packed and transmitted to the WEC emulator machine over the UDP channel, as shown in Figure 10.

Two agents are trained for each DoF, one for the unconstrained PTO force and the other with a

40 k N

upper bound on the PTO force magnitude. The three pods in Figure 10 are identical, so agents are trained for a single pod and the resulting agents are duplicated for the other two pods. The training stats for the heave agents for Pod 1 with constrained and unconstrained PTO forces are shown in Figure 11a and Figure 11b, respectively. In either case, RL training converges in about

40

episodes with a

100 s

simulation time for each episode. The training stats for the pitch agents for Pod 1 with constrained and unconstrained PTO forces are shown in Figure 12a and Figure 12b, respectively. The constrained force pitch agent converges in

25

episodes, and the uncontained force pitch agent converges in

60

episodes.

5. Results

The training setup shown in Figure 9 is also used for executing system simulations with the trained agents. When executing simulations, the agents in Figure 10 are replaced with the trained agents, and the training machine in Figure 9 plays the role of a controller machine. A controller model similar to the RL model in Figure 9 is developed to simulate the NMPC by extending the scheme in [13] to the NMPC designed in Section 4, as shown in Figure 13. The RL and NMPC controllers in Figure 10 and Figure 13, respectively, are tested with the WEC-Sim model of Figure 10 running on the emulator machine in Figure 9, with the same sea-state parameters given in Table 7. Tests are run with constrained and unconstrained PTO force conditions. Each case is simulated with linear wave conditions as well as with nonlinear buoyancy and Froude–Krylov excitations enabled in WECSim. The mechanical velocity constraints of the power take-off machines are enforced as follows:

|h e a v e v e l o c i t y| \leq 2 m / s

and

|p i t c h v e l o c i t y| \leq 0.5 r a d / s

.

The moving averaged electrical power outputs with nonlinear MPC and RL are plotted in heave and pitch for Pod 1 with unconstrained and constrained PTO force cases subject to linear wave conditions, as shown in Figure 14 and Figure 15, respectively. It is also important to evaluate the performance of the two algorithms under nonlinear buoyancy and Froude–Krylov wave excitations in WEC-Sim. The average electrical heave and pitch power outputs for Pod 1 with unconstrained and constrained PTO force cases under nonlinear hydrodynamic conditions are shown in Figure 16 and Figure 17, respectively.

The results for the PTO power capture performance in heave and pitch from Figure 14 through Figure 17 are summarized in Table 8. The summary of the computational performance statistics for the NMPC and RL-DDGP control algorithms is given in Table 9, where the average Task Execution Time (TET) for each algorithm and RL agent training times for the PTO machines in heave and pitch axis are listed.

6. Discussion

The observations of the moving mean of electrical output power from the PTO mechanisms in Figure 14, Figure 15, Figure 16 and Figure 17 reveal an improvement in the power output in the case of the RL-DPPG agent compared to the NMPC. This observation may be attributed to the fact that the operation of the NMPC is based upon the prediction model of the WEC plant, which, from the definition of the NMPC method, is an approximate representation of the actual process. On the other hand, the RL-DPPG agent was trained on the actual process in Figure 9 and observed the full process dynamics to determine how to act accordingly. In the constrained linear hydrodynamic cases in Figure 14b and Figure 15b, the performance of the RL-DPPG very closely resembles the performance of NMPC. However, in the cases with unconstrained PTO forces (Figure 14a and Figure 15a), nonlinear effects in the process dynamics become prominent as large PTO force magnitudes emerge and the NMPC performance degrades. This degradation can be attributed to the NMPC algorithm being susceptible to unmodelled or poorly modeled nonlinearities. The plot of the instantaneous electrical power output for the heave DoF is shown in Figure 18a, which corresponds to the average power output plot in Figure 14a. A significantly improved RL-DPPG control strategy operation is evident compared to the NMPC.

The performance degradation of the NMPC is attributed to the unmodeled process nonlinearities and becomes fully visible when observing the system’s operation under nonlinear hydrodynamic wave conditions in WEC-Sim, as detailed in Figure 19 and Figure 17a. The NMPC performs poorly in these figures, and the controller becomes unstable. The plot of instantaneous electrical power output for heave is shown in Figure 18b, corresponding to the average power output plot in Figure 19 under unconstrained PTO force conditions. It can be observed in Figure 18b that the NMPC algorithm fails to converge after around 170 s because it is unable to respond appropriately to nonlinear wave hydrodynamic conditions. On the other hand, RL-DPPG remains stable under the same nonlinear wave conditions. The heave PTO force plots corresponding to Figure 18b for NMPC and RL-DPPG are shown in Figure 19.

In Figure 19 the unstable output of the NMPC can be observed. The performance of the RL-DPPG remains stable and robust against process uncertainties in the same conditions. This might also be attributed to the fact that no underlying online optimization problem is being solved at every time step in the RL-DPPG policy deployment stage as opposed to the case of the NMPC, which also explains the significantly reduced TET in Table 9 for the RL-DDPG agent compared to the NMPC.

7. Conclusions

This article presents a comparison of two strategies, (a) NMPC and (b) RL-DPPG, for controlling the power-capture dynamics of the nonlinear WEC device by Dehlsen Associates’ (the three-pod CENTIPOD WEC), with a PTO operating simultaneously in the heave axis and pitch axis. A state-space model of the WEC plant is formulated, including nonlinear quadratic viscous drag, and we consider a case study PTO model with a non-quadratic cost function. Two controllers are designed to optimize power capture from the PTO, (a) NMPC and (b) RL-DPPG, by training agents for the PTO machines in the heave and pitch axis. Both control algorithms are tested against the same simulated WEC model in WEC-Sim running on an external emulator machine. The heave and pitch PTO power output results are obtained for the linear wave conditions as well as with nonlinear buoyancy, and Froude–Krylov excitations are enabled in WECSim for cases where the PTO force is constrained or unconstrained. Results depict a significant enhancement in the performance of the proposed RL-DDPG algorithm when compared to the NMPC controller, based on various performance metrics, including a reduction in the Task Execution Time (TET), an increase in the power extraction, an improvement in the robust operation when subject to exogenous conditions, and more overall flexibility and ease of design.

Author Contributions

Conceptualization, A.S.H. and A.M.; methodology, A.S.H., K.B. and A.M.; software, A.M. and K.B.; validation, A.S.H., A.M. and K.B.; formal analysis, A.S.H. and K.B.; investigation, A.S.H., A.M. and K.B.; resources, A.M. and K.B.; data curation, A.M.; writing—original draft preparation, A.S.H. and K.B.; writing—review and editing, A.S.H., A.M. and K.B.; visualization, A.S.H., K.B. and A.M.; supervision, K.B. and A.M.; project administration, K.B. and A.M.; funding acquisition, K.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the McCain Foundation postdoctoral fellowship award in innovation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors appreciate technical support from Dehlsen Associates, LLC, California, and the provision of experimental data for the PTO machines from McCleer Power, Michigan.

Conflicts of Interest

The authors declare no conflict of interest.

References

Muetze, A.; Vining, J.G. Ocean Wave Energy Conversion—A Survey. In Proceedings of the Conference Record of the 2006 IEEE Industry Applications Conference Forty-First IAS Annual Meeting, Tampa, FL, USA, 8–12 October 2006; Volume 3, pp. 1410–1417. [Google Scholar]
Richter, M.; Magana, M.E.; Sawodny, O.; Brekken, T.K.A. Nonlinear Model Predictive Control of a Point Absorber Wave Energy Converter. IEEE Trans. Sustain. Energy 2013, 4, 118–126. [Google Scholar] [CrossRef]
Genest, R.; Ringwood, J.V. A Critical Comparison of Model-Predictive and Pseudospectral Control for Wave Energy Devices. J. Ocean Eng. Mar. Energy 2016, 2, 485–499. [Google Scholar] [CrossRef]
Falcão, A.F.O.; Henriques, J.C.C. Effect of Non-Ideal Power Take-off Efficiency on Performance of Single- and Two-Body Reactively Controlled Wave Energy Converters. J. Ocean Eng. Mar. Energy 2015, 1, 273–286. [Google Scholar] [CrossRef]
Brekken, T.K.A. On Model Predictive Control for a Point Absorber Wave Energy Converter. In Proceedings of the 2011 IEEE Trondheim PowerTech, Trondheim, Norway, 19–23 June 2011; pp. 1–8. [Google Scholar]
Bubbar, K.; Buckham, B.; Wild, P. A Method for Comparing Wave Energy Converter Conceptual Designs Based on Potential Power Capture. Renew. Energy 2018, 115, 797–807. [Google Scholar] [CrossRef]
What Is Reinforcement Learning?—MATLAB & Simulink—MathWorks United Kingdom. Available online: https://uk.mathworks.com/help/reinforcement-learning/ug/what-is-reinforcement-learning.html (accessed on 2 October 2023).
Anderlini, E.; Forehand, D.I.M.; Bannon, E.; Xiao, Q.; Abusara, M. Reactive Control of a Two-Body Point Absorber Using Reinforcement Learning. Ocean. Eng. 2018, 148, 650–658. [Google Scholar] [CrossRef]
Control of a Realistic Wave Energy Converter Model Using Least-Squares Policy Iteration. Available online: https://ieeexplore.ieee.org/document/7911321 (accessed on 2 October 2023).
Zadeh, L.G.; Glennon, D.; Brekken, T.K.A. Nonlinear Control Strategy for a Two-Body Point Absorber Wave Energy Converter Using Q Actor-Critic Learning. In Proceedings of the 2020 IEEE Conference on Technologies for Sustainability (SusTech), Santa Ana, CA, USA, 23–25 April 2020; pp. 1–5. [Google Scholar]
Anderlini, E.; Husain, S.; Parker, G.G.; Abusara, M.; Thomas, G. Towards Real-Time Reinforcement Learning Control of a Wave Energy Converter. J. Mar. Sci. Eng. 2020, 8, 845. [Google Scholar] [CrossRef]
Rij, J.; Yu, Y.-H.; McCall, A.; Coe, R.G. Extreme Load Computational Fluid Dynamics Analysis and Verification for a Multi-Body Wave Energy Converter. In Proceedings of the International Conference on Offshore Mechanics and Arctic Engineering. American Society of Mechanical Engineers, Glasgow, UK, 9–14 June 2019. [Google Scholar]
Haider, A.S.; Brekken, T.K.A.; McCall, A. A State-of-the-Art Strategy to Implement Nonlinear Model Predictive Controller with Non-Quadratic Piecewise Discontinuous Cost Index for Ocean Wave Energy Systems. In Proceedings of the 2020 IEEE Energy Conversion Congress and Exposition (ECCE), Detroit, MI, USA, 11–15 October 2020; pp. 1873–1878. [Google Scholar]
Houska, B.; Ferreau, H.J.; Diehl, M. An Auto-Generated Real-Time Iteration Algorithm for Nonlinear MPC in the Microsecond Range. Automatica 2011, 47, 2279–2285. [Google Scholar] [CrossRef]
Lu, H.; Chang, S.; Chen, C.; Fan, T.; Chen, J. Replacement of Force-to-Motion Relationship with State–Space Model for Dynamic Response Analysis of Floating Offshore Structures. Appl. Ocean. Res. 2022, 119, 102977. [Google Scholar] [CrossRef]
Barradas-Berglind, J.J.; Muñoz-Arias, M.; Wei, Y.; Prins, W.A.; Vakis, A.I.; Jayawardhana, B. Towards Ocean Grazer’s Modular Power Take-Off System Modeling: A Port-Hamiltonian Approach. IFAC-PapersOnLine 2017, 50, 15663–15669. [Google Scholar] [CrossRef]
WEC-Sim (Wave Energy Converter SIMulator)—WEC-Sim Documentation. Available online: https://wec-sim.github.io/WEC-Sim/ (accessed on 27 March 2021).
Wamit, Inc. The State of the Art in Wave Interaction Analysis. Available online: https://www.wamit.com/ (accessed on 28 March 2021).
Falnes, J. Wave-Energy Conversion through Relative Motion between Two Single-Mode Oscillating Bodies. J. Offshore Mech. Arct. Eng. 1999, 121, 32–38. [Google Scholar] [CrossRef]
Haider, A.S.; Brekken, T.K.A.; McCall, A. Real-Time Nonlinear Model Predictive Controller for Multiple Degrees of Freedom Wave Energy Converters with Non-Ideal Power Take-Off. J. Mar. Sci. Eng. 2021, 9, 890. [Google Scholar] [CrossRef]
Technology—Centipod. Available online: https://centipodwave.com/technology/ (accessed on 24 October 2023).
Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. arXiv 2019. [Google Scholar] [CrossRef]
PacWave—TESTING WAVE ENERGY FOR THE FUTURE. Available online: https://pacwaveenergy.org/ (accessed on 24 October 2023).
Dunkle, G.; Zou, S.; Robertson, B. Wave Resource Assessments: Spatiotemporal Impacts of WEC Size and Wave Spectra on Power Conversion. Energies 2022, 15, 1109. [Google Scholar] [CrossRef]

Figure 1. 35th-scale Centipod wave energy converter by Dehlsen Associates, LLC.

Figure 2. Degrees of freedom for dynamic modeling of Centipod WEC: (a) baseline configuration and (b) model with mooring lines.