Article

Constrained DNN-Based Robust Model Predictive Control Scheme with Adjustable Error Tube

College of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
*
Author to whom correspondence should be addressed.
Symmetry 2023, 15(10), 1845; https://doi.org/10.3390/sym15101845
Submission received: 17 August 2023 / Revised: 22 September 2023 / Accepted: 26 September 2023 / Published: 29 September 2023
Figure 1. A two-dimensional depiction of the control maneuver. The regions in the graph are as follows: $G_{1-}$ corresponds to the region where $e \le -e_0$ and $e_c \ge e_0'$; $G_{1+}$ depicts the area where $e \ge e_0$ and $e_c \le -e_0'$; $G_{2+}$ characterizes the area where $|e| \le e_0$ and $e_c \ge e_0'$; $G_{2-}$ highlights the territory where $|e| \le e_0$ and $e_c \le -e_0'$; $G_{3+}$ showcases the domain where $e \ge e_0$ and $e_c \le |e_0'|$; $G_{3-}$ marks the domain where $e \le -e_0$ and $e_c \le |e_0'|$; $G_{4+}$ exemplifies the region where $|e_c| \le e_0'$ and $e \ge e_0$; $G_{4-}$ describes the space where $|e_c| \le e_0'$ and $e \le -e_0$; $G_5$ specifies the domain where $|e| < e_0$ and $|e_c| < e_0'$.
Figure 2. Architecture of the deep neural network.
Figure 3. The constrained DNN structure expanded by Algorithm 1.
Figure 4. Flowchart of the feedback processing mechanism of the control synthesis.
Figure 5. Structure of the constrained DNN-based robust model predictive control scheme with an adjustable error tube.
Figure 6. Nominal state trajectories under various DNN architectures.
Figure 7. State trajectories of the proposed algorithm (N = 12). The green polytopes depict the error tube at each sampling time; the dark gray area represents the 0-step homothetic tube controllability set $X_0$; the gray area marks the undesirable state region.
Figure 8. State convergence comparison between Algorithm 1 and the HTMPC (N = 25). (a) Curves of $x_1$ obtained by implementing the two control algorithms; (b) curves of $x_2$ obtained by implementing the two control algorithms.
Figure 9. Control input convergence comparison between Algorithm 1 and the HTMPC (N = 25).
Figure 10. Computational efficiency comparison between Algorithm 1 and the HTMPC (N = 50). (a) Statistics of the computation time for Algorithm 1; (b) statistics of the computation time for the HTMPC.
Figure 11. Euclidean norm of the state error under various DNN architectures.
Figure 12. State convergence comparison between Algorithm 1 and the HTMPC (N = 30). (a)–(d) Curves of the four state components obtained by implementing the two control algorithms.
Figure 13. Control input convergence comparison between Algorithm 1 and the HTMPC (N = 30). (a) Curves of control input $v_1$ for the nominal system obtained by employing the two control algorithms; (b) curves of control input $v_2$ for the nominal system obtained by employing the two control algorithms.

Abstract

This paper proposes a novel robust model predictive control (RMPC) scheme for constrained linear discrete-time systems with bounded disturbance. Firstly, an adjustable error tube set, which is governed by the local error and the error variation rate, is introduced to overcome uncertainties and disturbances. Secondly, an auxiliary control law associated with the cost function is designed to minimize the discrepancy between the actual system and the nominal system. Finally, a constrained deep neural network (DNN) architecture with symmetry properties is developed to address the optimal control problem (OCP) of the constrained system, together with a thorough convergence analysis. These innovations enable more flexible adjustment of the state and control tube cross-sections and significantly improve optimization speed compared with homothetic tube MPC. Moreover, the effectiveness and practicability of the proposed optimal control strategy are illustrated by two numerical simulations. In practical terms, for 2-D systems, this approach achieves a 726.23-fold improvement in optimization speed, and for 4-D problems it demonstrates an even larger 7218.07-fold enhancement.

1. Introduction

Over the past few years, RMPC has gained wide acceptance in practical applications, including trajectory tracking, industrial process control, and energy systems [1,2,3]. The successful implementation of RMPC in these various branches is due to its prominent advantages. In particular, RMPC provides an integrated solution for controlling systems with model uncertainty, additive disturbance, and constraints. Theoretically, this feature has attracted remarkable attention to the analysis and synthesis of different forms of RMPC. As a result, several RMPC algorithms have been investigated in the literature [4,5,6,7].
Recently, application requirements concerning practical constraints and the implementation environment have prompted increasing attention to new orientations of RMPC. For instance, the increasing demand for algorithms underscores the need to integrate optimization performance and control robustness, propelling the development of tube-based MPC (TMPC) [8,9,10]. The deployment of tubes brings forth a set of strictly set-theoretic strategies for RMPC synthesis, which allow a computationally efficient treatment of uncertainties and their interaction with the system dynamics and constraints. In [11,12], for a class of linear systems with bounded disturbance and convex constraints, the nominal system was separated from the actual system by adopting a separation control strategy. Notably, the conservatism of the proposals employing this construction in [13,14] was caused by deploying fixed tube cross-section shape sets. To mitigate this conservatism, the homothetic tube model predictive control (HTMPC) strategy proposed in [15,16] explored the impact of disturbances by constructing locally accurate reachable sets centered around nominal system trajectories. In light of these developments, the concept of HTMPC emerged as an enhanced and more adaptable framework for RMPC synthesis. Among the array of control schemes considered, HTMPC stands out as an improved and more versatile option; what sets it apart is its capacity to parameterize the cross-sections of the state tube and control tube in terms of associated centers and scaling sequences. This study further investigates this concept by considering variations in the state error and changes in the value of the cost function during error adjustment when designing the tube size controller and auxiliary control law, which distinguishes it from the previous literature [15] that incorporates scaling vector optimization into the OCP, thereby increasing computational complexity, and aims to optimize the scaling vectors to a specific value. However, it is essential to note that an inherent drawback of the HTMPC approach lies in its computational complexity, which grows significantly with an increasing number of constraints, as measured by the proliferation of polytopic regions.
Furthermore, the issue of computational complexity is generally associated with dynamic programming in the presence of constraints and uncertainties, which inspires the development of parameterized RMPC [17,18,19]. The parameterized optimization problem is commonly approximated using neural network (NN) or DNN to enhance computational efficiency [20,21]. Certain studies even turned to symmetric neural networks (SNNs) due to their unique properties [22]. SNNs, characterized by symmetric weight initialization and activation functions, demonstrated their ability to accelerate convergence and improve the robustness of neural network-based approaches [23]. Some studies adopted an offline approach to generate nominal systems [24,25]. While effective in reducing online computation time, this method leans toward a more conservative control strategy in highly uncertain scenarios, necessitating a trade-off with control performance. Additionally, other studies considered system uncertainty by establishing a linear variable parameter system model [26,27]. This approach facilitates adaptive learning to address system changes and uncertainties, making it better suited for handling variations in the variable parameter system. However, applying this technique in complex, large-scale systems demands substantial computational resources for the training and inference of DNN, potentially leading to real-time control delays. As the field of online learning technology continues to mature, its integration with RMPC holds promise for enhancing the real-time capabilities and scalability of the control scheme. Notably, previous studies employed reinforcement learning techniques to solve linear quadratic regulator and MPC problems, providing convergence proofs for associated issues [28,29]. Advanced deep reinforcement learning algorithms further demonstrated their potential within an RMPC framework, emphasizing the iterative interaction between optimal control actions and performance indices [30,31]. These instances underscore the capacity of online learning techniques to address quadratic programming problems. Therefore, the integration of online learning techniques, including deep neural networks (DNNs) with a symmetric architecture, holds immense potential in enhancing the real-time capabilities and scalability of robust model predictive control (RMPC). Our proposed approach, which leverages the computational power of GPUs for real-time acquisition of time-varying nominal system information, not only ensures real-time control performance, but also optimizes efficiency.
Building upon the above research, it is not difficult to find that a promising approach involves incorporating tubes with increased degrees of freedom into the optimization process while employing function approximation and online learning techniques within the framework of RMPC to enhance computational efficiency. The main contributions of the paper are three-fold:
  • A fuzzy-based tube size controller is investigated to adjust the local error tube-scaling vector. Specifically, the controller is designed by considering the state error between the nominal and actual systems; the error and error variation rate bounds are then established, and the fuzzy IF-THEN rules are derived. Tightened sets on the state error are developed to satisfy the system constraints in the presence of external disturbances and model uncertainties.
  • An auxiliary control law pertaining to the scaling vector of the error tube holds greater significance. The auxiliary control law effectively mitigates interference impact on the system by considering variations in the system’s cost function.
  • A theoretically rigorous and technically achievable framework for RMPC with online parameter estimation, based on a constrained DNN with symmetry properties to improve computational performance, is developed: the OCP is defined based on the parameters of online learning; the DNN structure is expanded using Dykstra's projection algorithm to ensure the feasibility of the successor state and control input; and a time-varying nominal system is generated based on the aforementioned content to fulfill the requirements of system robustness.
The remainder of this paper is organized as follows: preliminaries and the problem formulation are considered in Section 2. In Section 3, a novel RMPC scheme is developed based on the fuzzy-based tube size controller and the constrained DNN algorithm. Section 4 provides two numerical examples to illustrate the feasibility and effectiveness of the proposed control scheme. In Section 5, conclusions are drawn.

2. Preliminaries and Problem Formulation

2.1. Nomenclatures

The set of non-negative reals is denoted by $\mathbb{R}_+$; $\mathbb{N}$ is the sequence of non-negative integers, $\mathbb{N} \triangleq \{0, 1, 2, \ldots, N\}$. For a set $A$ and a real matrix $M$ of compatible dimensions, the image of $A$ under $M$ is denoted by $MA = \{Ma : a \in A\}$. Given two subsets $C$ and $B$ of $\mathbb{R}^n$ and $x \in \mathbb{R}^n$, the Minkowski set addition is defined by $C \oplus B \triangleq \{c + b \mid c \in C,\ b \in B\}$ and the Minkowski set subtraction is defined by $C \ominus B \triangleq \{c \mid \{c\} \oplus B \subseteq C\}$; $x \oplus C$ is substituted for $\{x\} \oplus C$. For $M > 0$ and $x \in \mathbb{R}^n$, define $\|x\|_M^2 = x^T M x$. The distance of a point $x \in \mathbb{R}^n$ from a point $z \in \mathbb{R}^n$ is denoted by $d(x, z) = |x - z|$. $\mathrm{Conv}\{\cdot\}$ denotes the convex hull of the elements in $\{\cdot\}$. For an unknown vector $v$, the notation $v^*$ represents its optimal value.
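As a concrete illustration of the Minkowski addition defined above, the following short sketch (Python/NumPy, purely illustrative and not part of the original paper) forms the candidate vertices of $C \oplus B$ from the vertex lists of two polytopes; taking the convex hull of these points yields an exact vertex representation of the sum.

```python
import numpy as np

# Minkowski sum of two polytopes given by vertex arrays: every vertex of
# C (+) B is the sum of a vertex of C and a vertex of B, so the pairwise
# sums (followed by a convex hull) give an exact representation.
C = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])                  # triangle
B = np.array([[-0.1, -0.1], [0.1, -0.1], [0.1, 0.1], [-0.1, 0.1]])  # small box
candidates = (C[:, None, :] + B[None, :, :]).reshape(-1, 2)
print(candidates.shape)  # 12 candidate vertices of the Minkowski sum
```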

2.2. Problem Formulation

Consider a discrete-time linear system with bounded disturbance (actual system) in the form of
$x_{k+1} = A x_k + B u_k + w_k, \quad k \in \mathbb{N}$,  (1)
where $N$ is the horizon length, $x_k \in \mathbb{R}^n$ and $u_k \in \mathbb{R}^m$ are the state vector and the control input of the actual system subject to the bounded disturbance $w_k$, and $w_k \in \mathbb{R}^n$ takes values in the set $W \subset \mathbb{R}^n$. $x_{k+1}$ denotes the successor state of the actual system. The system variables are selected such that the following constraints are satisfied:
$x_k \in X \subset \mathbb{R}^n, \quad u_k \in U \subset \mathbb{R}^m, \quad w_k \in W \subset \mathbb{R}^n, \quad k \in \mathbb{N}$,  (2)
where $X$ and $U$ are compact and convex and contain the origin as an interior point, and the compact set $W$ contains the origin.
Let the nominal (reference) system without any disturbance corresponding to (1) be defined by
$z_{k+1} = A z_k + B v_k, \quad k \in \mathbb{N}$,  (3)
where $z_k \in \mathbb{R}^n$ and $v_k \in \mathbb{R}^m$ are the state and control input of the nominal system, respectively, without accounting for any uncertainty. $z_{k+1}$ denotes the desired value of the successor state of the system (1).
The state error is represented as
$e_k = x_k - z_k, \quad k \in \mathbb{N}$.  (4)
Assumption 1.
  • The matrix pair $(A, B) \in \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times m}$ is known and stabilizable;
  • The state $x_k$ can be measured at each sampling time;
  • The current disturbance $w_k \in W$ and the future disturbances $w_{k+i} \in W$, $i = 1, 2, \ldots, N-1$, are unknown and can take arbitrary values.
In this paper, the fixed shape set of the error tube is denoted by $E$. For any non-empty set $E \subset \mathbb{R}^n$, the error tube is a sequence of sets $\mathbf{E}_N = \{E_k\}$, where $E_k$ is given by
$E_k = \alpha_k E, \quad k \in \mathbb{N}, \ \text{with} \ \alpha_k \in \mathbb{R}_+$,  (5)
where $\alpha_k$ is the scaling vector. Meanwhile, for each relevant $k \in \mathbb{N}$, the state tube $\mathbf{X}_N$ and control tube $\mathbf{U}_{N-1}$ corresponding to HTMPC [18] are indirectly determined by
$\mathbf{X}_N = \{z_k(e_{k-1})\} \oplus \mathbf{E}_N, \quad k \in \mathbb{N}$,  (6)
$\mathbf{U}_{N-1} = \{v_k(e_k)\} \oplus K \mathbf{E}_N, \quad k \in \mathbb{N}$,  (7)
where $\{z_k(e_{k-1})\}$ and $\{v_k(e_k)\}$ are the sequences of state tube and control tube centers determined by the state error $e$, and $K \in \mathbb{R}^{m \times n}$ is the disturbance rejection gain [32]. The corresponding control policy is a sequence of control laws $\Pi_{N-1} = \{\pi_k(e_k, E_k, U_k)\}$ with
$\forall e_k \in \alpha_k E, \quad \pi_k(e_k, E_k, U_k) = v_k(e) + K e_k, \quad k \in \mathbb{N}_{N-1}$.  (8)
Referring to Equations (5)–(8), it is clear that, given the set $E$, the error tube $\mathbf{E}_N$, state tube $\mathbf{X}_N$, control tube $\mathbf{U}_{N-1}$, and control policy $\Pi_{N-1}$ are determined by the sequences $\{e_k \in \mathbb{R}^n\}$ and $\{v_k \in \mathbb{R}^m\}$. Consequently, we introduce the decision variable $\varphi_N = (e_0, \ldots, e_N, v_0, \ldots, v_{N-1}) \in \mathbb{R}^{N(n+m+1)}$.
Subsequently, the OCP $\mathbb{P}_N(e)$ is defined by
$V_N^0(e) = \inf_{\varphi_N} \{V_N(\varphi_N) : \varphi_N \in \Phi_N(e)\}$,  (9)
$d_N^0(e) = \arg\inf_{\varphi_N} \{V_N(\varphi_N) : \varphi_N \in \Phi_N(e)\}$,  (10)
where the cost function $V_N(\cdot)$ is defined by
$V_N(\varphi_N) = \sum_{k=0}^{N-1} \ell(e_k, v_k) + V_f(e_N)$,  (11)
with
$\ell(e_k, v_k) = \|e_k\|_{Q_e}^2 + \|v_k\|_{Q_v}^2, \quad k \in \mathbb{N}$,  (12)
and
$V_f(e_N) = \|e_N\|_P^2$.  (13)
Here, $\ell(e_k, v_k)$ is the stage cost, which is employed to achieve the desired control performance, and the terminal cost $V_f(e_N)$ ensures stability and recursive feasibility. $Q_e \in \mathbb{R}^{n \times n}$, $Q_v \in \mathbb{R}^{m \times m}$, and $P \in \mathbb{R}^{n \times n}$ are known positive definite symmetric matrices. For any $x_k \in X \subset \mathbb{R}^n$, the set of permissible decision variables $\varphi_N$ corresponds to the value of the set-valued map $\Phi_N(e) := \{\varphi_N : (14)\text{–}(20) \ \text{hold for all} \ k \in \mathbb{N}_{N-1}\}$, where
$e_0 \in E$,  (14)
$(A + BK)e \oplus W \subseteq \alpha_{k+1} E$,  (15)
$e_k \in \alpha_k E$,  (16)
$\{\upsilon(e_k)\} \subseteq U \ominus \alpha_k K E$,  (17)
$\{z(e_{k-1})\} \subseteq X \ominus \alpha_k E$,  (18)
$\{A x_k + B u_k\} \oplus W \subseteq \{z(e_k)\} \oplus \alpha_{k+1} E$,  (19)
$\mathbf{E}_N \subseteq E_f$,  (20)
where $E_f \subset \mathbb{R}^{n+1}$ is the terminal constraint set [33] for $\mathbb{P}_N(e)$.
Similar to the tube MPC principle [8], if $z_k$ satisfies $z_k \in X \ominus \alpha_{k+1} E$ and $v_k$ satisfies $v_k \in U \ominus K E$, then the constraints imposed on the actual system state, $x \in X$, and control input, $u \in U$, are also met. In this work, the determination of $z_k$ is related to $e_{k-1}$, while the determination of $v_k$ is concerned with $e_k$; thus, both conditions $\{z(e_{k-1})\} \subseteq X \ominus \alpha_k E$ and $\{\upsilon(e_k)\} \subseteq U \ominus \alpha_k K E$ must be satisfied. Furthermore, at step $N$, if $\mathbf{E}_N$ fulfills the terminal constraint $\mathbf{E}_N \subseteq E_f$ (Equation (21) provides the formulation and limitation of $E_f$), the system state complies with the requirement $x_N \in X$.
Constraints (15) and (19) represent the set dynamics of the error tube and the homothetic state tube, respectively, which contribute to the dynamic relaxation in [8]. In addition, the terminal constraint set $E_f$ satisfies the following constraint:
$(A + BK) E_f \subseteq E_f$.  (21)
The performance evaluation of the terminal control necessitates the definition of a 0-step homothetic tube controllability set $X_0$ [15], which must satisfy the following constraint:
$X_0 = \mathrm{Proj}_n \{(x, z, \alpha) : z \in X \ominus \alpha E, \ K z \in U \ominus \alpha E\}$,  (22)
where $\mathrm{Proj}_n(Z)$ denotes the projection of a set $Z \subseteq \mathbb{R}^{n+m}$ onto $\mathbb{R}^n$, i.e., $\mathrm{Proj}_n(Z) = \{x \in \mathbb{R}^n : \exists y \in \mathbb{R}^m \ \text{such that} \ (x, y) \in Z\}$.

2.3. Controller Synthesis

The objective of this paper is to design an optimal control policy $u_k$ that, for any given initial state error $e_0$, not only renders the local state $x_k$ asymptotically track the reference state $z_k$ (namely, $e_k$ asymptotically approaches zero) but also minimizes the OCP. The problem of solving the conventional control policy $u_k$ of (1) is converted into finding the nominal optimal control input $v_k(e)$ and designing an appropriate disturbance rejection gain $K$ while ensuring that the constraints related to $\alpha_k$ are satisfied.
The controller synthesis for the proposed RMPC scheme is specified as
$u_k = \upsilon(e_k) + K e_k, \quad k \in \mathbb{N}$,  (23)
where $u_k$ is the control action obtained from the presented method. The ancillary control law is denoted by $K e_k$, which keeps the local state $x_k$ within the error tube centered around the trajectory of $z_k$; $\upsilon(e_k)$ is the output obtained by online learning with the state error as input.
Consider the error system obtained by combining Equations (1), (3), and (4):
$e_{k+1} = A e_k + B(u_k - v_k) + w_k, \quad k \in \mathbb{N}$,  (24)
where $e_{k+1}$ is the successor state error. The system (24) is rewritten as
$e_{k+1} = (A + BK) e_k + w_k, \quad k \in \mathbb{N}$.  (25)
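As a quick numerical illustration of the closed-loop error dynamics (25), the sketch below (Python/NumPy) propagates $e_{k+1} = (A + BK)e_k + w_k$ under a box-bounded disturbance. The matrices, the gain $K$, and the disturbance bound are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Minimal sketch: closed-loop error dynamics e_{k+1} = (A + B K) e_k + w_k
# under a box-bounded disturbance ||w||_inf <= w_max.
rng = np.random.default_rng(0)

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.5],
              [1.0]])
K = np.array([[-0.6, -1.2]])   # assumed stabilizing gain for A + BK
w_max = 0.1

Acl = A + B @ K                # closed-loop error dynamics matrix
assert np.max(np.abs(np.linalg.eigvals(Acl))) < 1.0, "K must stabilize A + BK"

e = np.array([1.0, -0.5])      # initial state error e_0
for k in range(30):
    w = rng.uniform(-w_max, w_max, size=2)   # bounded disturbance sample
    e = Acl @ e + w                          # Equation (25)
print("final error norm:", np.linalg.norm(e))
```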

3. DNN-Based RMPC with a Fuzzy-Based Tube Size Controller

This section presents the design of the novel RMPC scheme, which incorporates updates to the scaling and policy iteration for nominal control. The innovative RMPC framework consists of a fuzzy-based tube size controller and a constrained DNN-based nominal RMPC component. The former calculates the error tube-scaling vector by considering both the state error and the error variation rate, while the latter determines a sequence of constraints associated with the scaling to ensure optimal control policy generation. Concurrently, the DNN-based nominal RMPC offers a time-varying nominal system with enhanced computational efficiency. Moreover, by incorporating variations in the cost function value into the auxiliary control law design, it effectively mitigates the adverse effects of interference on the system.

3.1. Error Tube and Constraint Satisfaction

In this work, fuzzy control is used to estimate (predict) the corresponding error tube-scaling vector $\alpha_k$, which keeps the OCP $\mathbb{P}_N(e)$ computationally feasible. More importantly, an auxiliary control law $K e_k$ pertaining to the scaling vector of the error tube holds greater significance: by accounting for variations in the system's cost function, it effectively mitigates the impact of interference on the system.
Assumption 2.
  • The error tube cross-section shape set $E \subset \mathbb{R}^n$ (i.e., an outer invariant approximation of the minimal robust positively invariant set [34]) is compact and convex and contains the origin, such that $\{(A + BK)e : e \in E\} \oplus W \subseteq \alpha_{k+1} E$, $k \in \mathbb{N}_{N-1}$;
  • The state tube cross-section shape set $\mathcal{Z}$ is given by $\mathcal{Z} = \mathrm{Conv}\{z(e) : e \in E\}$;
  • The control tube cross-section shape set $\mathcal{V}$ is given by $\mathcal{V} = \mathrm{Conv}\{v(e) : e \in E\}$.
If $E$ satisfies Assumption 2, then for any established $\alpha_k$ it holds that $e_k \in \alpha_k E$. Further, the nominal state and control input are indirectly restrained as $z(e) \in \mathcal{Z}$ and $v(e) \in \mathcal{V}$. It is clear that if $e_k \in \alpha_k E$ and $v(e) \in \mathcal{V}$, then the satisfaction of the original constraint $u \in U$ for $w \in W$ is guaranteed by the control scheme $u_k(e) = \upsilon(e_k) + K e_k$.
Next, the fuzzy-based tube size controller is employed to estimate the error tube scaling; it generates the scaling vector by considering the local error and the error variation rate. The components of the fuzzy controller [35] include a set of fuzzy IF-THEN rules and a fuzzy inference engine. The fuzzy inference engine utilizes the IF-THEN rules to map the input error $e \in \mathbb{R}^n$ and error variation rate $e_c \in \mathbb{R}^n$ to an output variable $\alpha$. The lower and upper bounds of $e$ and $e_c$ are denoted by $\pm e_0$ and $\pm e_0'$, respectively. Furthermore, the two-dimensional graph comprising $e$ and $e_c$ is divided into nine distinct regions, as depicted in Figure 1; the upper and lower limits of $e$ and $e_c$ define these regions. Each region, denoted $G_i$, corresponds to a specific IF-THEN rule. Based on the provided input, the fuzzy controller determines the region of the graph in which a given pair of values of $e$ and $e_c$ is located and then employs the IF-THEN rules to calculate the appropriate scaling variable. Taking $G_{3+}$ as an illustrative example: when $e \ge e_0$ and $e_c \le |e_0'|$, the system's state error exhibits a relatively high positive deviation that increases gradually. In such circumstances, the controller outputs a diminished value of $\alpha$, ensuring that the system's state error exhibits a tightening trend.
To be specific, fuzzy IF-THEN rules are written as
  • ($G_1$). IF $e \le -e_0$ and $e_c \ge e_0'$, or $e \ge e_0$ and $e_c \le -e_0'$, THEN $\alpha$ takes on a smaller value;
  • ($G_2$). IF $|e| \le e_0$ and $e_c \ge e_0'$, or $|e| \le e_0$ and $e_c \le -e_0'$, THEN $\alpha$ takes on a slightly larger value;
  • ($G_3$). IF $e \ge e_0$ and $e_c \le |e_0'|$, or $e \le -e_0$ and $e_c \le |e_0'|$, THEN $\alpha$ takes a value as small as possible;
  • ($G_4$). IF $|e_c| \le e_0'$ and $e \ge e_0$, or $|e_c| \le e_0'$ and $e \le -e_0$, THEN $\alpha$ takes on a larger value;
  • ($G_5$). IF $|e| < e_0$ and $|e_c| < e_0'$, THEN $\alpha$ takes a value as large as possible.
For convenience, let the universe of $e$ be $a \sim b$ and the universe of $e_c$ be $c \sim d$. The membership degree function is taken as the triangular function. Then, the singleton fuzzifier and the center-average defuzzifier [36] are used to calculate the output $\alpha$ from the feedback values of $e$ and $e_c$ in the form of
$\alpha = \dfrac{\sum_{i=1}^{5} \eta_i \mu_i(e, e_c)}{\sum_{i=1}^{5} \mu_i(e, e_c)}$,  (26)
where $\mu_i(\cdot)$ is the membership degree of the five cases mentioned above and $\eta_i$ is an adjustable weight parameter of $\alpha$ under the different contexts. Afterward, the successor value of $\alpha$ is determined by
$\alpha_{k+1} = \alpha_k + \tau$,  (27)
with
$\tau = \max_{\lambda} \{\lambda \mid W \subseteq \lambda E\}$.  (28)
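The following sketch illustrates the center-average defuzzification of Equation (26). The triangular membership shapes and the rule weights $\eta_i$ are illustrative assumptions; the paper fixes only the bounds ($e_0 = 3.7$, $e_0' = 2.5$ in Section 3.4) and the typical range $\alpha \in [1, 2]$.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def tube_scale(mu, eta):
    """Center-average defuzzifier, Equation (26):
    alpha = sum_i eta_i * mu_i / sum_i mu_i."""
    mu, eta = np.asarray(mu, float), np.asarray(eta, float)
    s = mu.sum()
    return float(eta @ mu / s) if s > 0 else 1.0   # fall back to alpha = 1

# Illustrative firing degrees mu_i of the rule groups G1..G5 for one
# (e, ec) pair, built from triangular memberships over the bounds.
e0, ec0 = 3.7, 2.5            # bounds e0 and e0' used in Section 3.4
e, ec = 2.0, -1.0             # current error and error variation rate
e_small = tri(abs(e), -e0, 0.0, e0)    # degree to which the error is "small"
e_large = 1.0 - e_small
c_small = tri(abs(ec), -ec0, 0.0, ec0)
c_large = 1.0 - c_small
mu = [e_large * c_large,      # G1: large error, fast growth -> smaller alpha
      e_small * c_large,      # G2: small error, fast change -> slightly larger
      e_large * c_small,      # G3: large error, slow change -> as small as possible
      e_large * c_small,      # G4: shares the firing shape of G3 in this sketch
      e_small * c_small]      # G5: both small -> as large as possible
eta = [1.1, 1.4, 1.0, 1.6, 2.0]        # assumed rule weights, alpha in [1, 2]
print(f"alpha = {tube_scale(mu, eta):.3f}")
```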
Theorem 1.
Given system (1) controlled with the control policy $u \triangleq \{u_0, u_1, \ldots, u_{N-1}\}$, the state error $e_k$ is restricted to the error tube $\alpha_k E$. Specifically, the design of the disturbance rejection rate ensures that $\lim_{k \to \infty} \|e_k\| \to 0$ for $w \in W$.
Lemma 1
([37]).
$s_1^T F s_2 + s_2^T F s_1 \le s_1^T F s_1 + s_2^T F s_2$,
where $s_1, s_2$ are arbitrary vectors and $F \in \mathbb{R}^{n \times n}$ is a positive definite matrix.
Proof of Theorem 1.
Consider the error system (25). The disturbance rejection gain $K$ guarantees that $e_k$ is constrained to lie inside the set $\alpha_k E$, i.e., $x_k \in z_k \oplus \alpha_k E$. Since the nominal system (3) is robustly stable, the nominal state $z_k$ converges to the origin, $d(z_k, 0) \to 0$. Then, the state error $e_k$ must converge to the error tube $\alpha_k E$ because $x_k \in z(e_{k-1}) \oplus \alpha_k E$, namely $d(e_k, \alpha_k E) \to 0$. Finally, the state error $e_k$ is restricted to a variable error tube $\alpha_k E$ centered at the origin by implementing the ancillary control law $K e_k$.
Here, the disturbance rejection gain $K$ is solved from the following equation:
$K^T B^T \|H\| I_n + \|H\| I_n B K + 2\|BK\| (\|H\| I_n)^2 + (3\tau^2 + 1)\|H\| I_n = -Q$,  (29)
where $H$ is determined by the equation $H = (\nabla V_N(d_N))^T \nabla V_N(d_N)$ and $Q$ is a positive definite matrix. $I_n$ denotes the identity matrix with the same dimensions as the state vector $x_k$. For convenience, set $\|H\| I_n = P_V$.
Then, $P$ is the solution to the following Lyapunov equation:
$\varepsilon^2 (A + BK)^T P (A + BK) - P = -(Q_e + K^T Q_v K)$,  (30)
with
$\varepsilon \in \left(1, \dfrac{1}{\mathrm{Eig}_{\max}(A + BK)}\right)$,  (31)
where $\mathrm{Eig}_{\max}(\cdot)$ is the maximum value of the matrix eigenvalue.
The Lyapunov candidate function is represented as
$V_L = e_k^T P_V e_k, \quad k \in \mathbb{N}$.  (32)
Consider the first difference equation as
$\Delta V_L = e_{k+1}^T P_V e_{k+1} - e_k^T P_V e_k$.  (33)
By substituting Equation (25) into Equation (33), one obtains
$\Delta V_L = (BK e_k)^T P_V e_k + (BK e_k)^T P_V BK e_k + (BK e_k)^T P_V w_k + w_k^T P_V e_k + w_k^T P_V BK e_k + w_k^T P_V w_k + e_k^T P_V BK e_k + e_k^T P_V w_k$.  (34)
According to Lemma 1, it follows that
$\Delta V_L \le (BK e_k)^T P_V e_k + e_k^T P_V e_k + e_k^T P_V BK e_k + 2(BK e_k)^T P_V BK e_k + 3 w_k^T P_V w_k$,  (35)
or, equivalently,
$\Delta V_L \le e_k^T \left(K^T B^T P_V + 2\|BK\| P_V^2\right) e_k + e_k^T P_V e_k + e_k^T P_V BK e_k + 3 w_k^T P_V w_k$.  (36)
According to Equation (28), the disturbance is bounded by $\tau$ as $\|w_k\| \le \tau \|e_k\|$. We have
$\Delta V_L \le e_k^T \left(K^T B^T P_V + 2\|BK\| P_V^2 + P_V B K + (3\tau^2 + 1) P_V\right) e_k$.  (37)
By substituting Equation (29) into Inequality (37), we further obtain
$\Delta V_L \le e_k^T (-Q) e_k$.  (38)
It is clear that $\Delta V_L \le 0$; thus, the function (32) is decreasing, and $\lim_{k \to \infty} \|e_k\| \to 0$. □
This section shows that the optimal cross-sections of the error tube are calculated online by considering the adjustable tube-scaling parameters $\alpha_k$, which are affected by a combination of the error and the error variation rate. Theorem 1 shows that the successor estimate of the actual system has a non-increasing estimation error at each time step. The design of the fuzzy-based tube size controller and the auxiliary control law considers both variations in the state error and changes in the value of the cost function during error adjustment, unlike the previous literature [15], which incorporates the optimization of the scaling vector $\alpha_k$ into the OCP, thereby increasing computational complexity, and aims to optimize the scaling vectors to a specific value. In addition, we find that an appropriate choice of the acquisition form of the nominal system can improve the prediction accuracy. Nonetheless, an invariable nominal system is considered during the prediction in [24,25]. To improve the control performance, our main concern here is to define a parameter estimation scheme that generates a time-varying nominal system based on the DNN algorithm while still yielding a computationally tractable RMPC algorithm, as presented in the following.

3.2. Design of DNN-Based Nominal RMPC

This section focuses on designing the DNN-based nominal RMPC to construct a parameter estimation synthesis that provides a time-varying nominal system for the control scheme. The cost function for the constrained system proposed in conventional RMPC is reformulated as an online learning problem by introducing a series of reference control inputs $v_k = \upsilon_\theta(e_k)$ parameterized by $\theta$. The modified OCP $\mathbb{P}_N^\theta(e)$, solved online, is defined by
$V_N^\theta(e) = \inf_{\varphi_N} \sum_{k=0}^{N-1} \left(\|e_k\|_{Q_e}^2 + \|\upsilon_\theta(e_k)\|_{Q_v}^2\right) + \|e_N\|_P^2$,  (39)
$\varphi_N^\theta(e) = \arg\inf_{\varphi_N} \sum_{k=0}^{N-1} \left(\|e_k\|_{Q_e}^2 + \|\upsilon_\theta(e_k)\|_{Q_v}^2\right) + \|e_N\|_P^2$,  (40)
$\text{s.t.} \quad e_k \in \alpha_k E, \quad k \in \mathbb{N}_{N-1}$,  (41)
$(A + BK) e_k \oplus W \subseteq \alpha_{k+1} E, \quad k \in \mathbb{N}_{N-1}$,  (42)
$\{\upsilon_\theta(e_k)\} \subseteq U \ominus \alpha_k K E, \quad k \in \mathbb{N}_{N-1}$,  (43)
$\{z(e_k)\} \subseteq X \ominus \alpha_{k+1} E, \quad k \in \mathbb{N}_{N-1}$,  (44)
$\{A x_k + B u_k\} \oplus W \subseteq \{z(e_k)\} \oplus \alpha_{k+1} E, \quad k \in \mathbb{N}_{N-1}$,  (45)
$\mathbf{E}_N \subseteq E_f$.  (46)
The parameters $\theta$ are updated in the direction of the gradient $\nabla_\theta V_N^\theta(e)$ of the cost function by adopting the policy gradient method. In the architecture of the constrained DNN-based nominal RMPC, the state errors $(e_k, \ldots, e_N)$ are used as the input, and the optimal control policy $\upsilon_\theta(e_k)$ is the output of the DNN.
This paper employs a DNN characterized by inherent symmetry: it features symmetric weights, which facilitate efficient parameter sharing. Consequently, the network demands fewer computational resources than conventional network structures, rendering it advantageous in resource-constrained environments. Assume the network has $L$ hidden layers, with layers 1 and $L$ each consisting of $i$ neurons. The architecture of the deep neural network is illustrated in Figure 2.
The superiority of the network architecture employed in this paper over a typical neural network structure is demonstrated in Table 1.
From Table 1, it can be observed that in symmetric neural networks, the number of weights to be calculated is reduced since each connection is computed only once and then shared. Notably, despite having fewer parameters, deep neural networks with symmetric structures achieve higher accuracy under the same computational resources. Regarding convergence, symmetric neural networks require 35.79% fewer iterations than conventional neural networks.
The output of the DNN-based nominal RMPC is formulated as
$\upsilon_\theta(e_k) = \delta\left(\sum_{i=1}^{m} W_L a_{L-1} + b_L\right)$,  (47)
where the linear relationship coefficient matrix and bias vector between the hidden layer and the output layer are denoted by $W \in \mathbb{R}^{n \times m}$ and $b \in \mathbb{R}^{n \times 1}$, respectively. The affine function parameters $\theta = \{W_{1:L}, b_{1:L}\}$ are the quantities to be optimized; $\delta$ is the rectified linear unit (ReLU) function. The output value of the hidden layer is $a_L \in \mathbb{R}^{m \times 1}$, and the input $a_1$ is set to $e_k$.
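A minimal sketch of the forward pass is given below. How the "symmetric weights" are shared is not spelled out in the text, so the sketch assumes the simplest reading: mirrored hidden layers (1 and L, 2 and L−1, ...) reuse the same weight matrix, which is one way to obtain the parameter reduction reported in Table 1; the output layer applies the ReLU of Equation (47).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class SymmetricPolicyNet:
    """Sketch of the policy network: an MLP with L hidden layers in which
    mirrored layers share the same weight matrix, so only ceil(L/2) hidden
    weight matrices are stored. The tying scheme is an assumption."""

    def __init__(self, n_in, n_hidden, n_out, L=6, seed=0):
        rng = np.random.default_rng(seed)
        self.L = L
        self.W_in = rng.normal(0, 0.1, (n_hidden, n_in))
        self.W_hid = [rng.normal(0, 0.1, (n_hidden, n_hidden))
                      for _ in range((L + 1) // 2)]    # shared hidden blocks
        self.W_out = rng.normal(0, 0.1, (n_out, n_hidden))
        self.b = [np.zeros(n_hidden) for _ in range(L)]
        self.b_out = np.zeros(n_out)

    def forward(self, e):
        """Forward pass; hidden layer l reuses the weights of its mirror L-l."""
        a = relu(self.W_in @ e + self.b[0])
        for l in range(1, self.L):
            W = self.W_hid[min(l, self.L - l) - 1]  # tied: layers l and L-l share W
            a = relu(W @ a + self.b[l])
        return relu(self.W_out @ a + self.b_out)    # output activation per Eq. (47)

policy = SymmetricPolicyNet(n_in=2, n_hidden=10, n_out=1)
v = policy.forward(np.array([0.5, -0.2]))           # nominal input candidate
```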
Since the neural network may output a potentially infeasible $\upsilon_\theta(e_k)$ for a given error $e_k$, Dykstra's projection algorithm [38] is introduced to ensure that the subsequent states and controls remain feasible. Its structure is shown in Figure 3.
Theorem 2.
By applying Dykstra's projection algorithm, the control input $\upsilon_\theta(e_k)$ converges to the orthogonal projection of $\upsilon_\theta(e_k)$ onto the polytope $U \ominus \alpha_k K E$ as $t \to \infty$.
Proof of Theorem 2.
First, define the orthogonal projection of $\upsilon_\theta(e_k)$ onto the polytope $U \ominus \alpha_k K E$ as $P(\upsilon_\theta(e_k))$. A series of variables $v^{(k,t)}$ and $I^{(k,t)}$ is generated from the DNN structure extended by Dykstra's projection algorithm, which iterates as
$v^{(k,t)} = P\left(v^{(k-1,t)} + I^{(k,t-1)}\right)$,  (48)
$I^{(k,t)} = \left(v^{(k-1,t)} + I^{(k,t-1)}\right) - v^{(k,t)}$.  (49)
Assume that the starting conditions of the algorithm are $v^{(0,0)} = v(e_0)$ and $I^{(0,0)} = 0$. When $t \to \infty$, we have $I^{(k,t)} = I^{(k,t-1)}$, and it is clear that $v^{(k,t)} = v^{(k+1,t)}$ (i.e., the nominal control input $\upsilon_\theta(e_k)$ converges to $P(\upsilon_\theta(e_k))$). □
Thus, given a state error $e_k$, the control policy outputs $P(\upsilon_\theta(e_k)) = f(e_k; \theta)$.
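A generic sketch of Dykstra's algorithm is shown below for projecting a point onto an intersection of convex sets, here a box intersected with a halfspace standing in for the polytope $U \ominus \alpha_k K E$. The correction terms $I$ are what distinguish Dykstra's method from plain alternating projections and make the limit the orthogonal projection claimed in Theorem 2; the embedding into the DNN layers is omitted.

```python
import numpy as np

def dykstra(v0, projections, iters=100):
    """Dykstra's algorithm: orthogonal projection of v0 onto the
    intersection of convex sets, given the projector onto each set."""
    v = v0.astype(float).copy()
    I = [np.zeros_like(v) for _ in projections]   # one correction per set
    for _ in range(iters):
        for i, P in enumerate(projections):
            y = v + I[i]        # re-add the correction removed last cycle
            v_new = P(y)        # project onto set i
            I[i] = y - v_new    # update the correction term
            v = v_new
    return v

# Example: project onto a box intersected with a halfspace a^T v <= b,
# an illustrative stand-in for the tightened control polytope.
box = lambda y: np.clip(y, -1.0, 1.0)
a, b = np.array([1.0, 1.0]), 0.5
halfspace = lambda y: y if a @ y <= b else y - (a @ y - b) / (a @ a) * a

v_raw = np.array([2.0, 1.5])                    # possibly infeasible DNN output
v_feas = dykstra(v_raw, [box, halfspace])
print(v_feas, "satisfies a@v<=b:", a @ v_feas <= b + 1e-9)
```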
According to the policy gradient theory presented in [39], the gradient of the value function $\nabla_\theta V_N^\theta(e)$ with respect to the policy parameters $\theta$ is
$\nabla_\theta V_N^\theta(e) = \mathbb{E}_v\left[V_N^\theta(e)\, \nabla_\theta \log \phi(v_t; f(e_t; \theta_t), \Sigma)\right]$,  (50)
where $\phi(v_t; f(e_t; \theta_t), \Sigma)$ is a multivariate Gaussian probability density function used to sample the control inputs $\upsilon_\theta(e_k)$, centered at the DNN output $f(e_k; \theta)$ with diagonal covariance $\Sigma$; the covariance $\Sigma$ is annealed to 0 at the end of training to recover the deterministic control policy.
The neural network parameters are iterated by stochastic gradient descent as
$\theta_{t+1} = \theta_t - \gamma_t V_N^\theta(e)\, \nabla_{\theta_t} \log \phi(v_t; f(e_t; \theta_t), \Sigma)$.  (51)
The learning rate $\gamma_t$ of the DNN is selected as a positive number.
The termination criterion for the iteration is defined as
$\|\theta_{t+1} - \theta_t\| \le \|V_N^\theta(e) - V_{N-1}^\theta(e)\|$.  (52)
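The sketch below illustrates the update (51) in its simplest form: a Gaussian policy centered at the network output, a sampled control input, and a REINFORCE-style step on the parameters. The linear "network", the rollout cost standing in for $V_N^\theta(e)$, and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 1
theta = np.array([[-0.6, -1.2]]) + rng.normal(0, 0.05, (m, n))  # policy params
Sigma = 0.05 * np.eye(m)      # exploration covariance, annealed to 0 in training
gamma_t = 1e-3                # positive learning rate

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])

def rollout_cost(theta, e0, v0, steps=10):
    """Stand-in for V_N^theta(e): quadratic cost of one rollout whose first
    input is the sampled v0 and whose later inputs follow the mean policy."""
    e, cost = e0.copy(), 0.0
    for k in range(steps):
        v = v0 if k == 0 else theta @ e
        cost += float(e @ e) + 0.01 * float(v @ v)
        e = A @ e + B @ v
    return cost

e0 = np.array([1.0, -0.5])
for t in range(200):
    mean = theta @ e0                                  # f(e; theta)
    v = rng.multivariate_normal(mean, Sigma)           # sample control input
    V = rollout_cost(theta, e0, v)
    # gradient of log N(v; mean, Sigma) w.r.t. theta: Sigma^{-1}(v - mean) e0^T
    grad_log = np.linalg.solve(Sigma, v - mean)[:, None] @ e0[None, :]
    theta_new = theta - gamma_t * V * grad_log         # Equation (51)
    if np.abs(theta_new - theta).max() < 1e-6:         # crude stop, cf. (52)
        break
    theta = theta_new
print("theta after training:", theta)
```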
For application of the proposed approach, instead of focusing on constructing a set of polytopic regions, function approximation and reinforcement learning techniques are used to directly learn an approximate optimal control policy. Furthermore, the policy gradient method guarantees the control action converges to locally optimal solutions by applying function approximation to generate unbiased estimates of the gradient with respect to the parameter θ. The proposed optimization method significantly enhances the computational performance of the system control while ensuring the feasibility of control inputs.

3.3. The Feedback Mechanism of the Control Synthesis

In this paper, the feedback loop encompasses the state error, the state error variation rate, and the cost function, as illustrated in Figure 4. Specifically, the state error and error variation rate are conveyed to the fuzzy controller, which yields the error scaling vector associated with the constraints at the subsequent time step. Simultaneously, the state error contributes to the optimization of the cost function. The resulting cost function value is then fed back into the auxiliary control law, thereby determining the auxiliary control rate for the upcoming sampling time.
The comparison between the computational performance of the proposed algorithm and the HTMPC is shown in Table 2, where $q_X$, $q_U$, $q_E$, and $q_{E_f}$ denote the numbers of affine inequalities in the irreducible representations of the sets $X$, $U$, $E$, and $E_f$ employed in the proposed scheme, and $q_S$ and $q_{G_f}$ are the numbers of affine inequalities in the irreducible representations of the state homothetic set and the terminal constraint set, respectively.
Table 2 clearly demonstrates that assigning the scaling vector to the fuzzy controller's specialized treatment not only provides a more comprehensive consideration of the impact of the state error and error variation rate on the error tube scaling, but also effectively reduces the number of decision variables and inequality constraints in the optimization. Furthermore, the design of a symmetric constrained DNN structure addresses the exponential growth of polyhedra construction with the increasing number of constraints during the optimization. Consequently, implementing the proposed algorithm allows a substantial reduction in computational complexity while enhancing the flexibility of the system control.

3.4. The DNN-Based RMPC with a Fuzzy-Based Tube Size Controller Structure

To recapitulate, the proposed RMPC scheme comprises a fuzzy-based tube size controller and a DNN-based nominal RMPC part. The fuzzy-based tube size controller is employed to adjust the error tube scaling. Meanwhile, the tightened sets (i.e., the minimal disturbance invariant set with the adjustable parameter $\alpha_k$) and the disturbance rejection gain $K$ are computed online to restrain the state error. Then, the DNN-based nominal RMPC is used to generate the time-varying nominal system provided that the constraints associated with $\alpha$ are satisfied. This yields a theoretically rigorous and technically achievable framework for RMPC with online parameter estimation that improves computational performance.
In this paper, we obtain the error tube shape set $E$ by computing the minimal robust positively invariant set using the method described in [34]. Moreover, we set the error bound $e_0$ to 3.7 and the error variation rate bound $e_0'$ to 2.5. As the parameter $\alpha$ typically ranges between 1 and 2, we design the fuzzy rule table in the format of Table 3. Table 3 shows that inputting a data pair $(e_0, e_0')$ determines a reasonable value of $\alpha$, which subsequently dictates the value of $\alpha_{k+1}$ according to Equation (27). The determination of $\alpha_{k+1}$ in turn fixes the associated constraint sets $\alpha_{k+1} E$, $\alpha_{k+1} K E$, $X \ominus \alpha_{k+1} E$, and $U \ominus \alpha_{k+1} K E$. Additionally, the disturbance rejection rate and the terminal cost function for the control can be determined from Equations (29) and (30). To meet the specified constraints, the constrained time-varying nominal system trajectories are computed through the constrained DNN extended by Dykstra's projection algorithm. The actual system then tracks the nominal trajectory while satisfying the relevant conditions. Section 4 explicitly discusses the DNN parameter settings depending on the dimensionality of the input and output variables. Specifically, Algorithm 1 gives the main procedure of the proposed control scheme, and its overall structure diagram is presented in Figure 5.
Algorithm 1 DNN-based RMPC with a fuzzy-based tube size controller
Given initial conditions $e_0 = 0$, $\alpha_0 = 1$ and weighting matrices $Q_e$, $Q_v$, determine the set $E$.
Compute the terminal weight matrix $P$ and disturbance rejection gain $K$ by using (29) and (30).
1: Randomly initialize $\theta$
2: Set learning rate $\gamma$
3: for each time instant k = 0,1,2,…,N do
4:  Compute the polytopes $\alpha_k E$, $\alpha_k K E$, $X \ominus \alpha_k E$ and $U \ominus \alpha_k K E$
5:  if constraints (41)–(46) are satisfied then
6:   repeat calculate $\theta_{t+1}$ by using (51)
7:   until convergence
8:  else
9:   let $e_{k+1} = e_k$
10:  end if
11: Solve the optimization problem (39) and (40) based on $\theta_{t+1}$ to obtain $v_k^*(e_k)$,
12: Compute the error variation rate $e_c$ and the corresponding scaling vector $\alpha$, then obtain the successor scaling vector $\alpha_{k+1}$ by using (27),
13: Calculate the control input as $u_k(e) = \upsilon_\theta(e_k) + K e_k$, and implement $u_k$ on the system.
14: end for
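For orientation, the control loop of Algorithm 1 can be summarized in the following skeleton, which wires together the pieces sketched earlier; the constraint checks of steps 4–10 and the inner training loop are reduced to comments, and `policy` is any object exposing the DNN forward pass (such as the `SymmetricPolicyNet` sketch above).

```python
import numpy as np

# Skeleton of Algorithm 1. The E-set computations, feasibility checks, and
# policy training of steps 4-10 are omitted; this only traces the data flow.
def run_controller(A, B, K, policy, x0, N):
    z = x0.copy()            # nominal state, z_0 = x_0 so that e_0 = 0
    x = x0.copy()
    alpha = 1.0              # alpha_0 = 1
    for k in range(N):
        e = x - z                            # state error, Equation (4)
        # (steps 4-10) tighten constraints with alpha_k and iterate theta
        # until (41)-(46) hold -- omitted in this sketch.
        v = policy.forward(e)                # nominal input from the DNN
        u = v + K @ e                        # control synthesis, Equation (23)
        w = np.random.uniform(-0.1, 0.1, x.shape)   # bounded disturbance
        x = A @ x + B @ u + w                # actual system, Equation (1)
        z = A @ z + B @ v                    # nominal system, Equation (3)
        # (step 12) update alpha_{k+1} = alpha_k + tau, Equation (27)
    return x, z
```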

4. Simulations and Comparison Study

In this section, the advantages of Algorithm 1 are illustrated by simulation examples of both 2-D and 4-D systems. The simulation experiments were conducted in Matlab; the polyhedral constraint sets were constructed with the Mosek and MPT toolboxes, and the convex optimization problem of the actual system was then solved. Deep learning toolboxes were employed to train the neural networks that determine the optimal control inputs of the nominal system.
Example 1.
Consider a 2-D double integrator discrete-time system in the form of (1) with
$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}$.
The state constraint is $x \in X \triangleq [-10, 2] \times [-10, 2]$, the disturbance is bounded as $w \in W \triangleq \{w \mid \|w\| \le 0.1\}$, and the control constraint is $u \in U \triangleq \{u \mid |u| \le 1\}$. The performance index function is defined in (39)–(46) with $Q_e = I_2$ and $Q_v = 0.01$; the terminal cost $V_f(e)$ is the value function $\|e_N\|_P^2$, where $P$ is calculated from (30). Then, the disturbance rejection gain $K$ is computed by using (29). The set $E$ is computed as a polytope. The horizon length is selected as $N = 12$. The system is simulated from the initial condition $x_0 = z_0 = (-4, -2)$ with $\alpha_0 = 1$; the value of $\alpha_{k+1}$ is induced by Equation (27). For the determination of the neural network architecture, Figure 6 compares the nominal state trajectories of the system when employing different network structures and deep neural networks with varying numbers of layers.
Indeed, from Figure 6 it is evident that, when a symmetric DNN with six hidden layers is used, the nominal state trajectories reach the desired values more rapidly (i.e., they reach the origin by the 12th sampling time).
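The following sketch sets up Example 1 numerically. Since Equations (29)–(30) define the paper's particular $K$ and $P$, the sketch substitutes a standard discrete-time LQR gain obtained by Riccati iteration as a generic stabilizing stand-in.

```python
import numpy as np

# Example 1 setup as a runnable sketch: 2-D double integrator with the
# weights Q_e = I_2, Q_v = 0.01. K and P below come from a plain LQR
# Riccati recursion, used here only as a stand-in for Equations (29)-(30).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.5],
              [1.0]])
Qe, Qv = np.eye(2), np.array([[0.01]])

P = np.eye(2)
for _ in range(500):                         # Riccati fixed-point iteration
    S = Qv + B.T @ P @ B
    P = Qe + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
K = -np.linalg.solve(Qv + B.T @ P @ B, B.T @ P @ A)   # u = K x convention

print("K =", K, " spectral radius of A+BK:",
      np.max(np.abs(np.linalg.eigvals(A + B @ K))))
```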
The state trajectories of the proposed RMPC scheme are shown in Figure 7. The solid line represents the state trajectory of the nominal system (3), while the dash-dot line is the state trajectory of the actual system (1). The error tube $\alpha_k E$ is depicted by the green polytopes, while the 0-step homothetic tube controllability set $X_0$ is represented by the dark gray area. Clearly, the local state at each instance is regulated in an error tube $\alpha_k E$ centered around the trajectory of the nominal state. As anticipated, the cross-section of the error tube diminishes as the nominal state converges towards the origin.
Then, to make the comparison between the control performance of Algorithm 1 and the HTMPC algorithm more apparent, $N$ is set to 25. Figure 8 shows the state curves for Algorithm 1 and the HTMPC strategy; the state constraint is shown in the gray region. With Algorithm 1, a trajectory initiated from an initial condition significantly distant from the desired equilibrium point converges faster to the target state while maintaining a narrower range of state-error fluctuation, with the disturbance and state constraints satisfied.
Figure 9 presents the control input curves generated by two optimization methods. The region shown in gray is U . Obviously, the control action of the actual system (1) consistently satisfies the control constraint. Meanwhile, Algorithm 1 accelerates the convergence of the control input toward the desired equilibrium point with reduced overshoot.
To validate the efficacy of Algorithm 1 in reducing optimization time, a statistical analysis of the optimization time was conducted. Furthermore, to investigate the trend of the optimization time, a slightly larger value of $N$ ($N = 50$) was selected during the experimentation. As shown in Figure 10, the computational efficiency of Algorithm 1 is generally 2–3 orders of magnitude higher than that of the HTMPC. In addition, as $N$ increases, the calculation time of the HTMPC grows exponentially, whereas the calculation time required by Algorithm 1 increases ever more slowly and eventually stabilizes within 0.16 ms. Specifically, Algorithm 1 saves an average of 339.54 times the optimization time compared with the HTMPC; when $N = 50$, the saving reaches a factor of 726.23.
Example 2.
To further authenticate the proposed approach, consider the system of the form (1) with four state dimensions and two control input dimensions as
$A = \begin{bmatrix} 1 & 1.5 & 0 & 0 \\ 0.5 & -0.5 & 1 & 0 \\ 0 & 0.1 & 0.1 & 0 \\ 0.5 & 0 & 0.5 & 0.5 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 \\ 1 & 0.1 \\ 1 & 0 \\ 0 & 0 \end{bmatrix}$.
Constraints are given by the inequalities as
$x \in X \triangleq \{x \mid |x| \le [5\ 5\ 2\ 2]^T\}, \quad u \in U \triangleq \{u \mid |u| \le [1\ 1]^T\}$.
The parameters are set to horizon $N = 30$ and weighting matrices $Q_e = \mathrm{diag}\{10, 10, 1, 1\}$ and $Q_v = \mathrm{diag}\{0.01, 0.01\}$. The system is simulated from the initial condition $x_0 = z_0 = (3, 4, 1.5, 1)$. The other parameters of the system are the same as in Example 1. Algorithm 1 is implemented on this system to test its control performance for larger-scale systems. Furthermore, the final DNN structure is determined by comparing the Euclidean norms of the state errors generated by different deep neural network architectures, as illustrated in Figure 11. Specifically, the chosen DNN configuration is a symmetric deep neural network with eight hidden layers, each containing 14 neurons.
The Euclidean norm $\|e\|_2 = \sqrt{\sum_{i=1}^{4} (e^i)^2}$ is employed to depict the trend of the state error. As indicated in Figure 11, when a symmetric neural network with eight hidden layers is applied, the system's state error is generally smaller and converges to a neighborhood of zero more quickly.
Figure 12 depicts the state variable curves for each dimension. The figure demonstrates that the time-varying nominal system obtained by online learning results in a smaller error and a shorter adjustment time during the convergence of the nominal state. The translucent area in these figures represents the range of error fluctuations; evidently, Algorithm 1 generally yields a tighter bound on the state errors than the HTMPC, indicating greater flexibility in scaling the state tube. Furthermore, in Table 4, a visual comparison using specific data demonstrates the error-constraining capabilities when evaluating the tracking performance of the actual system against the nominal system under Algorithm 1 and the HTMPC. To mitigate the influence of outliers, we adopt the mean squared error (MSE), known for its numerical stability, as the metric for assessing the tracking performance.
As illustrated in Table 4, using Algorithm 1 to control the 4-D system leads to a smaller MSE between the nominal and actual states across all four dimensions. The average MSE over the four dimensions is reduced by 67.86% when Algorithm 1 is employed, compared with its counterpart HTMPC. Consequently, the implementation of Algorithm 1 ensures a closer approximation of the actual state to the nominal state with reduced error.
Moreover, the time-varying nominal system generated by Algorithm 1, as depicted in Figure 13, exhibits enhanced control input stabilization capabilities with a faster convergence rate and reduced overshoot.
From a computational perspective, Algorithm 1 exhibits more pronounced advantages regarding computational efficiency for large-scale systems. As illustrated in Table 5, it can be observed that the proposed method significantly reduces the computation time to less than 6 ms when applied to four-dimensional input systems. In contrast, the HTMPC approach requires a longer computation time. On average, Algorithm 1 achieves optimization up to 7218.07 times faster than HTMPC.

5. Conclusions

This paper presents a mathematically rigorous and computationally tractable RMPC scheme for constrained linear systems with bounded disturbance. Firstly, a more flexible approach is proposed to adjust the size of the corresponding tube cross-section by incorporating a fuzzy-based tube size controller, which is influenced by both the error magnitude and the error variation rate. Subsequently, the OCP for the system is reformulated as an online learning problem with iterative parameters, and a time-varying nominal system for the control scheme is generated from the DNN-based nominal RMPC. Additionally, Dykstra's projection algorithm is incorporated into the DNN optimization process to ensure the feasibility of the successor state and control input. The proposed integrated control strategy significantly reduces the computational time while enhancing the control effectiveness, thereby enabling its potential application in large-scale systems. Simulation results demonstrate the effectiveness of the proposed optimal control algorithm. A limitation of the current study is the lack of a measurable criterion for evaluating the suboptimality of the derived control law, which prevents us from ascertaining how closely it aligns with an optimal solution. Addressing this limitation will require devising metrics or algorithms capable of proficiently assessing the efficacy of the control.

Author Contributions

Conceptualization, S.Y. and Y.L.; methodology, S.Y.; software, Y.L.; validation, Y.L. and H.C.; formal analysis, S.Y.; investigation, Y.L.; resources, S.Y.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, Y.L. and H.C.; supervision, S.Y.; funding acquisition, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 61703224 and 61640302.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare that there is no conflict of interest.

References

  1. Yang, H.Y.; Yan, Z.P.; Zhang, W.; Gong, Q.S.; Zhang, Y.; Zhao, L.Y. Trajectory tracking with external disturbance of bionic underwater robot based on CPG and robust model predictive control. Ocean Eng. 2022, 263, 112215. [Google Scholar] [CrossRef]
  2. Shi, H.Y.; Li, P.; Su, C.L.; Wang, Y.; Yu, J.X.; Cao, J.T. Robust constrained model predictive fault-tolerant control for industrial processes with partial actuator failures and interval time-varying delays. J. Process Control 2019, 75, 187–203. [Google Scholar] [CrossRef]
  3. Xie, Y.Y.; Liu, L.; Wu, Q.W.; Qian, Z. Robust model predictive control based voltage regulation method for a distribution system with renewable energy sources and energy storage systems. Int. J. Electr. Power Energy Syst. 2020, 118, 105749. [Google Scholar] [CrossRef]
  4. Ojaghi, P.; Bigdeli, N.; Rahmani, M. An LMI approach to robust model predictive control of nonlinear systems with state-dependent uncertainties. J. Process Control 2016, 47, 1–10. [Google Scholar] [CrossRef]
  5. Zheng, Y.P.; Li, D.W.; Xi, Y.G.; Zhang, J. Improved model prediction and RMPC design for LPV systems with bounded parameter changes. Automatica 2013, 49, 3695–3699. [Google Scholar] [CrossRef]
  6. Fleming, J.; Kouvaritakis, B.; Cannon, M. Robust tube MPC for linear systems with multiplicative uncertainty. IEEE Trans. Autom. Control 2015, 60, 1087–1092. [Google Scholar] [CrossRef]
  7. Hertneck, M.; Köhler, J.; Trimpe, S.; Allgöwer, F. Learning an approximate model predictive controller with guarantees. IEEE Control Syst. Lett. 2018, 2, 543–548. [Google Scholar] [CrossRef]
  8. Langson, W.; Chryssochoos, I.; Raković, S.V.; Mayne, D.Q. Robust model predictive control using tubes. Automatica 2004, 40, 125–133. [Google Scholar] [CrossRef]
  9. Mayne, D.Q.; Raković, S.V.; Findeisen, R.; Allgöwer, F. Robust output feedback model predictive control of constrained linear systems: Time varying case. Automatica 2009, 45, 2082–2087. [Google Scholar] [CrossRef]
  10. Yu, S.Y.; Maier, C.; Chen, H.; Allgöwer, F. Tube MPC scheme based on robust control invariant set with application to Lipschitz nonlinear systems. Syst. Control Lett. 2013, 62, 194–200. [Google Scholar] [CrossRef]
  11. Limon, D.; Alvarado, I.; Alamo, T.; Camacho, E.F. Robust tube-based MPC for tracking of constrained linear systems with additive disturbances. J. Process Control 2010, 20, 248–260. [Google Scholar] [CrossRef]
  12. Cannon, M.; Kouvaritakis, B.; Raković, S.V.; Cheng, Q.F. Stochastic tubes in model predictive control with probabilistic constraints. IEEE Trans. Autom. Control 2011, 56, 194–200. [Google Scholar] [CrossRef]
  13. Mayne, D.Q.; Seron, M.M.; Raković, S.V. Robust model predictive control of constrained linear systems with bounded disturbances. Automatica 2005, 41, 219–224. [Google Scholar] [CrossRef]
  14. Mayne, D.Q.; Raković, S.V.; Findeisen, R.; Allgöwer, F. Robust output feedback model predictive control of constrained linear systems. Automatica 2006, 42, 1217–1222. [Google Scholar] [CrossRef]
  15. Raković, S.V.; Kouvaritakis, B.; Findeisen, R.; Cannon, M. Homothetic tube model predictive control. Automatica 2012, 48, 1631–1638. [Google Scholar] [CrossRef]
  16. Raković, S.V.; Cheng, Q.F. Homothetic tube MPC for constrained linear difference inclusions. In Proceedings of the Chinese Control and Decision Conference (CCDC), Guiyang, China, 25–27 May 2013; pp. 754–761. [Google Scholar]
  17. Georgiou, A.; Tahir, F.; Jaimoukha, I.M.; Evangelou, S.A. Computationally Efficient Robust Model Predictive Control for Uncertain System Using Causal State-Feedback Parameterization. IEEE Trans. Autom. Control 2023, 68, 3822–3829. [Google Scholar] [CrossRef]
  18. Zhang, B.; Yan, S. Asynchronous constrained resilient robust model predictive control for markovian jump systems. IEEE Trans. Ind. Inform. 2020, 16, 7025–7034. [Google Scholar] [CrossRef]
  19. Olaru, S.; Dumur, D. A parameterized polyhedra approach for the explicit robust model predictive control. In Informatics in Control, Automation and Robotics II; Springer: Dordrecht, The Netherlands, 2007; pp. 217–226. [Google Scholar]
  20. Lin, C.Y.; Yeh, H.Y. Repetitive model predictive control based on a recurrent neural network. In Proceedings of the 2012 International Symposium on Computer, Consumer and Control, Taichung, Taiwan, 4–6 June 2012; pp. 540–543. [Google Scholar]
  21. Han, H.; Kim, H.; Kim, Y. An efficient hyperparameter control method for a network intrusion detection system based on proximal policy optimization. Symmetry 2022, 14, 161. [Google Scholar] [CrossRef]
  22. Xu, K.; Huang, D.Z.; Darve, E. Learning constitutive relations using symmetric positive definite neural networks. J. Comput. Phys. 2021, 428, 110072. [Google Scholar] [CrossRef]
23. Di Marco, M.; Forti, M.; Tesi, A. Existence and characterization of limit cycles in nearly symmetric neural networks. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 2002, 49, 979–992. [Google Scholar]
24. Ma, C.Q.; Jiang, X.Y.; Li, P.; Liu, J. Offline computation of the explicit robust model predictive control law based on deep neural networks. Symmetry 2023, 15, 676. [Google Scholar] [CrossRef]
  25. Bumroongsri, P.; Kheawhom, S. An off-line robust MPC algorithm for uncertain polytopic discrete-time systems using polyhedral invariant sets. J. Process Control 2012, 22, 975–983. [Google Scholar] [CrossRef]
  26. Lorenzen, M.; Cannon, M.; Allgöwer, F. Robust MPC with recursive model update. Automatica 2019, 103, 461–471. [Google Scholar] [CrossRef]
  27. Moreno-Mora, F.; Beckenbach, L.; Streif, S. Performance bounds of adaptive MPC with bounded parameter uncertainties. Eur. J. Control 2022, 68, 100688. [Google Scholar] [CrossRef]
  28. Bradtke, S.J. Reinforcement learning applied to linear quadratic regulation. Adv. Neural Inf. Process. Syst. 1992, 5, 295–302. [Google Scholar]
  29. Wang, H.R.; Zariphopoulou, T.; Zhou, X.Y. Reinforcement learning in continuous time and space: A stochastic control approach. J. Mach. Learn. Res. 2020, 21, 8145–8178. [Google Scholar]
  30. Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust region policy optimization. Int. Conf. Mach. Learn. 2015, 37, 1889–1897. [Google Scholar]
  31. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
32. Kolmanovsky, I.; Gilbert, E.G. Theory and computation of disturbance invariant sets for discrete-time linear systems. Math. Probl. Eng. 1998, 4, 317–367. [Google Scholar] [CrossRef]
  33. Wan, Z.; Pluymers, B.; Kothare, M.V.; De Moor, B. Efficient robust constrained model predictive control with a time varying terminal constraint set. Syst. Control Lett. 2006, 55, 618–621. [Google Scholar] [CrossRef]
  34. Raković, S.V.; Kerrigan, E.C.; Kouramas, K.I.; Mayne, D.Q. Invariant approximations of the minimal robust positively invariant set. IEEE Trans. Autom. Control 2005, 50, 406–410. [Google Scholar] [CrossRef]
  35. Nguyen, A.T.; Taniguchi, T.; Eciolaza, L.; Campos, V.; Palhares, R.; Sugeno, M. Fuzzy control systems: Past, present and future. IEEE Comput. Intell. Mag. 2019, 14, 56–58. [Google Scholar] [CrossRef]
  36. Zeng, X.J.; Madan, G.S. Approximation accuracy analysis of fuzzy systems as function approximators. IEEE Trans. Fuzzy Syst. 1996, 4, 44–63. [Google Scholar] [CrossRef]
37. Bertsekas, D.P. Dynamic Programming and Optimal Control, Volume I, 4th ed.; Athena Scientific: Belmont, MA, USA, 2012. [Google Scholar]
38. Escalante, R.; Raydan, M. Dykstra’s algorithm for a constrained least-squares matrix problem. Numer. Linear Algebra Appl. 1996, 3, 459–471. [Google Scholar]
39. Sutton, R.S.; McAllester, D.; Singh, S.; Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst. 2000, 12, 1057–1063. [Google Scholar]
Figure 1. A two-dimensional depiction of the control maneuver. Different regions in the graph represent the following: $G_{1-}$ corresponds to the region where $e \le -e_0$ and $e_c \ge e_0'$; $G_{1+}$ depicts the area with $e \ge e_0$ and $e_c \le -e_0'$; $G_{2+}$ characterizes the area where $|e| \le e_0$ and $e_c \ge e_0'$; $G_{2-}$ highlights the territory where $|e| \le e_0$ and $e_c \le -e_0'$; $G_{3+}$ showcases the domain where $e \ge e_0$ and $e_c \le |e_0'|$; $G_{3-}$ marks the domain where $e \le -e_0$ and $e_c \le |e_0'|$; $G_{4+}$ exemplifies the region where $|e_c| \le e_0'$ and $e \ge e_0$; $G_{4-}$ describes the space where $|e_c| \le e_0'$ and $e \le -e_0$; $G_5$ specifies the domain where $|e| < e_0$ and $|e_c| < e_0'$.
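The region tests in the Figure 1 caption can be summarized compactly in code. The sketch below is illustrative only: the function name classify_region, the threshold arguments e0 and e0p (standing in for $e_0$ and $e_0'$), and the ordering of the checks, which resolves the overlapping boundary cases, are assumptions rather than the paper's implementation.

```python
# A minimal sketch of the error-phase-plane partition from Figure 1,
# assuming scalar error e and error change e_c with thresholds e0 and e0p.
def classify_region(e: float, e_c: float, e0: float, e0p: float) -> str:
    if abs(e) < e0 and abs(e_c) < e0p:
        return "G5"                        # both error and error rate small
    if e <= -e0 and e_c >= e0p:
        return "G1-"
    if e >= e0 and e_c <= -e0p:
        return "G1+"
    if abs(e) <= e0 and e_c >= e0p:
        return "G2+"
    if abs(e) <= e0 and e_c <= -e0p:
        return "G2-"
    if abs(e_c) <= e0p and e >= e0:
        return "G4+"
    if abs(e_c) <= e0p and e <= -e0:
        return "G4-"
    return "G3+" if e >= e0 else "G3-"     # remaining e_c <= |e0p| cases

print(classify_region(0.05, 0.01, e0=0.1, e0p=0.2))  # -> G5
```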
Figure 2. Diagram illustrating the architectural structure of the deep neural network.
Figure 3. Diagram of the constrained DNN structure expanded by Algorithm 1.
Figure 4. The flowchart depicting the feedback processing mechanism of the control synthesis.
Figure 5. The structure of the constrained DNN-based robust model predictive control scheme with an adjustable error tube.
Figure 6. Nominal state trajectories under various DNN architectures.
Figure 7. The state trajectories of the proposed algorithm (N = 12). Colors in the figure represent specific categories as follows: the green polytopes depict the error tube at each sampling time; the dark gray area represents the 0-step homothetic tube controllability set $X_0$; the gray area marks the undesirable state region.
Figure 8. State convergence comparison between Algorithm 1 and the HTMPC (N = 25). (a) Curves of $x_1$ obtained by implementing the two control algorithms; (b) curves of $x_2$ obtained by implementing the two control algorithms.
Figure 9. Control input convergence comparison between Algorithm 1 and the HTMPC (N = 25).
Figure 10. Comparison of Algorithm 1 and the HTMPC in terms of computational efficiency (N = 50). (a) Statistics of the computation time for Algorithm 1; (b) statistics of the computation time for the HTMPC.
Figure 11. Euclidean norm of state error under various DNN architectures.
Figure 12. State convergence comparison between Algorithm 1 and the HTMPC (N = 30). (a–d) Curves of the states $x_1$–$x_4$, respectively, obtained by implementing the two control algorithms.
Figure 13. Control input convergence comparison between Algorithm 1 and the HTMPC (N = 30). (a) Curves of control input $v_1$ for the nominal system obtained with the two control algorithms; (b) curves of control input $v_2$ for the nominal system obtained with the two control algorithms.
Table 1. Performance comparison: symmetrical DNN vs. general DNN.

Network Structure | Number of Calculated Weights | Iterations | Precision
Symmetrical DNN | $2i(L+1)$ | 436 | 98.03%
Typical DNN | $[m+n+(i+1)L_{i+1}]i$ | 679 | 97.64%
Table 2. The comparison of computational complexity between the proposed approach and the HTMPC.

Control Strategy | Number of Decision Variables | Number of Inequality Constraints | Upper Bound on the Number of Critical Regions
Proposed Approach | $N(m+n)+n$ | $N(q_X+q_U+q_E)+q_{E_f}$ | 0
HTMPC | $N(m+n+1)+n+1$ | $N(q_X+q_U+q_S+1)+q_S+q_{G_f}$ | $2^{N(q_X+q_U+q_S+1)+q_S+q_{G_f}}$
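To make the last column of Table 2 concrete, one can substitute hypothetical constraint counts into the HTMPC bound. The values below (N = 12, $q_X = q_U = 4$, $q_S = 2$, $q_{G_f} = 4$) are illustrative assumptions, not the experimental settings:

```latex
% Illustrative instance of the Table 2 bound under assumed constraint counts
% N = 12, q_X = q_U = 4, q_S = 2, q_{G_f} = 4.
\[
  N(q_X + q_U + q_S + 1) + q_S + q_{G_f} = 12(4 + 4 + 2 + 1) + 2 + 4 = 138,
\]
\[
  \text{so the HTMPC may generate up to } 2^{138} \text{ critical regions,
  versus } 0 \text{ for the proposed approach.}
\]
```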
Table 3. Fuzzy rule comparison table.

Scaling Vector | Fuzzy Rule Control Region
 | $G_1$ | $G_2$ | $G_3$ | $G_4$ | $G_5$
α | 0.5–0.8 | 0.9–1.2 | 0–0.4 | 1.3–1.6 | 1.7–2.0
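The intervals in Table 3 admit a direct lookup. The sketch below is a stand-in for the paper's fuzzy inference: the interval endpoints come from the table, but the dictionary name, the midpoint rule, and the region labels collapsing the $\pm$ subcases are illustrative assumptions.

```python
# Hedged sketch: pick a scaling factor alpha from the Table 3 intervals.
# Taking the interval midpoint is a simple stand-in for defuzzification.
ALPHA_RANGES = {
    "G1": (0.5, 0.8),
    "G2": (0.9, 1.2),
    "G3": (0.0, 0.4),
    "G4": (1.3, 1.6),
    "G5": (1.7, 2.0),
}

def scaling_factor(region: str) -> float:
    lo, hi = ALPHA_RANGES[region]
    return 0.5 * (lo + hi)

print(scaling_factor("G5"))  # -> 1.85
```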
Table 4. The MSE of state trajectories produced by two distinct control methodologies.

Control Strategy | MSE of $x_1$ | MSE of $x_2$ | MSE of $x_3$ | MSE of $x_4$
Algorithm 1 | 0.257407572 | 0.250064179 | 0.081418276 | 0.150326006
HTMPC | 1.226454484 | 0.606667032 | 0.166021517 | 0.300930419
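The per-state values in Table 4 are ordinary mean-squared errors over the closed-loop trajectories. A minimal sketch of the computation, with hypothetical stand-in data in place of the logged experimental trajectories, could look as follows:

```python
# Sketch: per-state MSE of a logged trajectory against a reference.
# x_log and x_ref are hypothetical stand-ins for the experimental data.
import numpy as np

rng = np.random.default_rng(0)
T = 100
x_ref = np.zeros((T, 4))                   # regulation to the origin
x_log = 0.5 * rng.standard_normal((T, 4))  # placeholder closed-loop states

mse_per_state = np.mean((x_log - x_ref) ** 2, axis=0)
for i, mse in enumerate(mse_per_state, start=1):
    print(f"x{i}: MSE = {mse:.9f}")
```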
Table 5. Comparison of Algorithm 1 and the HTMPC for computational efficiency.

Control Strategy | N = 10 | N = 20 | N = 30 | N = 40 | N = 50
Algorithm 1 | 0.003938 s | 0.004592 s | 0.004823 s | 0.004967 s | 0.005094 s
HTMPC | 23.179 s | 27.674 s | 30.239 s | 38.098 s | 49.837 s

The table denotes the calculation time unit “second” as “s”.
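For reproducing timing comparisons like Table 5, per-step solve times can be averaged over a closed-loop run. The sketch below assumes a hypothetical controller object exposing solve and step methods; neither name comes from the paper.

```python
# Sketch: average online computation time per control step.
import time

def average_solve_time(controller, x0, steps: int) -> float:
    x, total = x0, 0.0
    for _ in range(steps):
        t0 = time.perf_counter()
        u = controller.solve(x)       # one online evaluation (DNN or MPC)
        total += time.perf_counter() - t0
        x = controller.step(x, u)     # hypothetical plant/state update
    return total / steps
```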
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
