CN118409560B

CN118409560B - Rotary wheel hydraulic servo position control method and system for steel cylinder necking machine

Info

Publication number: CN118409560B
Application number: CN202410874913.XA
Authority: CN
Inventors: 郑伟; 许益; 杨义胜; 陈键
Original assignee: Hangzhou Rebotech Co ltd
Current assignee: Hangzhou Rebotech Co ltd
Priority date: 2024-07-02
Filing date: 2024-07-02
Publication date: 2024-08-23
Anticipated expiration: 2044-07-02
Also published as: CN118409560A

Abstract

The invention discloses a rotary wheel hydraulic servo position control method and system for a steel cylinder necking machine, and relates to the technical field of steel cylinder necking. The method comprises: selecting the wall thickness deviation Δ_w and the pressing force F as input variables of fuzzy logic control; reading the fuzzy sets and fuzzy rules to infer a fuzzy value α(k)' of the correction factor α(k); reading the set state space and action space, and introducing the fuzzy value α(k)' as an action suggestion for the reinforcement learning algorithm; and introducing a discretized PID controller to control the hydraulic servo system. Through the fuzzy logic control system in step S1, the invention can intelligently process the two key variables of wall thickness deviation and pressing force. The reinforcement learning strategy introduced in step S2 considers not only the dynamic machining precision but also the machining efficiency. As a result, the obtained correction factor α(k) reflects the actual requirements of the machining process more comprehensively, and machining efficiency and precision are optimized together.

Description

Rotary wheel hydraulic servo position control method and system for steel cylinder necking machine

Technical Field

The invention relates to the technical field of steel cylinder necking, in particular to hydraulic servo system control for the rotary wheel template back cover process, and specifically to a rotary wheel hydraulic servo position control method and system for a steel cylinder necking machine.

Background

The steel cylinder back cover (bottom-closing) process comprises a flat template back cover process and a rotary wheel template back cover process. Flat template back cover equipment is simple, convenient to operate and efficient, but it performs poorly in bottom fusion and in controlling the concave bottom size. The rotary wheel template back cover process can be controlled automatically, reduces local defects at the bottom center, gives better fusion, and makes the concave bottom size easy to control, so it is generally preferred as the mainstream technology.

In the rotary wheel template back cover process, a seamless steel pipe of a given specification is heated and pressed in a closing machine (also called a back cover machine). Its basic mechanical construction and principle can be found in the following document: Jiang Ziliang. A steel cylinder hot-spinning hydraulic necking machine [P]. CN202021913840.4: 2021-05-18.

The process comprises the following steps:

S1, heating the pipe orifice of a seamless steel pipe to about 1100 ℃;

S2, moving the rotary wheel linearly to the pipe orifice so that it is in tangential contact with the pipe orifice;

S3, rotating the rotary wheel 90 degrees along the end face of the pipe orifice, so that the rotary wheel changes from tangential contact to vertical contact with the pipe orifice; during the rotation, the rotary wheel is pressed by a hydraulic servo system (driven by a hydraulic cylinder), and under the radial pressing of the rotary wheel shaft the bottom metal of the seamless steel pipe fuses together and forms a shape that is hemispherical on the outside and approximately planar on the inside with a navel-like center (hereinafter simply referred to as the longitudinal protruding part);

S4, resetting the rotary wheel and removing the bottom-sealed steel cylinder.

However, steel cylinders manufactured from tube by the rotary wheel template back cover process have some defects. Analysis identifies the following defect locations, as shown in fig. 2:

(1) Landing part (also called the grounding part): the shape and size of this part of the cylinder are limited by the spinning back cover process; the wall is not thickened enough, so the thickness is insufficient and the fatigue life is low. This thinner region is more susceptible to damage during use of the cylinder, particularly when the cylinder is frequently moved or subjected to impact.

(2) Transition section: the transition section of the steel cylinder is where stress is most concentrated. During the spinning back cover process the steel tube cannot be thickened here to the standard of the longitudinal protruding part, and the section does not blend smoothly into the cylinder body. Such structural discontinuity and stress concentration can reduce fatigue life and increase the risk of failure during use.

The defects arise as follows: in step S3, under the radial pressing of the rotary wheel on the pipe orifice, the center of the cylinder bottom forms the longitudinal protruding part shown in fig. 2, so that the wall thickness of the cylinder bottom deviates at the transition section and at the grounding part; where this wall thickness deviation leads to a relatively low temperature, improper relative force application by the hydraulic servo system on the rotary wheel of the bottom-closing machine produces cracks, slag inclusions and bubbles on the inner wall of the steel cylinder. The deeper mechanism can be found in the following documents and is not repeated here:

Wu Chuanxiao. Structural design and performance test of drawn cold-spun high pressure seamless steel cylinder [D]. Zhejiang University of Technology, 2019;

The reasons and countermeasures for the generation of the longitudinal grooves of the inner container of the vehicle compressed natural gas steel cylinder [J]. Petroleum and Chemical Equipment, 2013, 16(5). doi: CNKI:SUN:HSFF.0.2013-05-027.

To address the above drawbacks, some prior art has made improvements, for example:

One prior art discloses a bottom-closing machine with adjustable gas cylinder head geometry (Fan Yingjun, Guo Shijin, Li Jiushi, et al. A high-pressure seamless large-caliber steel cylinder spinning closing machine [P]. CN 02287580.8: 2004-11-17). By adjusting the relative position of the forming template and the blank, it can improve the shape of the grounding part and the transition section, but it does not fundamentally address improper relative force application, so inner wall stress problems can still occur;

Another prior art discloses a specially configured bottom-closing machine (Wang Zhonghua, Bai Yanci. A cylinder closing device [P]. CN 98114087.4: 2000-06-14). It uses a rotary wheel with a curved surface and an arc angle, which can improve the shape of the grounding part and the transition section, but it likewise does not address improper relative force application, so inner wall stress problems can still occur;

Another prior art discloses a secondary forming process based on the rotary wheel template back cover process (Li Linyu, Li Baisheng. Seamless steel cylinder automatic closing-up molding equipment [P]. CN 201811515063.5: 2019-02-12). It preforms the bottle opening of the steel cylinder workpiece with a closing-up scraper die and then performs secondary forming of the bottle opening shoulder with a forming rotary wheel, which improves the stress at the end of the bottle opening and gives a tidy blank; however, it likewise does not address improper relative force application, so inner wall stress problems can still occur.

Therefore, the invention provides a method and a system for controlling the hydraulic servo position of a rotary wheel for a steel cylinder necking machine.

Disclosure of Invention

In view of this, the embodiment of the invention hopes to provide a method and a system for controlling the hydraulic servo position of a spinning wheel for a steel cylinder necking machine; the technical scheme of the invention is realized as follows:

in the first aspect, a rotary wheel hydraulic servo position control method for a steel cylinder necking machine comprises the following steps:

(I) Summary:

The invention aims to solve the following technical problem: given the specific construction of the rotary wheel template back cover process, the longitudinal protruding part is necessarily formed, and eliminating it is not realistic. The hydraulic servo system can, however, be controlled intelligently so that the rotary wheel adaptively adjusts its pressing force according to the wall thickness deviation. This reduces the possibility of cracks, slag inclusions and bubbles on the inner wall of the steel cylinder, which in turn alleviates stress concentration at the landing part and the transition section and extends the service life of the steel cylinder.

The invention combines fuzzy logic control with a reinforcement learning strategy: fuzzy logic control provides a quick and robust preliminary response, and reinforcement learning fine-tunes and optimizes on this basis. This combination enables the control system both to use the prior knowledge of experts and to learn and adapt by itself.

(II) an improvement idea:

The hydraulic servo system (hydraulic cylinder) of a traditional bottom-closing machine is controlled by a PLC program. On this basis, a continuous PID controller algorithm can be introduced into the hydraulic servo system:

$$u(t) = K_p e(t) + K_i \int_0^t e(\tau)\,d\tau + K_d \frac{de(t)}{dt}$$

Wherein: u(t) is the output electrical signal of the controller at time t; e(t) is the deviation of the set value from the actual value, i.e. e(t) = r(t) - y(t), where r(t) is the set value and y(t) is the actual value.

e(τ) denotes the deviation at a given moment τ, that is, the difference between the set value (desired value) and the actual value at that moment. τ is the integration variable (a point in time) representing any time from the initial time to the current time t in the integral term.

K_p, K_i and K_d are the gains of the proportional, integral and derivative terms, respectively. The proportional term K_p e(t) responds immediately to a deviation: the greater the deviation, the stronger the control action. The integral term K_i ∫ e(τ) dτ eliminates steady-state error by accumulating past deviations and increasing the control effort accordingly. The derivative term K_d de(t)/dt anticipates the trend of the deviation and acts in advance to reduce overshoot and improve the stability of the system.

2.1 Discretized PID controller:

The longitudinal protruding part is formed step by step under the action of rotary wheel axle radial pressurization; therefore, discretization processing is required for the PID controller:

$$u(k) = K_p e(k) + K_i \sum_{j=0}^{k} e(j) + K_d \left[ e(k) - e(k-1) \right]$$

Wherein: u(k) is the controller output at sampling instant k; e(k) is the deviation at sampling instant k; e(k-1) is the deviation at sampling instant k-1.

e(j) represents the deviation at sampling instant j. With r(j) the set value (expected value) at sampling instant j and y(j) the actual value at the same instant, the deviation e(j) is defined as e(j) = r(j) - y(j).

Discrete PID controllers are implemented in digital systems where time is divided into discrete sample points rather than being continuous. At these sampling points, the system will read the actual value and calculate the deviation from the set point. Thus in practice, e (j) is the deviation between the set value and the actual value at a particular sampling instant j, which plays a key role in the discrete PID control algorithm, helping the system adjust the output to reduce or eliminate such deviation.
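As a minimal sketch of the discretized controller described above, the following Python class keeps a running sum of deviations and the previous deviation. It assumes the positional form u(k) = K_p e(k) + K_i Σ e(j) + K_d [e(k) - e(k-1)] with the sampling period absorbed into the gains; the class name `DiscretePID`, the zero initial deviation and the gain values are illustrative assumptions, not values from the scheme.

```python
from dataclasses import dataclass


@dataclass
class DiscretePID:
    """Positional discrete PID (hypothetical helper, not the patented code)."""
    kp: float
    ki: float
    kd: float
    error_sum: float = 0.0   # running sum of e(j) for j = 0..k
    prev_error: float = 0.0  # e(k-1); assumed to be 0 before the first sample

    def step(self, setpoint: float, measured: float) -> float:
        error = setpoint - measured              # e(k) = r(k) - y(k)
        self.error_sum += error                  # accumulate the integral term
        output = (self.kp * error
                  + self.ki * self.error_sum
                  + self.kd * (error - self.prev_error))
        self.prev_error = error                  # becomes e(k-1) next sample
        return output


# Illustrative gains and readings only.
pid = DiscretePID(kp=2.0, ki=0.5, kd=1.0)
u0 = pid.step(setpoint=10.0, measured=8.0)   # e(0) = 2
u1 = pid.step(setpoint=10.0, measured=9.0)   # e(1) = 1
```

Keeping the sum and the previous deviation as state means each sample costs constant time, which suits a controller that is polled at a fixed rate.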

2.2 Intelligent discrete PID controller:

In order to dynamically adjust the pressing force of the spinning wheel, a correction term which dynamically changes along with the discrete step length is introduced. The modified discrete PID controller expression is:

$$u(k) = \alpha(k) \left\{ K_p e(k) + K_i \sum_{j=0}^{k} e(j) + K_d \left[ e(k) - e(k-1) \right] \right\}$$

Where α(k) is a time-varying correction factor (it varies with the sampling instant k) that dynamically adjusts the output u(k) of the PID controller. In other words, it dynamically scales the output of the whole PID controller. The correction factor α(k) is updated at each sampling instant k, thereby adjusting the output of the PID controller in real time.

By varying the value of α (k), the output of the PID controller can be directly scaled up or down. For example when α (k) >1, the output of the controller will be amplified; when 0< alpha (k) <1, the output will be scaled down; and when α (k) =0, the controller output is zero, i.e., no control action is performed.

This allows the hydraulic servo system to be adjusted dynamically according to the current relative wall thickness and relative temperature difference so as to optimize the control effect. For example, when a large wall thickness deviation is detected, α(k) may be increased to strengthen the control action; otherwise, α(k) may be reduced to avoid over-control.
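The scaling effect of the correction factor can be illustrated with a short Python sketch. It assumes the scaled form u(k) = α(k) · [K_p e(k) + K_i Σ e(j) + K_d (e(k) - e(k-1))]; the function name `scaled_output`, the gains and the deviation history are hypothetical values chosen only to show the three regimes α(k) > 1, 0 < α(k) < 1 and α(k) = 0.

```python
def scaled_output(alpha_k, kp, ki, kd, errors):
    """Return alpha(k) times the discrete PID output for a deviation history.

    `errors` holds e(0)..e(k); e(k-1) is assumed to be 0 when only one
    sample exists. All names here are illustrative assumptions.
    """
    e_k = errors[-1]
    e_prev = errors[-2] if len(errors) > 1 else 0.0
    base = kp * e_k + ki * sum(errors) + kd * (e_k - e_prev)
    return alpha_k * base


history = [2.0, 1.0]                                       # e(0), e(1)
amplified = scaled_output(1.5, 2.0, 0.5, 1.0, history)     # alpha(k) > 1
attenuated = scaled_output(0.5, 2.0, 0.5, 1.0, history)    # 0 < alpha(k) < 1
disabled = scaled_output(0.0, 2.0, 0.5, 1.0, history)      # alpha(k) = 0
```

With these numbers the unscaled PID output is 2.5, so the three calls give an amplified, an attenuated and a zero control signal respectively.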

The core technical question of the scheme is therefore how to dynamically assign the correction factor α(k).

(III) technical scheme:

To assign the correction factor α(k) dynamically, the scheme first uses fuzzy logic control to select a preliminary, expert-knowledge-based dynamic adjustment strategy for α(k). This ensures that a reasonable control strategy is available when the hydraulic servo system starts operating. The strategy is then further optimized with a reinforcement learning algorithm, which adjusts the value of α(k) based on the real-time state and performance feedback of the system so as to maximize the closing-in efficiency reward function. In this way, the hydraulic servo system gradually learns how to adjust α(k) more precisely in different states. The specific operation comprises the following steps S1-S3.

3.1 Step S1, reading the fuzzy logic control system:

Select the wall thickness deviation Δ_w and the pressing force F as input variables of fuzzy logic control (the two values are obtained from the existing steel cylinder design drawing and from the hydraulic cylinder execution parameters, respectively); read the fuzzy sets and fuzzy rules to infer a fuzzy value α(k)' of the correction factor α(k). This comprises the following steps S100-S102.

3.1.1 Step S100, reading fuzzy sets:

Fuzzy sets "small S", "medium M" and "large L" are defined for the wall thickness deviation Δ_w and the pressing force F, and a membership function is assigned to each set.

3.1.1.1 The membership function of the input variable wall thickness deviation Δ_w is:

For the membership function μ_S(Δ_w) of small S, a Gaussian membership function of the following form is used:

$$\mu_S(\Delta_w) = e^{-\frac{(\Delta_w - c_S)^2}{2\sigma_S^2}}$$

Similar forms are used for the membership functions of medium M and large L, with the center value and width parameter adjusted accordingly to reflect the different set ranges.

The membership function μ_M(Δ_w) of medium M is:

$$\mu_M(\Delta_w) = e^{-\frac{(\Delta_w - c_M)^2}{2\sigma_M^2}}$$

The membership function μ_L(Δ_w) of large L is:

$$\mu_L(\Delta_w) = e^{-\frac{(\Delta_w - c_L)^2}{2\sigma_L^2}}$$

Wherein c_S is the center value of the set small S, c_M is the center value of the set medium M, and c_L is the center value of the set large L; σ_S², σ_M² and σ_L² are the width parameters of the corresponding sets, which determine the shape of the corresponding membership functions; e is the base of the natural logarithm.

3.1.1.2 The membership function of the input variable pressing force F is:

The membership function μ_S(F) of small S is:

$$\mu_S(F) = e^{-\frac{(F - c_{S,F})^2}{2\sigma_{S,F}^2}}$$

Wherein c_{S,F} is the center value of the set small S of the pressing force, representing what is considered a typical "small" pressing force; σ_{S,F} is the width parameter of the set, which determines the coverage of the "small" fuzzy set.

The membership function μ_M(F) of medium M is:

$$\mu_M(F) = e^{-\frac{(F - c_{M,F})^2}{2\sigma_{M,F}^2}}$$

Wherein c_{M,F} is the center value of the set medium M, representing what is considered a typical "medium" pressing force; σ_{M,F} is the corresponding width parameter.

The membership function μ_L(F) of large L is:

$$\mu_L(F) = e^{-\frac{(F - c_{L,F})^2}{2\sigma_{L,F}^2}}$$

Wherein c_{L,F} is the center value of the set large L, representing what is considered a typical "large" pressing force; σ_{L,F} is the width parameter of the set.
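The fuzzification of step S100 can be sketched in Python as follows. This is only an illustration: it assumes the common Gaussian form exp(-(x - c)² / (2σ²)), and the center and width values in `WALL_SETS` are hypothetical placeholders, since the scheme does not state numeric parameters here.

```python
import math


def gaussian_membership(x, center, sigma):
    """Gaussian membership degree of x for a set with the given center/width."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


# Hypothetical (center, width) pairs for the wall thickness deviation sets.
WALL_SETS = {"S": (0.0, 0.5), "M": (1.0, 0.5), "L": (2.0, 0.5)}


def fuzzify(x, sets):
    """Map a crisp input to its membership degree in every fuzzy set."""
    return {label: gaussian_membership(x, c, s) for label, (c, s) in sets.items()}


# A deviation sitting exactly on the "medium" center.
degrees = fuzzify(1.0, WALL_SETS)
```

The same `fuzzify` helper can be reused for the pressing force F by passing a second dictionary of (center, width) pairs.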

3.1.2 Step S101, fuzzy reasoning:

Based on the established fuzzy rule base, a fuzzy inference engine infers the fuzzy set FS of the fuzzy value α(k)' of the correction factor α(k) from the real-time values of the input wall thickness deviation Δ_w and the pressing force F.

The fuzzy inference engine uses a series of IF-THEN rules that define the relationship between inputs and outputs. For each rule R _i:

$$R_i:\ \text{IF } \Delta_w \text{ is } A_i \text{ AND } F \text{ is } B_i \text{ THEN } \alpha(k)' \text{ is } C_i$$

Wherein A_i, B_i and C_i are the fuzzy sets of rule R_i, and μ_{A_i}(Δ_w), μ_{B_i}(F) and μ_{C_i}(α(k)') are the membership functions of the corresponding fuzzy sets of the inputs Δ_w and F and of the output α(k)', respectively.

According to the activation degree of every fuzzy rule (i.e. the result of the logical AND operation on the input membership degrees), the fuzzy inference engine outputs the final fuzzy set FS of the fuzzy value α(k)' by a weighted average aggregation method.
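The rule evaluation can be sketched as below. Several points are assumptions made for illustration only: the logical AND is taken as the minimum of the two input membership degrees, each rule consequent is represented by a single representative value of α(k)', and the rule table `RULES` and all center/width values are hypothetical. The function returns the discrete fuzzy set FS as (value, activation) pairs, leaving the weighted average to the defuzzification step.

```python
import math


def gauss(x, center, sigma):
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


# Hypothetical (center, width) parameters for both inputs.
WALL = {"S": (0.0, 0.5), "M": (1.0, 0.5), "L": (2.0, 0.5)}
FORCE = {"S": (10.0, 5.0), "M": (20.0, 5.0), "L": (30.0, 5.0)}

# Hypothetical rule base: (wall label, force label) -> representative alpha'.
RULES = {
    ("S", "S"): 0.9, ("S", "M"): 0.8, ("S", "L"): 0.7,
    ("M", "S"): 1.1, ("M", "M"): 1.0, ("M", "L"): 0.9,
    ("L", "S"): 1.3, ("L", "M"): 1.2, ("L", "L"): 1.1,
}


def infer(delta_w, force):
    """Return FS as a list of (alpha' value, rule activation) pairs."""
    fuzzy_set = []
    for (wall_label, force_label), alpha_value in RULES.items():
        # Activation = AND (min) of the two input membership degrees.
        activation = min(gauss(delta_w, *WALL[wall_label]),
                         gauss(force, *FORCE[force_label]))
        fuzzy_set.append((alpha_value, activation))
    return fuzzy_set


fs = infer(1.0, 20.0)   # both inputs sit on their "medium" centers
```

Using the product instead of the minimum for the AND operation is an equally common choice; the sketch would only change in the `activation` line.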

3.1.3 Step S102, defuzzification:

The final fuzzy set FS of the fuzzy value α(k)' is defuzzified to obtain a crisp fuzzy value α(k)'.

There are various defuzzification methods, including the maximum membership method and the center-of-gravity method (also referred to as the centroid method or area center method). The scheme takes the center-of-gravity method as an example:

$$\alpha(k)' = \frac{\int_a^b x\,\mu_F(x)\,dx}{\int_a^b \mu_F(x)\,dx}$$

where μ_F(x) represents the membership function of the fuzzy set FS, and a and b are the bounds of the domain of the fuzzy set.

But based on the discretized form set forth in section 2.2 above, and the form required in step S3 below, the fuzzy set should be made up of a series of discrete points, so the integration formula described above needs to be further converted into a summed form. Let the fuzzy set FS be composed of n discrete points, each with a membership μ _i and a corresponding value x _i, the discretized centroid formula may be expressed as:

$$\alpha(k)' = \frac{\sum_{i=1}^{n} \mu_i x_i}{\sum_{i=1}^{n} \mu_i}$$

By the above expression, the fuzzy set FS of fuzzy values α (k)' can be defuzzified to a specific numerical value.
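The discretized center-of-gravity computation is a membership-weighted average of the discrete points, which can be sketched as follows. The function name `centroid` and the sample points are illustrative assumptions; a zero total membership is treated as an error because the ratio is undefined in that case.

```python
def centroid(points):
    """Discrete center of gravity: sum(mu_i * x_i) / sum(mu_i).

    `points` is an iterable of (x_i, mu_i) pairs describing the fuzzy set FS.
    """
    points = list(points)
    total_membership = sum(mu for _, mu in points)
    if total_membership == 0:
        raise ValueError("fuzzy set has zero total membership")
    return sum(mu * x for x, mu in points) / total_membership


# Hypothetical fuzzy set: three candidate values with their memberships.
fs_points = [(0.8, 0.2), (1.0, 0.6), (1.2, 0.2)]
alpha_prime = centroid(fs_points)
```

Because the memberships here are symmetric around 1.0, the defuzzified value lands on the middle candidate.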

It will be appreciated that the fuzzy value α(k)' obtained at this point could already be used as the correction factor α(k) described in 2.2 above, that is, it could be substituted directly into step S3 for the position control of the steel cylinder necking machine. However, this fuzzy value α(k)' takes only the precision factor into account and does not consider the actual machining efficiency. Therefore, the subsequent step S2 refines the fuzzy value α(k)' further so that both machining efficiency and machining precision are taken into account.

3.2 Step S2, introducing a reinforcement learning algorithm:

Read the set state space and action space, and maximize the closing-in efficiency through a reward function. The fuzzy value α(k)' is introduced as an action suggestion for the reinforcement learning algorithm, which dynamically outputs the correction factor α(k); the reward obtained after the action is executed and the change of the system state are then observed. This comprises the following steps S200-S203.

3.2.1 Step S200, state space and action space:

1) State space S: comprises the real-time wall thickness deviation d and the real-time pressing force f:

$$S = \{ (d, f) \mid d \in D,\ f \in F \}$$

wherein D is the set of all possible values of the wall thickness deviation, and F is the set of all possible values of the pressing force.

2) Action space A: defined as the discretized value range of the correction factor α(k):

A = {α₁(k), α₂(k),...,α_n(k)}；

Where α _i (k) is the i-th discrete value that the correction factor α (k) may take at the k-th time instant, and n is the total number of discrete values.
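A possible way to build these spaces in code is sketched below. The action space is an evenly spaced grid of n candidate values of α(k); the state (d, f) is mapped to a pair of bin indices so that it can index a tabular value function. The even spacing, the binning of the state, and all ranges and counts are assumptions for illustration, not values given by the scheme.

```python
def build_action_space(alpha_min, alpha_max, n):
    """Return n evenly spaced candidate values for alpha(k) (n >= 2)."""
    step = (alpha_max - alpha_min) / (n - 1)
    return [alpha_min + i * step for i in range(n)]


def discretize(value, low, high, bins):
    """Map a continuous reading to a bin index in [0, bins - 1]."""
    clipped = min(max(value, low), high)          # keep the reading in range
    index = int((clipped - low) / (high - low) * bins)
    return min(index, bins - 1)                   # the upper bound joins the last bin


def make_state(d, f):
    """Hypothetical state encoding: (wall deviation bin, pressing force bin)."""
    return (discretize(d, 0.0, 2.0, 4), discretize(f, 0.0, 40.0, 4))


actions = build_action_space(0.5, 1.5, 5)
state = make_state(1.0, 40.0)
```

A finer grid gives the agent more precise choices of α(k) at the cost of a larger table to learn.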

3.2.2 Step S201, reward function:

A positive reward is given when the action executed by the hydraulic servo system improves the closing-in efficiency, and a negative reward is given when the closing-in efficiency decreases; a reward factor r is generated accordingly.

Let the current closing-in efficiency be E_current and the previous closing-in efficiency be E_previous; the change ΔE in closing-in efficiency is then:

ΔE = E_current - E_previous;

Based on this change, the reward factor r is defined as a positive and negative reward mechanism:

$$r = \begin{cases} \text{positive reward}, & \Delta E > 0 \\ \text{negative reward}, & \Delta E < 0 \end{cases}$$
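A minimal sketch of such a sign-based reward follows. The reward magnitude of 1 and the neutral reward of 0 when the efficiency is unchanged are assumptions made for illustration; the text only fixes the sign of the reward.

```python
def reward_factor(e_current, e_previous, magnitude=1.0):
    """Return a positive reward if efficiency rose, negative if it fell.

    `magnitude` and the zero reward for an unchanged efficiency are
    illustrative assumptions.
    """
    delta_e = e_current - e_previous      # delta E = E_current - E_previous
    if delta_e > 0:
        return magnitude
    if delta_e < 0:
        return -magnitude
    return 0.0


r_up = reward_factor(0.82, 0.80)      # efficiency improved
r_down = reward_factor(0.78, 0.80)    # efficiency dropped
r_same = reward_factor(0.80, 0.80)    # no change
```

A reward proportional to ΔE would be another reasonable design; the sign-only version keeps the learning signal insensitive to the scale of the efficiency measure.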

3.2.3 Step S202, execute the SARSA algorithm:

The parameters and the value function of the algorithm are initialized. The SARSA algorithm combines the fuzzy value α(k)', the current efficiency state of the hydraulic servo system and the reward factor r, and selects and outputs the actual value of the correction factor α(k). That is, the SARSA algorithm selects the action to actually perform based on the action suggestion and the current state. This comprises the following steps S2020-S2022.

3.2.3.1 Step S2020, combination with fuzzy logic control:

When selecting the action to actually perform, the SARSA algorithm considers the fuzzy value α(k)' output by the fuzzy logic control and the current efficiency state of the hydraulic servo system. This can be expressed as selecting an action a_t based on the current state s_t, the suggestion given by the fuzzy value α(k)', and the value function Q(s_t, a) of the SARSA algorithm. The action a_t is selected with the ε-greedy policy:

$$a_t = \begin{cases} \arg\max_{a} Q(s_t, a), & \text{with probability } 1 - \varepsilon \\ \text{a random action from } A, & \text{with probability } \varepsilon \end{cases}$$

Wherein ε is a small positive number representing the probability of exploration. In practice, the fuzzy value α(k)' serves as a reference or weight in the selection of action a_t. a is a specific value of the correction factor α(k), selected by the agent based on the current state and environmental feedback.

More specifically, the agent chooses an action a_t based on the current state s_t and previous experience (embodied by the value function Q(s, a)), with the aim of maximizing the long-term cumulative reward.

In selecting actions, the agent uses the ε-greedy strategy described above: in most cases it selects the action currently considered optimal (i.e., the action with the highest Q value, the output of the value function Q(s_t, a)), but with a small probability ε it selects an action at random in order to explore possibly better choices. This strategy balances exploiting currently known information against exploring unknown options.
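The ε-greedy selection can be sketched as follows. How the fuzzy value α(k)' enters the choice is not spelled out in detail, so this sketch makes one explicit assumption: when several actions share the highest Q value, the one closest to α(k)' is preferred. The function name and all numeric values are hypothetical.

```python
import random


def epsilon_greedy(q_row, actions, alpha_prime, epsilon, rng):
    """Return the index of the chosen action.

    q_row[i] is Q(s_t, actions[i]). With probability epsilon a random action
    is explored; otherwise the highest-valued action is exploited, with ties
    broken toward the fuzzy suggestion alpha_prime (an assumption).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(actions))                     # explore
    best_value = max(q_row)
    candidates = [i for i, q in enumerate(q_row) if q == best_value]
    return min(candidates, key=lambda i: abs(actions[i] - alpha_prime))


rng = random.Random(0)                  # seeded only to make the sketch repeatable
actions = [0.5, 1.0, 1.5]
untrained = epsilon_greedy([0.0, 0.0, 0.0], actions, 1.1, 0.0, rng)
trained = epsilon_greedy([0.0, 0.0, 2.0], actions, 1.1, 0.0, rng)
explored = epsilon_greedy([0.0, 0.0, 2.0], actions, 1.1, 1.0, rng)
```

With an all-zero value table the tie-break makes the agent start from the fuzzy suggestion, which matches the idea of fuzzy logic supplying the initial strategy before learning takes over.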

3.2.3.2 Step S2021, action is performed and rewards are observed:

The correction factor α(k) is adjusted according to the action selected by the SARSA algorithm. After the selected action a_t (i.e., the adjustment of the correction factor α(k)) is performed, the newly obtained reward factor r' and the new efficiency state s_{t+1} of the hydraulic servo system are observed. This process is the reaction of the environment to the action, and the quality of the action is fed back through the reward factor.

3.2.3.3 Step S2022, updating the reinforcement learning algorithm:

The value function Q(s_t, a) in the SARSA algorithm is updated based on the observed new reward factor r' and the new state s_{t+1}. The update follows the SARSA update rule:

$$Q(s_t, a_t)' = Q(s_t, a_t) + \eta \left[ r' + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$

Wherein: η is the learning rate, which controls the step size of the update; γ is the discount factor, which weighs the importance of future rewards; a_{t+1} is the next action, selected in the new state s_{t+1} according to the value function and the ε-greedy policy; and Q(s_t, a_t)' is the updated value function.
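The standard tabular SARSA update can be sketched as below. The table is a dictionary keyed by (state, action) pairs that defaults to zero; the state encoding, the action indices and the values of η and γ are hypothetical.

```python
from collections import defaultdict


def sarsa_update(q, s, a, reward, s_next, a_next, eta, gamma):
    """Apply Q(s,a) <- Q(s,a) + eta * [r + gamma * Q(s',a') - Q(s,a)]."""
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += eta * (td_target - q[(s, a)])
    return q[(s, a)]


q_table = defaultdict(float)            # unseen (state, action) pairs start at 0
q_table[((1, 1), 2)] = 0.5              # pretend the next pair was visited before
updated = sarsa_update(q_table, s=(0, 1), a=1, reward=1.0,
                       s_next=(1, 1), a_next=2, eta=0.1, gamma=0.9)
```

The update uses the action a_{t+1} that will actually be taken next, which is what makes SARSA an on-policy method.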

3.2.4 Step S203, obtain correction factor α (k):

The value function Q(s_t, a) in step S202 estimates the expected return of performing action a in state s_t. Each action a corresponds to a different value of the correction factor α(k). Based on the updated value function Q(s_t, a)', the correction factor α(k) is selected by the following strategy:

When selecting action a (i.e., the correction factor α(k)), an ε-greedy strategy is employed to balance "exploitation" of the best action currently known against "exploration" of new actions that may be better. Namely, with probability 1 - ε the action a (correction factor α(k)) with the largest value function in the current state is selected, and with probability ε an action is selected at random:

$$\alpha(k) = \begin{cases} \arg\max_{a} Q(s_t, a), & \text{with probability } 1 - \varepsilon \\ \text{a random action from } A, & \text{with probability } \varepsilon \end{cases}$$

wherein s_t is the current state;

In the above expression, the random branch selects an action at random from the defined action space A (i.e., the discrete value range of the correction factor α(k)), and argmax_a Q(s_t, a) denotes selecting the action a with the largest Q(s_t, a).

The correction factor alpha (k) is thus selected from a predefined action space based on the value function and the current strategy. The value function provides the expected return information for selecting different actions in different states, and the policy defines how actions are selected based on this information.

3.3 Step S3, executing the PID controller:

As described in 2.2 above, a discretized PID controller is introduced to control the hydraulic servo system, and the electrical signal of the hydraulic servo system is dynamically scaled by the obtained correction factor α(k):

$$u(k) = \alpha(k) \left\{ K_p e(k) + K_i \sum_{j=0}^{k} e(j) + K_d \left[ e(k) - e(k-1) \right] \right\}$$

Where u(k) is the controller output (electrical signal) at sampling instant k; e(k) is the deviation at sampling instant k; e(k-1) is the deviation at sampling instant k-1; e(j) represents the deviation at sampling instant j. With r(j) the set value (expected value) at sampling instant j and y(j) the actual value at the same instant, the deviation e(j) is defined as e(j) = r(j) - y(j); K_p, K_i and K_d are the gains of the proportional, integral and derivative terms, respectively.

The correction factor α(k) is obtained from the fuzzy value α(k)', which the fuzzy logic control system outputs in step S1 in view of the dynamic changes of the wall thickness deviation Δ_w and the pressing force F, combined with the reinforcement learning strategy for machining efficiency in step S2. This correction factor α(k) therefore serves both machining efficiency and machining accuracy.

(IV) a mechanism for solving the technical problems:

4.1 Intelligent control System:

The hydraulic servo system is used as the core driving force of the steel cylinder necking machine, and the movement of the rotary wheel is accurately controlled through the hydraulic servo valve. By introducing a fuzzy logic control and reinforcement learning strategy, the system can intelligently calculate the correction factor alpha (k) according to the wall thickness deviation and the real-time data of the pressing force, so as to adjust the output of the hydraulic servo system and realize the accurate control of the pressing force of the rotating wheel.

4.2 Self-adaptive intelligent regulation:

According to the feedback data, the control system dynamically adjusts the pressing force of the rotary wheel using fuzzy logic control and the reinforcement learning algorithm, which supports both accuracy and efficiency during processing. This adaptive adjustment mechanism lets the rotary wheel adjust its pressing force in real time according to the wall thickness deviation of different parts, avoiding both excessive and insufficient pressing.

4.3 Reducing the defective rate:

By precisely controlling the pressing force of the rotary wheel, the method can remarkably reduce the possibility of cracks, slag inclusion and bubbles on the inner wall of the steel cylinder. Cracks, slag inclusions and bubbles are common defects in the manufacturing process of steel cylinders, and seriously affect the quality and safety of products. The intelligent control method greatly reduces the probability of generating the defects through real-time monitoring and adjustment, thereby improving the yield of products.

4.4 Optimizing stress distribution and enhancing service life:

the landing part and the transition part of the steel cylinder are key areas for stress concentration. The stress distribution of the areas can be optimized and the stress concentration phenomenon can be reduced by intelligently controlling the pressing force of the spinning wheel. The optimized stress distribution not only improves the structural strength of the steel cylinder, but also effectively prolongs the service life of the steel cylinder. In the long-term use process, the steel cylinder can better resist the change of internal and external pressure, and the damage risk caused by stress concentration is reduced.

In the second aspect, a rotary wheel hydraulic servo position control system for a steel cylinder necking machine is provided:

As shown in fig. 3, the system includes a processor and a memory connected to the processor. The memory stores program instructions that, when executed by the processor, cause the processor to execute the servo position control method described above and generate the correction factor α(k), which is then read by a PLC controller to control the rotary wheel hydraulic servo system.

Compared with the prior art, the invention has the beneficial effects that:

1. Intelligent self-adaptive adjustment: the invention can intelligently process two key variables of wall thickness deviation and pressing force through the fuzzy logic control system in the step S1. The reinforcement learning strategy introduced in step S2 not only considers the dynamic machining precision, but also considers the machining efficiency. By the method, the obtained correction factor alpha (k) can reflect the actual requirements in the machining process more comprehensively, and double optimization of machining efficiency and precision is realized.

2. Accurate real-time control: the invention uses a discretized PID controller algorithm to intelligently control the hydraulic servo system. The control mode can accurately adjust the pressing force according to the real-time wall thickness deviation data through the correction factors, so that the rotary wheel is ensured to be always kept in the optimal state in the processing process.

3. The defective rate is reduced: the invention can obviously reduce the possibility of generating cracks, slag inclusion and bubbles on the inner wall of the steel cylinder through intelligent self-adaptive adjustment. Not only improves the product quality, but also reduces the defective rate, thereby reducing the production cost. Meanwhile, by accurately controlling the pressing force of the rotary wheel, the invention can optimize the stress distribution of the grounding part and the transition section part of the steel cylinder and reduce the phenomenon of stress concentration. This helps to increase the structural strength and service life of the cylinder.

4. Improved robustness of the production flow: the method combines several advanced control strategies, so that the whole production flow is more robust to external interference and internal change. Stable processing quality and efficiency can be maintained even in a complex production environment.

Drawings

In order to illustrate the embodiments of the application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the application, and a person skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic flow chart of the method of the present invention.

FIG. 2 is a schematic view of the bottom half-section of a cylinder.

FIG. 3 is a schematic diagram of the system components of the present invention.

FIG. 4 is a schematic diagram of the overall stress simulation of the experimental group and the control group of the test example.

FIG. 5 is a schematic diagram of the stress simulation at the landing sites of the experimental group and the control group of the test example.

FIG. 6 is a schematic diagram of the stress simulation at the transition sections of the experimental group and the control group of the test example.

Detailed Description

In order that the above objects, features and advantages of the invention can be readily understood, the invention is described in more detail below with reference to the accompanying drawings. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. The invention may, however, be embodied in many forms other than those described herein, and those skilled in the art may make similar modifications without departing from the spirit of the invention; the invention is therefore not limited to the specific embodiments disclosed below.

It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be understood by reference to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is kept brief, and the relevant points can be found in the description of the method.

Embodiment one: as shown in FIG. 1, this embodiment discloses a rotary wheel hydraulic servo position control method for a steel cylinder necking machine. It realizes intelligent control of the hydraulic servo system so that the rotary wheel adaptively adjusts its pressing force according to the wall thickness deviation. This reduces the possibility of cracks, slag inclusions and bubbles forming on the inner wall of the steel cylinder, further reduces stress concentration at the grounding part and the transition section of the steel cylinder, and prolongs its service life. The scheme includes the following steps S1 to S3.

In the present embodiment, regarding step S1, the fuzzy logic control system is read: the wall thickness deviation Δ_w and the pressing force F are selected as the input variables of fuzzy logic control. The pressing force F can be obtained from the execution parameters of the hydraulic cylinder; the wall thickness deviation Δ_w is determined from the design file as the difference between the longitudinal projection shown in FIG. 2 and the other wall thickness. The fuzzy sets and fuzzy rules are then read to infer a fuzzy value α(k)' of the correction factor α(k). Step S1 includes the following steps S100 to S102.

Specifically, in step S100, the fuzzy sets are read: the fuzzy sets "small S", "medium M" and "large L" are defined for the input variables wall thickness deviation Δ_w and pressing force F, and the corresponding membership functions are specified. These fuzzy sets and membership functions are used in the subsequent fuzzy logic reasoning to achieve intelligent control of the rotary wheel pressing force. Wherein:

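1) The membership functions of the input variable wall thickness deviation Δ_w, each with a center value c and a width parameter σ, are:

The membership function μ_S(Δ_w) for small S is:

$$\mu_S(\Delta_w) = \exp\left(-\frac{(\Delta_w - c_{w,S})^2}{2\sigma_{w,S}^2}\right);$$

The membership function μ_M(Δ_w) for medium M is:

$$\mu_M(\Delta_w) = \exp\left(-\frac{(\Delta_w - c_{w,M})^2}{2\sigma_{w,M}^2}\right);$$

The membership function μ_L(Δ_w) for large L is:

$$\mu_L(\Delta_w) = \exp\left(-\frac{(\Delta_w - c_{w,L})^2}{2\sigma_{w,L}^2}\right);$$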

It will be appreciated that the membership functions above are Gaussian membership functions, which describe smoothly the degree to which a variable belongs to a fuzzy set. The pressing force F is treated in the same way as the wall thickness deviation Δ_w: Gaussian membership functions are likewise assigned to its small S, medium M and large L sets, with center values and width parameters set according to the actual range and typical values of the pressing force:

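2) The membership functions of the pressing force F, each with a center value c and a width parameter σ, are:

The membership function μ_S(F) for small S is:

$$\mu_S(F) = \exp\left(-\frac{(F - c_{F,S})^2}{2\sigma_{F,S}^2}\right);$$

The membership function μ_M(F) for medium M is:

$$\mu_M(F) = \exp\left(-\frac{(F - c_{F,M})^2}{2\sigma_{F,M}^2}\right);$$

The membership function μ_L(F) for large L is:

$$\mu_L(F) = \exp\left(-\frac{(F - c_{F,L})^2}{2\sigma_{F,L}^2}\right);$$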

It will be appreciated that defining fuzzy sets and membership functions allows the uncertainty of the input variables to be described and handled more flexibly. This flexibility enables the hydraulic system to adapt better to complex situations such as small changes in the wall thickness of the steel cylinder or fluctuations in the pressing force. The membership functions also provide the basis for the subsequent fuzzy logic reasoning, so that the system can intelligently adjust the pressing force of the rotary wheel according to the real-time wall thickness deviation and pressing force data.

Further, the Python execution procedure of the above step S100 is as follows:

import math

def gaussian_membership_function(x, c, sigma):
    """
    Gaussian membership function.

    :param x: input value
    :param c: center value
    :param sigma: width parameter (standard deviation)
    :return: membership value
    """
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def calculate_membership_functions(input_value, parameters):
    """
    Calculate and return the membership values of one input for the three fuzzy sets.

    :param input_value: input value (wall thickness deviation Δ_w or pressing force F)
    :param parameters: dictionary containing the center value and width parameter of each membership function
    :return: dictionary of membership values for small S, medium M, large L
    """
    # Initialize the membership dictionary
    memberships = {'S': 0, 'M': 0, 'L': 0}
    # Calculate each membership according to the parameters
    for set_name, params in parameters.items():
        c, sigma = params['center'], params['sigma']
        memberships[set_name] = gaussian_membership_function(input_value, c, sigma)
    return memberships

# Parameter setting (adjustable)
parameters_delta_w = {
    'S': {'center': -1, 'sigma': 0.5},
    'M': {'center': 0, 'sigma': 0.5},
    'L': {'center': 1, 'sigma': 0.5},
}
parameters_F = {
    'S': {'center': 50, 'sigma': 10},
    'M': {'center': 100, 'sigma': 10},
    'L': {'center': 150, 'sigma': 10},
}

# Main program
delta_w_value = 0.2  # wall thickness deviation value (illustrative placeholder; supply the actual value)
F_value = 90.0       # pressing force value (illustrative placeholder; supply the actual value)
memberships_delta_w = calculate_membership_functions(delta_w_value, parameters_delta_w)
memberships_F = calculate_membership_functions(F_value, parameters_F)
print("Memberships for delta_w:", memberships_delta_w)
print("Memberships for F:", memberships_F)

In the above procedure, the calculate_membership_functions function accepts an input value (wall thickness deviation Δ_w or pressing force F) and the related parameters (center values and width parameters) and returns the membership degrees for the corresponding fuzzy sets (small S, medium M, large L). The gaussian_membership_function function computes the Gaussian membership of a given input value x for a specific center value c and width parameter sigma. calculate_membership_functions traverses the parameters, calculates the membership of the input value in each fuzzy set with the Gaussian membership function, and returns a dictionary containing the memberships.

Specifically, in step S101, fuzzy reasoning: a fuzzy inference engine performs intelligent reasoning on the input real-time wall thickness deviation Δ_w and pressing force F to derive the fuzzy set FS of the fuzzy value α(k)' of the correction factor α(k). This step is the core of the intelligent control of the hydraulic servo system; its aim is to let the rotary wheel adaptively adjust the pressing force according to the real-time wall thickness deviation.
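As a worked instance of the Gaussian membership calculation, take the example parameters of the wall thickness deviation sets given in the parameter setting (centers -1, 0 and 1, width 0.5) and an input Δ_w = 0.2. The input value is an assumption chosen only for illustration:

```latex
\begin{aligned}
\mu_S(0.2) &= \exp\!\left(-\frac{(0.2 + 1)^2}{2 \cdot 0.5^2}\right) = \exp(-2.88) \approx 0.056 \\
\mu_M(0.2) &= \exp\!\left(-\frac{(0.2 - 0)^2}{2 \cdot 0.5^2}\right) = \exp(-0.08) \approx 0.923 \\
\mu_L(0.2) &= \exp\!\left(-\frac{(0.2 - 1)^2}{2 \cdot 0.5^2}\right) = \exp(-1.28) \approx 0.278
\end{aligned}
```

With these parameters a deviation of 0.2 belongs mostly to the medium set M, partly to the large set L, and hardly at all to the small set S.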

The fuzzy inference engine uses a series of IF-THEN rules that define the relationship between inputs and outputs. Each rule R _i expresses a specific condition to outcome relationship:

$$R_i:\ \text{IF } \Delta_w \text{ is } A_i \text{ AND } F \text{ is } B_i \text{ THEN } \alpha(k)' \text{ is } C_i;$$

where μ_{A_i}(Δ_w), μ_{B_i}(F) and μ_{C_i}(α(k)') are the membership functions of the fuzzy sets A_i, B_i and C_i that correspond to the inputs Δ_w and F and to the output α(k)', respectively. These membership functions quantify the degree to which the inputs and the output belong to a particular fuzzy set ("small S", "medium M", "large L").

Then, fuzzy reasoning is performed as follows:

1) Input fuzzification: the values of the wall thickness deviation Δ_w and the pressing force F measured in real time are converted into membership degrees of the fuzzy sets through the corresponding membership functions.

2) Rule activation: the fuzzy inference engine evaluates all IF-THEN rules. For each rule, it calculates the logical AND of the membership degrees of the input values, which gives the activation degree of the rule.

3) Calculating the output fuzzy value: the fuzzy inference engine outputs the fuzzy set FS of the fuzzy value α(k)' by weighted average aggregation over all activated rules. This weighted average takes into account the activation degree of each rule and its corresponding output membership:

Suppose there are n fuzzy rules, and each rule R_i has an output membership μ_{C_i}(α(k)') and an activation degree w_i (the minimum of the corresponding input memberships). The fuzzy set FS of the final fuzzy value α(k)' obtained by weighted average aggregation is expressed as:

$$FS = \frac{\sum_{i=1}^{n} w_i\,\mu_{C_i}(\alpha(k)')}{\sum_{i=1}^{n} w_i},\qquad w_i = \min\bigl(\mu_{A_i}(\Delta_w),\ \mu_{B_i}(F)\bigr);$$

where i traverses all rules from 1 to n. This expression is a weighted average of the output memberships of all rules, which gives the final fuzzy value.

It will be appreciated that, through fuzzy reasoning, the system can dynamically adjust the fuzzy value α(k)' based on real-time wall thickness deviation and pressing force data. This adaptivity lets the pressing force of the rotary wheel respond more accurately to changes during the manufacture of the steel cylinder, which improves product quality. The system adjusts the pressing force of the rotary wheel according to actual conditions and avoids the negative influence of excessive or insufficient pressure on the quality of the steel cylinder; fuzzy reasoning thus helps to reduce the possibility of cracks, slag inclusions and bubbles on the inner wall of the steel cylinder.
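The three inference steps above (fuzzification, rule activation by the minimum, weighted average aggregation) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the rule table RULES, the output set parameters OUT_SETS and the sampling grid are hypothetical, since the actual rule base is given only in the second embodiment.

```python
import math

def gauss(x, c, sigma):
    """Gaussian membership of x for a set with center c and width sigma."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

# Example membership parameters as (center, width); all values are assumptions.
DW_SETS = {'S': (-1.0, 0.5), 'M': (0.0, 0.5), 'L': (1.0, 0.5)}
F_SETS = {'S': (50.0, 10.0), 'M': (100.0, 10.0), 'L': (150.0, 10.0)}
OUT_SETS = {'S': (0.1, 0.2), 'M': (0.5, 0.2), 'L': (0.9, 0.2)}

# Hypothetical rule table: (set of delta_w, set of F) -> output set of alpha(k)'.
RULES = {
    ('S', 'S'): 'S', ('S', 'M'): 'S', ('S', 'L'): 'M',
    ('M', 'S'): 'S', ('M', 'M'): 'M', ('M', 'L'): 'L',
    ('L', 'S'): 'M', ('L', 'M'): 'L', ('L', 'L'): 'L',
}

def aggregate(delta_w, force, alpha_grid):
    """Return FS as {alpha: membership}, sampled on alpha_grid."""
    # 1) Fuzzification of both inputs
    mu_dw = {name: gauss(delta_w, c, s) for name, (c, s) in DW_SETS.items()}
    mu_f = {name: gauss(force, c, s) for name, (c, s) in F_SETS.items()}
    # 2) Rule activation: w_i = min of the two input memberships
    activations = [(min(mu_dw[a], mu_f[b]), OUT_SETS[out])
                   for (a, b), out in RULES.items()]
    total_w = sum(w for w, _ in activations)
    if total_w == 0:
        raise ValueError("no rule is activated")
    # 3) Weighted average of the output memberships at each grid point
    return {x: sum(w * gauss(x, c, s) for w, (c, s) in activations) / total_w
            for x in alpha_grid}

grid = [i / 10 for i in range(11)]
fs = aggregate(0.0, 100.0, grid)
print(max(fs, key=fs.get))  # grid point with the highest aggregated membership
```

Because FS is a weighted average of memberships that each lie in [0, 1], every value of the result also lies in [0, 1], and the sampled dictionary has the value-to-membership form expected by a discrete centroid calculation.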

Further, the Python execution procedure of step S101 is as follows; it calls the calculate_membership_functions function described above and may be implemented with the fuzzy rule base provided in the second embodiment:

# Input parameters (this listing relies on calculate_membership_functions from step S100)
# Membership function parameters of the wall thickness deviation Δ_w (example values of step S100)
parameters_delta_w = {
    'S': {'center': -1, 'sigma': 0.5},
    'M': {'center': 0, 'sigma': 0.5},
    'L': {'center': 1, 'sigma': 0.5},
}
# Membership function parameters of the pressing force F (example values of step S100)
parameters_F = {
    'S': {'center': 50, 'sigma': 10},
    'M': {'center': 100, 'sigma': 10},
    'L': {'center': 150, 'sigma': 10},
}
# Membership function parameters of the correction factor α(k)'; only the set
# names are used below, so the values are left to be supplied
parameters_alpha_k_prime = {'S': None, 'M': None, 'L': None}

# Fuzzy inference function
def fuzzy_inference(delta_w, F):
    # Calculate the membership degrees of the input values
    memberships_delta_w = calculate_membership_functions(delta_w, parameters_delta_w)
    memberships_F = calculate_membership_functions(F, parameters_F)
    # Initialize the membership degrees of the output fuzzy set FS to 0
    memberships_alpha_k_prime = {key: 0 for key in parameters_alpha_k_prime.keys()}
    # Traverse all fuzzy rules for reasoning
    for rule_key in memberships_delta_w.keys():
        # Obtain the input memberships of the current rule
        mu_delta_w = memberships_delta_w[rule_key]
        mu_F = memberships_F[rule_key]
        # Rule activation degree (the minimum of the input memberships)
        rule_activation = min(mu_delta_w, mu_F)
        # If the rule is activated (activation degree greater than 0), update the output fuzzy set FS.
        # The rule output corresponds to the same fuzzy set as its inputs,
        # e.g. the "small S" inputs correspond to the "small S" output.
        if rule_activation > 0:
            memberships_alpha_k_prime[rule_key] = max(
                memberships_alpha_k_prime[rule_key], rule_activation)
    return memberships_alpha_k_prime

# Main program
delta_w_value = 0.2  # a specific wall thickness deviation value (illustrative placeholder)
F_value = 90.0       # a specific pressing force value (illustrative placeholder)
memberships_alpha_k_prime = fuzzy_inference(delta_w_value, F_value)
print("Fuzzy output memberships for α(k)':", memberships_alpha_k_prime)

In the above procedure, the membership of the input values (wall thickness deviation Δ_w and pressing force F) in the respective fuzzy sets ("small S", "medium M", "large L") is calculated by calling the calculate_membership_functions function. The program then traverses all fuzzy rules. For each rule, it calculates the activation degree of the rule, that is, the minimum of the input memberships, which represents the extent to which the precondition of the rule is satisfied. If the activation degree is greater than 0, the rule is valid for the current input, and the program updates the membership degree of the fuzzy set FS of the fuzzy value α(k)' according to the output part of the rule.

Specifically, in step S102, defuzzification: the final fuzzy set FS of the fuzzy value α(k)' is defuzzified by the gravity center (centroid) method to obtain the fuzzy value α(k)':

$$\alpha(k)' = \frac{\int_a^b x\,\mu_F(x)\,dx}{\int_a^b \mu_F(x)\,dx};$$

where μ_F(x) is the bell-shaped membership function of the fuzzy set FS, and a and b are the bounds of the domain of the fuzzy set. In the bell-shaped function, c is the center point of the bell shape:

；

However, in the form required by the following step S3, the fuzzy set FS is made up of a series of discrete points, so the integral formula above needs to be converted into a summation. Let the fuzzy set FS consist of n discrete points, each with a membership degree μ_i and a corresponding value x_i; the discretized gravity center method is then:

$$\alpha(k)' = \frac{\sum_{i=1}^{n} \mu_i\,x_i}{\sum_{i=1}^{n} \mu_i};$$

It is understood that the fuzzy value α(k)' obtained at this point could be substituted directly into step S3 for the subsequent position control of the cylinder necking machine. However, this fuzzy value α(k)' takes only the precision factor into account and does not consider the actual machining efficiency. The subsequent step S2 therefore refines the fuzzy value α(k)' further, so that both machining efficiency and machining precision are taken into account.
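The relation between the integral form and the summed form can be sketched by sampling a bell-shaped membership function on its domain [a, b] and applying the discrete formula. The Gaussian shape, its center 0.6 and width 0.15, and the domain [0, 1] are assumptions chosen only for illustration:

```python
import math

def sample_fuzzy_set(mu, a, b, n):
    """Sample the membership function mu at n evenly spaced points on [a, b]."""
    step = (b - a) / (n - 1)
    return [(a + i * step, mu(a + i * step)) for i in range(n)]

def centroid(points):
    """Discrete gravity center: sum(mu_i * x_i) / sum(mu_i)."""
    numerator = sum(x * m for x, m in points)
    denominator = sum(m for _, m in points)
    if denominator == 0:
        raise ValueError("the sum of memberships cannot be 0")
    return numerator / denominator

def mu_fs(x):
    """Assumed bell-shaped (Gaussian) output set with center 0.6, width 0.15."""
    return math.exp(-((x - 0.6) ** 2) / (2 * 0.15 ** 2))

coarse = centroid(sample_fuzzy_set(mu_fs, 0.0, 1.0, 11))    # 11 discrete points
fine = centroid(sample_fuzzy_set(mu_fs, 0.0, 1.0, 1001))    # close to the integral
print(coarse, fine)
```

Both results lie close to the center of the bell. The fine sampling approximates the integral formula, while the coarse sampling corresponds to the discrete form used for control.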

Further, the python execution procedure of step S102 is as follows:

def defuzzify(FS):
    """
    Defuzzify the fuzzy set FS by the gravity center method.

    FS: fuzzy set expressed as a dictionary; the keys are the possible output
    values and the values are the corresponding membership degrees.
    """
    # Initialize the numerator and denominator
    numerator = 0
    denominator = 0
    # Traverse each element of the fuzzy set
    for value, membership in FS.items():
        numerator += value * membership
        denominator += membership
    # Calculate the defuzzified value with the gravity center formula
    if denominator == 0:
        raise ValueError("the sum of memberships of the fuzzy set cannot be 0")
    defuzzified_value = numerator / denominator
    return defuzzified_value

# Example fuzzy set FS
FS_example = {
    10: 0.1,
    20: 0.5,
    30: 0.8,
    40: 1.0,
    50: 0.6,
    60: 0.3,
}

# Defuzzify the example fuzzy set
alpha_k_prime = defuzzify(FS_example)
print(f"defuzzified value α(k)' = {alpha_k_prime}")

The principle of the above procedure is as follows: defuzzification uses the gravity center method, which obtains a specific numerical value as the weighted average of all elements of the fuzzy set. The two variables numerator and denominator are first initialized to zero. Each element of the fuzzy set is then traversed to accumulate the weighted sum (numerator) and the sum of memberships (denominator). Finally, the defuzzified value is obtained by division. The function returns the specific value alpha_k_prime after defuzzification, which can be used directly by the subsequent control system.

In the present embodiment, regarding step S2, a reinforcement learning algorithm is introduced: the set state space and action space are read, the closing efficiency is maximized through a reward function, and the fuzzy value α(k)' is introduced as a suggested action for the reinforcement learning algorithm. The reinforcement learning algorithm dynamically outputs the correction factor α(k) and observes the reward obtained after the action is executed and the resulting change of the system state. Step S2 includes the following steps S200 to S203.
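For the example fuzzy set FS_example of the listing, the weighted average can be checked by hand:

```latex
\alpha(k)' = \frac{10(0.1) + 20(0.5) + 30(0.8) + 40(1.0) + 50(0.6) + 60(0.3)}
                  {0.1 + 0.5 + 0.8 + 1.0 + 0.6 + 0.3}
           = \frac{123}{3.3} \approx 37.27
```

The result lies between the two elements with the highest membership (30 and 40) and is pulled toward 40, the element with membership 1.0.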

Specifically, in step S200, the state space and the action space are:

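1) State space S: the state consists of the real-time values of the wall thickness deviation Δ_w and the pressing force F:

$$S = \{(\Delta_w, F)\};$$

2) Action space A: an action is a decision made by the agent (i.e., the control system) based on the current state in order to change the state of the system or affect the environment. The action space is defined as the discretized value range of the correction factor α(k):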

A = {α₁(k), α₂(k),...,α_n(k)}；

It will be appreciated that based on the state space and the action space, the reinforcement learning algorithm is able to more accurately understand the current state of the system and take appropriate action (i.e., select the appropriate correction factor) as needed. The accurate control of the pressure and the position of the rotary wheel is facilitated, so that the manufacturing precision and the quality of the steel cylinder are improved. Because the state space contains real-time values of wall thickness deviation and pressing force, the algorithm can dynamically adjust the correction factors according to the real-time changes of the information. The self-adaptive adjustment capability enables the system to flexibly cope with the change of various processing conditions, and ensures the stable and consistent molding quality of the steel cylinder.
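Since the action space is discrete while the fuzzy value α(k)' is continuous, the fuzzy suggestion has to be mapped onto one of the available actions before the reinforcement learning algorithm can use it. The source does not specify this mapping; the sketch below assumes a nearest-value mapping, and nearest_action_index and the example action values are hypothetical:

```python
def nearest_action_index(action_space, alpha_prime):
    """Index of the discrete correction factor closest to the fuzzy value.

    Assumption: the suggestion is snapped to the nearest element of A;
    ties are resolved in favor of the lower index.
    """
    return min(range(len(action_space)),
               key=lambda i: abs(action_space[i] - alpha_prime))

# Example discretized correction factors (illustrative values)
ACTION_SPACE = [0.1, 0.3, 0.5, 0.7, 0.9]

suggested_index = nearest_action_index(ACTION_SPACE, 0.62)
print(suggested_index, ACTION_SPACE[suggested_index])  # 3 0.7
```

The returned index can then be handed to the action selection step as the suggested action.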

Specifically, in step S201, the reward function: a positive reward is given when an action executed by the hydraulic servo system improves the closing efficiency, and a negative reward is given when the closing efficiency decreases; a reward factor r is generated accordingly. Let the current closing efficiency be E_current and the previous closing efficiency be E_previous; the change ΔE of the closing efficiency is then ΔE = E_current - E_previous. This quantity represents the change in closing efficiency from the previous action to the current action. Based on it, the reward factor r is defined as a positive and negative reward mechanism:

$$r = \begin{cases} +1, & \Delta E > 0 \\ -1, & \Delta E < 0 \\ 0, & \Delta E = 0 \end{cases};$$

The mechanism is as follows:

1) When ΔE > 0, the closing efficiency has improved, and a positive reward r = +1 is given. This encourages the algorithm to keep exploring actions that improve the closing efficiency.

2) When ΔE < 0, the closing efficiency has decreased, and a negative reward r = -1 is given. This penalizes behavior that is detrimental to the task and guides the algorithm away from such behavior.

3) When ΔE = 0, the closing efficiency is unchanged, and the reward factor is r = 0. The current behavior has neither a positive nor a negative effect.

Further, the python execution procedure of steps S200 to S201 is as follows:

import numpy as np

# Define the state space and action space
def create_state_action_spaces():
    # Value ranges of wall thickness deviation and pressing force
    D = np.arange(-1.0, 1.1, 0.1)  # range of wall thickness deviations
    F = np.arange(0, 101, 10)      # range of pressing force
    # Create the state space
    state_space = [(d, f) for d in D for f in F]
    # The action space is the set of discrete values of the correction factor
    action_space = np.arange(-1.0, 1.1, 0.2)  # exemplary correction factor range
    return state_space, action_space

# Define the reward function
def calculate_reward(current_efficiency, previous_efficiency):
    delta_E = current_efficiency - previous_efficiency
    if delta_E > 0:
        return 1   # closing efficiency improved: positive reward
    elif delta_E < 0:
        return -1  # closing efficiency reduced: negative reward
    else:
        return 0   # closing efficiency unchanged: reward is 0

# Main program
if __name__ == "__main__":
    # Create the state space and action space
    state_space, action_space = create_state_action_spaces()
    print("State Space:", state_space)
    print("Action Space:", action_space)
    # Values of two successive closing efficiencies
    previous_efficiency = 80  # last closing efficiency
    current_efficiency = 85   # current closing efficiency
    # Calculate the reward
    reward = calculate_reward(current_efficiency, previous_efficiency)
    print("Reward:", reward)

In the above procedure, the create_state_action_spaces function generates all possible state combinations from the preset wall thickness deviation range D and pressing force range F, which together form the state space. The action space is defined from a preset correction factor range and represents the possible adjustment amounts. The calculate_reward function receives the current and the previous closing efficiency as inputs and determines the change in closing efficiency from their difference delta_E. The sign of the reward follows the sign of delta_E: the reward is positive if the efficiency has increased, negative if it has decreased, and zero if it is unchanged.

Specifically, in step S202, the SARSA algorithm is executed: for solving an optimal strategy in a reinforcement learning environment. The SARSA algorithm is characterized in that it considers the current State, the current Action, the next State and the Action to be taken in the next State simultaneously in the learning process, thus obtaining the name SARSA (State-Action-Reward-State-Action). In this step, the SARSA algorithm is combined with fuzzy logic control to achieve intelligent control of the hydraulic servo system.

First, the parameters and the value function of the algorithm need to be initialized. The value function estimates the long-term return of taking a particular action in a given state and is the basis on which the SARSA algorithm makes decisions. Initializing the parameters and the value function is a precondition for the algorithm to begin learning. The SARSA algorithm works with the fuzzy value α(k)' output by fuzzy logic control, the current efficiency state of the hydraulic servo system and the reward factor r. The fuzzy value α(k)' reflects the preliminary judgment of the system on the correction factor and is obtained from the current wall thickness deviation, the pressing force and other factors. This fuzzy value provides a reference point for the SARSA algorithm, which can then make further refined decisions based on the actual situation.

The SARSA algorithm then uses this information to select and output the actual value of the correction factor α(k). This process is the core of the algorithm: it takes into account the current state, the actions that may be taken, and the expected return of these actions. Through continuous trial and error and learning, the SARSA algorithm gradually finds the optimal action strategy, that is, how to adjust the correction factor according to the current wall thickness deviation, the pressing force and other factors, so that the pressing force of the rotary wheel is adjusted adaptively. The suggested action comes from the fuzzy logic control of step S1, and the SARSA algorithm makes the final decision on the action actually performed based on this suggestion and the current state. In this way, the pressing force of the rotary wheel can be adjusted dynamically according to actual conditions to suit different processing conditions and wall thickness deviations.

Specifically, in step S203, the correction factor α(k) is acquired: the value function Q(s_t, a) of step S202 estimates the expected return of performing action a in state s_t. Each action a corresponds to a different correction factor α(k). Based on the updated value function Q(s_t, a)', the correction factor α(k) is selected by the following strategy:
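The patent text describes step S202 without a listing, so the following is only a sketch of the standard SARSA update, Q(s, a) ← Q(s, a) + η[r + γQ(s', a') − Q(s, a)], combined with one possible way of using the fuzzy suggestion. The learning rate, the discount factor, the tie-breaking rule in choose_action and the toy transition are all assumptions:

```python
import random

LEARNING_RATE = 0.1  # η, assumed value
GAMMA = 0.9          # discount factor γ, assumed value
EPSILON = 0.1        # exploration probability ε

def choose_action(q_row, suggested, epsilon, rng):
    """ε-greedy choice over one row of the Q table.

    Assumption: when several actions share the best Q value (for example in an
    untrained table), the action suggested by fuzzy logic control is preferred.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))          # explore
    best = max(q_row)
    candidates = [a for a, q in enumerate(q_row) if q == best]
    return suggested if suggested in candidates else candidates[0]

def sarsa_update(q_table, s, a, r, s_next, a_next):
    """On-policy temporal-difference update using the action actually chosen next."""
    td_target = r + GAMMA * q_table[s_next][a_next]
    q_table[s][a] += LEARNING_RATE * (td_target - q_table[s][a])

# One illustrative learning step on a toy table (3 states, 5 actions)
rng = random.Random(0)
q = [[0.0] * 5 for _ in range(3)]
state, suggested = 0, 3                 # index 3 stands for the fuzzy suggestion
action = choose_action(q[state], suggested, EPSILON, rng)
reward, next_state = 1, 1               # pretend the closing efficiency improved
next_action = choose_action(q[next_state], suggested, EPSILON, rng)
sarsa_update(q, state, action, reward, next_state, next_action)
print(q[state])
```

The update uses the next action that the policy actually selects, which is what distinguishes SARSA from off-policy methods that bootstrap from the maximum Q value of the next state.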

In selecting action a (i.e., correction factor α (k)), an ε -greedy strategy is employed:

With probability 1-ε, the action a (correction factor α(k)) with the largest value function in the current state is selected; the correction factor corresponding to this action is the optimal choice in the current state. With probability ε, an action is selected at random, which guarantees that the system has enough opportunities to explore new possibilities:

$$\alpha(k) = \begin{cases} \arg\max_{a \in A} Q(s_t, a), & \text{with probability } 1-\varepsilon \\ \text{a random action } a \in A, & \text{with probability } \varepsilon \end{cases};$$

where s_t is the current state;

The correction factor alpha (k) is thus selected from the predefined action space a based on the value function and the current strategy. The value function provides the expected return information for selecting different actions in different states, and the policy defines how actions are selected based on this information.

It can be appreciated that by using a combination of a value function and an epsilon-greedy strategy, the system can adaptively select the optimal correction factor alpha (k) according to different states, thereby realizing intelligent control of the hydraulic servo system. By optimizing the pressing force of the rotary wheel, the possibility of generating cracks, slag inclusion and bubbles on the inner wall of the steel cylinder is reduced, and further the product quality and the qualification rate are improved. Through the accurate selection of the correction factor alpha (k), the stress concentration phenomenon of the grounding part and the transition section part of the steel cylinder can be optimized, and the structural strength and the safety of the product are improved. The mechanism is mainly attributed to the ideas of reinforcement learning and the application of epsilon-greedy strategies. By constantly learning and trying, the system is able to gradually find the optimal action selection (i.e., correction factor α (k)), thereby achieving precise control of the hydraulic servo system.

Further, the python execution procedure of step S203 is as follows:

import numpy as np

# Both the state space and the action space (the values of the correction factor α(k)) are discrete
num_states = 10   # size of the state space (illustrative placeholder)
num_actions = 5   # size of the action space (number of possible correction factor values)
# The updated Q-value table is obtained from step S202 of embodiment three;
# a zero table stands in for it here
Q = np.zeros((num_states, num_actions))
# Set the ε value of the ε-greedy policy
epsilon = 0.1
# Current state s_t (to be retrieved from the actual environment; index 0 is a placeholder)
current_state = 0

# ε-greedy policy for selecting an action (correction factor α(k))
def epsilon_greedy_policy(state, Q, epsilon):
    if np.random.rand() < epsilon:
        # Randomly select an action with probability ε
        return np.random.randint(Q.shape[1])
    else:
        # Otherwise select the action with the maximum Q value
        return int(np.argmax(Q[state, :]))

# Step S203: obtain the correction factor α(k)
def get_correction_factor(current_state, Q, epsilon):
    # Use the ε-greedy policy to select an action (correction factor)
    chosen_action = epsilon_greedy_policy(current_state, Q, epsilon)
    # The action space (values of the correction factor α(k)) is predefined
    # and mapped to [0.1, 0.3, 0.5, 0.7, 0.9]
    correction_factors = [0.1, 0.3, 0.5, 0.7, 0.9]
    return correction_factors[chosen_action]

# Select the correction factor once, using the Q-value table updated in step S202
correction_factor = get_correction_factor(current_state, Q, epsilon)
print(f"Selected correction factor α(k): {correction_factor}")

In the above procedure, the sizes of the state space and the action space (i.e., the number of possible values of the correction factor) are set first. The updated Q-value table is obtained from step S202, and the ε value of the ε-greedy policy is set. The epsilon_greedy_policy function implements the ε-greedy policy: it accepts the current state, the Q-value table and the ε value as inputs and returns an action index, which corresponds to the selected correction factor. If the random number is less than ε, an action is selected at random; otherwise, the action with the maximum Q value in the current state is selected. The get_correction_factor function is the core of step S203. It calls epsilon_greedy_policy to select an action (i.e., an index of the correction factors) and returns the corresponding value from the predefined list of correction factors. This list should be set according to the possible values of the correction factor in the actual application. Finally, the updated Q-value table from step S202 is passed to get_correction_factor, and the selected correction factor is printed.

In the present embodiment, regarding step S3, a PID controller is executed: a discretized PID controller is introduced to control the hydraulic servo system, and the electrical signal sent to the hydraulic servo system is dynamically scaled by the obtained correction factor α(k):

$$u(k) = \alpha(k)\left[K_p\,e(k) + K_i\sum_{j=0}^{k} e(j) + K_d\,\bigl(e(k) - e(k-1)\bigr)\right];$$

Where u (k) represents the controller output electrical signal at sample time k. e (k) is a deviation at the sampling time k, which represents a difference between the set value (expected value) r (k) and the actual value y (k), i.e., e (k) =r (k) -y (k). Similarly, e (k-1) is the deviation at sampling instant k-1, and e (j) represents the deviation at any sampling instant j.

The three main parts of the PID controller are proportional (P), integral (I) and derivative (D) control, which are adjusted by the three gain parameters K_p, K_i and K_d, respectively. Proportional control is used to reduce the current deviation, integral control is used to eliminate the accumulated past deviation, and derivative control is used to predict and reduce the future deviation.

It will be appreciated that the controller output u (k) here is dynamically scaled by a correction factor α (k). The correction factor α (k) is based on the output of the fuzzy logic control system of step S1, which takes into account the wall thickness deviation Δw and the dynamic change of the applied pressure force F, generating a fuzzy value α (k)'. In addition, the correction factor α (k) also incorporates the output of the reinforcement learning strategy for the machining efficiency in step S2. In this way, the correction factor α (k) can give consideration to both the machining efficiency and the machining accuracy; the output of the PID controller can be adjusted according to the real-time wall thickness deviation and the pressing force, so that the rotating wheel can adaptively adjust the pressing force. The method can reduce the possibility of cracks, slag inclusion and bubbles on the inner wall of the steel cylinder, and optimize the stress concentration phenomenon of the grounding part and the transition section part of the steel cylinder.
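The effect of scaling the PID output by α(k) can be tried without hardware on a simulated plant. The sketch below assumes a toy first-order plant, y(k+1) = y(k) + 0.1·(u(k) − y(k)), which is not a model of the actual hydraulic servo system; make_pid and simulate are hypothetical helper names, and the gains are example values:

```python
def make_pid(kp, ki, kd):
    """Return a discrete PID step function that keeps its own state."""
    state = {'integral': 0.0, 'prev_error': 0.0}

    def step(setpoint, measured, alpha_k):
        error = setpoint - measured
        state['integral'] += error                  # running sum of e(j)
        derivative = error - state['prev_error']    # e(k) - e(k-1)
        state['prev_error'] = error
        # Controller output scaled by the correction factor alpha(k)
        return alpha_k * (kp * error + ki * state['integral'] + kd * derivative)

    return step

def simulate(alpha_k, steps=200, setpoint=1.0):
    """Run the loop on the toy plant and return the final plant output."""
    pid = make_pid(kp=1.0, ki=0.1, kd=0.01)
    y = 0.0
    for _ in range(steps):
        u = pid(setpoint, y, alpha_k)
        y += 0.1 * (u - y)   # assumed first-order plant response
    return y

for alpha in (0.5, 1.0):
    print(alpha, simulate(alpha, steps=20), simulate(alpha, steps=200))
```

On this toy plant both settings reach the set value in the long run because of the integral term, while the smaller correction factor approaches it more slowly; α(k) acts as an overall gain on the control signal.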

Further, the python execution procedure of step S3 is as follows:

import serial
import time

# Initialize serial port communication (COM1 port, 9600 baud)
serial_port = serial.Serial('COM1', 9600, timeout=1)

# PID controller parameters
Kp = 1.0
Ki = 0.1
Kd = 0.01

# Initialize PID variables
previous_error = 0
integral = 0

# Input correction factor α(k) (output of step S203; 0.5 is an illustrative placeholder)
alpha_k = 0.5

def pid_controller(setpoint, process_variable, k):
    global previous_error, integral
    error = setpoint - process_variable
    derivative = error - previous_error
    integral += error
    u_k = alpha_k * (Kp * error + Ki * integral + Kd * derivative)
    previous_error = error
    return u_k

# Main cycle
for k in range(100):  # 100 control cycles
    # Obtain the actual value (process_variable) from the sensor of the hydraulic system;
    # a simulated reading with a placeholder divisor is used here
    process_variable = k / 100.0
    # Set the target value (setpoint); 1.0 is an illustrative placeholder
    setpoint = 1.0
    # Calculate the output u(k) of the PID controller
    u_k = pid_controller(setpoint, process_variable, k)
    # Send u(k) to the PLC controller through the serial port
    message = f"U{u_k:.2f}\n"  # formatted message, keeping two decimal places
    serial_port.write(message.encode())  # send the electrical signal u(k) to the PLC
    # Simulated control cycle delay
    time.sleep(0.1)

# Close serial port communication
serial_port.close()

In the above procedure, serial communication is first initialized using Python's serial library (pySerial): port COM1 is opened at a baud rate of 9600. The proportional (Kp), integral (Ki) and derivative (Kd) gains of the PID controller are then set, and the global variables used in the PID calculation are initialized, namely the previous error (previous_error) and the accumulated error (integral). The pid_controller function calculates the controller output u (k) from the set value (setpoint), the actual value (process_variable) and the current sampling index (k), using the global variables to track the previous error and the accumulated error.

The main loop runs 100 control cycles (adjustable). In each cycle, the actual value is obtained from a sensor or by other means, the target value is set, and the pid_controller function is called to calculate the output u (k). The correction factor α (k) is likewise updated in each control cycle.

After u (k) is calculated, it is formatted into a character string message and sent to the PLC through the serial port. The message starts with "U", followed by the value of u (k) to two decimal places. After the PLC receives this message, it controls the action of the hydraulic system (cylinder) according to the value of u (k). A delay is inserted at the end of each cycle to simulate the control period of an actual system.
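The scaled control law used in the procedure above can also be exercised without a serial port or a PLC. The following is only a minimal sketch under stated assumptions: the class name ScaledPID, the helper format_message, the constant α (k) = 0.8 and the simple integrating plant standing in for the hydraulic cylinder are all hypothetical and are not part of the original procedure.

```python
class ScaledPID:
    """Discrete positional PID whose output is scaled by a correction factor alpha(k)."""

    def __init__(self, kp=1.0, ki=0.1, kd=0.01):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.previous_error = 0.0  # e(k-1)
        self.integral = 0.0        # running sum of e(j)

    def step(self, setpoint, process_variable, alpha_k):
        error = setpoint - process_variable        # e(k) = r(k) - y(k)
        derivative = error - self.previous_error   # e(k) - e(k-1)
        self.integral += error
        u_k = alpha_k * (self.kp * error
                         + self.ki * self.integral
                         + self.kd * derivative)
        self.previous_error = error
        return u_k


def format_message(u_k):
    """Build the 'U<value>' line described above, with two decimal places."""
    return f"U{u_k:.2f}\n".encode("ascii")


# Hypothetical stand-in plant: the position moves by a fraction of u(k) per cycle.
pid = ScaledPID()
position = 0.0
for k in range(200):
    u_k = pid.step(setpoint=10.0, process_variable=position, alpha_k=0.8)
    position += 0.1 * u_k

print(format_message(u_k), round(position, 3))
```

Keeping the controller state inside an object rather than in global variables makes it possible to pass a different α (k) on every call, which matches the statement that the correction factor is updated in each control cycle.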

Embodiment two: this embodiment further discloses a fuzzy rule base on the basis of the technical scheme of the first embodiment. The fuzzy inference engine of step S101 uses a series of IF-THEN rules, all of which are encapsulated in a fuzzy rule base. In this embodiment, the fuzzy rule base includes the following rules (R_i):

(Rule 1) If the wall thickness deviation is small and the pressing force is small, the correction factor α (k) should be small.

Rationale: when both the wall thickness deviation and the pressing force are small, the position control of the spinning wheel is already quite accurate and the applied pressure is low. The system only needs fine adjustment to maintain its accuracy, so the correction factor is set small.

(Rule 2) If the wall thickness deviation is small and the pressing force is moderate, the correction factor α (k) should be small.

Rationale: even though the pressing force is moderate, the position control is still accurate because the wall thickness deviation is very small. To avoid new problems caused by overcorrection, the correction factor is kept small.

(Rule 3) If the wall thickness deviation is small and the pressing force is large, the correction factor α (k) should be medium.

Rationale: with a small wall thickness deviation, a large pressing force may cause unnecessary deformation. The correction amount therefore needs to be increased appropriately to offset the possible influence of excessive pressing while avoiding over-adjustment, so the correction factor is set to medium.

(Rule 4) If the wall thickness deviation is moderate and the pressing force is small, the correction factor α (k) should be medium.

Rationale: a moderate wall thickness deviation indicates that a certain amount of adjustment is required to achieve more precise control. Since the pressing force is small and may not be sufficient to cause significant deformation, a moderate correction is required to ensure the accuracy of the adjustment.

(Rule 5) If the wall thickness deviation is moderate and the pressing force is moderate, the correction factor α (k) should be medium.

Rationale: when both the wall thickness deviation and the pressing force are moderate, a moderate correction amount allows the position to be adjusted smoothly and effectively, being neither overly aggressive nor overly conservative.

(Rule 6) If the wall thickness deviation is moderate and the pressing force is large, the correction factor α (k) should be large.

Rationale: a large pressing force may cause large deformation, so a large correction amount is required to keep the spinning wheel position accurate and to prevent position deviation caused by excessive pressure.

(Rule 7) If the wall thickness deviation is large and the pressing force is small, the correction factor α (k) should be large.

Rationale: with a large wall thickness deviation, a large correction is required even when the pressing force is small, in order to reduce the deviation quickly and bring the spinning wheel back to the correct trajectory as soon as possible.

(Rule 8) If the wall thickness deviation is large and the pressing force is moderate, the correction factor α (k) should be large.

Rationale: when the wall thickness deviation is large and the pressing force is moderate, a large correction amount is needed to correct the deviation quickly.

(Rule 9) If the wall thickness deviation is large and the pressing force is large, the correction factor α (k) should be large or extremely large.

Rationale: in the extreme case where both the wall thickness deviation and the pressing force are very large, a very large correction is required to correct the positional deviation quickly. This situation may indicate a serious problem in the production process, and emergency measures may even be needed to prevent further problems.
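Rules 1 to 9 form a 3 x 3 decision matrix over the two inputs. As an illustration only, the matrix can be written as a plain lookup table; the names LEVELS, RULE_TABLE and consequent are hypothetical, and Rule 9 is entered as "large" here because "extremely large" is not one of the three output levels used in the listing.

```python
# Levels of the two inputs and of the output, as named in Rules 1-9.
LEVELS = ("small", "medium", "large")

# RULE_TABLE[(wall thickness deviation level, pressing force level)] -> correction factor level
RULE_TABLE = {
    ("small", "small"): "small",     # Rule 1
    ("small", "medium"): "small",    # Rule 2
    ("small", "large"): "medium",    # Rule 3
    ("medium", "small"): "medium",   # Rule 4
    ("medium", "medium"): "medium",  # Rule 5
    ("medium", "large"): "large",    # Rule 6
    ("large", "small"): "large",     # Rule 7
    ("large", "medium"): "large",    # Rule 8
    ("large", "large"): "large",     # Rule 9 (assumption: "large" chosen here)
}


def consequent(deviation_level, force_level):
    """Return the correction factor level prescribed by the matching rule."""
    return RULE_TABLE[(deviation_level, force_level)]


# Print the matrix row by row: one row per deviation level, one column per force level.
for deviation_level in LEVELS:
    row = [consequent(deviation_level, force_level) for force_level in LEVELS]
    print(deviation_level.ljust(8), row)
```

Read this way, the correction factor level never decreases when either input level increases, which is the monotonic behaviour the rationales describe.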

In this embodiment, the terms "small", "large", etc. are defined based on the membership functions in embodiment one. For the division of further intervals, this embodiment provides a standardized method for reference, taking "very small" and "extremely large" as examples:

1) "Very small": a left-falling membership function is used to represent "very small", e.g., a descending half-trapezoid or descending half-ridge function. With x representing the input value (wall thickness deviation or pressing force) and a and b the parameters defining the "very small" range, the "very small" membership function μ_VS (x) can be expressed as:

；

where b < a, b is the lower bound of the "very small" range and a is the upper bound. When x is less than or equal to a, the membership is calculated according to the function; when x is greater than a, x does not belong to the "very small" fuzzy set at all.

2) "Extremely large": this can be represented by a membership function similar to that of "large", but with a narrower range, meaning that only very large values belong to this fuzzy set. Its membership function μ_EL (x) is expressed as:

；

where g < f, g is the lower bound of the "extremely large" range and f is the upper bound.
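The two membership functions can be sketched as follows. This is an assumption-laden illustration: half-trapezoid shapes are assumed (the text also allows half-ridge functions), the function names mu_very_small and mu_extremely_large are hypothetical, and the numeric bounds are arbitrary examples that only respect b < a and g < f.

```python
import numpy as np


def mu_very_small(x, b, a):
    """Descending half-trapezoid: 1 for x <= b, linear fall on (b, a), 0 for x >= a."""
    x = np.asarray(x, dtype=float)
    return np.clip((a - x) / (a - b), 0.0, 1.0)


def mu_extremely_large(x, g, f):
    """Ascending half-trapezoid: 0 for x <= g, linear rise on (g, f), 1 for x >= f."""
    x = np.asarray(x, dtype=float)
    return np.clip((x - g) / (f - g), 0.0, 1.0)


# Example inputs on the 0-100 universe used by the listing in this embodiment.
x = np.array([0.0, 10.0, 20.0, 30.0, 85.0, 90.0, 95.0, 100.0])
print(mu_very_small(x, b=10.0, a=30.0))
print(mu_extremely_large(x, g=85.0, f=95.0))
```

A narrow interval (g, f) close to the top of the universe gives the behaviour described above, in which only very large values belong to the "extremely large" set.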

Further, the Python execution procedure of the fuzzy rule base disclosed in this embodiment is as follows:

import numpy as np

import skfuzzy as fuzz

from skfuzzy import control as ctrl

# Define the universes of discourse of the inputs and the output

wall_thickness_deviation = ctrl.Antecedent(np.arange(0, 101, 1), 'wall_thickness_deviation')

applied_pressure = ctrl.Antecedent(np.arange(0, 101, 1), 'applied_pressure')

correction_factor = ctrl.Consequent(np.arange(0, 101, 1), 'correction_factor')

# Define fuzzy sets for input and output and membership functions thereof

wall_thickness_deviation.automf(3, names=['small', 'medium', 'large'])

applied_pressure.automf(3, names=['small', 'medium', 'large'])

correction_factor['small'] = fuzz.trimf(correction_factor.universe, [0, 0, 50])

correction_factor['medium'] = fuzz.trimf(correction_factor.universe, [25, 50, 75])

correction_factor['large'] = fuzz.trimf(correction_factor.universe, [50, 100, 100])

# Define the fuzzy rules

rule1 = ctrl.Rule(wall_thickness_deviation['small']&applied_pressure['small'], correction_factor['small'])

rule2 = ctrl.Rule(wall_thickness_deviation['small']&applied_pressure['medium'], correction_factor['small'])

rule3 = ctrl.Rule(wall_thickness_deviation['small']&applied_pressure['large'], correction_factor['medium'])

rule4 = ctrl.Rule(wall_thickness_deviation['medium']&applied_pressure['small'], correction_factor['medium'])

rule5 = ctrl.Rule(wall_thickness_deviation['medium']&applied_pressure['medium'], correction_factor['medium'])

rule6 = ctrl.Rule(wall_thickness_deviation['medium']&applied_pressure['large'], correction_factor['large'])

rule7 = ctrl.Rule(wall_thickness_deviation['large']&applied_pressure['small'], correction_factor['large'])

rule8 = ctrl.Rule(wall_thickness_deviation['large']&applied_pressure['medium'], correction_factor['large'])

rule9 = ctrl.Rule(wall_thickness_deviation['large']&applied_pressure['large'], correction_factor['large'])

# Create control System and add rules

control_system = ctrl.ControlSystem([rule1, rule2, rule3, rule4, rule5, rule6, rule7, rule8, rule9])

simulator = ctrl.ControlSystemSimulation(control_system)

# Main program (inference)

simulator.input['wall_thickness_deviation'] = 65  # input: moderate wall thickness deviation

simulator.input['applied_pressure'] = 75  # input: larger pressing force

simulator.compute()

print(simulator.output['correction_factor'])

The above procedure uses the skfuzzy library (scikit-fuzzy) to build the fuzzy rule base. Its principle is as follows: universes of discourse (the possible range of values, here the integers 0 to 100) are defined for the wall thickness deviation, the pressing force and the correction factor, and fuzzy sets are then defined for each input and output. For the two inputs, three fuzzy sets (small, medium, large) are created automatically with the automf method; for the correction factor, the membership functions of the three fuzzy sets (small, medium, large) are defined manually. The fuzzy rules are created with ctrl.Rule according to the rules given above; they determine the fuzzy set of the output (correction factor) from the fuzzy sets of the inputs (wall thickness deviation and pressing force). All rules are then added to a control system, and a simulator is created to run it.

The main routine illustrates how input values (a wall thickness deviation of 65 and a pressing force of 75) are supplied to the simulator and how the output (correction factor) is calculated. The fired rules are first aggregated into an output fuzzy set representing the distribution of possible correction factor values for the given inputs; the value printed by the program is the defuzzified result of this set.
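To make the inference step visible, the same example can be traced by hand with NumPy only. This is a sketch under assumptions, not the library's internals: the input sets are taken as evenly spaced triangles on 0 to 100 (intended to resemble what automf produces), AND is taken as the minimum, rule outputs are combined by the maximum, and defuzzification uses a discrete centroid. The names trimf, IN_SETS, OUT_SETS and infer are hypothetical, and the result need not match skfuzzy digit for digit.

```python
import numpy as np


def trimf(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder)."""
    x = np.asarray(x, dtype=float)
    left = np.ones_like(x) if b == a else (x - a) / (b - a)
    right = np.ones_like(x) if c == b else (c - x) / (c - b)
    return np.clip(np.minimum(left, right), 0.0, 1.0)


LEVELS = ("small", "medium", "large")
IN_SETS = {"small": (0, 0, 50), "medium": (0, 50, 100), "large": (50, 100, 100)}    # assumed
OUT_SETS = {"small": (0, 0, 50), "medium": (25, 50, 75), "large": (50, 100, 100)}   # from the listing
CONSEQUENTS = [["small", "small", "medium"],
               ["medium", "medium", "large"],
               ["large", "large", "large"]]
RULES = {(d, p): CONSEQUENTS[i][j]
         for i, d in enumerate(LEVELS) for j, p in enumerate(LEVELS)}


def infer(deviation, pressure):
    universe = np.arange(0, 101, 1)
    aggregated = np.zeros(universe.shape)
    strengths = {}
    for (dev_set, prs_set), out_set in RULES.items():
        # Firing strength of the rule: minimum of the two antecedent memberships.
        w = min(float(trimf(deviation, *IN_SETS[dev_set])),
                float(trimf(pressure, *IN_SETS[prs_set])))
        strengths[(dev_set, prs_set)] = w
        # Clip the consequent set at the firing strength and merge by maximum.
        clipped = np.minimum(w, trimf(universe, *OUT_SETS[out_set]))
        aggregated = np.maximum(aggregated, clipped)
    crisp = float(np.sum(universe * aggregated) / np.sum(aggregated))  # centroid
    return strengths, crisp


strengths, crisp = infer(65, 75)
print({rule: round(w, 2) for rule, w in strengths.items() if w > 0})
print(round(crisp, 2))
```

For the inputs 65 and 75, only the four rules whose antecedents involve "medium" or "large" fire, so the crisp correction factor falls in the upper half of the output universe.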

Embodiment III: referring to fig. 1, the embodiment further discloses a specific implementation of the SARSA algorithm and the epsilon-greedy strategy in step S202 based on the first embodiment. It includes the following steps S2020-S2022.

In the present embodiment, regarding step S2020, in combination with the fuzzy logic control: when selecting the action to be actually performed, the SARSA algorithm considers the fuzzy value α (k)' output by the fuzzy logic control and the current efficiency state of the hydraulic servo system. This process can be expressed as selecting an action a_t based on the current state s_t, the suggestion given by the fuzzy value α (k)', and the value function Q (s_t, a) of the SARSA algorithm. The action a_t is selected using the ε-greedy policy:

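$$a_t = \begin{cases} \arg\max_{a \in A} Q(s_t, a), & \text{with probability } 1 - \epsilon \\ \text{a random action } a \in A, & \text{with probability } \epsilon \end{cases}$$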

Wherein ε is a small positive number representing the probability of exploration. In practice, the fuzzy value α (k)' is used as a reference or weight when selecting the action a_t. Here a is a specific value of the correction factor α (k), selected by the agent based on the current state and environmental feedback. The value function Q (s_t, a) is initialized as follows:

$$Q(s, a) = 0, \quad \forall s \in S,\ a \in A$$

Wherein the symbol ∀ means "for all"; S is the state space and A is the action space. At the beginning, the expected return of every state-action pair is assumed to be 0; these values are learned and adjusted step by step as the algorithm runs.

It should be noted that the state s_t is a vector or data structure comprising the current closing-in efficiency E_current, the real-time wall thickness deviation d and the real-time applied pressure f.
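One way to hold such a state and map it onto a row of a tabular value function is sketched below. The class State, the bin edges and the function state_index are hypothetical choices for illustration; real ranges depend on the machine and the workpiece.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class State:
    efficiency: float  # E_current, current closing-in efficiency
    deviation: float   # d, real-time wall thickness deviation
    force: float       # f, real-time applied pressure


# Hypothetical bin edges: two edges per quantity give three bins each.
EFFICIENCY_EDGES = np.array([0.5, 0.8])
DEVIATION_EDGES = np.array([0.5, 1.5])
FORCE_EDGES = np.array([20.0, 60.0])

NUM_STATES = ((len(EFFICIENCY_EDGES) + 1)
              * (len(DEVIATION_EDGES) + 1)
              * (len(FORCE_EDGES) + 1))


def state_index(state):
    """Map a continuous state to a single row index of a tabular Q function."""
    e = int(np.digitize(state.efficiency, EFFICIENCY_EDGES))
    d = int(np.digitize(state.deviation, DEVIATION_EDGES))
    f = int(np.digitize(state.force, FORCE_EDGES))
    n_d, n_f = len(DEVIATION_EDGES) + 1, len(FORCE_EDGES) + 1
    return (e * n_d + d) * n_f + f


print(state_index(State(efficiency=0.9, deviation=0.2, force=70.0)), NUM_STATES)
```

Discretizing in this way keeps the Q table small enough to be indexed directly, at the cost of treating all states inside one bin as identical.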

It should be noted that the action a_t is a behavior or decision selected by the agent according to the current state s_t. In the control system of the steel cylinder necking machine, the action a_t represents a specific control instruction to the hydraulic servo system, and is embodied as the electric signal u (k-1) output by step S3 at the previous time k-1. The action a_t is selected based on the current state s_t and the value function Q (s_t, a) so as to maximize the long-term cumulative reward. Under the ε-greedy policy, the action with the highest Q value (i.e., the action currently considered optimal) is selected most of the time, but with probability ε an action is selected at random in order to explore possibly better choices.

More specifically, when selecting actions the agent employs an ε-greedy policy that balances exploration and exploitation. With probability 1 - ε, the agent selects the action with the highest Q value in the current state, i.e., the action it deems to bring the maximum long-term cumulative reward; this exploits the knowledge the agent has already learned in order to obtain the maximum return. However, to avoid becoming trapped in a locally optimal solution, the agent selects an action at random with probability ε. This randomness gives the agent the opportunity to explore actions that do not appear optimal under its current knowledge but may in fact lead to greater returns. In this way, the agent can continuously expand its knowledge and find better policies.
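The text does not fix how the fuzzy value α (k)' enters the selection, only that it serves as a reference or weight. One possible reading, offered purely as an assumption, is to use α (k)' as a tie-breaker among equally valued greedy actions, so that an untrained agent starts from the fuzzy suggestion. The function select_action and the five-value action space are hypothetical.

```python
import numpy as np


def select_action(q_row, actions, alpha_fuzzy, epsilon, rng):
    """Return the index of the chosen discrete correction factor.

    q_row       -- Q(s_t, a) for every action a in the current state
    actions     -- discrete candidate values of alpha(k)
    alpha_fuzzy -- fuzzy-logic suggestion alpha(k)'
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(actions)))       # explore: uniform random action
    best = np.flatnonzero(q_row == np.max(q_row))    # all greedy candidates
    # Assumption: ties are broken toward the action closest to the suggestion.
    return int(best[np.argmin(np.abs(actions[best] - alpha_fuzzy))])


rng = np.random.default_rng(0)
actions = np.linspace(0.2, 1.0, 5)   # hypothetical action space A
q_row = np.zeros(5)                  # untrained Q: every action ties
print(select_action(q_row, actions, alpha_fuzzy=0.65, epsilon=0.0, rng=rng))
```

Once the Q values differ, the greedy branch ignores the suggestion and follows the learned values, so the fuzzy output matters most early in learning under this reading.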

Preferably, in practice, the selected action a_t is embodied in the electric signal u (k-1) output by step S3 at the previous time k-1, which represents the adjustment of the correction factor α (k). This adjustment is then used to control parameters of the hydraulic servo system, such as the pressing force of the spinning wheel, to optimize the necking efficiency. In this way, the hydraulic system can dynamically adjust the pressing force of the spinning wheel according to real-time machining conditions and wall thickness deviation, thereby realizing adaptive intelligent control.

In the present embodiment, regarding step S2021, the action is performed and the reward is observed: in the hydraulic servo position control system of the steel cylinder necking machine, step S2021 executes the action selected by the SARSA algorithm and observes the resulting reward and new system state. Specifically, when the system selects an action a_t (i.e., adjusts the correction factor α (k)) according to the SARSA algorithm, the action is performed immediately. Performing the action corresponds to activating step S3, whereby the hydraulic servo system and its pressing force are adjusted to suit the current wall thickness deviation. After performing the action, the system observes and records two key pieces of information: a new reward factor r' and a new hydraulic servo efficiency state s_{t+1}. The reward factor r' is the direct feedback of the environment on the executed action (in the form given in step S201); it evaluates the quality of the action a_t (i.e., the adjustment of the correction factor α (k)) on the basis of production efficiency.

If, after the action is performed, the efficiency of the system increases, the wall thickness deviation decreases, or the steel cylinder quality improves, the reward factor r' is positive; conversely, if the action degrades system performance or causes a problem, the reward factor r' is negative. The new state s_{t+1} reflects the situation after the system has performed the action, including the pressing force of the hydraulic servo system.
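A sign-based reward of this kind can be sketched as follows. The function name reward_factor and the magnitudes +1 and -1 are assumptions for illustration, and only the change in closing-in efficiency is considered here; the text does not specify the actual magnitudes.

```python
def reward_factor(delta_efficiency, gain=1.0, penalty=-1.0):
    """Positive reward when closing-in efficiency rose, negative when it fell, zero otherwise."""
    if delta_efficiency > 0:
        return gain
    if delta_efficiency < 0:
        return penalty
    return 0.0


# Hypothetical efficiency readings before and after an action.
efficiency_before, efficiency_after = 0.72, 0.78
print(reward_factor(efficiency_after - efficiency_before))
```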

In the present embodiment, regarding step S2022, the reinforcement learning algorithm is updated: after the new reward factor r' and the new state s_{t+1} are observed, step S2022 is entered, i.e., the value function Q (s_t, a) of the SARSA algorithm is updated. The value function Q (s_t, a) plays a core role in reinforcement learning; it estimates the long-term cumulative reward that can be obtained by performing action a in a given state s:

$$Q(s_t, a_t)' = Q(s_t, a_t) + \eta \left[ r' + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$

Wherein: η is the learning rate, which takes a value in the interval [0, 1] and controls the step size of the value function update. The larger the learning rate, the larger the adjustment at each update and the faster the learning may be, but the learning process may also become unstable; the smaller the learning rate, the smaller each update and the more stable the learning process, but more learning time may be required. The value of η is therefore chosen by those skilled in the art according to the application.

γ is the discount factor, which takes a value in the interval [0, 1] and weighs the importance of future rewards. The closer the discount factor is to 1, the more the system values future rewards; the closer it is to 0, the more the system focuses on the immediate reward. The value of γ is likewise chosen by those skilled in the art according to the application.

a_{t+1} is the next action, selected in the new state s_{t+1} according to the value function and the ε-greedy policy. This selection process is similar to that described in step S2020, with the aim of maximizing the long-term cumulative reward.

Q (s_t, a_t)' is the updated value function, which is used to guide the system toward better decisions in the future.
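As a worked illustration of one update, assume the standard SARSA form and purely illustrative numbers: η = 0.1, γ = 0.9, r' = 1, Q (s_t, a_t) = 0.1 and Q (s_{t+1}, a_{t+1}) = 0.5. None of these values come from the test example.

```latex
\begin{aligned}
Q(s_t, a_t)' &= Q(s_t, a_t) + \eta \left[ r' + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right] \\
             &= 0.1 + 0.1 \left[ 1 + 0.9 \times 0.5 - 0.1 \right] \\
             &= 0.1 + 0.1 \times 1.35 = 0.235
\end{aligned}
```

With all-zero initial values, the same step gives 0 + 0.1 × (1 + 0 - 0) = 0.1, so the first positive reward moves the entry one tenth of the way toward its target.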

Further, the Python execution procedure of this embodiment is as follows:

import numpy as np

# Discrete sizes of the state space and action space
num_states = 10  # state space
num_actions = 5  # action space

# Initialize the value function Q(s, a)
Q = np.zeros((num_states, num_actions))

# Parameters of the SARSA algorithm
learning_rate = 0.1    # learning rate η
discount_factor = 0.9  # discount factor γ
epsilon = 0.1          # ε in the ε-greedy policy

# Current state s_t and action a_t
current_state = 0  # example initial state
current_action = np.argmax(Q[current_state, :])  # initial action, simplified to the action with the largest Q value

# Environment feedback function: returns the new state and the reward
def step(current_state, current_action):
    new_state = (current_state + 1) % num_states  # state transition
    reward = 1 if new_state == 0 else 0  # reward function
    return new_state, reward

# ε-greedy policy for action selection
def epsilon_greedy_policy(state, epsilon):
    if np.random.rand() < epsilon:
        return np.random.choice(num_actions)  # select a random action with probability ε
    else:
        return np.argmax(Q[state, :])  # otherwise select the action with the largest Q value

# SARSA update procedure
def sarsa_update(current_state, current_action, new_state, reward):
    # Select the next action in the new state according to the ε-greedy policy
    next_action = epsilon_greedy_policy(new_state, epsilon)
    # Update the value function Q(s, a)
    Q[current_state, current_action] += learning_rate * (
        reward + discount_factor * Q[new_state, next_action] - Q[current_state, current_action])
    return next_action

# One SARSA learning step
new_state, reward = step(current_state, current_action)
next_action = sarsa_update(current_state, current_action, new_state, reward)

# Update the current state and action in preparation for the next iteration
current_state = new_state
current_action = next_action

# Output the updated value function Q(s, a) as a Q-value table
print("Updated Q-value function:")
print(Q)

In the above procedure, the value function Q is initialized as an all-zero matrix whose size is determined by the sizes of the state space and the action space. The learning rate of the SARSA algorithm, the discount factor and the ε value of the ε-greedy policy are also set. The step function simulates the reaction of the environment to the action, returning a new state and a reward. The epsilon_greedy_policy function implements the ε-greedy policy: with probability ε it selects an action at random, and otherwise it selects the action with the largest Q value in the current state.

The sarsa_update function performs the update step of the SARSA algorithm. The Q value of the current state and action is updated based on the new state, the reward and the next action. The update formula is the core of the SARSA algorithm; it combines the immediate reward with the estimated future reward to adjust the Q value.

In the single SARSA learning step, an action is performed, the feedback of the environment is observed, and the value function is updated using the SARSA update rule. The current state and action are then updated in preparation for the next iteration. Finally, the updated value function Q is output, which now contains the new knowledge obtained by the system through one SARSA learning step. In practice, this learning process is repeated until the value function converges or a predetermined number of learning rounds is reached.
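The repetition mentioned above can be sketched as a loop over a fixed number of steps. This is an illustrative sketch only: it reuses the toy cyclic environment of the listing (which ignores the action), the function name train_sarsa and the step count of 5000 are hypothetical, and a seeded random generator is used so that runs are repeatable.

```python
import numpy as np


def train_sarsa(num_steps=5000, num_states=10, num_actions=5,
                learning_rate=0.1, discount_factor=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((num_states, num_actions))

    def policy(state):
        # epsilon-greedy: explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            return int(rng.integers(num_actions))
        return int(np.argmax(Q[state]))

    def step(state, action):
        # Toy environment: cyclic transition, reward when the cycle wraps to state 0.
        new_state = (state + 1) % num_states
        return new_state, (1.0 if new_state == 0 else 0.0)

    state = 0
    action = policy(state)
    for _ in range(num_steps):
        new_state, reward = step(state, action)
        next_action = policy(new_state)
        td_target = reward + discount_factor * Q[new_state, next_action]
        Q[state, action] += learning_rate * (td_target - Q[state, action])
        state, action = new_state, next_action
    return Q


Q = train_sarsa()
print(np.round(Q.max(axis=1), 3))
```

A convergence test, for example stopping when the largest change in Q over one pass falls below a tolerance, could replace the fixed step count.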

Test example:

(I) Test purpose:

The test aims to verify the advantages, in the steel cylinder machining process, of a necking machine equipped with the spinning wheel hydraulic servo position control system, to compare it with the conventional control mode (control group), and to evaluate the influence of the new control method on the stress distribution of the steel cylinder.

(II) Test apparatus and method:

2.1 Experimental group equipment:

An automatic spinning TH12001-40R necking machine equipped with the spinning wheel hydraulic servo position control system.

The spinning wheel hydraulic servo position control system comprises a processor and a memory connected to the processor. Program instructions are stored in the memory; when executed by the processor, they cause the processor to execute the spinning wheel hydraulic servo position control method (S1-S3). After the correction factor α (k) is generated, it is read by a PLC controller and used to control the spinning wheel hydraulic servo system of the automatic spinning TH12001-40R necking machine.

The pressurization of the hydraulic system is simulated in the Autodesk Inventor assembly environment.

The PID controller algorithm and the correction factor α (k) are executed on a Siemens S7-200 CPU 226 PLC controller.

2.2 Control group equipment:

The same model of automatic spinning TH12001-40R necking machine, controlled by conventional PLC timing-diagram programming.

2.3 Simulation environment:

The execution of the hydraulic system is simulated in the assembly environment of Autodesk Inventor.

The stresses are simulated using the finite element analysis module of Autodesk Inventor.

Communication with Autodesk Inventor is established through the Siemens S7-200 CPU 226 PLC.

2.4 Test materials:

A 347 stainless steel cylinder with a diameter of 500 mm, a height of 2000 mm and a wall thickness of 15 mm; the longitudinal projection is 80 mm in diameter and 28 mm thick. The remaining parameters are shown in Table 1.

Table 1: Material parameters

(III) Test content and steps:

3.1 Preparation stage:

Set the parameters of the necking machines of the experimental group and the control group.

Create a steel cylinder model in Autodesk Inventor and set the material properties.

3.2 Execution stage:

Experimental group: the spinning wheel hydraulic servo position control method is executed through the PID instruction function block of the PLC, and the correction factor α (k) is applied.

Control group: the operation of the necking machine is programmed and controlled according to a conventional PLC timing diagram.

Machining simulation is carried out for both necking machines in the simulation environment until the machining of the steel cylinder is completed.

3.3 Analysis stage:

Stress simulation analysis is performed on the machined steel cylinder using the finite element analysis module of Autodesk Inventor.

The overall stress, the grounding part stress and the transition section stress of the steel cylinders of the experimental group and the control group are compared.

3.3.1 Setting of the grounding part:

Determine the specific position of the grounding part of the steel cylinder, i.e., the area where the bottom of the steel cylinder contacts the ground.

Apply a vertically downward pressure to this area to simulate the stress condition of the steel cylinder when it is placed upright.

Set boundary conditions to limit the displacement of the bottom of the steel cylinder.

3.3.2 Setting of the transition section:

Determine the specific position of the transition section of the steel cylinder. Apply corresponding rigid constraints to the transition section to simulate the stress condition in the actual machining process.

3.3.3 Pressure parameters:

3.3.4 Parameters common to the experimental group and the control group:

Simulated internal pressure of the steel cylinder: set to 15 MPa to simulate the internal pressure of the cylinder in normal operation.

Simulated surface stress: 55 MPa;

Simulated internal stress: 12-15 MPa;

Allowable tensile strength: set according to the scope of GB 713-2014;

(IV) Test results:

4.1 Overall stress control:

The overall stress distribution of the steel cylinders of the experimental group is clearly superior to that of the control group, showing a more uniform stress distribution and less stress concentration, as shown in fig. 4, where (a) and (c) are the control group and (b) and (d) are the experimental group.

4.2 Stress control at the grounding part:

At the grounding part of the steel cylinder, the stress of the experimental group is clearly lower than that of the control group, which indicates that the new control method effectively reduces the stress concentration in this area, as shown in fig. 5, where (a) is the control group and (b) is the experimental group. It should be noted that the stress shown on the wall structure in part (b) of fig. 5 arises because that part is set as a fixed point in the simulation software, and the structure is shown in half section for clarity of illustration; this stress point does not exist in practice.

4.3 Stress control at the transition section:

At the transition section of the steel cylinder, the stress distribution of the experimental group is also superior to that of the control group, showing a smoother transition and a lower stress peak, as shown in fig. 6, where (a) is the control group and (b) is the experimental group.

Note: for the positions of the "grounding part" and the "transition section", refer to fig. 2.

(V) Conclusion:

The test verifies that the spinning wheel hydraulic servo position control method is effective in the machining process of the steel cylinder necking machine. With the control system introduced, the steel cylinders of the experimental group show significantly better overall stress, grounding part stress and transition section stress than those of the control group under the conventional control mode. This result demonstrates the remarkable effect of the new control method in improving the machining quality of steel cylinders and reducing stress concentration, and provides strong support for process improvement in the steel cylinder manufacturing industry.

All of the above examples merely represent embodiments of the invention which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

It will be further appreciated by those of skill in the art that the various example elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the various example elements and steps have been described generally in terms of function in the foregoing description to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Claims

1. A rotary wheel hydraulic servo position control method for a steel cylinder necking machine, characterized by comprising the following steps:

S1, reading a fuzzy logic control system: selecting the wall thickness deviation Δ_w and the pressing force F as input variables of the fuzzy logic control; reading the fuzzy sets and the fuzzy rules to infer a fuzzy value α (k)';

S2, executing a reinforcement learning algorithm: reading a set state space and action space, maximizing the closing-in efficiency through a reward function, introducing the fuzzy value α (k)' as an action of the reinforcement learning algorithm, dynamically outputting a correction factor α (k) by the reinforcement learning algorithm, and observing the reward obtained after the action is executed;

S3, executing a PID controller: introducing a discretized PID controller to control the hydraulic servo system, and dynamically scaling the electric signal of the hydraulic servo system based on the obtained correction factor α (k).

2. The servo position control method according to claim 1, characterized in that: in the step S1, fuzzy sets are defined for the wall thickness deviation Δ_w and the pressing force F, the fuzzy sets comprising three levels, small S, medium M and large L, each fuzzy set being specified by a membership function.

3. The servo position control method according to claim 1, characterized in that: in the step S1, the fuzzy rules are established by a fuzzy rule base and the inference is performed by a fuzzy inference engine; the two cooperate to infer a fuzzy set FS of the fuzzy value α (k)' of the correction factor α (k) from the real-time values of the input wall thickness deviation Δ_w and pressing force F; wherein for each rule R_i:

；

wherein μ_Ai(Δ_w), μ_Bi(F) and μ_Ci(α(k)') are the membership functions of the fuzzy sets corresponding to the inputs Δ_w and F and the output α(k)', respectively;

the fuzzy inference engine outputs the fuzzy set FS of the fuzzy value α(k)' by a weighted-average aggregation method according to the activation degrees of all the fuzzy rules.

4. The servo position control method according to claim 3, characterized in that: in the step S1, the fuzzy set FS is defuzzified to obtain the fuzzy value α(k)':

α(k)' = ∫_a^b x·μ_F(x) dx / ∫_a^b μ_F(x) dx；

wherein μ_F(x) is the membership function of the fuzzy set FS, and a and b are the bounds of the domain of the fuzzy set.
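Assuming the defuzzification of claim 4 is a centroid (center-of-gravity) computation over the domain [a, b], which is what the definitions of μ_F(x), a and b suggest, it can be approximated numerically as sketched below. The function name `centroid`, the midpoint-rule integration and the sample count `n` are illustrative assumptions.

```python
def centroid(mu, a, b, n=1000):
    """Approximate the centroid of membership function mu over [a, b].

    Uses the midpoint rule with n equal sub-intervals. The common factor h
    cancels between numerator and denominator, so it is omitted.
    """
    h = (b - a) / n
    num = 0.0
    den = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h  # midpoint of the i-th sub-interval
        m = mu(x)
        num += x * m
        den += m
    if den == 0.0:
        # No rule fired anywhere on [a, b]; the caller must decide a fallback.
        raise ValueError("fuzzy set has zero area on [a, b]")
    return num / den
```

For a membership function that is symmetric about the middle of the domain, the result is the midpoint of [a, b], which gives a quick sanity check on any concrete rule base.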

5. The servo position control method according to claim 1, characterized in that: in the step S2, the state space and the action space are implemented by:

S200, state space S: comprising a real-time wall thickness deviation value d and a real-time applied pressure value f:

S = {(d, f) | d ∈ D, f ∈ F}；

action space A: defining the value range of the correction factor α(k):

A = {α₁(k), α₂(k),...,α_n(k)}；

wherein D is the range of values of the wall thickness deviation, and F is the set of values of the pressing force; α_i(k) is the i-th discrete value taken by the correction factor α(k) at the k-th time, and n is the total number of discrete values.
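A tabular reinforcement learning algorithm needs the continuous measurements d and f mapped onto a finite state index, and a finite list of candidate α(k) values. The sketch below shows one way this could be done. The bin edges, the units and the five candidate correction factors are hypothetical values chosen only for illustration; the claim does not specify them.

```python
from bisect import bisect_right

# Hypothetical bin edges (not from the patent): they split each input
# into three intervals, giving 3 x 3 = 9 discrete states.
D_EDGES = [0.1, 0.3]      # wall thickness deviation, assumed in mm
F_EDGES = [20.0, 40.0]    # pressing force, assumed in kN

# Hypothetical discrete action space A = {α_1(k), ..., α_n(k)} with n = 5.
ACTIONS = [0.8, 0.9, 1.0, 1.1, 1.2]


def to_state(d, f):
    """Map real-time values (d, f) to a discrete state (d_bin, f_bin)."""
    return (bisect_right(D_EDGES, d), bisect_right(F_EDGES, f))
```

With these edges, a value equal to an edge falls into the upper bin, because `bisect_right` returns the insertion point to the right of equal elements.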

6. The servo position control method according to claim 5, characterized in that: in the step S2, the reward function is implemented by:

S201, applying a positive and negative reward mechanism to the reward factor r:

；

wherein Δe is the change in the closing efficiency.

7. The servo position control method according to claim 5, characterized in that: in the step S2, the reinforcement learning algorithm is a SARSA algorithm; the introducing, the executing of the action and the observing in the step S2 are implemented by the SARSA algorithm through the following three steps:

S2020, combining with fuzzy logic control: based on the ε-greedy strategy, selecting an action a_t according to the current state s_t, the fuzzy value α(k)' and the value function Q(s_t, a) of the SARSA algorithm:

；

wherein ε is the exploration probability; the fuzzy value α(k)' is a weight for the selection of the action a_t; a is a specific value of the correction factor α(k);

S2021, performing the action and observing the reward: adjusting the correction factor α(k) according to the selected action a_t, and, after performing the selected action a_t, observing a new reward factor r' and a new hydraulic servo efficiency state s_{t+1};

S2022, updating the reinforcement learning algorithm: updating the value function Q(s_t, a) according to the new reward factor r' and the state s_{t+1}:

Q(s_t, a)' = Q(s_t, a) + η·[r' + γ·Q(s_{t+1}, a_{t+1}) − Q(s_t, a)]；

wherein η is the learning rate, which controls the update step size; γ is the discount factor; a_{t+1} is the next action selected in the new state s_{t+1} according to the value function and the ε-greedy strategy; and Q(s_t, a)' is the updated value function.
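The selection and update steps of claim 7 follow the general pattern of tabular SARSA, which can be sketched as below. This is a standard textbook form under stated assumptions, not the patent's exact formulas: in particular, the way the fuzzy value α(k)' weights the action selection is not reproduced in the text, so the sketch uses plain ε-greedy selection and leaves that weighting out. The function names and the dictionary-based Q table are illustrative.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    Assumption: plain epsilon-greedy; the fuzzy weighting by α(k)' from
    step S2020 is intentionally omitted here.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, eta, gamma):
    """Apply one SARSA update to Q[(s, a)] in place and return the new value.

    eta is the learning rate and gamma the discount factor.
    """
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += eta * (td_target - Q[(s, a)])
    return Q[(s, a)]


# Q table: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
```

Because SARSA is on-policy, `a_next` should be the action actually chosen by `epsilon_greedy` in `s_next`, and that same action is then executed on the following step.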

8. The servo position control method according to claim 7, characterized in that: in the step S2, the output is achieved by:

S203, acquiring the correction factor α(k):

。

9. The servo position control method according to any one of claims 1 to 8, characterized in that: in the step S3, the discretized PID controller is:

u(k) = K_p·e(k) + K_i·Σ_{j=0}^{k} e(j) + K_d·[e(k) − e(k−1)]；

wherein u(k) is the controller output at sampling instant k; e(k) is the deviation at sampling instant k; e(k−1) is the deviation at sampling instant k−1; e(j) is the deviation at sampling instant j; r(j) is the set value at sampling instant j, and y(j) is the actual value at the same sampling instant j;

e(j) = r(j) − y(j)；

K_p, K_i and K_d are the proportional, integral and derivative gains of the controller, respectively.
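A positional-form discrete PID controller consistent with the symbols defined in claim 9 can be sketched as follows. The class name `DiscretePID` is hypothetical, the sampling period is assumed to be absorbed into K_i and K_d, and multiplying the output by α(k) is only one possible reading of the "dynamic scaling" in step S3, not a form confirmed by the text.

```python
class DiscretePID:
    """Positional discrete PID: u(k) = Kp*e(k) + Ki*sum(e(j)) + Kd*(e(k) - e(k-1))."""

    def __init__(self, kp, ki, kd):
        self.kp = kp
        self.ki = ki
        self.kd = kd
        self.err_sum = 0.0   # running sum of e(j) up to the current sample
        self.prev_err = 0.0  # e(k-1), assumed 0 before the first sample

    def step(self, setpoint, measured, alpha=1.0):
        """Return the scaled control output for one sampling instant.

        alpha is the correction factor α(k); scaling by multiplication
        is an assumption made for illustration.
        """
        err = setpoint - measured            # e(k) = r(k) - y(k)
        self.err_sum += err
        u = (self.kp * err
             + self.ki * self.err_sum
             + self.kd * (err - self.prev_err))
        self.prev_err = err
        return alpha * u
```

For example, with gains (2.0, 0.5, 1.0), a first call `step(10.0, 8.0)` gives an error of 2 and an unscaled output of 4 + 1 + 2 = 7. A practical controller would usually also need output saturation and integral anti-windup, which are outside what the claim describes.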

10. A rotary wheel hydraulic servo position control system for a steel cylinder necking machine, characterized in that: the system comprises a processor and a memory connected to the processor, wherein the memory stores program instructions which, when executed by the processor, cause the processor to execute the servo position control method according to any one of claims 1-9, and, after the correction factor α(k) is generated, the correction factor α(k) is read by a PLC controller and used to control the rotary wheel hydraulic servo system.