CN118101441B - Service scheduling method, device, equipment and storage medium - Google Patents
- Publication number: CN118101441B (also listed as CN202410519870.3A)
- Authority: CN (China)
- Prior art keywords: node, cluster, nodes, network, service
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0663—Performing the actions predefined by failover planning, e.g. switching to standby network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Train Traffic Observation, Control, And Security (AREA)
- Hardware Redundancy (AREA)
Abstract
The application discloses a service scheduling method, device, equipment and storage medium, and relates to the technical field of railway signal operation and maintenance. The method comprises: dividing the nodes in the A network and the B network of a dual-network signal system of rail transit into two node groups respectively; clustering the computing resources of the two node groups respectively; and, when a node in one of the clusters is detected to have failed, scheduling the services of the failed node to other computing nodes in that cluster to run. Because the two node groups are clustered separately using cloud-native technology, every node in the dual network belongs to one of the two clusters, faults are discovered by the clusters, and the services of a failed node are quickly rescheduled to other nodes in the same cluster. Manual operation and maintenance work that takes hours can thus be shortened to system self-healing on the order of seconds, improving the safety, reliability and stability of train operation.
Description
Technical Field
The present application relates to the field of railway signal operation and maintenance technologies, and in particular, to a service scheduling method, device, equipment, and storage medium.
Background
Railway signal systems use two independent signal networks to ensure the safety and reliability of train operation. This AB dual-network design, commonly referred to as a dual-signal system or dual-network signal system, comprises an A network and a B network that serve as primary and standby for each other. It improves the reliability, stability and safety of the system and helps keep train operation smooth and safe.
The AB dual-network design contributes greatly to train safety, but the current design has an obvious defect: it lacks self-healing capability, which tends to reduce the reliability of the system. For example, when an A-network device fails, manual intervention is usually required to investigate and repair it immediately; otherwise the whole system depends on the B-network devices alone for a long time. If the A network cannot be repaired in time and the B network then also fails, train operation safety is seriously threatened and catastrophic consequences may follow. With the development of railway technology, particularly the rise of high-speed rail, information and automation technologies are widely used in railway signal systems, making them increasingly complex. Diagnosing and repairing the system in a short time has become a challenging task, and the complexity of the system in turn affects the reliability of the AB dual network.
Therefore, in the AB dual-network design, how to shorten the time for troubleshooting and repairing after the occurrence of a fault and to improve the stability and reliability of the signal system is a problem that needs to be solved at present.
Disclosure of Invention
The main purpose of the application is to provide a service scheduling method, device, equipment and storage medium, aiming to solve the technical problem that, in the AB dual network of a railway signal system, the long time needed to investigate and repair a network fault makes the signal system unstable and unreliable.
In order to achieve the above object, the present application provides a service scheduling method, which includes:
dividing nodes in an A network and nodes in a B network of a dual-network signal system of rail transit into a first node group and a second node group respectively, wherein the dual-network signal system comprises the A network and the B network;
clustering computing resources of the first node group and computing resources of the second node group through computing nodes of the dual-network signal system respectively to obtain a first cluster and a second cluster;
when a node in the first cluster is detected to have failed, scheduling, through the first cluster, the services of the failed node to the computing nodes where other nodes in the first cluster are located to run; or, when a node in the second cluster is detected to have failed, scheduling, through the second cluster, the services of the failed node to the computing nodes where other nodes in the second cluster are located to run.
In an embodiment, before the step of scheduling, through the first cluster, the services of the failed node to the computing nodes where other nodes in the first cluster are located when a node in the first cluster is detected to have failed, the method includes:
Acquiring operation data of nodes in the first cluster in real time, and analyzing whether the operation data is abnormal or not to obtain a data analysis result;
Acquiring running log information of nodes in the first cluster in real time, and analyzing the running log information to obtain a log analysis result;
and judging whether the nodes in the first cluster have faults or not according to the data analysis result and the log analysis result.
In an embodiment, the step of scheduling, through the first cluster, the services of the failed node to the computing nodes where other nodes in the first cluster are located when a node in the first cluster is detected to have failed includes:
when detecting that the node in the first cluster has a fault, establishing a target node in the computing nodes where other nodes in the first cluster are located through the first cluster;
And dispatching the service of the failed node to the target node for operation.
In an embodiment, after the step of scheduling the services of the failed node to the target node to run, the method further includes:
restarting the fault node through the first cluster;
When the restarting fails, node backup data of periodic backup is obtained;
performing an application hot-fix operation on the failed node according to the node backup data, or replacing a faulty component detected in the failed node with the corresponding healthy component in the node backup data, to obtain a repaired node;
and restoring the service of the fault node to the repaired node to operate.
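The repair sequence above (restart first, then fall back to periodic backup data) can be sketched roughly as follows. This is a minimal illustration only: the dictionary layout of a node, the `restart` and `load_backup` hooks, and the returned labels are all hypothetical, and only the component-replacement branch is shown, not the hot-fix branch.

```python
def repair_node(node, restart, load_backup):
    """Try a restart first; fall back to periodic backup data if it fails.

    `node` is a dict with a "components" mapping of component name -> state.
    `restart` and `load_backup` are caller-supplied callables (hypothetical hooks).
    Returns the name of the repair path that was taken.
    """
    if restart(node):
        return "restart"
    backup = load_backup(node["name"])
    faulty = [c for c, state in node["components"].items() if state != "healthy"]
    for component in faulty:
        # Replace each faulty component with the healthy copy from the backup.
        node["components"][component] = backup["components"][component]
    return "component-replacement"


# Hypothetical node whose restart fails and whose "driver" component is damaged.
node = {"name": "a1", "components": {"driver": "corrupt", "config": "healthy"}}
backup = {"components": {"driver": "healthy", "config": "healthy"}}
path = repair_node(node, restart=lambda n: False, load_backup=lambda name: backup)
```

Once the node is repaired, the services that were moved away during the fault would be scheduled back to it.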
In an embodiment, after the step of recovering the service of the failed node to the operation in the repaired node, the method includes:
Judging whether the repaired node meets a preset service operation requirement or not;
if the preset service operation requirement is not met, deploying a new service system through the first cluster;
and restoring the service of the fault node to the node of the new service system to operate.
In an embodiment, the step of deploying a new service system through the first cluster if the preset service operation requirement is not met includes:
packaging the first cluster and the dependency environment required by the first cluster using container technology to obtain a container image;
automatically building the container image and placing the built container image under version management to obtain a versioned container image;
and deploying a new service system according to the versioned container image.
In one embodiment, the step of deploying a new service system according to the versioned container image includes:
Testing the versioned container image in a production environment;
and when the test result meets a preset deployment condition, deploying a new service system according to the versioned container image.
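As a rough illustration of versioning an image and gating deployment on a test result, consider the sketch below. The tag format, the check functions and the `deploy` callback are assumptions made for this example; the application does not specify a tagging scheme or a test suite.

```python
import hashlib


def image_tag(name, build_number, content):
    """Derive a version tag from a build number and a short content digest."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    return f"{name}:{build_number}-{digest}"


def deploy_if_tests_pass(tag, checks, deploy):
    """Run every check against the image; deploy only if all of them pass."""
    failures = [check.__name__ for check in checks if not check(tag)]
    if failures:
        return False, failures
    deploy(tag)
    return True, []


def has_version(tag):
    """Hypothetical check: the tag carries a non-empty version part."""
    return ":" in tag and tag.split(":", 1)[1] != ""


def smoke_test(tag):
    """Hypothetical placeholder for starting the image and probing the service."""
    return True


deployed = []
tag = image_tag("signal-service", 42, b"image layers and dependencies")
ok, failures = deploy_if_tests_pass(tag, [has_version, smoke_test], deployed.append)
```

Keeping the digest in the tag means two builds with different contents never share a version, which makes rollback to a known image straightforward.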
In addition, to achieve the above object, the present application also proposes a service scheduling apparatus, including:
the node grouping module is used for dividing nodes in an A network and nodes in a B network of a dual-network signal system of rail transit into a first node group and a second node group respectively, wherein the dual-network signal system comprises the A network and the B network;
The node cluster module is used for respectively clustering the computing resources of the first node group and the computing resources of the second node group through the computing nodes of the double-network signal system to obtain a first cluster and a second cluster;
And the service scheduling module is used for scheduling the service of the failed node to the computing node where other nodes in the first cluster are located through the first cluster when the node in the first cluster is detected to fail, or scheduling the service of the failed node to the computing node where other nodes in the second cluster are located through the second cluster when the node in the second cluster is detected to fail.
In addition, to achieve the above object, the present application also proposes service scheduling equipment, the equipment comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being configured to implement the steps of the service scheduling method as described above.
In addition, to achieve the above object, the present application also proposes a storage medium, which is a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the service scheduling method as described above.
Furthermore, to achieve the above object, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the service scheduling method as described above.
The application provides a service scheduling method, which divides nodes in an A network and nodes in a B network of a dual-network signal system of rail transit into a first node group and a second node group respectively; clusters the computing resources of the first node group and the computing resources of the second node group respectively through computing nodes of the dual-network signal system to obtain a first cluster and a second cluster; and, when a node in the first cluster is detected to have failed, schedules the services of the failed node through the first cluster to the computing nodes where other nodes in the first cluster are located, or, when a node in the second cluster is detected to have failed, schedules the services of the failed node through the second cluster to the computing nodes where other nodes in the second cluster are located. Because the two node groups are clustered separately using cloud-native technology, every node in the dual network belongs to one of the two clusters, faults are discovered by the clusters, and the services of a failed node are quickly rescheduled to other nodes in the same cluster. Manual operation and maintenance work that takes hours can thus be shortened to system self-healing on the order of seconds, improving the safety, reliability and stability of train operation.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a first embodiment of a service scheduling method according to the present application;
Fig. 2 is a schematic diagram of a prior art dual-network signal system;
Fig. 3 is a schematic structural diagram of an improved dual-network signal system in the service scheduling method of the present application;
Fig. 4 is a schematic diagram of a system architecture after a node is newly built in the service scheduling method of the present application;
Fig. 5 is a schematic flow chart of a second embodiment of a service scheduling method according to the present application;
Fig. 6 is a schematic flow chart of a third embodiment of a service scheduling method according to the present application;
Fig. 7 is a schematic block diagram of a service scheduling device according to an embodiment of the present application;
Fig. 8 is a schematic device structure diagram of a hardware operating environment related to a service scheduling method in an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the technical solution of the present application and are not intended to limit the present application.
For a better understanding of the technical solution of the present application, the following detailed description will be given with reference to the drawings and the specific embodiments.
The main solution of the embodiments of the present application is: dividing nodes in an A network and nodes in a B network of a dual-network signal system of rail transit into a first node group and a second node group respectively, wherein the dual-network signal system comprises the A network and the B network; clustering computing resources of the first node group and computing resources of the second node group through computing nodes of the dual-network signal system respectively to obtain a first cluster and a second cluster; and, when a node in the first cluster is detected to have failed, scheduling the services of the failed node through the first cluster to the computing nodes where other nodes in the first cluster are located, or, when a node in the second cluster is detected to have failed, scheduling the services of the failed node through the second cluster to the computing nodes where other nodes in the second cluster are located.
The existing AB dual-network design has an obvious defect: it lacks self-healing capability, which tends to reduce the reliability of the system. For example, when an A-network device fails, manual intervention is usually required to investigate and repair it immediately; otherwise the whole system depends on the B-network devices alone for a long time. If the A network cannot be repaired in time and the B network then also fails, train operation safety is seriously threatened and catastrophic consequences may follow. Therefore, how to shorten the time for troubleshooting and repair after a fault occurs, and how to improve the stability and reliability of the signal system, are problems to be solved at present.
The application provides a solution in which the two node groups are clustered using cloud-native technology, so that every node in the dual network belongs to one of the two clusters. Faults are discovered by the clusters, and the services of a failed node are quickly scheduled to other nodes in the same cluster, shortening hours of manual operation and maintenance work to system self-healing on the order of seconds and improving the safety, reliability and stability of train operation.
It should be noted that the executing entity of the method of this embodiment may be a computing service device with service scheduling, network communication and program running functions, such as a tablet computer, a personal computer or a mobile phone, or the service scheduling device described above, which has the same or similar functions. This embodiment and the following embodiments are described taking the service scheduling device as an example.
Based on this, an embodiment of the present application provides a service scheduling method, and referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of the service scheduling method of the present application.
In this embodiment, the service scheduling method includes steps S10 to S30:
Step S10, dividing nodes in an A network and nodes in a B network of a dual-network signal system of rail transit into a first node group and a second node group respectively, wherein the dual-network signal system comprises the A network and the B network.
It should be noted that a rail transit signal system may use two independent signal networks to ensure the safety and reliability of train operation. This AB dual-network design is generally called a dual-signal system or dual-network signal system, and comprises an A network and a B network that serve as primary and standby for each other. For example, the structure of an existing dual-network signal system is shown in fig. 2. Taking three subway stations (the first, second and third stations) as an example, each station has two groups of hosts, A and B, connected simultaneously to the two ring networks (red and blue), so the network as a whole has no single point of failure. In railway transportation, safety is critical, and the signal system must therefore be designed for high reliability. By using two independent signal networks, the AB dual-network design provides redundancy, so that trains can still run safely even if one signal network fails.
It will be appreciated that the two signal networks, the A network and the B network, are typically composed of different equipment, lines and control systems and operate independently of each other. Rail transit trains receive signal information over both networks to ensure signal accuracy and reliability. If one network fails or needs maintenance, the system can switch automatically to the other network, ensuring the continuity and safety of train operation.
It should be noted that, in the AB dual-network design of a railway signal system, nodes generally refer to key devices or control points in the signal system that execute or transmit commands and information to control the operating state of trains and the use of the track. These nodes may include signal controllers, track circuits, switch controllers, signal lights, wireless communication devices and the like, which together form a signal network that ensures the safe and efficient operation of trains.
It should be understood that the dual-network signal system of the AB dual-network design lacks self-healing capability, and manual fault investigation and equipment repair take too long. To address this, in this embodiment the nodes in the A network and the nodes in the B network are divided into a first node group and a second node group respectively, so that all nodes in one network are managed together as a group. This makes it convenient to cluster each group subsequently, speeds up fault detection, and allows services to be scheduled to healthy nodes in time.
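A minimal Python sketch of the grouping in step S10 follows. The `Node` record, its `network` field and the site names are hypothetical and introduced only for illustration; the application does not prescribe a data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical record for one signal-system node."""
    name: str
    site: str
    network: str  # "A" or "B"


def group_by_network(nodes):
    """Split nodes into a first group (A network) and a second group (B network)."""
    groups = defaultdict(list)
    for node in nodes:
        if node.network not in ("A", "B"):
            raise ValueError(f"unknown network: {node.network!r}")
        groups[node.network].append(node)
    return groups["A"], groups["B"]


nodes = [
    Node("a1", "site-1", "A"), Node("b1", "site-1", "B"),
    Node("a2", "site-2", "A"), Node("b2", "site-2", "B"),
]
first_group, second_group = group_by_network(nodes)
```

Each group then contains exactly one network's nodes across all sites, which is the unit that is clustered in the next step.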
Step S20, clustering computing resources of the first node group and computing resources of the second node group through computing nodes of the dual-network signal system respectively to obtain a first cluster and a second cluster;
It can be understood that clustering software may be deployed on the computing nodes of the dual-network signal system, and the computing resources of the first node group and of the second node group are clustered respectively to obtain a first cluster and a second cluster (i.e., the two clusters A and B). The structure of the clustered system is shown in fig. 3, which is a schematic structural diagram of the improved dual-network signal system in the service scheduling method of the present application. Under normal operation, the services of each node run on the computing node of the site to which the node belongs.
It should be appreciated that the clustering technique may be to aggregate multiple computing resources together for management and use as a whole. For example, clustering may be performed by the container orchestration tool Kubernetes, which allows multiple containers to work cooperatively in a cluster, providing load balancing, service discovery, auto-scaling, etc., to manage large-scale containerized applications.
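The idea of aggregating several computing resources and managing them as one pool can be sketched as below. This is a toy model, not Kubernetes: the `ComputeNode` and `Cluster` classes and the abstract CPU units are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    """Hypothetical compute node; capacity is in abstract CPU units."""
    name: str
    cpu_total: int
    cpu_used: int = 0

    @property
    def cpu_free(self):
        return self.cpu_total - self.cpu_used


@dataclass
class Cluster:
    """A pool of compute nodes managed as one unit."""
    name: str
    members: list = field(default_factory=list)

    def join(self, node):
        self.members.append(node)

    @property
    def cpu_free(self):
        # Free capacity of the pool is the sum over all member nodes.
        return sum(n.cpu_free for n in self.members)


def build_cluster(name, compute_nodes):
    cluster = Cluster(name)
    for node in compute_nodes:
        cluster.join(node)
    return cluster


first_cluster = build_cluster("A", [ComputeNode("a1", 8, 3), ComputeNode("a2", 8, 2)])
second_cluster = build_cluster("B", [ComputeNode("b1", 8, 3), ComputeNode("b2", 8, 5)])
```

Viewing free capacity at the cluster level is what lets a scheduler place a failed node's services on any member with room to spare.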
Step S30, when a node in the first cluster is detected to have failed, scheduling, through the first cluster, the services of the failed node to the computing nodes where other nodes in the first cluster are located to run; or, when a node in the second cluster is detected to have failed, scheduling, through the second cluster, the services of the failed node to the computing nodes where other nodes in the second cluster are located to run.
It will be appreciated that, in the AB dual-network design of a railway signal system, a computing node typically refers to a device or system with data processing and computing capabilities that analyzes and processes the collected information to guide the safe operation of trains. Computing nodes are a subset of nodes. They may be distributed over the A network or the B network, and may process data from both networks and make decisions based on that data.
It is worth noting that the dual-network signal system as a whole has computing nodes, and each node in the AB dual network also has a computing node; these computing nodes are typically independent of one another. The computing nodes of the dual-network signal system typically process signal data from track circuits and train detection equipment and generate control instructions from that data, so that trains run safely on their given paths and schedules. Computing nodes are indispensable in railway signal systems and may be located in control centers, in signal base stations along the line, or even on trains. Each node in the AB dual network (i.e., a node belonging to a specific site such as the first site or the second site) may be equipped with its own computing node, which controls and manages the signal devices of that site. These computing nodes may be connected to the two separate signal networks, the A network and the B network, to receive data and send commands.
It should be noted that, in this embodiment, the nodes in the A network are clustered into the first cluster and the nodes in the B network into the second cluster. Each cluster can use the health monitoring mechanism of the cloud-native system, together with detection techniques such as fault diagnosis, an intelligent monitoring system or a dual-machine hot-standby mechanism, to discover a failed node in its network quickly. The cluster can discover a fault within 300 ms to 2 seconds, which improves fault detection efficiency and greatly reduces operation and maintenance difficulty compared with manual operation and maintenance that takes hours. After the failed node is found, the scheduling system of the cloud-native cluster can quickly recover its services on a healthy node in the same cluster. For example, after a fault is detected on the B node of the first site in fig. 3, the second cluster can quickly schedule the services of that B node to the B node of the second site.
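A heartbeat timeout is one common way to implement this kind of health monitoring; the sketch below assumes it, although the application does not name a specific mechanism. The 2-second timeout mirrors the upper bound quoted above, and the node names, service names and least-loaded placement rule are hypothetical.

```python
HEARTBEAT_TIMEOUT_S = 2.0  # assumption: upper bound of the detection window in the text


def find_failed(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT_S):
    """Return names of nodes whose last heartbeat is older than `timeout` seconds."""
    return sorted(name for name, seen in last_heartbeat.items() if now - seen > timeout)


def reschedule(placement, failed, healthy):
    """Move every service placed on a failed node to the least-loaded healthy node."""
    if not healthy:
        raise RuntimeError("no healthy node left in this cluster")
    load = {name: 0 for name in healthy}
    for node in placement.values():
        if node in load:
            load[node] += 1
    new_placement = dict(placement)
    for service, node in sorted(placement.items()):
        if node in failed:
            # Ties are broken by node name so the result is deterministic.
            target = min(healthy, key=lambda n: (load[n], n))
            new_placement[service] = target
            load[target] += 1
    return new_placement


# Second cluster (B network): the B node at site 1 stops sending heartbeats.
last_heartbeat = {"b-site1": 10.0, "b-site2": 14.5, "b-site3": 14.8}
placement = {"service-1": "b-site1", "service-2": "b-site2", "service-3": "b-site3"}
failed = find_failed(last_heartbeat, now=15.0)
healthy = [n for n in last_heartbeat if n not in failed]
new_placement = reschedule(placement, failed, healthy)
```

Rescheduling stays inside one cluster, so a fault in the B network never consumes A-network resources and the two networks remain independent.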
In one possible implementation, step S30 may include steps S301 to S302:
Step S301, when detecting that a node in the first cluster has a fault, establishing a target node in a computing node where other nodes in the first cluster are located through the first cluster.
Step S302, scheduling the services of the failed node to the target node to run.
When a node in the first cluster (or the second cluster; the first cluster is used as the example here and below) is detected to have failed, the first cluster may use the computing resources of its other nodes to establish a target node on the computing node where one of those other nodes is located. For example, fig. 4 is a schematic diagram of the system structure after a node is newly built in the service scheduling method of the present application. After the B node of the first site fails, the second cluster may build a new B node at the second site and schedule the services of the first site's B node to it, so that scheduling and service recovery are completed within 5 seconds. By using the computing resources of the second site, the services of the first site are quickly restored to a dual-active, highly reliable and highly secure state.
In this embodiment, when the failure of the node in the first cluster is detected, the node is established at another site in the cluster to operate the service of the failed node, and the service of the failed node can be quickly restored to a highly reliable and highly safe state by using the computing resources of the other site.
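Steps S301 and S302 can be sketched as two small functions: one creates a replacement node on another site's computing node, the other moves the services over. The `Site` class, the "-replacement" naming and the least-loaded host choice are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical site whose compute node can host several logical nodes."""
    name: str
    hosted: dict = field(default_factory=dict)  # logical node name -> list of services


def establish_target(sites, failed_site, failed_node):
    """S301: create a replacement node on the least-loaded other site."""
    candidates = [s for s in sites if s.name != failed_site]
    if not candidates:
        raise RuntimeError("no other site available in the cluster")
    host = min(candidates, key=lambda s: (len(s.hosted), s.name))
    target = f"{failed_node}-replacement"
    host.hosted[target] = []
    return host, target


def dispatch(sites, failed_site, failed_node, host, target):
    """S302: move the failed node's services to the target node."""
    source = next(s for s in sites if s.name == failed_site)
    host.hosted[target].extend(source.hosted.pop(failed_node))


# The B node of site 1 fails; a replacement is built at site 2.
sites = [
    Site("site-1", {"b1": ["signal-control-1"]}),
    Site("site-2", {"b2": ["signal-control-2"]}),
]
host, target = establish_target(sites, "site-1", "b1")
dispatch(sites, "site-1", "b1", host, target)
```

Separating node creation from service dispatch mirrors the two steps in the text and leaves room to verify the target before any service is moved.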
This embodiment provides a service scheduling method, which divides the nodes in the A network and the B network of a dual-network signal system of rail transit into two node groups respectively; clusters the computing resources of the two node groups respectively through computing nodes; and, when a node in one of the clusters is detected to have failed, schedules the services of the failed node to the computing nodes where other nodes in that cluster are located. In this embodiment, the two node groups are clustered using cloud-native technology, so that every node in the dual network belongs to one of the two clusters, faults are discovered by the clusters, and the services of a failed node are quickly scheduled to other nodes in the same cluster. Hours of manual operation and maintenance work can thus be shortened to system self-healing on the order of seconds, improving the safety, reliability and stability of train operation.
In the second embodiment of the present application, content that is the same as or similar to the first embodiment may be understood with reference to the above description and is not repeated here. On this basis, referring to fig. 5, which is a schematic flow chart of the second embodiment of the service scheduling method of the present application, before step S30 the service scheduling method further includes steps S201 to S203:
Step S201, acquiring operation data of nodes in the first cluster in real time, and analyzing whether the operation data is abnormal or not to obtain a data analysis result.
It will be appreciated that the operation data of the nodes in the first cluster may include data related to hardware information, software information, network communication status information, sensor information, power supply information, interface information, and the like. After the operation data is obtained, each kind of information can be analyzed in turn to obtain the data analysis result. For example: the hardware information may cover components such as track circuits, annunciators and switches, which may fail because of frequency interference or equipment aging; the software information may reveal software faults, that is, errors in the system software that prevent the signal system from processing information or executing commands correctly; the network communication status information is monitored for conditions such as network disconnection, network delay or packet loss; the sensor information comes from sensors that monitor the working state of key components, such as temperature, pressure and current, and a possible fault can be inferred as soon as an abnormal value is detected; the power supply information matters because an unstable or interrupted power supply can also cause node faults and affect the normal operation of the whole signal system; and the interface information matters because the devices in the signal system are connected through interfaces, so an interface fault can disrupt normal communication and cooperation between devices.
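As a rough illustration of step S201, the following Python sketch flags metrics that are missing or fall outside a normal range. The metric names and the ranges are hypothetical; the description lists the categories of operation data but gives no concrete fields or thresholds.

```python
# Hypothetical normal ranges per metric; the source names the categories
# (sensor, network, power supply, ...) but specifies no values.
NORMAL_RANGES = {
    "temperature_c": (-10.0, 70.0),
    "packet_loss_pct": (0.0, 1.0),
    "network_delay_ms": (0.0, 50.0),
    "supply_voltage_v": (21.0, 27.0),
}


def analyze_operation_data(sample):
    """Return the metrics that are missing from the sample or outside their normal range."""
    anomalies = []
    for metric, (low, high) in NORMAL_RANGES.items():
        value = sample.get(metric)
        if value is None or not (low <= value <= high):
            anomalies.append(metric)
    return anomalies


# One sample from a node: the temperature is too high and the supply voltage is not reported.
sample = {"temperature_c": 85.2, "packet_loss_pct": 0.2, "network_delay_ms": 12.0}
data_analysis_result = analyze_operation_data(sample)
```

Treating a missing reading as an anomaly is a design choice of this sketch: a sensor or power feed that stops reporting is handled the same way as one that reports a bad value.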
Step S202, acquiring running log information of nodes in the first cluster in real time, and analyzing the running log information to obtain a log analysis result.
It should be understood that, in order to detect the operating state of a node more comprehensively, log files may be provided for each device, each type of sensor, the communication network and the software to collect log information while services are running. By analyzing the log information, abnormal behavior and the specific link in which an error occurred can be traced, so that the problem can be repaired.
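The log analysis of step S202 might look like the following Python sketch, which counts error-level entries per component so that the failing link can be traced. The log line layout, the level names and the component names are assumptions; the description does not define a log format.

```python
import re

# Assumed log line layout: "<timestamp> <LEVEL> <component>: <message>"
LOG_LINE = re.compile(r"^(\S+)\s+(DEBUG|INFO|WARN|ERROR|FATAL)\s+([\w.-]+):\s+(.*)$")


def analyze_logs(lines):
    """Count ERROR and FATAL entries per component; lines that do not parse are skipped."""
    errors = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) in ("ERROR", "FATAL"):
            component = match.group(3)
            errors[component] = errors.get(component, 0) + 1
    return errors


# Hypothetical log excerpt collected from one node.
lines = [
    "2024-04-28T10:00:00Z INFO net-a: link up",
    "2024-04-28T10:00:03Z ERROR sensor-7: reading out of range",
    "2024-04-28T10:00:04Z ERROR sensor-7: reading out of range",
    "2024-04-28T10:00:05Z FATAL psu-2: supply interrupted",
    "malformed line",
]
log_analysis_result = analyze_logs(lines)
```

A non-empty result from this function, like a non-empty result from the operation data check, would be one input to the fault judgment of step S203.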
Step S203, judging whether the nodes in the first cluster have faults according to the data analysis result and the log analysis result.
It should be understood that the data analysis result and the log analysis result are used to determine whether there is abnormal data or an abnormal state that affects the normal operation of the railway or the equipment; if such an abnormality exists, the node is determined to have failed.
In this embodiment, operation data and running log information of the nodes in the first cluster are acquired in real time and analyzed for abnormalities, and whether a node in the first cluster has failed is judged according to the data analysis result and the log analysis result. The health monitoring mechanism of the cloud-native system monitors the signal system in real time by multiple means, so that failed nodes in the network are discovered quickly, the health of the dual-network signal system is maintained, and the safety of train operation is ensured.
In the third embodiment of the present application, content that is the same as or similar to the first embodiment may be understood with reference to the above description and is not repeated here. On this basis, referring to fig. 6, which is a schematic flow chart of the third embodiment of the service scheduling method of the present application, after step S30 the service scheduling method further includes steps S40 to S70:
Step S40, restarting the fault node through the first cluster.
Step S50, when the restart fails, obtaining the periodically backed-up node backup data.
Step S60, applying a hot fix to the fault node according to the node backup data, or replacing the detected fault component in the fault node with the corresponding healthy component in the node backup data, to obtain a repaired node.
Step S70, restoring the service of the fault node to the repaired node to operate.
It should be appreciated that after a node failure is discovered, the failure may be repaired through the repair mechanism of the cloud-native system, which diagnoses and resolves the part of the system where the problem occurred and may involve applying software patches, adjusting configuration, or other necessary intervention. To repair a failed node, the node is first restarted; if the restart fails, a substantive fault can be considered to have occurred, and a hot fix or component replacement may be applied to the failed node so that it can rejoin the cluster and return to its normal operating state. In a cloud-native environment, multiple replicas can be deployed and distributed across different nodes, so that even if one node fails, the replicas on other nodes continue to provide the service and the service is not interrupted. In Kubernetes, this can be achieved with ReplicaSets and Pod replicas. In addition, to guard against data loss and catastrophic failure, the cloud-native system may periodically back up data to persistent storage, which preserves the integrity and consistency of the data so that it can be restored correctly when a failure occurs. Therefore, when the restart fails, the periodically backed-up node backup data can be obtained, and either a hot fix is applied to the failed node according to the node backup data, or a faulty component detected in the failed node is replaced with the corresponding healthy component from the node backup data, yielding a repaired node. The services of the failed node are then restored to run on the repaired node, which ensures the stability and availability of the system.
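The escalation in steps S40 to S60 can be sketched in Python as follows. The four callables are hypothetical hooks, not APIs of any real platform, and trying the hot fix before component replacement is an assumption of this sketch: the description presents the two as alternatives without fixing an order.

```python
def repair_node(node, restart, load_backup, hot_fix, replace_components):
    """Try a restart first; if it fails, fall back to the periodic backup.

    All callables are hypothetical hooks supplied by the caller:
      restart(node) -> bool
      load_backup(node) -> backup object
      hot_fix(node, backup) -> bool
      replace_components(node, backup) -> bool
    Returns the name of the step that succeeded, or None if every step failed.
    """
    if restart(node):
        return "restart"
    backup = load_backup(node)          # backup data is only fetched after a failed restart
    if hot_fix(node, backup):
        return "hot_fix"
    if replace_components(node, backup):
        return "replace_components"
    return None


# Illustrative run: the restart fails and the hot fix succeeds.
steps = []
outcome = repair_node(
    "node-b",
    restart=lambda n: steps.append("restart") or False,
    load_backup=lambda n: {"config": "hypothetical-backup"},
    hot_fix=lambda n, b: steps.append("hot_fix") or True,
    replace_components=lambda n, b: steps.append("replace") or True,
)
```

Returning the name of the successful step keeps the sketch easy to audit: a caller can log which repair path was taken before restoring the service to the repaired node (step S70).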
In a first possible implementation, after step S70, the method may further include steps S80 to S100:
Step S80, judging whether the repaired node meets the preset service operation requirement.
Step S90, if the preset service operation requirement is not met, deploying a new service system through the first cluster.
Step S100, restoring the service of the fault node to the node of the new service system to operate.
It should be appreciated that after a node failure is discovered, an attempt may first be made to solve the problem through the repair mechanism described above, including applying a hot fix, replacing the faulty component, and so forth. If the problem cannot be solved by direct repair, or the repaired node cannot meet the preset service operation requirement (that is, the repaired node cannot run the service normally), or the failed node needs major updates and changes, a new, corrected version of the service system can be deployed using the immutable infrastructure mechanism of the first cluster. The new version is built in a clean, unmodified environment, and the services of the failed node are restored to run on a node of the new service system, which guarantees the cleanliness and consistency of the new environment. Combining the two mechanisms, repair and immutable infrastructure, ensures the continuity and stability of the system.
In this embodiment, when the repaired node does not meet the preset service operation requirement, a new service system is deployed through the first cluster, and the services of the failed node are restored to run on a node of the new service system. That is, when the repair mechanism of the cloud-native system has been tried but direct repair cannot solve the problem (the repaired node still cannot run the service normally), or when the system needs major updates and changes, a new healthy version is deployed through the cloud-native immutable infrastructure mechanism. This ensures the continuity and stability of the system, so that the system is quickly restored to the A/B dual-active state.
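Steps S80 to S100 amount to a check followed by a fallback, as in the Python sketch below. The `ServiceRequirement` fields, the probe format and the node names are assumptions made for illustration; the description does not say what the preset service operation requirement consists of.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRequirement:
    """Hypothetical form of the preset service operation requirement."""
    max_response_ms: float
    min_success_rate: float


def meets_requirement(probe, requirement):
    """Compare a measured probe of the repaired node against the preset requirement."""
    return (
        probe["response_ms"] <= requirement.max_response_ms
        and probe["success_rate"] >= requirement.min_success_rate
    )


def choose_recovery_target(repaired_node, probe, requirement, deploy_new_system):
    """Keep the repaired node if it is good enough; otherwise deploy a fresh service system."""
    if meets_requirement(probe, requirement):
        return repaired_node
    return deploy_new_system()      # hypothetical hook that returns a node of the new system


requirement = ServiceRequirement(max_response_ms=200.0, min_success_rate=0.999)
probe = {"response_ms": 850.0, "success_rate": 0.97}   # repaired node is still too slow
target = choose_recovery_target("node-b-repaired", probe, requirement, lambda: "node-b-new")
```

Because the deployment hook is only called when the check fails, a healthy repaired node never triggers a redeployment.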
In a second possible embodiment, step S90 may include steps S901 to S903:
Step S901, packaging the first cluster and the dependency environment required by the first cluster using container technology to obtain a container image.
It will be appreciated that a cloud virtualization infrastructure can be used, which is the basis for building an immutable infrastructure and provides flexible, scalable computing resources. The applications and their dependency environment are then packaged: the application programs in the first cluster, together with the required dependency environment, base libraries, system environment and the like, are packaged into a container image using container technology.
Step S902, automatically building the container image, and applying version management to the built container image to obtain a versioned container image.
It should be appreciated that automated building of container images, together with version management of those images, makes it possible to roll back quickly to any historical version. Once the automated build step for the container image has been configured, the build process is triggered automatically each time new code is submitted.
Step S903, deploying a new service system according to the versioned container image.
It will be appreciated that the versioned container image may be deployed automatically into the production environment by a continuous deployment system, which typically involves steps such as pulling the image and running container instances, to form the new service system.
In this embodiment, using container images as the standard for deploying and managing applications and their dependency environments ensures consistency and repeatability across different environments. Automated building of container images, together with version management of those images, makes it possible to roll back quickly to any historical version. Automated building and deployment of container images also improves development and deployment efficiency, reduces human error, and improves the overall stability and security of the system.
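The version management described in steps S901 to S903 can be mimicked with a small in-memory model in Python. This is a stand-in, not a real container registry: the `ImageRegistry` class, the sequential `vN` tags and the file contents are all hypothetical, and no actual image is built.

```python
import hashlib


class ImageRegistry:
    """In-memory stand-in for a registry that keeps every built image version."""

    def __init__(self):
        self._images = {}      # tag -> content digest
        self._history = []     # tags in build order, oldest first

    def build(self, app_files, dependencies):
        """Package application files and dependencies; tag the result and record its digest."""
        payload = repr((sorted(app_files.items()), sorted(dependencies))).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        tag = f"v{len(self._history) + 1}"
        self._images[tag] = digest
        self._history.append(tag)
        return tag

    def digest(self, tag):
        return self._images[tag]

    def previous(self, tag):
        """Return the tag built immediately before `tag`, i.e. the rollback target."""
        index = self._history.index(tag)
        if index == 0:
            raise LookupError(f"{tag} is the first version; nothing to roll back to")
        return self._history[index - 1]


registry = ImageRegistry()
v1 = registry.build({"main.py": "print('v1')"}, ["libfoo==1.0"])   # hypothetical contents
v2 = registry.build({"main.py": "print('v2')"}, ["libfoo==1.0"])
rollback_tag = registry.previous(v2)
```

Because every build is kept and never modified, rolling back is only a matter of deploying an earlier tag, which is the property the immutable infrastructure approach relies on.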
In a third possible embodiment, step S903 may include steps S9031 to S9032:
Step S9031, testing the versioned container image in a production environment.
Step S9032, when the test result meets the preset deployment condition, deploying a new service system according to the versioned container image.
It will be readily appreciated that, before the new version is deployed, the versioned container image needs to be thoroughly tested and validated in a production-like environment to ensure the stability and reliability of the new version. Once the new version passes the tests, traffic can be migrated gradually from the old version to the new version through a traffic switching mechanism, for example a service mesh or a load balancer, to achieve seamless switching. In addition, throughout the deployment process, the performance and health of the application can be tracked by the monitoring system while log information is collected, which makes it easier to investigate and analyze problems. If a problem is found in the new version, the system can therefore be rolled back quickly to the previous version to reduce the impact on the service.
In this embodiment, before a new version is deployed, testing and verifying are performed on the versioned container image in a production environment, and when a test result meets a preset deployment condition, a new service system is deployed according to the versioned container image, so that stability and reliability of the new version can be ensured.
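The test gate and gradual switch-over can be sketched in Python as below. The pass-rate condition, the traffic steps of 10, 50 and 100 percent, and the `health_check` hook are assumptions; the description only requires that the test result meet a preset deployment condition and that traffic be migrated gradually with the option of rolling back.

```python
def deploy_with_gate(test_results, required_pass_rate, health_check, steps=(10, 50, 100)):
    """Return the final share of traffic (in percent) routed to the new version.

    test_results: list of booleans from testing the versioned image.
    health_check(share) -> bool: hypothetical monitoring hook, called after each traffic step.
    A return value of 0 means the new version was never deployed or was rolled back.
    """
    passed = sum(1 for ok in test_results if ok)
    if not test_results or passed / len(test_results) < required_pass_rate:
        return 0                      # preset deployment condition not met
    for share in steps:
        if not health_check(share):
            return 0                  # problem found: return all traffic to the previous version
    return steps[-1]


full_rollout = deploy_with_gate([True, True, True], 1.0, lambda share: True)
blocked = deploy_with_gate([True, False], 1.0, lambda share: True)
rolled_back = deploy_with_gate([True] * 4, 1.0, lambda share: share < 50)
```

In the three example calls, the first reaches full traffic, the second is blocked by the test gate, and the third is rolled back when the health check fails at the 50 percent step.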
In this embodiment, the failed node is restarted by the first cluster; when the restart fails, a hot fix or component replacement is applied to the failed node using the periodically backed-up node backup data to obtain a repaired node, and the services of the failed node are restored to run on the repaired node; when the repaired node does not meet the preset service operation requirement, the application programs and their dependency environment are deployed and managed using container images as the standard, which ensures consistency and repeatability across different environments, and after the container image has been thoroughly tested and validated a new service system is deployed and the services of the failed node are restored to run on a node of the new service system. The failed node can thus be repaired, or a new service environment can be established, so that the services of the failed node run normally, the overall stability and security of the system are improved, and the safety of train operation is ensured.
The present application also provides a service scheduling device, referring to fig. 7, the service scheduling device includes:
A node grouping module 701, configured to divide nodes in an A network and nodes in a B network of a dual-network signal system of rail transit into a first node group and a second node group, respectively, where the dual-network signal system includes the A network and the B network;
A node clustering module 702, configured to cluster, through computing nodes of the dual-network signal system, the computing resources of the first node group and the computing resources of the second node group respectively, to obtain a first cluster and a second cluster;
A service scheduling module 703, configured to, when a node in the first cluster is detected to have failed, schedule the services of the failed node through the first cluster to run on a computing node where other nodes in the first cluster are located, or, when a node in the second cluster is detected to have failed, schedule the services of the failed node through the second cluster to run on a computing node where other nodes in the second cluster are located.
The service scheduling device provided by the application can solve the technical problem by adopting the service scheduling method in the embodiment. Compared with the prior art, the service scheduling device provided by the application has the same beneficial effects as the service scheduling method provided by the embodiment, and other technical features in the service scheduling device are the same as the features disclosed by the method of the embodiment, and are not repeated herein.
The application provides service scheduling equipment, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the service scheduling method in the first embodiment.
Referring now to fig. 8, a schematic diagram of service scheduling equipment suitable for implementing embodiments of the present application is shown. The service scheduling equipment in the embodiment of the present application may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), vehicle-mounted terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The service scheduling equipment shown in fig. 8 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in fig. 8, the service scheduling equipment may include a processing device 1001 (e.g., a central processing unit, a graphics processor, etc.), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the service scheduling equipment. The processing device 1001, the ROM 1002, and the RAM 1004 are connected to each other by a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus. In general, the following devices may be connected to the I/O interface 1006: an input device 1007 including, for example, a touch screen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, and the like; an output device 1008 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; the storage device 1003 including, for example, a magnetic tape, a hard disk, and the like; and a communication device 1009. The communication device 1009 may allow the service scheduling equipment to communicate wirelessly or by wire with other devices to exchange data. While service scheduling equipment having various devices is shown in the figure, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 1009, or installed from the storage device 1003, or installed from the ROM 1002. When the computer program is executed by the processing device 1001, the above-described functions defined in the method of the embodiment of the present disclosure are performed.
The service scheduling equipment provided by the application can solve the technical problem of service scheduling by adopting the service scheduling method in the above embodiment. Compared with the prior art, the service scheduling equipment provided by the application has the same beneficial effects as the service scheduling method provided by the above embodiment, and the other technical features of the service scheduling equipment are the same as the features disclosed by the method of the previous embodiment, and are not repeated herein.
It is to be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the description of the above embodiments, particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto; any variation or substitution that a person skilled in the art can readily conceive of falls within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
The present application provides a computer readable storage medium having computer readable program instructions (i.e., a computer program) stored thereon for performing the service scheduling method in the above-described embodiments.
The readable storage medium provided by the application is a computer readable storage medium, and the computer readable storage medium stores computer readable program instructions (i.e. a computer program) for executing the service scheduling method, so that the technical problem can be solved. Compared with the prior art, the beneficial effects of the computer readable storage medium provided by the application are the same as those of the service scheduling method provided by the above embodiment, and are not described herein.
The application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the service scheduling method as described above.
The computer program product provided by the application can solve the technical problem. Compared with the prior art, the beneficial effects of the computer program product provided by the application are the same as those of the service scheduling method provided by the above embodiment, and are not described in detail herein.
The foregoing description is only a partial embodiment of the present application, and is not intended to limit the scope of the present application, and all the equivalent structural changes made by the description and the accompanying drawings under the technical concept of the present application, or the direct/indirect application in other related technical fields are included in the scope of the present application.
Claims (5)
1. A service scheduling method, characterized by comprising the following steps:
Dividing nodes in an A network and nodes in a B network of a dual-network signal system of rail transit into a first node group and a second node group respectively, wherein the dual-network signal system comprises the A network and the B network;
clustering computing resources of the first node group and computing resources of the second node group through computing nodes of the dual-network signal system respectively to obtain a first cluster and a second cluster;
Acquiring operation data of nodes in the first cluster in real time, and analyzing whether the operation data is abnormal or not to obtain a data analysis result;
Acquiring running log information of nodes in the first cluster in real time, and analyzing the running log information to obtain a log analysis result;
judging whether the nodes in the first cluster have faults or not according to the data analysis result and the log analysis result;
When the node in the first cluster is detected to be faulty, a target node is established in the computing nodes where other nodes in the first cluster are located through the first cluster, and the service of the faulty node is scheduled to run in the target node, or when the node in the second cluster is detected to be faulty, the service of the faulty node is scheduled to run in the computing nodes where other nodes in the second cluster are located through the second cluster;
restarting the fault node through the first cluster;
When the restarting fails, node backup data of periodic backup is obtained;
applying a hot fix to the fault node according to the node backup data, or replacing the detected fault component in the fault node with the corresponding healthy component in the node backup data, to obtain a repaired node;
restoring the service of the fault node to the repaired node to operate;
Judging whether the repaired node meets a preset service operation requirement or not;
if the preset service operation requirement is not met, deploying a new service system through the first cluster;
and restoring the service of the fault node to the node of the new service system to operate.
2. The method of claim 1, wherein the step of deploying a new service system through the first cluster if the preset service operation requirement is not satisfied comprises:
packaging the first cluster and the dependency environment required by the first cluster using container technology to obtain a container image;
Automatically building the container image, and applying version management to the built container image to obtain a versioned container image;
And deploying a new service system according to the versioned container image.
3. The method of claim 2, wherein the deploying a new business system according to the versioned container image comprises:
Testing the versioned container image in a production environment;
And when the test result meets the preset deployment condition, deploying a new service system according to the versioned container image.
4. A service scheduling device, the device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being configured to implement the steps of the service scheduling method of any one of claims 1 to 3.
5. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, which computer program, when executed by a processor, implements the steps of the service scheduling method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410519870.3A CN118101441B (en) | 2024-04-28 | 2024-04-28 | Service scheduling method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118101441A (en) | 2024-05-28 |
CN118101441B (en) | 2024-07-23 |
Family
ID=91142611
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410519870.3A Active CN118101441B (en) | 2024-04-28 | 2024-04-28 | Service scheduling method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118101441B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106331098A (en) * | 2016-08-23 | 2017-01-11 | 东方网力科技股份有限公司 | Server cluster system |
CN111756573A (en) * | 2020-05-28 | 2020-10-09 | 浪潮电子信息产业股份有限公司 | CTDB double-network-card fault monitoring method in distributed cluster and related equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11818202B2 (en) * | 2021-03-12 | 2023-11-14 | Ceretax, Inc. | System and method for high availability tax computing |
CN113535474B (en) * | 2021-06-30 | 2022-11-11 | 重庆紫光华山智安科技有限公司 | Method, system, medium and terminal for automatically repairing heterogeneous cloud storage cluster fault |
US20230336407A1 (en) * | 2022-04-15 | 2023-10-19 | Dish Wireless L.L.C. | Automated server restoration construct for cellular networks |
CN116346582A (en) * | 2022-12-29 | 2023-06-27 | 交控科技股份有限公司 | Method, device, equipment and storage medium for realizing redundancy of main network and standby network |
CN117614825A (en) * | 2023-09-22 | 2024-02-27 | 平顶山中选自控系统有限公司 | Cloud primary platform of intelligent coal preparation plant |
Also Published As
Publication number | Publication date |
---|---|
CN118101441A (en) | 2024-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109144789B (en) | Method, device and system for restarting OSD | |
US20110167293A1 (en) | Non-disruptive i/o adapter diagnostic testing | |
WO2022088861A1 (en) | Database fault handling method and apparatus | |
CN111897697A (en) | Server hardware fault repairing method and device | |
CN118101441B (en) | Service scheduling method, device, equipment and storage medium | |
CN112615728B (en) | Simulation system master-slave switching method based on railway safety communication protocol | |
CN114706714A (en) | Method for synchronizing computer memory division snapshots | |
CN107026762B (en) | Disaster recovery system and method based on distributed cluster | |
CN114328033A (en) | Method and device for keeping service configuration consistency of high-availability equipment group | |
JP2009075719A (en) | Redundancy configuration device and self-diagnostic method thereof | |
CN109885420B (en) | PCIe link fault analysis method, BMC and storage medium | |
KR20140140719A (en) | Apparatus and system for synchronizing virtual machine and method for handling fault using the same | |
JP2009040199A (en) | Fault tolerant system for operation management | |
CN114598594B (en) | Method, system, medium and equipment for processing application faults under multiple clusters | |
CN113472891B (en) | SDN controller cluster data processing method, equipment and medium | |
CN115454872A (en) | Database test method, device, equipment and storage medium | |
JPH07183891A (en) | Computer system | |
Rajković et al. | Resource Awareness in Complex Industrial Systems–A Strategy for Software Updates | |
JP2015106226A (en) | Dual system | |
CN116436768B (en) | Automatic backup method, system, equipment and medium based on cross heartbeat monitoring | |
CN117493072A (en) | Program running method, device, equipment and storage medium | |
CN118740660B (en) | Edge computing embedded application dependency system | |
CN118331516B (en) | Data processing method and device | |
RU2818078C1 (en) | System and method for remote control of operation and maintenance for system for collecting information on power consumption | |
CN118689701A (en) | Method, electronic device and computer program product for data processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |