WO2024183493A1 - Communication method and cloud service system - Google Patents

Communication method and cloud service system

Info

Publication number
WO2024183493A1
Authority
WO
WIPO (PCT)
Prior art keywords
maintenance unit
maintenance
resource
resource pool
unit
Application number
PCT/CN2024/073866
Other languages
English (en)
French (fr)
Inventor
王乐晓
秦丹涛
胡堃
Original Assignee
Huawei Cloud Computing Technologies Co., Ltd. (华为云计算技术有限公司)
Application filed by Huawei Cloud Computing Technologies Co., Ltd.
Publication of WO2024183493A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/50 Network services
    • H04L67/51 Discovery or management thereof, e.g. service location protocol [SLP] or web services

Definitions

  • The embodiments of the present application relate to the field of cloud technology, and in particular to a communication method and a cloud service system.
  • A cloud management platform can provide unified management of the private clouds of enterprises and institutions. Operation and maintenance personnel manage these private clouds through the user portal provided by the cloud management platform.
  • The embodiments of the present application provide a communication method and a cloud service system.
  • The method lets each resource pool be managed independently by its own distributed operation and maintenance unit, meeting the needs of different scenarios.
  • An embodiment of the present application provides a cloud service system.
  • The cloud service system includes multiple resource pools, each of which is used to provide cloud services. An operation and maintenance unit is deployed in each resource pool, and the operation and maintenance units exchange data over a communication connection. Each operation and maintenance unit manages one resource pool, namely the resource pool to which it belongs.
  • Specifically, each operation and maintenance unit manages the resource pool to which it belongs by receiving a user instruction and, in response, performing a target operation and maintenance operation on that resource pool.
  • The embodiments of the present application are based on distributed operation and maintenance units, so each operation and maintenance unit can operate independently as a separate cloud management platform.
  • Management of each cloud (i.e., resource pool) by its operation and maintenance unit no longer depends on an overall cloud management platform, which implements a peer-to-peer management and operation mechanism.
  • Each operation and maintenance unit bears only its own management load. This reduces the operation and maintenance pressure of single-point management, improves the overall stability and reliability of the system, and makes system management more flexible.
  • A resource pool may also be referred to as a cloud, cloud resources, cloud facilities, or the like.
  • A resource pool includes, but is not limited to, software resources and hardware resources.
  • Software resources include, but are not limited to, at least one cloud platform; hardware resources include, but are not limited to, infrastructure such as server clusters.
  • The cloud services provided by different resource pools may be the same or different.
  • The communication connection between the operation and maintenance units may use a private communication protocol.
  • Each operation and maintenance unit provides a user interface for receiving user instructions.
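The arrangement above, in which every resource pool has its own operation and maintenance unit that acts only on that pool, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names `ResourcePool` and `OMUnit`, the `add_server`/`remove_server` operations, and modelling the private communication protocol as a direct object reference are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """A resource pool; only a list of server names is modelled here."""
    name: str
    servers: list[str] = field(default_factory=list)


@dataclass
class OMUnit:
    """Hypothetical operation and maintenance unit deployed in exactly one resource pool."""
    name: str
    pool: ResourcePool
    peers: dict[str, "OMUnit"] = field(default_factory=dict, repr=False, compare=False)

    def connect(self, other: "OMUnit") -> None:
        # Peer-to-peer link; the private protocol is stood in for by a direct reference.
        self.peers[other.name] = other
        other.peers[self.name] = self

    def handle_user_instruction(self, op: str, server: str) -> str:
        # A unit acts only on the pool it belongs to; no central platform is consulted.
        if op == "add_server":
            self.pool.servers.append(server)
        elif op == "remove_server":
            self.pool.servers.remove(server)
        else:
            raise ValueError(f"unsupported operation: {op}")
        return f"{self.name}: {op} {server} on {self.pool.name}"


unit_a = OMUnit("om-a", ResourcePool("pool-a"))
unit_b = OMUnit("om-b", ResourcePool("pool-b"))
unit_a.connect(unit_b)
print(unit_a.handle_user_instruction("add_server", "srv-1"))
# om-a: add_server srv-1 on pool-a
```

Because `handle_user_instruction` touches only `self.pool`, an instruction sent to `om-a` leaves `pool-b` unchanged even though the two units are connected.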
  • The multiple operation and maintenance units include a first operation and maintenance unit and at least one second operation and maintenance unit. The first operation and maintenance unit is deployed in a first resource pool, and each second operation and maintenance unit is deployed in a respective second resource pool.
  • The first operation and maintenance unit is configured to receive first operation and maintenance operation instruction information, which instructs it to obtain the resource status evaluation results of the multiple resource pools.
  • Based on this instruction information, the first operation and maintenance unit is further configured to send first task request information to the at least one second operation and maintenance unit, instructing each second unit to report the resource status evaluation result of its second resource pool.
  • In response to the first task request information, the second operation and maintenance unit is configured to obtain the resource status of its second resource pool, derive the resource status evaluation result from that status, and send first task response information containing the result to the first operation and maintenance unit.
  • The first operation and maintenance unit is further configured to receive the first task response information from the at least one second operation and maintenance unit and obtain from it the resource status evaluation result of each second resource pool.
  • With distributed operation and maintenance units, the master node only needs to issue tasks, and each unit carries out its own task. This reduces the computing burden on the master node, and because only results are transmitted, communication overhead is also reduced.
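The task dispatch described above amounts to a fan-out of task requests followed by aggregation of the task responses. The sketch below is hedged: it replaces the network with in-process calls, uses a hypothetical CPU-utilisation threshold of 0.8 as the evaluation rule (the text defines none), and assumes the management node also evaluates its own pool, as the third unit does after re-designation. All names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class OMUnit:
    """Hypothetical operation and maintenance unit that evaluates only its own pool."""
    name: str
    cpu_used: float   # illustrative resource status: cores in use
    cpu_total: float  # illustrative resource status: cores in the pool
    peers: list["OMUnit"] = field(default_factory=list, repr=False)

    def evaluate_own_pool(self) -> str:
        # Assumed evaluation rule; the text does not define one.
        utilisation = self.cpu_used / self.cpu_total
        return "overloaded" if utilisation > 0.8 else "healthy"

    def on_task_request(self, task: str) -> dict[str, str]:
        # Second unit: compute locally, reply with the result only (task response).
        if task != "evaluate_resource_status":
            raise ValueError(f"unknown task: {task}")
        return {"unit": self.name, "result": self.evaluate_own_pool()}

    def on_om_instruction(self, instruction: str) -> dict[str, str]:
        # First unit: evaluate its own pool, fan the task out, and collect the replies.
        results = {self.name: self.evaluate_own_pool()}
        for peer in self.peers:
            response = peer.on_task_request(instruction)
            results[response["unit"]] = response["result"]
        return results


hq = OMUnit("om-1", cpu_used=40, cpu_total=100)
hq.peers = [OMUnit("om-2", 90, 100), OMUnit("om-3", 10, 100)]
print(hq.on_om_instruction("evaluate_resource_status"))
# {'om-1': 'healthy', 'om-2': 'overloaded', 'om-3': 'healthy'}
```

Only the short result string crosses the unit boundary, which mirrors the point that the interaction transmits results rather than raw resource status.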
  • Before receiving the first operation and maintenance operation instruction information, the first operation and maintenance unit is further configured to receive first user instruction information designating it as the management node and, in response, send first docking request information to each second operation and maintenance unit. The first docking request information indicates that the first operation and maintenance unit is the management node of the cloud service system.
  • In response to the first docking request information, the second operation and maintenance unit is further configured to determine that the first unit is the management node and that it itself is a managed node, and to send first docking response information containing its own operation and maintenance information to the first operation and maintenance unit.
  • The present application uses a distributed structure in which the user specifies the master node (i.e., the management node, also called the headquarters). The master node dispatches the tasks indicated by the user to the operation and maintenance units, each of which performs its task on its own and reports the result to the master node. Users can freely choose the master node to suit the scenario, so the cloud service system better fits their needs.
  • If the first operation and maintenance unit becomes abnormal, a third operation and maintenance unit among the at least one second operation and maintenance unit is configured to receive second user indication information designating it as the management node and, in response, send second docking request information to each of the other second operation and maintenance units, indicating that the third unit is the management node of the cloud service system.
  • In response to the second docking request information, each second operation and maintenance unit determines that the third unit is the management node and that it itself is a managed node, and sends second docking response information containing its own operation and maintenance information to the third operation and maintenance unit.
  • The system can designate the headquarters according to user needs, and if the headquarters becomes abnormal, the user can designate a new one.
  • The new headquarters continues to perform the headquarters functions. Because all operation and maintenance units are peers with the same capabilities, any of them can take over the headquarters role, which improves the fault tolerance and flexibility of the system.
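The designation and re-designation flow can be sketched as below. It is an illustration under assumptions: the docking request/response exchange is modelled as a method call, and a unit's "operation and maintenance information" is reduced to a hypothetical `version` field; neither reflects a documented message format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OMUnit:
    """Hypothetical peer unit; any unit can be designated as the management node."""
    name: str
    version: str = "1.0"            # stand-in for the unit's operation and maintenance information
    manager: Optional[str] = None    # name of the current management node
    managed: dict[str, dict] = field(default_factory=dict)

    def on_docking_request(self, manager_name: str) -> dict[str, str]:
        # Accept the sender as management node; this unit becomes a managed node.
        self.manager = manager_name
        return {"unit": self.name, "version": self.version}

    def become_manager(self, others: list["OMUnit"]) -> None:
        # Called when a user instruction designates this unit as the management node.
        self.manager = self.name
        self.managed = {}
        for other in others:
            info = other.on_docking_request(self.name)  # docking request -> docking response
            self.managed[info["unit"]] = info


units = [OMUnit(f"om-{i}") for i in range(1, 5)]
first, *seconds = units
first.become_manager(seconds)              # user designates om-1

# om-1 becomes abnormal; the user designates om-2 (the "third unit") instead.
third = seconds[0]
third.become_manager([u for u in seconds if u is not third])
print(third.manager, sorted(third.managed))  # om-2 ['om-3', 'om-4']
```

Since every `OMUnit` carries the same `become_manager` method, any unit can inherit the management role, which is the peer property the text relies on for fault tolerance.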
  • The third operation and maintenance unit is further configured to receive second operation and maintenance operation instruction information, which instructs it to obtain the resource status evaluation results of the multiple resource pools.
  • Based on this instruction information, the third operation and maintenance unit sends second task request information to the at least one second operation and maintenance unit, instructing each of them to report the resource status evaluation result of its second resource pool.
  • Based on the same instruction information, the third operation and maintenance unit also obtains the resource status of the third resource pool, in which it is deployed, and derives the resource status evaluation result of the third resource pool from that status.
  • In response to the second task request information, the second operation and maintenance unit obtains the resource status of its second resource pool, derives the resource status evaluation result from that status, and sends second task response information containing the result to the third operation and maintenance unit.
  • The third operation and maintenance unit is further configured to receive the second task response information from the at least one second operation and maintenance unit and obtain from it the resource status evaluation result of each second resource pool.
  • If a second operation and maintenance unit has a communication anomaly, its second resource pool is still managed through that second operation and maintenance unit.
  • Each operation and maintenance unit operates independently and manages its own cloud resources. If communication between a unit and the main operation and maintenance unit fails, operation and maintenance personnel can still access the affected unit and manage its cloud resources, so local access and local data management are not affected.
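The behaviour under a communication anomaly can be sketched as follows: local instructions go straight to the unit, while the management node's task request to the isolated unit fails. The text does not say how the management node reports an unreachable unit, so recording it as `None` is an assumption, as are the `link_up` flag, the VM list, and the function name `collect`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OMUnit:
    """Hypothetical unit that keeps serving local operators when its link is down."""
    name: str
    link_up: bool = True   # simulated state of the link to the management node
    vms: list[str] = field(default_factory=list)

    def handle_local_instruction(self, vm: str) -> None:
        # Local management never goes through the management node, so link_up is irrelevant.
        self.vms.append(vm)

    def on_task_request(self) -> dict:
        # Simulates the network: a real system would fail in the transport layer instead.
        if not self.link_up:
            raise ConnectionError(f"{self.name} is unreachable")
        return {"unit": self.name, "vm_count": len(self.vms)}


def collect(units: list[OMUnit]) -> dict[str, Optional[int]]:
    # Management-node side; recording an unreachable unit as None is an assumption.
    results: dict[str, Optional[int]] = {}
    for unit in units:
        try:
            results[unit.name] = unit.on_task_request()["vm_count"]
        except ConnectionError:
            results[unit.name] = None
    return results


healthy, isolated = OMUnit("om-2"), OMUnit("om-3", link_up=False)
isolated.handle_local_instruction("vm-1")   # local staff still manage om-3's pool
print(collect([healthy, isolated]))          # {'om-2': 0, 'om-3': None}
```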
  • The multiple operation and maintenance units include a third operation and maintenance unit and a fourth operation and maintenance unit.
  • The third operation and maintenance unit is configured to receive a first unit update instruction and, based on it, update a specified operation and maintenance capability of the third operation and maintenance unit.
  • The fourth operation and maintenance unit is configured to receive a second unit update instruction and, based on it, update a specified operation and maintenance capability of the fourth operation and maintenance unit.
  • The operation and maintenance capabilities to be updated, as indicated by the first and second unit update instructions, may be the same or different.
  • The operation and maintenance units are independent of one another, so each unit can evolve and be iterated on separately to meet users' customization needs, which increases the flexibility and applicability of each unit. Moreover, because data is isolated between units, an update to one unit takes effect only on that unit and does not affect the others.
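Per-unit capability updates can be sketched as below, assuming each unit keeps its capabilities in its own dictionary that maps a capability name to a version. The capability names and version strings are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OMUnit:
    """Hypothetical unit holding its own, isolated table of capabilities."""
    name: str
    capabilities: dict[str, str] = field(default_factory=dict)  # capability -> version

    def apply_update(self, instruction: dict[str, str]) -> None:
        # A unit update instruction changes only this unit's own table.
        self.capabilities.update(instruction)


baseline = {"monitoring": "1.0", "backup": "1.0"}
# dict(...) copies the baseline so the two units never share state.
third = OMUnit("om-3", dict(baseline))
fourth = OMUnit("om-4", dict(baseline))
third.apply_update({"monitoring": "2.0"})                      # first unit update instruction
fourth.apply_update({"backup": "1.1", "auto_scaling": "1.0"})  # second unit update instruction
print(third.capabilities)   # {'monitoring': '2.0', 'backup': '1.0'}
print(fourth.capabilities)  # {'monitoring': '1.0', 'backup': '1.1', 'auto_scaling': '1.0'}
```

Copying the baseline is the design point here: if both units referenced the same dictionary, an update to one would silently leak into the other, which is exactly the coupling the text says isolated unit data avoids.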
  • The cloud service system further includes server clusters.
  • Each of the multiple resource pools includes at least one server cluster, and the servers in a server cluster may be of the same or different models.
  • The distributed operation and maintenance units in the embodiments of the present application can be deployed on existing server clusters, providing a general-purpose operation and maintenance management system.
  • An embodiment of the present application provides a communication method.
  • The method is applied to a cloud service system that includes multiple resource pools, each used to provide cloud services. An operation and maintenance unit is deployed in each resource pool, the operation and maintenance units exchange data over communication connections, and each unit manages the resource pool to which it belongs. The method includes: an operation and maintenance unit receives a user instruction and, in response, performs a target operation and maintenance operation on the resource pool to which it belongs.
  • The multiple operation and maintenance units include a first operation and maintenance unit deployed in a first resource pool and at least one second operation and maintenance unit, each deployed in a respective second resource pool. The method further includes: the first operation and maintenance unit receives first operation and maintenance operation instruction information, which instructs it to obtain the resource status evaluation results of the multiple resource pools; based on this instruction information, the first unit sends first task request information to the at least one second operation and maintenance unit, instructing each second unit to report the resource status evaluation result of its second resource pool.
  • In response to the first task request information, the second operation and maintenance unit obtains the resource status of its second resource pool, derives the resource status evaluation result from that status, and sends first task response information containing the result to the first operation and maintenance unit.
  • The first operation and maintenance unit receives the first task response information from the at least one second operation and maintenance unit and obtains from it the resource status evaluation result of each second resource pool.
  • Before the first operation and maintenance unit receives the first operation and maintenance operation instruction information, the method further includes: the first unit receives first user instruction information designating it as the management node; in response, it sends first docking request information to each second operation and maintenance unit, indicating that the first unit is the management node of the cloud service system; in response to the first docking request information, each second unit determines that the first unit is the management node and that it itself is a managed node, and sends first docking response information containing its own operation and maintenance information to the first unit.
  • The present application uses a distributed structure in which the user specifies the master node (i.e., the management node, also called the headquarters). The master node assigns the tasks indicated by the user to the operation and maintenance units, each of which performs its task on its own and reports the result to the master node.
  • The user can freely choose the master node to suit the scenario, so the cloud service system better fits the user's needs.
  • The system can designate the headquarters according to user needs, and if the headquarters becomes abnormal, the user can designate a new one.
  • The new headquarters continues to perform the headquarters functions. In other words, because all operation and maintenance units are peers with the same capabilities, any of them can take over the headquarters role, which improves the fault tolerance and flexibility of the system.
  • The method further includes: if the first operation and maintenance unit becomes abnormal, a third operation and maintenance unit among the at least one second operation and maintenance unit receives second user indication information designating it as the management node; in response, the third unit sends second docking request information to each of the other second units, indicating that the third unit is the management node of the cloud service system; in response to the second docking request information, each second unit determines that the third unit is the management node and that it itself is a managed node, and sends second docking response information containing its own operation and maintenance information to the third unit.
  • The third operation and maintenance unit receives second operation and maintenance operation instruction information, which instructs it to obtain the resource status evaluation results of the multiple resource pools; based on this instruction information, the third unit sends second task request information to the at least one second operation and maintenance unit, instructing each to report the resource status evaluation result of its second resource pool; and, based on the same instruction information, the third unit obtains the resource status of the third resource pool, in which it is deployed.
  • In response to the second task request information, the second operation and maintenance unit obtains the resource status of its second resource pool, derives the resource status evaluation result from that status, and sends second task response information containing the result to the third unit; the third unit receives the second task response information from the at least one second unit and obtains from it the resource status evaluation result of each second resource pool.
  • The multiple operation and maintenance units include a third operation and maintenance unit and a fourth operation and maintenance unit.
  • The method further includes: the third unit receives a first unit update instruction and, based on it, updates a specified operation and maintenance capability of the third unit; the fourth unit receives a second unit update instruction and, based on it, updates a specified operation and maintenance capability of the fourth unit. The capabilities to be updated, as indicated by the first and second unit update instructions, may be the same or different.
  • An embodiment of the present application provides a communication method applied to a cloud service system. The cloud service system includes multiple resource pools, each used to provide cloud services; an operation and maintenance unit is deployed in each resource pool; the operation and maintenance units exchange data over a communication connection; and each unit manages the resource pool to which it belongs. The multiple operation and maintenance units include a first operation and maintenance unit deployed in a first resource pool and at least one second operation and maintenance unit, each deployed in a respective second resource pool. The method includes: the first operation and maintenance unit receives first operation and maintenance operation instruction information, which instructs it to obtain the resource status evaluation results of the multiple resource pools; the first …
  • the method before the first operation and maintenance unit receives the first operation and maintenance operation indication information, the method also includes: the first operation and maintenance unit receives the first user indication information, the first user indication information is used to indicate the first operation and maintenance unit as a management node; in response to the first user indication information, first docking request information is sent to each second operation and maintenance unit, the first docking request information is used to indicate the first operation and maintenance unit as a management node of the cloud service system; the second operation and maintenance unit determines that the first operation and maintenance unit is a management node and the second operation and maintenance unit is a managed node in response to the received first docking request information; and first docking response information is sent to the first operation and maintenance unit, wherein the first docking response information includes the operation and maintenance information of the second operation and maintenance unit.
  • the method further includes: if an abnormality occurs in the first operation and maintenance unit, a third operation and maintenance unit in at least one second operation and maintenance unit receives second user indication information, where the second user indication information is used to indicate the third operation and maintenance unit as a management node; the third operation and maintenance unit The unit responds to the second user indication information and sends second docking request information to each second operation and maintenance unit, where the second docking request information is used to indicate the third operation and maintenance unit as the management node of the cloud service system; the second operation and maintenance unit responds to the received second docking request information and determines that the third operation and maintenance unit is the management node and the second operation and maintenance unit is the managed node; the second operation and maintenance unit sends second docking response information to the third operation and maintenance unit, where the second docking response information includes the operation and maintenance information of the second operation and maintenance unit.
  • the third operation and maintenance unit receives second operation and maintenance operation indication information, which instructs it to obtain the resource status evaluation results of the multiple resource pools; based on this indication information, the third operation and maintenance unit sends second task request information to the at least one second operation and maintenance unit, where the second task request information instructs each second operation and maintenance unit to feed back the resource status evaluation result of its corresponding second resource pool; the third operation and maintenance unit, which is deployed in the third resource pool, also obtains the resource status of the third resource pool based on the indication information and derives the resource status evaluation result of the third resource pool from that status; in response to the received second task request information, each second operation and maintenance unit obtains the resource status of its second resource pool, derives the resource status evaluation result of the second resource pool from that status, and sends second task response information to the third operation and maintenance unit, wherein the second task response information includes the resource
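The task fan-out and aggregation can be sketched as follows. The application does not specify how a resource status evaluation result is derived, so `evaluate` below uses an invented threshold rule on utilization ratios purely for illustration; `collect`, `handle_task_request`, and the sample figures are likewise hypothetical.

```python
from typing import Dict, List


def evaluate(status: Dict[str, float]) -> str:
    # Hypothetical evaluation rule: classify by the highest utilization ratio.
    peak = max(status.values())
    if peak >= 0.9:
        return "overloaded"
    if peak >= 0.7:
        return "busy"
    return "normal"


class OmUnit:
    def __init__(self, unit_id: str, pool_status: Dict[str, float]):
        self.unit_id = unit_id
        self.pool_status = pool_status  # stand-in for querying the local pool

    def handle_task_request(self) -> Dict[str, str]:
        # Each unit reads and evaluates only the pool it belongs to.
        return {"unit": self.unit_id, "result": evaluate(self.pool_status)}


def collect(manager: OmUnit, managed: List[OmUnit]) -> Dict[str, str]:
    # The management node evaluates its own pool as well as gathering results.
    results = {manager.unit_id: evaluate(manager.pool_status)}
    for unit in managed:
        resp = unit.handle_task_request()
        results[resp["unit"]] = resp["result"]
    return results


manager = OmUnit("om-221", {"cpu": 0.55, "memory": 0.40})
managed = [OmUnit("om-231", {"cpu": 0.95, "memory": 0.60}),
           OmUnit("om-241", {"cpu": 0.72, "memory": 0.30})]
print(collect(manager, managed))
```

Only evaluation results travel back to the management node, in line with the description that the task response carries the evaluation result rather than the raw resource status.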
  • the second resource pool is managed by the second operation and maintenance unit that has the communication abnormality.
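That a unit with a communication abnormality still manages its own pool can be illustrated with a small sketch. The names (`local_query`, `report_to_manager`, `link_to_manager_up`) are hypothetical; the sketch only shows that local operations do not depend on the link to the management node, while reporting upward does.

```python
from typing import Dict


class OmUnit:
    """Managed O&M unit; `pool` stands in for its local resource pool."""

    def __init__(self, unit_id: str, pool: Dict[str, float]):
        self.unit_id = unit_id
        self.pool = pool
        self.link_to_manager_up = True

    def local_query(self, key: str) -> float:
        # Local O&M goes straight to the unit's own pool, never through the
        # management node, so it keeps working when the link is down.
        return self.pool[key]

    def report_to_manager(self, result: Dict[str, str]) -> bool:
        # Only the upward report depends on the link to the management node.
        if not self.link_to_manager_up:
            return False
        # A real unit would send `result` over its communication connection here.
        return True


unit = OmUnit("om-231", {"cpu": 0.4})
unit.link_to_manager_up = False          # simulate the communication abnormality
print(unit.local_query("cpu"))           # local management still works
print(unit.report_to_manager({"cpu": "normal"}))
```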
  • the multiple operation and maintenance units include a third operation and maintenance unit and a fourth operation and maintenance unit.
  • the method also includes: the third operation and maintenance unit receives the first unit update instruction; the third operation and maintenance unit updates the designated operation and maintenance capability of the third operation and maintenance unit based on the first unit update instruction; the fourth operation and maintenance unit receives the second unit update instruction; the fourth operation and maintenance unit updates the designated operation and maintenance capability of the fourth operation and maintenance unit based on the second unit update instruction; wherein the operation and maintenance capability to be updated indicated by the first unit update instruction and the second unit update instruction are the same or different.
  • an embodiment of the present application provides a communication method.
  • the method is applied to a cloud service system that includes multiple resource pools, each of which is used to provide cloud services; an operation and maintenance unit is deployed in each of the multiple resource pools, and the multiple operation and maintenance units exchange data over communication connections; each operation and maintenance unit manages one resource pool, namely the resource pool to which it belongs; the multiple operation and maintenance units include a first operation and maintenance unit and at least one second operation and maintenance unit; the first operation and maintenance unit is deployed in the first resource pool, and the at least one second operation and maintenance unit is deployed in the at least one second resource pool, respectively.
  • the method comprises: the first operation and maintenance unit receives first user indication information, and the first user indication information is used to indicate that the first operation and maintenance unit serves as a management node; the management node is used in the cloud service system to issue a resource analysis task to at least one second operation and maintenance unit, so that the second operation and maintenance unit that receives the resource analysis task performs a resource analysis and operation and maintenance operation, and feeds back a resource analysis result to the first operation and maintenance unit; if an abnormality occurs in the first operation and maintenance unit, a third operation and maintenance unit among at least one second operation and maintenance unit receives second user indication information, and the second user indication information is used to indicate that the third operation and maintenance unit serves as a management node.
  • the method also includes: the first operation and maintenance unit sends first docking request information to each second operation and maintenance unit in response to the first user indication information, the first docking request information is used to indicate the first operation and maintenance unit as the management node of the cloud service system; the second operation and maintenance unit determines that the first operation and maintenance unit is the management node and the second operation and maintenance unit is the managed node in response to the received first docking request information; and sends first docking response information to the first operation and maintenance unit, wherein the first docking response information includes the operation and maintenance information of the second operation and maintenance unit.
  • the method also includes: the first operation and maintenance unit receives first operation and maintenance operation indication information, which indicates that the resource status evaluation results of the multiple resource pools are to be obtained; based on this indication information, the first operation and maintenance unit sends first task request information to the at least one second operation and maintenance unit, where the first task request information instructs each second operation and maintenance unit to feed back the resource status evaluation result of its corresponding second resource pool; in response to the received first task request information, each second operation and maintenance unit obtains the resource status of its second resource pool, derives the resource status evaluation result of the second resource pool from that status, and sends first task response information to the first operation and maintenance unit, where the first task response information includes the resource status evaluation result of the second resource pool; the first operation and maintenance unit receives the first task response information fed back by the at least one second operation and maintenance unit; based on the first task response information,
  • the method also includes: the third operation and maintenance unit sends second docking request information to each second operation and maintenance unit in response to the second user indication information, and the second docking request information is used to indicate that the third operation and maintenance unit serves as the management node of the cloud service system; the second operation and maintenance unit determines that the third operation and maintenance unit is the management node and the second operation and maintenance unit is the managed node in response to the received second docking request information; the second operation and maintenance unit sends second docking response information to the third operation and maintenance unit, and the second docking response information includes the operation and maintenance information of the second operation and maintenance unit.
  • the method further includes: the third operation and maintenance unit receives second operation and maintenance operation indication information, which indicates that the resource status evaluation results of the multiple resource pools are to be obtained; based on this indication information, the third operation and maintenance unit sends second task request information to the at least one second operation and maintenance unit, where the second task request information instructs each second operation and maintenance unit to feed back the resource status evaluation result of its corresponding second resource pool; the third operation and maintenance unit, which is deployed in the third resource pool, obtains the resource status of the third resource pool based on the indication information and derives the resource status evaluation result of the third resource pool from that status; in response to the received second task request information, each second operation and maintenance unit obtains the resource status of its second resource pool, derives the resource status evaluation result of the second resource pool from that status, and sends
  • the second resource pool is managed by the second operation and maintenance unit that has the communication abnormality.
  • an embodiment of the present application provides a communication method.
  • the method is applied to a cloud service system that includes multiple resource pools, each of which is used to provide cloud services; an operation and maintenance unit is deployed in each of the multiple resource pools, and the multiple operation and maintenance units exchange data over communication connections; each operation and maintenance unit manages one resource pool, namely the resource pool to which it belongs, and all of the multiple operation and maintenance units have the same operation and maintenance capabilities; the method includes: any one of the multiple operation and maintenance units receives a unit update instruction, and that operation and maintenance unit updates its specified operation and maintenance capability based on the unit update instruction.
  • the unit update instruction is used to instruct deletion and/or addition of a specified operation and maintenance capability.
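A unit update instruction that deletes and/or adds specified capabilities might be represented as follows. This is an assumed representation: the instruction fields, the capability names (loosely based on the query, analysis, and file-management capabilities mentioned in this document), and the rule of applying deletions before additions are illustrative choices, not details from the application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Capability = Callable[[Dict[str, float]], object]


@dataclass
class UnitUpdateInstruction:
    delete: List[str] = field(default_factory=list)           # capabilities to remove
    add: Dict[str, Capability] = field(default_factory=dict)  # capabilities to install


class OmUnit:
    def __init__(self, unit_id: str, capabilities: Dict[str, Capability]):
        self.unit_id = unit_id
        # Copy so that updating one unit never alters the shared baseline.
        self.capabilities = dict(capabilities)

    def apply_update(self, instr: UnitUpdateInstruction) -> None:
        # Deletions first, then additions, so one instruction can replace
        # a capability by deleting and re-adding it under the same name.
        for name in instr.delete:
            self.capabilities.pop(name, None)
        self.capabilities.update(instr.add)


# Identical baseline capabilities for every unit.
BASELINE: Dict[str, Capability] = {
    "query_status": lambda pool: dict(pool),
    "analyze_status": lambda pool: max(pool.values()),
}

third = OmUnit("om-221", BASELINE)
fourth = OmUnit("om-231", BASELINE)
third.apply_update(UnitUpdateInstruction(delete=["analyze_status"]))
fourth.apply_update(UnitUpdateInstruction(add={"file_management": lambda pool: []}))
```

Because each unit copies the baseline, the third and fourth units end up with different capability sets, which corresponds to the case where the first and second unit update instructions differ.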
  • an embodiment of the present application provides a computer program product comprising instructions, which, when executed by a computer device cluster, enables the computer device cluster to execute the method provided in the second aspect or any possible design of the second aspect.
  • an embodiment of the present application provides a computer program product comprising instructions, which, when executed by a computer device cluster, enables the computer device cluster to execute the method provided in the third aspect or any possible design of the third aspect.
  • an embodiment of the present application provides a computer program product comprising instructions, which, when executed by a computer device cluster, enables the computer device cluster to execute the method provided in the fourth aspect or any possible design of the fourth aspect.
  • an embodiment of the present application provides a computer program product comprising instructions, which, when executed by a computer device cluster, enables the computer device cluster to execute the method provided in the fifth aspect or any possible design of the fifth aspect.
  • an embodiment of the present application provides a computer-readable storage medium including computer program instructions; when the computer program instructions are executed by a computing device cluster, the computing device cluster executes the method provided by the second aspect or any possible design of the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium including computer program instructions; when the computer program instructions are executed by a computing device cluster, the computing device cluster executes the method provided by the third aspect or any possible design of the third aspect.
  • an embodiment of the present application provides a computer-readable storage medium including computer program instructions; when the computer program instructions are executed by a computing device cluster, the computing device cluster executes the method provided by the fourth aspect or any possible design of the fourth aspect.
  • an embodiment of the present application provides a computer-readable storage medium including computer program instructions; when the computer program instructions are executed by a computing device cluster, the computing device cluster executes the method provided by the fifth aspect or any possible design of the fifth aspect.
  • an embodiment of the present application provides a computing device cluster, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is used to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method provided by any possible design in the second to fifth aspects.
  • FIG. 1 is a schematic diagram of a scenario of a cloud service system shown as an example.
  • FIG. 2 is a schematic diagram showing the structure of a cloud service system in an embodiment of the present application.
  • FIG. 3 is a schematic diagram showing an exemplary structure of a cloud
  • FIG. 4 is a schematic diagram showing the structure of an operation and maintenance unit.
  • FIG. 5 is a schematic diagram of an exemplary cloud service system initialization process.
  • FIG. 6 is a schematic diagram showing an exemplary operation and maintenance unit interaction process.
  • FIG. 7 is a schematic diagram showing an exemplary operation and maintenance unit interaction process.
  • FIG. 8 is a schematic diagram showing an exemplary scenario of an abnormal operation and maintenance unit of a headquarters.
  • FIG. 9 is a schematic diagram showing an exemplary scenario of an abnormal operation and maintenance unit of a headquarters.
  • FIG. 10 is a schematic diagram showing an exemplary scenario of abnormal communication between the operation and maintenance unit
  • FIG. 11 is a schematic diagram showing the structure of an exemplary device.
  • A cloud platform, also known as a cloud system, cloud environment, or cloud, is a software system provided by a cloud provider for cloud technology (also known as cloud computing technology) services; it provides cloud-service interfaces through which tenants access cloud services remotely.
  • Tenants can log in to the cloud platform on the cloud service access page using their pre-registered account and password, and after successful login, select and purchase the corresponding cloud services on the cloud service access page, such as object storage services, virtual machine services, container services, etc.
  • Public cloud is a cloud platform provided by third-party public cloud providers for individuals or enterprises.
  • the hardware, software and other structures are owned and managed by the third-party public cloud providers.
  • a private cloud is a dedicated cloud platform provided for an enterprise or organization.
  • a private cloud can be operated internally by the corresponding enterprise or organization.
  • Private clouds are mainly for enterprise users, also known as enterprise clouds.
  • Hybrid cloud refers to a cloud platform formed by different cloud platforms.
  • a hybrid cloud includes at least two cloud platforms, also known as a multi-cloud platform or multi-cloud.
  • a hybrid cloud combines public cloud and private cloud.
  • some enterprise users prefer to store data in private clouds, but at the same time hope to obtain computing resources from public clouds.
  • hybrid clouds including public clouds and private clouds are increasingly being adopted.
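  • Hybrid clouds combine public clouds and private clouds so that users can benefit from both, for example by keeping data in a private cloud while obtaining computing resources from a public cloud.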
  • the cloud involved in the embodiments of the present application may be a public cloud, a private cloud and/or a hybrid cloud, and this application does not limit this.
  • the cloud may also be referred to as a resource pool or a cloud facility, etc., which is not limited in the present application.
  • A cloud management platform is a product for managing public clouds, private clouds, and hybrid clouds.
  • the product form can be software, providing enterprises with services for managing cross-cloud infrastructure.
  • A cloud management platform can provide a unified entrance (also called an operation and maintenance entrance); operation and maintenance personnel access the cloud management platform through this entrance and use it to manage the cloud resources (including hardware and software) of the corresponding cloud.
  • Services include computing services, storage services, or network services. Any device or function that a user device can access on a cloud platform can be considered a service provided by the cloud platform.
  • Resources are hardware or software resources used to provide services.
  • resources include computing resources, storage resources or network resources.
  • computing resources include central processing unit (CPU) resources, memory resources and/or hard disk resources.
  • FIG. 1 is a schematic diagram of a cloud service system.
  • the cloud service system includes but is not limited to: a cloud management platform 100 and multiple clouds.
  • in this example, each cloud includes only one cloud platform.
  • the multiple clouds include but are not limited to: cloud 110 located in Shanghai, cloud 120 located in Hunan, cloud 130 located in Henan, cloud 140 located in Zhengzhou, and cloud 150 located in Luoyang.
  • the cloud can be understood as a cloud platform, or a cloud includes multiple cloud platforms, and the cloud platform includes cloud service resources and basic resources (i.e. hardware resources).
  • the management of the cloud by the cloud management platform is the management of the cloud platform.
  • the following description takes as an example the cloud management platform managing the cloud resources (including the cloud platform and hardware) of a cloud.
  • the cloud management platform corresponds to the cloud platform and manages multiple cloud platforms.
  • the server cluster includes one or more devices, including but not limited to: network devices, security devices, computing devices, etc., which are not limited in this application.
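  • the hardware resources of a cloud (i.e., the cloud infrastructure) are located in the corresponding region.
  • Taking Shanghai as an example, the hardware resources of cloud 110 are located, for example, in a data center in Shanghai.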
  • the regional division method is used as an example to illustrate the independence between the clouds.
  • the number of clouds and the division method (e.g., regional division) involved in the embodiments of the present application are only illustrative examples, and the present application does not limit them.
  • each cloud may be the same or different, and this application does not limit this.
  • the cloud management platform 100 is deployed in cloud 110. That is, cloud 110 serves as the main cloud (also known as the headquarters cloud), and the other clouds are subordinate clouds of cloud 110 (also known as sub-clouds or branches).
  • the cloud management platform is used to uniformly manage the cloud facilities of all branches in the customer's cloud service system, and to support the operation and maintenance personnel of each branch to perform daily operation and maintenance affairs on the local cloud facilities.
  • each branch in the cloud service system will have its own unique local cloud facilities (including hardware resources and software resources, such as server clusters and cloud platforms).
  • while operating and maintaining its own local cloud facilities, the headquarters (whose local cloud is usually peer-to-peer with the local clouds deployed by the branches) usually needs a global view of the operating profile of each branch's cloud facilities, but the specific daily operation and maintenance operations are the responsibility of each branch.
  • all operation and maintenance personnel need to access the cloud management platform 100 to perform operation and maintenance operations on the corresponding cloud.
  • the cloud management platform provides permission division and domain division functions. Specifically:
  • Permission division: different operation and maintenance personnel of a customer perform daily operation and maintenance affairs on the cloud management platform. To ensure that each operator has only the minimum operation authority, the cloud management platform provides permission division capabilities, which grant different operation and maintenance permissions to different operators.
  • Domain division: different operation and maintenance personnel of a customer perform daily operation and maintenance affairs on the cloud management platform. To ensure that each operator can operate and maintain only the objects within his or her own scope of responsibility, the cloud management platform provides domain division capabilities, which grant different operators different ranges of operation and maintenance objects.
  • the permission division and domain division functions of the cloud management platform ensure that, after entering the cloud management platform, the operation and maintenance personnel of each branch see only the data of clouds within their own permissions and domains, and not the data of other branches' clouds.
  • in the existing technology, the cloud management platform uses permission division and domain division so that the operation and maintenance personnel of each branch can carry out daily operation and maintenance affairs on the cloud management platform in a unified manner.
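Permission division and domain division amount to two independent checks on every operation. A minimal sketch, assuming a hypothetical `Operator` record that lists allowed operations and in-scope clouds; `authorize`, `visible_clouds`, and the operation and cloud names are illustrative only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Operator:
    name: str
    permissions: FrozenSet[str]  # permission division: operations allowed
    domain: FrozenSet[str]       # domain division: clouds in scope


def authorize(op: Operator, operation: str, cloud: str) -> bool:
    # Both checks must pass: the operation is permitted AND the cloud is in scope.
    return operation in op.permissions and cloud in op.domain


def visible_clouds(op: Operator, clouds: Iterable[str]) -> List[str]:
    # Operators see data only for clouds inside their own domain.
    return [c for c in clouds if c in op.domain]


zhengzhou_ops = Operator(
    "zhengzhou-ops",
    permissions=frozenset({"query_status", "analyze_status"}),
    domain=frozenset({"cloud-140"}),
)
print(authorize(zhengzhou_ops, "query_status", "cloud-140"))
print(authorize(zhengzhou_ops, "query_status", "cloud-150"))
```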
  • the operation and maintenance personnel in Zhengzhou can use an electronic device (such as a mobile phone, tablet, wearable device, or computer) to establish a communication connection with a network device (such as a gateway, which may be located in Zhengzhou or Shanghai; this application does not limit this).
  • the network device is connected to the cloud 110.
  • the Zhengzhou operation and maintenance personnel send user instructions (also known as operation and maintenance instructions, or operation and maintenance operation instructions, etc., which are not limited in this application) to the cloud management platform on the cloud 110 through the network device.
  • the instruction is used to obtain the resource analysis results of the cloud platform in Zhengzhou.
  • because of permission division and domain division, the instructions issued by the Zhengzhou operation and maintenance personnel can target only Zhengzhou cloud 140.
  • the cloud management platform sends a resource acquisition instruction to cloud 140 (i.e., a cloud platform located in Zhengzhou) to obtain resource data of the Zhengzhou cloud platform.
  • the resource data may be hardware resource data and/or software resource data, which is not limited in this application.
  • the communication between cloud 140 and cloud 110 may be direct, i.e., cloud 110 sends the instruction to cloud 140, or indirect, for example, cloud 110 sends the instruction to cloud 130, which then forwards it to cloud 140. This application does not limit this.
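Direct and indirect (forwarded) delivery can be modeled with a per-cloud next-hop table. This is an illustrative sketch; `deliver`, the next-hop table format, and the hop limit are assumptions, not details from the application.

```python
from typing import Dict, List


def deliver(next_hop: Dict[str, Dict[str, str]], src: str, dst: str,
            max_hops: int = 8) -> List[str]:
    """Return the path an instruction takes from src to dst.

    next_hop[node][dst] names the neighbour that node forwards to; a missing
    entry means node can reach dst directly.
    """
    path = [src]
    node = src
    while node != dst:
        if len(path) > max_hops:
            raise RuntimeError("forwarding loop or unreachable destination")
        node = next_hop.get(node, {}).get(dst, dst)
        path.append(node)
    return path


direct: Dict[str, Dict[str, str]] = {}                      # cloud 110 -> cloud 140
via_henan = {"cloud-110": {"cloud-140": "cloud-130"}}       # cloud 110 -> 130 -> 140
print(deliver(direct, "cloud-110", "cloud-140"))
print(deliver(via_henan, "cloud-110", "cloud-140"))
```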
  • cloud 140 feeds back the resource status of cloud 140 to cloud 110.
  • cloud 140 may transmit the hardware operating status of each server in the Zhengzhou server cluster to cloud 110 via a communication connection.
  • the cloud management platform on cloud 110 performs resource analysis in response to the received resource status sent by cloud 140 to obtain a resource analysis result.
  • the cloud management platform may display the resource analysis result on the display interface.
  • the processing of other cloud platforms is similar to that of cloud 140, and no further description is given here.
  • each branch (which can also be understood as a branch cloud) needs to access the cloud management platform through the communication connection between the local site and the cloud management platform, and all instructions must be sent to the cloud management platform over that connection.
  • if the communication connection between a branch's cloud platform and the main cloud is disconnected, the electronic devices of the branch's operation and maintenance personnel cannot access the cloud management platform.
  • as a result, the branch's operation and maintenance personnel cannot manage the local cloud platform.
  • the branch's management of its local cloud platform depends on the communication connection with the cloud management platform; once that connection is abnormal, the cloud platform cannot be managed normally.
  • the cloud management platform located at the headquarters needs to provide operation and maintenance capabilities for the operation and maintenance personnel of all branches globally.
  • the unified cloud management platform of the headquarters needs to manage all global operation and maintenance data and bears the entire operation and maintenance data load.
  • the cloud management platform can easily become a bottleneck, and this bottleneck will affect the operation and maintenance business of all branches globally.
  • the cloud management platform is used to manage at least one cloud in the cloud service system.
  • management is not required.
  • the embodiment of the present application provides a cloud service system and a corresponding communication method.
  • a distributed cloud service system based on operation and maintenance units is constructed to achieve a peer-to-peer relationship between the operation and maintenance units.
  • the cloud management platform in the cloud service system can be customized with the operation and maintenance unit as the smallest unit to achieve individual scalability. It can be understood that each operation and maintenance unit in the embodiment of the present application is equivalent to a cloud management platform for managing the corresponding cloud.
  • the cloud platform is a platform virtualized by deploying cloud platform software on basic resources (i.e., hardware resources), and the service resources it provides are based on hardware.
  • the cloud platform includes cloud service resources (i.e., software resources) and basic resources (i.e., hardware resources).
  • the management of the cloud platform by the operation and maintenance unit is to manage the cloud resources of the cloud platform.
  • the management of the cloud by the operation and maintenance unit is the management of the cloud platform.
  • the following description takes as an example a one-to-one correspondence between operation and maintenance units and clouds, that is, one operation and maintenance unit is deployed on each cloud and manages the cloud resources (including the cloud platform and hardware) of that cloud.
  • the operation and maintenance unit corresponds to at least one cloud platform and manages multiple cloud platforms.
  • although the cloud platform is equivalent to the cloud, to make clear what the operation and maintenance unit manages, the embodiments of the present application regard both the hardware resources and the software resources (i.e., cloud service resources) managed by the operation and maintenance unit as the cloud resources corresponding to the cloud platform.
  • the cloud platform and the hardware resources are distinguished, and both are regarded as part of the cloud resources of the cloud, and will not be repeated below.
  • FIG2 is a schematic diagram of the structure of the cloud service system in the embodiment of the present application.
  • the system includes but is not limited to multiple clouds.
  • it includes but is not limited to cloud 210 located in Shanghai, cloud 220 located in Henan, cloud 230 located in Zhengzhou, and cloud 240 located in Luoyang.
  • the number and division method of each cloud in FIG2 are only illustrative examples, and this application does not limit them.
  • each cloud also includes a corresponding operation and maintenance unit.
  • cloud 210 includes operation and maintenance unit 211
  • cloud 220 includes operation and maintenance unit 221
  • cloud 230 includes operation and maintenance unit 231
  • cloud 240 includes operation and maintenance unit 241.
  • the operation and maintenance unit in the embodiment of the present application is deployed on the cloud and can be understood as being part of the cloud, that is, the operation and maintenance unit can be described as being contained in the corresponding cloud, or belonging to the corresponding cloud.
  • the operation and maintenance unit can also be understood as a management layer above the cloud, that is, it is used to manage the entire cloud rather than being part of the cloud.
  • a single cloud may include software resources and hardware resources.
  • software resources include but are not limited to cloud platforms
  • hardware resources include but are not limited to at least one server cluster.
  • the server cluster includes one or more devices.
  • the devices include but are not limited to: network devices, security devices, computing devices, etc., which are not limited in this application.
  • Cloud services include but are not limited to: storage services, computing services, etc., which are not limited in this application.
  • each server cluster in the embodiment of the present application includes at least one device, and the models of the devices may be the same or different.
  • the different models of the devices may refer to different models of devices of the same manufacturer, or different models of devices of different manufacturers. This application does not limit this.
  • each cloud in the embodiment of the present application may be provided by a third-party vendor.
  • the third-party vendors of each cloud may be the same or different.
  • each cloud may also be the same vendor, for example, all Huawei Cloud. This application does not limit this.
  • each cloud needs to support the communication strategy (for example, support the corresponding communication interface) and environment deployment (for example, support the installation and operation of the operation and maintenance unit) in the embodiments of the present application.
  • the communication method in the embodiment of the present application can be applied to the existing cloud service system, and only the software-level installation and operation of the operation and maintenance unit need to be implemented.
  • the operation and maintenance unit can be understood as an application software or a program instruction.
  • the operation and maintenance personnel can install and run the software package corresponding to the operation and maintenance unit on each cloud (such as cloud 210, cloud 220, cloud 230 and cloud 240) to deploy and run the operation and maintenance unit on the cloud.
  • each operation and maintenance unit can be understood as an independent cloud management platform for managing the cloud resources of the corresponding cloud.
  • cloud resources include but are not limited to software resources and hardware resources.
  • Hardware resources are the server clusters described above.
  • Software resources include but are not limited to cloud platforms and some application software deployed on the cloud.
  • management involved in the embodiments of the present application includes, but is not limited to, querying resource status, analyzing resource status, and file management and other operation and maintenance capabilities. Specific examples will be described in the embodiments below.
  • Figure 3 specifically includes but is not limited to:
  • the cloud resources of cloud 210 include but are not limited to: software resources and hardware resources.
  • software resources include but are not limited to at least one cloud platform, such as cloud platform 212 and cloud platform 213.
  • Cloud platform is a software system for cloud technology services provided by a cloud provider, which is used to provide interfaces related to cloud services for tenants to remotely access cloud services.
  • Tenants can log in to the cloud platform on the cloud service access page using a pre-registered account and password, and after successful login, select and purchase the corresponding cloud services on the cloud service access page, such as object storage services, virtual machine services, container services, etc.
  • data is isolated between cloud platforms: different cloud platforms serve different service objects, and the services they provide may be the same or different. This application does not limit this.
  • the hardware resources 214 of the cloud resources include but are not limited to server clusters, for example, the storage media, memory, and CPUs (central processing units) in the server clusters.
  • the operation and maintenance unit 211 is used to manage the cloud resources in the cloud 210.
  • it manages multiple cloud platforms (including cloud platform 212 and cloud platform 213) and corresponding hardware resources 214.
  • the hardware resources 214 include multiple server clusters.
  • cloud platform 212 is deployed on server cluster 1, and cloud platform 213 is deployed on server cluster 2.
  • Cloud platform 212 can provide users with corresponding cloud service resources, such as cloud storage resources, cloud computing resources, etc. based on the basic resources provided by server cluster 1.
  • Cloud platform 213 can likewise provide users with corresponding cloud service resources based on the basic resources provided by server cluster 2.
  • a cloud platform is equivalent to a cloud.
  • cloud platform 212 and cloud platform 213 in Figure 3 can be used as a cloud in the organizational structure of the cloud service system, and can also be called a cloud platform.
  • hardware resources can also be regarded as resources of the cloud platform.
  • this application only takes the example that a cloud includes at least one cloud platform, one cloud corresponds to one operation and maintenance unit, and the cloud resources of the cloud include a cloud platform and hardware resources, that is, the operation and maintenance unit is used to manage at least one cloud platform and hardware resources in the cloud.
  • the operation and maintenance unit may configure an interface with the cloud platform and an interface with each server in the server cluster, thereby interacting with the cloud platform and the server through the interface to obtain relevant data of the cloud platform and relevant data of the server to achieve management of cloud resources.
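The following is a minimal sketch, not the applicant's implementation, of an operation and maintenance unit that keeps one interface per cloud platform and one per server and pulls data through them. The class name `OpsUnit`, the IDs, and the modeling of each interface as a zero-argument callable returning a status dict are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class OpsUnit:
    """Hypothetical operation and maintenance unit managing the resources of one cloud.

    Each interface is modeled as a zero-argument callable that returns the
    current status of a cloud platform or a server (an assumption).
    """
    name: str
    platform_interfaces: Dict[str, Callable[[], dict]] = field(default_factory=dict)
    server_interfaces: Dict[str, Callable[[], dict]] = field(default_factory=dict)

    def add_platform(self, platform_id: str, query: Callable[[], dict]) -> None:
        # Configure an interface with a cloud platform.
        self.platform_interfaces[platform_id] = query

    def add_server(self, server_id: str, query: Callable[[], dict]) -> None:
        # Configure an interface with a server in a server cluster.
        self.server_interfaces[server_id] = query

    def collect(self) -> dict:
        # Pull the relevant data from every configured interface.
        return {
            "platforms": {pid: q() for pid, q in self.platform_interfaces.items()},
            "servers": {sid: q() for sid, q in self.server_interfaces.items()},
        }


unit_211 = OpsUnit("211")
unit_211.add_platform("212", lambda: {"state": "running"})
unit_211.add_server("cluster1-node1", lambda: {"cpu_util": 0.35})
snapshot = unit_211.collect()
```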
  • each operation and maintenance unit serves as an independent cloud management platform, and each operation and maintenance unit is equal. Accordingly, communication connections can be established between the operation and maintenance units to exchange data.
  • the operation and maintenance personnel can establish communication connections between the operation and maintenance units according to needs so that the operation and maintenance units can communicate with each other.
  • for example, the operation and maintenance unit 211 establishes a communication connection with the operation and maintenance unit 221, and data can be exchanged between the operation and maintenance unit 211 and the operation and maintenance unit 221.
  • the operation and maintenance unit 211 establishes a communication connection with the operation and maintenance unit 231 and the operation and maintenance unit 241, and data can be exchanged between the operation and maintenance unit 211 and the operation and maintenance unit 231 and the operation and maintenance unit 241.
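The peer connections just described can be sketched as a symmetric adjacency map. The class `PeerNetwork` and its methods are hypothetical names for illustration. The sketch only records which units may exchange data and does not model transport.

```python
from collections import defaultdict


class PeerNetwork:
    """Hypothetical registry of communication connections between peer O&M units."""

    def __init__(self) -> None:
        self._links = defaultdict(set)

    def connect(self, a: str, b: str) -> None:
        # A connection is symmetric: either peer can send data to the other.
        self._links[a].add(b)
        self._links[b].add(a)

    def can_exchange(self, a: str, b: str) -> bool:
        return b in self._links[a]

    def peers_of(self, unit: str) -> set:
        return set(self._links[unit])


net = PeerNetwork()
net.connect("211", "221")
net.connect("211", "231")
net.connect("211", "241")
```

Units 221 and 231 have no direct connection in this example, so they cannot exchange data with each other even though both are connected to 211.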
  • all operation and maintenance units are equal, where equal means that every operation and maintenance unit serves the same management function, that is, each acts as an independent cloud management platform.
  • each operation and maintenance unit can also set up a corresponding organizational structure based on user needs and play a corresponding role.
  • the operation and maintenance unit 211 can serve as the headquarters (also referred to as the main operation and maintenance unit) in the cloud service system based on the instructions of Shanghai operation and maintenance personnel.
  • the main operation and maintenance unit can be used to issue tasks to other operation and maintenance units and summarize the task results. Accordingly, other operation and maintenance units can perform corresponding operation and maintenance operations based on the tasks issued by the main operation and maintenance unit. Specific examples will be described below.
  • the operation and maintenance unit in the embodiment of the present application is described in detail below in conjunction with the structural diagram of the operation and maintenance unit shown in Figure 4. Please refer to Figure 4.
  • the operation and maintenance unit includes but is not limited to the following operation and maintenance feature capabilities: system/data interface, CMDB (Configuration Management Database), data analysis platform, user management, report management, large screen display, automatic operation, knowledge base, resource management, performance management, log management, and alarm management. It also supports distributed data analysis capabilities such as capacity analysis, load analysis, idle resource analysis, and bottleneck resource analysis.
  • the operation and maintenance unit also includes a communication interface for realizing distributed analysis task reception and issuance, data collection and reporting, and operation and maintenance experience/knowledge distribution and sharing with other "operation and maintenance units".
  • the characteristics of the operation and maintenance unit are as follows:
  • An operation and maintenance unit can manage one or more cloud platforms.
  • a cloud may include one or more cloud platforms. Accordingly, the operation and maintenance unit deployed on the cloud may manage one or more cloud platforms. In this way, the operation and maintenance unit in the embodiment of the present application, as an independent cloud management platform, is only used to manage one or more local cloud platforms, so that the granularity of operation and maintenance management is refined to a separate cloud, thereby improving the flexibility of the overall operation and maintenance management of the cloud service system.
  • the operation and maintenance units have a peer relationship with each other.
  • each operation and maintenance unit is peer-to-peer, where peer means that each operation and maintenance unit is used for management, that is, as an independent cloud management platform.
  • the operation and maintenance units in the embodiment of the present application are in a peer-to-peer relationship and exist independently of each other. The abnormality of any operation and maintenance unit will not affect the normal operation of other operation and maintenance units.
  • the operation and maintenance unit itself can evolve and iterate independently.
  • each operation and maintenance unit is independent of each other. Therefore, each operation and maintenance unit can add or delete specific operation and maintenance functions based on user needs.
  • adding or deleting a specific operation and maintenance function may refer to adding or deleting an existing module in the operation and maintenance unit.
  • Shanghai operation and maintenance personnel can add module 1 in operation and maintenance unit 211, and module 1 can perform specified operation and maintenance operations.
  • Henan operation and maintenance personnel can delete the report management module in operation and maintenance unit 221 to delete the operation and maintenance function of operation and maintenance unit 221 corresponding to the report.
  • adding or deleting a specific operation and maintenance function may also refer to adding or deleting some functions of an existing module. This application is not limited.
  • each operation and maintenance unit is independent of each other, so it can evolve and iterate independently without taking effect globally, which can realize user customization, improve the overall flexibility of system operation and maintenance, and can be applied to user needs in different scenarios.
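A minimal sketch of local customization, assuming each unit holds its own dictionary of feature modules. `ModularOpsUnit`, the default module names, and `module_1` are hypothetical. The point is that adding a module to unit 211 or removing one from unit 221 changes only that unit.

```python
class ModularOpsUnit:
    """Hypothetical O&M unit whose feature modules can be added or removed locally."""

    DEFAULT_MODULES = ("resource_management", "performance_management",
                       "alarm_management", "log_management", "report_management")

    def __init__(self, name: str) -> None:
        self.name = name
        # Each instance builds its own dict, so changes never leak to other units.
        self.modules = {m: (lambda m=m: f"{m} ok") for m in self.DEFAULT_MODULES}

    def add_module(self, name: str, handler) -> None:
        self.modules[name] = handler

    def remove_module(self, name: str) -> None:
        self.modules.pop(name, None)


u211 = ModularOpsUnit("211")
u221 = ModularOpsUnit("221")
u211.add_module("module_1", lambda: "custom check")   # Shanghai adds module 1
u221.remove_module("report_management")               # Henan drops reporting
```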
  • the operation and maintenance unit can receive analysis tasks issued by one or more other operation and maintenance units, and can also initiate analysis tasks to one or more other operation and maintenance units and collect operation and maintenance data.
  • any operation and maintenance unit can send an analysis task to at least one operation and maintenance unit with which a communication connection is established.
  • the operation and maintenance unit that receives the analysis task can feedback the analysis result to the operation and maintenance unit through the communication connection.
  • each operation and maintenance unit is a peer, and accordingly each has the ability to publish tasks and feed back task results, so the system can adapt to a scenario in which the headquarters (i.e., the main operation and maintenance unit) fails.
  • the analysis task is sent to each operation and maintenance unit for execution, so that the analysis task can be completed in a distributed manner.
  • Each node only needs to complete its own analysis task, and there is no need to concentrate the task on one of the nodes for execution, thereby effectively improving the completion efficiency of the analysis task, reducing the communication interaction between clouds, and reducing communication overhead.
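The fan-out described above can be sketched with a thread pool. The function `run_distributed_analysis`, the `local_analyzer` helper, and the idea that each peer returns only a small summary are assumptions for illustration. Remote calls are modeled as local callables.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_distributed_analysis(task: str,
                             peers: Dict[str, Callable[[str], dict]]) -> Dict[str, dict]:
    """Send one analysis task to every peer and gather the per-peer results.

    `peers` maps a hypothetical unit ID to a callable that runs the task on
    that unit's local data and returns only the analysis result.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(peers))) as pool:
        futures = {uid: pool.submit(analyze, task) for uid, analyze in peers.items()}
        return {uid: f.result() for uid, f in futures.items()}


def local_analyzer(cpu_samples: List[float]) -> Callable[[str], dict]:
    # Each unit reduces its raw samples to a summary before replying,
    # so raw data never crosses cloud boundaries.
    def analyze(task: str) -> dict:
        return {"task": task, "avg_cpu": sum(cpu_samples) / len(cpu_samples)}
    return analyze


results = run_distributed_analysis(
    "load_analysis",
    {"221": local_analyzer([0.2, 0.4]), "231": local_analyzer([0.9, 0.7])},
)
```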
  • the operation and maintenance unit is not limited to one or some specific cloud vendors. It only needs to have the above-mentioned "operation and maintenance feature capabilities" and be able to follow the communication protocol.
  • the embodiments of the present application can be applicable to different hardware and software environments, thereby improving the universality of application scenarios.
  • the modules of the operation and maintenance unit include but are not limited to:
  • System/data interface: used to provide an interface through which the operation and maintenance unit can actively collect or passively receive operation and maintenance data related to the operation and maintenance object (i.e., cloud or cloud platform).
  • the operation and maintenance data includes but is not limited to: resource objects, alarms, performance indicators, logs, etc.
  • CMDB: a configuration database used to establish the resource objects in the entire cloud and the relationships between them. This module stores the instances of resource objects at each layer of the local cloud (i.e., the cloud to which the operation and maintenance unit belongs), such as hardware, virtualization, software, and applications, together with the relationship data between these instances.
  • Data analysis platform: a database used to store time-series data, such as performance indicators and logs, of the local cloud (i.e., the cloud to which the operation and maintenance unit belongs).
  • the data is stored in a unified format that all upper-layer modules use. That is, this module converts data collected or reported from outside into a specific format, so that the upper-layer modules of the operation and maintenance unit receive information in that format and can obtain the corresponding data.
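A minimal sketch of the conversion into a unified format. The `MetricRecord` fields and the two raw input layouts are hypothetical stand-ins for data arriving from different sources. The source does not specify the format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRecord:
    """Hypothetical unified time-series record consumed by upper-layer modules."""
    resource_id: str
    metric: str
    timestamp: int   # Unix seconds
    value: float


def normalize(raw: dict) -> MetricRecord:
    # The two raw layouts below are illustrative assumptions for data
    # collected or reported by different sources.
    if "resourceId" in raw:
        return MetricRecord(raw["resourceId"], raw["metricName"],
                            int(raw["ts"]), float(raw["val"]))
    return MetricRecord(raw["res"], raw["kpi"], int(raw["time_ms"]) // 1000,
                        float(raw["value"]))
```

For example, `normalize({"resourceId": "vm-1", "metricName": "cpu", "ts": 1700000000, "val": "0.5"})` and `normalize({"res": "vm-1", "kpi": "cpu", "time_ms": 1700000000000, "value": 0.5})` yield equal records.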
  • Resource management: a feature module for providing a resource list.
  • the operation and maintenance unit can obtain a resource list by calling this module, which includes various resource objects in the cloud.
  • Performance management: a feature module used to provide performance indicator-related capabilities.
  • the operation and maintenance unit can call this module to display performance indicators, set performance thresholds, and collect performance data.
  • Alarm management: a feature module used to provide alarm-related management capabilities.
  • Log management: a feature module used to provide management capabilities related to system logs, security logs, operation logs, and run logs.
  • Automatic operation: a feature module used to provide daily automated operation and maintenance capabilities.
  • Knowledge base: a case library and knowledge base feature module used to accumulate and consolidate common troubleshooting methods.
  • Report management: a feature module used to provide report-related management capabilities.
  • the operation and maintenance unit can call the report management module to perform functions such as viewing, modifying, deleting, customizing, and setting report periodic tasks.
  • Large screen display: a feature module for meeting large-screen display business requirements.
  • the operation and maintenance unit can provide an operation and maintenance interface, and the large screen display module can support operations on this interface and display the relevant operation and maintenance parameters and resource parameters.
  • Load analysis: a feature module for providing analysis of the load conditions of various resources in the cloud.
  • the load analysis module may periodically obtain the load conditions of various resources in the cloud and perform load analysis.
  • the load analysis module may also obtain the load resource status and perform load analysis after receiving an instruction.
  • the received instruction may be sent by the main operation and maintenance unit or issued by local operation and maintenance personnel, which is not limited in this application.
  • load resources include, but are not limited to, the hardware load of the server cluster, for example, CPU load and memory load. This application does not limit this.
  • Capacity analysis: a feature module used to analyze the capacity information (e.g., the used and unused capacity of each resource) of the basic resources in the cloud (i.e., hardware resources such as computing and storage resources) and of the cloud service resources (i.e., the cloud service resources provided by the cloud platform).
  • the capacity analysis module may periodically obtain various basic resource conditions and cloud service capacity information in the cloud, and perform capacity analysis.
  • the capacity analysis module may also obtain various basic resource conditions and cloud service capacity information and perform capacity analysis after receiving instructions.
  • the received instructions may be sent by the main operation and maintenance unit, or issued by local operation and maintenance personnel, which is not limited in this application.
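A minimal sketch of one capacity analysis pass, assuming each resource type is reported as a `(total, used)` pair in a common unit. The function name and input shape are hypothetical.

```python
def capacity_analysis(resources: dict) -> dict:
    """Report used and unused capacity for each resource type.

    `resources` maps a resource type to (total, used) in the same unit;
    this input shape is an assumption for illustration.
    """
    report = {}
    for name, (total, used) in resources.items():
        report[name] = {
            "used": used,
            "unused": total - used,
            "used_ratio": used / total if total else 0.0,
        }
    return report


report = capacity_analysis({"storage_tb": (100, 80), "vcpu": (512, 128)})
```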
  • Idle resource analysis (also referred to as free resource analysis): a feature module for analyzing the various types of idle resources.
  • the operation and maintenance unit can obtain the idle status of each resource and perform analysis by calling the idle resource analysis module.
  • the idle resource analysis module can periodically obtain the various types of idle resources in the cloud and perform idle resource analysis.
  • the idle resource analysis module can also obtain the various types of idle resources and perform idle resource analysis after receiving an instruction. The instruction may be sent by the main operation and maintenance unit or issued by local operation and maintenance personnel, which is not limited in this application.
  • the operation and maintenance unit can obtain the idle status of the CPU of each server in the server cluster by calling the idle resource analysis module, and analyze the idle status of the obtained CPU to obtain the overall idle status of the CPU in the cloud. It can be understood that the results of the idle resource analysis can be used to indicate which resources in the cloud have low utilization rates.
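The CPU example above can be sketched as follows. Per-server idle ratios are averaged into a cloud-wide figure, and servers whose utilization falls below a threshold are flagged as low-utilization. The 0.2 threshold and the function name are hypothetical. The average is unweighted, which assumes servers of similar size.

```python
def idle_cpu_analysis(server_idle: dict, low_util_threshold: float = 0.2) -> dict:
    """Aggregate per-server CPU idle ratios into a cloud-wide view.

    `server_idle` maps a server ID to its CPU idle ratio in [0, 1].
    The low-utilization threshold is a hypothetical parameter.
    """
    overall_idle = sum(server_idle.values()) / len(server_idle)
    underused = sorted(s for s, idle in server_idle.items()
                       if 1.0 - idle < low_util_threshold)
    return {"overall_idle": overall_idle, "low_utilization_servers": underused}


result = idle_cpu_analysis({"n1": 0.9, "n2": 0.5, "n3": 0.95})
```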
  • Bottleneck resource analysis: a feature module through which the cloud management platform provides customers with analysis of the various bottleneck resources.
  • the operation and maintenance unit can obtain the usage of each resource and perform bottleneck resource analysis by calling the bottleneck resource analysis module.
  • the bottleneck resource analysis module can periodically obtain the usage of various resources in the cloud and perform bottleneck resource analysis.
  • the bottleneck resource analysis module can also obtain the usage of various resources and perform bottleneck resource analysis after receiving an instruction.
  • the received instruction can be sent by the main operation and maintenance unit or issued by the local operation and maintenance personnel, and this application is not limited. It can be understood that the results of the bottleneck resource analysis can be used to indicate which resources in the cloud have a higher utilization rate.
  • resources with excessive utilization may cause bottlenecks.
  • for example, if the operation and maintenance unit determines through bottleneck resource analysis that the utilization rate of storage resources is too high, it can predict that the storage resources may become the resource bottleneck of the local cloud and may not meet user needs in subsequent use.
  • the operation and maintenance unit can then prompt the user to expand the storage resources by means of an alarm or prompt.
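A minimal sketch of the storage example: any resource whose utilization exceeds a threshold produces an alarm suggesting expansion. The 0.85 default is an assumed value that does not come from the source, and the message wording is illustrative.

```python
def bottleneck_analysis(usage: dict, threshold: float = 0.85) -> list:
    """Return alarm messages for resources whose utilization exceeds `threshold`.

    `usage` maps a resource type to its utilization ratio in [0, 1].
    The 0.85 default is an assumption, not a value from the source.
    """
    alarms = []
    for resource, ratio in sorted(usage.items()):
        if ratio > threshold:
            alarms.append(f"{resource} utilization {ratio:.0%} may become a "
                          f"bottleneck; consider expanding it")
    return alarms


alarms = bottleneck_analysis({"storage": 0.93, "cpu": 0.40})
```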
  • User management: a feature module related to user management and authentication.
  • Communication interface: used to provide an interface for data interaction with other operation and maintenance units.
  • Data interaction includes but is not limited to: “analysis task issuance and result return”, “data collection or reporting” and “operation and maintenance experience distribution and sharing”. Details are as follows:
  • the communication interface can not only receive analysis tasks issued to itself by other operation and maintenance units, but also initiate analysis tasks required by itself to other operation and maintenance units. Since the performance indicator data are distributed locally in each operation and maintenance unit, the single-point bottleneck problem of performance indicator data is avoided.
  • when the operation and maintenance unit in the "headquarters" role (i.e., the main operation and maintenance unit) needs analysis results, it only needs to distribute the analysis tasks synchronously to the one or more peer operation and maintenance units it is concerned with.
  • the headquarters operation and maintenance unit can send analysis tasks through the communication interface, and transmit the analysis tasks to the target operation and maintenance unit based on the communication connection (also called communication channel, which follows the distributed operation and maintenance communication protocol) with other operation and maintenance units.
  • the target operation and maintenance unit can be at least one operation and maintenance unit in the cloud service system.
  • after receiving the analysis task through the communication interface, the target operation and maintenance unit performs the analysis locally and can transmit the analysis results back to the headquarters operation and maintenance unit through the communication interface and its communication connection with the headquarters operation and maintenance unit.
  • the headquarters unit receives the analysis results transmitted back by at least one operation and maintenance unit through the communication interface and can summarize and display the analysis results.
  • the operation and maintenance units in the cloud service system that have operation and maintenance experience distribution and sharing modules all have the ability to share data such as automatic operation scripts and operation and maintenance (experience) knowledge bases.
  • the operation and maintenance units can publish their own automatic operation scripts, operation and maintenance (experience) knowledge bases, and other data to other operation and maintenance units based on received user instructions.
  • the peer operation and maintenance units can learn based on the received automatic operation scripts, operation and maintenance (experience) knowledge bases, etc., so that the operation and maintenance units can perform related operation and maintenance operations based on the learned automatic operation scripts, operation and maintenance (experience) knowledge bases, and other data during operation.
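The sharing of automatic operation scripts and knowledge base entries can be sketched as below. `KnowledgeSharingUnit` is hypothetical. The rule that a receiving unit keeps its own entry when a name collides is a design assumption made so that local customization is not overwritten, and the source does not state it.

```python
class KnowledgeSharingUnit:
    """Hypothetical O&M unit that can publish and learn scripts and knowledge entries."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.scripts = {}      # script name -> script body
        self.knowledge = {}    # fault symptom -> troubleshooting method

    def publish(self, peers) -> None:
        # In the source this is triggered by a user instruction; here it is a call.
        for peer in peers:
            peer.learn(dict(self.scripts), dict(self.knowledge))

    def learn(self, scripts: dict, knowledge: dict) -> None:
        # Assumed policy: keep existing local entries when names collide.
        for k, v in scripts.items():
            self.scripts.setdefault(k, v)
        for k, v in knowledge.items():
            self.knowledge.setdefault(k, v)
```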
  • the communication interface is a concentrated embodiment of the interactive capability between the operation and maintenance units.
  • the data packets sent by the communication interface follow the distributed operation and maintenance communication protocol. It can be understood that the structure of the data packets sent by each operation and maintenance unit is the same.
  • the operation and maintenance unit can monitor the data packets through the communication interface and correctly read the data packets that follow the distributed operation and maintenance communication protocol to obtain the data carried in the data packets.
  • the protocol itself is peer-to-peer, and there is no difference between the client and the server between different operation and maintenance units. It is just that in the actual deployment and actual use of the customer, the organizational structure of the cloud service system can be set according to user needs, and it plays a role in satisfying user demands.
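The source states only that every unit sends packets with the same structure and that any unit can read packets following the protocol. The field names, the JSON encoding, and the version check in this sketch are assumptions. Because the same class both encodes and decodes, no unit needs a distinct client or server role.

```python
import json
from dataclasses import dataclass, asdict

PROTOCOL_VERSION = 1  # hypothetical


@dataclass
class OpsPacket:
    """Hypothetical packet layout shared by every O&M unit."""
    version: int
    msg_type: str      # e.g. "analysis_task", "analysis_result", "knowledge_share"
    sender: str
    receiver: str
    payload: dict

    def encode(self) -> bytes:
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def decode(cls, data: bytes) -> "OpsPacket":
        fields = json.loads(data.decode("utf-8"))
        # Reject packets that do not follow the expected protocol version.
        if fields.get("version") != PROTOCOL_VERSION:
            raise ValueError("packet does not follow the expected protocol version")
        return cls(**fields)
```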
  • by receiving, summarizing, and displaying the analysis results returned by at least one operation and maintenance unit, the headquarters unit reduces the single-point pressure on the cloud service system.
  • the analysis and calculation tasks can be distributed to each operation and maintenance unit for execution. While improving the execution effect of the analysis task, it reduces the amount of data in the communication interaction between the clouds, effectively reduces the communication overhead, and saves air interface resources.
  • the operation and maintenance unit in the embodiment of the present application contains feature modules for all operation and maintenance business areas, including user management. These are sufficient to form a self-contained closed loop within the unit, so that the local operation and maintenance team can carry out its own daily operation and maintenance business. The unit's independence is thus guaranteed: it can evolve and iterate on its own, and different operation and maintenance units can run different versions and come from different manufacturers.
  • the customized enhancement of the unique capabilities performed in a specific instance of the operation and maintenance unit is limited to the corresponding operation and maintenance unit itself, and will not affect other operation and maintenance units.
  • distributed computing of heavy analysis tasks can be realized to avoid the single-point resource consumption bottleneck of analysis tasks.
  • the operation and maintenance units can also interact with data such as automatic job scripts and operation and maintenance experience knowledge bases through communication connections and communication interfaces to realize the sharing of operation and maintenance experience capabilities accumulated by each operation and maintenance unit.
  • At least one operation and maintenance unit may not establish a communication connection with other operation and maintenance units in the cloud service system. It should be noted that in this scenario, although the at least one operation and maintenance unit does not exchange data with other operation and maintenance units, it still belongs to the same cloud rented by the same customer as other operation and maintenance units. Therefore, even in the absence of communication, at least one operation and maintenance unit still belongs to the same cloud service system as other operation and maintenance units.
  • each operation and maintenance unit is equal, and in an embodiment of the present application, each operation and maintenance unit has the ability to publish analysis tasks.
  • Henan operation and maintenance personnel can also publish analysis tasks to operation and maintenance units 231 and 241 through operation and maintenance unit 221.
  • Operation and maintenance units 231 and 241 can perform corresponding resource analysis based on the analysis tasks to obtain resource analysis results.
  • Operation and maintenance units 231 and 241 can feedback resource analysis results to operation and maintenance unit 221.
  • Figure 5 is an exemplary schematic diagram of the cloud service system initialization process. Please refer to Figure 5, which specifically includes but is not limited to the following steps:
  • the operation and maintenance personnel 1 in Figure 5 may be the Shanghai operation and maintenance personnel in Figure 2, the operation and maintenance unit 1 may be the operation and maintenance unit 211, the cloud platform 1 is the cloud platform deployed in the cloud 210, and the cloud platform 1 may provide corresponding cloud services based on the hardware resources in the cloud 210.
  • the operation and maintenance personnel 2 may be the Henan operation and maintenance personnel, the operation and maintenance unit 2 may be the operation and maintenance unit 221, the cloud platform 2 is the cloud platform deployed in the cloud 220, and the cloud platform 2 may provide corresponding cloud services based on the hardware resources in the cloud 220.
  • the operation and maintenance personnel 3 may be the Zhengzhou operation and maintenance personnel, the operation and maintenance unit 3 may be the operation and maintenance unit 231, the cloud platform 3 is the cloud platform deployed in the cloud 230, and the cloud platform 3 may provide corresponding cloud services based on the hardware resources in the cloud 230.
  • the organizational structure of the cloud service system shown in Figure 2 is taken as an example, that is, the cloud 230 in Zhengzhou can be considered a subordinate branch of the cloud 220 in Henan.
  • each operation and maintenance unit is equal, and only the roles played in the organizational structure are different. Among them, different roles may also have different permissions.
  • operation and maintenance unit 3 is a subordinate of operation and maintenance unit 2, and operation and maintenance unit 2 can issue analysis tasks to operation and maintenance unit 3 so that operation and maintenance unit 3 can feedback resource analysis results.
  • Operation and maintenance unit 3 is equal to operation and maintenance unit 2 and also has the ability to issue analysis tasks, but it is a subordinate of operation and maintenance unit 2 in the organizational structure of the cloud service system shown in Figure 2, so it usually does not issue analysis tasks to operation and maintenance unit 2.
  • the organizational structure described in the embodiment of the present application is only a schematic example, and the organizational structure and roles can be set according to actual needs, and this application does not limit it.
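A minimal sketch of how role-based permissions could sit on top of peer capabilities. Every unit can technically issue tasks, but a policy check allows it only toward units below it in the organizational tree. The `PARENT` mapping is a hypothetical example modeled on units 211, 221, and 231, and the policy itself is an assumption.

```python
# Hypothetical organizational structure: child unit -> parent unit.
PARENT = {"221": "211", "231": "221"}


def is_ancestor(issuer: str, target: str) -> bool:
    """Return True if `issuer` sits above `target` in the organizational tree."""
    node = PARENT.get(target)
    while node is not None:
        if node == issuer:
            return True
        node = PARENT.get(node)
    return False


def may_issue_task(issuer: str, target: str) -> bool:
    # Capability is symmetric between peers; this check only reflects the
    # role each unit is assigned in the organizational structure.
    return issuer != target and is_ancestor(issuer, target)
```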
  • each operation and maintenance personnel can deploy an operation and maintenance unit on the local cloud.
  • the operation and maintenance unit can be understood as an application software or program instruction.
  • the operation and maintenance personnel can install and run the software package corresponding to the operation and maintenance unit on each cloud (for example, cloud 210, cloud 220, cloud 230, and cloud 240) to deploy and run the operation and maintenance unit on the cloud.
  • each operation and maintenance unit can be understood as an independent cloud management platform for managing the cloud resources of the corresponding cloud.
  • the deployment phase may also include setting operation and maintenance information of the operation and maintenance unit, such as but not limited to: user name, password, address information (including domain name, IP address, etc.) and other information.
  • the modules in the operation and maintenance units are started synchronously.
  • in this example, all operation and maintenance units have the same operation and maintenance capabilities, that is, they include the same modules.
  • the operation and maintenance units in the embodiment of the present application may have independent expansion capabilities, and the operation and maintenance capabilities of each operation and maintenance unit may be partially or completely different. The operation and maintenance personnel can expand a single operation and maintenance unit according to actual needs, and this application does not limit it.
  • after startup, some modules may optionally begin obtaining relevant data from the cloud.
  • the capacity analysis module may periodically obtain basic resources and cloud service capacity information in the cloud, and perform capacity analysis.
  • the idle resource analysis module may periodically obtain various idle resources in the cloud, and perform idle resource analysis.
  • the load resource analysis module may periodically obtain the load conditions of various resources, and perform load resource analysis.
  • the bottleneck resource analysis module may periodically obtain the usage of various resources, and perform bottleneck resource analysis.
  • each analysis module may only obtain the corresponding resource status without performing analysis actions. After receiving a user's query instruction or an analysis task issued by the headquarters operation and maintenance unit, the analysis action may be performed based on the most recently obtained resource status and the analysis result may be obtained.
  • alternatively, each analysis module may obtain the resource status in the cloud and perform the analysis only after receiving a query instruction from the user or an analysis task issued by the headquarters operation and maintenance unit.
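The two collection modes above (periodic collection with analysis on request, or collection and analysis only on request) can be illustrated with a minimal Python sketch. The snapshot format, the `IdleResourceAnalysisModule` class, and the `fetch_status` hook are hypothetical names introduced here for illustration; the timer that would call `collect()` periodically is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical snapshot format: resource name -> (total, used).
Snapshot = Dict[str, Tuple[int, int]]


def analyze_idle(snapshot: Snapshot) -> Dict[str, float]:
    """Idle fraction per resource; resources with zero capacity are skipped."""
    return {
        name: (total - used) / total
        for name, (total, used) in snapshot.items()
        if total > 0
    }


@dataclass
class IdleResourceAnalysisModule:
    # Hypothetical hook that reads the current idle-resource status of the local cloud.
    fetch_status: Callable[[], Snapshot]
    latest: Optional[Snapshot] = None

    def collect(self) -> None:
        """Called on a timer: store the status without analysing it."""
        self.latest = self.fetch_status()

    def handle_request(self, on_demand: bool) -> Dict[str, float]:
        """Analyse when a user query or HQ analysis task arrives.

        on_demand=False reuses the most recently collected status;
        on_demand=True reads the status first, then analyses it.
        """
        if on_demand or self.latest is None:
            self.collect()
        return analyze_idle(self.latest)
```

Reusing the cached snapshot answers a request immediately, while the on-demand path trades latency for freshness.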
  • the operation and maintenance operations in the cloud service system in the embodiment of the present application are performed at the granularity of the operation and maintenance unit, and each operation and maintenance unit can perform the operation and maintenance operations accordingly without affecting the execution of other operation and maintenance units.
  • each operation and maintenance unit can perform operation and maintenance operations on its local cloud directly, without routing them through other cloud management platforms, which would cause data detours and waste communication resources.
  • the operation and maintenance unit serves as an independent cloud management platform, which can be used to manage the corresponding cloud platform.
  • the operation and maintenance unit can obtain various parameters of the cloud platform in real time through various modules to monitor the operating status and resource status of the cloud platform.
  • the alarm module of the operation and maintenance unit can issue an alarm.
  • the operation and maintenance unit can also update the cloud platform, such as upgrading some components of the cloud platform, etc., which is not limited in this application.
  • when managing its cloud platform, an operation and maintenance unit can provide the various operation and maintenance capabilities of cloud management platforms in the existing technology; in the embodiments of the present application, however, each operation and maintenance unit manages its own cloud platform independently, without needing to interact with the other units.
  • the operation and maintenance unit also manages the hardware resources in the cloud.
  • managing these hardware resources can also be regarded as managing the hardware resources of the cloud platform; for example, the operating status and resource usage of the hardware resources are monitored, and alarms are issued in a timely manner.
  • the operation and maintenance personnel designates the operation and maintenance unit 1, that is, the operation and maintenance unit in the cloud 210 located in Shanghai as the headquarters role, which can also be called the main operation and maintenance unit.
  • in the embodiments of the present application, the operation and maintenance unit in the headquarters role is peer to the other operation and maintenance units. This "equality" means that the headquarters operation and maintenance unit and the other operation and maintenance units all have the ability to publish analysis tasks and to feed back analysis results.
  • in this embodiment, the headquarters role is used as the example of the entity that publishes analysis tasks.
  • other operation and maintenance units can also be designated as headquarters roles.
  • multiple operation and maintenance units can also hold the headquarters role, which is not limited in this application.
  • the operation and maintenance personnel 1 can access the operation and maintenance unit 1 through the electronic device.
  • the operation and maintenance unit 1 can provide an operation and maintenance interface, and the operation and maintenance personnel 1 can enter information such as a user name and password through the operation and maintenance interface to log in to the operation and maintenance unit.
  • the operation and maintenance unit 1 receives the information entered by the user, verifies the user information, and allows the operation and maintenance personnel 1 to log in after the verification is successful.
  • after the operation and maintenance personnel 1 successfully logs in, they can send instruction information to the operation and maintenance unit 1 through the electronic device to instruct the operation and maintenance unit 1 to act as the headquarters operation and maintenance unit, that is, to act as the headquarters role in the organizational structure of the cloud service system.
  • operation and maintenance unit 1 is connected with operation and maintenance unit 2 and operation and maintenance unit 3.
  • operation and maintenance unit 1 determines that operation and maintenance unit 1 serves as the headquarters role.
  • Operation and maintenance unit 1 sends docking instruction information to operation and maintenance unit 2, and the docking instruction information includes but is not limited to: docking instruction, identification information of operation and maintenance unit 1 (for example, address information), and identification information of operation and maintenance unit 2 (for example, address information).
  • the docking instruction information indicates that operation and maintenance unit 1 serves as the headquarters role of operation and maintenance unit 2, which can also be understood as operation and maintenance unit 1 serving as the headquarters role in the cloud service system.
  • based on the docking indication, the identification information of operation and maintenance unit 1, and the identification information of operation and maintenance unit 2 carried in the docking indication information, operation and maintenance unit 2 determines that operation and maintenance unit 1 serves as the headquarters operation and maintenance unit (i.e., the headquarters role) in the cloud service system.
  • the operation and maintenance unit 2 sends a docking response message to the operation and maintenance unit 1, and the docking response message includes but is not limited to: a successful docking indication, identification information of the operation and maintenance unit 1, identification information of the operation and maintenance unit 2, and operation and maintenance information of the operation and maintenance unit 2.
  • the operation and maintenance information includes but is not limited to: user name, password, and address information of the operation and maintenance unit 2, and this application does not limit this.
  • optionally, the operation and maintenance unit 2 can notify the user that the docking indication information has been received, and send the docking response information only after receiving the user's permission.
  • operation and maintenance unit 1 receives the docking response information sent by operation and maintenance unit 2. Based on the docking response information, operation and maintenance unit 1 determines that operation and maintenance unit 2 agrees to dock, that is, operation and maintenance unit 1 is allowed to serve as the headquarters operation and maintenance unit. Exemplarily, operation and maintenance unit 1 can establish a communication connection with operation and maintenance unit 2 based on the address information in the received operation and maintenance information.
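As a minimal sketch, assuming simple in-memory objects rather than a real network protocol, the docking exchange could be modelled as follows. The class and field names (`DockingIndication`, `DockingResponse`, `OpsInfo`, `OpsUnit`, `auto_approve`) are hypothetical; `auto_approve` merely stands in for the optional step of asking the user for permission.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OpsInfo:
    # Operation and maintenance information exchanged during docking.
    username: str
    password: str
    address: str


@dataclass
class DockingIndication:
    hq_id: str      # identification information of the headquarters unit
    target_id: str  # identification information of the unit being docked


@dataclass
class DockingResponse:
    success: bool
    hq_id: str
    target_id: str
    ops_info: Optional[OpsInfo]


class OpsUnit:
    def __init__(self, unit_id: str, ops_info: OpsInfo, auto_approve: bool = True):
        self.unit_id = unit_id
        self.ops_info = ops_info
        self.auto_approve = auto_approve  # stands in for the user-permission prompt
        self.headquarters: Optional[str] = None
        self.peers: Dict[str, OpsInfo] = {}  # docked unit ID -> its ops info

    def build_indication(self, target_id: str) -> DockingIndication:
        return DockingIndication(hq_id=self.unit_id, target_id=target_id)

    def on_indication(self, ind: DockingIndication) -> DockingResponse:
        # Refuse indications addressed elsewhere or not approved by the user.
        if ind.target_id != self.unit_id or not self.auto_approve:
            return DockingResponse(False, ind.hq_id, ind.target_id, None)
        self.headquarters = ind.hq_id  # record which unit holds the HQ role
        return DockingResponse(True, ind.hq_id, self.unit_id, self.ops_info)

    def on_response(self, resp: DockingResponse) -> bool:
        if resp.success and resp.hq_id == self.unit_id and resp.ops_info is not None:
            # The address in ops_info would be used to open the connection.
            self.peers[resp.target_id] = resp.ops_info
            return True
        return False
```

For example, unit 1 calls `build_indication("unit-2")`, unit 2 answers through `on_indication`, and unit 1 records unit 2's user name, password, and address through `on_response`.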
  • operation and maintenance unit 1 is connected with operation and maintenance unit 2 to establish a communication connection.
  • Operation and maintenance unit 2 can be connected with operation and maintenance unit 3 to establish a communication connection.
  • when issuing an analysis task, operation and maintenance unit 1 can issue the analysis task to operation and maintenance unit 2.
  • Operation and maintenance unit 2 can execute the analysis task and issue the analysis task to operation and maintenance unit 3.
  • Operation and maintenance unit 3 feeds back the analysis results to operation and maintenance unit 2.
  • Operation and maintenance unit 2 feeds back its local analysis results and the analysis results of operation and maintenance unit 3 to operation and maintenance unit 1 through its communication connection with operation and maintenance unit 1.
  • operation and maintenance unit 3 as a subordinate of operation and maintenance unit 2, docks with operation and maintenance unit 2 and establishes a communication connection.
  • Operation and maintenance unit 3 can also dock with operation and maintenance unit 1 and establish a communication connection.
  • when issuing an analysis task, operation and maintenance unit 1 can issue the analysis task to operation and maintenance unit 2 through its communication connection with operation and maintenance unit 2, and receive the analysis results fed back by operation and maintenance unit 2.
  • operation and maintenance unit 1 can issue the analysis task to operation and maintenance unit 3 through its communication connection with operation and maintenance unit 3, and receive the analysis results fed back by operation and maintenance unit 3.
  • the organizational structure in the cloud service system is set according to user needs and can be understood as a logical structure. The organizational structure of the cloud service system will not affect the communication structure between the operation and maintenance units.
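Because the organizational structure is only logical, the relayed path (unit 1 to unit 2 to unit 3) and the direct path (unit 1 to units 2 and 3) should return the same set of results. The following hypothetical Python sketch contrasts the two; `analyze` stands in for running the task on a unit, and the unit IDs are illustrative.

```python
from typing import Callable, Dict, List

Analyze = Callable[[str], str]  # hypothetical: runs the task on a unit, returns its result


def dispatch_relayed(unit: str, subordinates: Dict[str, List[str]], analyze: Analyze) -> Dict[str, str]:
    """Relay mode: the unit analyses locally, forwards the task to each
    subordinate, and returns its own result plus everything relayed back.
    Assumes the organizational structure is a tree (no cycles)."""
    results = {unit: analyze(unit)}
    for child in subordinates.get(unit, []):
        results.update(dispatch_relayed(child, subordinates, analyze))
    return results


def dispatch_direct(targets: List[str], analyze: Analyze) -> Dict[str, str]:
    """Direct mode: the headquarters contacts each target over its own connection."""
    return {unit: analyze(unit) for unit in targets}
```

In relay mode the headquarters would call `dispatch_relayed("unit-2", {"unit-2": ["unit-3"]}, analyze)`; in direct mode it would call `dispatch_direct(["unit-2", "unit-3"], analyze)`. Both yield one result per unit.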
  • each operation and maintenance unit may also exchange other data, which is not limited in this application.
  • operation and maintenance personnel 1 instructs operation and maintenance unit 1 to issue an idle resource analysis task.
  • the operation and maintenance unit 1 establishes communication connections with the operation and maintenance unit 2 and the operation and maintenance unit 3 respectively.
  • the operation and maintenance personnel 1 can send user instructions (which may also be called operation and maintenance instructions or operation and maintenance instruction information, etc., which are not limited in this application) to the operation and maintenance unit 1 through an electronic device.
  • the user instructions include but are not limited to: identification information of the target operation and maintenance unit and idle resource analysis task instructions, which are used to instruct the operation and maintenance unit 1 to obtain the idle resource analysis results of the target operation and maintenance unit.
  • in this example, the target operation and maintenance units are operation and maintenance unit 1, operation and maintenance unit 2, and operation and maintenance unit 3.
  • the target operation and maintenance unit may be all operation and maintenance units in this cloud service system.
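  • for target operation and maintenance units that are not directly docked with the headquarters operation and maintenance unit, idle resource analysis tasks can be issued to them through their parent nodes (i.e., their superior operation and maintenance units).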
  • the target operation and maintenance unit may also refer to all operation and maintenance units in this cloud service system that have completed docking with operation and maintenance unit 1 (i.e., the headquarters operation and maintenance unit).
  • the target operation and maintenance unit may also be at least one operation and maintenance unit in the cloud service system. For example, if the target operation and maintenance unit is operation and maintenance unit 2, the user instruction includes the identification information of operation and maintenance unit 2 and the idle resource analysis task indication, which is used to instruct operation and maintenance unit 1 to obtain the idle resource analysis result of operation and maintenance unit 2.
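The target selection rules above can be expressed as a small planning step. This sketch assumes hypothetical unit IDs and a `docked` set maintained by the headquarters unit. It also separates the local part of the task (when the headquarters unit is itself a target) from the part sent to remote units.

```python
from typing import Iterable, List, Optional, Set, Tuple


def plan_dispatch(
    hq: str,
    requested: Optional[Iterable[str]],
    all_units: Iterable[str],
    docked: Set[str],
    only_docked: bool = False,
) -> Tuple[bool, List[str]]:
    """Turn a user instruction into (run_locally, remote_targets).

    requested=None means "every unit in the cloud service system";
    only_docked=True keeps only units that have docked with the HQ.
    """
    pool = list(all_units) if requested is None else list(requested)
    if only_docked:
        pool = [u for u in pool if u == hq or u in docked]
    run_locally = hq in pool  # the HQ analyses its own cloud if it is a target
    remote = [u for u in pool if u != hq]
    return run_locally, remote
```

When only unit 2 is requested, the plan is `(False, ["unit-2"])`: the headquarters sends one task and performs no local analysis.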
  • operation and maintenance unit 1 sends an idle resource analysis task to operation and maintenance unit 2.
  • operation and maintenance unit 1 sends an idle resource analysis task to operation and maintenance unit 3.
  • operation and maintenance unit 2 sends the idle resource analysis result to operation and maintenance unit 1.
  • operation and maintenance unit 3 sends the idle resource analysis result to operation and maintenance unit 1.
  • the operation and maintenance unit 1 obtains a local idle resource analysis task.
  • the operation and maintenance unit 1 determines that the target operation and maintenance units are the operation and maintenance unit 2 and the operation and maintenance unit 3, and the operation and maintenance operation to be performed is to obtain the idle resource analysis results of the operation and maintenance unit 2 and the operation and maintenance unit 3. Accordingly, the operation and maintenance unit 1 sends resource analysis instruction information to the operation and maintenance unit 2 through the communication interface and the communication connection with the operation and maintenance unit 2.
  • the resource analysis instruction information includes the identification information of the operation and maintenance unit 2 and the idle resource analysis task instruction, which is used to instruct the operation and maintenance unit 2 to perform the idle resource analysis operation and feedback the idle resource analysis results.
  • the operation and maintenance unit 1 sends resource analysis indication information to the operation and maintenance unit 3 through the communication interface and the communication connection with the operation and maintenance unit 3.
  • the resource analysis indication information includes the identification information of the operation and maintenance unit 3 and the idle resource analysis task indication, which is used to instruct the operation and maintenance unit 3 to perform idle resource analysis and feed back the idle resource analysis results.
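A hedged sketch of the resource analysis indication and response messages, assuming JSON encoding (the source does not specify a wire format). The field names `target_id`, `task`, `hq_id`, `unit_id`, and `result` are hypothetical; they carry the identification information, task indication, and analysis results that this description attaches to the indication and response information.

```python
import json
from typing import Callable, Dict


def build_analysis_indication(target_id: str, task: str = "idle_resource_analysis") -> str:
    # Field names are hypothetical; the text only requires the target's
    # identification information and the analysis task indication.
    return json.dumps({"target_id": target_id, "task": task})


def handle_analysis_indication(
    raw: str, my_id: str, hq_id: str, analyzers: Dict[str, Callable[[], dict]]
) -> str:
    """Run the requested analysis locally and build the response message."""
    msg = json.loads(raw)
    if msg["target_id"] != my_id:
        raise ValueError("indication is addressed to another unit")
    result = analyzers[msg["task"]]()  # e.g. the idle resource analysis module
    return json.dumps(
        {"hq_id": hq_id, "unit_id": my_id, "task": msg["task"], "result": result}
    )
```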
  • before sending the idle resource analysis task to the operation and maintenance unit 2 and the operation and maintenance unit 3, the operation and maintenance unit 1 authenticates with the operation and maintenance unit 2 and the operation and maintenance unit 3.
  • the operation and maintenance unit 1 sends the authentication information to the operation and maintenance unit 2 through the communication connection, and the authentication information includes the user name and password of the operation and maintenance unit 2 (this information is obtained in the docking process of Figure 5).
  • after receiving the authentication information, the operation and maintenance unit 2 verifies the user name and password. After determining that the authentication is successful, it sends response information to the operation and maintenance unit 1 to indicate that the authentication succeeded.
  • after the operation and maintenance unit 1 receives the response information and determines that the authentication is successful, it can continue to execute S602a.
  • the processing of the operation and maintenance unit 3 is the same as that of the operation and maintenance unit 2, which will not be repeated here.
  • the operation and maintenance unit 1 can also send the operation and maintenance information of the operation and maintenance unit 1 to the operation and maintenance unit 2, including but not limited to: the user name, password and address information of the operation and maintenance unit 1.
  • if the operation and maintenance unit actively pulls data, that is, as shown in Figure 6, the operation and maintenance unit 1 actively requests data from the operation and maintenance unit 2, then during the authentication process the operation and maintenance unit 2 only needs to authenticate the operation and maintenance unit 1, that is, to determine whether the operation and maintenance unit 1 holds the user name and password of the operation and maintenance unit 2. In other words, the operation and maintenance unit 1 has the authority to obtain data from the operation and maintenance unit 2 only when it holds the user name and password of the operation and maintenance unit 2. In other examples, the operation and maintenance unit passively obtains data; for example, when the operation and maintenance unit 2 sends alarm information to the operation and maintenance unit 1, the operation and maintenance unit 1 passively receives the data.
  • in that case, the operation and maintenance unit 1 needs to authenticate the operation and maintenance unit 2: before sending data to the operation and maintenance unit 1, the operation and maintenance unit 2 sends the user name and password of the operation and maintenance unit 1 to the operation and maintenance unit 1 for authentication. After the operation and maintenance unit 1 determines that the authentication of the operation and maintenance unit 2 is successful, it sends response information to the operation and maintenance unit 2; based on the response information, the operation and maintenance unit 2 determines that the authentication is successful and sends the alarm information to the operation and maintenance unit 1.
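The pull and push cases differ only in which side checks credentials: the unit that hands out or accepts data verifies that the peer presents this unit's own user name and password, obtained during docking. A minimal sketch, assuming plain credential comparison (a production system would normally store password hashes); `Authenticator`, `pull`, and `push` are hypothetical names.

```python
import hmac
from typing import Callable, Tuple

Credentials = Tuple[str, str]  # (user name, password)


class Authenticator:
    """Credential check for one unit: a peer must present this unit's own
    user name and password before it may exchange data with it."""

    def __init__(self, username: str, password: str):
        self._username = username
        self._password = password

    def verify(self, username: str, password: str) -> bool:
        # compare_digest avoids leaking timing information about the secret.
        user_ok = hmac.compare_digest(username.encode(), self._username.encode())
        pass_ok = hmac.compare_digest(password.encode(), self._password.encode())
        return user_ok and pass_ok


def pull(requester_creds: Credentials, responder_auth: Authenticator, fetch: Callable[[], str]) -> str:
    """Active pull: the responder (e.g. unit 2) authenticates the requester."""
    if not responder_auth.verify(*requester_creds):
        raise PermissionError("authentication failed")
    return fetch()


def push(sender_creds: Credentials, receiver_auth: Authenticator,
         deliver: Callable[[str], None], payload: str) -> None:
    """Passive receipt: the receiver (e.g. unit 1) authenticates the sender."""
    if not receiver_auth.verify(*sender_creds):
        raise PermissionError("authentication failed")
    deliver(payload)
```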
  • the operation and maintenance unit 2 receives the resource analysis indication information through the communication interface and performs the idle resource analysis based on it.
  • the operation and maintenance unit 2 (specifically, the idle resource analysis module, which will not be described again below) periodically obtains the idle resource status, performs idle resource analysis, and obtains idle resource analysis results.
  • the period length can be set according to actual needs, which is not limited in this application.
  • the operation and maintenance unit 2 feeds back the most recently obtained idle resource analysis results to the operation and maintenance unit 1.
  • the operation and maintenance unit 2 can send resource analysis response information to the operation and maintenance unit 1 through the communication connection with the operation and maintenance unit 1.
  • the resource analysis response information includes but is not limited to: the identification information of the operation and maintenance unit 1, the identification information of the operation and maintenance unit 2, and the idle resource analysis results.
  • the operation and maintenance unit 2 periodically obtains the idle resource status. After receiving the resource analysis indication information, the operation and maintenance unit 2 performs idle resource analysis based on the latest idle resource status obtained, obtains the idle resource analysis result, and feeds back the idle resource analysis result to the operation and maintenance unit 1 through the communication connection with the operation and maintenance unit 1.
  • the operation and maintenance unit 2 can send resource analysis response information to the operation and maintenance unit 1 through the communication connection with the operation and maintenance unit 1, and the resource analysis response information includes but is not limited to: the identification information of the operation and maintenance unit 1, the identification information of the operation and maintenance unit 2, and the idle resource analysis result.
  • alternatively, after receiving the resource analysis indication information, the operation and maintenance unit 2 may obtain the idle resource status in the cloud, perform idle resource analysis to obtain the idle resource analysis result, and feed back the result to the operation and maintenance unit 1 through the communication connection with the operation and maintenance unit 1.
  • the operation and maintenance unit 2 may send resource analysis response information to the operation and maintenance unit 1 through the communication connection with the operation and maintenance unit 1, and the resource analysis response information includes but is not limited to: the identification information of the operation and maintenance unit 1, the identification information of the operation and maintenance unit 2, and the idle resource analysis result.
  • acquisition of the idle resource status by the operation and maintenance unit 2 may be triggered by receipt of the indication information, or may take place before the indication information is received, which is not limited in this application.
  • the processing of the operation and maintenance unit 3 is the same as that of the operation and maintenance unit 2, which will not be described in detail here. It should be noted that the order of S602a and S602b is not limited.
  • the target operation and maintenance unit also includes operation and maintenance unit 1. Accordingly, operation and maintenance unit 1 obtains idle resource analysis results based on the idle resource status of the local cloud (i.e., cloud 210).
  • the method for obtaining the idle resource analysis results can refer to the relevant description of the operation and maintenance unit 2 processing, which will not be repeated here.
  • the operation and maintenance unit 1 may display the obtained idle resource analysis results, including the idle resource analysis results of the operation and maintenance unit 1, the idle resource analysis results of the operation and maintenance unit 2, and the idle resource analysis results of the operation and maintenance unit 3.
  • the target operation and maintenance unit may be at least one operation and maintenance unit in the cloud service system. If the target operation and maintenance unit is operation and maintenance unit 2, operation and maintenance unit 1 executes S602a without executing S602b.
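On the headquarters side, merging the local result with the responses from remote units could look like the sketch below. It assumes each response is a dictionary with hypothetical `hq_id`, `unit_id`, and `result` fields, and that display is plain text; neither detail comes from the source.

```python
from typing import Dict, Iterable, Optional


def aggregate_results(hq_id: str, local_result: Optional[dict],
                      responses: Iterable[dict]) -> Dict[str, dict]:
    """Merge the HQ's own result (if it was a target) with the responses
    returned by remote units, keyed by unit ID."""
    merged: Dict[str, dict] = {}
    if local_result is not None:
        merged[hq_id] = local_result
    for resp in responses:
        if resp.get("hq_id") != hq_id:
            continue  # ignore responses addressed to a different HQ
        merged[resp["unit_id"]] = resp["result"]
    return merged


def render(merged: Dict[str, dict]) -> str:
    """One line per unit, sorted by unit ID, for display to the operator."""
    return "\n".join(f"{uid}: {res}" for uid, res in sorted(merged.items()))
```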
  • the operation and maintenance units can also publish and return other tasks, as well as other types of data such as alarm information (the return step may be optional). The implementation is similar to the process in Figure 6, and this application will not illustrate each case.
  • the headquarters operation and maintenance unit can initiate global analysis tasks.
  • the headquarters operation and maintenance unit can send an analysis task to at least one operation and maintenance unit.
  • after receiving the analysis task from the headquarters, each operation and maintenance unit performs the analysis task based on the resource status of its local cloud and transmits the analysis results back to the headquarters operation and maintenance unit. This reduces the computing pressure on the headquarters operation and maintenance unit and the communication overhead caused by communication interaction.
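Since the order of S602a and S602b is not limited, the headquarters unit can send the task to all targets concurrently. A minimal sketch using Python's `concurrent.futures.ThreadPoolExecutor`; `send_task` is a hypothetical placeholder for the network call, and recording failures separately is a design assumption so that one unreachable unit does not prevent the others from being collected.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def fan_out(
    targets: List[str],
    send_task: Callable[[str], dict],
    max_workers: int = 8,
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Send one analysis task to every target concurrently.

    Each target analyses its own cloud, so the HQ only collects compact
    results. Returns (results, failures).
    """
    results: Dict[str, dict] = {}
    failures: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {t: pool.submit(send_task, t) for t in targets}
        for target, future in futures.items():
            try:
                results[target] = future.result()
            except Exception as exc:  # e.g. connection errors raised by send_task
                failures[target] = repr(exc)
    return results, failures
```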
  • operation and maintenance personnel 1 instructs operation and maintenance unit 1 to perform knowledge sharing.
  • the operation and maintenance personnel 1 can send user instructions (which may also be called operation and maintenance instructions or operation and maintenance instruction information, etc., which are not limited in this application) to the operation and maintenance unit 1 through an electronic device.
  • the user instructions include but are not limited to: identification information of the target operation and maintenance unit and knowledge sharing instructions, which are used to instruct the operation and maintenance unit 1 to send operation and maintenance experience knowledge and/or automatic operation scripts and other information (hereinafter referred to as knowledge information) to the target operation and maintenance unit.
  • the description of the target operation and maintenance unit can be found above and will not be repeated here.
  • operation and maintenance unit 1 sends knowledge sharing indication information to operation and maintenance unit 2.
  • operation and maintenance unit 1 sends knowledge sharing indication information to operation and maintenance unit 3.
  • the operation and maintenance unit 2 learns the knowledge information.
  • operation and maintenance unit 2 learns knowledge information from operation and maintenance unit 1.
  • operation and maintenance unit 1 determines that the target operation and maintenance units are operation and maintenance unit 2 and operation and maintenance unit 3, and that the operation and maintenance operation to be performed is knowledge sharing. Accordingly, operation and maintenance unit 1 calls the knowledge base module in operation and maintenance unit 1 to obtain the corresponding knowledge information. Operation and maintenance unit 1 sends knowledge sharing indication information to operation and maintenance unit 2 through the communication interface and the communication connection with operation and maintenance unit 2.
  • the knowledge sharing indication information includes but is not limited to: identification information of operation and maintenance unit 2 and knowledge information of operation and maintenance unit 1.
  • the operation and maintenance unit 2 receives the knowledge sharing indication information through the communication interface.
  • the operation and maintenance unit 2 obtains the knowledge information based on the knowledge sharing indication information.
  • the operation and maintenance unit 2 can learn the knowledge information and, in the subsequent operation and maintenance process, perform corresponding operation and maintenance operations based on the learned operation and maintenance experience knowledge and/or automatic operation scripts.
  • the processing of operation and maintenance unit 3 is the same as that of operation and maintenance unit 2, and will not be repeated here.
  • any operation and maintenance unit in the cloud service system can actively (e.g., periodically) or passively (i.e., in response to instructions from operation and maintenance personnel) trigger knowledge sharing to send knowledge information to a target operation and maintenance unit.
  • the target operation and maintenance unit may be pre-set or may be instructed by the operation and maintenance personnel through user instructions, which is not limited in this application.
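Knowledge sharing can be sketched as exporting items from the source unit's knowledge base and merging them into each target's. The `KnowledgeItem` fields and the rule of keeping the higher `version` when both sides hold the same key are assumptions made for illustration; the source does not specify how conflicts are resolved.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable, List, Optional


@dataclass
class KnowledgeItem:
    key: str          # hypothetical identifier, e.g. a fault signature
    experience: str   # operation and maintenance experience knowledge
    script: str = ""  # optional automatic operation script, kept as text
    version: int = 1  # hypothetical version used to resolve conflicts


@dataclass
class KnowledgeBase:
    items: Dict[str, KnowledgeItem] = field(default_factory=dict)

    def export(self, keys: Optional[Iterable[str]] = None) -> List[KnowledgeItem]:
        """Copy the selected items (all items if keys is None) for sharing."""
        selected = self.items.keys() if keys is None else keys
        return [replace(self.items[k]) for k in selected]

    def learn(self, shared: Iterable[KnowledgeItem]) -> int:
        """Merge shared items, keeping the higher version of each key.
        Returns how many items were added or updated."""
        changed = 0
        for item in shared:
            current = self.items.get(item.key)
            if current is None or item.version > current.version:
                self.items[item.key] = item
                changed += 1
        return changed


def share_knowledge(source: KnowledgeBase, targets: Iterable[KnowledgeBase]) -> List[int]:
    """Send the source's knowledge to each target knowledge base."""
    return [target.learn(source.export()) for target in targets]
```

Calling `share_knowledge` on a timer would correspond to active sharing; calling it when an operator instruction arrives would correspond to passive sharing.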
  • Figure 8 is a schematic diagram of an exemplary scenario in which the headquarters operation and maintenance unit is abnormal. Please refer to Figure 8.
  • This scenario includes the cloud service system shown in Figure 2, wherein the cloud service system also includes a cloud 250 located in Hunan, and an operation and maintenance unit 251 is deployed on cloud 250.
  • the operation and maintenance unit 211 serves as the headquarters operation and maintenance unit, and a fault occurs in it.
  • the cause of the failure of the operation and maintenance unit may be a communication interface failure or a module failure, which is not limited in this application.
  • Figure 9 is a schematic diagram of an exemplary scenario of an abnormality of the headquarters operation and maintenance unit.
  • the Henan operation and maintenance personnel can instruct the operation and maintenance unit 221 in the cloud 220 to act as the headquarters role in the cloud service system.
  • the Henan operation and maintenance personnel can access the operation and maintenance unit 221 through an electronic device, and send a user instruction to the operation and maintenance unit 221 through the electronic device.
  • the user instruction is used to instruct the operation and maintenance unit 221 to act as the headquarters operation and maintenance unit.
  • the operation and maintenance unit 221 receives the user instruction and determines to act as the headquarters role.
  • the operation and maintenance unit 221 docks with the other operation and maintenance units in the cloud service system.
  • the operation and maintenance unit 221 is docked with the operation and maintenance unit 251 and establishes a communication connection.
  • for the specific method, refer to the docking description of Figure 5, which will not be repeated here.
  • since operation and maintenance unit 221 has already docked with operation and maintenance unit 231 and operation and maintenance unit 241 and established communication connections with them, it does not need to repeat the docking steps with these two units.
  • the operation and maintenance unit 221 can take over the operation and maintenance operations previously performed by the operation and maintenance unit 211. For example, based on the instructions of the operation and maintenance personnel, the operation and maintenance unit 221 can issue operation and maintenance operations such as idle resource analysis tasks to the operation and maintenance unit 251, the operation and maintenance unit 231, and the operation and maintenance unit 241. For the specific implementation, refer to the relevant description of the headquarters operation and maintenance unit above, which will not be repeated here.
  • when the operation and maintenance unit assigned the "headquarters" role by the operation and maintenance personnel fails during the operation of the cloud service system, the operation and maintenance personnel can, as needed, select another equivalent operation and maintenance unit in the cloud service system as the headquarters role, so that it assumes the capabilities of the original headquarters operation and maintenance unit while the failed unit is isolated.
  • the new headquarters operation and maintenance unit can still manage all operation and maintenance units in the cloud service system other than the original headquarters operation and maintenance unit, thereby enabling rapid takeover and switchover of the headquarters operation and maintenance capabilities.
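The switchover can be reduced to a set computation: the new headquarters unit manages every unit except itself and the isolated failed unit, and only needs to dock with those it is not yet connected to. A hypothetical sketch using the unit numbers from this scenario; the `connections` bookkeeping is assumed.

```python
from typing import Dict, Set, Tuple


def promote_headquarters(
    new_hq: str,
    failed_hq: str,
    all_units: Set[str],
    connections: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Return (units the new HQ will manage, units it still has to dock with).

    connections maps each unit ID to the IDs it already has a communication
    connection with. The failed HQ is isolated, so it is excluded from both sets.
    """
    managed = all_units - {new_hq, failed_hq}
    already_connected = connections.get(new_hq, set())
    return managed, managed - already_connected
```

With unit 221 already connected to units 231 and 241, only unit 251 remains to be docked.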
  • Figure 10 is a schematic diagram of a scenario in which the operation and maintenance unit communication is abnormal.
  • the scenario includes the cloud service system shown in Figure 2, wherein the cloud service system also includes a cloud 250 located in Hunan, and an operation and maintenance unit 251 is deployed on the cloud 250.
  • the operation and maintenance unit 211 serves as the headquarters operation and maintenance unit, and an abnormality occurs in the communication connection between the operation and maintenance unit 251 and the operation and maintenance unit 211.
  • since the operation and maintenance unit 251 is an independent operation and maintenance unit, it can still continue to manage the local cloud platform; for example, it can continue to obtain the resource status of the local cloud and perform resource status analysis.
  • the operation and maintenance unit 251 cannot receive the relevant instructions issued by the headquarters operation and maintenance unit.
  • the headquarters operation and maintenance unit i.e., the operation and maintenance unit 211
  • the headquarters operation and maintenance unit may try to periodically send instructions to the operation and maintenance unit 251 to detect whether the communication connection is normal. After determining that the communication connection is normal, the relevant instructions may be issued to the operation and maintenance unit 251.
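The recovery behavior in FIG. 10 can be sketched in Python as shown below. In this scenario the branch keeps running locally, while the headquarters periodically probes the link and issues its queued instructions once the link is normal again. This is a minimal illustration under stated assumptions. `HeadquartersLink`, `is_reachable`, and `send` are hypothetical names that stand in for whatever probe and transport a real deployment uses. A timer or scheduler is also assumed to call `probe_and_flush` periodically.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass
class HeadquartersLink:
    """Headquarters-side view of one branch unit (hypothetical helper, e.g. for unit 251)."""

    unit_id: str
    is_reachable: Callable[[], bool]  # assumed probe, e.g. a heartbeat instruction with a timeout
    send: Callable[[str], None]       # assumed transport that delivers one instruction
    pending: Deque[str] = field(default_factory=deque)

    def issue(self, instruction: str) -> None:
        # Queue the instruction; it is delivered only after the link is confirmed normal.
        self.pending.append(instruction)

    def probe_and_flush(self) -> bool:
        """One probing cycle, meant to be called periodically by a timer.

        Returns True if the link was normal and all queued instructions were sent.
        """
        if not self.is_reachable():
            return False  # instructions stay queued; unit 251 keeps managing its cloud locally
        while self.pending:
            self.send(self.pending.popleft())
        return True
```

Queuing on the headquarters side, instead of dropping instructions, keeps the branch fully autonomous while the link is down.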
  • FIG. 11 is a schematic block diagram of a communication device 110 according to an embodiment of the present application.
  • The communication device 110 may include a processor 1101 and a transceiver/transceiver pin 1102, and optionally a memory 1103.
  • The processor 1101 may be used to execute the steps performed by the rendering device in each method of the foregoing embodiments, to control the receiving pin to receive signals, and to control the sending pin to send signals.
  • The components of the device 1100 are coupled together via a bus system 1104, which includes a power bus, a control bus, and a status signal bus in addition to a data bus.
  • For clarity, the various buses are all labeled as the bus system 1104 in the figure.
  • The memory 1103 may be used to store instructions in the foregoing method embodiments.
  • The device 1100 may correspond to the rendering device in each method of the foregoing embodiments. The foregoing and other management operations and/or functions of the elements in the device 1100 are respectively used to implement the corresponding steps of the foregoing methods. For brevity, details are not repeated here.
  • an embodiment of the present application also provides a computer-readable storage medium, which stores a computer program.
  • the computer program includes at least one code segment, and the at least one code segment can be executed by a device to control the device to implement the above method embodiment.
  • the embodiment of the present application also provides a computer program, which is used to implement the above method embodiment when executed by a device or a computer cluster.
  • the program may be stored in whole or in part on a storage medium packaged together with the processor, or may be stored in whole or in part on a memory not packaged together with the processor.
  • the embodiment of the present application further provides a processor, which is used to implement the above method embodiment.
  • the above processor can be a chip.
  • an embodiment of the present application also provides a computing device cluster (also referred to as a server cluster, cloud infrastructure, or cloud device cluster, etc.), including at least one computing device (such as a server), each computing device including a processor and a memory; the processor of at least one computing device is used to execute instructions stored in the memory of at least one computing device, so that the computing device executes the above-mentioned method embodiment.
  • the steps of the method or algorithm described in conjunction with the disclosed content of the embodiments of the present application can be implemented in hardware or by executing software instructions by a processor.
  • the software instructions can be composed of corresponding software modules, and the software modules can be stored in random access memory (Random Access Memory, RAM), flash memory, read-only memory (Read Only Memory, ROM), erasable programmable read-only memory (Erasable Programmable ROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), registers, hard disks, mobile hard disks, read-only compact disks (CD-ROMs) or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to a processor so that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium can also be a component of the processor.
  • the processor and the storage medium can be located in an ASIC.
  • Computer-readable media include computer storage media and communication media, wherein the communication media include any media that facilitates the transmission of a computer program from one place to another.
  • the storage medium can be any available medium that a general or special-purpose computer can access.
  • The term "A and/or B" in this document merely describes an association relationship between associated objects and indicates that three relationships may exist.
  • a and/or B can mean: A exists alone, A and B exist at the same time, and B exists alone.
  • first and second in the description and claims of the embodiments of the present application are used to distinguish different objects rather than to describe a specific order of objects.
  • a first target object and a second target object are used to distinguish different target objects rather than to describe a specific order of target objects.
  • words such as “exemplary” or “for example” are used to indicate examples, illustrations or descriptions. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present application should not be interpreted as being more preferred or more advantageous than other embodiments or designs. Specifically, the use of words such as “exemplary” or “for example” is intended to present related concepts in a specific way.
  • multiple refers to two or more than two.
  • multiple processing units refer to two or more processing units; multiple systems refer to two or more systems.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

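The embodiments of the present application provide a communication method and a cloud service system. In the cloud service system, operation and maintenance (O&M) units are deployed in a distributed manner. One O&M unit is deployed in each resource pool, and each O&M unit is responsible for managing its corresponding resource pool. Each O&M unit can perform O&M operations on its corresponding resource pool based on received user instructions. Based on this distributed O&M unit architecture, the present application enables each O&M unit to run and be managed independently. It provides a peer-to-peer O&M management mechanism in which each O&M unit only needs to handle its own O&M management needs. This effectively reduces the O&M pressure of single-point management and improves the overall stability and reliability of the system.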

Description

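Communication Method and Cloud Service System
Technical Field
The embodiments of the present application relate to the field of cloud technology, and in particular, to a communication method and a cloud service system.
Background
With the rapid development of cloud computing, cloud services are used in an increasingly wide range of scenarios. Enterprises and institutions in all industries can build private clouds based on their own business needs. A cloud management platform can provide unified management of the private clouds of an enterprise or institution, and O&M personnel can manage the private clouds through the user entry provided by the cloud management platform.
However, as cloud services become increasingly complex, enterprises and institutions build more and more private clouds, and their overall scale keeps growing. Their deployment form also tends to mirror the customer's organizational structure more closely. The cloud management platform in the prior art has a relatively monolithic structure and cannot meet the requirements of different scenarios.
Summary
The embodiments of the present application provide a communication method and a cloud service system. Through O&M units with a distributed structure, the method enables each individual resource pool to be managed independently, so as to adapt to the requirements of different scenarios.
According to a first aspect, an embodiment of the present application provides a cloud service system. The cloud service system includes multiple resource pools. A single resource pool among the multiple resource pools is used to provide cloud services, and an O&M unit is deployed in each of the multiple resource pools. The multiple O&M units exchange data through communication connections. Each of the multiple O&M units is used to manage one resource pool, namely the resource pool to which that O&M unit belongs. In the present application, each O&M unit manages the resource pool to which it belongs in two steps. First, it receives a user instruction. Second, in response to the user instruction, it performs a target O&M operation on that resource pool. In this way, based on distributed O&M units, each O&M unit runs independently as a separate cloud management platform, and the management of each cloud (that is, resource pool) no longer depends on an overall cloud management platform. This realizes a peer-to-peer O&M management mechanism in which each O&M unit only needs to handle its own management needs. As a result, the O&M pressure of single-point management is effectively reduced, the overall stability and reliability of the system are improved, and the system can be managed flexibly.
For example, a resource pool may also be called a cloud, cloud resources, or a cloud facility. A resource pool includes but is not limited to software resources and hardware resources. The software resources include but are not limited to at least one cloud platform, and the hardware resources include but are not limited to infrastructure such as server clusters.
For example, the cloud services provided by the multiple resource pools may be the same or different.
For example, the communication connections between the O&M units may follow a private communication protocol.
For example, each O&M unit provides a user interface for receiving user instructions.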
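In a possible implementation, the multiple O&M units include a first O&M unit and at least one second O&M unit. The first O&M unit is deployed in a first resource pool, and the at least one second O&M unit is deployed in at least one second resource pool respectively. The first O&M unit is configured to receive first O&M operation indication information, which indicates obtaining the resource status evaluation results of the multiple resource pools. The first O&M unit is further configured to send first task request information to the at least one second O&M unit based on the first O&M operation indication information. The first task request information instructs the second O&M unit to feed back the resource status evaluation result of its corresponding second resource pool. The second O&M unit is configured to respond to the received first task request information in three steps. First, it obtains the resource status of the second resource pool. Second, it obtains the resource status evaluation result of the second resource pool based on that resource status. Third, it sends first task response information to the first O&M unit, and this response includes the resource status evaluation result of the second resource pool. The first O&M unit is further configured to receive the first task response information fed back by the at least one second O&M unit, and to obtain the resource status evaluation result of the at least one second resource pool based on the first task response information. In this way, with distributed O&M units, the master node only needs to publish tasks, and each O&M unit executes its own task. This reduces the computing burden on the master node. In addition, because only results are transmitted during the interaction, communication overhead is effectively reduced.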
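The following is a minimal Python sketch of this fan-out. It assumes that in-process method calls stand in for the communication connections, and that the "evaluation" is simply per-resource utilization (used divided by total). `FirstUnit`, `SecondUnit`, `TaskRequest`, and `TaskResponse` are hypothetical names. The point the sketch shows is that each second unit evaluates its own pool, and only the evaluation result travels back to the first unit.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Status = Dict[str, Tuple[float, float]]  # resource name -> (used, total)


@dataclass
class TaskRequest:
    task: str  # hypothetical task name, e.g. "resource_status_evaluation"


@dataclass
class TaskResponse:
    unit_id: str
    evaluation: Dict[str, float]  # only the evaluation result is returned, not raw data


class SecondUnit:
    def __init__(self, unit_id: str, read_status: Callable[[], Status]):
        self.unit_id = unit_id
        self.read_status = read_status  # assumed collector for this unit's own resource pool

    def handle(self, req: TaskRequest) -> TaskResponse:
        status = self.read_status()  # obtain the resource status of the local pool
        # Evaluate locally: utilization = used / total per resource (an assumed metric).
        evaluation = {name: used / total for name, (used, total) in status.items()}
        return TaskResponse(self.unit_id, evaluation)


class FirstUnit:
    def __init__(self, peers: List[SecondUnit]):
        self.peers = peers

    def evaluate_all(self) -> Dict[str, Dict[str, float]]:
        """Send one task request to every peer in parallel and collect the responses."""
        req = TaskRequest("resource_status_evaluation")
        with ThreadPoolExecutor() as pool:
            responses = list(pool.map(lambda peer: peer.handle(req), self.peers))
        return {resp.unit_id: resp.evaluation for resp in responses}
```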
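In a possible implementation, before the first O&M unit receives the first O&M operation indication information, the first O&M unit is further configured as follows. It receives first user indication information, which indicates that the first O&M unit acts as the management node. In response to the first user indication information, it sends first docking request information to each second O&M unit, which indicates that the first O&M unit acts as the management node of the cloud service system. The second O&M unit is further configured to respond to the received first docking request information. It determines that the first O&M unit is the management node and that the second O&M unit itself is a managed node. It then sends first docking response information, which includes the O&M information of the second O&M unit, to the first O&M unit. In this way, the distributed structure lets the user designate a master node (that is, the management node, also called the headquarters). The master node dispatches tasks indicated by the user to the O&M units, and each O&M unit executes its own task and feeds the result back to the master node. The user can freely set the master node according to scenario requirements, so that the cloud service system better fits user needs.
In a possible implementation, if the first O&M unit becomes abnormal, a third O&M unit among the at least one second O&M unit is configured as follows. It receives second user indication information, which indicates that the third O&M unit acts as the management node. In response to the second user indication information, it sends second docking request information to each second O&M unit, which indicates that the third O&M unit acts as the management node of the cloud service system. The second O&M unit is further configured to respond to the received second docking request information. It determines that the third O&M unit is the management node and that the second O&M unit itself is a managed node. It then sends second docking response information, which includes the O&M information of the second O&M unit, to the third O&M unit. In this way, the system in the embodiments can designate the headquarters according to user needs, and if the headquarters becomes abnormal, the user can designate a new one. The new headquarters continues to perform the headquarters functions. Because the O&M units are peers with the same capabilities, any O&M unit can inherit the headquarters role, which improves the fault tolerance and flexibility of the system.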
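Because all units are peers, any unit can initiate a docking exchange, so the switchover can be sketched as such an exchange. The Python below is illustrative only. The class and field names are hypothetical, the `om_info` contents are reduced to an address, and direct method calls replace real docking request and response messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DockingRequest:
    manager_id: str  # the unit that declares itself the management node


@dataclass
class DockingResponse:
    unit_id: str
    om_info: Dict[str, str]  # O&M information of the responder (fields are assumptions)


@dataclass
class OMUnit:
    unit_id: str
    address: str
    manager_id: Optional[str] = None
    managed: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def on_docking_request(self, req: DockingRequest) -> DockingResponse:
        # Accept the sender as management node; this unit becomes a managed node.
        self.manager_id = req.manager_id
        return DockingResponse(self.unit_id, {"address": self.address})

    def become_manager(self, peers: List["OMUnit"]) -> None:
        """Called when the user designates this unit as the management node."""
        self.manager_id = self.unit_id
        self.managed.clear()
        for peer in peers:  # the failed former headquarters is simply not included
            resp = peer.on_docking_request(DockingRequest(self.unit_id))
            self.managed[resp.unit_id] = resp.om_info
```

In practice, the user excludes the failed headquarters from the peer list. This corresponds to isolating it, as described for FIG. 8 and FIG. 9.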
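In a possible implementation, the third O&M unit is further configured to receive second O&M operation indication information, which indicates obtaining the resource status evaluation results of the multiple resource pools. The third O&M unit is further configured to send second task request information to the at least one second O&M unit based on the second O&M operation indication information. The second task request information instructs the at least one second O&M unit to feed back the resource status evaluation result of its corresponding second resource pool. Based on the second O&M operation indication information, the third O&M unit is further configured to obtain the resource status of a third resource pool, in which the third O&M unit is deployed. It then obtains the resource status evaluation result of the third resource pool based on that resource status. The second O&M unit is configured to respond to the received second task request information in three steps. First, it obtains the resource status of the second resource pool. Second, it obtains the resource status evaluation result of the second resource pool based on that resource status. Third, it sends second task response information to the third O&M unit, and this response includes the resource status evaluation result of the second resource pool. The third O&M unit is further configured to receive the second task response information fed back by the at least one second O&M unit, and to obtain the resource status evaluation result of the at least one second resource pool based on the second task response information. In this way, in the distributed O&M unit architecture, when the headquarters is abnormal, the headquarters can be switched as designated by the user, and the headquarters functions continue to be performed.
In a possible implementation, if communication between any of the second O&M units and the first O&M unit is abnormal, the second resource pool is managed by the second O&M unit that has the communication anomaly. In this way, based on the distributed O&M unit architecture, each O&M unit runs independently and manages its corresponding cloud resources. If a communication anomaly occurs between any O&M unit and the master O&M unit, O&M personnel can still access the affected O&M node and manage its cloud resources. Local access and local data management by O&M personnel are therefore not affected.
In a possible implementation, the multiple O&M units include a third O&M unit and a fourth O&M unit. The third O&M unit is configured to receive a first unit update instruction and to update a specified O&M capability of the third O&M unit based on that instruction. The fourth O&M unit is configured to receive a second unit update instruction and to update a specified O&M capability of the fourth O&M unit based on that instruction. The O&M capabilities to be updated, as indicated by the first and second unit update instructions, may be the same or different. In this way, based on the distributed O&M unit architecture in the embodiments, the O&M units are independent of each other. A single O&M unit can therefore evolve and iterate on its own to meet users' private customization needs, which increases the flexibility and applicability of each O&M unit. Moreover, because data is isolated between O&M units, an update to any O&M unit takes effect only for that unit and does not affect the others.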
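Because each unit owns its capability set, an update instruction can be modeled as a set difference and union applied to one unit only. In this hypothetical Python sketch, the module names are placeholders that loosely follow the "module 1" and report-management examples given later in this description. `UnitUpdateInstruction` is an assumed message shape.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set

# Placeholder module names; every unit starts from its own copy of this set.
DEFAULT_MODULES: FrozenSet[str] = frozenset({"resource_mgmt", "report_mgmt", "capacity_analysis"})


@dataclass
class UnitUpdateInstruction:
    add: FrozenSet[str] = frozenset()     # capabilities to add
    remove: FrozenSet[str] = frozenset()  # capabilities to delete


@dataclass
class OMUnit:
    unit_id: str
    modules: Set[str] = field(default_factory=lambda: set(DEFAULT_MODULES))

    def apply_update(self, instr: UnitUpdateInstruction) -> None:
        # Each unit holds its own module set, so this change never touches another unit.
        self.modules -= instr.remove
        self.modules |= instr.add
```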
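In a possible implementation, the cloud service system further includes server clusters. Each of the multiple resource pools includes at least one server cluster, and the servers in a server cluster may be of the same or different models. In this way, the distributed O&M units in the embodiments can be deployed on existing server clusters, providing a general-purpose O&M management system.
According to a second aspect, an embodiment of the present application provides a communication method. The method is applied to a cloud service system that includes multiple resource pools. A single resource pool among the multiple resource pools is used to provide cloud services. An O&M unit is deployed in each of the multiple resource pools, and the multiple O&M units exchange data through communication connections. Each of the multiple O&M units is used to manage one resource pool, namely the resource pool to which that O&M unit belongs. The method includes the following steps. An O&M unit receives a user instruction. In response to the user instruction, the O&M unit performs a target O&M operation on the resource pool to which it belongs. In this way, based on distributed O&M units, each O&M unit runs independently as a separate cloud management platform, and the management of each cloud (that is, resource pool) no longer depends on an overall cloud management platform. This realizes a peer-to-peer O&M management mechanism in which each O&M unit only needs to handle its own management needs. As a result, the O&M pressure of single-point management is effectively reduced, the overall stability and reliability of the system are improved, and the system can be managed flexibly.
For example, a resource pool may also be called a cloud, cloud resources, or a cloud facility. A resource pool includes but is not limited to software resources and hardware resources. The software resources include but are not limited to at least one cloud platform, and the hardware resources include but are not limited to infrastructure such as server clusters.
For example, the cloud services provided by the multiple resource pools may be the same or different.
For example, the communication connections between the O&M units may follow a private communication protocol.
For example, each O&M unit provides a user interface for receiving user instructions.
In a possible implementation, the multiple O&M units include a first O&M unit and at least one second O&M unit. The first O&M unit is deployed in a first resource pool, and the at least one second O&M unit is deployed in at least one second resource pool respectively. The method further includes the following steps. The first O&M unit receives first O&M operation indication information, which indicates obtaining the resource status evaluation results of the multiple resource pools. Based on that information, the first O&M unit sends first task request information to the at least one second O&M unit, instructing the second O&M unit to feed back the resource status evaluation result of its corresponding second resource pool. In response to the received first task request information, the second O&M unit obtains the resource status of the second resource pool and derives the pool's resource status evaluation result from it. The second O&M unit then sends first task response information, which includes that evaluation result, to the first O&M unit. The first O&M unit receives the first task response information fed back by the at least one second O&M unit, and obtains the resource status evaluation result of the at least one second resource pool based on it. In this way, with distributed O&M units, the master node only needs to publish tasks, and each O&M unit executes its own task. This reduces the computing burden on the master node. In addition, because only results are transmitted during the interaction, communication overhead is effectively reduced.
In a possible implementation, before the first O&M unit receives the first O&M operation indication information, the method further includes the following steps. The first O&M unit receives first user indication information, which indicates that the first O&M unit acts as the management node. In response, the first O&M unit sends first docking request information to each second O&M unit, which indicates that the first O&M unit acts as the management node of the cloud service system. In response to the received first docking request information, the second O&M unit determines that the first O&M unit is the management node and that the second O&M unit itself is a managed node. The second O&M unit then sends first docking response information, which includes its O&M information, to the first O&M unit. In this way, the distributed structure lets the user designate a master node (that is, the management node, also called the headquarters). The master node dispatches tasks indicated by the user to the O&M units, and each O&M unit executes its own task and feeds the result back to the master node. The user can freely set the master node according to scenario requirements, so that the cloud service system better fits user needs.
In a possible implementation, the method further includes the following steps, which apply if the first O&M unit becomes abnormal. A third O&M unit among the at least one second O&M unit receives second user indication information, which indicates that the third O&M unit acts as the management node. In response, the third O&M unit sends second docking request information to each second O&M unit, which indicates that the third O&M unit acts as the management node of the cloud service system. In response to the received second docking request information, the second O&M unit determines that the third O&M unit is the management node and that the second O&M unit itself is a managed node. The second O&M unit then sends second docking response information, which includes its O&M information, to the third O&M unit. In this way, in the distributed O&M unit architecture, when the headquarters is abnormal, the headquarters can be switched as designated by the user, and the headquarters functions continue to be performed.
In a possible implementation, the third O&M unit receives second O&M operation indication information, which indicates obtaining the resource status evaluation results of the multiple resource pools. Based on that information, the third O&M unit sends second task request information to the at least one second O&M unit, instructing it to feed back the resource status evaluation result of its corresponding second resource pool. Also based on the second O&M operation indication information, the third O&M unit obtains the resource status of a third resource pool, in which it is deployed, and derives the third pool's resource status evaluation result from that status. In response to the received second task request information, the second O&M unit obtains the resource status of the second resource pool and derives the pool's resource status evaluation result from it. The second O&M unit then sends second task response information, which includes that evaluation result, to the third O&M unit. The third O&M unit receives the second task response information fed back by the at least one second O&M unit, and obtains the resource status evaluation result of the at least one second resource pool based on it. In this way, in the distributed O&M unit architecture, when the headquarters is abnormal, the headquarters can be switched as designated by the user, and the headquarters functions continue to be performed.
In a possible implementation, if communication between any of the second O&M units and the first O&M unit is abnormal, the second resource pool is managed by the second O&M unit that has the communication anomaly. In this way, based on the distributed O&M unit architecture, each O&M unit runs independently and manages its corresponding cloud resources. If a communication anomaly occurs between any O&M unit and the master O&M unit, O&M personnel can still access the affected O&M node and manage its cloud resources. Local access and local data management by O&M personnel are therefore not affected.
In a possible implementation, the multiple O&M units include a third O&M unit and a fourth O&M unit, and the method further includes the following steps. The third O&M unit receives a first unit update instruction and updates its specified O&M capability based on that instruction. The fourth O&M unit receives a second unit update instruction and updates its specified O&M capability based on that instruction. The O&M capabilities to be updated, as indicated by the first and second unit update instructions, may be the same or different. In this way, based on the distributed O&M unit architecture in the embodiments, the O&M units are independent of each other. A single O&M unit can therefore evolve and iterate on its own to meet users' private customization needs, which increases the flexibility and applicability of each O&M unit. Moreover, because data is isolated between O&M units, an update to any O&M unit takes effect only for that unit and does not affect the others.
According to a third aspect, an embodiment of the present application provides a communication method applied to a cloud service system. The cloud service system includes multiple resource pools, and a single resource pool among them is used to provide cloud services. An O&M unit is deployed in each of the multiple resource pools, and the multiple O&M units exchange data through communication connections. Each O&M unit manages one resource pool, namely the resource pool to which it belongs. The multiple O&M units include a first O&M unit deployed in a first resource pool and at least one second O&M unit deployed in at least one second resource pool respectively. The method includes the following steps. The first O&M unit receives first O&M operation indication information, which indicates obtaining the resource status evaluation results of the multiple resource pools. Based on that information, the first O&M unit sends first task request information to the at least one second O&M unit, instructing the second O&M unit to feed back the resource status evaluation result of its corresponding second resource pool. In response to the received first task request information, the second O&M unit obtains the resource status of the second resource pool and derives the pool's resource status evaluation result from it. The second O&M unit then sends first task response information, which includes that evaluation result, to the first O&M unit. The first O&M unit receives the first task response information fed back by the at least one second O&M unit, and obtains the resource status evaluation result of the at least one second resource pool based on it. In this way, based on distributed O&M units, each O&M unit runs independently as a separate cloud management platform, and the management of each cloud (that is, resource pool) no longer depends on an overall cloud management platform. This realizes a peer-to-peer O&M management mechanism in which each O&M unit only needs to handle its own management needs. As a result, the O&M pressure of single-point management is effectively reduced, the overall stability and reliability of the system are improved, and the system can be managed flexibly. Moreover, the master node only needs to publish tasks, and each O&M unit executes its own task. This reduces the computing burden on the master node. In addition, because only results are transmitted during the interaction, communication overhead is effectively reduced.
In a possible implementation, before the first O&M unit receives the first O&M operation indication information, the method further includes the following steps. The first O&M unit receives first user indication information, which indicates that the first O&M unit acts as the management node. In response, the first O&M unit sends first docking request information to each second O&M unit, which indicates that the first O&M unit acts as the management node of the cloud service system. In response to the received first docking request information, the second O&M unit determines that the first O&M unit is the management node and that the second O&M unit itself is a managed node. The second O&M unit then sends first docking response information, which includes its O&M information, to the first O&M unit.
In a possible implementation, the method further includes the following steps, which apply if the first O&M unit becomes abnormal. A third O&M unit among the at least one second O&M unit receives second user indication information, which indicates that the third O&M unit acts as the management node. In response, the third O&M unit sends second docking request information to each second O&M unit, which indicates that the third O&M unit acts as the management node of the cloud service system. In response to the received second docking request information, the second O&M unit determines that the third O&M unit is the management node and that the second O&M unit itself is a managed node. The second O&M unit then sends second docking response information, which includes its O&M information, to the third O&M unit.
In a possible implementation, the third O&M unit receives second O&M operation indication information, which indicates obtaining the resource status evaluation results of the multiple resource pools. Based on that information, the third O&M unit sends second task request information to the at least one second O&M unit, instructing it to feed back the resource status evaluation result of its corresponding second resource pool. Also based on the second O&M operation indication information, the third O&M unit obtains the resource status of a third resource pool, in which it is deployed, and derives the third pool's resource status evaluation result from that status. In response to the received second task request information, the second O&M unit obtains the resource status of the second resource pool and derives the pool's resource status evaluation result from it. The second O&M unit then sends second task response information, which includes that evaluation result, to the third O&M unit. The third O&M unit receives the second task response information fed back by the at least one second O&M unit, and obtains the resource status evaluation result of the at least one second resource pool based on it.
In a possible implementation, if communication between any of the second O&M units and the first O&M unit is abnormal, the second resource pool is managed by the second O&M unit that has the communication anomaly.
In a possible implementation, the multiple O&M units include a third O&M unit and a fourth O&M unit, and the method further includes the following steps. The third O&M unit receives a first unit update instruction and updates its specified O&M capability based on that instruction. The fourth O&M unit receives a second unit update instruction and updates its specified O&M capability based on that instruction. The O&M capabilities to be updated, as indicated by the first and second unit update instructions, may be the same or different.
According to a fourth aspect, an embodiment of the present application provides a communication method applied to a cloud service system. The cloud service system includes multiple resource pools, and a single resource pool among them is used to provide cloud services. An O&M unit is deployed in each of the multiple resource pools, and the multiple O&M units exchange data through communication connections. Each O&M unit manages one resource pool, namely the resource pool to which it belongs. The multiple O&M units include a first O&M unit deployed in a first resource pool and at least one second O&M unit deployed in at least one second resource pool respectively. The method includes the following steps. The first O&M unit receives first user indication information, which indicates that the first O&M unit acts as the management node. In the cloud service system, the management node publishes resource analysis tasks to the at least one second O&M unit. A second O&M unit that receives a resource analysis task performs a resource analysis O&M operation and feeds back the resource analysis result to the first O&M unit. If the first O&M unit becomes abnormal, a third O&M unit among the at least one second O&M unit receives second user indication information, which indicates that the third O&M unit acts as the management node.
In a possible implementation, after the first O&M unit receives the first user indication information, the method further includes the following steps. In response to the first user indication information, the first O&M unit sends first docking request information to each second O&M unit, which indicates that the first O&M unit acts as the management node of the cloud service system. In response to the received first docking request information, the second O&M unit determines that the first O&M unit is the management node and that the second O&M unit itself is a managed node. The second O&M unit then sends first docking response information, which includes its O&M information, to the first O&M unit.
In a possible implementation, after the first O&M unit receives the first user indication information, the method further includes the following steps. The first O&M unit receives first O&M operation indication information, which indicates obtaining the resource status evaluation results of the multiple resource pools. Based on that information, the first O&M unit sends first task request information to the at least one second O&M unit, instructing the second O&M unit to feed back the resource status evaluation result of its corresponding second resource pool. In response to the received first task request information, the second O&M unit obtains the resource status of the second resource pool and derives the pool's resource status evaluation result from it. The second O&M unit then sends first task response information, which includes that evaluation result, to the first O&M unit. The first O&M unit receives the first task response information fed back by the at least one second O&M unit, and obtains the resource status evaluation result of the at least one second resource pool based on it.
In a possible implementation, after the third O&M unit receives the second user indication information, the method further includes the following steps. In response to the second user indication information, the third O&M unit sends second docking request information to each second O&M unit, which indicates that the third O&M unit acts as the management node of the cloud service system. In response to the received second docking request information, the second O&M unit determines that the third O&M unit is the management node and that the second O&M unit itself is a managed node. The second O&M unit then sends second docking response information, which includes its O&M information, to the third O&M unit.
In a possible implementation, after the third O&M unit receives the second user indication information, the method further includes the following steps. The third O&M unit receives second O&M operation indication information, which indicates obtaining the resource status evaluation results of the multiple resource pools. Based on that information, the third O&M unit sends second task request information to the at least one second O&M unit, instructing it to feed back the resource status evaluation result of its corresponding second resource pool. Also based on the second O&M operation indication information, the third O&M unit obtains the resource status of a third resource pool, in which it is deployed, and derives the third pool's resource status evaluation result from that status. In response to the received second task request information, the second O&M unit obtains the resource status of the second resource pool and derives the pool's resource status evaluation result from it. The second O&M unit then sends second task response information, which includes that evaluation result, to the third O&M unit. The third O&M unit receives the second task response information fed back by the at least one second O&M unit, and obtains the resource status evaluation result of the at least one second resource pool based on it.
In a possible implementation, if communication between any of the second O&M units and the first O&M unit is abnormal, the second resource pool is managed by the second O&M unit that has the communication anomaly.
According to a fifth aspect, an embodiment of the present application provides a communication method applied to a cloud service system. The cloud service system includes multiple resource pools, and a single resource pool among them is used to provide cloud services. An O&M unit is deployed in each of the multiple resource pools, and the multiple O&M units exchange data through communication connections. Each O&M unit manages one resource pool, namely the resource pool to which it belongs, and all of the O&M units have the same O&M capabilities. The method includes the following steps. Any O&M unit among the multiple O&M units receives a unit update instruction. That O&M unit then updates its own specified O&M capability based on the unit update instruction.
In a possible implementation, the unit update instruction indicates deleting and/or adding a specified O&M capability.
According to a sixth aspect, an embodiment of the present application provides a computer program product including instructions. When the instructions are run by a computer device cluster, the computer device cluster performs the method provided in the second aspect or any possible design of the second aspect.
According to a seventh aspect, an embodiment of the present application provides a computer program product including instructions. When the instructions are run by a computer device cluster, the computer device cluster performs the method provided in the third aspect or any possible design of the third aspect.
According to an eighth aspect, an embodiment of the present application provides a computer program product including instructions. When the instructions are run by a computer device cluster, the computer device cluster performs the method provided in the fourth aspect or any possible design of the fourth aspect.
According to a ninth aspect, an embodiment of the present application provides a computer program product including instructions. When the instructions are run by a computer device cluster, the computer device cluster performs the method provided in the fifth aspect or any possible design of the fifth aspect.
According to a tenth aspect, an embodiment of the present application provides a computer-readable storage medium including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method provided in the second aspect or any possible design of the second aspect.
According to an eleventh aspect, an embodiment of the present application provides a computer-readable storage medium including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method provided in the third aspect or any possible design of the third aspect.
According to a twelfth aspect, an embodiment of the present application provides a computer-readable storage medium including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method provided in the fourth aspect or any possible design of the fourth aspect.
According to a thirteenth aspect, an embodiment of the present application provides a computer-readable storage medium including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method provided in the fifth aspect or any possible design of the fifth aspect.
According to a fourteenth aspect, an embodiment of the present application provides a computing device cluster that includes at least one computing device, each including a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device performs the method provided in any possible design of the second to fifth aspects.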
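Brief Description of Drawings
FIG. 1 is a schematic diagram of an exemplary cloud service system scenario;
FIG. 2 is a schematic structural diagram of an exemplary cloud service system according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an exemplary cloud;
FIG. 4 is a schematic structural diagram of an exemplary O&M unit;
FIG. 5 is a schematic flowchart of an exemplary cloud service system initialization;
FIG. 6 is a schematic flowchart of an exemplary interaction between O&M units;
FIG. 7 is a schematic flowchart of an exemplary interaction between O&M units;
FIG. 8 is a schematic diagram of an exemplary scenario in which the headquarters O&M unit is abnormal;
FIG. 9 is a schematic diagram of an exemplary scenario in which the headquarters O&M unit is abnormal;
FIG. 10 is a schematic diagram of an exemplary scenario in which O&M unit communication is abnormal;
FIG. 11 is a schematic structural diagram of an exemplary apparatus.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings. The described embodiments are merely some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative efforts shall fall within the protection scope of the present application.
To help those skilled in the art better understand the technical solutions in the present application, the related background technologies are first briefly described.
Cloud platform: also called a cloud system, cloud environment, or cloud. It is a software system through which a cloud provider offers cloud technology (also called cloud computing) services, and it provides interfaces related to cloud services so that tenants can access cloud services remotely. A tenant can log in to the cloud platform on a cloud service access page with a pre-registered account and password. After logging in, the tenant can select and purchase cloud services on that page, such as object storage services, virtual machine services, and container services.
Public cloud: a cloud platform provided by a third-party public cloud provider to individuals or enterprises at large. In a public cloud, the hardware, software, and other infrastructure are owned and managed by the third-party public cloud provider.
Private cloud: a dedicated cloud platform provided for one enterprise or organization. A private cloud can be operated internally by that enterprise or organization. Private clouds mainly serve enterprise users and are also called enterprise clouds.
Hybrid cloud: a cloud platform formed from different cloud platforms. A hybrid cloud includes at least two cloud platforms and is also called a multi-cloud platform or multi-cloud. Optionally, a hybrid cloud combines a public cloud and a private cloud. For security reasons, some enterprise users prefer to store data in a private cloud while still obtaining the computing resources of a public cloud. For this reason, hybrid clouds that include both public and private clouds are increasingly adopted. A hybrid cloud mixes and matches public and private clouds to achieve good results.
The clouds involved in the embodiments of the present application may be public clouds, private clouds, and/or hybrid clouds, which is not limited in the present application.
In the embodiments of the present application, a cloud may also be called a resource pool or a cloud facility, which is not limited in the present application.
Cloud management platform: a product for managing public, private, and hybrid clouds. It may take the form of software and provides enterprises with services for managing multiple cloud infrastructures. A cloud management platform can provide a unified entry (also called an O&M entry). Through this entry, O&M personnel access the platform and manage the cloud resources (including hardware and software) of the corresponding clouds.
Service: includes computing services, storage services, network services, and the like. Any device or function that a user device can access on a cloud platform can be regarded as a service provided by the cloud platform.
Resource: a hardware or software resource used to provide a service. For example, corresponding to computing, storage, or network services, resources include computing, storage, or network resources. Optionally, computing resources include central processing unit (CPU) resources, memory resources, and/or hard disk resources.
In existing technical solutions, a customer usually builds a single unified cloud management platform. This platform manages the cloud resources (including hardware and software) of the headquarters (also called the primary cloud) and of each branch (also called a subordinate cloud) to provide O&M capabilities for the clouds. FIG. 1 is a schematic diagram of an exemplary cloud service system scenario. Referring to FIG. 1, the cloud service system includes but is not limited to a cloud management platform 100 and multiple clouds.
In this example, each cloud is assumed to include one cloud platform. For example, the multiple clouds include but are not limited to a cloud 110 in Shanghai, a cloud 120 in Hunan, a cloud 130 in Henan, a cloud 140 in Zhengzhou, and a cloud 150 in Luoyang.
In the embodiments of the present application, a single cloud may include software resources and hardware resources. This can also be understood as a cloud facility (or resource pool) that includes cloud infrastructure and software resources. The hardware resources (also called cloud infrastructure or basic resources) include but are not limited to multiple server clusters. The software resources include but are not limited to the application software corresponding to the cloud platform. Optionally, a cloud platform may also be understood as including both software resources (the resources related to cloud services) and hardware resources. That is, a cloud platform is a platform virtualized by deploying cloud platform software on basic resources (hardware resources), and the service resources it provides are built on the hardware. In other words, a cloud can be understood as a cloud platform, or a cloud may include multiple cloud platforms, each including cloud service resources and basic (hardware) resources. Accordingly, management of a cloud by the cloud management platform is management of its cloud platforms. To better describe the correspondence between the cloud management platform and the clouds, the embodiments of the present application assume that the cloud management platform manages the cloud resources of a cloud (including cloud platforms and hardware). In practice, this can also be understood as the cloud management platform corresponding to, and managing, multiple cloud platforms.
For example, a server cluster includes one or more devices. The devices include but are not limited to network devices, security devices, and computing devices, which is not limited in the present application.
For example, take the cloud 110 (which may be called a cloud facility or resource pool). Its hardware resources (that is, its cloud infrastructure) are located in Shanghai, for example in a data center in Shanghai. The embodiments of the present application are described using division by region only to express the independence of the clouds. The number of clouds and the way they are divided (for example, by region) are merely illustrative and are not limited in the present application.
For example, the services provided by the clouds may be the same or different, which is not limited in the present application.
In this scenario, according to user requirements, the cloud management platform 100 is deployed in the cloud 110. That is, the cloud 110 serves as the primary cloud (also called the headquarters cloud), and the other clouds are subordinate clouds of the cloud 110 (also called sub-clouds or branches). For example, in this scenario, the cloud management platform centrally manages the cloud facilities of all branches in the customer's cloud service system. It also supports the O&M personnel of each branch in performing daily O&M on their local cloud facilities. For example, each branch in the cloud service system has its own local cloud facility (including hardware and software resources, such as server clusters and cloud platforms). The headquarters' local cloud is usually equivalent to those deployed by the branches. While operating and maintaining its own local cloud facility, the headquarters usually needs a global overview of how the branch cloud facilities are running, but each branch is responsible for its own specific daily O&M operations.
In this scenario, all O&M personnel (for example, O&M personnel in Shanghai, Hunan, Henan, Luoyang, and Zhengzhou) need to access the cloud management platform 100 to perform O&M operations on their corresponding clouds.
For example, the cloud management platform supports permission separation and domain separation. Specifically:
Permission separation: the customer's different O&M personnel all perform daily O&M on the cloud management platform. To ensure that each person has only the minimum operation permissions required, the cloud management platform provides permission separation. This allows different O&M permissions to be granted to different O&M personnel.
Domain separation: the customer's different O&M personnel all perform daily O&M on the cloud management platform. To ensure that each person can operate only on the O&M objects within their own scope of responsibility, the cloud management platform provides domain separation. This allows different ranges of O&M objects to be assigned to different O&M personnel.
In other words, with permission and domain separation, after the O&M personnel of a branch enter the cloud management platform, they see only the data of the clouds within their own permissions and domains. They do not see the data of other branches' clouds. By using permission and domain separation, the prior-art cloud management platform lets the O&M personnel of all branches perform their daily O&M on one platform.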
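The permission and domain separation described above amounts to a two-part check on every instruction. The following Python sketch is a hypothetical illustration in which the operator, action, and cloud names are placeholders. It is not the check used by any particular cloud management platform.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Operator:
    name: str
    permissions: FrozenSet[str]  # actions the operator may perform (permission separation)
    domains: FrozenSet[str]      # clouds the operator may operate on (domain separation)


def authorize(op: Operator, action: str, target_cloud: str) -> bool:
    """Allow an instruction only if both the action and the target cloud are in scope."""
    return action in op.permissions and target_cloud in op.domains
```

For example, an operator whose domain is only the Zhengzhou cloud 140 is refused any instruction aimed at the cloud 110, even for an action they are otherwise permitted to perform.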
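For example, take the O&M personnel in Zhengzhou. They can connect an electronic device (such as a mobile phone, tablet, wearable device, or computer) to a network device, such as a gateway, that is connected to the cloud 110. The network device may be located in Zhengzhou or in Shanghai, which is not limited in the present application. Over this connection, the Zhengzhou O&M personnel send a user instruction (also called an O&M instruction or O&M operation instruction, which is not limited in the present application) through the network device to the cloud management platform on the cloud 110. The instruction is used to obtain the resource analysis result of the Zhengzhou cloud platform. Because of permission and domain separation, instructions issued by the Zhengzhou O&M personnel can target only the Zhengzhou cloud 140.
In response to the received user instruction, the cloud management platform sends a resource acquisition instruction to the cloud 140 (that is, the cloud platform in Zhengzhou) to obtain resource data of the Zhengzhou cloud platform. The resource data may be hardware resource data and/or software resource data, which is not limited in the present application. Communication between the cloud 140 and the cloud 110 may be direct, with the cloud 110 sending the instruction to the cloud 140. It may also be indirect, for example with the cloud 110 sending the instruction to the cloud 130, which forwards it to the cloud 140. This is not limited in the present application. In response to the received resource acquisition instruction, the cloud 140 feeds its resource status back to the cloud 110. For example, the cloud 140 may transmit the hardware running status of each server in the Zhengzhou server cluster to the cloud 110 over the communication connection. The cloud management platform on the cloud 110 performs resource analysis on the resource status received from the cloud 140 to obtain a resource analysis result, and may display the result on a display interface. Processing for the other cloud platforms is similar to that for the cloud 140 and is not repeated here.
As described above, the O&M personnel of each branch (which can also be understood as a branch cloud) must access the cloud management platform through the communication connection between their location and the platform. All of their instructions must be sent to the platform over that connection. In this scenario, if the connection between a branch's cloud platform and the primary cloud is broken, the electronic devices of the branch's O&M personnel cannot access the cloud management platform. The branch's O&M personnel therefore cannot manage the local cloud platform. In other words, the branch's management of its local cloud platform depends on the communication connection to the cloud management platform. Once that connection becomes abnormal, the cloud platform cannot be managed normally.
In addition, in this scenario, once the globally unified cloud management platform becomes abnormal, the O&M work of all branches is affected.
Moreover, because the cloud management platform at the headquarters must provide O&M capabilities to the O&M personnel of all branches, it must manage all O&M data globally and bear the full O&M data load. The platform therefore easily becomes a bottleneck, and this bottleneck affects the O&M services of all branches.
Furthermore, service enhancements or customizations made on the cloud management platform usually take effect globally. For example, when an O&M function of the platform is enhanced, the enhancement applies to all branches. Consequently, when a branch has special O&M requirements, personalized customization of O&M capabilities cannot be provided.
It should be noted that, in the present application, the cloud management platform is used to manage at least one cloud in the cloud service system. Clouds without management requirements need not be managed.
The embodiments of the present application provide a cloud service system and a corresponding communication method. The cloud service system is built as a distributed cloud service system based on O&M units, and the O&M units are peers of one another. Cloud management in the system can be customized individually, with the O&M unit as the smallest unit, which gives each unit its own scalability. It can be understood that each O&M unit in the embodiments is equivalent to a cloud management platform that manages its corresponding cloud.
Optionally, a cloud platform is a platform virtualized by deploying cloud platform software on basic resources (that is, hardware resources), and the service resources it provides are built on the hardware. That is, a cloud platform includes cloud service resources (that is, software resources) and basic resources (that is, hardware resources). Management of a cloud platform by an O&M unit means management of the cloud platform's cloud resources.
It should be noted that, in the embodiments of the present application, management of a cloud by an O&M unit is management of its cloud platforms. To better describe the correspondence between O&M units and clouds, the embodiments assume that each O&M unit corresponds to one cloud. That is, one O&M unit is deployed on one cloud and manages the cloud resources (including cloud platforms and hardware) of that cloud. In practice, this can also be understood as an O&M unit corresponding to, and managing, at least one cloud platform. In current cloud technology, a cloud platform is equivalent to a cloud; to distinguish the objects that O&M units manage, the embodiments consistently describe an O&M unit as managing a cloud. In addition, some current implementations treat both hardware resources and software resources (that is, cloud service resources) as cloud resources of the cloud platform. In the embodiments of the present application, the data obtained by an O&M unit includes data on both cloud service resources and basic (hardware) resources. The cloud platform and the hardware resources are therefore distinguished, and both are treated as parts of the cloud's resources. This is not repeated below.
FIG. 2 is a schematic structural diagram of an exemplary cloud service system according to an embodiment of the present application. Referring to FIG. 2, the system includes but is not limited to multiple clouds. Examples include a cloud 210 in Shanghai, a cloud 220 in Henan, a cloud 230 in Zhengzhou, and a cloud 240 in Luoyang. The number of clouds in FIG. 2 and the way they are divided are merely illustrative and are not limited in the present application.
For example, each cloud also includes a corresponding O&M unit. The cloud 210 includes an O&M unit 211, the cloud 220 includes an O&M unit 221, the cloud 230 includes an O&M unit 231, and the cloud 240 includes an O&M unit 241. It should be noted that the O&M units in the embodiments are deployed in the cloud and can be understood as part of the cloud. That is, an O&M unit can be described as contained in, or belonging to, its corresponding cloud. Optionally, an O&M unit can also be understood as a management layer above the cloud that manages the entire cloud rather than being part of it.
In the embodiments of the present application, a single cloud may include software resources and hardware resources. The software resources include but are not limited to cloud platforms, and the hardware resources include but are not limited to at least one server cluster. For example, a server cluster includes one or more devices, including but not limited to network devices, security devices, and computing devices, which is not limited in the present application. It can be understood that a cloud virtualizes the hardware resources of at least one server cluster into cloud service resources to provide corresponding cloud services to users. The cloud services include but are not limited to storage services and computing services, which is not limited in the present application.
For example, each server cluster in the embodiments includes at least one device, and the devices may be of the same or different models. Different models may be different models from the same vendor or from different vendors, which is not limited in the present application.
For example, each cloud in the embodiments may be provided by a third-party vendor, and the vendors of the clouds may be the same or different. Optionally, all clouds may come from the same vendor; for example, all may be Huawei Cloud. This is not limited in the present application.
It should be noted that each cloud must support the communication policy of the embodiments (for example, support the corresponding communication interfaces). It must also support the required environment deployment (for example, installation and running of the O&M unit).
That is, the communication method in the embodiments can be applied to an existing cloud service system, and only the software-level installation and running of the O&M units needs to be implemented.
In the embodiments of the present application, an O&M unit can be understood as application software or program instructions. O&M personnel can install and run the software package corresponding to the O&M unit on each cloud (for example, the cloud 210, cloud 220, cloud 230, and cloud 240) to deploy and run the O&M unit on that cloud. Each O&M unit can be understood as an independent cloud management platform that manages the cloud resources of its corresponding cloud. The cloud resources include but are not limited to software resources and hardware resources. The hardware resources are the server clusters described above, and the software resources include but are not limited to cloud platforms and application software deployed on the cloud.
"Management" in the embodiments includes but is not limited to O&M capabilities such as querying resource status, analyzing resource status, and file management. Specific examples are described in the embodiments below.
To better illustrate the relationship between O&M units and clouds, the following description uses the schematic structural diagram of a cloud shown in FIG. 3. Referring to FIG. 3, the cloud includes but is not limited to the following:
Take the cloud 210 as an example; the other clouds have a similar structure and are not described one by one. The cloud resources of the cloud 210 include but are not limited to software resources and hardware resources. The software resources include but are not limited to at least one cloud platform, for example a cloud platform 212 and a cloud platform 213. A cloud platform is a software system through which a cloud provider offers cloud technology services, and it provides interfaces related to cloud services so that tenants can access cloud services remotely. A tenant can log in to the cloud platform on a cloud service access page with a pre-registered account and password. After logging in, the tenant can select and purchase cloud services on that page, such as object storage services, virtual machine services, and container services.
For example, data is isolated between cloud platforms. That is, different cloud platforms serve different objects, and the services they provide may be the same or different. This is not limited in the present application.
Still referring to FIG. 3, the hardware resources 214 among the cloud resources include but are not limited to server clusters. They may further include, for example, the storage media, memory, and CPUs (central processing units) in the server clusters.
The O&M unit 211 is used to manage the cloud resources in the cloud 210. These include the multiple cloud platforms (the cloud platform 212 and the cloud platform 213) and the corresponding hardware resources 214, which include multiple server clusters. For example, the cloud platform 212 is deployed on server cluster 1, and the cloud platform 213 is deployed on server cluster 2. The cloud platform 212 can provide users with corresponding cloud service resources, such as cloud storage and cloud computing resources, based on the basic resources provided by server cluster 1. Likewise, the cloud platform 213 can provide cloud service resources based on the basic resources provided by server cluster 2. As described above, in cloud technology a cloud platform is equivalent to a cloud. For example, depending on customer requirements, the cloud platform 212 and the cloud platform 213 in FIG. 3 may together be treated as one cloud, or as one cloud platform, in the organizational structure of the cloud service system. Optionally, the hardware resources may also be regarded as resources of the cloud platform. To better distinguish the data of different resources and their correspondence with O&M units, the present application makes three assumptions. One cloud includes at least one cloud platform, one cloud corresponds to one O&M unit, and the cloud resources of a cloud include cloud platforms and hardware resources. That is, an O&M unit manages at least one cloud platform and the hardware resources in its cloud.
For example, in the deployment phase, an O&M unit can configure interfaces to the cloud platforms and to each server in the server clusters. Through these interfaces, it exchanges data with the cloud platforms and servers to obtain cloud platform data and server data, and thereby manages the cloud resources.
Still referring to FIG. 2, for example, in the embodiments of the present application, each O&M unit acts as an independent cloud management platform, and all O&M units are peers. Accordingly, the O&M units can establish communication connections with each other to exchange data. In practice, O&M personnel can establish communication connections between O&M units as required so that the units can communicate with each other. Taking the system shown in FIG. 2 as an example, a communication connection is established between the O&M unit 211 and the O&M unit 221, so that they can exchange data. The O&M unit 211 also establishes communication connections with the O&M unit 231 and the O&M unit 241, so that it can exchange data with both.
In the embodiments of the present application, the O&M units are peers, meaning that every O&M unit serves the same purpose, namely management, as an independent cloud management platform. In practice, however, the O&M units can be arranged into an organizational structure according to user requirements, with each unit taking on a corresponding role. Taking the scenario shown in FIG. 2 as an example, the O&M unit 211 can act as the headquarters (also called the master O&M unit) of the cloud service system, as instructed by the Shanghai O&M personnel. In the embodiments, the master O&M unit can publish tasks to other O&M units and aggregate the task results. Correspondingly, the other O&M units perform the corresponding O&M operations based on the tasks published by the master O&M unit. Specific examples are described below.
The O&M unit in the embodiments of the present application is described in detail below with reference to the schematic structural diagram of an O&M unit shown in FIG. 4. Referring to FIG. 4, an O&M unit includes but is not limited to three groups of capabilities. The first group is O&M feature capabilities: a system/data interface, a CMDB (Configuration Management Database), a data analysis platform, user management, report management, a display dashboard, automated jobs, a knowledge base, resource management, performance management, log management, and alarm management. The second group is data analysis capabilities that support distributed analysis: capacity analysis, load analysis, idle resource analysis, and bottleneck resource analysis. The third is a communication interface used with other O&M units for receiving and publishing distributed analysis tasks, collecting and reporting data, and distributing and sharing O&M experience and knowledge.
In the embodiments of the present application, the O&M unit has the following characteristics:
a. One O&M unit can manage one or more cloud platforms.
For example, as shown in FIG. 3, one cloud may include one or more cloud platforms, and correspondingly, the O&M unit deployed on that cloud can manage one or more cloud platforms. In this way, the O&M unit acts as an independent cloud management platform that manages only the local cloud platform or platforms. This refines the granularity of O&M management to individual clouds and improves the overall flexibility of O&M management in the cloud service system.
b. O&M units are peers of each other.
For example, the O&M units are peers, meaning that every O&M unit serves the same purpose, namely management, as an independent cloud management platform. In this way, the O&M units are peers that exist independently of each other, and an anomaly in any O&M unit does not affect the normal operation of the others.
c. An O&M unit can evolve and iterate independently.
In the embodiments of the present application, the O&M units are independent of each other, so each O&M unit can add or delete specific O&M functions according to user requirements. Optionally, adding or deleting a specific O&M function may mean adding or deleting an existing module of the O&M unit. For example, the Shanghai O&M personnel can add a module 1 to the O&M unit 211, where module 1 performs a specified O&M operation. For another example, the Henan O&M personnel can delete the report management module from the O&M unit 221 to remove its report-related O&M functions. Optionally, adding or deleting a specific O&M function may also mean adding or deleting part of the functions of an existing module. This is not limited in the present application. In this way, with the distributed structure, the O&M units are independent and can evolve and iterate on their own without changes taking effect globally. This enables personalized customization, improves the overall flexibility of system O&M, and meets user requirements in different scenarios.
d. An O&M unit can receive analysis tasks published by one or more other O&M units. It can also initiate analysis tasks to one or more other O&M units and collect O&M data.
In the embodiments of the present application, any O&M unit can send analysis tasks to at least one O&M unit with which it has established a communication connection. An O&M unit that receives an analysis task can feed the analysis result back to the sender over that connection. Because the O&M units are peers, each of them can both publish tasks and feed back task results, which allows the system to cope with a failure of the headquarters (that is, the master O&M unit). Moreover, because analysis tasks are dispatched to and executed by each O&M unit, the tasks are completed in a distributed manner. Each node only needs to complete its own analysis task, instead of all tasks being concentrated on one node. This improves the efficiency of completing analysis tasks, reduces communication between clouds, and lowers communication overhead.
e. An O&M unit is not limited to any specific cloud vendor or vendors. It only needs to have the related "O&M feature capabilities" described above and to comply with the communication protocol.
In this way, the embodiments can be applied to different hardware and software environments, which broadens the range of application scenarios.
Still referring to FIG. 4, the specific O&M functions of the O&M unit are described in detail below. The O&M unit includes but is not limited to:
1) System/data interface: provides interfaces through which the O&M unit actively collects or passively receives O&M data related to O&M objects (that is, clouds or cloud platforms). The O&M data includes but is not limited to resource objects, alarms, performance indicators, and logs.
2) CMDB: a configuration database of the various resource objects in the entire cloud and the relationships among them. This module stores instances of the resource objects at each layer (hardware, virtualization, software, application, and so on) of the local cloud, that is, the cloud to which the O&M unit belongs. It also stores the relationship data between those instances.
3) Data analysis platform: a database that stores time-series data, such as performance indicators and logs, of the local cloud (that is, the cloud to which the O&M unit belongs). The data is kept in a unified format used by all upper-layer modules. In other words, this module converts externally collected or reported data into a specific format, so that the upper-layer modules of the O&M unit can receive information in that format and obtain the corresponding data.
4) Resource management: a feature module that provides resource lists. For example, the O&M unit can call this module to obtain a resource list that includes each resource object in the cloud.
5) Performance management: a feature module that provides performance-indicator capabilities. For example, the O&M unit can call this module to display performance indicators, set performance thresholds, and configure performance collection tasks.
6) Alarm management: a feature module that provides alarm management capabilities.
7) Log management: a feature module that provides management capabilities for system logs, security logs, operation logs, and run logs.
8) Automated jobs: a feature module that provides routine automated O&M operation capabilities.
9) Knowledge base: a case library and knowledge base feature module for accumulating and consolidating methods for handling common faults.
10) Report management: a feature module that provides report management capabilities. For example, the O&M unit can call the report management module to view, modify, delete, and customize reports and to set periodic report tasks.
11) Display dashboard: a feature module that serves large-screen display requirements. For example, the O&M unit can provide an O&M interface. The display dashboard module supports running that interface and displays related O&M parameters and resource parameters.
12) Load analysis: a feature module that analyzes the load of the various resources in the cloud. In the embodiments of the present application, the load analysis module can periodically obtain the load of the various resources in the cloud and perform load analysis. Optionally, the module can instead obtain the load status after receiving an instruction and then perform load analysis. The instruction may be sent by the master O&M unit or issued by local O&M personnel, which is not limited in the present application. For example, the load includes but is not limited to the hardware load of the server clusters, such as CPU load and memory load, which is not limited in the present application.
13) Capacity analysis: a feature module that analyzes capacity information, such as the used and unused capacity of each resource type. It covers the various basic resources in the cloud (that is, hardware resources such as computing and storage resources) and the cloud service resources provided by the cloud platforms. In the embodiments, the capacity analysis module can periodically obtain the status of the basic resources and the cloud service capacity information in the cloud, and perform capacity analysis. Optionally, the module can instead obtain this information after receiving an instruction and then perform capacity analysis. The instruction may be sent by the master O&M unit or issued by local O&M personnel, which is not limited in the present application.
14) Idle resource analysis (also called spare resource analysis): a feature module that analyzes the various idle resources. For example, the O&M unit can call the idle resource analysis module to obtain and analyze how idle each resource is. In the embodiments, the idle resource analysis module can periodically obtain the idle resources in the cloud and perform idle resource analysis. Optionally, the module can instead obtain the idle resources after receiving an instruction and then perform the analysis. The instruction may be sent by the master O&M unit or issued by local O&M personnel, which is not limited in the present application. For example, the O&M unit can call the idle resource analysis module to obtain the CPU idle status of each server in the server clusters. It then analyzes that data to obtain the overall CPU idle status in the cloud. It can be understood that the result of idle resource analysis indicates which resources in the cloud have low utilization.
15) Bottleneck resource analysis: a feature module, provided to customers by the cloud management platform, that analyzes the various bottleneck resources. For example, the O&M unit can call the bottleneck resource analysis module to obtain the usage of each resource and perform bottleneck resource analysis. In the embodiments, the module can periodically obtain the usage of the various resources in the cloud and perform bottleneck resource analysis. Optionally, the module can instead obtain the usage after receiving an instruction and then perform the analysis. The instruction may be sent by the master O&M unit or issued by local O&M personnel, which is not limited in the present application. It can be understood that the result of bottleneck resource analysis indicates which resources in the cloud have high utilization, and resources with excessively high utilization may become bottlenecks. For example, suppose bottleneck resource analysis shows that storage utilization is too high. The O&M unit can then predict that storage may become a resource bottleneck of the local cloud and may later fail to meet user needs. The O&M unit can then prompt the user, through an alarm or notification, to expand the storage resources.
16) User management: a feature module related to user management and authentication.
17) Communication interface: provides an interface for exchanging data with other O&M units. The data exchange includes but is not limited to "analysis task dispatch and result return", "data collection or reporting", and "O&M experience distribution and sharing", detailed as follows:
A. Analysis task dispatch and result return: dispatching analytical tasks, such as capacity analysis, load analysis, idle resource analysis, and bottleneck resource analysis tasks, and returning (that is, feeding back) the corresponding analysis results. In the embodiments, the communication interface can receive analysis tasks dispatched to it by other O&M units and can also initiate analysis tasks that it needs to other O&M units. Because the performance indicator data is distributed locally across the O&M units, a single-point bottleneck for performance indicator data is avoided. When the O&M unit in the "headquarters" role (that is, the master O&M unit) needs analysis results, it only needs to distribute the analysis tasks synchronously to the peer O&M units it is interested in. The headquarters O&M unit sends analysis tasks through its communication interface to the target O&M units. The tasks travel over its communication connections with those units (also called communication channels), which follow the distributed O&M communication protocol. The target O&M units may be any one or more O&M units in the cloud service system. After receiving an analysis task through its communication interface, a target O&M unit performs the analysis locally. It then returns the analysis result to the headquarters O&M unit through the communication interface and its connection with the headquarters. After receiving the analysis results returned by at least one O&M unit through its communication interface, the headquarters unit can aggregate and display them.
B. Data collection or reporting: collecting and reporting resource list data and alarm list data.
C. O&M experience distribution and sharing: distributing and sharing automated job scripts, the O&M (experience) knowledge base, and similar data. For example, every O&M unit in the cloud service system that has the O&M experience distribution and sharing module can share such data. Based on a received user instruction, an O&M unit can publish its automated job scripts, O&M (experience) knowledge base, and similar data to other O&M units. The receiving O&M unit can learn from this data, so that during operation it performs related O&M operations based on what it has learned.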
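The following Python sketch roughly illustrates how the idle resource and bottleneck resource analysis results relate. Idle resources have low utilization, bottleneck resources have high utilization, and likely bottlenecks trigger an expansion prompt. The 10% and 90% thresholds and the resource names are assumptions, because the description speaks only of low and high utilization.

```python
from typing import Dict, List, Tuple

# Assumed thresholds; the description does not give concrete values.
IDLE_THRESHOLD = 0.10
BOTTLENECK_THRESHOLD = 0.90


def classify(utilization: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split resources into idle (under-used) and bottleneck (over-used) lists."""
    idle = sorted(r for r, u in utilization.items() if u < IDLE_THRESHOLD)
    bottleneck = sorted(r for r, u in utilization.items() if u > BOTTLENECK_THRESHOLD)
    return idle, bottleneck


def expansion_alerts(bottleneck: List[str]) -> List[str]:
    # Mirrors the prompt to expand, e.g., storage when its utilization is too high.
    return [f"resource '{r}' may become a bottleneck; consider expanding it" for r in bottleneck]
```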
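In the embodiments of the present application, the communication interface is where the interaction capabilities between O&M units are concentrated. Packets sent through the communication interface follow the distributed O&M communication protocol, so packets sent by all O&M units have the same structure. An O&M unit can listen for packets through its communication interface, correctly parse packets that follow the protocol, and obtain the data they carry. For example, the protocol itself is symmetric, and there is no client/server distinction between O&M units. Only in the customer's actual deployment and use is the organizational structure of the cloud service system set according to user requirements, with each unit playing the role that meets the user's needs. As described in item A above, the communication interface can both receive and initiate analysis tasks. The headquarters O&M unit only needs to dispatch analysis tasks to the target O&M units and aggregate the returned results. This relieves single-point pressure on the cloud service system. Through the distributed structure, analysis computing tasks are dispatched to and executed by each O&M unit, which improves how analysis tasks are executed. It also reduces the volume of data exchanged between clouds, effectively lowering communication overhead and saving air-interface resources.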
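Because the protocol is symmetric, the same encode and decode routines can run on every unit, whether the unit is sending a task or returning a result. The Python sketch below assumes a JSON encoding and a hypothetical four-field packet. The actual layout of the distributed O&M communication protocol is not specified in this description.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class OMPacket:
    """Hypothetical packet layout shared by all units (no client/server roles)."""

    msg_type: str            # e.g. "analysis_task", "analysis_result", "knowledge_share"
    sender: str
    receiver: str
    payload: Dict[str, Any]


def encode(pkt: OMPacket) -> bytes:
    # Every unit serializes packets the same way, so any unit can parse any other's packets.
    return json.dumps(asdict(pkt), sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> OMPacket:
    return OMPacket(**json.loads(raw.decode("utf-8")))
```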
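In summary, the O&M unit in the embodiments of the present application contains feature modules for all O&M service areas, including user management. This is sufficient for the O&M unit to form a closed loop internally and to support the local O&M team in its daily O&M work. It also ensures the unit's independence: it can evolve and iterate on its own, and different O&M units may run different versions and come from different vendors. Customized enhancements to the specific capabilities of a particular O&M unit instance affect only that unit and have no effect on other O&M units. In addition, heavy analysis tasks can be computed in a distributed manner, which avoids a single-point resource bottleneck for analysis tasks. The O&M units can also exchange data such as automated job scripts and O&M experience knowledge bases through the communication connections and communication interfaces. In this way, the O&M experience accumulated by each unit is shared.
In a possible implementation, at least one O&M unit may not establish a communication connection with the other O&M units in the cloud service system. It should be noted that such an O&M unit does not exchange data with the other O&M units, but it still belongs, like them, to clouds rented by the same customer. Therefore, even without communication, it still belongs to the same cloud service system as the other O&M units.
In another possible implementation, at least one O&M unit may be under heavy load, and performing the analysis for each task itself may further increase the load on its computing resources. In this scenario, that O&M unit can optionally send the resource status of its local cloud to another O&M unit with a lower load. Alternatively, it can send the status to, for example, the O&M unit with the best communication quality, which is not limited in the present application. The other O&M unit then performs the resource analysis to obtain a resource analysis result and reports it to the headquarters O&M unit. The resource analysis result may include identification information indicating that it corresponds to the original O&M unit, that is, the O&M unit that sent the resource status.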
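Below is a hypothetical Python sketch of this offloading. The busy unit picks the least-loaded peer, which is one of the two selection criteria mentioned; link quality is not modeled. The helper then tags its result with the originating unit. The `Peer` type, the load metric, and the trivial "busiest resource" analysis are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Peer:
    unit_id: str
    load: float  # current load of the peer, 0.0 to 1.0 (assumed metric)


def pick_helper(peers: List[Peer]) -> Peer:
    # Choose the least-loaded peer; the description also allows "best link quality".
    return min(peers, key=lambda p: p.load)


def analyze_on_behalf(helper: Peer, origin_id: str, status: Dict[str, float]) -> Dict[str, object]:
    """Runs on the helper: analyze another unit's status and tag the result with its origin."""
    busiest = max(status, key=status.get)  # placeholder analysis: most utilized resource
    return {"origin_unit": origin_id, "analyzed_by": helper.unit_id, "busiest_resource": busiest}
```

The `origin_unit` tag lets the headquarters attribute the result to the unit whose cloud was analyzed, rather than to the helper that did the computation.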
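In yet another possible implementation, as described above, the O&M units are peers, and in the embodiments every O&M unit can publish analysis tasks. Optionally, in the cloud service system shown in FIG. 2, the Henan O&M personnel can likewise publish analysis tasks to the O&M unit 231 and the O&M unit 241 through the O&M unit 221. The O&M unit 231 and the O&M unit 241 perform the corresponding resource analysis based on the task to obtain resource analysis results. They then feed these results back to the O&M unit 221.
The interaction between the O&M units in the embodiments is described in detail below with specific examples. In the initialization phase, O&M personnel at each site deploy and configure the O&M units. FIG. 5 is a schematic flowchart of an exemplary cloud service system initialization. Referring to FIG. 5, the process includes but is not limited to the following steps:
S501: O&M personnel in each region deploy their O&M units.
For example, with reference to the cloud service system shown in FIG. 2, the elements of FIG. 5 map as follows. O&M personnel 1 may be the Shanghai O&M personnel, and O&M unit 1 may be the O&M unit 211. Cloud platform 1 is the cloud platform deployed in the cloud 210 and can provide cloud services based on the hardware resources in the cloud 210. O&M personnel 2 may be the Henan O&M personnel, and O&M unit 2 may be the O&M unit 221. Cloud platform 2 is the cloud platform deployed in the cloud 220 and can provide cloud services based on the hardware resources in the cloud 220. O&M personnel 3 may be the Zhengzhou O&M personnel, and O&M unit 3 may be the O&M unit 231. Cloud platform 3 is the cloud platform deployed in the cloud 230 and can provide cloud services based on the hardware resources in the cloud 230.
This example uses the organizational structure of the cloud service system shown in FIG. 2, in which the Zhengzhou cloud 230 can be regarded as a subordinate branch of the Henan cloud 220. In the embodiments, the O&M units are peers and differ only in the roles they play in the organizational structure, and different roles may have different permissions. For example, because O&M unit 3 is subordinate to O&M unit 2, O&M unit 2 can publish analysis tasks to O&M unit 3 so that O&M unit 3 feeds back resource analysis results. O&M unit 3 is a peer of O&M unit 2 and can also publish analysis tasks. However, because it is subordinate to O&M unit 2 in the organizational structure of FIG. 2, it usually does not publish analysis tasks to O&M unit 2. The organizational structure described in the embodiments is merely illustrative. The structure and roles can be set according to actual requirements, which is not limited in the present application.
In the initialization phase, the O&M personnel deploy O&M units on their local clouds. Specifically, as described above, an O&M unit can be understood as application software or program instructions. O&M personnel can install and run the software package corresponding to the O&M unit on each cloud (for example, the cloud 210, cloud 220, cloud 230, and cloud 240) to deploy and run the O&M unit on that cloud. In the embodiments, each O&M unit can be understood as an independent cloud management platform that manages the cloud resources of its corresponding cloud.
For example, the deployment phase may further include setting the O&M information of the O&M unit. This information includes but is not limited to a user name, a password, and address information (such as a domain name and an IP address).
For example, after the O&M unit of each cloud is deployed, all modules in the O&M unit start. The embodiments assume that all O&M units have the same O&M capabilities, that is, they include the same modules. As described above, an O&M unit in the embodiments can be extended individually, so the O&M capabilities of the O&M units may be partially or completely different. O&M personnel can extend a single O&M unit according to actual requirements, which is not limited in the present application.
In the embodiments, after the modules in each O&M unit start, some modules may optionally begin to obtain relevant data in the cloud. For example, the capacity analysis module can periodically obtain the basic resources and cloud service capacity information in the cloud and perform capacity analysis. The idle resource analysis module can periodically obtain the idle resources in the cloud and perform idle resource analysis. The load analysis module can periodically obtain the load of the various resources and perform load analysis. The bottleneck resource analysis module can periodically obtain the usage of each resource and perform bottleneck resource analysis.
In a possible implementation, each analysis module may only obtain the corresponding resource status, without performing analysis. After it receives a query instruction from a user or an analysis task published by the headquarters O&M unit, it performs the analysis based on the most recently obtained resource status and obtains the analysis result.
In another possible implementation, each analysis module may obtain the resource status in the cloud only after receiving a query instruction from a user or an analysis task published by the headquarters O&M unit. It then performs the analysis to obtain the analysis result.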
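The two implementations differ only in when the resource status is collected. The Python sketch below models both with a single flag. `AnalysisModule`, the utilization input format, and the mean/max summary are assumptions chosen for brevity.

```python
from typing import Callable, Dict, Optional


class AnalysisModule:
    """Sketch of an analysis module that analyzes either its last snapshot or fresh data."""

    def __init__(self, collect: Callable[[], Dict[str, float]]):
        self.collect = collect  # assumed collector returning utilization per resource (0.0-1.0)
        self.snapshot: Optional[Dict[str, float]] = None

    def poll(self) -> None:
        # Periodic collection only: store the latest status without analyzing it.
        self.snapshot = self.collect()

    def analyze(self, fresh: bool = False) -> Dict[str, float]:
        """Handle a user query or a task from the headquarters.

        fresh=False analyzes the most recent snapshot (first implementation);
        fresh=True collects the status on demand first (second implementation).
        """
        if fresh or self.snapshot is None:
            self.poll()
        values = list(self.snapshot.values())
        return {"mean_utilization": sum(values) / len(values), "max_utilization": max(values)}
```

The first mode answers faster but may use slightly stale data. The second mode costs one extra collection per request.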
也就是说,本申请实施例中的云服务系统中的运维操作以运维单元为颗粒度执行,每个运维单元可对应执行运维操作,而不影响其它运维单元的执行。并且,各运维单元可对本地云执行运维操作,无需再通过其它云管平台以造成数据绕行和通信资源浪费。
S502,各地区的运维人员管理各运维单元。
在本申请实施例中,运维单元作为独立的云管平台,其可用于管理对应的云平台。例如,运维单元可通过各模块实时获取云平台的各项参数,以对云平台的运行状态和资源状态进行监测。当检测云平台发生故障时,运维单元的告警模块可进行告警。再例如,运维单元也可以对云平台进行更新,例如为云平台的部分组件进行升级等,本申请不做限定。也就是说,运维单元对于云平台的管理,可以实现已有技术中的云管平台的各项运维能力,但是在本申请实施例中,各运维单元可独立的对各自管理的云平台进行管理,相互之间无需进行交互。
可选地,在管理阶段,运维单元同样实现对云内硬件资源的管理。当然,也可以认为是对云平台的硬件资源进行管理。例如,对硬件资源的运行状态和资源使用状态进行监控,并及时进行告警等。
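S503: The O&M personnel designate the headquarters role.
The embodiments of this application are described using an example in which the O&M personnel designate maintenance unit 1, that is, the maintenance unit in cloud 210 located in Shanghai, as the headquarters role. This unit may also be referred to as the primary maintenance unit.
It should be noted that the maintenance unit in the headquarters role is a peer of the other maintenance units. Here, "peer" means that both the headquarters maintenance unit and the other maintenance units are capable of issuing analysis tasks and returning analysis results. This scenario uses the headquarters role only as an example of the entity that issues analysis tasks. In other embodiments, another maintenance unit may be designated as the headquarters role. Optionally, there may also be multiple headquarters roles, which is not limited in this application.
Still referring to FIG. 5, O&M personnel 1 may access maintenance unit 1 through an electronic device. Maintenance unit 1 may provide an O&M interface, through which O&M personnel 1 may enter information such as a username and password to log in. Maintenance unit 1 receives the information entered by the user, verifies it, and allows O&M personnel 1 to log in if verification succeeds.
After logging in successfully, O&M personnel 1 may use the electronic device to send instruction information to maintenance unit 1, instructing it to act as the headquarters maintenance unit, that is, to take the headquarters role in the organizational structure of the cloud service system.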
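S504: Maintenance unit 1 docks with maintenance unit 2 and maintenance unit 3.
In response to the received instruction information, maintenance unit 1 determines that it is to act as the headquarters role. Maintenance unit 1 sends docking indication information to maintenance unit 2. The docking indication information includes but is not limited to a docking indication, identification information of maintenance unit 1 (for example, address information), and identification information of maintenance unit 2 (for example, address information). It indicates that maintenance unit 1 acts as the headquarters role for maintenance unit 2, which may also be understood as maintenance unit 1 acting as the headquarters role of the cloud service system.
In response to the received docking indication information, maintenance unit 2 determines, based on the docking indication and the identification information of maintenance unit 1 and maintenance unit 2 contained in it, that maintenance unit 1 acts as the headquarters maintenance unit, that is, the headquarters role, in the cloud service system.
Maintenance unit 2 sends docking response information to maintenance unit 1. The docking response information includes but is not limited to a docking success indication, identification information of maintenance unit 1, identification information of maintenance unit 2, and O&M information of maintenance unit 2. The O&M information includes but is not limited to the username, password, and address information of maintenance unit 2, which is not limited in this application. Optionally, after receiving the docking indication information, maintenance unit 2 may notify the user that the docking indication information has been received, and send the docking response information only after receiving the user's permission.
For example, maintenance unit 1 receives the docking response information sent by maintenance unit 2 and determines from it that maintenance unit 2 agrees to the docking, that is, allows maintenance unit 1 to act as the headquarters maintenance unit. Maintenance unit 1 may then establish a communication connection with maintenance unit 2 based on the address information in the received O&M information.
The interaction flow for maintenance unit 3 is the same as that for maintenance unit 2 and is not repeated here.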
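The docking handshake in S504 can be sketched as an in-process exchange of two messages. This is a minimal sketch with assumed names, not the applicant's message format: `OMInfo`, `DockingRequest`, `DockingResponse`, `Unit`, and the placeholder addresses are hypothetical, and a real deployment would carry these messages over the network. The `user_approves` flag models the optional operator confirmation before the response is sent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OMInfo:
    # O&M information returned during docking: username, password, address.
    username: str
    password: str
    address: str


@dataclass
class DockingRequest:
    hq_id: str      # identifier of the unit claiming the headquarters role
    target_id: str  # identifier of the unit being docked


@dataclass
class DockingResponse:
    success: bool
    hq_id: str
    unit_id: str
    om_info: Optional[OMInfo]


@dataclass
class Unit:
    unit_id: str
    om_info: OMInfo
    hq_id: Optional[str] = None
    peers: Dict[str, OMInfo] = field(default_factory=dict)  # filled in on the HQ side

    def handle_docking(self, req: DockingRequest, user_approves: bool = True) -> DockingResponse:
        # Refuse requests addressed to another unit or rejected by the local operator.
        if req.target_id != self.unit_id or not user_approves:
            return DockingResponse(False, req.hq_id, self.unit_id, None)
        self.hq_id = req.hq_id  # record which unit is headquarters
        return DockingResponse(True, req.hq_id, self.unit_id, self.om_info)

    def dock(self, other: "Unit") -> bool:
        resp = other.handle_docking(DockingRequest(self.unit_id, other.unit_id))
        if resp.success and resp.om_info is not None:
            # Keep the peer's address and credentials for connecting and authenticating later.
            self.peers[resp.unit_id] = resp.om_info
        return resp.success
```

For example, `Unit("unit-1", ...).dock(unit_2)` leaves `unit_2.hq_id == "unit-1"` and stores unit 2's address in unit 1's `peers`, which is the information maintenance unit 1 later uses to open the connection.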
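In one possible implementation, taking the organizational structure in FIG. 2 as an example, maintenance unit 1 docks and establishes a communication connection with maintenance unit 2, and maintenance unit 2 docks and establishes a communication connection with maintenance unit 3. In this scenario, maintenance unit 1 may issue an analysis task to maintenance unit 2. Maintenance unit 2 executes the task and also issues it to maintenance unit 3. Maintenance unit 3 returns its analysis result to maintenance unit 2, and maintenance unit 2 returns both its local analysis result and the analysis result of maintenance unit 3 to maintenance unit 1 over the communication connection between them.
In another possible implementation, taking the organizational structure in FIG. 2 as an example, maintenance unit 3, as a subordinate of maintenance unit 2, docks and establishes a communication connection with maintenance unit 2, and may also dock and establish a communication connection with maintenance unit 1. In this scenario, maintenance unit 1 may issue an analysis task to maintenance unit 2 over the connection between them and receive the analysis result returned by maintenance unit 2. Likewise, maintenance unit 1 may issue the analysis task to maintenance unit 3 over the connection between them and receive the analysis result returned by maintenance unit 3. In other words, the organizational structure of the cloud service system is set according to user needs and is a logical structure; it does not determine the communication topology between the maintenance units.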
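The relay mode in the first implementation above, where maintenance unit 2 runs the task, forwards it to maintenance unit 3, and returns both results, is a recursive tree aggregation. The sketch below assumes an in-memory topology table and made-up idle-VM counts; `CHILDREN`, `relay_task`, and `local_idle_analysis` are hypothetical names, and the recursion stands in for calls over the docked connections.

```python
from typing import Callable, Dict, List

# Hypothetical docking topology for the relay case (FIG. 2): 1 -> 2 -> 3.
CHILDREN: Dict[str, List[str]] = {"unit-1": ["unit-2"], "unit-2": ["unit-3"], "unit-3": []}


def relay_task(unit: str, analyze: Callable[[str], dict]) -> Dict[str, dict]:
    """Run the task on `unit`, forward it to each child, and merge the returned results.

    Assumes the topology is a tree (no cycles), as in the structure of FIG. 2.
    """
    results: Dict[str, dict] = {unit: analyze(unit)}  # local analysis first
    for child in CHILDREN[unit]:
        # In a real deployment this would be a request over the child's connection.
        results.update(relay_task(child, analyze))
    return results


# Stand-in local analysis results (made-up numbers for illustration).
IDLE_VMS = {"unit-1": 3, "unit-2": 5, "unit-3": 1}


def local_idle_analysis(unit: str) -> dict:
    return {"idle_vms": IDLE_VMS[unit]}


# Unit 1 issues the task to unit 2, which relays it to unit 3 and returns both results.
print(relay_task("unit-2", local_idle_analysis))
# prints {'unit-2': {'idle_vms': 5}, 'unit-3': {'idle_vms': 1}}
```

In the second implementation, where maintenance unit 1 is connected directly to every unit, the same results would instead be gathered by a flat fan-out from maintenance unit 1.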
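It should be noted that the embodiments describe the interaction between maintenance units using only the example of the headquarters role issuing analysis tasks to other maintenance units. In other embodiments, maintenance units may also exchange other data, which is not limited in this application.
The following describes in detail, with reference to the maintenance unit interaction flowchart shown in FIG. 6, the distributed analysis task execution flow implemented by the maintenance units in the embodiments of this application. Referring to FIG. 6, the flow includes but is not limited to the following steps: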
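S601: O&M personnel 1 instructs maintenance unit 1 to issue an idle resource analysis task.
For example, after the maintenance units complete the initialization shown in FIG. 5, maintenance unit 1 has established communication connections with maintenance unit 2 and maintenance unit 3.
O&M personnel 1 may use an electronic device to send a user instruction (also referred to as an O&M instruction or O&M indication information, which is not limited in this application) to maintenance unit 1. The user instruction includes but is not limited to identification information of the target maintenance units and an idle resource analysis task indication, and instructs maintenance unit 1 to obtain the idle resource analysis results of the target maintenance units. In this example, the target maintenance units are maintenance units 1, 2, and 3. In other embodiments, the target maintenance units may be all maintenance units in the cloud service system. In that case, as described above, if a maintenance unit has not docked with maintenance unit 1, the idle resource analysis task may be issued to it through its parent node (that is, its superior maintenance unit).
Optionally, an isolated maintenance unit, that is, one that has not established a communication connection with any maintenance unit, need not be processed. Optionally, the target maintenance units may be all maintenance units in the cloud service system that have docked with maintenance unit 1 (the headquarters maintenance unit). Optionally, the target maintenance units may also be any one or more maintenance units in the cloud service system. For example, if the target maintenance unit is maintenance unit 2, the user instruction includes the identification information of maintenance unit 2 and the idle resource analysis task indication, and instructs maintenance unit 1 to obtain the idle resource analysis result of maintenance unit 2.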
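S602a: Maintenance unit 1 sends the idle resource analysis task to maintenance unit 2.
S602b: Maintenance unit 1 sends the idle resource analysis task to maintenance unit 3.
S603a: Maintenance unit 2 sends its idle resource analysis result to maintenance unit 1.
S603b: Maintenance unit 3 sends its idle resource analysis result to maintenance unit 1.
S603c: Maintenance unit 1 obtains its local idle resource analysis result.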
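For example, based on the user instruction, maintenance unit 1 determines that the target maintenance units include maintenance unit 2 and maintenance unit 3, and that the O&M operation to be performed is obtaining their idle resource analysis results. Accordingly, maintenance unit 1 sends resource analysis indication information to maintenance unit 2 through its communication interface over the communication connection with maintenance unit 2. The resource analysis indication information includes the identification information of maintenance unit 2 and an idle resource analysis task indication, and instructs maintenance unit 2 to perform the idle resource analysis O&M operation and return the idle resource analysis result.
Likewise, maintenance unit 1 sends resource analysis indication information to maintenance unit 3 through its communication interface over the communication connection with maintenance unit 3. The resource analysis indication information includes the identification information of maintenance unit 3 and an idle resource analysis task indication, and instructs maintenance unit 3 to perform the idle resource analysis O&M operation and return the idle resource analysis result.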
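In one possible implementation, before sending the idle resource analysis task to maintenance unit 2 and maintenance unit 3, maintenance unit 1 authenticates with them. Taking maintenance unit 2 as an example, maintenance unit 1 sends authentication information to maintenance unit 2 over the communication connection. The authentication information includes the username and password of maintenance unit 2 (obtained in the docking flow of FIG. 5). After receiving the authentication information, maintenance unit 2 verifies the username and password and, if authentication succeeds, sends response information to maintenance unit 1 indicating success. After receiving the response information and confirming that authentication succeeded, maintenance unit 1 may proceed with S602a. Maintenance unit 3 is handled in the same way and is not described again. Optionally, in the docking process shown in FIG. 5, maintenance unit 1 may also send its own O&M information to maintenance unit 2, including but not limited to the username, password, and address information of maintenance unit 1.
In some embodiments, a maintenance unit pulls data actively; for example, in FIG. 6 maintenance unit 1 requests data from maintenance unit 2. In this case only maintenance unit 2 needs to authenticate maintenance unit 1, that is, determine whether maintenance unit 1 holds the username and password of maintenance unit 2. In other words, maintenance unit 1 has permission to obtain data from maintenance unit 2 only if it holds the username and password of maintenance unit 2. In other examples, a maintenance unit receives data passively; for example, when maintenance unit 2 sends alarm information to maintenance unit 1, maintenance unit 1 is the passive receiver. In this scenario, maintenance unit 1 needs to authenticate maintenance unit 2: before sending data, maintenance unit 2 sends the username and password of maintenance unit 1 to maintenance unit 1 for authentication. After maintenance unit 1 determines that maintenance unit 2 is authenticated and sends response information to maintenance unit 2, maintenance unit 2 confirms from the response information that authentication succeeded and sends the alarm information to maintenance unit 1.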
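The direction of authentication described above can be made concrete with a short sketch. In the pull case the data holder checks the requester's copy of the holder's credentials; in the push case the receiver checks the sender's copy of the receiver's credentials. The sketch assumes plain username/password pairs exchanged during docking and uses `hmac.compare_digest` from the Python standard library for constant-time comparison. `Endpoint`, `pull`, and `push` are hypothetical names, and a production system would more likely use hashed secrets or tokens than stored plaintext passwords.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Credentials:
    username: str
    password: str


class Endpoint:
    """A maintenance unit that checks presented credentials against its own."""

    def __init__(self, unit_id: str, own: Credentials) -> None:
        self.unit_id = unit_id
        self._own = own

    def authenticate(self, presented: Credentials) -> bool:
        # Constant-time comparisons; both are evaluated so timing does not reveal which failed.
        user_ok = hmac.compare_digest(presented.username.encode(), self._own.username.encode())
        pass_ok = hmac.compare_digest(presented.password.encode(), self._own.password.encode())
        return user_ok and pass_ok


def pull(requester_copy: Credentials, holder: Endpoint) -> str:
    """Pull: the unit that owns the data verifies the requester's copy of ITS credentials."""
    if not holder.authenticate(requester_copy):
        raise PermissionError(f"authentication with {holder.unit_id} failed")
    return f"idle resource analysis result from {holder.unit_id}"


def push(sender_copy: Credentials, receiver: Endpoint, alarm: str) -> str:
    """Push: the receiving unit verifies the sender's copy of the RECEIVER's credentials."""
    if not receiver.authenticate(sender_copy):
        raise PermissionError(f"authentication with {receiver.unit_id} failed")
    return f"{receiver.unit_id} accepted alarm: {alarm}"
```

In the FIG. 6 flow, maintenance unit 1 would call `pull` with the credentials of maintenance unit 2 that it received during docking. Maintenance unit 2 raising an alarm would call `push` with the credentials of maintenance unit 1.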
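Still referring to FIG. 6, for example, maintenance unit 2 receives the resource analysis indication information through its communication interface and performs the idle resource analysis O&M operation based on it. In one example, maintenance unit 2 (specifically, its idle resource analysis module; this is not repeated below) periodically collects the idle resource state and performs idle resource analysis to obtain an idle resource analysis result. The period length may be set according to actual needs, which is not limited in this application. After receiving the resource analysis indication information, maintenance unit 2 returns the most recently obtained idle resource analysis result to maintenance unit 1. Specifically, maintenance unit 2 may send resource analysis response information to maintenance unit 1 over the communication connection between them. The resource analysis response information includes but is not limited to the identification information of maintenance unit 1, the identification information of maintenance unit 2, and the idle resource analysis result.
In another example, maintenance unit 2 periodically collects the idle resource state. After receiving the resource analysis indication information, maintenance unit 2 performs idle resource analysis based on the most recently collected idle resource state to obtain an idle resource analysis result, and returns the result to maintenance unit 1 over the communication connection between them. Specifically, maintenance unit 2 may send resource analysis response information to maintenance unit 1 over that connection. The resource analysis response information includes but is not limited to the identification information of maintenance unit 1, the identification information of maintenance unit 2, and the idle resource analysis result.
In yet another example, after receiving the resource analysis indication information, maintenance unit 2 collects the idle resource state in the cloud, performs idle resource analysis to obtain an idle resource analysis result, and returns the result to maintenance unit 1 over the communication connection between them. Specifically, maintenance unit 2 may send resource analysis response information to maintenance unit 1 over that connection. The resource analysis response information includes but is not limited to the identification information of maintenance unit 1, the identification information of maintenance unit 2, and the idle resource analysis result.
That is, in the embodiments of this application, maintenance unit 2 may collect the idle resource state either after receiving the indication information (triggered by it) or before receiving it, which is not limited in this application.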
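The three ways of serving a resource analysis request described above differ only in when the state is collected and when the analysis runs. The sketch below puts them side by side. `Mode`, `IdleResourceModule`, `on_timer`, and the 5% idle threshold are assumptions for illustration. Falling back to on-demand collection when nothing is cached yet is a design choice of this sketch, not something the text specifies.

```python
from enum import Enum
from typing import Callable, Dict, Optional


class Mode(Enum):
    CACHED_RESULT = 1  # return the latest periodically computed result
    CACHED_STATE = 2   # analyse the latest periodically collected state on request
    ON_DEMAND = 3      # collect state and analyse only when the request arrives


class IdleResourceModule:
    def __init__(self, collect: Callable[[], Dict[str, float]], mode: Mode) -> None:
        self.collect = collect  # hypothetical probe of the local cloud's utilisation
        self.mode = mode
        self.last_state: Optional[Dict[str, float]] = None
        self.last_result: Optional[dict] = None

    @staticmethod
    def analyze(state: Dict[str, float]) -> dict:
        # Toy rule (assumption): a resource is idle below 5% utilisation.
        return {"idle": sorted(k for k, u in state.items() if u < 0.05)}

    def on_timer(self) -> None:
        """Called once per collection period; the period length is deployment-specific."""
        if self.mode is Mode.ON_DEMAND:
            return
        self.last_state = self.collect()
        if self.mode is Mode.CACHED_RESULT:
            self.last_result = self.analyze(self.last_state)

    def handle_request(self) -> dict:
        if self.mode is Mode.CACHED_RESULT and self.last_result is not None:
            return self.last_result
        if self.mode is Mode.CACHED_STATE and self.last_state is not None:
            return self.analyze(self.last_state)
        # ON_DEMAND, or nothing cached yet: collect and analyse now.
        return self.analyze(self.collect())
```

The trade-off is freshness against response latency. `CACHED_RESULT` answers immediately but may be one period stale, while `ON_DEMAND` always reflects the current state at the cost of collection time per request.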
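For example, maintenance unit 3 is handled in the same way as maintenance unit 2, and details are not repeated here. It should be noted that the order of S602a and S602b is not limited.
For example, in this scenario the target maintenance units also include maintenance unit 1. Accordingly, maintenance unit 1 obtains an idle resource analysis result based on the idle resource state of its local cloud (cloud 210). For how this result is obtained, refer to the description of maintenance unit 2 above; details are not repeated here.
Optionally, maintenance unit 1 may display the obtained idle resource analysis results, including those of maintenance unit 1, maintenance unit 2, and maintenance unit 3.
In one possible implementation, as described above, the target maintenance units may be any one or more maintenance units in the cloud service system. If the only target maintenance unit is maintenance unit 2, maintenance unit 1 performs S602a and does not need to perform S602b.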
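Steps S602a to S603c amount to a fan-out from the headquarters unit to the selected targets followed by aggregation of the responses with its own local result. The sketch below is illustrative only: `fan_out` and the `Send` callables are hypothetical stand-ins for requests over the established connections. Requests run concurrently with the standard-library `ThreadPoolExecutor` because the order of S602a and S602b is not fixed. Remote targets without a connection (isolated units) are skipped, as the text allows.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical transport: sends a request over an established connection, returns the response.
Send = Callable[[dict], dict]


def fan_out(local_id: str, targets: List[str], connections: Dict[str, Send],
            analyze_local: Callable[[], dict], task: dict) -> Dict[str, dict]:
    """Issue `task` to the target units and gather their results at headquarters.

    Remote targets without an established connection (isolated units) are skipped.
    Requests run concurrently because the order of S602a/S602b is not fixed.
    """
    remote = [t for t in targets if t != local_id and t in connections]
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor() as pool:
        futures = {t: pool.submit(connections[t], {**task, "target": t}) for t in remote}
        if local_id in targets:
            results[local_id] = analyze_local()  # S603c: local result
        for t, fut in futures.items():
            results[t] = fut.result()  # S603a/S603b: remote results
    return results
```

Passing `targets=["unit-2"]` reproduces the case above where only S602a is performed. Passing all three unit IDs yields the combined result set that maintenance unit 1 may display.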
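In another possible implementation, the maintenance units may also issue and return other tasks, as well as other types of data (for example, alarm information), where returning data may be optional. The implementation is similar to the flow in FIG. 6 and is not illustrated item by item in this application.
In summary, analysis tasks such as idle resource analysis consume substantial storage and computing resources. In the cloud service system of the embodiments of this application, the headquarters maintenance unit may initiate a global analysis task by sending it to at least one maintenance unit. Each maintenance unit that receives the task executes it based on the resource state of its local cloud and returns the analysis result to the headquarters maintenance unit. This reduces the computing load on the headquarters maintenance unit and the communication overhead of the interaction.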
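The following describes in detail, with reference to the maintenance unit interaction flowchart shown in FIG. 7, the experience and knowledge sharing flow implemented by the maintenance units in the embodiments of this application. Referring to FIG. 7, the flow includes but is not limited to the following steps:
S701: O&M personnel 1 instructs maintenance unit 1 to perform knowledge sharing.
For example, O&M personnel 1 may use an electronic device to send a user instruction (also referred to as an O&M instruction or O&M indication information, which is not limited in this application) to maintenance unit 1. The user instruction includes but is not limited to identification information of the target maintenance units and a knowledge sharing indication. It instructs maintenance unit 1 to send information such as O&M experience knowledge and/or automated job scripts (collectively referred to below as knowledge information) to the target maintenance units. For the target maintenance units, refer to the description above; details are not repeated here.
S702a: Maintenance unit 1 sends knowledge sharing indication information to maintenance unit 2.
S702b: Maintenance unit 1 sends knowledge sharing indication information to maintenance unit 3.
S703a: Maintenance unit 2 learns the knowledge information.
S703b: Maintenance unit 3 learns the knowledge information from maintenance unit 1.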
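For example, based on the user instruction, maintenance unit 1 determines that the target maintenance units are maintenance unit 2 and maintenance unit 3, and that the O&M operation to be performed is knowledge sharing. Accordingly, maintenance unit 1 invokes its knowledge base module to obtain the corresponding knowledge information. It then sends knowledge sharing indication information to maintenance unit 2 through its communication interface over the communication connection with maintenance unit 2. The knowledge sharing indication information includes but is not limited to the identification information of maintenance unit 2 and the knowledge information of maintenance unit 1.
Maintenance unit 2 receives the knowledge sharing indication information through its communication interface and obtains the knowledge information from it. Maintenance unit 2 may learn the knowledge information and, in subsequent O&M, perform the corresponding O&M operations based on the learned O&M experience knowledge and/or automated job scripts.
For example, maintenance unit 3 is handled in the same way as maintenance unit 2, and details are not repeated here.
In one possible implementation, any maintenance unit in the cloud service system may trigger knowledge sharing actively (for example, periodically) or passively (that is, in response to an instruction from O&M personnel) to send knowledge information to the target maintenance units. The target maintenance units may be preset or specified by O&M personnel through a user instruction, which is not limited in this application.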
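The knowledge sharing flow of FIG. 7 can be sketched as exporting entries from the sender's knowledge base and merging them into each target's knowledge base. Two points are assumptions of this sketch, not statements of the text. First, each knowledge item carries a version number. Second, a receiver keeps whichever copy is newer, so a unit that already has a more recent script is not overwritten. `KnowledgeItem`, `KnowledgeBase`, `learn`, and `share` are hypothetical names, and scripts are stored as text only, never executed here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class KnowledgeItem:
    name: str     # hypothetical identifier, e.g. "disk-cleanup"
    kind: str     # "experience" or "script"
    content: str  # experience text or script source (stored, not executed here)
    version: int


class KnowledgeBase:
    def __init__(self) -> None:
        self.items: Dict[str, KnowledgeItem] = {}

    def add(self, item: KnowledgeItem) -> None:
        self.items[item.name] = item

    def export(self) -> List[KnowledgeItem]:
        return list(self.items.values())

    def learn(self, shared: List[KnowledgeItem]) -> List[str]:
        """Merge shared items, keeping the newer version of each; return the names learned."""
        learned: List[str] = []
        for item in shared:
            current = self.items.get(item.name)
            if current is None or item.version > current.version:
                self.items[item.name] = item
                learned.append(item.name)
        return learned


def share(sender: KnowledgeBase, targets: Dict[str, KnowledgeBase]) -> Dict[str, List[str]]:
    """Send the sender's knowledge to each target unit and report what each one learned."""
    payload = sender.export()
    return {unit_id: kb.learn(payload) for unit_id, kb in targets.items()}
```

The returned report, keyed by unit ID, is one way the initiating unit could confirm which entries each target actually adopted.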
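The following describes in detail, with reference to the accompanying drawings, how the communication method of the embodiments of this application is applied in abnormal scenarios. FIG. 8 is a schematic diagram of an example scenario in which the headquarters maintenance unit is abnormal. Referring to FIG. 8, the scenario includes the cloud service system shown in FIG. 2, which further includes cloud 250 located in Hunan, on which maintenance unit 251 is deployed. For related descriptions, refer to the foregoing embodiments; details are not repeated here.
In this example, maintenance unit 211 acts as the headquarters maintenance unit and fails. Optionally, the failure may be caused by a communication interface fault or a module fault, which is not limited in this application.
For example, in this scenario, after the headquarters maintenance unit becomes abnormal (that is, fails), the O&M personnel may designate a new headquarters maintenance unit. FIG. 9 is another schematic diagram of the example scenario in which the headquarters maintenance unit is abnormal. Referring to FIG. 9, the Henan O&M personnel may designate maintenance unit 221 in cloud 220 as the headquarters role in the cloud service system. Specifically, the Henan O&M personnel may access maintenance unit 221 through an electronic device and send it a user instruction instructing it to act as the headquarters maintenance unit. Upon receiving the user instruction, maintenance unit 221 determines that it takes the headquarters role and docks with the other maintenance units in the cloud service system. For example, maintenance unit 221 docks with maintenance unit 251 and establishes a communication connection. For the specific procedure, refer to the description of FIG. 5; details are not repeated here. In this scenario, because maintenance unit 221 has already docked and established communication connections with maintenance unit 231 and maintenance unit 241, it does not need to perform the docking steps with them again.
As the new headquarters maintenance unit, maintenance unit 221 can perform the O&M operations previously performed by maintenance unit 211. For example, based on instructions from O&M personnel, maintenance unit 221 may issue O&M operations such as idle resource analysis tasks to maintenance unit 251, maintenance unit 231, and maintenance unit 241. For the specific implementation, refer to the description of the headquarters maintenance unit above; details are not repeated here.
In summary, in the embodiments of this application, if the maintenance unit assigned the "headquarters" role by O&M personnel fails while the cloud service system is running, the O&M personnel may select another peer maintenance unit in the system as the headquarters role as needed. The new headquarters takes over the capabilities required of the original headquarters maintenance unit, while the failed unit is isolated. In this way, during the time window in which the original headquarters maintenance unit is faulty, the new headquarters maintenance unit can still manage all maintenance units other than the original headquarters, enabling fast takeover and switchover of headquarters O&M capabilities.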
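The takeover in FIGS. 8 and 9 can be sketched as follows: the new headquarters skips the failed unit, reuses connections it already has, and docks only with units it is not yet connected to. `take_over_hq`, the `docked` table, and the `dock` callback are hypothetical. The callback stands in for the FIG. 5 handshake, and the unit IDs mirror the reference numerals in the figures.

```python
from typing import Callable, Dict, Set


def take_over_hq(new_hq: str,
                 all_units: Set[str],
                 failed_hq: str,
                 docked: Dict[str, Set[str]],
                 dock: Callable[[str, str], bool]) -> Set[str]:
    """Make `new_hq` the headquarters unit while isolating `failed_hq`.

    docked[u] is the set of units u already has a connection with; dock() is a
    hypothetical callback that performs the FIG. 5 docking handshake.
    Returns the set of units the new headquarters can now manage.
    """
    managed: Set[str] = set()
    for unit in sorted(all_units - {new_hq, failed_hq}):  # the failed HQ is isolated
        # Reuse an existing connection; dock only with units not yet connected.
        if unit in docked.get(new_hq, set()) or dock(new_hq, unit):
            docked.setdefault(new_hq, set()).add(unit)
            managed.add(unit)
    return managed
```

With `docked = {"unit-221": {"unit-231", "unit-241"}}`, calling `take_over_hq("unit-221", ...)` invokes `dock` only for `unit-251`, matching the statement that maintenance unit 221 need not re-dock with units 231 and 241.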
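FIG. 10 is a schematic diagram of an example scenario in which communication with a maintenance unit is abnormal. Referring to FIG. 10, the scenario includes the cloud service system shown in FIG. 2, which further includes cloud 250 located in Hunan, on which maintenance unit 251 is deployed. For related descriptions, refer to the foregoing embodiments; details are not repeated here.
In this example, maintenance unit 211 acts as the headquarters maintenance unit, and the communication connection between maintenance unit 251 and maintenance unit 211 becomes abnormal. Because maintenance unit 251 is an independent maintenance unit, it can continue to manage its local cloud platform after the connection fails. For example, it can continue to collect the resource state of the local cloud and perform resource state analysis.
During the time window in which the communication connection of maintenance unit 251 is abnormal, maintenance unit 251 cannot receive instructions issued by the headquarters maintenance unit. Optionally, the headquarters maintenance unit (maintenance unit 211) may periodically attempt to send instructions to maintenance unit 251 to probe whether the communication connection is normal. After determining that the connection has recovered, it may issue the relevant instructions to maintenance unit 251.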
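The periodic probing described above can be sketched as a headquarters-side link object. It marks the peer as unreachable on a failed send, probes once per period, and resumes issuing instructions once the probe succeeds. Queuing the instructions that failed while the link was down, then flushing them in order on recovery, is an assumption of this sketch; the text only says that instructions may be issued after the connection recovers. `PeerLink`, `issue`, `probe`, and the `"PROBE"` message are hypothetical.

```python
from collections import deque
from typing import Callable, Deque


class PeerLink:
    """Headquarters-side view of one connection, probed periodically while it is down."""

    def __init__(self, send: Callable[[str], bool]) -> None:
        self.send = send  # hypothetical transport: returns True if the message was delivered
        self.up = True
        self.pending: Deque[str] = deque()

    def issue(self, instruction: str) -> None:
        if self.up and self.send(instruction):
            return
        # Link broken: remember the instruction. The peer keeps managing its own pool meanwhile.
        self.up = False
        self.pending.append(instruction)

    def probe(self) -> bool:
        """Call once per probe period; on success, flush queued instructions in order."""
        if self.up:
            return True
        if not self.send("PROBE"):
            return False
        self.up = True
        while self.pending:
            if not self.send(self.pending[0]):
                self.up = False  # link dropped again mid-flush; retry on the next probe
                return False
            self.pending.popleft()
        return True
```

Because the peer unit is independent, nothing on its side depends on this queue. It keeps managing its local resource pool, and the queue only determines which headquarters instructions reach it after recovery.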
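The methods of the embodiments of this application have been described in detail above. The apparatus of the embodiments of this application is provided below.
FIG. 11 is a schematic block diagram of a communication apparatus 1100 according to an embodiment of this application. The communication apparatus 1100 may include a processor 1101 and a transceiver/transceiver pin 1102, and optionally a memory 1103. The processor 1101 may be configured to perform the steps performed by the apparatus in the methods of the foregoing embodiments, and to control the receive pin to receive signals and the transmit pin to transmit signals.
The components of the apparatus 1100 are coupled through a bus 1104. In addition to a data bus, the bus system 1104 includes a power bus, a control bus, and a status signal bus. For clarity, however, all buses are labeled as the bus system 1104 in the figure.
Optionally, the memory 1103 may be used to store the instructions in the foregoing method embodiments.
It should be understood that the apparatus 1100 according to the embodiments of this application may correspond to the apparatus in the methods of the foregoing embodiments. The above and other management operations and/or functions of the elements in the apparatus 1100 respectively implement the corresponding steps of the foregoing methods. For brevity, details are not repeated here.
All content related to the steps in the foregoing method embodiments may be cited in the functional descriptions of the corresponding functional modules, and details are not repeated here.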
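Based on the same technical concept, an embodiment of this application further provides a computer-readable storage medium storing a computer program. The computer program includes at least one piece of code that can be executed by an apparatus to control the apparatus to implement the foregoing method embodiments.
Based on the same technical concept, an embodiment of this application further provides a computer program which, when executed by an apparatus or a computer cluster, implements the foregoing method embodiments.
The program may be stored wholly or partly on a storage medium packaged together with the processor, or partly or wholly on a memory not packaged together with the processor.
Based on the same technical concept, an embodiment of this application further provides a processor configured to implement the foregoing method embodiments. The processor may be a chip.
Based on the same technical concept, an embodiment of this application further provides a computing device cluster (also referred to as a server cluster, cloud infrastructure, or cloud device cluster) that includes at least one computing device (for example, a server). Each computing device includes a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device performs the foregoing method embodiments.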
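The steps of the methods or algorithms described with reference to the disclosure of the embodiments of this application may be implemented in hardware or by a processor executing software instructions. The software instructions may consist of corresponding software modules. The software modules may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC.
A person skilled in the art should be aware that, in the foregoing one or more examples, the functions described in the embodiments of this application may be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, the functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible to a general-purpose or special-purpose computer.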
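The term "and/or" in this specification describes only an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: only A exists, both A and B exist, and only B exists.
The terms "first", "second", and the like in the specification and claims of the embodiments of this application are used to distinguish between different objects rather than to describe a particular order of the objects. For example, a first target object and a second target object are used to distinguish between different target objects rather than to describe a particular order of the target objects.
In the embodiments of this application, words such as "exemplary" or "for example" are used to give an example, an illustration, or a description. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as being preferred over, or more advantageous than, other embodiments or designs. Rather, such words are intended to present a related concept in a specific manner.
In the descriptions of the embodiments of this application, unless otherwise stated, "a plurality of" means two or more. For example, a plurality of processing units means two or more processing units, and a plurality of systems means two or more systems.
The embodiments of this application have been described above with reference to the accompanying drawings. However, this application is not limited to the specific implementations described above, which are merely illustrative rather than restrictive. Inspired by this application, a person of ordinary skill in the art may derive many other forms without departing from the purpose of this application and the scope protected by the claims, and all such forms fall within the protection of this application.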

Claims (24)

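  1. A cloud service system, comprising:
    a plurality of resource pools, wherein a single resource pool of the plurality of resource pools is configured to provide a cloud service;
    a maintenance unit is deployed in each of the plurality of resource pools, wherein the plurality of maintenance units exchange data based on communication connections;
    each of the plurality of maintenance units is configured to manage one resource pool, wherein the resource pool managed by each maintenance unit is the resource pool to which that maintenance unit belongs;
    wherein, in managing the resource pool to which it belongs, each maintenance unit is specifically configured to:
    receive a user instruction; and
    perform, in response to the user instruction, a target O&M operation on the resource pool to which the maintenance unit belongs.
  2. The cloud service system according to claim 1, wherein the plurality of maintenance units comprise a first maintenance unit and at least one second maintenance unit; the first maintenance unit is deployed in a first resource pool, and the at least one second maintenance unit is respectively deployed in at least one second resource pool;
    the first maintenance unit is configured to receive first O&M operation indication information, wherein the first O&M operation indication information indicates to obtain resource state evaluation results of the plurality of resource pools;
    the first maintenance unit is further configured to send, based on the first O&M operation indication information, first task request information to the at least one second maintenance unit, wherein the first task request information instructs the second maintenance unit to return a resource state evaluation result of the corresponding second resource pool;
    the second maintenance unit is configured to: obtain, in response to the received first task request information, a resource state of the second resource pool; obtain a resource state evaluation result of the second resource pool based on the resource state of the second resource pool; and send first task response information to the first maintenance unit, wherein the first task response information comprises the resource state evaluation result of the second resource pool;
    the first maintenance unit is further configured to receive the first task response information returned by the at least one second maintenance unit, and obtain the resource state evaluation result of the at least one second resource pool based on the first task response information.
  3. The cloud service system according to claim 2, wherein
    before the first maintenance unit receives the first O&M operation indication information, the first maintenance unit is further configured to: receive first user indication information, wherein the first user indication information instructs the first maintenance unit to act as a management node; and send, in response to the first user indication information, first docking request information to each second maintenance unit, wherein the first docking request information indicates that the first maintenance unit acts as the management node of the cloud service system;
    the second maintenance unit is further configured to: determine, in response to the received first docking request information, that the first maintenance unit is the management node and the second maintenance unit is a managed node; and send first docking response information to the first maintenance unit, wherein the first docking response information comprises O&M information of the second maintenance unit.
  4. The cloud service system according to claim 2, wherein
    if the first maintenance unit becomes abnormal, a third maintenance unit of the at least one second maintenance unit is configured to:
    receive second user indication information, wherein the second user indication information instructs the third maintenance unit to act as a management node; and send, in response to the second user indication information, second docking request information to each second maintenance unit, wherein the second docking request information indicates that the third maintenance unit acts as the management node of the cloud service system;
    the second maintenance unit is further configured to: determine, in response to the received second docking request information, that the third maintenance unit is the management node and the second maintenance unit is a managed node; and send second docking response information to the third maintenance unit, wherein the second docking response information comprises O&M information of the second maintenance unit.
  5. The cloud service system according to claim 4, wherein
    the third maintenance unit is further configured to receive second O&M operation indication information, wherein the second O&M operation indication information indicates to obtain resource state evaluation results of the plurality of resource pools;
    the third maintenance unit is further configured to send, based on the second O&M operation indication information, second task request information to the at least one second maintenance unit, wherein the second task request information instructs the at least one second maintenance unit to return a resource state evaluation result of the corresponding second resource pool;
    the third maintenance unit is further configured to obtain, based on the second O&M operation indication information, a resource state of a third resource pool, and obtain a resource state evaluation result of the third resource pool based on the resource state of the third resource pool, wherein the third maintenance unit is deployed in the third resource pool;
    the second maintenance unit is configured to: obtain, in response to the received second task request information, the resource state of the second resource pool; obtain the resource state evaluation result of the second resource pool based on the resource state of the second resource pool; and send second task response information to the third maintenance unit, wherein the second task response information comprises the resource state evaluation result of the second resource pool;
    the third maintenance unit is further configured to receive the second task response information returned by the at least one second maintenance unit, and obtain the resource state evaluation result of the at least one second resource pool based on the second task response information.
  6. The cloud service system according to any one of claims 2 to 5, wherein if communication between any second maintenance unit and the first maintenance unit is abnormal, the second resource pool is managed by the second maintenance unit with the communication abnormality.
  7. The cloud service system according to any one of claims 1 to 6, wherein the plurality of maintenance units comprise a third maintenance unit and a fourth maintenance unit;
    the third maintenance unit is configured to receive a first unit update instruction, and update a specified O&M capability of the third maintenance unit based on the first unit update instruction;
    the fourth maintenance unit is configured to receive a second unit update instruction, and update a specified O&M capability of the fourth maintenance unit based on the second unit update instruction;
    wherein the O&M capabilities to be updated indicated by the first unit update instruction and the second unit update instruction are the same or different.
  8. The cloud service system according to any one of claims 1 to 7, wherein the cloud service system further comprises server clusters;
    each of the plurality of resource pools comprises at least one server cluster, wherein the servers in the server cluster are of the same model or different models.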
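  9. A communication method, applied to a cloud service system, wherein the cloud service system comprises a plurality of resource pools; a single resource pool of the plurality of resource pools is configured to provide a cloud service; a maintenance unit is deployed in each of the plurality of resource pools; the plurality of maintenance units exchange data based on communication connections; each of the plurality of maintenance units is configured to manage one resource pool, wherein the resource pool managed by each maintenance unit is the resource pool to which that maintenance unit belongs; and the method comprises:
    receiving, by the maintenance unit, a user instruction; and
    performing, by the maintenance unit in response to the user instruction, a target O&M operation on the resource pool to which the maintenance unit belongs.
  10. The method according to claim 9, wherein the plurality of maintenance units comprise a first maintenance unit and at least one second maintenance unit; the first maintenance unit is deployed in a first resource pool, and the at least one second maintenance unit is respectively deployed in at least one second resource pool; and the method further comprises:
    receiving, by the first maintenance unit, first O&M operation indication information, wherein the first O&M operation indication information indicates to obtain resource state evaluation results of the plurality of resource pools;
    sending, by the first maintenance unit based on the first O&M operation indication information, first task request information to the at least one second maintenance unit, wherein the first task request information instructs the second maintenance unit to return a resource state evaluation result of the corresponding second resource pool;
    obtaining, by the second maintenance unit in response to the received first task request information, a resource state of the second resource pool; obtaining a resource state evaluation result of the second resource pool based on the resource state of the second resource pool; and sending first task response information to the first maintenance unit, wherein the first task response information comprises the resource state evaluation result of the second resource pool; and
    receiving, by the first maintenance unit, the first task response information returned by the at least one second maintenance unit, and obtaining the resource state evaluation result of the at least one second resource pool based on the first task response information.
  11. The method according to claim 10, wherein before the first maintenance unit receives the first O&M operation indication information, the method further comprises:
    receiving, by the first maintenance unit, first user indication information, wherein the first user indication information instructs the first maintenance unit to act as a management node; and sending, in response to the first user indication information, first docking request information to each second maintenance unit, wherein the first docking request information indicates that the first maintenance unit acts as the management node of the cloud service system; and
    determining, by the second maintenance unit in response to the received first docking request information, that the first maintenance unit is the management node and the second maintenance unit is a managed node; and sending first docking response information to the first maintenance unit, wherein the first docking response information comprises O&M information of the second maintenance unit.
  12. The method according to claim 10, wherein the method further comprises:
    if the first maintenance unit becomes abnormal, receiving, by a third maintenance unit of the at least one second maintenance unit, second user indication information, wherein the second user indication information instructs the third maintenance unit to act as a management node;
    sending, by the third maintenance unit in response to the second user indication information, second docking request information to each second maintenance unit, wherein the second docking request information indicates that the third maintenance unit acts as the management node of the cloud service system;
    determining, by the second maintenance unit in response to the received second docking request information, that the third maintenance unit is the management node and the second maintenance unit is a managed node; and
    sending, by the second maintenance unit, second docking response information to the third maintenance unit, wherein the second docking response information comprises O&M information of the second maintenance unit.
  13. The method according to claim 12, further comprising:
    receiving, by the third maintenance unit, second O&M operation indication information, wherein the second O&M operation indication information indicates to obtain resource state evaluation results of the plurality of resource pools;
    sending, by the third maintenance unit based on the second O&M operation indication information, second task request information to the at least one second maintenance unit, wherein the second task request information instructs the at least one second maintenance unit to return a resource state evaluation result of the corresponding second resource pool;
    obtaining, by the third maintenance unit based on the second O&M operation indication information, a resource state of a third resource pool, and obtaining a resource state evaluation result of the third resource pool based on the resource state of the third resource pool, wherein the third maintenance unit is deployed in the third resource pool;
    obtaining, by the second maintenance unit in response to the received second task request information, the resource state of the second resource pool; obtaining the resource state evaluation result of the second resource pool based on the resource state of the second resource pool; and sending second task response information to the third maintenance unit, wherein the second task response information comprises the resource state evaluation result of the second resource pool; and
    receiving, by the third maintenance unit, the second task response information returned by the at least one second maintenance unit, and obtaining the resource state evaluation result of the at least one second resource pool based on the second task response information.
  14. The method according to any one of claims 10 to 13, wherein if communication between any second maintenance unit and the first maintenance unit is abnormal, the second resource pool is managed by the second maintenance unit with the communication abnormality.
  15. The method according to any one of claims 9 to 14, wherein the plurality of maintenance units comprise a third maintenance unit and a fourth maintenance unit, and the method further comprises:
    receiving, by the third maintenance unit, a first unit update instruction;
    updating, by the third maintenance unit based on the first unit update instruction, a specified O&M capability of the third maintenance unit;
    receiving, by the fourth maintenance unit, a second unit update instruction; and
    updating, by the fourth maintenance unit based on the second unit update instruction, a specified O&M capability of the fourth maintenance unit;
    wherein the O&M capabilities to be updated indicated by the first unit update instruction and the second unit update instruction are the same or different.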
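  16. A communication method, applied to a cloud service system, wherein the cloud service system comprises a plurality of resource pools; a single resource pool of the plurality of resource pools is configured to provide a cloud service; a maintenance unit is deployed in each of the plurality of resource pools; the plurality of maintenance units exchange data based on communication connections; each of the plurality of maintenance units is configured to manage one resource pool, wherein the resource pool managed by each maintenance unit is the resource pool to which that maintenance unit belongs; the plurality of maintenance units comprise a first maintenance unit and at least one second maintenance unit; the first maintenance unit is deployed in a first resource pool, and the at least one second maintenance unit is respectively deployed in at least one second resource pool; and the method comprises:
    receiving, by the first maintenance unit, first O&M operation indication information, wherein the first O&M operation indication information indicates to obtain resource state evaluation results of the plurality of resource pools;
    sending, by the first maintenance unit based on the first O&M operation indication information, first task request information to the at least one second maintenance unit, wherein the first task request information instructs the second maintenance unit to return a resource state evaluation result of the corresponding second resource pool;
    obtaining, by the second maintenance unit in response to the received first task request information, a resource state of the second resource pool; obtaining a resource state evaluation result of the second resource pool based on the resource state of the second resource pool; and sending first task response information to the first maintenance unit, wherein the first task response information comprises the resource state evaluation result of the second resource pool; and
    receiving, by the first maintenance unit, the first task response information returned by the at least one second maintenance unit, and obtaining the resource state evaluation result of the at least one second resource pool based on the first task response information.
  17. The method according to claim 16, wherein before the first maintenance unit receives the first O&M operation indication information, the method further comprises:
    receiving, by the first maintenance unit, first user indication information, wherein the first user indication information instructs the first maintenance unit to act as a management node; and sending, in response to the first user indication information, first docking request information to each second maintenance unit, wherein the first docking request information indicates that the first maintenance unit acts as the management node of the cloud service system; and
    determining, by the second maintenance unit in response to the received first docking request information, that the first maintenance unit is the management node and the second maintenance unit is a managed node; and sending first docking response information to the first maintenance unit, wherein the first docking response information comprises O&M information of the second maintenance unit.
  18. The method according to claim 16, wherein the method further comprises:
    if the first maintenance unit becomes abnormal, receiving, by a third maintenance unit of the at least one second maintenance unit, second user indication information, wherein the second user indication information instructs the third maintenance unit to act as a management node;
    sending, by the third maintenance unit in response to the second user indication information, second docking request information to each second maintenance unit, wherein the second docking request information indicates that the third maintenance unit acts as the management node of the cloud service system;
    determining, by the second maintenance unit in response to the received second docking request information, that the third maintenance unit is the management node and the second maintenance unit is a managed node; and
    sending, by the second maintenance unit, second docking response information to the third maintenance unit, wherein the second docking response information comprises O&M information of the second maintenance unit.
  19. The method according to claim 18, further comprising:
    receiving, by the third maintenance unit, second O&M operation indication information, wherein the second O&M operation indication information indicates to obtain resource state evaluation results of the plurality of resource pools;
    sending, by the third maintenance unit based on the second O&M operation indication information, second task request information to the at least one second maintenance unit, wherein the second task request information instructs the at least one second maintenance unit to return a resource state evaluation result of the corresponding second resource pool;
    obtaining, by the third maintenance unit based on the second O&M operation indication information, a resource state of a third resource pool, and obtaining a resource state evaluation result of the third resource pool based on the resource state of the third resource pool, wherein the third maintenance unit is deployed in the third resource pool;
    obtaining, by the second maintenance unit in response to the received second task request information, the resource state of the second resource pool; obtaining the resource state evaluation result of the second resource pool based on the resource state of the second resource pool; and sending second task response information to the third maintenance unit, wherein the second task response information comprises the resource state evaluation result of the second resource pool; and
    receiving, by the third maintenance unit, the second task response information returned by the at least one second maintenance unit, and obtaining the resource state evaluation result of the at least one second resource pool based on the second task response information.
  20. The method according to any one of claims 16 to 19, wherein if communication between any second maintenance unit and the first maintenance unit is abnormal, the second resource pool is managed by the second maintenance unit with the communication abnormality.
  21. The method according to any one of claims 16 to 20, wherein the plurality of maintenance units comprise a third maintenance unit and a fourth maintenance unit, and the method further comprises:
    receiving, by the third maintenance unit, a first unit update instruction;
    updating, by the third maintenance unit based on the first unit update instruction, a specified O&M capability of the third maintenance unit;
    receiving, by the fourth maintenance unit, a second unit update instruction; and
    updating, by the fourth maintenance unit based on the second unit update instruction, a specified O&M capability of the fourth maintenance unit;
    wherein the O&M capabilities to be updated indicated by the first unit update instruction and the second unit update instruction are the same or different.
  22. A computing device cluster, comprising at least one computing device, wherein each computing device comprises a processor and a memory;
    the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to any one of claims 9 to 21.
  23. A computer-readable storage medium, wherein:
    the computer-readable storage medium is configured to store instructions or a computer program; and when the instructions or the computer program are executed, the method according to any one of claims 9 to 21 is implemented.
  24. A computer program product, comprising instructions or a computer program, wherein
    when the instructions or the computer program are executed, the method according to any one of claims 9 to 21 is implemented.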
PCT/CN2024/073866 2023-03-07 2024-01-24 Communication method and cloud service system WO2024183493A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310213434.9A CN118660051A (zh) 2023-03-07 2023-03-07 Communication method and cloud service system
CN202310213434.9 2023-03-07

Publications (1)

Publication Number Publication Date
WO2024183493A1 true WO2024183493A1 (zh) 2024-09-12

Family

ID=92674083

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/073866 WO2024183493A1 (zh) 2023-03-07 2024-01-24 Communication method and cloud service system

Country Status (2)

Country Link
CN (1) CN118660051A (zh)
WO (1) WO2024183493A1 (zh)

Also Published As

Publication number Publication date
CN118660051A (zh) 2024-09-17

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24766216

Country of ref document: EP

Kind code of ref document: A1