CN101594383B - Method for monitoring service and status of controllers of double-controller storage system - Google Patents
- Publication number
- CN101594383B
- Authority
- CN
- China
- Prior art keywords
- controller
- state
- service
- module
- active
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Hardware Redundancy (AREA)
Abstract
The invention discloses a method for monitoring the services and controller states of a dual-controller storage system. Three software function modules are defined: a node communication (COMM) module, a cluster service management (CSM) module and a local service management (LSM) module, which communicate with one another. The COMM module receives information from the CSM module and transfers it to the COMM module of the other controller over the inter-controller communication medium. The CSM module receives the peer controller's information from the COMM module and the running-state information of the local service groups from the LSM module, decides the state value of its controller, and sends that value to the COMM module and the LSM module. The LSM module obtains the state value from the COMM module, adjusts the local controller's service groups accordingly, periodically checks their running condition, and reports it to the CSM module. After system start-up, each controller determines the state of the local controller. The method thereby provides automatic failover between the controllers and continuous availability of the storage service.
Description
Technical field
The present invention relates to computer storage and service-monitoring technology, and specifically to a method for monitoring the services and controller states of a dual-controller storage system.
Technical background
With the wide application and development of computer information technology, the reliability of computation and data has become central to information systems, and the reliability requirements on computer storage devices and systems keep rising. In a storage device or system with a single controller, if the controller fails (generally a hardware fault), the storage service becomes unavailable, interrupting data services and possibly even damaging data integrity. Storage devices and systems in which two controllers share multiple disk arrays (RAID) provide mutual redundancy of data and services: when one controller of a dual-controller storage system fails, the other controller should detect the failure and take over all of its services. How to monitor service running state and perform failover effectively and reliably is the problem that dual-controller storage systems must solve.
Summary of the invention
The purpose of this invention is to provide a method for monitoring the services and controller states of a dual-controller storage system.
The objective of the invention is achieved as follows. Addressing the need for service monitoring in dual-controller storage systems, the invention solves the monitoring and switching of service running states. It represents the running state of a controller's services concisely as a controller state value and uses the controller state to drive the starting and stopping of services, in contrast to monitoring schemes that use individual services as the unit of switching.
To achieve this objective, the two controller nodes communicate with each other to share state and service-running information across the dual-controller system. When the services of one node run abnormally, or its periodic heartbeat message times out, the normal node switches state according to the design and, taking that state as the target, adjusts which services run and which stop on the controller;
Three software function modules are defined: the node communication module COMM, the cluster service management module CSM and the local service management module LSM. The modules can communicate with one another. The main function of the COMM module is to receive information from the CSM module and transfer it to the COMM module of the other controller over the inter-controller communication medium. The CSM module receives the peer controller's information from the COMM module, receives running-state information of the local service groups from the LSM module, decides the state value of its controller, and sends it to the COMM module and the LSM module. The LSM module obtains the state value from the COMM module, adjusts the local controller's service groups, periodically checks their running condition, and notifies the CSM module. The two controllers run in one of the state pairs active:active, takeover:standby or standby:takeover;
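The text leaves the format of the information that CSM hands to COMM for transfer to the peer unspecified. As a minimal sketch, assuming JSON over the inter-controller link and tracking status per service group rather than per service, the node information could be modelled as below; `ControllerState`, `NodeInfo` and all field names are hypothetical.

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class ControllerState(Enum):
    INACTIVE = "inactive"
    ACTIVE = "active"
    TAKEOVER = "takeover"
    STANDBY = "standby"


@dataclass
class NodeInfo:
    """Hypothetical node transmission packet built by CSM and sent by COMM."""
    controller_id: int                      # 0 or 1
    state: ControllerState
    # service group id -> True if its services are running, False if stopped
    groups_running: dict[int, bool] = field(default_factory=dict)

    def encode(self) -> bytes:
        """Serialize for the inter-controller link (JSON is an assumption)."""
        return json.dumps({
            "id": self.controller_id,
            "state": self.state.value,
            # JSON object keys must be strings
            "groups": {str(g): up for g, up in self.groups_running.items()},
        }).encode("utf-8")

    @staticmethod
    def decode(data: bytes) -> "NodeInfo":
        """Inverse of encode(), as the receiving COMM module might apply it."""
        obj = json.loads(data.decode("utf-8"))
        return NodeInfo(
            controller_id=obj["id"],
            state=ControllerState(obj["state"]),
            groups_running={int(g): up for g, up in obj["groups"].items()},
        )
```

For example, controller 0 in the active state might send `NodeInfo(0, ControllerState.ACTIVE, {0: True})`; the receiving side decodes it and passes it on to its CSM module.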
After the local controller state has been determined, the LSM module periodically queries the controller state and checks whether the running state of the local service groups is consistent with it. If it is not, the LSM module performs a consistency adjustment: for each abnormal service it attempts the required start or stop at most N times, where N is a preset value greater than or equal to 0; the adjustment may fail. The CSM module obtains the running state of the service groups from the LSM module and, through the COMM module, cyclically receives and periodically sends node transmission information. This information includes the controller state and the running state (running or stopped) of each service in the service groups. The CSM module records the time at which each piece of node information sent by the peer is received. If services on the peer controller run abnormally, or no node information has been received from the peer within a preset time, the controller takes over the peer's service groups and changes its own state to takeover;
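The LSM consistency adjustment can be sketched as a bounded retry loop. This is a minimal illustration, assuming the caller has already derived from the controller state which services should be running; `reconcile` and the `is_running`, `start` and `stop` callables are hypothetical stand-ins for each service's status-query, start and stop operations.

```python
from typing import Callable


def reconcile(
    should_run: dict[str, bool],
    is_running: Callable[[str], bool],
    start: Callable[[str], None],
    stop: Callable[[str], None],
    max_attempts: int,
) -> dict[str, bool]:
    """Align each service with its state-derived target, trying at most N times.

    should_run maps service name -> True if the controller state says it
    should be running. Returns service name -> True if the service is
    consistent after adjustment (the adjustment may fail).
    """
    if max_attempts < 0:
        raise ValueError("N must be >= 0")
    consistent = {}
    for name, target in should_run.items():
        attempts = 0
        while is_running(name) != target and attempts < max_attempts:
            # Start a service that should run, stop one that should not.
            (start if target else stop)(name)
            attempts += 1
        consistent[name] = is_running(name) == target
    return consistent
```

With N = 0 the function only reports inconsistencies without acting; a real LSM would presumably call it from a periodic timer and forward the result to CSM.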
When this controller changes state during operation because it has detected that the peer is abnormal, and the peer controller's services must be taken over, a preset program is executed before the takeover. This program generally triggers an electronic switch to restart or stop the peer controller, completely isolating the peer controller from the service resources;
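The "preset program before takeover" rule can be sketched as follows. The patent does not say what happens if the program fails; this sketch assumes the conservative choice of not taking over in that case, so that both controllers never drive the shared resources at once. `run_fence_program` and `fence_then_takeover` are hypothetical names, and the command passed in would be whatever site-specific program triggers the electronic switch.

```python
import subprocess
from typing import Callable, Sequence


def run_fence_program(cmd: Sequence[str], timeout_s: float = 30.0) -> bool:
    """Run the preset isolation program; True only if it exits with status 0."""
    try:
        return subprocess.run(list(cmd), timeout=timeout_s, check=False).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def fence_then_takeover(fence: Callable[[], bool], take_over: Callable[[], None]) -> bool:
    """Isolate the peer first; take over its service groups only if that worked.

    Refusing to take over after a failed fence is an assumption of this sketch.
    """
    if not fence():
        return False
    take_over()
    return True
```

A caller might wire it as `fence_then_takeover(lambda: run_fence_program(["/path/to/fence-peer"]), start_peer_groups)`, where both the path and `start_peer_groups` are placeholders.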
To switch the dual-controller system from the state in which one controller is takeover and the other standby to the state in which both controllers are active, i.e. (active, active), a detection flag called the extended state value is added and transmitted together with the heartbeat message. While this extended state value equals a preset special value, a controller ignores the peer's state changes and does not change its own state automatically. During the switch, the extended state value is set to the preset special value and each node's state value is changed to active; finally the extended state value is cleared;
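A minimal sketch of the extended-state mechanism follows. It assumes a hypothetical flag value `EXT_SWITCHING` and a simplified `Controller` class; the text specifies only that the flag travels with the heartbeat and suppresses automatic state changes, not what the normal decision logic looks like, so that logic is reduced to a log entry here.

```python
from dataclasses import dataclass, field

EXT_SWITCHING = 0x5A  # hypothetical "preset special value"
EXT_CLEAR = 0


@dataclass
class Controller:
    cid: int
    state: str                  # "inactive" | "active" | "takeover" | "standby"
    ext: int = EXT_CLEAR        # extended state value, sent with each heartbeat
    log: list[str] = field(default_factory=list)

    def on_peer_state(self, peer_state: str, peer_ext: int) -> None:
        """React to a peer state change carried in a heartbeat (simplified)."""
        if peer_ext == EXT_SWITCHING or self.ext == EXT_SWITCHING:
            # Planned switch in progress: do not change our own state automatically.
            self.log.append(f"ignored peer={peer_state}")
            return
        # The normal automatic decision logic would run here.
        self.log.append(f"evaluated peer={peer_state}")


def switch_to_active_active(a: Controller, b: Controller) -> None:
    """Move a (takeover, standby) pair to (active, active) under the extended flag."""
    if {a.state, b.state} != {"takeover", "standby"}:
        raise ValueError("expected one takeover and one standby controller")
    a.ext = b.ext = EXT_SWITCHING
    for me, peer in ((a, b), (b, a)):
        me.state = "active"
        peer.on_peer_state(me.state, me.ext)  # heartbeat carries the flag
    a.ext = b.ext = EXT_CLEAR                 # finally cancel the extended value
```

Without the flag, the intermediate state in which one controller is already active and the other not yet would be visible to the automatic logic, which could react to it.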
To stop all controllers of the dual-controller system, the method is similar to the switch from (takeover, standby) to (active, active): the extended state value is used and the state is set to inactive, and after all corresponding services have stopped, the COMM, CSM and LSM modules are stopped;
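The shutdown sequence can be sketched as three passes over the controllers. This assumes a hypothetical flag value and a `stop_groups` callback that returns only once a controller's services have stopped; the module order simply follows the order in which the text names them and is not necessarily required.

```python
from dataclasses import dataclass
from typing import Callable

EXT_SWITCHING = 0x5A                 # hypothetical preset extended state value
MODULES = ("COMM", "CSM", "LSM")     # modules named in the text


@dataclass
class Ctl:
    cid: int
    state: str
    ext: int = 0


def stop_all(ctls: list[Ctl], stop_groups: Callable[[int], None]) -> list[str]:
    """Shut down every controller: freeze, go inactive, stop services, stop modules."""
    actions = []
    for c in ctls:
        c.ext = EXT_SWITCHING        # peers must not react to the coming changes
    for c in ctls:
        c.state = "inactive"         # inactive: no service group should run
        stop_groups(c.cid)           # assumed to return once the services are down
        actions.append(f"ctl{c.cid}: services stopped")
    for c in ctls:                   # only after all services are down
        for m in MODULES:
            actions.append(f"ctl{c.cid}: stop {m}")
    return actions
```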
To restart or stop a single controller of the dual-controller storage system while both controllers are running, the state of the controller that is not being stopped is first set to takeover, and the state of the controller to be restarted or stopped is then set to inactive. The specific control steps for determining the local controller state at start-up are as follows:
S1. Set the local controller state to inactive, start no service group, and begin receiving and sending state information;
S2. Wait a number of seconds, determined by the maximum difference in system start-up times; the purpose is to ensure that, when the two controllers start at the same time, they eventually reach consistent states;
S3. If the peer's state is also inactive, indicating that the peer is also starting, set the local controller state to active and go to step S7;
S4. If the peer's state is active, set the local controller state to active and go to step S7;
S5. If no information is received from the peer, or the peer's state is takeover (indicating that the peer is already running all service groups), set the local controller state to standby and go to step S7;
S6. If the peer's state is standby, set the local controller state to takeover and go to step S7;
S7. Adjust the running state of the service groups according to the local controller state, and periodically check the local controller state value and the running condition of the service groups.
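Steps S1 to S7 can be sketched as a small decision function plus a start-up wrapper. The `publish` and `read_peer_state` callbacks are hypothetical stand-ins for sending and receiving state information through COMM, and a `None` peer state models "no information received".

```python
import time
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    INACTIVE = "inactive"
    ACTIVE = "active"
    TAKEOVER = "takeover"
    STANDBY = "standby"


def decide_startup_state(peer: Optional[State]) -> State:
    """Steps S3-S6; peer is None when no information was received from the peer."""
    if peer in (State.INACTIVE, State.ACTIVE):
        return State.ACTIVE            # S3: peer also starting / S4: peer active
    if peer is None or peer is State.TAKEOVER:
        return State.STANDBY           # S5: peer silent or already runs all groups
    if peer is State.STANDBY:
        return State.TAKEOVER          # S6
    raise ValueError(peer)


def start_controller(
    publish: Callable[[State], None],
    read_peer_state: Callable[[], Optional[State]],
    wait_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> State:
    publish(State.INACTIVE)            # S1: no service groups, exchange state info
    sleep(wait_s)                      # S2: wait for the maximum start-up time difference
    local = decide_startup_state(read_peer_state())
    publish(local)                     # S7 then adjusts groups and monitors periodically
    return local
```

If both controllers boot together, each sees the other as inactive after S2 and both become active, which yields the normal (active, active) pair.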
Advantages of the invention: the controller state is coupled to the running condition of the services, which makes the state of a dual-controller storage system easy to inspect and maintain; the method is particularly suitable for two-node dual-controller storage systems.
Description of drawings
Fig. 1 shows the software module structure;
Fig. 2 is the flow chart of controller state monitoring at system start-up;
Fig. 3 is the state transition diagram of the dual-controller storage system during operation.
Embodiment
The method of the invention is described in detail below with reference to the accompanying drawings.
In the dual-controller storage system, both controllers are connected to the same disk group. Communication media between the controllers are required; they generally include directly connected Ethernet cards, directly connected serial ports, or a shared LAN. Both controllers run the monitoring program that implements the method of the invention.
The services provided by the dual-controller system are divided into two groups, denoted ServiceGroup0 and ServiceGroup1; each service group generally comprises an iSCSI Target service and an FC Target service. Every service in a service group supports start, stop and status-query operations. When a service becomes abnormal, or a controller failure causes services to be switched to the other controller, switching is performed in units of the service group containing the service.
To distinguish the two controllers, they are identified as controller 0 and controller 1. The monitoring program defines four controller state types: inactive, active, takeover and standby. The state value inactive means that the controller provides no service and is not ready to provide service. The state value active means that controller 0 should run all services of service group 0, and controller 1 should run all services of service group 1. A controller whose state value is takeover runs all services of both service group 0 and service group 1. A controller whose state value is standby provides the services of neither service group, but it can be switched to active manually by the system administrator, or it switches to takeover on its own when it detects that the other controller's heartbeat has timed out.
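The state definitions above amount to a lookup from (controller ID, state value) to the service groups that controller should run. A minimal sketch, with the hypothetical function name `groups_to_run`:

```python
def groups_to_run(controller_id: int, state: str) -> set[int]:
    """Service groups a controller should run for a given state value."""
    if controller_id not in (0, 1):
        raise ValueError("controller id must be 0 or 1")
    if state == "active":
        return {controller_id}   # controller 0 -> group 0, controller 1 -> group 1
    if state == "takeover":
        return {0, 1}            # all services of both groups
    if state in ("inactive", "standby"):
        return set()             # no service group
    raise ValueError(f"unknown state {state!r}")
```

The LSM module would compare this target set with the groups actually running to decide what to start or stop.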
Therefore, under normal conditions, the two controllers run in one of the state pairs (active, active), (takeover, standby) or (standby, takeover).
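One way to see why exactly these three pairs are the normal ones is to check that, under each of them, every service group is run by exactly one controller. The sketch below, with hypothetical names, maps a pair (state of controller 0, state of controller 1) to the controllers running each group.

```python
NORMAL_PAIRS = {("active", "active"), ("takeover", "standby"), ("standby", "takeover")}


def owners(pair: tuple[str, str]) -> dict[int, list[int]]:
    """Map each service group to the controllers that would run it for (state0, state1)."""
    result: dict[int, list[int]] = {0: [], 1: []}
    for cid, state in enumerate(pair):
        if state == "active":
            groups = [cid]
        elif state == "takeover":
            groups = [0, 1]
        else:                      # inactive or standby run nothing
            groups = []
        for g in groups:
            result[g].append(cid)
    return result
```

For instance, `owners(("active", "standby"))` leaves group 1 with no controller, while `("takeover", "takeover")` would run both groups twice; neither is a normal pair.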
After the local controller state has been determined, the LSM module periodically queries the controller state and checks whether the running state of the local service groups is consistent with it. If it is not, the LSM module performs a consistency adjustment: for each abnormal service it attempts the required start or stop at most N times, where N is a preset value greater than or equal to 0; the adjustment may fail. The CSM module obtains the running state of the service groups from the LSM module and, through the COMM module, cyclically receives and periodically sends node transmission information. This information includes the controller state and the running state (running/stopped) of each service in the service groups. The CSM module records the time at which each piece of node information sent by the peer is received. If services on the peer controller run abnormally, or no node information has been received from the peer within a preset time, the controller takes over the peer's service groups (if any) and changes its own state to takeover.
When this controller changes state during operation because it has detected that the peer is abnormal, and the peer controller's services must be taken over, a preset program is executed before the takeover. This program generally triggers an electronic switch to restart or stop the peer controller, completely isolating the peer controller from the service resources.
When the dual-controller system has one controller in takeover and the other in standby, it can be switched to the state in which both controllers are active, (active, active). To do so, a detection flag called the extended state value is added and transmitted together with the heartbeat message. While this extended state value equals a preset special value, a controller ignores the peer's state changes and does not change its own state automatically. During the switch, the extended state value is set to the preset special value and each node's state value is changed to active; finally the extended state value is cleared.
To stop all controllers of the dual-controller system, the method is similar to the switch from (takeover, standby) to (active, active): the extended state value is used and the state is set to inactive, and after all corresponding services have stopped, the COMM, CSM, LSM and related modules are stopped.
To restart or stop a single controller of the dual-controller storage system while both controllers are running, the state of the controller that is not being stopped is first set to takeover, and the state of the controller to be restarted or stopped is then set to inactive.
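The ordering for taking one controller out of service can be sketched as below. `take_one_offline` is a hypothetical helper operating on a plain dictionary of state values; in a real system each assignment would go through the CSM module of the respective controller.

```python
def take_one_offline(states: dict[int, str], target: int) -> list[tuple[int, str]]:
    """Restart/stop one controller while both run: survivor -> takeover, then target -> inactive."""
    if set(states) != {0, 1} or target not in states:
        raise ValueError("expected controllers 0 and 1")
    if "inactive" in states.values():
        raise ValueError("both controllers must be running")
    survivor = 1 - target
    steps = []
    for cid, new_state in ((survivor, "takeover"), (target, "inactive")):
        states[cid] = new_state    # order follows the text: the survivor is switched first
        steps.append((cid, new_state))
    return steps
```

Switching the survivor first means that at no point are both controllers in a state that runs no service group.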
Each controller uses its state value to represent the overall running condition of its services, and uses the state value as the target that drives the starting and stopping of services on the controller.
The dual-controller system provides two groups of services in total, denoted service group 0 and service group 1. Each group consists of one or more services, and when services are handed over, the whole service group is switched.
The controllers are identified by IDs 0 and 1. When both controllers are normal in software and hardware, both can provide services: controller 0 provides the services of service group 0 and controller 1 those of service group 1, and both controllers have the state value active.
The controller state value inactive means that the controller provides no service and is not ready to provide service. The state value active means that controller 0 should run all services of service group 0, and controller 1 should run all services of service group 1. A controller whose state value is takeover runs all services of both service group 0 and service group 1. A controller whose state value is standby provides the services of neither service group, but it can be switched to active manually by the system administrator, or it switches to takeover on its own when it detects that the other controller's heartbeat has timed out.
While the dual-controller system is running, each controller periodically checks the running state of the services on itself and periodically sends its state and service-running condition to the peer node; this is called heartbeat transmission. When no heartbeat transmission has been received from the peer for longer than a preset time, or any service in the service groups the peer should normally run is abnormal, the controller takes over the peer's services and changes its own state.
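The heartbeat rule can be sketched as a small monitor object. This assumes the caller derives from the peer's state which groups the peer should run (`expected`) and reports which of them actually run (`running`); `PeerMonitor` and its methods are hypothetical, and the injectable `clock` exists only so the timeout can be tested without waiting.

```python
import time
from typing import Callable


class PeerMonitor:
    """Hypothetical CSM-side tracking of the peer's heartbeat."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s      # preset heartbeat timeout
        self.clock = clock
        self.last_seen = clock()        # start timing when monitoring begins
        self.peer_abnormal = False

    def on_heartbeat(self, expected: set[int], running: set[int]) -> None:
        """Record receipt time; flag the peer if a group it should run is not running."""
        self.last_seen = self.clock()
        self.peer_abnormal = not expected <= running

    def next_local_state(self, local_state: str) -> str:
        """Take over on heartbeat timeout or abnormal peer services."""
        timed_out = self.clock() - self.last_seen > self.timeout_s
        if local_state in ("active", "standby") and (timed_out or self.peer_abnormal):
            return "takeover"
        return local_state
```

Per the text, a controller that decides to take over would first run the preset isolation program before starting the peer's service groups.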
Each controller periodically checks the running state of the services on itself; if a service that should be running according to the state has stopped, it makes up to N preset attempts to restart that service, where N is greater than or equal to 0.
Before taking over the other controller's service groups, a controller may execute a preset program, which is generally used to isolate, both in software and electrically, the other controller's access to the shared resources of the dual-controller system.
Claims (7)
1. A method for monitoring the services and controller states of a dual-controller storage system, characterized in that the two controller nodes communicate with each other to share state and service-running information across the dual-controller system; when the services of one node run abnormally or its periodic heartbeat message times out, the normal node switches state according to the design and, taking that state as the target, adjusts which services run and which stop on the controller;
three software function modules are defined: the node communication module COMM, the cluster service management module CSM and the local service management module LSM, which communicate with one another; the function of the COMM module is to receive information from the CSM module and transfer it to the COMM module of the other controller over the inter-controller communication medium; the CSM module receives the peer controller's information from the COMM module, receives running-state information of the local service groups from the LSM module, decides the state value of its controller, and sends it to the COMM module and the LSM module; the LSM module obtains the state value from the COMM module, adjusts the local controller's service groups, periodically checks their running condition, and finally notifies the CSM module;
after system start-up, each controller determines the state of the local controller, and the two controllers run in the state pair active:active, takeover:standby or standby:takeover; wherein the controller state value inactive means that the controller provides no service and is not ready to provide service; the state value active means that controller 0 should run all services of service group 0 and controller 1 should run all services of service group 1; a controller whose state value is takeover runs all services of both service group 0 and service group 1; a controller whose state value is standby provides the services of neither service group, but can be switched to active manually by the system administrator, or switches to takeover on its own when it detects that the other controller's heartbeat has timed out;
after the local controller state has been determined, the LSM module periodically queries the controller state and checks whether the running state of the local service groups is consistent with it; if it is not, the LSM module performs a consistency adjustment, attempting the required start or stop of each abnormal service at most N times, where N is a preset value greater than or equal to 0, and the adjustment may fail; the CSM module obtains the running state of the service groups from the LSM module and, through the COMM module, cyclically receives and periodically sends node transmission information, which includes the controller state and the running state, i.e. running or stopped, of each service in the service groups; the CSM module records the time at which each piece of node information sent by the peer is received; if services on the peer controller run abnormally, or no node information has been received from the peer within a preset time, the controller takes over the peer's service groups and changes its own state to takeover;
when this controller changes state during operation because it has detected that the peer is abnormal, and the peer controller's services must be taken over, a preset program is executed before the takeover; this program triggers an electronic switch to restart or stop the peer controller, completely isolating the peer controller from the service resources;
to switch the dual-controller system from the state in which one controller is takeover and the other standby to the state in which both controllers are active, i.e. (active, active), a detection flag called the extended state value is added and transmitted together with the heartbeat message; while this extended state value equals a preset special value, a controller ignores the peer's state changes and does not change its own state automatically; during the switch, the extended state value is set to the preset special value and each node's state value is changed to active, and finally the extended state value is cleared;
to stop all controllers of the dual-controller system, the method is similar to the switch from (takeover, standby) to (active, active): the extended state value is used and the state is set to inactive, and after all corresponding services have stopped, the COMM, CSM and LSM modules are stopped;
to restart or stop a single controller of the dual-controller storage system while both controllers are running, the state of the controller that is not being stopped is first set to takeover, and the state of the controller to be restarted or stopped is then set to inactive; the specific control steps for determining the local controller state at start-up are as follows:
S1. Set the local controller state to inactive, start no service group, and begin receiving and sending state information;
S2. Wait a number of seconds, determined by the maximum difference in system start-up times; the purpose is to ensure that, when the two controllers start at the same time, they eventually reach consistent states;
S3. If the peer's state is also inactive, indicating that the peer is also starting, set the local controller state to active and go to step S7;
S4. If the peer's state is active, set the local controller state to active and go to step S7;
S5. If no information is received from the peer, or the peer's state is takeover, indicating that the peer is already running all service groups, set the local controller state to standby and go to step S7;
S6. If the peer's state is standby, set the local controller state to takeover and go to step S7;
S7. Adjust the running state of the service groups according to the local controller state, and periodically check the local controller state value and the running condition of the service groups.
2. The method according to claim 1, characterized in that each controller uses a state value to represent the overall running condition of its services, and uses the state value as the target that drives the starting and stopping of services on the controller.
3. The method according to claim 1, characterized in that the dual-controller system provides two groups of services in total, denoted service group 0 and service group 1; each group consists of one or more services, and the whole service group is switched when services are handed over.
4. The method according to claim 1, characterized in that the controllers are identified by IDs 0 and 1; when both controllers are normal in software and hardware, both can provide services, controller 0 providing the services of service group 0 and controller 1 those of service group 1, and both controllers have the state value active.
5. The method according to claim 1, characterized in that, while the dual-controller system is running, each controller periodically checks the running state of the services on itself and periodically sends its state and service-running condition to the peer node, which is called heartbeat transmission; when no heartbeat transmission has been received from the peer for longer than a preset time, or any service in the service groups the peer should normally run is abnormal, the controller takes over the peer's services and changes its own state.
6. The method according to claim 1, characterized in that each controller periodically checks the running state of the services on itself; if a service that should be running according to the state has stopped, it makes up to N preset attempts to restart that service, where N is greater than or equal to 0.
7. The method according to claim 1, characterized in that, before taking over the other controller's service groups, a controller may execute a preset program, which is used to isolate, both in software and electrically, the other controller's access to the shared resources of the dual-controller system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910017117XA CN101594383B (en) | 2009-07-09 | 2009-07-09 | Method for monitoring service and status of controllers of double-controller storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101594383A (en) | 2009-12-02 |
CN101594383B (en) | 2012-05-23 |
Family ID: 41408820
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101594383B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6007522B2 (en) * | 2012-03-09 | 2016-10-12 | 日本電気株式会社 | Cluster system |
CN104536853B (en) * | 2015-01-09 | 2016-07-27 | 浪潮电子信息产业股份有限公司 | Device for guaranteeing continuous availability of resources of dual-controller storage equipment |
CN104731727B (en) * | 2015-03-25 | 2017-05-31 | 浪潮集团有限公司 | A kind of dual control storage system monitoring management system and method |
CN105912416B (en) * | 2016-04-07 | 2019-06-28 | 珠海市魅族科技有限公司 | A kind of method and terminal monitoring processor in the terminal |
CN107423167A (en) * | 2017-07-31 | 2017-12-01 | 郑州云海信息技术有限公司 | A kind of ISCSI target redundancy control methods and system based on dual control storage |
CN109672544B (en) * | 2017-10-13 | 2020-12-11 | 杭州海康威视系统技术有限公司 | Data processing method and device and distributed storage system |
CN107678891B (en) * | 2017-10-13 | 2021-06-29 | 郑州云海信息技术有限公司 | Double control method and device of storage system and readable storage medium |
CN107807868A (en) * | 2017-10-13 | 2018-03-16 | 郑州云海信息技术有限公司 | A kind of dual control storage system disaster dump method of testing and system |
CN110545197B (en) * | 2018-05-29 | 2022-09-09 | 杭州海康威视系统技术有限公司 | Node state monitoring method and device |
CN108958990B (en) * | 2018-07-24 | 2021-10-15 | 郑州云海信息技术有限公司 | Method and device for improving reliability of field replaceable unit information |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101237315A (en) * | 2008-02-28 | 2008-08-06 | 浪潮电子信息产业股份有限公司 | A synchronous detection and failure separation method for dual control high-availability system |
CN101296183A (en) * | 2008-04-29 | 2008-10-29 | 北京泰得思达科技发展有限公司 | Data transmission system of double-controller system |
CN101382872A (en) * | 2008-10-21 | 2009-03-11 | 浪潮电子信息产业股份有限公司 | Double-control storage and switch control method for SAS and SATA signal by detecting heartbeat |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI560558B (en) * | 2015-06-08 | 2016-12-01 | Synology Inc | Method for managing a storage system, and associated apparatus |
US9858135B2 (en) | 2015-06-08 | 2018-01-02 | Synology Incorporated | Method and associated apparatus for managing a storage system |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |