CN101594383B - Method for monitoring service and status of controllers of double-controller storage system - Google Patents

Info

Publication number
CN101594383B
Authority
CN
China
Prior art keywords
controller
state
service
module
active
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN200910017117XA
Other languages
Chinese (zh)
Other versions
CN101594383A (en)
Inventor
施培任
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Langchao Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Langchao Electronic Information Industry Co Ltd
Priority to CN200910017117XA
Publication of CN101594383A
Application granted
Publication of CN101594383B
Legal status: Active
Anticipated expiration

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The invention discloses a method for monitoring the services and controller states of a dual-controller storage system. Three software function modules are defined: a node communication (COMM) module, a cluster service management (CSM) module, and a local service management (LSM) module. The modules communicate with one another. The COMM module receives information from the CSM module and transfers it to the COMM module of the other controller over the inter-controller communication medium. The CSM module receives information about the peer controller from the COMM module, receives the running status of the local service group from the LSM module, decides the state value of the local controller, and sends that state value to the COMM module and the LSM module. The LSM module obtains the state value from the COMM module, adjusts and periodically checks the running status of the local controller's service group, and finally notifies the CSM module of that status. After system startup, each controller decides its own state, which enables automatic controller failover and continuous availability of the storage service.

Description

A method for monitoring the services and controller states of a dual-controller storage system
Technical field
The present invention relates to computer storage and service monitoring technology, and specifically to a method for monitoring the services and controller states of a dual-controller storage system.
Technical background
With the widespread use and development of computerized information, the reliability of computation and data has become central to information systems, and the reliability requirements placed on computer storage devices and systems keep rising. In a storage device or system with a single controller, if the controller fails (usually a hardware fault), the storage service becomes unavailable, which interrupts data services and may even damage data integrity. Storage devices and systems in which two controllers share multiple RAID disk arrays provide mutual redundancy of data and services: when one controller of a dual-controller storage system fails, the other controller should detect the failure and take over all of its services. How to monitor service running status and perform failover effectively and reliably is the problem a dual-controller storage system has to solve.
Summary of the invention
The purpose of the present invention is to provide a method for monitoring the services and controller states of a dual-controller storage system.
The purpose is achieved as follows. To meet the service monitoring needs of a dual-controller storage system, the invention addresses the monitoring and switching of service running states. It uses a controller state value as a concise representation of the services running on a controller, uses the controller state to drive the starting and stopping of services, and switches services in units of service groups.
To achieve this purpose, the two controller nodes share state and service running information through inter-node communication. When a node's services run abnormally or its periodic heartbeat times out, the healthy node switches state as designed and, taking the new state as the target, starts and stops the corresponding services on its controller.
Three software function modules are defined: the node communication module COMM, the cluster service management module CSM, and the local service management module LSM. The modules can communicate with one another. The main function of the COMM module is to receive information from the CSM module and transmit it to the COMM module of the other controller over the inter-controller communication medium. The CSM module receives peer controller information from the COMM module, receives the running status of the local service group from the LSM module, decides the state value of the local controller, and sends it to the COMM module and the LSM module. The LSM module obtains the state value from the COMM module, adjusts and periodically checks the running status of the local controller's service group, and notifies the CSM module. The running states of the two controllers are one of active:active, takeover:standby, or standby:takeover.
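The information flow among the three modules can be pictured with a small in-process sketch. It is only an illustration under stated assumptions: the class names `Comm`, `Csm`, and `Lsm`, the methods `send_to_peer`, `report`, and `publish`, and the dictionary message layout are all hypothetical, and the real communication medium is replaced by a Python list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comm:
    """Node communication module: relays CSM messages to the peer's COMM."""
    peer: Optional["Comm"] = None
    inbox: List[dict] = field(default_factory=list)

    def send_to_peer(self, message: dict) -> None:
        # Stand-in for the real medium (Ethernet, serial line, LAN).
        if self.peer is not None:
            self.peer.inbox.append(message)


@dataclass
class Lsm:
    """Local service management module: holds the state it must enforce."""
    state: str = "inactive"

    def report(self) -> dict:
        # A real LSM would query every service; this stub reports "running".
        return {"group_status": "running"}


@dataclass
class Csm:
    """Cluster service management module: decides and distributes the state."""
    comm: Comm
    lsm: Lsm
    state: str = "inactive"

    def publish(self, new_state: str) -> None:
        # The decided state value goes to the LSM and, via COMM, to the peer.
        self.state = new_state
        self.lsm.state = new_state
        self.comm.send_to_peer({"state": new_state, **self.lsm.report()})


comm_a, comm_b = Comm(), Comm()
comm_a.peer, comm_b.peer = comm_b, comm_a
csm_a = Csm(comm=comm_a, lsm=Lsm())
csm_a.publish("active")
```

After `publish`, the local LSM holds the new state and the peer's COMM inbox contains one message carrying the state and the service group status.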
After the local controller state has been determined, the LSM module periodically checks whether the controller state and the running status of the local service group are consistent. If they are not, it performs a consistency adjustment: it attempts at most N times to start or stop each abnormal service, where N is a preset value greater than or equal to 0. The adjustment may fail. The CSM module obtains the service group running status from the LSM module and, through the COMM module, receives node information in a loop and sends it at regular intervals. The node information includes the controller state and the running status (running or stopped) of each service in the service group. Each time the CSM module obtains node information from the peer, it records the time of receipt. If the peer controller's services are running abnormally, or no node information has been received from the peer within a preset time, the controller takes over the peer's service group and changes its own state to takeover.
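The bounded consistency adjustment performed by the LSM module can be sketched as below. This is a minimal illustration, not the patented implementation: the function name `reconcile`, its callback parameters, and the `FlakyService` stand-in are hypothetical.

```python
from typing import Callable


def reconcile(should_run: bool,
              is_running: Callable[[], bool],
              start: Callable[[], None],
              stop: Callable[[], None],
              max_attempts: int) -> bool:
    """Try at most max_attempts times to reach the desired running status.

    Returns True if the service ends up consistent with should_run.
    """
    if max_attempts < 0:
        raise ValueError("max_attempts (N) must be >= 0")
    for _ in range(max_attempts):
        if is_running() == should_run:
            return True
        # One adjustment attempt: start or stop, depending on the target.
        (start if should_run else stop)()
    # The adjustment may fail, so report the final status honestly.
    return is_running() == should_run


class FlakyService:
    """Hypothetical service whose first `fail_first` start attempts fail."""

    def __init__(self, fail_first: int) -> None:
        self.running = False
        self.fail_first = fail_first
        self.start_calls = 0

    def start(self) -> None:
        self.start_calls += 1
        if self.start_calls > self.fail_first:
            self.running = True

    def stop(self) -> None:
        self.running = False

    def is_running(self) -> bool:
        return self.running
```

With N = 0 the function only reports whether the service is already consistent, which matches the text's allowance for N equal to 0.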
When a controller changes state during operation because it has detected a peer anomaly and needs to take over the peer controller's services, it runs a preset program before taking over. This program typically triggers an electronic switch to restart or stop the peer controller, completely cutting off the peer controller's control over the service resources.
When the system has one controller in the takeover state and the other in the standby state, it can be switched to the state in which both controllers are active, that is, the (active, active) state. The method is to add a detection flag, called the extended state value, which is transmitted together with the heartbeat information. When the extended state value equals a preset specific value, a controller ignores changes in the peer's state and does not change its own state automatically. During the switch, the extended state value is set to the preset specific value, the state value of each node is changed to active, and finally the extended state value is cleared.
To stop all controllers in the dual-controller system, the method is similar to switching from (takeover, standby) to (active, active): the extended state value is used and the inactive state is set, and after all corresponding services have stopped, the COMM, CSM, and LSM modules are stopped.
To restart or stop a single controller in a dual-controller storage system while both controllers are running, the state of the controller that is not being stopped is first set to takeover, and then the state of the controller to be restarted or stopped is set to inactive. The specific control steps by which each controller determines its state after startup are as follows:
S1. Set the local controller state to inactive, start no service group, and begin sending and receiving state information;
S2. Wait a number of seconds, determined by the maximum difference in system startup times, so that two controllers starting at the same time end up in consistent states;
S3. If the peer's state is also inactive, the peer is also starting up: set the local controller state to active and go to step S7;
S4. If the peer's state is active, set the local controller state to active and go to step S7;
S5. If no information has been received from the peer, or the peer's state is takeover (which indicates that the peer is already running all service groups), set the local controller state to standby and go to step S7;
S6. If the peer's state is standby, set the local controller state to takeover and go to step S7;
S7. Adjust the running status of the service groups according to the local controller state, and periodically check the local controller state value and the running status of the service groups.
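Steps S3 to S6 amount to a lookup from the observed peer state to the initial local state. The sketch below shows that decision only; the `State` enum and the function `initial_state` are hypothetical names, and the waiting in S2 and the periodic checking in S7 are left out.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    INACTIVE = "inactive"
    ACTIVE = "active"
    TAKEOVER = "takeover"
    STANDBY = "standby"


def initial_state(peer: Optional[State]) -> State:
    """Choose the local state after the S2 wait.

    `peer` is the peer state observed so far, or None if no information
    was received from the peer.
    """
    if peer is State.INACTIVE:                   # S3: peer is also starting
        return State.ACTIVE
    if peer is State.ACTIVE:                     # S4
        return State.ACTIVE
    if peer is None or peer is State.TAKEOVER:   # S5
        return State.STANDBY
    if peer is State.STANDBY:                    # S6
        return State.TAKEOVER
    raise ValueError(f"unexpected peer state: {peer!r}")
```

For example, a controller that boots while its peer is already in takeover would call `initial_state(State.TAKEOVER)` and come up as standby.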
The advantage of the invention is that the controller state is coupled to the running status of the services, which makes the state of a dual-controller storage system easy to inspect and maintain. The method is particularly suitable for two-node dual-controller storage systems.
Description of drawings
Fig. 1 shows the software module structure;
Fig. 2 is a flow chart of state monitoring at controller system startup;
Fig. 3 is a state transition diagram of the dual-controller storage system during operation.
Embodiment
The method of the present invention is described in detail below with reference to the accompanying drawings.
In a dual-controller storage system, two controllers are connected to the same group of disks. A communication medium between the controllers is required; it typically comprises directly connected Ethernet cards, a directly connected serial port, or a shared LAN. A monitoring program that carries out the method of the invention runs on both controllers.
The services provided by the dual-controller system are divided into two groups, denoted ServiceGroup0 and ServiceGroup1. Each service group typically comprises an iSCSI Target service and an FC Target service. Every service in a group supports start, stop, and status query operations. When a service becomes abnormal, or a controller failure causes services to be switched to the other controller, switching is done in units of the service group to which the service belongs.
To distinguish the two controllers, they are identified as controller 0 and controller 1. The monitoring program records four controller state types: inactive, active, takeover, and standby. The state value inactive means that the controller is not providing service and is not ready to do so. The state value active means that controller 0 should run all services of service group 0, and controller 1 should run all services of service group 1. The state value takeover means that all services of service group 0 and service group 1 run on this controller. The state value standby means that the controller provides the services of neither group, but can be switched to the active state manually by the system administrator, or switches itself to the takeover state when it detects a heartbeat timeout of the other controller.
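The meaning of each state value can be written as a function from controller ID and state to the set of service groups that should run locally. This is a sketch for illustration; the function name `groups_to_run` is hypothetical, and groups are represented simply by the integers 0 and 1.

```python
from typing import Set


def groups_to_run(controller_id: int, state: str) -> Set[int]:
    """Return the service groups this controller should be running."""
    if controller_id not in (0, 1):
        raise ValueError("controller_id must be 0 or 1")
    if state == "active":
        # Controller 0 runs group 0; controller 1 runs group 1.
        return {controller_id}
    if state == "takeover":
        # Both groups run on this controller.
        return {0, 1}
    if state in ("standby", "inactive"):
        # No service group is provided in either state.
        return set()
    raise ValueError(f"unknown state: {state}")
```

An LSM-style loop could compare this target set with the groups actually running and start or stop the difference.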
Under normal conditions, therefore, the two controllers run in one of the state pairs (active, active), (takeover, standby), or (standby, takeover).
After the local controller state has been determined, the LSM module periodically checks whether the controller state and the running status of the local service group are consistent. If they are not, it performs a consistency adjustment: it attempts at most N times to start or stop each abnormal service, where N is a preset value greater than or equal to 0. The adjustment may fail. The CSM module obtains the service group running status from the LSM module and, through the COMM module, receives node information in a loop and sends it at regular intervals. The node information includes the controller state and the running status (running/stopped) of each service in the service group. Each time the CSM module obtains node information from the peer, it records the time of receipt. If the peer controller's services are running abnormally, or no node information has been received from the peer within a preset time, the controller takes over the peer's service group (if any) and changes its own state to takeover.
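The takeover condition evaluated by the CSM module (heartbeat silence beyond a preset time, or an abnormal peer service) can be sketched as follows. The class `PeerMonitor` and its fields are hypothetical. Time is passed in explicitly to keep the sketch testable; a real monitor would more likely read a monotonic clock. The `services` mapping is assumed to list only the services the peer is supposed to be running.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PeerMonitor:
    """Tracks peer heartbeats for the takeover decision."""
    timeout_s: float            # preset time allowed without peer information
    last_received_at: float     # time of the most recent heartbeat
    peer_services: Dict[str, bool] = field(default_factory=dict)

    def on_heartbeat(self, now: float, services: Dict[str, bool]) -> None:
        # Record the time of receipt together with the reported status.
        self.last_received_at = now
        self.peer_services = dict(services)

    def should_take_over(self, now: float) -> bool:
        timed_out = (now - self.last_received_at) > self.timeout_s
        # Any expected service reported as not running counts as abnormal.
        abnormal = not all(self.peer_services.values())
        return timed_out or abnormal
```

Either condition alone is enough to trigger a takeover, which mirrors the "or" in the text.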
When a controller changes state during operation because it has detected a peer anomaly and needs to take over the peer controller's services, it runs a preset program before taking over. This program typically triggers an electronic switch to restart or stop the peer controller, completely cutting off the peer controller's control over the service resources.
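The ordering requirement, isolate the peer first and only then start its services locally, can be sketched as below. The function `take_over_peer` and its callbacks are hypothetical. Aborting the takeover when the preset program fails is an assumption of this sketch; the source does not describe that case.

```python
from typing import Callable, List


def take_over_peer(fence_peer: Callable[[], bool],
                   start_peer_group: Callable[[], None],
                   log: List[str]) -> str:
    """Run the preset isolation program, then start the peer's service group.

    Returns the resulting local state: "takeover" on success, or
    "unchanged" if isolation failed (assumed behaviour, see lead-in).
    """
    log.append("fence")
    if not fence_peer():
        # Without isolation, both controllers might touch the shared disks.
        return "unchanged"
    log.append("start_peer_group")
    start_peer_group()
    return "takeover"
```

Passing the isolation step as a callback keeps the sketch independent of how the electronic switch is actually triggered.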
When the system has one controller in the takeover state and the other in the standby state, it can be switched to the state in which both controllers are active, that is, the (active, active) state. The method is to add a detection flag, called the extended state value, which is transmitted together with the heartbeat information. When the extended state value equals a preset specific value, a controller ignores changes in the peer's state and does not change its own state automatically. During the switch, the extended state value is set to the preset specific value, the state value of each node is changed to active, and finally the extended state value is cleared.
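The role of the extended state value can be illustrated with a small sketch. The names `Node`, `SWITCHING`, `on_peer_heartbeat`, and `switch_to_active_active` are hypothetical, and so is the choice to suppress automatic reactions when either the local or the received flag is set. The three steps run back to back here, whereas in a real system they would be spread over several heartbeat intervals.

```python
from dataclasses import dataclass
from typing import Optional

SWITCHING = 1  # hypothetical preset value of the extended state


@dataclass
class Node:
    state: str
    extended: Optional[int] = None

    def on_peer_heartbeat(self, peer_extended: Optional[int],
                          automatic_state: str) -> None:
        """Apply the automatic reaction unless a manual switch is under way."""
        if self.extended == SWITCHING or peer_extended == SWITCHING:
            return  # ignore the peer's state change
        self.state = automatic_state


def switch_to_active_active(a: Node, b: Node) -> None:
    a.extended = b.extended = SWITCHING   # 1. raise the flag on both nodes
    a.state = b.state = "active"          # 2. change both state values
    a.extended = b.extended = None        # 3. clear the flag
```

Without the flag, the controller in takeover could react to the standby controller's state change in the middle of the switch and undo it.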
To stop all controllers in the dual-controller system, the method is similar to switching from (takeover, standby) to (active, active): the extended state value is used and the inactive state is set, and after all corresponding services have stopped, the COMM, CSM, LSM, and other modules are stopped.
To restart or stop a single controller in a dual-controller storage system while both controllers are running, the state of the controller that is not being stopped is first set to takeover, and then the state of the controller to be restarted or stopped is set to inactive.
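The order of the two state changes matters: the surviving controller must hold both service groups before the other one goes inactive. A minimal sketch of that plan follows; the function name `single_controller_stop_plan` is hypothetical.

```python
from typing import List, Tuple


def single_controller_stop_plan(target_id: int) -> List[Tuple[int, str]]:
    """Return the ordered (controller_id, new_state) changes for stopping
    or restarting one controller while both are running."""
    if target_id not in (0, 1):
        raise ValueError("target_id must be 0 or 1")
    survivor_id = 1 - target_id
    return [
        (survivor_id, "takeover"),   # first: survivor takes all service groups
        (target_id, "inactive"),     # then: the target stops providing service
    ]
```

Stopping controller 0, for instance, yields the plan `[(1, "takeover"), (0, "inactive")]`.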
Each controller uses a state value to represent the overall running status of its services, and takes the state value as the target that drives the starting and stopping of services on the controller.
The dual-controller system provides two groups of services in total, denoted service group 0 and service group 1. Each group consists of one or more services, and services are switched as a whole service group.
The controllers are identified by IDs 0 and 1. When hardware and software are working normally, both controllers can provide service: controller 0 provides the services of service group 0, controller 1 provides those of service group 1, and the state values of both controllers are active.
The state value inactive means that the controller is not providing service and is not ready to do so. The state value active means that controller 0 should run all services of service group 0, and controller 1 should run all services of service group 1. The state value takeover means that all services of service group 0 and service group 1 run on this controller. The state value standby means that the controller provides the services of neither group, but can be switched to the active state manually by the system administrator, or switches itself to the takeover state when it detects a heartbeat timeout of the other controller.
While the dual-controller system is running, each controller periodically checks the running status of its own services and periodically sends its state and service running status to the peer node; this is called the heartbeat transmission. When a controller detects that the peer's heartbeat transmission has exceeded the preset time value, or that any service in a service group the peer should be running is abnormal, it takes over the peer's services and changes its own state.
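A heartbeat carries the controller state and the running status of each service. The sketch below shows one possible encoding; the use of JSON, the field names, and the functions `encode_heartbeat` and `decode_heartbeat` are assumptions, since the source does not specify a wire format.

```python
import json
from typing import Dict


def encode_heartbeat(controller_id: int, state: str,
                     services: Dict[str, str]) -> bytes:
    """Serialize one heartbeat: sender ID, state value, per-service status."""
    message = {
        "controller": controller_id,
        "state": state,
        "services": services,   # e.g. {"iscsi_target": "running"}
    }
    return json.dumps(message, sort_keys=True).encode("utf-8")


def decode_heartbeat(payload: bytes) -> dict:
    """Parse a heartbeat produced by encode_heartbeat."""
    return json.loads(payload.decode("utf-8"))
```

The receiver would record the time of receipt alongside the decoded message so that a later timeout check has something to compare against.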
Each controller periodically checks the running status of its services. If a service that should be running according to the state has stopped, the controller makes at most a preset number N of attempts to restart it, where N is greater than or equal to 0.
Before taking over the other controller's service group, a controller may run a preset program, which is typically used to isolate, in software and electrically, the other controller's access to the shared resources of the dual-controller system.

Claims (7)

1. A method for monitoring the services and controller states of a dual-controller storage system, characterized in that the two controller nodes share state and service running information through inter-node communication; when a node's services run abnormally or its periodic heartbeat times out, the healthy node switches state as designed and, taking the new state as the target, starts and stops the corresponding services on its controller;
three software function modules are defined: the node communication module COMM, the cluster service management module CSM, and the local service management module LSM; the modules communicate with one another, wherein the function of the COMM module is to receive information from the CSM module and transmit it to the COMM module of the other controller over the inter-controller communication medium; the CSM module receives peer controller information from the COMM module, receives the running status of the local service group from the LSM module, decides the state value of the local controller, and sends it to the COMM module and the LSM module; the LSM module obtains the state value from the COMM module, adjusts and periodically checks the running status of the local controller's service group, and finally notifies the CSM module;
after system startup, each controller determines its own state, and the running states of the two controllers are active:active, takeover:standby, or standby:takeover; wherein the state value inactive means that the controller is not providing service and is not ready to do so; the state value active means that controller 0 should run all services of service group 0, and controller 1 should run all services of service group 1; the state value takeover means that all services of service group 0 and service group 1 run on this controller; the state value standby means that the controller provides the services of neither group, but can be switched to the active state manually by the system administrator, or switches itself to the takeover state when it detects a heartbeat timeout of the other controller;
after the local controller state has been determined, the LSM module periodically checks whether the controller state and the running status of the local service group are consistent, and if they are not, performs a consistency adjustment, which consists of attempting at most N times to start or stop each abnormal service, N being a preset value greater than or equal to 0; the adjustment may fail; the CSM module obtains the service group running status from the LSM module and, through the COMM module, receives node information in a loop and sends it at regular intervals; the node information includes the controller state and the running status, that is, running or stopped, of each service in the service group; each time the CSM module obtains node information from the peer, it records the time of receipt; if the peer controller's services are running abnormally, or no node information has been received from the peer within a preset time, the controller takes over the peer's service group and changes its own state to takeover;
when a controller changes state during operation because it has detected a peer anomaly and needs to take over the peer controller's services, it runs a preset program before taking over; this program triggers an electronic switch to restart or stop the peer controller, completely cutting off the peer controller's control over the service resources;
when the system has one controller in the takeover state and the other in the standby state, it is switched to the state in which both controllers are active, that is, the (active, active) state, by adding a detection flag, called the extended state value, which is transmitted together with the heartbeat information; when the extended state value equals a preset specific value, a controller ignores changes in the peer's state and does not change its own state automatically; during the switch, the extended state value is set to the preset specific value, the state value of each node is changed to active, and finally the extended state value is cleared;
to stop all controllers in the dual-controller system, the method is similar to switching from (takeover, standby) to (active, active): the extended state value is used and the inactive state is set, and after all corresponding services have stopped, the COMM, CSM, and LSM modules are stopped;
to restart or stop a single controller in the dual-controller storage system while both controllers are running, the state of the controller that is not being stopped is first set to takeover, and then the state of the controller to be restarted or stopped is set to inactive; the specific control steps are as follows:
S1. Set the local controller state to inactive, start no service group, and begin sending and receiving state information;
S2. Wait a number of seconds, determined by the maximum difference in system startup times, so that two controllers starting at the same time end up in consistent states;
S3. If the peer's state is also inactive, the peer is also starting up: set the local controller state to active and go to step S7;
S4. If the peer's state is active, set the local controller state to active and go to step S7;
S5. If no information has been received from the peer, or the peer's state is takeover (which indicates that the peer is already running all service groups), set the local controller state to standby and go to step S7;
S6. If the peer's state is standby, set the local controller state to takeover and go to step S7;
S7. Adjust the running status of the service groups according to the local controller state, and periodically check the local controller state value and the running status of the service groups.
2. The method according to claim 1, characterized in that each controller uses a state value to represent the overall running status of its services, and takes the state value as the target that drives the starting and stopping of services on the controller.
3. The method according to claim 1, characterized in that the dual-controller system provides two groups of services in total, denoted service group 0 and service group 1; each group consists of one or more services, and services are switched as a whole service group.
4. The method according to claim 1, characterized in that the controllers are identified by IDs 0 and 1; when hardware and software are working normally, both controllers can provide service, controller 0 providing the services of service group 0 and controller 1 providing those of service group 1, and the state values of both controllers are active.
5. The method according to claim 1, characterized in that, while the dual-controller system is running, each controller periodically checks the running status of its own services and periodically sends its state and service running status to the peer node, which is called the heartbeat transmission; when a controller detects that the peer's heartbeat transmission has exceeded the preset time value, or that any service in a service group the peer should be running is abnormal, it takes over the peer's services and changes its own state.
6. The method according to claim 1, characterized in that each controller periodically checks the running status of its services; if a service that should be running according to the state has stopped, the controller makes at most a preset number N of attempts to restart it, where N is greater than or equal to 0.
7. The method according to claim 1, characterized in that, before taking over the other controller's service group, a controller may run a preset program, which is used to isolate, in software and electrically, the other controller's access to the shared resources of the dual-controller system.
CN200910017117XA 2009-07-09 2009-07-09 Method for monitoring service and status of controllers of double-controller storage system Active CN101594383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910017117XA CN101594383B (en) 2009-07-09 2009-07-09 Method for monitoring service and status of controllers of double-controller storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910017117XA CN101594383B (en) 2009-07-09 2009-07-09 Method for monitoring service and status of controllers of double-controller storage system

Publications (2)

Publication Number Publication Date
CN101594383A CN101594383A (en) 2009-12-02
CN101594383B true CN101594383B (en) 2012-05-23

Family

ID=41408820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910017117XA Active CN101594383B (en) 2009-07-09 2009-07-09 Method for monitoring service and status of controllers of double-controller storage system

Country Status (1)

Country Link
CN (1) CN101594383B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6007522B2 (en) * 2012-03-09 2016-10-12 日本電気株式会社 Cluster system
CN104536853B (en) * 2015-01-09 2016-07-27 浪潮电子信息产业股份有限公司 Device for guaranteeing continuous availability of resources of dual-controller storage equipment
CN104731727B (en) * 2015-03-25 2017-05-31 浪潮集团有限公司 A kind of dual control storage system monitoring management system and method
CN105912416B (en) * 2016-04-07 2019-06-28 珠海市魅族科技有限公司 A kind of method and terminal monitoring processor in the terminal
CN107423167A (en) * 2017-07-31 2017-12-01 郑州云海信息技术有限公司 A kind of ISCSI target redundancy control methods and system based on dual control storage
CN109672544B (en) * 2017-10-13 2020-12-11 杭州海康威视系统技术有限公司 Data processing method and device and distributed storage system
CN107678891B (en) * 2017-10-13 2021-06-29 郑州云海信息技术有限公司 Double control method and device of storage system and readable storage medium
CN107807868A (en) * 2017-10-13 2018-03-16 郑州云海信息技术有限公司 A kind of dual control storage system disaster dump method of testing and system
CN110545197B (en) * 2018-05-29 2022-09-09 杭州海康威视系统技术有限公司 Node state monitoring method and device
CN108958990B (en) * 2018-07-24 2021-10-15 郑州云海信息技术有限公司 Method and device for improving reliability of field replaceable unit information

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101237315A (en) * 2008-02-28 2008-08-06 浪潮电子信息产业股份有限公司 A synchronous detection and failure separation method for dual control high-availability system
CN101296183A (en) * 2008-04-29 2008-10-29 北京泰得思达科技发展有限公司 Data transmission system of double-controller system
CN101382872A (en) * 2008-10-21 2009-03-11 浪潮电子信息产业股份有限公司 Double-control storage and switch control method for SAS and SATA signal by detecting heartbeat

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI560558B (en) * 2015-06-08 2016-12-01 Synology Inc Method for managing a storage system, and associated apparatus
US9858135B2 (en) 2015-06-08 2018-01-02 Synology Incorporated Method and associated apparatus for managing a storage system

Also Published As

Publication number Publication date
CN101594383A (en) 2009-12-02

Similar Documents

Publication Publication Date Title
CN101594383B (en) Method for monitoring service and status of controllers of double-controller storage system
CN101136900B (en) Fast transparent fault shift device and implementing method facing to service
CN102257759B (en) Master-standby switching method, system control unit and communication system
CN102355366B (en) Member-stacking device and method for managing member-stacking device at split stacking moment
CN101257405B (en) Method for implementing double chain circuits among master-slave equipments
CN101917337B (en) Device and method for interconnecting router cluster middle plates
EP3036873A1 (en) Dedicated control path architecture for stacked packet switches
US20080307254A1 (en) Information-processing equipment and system therefor
CN105471622A (en) High-availability method and system for main/standby control node switching based on Galera
CN101841735B (en) Frame-type switch, stack system and fault treatment method after stack
US20100268687A1 (en) Node system, server switching method, server apparatus, and data takeover method
CN113220509B (en) Double-combination alternating shift system and method
CN109981353B (en) Method and system for protecting adjacent station redundancy in frame type network communication equipment
CN112468328A (en) Dual-redundancy FC-AE-1553 network reconstruction method based on switched topology
CN105763442A (en) PON system and method avoiding interruption of LACP aggregation link in main-standby switching process
CN102487332B (en) Fault processing method, apparatus thereof and system thereof
WO2012000338A1 (en) Method and system for achieving main/standby switch for single boards
KR101358995B1 (en) Method and system for managing high availability
JP4806382B2 (en) Redundant system
CN114124803B (en) Device management method and device, electronic device and storage medium
EP4084492A1 (en) A method, system and olt for dual-parenting pon protection
JP5176914B2 (en) Transmission device and system switching method for redundant configuration unit
CN113742142B (en) Method for managing SATA hard disk by storage system and storage system
WO2007062569A1 (en) A system for enabling service switching and a service switching method thereof
CN111510336B (en) Network equipment state management method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant