Summary of the invention
In view of the fact that the prior art repairs only disk state and hard disk faults, the present invention provides a method and a storage system for repairing a fault of a connection chip in a link.
In a first aspect, the application provides a storage fault handling method applied to a storage system. The storage system includes at least one storage disk and at least two storage controllers. Each storage controller includes a connection chip and is connected to each storage disk through its own connection chip, and the at least two storage controllers are interconnected.
The method includes:
A first storage controller receives a first data operation request and sends a first operation instruction, through a first branch of a first link, to the first storage disk targeted by the first data operation request. The first link is the link that includes the connection chip of the first storage controller, and the first branch of the first link is the connection of the first link whose target end is the first storage disk. The first storage controller is any one of the at least two storage controllers;
The first storage controller detects that execution of the first operation instruction has timed out, and forwards the first operation instruction to the first storage disk through a first branch of a second link. The second link is the link that includes the connection chip of a second storage controller, and the first branch of the second link is the connection of the second link whose target end is the first storage disk. The second storage controller is any storage controller connected to the first storage controller;
The first storage controller receives an operation success response for the first operation instruction, transmitted over the first branch of the second link, and counts operation anomalies according to the operation success response. An operation anomaly indicates that an operation instruction received by the first storage controller timed out on the first link but executed successfully on the second link;
The first storage controller determines that the counted number of operation anomalies within a predetermined time exceeds a predetermined threshold, and performs fault repair on the connection chip in the first link.
With the above method, after switching from the first link to the second link, the storage controller counts operation anomalies according to how operation instructions actually execute, and repairs the first link once the number of operation anomalies reaches the predetermined threshold. Link faults are thereby identified and repaired, which improves the stability of the storage system and the execution efficiency of subsequent data operations.
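The threshold check within a predetermined time can be illustrated with a minimal sketch. The class name `AnomalyWindow`, its parameters, and the use of a sliding window are assumptions made for illustration; the text only requires that the count within a predetermined time be compared with a predetermined threshold.

```python
from collections import deque


class AnomalyWindow:
    """Hypothetical helper: counts operation anomalies inside a time window."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds  # the "predetermined time"
        self.threshold = threshold            # the "predetermined threshold"
        self.timestamps = deque()

    def record(self, now):
        """Record one anomaly at time `now`; return True if repair should start."""
        self.timestamps.append(now)
        # Forget anomalies that are older than the predetermined time.
        while now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        # Assumption: "exceeds" is read as strictly greater than the threshold.
        return len(self.timestamps) > self.threshold
```

A caller would invoke `record` each time a timeout on the first link is followed by a success on the second link, and start the repair of the connection chip when it returns `True`.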
For the first aspect, one possible way of counting operation anomalies is as follows. According to a statistical rule and the operation success response, the first storage controller either increases the number of operation anomalies by one or leaves the previously counted number unchanged. The statistical rule is that operation anomalies occurring on a given branch of the first link are counted only once. Correspondingly, the predetermined threshold is less than or equal to the number N of storage disks in the storage system. In one specific embodiment, the predetermined threshold is set to the number N of storage disks, that is, to the number of branches of the first link. When operation anomalies are counted, an anomaly on each branch of the first link is counted only once, and a further anomaly on a branch that has already been counted is not counted again. Once an operation anomaly has occurred on every branch, the count reaches N, that is, the predetermined threshold, and the first link can then be determined to have a link fault.
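The once-per-branch rule can be sketched as follows. The class `BranchAnomalyCounter` and its method names are hypothetical, and the sketch assumes the threshold is set to N and that reaching N (rather than strictly exceeding it) marks the link as faulty, as in the specific embodiment described above.

```python
class BranchAnomalyCounter:
    """Hypothetical counter: each branch of the first link counts at most once."""

    def __init__(self, disk_count):
        self.disk_count = disk_count      # N, also the number of branches
        self.abnormal_branches = set()    # branches already counted

    def record(self, branch):
        """Record an anomaly on `branch` (1..N) and return the current count."""
        # Adding a branch that is already present leaves the count unchanged.
        self.abnormal_branches.add(branch)
        return len(self.abnormal_branches)

    def link_failed(self):
        """True once every branch of the first link has shown an anomaly."""
        return len(self.abnormal_branches) >= self.disk_count
```

Because a single faulty disk can only ever contribute one count, the threshold N is reached only when every branch behind the same connection chip misbehaves, which points at the chip rather than at a disk.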
For the first aspect, in one possible implementation the method further includes: after determining that an operation anomaly has occurred on the n-th branch of the first link, the first storage controller sets a fault tag for the n-th branch of the first link. The fault tag indicates that the n-th branch of the first link is unavailable or that its level has been lowered, where n is a natural number greater than or equal to 1 and less than or equal to N. After receiving a subsequent data operation request for the n-th storage disk, the first storage controller sends the subsequent operation instruction to the n-th storage disk directly through the n-th branch of the second link, according to the fault tag of the n-th branch of the first link.
Until the fault on the first link is repaired, this implementation prevents subsequent operation instructions from being delayed or failing.
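Routing by fault tag can be sketched in a few lines. The function `choose_link` and the representation of fault tags as a set of branch numbers are assumptions for illustration only.

```python
def choose_link(branch, fault_tags):
    """Pick the link for a command aimed at the disk behind `branch`.

    `fault_tags` is a hypothetical set holding the branch numbers of the
    first link that currently carry a fault tag.
    """
    # A tagged branch is skipped: the command goes straight to the same
    # branch of the second link instead of waiting for another timeout.
    return "second" if branch in fault_tags else "first"
```

With `fault_tags = {1}`, a request for the first storage disk is sent over the second link at once, while requests for untagged branches still use the first link.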
Further, after fault repair is performed on the connection chip in the first link, the method further includes: deleting the fault tag of every branch of the first link, or setting a normal tag for every branch of the first link, where the normal tag indicates that the first link is available or that its level is normal. Then, after receiving a subsequent data operation request for the first storage disk, the first storage controller switches back to the first link, according to the state of the fault tag of the first link or the normal tag of the first link, and sends the operation instruction to the storage disk targeted by the subsequent data operation request.
With this implementation, after receiving a subsequent data operation request to be sent to a storage disk, the first storage controller selects the first link directly to send the operation instruction, according to the state of the first link (the fault tag has been deleted, or the normal tag has been set). Because the path of the first link is shorter than that of the second link, subsequent operation instructions are processed sooner. This avoids the delay caused by executing subsequent operation instructions over the second link and improves the processing efficiency of operation instructions.
Optionally, after fault repair is performed on the connection chip of the first link, the method further includes: detecting whether the connection chip in the first link has been repaired successfully; and, after detecting that the repair succeeded, deleting the fault tag of the first link or setting the normal tag of the first link.
After repairing the first link, the above method checks the first link again to obtain its true state, so that subsequent operations are executed according to the true link state.
Optionally, the method further includes: after detecting that the repair of the connection chip in the first link was unsuccessful, issuing a fault maintenance notification for the connection chip in the first link.
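The repair, verification and notification sequence can be sketched as below. The function `handle_repair` and its callback parameters `repair`, `verify` and `notify` are hypothetical placeholders; how the connection chip is actually repaired or probed is not specified here.

```python
def handle_repair(fault_tags, repair, verify, notify):
    """Repair the connection chip, then clear the fault tags or raise a notice.

    `fault_tags` is a hypothetical set of tagged branches of the first link;
    `repair`, `verify` and `notify` are caller-supplied callables.
    """
    repair()
    if verify():
        # Repair confirmed: removing the tags lets later requests
        # switch back to the first link.
        fault_tags.clear()
        return True
    # Repair not confirmed: keep the tags so traffic stays on the second link.
    notify("connection chip in the first link needs maintenance")
    return False
```

Keeping the fault tags in place after a failed repair means subsequent requests continue to bypass the first link until maintenance has been carried out.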
Specifically, performing fault repair on the connection chip in the first link includes: restarting the connection chip of the first storage controller; or isolating the connection chip of the first storage controller; or repairing a queue on the connection chip of the first storage controller; or repairing a port on the connection chip of the first storage controller.
The above link repair focuses on repairing the connection chip on the link, so that the hardware problem is addressed at its root and the repair is efficient.
Optionally, after the first storage controller detects that execution of the first operation instruction has timed out, the method further includes: the first storage controller records a first mark, which indicates that the first operation instruction timed out on the first branch of the first link. Before operation anomalies are counted according to the operation success response, the method further includes: the first storage controller records a second mark, which indicates that the first operation instruction executed successfully on the first branch of the second link; the first storage controller determines whether the first operation instruction has both the first mark and the second mark; and, if it has both, determines that an operation anomaly has occurred.
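The two-mark check can be sketched with a small record type. The names `CommandRecord`, `first_mark`, `second_mark` and `is_operation_anomaly` are hypothetical; only the rule that both marks must be present comes from the text.

```python
from dataclasses import dataclass


@dataclass
class CommandRecord:
    """Hypothetical per-instruction record kept by the first storage controller."""
    first_mark: bool = False   # timed out on the first branch of the first link
    second_mark: bool = False  # succeeded on the first branch of the second link


def is_operation_anomaly(record):
    """An anomaly requires both marks on the same operation instruction."""
    return record.first_mark and record.second_mark
```

An instruction that timed out on the first link and also failed on the second link has only the first mark, so it is not counted; such a case is more likely to indicate a disk problem than a link problem.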
In a second aspect, the application provides a storage system, characterized by including at least one storage disk and at least two storage controllers. Each storage controller includes a connection chip and is connected to each storage disk through its own connection chip, and the at least two storage controllers are interconnected;
A first storage controller is configured to receive a first data operation request, send a first operation instruction through a first branch of a first link to the first storage disk targeted by the first data operation request, and, upon detecting that execution of the first operation instruction has timed out, forward the first operation instruction to the first storage disk through a first branch of a second link. The first storage controller is any one of the at least two storage controllers, and the second storage controller is any storage controller connected to the first storage controller. The first link is the link that includes the connection chip of the first storage controller, and the first branch of the first link is the connection of the first link whose target end is the first storage disk. The second link is the link that includes the connection chip of the second storage controller, and the first branch of the second link is the connection of the second link whose target end is the first storage disk. The first storage controller is further configured to receive an operation success response for the first operation instruction transmitted over the first branch of the second link, count operation anomalies according to the operation success response, and perform fault repair on the connection chip in the first link after the number of operation anomalies within a predetermined time exceeds a predetermined threshold. An operation anomaly indicates that an operation instruction received by the first storage controller timed out on the first link but executed successfully on the second link.
The first storage controller in the above storage system is further configured to perform the functions performed by the first storage controller in the fault handling method of the first aspect.
In a third aspect, the application provides a storage controller, including:
A storage processing module, configured to receive a first data operation request and send a first operation instruction, through a first branch of a first link, to the first storage disk targeted by the first data operation request, where the first link is the link that includes the connection chip of the first storage controller, and the first branch of the first link is the connection of the first link whose target end is the first storage disk.
A link fault processing module, configured to: after detecting that execution of the first operation instruction has timed out, forward the first operation instruction to the first storage disk through a first branch of a second link, where the second link is the link that includes the connection chip of a second storage controller, and the first branch of the second link is the connection of the second link whose target end is the first storage disk; receive an operation success response for the first operation instruction transmitted over the first branch of the second link; count operation anomalies according to the operation success response; and perform fault repair on the connection chip in the first link after the counted number of operation anomalies within a predetermined time exceeds a predetermined threshold. An operation anomaly indicates that the first operation instruction timed out on the first link but executed successfully on the second link.
Optionally, the link fault processing module counts operation anomalies as follows: according to a statistical rule and the operation success response, it either increases the number of operation anomalies by one or leaves the previously counted number unchanged, where the statistical rule is that operation anomalies occurring on a given branch of the first link are counted only once. Correspondingly, the predetermined threshold is the number N of storage disks in the storage system.
Optionally, the link fault processing module is further configured to set a fault tag for the n-th branch of the first link after determining that an operation anomaly has occurred on that branch. The fault tag indicates that the n-th branch of the first link is unavailable or that its level has been lowered, where n is a natural number greater than or equal to 1 and less than or equal to N;
The storage processing module is then further configured to, after receiving a subsequent data operation request for the n-th storage disk, send the subsequent operation instruction to the n-th storage disk directly through the n-th branch of the second link, according to the fault tag of the n-th branch of the first link.
Optionally, the link fault processing module is further configured to, after fault repair is performed on the connection chip in the first link, delete the fault tag of every branch of the first link or set a normal tag for every branch of the first link, where the normal tag indicates that the first link is available or that its level is normal. Then, after receiving a subsequent data operation request, the storage processing module switches back to the first link, according to the state of the fault tag of the first link or the normal tag of the first link, and sends the operation instruction to the storage disk targeted by the subsequent data operation request.
In a fourth aspect, the application provides a storage controller, including:
A storage device, configured to store instructions; and
At least one processor, coupled to the storage device;
Wherein, when the at least one processor executes the instructions, the instructions cause the processor to perform the method of the first aspect.
The method, storage system and storage controller provided by the above aspects of the application genuinely solve the execution delay of operation instructions caused by link faults in a storage system. They avoid the problem of the prior-art approach of handling the fault by switching paths, in which operations succeed but inefficiently, and thus further improve the efficiency of the storage system.
Specific embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Figure 1 shows the architecture of a storage system provided in an embodiment of the present invention. The storage system includes two storage controllers (10, 20) and three storage disks (30, 40, 50). Storage controller 10 includes connection chip 60, and storage controller 20 includes connection chip 70. Storage controller 10 is connected to storage disks 30, 40 and 50 through connection chip 60, and storage controller 20 is connected to storage disks 30, 40 and 50 through connection chip 70. There is also a link between storage controller 10 and storage controller 20, which may be a network connection or a direct bus connection. Storage controller 10 has three links, one to each of the three storage disks: link 01, which connects storage controller 10 to storage disk 30 through connection chip 60; link 02, which connects storage controller 10 to storage disk 40 through connection chip 60; and link 03, which connects storage controller 10 to storage disk 50 through connection chip 60. Storage controller 20 likewise has three links, one to each of the three storage disks: link 04, which connects storage controller 20 to storage disk 30 through connection chip 70; link 05, which connects storage controller 20 to storage disk 40 through connection chip 70; and link 06, which connects storage controller 20 to storage disk 50 through connection chip 70.
The connection chips (60, 70) may be Peripheral Component Interconnect Express (PCIe) chips or Small Computer System Interface (SCSI) chips. In the embodiments of the present invention, a connection chip is the chip on a storage controller that is used to connect to the storage disks.
The numbers of storage controllers and storage disks in the storage system of Figure 1 are only examples; the storage system provided in the embodiments of the present invention includes at least two storage controllers and at least one storage disk. The hardware form of the storage controllers and storage disks is flexible. For example, the storage controllers may be grouped together to form a controller enclosure and the storage disks grouped together to form a disk enclosure, which is the form in which disks and controllers are separate (disk refers to a storage disk, controller refers to a storage controller). Alternatively, the storage controllers and the storage disks may be housed together, which is the form in which disks and controllers are integrated.
The storage disks in Figure 1 may be disks of a traditional form, such as hard disk drives, or solid state disks (SSD), or storage disks formed from other storage media. In actual use, the storage disks in Figure 1 may form a redundant array of independent disks (RAID); storing data in a disk array provides a more reliable storage system.
The storage controller in Figure 1 may be a hardware device. As shown in Figure 2, the storage controller 200 may include a processing unit 201 and a communication interface 202. The processing unit 201 performs the data storage functions of the storage controller 200. The communication interface 202 is used to communicate with other devices, which may be access hosts or other storage systems; for example, the processing unit 201 receives, through the communication interface 202, a data read request or data write request sent by an access host. Specifically, the communication interface 202 may be an adapter. Optionally, the storage controller 200 in hardware form may further include an input/output interface 203, which is connected to input/output devices and is used to receive input information and output operation results. The input/output devices may be a mouse, a keyboard, a display, an optical drive, and so on. Optionally, the storage controller 200 in hardware form may further include an auxiliary memory 204, also commonly called external storage. The storage medium of the auxiliary memory 204 may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, an optical disc), a semiconductor medium (for example, a solid state disk), and so on.
The processing unit 201 performs the data storage functions of the storage controller 200 and may be implemented in various forms. For example, the processing unit 201 may include a processor 2011 and a memory 2012, where the processor 2011 performs data storage processing according to program units stored in the memory 2012. The processor 2011 may be a central processing unit (CPU) or a graphics processing unit (GPU), and may be a single-core or multi-core processor. The processing unit 201 may also be implemented solely with a logic device that has built-in processing logic, such as a field programmable gate array (FPGA) or a digital signal processor (DSP).
The storage controller in Figure 1 may also be constituted by the processing logic of a storage controller. The processing logic may be implemented as program code residing in memory, or as logic circuits; Figure 2 schematically shows processing logic residing in memory. The processing logic of the storage controller may include storage processing logic and storage management logic. The storage processing logic performs the relevant data storage operations after a data read or data write request sent by an access host is received. The storage management logic manages the storage controller, the storage disks and the data processing while the storage processing logic executes, monitors device state and/or process faults, and performs the corresponding fault handling.
In the storage system shown in Figure 1, each storage disk has two links, one to each of the two storage controllers. The storage management logic in a storage controller can be used to manage all storage disks, for example by saving the link information of the storage disks and by monitoring the state of the storage disks so as to discover and diagnose anomalies in time, repair them as far as possible, and ensure the reliability of data storage. In the prior art, the storage management logic in a storage controller handles faults as follows. Storage controller 10 receives a storage operation request, determines that the request needs to read storage disk 30, and sends a read instruction to storage disk 30 through its own connection chip 60. After determining that processing of the read instruction has timed out, storage controller 10 determines that an IO timeout has occurred and, according to an internal predetermined fault handling policy, sends the read instruction to storage controller 20, so that storage controller 20 sends the read instruction to storage disk 30 through connection chip 70. After processing succeeds, storage controller 20 returns a response message to storage controller 10. With this fault handling approach, the storage operation request (IO request) is eventually processed successfully over the replacement link, but the IO path is long and the delay is large, which can appear externally as a service interruption. In the prior art, when a service experiences a long delay or an interruption, for example when storage controller 10 determines that the IO sent to storage disk 30 has timed out, storage disk 30 may also be repaired. However, if the IO timeout is caused by a fault on the link between storage controller 10 and storage disk 30, repairing storage disk 30 is wasted effort.
To address the above technical problem, the present invention provides a fault handling method and apparatus for a storage system, which solve the IO delay or service interruption caused by link faults. The present invention adds a link fault processing module to the storage controller to identify IO processing anomalies caused by a link fault (a connection chip fault) and to repair the link accordingly, so that subsequent IO requests can switch back to the original link and processing efficiency improves.
The link fault processing module provided by the storage controller in the embodiments of the present invention may be a functional enhancement of the storage management logic or separate processing logic independent of the storage management logic. The link fault processing module identifies link faults and handles their repair. The specific fault handling process and its details are described in the following embodiments.
Before the specific embodiments are described, the links in the embodiments of the present invention are given uniform names here for convenience and clarity. In the storage system of the embodiments of the present invention there may be multiple links between the storage controllers and the storage disks, and each storage disk has at least two links, connected to different storage controllers. In the storage system shown in Figure 1 there are two storage controllers (10, 20), so each storage disk has two links to each storage controller. For example, there are two links between storage disk 30 and storage controller 10: the first link is link 01, which connects storage controller 10 to storage disk 30 through connection chip 60, and the second link is link 04, which connects storage controller 10 to storage disk 30 through connection chip 70 of storage controller 20. There are likewise two links between storage disk 40 and storage controller 10: the first link is link 02, which connects storage controller 10 to storage disk 40 through connection chip 60, and the second link is link 05, which connects storage controller 10 to storage disk 40 through connection chip 70 of storage controller 20. It can be seen that there are n links between each storage disk and each storage controller, where n is the number of storage controllers in the storage system. For convenience, this specification distinguishes the links between each storage disk and each storage controller as follows. A link that includes the connection chip of the first storage controller is called the first link (for example, links 01, 02 and 03 in Figure 1, which include connection chip 60 of storage controller 10). A link that includes the connection chip of the second storage controller is called the second link (for example, links 04, 05 and 06 in Figure 1, which include connection chip 70 of storage controller 20). A link that includes the connection chip of the n-th storage controller is called the n-th link. The branches of a link are distinguished by the target end to which the link connects. For example, the connection of the first link whose target end is storage disk 30 is called the first branch of the first link (link 01), the connection of the second link whose target end is storage disk 30 is called the first branch of the second link (link 04), the connection of the first link whose target end is storage disk 40 is called the second branch of the first link (link 02), and the connection of the second link whose target end is storage disk 40 is called the second branch of the second link (link 05).
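The naming convention can be checked against the Figure 1 labels with a small sketch. The function names and the numbering formula are assumptions derived from the examples given (links 01 to 03 on connection chip 60, links 04 to 06 on connection chip 70); the third branch is not named explicitly in the text and is inferred from the same pattern.

```python
DISKS = [30, 40, 50]      # storage disks of Figure 1, in branch order
CONTROLLERS = [10, 20]    # storage controllers of Figure 1, in link order


def branch_target(branch):
    """Storage disk at the target end of branch number `branch` (1-based)."""
    return DISKS[branch - 1]


def fig1_link_label(link, branch):
    """Figure 1 label of branch `branch` of link `link` (both 1-based)."""
    return f"{(link - 1) * len(DISKS) + branch:02d}"
```

For example, `fig1_link_label(2, 1)` gives `"04"`, the first branch of the second link, whose target end `branch_target(1)` is storage disk 30.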
Figure 3 shows the specific implementation process of the link fault handling method provided in an embodiment of the present invention. Note that the storage system shown in Figure 3 is a simplified version of the storage system shown in Figure 1: Figure 3 omits the connection chips on the storage controllers, the other storage disks, and the connections between those storage disks and the storage controllers, and mainly illustrates the processing flow.
In step 301, the access host sends a first data operation request to the first storage controller; the request is used to access data on the first storage disk. In step 302, the first storage controller sends a first operation instruction to the first storage disk through the first branch of the first link, requesting a read or write operation on the corresponding data; the first link includes the connection chip of the first storage controller. In step 303, the first storage controller detects that execution of the first operation instruction has timed out, switches links, and forwards the first operation instruction to the second storage controller over the connection between the first storage controller and the second storage controller. In step 304, the second storage controller sends the first operation instruction to the first storage disk through its own connection chip. Steps 303 and 304 together implement the first storage controller forwarding the first operation instruction to the first storage disk through the first branch of the second link, where the second link includes the connection chip of the second storage controller. In step 305, after completing the operation, the first storage disk sends an operation success response for the first operation instruction to the second storage controller. In step 306, the second storage controller forwards the operation success response to the first storage controller. In step 307, the first storage controller sends an access response to the host.
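Steps 302 to 307 can be sketched as a single function. The names `CommandTimeout`, `execute`, `send_first_link` and `send_second_link` are hypothetical; the two callables stand in for the real transport over each link, which is not modelled here.

```python
class CommandTimeout(Exception):
    """Hypothetical signal that an operation instruction timed out."""


def execute(command, send_first_link, send_second_link):
    """Run `command` and return (response, anomaly_detected).

    `send_first_link` models step 302; `send_second_link` models steps
    303 to 306 (forwarding through the second storage controller).
    """
    try:
        # Step 302: try the first branch of the first link.
        return send_first_link(command), False
    except CommandTimeout:
        # Steps 303 to 306: switch links. A success here, after a timeout
        # on the first link, is what the text calls an operation anomaly.
        response = send_second_link(command)
        return response, True
```

If the second link also fails, the exception propagates and no anomaly is reported, since an anomaly requires a success on the second link.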
In step 308 (steps 308 and 307 have no fixed execution order and may be interchanged), after receiving the operation success response for the first operation instruction over the first branch of the second link, the first storage controller counts operation anomalies. When it determines, according to a predetermined statistical rule, that the number of operation anomalies needs to be increased, it increases the number by one. An operation anomaly indicates that an operation instruction received by the first storage controller timed out on the first link but executed successfully on the second link. Specifically, there may be the following two statistical rules. First statistical rule: operation anomalies occurring on a given branch of a link are counted only once. In other words, the first operation anomaly on each branch of the first link increases the count by one, and later operation anomalies on the same branch are not counted. Second statistical rule: every operation anomaly is counted, regardless of whether it is the first one on its branch; that is, each occurrence of an operation anomaly increases the count by one. In this embodiment, according to the first statistical rule, it is determined that the first operation anomaly has occurred on the first branch of the first link, and the counted number of operation anomalies is updated from zero to one.
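The difference between the two statistical rules can be seen by applying both to the same sequence of anomalies. The function `count_anomalies` and its `once_per_branch` flag are hypothetical names used only for this comparison.

```python
def count_anomalies(branches, once_per_branch):
    """Count anomalies given the branch number of each anomaly, in order.

    once_per_branch=True  models the first statistical rule.
    once_per_branch=False models the second statistical rule.
    """
    if once_per_branch:
        # Repeated anomalies on an already counted branch add nothing.
        return len(set(branches))
    # Every anomaly adds one, whichever branch it occurred on.
    return len(branches)
```

For anomalies observed on branches 1, 1 and 2, the first rule yields 2 and the second yields 3, so the threshold has to be chosen with the rule in mind.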
As for how an operation exception is identified: specifically, after detecting in step 303 that the
first operation instruction has timed out, the first storage controller may record a first mark,
which indicates that the first operation instruction timed out on the first branch of the first
link. In step 306, after receiving the operation success response message forwarded by the second
storage controller, the first storage controller records a second mark, which indicates that the
first operation instruction was executed successfully over the first branch of the second link. The
first storage controller then determines whether the first operation instruction has both the first
mark and the second mark; if it has both, it can be determined that an operation exception occurred
during execution of the first operation instruction. After determining that the operation exception
occurred, the first storage controller increases the operation exception count by one or leaves it
unchanged, according to the first statistical rule or the second statistical rule. After determining
in step 308 that an operation exception has occurred on the first branch of the first link, the
first storage controller may also set a fault tag for the first branch of the first link. The fault
tag indicates that the first branch of the first link is unavailable or that the level of the first
branch of the first link has been lowered. After the fault tag is set, when the access host
initiates a second data operation request to access the first storage disk and the first storage
controller receives that request, the first storage controller may, according to the fault tag of
the first branch of the first link, switch links directly or split traffic between links. For
example, subsequent operation instructions are no longer forwarded over the first branch of the
first link but are forwarded directly over the first branch of the second link. In this embodiment,
according to the fault tag of the first branch of the first link, the first storage controller
forwards the second operation instruction directly over the second link, obtains the response to the
second operation instruction over the second link, and sends the response to the second data
operation request to the access host (steps 309 to 314). Until the fault on the first link is
repaired, handling the fault by direct link switching or traffic splitting in this way avoids
delayed or failed execution of subsequent operation instructions.
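The mark-based identification and the tag-based link selection described above might be sketched as
follows. The names InstructionRecord, choose_link, and fault_tags are hypothetical, and the sketch
shows only the direct-switching case, not traffic splitting.

```python
from dataclasses import dataclass


@dataclass
class InstructionRecord:
    """Marks kept for one operation instruction (hypothetical structure)."""

    timed_out_on_first_link: bool = False     # the "first mark" (step 303)
    succeeded_on_second_link: bool = False    # the "second mark" (step 306)

    def is_operation_exception(self) -> bool:
        # An operation exception requires both marks on the same instruction.
        return self.timed_out_on_first_link and self.succeeded_on_second_link


def choose_link(branch: int, fault_tags: set) -> str:
    """Route around a branch of the first link that carries a fault tag."""
    return "second_link" if branch in fault_tags else "first_link"


fault_tags = set()
record = InstructionRecord()
record.timed_out_on_first_link = True         # timeout observed on the first link
record.succeeded_on_second_link = True        # success response via the second link
if record.is_operation_exception():
    fault_tags.add(1)                         # set the fault tag of branch 1
```

With the fault tag set, choose_link(1, fault_tags) selects the second link for later requests to the
first storage disk, while untagged branches continue to use the first link.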
Next, in step 315, the access host initiates a third data operation request to the first storage
controller; this request is for data access to the second storage disk. In step 316, the first
storage controller sends a third operation instruction to the second storage disk over the second
branch of the first link, requesting a read operation or a write operation on the corresponding
data. In step 317, the first storage controller detects that the third operation instruction has
timed out, switches links, and forwards the third operation instruction to the second storage
controller over the connection between the first storage controller and the second storage
controller. In step 318, the second storage controller sends the third operation instruction to the
second storage disk through its own connection chip. Steps 317 and 318 together implement the
forwarding of the third operation instruction by the first storage controller to the second storage
disk over the second branch of the second link, where the second link includes the connection chip
of the second storage controller. In step 319, after completing the operation, the second storage
disk sends the operation success response to the third operation instruction to the second storage
controller. In step 320, the second storage controller forwards that operation success response to
the first storage controller. In step 321, the first storage controller sends the corresponding
access response to the host.
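The timeout-and-forward flow of steps 316 to 320 can be sketched as below. This is a simplified
illustration under stated assumptions: the function execute_with_failover and the two callables
standing in for the links are hypothetical, and a timeout is modelled by Python's built-in
TimeoutError rather than by any real disk or chip interface.

```python
def execute_with_failover(instruction, send_first_link, send_second_link, timeout_s):
    """Send over the first link; on timeout, forward over the second link.

    Returns (response, operation_exception), where operation_exception is True
    when the first link timed out but the second link succeeded.
    """
    try:
        return send_first_link(instruction, timeout_s), False
    except TimeoutError:
        # Steps 317 to 320: forward via the peer controller's connection chip.
        return send_second_link(instruction), True


def stuck_first_link(instruction, timeout_s):
    # Stand-in for a first link whose connection chip does not respond.
    raise TimeoutError(f"no response within {timeout_s}s")


def healthy_second_link(instruction):
    # Stand-in for the path through the second storage controller.
    return f"OK:{instruction}"


response, exception_seen = execute_with_failover(
    "read block 7", stuck_first_link, healthy_second_link, 2.0
)
```

The returned flag is what the statistics step would consume when it counts operation exceptions.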
In step 322, after receiving the operation success response to the third operation instruction over
the second branch of the second link, the first storage controller again counts the number of
operation exceptions. When it determines, according to the predetermined statistical rule, that the
count needs to be increased, it increases the number of operation exceptions by one. The operation
exception indicates that an operation instruction received by the first storage controller timed
out on the first link but was executed successfully over the second link. In this embodiment,
according to the first statistical rule, it is determined that an operation exception has occurred
for the first time on the second branch of the first link, and the operation exception count is
updated from one to two. In this step, the first storage controller may additionally set the fault
tag of the second branch of the first link: after determining in step 322 that an operation
exception has occurred on the second branch of the first link, the first storage controller may set
the fault tag of that branch, which indicates that the second branch of the first link is
unavailable or that its level has been lowered. After the fault tag of the second branch of the
first link is set, when the access host initiates other data operation requests to access the second
storage disk and the first storage controller receives them, the first storage controller may,
according to the fault tag of the second branch of the first link, switch links directly or split
traffic between links.
Further, the first storage controller may start a timer before step 301, or at any moment during
steps 301 to 322, to monitor the operation exception count over a period of time. If the operation
exception count reaches a predetermined threshold within the predetermined time, the count indicates
that the first link has failed, and the first storage controller may then repair the fault on the
first link. For example, in step 323 the timer expires and the first storage controller determines
whether the operation exception count has reached the predetermined threshold; if it has, the first
storage controller repairs the fault on the first link. Under the first statistical rule, only the
first operation exception on each branch of the link is counted; correspondingly, the predetermined
threshold is set to the number of storage disks, that is, the number of branches of the link. When
it is determined that an operation exception has occurred on every branch of the link (that is,
data operation instructions sent to every storage disk have encountered operation exceptions), the
predetermined threshold is met and the link can be deemed to have failed; repairing the connection
chip of the link then resolves the fault. Under the second statistical rule, every operation
exception on the link is counted, regardless of whether it is the first one on its branch. In this
case the predetermined threshold is set to an empirical value, which also allows a link fault to be
identified with a certain probability; repairing the connection chip of the link after the fault is
identified resolves the fault.
Under the first statistical rule, the predetermined threshold may also be smaller than the number of
storage disks, for example two thirds of the number of storage disks. When the operation exception
count reaches two thirds of the number of storage disks, that is, when operation exceptions have
occurred on two thirds of the branches of the link, the link can likewise be identified as having
failed.
In addition, the predetermined time may be tracked by a timer or implemented in another way. If the
number of operation exceptions reaches the predetermined threshold before the predetermined time
has elapsed, a link fault may also be deemed to have occurred.
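The threshold check described above can be sketched as follows. The function link_failed, the use
of timestamps in seconds, and rounding the two-thirds threshold up to a whole number are assumptions
for illustration; the embodiment does not specify how a fractional threshold is rounded.

```python
import math


def link_failed(exception_times, window_start, window_length, threshold):
    """Return True if at least `threshold` counted exceptions fall inside the window."""
    window_end = window_start + window_length
    in_window = [t for t in exception_times if window_start <= t <= window_end]
    return len(in_window) >= threshold


num_disks = 6
full_threshold = num_disks                          # first rule: one exception per branch
relaxed_threshold = math.ceil(2 * num_disks / 3)    # two thirds of the storage disks

# Timestamps (seconds) of exceptions already counted under the first rule.
times = [1.0, 2.5, 4.0, 9.0]
print(link_failed(times, 0.0, 10.0, relaxed_threshold))
print(link_failed(times, 0.0, 10.0, full_threshold))
```

Because the function can be called at any moment, it also covers the case in which the threshold is
reached before the predetermined time has elapsed.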
Specifically, because the first link contains the connection chip of the first storage controller,
the first storage controller can repair the fault on the first link by repairing that connection
chip. More specifically, the first storage controller may restart the connection chip of the first
storage controller; or isolate the connection chip of the first storage controller; or repair a
queue on the connection chip of the first storage controller; or repair a port on the connection
chip of the first storage controller.
After the above fault handling, embodiments of the present invention may also proceed as follows.
After repairing the fault on the first link, the first storage controller may delete the fault tag
of the first link, or set a normal tag for the first link; the normal tag indicates that the first
link is available or that the level of the first link is normal. Then, when the first storage
controller subsequently receives other data operation requests to be sent to the first storage disk,
it selects the first link directly for sending operation instructions, according to the state of the
first link (the fault tag has been deleted, or the normal tag of the first link has been set).
Because the path of the first link is shorter than that of the second link, subsequent operation
instructions are processed faster. This avoids the delay incurred when subsequent operation
instructions are executed over the second link and improves the processing efficiency of operation
instructions. Embodiments of the present invention thus address the execution delay of operation
instructions caused by link faults in a storage system, avoid the prior-art problem in which
handling faults by switching paths makes operations succeed but inefficiently, and further improve
the efficiency of the storage system.
After repairing the first link, embodiments of the present invention may further detect whether the
repair of the first link succeeded, and delete the fault tag of the first link or set the normal tag
of the first link only after detecting that the repair succeeded. For example, it is detected
whether the connection chip in the first link has been repaired successfully; once the state of the
connection chip is normal, the fault tag of the first link is deleted or the normal tag of the first
link is set. If it is detected that the repair of the first link was unsuccessful, for example
because the connection chip in the first link could not be repaired, the first storage controller
may issue a fault maintenance notification for the first link. For example, it may issue a
maintenance or replacement notification for the faulty connection chip in the first link, so that
the hardware fault is resolved thoroughly.
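The repair, verification, and tag handling described above might be combined as in the following
sketch. The function repair_first_link and its parameters are hypothetical. Trying several repair
actions in sequence is an assumption of this sketch; the embodiment lists restarting, isolating,
queue repair, and port repair as alternatives without prescribing an order.

```python
def repair_first_link(repair_actions, chip_is_healthy, fault_tags, notify):
    """Try repair actions in order; clear fault tags only after a verified repair."""
    for action in repair_actions:
        action()
        if chip_is_healthy():
            fault_tags.clear()       # the first link becomes selectable again
            return True
    # No action restored the chip: ask for maintenance or replacement.
    notify("connection chip on first link needs maintenance or replacement")
    return False


state = {"healthy": False}
notices = []
tags = {1, 2}                        # fault tags on branches 1 and 2


def restart_chip():
    state["healthy"] = True          # pretend that the restart fixes the chip


ok = repair_first_link([restart_chip], lambda: state["healthy"], tags, notices.append)
```

When verification fails for every action, the fault tags stay in place, so traffic keeps using the
second link until the chip is serviced.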
The functional modules of the storage controller provided by embodiments of the present invention
are described below. As shown in FIG. 4, the storage controller includes a storage processing module
401 and a link failure processing module 402. The link failure processing module 402 may be a
functional enhancement of the storage management logic, or it may be separate processing logic
independent of the storage management logic; those skilled in the art can implement it flexibly
according to the teachings of the embodiments of the present invention.
The storage processing module 401 is configured to receive a first data operation request and to
send a first operation instruction, over the first branch of the first link, to the first storage
disk corresponding to the first data read/write operation. The first link is a link that includes
the connection chip of the first storage controller, and the first branch of the first link is the
connection of the first link whose target end is the first storage disk.
The link failure processing module 402 is configured to, after detecting that the first operation
instruction has timed out, forward the first operation instruction to the first storage disk over
the first branch of the second link, where the second link is a link that includes the connection
chip of the second storage controller and the first branch of the second link is the connection of
the second link whose target end is the first storage disk; and to receive the operation success
response to the first operation instruction transmitted over the first branch of the second link,
count the number of operation exceptions according to the operation success response, and perform
fault repair on the connection chip in the first link after the operation exception count reaches
the predetermined threshold within the predetermined time. The operation exception indicates that
the first operation instruction timed out on the first link but was executed successfully over the
second link.
The counting of operation exceptions by the link failure processing module 402 specifically
includes: according to a statistical rule and the operation success response, increasing the number
of operation exceptions by one or keeping the previously counted number of operation exceptions
unchanged, where the statistical rule includes: an operation exception occurring on any given branch
of the first link is counted only once. Correspondingly, the predetermined threshold is the number N
of storage disks in the storage system.
The link failure processing module 402 is further configured to, after determining that an
operation exception has occurred on the n-th branch of the first link, set a fault tag for the n-th
branch of the first link. The fault tag indicates that the n-th branch of the first link is
unavailable or that the level of the n-th branch of the first link has been lowered, where n is a
natural number variable that is greater than or equal to 1 and less than or equal to N. Then, after
receiving a subsequent data operation request for the n-th storage disk, the storage processing
module 401 sends the subsequent operation instruction directly to the n-th storage disk over the
n-th branch of the second link, according to the fault tag of the n-th branch of the first link.
The link failure processing module 402 is further configured to, after performing fault repair on
the connection chip in the first link, delete the fault tag of every branch of the first link, or
set a normal tag for every branch of the first link; the normal tag indicates that the first link
is available or that the level of the first link is normal. Then, after receiving a subsequent data
operation request, the storage processing module 401 switches back to the first link, according to
the state of the fault tag of the first link or the normal tag of the first link, and sends the
operation instruction over the first link to the storage disk targeted by the subsequent data
operation request.
The specific functions of each functional module are also described in the embodiment shown in
FIG. 3 above and are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description,
the specific working processes of the systems, apparatuses, and units described above may be found
in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed
systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus
embodiments described above are merely illustrative. For example, the division into units is only a
division by logical function, and other divisions are possible in actual implementation: multiple
units or components may be combined or integrated into another system, or some features may be
omitted or not executed. In addition, the mutual couplings, direct couplings, or communication
connections shown or discussed may be indirect couplings or communication connections through some
interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.