US20140089735A1 - Computer System Input/Output Management - Google Patents
- Publication number
- US20140089735A1
- Authority
- US
- United States
- Prior art keywords
- data
- initiator
- network
- san
- agent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0823—Errors, e.g. transmission errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3495—Performance evaluation by tracing or monitoring for systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/133—Protocols for remote procedure calls [RPC]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/28—Timers or timing mechanisms used in protocols
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/86—Event-based monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/88—Monitoring involving counting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0604—Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/06—Generation of reports
- H04L43/065—Generation of reports related to network devices
Definitions
- Embodiments of the present invention relate generally to the collection of event and performance data at one or more servers in communications infrastructures such as a Storage Area Network (SAN) and forwarding that data to a network diagnostics manager where the data can be analyzed and adjustments made to the SAN to improve its performance, referred to herein as input/output (I/O) management.
- SAN Storage Area Network
- I/O input/output
- HBA Host Bus Adapter
- NIC Network Interface Card
- the disk arrays in the SAN can be presented as Small Computer System Interface (SCSI) disks to the Operating System (OS).
- SCSI Small Computer System Interface
- OS Operating System
- the SCSI disks are, in turn, either presented up to an application running in the server as a File System or a raw disk device.
- the OS and applications running on the server may access the SAN storage array as a disk connected to the server.
- SAN errors and bottlenecks are often difficult to diagnose, and can be caused by subtle interactions with seemingly unrelated devices.
- the SCSI protocol is layered on top of the network protocol.
- the OS issues SCSI commands to the storage array to access data and control the storage arrays.
- Two types of commands are issued to the storage array.
- Data commands e.g., read, write, report Logical Unit Number (LUN or, more simply, LU), and the like
- Task management commands e.g., target reset, LUN reset, etc.
- the task management commands issued by one server to a storage array can affect data access commands from another server to the same storage array. This also means the action of one server connected to a storage array can cause an error on another server connected to same storage array.
- a server configured as an initiator to enter a debug or diagnostics mode.
- agents can be employed to collect massive amounts of counter information and protocol event data (SCSI events, Fibre Channel (FC) events, discovery events, and the like) related to the fabric and target at each HBA and store the collected data in a system log file.
- FC Fibre Channel
- the counters only provide information about the performance of a particular HBA or switch port (e.g., the amount of data passing through an HBA, the number of data packets sent, received, etc.), but do not provide a “big picture” of what is happening in the overall network.
- the high-level event data is also generally limited to information about events seen at a particular HBA or switch port (e.g., a notification that a network component was inserted or removed, etc.), but as with the counter information, it does not provide a “big picture” of what is happening in the overall network. High-level events are also problematic in helping determine root causes, because they can often be intentionally induced by the end-user, or simply a symptom of a problem created by a root cause existing elsewhere.
- this type of intensive data collection represents an overhead burden that affects the performance of the system because the system is still operating while the massive amount of event data is being collected.
- the mere act of operating in a diagnostics mode can mask the problem.
- the mere collection of data does not provide any insight into the problem. The system log file must be reviewed, and the data collected at the time the performance issues were occurring must be interpreted in an attempt to diagnose the problem.
- Protocol information e.g. network protocol and SCSI protocol information
- SCSI protocol information is much more valuable for uncovering root cause, but this information is typically locked up in the network devices and never exposed.
- Some existing network diagnostics tools do not require special hardware placed at various locations throughout the network. Such tools communicate with the fabric switches (each of which has a Simple Network Management Protocol (SNMP) agent running inside it) using the SNMP protocol, and gather high level counter data (e.g., how many bytes have been transmitted in the last hour, the number of read commands in the last hour, etc.).
- SNMP Simple Network Management Protocol
- an endpoint no longer sends any commands. The lack of activity detected by the counters indicates there may be a problem, but the type of problem is unknown.
- Some network switches have an option where a port can be directed to send information to analyzer hardware within the switch. Additional hardware external to the switch then encapsulates the information into Ethernet frames that can be read with dedicated software.
- This type of hardware solution represents another hardware add-on that provides for the viewing of lower level protocol items. It does this by extracting portions of the packets that the switch may not normally extract for the purpose of collecting the information, and does so only on a single port at a time. After the initiator stack obtains this information from the target and fabric, the information can be interpreted. However, in response to this information, the initiator can only control its own operation (e.g., not send as much data, try another route, etc.). Moreover, the initiator does not keep a “scorecard” of this information for diagnosing network performance issues.
- Emulex Corporation's HBAnyware™ management suite, in its current configuration, keeps track of how HBAs are performing and how they are configured, enables HBAs to be configured remotely, and allows reports to be sent to remote locations on the network.
- HBAnyware™ is disclosed in U.S. application Ser. No. 10/277,922, filed on Oct. 21, 2002, the contents of which are incorporated herein by reference.
- the functionality of HBAnyware™ resides in HBA device drivers, but remote user space agents in the HBAs are also needed to perform the management functions.
- HBAnyware™ collects configuration information about the HBAs using agents in the remote servers (HBAs) and causes the HBAs to be configured for different sizes and behaviors.
- HBAnyware™ communicates with the remote servers both in-band and out-of-band.
- the HBA drivers in the remote servers communicate with each other to allow centralized management of the SAN and configuration of HBA hardware at a central point. For example, if HBAnyware™-compatible hardware is located somewhere in the SAN, it can be discovered by the HBAnyware™ software. Messages can be sent to and received from the HBAnyware™-compatible hardware that cause the firmware in the hardware to be updated, enable the configuration of the LUNs in the network, etc. All of this can be done from a central location rather than requiring each server to separately configure its own HBA.
- HBAnyware™ can also collect some types of diagnostics information. With HBAnyware™, the agents collect data from the stack, but only data local to the HBA (e.g. link up, link down) is collected. Counter data is collected from the HBAs, but it is generally uninteresting, and no lower level protocol events, no latency data, and no capacity information is collected. Moreover, HBAnyware™ does not integrate the collected information into a system view.
- FIG. 1 illustrates an exemplary conventional SAN 100 including a host computer 102 , a fabric 104 , a target 106 and one or more Logical Units (LUs) 108 , which are actually logical drives partitioned from one or more physical disk drives controlled by the target's array controller.
- the host computer 102 includes an initiator 110 such as a Host Bus Adapter (HBA) or I/O controller for communicating over the SAN 100 .
- a representative application 112 is shown running on the host computer 102 .
- the fabric 104 may implement the Fibre Channel (FC) transport protocol for enabling communications between one or more initiators 110 and one or more targets 106 .
- the target 106 acts as a front end for the LUs 108 , and may be a target array (a single controller with one or more ports for managing, controlling access to and formatting of LUs), Just a Bunch Of Disks (a JBOD) (a collection of physical disks configured in a loop, where each disk is a single target and a LU), a Switched Bunch Of Disks (SBOD®), or the like.
- JBOD Just a Bunch Of Disks
- SBOD® Switched Bunch Of Disks
- An example of a conventional target array is an EMC Symmetrix® storage system or an IBM Shark storage system.
- the application 112 may employ a file system protocol and may initiate read or write I/O commands 114 that are sent out of the host 102 through the initiator 110 and over the fabric 104 to target 106 , where data may be read from or written to one or more of the LUs 108 .
- When an I/O command 114 is transmitted, there is an expectation that the I/O command will be completed, and that it will be completed within a certain period of time. If the read or write operation is completed successfully, an I/O command completion notification 116 will be delivered back to the application 112.
- the I/O command may not complete, and no I/O command completion notification 116 will be sent back to the application 112 .
- the only feedback received by the application 112 may be an indication that the I/O command timed-out, and a reason code providing a reason for the timeout.
- If the average I/O command completion times for two different LUs 108 in the same target 106 are drastically different (e.g., greater than 25% difference) for a similar I/O pattern and RAID level, this may be an indication that the LUs are unbalanced and that there is some unfairness at the target, and that perhaps the LU loads need to be re-balanced to achieve a greater degree of fairness.
- If the average I/O command completion times for all LUs 108 at a target 106 are rising over time and becoming too high, this may be an indication that the target is receiving too many I/O requests and that more storage needs to be added so that some data can be shifted to the new target. In other words, it is desirable for the application to detect unfairness among LUs and/or overloaded conditions at a particular target.
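The two checks described above (unfairness between LUs and a target whose latencies keep rising) can be sketched as follows. This is only an illustration: the function names, the choice to measure the 25% difference relative to the smaller of the two averages, and the `limit_ms` parameter are assumptions, not details taken from the application.

```python
from statistics import mean


def unbalanced_pairs(avg_ms_by_lu, threshold=0.25):
    """Return pairs of LUs whose average completion times differ by more
    than `threshold`, measured relative to the smaller average (assumption)."""
    lus = sorted(avg_ms_by_lu)
    flagged = []
    for i, a in enumerate(lus):
        for b in lus[i + 1:]:
            lo, hi = sorted((avg_ms_by_lu[a], avg_ms_by_lu[b]))
            if lo > 0 and (hi - lo) / lo > threshold:
                flagged.append((a, b))
    return flagged


def target_overloaded(history, limit_ms):
    """`history` is a list of per-interval dicts {lu: average_ms}.
    Report overload when every LU's average rose between the first and last
    sample and the latest mean across LUs exceeds `limit_ms` (assumption)."""
    first, last = history[0], history[-1]
    rising = all(last[lu] > first[lu] for lu in first)
    return rising and mean(last.values()) > limit_ms
```

For example, `unbalanced_pairs({"LU0": 10.0, "LU1": 14.0, "LU2": 11.0})` flags LU1 against both of the other LUs, while LU0 and LU2 are within 25% of each other.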
- queue depth is one of the “knobs” available to the storage administrator to balance the system.
- a SAN can be thought of in terms of many other queuing problems.
- the SAN has a fixed I/O handling capacity, and that capacity needs to be shared by all the applications that are demanding I/O.
- SRM Storage Resource Management
- SAN Management data link utilization, for example
- Storage Management data primarily storage capacity
- the fabric is rarely the I/O capacity bottleneck. More often, the bottleneck is either at the server or at the storage controller.
- I/O handling capacity depends on a number of factors, including memory availability, kernel architecture, and Central Processing Unit (CPU) power.
- CPU Central Processing Unit
- I/O handling is also dictated by a number of factors, including the system architecture, the controller front-end, the amount and speed of cache, the controller back-end, and the actual disks themselves.
- Managing performance issues requires an understanding of the current mapping of initiators to target ports and backend devices.
- understanding the queue depth demand of every initiator, the I/O handling capability of the storage controllers, and an understanding of the actual queue demand placed on the system by every initiator is highly desirable. All of this information must be put together to help understand where the performance issue is, and what areas can be leveraged to mitigate or eliminate the performance issue. Putting together this information is becoming more difficult in today's data centers. With virtual server technology, more queuing demand is placed on storage controllers by fewer initiators and servers. Further, the mapping of all the queue demand to the storage controllers is more difficult to discern and aggregate.
- Embodiments of the present invention relate generally to the collection of event and performance data at one or more servers in communications infrastructures such as a SAN and forwarding that data to a network diagnostics manager where the data can be analyzed and adjustments made to the SAN to improve its performance, referred to herein as I/O management.
- One embodiment of the present invention relates to the use of remote agents and a central server application for collecting specific interesting negative event data (diagnostics data) to enable a picture of the operational health of the SAN to be determined.
- agents are placed in servers having HBAs, NICs, or other adapters (I/O controllers) acting as initiators.
- the agents interact with relatively inexpensive HBAs in the servers through a driver stack to collect event data. Because of the initiator function they perform, HBAs have visibility to parts of the network that other entities do not have access to, and thus are ideal locations for gathering event data.
- a SAN diagnostics manager located in one or more management servers then pulls the collected data from each agent so that it can piece together a “picture” of the SAN that an individual server would not ordinarily be able to see.
- the agents can also collect errors and performance data (e.g., throughput problems, etc.) seen at the HBAs from the OS of the servers.
- the agents according to embodiments of the present invention are (1) nondisruptive, (2) capable of being activated when needed, (3) selectively configurable to look for only a certain number of data items and store them in memory, and (4) configured for periodically sending this information back to a central location so that a picture of what is happening in the SAN can be developed.
- Although the information being collected at any one particular HBA may not by itself be enlightening as to overall network performance, the collection of information being gathered by a number of HBAs can reveal the trouble spots in the network.
- embodiments of the present invention are able to collect protocol-based negative event data such as error messages and observational performance information received by initiators from the targets.
- embodiments of the present invention can cause little or no additional performance degradation by being configurable to collect only a relatively small amount of interesting negative event data.
- embodiments of the present invention collect “stateful” event information having a temporal component.
- Because counters do not provide time information, they must be monitored constantly, which can have a negative performance impact.
- SAN diagnostics tools can only provide counter data indicating, for example, the number of bytes being received by an HBA, with no visibility at the driver level.
- embodiments of the present invention can make requests of the driver itself, such as observed performance indications (e.g., a latency timer that starts when the request is made and stops when a completion message is received). These performance indications can reveal previously undetectable performance issues at the driver level.
- embodiments of the present invention do not require hardware, just downloadable drivers, APIs and agents, and can collect the specific kind of data needed to develop a big picture of the network.
- the data collected according to embodiments of the invention, although collected by the initiator driver stack, relates not to the initiator but to other parts of the network such as the switches or the targets.
- a further embodiment of the present invention relates to the computation of an oversubscription value based on the demand for a device divided by the handling capacity of the device to help determine whether the device is oversubscribed and changes need to be made.
- a still further embodiment of the present invention relates to collecting and logging certain types of event data in a database in a centralized management server, and computing a system severity value indicative of the level of impact (criticality or severity) of each event.
- FIG. 1 illustrates an exemplary conventional SAN including an initiator for sending an I/O command, a fabric, and a target for the I/O command including one or more LUs.
- FIG. 2 illustrates an exemplary kernel within a host computer for determining I/O command completion times according to embodiments of the present invention.
- FIG. 3 illustrates an exemplary flowchart describing the determination of I/O command completion times according to embodiments of the present invention.
- FIG. 4 a illustrates an exemplary SAN in an enterprise data center according to embodiments of the invention.
- FIG. 4 b illustrates an exemplary production server in greater detail according to embodiments of the invention.
- FIG. 4 c illustrates an exemplary management server in greater detail according to embodiments of the invention.
- FIG. 5 illustrates an exemplary organizational structure of an enterprise data center.
- FIG. 6 illustrates an exemplary communication flow between software and hardware elements according to embodiments of the present invention.
- FIG. 7 is an example of a SAN and a SAN diagnostics manager capable of computing an oversubscription value according to embodiments of the present invention.
- FIG. 8 illustrates an exemplary SAN implementing a severity data collection system according to embodiments of the present invention.
- Embodiments of the present invention relate generally to the collection of data at one or more servers in a SAN and forwarding that data to a network diagnostics manager where the data can be analyzed and adjustments made to the SAN to improve its performance.
- One particular embodiment of the present invention relates to the processing of I/O commands across the SAN, and more particularly, to the determination of I/O command completion times and average I/O command completion times (latency) per logical drive in a SAN to enable optimization of storage allocations and improve I/O command completion times.
- Another particular embodiment of this invention relates to the use of remote agents embedded in initiators and a network diagnostics manager application for collecting specific interesting negative event data (diagnostics data) to enable a picture of the operational health of the SAN to be determined.
- a further particular embodiment of the present invention relates to the computation of an oversubscription value based on the demand for a device divided by the handling capacity of the device to help determine whether the device is oversubscribed and changes need to be made.
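The ratio described above can be written down directly. The sketch below is a minimal illustration; the function name, the use of queue depth as the unit of demand and capacity, and the sample numbers are assumptions chosen for the example.

```python
def oversubscription(demand, capacity):
    """Oversubscription value: demand for a device divided by its handling
    capacity. A value above 1.0 suggests the device is oversubscribed."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return demand / capacity


# Hypothetical example: three initiators place queue-depth demand on one
# storage controller port that can handle 100 outstanding commands.
initiator_queue_depths = [32, 32, 64]
value = oversubscription(sum(initiator_queue_depths), 100)
print(value)  # 1.28
```

Both quantities must be expressed in the same unit for the ratio to be meaningful.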
- a still further particular embodiment of the present invention relates to collecting and logging certain types of event data in a database in a centralized management server, and computing a system severity value indicative of the level of impact (criticality or severity) of each event.
- the first embodiment of the present invention to be described in greater detail relates to the determination of I/O command completion times and average I/O command completion times (latency) per logical drive in a SAN to enable optimization of storage allocations and improve I/O command completion times.
- Although embodiments of the present invention are described herein in terms of SCSI upper layer transport protocols and FC lower layer transport protocols for purposes of illustration only, embodiments of the present invention are applicable to other upper and lower layer transport protocols.
- embodiments of the present invention are not limited to fabric-attached storage, but apply to any SAN topology discoverable by the present invention, be it hub-based, arbitrated-loop based, or fabric based.
- FIG. 2 illustrates an exemplary kernel 200 within a host computer for computing I/O command completion times and average I/O command completion times according to embodiments of the present invention.
- the kernel 200 is the essential center of the host operating system, the core that provides basic services for all other parts of the operating system.
- the kernel 200 may include an upper transport protocol layer such as SCSI layer 202 and a lower transport protocol driver layer 204 .
- the driver 204 may include a transmit section 206 , a receive section 218 , and global data space 228 .
- the driver's global data space 228 may store driver configuration data, buckets 224 for each LU, and a queue 234 for each LU, described in further detail below.
- the host operating system calls the driver 204 , which allocates a block of storage or data structure within its global data space 228 representing that port instance, and assigns a target pointer to that block of storage.
- the driver monitors multi-ported I/O controllers the same as single-port I/O controllers. This approach maintains target/port independence. In other words, the driver does not try to figure out whether two or more targets belong to a single I/O controller.
- FC discovery provides for target discovery only, and targets are not subordinate to each other.
- a multi-port array looks like multiple targets to the driver's discovery engine, just like a JBOD with four disks is discovered as four targets.
- Embodiments of the present invention track this raw data just the same, allowing upper layer applications to “link” target/LU disturbances together with additional knowledge of the SAN topology.
- To compute average completion time on a per-LU, per-target and per-port basis, the driver 204 must store statistics about the completion times for a number of I/O command completions on a per-LU, per-target, per-port basis. Therefore, in embodiments of the present invention, the driver may allocate “buckets” (memory locations) within its global data space 228 for storing a count of the number of I/O commands that completed within a particular range of time.
- one bucket may keep track of the number of I/O commands that took between 0.0 and 10.0 ms to complete, another bucket may keep track of the number of I/O commands that took between 10.0 and 20.0 ms to complete, another bucket may keep track of the number of I/O commands that took between 20.0 and 30.0 ms to complete, and so on.
- Bucket sizes may be fixed by the driver 204 , or may be specified by the system administrator when the driver is loaded.
- Each bucket 224 corresponds to a particular LU, target and port. In the example of FIG. 2 , N buckets are allocated for each of M LUs, and this two-dimensional array may be repeated for P targets and Q ports.
- a three-dimensional per-LU, per-target, per-port histogram array is stored in the driver's global data space 228 .
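The bucket layout described above can be pictured as a small histogram per (port, target, LU) combination. The following sketch assumes fixed 10 ms buckets, half-open ranges (a 10.0 ms completion falls in the second bucket), and a final catch-all bucket for very slow commands; the names `bucket_index`, `record`, and `histograms` are illustrative and do not come from the application.

```python
from collections import defaultdict

BUCKET_WIDTH_MS = 10.0   # assumed fixed bucket size
NUM_BUCKETS = 8          # N buckets per LU; the last one is a catch-all


def bucket_index(completion_ms):
    """Map a completion time to its bucket: [0,10) -> 0, [10,20) -> 1, ..."""
    if completion_ms < 0:
        raise ValueError("completion time cannot be negative")
    return min(int(completion_ms // BUCKET_WIDTH_MS), NUM_BUCKETS - 1)


# One list of N counters per (port, target, LU) key, created on first use.
histograms = defaultdict(lambda: [0] * NUM_BUCKETS)


def record(port, target, lu, completion_ms):
    """Increment the bucket matching this completion time."""
    histograms[(port, target, lu)][bucket_index(completion_ms)] += 1
```

A dictionary keyed by tuple stands in here for the fixed multi-dimensional array a kernel driver would allocate in its global data space.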
- the buckets 224 are accumulators, so they are not normally reset. Eventually, they may wrap back to zero, so embodiments of the present invention may keep track of when the count in each bucket wraps around. For example, if the total count in an N-bit bucket is 2^N, and it has wrapped twice, the driver must recognize that the count in the bucket is 2 × 2^N plus whatever count is in the bucket at the time the computation is performed. One way that wrapping could be estimated is to keep track of the previous count and compare it to the new count. If the new count is lower, then it is assumed that the bucket count wrapped once. Alternatively, the driver could reset the bucket counts when an overflow condition is encountered, or the driver could issue a signal or trap to the application indicating an overflow, and the application could initiate a reset.
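The wrap-tracking approach described above (compare the new count with the previous one and treat a decrease as one wrap) might look like the sketch below. The class name and its methods are hypothetical, and the N-bit counter is simulated with modular arithmetic.

```python
class WrappingBucket:
    """Simulates an N-bit accumulator and counts how often it wrapped."""

    def __init__(self, bits):
        self.modulus = 1 << bits   # 2^N distinct values
        self.raw = 0               # value held by the N-bit counter
        self.wraps = 0             # number of detected wrap-arounds

    def increment(self):
        previous = self.raw
        self.raw = (self.raw + 1) % self.modulus
        if self.raw < previous:    # new count lower than old: assume one wrap
            self.wraps += 1

    def total(self):
        """True count: wraps * 2^N plus whatever is in the bucket now."""
        return self.wraps * self.modulus + self.raw
```

With a 4-bit bucket incremented 37 times, the raw counter reads 5 after wrapping twice, and `total()` recovers 2 × 16 + 5 = 37.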
- I/O commands 232 from an application being executed by the host are received by the upper SCSI layer 202 (see block 300 of FIG. 3 ) and passed down to the transmit section 206 of driver 204 as SCSI commands 230 .
- the transmit section 206 time stamps a start time of the SCSI command 230 (see reference character 208 in FIG. 2 and block 302 in FIG. 3 ) and embeds the time stamp into a transport protocol data structure such as a FC command 210 that encapsulates the SCSI command.
- the FC command 210 is then sent out over wiring 212 to the FC fabric 214 (see block 304 in FIG. 3 ).
- the FC command 210 that encapsulates the SCSI command and timestamp includes some data fields (including the timestamp field) that do not leave the kernel memory allocated and managed by the driver.
- the receive section 218 then computes an elapsed I/O command completion time (net round trip completion time from transmit to receive) 222 based on the difference between the timestamped SCSI command start time 208 extracted from within the I/O command completion 216 and the recorded completion time 220 (see block 308 in FIG. 3 ). Every time an I/O command completion time 222 is computed for a particular LU and port, the count in the appropriate bucket 224 (based on the I/O completion time) is incremented (see block 310 in FIG. 3 ). The buckets therefore maintain a count of the number of I/O commands completed and a distribution of all completion times.
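The timestamp-and-subtract step described above can be sketched in a few lines. This is an illustration only: the `CommandTimer` class, the dictionary standing in for the transport protocol data structure, and the `start_s` field name are assumptions. A monotonic clock is used so that wall-clock adjustments cannot produce negative elapsed times.

```python
import time


class CommandTimer:
    """Stamps a command on transmit and computes elapsed time on completion."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable so the logic can be tested

    def stamp(self, command):
        """Transmit side: embed the start time in the command structure."""
        command["start_s"] = self.clock()
        return command

    def elapsed_ms(self, command):
        """Receive side: completion time minus the embedded start time."""
        return (self.clock() - command["start_s"]) * 1000.0


timer = CommandTimer()
cmd = timer.stamp({"opcode": "READ", "lu": 0})
# ... the command travels to the target and the completion arrives ...
print(f"round trip: {timer.elapsed_ms(cmd):.3f} ms")
```

The resulting value is what would then select which bucket to increment.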
- the count in the buckets may form a bell curve, or may form two different spikes, one for reads and one for writes, with the read times being much shorter than the writes.
- the relative position of these spikes to each other depends on the nature of the I/O mix and how the storage is set up (e.g., RAID, RAID1 or RAID5).
- the count can be used to compute an average I/O command completion time for a particular LU.
- the I/O command completion time measurement is based on a clock in the host, and utilizes high resolution timers in the operating system that resolve to milliseconds at least.
- the driver 204 keeps track of the time from when the driver sent an I/O command to the time it receives an acknowledgement of the completion of that I/O command, all the way back through the network from the LU. In other words, it is the entire round trip time from the driver's perspective.
- the I/O command completion time measurement is performed by the lower transport protocol driver layer 204 .
- embodiments of the present invention track the I/O command completion times in the driver 204 from the time the SCSI layer 202 gives the SCSI command 230 to the driver to the time the driver receives the I/O completion 216 .
- the I/O command completion times therefore take into account all of the transport layer latency and overhead without injecting continued SCSI layer file system application thread transitions to user space into the completion time. The measurements are more accurate because the delays due to higher level processing are not included.
- the receive section 218 may then compute an updated average I/O command completion time for the particular LU and port (see block 310 in FIG. 3 ).
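- to compute the average, the count in each bucket is multiplied by the completion time that the bucket represents, and the product of each multiplication for each bucket associated with that LU is then summed.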
- the sum is then divided by the sum of the counts in all of the buckets for that LU to produce the average I/O command completion time for that LU. This computation is repeated for all LUs in each target and for all ports.
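- the averaging computation can be sketched as follows, assuming each bucket is summarized by a single representative completion time (for example its midpoint); the function name and the choice of representative times are illustrative assumptions.

```python
def average_completion_ms(counts, representative_ms):
    """Weighted average completion time for one LU.

    counts[i] is the number of completions in bucket i and
    representative_ms[i] is the time that bucket stands for (an assumption).
    Returns None when no I/O commands have completed.
    """
    total = sum(counts)
    if total == 0:
        return None
    weighted_sum = sum(c * t for c, t in zip(counts, representative_ms))
    return weighted_sum / total
```

- for counts of 2, 0 and 2 in buckets representing 1 ms, 3 ms and 5 ms, the sum of products is 12 and the total count is 4, giving an average of 3 ms.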
- raw data such as the bucket counts may be sent to a higher layer, and the computation of average I/O command completion times may be performed at this higher layer.
- the buckets and/or average I/O command completion times may also be made available to upper layer applications for display to system administrators via a host-provided Application Programming Interface (API).
- This host-provided API typically receives data at its bottom edge from the driver while exporting a callable interface at its top edge for applications.
- the data may also be used to make histogram plots that aid in early warning detection and usage patterns on the storage device (see block 312 in FIG. 3 ).
- if a higher level application wants to read the histogram data, it may request a target/LU pairing, and the driver would index into its private data structure, access it in its entirety, and return it back up to the application (see block 314 in FIG. 3 ).
- embodiments of the present invention may detect overloading (average I/O command completion time for a LU approaching an upper limit, or I/O commands failing altogether).
- This upper limit may represent a predetermined time interval from a maximum allowable I/O command completion time specified by the upper layers, both of which may be a default value that may also be configurable by the system administrator.
- a queue 234 may be maintained in the driver's global data space 228 for each LU in each target for each port. This queue 234 holds outstanding (pending and incomplete) I/O commands for that LU.
- the depth of the queue 234 may be controllable at the SCSI layer of the initiator. Adjusting the queue depth serves to control the number of outstanding I/O commands for each LU.
- the receive section 218 may not only generate per-LU average I/O command completion times, as described above, but may also act on them, for example by throttling back the per-LU queue depth. For example, suppose that the receive section 218 detects that a LU's average I/O command completion time is moving out too far (increasing over time towards the upper limit).
- the driver's receive section 218 can upcall the midlayer (call into the operating system), and from the SCSI layer 202 , lower the number of outstanding I/O commands to that LU by reducing the queue depth for that LU (by half, for example), until the LU recovers, as indicated by a reduction in the average I/O command completion time.
- the amount that the queue depth is lowered may be configurable by the system administrator. The effect of lowering the maximum number of incomplete I/O commands is that it increases the probability that the LU will actually respond and complete the I/O commands because it is not as overloaded.
- multipathing configurations benefit from timely completion of I/O commands rather than error handling as multipathing configurations typically have to maintain command retry state that pressures system resources.
- the queue depth can be lowered for all LUs in the target.
- This blanket approach serves to protect against the starvation of LUs and provide fairness to all LUs so that LUs with a high number of I/O command completions are throttled as well as those LUs that are starved. If, after lowering the queue depth for all LUs, the average I/O command completion time for a particular LU is still too high, the queue depth for all LUs in the target can be repeatedly lowered, as necessary, until a lower limit is reached.
- a lower limit, which may be configurable by the system administrator, is preferable to lowering the allowable number of outstanding I/O requests to zero, because it is desirable to have some amount of I/O commands queued up so it is possible to evaluate how well the LU is doing. If the condition causing the high average I/O command completion time is transient, the LU will recover quickly. If the condition is more continuous in nature, the LU will recover slowly, or may not recover at all.
- the driver can automatically perform step increases to the LU queue depth for all LUs in the target.
- the queue depth can eventually be raised until it is back to the initial depth that the driver was initialized with.
- the step increases may be configurable by the driver, and are useful to prevent overload conditions from being reintroduced if the condition causing the high average I/O command completion times is continuous in nature.
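- the throttling and recovery behavior described above can be sketched as follows. The class name, the simple comparison against an upper limit, and the default lower limit and step size are assumptions for illustration only; halving is the example reduction mentioned above.

```python
class QueueDepthController:
    """Hypothetical per-LU queue depth control: halve on overload, step up on recovery."""

    def __init__(self, initial_depth, lower_limit=2, step=4):
        self.initial_depth = initial_depth  # depth the driver was initialized with
        self.lower_limit = lower_limit      # never throttle below this depth
        self.step = step                    # size of each recovery increase
        self.depth = initial_depth

    def update(self, avg_completion_ms, upper_limit_ms):
        """Adjust the queue depth from the latest average completion time."""
        if avg_completion_ms >= upper_limit_ms:
            # Overloaded: cut the depth in half, but keep some commands queued.
            self.depth = max(self.lower_limit, self.depth // 2)
        else:
            # Recovering: raise the depth gradually, capped at the initial depth.
            self.depth = min(self.initial_depth, self.depth + self.step)
        return self.depth
```

- raising the depth in steps rather than restoring it at once reflects the concern that a continuous overload condition could otherwise be reintroduced immediately.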
- the target may simply be oversubscribed, and it may be necessary to expand the number of LUs in the target, or redirect some of the data out to a new target. Being oversubscribed is relative—cutting the queue depth in half even once may be an indication that the storage array is oversubscribed, or a system administrator may not consider the storage array to be oversubscribed until the queue depth has been dropped to the lower limit without improvement in the average I/O command completion time. Adding LUs or redirecting data to a new target would have to be performed manually by the system administrator.
- the average I/O command completion time is not the only statistic that may be used to determine what is occurring to the LUs within a target. For example, if there is a large disparity between the average I/O command completion times of LUs in the same target, for a similar I/O load, this is an indication of starvation (unfairness in the average I/O command completion times for LUs within a target). Starvation usually applies to a few LUs out of many, and occurs due to unfairness of the I/O scheduler in the operating system, above the driver. However, the driver is not in control of fairness in terms of I/O scheduling, and thus can only detect a lack of fairness, not restore it. Changing fairness is something that the system administrator must do manually.
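- one possible way to flag such a disparity is sketched below. Comparing each LU against the median of its target, and the ratio of 3.0, are assumptions chosen for illustration and not a stated criterion; consistent with the text above, the sketch only detects a lack of fairness and does not restore it.

```python
from statistics import median


def starved_lus(avg_ms_by_lu, ratio=3.0):
    """Return the LUs whose average completion time far exceeds the target's median.

    avg_ms_by_lu maps an LU identifier to its average completion time in ms.
    The ratio threshold is a hypothetical tuning value.
    """
    if not avg_ms_by_lu:
        return []
    typical = median(avg_ms_by_lu.values())
    if typical <= 0:
        return []
    return sorted(lu for lu, avg in avg_ms_by_lu.items() if avg > ratio * typical)
```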
- the counts in the individual buckets may also provide an indication of what is happening within a LU. For example, a bell curve centered at a particular average I/O command completion time may be expected, but if there is a spike at some unexpected completion time, this may indicate a specific problem requiring LU maintenance.
- the nature of the distribution of counts in the buckets for a LU may provide an indication of what is happening in the LU, and more generally, what is happening at the target level, which is what the FC transport protocol cares about. (The application cares about the LU level.) Again, any adjustments made as a result of the nature of the distribution of counts in the buckets for a LU must be made manually by the system administrator.
- the invention can be extended to multiple initiators and multiple targets. Statistics can be obtained for all initiators and all targets so that a system administrator can determine which targets are overloaded and which initiators are affected. In other words, it can be extended across the entire SAN. Existing tools do not and cannot have this extension capability because they are applicable only to direct attached storage.
- a system administrator may want to work from a single terminal on a single host and evaluate I/O command completion time data for all hosts in the SAN and all of the LUs, targets and ports in the SAN.
- Emulex Corporation's HBAnyware™ management suite, in its current configuration, keeps track of how HBAs are performing and how they are configured, enables HBAs to be configured remotely, and allows reports to be sent to remote locations on the network.
- HBAnyware™ can be extended in view of embodiments of the present invention to poll the average I/O command completion time and other information from the driver of each host within which HBAnyware™ is running and present it to the system administrator at a remote location in graphical or tabular form as described above so that a system administrator can see all of this LU loading information for the entire SAN and make adjustments accordingly.
- HBAnyware™ has a routine running in each driver that reports back, in-band, to the host within which the HBAnyware™ software is running.
- HBAnyware™ can communicate with all of the HBAs on each host, collect the data for each of the buckets for each LU, and send this data back to the host within which the HBAnyware™ software is running.
- the adjustments to the queue depths could also be done by a system administrator using HBAnyware™ and communicated back to each of the drivers.
- the latency information, in its histogram form, can also include information about the number of I/Os being completed by the target and LUNs. Monitoring this volume is also interesting from a diagnostic standpoint. For example, four load-balanced HBAs in the same server should show similar volume to a given target/LUN. If the volume is quite different, the load balancing software is either malfunctioning or not configured correctly.
- embodiments of the invention further allow for the collection and management of the completion times for all servers in a data center, or some logical subset (e.g., all servers associated with a particular application), to provide trending, comparisons, and the discernment of acceptable and unacceptable response times. By comparing across multiple servers and trending over time, good and bad latency can be determined.
- the ability to observe latency for all servers associated with a given application can be helpful when diagnosing a performance problem with that particular application. In general, any servers whose latency falls significantly outside the average latency of the other servers, especially after being newly added to the system, can be targeted as possibly malfunctioning.
- Embodiments of the invention also provide the ability to integrate protocol error data, capacity (queue) data and latency data. For example, protocol errors that do not affect latency or volume of I/O can be ignored. Protocol errors that do affect latency or volume can also be prioritized.
- the second embodiment of the present invention to be discussed in greater detail relates to the use of remote agents embedded in initiators and a network diagnostics manager application for collecting specific interesting negative initiator event data (diagnostics data) to enable a picture of the operational health of the SAN to be determined.
- agents are placed in servers acting as initiators in the SAN.
- the agents interact with relatively inexpensive HBAs, NICs or adapters (referred to herein as I/O controllers) to collect initiator event data rather than relying on expensive test box hardware.
- the collected event data doesn't necessarily relate to the initiator but may relate to other parts of the network such as the switches and targets (disk drives).
- HBAs are primarily intended to transfer data over a FC link
- embodiments of the present invention implement firmware modifications to utilize the HBAs for gathering certain event data.
- HBAs are ideal for gathering event data because each HBA has visibility to parts of the network that other entities cannot see.
- Each agent collects the event data from the HBA through the HBA driver stack and sends the collected data to a network diagnostics manager in a centralized management server or a plurality of distributed servers so that a picture of the SAN can be pieced together that any one individual server would not ordinarily be able to see.
- the agents can also collect, from local drivers in the HBAs of the servers, errors and performance data seen at the HBAs (e.g., throughput problems, etc.).
- This data is periodically pulled from the agents by a network diagnostics manager and stored in a database, where it can be accessed by a base application. With this collected information, an overall picture of the network performance can be pieced together, and the SAN can be diagnosed based on what the initiators have seen.
- the network diagnostics manager can integrate I/O data from the OS, the initiator stack, network, and storage devices to create information and add value. This integration of data can enable the system to determine which errors to ignore, and which ones to pay attention to.
- embodiments of the present invention provide a number of nondisruptive smart agents scattered about the network and capable of being activated when needed and selectively configurable to look for only a certain number of data items and store them in memory, and periodically send this information back to a central location so that a picture of what is happening in the SAN can be developed.
- although the information being collected at any one particular HBA by itself may not be enlightening, the collection of information being gathered by a number of HBAs can reveal the trouble spots in the network.
- collecting protocol-based negative event data in accordance with embodiments of the present invention is akin to obtaining information that a car with a particular license plate number ran a red light (a protocol violation) at the time of the accident.
- embodiments of the present invention may cause no additional performance degradation by the collection of only interesting negative event data, because performance is already degraded at this point in time.
- the counter information collected by existing SAN diagnostics tools is just a number, and only is associated with time from the perspective of when they were read. Because counters don't have time information, they must be monitored constantly, which can have a negative performance impact.
- the SAN diagnostics tool according to embodiments of the invention sends “stateful” information back, related to time.
- existing SAN diagnostics tools can only provide counter data indicating, for example, the number of bytes being received by an HBA, with no visibility at the driver level.
- embodiments of the present invention can make requests of the driver itself, such as observed performance indications (e.g., a latency timer starts when the request is made, and stops when a completion message is received). These performance indications can reveal previously undetectable performance issues at the driver level.
- embodiments of the present invention do not require hardware, just downloadable drivers, APIs and agents, and can collect the specific kind of data needed to develop a big picture of the network.
- the data collected according to embodiments of the invention although collected at the initiator, doesn't relate to the initiator but relates to other parts of the network such as the switches and/or targets.
- FIG. 4 a illustrates an exemplary SAN 400 in an enterprise data center according to embodiments of the invention.
- a number of production servers 402 each executing one or more applications (e.g. financial, human resources, or engineering applications), are connected to a fabric 406 .
- Each production server 402 contains one or more HBAs (I/O controllers) 408 , an initiator device driver stack 424 , and an agent 418 .
- the production servers 402 can communicate with one or more storage arrays 412 over the fabric 406 , which can include, but is not limited to, FC, Ethernet, and Infiniband.
- the SAN 400 also includes one or more management servers 414 executing network diagnostics manager software 416 for configuring the speed of the fabric (e.g., the speed of the links), zoning, etc.
- the network diagnostics manager 416 may be stored in computer-readable storage media and executed by one or more processors in the server.
- the network diagnostics manager 416 may receive link information (which is analogous to the “pulse” of the SAN 400 , indicating that there is activity occurring), and throughput information (which is analogous to the “blood pressure” of the SAN, where excessively high or low throughput can indicate problems).
- the network diagnostics manager 416 is responsible for configuring the storage arrays (e.g., mapping LUNs in the storage arrays to production servers, configuring the size of the logical LUNs, etc.).
- the network diagnostics manager 416 may monitor the capacity of the storage arrays 412 and receive localized latency measurements (e.g. how long it took to complete a particular command at the storage array).
- HBAnyware™, as mentioned above, can also be executed within the one or more management servers 414 .
- FIG. 5 illustrates an exemplary organizational structure 500 of an enterprise data center.
- Conventional SAN diagnostics software has drawbacks in that it can only provide high level information and does not have access to lower level information. So, for example, in FIG. 5 , an application performance issue appearing at a business unit group 502 (which provides the applications) may be passed down to the distributed systems engineering level 504 deploying Linux, Windows, Solaris servers and the like for resolution, but those involved at the distributed systems engineering level may not have ready access to the lower level information (only available at the SAN management group level 506 or storage management group level 508 ) to diagnose the problem, and therefore must rely on diagnostics information received from those groups. However, information received from the SAN management group 506 or storage management group 508 individually may not reveal the source of the problem. Thus, the agents mentioned above are utilized in embodiments of the present invention to automatically collect this lower level event data.
- FIG. 4 b illustrates an exemplary production server 402 in greater detail according to embodiments of the invention.
- Each production server 402 includes HBA 408 and/or Network Interface Card (NIC) 410 , generally referred to herein as I/O controllers, and an initiator driver stack 424 that may be stored in memory (computer-readable storage media) 426 and executable by processor 420 .
- each production server 402 includes an agent 418 that can be stored in computer-readable storage media 426 and executed by the server processor 420 .
- the agent 418 may be downloaded, activated, deactivated, configured or upgraded using the network diagnostics manager 416 executing in the one or more management servers 414 through out-of-band channels 422 (e.g., over the Ethernet) to collect a configurable subset of information such as selected initiator event data. End users may independently perform these tasks as well.
- the agents can be either installed separately or delivered within a software download.
- the driver stack is tailored by adding an agent API 436 to facilitate communications between the agent 418 and the initiator driver stack 424 .
- the agents 418 capture data at the initiators (production servers 402 ) that is already being generated, and provide the data to the network diagnostics manager 416 in a format that shows how the initiator, connected fabric devices, and targets are performing. Although collected at the initiators, all the data doesn't relate to the initiator itself.
- the agents 418 cooperate with the existing initiator driver stack 424 and collect the information from the driver stack.
- a portion of memory 426 in the server 402 stores the data collected by the agent 418 . However, if the local memory 426 exceeds a certain capacity threshold, the agent 418 can proactively communicate with the network diagnostics manager 416 through the collector 430 to request that it be serviced.
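- the capacity-threshold behavior can be sketched as below. The class name, the threshold expressed as a fraction of capacity, and the `request_service` callback standing in for the message sent toward the collector are all hypothetical.

```python
class AgentBuffer:
    """Bounded store for collected event data with a service-request threshold."""

    def __init__(self, capacity, threshold_fraction, request_service):
        self.capacity = capacity
        self.threshold = int(capacity * threshold_fraction)
        self.request_service = request_service  # hypothetical callback toward the collector
        self.events = []
        self.service_requested = False

    def store(self, event):
        """Store one event; return False if the buffer is full and the event is dropped."""
        if len(self.events) >= self.capacity:
            return False
        self.events.append(event)
        # Ask to be serviced once, when the fill level reaches the threshold.
        if len(self.events) >= self.threshold and not self.service_requested:
            self.service_requested = True
            self.request_service(len(self.events))
        return True

    def drain(self):
        """Hand all stored events to the collector and reset the buffer."""
        drained, self.events = self.events, []
        self.service_requested = False
        return drained
```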
- FIG. 4 c illustrates an exemplary management server 414 in greater detail according to embodiments of the invention.
- the network diagnostics manager 416 includes a base application 428 (the network diagnostics software), one or more collectors 430 , a database 432 , and a Service Location Protocol (SLP) interface 434 .
- the agents 418 communicate with the SLP interface 434 over the Ethernet 422 to identify what type of agent they are. In embodiments of the present invention, for example, all agents 418 having the capabilities described above may be identified as being of the same agent type.
- the collectors 430 communicate with the SLP interface 434 to identify all of the agents 418 of a certain type.
- the collectors 430 do not have to broadcast over the network to identify the agents 418 .
- the collector 430 pulls data from the sensors/agents 418 over the Ethernet 422 on a periodic basis, and stores the data in the database 432 .
- the collected data represents a system-wide view.
- the data can then be accessed by the base application 428 , which performs a recording engine function.
- the base application 428 may be accessible through a web server to enable a system administrator to retrieve specific data, generate reports, and diagnose network problems.
- the base application 428 , through the collector 430 , can communicate with the agent 418 through a messaging protocol to configure the agent to collect only certain types of event data and store it in the special memory 426 .
- the collector 430 polls the agent 418 , which then retrieves the data from the special memory 426 and sends it to the collector 430 .
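- a single polling pass of a collector can be sketched as follows. An in-memory SQLite database stands in for the database 432 , and each agent is modeled as a callable that returns its stored events; both are assumptions made for illustration, as are the table layout and the function name.

```python
import sqlite3


def open_database():
    """Create an in-memory stand-in for the diagnostics database (an assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (agent_id TEXT, ts REAL, event_type TEXT)")
    return db


def collect_once(agents, db):
    """Poll every agent once and store what it returns.

    agents maps an agent identifier to a callable returning a list of
    (timestamp, event_type) tuples. Returns the number of rows stored.
    """
    rows = []
    for agent_id, poll in agents.items():
        for ts, event_type in poll():
            rows.append((agent_id, ts, event_type))
    db.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    db.commit()
    return len(rows)
```

- a periodic pull would call `collect_once` on a timer; the base application can then query the accumulated rows for a system-wide view.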
- an initially inactive agent 418 can be downloaded into a production server 402 .
- the agents 418 can be included in driver kits (containing drivers, industry standard APIs, HBAnyware™, etc.), so that when the kit is installed, the agents are also installed and are ready to be awakened.
- the initiator driver stack 424 can be tailored so that if an agent 418 is ever awakened, the agent can be immediately accessible to the OS.
- An agent API 436 can be downloaded and placed into initiator driver stack 424 so that an end user does not have to update the driver stack to get the benefit of the agents when they are activated. In this way, nothing needs to be added to production servers 402 to enable them to interact with the agents 418 .
- These inactive agents 418 collect nothing until a command is received to activate them and enable them to collect only certain kinds of event data. This data can be saved into memory 426 , and periodically sent back to the network diagnostics manager 416 .
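- the activation behavior can be sketched as below; the class and method names are hypothetical, and the event type strings are used only as examples.

```python
class Agent:
    """Hypothetical agent that stays dormant until activated with an event filter."""

    def __init__(self):
        self.active = False
        self.wanted = set()
        self.memory = []

    def activate(self, event_types):
        """Wake the agent and restrict collection to the given event types."""
        self.active = True
        self.wanted = set(event_types)

    def deactivate(self):
        self.active = False

    def observe(self, event_type, detail=None):
        """Record an event only if the agent is active and the type was requested."""
        if self.active and event_type in self.wanted:
            self.memory.append((event_type, detail))
            return True
        return False

    def poll(self):
        """Return and clear the collected events, as when the data is sent back."""
        data, self.memory = self.memory, []
        return data
```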
- FIG. 6 illustrates an exemplary communication flow 600 between software and hardware elements according to embodiments of the present invention.
- changes are needed to the initiator driver stack.
- previously, data was just being acted on, but with embodiments of the present invention the data now needs to be converted to a form that can be collected by the agent, and the agent has to be tailored to know where to collect this information from in the stack.
- an application 602 running on a production server may send a file system request 604 .
- the file system 606 then converts this request to a SCSI request 608 , which is passed down to a low-level device driver 610 .
- the low-level device driver 610 creates a hardware-specific translation 612 for the HBA 614 it services.
- the HBA 614 then communicates with a storage array 616 through the fabric 618 .
- To implement the first embodiment of the present invention involving latency measurements, the low-level device driver 610 must be modified at 620 to time stamp both the outgoing I/O request and the incoming I/O completion. To implement the second embodiment of the present invention involving the collection of interesting negative event data, the low-level device driver 610 must be modified to include an agent API 622 to act as an interface with the agent and allow the agent to obtain the latency and negative initiator event data from the driver.
- the agent API 622 can be stored in computer-readable storage media and executed by one or more processors in the server.
- the agents 418 can work with any HBA 408 or I/O controller, provided that the proper commands are sent to the initiator driver stack 424 .
- the agents 418 have to ask the right questions (specific or general or OS questions) of a particular HBA 408 through the initiator driver stack 424 .
- the agent 418 can first send vendor-unique API commands to the driver stack 424 , and if those fail, either because the HBA 408 is from a different vendor or because the driver stack is outdated, the agent can revert to either open standard APIs or Operating System APIs to communicate with the driver stack and HBA.
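- the fallback order described above can be sketched as follows. The exception type, the function name, and the modeling of each API as a plain callable are assumptions; no real vendor, standard, or operating system API is represented.

```python
class ApiUnavailable(Exception):
    """Raised by a hypothetical API wrapper that cannot serve the request."""


def query_driver(request, apis):
    """Try each API in order and return (api_name, result) from the first that works.

    apis is an ordered list of (name, callable) pairs, for example vendor-unique
    first, then an open standard API, then an operating system API.
    """
    for name, call in apis:
        try:
            return name, call(request)
        except ApiUnavailable:
            continue  # wrong vendor or outdated driver stack: fall through
    raise ApiUnavailable("no API could serve request: %s" % request)
```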
- the network diagnostics system is able to collect both initiator event data and OS data.
- the SAN diagnostics system of the present invention is based on the fact that the SCSI protocol is an initiator-based protocol. As such, the initiator starts every “conversation” in the SAN, and if any entity between the initiator and the target cannot partake in the conversation or service the request, feedback must be provided back to the initiator. Because a SCSI initiator sees things and issues kinds of commands that SCSI targets don't, and receives feedback, SCSI initiators are in a privileged position.
- a switch can respond with a message indicating that the switch is too busy on a particular link to process a request from an initiator at this time. It can be very helpful to be notified of this type of bottleneck, so a message (protocol feedback) is sent back from the switch to the initiator. In another example, feedback can be received from the target itself, indicating that the disk or the target controller is too busy to process a request.
- the initiator is the only entity that collects feedback related to the overall performance of the network. For example, from an event standpoint and performance standpoint, a switch doesn't have a good perspective on end performance. The switch isn't collecting information from the target on how busy the target is, it is just collecting information on how busy the switch is. Thus, the initiator is the entity most suitable for placing an agent to collect information related to the overall performance of the network.
- This feedback can be extracted from within the initiator's own driver stack by the agent.
- an agent is not needed in the target's stack, because the central SAN diagnostics manager does not communicate with the target directly. Instead, an agent obtains information from the initiator's stack (which was received from the target).
- Initiator event data that may be collected by the sensors/agents can include, but is not limited to, (1) fabric_busy, which is sent back to an initiator by a fabric device to indicate that the fabric is overloaded, (2) queue_full, which is sent back to an initiator by a disk to indicate that a particular LUN can't process any new requests/commands because its queue is full, and (3) SCSI_busy, which is sent back to an initiator by a target to indicate that the target is too busy to process commands.
- Other initiator event data that may be collected includes a target_reset command, which usually originates from the SCSI stack in a server.
- Still other initiator event data that may be collected includes, but is not limited to, ELS PLOGI (Port Login), ELS LOGO (Logout), ELS PRLO (Process Logout), ELS ADISC (Address Discovery), ELS RSCN (Registered State Change Notification), Link control P_BSY (Port Busy), FCP Read Check Error, SCSI Logical Unit Reset, SCSI Check Condition, SCSI Bus Reset, Queue Depth Changed, Firmware Updated, and Boot Code Updated.
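- to illustrate how time-stamped event records from several initiators might be combined, the sketch below counts events per target, LUN and event type within a time window. The record fields and the function name are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InitiatorEvent:
    """One time-stamped negative event as seen by an initiator (hypothetical layout)."""
    timestamp: float   # seconds, from the initiator's clock
    initiator: str
    event_type: str    # e.g. "queue_full", "fabric_busy", "SCSI_busy"
    target: str
    lun: int


def trouble_spots(events, start, end):
    """Count events per (target, lun, event_type) inside the window [start, end)."""
    in_window = (e for e in events if start <= e.timestamp < end)
    return Counter((e.target, e.lun, e.event_type) for e in in_window)
```

- because each record carries its own time, events reported by different initiators can be lined up after the fact, which a bare counter does not allow.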
- this type of initiator event data is often much more relevant and interesting to the diagnosis of network trouble than
- This data being collected is initiator endpoint data, not overall performance data.
- the collected data is exception data from inside the stacks of the initiator, target, and switches.
- the collected data is selectively chosen to be interesting negative event data, a subset of the data available for collection. No additional performance degradation may be incurred by the collection of such data, because performance is already degraded at this point in time.
- Another capability of the agents 418 according to embodiments of the present invention is communicating with the OSs 438 of the servers to gather and report data in ways that are not currently done.
- the agent 418 running in user space gets this data from the OS 438 and sends it back to the network diagnostics manager 416 .
- the information that can be collected by the agent 418 from the OS 438 includes, but is not limited to, OS events, queue depth, throughput, the number of commands seen during a certain period of time, seconds since last HBA reset, transmitted frames, transmitted words, received frames, received words, LIP count, NOS errors, error frames, dumped frames, link failures, loss of sync, loss of signal, invalid transmit word count, invalid CRC count, disk read bytes per second, and disk write bytes per second.
- embodiments of the present invention can be extended to back-end initiators.
- interesting negative event data can be collected from target stacks within back-end initiators, at the back end of a network appliance filer (a filer front end with a SAN at the back end).
- initiators may be connected through a fabric (including fabric switches) to a storage subsystem having a FC HBA operating in a target mode. Agents could be placed in the storage subsystem to pull negative event data from the target driver stack at the front end of the storage subsystem.
- the network diagnostics manager could receive data from the target driver stack in an attempt to understand where the storage subsystem is having issues (e.g. cache memory, a fabric issue on back end, etc.).
- the event data that can be collected by embodiments of the present invention is not limited to the specific data mentioned above.
- Embodiments of the present invention include other metrics that would be of interest.
- the agent could monitor particular error responses in a particular sequence, and note the time of occurrence of each sequence. Collecting this data from a number of locations may produce a meaningful compilation of data. This is just one example of the types of information that can be collected according to embodiments of the present invention.
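- monitoring for a particular sequence of error responses can be sketched as below, assuming the agent holds a time-ordered list of (timestamp, event type) pairs and that a sequence means consecutive events; both the function name and that interpretation are assumptions.

```python
def find_sequences(events, pattern):
    """Return the start time of every occurrence of pattern in the event stream.

    events is a time-ordered list of (timestamp, event_type) tuples and
    pattern is a sequence of event types that must occur consecutively.
    """
    wanted = list(pattern)
    n = len(wanted)
    types = [event_type for _, event_type in events]
    return [
        events[i][0]
        for i in range(len(events) - n + 1)
        if types[i:i + n] == wanted
    ]
```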
- FCoE: Fibre Channel over Ethernet
- iSCSI: more generally, anything utilizing a SCSI stack
- NAS and NIC stacks: Ethernet stacks
- SAS: Serial Attached SCSI
- the third embodiment of the present invention to be described in further detail relates to the computation of an oversubscription value based on the demand for a device divided by the handling capacity of the device to help determine whether the device is oversubscribed and changes need to be made.
- When SCSI devices are deployed, they are programmed with a command queue depth. This queue depth dictates how many commands can be queued up to the particular device at a given time. If there is demand for more commands and the queue is full, the other commands cannot enter the queue and need to wait for slots to open up. In large data centers, there are thousands of servers accessing thousands of devices. If there is too much demand for a particular device, its queue fills up, and applications must wait to be serviced. If the queue is too overloaded it could, for example, take ten seconds to open an e-mail. As a result, understanding how the queue is utilized is important to understanding looming performance issues.
- In a RAID controller the issue is further compounded.
- devices such as disk drives
- This controller front end has some I/O processing capability that is far less than the I/O processing capability of all the devices it supports.
- As a result, it is important to understand the I/O demand on the controller as well as on each individual device.
- With virtual server technology, more queuing demand is placed on storage controllers by fewer initiators and servers. Further, the mapping of all the queue demand to the storage controllers is more difficult to discern and aggregate.
- Embodiments of the present invention collect, operate on, and display queuing information from thousands of host servers in a data center environment. This is not practical to do manually today, and no other product exists to perform this task. Because this information is not available today, IT administrators are forced to react to performance issues. Oftentimes they must slow down all the systems (lower the queue depth everywhere) in order to bring the system to some level of predictable performance. Without specific, system-wide queuing information it is difficult to maintain peak performance.
- FIG. 7 is an example of a SAN 700 and a network diagnostics manager 702 capable of computing an oversubscription value according to embodiments of the present invention.
- the production servers 704 contain queues 706 having a certain queue depth 708 for storing commands associated with a particular LUN 710 .
- Each production server 704 may be mapped to a LUN 710 in a storage array 712 .
- the storage array 712 accesses the fabric 714 through an array port 716 , which may have a certain handling capacity 718 that is a function of the number of commands that can be simultaneously handled and the memory required for those commands.
- each queue 706 in each production server 704 has a queue depth of 30, but the array port handling capacity 718 is 45.
- a computation of a configured oversubscription value as defined by the configured queue depths of the queues in the production servers associated with LUNs being serviced by a particular array port (e.g., 30+30+30 in this example) divided by the maximum array port handling capacity (supply) servicing the logical LUNs (e.g., 45 in this example) yields a configured oversubscription value of 2, which is within acceptable limits (some oversubscription is desirable because the production servers in reality may only have a couple of commands in their queues at any time).
- If the configured oversubscription value should become 20, for example, this can represent a significant oversubscription calling for a reallocation of resources.
- this embodiment of the present invention may compute an actual oversubscription value, which uses the actual queue depths of the production servers divided by the maximum array port handling capacity (supply) servicing the LUNs.
- oversubscription ratios that can be calculated according to embodiments of the present invention include, but are not limited to, target Port Oversubscription Value for maximum queue depths, target Port Oversubscription Value for actual queue depths, HBA Oversubscription Value for maximum queue depths, HBA Oversubscription Value for actual queue depths, Device Oversubscription Value for maximum queue depths, and Device Oversubscription Value for actual queue depths.
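- The configured and actual oversubscription values described above reduce to the same quotient: summed queue demand divided by port handling capacity. The following Python sketch works through the FIG. 7 numbers. The function name `oversubscription_value` is hypothetical, and the sampled queue usage used for the actual value is an assumed illustration rather than data from this description.

```python
def oversubscription_value(queue_depths, handling_capacity):
    """Return total queue demand divided by port handling capacity."""
    if handling_capacity <= 0:
        raise ValueError("handling_capacity must be positive")
    return sum(queue_depths) / handling_capacity


# Configured value: three production servers, each with a queue depth of 30,
# sharing an array port with a handling capacity of 45 (the FIG. 7 example).
configured = oversubscription_value([30, 30, 30], 45)

# Actual value: the sampled queue usage below is an assumed illustration,
# not data from the source.
actual = oversubscription_value([4, 2, 3], 45)
```

- With these inputs the configured value is 90/45 = 2, matching the example above, while the assumed actual usage yields a much smaller ratio, which is why some configured oversubscription can be acceptable.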
- Embodiments of the present invention can automatically collect the programmed block storage maximum queue depth for every device, sample the block storage actual queue usage for every device, map the queues of every device to storage arrays or controller front-ends, and map maximum and actual per device queue information on a per server, per initiator, per target port, per target, and per device basis.
- agents installed in the production servers and the storage arrays can extract this information and send it to a SAN diagnostics manager in one or more management servers over the Ethernet. This information is then organized and displayed (e.g., on a web page accessible over the Internet) in such a manner that the data center administrator can quickly determine if there are issues or opportunities related to queue management.
- the system will also provide for alerts when certain measured values approach or surpass configured thresholds (high water marks).
- the system also establishes a Target Port Oversubscription Value (TPOV) for both the maximum and utilized queue usage. This value is based on the quotient of each of these queue usage measures, summed up for all the devices behind a target port, divided by the I/O handling capability of the storage array.
- the I/O handling capability of the storage array is either gleaned empirically, or overridden by the end-user based on more expert knowledge of a specific configuration. In other embodiments, reports and trend charts can be produced.
- Embodiments of the present invention also extend to the back-end of network attached storage systems (a.k.a. filers). Because many filers are simply specialized server front-ends with SAN back-ends, embodiments of the present invention can be used to monitor the queuing on the backend and provide important performance data to the end user.
- the fourth embodiment of the present invention to be discussed in greater detail relates to collecting and logging certain types of event data in a database in a centralized management server, and computing a system severity value indicative of the level of impact (criticality or severity) of each event.
- the data is collected at the driver level by the agents, and stored in the special memory in the production servers.
- I/O scope low level events
- the base application can utilize this data to generate a severity calculation based on a predetermined severity level along with some collected data such as the number of servers affected by a target reset command.
- FIG. 8 illustrates an exemplary SAN 802 implementing a severity data collection system according to embodiments of the present invention.
- each computer 800 (which can be equated to server 402 in FIG. 4 a ) connected to the Storage Area Network (SAN) 802 runs a Storage Area Network Monitor Agent 804 (which can be the same as or additional to agent 418 in FIG. 4 a ).
- This agent 804 continuously monitors for storage network protocol events that include, but are not limited to, Discovery events, Task Management commands, Link events, SCSI errors, and Change in data access performance. This monitoring can be performed with minimal or zero effect on system performance. Each one of these events can adversely affect the availability and performance of the SAN 802 .
- Discovery events can indicate, for example, that an application's storage is no longer available.
- Task Management events can indicate that a storage controller is experiencing intermittent hardware problems, potentially leading to loss of storage access for several servers.
- Link events can also indicate loss of access, or degraded access to the SAN 802 .
- SCSI errors can indicate loss of access, or degraded access to the SCSI devices attached to the SAN 802 .
- When any Storage Area Network monitor agent 804 detects an event, it sends information about the event to the Storage Area Network Monitor 806 (which can be equated to the network diagnostics manager 416 in FIG.
- When the Storage Area Network Monitor 806 receives such event information from an agent 804 , the Storage Area Network Monitor logs this information to the Event Log Database 808 (which can be equated to database 432 in FIG. 4 c ). This allows root cause of any severe events to be investigated immediately, as opposed to awaiting another failure scenario after enabling logging.
- a System Severity field or value will be calculated as described above. System Severity will consider the level of impact for each event on other SAN elements. For example, a link event on one of four links to a server is not as critical as a link event to a server with a single link. Likewise, a downed target that has two servers connected to it likely has less impact than a target that has thirty servers connected to it.
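- The description does not give a formula for System Severity, so the following Python sketch is only one possible reading of the two examples above. The base severity table, the function name `system_severity`, and the scaling rules (multiply by servers affected, divide by the redundancy that remains) are all assumptions made for illustration.

```python
# Assumed base severity levels per event type (hypothetical values).
BASE_SEVERITY = {"link_down": 3, "target_reset": 4, "scsi_error": 2}


def system_severity(event_type, servers_affected=1, remaining_links=None):
    """Scale a predetermined base severity by the event's impact on the SAN."""
    score = BASE_SEVERITY[event_type] * max(servers_affected, 1)
    if remaining_links is not None:
        # Losing the last link is total loss of access; surviving links
        # reduce the impact in proportion to the redundancy that remains.
        score = score / (remaining_links + 1)
    return score
```

- Under these assumptions, a link event that leaves three of four links working scores lower than the loss of a server's only link, and a target reset affecting thirty servers scores higher than one affecting two.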
- the Event Log Database 808 contains event information from the entire SAN 802 as a SAN-wide protocol analyzer for root cause analysis.
- the Event Analyzer 810 operates in two modes, Manual Mode and Knowledge Base Mode. In Manual Mode, the Event Analyzer 810 presents several filtering and sorting options to the end-user. Multiple SAN events can be filtered to investigate events associated with particular initiators, particular storage arrays, particular times of day, etc. Sorting can also be utilized to perform time-based sorts, storage port sorts, etc. This manual mode will allow capture of problematic SAN events, along with the filtering and sorting tools to quickly identify the root cause of the event.
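- The Manual Mode filtering and sorting described above can be sketched in Python as follows. The record layout `SanEvent` and the helper names `filter_events` and `sort_by_time` are hypothetical; the description does not specify how the Event Log Database 808 stores its entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SanEvent:
    timestamp: float   # seconds since some reference time
    initiator: str
    storage_port: str
    kind: str


def filter_events(events, initiator=None, storage_port=None,
                  start=None, end=None):
    """Return events matching every criterion that is not None."""
    selected = []
    for ev in events:
        if initiator is not None and ev.initiator != initiator:
            continue
        if storage_port is not None and ev.storage_port != storage_port:
            continue
        if start is not None and ev.timestamp < start:
            continue
        if end is not None and ev.timestamp > end:
            continue
        selected.append(ev)
    return selected


def sort_by_time(events):
    """Time-based sort, oldest event first."""
    return sorted(events, key=lambda ev: ev.timestamp)
```

- Combining a filter on one storage port with a time-based sort is one way an end-user might line up events from several initiators to see which occurred first.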
- the I/O data can also be used in a predictive manner to allow for the prevention of problems before they occur.
- the collected information can include the various ways (network paths) that an I/O request can access a target. By observing the latency data and the negative events, it may be possible to determine when one or more of these I/O paths are lost. An alert can be generated when the number of I/O paths is reduced to some threshold (e.g., one path). Such an alert would allow an administrator to restore the lost paths.
- Currently such an alert is only available for physical network paths and not I/O paths, and I/O paths can be lost even though physical network paths are not. Such is the case when the I/O path is lost within the target device, even though the physical network path to the target device is operational. The only alert is when all paths are lost.
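- A path-count alert of the kind described above might be sketched as follows. The function name `paths_needing_alert` and the input layout (a mapping from initiator/target pairs to the set of I/O paths still believed usable) are assumptions; how the usable set is inferred from latency data and negative events is outside this sketch.

```python
def paths_needing_alert(active_paths, threshold=1):
    """Return initiator/target pairs whose usable I/O path count is at or
    below the threshold.

    active_paths maps (initiator, target) to the set of I/O paths still
    considered usable, as inferred from latency data and negative events.
    """
    return sorted(pair for pair, paths in active_paths.items()
                  if len(paths) <= threshold)
```

- With the default threshold of one path, a pair is reported while a single path still works, which gives an administrator the chance to restore redundancy before access is lost entirely.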
- an alert can be generated if a Queue Full event came back a certain number of times (e.g., 25 times) in a minute, or if the latency for a particular initiator/target port pair exceeds an average of a certain amount of time (e.g., 100 ms) over a given sample period (e.g., one hour).
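- The two example alert conditions above can be sketched in Python as follows. The function names and parameter names are hypothetical; the default limits (25 Queue Full events in a minute, 100 ms average latency) are the example values given above, and the caller is assumed to pass only the latency samples that fall within the chosen sample period.

```python
def queue_full_alert(event_times, now, limit=25, window_s=60.0):
    """True if at least `limit` Queue Full events arrived in the last window."""
    recent = [t for t in event_times if now - window_s < t <= now]
    return len(recent) >= limit


def latency_alert(samples_ms, limit_ms=100.0):
    """True if the mean latency over the sample period exceeds limit_ms."""
    if not samples_ms:
        return False
    return sum(samples_ms) / len(samples_ms) > limit_ms
```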
Abstract
Description
- This is a Continuation In Part (CIP) of U.S. application Ser. No. 11/360,557, filed on Feb. 22, 2006, the contents of which are incorporated by reference herein in their entirety for all purposes.
- Embodiments of the present invention relate generally to the collection of event and performance data at one or more servers in communications infrastructures such as a Storage Area Network (SAN) and forwarding that data to a network diagnostics manager where the data can be analyzed and adjustments made to the SAN to improve its performance, referred to herein as input/output (I/O) management.
- In modern data centers, business data is often stored within networked storage systems. These storage systems, typically disk arrays, are usually connected to a fast, reliable and low latency SAN. Servers needing access to this data may also be connected to the SAN using a Host Bus Adapter (HBA), Network Interface Card (NIC), or other similar adapter or interface device (generally referred to herein as an Input/Output (I/O) controller). The disk arrays in the SAN can be presented as Small Computer System Interface (SCSI) disks to the Operating System (OS). The SCSI disks are, in turn, either presented up to an application running in the server as a File System or a raw disk device. The OS and applications running on the server may access the SAN storage array as a disk connected to the server.
- In today's increasingly data-driven and competitive business environment, fast, efficient, error-free storage and retrieval of data is often critical to business success. The use of SANs has become widespread as the ability to store and retrieve massive amounts of data from a large number of storage devices over a large geographic area is now becoming a business necessity. Not surprisingly, the ability to quickly identify and fix problems and bottlenecks in storing and retrieving data across a SAN is a goal of any such storage system.
- However, SAN errors and bottlenecks are often difficult to diagnose, and can be caused by subtle interactions with seemingly unrelated devices. For example, in the vast majority of installations today, the SCSI protocol is layered on top of the network protocol. As a result, the OS issues SCSI commands to the storage array to access data and control the storage arrays. Two types of commands are issued to the storage array. Data commands (e.g., read, write, report Logical Unit Number (LUN or, more simply, LU), and the like) are issued by the OS to access data stored in the storage array. Task management commands (e.g., target reset, LUN reset, etc.) are issued to control the command queues of the storage system. The task management commands issued by one server to a storage array can affect data access commands from another server to the same storage array. This also means the action of one server connected to a storage array can cause an error on another server connected to same storage array.
- Like all networks, discovery- and link-related events can occur in the SAN that lead to availability and/or performance problems. Due to the complexity of the SAN protocols involved and the size of the SANs in today's world-class data centers, it is essential to have tools to help quickly identify the root cause of any network or storage connectivity issues. However, the solutions that exist today, under the umbrella of SAN management software, do not provide the required information to quickly, if ever, determine root causes.
- When a network problem is detected, some existing solutions can allow a server configured as an initiator to enter a debug or diagnostics mode. In such a diagnostics mode, agents can be employed to collect massive amounts of counter information and protocol event data (SCSI events, Fibre Channel (FC) events, discovery events, and the like) related to the fabric and target at each HBA and store the collected data in a system log file. However, the counters only provide information about the performance of a particular HBA or switch port (e.g., the amount of data passing through an HBA, the number of data packets sent, received, etc.), but do not provide a “big picture” of what is happening in the overall network. Counters are good at showing trends but are not effective, and sometimes misleading, when attempting to determine root causes of SAN availability or performance issues. The high-level event data is also generally limited to information about events seen at a particular HBA or switch port (e.g., a notification that a network component was inserted or removed, etc.), but as with the counter information, it does not provide a “big picture” of what is happening in the overall network. High-level events are also problematic in helping determine root causes, because they can often be intentionally induced by the end-user, or simply a symptom of a problem created by a root cause existing elsewhere.
- Furthermore, this type of intensive data collection represents an overhead burden that affects the performance of the system because the system is still operating while the massive amount of event data is being collected. In addition, the mere act of operating in a diagnostics mode can mask the problem. Moreover, the mere collection of data does not provide any insight into the problem. The system log file must be reviewed, and the data collected at the time the performance issues were occurring must be interpreted in an attempt to diagnose the problem.
- Today's SAN Management tools rely on counter and event information because that is all that is available to them. Protocol information (e.g. network protocol and SCSI protocol information) is much more valuable for uncovering root cause, but this information is typically locked up in the network devices and never exposed.
- Some existing network diagnostics tools do not require special hardware placed at various locations throughout the network. Such tools communicate with the fabric switches (each of which has a Simple Network Management Protocol (SNMP) agent running inside it) using the SNMP protocol, and gather high level counter data (e.g., how many bytes have been transmitted in the last hour, the number of read commands in the last hour, etc.). However, this data is generally uninteresting, because the fabric is usually able to move all Input/Output (I/O) commands being demanded of it. Furthermore, when events happen at the fabric or process level, an endpoint (initiator or target) no longer sends any commands. The lack of activity detected by the counters indicates there may be a problem, but the type of problem is unknown.
- Other existing SAN diagnostics tools require special hardware (e.g. deep analyzers) to be placed at various locations around the network to collect data and generate reports. Often, because this hardware is expensive, a single (or a few) hardware analyzer(s) must be moved around from HBA to HBA to gather needed data. However, the data collected by such hardware solutions also cannot develop a big picture of the network.
- Some network switches have an option where a port can be directed to send information to analyzer hardware within the switch. Additional hardware external to the switch then encapsulates the information into Ethernet frames that can be read with dedicated software. This type of hardware solution represents another hardware add-on that provides for the viewing of lower level protocol items. It does this by extracting portions of the packets that the switch may not normally extract for the purpose of collecting the information, and does so only on a single port at a time. After the initiator stack obtains this information from the target and fabric, the information can be interpreted. However, in response to this information, the initiator can only control its own operation (e.g., not send as much data, try another route, etc.). Moreover, the initiator does not keep a “scorecard” of this information for diagnosing network performance issues.
- In addition to the SAN diagnostics tools mentioned above, current HBA management tools can also provide some diagnostics capabilities. For example, Emulex Corporation's HBAnyware™ management suite, in its current configuration, keeps track of how HBAs are performing, how they are configured, enables HBAs to be configured remotely, and allows reports to be sent to remote locations on the network. HBAnyware™ is disclosed in U.S. application Ser. No. 10/277,922, filed on Oct. 21, 2002, the contents of which are incorporated herein by reference. The functionality of HBAnyware™ resides in HBA device drivers, but remote user space agents in the HBAs are also needed to perform the management functions.
- HBAnyware™ collects configuration information about the HBAs using agents in the remote servers (HBAs) and causes the HBAs to be configured for different sizes and behaviors. HBAnyware™ communicates with the remote servers both in-band and out-of-band. With HBAnyware™, the HBA drivers in the remote servers communicate with each other to allow centralized management of the SAN and configuration of HBA hardware at a central point. For example, if HBAnyware™-compatible hardware is located somewhere in the SAN, it can be discovered by the HBAnyware™ software. Messages can be sent to and received from the HBAnyware™-compatible hardware that cause the firmware in the hardware to be updated, enable the configuration of the LUNs in the network, etc. All of this can be done from a central location rather than requiring each server to separately configure its own HBA.
- HBAnyware™ can also collect some types of diagnostics information. With HBAnyware™, the agents collect data from the stack, but only data local to the HBA (e.g. link up, link down) is collected. Counter data is collected from the HBAs, but it is generally uninteresting, and no lower level protocol events, no latency data, and no capacity information is collected. Moreover, HBAnyware™ does not integrate the collected information into a system view.
- Therefore, there is a need to collect specific interesting negative event data, along with command latency and system capacity data, to enable a picture of the operational health of the SAN to be determined and quickly identify the root cause of SAN problems.
- Even in the absence of catastrophic SAN errors, SAN performance can be critical to business success. Therefore, reducing the time it takes to store and retrieve data across a SAN is always a goal of any such storage system.
- FIG. 1 illustrates an exemplary conventional SAN 100 including a host computer 102, a fabric 104, a target 106 and one or more Logical Units (LUs) 108, which are actually logical drives partitioned from one or more physical disk drives controlled by the target's array controller. The host computer 102 includes an initiator 110 such as a Host Bus Adapter (HBA) or I/O controller for communicating over the SAN 100. A representative application 112 is shown running on the host computer 102. The fabric 104 may implement the Fibre Channel (FC) transport protocol for enabling communications between one or more initiators 110 and one or more targets 106. The target 106 acts as a front end for the LUs 108, and may be a target array (a single controller with one or more ports for managing, controlling access to and formatting of LUs), Just a Bunch Of Disks (a JBOD) (a collection of physical disks configured in a loop, where each disk is a single target and a LU), a Switched Bunch Of Disks (SBOD®), or the like. An example of a conventional target array is an EMC Symmetrix® storage system or an IBM Shark storage system.
- In the example of FIG. 1, the application 112 may employ a file system protocol and may initiate read or write I/O commands 114 that are sent out of the host 102 through the initiator 110 and over the fabric 104 to target 106, where data may be read from or written to one or more of the LUs 108. When an I/O command 114 is transmitted, there is an expectation that the I/O command will be completed, and that it will be completed within a certain period of time. If the read or write operation is completed successfully, an I/O command completion notification 116 will be delivered back to the application 112. At other times, however, if a target 106 or LU 108 is overloaded or malfunctioning, the I/O command may not complete, and no I/O command completion notification 116 will be sent back to the application 112. In such a situation, the only feedback received by the application 112 may be an indication that the I/O command timed-out, and a reason code providing a reason for the timeout.
- To assist a SAN system administrator in identifying problem targets 106 or LUs 108 and maintaining an efficient SAN with a balanced and fair LU workload, it is desirable to know the average I/O command completion time for I/O commands sent to each LU 108 in a target 106. In particular, it would be desirable for a system administrator to receive continuously updated LU-specific average I/O command completion time information for each LU in each target the initiator discovered in a dynamic manner. Such information would enable the system administrator to identify where latencies are being injected into the SAN or identify latencies that are worsening, and make adjustments accordingly. For example, if the average I/O command completion times for two different LUs 108 in the same target 106 are drastically different, for a similar I/O pattern and RAID level (e.g. greater than 25% difference), this may be an indication that the LUs are unbalanced and that there is some unfairness at the target, and that perhaps the LU loads need to be re-balanced to achieve a greater degree of fairness. On the other hand, if the average I/O command completion times for all LUs 108 at a target 106 are rising, over time, and becoming too high, this may be an indication that the target is receiving too many I/O requests and that more storage needs to be added so that some data can be shifted to the new target. In other words, it is desirable for the application to detect unfairness among LUs and/or overloaded conditions at a particular target.
- However, conventional fabric-attached storage solutions do not provide average I/O command completion time information for an initiator 110 and target 106 in a SAN 100, or for multiple initiators and targets in a SAN. Conventional systems either do nothing, or wait for an initial I/O command failure to occur before taking corrective action such as limiting the outstanding I/O count. The problem with this approach is that by the time the storage device provides an indication that a problem exists, it may be too late to influence the storage device or it may become very expensive to react from an application point of view.
- It should be noted that for directly attached and controlled storage such as conventional parallel Small Computer System Interface (SCSI) systems where the storage is directly connected to the host without an intervening target array, tools do exist for calculating the I/O command completion time for a particular I/O command and an average I/O command completion time, such as iostat-v, sysstat version 5.0.5, ©Sebastien Godard, the contents of which are incorporated by reference herein. In such systems, a statistics counter in the SCSI layer keeps track of I/O command completion times, and monitoring tools within the operating system display this parameter. However, the average I/O command completion time is merely an information-only health indicator, because directly-attached storage systems by their very nature cannot make use of this information to adjust storage allocations and improve the response times of I/O commands.
- Therefore, there is also a need to compute average I/O command completion times on a per-LU, per-target basis within a fabric-attached storage system to enable a driver within a host, or a system administrator, to make adjustments to improve the efficiency of the SAN.
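- A per-LU, per-target average of I/O command completion times, together with the imbalance check discussed earlier in this section, might be sketched in Python as follows. The class name `CompletionTimeStats` is hypothetical, and measuring the 25% difference relative to the lowest LU average is an assumption; the description does not say which baseline the percentage refers to.

```python
class CompletionTimeStats:
    """Running average I/O completion time per (target, LU) pair."""

    def __init__(self):
        self._totals = {}  # (target, lu) -> [sum of times in ms, count]

    def record(self, target, lu, completion_time_ms):
        entry = self._totals.setdefault((target, lu), [0.0, 0])
        entry[0] += completion_time_ms
        entry[1] += 1

    def average(self, target, lu):
        total, count = self._totals[(target, lu)]
        return total / count

    def unbalanced(self, target, tolerance=0.25):
        """True if LU averages on a target differ by more than tolerance,
        measured relative to the lowest average (an assumed convention)."""
        averages = [total / count
                    for (tgt, _lu), (total, count) in self._totals.items()
                    if tgt == target]
        if len(averages) < 2:
            return False
        lowest, highest = min(averages), max(averages)
        if lowest == 0:
            return highest > 0
        return (highest - lowest) / lowest > tolerance
```

- A driver or administrator could call `record` on each completion and poll `unbalanced` per target to decide whether LU loads may need re-balancing.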
- One of the causes of increased latency in the execution of I/O commands in SANs is the oversubscription of resources. The responsiveness of devices such as a disk array is a function of the queue depths of queues in their associated production servers and the handling capacity of their storage array ports. Therefore, reducing problems associated with the oversubscription of resources across a SAN is always a goal of any storage system.
- In today's datacenters, queue depth is one of the “knobs” available to the storage administrator to balance the system. When managing queue depths, a SAN can be thought of in terms of many other queuing problems. The SAN has a fixed I/O handling capacity, and that capacity needs to be shared by all the applications that are demanding I/O.
- Today's SAN Management solutions focus on the capacity issue being in the fabric itself, or the disk capacity at the array. For example, Storage Resource Management (SRM) captures and reports, separately, SAN Management data (link utilization, for example) for switches and Storage Management data (primarily storage capacity) for arrays. However, the fabric is rarely the I/O capacity bottleneck. More often, the bottleneck is either at the server or at the storage controller. At the server, I/O handling capacity depends on a number of factors, including memory availability, kernel architecture, and Central Processing Unit (CPU) power. At the storage controller, I/O handling is also dictated by a number of factors, including the system architecture, the controller front-end, the amount and speed of cache, the controller back-end, and the actual disks themselves. When there are performance issues that need to be managed with queue depths, administrators are forced to use a completely manual process today.
- Managing performance issues requires an understanding of the current mapping of initiators to target ports and backend devices. In addition, understanding the queue depth demand of every initiator, the I/O handling capability of the storage controllers, and an understanding of the actual queue demand placed on the system by every initiator is highly desirable. All of this information must be put together to help understand where the performance issue is, and what areas can be leveraged to mitigate or eliminate the performance issue. Putting together this information is becoming more difficult in today's data centers. With virtual server technology, more queuing demand is placed on storage controllers by fewer initiators and servers. Further, the mapping of all the queue demand to the storage controllers is more difficult to discern and aggregate.
- Therefore, there is also a need to quickly and easily obtain capacity information for resources in the SAN to determine when oversubscription is becoming a problem and to initiate fixes to alleviate the oversubscription.
- Embodiments of the present invention relate generally to the collection of event and performance data at one or more servers in communications infrastructures such as a SAN and forwarding that data to a network diagnostics manager where the data can be analyzed and adjustments made to the SAN to improve its performance, referred to herein as I/O management.
- One embodiment of the present invention relates to the use of remote agents and a central server application for collecting specific interesting negative event data (diagnostics data) to enable a picture of the operational health of the SAN to be determined. To better identify problem areas in the network, agents are placed in servers having HBAs, NICs, or other adapters (I/O controllers) acting as initiators. The agents interact with relatively inexpensive HBAs in the servers through a driver stack to collect event data. Because of the initiator function they perform, HBAs have visibility to parts of the network that other entities do not have access to, and thus are ideal locations for gathering event data. A SAN diagnostics manager located in one or more management servers then pulls the collected data from each agent so that a SAN diagnostics manager can piece together a “picture” of the SAN that an individual server would not ordinarily be able to see. In addition to collecting initiator data, the agents can also collect errors and performance data (e.g., throughput problems, etc.) seen at the HBAs from the OS of the servers.
- Unlike conventional SAN diagnostics tools, the agents according to embodiments of the present invention are (1) nondisruptive, (2) capable of being activated when needed, (3) selectively configurable to look for only a certain number of data items and store them in memory, and (4) configured for periodically sending this information back to a central location so that a picture of what is happening in the SAN can be developed. Although the information collected at any one particular HBA may not by itself be enlightening as to overall network performance, the collection of information being gathered by a number of HBAs can reveal the trouble spots in the network.
- Furthermore, unlike conventional software-based SAN diagnostics tools that are only able to collect counter information and/or high-level event data, embodiments of the present invention are able to collect protocol-based negative event data such as error messages and observational performance information received by initiators from the targets.
- In addition, unlike conventional SAN diagnostics tools that create performance degradation due to their collection of massive amounts of data, embodiments of the present invention can cause little or no additional performance degradation by being configurable to collect only a relatively small amount of interesting negative event data.
- Moreover, unlike conventional SAN diagnostics tools that collect only numeric counter data unassociated with time except as to when the counters were read, embodiments of the present invention collect “stateful” event information having a temporal component. In addition, because counters do not provide time information, they must be monitored constantly, which can have a negative performance impact.
- Conventional SAN diagnostics tools can only provide counter data indicating, for example, the number of bytes being received by an HBA, with no visibility at the driver level. However, embodiments of the present invention can make requests of the driver itself, such as observed performance indications (e.g., a latency timer that starts when the request is made and stops when a completion message is received). These performance indications can reveal previously undetectable performance issues at the driver level.
- Also, unlike conventional hardware-based SAN diagnostics tools that must be moved around from HBA to HBA and cannot develop a big picture of the network, embodiments of the present invention do not require hardware, just downloadable drivers, APIs and agents, and can collect the specific kind of data needed to develop a big picture of the network.
- Furthermore, unlike HBAnyware, which only collects configuration information about the HBAs, the data collected according to embodiments of the invention, although collected by the initiator driver stack, doesn't relate to the initiator but relates to other parts of the network such as the switches or the targets.
- A further embodiment of the present invention relates to the computation of an oversubscription value based on the demand for a device divided by the handling capacity of the device to help determine whether the device is oversubscribed and changes need to be made. A still further embodiment of the present invention relates to collecting and logging certain types of event data in a database in a centralized management server, and computing a system severity value indicative of the level of impact (criticality or severity) of each event.
-
FIG. 1 illustrates an exemplary conventional SAN including an initiator for sending an I/O command, a fabric, and a target for the I/O command including one or more LUs. -
FIG. 2 illustrates an exemplary kernel within a host computer for determining I/O command completion times according to embodiments of the present invention. -
FIG. 3 illustrates an exemplary flowchart describing the determination of I/O command completion times according to embodiments of the present invention. -
FIG. 4 a illustrates an exemplary SAN in an enterprise data center according to embodiments of the invention. -
FIG. 4 b illustrates an exemplary production server in greater detail according to embodiments of the invention. -
FIG. 4 c illustrates an exemplary management server in greater detail according to embodiments of the invention. -
FIG. 5 illustrates an exemplary organizational structure of an enterprise data center. -
FIG. 6 illustrates an exemplary communication flow between software and hardware elements according to embodiments of the present invention. -
FIG. 7 is an example of a SAN and a SAN diagnostics manager capable of computing an oversubscription value according to embodiments of the present invention. -
FIG. 8 illustrates an exemplary SAN implementing a severity data collection system according to embodiments of the present invention. - In the following description of preferred embodiments, reference is made to the accompanying drawings which form a part hereof, and in which it is shown by way of illustration specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the preferred embodiments of the present invention.
- Embodiments of the present invention relate generally to the collection of data at one or more servers in a SAN and forwarding that data to a network diagnostics manager where the data can be analyzed and adjustments made to the SAN to improve its performance.
- One particular embodiment of the present invention relates to the processing of I/O commands across the SAN, and more particularly, to the determination of I/O command completion times and average I/O command completion times (latency) per logical driver in a SAN to enable optimization of storage allocations and improve I/O command completion times. Another particular embodiment of this invention relates to the use of remote agents embedded in initiators and a network diagnostics manager application for collecting specific interesting negative event data (diagnostics data) to enable a picture of the operational health of the SAN to be determined. A further particular embodiment of the present invention relates to the computation of an oversubscription value based on the demand for a device divided by the handling capacity of the device to help determine whether the device is oversubscribed and changes need to be made. A still further particular embodiment of the present invention relates to collecting and logging certain types of event data in a database in a centralized management server, and computing a system severity value indicative of the level of impact (criticality or severity) of each event.
- Each of these embodiments will be described in greater detail below. Note that although embodiments of the invention may be described herein in the context of SANs, or more generally in the context of networks, it should be understood that embodiments of the invention are also applicable to other types of communications infrastructures, such as Network Attached Storage (NAS), High-Performance Computing, standard network traffic, and the like.
- The first embodiment of the present invention to be described in greater detail relates to the determination of I/O command completion times and average I/O command completion times (latency) per logical driver in a SAN to enable optimization of storage allocations and improve I/O command completion times. It should further be understood that although embodiments of the present invention are described herein in terms of SCSI upper layer transport protocols and FC lower layer transport protocols for purposes of illustration only, embodiments of the present invention are applicable to other upper and lower layer transport protocols. Note also that embodiments of the present invention are not limited to fabric-attached storage, but apply to any SAN topology discoverable by the present invention, be it hub-based, arbitrated-loop based, or fabric based.
-
FIG. 2 illustrates an exemplary kernel 200 within a host computer for computing I/O command completion times and average I/O command completion times according to embodiments of the present invention. The kernel 200 is the essential center of the host operating system, the core that provides basic services for all other parts of the operating system. The kernel 200 may include an upper transport protocol layer such as SCSI layer 202 and a lower transport protocol driver layer 204. The driver 204 may include a transmit section 206, a receive section 218, and global data space 228. The driver's global data space 228 may store driver configuration data, buckets 224 for each LU, and a queue 234 for each LU, described in further detail below. - Every time an I/O controller port is discovered, the host operating system calls the
driver 204, which allocates a block of storage or data structure within its global data space 228 representing that port instance, and assigns a target pointer to that block of storage. Because an I/O controller may contain more than one port, and the driver maps each I/O port to a target, the driver monitors multi-ported I/O controllers the same as single-port I/O controllers. This approach maintains target/port independence. In other words, the driver does not try to figure out whether two or more targets belong to a single I/O controller. FC discovery provides for target discovery only, and targets are not subordinate to each other. Therefore a multi-port array looks like multiple targets to the driver's discovery engine, just like a JBOD with four disks is discovered as four targets. Embodiments of the present invention track this raw data just the same, allowing upper layer applications to “link” target/LU disturbances together with additional knowledge of the SAN topology. - To compute average completion time on a per-LU, per-target and per-port basis, the
driver 204 must store statistics about the completion times for a number of I/O command completions on a per-LU, per-target, per-port basis. Therefore, in embodiments of the present invention, the driver may allocate “buckets” (memory locations) within its global data space 228 for storing a count of the number of I/O commands that completed within a particular range of time. For example, one bucket may keep track of the number of I/O commands that took between 0.0 and 10.0 ms to complete, another bucket may keep track of the number of I/O commands that took between 10.0 and 20.0 ms to complete, another bucket may keep track of the number of I/O commands that took between 20.0 and 30.0 ms to complete, and so on. Bucket sizes may be fixed by the driver 204, or may be specified by the system administrator when the driver is loaded. Each bucket 224 corresponds to a particular LU, target and port. In the example of FIG. 2, N buckets are allocated for each of M LUs, and this two-dimensional array may be repeated for P targets and Q ports. Thus, a three-dimensional per-LU, per-target, per-port histogram array is stored in the driver's global data space 228. - The
buckets 224 are accumulators, so they are not normally reset. Eventually, they may wrap back to zero, so embodiments of the present invention may keep track of when the count in each bucket wraps around. For example, if an N-bit bucket wraps after a total count of 2^N, and it has wrapped twice, the driver must recognize that the true count is 2×2^N plus whatever count is in the bucket at the time the computation is performed. One way that wrapping could be estimated is to keep track of the previous count and compare it to the new count. If the new count is lower, then it is assumed that the bucket count wrapped once. Alternatively, the driver could reset the bucket counts when an overflow condition is encountered, or the driver could issue a signal or trap to the application indicating an overflow, and the application could initiate a reset. - In the example of
FIG. 2, I/O commands 232 from an application being executed by the host are received by the upper SCSI layer 202 (see block 300 of FIG. 3) and passed down to the transmit section 206 of driver 204 as SCSI commands 230. The transmit section 206 time stamps a start time of the SCSI command 230 (see reference character 208 in FIG. 2 and block 302 in FIG. 3) and embeds the time stamp into a transport protocol data structure such as an FC command 210 that encapsulates the SCSI command. The FC command 210 is then sent out over wiring 212 to the FC fabric 214 (see block 304 in FIG. 3). Note that “wiring,” as referred to herein, is intended to encompass any transmission media, including copper, fiber, and other media. However, it should be noted that the timestamp does not go out on the wiring 212. The FC command 210 that encapsulates the SCSI command and timestamp includes some data fields (including the timestamp field) that do not leave the kernel memory allocated and managed by the driver. When an I/O command completion 216 representing the completion of the original SCSI command is received from the fabric 214 in the receive section 218 of the driver 204 (see block 306 in FIG. 3), the receive section 218 fetches and records the completion time 220 at which the I/O command completion 216 was received (see block 308 in FIG. 3). The receive section 218 then computes an elapsed I/O command completion time (net round trip completion time from transmit to receive) 222 based on the difference between the timestamped SCSI command start time 208 extracted from within the I/O command completion 216 and the recorded completion time 220 (see block 308 in FIG. 3). Every time an I/O command completion time 222 is computed for a particular LU and port, the count in the appropriate bucket 224 (based on the I/O completion time) is incremented (see block 310 in FIG. 3). The buckets therefore maintain a count of the number of I/O commands completed and a distribution of all completion times. 
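The timestamp-and-bucket flow described above can be sketched in a few lines of Python. This is an illustrative model only, not the driver code of the embodiments: the class name `LatencyHistogram`, the constants `BUCKET_WIDTH_MS` and `NUM_BUCKETS`, and the choice to fold all slow completions into the last bucket are assumptions made for the example.

```python
BUCKET_WIDTH_MS = 10.0   # assumed fixed bucket size (0-10 ms, 10-20 ms, ...)
NUM_BUCKETS = 8          # assumed; the last bucket collects all slower completions


class LatencyHistogram:
    """Hypothetical per-(port, target, LU) completion-time counters."""

    def __init__(self):
        # (port, target, lu) -> list of NUM_BUCKETS accumulating counts
        self.buckets = {}

    def record(self, key, start_s, end_s):
        """Bucket one completion; start_s is the timestamp kept in host
        memory at transmit time, end_s is taken when the completion arrives."""
        elapsed_ms = max(0.0, (end_s - start_s) * 1000.0)
        index = min(int(elapsed_ms // BUCKET_WIDTH_MS), NUM_BUCKETS - 1)
        counts = self.buckets.setdefault(key, [0] * NUM_BUCKETS)
        counts[index] += 1
        return index
```

In a real host the two timestamps would come from a monotonic high-resolution clock (for example `time.monotonic()` in Python), read once in the transmit path and once in the receive path.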
The count in the buckets may form a bell curve, or may form two different spikes, one for reads and one for writes, with the read times being much shorter than the writes. The relative position of these spikes to each other depends on the nature of the I/O mix and how the storage is set up (e.g., RAID, RAID1 or RAID5). The count can be used to compute an average I/O command completion time for a particular LU. - The I/O command completion time measurement is based on a clock in the host, and utilizes high resolution timers in the operating system that resolve to milliseconds at least. Thus, at a relatively low layer in the host, the
driver 204 keeps track of the time from when the driver sent an I/O command to the time it receives an acknowledgement of the completion of that I/O command, all the way back through the network from the LU. In other words, it is the entire round trip time from the driver's perspective. - Note that unlike conventional operating system facilities, which measure I/O command completion times at higher layers in the protocol stack, the I/O command completion time measurement according to embodiments of the present invention is performed by the lower transport
protocol driver layer 204. In particular, embodiments of the present invention track the I/O command completion times in the driver 204 from the time the SCSI layer 202 gives the SCSI command 230 to the driver to the time the driver receives the I/O completion 216. The I/O command completion times therefore take into account all of the transport layer latency and overhead without injecting continued SCSI layer file system application thread transitions to user space into the completion time. The measurements are more accurate because the delays due to higher level processing are not included. - Once an I/O command completion time has been computed and the appropriate bucket has been incremented, the receive
section 218 may then compute an updated average I/O command completion time for the particular LU and port (see block 310 in FIG. 3). The average I/O command completion time for a LU can be computed by multiplying the average I/O command completion time represented by each bucket 224 (the midpoint of its time range) by the count in that bucket (e.g. a 0-10 ms bucket with a count of 10 would be 5 ms×10=50 ms). The product of each multiplication for each bucket associated with that LU is then summed. The sum is then divided by the sum of the counts in all of the buckets for that LU to produce the average I/O command completion time for that LU. This computation is repeated for all LUs in each target and for all ports. - Alternatively, raw data such as the bucket counts may be sent to a higher layer, and the computation of average I/O command completion times may be performed at this higher layer. The buckets and/or average I/O command completion times may also be made available to upper layer applications for display to system administrators via a host-provided Application Programming Interface (API). This host-provided API typically receives data at its bottom edge from the driver while exporting a callable interface at its top edge for applications. The data may also be used to make histogram plots that aid in early warning detection and usage patterns on the storage device (see
block 312 in FIG. 3). In addition, if a higher level application wants to read the histogram data, it may request a target 1 pairing, and the driver would index into its private data structure, access it in its entirety, and return it back up to the application (see block 314 in FIG. 3). - In conventional systems, system administrators may wish to limit the storage system's configuration in advance of an overload (e.g. dropped I/O commands) to prevent overload from ever occurring. Overloading is an indication of an oversubscribed target. However, in conventional systems this must be done without the assistance of any monitoring data. Utilizing embodiments of the present invention, that effort could be reduced as each system communicating with the storage device would automatically detect average I/O command completion time trend increases and throttle back the outstanding I/O commands for each LU in a target. In particular, after the average I/O command completion time information is computed and stored, the information may be evaluated by the driver to determine if certain conditions exist that require automatic intervention. For example, embodiments of the present invention may detect overloading (average I/O command completion time for a LU approaching an upper limit, or I/O commands failing altogether). This upper limit may represent a predetermined time interval from a maximum allowable I/O command completion time specified by the upper layers, both of which may be a default value that may also be configurable by the system administrator.
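As a worked illustration of the averaging and upper-limit check described above, the following Python sketch weights each bucket's midpoint by its count. The function names, the 10 ms default bucket width, and the `margin_ms` parameter (standing in for the "predetermined time interval" below the maximum allowable completion time) are assumptions introduced for the example.

```python
def average_completion_ms(counts, bucket_width_ms=10.0):
    """Weighted average of bucket midpoints; returns None for an empty histogram."""
    total = sum(counts)
    if total == 0:
        return None
    # Bucket i covers [i*w, (i+1)*w), so its representative time is (i + 0.5) * w.
    weighted = sum((i + 0.5) * bucket_width_ms * c for i, c in enumerate(counts))
    return weighted / total


def is_overloading(avg_ms, max_allowed_ms, margin_ms):
    """True when the average has come within margin_ms of the maximum allowed time."""
    return avg_ms is not None and avg_ms >= max_allowed_ms - margin_ms
```

With the figures from the text, a single 0-10 ms bucket holding a count of 10 contributes 5 ms × 10 = 50 ms and yields an average of 5 ms.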
- As mentioned above, a
queue 234 may be maintained in the driver's global data space 228 for each LU in each target for each port. This queue 234 holds outstanding (pending and incomplete) I/O commands for that LU. The depth of the queue 234 may be controllable at the SCSI layer of the initiator. Adjusting the queue depth serves to control the number of outstanding I/O commands for each LU. - In embodiments of the present invention, the receive
section 218 may not only generate per-LU average I/O command completion times, as described above, but may also act on them, such as by throttling back the per-LU queue depth. For example, suppose that the receive section 218 detects that a LU's average I/O command completion time is moving out too far (increasing over time towards the upper limit). Upon detecting an average I/O command completion time that is in danger of increasing beyond this upper limit, the driver's receive section 218 can upcall the midlayer (call into the operating system), and from the SCSI layer 202, lower the number of outstanding I/O commands to that LU by reducing the queue depth for that LU (by half, for example), until the LU recovers, as indicated by a reduction in the average I/O command completion time. The amount that the queue depth is lowered may be configurable by the system administrator. The effect of lowering the maximum number of incomplete I/O commands is that it increases the probability that the LU will actually respond and complete the I/O commands because it is not as overloaded. There is a better chance that the LU will complete the I/O commands rather than having them time out and trigger error handling in the upper layers of the system. In addition, multipathing configurations benefit from timely completion of I/O commands rather than error handling as multipathing configurations typically have to maintain command retry state that pressures system resources. - Alternatively, the queue depth can be lowered for all LUs in the target. This blanket approach serves to protect against the starvation of LUs and provide fairness to all LUs so that LUs with a high number of I/O command completions are throttled as well as those LUs that are starved. 
If, after lowering the queue depth for all LUs, the average I/O command completion time for a particular LU is still too high, the queue depth for all LUs in the target can be repeatedly lowered, as necessary, until a lower limit is reached. The lower limit, which may be configurable by the system administrator, is preferable as opposed to lowering the allowable number of outstanding I/O requests to reach zero because it is desirable to have some amount of I/O commands queued up so it is possible to evaluate how well the LU is doing. If the condition causing the high average I/O command completion time is transient, the LU will recover quickly. If the condition is more continuous in nature, the LU will recover slowly, or may not recover at all.
- If the average I/O command completion time starts to improve (drop) to some acceptable threshold, the driver can automatically perform step increases to the LU queue depth for all LUs in the target. The queue depth can eventually be raised until it is back to the initial depth that the driver was initialized with. The step increases may be configurable by the driver, and are useful to prevent overload conditions from being reintroduced if the condition causing the high average I/O command completion times is continuous in nature.
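The throttle-and-recover behavior described in the preceding paragraphs can be modeled as a small controller. The sketch below is hypothetical: the class name, the default depths, and the two thresholds `upper_limit_ms` and `recovered_ms` are illustrative assumptions, and a real driver would apply the new depth through the operating system's SCSI midlayer rather than store it in an object.

```python
class QueueDepthController:
    """Hypothetical model: halve the depth on overload, step it back up on recovery."""

    def __init__(self, initial_depth=32, lower_limit=2, step_up=4):
        self.initial_depth = initial_depth
        self.lower_limit = lower_limit   # never throttle below this depth
        self.step_up = step_up           # size of each recovery increase
        self.depth = initial_depth

    def update(self, avg_ms, upper_limit_ms, recovered_ms):
        if avg_ms >= upper_limit_ms:
            # Overload: cut the depth in half, but keep some I/O outstanding
            # so the LU's recovery can still be observed.
            self.depth = max(self.lower_limit, self.depth // 2)
        elif avg_ms <= recovered_ms:
            # Recovery: raise the depth in steps, capped at the initial depth.
            self.depth = min(self.initial_depth, self.depth + self.step_up)
        return self.depth
```

Leaving a gap between `recovered_ms` and `upper_limit_ms` gives a hold band in which the depth is left unchanged, which is one way to avoid reintroducing the overload when the underlying condition is continuous in nature.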
- If the array does not recover even after the corrections, the target may simply be oversubscribed, and it may be necessary to expand the number of LUs in the target, or redirect some of the data out to a new target. Being oversubscribed is relative—cutting the queue depth in half even once may be an indication that the storage array is oversubscribed, or a system administrator may not consider the storage array to be oversubscribed until the queue depth has been dropped to the lower limit without improvement in the average I/O command completion time. Adding LUs or redirecting data to a new target would have to be performed manually by the system administrator.
- The average I/O command completion time is not the only statistic that may be used to determine what is occurring to the LUs within a target. For example, if there is a large disparity between the average I/O command completion times of LUs in the same target, for a similar I/O load, this is an indication of starvation (unfairness in the average I/O command completion times for LUs within a target). Starvation usually applies to a few LUs out of many, and occurs due to unfairness of the I/O scheduler in the operating system, above the driver. However, the driver is not in control of fairness in terms of I/O scheduling, and thus can only detect a lack of fairness, not restore it. Changing fairness is something that the system administrator must do manually.
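One way to flag the disparity described above is to compare each LU's average against the median for the target. This is a sketch under stated assumptions: the text does not define what counts as a "large disparity", so the function name `starved_lus` and the `ratio` threshold are hypothetical, and the caller is assumed to pass only LUs that carry a similar I/O load.

```python
import statistics


def starved_lus(avg_ms_by_lu, ratio=3.0):
    """Return LUs whose average completion time exceeds ratio x the target's median."""
    if not avg_ms_by_lu:
        return []
    median_ms = statistics.median(avg_ms_by_lu.values())
    return sorted(lu for lu, avg in avg_ms_by_lu.items() if avg > ratio * median_ms)
```

Consistent with the text, a check of this kind only detects a lack of fairness; restoring it remains a manual task for the system administrator.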
- The counts in the individual buckets may also provide an indication of what is happening within a LU. For example, a bell curve centered at a particular average I/O command completion time may be expected, but if there is a spike at some unexpected completion time, this may indicate a specific problem requiring LU maintenance. In other words, the nature of the distribution of counts in the buckets for a LU may provide an indication of what is happening in the LU, and more generally, what is happening at the target level, which is what the FC transport protocol cares about. (The application cares about the LU level.) Again, any adjustments made as a result of the nature of the distribution of counts in the buckets for a LU must be made manually by the system administrator.
- In addition, if a LU isn't as available as other LUs, as evidenced by a high average I/O command completion time for that LU as compared to other LUs, other LUs with a higher limit should be used. However, a system administrator would have to manually intervene and make a decision to change the storage allocation and/or move data from one LU to another.
- The invention can be extended to multiple initiators and multiple targets. Statistics can be obtained for all initiators and all targets so that a system administrator can determine which targets are overloaded and which initiators are affected. In other words, it can be extended across the entire SAN. No existing tool has, or can have, this extension capability, because all of them are applicable only to direct attached storage.
- A system administrator may want to work from a single terminal on a single host and evaluate I/O command completion time data for all hosts in the SAN and all of the LUs, targets and ports in the SAN. Emulex Corporation's HBAnyware™ management suite, in its current configuration, keeps track of how HBAs are performing, how they are configured, enables HBAs to be configured remotely, and allows reports to be sent to remote locations on the network. HBAnyware™ can be extended in view of embodiments of the present invention to poll the average I/O command completion time and other information from the driver of each host within which HBAnyware™ is running and present it to the system administrator at a remote location in graphical or tabular form as described above so that a system administrator can see all of this LU loading information for the entire SAN and make adjustments accordingly. HBAnyware™ has a routine running in each driver that reports back, in-band, to the host within which the HBAnyware™ software is running. HBAnyware™ can communicate with all of the HBAs on each host, collect the data for each of the buckets for each LU, and send this data back to the host within which the HBAnyware™ software is running.
- In addition, instead of having the driver detect an increasing average I/O command completion time and an upcoming overload condition and set the queue depth automatically, the adjustments to the queue depths could also be done by a system administrator using HBAnyware™ and communicated back to each of the drivers. The latency information, in its histogram form, can also include information about the amount of I/Os being completed by the target and LUNs. Monitoring this volume is also interesting from a diagnostic standpoint. For example, four load-balanced HBAs, in the same server, should show similar volume to a given target/LUN. However, if the volume is quite different, the load balancing software is either malfunctioning or not configured correctly.
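The load-balancing check mentioned above can be illustrated by comparing the I/O volume each HBA reports for the same target/LUN. The sketch assumes the per-HBA volume is the sum of that HBA's bucket counts; the function name `volume_imbalance` and the idea of comparing the result with an administrator-chosen tolerance are assumptions for the example.

```python
def volume_imbalance(io_counts):
    """Spread of per-HBA completed I/O counts, as (max - min) / mean."""
    if not io_counts:
        return 0.0
    values = list(io_counts.values())
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return (max(values) - min(values)) / mean
```

Four well-balanced HBAs would give a value near zero, while a value approaching or exceeding 1 suggests that the load-balancing software is malfunctioning or misconfigured.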
- It should be understood that all operating systems have some ability to measure completion times. However, embodiments of the invention further allow for the collection and management of the completion times for all servers in a data center, or some logical subset (i.e., all servers associated with a particular application), to provide trending, comparisons, and the discernment of acceptable and unacceptable response times. By comparing across multiple servers and trending over time, good and bad latency can be determined. In addition, the ability to observe latency for all servers associated with a given application can be helpful when diagnosing a performance problem with that particular application. In general, any servers whose latency falls significantly outside the average latency of the other servers, especially after being newly added to the system, can be targeted as possibly malfunctioning.
- Embodiments of the invention also provide the ability to integrate protocol error data, capacity (queue) data and latency data. For example, protocol errors that do not affect latency or volume of I/O can be ignored. Protocol errors that do affect latency or volume can also be prioritized.
- The second embodiment of the present invention to be discussed in greater detail relates to the use of remote agents embedded in initiators and a network diagnostics manager application for collecting specific interesting negative initiator event data (diagnostics data) to enable a picture of the operational health of the SAN to be determined. To better identify problem areas in the network, agents are placed in servers acting as initiators in the SAN. The agents interact with relatively inexpensive HBAs, NICs or adapters (referred to herein as I/O controllers) to collect initiator event data rather than relying on expensive test box hardware. Although termed “initiator event data,” the collected event data doesn't necessarily relate to the initiator but may relate to other parts of the network such as the switches and targets (disk drives). A benefit of collecting initiator event data is that no direct access (and associated access rights) to the network or storage components is needed to collect this data. Although HBAs are primarily intended to transfer data over a FC link, embodiments of the present invention implement firmware modifications to utilize the HBAs for gathering certain event data. HBAs are ideal for gathering event data because each HBA has visibility to parts of the network that other entities cannot see. Each agent collects the event data from the HBA through the HBA driver stack and sends the collected data to a network diagnostics manager in a centralized management server or a plurality of distributed servers so that a picture of the SAN can be pieced together that any one individual server would not ordinarily be able to see. The agents can also collect, from local drivers in the HBAs of the servers, errors and performance data seen at the HBAs (e.g., throughput problems, etc.).
- This data is periodically pulled from the agents by a network diagnostics manager and stored in a database, where it can be accessed by a base application. With this collected information, an overall picture of the network performance can be pieced together, and the SAN can be diagnosed based on what the initiators have seen. The network diagnostics manager can integrate I/O data from the OS, the initiator stack, network, and storage devices to create information and add value. This integration of data can enable the system to determine which errors to ignore, and which ones to pay attention to.
- Unlike conventional SAN diagnostics systems, embodiments of the present invention provide a number of nondisruptive smart agents scattered about the network and capable of being activated when needed and selectively configurable to look for only a certain number of data items and store them in memory, and periodically send this information back to a central location so that a picture of what is happening in the SAN can be developed. Although the information being collected at any one particular HBA by itself may not be enlightening, the collection of information being gathered by a number of HBAs can reveal the trouble spots in the network.
- As noted above, existing software-based SAN diagnostics tools are only able to collect counter information and/or high-level event data. Counters are good at showing trends but are not effective, and sometimes misleading, when attempting to determine root cause of SAN availability or performance issues. Relying on SAN counters for troubleshooting a SAN is akin to counting the average number of cars traveling through an intersection over a one week period to determine the root cause of an accident that happened on a particular day and time. High-level events can also be problematic in helping determine root cause. The high-level events can be intentionally induced by the end-user, or they can simply be a symptom of a problem, not the root cause. On the other hand, continuing the analogy above, collecting protocol-based negative event data in accordance with embodiments of the present invention is akin to obtaining information that a car with a particular license plate number ran a red light (a protocol violation) at the time of the accident.
- Unlike existing SAN diagnostics tools that create performance degradation due to their collection of massive amounts of data, embodiments of the present invention may cause no additional performance degradation by the collection of only interesting negative event data, because performance is already degraded at this point in time.
- Furthermore, the counter information collected by existing SAN diagnostics tools is just a number, and is associated with time only from the perspective of when it was read. Because counters don't carry time information, they must be monitored constantly, which can have a negative performance impact. In contrast, the SAN diagnostics tool according to embodiments of the invention sends back “stateful” information that is related to time.
- In addition, existing SAN diagnostics tools can only provide counter data indicating, for example, the number of bytes being received by an HBA, with no visibility at the driver level. However, embodiments of the present invention can make requests of the driver itself, such as observed performance indications (e.g., a latency timer starts when the request is made, and stops when a completion message is received). These performance indications can reveal previously undetectable performance issues at the driver level.
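- By way of illustration only, the latency timer described above (started when the request is made, stopped when the completion message is received) might be sketched as follows. The class name `LatencyTracker`, its methods, and the use of a command tag as the lookup key are assumptions made for this sketch and do not appear in the specification.

```python
import time


class LatencyTracker:
    """Hypothetical per-command latency timer keyed by a command tag."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock, so the sketch can be tested
        self._started = {}       # tag -> time the request was issued
        self.samples = []        # completed latencies, in seconds

    def on_request(self, tag):
        # The timer starts when the I/O request is made.
        self._started[tag] = self._clock()

    def on_completion(self, tag):
        # The timer stops when the completion message is received.
        started = self._started.pop(tag, None)
        if started is None:
            return None          # completion with no matching request
        latency = self._clock() - started
        self.samples.append(latency)
        return latency

    def average(self):
        # Average completion time over all samples seen so far.
        return sum(self.samples) / len(self.samples) if self.samples else 0.0
```

A monotonic clock is used in this sketch so that wall-clock adjustments cannot produce negative latencies; that choice is likewise an assumption.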
- Also, unlike existing hardware-based SAN diagnostics tools that must be moved around from HBA to HBA and cannot develop a big picture of the network, embodiments of the present invention do not require hardware, just downloadable drivers, APIs and agents, and can collect the specific kind of data needed to develop a big picture of the network.
- Unlike HBAnyware, which collects configuration information about the HBAs, the data collected according to embodiments of the invention, although collected at the initiator, doesn't relate to the initiator but relates to other parts of the network such as the switches and/or targets.
- FIG. 4 a illustrates an exemplary SAN 400 in an enterprise data center according to embodiments of the invention. In the example of FIG. 4 a, a number of production servers 402, each executing one or more applications (e.g. financial, human resources, or engineering applications), are connected to a fabric 406. Each production server 402 contains one or more HBAs (I/O controllers) 408, an initiator device driver stack 424, and an agent 418. The production servers 402 can communicate with one or more storage arrays 412 over the fabric 406, which can include, but is not limited to, FC, Ethernet, and Infiniband.
- The SAN 400 also includes one or more management servers 414 executing network diagnostics manager software 416 for configuring the speed of the fabric (e.g., the speed of the links), zoning, etc. The network diagnostics manager 416 may be stored in computer-readable storage media and executed by one or more processors in the server. The network diagnostics manager 416 may receive link information (which is analogous to the “pulse” of the SAN 400, indicating that there is activity occurring), and throughput information (which is analogous to the “blood pressure” of the SAN, where excessively high or low throughput can indicate problems). The network diagnostics manager 416 is responsible for configuring the storage arrays (e.g., mapping LUNs in the storage arrays to production servers, configuring the size of the logical LUNs, etc.). The network diagnostics manager 416 may monitor the capacity of the storage arrays 412 and receive localized latency measurements (e.g. how long it took to complete a particular command at the storage array). HBAnyware™, as mentioned above, can also be executed within the one or more management servers 414.
- FIG. 5 illustrates an exemplary organizational structure 500 of an enterprise data center. Conventional SAN diagnostics software has drawbacks in that it can only provide high level information and does not have access to lower level information. So, for example, in FIG. 5, an application performance issue appearing at a business unit group 502 (which provides the applications) may be passed down to the distributed systems engineering level 504 deploying Linux, Windows, Solaris servers and the like for resolution, but those involved at the distributed systems engineering level may not have ready access to the lower level information (only available at the SAN management group level 506 or storage management group level 508) to diagnose the problem, and therefore must rely on diagnostics information received from those groups. However, information received from the SAN management group 506 or storage management group 508 individually may not reveal the source of the problem. Thus, the agents mentioned above are utilized in embodiments of the present invention to automatically collect this lower level event data.
- FIG. 4 b illustrates an exemplary production server 402 in greater detail according to embodiments of the invention. Each production server 402 includes HBA 408 and/or Network Interface Card (NIC) 410, generally referred to herein as I/O controllers, and an initiator driver stack 424 that may be stored in memory (computer-readable storage media) 426 and executable by processor 420. In order to selectively capture interesting negative initiator event data, each production server 402 includes an agent 418 that can be stored in computer-readable storage media 426 and executed by the server processor 420. The agent 418 may be downloaded, activated, deactivated, configured or upgraded using the network diagnostics manager 416 executing in the one or more management servers 414 through out-of-band channels 422 (e.g., over the Ethernet) to collect a configurable subset of information such as selected initiator event data. End users may independently perform these tasks as well. The agents can be either installed separately or delivered within a software download. The driver stack is tailored by adding an agent API 436 to facilitate communications between the agent 418 and the initiator driver stack 424.
- It is significant to note that the agents 418 capture data at the initiators (production servers 402) that is already being generated, and provide the data to the network diagnostics manager 416 in a format that shows how the initiator, connected fabric devices, and targets are performing. Although collected at the initiators, all the data doesn't relate to the initiator itself. The agents 418 cooperate with the existing initiator driver stack 424 and collect the information from the driver stack. A portion of memory 426 in the server 402 stores the data collected by the agent 418. However, if the local memory 426 exceeds a certain capacity threshold, the agent 418 can proactively communicate with the network diagnostics manager 416 through the collector 430 to request that it be serviced.
- FIG. 4 c illustrates an exemplary management server 414 in greater detail according to embodiments of the invention. As illustrated in FIG. 4 c, the network diagnostics manager 416 includes a base application 428 (the network diagnostics software), one or more collectors 430, a database 432, and a Service Location Protocol (SLP) interface 434. The agents 418 communicate with the SLP interface 434 over the Ethernet 422 to identify what type of agent they are. In embodiments of the present invention, for example, all agents 418 having the capabilities described above may be identified as being of the same agent type. The collectors 430 communicate with the SLP interface 434 to identify all of the agents 418 of a certain type. In this way, the collectors 430 do not have to broadcast over the network to identify the agents 418. The collector 430 pulls data from the sensors/agents 418 over the Ethernet 422 on a periodic basis, and stores the data in the database 432. The collected data represents a system-wide view. The data can then be accessed by the base application 428, which performs a recording engine function. The base application 428 may be accessible through a web server to enable a system administrator to retrieve specific data, generate reports, and diagnose network problems.
- The base application 428, through the collector 430, can communicate with the agent 418 through a messaging protocol to configure the agent to collect only certain types of event data and store it in the special memory 426. Periodically, on a configurable basis, the collector 430 polls the agent 418, which then retrieves the data from the special memory 426 and sends it to the collector 430.
- In one embodiment of the present invention, an initially inactive agent 418 can be downloaded into a production server 402. The agents 418 can be included in driver kits (containing drivers, industry standard APIs, HBAnyware™, etc.), so that when the kit is installed, the agents are also installed and are ready to be awakened. In addition, the initiator driver stack 424 can be tailored so that if an agent 418 is ever awakened, the agent can be immediately accessible to the OS. An agent API 436 can be downloaded and placed into initiator driver stack 424 so that an end user does not have to update the driver stack to get the benefit of the agents when they are activated. In this way, nothing needs to be added to production servers 402 to enable them to interact with the agents 418. These inactive agents 418 collect nothing until a command is received to activate them and enable them to collect only certain kinds of event data. This data can be saved into memory 426, and periodically sent back to the network diagnostics manager 416.
- FIG. 6 illustrates an exemplary communication flow 600 between software and hardware elements according to embodiments of the present invention. As mentioned above, to implement embodiments of the present invention, changes are needed to the initiator driver stack. Previously, data was just being acted on, but with embodiments of the present invention the data now needs to be converted to a form that can be collected by the agent, and the agent has to be tailored to know where to collect this information from in the stack. Referring to FIG. 6, an application 602 running on a production server may send a file system request 604. The file system 606 then converts this request to a SCSI request 608, which is passed down to a low-level device driver 610. The low-level device driver 610 creates a hardware-specific translation 612 for the HBA 614 it services. The HBA 614 then communicates with a storage array 616 through the fabric 618.
- To implement the first embodiment of the present invention involving latency measurements, the low-level device driver 610 must be modified at 620 to time stamp both the outgoing I/O request and the incoming I/O completion. To implement the second embodiment of the present invention involving the collection of interesting negative event data, the low-level device driver 610 must be modified to include an agent API 622 to act as an interface with the agent and allow the agent to obtain the latency and negative initiator event data from the driver. The agent API 622 can be stored in computer-readable storage media and executed by one or more processors in the server.
- Referring again to FIG. 4 b, the agents 418 can work with any HBA 408 or I/O controller, provided that the proper commands are sent to the initiator driver stack 424. In other words, the agents 418 have to ask the right questions (specific or general or OS questions) of a particular HBA 408 through the initiator driver stack 424. The agent 418 can first send vendor-unique API commands to the driver stack 424, and if those fail, either because the HBA 408 is from a different vendor or because the driver stack is outdated, the agent can revert to either open standard APIs or Operating System APIs to communicate with the driver stack and HBA.
- As mentioned above, the network diagnostics system according to embodiments of the present invention is able to collect both initiator event data and OS data. The SAN diagnostics system of the present invention is based on the fact that the SCSI protocol is an initiator-based protocol. As such, the initiator starts every “conversation” in the SAN, and if any entity between the initiator and the target cannot partake in the conversation or service the request, feedback must be provided back to the initiator. Because a SCSI initiator sees things and issues kinds of commands that SCSI targets don't, and receives feedback, SCSI initiators are in a privileged position. For example, if an initiator sends a command to write a block of data, a switch can respond with a message indicating that the switch is too busy on a particular link to process a request from an initiator at this time. It can be very helpful to be notified of this type of bottleneck, so a message (protocol feedback) is sent back from the switch to the initiator. In another example, feedback can be received from the target itself, indicating that the disk or the target controller is too busy to process a request. The initiator is the only entity that collects feedback related to the overall performance of the network. For example, from an event standpoint and performance standpoint, a switch doesn't have a good perspective on end performance. The switch isn't collecting information from the target on how busy the target is, it is just collecting information on how busy the switch is. Thus, the initiator is the entity most suitable for placing an agent to collect information related to the overall performance of the network.
- This feedback, such as whether the SCSI layer or FC layer is too busy, can be extracted from within the initiator's own driver stack by the agent. Note that an agent is not needed in the target's stack, because the central SAN diagnostics manager does not communicate with the target directly. Instead, an agent obtains information from the initiator's stack (which was received from the target).
- Initiator event data that may be collected by the sensors/agents can include, but is not limited to, (1) fabric_busy, which is sent back to an initiator by a fabric device to indicate that the fabric is overloaded, (2) queue_full, which is sent back to an initiator by a disk to indicate that a particular LUN can't process any new requests/commands because its queue is full, and (3) SCSI_busy, which is sent back to an initiator by a target to indicate that the target is too busy to process commands. Other initiator event data that may be collected includes a target_reset command, which usually originates from the SCSI stack in a server. Note, however, that if a production server sends out a target_reset command to the target, the target then sends a 3rd party logout to every server associated with its logical LUNs. This is disruptive because it puts the other servers out of commission. Still other initiator event data that may be collected includes, but is not limited to, ELS PLOGI (Port Login), ELS LOGO (Logout), ELS PRLO (Process Logout), ELS ADISC (Address Discovery), ELS RSCN (Registered State Change Notification), Link control P_BSY (Port Busy), FCP Read Check Error, SCSI Logical Unit Reset, SCSI Check Condition, SCSI Bus Reset, Queue Depth Changed, Firmware Updated, and Boot Code Updated. As noted above, this type of initiator event data is often much more relevant and interesting to the diagnosis of network trouble than the simple counter data and high-level event data.
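- As a minimal sketch only, an agent that is configured to keep just a chosen subset of negative initiator events and to hand them over when polled might look like the following. The event names come from the paragraph above; the class `EventFilter`, its methods, the record layout, and the bounded buffer are hypothetical choices made for this illustration.

```python
from collections import deque

# Event names taken from the text; which ones are enabled is configurable.
INTERESTING_EVENTS = {"fabric_busy", "queue_full", "SCSI_busy", "target_reset"}


class EventFilter:
    """Hypothetical agent-side filter that stores only enabled events."""

    def __init__(self, enabled=INTERESTING_EVENTS, capacity=1024):
        self.enabled = set(enabled)
        # Bounded buffer: once full, the oldest record is dropped (assumption).
        self.buffer = deque(maxlen=capacity)

    def observe(self, name, timestamp, source):
        # Ignore anything the agent was not configured to collect.
        if name not in self.enabled:
            return False
        self.buffer.append({"event": name, "time": timestamp, "source": source})
        return True

    def drain(self):
        # Called when the agent is polled: return stored events and clear them.
        events = list(self.buffer)
        self.buffer.clear()
        return events
```

Each stored record carries a time stamp, in keeping with the time-related, stateful information the description contrasts with plain counters.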
- This data being collected is initiator endpoint data, not overall performance data. The collected data is exception data from inside the stacks of the initiator, target, and switches. The collected data is selectively chosen to be interesting negative event data, a subset of the data available for collection. No additional performance degradation may be incurred by the collection of such data, because performance is already degraded at this point in time.
- Another capability of the agents 418 according to embodiments of the present invention is communicating with the OSs 438 of the servers to gather and report data in ways that are not currently done. The agent 418 running in user space (as opposed to kernel space) gets this data from the OS 438 and sends it back to the network diagnostics manager 416.
- The information that can be collected by the agent 418 from the OS 438 includes, but is not limited to, OS events, queue depth, throughput, the number of commands seen during a certain period of time, seconds since last HBA reset, transmitted frames, transmitted words, received frames, received words, LIP count, NOS errors, error frames, dumped frames, link failures, loss of sync, loss of signal, invalid transmit word count, invalid CRC count, disk read bytes per second, and disk write bytes per second.
- The embodiment of the present invention currently under discussion has been described above in the context of placing agents in front-end initiators. However, alternatively or additionally, embodiments of the present invention can be extended to back-end initiators. In such embodiments, interesting negative event data can be collected from target stacks within back-end initiators, at the back end of a network appliance filer (a filer front end with a SAN at the back end). For example, in a FC SAN, initiators may be connected through a fabric (including fabric switches) to a storage subsystem having a FC HBA operating in a target mode. Agents could be placed in the storage subsystem to pull negative event data from the target driver stack at the front end of the storage subsystem. With this arrangement, if an initiator determines that the storage subsystem did not fulfill a particular request, the network diagnostics manager could receive data from the target driver stack in an attempt to understand where the storage subsystem is having issues (e.g. cache memory, a fabric issue on back end, etc.).
- The event data that can be collected by embodiments of the present invention is not limited to the specific data mentioned above. Embodiments of the present invention include other metrics that would be of interest. For example, the agent could monitor particular error responses in a particular sequence, and note the time of occurrence of each sequence. Collecting this data from a number of locations may produce a meaningful compilation of data. This is just one example of the types of information that can be collected according to embodiments of the present invention.
- The embodiment of the present invention currently under discussion could also be implemented in Fibre Channel over Ethernet (FCOE) systems. FCOE is generally compatible with embodiments of the present invention discussed herein because the initiators in FCOE utilize a SCSI initiator stack, and all of the same FC and SCSI event, latency and capacity data (negative event data) can be collected from the initiator stack. Embodiments of the present invention could also apply to iSCSI (more generally, anything utilizing a SCSI stack), NAS and NIC stacks (Ethernet stacks). Embodiments of the present invention could apply to Serial Attached SCSI (SAS) initiators as well, or other protocols where a local stack can be probed for negative event data from remote devices.
- The third embodiment of the present invention to be described in further detail relates to the computation of an oversubscription value based on the demand for a device divided by the handling capacity of the device to help determine whether the device is oversubscribed and changes need to be made.
- When SCSI devices are deployed, they are programmed with a command queue depth. This queue depth dictates how many commands can be queued up to the particular device at a given time. If there is demand for more commands and the queue is full, the other commands cannot enter the queue and need to wait for slots to open up. In large data centers, there are thousands of servers accessing thousands of devices. If there is too much demand for a particular device, its queue fills up, and applications must wait to be serviced. If the queue is too overloaded it could, for example, take ten seconds to open an e-mail. As a result, understanding how the queue is utilized is important to understanding looming performance issues.
- In a RAID controller the issue is further compounded. In this case, there are hundreds or thousands of devices (such as disk drives) located behind a controller front end. This controller front end has some I/O processing capability that is far less than the I/O processing capability of all the devices it supports. As a result, it is important to understand the I/O demand on the controller as well as on each individual device. In addition, with virtual server technology, more queuing demand is placed on storage controllers, by fewer initiators and servers. Further, the mapping of all the queue demand to the storage controllers is more difficult to discern and aggregate.
- Embodiments of the present invention collect, operate on, and display queuing information from thousands of host servers in a data center environment. This is not practical to do manually today, and no other product exists to perform this task. Because this information is not available today, IT administrators are forced to react to performance issues. Often they must slow down all the systems (lower the queue depth everywhere) in order to bring the system to some level of predictable performance. Without specific, system-wide queuing information, it is difficult to maintain peak performance.
- For example, if a particular application is experiencing poor performance, understanding the queue depth that it is utilizing would help determine whether the performance issue is I/O-related or application-related. If the issue is I/O-related, a common approach includes increasing the queue depth for all the application's storage devices. This has the effect of not only increasing these devices' quality of service, but also decreasing the relative quality of service of all other devices being serviced by the same storage front end. Embodiments of the present invention will allow administrators to make more informed choices when addressing these performance parameters.
- FIG. 7 is an example of a SAN 700 and a network diagnostics manager 702 capable of computing an oversubscription value according to embodiments of the present invention. In the example of FIG. 7, the production servers 704 contain queues 706 having a certain queue depth 708 for storing commands associated with a particular LUN 710. Each production server 704 may be mapped to a LUN 710 in a storage array 712. The storage array 712 accesses the fabric 714 through an array port 716, which may have a certain handling capacity 718 that is a function of the number of commands that can be simultaneously handled and the memory required for those commands. In the simplified example of FIG. 7, each queue 706 in each production server 704 has a queue depth of 30, but the array port handling capacity 718 is 45. Thus, a computation of a configured oversubscription value as defined by the sum of the configured queue depths of the queues in the production servers associated with LUNs being serviced by a particular array port (e.g., 30+30+30 in this example) divided by the maximum array port handling capacity (supply) servicing the logical LUNs (e.g., 45 in this example) yields a configured oversubscription value of 2, which is within acceptable limits (some oversubscription is desirable because the production servers in reality may only have a couple of commands in their queues at any time). However, if the configured oversubscription value should become 20, for example, this can represent a significant oversubscription calling for a reallocation of resources.
- In addition to computing a configured oversubscription value, this embodiment of the present invention may compute an actual oversubscription value, which uses the actual queue depths of the production servers divided by the maximum array port handling capacity (supply) servicing the LUNs.
- The types of oversubscription ratios that can be calculated according to embodiments of the present invention include, but are not limited to, target Port Oversubscription Value for maximum queue depths, target Port Oversubscription Value for actual queue depths, HBA Oversubscription Value for maximum queue depths, HBA Oversubscription Value for actual queue depths, Device Oversubscription Value for maximum queue depths, and Device Oversubscription Value for actual queue depths.
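- The configured and actual oversubscription values described for FIG. 7 reduce to summed demand divided by supply, which can be sketched as below. The function name `oversubscription` and the sampled actual queue depths are hypothetical; only the figures 30, 30, 30 and 45 are taken from the example of FIG. 7.

```python
def oversubscription(queue_depths, port_capacity):
    """Summed queue depth (demand) divided by port handling capacity (supply)."""
    if port_capacity <= 0:
        raise ValueError("port_capacity must be positive")
    return sum(queue_depths) / port_capacity


# Configured value from FIG. 7: three queues of depth 30, port capacity 45.
configured = oversubscription([30, 30, 30], 45)   # 2.0

# Actual value from sampled queue usage; these samples are made up.
actual = oversubscription([2, 5, 1], 45)
```

The same function would serve for the HBA and device ratios listed above, with the queue depths and the capacity taken from the corresponding element.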
- The above description illustrates that this embodiment of the present invention collects, operates on, and organizes queue information. Embodiments of the present invention can automatically collect the programmed block storage maximum queue depth for every device, sample the block storage actual queue usage for every device, map the queues of every device to storage arrays or controller front-ends, and map maximum and actual per device queue information on a per server, per initiator, per target port, per target, and per device basis. In one embodiment, agents installed in the production servers and the storage arrays can extract this information and send it to a SAN diagnostics manager in one or more management servers over the Ethernet. This information is then organized and displayed (e.g., on a web page accessible over the Internet) in such a manner that the data center administrator can quickly determine if there are issues or opportunities related to queue management.
- The system will also provide for alerts when certain measured values approach or surpass configured thresholds (high water marks). The system also establishes a Target Port Oversubscription Value (TPOV) for both the maximum and utilized queue usage. This value is based on the quotient of each of these queue usage measures, summed up for all the devices behind a target port, divided by the I/O handling capability of the storage array. The I/O handling capability of the storage array is either gleaned empirically, or overridden by the end-user based on more expert knowledge of a specific configuration. In other embodiments, reports and trend charts can be produced. With all of the information provided by all of the embodiments of the present invention described above, a system administrator may be able to link performance, via latency measurements, to oversubscription values, or link “array busies” and device queue full events to oversubscription values.
- Embodiments of the present invention also extend to the back-end of network attached storage systems (a.k.a. filers). Because many filers are simply specialized server front-ends with SAN back-ends, embodiments of the present invention can be used to monitor the queuing on the backend and provide important performance data to the end user.
- The fourth embodiment of the present invention to be discussed in greater detail relates to collecting and logging certain types of event data in a database in a centralized management server, and computing a system severity value indicative of the level of impact (criticality or severity) of each event. In this embodiment, the data is collected at the driver level by the agents, and stored in the special memory in the production servers. When the collector polls the production servers, I/O scope (low level events) are collected and centralized in a database. The base application can utilize this data to generate a severity calculation based on a predetermined severity level along with some collected data such as the number of servers affected by a target reset command.
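- The description does not give a formula for the severity calculation, so the following is only one possible sketch of combining a predetermined severity level with collected data such as the number of servers affected. The base levels in `BASE_SEVERITY`, the function `system_severity`, and the way redundant paths reduce the score are all assumptions made for illustration.

```python
# Hypothetical predetermined levels; the text says only that such levels exist.
BASE_SEVERITY = {"target_reset": 3, "link_down": 2, "queue_full": 1}


def system_severity(event_type, servers_affected, redundant_paths=1):
    """Scale a predetermined level by impact (assumed formula, not the patent's)."""
    base = BASE_SEVERITY.get(event_type, 1)   # unknown events get the lowest level
    # More affected servers raise the score; more surviving paths lower it.
    return base * servers_affected / max(redundant_paths, 1)
```

Under this assumed formula, a target reset affecting thirty servers scores higher than one affecting two, and a link event on a server with four links scores lower than the same event on a server with a single link.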
-
FIG. 8 illustrates anexemplary SAN 802 implementing severity data collection system according to embodiments of the present invention. As shown inFIG. 8 , each computer 800 (which can be equated toserver 402 inFIG. 4 a) connected to the Storage Area Network (SAN) 802 runs a Storage Area Network Monitor Agent 804 (which can be the same as or additional toagent 418 inFIG. 4 a). Thisagent 804 continuously monitors for storage network protocol events that include, but are not limited to, Discovery events, Task Management commands, Link events, SCSI errors, and Change in data access performance. This monitoring can be performed with minimal or zero effect on system performance. Each one of these events can adversely affect the availability and performance of theSAN 802. Discovery events can indicate, for example, that an application's storage is no longer available. Task Management events can indicate that a storage controller is experiencing intermittent hardware problems, potentially leading to loss of storage access for several servers. Link events can also indicate loss of access, or degraded access to theSAN 802. SCSI errors can indicate loss of access, or degraded access to the SCSI devices attached to theSAN 802. When any Storage AreaNetwork monitor agent 804 detects an event, it sends information about the event to the Storage Area Network Monitor 806 (which can be equated to thenetwork diagnostics manager 416 inFIG. 4 a) that includes, but is not limited to, Type of the event, Event Severity, Time stamp of the event, Identifier of the initiator's network hardware, and port, which experienced the event, Identifier of the Storage system, and port, which saw the event, and Attributes of the event. This information allows for automatically correlating multiple events from multiple servers. - When the Storage
Area Network Monitor 806 receives such event information from anagent 804, the Storage Area Network Monitor logs this information to the Event Log Database 808 (which can be equated todatabase 432 inFIG. 4 c). This allows root cause of any severe events to be investigated immediately, as opposed to awaiting another failure scenario after enabling logging. As part of populating theEvent Log Database 808, a System Severity field or value will be calculated as described above. System Severity will consider the level of impact for each event on other SAN elements. For example, a link event on one of four links to a server is not as critical as a link event to a server with a single link. Likewise, a downed target that has two servers connected to it likely has less impact than a target that has thirty servers connected to it. - The
Event Log Database 808 contains event information from the entire SAN 802 and serves as a SAN-wide protocol analyzer for root cause analysis. The Event Analyzer 810 operates in two modes: Manual Mode and Knowledge Base Mode. In Manual Mode, the Event Analyzer 810 presents several filtering and sorting options to the end user. Multiple SAN events can be filtered to investigate events associated with particular initiators, particular storage arrays, particular times of day, etc. Sorting can also be used to perform time-based sorts, storage port sorts, etc. Manual Mode thus allows problematic SAN events to be captured and, with the filtering and sorting tools, the root cause of an event to be identified quickly.
- In Knowledge Base Mode, all related events are linked together by the Event Analyzer 810 and presented, based on a knowledge base of information 812, to the user as a single failure. In this manner, seemingly disparate events can be correlated to help indicate root cause. For each failure, the information presented to the user includes, but is not limited to, the number of computers affected by the failure, the number of storage ports affected by the failure, the severity of the SAN failure, and a list of potential root causes based on the knowledge base 812.
- All of the above provides a better method for determining the root cause of SAN issues. The ability to view SAN-wide protocol events immediately in the aftermath of an incident provides faster resolution of critical problems. The ability to capture protocol events by means of the storage adapter also eliminates the need to place a network sniffer on every suspect path, which provides a cheaper mechanism for root cause analysis of SAN issues. Furthermore, a major benefit of integrating I/O data from the OS, the initiator stack, the network, and the storage devices is to create information and add value. This integration of data can enable the system to determine which errors to ignore and which to pay attention to. For example, if the collected data indicates that the storage array is busy but that latency is acceptable, the system can ignore any detected errors. Another benefit is the ability to collect all data from a production server with no need for access rights to the network or storage components. The I/O data can also be used in a predictive manner to allow problems to be prevented before they occur. For example, the collected information can include the various ways (network paths) that an I/O request can access a target. By observing the latency data and the negative events, it may be possible to determine when one or more of these I/O paths are lost. An alert can be generated when the number of I/O paths is reduced to some threshold (e.g., one path). Such an alert would allow an administrator to restore the lost paths. Currently such an alert is available only for physical network paths and not for I/O paths, and I/O paths can be lost even though physical network paths are not. Such is the case when an I/O path is lost within the target device even though the physical network path to the target device is operational. The only alert currently available is raised when all paths are lost.
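The path-count alert described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the class name `PathMonitor`, the path identifiers, and the rule that the caller decides when a path counts as lost (for example, from latency data and negative events) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PathMonitor:
    """Hypothetical tracker of usable I/O paths to a single target."""
    target: str
    threshold: int = 1                       # alert when live paths <= threshold
    live_paths: set = field(default_factory=set)

    def mark_up(self, path_id: str) -> None:
        """Record that an I/O path to the target is usable."""
        self.live_paths.add(path_id)

    def mark_lost(self, path_id: str) -> Optional[str]:
        """Record a lost I/O path; return an alert message if the number
        of remaining paths has dropped to the threshold or below."""
        self.live_paths.discard(path_id)
        remaining = len(self.live_paths)
        if remaining <= self.threshold:
            return (f"ALERT: target {self.target} has {remaining} "
                    f"I/O path(s) left (threshold {self.threshold})")
        return None
```

With three paths registered and a threshold of one, losing the first path returns `None` and losing the second returns an alert, so an administrator is warned while one path is still operational instead of after all paths are gone.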
- In another example, an alert can be generated if a Queue Full event came back a certain number of times (e.g., 25 times) in a minute, or if the latency for a particular initiator/target port pair exceeds an average of a certain amount of time (e.g., 100 ms) over a given sample period (e.g., one hour).
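The two example rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the thresholds are the example values from the text (25 Queue Full events per minute, 100 ms average latency over one hour), and the use of sliding time windows is one possible reading of "in a minute" and "over a given sample period". Timestamps are passed in explicitly, in seconds, so the logic is deterministic.

```python
from collections import defaultdict, deque

QUEUE_FULL_LIMIT = 25         # example value from the text
QUEUE_FULL_WINDOW_S = 60.0    # one minute
LATENCY_LIMIT_MS = 100.0      # example value from the text
LATENCY_WINDOW_S = 3600.0     # one hour


class AlertRules:
    """Hypothetical sliding-window checks for the two example alerts."""

    def __init__(self):
        self._queue_full = deque()            # timestamps of Queue Full events
        self._latency = defaultdict(deque)    # (initiator, target_port) -> (ts, ms)

    def record_queue_full(self, now):
        """Record one Queue Full event; True means the alert should fire."""
        events = self._queue_full
        events.append(now)
        # Drop events that have aged out of the one-minute window.
        while now - events[0] >= QUEUE_FULL_WINDOW_S:
            events.popleft()
        return len(events) >= QUEUE_FULL_LIMIT

    def record_latency(self, initiator, target_port, latency_ms, now):
        """Record one latency sample for an initiator/target port pair;
        True means the windowed average exceeds the limit."""
        samples = self._latency[(initiator, target_port)]
        samples.append((now, latency_ms))
        # Drop samples that have aged out of the one-hour window.
        while now - samples[0][0] >= LATENCY_WINDOW_S:
            samples.popleft()
        average = sum(ms for _, ms in samples) / len(samples)
        return average > LATENCY_LIMIT_MS
```

In this sketch a single slow sample can trigger the latency alert because the average is taken over whatever samples are in the window; a practical implementation might require a minimum sample count first.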
- Although the present invention has been fully described in connection with embodiments thereof with reference to the accompanying drawings, it is to be noted that various changes and modifications will become apparent to those skilled in the art. Such changes and modifications are to be understood as being included within the scope of the present invention as defined by the appended claims.
Claims (21)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/093,926 US20140089735A1 (en) | 2006-02-22 | 2013-12-02 | Computer System Input/Output Management |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/360,557 US7716381B2 (en) | 2006-02-22 | 2006-02-22 | Method for tracking and storing time to complete and average completion time for storage area network I/O commands |
US12/486,670 US8635376B2 (en) | 2006-02-22 | 2009-06-17 | Computer system input/output management |
US14/093,926 US20140089735A1 (en) | 2006-02-22 | 2013-12-02 | Computer System Input/Output Management |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/486,670 Continuation US8635376B2 (en) | 2006-02-22 | 2009-06-17 | Computer system input/output management |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140089735A1 (en) | 2014-03-27 |
Family
ID=41164887
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/486,670 Active 2027-01-06 US8635376B2 (en) | 2006-02-22 | 2009-06-17 | Computer system input/output management |
US14/093,926 Abandoned US20140089735A1 (en) | 2006-02-22 | 2013-12-02 | Computer System Input/Output Management |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5922051A (en) * | 1997-05-14 | 1999-07-13 | Ncr Corporation | System and method for traffic management in a network management system |
US20050030893A1 (en) * | 2003-07-21 | 2005-02-10 | Dropps Frank R. | Method and system for detecting congestion and over subscription in a fibre channel network |
US20050273645A1 (en) * | 2004-05-07 | 2005-12-08 | International Business Machines Corporation | Recovery from fallures in a computing environment |
US7203801B1 (en) * | 2002-12-27 | 2007-04-10 | Veritas Operating Corporation | System and method for performing virtual device I/O operations |
US7313613B1 (en) * | 2002-01-03 | 2007-12-25 | Microsoft Corporation | System and method facilitating network diagnostics and self-healing |
US7590775B2 (en) * | 2004-08-06 | 2009-09-15 | Andrew Joseph Alexander Gildfind | Method for empirically determining a qualified bandwidth of file storage for a shared filed system |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5991829A (en) * | 1994-03-29 | 1999-11-23 | The United States Of America As Represented By The Secretary Of The Navy | Method of sensing target status in a local area network |
US7085227B1 (en) * | 2001-05-11 | 2006-08-01 | Cisco Technology, Inc. | Method for testing congestion avoidance on high speed networks |
US6026425A (en) | 1996-07-30 | 2000-02-15 | Nippon Telegraph And Telephone Corporation | Non-uniform system load balance method and apparatus for updating threshold of tasks according to estimated load fluctuation |
US6421723B1 (en) | 1999-06-11 | 2002-07-16 | Dell Products L.P. | Method and system for establishing a storage area network configuration |
US6950888B1 (en) | 2000-09-29 | 2005-09-27 | International Business Machines Corporation | Method, system and program products for determining whether I/O constraints exist for controllers of a computing environment |
US7343410B2 (en) | 2001-06-28 | 2008-03-11 | Finisar Corporation | Automated creation of application data paths in storage area networks |
US20040153844A1 (en) | 2002-10-28 | 2004-08-05 | Gautam Ghose | Failure analysis method and system for storage area networks |
US6816917B2 (en) * | 2003-01-15 | 2004-11-09 | Hewlett-Packard Development Company, L.P. | Storage system with LUN virtualization |
US7281167B2 (en) | 2003-08-26 | 2007-10-09 | Finisar Corporation | Multi-purpose network diagnostic modules |
US7286967B2 (en) * | 2003-10-20 | 2007-10-23 | Hewlett-Packard Development Company, L.P. | Retrieving performance data from devices in a storage area network |
US20070260728A1 (en) | 2006-05-08 | 2007-11-08 | Finisar Corporation | Systems and methods for generating network diagnostic statistics |
US7716381B2 (en) | 2006-02-22 | 2010-05-11 | Emulex Design & Manufacturing Corporation | Method for tracking and storing time to complete and average completion time for storage area network I/O commands |
US7948909B2 (en) * | 2006-06-30 | 2011-05-24 | Embarq Holdings Company, Llc | System and method for resetting counters counting network performance information at network communications devices on a packet network |
JP4331742B2 (en) | 2006-10-25 | 2009-09-16 | 株式会社日立製作所 | Computer system, computer and method for managing performance based on I / O allocation ratio |
US7835300B2 (en) | 2007-01-26 | 2010-11-16 | Beyers Timothy M | Network diagnostic systems and methods for handling multiple data transmission rates |
Also Published As
Publication number | Publication date |
---|---|
US8635376B2 (en) | 2014-01-21 |
US20090259749A1 (en) | 2009-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8635376B2 (en) | Computer system input/output management | |
US7716381B2 (en) | Method for tracking and storing time to complete and average completion time for storage area network I/O commands | |
US7725568B2 (en) | Method and apparatus for network storage flow control | |
US8843613B2 (en) | Information processing system, and management method for storage monitoring server | |
US7882393B2 (en) | In-band problem log data collection between a host system and a storage system | |
US8972622B2 (en) | Monitoring network performance and detecting network faults using round trip transmission times | |
US8209684B2 (en) | Monitoring system for virtual application environments | |
US8181161B2 (en) | System for automatically collecting trace detail and history data | |
US8990634B2 (en) | Reporting of intra-device failure data | |
US8051324B1 (en) | Master-slave provider architecture and failover mechanism | |
US7337353B2 (en) | Fault recovery method in a system having a plurality of storage systems | |
US9965200B1 (en) | Storage path management host view | |
US20050097182A1 (en) | System and method for remote management | |
CN111200526B (en) | Monitoring system and method of network equipment | |
EP3332323B1 (en) | Method and system for balancing storage data traffic in converged networks | |
WO2012120634A1 (en) | Management computer, storage system management method, and storage system | |
JP2004086914A (en) | Optimization of performance of storage device in computer system | |
US20230362250A1 (en) | Performance-Driven Storage Provisioning | |
US8095938B1 (en) | Managing alert generation | |
US20200177482A1 (en) | Methods for monitoring performance of a network fabric and devices thereof | |
US8065133B1 (en) | Method for testing a storage network including port level data handling | |
WO2015023286A1 (en) | Reactive diagnostics in storage area networks | |
WO2017074471A1 (en) | Tracking contention in a distributed business transaction | |
US8024460B2 (en) | Performance management system, information processing system, and information collecting method in performance management system | |
CN108599978B (en) | Cloud monitoring method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:EMULEX CORPORATION;REEL/FRAME:036942/0213 Effective date: 20150831 |
| AS | Assignment | Owner name: BANK OF AMERICA, N.A., AS COLLATERAL AGENT, NORTH CAROLINA Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD.;REEL/FRAME:037808/0001 Effective date: 20160201 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
| AS | Assignment | Owner name: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD., SINGAPORE Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:041710/0001 Effective date: 20170119 |