
WO1990001737A1 - Single disk emulation for synchronous disk array - Google Patents


Info

Publication number
WO1990001737A1
WO1990001737A1 PCT/US1989/002262 US8902262W WO9001737A1 WO 1990001737 A1 WO1990001737 A1 WO 1990001737A1 US 8902262 W US8902262 W US 8902262W WO 9001737 A1 WO9001737 A1 WO 9001737A1
Authority
WO
WIPO (PCT)
Prior art keywords
disk
data
disk drive
disk drives
sector
Application number
PCT/US1989/002262
Other languages
French (fr)
Inventor
Robert J. Halford
Original Assignee
Cray Research, Inc.
Application filed by Cray Research, Inc.


Classifications

    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0673 Single storage device
    • G11B19/28 Speed controlling, regulating, or indicating
    • G11B20/1252 Formatting, e.g. arrangement of data block or words on the record carriers on discs for discontinuous data, e.g. digital information signals, computer programme data
    • G11B2020/183 Testing wherein at least one additional attempt is made to read or write the data when a first attempt is unsuccessful

Definitions

  • This invention relates generally to disk storage for computer systems. In particular, it is directed to an array of synchronous disk drives that emulate a single logical disk drive.
  • Disk drives have long been popular mass storage devices. They provide a low cost solution to the problem of non-volatile data storage. Virtually all computer system manufacturers, therefore, provide for disk drives as system peripherals.
  • The problems facing a computer system user wishing to increase the data transfer rates of disk drives are not trivial. Until now, most solutions have sought to incrementally enhance the performance of a single disk drive while retaining the disk drive's basic architecture.
  • The basic structure of the disk drive consists of a metal disk coated with magnetic material rotating under one or more read/write heads. Most disk drives are multi-platter systems in which a number of the metal disks are arranged in a stack.
  • All data transfers to disk drives are sequential in the sense that data moves in or out sequentially, one word at a time.
  • The access time to a selected word depends partly on its location.
  • Data is recorded on the disk in concentric circles called "tracks".
  • The disk drive has detection means for indicating when the magnetic head is positioned at the outermost track.
  • The heads are moved from track to track by a stepper motor or a servo-controlled linear motor.
  • This head positioning function is called a "seek".
  • The period required to position the Read/Write heads, from the time the command is received until the time the drive becomes ready, is known as the seek time.
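  • Once the heads are on the correct track, the drive must still wait for the desired data to rotate under them. This rotational latency averages the time for half a revolution.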
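As a rough illustration of how seek time and rotational latency combine, the sketch below models access time as seek time plus average latency (half a revolution). It is a simplified model under stated assumptions: the seek time is a hypothetical input, the function names are illustrative, and transfer time and controller overhead are ignored. The 3600 rpm figure is the spindle speed used later in this description.

```python
def revolution_time_ms(rpm: float) -> float:
    """Time for one full revolution of the platter stack, in milliseconds."""
    return 60_000.0 / rpm


def average_latency_ms(rpm: float) -> float:
    """Average rotational latency: on average the target sector is half a revolution away."""
    return revolution_time_ms(rpm) / 2.0


def access_time_ms(seek_ms: float, rpm: float) -> float:
    """Simplified access time: seek time plus average rotational latency.
    seek_ms is a hypothetical input; transfer time and overhead are ignored."""
    return seek_ms + average_latency_ms(rpm)


if __name__ == "__main__":
    print(f"average latency at 3600 rpm: {average_latency_ms(3600):.2f} ms")
```

At 3600 rpm one revolution takes about 16.7 ms, so the average latency is about 8.3 ms regardless of seek distance.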
  • Within each track, information is organized into segments called "sectors".
  • A sector can consist of any number of bytes, limited only by the storage capacity of the track.
  • The addressing of sectors is typically a software function. So that the sectors can be identified by the software, each sector is preceded by an identifier block. The format of this identifier block is system dependent.
  • Each track is single-bit serial, so each byte is stored as eight consecutive bits on a track. Because track selection and latency increase access times, it is preferable to transfer large blocks of data that will be stored in sequential locations. Once the disk heads are positioned at a particular track and no further head movement is required, data is transferred at a fixed rate. This fixed rate is determined by the speed of the disk drive and is independent of the computer system itself.
  • Prior art disk drives are limited in their capacity and speed in transferring data.
  • The prior art is lacking in high-performance disk drives that allow data transfer rates exceeding the speed of currently available disk drives.
  • Prior art disk drives are also limited in their capacity to store data. There is a need in the art, therefore, for a higher capacity disk drive than those currently available.
  • The present invention provides a multiple disk drive array storage device in which a plurality of single disk drives are synchronized and controlled to emulate the operation of a single high-speed, high-capacity disk drive.
  • The array storage device appears at the interface to the host computer to be a single disk drive.
  • The array storage device synchronizes the spindles of a plurality of individual disk drives by means of a master clock (synchronizing) signal.
  • Multi-bit digital data words are divided into subparts, and each subpart is placed in a different fixed sector format block corresponding to each disk drive.
  • Subparts from a plurality of words are assembled to form a single fixed sector format block, and when completed, each sector format block is written to an individual disk drive. In this fashion, the speed and capacity of the array storage device appear multiplied by the number of individual disk drives internal to the device.
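The division of words into per-drive subparts can be sketched as follows. This is a minimal illustration, not the patented controller logic: the function names and the `block_len` parameter are hypothetical, the parity drive is omitted, and a real sector block would hold a full sector's worth of subparts rather than the tiny block used in the test.

```python
from typing import List


def split_word(word: int, word_bits: int, n_drives: int) -> List[int]:
    """Divide a word into n_drives equal subparts; subpart i holds bits [i*w, (i+1)*w)."""
    if word_bits % n_drives:
        raise ValueError("word_bits must be a multiple of n_drives")
    width = word_bits // n_drives
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(n_drives)]


def assemble_blocks(words: List[int], word_bits: int, n_drives: int,
                    block_len: int) -> List[List[List[int]]]:
    """Collect subpart i of every word into drive i's block.

    All drives' blocks fill at the same moment, so a block set is complete
    (and could be written, one block per drive) after block_len words.
    An incomplete trailing block set is not returned.
    """
    current: List[List[int]] = [[] for _ in range(n_drives)]
    completed: List[List[List[int]]] = []
    for word in words:
        for drive, part in enumerate(split_word(word, word_bits, n_drives)):
            current[drive].append(part)
        if len(current[0]) == block_len:
            completed.append(current)
            current = [[] for _ in range(n_drives)]
    return completed
```

With 16-bit words and 16 data drives each subpart is a single bit, which matches the preferred embodiment described below.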
  • Fig. 1 is a block diagram of the present invention whereby the disk drive subsystem employs a disk array.
  • Fig. 2 is a typical disk platter stack used for storing information.
  • Fig. 3 is a block diagram of the present invention illustrating the buffering operations used to synchronize the data transfers between the host computer and the disk drive subsystem.
  • Fig. 4 is a block diagram of the present invention that illustrates the connections for control and data cables between the disk controller and each individual disk drive in the disk array.
  • Fig. 5 illustrates the format of a sector on an individual disk drive of the present invention.
  • Fig. 6 shows the logical grouping of data on the disk array and its error detection and correction means.
  • A host computer 101 communicates with a disk drive subsystem 103 via an input/output channel 102.
  • This communication includes both control information and data to be stored on the disk drive sub-system 103.
  • The data is transmitted from the host computer 101 in, for example, 16-bit-wide parcels.
  • Each bit of the 16-bit-wide parcel, plus a parity bit, is stored in a simultaneous, parallel operation on one of 17 standard disk drives 900a-900q in array 900.
  • This parallel operation, rather than the standard serial operation, results in a storage transfer rate 17 times that of the standard architecture.
  • The disk controller 108 broadcasts control signals to the array of disk drives 900 simultaneously.
  • This controller 108 provides an interface that appears to the host computer 101 as a single disk drive, thereby providing transparent operation and compatibility with existing computer systems.
  • The disk drives in array 900 perform the same operations simultaneously because their disk platter stacks (also known as "spindles") are synchronized. This synchronization requires that all disk platters be aligned and that each disk drive reach its index mark at the same time.
  • The interface between the disk drive subsystem 103 and the host computer 101 consists of two cables, designated Cable 104 and Cable 105. Each cable consists of 24 signals, described in Table 1.
  • Signals originating with the host computer 101 are transmitted in Cable 104.
  • Signals originating in the disk drive subsystem 103 are transmitted in Cable 105.
  • The signal designated WRITE CLOCK is a clock generated by the host computer 101. It synchronizes the transmission of commands and data to the disk drive subsystem 103.
  • The four CODE signals indicate the function to be performed by the disk drive subsystem 103. These function codes are examined and decoded by the disk drive subsystem 103 only when FUNCTION/DATA READY is active and the PARITY signal indicates there are no errors.
  • The PARITY signal carries an odd parity value for the CODE signals.
  • The signals BUS OUT 0 through BUS OUT 15 form a 16-bit data bus from the host computer 101 to the disk drive subsystem 103.
  • The disk drive subsystem 103 can read 16 bits of data from the BUS OUT 0-15 signals.
  • The PARITY (BUS OUT) signal transmits the odd parity value for the BUS OUT 0-15 signals.
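Odd parity and the decode condition for CODE can be expressed compactly. In this sketch the signals are modeled as plain integers and booleans sampled on one WRITE CLOCK pulse; the function names are hypothetical, and because the text does not give the CODE bit pattern for each command, none are assumed.

```python
from typing import Optional


def odd_parity(value: int, width: int) -> int:
    """Parity bit that makes the number of 1s in (value bits + parity bit) odd."""
    ones = bin(value & ((1 << width) - 1)).count("1")
    return 0 if ones % 2 == 1 else 1


def recognize_command(function_data_ready: bool, code: int, code_parity: int) -> Optional[int]:
    """Return the 4-bit CODE if the subsystem should decode it on this WRITE CLOCK pulse,
    or None if FUNCTION/DATA READY is inactive or the PARITY signal shows an error."""
    if not function_data_ready:
        return None
    if odd_parity(code, 4) != code_parity:
        return None
    return code & 0xF


def bus_out_valid(bus_out: int, bus_parity: int) -> bool:
    """Check PARITY (BUS OUT) against the 16 BUS OUT data bits."""
    return odd_parity(bus_out, 16) == bus_parity
```

Odd parity has the useful property that an all-zero bus still carries one active line (the parity bit), so a dead bus is detected as a parity error.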
  • The signal designated READ CLOCK is a clock generated by the disk drive subsystem 103 for the synchronized transfer of status and data signals to the host computer 101.
  • The signal STATUS/DATA READY is set active by the disk drive subsystem 103 when it is transmitting data or status information on the BUS IN signals to the host computer 101.
  • The signal STATUS/DATA READY is pulsed active for a single READ CLOCK pulse during write functions to indicate that the disk drive subsystem 103 is ready to accept the next transfer.
  • The signal ERROR is set active by the disk drive subsystem 103 in conjunction with the signal DONE to indicate that at least one error condition occurred during the execution of the current function.
  • The signal DONE indicates the completion of a command.
  • Signal DONE is set active for only one pulse of the signal READ CLOCK.
  • The signal READY indicates that the disk drive subsystem 103 is available to accept commands from the host computer 101.
  • The signal INDEX/SECTOR MARK carries coded index and sector mark information.
  • The signal INDEX/SECTOR MARK is set active by the disk drive subsystem 103 for a single READ CLOCK pulse to indicate a sector mark.
  • Sector mark signals establish rotational reference points along a given track on the disk stack. Each track in the preferred embodiment can contain up to 128 sectors with the initial sector (sector 0) starting coincident with the index point. Sector marks are generated at the end of each physical sector on the track.
  • The signal INDEX/SECTOR MARK is set active by the disk drive subsystem 103 for two consecutive READ CLOCK pulses to indicate an index mark.
  • The index mark is a status indicator that, when active, indicates that the reference point or index area is passing under the head.
  • The index point is a pulse that occurs once per revolution of the disk stack. It is used to reset the timing circuits once each revolution.
  • The index mark is generated at the beginning of sector 0.
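The single-pulse versus double-pulse encoding on INDEX/SECTOR MARK can be decoded by measuring each run of active samples. This sketch assumes the signal has already been sampled once per READ CLOCK pulse into a list of 0/1 values; the function name is hypothetical, and labeling runs longer than two pulses as invalid is an assumption the text does not address.

```python
from itertools import groupby
from typing import List, Tuple


def decode_marks(samples: List[int]) -> List[Tuple[int, str]]:
    """Classify each run of active samples on INDEX/SECTOR MARK.

    1 active pulse -> "sector" mark, 2 consecutive active pulses -> "index" mark,
    anything longer -> "invalid" (assumed handling). Returns (start_position, kind).
    """
    events: List[Tuple[int, str]] = []
    position = 0
    for level, run in groupby(samples):
        length = len(list(run))
        if level:
            kind = {1: "sector", 2: "index"}.get(length, "invalid")
            events.append((position, kind))
        position += length
    return events
```

Because sector 0 starts coincident with the index point, a decoder like this would typically reset its sector counter to 0 on each "index" event.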
  • The signal PARITY (STATUS) carries the odd parity value for the combination of signals STATUS/DATA READY, ERROR, DONE and READY. This parity value can be examined whenever the signal READY is set active by the disk drive subsystem 103.
  • The signals BUS IN 0 through BUS IN 15 form a 16-bit data bus that transmits information from the disk drive subsystem 103 to the host computer 101.
  • In the preferred embodiment, the signal PARITY (BUS IN) carries the odd parity value for the signals BUS IN 0 through BUS IN 15 during the READ CLOCK pulses where the signal READY is active.
  • The commands generated by the host computer 101 include SELECT, READ, WRITE, HEAD SELECT, CYLINDER SELECT, DATA TRANSFER, other commands, and diagnostics. Each command is identified by a specific combination of active and inactive CODE signals. Recognition of a command by the disk drive subsystem 103 requires an active signal FUNCTION/DATA READY and a valid PARITY signal during a WRITE CLOCK pulse.
  • The first command generated by the host computer 101 is the command SELECT. If the binary value transmitted by the BUS OUT signals during a SELECT command matches the logical unit number of the disk drive subsystem 103, the disk drive subsystem 103 recognizes that the host computer 101 wishes to converse with it.
  • The next series of commands indicates the head and track to use during the READ or WRITE operation.
  • The command HEAD SELECT specifies the head group to use for those disk drive subsystems that support multiple Read/Write heads on a single disk platter stack. Note that this embodiment assumes each individual disk in the disk array 900 is identical to the others. Therefore, this embodiment supports multiple head drives and the HEAD SELECT command only if all drives are multiple head drives.
  • Fig. 2 depicts a disk platter stack with multiple Read/Write heads.
  • The signals BUS OUT 8 through BUS OUT 10 indicate which head group to use.
  • The command CYLINDER SELECT identifies which track to use.
  • The interface protocol between the host computer and the disk drive controller 108 in the preferred embodiment of the present invention could be of the type described in my co-pending patent application SN 622,066 entitled "Electrical Interface System" and assigned to the assignee of the present invention. Those skilled in the art will readily recognize other interface systems that could be used.
  • The READ command initiates the transfer of information from the disk drive subsystem 103 to the host computer 101.
  • The BUS OUT signals are used to indicate parameters to the READ command.
  • The signals BUS OUT 0 through BUS OUT 7 are used to identify which sector is desired.
  • The signals BUS OUT 8 through BUS OUT 10 are used to indicate which head to select following the execution of the current READ command on those disk drive subsystems that support multiple Read/Write heads on a single disk platter stack.
  • The signals BUS OUT 12 through BUS OUT 14 indicate a "read option" as follows:
  • The READ command is the implied request for a first data packet comprising 16 words serially transmitted, each word being 16 bits of parallel data (termed a 16x16 packet). If an error is detected in the beginning phase of the READ process (e.g., reading the sector ID, comparing the ID, verifying that the sector is not flawed, etc.), then the DONE and ERROR signals are returned. If no error is detected, then the first data packet is returned. To continue the READ process, FUNCTION/READY is activated along with the specific CODE combination for "DATA" during a single WRITE CLOCK period. This process is repeated 128 times to read a 4096-byte data sector. When all the data has been transferred, the disk drive subsystem 103 responds with an active DONE signal, together with the ERROR signal if it is asserted.
  • The 16x16 packet approach is preferred because of its higher transmission efficiency: only a minimal amount of handshaking protocol is required to transmit the digital data.
  • Transmitting 16 words of 16-bit data in each packet provides a highly efficient disk channel protocol.
  • This packet transfer technique is described in U.S. Patent Application Serial No. 644,066 of Robert James Halford, entitled "Electrical Interface System", assigned to the assignee of the present invention.
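The packet arithmetic above can be checked with a short sketch: a 16x16 packet carries 16 × 16 = 256 bits, or 32 bytes, so a 4096-byte sector takes 4096 / 32 = 128 packets. The helper names are hypothetical, and packing two bytes per 16-bit word in big-endian order is an assumption; the text does not specify byte order.

```python
from typing import List

WORD_BITS = 16
WORDS_PER_PACKET = 16
SECTOR_BYTES = 4096

PACKET_BYTES = WORD_BITS * WORDS_PER_PACKET // 8   # 32 bytes per 16x16 packet
PACKETS_PER_SECTOR = SECTOR_BYTES // PACKET_BYTES  # 128 packets per sector


def sector_to_packets(sector: bytes) -> List[List[int]]:
    """Split one sector into 16x16 packets (lists of sixteen 16-bit words)."""
    if len(sector) != SECTOR_BYTES:
        raise ValueError(f"expected {SECTOR_BYTES} bytes, got {len(sector)}")
    # Assumed byte order: big-endian, two bytes per 16-bit word.
    words = [int.from_bytes(sector[i:i + 2], "big") for i in range(0, SECTOR_BYTES, 2)]
    return [words[i:i + WORDS_PER_PACKET] for i in range(0, len(words), WORDS_PER_PACKET)]


def packets_to_sector(packets: List[List[int]]) -> bytes:
    """Inverse of sector_to_packets."""
    return b"".join(word.to_bytes(2, "big") for packet in packets for word in packet)
```

One handshake per packet instead of one per word is what reduces the protocol overhead: 128 handshakes per sector rather than 2048.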
  • The WRITE command transfers information from the host computer 101 to the disk drive subsystem 103.
  • The BUS OUT signals indicate specific parameters to the WRITE command.
  • The signals BUS OUT 0 through BUS OUT 7 indicate the specific sector where the data is to be written.
  • The signals BUS OUT 8 through BUS OUT 10 indicate the head to be selected following execution of the current WRITE command, for those disk drive subsystems that support multiple Read/Write heads on a single disk platter stack.
  • The signals BUS OUT 12 through BUS OUT 14 indicate the write options as follows: 1) Verify the sector ID and write the data portion of the sector;
  • The WRITE command is indicated by a specific CODE combination and an active FUNCTION/READY signal.
  • The WRITE command is the implied request for the first data packet (16x16 bits). If an error is detected in the beginning phases of the WRITE process (e.g., reading the ID field, comparing the ID field, etc.), then the DONE and ERROR signals are returned. If no error is detected, then the first data packet is transferred.
  • FUNCTION/READY is then activated along with the specific CODE for "DATA" for 16 WRITE CLOCK periods. This process is repeated 128 times to write a 4096-byte data sector.
  • When all the data has been transferred, the disk drive subsystem 103 responds with an active DONE signal, together with the ERROR signal if it is asserted.
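READ and WRITE use the same layout for their BUS OUT parameters: the sector in bits 0-7, the head to select afterward in bits 8-10, and the option in bits 12-14. A minimal encode/decode pair might look like this; the function names are hypothetical, and bits 11 and 15 are assumed unused because the text does not assign them.

```python
def encode_rw_parameters(sector: int, next_head: int, option: int) -> int:
    """Pack READ/WRITE parameters into a 16-bit BUS OUT value."""
    if not (0 <= sector <= 0xFF and 0 <= next_head <= 0x7 and 0 <= option <= 0x7):
        raise ValueError("parameter out of range")
    return sector | (next_head << 8) | (option << 12)


def decode_rw_parameters(bus_out: int) -> dict:
    """Unpack the BUS OUT fields used by READ and WRITE."""
    return {
        "sector": bus_out & 0xFF,           # BUS OUT 0-7
        "next_head": (bus_out >> 8) & 0x7,  # BUS OUT 8-10
        "option": (bus_out >> 12) & 0x7,    # BUS OUT 12-14
    }
```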
  • Disk controllers generally provide a means for synchronizing the data transfers between a host computer and a disk drive.
  • The disk drive subsystem 103 contains a controller 108, which includes a buffer memory 113 for temporarily storing data, giving the host computer 101 or the disk drive subsystem 103 more time to respond.
  • Data transfer instructions may include the number of bits to be transferred, and the controller 108 may be used to keep track of that count.
  • One level of buffering is provided in the input/output channel 102, and two levels of buffering are provided within the controller 108.
  • The host computer 101 can transfer data at a rate of 850 Mbits/sec.
  • However, an individual disk drive within the disk array 900, for example drive 900a, can only store or retrieve data at a rate of 10 to 24 Mbits/sec. Therefore, the input/output channel 102 and the controller 108 perform buffering and synchronization functions using a speed matching technique.
  • The input/output channel 102 provides dual speed matching buffers 371 and 372. The channel 102 transfers data into buffer 371 first; when buffer 371 is full, it switches to buffer 372 and continues transferring data into it. While data is being transferred into buffer 372, data can be transferred out of buffer 371 and transmitted to the disk controller buffers 113. This speed matching technique makes the transfer of data from the 850 Mbits/sec interface to the multiple 10 to 24 Mbits/sec individual disk interfaces straightforward.
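The alternation between buffers 371 and 372 is a classic double-buffering (ping-pong) scheme. The sketch below is a sequential simplification: in hardware, filling one buffer and draining the other happen at the same time, while here the generator simply hands each full buffer to the consumer before filling continues. The function name and buffer size are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

BUFFER_IDS = (371, 372)  # reference numerals of the channel's dual buffers


def ping_pong(stream: Iterable[int], buffer_size: int) -> Iterator[Tuple[int, List[int]]]:
    """Fill one buffer; when it is full, switch to the other and hand the full one off.
    Yields (buffer_id, contents) in the order buffers are drained."""
    buffers: Tuple[List[int], List[int]] = ([], [])
    active = 0
    for item in stream:
        buf = buffers[active]
        buf.append(item)
        if len(buf) == buffer_size:
            active ^= 1                               # channel switches to the other buffer
            yield BUFFER_IDS[active ^ 1], list(buf)   # drain the full buffer
            buf.clear()                               # drained buffer is ready for reuse
    if buffers[active]:
        yield BUFFER_IDS[active], list(buffers[active])  # final partial buffer
```

The design only works if draining one buffer finishes before the other fills; that is why the slower side is served by many drives in parallel.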
  • Each data transfer between the channel 102 and the disk controller buffers 113 will use the buffer locations 381 and 382.
  • Each transfer into buffer 381 or 382 transmits 16 bits of data.
  • A parity check is performed at that time, and the 16 data bits plus the parity bit are stored in the speed matching buffers 381 and 382.
  • The data stored in buffers 381 and 382 is then transmitted serially to one of seventeen dual speed matching buffers 301 through 334.
  • Each bit of the 16-bit parcel and the single-bit parity value is routed to a particular pair of speed matching buffers depending on its position within the parcel.
  • For example, bit position 0 is routed along the data path associated with speed matching buffers 301-302 and disk drive 900a; bit position 1 is routed along the data path associated with speed matching buffers 303-304 and disk drive 900b; and so on.
  • On a read, each bit retrieved from a disk drive is stored in a unique pair of speed matching buffers.
  • The transfer from buffers 301 through 334 into 381 or 382 combines the bits into a 16-bit parcel, plus parity bit, ordered as follows: the bit retrieved from disk drive 900a provides the bit for position 0; the bit retrieved from disk drive 900b provides the bit for position 1; and so on.
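The bit routing and reassembly can be sketched for a single parcel. This assumes drive index i holds bit position i (index 0 corresponding to drive 900a) and that index 16 is the parity drive; the helper names are hypothetical, and the per-drive speed matching buffers are not modeled.

```python
from typing import List, Sequence

DATA_BITS = 16  # drives 0-15 hold data; drive 16 (the 17th) holds parity


def route_parcel(parcel: int) -> List[int]:
    """Write direction: bit i of the 16-bit parcel goes to drive i; drive 16 gets odd parity."""
    bits = [(parcel >> i) & 1 for i in range(DATA_BITS)]
    parity = 1 - (sum(bits) % 2)  # odd parity: total count of 1s including parity is odd
    return bits + [parity]


def reassemble_parcel(drive_bits: Sequence[int]) -> int:
    """Read direction: bit from drive i becomes bit position i; check odd parity first."""
    if len(drive_bits) != DATA_BITS + 1:
        raise ValueError("expected one bit from each of the 17 drives")
    if sum(drive_bits) % 2 != 1:
        raise ValueError("parity error")
    return sum(bit << i for i, bit in enumerate(drive_bits[:DATA_BITS]))
```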
  • This embodiment provides a storage capacity equal to 17 times the storage capacity of an individual disk drive because there are seventeen parallel disk drives operating in unison.
  • Because an individual disk drive has a data transfer rate of 10-24 Mbits/sec, this embodiment provides an overall data transfer rate of 170-408 Mbits/sec, or 17 times the transfer rate of an individual disk drive. Since 16 of the 17 disk drives store data and the 17th stores the parity bit for error detection, the effective data transfer rate seen by the user is 160-384 Mbits/sec.
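The rate figures follow directly from multiplying the per-drive rate by the number of drives; the sketch below reproduces them. The function name is hypothetical, and it assumes, as the text does, that all 17 drives stream in lockstep with no controller overhead.

```python
def array_rates(per_drive_mbits: float, total_drives: int = 17, data_drives: int = 16):
    """Return (raw, effective) array transfer rates in Mbits/sec.
    raw counts every drive, including the parity drive; effective counts data drives only."""
    return per_drive_mbits * total_drives, per_drive_mbits * data_drives


if __name__ == "__main__":
    for rate in (10, 24):
        raw, effective = array_rates(rate)
        print(f"{rate} Mbits/sec per drive -> {raw} raw, {effective} effective")
```

The same multiplication applies to capacity: the array holds 17 times one drive's capacity, of which 16/17 is user data.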
  • The seek time, or the time required to address a particular track, for this invention is equal to the seek time of an individual disk drive. Because this embodiment uses synchronous spindles, the latency period is equal to the average case for an individual disk drive.
  • Two types of cables connect the common disk controller 108 to each individual disk drive: a Control cable 168 and multiple Read/Write cables 150 through 167.
  • The disk drives are "daisy chained" on the Control cable 168, although this technique is not essential.
  • Each individual disk drive requires a direct connection to the disk controller for the Read/Write cable.
  • The Control cable 168 contains the BUS 0 through BUS 9 signals, TAG 0 through TAG 2 signals, STATUS 0 through STATUS 8 signals, the OPEN CABLE signal, the SELECT ENABLE signal, UNIT SELECT 0 through UNIT SELECT 3 signals, and the SERVO CLOCK signal.
  • the Read/Write cable contains the signals READ CLOCK, READ DATA, WRITE CLOCK, WRITE DATA, UNIT SELECTED, and SEEK END.
  • Each disk drive compares its manually set address with the value indicated by the UNIT SELECT signals. If the values match, the selected disk drive recognizes that the disk controller wishes to converse with it. In this embodiment, all disk drives are set to the same address, so all of them respond simultaneously and in parallel to the same UNIT SELECT signal.
  • TAG signals indicate the type of function to be performed. Again, all disk drives in array 900 respond simultaneously and in parallel. TAG 0 tells the disk drive to move its Read/Write head to the location indicated by the address represented by the signals BUS 0 through BUS 9.
  • The signal SEEK END indicates that the Read/Write head is stationary, that it is positioned over a track, and that it is ready to perform the next data transfer operation. TAG 1 indicates which head group to use; the head group is indicated by the signals on BUS 0 through BUS 3.
  • The signal TAG 2 indicates that the signals represented by BUS 0 through BUS 9 should be interpreted by the disk drive as a function code such as READ, WRITE or some other command.
  • When control signal TAG 2 indicates a function, an active BUS 0 signal indicates a WRITE operation.
  • The WRITE CLOCK signal provides the timing pulse for the WRITE DATA line.
  • Data is transmitted to the disk drive on the WRITE DATA line.
  • The BUS 0 indicator causes the disk drive to enable its write circuitry, which converts the data on the WRITE DATA line to current reversals in the Read/Write head, thereby recording information on a selected track on the disk stack.
  • When control signal TAG 2 indicates a function, an active BUS 1 signal indicates a READ operation.
  • The READ CLOCK signal provides the timing pulse for the signal READ DATA.
  • Data is transmitted to the controller on the READ DATA line.
  • The BUS 1 indicator causes the disk drive to enable its read circuitry, which converts the analog information recorded on a given track to digitized data that is transmitted through the READ DATA line back to the disk controller.
  • When control signal TAG 2 indicates a function, another active BUS signal indicates a REZERO operation. This indicator causes the disk drive head positioner to reposition to cylinder 0.
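Because every drive in array 900 shares one unit address, each TAG/BUS command on the Control cable is effectively broadcast to all 17 drives. The sketch below models that dispatch. It assumes TAG 0 selects a cylinder from the BUS lines, TAG 1 selects a head group from BUS 0-3, and TAG 2 carries a function with BUS 0 for WRITE and BUS 1 for READ; these tag and bit assignments, the `DriveState` class, and `apply_tag` are assumptions for illustration, and REZERO is left out because its bus bit is not identified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DriveState:
    """Hypothetical per-drive state touched by Control cable commands."""
    cylinder: int = 0
    head: int = 0
    write_enabled: bool = False
    read_enabled: bool = False


def apply_tag(drives: List[DriveState], tag: int, bus: int) -> None:
    """Broadcast one TAG/BUS command; every drive responds identically and in parallel."""
    for drive in drives:
        if tag == 0:
            drive.cylinder = bus                     # assumed: TAG 0 = seek to cylinder on BUS
        elif tag == 1:
            drive.head = bus & 0xF                   # assumed: TAG 1 = head group on BUS 0-3
        elif tag == 2:
            drive.write_enabled = bool(bus & 0b01)   # assumed: BUS 0 active = WRITE
            drive.read_enabled = bool(bus & 0b10)    # assumed: BUS 1 active = READ
        else:
            raise ValueError(f"unknown tag {tag}")
```

Since every drive receives identical commands, the array's mechanical state stays in lockstep as long as the spindles remain synchronized.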
  • The record format on the disk is under the control of the disk controller.
  • The index and sector mark pulses are available to the controller to indicate the beginning of a track or sector.
  • The preferred sector format is shown in Fig. 5. Those skilled in the art will readily recognize that other formats could be used.
  • The field HEAD SCATTER 622 provides some tolerance for head skew.
  • The field HEAD SCATTER 622 consists of binary 0's.
  • The field BINARY 0's SYNC 623 consists of multiple binary 0's used to provide phase-locked oscillator synchronization and some tolerance for timing.
  • The field BINARY 1's SYNC 624 consists of one or more binary 1's used to indicate the beginning of either the sector ID 625 or the sector data 628.
  • The field EOR 629 (End of Record) provides end-of-record tolerance.
  • The desired unit is selected and the track, head, and sector are addressed.
  • The disk controller 108 searches for the leading edge of the desired sector.
  • The sector ID field 625 must always be read and verified before the sector data field 628 is written.
  • The sector data field 628 must always be preceded by the BINARY 0's SYNC 626 and BINARY 1's SYNC 627 fields.
  • BINARY 0's SYNC 626 also includes a HEAD SCATTER portion similar in operation to HEAD SCATTER 622, providing tolerance for head skew.
  • The sector data field 628 must always be followed by the EOR pad 629.
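The write rules above can be captured as a small check on the Fig. 5 layout. The field order here is inferred from the reference numerals and the rules just stated; `SECTOR_LAYOUT`, `fields_to_write`, and modeling the sector ID as an integer are assumptions for illustration.

```python
from typing import List, Tuple

# Fig. 5 fields in the order inferred from their reference numerals and the rules above.
SECTOR_LAYOUT: List[Tuple[str, int]] = [
    ("HEAD SCATTER", 622),
    ("BINARY 0's SYNC", 623),
    ("BINARY 1's SYNC", 624),
    ("SECTOR ID", 625),
    ("BINARY 0's SYNC", 626),
    ("BINARY 1's SYNC", 627),
    ("SECTOR DATA", 628),
    ("EOR", 629),
]


def fields_to_write(read_id: int, expected_id: int) -> List[int]:
    """Return the reference numerals of the fields written after the sector ID.
    The data field is written only if the ID just read matches the addressed sector."""
    if read_id != expected_id:
        raise ValueError("sector ID mismatch: data field not written")
    id_pos = [ref for _, ref in SECTOR_LAYOUT].index(625)
    return [ref for _, ref in SECTOR_LAYOUT[id_pos + 1:]]
```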
  • Spindle synchronization is accomplished in the preferred embodiment by developing an external clock within the disk controller 108.
  • This external clock provides a master pulse once per revolution, with possible extra pulses every N microseconds.
  • The MASTER CLOCK signal provides the disk controller with a method for synchronizing the WRITE DATA line with the actual speed of the disk platters. In this embodiment, one MASTER CLOCK signal is broadcast to all individual disk drives to force spindle synchronization.
  • 14-inch disk drives use a synchronizing pin to synchronize the AC motors controlling the spindles. This synchronization function was used for video instant replay of digitized images.
  • The 14-inch disk drives were synchronized to + microseconds skew by providing an external clock driving the synchronization pin of each drive.
  • The sector zero marks of each platter were then synchronized to within 4 microseconds of skew.
  • Skew must be kept below 10 microseconds for synchronized drives. The lower the skew, the higher the data packing capacity of the present invention.
  • The synchronization of spindles must occur at a very fine level.
  • The disk platter typically rotates at a speed of 3600 rpm, which results in one revolution every 1/60 second (about 0.0167 seconds).
  • With 64 sectors per track, the sector time (the period that the sector is under the Read/Write head) is 1/60 second divided by 64, or about 0.000260 seconds.
  • If the spindles can be synchronized to a tolerance of plus or minus 12.5 microseconds (0.0000125 seconds) and the sector time is 260 microseconds (0.000260 seconds), then at least 25 microseconds, or about 10% of the sector space, must be given up to the BINARY 0's SYNC field to ensure that all disk drives begin reading information from the same sector on the same track at the same time.
  • If the tolerance is N microseconds, then it is more realistic to allocate 5 × N microseconds to ensure spindle synchronization.
  • The level of spindle synchronization is important. For example, if the spindles can be synchronized only to a tolerance of plus or minus 80 microseconds, then the BINARY 0's SYNC field would require 33% of the sector space (for 64 sectors per track). Of course, the present invention would still work.
  • The optimal synchronization factor for this approach should be near one microsecond.
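The sync-field arithmetic above can be reproduced with a small helper. A factor of 2 covers the full plus-or-minus window, which gives the 25-microsecond, roughly 10% figure for a tolerance of plus or minus 12.5 microseconds; passing `factor=5` applies the more conservative 5 × N rule of thumb. The function names are hypothetical.

```python
def sector_time_us(rpm: float, sectors_per_track: int) -> float:
    """Time one sector spends under the head, in microseconds."""
    return 60e6 / rpm / sectors_per_track


def sync_fraction(tolerance_us: float, rpm: float, sectors_per_track: int,
                  factor: float = 2.0) -> float:
    """Fraction of each sector given up to the BINARY 0's SYNC field.
    factor=2 covers a +/- tolerance window; factor=5 is the more conservative rule."""
    return factor * tolerance_us / sector_time_us(rpm, sectors_per_track)


if __name__ == "__main__":
    print(f"sector time: {sector_time_us(3600, 64):.1f} us")
    print(f"sync overhead at +/-12.5 us: {sync_fraction(12.5, 3600, 64):.1%}")
```

The overhead scales linearly with tolerance, which is why the text pushes for synchronization near one microsecond.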
  • Fig. 6 shows logically how data is stored on the parallel disk drives in the array 900.
  • Each row represents bits stored on a single sector on a single track on a single disk drive.
  • Each column represents a 16-bit word transferred by the host computer 101. Each bit is stored on a different disk drive. The parity bit on the seventeenth disk drive is generated by the disk controller for error detection purposes.
  • The data is logically grouped in 15-word segments. Each 15-word segment includes error detection and correction means, labeled in Fig. 6 as bits E0 through E15.
  • Each sector includes an additional word for redundant error correction and detection, labeled in Fig. 6 as bits C0 through C15.
  • Any single track in error can be corrected within any 15-word segment.
  • The P bits, starting with P0, are "vertical" parity bits. Each contains the odd parity value for its column of bits.
  • The bits labeled E0 through E15 in Fig. 6 are members of the Error Correction Code (ECC) value for the block.
  • The combination of ECC and parity check bits enables the identification and correction of all failing bits on any single disk drive within a 15-word segment.
  • The row, or disk drive, in error can change every sixteen bits in the case of randomly detected unflawed media defects.
  • An ECC is generated over the entire sector and stored as a vertical "word" immediately following the last group in the sector. This ECC verifies that the sector was repaired correctly.
  • a plurality of single disk drives, synchronized and controlled to emulate the operation of a single disk drive is a cost-effective means of pro ⁇ viding high-performance, high-capacity disk storage devices.
  • Another benefit of the present invention is the reliability and cost-effectiveness of using just one disk controller to manage a plurality of disk drives.
  • Yet another benefit is that the number of cables con- necting the host computer with the disk storage device is reduced with regards to the number of disk drives communicating with the host computer.
  • a high degree of fault tolerance is provided through the use of parity bits and sector ECC parcels such that one disk drive within the array can fail without interrupting the operation of data storage and retrieval.


Abstract

A multiple disk drive array storage device is described which emulates the operation of a single disk drive, so that the handshaking and protocol between the array storage device and the host computer appear to the host computer to be those of a single disk drive. The array storage device includes a plurality of individual disk drives, each of which has its spindle synchronized to the other disk drives using master clock synchronization. Digital data words are received by the array storage device controller, which divides the words into subparts and writes each subpart to a different disk drive within the storage device. The controller buffers and formats the digital data for reading and writing from the individual disk drives transparently to the host computer.

Description

SINGLE DISK EMULATION FOR SYNCHRONOUS DISK ARRAY
Background of the Invention
1. Field of the Invention
This invention relates generally to disk storage for computer systems. In particular, it is directed to an array of synchronous disk drives that emulate a single logical disk drive.
2. Description of the Prior Art
Disk drives have long been popular mass storage devices. They provide a low cost solution to the problem of non-volatile data storage. Virtually all computer system manufacturers, therefore, provide for disk drives as system peripherals.
The major advantage of disk drives is low cost. For some applications, however, this advantage is outweighed by insufficient data transfer speed, particularly in supercomputer and other high-performance computing environments such as those manufactured by Cray Research, Inc., the assignee of the present invention.
The problems facing a computer system user wishing to increase the data transfer rates of disk drives are not trivial. Until now, most solutions have sought to incrementally enhance the performance of a single disk drive while retaining the disk drive's basic architecture. The basic disk drive consists of a metal disk coated with magnetic material rotating under one or more read/write heads. Most disk drives are multi-platter systems in which a number of metal disks are arranged in a stack.
All data transfers to disk drives are sequential in the sense that data moves in or out one word at a time. The access time to a selected word depends partly on its location. Data is recorded on the disk in concentric circles called "tracks". The disk drive has detection means for indicating when the magnetic head is positioned at the outermost track. A stepper motor (or servo-controlled linear motor) controls the head position, causing it to step from track to track. This head positioning function is called a "seek". The time required to position the Read/Write heads, from when the command is received until the drive becomes ready, is known as the seek time.
Once a track is selected, it is necessary to wait for the desired location to rotate into position under the head. The average waiting time, known as latency time, is the time for half a revolution.
Within each track, information is organized into segments called "sectors". A sector can consist of any number of bytes, limited only by the storage capacity of the track. The addressing of sectors is typically a software function. So that the software can identify the sectors, each sector is preceded by an identifier block. The format of this identifier block is system dependent.
Usually each track is single-bit serial, so that each byte is stored as eight consecutive bits on a track. Because track selection and latency increase access times, it is preferable to transfer large blocks of data stored in sequential locations. Once the disk heads are positioned at a particular track and no further head movement is required, data is transferred at a fixed rate. This fixed rate is determined by the speed of the disk drive and is independent of the computer system itself.
Prior art disk drives are limited in both storage capacity and data transfer speed. The prior art lacks high-performance disk drives that allow data transfer rates exceeding those of currently available disk drives. There is a need in the art, therefore, for disk drives of higher capacity than those currently available.
Summary of the Invention
To overcome the limitations in the prior art discussed above, and other limitations readily recognizable to those skilled in the art, this invention provides a new architecture for disk drive storage devices. According to the preferred embodiment of the present invention, a multiple disk drive array storage device is described in which a plurality of single disk drives are synchronized and controlled to emulate the operation of a single high-speed, high-capacity disk drive. The array storage device appears at the interface to the host computer to be a single disk drive. The array storage device synchronizes the spindles of a plurality of individual disk drives by means of a master clock (synchronizing) signal. Multi-bit digital data words are divided into subparts, and each subpart is placed in a different fixed sector format block corresponding to each disk drive. Subparts from a plurality of words are assembled to form a single fixed sector format block, and when completed, each sector format block is written to an individual disk drive. In this fashion, the speed and capacity of the array storage device are a multiple of those of an individual disk drive, scaled by the number of disk drives internal to the device.
Brief Description of the Drawings
In the drawings, like numerals refer to like elements throughout the several views.
Fig. 1 is a block diagram of the present invention whereby the disk drive subsystem employs a disk array.
Fig. 2 is a typical disk platter stack used for storing information.
Fig. 3 is a block diagram of the present invention illustrating the buffering operations used to synchronize the data transfers between the host computer and the disk drive subsystem.
Fig. 4 is a block diagram of the present invention that illustrates the connections for control and data cables between the disk controller and each individual disk drive in the disk array.
Fig. 5 illustrates the format of a sector on an individual disk drive of the present invention.
Fig. 6 shows the logical grouping of data on the disk array and its error detection and correction means.
Detailed Description of a Preferred Embodiment
In the following Detailed Description of the preferred embodiment, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. This embodiment is described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that structural changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
Referring initially to Fig. 1, a host computer 101 communicates with a disk drive subsystem 103 via an input/output channel 102. This communication includes both control information and data to be stored on the disk drive subsystem 103. The data is transmitted from the host computer 101 in, for example, 16-bit-wide parcels. Each bit of the 16-bit-wide parcel, plus a parity bit, is stored in a simultaneous, parallel operation on one of 17 standard disk drives 900a-900q in array 900. This parallel operation, rather than the standard serial operation, yields a storage transfer rate 17 times that of the standard architecture. Those skilled in the art will readily recognize that binary words of other sizes, such as 64 bits, may be transferred as parallel data bits and that a one-to-one correspondence between the number of parallel data bits and the number of disk drives in array 900 is not required. The disk controller 108 broadcasts control signals to the array of disk drives 900 simultaneously. This controller 108 provides an interface that appears to the host computer 101 as a single disk drive, thereby providing transparent operation and compatibility with existing computer systems. The disk drives in array 900 perform the same operations simultaneously because their disk platter stacks (also known as "spindles") are synchronized. This synchronization requires that all disk platters be aligned and that each disk drive reach its index mark at the same time. Those skilled in the art will readily recognize, given the teachings of the present invention, that a greater or lesser number of individual disk drives may be combined to form the array 900. In the preferred embodiment of the present invention, the interface between the disk drive subsystem 103 and the host computer 101 consists of two cables, designated Cable 104 and Cable 105. Each cable consists of 24 signals, described in Table 1.
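A minimal Python sketch of the bit striping described above, assuming "bit position 0" is the least significant bit of the parcel (the text does not fix the bit order). The function names are hypothetical. List index i holds the bit destined for the i-th drive (index 0 for drive 900a, index 1 for drive 900b, and so on), and index 16 holds the odd parity bit written to the seventeenth drive.

```python
def odd_parity_bit(value: int, width: int) -> int:
    """Return the bit that makes the total number of 1s (data + parity) odd."""
    ones = bin(value & ((1 << width) - 1)).count("1")
    return 0 if ones % 2 == 1 else 1


def stripe_parcel(parcel: int) -> list:
    """Split one 16-bit parcel into 17 bits, one per disk drive.

    Indices 0-15 are data bits (bit 0 assumed to be the LSB);
    index 16 is the odd parity bit for the parity drive.
    """
    if not 0 <= parcel <= 0xFFFF:
        raise ValueError("parcel must fit in 16 bits")
    bits = [(parcel >> i) & 1 for i in range(16)]
    bits.append(odd_parity_bit(parcel, 16))
    return bits


# 0x0001 sets only drive 0's bit; a single 1 is already odd, so parity is 0.
print(stripe_parcel(0x0001))
```

Because every drive receives one bit of every parcel, all 17 drives are written in the same bit time, which is why the text requires synchronized spindles.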
Table 1
Cable 104                 Cable 105
WRITE CLOCK               READ CLOCK
FUNCTION READY            STATUS/DATA READY
CODE 0-3                  ERROR
PARITY (CODE)             DONE
BUS OUT 0-15              READY
PARITY (BUS OUT)          INDEX/SECTOR MARK
                          PARITY (STATUS)
                          BUS IN 0-15
                          PARITY (BUS IN)
Signals originating with the host computer 101 are transmitted in Cable 104. Signals originating in the disk drive subsystem 103 are transmitted in Cable 105.
In the preferred embodiment, the signal WRITE CLOCK on cable 104 is a clock generated by the host computer 101. It synchronizes the transmission of commands and data to the disk drive subsystem 103. The four CODE signals indicate the function to be performed by the disk drive subsystem 103. The disk drive subsystem 103 examines and decodes these function codes only when the signal FUNCTION/DATA READY is active and the PARITY (CODE) signal indicates there are no errors. The PARITY (CODE) signal carries the odd parity value for the CODE signals. The signals BUS OUT0 through BUS OUT15 form a 16-bit data bus from the host computer 101 to the disk drive subsystem 103. Each time the signals WRITE CLOCK and FUNCTION/DATA READY are both active, the disk drive subsystem 103 can read 16 bits of data from the BUS OUT 0-15 signals. The PARITY (BUS OUT) signal transmits the odd parity value for the BUS OUT 0-15 signals. In the preferred embodiment, the signal READ CLOCK on cable 105 is a clock generated by the disk drive subsystem 103 for the synchronized transfer of status and data signals to the host computer 101. The disk drive subsystem 103 sets the signal STATUS/DATA READY active when it is transmitting data or status information to the host computer 101 on the BUS IN signals. During write functions, STATUS/DATA READY is pulsed active for a single READ CLOCK pulse to indicate that the disk drive subsystem 103 is ready to accept the next transfer.
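The acceptance rule for function codes can be sketched as follows. This is a hypothetical model, not the subsystem's actual logic: signals are represented as Python booleans and integers sampled at one WRITE CLOCK pulse, and because the text does not say what each 4-bit CODE value means, the sketch only returns the raw code.

```python
from typing import Optional


def odd_parity_ok(value: int, width: int, parity: int) -> bool:
    """True when the data bits plus the parity bit hold an odd number of 1s."""
    ones = bin(value & ((1 << width) - 1)).count("1") + (parity & 1)
    return ones % 2 == 1


def sample_function_code(write_clock: bool, function_data_ready: bool,
                         code: int, code_parity: int) -> Optional[int]:
    """Return CODE 0-3 if it may be decoded at this WRITE CLOCK pulse, else None."""
    if not (write_clock and function_data_ready):
        return None                      # nothing is being presented
    if not odd_parity_ok(code, 4, code_parity):
        return None                      # PARITY (CODE) error: do not decode
    return code & 0xF


print(sample_function_code(True, True, 0b0011, 1))   # two 1s + parity 1 = odd
print(sample_function_code(True, True, 0b0011, 0))   # even count: rejected
```

The same `odd_parity_ok` check applies to the 16 BUS OUT signals and their PARITY (BUS OUT) bit.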
The disk drive subsystem 103 sets the signal ERROR active in conjunction with the signal DONE to indicate that at least one error condition occurred during execution of the current function. The signal DONE indicates the completion of a command and is set active for only one pulse of the signal READ CLOCK. The signal READY indicates that the disk drive subsystem 103 is available to accept commands from the host computer 101.
The signal INDEX/SECTOR MARK carries coded index and sector mark information. The disk drive subsystem 103 sets INDEX/SECTOR MARK active for a single READ CLOCK pulse to indicate a sector mark. Sector mark signals establish rotational reference points along a given track on the disk stack. Each track in the preferred embodiment can contain up to 128 sectors, with the initial sector (sector 0) starting coincident with the index point. Sector marks are generated at the end of each physical sector on the track. The disk drive subsystem 103 sets INDEX/SECTOR MARK active for two consecutive READ CLOCK pulses to indicate an index mark. The index mark is a status indicator that, when active, indicates that the reference point or index area is passing under the head. The index point is a pulse which occurs once per revolution of the disk stack and is used to reset the timing circuits once each revolution. The index mark is generated at the beginning of sector 0.
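A sketch of how a receiver might decode the INDEX/SECTOR MARK signal, assuming the host samples it once per READ CLOCK pulse and that successive marks are separated by at least one inactive pulse (the text does not state a minimum spacing). Reporting runs longer than two pulses as invalid is an assumption, and `decode_marks` is a hypothetical name.

```python
def decode_marks(samples):
    """Decode INDEX/SECTOR MARK samples taken once per READ CLOCK pulse.

    One active pulse is a sector mark, two consecutive active pulses are an
    index mark. Returns a list of (pulse_index, kind) tuples.
    """
    events = []
    i, n = 0, len(samples)
    while i < n:
        if not samples[i]:
            i += 1
            continue
        start = i
        while i < n and samples[i]:      # measure the run of active pulses
            i += 1
        run = i - start
        if run == 1:
            events.append((start, "SECTOR"))
        elif run == 2:
            events.append((start, "INDEX"))
        else:
            events.append((start, "INVALID"))
    return events


print(decode_marks([0, 1, 0, 0, 1, 1, 0]))
```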
In the preferred embodiment, the signal PARITY (STATUS) carries the odd parity value for the combination of signals STATUS/DATA READY, ERROR, DONE, and READY. This parity value can be examined whenever the disk drive subsystem 103 sets the signal READY active. The signals BUS IN0 through BUS IN15 form a 16-bit data bus that transmits information from the disk drive subsystem 103 to the host computer 101. In the preferred embodiment, the signal PARITY (BUS IN) carries the odd parity value for the signals BUS IN0 through BUS IN15 during the READ CLOCK pulses in which the signal READY is active.
The commands generated by the host computer 101 include SELECT, READ, WRITE, HEAD SELECT, CYLINDER SELECT, DATA TRANSFER, other commands, and diagnostics. These commands are identified by a specific combination of active and inactive CODE signals. Recognition of a command by the disk drive subsystem 103 requires an active signal FUNCTION/DATA READY and a valid signal PARITY (CODE) during a WRITE CLOCK pulse.
When the host computer 101 wants to transfer data in or out of the disk drive subsystem 103, the first command it generates is SELECT. If the binary value transmitted by the BUS OUT signals during a SELECT command matches the logical unit number of the disk drive subsystem 103, the disk drive subsystem 103 recognizes that the host computer 101 wishes to converse with it.
After the disk drive subsystem 103 has been selected by the host computer 101, the next series of commands indicates the head and track to use during the READ or WRITE operation. The command HEAD SELECT specifies the head group to use for those disk drive subsystems that support multiple Read/Write heads on a single disk platter stack. Note that this embodiment assumes each individual disk drive in the array 900 is identical to the others. Therefore, this embodiment supports multiple-head drives and the HEAD SELECT command only if all drives are multiple-head drives. Fig. 2 depicts a disk platter stack with multiple Read/Write heads. The signals BUS OUT8 through BUS OUT10 indicate which head group to use. The command CYLINDER SELECT identifies which track to use. Those skilled in the art will readily recognize that single-head or single-platter disk drives could be substituted for the multiple-platter, multiple-head disk drives 900.
The interface protocol between the host computer and the disk drive controller 108 in the preferred embodiment of the present invention could be of the type described in my co-pending patent application SN 622,066, entitled "Electrical Interface System" and assigned to the assignee of the present invention. Those skilled in the art will readily recognize other interface systems that could be used. The READ command initiates the transfer of information from the disk drive subsystem 103 to the host computer 101. The BUS OUT signals indicate parameters to the READ command. The signals BUS OUT0 through BUS OUT7 identify which sector is desired. The signals BUS OUT8 through BUS OUT10 indicate which head to select following execution of the current READ command, on those disk drive subsystems that support multiple Read/Write heads on a single disk platter stack. The signals BUS OUT12 through BUS OUT14 indicate a "read option" as follows:
1) Verify the sector ID and read the data portion of the sector;
2) Read the sector ID;
3) Read sector data only;
4) Read the current contents of controller memory;
5) Read the sector ECC (ERROR CORRECTION CODE) information.
The READ command is indicated by a specific CODE combination and an active FUNCTION/READY signal. The READ command is the implied request for a first data packet comprising 16 serially transmitted words, each word being 16 bits of parallel data (termed a 16x16 packet). If an error is detected in the beginning phase of the READ process (e.g., reading the sector ID, comparing the ID, verifying that the sector is not flawed, etc.), then the DONE and ERROR signals are returned. If no error is detected, the first data packet is returned. To continue the READ process, FUNCTION/READY is activated along with the specific CODE combination for "DATA" during a single WRITE CLOCK period. This process is repeated 128 times to read a 4096-byte data sector. When all the data has been transferred, the disk drive subsystem 103 responds with an active DONE signal, together with the ERROR signal if an error occurred.
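The BUS OUT parameter layout for READ can be sketched as a pair of pack/unpack helpers. The helper names are hypothetical, bit 0 is assumed to be the least significant bit, and since the text does not say which 3-bit value selects which of the five read options, the option is kept as a raw field. BUS OUT11 and BUS OUT15 are not assigned in the text and are left at 0 here.

```python
def encode_read_params(sector: int, next_head: int, read_option: int) -> int:
    """Pack READ parameters into a 16-bit BUS OUT value."""
    if not 0 <= sector <= 0xFF:
        raise ValueError("sector uses BUS OUT0-7 (8 bits)")
    if not 0 <= next_head <= 0x7:
        raise ValueError("head uses BUS OUT8-10 (3 bits)")
    if not 0 <= read_option <= 0x7:
        raise ValueError("read option uses BUS OUT12-14 (3 bits)")
    return sector | (next_head << 8) | (read_option << 12)


def decode_read_params(bus_out: int) -> dict:
    """Unpack the fields of a 16-bit BUS OUT value sent with READ."""
    if not 0 <= bus_out <= 0xFFFF:
        raise ValueError("BUS OUT is 16 bits wide")
    return {
        "sector": bus_out & 0xFF,              # BUS OUT0-7
        "next_head": (bus_out >> 8) & 0x7,     # BUS OUT8-10
        "read_option": (bus_out >> 12) & 0x7,  # BUS OUT12-14
    }


word = encode_read_params(sector=42, next_head=3, read_option=5)
print(hex(word), decode_read_params(word))
```

The WRITE command described below uses the same BUS OUT positions for its sector, head, and option fields.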
The 16x16 packet approach is preferred for its higher transmission efficiency, since a minimum amount of handshaking protocol is required to transmit the digital data. For each request, 16 words of 16-bit data are transmitted in a packet, providing a highly efficient disk channel protocol. This packet transfer technique is described in U.S. Patent Application Serial No. 644,066 of Robert James Halford, entitled "Electrical Interface System", assigned to the assignee of the present invention.
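The "repeated 128 times" figure follows directly from the packet size. A short Python check of the arithmetic, using only numbers stated in the text:

```python
WORDS_PER_PACKET = 16   # a 16x16 packet holds 16 words...
BITS_PER_WORD = 16      # ...of 16 parallel bits each
SECTOR_BYTES = 4096     # data sector size used in the READ/WRITE description

bytes_per_packet = WORDS_PER_PACKET * BITS_PER_WORD // 8   # 256 bits = 32 bytes
packets_per_sector = SECTOR_BYTES // bytes_per_packet      # 4096 / 32 = 128

print(bytes_per_packet, packets_per_sector)  # 32 128
```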
The WRITE command transfers information from the host computer 101 to the disk drive subsystem 103. The BUS OUT signals indicate specific parameters to the WRITE command. The signals BUS OUT0 through BUS OUT7 indicate the specific sector where the data is to be written. The signals BUS OUT8 through BUS OUT10 indicate the head to be selected following execution of the current WRITE command, for those disk drive subsystems that support multiple Read/Write heads on a single disk platter stack. The signals BUS OUT12 through BUS OUT14 indicate the write options as follows:
1) Verify the sector ID and write the data portion of the sector;
2) Write the sector ID;
3) Write a defective sector ID;
4) Write data only to the controller memory;
5) Write the data with a zero ECC block.
The WRITE command is indicated by a specific CODE combination and an active FUNCTION/READY signal. The WRITE command is the implied request for the first data packet (16x16 bits). If an error is detected in the beginning phases of the WRITE process (e.g., reading the ID field, comparing the ID field, etc.), then the DONE and ERROR signals are returned. If no error is detected, the first data packet is transferred. To continue the WRITE process, FUNCTION/READY is activated along with the specific CODE for "DATA" for 16 WRITE CLOCK periods. This process is repeated 128 times to write a 4096-byte data sector. When all the data has been transferred, the disk drive subsystem 103 responds with an active DONE signal, together with the ERROR signal if an error occurred.
Disk controllers generally provide a means for synchronizing data transfers between a host computer and a disk drive. In this preferred embodiment, the disk drive subsystem 103 contains a controller 108 that includes a buffer memory 113 for temporarily storing data, thus giving the host computer 101 or the disk drive subsystem 103 more time to respond. Additionally, data transfer instructions may include the number of bits to be transferred, and the controller 108 may be used to keep track of that count. In the preferred embodiment, one level of buffering is provided in the input/output channel 102 and two levels of buffering are provided within the controller 108.
Referring now to Fig. 3, in the preferred embodiment of the present invention the host computer 101 can transfer data at a rate of 850 Mbits/sec. However, an individual disk drive within the disk array 900, for example drive 900a, can only store or retrieve data at a rate of 10 to 24 Mbits/sec. Therefore, the input/output channel 102 and the controller 108 perform buffering and synchronization functions using a speed matching technique.
The input/output channel 102 provides dual speed matching buffers 371 and 372. Using these dual buffers, the channel 102 transfers data into buffer 371 first; when buffer 371 is full, it switches to buffer 372 and continues transferring data into it. While data is being transferred into buffer 372, data can be transferred out of buffer 371 to the disk controller buffers 113. This speed matching technique readily accommodates the transfer of data from the 850 Mbits/sec interface to the multiple 10 to 24 Mbits/sec individual disk interfaces.
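A single-threaded Python model of the dual (ping-pong) buffering described above. It is an illustration under stated assumptions, not the channel hardware: `PingPongBuffer` is a hypothetical name, `put` stands for the fast producer (the 850 Mbits/sec side), `drain` for the slower consumer, and a `False` return models the producer having to wait while both buffers are full.

```python
from collections import deque


class PingPongBuffer:
    """Two fixed-size buffers used alternately: fill one while the other drains."""

    def __init__(self, size: int):
        self.size = size
        self.bufs = [[], []]
        self.fill = 0          # index of the buffer currently being filled
        self.full = deque()    # indices of full buffers, oldest first

    def put(self, item) -> bool:
        """Add one item; return False if the producer must wait."""
        buf = self.bufs[self.fill]
        if len(buf) == self.size:           # both buffers full: stall
            return False
        buf.append(item)
        if len(buf) == self.size:
            self.full.append(self.fill)
            other = self.fill ^ 1
            if not self.bufs[other]:        # switch only to an empty buffer
                self.fill = other
        return True

    def drain(self) -> list:
        """Remove and return the oldest full buffer's contents (or [])."""
        if not self.full:
            return []
        idx = self.full.popleft()
        data, self.bufs[idx] = self.bufs[idx], []
        if len(self.bufs[self.fill]) == self.size:   # producer was stalled
            self.fill = idx
        return data


pb = PingPongBuffer(2)
for ch in "abcd":
    pb.put(ch)
print(pb.put("e"), pb.drain())   # producer stalls until buffer 0 drains
```

The design choice mirrors the text: the fast side never waits for the slow side except when both halves are full, so the average rate is set by the slower consumer while bursts are absorbed.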
Each data transfer between the channel 102 and the disk controller buffers 113 uses the buffer locations 381 and 382. Each transfer into buffer 381 or 382 transmits 16 bits of data. A parity check is performed at that time, and the 16 data bits plus the parity bit are stored in the speed matching buffers 381 and 382. The data stored in buffers 381 and 382 is then transmitted serially to the seventeen pairs of dual speed matching buffers 301 through 334. When data is transferred to these dual speed matching buffers, each bit of the 16-bit parcel and the single parity bit is routed to a particular pair of speed matching buffers according to its position within the parcel.
To illustrate, seventeen bits are transferred out of buffer 381. Each bit is routed along a data path associated with a unique pair of speed matching buffers and an individual disk drive, as follows: bit position 0 is routed along the data path associated with speed matching buffers 301-302 and disk drive 900a; bit position 1 is routed along the data path associated with speed matching buffers 303-304 and disk drive 900b; and so on.
Conversely, when data is transferred from the individual disk drives into the speed matching buffers, the process is reversed. Each bit retrieved from a disk drive is stored in a unique pair of speed matching buffers. The transfer from buffers 301 through 334 into buffer 381 or 382 combines the bits into a 16-bit parcel plus parity bit, ordered as follows: the bit retrieved from disk drive 900a provides the bit for position 0, the bit retrieved from disk drive 900b provides the bit for position 1, and so on.
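The routing in both directions amounts to transposing parcels into per-drive bit streams and back. A minimal sketch, assuming bit position 0 is the least significant bit and modeling each drive's stream as a Python list; `scatter_parcels` and `gather_parcels` are hypothetical names. The read path here rechecks each parcel's odd parity, mirroring the parity check the text describes at buffers 381 and 382; applying it on reads is an assumption.

```python
NUM_DATA = 16
NUM_DRIVES = NUM_DATA + 1  # 16 data drives + 1 parity drive


def scatter_parcels(parcels):
    """Write path: turn 16-bit parcels into 17 per-drive bit streams."""
    streams = [[] for _ in range(NUM_DRIVES)]
    for p in parcels:
        for d in range(NUM_DATA):
            streams[d].append((p >> d) & 1)            # bit d -> drive d
        ones = bin(p & 0xFFFF).count("1")
        streams[NUM_DATA].append(0 if ones % 2 else 1)  # odd parity bit
    return streams


def gather_parcels(streams):
    """Read path: rebuild (parcel, parity_ok) pairs from 17 bit streams."""
    if len(streams) != NUM_DRIVES:
        raise ValueError("expected one stream per drive")
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("streams must be the same length (synchronized drives)")
    result = []
    for t in range(length):
        parcel = 0
        for d in range(NUM_DATA):
            parcel |= (streams[d][t] & 1) << d         # drive d -> bit d
        ones = bin(parcel).count("1") + (streams[NUM_DATA][t] & 1)
        result.append((parcel, ones % 2 == 1))
    return result


words = [0x0000, 0x1234, 0xFFFF]
print(gather_parcels(scatter_parcels(words)))
```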
This embodiment provides a storage capacity equal to 17 times that of an individual disk drive because seventeen parallel disk drives operate in unison. Although an individual disk drive has a data transfer rate of 10-24 Mbits/sec, this embodiment provides an overall data transfer rate of 170-408 Mbits/sec, or 17 times the transfer rate of an individual disk drive. (Sixteen of the 17 disk drives store data; the 17th stores the parity bit for error detection, so the effective data transfer rate seen by the user is 160-384 Mbits/sec.) The seek time, or the time required to address a particular track, is equal to that of an individual disk drive. Because this embodiment uses synchronized spindles, the latency period is equal to the average case for an individual disk drive.
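The rate figures above are simple multiples of the per-drive rate; a quick check in Python, using only the numbers stated in the text (`array_rates` is a hypothetical helper name):

```python
def array_rates(drive_mbits: float, drives: int = 17, data_drives: int = 16):
    """Return (raw, effective) array transfer rates in Mbits/sec.

    raw counts all drives; effective excludes the parity drive.
    """
    return drive_mbits * drives, drive_mbits * data_drives


for rate in (10, 24):
    raw, effective = array_rates(rate)
    print(f"{rate} Mbits/sec per drive -> {raw} raw, {effective} effective")
```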
Referring to Fig. 4, two kinds of cables connect the common disk controller 108 to the individual disk drives: a Control cable 168 and multiple Read/Write cables 150 through 167. In the preferred embodiment of the present invention, the disk drives are "daisy chained" on the Control cable 168, although this technique is not essential. Each individual disk drive requires a direct Read/Write cable connection to the disk controller.
The Control cable 168 contains the BUSg through BUSg signals, the TAG0 through TAG2 signals, the STATUS0 through STATUS8 signals, the OPEN CABLE signal, the SELECT ENABLE signal, the UNIT SELECT through UNIT SELECT3 signals, and the SERVO CLOCK signal. The Read/Write cable contains the signals READ CLOCK, READ DATA, WRITE CLOCK, WRITE DATA, UNIT SELECTED, and SEEK END.
When the signal DEVICE ENABLE is active, drive selection and operation are allowed. The disk controller 108 responds with SELECT ENABLE. Each disk drive compares its manually set address with the value indicated by the UNIT SELECT signals. If the values match, the selected disk drive recognizes that the disk controller wishes to converse with it. In this embodiment, all disk drives are set to the same address, so all of them respond simultaneously and in parallel to the same UNIT SELECT signal.
The TAG signals indicate the type of function to be performed. Again, all disk drives in array 900 respond simultaneously and in parallel. TAG0 tells the disk drive to move its Read/Write head to the location indicated by the address represented by the signals BUSg through BUSg. The signal SEEK END indicates that the Read/Write head is stationary, that it is positioned over a track, and that it is ready to perform the next data transfer operation. TAG1 indicates which Read/Write head to use on the data transfer operation; the head group is indicated by the signals BUS0 through BUS3. The signal TAG2 indicates that the signals represented by BUSg through BUSg should be interpreted by the disk drive as a function code such as READ, WRITE, or some other command.
When control signal TAG2 indicates a function, an active BUS0 signal indicates a WRITE operation. The WRITE CLOCK signal provides the timing pulse for the WRITE DATA line, on which data is transmitted to the disk drive. The BUS0 indicator causes the disk drive to enable its write circuitry, which converts the data on the WRITE DATA line to current reversals in the Read/Write head, thereby recording information on a selected track on the disk stack.
When control signal TAG2 indicates a function, an active BUS1 signal indicates a READ operation. The READ CLOCK signal provides the timing pulse for the signal READ DATA, on which data is transmitted to the controller. The BUS1 indicator causes the disk drive to enable its read circuitry, which converts the analog information recorded on a given track to digitized data that is transmitted through the READ DATA line back to the disk controller.
When control signal TAG2 indicates a function then an active BUSg signal indicates a REZERO operation. This indicator will cause the disk drive head positioner to reposition to cylinder 0.
Referring now to Fig. 5, the record format on the disk is under the control of the disk controller. The index and sector mark pulses are available for use by the controller to indicate the beginning of a track or sector. The preferred sector format is shown in Fig. 5. Those skilled in the art will readily recognize that other formats could be used.
The field HEAD SCATTER 622 provides some tolerance for head skew and consists of binary 0's. The field BINARY 0's SYNC 623 consists of multiple binary 0's used to provide phase-locked oscillator synchronization and some timing tolerance. The field BINARY 1's SYNC 624 consists of one or more binary 1's used to indicate the beginning of either the sector ID 625 or the sector data 628. The field EOR 629 (End of Record Tolerance) is an 8-bit pad of binary 0's that prevents possible destruction of the end of a sector by an out-of-skew Read/Write head attempting to address the following sector.
To execute a WRITE command in the preferred embodiment, the following steps occur. First, as mentioned earlier, the desired unit is selected and the track, head, and sector are addressed. The disk controller 108 searches for the leading edge of the desired sector. The sector ID field 625 must always be read and verified before the sector data field 628 is written. The sector data field 628 must always be preceded by the BINARY 0's SYNC 626 and BINARY 1's SYNC 627 fields. BINARY 0's SYNC 626 also includes a HEAD SCATTER portion, similar in operation to HEAD SCATTER 622, that provides tolerance for head skew. The sector data field 628 must always be followed by the EOR pad 629.
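The ordering rules for a data write can be captured in a small sketch. It assumes that the Fig. 5 fields named in the text appear in the order of their reference numerals (the figure itself may contain fields not named here), treats the sector ID as an opaque value compared for equality, and uses the hypothetical names `SECTOR_FIELDS`, `SectorIdMismatch`, and `plan_data_write`.

```python
SECTOR_FIELDS = [            # (reference numeral, field name)
    (622, "HEAD SCATTER"),
    (623, "BINARY 0's SYNC"),
    (624, "BINARY 1's SYNC"),
    (625, "SECTOR ID"),
    (626, "BINARY 0's SYNC (with HEAD SCATTER portion)"),
    (627, "BINARY 1's SYNC"),
    (628, "SECTOR DATA"),
    (629, "EOR pad"),
]


class SectorIdMismatch(Exception):
    """Raised when the ID read from the disk is not the addressed sector."""


def plan_data_write(id_read, id_expected):
    """Return the fields written, in order, for a WRITE of sector data.

    The ID field 625 is read and verified first and is not rewritten here;
    the data field is preceded by syncs 626/627 and followed by EOR 629.
    """
    if id_read != id_expected:
        raise SectorIdMismatch(f"read ID {id_read!r}, expected {id_expected!r}")
    return [name for num, name in SECTOR_FIELDS if num > 625]


print(plan_data_write(("cyl 5", "head 2", "sector 17"),
                      ("cyl 5", "head 2", "sector 17")))
```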
Spindle synchronization is accomplished in the preferred embodiment by developing an external clock within the disk controller 108. This external clock provides a master pulse once per revolution, with possible extra pulses every N microseconds. The MASTER CLOCK signal provides the disk controller with a method for synchronizing the WRITE DATA line with the actual speed of the disk platters. In this embodiment, one MASTER CLOCK signal is broadcast to all individual disk drives to force spindle synchronization.
Spindle synchronization is known in the prior art, and means for synchronizing the spindles of a plurality of disk drives are available on certain models of Ampex disk drives. For example, Ampex model No. 9397 14-inch disk drives use a synchronizing pin to synchronize the AC motors driving the spindles. This synchronization function was used for video instant replay of digitized images. The 14-inch disk drives were synchronized by an external clock driving the synchronization pin of each drive, and the sector zero marks of each platter were then synchronized with 4-microsecond skew.
For the present invention to operate properly, skew must be kept below 10 microseconds for synchronized drives. The lower the skew, the higher the data packing capacity of the present invention. The synchronization of spindles must occur at a very fine level. The disk platter typically rotates at 3600 rpm, which results in one revolution every 0.0166 seconds. If there are 64 sectors per track, then the sector time (the period during which the sector is under the read/write head) is 0.0166 divided by 64, or 0.000260 seconds.
As stated earlier, leading binary 0's are placed in the BINARY 0's SYNC field that is inserted just before the sector ID block or the sector data block. If the spindles can be synchronized to a tolerance of plus or minus 12.5 microseconds (0.0000125 seconds) and the sector time is 260 microseconds (0.000260 seconds), then at least 25 microseconds, or about 10% of the sector space, must be given up to the BINARY 0's SYNC field to ensure that all disk drives begin reading information from the same sector on the same track at the same time. In practice, if the tolerance is N microseconds, it is more realistic to allow 5×N microseconds to ensure spindle synchronization. The level of spindle synchronization is therefore important. For example, if the spindles can be synchronized only to a tolerance of plus or minus 80 microseconds, then the BINARY 0's SYNC field would require 33% of the sector space (for 64 sectors per track). The present invention would still work in that case.
The optimal synchronization tolerance for this approach is near one microsecond.
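The sector-time and SYNC-field arithmetic above can be reproduced with a short calculation. This is a minimal sketch, assuming the SYNC field must span the full plus-or-minus tolerance window (2 × tolerance, which gives the 25 microseconds quoted for ±12.5 microseconds) and treating the 5×N rule of thumb as an alternative margin; the function names are hypothetical.

```python
RPM = 3600
SECTORS_PER_TRACK = 64


def sector_time_us(rpm: int = RPM, sectors: int = SECTORS_PER_TRACK) -> float:
    """Microseconds that one sector spends under the read/write head."""
    revolution_us = 60.0 / rpm * 1_000_000  # 3600 rpm -> about 16,667 us per revolution
    return revolution_us / sectors


def sync_field_fraction(tolerance_us: float, margin: float = 2.0) -> float:
    """Fraction of a sector given up to a SYNC field of margin * tolerance.

    margin=2.0 covers the full +/- tolerance window (assumption);
    margin=5.0 applies the more conservative 5xN rule of thumb.
    """
    return margin * tolerance_us / sector_time_us()


print(f"sector time: {sector_time_us():.1f} us")                 # 260.4 us
print(f"+/-12.5 us, 2N: {sync_field_fraction(12.5):.1%}")        # 9.6%
print(f"+/-12.5 us, 5N: {sync_field_fraction(12.5, 5.0):.1%}")   # 24.0%
```

The 2N case reproduces the "about 10%" figure; the 5N case shows how quickly the conservative margin eats into sector space, which is why tighter synchronization raises packing capacity.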
Fig. 6 shows logically how data is stored on the parallel disk drives in the array 900. Each row (horizontal) represents bits stored in a single sector of a single track on a single disk drive. Each column (vertical) represents a 16-bit word transferred by the host computer 100, and each bit of the word is stored on a different disk drive. The parity bit on the seventeenth disk drive is generated by the disk controller for error detection purposes. The data is logically grouped in 15-word segments. Each 15-word segment includes error detection and correction means, labeled in Fig. 6 as bits E0 through E15. Each sector includes an additional word for redundant error correction and detection, labeled in Fig. 6 as bits C0 through C15.
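The row/column layout of Fig. 6 can be sketched as bit-slicing each host word across the drives. This is a minimal sketch, assuming bit d of each 16-bit word goes to drive d and the seventeenth drive (index 16) holds the odd parity of each column. The helper names are hypothetical, and the ECC words E0-E15 and C0-C15 are not modeled because the text does not specify their code.

```python
from typing import List

DATA_DRIVES = 16            # one drive per bit of a 16-bit host word
PARITY_DRIVE = DATA_DRIVES  # index 16 = the seventeenth drive


def odd_parity(bits: List[int]) -> int:
    """Return the bit that makes the total number of 1s (bits + parity) odd."""
    return 1 - sum(bits) % 2


def stripe_words(words: List[int]) -> List[List[int]]:
    """Split host words into 17 per-drive bit rows, as in Fig. 6.

    rows[d][w] is the bit of word w stored on drive d; rows[16][w] is
    the odd parity bit of column w.
    """
    rows: List[List[int]] = [[] for _ in range(DATA_DRIVES + 1)]
    for word in words:
        if not 0 <= word < (1 << DATA_DRIVES):
            raise ValueError("host words must fit in 16 bits")
        column = [(word >> d) & 1 for d in range(DATA_DRIVES)]
        for d, bit in enumerate(column):
            rows[d].append(bit)
        rows[PARITY_DRIVE].append(odd_parity(column))
    return rows


def unstripe(rows: List[List[int]]) -> List[int]:
    """Reassemble host words from the 16 data rows (parity row ignored)."""
    words = []
    for w in range(len(rows[0])):
        word = 0
        for d in range(DATA_DRIVES):
            word |= rows[d][w] << d
        words.append(word)
    return words


rows = stripe_words([0x0000, 0xFFFF, 0x1234])
print(rows[PARITY_DRIVE])                # [1, 1, 0]
print([hex(w) for w in unstripe(rows)])  # ['0x0', '0xffff', '0x1234']
```

Because every drive holds one bit of every word, all 17 drives must read the same sector at the same time, which is what the spindle synchronization described earlier provides.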
Using this storage method, together with error detection and correction circuits in the disk controller that manipulate the Error Correction Code (ECC) bits E0-E15, any track in error can be corrected for any 15-word segment. In Fig. 6, the P bits (beginning with P0) are "vertical" parity bits. They contain the odd parity value for the column of bits. The bits labeled E0 through E15 in Fig. 6 are members of the Error Correction Code (ECC) value for the block. The combination of ECC and parity check bits enables the identification and correction of all failing bits on any single disk drive within a 15-word segment. The row, or disk drive, in error can change every sixteen bits in the case of randomly detected unflawed media defects. As an additional check, an ECC is generated over the entire sector and stored as a vertical "word" immediately following the last group in the sector. This ECC verifies that the sector was repaired correctly.
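The parity half of this repair can be sketched as rebuilding one drive's row from the other sixteen. This is a minimal sketch, assuming odd column parity as described and assuming the failing drive has already been identified (in the text, that is the role of the ECC bits, which are not modeled here); `columns_to_rows` and `rebuild_row` are hypothetical names.

```python
from typing import List


def columns_to_rows(words: List[int]) -> List[List[int]]:
    """Hypothetical helper: 16 data rows (bit d on drive d) plus an odd-parity row."""
    rows = [[(w >> d) & 1 for w in words] for d in range(16)]
    parity = [1 - bin(w).count("1") % 2 for w in words]
    return rows + [parity]


def rebuild_row(rows: List[List[int]], failed: int) -> List[int]:
    """Recompute every bit of drive `failed` from the other 16 drives.

    Each column must contain an odd number of 1s, so the missing bit is 1
    exactly when the surviving bits of that column sum to an even number.
    """
    rebuilt = []
    for w in range(len(rows[0])):
        ones = sum(rows[d][w] for d in range(len(rows)) if d != failed)
        rebuilt.append(1 - ones % 2)
    return rebuilt


if __name__ == "__main__":
    words = [0x1234, 0xBEEF, 0x0000, 0xFFFF]
    rows = columns_to_rows(words)
    expected = rows[5][:]
    rows[5] = [1 - b for b in rows[5]]  # simulate drive 5 returning bad bits
    assert rebuild_row(rows, 5) == expected
    print("drive 5 rebuilt")
```

Parity alone can repair a drive only once that drive is known; the ECC bits supply the location, which is why the text pairs the two.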
Thus, a plurality of single disk drives, synchronized and controlled to emulate the operation of a single disk drive, is a cost-effective means of providing high-performance, high-capacity disk storage devices. Another benefit of the present invention is the reliability and cost-effectiveness of using just one disk controller to manage a plurality of disk drives. Yet another benefit is that the number of cables connecting the host computer with the disk storage device is reduced relative to the number of disk drives communicating with the host computer. A high degree of fault tolerance is provided through the use of parity bits and sector ECC parcels, such that one disk drive within the array can fail without interrupting data storage and retrieval.
While the present invention has been described in connection with the preferred embodiment thereof, it will be understood that many modifications will be readily apparent to those of ordinary skill in the art, and this application is intended to cover any adaptations or variations thereof. Therefore, it is manifestly intended that this invention be limited only by the claims and the equivalents thereof.

Claims

WHAT IS CLAIMED IS:
1. A high speed data storage device for computers, comprising: a plurality of disk drives, each having synchronized spindles; and control means for managing the data storage device, the control means comprising: means for synchronizing the spindles of each disk drive within said plurality of disk drives; means for dividing and storing portions of a logical unit of data on each disk drive within said plurality of disk drives; means for retrieving and reconstructing said portions of the logical unit of data stored from each disk drive within said plurality of disk drives; and means for emulating a single disk drive when communicating with the computer such that the high speed data storage device can be accessed without alteration to the computer's standard software or hardware interface.
2. A device in accordance with claim 1 wherein the means for storing further comprises storing each portion of the logical unit of data simultaneously on each disk drive within said plurality of disk drives and wherein the means for retrieving further comprises retrieving each portion of the logical unit of data simultaneously from each disk drive within the plurality of disk drives.
3. A device in accordance with claim 1 wherein said logical unit of data comprises a multibit binary word and said portion comprises 1 bit.
4. A device in accordance with claim 1 wherein said means for synchronizing the spindle of each disk drive within said plurality of disk drives includes a master clock signal broadcast to all disk drives to synchronize the spindles.
5. A high speed disk array for emulating the operation of a single disk drive, comprising: a data input/output interface means for connecting to a computer for receiving/transmitting digital data; a plurality of disk storage devices, each having spindles synchronized with each other; a common controller connected to said interface means and connected for controlling each of said disk storage devices for performing the steps of:
(a) receiving said digital data from said data interface means;
(b) dividing said digital data into subparts;
(c) assigning error detection codes to said subparts;
(d) formatting said subparts into a plurality of fixed sector format subparts; and
(e) simultaneously writing each fixed sector format subpart to one of said plurality of disk storage devices.
6. The disk array according to claim 5 wherein said common controller is further operable for performing the steps of:
(f) simultaneously reading each fixed sector format subpart from each of said plurality of disk storage devices;
(g) unformatting said subparts from each fixed sector format subpart;
(h) reassembling said digital data from said unformatted subparts; and
(i) transmitting said reconstructed digital data to said data interface.
7. A method of emulating a single disk drive using a plurality of disk drives, comprising the steps of:
(a) transmitting a plurality of multi-bit digital data words from a computer to a common controller;
(b) dividing each word of said plurality of multi-bit digital data words into a plurality of subparts;
(c) assembling a sector format block for each disk drive of the plurality of disk drives to include a subpart from each word of said plurality of multi-bit digital data words;
(d) assigning error detection codes and synchronization codes to each sector format block; and
(e) simultaneously writing each sector format block to each disk drive of said plurality of disk drives.
PCT/US1989/002262 1988-08-02 1989-05-23 Single disk emulation for synchronous disk array WO1990001737A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US22736788A 1988-08-02 1988-08-02
US227,367 1988-08-02

Publications (1)

Publication Number Publication Date
WO1990001737A1 true WO1990001737A1 (en) 1990-02-22

Family

ID=22852813

Country Status (1)

Country Link
WO (1) WO1990001737A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS57197664A (en) * 1981-05-29 1982-12-03 Hitachi Ltd Disc subsystem
EP0156724A1 (en) * 1984-03-16 1985-10-02 Bull S.A. Recording method for a disc memory, and a disc memory system
EP0242121A2 (en) * 1986-04-10 1987-10-21 Sony Corporation Synchronising systems for digital apparatus
EP0320107A2 (en) * 1987-11-06 1989-06-14 Micropolis Corporation Parallel disk drive array storage system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PATENT ABSTRACTS OF JAPAN, Vol. 7, No. 51 (P-179) (1196), 26 February 1983; & JP-A-57197664 (Hitachi Seisakusho K.K.) 3 December 1982 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU630635B2 (en) * 1988-11-14 1992-11-05 Emc Corporation Arrayed disk drive system and method
EP0450801A2 (en) * 1990-03-30 1991-10-09 International Business Machines Corporation Disk drive array storage system
EP0450801A3 (en) * 1990-03-30 1994-02-16 Ibm
EP0520707A2 (en) * 1991-06-24 1992-12-30 International Business Machines Corporation Data storage apparatus
EP0520707A3 (en) * 1991-06-24 1994-05-25 Ibm Data storage apparatus
EP0540355A2 (en) * 1991-11-01 1993-05-05 Fujitsu Limited Rotation synchronous control system
EP0540355A3 (en) * 1991-11-01 1993-06-23 Fujitsu Limited Rotation synchronous control system
US5877913A (en) * 1991-11-01 1999-03-02 Fujitsu Limited Rotation synchronous control system
GB2278487B (en) * 1993-05-24 1997-06-18 Mitsubishi Electric Corp Improved recording apparatus and method for an arrayed recording apparatus

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LU NL SE