WO1997049037A1 - Novel cache memory structure and method
- Publication number: WO1997049037A1 (PCT/US1997/010155)
- Authority: WO (WIPO, PCT)
- Prior art keywords: cache, data, cache memory, lru, entry
Classifications
- G06F12/128—Replacement control using replacement algorithms adapted to multidimensional cache systems, e.g. set-associative, multicache, multiset or multilevel
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, for peripheral storage systems, e.g. disk cache
- G06F12/123—Replacement control using replacement algorithms with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list
- G06F2212/312—Providing disk cache in a specific location of a storage system; in storage controller
Definitions
- This invention relates to a high performance computer data storage device utilizing a combination of solid state storage and one or more mass memories, such as a rotating magnetic disk device.
- A typical caching system uses a single solid state memory unit as a holding area for a string of magnetic disks, allowing certain information to be stored in a high speed cache memory and thereby increasing performance as compared to the use solely of lower speed disk memories; that is, for some percentage of accesses the requested data is already contained in the high speed cache memory, allowing faster access than when that data is stored only on a disk drive.
- Host computer 101 communicates with the entire string 102 of disks 102-1 through 102-N through cache unit 103 and host interface 104, such as a Small Computer Systems Interface (SCSI). All data going to or from disk string 102 passes through the cache-to-disk data path consisting of host interface 104, cache unit 103, and disk interface 105.
- Cache unit 103 manages the caching of data and services requests from host computer 101.
- Major components of cache unit 103 include microprocessor 103-1, cache management hardware 103-2, cache management firmware 103-3, address lookup table 103-4, and solid state cache memory 103-5.
- The prior art cache system of Figure 1 is intended to hold frequently accessed data in a solid state memory area so as to give more rapid access to that data than would be achieved if the same data were accessed from the disk media.
- Such cache systems are quite effective when attached to certain host computers and under certain work loads.
- The single cache memory 103-5 is used in conjunction with all disks in disk string 102. Data from any of the disks may reside in cache memory 103-5 at any given time. The most frequently accessed data is given precedence for caching regardless of the disk drive on which it resides.
- The determination of whether or not the data is in cache memory 103-5, and the location of that data in cache memory 103-5, is usually made via hashing schemes and table search operations. Hashing schemes and table searches can introduce time delays of their own which can defeat the purpose of the cache unit itself. Performance is very sensitive to cache-hit rates. Due to caching overhead and queuing times, a low hit rate in a typical string oriented cache system can result in overall performance that is poorer than that of an equivalently configured uncached string of disks.
- The size of cache memory 103-5 relative to the capacity of disk drives 102 is generally low.
- An apparently obvious technique to remedy a low hit rate is to increase the cache memory 103-5 size.
- With limited cache memory 103-5 capacity, a multitude of requests over a variety of data segments exhausts the capability of the cache system to retain the desirable data in cache memory 103-5. Often, data which would be reused in the near future is decached prematurely to make room in cache memory 103-5 for handling new requests from host computer 101. The result is a reduced cache hit rate.
- A reduced hit rate increases the number of disk accesses; increased disk accesses increase the contention on the data path.
- A self-defeating cycle is instituted. "Background" cache-ahead operations are limited since the data transferred during such cache-ahead operations travels over the same data path as, and often conflicts with, data transferred to service direct requests from the host computer 101.
- The data path between cache unit 103 and disk string 102 can easily be overloaded. All data to and from any of the disks in disk string 102, whether for satisfying requests from host computer 101 or for cache management purposes, travels across the cache-to-disk path. This creates a bottleneck if a large amount of prefetching of data from disk string 102 to cache memory 103-5 occurs.
- Each attempt to prefetch data from disk string 102 into cache memory 103-5 potentially creates contention for the path with data being communicated between any of the disk drives of disk string 102 and host computer 101.
- Prefetching of data into cache memory 103-5 must therefore be judiciously limited; increasing the size of cache memory 103-5 beyond a certain limit does not produce corresponding improvements in the performance of the cache system. This initiates a string of related phenomena.
- Cache-ahead management is often limited to fetching an extra succeeding track of data from disk whenever a read command from the host cannot be fulfilled from the cached data. This technique helps minimize the tendency of cache-ahead to increase the queuing of requests waiting for the path between cache memory 103-5 and disk string 102.
- One of the concepts on which caching is based is that data accesses tend to be concentrated within a given locality within a reasonably short time frame.
- Cache memory 103-5 is generally volatile; the data is lost if power to the unit is removed. This characteristic, coupled with the possibility of unexpected power outages, has generally imposed a write-through design for handling data transferred from host computer 101 to the cached string. In such a design, all writes from the host are written directly to disk; handled at disk speed, these operations are subject to all the inherent time delays of seek, latency, and lower transfer rates commonly associated with disk operations.
- A solid state storage device has high-speed response, but at a relatively high cost per megabyte of storage.
- A rotating magnetic disk, optical disk, or other mass media provides high storage capacity at a relatively low cost per megabyte, but with a low-speed response.
- The teachings of this invention provide a hybrid solid state and mass storage device which gives near solid state speed at a cost per megabyte approaching that of the mass storage device.
- Embodiments will be described with regard to magnetic disk media. However, it is to be understood that the teachings of this invention are equally applicable to other types of mass storage devices, including optical disk devices, and the like.
- The hardware features include: one or more rotating magnetic disk media; an ample solid state storage capacity; private channels between the disks and the solid state storage device; and high speed microprocessors to gather the intelligence, make data management decisions, and carry out the various data management tasks.
- The firmware features include the logic for gathering the historical data, making management decisions, and instructing the hardware to carry out the various data management operations.
- Important aspects of the firmware include making the decisions regarding the retention of data in the solid state memory based on usage history gathered during the device's recent work load experience; and a comprehensive, intelligent, plateau-based methodology for dynamically distributing the solid state memory for the usage of the data stored on, or to be stored on, the various disks.
- This distribution of solid state memory is work load sensitive and is constantly dynamic; it is accomplished in such a way as to guarantee full utilization of the solid state memory while at the same time ensuring that the data for all disks are handled in such a way as to retain the most useful data for each disk in the solid state memory for the most appropriate amount of time.
- The multiple plateau-based cache distribution methodology is illustrated in Figures 57 and 58, and described in the section entitled Plateau Cache Illustration.
- The hardware and firmware features are combined in a methodology which incorporates simultaneity of memory management and data storage operations.
- The hybrid storage media of this invention performs at near solid state speeds for many types of computer workloads while practically never performing at less than normal magnetic disk speeds for any workload.
- One or more rotating magnetic disk media are used to give the device a large capacity; the solid state storage is used to give the device a high-speed response capability.
- By associating the solid state media directly with the magnetic disk devices, private data communication lines are established which avoid contention between normal data transfers between the host and the device and transfers between the solid state memory and the disks.
- The private data channels permit virtually unlimited conversation between the two storage media.
- Utilization of ample solid state memory permits efficient maintenance of data for multiple, simultaneously active data streams.
- Management of the storage is via one or more microprocessors which utilize historical and projected data accesses to perform intelligent placement of data. No table searches are employed in the time critical path.
- Host accesses to data stored in the solid state memory are at solid state speeds; host accesses to data stored on the magnetic disks are at disk device speeds. Under most conditions, all data sent from the host to the device is handled at solid state speeds.
- Intelligence is embodied to cause the device to dynamically shift its operating priorities in order to maintain performance at a high level, including the optimization of the handling of near-future host-device I/O operations.
- GLOSSARY OF TERMS
- ADDRESS TRANSLATION The method for converting a sector address into a disk bin address and a sector offset within that bin.
- ADDRESS TRANSLATION TABLE The table which maintains the relationship between disk bin identifiers and solid state memory addresses; also holds statistics about frequency of bin accesses, recent bin accesses, or other information as required.
- ADT TABLE See ADDRESS TRANSLATION TABLE
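- The two entries above describe a translation that reduces to integer arithmetic followed by a direct index into the ADT table, which is how the described device avoids the hashing and table searches of conventional cache units. The sketch below is only an illustration of that idea; the bin size, the field names, and the function name are assumptions, not the patent's actual firmware.

```c
#include <stdint.h>

#define BIN_SIZE    64U          /* sectors per bin (assumed value)          */
#define NOT_CACHED  0xFFFFFFFFU  /* null value: bin has no cache assignment  */

/* One ADT line per disk bin: the cache bin (if any) currently holding its
 * data, plus simplified access statistics.                                 */
typedef struct {
    uint32_t cache_bin;      /* index of assigned cache bin, or NOT_CACHED  */
    uint32_t access_count;   /* frequency-of-access statistic               */
} adt_entry;

/* Convert a host sector address into a disk bin number and sector offset,
 * then index the ADT table directly: no hashing, no search.                */
static uint32_t adt_lookup(const adt_entry *adt, uint32_t sector,
                           uint32_t *offset_out)
{
    uint32_t bin    = sector / BIN_SIZE;   /* disk bin identifier */
    uint32_t offset = sector % BIN_SIZE;   /* offset within bin   */

    *offset_out = offset;
    return adt[bin].cache_bin;             /* NOT_CACHED on a miss */
}
```

- Because the bin number indexes the table directly, deciding hit or miss costs constant time per bin touched by a host command, independent of the cache size.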
- BACKGROUND ACTIVITY Any of the tasks done by the described device's controller which are not done in immediate support of the host's activity. For example, the writing of modified data from cache to disk, prefetch reading of data from disk into cache, etc.
- BACKGROUND SEEK The first step in a background sweep write of modified data from cache to disk.
- BACKGROUND SWEEP The collective activities which write data from cache to disk as background tasks.
- BACKGROUND OPERATION See BACKGROUND ACTIVITY.
- BACKGROUND TASK See BACKGROUND ACTIVITY.
- BACKGROUND WRITE The writing of modified data from cache to disk as a background task.
- BAL Bin Address List.
- BATTERY BACKUP The hardware module and its controller which assures the described device of power for an orderly shutdown when outside power is interrupted.
- BEGGING FOR A BIN The canvassing of the cache chains for all the drives to find a cache bin that can be reused.
- BIN An arbitrary number of contiguous sectors occupying space either on a disk drive or in cache memory, in which data is stored.
- BIN ADDRESS LIST (BAL) A list of one or more disk bin addresses associated with a host command, and which relate the command's requirements to disk bins, cache bins, and sector addresses.
- BIN SIZE The number of sectors considered to be in a disk bin and also the number of sectors considered to be in a cache bin; this may or may not be equal to the actual number of sectors in a disk track.
- BUYING A BIN The acquisition of a cache bin from the global cache chain to use for data for a given drive, such cache bin received in exchange for a cache bin currently in the LRU position of the cache chain of that drive.
- CACHE The solid state memory area which holds user data within the cache system of this invention.
- CACHE-AHEAD FACTOR At each cache bin read hit or re-hit, cached data sufficient to satisfy a number of I/O's may remain in front of, and/or behind, the current location of the data involved in the current I/O. When either of these two remaining areas contains valid data for less than a set number of I/O's, the cache-ahead is activated. That minimum number of potential I/O's is the cache-ahead factor, or the proximity factor.
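- The proximity test above can be pictured as in the following sketch, which assumes, purely for illustration, that the valid data in a bin forms one contiguous run and that the fields shown exist; the actual firmware tracks this information through its control tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified view of one cache bin: a single contiguous run of valid
 * sectors [valid_first, valid_last] and the span of the last host read.  */
typedef struct {
    uint32_t valid_first;    /* first valid sector offset in the bin      */
    uint32_t valid_last;     /* last valid sector offset in the bin       */
    uint32_t last_io_first;  /* first sector of the most recent read hit  */
    uint32_t last_io_last;   /* last sector of the most recent read hit   */
    uint32_t io_size;        /* size of the most recent read, in sectors  */
} cache_bin;

/* Return true if a cache-ahead should be scheduled: fewer than
 * cache_ahead_factor further I/O's of this size remain cached either
 * in front of or behind the data just read.                              */
static bool cache_ahead_needed(const cache_bin *b, uint32_t cache_ahead_factor)
{
    uint32_t ahead  = b->valid_last - b->last_io_last;    /* cached in front */
    uint32_t behind = b->last_io_first - b->valid_first;  /* cached behind   */
    uint32_t need   = cache_ahead_factor * b->io_size;    /* sectors needed  */

    return (ahead < need) || (behind < need);
}
```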
- CACHE BIN A data bin in cache memory.
- CACHE CHAIN The logical bidirectional chain which maintains references to a set of cache bins in a certain sequence; in the described device, in the order of most-recently-used (MRU) to least-recently-used (LRU).
- CACHE CHAIN STATUS An attribute, based on the number of cache bins in a given cache chain (either for a given drive or global), which indicates that the cache chain is in a specified condition. Such cache bins contain no modified data, since cache bins containing modified data are removed from the cache chain and placed in the modified pool. The cache chain status is used to control decisions regarding the way in which the device manages the cache and other resources.
- CACHE HIT A host initiated read or write command which can be serviced entirely by utilizing currently cached data and/or currently assigned cache bins.
- CACHE HIT RATE The proportion of all host I/O's which have been, or are being, serviced as cache hits.
- CACHE MEMORY The solid state memory for retaining data. See SOLID STATE MEMORY.
- CACHE MISS A host initiated read or write command which cannot be serviced entirely by utilizing currently cached data and/or currently assigned cache bins.
- CACHE READ HIT A host initiated read command which can be serviced entirely by utilizing currently cached data.
- CACHE READ MISS A host initiated read command which cannot be serviced entirely by utilizing currently cached data.
- CACHE STATUS See CACHE CHAIN STATUS.
- CACHE WRITE HIT A host initiated write command which can be serviced entirely by utilizing currently assigned cache bins.
- CACHE WRITE MISS A host initiated write command which cannot be serviced entirely by utilizing currently assigned cache bins .
- CHAINING A method utilized in the LRU table to logically connect cache bins in a desired sequence; in the case of this invention, to connect the cache bins in most-recently-used to least-recently-used order. Chaining is also used in any table management in which the logical order is dynamic and does not always match the physical order of the data sets in the table.
- CHAIN See CHAINING.
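- The chaining just described amounts to keeping forward and backward pointer fields in the LRU table, indexed by cache bin number, so that a bin can be unlinked and relinked at the MRU end without any searching. The sketch below assumes one chain with explicit MRU and LRU ends; the structure and names are illustrative, not the actual LRU table layout.

```c
#include <stdint.h>

#define NIL 0xFFFFFFFFU   /* null link value */

/* Per-cache-bin chain links, indexed by cache bin number. */
typedef struct {
    uint32_t next;   /* neighbor toward the LRU end */
    uint32_t prev;   /* neighbor toward the MRU end */
} chain_link;

/* One cache chain (for example, one drive's chain). */
typedef struct {
    uint32_t mru;
    uint32_t lru;
} cache_chain;

/* Unlink bin from wherever it sits in the chain, then relink it at the MRU
 * position.  Every step is a direct index into the link table.             */
static void rechain_to_mru(cache_chain *c, chain_link *links, uint32_t bin)
{
    uint32_t p = links[bin].prev;
    uint32_t n = links[bin].next;

    if (c->mru == bin)                      /* already MRU: nothing to do */
        return;

    /* unlink from current position */
    if (p != NIL) links[p].next = n;
    if (n != NIL) links[n].prev = p;
    if (c->lru == bin) c->lru = p;

    /* relink at the MRU end */
    links[bin].prev = NIL;
    links[bin].next = c->mru;
    if (c->mru != NIL) links[c->mru].prev = bin;
    c->mru = bin;
}
```

- Because the links live in a table indexed by bin number, moving a bin that has just taken a read hit to the MRU position (as in Figure 23) touches only a handful of table entries.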
- CLEAN BIN; CLEAN CACHE BIN A cache bin containing only unmodified data; that is, data that matches the data in the corresponding disk bin.
- CLEAN DATA Data currently resident in a cache bin which exactly matches the data stored in the disk bin to which the cache bin is assigned.
- CLEANING The act of writing data from cache memory to disk, such data having been written from the host into cache and which has not yet been written from cache to disk.
- CONTROL TABLE Any of the various tables which maintain records which control the location and retention of data stored in cache or on disk, and which are used to control the activities of the described device.
- DATA BIN (DISK) See DISK BIN.
- DATA CHANNEL See CHANNEL
- PROXIMITY A group of contiguous sectors within a cache bin, and of a size which matches the most recent host read for data from that cache bin.
- DECACHE The removal of the logical references which relate some or all of the data in a cache bin to the data in a disk bin. When such references are all removed, the decached data in the cache bin is no longer available from cache, but can be retrieved from the disk on which it is stored.
- DIRTY BIN; MODIFIED BIN A cache bin which contains data which has been written to the cache by the host, and which data has not yet been written from the cache to the corresponding disk drive.
- DIRTY DATA; MODIFIED DATA Data which has been written to the cache by the host, and which data has not yet been written from the cache to the corresponding disk drive. In other words, data in a cache bin that does not match the data in the corresponding disk bin.
- DISCONNECT The action of removing the current logical connection on a channel between two devices, thus freeing the channel for use by other devices which have access to the channel .
- DISK BIN; DRIVE BIN A data bin on a disk.
- DISK BIN ADDRESS The address of the first sector of data in a given bin on disk. These addresses correspond to physical locations on the rotating magnetic disk. Each sector address as specified in an I/O operation can be converted into a disk bin address and a sector offset within that bin.
- DISK CACHE That portion of the described device's cache memory which is assigned to data corresponding to data stored on, or intended to be stored on, a specific disk drive.
- DISK DRIVE See DISK.
- DISK SECTOR ADDRESS The address of a physical sector on the magnetic disk device.
- DISK SERVER The logical section of the caching device which handles the writes to, and reads from, the rotating magnetic disk.
- DISK TRACK A complete data track on a disk; one complete band on one platter of the disk device; this terminology is generally not meaningful to the logic of the described device.
- DMA Direct Memory Access; that is, memory-to-memory transfer without the involvement of the processor.
- DRAM Dynamic random access memory; the chip or chips that are used for solid state memory devices.
- DRIVE See DISK.
- DRIVE BIN See DISK BIN.
- DRIVE CACHE Collectively, the cache bins which are currently assigned to a given drive.
- DRIVE CACHE CHAIN The logical chain of cache bins which are currently assigned to maintain data for a given disk drive of the described device. Such cache bins are available to be decached and reused only in very special circumstances, as opposed to those cache bins which have migrated into the global cache chain.
- DRIVE CACHE CHAIN STATUS A term describing a drive's cache condition based on the number of unmodified cache bins assigned to that given drive. See CACHE CHAIN STATUS.
- DRIVE MODE An attribute of each drive, based on the number of cache bins which contain modified data, which indicates that the amount of such cache is at a specified level. Used to control decisions regarding the way in which the device manages the drive, the cache, and other resources.
- DRIVE NORMAL MODE The condition of the cache assigned to a specific drive in which the described storage device can use its normal priorities with respect to the management of data for that drive in order to reach its optimal performance level. In this mode, the background sweep is dormant; and cache ahead, recycle, and read ahead operations take place as circumstances indicate they are appropriate.
- DRIVE SATURATED MODE The drive mode in which no cache ahead operations, no recycling, and no read ahead operations are permitted on the drive. The global logic will not allow the number of modified cache bins assigned to a given drive to exceed the number that places the device in this mode.
- DRIVE SWEEP MODE The drive mode in which the sweep has been activated for the drive based on the number of cache bins assigned to the drive which contain modified data; in this mode the sweep shares resources with the cache ahead operations.
- DRIVE TIMEOUT MODE The drive mode in which the sweep has been activated for the drive based on the time since a cache bin assigned to the drive was first written to by the host and which still contains modified data; in this mode the sweep shares resources with the cache ahead operations.
- DRIVE URGENT MODE The drive mode in which the sweep is active. Due to the large number of cache bins containing modified data, no cache ahead operations are permitted on the drive, and recycling is limited to the more frequently accessed cache bins.
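- Taken together, the drive modes above amount to comparing the count of modified cache bins assigned to a drive, plus the age of its oldest modified bin, against a set of configured thresholds. The sketch below shows the shape such a set-drive-mode decision could take; the threshold names and their ordering are assumptions drawn from the definitions above, not the actual rules of Table CM3 or Figure 9.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { DRIVE_NORMAL, DRIVE_TIMEOUT, DRIVE_SWEEP,
               DRIVE_URGENT, DRIVE_SATURATED } drive_mode;

/* Assumed per-drive thresholds, fixed at configuration time. */
typedef struct {
    uint32_t sweep_threshold;      /* modified bins: activate background sweep */
    uint32_t urgent_threshold;     /* modified bins: stop cache-ahead          */
    uint32_t saturated_threshold;  /* modified bins: hard ceiling              */
} mode_limits;

/* Set the operating mode for one drive from the number of its modified
 * cache bins and whether any modified bin has exceeded the time limit.    */
static drive_mode set_drive_mode(uint32_t modified_bins, bool timed_out,
                                 const mode_limits *lim)
{
    if (modified_bins >= lim->saturated_threshold) return DRIVE_SATURATED;
    if (modified_bins >= lim->urgent_threshold)    return DRIVE_URGENT;
    if (modified_bins >= lim->sweep_threshold)     return DRIVE_SWEEP;
    if (timed_out)                                 return DRIVE_TIMEOUT;
    return DRIVE_NORMAL;
}
```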
- DRIVE STATUS See DRIVE CACHE CHAIN STATUS.
- EDAC Error Detection And Correction.
- EEPROM Electrically Erasable Programmable Read-Only Memory.
- EPROM Erasable Programmable Read-Only Memory.
- EXCESS CACHE CHAIN STATUS The term used to describe a cache chain, either global or local, when that chain contains more than the desired maximum number of cache bins. The global cache chain will be in this status whenever the described device is powered on.
- FIRMWARE The collective set of logical instructions and control tables which control the described device's activities.
- GAP ELIMINATION The actions taken to make the cached data within a given cache bin contiguous. This can be done by reading data from the disk into the gap, or by taking whatever actions are necessary to allow one of the cached areas to be decached.
- GAP READ The elimination of a gap in cached, modified data in a single cache bin by reading the intervening data from the disk.
- GAP TABLE A control table which contains a line referencing each gap of each cache bin which currently contains a gap. This table will usually be null, or will have no lines in it, since the logic of the described device eliminates gaps as soon as feasible after their creation by the host.
- GAP WRITE The elimination of a gap in cached, modified data in a single cache bin by writing some or all of the modified data in the cache bin from cache to the disk.
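- Gap elimination, as defined above, has two available moves: fill the gap by reading the intervening sectors from disk (a gap read), or clear it by writing modified data out to disk (a gap write). The sketch below shows the shape of such a decision; the size-based criterion for choosing between the two moves is an assumption, not the rule of Figure 39.

```c
#include <stdint.h>

typedef enum { GAP_READ, GAP_WRITE } gap_action;

/* A gap: uncached sectors lying between two runs of modified cached data
 * within one cache bin.                                                   */
typedef struct {
    uint32_t first_sector;   /* first missing sector offset in the bin */
    uint32_t last_sector;    /* last missing sector offset in the bin  */
} gap;

/* Choose how to eliminate a gap: a small gap is filled by reading the
 * intervening data from disk, a large one is cleared by writing modified
 * data out.  The threshold is illustrative only.                          */
static gap_action choose_gap_action(const gap *g, uint32_t read_threshold)
{
    uint32_t gap_sectors = g->last_sector - g->first_sector + 1U;
    return (gap_sectors <= read_threshold) ? GAP_READ : GAP_WRITE;
}
```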
- GLOBAL CACHE Collectively, the cache bins which are currently assigned to a given drive, but which have been placed in the global cache chain. Cache bins in the global cache chain are more readily accessible for decaching and reassignment to other drives.
- GLOBAL CACHE CHAIN The logical chain of cache bins which, although they may be currently assigned to maintain data for the various disk drives of the described device, are readily available to be decached and reused for caching data for any of the drives .
- GLOBAL CACHE CHAIN STATUS A term describing the global cache condition based on the number of unmodified cache bins currently in the global cache chain. See CACHE CHAIN STATUS.
- GLOBAL DRIVE MODE A term describing a controlling factor of the drive operations; determined by the total number of modified cache bins assigned to all drives.
- GLOBAL NORMAL DRIVE MODE See GLOBAL NORMAL MODE.
- GLOBAL NORMAL MODE The condition, based on the total number of cache bins containing modified data, in which the described storage device can use its normal priorities with respect to the management of disk drive activities in order to reach its optimal performance level. In this mode, the background sweep, recycling, and cache ahead operations are under individual drive control.
- GLOBAL POOL See MODIFIED POOL.
- GLOBAL SATURATED MODE The global mode in which a very large number of cache bins contain modified data. All cache aheads, all recycling, and all read aheads are prohibited. The sweep is operating for all drives which have any modified cache bins assigned to them.
- GLOBAL URGENT MODE The global mode in which a large number of cache bins contain modified data. The background sweep is forced on for all drives which have any modified cache bins, cache aheads are prohibited, but read aheads are permitted under individual drive control.
- GLOBAL LRU The term used to reference that cache bin in the least-recently-used position of the global cache chain.
- GLOBAL MODE See GLOBAL DRIVE MODE.
- GLOBAL MODIFIED CACHE POOL The pool of cache bins each of which contains one or more sectors of modified, or dirty, data.
- GLOBAL STATUS See GLOBAL CACHE CHAIN STATUS .
- HASHING A procedure used in many computer programs to quickly determine the approximate location of some desired information. Used extensively in conventional caching devices to reduce the amount of searching needed to determine where, if at all, data for a given disk location may be located in cache. This methodology usually results in a search to locate the exact data location, a search which can become very time consuming for large caches. The present invention uses no such hashing and/or search scheme, and is not subject to such time delays.
- HOLE See GAP.
- HOST The computer to which the caching device is attached.
- HOST COMMAND Any of the logical instructions sent from the host to the described device to instruct the device to take some kind of action, such as to send data to the host (a READ command) , to accept data from the host for storing (a WRITE command) , among others.
- HOST SERVER The portion of the caching device which interfaces with the host computer.
- I/O SIZE The size of a host I/O request as a number of sectors .
- LAD See LEAST ACTIVE DRIVE.
- LEAST ACTIVE DRIVE The least active disk drive based on recent host I/O activity and short term history of host I/O activity.
- LEAST ACTIVE DRIVE CHAIN See LEAST ACTIVE DRIVE LIST.
- LEAST ACTIVE DRIVE LIST The chained lines of the ADT-DISKS table which maintain the drive activity information.
- LEAST-RECENTLY-USED TABLE See LRU TABLE.
- LINK A term used to describe a line in a chained table which is tied, forward, backward, or both ways via pointers, to another line or other lines in the table.
- LOCKED CACHE BIN The term used to describe a cache bin which is part of an ongoing I/O, either between the host and the solid state memory, or between the solid state memory and the rotating media storage device, or both.
- LOGICAL SPINDLE A spindle of a disk drive, or a logical portion thereof which has been designated a spindle for purposes of the described device.
- LRU Least-Recently-Used, as pertains to that data storage cache bin which has not been accessed for the longest period of time.
- MAGNETIC DISK See DISK.
- MAGNETIC DISK MEDIA A device which utilizes one or more spinning disks on which to store data.
- MARGINAL CACHE CHAIN STATUS The cache chain status, either global or local, in which the number of cache bins in the cache chain is approaching the smallest permissible. The device's control logic will attempt to keep the cache chain from becoming smaller, but will allow it to do so if no other course of action is available to handle the host activity.
- MODE See DRIVE MODE and GLOBAL DRIVE MODE.
- MODIFIED BIN See DIRTY BIN.
- MODIFIED BINS TABLE See MOD TABLE.
- MODIFIED CACHE BIN See DIRTY BIN.
- MODIFIED CACHE POOL See MODIFIED POOL.
- MODIFIED DATA See DIRTY DATA.
- MODIFIED POOL; MODIFIED CACHE POOL A generally unordered list of all the cache bins which, at a given moment in time, contain modified data.
- MOST ACTIVE DRIVE The most active drive based on recent host I/O activity as reflected in the LEAST ACTIVE DRIVE LIST of the ADT table.
- MRU Most-Recently-Used, as pertains to that data storage track which has been accessed in the nearest time past.
- NORMAL CACHE CHAIN STATUS The cache chain status, either global or local, in which the number of cache bins in the cache chain is within the preset limits which are deemed to be the best operating range for a drive's cache chain.
- NORMAL MODE See DRIVE NORMAL MODE and GLOBAL NORMAL MODE.
- NULL; NULL VALUE A value in a table field which indicates the field should be considered to be empty; depending on usage, this will be zero, or the highest value the bit structure of the field can accommodate.
- NULL VALUE See NULL.
- PERIPHERAL STORAGE DEVICE Any of several data storage devices attached to a host for purposes of storing data. In the described device, may be a disk drive, but is not limited thereto.
- PHYSICAL TRACK See DISK TRACK.
- PLATEAU The plateaus represent various amounts of cache which may be assigned to the disks or to global cache and which are protected for a given set of conditions. In this embodiment, the plateaus are fixed at initialization time, and are the same for all drives. In logical extensions of this invention, the plateaus for the various drives would not necessarily be equal, and they may be dynamically adjusted during the operation of the device.
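- One way to picture the plateau mechanism is as a small table of protected chain sizes: a drive is asked to give up a cache bin only if its chain currently holds more bins than the plateau that applies under the prevailing conditions. The sketch below is merely a schematic reading of the description above and of Figures 57 and 58 (not reproduced here); the number of plateaus and the way the active plateau is selected are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PLATEAUS 3   /* e.g. a three-plateau configuration (Figure 57) */

/* Protected chain sizes, in cache bins, fixed at initialization time and,
 * in this embodiment, identical for every drive.                           */
typedef struct {
    uint32_t plateau[NUM_PLATEAUS];  /* plateau[0] < plateau[1] < plateau[2] */
} plateau_config;

/* A drive may donate a bin to another drive (or to the global chain) only
 * if its chain is larger than the plateau protecting it under the current
 * conditions; the caller supplies which plateau currently applies.         */
static bool drive_may_give_bin(uint32_t drive_chain_bins,
                               uint32_t active_plateau,   /* 0..NUM_PLATEAUS-1 */
                               const plateau_config *cfg)
{
    return drive_chain_bins > cfg->plateau[active_plateau];
}
```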
- PREFETCH See CACHE AHEAD.
- PRIVATE CHANNEL See PRIVATE DATA CHANNEL.
- PRIVATE DATA CHANNEL In the described device, a physical data path used to move data between the solid state memory and a disk drive, but not used to move data between the host computer and the described device; therefore, private to the device.
- PROXIMITY A term for expressing the "nearness" of the data in a cache bin which is currently being accessed by the host to either end of the said cache bin.
- PROXIMITY FACTOR See CACHE-AHEAD FACTOR.
- QUEUED HOST COMMAND Information concerning a host command which has been received by the described device, but which could not be immediately handled by the device; for example, a read cache miss.
- QUEUED READ CACHE MISS Information concerning a host read command which has been received by the described device, but, which could not be immediately fulfilled by the device.
- QUEUED READ COMMAND See QUEUED READ CACHE MISS.
- QUEUED SEEK CACHE MISS Information concerning a host seek command which has been received by the described device, and for which the data at the specified disk address is not currently stored in cache.
- QUEUED WRITE CACHE MISS Information concerning a host write command which has been received by the described device, but which could not be immediately handled by the device.
- READ AHEAD The reading of data from a disk bin into a cache bin as a part of the reading of requested data from disk into cache resulting from a read cache miss. The data so read will be the data from the end of the requested data to the end of the disk bin. This is not to be confused with a cache ahead operation.
- READ CACHE HIT See CACHE READ HIT.
- READ CACHE MISS See CACHE READ MISS.
- READ COMMAND A logical instruction sent from the host to the described device to instruct the device to send data to the host .
- READ FETCH The reading of data from a disk bin into a cache bin in order to satisfy the host request for data which is not currently in cache. The data so read will be the data from the beginning of the requested data to the end of the requested data within the disk bin. This is not to be confused with a read ahead or a cache ahead operation.
- READ HIT See CACHE READ HIT.
- READ MISS See CACHE READ MISS.
- READ QUEUE A temporary queue of read cache miss commands; the information in this queue is used to cache the requested data and then to control the transmission of that data to the host .
- RECONNECT The action of reestablishing a logical data path connection on a channel between two devices, thus enabling the channel to be used for transmitting data. Used in handling queued read cache misses and queued writes.
- RECYCLE The term used to describe the retention of data in a cache bin beyond that bin's logical arrival at the global cache LRU position; such retention may be based on a number of factors, including whether or not some data in the cache bin was read at some time since the data in the cache bin was most recently cached, or since the data was last retained in cache as the result of a recycling action.
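- The recycling decision can be read as a single test made when a bin reaches the global LRU position: if the bin has been read since it was last cached or last recycled, it is rechained rather than decached. A minimal sketch follows; the field names and the cap on successive recycles are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified per-bin recycling state. */
typedef struct {
    bool     read_since_cached;  /* read hit since caching or last recycle  */
    uint32_t recycle_count;      /* successful recycles so far              */
} recycle_info;

/* Called when a bin arrives at the global LRU position.  Returns true if
 * the bin should be rechained (retained) instead of decached and reused.
 * The cap on recycle_count is an assumption, not taken from the patent.   */
static bool should_recycle(recycle_info *r, uint32_t max_recycles)
{
    if (r->read_since_cached && r->recycle_count < max_recycles) {
        r->read_since_cached = false;   /* the bin must earn its next recycle */
        r->recycle_count++;
        return true;
    }
    return false;
}
```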
- ROTATING MAGNETIC DISK See DISK.
- ROTATING STORAGE MEDIA A data storage device such as a magnetic disk.
- SATURATED MODE SATURATED: See SATURATED DRIVE MODE and SATURATED GLOBAL DRIVE MODE.
- SCSI Small Computer System Interface; the name applied to the protocol for interfacing devices, such as a disk device to a host computer.
- SCSI CONTROL CHANNEL A physical connection between devices which uses the SCSI protocol, and is made up of logical controllers connected by a cable.
- SECTOR The logical sub-unit of a disk track or disk bin; the smallest addressable unit of data on a disk.
- SECTOR ADDRESS The numerical identifier of a disk sector, generally indicating the sequential location of the sector on the disk.
- SECTOR OFFSET In the described device, the relative location of a given sector within a cache bin or disk bin.
- SEEK The action of positioning the read/write head of a disk drive to some specific sector address. Usually done by a host in preparation for a subsequent read or write command.
- SEEK CACHE MISS In the described device, a seek command from the host for which the data of the corresponding disk bin is not cached. The described device will place the information about the seek in a seek queue and attempt to execute a background read ahead in response to a seek cache miss.
- SEEK COMMAND A logical instruction sent from the host to the described device to instruct the device to position the read/write head of the disk to some specific sector address. In the described device, this is handled as an optional cache ahead operation.
- SEEK QUEUE A temporary queue of host seek miss commands which are waiting to be satisfied by background cache ahead actions, should time permit.
- SEGMENT See SECTOR.
- SERIAL PORT A means for communicating with a device, external to the described device, such as a terminal or personal computer, which in this context may be used to reset operating parameters, reconfigure the device, or make inquiries concerning the device's operations.
- SOLID STATE STORAGE See SOLID STATE MEMORY.
- SOLID STATE STORAGE DEVICE See SOLID STATE MEMORY.
- SPINDLE See LOGICAL SPINDLE.
- SSD See SOLID STATE MEMORY.
- SSD BIN ADDRESS The address in the solid state memory at which the first byte of the first sector currently corresponding to a given disk bin resides.
- STATUS See DRIVE CACHE CHAIN STATUS and GLOBAL CACHE CHAIN STATUS .
- STEALING A BIN The acquisition of a cache bin from the global cache chain, or indirectly from another drive's cache chain, for use for data for a given drive when that drive does not give back a cache bin in exchange.
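- The acquisition methods defined above (reusing a drive's own LRU bin, buying, stealing, and begging) can be pictured as a single fall-through procedure such as the one sketched below. The ordering only loosely mirrors Figures 11 through 15 and Table CS4, and the helper routines are placeholders stubbed for illustration, not firmware functions named in the patent.

```c
#include <stdint.h>

#define NO_BIN 0xFFFFFFFFU

/* Placeholder helpers, stubbed so the sketch compiles; in the device each
 * would consult the LRU table and the drive and global cache statuses.    */
static uint32_t take_own_lru_bin(uint32_t drive)       { (void)drive; return NO_BIN; }
static uint32_t buy_bin_from_global(uint32_t drive)    { (void)drive; return NO_BIN; }
static uint32_t steal_bin_from_global(uint32_t drive)  { (void)drive; return NO_BIN; }
static uint32_t beg_bin_from_any_drive(uint32_t drive) { (void)drive; return NO_BIN; }

/* Locate a cache bin the given drive may reuse for new data: first its own
 * chain's LRU bin, then buy from global (giving its own LRU bin in
 * exchange), then steal from global, and finally beg across all the drive
 * chains.                                                                  */
static uint32_t locate_bin_to_reuse(uint32_t drive)
{
    uint32_t bin;

    if ((bin = take_own_lru_bin(drive))      != NO_BIN) return bin;
    if ((bin = buy_bin_from_global(drive))   != NO_BIN) return bin;
    if ((bin = steal_bin_from_global(drive)) != NO_BIN) return bin;
    return beg_bin_from_any_drive(drive);  /* may still fail: queue the command */
}
```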
- SWEEP See BACKGROUND SWEEP.
- SWEEP MODE See DRIVE SWEEP MODE.
- TABLE SEARCH A technique used in some devices to find references to certain data, such as the location of cached data. This procedure is often time consuming, and in general, it is not used in the described device in any of the time critical paths.
- TIMEOUT In the described device, a timeout occurs when, for a given drive, some cache bin has been holding dirty, or modified, data for more than some preset length of time. The occurrence of a timeout will place the drive in timeout mode if the sweep for that drive is not already active.
- TIMEOUT MODE See DRIVE TIMEOUT MODE.
- WRITE CACHE HIT See CACHE WRITE HIT.
- WRITE CACHE MISS See CACHE WRITE MISS.
- WRITE COMMAND A logical instruction sent from the host to the described device to instruct the device to accept data from the host for storing in the device.
- WRITE QUEUE A temporary queue of host write miss commands which are waiting to be satisfied due to an extremely heavy load of write activities which has temporarily depleted the supply of cache bins available for reuse. This queue will usually be null, or empty, and if it is not, the described device will be operating in the saturated mode.
- WRITE THROUGH A technique used in some caching devices to allow host write commands to bypass cache and write directly to the disk drive.
- Figure 1 depicts the logic for a typical prior-art cached disk, computer data storage system.
- Figure 2 depicts an overall view of the hardware component of one embodiment of the present invention.
- Figure 3 depicts an overall view of one embodiment of the present invention which uses cached disks as a computer data storage unit .
- CACHE ILLUSTRATIONS
- These diagrams depict the cache of selected embodiments of the present invention as related to the cache statuses, and the modified pools as related to the various drive operating modes.
- Figure 4 depicts one embodiment of drive cache structure as its various sizes relate to the drive cache statuses.
- Figure 5 depicts one embodiment of the pool of modified cache bins associated with a drive, showing how the pool's size relates to the drive cache modes.
- Figure 6 depicts one embodiment of the global cache structure, as its various sizes relate to the global cache statuses.
- Figure 7 depicts the composite pool of modified cache bins, showing how the pool's size relates to the global cache modes.
- FIRMWARE - CACHE MANAGEMENT MODULES
- These modules handle the cache management of the present invention. Each module may be invoked from one or more places in the firmware as needed. They may be called as a result of an interrupt, from within the background controller, or as a result of, or as part of, any activity.
- Figure 8 shows one embodiment of a cache-ahead determination procedure that determines which cache bin, if any, should be scheduled for a cache-ahead operation.
- Figure 9 shows one embodiment of a set drive mode procedure for setting the operating mode for a specific drive's cache based on the number of modified cache bins assigned to that drive.
- Figure 10 shows one embodiment of a set drive cache chain status module, which uses the information about the number of cache bins currently assigned to the specified drive to set a cache chain status for that drive.
- Figure 11 shows an exemplary procedure for locating a cache bin to reuse.
- Figure 12 shows an exemplary method by which a specific drive gets a cache bin from the global cache to use for its current caching requirements when the drive can afford to give a cache bin to the global chain in return.
- Figure 13 shows an exemplary method by which a specific drive gets a cache bin from the global cache to use for its current caching requirements when no drive can give a bin to global.
- Figure 14 shows an exemplary method by which a specific drive indirectly gets a cache bin from another drive's cache to use for its own current caching requirements.
- Figure 15 shows an exemplary method by which a specific drive gets a cache bin for its use from any drive when none is available in either the drive's own cache chain or in the global cache.
- Figure 16 shows one embodiment of logic for determining the least active drive whose cache chain is the best candidate for supplying a bin for use by another drive.
- Figure 17 shows an exemplary procedure for setting the operating mode for the global cache based on the total number of modified cache bins.
- Figure 18 depicts one embodiment of a set global cache chain status module, which uses the information about the number of cache bins currently assigned to the global portion of cache to set a global cache chain status.
- Figure 19 shows exemplary logic for determining if a current host request is a cache hit or must be handled as a cache miss.
- Figure 20 depicts exemplary logic for setting up a bin address list based on a host command.
- Figure 21 depicts exemplary logic for translating the sector address of a host command into a bin identifier.
- Figure 22 shows an exemplary method for updating and rechaining the least-active disk list.
- Figure 23 shows an exemplary method by which a cache bin which has just been involved in a cache read hit is rechained to the MRU position of that drive's cache chain, if that action is appropriate.
- Figure 24 depicts exemplary logic for moving a newly modified cache bin from a drive's cache chain to the pool of modified cache bins.
- Figure 25 depicts exemplary logic for moving a newly modified cache bin from the global cache chain to the pool of modified cache bins.
- Figure 26 depicts one embodiment of a module which determines whether or not sufficient cache bins are available for reuse to handle a current requirement .
- These modules are invoked, directly or indirectly, by a host interrupt.
- These modules may call other modules which are described in a different section.
- Figure 27 shows exemplary logic of the host-command interrupt management .
- Figure 28 depicts exemplary logic of handling the completion of a host command.
- Figure 29 depicts exemplary logic for handling a read command from the host when all the data to satisfy the command is found to be in the cache memory.
- Figure 30 depicts exemplary logic for the host interrupt handling of a read command when some or all of the data required to satisfy the command is not in the cache memory.
- Figure 31 depicts exemplary logic for handling a seek command from the host when the addressed bin is not currently cached.
- Figure 32 depicts exemplary logic for the host interrupt handling of a write command when all the data bins related to that write are found to be in the cache memory.
- Figure 33 depicts exemplary logic for the host interrupt handling of a write command when some or all of the data bins related to that write are not in the cache memory.
- These modules make up the executive control and the control loop which runs at all times during which an interrupt is not being processed.
- These modules may call other modules which are described in a different section.
- Figure 34 depicts one embodiment of the initiation of a cache-ahead operation, if one is scheduled for the given drive.
- Figure 35 shows one embodiment of the main firmware control for the described device.
- Figure 36 shows one embodiment of the operations which take place when the described device is initially powered on.
- Figure 37 shows one embodiment of the background operation control loop for a drive.
- Figure 38 depicts an exemplary procedure which shuts down the described device when it is powered off.
- Figure 39 shows one embodiment of the logic for eliminating gaps in the modified portions of the cached data in a cache bin.
- Figure 40 shows one embodiment of the handling of the queued commands for a given drive.
- Figure 41 depicts, for one embodiment of this invention, the methods by which a module rechains cache bins from the LRU or near LRU positions of the global cache chain to the MRU or LRU positions of the specific, private disk cache chains; the movement is based on recycling information on each cache bin reflecting that bin's activity since first being cached or since it was last successfully recycled.
- Figure 42 depicts one embodiment of the handling of a queued read cache miss operation.
- Figure 43 depicts one embodiment of the method of fetching missing data from disk.
- Figure 44 depicts one embodiment of the handling of a queued seek cache miss operation.
- Figure 45 depicts one embodiment of the logic for determining if the cache associated with a given drive has included any modified cache bins for more than the specified time limit.
- Figure 46 depicts one embodiment of the initiation of a background write from cache to disk, if one is appropriate for the given drive at the current time.
- Figure 47 depicts one embodiment of a method for identifying a modified cache bin, assigned to a specified disk, which is to be written from cache to disk at this time.
- Figure 48 depicts one embodiment of the handling of a queued write cache miss operation.
- These modules are invoked, directly or indirectly, by an interrupt from one of the disk drives. These modules may call other modules which are described in a different section.
- Figure 49 shows one embodiment of logic for handling the termination of a cache-ahead operation.
- Figure 50 shows one embodiment of logic of the drive-interrupt management.
- Figure 51 depicts exemplary actions to be taken when a read from, or write to, a drive has completed, such read or write initiated for the purpose of eliminating a gap or gaps in the modified cached data of a cache bin.
- Figure 52 depicts exemplary logic for the termination of a seek which was initiated for the purpose of writing modified data from a cache bin to its corresponding disk drive.
- Figure 53 depicts exemplary logic for handling the termination of a background write from cache to disk.
- MODULES ENTERED VIA INTERNAL INTERRUPTS In one embodiment, these modules are invoked, directly or indirectly, by an interrupt from within the described device itself. These modules may call other modules which are described in a different section.
- Figure 54 depicts exemplary handling of a power-off interrupt .
- Figure 55 depicts exemplary logic for initiation of the background sweep for writing from cache to disk when the device is in its power-down sequence.
- MODULES ENTERED VIA SERIAL PORT INTERRUPTS In exemplary embodiments, this module is invoked, directly or indirectly, by an interrupt from a device attached to the serial port of the described device. These modules may call other modules which are described in a different section.
- Figure 56 depicts exemplary logic for handling the communications with a peripheral attached to the device via the serial port .
- Figure 57 is a graph illustrating eight cases of the cache assignments of a three-plateau configuration.
- Figure 58 is a graph illustrating eight cases of the cache assignments of a five-plateau configuration.
- Table CM1 depicts exemplary operating rules based on the drive modes.
- Table CM2 depicts exemplary control precedence for the global modes and the drive modes.
- Table CM3 summarizes exemplary rules for setting the drive modes .
- Table CM4 summarizes exemplary rules for setting the global mode.
- Table CS1 summarizes exemplary rules for setting the drive cache statuses .
- Table CS2 summarizes exemplary rules for setting the global cache status.
- Table CS3 gives an exemplary set of the possible status conditions and the corresponding actions required to acquire a cache bin for reuse.
- Table CS4 gives an exemplary set of bin acquisition methods based on the combinations of cache statuses.
- Table LB1 gives an exemplary set of cache bin locking rules for operations involving host reads.
- Table LB2 gives an exemplary set of cache bin locking rules for operations involving host writes.
- Table LB3 gives an exemplary set of cache bin locking rules for operations involving caching activities.
- Table LB4 gives an exemplary set of cache bin locking rules for operations involving sweep activities.
- Table LB5 gives an exemplary set of cache bin locking rules for operations involving gaps.
- CONTROL TABLE EXAMPLES
- Tables TCA through TCG give an example of a Configuration Table which defines one exemplary configuration of the present invention.
- Table TCA gives an exemplary set of sizing parameters for one configuration of the described device and some basic values derived therefrom.
- Table TCB gives an exemplary set of drive cache status parameters for one configuration of the described device and some basic values derived therefrom.
- Table TCC gives an exemplary set of global cache status parameters for one configuration of the described device and some basic values derived therefrom.
- Table TCD gives an exemplary set of drive mode parameters for one configuration of the described device and some basic values derived therefrom.
- Table TCE gives an exemplary set of global mode parameters for one configuration of the described device and some basic values derived therefrom.
- Table TCF gives an exemplary set of recycling parameters for one configuration of the described device.
- Table TCG gives an exemplary set of drive activity control parameters for one configuration of the described device.
- Table TLB gives an example of the unindexed values of an LRU table at the completion of system initialization at power up time.
- Table TLC gives an exemplary snapshot of portions of an LRU table that are indexed by spindle number, taken at the completion of system initialization at power up time.
- Table TLD gives an exemplary snapshot of portions of an LRU table that are indexed by cache bin number, taken at the completion of system initialization at power up time.
- Tables TLE through TLG give exemplary snapshots of some portions of a Least-Recently-Used Table taken during the operation of the present invention.
- Table TLE gives an example of the unindexed values of an LRU table at an arbitrary time during the operation of the present invention.
- Table TLF gives an exemplary snapshot of portions of an LRU table that are indexed by spindle number, taken at an arbitrary time during the operation of the present invention.
- Table TLG gives an exemplary snapshot of portions of an LRU table that are indexed by cache bin number, taken at an arbitrary time during the operation of the present invention.
- Tables TAB through TAD give an example of an initial Address Translation (ADT) Table.
- Table TAB gives an example of the unindexed values of an ADT table at the completion of system initialization at power up time.
- Table TAC gives an exemplary snapshot of portions of an ADT table that are indexed by spindle number, taken at the completion of system initialization at power up time.
- Table TAD gives an exemplary snapshot of portions of an ADT table that are indexed by disk bin number, taken at the completion of system initialization at power up time.
- Tables TAE through TAG give exemplary snapshots of some portions of an Address Translation Table taken during the operation of the described device.
- Table TAE gives an example of the unindexed values of an ADT table at an arbitrary time during the operation of the described device.
- Table TAF gives an exemplary snapshot of portions of an ADT table that are indexed by spindle number, taken at an arbitrary time during the operation of the described device.
- Table TAG gives an exemplary snapshot of portions of an ADT table that are indexed by disk bin number, taken at an arbitrary time during the operation of the described device.
- Tables TGB through TGD give an example of an initial GAP Table.
- Table TGB gives an example of the unindexed values of a GAP table at the completion of system initialization at power up time.
- Table TGC gives an exemplary snapshot of portions of a GAP table that are indexed by spindle number, taken at the completion of system initialization at power up time.
- Table TGD gives an exemplary snapshot of portions of a GAP table that are indexed by gap number, taken at the completion of system initialization at power up time.
- Tables TGE through TGG give exemplary snapshots of some portions of a GAP table taken at an arbitrary time during the operation of the described device.
- Table TGE gives an example of the unindexed values of a GAP table taken at an arbitrary time during the operation of the described device.
- Table TGF gives an exemplary snapshot of portions of a GAP table that are indexed by spindle number, taken at an arbitrary time during the operation of the described device.
- Table TGG gives an exemplary snapshot of portions of a GAP table that are indexed by gap number, taken at an arbitrary time during the operation of the described device.
- Table TMD gives an exemplary snapshot of portions of a Modified Bins table taken at the completion of system initialization at power up time.
- Table TMG gives an exemplary snapshot of some portions of a Modified Bins Table taken during the operation of the described device.
- The present invention is a computer peripheral data storage device consisting of a combination of solid state memory and one or more mass storage devices, such as rotating magnetic disks; such a device has the large capacity of magnetic disks with near solid state speed at a cost per megabyte approaching that of magnetic disk media.
- This invention derives its large storage capacity from the rotating magnetic disk media. Its high speed performance stems from the combination of a private channel between the two storage media, one or more microprocessors utilizing a set of unique data management algorithms, a unique prefetch procedure, parallel activity capabilities, and an ample solid state memory.
- This hybrid storage media gives overall performance near that of solid state memory for most types of computer workloads while practically never performing at less than normal magnetic disk speeds for any workload.
- To the host computer, the present invention appears to be one or more directly addressable entities, such as magnetic disks.
- By directly associating a solid state memory and one or more magnetic disk devices, private data communication lines are established within the device which avoid contention with normal data transfers between the host and the device, and transfers between the solid state memory and the disk media.
- These private data channels permit unrestricted data transfers between the two storage media with practically no contention with the communication between the host computer and the present invention.
- Utilization of ample solid state memory permits efficient retention of data for multiple, simultaneously active data streams. Management of the storage is via microprocessors which anticipate data accesses based on historical activity. Data is moved into the solid state memory from the one or more mass memory devices based on management algorithms which ensure that no table searches need be employed in the time-critical path.
- Host computer accesses to data stored in the solid state memory are at near solid state speeds; accesses to data stored in the mass memory but not in the solid state memory are at near mass memory speeds. All data sent from the host to the device is transferred at solid state speeds limited only by the channel capability.
- One embodiment of the present invention includes a power backup system which includes a rechargeable battery; this backup system is prepared to maintain power on the device should the outside power be interrupted. If such a power interruption occurs, the device manager takes whatever action is necessary to place all updated data onto mass storage before shutting down the entire device. Information about functional errors and operational statistics is maintained by the diagnostic module-error logger. Access to this module is via a device console and/or an attached personal type computer. The console and/or personal computer are the operator's access to the unit for such actions as powering the unit on and off, reading or resetting the error logger, inquiring of the unit's statistics, and modifying the device's management parameters and configuration.
- Memory device 200 is a self-contained module which includes interfaces with certain external devices. Its primary contact is with host computer 201 via host interface 204.
- Host interface 204 comprises, for example, a dedicated SCSI control processor which handles communications between host computer 201 and memory manager 205.
- An operator interface is provided via the console 207, which allows the user to exercise overall control of the memory device 200.
- Memory manager 205 handles all functions necessary to manage the storage of data in, and retrieval of data from, disk drive 210 (or high capacity memory devices) and solid state memory 208, the two storage media.
- the memory manager 205 consists of one or more microprocessors 205-1, associated firmware 205-2, and management tables, such as Address Translation (ADT) Table 205-3 and Least Recently Used (LRU) Table 205-4.
- ADT Address Translation
- LRU Least Recently Used
- Solid state memory 208 is utilized for that data which memory manager 205, based on its experience, deems most useful to host computer 201, or most likely to become useful in the near future.
- Magnetic disk 210 is the ultimate storage for all data, and provides the needed large storage capacity. It may include one or more disk drives
- Disk interface 209 serves as a separate dedicated control processor (such as an SCSI processor) for disk drive 210.
- a separate disk interface 209-1 through 209-N is associated with each disk drive 210-1 through 210-N.
- Information about functional errors and operational statistics is maintained by diagnostic module-error logger 206. Access to module 206 is obtained through console 207.
- Console 207 serves as the operator's access to the memory device 200 for such actions as powering the system on and off, reading or resetting the error logger, or inquiring of system statistics.
- the memory device 200 includes power backup system 203, which includes a rechargeable battery.
- Backup system 203 is prepared to maintain power to memory device 200 should normal power be interrupted. If such a power interruption occurs, memory manager 205 takes whatever action is necessary to place all updated data stored in solid state memory 208 onto magnetic disk 210 before shutting down the entire device.
- Figure 3 depicts a hardware controller block diagram of one embodiment of this invention.
- hardware controller 300 provides three I/O ports, 301, 302, and 303.
- I/O ports 301 and 302 are single-ended or differential, wide or narrow, SCSI ports.
- Each of I/O ports 303-1 through 303-N is a single-ended SCSI port used to connect controller 300 to disk drive 210 (which in this embodiment is a SCSI disk drive).
- Cache memory 308 (corresponding to memory 208) is a large, high-speed memory used to store, on a dynamic basis, the currently active and potentially active data.
- the storage capacity of cache memory 308 can be selected at any convenient size and, in the embodiment depicted in Figure 3, comprises 64 Megabytes of storage.
- Cache memory 308 is organized as 16 Megawords; each word consists of four data bytes (32 bits) and seven bits of error-correcting code.
- the storage capacity of cache memory 308 is selected to be within the range of approximately one-half of one percent (0.5%) to 100 percent of the storage capacity of the one or more magnetic disks 210 ( Figure 2) with which it operates.
- a small portion of cache memory 308 is used to store the tables required to manage the caching operations; alternatively, a different memory (not shown, but accessible by microcontroller 305) is used for this purpose.
- Error Detection and Correction (EDAC) circuitry 306 performs error detecting and correcting functions for cache memory 308.
- EDAC circuitry 306 generates a seven-bit error-correcting code for each 32-bit data word written to cache memory 308; this information is written to cache memory 308 along with the data word from which it was generated.
- the error-correcting code is examined by EDAC circuitry 306 when data is retrieved from cache memory 308 to verify that the data has not been corrupted since last written to cache memory 308.
- the modified Hamming code chosen for this embodiment allows EDAC circuitry 306 to correct all single-bit errors that occur and detect all double-bit and many multiple-bit errors that occur.
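- The single-error-correct / double-error-detect behavior described above can be illustrated with a generic extended Hamming (SECDED) code over 32-bit words with seven check bits. This is only a sketch of the general technique; the exact bit assignments of the "modified Hamming code" used by EDAC circuitry 306 are not given here, so the layout below (six parity bits at the power-of-two positions plus one overall parity bit) is an assumption for illustration.

```c
/* Illustrative (39,32) SECDED code: 32 data bits, six Hamming check bits at
 * positions 1,2,4,8,16,32, and one overall parity bit at position 0. */
#include <stdint.h>

uint64_t secded_encode(uint32_t data)
{
    uint64_t cw = 0;
    int d = 0;
    for (int pos = 1; pos <= 38; pos++) {          /* place data bits, skipping parity slots */
        if ((pos & (pos - 1)) == 0)
            continue;                              /* positions 1,2,4,8,16,32 are parity */
        if ((data >> d++) & 1)
            cw |= 1ULL << pos;
    }
    for (int p = 1; p <= 32; p <<= 1) {            /* six Hamming parity bits */
        int parity = 0;
        for (int pos = 1; pos <= 38; pos++)
            if ((pos & p) && ((cw >> pos) & 1))
                parity ^= 1;
        if (parity)
            cw |= 1ULL << p;
    }
    int overall = 0;                               /* overall parity for double-error detection */
    for (int pos = 1; pos <= 38; pos++)
        overall ^= (int)((cw >> pos) & 1);
    if (overall)
        cw |= 1ULL;                                /* bit 0 holds the overall parity */
    return cw;
}

/* Returns 0 = clean, 1 = single-bit error corrected in place, 2 = uncorrectable double-bit error. */
int secded_check(uint64_t *cw)
{
    int syndrome = 0, overall = 0;
    for (int pos = 0; pos <= 38; pos++)
        if ((*cw >> pos) & 1) {
            syndrome ^= pos;                       /* valid codewords XOR to zero */
            overall ^= 1;
        }
    if (syndrome == 0 && overall == 0)
        return 0;
    if (overall) {                                 /* odd overall parity: a single bad bit */
        *cw ^= 1ULL << syndrome;                   /* flip it (bit 0 if the syndrome is zero) */
        return 1;
    }
    return 2;                                      /* even parity but nonzero syndrome: two bad bits */
}
```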
- Error logger 307 is used to provide a record of errors that are detected by EDAC circuitry 306.
- The information recorded by error logger 307 is retrieved by microcontroller 305 for analysis and/or display. This information is sufficiently detailed to permit identification by microcontroller 305 of the specific bit in error (for single-bit errors) or the specific word in error (for double-bit errors).
- When EDAC circuitry 306 detects a single-bit error,
- the bit in error is corrected as the data is transferred to whichever interface requested the data (processor/cache interface logic 316, host/cache interface logic 311 or 312, and disk/cache interface logic 313) .
- a signal is also sent to microcontroller 305 to permit handling of this error condition (which involves analyzing the error based on the contents of error logger 307, attempting to scrub (correct) the error, and analyzing the results of the scrub to determine if the error was soft or hard) .
- When EDAC circuitry 306 detects a double-bit error,
- a signal is sent to microcontroller 305.
- Microcontroller 305 will recognize that some data has been corrupted. If the corruption has occurred in the ADT or LRU tables, an attempt is made to reconstruct the now-defective table from the other, then relocate both tables to a different portion of cache memory 308. If the corruption has occurred in an area of cache memory 308 that holds user data, microcontroller 305 attempts to salvage as much data as possible (transferring appropriate portions of cache memory 308 to disk drives 210-1 through 210-N, for example) before refusing to accept new data transfer commands.
- Microcontroller 305 includes programmable control processor 314 (for example, a 68360 microcontroller available from Motorola) , 64 kilobytes of EPROM memory 315, and hardware to allow programmable control processor 314 to control the following: I/O ports 301, 302, and 303, cache memory 308, EDAC 306, error logger 307, host/cache interface logic 311 and 312, disk/cache interface logic 313, processor/cache interface logic 316, and serial port 309. Programmable control processor 314 performs the functions dictated by software programs that have been converted into a form that it can execute directly.
- the host/cache interface logic sections 311 and 312 are essentially identical. Each host/cache interface logic section contains the DMA, byte/word, word/byte, and address register hardware that is required for the corresponding I/O port (301 for 311, 302 for 312) to gain access to cache memory 308. Each host/cache interface logic section also contains hardware to permit control via microcontroller 305. In this embodiment I/O ports 301 and 302 have data path widths of eight bits (byte). Cache memory 308 has a data path width of 32 bits (word).
- Disk/cache interface logic 313 is similar to host/cache interface logic sections 311 and 312.
- Disk/cache interface logic 313 also contains hardware to permit control via microcontroller 305.
- I/O port 303 has a data path width of eight bits (byte) .
- Processor/cache interface logic 316 is similar to host/cache interface logic sections 311 and 312 and disk/cache interface logic 313. It contains the DMA, half-word/word, word/half-word, and address register hardware that is required for programmable control processor 314 to gain access to cache memory 308.
- Processor/cache interface logic 316 also contains hardware to permit control via microcontroller 305.
- Serial port 309 allows the connection of an external device (for example, a small computer) to provide a human interface to the system 200.
- Serial port 309 permits initiation of diagnostics, reporting of diagnostic results, setup of system 200 operating parameters, monitoring of system 200 performance, and reviewing errors recorded inside system 200.
- serial port 309 allows the transfer of different and/or improved software programs from the external device to the control program storage (when memory 315 is implemented with EEPROM rather than EPROM, for example).
- firmware provides an active set of logical rules which is a real-time, full-time manager of the device's activities. Among its major responsibilities are the following: 1. Initialization of the device at power up.
- Cache management including the movement of data between cache memory and the various integral disk drives.
- control is transferred to the firmware executive controller. See Figure 35.
- the first task of the executive is to test the various hardware components and initialize the entire set of control tables. See Figure 36. After completion of initialization, the executive enters a closed loop which controls the background tasks associated with each disk drive. See Figure 37. When power to the device is interrupted, the executive initiates a controlled shutdown. See Figure 38. Between power up and shutdown, the system reacts to host commands and, most importantly, is proactive in making independent decisions about its best course of action to maintain the most efficient operation.
CONTROL TABLES
- the activities of the present invention are controlled by firmware which in turn is highly dependent on a set of logical tables.
- the configuration of the device and records of the activities and data whereabouts are maintained in tables which are themselves stored in memory in the described device.
- the Configuration (CFG) table is made up of unindexed items which describe the configuration of the device and some of the values defining the device's rules for operation.
- the Address Translation (ADT) table the primary function of this table is to maintain records of which disk bins' data is cached in which cache bins at each instant. It also maintains some historical records of disk bins' activities.
- the Least Recently Used (LRU) table this table is central to the logic of managing the cache bins of the device. It maintains information on which portions of cache bins contain valid data, which portions contain modified data, the order of most recent usage of the data in the cache bins, the recycling control information, and any other information necessary to the operation of the device.
- the Gap (GAP) table this table works in conjunction with the LRU table in keeping track of the modified portions of data within cache bins.
- This table comes into play only when there is more than one non-contiguous modified portion of data within any one cache bin.
- the Modified Bins (MOD) table This table keeps a bit-map type record of all disk bins, with an indicator of whether or not the cache bin currently related to the disk bin contains modified data which is waiting to be written to the disk.
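- A compact sketch of how this table set might be declared is given below. The field names follow the descriptions in this document, while the type widths, the null encodings, and the grouping into a single structure are illustrative assumptions only.

```c
#include <stdint.h>

typedef struct {                      /* CFG: unindexed configuration items (subset) */
    uint32_t cfg_cachbins;            /* CFG-CACHBINS: cache size, in bins */
    uint32_t cfg_secsize;             /* CFG-SECSIZE: sector size, in bytes */
    uint32_t cfg_drives;              /* number of configured disk drives */
} cfg_table_t;

typedef struct {                      /* one ADT-BINS line per logical disk bin */
    uint32_t adtb_cache_bin;          /* cache bin holding this disk bin, or a null value */
} adt_bins_line_t;

typedef struct {                      /* one LRU-BINS line per cache bin */
    uint32_t lrub_link_old, lrub_link_new;    /* LRU chain links */
    uint16_t lrub_valid_low, lrub_valid_high; /* valid sector range within the bin */
    uint16_t lrub_mod_low, lrub_mod_high;     /* modified sector range within the bin */
    uint32_t lrub_mod_gap;            /* index into GAP-GAPS, or a null value */
    uint8_t  lrub_locked;             /* eight one-bit lock flags */
    uint8_t  lrub_recycle;            /* recycle counter */
} lru_bins_line_t;

typedef struct {                      /* one GAP-GAPS line per outstanding gap */
    uint32_t gapg_bin, gapg_sector_beg, gapg_sector_end;
    uint32_t gapg_prev, gapg_next;
} gap_gaps_line_t;

typedef struct {                      /* the table set managed by the firmware */
    cfg_table_t      cfg;
    adt_bins_line_t *adt_bins;        /* indexed by logical disk bin number */
    lru_bins_line_t *lru_bins;        /* indexed by cache bin number */
    gap_gaps_line_t *gap_gaps;
    uint16_t        *mod_flags;       /* MOD table: one bit per disk bin, 16 bits per word */
} control_tables_t;
```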
- the Configuration table describes an operational version of the present invention and gives the basic rules for its operation.
- CFG-SECSIZE size in bytes, of the sectors on the disk drives.
- CFG-CACHMB size in megabytes, of entire cache.
- CFG-CACHBINS size in bins, of the entire cache.
- CFG-GSEXCPCT lower limit (pct) of all cache, global excess status
- CFG-GSEXCESB lower limit (bins) of global chain in excess status = CFG-GSEXCPCT * CFG-CACHBINS
DRIVE MODE PARAMETERS
- CFG-DMSWEEP lower limit (bins) of modified bins for sweep mode
- CFG-DRVSIZE Definition Capacity, in megabytes, of each disk drive.
- CFG-SECSIZE Definition Size, in sectors, of each cache bin and each disk bin.
- Initialization Set at time CFG table is created; may be reset via communication with the serial port when the device is totally inactive, offline, and has no data stored in its cache. In one example, this is preset to a number which creates a bin size of approximately 32KB. This is approximately 64 sectors if the sector size is 512 bytes.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- CFG-DSNORPCT Definition The minimum cache size assigned to a drive when that drive is in normal status; expressed as a percentage of all cache .
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- CFG-DSEXCPCT Definition The lower limit of the drive minimum cache size when the drive is in excess status; expressed as a percentage of the total cache, distributed over all drives. This also, and more importantly, defines the upper limit of a drive's normal status.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- Initialization Calculated at device startup time as a portion of the entire cache in the device.
- CFG-GSMARGB Definition Absolute lower limit of number of cache bins in the global cache chain when in global marginal status. The lowest number of cache bins permitted to be assigned to the global cache chain at any time; the logic of the device always keeps the number of cache bins in the global cache chain greater than this number.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- CFG-GSEXCPCT Definition The lower limit (% of total cache) of the amount of the total cache in the global cache chain when in global cache excess status.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- CFG-DMURGPCT Definition The percent of a drive's average share of all cache bins, which when modified, puts that drive into urgent mode.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- CFG-DMURGNTB Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- Initialization Set based on other parameters.
- CFG-DMURGNTB = CFG-DMURGPCT * CFG-CACHBINS / CFG-DRIVES
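- As a purely hypothetical worked example of this relationship: with CFG-DMURGPCT = 25%, CFG-CACHBINS = 2048, and CFG-DRIVES = 4, CFG-DMURGNTB = 0.25 x 2048 / 4 = 128, so a drive would enter urgent mode once roughly 128 of its cache bins held modified data.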
- CFG-DMSATPCT
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- Initialization Set based on other parameters.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- CFG-LAD-ADJUST Definition The value of the total host I/O count that, when attained, causes the counts of I/O's for each drive to be adjusted downward by the least-active-drive tally adjustment factor.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
- CFG-LAD-ADJUST Definition The divisor to be used for adjusting the count-relative tally of host I/O's for each drive.
- Initialization Preset to predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
FORMAT OF ADDRESS TRANSLATION (ADT) TABLE
- The Address Translation Table, working with the LRU, GAP, and MOD tables, maintains the information required to manage the caching operations.
- ADT Address Translation Table
- ADT-DISKS the indexed section containing information pertaining to each logical spindle included in the described device.
- ADT-BINS the indexed section of the ADT table containing information pertaining to each logical disk bin of the entire described storage device.
- the unindexed segment of the ADT table contains fields whose values are dynamically variable; these fields are used primarily as counters of the device activities.
- ADTC-READS The total number of host reads from the device since the device was powered on or since this field was last reset.
- ADTC-WRITES The total number of host writes to the device since the device was powered on or since this field was last reset .
- ADT-DISKS. THE FIRST INDEXED SEGMENT OF THE ADT TABLE
- the first tabular section of the ADT table contains information relating to each logical spindle. There is one line in this section for each logical spindle, and each line is referenced by logical spindle number.
- ADTD-LINE-BEG The first line number within the ADT-BINS table which relates to the referenced logical disk spindle. This is set during system configuration and is not changed during the storage device's operation.
- ADTD-LINE-BEG is used as an offset to locate lines in the ADT-BINS table that are associated with a specific logical spindle.
- ADTD-HEAD-POS The current position of the read/write head of the logical spindle. This is kept as a logical disk bin number and is updated each time the referenced disk head is repositioned.
- ADTD-SWEEP-DIR For each logical disk spindle, the direction in which the current sweep of the background writes is progressing. This is updated each time the sweep reverses its direction across the referenced disk.
- 4) ADTD-DISK-ACCESSES. A count of the number of times this logical disk spindle has been accessed by the host since the last time this field was reset. This is an optional field, but if present, is incremented each time the logical spindle is accessed for either a read or a write. This field may be used to influence the amount of cache assigned to each spindle at any given time.
- ADTD-DISK-READS A count of the number of times this logical disk spindle has been accessed by a host read operation since the last time this field was reset. This field may be used to influence the amount of cache assigned to each spindle at any given time.
- ADTD-DISK-WRITES A count of the number of times this logical disk spindle has been accessed by a host write operation since the last time this field was reset. This field may be used to influence the amount of cache assigned to each spindle at any given time.
- ADTD-LAD-USAGE A count related to the number of times this logical disk spindle has been accessed by the host. This field is incremented each time the logical spindle is accessed for either a read or a write, and it is recalculated when the current total count of host I/O's for all drives reaches a preset limit. This field is used to maintain balance among the various drives and the management of the amount of cache assigned to each spindle at any given time.
- ADTD-LINK-MORE the pointer to the ADTD line relating to the drive which has the next higher usage factor in the least-active-drive list. This is part of the bidirectional chaining of the LAD list lines. If this drive is the most active of all in the chain, ADTD-LINK-MORE will contain a null value.
- ADTD-LINK-LESS the pointer to the ADTD line relating to the drive which has the next lower usage factor in the least-active-drive list. This is part of the bidirectional chaining of the LAD list lines. If this drive is the least active of all in the chain, ADTD-LINK-LESS will contain a null value.
- the lines are grouped based on logical spindle number so that all lines related to bins of the first logical spindle are in the first section of the table, followed by all lines for bins of the second logical spindle, and so on for all logical spindles in the storage device.
- the ADT-BINS line number, adjusted by the offset for the specific logical spindle, directly corresponds to a logical disk bin number on the disk.
- When the host wants to access or modify data on the disk, it does so by referencing a starting disk sector address and indicating the number of sectors to be accessed or modified. For caching purposes, the starting sector address is converted into a logical disk bin number and sector offset within that logical disk bin.
- a disk sector address is converted into a logical disk bin number and a sector offset by dividing it by the number of sectors per logical disk bin. The remainder is the offset into the bin.
- the quotient is the disk bin identifier and is the index into the ADT table. Using this index, the condition of the specified disk bin can be determined directly from data in the ADT table; no search is required to determine cache-hits or misses. (A sketch of this lookup follows the ADT-BINS field descriptions below.)
- Each ADT-BINS line contains at least the following item:
- ADTB-CACHE-BIN This field contains the number of the logical cache bin which contains the data for the logical disk bin corresponding to this ADT table line number. By design, the value in ADTB-CACHE-BIN also points to the line in the LRU table related to the cached disk bin. A null value is stored in this field to indicate that the data which is stored in, or is destined to be stored in, the logical disk bin is not in cache memory. It is by means of this field that cache-hits can be serviced completely without any table search. This field is updated each time data for a logical disk bin is entered into or removed from the cache memory.
- each ADT-BINS line may contain one or more activity monitoring fields such as, but not limited to, the following:
- ADTB-BIN-ACCESSES A count of the number of times this logical disk bin of this spindle has been accessed by the host since the last time this field was reset.
- ADTB-BIN-READS A count of the number of times this logical disk bin of this spindle has been accessed by a host read operation since the last time this field was reset.
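- A minimal sketch of the search-free lookup described above follows. The ADTB-CACHE-BIN field and the ADTD-LINE-BEG offset are used as described in this section; the type widths, the null encoding, and the function name are assumptions.

```c
#include <stdint.h>

#define NULL_CACHE_BIN 0xFFFFFFFFu        /* assumed encoding of the "null" value */

typedef struct {
    uint32_t adtb_cache_bin;              /* ADTB-CACHE-BIN, or NULL_CACHE_BIN */
} adt_bins_line_t;

/* Convert a host sector address into a disk bin number plus a sector offset,
 * then consult the ADT table directly; no table search is needed for hit/miss. */
int adt_lookup(uint64_t sector_addr,      /* starting sector from the host command */
               uint32_t sectors_per_bin,  /* bin size in sectors (a CFG-derived value) */
               uint32_t adtd_line_beg,    /* ADTD-LINE-BEG for the addressed spindle */
               const adt_bins_line_t *adt_bins,
               uint32_t *cache_bin, uint32_t *sector_offset)
{
    uint64_t disk_bin = sector_addr / sectors_per_bin;            /* quotient: disk bin number */
    *sector_offset = (uint32_t)(sector_addr % sectors_per_bin);   /* remainder: offset in bin */

    const adt_bins_line_t *line = &adt_bins[adtd_line_beg + disk_bin];
    if (line->adtb_cache_bin == NULL_CACHE_BIN)
        return 0;                          /* miss: the disk bin is not cached */

    *cache_bin = line->adtb_cache_bin;     /* hit: this value also indexes the LRU table */
    return 1;
}
```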
- the LRU table maintains the basic information pertaining to all cache bins. This includes a variety of information, generally of the following nature: 1. The assignment of cache bins to logical spindles;
- the LRU table also provides certain redundancies for the data kept in the ADT table, thus contributing to system reliability. It is by means of the LRU table, the ADT table, and the GAP table that the system determines which cache bin to overwrite when cache space is needed for an uncached disk bin.
- the unindexed portion of the LRU table contains global data required to manage the caching process .
- the tabular portions provide the actual cache management information and are composed of pointers for LRU chaining purposes, pointers into the ADT and GAP tables, and the recycle control registers or flags. For reference purposes, these three segments will be referenced as follows:
- LRU-CONTROL the unindexed section of the LRU table which contains overall, global information concerning the cache bins.
- LRU-DISKS the indexed section which contains information pertaining to the cache bins associated with each logical spindle included in the device.
- LRU-BINS the indexed section of the LRU table which contains information pertaining to each cache bin of the entire storage device.
- 4. LRUC-GLOBAL-STATUS - current status of the global cache chain.
- 5. LRUC-GLOBAL-DIRTY - total number of modified cache bins.
- 6. LRUC-GLOBAL-MODE - current operating mode for the global cache.
LRU-DISKS. FIRST INDEXED SEGMENT OF LRU TABLE. SPINDLES INFORMATION
- LRUD-BINS-DIRTY number of modified cache bins assigned to the spindle.
- LRUD-DISK-LRU pointer to the oldest bin in the spindle's cache chain.
- LRUD-DISK-MRU pointer to the newest bin in the spindle's cache chain.
- LRUB-CHAIN - flag indicating whether the bin is in the global or in a drive chain.
- LRUB-VALID-LOW lowest sector in cache bin containing valid data.
- LRUB-LOCK-RD-FETCH locked for fetch from disk for host read.
- LRUB-LOCK-RD-AHEAD locked for read ahead based on host read.
- the unindexed items pertain to the cache management, and include the following single-valued items.
- LRUC-GLOBAL-BINS the total number of cache bins currently in the global cache chain.
- LRUC-GLOBAL-LRU the pointer to the oldest bin in the global cache chain.
- This LRU-CONTROL element points to the LRU-BINS table line whose corresponding cache data area is considered to be in the global chain and which has been left untouched for the longest period of time by a host read, cache ahead, or a cleaning operation. If there is new read activity for the referenced cache bin, it is updated and promoted to the MRU position for its spindle; if the new activity is a result of a host write, the cache bin is logically placed in the modified bins pool.
- the GLOBAL LRU cache bin is the first candidate for overwriting when new data must be placed in the cache for any spindle. When such overwriting does occur, this bin will be removed from the global chain and placed in one of the spindle's local LRU chains or in the modified pool.
- LRUC-GLOBAL-MRU the pointer to the LRU-BINS table line whose corresponding cache bin is in the global chain and which is considered to be the most recently used of the global cache bins.
- GLOBAL-MRU is updated every time a cache bin of any spindle is demoted from any local spindle's LRU chain, when a cache bin of the modified pool is cleaned by writing its modified data to its disk, or when, for any reason, a cache bin is chained to this position.
- This relates to the number of cache bins which contain unmodified data and which are currently assigned to the global cache and, therefore, are currently linked into the global cache chain.
- the status is reset each time a cache bin is placed in or removed from the global cache chain.
- the global status is always excess, normal, marginal, or minimal.
- LRUC-GLOBAL-DIRTY the total number of modified cache bins, for all spindles, regardless of the spindle to which they are assigned.
- These cache bins contain data which has been written from the host into cache and is currently waiting to be written from cache to disk.
- LRUC-GLOBAL-MODE the current operating mode for the global cache. This relates to the total number of cache bins assigned to all disk drives and which currently contain modified data. The mode is reset each time a cache bin for any drive is moved into or out of the modified pool. The global mode is always normal, urgent, or saturated.
- the first tabular section of the LRU table contains information relating to the cache bins assigned to each logical spindle. There is one line in this section for each logical spindle, and each line is referenced by logical spindle number.
- LRUD-BINS-CHAIN the number cache bins allocated to the corresponding spindle's private cache chain. This field is used to maintain the number of cache bins containing clean data and currently allocated to this spindle's private cache. This count excludes those cache bins assigned to this spindle but which are allocated to the global cache (and thus, linked into the global chain) .
- LRUD-BINS-DIRTY the number of cache bins currently assigned to the corresponding spindle, each of which contains some modified, or dirty, data which is currently awaiting a write to disk. This number is increased by one whenever an unmodified cache bin associated with this spindle is updated by the host, and it is decreased by one whenever data in a modified cache bin associated with this spindle is copied to the disk.
- LRUD-DISK-LRU points to the spindle's cache bin (and to the corresponding line in the LRUB table) which is in the spindle's cache chain and which has been untouched by host activity for the longest period of time. It is updated when new activity for the referenced cache bin makes it no longer the least-recently-used of the referenced spindle.
- the referenced cache bin is the next candidate for demoting to global cache when this spindle must give up a cache bin.
- a cache bin demoted from the LRU position of any local cache chain enters the global chain at the MRU position.
- LRUD-DISK-MRU points to the cache bin (and to the corresponding line in the LRUB table) which has been most recently referenced by host read activity.
- the referenced cache bin will always be in the spindle's private cache.
- LRUD-DISK-MRU is updated each time a cache bin of the referenced spindle is touched by a read from the host, when data for the spindle is read from disk into cache as part of a cache ahead operation, or when a cache bin is promoted based on the recycling procedures.
- the address of the accessed cache bin is placed in LRUD-DISK-MRU and the LRU-BINS chains are updated in the LRU-BINS table.
- When a bin is used from the LRU position of the global cache to fulfill the requirement for a cache bin for a given spindle, that global cache bin is allocated to the specific spindle at that time and either chained into the spindle's local chain or placed in the modified pool.
- Such action may require the receiving spindle, or some other spindle, to give its LRU cache bin to the top (MRU) of the global cache chain. Giving up the LRU cache bin in such a fashion does not decache the data in the bin; the data remains valid for the spindle from whose chain it was removed until the cache bin reaches the global LRU position and is reused for caching some other logical disk bin.
- LRUD-DISK-STATUS the current status of the cache chain for the disk drive. This relates to the number of cache bins which contain unmodified data and which are currently assigned to a given drive and, therefore, are currently linked into the drive's private cache chain. The status is reset each time a cache bin is placed in or removed from the disk's private cache chain. The drive status is always excess, normal, marginal, or minimal.
- LRUD-DISK-MODE the current operating mode for the disk drive. This relates to the number of cache bins assigned to the disk drive and which currently contain modified data.
- the mode is reset each time a cache bin for the given drive is moved into or out of the modified pool, or when a sweep timeout occurs.
- the drive mode is always normal, timeout, sweep, urgent, or saturated.
- Each LRU-BINS table line contains pointer fields plus other control fields.
- LRUB-DISK-ID the identity of the spindle for this cache bin.
- LRUB-CHAIN - flag indicating whether the cache bin is in the global cache chain or in the drive cache chain. This is a one-bit marker where a one indicates the cache bin is in the global chain, and a zero indicates the cache bin is in one of the drive's cache chains. When the cache bin is in the modified pool, this field has no meaning.
- LRUB-LINK-OLD the pointer to the LRUB line relating to the next-older (in usage) cache bin for the same drive. This is part of the bidirectional chaining of the LRU table lines. If this bin is the oldest of all in the chain, LRUB-LINK-OLD will contain a null value.
- LRUB-LINK-NEW the pointer to the LRUB line relating to the next-newer (in usage) cache bin for the same drive. This is the other half of the bidirectional chaining of LRU table lines. If this bin is the newest of all in the chain, LRUB-LINK-NEW will contain a null value. (A sketch of updating these links follows this field list.)
- LRUB-VALID-LOW the number of the lowest sector within the cache bin containing valid data. This is a bin-relative number.
- LRUB-VALID-HIGH the number of the highest sector within the cache bin containing valid data. This is a bin-relative number.
- LRUB-MOD-LOW the number of the lowest sector within the cache bin containing modified data, if any. This is a bin-relative number.
- LRUB-MOD-HIGH the number of the highest sector within the cache bin containing modified data. This is a bin-relative number.
- LRUB-MOD-GAP a pointer into the GAP table if any gaps consisting of uncached data exist within the modified portion of the currently cached portion within this cache bin. If one or more such gaps exist, this field points to the GAP table line containing information pertaining to the first of such gaps. If no such gaps exist, this field will contain a null value. Since normal workloads create few gaps and a background task is dedicated to the clearing of gaps, there will be very few, if any, gaps at any given instant during normal operations.
- LRUB-LOCKED - a set of flags which indicate whether or not the cache bin is currently locked. This set of flags indicates whether or not the corresponding cache bin is currently the target of some operation, such as being acquired from the disk, being modified by the host, or being written to the disk by the cache controller; such operation making the cache bin unavailable for certain other operations.
- the following sub-fields each indicate some specific reason for which the cache bin is locked; such a lock may restrict some other specific operations involving this cache bin. More than one lock may be set for a given bin at any one time. For purposes of quickly determining if a cache bin is locked, these flags are treated as one field made up of eight sub-fields of one bit each. If found to be locked, the individual lock bits are inspected for the reason(s) for the lock.
- LRUB-LOCK-RDHIT - a flag which, when set, indicates the cache bin is locked by a host read hit.
- LRUB-LOCK-RDMISS - a flag which, when set, indicates the cache bin is locked by a host read miss.
- LRUB-LOCK-WRHIT - a flag which, when set, indicates the cache bin is locked by a host write hit.
- LRUB-LOCK-WRMISS - a flag which, when set, indicates the cache bin is locked by a host write miss.
- LRUB-LOCK-RD-FETCH - a flag which, when set, indicates the cache bin is locked for a fetch from disk for a host read miss.
- LRUB-LOCK-GAP-READ - a flag which, when set, indicates the cache bin is locked for a read from disk to eliminate a gap in modified data.
- LRUB-LOCK-GAP-WRITE - a flag which, when set, indicates the cache bin is locked for a write to disk to eliminate a gap in modified data.
- LRUB-LOCK-SWEEP - a flag which, when set, indicates the cache bin is locked by the sweep for writing modified data to disk.
- LRUB-RECYCLE a field whose value indicates the desirability of recycling, or retaining in cache, the data currently resident in the cache bin. Its management and usage is as described in the recycling section of this document. The higher the value in this field, the more desirable it is to retain the data in this cache bin in cache when the cache bin reaches the LRU position in its spindle's LRU chain.
- This field may be one or more bits in size; for purposes of this description, it will be assumed to be four bits, allowing for a maximum value of 15.
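- The bidirectional chaining described above (LRUB-LINK-OLD / LRUB-LINK-NEW together with LRUD-DISK-LRU and LRUD-DISK-MRU) allows a cache bin to be promoted to the MRU position in constant time. The following is a minimal sketch of such an update, assuming the bin is already linked into the spindle's chain; the C names, widths, and null encoding are illustrative assumptions, not the patent's firmware symbols.

```c
#include <stdint.h>

#define NULL_LINK 0xFFFFFFFFu             /* assumed encoding of a null chain pointer */

typedef struct {
    uint32_t link_old;                    /* LRUB-LINK-OLD: next-older bin, or NULL_LINK */
    uint32_t link_new;                    /* LRUB-LINK-NEW: next-newer bin, or NULL_LINK */
} lru_bins_line_t;

typedef struct {
    uint32_t disk_lru;                    /* LRUD-DISK-LRU: oldest bin in the chain */
    uint32_t disk_mru;                    /* LRUD-DISK-MRU: newest bin in the chain */
} lru_disks_line_t;

/* Promote cache bin b to the MRU end of its spindle's chain after a host read touch. */
void promote_to_mru(lru_bins_line_t *bins, lru_disks_line_t *disk, uint32_t b)
{
    if (disk->disk_mru == b)
        return;                           /* already the most recently used bin */

    /* Unlink b: repair its older and newer neighbors (or the chain's LRU end). */
    if (bins[b].link_old != NULL_LINK)
        bins[bins[b].link_old].link_new = bins[b].link_new;
    else
        disk->disk_lru = bins[b].link_new;              /* b was the LRU bin */
    bins[bins[b].link_new].link_old = bins[b].link_old; /* b is not MRU, so link_new is valid */

    /* Relink b at the MRU end of the chain. */
    bins[b].link_old = disk->disk_mru;
    bins[b].link_new = NULL_LINK;
    bins[disk->disk_mru].link_new = b;
    disk->disk_mru = b;
}
```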
- the GAP table maintains the information required to manage the gaps in the valid, modified data within bins of cached data. For each spindle, the gaps are chained such that all gaps for a given cache bin are grouped into a contiguous set of links. There are three sections in the GAP table: the unindexed, or single-valued, items, and two sets of indexed, or tabular, segments. For reference purposes, these three segments will be referenced as follows:
- GAP-CONTROL the unindexed section of the GAP table.
- GAP-DISKS the indexed section containing information pertaining to each logical spindle included in the described device.
- GAP-GAPS the indexed section containing detailed information pertaining to each gap that currently exists.
- the unindexed items of the GAP table include the following single-valued items.
- GAP-GAPS The total number of gaps that currently exist for all logical spindles of the described storage device.
- GAP-UNUSED-FIRST The line number of the first unused line in the GAPS portion of the GAP table.
- GAP-UNUSED-NUMBER The number of unused lines in the GAPS portion of the GAP table.
- the first tabular section of the GAP table contains summary information about the gaps relating to cache bins assigned to each logical spindle. There is one line in this section for each logical spindle, and it is indexed by logical spindle number. For the described version of this invention, this portion of the GAP table will contain four lines; however, any number of logical spindles could be included within the constraints of and consistent with the other elements of the device.
- GAPD-NUMBER The number of gaps that currently exist for this logical spindle. This is increased by one whenever a new gap is created in a cache bin assigned to this logical spindle, and it is decreased by one whenever a gap for this logical spindle has been eliminated. If no gaps exist in cache bins assigned to this logical spindle, this value is set to zero.
- GAPD-FIRST A pointer which contains the GAP table line number of the first gap for the logical spindle.
- GAPD-LAST A pointer which contains the GAP table line number of the last gap for the logical spindle.
- the second tabular section of the GAP table contains detailed information about the gaps that exist. There is one line in this section for each gap in any cache bin assigned to any logical spindle, and it is indexed by arbitrary line number. Lines of the GAP-GAPS table are chained in such a way as to ensure that all GAP-GAPS lines relating to a given cache bin are chained into contiguous links. It is likely that, at any given time, very few, if any, of these lines will contain real information.
- the design of the storage device operations is to minimize the number of gaps that exist at any given moment, and when they do exist, the device gives a high priority to the background task of eliminating the gaps.
- 1) GAPG-DISK. The logical spindle with which this gap is associated.
- GAPG-BIN The cache bin number in which this gap exists.
- the value in this field acts as an index into the LRU table for the cache bin in which this gap exists.
- the value in this field is null if this line of the GAP-GAPS table is not assigned to any cache bin; when this value is null, it indicates that this line is available to be used for some new gap should such a gap be created by caching activities.
- GAPG-SECTOR-BEG The sector number in the bin identified in GAPG-BIN which is the first that contains non-valid data; this sector is the beginning of the gap. The value in this field is meaningless if this line of the GAP-GAPS table is not assigned to any spindle.
- GAPG-SECTOR-END The sector number in the bin identified in GAPG-BIN which is the last that contains non-valid data; this sector is the end of the gap. The value in this field is meaningless if this line of the GAP-GAPS table is not assigned to any spindle.
- GAPG-PREV A pointer to a line of the GAP-GAPS table which contains details pertaining to another gap of the same cache bin of the same logical spindle, which gap precedes this gap in the gap chain for the same cache bin for the same logical spindle. If this line of the table does not currently represent a gap, this field is used to maintain the position of this table line in the available gaps chain. Note that the firmware will maintain the gap chains in such a fashion so as to ensure that all gaps for a given cache bin assigned to a given spindle will be connected in contiguous links of the spindle gap chain.
- If this line of the GAP-GAPS table is not assigned to any spindle, this field points to the preceding unused line of the GAP-GAPS table. If this is the first link in the unused gap chain, this value will be set to null.
- GAPG-NEXT A pointer to a line of the GAP-GAPS table which contains details pertaining to another gap of the same cache bin of the same logical spindle, which gap follows this gap in the gap chain for this logical spindle and cache bin. If this line of the table does not currently represent a gap, this field is used to maintain the position of this table line in the available gaps chain. If this line of the GAP-GAPS table is not assigned to any spindle, the value in this field points to the succeeding unused line of the GAP-GAPS table. If this is the last link in the unused gap chain, this value will be set to null.
- the Modified Bins (MOD) Table working with the ADT, LRU, and GAP tables, maintains the information required to manage the background sweep operations of the described device.
- For each disk bin, the MOD table contains one bit. This bit is a one if modified data currently resides in the cache bin corresponding to the disk bin which relates to the said bit, and the bit is a zero if no such modified data exists for the disk bin.
- the bits are accessed in word-size groups, for example, 16 bits per access. If the entire computer word is zero, it is known that there is no modified data in the cache bins corresponding to the disk bins represented by those 16 bits.
- If the computer word is non-zero, there is modified data in cache for at least one of the related disk bins. Since the bits of the MOD table are maintained in the same sequence as the disk bins, and the starting point related to each disk is known, the MOD table can be used to relate disk bins to modified cached data. This information, along with the information in the ADT, LRU, and GAP tables, gives a method of quickly determining which cache bin's data should be written to disk at the next opportunity. See the descriptions of the ADT table items which contain information about disk drive sizes and current read/write head positions. There is one segment in the MOD table; it is indexed by an arbitrary value which is calculated from a disk bin number. Each line of the MOD table is a single 16-bit word. Each word, or line, contains 16 one-bit flags representing the condition of the corresponding 16 disk bins with respect to modified data. The reference name for the field is MOD-FLAGS.
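- A sketch of the word-at-a-time MOD bitmap handling described above is given below. The 16-bit word width follows the text; the function names, argument types, and the simplified "first dirty bin" scan (the actual sweep also considers the current head position) are assumptions.

```c
#include <stdint.h>

/* MOD table: one bit per disk bin, packed 16 bits per word (MOD-FLAGS). */

void mod_set(uint16_t *mod_flags, uint32_t disk_bin)
{
    mod_flags[disk_bin / 16] |= (uint16_t)(1u << (disk_bin % 16));
}

void mod_clear(uint16_t *mod_flags, uint32_t disk_bin)
{
    mod_flags[disk_bin / 16] &= (uint16_t)~(1u << (disk_bin % 16));
}

/* Scan a drive's region of the MOD table, 16 disk bins at a time, and return the
 * first disk bin with modified data, or -1 if none; a zero word skips 16 bins at once. */
int32_t mod_first_dirty(const uint16_t *mod_flags, uint32_t first_bin, uint32_t bin_count)
{
    if (bin_count == 0)
        return -1;
    for (uint32_t w = first_bin / 16; w <= (first_bin + bin_count - 1) / 16; w++) {
        if (mod_flags[w] == 0)
            continue;                              /* no modified data in these 16 bins */
        for (uint32_t bit = 0; bit < 16; bit++) {
            uint32_t bin = w * 16 + bit;
            if (bin < first_bin || bin >= first_bin + bin_count)
                continue;
            if (mod_flags[w] & (1u << bit))
                return (int32_t)bin;               /* this disk bin's cache bin needs a write */
        }
    }
    return -1;
}
```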
FORMAT OF CACHE BINS (BAL) TABLE
- a temporary BAL table is created for each host I/O.
- the BAL table for a given host I/O is made up of a list of the disk bins involved in the I/O and their corresponding cache bins, if any.
- a BAL table will contain one line for each disk bin which is part of the I/O.
- a BAL table contains information necessary for fulfilling the host I/O. This includes a variety of information which includes the entire set of information required to handle the I/O.
- the indexed portion of the table gives details about each disk bin involved in the host I/O.
- the unindexed portion of the BAL table contains data describing the I/O.
- the tabular portion provides actual cache information.
- these two portions will be referenced as follows: 1.
- BAL-COMMAND the unindexed section of the BAL table which contains overall information concerning the I/O.
- BAL-BINS the indexed section which contains information pertaining to each disk bin associated with the host I/O.
- the unindexed items in the BAL-COMMAND section describe the host I/O, and include the following single-valued items.
- BALC-DISK-ID the logical spindle to which the host I/O was addressed.
- BALC-ADDRESS the logical sector number of the first sector of data, on the specified logical spindle, to be transferred for the host I/O.
- BALC-SIZE the number of sectors of data to be transferred in the host I/O.
- BALC-HIT A flag indicating whether or not all the data (for a read command), or the whole data area (for a write command), for the entire host I/O is represented in cache. (A sketch of this determination follows the BAL-BINS field list below.)
- the indexed items in the BAL-BINS section describe the details about each disk bin involved in the host I/O. There is one line in the table for each disk bin involved in the host I/O. Each line of a BAL table contains the following items. 1. BALB-DBIN - the disk bin number.
- BALB-CBIN The corresponding cache bin, if any, in which data of the disk bin is currently cached.
- BALB-BEGSEC The beginning sector number within the disk bin required for the host I/O.
- BALB-ENDSEC The final sector number within the disk bin required for the host I/O.
- BALB-VALID-LOW The beginning sector number within the cache bin which is in cache and which is required for the host I/O. For a cache hit, this will match the value in BALB-BEGSEC if this is the first bin involved in the I/O. For a cache hit on subsequent bins, if any, this will indicate the first sector (sector 0) of the bin.
- BALB-VALID-HIGH The final sector number within the cache bin which is in cache and which is required for the host I/O. For a cache hit, this will match the value in BALB-ENDSEC if this is the last bin involved in the I/O; for a cache hit on bins other than the last, this will indicate the final sector of the bin.
- BALB-GAPS A marker indicating whether or not there are any gaps in the required cached area of this cache bin.
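- A sketch of how BALC-HIT might be derived for a read, by walking the disk bins named in the BAL-BINS lines, follows. The field names mirror the table descriptions above; the structures, widths, null encoding, and function name are assumptions, and gaps in the modified data (BALB-GAPS) are ignored here for brevity.

```c
#include <stdint.h>

#define NULL_CACHE_BIN 0xFFFFFFFFu

typedef struct {                        /* subset of an ADT-BINS line */
    uint32_t adtb_cache_bin;
} adt_bins_line_t;

typedef struct {                        /* subset of an LRU-BINS line */
    uint16_t lrub_valid_low, lrub_valid_high;
} lru_bins_line_t;

/* Returns 1 (BALC-HIT) if every sector of the read is represented in cache. */
int balc_hit_for_read(uint64_t balc_address, uint32_t balc_size,
                      uint32_t sectors_per_bin, uint32_t adtd_line_beg,
                      const adt_bins_line_t *adt_bins,
                      const lru_bins_line_t *lru_bins)
{
    uint64_t first_bin = balc_address / sectors_per_bin;
    uint64_t last_bin  = (balc_address + balc_size - 1) / sectors_per_bin;

    for (uint64_t db = first_bin; db <= last_bin; db++) {
        /* BALB-BEGSEC / BALB-ENDSEC: the sectors of this disk bin needed by the I/O. */
        uint32_t begsec = (db == first_bin)
                        ? (uint32_t)(balc_address % sectors_per_bin) : 0;
        uint32_t endsec = (db == last_bin)
                        ? (uint32_t)((balc_address + balc_size - 1) % sectors_per_bin)
                        : sectors_per_bin - 1;

        uint32_t cbin = adt_bins[adtd_line_beg + db].adtb_cache_bin;   /* BALB-CBIN */
        if (cbin == NULL_CACHE_BIN)
            return 0;                                  /* an uncached bin: not a full hit */
        if (lru_bins[cbin].lrub_valid_low > begsec ||
            lru_bins[cbin].lrub_valid_high < endsec)
            return 0;                                  /* required sectors not all valid */
    }
    return 1;
}
```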
- the caching operations of the described device are based, in part, on the concepts of drive modes and cache statuses.
DRIVE MODES
- the two types of drive modes are the global drive mode and a mode for each individual drive.
- the modes are based on the number of modified cache bins assigned to each drive.
- the global mode is based on the total number of modified cache bins for all drives. See Figure 9.
- the purpose of the modes is to control the described device's actions with respect to its discretionary disk activity such as the background sweep, cache ahead, read ahead, and the cache management of recycling.
- An illustration of a drive cache with respect to drive operating modes is given in Figure 5.
- An illustration of the global cache with respect to global operating modes is given in Figure 7.
- the cache of each drive is always in one of the defined drive modes.
- the possible drive modes are normal, timeout, sweep, urgent, and saturated.
- Table CM3 shows the rules for setting the drive modes.
- CFG-DMURGNTB < dp < CFG-DMSATURB: urgent mode (may affect performance), where dp is the number of modified cache bins for the drive.
- the global cache of the described device is always in one of the defined global modes.
- the possible global modes are normal, urgent, and saturated.
- Table CM4 shows the rules for setting the global drive mode. See Figure 17.
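- A hedged sketch of mode selection using the CFG thresholds named earlier (CFG-DMSWEEP, CFG-DMURGNTB, CFG-DMSATURB) is shown below. The exact boundaries are those of Tables CM3 and CM4, which are not reproduced here, so the comparison operators and the global threshold names (cfg_gmurgntb, cfg_gmsaturb) are assumptions.

```c
typedef enum { DM_NORMAL, DM_TIMEOUT, DM_SWEEP, DM_URGENT, DM_SATURATED } drive_mode_t;
typedef enum { GM_NORMAL, GM_URGENT, GM_SATURATED } global_mode_t;

/* Classify a drive by its count of modified (dirty) cache bins. */
drive_mode_t drive_mode(unsigned dirty_bins, int sweep_timeout_expired,
                        unsigned cfg_dmsweep, unsigned cfg_dmurgntb, unsigned cfg_dmsaturb)
{
    if (dirty_bins >= cfg_dmsaturb)
        return DM_SATURATED;                 /* writes may have to wait for cleaning */
    if (dirty_bins >= cfg_dmurgntb)
        return DM_URGENT;                    /* background writing may affect performance */
    if (dirty_bins >= cfg_dmsweep)
        return DM_SWEEP;                     /* enough dirty bins to justify a sweep */
    if (sweep_timeout_expired && dirty_bins > 0)
        return DM_TIMEOUT;                   /* idle too long with a little dirty data */
    return DM_NORMAL;
}

/* Classify the global cache by the total number of modified bins over all drives. */
global_mode_t global_mode(unsigned total_dirty_bins,
                          unsigned cfg_gmurgntb, unsigned cfg_gmsaturb)
{
    if (total_dirty_bins >= cfg_gmsaturb)
        return GM_SATURATED;
    if (total_dirty_bins >= cfg_gmurgntb)
        return GM_URGENT;
    return GM_NORMAL;
}
```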
- the operating rules for a given drive are based on that drive's operating mode.
- the drive mode governs various cache activities relating to the corresponding disk drives; in particular, it governs, drive by drive, the activation of the background sweep modules, the cache-ahead modules, recycling, and the read-ahead modules.
- Table CM1 summarizes the drive mode -operating rules relationships.
- the operating rules for the drives may be overridden by the rules based on the global operating mode. If the global mode so indicates, a drive may be forced to operate in a manner other than would be indicated by its own cache mode.
- Table CM2 summarizes the relationships between the global modes and the drive modes. Global mode cedes control of individual drives to drive mode except as noted in table CM2. In case of conflict between the rules of the two types of modes, the global mode operating rules override the drive mode operating rules.
CACHE CHAIN STATUSES
- There are two types of cache chain statuses: a global cache status and a drive cache status for each drive.
- the purpose of the statuses is to help to manage the cache activities, such as to provide plateaus for amounts of cache assigned to each drive under the varying balance of workloads among the drives.
- the global cache status is based on the number of cache bins in the global cache.
- Each drive's cache status is based on the number of bins in each drive's cache chain. While there could be any reasonable number of cache statuses, for purposes of this discussion, there will be assumed to be four; these are given names of minimal, marginal, normal, and excess. In the relationships between the statuses, excess is considered above, or higher than, normal; normal is considered above, or higher than, marginal; and marginal is considered above, or higher than, minimal.
- One of the functions of the statuses is to facilitate the reallocation of cache from one drive to another as different drives become the most active of the drives. As a drive becomes the target of many host I/O's in a short period of time, the cache assigned to that drive will be drawn from the other drives in an orderly fashion. The other drive which is the least active of those in the highest status will give up cache first.
- Once that drive has been drawn down to the next lower status, cache from the next least active drive will be drawn off until that drive reaches that same lower level.
- The sizes of the categories of cache chains on which the statuses are based are shown in Figures 4 and 6. These sizes are chosen such that they enable the described device to act in an efficient manner based on the current cache conditions.
- Figure 57 illustrates the growth and recession of the cache allocations using the three plateaus as described herein.
- Figure 58 illustrates the cache allocations with an assumed five plateau configuration, a logical extension of the described concepts.
- Each cache chain status defines a condition or situation under which the corresponding component is operating and how that component interacts with other components.
- the minimum status is the one in which the component cannot under any condition give up a cache bin.
- this status generally defines the number of cache bins required to handle one host I/O of the largest acceptable size.
- this status generally defines the number of cache bins required to maintain the cache chains intact. Assuming more than one drive is configured into the described device, not all components can simultaneously be in the minimal status except during a period of a large number of writes by the host.
- the marginal cache status is the one in which the component has sufficient cache bins available to operate but is not operating in the optimal manner. Assuming more than one drive is configured into the described device, not all components can simultaneously be in the marginal status except during a period of a large number of writes by the host.
- the marginal cache status defines the smallest the cache chain for a given drive may become when the described device is operating in the generally normal fashion. In other words, each drive will usually have at least the marginal amount of cache protected from depletion by the needs of other drives.
- the normal cache status is the one in which the device logic desires to maintain the component for best overall device performance.
- a very active drive will generally operate with a number of cache bins hovering in the neighborhood of the upper limit of the normal status.
- a very inactive drive will generally operate with a number of cache bins hovering in the neighborhood of the lower limit of the normal status.
- the excess cache status is the one in which the component has more than the desired maximum cache bins assigned to it for optimal overall device performance.
- the global cache chain will begin operation in this status when the device is powered up. As various drives become active, the global status will move into the normal status. A drive will not likely ever be in the excess status.
- the primary purpose of the excess status is to delineate the upper bound of normal cache status. This is important to maintaining the balance of the caches assigned to the various drives under changing workloads.
DRIVE CACHE CHAIN STATUS DETERMINATION
- the cache chain for each drive is always in one of the defined drive cache chain statuses. Table CS1 shows the rules for setting the drive cache statuses.
- the global cache chain is always in one of the defined global cache chain statuses.
- Table CS2 shows the rules for setting the global cache status.
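- A similar hedged sketch for the cache chain statuses follows. The actual boundaries are given by Tables CS1 and CS2, which are not reproduced here; the three threshold arguments, loosely tied to the CFG percentage parameters described earlier, are assumptions.

```c
typedef enum { CS_MINIMAL, CS_MARGINAL, CS_NORMAL, CS_EXCESS } chain_status_t;

/* Classify a cache chain (a drive's private chain or the global chain) by the
 * number of clean cache bins currently linked into it.  The thresholds mark the
 * minimal/marginal, marginal/normal, and normal/excess boundaries, however
 * Tables CS1 and CS2 actually define them. */
chain_status_t chain_status(unsigned bins_in_chain,
                            unsigned minimal_limit,   /* e.g. enough bins for one largest host I/O */
                            unsigned marginal_limit,  /* e.g. derived from CFG-DSNORPCT */
                            unsigned excess_limit)    /* e.g. derived from CFG-DSEXCPCT or CFG-GSEXCESB */
{
    if (bins_in_chain >= excess_limit)
        return CS_EXCESS;
    if (bins_in_chain >= marginal_limit)
        return CS_NORMAL;
    if (bins_in_chain >= minimal_limit)
        return CS_MARGINAL;
    return CS_MINIMAL;
}
```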
- the global and drive cache chain statuses interact to determine where to find the cache bin to be reused.
- the drive may acquire the bin via one of three types of actions. See Figure 11.
- Stealing is the preferred method for a drive to obtain a cache bin when a drive needs one for any purpose, for a read of any nature or a write.
- a drive which needs a cache bin to reuse may, depending on the cache chain statuses, "steal" a bin from the global cache chain. Since the drive is stealing the cache bin, it need not give up a cache bin; the global cache chain can provide a cache bin in some manner. The data in the global LRU cache bin is decached and the cache bin is made available for the drive's use.
- the global cache may be compensated by taking a cache bin from some other drive.
- the global cache will usually take a bin from the least active drive in order to maintain the global cache within the normal status.
- One set of conditions under which a drive may steal a cache bin: 1. The global cache chain is in normal status. 2. No other drive has a cache chain status better than marginal. 3. The cache chain status of the stealing drive is not excess.
- the following set of conditions for stealing will generally occur only during the startup time of the device when nothing is in cache. These conditions are:
- the global cache chain is in excess status.
- the set of conditions for stealing a cache bin with compensation will be the normal ones encountered during the operation of the described device. These conditions are: 1. The global cache chain is not in the excess status .
- a bin is to be stolen from the least active cache chain which has the highest cache status.
- the global chain is considered to be the most active. See Figures 11 through 16. The following general logic is used: 1. The best, least active chain is identified.
- the method for obtaining that cache bin depends, to some extent, on the intended usage of the cache bin.
- the first set of conditions for buying a cache bin is:
- 1. The buying drive's cache chain must be in the excess status; the cache bin may be used for either a read or a write.
- the second set of conditions for buying a cache bin is a combination of the following:
- the global cache chain must be in the minimal or marginal status.
- the buying drive's cache chain is not in the excess status.
- No other drive in the least active drive list has a status better than marginal.
- the third set of conditions for buying a cache bin is a combination of the following:
- the cache bin is to be used for a read operation.
- the global cache chain must be in the minimal status.
- the buying drive's cache chain is in the minimal status.
- No other drive in the least active drive list has a status better than marginal .
- the buying drive's LRU cache bin is rechained to the MRU position of the global cache chain.
- the global LRU cache bin is rechained to the MRU position of the cache chain of the drive requiring a cache bin.
- the data in the global LRU cache bin is decached and that cache bin is made available for the drive's use. If no drive is found to be able to donate a cache bin, the system is overloaded with modified bins, i.e. in a saturated state. In this condition, the management of all drives is actively trying to write modified bins to the corresponding drives. A drive begging for a bin must wait until a cache bin becomes available from some drive's cache, even from its own.
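- A condensed sketch of the steal / steal-with-compensation / buy / beg selection described in this section is given below. Only the conditions spelled out above are encoded; the enum names, the function, the least-active-drive "best_other" summary, and the assumption that compensation requires some other drive to be above marginal are illustrative, and the tie-breaking details of Figures 11 through 16 are omitted. The chain_status_t enum is re-declared from the earlier sketch for self-containment.

```c
typedef enum { CS_MINIMAL, CS_MARGINAL, CS_NORMAL, CS_EXCESS } chain_status_t;
typedef enum { ACQ_STEAL, ACQ_STEAL_WITH_COMPENSATION, ACQ_BUY, ACQ_BEG } acquire_t;

/* Decide how the requesting drive obtains a reusable cache bin.
 * global_status - status of the global cache chain
 * my_status     - status of the requesting drive's chain
 * best_other    - best status found among the other drives' chains
 * for_read      - nonzero if the bin is needed for a read */
acquire_t choose_acquisition(chain_status_t global_status,
                             chain_status_t my_status,
                             chain_status_t best_other,
                             int for_read)
{
    /* Plain steal (typically only near startup): global chain in excess, or global
     * normal with no other drive above marginal and this drive not in excess. */
    if (global_status == CS_EXCESS ||
        (global_status == CS_NORMAL && best_other <= CS_MARGINAL && my_status != CS_EXCESS))
        return ACQ_STEAL;

    /* The usual case: steal the global LRU bin and compensate the global chain
     * by taking a bin from the least active drive with the highest status. */
    if (best_other > CS_MARGINAL)
        return ACQ_STEAL_WITH_COMPENSATION;

    /* Buying: the drive gives its own LRU bin to the global MRU and takes the global LRU. */
    if (my_status == CS_EXCESS ||
        (global_status <= CS_MARGINAL && my_status != CS_EXCESS && best_other <= CS_MARGINAL) ||
        (for_read && global_status == CS_MINIMAL && my_status == CS_MINIMAL &&
         best_other <= CS_MARGINAL))
        return ACQ_BUY;

    return ACQ_BEG;    /* saturated: wait until some drive's cleaning frees a bin */
}
```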
DECACHE DATA FROM A CACHE BIN
- When data is decached from a cache bin, the references to the corresponding drive bin in the ADT table, LRU table, and GAP table, if any, are updated to show that the drive bin whose data was cached in the given cache bin is no longer cached.
- a bin that is a candidate for decaching will never have any references in the MODIFIED BINS table since that would indicate the host has written some data into this cache bin which has not yet been written to the drive.
- such a cache bin would be in the modified pool, and not in the global cache chain.
- the cache bin to be decached will be the cache bin currently at the LRU position of the global cache chain. This generally will be a cache bin whose data has not been referenced for a relatively long time.
- the cache bm chosen is not necessarily the one with absolutely longest time since access; due to the dynamic and ever-changing assignment of the described device ' s cache mto a global cache chain and multiple private cache chains for each drive, there may be some cache bm assigned to a drive other than the one currently involved in the reason for the decaching which has been cached but unreferenced for a longer period of time. This exception is intentional, it is part of the design which is intended to prevent activity related to any one drive from creating excessive interference with the caching performance of data for another drive; it enhances the effectiveness of the described caching scheme.
- the primary condition which must be satisfied in order for data m a cache bm to be decached is that the bm must be inactive; that is, it is not at this instant the subject of any host or background activity. It is highly unlikely that the global LRU bm would have any activity since most activities would reposition it out of the global cache chain.
- a gap is created when data destined for more than one portion of a disk bin are cached in a single cache bin, the cached portions of data are both in the modified or dirty condition, and the cached portions are not contiguous. This is dealt with by bookkeeping in the LRU and GAP tables. It is generally desirable to eliminate gaps as much as possible since they complicate the process of determining cache hits. There are several possibilities for the relationship of the location of new data sent from the host with respect to previously cached data.
- the new data's locality may relate to the localities of previously cached data in a variety of ways.
- the new data may share no cache bins with the old, in which case no gaps in the modified data will occur. If the new data does share cache bins with previously cached data, it may share the cache bins with old data in several ways. If the new data is contiguous to, or overlapping, the old modified, cached data, no gaps can exist. If all the previously cached data is clean, no gap is allowed to be created since some or all of the old data may be logically decached to avoid the gap.
- a gap may be eliminated in either of two ways, the method being selected to be the most beneficial to the overall operation of the device. The decision of which method to use in eliminating a gap is based on the relative sizes of the dirty data segments, the current drive mode, and the global mode. If the modes indicate that there is a relatively large amount of modified data to be written from cache to disk, the decision will be to write data to eliminate the gap.
- the data that should be located in the intervening space may be read from the disk into cache and marked dirty even though it is not actually modified; the gap will then no longer exist. This method is chosen when the drive and global modes are both normal, and the ratio of the gap size to the size of the smaller of the adjacent cached areas is less than a predetermined value.
- the data occupying the modified areas within the cache bin may be written from cache to disk; in this case the gap is eliminated by cleaning the dirty pieces of data and then decaching one of the cached areas.
- the data to be decached is selected based on the relative sizes and directions within the cache bin of the cached portions so as to retain the data more likely to be useful in cache. This is usually the larger of the two segments, but may, under some circumstances, be the smaller. Regardless of the method chosen to eliminate a gap, or gaps, in the modified data within a given cache bin, the process involves several steps. See Figures 39, 51, and 50. 1. The GAP table is updated to show a gap elimination is in progress on a given cache bin. 2. The LRU table is updated to show a gap elimination is in progress on a given cache bin.
- a read from, or a write to, a disk drive is initiated.
- the device manager continues to handle other tasks, both for the host and background.
- the gap elimination module completes the LRU table and GAP table bookkeeping required to eliminate the gap.
- a gap is an area within a bin, its size being the number of sectors between the end of the modified data area preceding the gap and the beginning of the modified data area following the gap.
- gapsize is the number of sectors in the gap.
- forward-size is the size of the cached portion within the cache bin which is contiguous to the forward-most sector of the gap.
- backward-size is the size of the cached portion within the cache bin which is contiguous to the rearward-most sector of the gap.
- forward-ratio is gapsize divided by forward-size.
- backward-ratio is gapsize divided by backward-size.
- gapwrite is a preset ratio of the gapsize to forward-size or backward-size which, when exceeded, causes the gap to be eliminated by writing the modified data from cache to disk rather than by reading the gap data from disk into cache.
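- The definitions above imply a simple selection rule for the two gap-elimination methods. The following Python sketch is a non-authoritative illustration; the parameter names mirror the terms defined above, and the single gapwrite threshold is an assumption standing in for the preset ratio in the CFG table.

```python
def choose_gap_elimination(gapsize, forward_size, backward_size,
                           drive_mode_normal, global_mode_normal, gapwrite):
    """Illustrative sketch: return 'read' to fill the gap from disk (marking the
    gap area dirty), or 'write' to clean the adjacent dirty areas and decache
    one of them.  Names are taken from the definitions above."""
    forward_ratio = gapsize / forward_size      # gapsize relative to the forward cached area
    backward_ratio = gapsize / backward_size    # gapsize relative to the backward cached area
    ratio_to_smaller_area = max(forward_ratio, backward_ratio)
    if drive_mode_normal and global_mode_normal and ratio_to_smaller_area < gapwrite:
        return 'read'    # small gap, light write load: read the gap from disk into cache
    return 'write'       # otherwise eliminate the gap by writing dirty data to disk
```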
- One of the goals of the present invention is to have in cache, at any given time, that data which is expected to be accessed by the host in the near future. Much of the data retained in cache has been placed there either directly by the host or has been read from the disk as a direct result of a host activity. In addition to that data, it is desirable to anticipate future host requests for data to be read from the device, and to prefetch that data from the disk into cache. There are several aspects to the prefetching of data. Some of these have a positive effect and some have a negative effect on performance. On the positive side, successful prefetching into cache of data that is actually requested by the host precludes read cache misses. This improves the overall device's performance.
- Since a read cache-ahead action is a background type of activity and only uses the private channel between disk and cache, it has a minimal negative impact on the caching device's response time to host I/O activity.
- the cache-ahead is given a lower priority than any incoming host I/O request.
- the controller checks for the desirability for a cache-ahead after every host I/O which is a read operation regardless of whether the I/O was a cache hit or a cache miss.
- a major factor in limiting the cache-ahead activity is the lack of need for its operation following most host I/O's.
- the described caching device determines the number of data segments of the same size as the current host I/O which remain between the location of the end of the current host I/O data and each end of the cached bin containing that data. If this computed number of data segments is more than a predetermined number, the cache unit can handle that number of sequential, contiguous host read I/O's within that cache bin before there is a need to fetch data for the succeeding bin from the disk into the cache memory.
- the caching device will attempt to initiate action to fetch the data from the succeeding disk drive bin so that the service to the host can proceed with the least disk-imposed delays. Conversely, if the caching device were to ignore the above-described locality factor and always fetch the next data bin after every cache read-miss, many unneeded bins of data would be fetched from disk into cache memory.
- If the succeeding (forward) bin is not already cached and the proximity factor favors caching, the forward bin is cached at this time. If the succeeding bin is already cached, the bin preceding the host I/O is considered; if it is not already cached, and the proximity factor favors caching, the data from the preceding bin is cached at this time. Of course, if both of these candidate bins had been cached previously, the cache-ahead module has no need to do any caching. A very important benefit accrues from this cache-ahead, cache-back feature. If related bins are going to be accessed by the host in a sequential mode, that sequence will be either in a forward or backward direction from the first one accessed in a given disk area.
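- The proximity test described above can be sketched as follows. This Python fragment is illustrative only; the offsets, the bin size, and the cached flags are hypothetical inputs standing in for values the firmware would take from the bin address list and the LRU table.

```python
def cache_ahead_candidate(io_start_offset, io_size, bin_size_sectors,
                          proximity_factor, forward_bin_cached, backward_bin_cached):
    """Illustrative sketch of the cache-ahead/cache-back decision: count how many
    more I/O's of the current size fit between the current I/O and each end of
    the cache bin, and schedule the adjacent disk bin for caching when that
    count falls below the proximity factor.  Returns 'forward', 'backward', or None."""
    io_end_offset = io_start_offset + io_size
    segments_ahead = (bin_size_sectors - io_end_offset) // io_size   # room toward the end of the bin
    segments_behind = io_start_offset // io_size                     # room toward the start of the bin
    if segments_ahead < proximity_factor and not forward_bin_cached:
        return 'forward'     # fetch the succeeding disk bin into cache
    if forward_bin_cached and segments_behind < proximity_factor and not backward_bin_cached:
        return 'backward'    # fetch the preceding disk bin into cache
    return None              # enough cached data remains; no cache-ahead is needed
```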
- the background sweep and prefetches for a given drive alternately use the resources. See Figure 37.
- any scheduled prefetches can proceed without concern for the sweep.
- the sweep can proceed to use the resources as needed.
- the cache-ahead proceeds as shown in Figure 34.
- the cache management can do a cache-ahead in two steps.
- the first step would entail a seek to position the read-write head of the disk drive to the proper disk track. If the drive were needed for servicing a higher priority operation, the cache-ahead could be aborted at this point. If no such operation existed at the end of the seek, the cache management would proceed to read the data from disk into cache to complete the cache-ahead operation.
- the present invention is designed to handle operations between the host and the cache and between the disks and cache simultaneously.
- a system of cache bin locks is incorporated to ensure that no data is overwritten or decached while it is the object of some kind of I/O activity.
- the cache bin lock flags are included in the LRU table; see the section on the LRU format for a description of those flags. At any given time, a bin may be locked for more than one reason; all locks must be considered in determining if simultaneous operations may proceed. The most restrictive lock prevails. If a cache bin is found locked by a background task, there is no problem, since background tasks can be delayed. If a host I/O request involves a locked cache bin, there can be one of three results based on the lock flags of a bin: the new operation may proceed, the new operation may be delayed (queued for later completion), or, in rare cases, it may proceed using an alternate cache bin. The following notes discuss the various considerations and are referenced in the tables describing the handling of potential conflicts.
- GNA A background operation usually will not be initiated if its target cache bin is currently locked for any reason.
- GNB Multiple operations involving a given cache bin will be handled in the order received or initiated, all other conditions being equal . This is especially important when one or both operations are modifying the data in the cache bin.
- GNC Use of a newly assigned, alternate cache bin for a host I/O results in the decaching of the cache bin currently assigned to the disk bin. Only clean bins can be decached.
- Bins that are the subjects of cache aheads or read aheads may be abandoned at the end of those operations if so doing will contribute to the overall performance of the present invention.
- a cache bin that contains any modified sectors is considered dirty, and resides in the pool of modified bins.
- a read hit refers to the existence of both the cache bin and the currently valid (cached) sectors in that cache bin.
- Gaps are taken into account when determining read hits.
- a locked bin effectively causes a read hit to be handled as a read miss since a locked bin will delay fulfilling the host request.
- a read miss may result in a read fetch, or in both a read fetch and a read ahead.
- the LRU table is updated for valid sectors at the time each read fetch and each read ahead is completed.
- a write hit refers to the cache bin only; the existence or absence of valid sectors is not considered for the write hit/miss determination.
- a host write immediately marks the sectors being written from the host to the cache as modified in the LRU table, makes the target cache bin dirty, and removes the cache bin from its cache chain (placing the cache bin in the pool of modified bins).
- a host write does not mark the sectors being written from the host to the cache as valid in the LRU table until the operation is completed.
- a host write may modify the currently valid sectors in cache bins, may extend the valid area, create a gap, or do some combination of these.
- WRITE MISS NOTES WMA A host write immediately marks the sectors being written from the host to the cache as modified in the LRU table, makes the target cache bin dirty, and removes the cache bin from its cache chain.
- WMB A host write does not mark the sectors being written from the host to the cache as valid in the LRU table until the operation is completed.
- WMC A host write may modify the currently valid sectors in cache bins, may extend the valid area, create a gap, or do some combination of these.
- RFA reads data from the disk into cache.
- a read fetch occurs only as a result of a read miss, and the primary purpose of the fetch is to satisfy the direct requirements of the read miss.
- RFB uses an assigned cache bin but does not mark the sectors being read from disk into cache as valid in the LRU table until the disk read (fetch) operation is completed.
- a read fetch occupies the cache to disk I/O path and precludes other, simultaneous operations requiring that same path.
- a bin locked for a read fetch cannot be decached to accommodate use of an alternate cache bin, since an active host read is waiting for the data being fetched, which was resident in the present invention prior to this new host write.
- a read ahead refers to the reading of data from disk into cache of the portion of the data in the disk bin succeeding the sectors covered by the read fetch which satisfied the read miss which triggered the read ahead.
- RAB uses an assigned cache bin but does not mark the sectors being read from disk into cache as valid in the LRU table until the disk read operation is completed.
- a read ahead bin may be abandoned if it is clean, and a subsequent, overlapping (in time) host I/O operation can be handled more quickly by assigning and using another cache bin instead of the read ahead bin. In this case, the bin that is the subject of a read ahead will be abandoned prior to or at the end of that operation, and the abandoned cache bins made available for immediate reuse.
- a read ahead occupies the cache to disk I/O path and precludes other, simultaneous operations requiring that same path.
- CACHE AHEAD NOTES CAA A cache ahead is the result of the proximity logic determining that the data in a disk bin adjacent to data currently residing in a cache bin should be read from disk into cache.
- a cache ahead assigns a cache bin to a disk bin; the cache bin is always clean and is immediately placed in the cache chain of the owning drive, but no sectors will be marked valid in the cache bin until the cache ahead is completed.
- a cache ahead bin may be abandoned if a subsequent, overlapping (in time) host I/O operation can be handled more quickly by assigning and using another cache bin instead of the cache ahead bin. In this case, the bin that is the subject of a cache ahead will be abandoned prior to or at the end of that operation, and the abandoned cache bins made available for immediate reuse.
- CAD A cache ahead occupies the cache to disk I/O path and precludes other, simultaneous operations requiring that same path.
- a sweep write for a clean cache bin is a contradiction of terms, and cannot occur.
- a sweep only deals with a dirty bin.
- SWB A sweep write operation does not alter any data in a cache bin, and does not change the valid area of the cache bin.
- a sweep write occupies the cache to disk I/O path and precludes other operations requiring that same path.
- SWD A host write may occur on a bin locked for a sweep write; however, the bin and its (newly) modified sectors must be recorded as dirty when the sweep and host write are completed, rather than being completely clean as at the completion of an uncontested sweep write.
- a gap can occur only in modified portions of the data in a cache bin, and thus the bin is dirty.
- a gap read for a clean cache bin is a contradiction of terms, and cannot occur.
- a gap read is presumed to alter data in a cache bin.
- a gap read occupies the cache to disk I/O path and precludes other, simultaneous operations requiring that same path.
- GRD A gap read uses an assigned cache bin but does not mark the sectors being read from disk into cache as valid in the LRU table until the disk read operation is completed.
- a gap can occur only in modified portions of the data in a cache bin, and thus the bin is dirty.
- a gap write for a clean cache bin is a contradiction of terms, and cannot occur.
- GWB A gap write takes the sectors that are about to be decached out of the valid area when the gap write is initiated.
- GWC A gap write occupies the cache to disk I/O path and precludes other, simultaneous operations requiring that same path.
- OPERATION | CONDITION | LOCKED BY | COMMENT | NOTES
- sweep | clean | any | impossible | SWA
- sweep | dirty | cache ahead | impossible | CAB
- sweep | dirty | read ahead | impossible | GNA
- sweep | dirty | read fetch | impossible | GNA
- sweep | dirty | sweep | impossible | GNE
- sweep | dirty | gap read | impossible | GNA
- sweep | dirty | gap write | impossible | GNA
- sweep | dirty | read hit | proceed | SWB
- sweep | dirty | read miss | impossible | RMA, RFB, RAD
- sweep | dirty | write hit | impossible | GNA
- sweep | dirty | write miss | impossible | GNA
- An important goal of the described methodology is the retention in cache, at any given moment in time, of data which is most likely to be requested by the host in the near future.
- One of the mechanisms of the present invention for accomplishing this goal is the recycling of cached bins based on recent history of usage of the data in each cache bin. Recycling, in its simplest form, is the granting of a "free ride" through the MRU-to-LRU cycle. Whenever a cache bin containing data previously accessed by the host is re-accessed by a read command from the host, information associated with that cache bin is updated in such a way as to indicate that bin's recent activity.
- a write operation by the host does not contribute to the recycling of a bin, since a write miss is usually handled at the same speed as a hit, and, therefore, a much smaller benefit would accrue from recycling based on host write activity. It is likely the present invention's performance benefits as much or more from the availability for reuse of cache bins whose primary activity was host writes as it would from recycling such cache bins.
- the recycling information is inspected whenever the cache bin reaches or nears the LRU position of the global cache chain. Normally, when a cache bin reaches the global LRU position, it is the primary candidate for decaching of its data when the cache bin is needed for caching some other data.
- the LRU cache bin may be placed at the MRU position of the drive's private cache chain instead of being decached and reassigned.
- This action provides the currently cached data in that cache bin one or more "free rides" down through the private and global cache chains, or in other words, that data's time in cache is increased.
- the recycling management decides whether to place the recycled cache bin at the MRU position of its drive cache chain, or at the MRU position of the global cache chain.
- If the resulting value is less than one half, the cache bin is not to be recycled.
- If the resulting value is one half or more, the integer portion of the resulting value is placed in the recycle register, and the cache bin is recycled.
- the recycle register is divided by a preset factor based on the urgent mode. See the recycling control parameters in the CFG table description.
- If the resulting value is one half or more, the integer portion of the resulting value is placed in the recycle register, and the cache bin is recycled.
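- The recycle test described above can be sketched as a few lines of Python. This is an illustrative, non-authoritative fragment; the divisor stands in for the preset factor chosen for the current mode (see the recycling control parameters in the CFG table description), and the function name is hypothetical.

```python
def recycle_decision(recycle_register, divisor):
    """Illustrative sketch: divide the bin's recycle register by the preset
    factor for the current mode and recycle the bin only if the result is at
    least one half, keeping the integer portion as the new register value."""
    value = recycle_register / divisor
    if value >= 0.5:
        return True, int(value)   # recycle; integer portion becomes the new register value
    return False, 0               # below one half: the bin is not recycled
```

- For example, with a register value of 3 and a divisor of 2, the result is 1.5, so the bin would be recycled and its register reduced to 1.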
- the data from the host is placed in cache and the cache bins are assigned to the specified disk and placed in the modified bins pool.
- the data will be written from the cache to its specified disk in the background, minimizing the impact of the disk operations on the time required to service the host I/O.
- the modules that handle this background activity are collectively known as the background sweep modules.
- To limit the sweep activity, and thus limit contention for the spindles, only the data in those portions of cache bins which have been modified are written from SSD to disk during a sweep.
- Each disk's sweep is influenced by the mode in which the drive is currently operating, and is triggered independently of the other drives' conditions except when the global cache is saturated.
- the background sweep modules do not usually copy data from cache to disk as soon as the data has been modified. Rather, the sweep modules remain dormant until some set of conditions justifies their operation.
- the background sweep for a given disk can be awakened by any of three sets of circumstances. These circumstances are:
- the drive enters the sweep mode.
- a drive is placed in the sweep mode when the number of cache bins containing modified data for that drive exceeds a preset threshold;
- the drive enters the timeout mode.
- a drive is placed in timeout mode when a specified amount of time has elapsed since the data in the oldest modified cache bin was written from the host to the specified disk's cache;
- the global cache is in saturated mode, and there is some modified data waiting to be written from this disk's cache to the disk. Global cache is placed in saturated mode when some prespecified proportion of all cache bins contains modified data.
- a drive is placed in sweep mode when its count of modified cache bins surpasses some preset number.
- the modified data from some number of cache bins will be written to the disk.
- the number of cache bins from which data is to be written to disk is equal to the number of cache bins containing modified data at the time the count placed the drive in sweep mode. Since more cache bins may be written into by the host while the sweep is active, this sweep may not reduce the number of modified cache bins to zero. It is important that this limitation on the number of cache bins to be written exists since, otherwise, the sweep could be caught up in a lengthy set of repetitious writes of one cache bin.
- the drive may be placed in the timeout mode.
- a timeout occurs when data in a cache bin has been modified, the corresponding bin on disk has not yet been updated after a certain minimum time has elapsed, and the sweep for that disk has not been activated by the modified count. See Figure 45.
- When a timeout occurs for a given disk's cache, by definition there will be data in at least one cache bin which needs to be copied to disk. At this time, the given disk's cache will be placed in the timeout mode.
- a sweep which has been initiated by a timeout, unlike a sweep triggered by the counter, will write all modified cache bins for that disk drive to the disk before the drive sweep is terminated. Note, however, that this is a background activity, and as such, still has a lower priority than the handling of host commands.
- the global cache may be forced into the saturated mode.
- the global mode overrides the individual drive modes and conditions, and the sweep is activated for all drives for which there exist any modified cache bins.
- the sweep for each drive behaves as though there had been a timeout. This method of triggering all the sweeps is for the purpose of making maximum use of the simultaneous cache-to-disk operations. As soon as the global crisis is past, the drive sweep operations will revert to individual drive control.
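- The three triggering circumstances can be summarized in a short routine. The Python sketch below is illustrative only; the threshold and age parameters are hypothetical names standing in for the corresponding CFG table values.

```python
def sweep_trigger(modified_bins, sweep_threshold,
                  oldest_modified_age, timeout_age, global_saturated):
    """Illustrative sketch of what awakens the background sweep for one drive.
    Returns the triggering circumstance, or None if the sweep stays dormant."""
    if global_saturated and modified_bins > 0:
        return 'saturated'   # global condition: every such drive sweeps as for a timeout
    if modified_bins > sweep_threshold:
        return 'sweep'       # count of modified cache bins exceeded the preset number
    if modified_bins > 0 and oldest_modified_age >= timeout_age:
        return 'timeout'     # the oldest modified cache bin has waited too long
    return None              # no trigger: the sweep remains dormant
```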
- the background activities for each drive are handled; see Figure 37.
- a write from cache to disk may be initiated.
- the modified cache bin corresponding to the disk bin which is nearest, but not directly at, the disk read-write head position is identified. See Figure 47. If the read-write head for the drive is not currently located at the address of the disk bin corresponding to the modified bin, a seek is initiated on the drive to the identified disk bin, and the sweep takes no further action at this time. If no other activity directly involving the disk drive occurs before the sweep again is given an opportunity to write a modified cache bin, the same cache bin will be identified, and this time the head position will be proper to continue with the write.
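- A minimal sketch of this seek-then-write step is given below. It is illustrative only: the head position and the set of modified disk bins are assumed inputs, and in the described device they would come from the ADT table and the modified bins pool.

```python
def next_sweep_step(head_position, modified_disk_bins):
    """Illustrative sketch: pick the modified bin whose disk bin is nearest the
    read-write head; if the head is not yet there, start a seek and return,
    otherwise the write can proceed on this pass."""
    if not modified_disk_bins:
        return None                                   # nothing left for this drive's sweep
    target = min(modified_disk_bins, key=lambda b: abs(b - head_position))
    if target != head_position:
        return ('seek', target)    # initiate the seek; the write happens on a later pass
    return ('write', target)       # head already positioned: write the modified bin to disk
```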
- If the command is a read and it can be serviced entirely from cache, it is serviced by the read-hit portion of the controller (see description of read-hit handling).
- Otherwise, the command is considered a read cache miss, the information required to service it is queued, and the storage device disconnects from the host. See Figure 30.
- the background manager will note the existence of the queued task and will complete the handling of the read miss. See Figures 37, 40, and 42.
- If the command is a write and all bins involved in the operation are already in cache, the command is serviced by the write-hit portion of the controller. See Figure 32 and description of write-hit handling. If any portion of the write involves an uncached bin or bins, the command is turned over to the write-miss portion of the controller.
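- The routing just described can be summarized as follows. This Python sketch is illustrative; the two flags are assumed to come from the bin address list analysis described below, and the return values simply name the controller portion that takes over.

```python
def dispatch_host_command(is_read, all_bins_cached, required_sectors_valid):
    """Illustrative sketch of host command routing among the read-hit, read-miss,
    write-hit, and write-miss portions of the controller."""
    if is_read:
        if all_bins_cached and required_sectors_valid:
            return 'read_hit'       # serviced entirely from cache without disconnecting
        return 'read_miss'          # queue the request and disconnect from the host
    if all_bins_cached:
        return 'write_hit'          # every bin involved already has a cache bin assigned
    return 'write_miss'             # at least one uncached bin: write-miss handling
```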
- this drive is rechained upward in the LAD list. Additionally, the total disk accesses count is incremented. If this makes the total accesses reach the LAD maximum, the total accesses field is reset to zero, and the access count for each drive is recalculated to be its current value divided by the LAD adjustment factor.
- This overall LAD procedure is designed to temporarily favor the most active drive (s) but not allow one huge burst of activity by one drive to dominate the cache management for an overly long period of time.
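- A compact sketch of this bookkeeping is shown below. It is illustrative only; the dictionary of per-drive counts and the parameter names are hypothetical stand-ins for the ADT table fields (such as ADTD-DISK-ACCESSES and ADTD-LAD-USAGE) and the CFG values.

```python
def record_drive_access(access_counts, drive, total_accesses,
                        lad_maximum, lad_adjustment_factor):
    """Illustrative sketch: bump the accessed drive's count and, when the total
    reaches the LAD maximum, reset the total and scale every drive's count down
    so that one burst of activity cannot dominate the LAD list for too long."""
    access_counts[drive] += 1
    total_accesses += 1
    if total_accesses >= lad_maximum:
        total_accesses = 0
        for d in access_counts:
            access_counts[d] //= lad_adjustment_factor
    return total_accesses
```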
- the analysis of a host command includes creation of a bin address list (BAL) which contains the locations of each bin involved in the operation (see description of bin address list setup).
- BAL: bin address list
- the list will contain the bin's current location in cache, if it already resides there; or where it will reside in cache after this command, and related caching activity have been completed.
- the space into which it is to be put in cache is located, and the current bin resident in that space is decached.
- the analysis includes setting the cache hit/miss flag so that the controller logic can be expedited. See Figure 19.
- the controller segment which sets up the bin address list uses the I/O sector address and size to determine the disk bin identifying numbers for each bin involved in the I/O operation as described in the section below. See Figure 20. The number of bins involved is also determined, and for each, the portion of the bin which is involved in the operation is calculated.
- a sector address can be converted into a bin address by dividing it by the bin size. The quotient will be the bin number, and the remainder will be the offset into the bin where the sector resides. See Figure 21.
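- A worked example of this conversion, using an arbitrary bin size of 128 sectors chosen only for illustration, is given in the short Python fragment below.

```python
def sector_to_bin(sector_address, bin_size):
    """Convert a sector address into a (bin number, offset) pair: the quotient
    is the bin number and the remainder is the sector's offset within the bin."""
    return divmod(sector_address, bin_size)

# With 128 sectors per bin, sector 1000 falls in bin 7 at offset 104 (7 * 128 + 104 = 1000).
assert sector_to_bin(1000, 128) == (7, 104)
```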
- CACHE HIT/MISS DETERMINATION - READ Each line of the bin address list is inspected, and if, for a given disk bin, a corresponding cache bin is shown in the ADT table to exist, that information is copied into the corresponding line of the BAL list.
- the valid sectors information in the LRU table for the bin is compared with the sectors required for the current I/O.
- For a write command, each line of the bin address list is inspected, and if, for a given disk bin, a corresponding cache bin is shown in the ADT table to exist, that information is copied into the corresponding line of the BAL list. If any bins are not in cache, the cache-miss marker is set.
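- The two determinations can be sketched together as follows. This Python fragment is illustrative only; the BAL line fields are hypothetical names for the cache bin reference copied from the ADT table and the valid-sector comparison made against the LRU table.

```python
def classify_command(bal_lines, is_read):
    """Illustrative sketch of the hit/miss determination: a write needs only an
    assigned cache bin for every disk bin, while a read also needs the required
    sectors to be valid in cache."""
    for line in bal_lines:
        if line.cache_bin is None:
            return 'miss'                        # some disk bin has no cache bin assigned
        if is_read and not line.required_sectors_valid:
            return 'miss'                        # bin is cached but needed sectors are not valid
    return 'hit'
```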
- CACHE READ-HIT OPERATION Refer to Figure 29.
- an I/O read command must have been received from the host. The command will have been analyzed and the bin address table will have been set up, and it has been determined that all data required to fulfill the read command is available in cache. With this preliminary work completed, the host read command can be satisfied by using each line of the bin address table as a subcommand control.
- a cache read hit is satisfied entirely from the cached data without disconnecting from the host.
- the caching firmware executes several operations at this time: 1.
- the requested data must be sent to the host. Since all required portions of all affected bins are already in cache, all required data can be sent directly from the cache to the host .
- the affected bins will be rechained to become the most-recently-used (MRU) cache bins in the LRU cache chain for the drive. This may involve moving the cache bin(s) from the global cache chain to the specific drive's cache chain.
- MRU most-recently-used
- the LRU table is updated to reflect the fact that this data has been accessed by the host; if the recycling register value for any cache bin involved in the read hit has not reached its maximum allowable value, it is increased by one to provide for the possible recycling of the cache bin when that bin reaches the LRU position of the cache chain. 4. Proximity calculations are performed to determine the desirability of scheduling a potential read-ahead of an adjacent disk bin. Refer to the discussion on cache-ahead and see Figure 8. 5. The statuses of the global cache chain and the specified drive cache chain must be updated. CACHE READ-MISS OPERATION
- a cache read-miss (Figure 30) is satisfied in part or wholly from the disk. In order to reach this module, an I/O read command must have been received from the host. The command will have been analyzed and the bin address table will have been set up, and it has been determined that some or all data required to fulfill the read command are not available in cache. A cache read-miss is handled in several steps: 1. See Figure 30. The information required to handle the read command is saved in a read queue for later use.
- the storage device logically disconnects from the host.
- Steps 3 through 6 are repeated for each disk bin involved in the host request.
- this module will, if the bin was linked into either the drive cache chain or the global cache chain, remove the affected cache bin from the cache chain and place it in the modified pool. In each case, the corresponding LRU table is updated to reflect any changes resulting from the existence of this new data. If the new data created any gaps in the modified portions of the bins, the GAP table is also updated accordingly in order to handle any needed post-transfer staging of partial bins. UPDATING THE GAP TABLE
- Adding a gap reference for a given cache bin is handled in several steps; if no previous gaps exist for the cache bin, the LRUB table gap pointer item will be null. To indicate a gap exists in the modified data of the cache bin, the pointer is set to the number of the first available line in the GAP table. The referenced GAP table line is then filled in with the information about the gap.
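- A minimal sketch of the first-gap case described above is shown below; the LRUB entry object, the GAP table represented as a list, and the gap description are all hypothetical structures used only for illustration.

```python
def add_first_gap_reference(lrub_entry, gap_table, gap_info):
    """Illustrative sketch: when a cache bin acquires its first gap, point the
    LRUB gap pointer at the first available GAP table line and fill that line
    in with the information about the gap."""
    assert lrub_entry.gap_pointer is None        # no previous gaps exist for this cache bin
    free_line = gap_table.index(None)            # number of the first available GAP table line
    lrub_entry.gap_pointer = free_line           # mark that a gap exists in the bin's modified data
    gap_table[free_line] = gap_info              # e.g. the gap's starting sector and size
    return free_line
```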
- a write miss is usually handled entirely within the cache.
- the host command will have been analyzed and the bin address list will have been set up. With this preliminary work completed, the host write command can be satisfied by using each line of the bin address list as a subcommand control. Since this is a cache-miss, at least one of the addressed disk bins has no related cache bin assigned to it. One of two conditions will exist: 1) the device is operating in global saturated mode, or 2) it is not operating in global saturated mode. CACHE WRITE-MISS OPERATION WHEN NOT SATURATED If global cache is not operating in saturated mode, all data can be sent directly from the host to the cache without disconnecting from the host.
- cache bins are selected as needed and assigned to the drive. See Figures 33 and 26. As data is written into each cache bin, the bin, if not already in the modified pool, is removed from its current cache chain and placed in the modified pool, and the MOD table is updated. See Figures 24 and 25.
- the corresponding LRU table is updated to reflect any changes resulting from the existence of this new data. If the new data created any gaps in the modified portions of the bins, the GAP table is also updated accordingly in order to handle any needed post-transfer staging of partial bins. Since the writing into cache bins may change the status and modes of both the drive cache and the global cache, the drive status, the drive mode, the global status, and the global mode are all updated.
- the corresponding ADT, LRU and GAP tables are updated to reflect any changes resulting from the existence of this new data.
- the new data may change the status and modes of both the drive and the global cache, so the drive status, the drive mode, the global status, and the global mode are all updated.
- the queued seek will not be carried out as long as the disk is busy handling host read cache misses, or if the drive is busy on background sweeps or cache-ahead actions, or if the cache modes or statuses indicate it would be intrusive to the overall caching performance to use a cache bin for the seek action.
- When the read from disk to cache completes, the disk will generate an interrupt. See Figure 50.
- the seek termination module will then update the control tables to show the subject disk bin's data is now in a cache bin.
- the cache bin is rechained as the MRU bin for the drive. See Figure 52.
- INTERNAL INTERRUPTS While the present invention is in operation, it monitors its power source. Should the power to the unit be interrupted, for any reason, the device goes through a controlled power-down sequence, initiated as depicted in Figure 54.
- POWER DOWN CONTROL As depicted in the diagram of Figure 38, this portion of the firmware is invoked when the unit senses that the voltage on the power line to it has dropped.
- Since some of the data in the device may be in cache bins in a modified state and awaiting transfer to one or more of the disks, power must be maintained to the cache memory until the modified portions have been written to their respective disks. Thus, a failure of the line power causes the device to switch to the battery backup module.
- the battery backup module provides power while the memory device goes through an intelligent shutdown process. If the host is in the process of a data transfer with the memory device when power drops, the shutdown controller allows the transfer in progress to be completed. It then blocks any further transactions with the host from being initiated. The shutdown controller then must initiate a background sweep for each disk to copy any modified portions of cache bins from the solid state memory to the disk so that such data will not be lost when power is completely shut off to the control and memory circuits.
- Once this sweep is finished, the modified data previously held only in the solid state memory will also reside on the disks. At this point the disk spindles can be powered down, reducing the load on the battery. Most power outages are of a short duration. Therefore, the controller continues to supply battery power to the control circuits and the solid state memory for some number of seconds. If the outside power is restored in this time period, the controller will power the spindles back up and switch back to outside power. In this case, the operation can proceed without having to reestablish the historical data in the solid state memory. In any case, no data is at risk since it is all stored on the rotating magnetic disks before a final shutdown.
- the final background sweep copies modified portions of data in cache bins from the solid state memory to the magnetic disk. See Figure 55. There will usually be only a few such cache bins, or portions of cache bins, to copy for each drive since the number of cache bins that can reach this state is intentionally limited by the logical operation of the system.
- the final sweep makes use of logic developed for the normal timeout operation of the background sweep. The sweep is initiated in much the same manner as for a timeout during normal operation. If no cache bins for a given drive contain data which need to be copied, the sweep for that drive is left in the dormant state, and no further sweep action is required for the drive.
- the sweep control sets up and initiates a background write event for the drive. Writes for all drives are executed simultaneously until all data from modified cache bins has been written from cache to the respective drives. When no modified cache bins remain to be copied, the sweep is finished.
- the present invention includes a capability for utilizing an external terminal which can communicate with the device's executive control via a serial port.
- This communication facility can handle several types of activities. See Figure 56.
- the serial port may make inquiries to obtain data about the workloads the device has been encountering, such as the numbers of I/O's by the host, the current cache condition, the history of the caching operations, etc. This information is sufficient to allow external analysis to determine, as a function of time, levels of performance, frequency of occurrence of various situations such as the background sweep, cache-aheads, and the modes of operation.
- the serial port can be used to initiate self tests and to obtain the results thereof.
- the serial port may, under certain circumstances, modify the configuration of the device. For example, a disk drive may be removed from, or added to, the configuration. Another example is the resetting of some of the information in the configuration table, such as the various cache management parameters.
- the serial port may be used by the device's executive to report current operational conditions such as hardware failures or perceived problems, such as excessive disk retries during I/O's between the cache and the disks.
- control tables or segments of control tables are listed here. These tables and segments represent conditions which could occur at initialization and during the operation of the present invention. A very brief description of each pertinent table field is given here for convenience in interpreting the table data; see the various table format descriptions for more detailed information. Note that the asterisk (*) is used throughout to indicate a null value, and a dash (-) is used to indicate the field has no meaning in that particular circumstance. CONFIGURATION TABLE
- the CONFIGURATION table is made up of seven sections. 1. SIZING PARAMETERS AND DERIVED VALUES
- CFG-DRIVEB 62,500 bins capacity of each disk drive.
- CFG-DRIVES 4 spindles number of disk drives on device.
- CFG-DSNORMB 820 bins lower limit of drive normal status.
- CFG-GSEXCPCT 50 percent lower limit percent of all cache in global excess status.
- CFG-GSEXCESB 4,096 bins lower limit of global chain in excess status. CFG-GSEXCPCT * CFG-CACHBINS
- CFG-DMSATPCT 80 percent lower limit percent of modified bins for saturated mode.
- CFG-DMSATURB 1,638 bins lower limit of modified bins for saturated mode. CFG-DMSATPCT * CFG-CACHBINS /
- CFG-GMSATPCT 60 percent lower limit percent of all cache bins that are modified for saturated mode.
- the complete LRU table is made up of three sections:
- 1. LRU-CONTROL: unindexed counters of device activity
- 2. LRU-DISKS: indexed by logical spindle
- 3. LRU-BINS: indexed by logical disk bin
- LEAST RECENTLY USED CONTROL TABLE, INITIAL
- LRUC-TOTAL-MOD 0 bins number modified cache bins
- LRUC-GLOBAL-BINS 8,184 bins number in global cache chain
- LRUC-GLOBAL-LRU 0 line oldest bin in global cache chain
- LRUC-GLOBAL-MRU 8,183 line newest bin in global cache chain
- the values in the following table are dynamically variable; while there are many possible valid sets of values, the values in this table represent one possible set of values at a given point in time in the operation of the present invention.
- LRUC-TOTAL-MOD 40 bins number modified cache bins
- LRUC-GLOBAL-BINS 1002 bins number global cache chain bins
- LRUC-GLOBAL-LRU 110 line oldest bin in global cache chain
- LRUC-GLOBAL-MRU 2883 line newest bin in global cache chain
- the values are dynamically variable; there are many possible valid sets of values depending on the specific implementation of the present invention, its configuration, and its work load prior to the capture of these LRU table values .
- the values in this table represent a small sample of one possible set of values at a given point in time in the operation of the present invention. Only a few selected LRU lines are shown.
- ADT ADDRESS TRANSLATION
- ADTC-ACCESSES 0 accesses number of host accesses to device
- ADTC-READS 0 accesses number of host reads to device
- ADTC-WRITES 0 accesses number of host writes to device
- ADTD-LINE-BEG first ADT-BINS line for referenced spindle
- ADTD-HEAD-POS current position of read/write head of spindle
- ADTD-SWEEP-DIR current direction of sweep
- ADTD-DISK-ACCESSES number of host accesses since last reset
- ADTD-DISK-READS number of host read accesses since last reset
- ADTD-DISK-WRITES number of host write accesses since last reset
- ADTD-LAD-USAGE function based on number of host accesses
- ADTD-LINK-MORE chain pointer to more active drive in LAD list
- ADTB-CACHE-BIN logical cache bin containing disk bin data
- ADTB-BIN-ACCESSES number of host accesses since last reset
- ADTC-ACCESSES 10,000,000 number of host accesses to device
- ADTC-READS 8,000,000 number of host reads to device
- ADTC-WRITES 2,000,000 number of host writes to device
- the values in this table represent a sample of one possible set of ADT values at a given point in time in the operation of the present invention.
- the line numbers, disk bin numbers, and disk numbers are implicit and are not in the ADT table, but they are included here for clarity.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Storage device (200) including solid state (208) and disk (210) storage maintaining fast response time approaching solid state for many workloads and improved response for other workloads, using hardware and algorithms which place and maintain data in the most appropriate media based on actual and projected activity. A searchless method for determining location of data is used. Sufficient solid state memory permits retention of useful, active data, as well as prefetching data into solid state memory. Transfer of updated data from solid state storage to disk and of prefetched data from disk to solid state memory is done as a timely, unobtrusive, background task. A locking mechanism provides for data integrity while permitting operations on the same data between host and solid state memory, and between solid state memory and disk. Private channels between solid state storage and disks prevent conversations between these media from conflicting with transmissions between the host and the invention.
Description
NOVEL CACHE MEMORY STRUCTURE AND METHOD
Related Applications
This application is a continuation-in-part of U.S. patent application serial no. 08/392,209 filed February 22, 1995, which in turn is a divisional of serial no. 08/255,251 filed June 7, 1994, which in turn is a continuation-in-part of serial no. 08/139,559 filed October 20, 1993 (now U.S. patent 5,353,430), which in turn is a continuation of serial no. 07/860,731 filed February 21, 1992 (abandoned), which in turn is a continuation-in-part of serial no. 07/665,021 filed March 5, 1991 (abandoned).
BACKGROUND OF INVENTION
Field of the Invention This invention relates to a high performance computer data storage device utilizing a combination of solid state storage and one or more mass memories, such as a rotating magnetic disk device.
Description of Prior Art A number of computer data storage systems exist which make some use of solid state memory devices as a front end to rotating magnetic disk devices.
A typical caching system uses a single solid state memory unit as a holding area for a string of magnetic disks, allowing certain information to be stored in a high speed cache memory, thereby increasing speed of performance as compared to the use solely of lower speed disk memories, i.e. for some percentage of times a piece of data is contained in the high speed cache
memory, thereby allowing faster access as compared with when that data is only stored in a disk drive.
A block diagram of such a system is shown in Figure 1. Host computer 101 communicates with the entire string 102 of disks 102-1 through 102-N via cache unit 103 via Host interface 104, such as a Small Computer Systems Interface (SCSI). All data going to or from disk string 102 passes through the cache-to-disk data path consisting of host interface 104, cache unit 103, and disk interface 105. Cache unit 103 manages the caching of data and services requests from host computer 101. Major components of cache unit 103 include microprocessor 103-1, cache management hardware 103-2, cache management firmware 103-3, address lookup table 103-4, and solid state cache memory 103-5. The prior art cache system of Figure 1 is intended to hold frequently accessed data in a solid state memory area so as to give more rapid access to that data than would be achieved if the same data were accessed from the disk media. Typically, such cache systems are quite effective when attached to certain host computers and under certain work loads. However, there exist some drawbacks and, under certain conditions, such cache systems exhibit a performance level less than that achieved by similar, but uncached devices. Some of the factors contributing to the less than desirable performance of prior art cached disk devices are now described.
The single cache memory 103-5 is used in conjunction with all disks in disk string 102. Data from any of the disks may reside in cache memory 103-5 at any given time. The most frequently accessed data is given precedence for caching regardless of the disk drive on which it resides. When fulfilling a host command, the determination of whether or not the data is in cache memory 103-5 and the location of that data
in cache memory 103-5, is usually via hashing schemes and table search operations. Hashing schemes and table searches can introduce time delays of their own which can defeat the purpose of the cache unit itself. Performance is very sensitive to cache-hit rates. Due to caching overhead and queuing times, a low hit rate in a typical string oriented cache system can result in overall performance that is poorer than that of an equivalently configured uncached string of disks. The size of cache memory 103-5 relative to the capacity of disk drives 102 is generally low. An apparently obvious technique to remedy a low hit rate is to increase the cache memory 103-5 size. However, it has been found that there is an upper limit to the size of the cache memory 103-5 above which adding more capacity has limited benefits. With limited cache memory 103-5 capacity, a multitude of requests over a variety of data segments exhausts the capability of the cache system to retain the desirable data in cache memory 103-5. Often, data which would be reused in the near future is decached prematurely to make room in cache memory 103-5 for handling new requests from host computer 101. The result is a reduced cache hit rate. A reduced hit rate increases the number of disk accesses; increased disk accesses increase the contention on the data path. A self-defeating cycle is instituted. "Background" cache-ahead operations are limited since the data transferred during such cache ahead operations travels over the same data path as, and often conflicts with data transferred to service direct requests from the host computer 101. The data path between cache unit 103 and disk string 102 can easily be overloaded. All data to and from any of the disks in disk string 102, whether for satisfying requests from host computer 101 or for cache management purposes, travels across the
cache-to-disk path. This creates a bottleneck if a large amount of prefetching of data from disk string 102 to cache memory 103-5 occurs. Each attempt to prefetch data from disk string 102 into cache memory 103-5 potentially creates contention for the path with data being communicated between any of the disk drives of disk string 102 and host computer 101. As a result, prefetching of data into cache memory 103-5 must be judiciously limited; increasing the size of the cache memory 103-5 beyond a certain limit does not produce corresponding improvements in the performance of the cache system. This initiates a string of related phenomena.
Cache-ahead management is often limited to fetching an extra succeeding track of data from disk whenever a read command from the host cannot be fulfilled from the cached data. This technique helps minimize the tendency of cache-ahead to increase the queuing of requests waiting for the path between cache memory 103-5 and disk string 102. However, one of the concepts on which caching is based is that data accesses tend to be concentrated within a given locality within a reasonably short time frame.
For example, data segments are often accessed in sequential fashion. Limiting the cache-ahead operations to being a function of read misses can have a negative effect of lowering the cache hit rate since such limitation may prevent or degrade the exploitation of the locality of data accesses.
A variety of algorithms and configurations have been devised in attempts to optimize the performance of string caches. A nearly universally accepted concept involves the retention and replacement of cached data segments based on least-recently-used (LRU) measurements. The decaching of data to make room for new data is managed by a table which gives, for each cached block of data, its relative time since it was last
accessed. Depending on the algorithm used, this process can also result in some form of table search, with a potential measurable time delay.
Cache memory 103-5 is generally volatile; the data is lost if power to the unit is removed. This characteristic, coupled with the possibility of unexpected power outages, has generally imposed a write-through design for handling data transferred from host computer 101 to the cached string. In such a design, all writes from the host are written directly to disk; handled at disk speed, these operations are subject to all the inherent time delays of seek, latency, and lower transfer rates commonly associated with disk operations. SUMMARY
Computer operations and throughput are often limited by the time required to write data to, or read data from, a peripheral data storage device. A solid state storage device has high-speed response, but at a relatively high cost per megabyte of storage. A rotating magnetic disk, optical disk, or other mass media provides high storage capacity at a relatively low cost per megabyte, but with a low-speed response. The teachings of this invention provide a hybrid solid state and mass storage device which gives near solid state speed at a cost per megabyte approaching that of the mass storage device . For the purposes of this discussion, embodiments will be described with regard to magnetic disk media. However, it is to be understood that the teachings of this invention are equally applicable to other types of mass storage devices, including optical disk devices, and the like. This invention is based on a combination of hardware and firmware features. The hardware features include: one or more rotating magnetic disk media, an ample solid state storage capacity; private channels between the disks and the solid state storage
device; and high speed microprocessors to gather the intelligence, make data management decisions, and carry out the various data management tasks.
The firmware features include the logic for gathering the historical data, making management decisions, and instructing the hardware to carry out the various data management operations. Important aspects of the firmware include making the decisions regarding the retention of data in the solid state memory based on usage history gathered during the device's recent work load experience; and a comprehensive, intelligent, plateau-based methodology for dynamically distributing the solid state memory for the usage of the data stored on, or to be stored on, the various disks. This distribution of solid state memory is work load sensitive and is constantly dynamic; it is accomplished in such a way as to guarantee full utilization of the solid state memory while at the same time ensuring that the data for all disks are handled in such a way as to retain the most useful data for each disk in the solid state memory for the most appropriate amount of time. The multiple plateau-based cache distribution methodology is illustrated in Figures 57 and 58, and described in the section entitled Plateau Cache Illustration. The hardware and firmware features are combined in a methodology which incorporates simultaneity of memory management and data storage operations. The hybrid storage media of this invention performs at near solid state speeds for many types of computer workloads while practically never performing at less than normal magnetic disk speeds for any workload. One or more rotating magnetic disk media are used to give the device a large capacity; the solid state storage is used to give the device a high-speed response capability. By associating the solid state media directly with magnetic disk devices, private data communication
lines are established which avoid contention between normal data transfers between the host and the device and transfers between the solid state memory and the disks. The private data channels permit virtually unlimited conversation between the two storage media. Utilization of ample solid state memory permits efficient maintenance of data for multiple, simultaneously active data streams. Management of the storage is via one or more microprocessors which utilize historical and projected data accesses to perform intelligent placement of data. No table searches are employed in the time critical path. Host accesses to data stored in the solid state memory are at solid state speeds; host accesses to data stored on the magnetic disks are at disk device speeds. Under most conditions, all data sent from the host to the device is handled at solid state speeds. Intelligence is embodied to cause the device to dynamically shift its operating priorities in order to maintain performance at a high level, including the optimization of the handling of near-future host-device I/O operations. Additionally, a variety of historical data is collected and utilized to create a unique methodology for preserving data stored from each disk in a dynamically associated area of solid state memory while still allowing spatial reassignments to be made in response to an ever changing workload balance on the mass media devices. GLOSSARY OF TERMS ADDRESS TRANSLATION: The method for converting a sector address into a disk bin address and sector offset within the bin.
ADDRESS TRANSLATION TABLE; ADT TABLE: The table which maintains the relationship between disk bin identifiers and solid state memory addresses; also holds statistics about frequency of bin accesses, recent bin accesses, or other information as required.
ADT TABLE: See ADDRESS TRANSLATION TABLE
BACKGROUND ACTIVITY: Any of the tasks done by the described device's controller which are not done in immediate support of the host's activity. For example, the writing of modified data from cache to disk, prefetch reading of data from disk into cache, etc.
BACKGROUND SEEK: The first step in a background sweep write of modified data from cache to disk.
BACKGROUND SWEEP: The collective activities which write data from cache to disk as background tasks.
BACKGROUND OPERATION: See BACKGROUND ACTIVITY. BACKGROUND TASK: See BACKGROUND ACTIVITY.
BACKGROUND WRITE: The writing of modified data from cache to disk as a background task. BAL: Bin Address List.
BATTERY BACKUP: The hardware module and its controller which assures the described device of power for an orderly shutdown when outside power is interrupted.
BEGGING FOR A BIN: The canvassing of the cache chains for all the drives to find a cache bin that can be reused. BIN: An arbitrary number of contiguous sectors occupying space on either a disk drive or in cache memory in which data is stored. BIN ADDRESS LIST (BAL): A list of one or more disk bin addresses associated with a host command, and which relate the command's requirements to disk bins, cache bins, and sector addresses.
BIN SIZE: The number of sectors considered to be in a disk bin and also the number of sectors considered to be in a cache bin; this may or may not be equal to the actual number of sectors in a disk track.
BUYING A BIN: The acquisition of a cache bin from the global cache chain to use for data for a given drive, such cache
bin received in exchange for a cache bin currently in the LRU position of the cache chain of that drive.
CACHE: The solid state memory area which holds user data within the cache system of this invention. CACHE AHEAD; PREFETCH: The reading of data from a disk bin into a cache bin in anticipation of its need to fulfill a read request from the host.
CACHE-AHEAD FACTOR; PROXIMITY FACTOR: At each cache bin read hit or re-hit, cached data sufficient to satisfy a number of I/O's may remain in front of, and/or behind, the current location of the data involved in the current I/O. When either of these two remaining areas contain valid data for less than a set number of I/O's, the cache-ahead is activated. That minimum number of potential I/O's is the cache-ahead factor, or the proximity factor.
CACHE BIN: A data bin in cache memory.
CACHE CHAIN: The logical bidirectional chain which maintains references to a set of cache bins in a certain sequence; in the described device, in the order of most-recently-used (MRU) to least-recently-used (LRU).
CACHE CHAIN STATUS; CACHE STATUS: An attribute based on the number of cache bins in a given cache chain, either for a given drive or global, and which is associated with a cache chain which indicates that the cache chain is in a specified condition. Such cache bins contain no modified data since cache bins containing modified data are removed from the cache chain and placed in the modified pool. The cache chain status is used to control decisions regarding the way in which the device manages the cache and other resources . CACHE HIT: A host initiated read or write command which can be serviced entirely by utilizing currently cached data and/or currently assigned cache bins.
CACHE HIT RATE: The proportion of all host I/O's which have been, or are being, serviced as cache hits.
CACHE MEMORY: The solid state memory for retaining data. See SOLID STATE MEMORY.
CACHE MISS: A host initiated read or write command which cannot be serviced entirely by utilizing currently cached data and/or currently assigned cache bins.
CACHE READ HIT: A host initiated read command which can be serviced entirely by utilizing currently cached data.
CACHE READ MISS: A host initiated read command which cannot be serviced entirely by utilizing currently cached data.
CACHE STATUS: See CACHE CHAIN STATUS.
CACHE WRITE HIT: A host initiated write command which can be serviced entirely by utilizing currently assigned cache bins.
CACHE WRITE MISS: A host initiated write command which cannot be serviced entirely by utilizing currently assigned cache bins.
CHAIN; CHAINING: A method utilized in the LRU table to logically connect cache bins in a desired sequence, in the case of this invention, to connect the cache bins in most-recently-used to least-recently-used order. Chaining is also used in any table management in which the logical order is dynamic and does not always match the physical order of the data sets in the table.
CHAINING: See CHAIN.
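By way of illustration only, the following C fragment sketches chaining by table index with forward and backward links, together with the rechaining of a bin to the MRU position; the structures and names are hypothetical simplifications of the LRU table described elsewhere in this specification.

    #define NIL 0xFFFF                   /* null link value */

    struct lru_line {
        unsigned short next;             /* link toward the LRU end */
        unsigned short prev;             /* link toward the MRU end */
    };

    struct cache_chain {
        unsigned short mru;              /* index of the most-recently-used bin  */
        unsigned short lru;              /* index of the least-recently-used bin */
    };

    /* Unlink cache bin 'i' from its current position and rechain it at MRU. */
    void rechain_to_mru(struct lru_line *t, struct cache_chain *c, unsigned short i)
    {
        if (c->mru == i)                 /* already at MRU: nothing to do */
            return;
        /* detach from the current position */
        if (t[i].prev != NIL) t[t[i].prev].next = t[i].next;
        if (t[i].next != NIL) t[t[i].next].prev = t[i].prev;
        if (c->lru == i)      c->lru = t[i].prev;
        /* reattach at the MRU end */
        t[i].prev = NIL;
        t[i].next = c->mru;
        if (c->mru != NIL) t[c->mru].prev = i;
        c->mru = i;
        if (c->lru == NIL) c->lru = i;
    }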
CHAIN STATUS: See CACHE CHAIN STATUS.
CHANNEL; PATH: The physical data path connecting two devices, and over which data is transmitted between the two devices.
CLEAN BIN; CLEAN CACHE BIN: A cache bin containing only unmodified data; that is, data that matches the data in the corresponding disk bin.
CLEAN DATA: Data currently resident in a cache bin which exactly matches the data stored in the disk bin to which the cache bin is assigned.
CLEANING: The act of writing data from cache memory to disk, such data having been written from the host into cache and which has not yet been written from cache to disk.
CONTROL TABLE: Any of the various tables which maintain records which control the location and retention of data stored in cache or on disk, and which are used to control the activities of the described device.
DATA BIN (DISK): See DISK BIN.
DATA CHANNEL: See CHANNEL.
DATA SEGMENT (PROXIMITY) : A group of contiguous sectors within a cache bin, and of a size which matches the most recent host read for data from that cache bin.
DECACHE: The removal of the logical references which relate some or all of the data in a cache bin to the data in a disk bin. When such references are all removed, the decached data in the cache bin is no longer available from cache, but can be retrieved from the disk on which it is stored.
DIRTY BIN; MODIFIED BIN: A cache bin which contains data which has been written to the cache by the host, and which data has not yet been written from the cache to the corresponding disk drive.
DIRTY DATA; MODIFIED DATA: Data which has been written to the cache by the host, and which data has not yet been written from the cache to the corresponding disk drive. In other words, data in a cache bin that does not match the data in the corresponding disk bin.
DISCONNECT: The action of removing the current logical connection on a channel between two devices, thus freeing the channel for use by other devices which have access to the channel.
DISK; DISK DRIVE; DRIVE; MAGNETIC DISK; ROTATING MAGNETIC DISK: A rotating magnetic media disk drive.
DISK BIN; DRIVE BIN: A data bin on a disk.
DISK BIN ADDRESS: The address of the first sector of data in a given bin on disk. These addresses correspond to physical locations on the rotating magnetic disk. Each sector address as specified in an I/O operation can be converted into a disk bin address and a sector offset within that bin.
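By way of illustration only, the conversion of a sector address into a disk bin identifier and a sector offset may be pictured as the following C fragment; the names and the 64-sector bin size are assumptions for the example.

    #define BIN_SIZE 64u                       /* assumed sectors per bin (e.g. 64 x 512 B = 32 KB) */

    struct bin_ref {
        unsigned long bin;                     /* disk bin identifier          */
        unsigned      offset;                  /* sector offset within the bin */
    };

    struct bin_ref sector_to_bin(unsigned long sector_address)
    {
        struct bin_ref r;
        r.bin    = sector_address / BIN_SIZE;  /* which bin holds the sector */
        r.offset = sector_address % BIN_SIZE;  /* where within that bin      */
        return r;
    }

With 64-sector bins, for example, sector address 1,000 falls in disk bin 15 at sector offset 40.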
DISK CACHE: That portion of the described device's cache memory which is assigned to data corresponding to data stored on, or intended to be stored on, a specific disk drive. DISK DRIVE: See DISK.
DISK SECTOR ADDRESS: The address of a physical sector on the magnetic disk device.
DISK SERVER: The logical section of the caching device which handles the writes to, and reads from, the rotating magnetic disk.
DISK TRACK; PHYSICAL TRACK: A complete data track on a disk; one complete band on one platter of the disk device; this terminology is generally not meaningful to the logic of the described device. DMA: Direct Memory Access; that is, memory-to-memory transfer without the involvement of the processor.
DRAM: Dynamic random access memory. The chip or chips that are used for solid state memory devices. DRIVE: See DISK. DRIVE BIN: See DISK BIN.
DRIVE CACHE: Collectively the cache bins which are currently assigned to a given drive.
DRIVE CACHE CHAIN: The logical chain of cache bins which are currently assigned to maintain data for a given disk drive of the described device. Such cache bins are available to be decached and reused only in very special circumstances, as opposed to those cache bins which have migrated into the global cache chain.
DRIVE CACHE CHAIN STATUS; DRIVE STATUS: A term describing a drive's cache condition based on the number of unmodified cache bins assigned to that given drive. See CACHE CHAIN STATUS.
DRIVE MODE; DRIVE OPERATING MODE: An attribute of each drive based on the number of cache bins which contain modified data which indicates that the amount of such cache is at a specified level . Used to control decisions regarding the way in which the device manages the drive, the cache, and other resources .
DRIVE OPERATING MODE: See DRIVE MODE.
DRIVE NORMAL MODE: The condition of the cache assigned to a specific drive in which the described storage device can use its normal priorities with respect to the management of data for that drive in order to reach its optimal performance level. In this mode, the background sweep is dormant; and cache ahead, recycle, and read ahead operations take place as circumstances indicate they are appropriate.
DRIVE SATURATED MODE: The drive mode in which no cache ahead operations, no recycling, and no read ahead operations are permitted on the drive. The global logic will not allow the number of modified cache bins assigned to a given drive to exceed the number that places the device in this mode.
DRIVE SWEEP MODE: The drive mode in which the sweep has been activated for the drive based on the number of cache bins assigned to the drive and which contain modified data; in this mode the sweep shares resources with the cache ahead operations.
DRIVE TIMEOUT MODE: The drive mode in which the sweep has been activated for the drive based on the time since a cache bin assigned to the drive was first written to by the host and which still contains modified data; in this mode the sweep shares resources with the cache ahead operations.
DRIVE URGENT MODE: The drive mode in which the sweep is active. Due to the large number of cache bins containing modified data, no cache ahead operations are permitted on the drive, and recycling is limited to the more frequently accessed cache bins.
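By way of illustration only, the count-based drive modes may be pictured as a set of threshold comparisons such as the following C fragment; the enumeration and function are hypothetical, and the threshold arguments correspond to the configuration parameters (CFG-DMSWEEP, CFG-DMURGNTB, CFG-DMSATURB) summarized later in this specification. The time-based timeout mode is set separately and is omitted here.

    enum drive_mode { DRIVE_NORMAL, DRIVE_SWEEP, DRIVE_URGENT, DRIVE_SATURATED };

    enum drive_mode set_drive_mode(unsigned modified_bins,   /* modified cache bins assigned to this drive */
                                   unsigned cfg_dmsweep,     /* lower limit for sweep mode                 */
                                   unsigned cfg_dmurgntb,    /* lower limit for urgent mode                */
                                   unsigned cfg_dmsaturb)    /* lower limit for saturated mode             */
    {
        if (modified_bins >= cfg_dmsaturb) return DRIVE_SATURATED;
        if (modified_bins >= cfg_dmurgntb) return DRIVE_URGENT;
        if (modified_bins >= cfg_dmsweep)  return DRIVE_SWEEP;
        return DRIVE_NORMAL;
    }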
DRIVE STATUS: See DRIVE CACHE CHAIN STATUS. EDAC: Error Detection And Correction. EEPROM: Electrically Erasable Programmable Read-Only Memory.
EPROM: Erasable Programmable Read-Only Memory.
EXCESS CACHE CHAIN STATUS: The term used to describe a cache chain, either global or local, when that chain contains more than the desired maximum number of cache bins. The global cache chain will be in this status whenever the described device is powered on.
FIRMWARE: The collective set of logical instructions and control tables which control the described device's activities.
GAP; HOLE: When a cache bin contains two non-contiguous groups of sectors of cached data, the group of sectors between the two groups of cached data is considered to be a HOLE or GAP.
GAP ELIMINATION: The actions taken to make the cached data within a given cache bin contiguous. This can be done by reading data from the disk into the gap, or by taking whatever actions are necessary to allow one of the cached areas to be decached.
GAP READ: The elimination of a gap in cached, modified data in a single cache bin by reading intervening data from the disk.
GAP TABLE: A control table which contains a line which references each gap of each cache bin which currently contains a gap.
This table will usually be null, or will have no lines in it, since the logic of the described device eliminates gaps as soon as feasible after their creation by the host.
GAP WRITE: The elimination of a gap in cached, modified data in a single cache bin by writing some or all of the modified data in the cache bin from cache to the disk.
GLOBAL CACHE: Collectively the cache bins which are currently assigned to a given drive, but which have been placed in the global cache chain. Cache bins in the global cache chain are more readily accessible for decaching and reassignment to other drives.
GLOBAL CACHE CHAIN: The logical chain of cache bins which, although they may be currently assigned to maintain data for the various disk drives of the described device, are readily available to be decached and reused for caching data for any of the drives .
GLOBAL CACHE CHAIN STATUS; GLOBAL STATUS: A term describing the global cache condition based on the number of unmodified cache bins currently in the global cache chain. See CACHE CHAIN STATUS.
GLOBAL DRIVE MODE: A term describing a controlling factor of the drive operations; determined by the total number of modified cache bins assigned to all drives.
GLOBAL NORMAL DRIVE MODE: See GLOBAL NORMAL MODE.
GLOBAL NORMAL MODE: The condition, based on the total number of cache bins containing modified data, in which the described storage device can use its normal priorities with respect to the management of disk drive activities in order to reach its optimal performance level. In this mode, the background sweep, recycling, and cache ahead operations are under individual drive control.
GLOBAL POOL: See MODIFIED POOL.
GLOBAL SATURATED MODE: The global mode in which a very large number of cache bins contain modified data. All cache aheads, all recycling, and all read aheads are prohibited. The sweep is operating for all drives which have any modified cache bins assigned to them.
GLOBAL URGENT MODE: The global mode in which a large number of cache bins contain modified data. The background sweep is forced on for all drives which have any modified cache bins, cache aheads are prohibited, but read aheads are permitted under individual drive control.
GLOBAL LRU: The term used to reference that cache bin in the least-recently-used position of the global cache chain. GLOBAL MODE: See GLOBAL DRIVE MODE.
GLOBAL MODIFIED CACHE POOL: The pool of cache bins each of which contains one or more sectors of modified, or dirty, data.
GLOBAL STATUS: See GLOBAL CACHE CHAIN STATUS.
HASHING: A procedure used in many computer programs which is used to quickly determine the approximate location of some desired information. Used extensively in conventional caching devices to reduce the amount of searching to determine where, if at all, data for a given disk location may be located in cache. This methodology usually results in a search to locate the exact data location, such search prone to becoming very time consuming for large caches. The present invention uses no such hashing and/or search scheme, and is not subject to such time delays.
HOLE: See GAP.
HOST: The computer to which the caching device is attached.
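By way of illustration only, the following C fragment contrasts the described approach with hash-and-search schemes: because the ADT table keeps one line per disk bin, the cache location of any disk bin is obtained by a single direct index, in constant time, regardless of cache size. The structure and names are hypothetical simplifications.

    #define NO_CACHE_BIN 0xFFFFFFFFu          /* null value: the disk bin is not cached */

    struct adt_line {
        unsigned long cache_bin;              /* cache bin holding this disk bin's data, or NO_CACHE_BIN */
        unsigned long access_count;           /* example of per-bin statistics kept in the ADT table     */
    };

    /* One step, constant time: no hashing, no table search. */
    unsigned long lookup_cache_bin(const struct adt_line *adt, unsigned long disk_bin)
    {
        return adt[disk_bin].cache_bin;
    }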
HOST COMMAND: Any of the logical instructions sent from the host to the described device to instruct the device to take some kind of action, such as to send data to the host (a READ command) , to accept data from the host for storing (a WRITE command) , among others.
HOST SERVER: The portion of the caching device which interfaces with the host computer.
I/O SIZE: The size of a host I/O request as a number of sectors . LAD: See LEAST ACTIVE DRIVE.
LEAST ACTIVE DRIVE: The least active disk drive based on recent host I/O activity and short term history of host I/O activity.
Used in determining from which drive to steal a cache bin when one is needed for reuse, generally for another drive.
LEAST ACTIVE DRIVE CHAIN: See LEAST ACTIVE DRIVE LIST.
LEAST ACTIVE DRIVE LIST: The chained lines of the ADT-DISKS table which maintain the drive activity information.
LEAST-RECENTLY-USED TABLE: See LRU TABLE.
LINK: A term used to describe a line in a chained table, and which is tied, forward, backward, or both ways via pointers, to another line or other lines in the table.
LOCKED CACHE BIN; LOCKED BIN: The term used to describe a cache bin which is part of an ongoing I/O, either between the host and the solid state memory, or between the solid state memory and the rotating media storage device, or both.
LOGICAL SPINDLE: A spindle of a disk drive, or a logical portion thereof which has been designated a spindle for purposes of the described device. LRU: Least-Recently-Used, as pertains to that data storage cache bin which has not been accessed for the longest period of time.
LRU TABLE; LEAST-RECENTLY-USED TABLE: The table containing the information which allows the caching device's controller to determine which solid state memory data areas may be reused with the least impact on the cache efficiency. MAGNETIC DISK: See DISK.
MAGNETIC DISK MEDIA: A device which utilizes one or more spinning disks on which to store data.
MARGINAL CACHE CHAIN STATUS: The cache chain status, either global or local, in which the number of cache bins in the cache chain is approaching the smallest permissible. The device's control logic will attempt to keep the cache chain from becoming smaller, but will allow it to do so if no other course of action is available to handle the host activity.
MINIMAL CACHE CHAIN STATUS: The cache chain status, either global or local, in which the number of cache bins in the cache chain for a given disk drive is the smallest permissible. The device's control logic will not allow the cache chain to be smaller than this limit.
MOD TABLE; MODIFIED BINS TABLE: The table containing the information which indicates which disk bins' data are currently partly or wholly stored in cache bins and that those corresponding cache bins contain data which has been written into by the host but not yet copied to the corresponding disk. Used by the background sweep to determine which cache bin should be the next one copied to disk.
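By way of illustration only, a bit-map of modified bins of the kind kept in the MOD table may be pictured as the following C fragment, with one bit per disk bin; the array size and names are assumptions for the example.

    #include <stdbool.h>

    #define MAX_DISK_BINS 65536u                           /* assumed total number of disk bins */
    static unsigned mod_table[(MAX_DISK_BINS + 31) / 32];  /* one bit per disk bin              */

    void mod_set(unsigned long bin)   { mod_table[bin / 32] |=  (1u << (bin % 32)); }  /* bin now dirty   */
    void mod_clear(unsigned long bin) { mod_table[bin / 32] &= ~(1u << (bin % 32)); }  /* written to disk */
    bool mod_test(unsigned long bin)  { return (mod_table[bin / 32] >> (bin % 32)) & 1u; }

The background sweep can then scan this map for set bits when choosing the next cache bin to copy to disk.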
MODE: See DRIVE MODE and GLOBAL DRIVE MODE. MODIFIED BIN: See DIRTY BIN. MODIFIED BINS TABLE: See MOD TABLE. MODIFIED CACHE BIN: See DIRTY BIN. MODIFIED CACHE POOL: See MODIFIED POOL.
MODIFIED DATA: See DIRTY DATA.
MODIFIED POOL; MODIFIED CACHE POOL: A generally unordered list of all the cache bins which, at a given moment in time, contain modified data.
MOST ACTIVE DRIVE: The most active drive based on recent host I/O activity as reflected in the LEAST ACTIVE DRIVE LIST of the ADT table.
MRU: Most-Recently-Used, as pertains to that data storage track which has been accessed in the nearest time past.
NORMAL CACHE CHAIN STATUS: The cache chain status, either global or local, in which the number of cache bins in the cache chain is within the preset limits which are deemed to be the best operating range for a drive's cache chain.
NORMAL MODE: See DRIVE NORMAL MODE and GLOBAL NORMAL MODE.
NULL, NULL VALUE: A value in a table field which indicates the field should be considered to be empty; depending on usage, will be zero, or will be the highest value the bit structure of the field can accommodate.
NULL VALUE: See NULL.
OPERATING MODE: See DRIVE MODE and GLOBAL DRIVE MODE. PATH: See CHANNEL.
PERIPHERAL STORAGE DEVICE: Any of several data storage devices attached to a host for purposes of storing data. In the described device, may be a disk drive, but is not limited thereto. PHYSICAL TRACK: See DISK TRACK.
PLATEAU: In the context of this invention, the plateaus represent various amounts of cache which may be assigned to the disks or to global cache and which is protected for a given set of conditions. In this embodiment, the plateaus are fixed at initialization time, and are the same for all drives. In logical extensions of this invention, the plateaus for the
various drives would not necessarily be equal, and they may be dynamically adjusted during the operation of the device. PREFETCH: See CACHE AHEAD.
PRIVATE CHANNEL: See PRIVATE DATA CHANNEL.
PRIVATE DATA CHANNEL: In the described device, a physical data path used to move data between the solid state memory and a disk drive, but not used to move data between the host computer and the described device; therefore, private to the device.
PROXIMITY: A term for expressing the "nearness" of the data in a cache bin which is currently being accessed by the host to either end of the said cache bin.
PROXIMITY FACTOR: See CACHE-AHEAD FACTOR. QUEUED HOST COMMAND: Information concerning a host command which has been received by the described device, but which could not be immediately handled by the device; for example, a read cache miss.
QUEUED READ CACHE MISS: Information concerning a host read command which has been received by the described device, but, which could not be immediately fulfilled by the device. QUEUED READ COMMAND: See QUEUED READ CACHE MISS. QUEUED SEEK CACHE MISS: Information concerning a host seek command which has been received by the described device, and for which the data at the specified disk address is not currently stored in cache.
QUEUED WRITE CACHE MISS: Information concerning a host write command which has been received by the described device, but which could not be immediately handled by the device.
READ AHEAD: The reading of data from a disk bin into a cache bin as a part of the reading of requested data from disk into cache resulting from a read cache miss. The data so read will be the data from the end of the requested data to the end
of the disk bin. This is not to be confused with a cache ahead operation.
READ CACHE HIT: See CACHE READ HIT. READ CACHE MISS: See CACHE READ MISS. READ COMMAND: A logical instruction sent from the host to the described device to instruct the device to send data to the host .
READ FETCH: The reading of data from a disk bin into a cache bin in order to satisfy the host request for data which is not currently in cache. The data so read will be the data from the beginning of the requested data to the end of the requested data within the disk bin. This is not to be confused with a read ahead or a cache ahead operation.
READ HIT: See CACHE READ HIT.
READ MISS: See CACHE READ MISS.
READ QUEUE: A temporary queue of read cache miss commands; the information in this queue is used to cache the requested data and then to control the transmission of that data to the host.
RECONNECT: The action of reestablishing a logical data path connection on a channel between two devices, thus enabling the channel to be used for transmitting data. Used in handling queued read cache misses and queued writes.
RECYCLE: The term used to describe the retention of data in a cache bin beyond that bin's logical arrival at the global cache LRU position; such retention may be based on a number of factors, including whether or not some data in the cache bin was read at some time since the data in the cache bin was most recently cached, or since the data was last retained in cache as the result of a recycling action.
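By way of illustration only, the following C fragment sketches one simple form such a recycle test might take when a bin reaches the global LRU position; the flag, the counter, and the MAX_RECYCLES cap (loosely analogous to the recycle register bounded by CFG-RECYCLEM) are hypothetical simplifications.

    #include <stdbool.h>

    struct cache_bin_info {
        bool     read_since_cached;   /* set on any host read hit against this bin */
        unsigned recycle_count;       /* times this bin has already been recycled  */
    };

    #define MAX_RECYCLES 3u           /* assumed cap on successive recycles */

    /* Returns true if the bin should be retained (recycled) rather than reused. */
    bool should_recycle(struct cache_bin_info *b)
    {
        if (b->read_since_cached && b->recycle_count < MAX_RECYCLES) {
            b->read_since_cached = false;   /* the next recycle must be earned again   */
            b->recycle_count++;
            return true;                    /* rechain toward MRU instead of decaching */
        }
        return false;                       /* eligible for reuse */
    }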
ROTATING MAGNETIC DISK: See DISK.
ROTATING STORAGE MEDIA: A data storage device such as a magnetic disk.
SATURATED MODE; SATURATED: See DRIVE SATURATED MODE and GLOBAL SATURATED MODE.
SCSI: Small Computer System Interface; the name applied to the protocol for interfacing devices, such as a disk device, to a host computer.
SCSI CONTROL CHANNEL: A physical connection between devices which uses the SCSI protocol, and is made up of logical controllers connected by a cable.
SECTOR; SEGMENT: The logical sub-unit of a disk track or disk bin; the smallest addressable unit of data on a disk.
SECTOR ADDRESS: The numerical identifier of a disk sector, generally indicating the sequential location of the sector on the disk.
SECTOR OFFSET: In the described device, the relative location of a given sector within a cache bin or disk bin.
SEEK: The action of positioning the read/write head of a disk drive to some specific sector address. Usually done by a host in preparation for a subsequent read or a write command.
SEEK CACHE MISS: In the described device, a seek command from the host for which the data of the corresponding disk bin is not cached. The described device will place the information about the seek in a seek queue and attempt to execute a background read ahead in response to a seek cache miss.
SEEK COMMAND: A logical instruction sent from the host to the described device to instruct the device to position the read/write head of the disk to some specific sector address. In the described device, this is handled as an optional cache ahead operation.
SEEK QUEUE: A temporary queue of host seek miss commands which are waiting to be satisfied by background cache ahead actions, should time permit.
SEGMENT: See SECTOR.
SERIAL PORT: A means for communicating with a device, external to the described device, such as a terminal or personal computer, which in this context may be used to reset operating parameters, reconfigure the device, or make inquiries concerning the device's operations.
SOLID STATE MEMORY; SOLID STATE DEVICE; SSD: Storage media made up of solid state devices such as DRAMs. Used for the cache memory of the described device.
SOLID STATE STORAGE: See SOLID STATE MEMORY. SOLID STATE STORAGE DEVICE: See SOLID STATE MEMORY. SPINDLE: See LOGICAL SPINDLE. SSD: See SOLID STATE MEMORY.
SSD BIN ADDRESS: The address in the solid state memory at which the first byte of the first sector currently corresponding to a given disk bin resides.
STATUS: See DRIVE CACHE CHAIN STATUS and GLOBAL CACHE CHAIN STATUS.
STEALING A BIN: The acquisition of a cache bin from the global cache chain, or indirectly from another drive's cache chain, for use for data for a given drive when that drive does not give back a cache bin in exchange.
SWEEP: See BACKGROUND SWEEP. SWEEP MODE: See DRIVE SWEEP MODE.
TABLE SEARCH: A technique used in some devices to find references to certain data, such as the location of cached data. This procedure is often time consuming, and in general, it is not used in the described device in any of the time critical paths.
TIMEOUT: In the described device, a timeout occurs when, for a given drive, some cache bin has been holding dirty, or modified, data for more than some preset length of time. The occurrence of a timeout will place the drive in timeout mode if the sweep for that drive is not already active.
TIMEOUT MODE: See DRIVE TIMEOUT MODE.
WRITE CACHE HIT: See CACHE WRITE HIT.
WRITE CACHE MISS: See CACHE WRITE MISS.
WRITE COMMAND: A logical instruction sent from the host to the described device to instruct the device to accept data from the host for storing in the device.
WRITE QUEUE: A temporary queue of host write miss commands which are waiting to be satisfied due to the extremely heavy load of write activities which have temporarily depleted the supply of cache bins available for reuse. This queue will usually be null, or empty, and if it is not, the described device will be operating in the saturated mode.
WRITE THROUGH: A technique used in some caching devices to allow host write commands to bypass cache and write directly to the disk drive.
BRIEF DESCRIPTIONS OF DRAWINGS AND TABLES
These sections give a very brief description of each of the figures and tables presented in this specification. While some diagrams may be pertinent to more than one section, they will be described only in their primary section.
SYSTEM OVERVIEW
Figure 1 depicts the logic for a typical prior-art cached disk, computer data storage system.
Figure 2 depicts an overall view of the hardware component of one embodiment of the present invention.
Figure 3 depicts an overall view of one embodiment of the present invention which uses cached disks as a computer data storage unit.
CACHE ILLUSTRATIONS
These diagrams depict the cache of selected embodiments of the present invention as related to the cache statuses, and the modified pools as related to the various drive operating modes.
Figure 4 depicts one embodiment of drive cache structure as its various sizes relate to the drive cache statuses. Figure 5 depicts one embodiment of the pool of modified cache bins associated with a drive, showing how the pool's size relates to the drive cache modes.
Figure 6 depicts one embodiment of the global cache structure, as its various sizes relate to the global cache statuses.
Figure 7 depicts the composite pool of modified cache bins, showing how the pool's size relates to the global cache modes.
FIRMWARE - CACHE MANAGEMENT MODULES
These modules handle the cache management of the present invention. Each module may be invoked from one or more places in the firmware as needed. They may be called as a result of an interrupt, from within the background controller, or as a result of, or part of, any activity.
Figure 8 shows one embodiment of a cache-ahead determination procedure that determines which cache bin, if any, should be scheduled for a cache-ahead operation.
Figure 9 shows one embodiment of a set drive mode procedure for setting the operating mode for a specific drive's cache based on the number of modified cache bins assigned to that drive.
Figure 10 shows one embodiment of a set drive cache chain status module, which uses the information about the number of cache bins currently assigned to the specified drive to set a cache chain status for that drive. Figure 11 shows an exemplary procedure for locating a cache bin to reuse.
Figure 12 shows an exemplary method by which a specific drive gets a cache bin from the global cache to use for its current caching requirements when the drive can afford to give a cache bin to the global chain in return.
Figure 13 shows an exemplary method by which a specific drive gets a cache bin from the global cache to use for its current caching requirements when no drive can give a bin to global.
Figure 14 shows an exemplary method by which a specific drive indirectly gets a cache bin from another drive's cache to use for its own current caching requirements.
Figure 15 shows an exemplary method by which a specific drive gets a cache bin for its use from any drive when none is available in either the drive's own cache chain or in the global cache.
Figure 16 shows one embodiment of logic for determining the least active drive whose cache chain is the best candidate for supplying a bin for use by another drive.
Figure 17 shows an exemplary procedure for setting the operating mode for the global cache based on the total number of modified cache bins.
Figure 18 depicts one embodiment of a set global cache chain status module, which uses the information about the number of cache bins currently assigned to the global portion of cache to set a global cache chain status.
Figure 19 shows exemplary logic for determining if a current host request is a cache hit or must be handled as a cache miss.
Figure 20 depicts exemplary logic for setting up a bin address list based on a host command.
Figure 21 depicts exemplary logic for translating the sector address of a host command into a bin identifier.
Figure 22 shows an exemplary method for updating and rechaining the least-active disk list.
Figure 23 shows an exemplary method by which a cache bin which has just been involved in a cache read hit is rechained to the MRU position of that drive's cache chain, if that action is appropriate.
Figure 24 depicts exemplary logic for moving a newly modified cache bin from a drive's cache chain to the pool of modified cache bins.
Figure 25 depicts exemplary logic for moving a newly modified cache bin from the global cache chain to the pool of modified cache bins.
Figure 26 depicts one embodiment of a module which determines whether or not sufficient cache bins are available for reuse to handle a current requirement.
FIRMWARE - MODULES ENTERED AS A RESULT OF HOST INTERRUPTS
In exemplary embodiments, these modules are invoked, directly or indirectly, by a host interrupt. These modules may call other modules which are described in a different section.
Figure 27 shows exemplary logic of the host-command interrupt management.
Figure 28 depicts exemplary logic of handling the completion of a host command.
Figure 29 depicts exemplary logic for handling a read command from the host when all the data to satisfy the command is found to be in the cache memory.
Figure 30 depicts exemplary logic for the host interrupt handling of a read command when some or all of the data required to satisfy the command is not in the cache memory.
Figure 31 depicts exemplary logic for handling a seek command from the host when the addressed bin is not currently cached.
Figure 32 depicts exemplary logic for the host interrupt handling of a write command when all the data bins related to that write are found to be in the cache memory.
Figure 33 depicts exemplary logic for the host interrupt handling of a write command when some or all of the data bins related to that write are not in the cache memory.
MODULES IN EXECUTIVE CONTROL AND MAIN LOOP
In one embodiment, these modules make up the executive control and the control loop which runs at all times during which an interrupt is not being processed. These modules may call other modules which are described in a different section.
Figure 34 depicts one embodiment of the initiation of a cache-ahead operation, if one is scheduled for the given drive.
Figure 35 shows one embodiment of the main firmware control for the described device.
Figure 36 shows one embodiment of the operations which take place when the described device is initially powered on.
Figure 37 shows one embodiment of the background operation control loop for a drive.
Figure 38 depicts an exemplary procedure which shuts down the described device when it is powered off.
Figure 39 shows one embodiment of the logic for eliminating gaps in the modified portions of the cached data in a cache bin.
Figure 40 shows one embodiment of the handling of the queued commands for a given drive.
Figure 41 depicts, for one embodiment of this invention, the methods by which a module rechains cache bins from the LRU or near LRU positions of the global cache chain to the MRU or LRU positions of the specific, private disk cache chains; the movement is based on recycling information on each cache bin reflecting that bin's activity since first being cached or since it was last successfully recycled.
Figure 42 depicts one embodiment of the handling of a queued read cache miss operation.
Figure 43 depicts one embodiment of the method of fetching missing data from disk.
Figure 44 depicts one embodiment of the handling of a queued seek cache miss operation.
Figure 45 depicts one embodiment of the logic for determining if the cache associated with a given drive has included any modified cache bins for more than the specified time limit. Figure 46 depicts one embodiment of the initiation of a background write from cache to disk, if one is appropriate for the given drive at the current time.
Figure 47 depicts one embodiment of a method for identifying a modified cache bin assigned to a specified disk, which cache bin to write from cache to disk at this time.
Figure 48 depicts one embodiment of the handling of a queued write cache miss operation.
MODULES ENTERED VIA DRIVE INTERRUPTS
In one embodiment, these modules are invoked, directly or indirectly, by an interrupt from one of the disk drives. These modules may call other modules which are described in a different section.
Figure 49 shows one embodiment of logic for handling the termination of a cache-ahead operation.
Figure 50 shows one embodiment of logic of the drive-interrupt management. Figure 51 depicts exemplary actions to be taken when a read from, or write to, a drive has completed, such read or write initiated for the purpose of eliminating a gap or gaps in the modified cached data of a cache bin.
Figure 52 depicts exemplary logic for the termination of a seek which was initiated for the purpose of writing modified data from a cache bin to its corresponding disk drive.
Figure 53 depicts exemplary logic for handling the termination of a background write from cache to disk.
MODULES ENTERED VIA INTERNAL INTERRUPTS
In one embodiment, these modules are invoked, directly or indirectly, by an interrupt from within the described device itself. These modules may call other modules which are described in a different section.
Figure 54 depicts exemplary handling of a power-off interrupt .
Figure 55 depicts exemplary logic for initiation of the background sweep for writing from cache to disk when the device is in its power-down sequence.
MODULES ENTERED VIA SERIAL PORT INTERRUPTS
In exemplary embodiments, this module is invoked, directly or indirectly, by an interrupt from a device attached to the serial port of the described device. These modules may call other modules which are described in a different section.
Figure 56 depicts exemplary logic for handling the communications with a peripheral attached to the device via the serial port .
CACHE ASSIGNMENT ILLUSTRATIONS
Figure 57 is a graph illustrating eight cases of the cache assignments of a three-plateau configuration.
Figure 58 is a graph illustrating eight cases of the cache assignments of a five-plateau configuration.
TABLES
A variety of tables are interspersed with the text to clarify the logic of the described exemplary embodiments. In addition, there are a number of tables included for the purpose of showing examples of the control tables to illustrate some of the possible logical situations in which the described device may be operating.
LOGIC TABLES
Table CM1 depicts exemplary operating rules based on the drive modes.
Table CM2 depicts exemplary control precedence for the global modes and the drive modes.
Table CM3 summarizes exemplary rules for setting the drive modes . Table CM4 summarizes exemplary rules for setting the global mode.
Table CS1 summarizes exemplary rules for setting the drive cache statuses .
Table CS2 summarizes exemplary rules for setting the global cache status.
Table CS3 gives an exemplary set of the possible status conditions and the corresponding actions required to acquire a cache bin for reuse.
Table CS4 gives an exemplary set of bin acquisition methods based on the combinations of cache statuses.
Table LB1 gives an exemplary set of cache bin locking rules for operations involving host reads.
Table LB2 gives an exemplary set of cache bin locking rules for operations involving host writes.
Table LB3 gives an exemplary set of cache bin locking rules for operations involving caching activities.
Table LB4 gives an exemplary set of cache bin locking rules for operations involving sweep activities.
Table LB5 gives an exemplary set of cache bin locking rules for operations involving gaps.
CONTROL TABLE EXAMPLES
Tables TCA through TCG give an example of a Configuration Table which defines one exemplary configuration of the present invention.
Table TCA gives an exemplary set of sizing parameters for one configuration of the described device and some basic values derived therefrom.
Table TCB gives an exemplary set of drive cache status parameters for one configuration of the described device and some basic values derived therefrom.
Table TCC gives an exemplary set of global cache status parameters for one configuration of the described device and some basic values derived therefrom.
Table TCD gives an exemplary set of drive mode parameters for one configuration of the described device and some basic values derived therefrom. Table TCE gives an exemplary set of global mode parameters for one configuration of the described device and some basic values derived therefrom.
Table TCF gives an exemplary set of recycling parameters for one configuration of the described device. Table TCG gives an exemplary set of drive activity control parameters for one configuration of the described device.
Table TLB gives an example of the unindexed values of an LRU table at the completion of system initialization at power up time.
Table TLC gives an exemplary snapshot of portions of an LRU table that are indexed by spindle number, taken at the completion of system initialization at power up time.
Table TLD gives an exemplary snapshot of portions of an LRU table that are indexed by cache bin number, taken at the completion of system initialization at power up time.
Tables TLE through TLG give exemplary snapshots of some portions of a Least-Recently-Used Table taken during the operation of the present invention.
Table TLE gives an example of the unindexed values of an LRU table at an arbitrary time during the operation of the present invention.
Table TLF gives an exemplary snapshot of portions of an LRU table that are indexed by spindle number, taken at an arbitrary time during the operation of the present invention.
Table TLG gives an exemplary snapshot of portions of an LRU table that are indexed by cache bin number, taken at an arbitrary time during the operation of the present invention.
Tables TAB through TAD give an example of an initial Address Translation (ADT) Table.
Table TAB gives an example of the unindexed values of an ADT table at the completion of system initialization at power up time.
Table TAC gives an exemplary snapshot of portions of an ADT table that are indexed by spindle number, taken at the completion of system initialization at power up time.
Table TAD gives an exemplary snapshot of portions of an ADT table that are indexed by disk bin number, taken at the completion of system initialization at power up time.
Tables TAE through TAG give exemplary snapshots of some portions of an Address Translation Table taken during the operation of the described device.
Table TAE gives an example of the unindexed values of an ADT table at an arbitrary time during the operation of the described device.
Table TAF gives an exemplary snapshot of portions of an ADT table that are indexed by spindle number, taken at an arbitrary time during the operation of the described device.
Table TAG gives an exemplary snapshot of portions of an ADT table that are indexed by disk bin number, taken at an arbitrary time during the operation of the described device.
Tables TGB through TGD give an example of an initial GAP Table.
Table TGB gives an example of the unindexed values of a GAP table at the completion of system initialization at power up time.
Table TGC gives an exemplary snapshot of portions of a GAP table that are indexed by spindle number, taken at the completion of system initialization at power up time.
Table TGD gives an exemplary snapshot of portions of a GAP table that are indexed by gap number, taken at the completion of system initialization at power up time.
Tables TGE through TGG give exemplary snapshots of some portions of a GAP table taken at an arbitrary time during the operation of the described device.
Table TGE gives an example of the unindexed values of a GAP table taken at an arbitrary time during the operation of the described device.
Table TGF gives an exemplary snapshot of portions of a GAP table that are indexed by spindle number, taken at an arbitrary time during the operation of the described device.
Table TGG gives an exemplary snapshot of portions of a GAP table that are indexed by gap number, taken at an arbitrary time during the operation of the described device.
Table TMD gives an exemplary snapshot of portions of a Modified Bins table taken at the completion of system initialization at power up time.
Table TMG gives an exemplary snapshot of some portions of a Modified Bins Table taken during the operation of the described device.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
The present invention is a computer peripheral data storage device consisting of a combination solid state memory and one or more mass storage devices, such as rotating magnetic disks; such device having the large capacity of magnetic disks with near solid state speed at a cost per megabyte approaching that of magnetic disk media. For the purposes of this discussion, embodiments will be described with regard to magnetic disk media. However, it is to be understood that the teachings of this invention are equally applicable to other types of mass storage devices, including optical disk devices, and the like. This invention derives its large storage capacity from the rotating magnetic disk media. Its high speed performance stems from the combination of a private channel between the two storage media, one or more microprocessors utilizing a set of unique data management algorithms, a unique prefetch procedure, parallel activity capabilities, and an ample solid state memory. This hybrid storage media gives overall performance near that of solid state memory for most types of computer workloads while practically never performing at less than normal magnetic disk speeds for any workload.
To the host computer, the present invention appears to be one or more directly addressable entities, such as magnetic
disks. By the combination, within the device, of a solid state memory and one or more magnetic disk devices, private data communication lines are established within the device which avoid contention with normal data transfers between the host and the device, and transfers between the solid state memory and the disk media. These private data channels permit unrestricted data transfers between the two storage media with practically no contention with the communication between the host computer and the present invention. Utilization of ample solid state memory permits efficient retention of data for multiple, simultaneously active data streams. Management of the storage is via microprocessors which anticipate data accesses based on historical activity. Data is moved into the solid state memory from the one or more mass memory devices based on management algorithms which insure that no table searches need be employed in the time-critical path.
Host computer accesses to data stored in the solid state memory are at near solid state speeds; accesses to data stored in the mass memory but not in the solid state memory are at near mass memory speeds. All data sent from the host to the device is transferred at solid state speeds limited only by the channel capability.
One embodiment of the present invention includes a power backup system which includes a rechargeable battery; this backup system is prepared to maintain power on the device should the outside power be interrupted. If such a power interruption occurs, the device manager takes whatever action is necessary to place all updated data onto mass storage before shutting down the entire device. Information about functional errors and operational statistics are maintained by the diagnostic module-error logger. Access to this module is via a device console and/or an attached
personal type computer. The console and/or personal computer are the operator's access to the unit for such actions as powering the unit on and off, reading or resetting the error logger, inquiring of the unit's statistics, and modifying the device's management parameters and configuration.
The following sections of this paper describe exemplary embodiments of structure and methods taught by the present invention.
HARDWARE DESCRIPTION
A device constructed in accordance with the teachings of this invention is depicted in Figure 2. Memory device 200 is a self-contained module which includes interfaces with certain external devices. Its primary contact is with host computer 201 via host interface 204. Host interface 204 comprises, for example, a dedicated SCSI control processor which handles communications between host computer 201 and memory manager 205. An operator interface is provided via the console 207, which allows the user to exercise overall control of the memory device 200. Other points of contact are to terminal 207 (such as a personal computer) for interrogating system status or the operating condition of memory device 200, and a telephone dial-in line 202 which can access via computer/terminal 207. Memory manager 205 handles all functions necessary to manage the storage of data in, and retrieval of data from disk drive 210 (or high capacity memory devices) and solid state memory 208, the two storage media. The memory manager 205 consists of one or more microprocessors 205-1, associated firmware 205-2, and management tables, such as Address Translation (ADT) Table 205-3 and Least Recently Used (LRU) Table 205-4. Solid state memory 208 is utilized for that data which memory manager 205, based on its experience, deems most useful to host computer 201, or most likely to become useful in the near future. Magnetic disk 210
is the ultimate storage for all data, and provides the needed large storage capacity. It may include one or more disk drives
210-1 through 210-N. Disk interface 209 serves as a separate dedicated control processor (such as an SCSI processor) for
handling communications between memory manager 205 and disk drive 210. In one embodiment, a separate disk interface 209-1 through 209-N is associated with each disk drive 210-1 through
210-N. Information about functional errors and operational statistics are maintained by diagnostic module-error logger 206. Access to module 206 is obtained through console 207. Console
207 serves as the operator's access to the memory device 200 for such actions as powering the system on and off, reading or resetting the error logger, or inquiring of system statistics.
The memory device 200 includes power backup system 203 which
includes a rechargeable battery. Backup system 203 is prepared to maintain power to memory device 200 should normal power be interrupted. If such a power interruption occurs, the memory manager 205 takes whatever action is necessary to place all updated data stored in solid state memory 208 onto magnetic disk
210 before shutting down memory device 200.
Figure 3 depicts a hardware controller block diagram of one embodiment of this invention. As shown in Figure 3, hardware controller 300 provides three I/O ports, 301, 302, and 303. I/O ports 301 and 302 are single-ended or differential wide or
narrow SCSI ports, used to connect hardware controller 300 to one or more host computers 201 (Figure 2). I/O ports 303-1 through 303-N are single-ended SCSI ports used to connect controller 300 to disk drive 210 (which in this embodiment is a
5.25" or 3.5" magnetic hard disk drive) . Disk drives 210-1
through 210-N provide long-term non-volatile storage for data that flows into controller 300 from host computers 201. In one embodiment, a separate SCSI port 303-1 through 303-N and
disk/cache interface 313-1 through 313-N are associated with each disk drive 210-1 through 210-N. "Differential" and "single-ended" refer to specific electrical characteristics of SCSI ports; the most significant distinction between the two lies in the area of acceptable I/O cable length. The SCSI aspects of I/O ports 301, 302, and 303 are otherwise identical. Cache memory 308 (corresponding to memory 208) is a large, high-speed memory used to store, on a dynamic basis, the currently active and potentially active data. The storage capacity of cache memory 308 can be selected at any convenient size and, in the embodiment depicted in Figure 3, comprises 64 Megabytes of storage. Cache memory 308 is organized as 16 Megawords; each word consists of four data bytes (32 bits) and seven bits of error-correcting code. Typically, the storage capacity of cache memory 308 is selected to be within the range of approximately one-half of one percent (0.5%) to 100 percent of the storage capacity of the one or more magnetic disks 210 (Figure 2) with which it operates. A small portion of cache memory 308 is used to store the tables required to manage the caching operations; alternatively, a different memory (not shown, but accessible by microcontroller 305) is used for this purpose. Error Detection and Correction (EDAC) circuitry 306 performs error detecting and correcting functions for cache memory 308. In this embodiment, EDAC circuitry 306 generates a seven-bit error-correcting code for each 32-bit data word written to cache memory 308; this information is written to cache memory 308 along with the data word from which it was generated. The error-correcting code is examined by EDAC circuitry 306 when data is retrieved from cache memory 308 to verify that the data has not been corrupted since last written to cache memory 308. The modified Hamming code chosen for this embodiment allows EDAC circuitry 306 to correct all single-bit
errors that occur and detect all double-bit and many multiple-bit errors that occur. Error logger 307 is used to provide a record of errors that are detected by EDAC circuitry
306. The information recorded by error logger 307 is retrieved by microcontroller 305 for analysis and/or display. This information is sufficiently detailed to permit identification by microcontroller 305 of the specific bit in error (for single-bit errors) or the specific word in error (for double-bit errors) .
In the event that EDAC circuitry 306 detects a single-bit error, the bit in error is corrected as the data is transferred to whichever interface requested the data (processor/cache interface logic 316, host/cache interface logic 311 or 312, and disk/cache interface logic 313). A signal is also sent to microcontroller 305 to permit handling of this error condition (which involves analyzing the error based on the contents of error logger 307, attempting to scrub (correct) the error, and analyzing the results of the scrub to determine if the error was soft or hard).
In the event that EDAC circuitry 306 detects a double-bit error, a signal is sent to microcontroller 305. Microcontroller 305 will recognize that some data has been corrupted. If the corruption has occurred in the ADT or LRU tables, an attempt is made to reconstruct the now-defective table from the other, then relocate both tables to a different portion of cache memory 308. If the corruption has occurred in an area of cache memory 308 that holds user data, microcontroller 305 attempts to salvage as much data as possible (transferring appropriate portions of cache memory 308 to disk drives 210-1 through 210-N, for example) before refusing to accept new data transfer commands. Any response to a request for status from the host computer 201 will contain information that the host computer 201 may use to recognize that memory device 200 is no longer operating
properly. Microcontroller 305 includes programmable control processor 314 (for example, a 68360 microcontroller available from Motorola) , 64 kilobytes of EPROM memory 315, and hardware to allow programmable control processor 314 to control the following: I/O ports 301, 302, and 303, cache memory 308, EDAC 306, error logger 307, host/cache interface logic 311 and 312, disk/cache interface logic 313, processor/cache interface logic 316, and serial port 309. Programmable control processor 314 performs the functions dictated by software programs that have been converted into a form that it can execute directly. These software programs are stored in EPROM memory 315. In one embodiment, the host/cache interface logic sections 311 and 312 are essentially identical. Each host/cache interface logic section contains the DMA, byte/word, word/byte, and address register hardware that is required for the corresponding I/O port (301 for 311, 302 for 312) to gain access to cache memory 308. Each host/cache interface logic section also contains hardware to permit control via microcontroller 305. In this embodiment I/O ports 301 and 302 have data path widths of eight bits (byte) . Cache memory 308 has a data path width of 32 bits (word) .Disk/cache interface logic 313 is similar to host/cache interface logic sections 311 and 312. It contains the DMA, byte/word, word/byte, and address register hardware that is required for disk I/O port 303 to gain access to cache memory 308. Disk/cache interface logic 313 also contains hardware to permit control via microcontroller 305. In this embodiment, I/O port 303 has a data path width of eight bits (byte) . Processor/cache interface logic 316 is similar to host/cache interface logic sections 311 and 312 and disk/cache interface logic 313. It contains the DMA, half-word/word, word/half-word, and address register hardware that is required for programmable control processor 314 to gain access to cache memory 308.
Processor/cache interface logic 316 also contains hardware to permit control via microcontroller 305. In this embodiment, programmable control processor 314 has a data path width of 16 bits.
Serial port 309 allows the connection of an external device (for example, a small computer) to provide a human interface to the system 200. Serial port 309 permits initiation of diagnostics, reporting of diagnostic results, setup of system 200 operating parameters, monitoring of system 200 performance, and reviewing errors recorded inside system 200. In other embodiments, serial port 309 allows the transfer of different and/or improved software programs from the external device to the control program storage (when memory 315 is implemented with EEPROM rather than EPROM, for example).
FIRMWARE
In one embodiment of the present invention firmware provides an active set of logical rules which is a real-time, full-time manager of the device's activities. Among its major responsibilities are the following: 1. Initialization of the device at power up.
2. Responses to host commands.
3. Cache management, including the movement of data between cache memory and the various integral disk drives.
4. Responses to console or serial port inquiries and commands.
5. Power line monitoring and controlled shutdown of the device on indications of power down conditions.
Each of the above is described in the following sections of the text and is illustrated in the accompanying diagrams and tables; a simplified sketch of the overall control flow is given below.
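By way of illustration only, that overall control flow, detailed in the sections that follow, may be pictured as the following C fragment; every routine shown is a hypothetical stand-in for the modules of Figures 35 through 38 and the interrupt-driven modules described above.

    /* Hypothetical routines standing in for the modules of Figures 35-38. */
    void initialize_hardware_and_tables(void);
    int  power_down_requested(void);
    int  num_drives(void);
    void service_queued_commands(int drive);
    void run_background_tasks(int drive);
    void controlled_shutdown(void);

    void firmware_executive(void)
    {
        initialize_hardware_and_tables();          /* power-up tests and table setup (Figure 36) */
        while (!power_down_requested()) {          /* closed control loop (Figures 35 and 37)    */
            for (int d = 0; d < num_drives(); d++) {
                service_queued_commands(d);        /* queued read, write, and seek cache misses */
                run_background_tasks(d);           /* background sweep writes and cache-aheads  */
            }
            /* Host, drive, internal, and serial-port interrupts are serviced
             * asynchronously and may preempt this loop at any time. */
        }
        controlled_shutdown();                     /* write modified data to disk (Figure 38) */
    }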
INITIALIZATION
When the described device is powered up from a quiescent condition, control is transferred to the firmware executive controller. See Figure 35. The first task of the executive is to test the various hardware components and initialize the entire set of control tables. See Figure 36. After completion of initialization, the executive enters a closed loop which controls the background tasks associated with each disk drive. See Figure 37. When power to the device is interrupted, the executive initiates a controlled shutdown. See Figure 38. Between power up and shutdown, the system reacts to host commands and, most importantly, is proactive in making independent decisions about its best course of action to maintain the most efficient operation.
CONTROL TABLES
The activities of the present invention are controlled by firmware which in turn is highly dependent on a set of logical tables. The configuration of the device and records of the activities and data whereabouts are maintained in tables which are themselves stored in memory in the described device. There are five main types of tables involved in the device's firmware:
1. The Configuration Table (CFG) table; this table is made up of unindexed items which describe the configuration of the device and some of the values defining the device's rules for operation.
2. The Address Translation (ADT) table; the primary function of this table is to maintain records of which disk bins' data is cached in which cache bins at each instant. It also maintains some historical records of disk bins' activities. 3. The Least Recently Used (LRU) table; this table is central to the logic of managing the cache bins of the device. It maintains information on which portions of cache bins contain
valid data, which portions contain modified data, the order of most recent usage of the data in the cache bins, the recycling control information, and any other information necessary to the operation of the device. 4. The Gap (GAP) table; this table works in conjunction with the LRU table in keeping track of the modified portions of data within cache bins. This table comes into play only when there is more than one non-contiguous modified portion of data within any one cache bin. 5. The Modified Bins (MOD) table; this table keeps a bit-map type record of all disk bins, with an indicator of whether or not the cache bin currently related to the disk bin contains modified data which is waiting to be written to the disk. Each of the above is described in the following sections of the text.
FORMAT OF CONFIGURATION (CFG) TABLE
The Configuration table describes an operational version of the present invention and gives the basic rules for its operation.
SUMMARY OF CONFIGURATION TABLE
SIZES OF THINGS
CFG-DRIVES      number of disk drives currently on the device
CFG-DRVSIZE     capacity, in megabytes, of each disk drive
CFG-DRIVEB      capacity, in bins, of each disk drive = CFG-DRVSIZE / CFG-BINSIZE
CFG-BINSIZE     size, in sectors, of each cache bin and each disk bin
CFG-SECSIZE     size, in bytes, of the sectors on the disk drives
CFG-CACHMB      size, in megabytes, of entire cache
CFG-CACHBINS    size, in bins, of the entire cache = CFG-CACHMB / CFG-BINSIZE
DRIVE CACHE STATUS PARAMETERS
CFG-DSMARGB     lower limit (bins) of drive's chain in marginal status
CFG-DSNORPCT    lower limit (pct) of all cache for drive normal status
CFG-DSNORMB     lower limit (bins) of drive's chain in normal status = CFG-DSNORPCT * CFG-CACHBINS
CFG-DSEXCPCT    lower limit (pct) of all cache for drive excess status
CFG-DSEXCESB    lower limit (bins) of drive's chain in excess status = CFG-CACHBINS * CFG-DSEXCPCT
GLOBAL CACHE STATUS PARAMETERS
CFG-GSMARGB     lower limit (bins) of global chain in marginal status
CFG-GSNORPCT    lower limit (pct) of all cache, global normal status
CFG-GSNORMB     lower limit (bins) of global chain in normal status = CFG-GSNORPCT * CFG-CACHBINS
CFG-GSEXCPCT    lower limit (pct) of all cache, global excess status
CFG-GSEXCESB    lower limit (bins) of global chain in excess status = CFG-GSEXCPCT * CFG-CACHBINS
DRIVE MODE PARAMETERS
CFG-DMSWEEP     lower limit (bins) of modified bins for sweep mode
CFG-DMURGPCT    lower limit (pct) of modified bins for urgent mode
CFG-DMURGNTB    lower limit (bins) of modified bins for urgent mode = CFG-DMURGPCT * CFG-CACHBINS / CFG-DRIVES
CFG-DMSATPCT    lower limit (pct) of modified bins for saturated mode
CFG-DMSATURB    lower limit (bins) of modified bins for saturated mode = CFG-DMSATPCT * CFG-CACHBINS / CFG-DRIVES
GLOBAL MODE PARAMETERS
CFG-GMURGPCT    lower limit (pct) of modified bins for urgent mode
CFG-GMURGNTB    lower limit (bins) of modified bins for urgent mode = CFG-GMURGPCT * CFG-CACHBINS
CFG-GMSATPCT    lower limit (pct) of modified bins for saturated mode
CFG-GMSATURB    lower limit (bins) of modified bins for saturated mode = CFG-GMSATPCT * CFG-CACHBINS
RECYCLING CONTROL PARAMETERS
CFG-RECYCLEB    number of bins to be checked per recycle cycle
CFG-RECYCLEM    maximum value which the recycle register can attain
CFG-RECYCLEN    normal adjustment factor for recycle register
CFG-RECYCLEU    urgent adjustment factor for recycle register
DRIVE ACTIVITY PARAMETERS
CFG-LAD-MAX     total host I/O count that causes count adjustment
CFG-LAD-ADJUST  divisor to be used for adjusting drive I/O counts
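The derived fields in the summary above are simple products and quotients of the preset fields. The following C sketch shows one way the derivations could be computed; the struct layout and field names are illustrative assumptions, the percentage fields are assumed to be whole-number percentages (hence the division by 100), and the document's formulas are applied as written, without any megabyte-to-sector unit conversion.

    #include <stdint.h>

    /* Configuration (CFG) table: preset fields plus the fields derived from
     * them.  Field names mirror the CFG-* items in the summary above.       */
    struct cfg_table {
        /* preset sizes */
        uint32_t drives;        /* CFG-DRIVES                      */
        uint32_t drvsize;       /* CFG-DRVSIZE, megabytes          */
        uint32_t binsize;       /* CFG-BINSIZE, sectors per bin    */
        uint32_t secsize;       /* CFG-SECSIZE, bytes per sector   */
        uint32_t cachmb;        /* CFG-CACHMB, megabytes of cache  */
        /* preset percentages, assumed to be whole-number percent */
        uint32_t dsnorpct, dsexcpct, gsnorpct, gsexcpct;
        uint32_t dmurgpct, dmsatpct, gmurgpct, gmsatpct;
        /* derived fields */
        uint32_t driveb;        /* CFG-DRIVEB   */
        uint32_t cachbins;      /* CFG-CACHBINS */
        uint32_t dsnormb, dsexcesb, gsnormb, gsexcesb;
        uint32_t dmurgntb, dmsaturb, gmurgntb, gmsaturb;
    };

    /* Apply the derivation formulas of the configuration table summary.
     * The formulas are used exactly as written in the table.             */
    void cfg_derive(struct cfg_table *c)
    {
        c->driveb   = c->drvsize / c->binsize;           /* CFG-DRVSIZE / CFG-BINSIZE */
        c->cachbins = c->cachmb  / c->binsize;           /* CFG-CACHMB  / CFG-BINSIZE */

        c->dsnormb  = c->dsnorpct * c->cachbins / 100;   /* CFG-DSNORPCT * CFG-CACHBINS */
        c->dsexcesb = c->cachbins * c->dsexcpct / 100;   /* CFG-CACHBINS * CFG-DSEXCPCT */
        c->gsnormb  = c->gsnorpct * c->cachbins / 100;   /* CFG-GSNORPCT * CFG-CACHBINS */
        c->gsexcesb = c->gsexcpct * c->cachbins / 100;   /* CFG-GSEXCPCT * CFG-CACHBINS */

        c->dmurgntb = c->dmurgpct * c->cachbins / 100 / c->drives;
        c->dmsaturb = c->dmsatpct * c->cachbins / 100 / c->drives;
        c->gmurgntb = c->gmurgpct * c->cachbins / 100;
        c->gmsaturb = c->gmsatpct * c->cachbins / 100;
    }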
DETAILS OF CONFIGURATION TABLE
SIZING PARAMETERS
CFG-DRIVES
Definition: The number of logical spindles, or disk drives, currently incorporated in the device.
Initialization: Set to the number of available spindles, based on:
1) hardware supplied numbers;
2) exploration of available spindles during startup;
3) communication from the serial port.
CFG-DRVSIZE
Definition: Capacity, in megabytes, of each disk drive.
Initialization: Set to disk size based on information supplied by the disk drive.
CFG-DRIVEB
Definition: Capacity, in bins, of each disk drive.
Initialization: Set to disk size (CFG-DRVSIZE) divided by bin size (CFG-BINSIZE).
CFG-BINSIZE
Definition: Size, in sectors, of each cache bin and each disk bin.
Initialization: Set at the time the CFG table is created; may be reset via communication with the serial port when the device is totally inactive, offline, and has no data stored in its cache. In one example, this is preset to a number which creates a bin size of approximately 32KB. This is approximately 64 sectors if the sector size is 512 bytes.
CFG-SECSIZE
Definition: The size, in bytes, of the sectors on the attached disk drives. In one example, this is preset to 512 bytes.
Initialization: Preset to match the disk drive specifications.
CFG-CACHMB
Definition: The size, in megabytes, of the entire cache available to the device.
Initialization: Set to available cache size based on:
1) hardware supplied numbers;
2) exploration of available cache during startup;
3) communication from the serial port.
CFG-CACHBINS
Definition: The size, in bins, of the entire cache available to the device.
Initialization: Set to CFG-CACHMB / CFG-BINSIZE.
DRIVE CACHE STATUS PARAMETERS
CFG-DSMARGB
Definition: The lower limit (number of bins) of a drive's cache chain in marginal status. This is the number of cache bins each drive owns at startup time; it is the minimum number of cache bins the device logic will allow in a drive's private cache chain at any time.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-DSNORPCT
Definition: The minimum cache size assigned to a drive when that drive is in normal status; expressed as a percentage of all cache.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-DSNORMB
Definition: The lower limit (number of bins) of a drive's cache chain in normal status; if the number of cache bins in a drive's cache chain is below this number, the drive is in the marginal status. Device logic tries to keep each drive's cache equal to, or larger than, this.
Initialization: Calculated at device startup time as a portion of the entire cache in the device.
CFG-DSNORMB = CFG-DSNORPCT * CFG-CACHBINS
CFG-DSEXCPCT
Definition: The lower limit of the drive minimum cache size when the drive is in excess status; expressed as a percentage of the total cache, distributed over all drives. This also, and more importantly, defines the upper limit of a drive's normal status.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-DSEXCESB
Definition: The lower limit (number of bins) of a drive's cache chain in excess status; if the number of cache bins assigned to a spindle is below this number, the drive is in normal or marginal status. Device logic tries to keep each drive's cache smaller than this.
Initialization: Calculated at device startup time as a portion of the entire cache in the device.
CFG-DSEXCESB = CFG-CACHBINS * CFG-DSEXCPCT
GLOBAL CACHE STATUS PARAMETERS
CFG-GSMARGB
Definition: The absolute lower limit of the number of cache bins in the global cache chain when in global marginal status. This is the lowest number of cache bins permitted to be assigned to the global cache chain at any time; the logic of the device always keeps the number of cache bins in the global cache chain greater than this number.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-GSNORPCT
Definition: The lower limit (percent of total cache) of the amount of the total cache in the global cache chain when in global normal status.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-GSNORMB
Definition: The lower limit (number of cache bins) of bins in the global cache chain when in global normal status; also, the upper bound of the number of cache bins assigned to the global cache chain which defines the marginal global status.
Initialization: CFG-GSNORMB = CFG-GSNORPCT * CFG-CACHBINS
CFG-GSEXCPCT
Definition: The lower limit (percent of total cache) of the amount of the total cache in the global cache chain when in global cache excess status.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-GSEXCESB
Definition: The lower limit (number of cache bins) of the global cache chain when in global excess status; also defines the upper bound of the global normal status. If the number of bins assigned to the global cache chain exceeds this value, the global cache chain status is defined as excess. The global cache chain will usually be greater than this only during device startup.
Initialization: At startup, all but a small number of cache bins are assigned to the global cache chain; as the cache bins are required for a spindle's use, they will be assigned to the spindle.
CFG-GSEXCESB = CFG-GSEXCPCT * CFG-CACHBINS
DRIVE MODE PARAMETERS
CFG-DMSWEEP
Definition: The minimum number of cache bins in the modified pool for a given drive that places the drive in sweep mode and activates the background sweep for that drive.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-DMURGPCT
Definition: The percent of a drive's average share of all cache bins which, when modified, puts that drive into urgent mode.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-DMURGNTB
Definition: When this number of cache bins assigned to a drive are in the modified pool, the drive is in the urgent mode.
Initialization: Set based on other parameters.
CFG-DMURGNTB = CFG-DMURGPCT * CFG-CACHBINS / CFG-DRIVES
CFG-DMSATPCT
Definition: The percent of a drive's average share of all cache bins which, when modified, puts that drive into saturated mode.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-DMSATURB
Definition: When this number of modified bins assigned to a drive are in the modified pool, the drive is in the saturated mode.
Initialization: Set based on other parameters.
CFG-DMSATURB = CFG-DMSATPCT * CFG-CACHBINS / CFG-DRIVES
GLOBAL MODE PARAMETERS
CFG-GMURGPCT
Definition: The percent of the total number of cache bins which, when in the modified pool, places the device in global urgent mode.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-GMURGNTB
Definition: When this total number of modified bins are in the modified pool, the device is in the global urgent mode.
Initialization: Set based on other parameters.
CFG-GMURGNTB = CFG-GMURGPCT * CFG-CACHBINS
CFG-GMSATPCT
Definition: The percent of the total number of cache bins which, when in the modified pool, places the device in global saturated mode.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-GMSATURB
Definition: When this total number of modified bins are in the modified pool, the device is in the global saturated mode.
Initialization: Set based on other parameters.
CFG-GMSATURB = CFG-GMSATPCT * CFG-CACHBINS
RECYCLING CONTROL PARAMETERS
CFG-RECYCLEB
Definition: The number of bins to be checked for recycling per recycle cycle.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-RECYCLEM
Definition: The maximum value which the recycle register for a given cache bin can attain, regardless of the number of accesses between recycling actions on the related bin.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-RECYCLEN
Definition: The adjustment factor to be applied to the recycle register for a given bin when that bin is tested for recycling and
1) the drive to which the bin is assigned is operating in the normal mode; and
2) the global mode is normal.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-RECYCLEU
Definition: The adjustment factor to be applied to the recycle register for a given bin when that bin is tested for recycling and
1) the drive to which the bin is assigned is operating in the urgent mode; or
2) the global mode is urgent.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
DRIVE ACTIVITY PARAMETERS
CFG-LAD-MAX
Definition: The value of the total host I/O count that, when attained, causes the counts of I/O's for each drive to be adjusted downward by the least-active-drive tally adjustment factor.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
CFG-LAD-ADJUST
Definition: The divisor to be used for adjusting the count-relative tally of host I/O's for each drive.
Initialization: Preset to a predefined number; can be reset via serial port communication when the device is totally inactive, offline, and has no data stored in its cache.
FORMAT OF ADDRESS TRANSLATION (ADT) TABLE
The Address Translation Table, working with the LRU, GAP, and MOD tables, maintains the information required to manage the caching operations. There are three sections in the ADT table: the unindexed, or single-valued, items and two sets of indexed, or tabular, segments. For reference purposes, these three segments will be referenced as follows:
1. ADT-CONTROL, the unindexed section of the ADT table.
2. ADT-DISKS, the indexed section containing information pertaining to each logical spindle included in the described device.
3. ADT-BINS, the indexed section of the ADT table containing information pertaining to each logical disk bin of the entire described storage device.
ADT-CONTROL. THE UNINDEXED SEGMENT OF THE ADT TABLE
The unindexed segment of the ADT table contains fields whose values are dynamically variable; these fields are used primarily as counters of the device activities.
1) ADTC-ACCESSES. The total number of host I/O's to the device since the device was powered on or since this field was last reset.
2) ADTC-READS. The total number of host reads from the device since the device was powered on or since this field was last reset.
3) ADTC-WRITES. The total number of host writes to the device since the device was powered on or since this field was last reset.
ADT-DISKS. THE FIRST INDEXED SEGMENT OF THE ADT TABLE
The first tabular section of the ADT table contains information relating to each logical spindle. There is one line in this section for each logical spindle, and each line is referenced by logical spindle number.
1) ADTD-LINE-BEG. The first line number within the ADT-BINS table which relates to the referenced logical disk spindle. This is set during system configuration and is not changed during the storage device's operation. ADTD-LINE-BEG is used as an offset to locate lines in the ADT-BINS table that are associated with a specific logical spindle.
2) ADTD-HEAD-POS. The current position of the read/write head of the logical spindle. This is kept as a logical disk bin number and is updated each time the referenced disk head is repositioned.
3) ADTD-SWEEP-DIR. For each logical disk spindle, the direction in which the current sweep of the background writes is progressing. This is updated each time the sweep reverses its direction across the referenced disk.
4) ADTD-DISK-ACCESSES. A count of the number of times this logical disk spindle has been accessed by the host since the last time this field was reset. This is an optional field but, if present, is incremented each time the logical spindle is accessed for either a read or a write. This field may be used to influence the amount of cache assigned to each spindle at any given time.
5) ADTD-DISK-READS. A count of the number of times this logical disk spindle has been accessed by a host read operation since the last time this field was reset. This field may be used to influence the amount of cache assigned to each spindle at any given time.
6) ADTD-DISK-WRITES. A count of the number of times this logical disk spindle has been accessed by a host write operation since the last time this field was reset. This field may be used to influence the amount of cache assigned to each spindle at any given time.
7) ADTD-LAD-USAGE. A count related to the number of times this logical disk spindle has been accessed by the host. This field is incremented each time the logical spindle is accessed for either a read or a write, and it is recalculated when the current total count of host I/O's for all drives reaches a preset limit. This field is used to maintain balance among the various drives and the management of the amount of cache assigned to each spindle at any given time.
8) ADTD-LINK-MORE - the pointer to the ADTD line relating to the drive which has the next higher usage factor in the least-active-drive list. This is part of the bidirectional chaining of the LAD list lines. If this drive is the most active of all in the chain, ADTD-LINK-MORE will contain a null value.
9) ADTD-LINK-LESS - the pointer to the ADTD line relating to the drive which has the next lower usage factor in the least-active-drive list. This is part of the bidirectional chaining of the LAD list lines. If this drive is the least active of all in the chain, ADTD-LINK-LESS will contain a null value.
ADT-BINS TABLE. THE SECOND INDEXED SEGMENT OF THE ADT TABLE
There is one line in the second tabular portion of the ADT table for each logical disk bin of all spindles combined. A line is referred to by its line number, or index. The lines are grouped by logical spindle number so that all lines related to bins of the first logical spindle are in the first section of the table, followed by all lines for bins of the second logical spindle, and so on for all logical spindles in the storage device. The ADT-BINS line number, adjusted by the offset for the specific logical spindle, directly corresponds to a logical disk bin number on the disk. When the host wants to access or modify data on the disk, it does so by referencing a starting disk sector address and indicating the number of sectors to be accessed or modified. For caching purposes, the starting sector address is converted into a logical disk bin number and a sector offset within that logical disk bin: the disk sector address is divided by the number of sectors per logical disk bin; the quotient is the disk bin identifier and is the index into the ADT table, and the remainder is the offset into the bin. Using this index, the condition of the specified disk bin can be determined directly from data in the ADT table; no search is required to determine cache hits or misses. (An illustrative sketch of this conversion is given at the end of this section.) Each ADT-BINS line contains at least the following item:
1) ADTB-CACHE-BIN. This field contains the number of the logical cache bin which contains the data for the logical disk bin corresponding to this ADT table line number. By design, the value in ADTB-CACHE-BIN also points to the line in the LRU table related to the cached disk bin. A null value is stored in this field to indicate that the data which is stored in, or is destined to be stored in, the logical disk bin is not in cache memory. It is by means of this field that cache hits can be serviced completely without any table search. This field is updated each time data for a logical disk bin is entered into or removed from the cache memory.
For implementation of additional intelligence to be applied to the cache maintenance operations, but which is not described in this document, other fields may be included in the ADT-BINS section of the ADT table. For example, each ADT-BINS line may contain one or more activity monitoring fields such as, but not limited to, the following:
1) ADTB-BIN-ACCESSES. A count of the number of times this logical disk bin of this spindle has been accessed by the host since the last time this field was reset.
2) ADTB-BIN-READS. A count of the number of times this logical disk bin of this spindle has been accessed by a host read operation since the last time this field was reset.
3) ADTB-BIN-WRITES. A count of the number of times this logical disk bin of this spindle has been accessed by a host write operation since the last time this field was reset.
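The following C sketch illustrates the address conversion and the search-free hit test described in this section. The names, types, and the encoding of the null value are assumptions made for the illustration; only the divide/remainder arithmetic and the use of ADTB-CACHE-BIN follow the text.

    #include <stdint.h>
    #include <stdbool.h>

    #define NULL_BIN  UINT32_MAX   /* assumed encoding of the "null" cache-bin value */

    /* One ADT-BINS line: the cache bin (if any) holding this disk bin's data. */
    struct adt_bins_line {
        uint32_t cache_bin;        /* ADTB-CACHE-BIN, NULL_BIN when not cached */
    };

    extern struct adt_bins_line *adt_bins;          /* the ADT-BINS table         */
    extern uint32_t adtd_line_beg(unsigned spindle); /* per-spindle ADTD-LINE-BEG */

    /* Convert a disk sector address into a disk bin number and an offset
     * within that bin: quotient = bin, remainder = offset, with CFG-BINSIZE
     * sectors per bin.                                                      */
    static void sector_to_bin(uint32_t sector, uint32_t binsize,
                              uint32_t *disk_bin, uint32_t *offset)
    {
        *disk_bin = sector / binsize;
        *offset   = sector % binsize;
    }

    /* Determine a cache hit or miss directly from the ADT table.  Returns
     * true and the cache bin number on a hit; no table search is needed.  */
    bool lookup_cache_bin(unsigned spindle, uint32_t sector, uint32_t binsize,
                          uint32_t *cache_bin)
    {
        uint32_t disk_bin, offset;

        sector_to_bin(sector, binsize, &disk_bin, &offset);
        (void)offset;                      /* the offset is used by the data mover */

        uint32_t line = adtd_line_beg(spindle) + disk_bin;
        *cache_bin = adt_bins[line].cache_bin;
        return *cache_bin != NULL_BIN;
    }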
FORMAT OF LEAST RECENTLY USED (LRU) TABLE
The LRU table maintains the basic information pertaining to all cache bins. This includes a variety of information, generally of the following nature:
1. The assignment of cache bins to logical spindles;
2. The assignment of cache bins to disk bins;
3. Which cache bins are in the modified pool;
4. The relative times when data in cached disk bins was last accessed;
5. Which sectors of the cache bins contain valid information;
6. The existence of gaps, if any, in the data in the cached bins;
7. Other information required for managing the caching operations.
The information concerning relative access times is necessary for the caching control logic to always be aware of which cache bins are available for overwriting whenever data from uncached disk bins must be placed in cache. Each cache bin is nearly always allocated to one of the spindles; however, at any given time, a cache bin may be considered to be in one of the following logical places:
1. Chained into that spindle's local cache chain;
2. Chained into the global cache chain;
3. Resident in the modified cache bins pool.
The LRU table also provides certain redundancies for the data kept in the ADT table, thus contributing to system reliability. It is by means of the LRU table, the ADT table, and the GAP table that the system determines which cache bin to overwrite when cache space is needed for an uncached disk bin. There are three sections in the LRU table: the unindexed, or single-valued, items and two sets of indexed, or tabular, segments. The unindexed portion of the LRU table contains global data required to manage the caching process. The tabular portions provide the actual cache management information and are composed of pointers for LRU chaining purposes, pointers into the ADT and GAP tables, and the recycle control registers or flags. For reference purposes, these three segments will be referenced as follows:
1. LRU-CONTROL, the unindexed section of the LRU table which contains overall, global information concerning the cache bins.
2. LRU-DISKS, the indexed section which contains information pertaining to the cache bins associated with each logical spindle included in the device.
3. LRU-BINS, the indexed section of the LRU table which contains information pertaining to each cache bin of the entire storage device.
SUMMARY OF LRU TABLE FIELDS
The following is a brief reference list for the LRU table fields. See the detailed description for the entire set of information about these fields.
LRU-CONTROL. UNINDEXED SEGMENT OF THE LRU TABLE
1. LRUC-GLOBAL-BINS - total number of bins in global cache chain.
2. LRUC-GLOBAL-LRU - pointer to oldest bin in global cache chain.
3. LRUC-GLOBAL-MRU - pointer to newest bin of global cache chain.
4. LRUC-GLOBAL-STATUS - current status of the global cache chain.
5. LRUC-GLOBAL-DIRTY - total number of modified cache bins.
6. LRUC-GLOBAL-MODE - current operating mode for the global cache.
LRU-DISKS. FIRST INDEXED SEGMENT OF LRU TABLE. SPINDLES INFORMATION
1. LRUD-BINS-CHAIN - number of bins allocated to spindle's cache chain.
2. LRUD-BINS-DIRTY - number of modified cache bins assigned to spindle.
3. LRUD-DISK-LRU - pointer to oldest bin in spindle's cache chain.
4. LRUD-DISK-MRU - pointer to newest bin in spindle's cache chain.
5. LRUD-DISK-STATUS - current status of spindle's cache chain.
6. LRUD-DISK-MODE - current operating mode for the disk drive.
LRU-BINS TABLE. SECOND INDEXED SEGMENT OF LRU TABLE. BINS INFORMATION
1. LRUB-DISK-BIN - pointer to disk bin associated with cache bin.
2. LRUB-DISK-ID - identity of the spindle for this cache bin.
3. LRUB-CHAIN - flag indicating bin is in global or in a drive chain.
4. LRUB-LINK-OLD - pointer to LRUB line of next older cache bin.
5. LRUB-LINK-NEW - pointer to LRUB line of next newer cache bin.
6. LRUB-VALID-LOW - lowest sector in cache bin containing valid data.
7. LRUB-VALID-HIGH - highest sector in bin containing valid data.
8. LRUB-MOD-LOW - lowest sector in bin containing modified data.
9. LRUB-MOD-HIGH - highest sector in bin containing modified data.
10. LRUB-MOD-GAP - pointer into GAP table if any gaps exist in data.
11. LRUB-LOCKED - flags indicating if, and why, a cache bin is locked.
11.1. LRUB-LOCK-RDHIT - locked by a host read hit.
11.2. LRUB-LOCK-RDMISS - locked by a host read miss.
11.3. LRUB-LOCK-WRHIT - locked by a host write hit.
11.4. LRUB-LOCK-WRMISS - locked by a host write miss.
11.5. LRUB-LOCK-RD-FETCH - locked for fetch from disk for host read.
11.6. LRUB-LOCK-RD-AHEAD - locked for read ahead based on host read.
11.7. LRUB-LOCK-CACHE - locked for cache ahead operation.
11.8. LRUB-LOCK-GAP-READ - locked for read from disk to eliminate gap.
11.9. LRUB-LOCK-GAP-WRITE - locked for write to disk to eliminate gap.
11.10. LRUB-LOCK-SWEEP - locked for write of modified data to disk.
12. LRUB-RECYCLE - recycling register for the cache bin.
LRU-CONTROL, UNINDEXED SEGMENT OF THE LRU TABLE
The unindexed items pertain to the cache management and include the following single-valued items.
1. LRUC-GLOBAL-BINS - the total number of cache bins currently in the global cache chain.
2. LRUC-GLOBAL-LRU - the pointer to the oldest bin in the global cache chain. This LRU-CONTROL element points to the LRU-BINS table line whose corresponding cache data area is considered to be in the global chain and which has been left untouched for the longest period of time by a host read, cache ahead, or a cleaning operation. If there is new read activity for the referenced cache bin, it is updated and promoted to the MRU position for its spindle; if the new activity is a result of a host write, the cache bin is logically placed in the modified bins pool. The GLOBAL LRU cache bin is the first candidate for overwriting when new data must be placed in the cache for any spindle. When such overwriting does occur, this bin will be removed from the global chain and placed in one of the spindles' local LRU chains or in the modified pool.
3. LRUC-GLOBAL-MRU - the pointer to the LRU-BINS table line whose corresponding cache bin is in the global chain and which is considered to be the most recently used of the global cache bins. GLOBAL-MRU is updated every time a cache bin of any spindle is demoted from any local spindle's LRU chain, when a cache bin of the modified pool is cleaned by writing its modified data to its disk, or when, for any reason, a cache bin is chained to this position.
4. LRUC-GLOBAL-STATUS - the current status of the global cache chain. This relates to the number of cache bins which contain unmodified data and which are currently assigned to the global cache and, therefore, are currently linked into the global cache chain. The status is reset each time a cache bin is placed in or removed from the global cache chain. The global status is always excess, normal, marginal, or minimal.
5. LRUC-GLOBAL-DIRTY - the total number of modified cache bins, for all spindles, regardless of the spindle to which they are assigned. These cache bins contain data which has been written from the host into cache and are currently waiting to be written from cache to disk. This is increased by one whenever data in an unmodified cache bin is written by the host, and it is decreased by one whenever modified data resident in a cache bin is copied to the disk.
6. LRUC-GLOBAL-MODE - the current operating mode for the global cache. This relates to the total number of cache bins assigned to all disk drives which currently contain modified data. The mode is reset each time a cache bin for any drive is moved into or out of the modified pool. The global mode is always normal, urgent, or saturated.
LRU-DISKS. FIRST INDEXED SEGMENT OF LRU TABLE. SPINDLES INFORMATION
The first tabular section of the LRU table contains information relating to the cache bins assigned to each logical spindle. There is one line in this section for each logical spindle, and each line is referenced by logical spindle number.
1. LRUD-BINS-CHAIN - the number of cache bins allocated to the corresponding spindle's private cache chain. This field is used to maintain the number of cache bins containing clean data and currently allocated to this spindle's private cache. This count excludes those cache bins assigned to this spindle but which are allocated to the global cache (and thus linked into the global chain).
2. LRUD-BINS-DIRTY - the number of cache bins currently assigned to the corresponding spindle, each of which contains some modified, or dirty, data which is currently awaiting a write to disk. This number is increased by one whenever an unmodified cache bin associated with this spindle is updated by the host, and it is decreased by one whenever data in a modified cache bin associated with this spindle is copied to the disk.
3. LRUD-DISK-LRU - points to the spindle's cache bin (and to the corresponding line in the LRUB table) which is in the spindle's cache chain and which has been untouched by host activity for the longest period of time. It is updated when new activity for the referenced cache bin makes it no longer the least-recently-used of the referenced spindle. The referenced cache bin is the next candidate for demoting to global cache when this spindle must give up a cache bin. A cache bin demoted from the LRU position of any local cache chain enters the global chain at the MRU position.
4. LRUD-DISK-MRU - points to the cache bin (and to the corresponding line in the LRUB table) which has been most recently referenced by host read activity. The referenced cache bin will always be in the spindle's private cache. LRUD-DISK-MRU is updated each time a cache bin of the referenced spindle is touched by a read from the host, when data for the spindle is read from disk into cache as part of a cache ahead operation, or when a cache bin is promoted based on the recycling procedures. When such activity occurs, the address of the accessed cache bin is placed in LRUD-DISK-MRU and the chains are updated in the LRU-BINS table. If a bin is used from the LRU position of the global cache to fulfill the requirement for a cache bin for a given spindle, that global cache bin is allocated to the specific spindle at that time and either chained into the spindle's local chain or placed in the modified pool. Such action may require the receiving spindle, or some other spindle, to give its LRU cache bin to the top (MRU) of the global cache chain. Giving up the LRU cache bin in such a fashion does not decache the data in the bin; the data remains valid for the spindle from whose chain it was removed until the cache bin reaches the global LRU position and is reused for caching some other logical disk bin.
5. LRUD-DISK-STATUS - the current status of the cache chain for the disk drive. This relates to the number of cache bins which contain unmodified data and which are currently assigned to a given drive and, therefore, are currently linked into the drive's private cache chain. The status is reset each time a cache bin is placed in or removed from the disk's private cache chain. The drive status is always excess, normal, marginal, or minimal.
6. LRUD-DISK-MODE - the current operating mode for the disk drive. This relates to the number of cache bins assigned to the disk drive which currently contain modified data. The mode is reset each time a cache bin for the given drive is moved into or out of the modified pool, or when a sweep timeout occurs. The drive mode is always normal, timeout, sweep, urgent, or saturated.
LRU-BINS TABLE. SECOND INDEXED SEGMENT OF LRU TABLE. BINS INFORMATION
There is one line in this tabular portion for each cache bin in the cache data area. A line is referenced by its line number, or index. That line number directly corresponds to a logical bin in the cache data area. Each LRU-BINS table line contains pointer fields plus other control fields.
1. LRUB-DISK-BIN - the pointer to the ADT line which references this cache bin. By design, this value, along with the spindle identifier, identifies the logical disk bin whose data currently resides in this cache bin.
2. LRUB-DISK-ID - the identity of the spindle for this cache bin. A field containing the identity of the spindle to which this cache bin is currently assigned. This identity is maintained even when the cache bin is placed in the global cache chain or in the modified pool. In rare cases, this field may contain a null value, indicating that the cache bin is not assigned to any spindle and, of course, does not contain any valid data.
3. LRUB-CHAIN - a flag indicating whether the cache bin is in the global cache chain or in a drive cache chain. This is a one-bit marker where a one indicates the cache bin is in the global chain, and a zero indicates the cache bin is in one of the drives' cache chains. When the cache bin is in the modified pool, this field has no meaning.
4. LRUB-LINK-OLD - the pointer to the LRUB line relating to the next older (in usage) cache bin for the same drive. This is part of the bidirectional chaining of the LRU table lines. If this bin is the oldest of all in the chain, LRUB-LINK-OLD will contain a null value.
5. LRUB-LINK-NEW - the pointer to the LRUB line relating to the next newer (in usage) cache bin for the same drive. This is the other half of the bidirectional chaining of LRU table lines. If this bin is the newest of all in the chain, LRUB-LINK-NEW will contain a null value. (An illustrative sketch of this chaining is given at the end of this section.)
6. LRUB-VALID-LOW - the number of the lowest sector within the cache bin containing valid data. This is a bin-relative number.
7. LRUB-VALID-HIGH - the number of the highest sector within the cache bin containing valid data. This is a bin-relative number.
8. LRUB-MOD-LOW - the number of the lowest sector within the cache bin containing modified data, if any. This is a bin-relative number.
9. LRUB-MOD-HIGH - the number of the highest sector within the cache bin containing modified data. This is a bin-relative number.
10. LRUB-MOD-GAP - a pointer into the GAP table if any gaps consisting of uncached data exist within the modified portion of the currently cached portion within this cache bin. If one or more such gaps exist, this field points to the GAP table line containing information pertaining to the first of such gaps. If no such gaps exist, this field will contain a null value. Since normal workloads create few gaps and a background task is dedicated to the clearing of gaps, there will be very few, if any, gaps at any given instant during normal operations.
11. LRUB-LOCKED - a set of flags which indicates whether or not the cache bin is currently locked. This set of flags indicates whether or not the corresponding cache bin is currently the target of some operation, such as being acquired from the disk, being modified by the host, or being written to the disk by the cache controller, such operation making the cache bin unavailable for certain other operations. The following sub-fields each indicate some specific reason for which the cache bin is locked; such a lock may restrict some other specific operations involving this cache bin. More than one lock may be set for a given bin at any one time. For purposes of quickly determining if a cache bin is locked, these flags are treated as one field made up of sub-fields of one bit each. If found to be locked, the individual lock bits are inspected for the reason(s) for the lock.
11.1. LRUB-LOCK-RDHIT - a flag which, when set, indicates the cache bin is locked by a host read hit.
11.2. LRUB-LOCK-RDMISS - a flag which, when set, indicates the cache bin is locked by a host read miss.
11.3. LRUB-LOCK-WRHIT - a flag which, when set, indicates the cache bin is locked by a host write hit.
11.4. LRUB-LOCK-WRMISS - a flag which, when set, indicates the cache bin is locked by a host write miss.
11.5. LRUB-LOCK-RD-FETCH - a flag which, when set, indicates the cache bin is locked for a fetch from disk for a host read miss.
11.6. LRUB-LOCK-RD-AHEAD - a flag which, when set, indicates the cache bin is locked for a read ahead subsequent to a host read.
11.7. LRUB-LOCK-CACHE - a flag which, when set, indicates the cache bin is locked for a cache ahead operation.
11.8. LRUB-LOCK-GAP-READ - a flag which, when set, indicates the cache bin is locked for a read from disk to eliminate a gap in modified data.
11.9. LRUB-LOCK-GAP-WRITE - a flag which, when set, indicates the cache bin is locked for a write to disk to eliminate a gap in modified data.
11.10. LRUB-LOCK-SWEEP - a flag which, when set, indicates the cache bin is locked by the sweep for writing modified data to disk.
12. LRUB-RECYCLE - a field whose value indicates the desirability of recycling, or retaining in cache, the data currently resident in the cache bin. Its management and usage are as described in the recycling section of this document. The higher the value in this field, the more desirable it is to retain the data in this cache bin in cache when the cache bin reaches the LRU position in its spindle's LRU chain. This field may be one or more bits in size; for purposes of this description, it will be assumed to be four bits, allowing for a maximum value of 15.
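The bidirectional chaining described by LRUB-LINK-OLD, LRUB-LINK-NEW, LRUD-DISK-LRU, and LRUD-DISK-MRU can be illustrated with a small C sketch of the promotion of a cache bin to the MRU position of its drive's private chain. The structures, the null-pointer encoding, and the helper names are assumptions for illustration only; the sketch also assumes the bin is already linked into the drive's chain.

    #include <stdint.h>

    #define NIL  UINT32_MAX        /* assumed encoding of a null chain pointer */

    /* The chaining fields of one LRU-BINS line (other fields omitted). */
    struct lru_bins_line {
        uint32_t link_old;         /* LRUB-LINK-OLD: next older bin, or NIL  */
        uint32_t link_new;         /* LRUB-LINK-NEW: next newer bin, or NIL  */
    };

    /* The chain anchors of one LRU-DISKS line (other fields omitted). */
    struct lru_disks_line {
        uint32_t disk_lru;         /* LRUD-DISK-LRU: oldest bin in the chain */
        uint32_t disk_mru;         /* LRUD-DISK-MRU: newest bin in the chain */
    };

    /* Unlink cache bin b from its current position in the drive's chain. */
    static void chain_unlink(struct lru_bins_line *bins,
                             struct lru_disks_line *d, uint32_t b)
    {
        uint32_t older = bins[b].link_old, newer = bins[b].link_new;

        if (older != NIL) bins[older].link_new = newer; else d->disk_lru = newer;
        if (newer != NIL) bins[newer].link_old = older; else d->disk_mru = older;
    }

    /* Promote cache bin b to the MRU position of its drive's private chain,
     * as happens on a host read hit or a recycling promotion.              */
    void promote_to_mru(struct lru_bins_line *bins,
                        struct lru_disks_line *d, uint32_t b)
    {
        if (d->disk_mru == b)              /* already the most recently used */
            return;

        chain_unlink(bins, d, b);

        bins[b].link_old = d->disk_mru;    /* old MRU becomes next older     */
        bins[b].link_new = NIL;            /* nothing newer than the new MRU */
        if (d->disk_mru != NIL)
            bins[d->disk_mru].link_new = b;
        d->disk_mru = b;
        if (d->disk_lru == NIL)            /* chain was empty                */
            d->disk_lru = b;
    }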
FORMAT OF GAP TABLE
The GAP table maintains the information required to manage the gaps in the valid, modified data within bins of cached data. For each spindle, the gaps are chained such that all gaps for a given cache bin are grouped into a contiguous set of links. There are three sections in the GAP table: the unindexed, or single-valued, items and two sets of indexed, or tabular, segments. For reference purposes, these three segments will be referenced as follows:
1. GAP-CONTROL, the unindexed section of the GAP table.
2. GAP-DISKS, the indexed section containing information pertaining to each logical spindle included in the described device.
3. GAP-GAPS, the indexed section containing detailed information pertaining to each gap that currently exists.
GAP-CONTROL. UNINDEXED SEGMENT OF THE GAP TABLE
The unindexed items of the GAP table include the following single-valued items.
1) GAP-GAPS. The total number of gaps that currently exist for all logical spindles of the described storage device.
This is increased by one whenever a new gap is created, and it is decreased by one whenever a gap has been eliminated.
2) GAP-UNUSED-FIRST. The line number of the first unused line in the GAPS portion of the GAP table.
3) GAP-UNUSED-NUMBER. The number of unused lines in the GAPS portion of the GAP table.
GAP-DISKS TABLE. FIRST INDEXED SEGMENT OF THE GAP TABLE
The first tabular section of the GAP table contains summary information about the gaps relating to cache bins assigned to each logical spindle. There is one line in this section for each logical spindle, and it is indexed by logical spindle number. For the described version of this invention, this portion of the GAP table will contain four lines; however, any number of logical spindles could be included within the constraints of and consistent with the other elements of the device.
1) GAPD-NUMBER. The number of gaps that currently exist for this logical spindle. This is increased by one whenever a new gap is created in a cache bin assigned to this logical spindle, and it is decreased by one whenever a gap for this logical spindle has been eliminated. If no gaps exist in cache
bins assigned to this logical spindle, this value is set to zero.
2) GAPD-FIRST. A pointer which contains the GAP table line number of the first gap for the logical spindle.
3) GAPD-LAST. A pointer which contains the GAP table line number of the last gap for the logical spindle.
GAP-GAPS TABLE. SECOND INDEXED SEGMENT OF THE GAP TABLE
The second tabular section of the GAP table contains detail information about the gaps that exist. There is one line in this section for each gap in any cache bin assigned to any logical spindle, and it is indexed by arbitrary line number. Lines of the GAP-GAPS table are chained in such a way as to ensure that all GAP-GAPS lines relating to a given cache bin are chained into contiguous links. It is likely that, at any given time, very few, if any, of these lines will contain real information. The design of the storage device operations is to minimize the number of gaps that exist at any given moment, and when they do exist, the device gives a high priority to the background task of eliminating the gaps.
1) GAPG-DISK. The identity of the logical spindle to which this gap pertains. For each logical spindle, this value will match the logical spindle number. If this line of the GAP-GAPS table is not currently assigned to any logical spindle, this value will be null to indicate that this line of the table is available for assignment to a new gap in some disk's cache bin should such a gap be created by caching activities.
2) GAPG-BIN. The cache bin number in which this gap exists. The value in this field acts as an index into the LRU table for the cache bin in which this gap exists. The value in this field is null if this line of the GAP-GAPS table is not assigned to any cache bin; when this value is null, it indicates that this line is available to be used for some new gap should such a gap be created by caching activities.
3) GAPG-SECTOR-BEG. The sector number in the bin identified in GAPG-BIN which is the first that contains non-valid data; this sector is the beginning of the gap. The value in this field is meaningless if this line of the GAP-GAPS table is not assigned to any spindle.
4) GAPG-SECTOR-END. The sector number in the bin identified in GAPG-BIN which is the last that contains non-valid data; this sector is the end of the gap. The value in this field is meaningless if this line of the GAP-GAPS table is not assigned to any spindle.
5) GAPG-PREV. A pointer to a line of the GAP-GAPS table which contains details pertaining to another gap of the same cache bin of the same logical spindle, which gap precedes this gap in the gap chain for the same cache bin for the same logical spindle. If this line of the table does not currently represent a gap, this field is used to maintain the position of this table line in the available gaps chain. Note that the firmware will maintain the gap chains in such a fashion as to ensure that all gaps for a given cache bin assigned to a given spindle will be connected in contiguous links of the spindle gap chain. If this line of the GAP-GAPS table is not assigned to any spindle, the value in this field points to the preceding unused line of the GAP-GAPS table. If this is the first link in the unused gap chain, this value will be set to null.
6) GAPG-NEXT. A pointer to a line of the GAP-GAPS table which contains details pertaining to another gap of the same cache bin of the same logical spindle, which gap follows this gap in the gap chain for this logical spindle and cache bin. If this line of the table does not currently represent a gap, this field is used to maintain the position of this table line in the available gaps chain. If this line of the GAP-GAPS table is not assigned to any spindle, the value in this field points to the succeeding unused line of the GAP-GAPS table. If this is the last link in the unused gap chain, this value will be set to null.
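A minimal C rendering of one GAP-GAPS line, with the chaining fields described above, might look as follows. The types and the null encoding are assumptions for illustration; the field names mirror the GAPG-* items.

    #include <stdint.h>

    #define GAP_NIL  UINT32_MAX    /* assumed encoding of a null value / pointer */

    /* One GAP-GAPS line.  An unassigned line (disk == GAP_NIL) uses the
     * prev/next fields to sit in the unused-lines chain instead of a gap
     * chain.                                                              */
    struct gap_gaps_line {
        uint32_t disk;             /* GAPG-DISK: logical spindle, or GAP_NIL      */
        uint32_t bin;              /* GAPG-BIN: cache bin containing the gap      */
        uint32_t sector_beg;       /* GAPG-SECTOR-BEG: first non-valid sector     */
        uint32_t sector_end;       /* GAPG-SECTOR-END: last non-valid sector      */
        uint32_t prev;             /* GAPG-PREV: previous gap, or previous unused */
        uint32_t next;             /* GAPG-NEXT: next gap, or next unused line    */
    };

    /* One GAP-DISKS line: per-spindle summary of the gaps. */
    struct gap_disks_line {
        uint32_t number;           /* GAPD-NUMBER: gaps for this spindle          */
        uint32_t first;            /* GAPD-FIRST: first GAP-GAPS line, or GAP_NIL */
        uint32_t last;             /* GAPD-LAST: last GAP-GAPS line, or GAP_NIL   */
    };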
FORMAT OF MODIFIED BINS (MOD) TABLE
The Modified Bins (MOD) Table, working with the ADT, LRU, and GAP tables, maintains the information required to manage the background sweep operations of the described device. For each disk bin of each disk, the MOD table contains one bit. This bit is a one if modified data currently resides in the cache bin corresponding to the disk bin which relates to the said bit, and the bit is a zero if no such modified data exists for the disk bin. The bits are accessed in word-size groups, for example, 16 bits per access. If the entire computer word is zero, it is known that there is no modified data in the cache bins corresponding to the disk bins represented by those 16 bits. On the other hand, if the computer word is non-zero, there is modified data in cache for at least one of the related disk bins. Since the bits of the MOD table are maintained in the same sequence as the disk bins, and the starting point related to each disk is known, the MOD table can be used to relate disk bins to modified cached data. This information, along with the information in the ADT, LRU, and GAP tables, gives a method of quickly determining which cache bin's data should be written to disk at the next opportunity. See the descriptions of the ADT table items which contain information about disk drive sizes and current read/write head positions. There is one segment in the MOD table; it is indexed by an arbitrary value which is calculated from a disk bin number. Each line of the MOD table is a single 16-bit word. Each word, or line, contains 16 one-bit flags representing the condition of the corresponding 16 disk bins with respect to modified data. The reference name for the field is MOD-FLAGS.
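The word-at-a-time test described above lends itself to a short C sketch. The array name and the scan routine are assumptions for illustration; only the one-bit-per-disk-bin layout and the skip-a-zero-word behavior follow the text.

    #include <stdint.h>
    #include <stdbool.h>

    /* MOD table: one bit per disk bin, packed 16 bins per word (MOD-FLAGS). */
    extern uint16_t mod_flags[];            /* indexed by disk_bin / 16      */

    /* Test whether the cache holds modified data for a given disk bin. */
    static bool bin_is_modified(uint32_t disk_bin)
    {
        return (mod_flags[disk_bin / 16] >> (disk_bin % 16)) & 1u;
    }

    /* Scan forward from a disk bin (for example, the current head position,
     * ADTD-HEAD-POS) looking for the next bin with modified data waiting to
     * be written.  Whole words equal to zero are skipped 16 bins at a time.
     * Returns the bin number, or `limit` if none is found before `limit`.   */
    uint32_t next_modified_bin(uint32_t from_bin, uint32_t limit)
    {
        uint32_t bin = from_bin;

        while (bin < limit) {
            if (bin % 16 == 0 && mod_flags[bin / 16] == 0) {
                bin += 16;                  /* no modified data in these 16 bins */
                continue;
            }
            if (bin_is_modified(bin))
                return bin;
            bin++;
        }
        return limit;                       /* nothing modified in the range */
    }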
FORMAT OF CACHE BINS (BAL) TABLE
A temporary BAL table is created for each host I/O. The BAL table for a given host I/O is made up of a list of the disk bins involved in the I/O and their corresponding cache bins, if any. A BAL table will contain one line for each disk bin which is part of the I/O. A BAL table contains the information necessary for fulfilling the host I/O, that is, the entire set of information required to handle the I/O. There are two sections in the BAL table: the unindexed, or single-valued, items and a set of indexed, or tabular, segments. The unindexed portion of the BAL table contains data describing the I/O; the indexed, or tabular, portion gives the actual cache details about each disk bin involved in the host I/O. For reference purposes, these two portions will be referenced as follows:
1. BAL-COMMAND, the unindexed section of the BAL table which contains overall information concerning the I/O.
2. BAL-BINS, the indexed section which contains information pertaining to each disk bin associated with the host I/O.
BAL-COMMAND. UNINDEXED SEGMENT OF THE BAL TABLE
For each BAL table, the unindexed items in the BAL-COMMAND section describe the host I/O, and include the following single-valued items.
1. BALC-DISK-ID - the logical spindle to which the host I/O was addressed.
2. BALC-ADDRESS - the logical sector number of the first sector of data, on the specified logical spindle, to be transferred for the host I/O.
3. BALC-SIZE - the number of sectors of data to be transferred in the host I/O.
4. BALC-HIT - a flag indicating whether or not all the data (for a read command), or the whole data area (for a write command), for the entire host I/O is represented in cache.
BAL-BINS TABLE. INDEXED SEGMENT OF BAL TABLE. BINS INFORMATION
For each BAL table, the indexed items in the BAL-BINS section describe the details about each disk bin involved in the host I/O. There is one line in the table for each disk bin involved in the host I/O. Each line of a BAL table contains the following items.
1. BALB-DBIN - the disk bin number.
2. BALB-CBIN - the corresponding cache bin, if any, in which data of the disk bin is currently cached.
3. BALB-BEGSEC - the beginning sector number within the disk bin required for the host I/O.
4. BALB-ENDSEC - the final sector number within the disk bin required for the host I/O.
5. BALB-VALID-LOW - the beginning sector number within the cache bin which is in cache and which is required for the host I/O. For a cache hit, this will match the value in BALB-BEGSEC if this is the first bin involved in the I/O. For a cache hit on subsequent bins, if any, this will indicate the first sector (sector 0) of the bin.
6. BALB-VALID-HIGH - the final sector number within the cache bin which is in cache and which is required for the host I/O. For a cache hit, this will match the value in BALB-ENDSEC if this is the last bin involved in the I/O; for a cache hit on
penultimate bins, if any, this will indicate the last sector of the bin.
7. BALB-GAPS - a marker indicating whether or not there are any gaps in the required cached area of this cache bin.
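A minimal C sketch of a BAL table, under an assumed upper bound on the number of bins per host I/O, might look as follows; the types and the bound are illustrative assumptions, and the field names mirror the BALC-* and BALB-* items.

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_IO_BINS  64        /* assumed upper bound on bins per host I/O */

    /* One BAL-BINS line: details for a single disk bin touched by the I/O. */
    struct bal_bins_line {
        uint32_t dbin;                  /* BALB-DBIN: disk bin number             */
        uint32_t cbin;                  /* BALB-CBIN: cache bin, if cached        */
        uint16_t begsec, endsec;        /* BALB-BEGSEC / BALB-ENDSEC              */
        uint16_t valid_low, valid_high; /* BALB-VALID-LOW / BALB-VALID-HIGH       */
        bool     gaps;                  /* BALB-GAPS: gaps in the required area   */
    };

    /* A temporary BAL table, built per host I/O. */
    struct bal_table {
        /* BAL-COMMAND */
        uint16_t disk_id;          /* BALC-DISK-ID                              */
        uint32_t address;          /* BALC-ADDRESS: first logical sector        */
        uint32_t size;             /* BALC-SIZE: sectors to transfer            */
        bool     hit;              /* BALC-HIT: whole I/O represented in cache  */
        /* BAL-BINS */
        unsigned nbins;            /* number of disk bins involved in the I/O   */
        struct bal_bins_line bins[MAX_IO_BINS];
    };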
CACHE MANAGEMENT
The caching operations of the described device are based, in part, on the concepts of drive modes and cache statuses.
DRIVE MODES
Two types of drive modes are involved in the control of the described device: the global drive mode and a mode for each individual drive. For drives, the modes are based on the number of modified cache bins assigned to each drive. The global mode is based on the total number of modified cache bins for all drives. See Figure 9. The purpose of the modes is to control the described device's actions with respect to its discretionary disk activity such as the background sweep, cache ahead, read ahead, and the cache management of recycling. An illustration of a drive cache with respect to drive operating modes is given in Figure 5. An illustration of the global cache with respect to global operating modes is given in Figure 7.
DRIVE MODE DETERMINATION
The cache of each drive is always in one of the defined drive modes. The possible drive modes are normal, timeout, sweep, urgent, and saturated.
Table CM3 shows the rules for setting the drive modes.
DRIVE MODIFIED POOL SIZE (dp)       DRIVE MODE   NUMBER OF MODIFIED BINS FOR GIVEN DRIVE
dp < CFG-DMSWEEP                    normal       few or none
1 <= dp < CFG-DMSWEEP               timeout      few, timer has expired
CFG-DMSWEEP <= dp < CFG-DMURGNTB    sweep        sufficient to trigger sweep
CFG-DMURGNTB <= dp < CFG-DMSATURB   urgent       may affect performance
CFG-DMSATURB <= dp                  saturated    above level for good performance
TABLE CM3 DRIVE MODE DETERMINATION
GLOBAL MODE DETERMINATION
The global cache of the described device is always in one of the defined global modes. The possible global modes are normal, urgent, and saturated. Table CM4 shows the rules for setting the global mode. See Figure 17.
GLOBAL MODIFIED POOL SIZE (gp)      GLOBAL MODE   TOTAL NUMBER OF MODIFIED BINS
gp < CFG-GMURGNTB                   normal        in expected operating range
CFG-GMURGNTB <= gp < CFG-GMSATURB   urgent        may affect performance
CFG-GMSATURB <= gp                  saturated     is degrading performance
TABLE CM4 GLOBAL MODE DETERMINATION
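Tables CM3 and CM4 reduce to simple threshold comparisons. The following C sketch shows one possible rendering; the enumerations and the timer_expired flag (standing in for the sweep timer condition of the timeout mode) are illustrative assumptions.

    #include <stdbool.h>

    enum drive_mode  { DM_NORMAL, DM_TIMEOUT, DM_SWEEP, DM_URGENT, DM_SATURATED };
    enum global_mode { GM_NORMAL, GM_URGENT, GM_SATURATED };

    /* Drive mode per Table CM3.  dp is the drive's modified pool size in bins;
     * timer_expired stands for the sweep timer condition of the timeout mode
     * (how that timer is maintained is not shown here).                      */
    enum drive_mode drive_mode(unsigned dp, bool timer_expired,
                               unsigned dmsweep, unsigned dmurgntb, unsigned dmsaturb)
    {
        if (dp >= dmsaturb)           return DM_SATURATED;
        if (dp >= dmurgntb)           return DM_URGENT;
        if (dp >= dmsweep)            return DM_SWEEP;
        if (dp >= 1 && timer_expired) return DM_TIMEOUT;
        return DM_NORMAL;
    }

    /* Global mode per Table CM4.  gp is the total modified pool size in bins. */
    enum global_mode global_mode(unsigned gp, unsigned gmurgntb, unsigned gmsaturb)
    {
        if (gp >= gmsaturb) return GM_SATURATED;
        if (gp >= gmurgntb) return GM_URGENT;
        return GM_NORMAL;
    }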
DRIVE MODE OPERATING RULES
Whenever any cache activity involves the modification of a cache bin or the cleaning of a cache bin, the related modes are reset.
The sizes of the categories of modified pools on which the modes are based are shown in Figures 5 and 7. These sizes are chosen such that they enable the described device to act in an efficient manner based on the current cache conditions.
The operating rules for a given drive are based on that drive's operating mode. The drive mode governs various cache activities relating to the corresponding disk drive; in particular, it governs, drive by drive, the activation of the background sweep modules, the cache-ahead modules, recycling, and the read-ahead modules. Table CM1 summarizes the relationships between the drive modes and the operating rules.
MODE              SWEEP         CACHE-AHEAD   RECYCLE   READ-AHEAD
Drive-normal      off           on            normal    on
Drive-timeout     on by timer   on shared     normal    on
Drive-sweep       on by count   on shared     normal    on
Drive-urgent      on            off           urgent    on
Drive-saturated   on            off
TABLE CM1 DRIVE MODE OPERATING RULES
GLOBAL MODE OPERATING RULES
The operating rules for the drives may be overridden by the rules based on the global operating mode. If the global mode so indicates, a drive may be forced to operate in a manner other than would be indicated by its own cache mode. Table CM2 summarizes the relationships between the global modes and the drive modes. Global mode cedes control of individual drives to drive mode except as noted in table CM2. In case of conflict between the rules of the two types of modes, the global mode operating rules override the drive mode operating rules.
MODE               SWEEP           CACHE-AHEAD     RECYCLE         READ-AHEAD
Global-normal      drive control   drive control   drive control   drive control
Global-urgent      all on          drive control   urgent          drive control
Global-saturated   all on          all off         all off         drive control
TABLE CM2 GLOBAL MODE OPERATING RULES
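The override relationship of Table CM2 can be sketched as a function that adjusts a drive's own rule settings (Table CM1) according to the global mode. The structure and names are assumptions for illustration; entries that Table CM2 leaves under drive control are simply not touched.

    enum global_mode_t { GLOBAL_NORMAL, GLOBAL_URGENT, GLOBAL_SATURATED };

    /* The discretionary activities governed by the operating rules, as
     * already decided by the drive's own mode (Table CM1).             */
    struct activity_rules {
        int sweep;            /* 1 = sweep active                              */
        int cache_ahead;      /* 1 = cache ahead active                        */
        int recycle;          /* 1 = recycling active                          */
        int recycle_urgent;   /* 1 = urgent recycling factor, 0 = normal       */
        int read_ahead;       /* 1 = read ahead active                         */
    };

    /* Apply the global overrides of Table CM2 on top of the drive's rules.
     * Only the "all on", "all off", and "urgent" entries override the drive;
     * everything else remains under drive control.                           */
    void apply_global_override(enum global_mode_t gm, struct activity_rules *r)
    {
        switch (gm) {
        case GLOBAL_NORMAL:                /* drive control throughout  */
            break;
        case GLOBAL_URGENT:
            r->sweep = 1;                  /* sweep: all on             */
            r->recycle_urgent = 1;         /* recycle: urgent           */
            break;
        case GLOBAL_SATURATED:
            r->sweep = 1;                  /* sweep: all on             */
            r->cache_ahead = 0;            /* cache-ahead: all off      */
            r->recycle = 0;                /* recycle: all off          */
            break;                         /* read-ahead: drive control */
        }
    }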
CACHE CHAIN STATUSES
There are two types of cache chain statuses, global cache status and a drive cache status for each drive. The purpose of the statuses is to help to manage the cache activities, such as to provide plateaus for amounts of cache assigned to each drive under the varying balance of workloads among the drives. The global cache status is based on the number of cache bins in the global cache. Each drive's cache status is based on the number of bins in each drive's cache chain. While there could be any reasonable number of cache statuses, for purposes of this discussion, there will be assumed to be four; these are given names of minimal, marginal, normal, and excess. In the relationships between the statuses, excess is considered above,
or higher than, normal; normal is considered above, or higher than, marginal; and marginal is considered above, or higher than, minimal.
One of the functions of the statuses is to facilitate the reallocation of cache from one drive to another as different drives become the most active of the drives. As a drive becomes the target of many host I/O's in a short period of time, the cache assigned to that drive will be drawn from the other drives in an orderly fashion. The other drive which is the least active of those in the highest status will give up cache first.
When its status reaches a lower level, cache from the next least active drive will be drawn off until that drive reaches that same lower level .
At that point, the global cache will be drawn down to the same status level. This sequence continues until the currently active drive needs no more cache, has reached the top of its normal status range, or other conditions dictate that the active drive can no longer steal cache from elsewhere. The effect of this procedure is to allow a busy drive to use a large portion of the cache, but at the same time, to allow each of the other drives to maintain some amount of its most useful cached data.
An illustration of a drive cache with respect to drive cache chain statuses is given in Figure 4. An illustration of the global cache with respect to global cache chain statuses is given in Figure 6.
The sizes of the categories of cache chains on which the statuses are based are also shown in Figures 4 and 6. These sizes are chosen such that they enable the described device to act in an efficient manner based on the current cache conditions. Figure 57 illustrates the growth and recession of the cache allocations using the three plateaus as described herein. Figure 58 illustrates the cache allocations with an
assumed five plateau configuration, a logical extension of the described concepts.
Whenever any cache activity involves the modification of a drive or global cache chain, the related cache statuses are reset .
Each cache chain status defines a condition or situation under which the corresponding component is operating and how that component interacts with other components.
The minimum status is the one in which the component cannot under any condition give up a cache bin. For the global cache, this status generally defines the number of cache bins required to handle one host I/O of the largest acceptable size. For a drive, this status generally defines the number of cache bins required to maintain the cache chains intact . Assuming more than one drive is configured mto the described device, not all components can simultaneously be in the minimal status except during a period of a large number of writes by the host .
The marginal cache status is the one in which the component has sufficient cache bins available to operate but is not operating in the optimal manner. Assuming more than one drive is configured mto the described device, not all components can simultaneously be in the marginal status except during a period of a large number of writes by the host . The marginal cache status defines the smallest the cache chain for a given drive may become when the described device is operating m the generally normal fashion. In other words, each drive will usually have at least the marginal amount of cache protected from depletion by the needs of other drives.
The normal cache status is the one which the device logic desires to maintain the component for best overall device performance. A very active drive will generally operate with a number of cache bins hovering m the neighborhood of the upper
limit of the normal status. A very inactive drive will generally operate with a number of cache bins hovering m the neighborhood of the lower limit of the normal status.
The excess cache status is the one in which the component has more than the desired maximum cache bins assigned to it for optimal overall device performance. The global cache chain will begin operation in this status when the device is powered up. As various drives become active, the global status will move into the normal status. A drive will not likely ever be in the excess status. The primary purpose of the excess status is to delineate the upper bound of the normal status. This is important to maintaining the balance of the caches assigned to the various drives under changing workloads.
DRIVE CACHE CHAIN STATUS DETERMINATION
The cache chain for each drive is always in one of the defined drive cache chain statuses. Table CS1 shows the rules for setting the drive cache statuses.
IF DRIVE CACHE CHAIN SIZE (dc) IS     THEN DRIVE STATUS IS
dc = CFG-DSMARGB                      minimal
CFG-DSMARGB < dc < CFG-DSNORMB        marginal
CFG-DSNORMB <= dc < CFG-DSEXCESB      normal
CFG-DSEXCESB <= dc                    excess

TABLE CS1  DRIVE CACHE CHAIN STATUSES
GLOBAL CACHE CHAIN STATUS DETERMINATION
The global cache chain is always in one of the defined global cache chain statuses. Table CS2 shows the rules for setting the global cache status.
IF GLOBAL CACHE CHAIN SIZE (gc) IS    THEN GLOBAL STATUS IS
gc = CFG-GSMARGB                      minimal
CFG-GSMARGB < gc < CFG-GSNORMB        marginal
CFG-GSNORMB <= gc < CFG-GSEXCESB      normal
CFG-GSEXCESB <= gc                    excess

TABLE CS2  GLOBAL CACHE CHAIN STATUSES
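The status rules in Tables CS1 and CS2 reduce to threshold comparisons against the configured plateau boundaries. The following C fragment is an illustrative sketch only; the structure and function names are hypothetical, and the threshold fields stand in for the CFG-DSMARGB, CFG-DSNORMB, and CFG-DSEXCESB parameters (or their CFG-GS counterparts for the global chain).

/* Illustrative sketch of the rules in Tables CS1 and CS2; names are hypothetical. */
enum chain_status { STATUS_MINIMAL, STATUS_MARGINAL, STATUS_NORMAL, STATUS_EXCESS };

struct chain_thresholds {            /* assumed to be loaded from the CFG table     */
    unsigned marg_bins;              /* CFG-DSMARGB  (or CFG-GSMARGB for global)    */
    unsigned norm_bins;              /* CFG-DSNORMB  (or CFG-GSNORMB)               */
    unsigned exces_bins;             /* CFG-DSEXCESB (or CFG-GSEXCESB)              */
};

/* The same rule applies to a drive chain and to the global chain; only the
 * thresholds differ.  The tables give "size equals the marginal boundary" for
 * the minimal status; "<=" is used here so smaller sizes are also treated as
 * minimal (an assumption, since the chain should never shrink below it). */
enum chain_status chain_status_of(unsigned chain_size, const struct chain_thresholds *t)
{
    if (chain_size <= t->marg_bins)  return STATUS_MINIMAL;
    if (chain_size <  t->norm_bins)  return STATUS_MARGINAL;
    if (chain_size <  t->exces_bins) return STATUS_NORMAL;
    return STATUS_EXCESS;
}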
FINDING A CACHE BIN TO REUSE
When a cache bin is required to cache data for any reason, the global and drive cache chain statuses interact to determine where to find the cache bin to be reused. Specifically, when a cache bin is required for use with respect to a given drive, the drive may acquire the bin via one of three types of actions. See Figure 11.
1. Steal a cache bin from the global cache chain, with or without compensation to the global cache chain from some other drive. This is the preferred method for obtaining a cache bin.
2. Buy a cache bin from the global cache chain; this is the method to be used when the cache chain status of the acquiring drive is at the edge of, or in, the excess status.
3. Beg for a bin from any drive cache chain; this is the last resort and will only occur when a large number of cache bins are in the modified pool.
A complete set of the possible status conditions and the corresponding actions required to acquire a cache bin for reuse is given in Tables CS3 and CS4.
CHAIN      CHAIN      BEST LEAST     ACTION TO   STEAL, BUY,    PROBABLE
STATUS     STATUS     ACTIVE DRIVE   BE TAKEN    OR BEG FROM    SITUATION
minimal    minimal    minimal        beg         anywhere       saturated
minimal    minimal    marginal       steal       least active   saturated
minimal    minimal    normal         steal       least active   saturated
minimal    minimal    excess         steal       least active   unlikely
minimal    marginal   minimal        buy         global         saturated
minimal    marginal   marginal       steal       least active   saturated
minimal    marginal   normal         steal       least active   unlikely
minimal    marginal   excess         steal       least active   unlikely
minimal    normal     minimal        buy         global         saturated
minimal    normal     marginal       steal       least active   unlikely
minimal    normal     normal         steal       least active   unlikely
minimal    normal     excess         steal       least active   unlikely
minimal    excess     n/a            buy         global         unlikely
marginal   minimal    minimal        steal       global         saturated
marginal   minimal    marginal       steal       least active   saturated
marginal   minimal    normal         steal       least active   saturated
marginal   minimal    excess         steal       least active   unlikely
marginal   marginal   minimal        buy         global         saturated
marginal   marginal   marginal       steal       least active   saturated
marginal   marginal   normal         steal       least active   unlikely
marginal   marginal   excess         steal       least active   unlikely
marginal   normal     minimal        buy         global         saturated
marginal   normal     marginal       steal       least active   saturated
marginal   normal     normal         steal       least active   unlikely
marginal   normal     excess         steal       least active   unlikely
marginal   excess     n/a            buy         global         unlikely
normal     minimal    minimal        steal       global         saturated
normal     minimal    marginal       steal       global         saturated
normal     minimal    normal         steal       least active   unlikely
normal     minimal    excess         steal       least active   unlikely
normal     marginal   minimal        steal       global         saturated
normal     marginal   marginal       steal       global         saturated
normal     marginal   normal         steal       least active   expected
normal     marginal   excess         steal       least active   unlikely
normal     normal     minimal        steal       global         unlikely
normal     normal     marginal       steal       global         expected
normal     normal     normal         steal       least active   expected
normal     normal     excess         steal       least active   unlikely
normal     excess     n/a            buy         global         unlikely
excess     minimal    n/a            steal       global         startup
excess     marginal   n/a            steal       global         startup
excess     normal     n/a            steal       global         startup
excess     excess     n/a            buy         global         unlikely

TABLE CS3  COMBINATIONS OF STATUSES AND ACTIONS
CHAIN      CHAIN      BEST LEAST     ACTION TO   STEAL, BUY,    PROBABLE
STATUS     STATUS     ACTIVE DRIVE   BE TAKEN    OR BEG FROM    SITUATION
minimal    minimal    minimal        beg         anywhere       saturated
minimal    marginal   minimal        buy         global         saturated
minimal    normal     minimal        buy         global         saturated
minimal    excess     n/a            buy         global         unlikely
marginal   marginal   minimal        buy         global         saturated
marginal   normal     minimal        buy         global         saturated
marginal   excess     n/a            buy         global         unlikely
normal     excess     n/a            buy         global         unlikely
normal     excess     n/a            buy         global         unlikely
marginal   minimal    minimal        steal       global         saturated
normal     minimal    minimal        steal       global         saturated
normal     minimal    marginal       steal       global         saturated
normal     marginal   minimal        steal       global         saturated
normal     marginal   marginal       steal       global         saturated
normal     normal     minimal        steal       global         unlikely
normal     normal     marginal       steal       global         expected
excess     minimal    n/a            steal       global         startup
excess     marginal   n/a            steal       global         startup
excess     normal     n/a            steal       global         startup
minimal    minimal    marginal       steal       least active   saturated
minimal    minimal    normal         steal       least active   saturated
minimal    minimal    excess         steal       least active   unlikely
minimal    marginal   marginal       steal       least active   saturated
minimal    marginal   normal         steal       least active   unlikely
minimal    marginal   excess         steal       least active   unlikely
minimal    normal     marginal       steal       least active   unlikely
minimal    normal     normal         steal       least active   unlikely
minimal    normal     excess         steal       least active   unlikely
marginal   minimal    marginal       steal       least active   saturated
marginal   minimal    normal         steal       least active   saturated
marginal   minimal    excess         steal       least active   unlikely
marginal   marginal   marginal       steal       least active   saturated
marginal   marginal   normal         steal       least active   unlikely
marginal   marginal   excess         steal       least active   unlikely
marginal   normal     marginal       steal       least active   saturated
marginal   normal     normal         steal       least active   unlikely
marginal   normal     excess         steal       least active   unlikely
normal     minimal    normal         steal       least active   unlikely
normal     minimal    excess         steal       least active   unlikely
normal     marginal   normal         steal       least active   expected
normal     marginal   excess         steal       least active   unlikely
normal     normal     normal         steal       least active   expected
normal     normal     excess         steal       least active   unlikely

TABLE CS4  ACTIONS BASED ON STATUSES AND ACTIONS
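Tables CS3 and CS4 enumerate every combination explicitly; the net effect can be condensed into a short decision routine. The sketch below is one possible reading of those tables, not the authoritative rule set, and it reuses the hypothetical chain_status enum (with its minimal < marginal < normal < excess ordering) from the status sketch above.

/* One condensed reading of Tables CS3/CS4: choose how the requesting drive
 * acquires a cache bin.  "best_la" is the status of the best, least active
 * other drive (treat it as STATUS_MINIMAL if there is no other drive). */
enum acquire_action { ACQ_STEAL_LEAST_ACTIVE, ACQ_STEAL_GLOBAL, ACQ_BUY_GLOBAL, ACQ_BEG };

enum acquire_action choose_acquire_action(enum chain_status global,
                                          enum chain_status drive,
                                          enum chain_status best_la)
{
    if (drive == STATUS_EXCESS)                 /* drive already has too much      */
        return ACQ_BUY_GLOBAL;
    if (global == STATUS_EXCESS)                /* startup: global still has excess */
        return ACQ_STEAL_GLOBAL;
    if (best_la >= STATUS_NORMAL)               /* steal with compensation          */
        return ACQ_STEAL_LEAST_ACTIVE;
    if (best_la == STATUS_MARGINAL)
        return (global >= STATUS_NORMAL) ? ACQ_STEAL_GLOBAL       /* no compensation */
                                         : ACQ_STEAL_LEAST_ACTIVE;
    /* The best, least active drive is itself minimal. */
    if (global >= STATUS_NORMAL)
        return ACQ_STEAL_GLOBAL;
    if (global == STATUS_MARGINAL && drive == STATUS_MINIMAL)
        return ACQ_STEAL_GLOBAL;
    if (drive > STATUS_MINIMAL)
        return ACQ_BUY_GLOBAL;
    return ACQ_BEG;                             /* everything minimal: saturated    */
}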
STEALING A CACHE BIN TO REUSE
Stealing is the preferred method for a drive to obtain a cache bin when it needs one for any purpose, whether for a read of any nature or for a write. A drive which needs a cache bin to reuse may, depending on the cache chain statuses, "steal" a bin from the global cache chain. Since the drive is stealing the cache bin, it need not give up a cache bin of its own; the global cache chain provides the cache bin in some manner. The data in the global LRU cache bin is decached and the cache bin is made available for the drive's use.
Depending on the global cache chain status and the cache chain statuses of the other drives, the global cache may be compensated by taking a cache bin from some other drive. The global cache will usually take a bin from the least active drive in order to maintain the global cache within the normal status.
If the drive cannot steal a cache bin, it must buy a cache bin or, as a last resort, it may beg for a cache bin.
CONDITIONS FOR STEALING A CACHE BIN WITHOUT COMPENSATION
There are two general sets of conditions under which a drive will steal a cache bin from the global cache chain without the global cache chain receiving a cache bin in compensation.
The common set of conditions which results in the stealing of a cache bin from the global cache chain without any concomitant compensation is:
1. The global cache chain is in normal status.
2. No other drive has a cache chain status better than marginal.
3. The cache chain status of the stealing drive is not excess.
The following set of conditions for stealing will generally occur only during the startup time of the device when nothing is in cache. These conditions are:
1. The global cache chain is in excess status.
2. The stealing drive cache chain is not in the excess status.
METHOD OF STEALING A CACHE BIN WITHOUT COMPENSATION
When the above set of conditions is satisfied, stealing is directly from the global cache chain. See Figure 13. The following general steps are performed:
1. The data cached in the global LRU cache bin is decached.
2. The LRU cache bin of the global cache chain is placed in the MRU position of the stealing drive's cache chain.
3. The statuses of the cache chains of the stealing drive and the global cache are reset as appropriate.
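The three steps above amount to a pair of list operations on doubly linked MRU-to-LRU chains. The C sketch below is illustrative only: the structures, field names, and the decache()/reset_chain_status() helpers are hypothetical stand-ins for the LRU-table bookkeeping described elsewhere in this document.

#include <stddef.h>

struct cache_bin {                      /* hypothetical chain node        */
    struct cache_bin *more_ru;          /* neighbour toward the MRU end   */
    struct cache_bin *less_ru;          /* neighbour toward the LRU end   */
};

struct cache_chain {
    struct cache_bin *mru, *lru;
    unsigned size;
};

/* Detach and return the LRU bin of a chain (NULL if the chain is empty). */
static struct cache_bin *chain_remove_lru(struct cache_chain *c)
{
    struct cache_bin *bin = c->lru;
    if (bin == NULL)
        return NULL;
    c->lru = bin->more_ru;
    if (c->lru != NULL) c->lru->less_ru = NULL; else c->mru = NULL;
    bin->more_ru = bin->less_ru = NULL;
    c->size--;
    return bin;
}

/* Insert a bin at the MRU position of a chain. */
static void chain_insert_mru(struct cache_chain *c, struct cache_bin *bin)
{
    bin->less_ru = c->mru;
    bin->more_ru = NULL;
    if (c->mru != NULL) c->mru->more_ru = bin; else c->lru = bin;
    c->mru = bin;
    c->size++;
}

/* Assumed elsewhere: drop the ADT/LRU/GAP references to the bin's old data,
 * and recompute a chain's status per Tables CS1/CS2. */
void decache(struct cache_bin *bin);
void reset_chain_status(struct cache_chain *c);

/* Steal without compensation (Figure 13): steps 1 through 3 above. */
struct cache_bin *steal_without_compensation(struct cache_chain *global_chain,
                                             struct cache_chain *drive_chain)
{
    struct cache_bin *bin = chain_remove_lru(global_chain);
    if (bin == NULL)
        return NULL;
    decache(bin);                         /* step 1 */
    chain_insert_mru(drive_chain, bin);   /* step 2 */
    reset_chain_status(global_chain);     /* step 3 */
    reset_chain_status(drive_chain);
    return bin;
}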
CONDITIONS FOR STEALING A CACHE BIN WITH COMPENSATION
The set of conditions for stealing a cache bin with compensation will be the normal ones encountered during the operation of the described device. These conditions are:
1. The global cache chain is not in the excess status.
2. The stealing drive cache chain is not in the excess status.
3. There exists a cache chain for some drive (other than the stealing drive) which is in normal or excess status.
METHOD OF STEALING A CACHE BIN WITH COMPENSATION
When the set of conditions is satisfied for stealing a cache bin with compensation, a bin is to be stolen from the least active cache chain which has the highest cache status. For this purpose, the global chain is considered to be the most active. See Figures 11 through 16. The following general logic is used:
1. The best, least active chain is identified.
See Figure 16.
2. If the best, least active cache chain is above marginal, that drive's LRU cache bin is stolen; that is, it is moved to the MRU position of the global chain.
3. If the best, least active cache chain is not above marginal, and the global status is normal, the LRU bin is stolen from the global chain.
4. If both the global and the best, least active cache chains are marginal, the LRU cache bin is stolen from the best, least active chain.
5. If the global chain is marginal, and the status of the drive needing a cache bin is above minimal, the drive must buy the bin from global.
6. If both the global cache chain and the best, least active chains are minimal, the status of the chain needing the bin is examined, and if it is above minimal, the chain requiring a cache bin must buy a cache bin from the global chain.
7. In all cases, the LRU cache bin of the global cache chain is placed in the MRU position of the stealing drive's cache chain.
8. The statuses of the cache chains of the stealing drive and the donor (a drive or global) are reset as appropriate.
BUYING A CACHE BIN TO REUSE
There are several sets of conditions under which a drive which needs a cache bin to reuse must "buy" a cache bin from the global cache chain.
CONDITIONS FOR BUYING A CACHE BIN
When a drive needs to buy a cache bin, the method for obtaining that cache bin depends, to some extent, on the intended usage of the cache bin.
Different rules apply in some cases depending on whether the cache bin is to be used for a host write or for some form of data read.
In order for a drive cache chain to buy a cache bin, one of the following sets of conditions must be met. If none of these sets of conditions is met, the drive cannot buy a cache bin; it must beg for a cache bin.
The first set of conditions for buying a cache bin:
1. The buying drive's cache chain must be in the excess status; the cache bin may be used for either a read or a write.
The second set of conditions for buying a cache bin is a combination of the following:
1. The global cache chain must be in the minimal or marginal status.
2. The buying drive's cache chain is not in the excess status.
3. No other drive in the least active drive list has a status better than marginal.
The third set of conditions for buying a cache bin is a combination of the following:
1. The cache bin is to be used for a read operation.
2. The global cache chain must be in the minimal status.
3. The buying drive's cache chain is in the minimal status.
4. No other drive in the least active drive list has a status better than marginal.
METHOD OF BUYING A CACHE BIN
When the conditions are satisfied for buying a cache bin, the following general steps are performed. See Figure 12.
1. The buying drive's LRU cache bin is rechained to the MRU position of the global cache chain.
2. The global LRU cache bin is rechained to the MRU position of the cache chain of the drive requiring a cache bin.
3. The global and drive cache statuses are reset to reflect the moved bins.
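A buy is the same chain manipulation performed in both directions, so that neither chain changes size. The sketch below reuses the hypothetical chain helpers introduced in the stealing sketch above and is likewise illustrative only; the assumption that the received bin's old contents are decached before reuse follows the decaching discussion later in this section.

/* Buying a cache bin (Figure 12), sketched with the hypothetical helpers above.
 * The buyer pays with its own LRU bin (which keeps its cached data and simply
 * ages in the global chain) and receives the global LRU bin in exchange. */
struct cache_bin *buy_bin(struct cache_chain *global_chain,
                          struct cache_chain *drive_chain)
{
    struct cache_bin *payment = chain_remove_lru(drive_chain);
    struct cache_bin *bought  = chain_remove_lru(global_chain);

    if (payment != NULL)
        chain_insert_mru(global_chain, payment);   /* step 1 */
    if (bought != NULL) {
        decache(bought);                           /* old contents dropped (assumed) */
        chain_insert_mru(drive_chain, bought);     /* step 2 */
    }
    reset_chain_status(global_chain);              /* step 3 */
    reset_chain_status(drive_chain);
    return bought;
}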
BEGGING FOR A CACHE BIN TO REUSE
If a drive needs a cache bin to use for a host write and is unable to steal or buy one, it is forced to "beg" for a bin from another drive's cache chain.
In this case, neither the given drive nor the global cache chain can afford to give up a cache bin. The other drives are canvassed in order based on the least active drive chain. Any drive's cache chain which can afford to do so must give its LRU cache bin to the global cache chain where it is placed in the MRU position. The global cache chain then gives to the requesting drive its LRU cache bin. See Figures 15 and 16.
The data in the global LRU cache bin is decached and that cache bin is made available for the drive's use. If no drive is found to be able to donate a cache bin, the system is overloaded with modified bins, i.e., in a saturated state. In this condition, the management of all drives is actively trying to write modified bins to the corresponding drives. A drive begging for a bin must wait until a cache bin becomes available from some drive's cache, even from its own.
DECACHE DATA FROM A CACHE BIN
Whenever a cache bin is needed for caching some currently uncached data, some previously cached bin or bins of data must be decached. When the data in a cache bin is decached, the references to the corresponding drive bin in the ADT table, LRU table, and GAP table, if any, are updated to show that the drive bin whose data was cached in the given cache bin is no longer cached. A bin that is a candidate for decaching will never have any references in the MODIFIED BINS table, since that would indicate the host has written some data into this cache bin which has not yet been written to the drive.
In any case, such a cache bin would be in the modified pool, and not in the global cache chain. In every case, the cache bin to be decached will be the cache bin currently at the LRU position of the global cache chain. This generally will be a cache bin whose data has not
been referenced by the host in a relatively long time compared to the reference times for the other cache bins. The cache bin chosen is not necessarily the one with the absolutely longest time since access; due to the dynamic and ever-changing assignment of the described device's cache into a global cache chain and multiple private cache chains for each drive, there may be some cache bin assigned to a drive other than the one currently involved in the reason for the decaching which has been cached but unreferenced for a longer period of time. This exception is intentional; it is part of the design which is intended to prevent activity related to any one drive from creating excessive interference with the caching performance of data for another drive, and it enhances the effectiveness of the described caching scheme. The primary condition which must be satisfied in order for data in a cache bin to be decached is that the bin must be inactive; that is, it is not at this instant the subject of any host or background activity. It is highly unlikely that the global LRU bin would have any activity, since most activities would reposition it out of the global cache chain.
It is possible that the global LRU bin is marked for recycling; if so, the recycling of that bin will be ignored. The recycling module would already have recycled the bin if conditions of time, statuses, and modes had permitted. The flexibility of the described design deliberately allows the recycling to be ignored when doing so is in the best interests of the overall device performance.
FILLING GAPS IN MODIFIED BINS
When data is written from the host into cache, there exists the possibility that a gap, or hole, will be created between previously modified, but as yet unwritten to disk, data and the new data. A gap is created when data destined for more
than one portion of a disk bin are cached in a single cache bin, the cached portions of data are both in the modified or dirty condition, and the cached portions are not contiguous. This is dealt with by bookkeeping in the LRU and GAP tables. It is generally desirable to eliminate gaps as much as possible since they complicate the process of determining cache hits.
There are several possibilities for the relationship of the location of new data sent from the host with respect to previously cached data. The new data's locality may relate to the localities of previously cached data in a variety of ways. The new data may share no cache bins with the old, in which case no gaps in the modified data will occur. If the new data does share cache bins with previously cached data, it may share the cache bins with old data in several ways. If the new data is contiguous to, or overlapping, the old modified, cached data, no gaps can exist. If all the previously cached data is clean, no gap is allowed to be created, since some or all of the old data may be logically decached to avoid the gap. The only situation in which a gap in modified, cached data can occur is when data written from the host into cache happens to share a cache bin which contains data previously written from the host into cache, such data not yet written to the disk, and the new and old data do not occupy contiguous or overlapping locations. When such a condition arises, the described device makes entries in the LRU table to indicate the hole exists, and makes an entry in the GAP table describing the hole. Note that gaps do not have any direct relationship to the existence of clean, cached data in the relevant area of the cache bin; only dirty data can create a gap. In actuality, only the first and/or last bins involved in any given data transfer from the host to cache can possibly create gaps; all interior bins must be full bins, and thus cannot create gaps. A special procedure in the background
activity will periodically examine the GAP table; if one or more gaps exist, a background operation will be set in motion to eliminate the gap. See Figures 39 and 51.
A gap may be eliminated in either of two ways, the method being selected to be the most beneficial to the overall operation of the device. The decision of which method to use in eliminating a gap is based on the relative sizes of the dirty data segments, the current drive mode, and the global mode. If the modes indicate that there is a relatively large amount of modified data to be written from cache to disk, the decision will be to write data to eliminate the gap. This could be considered an out-of-order sweep event. In the alternative method for eliminating a gap, the data that should be located in the intervening space may be read from the disk into cache and marked dirty even though it is not dirty; the gap will then no longer exist. This method is chosen when the drive and global modes are both normal, and the ratio of the gap size to the size of the smaller of the adjacent cached areas is less than a predetermined value. In the second method for eliminating the gap, the data occupying the modified areas within the cache bin may be written from cache to disk; in this case the gap is eliminated by cleaning the dirty pieces of data and then decaching one of the cached areas. The data to be decached is selected based on the relative sizes and directions within the cache bin of the cached portions so as to retain in cache the data more likely to be useful. This is usually the larger of the two segments, but may, under some circumstances, be the smaller.
Regardless of the method chosen to eliminate a gap, or gaps, in the modified data within a given cache bin, the process involves several steps. See Figures 39, 51, and 50.
1. The GAP table is updated to show a gap elimination is in progress on a given cache bin.
2. The LRU table is updated to show a gap elimination is in progress on a given cache bin.
3. A read from, or a write to, a disk drive is initiated.
4. The device manager continues to handle other tasks, both for the host and background.
5. When the drive signals completion of the read or write via an interrupt, the gap elimination module completes the LRU table and GAP table bookkeeping required to eliminate the gap.
GAP-FILL DEFINITIONS/CONSIDERATIONS:
1. In general, if the gap is small with respect to the cached amounts preceding and following the gap, eliminate the gap by reading from disk into cache; otherwise, write the modified portions to disk and decache the smaller of the two cached portions.
2. A gap is an area within a bin, its size being the number of sectors between the end of the modified data area preceding the gap and the beginning of the modified data area following the gap.
3. gapsize is the number of sectors in the gap.
4. forward-size is the size of the cached portion within the cache bin which is contiguous to the forward-most sector of the gap.
5. backward-size is the size of the cached portion within the cache bin which is contiguous to the rearward-most sector of the gap. 6. forward-ratio is gapsize divided by forward-size.
7. backward-ratio is gapsize divided by backward-size.
8. gapwrite is a preset ratio of the gapsize to forward-size or backward-size, which ratio, when exceeded, causes the gap to be eliminated by:
1. writing out cached data adjacent to one end of the gap; and
2. decaching data adjacent to the gap to eliminate the gap.
9. The following table shows the conditions and actions that result from the various conditions.
GLOBAL      DRIVE       BACKWARD-    FORWARD-     ACTION
MODE        MODE        RATIO        RATIO
saturated   --          --           --           write modified data to disk
urgent      --          --           --           write modified data to disk
--          saturated   --           --           write modified data to disk
--          urgent      --           --           write modified data to disk
--          sweep       --           --           write modified data to disk
--          timeout     --           --           write modified data to disk
normal      normal      >gapwrite    --           write modified data to disk
normal      normal      <gapwrite    >gapwrite    write modified data to disk
normal      normal      <gapwrite    <gapwrite    read data from disk into cache

Examples:
                        Example 1    Example 2
global mode             normal       normal
drive mode              normal       normal
gapwrite                4            4
backward cached size    8            16
backward ratio          5.0          2.5
forward cached size     8            16
forward ratio           5.0          2.5
action                  write        read
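The ratio tests above can be expressed compactly. The following C sketch is illustrative only; the mode inputs are simplified to a single flag (any saturated, urgent, sweep, or timeout mode forces the write method), and the sizes are sector counts as defined in the considerations above.

#include <stdbool.h>

enum gap_action { GAP_WRITE_MODIFIED_TO_DISK, GAP_READ_FROM_DISK_INTO_CACHE };

/* gapsize, forward_size, backward_size are in sectors and are nonzero by the
 * definition of a gap; gapwrite is the preset ratio threshold from the CFG table. */
enum gap_action choose_gap_action(unsigned gapsize, unsigned forward_size,
                                  unsigned backward_size, double gapwrite,
                                  bool global_and_drive_modes_normal)
{
    double forward_ratio  = (double)gapsize / (double)forward_size;
    double backward_ratio = (double)gapsize / (double)backward_size;

    if (!global_and_drive_modes_normal)
        return GAP_WRITE_MODIFIED_TO_DISK;
    if (backward_ratio > gapwrite || forward_ratio > gapwrite)
        return GAP_WRITE_MODIFIED_TO_DISK;
    return GAP_READ_FROM_DISK_INTO_CACHE;
}

The worked examples above are consistent with this sketch: with gapwrite of 4, cached areas of 8 sectors on each side of the gap give ratios of 5.0 and the write method, while cached areas of 16 sectors give ratios of 2.5 and the read method (the stated ratios imply a gap of about 40 sectors).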
CACHE-AHEAD MANAGEMENT
One of the goals of the present invention is to have in cache, at any given time, that data which is expected to be accessed by the host in the near future. Much of the data retained in cache has been placed there either directly by the host or has been read from the disk as a direct result of a host activity. In addition to that data, it is desirable to
anticipate future host requests for data to be read from the device, and to prefetch that data from the disk into cache. There are several aspects to the prefetching of data. Some of these have a positive effect and some have a negative effect on performance. On the positive side, successful prefetching into cache of data that is actually requested by the host precludes read cache misses. This improves the overall device's performance. On the negative side, prefetching a bin of data into cache requires that the data in some currently cached bin be dropped from the cache. Additionally, while data is being prefetched from disk into cache, the disk and the path from the cache to it are busy. If a host request is received for data not in cache and not in the process of being fetched from disk, some delay will be encountered while the path and disk become free for use in satisfying the cache miss. The present invention uses a unique methodology to strike a favorable balance and maintain a high overall performance.
Operations which write data from the host to the device need no cache-ahead operations since data can always be accepted from the host into the cache's SSD. Because a read cache-ahead action is a background type of activity, and only uses the private channel between disk and cache, it will have a minimal negative impact on the caching device's response time to host I/O activity. To further limit the impact, the cache-ahead is given a lower priority than any incoming host I/O request. As shown in Figures 29 and 42, the controller checks for the desirability of a cache-ahead after every host I/O which is a read operation, regardless of whether the I/O was a cache hit or a cache miss. A major factor in limiting the cache-ahead activity is the lack of need for its operation following most host I/O's. As depicted in Figure 8, the described caching device determines the number of data segments of the same size
as the current host I/O which remain between the location of the end of the current host I/O data and each end of the cached bin containing that data. If this computed number of data segments is more than a predetermined number, the cache unit can handle that number of sequential, contiguous host read I/O's within that cache bin before there is a need to fetch data for the succeeding bin from the disk into the cache memory. If, on the other hand, the computed number of data segments is not more than the predetermined number, it is possible for the host to access all those segments between the end of the current host I/O data location and the end of the cached bin in the same or less time than it would take for the caching device to fetch the succeeding bin of data from the disk into the cache memory. In this case, the caching device will attempt to initiate action to fetch the data from the succeeding disk drive bin so that the service to the host can proceed with the least disk-imposed delays. Conversely, if the caching device were to ignore the above-described locality factor and always fetch the next data bin after every cache read-miss, many unneeded bins of data would be fetched from disk into cache memory. Such excessive fetches would use up more of the caching device's resources, with a negative impact on the cache device's host service time.
There are only two types of candidates for cache-ahead: the disk bin(s) immediately following that involved in the host I/O and the disk bin(s) immediately preceding that of the host I/O. Since these bins will often have already been cached by previous cache-ahead activity, the cache-ahead activity is largely a self-limiting process. While a logical extension of the described methodology would cache-ahead more than one bin, in the described embodiment only one bin is cached-ahead for any given host I/O: the bin succeeding the host I/O is the primary candidate. If it is not already cached, and the proximity
factor indicates the cache-ahead should occur, the forward bin is cached at this time. If the succeeding bin is already cached, the bin preceding the host I/O is considered; if it is not already cached, and the proximity factor favors caching, the data from the preceding bin is cached at this time. Of course, if both of these candidate bins had been cached previously, the cache-ahead module has no need to do any caching. A very important benefit accrues from this cache-ahead, cache-back feature. If related bins are going to be accessed by the host in a sequential mode, that sequence will be either in a forward or backward direction from the first one accessed in a given disk area. By the nature of the cache-ahead algorithm, an unproductive cache-ahead will only involve the one bin which lies in the wrong direction from the initial bin in any given bin cluster. This, coupled with the proximity algorithm, makes the cache-ahead behavior self-adapting to the direction of the accesses.
Regardless of the desirability of initiating a prefetch of data from disk, some other factors must be considered. The present invention takes into account the current global and drive modes. If the conditions are such that the most urgent need of the device is to write modified data from cache to disk, any scheduled prefetch will be held in abeyance in order to allow the background sweep to fully utilize the resources for writing data to the disk. Much of the time in which the sweep is actively writing modified data to the disk, the urgency of such writing is not great. In this case, the background sweep and prefetches for a given drive alternately use the resources. See Figure 37. Of course, when the background sweep is inactive, any scheduled prefetches can proceed without concern for the sweep. Likewise, when there are no cache-ahead events scheduled, the sweep can proceed to use the resources as needed. When all conditions for initiating a
cache-ahead operation are satisfied, the cache-ahead proceeds as shown in Figure 34. Once the read from disk to cache is initiated, it proceeds without further supervision by the firmware until the read has completed and an interrupt is generated by the drive.
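A minimal sketch of the proximity test described above, under the assumption that positions and sizes are measured in sectors within the cache bin; the parameter names and the threshold are hypothetical, and only the forward direction is shown (the backward check is symmetric).

#include <stdbool.h>

/* io_end_offset: sector offset of the end of the current host I/O within its
 * cache bin; bin_size: sectors per bin; io_size: sectors in the host I/O;
 * threshold: the predetermined segment count mentioned above. */
bool should_cache_ahead(unsigned io_end_offset, unsigned bin_size,
                        unsigned io_size, unsigned threshold)
{
    if (io_size == 0 || io_end_offset > bin_size)
        return false;
    unsigned remaining_segments = (bin_size - io_end_offset) / io_size;
    /* Few enough same-sized segments remain before the end of this bin that
     * the host could reach it before a fetch of the succeeding bin would
     * complete: schedule the prefetch now. */
    return remaining_segments <= threshold;
}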
In another logical extension of the cache-ahead logic, the cache management can do a cache-ahead in two steps. The first step would entail a seek to position the read-write head of the disk drive to the proper disk track. If the drive were needed for servicing a higher priority operation, the cache-ahead could be aborted at this point. If no such operation existed at the end of the seek, the cache management would proceed to read the data from disk into cache to complete the cache-ahead operation.
CACHE-AHEAD TERMINATION
When a read from disk to cache completes, the disk will generate an interrupt. See Figure 50. The cache-ahead termination module will then update the control tables to show that the subject disk bin's data is now in a cache bin. The cache bin is rechained as the MRU bin for the drive. See Figure 49.
LOCKING CACHE BINS FOR SIMULTANEOUS OPERATIONS
The present invention is designed to handle operations between the host and the cache and between the disks and cache simultaneously. To preserve data integrity, a system of cache bin locks is incorporated to ensure that no data is overwritten or decached while it is the object of some kind of I/O activity.
For most operations, no bins will have prior locks, and there is no conflict. However, if the same cache bin is involved in two simultaneous I/O operations, it is absolutely essential that care be given to the sequence of the operations in order to preserve data integrity.
The following discussion and the Simultaneous Operations Tables LB1 through LB5 present the basic logic which must be followed. For clarity in the description of the overall operations of the present invention, these rules for simultaneity are not always reflected in the logic diagrams; their absence from the diagrams is not meant to imply that these rules do not apply.
The cache bin lock flags are included in the LRU table; see the section on the LRU format for a description of those flags. At any given time, a bin may be locked for more than one reason; all locks must be considered in determining whether simultaneous operations may proceed. The most restrictive lock prevails. If a cache bin is found locked by a background task, there is no problem, since background tasks can be delayed. If a host I/O request involves a locked cache bin, there can be one of three results based on the lock flags of a bin: the new operation may proceed, the new operation may be delayed (queued for later completion), or, in rare cases, it may proceed using an alternate cache bin. The following notes discuss the various considerations and are referenced in the tables describing the handling of potential conflicts.
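The outcome of such a conflict is one of the three results just described. The sketch below shows the table-driven form such a check can take, using the "write hit on a clean bin" rows of Table LB2 as the example; the enums and the function are illustrative only, and combinations the table marks impossible are not distinguished here.

/* Illustrative lock-resolution sketch; the real rules are Tables LB1-LB5. */
enum lock_kind   { LOCK_NONE, LOCK_CACHE_AHEAD, LOCK_READ_AHEAD, LOCK_READ_FETCH,
                   LOCK_SWEEP, LOCK_GAP_READ, LOCK_GAP_WRITE, LOCK_HOST };
enum lock_result { OP_PROCEED, OP_WAIT, OP_USE_ALTERNATE_BIN };

/* Mirrors the "write hit / clean" rows of Table LB2. */
enum lock_result resolve_host_write_hit_clean(enum lock_kind lock)
{
    switch (lock) {
    case LOCK_NONE:        return OP_PROCEED;
    case LOCK_CACHE_AHEAD:                    /* clean prefetch bins may be abandoned (GNC) */
    case LOCK_READ_AHEAD:  return OP_USE_ALTERNATE_BIN;
    case LOCK_READ_FETCH:  return OP_WAIT;    /* an active host read needs the data (RFD)   */
    default:               return OP_WAIT;    /* queue behind the earlier operation (GNB)   */
    }
}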
Not all of the following notes are referenced in the tables; those notes not referenced are included for their explanatory value with respect to the rest of the notes. For some cases, there are more applicable notes than are specified in the tables.
LOCK FLAG NOTES
GENERAL NOTES
GNA. A background operation usually will not be initiated if its target cache bin is currently locked for any reason.
GNB. Multiple operations involving a given cache bin will be handled in the order received or initiated, all other conditions being equal . This is especially important when one or both operations are modifying the data in the cache bin. GNC. Use of a newly assigned, alternate cache bin for a host I/O results in the decaching of the cache bin currently assigned to the disk bin. Only clean bins can be decached.
Bins that are the subjects of cache aheads or read aheads may be abandoned at the end of those operations if so doing will contribute to the overall performance of the present invention.
Any abandoned cache bins are made available for immediate reuse.
GND. A cache bin that contains any modified sectors is considered dirty, and resides in the pool of modified bins.
GNE. Simultaneous, identical operations on the same cache bin are generally redundant and are, for the purposes of the following tables, considered impossible. READ HIT NOTES
RHA. A read hit refers to the existence of both the cache bin and the currently valid (cached) sectors in that cache bin.
RHB. Gaps are taken into account when determining read hits.
RHC. A locked bin effectively causes a read hit to be handled as a read miss since a locked bin will delay fulfilling the host request.
RHD. A read hit is handled immediately without disconnecting from the host. Therefore, another host command cannot occur for the same drive (and, thus, the same cache or disk bin) until this immediate command is completed. READ MISS NOTES
RMA. A read miss may result in a read fetch, or in both a read fetch and a read ahead. The LRU table is updated for
valid sectors at the time each read fetch and each read ahead is completed.
RMB. Gaps are taken into account when determining read misses.
WRITE HIT NOTES
WHA. A write hit refers to the cache bin only; the existence or absence of valid sectors is not considered for the write hit/miss determination.
WHB. A host write immediately marks the sectors being written from the host to the cache as modified in the LRU table, makes the target cache bin dirty, and removes the cache bin from its cache chain (placing the cache bin in the pool of modified bins). A host write does not mark the sectors being written from the host to the cache as valid in the LRU table until the operation is completed.
WHC. A host write may modify the currently valid sectors in cache bins, may extend the valid area, create a gap, or do some combination of these.
WHD. A write hit is handled immediately without disconnecting from the host. Therefore, another host command cannot occur for the same drive (and, thus, the same cache or disk bin) until this immediate command is completed.
WRITE MISS NOTES
WMA. A write miss for any currently assigned cache bin, clean or dirty, is a contradiction of terms and cannot occur.
WMB. A host write immediately marks the sectors being written from the host to the cache as modified in the LRU table, makes the target cache bin dirty, and removes the cache bin from its cache chain. A host write does not mark the sectors being written from the host to the cache as valid in the LRU table until the operation is completed.
WMC. A host write may modify the currently valid sectors in cache bins, may extend the valid area, create a gap, or do some combination of these.
WMD. A write miss is nearly always handled immediately without disconnecting from the host; in this discussion, it will be assumed that all write misses are handled without disconnecting. Therefore, another host command cannot occur for the same drive (and, thus, the same cache or disk bin) until this immediate command is completed.
READ FETCH NOTES
RFA. A read fetch reads data from the disk into cache. A read fetch occurs only as a result of a read miss, and the primary purpose of the fetch is to satisfy the direct requirements of the read miss.
RFB. A read fetch uses an assigned cache bin but does not mark the sectors being read from disk into cache as valid in the LRU table until the disk read (fetch) operation is completed.
RFC. A read fetch occupies the cache to disk I/O path and precludes other, simultaneous operations requiring that same path.
RFD. A bin locked for a read fetch cannot be decached to accommodate use of an alternate cache bin, since an active host read is waiting for the data being fetched, which was resident in the present invention prior to this new host write.
READ AHEAD NOTES
RAA. A read ahead refers to the reading of data from disk into cache of the portion of the data in the disk bin succeeding the sectors covered by the read fetch which satisfied the read miss which triggered the read ahead.
RAB. A read ahead uses an assigned cache bin but does not mark the sectors being read from disk into cache as valid in the LRU table until the disk read operation is completed.
RAC. A read ahead bin may be abandoned if it is clean, and a subsequent, overlapping (in time) host I/O operation can be handled more quickly by assigning and using another cache bin instead of the read ahead bin. In this case, the bin that is the subject of a read ahead will be abandoned prior to or at the end of that operation, and the abandoned cache bin made available for immediate reuse.
RAD. A read ahead occupies the cache to disk I/O path and precludes other, simultaneous operations requiring that same path.
CACHE AHEAD NOTES
CAA. A cache ahead is the result of the proximity logic determining that the data in a disk bin adjacent to data currently residing in a cache bin should be read from disk into cache.
CAB. A cache ahead assigns a cache bin to a disk bin; the cache bin is always clean and is immediately placed in the cache chain of the owning drive, but no sectors will be marked valid in the cache bin until the cache ahead is completed.
CAC. A cache ahead bin may be abandoned if a subsequent, overlapping (in time) host I/O operation can be handled more quickly by assigning and using another cache bin instead of the cache ahead bin. In this case, the bin that is the subject of a cache ahead will be abandoned prior to or at the end of that operation, and the abandoned cache bin made available for immediate reuse.
CAD. A cache ahead occupies the cache to disk I/O path and precludes other, simultaneous operations requiring that same path.
SWEEP WRITE NOTES
SWA. A sweep write for a clean cache bin is a contradiction of terms, and cannot occur. A sweep deals only with dirty bins.
SWB. A sweep write operation does not alter any data in a cache bin, and does not change the valid area of the cache bin.
SWC. A sweep write occupies the cache to disk I/O path and precludes other operations requiring that same path.
SWD. A host write may occur on a bin locked for a sweep write; however, the bin and its (newly) modified sectors must be recorded as dirty when the sweep and host write are completed, rather than being completely clean as at the completion of an uncontested sweep write.
GAP READ NOTES
GRA. A gap can occur only in modified portions of the data in a cache bin, and thus the bin is dirty. A gap read for a clean cache bin is a contradiction of terms, and cannot occur.
GRB. A gap read is presumed to alter data in a cache bin.
GRC. A gap read occupies the cache to disk I/O path and precludes other, simultaneous operations requiring that same path.
GRD. A gap read uses an assigned cache bin but does not mark the sectors being read from disk into cache as valid in the LRU table until the disk read operation is completed.
GAP WRITE NOTES
GWA. A gap can occur only in modified portions of the data in a cache bin, and thus the bin is dirty. A gap write for a clean cache bin is a contradiction of terms, and cannot occur.
GWB. A gap write takes the sectors that are about to be decached out of the valid area when the gap write is initiated.
GWC. A gap write occupies the cache to disk I/O path and precludes other, simultaneous operations requiring that same path.
SIMULTANEOUS OPERATIONS INVOLVING HOST READS
NEW          BIN          LOCKED        ACTION/
OPERATION    CONDITION    BY            COMMENT              NOTES
read hit     clean        cache ahead   impossible           RHA,CAB
read hit     clean        read ahead    proceed              RAB
read hit     clean        read fetch    proceed              RFA,RFB
read hit     clean        sweep         impossible           SWA
read hit     clean        gap read      impossible           GRA
read hit     clean        gap write     impossible           GWA
read hit     clean        read hit      impossible           RHD
read hit     clean        read miss     proceed              RHA,RMA
read hit     clean        write hit     impossible           WHD
read hit     clean        write miss    impossible           WMB
read hit     dirty        cache ahead   impossible           CAB
read hit     dirty        read ahead    proceed              RAB
read hit     dirty        read fetch    proceed              RFA,RFB
read hit     dirty        sweep         proceed              SWB
read hit     dirty        gap read      proceed              RHB,GRD
read hit     dirty        gap write     proceed              RHB,GWB
read hit     dirty        read hit      impossible           RHD
read hit     dirty        read miss     proceed              RMA
read hit     dirty        write hit     impossible           WHD
read hit     dirty        write miss    proceed              WMD,WMB
read miss    clean        cache ahead   wait                 CAB
read miss    clean        read ahead    wait                 RAD
read miss    clean        read fetch    wait                 RFA,RFB
read miss    clean        sweep         impossible           SWA
read miss    clean        gap read      impossible           GRA
read miss    clean        gap write     impossible           GWA
read miss    clean        read hit      impossible           RHD
read miss    clean        read miss     wait                 RMA,RFC,RAD
read miss    clean        write hit     impossible           WHD
read miss    clean        write miss    impossible           WMB
read miss    dirty        cache ahead   impossible           CAB
read miss    dirty        read ahead    wait                 RAB,RAD
read miss    dirty        read fetch    wait                 RFB,RFC
read miss    dirty        sweep         wait                 SWC
read miss    dirty        gap read      wait                 GRC
read miss    dirty        gap write     wait                 GWC
read miss    dirty        read hit      impossible           RHD
read miss    dirty        read miss     wait                 GNB
read miss    dirty        write hit     impossible           WHD
read miss    dirty        write miss    impossible           WMD,WMC

TABLE LB1 - HOST READS INVOLVING LOCKED BINS
SIMULTANEOUS OPERATIONS INVOLVING HOST WRITES
NEW          BIN          LOCKED        ACTION/
OPERATION    CONDITION    BY            COMMENT              NOTES
write hit    clean        cache ahead   use alternate bin    GNC,CAC
write hit    clean        read ahead    use alternate bin    GNC,RAC
write hit    clean        read fetch    wait                 RFD,WHC
write hit    clean        sweep         impossible           SWA
write hit    clean        gap read      impossible           GRA
write hit    clean        gap write     impossible           GWA
write hit    clean        read hit      impossible           RHD
write hit    clean        read miss     wait                 WHB
write hit    clean        write hit     impossible           WHD
write hit    clean        write miss    impossible           WMD,WMC
write hit    dirty        cache ahead   impossible           CAB
write hit    dirty        read ahead    wait                 WHA,GNB
write hit    dirty        read fetch    wait                 WHA,GNB,RFD
write hit    dirty        sweep         proceed              SWB,SWD
write hit    dirty        gap read      wait                 WHC
write hit    dirty        gap write     wait                 WHC,GNB
write hit    dirty        read hit      impossible           RHD
write hit    dirty        read miss     wait                 GNB
write hit    dirty        write hit     impossible           WHD
write hit    dirty        write miss    impossible           WMD,GNB
write miss   clean        any           impossible           WMA
write miss   dirty        any           impossible           WMA

TABLE LB2 - HOST WRITES INVOLVING LOCKED BINS
SIMULTANEOUS OPERATIONS INVOLVING CACHING ACTIVITIES
NEW          BIN          LOCKED        ACTION/
OPERATION    CONDITION    BY            COMMENT              NOTES
cache ahead  clean        any           impossible           CAB
cache ahead  dirty        any           impossible           CAB
read ahead   clean        cache ahead   impossible           RAA,CAA
read ahead   clean        read ahead    impossible           GNA,RAA
read ahead   clean        read fetch    impossible           RAA,RFA
read ahead   clean        sweep         impossible           SWA
read ahead   clean        gap read      impossible           GRA
read ahead   clean        gap write     impossible           GWA
read ahead   clean        read hit      impossible           RHD,RMA
read ahead   clean        read miss     proceed              RMA
read ahead   clean        write hit     impossible           WHD,RMA
read ahead   clean        write miss    impossible           WMD,WHB
read ahead   dirty        cache ahead   impossible           CAB
read ahead   dirty        read ahead    impossible           RMA,RAA
read ahead   dirty        read fetch    impossible           RMA,RFA
read ahead   dirty        sweep         impossible           SWC,RMA,GNA
read ahead   dirty        gap read      impossible           GRC,RMA,GNA
read ahead   dirty        gap write     impossible           GWC,RMA,GNA
read ahead   dirty        read hit      impossible           RHD,RMA,RAA
read ahead   dirty        read miss     proceed              RMA
read ahead   dirty        write hit     impossible           WHD
read ahead   dirty        write miss    impossible           WMD,WMB
read fetch   clean        cache ahead   impossible           CAA,RFA
read fetch   clean        read ahead    wait                 RAD
read fetch   clean        read fetch    impossible           GNE
read fetch   clean        sweep         impossible           SWA
read fetch   clean        gap read      impossible           GRA
read fetch   clean        gap write     impossible           GWA
read fetch   clean        read hit      impossible           RHD,RMA
read fetch   clean        read miss     proceed              RMA
read fetch   clean        write hit     impossible           WHD,RMA
read fetch   clean        write miss    impossible           WMD,WMB
read fetch   dirty        cache ahead   impossible           CAB
read fetch   dirty        read ahead    wait                 RFA,RAA,RMA
read fetch   dirty        read fetch    impossible           GNE
read fetch   dirty        sweep         wait                 SWC
read fetch   dirty        gap read      wait                 GRC
read fetch   dirty        gap write     wait                 GWC
read fetch   dirty        read hit      impossible           RHD,RMA
read fetch   dirty        read miss     proceed              RMA,RFA
read fetch   dirty        write hit     impossible           WHD,RMA
read fetch   dirty        write miss    impossible           WMD

TABLE LB3 - SIMULTANEOUS OPERATIONS INVOLVING CACHING ACTIVITIES
SIMULTANEOUS OPERATIONS INVOLVING SWEEP ACTIVITIES
NEW          BIN          LOCKED        ACTION/
OPERATION    CONDITION    BY            COMMENT              NOTES
sweep        clean        any           impossible           SWA
sweep        dirty        cache ahead   impossible           CAB
sweep        dirty        read ahead    impossible           GNA
sweep        dirty        read fetch    impossible           GNA
sweep        dirty        sweep         impossible           GNE
sweep        dirty        gap read      impossible           GNA
sweep        dirty        gap write     impossible           GNA
sweep        dirty        read hit      proceed              SWB
sweep        dirty        read miss     impossible           RMA,RFB,RAD
sweep        dirty        write hit     impossible           GNA
sweep        dirty        write miss    impossible           GNA

TABLE LB4 - SIMULTANEOUS OPERATIONS INVOLVING SWEEP ACTIVITIES
SIMULTANEOUS OPERATIONS INVOLVING GAP ELIMINATION ACTIVITIES
NEW          BIN          LOCKED        ACTION/
OPERATION    CONDITION    BY            COMMENT              NOTES
gap read     clean        any           impossible           GRA
gap read     dirty        cache ahead   impossible           CAB
gap read     dirty        read ahead    impossible           GNA
gap read     dirty        read fetch    impossible           GNA
gap read     dirty        sweep         impossible           GNA
gap read     dirty        gap read      impossible           GNE
gap read     dirty        gap write     impossible           GNA
gap read     dirty        read hit      proceed              GND,RHB
gap read     dirty        read miss     wait                 GRC
gap read     dirty        write hit     impossible           GNA
gap read     dirty        write miss    impossible           GNA
gap write    clean        any           impossible           GWA
gap write    dirty        cache ahead   impossible           GNA
gap write    dirty        read ahead    impossible           GNA
gap write    dirty        read fetch    impossible           GNA
gap write    dirty        sweep         impossible           GNA
gap write    dirty        gap read      impossible           GNA
gap write    dirty        gap write     impossible           GNE
gap write    dirty        read hit      impossible           GNA
gap write    dirty        read miss     impossible           GNA
gap write    dirty        write hit     impossible           GNA
gap write    dirty        write miss    impossible           GNA

TABLE LB5 - SIMULTANEOUS OPERATIONS INVOLVING GAP ACTIVITIES
RECYCLING CACHE BINS
An important goal of the described methodology is the retention in cache, at any given moment in time, of the data which is most likely to be requested by the host in the near future. One of the mechanisms of the present invention for accomplishing this goal is the recycling of cached bins based on the recent history of usage of the data in each cache bin. Recycling, in its simplest form, is the granting of a "free ride" through the MRU-to-LRU cycle. Whenever a cache bin containing data previously accessed by the host is re-accessed by a read command from the host, information associated with that cache bin is updated in such a way as to indicate that bin's recent activity. In one embodiment, a write operation by the host does not contribute to the recycling of a bin, since a write miss is usually handled at the same speed as a hit, and, therefore, a much smaller benefit would accrue from recycling based on host write activity. It is likely the present invention's performance benefits as much or more from the availability for
reuse of cache bins whose primary activity was host writes as it would from recycling such cache bins. The recycling information is inspected whenever the cache bin reaches or nears the LRU position of the global cache chain. Normally, when a cache bin reaches the global LRU position, it is the primary candidate for decaching of its data when the cache bin is needed for caching some other data. However, in the present invention, based on the modes prevailing at that time, the LRU cache bin may be placed at the MRU position of the drive's private cache instead of being decached and reassigned. This action provides the currently cached data in that cache bin one or more "free rides" down through the private and global cache chains; in other words, that data's time in cache is increased. Of course, when a cache bin is recycled to the MRU position of the drive cache chain, the recycling information is adjusted to show that the cache bin has already been recycled and thus prevent it from perpetually being recycled. In another embodiment of the present invention, the recycling management decides whether to place the recycled cache bin at the MRU position of its drive cache chain or at the MRU position of the global cache chain.
While a number of procedures could be invoked to control the recycling operations, one specific method will be described here. The chosen methodology, as depicted in the accompanying diagrams, involves the following steps:
1. When a cache bin is accessed as a result of a cache read hit, the recycle register is increased by one. See Figure 29.
2. When a cache bin reaches or nears the global LRU position, its recycle information is inspected to determine whether the cache bin should be recycled, based on the current drive and global modes. See Figure 14.
3. If the global mode is normal and the mode of the specific drive to which the cache bin is assigned is normal, recycling takes place as follows:
3.1 The recycle register is divided by a preset factor representing the normal recycling methodology. See the recycling control parameters in the CFG table description.
3.2. If the resulting value is less than one half, the cache bin is not to be recycled.
3.3. If the resulting value is one half or more, the integer portion of the resulting value is placed in the recycle register, and the cache bin is recycled.
4. If either the global mode or the mode of the specific drive to which the cache bin is assigned is urgent, recycling is restricted to the more active cache bins, and takes place as follows:
4.1 The recycle register is divided by a preset factor based on the urgent mode. See the recycling control parameters in the CFG table description.
4.2. If the resulting value is less than one half, the cache bin is not to be recycled.
4.3. If the resulting value is one half or more, the integer portion of the resulting value is placed in the recycle register, and the cache bin is recycled.
5. If the global mode is saturated, or the mode of the specific drive to which the cache bin is assigned is saturated, no recycling takes place regardless of the recycle register value.
The effect of this procedure is to allow unused cache-ahead cache bins, cache bins whose data has not been active in the recent past, and cache bins whose data was only the subject of host write operations to move down through the cache chains at a faster rate than those which have been re-used, allowing the more useful data to remain in cache for a
longer period of time. Recycling of cache bins is an optional action. If, for some reason, such as high levels of activity or exceptionally large numbers of small, random reads by the host, a cache bin marked for recycling is better utilized for caching new data, the recycling of a given cache bin may be omitted occasionally. This will happen infrequently, but the ability to do so enhances the flexibility of the caching scheme.
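The numbered recycling steps above reduce to the small routine below. It is a sketch only: the mode enum is simplified, and the two divisor parameters stand in for the recycling control parameters in the CFG table description.

#include <stdbool.h>

enum mode { MODE_NORMAL, MODE_URGENT, MODE_SATURATED };   /* simplified mode set */

/* Returns true if the bin at (or near) the global LRU position should be
 * recycled; *recycle_reg is updated in place per steps 3.3 / 4.3. */
bool should_recycle(unsigned *recycle_reg, enum mode global_mode, enum mode drive_mode,
                    unsigned recycle_factor_normal, unsigned recycle_factor_urgent)
{
    unsigned factor;

    if (global_mode == MODE_SATURATED || drive_mode == MODE_SATURATED)
        return false;                                  /* step 5: never recycle   */
    if (global_mode == MODE_URGENT || drive_mode == MODE_URGENT)
        factor = recycle_factor_urgent;                /* step 4                  */
    else
        factor = recycle_factor_normal;                /* step 3                  */

    /* register/factor >= 1/2 is equivalent to 2*register >= factor */
    if (2 * *recycle_reg < factor)
        return false;                                  /* steps 3.2 / 4.2         */
    *recycle_reg = *recycle_reg / factor;              /* integer portion kept    */
    return true;                                       /* steps 3.3 / 4.3         */
}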
BACKGROUND SWEEP MANAGEMENT
When a write I/O from the host is serviced by the controller of the described storage device, the data from the host is placed in cache and the cache bins are assigned to the specified disk and placed in the modified bins pool. The data will be written from the cache to its specified disk in the background, minimizing the impact of the disk operations on the time required to service the host I/O. The modules that handle this background activity are collectively known as the background sweep modules. To limit the sweep activity, and thus limit contention for the spindles, only the data in those portions of cache bins which have been modified is written from SSD to disk during a sweep. Each disk's sweep is influenced by the mode in which the drive is currently operating, and is triggered independently of the other drives' conditions except when the global cache is saturated. In the interest of maintaining the storage device's high overall performance, the background sweep modules do not usually copy data from cache to disk as soon as the data has been modified. Rather, the sweep modules remain dormant until some set of conditions justifies their operation. The background sweep for a given disk can be awakened by any of three sets of circumstances. These circumstances are:
1) The drive enters the sweep mode. A drive is placed in the sweep mode when the number of cache bins
containing modified data for that drive exceeds a preset threshold;
2) The drive enters the timeout mode. A drive is placed in timeout mode when a specified amount of time has elapsed since the data in the oldest modified cache bin was written from the host to the specified disk's cache;
3) The global cache is in saturated mode, and there is some modified data waiting to be written from this disk's cache to the disk. Global cache is placed in saturated mode when some prespecified fraction of all cache bins contains modified data.
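The three wake-up circumstances can be checked independently per drive, as in the following sketch; the structure, field names, and time units are hypothetical.

#include <stdbool.h>

struct drive_cache {
    unsigned modified_bins;        /* bins in the modified pool for this drive  */
    unsigned sweep_threshold;      /* preset count that forces sweep mode       */
    unsigned long oldest_dirty_ms; /* age of the oldest unwritten modified bin  */
    unsigned long timeout_ms;      /* maximum allowed age before a timeout      */
};

bool sweep_should_run(const struct drive_cache *d, bool global_saturated)
{
    if (global_saturated && d->modified_bins > 0)
        return true;                                   /* circumstance 3: saturation */
    if (d->modified_bins > d->sweep_threshold)
        return true;                                   /* circumstance 1: count      */
    if (d->modified_bins > 0 && d->oldest_dirty_ms >= d->timeout_ms)
        return true;                                   /* circumstance 2: timeout    */
    return false;
}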
SWEEP TRIGGERED BY COUNT
A drive is placed in sweep mode when its count of modified cache bins surpasses some preset number. In this case, the modified data from some number of cache bins will be written to the disk. The number of cache bins from which data is to be written to disk is equal to the number of cache bins containing modified data at the time the count placed the drive in sweep mode. Since more cache bins may be written into by the host while the sweep is active, this sweep may not reduce the number of modified cache bins to zero. It is important that this limitation on the number of cache bins to be written exists since, otherwise, the sweep could be caught up in a lengthy set of repetitious writes of one cache bin. This could occur should the host be in a situation in which it is constantly updating a small portion of one or a few cache bins. The fact that there may still be some modified cache bins at the end of a given sweep is not a problem; the count will place the drive back into sweep mode as soon as appropriate, even immediately if the count so dictates.
SWEEP TRIGGERED BY TIMEOUT
In order to avoid having a single modified cache bin for a given disk wait an inordinately long time before being copied from cache to disk, the drive may be placed in the timeout mode. A timeout occurs when data in a cache bin has been modified, the corresponding bin on disk has not yet been updated after a certain minimum time has elapsed, and the sweep for that disk has not been activated by the modified count. See Figure 45. When a timeout occurs for a given disk's cache, by definition there will be data in at least one cache bin which needs to be copied to disk. At this time, the given disk's cache will be placed in the timeout mode. A sweep which has been initiated by a timeout, unlike a sweep triggered by the counter, will write all modified cache bins for that disk drive to the disk before the drive sweep is terminated. Note, however, that this is a background activity, and as such, still has a lower priority than the handling of host commands.
SWEEP TRIGGERED BY SATURATION
If, for some period of time, the amount of write activity by the host to one or more drives is very heavy, the global cache may be forced into the saturated mode. In this situation, the global mode overrides the individual drive modes and conditions, and the sweep is activated for all drives for which there exist any modified cache bins. In this case, the sweep for each drive behaves as though there had been a timeout. This method of triggering all the sweeps is for the purpose of making maximum use of the simultaneous cache-to-disk operations. As soon as the global crisis is past, the drive sweep operations will revert to individual drive control.
SWEEP INITIATION
Whenever the device is not servicing an interrupt, the background activities for each drive are handled; see Figure 37.
Based on the drive and global modes, a write from cache to disk may be initiated. In order for a background write to be initiated, there must be no other ongoing activity involving the disk. See Figure 46. As a first step, the modified cache bin corresponding to the disk bin which is nearest, but not directly at, the disk read-write head position is identified. See Figure 47. If the read-write head for the drive is not currently located at the address of the disk bin corresponding to the modified bin, a seek is initiated on the drive to the identified disk bin, and the sweep takes no further action at this time. If no other activity directly involving the disk drive occurs before the sweep again is given an opportunity to write a modified cache bin, the same cache bin will be identified, and this time, the head position will be proper to continue with the write.
If the read-write head is found to be located at the address of the disk bin corresponding to the modified bin which is to be written from cache to disk, a write from cache to disk is initiated. Writing a modified bin from cache to disk is limited to handling only the modified portion or portions of the bin as defined in the LRU table and, possibly, the GAP table. Once the write from cache to disk is initiated, it proceeds without further supervision by the firmware until the write has completed and an interrupt is generated by the drive.
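The two-pass seek-then-write behavior can be illustrated with the short sketch below. It is a simplified sketch under assumed names (nearest_modified_bin, background_write_step); in particular, it does not model the refinement of excluding the bin directly under the head when choosing a new target.

```python
# Illustrative sketch only; bin numbers stand in for physical head addresses.
from typing import Iterable, Optional, Callable

def nearest_modified_bin(head_position: int, modified_bins: Iterable[int]) -> Optional[int]:
    """Pick the modified disk bin closest to the current head position."""
    bins = list(modified_bins)
    return min(bins, key=lambda b: abs(b - head_position)) if bins else None

def background_write_step(head_position: int, modified_bins: Iterable[int],
                          seek: Callable[[int], None], write: Callable[[int], None]) -> Optional[int]:
    """One background pass: seek toward the chosen bin, or write it if the head is
    already there (for example, positioned by the seek issued on an earlier pass)."""
    target = nearest_modified_bin(head_position, modified_bins)
    if target is None:
        return None
    if head_position != target:
        seek(target)       # only the seek is issued this pass
    else:
        write(target)      # head already in place; start the cache-to-disk write
    return target

# Example: head at bin 100, dirty bins at 97 and 240 -> a seek to bin 97 is issued.
background_write_step(100, [97, 240],
                      seek=lambda b: print("seek to", b),
                      write=lambda b: print("write bin", b))
```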
SWEEP TERMINATION
When a write from cache to disk completes, the disk will generate an interrupt. See Figure 50. The sweep termination module will then update the control tables to show that the cache bin no longer contains modified data, and rechain the cache bin into the proper cache chain based on a number of factors. See Figure 53. Since the writing of data from cache to disk may change the drive and global modes, these are reset at this time. The effect of this method of handling background writes is to minimize the impact on the host operations. The controller has an opportunity to service host I/O misses between the background seek and the corresponding write operation. None of this has any significant effect on servicing host I/O cache hits, since hits are always handled immediately; the disk is not involved in a hit.
HOST-COMMAND MANAGEMENT
Whenever a command is received from the host computer, it is given the highest possible priority. To determine what actions are required, the command must be analyzed. A portion of the firmware is dedicated to that purpose (see description of host command analysis) . The analysis of the command will determine the type of command (read, write, seek, or other) and, where meaningful, will make a cache hit/miss determination. The analysis also sets up a list of the bins involved in the command in a table of one or more lines which will be used later in servicing the command. See Figure 28.
If the command is a read and it can be serviced entirely from cache, it is serviced by the read-hit portion of the controller (see description of read-hit handling) .
If any portion of the read cannot be serviced from cached bins, the command is considered a read cache miss, the information required to service it is queued, and the storage device disconnects from the host. See Figure 30. The background manager will note the existence of the queued task and will complete the handling of the read miss. See Figures 37, 40, and 42.
If the command is a write and all bins involved in the operation are already in cache, the command is serviced by the write-hit portion of the controller. See Figure 32 and description of write-hit handling.
If any portion of the write involves an uncached bin or bins, the command is turned over to the write-miss portion of the controller.
See Figure 33 and description of the write-miss handling. If the command is a seek and the target bin is already cached, no action is required. If the target bin is not cached, the command is considered a seek cache miss, and the information required to service it is queued. In either case, the storage device disconnects from the host in a fashion that indicates the seek has been completed, whether or not it really has been done. See Figure 31. The background manager will note the existence of the queued task and will complete the handling of the seek miss. See Figures 37, 40, and 44.
UPDATE LEAST-ACTIVE-DRIVE LIST
If the command is either a read or a write, the count of the I/O accesses for the corresponding drive is incremented, and the LAD information is adjusted. See Figure 22. If this current I/O makes the corresponding disk more active, i.e., its access count is greater than that for the drive above it in the LAD list, this drive is rechained upward in the LAD list. Additionally, the total disk access count is incremented. If this makes the total accesses reach the LAD maximum, the total accesses field is reset to zero, and the access count for each drive is recalculated to be its current value divided by the LAD adjustment factor. This overall LAD procedure is designed to temporarily favor the most active drive(s) but not allow one huge burst of activity by one drive to dominate the cache management for an overly long period of time.
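The LAD bookkeeping just described can be sketched as follows. This is illustrative only; the class and names (LadList, record_io) are assumptions, and the rechaining of the LAD list is simplified to a sort.

```python
# Illustrative sketch of the least-active-drive (LAD) bookkeeping.
LAD_MAX = 32_768      # corresponds to CFG-LAD-MAX in the example configuration
LAD_ADJUST = 16       # corresponds to CFG-LAD-ADJUST

class LadList:
    def __init__(self, n_drives: int):
        self.counts = [0] * n_drives
        self.total = 0

    def record_io(self, drive: int) -> None:
        """Count one host read or write; periodically decay all counts so a single
        burst cannot dominate cache management for too long."""
        self.counts[drive] += 1
        self.total += 1
        if self.total >= LAD_MAX:
            self.total = 0
            self.counts = [c // LAD_ADJUST for c in self.counts]

    def least_active_order(self):
        """Drives ordered from least to most active."""
        return sorted(range(len(self.counts)), key=lambda d: self.counts[d])

if __name__ == "__main__":
    lad = LadList(4)
    for _ in range(14_000):
        lad.record_io(0)
    for _ in range(4_000):
        lad.record_io(3)
    print(lad.least_active_order())   # [1, 2, 3, 0]
```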
ANALYZE HOST I/O COMMAND
The analysis of a host command includes creation of a bin address list (BAL) which contains the locations of each bin involved in the operation (see description of bin address list setup). For each such bin, the list will contain the bin's current location in cache, if it already resides there, or where it will reside in cache after this command and the related caching activity have been completed. In the case that a bin is not already cached, the cache space into which it will be placed is located, and the bin currently resident in that space is decached. The analysis includes setting the cache hit/miss flag so that the controller logic can be expedited. See Figure 19.
SET UP BIN ADDRESS LIST
The controller segment which sets up the bin address list uses the I/O sector address and size to determine the disk bin identifying numbers for each bin involved in the I/O operation, as described in the section below. See Figure 20. The number of bins involved is also determined, and for each, the portion of the bin which is involved in the operation is calculated.
ADDRESS TRANSLATION
A sector address can be converted into a bin address by dividing it by the bin size. The quotient will be the bin number, and the remainder will be the offset into the bin where the sector resides. See Figure 21.
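The translation, and the bin-address-list style breakdown of an I/O built on top of it, can be shown with a short worked example. The function names below are illustrative assumptions; the 64-sector bin size comes from the example configuration table later in this document.

```python
BIN_SIZE = 64   # sectors per bin in the example configuration

def sector_to_bin(sector_address: int):
    """Translate a host sector address into (disk bin number, offset within bin)."""
    return divmod(sector_address, BIN_SIZE)

def bins_for_io(start_sector: int, sector_count: int):
    """For each disk bin touched by an I/O, report the first and last sector of
    that bin involved in the transfer (the portion of the bin involved)."""
    result = []
    sector, remaining = start_sector, sector_count
    while remaining > 0:
        bin_no, offset = sector_to_bin(sector)
        span = min(BIN_SIZE - offset, remaining)
        result.append((bin_no, offset, offset + span - 1))
        sector += span
        remaining -= span
    return result

# A 100-sector read starting at sector 1000 touches bins 15, 16, and 17:
# bin 15 sectors 40..63, bin 16 sectors 0..63, bin 17 sectors 0..11.
print(bins_for_io(1000, 100))
```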
CACHE HIT/MISS DETERMINATION - READ
Each line of the bin address list is inspected, and if, for a given disk bin, a corresponding cache bin is shown in the ADT table to exist, that information is copied into the corresponding line of the BAL. In addition, the valid-sectors information in the LRU table for the bin is compared with the sectors required for the current I/O.
If any sectors are not in cache, the cache-miss marker is set.
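A minimal sketch of the read hit/miss test, assuming the valid-sectors information is held as a simple (low, high) range per cached bin. The function and mapping names are assumptions; the example values are taken from the operating table examples later in this document.

```python
def read_is_hit(bal_lines, adt_cache_bin, lru_valid_range) -> bool:
    """bal_lines: list of (disk_bin, first_sector, last_sector) from the BAL.
    adt_cache_bin: maps disk bin -> cache bin (or None if not cached).
    lru_valid_range: maps cache bin -> (valid_low, valid_high)."""
    for disk_bin, first, last in bal_lines:
        cache_bin = adt_cache_bin.get(disk_bin)
        if cache_bin is None:
            return False                    # bin not cached at all
        low, high = lru_valid_range[cache_bin]
        if first < low or last > high:
            return False                    # some required sectors not valid in cache
    return True

adt = {11483: 549}          # disk bin 11483 of drive 0 is cached in cache bin 549
lru = {549: (0, 63)}        # the whole bin is valid
print(read_is_hit([(11483, 10, 20)], adt, lru))                  # True  -> read hit
print(read_is_hit([(11483, 10, 20), (11484, 0, 5)], adt, lru))   # False -> read miss
```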
CACHE HIT/MISS DETERMINATION - WRITE
Each line of the bin address list is inspected, and if, for a given disk bin, a corresponding cache bin is shown in the ADT table to exist, that information is copied into the corresponding line of the BAL. If any bins are not in cache, the cache-miss marker is set.
CACHE READ-HIT OPERATION
Refer to Figure 29. In order to reach this module, an I/O read command must have been received from the host. The command will have been analyzed, the bin address table will have been set up, and it will have been determined that all data required to fulfill the read command is available in cache. With this preliminary work completed, the host read command can be satisfied by using each line of the bin address table as a subcommand control. A cache read hit is satisfied entirely from the cached data without disconnecting from the host. The caching firmware executes several operations at this time:
1. The requested data must be sent to the host. Since all required portions of all affected bins are already in cache, all required data can be sent directly from the cache to the host.
2. In addition to transferring the data to the host, the affected bins will be rechained to become the most-recently-used (MRU) cache bins in the LRU cache chain for the drive. This may involve moving the cache bin(s) from the global cache chain to the specific drive's cache chain. Refer to Figure 23 for the method for rechaining the cache bin or bins involved.
3. The LRU table is updated to reflect the fact that this data has been accessed by the host; if the recycling register value for any cache bin involved in the read hit has not reached its maximum allowable value, it is increased by one to provide for the possible recycling of the cache bin when that bin reaches the LRU position of the cache chain.
4. Proximity calculations are performed to determine the desirability of scheduling a potential read-ahead of an adjacent disk bin. Refer to the discussion on cache-ahead and see Figure 8.
5. The statuses of the global cache chain and the specified drive cache chain must be updated.
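Steps 2 and 3 above can be sketched together. This is an illustrative fragment under assumed names (on_read_hit, chain, recycle); the example chain values come from the operating LRU table examples later in this document.

```python
RECYCLE_MAX = 8   # corresponds to CFG-RECYCLEM in the example configuration

def on_read_hit(chain, recycle, cache_bin):
    """chain: list of cache bins for the drive, LRU first, MRU last.
    recycle: maps cache bin -> recycle register value."""
    if cache_bin in chain:
        chain.remove(cache_bin)
    chain.append(cache_bin)                  # rechained as the MRU entry
    if recycle[cache_bin] < RECYCLE_MAX:
        recycle[cache_bin] += 1              # favors later recycling at LRU time
    return chain

chain = [373, 549, 488, 2331, 4033]          # drive 0 chain from the examples, LRU first
recycle = {b: 0 for b in chain}
recycle[549] = 2
print(on_read_hit(chain, recycle, 549), recycle[549])   # 549 moves to MRU, recycle -> 3
```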
CACHE READ-MISS OPERATION
A cache read-miss (Figure 30) is satisfied in part or wholly from the disk. In order to reach this module, an I/O read command must have been received from the host. The command will have been analyzed, the bin address table will have been set up, and it will have been determined that some or all of the data required to fulfill the read command is not available in cache. A cache read-miss is handled in several steps:
1. See Figure 30. The information required to handle the read command is saved in a read queue for later use.
2. The storage device logically disconnects from the host.
3. The background management will note the existence of the queued command and proceed to complete the read command operation. See Figures 37 and 40.
4. See Figure 42. The required uncached data for a given disk bin, which may be all or part of the data requested by the host, is retrieved from the specified disk and placed in cache bins. See Figure 43. The corresponding lines of the LRU and ADT tables are updated to reflect the presence of the newly cached data in specific cache bins. Fields of the corresponding LRU table line(s) are set to reflect the portions of those cache bins which contain data just brought into cache from disk bins.
5. The statuses of the global cache chain and the specified drive cache chain are updated.
6. As the data for each disk bin arrives in a cache bin, the storage device logically reconnects to the host if it has not already done so.
7. The operations for each specific cache bin proceed in the same general manner as for a cache read-hit.
8. If the disk-to-cache operations proceed at a faster pace than the cache-to-host operations, the transfer of data to the host proceeds without further disconnects. Otherwise, the device disconnects and reconnects as needed.
9. Steps 3 through 6 are repeated for each disk bin involved in the host request.
10. The queued information for this command is deleted from the queue.
CACHE WRITE-HIT OPERATION
A cache write hit is handled entirely within the cache. In order to reach this module of the controller, the host write command will have been analyzed and the bin address list will have been set up. See Figures 27 and 19. With this preliminary work completed, the host write command can be satisfied by using each line of the bin address list as a subcommand control. Since all affected bins are already represented in cache, all data can be sent directly from the host to the cache.
In addition to transferring the data to the cache, this module will, if the bin was linked into either the drive cache chain or the global cache chain, remove the affected cache bin from the cache chain and place it in the modified pool. In each case, the corresponding LRU table is updated to reflect any changes resulting from the existence of this new data. If the new data created any gaps in the modified portions of the bins, the GAP table is also updated accordingly in order to handle any needed post-transfer staging of partial bins.
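A minimal sketch of this write-hit bookkeeping is shown below; the names are assumptions, not the firmware's own, and the sector-range merge is deliberately naive (a real implementation would record a GAP entry instead of blindly merging disjoint ranges, as discussed in the next section).

```python
def on_write_hit(drive_chain, global_chain, modified_pool, mod_range, cache_bin,
                 first_sector, last_sector):
    """Move the bin to the modified pool and record the modified sector range."""
    if cache_bin in drive_chain:
        drive_chain.remove(cache_bin)
    elif cache_bin in global_chain:
        global_chain.remove(cache_bin)
    modified_pool.add(cache_bin)
    low, high = mod_range.get(cache_bin, (first_sector, last_sector))
    # Naive merge of the modified range; disjoint ranges would instead create a gap.
    mod_range[cache_bin] = (min(low, first_sector), max(high, last_sector))

drive_chain, global_chain = [373, 549], [110, 3330]
modified_pool, mod_range = set(), {}
on_write_hit(drive_chain, global_chain, modified_pool, mod_range, 549, 44, 63)
print(modified_pool, mod_range)   # {549} {549: (44, 63)}
```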
UPDATING THE GAP TABLE
Adding a gap reference for a given cache bin is handled in several steps. If no previous gaps exist for the cache bin, the LRUB table gap pointer item will be null. To indicate that a gap exists in the modified data of the cache bin, the pointer is set to the number of the first available line in the GAP table. The referenced GAP table line is then filled in with the information about the gap.
Adding an additional gap reference for a cache bin which had a previous, unresolved gap is handled by chaining the next available GAP table line into the front of the chain of gaps for the given cache bin. That GAP table line is then filled in with the information about the new gap.
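The gap chaining can be illustrated as follows. This is a minimal sketch assuming a free list of GAP lines; the field names loosely mirror the GAP table descriptions later in this document but are otherwise illustrative, and the example values match the single gap shown in the operating GAP table example.

```python
class GapTable:
    def __init__(self, size: int = 800):
        self.lines = [None] * (size + 1)      # 1-based; line 0 unused
        self.free = list(range(1, size + 1))  # initially every line is unused

    def add_gap(self, lru_gap_pointer, spindle, cache_bin, first_sector, last_sector):
        """Return the new head-of-chain GAP line for the bin.
        lru_gap_pointer is the bin's current gap pointer (None if no gaps yet)."""
        line = self.free.pop(0)               # first available GAP table line
        self.lines[line] = {
            "spindle": spindle, "bin": cache_bin,
            "first": first_sector, "last": last_sector,
            "next": lru_gap_pointer,          # chained in front of any prior gap
        }
        return line

gaps = GapTable()
head = gaps.add_gap(None, spindle=0, cache_bin=3254, first_sector=11, last_sector=20)
print(head, gaps.lines[head])   # 1 {'spindle': 0, 'bin': 3254, 'first': 11, 'last': 20, 'next': None}
```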
Since the writing into cache bins may change the status and modes of the drive's cache and the global cache, the drive status, the drive mode, the global status, and the global mode are all updated.
CACHE WRITE-MISS OPERATION
A write miss is usually handled entirely within the cache. In order to reach this module of the controller, the host command will have been analyzed and the bin address list will have been set up. With this preliminary work completed, the host write command can be satisfied by using each line of the bin address list as a subcommand control. Since this is a cache miss, at least one of the addressed disk bins has no related cache bin assigned to it. One of two conditions will exist: 1) the device is operating in global saturated mode, or 2) it is not operating in global saturated mode.
CACHE WRITE-MISS OPERATION WHEN NOT SATURATED
If the global cache is not operating in saturated mode, all data can be sent directly from the host to the cache without disconnecting from the host. Based on the drive and global
statuses and modes, cache bins are selected as needed and assigned to the drive. See Figures 33 and 26. As data is written into each cache bin, the bin, if not already in the modified pool, is removed from its current cache chain and placed in the modified pool, and the MOD table is updated. See Figures 24 and 25.
As in a write cache hit, the corresponding LRU table is updated to reflect any changes resulting from the existence of this new data. If the new data created any gaps in the modified portions of the bins, the GAP table is also updated accordingly in order to handle any needed post-transfer staging of partial bins. Since the writing into cache bins may change the status and modes of both the drive cache and the global cache, the drive status, the drive mode, the global status, and the global mode are all updated.
CACHE WRITE-MISS OPERATION WHEN DRIVE IS IN SATURATED MODE
If the drive cache is in the saturated mode, but the global cache is not operating in saturated mode, it is still possible to steal bins from either the global cache or from the cache assigned to other drives. See Figure 26. In this case, all data can be sent directly from the host to the cache without disconnecting from the host, and the write miss continues much as in a write cache hit . Cache bins are selected as needed and assigned to the drive currently being written to by the host. See Figure 33. As data is written into each bin, the bin, if not already in the modified pool, is removed from its current cache chain and placed in the modified pool, and the MOD table is updated. See Figures 24 and 25.
Just as in the handling of any write, the corresponding ADT, LRU and GAP tables are updated to reflect any changes resulting from the existence of this new data. Similarly, the new data may change the status and modes of both the drive and
the global cache, so the drive status, the drive mode, the global status, and the global mode are all updated.
CACHE WRITE-MISS OPERATION WHEN GLOBAL IS SATURATED
If the global cache is operating in the saturated mode, there will be a delay while the modified data in some cache bin or bins is written to a disk. In this case, the host write command is handled in several steps:
1. See Figure 33. The information required to handle the write command is saved in a write queue for later use. 2. The storage device logically disconnects from the host .
3. Since the global cache is operating in saturated mode, all drives for which there exists modified data in cache will be actively writing data from cache to disk.
4. During each loop through the background list of tasks, the cache will be inspected to determine whether a reusable cache bin is available to handle part or all of the queued write. See Figures 40 and 48.
5. When a cache bin becomes available, the device will reconnect to the host and accept data into that cache bin.
6. When data for all bins have been received from the host, the queued information for this command is deleted from the queue.
SEEK CACHE MISS HANDLING
When the background manager finds a queued seek command, it will handle it as a background task. See Figures 40 and 44. If at this time the disk bin has not been cached, the drive is not busy, and the cache modes and statuses allow it, the device will bring the data from the target disk bin into a cache bin. This caching action in response to a host seek command is based on the assumption that the host would not send a seek command unless it was intended to be followed by a read or a write
command. If conditions are not favorable to caching the target disk bin when a queued seek command is reconsidered, the seek command is ignored but left in the queue for later consideration. It will be considered periodically until the target disk bin is cached for any reason. The queued seek will not be carried out as long as the disk is busy handling host read cache misses, or if the drive is busy on background sweeps or cache-ahead actions, or if the cache modes or statuses indicate it would be intrusive to the overall caching performance to use a cache bin for the seek action.
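The condition test just described can be condensed into a small predicate. This is an illustrative sketch; the argument and mode names are assumptions rather than the firmware's actual fields.

```python
def may_cache_seek_target(target_bin_cached: bool, drive_busy: bool,
                          drive_mode: str, global_mode: str) -> bool:
    """A queued seek is only serviced when doing so will not intrude on other
    caching work; otherwise it stays queued and is reconsidered later."""
    if target_bin_cached:
        return False        # nothing to do; the bin is already in cache
    if drive_busy:
        return False        # read misses, sweeps, and cache-aheads take priority
    if drive_mode == "saturated" or global_mode == "saturated":
        return False        # using a cache bin now would hurt overall performance
    return True

print(may_cache_seek_target(False, False, "normal", "normal"))   # True
```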
If conditions are all satisfactory, a seek for the given drive is handled in the same manner as a cache-ahead, with the data from the target disk bin being placed in cache. See Figure 44. Once the read from disk to cache is initiated, it proceeds without further supervision by the firmware until the read has completed and an interrupt is generated by the drive.
SEEK TERMINATION
When the read from disk to cache completes, the disk will generate an interrupt. See Figure 50. The seek termination module will then update the control tables to show that the subject disk bin's data is now in a cache bin. The cache bin is rechained as the MRU bin for the drive. See Figure 52.
INTERNAL INTERRUPTS
While the present invention is in operation, it monitors its power source. Should the power to the unit be interrupted for any reason, the device goes through a controlled power-down sequence, initiated as depicted in Figure 54.
POWER DOWN CONTROL
As depicted in the diagram of Figure 38, this portion of the firmware is invoked when the unit senses that the voltage on the power line to it has dropped. Since some of the data in the
device may be in cache bins in a modified state and awaiting transfer to one or more of the disks, power must be maintained to the cache memory until the modified portions have been written to their respective disks. Thus, a failure of the line power causes the device to switch to the battery backup module. The battery backup module provides power while the memory device goes through an intelligent shutdown process. If the host is in the process of a data transfer with the memory device when power drops, the shutdown controller allows the transfer in progress to be completed. It then blocks any further transactions with the host from being initiated. The shutdown controller then must initiate a background sweep for each disk to copy any modified portions of cache bins from the solid state memory to the disk so that such data will not be lost when power is completely shut off to the control and memory circuits. After all sweeps are completed (which usually will take only a few seconds), all data in the solid state memory will also reside on the disks. At this point the disk spindles can be powered down, reducing the load on the battery. Most power outages are of short duration. Therefore, the controller continues to supply battery power to the control circuits and the solid state memory for some number of seconds. If the outside power is restored in this time period, the controller will power the spindles back up and switch back to outside power. In this case, operation can proceed without having to reestablish the historical data in the solid state memory. In any case, no data is at risk since it is all stored on the rotating magnetic disks before a final shutdown.
FINAL BACKGROUND SWEEP
The final background sweep copies modified portions of data in cache bins from the solid state memory to the magnetic disk. See Figure 55. There will usually be only a few such
cache bins, or portions of cache bins, to copy for each drive, since the number of cache bins that can reach this state is intentionally limited by the logical operation of the system. The final sweep makes use of logic developed for the normal timeout operation of the background sweep. The sweep is initiated in much the same manner as for a timeout during normal operation. If no cache bins for a given drive contain data which need to be copied, the sweep for that drive is left in the dormant state, and no further sweep action is required for the drive. If data currently stored in any cache bins for a drive need to be copied, the sweep control sets up and initiates a background write event for the drive. Writes for all drives are executed simultaneously until all data from modified cache bins has been written from cache to the respective drives. When no modified cache bins remain to be copied, the sweep is finished.
SERIAL PORT INTERRUPT HANDLING
The present invention includes a capability for utilizing an external terminal which can communicate with the device's executive control via a serial port. This communication facility can handle several types of activities. See Figure 56.
1. The serial port may make inquiries to obtain data about the workloads the device has been encountering, such as the numbers of I/O's by the host, the current cache condition, the history of the caching operations, etc. This information is sufficient to allow external analysis to determine, as a function of time, levels of performance, frequency of occurrence of various situations such as the background sweep, cache-aheads, and the modes of operation.
2. At the time the present invention is powered up, the serial port can be used to initiate self tests and to obtain the results thereof.
3. The serial port may, under certain circumstances, modify the configuration of the device. For example, a disk drive may be removed from, or added to, the configuration. Another example is the resetting of some of the information in the configuration table, such as the various cache management parameters.
4. The serial port may be used by the device's executive to report current operational conditions such as hardware failures or perceived problems, such as excessive disk retries during I/O's between the cache and the disks.
CONTROL TABLE EXAMPLES
For the purpose of illustration, control tables or segments of control tables are listed here. These tables and segments represent conditions which could occur at initialization and during the operation of the present invention. A very brief description of each pertinent table field is given here for convenience in interpreting the table data; see the various table format descriptions for more detailed information. Note that the asterisk (*) is used throughout to indicate a null value, and a dash (-) is used to indicate the field has no meaning in that particular circumstance.
CONFIGURATION TABLE
The CONFIGURATION table is made up of seven sections:
1. SIZING PARAMETERS AND DERIVED VALUES
2. DRIVE CACHE STATUS PARAMETERS AND DERIVED VALUES
3. GLOBAL STATUS PARAMETERS AND DERIVED VALUES
4. DRIVE MODE PARAMETERS AND DERIVED VALUES
5. GLOBAL MODE PARAMETERS
6. RECYCLING CONTROL PARAMETERS
7. DRIVE ACTIVITY PARAMETERS
The following configuration table examples set forth one possible device configuration; this configuration is used as a basis for the examples of the control tables that follow. While there are many possible valid sets of values for the configuration table, they are fixed for a given configuration, and change only if the hardware configuration or firmware is modified.
CONFIGURATION SIZING PARAMETERS AND DERIVED VALUES
ITEM          VALUE   UNITS      DESCRIPTION
CFG-SECSIZE      512  bytes      size of sectors on the disks
CFG-BINSIZE       64  sectors    size of each cache and disk bin
CFG-DRVSIZE    2,048  megabytes  capacity of each disk drive
CFG-DRIVEB    62,500  bins       capacity of each disk drive
                                 = CFG-DRVSIZE / CFG-BINSIZE
CFG-DRIVES         4  spindles   number of disk drives on device
CFG-CACHMB       256  megabytes  size of entire cache
CFG-CACHBINS   8,192  bins       size of the entire cache
                                 = CFG-CACHMB / CFG-BINSIZE

DERIVED VALUES
bin size            32,768 bytes
total disk           8,192 mb
total disk         250,000 bins
cache size             256 mb
cache size           8,192 bins
LRU table size       8,192 lines
ADT table size     250,000 lines
MOD table size      31,250 bytes (250,000 bits)
GAP table size         800 lines

TABLE TCA - CONFIGURATION SIZING PARAMETERS AND DERIVED VALUES
DRIVE CACHE STATUS PARAMETERS AND DERIVED VALUES
ITEM          VALUE   UNITS    DESCRIPTION
CFG-DSMARGB        3  bins     lower limit of drive marginal status
CFG-DSNORPCT      10  percent  lower limit, percent of all cache, drive normal status
CFG-DSNORMB      820  bins     lower limit of drive normal status
                               = CFG-DSNORPCT * CFG-CACHBINS
CFG-DSEXCPCT      50  percent  lower limit, percent of all cache, drive excess status
CFG-DSEXCESB   4,096  bins     lower limit of drive excess status
                               = CFG-CACHBINS * CFG-DSEXCPCT

DERIVED VALUES
minimum status              3 bins
marginal status           819 bins
normal status     820 - 4,096 bins
excess status   4,097 - 8,192 bins

TABLE TCB - DRIVE CACHE STATUS PARAMETERS AND DERIVED VALUES
GLOBAL STATUS PARAMETERS AND DERIVED VALUES
ITEM          VALUE   UNITS    DESCRIPTION
CFG-GSMARGB       31  bins     lower limit of global chain in marginal status
CFG-GSNORPCT      10  percent  lower limit, percent of all cache, in global normal status
CFG-GSNORMB      819  bins     lower limit of global chain in normal status
                               = CFG-GSNORPCT * CFG-CACHBINS
CFG-GSEXCPCT      50  percent  lower limit, percent of all cache, in global excess status
CFG-GSEXCESB   4,096  bins     lower limit of global chain in excess status
                               = CFG-GSEXCPCT * CFG-CACHBINS

DERIVED VALUES
minimal status             31 bins
marginal status      32 - 819 bins
normal status     820 - 4,096 bins
excess status   4,097 - 8,192 bins

TABLE TCC - GLOBAL CACHE STATUS PARAMETERS AND DERIVED VALUES
DRIVE MODE PARAMETERS AND DERIVED VALUES
ITEM          VALUE   UNITS    DESCRIPTION
CFG-DMSWEEP       16  bins     lower limit of modified bins for sweep mode
CFG-DMURGPCT      40  percent  lower limit, percent of modified bins, for urgent mode
CFG-DMURGNTB     819  bins     lower limit of modified bins for urgent mode
                               = CFG-DMURGPCT * CFG-CACHBINS / CFG-DRIVES
CFG-DMSATPCT      80  percent  lower limit, percent of modified bins, for saturated mode
CFG-DMSATURB   1,638  bins     lower limit of modified bins for saturated mode
                               = CFG-DMSATPCT * CFG-CACHBINS / CFG-DRIVES

DERIVED VALUES
normal mode          0 - 15 dirty bins
sweep mode         16 - 818 dirty bins
urgent mode      819 - 1,638 dirty bins
saturated mode 1,639 - 8,192 dirty bins

TABLE TCD - DRIVE MODE PARAMETERS AND DERIVED VALUES
GLOBAL MODE PARAMETERS

ITEM          VALUE   UNITS    DESCRIPTION
CFG-GMURGPCT      30  percent  lower limit, percent of all bins modified, for urgent mode
CFG-GMURGB     2,457  bins     lower limit of modified bins for urgent mode
                               = CFG-GMURGPCT * CFG-CACHBINS
CFG-GMSATPCT      60  percent  lower limit, percent of all bins modified, for saturated mode
CFG-GMSATURB   4,915  bins     lower limit of modified bins for saturated mode
                               = CFG-GMSATPCT * CFG-CACHBINS

DERIVED VALUES
normal mode        0 - 2,456 dirty bins
urgent mode    2,457 - 4,914 dirty bins
saturated mode 4,915 - 8,192 dirty bins

TABLE TCE - GLOBAL MODE PARAMETERS AND DERIVED VALUES
RECYCLING CONTROL PARAMETERS

ITEM          VALUE  UNITS  DESCRIPTION
CFG-RECYCLEC      8  bins   number to test to recycle per pass
CFG-RECYCLEM      8  I/O's  maximum count of accesses
CFG-RECYCLEN      2         recycle normal mode adjustment
CFG-RECYCLEU      4         recycle urgent mode adjustment

TABLE TCF - RECYCLING CONTROL PARAMETERS
DRIVE ACTIVITY PARAMETERS

ITEM            VALUE   UNITS  DESCRIPTION
CFG-LAD-MAX     32,768  I/O's  I/O count that causes count adjustment
CFG-LAD-ADJUST      16         count adjustment factor

TABLE TCG - DRIVE ACTIVITY PARAMETERS
LEAST RECENTLY USED (LRU) TABLE
The complete LRU table is made up of three sections:
1. LRU-CONTROL, unindexed counters of device activity
2. LRU-DISKS, indexed by logical spindle
3. LRU-BINS, indexed by logical disk bin
LEAST RECENTLY USED CONTROL TABLE, INITIAL
The values in the LRU table are dynamically variable; while there are many possible valid sets of values, the values in this table represent one possible set of values at the completion of the initialization when the present invention is powered on after having been powered down for any reason.
LRU-CONTROL TABLE, UNINDEXED, INITIAL
ITEM              VALUE   UNITS  DESCRIPTION
LRUC-TOTAL-MOD        0   bins   number of modified cache bins
LRUC-GLOBAL-BINS  8,184   bins   number of bins in global cache chain
LRUC-GLOBAL-LRU       0   line   oldest bin in global cache chain
LRUC-GLOBAL-MRU   8,183   line   newest bin in global cache chain

TABLE TLB - LRU TABLE DYNAMIC, UNINDEXED VALUES, INITIAL
LRU-DISKS TABLE, INDEXED BY SPINDLE NUMBER, INITIAL
There is one line in this section for each logical spindle.
FIELD            BRIEF DESCRIPTION
LRUD-BINS-CHAIN  clean cache bins assigned to spindle
LRUD-BINS-DIRTY  modified (dirty) cache bins assigned to spindle
LRUD-DISK-LRU    oldest clean bin assigned to spindle
LRUD-DISK-MRU    newest clean bin assigned to spindle

SPINDLE  BINS IN  DIRTY  LRU    MRU
NUMBER   CHAIN    BINS   BIN    BIN
0        2        0      8,190  8,191
1        2        0      8,188  8,189
2        2        0      8,186  8,187
3        2        0      8,184  8,185

TABLE TLC - LRU TABLE VALUES, INDEXED BY SPINDLE NUMBER, INITIAL
LRU-BINS TABLE, INDEXED BY CACHE BIN NUMBER, INITIAL
There is one line in this section for each cache bin.
FIELD            BRIEF DESCRIPTION
LRUB-DISK-BIN    ADT line which references this cache bin
LRUB-DISK-ID     current spindle for this cache bin
LRUB-CHAIN       flag indicating bin is in global or drive chain
LRUB-LINK-OLD    next-older cache bin for the same drive
LRUB-LINK-NEW    next-newer cache bin for the same drive
LRUB-VALID-LOW   lowest sector in bin containing valid data
LRUB-VALID-HIGH  highest sector in bin containing valid data
LRUB-MOD-LOW     lowest sector in bin containing modified data
LRUB-MOD-HIGH    highest sector in bin containing modified data
LRUB-MOD-GAP     GAP table line, if gaps exist in modified portion
LRUB-LOCKED      a set of flags indicating the cache bin is locked
LRUB-RECYCLE     indicates the desirability of recycling

CACHE  DISK  DISK  CHAIN  LINK  LINK  VALID  VALID  MOD  MOD   MOD  LOCKED  RECYCLE
BIN    ID    BIN          OLD   NEW   LOW    HIGH   LOW  HIGH  GAP  FLAGS   VALUE
0      *     *     G      *     1     *      *      *    *     *    0       0
1      *     *     G      0     2     *      *      *    *     *    0       0
2      *     *     G      1     3     *      *      *    *     *    0       0
...
8182   *     *     G      8181  8183  *      *      *    *     *    0       0
8183   *     *     G      8182  *     *      *      *    *     *    0       0
8184   3     *     D      *     8185  *      *      *    *     *    0       0
8185   3     *     D      8184  *     *      *      *    *     *    0       0
8186   2     *     D      *     8187  *      *      *    *     *    0       0
8187   2     *     D      8186  *     *      *      *    *     *    0       0
8188   1     *     D      *     8189  *      *      *    *     *    0       0
8189   1     *     D      8188  *     *      *      *    *     *    0       0
8190   0     *     D      *     8191  *      *      *    *     *    0       0
8191   0     *     D      8190  *     *      *      *    *     *    0       0

TABLE TLD - LRU TABLE VALUES, INDEXED BY CACHE BIN NUMBER, INITIAL
LEAST RECENTLY USED CONTROL TABLE, OPERATING
The values in the following table are dynamically variable; while there are many possible valid sets of values, the values in this table represent one possible set of values at a given point in time in the operation of the present invention.
LRU-CONTROL TABLE, UNINDEXED DYNAMIC, OPERATING

ITEM              VALUE  UNITS  DESCRIPTION
LRUC-TOTAL-MOD       40  bins   number of modified cache bins
LRUC-GLOBAL-BINS   1002  bins   number of global cache chain bins
LRUC-GLOBAL-LRU     110  line   oldest bin in global cache chain
LRUC-GLOBAL-MRU    2883  line   newest bin in global cache chain

TABLE TLE - LRU TABLE UNINDEXED VALUES, OPERATING
LRU-DISKS TABLE, INDEXED BY SPINDLE NUMBER, OPERATING
There is one line in this section for each logical spindle. The values are dynamically variable, depending on the specific implementation of the present invention. While there are many possible valid sets of values, the values in this table represent one possible set of values at a given point in time in the operation of the present invention.

SPINDLE  BINS IN  DIRTY  LRU   MRU
NUMBER   CHAIN    BINS   BIN   BIN
0        1022     60     373   4033
1        769      4      1090  3365
2        931      0      2652  2911
3        1002     2      3828  1920

TABLE TLF - LRU TABLE VALUES, INDEXED BY SPINDLE NUMBER, OPERATING
LRU-BINS TABLE, INDEXED BY CACHE BIN NUMBER, OPERATING
There is one line in this section for each cache bin. The values are dynamically variable; there are many possible valid sets of values depending on the specific implementation of the present invention, its configuration, and its work load prior to the capture of these LRU table values. The values in this table represent a small sample of one possible set of values at a given point in time in the operation of the present invention. Only a few selected LRU lines are shown.

CACHE  DISK  DISK   CHAIN  LINK  LINK  VALID  VALID  MOD  MOD   MOD  LOCKED  RECYCLE
BIN    ID    BIN           OLD   NEW   LOW    HIGH   LOW  HIGH  GAP  FLAGS   VALUE
49     1     881    D      2134  3365  0      63     *    *     *    0       2
110    1     588    G      *     3330  0      63     *    *     *    0       0
373    0     11484  D      *     549   0      63     *    *     *    0       3
488    0     966    D      549   733   21     63     *    *     *    0       0
549    0     11483  D      373   488   0      63     *    *     *    0       2
1090   1     12101  D      *     8034  0      63     *    *     *    0       4
1115   1     36213  -      *     *     0      63     44   63    *    0       0
1219   3     7244   -      *     *     0      63     0    63    *    0       0
1228   1     25422  -      *     *     0      63     23   42    *    0       0
1818   0     8275   G      3330  500   0      63     *    *     *    0       0
1920   3     22031  D      6336  *     0      63     *    *     *    0       1
2245   0     27433  -      *     *     24     33     24   33    *    0       0
2331   0     44     D      1886  4033  38     63     *    *     *    0       0
2652   2     23763  D      *     2881  0      63     *    *     *    0       0
2881   2     3994   D      2652  693   0      63     *    *     *    0       0
2883   3     5454   G      7277  *     45     63     *    *     *    0       0
2911   2     13022  D      4093  *     0      63     *    *     *    0       0
3254   0     17211  -      *     *     0      63     0    63    1    0       0
3330   0     8274   G      110   1818  0      63     *    *     *    0       0
3365   1     14476  D      49    *     0      63     *    *     *    0       1
3828   3     24939  D      *     7745  0      63     *    *     *    0       0
4033   0     15144  D      2331  *     0      63     *    *     *    0       0
4092   2     18102  D      925   4093  0      63     *    *     *    0       0
4093   2     18103  D      4092  2911  0      63     *    *     *    0       0
4213   3     7245   -      *     *     0      63     0    63    *    0       0
6336   3     22030  D      1844  1920  0      63     *    *     *    0       1
7277   2     44     G      6290  2883  0      63     *    *     *    0       0
7500   1     12100  D      8034  97    0      63     *    *     *    0       3
7745   3     24940  D      3828  7782  0      63     *    *     *    0       0
8034   1     388    D      1090  7500  32     63     *    *     *    0       2

TABLE TLG - LRU TABLE LINES IN CACHE BIN SEQUENCE, OPERATING
LRU-BINS TABLE, IN CACHE CHAIN SEQUENCE, OPERATING
The same LRU table lines from table TLG are shown below, reordered into their cache chain sequences for illustration purposes.

CACHE  DISK  DISK   CHAIN  LINK  LINK  VALID  VALID  MOD  MOD   MOD  LOCKED  RECYCLE
BIN    ID    BIN           OLD   NEW   LOW    HIGH   LOW  HIGH  GAP  FLAGS   VALUE
110    1     588    G      *     3330  0      63     *    *     *    0       0
3330   0     8274   G      110   1818  0      63     *    *     *    0       0
1818   0     8275   G      3330  500   0      63     *    *     *    0       0
7277   2     44     G      6290  2883  0      63     *    *     *    0       0
2883   3     5454   G      7277  *     45     63     *    *     *    0       0

373    0     11484  D      *     549   0      63     *    *     *    0       3
549    0     11483  D      373   488   0      63     *    *     *    0       2
488    0     966    D      549   733   21     63     *    *     *    0       0
2331   0     44     D      1886  4033  38     63     *    *     *    0       0
4033   0     15144  D      2331  *     0      63     *    *     *    0       0
2245   0     27433  -      *     *     24     33     24   33    *    0       0
3254   0     17211  -      *     *     0      63     0    63    1    0       0

1090   1     12101  D      *     8034  0      63     *    *     *    0       4
8034   1     388    D      1090  7500  32     63     *    *     *    0       2
7500   1     12100  D      8034  97    0      63     *    *     *    0       3
49     1     881    D      2134  3365  0      63     *    *     *    0       2
3365   1     14476  D      49    *     0      63     *    *     *    0       1
1115   1     36213  -      *     *     0      63     44   63    *    0       0
1228   1     25422  -      *     *     0      63     23   42    *    0       0

2652   2     23763  D      *     2881  0      63     *    *     *    0       0
2881   2     3994   D      2652  693   0      63     *    *     *    0       0
4092   2     18102  D      925   4093  0      63     *    *     *    0       0
4093   2     18103  D      4092  2911  0      63     *    *     *    0       0
2911   2     13022  D      4093  *     0      63     *    *     *    0       0

3828   3     24939  D      *     7745  0      63     *    *     *    0       0
7745   3     24940  D      3828  7782  0      63     *    *     *    0       0
6336   3     22030  D      1844  1920  0      63     *    *     *    0       1
1920   3     22031  D      6336  *     0      63     *    *     *    0       1
1219   3     7244   -      *     *     0      63     0    63    *    0       0
4213   3     7245   -      *     *     0      63     0    63    *    0       0

TABLE TLH - LRU TABLE LINES IN CHAIN SEQUENCE, OPERATING
ADDRESS TRANSLATION (ADT) TABLE
The complete ADT table is made up of three sections:
1. ADT-CONTROL, unindexed counters of device activity
2. ADT-DISKS, indexed by logical spindle
3. ADT-BINS, indexed by logical disk bin
ADDRESS TRANSLATION (ADT) TABLE, INITIAL
The values in the following table are dynamically variable; these fields are used primarily as counters of the device activities. The values in this table represent the set of values at the completion of the initialization when the present invention is powered on after having been powered down for any reason.
ADT-CONTROL TABLE, UNINDEXED, INITIAL
ITEM           VALUE  UNITS     DESCRIPTION
ADTC-ACCESSES      0  accesses  number of host accesses to device
ADTC-READS         0  accesses  number of host reads to device
ADTC-WRITES        0  accesses  number of host writes to device

TABLE TAB - ADT TABLE UNINDEXED VALUES, INITIAL
ADT-DISKS TABLE, INDEXED BY SPINDLE NUMBER, INITIAL
There is one line in this section for each logical spindle, and each line is referenced by logical spindle number. During the operation of the device, the values are dynamically variable. Depending on the specific implementation of the present invention, there are many possible valid sets of values. Some of the values in this table are arbitrarily set during initialization. This table represents one possible set of values at the completion of the initialization when the present invention is powered on after having been powered down for any reason.
FIELD               BRIEF DESCRIPTION
ADTD-LINE-BEG       first ADT-BINS line for referenced spindle
ADTD-HEAD-POS       current position of read/write head of spindle
ADTD-SWEEP-DIR      current direction of sweep
ADTD-DISK-ACCESSES  number of host accesses since last reset
ADTD-DISK-READS     number of host read accesses since last reset
ADTD-DISK-WRITES    number of host write accesses since last reset
ADTD-LAD-USAGE      function based on number of host accesses
ADTD-LINK-MORE      chain pointer to more active drive in LAD list
ADTD-LINK-LESS      chain pointer to less active drive in LAD list

SPINDLE  FIRST    HEAD      SWEEP      HOST
NUMBER   LINE     POSITION  DIRECTION  ACCESSES  READS  WRITES
0              0    32,250  UP         0         0      0
1         62,500    94,750  UP         0         0      0
2        125,000   127,250  UP         0         0      0
3        187,500   189,750  UP         0         0      0

SPINDLE  DISK   LINK  LINK
NUMBER   USAGE  MORE  LESS
0        0      *     2
1        0      1     3
2        0      2     4
3        0      3     *

TABLE TAC - ADT TABLE VALUES, INDEXED BY SPINDLE NUMBER, INITIAL
ADT-BINS TABLE, INDEXED BY DISK BIN NUMBER, INITIAL
There is one line in this tabular portion of the ADT table for each logical disk bin of all spindles combined.
FIELD              BRIEF DESCRIPTION
ADTB-CACHE-BIN     logical cache bin containing disk bin data
ADTB-BIN-ACCESSES  number of host accesses since last reset
ADTB-BIN-READS     number of host read accesses since last reset
ADTB-BIN-WRITES    number of host write accesses since last reset

ADT      (DISK   (DISK  CACHE  HOST
LINE     BIN)    NBR)   BIN    ACCESSES  READS  WRITES
0        0       0      *      0         0      0
1        1       0      *      0         0      0
...
62499    62499   0      *      0         0      0
62500    0       1      *      0         0      0
62501    1       1      *      0         0      0
...
124999   62499   1      *      0         0      0
125000   0       2      *      0         0      0
125001   1       2      *      0         0      0
...
187499   62499   2      *      0         0      0
187500   0       3      *      0         0      0
187501   1       3      *      0         0      0
...
249999   62499   3      *      0         0      0

TABLE TAD - ADT TABLE VALUES, INDEXED BY DISK BIN NUMBER, INITIAL
ADDRESS TRANSLATION (ADT) TABLE, OPERATING
The values in this table represent one possible set of ADT values at a given point in time in the operation of the present invention.
ADT-CONTROL TABLE, UNINDEXED DYNAMIC, OPERATING
ADTC-ACCESSES = 10,000,000  number of host accesses to device
ADTC-READS    =  8,000,000  number of host reads to device
ADTC-WRITES   =  2,000,000  number of host writes to device

TABLE TAE - ADT TABLE VALUES, UNINDEXED, OPERATING
ADT-DISKS TABLE, INDEXED BY SPINDLE NUMBER, OPERATING
The values in this table represent a sample of one possible set of ADT values at a given point in time in the operation of the present invention.
SPINDLE  FIRST   HEAD      SWEEP      HOST
NUMBER   LINE    POSITION  DIRECTION  ACCESSES   READS      WRITES
0             0    15,625  UP         3,500,000  2,800,000  700,000
1        62,500    46,875  UP         2,500,000  2,000,000  500,000
2        62,500    78,125  DOWN       3,000,000  2,400,000  600,000
3        93,750   109,375  UP         1,000,000    800,000  200,000

SPINDLE  DISK    LINK  LINK
NUMBER   USAGE   MORE  LESS
0        14,000  2     1
1         7,000  0     3
2        18,000  *     0
3         4,000  1     *

TABLE TAF - ADT TABLE VALUES, INDEXED BY SPINDLE NUMBER, OPERATING
ADT-BINS TABLE, INDEXED BY DISK BIN NUMBER, OPERATING
The values in this table represent a sample of one possible set of ADT values at a given point in time in the operation of the present invention. The line numbers, disk bin numbers, and disk numbers are implicit and are not in the ADT table, but they are included here for clarity.
ADT      (DISK   (DISK  CACHE  HOST
LINE     BIN)    NBR)   BIN    ACCESSES  READS  WRITES  COMMENT
44       44      0      2331   1         0
966      966     0      488    1         0
8274     8274    0      3330   1         0      1
8275     8275    0      1818   1         0      1
11483    11483   0      549    12        6
11484    11484   0      373    5         3      2
15144    15144   0      4033   2         1
17211    17211   0      3254   4         2              gap
27433    27433   0      2245   2         1
62888    388     1      8034   3         0
63088    588     1      110    1         0      1
63381    881     1      49     3         0
74600    12100   1      7500   4         0
74601    12101   1      1090   5         0
63976    14476   1      3365   2         0
87922    25422   1      1228   1         0      1
98713    36213   1      1115   1         0      1
125044   44      2      7277   1         0
128994   3994    2      2881   1         0
138022   13022   2      2911   1         0
143102   18102   2      4092   1         0
143103   18103   2      4093   1         0
148763   23763   2      2652   1         0
192954   5454    3      2683   2         1
194744   7244    3      1219   1         0      1
194745   7245    3      4213   1         0      1
209530   22030   3      6336   4         2
209531   22031   3      1920   3         2
212439   24939   3      3828   1         0      1
212440   24940   3      7745   1         0      1

TABLE TAG - ADT TABLE VALUES, INDEXED BY DISK BIN NUMBER, OPERATING
GAP TABLE
The complete GAP table is made up of three sections:
1. GAP-CONTROL, unindexed counts and pointer to first unused line
2. GAP-DISKS, indexed by logical spindle
3. GAP-GAPS, indexed by gap number
GAP TABLE, INITIAL
The values in the GAP table are dynamically variable; since no gaps can exist at power-up time, this table will contain zeros and null values at the completion of the initialization when the present invention is powered on after having been powered down for any reason.
GAP-CONTROL TABLE, UNINDEXED, INITIAL
ITEM               VALUE  UNITS  DESCRIPTION
GAP-GAPS               0  gaps   total gaps, all logical spindles
GAP-UNUSED-FIRST       1  line   first unused line in GAP table
GAP-UNUSED-NUMBER    800  lines  number of unused lines in GAP table

TABLE TGB - GAP TABLE DYNAMIC, UNINDEXED VALUES, INITIAL
GAP-CONTROL TABLE, INDEXED BY DRIVE INITIAL
There is one line in this section for each logical spindle currently incorporated in the present invention. Since no gaps can exist at power-up time, this table will contain all zeros at the completion of the initialization when the present invention is powered on after having been powered down for any reason.
FIELD        BRIEF DESCRIPTION
GAPD-NUMBER  number of gaps for this logical spindle
GAPD-FIRST   pointer to line of first gap for this drive
GAPD-LAST    pointer to line of last gap for this drive

SPINDLE  NUMBER   FIRST  LAST
NUMBER   OF GAPS  GAP    GAP
0        0        *      *
1        0        *      *
2        0        *      *
3        0        *      *

TABLE TGC - GAP TABLE VALUES, INDEXED BY SPINDLE NUMBER, INITIAL
GAP-CONTROL TABLE, INDEXED BY GAP, INITIAL
There is one line in this section for each cache bin which currently contains modified cached data and in which a gap exists in that modified data. Since no gaps can exist at power-up time, this table is essentially null at the completion of the initialization when the present invention is powered on after having been powered down for any reason.
FIELD            BRIEF DESCRIPTION
GAPG-DISK        identity of logical spindle for this gap; null if line is currently unused
GAPG-BIN         cache bin number in which this gap exists
GAPG-SECTOR-BEG  first sector number that contains non-valid data
GAPG-SECTOR-END  last sector number that contains non-valid data
GAPG-PREV        pointer to next gap in data in same bin, or to previous unused GAP table line
GAPG-NEXT        pointer to previous gap in data in same bin, or to next unused GAP table line

GAP     SPINDLE  BIN     FIRST   LAST    PREVIOUS  NEXT
NUMBER  ID       NUMBER  SECTOR  SECTOR  GAP       GAP
...
800     *        *       *       *       799       *

TABLE TGD - GAP TABLE VALUES, GAP INDEX, INITIAL
GAP TABLE, OPERATING
Since gaps in cached, modified data are only created as the result of very unusual circumstances, there will be very few, if any, in existence at any given time. For illustration purposes, one gap is shown in this snapshot of the operating GAP table.
GAP-CONTROL TABLE, UNINDEXED, OPERATING
ITEM               VALUE  UNITS  DESCRIPTION
GAP-GAPS               1  gaps   total gaps, all logical spindles
GAP-UNUSED-FIRST       2  line   first unused line in GAP table
GAP-UNUSED-NUMBER    799  lines  number of unused lines in GAP table

TABLE TGE - GAP TABLE DYNAMIC, UNINDEXED VALUES, OPERATING
GAP-CONTROL TABLE, INDEXED BY SPINDLE, OPERATING

SPINDLE  NUMBER   FIRST  LAST
NUMBER   OF GAPS  GAP    GAP
0        1        1      1
1        0        *      *
2        0        *      *
3        0        *      *

TABLE TGF - GAP TABLE VALUES, INDEXED BY SPINDLE NUMBER, OPERATING
GAP-CONTROL TABLE, INDEXED BY GAP, OPERATING
GAP     SPINDLE  BIN     FIRST   LAST    PREVIOUS  NEXT
NUMBER  ID       NUMBER  SECTOR  SECTOR  GAP       GAP
1       0        3254    11      20      *         *
2       *        *       *       *       *         3
...
800     *        *       *       *       799       *

TABLE TGG - GAP TABLE VALUES, GAP INDEX, OPERATING
MODIFIED BINS (MOD) TABLE
There is one section in the MOD table; its index is a function of disk identifying number and disk bin number. The index is calculated as a constant based on the number of disk bins per spindle, plus the disk bin number divided by the word size, which in this example configuration is 16 bits. The base for the section of the MOD table for each spindle is the number of disk bins per spindle divided by the word size (in the examples, 16), always rounded up, times the spindle identifying number. Thus, in our example, the MOD table section for spindle zero starts at an index of zero, for spindle 1 starts at 3907, spindle 2 starts at 7814, and spindle 3 starts at 11721.
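The index arithmetic can be checked with a short worked example using the example configuration (62,500 disk bins per spindle, 16-bit words). The function name is an illustrative assumption.

```python
BINS_PER_SPINDLE = 62_500
WORD_SIZE = 16
WORDS_PER_SPINDLE = -(-BINS_PER_SPINDLE // WORD_SIZE)   # rounded up -> 3907

def mod_index(spindle: int, disk_bin: int):
    """Return (word index into the MOD table, bit position within that word)."""
    base = spindle * WORDS_PER_SPINDLE
    return base + disk_bin // WORD_SIZE, disk_bin % WORD_SIZE

# Spindle sections start at 0, 3907, 7814, and 11721, as stated above.
print([s * WORDS_PER_SPINDLE for s in range(4)])
# Disk 0, bin 17211 falls in MOD table word 1075, matching the operating example below.
print(mod_index(0, 17211))   # (1075, 11)
```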
MODIFIED BINS TABLE, INITIAL

FIELD      BRIEF DESCRIPTION
MOD-FLAGS  16 one-bit flags

INDEX  MOD-FLAGS (binary)
0      0000 0000 0000 0000
1      0000 0000 0000 0000
...
3906   0000 0000 0000 0000
3907   0000 0000 0000 0000
...
15626  0000 0000 0000 0000
15627  0000 0000 0000 0000

TABLE TMD - MOD TABLE VALUES, INITIAL
MODIFIED BINS TABLE, OPERATING

FIELD      BRIEF DESCRIPTION
MOD-FLAGS  16 one-bit flags

INDEX  MOD-FLAGS (binary)
0      0000 0000 0000 0000
1      0000 0000 0000 0000
1075   0000 0000 0001 0000   (disk 0, bin 17211 is modified)
1714   0000 0000 0100 0000   (disk 0, bin 27433 is modified)
5495   0000 0000 0000 0010   (disk 1, bin 25422 is modified)
6170   0000 0100 0000 0000   (disk 1, bin 36213 is modified)
12173  0000 0000 0000 1100   (disk 3, bins 7244, 7245 are modified)
15626  0000 0000 0000 0000
15627  0000 0000 0000 0000

TABLE TMG - MOD TABLE VALUES, OPERATING
APPENDIX
SUPPORTING LOGIC FOR FIGURE 11

GLOBAL    DRIVE     LEAST     GET BIN                 BEST PROBABLE
STATUS    STATUS    ACTIVE    ACTION  FROM            SITUATION
minimal   minimal   minimal   beg     anywhere        saturated
minimal   minimal   marginal  steal   least active    saturated
minimal   minimal   normal    steal   least active    saturated
minimal   minimal   excess    steal   least active    unlikely
minimal   marginal  minimal   buy     global          saturated
minimal   marginal  marginal  steal   least active    saturated
minimal   marginal  normal    steal   least active    unlikely
minimal   marginal  excess    steal   least active    unlikely
minimal   normal    minimal   buy     global          saturated
minimal   normal    marginal  steal   least active    unlikely
minimal   normal    normal    steal   least active    unlikely
minimal   normal    excess    steal   least active    unlikely
minimal   excess    n/a       buy     global          unlikely
marginal  minimal   minimal   steal   global          saturated
marginal  minimal   marginal  steal   least active    saturated
marginal  minimal   normal    steal   least active    saturated
marginal  minimal   excess    steal   least active    unlikely
marginal  marginal  minimal   buy     global          saturated
marginal  marginal  marginal  steal   least active    saturated
marginal  marginal  normal    steal   least active    unlikely
marginal  marginal  excess    steal   least active    unlikely
marginal  normal    minimal   buy     global          saturated
marginal  normal    marginal  steal   least active    saturated
marginal  normal    normal    steal   least active    unlikely
marginal  normal    excess    steal   least active    unlikely
marginal  excess    n/a       buy     global          unlikely
normal    minimal   minimal   steal   global          saturated
normal    minimal   marginal  steal   global          saturated
normal    minimal   normal    steal   least active    unlikely
normal    minimal   excess    steal   least active    unlikely
normal    marginal  minimal   steal   global          saturated
normal    marginal  marginal  steal   global          saturated
normal    marginal  normal    steal   least active    expected
normal    marginal  excess    steal   least active    unlikely
normal    normal    minimal   steal   global          unlikely
normal    normal    marginal  steal   global          expected
normal    normal    normal    steal   least active    expected
normal    normal    excess    steal   least active    unlikely
normal    excess    n/a       buy     global          unlikely
excess    minimal   n/a       steal   global          startup
excess    marginal  n/a       steal   global          startup
excess    normal    n/a       steal   global          startup
excess    excess    n/a       buy     global          unlikely
SUPPORTING LOGIC FOR FIGURE 11

1. IF global any
   AND drive excess
   AND least active n/a
   THEN buy from global
        minimal   excess    n/a       buy    global        unlikely
        marginal  excess    n/a       buy    global        unlikely
        normal    excess    n/a       buy    global        unlikely
        excess    excess    n/a       buy    global        unlikely

2. ELSE IF global excess
   AND drive any (except excess)
   AND least active n/a
   THEN steal from global
        excess    minimal   n/a       steal  global        startup
        excess    marginal  n/a       steal  global        startup
        excess    normal    n/a       steal  global        startup

(at this point determine best least active except for current drive)

3. ELSE IF global any (except excess)
   AND drive any (except excess)
   AND least active > marginal
   THEN steal from least active
        minimal   minimal   excess    steal  least active  unlikely
        minimal   marginal  excess    steal  least active  unlikely
        minimal   normal    excess    steal  least active  unlikely
        marginal  minimal   excess    steal  least active  unlikely
        marginal  marginal  excess    steal  least active  unlikely
        marginal  normal    excess    steal  least active  unlikely
        normal    minimal   excess    steal  least active  unlikely
        normal    marginal  excess    steal  least active  unlikely
        normal    normal    excess    steal  least active  unlikely
        minimal   minimal   normal    steal  least active  saturated
        minimal   marginal  normal    steal  least active  unlikely
        minimal   normal    normal    steal  least active  unlikely
        marginal  minimal   normal    steal  least active  saturated
        marginal  marginal  normal    steal  least active  unlikely
        marginal  normal    normal    steal  least active  unlikely
        normal    minimal   normal    steal  least active  unlikely
        normal    marginal  normal    steal  least active  expected
        normal    normal    normal    steal  least active  expected

4. ELSE IF global normal
   AND drive any (except excess)
   AND least active any (except excess and normal)
   THEN steal from global
        normal    marginal  marginal  steal  global        saturated
        normal    marginal  minimal   steal  global        saturated
        normal    minimal   marginal  steal  global        saturated
        normal    minimal   minimal   steal  global        saturated
        normal    normal    marginal  steal  global        expected
        normal    normal    minimal   steal  global        unlikely

5. ELSE IF global any (except excess, normal)
   AND drive any (except excess)
   AND least active marginal
   THEN steal from least active
        marginal  normal    marginal  steal  least active  several
        marginal  marginal  marginal  steal  least active  saturated
        marginal  minimal   marginal  steal  least active  saturated

6. ELSE IF global marginal
   AND drive > minimal (except excess)
   AND least active minimal
   THEN buy from global
        marginal  normal    minimal   buy    global        saturated
        marginal  marginal  minimal   buy    global        saturated

7. ELSE IF global marginal
   AND drive minimal
   AND least active minimal
   THEN steal from global
        marginal  minimal   minimal   steal  global        saturated

8. ELSE IF global minimal
   AND drive normal
   AND least active minimal
   THEN buy from global
        minimal   normal    marginal  buy    global        unlikely
        minimal   normal    minimal   buy    global        saturated

9. ELSE IF global minimal
   AND drive marginal
   AND least active minimal
   THEN buy from global
        minimal   marginal  minimal   buy    global        saturated

10. ELSE IF global minimal
    AND drive minimal
    AND least active minimal
    THEN beg from anywhere
        minimal   minimal   minimal   beg    anywhere      saturated
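The rule ordering above can be condensed into a small decision function. The following Python sketch is illustrative only; it assumes the rules are evaluated in the numbered order, and the status names and function name are taken from the table rather than from the firmware itself.

```python
ORDER = {"minimal": 0, "marginal": 1, "normal": 2, "excess": 3}

def get_bin_action(global_status: str, drive_status: str, least_active_status: str) -> str:
    """Decide where a needed cache bin should come from, following rules 1-10."""
    if drive_status == "excess":
        return "buy from global"                       # rule 1
    if global_status == "excess":
        return "steal from global"                     # rule 2
    if ORDER[least_active_status] >= ORDER["normal"]:
        return "steal from least active drive"         # rule 3
    if global_status == "normal":
        return "steal from global"                     # rule 4
    if least_active_status == "marginal":
        return "steal from least active drive"         # rule 5
    if global_status == "marginal" and drive_status != "minimal":
        return "buy from global"                       # rule 6
    if global_status == "marginal":
        return "steal from global"                     # rule 7
    if drive_status in ("normal", "marginal"):
        return "buy from global"                       # rules 8 and 9
    return "beg from anywhere"                         # rule 10

print(get_bin_action("normal", "marginal", "normal"))   # steal from least active drive
print(get_bin_action("minimal", "minimal", "minimal"))  # beg from anywhere
```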
PLATEAU CACHE ILLUSTRATION
For purposes of illustrating the operation of the plateau cache system, eight situations are presented. See Figures 57 and 58. Four drives are represented, identified as drive 1, drive 2, drive 3, and drive 4. The snapshots are sequential, but not contiguous. Figure 57 shows the cache assignments when the system is configured with three plateaus as described in detail in this document; Figure 58 illustrates the same operating conditions when five plateaus are utilized, an extension of the described technology. In each figure, the column labeled "p" shows the amount of cache allocated to each of the plateaus. Any set of conditions could have been chosen; the following are typical and/or illustrative:
Case 1 is a snapshot of the cache allocation at a time in the operation of the device during which the four drives are approximately equally busy handling workloads consisting of nearly all host read commands. At the moment, the current amount of drive activity, ranked from least active drive to most active drive, is drive 1, drive 4, drive 3, drive 2. Global cache is always considered the most active. Note in both figures the nearly equal distribution of cache among the four drives and the global cache. Since the current commands are reads, and no large amount of writes have occurred in the recent past, the modified cache pool is nearly empty.
Case 2 is a snapshot of the cache allocation at a time in the operation of the device during which drive number 2 is handling most of the host commands, the host commands consisting
of nearly all host reads. At the moment, the current amount of drive activity, ranked from least active drive to most active drive, is drive 1, drive 3, drive 4, drive 2. Note in both figures the large amount of cache allocated to drive 2. In Figure 57, see the distribution of the remaining cache favoring drive 4 at the expense of drives 2 and 3. In Figure 58, note that the distribution of cache among drives 2, 3, and 4 and the global cache is fairly equal. This case shows that the plateau methodology is protecting some of the cached data for drives 1, 3, and 4. Again, since the current commands are reads, and no large amount of writes have occurred in the recent past, the modified cache pool is nearly empty.
Case 3 is a snapshot of the cache allocation at a time subsequent to case 2 in the operation of the device during which drive number 2 is continuing to handle most of the host commands, the host commands consisting of nearly all host reads. At the moment, the current amount of drive activity, ranked from least active drive to most active drive, is drive 1, drive 3, drive 2, drive 4. As in case 2, note in Figure 57, the large amount of cache allocated to drive 4, and the fact that the distribution of cache among the other three drives and the global cache is fairly equal. In Figure 58, note that drive 2 has retained a bit more of the cache than that assigned to each of drives 1 and 3. In both figures, the plateau methodology is now protecting some of the cached data for drives 1, 2, and 3. Again, since the current commands are reads, and no large amount of writes have occurred in the recent past, the modified cache pool is nearly empty.
Case 4 is a snapshot of the cache allocation at a time in the operation of the device during which drive number 1 is handling most of the host commands, the host commands consisting of nearly all host reads. At the moment, the current amount of
drive activity, ranked from least active drive to most active drive, is drive 2, drive 4, drive 3, drive 1. As in case 2, note in Figure 57 the very large amount of cache now allocated to drive 1, and the fact that the distribution of cache among the drives 3 and 4 and global cache is fairly equal, while the least active drive (drive 2) has been forced to relinquish more of its cache space. In Figure 58, drive 2 has been allowed to retain less of its cache than in Figure 57; this reflects the different reaction of the device to the same work loads but with a different set of plateaus. This case shows that the plateau methodology is now protecting a modest amount of the cached data for drives 3 and 4, and drive 2 is utilizing only a small amount of cache. Again, since the current commands are reads, and no large amount of writes have occurred in the recent past, the modified cache pool is nearly empty.
Case 5 is a snapshot of the cache allocation at a time in the operation of the device shortly after the case 4 snapshot, and at a time when drive number 1 is still handling most of the host commands, the host commands consisting of nearly all host writes. At the moment, the current amount of drive activity, ranked from least active drive to most active drive, is drive 4, drive 1, drive 2, drive 3. Drive 2 has recovered some of the cache space it had previously given up. Note in the figures the larger amounts of cache now assigned to the modified data pool and to drive 3, which is the most active; that cache space having been shifted from the drive 1 cache space as seen in case 4. The distribution of cache among the other drives and the global cache is fairly equal in the three-plateau case and in the five-plateau case. This case shows that the plateau methodology is protecting some of the cached data for drives 1, 2, and 4. The current commands are mostly writes, so a noticeable amount of cache is in the modified cache pool.
Case 6 is a snapshot of the cache allocation at a time in the operation of the device during which drive number 3 is handling most of the host commands, the host commands consisting of a nearly even balance of reads and writes. At the moment, the current amount of drive activity, ranked from least active drive to most active drive, is drive 4, drive 1, drive 2, drive 3. Again, note in the figures the amount of cache assigned to the modified data pool. The distribution of cache among the other drives and the global cache is fairly equal in the three-plateau case; in the five-plateau case, drive 4 has been forced to relinquish more of its cache space. This case shows that the plateau methodology is protecting some of the cached data for drives 1, 2, and 4. The current commands include many writes, so a noticeable amount of cache is in the modified cache pool. Case 7 is a snapshot of the cache allocation at a time in the operation of the device during which drive number 1 is handling most of the host commands, the host commands consisting of nearly all writes sent to the device in rapid succession. This is a rare occurrence, but may happen when copying a large amount of data into the device. At the moment, the current amount of drive activity, ranked from least active drive to most active drive, is drive 4, drive 3, drive 2, drive 1. Note in the figures the large amount of cache now assigned to the modified data pool; that cache space having been shifted from the other drives' cache space. In the three-plateau case, the non-modified data is distributed rather equally among the drives and global cache; in the five-plateau case, more of global cache and drive 1 cache have been protected at the expense of the cache assigned to drives 2, 3, and 4. The current commands are nearly all writes, and are occurring very rapidly, so a noticeable amount of cache is temporarily in the modified cache pool.
Case 8 is a snapshot of the cache allocation at a time in the operation of the device during which each drive has resumed a more even share of the host commands, the host commands consisting of nearly all host reads. At the moment, the current amount of drive activity, ranked from least active drive to most active drive, is drive 4, drive 3, drive 2, drive 1. In both the three-plateau and five-plateau configurations, the cache allocations have returned to a balanced position.
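As an aside for illustration only, the behavior seen in the cases above can be read as the least active drives being stepped down from one plateau to the next smaller one, while a protected minimum plateau keeps any section from losing all of its cache. The small C sketch below tabulates one hypothetical three-plateau bound; the percentages, names, and the step-down rule are invented for the example and are not values taken from the figures.

```c
/* Hypothetical three-plateau bounds on a drive's cache section, expressed as
 * percentages of total cache; the numbers are invented for illustration.    */
#define NUM_PLATEAUS 3

static const int plateau_pct[NUM_PLATEAUS] = { 40, 20, 10 };

/* Number of cache blocks a least active section gives up when it is stepped
 * down one plateau; 0 once it has reached the protected minimum plateau.    */
int blocks_released(int current_plateau, int total_blocks)
{
    if (current_plateau + 1 >= NUM_PLATEAUS)
        return 0;                                  /* already at the minimum  */
    int before = total_blocks * plateau_pct[current_plateau]     / 100;
    int after  = total_blocks * plateau_pct[current_plateau + 1] / 100;
    return before - after;                         /* blocks surrendered      */
}
```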
All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
The invention now being fully described, it will be apparent to one of ordinary skill in the art that many changes and modifications can be made thereto without departing from the spirit or scope of the appended claims.
Claims
1. A method for operating a cache memory system having a plurality of mass storage devices and a cache memory, said method comprising the steps of:
storing in said cache memory selected data from said mass storage devices;
maintaining a Least Recently Used (LRU) table indicating the relative recency of use of each data entry stored in said cache memory;
updating a value in a recycle register for each entry in said LRU table upon the occurrence of one or more of:
when data stored in said cache memory associated with said LRU entry has been accessed for reading;
when data associated with said LRU entry is added to said cache memory; and
when said LRU table entry is rechained for reuse after a previous non-use; and
when cache space is determined to be needed corresponding to an area of mass storage which is not currently assigned in said cache memory:
examining an entry at or near the LRU position of said LRU table to determine the value of said recycle register associated with said LRU table entry;
if said value is greater than a predefined threshold value, updating said value and placing said entry at the top of said LRU table and repeating said step of examining the entry at or near the LRU position of said LRU table to determine the value of said recycle register; and
if said value is not greater than said predefined threshold value, decaching said data in said cache memory associated with said LRU table entry, reusing said LRU entry and its related cache memory for satisfying said current need for storing data not stored in said cache memory, and placing the new LRU entry at the MRU position of said LRU table or marking it dechained depending on the new usage of the entry.
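By way of illustration only (not part of the claims), the following minimal C sketch shows one way the recycle-register test recited in claim 1 could be realized. All structure, field, and function names, the threshold value, and the halving of the recycle value on each pass are assumptions made for this example, not details taken from the specification.

```c
/* Hedged sketch of the claim 1 eviction loop; every name here is hypothetical. */
#include <stddef.h>

struct lru_entry {
    struct lru_entry *prev, *next;   /* links in the doubly linked LRU chain   */
    unsigned          recycle;       /* recycle register for this cache entry  */
    void             *cache_block;   /* cache memory associated with the entry */
};

struct lru_table {
    struct lru_entry *mru;           /* most recently used end of the chain    */
    struct lru_entry *lru;           /* least recently used end of the chain   */
};

/* Helpers assumed to exist elsewhere in the controller firmware. */
extern void unchain(struct lru_table *t, struct lru_entry *e);
extern void chain_at_mru(struct lru_table *t, struct lru_entry *e);
extern void decache(struct lru_entry *e);

#define RECYCLE_THRESHOLD 2u         /* assumed predefined threshold value     */

/* Locate a cache block that may be reused for data not currently cached.
 * Entries whose recycle value exceeds the threshold are sent back to the
 * top of the LRU table instead of being decached immediately.              */
struct lru_entry *take_victim(struct lru_table *t)
{
    struct lru_entry *e;

    while ((e = t->lru) != NULL) {
        if (e->recycle > RECYCLE_THRESHOLD) {
            e->recycle /= 2;         /* "updating said value" (assumed decay)  */
            unchain(t, e);
            chain_at_mru(t, e);      /* recycle the entry to the MRU position  */
            continue;                /* examine the new LRU entry              */
        }
        unchain(t, e);
        decache(e);                  /* block may now satisfy the new request  */
        return e;                    /* caller rechains or marks it dechained  */
    }
    return NULL;                     /* table empty; no block available        */
}
```

The decay by halving guarantees that a frequently recycled entry is eventually evicted; the claim itself only requires that the value be updated.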
2. A method as in claim 1 which further comprises the step of maintaining data in said LRU table indicating which sectors of each block of data stored in said cache memory have been modified.
3. A method as in claim 1 wherein, for each block stored in said cache memory, not all sectors of a given block need be cached.
4. A method as in claim 1 wherein an LRU block entry is flagged as a dechained member of the LRU table if that track contains modified data.
5. A method as in claim 4 wherein a block entry is made a chained member of the LRU table after its modified data is written to said mass storage device.
6. A method as in claim 5 wherein said block entry is made a chained member at the bottom of the LRU table if every sector of said block was modified prior to writing to said mass storage device.
7. A method as in claim 5 wherein a block entry is made a chained member at the top of the LRU table if less than every sector of said block was modified prior to writing to said mass storage device.
8. A method as in claim 1 wherein said step of updating a value in a recycle register upon data associated with said LRU entry being added to said cache memory is based upon the original reason for caching said data within said cache memory.
9. A method as in claim 8 wherein said reason is selected from the group of reasons consisting of a read-miss, a write-miss, a read-ahead, and a read-behind.
10. A method as in claim 1 wherein said step of updating a value in a recycle register upon data associated with said LRU entry being added to said cache memory is based upon the proportion of cache memory which is in a modified condition and needs to be written to said mass storage device.
11. A method as in claim 1 wherein said step of updating a value in a recycle register when said LRU table entry is rechained for reuse after a previous non-use is dependent on one or more of: the current value in said recycle register; the original reason for caching data associated with said recycle register; the nature of the most recent activity of said data associated with said recycle register; and the proportion of cache memory which is in a modified condition and needs to be written to said mass storage device.
12. A method as in claim 11 wherein said original reason for caching said data is selected from the group of reasons consisting of read-miss, write-miss, read-ahead, and read-behind.
13. A method as in claim 11 wherein said nature of the most recent activity of said data associated with said recycle register is selected from read-hit, write-hit, read-ahead, and read-behind.
14. A method as in claim 1 wherein said step of updating a value in a recycle register when said LRU table entry reaches the LRU position is dependent on one or more of: the current value in said recycle register; the previous reason for caching data associated with said recycle register; the nature of the most recent activity of said data associated with said recycle register; and the proportion of cache memory which is in a modified condition and needs to be written to said mass storage device.
15. A method as in claims 11 or 14 wherein said original reason for caching said data is selected from the group of reasons consisting of read-miss, write-miss, read-ahead, and read-behind.
16. A method as in claims 11 or 14 wherein said nature of the most recent activity of said data associated with said recycle register is selected from read-hit, write-hit, read-ahead, and read-behind.
17. A method for operating a cache memory system having a plurality of mass storage devices and a cache memory which is divided into a plurality of sections, each associated with one of said plurality of mass storage devices, said method comprising the steps of:
storing in each section of said cache memory selected data from its associated one of said mass storage devices;
maintaining, for each section of said cache memory, a Least Recently Used (LRU) table indicating the relative recency of use of each data entry stored in said section of cache memory;
updating a value in a recycle register for each entry in said LRU tables upon the occurrence of one or more of:
when data stored in said cache memory associated with said LRU entry has been accessed;
when data associated with said LRU entry is added to said cache memory;
when said LRU table entry is rechained for reuse after a previous non-use; and
when cache space is determined to be needed for storing data currently stored in or destined to be stored in an area of mass storage and said data is not currently assigned an area in said cache memory;
examining an entry at or near the LRU position of one of said LRU tables to determine the value of said recycle register associated with said LRU table entry;
if said value is greater than a predefined threshold value, updating said value and placing said entry at the top of said LRU table and repeating said step of examining the entry at or near the LRU position of said LRU table to determine the value of said recycle register; and
if said value is not greater than said predefined threshold value, decaching said data in said cache memory associated with said LRU table entry, reusing said LRU entry and its related cache memory for satisfying said current need for storing data not stored in said cache memory, and placing the new LRU entry at the MRU position of said LRU table or marking it dechained depending on the new usage of the entry.
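As an illustration only (not part of the claims), a possible data layout for the per-device cache sections of claim 17 is sketched below in C. It reuses the hypothetical lru_table and take_victim() of the sketch following claim 1 through an assumed header, and the section count and field names are likewise assumptions.

```c
#include <stddef.h>
#include "cache_sketch.h"   /* hypothetical header collecting the structures
                               and helpers from the sketch after claim 1    */

#define MAX_DRIVES 4        /* assumed number of mass storage devices       */

/* One cache section per mass storage device, each with its own LRU table. */
struct cache_section {
    struct lru_table chain;        /* LRU table for this section            */
    size_t           blocks_owned; /* cache blocks currently assigned here  */
    size_t           plateau_min;  /* smallest allocation the section keeps */
};

struct cache_unit {
    struct cache_section per_drive[MAX_DRIVES];
};

/* Claim 17 victim search: the same recycle test as claim 1, but applied to
 * the LRU table of the section serving the drive that needs cache space.  */
struct lru_entry *take_victim_for_drive(struct cache_unit *c, int drive)
{
    return take_victim(&c->per_drive[drive].chain);
}
```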
18. A method as in claim 17 which further comprises the step of maintaining data in each of said LRU tables indicating which sectors of each block of data stored in said cache memory have been modified.
19. A method as in claim 17 wherein, for each block stored in said cache memory, not all sectors of a given block need be cached.
20. A method as in claim 17 wherein an LRU block entry is flagged as a dechained member of an LRU table if that block contains modified data.
21. A method as in claim 20 wherein a block entry is made a chained member of an LRU table after its modified data is written to one of said mass storage devices.
22. A method as in claim 21 wherein said block entry is made a chained member at the bottom of the LRU table if every sector of said block was modified prior to writing to said mass storage device.
23. A method as in claim 21 wherein said block entry is made a chained member at the top of the LRU table if less than every sector of said block was modified prior to writing to said mass storage device.
24. A method as in claim 17 wherein said step of updating a value in a recycle register upon data associated with said LRU entry being added to said cache memory is based upon the original reason for caching said data within said cache memory.
25. A method as in claim 24 wherein said reason is selected from the group of reasons consisting of a read-miss, a write-miss, a read-ahead, and a read-behind.
26. A method as in claim 17 wherein said step of updating a value in a recycle register upon data associated with said LRU entry being added to said cache memory is based upon the proportion of cache memory which is in a modified condition and needs to be written to said mass storage device.
27. A method as in claim 17 wherein said step of updating a value in a recycle register when said LRU table entry is rechained for reuse after a previous non-use is dependent on one or more of: the current value in said recycle register; the original reason for caching data associated with said recycle register; the nature of the most recent activity of said data associated with said recycle register; and the proportion of cache memory which is in a modified condition and needs to be written to said mass storage device.
28. A method as in claim 27 wherein said original reason for caching said data is selected from the group of reasons consisting of read-miss, write-miss, read-ahead, and read-behind.
29. A method as in claim 27 wherein said nature of the most recent activity of said data associated with said recycle register is selected from read-hit, write-hit, read-ahead, and read-behind.
30. A method as in claim 17 wherein said step of updating a value in a recycle register when said LRU table entry reaches the LRU position is dependent on one or more of: the current value in said recycle register; the previous reason for caching data associated with said recycle register; the nature of the most recent activity of said data associated with said recycle register; and the proportion of cache memory which is in a modified condition and needs to be written to said mass storage device.
31. A method as in claim 27 wherein said original reason for caching said data is selected from the group of reasons consisting of read-miss, write-miss, read-ahead, and read-behind.
32. A method as in claim 27 wherein said nature of the most recent activity of said data associated with said recycle register is selected from read-hit, write-hit, read-ahead, and read-behind.
33. A method as in claim 17 which further comprises utilizing a global section of said cache memory which can be associated with more than one of said mass storage devices, said method further comprising the steps of:
storing in said global section of said cache memory selected data from said mass storage devices;
maintaining, for said global section of said cache memory, a Least Recently Used (LRU) table indicating the relative recency of use of each data entry stored in said global section of cache memory;
updating a value in a recycle register for each entry in said LRU table associated with said global section of said cache memory, upon the occurrence of one or more of:
when data stored in said cache memory associated with said LRU entry has been accessed;
when data associated with said LRU entry is added to said cache memory; and
when said LRU table entry is rechained for reuse after a previous non-use; and
wherein said step of examining an entry at or near the LRU position of one of said LRU tables comprises the step of examining an entry at or near the LRU position of said LRU table associated with said global section of said cache memory:
if said value is not less than a predefined threshold value, updating said value and placing said entry at the top of said LRU table of the cache section to which the cached data in said entry is associated and repeating said step of examining the entry at or near the LRU position of said LRU table to determine the value of said recycle register; and
if said value is less than said predefined threshold value, decaching said data in cache memory associated with said LRU table entry, reusing said LRU entry and its related cache memory for satisfying said current need for storing data not stored in said cache memory, and placing the new LRU entry at the MRU position of said LRU table of the cache section to which the cached data in said entry is associated or marking it dechained depending on the new usage of the entry.
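For illustration only (not part of the claims), the sketch below shows one way the global-section behaviour of claim 33 could be coded, again building on the hypothetical structures of the earlier sketches. The owner_drive field and the "not less than" threshold comparison follow the wording of claim 33; everything else, including the header name and the update rule, is an assumption.

```c
#include <stddef.h>
#include "cache_sketch.h"   /* hypothetical header with the earlier sketches */

/* Entries carry the drive whose data they hold, so that a recycled global
 * entry can be returned to the MRU position of its owning section.        */
struct lru_entry_g {
    struct lru_entry e;            /* must be the first member (for casting) */
    int              owner_drive;  /* section the cached data belongs to     */
};

struct cache_unit_g {
    struct cache_section per_drive[MAX_DRIVES];
    struct cache_section global;   /* section shared by all drives           */
};

/* Claim 33 victim search: examine the global LRU; entries whose recycle
 * value is not less than the threshold are rechained at the MRU position
 * of the section owning their data, otherwise the entry is decached.      */
struct lru_entry *take_victim_global(struct cache_unit_g *c)
{
    struct lru_entry *v;

    while ((v = c->global.chain.lru) != NULL) {
        struct lru_entry_g *g = (struct lru_entry_g *)v;

        if (v->recycle >= RECYCLE_THRESHOLD) {     /* "not less than"         */
            v->recycle /= 2;                       /* assumed update rule     */
            unchain(&c->global.chain, v);
            chain_at_mru(&c->per_drive[g->owner_drive].chain, v);
            continue;                              /* examine next global LRU */
        }
        unchain(&c->global.chain, v);
        decache(v);                                /* reuse for the new data  */
        return v;
    }
    return NULL;
}
```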
34. A method as in claim 33 which further comprises the step of maintaining data in each of said LRU tables indicating which sectors of each block of data stored in said cache memory have been modified.
35. A method as in claim 33 wherein, for each block stored in said cache memory, not all sectors of a given block need be cached.
36. A method as in claim 33 wherein an LRU block entry is flagged as a dechained member of an LRU table if that block contains modified data.
37. A method as in claim 36 wherein a track entry is made a chained member of an LRU table after its modified data is written to one of said mass storage devices.
38. A method as in claim 37 wherein said track entry is made a chained member at the bottom of the global LRU table if every sector of said track was modified prior to writing to said mass storage device.
39. A method as in claim 37 wherein said block entry is made a chained member at the top of the global LRU table if less than every sector of said block was modified prior to writing to said mass storage device.
40. A method as in claim 33 wherein said step of updating a value in a recycle register upon data associated with said LRU entry being added to said cache memory is based upon the original reason for caching said data within said cache memory.
41. A method as in claim 40 wherein said reason is selected from the group of reasons consisting of a read-miss, a write-miss, a read-ahead, and a read-behind.
42. A method as in claim 33 wherein said step of updating a value in a recycle register upon data associated with said LRU entry being added to said cache memory is based upon the proportion of cache memory which is in a modified condition and needs to be written to said mass storage device.
43. A method as in claim 33 wherein said step of updating a value in a recycle register when said LRU table entry is rechained for reuse after a previous non-use is dependent on one or more of: the current value in said recycle register; the original reason for caching data associated with said recycle register; the nature of the most recent activity of said data associated with said recycle register; and the proportion of cache memory which is in a modified condition and needs to be written to said mass storage device.
44. A method as in claim 43 wherein said original reason for caching said data is selected from the group of reasons consisting of a read-miss, a write-miss, a read-ahead, and a read-behind.
45. A method as in claim 44 wherein said nature of the most recent activity of said data associated with said recycle register is selected from read-hit, write-hit, read-ahead, and read-behind.
46. A method as in claim 33 wherein said step of updating a value in a recycle register when said LRU table entry reaches the global LRU position is dependent on one or more of: the current value in said recycle register; the previous reason for caching data associated with said recycle register; the nature of the most recent activity of said data associated with said recycle register; and the proportion of cache memory which is in a modified condition and needs to be written to said mass storage device.
47. A method as in claim 44 wherein said original reason for caching said data is selected from the group of reasons consisting of a read-miss, a write-miss, a read-ahead, and a read-behind.
48. A method as in claim 44 wherein said nature of the most recent activity of said data associated with said recycle register is selected from read-hit, write-hit, read-ahead, and read-behind.
49. A method as in claim 33 wherein if said value is not less than a predefined threshold value, said step of placing said entry at the MRU position of said LRU table comprises the step of placing said entry at the top of said LRU table associated with said global section of said cache memory.
50. A method as in claim 33 wherein if said value is not less than a predefined threshold value, said step of placing said entry at the MRU position of said LRU table comprises the step of placing said entry at the top of said LRU table of the cache section to which the cached data in said entry is associated.
51. A method as in claim 50 which further comprises the step of: rechaining a cache entry associated with the LRU position of one or more of said sections of said cache memory, other than said global section of said memory, to the MRU position of said global section of said cache memory to maintain the desired plateau level of the global section of said cache memory.
52. A method as in claim 33 wherein if said value is less than a predefined threshold value, said step of placing the new LRU global entry at the MRU position of said LRU table of the cache section to which the cached data in said entry is associated or the marking of it dechained depending on the new usage of the entry is followed by the step of the rechaining of a cache entry associated with the LRU position of one of said sections of said cache memory, other than said global section of said memory, to the MRU position of said global section of said cache memory to maintain the desired plateau level of the global section of said cache memory.
53. A method as in claim 17 wherein said cache memory is divided into said plurality of sections dynamically based on workload.
54. A method for operating a cache memory system having a plurality of mass storage devices and a cache memory, said method comprising the steps of:
logically dividing said cache memory into a plurality of sections, each associated with one of said plurality of mass storage devices;
storing in each section of said cache memory selected data from its associated one of said mass storage devices;
monitoring accesses of data stored on each of said plurality of mass storage devices; and
dynamically allocating amounts of said cache memory among said plurality of sections based upon said data accesses.
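Purely as an illustration (not part of the claims), the following C sketch shows one simple way the monitoring and dynamic allocation of claim 54 might be combined with the earlier hypothetical structures; the activity counters and the one-block-per-pass rebalancing rule are assumptions.

```c
#include <stddef.h>
#include "cache_sketch.h"   /* hypothetical header with the earlier sketches */

/* Move cache space from the least accessed section to the most accessed
 * one, never shrinking a section below its assumed minimum allocation.    */
void rebalance(struct cache_unit *c, const unsigned long accesses[MAX_DRIVES])
{
    int busiest = 0, idlest = 0;

    for (int d = 1; d < MAX_DRIVES; d++) {
        if (accesses[d] > accesses[busiest]) busiest = d;
        if (accesses[d] < accesses[idlest])  idlest  = d;
    }

    struct cache_section *from = &c->per_drive[idlest];
    struct cache_section *to   = &c->per_drive[busiest];

    /* Reallocate one block per call, respecting the donor's minimum size. */
    if (from != to && from->blocks_owned > from->plateau_min) {
        struct lru_entry *v = take_victim(&from->chain); /* recycle-aware    */
        if (v != NULL) {
            from->blocks_owned--;
            chain_at_mru(&to->chain, v);
            to->blocks_owned++;
        }
    }
}
```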
55. A method as in claim 54 which logically includes an additional global section of cache memory not associated with any specific mass storage device, in which said global section contains references to data associated with any one or more of the mass storage devices.
56. A method as in claim 54 wherein the size of each of said plurality of sections of said cache memory is set to a predefined size during initialization and dynamically altered over time based upon said data accesses.
57. A method as in claim 55 wherein the size of each of said plurality of sections of said cache memory is set to a predefined size during initialization and dynamically altered over time based upon said data accesses.
58. A method as in claim 56 wherein cache memory is dynamically reallocated from selected ones of said sections of said cache memory which have low utilization to selected ones of said sections of said cache memory which have high utilization.
59. A method as in claim 58 wherein said allocation is made by deallocating memory assigned to one or more of said sections of said cache memory which are not highly active.
60. A method as in claim 59 wherein no one of said sections of said cache memory is reduced in size below a predefined minimum cache size.
61. A method as in claim 54 wherein each of said sections of said cache memory has a plurality of plateaus, each defining the bounds of the size of said section of said cache memory.
62. A method as in claim 55 wherein each of said sections of said cache memory has a plurality of plateaus, each defining a size of said section of said cache memory.
63. A method as in claim 61 wherein cache memory is deallocated from a plurality of said sections of said cache memory which are determined to be least active.
64. A method as in claim 63 wherein said deallocation takes place sequentially among said least active sections of said cache memory such that each of said least active sections of said cache memory are deallocated from a first plateau to a second plateau designating a smaller cache memory allocation to said section of said cache memory, in sequence.
65. A method for operating a cache memory system having a plurality of mass storage devices, each of said mass storage devices having a dynamically associated mode of operation based on amount of modified cached data waiting to be written to the said mass storage device and having a cache memory which is logically divided into a plurality of sections, one each of said sections associated with each of said plurality of mass storage devices and each of said sections of cache memory having designated plateaus which specify cache operating statuses based on amounts of cache which is assigned to each of said mass storage devices, said method comprising the steps of:
storing in each section of said cache memory selected data stored on, or destined to be stored on, its associated one of said mass storage devices;
maintaining for each of said mass storage devices an associated mode of operation dynamically based on amount of modified cached data waiting to be written to the said mass storage device;
maintaining for each section of said cache memory one or more of:
a Least Recently Used (LRU) table indicating the relative recency of use of each data entry stored in said section of cache memory;
a status of operation based on the plateau of said cache memory assigned to said section; and
a Least Active Drive (LAD) list indicating the relative amount of activity of each of said drives; and
dynamically allocating the amounts of said cache memory among said plurality of sections based on said status, mode and activity.
66. A method as in claim 65 wherein said step of dynamically allocating is performed when cache space is determined to be needed for storing data currently stored in or destined to be stored in an area of mass storage, and said data is not currently assigned an area in said cache memory.
67. A method as in claim 65 wherein said step of dynamically allocating comprises the steps of:
selecting a block of memory from one of said sections of said cache memory based on:
the current mode of the mass storage device associated with said section of cache memory;
said block association with the section representing the currently least active drive;
said block position at or near the bottom of the LRU table associated with said least active section;
said block in a section whose amount of cache assignment places said section in the highest of the plateaus of the sections of cache memory;
said selected section is not the section which currently needs the said block of memory; and
said block does not qualify for recycling; and
reassigning the said selected cache memory block.
68. A method as in claim 67 wherein said step of reassigning comprises one or more of: decaching the data currently occupying said block of memory; placing said selected entry at the MRU position of said LRU table of the section requiring the said cache block; and marking said selected entry dechained depending on the new usage of the entry.
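Solely as an illustration (not part of the claims), one way the donor-selection criteria of claims 67 and 68 could be approximated is sketched below; the mode encoding, the least-active-drive (LAD) list representation, and the tie-breaking order are all assumptions layered on the earlier hypothetical structures.

```c
#include "cache_sketch.h"   /* hypothetical header with the earlier sketches */

enum drive_mode { MODE_NORMAL, MODE_FLUSHING };  /* assumed encoding of the
                                                    per-drive mode of operation */

struct section_state {
    struct cache_section sec;
    enum drive_mode      mode;      /* based on modified data awaiting writes  */
    int                  plateau;   /* higher value = more cache assigned      */
};

/* lad[] lists drive numbers from least active to most active (the LAD list).
 * Returns the section that should donate a cache block to `requester`,
 * or -1 if no suitable donor exists.                                          */
int select_donor(const struct section_state s[MAX_DRIVES],
                 const int lad[MAX_DRIVES], int requester)
{
    int best = -1;

    for (int i = 0; i < MAX_DRIVES; i++) {
        int d = lad[i];                      /* consider least active first    */
        if (d == requester)                  /* never take from the requester  */
            continue;
        if (s[d].mode == MODE_FLUSHING)      /* assumed: skip drives busy
                                                writing back modified data     */
            continue;
        if (best < 0 || s[d].plateau > s[best].plateau)
            best = d;                        /* prefer the highest plateau     */
    }
    return best;
}
```

Once a donor section is chosen, the block at or near its LRU position would be taken with the recycle-aware take_victim() of the earlier sketch and then rechained or marked dechained, as claim 68 recites.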
69. A method as in claim 65 wherein said cache memory system further comprises at least one global section of said cache memory not associated with any specific one of said mass storage devices, which further comprises the steps of: storing in said global section selected data from one or more of said mass storage devices; maintaining for said global section of said cache memory a Least Recently Used (LRU) table indicating the relative recency of use of each data entry stored in said global section of cache memory.
70. A method as in claim 69 wherein said step of dynamically allocating comprises the steps of:
selecting a block of memory from one of said sections of said cache memory based on:
the current mode of the mass storage device associated with said section of cache memory;
said block association with the section representing the currently least active drive;
said block position at or near the bottom of the LRU table associated with said least active section;
said block in a section whose amount of cache assignment places said section in the highest of the plateaus of the sections of cache memory;
said selected section is not the section which currently needs the said block of memory; and
said block does not qualify for recycling; and
reassigning the said selected cache memory block.
71. A method as in claim 70 wherein said step of reassigning comprises one or more of: rechaining the said selected cache memory block from the LRU position in the chain of the section associated with its specific mass storage device to the MRU position in the chain of the global section of cache memory; decaching the data currently occupying the least recently used position of the global section of cache memory; and placing the global LRU entry at the MRU position of the LRU table of the section requiring the said cache block or marking said block dechained depending on the new usage of the entry.
72. A method as in claim 69 which further comprises the step of maintaining for said global section of said cache memory a status of operation based on the plateau of said cache memory assigned to said global section.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU34839/97A AU3483997A (en) | 1996-06-18 | 1997-06-12 | Novel cache memory structure and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US66573796A | 1996-06-18 | 1996-06-18 | |
US08/665,737 | 1996-06-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997049037A1 true WO1997049037A1 (en) | 1997-12-24 |
Family
ID=24671378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1997/010155 WO1997049037A1 (en) | 1996-06-18 | 1997-06-12 | Novel cache memory structure and method |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU3483997A (en) |
WO (1) | WO1997049037A1 (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0066766A2 (en) * | 1981-06-05 | 1982-12-15 | International Business Machines Corporation | I/O controller with a dynamically adjustable cache memory |
WO1984002013A1 (en) * | 1982-11-15 | 1984-05-24 | Storage Technology Corp | Adaptive domain partitioning of cache memory space |
US5307473A (en) * | 1991-02-20 | 1994-04-26 | Hitachi, Ltd. | Controller for storage unit and method of controlling storage unit |
WO1992015933A1 (en) * | 1991-03-05 | 1992-09-17 | Zitel Corporation | Cache memory system and method of operating the cache memory system |
US5434992A (en) * | 1992-09-04 | 1995-07-18 | International Business Machines Corporation | Method and means for dynamically partitioning cache into a global and data type subcache hierarchy from a real time reference trace |
Also Published As
Publication number | Publication date |
---|---|
AU3483997A (en) | 1998-01-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US4875155A (en) | Peripheral subsystem having read/write cache with record access | |
US5594885A (en) | Method for operating a cache memory system using a recycled register for identifying a reuse status of a corresponding cache entry | |
AU673488B2 (en) | Cache memory system and method of operating the cache memory system | |
US5590300A (en) | Cache memory utilizing address translation table | |
US5325509A (en) | Method of operating a cache memory including determining desirability of cache ahead or cache behind based on a number of available I/O operations | |
US5881311A (en) | Data storage subsystem with block based data management | |
US6389509B1 (en) | Memory cache device | |
US6928518B2 (en) | Disk drive employing adaptive flushing of a write cache | |
JP4819369B2 (en) | Storage system | |
US7111134B2 (en) | Subsystem and subsystem processing method | |
US6889288B2 (en) | Reducing data copy operations for writing data from a network to storage of a cached data storage system by organizing cache blocks as linked lists of data fragments | |
US9785564B2 (en) | Hybrid memory with associative cache | |
EP0207288A2 (en) | Peripheral subsystem initialization method and apparatus | |
US4574346A (en) | Method and apparatus for peripheral data handling hierarchies | |
US20050086437A1 (en) | Method and system for a cache replacement technique with adaptive skipping | |
US20030212865A1 (en) | Method and apparatus for flushing write cache data | |
US11061622B2 (en) | Tiering data strategy for a distributed storage system | |
US5555399A (en) | Dynamic idle list size processing in a virtual memory management operating system | |
US20130262752A1 (en) | Efficient use of hybrid media in cache architectures | |
EP0077453A2 (en) | Storage subsystems with arrangements for limiting data occupancy in caches thereof | |
US20040205297A1 (en) | Method of cache collision avoidance in the presence of a periodic cache aging algorithm | |
US5717884A (en) | Method and apparatus for cache management | |
US7032093B1 (en) | On-demand allocation of physical storage for virtual volumes using a zero logical disk | |
US20040049638A1 (en) | Method for data retention in a data cache and data storage system | |
US6782444B1 (en) | Digital data storage subsystem including directory for efficiently providing formatting information for stored records |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AU CA JP KR |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
NENP | Non-entry into the national phase |
Ref country code: JP Ref document number: 98503124 Format of ref document f/p: F |
|
NENP | Non-entry into the national phase |
Ref country code: CA |
|
122 | Ep: pct application non-entry in european phase |