WO2018233823A1 - Systems and methods for data aging based management of data entries
Classifications
- G06F16/2228 — Indexing structures (G06F — Electric digital data processing; G06F16/22 — Indexing; data structures therefor; storage structures)
- G06F16/219 — Managing data history or versioning (G06F16/21 — Design, administration or maintenance of databases)
Definitions
- the present invention in some embodiments thereof, relates to data entries and, more specifically, but not exclusively, to management of data entries based on data aging.
- Data entries stored in a data storage device may be removed or migrated based on aging of the data. For example, data entries may be removed or migrated based on the time when the data entries were created and/or modified. A timestamp associated with individual data entries may be analyzed to determine the age of the data entry.
- the general application of data aging is to trigger some user-defined action on data that has timed out.
- Exemplary applications include: deletion of stale data (e.g., time-to-live semantics), segregation (e.g., moving data in the storage hierarchy), and notification of subscribing parties interested in timeouts (e.g., reminder semantics).
- an apparatus for managing a data storage device storing data entries each associated with an expiration timestamp comprises a processor configured to: analyze the expiration timestamp of each data entry at least once during a predefined time interval, store within respective sequential buckets of an adaptive index, wherein each sequential bucket corresponds to a respective sequential timeout latency interval of the predefined time interval, indications of data entries associated with expiration timestamps each set to expire within the corresponding respective sequential timeout latency interval, and process a certain sequential bucket corresponding to the current timeout latency interval of the predefined time interval to remove data entries from the certain sequential bucket having associated expiration timestamps denoting expiration.
- a method of managing a data storage device storing data entries each associated with an expiration timestamp comprises: analyzing the expiration timestamp of each data entry at least once during a predefined time interval, storing within respective sequential buckets of an adaptive index, wherein each sequential bucket corresponds to a respective sequential timeout latency interval of the predefined time interval, indications of data entries associated with expiration timestamps each set to expire within the corresponding respective sequential timeout latency interval, and processing a certain sequential bucket corresponding to the current timeout latency interval of the predefined time interval to remove data entries from the certain sequential bucket having associated expiration timestamps denoting expiration.
- the apparatus, systems, methods, and/or code instructions described herein reduce processor utilization (e.g., maintain a low processor profile), by analyzing the sequential buckets corresponding to the current timeout latency interval rather than, for example, scanning the entire set of data entry expiration timestamps during every timeout latency interval.
- the implementation based on the adaptive index improves computational efficiency, for example, in comparison to maintaining a full secondary index according to expiration timestamps of the data entries.
- Such secondary indexes, which are maintained for all of the data entries, are updated for every data access operation (e.g., storing least recently used (LRU) data).
- the implementation based on the adaptive index reduces data storage requirements in comparison to full secondary indexes.
- the adaptive index stores a subset of indications of the data entries that are set to time out within the predefined time interval.
- the secondary index stores indications for all of the data entries. Effectively, the storage requirements of the adaptive index are a fraction of the storage space that would otherwise be required for implementation of the secondary index.
- the secondary indexes increase undesired data redundancy and increase storage requirements and/or storage complexity.
- the apparatus, systems, methods, and/or code instructions described herein do not significantly reduce performance of parallel update operations on the data entries, in contrast to maintaining a full secondary index or other approaches to data aging.
- the method further comprises and/or the processor is configured to define the expiration timestamp for each new data entry stored in the data storage device.
- the method further comprises and/or the processor is configured to reassign the data entries in the certain sequential bucket having associated expiration timestamps that do not expire during the current timeout latency interval, to other sequential buckets each corresponding to the future timeout latency interval during which the expiration timestamps of the reassigned data entries are set to expire.
- Data entries that will become relevant in the near future are identified and moved to the relevant sequential bucket, which improves computational efficiency in comparison to, for example, implementation of a classical full secondary index on expiration timestamps.
- the method further comprises and/or the processor is configured, when the expiration timestamp of a certain data entry is modified, to reassign the certain data entry to another sequential bucket corresponding to the future timeout latency interval during which the modified expiration timestamp is set to expire.
- the predefined time interval is based on the timeout latency interval (T) during which pending timeouts in the data storage device are discovered and a scan laziness parameter (N).
- T and N are configurable.
- the T and N value may be selected to provide a desired quality of service (QoS) in terms of data storage and/or processor utilization for analyzing and/or processing the adaptive index.
- the predefined time interval is computed as T*N.
- Each timestamp of each data entry is analyzed at least once during the predefined interval T*N, which guarantees discovery of timed out data entries according to the configured predefined timeout latency interval T.
- a limit to processor utilization is set according to the configuration of T and N.
- N denotes the number of sequential buckets, and each sequential bucket denoted as bucket-I (1 ≤ I ≤ N) stores indications of data entries set to expire in the future timeout latency interval denoted by I * T.
- the configurable range of values of each one of T and N is set according to a QoS for maintaining at least one of a processor utilization requirement for managing the adaptive index and a storage requirement for storing the adaptive index.
- QoS is configurable. Storage and/or processor resource requirements are proportional to QoS requirements.
- the sequential buckets are chained into a cyclic linked list, wherein a marker points to the current sequential bucket denoted as bucket-1, which corresponds to the current timeout latency interval, wherein the previous sequential bucket is denoted as bucket-N, and the marker is moved to bucket-2 from bucket-1 when the current timeout latency interval expires and a next timeout latency interval begins, and the sequential buckets are logically relabeled according to bucket-2 being logically relabeled as bucket-1 and bucket-1 being logically relabeled as bucket-N.
- the cyclic design of the linked list continues ad infinitum.
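- A minimal sketch of this cyclic arrangement (the class and field names are illustrative assumptions, not taken from the claims): the N buckets sit in a fixed array, a marker records which physical slot is the logical bucket-1, and advancing the marker at the end of each timeout latency interval performs the logical relabeling without copying any indications.

```python
class AdaptiveIndex:
    """Hypothetical sketch: N sequential buckets chained as a logical ring."""

    def __init__(self, n_buckets, timeout_latency):
        self.n = n_buckets                                 # scan laziness parameter N
        self.t = timeout_latency                           # timeout latency interval T (seconds)
        self.buckets = [set() for _ in range(n_buckets)]   # indications (e.g., entry keys)
        self.marker = 0                                    # physical slot acting as bucket-1

    def bucket(self, logical_i):
        """Return the set backing logical bucket-I (1 <= I <= N)."""
        return self.buckets[(self.marker + logical_i - 1) % self.n]

    def advance(self):
        """Called when the current timeout latency interval expires: bucket-2
        logically becomes bucket-1 and the old bucket-1 becomes bucket-N."""
        self.buckets[self.marker].clear()                  # old bucket-1 was just processed
        self.marker = (self.marker + 1) % self.n
```

- For example, with N = 4 the slot that was bucket-2 becomes bucket-1 after advance(), and the just-processed slot becomes bucket-N, so the ring can be reused indefinitely.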
- the analysis of the expiration timestamps of data entries stored in a certain segment is stopped when a certain data entry of the certain segment is identified as not expiring within the predefined time interval, and the analysis is resumed on a different segment.
- Temporal organization of the data entries further improves computational efficiency (e.g., reducing processor utilization) by skipping irrelevant data entries.
- the data entries are stored in a data structure that does not maintain temporal order.
- the systems, apparatus, methods, and/or code instructions described herein improve computational efficiency (e.g., in terms of processor utilization and/or data storage requirements) in implementations in which temporal ordering of the data entries is not maintained.
- the analysis of the expiration timestamp of each data entry is performed as a background process.
- the background process is configurable to provide the selected QoS.
- the background process is executed to maintain validity of the adaptive index, which is computationally efficient, for example, in comparison to implementation of a secondary index which is dynamically updated according to ongoing database operations.
- the expiration timestamp of each respective data entry segregates data entry storage across a heterogeneous storage hierarchy.
- Efficiency of the heterogeneous storage hierarchy is improved, by moving data entries to the storage device that provides the best balance between storage space, speed, and frequency of access of the data entry.
- the expiration timestamp of each respective data entry denotes a generic mechanism that triggers deletion and/or notification and/or other arbitrary actions on the expired data.
- the implementation based on the adaptive index improves computational efficiency of data entry insert and/or update operations, in comparison to, for example, maintaining a classical full secondary index on expiration timestamps.
- all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains.
- FIG. 1 is a flowchart of a method of managing a data storage device storing data entries each associated with an expiration timestamp based on sequential buckets of an adaptive index, in accordance with some embodiments of the present invention
- FIG. 2 is a block diagram of components of a system that includes a computing device for managing one or more data storage devices storing data entries each associated with an expiration timestamp, based on sequential buckets of an adaptive index, in accordance with some embodiments of the present invention
- FIG. 3 is a flowchart of an implementation of a process of managing a data storage device storing data entries each associated with an expiration timestamp based on sequential buckets of an adaptive index, in accordance with some embodiments of the present invention.
- the present invention in some embodiments thereof, relates to data entries and, more specifically, but not exclusively, to management of data entries based on data aging.
- An aspect of some embodiments of the present invention relate to a system, an apparatus, a method, and/or code instructions (stored in a data storage device executable by one or more processors) for managing data entries each associated with an expiration timestamp and stored in a data storage device.
- the expiration timestamps are analyzed at least once during a predefined time interval. Indications of data entries associated with expiration timestamps set to expire within respective sequential timeout latency intervals of the predefined time interval are stored within respective sequential buckets corresponding to the respective sequential timeout latency interval.
- the sequential buckets are arranged as an adaptive index.
- the certain sequential bucket corresponding to the current timeout latency interval of the predefined time interval is processed to remove data entries having expiration timestamps denoting expiration.
- a table stores records, where each record includes a field storing a time-to-live (TTL) of the respective record.
- a set of records times out and a predefined action is triggered, for example notifications are sent.
- the table does not necessarily maintain temporal order, for example, the table is clustered according to some non-temporal primary key.
- TTL of a record may be arbitrarily updated, but in general it is expected to be reset to 3,600*T.
- the apparatus, systems, methods, and/or code instructions described herein reduce processor utilization (e.g., maintain a low processor profile), by analyzing the sequential buckets corresponding to the current timeout latency interval rather than, for example, scanning the entire set of data entry expiration timestamps during every timeout latency interval.
- the implementation based on the adaptive index improves computational efficiency, for example, in comparison to maintaining a full secondary index on expiration timestamps of the data entries.
- Such secondary indexes, which are maintained for all of the data entries, are updated for every data access operation (e.g., storing least recently used (LRU) data).
- the implementation based on the adaptive index reduces data storage requirements in comparison to secondary indexes.
- the adaptive index stores a subset of indications of the data entries that are set to time out within the predefined time interval.
- the secondary index stores indications for all of the data entries. Effectively, the storage requirements of the adaptive index are a fraction of the storage space that would otherwise be required for implementation of the secondary index.
- the secondary indexes increase undesired data redundancy and increase storage requirements and/or storage complexity.
- the apparatus, systems, methods, and/or code instructions described herein do not significantly reduce performance of parallel update operations on the data entries, in contrast to maintaining a full secondary index or other approaches to data aging.
- the present invention may be a system, a method, and/or a computer program product.
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- FIG. 1 is a flowchart of a method of managing a data storage device storing data entries each associated with an expiration timestamp based on sequential buckets of an adaptive index, in accordance with some embodiments of the present invention.
- FIG. 2 is a block diagram of components of a system 200 that includes a computing device 202 for managing one or more data storage devices 204 storing data entries 204A each associated with an expiration timestamp 204B, based on sequential buckets of an adaptive index 206A, in accordance with some embodiments of the present invention.
- Computing device 202 may be implemented as, for example, one or more of: a single computing device, a group of computing devices arranged in parallel, a computing cloud, a virtual machine, a network server, a storage server, a local server, a remote server, a client terminal, a mobile device, a stationary device, a kiosk, a smartphone, a laptop, a tablet computer, a wearable computing device, a glasses computing device, a watch computing device, and a desktop computer.
- Storage device(s) 204 may be integrated with computing device 202, for example installed within computing device 202, for example, data storage device(s) 204 is installed with a physical box implementation of computing device 202.
- Storage device(s) 204 may be used exclusively by computing device 202, and located externally to computing device 202, for example, an external hard drive, and/or remote storage server.
- Storage device(s) 204 may be shared by computing device 202 and/or installed within another computing system (not shown), for example, within a network.
- Storage device(s) 204 communicates with computing device 202, for example, by a local bus, over a network, over a direct link, over a wireless link, and/or over a wire.
- Storage device(s) 204 may be arranged as a heterogeneous storage hierarchy.
- Exemplary storage device(s) 204 include one or more of: cache, random access memory (RAM), solid state drive (SSD), hard disk drive (HDD), a removable storage unit, a remote storage device, a computing cloud, non-volatile memory, magnetic media, semiconductor memory devices, hard drive, removable storage, and optical media.
- Computing device 202 includes one or more processors 208, implemented as for example, central processing unit(s) (CPU), graphics processing unit(s) (GPU), field programmable gate array(s) (FPGA), digital signal processor(s) (DSP), application specific integrated circuit(s) (ASIC), customized circuit(s), processors for interfacing with other units, and/or specialized hardware accelerators.
- processors 208 may be implemented as a single processor, a multi-core processor, and/or a cluster of processors arranged for parallel processing (which may include homogenous and/or heterogeneous processor architectures). It is noted that processor(s) 208 may be designed to implement in hardware one or more features stored as code instructions 206B.
- Memory 206 stores code instructions executable by processor(s) 208, optionally as code instructions 206B that implement one or more acts of the method described with reference to FIG. 1.
- Memory 206 may store adaptive index 206A.
- Memory 206 may be implemented as for example, a random access memory (RAM), read-only memory (ROM), and/or a storage device, for example, nonvolatile memory, magnetic media, semiconductor memory devices, hard drive, removable storage, and optical media (e.g., DVD, CD-ROM).
- Computing device 202 may be in communication with a user interface 210 that presents data to a user and/or includes a mechanism for entry of data, for example, one or more of: a touch-screen, a display, a keyboard, a mouse, voice activated software, and a microphone.
- User interfaces 210 may be used to manually configure parameters described herein.
- one or more parameters are defined.
- the parameters may be set, for example, manually by a user via user interface 210, automatically computed (e.g., by code instructions stored in data storage device 206 executed by processor(s) 208 that analyze performance data and automatically select the parameters), and/or are obtained from a storage device storing a predefined configuration.
- the defined parameters include a timeout latency interval (T) during which pending timeouts in the data storage device are discovered and/or a scan laziness parameter (N).
- the predefined time interval is based on the defined values of T and N.
- the predefined time interval may be computed as T*N.
- T and N are configurable.
- the T and N value may be selected to provide a desired QoS in terms of data storage and/or processor utilization for analyzing and/or processing the adaptive index.
- Each timestamp of each data entry is analyzed at least once during the predefined interval T*N, which guarantees discovery of timed out data entries according to the configured predefined timeout latency interval T.
- a limit to processor utilization is set according to the configuration of T and N.
- N denotes the number of sequential buckets.
- the value set for N may automatically set the number of sequential buckets of adaptive index 206A.
- the number of sequential buckets of adaptive index 206A is set or configured, and the parameter N is automatically set according to the number of sequential buckets.
- Each sequential bucket denoted as bucket-I (1 ≤ I ≤ N) stores indications of data entries set to expire in the future timeout latency interval denoted by I * T.
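- A short sketch of how an expiration timestamp could be mapped to its logical bucket under this rule (the helper name and the use of UNIX-style timestamps are assumptions): an entry whose remaining lifetime lies in ((I-1)*T, I*T] lands in bucket-I, and entries beyond T*N are not indexed yet.

```python
import math
import time

def logical_bucket(expiration_ts, now, t, n):
    """Return the logical bucket index I (1 <= I <= N) for an expiration timestamp,
    or None if the entry does not expire within the predefined interval T*N."""
    remaining = expiration_ts - now
    if remaining <= 0:
        return 1                              # already due: handled with the current bucket
    i = math.ceil(remaining / t)              # smallest I with remaining <= I*T
    return i if i <= n else None              # beyond T*N: not indexed during this interval

# Illustrative parameters (not from the description): T = 60 s, N = 60.
now = time.time()
print(logical_bucket(now + 90, now, t=60, n=60))     # 2  (expires within 2*T)
print(logical_bucket(now + 7200, now, t=60, n=60))   # None (beyond T*N = 1 hour)
```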
- the configurable range of values of each of T and N may be set according to a quality of service (QoS) for maintaining utilization requirement of processor(s) 208 for managing adaptive index 206A and/or a storage requirement for storing adaptive index 206A.
- the QoS is configurable, with T and N automatically computed according to the set QoS.
- the T and N are set, and the QoS is automatically computed according to the set T and N values.
- Storage and/or processor resource requirements are proportional to QoS requirements.
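- As a rough illustration of this proportionality (all numbers are hypothetical), T and N can be derived from a processor budget: the background scan must visit every entry once per T*N, so a cap on timestamp inspections per second directly bounds how small N may be.

```python
def min_laziness(total_entries, t, max_inspections_per_second):
    """Smallest N such that visiting every entry once per T*N stays within the
    allowed timestamp-inspection budget (sketch; ignores bucket-processing cost)."""
    return max(1, -(-total_entries // (t * max_inspections_per_second)))   # ceiling division

# Hypothetical workload: 5 million entries, T = 60 s, budget of 1,000 inspections/s.
print(min_laziness(5_000_000, 60, 1_000))   # 84, i.e. worst-case discovery after ~84 minutes
```

- Storage follows the same logic: the adaptive index only holds indications for the entries due within T*N, so its size shrinks as the expiring fraction of the table shrinks.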
- the expiration timestamp 204B of each data entry 204A is analyzed at least once during the predefined time interval.
- the expiration timestamp 204B denotes an expiry date, time, and/or alarm in the future, for example, a time-to-live (TTL).
- An event is generated upon expiration of the timestamp associated with a certain data entry.
- When data entries 204A are stored in data storage device 204 in segments exhibiting temporal order, the analysis of the expiration timestamps of data entries stored in a certain segment is stopped when a certain data entry of the certain segment is identified as not expiring within the predefined time interval.
- temporally ordered data entries 204A may be stored in segments of a log-structure. The analysis is resumed on a different segment.
- Temporal organization of the data entries further improves computational efficiency (e.g., reducing processor utilization) by skipping irrelevant data entries.
- data entries 204A are stored in a data structure that does not maintain insertion order and/or temporal order.
- the systems, apparatus, methods, and/or code instructions described herein improve computational efficiency (e.g., in terms of processor utilization and/or data storage requirements) in implementations in which temporal ordering of the data entries is not maintained.
- the analysis of the expiration timestamp of each data entry is performed as a background process.
- the background process is configurable to provide the selected QoS.
- the background process is executed to maintain validity of the adaptive index, which is computationally efficient, for example, in comparison to implementation of a secondary index which is dynamically updated according to ongoing database operations.
- the expiration timestamp 204B for each new data entry 204A stored in data storage device 204 is defined by processor(s) 208 of computing device 202.
- the expiration timestamp 204B is pre-defined, for example, in the user data of the data entry. Additional overhead that would otherwise be required for maintaining a secondary index structure to keep track of timeout information during insertion of the new entry or updating of an existing entry is avoided or reduced. Computational resources (e.g., processor utilizations, data storage) that would otherwise be used for frequent update of the secondary index structure are freed up.
- indications of data entries 204A associated with expiration timestamps 204B that are analyzed and determined as set to expire within a certain sequential timeout latency interval are stored within corresponding respective sequential buckets of adaptive index 206A.
- Each sequential bucket corresponds to a certain respective sequential timeout latency interval of the predefined time interval.
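- A sketch of this scan-and-store step (blocks 104 and 106), assuming entries are plain key-to-timestamp records and buckets are simple sets keyed by logical index; the names are illustrative:

```python
import math
import time

def scan_into_buckets(entries, buckets, t, n, now=None):
    """Lazy scan (sketch): for every entry due within T*N, store an indication
    (here just the entry key) in the bucket covering its expiry interval."""
    now = time.time() if now is None else now
    for key, expiration_ts in entries.items():
        remaining = expiration_ts - now
        if remaining > t * n:
            continue                                  # not relevant during this predefined interval
        i = max(1, math.ceil(remaining / t))          # logical bucket-I, clamped to bucket-1
        buckets[min(i, n)].add(key)

# Illustrative use: T = 60 s, N = 3, entries keyed by record id.
now = time.time()
buckets = {i: set() for i in range(1, 4)}
entries = {"r1": now + 30, "r2": now + 150, "r3": now + 10_000}
scan_into_buckets(entries, buckets, t=60, n=3, now=now)
print(buckets)   # r1 in bucket 1, r2 in bucket 3, r3 not indexed yet
```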
- the certain sequential bucket corresponding to the current timeout latency interval of the predefined time interval is processed.
- the processing of the certain sequential bucket is started when the timeout latency interval T associated with the previously processed sequential bucket is reached.
- Indication of data entries stored in the certain sequential bucket are evaluated to determine whether the associated expiration timestamp is indicative of actual expiration, or whether the associated expiration timestamp has been modified (e.g., by an update process) such that the associated data entry is no longer set to expire during the timeout latency interval T associated with the currently processed certain sequential bucket.
- the update process may execute while the background process is processing the certain sequential bucket.
- the update process may modify expiration timestamps 204B without modifying the corresponding indications stored in adaptive index 206A.
- Data entries having associated expiration timestamps denoting expiration are removed from the certain sequential bucket.
- the expiration timestamp of each respective data entry denotes a generic mechanism.
- the generic mechanisms of the data entries having expiration timestamps are triggered.
- Exemplary triggers include deletion and/or notification and/or other arbitrary actions on the expired data.
- the implementation based on the adaptive index improves computational efficiency of data entry insert and/or update operations, in comparison to, for example, maintaining a classical full secondary index on expiration timestamps.
- data entries in the certain sequential bucket having associated expiration timestamps that do not expire during the current timeout latency interval are reassigned to other sequential buckets corresponding to the future timeout latency interval during which the expiration timestamps of the reassigned data entries are set to expire.
- Data entries that will become relevant in the near future are identified and moved to the relevant sequential bucket, which improves computational efficiency in comparison to, for example, implementation of a classical full secondary index on expiration timestamps.
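- A sketch of this processing step (blocks 108 and 110), assuming each indication is re-checked against the live expiration timestamp because a concurrent update may have pushed it into the future; the callback and data layout are assumptions:

```python
import math
import time

def process_current_bucket(buckets, entries, t, n, on_expire, now=None):
    """Process bucket-1 (sketch): trigger the action for truly expired entries and
    remove them; reassign entries whose timestamps were modified in the meantime."""
    now = time.time() if now is None else now
    for key in list(buckets[1]):
        buckets[1].discard(key)
        expiration_ts = entries.get(key)
        if expiration_ts is None:
            continue                                  # entry was removed: drop the stale indication
        if expiration_ts <= now:
            on_expire(key)                            # e.g., delete, notify, migrate
        else:
            remaining = expiration_ts - now           # timestamp was updated: not expired yet
            if remaining <= t * n:                    # still due within the predefined interval
                buckets[min(n, math.ceil(remaining / t))].add(key)
            # otherwise a later background scan re-indexes the entry

# Illustrative use: T = 60 s, N = 3.
now0 = time.time()
entries = {"r1": now0 - 1, "r2": now0 + 150}
buckets = {1: {"r1", "r2"}, 2: set(), 3: set()}
process_current_bucket(buckets, entries, t=60, n=3,
                       on_expire=lambda k: print("expired:", k), now=now0)
print(buckets)   # r1 removed and its action triggered; r2 reassigned to bucket 3
```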
- the expiration timestamp of each respective data entry segregates data entry storage across a heterogeneous storage hierarchy 204.
- Data entries may be segregated into different storage devices based on the age of the data. Frequently accessed data may be stored in fast memory, for example, a cache. Infrequently accessed data may be stored in slow memory, for example, a hard disk drive. Efficiency of the heterogeneous storage hierarchy is improved, by moving data entries to the storage device 204 that provides the best balance between storage space, speed, and frequency of access of the data entry.
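- One way the triggered action could implement such segregation, sketched with hypothetical tier names and a caller-supplied move_entry helper: an expired entry is demoted one level down the hierarchy and given a fresh expiration timestamp suited to its new tier.

```python
import time

# Hypothetical hierarchy, fastest tier first (names and TTLs are illustrative).
TIERS = ["ram_cache", "ssd", "hdd", "archive"]
TIER_TTL_SECONDS = {"ram_cache": 3_600, "ssd": 86_400, "hdd": 30 * 86_400}

def demote_on_expire(entry, move_entry):
    """Expiration-triggered segregation (sketch): demote the entry to the next
    colder tier and give it a new expiration timestamp suited to that tier."""
    level = TIERS.index(entry["tier"])
    if level + 1 >= len(TIERS):
        return entry                                    # already in the coldest tier
    next_tier = TIERS[level + 1]
    move_entry(entry["key"], entry["tier"], next_tier)  # caller-supplied data movement
    entry["tier"] = next_tier
    if next_tier in TIER_TTL_SECONDS:
        entry["expiration_ts"] = time.time() + TIER_TTL_SECONDS[next_tier]
    return entry
```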
- the certain data entry is dynamically reassigned to another sequential bucket corresponding to the future timeout latency interval during which the modified expiration timestamp is set to expire.
- the reassigning is performed dynamically, when the expiration timestamp is modified, rather than and/or in addition to performing the reassigning when the certain sequential bucket is analyzed. It is noted that update operations may modify or reset the value of the expiration timestamp (e.g., TTL) at any time.
- blocks 104-110 are iterated. Each iteration cycle is performed during a subsequent time interval.
- the sequential buckets of adaptive index 206A are chained into a cyclic linked list.
- a marker points to the current sequential bucket denoted as bucket-1, which corresponds to the current timeout latency interval (e.g., during the current iteration cycle).
- the previous sequential bucket (e.g., during the previous iteration cycle) is denoted as bucket-N.
- The marker is moved from bucket-1 to bucket-2 when the current timeout latency interval expires and a next timeout latency interval begins (which may occur during the next iteration cycle).
- the sequential buckets are logically relabeled according to bucket-2 being logically relabeled as bucket-1 and bucket-1 being logically relabeled as bucket-N.
- FIG. 3 is a flowchart of an implementation of a process of managing a data storage device storing data entries each associated with an expiration timestamp based on sequential buckets of an adaptive index, in accordance with some embodiments of the present invention.
- One or more acts of the process described with reference to FIG. 3 may be based on, integrated with, and/or interchanged with an implementation of one or more acts described with reference to FIG. 1.
- the process described with reference to FIG. 3 may be implemented by one or more components of system 200 described with reference to FIG. 2, optionally as code instructions stored in data storage device 206 executable by processor(s) 208 of computing device 202.
- Table R includes a schema that generally defines the table's record format (e.g., stores data entries 204A), and defines a field storing an expiration value, optionally a TTL value (e.g., stores timestamps 204B).
- the adaptive index 206A is created based on Table R.
- Data entries (e.g., records) of table R time out as defined by the schema, for example, every T seconds.
- a background process scans table R in a lazy manner. Each data entry 204A is visited at least once during the time interval defined by the configured parameters N*T.
- an alarm is set to the value T. Data entries set to expire within the current timeout latency interval T are stored within the current sequential bucket of the adaptive index.
- the background process scans data entries 204A of Table R, according to blocks 104 and 106 as described with reference to FIG. 1.
- utilization of processor(s) 208 performing the scan is limited as defined by the configured QoS parameter(s).
- the background scan stops scanning Table R, and processes the sequential bucket bucket-1 corresponding to the current T, according to block 108 as described with reference to FIG. 1.
- the expiration timestamp (e.g., TTL) of the next entry in sequential bucket bucket-1 is analyzed.
- the expiration timestamp of an entry stored in bucket-1 is analyzed to determine whether the entry stored in bucket-1 actually timed out.
- the action associated with the expiration is triggered, for example, a notification event is raised, and/or a deletion of data is performed.
- the data entry associated with the expired timestamp is removed from the sequential bucket. Remaining data entries of bucket bucket-1 are processed by resuming block 318.
- the indication of the data entry is moved to another sequential bucket corresponding to the modified value of the expiration timestamp, according to block 110 as described with reference to FIG. 1. Remaining data entries of bucket bucket-1 are processed by resuming block 318.
- the background process checks whether a join request was issued in block 340 within the main process. When no join request is issued, scanning of entries in Table R is resumed at block 312. Alternatively, when a join request is issued, at 336, the background process acknowledges the join request to the main process waiting in block 340 and background scanning of Table R stops by terminating the background process.
- one or more processes perform updates on the data entries 204A and/or timestamps 204B of Table R.
- Table R may be dropped (e.g., the database program accessing Table R is terminated).
- the main process may stop background scanning of Table R by issuing a join request and waiting for acknowledgement from the background process, according to blocks 334 and 336.
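- A compact sketch of this background-process lifecycle, using a Python thread and a pair of events to stand in for the join request and its acknowledgement; the threading primitives and names are assumptions, not part of the described implementation:

```python
import threading
import time

class BackgroundScanner:
    """Hypothetical lazy scanner: scans Table R between alarms, processes bucket-1
    every T seconds, and stops cleanly when the main process issues a join request."""

    def __init__(self, scan_step, process_bucket_1, t):
        self.scan_step = scan_step              # scans the next chunk of Table R
        self.process_bucket_1 = process_bucket_1
        self.t = t
        self.join_requested = threading.Event()
        self.join_acked = threading.Event()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        next_alarm = time.time() + self.t
        while not self.join_requested.is_set():
            self.scan_step()                    # lazy scan of Table R
            if time.time() >= next_alarm:       # alarm T fired
                self.process_bucket_1()         # process the current bucket (bucket-1)
                next_alarm += self.t
        self.join_acked.set()                   # acknowledge the join request (cf. block 336)

    def start(self):
        self.thread.start()

    def stop(self):
        self.join_requested.set()               # main process issues a join request (cf. block 340)
        self.join_acked.wait()
        self.thread.join()
```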
- composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
- a compound or “at least one compound” may include a plurality of compounds, including mixtures thereof.
- range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Abstract
There is provided an apparatus for managing a data storage device storing data entries each associated with an expiration timestamp, the apparatus comprising: a processor configured to: analyze the expiration timestamp of each data entry at least once during a predefined time interval, store within respective sequential buckets of an adaptive index wherein each sequential bucket corresponds to a respective sequential timeout latency interval of the predefined time interval, indications of data entries associated with expiration timestamps each set to expire within the corresponding respective sequential timeout latency interval, and process a certain sequential bucket corresponding to the current timeout latency interval of the predefined time interval to remove data entries from the certain sequential bucket having associated expiration timestamps denoting expiration.
Description
SYSTEMS AND METHODS FOR DATA AGING BASED MANAGEMENT OF
DATA ENTRIES
BACKGROUND
The present invention, in some embodiments thereof, relates to data entries and, more specifically, but not exclusively, to management of data entries based on data aging.
Data entries stored in a data storage device, for example, in a database, may be removed or migrated based on aging of the data. For example, data entries may be removed or migrated based on the time when the data entries were created and/or modified. A timestamp associated with individual data entries may be analyzed to determine the age of the data entry.
The general application of data aging is to trigger some user-defined action on data that has timed out. Exemplary applications include: deletion of stale data (e.g., time-to-live semantics), segregation (e.g., moving data in the storage hierarchy), and notification of subscribing parties interested in timeouts (e.g., reminder semantics).
SUMMARY
It is an object of the present invention to provide an apparatus, a method, a computer program product, and a system for managing a data storage device storing data entries each associated with an expiration timestamp.
The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect, an apparatus for managing a data storage device storing data entries each associated with an expiration timestamp comprises a processor configured to: analyze the expiration timestamp of each data entry at least once during a predefined time interval, store within respective sequential buckets of an adaptive index, wherein each sequential bucket corresponds to a respective sequential timeout latency
interval of the predefined time interval, indications of data entries associated with expiration timestamps each set to expire within the corresponding respective sequential timeout latency interval, and process a certain sequential bucket corresponding to the current timeout latency interval of the predefined time interval to remove data entries from the certain sequential bucket having associated expiration timestamps denoting expiration.
According to a second aspect, a method of managing a data storage device storing data entries each associated with an expiration timestamp comprises: analyzing the expiration timestamp of each data entry at least once during a predefined time interval, storing within respective sequential buckets of an adaptive index, wherein each sequential bucket corresponds to a respective sequential timeout latency interval of the predefined time interval, indications of data entries associated with expiration timestamps each set to expire within the corresponding respective sequential timeout latency interval, and processing a certain sequential bucket corresponding to the current timeout latency interval of the predefined time interval to remove data entries from the certain sequential bucket having associated expiration timestamps denoting expiration.
The apparatus, systems, methods, and/or code instructions described herein reduce processor utilization (e.g., maintain a low processor profile), by analyzing the sequential buckets corresponding to the current timeout latency interval rather than, for example, scanning the entire set of data entry expiration timestamps during every timeout latency interval.
The implementation based on the adaptive index improves computational efficiency, for example, in comparison to maintaining a full secondary index according to expiration timestamps of the data entries. Such secondary indexes, which are maintained for all of the data entries, are updated for every data access operation (e.g., storing least recently used (LRU) data).
The implementation based on the adaptive index reduces data storage requirements in comparison to full secondary indexes. The adaptive index stores a subset of indications of the data entries that are set to time out within the predefined time interval. In contrast, the secondary index stores indications for all of the data entries. Effectively, the storage requirements of the adaptive index are a fraction of the storage space that would otherwise be required for implementation of the secondary
index. The secondary indexes increase undesired data redundancy and increase storage requirements and/or storage complexity.
The apparatus, systems, methods, and/or code instructions described herein do not significantly reduce performance of parallel update operations on the data entries, in contrast to maintaining a full secondary index or other approaches to data aging.
In a further implementation form of the first and second aspects, the method further comprises and/or the processor is configured to define the expiration timestamp for each new data entry stored in the data storage device.
Additional overhead that would otherwise be required for maintaining a secondary index structure to keep track of timeout information during insertion of the new entry or updating of an existing entry is avoided or reduced. Computational resources (e.g., processor utilizations, data storage) that would otherwise be used for frequent update of the secondary index structure are freed up.
In a further implementation form of the first and second aspects, the method further comprises and/or the processor is configured to reassign the data entries in the certain sequential bucket having associated expiration timestamps that do not expire during the current timeout latency interval, to other sequential buckets each corresponding to the future timeout latency interval during which the expiration timestamps of the reassigned data entries are set to expire.
Data entries that will become relevant in the near future are identified and moved to the relevant sequential bucket, which improves computational efficiency in comparison to, for example, implementation of a classical full secondary index on expiration timestamps.
In a further implementation form of the first and second aspects, the method further comprises and/or the processor is configured, when the expiration timestamp of a certain data entry is modified, to reassign the certain data entry to another sequential bucket corresponding to the future timeout latency interval during which the modified expiration timestamp is set to expire.
When the expiration timestamps of data entries are frequently updated, reassigning the certain data entry to another sequential bucket (i.e., an eager update strategy) improves computational performance by omitting the checking of the data entries stored in the current bucket.
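The eager variant can be sketched as follows, assuming the same simple bucket-and-entry layout used in the earlier sketches (names are illustrative): the update path moves the indication immediately, so later processing of the current bucket only encounters entries that genuinely expire.

```python
import math

def update_ttl_eagerly(key, new_expiration_ts, entries, buckets, t, n, now):
    """Eager update strategy (sketch): move the indication as soon as the
    expiration timestamp changes, instead of waiting for bucket processing."""
    entries[key] = new_expiration_ts
    for bucket in buckets.values():               # drop the stale indication, if any
        bucket.discard(key)
    remaining = new_expiration_ts - now
    if remaining <= 0:
        buckets[1].add(key)                       # already due: current bucket
    elif remaining <= t * n:                      # still within the predefined interval T*N
        buckets[min(n, math.ceil(remaining / t))].add(key)
    # otherwise the entry is re-indexed by a later background scan
```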
In a further implementation form of the first and second aspects, the predefined time interval is based on the timeout latency interval (T) during which pending timeouts in the data storage device are discovered and a scan laziness parameter (N).
T and N are configurable. The T and N value may be selected to provide a desired quality of service (QoS) in terms of data storage and/or processor utilization for analyzing and/or processing the adaptive index.
In a further implementation form of the first and second aspects, the predefined time interval is computed as T*N.
Each timestamp of each data entry is analyzed at least once during the predefined interval T*N, which guarantees discovery of timed out data entries according to the configured predefined timeout latency interval T. A limit to processor utilization is set according to the configuration of T and N. When timestamps are updated to values less than T*N, worst-case discovery is guaranteed after T*N, unless the eager update strategy is selected.
In a further implementation form of the first and second aspects, N denotes the number of sequential buckets, and each sequential bucket denoted as bucket-I (1 ≤ I ≤ N) stores indication of data entries set to expire in the future timeout latency interval denoted by I * T.
In a further implementation form of the first and second aspects, the configurable range of values of each one of T and N is set according to a QoS for maintaining at least one of a processor utilization requirement for managing the adaptive index and a storage requirement for storing the adaptive index.
QoS is configurable. Storage and/or processor resource requirements are proportional to QoS requirements.
In a further implementation form of the first and second aspects, the sequential buckets are chained into a cyclic linked list, wherein a marker points to the current sequential bucket denoted as bucket-1, which corresponds to the current timeout latency interval, wherein the previous sequential bucket is denoted as bucket-N, and the marker is moved to bucket-2 from bucket-1 when the current timeout latency interval expires and a next timeout latency interval begins, and the sequential buckets are logically relabeled according to bucket-2 being logically relabeled as bucket-1 and bucket-1 being logically relabeled as bucket-N.
The cyclic design of the linked list continues ad infinitum.
In a further implementation form of the first and second aspects, when the data entries are stored in the data storage device in segments exhibiting temporal order, the analysis of the expiration timestamps of data entries stored in a certain segment is stopped when a certain data entry of the certain segment is identified as not expiring within the predefined time interval, and the analysis is resumed on a different segment.
Temporal organization of the data entries further improves computational efficiency (e.g., reducing processor utilization) by skipping irrelevant data entries.
In a further implementation form of the first and second aspects, the data entries are stored in a data structure that does not maintain temporal order.
The systems, apparatus, methods, and/or code instructions described herein improve computational efficiency (e.g., in terms of processor utilization and/or data storage requirements) in implementations in which temporal ordering of the data entries is not maintained.
In a further implementation form of the first and second aspects, the analysis of the expiration timestamp of each data entry is performed as a background process.
The background process is configurable to provide the selected QoS. The background process is executed to maintain validity of the adaptive index, which is computationally efficient, for example, in comparison to implementation of a secondary index which is dynamically updated according to ongoing database operations.
In a further implementation form of the first and second aspects, the expiration timestamp of each respective data entry segregates data entry storage across a heterogeneous storage hierarchy.
Efficiency of the heterogeneous storage hierarchy is improved, by moving data entries to the storage device that provides the best balance between storage space, speed, and frequency of access of the data entry.
In a further implementation form of the first and second aspects, the expiration timestamp of each respective data entry denotes a generic mechanism that triggers deletion and/or notification and/or other arbitrary actions on the expired data.
The implementation based on the adaptive index improves computational efficiency of data entry insert and/or update operations, in comparison to, for example, maintaining a classical full secondary index on expiration timestamps.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
In the drawings:
FIG. 1 is a flowchart of a method of managing a data storage device storing data entries each associated with an expiration timestamp based on sequential buckets of an adaptive index, in accordance with some embodiments of the present invention;
FIG. 2 is a block diagram of components of a system that includes a computing device for managing one or more data storage devices storing data entries each associated with an expiration timestamp, based on sequential buckets of an adaptive index, in accordance with some embodiments of the present invention; and
FIG. 3 is a flowchart of an implementation of a process of managing a data storage device storing data entries each associated with an expiration timestamp based on sequential buckets of an adaptive index, in accordance with some embodiments of the present invention.
DETAILED DESCRIPTION
The present invention, in some embodiments thereof, relates to data entries and, more specifically, but not exclusively, to management of data entries based on data aging.
An aspect of some embodiments of the present invention relate to a system, an apparatus, a method, and/or code instructions (stored in a data storage device executable by one or more processors) for managing data entries each associated with an expiration timestamp and stored in a data storage device. The expiration timestamps are analyzed
at least once during a predefined time interval. Indications of data entries associated with expiration timestamps set to expire within respective sequential timeout latency intervals of the predefined time interval are stored within respective sequential buckets corresponding to the respective sequential timeout latency interval. The sequential buckets are arranged as an adaptive index. The certain sequential bucket corresponding to the current timeout latency interval of the predefined time interval is processed to remove data entries having expiration timestamps denoting expiration.
The systems, apparatus, methods, and/or code instructions (stored in a data storage device, executable by one or more processors) described herein address the technical problem of performing an action according to data aging of entries stored in one or more data storage devices. For example, a table stores records, where each record includes a field storing a time-to-live (TTL) of the respective record. In predefined timeout latency intervals, for example every T seconds, a set of records times out and a predefined action is triggered, for example notifications are sent. The table does not necessarily maintain temporal order, for example, the table is clustered according to some non-temporal primary key. The whole table is scanned every T seconds, to ensure that all the data entries are visited during the interval T, which is an inefficient use of computational resources. Data aging is fuzzy in the sense that instantaneous identification of timeouts is not necessarily required to trigger one or more actions according to the age of the stored data. A certain delay, i.e., the timeout latency interval, is acceptable. However, it is assumed that the TTL interval is much longer than T, for example TTL = 3,600*T. TTL of a record may be arbitrarily updated, but in general it is expected to be reset to 3,600*T.
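To make the cost concrete, a back-of-the-envelope comparison under assumed numbers (10 million records, T = 1 second, N = 3,600, so that T*N equals the TTL of one hour): a full scan inspects every timestamp every T seconds, whereas the lazy scan only has to cover the table once per T*N, at the price of the bounded timeout latency T.

```python
records = 10_000_000          # hypothetical table size
t = 1                         # timeout latency interval T, in seconds
n = 3_600                     # scan laziness parameter N (so T*N = TTL = 1 hour)

full_scan_checks_per_second = records / t          # rescan the whole table every T
lazy_scan_checks_per_second = records / (t * n)    # visit each record once per T*N

print(full_scan_checks_per_second)   # 10,000,000 checks/s
print(lazy_scan_checks_per_second)   # ~2,778 checks/s, plus processing of bucket-1
```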
The apparatus, systems, methods, and/or code instructions described herein reduce processor utilization (e.g., maintain a low processor profile), by analyzing the sequential buckets corresponding to the current timeout latency interval rather than, for example, scanning the entire set of data entry expiration timestamps during every timeout latency interval.
The implementation based on the adaptive index improves computational efficiency, for example, in comparison to maintaining a full secondary index on expiration timestamps of the data entries. Such secondary indexes, which are
maintained for all of the data entries, are updated for every data access operation (e.g., storing least recently used (LRU) data).
The implementation based on the adaptive index reduces data storage requirements in comparison to secondary indexes. The adaptive index stores a subset of indications of the data entries that are set to time out within the predefined time interval. In contrast, the secondary index stores indications for all of the data entries. Effectively, the storage requirements of the adaptive index are a fraction of the storage space that would otherwise be required for implementation of the secondary index. The secondary indexes increase undesired data redundancy and increase storage requirements and/or storage complexity.
The apparatus, systems, methods, and/or code instructions described herein do not significantly reduce performance of parallel update operations on the data entries, in contrast to maintaining a full secondary index or other approaches to data aging.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Reference is now made to FIG. 1, which is a flowchart of a method of managing a data storage device storing data entries each associated with an expiration timestamp based on sequential buckets of an adaptive index, in accordance with some embodiments of the present invention. Reference is also made to FIG. 2, which is a block diagram of components of a system 200 that includes a computing device 202 for managing one or more data storage devices 204 storing data entries 204A each associated with an expiration timestamp 204B, based on sequential buckets of an adaptive index 206A, in accordance with some embodiments of the present invention.
Computing device 202 may be implemented as, for example, one or more of: a single computing device, a group of computing devices arranged in parallel, a computing cloud, a virtual machine, a network server, a storage server, a local server, a remote server, a client terminal, a mobile device, a stationary device, a kiosk, a smartphone, a laptop, a tablet computer, a wearable computing device, a glasses computing device, a watch computing device, and a desktop computer.
Storage device(s) 204 may be integrated with computing device 202, for example, installed within computing device 202, for example, data storage device(s) 204 is installed within a physical box implementation of computing device 202. Storage device(s) 204 may be used exclusively by computing device 202 and located externally to computing device 202, for example, an external hard drive and/or remote storage server. Storage device(s) 204 may be shared by computing device 202 and/or installed within another computing system (not shown), for example, within a network. Storage device(s) 204 communicates with computing device 202, for example, by a local bus, over a network, over a direct link, over a wireless connection, and/or over a wired connection.
Storage device(s) 204 may be arranged as a heterogeneous storage hierarchy. Exemplary storage device(s) 204 include one or more of: cache, random access memory (RAM), solid state drive (SSD), hard disk drive (HDD), a removable storage unit, a remote storage device, a computing cloud, non-volatile memory, magnetic media, semiconductor memory devices, hard drive, removable storage, and optical media.
Computing device 202 includes one or more processors 208, implemented as, for example, central processing unit(s) (CPU), graphics processing unit(s) (GPU), field programmable gate array(s) (FPGA), digital signal processor(s) (DSP), application-specific integrated circuit(s) (ASIC), customized circuit(s), processors for interfacing with other units, and/or specialized hardware accelerators. Processor(s) 208 may be implemented as a single processor, a multi-core processor, and/or a cluster of processors arranged for parallel processing (which may include homogeneous and/or heterogeneous processor architectures). It is noted that processor(s) 208 may be designed to implement in hardware one or more features stored as code instructions 206B.
Data storage device (e.g., memory) 206 stores code instructions executable by processor(s) 208, optionally as code instructions 206B that implement one or more acts of the method described with reference to FIG. 1. Memory 206 may store adaptive index 206A. Memory 206 may be implemented as for example, a random access memory (RAM), read-only memory (ROM), and/or a storage device, for example, nonvolatile memory, magnetic media, semiconductor memory devices, hard drive, removable storage, and optical media (e.g., DVD, CD-ROM).
Computing device 202 may be in communication with a user interface 210 that presents data to a user and/or includes a mechanism for entry of data, for example, one or more of: a touch-screen, a display, a keyboard, a mouse, voice activated software, and a microphone. User interfaces 210 may be used to manually configure parameters described herein.
At 102, one or more parameters are defined. The parameters may be set, for example, manually by a user via user interface 210, automatically computed (e.g., by code instructions stored in data storage device 206 executed by processor(s) 208 that analyze performance data and automatically select the parameters), and/or are obtained from a storage device storing a predefined configuration.
The defined parameters include a timeout latency interval (T) during which pending timeouts in the data storage device are discovered and/or a scan laziness parameter (N). The predefined time interval is based on the defined values of T and N. The predefined time interval may be computed as T*N. T and N are configurable. The T and N values may be selected to provide a desired QoS in terms of data storage and/or processor utilization for analyzing and/or processing the adaptive index.
Each timestamp of each data entry is analyzed at least once during the predefined interval T*N, which guarantees discovery of timed out data entries according to the configured predefined timeout latency interval T. A limit to processor utilization is set according to the configuration of T and N. When timestamps are updated to values less than T*N, worst-case discovery is guaranteed after T*N, unless the eager update strategy is selected.
N denotes the number of sequential buckets. The value set for N may automatically set the number of sequential buckets of adaptive index 206A. Alternatively, the number of sequential buckets of adaptive index 206A is set or configured, and the parameter N is automatically set according to the number of sequential buckets.
Each sequential bucket, denoted as bucket-I (1 ≤ I ≤ N), stores indications of data entries set to expire in the future timeout latency interval denoted by I * T.
The configurable range of values of each of T and N may be set according to a quality of service (QoS) for maintaining a utilization requirement of processor(s) 208 for managing adaptive index 206A and/or a storage requirement for storing adaptive index 206A. Optionally, the QoS is configurable, with T and N automatically computed according to the set QoS. Alternatively or additionally, T and N are set, and the QoS is automatically computed according to the set T and N values. Storage and/or processor resource requirements are proportional to QoS requirements.
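As a concrete illustration of how these parameters interact, the following minimal Python sketch (with hypothetical names; nothing here is prescribed by the embodiments) derives the predefined time interval and the bucket count from T and N:

```python
from dataclasses import dataclass

@dataclass
class AgingConfig:
    """Hypothetical configuration object; field names are illustrative only."""
    timeout_latency_interval: float  # T, in seconds: granularity at which timeouts are discovered
    scan_laziness: int               # N: number of sequential buckets in the adaptive index

    @property
    def predefined_time_interval(self) -> float:
        # Every expiration timestamp is visited at least once per T*N seconds.
        return self.timeout_latency_interval * self.scan_laziness

# Example: 10-second timeout latency, scan spread lazily over 6 buckets -> 60-second cycle.
cfg = AgingConfig(timeout_latency_interval=10.0, scan_laziness=6)
assert cfg.predefined_time_interval == 60.0
```

Raising N spreads the scan over a longer cycle (lower processor utilization per interval) at the cost of a larger adaptive index, which is the QoS trade-off described above.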
At 104, the expiration timestamp 204B of each data entry 204A is analyzed at least once during the predefined time interval. The expiration timestamp 204B denotes an expiry date, time, and/or alarm in the future, for example, a time-to-live (TTL). An event is generated upon expiration of the timestamp associated with a certain data entry.
When data entries 204A are stored in data storage device 204 in segments exhibiting temporal order, the analysis of the expiration timestamps of data entries 204A stored in a certain segment is stopped when a certain data entry of the certain segment is identified as not expiring within the predefined time interval. For example, temporally ordered data entries 204A may be stored in segments of a log-structure. The analysis is resumed on a different segment. Temporal organization of the data entries further improves computational efficiency (e.g., reducing processor utilization) by skipping irrelevant data entries.
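A minimal sketch of this early-termination scan over a temporally ordered segment might look as follows (Python; the segment iterable and the (entry_id, expires_at) pairs are illustrative assumptions):

```python
def scan_segment(segment, horizon_end):
    """Yield (entry_id, expires_at) pairs from a temporally ordered segment that
    expire before horizon_end (i.e., within the predefined time interval).

    Because entries in the segment are ordered by expiration timestamp, the scan
    can stop at the first entry that falls outside the horizon and skip the rest.
    """
    for entry_id, expires_at in segment:
        if expires_at >= horizon_end:
            break  # all remaining entries in this segment expire even later
        yield entry_id, expires_at
```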
Alternatively, data entries 204A are stored in a data structure that does not maintain insertion order and/or temporal order. The systems, apparatus, methods, and/or code instructions described herein improve computational efficiency (e.g., in terms of
processor utilization and/or data storage requirements) in implementations in which temporal ordering of the data entries is not maintained.
Optionally, the analysis of the expiration timestamp of each data entry is performed as a background process. The background process is configurable to provide the selected QoS. The background process is executed to maintain validity of the adaptive index, which is computationally efficient, for example, in comparison to implementation of a secondary index which is dynamically updated according to ongoing database operations.
Optionally, the expiration timestamp 204B for each new data entry 204A stored in data storage device 204 is defined by processor(s) 208 of computing device 202. Alternatively or additionally, the expiration timestamp 204B is pre-defined, for example, in the user data of the data entry. Additional overhead that would otherwise be required for maintaining a secondary index structure to keep track of timeout information during insertion of the new entry or updating of an existing entry is avoided or reduced. Computational resources (e.g., processor utilizations, data storage) that would otherwise be used for frequent update of the secondary index structure are freed up.
At 106, indications of data entries 204A associated with expiration timestamps 204B that are analyzed and determined as set to expire within a certain sequential timeout latency interval are stored within corresponding respective sequential buckets of adaptive index 206A. Each sequential bucket corresponds to a certain respective sequential timeout latency interval of the predefined time interval.
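One plausible way to map an expiration timestamp to its sequential bucket is sketched below (Python; the flat list-of-sets representation and the arithmetic are illustrative assumptions, not mandated by the embodiments):

```python
import math

def bucket_index(expires_at, now, T, N):
    """Return the label I of the bucket bucket-I (1 <= I <= N) whose timeout latency
    interval covers expires_at; timestamps already expired map to bucket 1."""
    delta = max(expires_at - now, 0.0)
    return min(max(math.ceil(delta / T), 1), N)

def index_entry(adaptive_index, entry_id, expires_at, now, T, N):
    """Store an indication of the entry in the bucket for its timeout latency interval.
    Here adaptive_index is simply a list of N sets of entry identifiers."""
    i = bucket_index(expires_at, now, T, N)
    adaptive_index[i - 1].add(entry_id)
```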
At 108, the certain sequential bucket corresponding to the current timeout latency interval of the predefined time interval is processed. The processing of the certain sequential bucket is started when the timeout latency interval T associated with the previously processed sequential bucket elapses.
Indication of data entries stored in the certain sequential bucket are evaluated to determine whether the associated expiration timestamp is indicative of actual expiration, or whether the associated expiration timestamp has been modified (e.g., by an update process) such that the associated data entry is no longer set to expire during the timeout latency interval T associated with the currently processed certain sequential bucket. The update process may execute while the background process is processing the certain
sequential bucket. The update process may modify expiration timestamps 204B without modifying the corresponding indications stored in adaptive index 206A.
Data entries having associated expiration timestamps denoting expiration are removed from the certain sequential bucket. Alternatively or additionally, the expiration timestamp of each respective data entry denotes a generic mechanism. The generic mechanisms of data entries whose expiration timestamps have expired are triggered. Exemplary triggers include deletion and/or notification and/or other arbitrary actions on the expired data. The implementation based on the adaptive index improves computational efficiency of data entry insert and/or update operations, in comparison to, for example, maintaining a classical full secondary index on expiration timestamps.
At 110, data entries in the certain sequential bucket having associated expiration timestamps that do not expire during the current timeout latency interval are reassigned to other sequential buckets corresponding to the future timeout latency interval during which the expiration timestamps of the reassigned data entries are set to expire. Data entries that will become relevant in the near future are identified and moved to the relevant sequential bucket, which improves computational efficiency in comparison to, for example, implementation of a classical full secondary index on expiration timestamps.
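Blocks 108 and 110 could be combined into a single drain of the current bucket along the following lines (Python sketch reusing bucket_index from the earlier sketch; the timestamps mapping, the on_expired callback, and the marker-based ring layout are illustrative assumptions):

```python
def process_current_bucket(adaptive_index, marker, timestamps, now, T, N, on_expired):
    """Drain the bucket currently labeled bucket-1 (physical slot 'marker').

    Entries whose timestamps actually expired trigger on_expired (e.g., deletion or
    notification); entries whose timestamps were modified in the meantime are
    reassigned to the bucket covering their new expiration time.
    """
    drained, adaptive_index[marker] = adaptive_index[marker], set()
    for entry_id in drained:
        expires_at = timestamps.get(entry_id)   # timestamps: entry id -> expiration time
        if expires_at is None:
            continue                            # entry was deleted concurrently
        if expires_at <= now:
            on_expired(entry_id)
        else:
            i = bucket_index(expires_at, now, T, N)             # 1..N
            adaptive_index[(marker + i - 1) % N].add(entry_id)  # reassign (i == 1 keeps it current)
```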
Optionally, the expiration timestamp of each respective data entry segregates data entry storage across a heterogeneous storage hierarchy 204. Data entries may be segregated into different storage devices based on the age of the data. Frequently accessed data may be stored in fast memory, for example, a cache. Infrequently accessed data may be stored in slow memory, for example, a hard disk drive. Efficiency of the heterogeneous storage hierarchy is improved, by moving data entries to the storage device 204 that provides the best balance between storage space, speed, and frequency of access of the data entry.
Alternatively or additionally, when the expiration timestamp 204B of a certain data entry 204A is modified, the certain data entry is dynamically reassigned to another sequential bucket corresponding to the future timeout latency interval during which the modified expiration timestamp is set to expire. The reassigning is performed dynamically, when the expiration timestamp is modified, rather than and/or in addition to performing the reassigning when the certain sequential bucket is analyzed. It is noted
that update operations may modify or reset the value of the expiration timestamp (e.g., TTL) at any time. When the expiration timestamps 204B of data entries 204A are frequently updated, reassigning the certain data entry to another sequential bucket (i.e., an eager update strategy) improves computational performance by omitting the checking of the data entries stored in the current bucket.
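Under the eager update strategy, the reassignment could be performed directly in the update path, for example (sketch under the same illustrative assumptions as the earlier sketches):

```python
def on_timestamp_update(adaptive_index, marker, entry_id, new_expires_at, now, T, N):
    """Eagerly move the indication when an expiration timestamp is modified,
    instead of waiting for the current bucket to be processed."""
    for bucket in adaptive_index:
        bucket.discard(entry_id)   # drop the stale indication, wherever it currently sits
    i = bucket_index(new_expires_at, now, T, N)
    adaptive_index[(marker + i - 1) % N].add(entry_id)
```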
At 112, blocks 104-110 are iterated. Each iteration cycle is performed during a subsequent time interval.
Optionally, the sequential buckets of adaptive index 206A are chained into a cyclic linked list. A marker points to the current sequential bucket, denoted as bucket-1, which corresponds to the current timeout latency interval (e.g., during the current iteration cycle). The previous sequential bucket (e.g., during the previous iteration cycle) is denoted as bucket-N. The marker is moved from bucket-1 to bucket-2 when the current timeout latency interval expires and a next timeout latency interval begins (which may occur during the next iteration cycle). When the marker is moved, the sequential buckets are logically relabeled: bucket-2 is logically relabeled as bucket-1 and bucket-1 is logically relabeled as bucket-N.
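The relabeling can be implemented without moving any data, for example by keeping the buckets in a fixed ring and advancing a marker, as in this sketch (the class and method names are hypothetical):

```python
class AdaptiveIndex:
    """N sequential buckets arranged as a ring; the marker identifies bucket-1.
    Advancing the marker logically relabels bucket-2 as bucket-1 and the old
    bucket-1 as bucket-N, without touching the bucket contents."""

    def __init__(self, N):
        self.buckets = [set() for _ in range(N)]
        self.marker = 0   # physical slot currently playing the role of bucket-1

    def bucket(self, i):
        """Return the physical bucket currently labeled bucket-i (1 <= i <= N)."""
        return self.buckets[(self.marker + i - 1) % len(self.buckets)]

    def advance(self):
        """Called when the current timeout latency interval expires."""
        self.marker = (self.marker + 1) % len(self.buckets)
```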
Reference is now made to FIG. 3, which is a flowchart of an implementation of a process of managing a data storage device storing data entries each associated with an expiration timestamp based on sequential buckets of an adaptive index, in accordance with some embodiments of the present invention. One or more acts of the process described with reference to FIG. 3 may be based on, integrated with, and/or interchanged with an implementation of one or more acts described with reference to FIG. 1. The process described with reference to FIG. 3 may be implemented by one or more components of system 200 described with reference to FIG. 2, optionally as code instructions stored in data storage device 206 executable by processor(s) 208 of computing device 202.
At 302, the process starts.
At 304, an empty table R is created and stored in storage device 204. Table R includes a schema that generally defines the table's record format (e.g., stores data entries 204A), and defines a field storing an expiration value, optionally a TTL value (e.g., stores timestamps 204B).
At 306, the adaptive index 206A is created based on Table R. Data entries (e.g., records) stored in table R time out as defined by the schema, for example, every T seconds.
At 308, a background process scans table R in a lazy manner. Each data entry 204A is visited at least once during the time interval defined by the configured parameters N*T.
At 310, an alarm is set to the value T. Data entries set to expire within the current timeout latency interval T are stored within the current sequential bucket of the adaptive index.
At 312, the background process scans data entries 204A of Table R, according to blocks 104 and 106 as described with reference to FIG. 1.
At 314, the utilization of processor(s) 208 performing the scan is limited as defined by the configured QoS parameter(s).
At 316, an analysis is performed to determine when the timeout latency interval T passes.
At 318, when the timeout latency interval T has passed, the background scan stops scanning Table R and processes the sequential bucket bucket-1 corresponding to the current T, according to block 108 as described with reference to FIG. 1.
At 320, an analysis is performed to determine whether bucket-1 is empty.
At 322, when bucket-1 is empty, the sequential buckets are relabeled. When the sequential buckets are chained as a cyclic linked list, the relabeling is performed as described herein. Processing of the next timeout latency interval begins by resuming at block 310.
Alternatively, at 324, when bucket-1 is not empty, the expiration timestamp (e.g., TTL) of the next entry in sequential bucket bucket-1 is analyzed.
At 326, the expiration timestamp of an entry stored in bucket-1 is analyzed to determine whether the entry stored in bucket-1 actually timed out.
At 328, when the expiration timestamp of the entry timed out, the action associated with the expiration is triggered, for example, a notification event is raised, and/or a deletion of data is performed.
At 330, the data entry associated with the expired timestamp is removed from the sequential bucket. Remaining data entries of bucket-1 are processed by resuming at block 318.
At 332, when the data entry has not actually expired (e.g., the expiration timestamp was modified by an update process without affecting the data stored by the adaptive index), the indication of the data entry is moved to another sequential bucket corresponding to the modified value of the expiration timestamp, according to block 110 as described with reference to FIG. 1. Remaining data entries of bucket-1 are processed by resuming at block 318.
Referring back to block 316, when the timeout latency interval T has not yet passed, at 334, the background process checks whether a join request was issued in block 340 within the main process. When no join request is issued, scanning of entries in Table R is resumed at block 312. Alternatively, when a join request is issued, at 336, the background process acknowledges the join request to the main process waiting in block 340 and background scanning of Table R stops by terminating the background process.
Referring back to block 306, while the background process is executing as described with reference to block 308, at 338, processes perform updates on the data entries 204A and/or timestamps 204B of Table R.
At 340, Table R may be dropped (e.g., the database program accessing Table R is terminated). Alternatively or additionally, the main process may stop background scanning of Table R by issuing a join request and waiting for acknowledgement from the background process, according to blocks 334 and 336.
At 342 the process stops.
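Putting the previous sketches together, the background process of FIG. 3 might be approximated by a loop of the following shape (Python; table.entries(), table.timestamps, the stop_requested callback, and the fixed sleep used as a processor-utilization throttle are all placeholder assumptions, and expiration timestamps are assumed to share the clock used by time.monotonic()):

```python
import time

def background_scan(table, index, cfg, on_expired, stop_requested):
    """Lazy background scan approximating blocks 308-336 of FIG. 3 (illustrative sketch)."""
    T, N = cfg.timeout_latency_interval, cfg.scan_laziness
    cursor = iter(table.entries())                       # yields (entry_id, expires_at)
    while not stop_requested():                          # block 334: join request check
        deadline = time.monotonic() + T                  # block 310: alarm set to T
        while time.monotonic() < deadline:               # blocks 312-316: scan until the alarm
            try:
                entry_id, expires_at = next(cursor)
            except StopIteration:
                cursor = iter(table.entries())           # restart the lazy scan of Table R
                continue
            now = time.monotonic()
            if expires_at < now + T * N:                 # expires within the predefined interval
                index.bucket(bucket_index(expires_at, now, T, N)).add(entry_id)
            time.sleep(0.001)                            # block 314: crude QoS throttle
        now = time.monotonic()                           # blocks 318-332: process bucket-1
        process_current_bucket(index.buckets, index.marker, table.timestamps,
                               now, T, N, on_expired)
        index.advance()                                  # blocks 320-322: relabel the buckets
```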
Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.
The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those
of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant storage devices will be developed and the scope of the term storage device is intended to include all such new technologies a priori.
As used herein the term "about" refers to ± 10 %.
The terms "comprises", "comprising", "includes", "including", "having" and their conjugates mean "including but not limited to". This term encompasses the terms "consisting of and "consisting essentially of .
The phrase "consisting essentially of means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
As used herein, the singular form "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". Any particular embodiment of the invention may include a plurality of "optional" features unless such features conflict.
Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as
individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging/ranges between" a first indicated number and a second indicated number and "ranging/ranges from" a first indicated number "to" a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.
Claims
1. An apparatus (202) for managing a data storage device (204) storing data entries (204A) each associated with an expiration timestamp (204B), the apparatus (202) comprising:
a processor (208) configured to:
analyze the expiration timestamp (204B) of each data entry (204A) at least once during a predefined time interval;
store within respective sequential buckets of an adaptive index (206A) wherein each sequential bucket corresponds to a respective sequential timeout latency interval of the predefined time interval, indications of data entries associated with expiration timestamps each set to expire within the corresponding respective sequential timeout latency interval; and
process a certain sequential bucket corresponding to the current timeout latency interval of the predefined time interval to remove data entries from the certain sequential bucket having associated expiration timestamps denoting expiration.
2. The apparatus (202) according to claim 1, wherein the processor (208) is configured to define the expiration timestamp (204B) for each new data entry (204A) stored in the data storage device (204).
3. The apparatus (202) according to any of the previous claims, wherein the processor (208) is configured to reassign the data entries in the certain sequential bucket having associated expiration timestamps that do not expire during the current timeout latency interval, to other sequential buckets each corresponding to the future timeout latency interval during which the expiration timestamps of the reassigned data entries are set to expire.
4. The apparatus (202) according to any of the previous claims, wherein the processor (208) is configured, when the expiration timestamp (204B) of a certain data entry (204A) is modified, to reassign the certain data entry (204A) to another sequential
bucket corresponding to the future timeout latency interval during which the modified expiration timestamp is set to expire.
5. The apparatus (202) according to any of the previous claims, wherein the predefined time interval is based on the timeout latency interval (T) during which pending timeouts in the data storage device (204) are discovered and a scan laziness parameter (N).
6. The apparatus (202) according to claim 5, wherein the predefined time interval is computed as T*N.
7. The apparatus (202) according to claim 5 or claim 6, wherein N denotes the number of sequential buckets, and each sequential bucket denoted as bucket-I (1 ≤ I ≤ N) stores indications of data entries set to expire in the future timeout latency interval denoted by I * T.
8. The apparatus (202) according to any one of claims 5-7, wherein the configurable range of values of each one of T and N is set according to a quality of service (QoS) for maintaining at least one of a processor utilization requirement for managing the adaptive index (206A) and a storage requirement for storing the adaptive index (206A).
9. The apparatus (202) according to any of the previous claims, wherein the sequential buckets are chained into a cyclic linked list, wherein a marker points to the current sequential bucket denoted as bucket-1 which is corresponding to the current timeout latency interval, wherein the previous sequential bucket is denoted as bucket-N, and the marker is moved to bucket-2 from bucket-1 when the current timeout latency interval expires and a next timeout latency interval begins, and the sequential buckets are logically relabeled according to bucket-2 being logically relabeled as bucket-1 and bucket-1 being logically relabeled as bucket-N.
10. The apparatus (202) according to any of the previous claims, wherein when the data entries (204A) are stored in the data storage device (204) in segments exhibiting temporal order, the analysis of the expiration timestamps (204B) of data entries (204A) stored in a certain segment is stopped when a certain data entry (204A) of the certain segment is identified as not expiring within the predefined time interval, and the analysis is resumed on a different segment.
11. The apparatus (202) according to any of the previous claims, wherein the data entries (204A) are stored in a data structure that does not maintain temporal order.
12. The apparatus (202) according to any of the previous claims, wherein the analysis of the expiration timestamp (204B) of each data entry (204A) is performed as a background process.
13. The apparatus (202) according to any of the previous claims, wherein the expiration timestamp (204B) of each respective data entry (204A) segregates data entry storage across a heterogeneous storage hierarchy (204).
14. The apparatus (202) according to any of the previous claims, wherein the expiration timestamp (204B) of each respective data entry (204A) denotes a generic mechanism that triggers deletion and/or notification and/or other arbitrary actions on the expired data.
15. A method of managing a data storage device storing data entries each associated with an expiration timestamp, the method comprising:
analyzing the expiration timestamp of each data entry at least once during a predefined time interval (104);
storing within respective sequential buckets of an adaptive index wherein each sequential bucket corresponds to a respective sequential timeout latency interval of the predefined time interval, indications of data entries associated with expiration timestamps each set to expire within the corresponding respective sequential timeout latency interval (106); and
processing a certain sequential bucket corresponding to the current timeout latency interval of the predefined time interval to remove data entries from the certain sequential bucket having associated expiration timestamps denoting expiration (108).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2017/065237 WO2018233823A1 (en) | 2017-06-21 | 2017-06-21 | Systems and methods for data aging based management of data entries |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018233823A1 true WO2018233823A1 (en) | 2018-12-27 |
Family
ID=59152885
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2017/065237 WO2018233823A1 (en) | 2017-06-21 | 2017-06-21 | Systems and methods for data aging based management of data entries |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018233823A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3712780A1 (en) * | 2019-03-19 | 2020-09-23 | Hewlett-Packard Development Company, L.P. | Storing objects in data structures |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008043082A2 (en) * | 2006-10-05 | 2008-04-10 | Splunk Inc. | Time series search engine |
US20130103657A1 (en) * | 2010-05-14 | 2013-04-25 | Hitachi, Ltd. | Time-series data management device, system, method, and program |
US20130254212A1 (en) * | 2012-03-23 | 2013-09-26 | Nec (China) Co., Ltd. | Data indexing system, data indexing method and data querying method |
EP2746970A1 (en) * | 2012-12-19 | 2014-06-25 | Sap Ag | Timeline index for managing temporal data |
Non-Patent Citations (1)
Title |
---|
ELMASRI R ET AL: "Efficient implementation techniques for the time index", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA ENGINEERING. KOBE, JP, APRIL 8 - 12, 19; [PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA ENGINEERING], LOS ALAMITOS, IEEE COMP. SOC. PRESS, US, vol. CONF. 7, 8 April 1991 (1991-04-08), pages 102 - 111, XP010022728, ISBN: 978-0-8186-2138-3, DOI: 10.1109/ICDE.1991.131457 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17732406; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 17732406; Country of ref document: EP; Kind code of ref document: A1 |