CN103415842A - Systems and methods for data management virtualization

Info

Publication number: CN103415842A
Application number: CN2011800617167A
Authority: CN
Inventors: A·阿述托什; C·A·普罗文扎诺; D·F·常; P·J·阿伯尔克罗姆比埃; M·穆塔里克; M·A·罗曼
Original assignee: Actifio Inc
Current assignee: Actifio Inc
Priority date: 2010-11-16
Filing date: 2011-11-11
Publication date: 2013-11-27
Anticipated expiration: 2031-11-11
Also published as: KR20140051107A; EP2643760A4; JP2013543198A; BR112013012134A2; CN103415842B; AU2011329232A1; EP2643760A1; WO2012067964A1; CA2817592A1

Abstract

Systems and methods for data management virtualization are disclosed. The systems have a data management engine for performing data management functions, including at least a snapshot function and a back-up function, and a service level policy engine that controls the scheduling of the data management functions using service level agreements in electronic form. Data objects are organized in a deduplicated content store using an organized arrangement of temporal structures to represent states of data over time. Hash signatures are used to track content segments for performing deduplicated copies from a first deduplicated store to a second deduplicated store. Garbage collection is performed on the deduplicated content store for only content segments that have changed relative to an immediately-prior state of a given data object.

Description

Systems and methods for data management virtualization

Cross-Reference to Related Applications

This application claims priority to the following patent applications, each filed November 16, 2010: U.S. Patent Application No. 12/947,393, entitled "System and Method for Performing Backup or Restore Operations Utilizing Difference Information and Timeline State Information"; U.S. Patent Application No. 12/947,385, entitled "System and Method for Managing Data with Service Level Agreements That May Specify Non-Uniform Copying of Data"; U.S. Patent Application No. 12/947,436, entitled "System and Method for Performing a Plurality of Prescribed Data Management Functions in a Manner That Reduces Redundant Access Operations to Primary Storage"; U.S. Patent Application No. 12/947,418, entitled "System and Method for Creating Deduplicated Copies of Data by Tracking Temporal Relationships Among Copies and by Ingesting Difference Data"; U.S. Patent Application No. 12/947,375, entitled "System and Method for Managing Deduplicated Copies of Data Using Temporal Relationships Among Copies"; U.S. Patent Application No. 12/947,383, entitled "System and Method for Improved Garbage Collection Operations in a Deduplicated Store by Tracking Temporal Relationships Among Copies"; U.S. Patent Application No. 12/947,513, entitled "System and Method for Creating Deduplicated Copies of Data by Sending Difference Data Between Two Near-Neighbor Temporal States"; and U.S. Patent Application No. 12/947,438, entitled "System and Method for Creating Deduplicated Copies of Data Storing Non-Lossy Encodings of Data Directly in a Content Addressable Store".

Technical Field

The present invention relates generally to data management, data protection, disaster recovery, and business continuity. More specifically, the present invention relates to systems and methods for performing restore operations using difference information and timeline state information.

Background

Business requirements for managing the lifecycle of application data have traditionally been met by deploying multiple point solutions, each of which addresses a part of the lifecycle. This has resulted in a complex and expensive infrastructure in which multiple copies of data are created and moved multiple times to individual storage repositories. The adoption of server virtualization has become a catalyst for simple, flexible, and low-cost compute infrastructure. This has led to larger deployments of virtual hosts and storage, further widening the gap between the emerging compute models and current data management implementations.

Applications that provide business services depend on storage of their data at various stages of its lifecycle. Figure 1 shows a typical set of data management operations that would be applied to the data of an application, such as a database underlying a business service such as payroll management. In order to provide a business service, application 102 requires primary data storage 122 with some contracted level of reliability and availability.

Backups 104 are made to guard against corruption or failure of the primary data storage caused by hardware or software failure or human error. Typically, backups may be made daily or weekly to local disk or tape 124, and moved less frequently (weekly or monthly) to a physically remote secure location 125.

Concurrent development and testing 106 of new applications based on the same database requires the development team to have access to another copy of the data 126. Such a snapshot might be made weekly, depending on the development schedule.

Compliance 108 with legal or voluntary policies may require that some data be retained for safe future access for some number of years; usually, data is copied regularly (e.g., monthly) to a long-term archiving system 128.

Disaster recovery services 110 guard against catastrophic loss of data if the systems providing the primary business service fail due to some physical disaster. Primary data is copied 130 to a physically distinct location as frequently as is feasible given other constraints (such as cost). In the event of a disaster, the primary site can be rebuilt and the data moved back from the safe copy.

Business continuity services 112 provide a facility for ensuring continued business services should the primary site become compromised. Usually this requires a hot copy 132 of the primary data that is in near-lockstep with the primary data, as well as duplicate systems and applications and mechanisms for switching incoming requests to the business continuity servers.

Thus, data management is currently a collection of point applications, each managing a different part of the lifecycle. This is an artifact of the evolution of data management solutions over the last two decades.

Description of the Drawings

Figure 1 is a simplified diagram of current methods deployed to manage the data lifecycle for a business service.

Figure 2 is an overview of the management of data throughout its lifecycle by a single data management virtualization system.

Figure 3 is a simplified block diagram of the data management virtualization system.

Figure 4 is a view of the data management virtualization engine.

Figure 5 illustrates the object management and data movement engine.

Figure 6 illustrates the storage pool manager.

Figure 7 illustrates the decomposition of a service level agreement.

Figure 8 illustrates the application specific module.

Figure 9 illustrates the service policy manager.

Figure 10 is a flowchart of the service policy scheduler.

Figure 11 is a block diagram of the content addressable storage (CAS) provider.

Figure 12 illustrates the definition of an object handle within the CAS system.

Figure 13 illustrates the data model and operations for the temporal relationship graph stored for objects within the CAS.

Figure 14 is a diagram representing the operation of the garbage collection algorithm in the CAS.

Figure 15 is a flowchart of the operation of copying an object into the CAS.

Figure 16 is a system diagram of a typical deployment of the data management virtualization system.

Figure 17 is a schematic diagram of a characteristic physical server device for use with the data management virtualization system.

Detailed Description

Current data management architectures and implementations such as those described above involve multiple applications addressing different parts of data lifecycle management, all of them performing certain common functions: (a) making a copy of application data (the frequency of this action is commonly termed the Recovery Point Objective (RPO)), (b) storing the copy of data in an exclusive storage repository, typically in a proprietary format, and (c) retaining the copy for a certain duration, measured as the retention time. The primary differences among the point solutions are the frequency of the RPO, the retention time, and the characteristics of the individual storage repositories used, including capacity, cost, and geographic location.

The present disclosure relates to data management virtualization. Data management activities such as backup, replication, and archiving are virtualized in that they do not have to be configured and run individually and separately. Instead, the user defines the business requirements for the lifecycle of the data, and the data management virtualization system performs these operations automatically. A snapshot is taken from primary storage to secondary storage; this snapshot is then used for a backup operation to other secondary storage. Essentially any number of these backups may be made, providing the level of data protection specified by a service level agreement.

The present disclosure also relates to a system for backing up data from a first storage pool to a second storage pool using difference information between time-states, which incorporates a data management engine for performing data management functions, including at least a backup function to create a backup copy of data. The data management engine is operable to execute a sequence of snapshot operations to create point-in-time images of application data on the first storage pool, each successive point-in-time image corresponding to a specific, successive time-state of the application data, and each snapshot operation creating difference information indicating which application data has changed, and the content of the changed application data, for the corresponding time-state. The data management engine is also operable to execute at least one backup function for the application data that is scheduled for execution at non-consecutive time-states, and to maintain history information having time-state information indicating the time-state of the last backup function performed on the application data for a corresponding backup copy of data. The data management engine creates composite difference information from the difference information for each time-state between the time-state of the last backup function performed on the application data and the time-state of the currently scheduled backup function to be performed on the application data, and sends the composite difference information to the second storage pool to be compiled with the backup copy of data at the last time-state to create a backup copy of data for the current time-state.
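
The composition of difference information across time-states can be illustrated with a minimal Python sketch. The dictionary-based representation of difference information, and the names `composite_diff` and `apply_diff`, are assumptions made for illustration only; the disclosure does not prescribe a particular encoding.

```python
from typing import Dict, List

# Hypothetical representation: the difference information for one time-state
# maps a block number to that block's new content.
Diff = Dict[int, bytes]


def composite_diff(diffs_by_state: Dict[int, Diff], last_state: int,
                   current_state: int) -> Diff:
    """Merge per-state differences for states in (last_state, current_state]."""
    merged: Diff = {}
    for state in sorted(diffs_by_state):
        if last_state < state <= current_state:
            # Later time-states overwrite earlier content for the same block.
            merged.update(diffs_by_state[state])
    return merged


def apply_diff(backup: List[bytes], diff: Diff) -> List[bytes]:
    """Compile the composite difference with the last backup copy."""
    updated = list(backup)
    for block, content in diff.items():
        updated[block] = content
    return updated


# Snapshots exist at time-states 1..3; the last backup ran at time-state 1.
diffs = {1: {0: b"a1"}, 2: {1: b"b2"}, 3: {1: b"b3", 2: b"c3"}}
backup_at_1 = [b"a1", b"b0", b"c0"]
backup_at_3 = apply_diff(backup_at_1, composite_diff(diffs, 1, 3))
```

In this sketch only the blocks named in the composite difference cross to the second pool, even though the backup skipped time-state 2.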

The data management virtualization technology according to this disclosure is based on an architecture and implementation founded on the following guiding principles.

First, define the business requirements of an application with a service level agreement (SLA) covering its entire data lifecycle. The SLA is much more than a single RPO, retention time, and Recovery Time Objective (RTO). It describes the data protection characteristics for each stage of the data lifecycle. Each application may have a different SLA.

Second, provide a unified data management virtualization engine that manages the data protection lifecycle, moving data across the various storage repositories with improved use of storage capacity and network bandwidth. The data management virtualization system achieves these improvements by leveraging the extended capabilities of modern storage systems, by tracking the portions of the data that have changed over time, and by applying deduplication and compression algorithms that reduce the amount of data that needs to be copied and moved.

Third, leverage a single master copy of the application data as the basis for multiple elements within the lifecycle. Many data management operations, such as backup, archival, and replication, depend on a stable, consistent copy of the data to be protected. The data management virtualization system leverages a single copy of the data for multiple purposes. A single instance of the data maintained by the system may serve as the source from which each data management function makes additional copies as needed. This contrasts with the traditional approach, in which application data must be copied multiple times by multiple independent data management applications.

Fourth, abstract physical storage resources into a series of data protection storage pools, which are virtualized out of different classes of storage, including local and remote disk, solid-state memory, tape and optical media, and private, public, and/or hybrid storage clouds. The storage pools provide access independent of the type, physical location, or underlying storage technology. Business requirements for the lifecycle of data may call for copying the data to different types of storage media at different times. The data management virtualization system allows the user to classify and aggregate different storage media into storage pools, for example a quick recovery pool, which consists of high-speed disks, and a cost-efficient long-term storage pool, which may be a deduplicated store on high-capacity disks or a tape library. The data management virtualization system can move data among these pools to take advantage of the unique characteristics of each storage medium. The abstraction of storage pools provides access independent of the type, physical location, or underlying storage technology.

Fifth, improve the movement of data between storage pools and disaster locations by using underlying device capabilities and post-deduplicated application data. The data management virtualization system discovers the capabilities of the storage systems that comprise the storage pools and takes advantage of these capabilities to move data efficiently. If the storage system is a disk array that supports creating a snapshot or clone of a data volume, the data management virtualization system will use this capability and make a copy of the data with a snapshot, rather than reading the data from one place and writing it to another. Similarly, if a storage system supports change tracking, the data management virtualization system will update an older copy with just the changes to efficiently create a new copy. When moving data across a network, the data management virtualization system uses deduplication and compression algorithms that avoid sending data that is already available on the other side of the network.
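
A minimal Python sketch of the last point, avoiding the transfer of data already present on the other side of the network, follows. It assumes that content segments are identified by SHA-256 hash signatures and uses a hypothetical in-memory `RemoteStore` class in place of a real network peer; compression is omitted for brevity.

```python
import hashlib
from typing import Dict, List


def signature(segment: bytes) -> str:
    """Hash signature that identifies a content segment (SHA-256 assumed)."""
    return hashlib.sha256(segment).hexdigest()


class RemoteStore:
    """Hypothetical stand-in for the store on the far side of the network."""

    def __init__(self) -> None:
        self.segments: Dict[str, bytes] = {}

    def missing(self, signatures: List[str]) -> List[str]:
        return [s for s in signatures if s not in self.segments]

    def put(self, segment: bytes) -> None:
        self.segments[signature(segment)] = segment


def send_deduplicated(segments: List[bytes], remote: RemoteStore) -> int:
    """Send only segments the remote lacks; return the number of bytes sent."""
    by_sig = {signature(s): s for s in segments}
    sent = 0
    for sig in remote.missing(list(by_sig)):
        remote.put(by_sig[sig])
        sent += len(by_sig[sig])
    return sent


remote = RemoteStore()
first = send_deduplicated([b"alpha", b"beta"], remote)
second = send_deduplicated([b"alpha", b"beta", b"gamma"], remote)
```

The second transfer sends only the one segment the remote store has not yet seen; the signatures for the other two are exchanged, but not their content.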

One key aspect of improving data movement is recognizing that application data changes slowly over time. A copy of an application made today will generally have much in common with the copy of the same application made yesterday. In fact, today's copy of the data can be represented as yesterday's copy plus a series of delta transformations, where the delta transformations themselves are usually much smaller than all of the data in the copy. The data management virtualization system captures and records these transformations in the form of bitmaps or extent lists. In one embodiment of the system, the underlying storage resources (a disk array or a server virtualization system) are capable of tracking the changes made to a volume or file; in these environments, the data management virtualization system queries the storage resources to obtain these change lists and saves them with the data being protected.

In a preferred embodiment of the data management virtualization system, there is a mechanism for eavesdropping on the primary data access path of the application, which enables the data management virtualization system to observe which parts of the application data are modified and to generate its own bitmap of modified data. If, for example, the application modifies blocks 100, 200, and 300 during a particular period, the data management virtualization system will eavesdrop on these events and create a bitmap indicating that these particular blocks were modified. When processing the next copy of the application data, the data management virtualization system will process only blocks 100, 200, and 300, since it knows that these were the only blocks modified.
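
The bitmap mechanism described above can be sketched in a few lines of Python. The `TrackedVolume` class and its methods are hypothetical names chosen for illustration; a real implementation would sit in the I/O path rather than wrap an in-memory list.

```python
class TrackedVolume:
    """Hypothetical volume wrapper that records which blocks are written."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [b""] * num_blocks
        self.dirty = bytearray((num_blocks + 7) // 8)  # one bit per block

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data
        self.dirty[block // 8] |= 1 << (block % 8)  # mark the block modified

    def modified_blocks(self) -> list:
        return [b for b in range(len(self.blocks))
                if self.dirty[b // 8] & (1 << (b % 8))]

    def reset_tracking(self) -> None:
        # Called once a copy has been made, to start the next tracking period.
        self.dirty = bytearray(len(self.dirty))


volume = TrackedVolume(num_blocks=1024)
for block in (100, 200, 300):
    volume.write(block, b"new data")
changed = volume.modified_blocks()
```

Here `changed` lists exactly the three blocks written during the period, so the next copy needs to read only those blocks.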

In one embodiment of the system, where the primary storage for the application is a modern disk array or storage virtualization appliance, the data management virtualization system takes advantage of the point-in-time snapshot capability of the underlying storage device to make the initial copy of the data. This virtual copy mechanism is a fast, efficient, and low-impact technique for creating the initial copy that does not guarantee that all the bits will be copied or stored together. Instead, virtual copies are constructed by maintaining metadata and data structures, such as copy-on-write volume bitmaps or extents, that allow the copies to be reconstructed at access time. The copy has a lightweight impact on the application and on the primary storage device. In another embodiment, where the application is based on a server virtualization system such as VMware or Xen, the data management virtualization system uses the similar virtual machine snapshot capability that is built into the server virtualization system. When a virtual copy capability is not available, the data management virtualization system may include its own built-in snapshot mechanism.
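
As a rough illustration of how a virtual copy can be reconstructed at access time from copy-on-write metadata, consider the following Python sketch. The `CowVolume` class is a hypothetical, in-memory simplification and does not represent the snapshot mechanism of any particular disk array or hypervisor.

```python
class CowVolume:
    """Hypothetical volume with copy-on-write snapshots."""

    def __init__(self, blocks: list) -> None:
        self.blocks = list(blocks)
        self.snapshots = []  # each snapshot: {block number: preserved content}

    def snapshot(self) -> int:
        self.snapshots.append({})  # no data is copied at snapshot time
        return len(self.snapshots) - 1

    def write(self, block: int, data: bytes) -> None:
        for saved in self.snapshots:
            # Preserve the old content the first time a block changes
            # after a snapshot was taken.
            saved.setdefault(block, self.blocks[block])
        self.blocks[block] = data

    def read_snapshot(self, snap_id: int, block: int) -> bytes:
        # Reconstruct at access time: preserved content if the block has
        # changed since the snapshot, otherwise the live block.
        return self.snapshots[snap_id].get(block, self.blocks[block])


vol = CowVolume([b"a", b"b", b"c"])
snap = vol.snapshot()
vol.write(1, b"B")
image = [vol.read_snapshot(snap, i) for i in range(3)]
```

Creating the snapshot copies nothing; only the one block overwritten afterwards is preserved, which is why the technique has a light impact on the application and the primary storage.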

It is possible to use the snapshot as the data primitive underlying all of the data management functions supported by the system. Because it is lightweight, the snapshot can be used as an internal operation even when the requested operation is not itself a snapshot; it is created to enable and facilitate other operations.

At the time a snapshot is created, certain preparatory operations may be involved in order to create a coherent snapshot or coherent image, such that the image can be restored to a state that is usable by the application. These preparatory operations need to be performed only once, even if the snapshot will be leveraged across multiple data management functions in the system, such as backup copies scheduled according to a policy. The preparatory operations may include application quiescence, which includes flushing data caches and freezing the state of the application; they may also include other operations known in the art and other operations useful for retaining a complete image, such as collecting metadata information from the application to be stored with the image.

Figure 2 illustrates one way in which a virtualized data management system can address the data lifecycle requirements described earlier in accordance with these principles.

To serve local backup requirements, a sequence of efficient snapshots is made within local high-availability storage 202. Some of these snapshots are used to serve development/test requirements without making another copy. For longer-term retention of local backups, a copy is made efficiently into long-term local storage 204, which in this implementation uses deduplication to reduce repeated copying. The copies within long-term storage may be accessed as backups or treated as an archive, depending on the retention policy applied by the SLA. A copy of the data is made to remote storage 206 in order to satisfy requirements for remote backup and business continuity; again, a single set of copies serves both purposes. As an alternative for remote backup and disaster recovery, a further copy of the data may be made efficiently to a repository 208 hosted by a commercial or private cloud storage provider.

Data Management Virtualization System

Figure 3 illustrates the high-level components of a data management virtualization system that implements the above principles. Preferably, the system includes the basic functional components described further below.

Application 300 creates and owns the data. This is the software system that has been deployed by the user, for example as an email system, a database system, or a financial reporting system, in order to satisfy some computational need. The application typically runs on a server and uses storage. For illustrative purposes, only one application is shown. In reality there may be hundreds or even thousands of applications managed by a single data management virtualization system.

Storage resources 302 are where application data is stored throughout its lifecycle. The storage resources are the physical storage assets, including internal disk drives, disk arrays, optical and tape storage libraries, and cloud-based storage systems, that the user has acquired to address data storage requirements. The storage resources consist of primary storage 310, where the online, active copy of the application data is stored, and secondary storage 312, where additional copies of the application data are stored for purposes such as backup, disaster recovery, archiving, indexing, reporting, and other uses. Secondary storage resources may include additional storage within the same enclosure as the primary storage, as well as storage based on similar or different storage technologies within the same data center, at another location, or across the Internet.

One or more management workstations 308 allow the user to specify a service level agreement (SLA) 304 that defines the lifecycle for the application data. A management workstation is a desktop or laptop computer or a mobile computing device used to configure, monitor, and control the data management virtualization system. A service level agreement is a detailed specification that captures the detailed business requirements related to the creation, retention, and deletion of secondary copies of the application data. The SLA is much more than the simple RTO and RPO used in traditional data management applications to express the frequency of copies and the expected restore time for a single class of secondary storage. The SLA captures the multiple stages in the data lifecycle specification and allows for non-uniform frequency and retention specifications within each class of secondary storage. The SLA is described in greater detail with reference to Figure 7.
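
To make the idea of non-uniform frequency and retention within a storage class concrete, the following Python sketch models an SLA as a list of per-stage policies. The class names, field names, and storage class labels are all assumptions made for illustration; they are not the SLA structure of Figure 7.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StagePolicy:
    storage_class: str    # e.g. "snapshot", "dedup" (illustrative labels)
    frequency_hours: int  # how often a copy is made
    retention_days: int   # how long each copy is kept


@dataclass
class ServiceLevelAgreement:
    application: str
    stages: List[StagePolicy]

    def copies_retained(self, storage_class: str) -> int:
        """Approximate steady-state copy count for one storage class."""
        return sum(s.retention_days * 24 // s.frequency_hours
                   for s in self.stages if s.storage_class == storage_class)


payroll_sla = ServiceLevelAgreement(
    application="payroll-db",
    stages=[
        # Two policies in the same class: hourly copies kept briefly,
        # daily copies kept longer (non-uniform frequency and retention).
        StagePolicy("snapshot", frequency_hours=1, retention_days=2),
        StagePolicy("snapshot", frequency_hours=24, retention_days=14),
        StagePolicy("dedup", frequency_hours=24, retention_days=90),
    ],
)
```

A single RPO and retention pair could express only one of the two snapshot policies above, which is the limitation the SLA is meant to remove.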

Data management virtualization engine 306 manages the entire lifecycle of the application data as specified in the SLA. It may manage a large number of SLAs for a large number of applications. The data management virtualization engine takes input from the user through the management workstation and interacts with the applications to discover each application's primary storage resources. The data management virtualization engine makes decisions regarding what data needs to be protected and what secondary storage resources best fulfill the protection needs. For example, if an enterprise designates its accounting data as requiring copies to be made at very short intervals for business continuity purposes as well as for backup purposes, the engine may decide, in accordance with an appropriate set of SLAs, to create copies of the accounting data at short intervals to a first storage pool, and also to create backup copies of the accounting data to a second storage pool at longer intervals. This is determined by the business requirements of the storage application.

The engine then makes copies of application data using the advanced capabilities of the available storage resources. In the example above, the engine may schedule the short-interval business continuity copy using a storage appliance's built-in virtual copy or snapshot capabilities. The data management virtualization engine moves the application data among the storage resources in order to satisfy the business requirements captured in the SLA. The data management virtualization engine is described in greater detail with reference to Figure 4.

The data management virtualization system as a whole may be deployed within a single host computer system or appliance, or it may be one logical entity that is physically distributed across a network of general-purpose and special-purpose systems. Certain components of the system may also be deployed within a computing or storage cloud.

In one embodiment of the data management virtualization system, the data management virtualization engine largely runs as multiple processes on a fault-tolerant, redundant pair of computers. Certain components of the data management virtualization engine may run close to the application within the application servers. Some other components may run close to the primary and secondary storage, within the storage fabric or in the storage systems themselves. The management workstations are typically desktop and laptop computers and mobile devices that connect to the engine over a secure network.

Data Management Virtualization Engine

Figure 4 illustrates an architectural overview of the data management virtualization engine 306 according to certain embodiments of the invention. The engine 306 includes the following modules:

Application specific module 402. This module is responsible for controlling the application 300 and collecting metadata from it. Application metadata includes information about the application, such as the type of application, details about its configuration, the location of its data stores, and its current operating state. Controlling the operation of the application includes actions such as flushing cached data to disk, freezing and thawing application I/O, rotating or truncating log files, and shutting down and restarting the application. The application specific module performs these operations, and sends and receives metadata, in response to commands from the service level policy engine 406 described below. The application specific module is described in more detail with reference to FIG. 8.

The service level policy engine 406 acts on the SLA 304 provided by the user to make decisions regarding the creation, movement and deletion of copies of the application data. Each SLA describes the business requirements related to the protection of one application. The service level policy engine analyzes each SLA and arrives at a series of actions, each of which involves the copying of application data from one storage location to another. The service level policy engine then reviews these actions to determine priorities and dependencies, and schedules and initiates the data movement jobs. The service level policy engine is described in more detail with reference to FIG. 9.
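The step of reviewing actions for priorities and dependencies before scheduling can be illustrated with a small sketch. This is only one possible reading, not the method described with reference to FIG. 9: the `CopyAction` record, its fields and the `schedule` function are hypothetical names introduced here, and a plain topological sort with priority tie-breaking stands in for whatever scheduling the engine actually performs.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class CopyAction:
    """Hypothetical record for one action derived from an SLA."""
    name: str
    source_pool: str
    target_pool: str
    priority: int  # lower value is scheduled first among ready actions


def schedule(actions, depends_on):
    """Order action names so that dependencies run first.

    depends_on maps an action name to the names it must wait for.
    Actions that are ready at the same time are ordered by priority.
    """
    by_name = {a.name: a for a in actions}
    sorter = TopologicalSorter(
        {a.name: depends_on.get(a.name, ()) for a in actions}
    )
    sorter.prepare()
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda n: by_name[n].priority)
        for name in ready:
            order.append(name)
            sorter.done(name)
    return order


actions = [
    CopyAction("replicate", "capacity", "remote", priority=2),
    CopyAction("snapshot", "primary", "performance", priority=0),
    CopyAction("dedup", "performance", "capacity", priority=1),
]
dependencies = {"dedup": {"snapshot"}, "replicate": {"dedup"}}
plan = schedule(actions, dependencies)
```

In this example the deduplicated copy waits for the snapshot it reads from, and the remote replication waits for the deduplicated copy.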

The object manager and data movement engine 410 creates a composite object consisting of the application data, the application metadata and the SLA, which it moves through different storage pools per instruction from the policy engine. The object manager receives instructions, in the form of commands, from the service level policy engine 406 to create a copy of the application data in a particular pool, based either on the live primary data 413 belonging to the application 300 or on an existing copy, e.g., 415, in another pool. The copy of the composite object that is created by the object manager and data movement engine is self-contained and self-describing, in that it contains not only the application data but also the application metadata and the SLA for the application. The object manager and data movement engine are described in more detail with reference to FIG. 5.

The storage pool manager 412 is a component that adapts and abstracts the underlying physical storage resources 302 and presents them as virtual storage pools 418. The physical storage resources are the actual storage assets, such as disk arrays and tape libraries, that the user has deployed to support the lifecycle of the user's application data. These storage resources may be based on different storage technologies such as disk, tape, flash memory or optical storage. The storage resources may also have different geographic locations, cost and speed attributes, and may support different protocols. The role of the storage pool manager is to combine and aggregate the storage resources, and to mask the differences between their programming interfaces. The storage pool manager presents the physical storage resources to the object manager 410 as a set of storage pools whose characteristics make them suitable for particular stages in the lifecycle of application data. The storage pool manager is described in more detail with reference to FIG. 6.

Object Manager and Data Movement Engine

FIG. 5 illustrates the object manager and data movement engine 410. The object manager and data movement engine discovers and uses the virtual storage resources 510 presented to it by the pool manager 504. It accepts requests from the service level policy engine 406 to create and maintain data storage object instances from the resources in a virtual storage pool, and it copies application data among instances of storage objects from the virtual storage pools according to the instructions from the service level policy engine. The target pool selected for the copy implicitly designates the business operation being selected, e.g., backup, replication or restore. The service level policy engine resides either locally to the object manager (on the same system) or remotely, and communicates using a protocol over standard networking communication. TCP/IP may be used in a preferred embodiment, as it is well understood, widely available, and allows the service level policy engine to be located locally to the object manager or remotely with little modification.

In one embodiment, the system may deploy the service level policy engine on the same computer system as the object manager for ease of implementation. In another embodiment, the system may employ multiple systems, each hosting a subset of the components if beneficial or convenient for an application, without changing the design.

The object manager 501 and the storage pool manager 504 are software components that may reside on the computer system platform that interconnects the storage resources and the computer systems that use those storage resources, where the user's application resides. The placement of these software components on the interconnect platform is designated as a preferred embodiment, and may provide the ability to connect customer systems to storage via communication protocols widely used for such applications (e.g., Fibre Channel, iSCSI, etc.), and may also provide ease of deployment of the various software components.

The object manager 501 and the storage pool manager 504 communicate with the underlying storage virtualization platform via the application programming interfaces made available by the platform. These interfaces allow the software components to query and control the behavior of the computer system and how it interconnects the storage resources with the computer system where the user's application resides. As is common practice, the components apply modularity techniques so that the intercommunication code specific to a given platform can be replaced.

The object manager and the storage pool manager communicate via a protocol. The protocol messages are transmitted over standard networking protocols, e.g., TCP/IP, or over the standard interprocess communication (IPC) mechanisms typically available on the computer system. This allows comparable communication between the components whether they reside on the same computer platform or on multiple computer platforms connected by a network, depending on the particular computer platform. For ease of deployment, the current configuration has all of the local software components residing on the same computer system. As noted above, this is not a strict requirement of the design, and the components can be reconfigured in the future as needed.

Object Manager

The object manager 501 is a software component for maintaining data storage objects, and provides a set of protocol operations to control it. The operations include the creation, destruction, duplication and copying of data among the objects, maintaining access to the objects, and in particular allowing the specification of the storage pool used to create copies. There is no common subset of functions supported by all pools; however, in a preferred embodiment, primary pools may be performance-optimized, i.e., lower latency, whereas backup or replication pools may be capacity-optimized, supporting larger quantities of data and being content-addressable. The pools may be remote or local. The storage pools are classified according to various criteria, including measures by which a user may make a business decision, e.g., cost per gigabyte of storage.

First, the particular storage device from which the storage is drawn may be a consideration, as equipment is allocated for different business purposes, along with associated cost and other practical considerations. Some devices may not even be actual hardware but capacity provided as a service, and the selection of such a resource can be made for practical business purposes.

Second, network topological "proximity" is considered, as nearby storage is typically connected by low-latency, inexpensive network resources, while distant storage may be connected by high-latency, bandwidth-limited, expensive network resources; conversely, the distance of a storage pool relative to the source may be beneficial when geographic diversity protects against a physical disaster affecting local resources.

Third, storage optimization characteristics are considered: some storage is optimized for space-efficient storage but requires computation time and resources to analyze or transform the data before it can be stored, while other storage is "performance-optimized," taking more storage resources by comparison but using comparatively little compute time or resources to transform the data, if at all.

Fourth, "speed of access" characteristics are considered: some resources intrinsic to a storage computer platform are readily and quickly made available to the user's application, e.g., as a virtual SCSI block device, while others may only be used indirectly. This ease and speed of recovery is often governed by the kind of storage used, which allows the storage to be classified accordingly.

Fifth, the amount of storage used and the amount available in a given pool are considered, as there may be benefits to either concentrating or spreading the storage capacity used.

The service level policy engine, described below, combines the SLA provided by the user with the classification criteria to determine how and when to maintain the application data, and from which storage pools to draw the resources needed to meet the service level agreement (SLA).

The object manager 501 creates, maintains and employs a history mechanism to track the series of operations performed on a data object within the performance pools, and to correlate those operations with others that move the object to other storage pools, in particular capacity-optimized ones. This series of records for each data object is maintained at the object manager for all data objects in the primary pool, initially correlated by primary data object and then correlated by operation order: a timeline for each object and a list of all such timelines. Each operation performed uses underlying virtualization primitives to capture the state of the data object at a given point in time.

Additionally, the underlying storage virtualization appliance may be modified to expose and allow retrieval of internal data structures, such as bitmaps, that indicate the modification of portions of the data within the data object. These data structures are exploited to capture the state of a data object at a point in time, e.g., a snapshot of the data object, and to provide differences between snapshots taken at specified times, thereby enabling optimal backup and restore. While the particular implementations and data structures may vary among appliances from different vendors, a data structure is employed to track changes to the data object, and storage is employed to retain the original state of those portions of the object that have changed: indications in the data structure correspond to data retained in the storage. When the snapshot is accessed, the data structure is consulted, and for portions that have changed, the preserved data is accessed rather than the current data, because the data object has been modified at the areas so indicated. A typical data structure employed is a bitmap, where each bit corresponds to a section of the data object. A set bit indicates that the section has been modified after the point in time of the snapshot operation. The underlying snapshot primitive mechanism maintains this for as long as the snapshot object exists.
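The copy-on-write behavior described above can be sketched in a few lines. This is a toy model under stated assumptions, not any vendor's mechanism: the `CowSnapshot` class is hypothetical, the data object is a Python list with one element per section, and all writes are assumed to pass through the snapshot so that it can preserve the original content.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot over a data object split into sections."""

    def __init__(self, volume):
        self.volume = volume                  # the live data object
        self.changed = [False] * len(volume)  # bitmap: one bit per section
        self.preserved = {}                   # original content of changed sections

    def write(self, index, data):
        # Preserve the original section only the first time it is overwritten.
        if not self.changed[index]:
            self.preserved[index] = self.volume[index]
            self.changed[index] = True
        self.volume[index] = data

    def read_snapshot(self, index):
        # Changed sections are served from the preserved copy,
        # unchanged sections from the current data.
        if self.changed[index]:
            return self.preserved[index]
        return self.volume[index]


volume = ["a", "b", "c"]
snap = CowSnapshot(volume)
snap.write(1, "B")
snap.write(1, "BB")  # second write must not overwrite the preserved original
point_in_time = [snap.read_snapshot(i) for i in range(len(volume))]
```

After the two writes, the live object holds the new content while the snapshot view still returns the state at the time the snapshot was taken, and the bitmap marks only section 1 as modified.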

The timeline described above maintains a list of the snapshot operations against a given primary data object, including the time an operation started, the time it stopped (if at all), a reference to the snapshot object, and a reference to the internal data structure (e.g., bitmaps or extent lists), so that it can be obtained from the underlying system. Also maintained is a reference to the result of copying the state of the data object at any given point in time into another pool; as an example, copying the state of a data object into the capacity-optimized pool 407 using content addressing results in an object handle. That object handle corresponds to a given snapshot and is stored with the snapshot operation in the timeline. This correlation is used to identify suitable starting points.
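One way to picture a timeline entry is as a small record holding the fields listed above. The sketch below is illustrative only: `SnapshotRecord`, `Timeline` and `latest_copied_before` are hypothetical names, and the lookup shown is just one plausible way of using the stored object handles to find a starting point.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SnapshotRecord:
    """Hypothetical timeline entry for one snapshot operation."""
    start_time: float
    stop_time: Optional[float]           # None while the snapshot still exists
    snapshot_ref: str                    # reference to the snapshot object
    bitmap_ref: str                      # reference to the bitmap or extent list
    object_handle: Optional[str] = None  # set once copied to another pool


@dataclass
class Timeline:
    """Snapshot operations recorded against one primary data object."""
    source_object: str
    records: list = field(default_factory=list)

    def add(self, record):
        self.records.append(record)
        self.records.sort(key=lambda r: r.start_time)

    def latest_copied_before(self, when):
        """Most recent record at or before `when` that has an object handle."""
        candidates = [
            r for r in self.records
            if r.object_handle is not None and r.start_time <= when
        ]
        return candidates[-1] if candidates else None


timeline = Timeline("vol1")
timeline.add(SnapshotRecord(30.0, None, "snap3", "bm3", object_handle="h30"))
timeline.add(SnapshotRecord(10.0, 20.0, "snap1", "bm1", object_handle="h10"))
timeline.add(SnapshotRecord(20.0, 30.0, "snap2", "bm2"))
start = timeline.latest_copied_before(25.0)
```

Here the snapshot taken at time 20 was never copied to another pool, so the most recent usable starting point before time 25 is the one with handle `h10`.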

Optimal backup and restore consult the list of operations from a desired starting point to an end point. The time-ordered list of operations and their corresponding data structures (bitmaps) is constructed so that a continuous time series from start to end is realized: there is no gap between the start times of the operations in the series. This ensures that all changes to the data object are represented by the corresponding bitmap data structures. It is not necessary to retrieve all operations from start to end; simultaneously existing data objects and underlying snapshots overlap in time; it is only necessary that there are no gaps in time during which an untracked change might have occurred. Because a bitmap indicates that a certain block of storage has changed but not what the change is, the bitmaps may be added or composed together to obtain the set of all changes that occurred in the time interval. Instead of using this data structure to access the state at a point in time, the system exploits the fact that the data structure represents data modified as time marches forward. Rather, the end state of the data object is accessed at the indicated areas, thus returning the set of changes to the given data object from the given start time to the end time.
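Composing bitmaps over a gap-free series of operations can be sketched as a bitwise OR with a coverage check. The function below is a minimal illustration under assumptions of its own: each operation is modeled as a hypothetical `(start_time, stop_time, bitmap)` tuple, the bitmap is a Python integer in which bit i stands for section i, and each bitmap is taken to track the changes made during its interval.

```python
def compose_changes(operations, start, end):
    """OR together the bitmaps of operations covering [start, end].

    Raises ValueError if there is a span of time inside the window that no
    operation covers, since an untracked change could have occurred there.
    """
    covered_until = start
    combined = 0
    for op_start, op_stop, bitmap in sorted(operations, key=lambda op: op[0]):
        if op_stop <= start or op_start >= end:
            continue  # entirely outside the requested window
        if op_start > covered_until:
            raise ValueError(f"untracked gap from {covered_until} to {op_start}")
        combined |= bitmap  # a set bit in any bitmap marks a changed section
        covered_until = max(covered_until, op_stop)
    if covered_until < end:
        raise ValueError(f"untracked gap from {covered_until} to {end}")
    return combined


operations = [
    (0, 10, 0b0011),
    (10, 20, 0b0100),
    (15, 30, 0b1000),  # overlaps the previous operation, which is allowed
]
changes = compose_changes(operations, 0, 30)
changed_sections = [i for i in range(4) if (changes >> i) & 1]
```

The combined bitmap says only which sections changed; the content to back up is then read from the end state of the data object at those sections.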

The backup operation exploits this timeline, the correlated references, and access to the internal data structures. Similarly, the restore operation uses the system in a complementary fashion. The specific steps are described below in the section on optimal backup/restore.

Virtual Storage Pool Types

FIG. 5 illustrates several representative storage pool types. Although one primary storage pool and two secondary storage pools are depicted in the figure, many more may be configured in some embodiments.

Primary storage pool 507 - contains the storage resources used to create the data objects in which the user applications store their data. This is in contrast to the other storage pools, which exist primarily to fulfill the operation of the data management virtualization engine.

Performance-optimized pool 508 - a virtual storage pool able to provide high-performance backup (i.e., the point-in-time replication described below) as well as rapid access to the backup image by the user application.

Capacity-optimized pool 509 - a virtual storage pool that chiefly provides storage of a data object in a highly space-efficient manner by use of the deduplication techniques described below. The virtual storage pool provides access to the copy of the data object, but does not do so with high performance as its chief aim, in contrast to the performance-optimized pool above.

The initial deployment contains the storage pools described above as a minimum operational set. The design fully expects multiple pools of a variety of types, representing various combinations of the criteria described above, and multiple pool managers as convenient to represent all of the storage in future deployments. The tradeoffs illustrated above are typical of computer data storage systems.

From a practical point of view, these three pools represent a preferred embodiment that addresses most users' requirements in a very simple way. Most users will find that if they have one pool of storage for urgent restore needs, which affords quick recovery, and one other pool that is low cost, so that a large number of images can be retained for a long period of time, then almost all of the business requirements for data protection can be met with little compromise.

The format of the data in each pool is dictated by the objectives and the technology used within the pool. For example, the quick recovery pool is maintained in a form very similar to the original data, to minimize the translation required and to improve the speed of recovery. The long-term storage pool, on the other hand, uses deduplication and compression to reduce the size of the data and thus reduce the cost of storage.

Object Management Operations 505

The object manager 501 creates and maintains instances of data storage objects 503 from the virtual storage pools 418 according to the instructions sent to it by the service level policy engine 406. The object manager provides data object operations in five major areas: point-in-time replication or copying (commonly referred to as "snapshots"), standard copying, object maintenance, mapping and access maintenance, and collections.

Object management operations also include a series of resource discovery operations for maintaining the virtual storage pools themselves and retrieving information about them. The pool manager 504 ultimately supplies the functionality for these.

Point-in-Time Copy ("Snapshot") Operations

Snapshot operations create a data object instance representing an initial object instance at a specific point in time. More specifically, a snapshot operation creates a complete virtual copy of the members of a collection using the resources of a specified virtual storage pool. This is called a data storage object. Multiple states of a data storage object are maintained over time, so that the state of a data storage object as it existed at a point in time is available. As described above, a virtual copy is a copy implemented using an underlying storage virtualization API that allows a copy to be created in a lightweight fashion, using copy-on-write or other in-band technologies instead of copying and storing all bits of the duplicated data to disk. In some embodiments, this may be implemented using software modules written to access the capabilities of an off-the-shelf underlying storage virtualization system such as those provided by EMC, VMware or IBM. Where such underlying virtualization is not available, the described system may provide its own virtualization layer for interfacing with unintelligent hardware.

Snapshot operations require the application to freeze the state of the data at a specific point, so that the image data is coherent and the snapshot may later be used to restore the state of the application as of the time of the snapshot. Other preparatory steps may also be required. These are handled by the application specific module 402 described above. For live applications, therefore, the most lightweight operations are desired.

Snapshot operations are used as the data primitive for all higher-level operations in the system. In effect, they provide access to the state of the data at a particular point in time. Also, since snapshots are typically implemented using copy-on-write techniques that distinguish what has changed from what is resident on disk, these snapshots provide differences that can also be composed or added together to efficiently copy data throughout the system. The format of the snapshot may be the format of the data that is copied by the data mover 502, which is described below.

Standard Copy Operations

When a copy operation is not a snapshot, it may be considered a standard copy operation. A standard copy operation copies all or a subset of a source data object in one storage pool to a data object in another storage pool. The result is two distinct objects. One type of standard copy operation that may be used is an initial "baseline" copy. This is typically done when data is initially copied from one virtual storage pool into another, such as from a performance-optimized pool to a capacity-optimized storage pool. Another type of standard copy operation may be used in which only changed data, or differences, are copied to a target storage pool to update the target object. This occurs after an initial baseline copy has already been performed.

A complete, exhaustive version of an object need not be preserved in the system each time a copy is made, even though a baseline copy is needed when the data virtualization system is first initialized. This is because each virtual copy provides access to a complete copy. Any delta or difference can be expressed in relation to a virtual copy instead of in relation to a baseline. This has the positive side effect of virtually eliminating the common step of walking through a series of change lists.

Standard copy operations are initiated by a series of instructions or requests supplied by the pool manager and received by the data mover, to cause the movement of data among the data storage objects and to maintain the data storage objects themselves. The copy operations allow a copy of a specified data storage object to be created using the resources of a specified virtual storage pool. The result is a copy of the source data object in a target data object in the storage pool.

The snapshot and copy operations are each structured with a preparation operation and an activation operation. The two steps of prepare and activate allow the long-running resource allocation operations, typical of the prepare phase, to be decoupled from the activation. This is required by applications that can only be paused for a short while in order to fulfill the point-in-time characteristic of a snapshot operation, which in reality takes a finite but non-zero amount of time to accomplish. Similarly, for both copy and snapshot operations, this two-step prepare and activate structure allows the policy engine to proceed with an operation only if the resources for all of the collection members can be allocated.
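The two-step structure can be sketched as "reserve everything first, then activate everything". The code below is a simplified illustration, not the system's actual protocol: `Pool`, `PoolFull` and `snapshot_collection` are hypothetical names, and resource allocation is reduced to reserving a count of free sections.

```python
class PoolFull(Exception):
    """Raised when a pool cannot satisfy a reservation."""


class Pool:
    """Hypothetical pool that tracks only a count of free sections."""

    def __init__(self, free_sections):
        self.free = free_sections

    def reserve(self, count):
        if count > self.free:
            raise PoolFull(f"need {count}, only {self.free} free")
        self.free -= count

    def release(self, count):
        self.free += count


def snapshot_collection(pool, members, activate):
    """Prepare every member, then activate all of them together.

    members maps a member name to the number of sections it needs.
    Returns False, with all reservations undone, if any member cannot be
    prepared; activation is attempted only when every member is ready.
    """
    reserved = []
    try:
        for name, size in members.items():  # prepare phase: may be slow
            pool.reserve(size)
            reserved.append((name, size))
    except PoolFull:
        for _, size in reserved:
            pool.release(size)
        return False
    for name, _ in reserved:                # activate phase: kept short
        activate(name)
    return True


pool = Pool(free_sections=10)
activated = []
first = snapshot_collection(pool, {"db": 4, "log": 3}, activated.append)
second = snapshot_collection(pool, {"a": 2, "b": 5}, activated.append)
```

The second request fails during preparation, so nothing is activated for it and the partial reservation is returned to the pool. The application would only need to be paused around the short activation loop.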

Object Maintenance

Object maintenance operations are a series of operations for maintaining data objects, including creation, destruction and duplication. The object manager and the data mover use functionality provided by a pool request broker (described further below) to implement these operations. The data objects may be maintained at a global level, at each storage pool, or preferably both.

Collections

Collection operations are auxiliary functions. Collections are abstract software concepts: lists maintained in memory by the object manager. They allow the policy engine 406 to request a series of operations over all of the members of a collection, allowing a consistent application of a request to all members. The use of collections allows for the simultaneous activation of point-in-time snapshots, so that multiple data storage objects are all captured at precisely the same point in time, as this is typically required by the application for a logically correct restore. The use of collections also allows for the convenient request of a copy operation across all members of a collection, where an application uses multiple storage objects as a logical whole.

Resource Discovery Operations

The object manager discovers the virtual storage pools by issuing object management operations 505 to the pool manager 504, and uses the information obtained about each of the pools to select one that meets the required criteria for a given request; where none match, a default pool is selected. The object manager can then create a data storage object using resources from the selected virtual storage pool.
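Selecting a pool by criteria with a fallback can be sketched as a simple attribute match. The record layout below is an assumption made for illustration: `PoolInfo`, its fields and `select_pool` are hypothetical, and exact-match comparison stands in for whatever matching rules the object manager applies to the discovered pool information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolInfo:
    """Hypothetical summary of what discovery reports about a pool."""
    name: str
    optimized_for: str  # "performance" or "capacity"
    remote: bool


def select_pool(pools, default, **criteria):
    """Return the first pool whose attributes equal all criteria, else default."""
    for pool in pools:
        if all(getattr(pool, key) == value for key, value in criteria.items()):
            return pool
    return default


pools = [
    PoolInfo("fast-local", "performance", remote=False),
    PoolInfo("dedup-local", "capacity", remote=False),
]
fallback = pools[0]
chosen = select_pool(pools, fallback, optimized_for="capacity", remote=False)
unmatched = select_pool(pools, fallback, optimized_for="capacity", remote=True)
```

The first request matches the capacity-optimized local pool; the second asks for a remote capacity-optimized pool, finds none, and falls back to the default.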

Mapping and Access

The object manager also provides sets of object management operations to allow and maintain the availability of these objects to external applications. The first set consists of operations for registering and unregistering the computers where the user's applications reside. The computers are registered by the identities typical of the storage network in use (e.g., Fibre Channel WWPN, iSCSI identity, etc.). The second set consists of "mapping" operations: when permitted by the storage pool from which an object is created, the data storage object can be "mapped," that is, made available for use to a computer on which a user application resides.

This availability takes a form appropriate to the storage, such as a block device presented on a SAN as a Fibre Channel disk or as an iSCSI device on a network, a file system on a file-sharing network, etc., and is usable by the operating system on the application computers. Similarly, an "unmapping" operation reverses the availability of the virtual storage device on the network to a user application. In this way, data stored for one application (i.e., a backup) can be made available to another application on another computer at a later time (i.e., a restore).

Data Mover 502

The data mover 502 is a software component within the object manager that reads and writes data among the various data storage objects 503 according to instructions received from the object manager for snapshot (point-in-time) copy requests and standard copy requests. The data mover provides operations for reading and writing data among instances of data objects throughout the system. The data mover also provides operations that allow querying and maintaining the state of long-running operations that the object manager has requested it to perform.

The data mover uses functionality from the pool functionality providers (see FIG. 6) to accomplish its operations. The snapshot functionality provider 608 allows the creation of a data object instance representing an initial object instance at a specific point in time. The difference engine functionality provider 614 is used to request a description of the differences between two data objects that are related in a temporal chain. For data objects stored on content-addressable pools, a special functionality is provided that can supply the differences between any two arbitrary data objects. This functionality is also provided for performance-optimized pools, in some cases by the underlying storage virtualization system and in other cases by a module that implements it on top of commodity storage. The data mover 502 uses the information about the differences to select the set of data that it copies between instances of data objects 503.

For a given pool, the difference engine provider supplies a specific representation of the differences between two states of a data storage object over time. For the snapshot provider, the changes between two points in time are recorded as writes to a given part of the data storage object. In one embodiment, the differences are represented as a bitmap in which each bit corresponds to one entry in an ordered list of data object regions, from the first region to the last, and a set bit indicates a modified region. This bitmap is derived from the copy-on-write bitmap used by the underlying storage virtualization system. In another embodiment, the differences are represented as a list of extents corresponding to the changed regions of the data. For the content-addressable storage provider 610, the representation is described below and is used to determine efficiently which parts of two content-addressable data objects differ.
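The two difference representations named above (a bitmap of fixed-size regions and a list of extents) carry the same information. The following is a minimal sketch of converting one into the other; the function name `bitmap_to_extents`, the `region_size` parameter, and the `(offset, length)` byte layout are illustrative assumptions and are not taken from the disclosure.

```python
from typing import List, Tuple


def bitmap_to_extents(bitmap: List[bool], region_size: int) -> List[Tuple[int, int]]:
    """Collapse runs of set bits into (byte_offset, byte_length) extents.

    Bit i is assumed to describe the region starting at i * region_size.
    """
    extents: List[Tuple[int, int]] = []
    run_start = None  # index of the first set bit in the current run
    for index, changed in enumerate(bitmap):
        if changed and run_start is None:
            run_start = index
        elif not changed and run_start is not None:
            extents.append((run_start * region_size, (index - run_start) * region_size))
            run_start = None
    if run_start is not None:  # a run that reaches the end of the bitmap
        extents.append((run_start * region_size, (len(bitmap) - run_start) * region_size))
    return extents
```

A bitmap is compact when changes are scattered, whereas an extent list is compact when changes are few and contiguous, which may be one reason both embodiments are described.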

The data mover uses this information to copy only the parts that differ, so that a new version of a data object can be created from an existing version by first duplicating the existing version, obtaining the list of differences, and then moving only the data that corresponds to the differences in the list. Data mover 502 traverses the list of differences, moving the indicated regions from the source data object to the target data object. (See the sections on data backup and restore using the object manager and data mover, below.)
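The duplicate-then-patch behavior described above can be sketched as follows. This is an illustration only: it models data objects as in-memory byte strings of equal size, and the name `copy_changed_regions` and the `(offset, length)` extent format are assumptions.

```python
from typing import Iterable, Tuple


def copy_changed_regions(
    source: bytes,
    previous_target: bytes,
    extents: Iterable[Tuple[int, int]],
) -> bytes:
    """Build a new target version from a duplicate of the previous version.

    Only the byte ranges listed in `extents` are read from `source`;
    everything else is inherited from `previous_target`.
    """
    target = bytearray(previous_target)      # duplicate the existing version
    for offset, length in extents:           # traverse the list of differences
        target[offset:offset + length] = source[offset:offset + length]
    return bytes(target)
```

The amount of data read from the source is proportional to the size of the differences, not to the size of the object.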

Copy Operation 506: Request Translation and Instructions

Object manager 501 instructs data mover 502 through a series of operations to copy data among the data objects in the virtual storage pools 418. The procedure comprises the following steps, starting when the instruction is received:

First, a create-collection request is issued. The name of the collection is returned.

Second, an object is added to the collection. The collection name from above is supplied, together with the name of the source data object to be copied and the names of two antecedents: the data object in the source storage resource pool against which differences are to be taken, and the corresponding data object in the target storage resource pool. This step is repeated for each source data object to be operated on in this collection.

Third, the copy request is prepared. The collection name is supplied, together with the storage resource pool that is to act as the target. The prepare command instructs the object manager to contact the storage pool manager to create the necessary target data objects corresponding to each of the sources in the collection. The prepare command also supplies the corresponding data object in the target storage resource pool that is to be duplicated, so that the provider can duplicate the supplied object and use the duplicate as the target object. A reference name for the copy request is returned.

Fourth, the copy request is activated. The reference name returned above is supplied. The data mover is instructed to copy each given source object to its corresponding target object. Each request includes the reference name, a sequence number describing the overall job (the entire set of source-target pairs), and a sequence number describing each individual source-target pair. In addition to the source-target pair, the names of the corresponding antecedents are supplied as part of the copy instruction.

Fifth, the copy engine uses the names of the data objects in the source pool to obtain, from the difference engine at the source, the differences between the antecedent and the source object. The indicated differences are then transferred from the source to the target. In one embodiment, these differences are transmitted as a bitmap and data. In another embodiment, these differences are transmitted as a list of extents and data.
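The first four steps above amount to a small request protocol: create a collection, add objects with their antecedents, prepare against a target pool, and activate. The sketch below models only the bookkeeping of that protocol. The class and method names (`CopyProtocolSketch`, `create_collection`, `add_object`, `prepare`, `activate`), the naming scheme for collections and references, and the dictionary layout of the emitted instructions are all hypothetical; the creation of target objects by the storage pool manager is not modeled.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CollectionEntry:
    source: str
    source_antecedent: str   # object to difference against in the source pool
    target_antecedent: str   # corresponding object in the target pool


@dataclass
class CopyProtocolSketch:
    collections: Dict[str, List[CollectionEntry]] = field(default_factory=dict)
    requests: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def create_collection(self) -> str:
        """Step 1: create a collection and return its name."""
        name = f"collection-{next(self._ids)}"
        self.collections[name] = []
        return name

    def add_object(self, collection: str, source: str,
                   source_antecedent: str, target_antecedent: str) -> None:
        """Step 2: add one source object and its two antecedents."""
        self.collections[collection].append(
            CollectionEntry(source, source_antecedent, target_antecedent))

    def prepare(self, collection: str, target_pool: str) -> str:
        """Step 3: bind the collection to a target pool; return a reference name."""
        reference = f"copy-{next(self._ids)}"
        self.requests[reference] = (collection, target_pool)
        return reference

    def activate(self, reference: str) -> List[dict]:
        """Step 4: emit one copy instruction per source-target pair."""
        collection, target_pool = self.requests[reference]
        job_sequence = next(self._ids)  # sequence number for the overall job
        return [
            {
                "reference": reference,
                "job": job_sequence,
                "pair": pair_sequence,  # sequence number for this pair
                "source": entry.source,
                "antecedents": (entry.source_antecedent, entry.target_antecedent),
                "target_pool": target_pool,
            }
            for pair_sequence, entry in enumerate(self.collections[collection], start=1)
        ]
```

Separating prepare from activate would let target objects be created up front for the whole collection before any data moves, which appears to be the intent of the step ordering.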

Data Storage Objects 503

Data storage objects are software constructs that permit the storage and retrieval of application data using idioms and methods familiar to computer data processing equipment and software. In practice, these currently take the form of either a SCSI block device on a storage network (for example, a SCSI LUN) or a content-addressable container, in which a designator for the content is constructed from the data itself and uniquely identifies that data. Data storage objects are created and maintained by issuing instructions to the pool manager. The actual storage that persists the application data is drawn from the virtual storage pool from which the data storage object is created.

The structure of a data storage object varies with the storage pool from which it is created. For objects that take the form of a block device on a storage network, the data structure for a given block device data object implements a mapping between the logical block address (LBA) of each block within the data object and the device identifier and LBA of the actual storage location. The identifier of the data object is used to identify the set of mappings to be used. The current embodiment relies on services provided by the underlying physical computer platform to implement this mapping, and relies on its internal data structures, such as bitmaps or extent lists.
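As a rough illustration of the LBA mapping just described, the sketch below keeps, per data object, a table from the object's logical block address to a `(device identifier, device LBA)` pair. The class name `BlockMappedObject` and its methods are assumptions; the disclosed embodiment delegates this mapping to the underlying platform rather than implementing it directly.

```python
from typing import Dict, Tuple


class BlockMappedObject:
    """A block-device data object as a table of LBA translations."""

    def __init__(self, object_id: str) -> None:
        self.object_id = object_id  # identifies which set of mappings to use
        self._mapping: Dict[int, Tuple[str, int]] = {}

    def map_block(self, lba: int, device_id: str, device_lba: int) -> None:
        """Record where the block at `lba` is physically stored."""
        self._mapping[lba] = (device_id, device_lba)

    def resolve(self, lba: int) -> Tuple[str, int]:
        """Translate an object LBA to (device identifier, device LBA)."""
        return self._mapping[lba]
```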

For objects that take the form of a content-addressable container, the content signature is used as the identifier, and the data object is stored as described below in the section on deduplication.

Pool Manager 504

Pool manager 504 is a software component for managing virtual storage resources and the associated functionality and characteristics, as described below. Object manager 501 and data mover 502 communicate with one or more pool managers 504 to maintain data storage objects 503.

Virtual Storage Resources 510

Virtual storage resources 510 are the various kinds of storage made available to the pool manager for implementing storage pool functions, as described below. In this embodiment, a storage virtualizer is used to present various external Fibre Channel or iSCSI storage LUNs to the pool manager 504 as virtualized storage.

Storage Pool Manager

FIG. 6 further illustrates the storage pool manager 504. The purpose of the storage pool manager is to present underlying virtual storage resources to the object manager/data mover as storage resource pools, which are abstractions of storage and data management functionality with common interfaces that are used by other components of the system. These common interfaces typically include a mechanism for identifying and addressing data objects associated with a specific temporal state, and a mechanism for producing differences between data objects in the form of bitmaps or extents. In this embodiment, the pool manager presents a primary storage pool, a performance-optimized pool, and a capacity-optimized pool. The common interfaces allow the object manager to create and delete data storage objects in these pools, either as copies of other data storage objects or as new objects, and allow the data mover to move data between data storage objects and to use the results of data object differencing operations.

The storage pool manager has the typical architecture of a common interface over diverse implementations of similar functionality: some functionality is provided by "smart" underlying resources, while other functionality must be implemented on top of less capable underlying resources.

Pool request broker 602 and pool function providers 604 are software modules that execute either in the same process as the object manager/data mover or in another process that communicates via a local or network protocol such as TCP. In this embodiment, the providers include the primary storage provider 606, the snapshot provider 608, the content-addressable provider 610, and the difference engine provider 614, each described further below. In another embodiment, the set of providers may be a superset of those shown here.

Virtual storage resources 510 are the different kinds of storage made available to the pool manager for implementing storage pool functions. In this embodiment, the virtual storage resources comprise sets of SCSI logical units from a storage virtualization system that runs on the same hardware as the pool manager and is accessible (for both data and management operations) through a programmatic interface. In addition to standard block storage functionality, additional capabilities are available, including creating and deleting snapshots and tracking the changed portions of volumes. In another embodiment, the virtual resources can come from an external storage system that exposes similar capabilities, or they may differ in interface (for example, accessed through a file system, or through a network interface such as CIFS, iSCSI, or CDMI), in capability (for example, whether the resource supports an operation to make a copy-on-write snapshot), or in non-functional aspects (for example, high-speed, limited-capacity storage such as solid-state disk versus low-speed, high-capacity storage such as SATA disk).

The available capabilities and interfaces determine which providers can consume the virtual storage resources, and which pool functionality needs to be implemented within the pool manager by one or more providers. For example, this implementation of the content-addressable storage provider requires only "dumb" storage, and the implementation is entirely within content-addressable provider 610; an underlying content-addressable virtual storage resource could be used instead with a simpler "pass-through" provider. Conversely, this implementation of the snapshot provider is mostly "pass-through" and requires storage that exposes a fast point-in-time copy operation.

Pool request broker 602 is a simple software component that services requests for storage pool specific functions by executing an appropriate set of pool function providers against the configured virtual storage resources 510. The requests that can be serviced include, but are not limited to: creating an object in a pool; deleting an object from a pool; writing data to an object; reading data from an object; copying an object within a pool; copying an object between pools; and requesting a summary of the differences between two objects in a pool.

Primary storage provider 606 enables management interfaces (for example, creating and deleting snapshots, and tracking changed portions of files) to a virtual storage resource that is also exposed directly to applications via an interface such as Fibre Channel, iSCSI, NFS, or CIFS.

Snapshot provider 608 implements the function of making a point-in-time copy of data from a primary storage pool. This creates the abstraction of another resource pool populated with snapshots. As implemented, the point-in-time copy is a copy-on-write snapshot of the object from the primary storage pool, consuming a second virtual storage resource to accommodate the copy-on-write copies, since this management functionality is exposed by the virtual storage resources used for primary storage and for the snapshot provider.

Difference engine provider 614 can satisfy a request to compare two objects in a pool that are connected in a temporal chain. The sections that differ between the two objects are identified and summarized in a provider-specific way, for example using bitmaps or extents. For example, the differing sections might be represented as a bitmap in which each set bit denotes a fixed-size region where the two objects differ, or the differences might be represented procedurally as a series of function calls or callbacks.

Depending on the virtual storage resource on which the pool is based, or on the other providers that implement the pool, a difference engine may produce its result efficiently in various ways. As implemented, a difference engine acting on a pool implemented via the snapshot provider uses the copy-on-write nature of the snapshot provider to track changes to objects that have had snapshots made. Consecutive snapshots of a single changing primary object thus have a record of the differences stored alongside them by the snapshot provider, and the difference engine for snapshot pools simply retrieves this record of changes. Also as implemented, a difference engine acting on a pool implemented via the content-addressable provider uses the efficient tree structure of the content-addressable implementation (see FIG. 12, below) to perform rapid comparisons between objects on demand.
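To see why a copy-on-write snapshot yields a difference record essentially for free, consider the simplified sketch below. It is not the disclosed implementation: the class `CowVolume`, its block-list model, and the per-snapshot `saved`/`changed` fields are assumptions, and chaining across multiple older snapshots is deliberately left out.

```python
from typing import Dict, List, Set


class CowVolume:
    """A toy volume whose latest snapshot records which blocks were overwritten."""

    def __init__(self, blocks: List[str]) -> None:
        self.blocks = list(blocks)
        self.snapshots: List[dict] = []

    def snapshot(self) -> dict:
        """Take a point-in-time snapshot; no data is copied at this moment."""
        snap: Dict[str, object] = {"saved": {}, "changed": set()}
        self.snapshots.append(snap)
        return snap

    def write(self, index: int, value: str) -> None:
        """Overwrite a block, preserving the old content on first write."""
        if self.snapshots:
            snap = self.snapshots[-1]
            saved: Dict[int, str] = snap["saved"]      # original block contents
            changed: Set[int] = snap["changed"]        # the difference record
            saved.setdefault(index, self.blocks[index])
            changed.add(index)
        self.blocks[index] = value
```

The `changed` set is exactly what a difference engine for a snapshot pool would need to hand back, so no comparison of block contents is required.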

Content-addressable provider 610 implements a write-once, read-many content-addressable interface to the virtual storage resource it consumes. It satisfies read, write, duplicate, and delete operations. Each written or copied object is identified by a unique handle that is derived from its content. The content-addressable provider is described further below (FIG. 11).
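A handle derived from content can be illustrated with a hash digest. The sketch below uses SHA-256 from Python's standard `hashlib` purely as an assumption; this passage does not say which signature function the provider uses, and the class `ContentAddressableStore` and its in-memory dictionary are hypothetical.

```python
import hashlib
from typing import Dict


class ContentAddressableStore:
    """A minimal write-once store keyed by a digest of the stored bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def write(self, data: bytes) -> str:
        """Store `data` and return its content-derived handle."""
        handle = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(handle, data)  # identical content is stored once
        return handle

    def read(self, handle: str) -> bytes:
        return self._blobs[handle]

    def delete(self, handle: str) -> None:
        self._blobs.pop(handle, None)
```

Because the handle depends only on the bytes, writing the same content twice returns the same handle and consumes no additional space.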

Pool Manager Operations

In operation, the pool request broker 602 accepts requests for data manipulation operations such as copy, snapshot, or delete on a pool or object. The request broker determines which provider code within pool manager 504 to execute by looking at the name or reference of the pool or object. The broker then translates the incoming service request into a form that can be handled by the specific pool function provider and invokes the appropriate sequence of provider operations.

For example, an incoming request could ask for a snapshot of a volume in the primary storage pool to be made into a snapshot pool. The incoming request identifies the object (volume) in the primary storage pool by name, and the combination of name and operation (snapshot) determines that the snapshot provider should be invoked, which can make point-in-time snapshots from the primary pool using the underlying snapshot capability. The snapshot provider translates the request into the exact form required by the native copy-on-write function performed by the underlying storage virtualization appliance, such as a bitmap or extents, and it translates the result of the native copy-on-write function into a storage volume handle that can be returned to the object manager and used in future requests to the pool manager.
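The broker's dispatch on the combination of pool name and operation can be sketched as a lookup table of handlers. The names below (`PoolRequestBrokerSketch`, `register`, `handle`) and the string-valued handle returned by the example provider are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple


class PoolRequestBrokerSketch:
    """Routes (pool, operation) requests to a registered provider function."""

    def __init__(self) -> None:
        self._providers: Dict[Tuple[str, str], Callable[[str], str]] = {}

    def register(self, pool: str, operation: str,
                 provider: Callable[[str], str]) -> None:
        self._providers[(pool, operation)] = provider

    def handle(self, pool: str, operation: str, object_name: str) -> str:
        """Select the provider from the pool name and operation, then invoke it."""
        try:
            provider = self._providers[(pool, operation)]
        except KeyError:
            raise ValueError(f"no provider for {operation!r} on pool {pool!r}")
        return provider(object_name)


# Example: a stand-in snapshot provider that returns a volume handle.
broker = PoolRequestBrokerSketch()
broker.register("primary", "snapshot", lambda name: f"snap-of-{name}")
```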

Optimal Way for Data Backup Using the Object Manager and Data Mover

The optimal way for data backup is a series of operations that produces successive versions of application data objects over time while minimizing the amount of data that must be copied, by using bitmaps, extents, and other temporal difference information stored at the object mover. It stores the application data in a data storage object and associates with it metadata that relates the various changes to the application data over time, so that changes over time can be readily identified.

In a preferred embodiment, the procedure comprises the following steps:

1. A mechanism provides an initial reference state, for example T0, of the application data within a data storage object;

2. Subsequent instances (versions) of the data storage object are created on demand over time in a virtual storage pool that has a difference engine provider;

3. Each successive version, for example T4 or T5, uses the difference engine provider of the virtual storage pool to obtain the differences between it and the instance created before it, so that T5 is stored as a reference to T4 together with the set of differences between T5 and T4;

4. The copy engine receives a request to copy data from one data object (the source) to another data object (the destination);

5. If the virtual storage pool in which the destination object is to be created contains no other objects created from previous versions of the source data object, a new object is created in the destination virtual storage pool and the entire contents of the source data object are copied to the destination object; the procedure is then complete. Otherwise, the next steps are followed;

6. If the virtual storage pool in which the destination object is to be created contains objects created from previous versions of the source data object, the most recently created previous version in the destination virtual storage pool is selected for which a corresponding previous version exists in the virtual storage pool of the source data object. For example, if a copy of T5 is initiated from a snapshot pool and the object created at time T3 is the most recent version available at the target, T3 is selected as the previous version;

7. A time-ordered list of the versions of the source data object is constructed, beginning with the initial version identified in the previous step and ending with the source data object that is about to be copied. In the example above, all states of the object are available at the snapshot pool, but only the states from T3 onward are of interest: T3, T4, T5;

8. A corresponding list of the differences between each pair of successive versions in the list is constructed, so that all differences from the beginning version to the final version in the list are represented. Each difference identifies which portion of the data has changed and includes the new data for the corresponding time. This creates the set of differences from the target version to the source version, for example the differences between T3 and T5;

9. The destination object is created by duplicating the previous version of the object identified in step 6 in the destination virtual storage pool, for example object T3 in the target store;

10. The set of differences identified in the list created in step 8 is copied from the source data object to the destination object; the procedure is then complete.
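Steps 5 through 8 above can be sketched as two small planning functions: one that chooses the antecedent and builds the version list, and one that merges the per-step differences into a single set. The function names, the use of version labels such as `"T3"`, and the modeling of a difference as a set of changed region indices are assumptions for illustration; versions are assumed to be unique and listed oldest first.

```python
from typing import Iterable, List, Optional, Sequence, Set, Tuple


def plan_incremental_copy(
    source_versions: Sequence[str],
    target_versions: Set[str],
    wanted: str,
) -> Tuple[Optional[str], List[str]]:
    """Return (antecedent, chain) for copying `wanted` to the target pool.

    antecedent is None when the target holds no earlier version (full copy);
    otherwise chain runs from the antecedent to `wanted` inclusive.
    """
    ordered = list(source_versions[: source_versions.index(wanted) + 1])
    common = [v for v in ordered[:-1] if v in target_versions]   # step 6
    if not common:
        return None, [wanted]                                    # step 5
    antecedent = common[-1]                                      # most recent
    return antecedent, ordered[ordered.index(antecedent):]       # step 7


def merge_differences(per_step_regions: Iterable[Set[int]]) -> List[int]:
    """Step 8: union the changed regions of each successive version pair."""
    merged: Set[int] = set()
    for regions in per_step_regions:
        merged |= regions
    return sorted(merged)
```

With versions T0 through T5 at the source and T0 and T3 at the target, the plan for T5 selects T3 as the antecedent and yields the chain T3, T4, T5, matching the example in the steps.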

Each data object within the destination virtual storage pool is complete; that is, it represents the entire data object and allows access to all of the application data as of the point in time, without requiring external reference to state or representations at other points in time. The object is accessible without replaying all deltas from a baseline state to the present state. Furthermore, duplicating the initial and subsequent versions of the data object in the destination virtual storage pool does not require exhaustive duplication of the application data contents therein. Finally, arriving at the second and subsequent states requires only the transmission of the changes tracked and maintained, as described above, without exhaustive traversal, transmission, or replication of the contents of the data storage object.

Optimal Way for Data Restore Using the Object Manager and Data Mover

Intuitively, the optimal way for data restore is the converse of the optimal way for data backup. The procedure to recreate the desired state of a data object in a destination virtual storage pool at a given point in time comprises the following steps:

1. A version of the data object is identified in another virtual storage pool that has a difference engine provider, corresponding to the desired state to be recreated. This is the source data object in the source virtual storage pool;

2. A previous version of the data object to be recreated is identified in the destination virtual storage pool;

3. If no data object is identified in step 2, a new destination object is created in the destination virtual storage pool and the data is copied from the source data object to the destination data object. The procedure is then complete. Otherwise, proceed with the following steps;

4. If a version of the data object is identified in step 2, the data object in the source virtual storage pool corresponding to the data object identified in step 2 is identified;

5. If no data object is identified in step 4, a new destination object is created in the destination virtual storage pool and the data is copied from the source data object to the destination data object. The procedure is then complete. Otherwise, proceed with the following steps;

6. A new destination data object is created in the destination virtual storage pool by duplicating the data object identified in step 2;

7. The difference engine provider of the source virtual storage pool is used to obtain the set of differences between the data object identified in step 1 and the data object identified in step 4;

8. The data identified by the list created in step 7 is copied from the source data object to the destination data object. The procedure is then complete.
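The eight restore steps can be traced in a compact sketch. Here each pool is modeled as a dictionary from version label to bytes, all versions are assumed to have equal size, and a naive region-by-region comparison stands in for the source pool's difference engine; the function name `restore`, the `timeline` argument, and the `region_size` default are all hypothetical.

```python
from typing import Dict, List


def restore(
    timeline: List[str],
    source_pool: Dict[str, bytes],
    dest_pool: Dict[str, bytes],
    desired: str,
    region_size: int = 4,
) -> int:
    """Recreate `desired` in dest_pool; return the number of bytes transferred."""
    wanted = source_pool[desired]                                   # step 1
    earlier = timeline[: timeline.index(desired)]
    previous = next((v for v in reversed(earlier) if v in dest_pool), None)  # step 2
    if previous is None or previous not in source_pool:             # steps 3 and 5
        dest_pool[desired] = wanted                                 # full copy
        return len(wanted)
    base = source_pool[previous]                                    # step 4
    target = bytearray(dest_pool[previous])                         # step 6
    transferred = 0
    for offset in range(0, len(wanted), region_size):               # step 7
        chunk = wanted[offset:offset + region_size]
        if chunk != base[offset:offset + region_size]:
            target[offset:offset + region_size] = chunk             # step 8
            transferred += len(chunk)
    dest_pool[desired] = bytes(target)
    return transferred
```

In a real difference engine the comparison in step 7 would be answered from stored bitmaps, extents, or hash trees rather than by reading both versions.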

Access to the desired state is complete: it requires no external reference to other containers or other states. Establishing the desired state given a reference state requires neither exhaustive traversal nor exhaustive transmission, only the retrieved changes indicated by the representations provided within the source virtual storage pool.

Service Level Agreement

FIG. 7 illustrates the service level agreement. The service level agreement captures the detailed business requirements with respect to secondary copies of the application data. In the simplest description, the business requirements define when and how often copies are created, how long they are retained, and in what type of storage pool these copies reside. This simplistic description does not capture several aspects of the business requirements. The frequency of copy creation for a given type of pool may not be uniform across all hours of the day or across all days of the week. Certain hours of the day, or certain days of a week or month, may represent more (or less) critical periods in the application data, and thus may call for more (or less) frequent copies. Similarly, all copies of application data in a particular pool may not need to be retained for the same length of time. For example, a copy of the application data created at the end of monthly processing may need to be retained for a longer period of time than a copy in the same storage pool created in the middle of a month.

The service level agreement 304 of certain embodiments is designed to represent all of these complexities that exist in the business requirements. The service level agreement has four main parts: the name, the description, the housekeeping attributes, and a collection of service level policies. As mentioned above, there is one SLA per application.

The name attribute 701 allows each service level agreement to have a unique name.

The description attribute 702 is where the user can provide a helpful description of the service level agreement.

The service level agreement also has a number of housekeeping attributes 703 that enable it to be maintained and revised. These attributes include, but are not limited to, the identity of the owner; the dates and times of creation, modification, and access; the priority; and enable/disable flags.

The service level agreement also contains a plurality of service level policies 705. Some service level agreements may have just a single service level policy; more typically, a single SLA may contain tens of policies.

In certain embodiments, each service level policy consists of at least the following: the source storage pool location 706 and type 708; the target storage pool location 710 and type 712; the frequency for the creation of copies 714, expressed as a period of time; the length of retention of the copy 716, expressed as a period of time; the hours of operation 718 during the day for this particular service level policy; and the days of the week, month, or year 720 on which this service level policy applies.

Each service level policy specifies a source and target storage pool, and the frequency of copies of application data that is desired between those storage pools. Furthermore, the service level policy specifies its hours of operation and the days on which it is applicable. Each service level policy is the representation of one single statement in the business requirements for the protection of application data. For example, if a particular application has a business requirement for an archive copy to be created each month after the monthly close and retained for three years, this might translate to a service level policy that requires a copy from the local backup storage pool into the long-term archive storage pool at midnight on the last day of the month, with a retention of three years.

All of the service level policies with a particular combination of source and destination pool and location, say the combination of a source primary storage pool and a destination local snapshot pool, when taken together specify the business requirements for creating copies into that particular destination pool. Business requirements may dictate, for example, that snapshot copies be created every hour during regular working hours, but only once every four hours outside of these times. Two service level policies with the same source and target storage pools will effectively capture these requirements in a form that can be put into practice by the service policy engine.
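The hourly-versus-four-hourly example above can be expressed as two policy records for the same source and target pools. The sketch below is an assumption-laden illustration: the `ServiceLevelPolicy` field names, the half-open operating window, and the rule that the shortest applicable frequency wins are choices made here, not details given in the text. For windows that wrap past midnight, the weekday of the queried moment is used, which is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class ServiceLevelPolicy:
    source_pool: str
    target_pool: str
    frequency: timedelta       # how often a copy is created
    retention: timedelta       # how long a copy is kept
    window_start: time         # hours of operation, start inclusive
    window_end: time           # end exclusive; may wrap past midnight
    weekdays: FrozenSet[int]   # 0 = Monday, as in datetime.weekday()

    def applies_at(self, moment: datetime) -> bool:
        if moment.weekday() not in self.weekdays:
            return False
        now = moment.time()
        if self.window_start <= self.window_end:
            return self.window_start <= now < self.window_end
        return now >= self.window_start or now < self.window_end


def copy_interval(policies: Iterable[ServiceLevelPolicy],
                  moment: datetime) -> Optional[timedelta]:
    """Return the shortest frequency among policies in force at `moment`."""
    active = [p.frequency for p in policies if p.applies_at(moment)]
    return min(active) if active else None


every_day = frozenset(range(7))
business_hours = ServiceLevelPolicy(
    "primary", "snapshot", timedelta(hours=1), timedelta(days=2),
    time(9, 0), time(17, 0), every_day)
off_hours = ServiceLevelPolicy(
    "primary", "snapshot", timedelta(hours=4), timedelta(days=2),
    time(17, 0), time(9, 0), every_day)
```

Evaluating `copy_interval([business_hours, off_hours], ...)` at different times of day shows how two simple policies together describe a non-uniform schedule.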

This form of service level agreement allows the schedule of daily, weekly, and monthly business activities to be represented, and therefore captures the business requirements for protecting and managing application data much more accurately than traditional RPO-based schemes. By allowing hours of operation and days, weeks, and months of the year to be specified, scheduling can occur on a "calendar basis."

Taken together, all of the service level policies with one particular combination of source and destination, for example "source: local primary; destination: local performance-optimized," capture the non-uniform data protection requirements for one type of storage. A single RPO number, on the other hand, forces a single uniform frequency of data protection across all times of day and all days. For example, a combination of service level policies may require a large number of snapshots to be preserved for a short time, such as 10 minutes, and a smaller number of snapshots to be preserved for a longer time, such as 8 hours. This allows a small amount of information that has been accidentally deleted to be reverted to a state not more than 10 minutes old, while still providing substantial data protection over longer time horizons without incurring the storage overhead of keeping all of the snapshots taken every ten minutes. As another example, the backup data protection function may be given one policy that operates with one frequency during the work week and another frequency during the weekend.

When service level policies for all of the different classes of source and destination storage are included, the service level agreement fully captures all of the data protection requirements for the entire application, including local snapshots, local long-duration stores, off-site storage, archives and so on. The collection of policies within an SLA can express when a given function should be performed, and can express multiple data management functions that should be performed on a given source of data.

Service level agreements are created and modified by the user through a user interface on a management workstation. These agreements are electronic documents stored by the service policy engine in a structured SQL database or other repository that it manages. The policies are retrieved, electronically analyzed and acted upon by the service policy engine through its normal scheduling, as described above.

FIG. 8 shows the application specific module 402. The application specific module runs close to the application 300 (described above) and interacts with the application and its operating environment to gather metadata and to query and control the application as required for data management operations.

The application specific module interacts with various components of the application and its operating environment, including application service processes and daemons 801, application configuration data 802, operating system storage services 803 (such as VSS and VDS on Windows), logical volume management and file system services 804, and operating environment drivers and modules 805.

The application specific module performs these operations in response to control commands from the service policy engine 406. These interactions with the application serve two purposes: metadata collection and application consistency.

Metadata collection is the process by which the application specific module collects metadata about the application. In some embodiments, the metadata includes information such as: configuration parameters for the application; the state and status of the application; control files and startup/shutdown scripts for the application; the location of the data files, journal and transaction logs for the application; and symbolic links, file system mount points, logical volume names and other such entities that can affect access to application data.

The metadata is collected and saved along with the application data and SLA information. This guarantees that each copy of application data within the system is self-contained and includes all of the details required to rebuild the application data.

Application consistency is the set of actions that ensure that, when a copy of the application data is created, the copy is valid and can be restored into a valid instance of the application. This is critical when business requirements dictate that the application be protected while it is live, in its online operational state. The application may have interdependent data relations within its data stores, and if these are not copied in a consistent state, the copy will not provide a valid restorable image.

The exact process of achieving application consistency varies from application to application. Some applications have a simple flush command that forces cached data to disk. Some applications support a hot backup mode, in which the application ensures that its operations are journaled in a manner that guarantees consistency even as application data is changing. Some applications require interaction with operating system storage services such as VSS and VDS to ensure consistency. The application specific module is purpose-built to work with a particular application and to ensure the consistency of that application. The application specific module interacts with the underlying storage virtualization device and the object manager to provide consistent snapshots of application data.

For efficiency, the preferred embodiment of the application specific module 402 runs on the same server as the application 300. This ensures minimal latency in interactions with the application and provides access to storage services and file systems on the application host. The application host is typically considered to be primary storage, which is then snapshotted to a performance-optimized store.

To minimize interruption of a running application, including minimizing preparatory steps, the application specific module is triggered to make a snapshot only when access to application data is required at a specific time and when a snapshot for that time does not exist elsewhere in the system, as tracked by the object manager. By tracking the times at which snapshots have been made, the object manager is able to fulfill subsequent data requests from the performance-optimized data store, including satisfying multiple requests for backup and replication that may issue from secondary, capacity-optimized pools. The object manager may be able to provide object handles to the snapshot in the performance-optimized store, and may direct the performance-optimized store in a native format specific to the format of the snapshot, which depends on the underlying storage appliance. In some embodiments, this format may be application data combined with one or more LUN bitmaps indicating which blocks have changed; in other embodiments, it may be specific extents. The format used for data transfer is thus able to transfer only the delta or difference between two snapshots, using bitmaps or extents.
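By way of illustration only, the relationship between a changed-block bitmap and the extents that need to be transferred may be sketched in Python as follows. The function name `bitmap_to_extents` and the 4096-byte block size are assumptions made for this sketch; the description does not specify a bitmap layout.

```python
def bitmap_to_extents(bitmap, block_size=4096):
    """Convert per-block 'changed' flags into (byte_offset, byte_length) extents.

    bitmap is a sequence of truthy/falsy values, one per block (assumed layout).
    Adjacent changed blocks are merged into a single extent.
    """
    extents = []
    start = None  # index of the first block in the current run of changes
    for i, changed in enumerate(bitmap):
        if changed and start is None:
            start = i
        elif not changed and start is not None:
            extents.append((start * block_size, (i - start) * block_size))
            start = None
    if start is not None:  # run of changes extends to the last block
        extents.append((start * block_size, (len(bitmap) - start) * block_size))
    return extents
```

Only the bytes inside the returned extents would need to be read from the newer snapshot in order to transfer the difference between the two snapshots.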

Metadata, such as the version number of the application, may also be stored for each application along with the snapshot. When an SLA policy is executed, the application metadata is read and used for the policy. This metadata is stored along with the data objects. For each SLA, the application metadata is read only once, during the lightweight snapshot operation, and preparatory operations that occur at that time, such as flushing caches, are performed only once during the lightweight snapshot operation, even though this copy of the application data, together with its metadata, may be used for multiple data management functions.

Service Policy Engine

FIG. 9 shows the service policy engine 406. The service policy engine contains the service policy scheduler 902, which examines all of the service level agreements configured by the user and makes scheduling decisions to satisfy them. It relies on several data stores to capture information and persist it over time, including, in some embodiments: an SLA store 904, where configured service level agreements are persisted and updated; a resource profile store 906, which stores resource profiles that provide a mapping between logical storage pool names and actual storage facilities; a protection catalog store 908, where information is cataloged about previous successful copies, created in the various facilities, that have not yet expired; and a centralized history store 910.

The history store 910 is where historical information about past activities is saved for use by all data management applications, including the timestamp, order and hierarchy of previous copies of each application into the various storage facilities. For example, a snapshot copy from a primary data store to a capacity-optimized data store that was initiated at 1 P.M. and is scheduled to expire at 9 P.M. will be recorded in the history store 910 in a temporal data store that also includes linked object data for snapshots of the same source and target that took place at 11 A.M. and 12 P.M.

These stores are managed by the service policy engine. For example, when the user, through the management workstation, creates a service level agreement or modifies one of the policies within it, it is the service policy engine that persists the new SLA in its store and reacts to the modification by scheduling copies as dictated by the SLA. Similarly, when the service policy engine successfully completes a data movement job that results in a new copy of an application in a storage pool, the service policy engine updates the history store so that the copy will be factored into future decisions.

The preferred embodiment of the various stores used by the service policy engine is in the form of tables in a redundant database management system in close proximity to the service policy engine. This ensures consistent transactional semantics when querying and updating the stores, and allows flexibility in retrieving interdependent data.

The scheduling algorithm of the service policy scheduler 902 is illustrated in FIG. 10. When the service policy scheduler decides that it needs to make a copy of application data from one storage pool to another, it initiates a data movement requestor and monitor task 912. These tasks are not recurring tasks and terminate when they are completed. Depending on the way the service level policies are specified, several of these requestors may be operating at the same time.

The service policy scheduler considers the priorities of service level agreements when determining which additional tasks to undertake. For example, if one service level agreement has a high priority because it specifies the protection of a mission-critical application, whereas another SLA has a lower priority because it specifies the protection of a test database, then the service policy engine may choose to run only the protection for the mission-critical application, and may postpone or even entirely skip the protection for the lower-priority application. This is accomplished by the service policy engine scheduling a higher-priority SLA ahead of a lower-priority SLA. In the preferred embodiment, in such a situation, the service policy engine will also trigger a notification event to the management workstation for auditing purposes.

Policy Scheduling Algorithm

FIG. 10 shows the flowchart of the policy scheduling engine. The policy scheduling engine continuously cycles through all of the SLAs defined. When it gets to the end of all of the SLAs, it sleeps for a short while, for example 10 seconds, and then resumes looking through the SLAs again. Each SLA encapsulates the complete data protection business requirements for one application; thus all of the SLAs together represent all of the applications.

For each SLA, starting from process state 1000 and iterating to the next SLA in the set of SLAs at 1002, the scheduling engine collates all of the service level policies that have the same source pool and destination pool (1004). Taken together, this subset of service level policies represents all of the requirements for a copy from that source storage pool to that particular destination storage pool.

Among this subset of service level policies, the service policy scheduler discards the policies that are not applicable today or that are outside their hours of operation. Among the policies that remain, it finds the policy with the shortest interval between copies, that is, the highest frequency (1006), and, based on the historical data in the history store 910, finds the policy with the longest retention that needs to be run next (1008).

Next, there is a series of checks 1010-1014 that rule out making a new copy of the application data at this time: because a new copy is not yet due, because a copy is already in progress, or because there is no new data to copy. If any of these conditions applies, the service policy scheduler moves to the next combination of source and destination pools (1004). If none of these conditions applies, a new copy is initiated. The copy is executed as specified in the corresponding service level policy within this SLA (1016).

Next, the scheduler moves to the next source and destination pool combination for the same service level agreement (1018). If there are no more distinct combinations, the scheduler moves on to the next service level agreement (1020).

After the service policy scheduler has been through all source/destination pool combinations of all service level agreements, it pauses for a short period and then resumes the cycle.
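As a non-limiting sketch, the decision made for one source/destination pool combination (steps 1006 through 1016) may be expressed in Python as follows. The function `decide_copy`, the dictionary keys and the boolean inputs are hypothetical; in the described system the corresponding facts would be drawn from the SLA store and the history store 910.

```python
from datetime import datetime, timedelta

def decide_copy(policies, last_copy, in_progress, has_new_data, now):
    """Return the retention to tag a new copy with, or None if no copy is started.

    policies: list of dicts with 'interval' and 'retention' (timedelta) and
              'applies' (callable taking now and returning bool); assumed shape.
    last_copy: datetime of the most recent successful copy, or None.
    """
    # Discard policies that do not apply today or at this time.
    active = [p for p in policies if p["applies"](now)]
    if not active:
        return None
    interval = min(p["interval"] for p in active)     # 1006: shortest interval
    retention = max(p["retention"] for p in active)   # 1008: longest retention
    # Checks 1010-1014, in the order the text lists them.
    if last_copy is not None and now - last_copy < interval:
        return None          # a new copy is not yet due
    if in_progress:
        return None          # a copy is already in progress
    if not has_new_data:
        return None          # there is no new data to copy
    return retention         # 1016: initiate a single copy with this retention

NOON = datetime(2011, 11, 16, 12)
HOURLY = {"interval": timedelta(hours=1), "retention": timedelta(hours=4),
          "applies": lambda now: True}
TWO_HOURLY = {"interval": timedelta(hours=2), "retention": timedelta(hours=8),
              "applies": lambda now: ((now - NOON) // timedelta(hours=1)) % 2 == 0}
```

A single returned retention value reflects the design choice described here: one copy operation is made to satisfy every policy in effect, rather than one copy per policy.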

A simple example system with a snapshot store and a backup store, with only two policies defined, would interact with the service policy scheduler as follows. Given two policies, one stating "backup every hour, the backup to be kept for 4 hours" and another stating "backup every 2 hours, the backup to be kept for 8 hours," the result is a single snapshot taken each hour, each snapshot being copied to the backup store, with snapshots retained for different amounts of time at both the snapshot store and the backup store. The "backup every 2 hours" policy is scheduled by the system administrator to go into effect at 12:00 P.M.

At 4:00 P.M., when the service policy scheduler begins operating at step 1000, it finds the two policies at step 1002. (Both policies apply because a multiple of two hours has elapsed since 12:00 P.M.) There is only one source and destination pool combination at step 1004. There are two frequencies at step 1006, and the system selects the 1-hour frequency because it is shorter than the 2-hour frequency. There are two operations with different retentions at step 1008, and the system selects the operation with the 8-hour retention, as it has the longer retention value. Instead of one copy being made to satisfy the 4-hour requirement and another copy being made to satisfy the 8-hour requirement, the two requirements are coalesced into the longer 8-hour requirement and are satisfied by a single snapshot copy operation. The system determines that a copy is due at step 1010, and checks the relevant objects in the history store 910 to determine whether the copy has already been made at the target (at step 1012) and at the source (at step 1014). If these checks pass, the system initiates the copy at step 1016, and in the process triggers a snapshot to be made and saved at the snapshot store. The snapshot is then copied from the snapshot store to the backup store. The system then goes to sleep (1022) and wakes up again after a short period, such as 10 seconds. The result is a copy at the backup store and a copy at the snapshot store, where every even-hour snapshot lasts for 8 hours and every odd-hour snapshot lasts for 4 hours. The even-hour snapshots at the backup store and the snapshot store are both tagged with the 8-hour retention period and will be automatically deleted from the system by another process at that time.

Note that there is no reason to take two snapshots or make two backup copies at 2 o'clock, even though both policies apply, because both policies are satisfied by a single copy. Combining and coalescing these snapshots reduces unneeded operations while retaining the flexibility of multiple separate policies. It may also be helpful to have two policies active at the same time for the same target with different retention periods. In the example given, more hourly copies are kept than two-hourly copies, resulting in finer granularity for restoring to times closer to the present. For example, in the system above, if at 7:30 P.M. damage from earlier in the afternoon is discovered, a backup will be available for each hour of the past four hours: 4, 5, 6 and 7 P.M. In addition, two more backups will have been retained, from 2 P.M. and 12 P.M.
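The retention arithmetic in this example can be checked with a short Python sketch. The function `surviving_copies` is a hypothetical helper written for this illustration; hours are expressed on a 24-hour clock, with copies taken on the hour starting at 12 (noon), even-hour copies kept for 8 hours and odd-hour copies kept for 4 hours.

```python
def surviving_copies(now_hour, first_hour=12):
    """List the hours of the copies that have not yet expired at now_hour."""
    alive = []
    hour = first_hour
    while hour <= now_hour:
        # The 2-hourly policy coincides with the hourly one every second hour.
        two_hourly = (hour - first_hour) % 2 == 0
        retention = 8 if two_hourly else 4
        if hour + retention > now_hour:
            alive.append(hour)
        hour += 1
    return alive

# At 7:30 P.M. (19.5): copies from 12, 14, 16, 17, 18 and 19 o'clock remain.
print(surviving_copies(19.5))
```

The copies from 1 P.M. and 3 P.M. carry only the 4-hour retention and have therefore expired by 7:30 P.M., which matches the six backups listed in the example.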

Content Addressable Store

FIG. 11 is a block diagram of the modules implementing the content addressable store for the content addressable provider 510.

The content addressable store 510 implementation provides a storage resource pool that is optimized for capacity rather than for copy-in or copy-out speed, as would be the case for the performance-optimized pool implemented through snapshots, described earlier, and is thus typically used for offline backup, replication and remote backup. A content addressable store provides a way of storing common subsets of different objects only once, where those common subsets may be of varying sizes but are typically as small as 4 KiB. The storage overhead of a content addressable store is low compared with a snapshot store, though the access time is usually higher. Generally, objects in a content addressable store have no intrinsic relationship to one another, even though they may share a large percentage of their content; in this implementation, however, a historical relationship is also maintained, which enables various optimizations to be described. This contrasts with a snapshot store, where snapshots intrinsically form a chain, each storing just the deltas from a previous snapshot or baseline copy. In particular, a content addressable store will store only one copy of a data subset that is repeated multiple times within a single object, whereas a snapshot-based store will store at least one full copy of any object.

The content addressable store 510 is a software module that executes on the same system as the pool manager, either in the same process or in a separate process communicating via a local transport such as TCP. In this embodiment, the content addressable store module runs in a separate process so as to minimize the impact of software failures from different components.

The purpose of this module is to allow data storage objects 403 to be stored in a highly space-efficient manner by deduplicating content (that is, by ensuring that repeated content within a single data object or across multiple data objects is stored only once).

The content addressable store module provides services to the pool manager via a programmatic API. These services include the following:

Object to handle mapping 1102: an object can be created by writing data into the store via the API; once the data is written completely, the API returns an object handle determined by the content of the object. Conversely, data may be read as a stream of bytes from an offset within an object by providing the handle. Details of how the handle is constructed are explained in connection with the description of FIG. 12.

Temporal tree management 1104 tracks parent/child relationships between the data objects stored. When a data object is written into the store 510, the API allows it to be linked as a child to a parent object already in the store. This indicates to the content addressable store that the child object is a modification of the parent. A single parent may have multiple children with different modifications, as might be the case, for example, if an application's data were saved into the store regularly for some while, and then an early copy were restored and used as a new starting point for subsequent modifications. Temporal tree management operations and the data model are described in more detail below.

The difference engine 1106 can generate a summary of the regions that differ between two arbitrary objects in the store. The differencing operation is invoked via an API specifying the handles of the two objects to be compared, and the difference summary takes the form of a sequence of callbacks giving the offset and size of each contiguous differing section. The difference is calculated by comparing the hashed representations of the two objects in parallel.

The garbage collector 1108 is a service that analyzes the store to find saved data that is not referenced by any object handle and to reclaim the storage space committed to that data. It is the nature of the content addressable store that much data is referenced by multiple object handles, that is, the data is shared between data objects; some data will be referenced by a single object handle; but data that is referenced by no object handle (as might be the case if an object handle is deleted from the content addressable system) can be safely overwritten by new data.

The object replicator 1110 is a service that duplicates data objects between two different content addressable stores. Multiple content addressable stores may be used to satisfy additional business requirements, such as offline backup or remote backup.
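As an illustration of the difference engine service listed above, the parallel comparison of two hashed representations may be sketched in Python as follows. The names `chunk_hashes` and `diff`, the flat list of per-chunk hashes and the 4-byte chunk size are simplifying assumptions made for this sketch and are not taken from the embodiment.

```python
import hashlib

CHUNK = 4  # deliberately tiny chunk size, for illustration only

def chunk_hashes(data, chunk=CHUNK):
    """Hash each fixed-size chunk of an object (flat list; assumed representation)."""
    return [hashlib.sha1(data[i:i + chunk]).digest() for i in range(0, len(data), chunk)]

def diff(hashes_a, hashes_b, callback, chunk=CHUNK):
    """Walk two hash lists in parallel; call callback(offset, size) per differing run."""
    count = max(len(hashes_a), len(hashes_b))
    start = None  # first chunk index of the current differing run
    for i in range(count):
        a = hashes_a[i] if i < len(hashes_a) else None
        b = hashes_b[i] if i < len(hashes_b) else None
        if a != b:
            if start is None:
                start = i
        elif start is not None:
            callback(start * chunk, (i - start) * chunk)
            start = None
    if start is not None:
        callback(start * chunk, (count - start) * chunk)

regions = []
diff(chunk_hashes(b"aaaabbbbccccdddd"), chunk_hashes(b"aaaaXXXXYYYYdddd"),
     lambda offset, size: regions.append((offset, size)))
```

Because only hashes are compared, the differing regions are found without reading the underlying data of either object.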

These services are implemented using the functional modules shown in FIG. 11. The data hash module 1112 generates fixed-length keys for data chunks up to a fixed size limit. For example, in this embodiment the maximum size of chunk for which the hash generator will make a key is 64 KiB. The fixed-length key is either a hash, tagged to indicate the hashing scheme used, or a lossless algorithmic encoding. The hashing scheme used in this embodiment is SHA-1, which generates a secure cryptographic hash with a uniform distribution and a probability of hash collision near enough to zero that no facility needs to be incorporated into this system to detect and deal with collisions.
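A minimal sketch of such a tagged fixed-length key, written in Python, is given below. The one-byte tags `H` and `R`, the 21-byte key length and the rule that a chunk consisting of a single repeated byte is run-length encoded are all assumptions made for illustration; the embodiment does not specify the key layout.

```python
import hashlib

MAX_CHUNK = 64 * 1024   # size limit stated for this embodiment
KEY_LEN = 1 + 20        # assumed layout: 1 tag byte + 20-byte SHA-1 digest

def make_key(chunk: bytes) -> bytes:
    """Return a fixed-length key: an RLE encoding if possible, else a tagged hash."""
    if not 0 < len(chunk) <= MAX_CHUNK:
        raise ValueError("chunk must be between 1 byte and 64 KiB")
    if len(set(chunk)) == 1:
        # Lossless encoding: tag, the repeated byte, the run length, zero padding.
        encoded = b"R" + chunk[:1] + len(chunk).to_bytes(4, "big")
        return encoded.ljust(KEY_LEN, b"\0")
    return b"H" + hashlib.sha1(chunk).digest()

def expand_key(key: bytes):
    """Recover the chunk from an RLE key; return None for a hash key."""
    if key[:1] == b"R":
        return key[1:2] * int.from_bytes(key[2:6], "big")
    return None  # hash keys must be resolved through the index instead
```

A key tagged as an encoding carries the full content of its chunk, so such chunks need no entry in the data store at all.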

The data handle cache 1114 is a software module managing an in-memory database that provides ephemeral storage for data and for handle-to-data mappings.

The persistent handle management index 1104 is a reliable persistent database of content addressable handle (CAH) to data mappings. In this embodiment it is implemented as a B-tree, mapping hashes from the hash generator to pages in the persistent data store 1118 that contain the data for each hash. Since the full B-tree cannot be held in memory at one time, for efficiency this embodiment also uses an in-memory Bloom filter to avoid expensive B-tree searches for hashes known not to be present.
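The role of the in-memory Bloom filter can be illustrated with the following Python sketch. The classes `BloomFilter` and `HandleIndex`, the filter size, the number of hash functions and the use of a dictionary in place of the on-disk B-tree are all assumptions made for illustration.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""
    def __init__(self, bits=1 << 16, hashes=4):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, key: bytes):
        # Derive several bit positions by salting the key with the hash number.
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key: bytes):
        for p in self._positions(key):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(key))

class HandleIndex:
    """Hash-to-page index; a dict stands in for the persistent B-tree."""
    def __init__(self):
        self.filter = BloomFilter()
        self.btree = {}

    def put(self, key: bytes, page: int):
        self.filter.add(key)
        self.btree[key] = page

    def get(self, key: bytes):
        if not self.filter.might_contain(key):
            return None              # definitely absent: skip the costly search
        return self.btree.get(key)   # present, or a rare false positive
```

The filter answers "definitely absent" from memory alone, so the expensive search is reserved for keys that are probably present.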

The persistent data storage module 1118 stores data and handles to long-term persistent storage, returning a token indicating where the data is stored. The handle/token pair is subsequently used to retrieve the data. As data is written to persistent storage, it passes through a layer of lossless data compression 1120, implemented in this embodiment using zlib, and an optional layer of reversible encryption 1122, which is not enabled in this embodiment.

For example, copying a data object into the content addressable store is an operation provided by the object/handle mapper service, since an incoming object will be stored and a handle will be returned to the requestor. The object/handle mapper reads the incoming object, requests hashes to be generated by the data hash generator, stores the data to persistent data storage and stores the handle to the persistent handle management index. The data handle cache is kept updated for fast future lookups of the data for the handle. Data stored to persistent data storage is compressed and (optionally) encrypted before being written to disk. Typically, a request to copy in a data object will also invoke the temporal tree management service to make a history record for the object, and this is also persisted via persistent data storage.

As another example, copying a data object out of the content addressable store, given its handle, is another operation provided by the object/handle mapper service. The handle is looked up in the data handle cache to locate the corresponding data; if the data is missing from the cache, the persistent index is used; once the data is located on disk, it is retrieved via the persistent data storage module (which decrypts and decompresses the disk data) and then reconstituted for return to the requestor.
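The copy-in and copy-out paths described in the two examples above may be sketched for a single chunk in Python as follows. The class `TinyCAS` and its attributes are hypothetical: a dictionary stands in for the persistent handle management index, a list for persistent data storage, and the optional encryption layer is omitted, as it is not enabled in this embodiment.

```python
import hashlib
import zlib

class TinyCAS:
    """Single-chunk sketch of the copy-in and copy-out paths."""
    def __init__(self):
        self.index = {}   # hash -> token (stand-in for the persistent index)
        self.pages = []   # compressed chunks (stand-in for persistent storage)
        self.cache = {}   # hash -> chunk (stand-in for the data handle cache)

    def put(self, chunk: bytes) -> bytes:
        """Copy in: hash the chunk, store it once, and return its key."""
        key = hashlib.sha1(chunk).digest()
        if key not in self.index:                     # new content only
            self.pages.append(zlib.compress(chunk))   # compress before writing
            self.index[key] = len(self.pages) - 1     # token locating the data
        self.cache[key] = chunk                       # keep the cache updated
        return key

    def get(self, key: bytes) -> bytes:
        """Copy out: try the cache first, then the index and the stored page."""
        if key in self.cache:
            return self.cache[key]
        chunk = zlib.decompress(self.pages[self.index[key]])
        self.cache[key] = chunk
        return chunk
```

Writing the same content twice returns the same key and consumes no additional space, which is the deduplicating behavior the module is intended to provide.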

Content Addressable Store Handle

FIG. 12 shows how the handle for a content addressed object is generated. The data object manager references all content addressable objects by a content addressable handle. This handle is made up of three parts. The first part 1201 is the size of the underlying data object that the handle immediately points to. The second part 1202 is the depth of the object it points to. The third part 1203 is a hash of the object it points to. Field 1203 optionally includes a tag indicating that, instead of a hash, it holds a lossless encoding of the underlying data. The tag indicates the encoding scheme used, such as a form of run-length encoding (RLE) of the data used as an algorithmic encoding, if the data chunk can be represented fully as a short enough RLE. If the underlying data object is too large to be represented as a lossless encoding, a mapping from the hash to a pointer or reference to the data is stored separately in the persistent handle management index 1104.

The data for a content addressable object is broken up into chunks 1204. The size of each chunk must be addressable by one content addressable handle 1205. The data is hashed by the data hash module 1112, and the hash of the chunk is used to make the handle. If the data of the object fits in one chunk, then the handle created is the final handle of the object. If not, then the handles themselves are grouped together into chunks 1206, and a hash is generated for each group of handles. This grouping of handles continues (1207) until only one handle 1208 is produced, which is then the handle for the object.

When an object is to be reconstituted from a content handle (the copy-out operation for the storage resource pool), the top-level content handle is dereferenced to obtain a list of next-level content handles. These are dereferenced in turn to obtain further lists of content handles, until depth-0 handles are obtained. These are expanded to data, either by looking the handle up in the handle management index or cache, or (in the case of an algorithmic hash such as run-length encoding) by expanding deterministically to the full content.
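As a non-limiting illustration, the construction of a handle by repeated grouping, and the reverse dereferencing on copy-out, may be sketched in Python as follows. The `Handle` tuple, the serialization in `pack` and `unpack_all`, the 8-byte chunk size and the fan-out of two handles per group are assumptions chosen to keep the sketch small; the lossless-encoding case of field 1203 is omitted.

```python
import hashlib
from collections import namedtuple

Handle = namedtuple("Handle", "size depth digest")  # mirrors parts 1201-1203
CHUNK = 8     # bytes per data chunk (illustrative)
FANOUT = 2    # handles per group (illustrative)
RECORD = 8 + 1 + 20

def pack(h):
    return h.size.to_bytes(8, "big") + bytes([h.depth]) + h.digest

def unpack_all(blob):
    return [Handle(int.from_bytes(blob[i:i + 8], "big"), blob[i + 8], blob[i + 9:i + RECORD])
            for i in range(0, len(blob), RECORD)]

def build_handle(data, store):
    """Chunk the data, then group handles level by level until one remains."""
    if not data:
        raise ValueError("empty object")
    level = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha1(chunk).digest()
        store[digest] = chunk                      # depth-0 handles point at data
        level.append(Handle(len(chunk), 0, digest))
    while len(level) > 1:
        grouped = []
        for i in range(0, len(level), FANOUT):
            blob = b"".join(pack(h) for h in level[i:i + FANOUT])
            digest = hashlib.sha1(blob).digest()
            store[digest] = blob                   # higher handles point at handle lists
            grouped.append(Handle(len(blob), level[i].depth + 1, digest))
        level = grouped
    return level[0]

def read_object(handle, store):
    """Dereference handles recursively until depth-0 data is reached."""
    blob = store[handle.digest]
    if handle.depth == 0:
        return blob
    return b"".join(read_object(h, store) for h in unpack_all(blob))
```

Because every handle is derived purely from content, two objects that share a run of chunks also share the handles, and the stored handle lists, that cover that run.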

Temporal Tree Management

FIG. 13 shows the temporal tree relationship created for data objects stored within the content addressable store. This particular data structure is utilized only within the content addressable store. The temporal tree management module maintains data structures 1302 in the persistent store that associate each content addressed data object with a parent (which may be null, to indicate the first in a sequence of revisions). The individual nodes of the tree contain a single hash value. This hash value references a chunk of data, if the hash is a depth-0 hash, or a list of other hashes, if the hash is a depth-1 or higher hash. The references mapped to a hash value are contained in the persistent handle management index 1104. In some embodiments the edges of the tree may have weights or lengths, which may be used in an algorithm for finding neighbors.

This is a standard tree data structure, and the module supports standard manipulation operations, in particular: add 1310, which adds a leaf below a parent and results in the change to the tree between the initial state 1302 and the after-add state 1304; and remove 1312, which removes a node (and reparents its children to the removed node's parent) and results in the change to the tree between the after-add state 1304 and the after-remove state 1306.

The "add" operation is used whenever an object is copied into the CAS from an external pool. If the copy-in is via the optimal way for data backup, or if the object originates in a different CAS pool, then the predecessor object must be specified, and the add operation is invoked to record this predecessor/successor relationship.

The "remove" operation is invoked by the object manager when the policy manager determines that an object's retention period has expired. This may leave data stored in the CAS with no object in the temporal tree referring to it, so that a subsequent garbage collection pass can free the storage space held by that data for reuse.

Note that a single predecessor may have multiple successors or child nodes. This may occur, for example, if an object is originally created at time T1 and modified at time T2, the modification is rolled back via a restore operation, and a subsequent modification is made at time T3. In this example, state T1 has two children, state T2 and state T3.
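The add and remove operations, and the case of one predecessor with several children, may be illustrated with the following minimal sketch. The class name `TemporalTree` and the parent-map representation are assumptions made for illustration; edge weights and persistence are omitted.

```python
class TemporalTree:
    """Maps each object handle to its parent handle (None for a root)."""

    def __init__(self):
        self.parent = {}

    def add(self, handle, parent=None):
        """Add a leaf below the given parent."""
        if parent is not None and parent not in self.parent:
            raise KeyError("unknown parent")
        self.parent[handle] = parent

    def children(self, handle):
        """Return the handles whose recorded parent is the given handle."""
        return [h for h, p in self.parent.items() if p == handle]

    def remove(self, handle):
        """Remove a node and reparent its children to the removed node's parent."""
        grandparent = self.parent.pop(handle)
        for child in self.children(handle):
            self.parent[child] = grandparent


tree = TemporalTree()
tree.add("T1")
tree.add("T2", "T1")   # modification made at time T2
tree.add("T3", "T1")   # made after restoring to T1, so T1 has two children
```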

Different CAS pools may be used to accomplish different business objectives, such as providing disaster recovery at a remote location. When copying from one CAS to another, the copy may be sent as hashes and offsets, to take advantage of the native deduplication capabilities of the target CAS. The underlying data pointed to by any new hash is also sent, on an as-needed basis.

The temporal tree structure is read or traversed as part of the implementation of various services:

• Garbage collection traverses the tree in order to reduce the cost of the "mark" phase, as described below.

• Replication to a different CAS pool finds a set of near neighbors in the temporal tree that are also known to have already been transferred to the other CAS pool, so that only a small set of differences needs to be transferred additionally.

• The optimal way for data restore uses the temporal tree to find a predecessor that can be used as a basis for the restore operation. In the CAS temporal tree data structure, children are subsequent versions, for example as dictated by archive policy. Multiple children are supported on the same parent node; this case may arise when a parent node is changed, then used as the basis for a restore, and subsequently changed again.

CAS Difference Engine

The CAS difference engine 1106 compares two objects identified by hash values or handles, as in FIGS. 11 and 12, and produces a sequence of offsets and ranges within the objects at which the object data is known to differ. This sequence is obtained by traversing the two object trees in parallel in the hash data structure of FIG. 12. The tree traversal is a standard depth-first or breadth-first traversal. During traversal, the hashes at the current depth are compared. Where the hash of a node is identical on both sides, there is no need to descend further down the tree, so the traversal may be pruned. If the hashes of a node are not identical, the traversal continues descending into the next lower level of the tree. If the traversal reaches a depth-0 hash that is not identical to its counterpart, then the absolute offset within the compared data objects at which the non-identical data occurs, together with the data length, is emitted into the output sequence. If one object is smaller than the other, its traversal completes earlier, and all subsequent offsets encountered in the traversal of the other tree are emitted as differences.
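The parallel traversal may be sketched, under stated assumptions, as a depth-first comparison of two small hash trees. The `Node` type, the tiny block size and fan-out, and the restriction to trees of equal depth are simplifications made for the sketch; an implementation would need to align trees whose depths differ.

```python
import hashlib
from dataclasses import dataclass, field
from itertools import zip_longest


@dataclass
class Node:
    digest: bytes
    offset: int      # absolute offset of the data covered by this node
    length: int      # number of bytes covered by this node
    children: list = field(default_factory=list)  # empty at depth 0


def build_tree(data: bytes, block: int = 4, fanout: int = 2) -> Node:
    """Build a hash tree over data; block and fanout are illustrative values."""
    level = []
    for i in range(0, max(len(data), 1), block):
        piece = data[i:i + block]
        level.append(Node(hashlib.sha256(piece).digest(), i, len(piece)))
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            digest = hashlib.sha256(b"".join(n.digest for n in group)).digest()
            parents.append(Node(digest, group[0].offset, sum(n.length for n in group), group))
        level = parents
    return level[0]


def diff(a, b):
    """Return (offset, length) ranges at which the two objects are known to differ."""
    if a is None or b is None:            # one object ended earlier than the other
        node = a if a is not None else b
        return [(node.offset, node.length)]
    if a.digest == b.digest:
        return []                         # identical subtree, so prune the descent
    if not a.children or not b.children:  # differing depth-0 hashes
        return [(a.offset, max(a.length, b.length))]
    ranges = []
    for child_a, child_b in zip_longest(a.children, b.children):
        ranges.extend(diff(child_a, child_b))
    return ranges
```

Comparing the 16-byte objects `aaaabbbbccccdddd` and `aaaaBBBBccccdddd` with these settings visits only the differing half of the tree and emits the single range at offset 4 with length 4.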

Garbage Collection via Differencing

As described under FIG. 11, the garbage collector is a service that analyzes a particular CAS store to find stored data that is not referenced by any object handle in the CAS store's temporal data structure, and to reclaim the storage space committed to that data. Garbage collection uses a standard "mark and sweep" approach. Because the "mark" phase may be quite expensive, the algorithm used for the mark phase attempts to minimize marking the same data multiple times, even though it may be referenced many times. The mark phase must nevertheless be complete, ensuring that no referenced data is left unmarked, since that would result in data loss from the store: data left unmarked after the sweep phase will later be overwritten by new data.

The algorithm for marking referenced data exploits the fact that objects in the CAS are arranged in graphs with temporal relationships, using the data structure depicted in FIG. 13. Objects that share an edge in these graphs are likely to differ in only a small subset of their data, and it is rare for a new data block that appears when an object is created from a predecessor to appear again between any two other objects. The mark phase of garbage collection therefore processes each connected component of the temporal graph.

FIG. 14 is an example of garbage collection using temporal relationships in certain embodiments. A depth-first search of the data structure containing the temporal relationships is performed, as represented by arrows 1402. A starting node 1404 is chosen, from which the tree traversal begins. Node 1404 is the tree root and references no objects. Node 1406 contains references to objects H₁ and H₂, denoting the hash value of object 1 and the hash value of object 2. All depth-0, depth-1 and higher data objects referenced by node 1406 (here H₁ and H₂) are enumerated and marked as referenced.

Next, node 1408 is processed. Because it shares an edge with node 1406, which has been marked, the difference engine is applied to the objects referenced by 1406 and the objects referenced by 1408, yielding the set of depth-0, depth-1 and higher hashes that exist in the unmarked object but not in the marked object. In the figure, the hash that exists in node 1408 but not in node 1406 is H₃, so H₃ is marked as referenced. This process continues until all edges are exhausted.

A comparison of the results produced by a prior art algorithm 1418 with those of the present embodiment 1420 shows that when node 1408 is processed by the prior art algorithm, the previously seen hashes H₁ and H₂ are emitted into the output stream along with the new hash H₃. The present embodiment 1420 does not emit previously seen hashes into the output stream, so that only the new hashes H₃, H₄, H₅, H₆ and H₇ are emitted, with a corresponding improvement in performance. Note that this method does not guarantee that data will not be marked more than once. For example, if hash value H₄ occurs independently in node 1416, it will be independently marked a second time.
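A minimal sketch of such a mark phase follows. The tree shape and node names are hypothetical and do not reproduce FIG. 14, and a set difference stands in for the difference engine; it is offered only to show why hashes shared along an edge are not emitted again, while an independently recurring hash may be.

```python
def mark(tree, objects, root):
    """Depth-first walk of the temporal tree; return the stream of hashes marked.

    tree maps a node to its list of child nodes; objects maps a node to the
    set of hashes it references (an assumed, flattened representation).
    """
    emitted = []
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        current = objects.get(node, set())
        baseline = objects.get(parent, set()) if parent is not None else set()
        # Stand-in for the difference engine: hashes present in this object
        # but absent from its already-marked neighbor.
        emitted.extend(sorted(current - baseline))
        for child in reversed(tree.get(node, [])):
            stack.append((child, node))
    return emitted


# Hypothetical temporal tree: n4 introduces H4 independently of n3.
tree = {"root": ["n1"], "n1": ["n2", "n4"], "n2": ["n3"]}
objects = {
    "n1": {"H1", "H2"},
    "n2": {"H1", "H2", "H3"},
    "n3": {"H1", "H2", "H3", "H4"},
    "n4": {"H1", "H2", "H4"},
}
stream = mark(tree, objects, "root")
marked = set(stream)
```

In this example H1 and H2 are emitted once even though four nodes reference them, whereas H4 is emitted twice because it appears on two unrelated edges.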

Copying an Object into the CAS

Copying an object into the CAS from another pool uses the software modules described in FIG. 11 to produce a data structure referenced by an object handle as in FIG. 12. The input to the process is (a) a sequence of data blocks at specified offsets, sized appropriately for generating depth-0 handles, and optionally (b) a previous version of the same object. Implicitly, the new object will be identical to the previous version except where input data is provided and itself differs from the previous version. The algorithm for the copy-in operation is shown in the flowchart of FIG. 15.

If a previous version (b) is provided, then the sequence (a) may be a sparse set of changes from (b). Where the object to be copied is known to differ from a previous object at only a few points, this can greatly reduce the amount of data that needs to be copied in, and therefore reduce the computation and I/O activity required. This is the case, for example, when the object is copied in via the optimal way for data backup described previously.

Even if the sequence (a) includes sections that are largely unchanged from the predecessor, identifying the predecessor (b) allows the copy-in process to check quickly whether the data has indeed changed, and therefore to avoid data duplication at a finer level of granularity than might be possible for the difference engine in some other storage pool providing input to the CAS.

Implicitly then, the new object will be identical to the previous version except where input data is provided and itself differs from the previous version. The algorithm for the copy-in operation is shown in the flowchart of FIG. 15.

The process starts at step 1500, when a data object of arbitrary size in temporary storage is provided, and proceeds to step 1502, which enumerates any and all hashes (depth 0 through the highest level) referenced by the hash value in the predecessor object, if one is provided. This enumeration serves as a quick check to avoid storing data that is already contained in the predecessor.

At step 1504, if a predecessor is provided, a reference to a clone of it is created in the temporal data structure of the content addressable data store. This clone will be updated to become the new object. The new object thus becomes a copy of the predecessor, modified by the differences copied into the CAS from the copy source pool.

At steps 1506 and 1508, the data mover 502 pushes data into the CAS. The data is accompanied by an object reference and an offset, which is the target location for the data. The data may be sparse, since only the differences from the predecessor need to be moved into the new object. At this point the incoming data is divided into depth-0 blocks, sized small enough that each block can be represented by a single depth-0 hash.

At step 1510, the data hash module generates a hash for each depth-0 block.

At step 1512, the predecessor hash at the same offset is read. If the hash of the data matches the hash of the predecessor at the same offset, then no data needs to be stored, and the depth-1 and higher objects do not need to be updated for this depth-0 block. In this case, the process returns to accept the next depth-0 block of data. This achieves temporal deduplication without the need for an expensive global lookup. Even though the source system ideally sends only the differences from data previously stored in the CAS, this check may be necessary if the source system performs differencing at a different level of granularity, or if data is marked as changed but has been changed back to its previously stored value. Differencing may be performed at different levels of granularity if, for example, the source system is a snapshot pool that creates deltas on 32 KiB boundaries while the CAS store creates hashes on 4 KiB blocks.

If no match is found, the data may be hashed and stored. Data is written starting at the provided offset and ending once the new data has been exhausted. Once the data has been stored, at step 1516, if the offset is still contained within the same depth-1 object, then the depth-1, depth-2 and all higher objects 1518 are updated, generating new hashes at each level, and the depth-0, depth-1 and all higher objects are stored at step 1514 to a local cache.

At step 1520, however, if the amount of data to be stored exceeds the depth-1 block size and the offset is to be contained in a new depth-1 object, then the current depth-1 object must be flushed to the store, unless it is determined to be stored there already. It is first looked up in the global index 1116. If it is found there, the depth-1 object and all associated depth-0 objects are removed from the local cache, and the process proceeds with the new block 1522.

At step 1524, as a quick check to avoid consulting the global index, the hash of each depth-0, depth-1 and higher object in the local cache is looked up in the local store established in 1502. Any that match are discarded.

At step 1526, the hash of each depth-0, depth-1 and higher object in the local cache is looked up in the global index 1116. Any that match are discarded. This ensures that data is deduplicated globally.

At step 1528, all remaining content in the local cache is stored into the persistent store, and processing then continues with the new block.
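The copy-in steps may be condensed into the following sketch, which is deliberately flattened: it tracks only depth-0 hashes per offset and omits the depth-1 objects and the local cache of steps 1514 through 1524. The function name `copy_in`, the dictionary representations and the use of SHA-256 are assumptions for illustration.

```python
import hashlib


def copy_in(blocks, predecessor, global_index):
    """Copy (offset, data) blocks into a new object cloned from predecessor.

    predecessor maps offset -> hash, or is None; global_index maps hash -> data.
    Returns (new_object, stats), where new_object maps offset -> hash.
    """
    previous = predecessor or {}
    new_object = dict(previous)                    # step 1504: start from a clone
    stats = {"same_as_predecessor": 0, "deduplicated": 0, "stored": 0}
    for offset, data in blocks:                    # steps 1506 and 1508
        digest = hashlib.sha256(data).hexdigest()  # step 1510
        if previous.get(offset) == digest:         # step 1512: temporal deduplication
            stats["same_as_predecessor"] += 1
            continue
        new_object[offset] = digest
        if digest in global_index:                 # step 1526: global deduplication
            stats["deduplicated"] += 1
        else:
            global_index[digest] = data            # step 1528: persist new content
            stats["stored"] += 1
    return new_object, stats


store = {}
v1, s1 = copy_in([(0, b"aaaa"), (4, b"bbbb")], None, store)
v2, s2 = copy_in([(0, b"aaaa"), (4, b"cccc"), (8, b"bbbb")], v1, store)
```

In the second call the block at offset 0 is skipped by the predecessor check alone, the block at offset 8 is found through the global index, and only the block at offset 4 is written.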

Reading an object out of the CAS is a simpler process, and is common to many implementations of a CAS. The handle for the object is mapped to a persistent data object via the global index, and the required offset is read from within this persistent data. In some cases it may be necessary to recurse through several depths in the object handle tree.

CAS Object Network Replication

As described under FIG. 11, the replicator 1110 is a service that duplicates data objects between two different content addressable stores. Replication could be accomplished by reading out of one store and writing back into another, but this architecture allows more efficient replication over a limited-bandwidth connection such as a local area or wide area network.

The replication system operating on each CAS store uses the difference engine service described above together with the temporal relationship structure described in FIG. 13, and additionally stores, on a per-object basis in the temporal data structure used by the CAS store, a record of which remote stores the object has been replicated to. This provides definitive knowledge of the presence of an object at a given data store.

Using the temporal data structure, the system can determine which objects exist on which data stores. This information is used by the data mover and the difference engine to determine the minimal subset of data to be sent over the network during a copy operation in order to bring the target data store up to date. For example, if data object O has been copied at time T3 from a server in Boston to a remote server in Seattle, the protection catalog store 908 will record that object O at time T3 exists in both Boston and Seattle. At time T5, during a subsequent copy from Boston to Seattle, the temporal data structure is consulted to determine the previous state of object O in Seattle that should be used for differencing on the source server in Boston. The Boston server then takes the difference between T5 and T3 and sends that difference to the Seattle server.

The process for replicating an object A is then as follows. Identify an object A0 that is recorded as having already been replicated to the target store and that is a near neighbor of A in the local store. If no such object A0 exists, then A is sent to the remote store and recorded locally as having been sent. To send a local object to the remote store, the typical method as embodied here is: send all the hashes and offsets of the data blocks within the object; query the remote store as to which hashes represent data that is not present remotely; and send the required data to the remote store (in this embodiment, the data and hashes are sent by encapsulating them in a TCP data stream).

Conversely, if A0 is identified, then the difference engine is run to identify the data blocks that are in A but not in A0. This should be a superset of the data that needs to be sent to the remote store. Send the hashes and offsets of the blocks that are in A but not in A0, query the remote store as to which hashes represent data that is not present remotely, and send the required data to the remote store.
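The exchange described above may be sketched as follows, with an in-process object standing in for the remote store. The `RemoteStore` class, the `replicate` function and the offset-to-data dictionaries are assumptions made for the sketch; no network transport is modeled, and blocks removed from an object are not handled.

```python
import hashlib


def block_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class RemoteStore:
    """Stand-in for the target CAS store at the far end of the link."""

    def __init__(self):
        self.blocks = {}    # hash -> data
        self.objects = {}   # object name -> {offset: hash}

    def missing(self, hashes):
        """Answer the query: which of these hashes are not present remotely?"""
        return {x for x in hashes if x not in self.blocks}

    def receive(self, name, base_name, changed, data_by_hash):
        self.blocks.update(data_by_hash)
        layout = dict(self.objects.get(base_name, {}))
        layout.update(changed)
        self.objects[name] = layout


def replicate(name, obj, remote, sent_log, base_name=None, base=None):
    """Send obj ({offset: data}) to remote; return the number of data blocks sent."""
    layout = {off: block_hash(d) for off, d in obj.items()}
    base_layout = {off: block_hash(d) for off, d in (base or {}).items()}
    # Stand-in for the difference engine: blocks in A that differ from A0.
    changed = {off: hv for off, hv in layout.items() if base_layout.get(off) != hv}
    # Send hashes and offsets, ask which hashes are missing, then send that data.
    wanted = remote.missing(set(changed.values()))
    payload = {hv: obj[off] for off, hv in changed.items() if hv in wanted}
    remote.receive(name, base_name, changed, payload)
    sent_log.add(name)      # record locally that the object has been sent
    return len(payload)


remote, sent = RemoteStore(), set()
a0 = {0: b"aaaa", 4: b"bbbb"}
a = {0: b"aaaa", 4: b"cccc", 8: b"bbbb"}
first = replicate("A0", a0, remote, sent)
second = replicate("A", a, remote, sent, base_name="A0", base=a0)
```

Here the second replication transfers a single data block: the difference against A0 yields two candidate blocks, and the remote query removes the one whose content is already held remotely.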

Sample Deployment Architecture

FIG. 16 shows the software and hardware components of one embodiment of the data management virtualization (DMV) system. The software comprising the system executes as three distributed components:

The host agent software 1602a, 1602b, 1602c implements some of the application-specific modules described above. It executes on the same servers 1610a, 1610b, 1610c as the applications whose data is under management.

The DMV server software 1604a, 1604b implements the remainder of the system as described herein. It runs on a set of Linux servers 1612, 1614 that also provide highly available virtualized storage services.

The system is controlled by management client software 1606 that runs on a desktop or laptop computer 1620.

These software components communicate with one another via network connections over an IP network 1628. Data management virtualization systems communicate with one another between a primary site 1622 and a data replication (DR) site 1624 over an IP network, such as a public Internet backbone.

The DMV systems at the primary and DR sites access one or more SAN storage systems 1616, 1618 via a Fibre Channel network 1626. The servers running the primary applications access the storage virtualized by the DMV systems via Fibre Channel over the Fibre Channel network, or via iSCSI over the IP network. The DMV system at the remote DR site runs a parallel instance of the DMV server software 1604c on a Linux server 1628. The Linux server 1628 may also be an Amazon Web Services EC2 instance or other similar cloud computing resource.

FIG. 17 is a diagram depicting various components of a computerized system upon which certain elements may be implemented, according to certain embodiments of the invention. The logical modules described may be implemented on a host computer 1701 that contains volatile memory 1702, a persistent storage device such as a hard drive 1708, a processor 1703 and a network interface 1704. Using the network interface, the system computer can interact with storage pools 1705, 1706 over a SAN or Fibre Channel device, among other embodiments. Although FIG. 17 illustrates a system in which the system computer is separate from the various storage pools, some or all of the storage pools may be housed within the host computer, eliminating the need for a network interface. The programmatic processes may be executed on a single host, as shown in FIG. 17, or they may be distributed across multiple hosts.

The host computer shown in FIG. 17 may serve as an administrative workstation, or may implement the application and the application-specific agent 402, or may implement any and all logical modules described in this specification, including the data virtualization system itself, or may serve as a storage controller for exposing storage pools of physical media to the system. Workstations may be connected to a graphical display device 1707 and to input devices such as a mouse 1709 and a keyboard 1710. Alternatively, the active user's workstation may include a handheld device.

Throughout this specification we refer to software components, but all references to software components are intended to apply to software running on hardware. Likewise, objects and data structures referred to in the specification are intended to apply to data structures actually stored in memory, either volatile or non-volatile. Likewise, servers are intended to apply to software, and engines are intended to apply to software, all running on hardware such as the computer systems described in FIG. 17.

The foregoing has outlined some of the more pertinent features of the subject matter. These features should be construed as merely illustrative. Many other beneficial results can be attained by applying the disclosed subject matter in a different manner or by modifying the subject matter as described. For example, the disclosed garbage collection algorithm may incorporate other garbage collection optimization techniques, such as tri-color marking.

Claims

1. A system for performing a plurality of prescribed data management functions in a manner that reduces redundant access operations to primary storage, the system comprising:

a data management engine for performing data management functions, including at least a snapshot function operable to create point-in-time images of primary storage data at secondary storage, and at least one backup function operable to create at least one backup copy of data, the data management engine being responsive to an electronic service level agreement (SLA) that specifies a schedule for performing the data management functions,

wherein a point-in-time image of data comprises a baseline full image of the data at a particular point in time and a reference to difference data indicating changes to the data after the particular point in time, and

wherein, in response to the schedule requiring at least some data management functions to be performed at the same time, the data management engine creates a point-in-time image of the primary storage data and conveys the difference information of that point-in-time image to the secondary storage to update at least one of the backup copies of the primary data, such that the primary storage is accessed only once for all of the corresponding sets of updates to the secondary storage.

2. The system of claim 1, wherein the point-in-time image of the primary storage data at the secondary storage is stored on performance-optimized secondary storage.

3. The system of claim 1, wherein the backup copy of the point-in-time image of the primary storage data is stored on remote storage.

4. The system of claim 1, wherein the backup copy of the point-in-time image of the primary storage data is stored on capacity-optimized storage.

5. The system of claim 4, wherein the backup copy of the point-in-time image of the primary storage data is stored on the capacity-optimized storage as a deduplicated image.

6. The system of claim 1, wherein the difference data comprises bitmap information, each portion of the bitmap corresponding to a portion of the primary storage data, and comprises the new data for those portions that the bitmap indicates have changed.

7. The system of claim 1, wherein the difference data comprises range information.

8. The system of claim 1, wherein the data management engine comprises logic to invoke the primary storage to provide a point-in-time image of the data and logic to retrieve the point-in-time image from the primary storage.

9. A system for managing data in accordance with a service level agreement (SLA) that specifies a schedule for performing prescribed data management functions on a calendar basis and for reducing redundancy among the functions, the system comprising:

a data management engine for performing data management functions, including at least one snapshot function and at least one backup function, the data management engine comprising a service level policy engine that receives SLAs in electronic form and controls the scheduling of the data management functions in accordance therewith,

wherein each electronic SLA is associated with a corresponding application that uses data, and wherein each SLA specifies at least one service level policy, each policy specifying a source pool of data, a destination pool in which a copy of the data of the source pool should be made, a copy frequency indicating how often the policy should run, a retention period indicating how long a given copy should be kept before being allowed to expire, and schedule information indicating the hours and days on which the policy is eligible to run, such that the set of policies in an SLA can express a non-uniform schedule for when a given function is to be performed and can express multiple data management functions to be performed on a given source of data, and

wherein the data management engine is operable to perform a preparation operation with the application and with the source pool so that the source pool of data has a coherent image of the data to be copied, and wherein the preparation operation is performed once even if the SLA specifies that multiple data management functions are to be performed on that source pool at the current time.

10. The system of claim 9, wherein, if two or more copy functions are scheduled to occur at the same time between the same source pool and destination pool, only one of the two or more copy functions is performed by the data management engine, and the resulting copy is associated with the longest retention time of the two or more scheduled copy functions.

11. The system of claim 9, wherein the preparation operation comprises the data management engine collecting metadata about the application, to be stored together with the application data.

12. The system of claim 9, wherein the preparation operation comprises a quiescing operation on the application.

13. The system of claim 12, wherein the quiescing operation on the application comprises freezing the application so that the application data is not updated further.

14. The system of claim 12, wherein the quiescing operation on the application comprises flushing the I/O cache of the application server for the application data.

15. A system for backing up data from a first storage pool to a second storage pool using difference information between temporal states, the system comprising:

a data management engine for performing data management functions, including at least one backup function to create a backup copy of data,

the data management engine being operable to perform a sequence of snapshot operations on the first storage pool to create point-in-time images of application data, each successive point-in-time image corresponding to a specific successive temporal state of the application data, and each snapshot operation creating difference information indicating which application data has changed and the content of the changed application data for the corresponding temporal state;

the data management engine being operable to perform at least one backup function on the application data, wherein the backup function is scheduled to be performed at non-consecutive temporal states,

wherein the data management engine is operable to maintain history information with temporal state information, the temporal state information indicating, for the corresponding backup copy of data, the temporal state of the last backup function performed on the application data; and

wherein the data management engine is operable to create composite difference information from the difference information for each temporal state between the temporal state of the last backup function performed on the application data and the temporal state of the currently scheduled backup function to be performed on the application data, and wherein the data management engine is operable to send the composite difference information to the second storage pool to be compiled with the backup copy of the data at the last temporal state to create a backup copy of the data for the current temporal state.

16. The system as claimed in claim 15, wherein the difference information comprises bitmap information, each part of the bitmap corresponding to a portion of the primary storage data, and comprises the new data for those portions that the bitmap indicates have changed.

17. The system as claimed in claim 15, wherein the difference information comprises range information.

18. The system as claimed in claim 15, wherein a plurality of backup functions are scheduled to occur concurrently, each backup function having a different interval of non-consecutive time states, and each backup function having different composite difference information generated for that interval.
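The composite difference information of claims 15 to 18 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each snapshot's difference information is a mapping from block index to the new block content; the names `composite_diff` and `apply_diff` are hypothetical and do not come from the specification.

```python
from typing import Dict, List

# Hypothetical representation: block index -> content of that block at the snapshot.
BlockDiff = Dict[int, bytes]


def composite_diff(diffs: List[BlockDiff], last_backed_up: int, current: int) -> BlockDiff:
    """Merge the per-snapshot differences for every time state after the
    last backed-up state, up to and including the current state.
    Later states overwrite earlier ones, so only the newest content of
    each changed block is kept."""
    merged: BlockDiff = {}
    for state in range(last_backed_up + 1, current + 1):
        merged.update(diffs[state])
    return merged


def apply_diff(image: List[bytes], diff: BlockDiff) -> List[bytes]:
    """Compile a difference with an existing backup copy to obtain the
    image for the newer time state (done at the second storage pool)."""
    result = list(image)
    for index, content in diff.items():
        result[index] = content
    return result


# Difference information recorded by snapshots at time states 0..3.
diffs = [{}, {0: b"A1"}, {1: b"B2"}, {0: b"A3"}]
backup_at_state_0 = [b"A0", b"B0", b"C0"]
backup_at_state_3 = apply_diff(backup_at_state_0, composite_diff(diffs, 0, 3))
```

Two backup functions with different intervals, as in claim 18, would simply call `composite_diff` with different `last_backed_up` values and so obtain different composite difference information.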

19. A system for restoring data of a storage pool from a backup copy of the data using difference information between time states, the system comprising:

a data management engine, wherein the data management engine is operable to maintain history information indicating the time states for which the storage pool has a point-in-time image of application data; and

wherein the data management engine comprises logic for restoring the application data of the storage pool to a point-in-time image of the data at a specified time state;

the data management engine being operable to identify the existence, at the storage pool, of a point-in-time image of the data for a time state prior to the specified time state, and to send to the storage pool difference information from the backup copy of the data, the difference information indicating which application data has changed, and the content of the changed application data, between the specified time state and the prior time state.

20. The system as claimed in claim 19, wherein the difference information comprises bitmap information, each part of the bitmap corresponding to a portion of the primary storage data, and comprises the backup data for those portions that the bitmap indicates have changed.

21. The system as claimed in claim 19, wherein the difference information comprises range information.

22. The system as claimed in claim 19, wherein the prior time state and the specified time state are discrete time states.
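The restore path of claims 19 to 22 can be sketched in the same spirit. The sketch below assumes the bitmap form of claim 20, with one boolean per block; `combined_bitmap` and `restore` are hypothetical names chosen for illustration only.

```python
from typing import List


def combined_bitmap(bitmaps: List[List[bool]], prior: int, specified: int) -> List[bool]:
    """OR together the change bitmaps of every time state after `prior`
    up to and including `specified`."""
    combined = [False] * len(bitmaps[0])
    for state in range(prior + 1, specified + 1):
        combined = [a or b for a, b in zip(combined, bitmaps[state])]
    return combined


def restore(pool_image: List[bytes], backup_copy: List[bytes], bitmap: List[bool]) -> List[bytes]:
    """Bring the pool's existing image (prior time state) to the specified
    time state by taking only the changed blocks from the backup copy."""
    return [backup_copy[i] if bitmap[i] else pool_image[i]
            for i in range(len(pool_image))]


# Change bitmaps for time states 0..2 over a three-block volume.
bitmaps = [[False, False, False], [True, False, False], [False, False, True]]
pool_image = [b"a0", b"b0", b"c0"]        # image already held at time state 0
backup_copy = [b"a1", b"b0", b"c2"]       # backup copy for time state 2
bitmap = combined_bitmap(bitmaps, 0, 2)
restored = restore(pool_image, backup_copy, bitmap)
blocks_sent = sum(bitmap)                 # only the changed blocks need to be sent
```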

23. A method of forming a deduplicated image of a data object that changes over time, using difference information between time states of the data object, the method comprising:

organizing the content of the data object at a first time state into a plurality of content segments and storing the content segments in a data store;

creating an organized arrangement of hash structures to represent the data object in its first time state, wherein, for a subset of the hash structures, each structure includes a hash signature of a corresponding content segment and is associated with a reference to the corresponding content segment, and wherein the logical organization of the arrangement represents the organization of the content segments as they are represented within the data object;

receiving difference information for the data object, the difference information indicating the content of the data object that has changed at a second time state relative to the first time state, and indicating the position of the changed content within the data object;

forming at least one hash signature for the changed content;

storing, as content segments, the changed content that is unique within the data store;

modifying the organized arrangement of hash structures to incorporate a new structure for the at least one hash signature of the changed content, the new structure being incorporated into the organized arrangement at a position corresponding to the position of the changed content within the data object as indicated in the difference information, and associating the hash signature of the new structure with a reference to the corresponding content segment for the changed content; and

associating the new structure with the second time state, whereby a deduplicated image of the data object at the second time state is stored without requiring receipt of a full image of the data object at the second time state.

24. The method as claimed in claim 23, wherein, after the at least one hash signature for the changed content is formed, the formed signature is compared with at least one hash signature in the organized arrangement of hash structures to determine whether the formed structure already exists in the organized arrangement.

25. The method as claimed in claim 23, wherein the comparison is performed first with the hash structure in the organized arrangement at the position corresponding to the position of the changed content as indicated in the difference information.

26. The method as claimed in claim 23, wherein the organized arrangement of hash structures is an organized tree structure.

27. The method as claimed in claim 23, wherein an organized arrangement of temporal structures is maintained, each temporal structure being associated with a time state and including information indicating the hash structure corresponding to the associated time state.
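The ingest steps of claim 23 can be sketched as follows. This is a simplified illustration, not the claimed implementation: the organized arrangement of hash structures is flattened to an ordered list of signatures (claim 26 contemplates a tree), the difference information is assumed to map segment positions to new content, and the class name `DedupStore` is hypothetical.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Toy deduplicated store: content segments keyed by SHA-1 signature,
    plus one ordered list of signatures per time state."""

    def __init__(self) -> None:
        self.segments: Dict[str, bytes] = {}      # hash signature -> content segment
        self.states: Dict[int, List[str]] = {}    # time state -> ordered signatures

    def _put(self, content: bytes) -> str:
        signature = hashlib.sha1(content).hexdigest()
        # Only content that is unique within the store is actually stored.
        self.segments.setdefault(signature, content)
        return signature

    def ingest_full(self, state: int, segments: List[bytes]) -> None:
        """Store the first time state from a full image."""
        self.states[state] = [self._put(s) for s in segments]

    def ingest_diff(self, prev_state: int, new_state: int, diff: Dict[int, bytes]) -> None:
        """Store a later time state from difference information alone."""
        layout = list(self.states[prev_state])    # start from the prior arrangement
        for position, content in diff.items():
            layout[position] = self._put(content) # new structure at the changed position
        self.states[new_state] = layout

    def read(self, state: int) -> bytes:
        return b"".join(self.segments[sig] for sig in self.states[state])


store = DedupStore()
store.ingest_full(1, [b"aaaa", b"bbbb", b"cccc"])
store.ingest_diff(1, 2, {1: b"aaaa"})   # changed content already exists in the store
store.ingest_diff(2, 3, {2: b"dddd"})   # changed content is new to the store
```

In this sketch the second time state is stored without any new segment, because the changed content duplicates a segment that is already present.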

28. A method of managing deduplicated images of data objects that change over time, the method comprising:

organizing the unique content of each data object into a plurality of content segments and storing the content segments in a data store;

for each data object, creating an organized arrangement of hash structures, wherein, for a subset of the hash structures, each structure includes a hash signature of a corresponding content segment and is associated with a reference to the corresponding content segment, wherein the logical organization of the arrangement represents the logical organization of the content segments as they are represented within the data object, and wherein another subset of the hash structures comprises a hierarchy of hash signatures of the hash signatures of corresponding content segments, so that the organized arrangement can be traversed to determine whether content is represented by the organized arrangement of hash structures; and

for each data object, maintaining an organized arrangement of temporal structures to represent the corresponding data object as it changes over time, wherein each structure is associated with a time state of the data object, wherein the logical arrangement of the structures indicates the changing time states of the data object, and wherein each time state is associated with hash structures representing the content of the data object during that time state.

29. The method as claimed in claim 28, wherein the temporal structure of a data object at a given time state is associated with hash structures for the data object content that has changed relative to a prior time state of the data object.

30. The method as claimed in claim 28, wherein the hash structures for the changed data content are organized as a graph separate from the organized arrangement of hash structures of the prior time state.

31. The method as claimed in claim 28, wherein the difference in the content of a data object from one time state to another time state is determined by referencing the organized arrangement of temporal structures for the other time state and for every other time state between the other time state and the one time state, so that differences can be determined across a plurality of time states.
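The hierarchy of hash signatures recited in claim 28 behaves much like a hash tree. The sketch below is one possible reading under stated assumptions: a fixed fan-out of 2, a non-empty segment list, and two trees of identical shape when they are compared. `Node`, `build` and `changed_positions` are hypothetical names.

```python
import hashlib
from typing import List, Sequence


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class Node:
    """Leaf: signature of a content segment.
    Interior node: signature of its children's signatures."""

    def __init__(self, sig: str, children: Sequence["Node"] = ()) -> None:
        self.sig = sig
        self.children = list(children)
        self.leaf_count = sum(c.leaf_count for c in self.children) or 1


def build(segments: List[bytes], fanout: int = 2) -> Node:
    """Build the hierarchy bottom-up from a non-empty list of segments."""
    level = [Node(sha1_hex(s)) for s in segments]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append(Node(sha1_hex("".join(c.sig for c in group).encode()), group))
        level = parents
    return level[0]


def changed_positions(old: Node, new: Node, start: int = 0) -> List[int]:
    """Return the positions of segments that differ between two trees of
    the same shape, descending only where signatures differ."""
    if old.sig == new.sig:
        return []                      # identical subtree: nothing to visit
    if not new.children:
        return [start]                 # differing leaf
    positions: List[int] = []
    for o, n in zip(old.children, new.children):
        positions += changed_positions(o, n, start)
        start += n.leaf_count
    return positions


state_1 = build([b"a", b"b", b"c", b"d"])
state_2 = build([b"a", b"x", b"c", b"d"])
```

Because matching signatures prune whole subtrees, determining whether content is already represented does not require visiting every segment.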

32. A method of storing a deduplicated image, wherein portions of the image are stored in encoded form directly in a hash table, the method comprising:

for each data object, creating an organized arrangement of hash structures, wherein, for a subset of the hash structures, each structure includes a field for containing the hash signature of a corresponding content segment and is associated with a reference to the corresponding content segment, and wherein the logical organization of the arrangement represents the logical organization of the content segments as they are represented within the data object;

receiving content to be included in the deduplicated image of the data object;

determining whether the received content can be encoded using a predetermined lossless encoding technique such that the encoded value fits within the field used to contain a hash signature;

if so, placing the encoding in the field and marking the hash structure to indicate that the field contains encoded content of the deduplicated image;

if not, generating a hash signature of the received content, placing the hash signature in the field, and placing the received content in a corresponding content segment in the data store, provided that the received content is unique.

33. The method as claimed in claim 32, wherein the lossless encoding is run-length encoding.

34. The method as claimed in claim 32, wherein the hash signature of each data object is created by the SHA-1 cryptographic hash function.

35. The method as claimed in claim 32, further comprising subsequently reconstructing the content from the encoded content.
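The decision in claim 32 between an inline encoding and a hash signature can be sketched as below. The sketch assumes a 20-byte field (the size of a SHA-1 digest) and a simple run-length format of one value byte followed by a 4-byte count; that format, and the names `make_entry` and `read_entry`, are illustrative assumptions rather than details from the specification.

```python
import hashlib
from typing import Dict, Tuple

FIELD_SIZE = 20  # bytes available in the field; equals the SHA-1 digest size


def rle_encode(content: bytes) -> bytes:
    """Run-length encode as repeated (value byte, 4-byte big-endian count)."""
    out = bytearray()
    i = 0
    while i < len(content):
        j = i
        while j < len(content) and content[j] == content[i]:
            j += 1
        out.append(content[i])
        out += (j - i).to_bytes(4, "big")
        i = j
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    out = bytearray()
    for k in range(0, len(encoded), 5):
        out += bytes([encoded[k]]) * int.from_bytes(encoded[k + 1:k + 5], "big")
    return bytes(out)


def make_entry(content: bytes, store: Dict[bytes, bytes]) -> Tuple[bool, bytes]:
    """Return (is_inline, field). If the encoding fits, the content lives in
    the field itself and nothing is written to the data store."""
    encoded = rle_encode(content)
    if len(encoded) <= FIELD_SIZE:
        return (True, encoded)
    signature = hashlib.sha1(content).digest()
    store.setdefault(signature, content)   # stored only if unique
    return (False, signature)


def read_entry(entry: Tuple[bool, bytes], store: Dict[bytes, bytes]) -> bytes:
    is_inline, field = entry
    return rle_decode(field) if is_inline else store[field]


store: Dict[bytes, bytes] = {}
zero_block = make_entry(b"\x00" * 4096, store)              # fits: one run, 5 bytes
text_block = make_entry(b"hello world, not compressible", store)
```

A block of zeros, which is common in disk images, therefore costs no lookup and no space in the data store under this scheme.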

36. A method of updating a second deduplicated store using a first deduplicated store and information representing how data objects change over time, the method comprising:

at the first deduplicated store, organizing the unique content of each data object into a plurality of content segments and storing the content segments in a data store;

at the first deduplicated store, for each data object, creating an organized arrangement of hash structures, wherein, for a subset of the hash structures, each structure includes a hash signature of a corresponding content segment and is associated with a reference to the corresponding content segment, and wherein the logical organization of the arrangement represents the logical organization of the content segments as they are represented within the data object;

at the first deduplicated store, for each data object, maintaining an organized arrangement of temporal structures to represent the corresponding data object as it changes over time, wherein each structure is associated with a time state of the data object, wherein the logical arrangement of the structures indicates the changing time states of the data object, and wherein each time state is associated with hash structures representing the content of the data object that has changed relative to a prior time state;

at the second deduplicated store, organizing the unique content of each data object into a plurality of content segments and storing the content segments in a data store;

at the second deduplicated store, for each data object, maintaining an organized arrangement of hash structures that is at least a subset of the hash structures at the first deduplicated store;

at the second deduplicated store, for each data object, maintaining an organized arrangement of temporal structures to represent the corresponding data object as it changes over time, wherein the organized arrangement of temporal structures is at least a subset of the temporal structures at the first deduplicated store, thereby representing a subset of the time states;

in response to a request to update the second deduplicated store with information from the first deduplicated store, finding a time state that is common to the first deduplicated store and the second deduplicated store and that is near the current state of the second deduplicated store; and

compiling a set of hash signatures for the content that has changed from the common state to the current time state of the first deduplicated store, and sending the set of hash signatures to the second deduplicated store, so that the second deduplicated store can update its organized arrangement of hash structures to represent the content of the data object as of the current time state of the first deduplicated store.

37. The method as claimed in claim 36, further comprising maintaining a history of the hash signatures that each deduplicated store contains and, for the hash signatures in the set that are new to the second deduplicated store, sending the corresponding content segments from the first deduplicated store so that the second deduplicated store can update its data store with the new content.

38. The method as claimed in claim 36, wherein the time state near the current state of the second deduplicated store is a nearest-neighbor state of the current state.

39. The method as claimed in claim 36, wherein the time state near the current state of the second deduplicated store is an ancestor state of the current state.

40. The method as claimed in claim 36, wherein the time state near the current state of the second deduplicated store is a descendant state of the current state.

41. The method as claimed in claim 38, wherein the nearest-neighbor state is the state connected by a set of edges whose sum is lower than that of any other set of edges.

42. The method as claimed in claim 36, wherein the logical arrangement of the structures includes branches.

43. The method as claimed in claim 37, further comprising recording, at the current state, to which state the content segments corresponding to the current time state have been sent.

44. A method of performing garbage collection in a deduplicated storage system to identify content segments that are no longer referenced, wherein redundant mark operations of a mark-and-sweep technique are avoided, the method comprising:

organizing the unique content of each data object into a plurality of content segments in the deduplicated storage system;

for each data object, creating an organized arrangement of hash structures, wherein, for a subset of the hash structures, each structure includes a hash signature of a corresponding content segment and is associated with a reference to the corresponding content segment, wherein the logical organization of the arrangement represents the logical organization of the content segments as they are represented within the data object, and wherein another subset of the hash structures comprises a hierarchy of hash signatures of the hash signatures of corresponding content segments, so that the organized arrangement can be traversed to determine whether content is represented by the organized arrangement of hash structures;

for each data object, maintaining an organized arrangement of temporal structures to represent the corresponding data object as it changes over time, wherein each structure is associated with a time state of the data object, wherein the logical arrangement of the structures indicates the changing time states of the data object, and wherein each time state is associated with hash structures representing the content of the data object that has changed relative to the previous time state of the data object;

for each content segment in the deduplicated storage system, clearing its corresponding garbage collection state;

iterating over the temporal structures and, for each temporal structure, marking the garbage collection state of only those content segments that are associated with content that has changed relative to the previous time state of the data object; and

after the iterating step, returning to a free pool any content segment whose garbage collection state is clear.

45. The method as claimed in claim 44, further comprising iterating over the temporal structures using a depth-first search.

46. The method as claimed in claim 44, further comprising repeating the method at periodic intervals.

47. The method as claimed in claim 44, further comprising performing the method after a new time state of a data object is added.

48. The method as claimed in claim 44, further comprising performing the method after a time state of a data object is removed.

49. The method as claimed in claim 44, further comprising maintaining a global reference list of all content segments allocated in the deduplicated storage system.
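The clear, mark and reclaim steps of claim 44 can be sketched as below. The sketch assumes that each retained time state lists only the signatures of segments that changed relative to its predecessor, and that the oldest retained state lists all of its segments; `collect_garbage` is a hypothetical name.

```python
from typing import Dict, Set


def collect_garbage(segments: Dict[str, bytes],
                    changed_by_state: Dict[str, Set[str]]) -> Set[str]:
    """Free every segment that no retained time state references.

    Each segment is marked once per state that changed it, instead of once
    per state that merely inherits it, which avoids redundant marking."""
    # Step 1: clear the garbage collection state of every segment.
    marked = {signature: False for signature in segments}

    # Step 2: iterate over the temporal structures, marking only the
    # segments that changed relative to the previous time state.
    for changed in changed_by_state.values():
        for signature in changed:
            if signature in marked:
                marked[signature] = True

    # Step 3: return segments whose state is still clear to the free pool.
    freed = {signature for signature, live in marked.items() if not live}
    for signature in freed:
        del segments[signature]
    return freed


segments = {"a": b"...", "b": b"...", "c": b"...", "d": b"..."}
changed_by_state = {
    "s1": {"a", "b"},   # oldest retained state: lists all of its segments
    "s2": {"c"},        # only the segment that changed relative to s1
}
freed = collect_garbage(segments, changed_by_state)
```

In this example segment `"d"` is referenced by no retained time state and is the only one reclaimed; segments `"a"` and `"b"` are marked once even though state `s2` also inherits them.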