WO2023115935A1 - Data processing method, and related apparatus and device - Google Patents
- Publication number
- WO2023115935A1 (PCT/CN2022/107586)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- hot
- network device
- row
- access
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1451—Management of the data involved in backup or backup restore by selection of backup contents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- The embodiments of the present application relate to the field of computers, and in particular to a data processing method and a related apparatus and device.
- An in-memory database is a database that stores all data directly in memory. Compared with a solution that stores data on disk, it can significantly improve access speed. However, because memory is a volatile storage medium, the data in memory is lost after a power failure or a database restart. How to recover the data has become an urgent problem to be solved.
- In one existing approach, when a checkpoint operation is performed, the location information of the row data in the log is backed up, and when data is restored, the location information of all row data in the database is fully restored based on the log.
- The row data itself is then loaded at runtime.
- The embodiments of the present application provide a data processing method and a related apparatus and device. Row data with a high access frequency is defined as hot data, and the location information of all row data is backed up when a checkpoint operation is performed.
- When data is restored, the hot data is loaded into memory according to the first location information of the hot data, so that the hot data does not need to be loaded at runtime.
- Compared with loading all row data at runtime, the data recovery method provided by the embodiments of the present application improves system performance after data recovery by guaranteeing the performance of services that access hot data.
- A first aspect of the embodiments of the present application provides a data processing method, including:
- The network device obtains the location information set corresponding to a row data set. The location information in the location information set corresponds one-to-one to the row data in the row data set, and each piece of location information indicates the location of the corresponding row data in the log.
- The row data set includes hot data, the access frequency of the hot data is higher than that of other data in the row data set, and the location information set includes first location information corresponding to the hot data.
- The network device acquires a data recovery instruction, which instructs the network device to restore the data in the row data set. In response to the data recovery instruction, the network device loads the hot data into memory according to the first location information. After the data is restored, the hot data can be accessed directly in memory.
- By guaranteeing the performance of services that access hot data, the data recovery method improves system performance after data recovery.
- Before acquiring the location information set of the row data set, the method may further include: recording heat information for each row data in the row data set, where the heat information indicates the access frequency of that row data.
- Recording the heat information captures the access frequency of the row data, which provides a basis for determining the hot data in the row data set and improves the feasibility of the technical solution.
- The heat information may include the number of accesses. In that case, recording the heat information of each row data in the row data set includes: obtaining a first data access request that indicates access to first target row data in the row data set, and then increasing the number of accesses corresponding to the first target row data according to the first data access request.
- There are many ways to record the number of accesses. For example, a frequency counter, an expiration time, and/or a timer can be used, which is not limited here.
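- The access counting described above can be sketched as follows. This is a minimal illustration that assumes a plain per-row counter incremented on every access request; the class name `HeatTracker`, the method `on_access`, and the row IDs are hypothetical and do not come from the application, and expiration times and timers are left out.

```python
from collections import Counter


class HeatTracker:
    """Hypothetical helper: keeps heat information as a per-row access count."""

    def __init__(self):
        # row_id -> number of accesses seen so far
        self.access_counts = Counter()

    def on_access(self, row_id):
        # Called once per data access request that targets row_id.
        self.access_counts[row_id] += 1
        return self.access_counts[row_id]


tracker = HeatTracker()
for row_id in [7, 7, 3, 7, 3, 9]:
    tracker.on_access(row_id)
```

- A counter of this kind only ever grows; combining it with an expiration time, as the text allows, would let old accesses age out.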
- The method may further include: the network device provides a heat threshold configuration interface and obtains the heat threshold entered by the user through that interface.
- The heat threshold may indicate the proportion of hot data in the row data set, or the minimum number of accesses that qualifies row data as hot data, which is not limited here.
- The heat threshold can also be set according to the type of application, or by R&D or testing personnel, which is not specifically limited here.
- The network device can determine the hot data in the row data set according to the heat information and the heat threshold.
- Allowing the user to set the heat threshold improves the user experience.
- The heat threshold can be chosen in different ways according to the needs of the actual application, which further improves the flexibility of the technical solution of the present application.
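- A sketch of how hot data might be determined from heat information under the two threshold forms mentioned above (a proportion of the row data set, or a minimum number of accesses). The function name `select_hot_rows` and its parameters are assumptions made for illustration only.

```python
import math


def select_hot_rows(access_counts, hot_ratio=None, min_accesses=None):
    """Return the set of row IDs treated as hot data.

    Exactly one threshold form must be given (both names are hypothetical):
    hot_ratio    -- proportion of rows (0..1) to treat as hot
    min_accesses -- minimum access count for a row to count as hot
    """
    if (hot_ratio is None) == (min_accesses is None):
        raise ValueError("give exactly one of hot_ratio or min_accesses")
    if min_accesses is not None:
        return {row for row, n in access_counts.items() if n >= min_accesses}
    # Rank rows from most to least accessed and keep the top share.
    ranked = sorted(access_counts, key=lambda row: access_counts[row], reverse=True)
    keep = math.ceil(len(ranked) * hot_ratio)
    return set(ranked[:keep])


counts = {1: 50, 2: 3, 3: 41, 4: 0}
hot_by_ratio = select_hot_rows(counts, hot_ratio=0.5)
hot_by_count = select_hot_rows(counts, min_accesses=3)
```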
- The other data includes cold data.
- The location information set further includes second location information corresponding to the cold data.
- The method of the present application can also include a data access process.
- The network device may obtain a second data access request from the terminal device, where the second data access request indicates access to second target row data in the row data set.
- The second target row data is accessed differently depending on its type. If the second target row data is hot data, the hot data is accessed in memory; if the second target row data is cold data, the cold data is loaded according to the second location information.
- Accessing the second target row data is not limited to reading or writing it; other operations, such as modifying its content, may also be performed, which is not limited here.
- The row data set in the in-memory database is divided into hot data and cold data, and the two kinds of data are handled differently during data recovery.
- Hot data can be accessed directly in memory, which ensures good system performance.
- For cold data, which has a low access frequency, only the location information is backed up. When cold data is accessed, it is loaded at runtime according to the location information, which reduces the overall data recovery time of the in-memory database.
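- The two access paths can be illustrated with a small sketch. It assumes, purely for illustration, that the log is a list of row payloads and that a location is an index into that list; the class name `InMemoryTable` and the counter `runtime_loads` are hypothetical.

```python
class InMemoryTable:
    """Hypothetical model of hot data held in memory and cold data loaded lazily."""

    def __init__(self, log, locations, hot_rows):
        self.log = log                # list of row payloads, index = log position
        self.locations = locations    # row_id -> position of the row in the log
        # Hot data is loaded up front, at recovery time.
        self.memory = {r: log[locations[r]] for r in hot_rows}
        self.runtime_loads = 0        # how many cold rows were loaded on demand

    def read(self, row_id):
        if row_id not in self.memory:
            # Cold data: load it from the log by its location on first access.
            self.memory[row_id] = self.log[self.locations[row_id]]
            self.runtime_loads += 1
        return self.memory[row_id]


log = ["alice", "bob", "carol"]
table = InMemoryTable(log, {10: 0, 11: 1, 12: 2}, hot_rows={10})
first = table.read(10)     # hot row, served from memory
second = table.read(12)    # cold row, loaded from the log at runtime
```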
- A second aspect of the embodiments of the present application provides a data processing method. The method is applied to a first network device, the first network device is included in a data processing system, and the data processing system further includes a second network device. The method includes:
- Acquiring hot checkpoint information, where the hot checkpoint information corresponds to the hot data in the row data set, and the access frequency of the hot data is higher than that of other data in the row data set; and sending the hot checkpoint information to the second network device, so that the hot data can be accessed through the second network device after the first network device fails.
- The first network device synchronizes hot checkpoint information, rather than log information, with the second network device, so that the synchronization speeds of the two devices match. This better meets the needs of practical applications and improves the practicability of the technical solution of the present application.
- In one implementation, the hot checkpoint information includes the hot data, so that the second network device stores the hot data in memory after receiving it.
- In another implementation, the hot checkpoint information includes the location information corresponding to the hot data, which indicates the location of the hot data in the log, so that the second network device loads the hot data into memory according to the location information after receiving it.
- The first network device may further execute the method of the foregoing first aspect. The beneficial effects are similar to those of the first aspect and are not repeated here.
- A third aspect of the embodiments of the present application provides a data processing method. The method is applied to a data processing system that includes a first network device and a second network device. The method includes:
- The first network device acquires hot checkpoint information, where the hot checkpoint information corresponds to hot data in the row data set, and the access frequency of the hot data is higher than that of other data in the row data set. After acquiring the hot checkpoint information, the first network device sends it to the second network device. The second network device receives the hot checkpoint information and loads the hot data into memory according to it. If the first network device fails, the second network device receives a data access request that indicates access to the hot data, and, in response to the data access request, accesses the hot data in memory.
- The hot checkpoint information includes either the hot data or the location information corresponding to the hot data, where the location information indicates the location of the hot data in the log. If the hot checkpoint information is the hot data, the second network device stores the hot data in memory after receiving it; if the hot checkpoint information is the location information corresponding to the hot data, the second network device loads the hot data into memory according to the location information after receiving it.
- The first network device may further execute the method of the foregoing first aspect.
- The second network device may also execute the method of the foregoing first aspect. The beneficial effects are similar to those of the first aspect and are not repeated here.
- The first network device synchronizes hot checkpoint information, rather than log information, with the second network device, so that the synchronization speeds of the two devices match. This better meets the needs of practical applications and improves the practicability of the technical solution of the present application.
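- The two forms of hot checkpoint information can be sketched as follows. The message layout (a dictionary with `kind` and `rows`), the class name `StandbyDevice`, and the assumption that the second network device can read the same log are all illustrative choices, not details taken from the application.

```python
class StandbyDevice:
    """Hypothetical model of the second network device."""

    def __init__(self, log):
        self.log = log       # assumed: the standby can read the same log
        self.memory = {}     # row_id -> row payload held in memory

    def receive_hot_checkpoint(self, info):
        if info["kind"] == "data":
            # Form A: the hot rows themselves were sent; store them directly.
            self.memory.update(info["rows"])
        elif info["kind"] == "locations":
            # Form B: only log positions were sent; load the rows from the log.
            for row_id, position in info["rows"].items():
                self.memory[row_id] = self.log[position]
        else:
            raise ValueError("unknown hot checkpoint kind: " + repr(info["kind"]))

    def access(self, row_id):
        # Serves requests for hot data after the first device has failed.
        return self.memory[row_id]


log = ["alice", "bob", "carol"]
by_data = StandbyDevice(log)
by_data.receive_hot_checkpoint({"kind": "data", "rows": {10: "alice"}})
by_location = StandbyDevice(log)
by_location.receive_hot_checkpoint({"kind": "locations", "rows": {10: 0}})
```

- Either form leaves the same hot rows in the standby's memory; sending locations keeps the synchronization message small, while sending data avoids a read from the log on the standby.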
- A fourth aspect of the embodiments of the present application provides a network device, including:
- An acquisition unit, configured to acquire a location information set of a row data set, where the row data set includes hot data, the location information set includes first location information corresponding to the hot data, and the access frequency of the hot data is higher than that of other data in the row data set.
- The acquisition unit is further configured to acquire a data recovery instruction, which instructs the network device to restore the data in the row data set.
- A processing unit, configured to load the hot data into memory according to the first location information in response to the data recovery instruction.
- The network device is configured to execute the method of the first aspect. The beneficial effects of this aspect are similar to those of the first aspect and are not repeated here.
- A fifth aspect of the embodiments of the present application provides a first network device. The first network device is included in a data processing system that further includes a second network device. The first network device includes:
- An acquisition unit, configured to acquire hot checkpoint information, where the hot checkpoint information corresponds to the hot data in the row data set, and the access frequency of the hot data is higher than that of other data in the row data set.
- A sending unit, configured to send the hot checkpoint information to the second network device, so that the hot data can be accessed through the second network device after the first network device fails.
- The first network device is configured to execute the method of the second aspect. The beneficial effects of this aspect are similar to those of the second aspect and are not repeated here.
- A sixth aspect of the embodiments of the present application provides a data processing system, including a first network device and a second network device.
- The first network device is configured to acquire hot checkpoint information, where the hot checkpoint information corresponds to the hot data in the row data set, and the access frequency of the hot data is higher than that of other data in the row data set; and to send the hot checkpoint information to the second network device.
- The second network device is configured to load the hot data into memory according to the hot checkpoint information; to receive, if the first network device fails, a data access request that indicates access to the hot data; and, in response to the data access request, to access the hot data in memory.
- The data processing system is configured to execute the method of the third aspect. The beneficial effects of this aspect are similar to those of the third aspect and are not repeated here.
- A seventh aspect of the embodiments of the present application provides a network device, including a processor, a memory, and a communication interface. The processor, the memory, and the communication interface are connected, and the processor is configured to execute the method of any one of the first to third aspects.
- An eighth aspect of the embodiments of the present application provides a computer-readable storage medium in which a program is stored. When a computer executes the program, the method of any one of the foregoing first to third aspects is performed.
- A ninth aspect of the embodiments of the present application provides a computer program product. When the computer program product is executed on a computer, the computer performs the method of any one of the foregoing first to third aspects.
- FIG. 1 is a schematic diagram of an in-memory database.
- FIG. 2 is a schematic diagram of an application architecture of the data processing method.
- FIG. 3 is a schematic flow chart of a data processing method provided in an embodiment of the present application.
- FIG. 4a is another schematic flowchart of the data processing method provided by the embodiment of the present application.
- FIG. 4b is a schematic diagram of the data form provided by the embodiment of the present application.
- FIG. 5 is another schematic flowchart of the data processing method provided by the embodiment of the present application.
- FIG. 6 is another schematic flowchart of the data processing method provided in the embodiment of the present application.
- FIG. 7a is a schematic diagram of the threshold setting interface provided by the embodiment of the present application.
- FIG. 7b is another schematic diagram of the threshold setting interface provided by the embodiment of the present application.
- FIG. 8 is another schematic flowchart of the data processing method provided in the embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a data processing system provided by an embodiment of the present application.
- FIG. 10 is another schematic flowchart of the data processing method provided by the embodiment of the present application.
- FIG. 11 is another schematic flowchart of the data processing method provided by the embodiment of the present application.
- FIG. 12 is another schematic flowchart of the data processing method provided by the embodiment of the present application.
- FIG. 13 is a schematic structural diagram of a network device provided by an embodiment of the present application.
- FIG. 14 is a schematic structural diagram of a first network device provided by an embodiment of the present application.
- FIG. 15 is another schematic structural diagram of the data processing system provided by the embodiment of the present application.
- FIG. 16 is another schematic structural diagram of a network device provided by an embodiment of the present application.
- “At least one” means one or more.
- “Plural” means two or more.
- “And/or” describes the association relationship of associated objects and indicates that three relationships are possible. For example, “A and/or B” can mean: A exists alone, A and B exist at the same time, or B exists alone, where A and B can each be singular or plural.
- The character “/” generally indicates an “or” relationship between the objects before and after it.
- “At least one of the following” or similar expressions refer to any combination of these items, including any combination of single or plural items.
- For example, at least one item (piece) of a, b, or c can represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c can be single or multiple.
- An in-memory database is a database that stores data in memory and operates on it directly.
- A significant advantage of an in-memory database is that data in memory can be accessed much faster than data on disk, which improves application performance.
- The data structure of the in-memory database is described below with reference to FIG. 1, which is a schematic diagram of the in-memory database.
- As shown in FIG. 1, the data organization of an in-memory database generally includes row data and indexes.
- The index is used to quickly locate the row ID of the row data. The index can use a tree structure such as an adaptive radix tree (ART), MassTree, or bwTree. An ART may also be called a prefix tree. It should be noted that in practical applications the index may also be based on other types of tree structures, or may not use a tree structure at all, which is not specifically limited here.
- The index includes a primary index and a secondary index. The primary index corresponds to the unique identifier in the row data and can locate a unique row data exactly; the secondary index can be understood as an auxiliary identifier used for coarse retrieval.
- Each row of data is assigned a unique row ID, and row IDs are indexed by the row map.
- Under multi-version concurrency control (MVCC), the same row of data includes multiple record versions. These versions form a version chain, and the records in the version chain are generally arranged from newest to oldest.
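- A version chain ordered from newest to oldest can be sketched as a linked list. The visibility rule used here (return the newest record whose commit timestamp is not later than the reader's snapshot timestamp) is a common MVCC convention and is an assumption; the application does not spell out its rule. The names `Record` and `visible_version` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    commit_ts: int                      # timestamp at which this version was committed
    value: str
    older: Optional["Record"] = None    # next (older) version in the chain


def visible_version(head, snapshot_ts):
    """Walk the chain from newest to oldest and return the first visible record."""
    record = head
    while record is not None and record.commit_ts > snapshot_ts:
        record = record.older
    return record


v1 = Record(10, "a")
v2 = Record(20, "b", older=v1)
v3 = Record(30, "c", older=v2)   # v3 is the head (newest version)
```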
- The storage layer stores logs in the log store, which is the unit for collecting, storing, and querying log data in the log service.
- The storage layer also includes checkpoint information.
- The checkpoint information includes the location information set corresponding to the row data set.
- The location information in the location information set corresponds to the row data in the row data set.
- The location information reflects the position of the row data in the log. When the in-memory database fails, data recovery can be performed using the checkpoint information.
- The Zipf distribution is often summarized by the 80/20 rule.
- The data accesses of an application generally follow the 80/20 rule: about 80% of the business is served by accessing about 20% of the data. Therefore, in the embodiments of the present application, the hot data with a high access frequency is loaded into memory in advance, which meets the performance requirements of most access requests after the data is restored.
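- The skew behind the 80/20 rule can be checked numerically. The sketch below assumes access frequencies proportional to 1/rank (a Zipf distribution with exponent 1) over 1000 rows; both the exponent and the row count are illustrative assumptions, and `top_share` is a hypothetical helper.

```python
def top_share(num_rows, top_fraction, exponent=1.0):
    """Fraction of all accesses that go to the most popular rows.

    Row popularity is assumed to follow a Zipf law: weight = 1 / rank**exponent.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_rows + 1)]
    top = round(num_rows * top_fraction)
    return sum(weights[:top]) / sum(weights)


share = top_share(1000, 0.2)
```

- Under these assumptions `share` comes out a little under 0.8, so the top 20% of rows receive close to 80% of the accesses. The exact figure depends on the exponent and on the number of rows.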
- FIG. 2 is a schematic diagram of an application architecture of the data processing method provided by the embodiment of the present application.
- As shown in FIG. 2, a communication connection is established between a network device 201 and a terminal device 202, and an in-memory database resides in the network device 201.
- The terminal device 202 may send a data access request to the network device 201, where the data access request indicates access to data in the in-memory database. Because memory is a volatile medium, after the in-memory database fails (for example, because of a power failure or a database restart), the data in the in-memory database is completely lost, and the data must be restored promptly to reduce adverse effects.
- FIG. 2 is only an example of the application architecture.
- The network device 201 may also be another device with a storage function, such as a host.
- The terminal device 202 involved in the embodiments of the present application may include various handheld devices with communication functions, vehicle-mounted devices, wearable devices, computing devices, or other processing devices connected to a wireless modem.
- The terminal device can also be called a terminal. The terminal device can be a subscriber unit, a cellular phone, a smart phone, a wireless data card, a personal digital assistant (PDA) computer, a tablet computer, a wireless modem, a handheld device (handset), a laptop computer, a machine type communication (MTC) terminal, and so on, which is not limited here.
- The network device 201 can also provide services to more terminal devices 202. The number and types of terminal devices are determined according to actual needs and are not specifically limited here.
- FIG. 3 is a schematic flow chart of the data processing method provided by the embodiment of the present application, including the following steps:
- The network device acquires the location information set of a row data set. The row data set includes hot data, the location information set includes first location information corresponding to the hot data, and the access frequency of the hot data is higher than that of other data in the row data set.
- A row data set is stored in the in-memory database of the network device, and the row data set includes at least one row data.
- When the network device performs a checkpoint operation, it scans the location information set corresponding to the row data set; the location information set indicates the location of the row data set in the log. Specifically, the location information in the location information set is in one-to-one correspondence with the row data in the row data set.
- FIG. 4a is a schematic flow chart of the data processing method provided in the embodiment of the present application, including the following steps:
- A checkpoint thread triggers a checkpoint operation.
- The data backup performed by a checkpoint operation can be a full backup or an incremental backup.
- A full backup backs up the log positions of all data in the in-memory database.
- An incremental backup backs up the log positions of only the data that has changed in the in-memory database.
- The policy for triggering a checkpoint request may specify not only the time interval between checkpoint operations but also how often full backups and incremental backups occur, which is not limited here.
- For example, the policy may stipulate that four incremental backups are performed after each full backup, that consecutive backups are separated by a fixed time interval (such as 20 minutes), and that this cycle repeats. It should be noted that the specific content of the policy that triggers the checkpoint request is determined according to the actual application and is not limited here.
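- The example policy above (one full backup followed by four incremental backups, 20 minutes apart, repeating) can be written as a small generator. The function name `checkpoint_schedule` and its parameters are assumptions for illustration.

```python
from itertools import islice


def checkpoint_schedule(interval_minutes=20, incrementals_per_full=4):
    """Yield (minute, kind) pairs forever: one full backup, then N incrementals."""
    cycle = incrementals_per_full + 1
    n = 0
    while True:
        kind = "full" if n % cycle == 0 else "incremental"
        yield n * interval_minutes, kind
        n += 1


# The first six checkpoints cover one whole cycle plus the next full backup.
first_six = list(islice(checkpoint_schedule(), 6))
```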
- The in-memory database obtains the timestamp corresponding to the checkpoint.
- The in-memory database scans the snapshot data based on the timestamp and saves the location information corresponding to the snapshot data.
- Based on the timestamp, the in-memory database scans the snapshot version of the data, that is, the snapshot data.
- Snapshot data is a copy of the data as it existed at a certain moment.
- The in-memory database also stores the location information corresponding to the snapshot data on disk; the location information indicates the location of the row data in the log.
- The disk can be a local disk of the network device, or another medium that supports functions such as reading and writing data, such as a disk on a cloud network (virtual machine).
- The type and location of the disk are selected according to the needs of the actual application and are not limited here.
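- A sketch of the scan-and-save step, assuming each row keeps its versions as `(commit_ts, log_position)` pairs ordered from newest to oldest and that the checkpoint is serialized as JSON. The function name `take_checkpoint`, the field names, and the JSON layout are hypothetical; a real system would write the result to the disk chosen above.

```python
import json


def take_checkpoint(rows, checkpoint_ts):
    """Save the log position of each row's snapshot version.

    rows: row_id -> list of (commit_ts, log_position), newest first.
    Returns the checkpoint serialized as a JSON string.
    """
    locations = {}
    for row_id, versions in rows.items():
        for commit_ts, log_position in versions:
            if commit_ts <= checkpoint_ts:
                # Newest version that already existed at the checkpoint timestamp.
                locations[row_id] = log_position
                break
    return json.dumps({"timestamp": checkpoint_ts, "locations": locations})


rows = {1: [(30, 9), (20, 4)], 2: [(10, 1)], 3: [(40, 12)]}
saved = json.loads(take_checkpoint(rows, checkpoint_ts=25))
```

- Note that JSON turns the integer row IDs into string keys, and that row 3 is absent from the result because it had no version at timestamp 25.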
- The in-memory database saves the heat information corresponding to the row data to disk.
- The heat information indicates the access frequency of the row data and is used to determine the hot data.
- The row data set includes hot data, whose access frequency is higher than that of other data in the row data set.
- The location information set includes first location information corresponding to the hot data, and the first location information indicates the location of the hot data in the log.
- After step 404, the network device may also execute step 405, in which the in-memory database reports that the checkpoint operation is complete.
- FIG. 4b is a schematic diagram of the data form provided by the embodiment of the present application.
- When the network device saves to disk, it can use data form 1, in which heat information is attached to the checkpoint file. The checkpoint file includes the location information set corresponding to the row data, and the heat information reflects the access frequency of each row data in the row data set and is used to determine the hot data.
- The network device can also save to disk in data form 2, storing the first location information corresponding to the hot data in a hot data file and the second location information corresponding to the cold data in a cold data file.
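- The two data forms can be contrasted with a short sketch. The dictionary keys (`checkpoint`, `heat`, `hot_file`, `cold_file`) and the function names are illustrative assumptions; FIG. 4b may lay the files out differently.

```python
def to_form_1(locations, heat):
    # Data form 1: one checkpoint file with the heat information attached.
    return {"checkpoint": dict(locations), "heat": dict(heat)}


def to_form_2(locations, hot_rows):
    # Data form 2: hot and cold location information kept in separate files.
    hot = {r: p for r, p in locations.items() if r in hot_rows}
    cold = {r: p for r, p in locations.items() if r not in hot_rows}
    return {"hot_file": hot, "cold_file": cold}


locations = {10: 0, 11: 1, 12: 2}
form_1 = to_form_1(locations, heat={10: 50, 11: 2, 12: 41})
form_2 = to_form_2(locations, hot_rows={10, 12})
```

- With form 1 the hot rows still have to be derived from the heat information at recovery time; with form 2 that decision has already been made when the checkpoint is written.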
- After the in-memory database fails, it receives a data recovery instruction, which instructs it to restore the data in the row data set.
- In response to the data recovery instruction, the network device preloads the hot data into memory according to the first location information, so that the hot data can be accessed directly from memory.
- FIG. 5 is a schematic flow chart of the data processing method provided by the embodiment of the present application, including the following steps:
- the data recovery thread triggers a data recovery operation.
- an alarm can be issued to prompt technicians to perform data recovery, and the technicians control the data recovery thread to trigger data recovery operations.
- the data recovery operation for example, start the timer when the memory database fails, and automatically trigger the data recovery operation when the timer expires and no data recovery command from the user is received.
- the situation that triggers the data recovery operation is determined according to the actual application, which is not limited here.
- the memory database loads the checkpoint data.
- After the in-memory database receives the data recovery instruction, it loads the checkpoint data. Specifically, the in-memory database reads from the storage layer the full-backup checkpoint data closest in time to the failure, together with the incremental-backup checkpoint data taken after that full backup. In the embodiment of the present application, since the location information corresponding to the row data is what is backed up when the checkpoint operation is performed, the loaded checkpoint data is the location information corresponding to the row data.
- the memory database restores the corresponding offset position of the row data.
- the in-memory database will completely restore the location information of all row data in the in-memory database based on the log library, that is, restore the offset position of the row data in the log to the memory.
- the memory database sets the global clock.
- the data processing process involved in the embodiment of the present application depends on the time stamp.
- the time stamp can be run as a counter. Therefore, to prevent errors in the recovered data, a global clock is set so that the recovered data is placed in the correct location.
- the memory database preloads hot data.
- the in-memory database will preload the hot data in the row data set into the memory, so that the restored in-memory database has better performance.
- the network device may also execute step 506, that is, return that the data recovery operation is completed.
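- the recovery steps above can be sketched as follows. This is only an illustration under stated assumptions: the log is modelled as a mapping from offset to row content, the checkpoint as a mapping from row number to offset, and the global clock is assumed to resume just after the last recorded timestamp. The function name and parameters are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def recover(
    checkpoint_locations: Dict[int, int],
    hot_rows: Iterable[int],
    log: Dict[int, str],
    last_timestamp: int,
) -> Tuple[Dict[int, int], int, Dict[int, str]]:
    # Load the checkpoint data and restore the offset of every row in the log.
    offsets = dict(checkpoint_locations)
    # Set the global clock; resuming at last_timestamp + 1 is an assumption.
    global_clock = last_timestamp + 1
    # Preload only the hot rows; cold rows stay on disk until accessed.
    memory = {row: log[offsets[row]] for row in hot_rows}
    return offsets, global_clock, memory
```

Only the hot rows are read from the log during recovery, which is why the recovery time depends on the amount of hot data rather than on the size of the whole row data set.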
- row data with a high access frequency is defined as hot data, and position information of all row data is backed up when a checkpoint operation is performed.
- the hot data is loaded into the memory according to the first location information of the hot data, so that the hot data does not need to be loaded at runtime.
- the data restoration method provided by the embodiment of the present application improves the performance of the system after data restoration by ensuring the performance of services accessing hot data.
- the network device before step 301, also records popularity information of each row data in the row data set, where the popularity information is used to indicate the access frequency of each row data.
- the access frequency of the row data is reflected by recording the heat information, which provides a basis for determining the hot data in the row data set, and improves the feasibility of the technical solution.
- the popularity information includes access times, and when the row data is accessed, the network device will modify the popularity information corresponding to the row data. That is to say, recording the popularity information of each row data in the row data set includes: obtaining a first data access request, where the first data access request indicates to access the first target row data in the row data set. Then, according to the first data access request, the number of accesses corresponding to the first target row data is increased.
- FIG. 6 is a schematic flow chart of the data processing method provided by the embodiment of the present application, including the following steps:
- the network device acquires a first data access request from the terminal device.
- the network device will obtain a first data access request from the terminal device, where the first data access request indicates to access the first target row data in the row data set.
- the first data access request may carry the row number corresponding to the first target row data, or the unique index corresponding to the first target row data, so that the network device can determine whether the accessed row data is the first target row data.
- the unique index is the main index in the embodiment shown in FIG. 1.
- for example, the in-memory database stores information about at least one product, including the product name, unit price, inventory and other information. If the product names of different products are all different, the product name of each product can be used as a unique index.
- Accessing the first target row data may include, in addition to reading or writing the first target row data, other operations on it, for example modifying its content, which is not specifically limited here.
- the network device acquires a time stamp corresponding to the first data access request.
- the first data access request can also carry the timestamp corresponding to the request. The timestamp indicates which version of the first target row data, identified by time, is to be accessed, so as to avoid access errors caused by multiple versions of the row data existing in the in-memory database.
- access to data in the in-memory database is based on the concurrency control protocol, and the type of the concurrency control protocol is determined according to the type of the in-memory database, which is not limited here.
- When the first target row data is accessed, the network device modifies the popularity information corresponding to the first target row data; specifically, it can increase the number of visits to the first target row data. The number of visits can be increased in several ways, which are described below.
- the network device may separately maintain a frequency counter accessCount for each row of data, and its initial value is 0.
- n is any positive number.
- the accessCount can be changed periodically according to the time interval of executing the checkpoint operation. For example, assuming that the checkpoint operation is performed every 20 minutes, then at the beginning of each cycle, the initial value of accessCount is 0, and the accessCount++ operation is executed according to the number of accesses within the cycle.
- accessCount may also not change periodically. For example, when the row data set is stored in the in-memory database for the first time, the initial value of accessCount corresponding to each row of data is 0, and whenever a row of data is accessed, the accessCount++ operation is performed on the accessCount corresponding to that row of data.
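- the frequency-counter approach, in both its periodic and non-periodic variants, can be sketched as below. The class name, the `periodic` flag and the increment of 1 per access are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict


class AccessCounter:
    """Per-row frequency counter (illustrative names, not from the source)."""

    def __init__(self, periodic: bool):
        self.periodic = periodic
        self.counts = Counter()  # rows never accessed read as 0

    def record_access(self, row: int) -> None:
        # accessCount++ for the accessed row, with an assumed increment of 1.
        self.counts[row] += 1

    def on_checkpoint(self) -> Dict[int, int]:
        # Hand the current counts to the checkpoint operation.
        snapshot = dict(self.counts)
        if self.periodic:
            # Periodic variant: every checkpoint cycle starts again from 0.
            self.counts.clear()
        return snapshot
```

The periodic variant reflects only recent accesses, while the non-periodic variant accumulates accesses over the whole lifetime of the row data set.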
- the network device may also set an expiration time for each row of data, and the duration of the expiration time may be shorter than the interval between checkpoint operations.
- if a row of data is accessed before its corresponding expiration time expires, the number of accesses to the row of data is increased; if the row of data is not accessed within its corresponding expiration time, the number of visits corresponding to the row of data remains unchanged.
- the network device may also record the number of visits by combining the frequency counter and the expiration time. Specifically, the network device sets an expiration time and a frequency counter for each row of data. If the row of data is not accessed within the expiration time, the value of the frequency counter corresponding to the row of data remains unchanged, and the lapsed expiration time is refreshed so that timing continues; if the row of data is accessed before the expiration time expires, the frequency counter corresponding to the row of data executes an accessCount++ operation, and the expiration time is refreshed at the moment the row of data is accessed. The meaning of the accessCount++ operation has been explained above and is not repeated here.
- for example, with an expiration time of 30 s, if the row of data is not accessed within the expiration time, the value of accessCount corresponding to the row of data remains unchanged and the expiration time is refreshed, so that a new round of expiration time starts from the 31st second. If the row of data is accessed at the 15th second, the frequency counter corresponding to the row of data executes an accessCount++ operation and refreshes the expiration time, and a new round of expiration time starts from the 16th second.
- the accessCount may or may not be changed periodically according to the time interval of executing the checkpoint operation. This has been described in detail above and is not repeated here.
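- the combination of frequency counter and expiration time can be sketched as follows. The sketch assumes a continuous clock passed in by the caller, an increment of 1 per access, and an extra `expirations` field that counts how often the expiration time lapsed without an access; all names are hypothetical.

```python
class RowHeat:
    """Frequency counter plus expiration time for one row (illustrative)."""

    def __init__(self, now: float, ttl: float):
        self.access_count = 0        # accessCount, initial value 0
        self.ttl = ttl               # length of one expiration period
        self.deadline = now + ttl    # end of the current expiration period
        self.expirations = 0         # periods that lapsed without any access

    def tick(self, now: float) -> None:
        # Refresh every lapsed expiration period; the counter is unchanged.
        while now >= self.deadline:
            self.expirations += 1
            self.deadline += self.ttl

    def on_access(self, now: float) -> None:
        self.tick(now)
        self.access_count += 1       # accessCount++ (assumed increment of 1)
        self.deadline = now + self.ttl  # refreshed at the moment of access


row = RowHeat(now=0, ttl=30)
row.on_access(15)   # accessed within the period: counted, period restarts
row.tick(100)       # two periods lapse without access: counter unchanged
```

Keeping the number of lapsed periods alongside the counter makes it possible to compare two rows whose counters are equal.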
- the above three methods are only some possibilities for recording the number of visits.
- the number of visits can also be recorded based on similar principles. For example, a timer is set; if the first target row data is accessed within the time set by the timer, the first target row data is determined to be hot data, and if the first target row data is not accessed within the time set by the timer, the first target row data is determined to be cold data.
- the specific process of recording the number of visits is determined according to the needs of the actual application, and is not specifically limited here.
- step 605 may also be executed by the network device to report the completion of the data access. There are multiple ways for the network device to report the completion of the data access: if the first data access request indicates reading the first target row data, the network device can send the first target row data to the terminal device to report the completion of the data access. If the first data access request indicates modifying the first target row data, the network device may send a modification completion message to the terminal device to report the completion of the data access. In practical applications, there may be other manners, which are not specifically limited here.
- the network device may provide a heat threshold configuration interface, where the heat threshold configuration interface is used to obtain a heat threshold input by a user.
- the network device determines the hot data from the row data set according to the heat information and the heat threshold.
- the popularity threshold may indicate the proportion of hot data in the row data set, or indicate the minimum value of access times corresponding to the hot data, which is not limited here. The possible situations are described below.
- the hotness threshold input by the user indicates the proportion of hot data in the row data set.
- the heat threshold configuration interface may be displayed as the threshold setting interface shown in FIG. 7a. If the heat threshold set by the user is 20%, the user only needs to input 20. In practical applications, the threshold setting interface may also have multiple forms, and the values input by the user may also be expressed in different orders of magnitude, which is not specifically limited here.
- in step 404 shown in FIG. 4a, the popularity information saved to the disk is sorted in descending order of accessCount, and the saved data includes <accessCount, row number> entries; the row data corresponding to the location information stored in the first 20% (that is, the 20% with the highest access frequency) is then determined as hot data.
- for example, suppose two of 5 rows of data share the highest accessCount value, both being 3, so that there are multiple ways to handle the tie. If the network device records the number of visits according to "1) record the number of visits based on the frequency counter" above, the network device can determine either of the two rows of data as hot data. If the network device records the number of visits according to "3) according to the frequency counter and expiration time" above, the network device can further compare the number of times the expiration time lapsed for the two rows of data, and select the row data with fewer lapses as hot data. If the number of lapses is the same, either of the two rows of data can be determined as hot data.
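- selecting hot data by proportion, including the tie handling just described, can be sketched as follows. The function name is hypothetical, the number of hot rows is assumed to be rounded down, and ties that remain after comparing lapse counts are resolved by the existing order, which corresponds to choosing any one of the tied rows.

```python
from typing import Dict, List, Optional


def select_hot_by_ratio(
    heat: Dict[int, int],
    percent: int,
    expirations: Optional[Dict[int, int]] = None,
) -> List[int]:
    """Return the row numbers forming the top `percent` of rows by accessCount."""
    expirations = expirations or {}
    # Descending accessCount; among equal counts, fewer lapsed periods first.
    ranked = sorted(heat, key=lambda r: (-heat[r], expirations.get(r, 0)))
    # Number of hot rows, rounded down (an assumption of this sketch).
    k = len(ranked) * percent // 100
    return ranked[:k]


heat = {1: 3, 2: 3, 3: 1, 4: 0, 5: 2}
hot = select_hot_by_ratio(heat, 20, expirations={1: 4, 2: 1})
```

In this usage rows 1 and 2 both have an accessCount of 3, and row 2 is selected because its expiration time lapsed less often.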
- the heat threshold input by the user indicates the minimum number of visits corresponding to the hot data.
- the heat threshold configuration interface may be displayed as the threshold setting interface shown in FIG. 7b. As shown in FIG. 7b, the heat threshold set by the user is 100 times. In practical applications, the threshold setting interface may also have multiple representations, and the value input by the user may also be other values, which are not specifically limited here.
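- when the heat threshold is a minimum number of visits, the selection reduces to a simple filter. The sketch below assumes that a row whose count equals the threshold counts as hot; the function name is hypothetical.

```python
from typing import Dict, Set


def select_hot_by_min_count(heat: Dict[int, int], min_count: int) -> Set[int]:
    """Rows whose access count reaches the configured minimum are hot."""
    return {row for row, count in heat.items() if count >= min_count}


hot_rows = select_hot_by_min_count({1: 150, 2: 100, 3: 99}, min_count=100)
```

Unlike the proportional threshold, this form does not bound the amount of hot data, so the memory needed after recovery depends on the access pattern.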
- the popularity threshold can be set by the user, which enhances the user experience.
- the heat threshold can be selected according to the needs of practical applications, which further improves the flexibility of the technical solution of the present application.
- the heat threshold can also be set according to the type of application, or set by the R&D/testing personnel, which is not specifically limited here.
- after the network device determines the hot data from the row data set, it can determine the other data in the row data set as cold data, and determine, from the location information set, the first location information corresponding to the hot data and the second location information corresponding to the cold data.
- the network device may obtain a second data access request from the terminal device, and the second data access request indicates to access the second target row data in the row data set. If the second target row data is hot data, access the hot data in the memory; if the second target row data is cold data, load the cold data according to the second location information.
- FIG. 8 is a schematic flow chart of the data processing method provided by the embodiment of the present application, including the following steps:
- the network device acquires the second data access request from the terminal device.
- the terminal device may send a second data access request to the network device, where the second data access request indicates to access the second target row data in the row data set.
- the second data access request may carry the row number corresponding to the second target row data, or the unique index corresponding to the second target row data, so that the network device can determine whether the row data being accessed is the second target row data.
- Accessing the second target row data may include, in addition to reading or writing the second target row data, other operations on it, for example modifying its content, which is not specifically limited here.
- the network device determines whether the second target row data to be accessed indicated by the second data access request is hot data, if yes, perform step 803, and if not, perform step 804.
- the network device needs to determine whether the second target row data to be accessed indicated by the second data access request is hot data, and perform corresponding operations.
- the network device accesses hot data in memory.
- the network device can directly access the second target row data in memory.
- the network device loads cold data according to the second location information.
- the network device needs to load the cold data according to the second location information, and then access the loaded cold data.
- the second location information indicates the location of the cold data in the log.
- the network device may also execute step 805, where the network device feeds back that the data access is completed. Similar to step 605 in the embodiment shown in FIG. 6 , in step 805 , there are multiple ways for the network device to report the completion of data access, which will not be repeated here. The difference is that in step 605, the data access completion information fed back by the network device corresponds to the first target row data; in step 805, the data access completion information fed back by the network device corresponds to the second target row data.
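- the access path of steps 801 to 805 can be sketched as follows. The sketch models the log as a mapping from offset to row content and keeps a row in memory once it has been loaded; both are assumptions, and the class and method names are hypothetical.

```python
from typing import Dict


class RecoveredDatabase:
    """Serves hot rows from memory and loads cold rows on demand."""

    def __init__(
        self,
        log: Dict[int, str],
        hot_locations: Dict[int, int],
        cold_locations: Dict[int, int],
    ):
        self.log = log
        self.hot = set(hot_locations)
        self.cold_locations = dict(cold_locations)
        # Hot rows were preloaded during recovery via the first location info.
        self.memory = {row: log[off] for row, off in hot_locations.items()}

    def read(self, row: int) -> str:
        if row in self.hot:
            # Hot data: access it directly in memory.
            return self.memory[row]
        if row not in self.memory:
            # Cold data: load it at runtime via the second location info.
            self.memory[row] = self.log[self.cold_locations[row]]
        return self.memory[row]
```

The first read of a cold row pays the cost of one lookup in the log, while reads of hot rows never touch the log after recovery.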
- the row data set in the memory database is divided into hot data and cold data, and different data are processed differently during the data recovery process.
- Hot data can be directly accessed in memory, ensuring good performance of the system.
- Cold data with low access frequency is backed up as location information. When accessing cold data, it is loaded at runtime according to the location information, which reduces the overall data recovery time of the memory database.
- FIG. 9 is a schematic structural diagram of a data processing system provided by an embodiment of the present application.
- the data processing system includes a first network device and a second network device
- the first network device may be called a main network device, and the first network device has a main memory database
- the second network device may be called a backup network device, there is a backup in-memory database on the second network device, or a secondary in-memory database.
- the design scheme of the master-slave memory database can achieve high availability.
- Synchronization technology is adopted between the main in-memory database and the slave in-memory database.
- the slave memory database can quickly take over.
- the first network device will synchronize hot checkpoint information with the second network device.
- the data processing system further includes a management device. The management device is used to determine the status of the first network device and, when the first network device fails, to send a takeover instruction to the second network device, so that the second network device can quickly take over the work of the first network device.
- FIG. 10 is a schematic flow chart of the data processing method provided by the embodiment of the present application, including the following steps:
- the first network device acquires hot checkpoint information, the hot checkpoint information corresponds to hot data in the row data set, and the access frequency of the hot data is higher than that of other data in the row data set.
- the hot checkpoint information includes hot data, and the way for the first network device to determine the hot data has been described in detail above, and will not be repeated here.
- the hot checkpoint information includes location information corresponding to the hot data, and the location information is used to indicate the location of the hot data in the log. It can be understood that the location information corresponding to the hot data shown here is the first location information introduced above, and how the network device obtains the first location information has been described in detail above and will not be repeated here.
- the first network device sends hot checkpoint information to the second network device.
- the first network device sends hot checkpoint information to the second network device asynchronously, so that after the first network device fails, the terminal device can access hot data based on the second network device.
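- the two forms of hot checkpoint information and the asynchronous hand-off can be sketched as below. The dictionary layout, the function names and the use of an in-process queue as the channel between the two devices are all assumptions made for illustration; a real deployment would use a network transport.

```python
import queue
from typing import Dict, Iterable


def build_hot_checkpoint(
    hot_rows: Iterable[int],
    memory: Dict[int, str],
    locations: Dict[int, int],
    as_data: bool,
) -> dict:
    """Build hot checkpoint information as hot data or as location information."""
    rows = list(hot_rows)
    if as_data:
        return {"kind": "data", "rows": {r: memory[r] for r in rows}}
    return {"kind": "location", "rows": {r: locations[r] for r in rows}}


def send_async(channel: "queue.Queue", info: dict) -> None:
    # Enqueue and return at once; the sender does not wait for the receiver.
    channel.put_nowait(info)


channel = queue.Queue()
info = build_hot_checkpoint([1], {1: "apple"}, {1: 0}, as_data=False)
send_async(channel, info)
```

Sending only the hot checkpoint information keeps the synchronized payload small compared with shipping the full log.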
- the second network device takes over the work of the first network device.
- FIG. 11 is a schematic flow chart of the data processing method provided by the embodiment of the present application, including the following steps:
- the management device sends a takeover signal to the second network device.
- After the first network device fails, the management device sends a takeover signal to the second network device to instruct the second network device to take over the work of the first network device.
- the second network device reports that the takeover is complete.
- the second network device may report the completion of the takeover to the management device.
- if the second network device has successfully received the hot checkpoint information transmitted by the first network device before the failure, the second network device may be considered to have taken over successfully.
- the terminal device sends a hot data access request to the second network device.
- the terminal device may send a hot data access request to the second network device, where the request is used to instruct access to the hot data in the row data set.
- Accessing hot data may include, in addition to reading or writing the hot data, other operations on the hot data, for example, modifying the content of the hot data, etc., which is not limited here.
- the second network device directly accesses the hot data in memory.
- Since the hot data access request indicates access to the hot data, when the hot checkpoint information received by the second network device is the hot data itself, the second network device has stored the hot data in memory. In this case, the second network device can access the hot data directly in memory.
- when the hot checkpoint information is location information, the second network device loads the hot data into memory according to the location information after receiving it, so that the second network device can access the hot data in memory in the subsequent process.
- the second network device feeds back that hot data access is complete.
- the second network device may also perform step 1105, in which it reports that the data access is complete. Similar to step 605 in the embodiment shown in FIG. 6, in step 1105 the second network device can report the completion of hot data access in many ways, which are not repeated here. The difference is that in step 605 the execution subject is a network device, and the data access completion information reported by the network device corresponds to the first target row data; in step 1105 the execution subject is the second network device, and the data access completion information reported by the second network device corresponds to hot data.
- the row data set includes not only hot data but also cold data
- the second network device can also implement access to cold data, that is, perform steps 1106 to 1108 .
- FIG. 11 is only an example of the technical solution of the present application, and does not constitute a limitation on steps, and there is no necessary sequence between step 1106 and step 1103 .
- the terminal device sends a cold data access request to the second network device.
- the terminal device may send a cold data access request to the second network device, where the request is used to instruct to access the cold data in the row data set.
- accessing cold data may also include performing other operations on cold data, for example, modifying the content of cold data, etc., which is not limited here.
- the second network device loads the cold data during operation according to the location information of the cold data.
- the first network device and the second network device share a disk, and the first network device may perform the operation shown in FIG. 4a to save the location information set corresponding to the row data set to disk.
- the second network device may obtain the location information corresponding to the cold data through the disk, and load the cold data during operation based on the location information, so as to complete the access to the cold data.
- the second network device reports back that the cold data access is completed.
- the second network device may also perform step 1108, in which it reports that the data access is complete. Similar to step 605 in the embodiment shown in FIG. 6, in step 1108 the second network device may report the completion of the cold data access in various ways, which are not repeated here. The difference is that in step 605 the execution subject is a network device, and the data access completion information reported by the network device corresponds to the first target row data; in step 1108 the execution subject is the second network device, and the data access completion information reported by the second network device corresponds to cold data.
- the first network device synchronizes the hot checkpoint information with the second network device instead of log information, so that the synchronization speed of the first network device and the second network device match, which is more in line with the needs of practical applications. It also improves the practicability of the technical solution of the present application.
- the first network device may also perform the operations performed by the network device in the embodiments shown in FIGS. 1 to 8;
- the second network device may also perform the operations performed by the network device in the foregoing embodiments shown in FIG. 1 to FIG. 8 , which will not be repeated here.
- the embodiment of the present application also provides a data processing method, the method is applied to a data processing system, and the data processing system includes a first network device and a second network device.
- FIG. 12 is a flow chart of the data processing method provided by the embodiment of the present application, including the following steps:
- the first network device acquires hot checkpoint information, the hot checkpoint information corresponds to hot data in the row data set, and the access frequency of the hot data is higher than that of other data in the row data set.
- the first network device sends hot checkpoint information to the second network device.
- Step 1201 to step 1202 are similar to step 1001 and step 1002 in the aforementioned embodiment shown in FIG. 10 , please refer to the description of step 1001 and step 1002 for details, and details are not repeated here.
- the second network device loads hot data into the memory according to the hot checkpoint information.
- After the second network device obtains the hot checkpoint information, it loads hot data into the memory. Specifically, if the hot checkpoint information is the hot data itself, the second network device stores the hot data in the memory; if the hot checkpoint information is the location information corresponding to the hot data, the second network device loads the hot data into the memory according to the location information.
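- how the second network device might turn either form of hot checkpoint information into in-memory hot data can be sketched as follows. The dictionary layout with a "kind" field, the function name, and the assumption that the log is readable by the second network device (for example through a shared disk) are illustrative only.

```python
from typing import Dict


def apply_hot_checkpoint(info: dict, log: Dict[int, str]) -> Dict[int, str]:
    """Return the hot rows to keep in memory on the second network device."""
    if info["kind"] == "data":
        # The checkpoint information is the hot data itself: store it as is.
        return dict(info["rows"])
    # The checkpoint information is location information: read each hot row
    # from the log at the recorded offset.
    return {row: log[offset] for row, offset in info["rows"].items()}


log = {0: "apple", 64: "pear"}
memory = apply_hot_checkpoint({"kind": "location", "rows": {1: 0}}, log)
```

Either way the hot data is already in memory before a takeover, so hot data access requests can be answered without touching the log.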
- the second network device receives a data access request, and the data access request indicates access to hot data.
- the second network device may receive a data access request indicating access to hot data.
- the second network device responds to the data access request, and accesses the hot data in the memory.
- the second network device may also perform the operations shown in steps 1106 to 1108 in the embodiment shown in FIG. 11 , which will not be detailed here.
- the first network device synchronizes the hot checkpoint information with the second network device instead of log information, so that the synchronization speed of the first network device and the second network device match, which is more in line with the needs of practical applications. It also improves the practicability of the technical solution of the present application.
- the first network device may also perform the operations performed by the network device in the embodiments shown in FIGS. 1 to 8;
- the second network device may also perform the operations performed by the network device in the foregoing embodiments shown in FIG. 1 to FIG. 8 , which will not be repeated here.
- FIG. 13 is a schematic structural diagram of a network device provided by an embodiment of the present application.
- the network device 1300 includes:
- the acquiring unit 1301 is configured to acquire the location information set of the row data set, where the row data set includes hot data, the location information set includes the first location information corresponding to the hot data, and the access frequency of the hot data is higher than that of other data in the row data set.
- the acquiring unit 1301 is further configured to acquire a data restoration instruction, and the data restoration instruction is used to instruct to restore data in the row data set.
- the processing unit 1302 is configured to load the hot data into the memory according to the first location information in response to the data restoration instruction.
- the processing unit 1302 is further configured to record popularity information of each row of data in the row data set, where the popularity information is used to indicate the access frequency of each row of data.
- the popularity information includes the number of visits
- the processing unit 1302 is specifically configured to: obtain a first data access request, where the first data access request indicates access to the first target row data in the row data set; and, according to the first data access request, increase the number of visits corresponding to the first target row data.
- the processing unit 1302 is further configured to: provide a popularity threshold configuration interface, where the popularity threshold configuration interface is used to obtain the popularity threshold input by the user; and determine the hot data from the row data set according to the popularity information and the popularity threshold.
- the other data includes cold data
- the location information set further includes second location information corresponding to the cold data
- the acquiring unit 1301 is further configured to acquire a second data access request from the client, where the second data access request indicates to access the second target row data in the row data set;
- the processing unit 1302 is further configured to: if the second target row data is hot data, access the hot data in memory; if the second target row data is cold data, load the cold data according to the second location information.
- the network device 1300 is configured to execute the operations performed by the network device in the foregoing embodiments shown in FIGS. 1 to 8 , and details are not repeated here.
- FIG. 14 is a schematic structural diagram of the first network device provided by the embodiment of the present application.
- the first network device 1400 is included in the data processing system, and the data processing system also includes a second network device. The first network device 1400 includes:
- the obtaining unit 1401 is configured to obtain hot checkpoint information, the hot checkpoint information corresponds to the hot data in the row data set, and the access frequency of the hot data is higher than that of other data in the row data set;
- the sending unit 1402 is configured to send hot checkpoint information to the second network device, so that the hot data can be accessed based on the second network device after the first network device fails.
- the hot checkpoint information includes hot data.
- the hot checkpoint information includes location information corresponding to the hot data, and the location information is used to indicate the location of the hot data in the log.
- the first network device 1400 is configured to perform the operations performed by the first network device in the foregoing embodiments shown in FIGS. 9 to 11 , and details are not repeated here.
- FIG. 15 is a schematic structural diagram of a data processing system provided by an embodiment of the present application.
- the data processing system 1500 includes a first network device 1501 and a second network device 1502 .
- the first network device 1501 is configured to: acquire hot checkpoint information, the hot checkpoint information corresponds to hot data in the row data set, and the access frequency of the hot data is higher than that of other data in the row data set. Send hot checkpoint information to the second network device.
- the second network device 1502 is configured to: load hot data in memory according to hot checkpoint information. If the first network device fails, the second network device receives a data access request, and the data access request indicates access to hot data. Access hot data in memory in response to data access requests.
- the data processing system 1500 is used to implement the functions implemented by the data processing system in the foregoing embodiments shown in FIGS. 9 to 12 , and details are not described here again.
- FIG. 16 is a schematic structural diagram of the network device provided by the embodiment of the present application.
- the network device 1600 includes a processor 1601 and a memory 1602, and the memory 1602 stores one or more application programs or data.
- the memory 1602 may be volatile storage or persistent storage.
- the program stored in the memory 1602 may include one or more modules, and each module may be used to perform a series of operations performed by the network device 1600.
- the processor 1601 may communicate with the memory 1602 and execute, on the network device 1600, a series of instruction operations stored in the memory 1602.
- the processor 1601 may be a central processing unit (CPU), and may be a single-core processor or another type of processor, such as a dual-core processor, which is not specifically limited here.
- the network device 1600 may also include one or more communication interfaces 1603 and one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
- the network device 1600 can perform the operations performed by the network device in the foregoing embodiments shown in FIGS. 1 to 8, and the operations performed by the first network device in the foregoing embodiments shown in FIGS. 9 to 11, which will not be repeated here.
- the disclosed system, device and method can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
- if the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the technical solution of the present application essentially, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
This application claims priority to Chinese Patent Application No. 202111582635.3, entitled "Data Processing Method, and Related Apparatus and Device", filed with the China National Intellectual Property Administration on December 22, 2021, which is incorporated herein by reference in its entirety.
Embodiments of this application relate to the computer field, and in particular, to a data processing method and a related apparatus and device.
An in-memory database stores all of its data directly in memory. Compared with storing data on disk, this significantly improves access speed. However, because the storage medium is volatile, the in-memory data is lost after a power failure or a database restart. How to recover the data has therefore become an urgent problem.
In one data processing method, when a checkpoint operation is performed, the location information of the row data in the log "copy" is backed up. During data recovery, the location information of all row data in the database is fully restored based on the log. While the system is running, the row data is then loaded at runtime based on this location information.
In this method, only the location information of the row data is restored during data recovery, and the row data itself is loaded only while the system is running. As a result, system performance after data recovery is low.
Summary
Embodiments of this application provide a data processing method and a related apparatus and device. Row data with a high access frequency is defined as hot data, and the location information of all row data is backed up when a checkpoint operation is performed. During data recovery, the hot data is loaded into memory according to the first location information of the hot data, so the hot data does not need to be loaded at runtime. According to the Zipf distribution, most services are implemented by accessing a small portion of the data (the hot data). The data recovery method provided in the embodiments of this application therefore improves system performance after data recovery by guaranteeing the performance of services that access hot data.
A first aspect of the embodiments of this application provides a data processing method, including:
An in-memory database stores at least one piece of row data, and the at least one piece of row data is included in a row data set. When performing a checkpoint operation, a network device obtains a location information set corresponding to the row data set. The location information in the location information set is in one-to-one correspondence with the row data in the row data set, and each piece of location information indicates the location of the corresponding row data in the log. The row data set includes hot data, whose access frequency is higher than that of the other data in the row data set, and the location information set includes first location information corresponding to the hot data. If the in-memory database fails, the network device can obtain a data recovery instruction, which instructs it to recover the data in the row data set. In response to the data recovery instruction, the network device loads the hot data into memory according to the first location information, so that after data recovery the hot data can be accessed directly in memory.
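The checkpoint and recovery flow of the first aspect can be illustrated with a minimal Python sketch. It is not the implementation described in this application: the names `Log`, `LogLocation`, `checkpoint`, and `recover` are hypothetical, the log is modeled as an in-process byte buffer, and a location is assumed to be an (offset, length) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogLocation:
    """Hypothetical location of a row in the log: byte offset and length."""
    offset: int
    length: int


class Log:
    """Append-only log held in a bytearray (stands in for a persistent log)."""

    def __init__(self):
        self._buf = bytearray()

    def append(self, payload: bytes) -> LogLocation:
        loc = LogLocation(len(self._buf), len(payload))
        self._buf += payload
        return loc

    def read(self, loc: LogLocation) -> bytes:
        return bytes(self._buf[loc.offset:loc.offset + loc.length])


def checkpoint(locations, hot_ids):
    """Back up the location of every row and note which rows are hot."""
    return {"locations": dict(locations), "hot_ids": set(hot_ids)}


def recover(log, ckpt):
    """Load only the hot rows into memory; cold rows remain as locations."""
    return {rid: log.read(ckpt["locations"][rid]) for rid in ckpt["hot_ids"]}


log = Log()
locations = {rid: log.append(f"row-{rid}".encode()) for rid in range(5)}
ckpt = checkpoint(locations, hot_ids={1, 3})
memory = recover(log, ckpt)  # rows 1 and 3 are now directly accessible
```

In this sketch the checkpoint stays small because it holds locations rather than row contents, and recovery touches the log only for the rows marked hot.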
It can be seen from the foregoing technical solution that the embodiments of this application have the following advantages:
Row data with a high access frequency is defined as hot data, and the location information of all row data is backed up when a checkpoint operation is performed. During data recovery, the hot data is loaded into memory according to the first location information of the hot data, so the hot data does not need to be loaded at runtime. According to the Zipf distribution, most services are implemented by accessing a small portion of the data (the hot data). The data recovery method provided in the embodiments of this application therefore improves system performance after data recovery by guaranteeing the performance of services that access hot data.
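To make the Zipf argument concrete, the following Python sketch estimates what share of all accesses falls on the most popular rows. The row count (1000), the exponent (1.0), and the 10% cut-off are illustrative assumptions and do not come from this application; `zipf_top_share` is a hypothetical helper.

```python
def zipf_top_share(num_rows: int, top_fraction: float, exponent: float = 1.0) -> float:
    """Share of accesses that hit the top `top_fraction` of rows when the
    access probability of the row at rank r is proportional to 1 / r**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_rows + 1)]
    top_k = int(num_rows * top_fraction)
    return sum(weights[:top_k]) / sum(weights)


# Assumed workload: 1000 rows, Zipf exponent 1, top 10% of rows treated as hot.
share = zipf_top_share(num_rows=1000, top_fraction=0.1)
```

Under these assumptions the top 10% of rows receive about 69% of the accesses, which is why preloading only the hot rows can serve most requests from memory.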
In some optional embodiments of the first aspect, before the location information set of the row data set is obtained, the method further includes: recording hotness information of each piece of row data in the row data set, where the hotness information indicates the access frequency of that row data.
In the embodiments of this application, the recorded hotness information reflects the access frequency of the row data. This provides a basis for determining the hot data in the row data set and improves the implementability of the technical solution.
In some optional embodiments of the first aspect, the hotness information includes a number of accesses, and recording the hotness information of each piece of row data in the row data set includes: obtaining a first data access request, where the first data access request indicates access to first target row data in the row data set; and then increasing, according to the first data access request, the number of accesses corresponding to the first target row data. In the embodiments of this application, the number of accesses can be recorded in various ways, for example by using a frequency counter, an expiration time, and/or a timer, which is not specifically limited here.
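One possible way to combine a frequency counter with an expiration time is sketched below in Python. The class `HotnessRecorder`, its fixed-window reset policy, and the explicit `now` parameter are assumptions made for illustration; the application does not prescribe a particular counting scheme.

```python
from collections import Counter


class HotnessRecorder:
    """Counts accesses per row; counts expire when the time window elapses."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.counts = Counter()
        self.window_start = 0.0

    def record_access(self, row_id, now: float) -> None:
        # Assumed expiration policy: drop all counts once the window is over,
        # so the counters reflect recent access frequency only.
        if now - self.window_start >= self.window:
            self.counts.clear()
            self.window_start = now
        self.counts[row_id] += 1


recorder = HotnessRecorder(window_seconds=10.0)
recorder.record_access("a", now=0.0)
recorder.record_access("a", now=1.0)
recorder.record_access("b", now=5.0)
counts_before_expiry = dict(recorder.counts)
recorder.record_access("a", now=12.0)  # window expired: counts restart
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real system would more likely read a clock or use a timer.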
The embodiments of this application provide multiple ways of obtaining the hotness information, which can be chosen flexibly in practical applications. This improves the flexibility and practicability of the technical solution of this application.
In some optional embodiments of the first aspect, the method further includes: the network device may provide a hotness threshold configuration interface and obtain, through that interface, a hotness threshold input by a user. The hotness threshold may indicate the proportion of hot data in the row data set, or the minimum number of accesses corresponding to hot data, which is not specifically limited here. In some optional embodiments, the hotness threshold may also be set according to the application type, or set by R&D or test personnel, which is not specifically limited here. The network device can determine the hot data from the row data set according to the hotness information and the hotness threshold.
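The two threshold definitions mentioned above (a proportion of the row data set, or a minimum number of accesses) can be sketched as two small Python functions. The function names and the tie-breaking rule are hypothetical and only illustrate how hot data could be selected from recorded access counts.

```python
import math


def select_hot_by_ratio(access_counts, ratio):
    """Treat the most-accessed `ratio` share of rows as hot (ties broken by row ID)."""
    k = math.ceil(len(access_counts) * ratio)
    ranked = sorted(access_counts, key=lambda rid: (-access_counts[rid], rid))
    return set(ranked[:k])


def select_hot_by_min_count(access_counts, min_count):
    """Treat every row accessed at least `min_count` times as hot."""
    return {rid for rid, n in access_counts.items() if n >= min_count}


counts = {"r1": 50, "r2": 3, "r3": 40, "r4": 1}
hot_by_ratio = select_hot_by_ratio(counts, ratio=0.5)
hot_by_count = select_hot_by_min_count(counts, min_count=3)
```

A ratio-based threshold bounds how much memory recovery preloads, while a count-based threshold adapts to the workload; which is preferable depends on the application.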
In the embodiments of this application, the hotness threshold can be set by the user, which improves user experience. In addition, the hotness threshold can be defined in several ways, which can be chosen according to the needs of the actual application, further improving the flexibility of the technical solution of this application.
In some optional embodiments of the first aspect, the other data includes cold data, and the location information set further includes second location information corresponding to the cold data. After data recovery is complete, that is, after the hot data has been loaded into memory, a data access process can also be performed. Specifically, the network device may obtain a second data access request from a terminal device, where the second data access request indicates access to second target row data in the row data set. The second target row data is accessed in different ways depending on its data type: if the second target row data is hot data, the hot data is accessed in memory; if the second target row data is cold data, the cold data is loaded according to the second location information. Accessing the second target row data includes reading or writing it, and may also include other operations on it, such as modifying its content, which is not specifically limited here.
In the embodiments of this application, the row data set in the in-memory database is divided into hot data and cold data, and the two kinds of data are handled differently during data recovery. Hot data, which is accessed frequently, is preloaded into memory and can be accessed there directly, which ensures good system performance. Cold data, which is accessed infrequently, is backed up as location information and is loaded at runtime according to that location information when it is accessed, which reduces the overall data recovery time of the in-memory database.
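The access path after recovery can be sketched as follows. The class `RecoveredStore` is hypothetical; the log is modeled as a Python list whose indexes play the role of location information, and keeping a cold row in memory after its first load is an assumption of this sketch rather than something the application states.

```python
class RecoveredStore:
    """Hot rows are preloaded at recovery; cold rows are loaded on first access."""

    def __init__(self, log_records, locations, hot_ids):
        self._log = log_records      # list standing in for the log
        self._locations = locations  # row ID -> index into the log
        # Recovery step: only hot rows are brought into memory.
        self.memory = {rid: log_records[locations[rid]] for rid in hot_ids}
        self.lazy_loads = 0

    def read(self, row_id):
        if row_id not in self.memory:
            # Cold row: load it at runtime using its backed-up location.
            self.memory[row_id] = self._log[self._locations[row_id]]
            self.lazy_loads += 1
        return self.memory[row_id]


store = RecoveredStore(
    log_records=["a", "b", "c"],
    locations={"x": 0, "y": 1, "z": 2},
    hot_ids={"x"},
)
hot_value = store.read("x")    # served from memory, no log access
cold_value = store.read("z")   # loaded from the log on demand
```

The counter `lazy_loads` is only there to make the difference between the two paths observable.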
A second aspect of the embodiments of this application provides a data processing method. The method is applied to a first network device, the first network device is included in a data processing system, and the data processing system further includes a second network device. The method includes:
obtaining hot checkpoint information, where the hot checkpoint information corresponds to hot data in a row data set, and the access frequency of the hot data is higher than that of the other data in the row data set; and sending the hot checkpoint information to the second network device, so that the hot data can be accessed through the second network device after the first network device fails.
In the embodiments of this application, the first network device synchronizes hot checkpoint information, rather than log information, to the second network device, so that the synchronization speeds of the first network device and the second network device match. This better meets the needs of practical applications and improves the practicability of the technical solution of this application.
In some optional embodiments of the second aspect, the hot checkpoint information includes the hot data, so that the second network device stores the hot data in memory after receiving it.
In some optional embodiments of the second aspect, the hot checkpoint information includes location information corresponding to the hot data, where the location information indicates the location of the hot data in the log, so that the second network device, after receiving the location information, loads the hot data into memory according to it.
In the embodiments of this application, the hot checkpoint information can take several forms, which can be determined according to the needs of the actual application. This improves the flexibility of the technical solution.
In some optional embodiments of the second aspect, when the first network device has not failed, the first network device may also perform the method of the first aspect. The related beneficial effects are similar to those of the first aspect; for details, refer to the first aspect. They are not repeated here.
A third aspect of the embodiments of this application provides a data processing method. The method is applied to a data processing system that includes a first network device and a second network device. The method includes:
The first network device obtains hot checkpoint information, where the hot checkpoint information corresponds to hot data in a row data set, and the access frequency of the hot data is higher than that of the other data in the row data set. After obtaining the hot checkpoint information, the first network device sends it to the second network device. After receiving the hot checkpoint information, the second network device loads the hot data into memory according to it. If the first network device fails, the second network device receives a data access request that indicates access to the hot data and, in response to the data access request, accesses the hot data in memory.
In some optional embodiments of the third aspect, the hot checkpoint information includes the hot data or location information corresponding to the hot data, where the location information indicates the location of the hot data in the log. If the hot checkpoint information is the hot data, the second network device stores the hot data in memory after receiving it. If the hot checkpoint information is the location information corresponding to the hot data, the second network device, after receiving the location information, loads the hot data into memory according to it.
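The two forms of hot checkpoint information can be illustrated with a small Python sketch of the second network device. The class `StandbyDevice`, the message layout with a `kind` field, and the assumption that the standby can read the same log as the primary are all hypothetical choices made for this example.

```python
class StandbyDevice:
    """Second network device: keeps hot data in memory for use after failover."""

    def __init__(self, shared_log):
        self.log = shared_log  # assumption: the standby can read the log
        self.memory = {}

    def receive_hot_checkpoint(self, info):
        if info["kind"] == "data":
            # Form 1: the checkpoint carries the hot rows themselves.
            self.memory.update(info["rows"])
        elif info["kind"] == "locations":
            # Form 2: the checkpoint carries log locations of the hot rows.
            for row_id, location in info["locations"].items():
                self.memory[row_id] = self.log[location]
        else:
            raise ValueError(f"unknown hot checkpoint kind: {info['kind']}")

    def handle_access(self, row_id):
        """Serve a request for hot data after the first device has failed."""
        return self.memory[row_id]


log = ["v0", "v1", "v2", "v3"]
standby = StandbyDevice(shared_log=log)
standby.receive_hot_checkpoint({"kind": "data", "rows": {"r1": "v1"}})
standby.receive_hot_checkpoint({"kind": "locations", "locations": {"r3": 3}})
```

Sending the rows directly avoids a log read on the standby, while sending locations keeps the synchronized message small; the application leaves this choice to the actual deployment.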
In some optional embodiments of the third aspect, when the first network device has not failed, the first network device may also perform the method of the first aspect. If the first network device fails, the second network device may also perform the method of the first aspect. The related beneficial effects are similar to those of the first aspect; for details, refer to the first aspect. They are not repeated here.
In the embodiments of this application, the first network device synchronizes hot checkpoint information, rather than log information, to the second network device, so that the synchronization speeds of the first network device and the second network device match. This better meets the needs of practical applications and improves the practicability of the technical solution of this application.
A fourth aspect of the embodiments of this application provides a network device, including:
an obtaining unit, configured to obtain a location information set of a row data set, where the row data set includes hot data, the location information set includes first location information corresponding to the hot data, and the access frequency of the hot data is higher than that of the other data in the row data set;
the obtaining unit being further configured to obtain a data recovery instruction, where the data recovery instruction instructs recovery of the data in the row data set; and
a processing unit, configured to load, in response to the data recovery instruction, the hot data into memory according to the first location information.
The network device is configured to perform the method of the first aspect. The beneficial effects of this aspect are similar to those of the first aspect and are not repeated here.
A fifth aspect of the embodiments of this application provides a first network device. The first network device is included in a data processing system, and the data processing system further includes a second network device. The first network device includes:
an obtaining unit, configured to obtain hot checkpoint information, where the hot checkpoint information corresponds to hot data in a row data set, and the access frequency of the hot data is higher than that of the other data in the row data set; and
a sending unit, configured to send the hot checkpoint information to the second network device, so that the hot data can be accessed through the second network device after the first network device fails.
The first network device is configured to perform the method of the second aspect. The beneficial effects of this aspect are similar to those of the second aspect and are not repeated here.
A sixth aspect of the embodiments of this application provides a data processing system, including a first network device and a second network device.
The first network device is configured to: obtain hot checkpoint information, where the hot checkpoint information corresponds to hot data in a row data set, and the access frequency of the hot data is higher than that of the other data in the row data set; and send the hot checkpoint information to the second network device.
The second network device is configured to: load the hot data into memory according to the hot checkpoint information; if the first network device fails, receive a data access request that indicates access to the hot data; and, in response to the data access request, access the hot data in memory.
The data processing system is configured to perform the method of the third aspect. The beneficial effects of this aspect are similar to those of the third aspect and are not repeated here.
A seventh aspect of the embodiments of this application provides a network device, including a processor, a memory, and a communication interface that are connected to one another, where the processor is configured to perform the method of any one of the first aspect to the third aspect.
An eighth aspect of the embodiments of this application provides a computer-readable storage medium that stores a program. When a computer executes the program, the method of any one of the first aspect to the third aspect is performed.
A ninth aspect of the embodiments of this application provides a computer program product. When the computer program product is executed on a computer, the computer performs the method of any one of the first aspect to the third aspect.
The beneficial effects of the sixth aspect to the ninth aspect are similar to those of the first aspect to the third aspect and are not repeated here.
FIG. 1 is a schematic diagram of an in-memory database;
FIG. 2 is a schematic diagram of an application architecture of the data processing method;
FIG. 3 is a schematic flowchart of a data processing method provided in an embodiment of this application;
FIG. 4a is another schematic flowchart of the data processing method provided in an embodiment of this application;
FIG. 4b is a schematic diagram of data forms provided in an embodiment of this application;
FIG. 5 is another schematic flowchart of the data processing method provided in an embodiment of this application;
FIG. 6 is another schematic flowchart of the data processing method provided in an embodiment of this application;
FIG. 7a is a schematic diagram of a threshold setting interface provided in an embodiment of this application;
FIG. 7b is another schematic diagram of the threshold setting interface provided in an embodiment of this application;
FIG. 8 is another schematic flowchart of the data processing method provided in an embodiment of this application;
FIG. 9 is a schematic structural diagram of a data processing system provided in an embodiment of this application;
FIG. 10 is another schematic flowchart of the data processing method provided in an embodiment of this application;
FIG. 11 is another schematic flowchart of the data processing method provided in an embodiment of this application;
FIG. 12 is another schematic flowchart of the data processing method provided in an embodiment of this application;
FIG. 13 is a schematic structural diagram of a network device provided in an embodiment of this application;
FIG. 14 is a schematic structural diagram of a first network device provided in an embodiment of this application;
FIG. 15 is another schematic structural diagram of the data processing system provided in an embodiment of this application;
FIG. 16 is another schematic structural diagram of the network device provided in an embodiment of this application.
Embodiments of this application provide a data processing method and a related apparatus and device. Row data with a high access frequency is defined as hot data, and the location information of all row data is backed up when a checkpoint operation is performed. During data recovery, the hot data is loaded into memory according to the first location information of the hot data, so the hot data does not need to be loaded at runtime. According to the Zipf distribution, most services are implemented by accessing a small portion of the data (the hot data). The data recovery method provided in the embodiments of this application therefore improves system performance after data recovery by guaranteeing the performance of services that access hot data.
The embodiments of this application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, the claims, and the accompanying drawings of this application are used to distinguish between similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the way in which objects with the same attribute are distinguished when the embodiments of this application are described. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to the process, method, product, or device. "At least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that only A exists, that both A and B exist, or that only B exists, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following items" or a similar expression refers to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple.
First, related concepts that may be involved in the embodiments of this application are explained.
1. In-memory database.
An in-memory database is a database that keeps data in memory and operates on it directly. Its main advantage is that data in memory can be accessed far faster than data on disk, which improves application performance. The data structure of an in-memory database is described below with reference to FIG. 1, which is a schematic diagram of an in-memory database.
In general, the data of an in-memory database is organized as shown in FIG. 1 and mainly includes row data and indexes.
An index is used to quickly locate the row identifier (row ID) of row data. An index may use a tree structure such as an adaptive radix tree (ART), MassTree, or bwTree. An ART may also be called a prefix tree. Note that, in practice, an index may be based on another type of tree structure or may not use a tree structure at all; this is not limited here. In the embodiment shown in FIG. 1, the indexes include a primary index and a secondary index. The primary index corresponds to a unique identifier in the row data and can locate exactly one piece of row data. The secondary index can be understood as an auxiliary identifier used for coarse retrieval.
Each piece of row data is assigned a unique row ID, and row IDs are indexed by a row map. To support highly concurrent data processing, data is usually accessed by using multi-version concurrency control (MVCC), which makes it possible to access different versions of one piece of row data as well as the version of each piece of row data at the current moment. After operations such as modification, one piece of row data includes multiple record versions. These records form a version chain, and the records in the version chain are generally ordered from newest to oldest.
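The row map and version chain described above can be sketched as follows. This is a minimal illustration only: the names `Record`, `RowStore`, `commit_ts`, and `read_ts` are hypothetical and do not come from the application, and the visibility rule (return the newest version committed at or before the reader's timestamp) is one common MVCC convention assumed here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Record:
    commit_ts: int                      # timestamp at which this version was committed
    value: str
    older: Optional["Record"] = None    # next (older) record in the version chain


class RowStore:
    """Hypothetical row store: the row map points at the newest record of each row."""

    def __init__(self) -> None:
        self.row_map: Dict[int, Record] = {}

    def write(self, row_id: int, value: str, commit_ts: int) -> None:
        # Prepend the new version so the chain stays ordered from newest to oldest.
        self.row_map[row_id] = Record(commit_ts, value, self.row_map.get(row_id))

    def read(self, row_id: int, read_ts: int) -> Optional[str]:
        # Walk the chain until a version committed at or before read_ts is found.
        rec = self.row_map.get(row_id)
        while rec is not None and rec.commit_ts > read_ts:
            rec = rec.older
        return None if rec is None else rec.value


store = RowStore()
store.write(row_id=1, value="price=10", commit_ts=100)
store.write(row_id=1, value="price=12", commit_ts=200)
print(store.read(1, read_ts=150))  # a reader at timestamp 150 sees the older version
```

A reader with a later timestamp sees the newer version, and a reader whose timestamp precedes every version sees nothing.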
The storage layer stores logs in a log store, which is the unit of the log service that collects, stores, and queries log data. The storage layer also holds checkpoint information. In the embodiments of this application, the checkpoint information includes a location information set corresponding to a row data set. The location information in the location information set corresponds one-to-one to the row data in the row data set, and each piece of location information reflects the position of the corresponding row data in the log. When the in-memory database fails, data can be recovered by using the checkpoint information.
2. Zipf distribution.
The Zipf distribution is also referred to as the 80/20 rule. Put simply, the data accessed by an application generally follows the 80/20 rule: about 80% of service requests are served by accessing about 20% of the data. Therefore, in the embodiments of this application, hot data with a high access frequency is loaded into memory in advance, which meets the performance requirements of most access requests after data recovery.
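The 80/20 rule of thumb can be checked numerically. The sketch below assumes a Zipf law in which the access probability of a row is proportional to 1/rank (exponent 1) over 1000 rows; the exponent, the row count, and the function name `top_share` are illustrative assumptions, not values from the application.

```python
def top_share(num_rows: int, top_fraction: float, exponent: float = 1.0) -> float:
    """Fraction of accesses that hit the most popular rows under a Zipf law."""
    # Weight of the row at a given popularity rank (rank 1 is the hottest row).
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_rows + 1)]
    top_n = int(num_rows * top_fraction)
    return sum(weights[:top_n]) / sum(weights)


share = top_share(num_rows=1000, top_fraction=0.2)
print(f"top 20% of rows receive {share:.0%} of accesses")
```

Under these assumptions the hottest fifth of the rows receives close to four fifths of the accesses, which is why preloading only the hot data can serve most requests.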
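Next, refer to FIG. 2, which is a schematic diagram of an application architecture of the data processing method provided in the embodiments of this application. As shown in FIG. 2, a network device 201 establishes a communication connection with a terminal device 202, and an in-memory database resides in the network device 201. The terminal device 202 may send a data access request to the network device 201, where the data access request indicates access to data in the in-memory database. Because memory is a volatile medium, all data in the in-memory database is lost after the database fails (for example, after a power failure or a database restart), so the data must be recovered promptly to limit the adverse impact.
Note that FIG. 2 is only an example of the application architecture. In practice, besides the server shown in FIG. 2, the network device 201 may be another medium with a storage function, such as a host; this is not limited here. The terminal device 202 in the embodiments of this application may include various handheld devices, vehicle-mounted devices, wearable devices, and computing devices that have a communication function, or other processing devices connected to a wireless modem. A terminal device may also be called a terminal, and may be a subscriber unit, a cellular phone, a smartphone, a wireless data card, a personal digital assistant (PDA) computer, a tablet computer, a wireless modem, a handset, a laptop computer, a machine type communication (MTC) terminal, or the like; this is not limited here.
In the embodiments of this application, the network device 201 may also serve more terminal devices 202. The number and types of terminal devices are determined according to actual needs and are not limited here.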
Next, refer to FIG. 3, which is a schematic flowchart of the data processing method provided in the embodiments of this application. The method includes the following steps:
301. Obtain a location information set of a row data set. The row data set includes hot data, the location information set includes first location information corresponding to the hot data, and the access frequency of the hot data is higher than that of other data in the row data set.
A row data set is stored in the in-memory database, and the row data set includes at least one piece of row data. When performing a checkpoint operation, the network device scans the location information set corresponding to the row data set. The location information set indicates where the row data in the row data set is located in the log. Specifically, the location information in the location information set corresponds one-to-one to the row data in the row data set.
The process of performing a checkpoint operation in the embodiments of this application is described below with reference to FIG. 4a, which is a schematic flowchart of the data processing method provided in the embodiments of this application. The process includes the following steps:
401. A checkpoint thread triggers a checkpoint operation.
While the in-memory database is running, a background task periodically triggers checkpoint requests according to a policy. The data backup performed by a checkpoint operation may be a full backup or an incremental backup. In the embodiments of this application, a full backup backs up the log positions of all data in the in-memory database, and an incremental backup backs up the log positions of the data in the in-memory database that has changed. The policy for triggering checkpoint requests may specify not only the interval between checkpoint operations but also how often full and incremental backups occur. For example, the policy may specify that each full backup is followed by four incremental backups, with a fixed interval (for example, 20 minutes) between consecutive backups, and that this cycle repeats. The specific content of the policy is determined by the actual application and is not limited here.
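The example policy above (one full backup followed by four incremental backups, 20 minutes apart, repeating) can be sketched as a generator. The function name `checkpoint_schedule` and its parameters are hypothetical and serve only to illustrate the cycle.

```python
from itertools import islice
from typing import Iterator, Tuple


def checkpoint_schedule(incrementals_per_full: int = 4,
                        interval_minutes: int = 20) -> Iterator[Tuple[int, str]]:
    """Yield (minute, kind) pairs for an endless full/incremental checkpoint cycle."""
    minute = 0
    while True:
        yield minute, "full"
        minute += interval_minutes
        for _ in range(incrementals_per_full):
            yield minute, "incremental"
            minute += interval_minutes


# First six checkpoints: one full, four incrementals, then the next full.
first_six = list(islice(checkpoint_schedule(), 6))
print(first_six)
```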
402. The in-memory database obtains a timestamp corresponding to the checkpoint.
Because the in-memory database stores multiple versions of data, a checkpoint operation must determine which version of the data corresponds to the current moment, to avoid problems such as duplicate or inaccurate backups. The in-memory database therefore obtains a timestamp corresponding to the checkpoint.
403. The in-memory database scans snapshot data based on the timestamp and saves the location information corresponding to the snapshot data to disk.
The in-memory database scans the snapshot version of the data, that is, the snapshot data, based on the timestamp. Snapshot data is a copy of the data at a given moment. The in-memory database also stores the location information corresponding to the snapshot data on disk, where the location information indicates the position of the row data in the log. The disk may be a local disk of the network device or another medium that supports data reads and writes, such as a disk on a cloud network (virtual machine). The type and location of the disk are chosen according to the needs of the actual application and are not limited here.
404. The in-memory database saves the heat information corresponding to the row data to disk.
The in-memory database also stores the heat information corresponding to the row data on disk. The heat information indicates the access frequency of the row data and is used to determine the hot data. Specifically, the row data set includes hot data, whose access frequency is higher than that of other data in the row data set. The location information set includes first location information corresponding to the hot data, and the first location information indicates the position of the hot data in the log.
When saved to disk, the entries may be ordered by access frequency from high to low, or in another order, for example by row number; this is not limited here.
In some optional embodiments, after step 404, the network device also performs step 405, in which the in-memory database reports that the checkpoint operation is complete.
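Steps 402 to 404 can be sketched as a single function that, for a given checkpoint timestamp, records the log offset of the version of each row visible at that timestamp together with the heat entries. The in-memory layout (`rows`, `access_count`), the returned dictionary, and the function name `take_checkpoint` are hypothetical; the application does not specify a file format, and writing the result to disk is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: row_id -> versions, newest first, as (commit_ts, log_offset).
Versions = List[Tuple[int, int]]


def take_checkpoint(rows: Dict[int, Versions],
                    access_count: Dict[int, int],
                    checkpoint_ts: int) -> dict:
    locations = {}
    for row_id, versions in rows.items():
        for commit_ts, log_offset in versions:
            if commit_ts <= checkpoint_ts:   # newest version visible at the checkpoint
                locations[row_id] = log_offset
                break
    # Heat entries as <accessCount, row number>, highest access count first.
    heat = sorted(((access_count.get(r, 0), r) for r in locations),
                  key=lambda entry: (-entry[0], entry[1]))
    return {"ts": checkpoint_ts, "locations": locations, "heat": heat}


rows = {1: [(30, 900), (10, 100)], 2: [(20, 400)], 3: [(40, 1200)]}
ckpt = take_checkpoint(rows, access_count={1: 5, 2: 9}, checkpoint_ts=25)
print(ckpt)
```

In this example, row 3 was first committed after the checkpoint timestamp, so it does not appear in the snapshot, and row 1 is recorded at the offset of its older version.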
In the embodiments of this application, the data stored on disk after a checkpoint operation may take several forms, which are described below with reference to FIG. 4b. FIG. 4b is a schematic diagram of the data forms provided in the embodiments of this application.
As shown in FIG. 4b, the network device may save data in data form 1, in which heat information is attached to a checkpoint file. The checkpoint file includes the location information set corresponding to the row data, and the heat information reflects the access frequency of each piece of row data in the row data set and is used to determine the hot data. Alternatively, the network device may save data in data form 2, in which the first location information corresponding to the hot data is stored in a hot data file and the second location information corresponding to the cold data is stored in a cold data file. How the network device determines the hot data in the row data set, and treats the remaining data as cold data, is described in detail in the embodiments shown in FIG. 5 to FIG. 7b below.
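Data form 2 can be illustrated by splitting one location information set into a hot part and a cold part once the hot rows are known. This is a sketch under assumptions: the two returned dictionaries stand in for the hot data file and the cold data file, and `split_hot_cold` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Tuple


def split_hot_cold(locations: Dict[int, int],
                   hot_row_ids: Iterable[int]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Split row_id -> log_offset into (first location info, second location info)."""
    hot_ids = set(hot_row_ids)
    hot_file = {r: off for r, off in locations.items() if r in hot_ids}
    cold_file = {r: off for r, off in locations.items() if r not in hot_ids}
    return hot_file, cold_file


hot_file, cold_file = split_hot_cold({1: 100, 2: 400, 3: 1200}, hot_row_ids=[2])
print(hot_file, cold_file)
```

With data form 1, by contrast, the same split would be computed at recovery time from the heat information stored alongside the checkpoint file.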
302. Obtain a data recovery instruction, where the data recovery instruction indicates that the data in the row data set is to be recovered.
After the in-memory database fails, it receives a data recovery instruction that indicates that the data in the row data set is to be recovered.
303. In response to the data recovery instruction, load the hot data into memory according to the first location information.
The network device responds to the data recovery instruction and preloads the hot data into memory according to the first location information, so that subsequent accesses to the hot data are served directly from memory.
Steps 302 and 303 are described in detail below with reference to FIG. 5, which is a schematic flowchart of the data processing method provided in the embodiments of this application. The process includes the following steps:
501. A data recovery thread triggers a data recovery operation.
After the in-memory database fails, an alarm may be raised to prompt a technician to recover the data, and the technician controls the data recovery thread to trigger the data recovery operation. In practice, the data recovery operation may also be triggered in other ways. For example, a timer may be started when the in-memory database fails, and the data recovery operation may be triggered automatically if the timer expires before a data recovery instruction is received from the user. The conditions that trigger the data recovery operation are determined by the actual application and are not limited here.
502. The in-memory database loads checkpoint data.
After receiving the data recovery instruction, the in-memory database loads the checkpoint data. Specifically, the in-memory database reads from the storage layer the checkpoint data of the full backup closest to the time of the failure, together with the checkpoint data of the incremental backups taken after that full backup. In the embodiments of this application, because a checkpoint operation backs up the location information corresponding to the row data, the loaded checkpoint data is that location information.
503. The in-memory database restores the offset of each piece of row data.
After the checkpoint data is loaded, the in-memory database uses the log store to fully restore the location information of all row data in the in-memory database; that is, it restores the offsets of the row data in the log into memory.
504. The in-memory database sets a global clock.
The data processing in the embodiments of this application depends on timestamps, and in a machine implementation a timestamp may run as a counter. Therefore, to prevent errors in the recovered data, the global clock is set so that the recovered data is placed at the correct position.
505. The in-memory database preloads the hot data.
The in-memory database preloads the hot data in the row data set into memory, so that the recovered in-memory database performs well.
In some optional embodiments, after step 505, the network device may also perform step 506, in which it reports that the data recovery operation is complete.
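Steps 502 to 505 can be sketched as one recovery function. Several details here are assumptions made for illustration and are not specified by the application: checkpoints are modeled as dictionaries with `ts` and `locations` keys, the log is modeled as a mapping from offset to row content, and the global clock is simply restarted one tick past the latest checkpoint timestamp.

```python
from typing import Dict, Iterable, List, Tuple


def recover(full_ckpt: dict,
            incremental_ckpts: List[dict],
            log: Dict[int, str],
            hot_row_ids: Iterable[int]) -> Tuple[Dict[int, int], int, Dict[int, str]]:
    # 502/503: start from the full checkpoint, then apply incrementals in order,
    # so that every row ends up with its most recently backed-up log offset.
    locations = dict(full_ckpt["locations"])
    last_ts = full_ckpt["ts"]
    for ckpt in incremental_ckpts:
        locations.update(ckpt["locations"])
        last_ts = max(last_ts, ckpt["ts"])
    # 504: assumption: restart the counter-style clock just past the last checkpoint.
    global_clock = last_ts + 1
    # 505: preload only the hot rows; cold rows stay on disk until they are accessed.
    memory = {row_id: log[locations[row_id]] for row_id in hot_row_ids}
    return locations, global_clock, memory


full = {"ts": 100, "locations": {1: 0, 2: 10, 3: 20}}
incrementals = [{"ts": 120, "locations": {2: 30}}]
log = {0: "a", 10: "b-old", 20: "c", 30: "b-new"}
locations, clock, memory = recover(full, incrementals, log, hot_row_ids=[2])
print(locations, clock, memory)
```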
In the embodiments of this application, row data with a high access frequency is defined as hot data, and the location information of all row data is backed up when a checkpoint operation is performed. During data recovery, the hot data is loaded into memory according to its first location information, so the hot data does not need to be loaded at run time. As the Zipf distribution suggests, most services are fulfilled by accessing a small portion of the data (the hot data). The data recovery method provided in the embodiments of this application therefore improves system performance after data recovery by preserving the performance of the services that access hot data.
In some optional embodiments, before step 301, the network device also records heat information for each piece of row data in the row data set, where the heat information indicates the access frequency of that row data.
In the embodiments of this application, recording heat information captures the access frequency of the row data, which provides a basis for determining the hot data in the row data set and makes the technical solution easier to implement.
In some optional embodiments, the heat information includes an access count, and the network device modifies the heat information of a piece of row data when that row data is accessed. That is, recording the heat information of each piece of row data in the row data set includes: obtaining a first data access request that indicates access to first target row data in the row data set, and then increasing the access count of the first target row data according to the first data access request.
This process is described below with reference to FIG. 6, which is a schematic flowchart of the data processing method provided in the embodiments of this application. The process includes the following steps:
601. The network device obtains a first data access request from the terminal device.
The network device obtains a first data access request from the terminal device, where the first data access request indicates access to first target row data in the row data set. The first data access request may carry the row number of the first target row data or the unique index of the first target row data, so that the network device can determine from the row number or unique index that the row data being accessed is the first target row data. The unique index is the primary index in the embodiment shown in FIG. 1. For example, suppose the in-memory database stores information about at least one product, including the product name, unit price, and stock level. Because different products have different names, the name of each product can serve as the unique index. Accessing the first target row data includes reading or writing it, and may also include other operations on it, such as modifying its content; this is not limited here.
602. The network device obtains a timestamp corresponding to the first data access request.
The first data access request may also carry a timestamp corresponding to the request. The timestamp indicates which point-in-time version of the first target row data is to be accessed, which avoids access errors caused by the multiple versions of row data in the in-memory database.
603. Read the first target row data based on a concurrency control protocol.
In general, data in an in-memory database is accessed based on a concurrency control protocol. The type of protocol depends on the type of in-memory database and is not limited here.
604. Modify the heat information corresponding to the first target row data.
When the first target row data is accessed, the network device modifies the corresponding heat information; specifically, it may increase the access count of the first target row data. The access count can be increased in several ways, which are described below.
1) Recording the access count with a frequency counter.
In some optional embodiments, the network device may maintain a separate frequency counter, accessCount, for each piece of row data, with an initial value of 0. Each time the first target row data is accessed, n is added to its accessCount; this is referred to here as an accessCount++ operation. Here, n is any positive number.
Optionally, accessCount may be reset periodically at the interval of the checkpoint operation. For example, if a checkpoint operation is performed every 20 minutes, accessCount starts at 0 at the beginning of each period, and the accessCount++ operation is performed for each access within the period.
Optionally, accessCount may instead not be reset periodically. For example, when the row data set is first stored in the in-memory database, the accessCount of every piece of row data is initialized to 0, and whenever a piece of row data is accessed, the accessCount++ operation is performed on its accessCount.
It can be understood that, in practice, accessCount may be counted in other ways as long as it reflects the access count of the row data; this is not limited here.
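The frequency-counter approach, with and without the periodic reset, can be sketched as follows. The class name `HeatTracker`, the `step` parameter (standing in for n), and the `on_checkpoint` hook are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Dict


class HeatTracker:
    """Per-row access counter; missing rows implicitly start at 0."""

    def __init__(self, step: int = 1) -> None:
        self.step = step                  # the positive increment n applied per access
        self.access_count: Counter = Counter()

    def on_access(self, row_id: int) -> None:
        self.access_count[row_id] += self.step

    def on_checkpoint(self, reset: bool) -> Dict[int, int]:
        # Return the counts to be saved as heat information at this checkpoint.
        snapshot = dict(self.access_count)
        if reset:                         # periodic variant: start each period from 0
            self.access_count.clear()
        return snapshot


tracker = HeatTracker()
for row_id in (7, 7, 9):
    tracker.on_access(row_id)
first_period = tracker.on_checkpoint(reset=True)
tracker.on_access(9)
second_period = tracker.on_checkpoint(reset=True)
print(first_period, second_period)
```

Passing `reset=False` gives the non-periodic variant, in which the counts accumulate from the time the row data set is first stored.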
2) Recording the access count with an expiration time.
In some optional embodiments, the network device may instead set an expiration time for each piece of row data. The expiration time may be shorter than the interval between checkpoint operations. If a piece of row data is accessed before its expiration time elapses, its access count is increased; if it is not accessed within its expiration time, its access count is left unchanged.
3) Recording the access count with a frequency counter and an expiration time.
In some optional embodiments, the network device may combine a frequency counter with an expiration time. Specifically, the network device sets an expiration time and a frequency counter for each piece of row data. If the row data is not accessed within the expiration time, its frequency counter keeps its value and the elapsed expiration time is refreshed so that counting continues. If the row data is accessed before the expiration time elapses, its frequency counter performs one accessCount++ operation and the expiration time is refreshed at the moment of access. The accessCount++ operation has been explained above and is not described again here.
For example, suppose the expiration time of a piece of row data is 30 seconds (s). If the row data is not accessed within those 30 s, its accessCount is left unchanged, the expiration time is refreshed, and a new expiration period starts from the 31st second. If the row data is accessed at the 15th second, its frequency counter performs one accessCount++ operation, the expiration time is refreshed, and a new expiration period starts from the 16th second.
Optionally, as in approach 1) above (frequency counter), accessCount may or may not be reset periodically at the interval of the checkpoint operation. This has been described in detail above and is not repeated here.
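The combined approach can be sketched for a single row as follows. The class `ExpiringCounter` and its fields are hypothetical; in particular, the `expirations` field (how many expiration periods elapsed without an access) and the treatment of an access that lands exactly on the deadline as belonging to the next period are assumptions made for this illustration.

```python
class ExpiringCounter:
    """Frequency counter plus expiration time for one piece of row data."""

    def __init__(self, ttl: float, now: float = 0.0) -> None:
        self.ttl = ttl                  # length of one expiration period, in seconds
        self.deadline = now + ttl       # end of the current expiration period
        self.access_count = 0
        self.expirations = 0            # periods that elapsed without any access

    def tick(self, now: float) -> None:
        # Refresh every expiration period that has elapsed; the count is unchanged.
        while now >= self.deadline:
            self.expirations += 1
            self.deadline += self.ttl

    def on_access(self, now: float) -> None:
        self.tick(now)
        self.access_count += 1          # the accessCount++ operation
        self.deadline = now + self.ttl  # an access starts a new expiration period


counter = ExpiringCounter(ttl=30)
counter.on_access(now=15)   # accessed at the 15th second: count becomes 1
counter.tick(now=50)        # no access before the 45th second: one period expired
print(counter.access_count, counter.expirations, counter.deadline)
```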
Note that the three approaches above are only some of the possible ways to record the access count. In practice, accesses may also be tracked on similar principles. For example, a timer may be set: if the first target row data is accessed within the time set by the timer, the first target row data is determined to be hot data; if it is not accessed within that time, it is determined to be cold data. The specific process for recording the access count is determined by the needs of the actual application and is not limited here.
In some optional embodiments, after step 604, step 605 may also be performed, in which the network device reports that the data access is complete. The network device can report this in several ways. If the first data access request indicates a read of the first target row data, the network device may send the first target row data to the terminal device to report that the data access is complete. If the first data access request indicates a modification of the first target row data, the network device may send a modification-complete message to the terminal device to report that the data access is complete. Other ways are possible in practice and are not limited here.
The embodiments of this application provide multiple ways to obtain heat information, which can be chosen flexibly in practice. This improves the flexibility and practicability of the technical solution of this application.
In some optional embodiments, the network device may provide a heat threshold configuration interface, which is used to obtain a heat threshold entered by a user. The network device determines the hot data in the row data set according to the heat information and the heat threshold. The heat threshold may indicate the proportion of hot data in the row data set, or the minimum access count for hot data; this is not limited here. The possible cases are described below.
A) The heat threshold entered by the user indicates the proportion of hot data in the row data set.
In some optional embodiments, the heat threshold configuration interface may be displayed as the threshold setting interface shown in FIG. 7a. If the user wants a heat threshold of 20%, the user only needs to enter 20. In practice, the threshold setting interface may take many forms, and the value entered by the user may be expressed on a different scale; this is not limited here.
Suppose the network device records access counts using approach 1) (frequency counter) or approach 3) (frequency counter and expiration time) above, the heat information saved in step 404 shown in FIG. 4a is sorted by accessCount in descending order, and each saved entry has the form <accessCount, row number>. In that case, the row data corresponding to the first 20% of the stored location information (that is, the 20% with the highest access frequency) is determined to be hot data.
It can be understood that, in practice, data outside the top 20% may have the same access frequency as data inside the top 20%. For example, suppose there are five pieces of row data and the first two have the highest accessCount of the five, both equal to 3. Several ways of handling this are possible. If the network device uses approach 1) (frequency counter) above, the network device may determine either of the two pieces of row data to be hot data. If the network device uses approach 3) (frequency counter and expiration time) above, the network device may further compare how many times the expiration time of each of the two pieces of row data has elapsed, and select the one with fewer elapsed expirations as hot data. If the numbers are equal, either of the two may be determined to be hot data.
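Case A, including the tie-breaking rule, can be sketched as a ranking function. The names `Heat` and `select_hot_rows` are hypothetical. Two choices here are assumptions rather than part of the application: the number of hot rows is rounded down, and a remaining tie is broken by row number (the text allows either row to be chosen).

```python
from typing import List, NamedTuple


class Heat(NamedTuple):
    row_id: int
    access_count: int
    expirations: int = 0    # elapsed expiration periods; stays 0 for approach 1)


def select_hot_rows(heat: List[Heat], hot_percent: int) -> List[int]:
    """Return the row IDs in the top hot_percent percent by access count."""
    k = len(heat) * hot_percent // 100          # integer arithmetic, rounded down
    ranked = sorted(heat,
                    key=lambda h: (-h.access_count, h.expirations, h.row_id))
    return [h.row_id for h in ranked[:k]]


heat = [Heat(1, 3, 2), Heat(2, 3, 0), Heat(3, 1, 0), Heat(4, 0, 1), Heat(5, 2, 0)]
print(select_hot_rows(heat, hot_percent=20))
```

Rows 1 and 2 both have an access count of 3, but row 2 has fewer elapsed expirations, so it is the single row selected when the threshold is 20% of five rows.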
B) The heat threshold entered by the user indicates the minimum number of accesses for hot data.
In some optional embodiments, the heat threshold configuration interface may be displayed as the threshold setting interface shown in FIG. 7b. As shown in FIG. 7b, the heat threshold set by the user is 100 accesses. In practical applications, the threshold setting interface may take other forms, and the value entered by the user may be a different value, which is not specifically limited here.
In the embodiments of this application, the heat threshold can be set by the user, which improves the user experience. In addition, the heat threshold can be defined in several ways, and a definition can be chosen according to the needs of the actual application, which further improves the flexibility of the technical solution of this application.
In some optional embodiments, the heat threshold may also be set according to the application type, or set by R&D or test personnel, which is not specifically limited here.
It can be understood that, after the network device has determined the hot data in the row data set in the manner described above, it can determine that the other data in the row data set is cold data, and can determine, from the location information set, the first location information corresponding to the hot data and the second location information corresponding to the cold data.
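For threshold definition B), splitting the location information set into first (hot) and second (cold) location information might look like the sketch below. The function name, the dictionary layout, and the use of integer log offsets are assumptions made for illustration, as is treating the threshold as inclusive.

```python
def partition_locations(access_counts, locations, min_access_count):
    """Split location info into (first, second) = (hot rows, cold rows)."""
    first, second = {}, {}
    for row_id, position in locations.items():
        # Assumption: a row whose count equals the threshold counts as hot.
        if access_counts.get(row_id, 0) >= min_access_count:
            first[row_id] = position
        else:
            second[row_id] = position
    return first, second


access_counts = {1: 250, 2: 100, 3: 7}   # accessCount per row number
locations = {1: 0, 2: 128, 3: 256}       # hypothetical position of each row in the log
first, second = partition_locations(access_counts, locations, 100)
print(first, second)
```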
In the embodiments of this application, after the data recovery procedure has been executed, the data in the in-memory database is back to normal, and the network device can receive data access requests and perform the corresponding access operations. That is, after step 303 of the embodiment shown in FIG. 3, the network device may obtain a second data access request from the terminal device, where the second data access request indicates access to second target row data in the row data set. If the second target row data is hot data, the hot data is accessed in memory; if the second target row data is cold data, the cold data is loaded according to the second location information.
The data access process after data recovery of the in-memory database is described below with reference to FIG. 8. FIG. 8 is a schematic flowchart of the data processing method provided by an embodiment of this application, which includes the following steps:
801. The network device obtains a second data access request from the terminal device.
The terminal device may send the second data access request to the network device, where the second data access request indicates access to the second target row data in the row data set. The second data access request may carry the row number of the second target row data, or a unique index of the second target row data, so that the network device can determine from the row number or the unique index that the row data being accessed is the second target row data. Accessing the second target row data includes reading or writing it, and may also include other operations on it, such as modifying its content, which is not specifically limited here.
802. The network device determines whether the second target row data indicated by the second data access request is hot data. If so, step 803 is performed; if not, step 804 is performed.
In the embodiments of this application, the row data in the row data set is divided into hot data and cold data, and the two are handled differently during data recovery. The network device therefore needs to determine whether the second target row data indicated by the second data access request is hot data, and perform the corresponding operation.
803. The network device accesses the hot data in memory.
If the second target row data is hot data, the network device can access the second target row data directly in memory.
804. The network device loads the cold data according to the second location information.
If the second target row data is cold data, the network device needs to load the cold data according to the second location information and then access the loaded cold data. The second location information indicates the position of the cold data in the log.
In some optional embodiments, after step 804, the network device may further perform step 805, in which the network device reports that the data access is complete. As with step 605 of the embodiment shown in FIG. 6, there are several ways for the network device to report completion in step 805, which are not repeated here. The difference is that in step 605 the completion information reported by the network device corresponds to the first target row data, whereas in step 805 it corresponds to the second target row data.
In the embodiments of this application, the row data set in the in-memory database is divided into hot data and cold data, and the two are handled differently during data recovery. Hot data, which is accessed frequently, is preloaded into memory and can be accessed directly in memory, which keeps system performance good. Cold data, which is accessed infrequently, is backed up only as location information and is loaded at run time according to that location information when it is accessed, which reduces the overall data recovery time of the in-memory database.
It can be understood that, in the embodiments of this application, the heat information of each row in the row data set also needs to be recorded during data access, to provide a basis for the next checkpoint operation.
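Steps 802 to 804, together with the heat recording mentioned above, can be sketched roughly as follows. The class and attribute names are hypothetical, the log is modelled as a simple list indexed by position, and the sketch does not keep a cold row in memory after loading it; the source does not say whether a real implementation would.

```python
class RecoveredStore:
    """Toy model of the in-memory database after data recovery."""

    def __init__(self, log, first_location, second_location):
        self.log = log
        # Recovery: hot rows are preloaded into memory using the first location info.
        self.memory = {row_id: log[pos] for row_id, pos in first_location.items()}
        # Cold rows are kept only as positions (second location info).
        self.cold_location = dict(second_location)
        self.access_count = {}

    def access(self, row_id):
        # Record heat information for the next checkpoint operation.
        self.access_count[row_id] = self.access_count.get(row_id, 0) + 1
        if row_id in self.memory:            # steps 802 and 803: hot data, served from memory
            return self.memory[row_id], "memory"
        position = self.cold_location[row_id]  # step 804: cold data, loaded at run time
        return self.log[position], "log"


log = ["a", "b", "c"]
store = RecoveredStore(log, first_location={10: 0}, second_location={11: 1, 12: 2})
hot_result = store.access(10)
cold_result = store.access(12)
print(hot_result, cold_result)
```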
The description above covers data processing on a single machine. The data processing method provided by the embodiments of this application can also be applied in scenarios where multiple network devices interact, as described below.
Refer to FIG. 9, which is a schematic structural diagram of a data processing system provided by an embodiment of this application.
As shown in FIG. 9, the data processing system includes a first network device and a second network device. The first network device may be called the primary network device and hosts the primary in-memory database. The second network device may be called the standby network device and hosts the standby (or secondary) in-memory database. A primary/standby in-memory database design makes high availability possible.
The primary and standby in-memory databases are kept in sync, so that when the primary in-memory database fails, the standby in-memory database can take over quickly. In the embodiments of this application, the first network device synchronizes hot checkpoint information to the second network device.
In some optional embodiments, the data processing system further includes a management device. The management device determines the state of the first network device and, when the first network device fails, sends a takeover signal to the second network device so that the second network device can quickly take over the work of the first network device.
Refer to FIG. 10, which is a schematic flowchart of the data processing method provided by an embodiment of this application. The method includes the following steps:
1001. The first network device obtains hot checkpoint information. The hot checkpoint information corresponds to the hot data in the row data set, and the access frequency of the hot data is higher than that of the other data in the row data set.
In some optional embodiments, the hot checkpoint information includes the hot data itself. How the first network device determines the hot data has been described in detail above and is not repeated here.
In some optional embodiments, the hot checkpoint information includes the location information corresponding to the hot data, which indicates the position of the hot data in the log. It can be understood that this location information is the first location information introduced above; how the network device obtains the first location information has been described in detail above and is not repeated here.
In the embodiments of this application, the hot checkpoint information can take several forms, and the form can be chosen according to the needs of the actual application, which improves the flexibility of the technical solution.
1002. The first network device sends the hot checkpoint information to the second network device.
The first network device sends the hot checkpoint information to the second network device asynchronously, so that after the first network device fails, the terminal device can access the hot data through the second network device.
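One possible reading of "asynchronously" is that the first network device hands the hot checkpoint information to a background sender and returns immediately. The sketch below models that with a queue and a worker thread; all class and method names are hypothetical, and the direct method call stands in for whatever transport a real system would use.

```python
import queue
import threading


class StandbyDevice:
    def __init__(self):
        self.received = []

    def receive(self, checkpoint):
        self.received.append(checkpoint)


class PrimaryDevice:
    def __init__(self, standby):
        self._standby = standby
        self._outbox = queue.Queue()
        # Background sender, so publishing never blocks request handling.
        self._sender = threading.Thread(target=self._send_loop, daemon=True)
        self._sender.start()

    def publish_hot_checkpoint(self, checkpoint):
        self._outbox.put(checkpoint)   # returns immediately

    def _send_loop(self):
        while True:
            checkpoint = self._outbox.get()
            self._standby.receive(checkpoint)
            self._outbox.task_done()

    def flush(self):
        # Block until everything queued so far has been delivered (used here for the demo).
        self._outbox.join()


standby = StandbyDevice()
primary = PrimaryDevice(standby)
primary.publish_hot_checkpoint({"rows": {1: "a"}})
primary.flush()
print(standby.received)
```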
1003. If the first network device fails, the second network device takes over the work of the first network device.
When the first network device fails, the second network device receives a takeover signal and, in response to that signal, takes over the work of the first network device. The work of the first network device includes accessing data in the in-memory database. This is described below with reference to FIG. 11, which is a schematic flowchart of the data processing method provided by an embodiment of this application. The method includes the following steps:
1101. The management device sends a takeover signal to the second network device.
After the first network device fails, the management device sends a takeover signal to the second network device to instruct it to take over the work of the first network device.
1102. The second network device reports that the takeover is complete.
After the second network device has completed the takeover, it may report the completion to the management device. Optionally, the takeover may be considered successful once the second network device has successfully received the hot checkpoint information that the first network device transmitted before failing.
1103. The terminal device sends a hot data access request to the second network device.
Because the first network device has failed and the second network device has taken over its work, data access requests are received by the second network device. The terminal device may send a hot data access request to the second network device, where the request indicates access to hot data in the row data set. Accessing hot data includes reading or writing it, and may also include other operations on it, such as modifying its content, which is not specifically limited here.
1104. The second network device accesses the hot data directly in memory.
The hot data access request indicates access to hot data. When the hot checkpoint information received by the second network device is the hot data itself, the second network device stores the hot data in memory, and can therefore access the hot data directly in memory.
Optionally, if the hot checkpoint information received by the second network device is the location information corresponding to the hot data, then after receiving the location information the second network device loads the hot data into memory according to it, so that the second network device can access the hot data in memory later.
1105. The second network device reports that the hot data access is complete.
In some optional embodiments, after step 1104, the second network device may further perform step 1105, in which it reports that the data access is complete. As with step 605 of the embodiment shown in FIG. 6, there are several ways for the second network device to report completion of the hot data access in step 1105, which are not repeated here. The difference is that in step 605 the executing entity is the network device and the reported completion information corresponds to the first target row data, whereas in step 1105 the executing entity is the second network device and the reported completion information corresponds to the hot data.
In some optional embodiments, the row data set includes cold data in addition to hot data, and the second network device can also provide access to the cold data, that is, perform steps 1106 to 1108. It can be understood that FIG. 11 is only an example of the technical solution of this application and does not limit the order of the steps; there is no required order between step 1106 and step 1103.
1106. The terminal device sends a cold data access request to the second network device.
The terminal device may send a cold data access request to the second network device, where the request indicates access to cold data in the row data set. Accessing cold data includes reading or writing it, and may also include other operations on it, such as modifying its content, which is not specifically limited here.
1107. The second network device loads the cold data at run time according to the location information of the cold data.
The first network device and the second network device share a disk. The first network device may perform the operations shown in FIG. 4a to save the location information set corresponding to the row data set to that disk. The second network device can obtain the location information corresponding to the cold data from the disk and load the cold data at run time based on that location information, thereby completing the access to the cold data.
1108. The second network device reports that the cold data access is complete.
In some optional embodiments, after step 1107, the second network device may further perform step 1108, in which it reports that the data access is complete. As with step 605 of the embodiment shown in FIG. 6, there are several ways for the second network device to report completion of the cold data access in step 1108, which are not repeated here. The difference is that in step 605 the executing entity is the network device and the reported completion information corresponds to the first target row data, whereas in step 1108 the executing entity is the second network device and the reported completion information corresponds to the cold data.
In the embodiments of this application, the first network device synchronizes hot checkpoint information, rather than log information, to the second network device. This keeps the synchronization speeds of the first and second network devices matched, better fits the needs of practical applications, and improves the practicability of the technical solution of this application.
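The takeover flow of FIG. 11 (steps 1101 to 1107) can be sketched as below. This is an illustration only: `SharedDisk`, `Standby`, and `Manager` are hypothetical names, the shared disk is modelled as an in-process object, and reporting success only when hot checkpoint information has arrived follows the optional rule described for step 1102.

```python
class SharedDisk:
    """Hypothetical stand-in for the disk shared by both network devices."""

    def __init__(self, log, cold_location):
        self.log = log                      # row images, addressed by position
        self.cold_location = cold_location  # row id -> position in the log


class Standby:
    def __init__(self, disk):
        self.disk = disk
        self.memory = {}     # hot rows received from the primary before it failed
        self.active = False

    def receive_hot_checkpoint(self, hot_rows):
        self.memory.update(hot_rows)

    def take_over(self):
        # Step 1102: report success only if hot checkpoint information was received.
        self.active = bool(self.memory)
        return self.active

    def access(self, row_id):
        if not self.active:
            raise RuntimeError("standby has not taken over")
        if row_id in self.memory:                  # steps 1103 and 1104: hot data
            return self.memory[row_id]
        position = self.disk.cold_location[row_id]  # steps 1106 and 1107: cold data
        return self.disk.log[position]


class Manager:
    def on_primary_failure(self, standby):
        # Step 1101: send the takeover signal to the second network device.
        return standby.take_over()


disk = SharedDisk(log=["r0", "r1", "r2"], cold_location={7: 2})
standby = Standby(disk)
standby.receive_hot_checkpoint({3: "r0"})
took_over = Manager().on_primary_failure(standby)
print(took_over, standby.access(3), standby.access(7))
```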
Note that, in the embodiments of this application, when the first network device has not failed, the first network device may also perform the operations performed by the network device in the embodiments shown in FIG. 1 to FIG. 8; when the first network device has failed, the second network device may also perform those operations. Details are not repeated here.
An embodiment of this application further provides a data processing method applied to a data processing system that includes a first network device and a second network device.
This method is described below with reference to FIG. 12, which is a flowchart of the data processing method provided by an embodiment of this application. The method includes the following steps:
1201. The first network device obtains hot checkpoint information. The hot checkpoint information corresponds to the hot data in the row data set, and the access frequency of the hot data is higher than that of the other data in the row data set.
1202. The first network device sends the hot checkpoint information to the second network device.
Steps 1201 and 1202 are similar to steps 1001 and 1002 of the embodiment shown in FIG. 10; see the description of steps 1001 and 1002 for details, which are not repeated here.
1203. The second network device loads the hot data into memory according to the hot checkpoint information.
After obtaining the hot checkpoint information, the second network device loads the hot data into memory. Specifically, if the hot checkpoint information is the hot data itself, the second network device stores the hot data in memory; if the hot checkpoint information is the location information corresponding to the hot data, the second network device loads the hot data into memory according to that location information.
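The two forms of hot checkpoint information lead to two loading paths in step 1203, which can be sketched as follows. The `kind`, `rows`, and `positions` keys are a hypothetical encoding chosen for this example, and the log is again modelled as a list indexed by position.

```python
def load_hot_data(checkpoint, log):
    """Return the hot rows to keep in memory, for either form of hot checkpoint info."""
    kind = checkpoint["kind"]
    if kind == "data":
        # The checkpoint carries the hot rows themselves: store them as they are.
        return dict(checkpoint["rows"])
    if kind == "location":
        # The checkpoint carries positions only: read each hot row from the log.
        return {row_id: log[pos] for row_id, pos in checkpoint["positions"].items()}
    raise ValueError(f"unknown hot checkpoint kind: {kind!r}")


log = ["row-a", "row-b", "row-c"]
by_data = load_hot_data({"kind": "data", "rows": {1: "row-a"}}, log)
by_location = load_hot_data({"kind": "location", "positions": {1: 0, 3: 2}}, log)
print(by_data, by_location)
```

Sending the rows themselves avoids a log read on the standby, while sending only positions keeps the synchronized payload small; the source leaves the choice to the application.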
1204. If the first network device fails, the second network device receives a data access request that indicates access to the hot data.
When the first network device has failed, data access operations are performed by the second network device. The second network device may receive a data access request indicating access to the hot data.
1205. The second network device responds to the data access request by accessing the hot data in memory.
In some optional embodiments, the second network device may also perform the operations of steps 1106 to 1108 of the embodiment shown in FIG. 11, which are not repeated here.
In the embodiments of this application, the first network device synchronizes hot checkpoint information, rather than log information, to the second network device. This keeps the synchronization speeds of the first and second network devices matched, better fits the needs of practical applications, and improves the practicability of the technical solution of this application.
Note that, in the embodiments of this application, when the first network device has not failed, the first network device may also perform the operations performed by the network device in the embodiments shown in FIG. 1 to FIG. 8; when the first network device has failed, the second network device may also perform those operations. Details are not repeated here.
Next, the related apparatuses and devices provided by the embodiments of this application are described.
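Refer to FIG. 13, which is a schematic structural diagram of a network device provided by an embodiment of this application. The network device 1300 includes:
an obtaining unit 1301, configured to obtain a location information set of a row data set, where the row data set includes hot data, the location information set includes first location information corresponding to the hot data, and the access frequency of the hot data is higher than that of the other data in the row data set.
The obtaining unit 1301 is further configured to obtain a data recovery instruction, where the data recovery instruction instructs recovery of the data in the row data set.
A processing unit 1302 is configured to respond to the data recovery instruction by loading the hot data into memory according to the first location information.
In some optional embodiments, the processing unit 1302 is further configured to record heat information for each row of data in the row data set, where the heat information indicates the access frequency of each row of data.
In some optional embodiments, the heat information includes the number of accesses, and the processing unit 1302 is specifically configured to: obtain a first data access request, where the first data access request indicates access to first target row data in the row data set; and increase the number of accesses corresponding to the first target row data according to the first data access request.
In some optional embodiments, the processing unit 1302 is further configured to: provide a heat threshold configuration interface, where the heat threshold configuration interface is used to obtain a heat threshold entered by the user; and determine the hot data from the row data set according to the heat information and the heat threshold.
In some optional embodiments, the other data includes cold data, and the location information set further includes second location information corresponding to the cold data.
The obtaining unit 1301 is further configured to obtain a second data access request from the client, where the second data access request indicates access to second target row data in the row data set.
The processing unit 1302 is further configured to: if the second target row data is hot data, access the hot data in memory; if the second target row data is cold data, load the cold data according to the second location information.
The network device 1300 is configured to perform the operations performed by the network device in the embodiments shown in FIG. 1 to FIG. 8, which are not repeated here.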
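Refer to FIG. 14, which is a schematic structural diagram of a first network device provided by an embodiment of this application. The first network device 1400 is part of a data processing system that also includes a second network device. The first network device 1400 includes:
an obtaining unit 1401, configured to obtain hot checkpoint information, where the hot checkpoint information corresponds to the hot data in the row data set, and the access frequency of the hot data is higher than that of the other data in the row data set; and
a sending unit 1402, configured to send the hot checkpoint information to the second network device, so that the hot data can be accessed through the second network device after the first network device fails.
In some optional embodiments, the hot checkpoint information includes the hot data.
In some optional embodiments, the hot checkpoint information includes the location information corresponding to the hot data, where the location information indicates the position of the hot data in the log.
The first network device 1400 is configured to perform the operations performed by the first network device in the embodiments shown in FIG. 9 to FIG. 11, which are not repeated here.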
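Refer to FIG. 15, which is a schematic structural diagram of a data processing system provided by an embodiment of this application. The data processing system 1500 includes a first network device 1501 and a second network device 1502.
The first network device 1501 is configured to: obtain hot checkpoint information, where the hot checkpoint information corresponds to the hot data in the row data set, and the access frequency of the hot data is higher than that of the other data in the row data set; and send the hot checkpoint information to the second network device.
The second network device 1502 is configured to: load the hot data into memory according to the hot checkpoint information; if the first network device fails, receive a data access request that indicates access to the hot data; and respond to the data access request by accessing the hot data in memory.
The data processing system 1500 is configured to implement the functions of the data processing system in the embodiments shown in FIG. 9 to FIG. 12, which are not repeated here.
The network device provided by an embodiment of this application is described below. Refer to FIG. 16, which is a schematic structural diagram of the network device provided by an embodiment of this application.
The network device 1600 includes a processor 1601 and a memory 1602, and the memory 1602 stores one or more application programs or data.
The memory 1602 may be volatile storage or persistent storage. The program stored in the memory 1602 may include one or more modules, and each module may be used to perform a series of operations performed by the network device 1600. Further, the processor 1601 may communicate with the memory 1602 and execute, on the network device 1600, the series of instruction operations stored in the memory 1602. The processor 1601 may be a central processing unit (CPU), and may be a single-core processor or another type of processor, such as a dual-core processor, which is not specifically limited here.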
The network device 1600 may further include one or more communication interfaces 1603 and one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The network device 1600 can perform the operations performed by the network device in the embodiments shown in FIG. 1 to FIG. 8, as well as the operations performed by the first network device in the embodiments shown in FIG. 9 to FIG. 11, which are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (21)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111582635.3A CN116340051A (en) | 2021-12-22 | 2021-12-22 | Data processing method, related device and equipment |
CN202111582635.3 | 2021-12-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023115935A1 true WO2023115935A1 (en) | 2023-06-29 |
Family
ID=86879272
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/107586 WO2023115935A1 (en) | 2021-12-22 | 2022-07-25 | Data processing method, and related apparatus and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116340051A (en) |
WO (1) | WO2023115935A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014209234A1 (en) * | 2013-06-26 | 2014-12-31 | Agency For Science, Technology And Research | Method and apparatus for hot data region optimized dynamic management |
CN106547484A (en) * | 2016-10-20 | 2017-03-29 | 华中科技大学 | It is a kind of that internal storage data reliability method and system realized based on RAID5 |
CN111459710A (en) * | 2020-03-27 | 2020-07-28 | 华中科技大学 | Erasure code memory recovery method, device and memory system capable of sensing heat degree and risk |
CN111831423A (en) * | 2019-04-15 | 2020-10-27 | 阿里巴巴集团控股有限公司 | A method and system for implementing Redis in-memory database on non-volatile memory |
- 2021
  - 2021-12-22: CN application CN202111582635.3A, published as CN116340051A (en); status: active, Pending
- 2022
  - 2022-07-25: WO application PCT/CN2022/107586, published as WO2023115935A1 (en); status: active, Application Filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117909601A (en) * | 2024-03-15 | 2024-04-19 | 厦门她趣信息技术有限公司 | Payment social matching method, device, equipment and readable storage medium |
CN117909601B (en) * | 2024-03-15 | 2024-05-17 | 厦门她趣信息技术有限公司 | Payment social matching method, device, equipment and readable storage medium |
CN118296015A (en) * | 2024-06-05 | 2024-07-05 | 恒生电子股份有限公司 | Data processing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN116340051A (en) | 2023-06-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11397648B2 (en) | Virtual machine recovery method and virtual machine management device | |
EP3726365A1 (en) | Data processing method and device | |
EP3722973B1 (en) | Data processing method and device for distributed database, storage medium, and electronic device | |
EP2474919A1 (en) | System and method for data replication between heterogeneous databases | |
US20150213100A1 (en) | Data synchronization method and system | |
WO2019070915A1 (en) | Partial database restoration | |
EP2976714B1 (en) | Method and system for byzantine fault tolerant data replication | |
CN111078667B (en) | Data migration method and related device | |
US20190155705A1 (en) | Coordinated Replication of Heterogeneous Database Stores | |
CN107506266B (en) | Data recovery method and system | |
CN103516736A (en) | Data recovery method of distributed cache system and a data recovery device of distributed cache system | |
WO2016115217A1 (en) | Data backup method and apparatus | |
WO2023115935A1 (en) | Data processing method, and related apparatus and device | |
US11748215B2 (en) | Log management method, server, and database system | |
WO2019020081A1 (en) | Distributed system and fault recovery method and apparatus thereof, product, and storage medium | |
US12210505B2 (en) | Operation request processing method, apparatus, device, readable storage medium, and system | |
EP3480705B1 (en) | Database data modification request processing method and apparatus | |
US10409691B1 (en) | Linking backup files based on data partitions | |
US8843450B1 (en) | Write capable exchange granular level recoveries | |
WO2022033269A1 (en) | Data processing method, device and system | |
US20130290385A1 (en) | Durably recording events for performing file system operations | |
CN112650629B (en) | Block chain index data recovery method, device, equipment and computer storage medium | |
CN111404737B (en) | Disaster recovery processing method and related device | |
CN114791901A (en) | Data processing method, device, equipment and storage medium | |
CN114490570A (en) | Production data synchronization method and device, data synchronization system and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22909265; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 22909265; Country of ref document: EP; Kind code of ref document: A1 |