
CN109947709B - Data storage method and device


Info

Publication number
CN109947709B
Authority
CN
China
Prior art keywords
data
target
index information
directory
file
Legal status
Active
Application number
CN201910261866.0A
Other languages
Chinese (zh)
Other versions
CN109947709A (en)
Inventor
田勇
司春峰
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910261866.0A
Publication of CN109947709A
Application granted
Publication of CN109947709B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a data storage method and device. One embodiment of the method includes: determining index information of data in a target data set, where the target data set contains multiple pieces of data and each piece of data consists of a key and a value; and adding the index information of each piece of data to the target data set. Because the index information is determined in advance, data can later be located by scanning the index information alone. This avoids the long delay caused by scanning every key-value pair across the whole disk at startup and improves data processing efficiency.

Description

Data storage method and device
Technical Field
The embodiments of the present application relate to the field of computer technology, specifically to Internet technology, and in particular to a data storage method and device.
Background
With the development of data storage technology, many data structures have emerged. Existing key-value storage must scan every key-value pair at startup and before cleaning up garbage data.
As a result, starting the store and cleaning up garbage data are inefficient, and data processing takes longer.
Disclosure of Invention
The embodiment of the application provides a data storage method and device.
In a first aspect, an embodiment of the present application provides a data storage method, including: determining index information of data in a target data set, wherein the target data set comprises a plurality of pieces of data, and each piece of data comprises a key and a value; and adding the index information of each piece of data to the target data set.
In some embodiments, the target data set belongs to a total data set that includes at least one data set; each data set is stored under at least two directories, each directory holds a preset number of files, and the preset number is less than or equal to a preset file number threshold.
In some embodiments, determining the index information of data in the target data set includes: composing the index information from the key together with at least one of the following: the sequence number of the data file in which the data is located, the length of the data, the offset address of the data, and the number of the directory corresponding to the data, where the sequence number represents the order of the files within the directory.
In some embodiments, the method further includes: scanning the index information of each piece of data in response to receiving an operation instruction on the target data set.
In some embodiments, adding the index information of each piece of data to the target data set includes: writing the index information of each piece of data into an index file under the directory in which the target data set is located, where index files and their corresponding data files are stored under that directory.
In some embodiments, the method further includes: writing the valid index information in the index file, that is, the index information that indicates valid data, into a memory space.
In some embodiments, scanning the index information of each piece of data in response to receiving an operation instruction on the target data set includes: in response to receiving any of a search instruction, a modify instruction, a delete instruction, an add-data instruction, or a read instruction for the target data set, scanning the valid index information of each piece of data in the memory space; and in response to receiving a rewrite instruction for the target data set, scanning both the index file and the valid index information of each piece of data in the memory space.
In some embodiments, the method is applied to a target electronic device on which the target data set is stored, and the method further includes: in response to receiving a shutdown instruction for the target electronic device, persisting the valid index information in the memory space to disk; and in response to receiving a startup instruction for the target electronic device, writing the persisted valid index information back into the memory space.
In some embodiments, each data set is stored under two directories.
In some embodiments, the method further includes: receiving a data rewrite instruction; starting from the data file with either the largest or the smallest sequence number under a preset target directory among the at least two directories, rewriting the valid data under the target directory in sequence-number order to obtain rewritten data; receiving a data update instruction; and starting from the data file at the other end (the smallest or the largest sequence number, respectively) under the target directory, adding the update data under the target directory in sequence-number order.
In some embodiments, update data is added under the target directory before the rewriting of valid data under the target directory has finished; or the rewriting of valid data under the target directory finishes before the adding of update data under the target directory has finished.
In some embodiments, rewriting the valid data under the target directory in sequence-number order includes: in response to determining that the current data file produced by the rewrite has reached a preset file size threshold, continuing the rewrite in the next data file in sequence-number order.
In some embodiments, adding the update data under the target directory in sequence-number order includes: in response to determining that the current data file produced by the update has reached the preset file size threshold, adding update data to the next data file in sequence-number order.
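The two-ended scheme in the preceding embodiments can be pictured with a small in-memory model. This is an illustrative sketch under stated assumptions, not the patented implementation: the target directory holds a hypothetical `NUM_FILES` data files numbered 0 to `NUM_FILES - 1`, rewriting starts at the smallest sequence number and moves upward, updates start at the largest and move downward, and each cursor moves to the next file once the current file has reached a hypothetical `FILE_SIZE_LIMIT` in bytes. The class name `TargetDirectory` and the error on cursor collision are also assumptions.

```python
from dataclasses import dataclass, field

NUM_FILES = 8          # hypothetical preset number of data files in the target directory
FILE_SIZE_LIMIT = 16   # hypothetical preset file size threshold, in bytes


@dataclass
class TargetDirectory:
    # Each bytearray stands in for one data file; the list index is its sequence number.
    files: list = field(default_factory=lambda: [bytearray() for _ in range(NUM_FILES)])
    rewrite_fno: int = 0              # rewriting starts from the smallest sequence number
    update_fno: int = NUM_FILES - 1   # updating starts from the largest sequence number

    def _append(self, fno: int, record: bytes) -> tuple:
        """Append a record to file `fno`; return (fno, offset, length) for indexing."""
        off = len(self.files[fno])
        self.files[fno] += record
        return fno, off, len(record)

    def rewrite(self, record: bytes) -> tuple:
        """Write one piece of valid data during a rewrite, moving to higher numbers."""
        if len(self.files[self.rewrite_fno]) >= FILE_SIZE_LIMIT:
            nxt = self.rewrite_fno + 1
            if nxt >= self.update_fno:
                raise RuntimeError("rewrite cursor reached the update cursor")
            self.rewrite_fno = nxt
        return self._append(self.rewrite_fno, record)

    def update(self, record: bytes) -> tuple:
        """Write one piece of update data, moving to lower numbers."""
        if len(self.files[self.update_fno]) >= FILE_SIZE_LIMIT:
            nxt = self.update_fno - 1
            if nxt <= self.rewrite_fno:
                raise RuntimeError("update cursor reached the rewrite cursor")
            self.update_fno = nxt
        return self._append(self.update_fno, record)
```

Because the two cursors start at opposite ends, rewrite and update calls can be interleaved freely without writing to the same file, which is one way to read the ordering freedom described above. The size check happens before each append, so a file may exceed the threshold by at most one record.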
In some embodiments, the method further includes: determining index information of the update data and writing it into the target directory; and determining index information of the rewritten data and writing it into the target directory.
In some embodiments, the method further includes: migrating data in the directories other than the target directory into the target directory.
In a second aspect, an embodiment of the present application provides a data storage device, including: a determining unit configured to determine index information of data in a target data set, where the target data set includes multiple pieces of data, each including a key and a value; and an adding unit configured to add the index information of each piece of data to the target data set.
In some embodiments, the target data set belongs to a total data set that includes at least one data set; each data set is stored under at least two directories, each directory holds a preset number of files, and the preset number is less than or equal to a preset file number threshold.
In some embodiments, determining the index information of data in the target data set includes: composing the index information from the key together with at least one of the following: the sequence number of the data file in which the data is located, the length of the data, the offset address of the data, and the number of the directory corresponding to the data, where the sequence number represents the order of the files within the directory.
In some embodiments, the apparatus further includes: a scanning unit configured to scan the index information of each piece of data in response to receiving an operation instruction on the target data set.
In some embodiments, the adding unit is further configured to write the index information of each piece of data into an index file under the directory in which the target data set is located, where index files and their corresponding data files are stored under that directory.
In some embodiments, the apparatus further includes: a memory writing unit configured to write the valid index information in the index file, that is, the index information that indicates valid data, into a memory space.
In some embodiments, the scanning unit is configured to scan the index information of each piece of data, in response to receiving an operation instruction on the target data set, as follows: in response to receiving any of a search instruction, a modify instruction, a delete instruction, an add-data instruction, or a read instruction for the target data set, scan the valid index information of each piece of data in the memory space; and in response to receiving a rewrite instruction for the target data set, scan both the index file and the valid index information of each piece of data in the memory space.
In some embodiments, the apparatus is applied to a target electronic device on which the target data set is stored, and the apparatus further includes: a persistence unit configured to persist the valid index information in the memory space to disk in response to receiving a shutdown instruction for the target electronic device; and a writing unit configured to write the persisted valid index information into the memory space in response to receiving a startup instruction for the target electronic device.
In some embodiments, the scanning unit is configured to scan the index information of each piece of data, in response to receiving an operation instruction on the target data set, as follows: in response to receiving an operation instruction on the target data set, scan the index information of each piece of data in the index file.
In some embodiments, each data set is stored under two directories.
In some embodiments, the apparatus further includes: a first receiving unit configured to receive a data rewrite instruction; a rewriting unit configured to start from the data file with either the largest or the smallest sequence number under a preset target directory among the at least two directories and rewrite the valid data under the target directory in sequence-number order to obtain rewritten data; a second receiving unit configured to receive a data update instruction; and an updating unit configured to start from the data file at the other end (the smallest or the largest sequence number, respectively) under the target directory and add update data under the target directory in sequence-number order.
In some embodiments, update data is added under the target directory before the rewriting of valid data under the target directory has finished; or the rewriting of valid data under the target directory finishes before the adding of update data under the target directory has finished.
In some embodiments, the rewriting unit is further configured to rewrite the valid data under the target directory in sequence-number order as follows: in response to determining that the current data file produced by the rewrite has reached a preset file size threshold, continue the rewrite in the next data file in sequence-number order.
In some embodiments, the updating unit is further configured to add the update data under the target directory in sequence-number order as follows: in response to determining that the current data file produced by the update has reached the preset file size threshold, add update data to the next data file in sequence-number order.
In some embodiments, the apparatus further includes: a first index determining unit configured to determine index information of the update data and write it into the target directory; and a second index determining unit configured to determine index information of the rewritten data and write it into the target directory.
In some embodiments, the apparatus further includes: a migration unit configured to migrate data in the directories other than the target directory into the target directory.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the data storage method.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any embodiment of the data storage method.
In the data storage scheme provided by the embodiments of the present application, index information of data in a target data set is determined first, where the target data set includes multiple pieces of data and each piece of data includes a key and a value. The index information of each piece of data is then added to the target data set. Because the index information is determined in advance, data can later be located by scanning the index information alone. This avoids the long delay caused by scanning every key-value pair across the whole disk at startup and improves data processing efficiency.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2a is a flow diagram of one embodiment of a data storage method according to the present application;
FIG. 2b is a schematic illustration of the organization of values of data according to the data storage method of the present application;
FIG. 3a is a flow chart of yet another embodiment of a data storage method according to the present application;
FIG. 3b is a schematic representation of the organization of an index according to the data storage method of the present application;
FIG. 4 is a schematic illustration of data updating and data rewriting according to the data storage method of the present application;
FIG. 5 is a schematic block diagram of one embodiment of a data storage device according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the relevant invention and do not limit it. Note also that, for ease of description, the drawings show only the parts related to the invention.
Note that the embodiments of the present application and the features in them may be combined with each other provided they do not conflict. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the data storage method or data storage apparatus of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 provides the communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104, for example to receive or send messages. Various client applications may be installed on the terminal devices 101, 102, 103, such as data storage applications, video applications, live-streaming applications, instant messaging tools, email clients, and social platform software. When a client application on a terminal device 101, 102, 103 starts, it may read the stored data.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may provide various services; for example, it may be a back-end data server supporting the terminal devices 101, 102, 103. The back-end data server may analyze and otherwise process data in a received target data set and return the processing result (for example, the target data set with index information added) to the terminal devices.
Note that the data storage method provided by the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, 103, and accordingly the data storage apparatus may be disposed in the server 105 or in the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2a, a flow 200 of one embodiment of a data storage method according to the present application is shown. The data storage method comprises the following steps:
Step 201: determine index information of data in a target data set, where the target data set includes multiple pieces of data, each including a key and a value.
In this embodiment, the execution body of the data storage method (for example, the server or terminal device shown in FIG. 1) may determine the index information of each piece of data. The target data set is stored as key-value pairs. Each piece of index information indicates where one piece of data is stored, so the data it indicates can be located through it. The index information determined here includes not only the key but also other information used to locate the data, for example the number of the directory corresponding to the data or the sequence number of the file within that directory. The execution body may determine the index information in various ways: for example, it may compose the index information from the key and the number of the directory corresponding to the data, or from the key and a physical address.
In practice, a piece of data may be organized as shown in FIG. 2b, where the header records the length of the key and the length of the value of that piece of data, value is the data itself, and crc is the check value computed over the header and the value.
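A minimal sketch of the FIG. 2b record layout follows, under several assumptions that the text does not fix: the header is two little-endian 32-bit lengths (key length, value length), the key bytes are stored immediately after the header, and the check value is a CRC-32 from Python's `zlib.crc32`. The text says the crc covers the header and the value; covering the key as well is an assumption made here so that the whole record is protected. The function names are hypothetical.

```python
import struct
import zlib

HEADER = struct.Struct("<II")   # assumed header: key length, value length (uint32 each)
CRC = struct.Struct("<I")       # assumed trailer: CRC-32 of header + key + value


def encode_record(key: bytes, value: bytes) -> bytes:
    """Serialize one key-value pair as header | key | value | crc."""
    body = HEADER.pack(len(key), len(value)) + key + value
    return body + CRC.pack(zlib.crc32(body))


def decode_record(buf: bytes, off: int = 0) -> tuple:
    """Parse the record starting at `off`; return (key, value, next_offset)."""
    klen, vlen = HEADER.unpack_from(buf, off)
    start = off + HEADER.size
    end = start + klen + vlen
    (crc,) = CRC.unpack_from(buf, end)
    if zlib.crc32(buf[off:end]) != crc:
        raise ValueError(f"checksum mismatch at offset {off}")
    key = bytes(buf[start:start + klen])
    value = bytes(buf[start + klen:end])
    return key, value, end + CRC.size
```

Reading a data file this way means decoding every key and value in sequence, which is exactly the full scan that a separate index is meant to avoid at startup.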
Step 202: add the index information of each piece of data to the target data set.
In this embodiment, the execution body may add the determined index information of each piece of data to the target data set, so that the target data set contains both the index information and the data that each piece of index information indicates.
The method provided by the above embodiment determines the index information of the data in advance, so that data can later be located by scanning the index information. This avoids the long delay caused by scanning every key-value pair across the whole disk at startup and improves data processing efficiency.
With further reference to FIG. 3a, a flow 300 of yet another embodiment of a data storage method is illustrated. The process 300 of the data storage method includes the following steps:
Step 301: determine index information of data in a target data set, where the target data set includes multiple pieces of data, each including a key and a value.
In this embodiment, the execution body of the data storage method (for example, the server or terminal device shown in FIG. 1) may determine the index information of each piece of data. The target data set is stored as key-value pairs. Each piece of index information indicates where one piece of data is stored, so the data it indicates can be located through it. The index information determined here includes not only the key but also other information used to locate the data.
Step 302: add the index information of each piece of data to the target data set.
In this embodiment, the execution body may add the determined index information of each piece of data to the target data set, so that the target data set contains both the index information and the data that each piece of index information indicates.
Step 303: in response to receiving an operation instruction on the target data set, scan the index information of each piece of data.
In this embodiment, the execution body may scan the index information of each piece of data in response to receiving an operation instruction on the target data set. Scanning the index information reveals the position of each piece of data, which makes subsequent operations such as searching for data easier. The operation instruction may be any of various instructions, such as a data search instruction or a data modification instruction.
Scanning the index information instead of the key-value pairs lets the data be addressed efficiently, noticeably shortens scanning time, and improves data processing efficiency.
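To make the contrast concrete, the sketch below rebuilds a key-to-location map by reading only compact index entries instead of decoding every key-value record. The entry layout (2-byte key length, 1-byte directory number, 1-byte file sequence number, 4-byte offset, 4-byte length, followed by the key) is a hypothetical choice for illustration, since the text does not fix field widths; `io.BytesIO` stands in for an index file on disk, and the rule that a later entry for the same key replaces an earlier one is likewise an assumption.

```python
import io
import struct

ENTRY = struct.Struct("<HBBII")  # assumed layout: key_len, dir, fno, off, len


def pack_entry(key: bytes, dir_no: int, fno: int, off: int, length: int) -> bytes:
    """Encode one hypothetical index entry: fixed fields followed by the key."""
    return ENTRY.pack(len(key), dir_no, fno, off, length) + key


def scan_index(stream) -> dict:
    """Read index entries sequentially and return key -> (dir, fno, off, len)."""
    locations = {}
    while True:
        fixed = stream.read(ENTRY.size)
        if not fixed:
            break  # clean end of the index file
        if len(fixed) < ENTRY.size:
            raise ValueError("truncated index entry")
        key_len, dir_no, fno, off, length = ENTRY.unpack(fixed)
        key = stream.read(key_len)
        if len(key) < key_len:
            raise ValueError("truncated key")
        # A later entry for the same key supersedes the earlier location.
        locations[key] = (dir_no, fno, off, length)
    return locations
```

The values themselves are never read during this scan; only the 12 fixed bytes plus the key of each entry are touched, which is why the scan can be much shorter than a full key-value scan when values are large.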
In some optional implementations of this embodiment, step 302 includes: writing the index information of each piece of data into an index file under the directory in which the target data set is located, where index files and their corresponding data files are stored under that directory.
In these optional implementations, the execution body may write the determined index information of each piece of data into an index file, so that the index information and the data it indicates reside in the same directory, one of the directories in which the target data set is stored. Each data file has a corresponding index file in the same directory, and that index file stores the index information of the data in the data file. The index file is typically stored on disk. Even within the same directory, the file holding the index information is separate from the file holding the data.
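A minimal sketch of this pairing, under hypothetical names and formats: each data file `data_<fno>` in a directory has a companion `index_<fno>` in the same directory, and appending a record to the data file also appends one index entry (key, directory number, file sequence number, offset, length) to the companion index file. The file names, the record header and the index entry layout are all assumptions for illustration, and the record format here omits a checksum to keep the example short.

```python
import os
import struct
import tempfile

ENTRY = struct.Struct("<HBBII")   # assumed index entry: key_len, dir, fno, off, len
HEADER = struct.Struct("<II")     # assumed record header: key length, value length


def append_with_index(directory: str, dir_no: int, fno: int,
                      key: bytes, value: bytes) -> tuple:
    """Append a record to data_<fno> and its index entry to index_<fno>."""
    record = HEADER.pack(len(key), len(value)) + key + value
    data_path = os.path.join(directory, f"data_{fno}")
    index_path = os.path.join(directory, f"index_{fno}")
    with open(data_path, "ab") as data_file:
        off = data_file.tell()        # append mode positions the stream at end of file
        data_file.write(record)
    entry = (dir_no, fno, off, len(record))
    with open(index_path, "ab") as index_file:
        index_file.write(ENTRY.pack(len(key), *entry) + key)
    return entry
```

For example, inside a `tempfile.TemporaryDirectory()`, appending `(b"k1", b"v1")` and then `(b"k2", b"value2")` to file 0 yields index entries at offsets 0 and 12, because the first record is 8 header bytes plus 2 key bytes plus 2 value bytes.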
Because these implementations keep index information and data in separate files, only the index information in the index files needs to be scanned at startup and before garbage data is cleaned up, so the locations of the data can be determined quickly. The index files can also speed up building the index information in the memory space for further use.
In a further optional implementation, the valid index information in the index file, that is, the index information that indicates valid data, is written into the memory space.
In these optional implementations, the execution body may write only valid index information into the memory space. When an operation instruction is received later, the valid index information in memory can be scanned; alternatively, the valid index information can be stored on disk and the copy on disk scanned. Either way, invalid data is not scanned, which avoids wasted scanning and speeds it up.
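A minimal sketch of an in-memory valid index, assuming a plain Python dict keyed by the data key: writing new data for a key overwrites its entry, so only the newest location stays valid, and deleting a key removes its entry entirely. The class and method names are hypothetical, and a production store would likely use a more compact structure than a dict of tuples.

```python
class MemoryIndex:
    """Holds only valid index information: at most one current location per key."""

    def __init__(self):
        self._valid = {}   # key -> (dir, fno, off, len)

    def put(self, key: bytes, location: tuple) -> None:
        # A newer write supersedes the old location, which thereby becomes invalid.
        self._valid[key] = location

    def delete(self, key: bytes) -> None:
        # Deleted data keeps no valid index information.
        self._valid.pop(key, None)

    def lookup(self, key: bytes):
        """Return the current location of `key`, or None if it has no valid data."""
        return self._valid.get(key)

    def __len__(self) -> int:
        return len(self._valid)
```

Since superseded and deleted entries never stay in the map, a search or read only ever touches valid index information.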
In some optional cases of the above further optional implementations, step 303 may include:
in response to receiving any of a search instruction, a modify instruction, a delete instruction, an add-data instruction, or a read instruction for the target data set, scanning the valid index information of each piece of data in the memory space; and in response to receiving a rewrite instruction for the target data set, scanning both the index file and the valid index information of each piece of data in the memory space.
In these optional cases, if any of the above instructions for data in the target data set is received, the valid index information in the memory space is scanned, taking advantage of the higher speed of memory. If the received instruction is a rewrite instruction, which rewrites valid data on disk, both the index information on disk and the valid index information in memory must be scanned. Because the memory space deletes the corresponding invalid entries whenever updated data is received, checking against the valid index information in memory ensures that the data rewritten on disk is valid. The add-data instruction may add update data that replaces existing data, or it may add other data.
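The validity check used during a rewrite can be sketched as follows, under the assumption that an entry read from the on-disk index file is valid exactly when the in-memory valid index still maps its key to the same (directory, file, offset, length) location; entries for data that was later overwritten or deleted fail this test and are skipped. `plan_rewrite` and the tuple layout are hypothetical.

```python
def plan_rewrite(index_file_entries, memory_index: dict) -> list:
    """Return the index-file entries whose data must be copied during a rewrite.

    index_file_entries: iterable of (key, location) pairs read from the index file.
    memory_index: dict mapping each key to its current valid location.
    """
    return [
        (key, location)
        for key, location in index_file_entries
        # Keep an entry only if memory still points at this exact location.
        if memory_index.get(key) == location
    ]
```

Only the surviving records would be copied to their new position; after copying, the in-memory entries would be updated to point at the new locations.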
Optionally, the method in the above case may be applied to a target electronic device on which the target data set is stored, and the method further includes: in response to receiving a shutdown instruction for the target electronic device, persisting the valid index information in the memory space to disk; and in response to receiving a startup instruction for the target electronic device, writing the persisted valid index information into the memory space.
The execution body can persist the valid index information in memory to disk when the electronic device shuts down, so that at the next startup the persisted valid index information can be written into the memory space and the in-memory index rebuilt quickly. Afterwards, when an instruction on the target data set is received, the index information in the memory space can be scanned, again benefiting from the higher speed of memory.
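A sketch of the shutdown and startup cycle, assuming the valid index is a dict of key to (dir, fno, off, len) and is persisted as a flat snapshot file using a hypothetical fixed-width entry layout (2-byte key length, 1-byte directory, 1-byte file number, 4-byte offset, 4-byte length, then the key). Writing to a temporary file, calling `os.fsync`, and swapping it in with `os.replace` is a design choice added here so that a crash during shutdown does not leave a half-written snapshot; the text does not prescribe it. The function names are hypothetical.

```python
import os
import struct
import tempfile

ENTRY = struct.Struct("<HBBII")  # assumed layout: key_len, dir, fno, off, len


def persist_valid_index(index: dict, path: str) -> None:
    """On shutdown: write every valid entry to disk, then atomically swap the file in."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        for key, (dir_no, fno, off, length) in index.items():
            f.write(ENTRY.pack(len(key), dir_no, fno, off, length) + key)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_valid_index(path: str) -> dict:
    """On startup: rebuild the in-memory valid index from the persisted snapshot."""
    with open(path, "rb") as f:
        data = f.read()
    index, pos = {}, 0
    while pos < len(data):
        key_len, dir_no, fno, off, length = ENTRY.unpack_from(data, pos)
        pos += ENTRY.size
        index[data[pos:pos + key_len]] = (dir_no, fno, off, length)
        pos += key_len
    return index
```

Because the snapshot contains only valid entries, startup reads one small file rather than every index file or every record.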
Note that memory space is generally small, and the more bytes each piece of index information occupies, the more memory the index consumes. If memory cannot hold the index information, part of the disk space available for data cannot be used, however large it is, and disk space is wasted. With the data organization and index composition described here, each piece of index information occupies very few bytes, so the device's large-capacity disk can be used more fully even when the index information is kept in memory and persisted. Persisting the valid index information also prevents it from being lost.
In some optional implementations of any of the above embodiments of the data storage method of the present application, the target data set belongs to a total data set that includes at least one data set; each data set is stored under at least two directories, each directory holds a preset number of files, and the preset number is less than or equal to a preset file number threshold.
In these optional implementations, all the data in one database may form the total data set, which may contain multiple data sets (that is, multiple groups), including the target data set. Each data set may be stored under at least two directories, such as directory 0, directory 1, and directory 2. The number of files under each directory is preset and relatively small; specifically, the number of data files and the number of index files in a directory may each be preset, for example 64 data files and 64 index files per directory.
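A minimal sketch of this layout, using the example figures from the text (64 data files and 64 index files per directory) and the two-directory variant as defaults. The path pattern `<group>/<dir>/data_<fno>` and `index_<fno>`, the function name `layout`, and reading the file number threshold as a limit on data files (each paired with one index file) are all assumptions for illustration.

```python
from pathlib import PurePosixPath

FILE_COUNT_THRESHOLD = 64   # hypothetical preset file number threshold


def layout(group: str, num_dirs: int = 2, files_per_dir: int = 64) -> list:
    """List the data and index file paths of one data set (group)."""
    if num_dirs < 2:
        raise ValueError("each data set is stored under at least two directories")
    if files_per_dir > FILE_COUNT_THRESHOLD:
        raise ValueError("preset number of files exceeds the threshold")
    paths = []
    for dir_no in range(num_dirs):
        base = PurePosixPath(group) / str(dir_no)
        for fno in range(files_per_dir):
            # Each data file has a companion index file with the same sequence number.
            paths.append(base / f"data_{fno}")
            paths.append(base / f"index_{fno}")
    return paths
```

With the defaults, one data set occupies 2 directories × 64 pairs = 256 files, a count fixed in advance rather than growing with the amount of data.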
The implementation modes can determine a proper data organization mode, and optimize the utilization mode of the storage space. The provision of more than two directories avoids the situation where the number of contexts in the same directory is excessive. The data organization form can ensure that the byte number of the index information of the data is very small, thereby avoiding the problem that the space is excessively occupied by the index information. Particularly, under the condition that the index information is stored in the memory space, the storage utilization rate of the memory can be improved.
In a further alternative implementation, each data set is stored under two directories.
Fixing the number of directories precisely bounds the number of bytes the directory field occupies in each index entry, while avoiding the excessive number of files per directory that too few directories would cause.
In a further optional implementation, determining the index information of the data in the target data set in step 302 may include: forming the index information from the key of the data together with at least one of the following: the sequence number of the data file in which the data is located, the length of the data, the offset address of the data, and the number of the directory corresponding to the data, where a file's sequence number indicates its position in the ordering of files under the directory.
In these further optional implementations, the execution body may determine index information that includes the key. Specifically, as shown in FIG. 3b, the index information may be organized into the structure shown in the figure, where dir is the number of the directory corresponding to the data, that is, the directory under which the piece of data is written to a file; fno is the sequence number, within that directory, of the data file containing the piece of data, and a value range may be set for this sequence number to limit the number of files in the directory; off is the offset address of the data, from which the actual storage address of the data can be determined; and len is the length of the piece of data. Within a directory, the sequence numbers of the data files are consecutive, and so are the sequence numbers of the index files.
In these further optional implementations, the index information contains complete addressing data, so the data it points to can easily be located through it. At the same time, because the index information is composed only of these fields, each entry occupies very few bytes and the index as a whole takes up little space.
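FIG. 3b is not reproduced here, so the following is only a minimal Python sketch of how a compact entry with the fields key, dir, fno, off and len might be serialized. Everything about the layout is an assumption: the key is taken to be a 64-bit integer (or a 64-bit hash of the real key), and the location fields are packed into one 64-bit word using 1 bit for dir (two directories), 7 bits for fno, 32 bits for off (enough for a 2 GB file size threshold) and 24 bits for len.

```python
import struct
from dataclasses import dataclass

# Hypothetical field widths; the source does not specify any of these.
DIR_BITS, FNO_BITS, OFF_BITS, LEN_BITS = 1, 7, 32, 24
ENTRY = struct.Struct("<QQ")  # 8-byte key + 8-byte packed location = 16 bytes


@dataclass(frozen=True)
class IndexEntry:
    key: int  # assumed: a 64-bit integer key (or a 64-bit hash of the key)
    dir: int  # number of the directory holding the data
    fno: int  # sequence number of the data file within that directory
    off: int  # byte offset of the record inside the data file
    len: int  # record length in bytes

    def pack(self) -> bytes:
        """Serialize to a fixed 16-byte entry, rejecting out-of-range fields."""
        loc = 0
        for name, bits in (("dir", DIR_BITS), ("fno", FNO_BITS),
                           ("off", OFF_BITS), ("len", LEN_BITS)):
            value = getattr(self, name)
            if not 0 <= value < (1 << bits):
                raise ValueError(f"{name}={value} does not fit in {bits} bits")
            loc = (loc << bits) | value  # append this field below the previous ones
        return ENTRY.pack(self.key, loc)

    @classmethod
    def unpack(cls, raw: bytes) -> "IndexEntry":
        """Inverse of pack(): peel fields off the low end of the location word."""
        key, loc = ENTRY.unpack(raw)
        length = loc & ((1 << LEN_BITS) - 1)
        loc >>= LEN_BITS
        off = loc & ((1 << OFF_BITS) - 1)
        loc >>= OFF_BITS
        fno = loc & ((1 << FNO_BITS) - 1)
        loc >>= FNO_BITS
        return cls(key=key, dir=loc, fno=fno, off=off, len=length)
```

With these assumed widths every entry is 16 bytes, so 100 million entries would need about 1.6 GB of memory (16 × 10^8 bytes), which illustrates why a small per-entry size matters when the index is kept in memory.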
In a further optional implementation manner, the method may further include the following steps:
receiving a data rewriting instruction; and, taking as the starting point the data file with the largest or the smallest sequence number under a preset target directory among the at least two directories, rewriting the valid data under the target directory in sequence-number order to obtain rewritten data.
The data storage method may further include the steps of:
receiving a data update instruction; and, taking as the starting point the data file at the other of the largest and smallest sequence numbers under the target directory (whichever end is not used for rewriting), adding update data under the target directory in sequence-number order.
In these optional implementations, the execution body may receive a data rewriting instruction and rewrite the valid data in sequence-number order to obtain rewritten data. For example, if there are 64 data files under the directory with sequence numbers 0 to 63, rewriting may start from file 0 under the target directory. Valid data is all data that is not invalid. Invalid data may be data that has reached its expiration time, data for which a message marks it as invalid, or data that has already been superseded by updated data. Updating never deletes the original data; it only appends new data. For example, if the original data is X = 1, the update appends X = 2 and the original X = 1 is left in place.
To allow rewriting and updating to proceed in the same directory without interfering with each other, updating starts from the data file at the opposite end from the one where rewriting starts. For example, if rewriting starts from data file 0, the next file to be rewritten is data file 1; updating may then start from data file 63, and the next file to receive update data is data file 62.
Fig. 4 illustrates update data and rewrite data being written in one directory, together with the corresponding update indexes and rewrite indexes.
These further implementations keep data updating and data rewriting in different files so that they do not interfere with each other. This removes the need to make rewriting and updating mutually exclusive in advance and thereby improves the efficiency of both.
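The validity rules above can be sketched in Python as a filter over append-only records. The representation is hypothetical: each record carries an assumed write-order number `seq` (larger is newer), an optional absolute expiry time, and a value of `None` stands in for "marked invalid". Only the newest record per key can be valid, and only if it is neither marked invalid nor expired.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Record:
    key: str
    value: Optional[bytes]              # None: hypothetical "marked invalid" tombstone
    seq: int                            # write order; a larger seq is newer (assumption)
    expire_at: Optional[float] = None   # absolute expiry time, None = never expires


def valid_records(records: Iterable[Record], now: float) -> List[Record]:
    """Keep only data that is still valid, returned in write order."""
    latest: Dict[str, Record] = {}
    for r in records:
        cur = latest.get(r.key)
        if cur is None or r.seq > cur.seq:
            latest[r.key] = r  # any older record for this key is now superseded
    return [
        r for r in sorted(latest.values(), key=lambda r: r.seq)
        if r.value is not None and (r.expire_at is None or r.expire_at > now)
    ]
```

In the X example, the appended X = 2 supersedes X = 1, so only X = 2 would be carried over by a rewrite.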
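The following Python sketch, under stated assumptions, shows why no mutual exclusion is needed: a rewrite thread fills data files upward from 0 while an update thread fills them downward from 63, so the two never touch the same file. Data files are simulated as in-memory lists, and `per_file` is a hypothetical record count standing in for the file size threshold. The source does not say what happens when the two runs meet; this sketch simply assumes there are enough files and checks afterwards.

```python
import threading
from typing import List, Tuple


def write_run(files: List[List[bytes]], start: int, step: int,
              records: List[bytes], per_file: int, touched: List[int]) -> None:
    """Append records from file `start`, moving `step` (+1 or -1) files after
    every `per_file` records; record each file number used in `touched`."""
    fno = start
    for i, rec in enumerate(records):
        if i and i % per_file == 0:
            fno += step
        if not 0 <= fno < len(files):
            raise IndexError("no more data files in this direction")
        files[fno].append(rec)
        if not touched or touched[-1] != fno:
            touched.append(fno)


def rewrite_and_update(num_files: int, rewrite_recs: List[bytes],
                       update_recs: List[bytes], per_file: int
                       ) -> Tuple[List[List[bytes]], List[int], List[int]]:
    files: List[List[bytes]] = [[] for _ in range(num_files)]
    rewrite_files: List[int] = []
    update_files: List[int] = []
    rewriter = threading.Thread(target=write_run, args=(
        files, 0, +1, rewrite_recs, per_file, rewrite_files))
    updater = threading.Thread(target=write_run, args=(
        files, num_files - 1, -1, update_recs, per_file, update_files))
    rewriter.start()
    updater.start()  # both run at once; no lock, since they use disjoint files
    rewriter.join()
    updater.join()
    # Post-hoc sanity check: this sketch assumes the two runs never meet.
    if set(rewrite_files) & set(update_files):
        raise RuntimeError("rewrite and update reached the same data file")
    return files, rewrite_files, update_files
```

For example, rewriting five records and updating three with `per_file=2` in a 64-file directory uses files 0, 1, 2 for the rewrite and 63, 62 for the update.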
In a further optional implementation, rewriting the valid data under the target directory in sequence-number order may include:
in response to determining that the current data file produced by rewriting has reached a preset file size threshold, continuing the rewrite in the next data file after the current one in sequence-number order.
In these further implementations, the execution body rewrites data into the current data file. If the written data brings the file size to a preset file size threshold, for example 2 GB, the execution body may switch to the adjacent data file next in sequence-number order within the same directory and continue rewriting there. For example, if data is currently being rewritten into data file c, the next file to be rewritten may be data file d.
These further implementations control the size of each data file under the directory, so that the data files produced by rewriting are of fairly uniform size.
In a further optional implementation, adding update data under the target directory in sequence-number order includes: in response to determining that the current data file produced by updating has reached a preset file size threshold, adding update data to the next data file after the current one in sequence-number order.
In these further implementations, the execution body writes update data into the current data file. When the written data brings the file size to the preset file size threshold, the execution body switches to the adjacent data file next in sequence-number order within the same directory and continues updating there.
These further implementations control the size of each data file under the directory, so that the data files produced by updating are of fairly uniform size.
In a further optional implementation, the operation of adding update data under the target directory is performed before the rewriting of valid data under the target directory is completed; or the rewriting of valid data under the target directory is performed before the addition of update data under the target directory is completed.
In these further implementations, the execution times of rewriting and updating may overlap; that is, data updating and data rewriting can be performed simultaneously in the same directory.
Because updating and rewriting start from different files, these further implementations can carry out both in the same directory. Performing the two operations concurrently shortens the total time they take and improves data processing efficiency.
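The size-based rollover described for both rewriting and updating can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: sizes are tracked in memory, `step` is +1 for the run that starts at the smallest sequence number and -1 for the run that starts at the largest, and the returned `(fno, off)` pair is what an index entry would record. Whether a file may overshoot the threshold by one record is not stated in the source; here the record that crosses the threshold stays in the current file and later records move on.

```python
from typing import List, Tuple


class CappedAppender:
    """Places records into numbered data files, moving to the adjacent file
    (in the direction of `step`) once the current file reaches `size_limit`."""

    def __init__(self, num_files: int, start: int, step: int, size_limit: int):
        if step not in (1, -1):
            raise ValueError("step must be +1 or -1")
        self.sizes: List[int] = [0] * num_files  # bytes written per data file
        self.fno = start
        self.step = step
        self.size_limit = size_limit

    def append(self, record_len: int) -> Tuple[int, int]:
        """Return the (fno, off) at which a record of `record_len` bytes lands."""
        if not 0 <= self.fno < len(self.sizes):
            raise IndexError("no more data files in this direction")
        fno, off = self.fno, self.sizes[self.fno]
        self.sizes[fno] += record_len
        if self.sizes[fno] >= self.size_limit:
            self.fno += self.step  # threshold reached: later records go to the next file
        return fno, off
```

With a 100-byte threshold standing in for 2 GB, three 60-byte records land at (0, 0), (0, 60) and (1, 0) for a run starting at file 0, and at (63, 0), (63, 60) and (62, 0) for a run starting at file 63.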
In a further optional implementation, the method may further include: determining index information of the update data and writing it into the target directory; and determining index information of the rewritten data and writing it into the target directory.
In these further implementations, the execution body may determine the index information of the update data and write it into the target directory, and may likewise determine the index information of the rewritten data and write it into the target directory. The number of files holding index information may also be fixed at a preset number, which limits the number of index files and prevents the files themselves from occupying too much space.
In practice, the index information of the update data and that of the rewritten data may be written into different files. For example, the index information of the update data may be written into index files 0 to 63 under the target directory, and the index information of the rewritten data into index files 64 to 127.
These further implementations determine separate index information for the rewritten data and the update data, so that the location of each can be determined accurately from its own index information.
In a further optional implementation, the method may further include: migrating data under the directories other than the target directory to the target directory.
In these further implementations, the execution body may perform data migration, moving data from directories other than the target directory into the target directory. The target directory may be fixed, or the role may rotate among the directories.
In practice, data migration may be performed before rewriting and updating, so that rewriting and updating only need to operate on data in a single directory. This avoids multi-threaded processing across several directories and simplifies data processing.
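The split of index files in this example can be captured by two small Python helpers. The ranges 0 to 63 and 64 to 127 come from the example above; the pairing in which the i-th index file of each range serves data file i is a hypothetical assumption made only for illustration.

```python
FILES_PER_KIND = 64      # 64 data files per directory, as in the examples above
UPDATE_INDEX_BASE = 0    # index files 0..63 hold index entries for update data
REWRITE_INDEX_BASE = 64  # index files 64..127 hold index entries for rewritten data


def index_file_no(kind: str, data_fno: int) -> int:
    """Index file that receives entries for data written to data file `data_fno`.
    Hypothetical pairing: the i-th index file of each range serves data file i."""
    if not 0 <= data_fno < FILES_PER_KIND:
        raise ValueError(f"data file {data_fno} out of range")
    bases = {"update": UPDATE_INDEX_BASE, "rewrite": REWRITE_INDEX_BASE}
    if kind not in bases:
        raise ValueError(f"unknown kind {kind!r}")
    return bases[kind] + data_fno


def index_kind(index_fno: int) -> str:
    """Tell, from its number alone, which kind of data an index file describes."""
    if UPDATE_INDEX_BASE <= index_fno < UPDATE_INDEX_BASE + FILES_PER_KIND:
        return "update"
    if REWRITE_INDEX_BASE <= index_fno < REWRITE_INDEX_BASE + FILES_PER_KIND:
        return "rewrite"
    raise ValueError(f"index file {index_fno} out of range")
```

Keeping the two kinds in disjoint number ranges means the update and rewrite runs never append to the same index file, mirroring the separation of their data files.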
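A minimal Python sketch of the migration step, assuming directories are modelled as in-memory lists of opaque records keyed by directory number. A real store would also have to rewrite the index entries (dir, fno, off) of every moved record, which this sketch omits. The round-robin `next_target` helper is one hypothetical way to rotate the target directory; the source only says the target may be fixed or rotated.

```python
from typing import Dict, List


def migrate_to_target(dirs: Dict[int, List[bytes]], target: int) -> int:
    """Move every record from the non-target directories into `target`
    (appending in directory-number order) and return how many were moved."""
    dest = dirs.setdefault(target, [])
    moved = 0
    for d in sorted(dirs):
        if d == target:
            continue
        dest.extend(dirs[d])
        moved += len(dirs[d])
        dirs[d].clear()
    return moved


def next_target(current: int, num_dirs: int) -> int:
    """One possible rotation policy (assumption): round-robin over directories."""
    return (current + 1) % num_dirs
```

After migration, rewriting and updating operate on a single directory, which is the simplification the paragraph above describes.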
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of a data storage device, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied to various electronic devices.
As shown in fig. 5, the data storage device 500 of this embodiment includes a determination unit 501 and an adding unit 502. The determination unit 501 is configured to determine index information of data in a target data set, where the target data set includes a plurality of pieces of data and each piece of data includes a key and a value; the adding unit 502 is configured to add the index information of each piece of data to the target data set.
In some embodiments, the determination unit 501 of the data storage device 500 may determine the index information of each piece of data. The target data set is stored as key-value pairs. Each index entry indicates where one piece of data is stored, and that piece of data can be found through it.
In some embodiments, the adding unit 502 may add the determined index information of each piece of data to the target data set, so that the target data set contains both the index information and the data each index entry points to.
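The benefit claimed in the abstract, addressing data by scanning index information rather than every key-value pair, can be sketched in Python as below. Assumptions: data files are modelled as in-memory byte strings keyed by `(dir, fno)`, index entries are `(key, (dir, fno, off, len))` tuples read in write order, and a later entry for the same key wins, which matches the append-only update scheme described earlier.

```python
from typing import Dict, Iterable, Tuple

Location = Tuple[int, int, int, int]  # (dir, fno, off, len), hypothetical layout


def build_lookup(index_entries: Iterable[Tuple[int, Location]]) -> Dict[int, Location]:
    """Scan index entries in write order; a later entry for a key wins."""
    lookup: Dict[int, Location] = {}
    for key, loc in index_entries:
        lookup[key] = loc
    return lookup


def read_value(data_files: Dict[Tuple[int, int], bytes],
               lookup: Dict[int, Location], key: int) -> bytes:
    """Fetch a value with one slice of its data file instead of a full scan."""
    d, fno, off, length = lookup[key]
    return data_files[(d, fno)][off:off + length]
```

Only the small index entries are scanned at startup; the values themselves are read on demand.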
In some optional implementations of this embodiment, the target data set belongs to a total data set, where the total data set includes at least one data set, each data set is stored in at least two directories, the number of files in each directory is a preset number, and the preset number is less than or equal to a preset file number threshold.
In some optional implementations of this embodiment, determining the index information of the data in the target data set includes: forming the index information from the key together with at least one of the following: the sequence number of the data file in which the data is located, the length of the data, the offset address of the data, and the number of the directory corresponding to the data, where a file's sequence number indicates its position in the ordering of files under the directory.
In some optional implementations of this embodiment, the apparatus further includes a scanning unit configured to scan the index information of each piece of data in response to receiving an operation instruction on the target data set.
In some optional implementations of this embodiment, the adding unit is further configured to write the index information of each piece of data into an index file under the directory where the target data set is located, the directory storing index files together with their corresponding data files.
In some optional implementations of this embodiment, the apparatus further includes a memory writing unit configured to write the valid index information in the index file, that is, the index information indicating valid data, into the memory space.
In some optional implementations of this embodiment, the scanning unit scans the index information of each piece of data in response to an operation instruction on the target data set as follows: in response to receiving a search, modify, delete, add-data, or read instruction for the target data set, it scans the valid index information of each piece of data in the memory space; and in response to receiving a rewriting instruction for the target data set, it scans both the index file and the valid index information of each piece of data in the memory space.
In some optional implementations of this embodiment, the apparatus is applied to a target electronic device on which the target data set is stored, and further includes: a persistence unit configured to persist the valid index information in the memory space to disk in response to receiving a shutdown instruction for the target electronic device; and a writing unit configured to write the persisted valid index information back into the memory space in response to receiving a startup instruction for the target electronic device.
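The instruction-dependent choice of what to scan can be written as a small Python dispatch function. The `Op` enum and the string labels for the two index copies are hypothetical names chosen for this sketch; the mapping itself follows the paragraph above.

```python
from enum import Enum, auto
from typing import Tuple


class Op(Enum):
    SEARCH = auto()
    MODIFY = auto()
    DELETE = auto()
    ADD = auto()
    READ = auto()
    REWRITE = auto()


def scan_sources(op: Op) -> Tuple[str, ...]:
    """Which index copies to scan for an operation on the target data set."""
    if op is Op.REWRITE:
        # Rewriting consults the on-disk index file as well as the in-memory copy.
        return ("index_file", "memory")
    # Search, modify, delete, add and read use only the in-memory valid index.
    return ("memory",)
```

Keeping ordinary operations on the in-memory copy avoids disk reads on the hot path; only rewriting, which must reconcile all entries, also reads the index file.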
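A minimal Python sketch of persisting the in-memory valid index on shutdown and restoring it on startup. The 18-byte on-disk entry (key as u64, dir and fno as u8, off and len as u32) is an assumed layout, not one given in the source. Writing to a temporary file and renaming it with `os.replace` is a common technique added here so that a crash during shutdown does not leave a half-written snapshot.

```python
import os
import struct
from typing import Dict, Tuple

# Hypothetical on-disk layout per entry: key (u64), dir (u8), fno (u8),
# off (u32), len (u32), little-endian with no padding = 18 bytes.
ENTRY = struct.Struct("<QBBII")
Location = Tuple[int, int, int, int]  # (dir, fno, off, len)


def dump_index(index: Dict[int, Location]) -> bytes:
    """Serialize the valid in-memory index entries."""
    return b"".join(ENTRY.pack(key, *loc) for key, loc in index.items())


def parse_index(data: bytes) -> Dict[int, Location]:
    """Rebuild the in-memory index from a serialized snapshot."""
    if len(data) % ENTRY.size:
        raise ValueError("truncated index snapshot")
    return {key: (d, fno, off, length)
            for key, d, fno, off, length in ENTRY.iter_unpack(data)}


def persist_index(index: Dict[int, Location], path: str) -> None:
    """On shutdown: write the snapshot to a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(dump_index(index))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # on POSIX, readers see either the old or the new file


def load_index(path: str) -> Dict[int, Location]:
    """On startup: read the persisted snapshot back into memory."""
    with open(path, "rb") as f:
        return parse_index(f.read())
```

Restoring from this snapshot at startup avoids re-scanning every index file, which is the slow path the abstract says the method avoids.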
In some optional implementations of the embodiment, the scanning unit is configured to perform scanning the index information of each piece of data in response to receiving the operation instruction on the target data set as follows: in response to receiving an operation instruction on the target data set, scanning index information of each piece of data in the index file.
In some alternative implementations of the present embodiment, each data set is stored under two directories.
In some optional implementations of this embodiment, the apparatus further includes: a first receiving unit configured to receive a data rewriting instruction; a rewriting unit configured to take as the starting point the data file with the largest or smallest sequence number under a preset target directory among the at least two directories, and to rewrite the valid data under the target directory in sequence-number order to obtain rewritten data; a second receiving unit configured to receive a data update instruction; and an updating unit configured to take as the starting point the data file at the other of the largest and smallest sequence numbers under the target directory, and to add update data under the target directory in sequence-number order.
In some optional implementations of this embodiment, the operation of adding update data under the target directory is performed before the rewriting of valid data under the target directory is completed; or the rewriting of valid data under the target directory is performed before the addition of update data under the target directory is completed.
In some optional implementations of this embodiment, the rewriting unit is further configured to rewrite the valid data under the target directory in sequence-number order as follows: in response to determining that the current data file produced by rewriting has reached a preset file size threshold, continue rewriting in the next data file after the current one in sequence-number order.
In some optional implementations of this embodiment, the updating unit is further configured to add update data under the target directory in sequence-number order as follows: in response to determining that the current data file produced by updating has reached a preset file size threshold, add update data to the next data file after the current one in sequence-number order.
In some optional implementations of this embodiment, the apparatus further includes: a first index determining unit configured to determine the index information of the update data and write it into the target directory; and a second index determining unit configured to determine the index information of the rewritten data and write it into the target directory.
In some optional implementations of this embodiment, the apparatus further includes a migration unit configured to migrate data in directories other than the target directory to the target directory.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: an input device 606; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium of the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or hardware. The described units may also be provided in a processor, which may, for example, be described as including a determination unit and an adding unit. The names of these units do not, in some cases, limit the units themselves; for example, the determination unit may also be described as a "unit that determines index information of data in the target data set".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: determining index information of data in a target data set, wherein the target data set comprises a plurality of pieces of data, and each piece of data comprises a key and a value; and adding the index information of each piece of data to the target data set.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (17)

1. A method of data storage, comprising:
determining index information of data in a target data set, wherein the target data set comprises a plurality of pieces of data, and each piece of data comprises a key and a value;
adding index information of each piece of data to the target data set;
the adding the index information of each piece of data to the target data set includes:
writing the index information of each piece of data into an index file under a directory where the target data set is located, wherein the directory stores corresponding index files and data files;
the method further comprises the following steps:
scanning the index information in the index file in response to startup or before garbage data cleanup; and
performing data updating and data rewriting for the data files in the same directory in different data files, wherein the updating and the rewriting overlap in time and start from different data files.
2. The method of claim 1, wherein the target data set belongs to a data aggregate set, the data aggregate set comprising at least one data set, each data set being stored under at least two directories, the number of files under each directory being a preset number, the preset number being less than or equal to a preset file number threshold.
3. The method of claim 2, wherein the determining index information for data in the target data set comprises:
composing the key and at least one of the following into index information: the sequence number of the data file in which the data is located, the length of the data, the offset address of the data, and the number of the directory corresponding to the data, wherein the sequence number indicates the position of each file in the ordering of files under the directory.
4. The method of claim 1, wherein the method further comprises:
scanning index information of each piece of data in response to receiving an operation instruction on the target data set.
5. The method of claim 4, wherein the method further comprises:
writing the valid index information in the index file, which indicates valid data, into a memory space.
6. The method of claim 4, wherein the scanning index information of each piece of data in response to receiving an operation instruction on the target data set comprises:
scanning the valid index information of each piece of data in the memory space in response to receiving one of the following instructions for the target data set: a search instruction, a modify instruction, a delete instruction, an add-data instruction, and a read instruction; and
scanning the index file and the valid index information of each piece of data in the memory space in response to receiving a rewriting instruction for the target data set.
7. The method of claim 6, wherein the method is applied to a target electronic device, the target data set being stored at the target electronic device, the method further comprising:
in response to receiving a shutdown instruction for the target electronic device, persisting the valid index information in the memory space to a magnetic disk; and
the method further comprises:
in response to receiving a startup instruction for the target electronic device, writing the persisted valid index information into the memory space.
8. The method of claim 2, wherein each data set is stored under two directories.
9. The method of claim 2, wherein the method further comprises:
receiving a data rewriting instruction;
taking as a starting point the data file with the largest or the smallest sequence number under a preset target directory among the at least two directories, and rewriting the valid data under the target directory in sequence-number order to obtain rewritten data;
receiving a data update instruction;
taking as a starting point the data file at the other of the largest and smallest sequence numbers under the target directory, and adding update data under the target directory in sequence-number order.
10. The method of claim 9, wherein the operation of adding update data under the target directory is performed before the rewriting of valid data under the target directory is completed; or
the rewriting of valid data under the target directory is performed before the addition of update data under the target directory is completed.
11. The method of claim 9, wherein rewriting valid data under the target directory in sequence-number order comprises:
in response to determining that the size of the current data file produced by rewriting has reached a preset file size threshold, rewriting into the data file next to the current data file in sequence-number order.
12. The method of claim 9, wherein adding update data under the target directory in sequence-number order comprises:
in response to determining that the size of the current data file produced by updating has reached a preset file size threshold, adding update data to the data file next to the current data file in sequence-number order.
13. The method of claim 9, wherein the method further comprises:
determining index information of the updated data, and writing the index information into the target directory;
determining the index information of the rewritten data and writing it into the target directory.
14. The method of claim 9, wherein the method further comprises:
migrating data under the directories other than the target directory to the target directory.
15. A data storage device comprising:
a determination unit configured to determine index information of data in a target data set, wherein the target data set includes a plurality of pieces of data, each piece of data including a key and a value;
an adding unit configured to add index information of each piece of data to the target data set;
the adding unit is further configured to write the index information of each piece of data into an index file under the directory where the target data set is located, wherein the directory stores corresponding index files and data files;
the apparatus is further configured to:
perform data updating and data rewriting for the data files in the same directory in different data files, wherein the updating and the rewriting overlap in time and start from different data files.
16. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-14.
17. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-14.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910261866.0A CN109947709B (en) 2019-04-02 2019-04-02 Data storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910261866.0A CN109947709B (en) 2019-04-02 2019-04-02 Data storage method and device

Publications (2)

Publication Number Publication Date
CN109947709A 2019-06-28
CN109947709B 2021-10-08

Family

ID=67013411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910261866.0A Active CN109947709B (en) 2019-04-02 2019-04-02 Data storage method and device

Country Status (1)

Country Link
CN (1) CN109947709B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110505039B (en) * 2019-09-26 2022-04-01 北京达佳互联信息技术有限公司 Data transmission control method, device, equipment and medium
CN110765076B (en) * 2019-10-25 2023-04-21 北京奇艺世纪科技有限公司 Data storage method, device, electronic equipment and storage medium
CN111241108B (en) * 2020-01-16 2023-12-26 北京百度网讯科技有限公司 Key value based indexing method and device for KV system, electronic equipment and medium
CN114253908A (en) * 2020-09-23 2022-03-29 华为云计算技术有限公司 Data management method and device of key value storage system
CN112491857B (en) * 2020-11-20 2023-05-02 北京人大金仓信息技术股份有限公司 Method, device and equipment for transmitting set type data

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102323947A (en) * 2011-09-05 2012-01-18 东北大学 Generation method of pre-join table on ring-shaped schema database
CN103309950A (en) * 2013-05-22 2013-09-18 苏州雄立科技有限公司 Searching method for key value
CN104182508A (en) * 2014-08-19 2014-12-03 华为技术有限公司 Data processing method and data processing device
CN107704604A (en) * 2017-10-16 2018-02-16 中汇信息技术(上海)有限公司 A kind of information persistence method, server and computer-readable recording medium
CN108388569A (en) * 2018-01-09 2018-08-10 杭州电子科技大学 A kind of system and method for building up of quick key value database

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR100834760B1 (en) * 2006-11-23 2008-06-05 삼성전자주식회사 Structure of index, apparatus and method for optimized index searching

Also Published As

Publication number Publication date
CN109947709A (en) 2019-06-28

Similar Documents

Publication Publication Date Title
CN109947709B (en) Data storage method and device
CN107870728B (en) Method and apparatus for moving data
CN110751275B (en) Graph training system, data access method and device, electronic device and storage medium
CN109213694B (en) Method and apparatus for cache management
CN106170757B (en) A kind of date storage method and device
CN114785685B (en) Software differential upgrading method and device, electronic equipment and readable storage medium
CN113395353A (en) File downloading method and device, storage medium and electronic equipment
CN112487009B (en) Data updating method, device, equipment, storage medium and program product
US9460137B2 (en) Handling an increase in transactional data without requiring relocation of preexisting data between shards
CN111694703B (en) Cache region management method and device and computer equipment
CN104618445A (en) Method and device for arranging files based on cloud storage space
CN114817146A (en) Method and device for processing data
CN114625695A (en) Data processing method and device
US9471246B2 (en) Data sharing using difference-on-write
CN113656100A (en) Interface switching method and device, electronic device and computer program product
CN110858201A (en) Data processing method and system, processor and storage medium
CN113886350A (en) Data processing method and system
CN110300222B (en) Short message display method, system, terminal equipment and computer readable storage medium
CN109614383B (en) Data copying method and device, electronic equipment and storage medium
CN113127438A (en) Method, apparatus, server and medium for storing data
CN113032349A (en) Data storage method and device, electronic equipment and computer readable medium
CN109213815B (en) Method, device, server terminal and readable medium for controlling execution times
CN110113416B (en) Method and device for displaying information
US11366613B2 (en) Method and apparatus for writing data
CN111459893B (en) File processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190628

Assignee: Beijing Intellectual Property Management Co.,Ltd.

Assignor: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

Contract record no.: X2023110000096

Denomination of invention: Data storage methods and devices

Granted publication date: 20211008

License type: Common License

Record date: 20230821
