CN112732174B - Data processing method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112732174B (CN202011561924.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- record
- page
- sequence number
- association information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1009—Address translation using page tables, e.g. page table structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a data processing method and device, electronic equipment and storage medium, wherein the method comprises the following steps: receiving a first insertion instruction, wherein the first insertion instruction is used for inserting a first record corresponding to first data in a first data page, and the first data page is one data page in a first data table on a first node; determining first association information of the first record, wherein the first association information is used for representing a storage position of the first record on the first node; generating a first row identifier corresponding to the first record according to the first association information, wherein the first row identifier is used for uniquely identifying the first record; the first record is inserted into a first data page, wherein the first record contains first data and a first row identifier. The application solves the problem of increased server resource load caused by the need of using I/O resources in the mode of distributing row identifiers in the related technology.
Description
Technical Field
The present application relates to the field of data processing, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
Currently, databases (e.g., MySQL) generate a rowid by counter accumulation: a self-incrementing counter (accumulator) is maintained in a global variable, a value is taken from the counter each time a row of data is inserted, and multiple data tables share one counter.
To ensure that rowid allocation remains valid after downtime, a preset increment step can be used: a value advanced by the increment step is stored in a file on disk, and the file is updated periodically, so that the allocated values keep increasing.
For example, with an increment step of 200, the value 200 is first written to the count file, indicating that every rowid allocated so far is less than or equal to 200. When the counter reaches 200, the value 400 is written to the count file, indicating that every rowid allocated so far is less than or equal to 400. After the server goes down, the value in the file can be read directly, and rowids are allocated according to the value read.
However, this way of allocating row identifiers requires updating the value recorded in a file stored on disk, which consumes I/O (input/output) resources and therefore increases the load on server resources.
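The counter-based scheme described in the background can be sketched as follows. This is a minimal illustration, not the implementation of any particular database: the class name `StepCounterAllocator`, the count-file format (a decimal string) and the `disk_writes` counter are assumptions introduced only to make the disk write at every step boundary visible.

```python
import os
import tempfile


class StepCounterAllocator:
    """Hypothetical counter-based rowid allocator with a persisted upper bound."""

    def __init__(self, path, step=200):
        self.path = path
        self.step = step
        self.disk_writes = 0
        if os.path.exists(path):
            # After a restart, resume from the persisted upper bound so that
            # no rowid handed out before the crash can be issued again.
            with open(path) as f:
                self.counter = int(f.read())
        else:
            self.counter = 0
        self.limit = self.counter + step
        self._persist(self.limit)

    def _persist(self, value):
        # This is the disk I/O that the background section points to.
        with open(self.path, "w") as f:
            f.write(str(value))
        self.disk_writes += 1

    def next_rowid(self):
        self.counter += 1
        if self.counter == self.limit:
            # The counter reached the persisted bound: reserve the next step.
            self.limit += self.step
            self._persist(self.limit)
        return self.counter
```

With a step of 200, allocating 200 rowids in this sketch causes two writes to the count file (200, then 400), and an allocator restarted on the same file continues from 401.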
Disclosure of Invention
The application provides a data processing method and device, electronic equipment and a storage medium, which at least solve the problem of increased server resource load caused by the need of using I/O resources in a mode of distributing row identifiers in related technologies.
According to an aspect of an embodiment of the present application, there is provided a data processing method, including: receiving a first insertion instruction, wherein the first insertion instruction is used for inserting a first record corresponding to first data in a first data page, and the first data page is one data page in a first data table on a first node; determining first association information of the first record, wherein the first association information is used for representing a storage position of the first record on the first node; generating a first row identifier corresponding to the first record according to the first association information, wherein the first row identifier is used for uniquely identifying the first record; and inserting the first record into the first data page, wherein the first record contains the first data and the first row identifier.
Optionally, determining the first association information of the first record includes: determining a node sequence number of the first node, a data table sequence number of the first data table on the first node and a data page sequence number of the first data page in the first data table; determining a page offset of the first record in the first data page, wherein the first association information includes: the node sequence number, the data table sequence number, the data page sequence number and the page offset.
Optionally, determining the page offset of the first record in the first data page includes: acquiring a starting position of a previous record of the first record and a record length of the previous record in the first data page; and determining the page offset of the first record in the first data page according to the starting position of the previous record and the record length of the previous record.
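As a small illustration of the offset calculation, the sketch below assumes that records are laid out back to back in the data page with no padding between them; the function name `next_page_offset` is hypothetical.

```python
def next_page_offset(prev_start, prev_length):
    """Page offset of a record placed directly after the previous record.

    Assumes records are stored contiguously with no padding (an assumption
    of this sketch, not something the text specifies).
    """
    if prev_start < 0 or prev_length < 0:
        raise ValueError("start position and record length must be non-negative")
    return prev_start + prev_length
```

For instance, if the previous record starts at byte 120 of the page and is 36 bytes long, the new record's page offset under this assumption is 156.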
Optionally, generating the first row identifier corresponding to the first record according to the first association information includes: initializing an initial row identifier comprising a plurality of bytes; writing each piece of sub-association information of the first association information into a corresponding byte position in the initial row identifier to obtain the first row identifier, wherein the first association information comprises the following pieces of sub-association information: the node sequence number, the data table sequence number, the data page sequence number and the page offset.
Optionally, writing each piece of sub-association information of the first association information into the corresponding byte position in the initial row identifier includes: writing the node sequence number into a first byte position of the initial row identifier in big-endian order, according to the number of bytes allocated for the node sequence number; writing the data table sequence number into a second byte position of the initial row identifier in little-endian order and writing the data page sequence number into a third byte position of the initial row identifier in big-endian order, according to the number of bytes allocated for the data table sequence number and the data page sequence number; and writing the page offset into a fourth byte position of the initial row identifier in big-endian order, according to the number of bytes allocated for the page offset.
Optionally, after writing the data page sequence number to the third byte position of the initial row identifier in big-endian order, the method further comprises: acquiring at least one bit of data of the data page sequence number that has not been stored, in a case where the number of bits required for storing the data page sequence number is greater than the total number of bits of the bytes allocated for the data page sequence number; writing the at least one bit of data onto at least one target data bit of the second byte position, wherein the at least one target data bit is one or more consecutive data bits of the second byte position adjacent to the third byte position.
Optionally, after inserting the first record into the first data page, the method further comprises: receiving a target update instruction, wherein the target update instruction is used for updating the first data recorded by the first record into target data; and in response to the target update instruction, updating the data recorded by the first record into the target data, wherein the row identifier of the first record remains the first row identifier.
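The byte-level writing described above can be sketched as follows. The field widths (1, 2, 3 and 2 bytes, giving an 8-byte identifier) are assumptions chosen for illustration, since the text only says that each field has an allocated number of bytes; the function name `build_rowid` is likewise hypothetical.

```python
# Assumed field widths in bytes; the text does not fix these values.
NODE_BYTES, TABLE_BYTES, PAGE_BYTES, OFFSET_BYTES = 1, 2, 3, 2


def build_rowid(node_no, table_no, page_no, page_offset):
    """Write the four sub-fields into a zero-initialised row identifier."""
    total = NODE_BYTES + TABLE_BYTES + PAGE_BYTES + OFFSET_BYTES
    rowid = bytearray(total)  # the "initial row identifier"
    pos = 0
    fields = (
        (node_no, NODE_BYTES, "big"),       # first byte position, big-endian
        (table_no, TABLE_BYTES, "little"),  # second byte position, little-endian
        (page_no, PAGE_BYTES, "big"),       # third byte position, big-endian
        (page_offset, OFFSET_BYTES, "big"), # fourth byte position, big-endian
    )
    for value, width, order in fields:
        # int.to_bytes raises OverflowError if the value does not fit.
        rowid[pos:pos + width] = value.to_bytes(width, order)
        pos += width
    return bytes(rowid)
```

Under these assumed widths, node 1, table 0x0102, page 200 and offset 156 yield the bytes `01 02 01 00 00 c8 00 9c`; note that the table field appears low byte first because it is written in little-endian order.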
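A hedged sketch of this overflow handling is given below. It makes several assumptions that the text does not state: the data table field is 2 bytes, the data page field is 3 bytes, table sequence numbers are limited to 12 bits so that the upper 4 bits of the table byte next to the page field are free, and those 4 bits are the "target data bits". The names `pack_table_and_page` and `unpack_table_and_page` are hypothetical.

```python
PAGE_BYTES = 3                 # assumed width of the data page field
PAGE_BITS = PAGE_BYTES * 8
SPARE_BITS = 4                 # assumed free bits in the adjacent table byte
TABLE_MAX = (1 << 12) - 1      # assumed limit so the spare bits stay unused


def pack_table_and_page(table_no, page_no):
    """Pack the table field (little-endian) and page field (big-endian)."""
    if not 0 <= table_no <= TABLE_MAX:
        raise ValueError("table sequence number out of range")
    if not 0 <= page_no < (1 << (PAGE_BITS + SPARE_BITS)):
        raise ValueError("data page sequence number out of range")
    field = bytearray(table_no.to_bytes(2, "little"))
    low = page_no & ((1 << PAGE_BITS) - 1)   # bits that fit in the page field
    overflow = page_no >> PAGE_BITS          # bits that did not fit
    field += low.to_bytes(PAGE_BYTES, "big")
    # Store the overflow in the table byte that sits next to the page field.
    field[1] |= overflow << (8 - SPARE_BITS)
    return bytes(field)


def unpack_table_and_page(field):
    """Inverse of pack_table_and_page."""
    overflow = field[1] >> (8 - SPARE_BITS)
    table_no = field[0] | ((field[1] & 0x0F) << 8)
    page_no = (overflow << PAGE_BITS) | int.from_bytes(field[2:5], "big")
    return table_no, page_no
```

In this sketch, writing the table number in little-endian order places its most significant byte directly before the page field, which is why the unused high bits of that byte are a convenient place for the extra page bits.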
Optionally, after inserting the first record into the first data page, the method further comprises: receiving a target deleting instruction, wherein the target deleting instruction is used for deleting the data recorded by the first record; and deleting the data recorded by the first record and the first row identifier in response to the target deleting instruction.
Optionally, the method further comprises: after receiving a first insertion instruction and before inserting the first record into the first data page, receiving a second insertion instruction, wherein the second insertion instruction is used for inserting a second record corresponding to second data into a second data page, and the second data page is one data page in a second data table on a second node; determining second association information of the second record, wherein the second association information is used for representing a storage position of the second record on the second node; generating a second row identifier corresponding to the second record according to the second association information, wherein the second row identifier is used for uniquely identifying the second record; and inserting the second record into the second data page, wherein the second record contains the second data and the second association information.
According to another aspect of the embodiment of the present application, there is also provided a data processing apparatus, including: the first receiving unit is used for receiving a first inserting instruction, wherein the first inserting instruction is used for inserting a first record corresponding to first data in a first data page, and the first data page is one data page in a first data table on a first node; a first determining unit, configured to determine first association information of the first record, where the first association information is used to characterize a storage location of the first record on the first node; the first generation unit is used for generating a first row identifier corresponding to the first record according to the first association information, wherein the first row identifier is used for uniquely identifying the first record; and a first inserting unit, configured to insert the first record into the first data page, where the first record includes the first data and the first row identifier.
Optionally, the first determining unit includes: a first determining module, configured to determine a node sequence number of the first node, a data table sequence number of the first data table on the first node, and a data page sequence number of the first data page in the first data table; a second determining module, configured to determine a page offset of the first record in the first data page, where the first association information includes: the node sequence number, the data table sequence number, the data page sequence number and the page offset.
Optionally, the second determining module includes: an obtaining sub-module, configured to obtain a start position of a previous record of the first record and a record length of the previous record in the first data page; a determining sub-module, configured to determine the page offset of the first record in the first data page according to the start position of the previous record and the record length of the previous record.
Optionally, the first generating unit includes: an initialization module, configured to initialize an initial row identifier comprising a plurality of bytes; and a writing module, configured to write each piece of sub-association information of the first association information into a corresponding byte position in the initial row identifier to obtain the first row identifier, wherein the first association information comprises the following pieces of sub-association information: the node sequence number, the data table sequence number, the data page sequence number and the page offset.
Optionally, the writing module includes: a first writing sub-module, configured to write the node sequence number into a first byte position of the initial row identifier in big-endian order, according to the number of bytes allocated for the node sequence number; a second writing sub-module, configured to write the data table sequence number into a second byte position of the initial row identifier in little-endian order and to write the data page sequence number into a third byte position of the initial row identifier in big-endian order, according to the number of bytes allocated for the data table sequence number and the data page sequence number; and a third writing sub-module, configured to write the page offset into a fourth byte position of the initial row identifier in big-endian order, according to the number of bytes allocated for the page offset.
Optionally, the apparatus further comprises: an obtaining unit, configured to obtain, after the data page sequence number is written to the third byte position of the initial row identifier in big-endian order, at least one bit of data of the data page sequence number that has not been stored, in a case where the number of bits required for storing the data page sequence number is greater than the total number of bits of the bytes allocated for the data page sequence number; and a writing unit, configured to write the at least one bit of data onto at least one target data bit of the second byte position, wherein the at least one target data bit is one or more consecutive data bits of the second byte position adjacent to the third byte position.
Optionally, the apparatus further comprises: a second receiving unit, configured to receive a target update instruction after the first record is inserted into the first data page, wherein the target update instruction is used to update the first data recorded by the first record into target data; and an updating unit, configured to update, in response to the target update instruction, the data recorded by the first record into the target data, wherein the row identifier of the first record remains the first row identifier.
Optionally, the apparatus further comprises: a third receiving unit configured to receive a target deletion instruction after inserting the first record into the first data page, wherein the target deletion instruction is used to delete data recorded by the first record; and the deleting unit is used for responding to the target deleting instruction and deleting the data recorded by the first record and the first row identifier.
Optionally, the apparatus further comprises: a fourth receiving unit, configured to receive a second insertion instruction after receiving the first insertion instruction and before inserting the first record into the first data page, where the second insertion instruction is used to insert a second record corresponding to second data in a second data page, and the second data page is one data page in a second data table on a second node; a second determining unit, configured to determine second association information of the second record, where the second association information is used to characterize a storage location of the second record on the second node; a second generating unit, configured to generate, according to the second association information, a second row identifier corresponding to the second record, where the second row identifier is used to uniquely identify the second record; and a second inserting unit, configured to insert the second record into the second data page, where the second record includes the second data and the second association information.
According to still another aspect of the embodiments of the present application, there is provided an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus; wherein the memory is used for storing a computer program; a processor for performing the method steps of any of the embodiments described above by running the computer program stored on the memory.
According to a further aspect of the embodiments of the present application there is also provided a computer readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the method steps of any of the embodiments described above when run.
In the embodiments of the present application, a logical position is used in place of a counter. A first insertion instruction is received, wherein the first insertion instruction is used for inserting a first record corresponding to first data in a first data page, and the first data page is one data page in a first data table on a first node; first association information of the first record is determined, wherein the first association information is used for representing a storage position of the first record on the first node; a first row identifier corresponding to the first record is generated according to the first association information, wherein the first row identifier is used for uniquely identifying the first record; and the first record is inserted into the first data page, wherein the first record contains the first data and the first row identifier. Because the rowid is generated for the record from the record's association information instead of being taken from a counter, no value needs to be stored in a file on disk and no record in such a file needs to be updated, so I/O resources need not be accessed. In addition, because the position of the record is known, the CPU (central processing unit) does not need to perform calculations (for example, computing the value taken from the counter and the value to be updated to disk), which saves CPU resources. This achieves the technical effect of reducing the load on server resources and solves the problem in the related art that allocating row identifiers requires I/O resources and thereby increases the load on server resources.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a hardware environment of an alternative data processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of an alternative method of processing data according to an embodiment of the application;
FIG. 3 is a schematic diagram of an alternative data page according to an embodiment of the application;
FIG. 4 is a schematic diagram of an alternative data page storage mode according to an embodiment of the present application;
FIG. 5 is a flow chart of another alternative method of processing data according to an embodiment of the application;
FIG. 6 is a block diagram of an alternative data processing apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of an alternative electronic device in accordance with an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of an embodiment of the present application, a data processing method is provided. Optionally, in this embodiment, the data processing method may be applied to a hardware environment constituted by the terminal 102 and the server 104 as shown in FIG. 1. As shown in FIG. 1, the server 104 is connected to the terminal 102 through a network and may be used to provide services (such as game services, application services, etc.) to the terminal or to clients installed on the terminal. A database may be provided on the server or independently of the server, for providing data storage services to the server 104.
The network includes, but is not limited to, at least one of: a wired network and a wireless network. The wired network may include, but is not limited to, at least one of: a wide area network, a metropolitan area network, or a local area network. The wireless network may include, but is not limited to, at least one of: Bluetooth, Wi-Fi (Wireless Fidelity), and other networks that enable wireless communication. The terminal 102 may be a terminal for computing data, such as a mobile terminal (e.g., a mobile phone or a tablet computer), a notebook computer, a PC, etc. The server may include, but is not limited to, any hardware device that can perform computation.
The data processing method according to the embodiment of the present application may be performed by the server 104, or may be performed by other devices capable of providing a database service. Taking the case where the data processing method in this embodiment is performed by the server 104 (the local host of the database) as an example, FIG. 2 is a schematic flow chart of an optional data processing method according to an embodiment of the present application. As shown in FIG. 2, the flow of the method may include the following steps:
Step S202, a first insert instruction is received, where the first insert instruction is used to insert a first record corresponding to first data in a first data page, and the first data page is one data page in a first data table on a first node.
The data processing method in this embodiment may be applied to a scenario in which a row identifier (rowid) is allocated to a record in a database (for example, MySQL). The method is also applicable to other scenarios that require row identification.
The server may receive a first insertion instruction (an insert instruction) for inserting a first record corresponding to first data in a first data page, the first data page being one of the data pages in a first data table on the first node.
The first Node may be a DN (Data Node), which is a single-Node database. In the distributed database, the first node may be one of a plurality of nodes of the distributed database. The first node may include a plurality of data tables, and the data table into which the first record is to be inserted is the first data table. The first data table may contain a plurality of data pages, and the first record is inserted into the first data page in the first data table.
For example, in a database (e.g., a distributed database), data is stored in blocks in units of pages, as shown in Table 1:
TABLE 1
Page1 | Page2 | Page3 | Page4 | Page5 | Page6 | Page7 | Page8 | …… |
16k | 16k | 16k | 16k | 16k | 16k | 16k | 16k | …… |
Table 1 shows a storage area for the data of one table structure. The area contains a plurality of pages (data pages), and each page is 16k by default. The first record may be inserted into one data page (e.g., page 200) of a data table.
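To make the page layout of Table 1 concrete, the sketch below computes the byte range a given page would occupy. It assumes that pages are stored back to back, that page numbering starts at 1 as in Table 1, and that 16k means 16 × 1024 bytes; none of these details is confirmed by the text, and the function name `page_byte_range` is hypothetical.

```python
PAGE_SIZE = 16 * 1024  # default page size, assuming 16k means 16 * 1024 bytes


def page_byte_range(page_no, page_size=PAGE_SIZE):
    """Return (start, end) byte positions of a 1-based page; end is exclusive.

    Assumes pages are laid out contiguously in the table's storage area.
    """
    if page_no < 1:
        raise ValueError("page numbers start at 1 in this sketch")
    start = (page_no - 1) * page_size
    return start, start + page_size
```

Under these assumptions, page 200 covers bytes 3,260,416 up to (but not including) 3,276,800 of the storage area.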
Step S204, determining first association information of the first record, where the first association information is used to characterize a storage location of the first record on the first node.
For the first record, the server may determine the association information of the record, i.e., the first association information. The first association information is used to characterize the storage location of the first record on the first node, for example, in which data page of which data table the record is stored, or at what position within a given data page it is stored.
The above storage location may be the logical location at which the first record is stored, and may be represented by all or part of the data table, the data page, and the page offset (record offset) of the record in the data page. Correspondingly, the first association information may include, but is not limited to, at least one of: an identification of the data table (e.g., a logical sequence number of the data table), an identification of the data page (e.g., a logical sequence number of the data page), a page offset, etc.
In step S206, according to the first association information, a first row identifier corresponding to the first record is generated, where the first row identifier is used to uniquely identify the first record.
According to the first association information, the server may generate a first row identification (rowid) corresponding to the first record, the first row identification being used to uniquely identify the first record. Because a record's association information characterizes the storage location of that record on the node, and different records have different storage locations, the row identifications generated from the association information of different records are also different, so each record can be uniquely identified.
The first association information may include a plurality of pieces of sub-association information, for example, a data table identifier, a data page identifier, a page offset, etc. The first row identification may be generated from the first association information according to a pre-configured order of the different pieces of sub-association information and the amount of space occupied by each piece.
For example, the first row identification of the first record may be generated in the order of the data table identification, the data page identification, and the page offset.
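One way to picture a pre-configured field order with a fixed space per field is simple bit concatenation, sketched below. The bit widths (16, 32 and 16) and the names `FIELD_LAYOUT`, `compose_rowid` and `decompose_rowid` are assumptions made for illustration only. Because the mapping can be inverted, two records at different locations cannot receive the same identifier in this sketch.

```python
# Assumed order and bit widths of the sub-association information.
FIELD_LAYOUT = (("table", 16), ("page", 32), ("offset", 16))


def compose_rowid(location):
    """Concatenate the fields of `location` in the configured order."""
    rowid = 0
    for name, bits in FIELD_LAYOUT:
        value = location[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
        rowid = (rowid << bits) | value
    return rowid


def decompose_rowid(rowid):
    """Recover the location fields from a composed row identifier."""
    location = {}
    for name, bits in reversed(FIELD_LAYOUT):
        location[name] = rowid & ((1 << bits) - 1)
        rowid >>= bits
    return location
```

Changing the order or the widths in `FIELD_LAYOUT` changes the resulting identifiers but not the property that distinct locations map to distinct identifiers, as long as every value fits in its field.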
In step S208, the first record is inserted into the first data page, where the first record includes the first data and the first row identifier.
After obtaining the first row identifier, the server may write it into the row identifier position of the record to obtain the first record, which then contains the first data and the first row identifier.
For the first record, the server may insert it into the first data page. The manner of inserting the record into the data page may refer to the related art, and this will not be described in detail in this embodiment.
Through steps S202 to S208, a first insertion instruction is received, where the first insertion instruction is used to insert a first record corresponding to first data into a first data page, and the first data page is one data page in a first data table on a first node; first association information of the first record is determined, where the first association information is used to characterize the storage location of the first record on the first node; a first row identifier corresponding to the first record is generated according to the first association information, where the first row identifier is used to uniquely identify the first record; and the first record, which contains the first data and the first row identifier, is inserted into the first data page. This solves the problem in the related art that the way row identifiers are allocated requires I/O resources and therefore increases the load on server resources, and so reduces that load.
As an alternative embodiment, determining the first association information of the first record comprises:
S11, determining a node sequence number of a first node, a data table sequence number of a first data table on the first node and a data page sequence number of a first data page in the first data table;
S12, determining page offset of a first record in a first data page, wherein the first association information comprises: node number, data table number, data page number, page offset.
In the related art, the rowid on each node of a distributed database is accumulated starting from 1, so rowid values may be duplicated across different nodes. If the system is scaled in (data from different nodes is stored on one node), a single rowid may correspond to multiple records.
Optionally, in this embodiment, the row identifier of a record may be generated from the node sequence number, the data table sequence number, the data page sequence number, and the page offset. Recording the position of the node adds node information to the rowid, which ensures that rowids are not duplicated across different nodes of the distributed database and avoids duplicate row identifiers caused by scale-in and similar operations.
In the distributed database, each node can have a serial number, and when the distributed database is started, a global variable can be initialized to record node information (node serial number), and different nodes can be distinguished through the node serial numbers.
Each data table may have a logical sequence number, and different data tables on the same node can be distinguished by the data table sequence number. Data in a data table may be stored in blocks in the form of pages. Each page has a logical sequence number (the page sequence number), denoted pageNo. Different data pages in the same data table can be distinguished by pageNo, which in turn distinguishes record offsets in different pages and avoids duplicate rowids.
The page offset of a record is the starting position of the record's storage location within its data page. Within one data page, different records have different page offsets, so different records in the same data page can be distinguished by their page offsets.
For the first record, the server may determine a node number of the first node, a data table number of the first data table on the first node, a data page number of the first data page in the first data table, and a page offset of the first record in the first data page, respectively, to obtain first association information of the first record. The resulting first association information may uniquely represent the first record.
According to this embodiment, the node sequence number, data table sequence number, data page sequence number, and page offset corresponding to a record are obtained. Because the association information of the record contains the ID of its node, rowids on different nodes of the distributed database are prevented from being duplicated.
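The contrast with per-node counters can be sketched as follows. This is a minimal illustration, not the patented implementation: the `AssociationInfo` type and the sample values are assumptions introduced here only to show why adding the node sequence number removes cross-node duplicates.

```python
from typing import NamedTuple


class AssociationInfo(NamedTuple):
    """Hypothetical container for the four pieces of sub-association information."""
    node_no: int
    table_no: int
    page_no: int
    page_offset: int


# Related-art style: every node counts its rowids from 1, so the same
# values appear on both nodes.
node_a_counter_ids = [1, 2, 3]
node_b_counter_ids = [1, 2, 3]
collisions = set(node_a_counter_ids) & set(node_b_counter_ids)

# Location-based style: identical table/page/offset values on two nodes
# still differ because the node sequence number is part of the information.
node_a = [AssociationInfo(1, 3, 0, off) for off in (120, 140, 190)]
node_b = [AssociationInfo(2, 3, 0, off) for off in (120, 140, 190)]
merged = node_a + node_b  # e.g. after both nodes are merged onto one node

print(sorted(collisions))        # counter values shared by both nodes
print(len(set(merged)))          # number of distinct location tuples
```

After merging, the counter-based identifiers overlap completely, while the six location tuples remain distinct.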
As an alternative embodiment, determining the page offset of the first record in the first data page comprises:
S21, acquiring a starting position of a previous record of a first record in a first data page and a record length of the previous record;
S22, determining the page offset of the first record in the first data page according to the starting position of the previous record and the record length of the previous record.
Page offset refers to the offset of a record within a data page, that is, the starting position of the record's storage location within the data page. The first record in a page typically starts at a specific position, which may be 120; the header of the data page before that position (the first 120 positions) may hold auxiliary information. The second record starts at the starting position of the first record plus the length of the first record, the third record starts at the starting position of the second record plus the length of the second record, and so on.
For example, as shown in the data page of fig. 3, the offset address of the first record is 120, and the record length is 20; the offset address of the second record is 140, and the record length is 50; the offset address of the third record is 190, and the record length is 20; the offset address of the fourth record is 210 and the record length is 30. For a record, the first few bytes may store the rowid of the record, followed by the data stored by the record.
For the first record, the server may determine the specific location as a page offset of the first record in the first data page if the first record is the first record of the first data page.
If the first record is not the first record of the first data page, the server may acquire a start position of a previous record of the first record and a record length of the previous record in the first data page, and determine the page offset of the first record in the first data page according to the start position of the previous record and the record length of the previous record, for example, determine a sum of the start position of the previous record and the record length of the previous record as the page offset of the first record in the first data page.
By the embodiment, the page offset of the current record is determined according to the page offset and the record length of the previous record of the current record, so that the accuracy of determining the page offset can be improved.
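The offset rule in S21 and S22 can be sketched in a few lines. The function name `next_page_offset` and the constant `FIRST_RECORD_OFFSET` are hypothetical; the value 120 and the record lengths are taken from the data page example above.

```python
from typing import Optional, Tuple

# Assumed start position of the first record in a page (120 in the example).
FIRST_RECORD_OFFSET = 120


def next_page_offset(previous: Optional[Tuple[int, int]]) -> int:
    """Return the page offset for a new record.

    `previous` is (start position, record length) of the preceding record,
    or None when the page holds no records yet.
    """
    if previous is None:
        return FIRST_RECORD_OFFSET
    start, length = previous
    return start + length


# Replay the example data page: record lengths 20, 50, 20, 30.
offsets = []
prev = None
for length in (20, 50, 20, 30):
    offset = next_page_offset(prev)
    offsets.append(offset)
    prev = (offset, length)

print(offsets)  # [120, 140, 190, 210]
```

Only the previous record's start position and length are needed, so no scan of the whole page is required in this sketch.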
As an alternative embodiment, generating the first row identifier corresponding to the first record according to the first association information includes:
S31, initializing an initial row identifier containing a plurality of bytes;
S32, writing each piece of sub-association information of the first association information into a corresponding byte position in the initial row identifier to obtain the first row identifier, wherein the first association information comprises the following sub-association information: node number, data table number, data page number, page offset.
The sub-association information included in the first association information may include: node number, data table number, data page number, page offset. The server may generate a first row identifier corresponding to the first record based on the node number, the data table number, the data page number, and the page offset. The order in which the different sub-association information (node number, data table number, data page number, and page offset) is in the row identity and the number of bytes occupied may be pre-configured.
The number of bytes allowed to be occupied by the row identification and the byte position occupied by each piece of sub-association information can be preconfigured. To generate a first row identifier corresponding to the first record, the server may first initialize an initial row identifier including a plurality of bytes, the plurality of bytes including a number of bytes permitted to be occupied by the row identifier; then, each piece of sub-association information is written into a corresponding byte position in the initial row identifier, so that a first row identifier is obtained.
When the sub-association information is written, different sub-association information can be written sequentially or in parallel. If the sub-association information has an association relationship, writing is performed according to the sequence indicated by the association relationship, which is not limited in this embodiment.
According to this embodiment, the node sequence number, data table sequence number, data page sequence number, and page offset are written into the corresponding byte positions of the initial row identifier to obtain the required row identifier, which can improve the efficiency of determining the row identifier and its ability to identify records.
As an optional embodiment, writing each piece of sub-association information of the first association information onto a corresponding byte in the initial row identifier includes:
S41, writing the node sequence number into a first byte position of the initial row identifier in big-endian mode, according to the number of bytes allocated to the node sequence number;
S42, according to the number of bytes allocated to the data table sequence number and the data page sequence number, writing the data table sequence number into a second byte position of the initial row identifier in little-endian mode, and writing the data page sequence number into a third byte position of the initial row identifier in big-endian mode;
S43, writing the page offset into a fourth byte position of the initial row identifier in big-endian mode, according to the number of bytes allocated to the page offset.
The server may write the pieces of sub-association information using the same writing mode or different writing modes. The writing modes may include, but are not limited to, at least one of: big-endian mode and little-endian mode.
Big-endian mode means that the high-order bytes of the data are stored at the low addresses of memory and the low-order bytes at the high addresses. This storage mode treats the data like a character string processed in sequence: addresses increase from low to high, while the data is placed from the high-order end to the low-order end.
Little-endian mode means that the high-order bytes of the data are stored at the high addresses of memory and the low-order bytes at the low addresses. This storage mode ties address order to bit weight: the part at the high address has the high weight, and the part at the low address has the low weight.
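The difference between the two byte orders can be seen with Python's built-in `int.to_bytes`. The sample value 0x0102 is an arbitrary illustration and does not come from the source.

```python
value = 0x0102  # high-order byte 0x01, low-order byte 0x02

# Big-endian: the high-order byte is placed at the lowest address (index 0).
big = value.to_bytes(2, byteorder="big")

# Little-endian: the low-order byte is placed at the lowest address (index 0).
little = value.to_bytes(2, byteorder="little")

print(big.hex())     # 0102
print(little.hex())  # 0201
```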
For the node sequence number and the page offset, the server may first convert the value into a binary string whose length matches the number of bytes allocated, and then store it byte by byte at the corresponding byte positions.
For example, if the node sequence number is 2 and 2 bytes are allocated to it, the server may convert it into the corresponding binary string 00000000|00000010 and write it to the corresponding two bytes in big-endian mode: 00000000 occupies the first byte position and 00000010 occupies the second byte position.
For the data table sequence number and the data page sequence number, byte positions may be allocated to the two as a whole. The two sequence numbers may share one or more bytes, and the byte positions that each may occupy can be limited. For example, the data table sequence number occupies the first M bytes (or binary bits) and the data page sequence number occupies the next N bytes (or binary bits).
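The conversion in this example can be reproduced with a short helper. The name `to_bit_groups` is hypothetical; it simply renders a value as big-endian bytes, one 8-bit group per byte.

```python
def to_bit_groups(value: int, num_bytes: int) -> str:
    """Render `value` as `num_bytes` big-endian bytes, written as 8-bit groups."""
    raw = value.to_bytes(num_bytes, byteorder="big")
    return "|".join(format(b, "08b") for b in raw)


print(to_bit_groups(2, 2))  # 00000000|00000010
```

`int.to_bytes` raises `OverflowError` if the value does not fit in the allocated number of bytes, which in this sketch stands in for whatever range check a real system would apply.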
To keep the number of data tables and the number of data pages flexibly configurable, the server may write the data table sequence number to the second byte position of the initial row identifier in little-endian mode and write the data page sequence number to the third byte position of the initial row identifier in big-endian mode. The second byte position and the third byte position may each be complete byte positions or partial byte positions.
For example, 4 bytes in total are allocated to the data table sequence number and the data page sequence number; by default, the first two bytes are allocated to the data table sequence number and the last two bytes to the data page sequence number. The number of bytes occupied by each may be adjusted according to an allocation instruction.
After the node sequence number, the data table sequence number, the data page sequence number, and the page offset are written into the corresponding byte positions of the initial row identifier, the first row identifier corresponding to the first record is obtained.
Writing different pieces of sub-association information using multiple data storage modes improves the flexibility of determining the row identifier and of configuring the number of data tables and the number of data pages.
As an alternative embodiment, after writing the data page sequence number to the third byte position of the initial row identifier in the big endian mode, the method further includes:
S51, acquiring at least one bit of data which is not stored in the data page sequence number under the condition that the bit number required for storing the data page sequence number is greater than the total bit number of bytes allocated to the data page sequence number;
S52, writing at least one bit of data onto at least one target data bit of the second byte position, wherein the at least one target data bit is one or more consecutive data bits of the second byte position adjacent to the third byte position.
After writing the data page sequence number to the third byte position of the initial row identifier in the big endian mode, if the number of bits required to store the data page sequence number is greater than the total number of bits of the bytes allocated for the data page sequence number, then the third byte position stores a portion of the information of the data page sequence number, and there is at least one bit of data that is not stored.
For example, if 2 bytes are allocated to the data page sequence number by default and the data page sequence number requires a 20-bit binary number, 4 bits of data are not stored; because of the big-endian mode, these 4 bits are the high-order bits of the data page sequence number.
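The arithmetic behind this example can be checked with a small helper. The function name `unstored_high_bits` and the sample page sequence numbers are assumptions used only for illustration.

```python
from typing import Tuple


def unstored_high_bits(page_no: int, allocated_bytes: int = 2) -> Tuple[int, int]:
    """Return (number of bits that do not fit, value of those high-order bits)."""
    allocated_bits = 8 * allocated_bytes
    needed_bits = max(page_no.bit_length(), 1)
    overflow_count = max(needed_bits - allocated_bits, 0)
    overflow_value = page_no >> allocated_bits  # bits above the allocated bytes
    return overflow_count, overflow_value


# A 20-bit page sequence number with only 2 bytes (16 bits) allocated:
print(unstored_high_bits(0xF0001))  # (4, 15)
# A page sequence number that fits in 2 bytes leaves nothing unstored:
print(unstored_high_bits(499))      # (0, 0)
```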
The server may obtain the at least one bit of the data page sequence number that is not stored and write it to at least one target data bit of the second byte position. The at least one target data bit is one or more consecutive data bits of the second byte position that are adjacent to the third byte position.
The second byte position is the byte position used to store the data table sequence number and may contain one or more bytes. If the second byte position contains one byte, the at least one target data bit is the high-order data bits of that byte. If the second byte position contains multiple bytes, the at least one target data bit is the high-order data bits of the byte, among those bytes, that is adjacent to the third byte position.
The at least one target data bit may be one or more data bits. A threshold for the number of data bits that the data page sequence number is allowed to occupy may be configured as required, and the number of data bits it occupies does not exceed that threshold. Alternatively, the number of data bits occupied by the data page sequence number may be configured as required; those data bits are then occupied regardless of whether any part of the data page sequence number is left unstored and regardless of whether the number of unstored bits reaches the configured number (0 may be padded when there are too few bits).
For example, the server may generate rowid based on the node sequence number, the data table sequence number (table sequence number), the data page sequence number (page sequence number, pageNo), and the page offset (record offset). The generated rowid includes three parts: a first part corresponding to the node sequence number, a second part corresponding to the table sequence number and the page sequence number, and a third part corresponding to the page offset. The order of the three parts may be: node information | table sequence number and pageNo | page offset. The number of bytes required to store rowid is 8.
When generating rowid, the server may define an 8-byte array to store the information of each part of rowid, defined as: unsigned char rowid[8].
The node sequence number N is stored in the first part of rowid, defined as:
rowid[0] = (unsigned char)(N >> 8), which stores the value obtained by shifting the binary number of N right by 8 bits (that is, N divided by 2^8);
rowid[1] = (unsigned char)(N), which stores the low 8 bits of N.
The first part is the first two bytes of rowid and supports a system of at most 65536 nodes.
The table sequence number T is stored in the first two bytes (the first half) of the second part of rowid, generated in little-endian mode and defined as follows:
rowid[2] = (unsigned char)(T), which stores the low 8 bits of T;
rowid[3] = (unsigned char)(T >> 8), which stores the value obtained by shifting the binary number of T right by 8 bits.
The table sequence number occupies at most 12 bits of the second part. If it exceeds 12 bits, the excess bits may be overwritten (or are overwritten directly), thereby weakening the presence of the table.
The page sequence number P is stored in the last two bytes (the second half) of the second part of rowid, generated in big-endian mode and defined as follows:
rowid[4] = (unsigned char)(P >> 8), which stores the low 8 bits of the value obtained by shifting the binary number of P right by 8 bits;
rowid[5] = (unsigned char)(P), which stores the low 8 bits of P.
If two bytes cannot hold the page sequence number, several bits of rowid[3] need to be occupied. The page sequence number occupies at most 20 bits, so it can occupy at most four bits (the high four bits) of rowid[3]; alternatively, it may be configured to occupy the high four bits of rowid[3] directly. The occupied binary value is (P >> 16), filled in from the high-order end, and rowid[3] after occupation is: rowid[3] | ((P >> 16) & 0x1) << 7 | ((P >> 16) & 0x2) << 5 | ((P >> 16) & 0x4) << 3 | ((P >> 16) & 0x8) << 1.
If the page sequence number needs more than 16 bits, that is, it is greater than 65535, then the part above 65535, as shown in FIG. 5, is stored in the high four bits of rowid[3], with its low-order bits placed in the high-order positions:
((P >> 16) & 0x1) << 7, the first bit is stored in bit 7 (counting from 0);
((P >> 16) & 0x2) << 5, the second bit is stored in bit 6 (counting from 0);
((P >> 16) & 0x4) << 3, the third bit is stored in bit 5 (counting from 0);
((P >> 16) & 0x8) << 1, the fourth bit is stored in bit 4 (counting from 0).
The page offset F is stored in the third part of rowid (two bytes in total), generated in big-endian mode and defined as:
rowid[6] = (unsigned char)(F >> 8), which stores the value obtained by shifting the binary number of F right by 8 bits;
rowid[7] = (unsigned char)(F), which stores the low 8 bits of F.
According to this embodiment, at least one bit of the page sequence number that could not be stored is written into one or more consecutive data bits of the second byte position adjacent to the third byte position. This weakens the presence of the table, suits scenarios with a large number of data pages, and improves the ability of the generated row identifier to represent data pages.
It should be noted that rowid is an identifier for indicating a record, and is not used for searching for a record, so even if the data bit storing the sequence number of the data table is covered, the usage of rowid is not affected.
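The byte layout described in this example can be put together as one function. This is a sketch under stated assumptions rather than the definitive implementation: the function name `make_rowid` and the range checks are introduced here, and the layout (2 bytes node, 2 bytes table, 2 bytes page, 2 bytes offset, with up to four overflow bits of the page sequence number mirrored into the high bits of byte 3) follows the definitions of rowid[0] to rowid[7] given above.

```python
def make_rowid(node_no: int, table_no: int, page_no: int, page_offset: int) -> bytes:
    """Build an 8-byte rowid from the four pieces of sub-association information."""
    # Range checks are an assumption of this sketch, not part of the source.
    if not 0 <= node_no < (1 << 16):
        raise ValueError("node sequence number must fit in 16 bits")
    if not 0 <= table_no < (1 << 16):
        raise ValueError("table sequence number must fit in 16 bits")
    if not 0 <= page_no < (1 << 20):
        raise ValueError("page sequence number must fit in 20 bits")
    if not 0 <= page_offset < (1 << 16):
        raise ValueError("page offset must fit in 16 bits")

    rowid = bytearray(8)

    # First part: node sequence number, big-endian.
    rowid[0] = (node_no >> 8) & 0xFF
    rowid[1] = node_no & 0xFF

    # Second part, first half: table sequence number, little-endian.
    rowid[2] = table_no & 0xFF
    rowid[3] = (table_no >> 8) & 0xFF

    # Second part, second half: low 16 bits of the page sequence number, big-endian.
    rowid[4] = (page_no >> 8) & 0xFF
    rowid[5] = page_no & 0xFF

    # Overflow bits of the page sequence number go into the high four bits of
    # rowid[3], low-order bit first: bit 0 -> bit 7, bit 1 -> bit 6, and so on.
    high = page_no >> 16
    rowid[3] |= (
        ((high & 0x1) << 7)
        | ((high & 0x2) << 5)
        | ((high & 0x4) << 3)
        | ((high & 0x8) << 1)
    )

    # Third part: page offset, big-endian.
    rowid[6] = (page_offset >> 8) & 0xFF
    rowid[7] = page_offset & 0xFF
    return bytes(rowid)


print(make_rowid(2, 3, 499, 210).hex())   # 0002030001f300d2
print(make_rowid(0, 0, 0x10000, 0).hex())  # 0000008000000000
```

In the second call the page sequence number is 65536, so its single overflow bit lands in bit 7 of byte 3. If the table sequence number also uses those bits, the OR operation merges the two values, which matches the remark that the table information may be covered.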
As an alternative embodiment, after inserting the first record into the first data page, the method further comprises:
S61, receiving a target update instruction, wherein the target update instruction is used for updating first data recorded by a first record into target data;
S62, updating the data recorded by the first record into target data in response to the target update instruction, wherein the row identifier of the first record is the first row identifier.
If the server receives a target update instruction (update), the target update instruction is used for updating the first data recorded by the first record into target data. In response to the target update instruction, the server may update the data recorded by the first record to target data.
If the storage location of the first record is unchanged, rowid is unchanged, and the row identifier of the updated first record is the first row identifier. If the first record is stored at a new location, rowid still keeps its original value, because rowid identifies the uniqueness of the recorded data rather than recording the logical address of the data; the row identifier of the updated first record is still the first row identifier, that is, the rowid of the original data.
According to this embodiment, the row identifier is kept unchanged when the data of a record is updated, which keeps handling of the record convenient and reduces the resources consumed in computing row identifiers.
As an alternative embodiment, after inserting the first record into the first data page, the method further comprises:
S71, receiving a target deleting instruction, wherein the target deleting instruction is used for deleting the data recorded by the first record;
S72, deleting the data recorded by the first record and the first row identifier in response to the target deleting instruction.
If the server receives a target delete instruction (delete), the target delete instruction is used to delete the data recorded by the first record. In response to the target delete instruction, the server may delete the data recorded by the first record.
Because rowid identifies the uniqueness of the recorded data, it becomes useless once the data is deleted and can be deleted together with the data. For the first record, the server may delete both the data recorded by the first record and the first row identifier.
By the embodiment, the row identifier is deleted while the data is deleted, so that the occupation of the row identifier to the storage resource can be reduced, and the storage resource is saved.
In addition, if newly inserted data is placed at the position of a previously deleted record, the generated rowid is identical to the rowid of the original record, because the association information (node sequence number, data table sequence number, data page sequence number, and page offset) is the same.
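The reuse behavior follows from the rowid being a pure function of the storage location. The toy model below is an illustration only: the dictionary `page`, the helpers `insert` and `delete`, the simplified `rowid_for` (which ignores overflow bits), and the fixed node, table, and page numbers are all assumptions.

```python
def rowid_for(node_no: int, table_no: int, page_no: int, page_offset: int) -> bytes:
    """Simplified rowid: each field takes 2 bytes; overflow bits are ignored here."""
    return (
        node_no.to_bytes(2, "big")
        + table_no.to_bytes(2, "little")
        + page_no.to_bytes(2, "big")
        + page_offset.to_bytes(2, "big")
    )


# Toy stand-in for one data page: page offset -> (rowid, data).
page = {}


def insert(offset: int, data: bytes) -> bytes:
    rowid = rowid_for(2, 3, 499, offset)
    page[offset] = (rowid, data)
    return rowid


def delete(offset: int) -> None:
    # The row identifier is removed together with the data.
    del page[offset]


original = insert(210, b"old")
delete(210)
reused = insert(210, b"new")

print(original == reused)  # True: same location, same rowid
```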
As an alternative embodiment, the method further comprises:
S81, after receiving the first insertion instruction and before inserting the first record into the first data page, receiving a second insertion instruction, wherein the second insertion instruction is used for inserting a second record corresponding to second data into a second data page, and the second data page is one data page in a second data table on a second node;
S82, determining second association information of a second record, wherein the second association information is used for representing a storage position of the second record on a second node;
S83, generating a second row identifier corresponding to the second record according to the second association information, wherein the second row identifier is used for uniquely identifying the second record;
S84, inserting a second record into the second data page, wherein the second record contains second data and the second row identifier.
Because a logical position is used in place of a counter, the problems caused by an accumulator can be avoided. For example, with concurrent multithreading, every thread has to take its value from the same accumulator, so access must be locked and serialized, which hurts performance under high concurrency.
Optionally, in this embodiment, because values do not need to be taken from the same accumulator, while one thread (the thread inserting data) is generating a rowid, other threads that also need to generate rowids can proceed in parallel.
For the first record, the first data is inserted by a first thread. If a second insertion instruction is received in the meantime, that instruction is used to insert a second record corresponding to second data into a second data page, where the second data page is one data page in a second data table on a second node.
In response to the second insertion instruction, the server may generate a second row identifier for the second record and insert the second record into the second data page in the same or similar manner as the first insertion instruction is processed, which is not described herein.
According to the embodiment, when multithreading is concurrent, the concurrent access mode is adopted for processing, so that the concurrency performance is improved.
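The lock-free property can be illustrated with a thread pool in which each worker derives a rowid purely from its own storage location. This is a sketch of the idea only: `rowid_from_location` is a simplified, hypothetical helper that ignores overflow bits, the locations are made up, and Python threads are used to show the absence of shared state rather than to measure real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def rowid_from_location(node_no: int, table_no: int, page_no: int, page_offset: int) -> bytes:
    """Derive a rowid from the location alone; no shared counter is read or written."""
    return (
        node_no.to_bytes(2, "big")
        + table_no.to_bytes(2, "little")
        + page_no.to_bytes(2, "big")
        + page_offset.to_bytes(2, "big")
    )


# Made-up storage locations on two nodes, four records each.
locations = [(node, 3, 499, 120 + 20 * slot) for node in (1, 2) for slot in range(4)]

# Each task touches only its own arguments, so no lock is needed.
with ThreadPoolExecutor(max_workers=4) as pool:
    rowids = list(pool.map(lambda loc: rowid_from_location(*loc), locations))

print(len(set(rowids)) == len(locations))  # True: all generated rowids are distinct
```

By contrast, a counter-based scheme would need every worker to lock and increment the same variable before it could proceed.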
The method of processing data in the embodiment of the present application is explained below in conjunction with alternative examples. In this example, the database is a distributed database, the row identifier is rowid, the node information occupies 2 bytes, the data table sequence number and the data page sequence number occupy 4 bytes (the data table sequence number occupies the first 1 byte), and the page offset occupies 2 bytes.
As shown in fig. 5, the flow of the data processing method in this alternative example may include the following steps:
Step S502, an instruction for inserting data is received.
A server of a distributed database (which may be a node in a distributed database) may receive an instruction to insert a piece of data.
Step S504, a rowid is generated for the record corresponding to the data.
In the distributed database, this data is routed to node number 02 and stored in data table number 03 on that node. In the table data file, the table already has 500 data pages, and the data needs to be added to the 500th data page, whose data page sequence number is 499. There are already three records on this page, so the data should be inserted in the fourth position, whose page offset is 210.
The node sequence number (N) is 02, so the low 8 bits of N are 2, (N >> 8) is 0, and the first two bytes of rowid are: 00000000|00000010. The table sequence number (T) is 003, so the low 8 bits of T are 3, (T >> 8) is 0, and bytes 3 and 4 of rowid are: 00000011|00000000. The page sequence number (P) is 499, so the low 8 bits of P are 243, (P >> 8) is 1, and bytes 5 and 6 of rowid are: 00000001|11110011. The page offset (F) is 210, so the low 8 bits of F are 210, (F >> 8) is 0, and bytes 7 and 8 of rowid are: 00000000|11010010. The rowid generated for this data record is therefore: 00002|50332147|00210.
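The figures in this worked example can be verified mechanically. The snippet assumes that the decimal form 00002|50332147|00210 is obtained by reading each of the three parts of rowid as a big-endian unsigned integer, which is consistent with the stated value but is not spelled out in the text.

```python
node_no, table_no, page_no, page_offset = 2, 3, 499, 210

first = node_no.to_bytes(2, "big")                                    # bytes 1-2
second = table_no.to_bytes(2, "little") + page_no.to_bytes(2, "big")  # bytes 3-6
third = page_offset.to_bytes(2, "big")                                # bytes 7-8

# Binary view of all eight bytes, one 8-bit group per byte.
bits = "|".join(format(b, "08b") for b in first + second + third)

# Decimal view: each part read as one big-endian unsigned integer (assumption).
decimal = "{:05d}|{:08d}|{:05d}".format(
    int.from_bytes(first, "big"),
    int.from_bytes(second, "big"),
    int.from_bytes(third, "big"),
)

print(bits)
print(decimal)  # 00002|50332147|00210
```

The middle part is 0x030001F3, that is, 3 × 2^24 + 499 = 50332147.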
Step S506, writing the record corresponding to the data into the data page.
After obtaining rowid, the server may store rowid and the data as one record and write the record to the 500th data page.
With this method, rowid does not need to be saved in a file and updated for each record, so I/O resources do not need to be accessed and the load on server resources can be reduced. Parallel access is possible under concurrent multithreading, which improves concurrency performance, and CPU resources can be saved because the CPU calculation is not needed. In addition, in the distributed database, rowid contains the ID of the node, so rowids are not duplicated across different nodes.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM (Read-Only Memory)/RAM (Random Access Memory), magnetic disk, optical disk) and including instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
According to another aspect of the embodiment of the present application, there is also provided a data processing apparatus for implementing the above data processing method. Fig. 6 is a block diagram of an alternative data processing apparatus according to an embodiment of the present application, and as shown in fig. 6, the apparatus may include:
A first receiving unit 602, configured to receive a first insert instruction, where the first insert instruction is configured to insert a first record corresponding to first data in a first data page, and the first data page is one data page in a first data table on a first node;
A first determining unit 604, coupled to the first receiving unit 602, configured to determine first association information of the first record, where the first association information is used to characterize a storage location of the first record on the first node;
A first generating unit 606, connected to the first determining unit 604, configured to generate, according to the first association information, a first row identifier corresponding to the first record, where the first row identifier is used to uniquely identify the first record;
A first inserting unit 608, connected to the first generating unit 606, configured to insert a first record into the first data page, where the first record includes the first data and the first row identifier.
It should be noted that, the first receiving unit 602 in this embodiment may be used to perform the above-mentioned step S202, the first determining unit 604 in this embodiment may be used to perform the above-mentioned step S204, the first generating unit 606 in this embodiment may be used to perform the above-mentioned step S206, and the first inserting unit 608 in this embodiment may be used to perform the above-mentioned step S208.
Through the above modules, a first insertion instruction is received, where the first insertion instruction is used to insert a first record corresponding to first data into a first data page, and the first data page is one data page in a first data table on a first node; first association information of the first record is determined, where the first association information is used to characterize a storage location of the first record on the first node; a first row identifier corresponding to the first record is generated according to the first association information, where the first row identifier is used to uniquely identify the first record; and the first record is inserted into the first data page, where the first record contains the first data and the first row identifier. This solves the problem in the related art that allocating row identifiers consumes I/O resources and therefore increases the load on server resources, and so reduces the load on server resources.
As an alternative embodiment, the first determining unit 604 includes:
a first determining module, configured to determine the node sequence number of the first node, the data table sequence number of the first data table on the first node, and the data page sequence number of the first data page in the first data table;
a second determining module, configured to determine the page offset of the first record in the first data page, where the first association information includes: the node sequence number, the data table sequence number, the data page sequence number, and the page offset.
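As an illustration only, the four items of association information named above can be grouped into one immutable value. The following minimal Python sketch assumes the class name `AssociationInfo`, the function name `determine_association_info`, and all field names; none of these appear in the embodiments, and the non-negativity check is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationInfo:
    """Storage location of one record on a node (illustrative field names)."""
    node_no: int      # sequence number of the node
    table_no: int     # sequence number of the data table on that node
    page_no: int      # sequence number of the data page within the table
    page_offset: int  # offset of the record within the data page


def determine_association_info(node_no, table_no, page_no, page_offset):
    """Collect the four sub-items, rejecting negative values (an assumption)."""
    fields = {"node_no": node_no, "table_no": table_no,
              "page_no": page_no, "page_offset": page_offset}
    for name, value in fields.items():
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
    return AssociationInfo(**fields)


info = determine_association_info(node_no=1, table_no=2, page_no=3, page_offset=64)
```

Because the value is frozen, two records compare equal only when all four location items match, which is the property a location-derived row identifier relies on.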
As an alternative embodiment, the second determining module includes:
an acquisition sub-module, configured to acquire the start position and the record length of the record that precedes the first record in the first data page;
a determining sub-module, configured to determine the page offset of the first record in the first data page according to the start position of the previous record and the record length of the previous record.
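The page offset calculation can be sketched as follows. This sketch assumes that records are stored back to back, so the new record starts where the previous record ends; the embodiment only states that the offset is determined from the start position and length of the previous record, so the simple sum, the function name `next_page_offset`, and the empty-page starting values are assumptions.

```python
def next_page_offset(prev_start: int, prev_length: int) -> int:
    """Offset of the new record, assumed to be the end of the previous record."""
    if prev_start < 0 or prev_length < 0:
        raise ValueError("start position and length must be non-negative")
    return prev_start + prev_length


# Hypothetical page with no header: the first record starts at offset 0.
offsets = []
start, length = 0, 0
for record_length in (20, 35, 12):
    offset = next_page_offset(start, length)
    offsets.append(offset)
    start, length = offset, record_length
# offsets is now [0, 20, 55]
```

No lookup outside the page is needed under this assumption: both inputs are already known once the previous record has been written.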
As an alternative embodiment, the first generating unit 606 includes:
an initialization module, configured to initialize an initial row identifier comprising a plurality of bytes;
a writing module, configured to write each piece of sub-association information of the first association information into a corresponding byte position in the initial row identifier to obtain the first row identifier, where the first association information comprises the following sub-association information: the node sequence number, the data table sequence number, the data page sequence number, and the page offset.
As an alternative embodiment, the writing module includes:
a first writing sub-module, configured to write the node sequence number into a first byte position of the initial row identifier in big-endian order according to the number of bytes allocated for the node sequence number;
a second writing sub-module, configured to, according to the numbers of bytes allocated for the data table sequence number and the data page sequence number, write the data table sequence number into a second byte position of the initial row identifier in little-endian order and write the data page sequence number into a third byte position of the initial row identifier in big-endian order;
a third writing sub-module, configured to write the page offset into a fourth byte position of the initial row identifier in big-endian order according to the number of bytes allocated for the page offset.
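The byte layout described by the three writing sub-modules can be sketched as below. The byte widths (1, 2, 3, and 2 bytes, giving an 8-byte identifier) and the function name `make_row_id` are assumptions chosen for illustration; the embodiment only fixes the order of the fields and the byte order used for each.

```python
# Hypothetical byte widths; the embodiments do not specify them.
NODE_BYTES, TABLE_BYTES, PAGE_BYTES, OFFSET_BYTES = 1, 2, 3, 2


def make_row_id(node_no: int, table_no: int, page_no: int, page_offset: int) -> bytes:
    """Pack the four sub-items into a fixed-size row identifier."""
    # Initialise an all-zero identifier of the full length.
    row_id = bytearray(NODE_BYTES + TABLE_BYTES + PAGE_BYTES + OFFSET_BYTES)
    layout = (
        (node_no, NODE_BYTES, "big"),        # first byte position
        (table_no, TABLE_BYTES, "little"),   # second byte position
        (page_no, PAGE_BYTES, "big"),        # third byte position
        (page_offset, OFFSET_BYTES, "big"),  # fourth byte position
    )
    pos = 0
    for value, width, byte_order in layout:
        # int.to_bytes raises OverflowError if the value does not fit.
        row_id[pos:pos + width] = value.to_bytes(width, byte_order)
        pos += width
    return bytes(row_id)


example = make_row_id(1, 0x0102, 0x030405, 0x0607)
# example.hex() == "0102010304050607"
```

Note how the data table sequence number 0x0102 appears as bytes 02 01: in little-endian order its least significant byte comes first, so its most significant byte sits directly next to the data page field.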
As an alternative embodiment, the above device further comprises:
an acquisition unit, configured to, after the data page sequence number is written into the third byte position of the initial row identifier in big-endian order, acquire at least one bit of data of the data page sequence number that has not been stored, in the case where the number of bits required to store the data page sequence number is greater than the total number of bits of the bytes allocated for the data page sequence number;
a writing unit, configured to write the at least one bit of data onto at least one target data bit of the second byte position, where the at least one target data bit is one or more consecutive data bits of the second byte position that are adjacent to the third byte position.
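One possible reading of this overflow handling is sketched below. Several points are assumptions rather than statements of the embodiment: the field widths (2 bytes for the data table sequence number, 3 bytes for the data page sequence number), the function name `write_table_and_page`, the choice to place the overflow in the low-order bits of the table field's last byte, and the rule that bits are borrowed only when that byte is entirely unused.

```python
TABLE_BYTES, PAGE_BYTES = 2, 3  # hypothetical widths


def write_table_and_page(table_no: int, page_no: int) -> bytes:
    """Write the table field (little-endian) followed by the page field
    (big-endian), spilling excess page bits into the adjacent table byte."""
    page_bits = PAGE_BYTES * 8
    low = page_no & ((1 << page_bits) - 1)  # bits that fit in the page field
    overflow = page_no >> page_bits         # bits that were not stored
    table_field = bytearray(table_no.to_bytes(TABLE_BYTES, "little"))
    if overflow:
        # The last table byte is the one adjacent to the page field.
        if overflow > 0xFF or table_field[-1] != 0:
            raise OverflowError("no free adjacent bits for the page overflow")
        table_field[-1] |= overflow
    return bytes(table_field) + low.to_bytes(PAGE_BYTES, "big")


fits = write_table_and_page(0x0102, 0x030405)   # no overflow
spills = write_table_and_page(5, 0x1ABCDEF)     # needs 25 bits, 24 available
# fits.hex() == "0201030405", spills.hex() == "0501abcdef"
```

Under these assumptions, writing the data table sequence number in little-endian order is what makes borrowing possible: a small table number leaves its most significant byte, the one next to the page field, empty.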
As an alternative embodiment, the above device further comprises:
a second receiving unit, configured to receive a target update instruction after the first record is inserted into the first data page, where the target update instruction is used to update the first data recorded by the first record to target data;
an updating unit, configured to update, in response to the target update instruction, the data recorded by the first record to the target data, where the row identifier of the first record remains the first row identifier.
As an alternative embodiment, the above device further comprises:
a third receiving unit, configured to receive a target deletion instruction after the first record is inserted into the first data page, where the target deletion instruction is used to delete the data recorded by the first record;
a deleting unit, configured to delete, in response to the target deletion instruction, the data recorded by the first record and the first row identifier.
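The update and deletion behaviour of the two preceding embodiments can be illustrated with a small in-memory stand-in for a data page. The class `DataPage`, its method names, and the use of a dictionary in place of a physical page are all assumptions made for this sketch.

```python
class DataPage:
    """In-memory stand-in for a data page, keyed by row identifier."""

    def __init__(self):
        self._records = {}  # row identifier (bytes) -> recorded data

    def insert(self, row_id: bytes, data: bytes) -> None:
        if row_id in self._records:
            raise KeyError("row identifier already present")
        self._records[row_id] = data

    def update(self, row_id: bytes, target_data: bytes) -> None:
        """Replace the recorded data; the row identifier is left unchanged."""
        if row_id not in self._records:
            raise KeyError("unknown row identifier")
        self._records[row_id] = target_data

    def delete(self, row_id: bytes) -> None:
        """Remove the recorded data together with its row identifier."""
        del self._records[row_id]

    def get(self, row_id: bytes):
        return self._records.get(row_id)


page = DataPage()
first_row_id = bytes.fromhex("0102010304050607")  # hypothetical identifier
page.insert(first_row_id, b"first data")
page.update(first_row_id, b"target data")
after_update = page.get(first_row_id)
page.delete(first_row_id)
after_delete = page.get(first_row_id)
```

The same identifier addresses the record before and after the update, and a lookup after deletion finds nothing because the identifier was removed along with the data.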
As an alternative embodiment, the above device further comprises:
a fourth receiving unit, configured to receive a second insertion instruction after the first insertion instruction is received and before the first record is inserted into the first data page, where the second insertion instruction is used to insert a second record corresponding to second data into a second data page, and the second data page is one data page in a second data table on a second node;
a second determining unit, configured to determine second association information of the second record, where the second association information is used to characterize a storage location of the second record on the second node;
a second generating unit, configured to generate, according to the second association information, a second row identifier corresponding to the second record, where the second row identifier is used to uniquely identify the second record;
a second inserting unit, configured to insert the second record into the second data page, where the second record contains the second data and the second association information.
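Because each row identifier is derived only from the record's own storage location, two insertions on different nodes need no shared counter. The sketch below illustrates this with two worker threads; the function names, the use of a tuple in place of a packed byte identifier, and the dictionaries standing in for data pages are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def row_id_for(node_no: int, table_no: int, page_no: int, page_offset: int) -> tuple:
    """Derive an identifier from the location alone (a tuple stands in
    for the packed byte form)."""
    return (node_no, table_no, page_no, page_offset)


def handle_insert(page: dict, location: tuple, data: bytes) -> tuple:
    """Generate the identifier locally and store the record in its own page."""
    row_id = row_id_for(*location)
    page[row_id] = data
    return row_id


first_page, second_page = {}, {}  # pages on two different nodes
with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(handle_insert, first_page, (1, 7, 3, 0), b"first data")
    f2 = pool.submit(handle_insert, second_page, (2, 4, 9, 0), b"second data")
    first_id, second_id = f1.result(), f2.result()
```

The two identifiers differ because the node sequence numbers differ, even though neither worker consulted the other or any central allocator.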
It should be noted that the examples and application scenarios implemented by the above modules are the same as those of the corresponding steps, but are not limited to what is disclosed in the above embodiments. It should also be noted that the above modules, as part of the apparatus, may run in the hardware environment shown in Fig. 1 and may be implemented in software or in hardware, where the hardware environment includes a network environment.
According to still another aspect of the embodiments of the present application, there is also provided an electronic device for implementing the above-mentioned data processing method, where the electronic device may be a server, a terminal, or a combination thereof.
Fig. 7 is a block diagram of an alternative electronic device according to an embodiment of the present application. As shown in Fig. 7, the electronic device includes a processor 702, a communication interface 704, a memory 706, and a communication bus 708, where the processor 702, the communication interface 704, and the memory 706 communicate with one another via the communication bus 708, and where:
A memory 706 for storing a computer program;
The processor 702, when executing the computer program stored on the memory 706, performs the following steps:
S1, receiving a first insertion instruction, wherein the first insertion instruction is used for inserting a first record corresponding to first data in a first data page, and the first data page is one data page in a first data table on a first node;
S2, determining first association information of a first record, wherein the first association information is used for representing a storage position of the first record on a first node;
S3, generating a first row identifier corresponding to the first record according to the first association information, wherein the first row identifier is used for uniquely identifying the first record;
S4, inserting the first record into a first data page, wherein the first record comprises first data and a first row identifier.
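Steps S1 to S4 can be sketched end to end as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the class `Page`, the function `handle_insert`, the 8-byte identifier layout (1, 2, 3, and 2 bytes), and the choice to store the identifier in front of the data are all hypothetical.

```python
class Page:
    """Hypothetical data page that appends records back to back."""

    def __init__(self, node_no: int, table_no: int, page_no: int):
        self.node_no, self.table_no, self.page_no = node_no, table_no, page_no
        self.buf = bytearray()
        self.prev_start = 0   # start position of the previous record
        self.prev_length = 0  # length of the previous record


def handle_insert(page: Page, data: bytes) -> bytes:
    # S1: the call itself models receiving the insertion instruction.
    # S2: determine the association information (location of the record).
    offset = page.prev_start + page.prev_length
    # S3: generate the row identifier from the association information.
    row_id = (page.node_no.to_bytes(1, "big")
              + page.table_no.to_bytes(2, "little")
              + page.page_no.to_bytes(3, "big")
              + offset.to_bytes(2, "big"))
    # S4: insert the record, which contains the row identifier and the data.
    record = row_id + data
    page.buf += record
    page.prev_start, page.prev_length = offset, len(record)
    return row_id


page = Page(node_no=1, table_no=2, page_no=3)
first_row_id = handle_insert(page, b"abc")
second_row_id = handle_insert(page, b"de")
```

The first record occupies 11 bytes (8 for the identifier, 3 for the data), so the second identifier ends with the offset 11, and neither insertion had to read or write an identifier counter.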
Optionally, in this embodiment, the communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 7, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include RAM and may also include non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
As an example, the memory 706 may include, but is not limited to, the first receiving unit 602, the first determining unit 604, the first generating unit 606, and the first inserting unit 608 of the data processing apparatus. It may also include, but is not limited to, other module units of the data processing apparatus, which are not described in detail in this example.
The processor may be a general-purpose processor, including but not limited to a CPU, an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments; details are not repeated here.
It will be understood by those skilled in the art that the structure shown in Fig. 7 is only schematic. The device implementing the above data processing method may be a terminal device, such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD. Fig. 7 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components (e.g., network interfaces, display devices) than shown in Fig. 7, or have a configuration different from that shown in Fig. 7.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments may be completed by a program instructing hardware related to a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a ROM, a RAM, a magnetic disk, an optical disk, or the like.
According to yet another aspect of the embodiments of the present application, a storage medium is also provided. Optionally, in this embodiment, the storage medium may be used to store program code for executing the data processing method of any one of the above embodiments of the present application.
Alternatively, in this embodiment, the storage medium may be located on at least one network device of the plurality of network devices in the network shown in the above embodiment.
Alternatively, in the present embodiment, the storage medium is configured to store program code for performing the steps of:
S1, receiving a first insertion instruction, wherein the first insertion instruction is used for inserting a first record corresponding to first data in a first data page, and the first data page is one data page in a first data table on a first node;
S2, determining first association information of a first record, wherein the first association information is used for representing a storage position of the first record on a first node;
S3, generating a first row identifier corresponding to the first record according to the first association information, wherein the first row identifier is used for uniquely identifying the first record;
S4, inserting the first record into a first data page, wherein the first record comprises first data and a first row identifier.
Alternatively, specific examples in the present embodiment may refer to examples described in the above embodiments, which are not described in detail in the present embodiment.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a ROM, a RAM, a removable hard disk, a magnetic disk, or an optical disk.
The serial numbers of the foregoing embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other ways. The apparatus embodiments described above are merely exemplary. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution provided in the present embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the present application, and such improvements and modifications shall also fall within the scope of protection of the present application.
Claims (10)
1. A method of processing data, comprising:
receiving a first insertion instruction, wherein the first insertion instruction is used for inserting a first record corresponding to first data in a first data page, and the first data page is one data page in a first data table on a first node;
Determining first association information of the first record, wherein the first association information is used for representing a storage position of the first record on the first node, and the first association information comprises: node sequence number, data table sequence number, data page sequence number, and page offset; wherein determining the first association information of the first record comprises: determining a node sequence number of the first node, a data table sequence number of the first data table on the first node, a data page sequence number of the first data page in the first data table and a page offset of the first record in the first data page;
Generating a first row identifier corresponding to the first record according to the first association information, wherein the first row identifier is used for uniquely identifying the first record;
inserting the first record into the first data page, wherein the first record contains the first data and the first row identifier;
receiving a target update instruction, wherein the target update instruction is used for updating first data recorded by the first record into target data;
and updating, in response to the target update instruction, the data recorded by the first record to the target data, wherein the row identifier of the first record is the first row identifier.
2. The method of claim 1, wherein determining the page offset of the first record in the first data page comprises:
Acquiring a starting position of a previous record of the first record and a record length of the previous record in the first data page;
and determining the page offset of the first record in the first data page according to the starting position of the previous record and the record length of the previous record.
3. The method of claim 1, wherein generating the first row identification corresponding to the first record according to the first association information comprises:
Initializing an initial row identifier comprising a plurality of bytes;
Writing each piece of sub-association information of the first association information into a corresponding byte position in the initial row identifier to obtain the first row identifier, wherein the first association information comprises the following piece of sub-association information: the node sequence number, the data table sequence number, the data page sequence number and the page offset.
4. The method according to claim 3, wherein writing each piece of sub-association information of the first association information into the corresponding byte position in the initial row identifier comprises:
writing the node sequence number into a first byte position of the initial row identifier in big-endian order according to the number of bytes allocated for the node sequence number;
writing, according to the numbers of bytes allocated for the data table sequence number and the data page sequence number, the data table sequence number into a second byte position of the initial row identifier in little-endian order, and writing the data page sequence number into a third byte position of the initial row identifier in big-endian order;
and writing the page offset into a fourth byte position of the initial row identifier in big-endian order according to the number of bytes allocated for the page offset.
5. The method of claim 4, wherein after writing the data page sequence number to the third byte position of the initial row identification in a big endian mode, the method further comprises:
acquiring at least one bit of data of the data page sequence number that has not been stored, in the case where the number of bits required to store the data page sequence number is greater than the total number of bits of the bytes allocated for the data page sequence number;
writing the at least one bit of data onto at least one target data bit of the second byte position, wherein the at least one target data bit is one or more consecutive data bits of the second byte position adjacent to the third byte position.
6. The method of claim 1, wherein after inserting the first record into the first data page, the method further comprises:
receiving a target deleting instruction, wherein the target deleting instruction is used for deleting the data recorded by the first record;
and deleting the data recorded by the first record and the first row identifier in response to the target deleting instruction.
7. The method according to any one of claims 1 to 6, further comprising:
After receiving a first insertion instruction and before inserting the first record into the first data page, receiving a second insertion instruction, wherein the second insertion instruction is used for inserting a second record corresponding to second data into a second data page, and the second data page is one data page in a second data table on a second node;
Determining second association information of the second record, wherein the second association information is used for representing a storage position of the second record on the second node;
Generating a second row identifier corresponding to the second record according to the second association information, wherein the second row identifier is used for uniquely identifying the second record;
And inserting the second record into the second data page, wherein the second record contains the second data and the second association information.
8. A data processing apparatus, comprising:
The first receiving unit is used for receiving a first inserting instruction, wherein the first inserting instruction is used for inserting a first record corresponding to first data in a first data page, and the first data page is one data page in a first data table on a first node;
A first determining unit, configured to determine first association information of the first record, where the first association information is used to characterize a storage location of the first record on the first node, and the first association information includes: node sequence number, data table sequence number, data page sequence number, and page offset; wherein determining the first association information of the first record comprises: determining a node sequence number of the first node, a data table sequence number of the first data table on the first node, a data page sequence number of the first data page in the first data table and a page offset of the first record in the first data page;
the first generation unit is used for generating a first row identifier corresponding to the first record according to the first association information, wherein the first row identifier is used for uniquely identifying the first record;
A first inserting unit, configured to insert the first record into the first data page, where the first record includes the first data and the first row identifier;
The second receiving unit is used for receiving a target updating instruction, wherein the target updating instruction is used for updating the first data recorded by the first record into target data;
and an updating unit, configured to update, in response to the target update instruction, the data recorded by the first record to the target data, wherein the row identifier of the first record is the first row identifier.
9. An electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus, characterized in that,
The memory is used for storing a computer program;
The processor is configured to perform the method steps of any of claims 1 to 7 by running the computer program stored on the memory.
10. A computer-readable storage medium, characterized in that the storage medium has stored therein a computer program, wherein the computer program is arranged to perform the method steps of any of claims 1 to 7 when run.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011561924.0A CN112732174B (en) | 2020-12-25 | 2020-12-25 | Data processing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112732174A (en) | 2021-04-30 |
CN112732174B (en) | 2024-09-13 |
Family
ID=75616323
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011561924.0A Active CN112732174B (en) | 2020-12-25 | 2020-12-25 | Data processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112732174B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6546394B1 (en) * | 1999-12-28 | 2003-04-08 | Oracle International Corporation | Database system having logical row identifiers |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050216518A1 (en) * | 2004-03-26 | 2005-09-29 | Oracle International Corporation | Database management system with persistent, user-accessible bitmap values |
US9396103B2 (en) * | 2007-06-08 | 2016-07-19 | Sandisk Technologies Llc | Method and system for storage address re-mapping for a memory device |
CN102651008B (en) * | 2011-02-28 | 2015-06-17 | 国际商业机器公司 | Method and equipment for organizing data records in relational data base |
US8396858B2 (en) * | 2011-08-11 | 2013-03-12 | International Business Machines Corporation | Adding entries to an index based on use of the index |
CN103500183A (en) * | 2013-09-12 | 2014-01-08 | 国家计算机网络与信息安全管理中心 | Storage structure based on multiple-relevant-field combined index and building, inquiring and maintaining method |
US9898551B2 (en) * | 2014-11-25 | 2018-02-20 | Sap Se | Fast row to page lookup of data table using capacity index |
US10725987B2 (en) * | 2014-11-25 | 2020-07-28 | Sap Se | Forced ordering of a dictionary storing row identifier values |
CN110196847A (en) * | 2018-08-16 | 2019-09-03 | 腾讯科技(深圳)有限公司 | Data processing method and device, storage medium and electronic device |
US11514027B2 (en) * | 2019-06-07 | 2022-11-29 | Sap Se | Paged hybrid LOBs |
CN110555001B (en) * | 2019-09-05 | 2021-05-28 | 腾讯科技(深圳)有限公司 | Data processing method, device, terminal and medium |
CN111190903A (en) * | 2019-12-27 | 2020-05-22 | 柏科数据技术(深圳)股份有限公司 | Btree block indexing technology for disaster recovery client |
Also Published As
Publication number | Publication date |
---|---|
CN112732174A (en) | 2021-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11003625B2 (en) | Method and apparatus for operating on file | |
US11314689B2 (en) | Method, apparatus, and computer program product for indexing a file | |
CN109299190B (en) | Method and device for processing metadata of object in distributed storage system | |
CN108052643B (en) | Data storage method and device based on LSM Tree structure and storage engine | |
CN112269665B (en) | Memory processing method and device, electronic equipment and storage medium | |
CN112835528B (en) | Dirty page refreshing method and device, electronic equipment and storage medium | |
CN111803917B (en) | Resource processing method and device | |
CN109697019B (en) | Data writing method and system based on FAT file system | |
CN110377276B (en) | Source code file management method and device | |
CN107451070B (en) | Data processing method and server | |
CN114385089B (en) | Cross addressing-based dynamic bank storage method and device and electronic equipment | |
CN115470156A (en) | RDMA-based memory use method, system, electronic device and storage medium | |
CN112732174B (en) | Data processing method and device, electronic equipment and storage medium | |
CN110008020B (en) | Memory management method, memory management device, electronic equipment and computer readable storage medium | |
CN113268439A (en) | Memory address searching method and device, electronic equipment and storage medium | |
CN113986134B (en) | Method for storing data, method and device for reading data | |
CN112115521A (en) | Data access method and device | |
CN111209304B (en) | Data processing method, device and system | |
CN115934999A (en) | Video stream data storage method, device and medium based on block file | |
CN108196790B (en) | Data management method, storage device, and computer-readable storage medium | |
CN113254273A (en) | Method, system, device and medium for real-time recovery of principal metadata | |
CN112799592A (en) | Multi-namespace allocation method, device, equipment and readable medium | |
CN111625502A (en) | Data reading method and device, storage medium and electronic device | |
CN113448958B (en) | Data processing method and device, electronic equipment and storage medium | |
CN118331748B (en) | Data processing method, device, medium and computing equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||