US20080235245A1 - Commitment of transactions in a distributed system - Google Patents
Commitment of transactions in a distributed system
- Publication number
- US20080235245A1 (U.S. application Ser. No. 12/131,006)
- Authority
- US
- United States
- Prior art keywords
- transaction
- node
- participant
- nodes
- sequence number
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/466—Transaction processing
Definitions
- the present invention relates generally to distributed systems. More particularly, the present invention is directed to commitment of transactions in a distributed system.
- a distributed system is a multi-node system in which data is stored in various databases.
- Nodes can be any data processing system, such as a computer system. Although each database can only be accessed through one node, more than one database may be accessible through a node in the distributed system.
- the nodes in a distributed system can be connected to one another through a network, such as a local area network (LAN) or a wide area network (WAN).
- nodes in a distributed system may be in one location or spread out over multiple locations. Examples of distributed systems include database systems, mail server systems, etc.
- a transaction, which consists of a set of requests that results in a single logical action, can modify data on multiple databases in a distributed system.
- the distributed system must therefore ensure that data consistency is maintained, regardless of whether failures (e.g., power outages, hardware crashes, etc.) occur.
- each requested operation in a transaction must be “committed,” i.e., changes to data become persistent, before the transaction can be committed.
- a data change becomes persistent when a log record of the data change is “flushed,” i.e., written, to non-volatile storage (e.g., disk drive).
- Log records allow a node to restore a database to its pre-failure state by replaying the operations that committed prior to failure.
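The flush-then-replay discipline described above can be sketched as follows. This is an illustrative model only; the class and method names are invented here and are not taken from the patent.

```python
# Sketch of write-ahead logging: a change is durable only once its log
# record is flushed to (simulated) non-volatile storage, and flushed
# records can be replayed to restore the pre-failure state.

class Node:
    def __init__(self):
        self.volatile_log = []   # log records in RAM (lost on failure)
        self.stable_log = []     # records flushed to non-volatile storage
        self.next_lsn = 1

    def append(self, operation):
        """Create a log record in volatile memory; return its LSN."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.volatile_log.append((lsn, operation))
        return lsn

    def flush(self):
        """Flush pending records; the changes they describe become persistent."""
        self.stable_log.extend(self.volatile_log)
        self.volatile_log.clear()

    def recover(self):
        """Replay only flushed records; unflushed changes are lost."""
        return [op for _, op in self.stable_log]

node = Node()
node.append("insert row 1")
node.flush()
node.append("insert row 2")   # never flushed: lost on failure
print(node.recover())         # ['insert row 1']
```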
- traditionally, distributed systems have utilized a two-phase commit (2PC) protocol to preserve consistency of data. In a 2PC system, a coordinator node for each transaction, i.e., the node where a client (e.g., an application) submitted the transaction, identifies, for each request in the transaction, a node in the distributed system responsible for handling the request.
- each node assigned to handle a request in the transaction is referred to as a participant node.
- Each participant node in a two-phase commit protocol votes whether to commit or abort the transaction and sends its vote to the coordinator node.
- the coordinator node then makes the final decision on whether to commit or abort the transaction based on the vote from each participant node.
- a transaction will only be committed by the coordinator node if all of the participant nodes vote to commit the transaction. Otherwise, the coordinator node will abort the transaction.
- the two-phase commit protocol is not really message efficient because during phase one, the coordinator node sends a message to each participant node to prepare to commit the transaction. Each participant node then decides whether it can commit the requested operation(s) and sends a message back to the coordinator node with its vote on whether to commit or abort the transaction. In the second phase, the coordinator node decides whether to commit or abort the transaction based on all of the votes it received from the participant nodes and sends a message to each participant node to commit or abort the transaction.
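The two-phase decision rule described above can be sketched as a minimal function; this is an illustrative model of textbook 2PC (the node names and callables are invented), not the patent's own protocol.

```python
# Minimal sketch of the 2PC decision rule: the coordinator commits only if
# every participant votes COMMIT; any ABORT vote aborts the transaction.

def two_phase_commit(participants):
    """participants: mapping of node name -> callable returning 'COMMIT' or 'ABORT'."""
    # Phase 1: coordinator asks each participant to prepare; each votes.
    votes = {name: prepare() for name, prepare in participants.items()}
    # Phase 2: decide, then (in a real system) message the decision to each node.
    decision = "COMMIT" if all(v == "COMMIT" for v in votes.values()) else "ABORT"
    return decision, votes

decision, votes = two_phase_commit({
    "node_a": lambda: "COMMIT",
    "node_b": lambda: "ABORT",
})
print(decision)  # ABORT
```

Note that even this sketch implies two rounds of messages per participant (prepare/vote, then decide), which is the message cost the patent sets out to reduce.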
- another commit protocol employed by distributed systems is a two-interval commit (2IC), discussed in U.S. Pat. No. 5,799,305, entitled “Method of Commitment in a Distributed Database Transaction,” which is hereby incorporated in its entirety for all purposes. The 2IC system uses interval messages that are sent in succession from an interval coordinator to determine whether to commit or abort a transaction. Thus, although a 2IC system requires less messaging than a 2PC system, it is still more message-intensive than necessary.
- accordingly, there is a need for a distributed transaction commitment protocol that is more message efficient than current commitment protocols. The present invention addresses such a need.
- a computer program product and system for committing transactions in a distributed system are provided.
- the computer program product and system provide for receiving a request from a client to commit a transaction at a coordinator node in the distributed system, the distributed system comprising one or more participant nodes, tracking a tail log sequence number for each of all other nodes in the distributed system, each tail log sequence number approximating a last transaction log record flushed by the respective node, wherein at least one of the all other nodes is a participant node, determining a max log sequence number associated with the transaction for each of the one or more participant nodes, each max log sequence number corresponding to a highest transaction log record required for the transaction at the respective participant node, and committing the transaction at the coordinator node when the tail log sequence number for each of the one or more participant nodes is greater than or equal to the max log sequence number associated with the transaction at the respective participant node.
- in one implementation, a computer program product on a computer-readable medium containing a plurality of executable program instructions for committing transactions in a distributed system is provided. The instructions, when executed, perform to: receive a request from a client to commit a transaction at a coordinator node in the distributed system, the distributed system comprising one or more participant nodes; track a tail log sequence number for each of all other nodes in the distributed system, each tail log sequence number being associated with a last transaction log record flushed by a respective node, wherein at least one node of the all other nodes is a participant node; determine a max log sequence number associated with the transaction for each of the one or more participant nodes, each max log sequence number corresponding to the highest transaction log record required for the transaction at the respective participant node; and commit the transaction at the coordinator node when the tail log sequence number for each of the one or more participant nodes is greater than or equal to the max log sequence number associated with the transaction at the respective participant node.
- FIG. 1 is a process flow of a method for committing transactions in a distributed system according to an aspect of the invention.
- FIGS. 2A-2B illustrate flowcharts of a method for committing transactions in a distributed system in accordance with one implementation of the invention.
- FIGS. 3A-3B depict a distributed system according to an embodiment of the invention.
- FIG. 4 shows a distributed system in accordance with another aspect of the invention.
- FIG. 5 is a block diagram of a data processing system with which embodiments of the present invention can be implemented.
- the present invention relates generally to distributed systems and more particularly to commitment of transactions in a distributed system.
- the following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements.
- Various modifications to the preferred implementations and the generic principles and features described herein will be readily apparent to those skilled in the art.
- the present invention is not intended to be limited to the implementations shown, but is to be accorded the widest scope consistent with the principles and features described herein.
- FIG. 1 depicts a process 100 for committing transactions in a distributed system according to an aspect of the invention.
- at 102, a request to commit a transaction is received from a client at a coordinator node in the distributed system.
- a client can be an application or a process and may be located on the coordinator node or some other node.
- the coordinator node is usually where the transaction was initialized, i.e., where the client submitted a request to begin the transaction.
- the distributed system comprises all participant nodes participating in the transaction, and there may be one or more participant nodes in the distributed system. In some embodiments, the coordinator node may also be a participant node.
- at 104, a tail log sequence number for every other node in the distributed system is tracked (at the coordinator node). At least one of these other nodes is a participant node in the distributed system. Each node in the distributed system keeps a log of all of the requests which are fulfilled by that node. Log records are usually first created in volatile memory, i.e., memory susceptible to failures, such as random access memory (RAM). When a node is ready to commit a requested operation, it will “flush,” i.e., write, the log record associated with the requested operation to non-volatile memory, such as a hard disk. Flushing the log record permits the node to re-perform the requested operation if a failure occurs sometime thereafter. Each tail log sequence number approximates the last transaction log record flushed by the respective node.
- a max log sequence number associated with the transaction is then determined for each of the one or more participant nodes at 106. There are usually multiple requests within a single transaction. Since each participant node may be assigned to handle more than one of the requests and each request has a separate log record, each max log sequence number corresponds to the highest transaction log record required for the transaction at the respective participant node (i.e., the log sequence number of the transaction log record corresponding to the last requested operation executed by the respective participant node for the transaction). Because each node's log is unique to the node, a log sequence number at one node will usually correspond to a different transaction than the same log sequence number at another node.
- the transaction is committed at the coordinator node when the tail log sequence number for each of the one or more participant nodes is greater than or equal to the max log sequence number associated with the transaction at the respective participant node ( 108 ). This ensures that each participant node has committed its respective transaction request(s) before the transaction is committed at the coordinator node, which preserves data consistency.
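The commit test of process 100 can be sketched directly from the description above; all names and LSN values here are illustrative assumptions, not taken from the patent.

```python
# Sketch of the commit condition: the coordinator commits once, for every
# participant, the tracked tail LSN (last record the participant is known
# to have flushed) has reached the max LSN the transaction requires there.

def can_commit(tail_lsns, max_trans_lsns):
    """tail_lsns: node -> approx. last flushed LSN, tracked at the coordinator.
    max_trans_lsns: participant -> highest LSN required for this transaction."""
    return all(tail_lsns.get(node, -1) >= needed
               for node, needed in max_trans_lsns.items())

tails = {"node_b": 120, "node_c": 87}
needed = {"node_b": 118, "node_c": 90}
print(can_commit(tails, needed))  # False: node_c's tail (87) < required (90)
tails["node_c"] = 92              # a later piggybacked update arrives
print(can_commit(tails, needed))  # True
```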
- illustrated in FIGS. 2A-2B is a process 200 for committing transactions in a distributed system in accordance with an embodiment of the invention.
- a request to commit a transaction is received from a client at a coordinator node in the distributed system at 202 .
- when the commit request is submitted, the client has typically already submitted many other requests, such as requests to modify, delete, or insert data, to the coordinator node, which then forwarded those requests on to one or more participant nodes in the distributed system participating in the transaction.
- a first array comprising an entry for each of the other nodes in the distributed system is maintained at the coordinator node ( 204 ). Each entry is operably configured to store a tail log sequence number for the respective other node.
- at 206, the first array is updated when a new tail log sequence number is piggybacked on a message from one of the other nodes.
- a message may include one or more responses, one or more requests, or a combination of the two. This results in a significant reduction in messaging traffic because the tail log sequence number is included along with a message one node was already going to send to another node, rather than being sent in a new, separate message.
- the cost of adding the tail log sequence number to an existing message is very low, usually only a few extra bytes.
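The piggybacking step can be sketched as follows; the message fields and function names are assumptions made for this illustration.

```python
# Sketch of piggybacking: a node attaches its current tail LSN to a message
# it was sending anyway, and the receiver updates its tail-LSN table (the
# "first array") at negligible extra cost (a few bytes per message).

def send_with_piggyback(sender_name, sender_tail_lsn, payload):
    """Attach the sender's tail LSN to an outgoing message."""
    return {"from": sender_name, "payload": payload, "tail_lsn": sender_tail_lsn}

def receive(message, tail_lsn_array):
    """Update the receiver's tail-LSN entry for the sending node, ignoring
    stale (lower) values, then hand back the real payload."""
    sender, lsn = message["from"], message.get("tail_lsn")
    if lsn is not None and lsn > tail_lsn_array.get(sender, -1):
        tail_lsn_array[sender] = lsn
    return message["payload"]

tails = {"node_b": 100}
msg = send_with_piggyback("node_b", 115, "row inserted")
receive(msg, tails)
print(tails)  # {'node_b': 115}
```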
- in some embodiments, the tail log sequence number sent by a node corresponds not to the actual last transaction log record flushed by the node, but to some earlier transaction log record. This helps minimize contention for resources, such as memory, on the node because it allows more time for resources allocated to other transactions to be unlocked or unlatched.
- a second array for the transaction is created at 208 .
- the second array comprises an entry for each of the one or more participant nodes, where each entry is operably configured to store the max log sequence number associated with the transaction for the respective participant node.
- the second array is updated when the max log sequence number associated with the transaction is piggybacked on a response from one of the one or more participant nodes. By piggybacking the max log sequence numbers on responses the one or more participant nodes were already going to send to the coordinator node, messaging traffic is further reduced.
- the second array is then compared to the first array to determine whether each of the one or more participant nodes has flushed the highest transaction log record required for the transaction at the respective participant node ( 212 ).
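The update of the second array (step 210) and the comparison against the first array (step 212) can be sketched as below; the dictionary layout and field names are illustrative assumptions.

```python
# Sketch of maintaining the per-transaction Max Trans LSN array from
# piggybacked responses, then comparing it against the Tail LSN array to
# find participants whose required records are not yet known to be flushed.

def update_max_trans_lsn(max_trans_lsn_array, response):
    """Record the highest LSN the transaction requires at the responding node."""
    node, lsn = response["from"], response["max_trans_lsn"]
    max_trans_lsn_array[node] = max(max_trans_lsn_array.get(node, -1), lsn)

def lagging_participants(tail_lsn_array, max_trans_lsn_array):
    """Participants whose tail LSN has not reached the transaction's max LSN."""
    return [node for node, needed in max_trans_lsn_array.items()
            if tail_lsn_array.get(node, -1) < needed]

max_array = {}
update_max_trans_lsn(max_array, {"from": "node_b", "max_trans_lsn": 55})
tails = {"node_b": 57}
print(lagging_participants(tails, max_array))  # []: node_b's records are flushed
```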
- if each of the one or more participant nodes has flushed the highest transaction log record required for the transaction, a commit log record for the transaction is written at the coordinator node ( 214 ), a commit request is scheduled to be sent to each of the one or more participant nodes along with another message already scheduled to be sent to the respective participant node ( 216 ), and a successful commit message is returned to the client ( 218 ).
- messaging traffic is again reduced by sending commit requests to each participant node via another message already scheduled to be sent to the respective participant node.
- once a participant node receives the commit request, it will perform local commit processing, such as unlocking resources reserved for the transaction and writing a commit log record for the transaction.
- once a participant node completes local commit processing, it will send a response to the coordinator node that it has committed the transaction locally. The response may be piggybacked on an unrelated message being sent from the participant node to the coordinator node to further reduce messaging traffic.
- once the coordinator node receives a response back from each participant node, it will reclaim a log space at the coordinator node assigned to retain information about the state of the transaction ( 220 ). The information may include the identity of each participant node, etc.
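Steps 214-220 can be sketched as a small coordinator-side state machine; the class, states, and outbox structure are assumptions for illustration only.

```python
# Sketch of the coordinator's commit flow: write the commit record, queue a
# commit request to ride on the next outbound message to each participant,
# and reclaim the transaction's state once every participant has
# acknowledged its local commit.

class CoordinatorTxn:
    def __init__(self, txn_id, participants):
        self.txn_id = txn_id
        self.pending_acks = set(participants)  # state kept until all ack
        self.state = "ACTIVE"

    def commit(self, outbox):
        self.state = "COMMITTED"               # commit log record written here
        for node in self.pending_acks:         # piggyback on scheduled messages
            outbox.setdefault(node, []).append(("COMMIT", self.txn_id))
        return "commit ok"                     # successful commit message to client

    def on_participant_ack(self, node):
        self.pending_acks.discard(node)
        if not self.pending_acks:
            self.state = "RECLAIMED"           # log space can now be reused

outbox = {}
txn = CoordinatorTxn("t1", ["node_b", "node_c"])
txn.commit(outbox)
txn.on_participant_ack("node_b")
txn.on_participant_ack("node_c")
print(txn.state)  # RECLAIMED
```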
- in one embodiment, the coordinator node will simply wait and check again after sending the message to flush one or more transaction log records.
- in another embodiment, the message may include the log sequence number to be flushed and a response request. The coordinator node will then wait for the response before proceeding.
- the need to actively send a request to a participant node to flush one or more transaction records should be a rare occurrence as a distributed system typically handles so many transactions that there are plenty of messages being sent between nodes to allow the tail log sequence numbers to be updated frequently.
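The explicit flush-request fallback described above can be sketched as follows; the message fields and the `flush_log` callback are invented for this illustration.

```python
# Sketch of the rare fallback path: the coordinator names the LSN it needs
# flushed and optionally asks for a response before proceeding.

def make_flush_request(txn_id, flush_to_lsn, want_response=True):
    """Build an explicit flush request for a lagging participant."""
    return {"type": "FLUSH", "txn": txn_id,
            "flush_to_lsn": flush_to_lsn, "respond": want_response}

def handle_flush_request(request, flush_log):
    """Participant side: force the flush, then reply with the new tail LSN
    if a response was requested."""
    tail = flush_log(request["flush_to_lsn"])
    if request["respond"]:
        return {"type": "FLUSHED", "txn": request["txn"], "tail_lsn": tail}
    return None

req = make_flush_request("t1", 90)
reply = handle_flush_request(req, flush_log=lambda lsn: lsn)  # stand-in flush
print(reply["tail_lsn"])  # 90
```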
- if a failure occurs at a participant node before the commit request arrives, the participant node performing crash recovery will not see any commit request, even though the transaction has been committed. Under those circumstances, the participant node performing crash recovery will communicate with the coordinator node to determine the state of the transaction and commit or abort accordingly. The coordinator node will still remember the state of the transaction, since it will not reclaim the log space assigned to retain information about the state of the transaction until it has received a response back from each participant node that the respective participant node committed the transaction locally.
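The in-doubt resolution above can be sketched in a few lines; treating an unknown transaction as aborted is an assumption of this sketch, not a rule stated in the patent.

```python
# Sketch of crash recovery at a participant: an in-doubt transaction is
# resolved by asking the coordinator, which still holds the transaction
# state because its log space is only reclaimed after all participants ack.

def resolve_in_doubt(txn_id, coordinator_txn_states):
    """Return the action the recovering participant should take."""
    state = coordinator_txn_states.get(txn_id)
    return "commit" if state == "COMMITTED" else "abort"

print(resolve_in_doubt("t1", {"t1": "COMMITTED"}))  # commit
print(resolve_in_doubt("t2", {"t1": "COMMITTED"}))  # abort
```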
- certain transactions may include requests that do not modify, delete, or insert data, such as pre-fetching (i.e., read-ahead) requests. These requests are usually asynchronous, and they sometimes encounter lock timeouts or deadlocks. A deadlock can occur when a first transaction has been allocated resource A and is waiting for the allocation of resource B while, at the same time, a second transaction has been allocated resource B and is waiting for the allocation of resource A.
- when a deadlock occurs on a node, the node will usually try to roll back one of the transactions after a lock timeout. However, in a distributed transaction, if the node is a participant node, it must ask the coordinator node for permission to roll back. In some embodiments, the coordinator node will grant the rollback permission if it is not already processing a commit, as described above, and inform any other participant node to roll back. Otherwise, the coordinator node will prevent the participant node from performing a rollback.
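The rollback-permission rule can be sketched as follows; the phase names and transaction record layout are assumptions made for this illustration.

```python
# Sketch of the coordinator's rollback rule: grant a participant's rollback
# request only if commit processing has not begun, and then instruct every
# participant to roll back; otherwise deny the request.

def request_rollback(txn, requester):
    """Return the instruction for each node affected by the request."""
    if txn["phase"] == "COMMITTING":
        return {requester: "DENY"}            # too late: commit is in progress
    txn["phase"] = "ABORTING"
    return {node: "ROLLBACK" for node in txn["participants"]}

txn = {"phase": "ACTIVE", "participants": ["node_b", "node_c"]}
print(request_rollback(txn, "node_b"))
# {'node_b': 'ROLLBACK', 'node_c': 'ROLLBACK'}
```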
- FIGS. 3A-3B show a distributed system 300 according to an implementation of the invention.
- Distributed system 300 includes nodes 302 a and 302 b .
- Node 302 a is coupled to a database 304 a and node 302 b is coupled to databases 304 b and 304 c .
- Transaction logs 306 a and 306 b are maintained on nodes 302 a and 302 b , respectively.
- An exemplary transaction log 306 a can be seen in FIG. 3B .
- Exemplary transaction log 306 a includes a log sequence number (LSN) column 314 a , a transaction ID column 314 b , an operation column 314 c , and a plurality of rows (i.e., records) 316 .
- Transaction logs in other embodiments may be different, with more or fewer columns, different columns, different information, etc.
- Nodes 302 a and 302 b also include arrays 308 a and 308 c .
- arrays 308 a and 308 c are Tail LSN Arrays, each with only one entry ( 318 a and 318 c , respectively) since distributed system 300 is shown with only nodes 302 a and 302 b in FIG. 3A .
- entry 318 a is an approximation of the last transaction log record flushed by node 302 b
- entry 318 c is an approximation of the last transaction log record flushed by node 302 a.
- a Max Trans LSN Array 308 b is included in node 302 a as a client 310 has submitted a transaction to node 302 a , in which node 302 b is a participant node.
- Client 310 may be an application or process residing on node 302 a , node 302 b , or some other node (not shown) within or outside of distributed system 300 .
- Max Trans LSN Array 308 b includes an entry 318 b for the highest transaction log record required for the transaction at node 302 b .
- Messages 312 a - h are illustrated in FIG. 3A to show the messaging between client 310 , node 302 a , and node 302 b.
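The FIG. 3A structures can be made concrete with hypothetical numbers (the patent gives no LSN values; everything below is an invented example):

```python
# Hypothetical values for FIG. 3A: node 302a's Tail LSN Array holds one
# entry (318a) approximating node 302b's last flushed record, and its Max
# Trans LSN Array holds the highest LSN the client's transaction requires
# at 302b (entry 318b). Commit at 302a waits until 318a >= 318b.

tail_lsn_array_308a = {"node_302b": 57}       # entry 318a (assumed value)
max_trans_lsn_array_308b = {"node_302b": 55}  # entry 318b (assumed value)

ready = all(tail_lsn_array_308a[node] >= lsn
            for node, lsn in max_trans_lsn_array_308b.items())
print(ready)  # True: 302b has flushed at least up to LSN 55
```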
- depicted in FIG. 4 is a distributed system 400 in accordance with another aspect of the invention.
- Distributed system 400 includes nodes 402 a - d and databases 404 a - k .
- a client application 406 is running on node 402 a .
- Node 402 a also includes a log space 410 for storing log records and agents 408 a - b to coordinate transactions.
- Agents 408 a - b may be tasks or processes running on node 402 a .
- one agent is used to handle execution of a transaction and another agent may be used to handle commitment of the transaction.
- the invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements.
- the invention is implemented in software, which includes, but is not limited to, firmware, resident software, microcode, etc.
- the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
- Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk.
- Current examples of optical disks include DVD, compact disk-read-only memory (CD-ROM), and compact disk-read/write (CD-R/W).
- FIG. 5 illustrates a data processing system 500 suitable for storing and/or executing program code.
- Data processing system 500 includes a processor 502 coupled to memory elements 504 a - b through a system bus 506 .
- data processing system 500 may include more than one processor and each processor may be coupled directly or indirectly to one or more memory elements through a system bus.
- Memory elements 504 a - b can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times the code must be retrieved from bulk storage during execution.
- I/O devices 508 a - b are coupled to data processing system 500 .
- I/O devices 508 a - b may be coupled to data processing system 500 directly or indirectly through intervening I/O controllers (not shown).
- a network adapter 510 is coupled to data processing system 500 to enable data processing system 500 to become coupled to other data processing systems or remote printers or storage devices through communication link 512 .
- Communication link 512 can be a private or public network. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
- by utilizing the commitment protocol described above, messaging traffic in distributed systems may be greatly reduced. This reduction in messaging traffic results in quicker transaction commit times and may allow for the use of lower cost systems, such as a less powerful network, while maintaining comparable performance.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Multi Processors (AREA)
Abstract
A computer program product and system for committing transactions in a distributed system are provided. The method, computer program product, and system provide for receiving a request from a client to commit a transaction at a coordinator node in the distributed system, tracking a tail log sequence number for every other node in the distributed system, determining a max log sequence number associated with the transaction for each participant node in the distributed system, and committing the transaction at the coordinator node when the tail log sequence number for each participant node is greater than or equal to the max log sequence number associated with the transaction at the respective participant node.
Description
- Under 35 USC §120, this application is a continuation application of, and claims the benefit of priority to, U.S. patent application Ser. No. 11/312,994, entitled “Commitment of Transactions in a Distributed System”, which is incorporated herein by reference in its entirety.
- The present invention relates generally to distributed systems. More particularly, the present invention is directed to commitment of transactions in a distributed system.
- A distributed system is a multi-node system in which data is stored in various databases. Nodes can be any data processing system, such as a computer system. Although each database can only be accessed through one node, more than one database may be accessible through a node in the distributed system. The nodes in a distributed system can be connected to one another through a network, such as a local area network (LAN) or a wide area network (WAN). In addition, nodes in a distributed system may be in one location or spread out over multiple locations. Examples of distributed systems include database systems, mail server systems, etc.
- Since a transaction, which consists of a set of requests that results in a single logical action, can modify data on multiple databases in a distributed system, the distributed system must ensure that data consistency is maintained, regardless of whether or not failures (e.g., power outages, hardware crashes, etc.) occur. Hence, each requested operation in a transaction must be “committed,” i.e., changes to data become persistent, before the transaction can be committed. A data change becomes persistent when a log record of the data change is “flushed,” i.e., written, to non-volatile storage (e.g., disk drive). Log records allow a node to restore a database to its pre-failure state by replaying the operations that committed prior to failure.
- Traditionally, distributed systems have utilized a two-phase commit (2PC) protocol to preserve consistency of data. In a 2PC system, a coordinator node for each transaction, i.e., the node where a client (e.g., an application) submitted the transaction, identifies, for each request in the transaction, a node in the distributed system responsible for handling the request. Each node assigned to handle a request in the transaction is referred to as a participant node.
- Each participant node in a two-phase commit protocol votes whether to commit or abort the transaction and sends its vote to the coordinator node. The coordinator node then makes the final decision on whether to commit or abort the transaction based on the vote from each participant node. A transaction will only be committed by the coordinator node if all of the participant nodes vote to commit the transaction. Otherwise, the coordinator node will abort the transaction.
- The two-phase commit protocol, however, is not really message efficient because during phase one, the coordinator node sends a message to each participant node to prepare to commit the transaction. Each participant node then decides whether it can commit the requested operation(s) and sends a message back to the coordinator node with its vote on whether to commit or abort the transaction. In the second phase, the coordinator node decides whether to commit or abort the transaction based on all of the votes it received from the participant nodes and sends a message to each participant node to commit or abort the transaction.
- Another commit protocol employed by distributed systems is a two-interval commit (2IC), discussed in U.S. Pat. No. 5,799,305, entitled “Method of Commitment in a Distributed Database Transaction,” which is hereby incorporated in its entirety for all purposes. The 2IC system uses interval messages that are sent in succession from an interval coordinator to determine whether to commit or abort a transaction. Thus, although a 2IC system requires less messaging than a 2PC system, it is still more message-intensive than necessary.
- Accordingly, there is a need for a distributed transaction commitment protocol that is more message efficient than current commitment protocols. The present invention addresses such a need.
- A computer program product and system for committing transactions in a distributed system are provided. The computer program product and system provide for receiving a request from a client to commit a transaction at a coordinator node in the distributed system, the distributed system comprising one or more participant nodes, tracking a tail log sequence number for each of all other nodes in the distributed system, each tail log sequence number approximating a last transaction log record flushed by the respective node, wherein at least one of the all other nodes is a participant node, determining a max log sequence number associated with the transaction for each of the one or more participant nodes, each max log sequence number corresponding to a highest transaction log record required for the transaction at the respective participant node, and committing the transaction at the coordinator node when the tail log sequence number for each of the one or more participant nodes is greater than or equal to the max log sequence number associated with the transaction at the respective participant node.
- In one implementation, a computer program product on a computer-readable medium containing a plurality of executable program instructions for committing transactions in a distributed system, the instructions when executed perform to: receive a request from a client to commit a transaction at a coordinator node in the distributed system, the distributed system comprising one or more participant nodes; track a tail log sequence number for each of all other nodes in the distributed system, each tail log sequence number being associated with a last transaction log record flushed by a respective node, wherein at least one node of the all other nodes is a participant node; determine a max log sequence number associated with the transaction for each of one or more participant nodes, each max log sequence number corresponding to a highest transaction log record required for the transaction at the respective participant node; and commit the transaction at the coordinator node when the tail log sequence number for each of the one or more participant nodes is greater than or equal to the max log sequence number associated with the transaction at the respective participant node; is provided for.
-
FIG. 1 is a process flow of a method for committing transactions in a distributed system according to an aspect of the invention. -
FIGS. 2A-2B illustrate flowcharts of a method for committing transactions in a distributed system in accordance with one implementation of the invention. -
FIGS. 3A-3B depict a distributed system according to an embodiment of the invention -
FIG. 4 shows a distributed system in accordance with another aspect of the invention. -
FIG. 5 is a block diagram of a data processing system with which embodiments of the present invention can be implemented. - The present invention relates generally to distributed systems and more particularly to commitment of transactions in a distributed system. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the preferred implementations and the generic principles and features described herein will be readily apparent to those skilled in the art. Thus, the present invention is not intended to be limited to the implementations shown, but is to be accorded the widest scope consistent with the principles and features described herein.
-
FIG. 1 depicts aprocess 100 for committing transactions in a distributed system according to an aspect of the invention. At 102, a request to commit a transaction is received from a client at a coordinator node in the distributed system. A client can be an application or a process and may be located on the coordinator node or some other node. The coordinator node is usually where the transaction was initialized, i.e., where the client submitted a request to begin the transaction. The distributed system comprises all participant nodes participating in the transaction, and there may be one or more participant nodes in the distributed system. In some embodiments, the coordinator node may also be a participant node. - At 104, a tail log sequence number for every other node in the distributed system is tracked (at the coordinator node). It is clear that at least one other node is a participant node in the distributed system. Each node in the distributed system keeps a log of all of the requests which are fulfilled by that node. Log records are usually first created in volatile memory, i.e., memory susceptible to failures, such as random access memory (RAM). When a node is ready to commit a requested operation, it will “flush,” i.e., write, the log record associated with the requested operation to non-volatile memory, such as a hard disk. Flushing the log record permits the node to re-perform the requested operation if a failure occurs sometime thereafter. Each tail log sequence number approximates a last transaction log record flushed by the respective node.
- A max log sequence number associated with the transaction is then determined for each of the one or more participant nodes at 106. There are usually multiple requests within a single transaction. Since each participant node may be assigned to handle more than one of the requests and each request has a separate log record, each max log sequence number corresponds to a highest transaction log record required for the transaction at the respective participant node (i.e., the log sequence number of the transaction log record corresponding to the last requested operation executed by the respective participant node for the transaction). Because each node's log is unique to the node, a log sequence number at one node will usually correspond to a different transaction than the same log sequence number at another node.
- The transaction is committed at the coordinator node when the tail log sequence number for each of the one or more participant nodes is greater than or equal to the max log sequence number associated with the transaction at the respective participant node (108). This ensures that each participant node has committed its respective transaction request(s) before the transaction is committed at the coordinator node, which preserves data consistency.
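The commit test at 108 can be expressed as a small predicate over the two sets of tracked numbers. The following is a minimal sketch, not the patented implementation; names such as `can_commit`, `tail_lsn`, and `tx_max_lsn` are illustrative assumptions.

```python
# Sketch of the commit condition at 108: the coordinator may commit only
# when every participant's tracked tail LSN has reached the highest LSN
# the transaction required at that participant. Names are illustrative.

def can_commit(tail_lsn: dict, tx_max_lsn: dict) -> bool:
    """True when each participant's tail LSN >= the transaction's
    max LSN at that participant."""
    return all(tail_lsn.get(node, -1) >= max_lsn
               for node, max_lsn in tx_max_lsn.items())

# Participant B has flushed through LSN 41, but the transaction's last
# log record there is LSN 42, so the coordinator must not commit yet.
tail = {"A": 17, "B": 41}
needed = {"A": 15, "B": 42}
assert not can_commit(tail, needed)

# Once B's reported tail LSN advances to 42 or beyond, commit may proceed.
tail["B"] = 44
assert can_commit(tail, needed)
```

Note that the comparison is per node: because each node's log is unique, only LSNs from the same node are ever compared.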
- Illustrated in
FIGS. 2A-2B is a process 200 for committing transactions in a distributed system in accordance with an embodiment of the invention. A request to commit a transaction is received from a client at a coordinator node in the distributed system at 202. When the commit request is submitted, the client has typically already submitted many other requests, such as requests to modify, delete, or insert data, to the coordinator node, which then forwarded those requests to one or more participant nodes in the distributed system participating in the transaction. - A first array comprising an entry for each of the other nodes in the distributed system is maintained at the coordinator node (204). Each entry is operably configured to store a tail log sequence number for the respective other node. At 206, the first array is updated when a new tail log sequence number is piggybacked on a message from one of the other nodes. A message may include one or more responses, one or more requests, or a combination of the two. This results in a significant reduction in messaging traffic because the tail log sequence number is included along with a message one node was already going to send to another node, rather than being sent in a new, separate message. In addition, the cost of adding the tail log sequence number to an existing message is very low, usually only a few extra bytes.
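The piggybacking at 206 can be sketched as follows; the message representation and the names `attach_tail_lsn` and `on_message` are assumptions for illustration, not the specification's interfaces.

```python
# Sketch of steps 204-206: a sender attaches its tail LSN to a message
# it was already going to send, and the coordinator updates the first
# array when such a message arrives. Names are illustrative assumptions.

def attach_tail_lsn(message: dict, flushed_lsn: int) -> dict:
    # Only a few extra bytes on an existing message.
    message["tail_lsn"] = flushed_lsn
    return message

def on_message(first_array: dict, sender: str, message: dict) -> None:
    # Update the sender's entry in the first array; tail LSNs
    # only ever move forward.
    lsn = message.get("tail_lsn")
    if lsn is not None and lsn > first_array.get(sender, -1):
        first_array[sender] = lsn

first_array = {"node_b": 10}
msg = attach_tail_lsn({"kind": "response", "rows": 3}, 37)
on_message(first_array, "node_b", msg)
assert first_array["node_b"] == 37
```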
- In some implementations, the tail log sequence number sent by a node is not the actual last transaction log record flushed by the node, but rather some earlier transaction log record. This helps minimize contention for resources, such as memory, on the node because it allows more time for resources allocated to other transactions to be unlocked or unlatched.
- A second array for the transaction is created at 208. The second array comprises an entry for each of the one or more participant nodes, where each entry is operably configured to store the max log sequence number associated with the transaction for the respective participant node. At 210, the second array is updated when the max log sequence number associated with the transaction is piggybacked on a response from one of the one or more participant nodes. By piggybacking the max log sequence numbers on responses the one or more participant nodes were already going to send to the coordinator node, messaging traffic is further reduced.
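The second-array update at 210 can be sketched in the same style; the field name `tx_max_lsn` and the function name are illustrative assumptions.

```python
# Sketch of steps 208-210: the coordinator keeps, per transaction, one
# entry per participant holding the highest LSN the transaction has
# required there, filled in from piggybacked responses. Names assumed.

def on_participant_response(second_array: dict, participant: str,
                            response: dict) -> None:
    # Record the highest LSN this transaction has required at the
    # participant; a lower, stale value never regresses the entry.
    lsn = response.get("tx_max_lsn")
    if lsn is not None:
        second_array[participant] = max(second_array.get(participant, -1), lsn)

second = {}  # created per transaction at 208
on_participant_response(second, "node_b", {"status": "ok", "tx_max_lsn": 42})
on_participant_response(second, "node_b", {"status": "ok", "tx_max_lsn": 40})
assert second == {"node_b": 42}
```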
- The second array is then compared to the first array to determine whether each of the one or more participant nodes has flushed the highest transaction log record required for the transaction at the respective participant node (212). When each of the one or more participant nodes has flushed the highest transaction log record required for the transaction at the respective participant node, a commit log record for the transaction is written at the coordinator node (214), a commit request is scheduled to be sent to each of the one or more participant nodes along with another message already scheduled to be sent to the respective participant node (216), and a successful commit message is returned to the client (218).
- Messaging traffic is again reduced by sending commit requests to each participant node via another message already scheduled to be sent to the respective participant node. Once a participant node receives the commit request, it will perform local commit processing, such as unlocking resources reserved for the transaction and writing a commit log record for the transaction. Once a participant node completes local commit processing, it will send a response to the coordinator node that it has committed the transaction locally. The response may be piggybacked on an unrelated message being sent from the participant node to the coordinator node to further reduce messaging traffic. Once the coordinator node receives a response back from each participant node, it will reclaim a log space at the coordinator node assigned to retain information about the state of the transaction (220). The information may include the identity of each participant node, etc.
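The reclamation at 220 amounts to discarding the transaction's retained state once every participant has acknowledged its local commit. A minimal sketch, with assumed names (`TxState`, `on_local_commit_ack`):

```python
# Sketch of step 220: the coordinator retains per-transaction state
# (including the set of participants) until every participant has
# acknowledged committing locally, then reclaims that log space.
# All names here are illustrative assumptions.

class TxState:
    def __init__(self, participants):
        self.pending = set(participants)  # participants that have not acked

def on_local_commit_ack(tx_table: dict, tx_id: str, participant: str) -> None:
    state = tx_table.get(tx_id)
    if state is None:
        return  # state already reclaimed; a duplicate ack is harmless
    state.pending.discard(participant)
    if not state.pending:
        del tx_table[tx_id]  # reclaim the log space held for this transaction

table = {"t1": TxState(["A", "B"])}
on_local_commit_ack(table, "t1", "A")
assert "t1" in table          # B has not acked yet; state is retained
on_local_commit_ack(table, "t1", "B")
assert "t1" not in table      # all acks received; space reclaimed
```

Retaining the state until the last acknowledgment is what lets a crashed participant later ask the coordinator whether the transaction committed.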
- When the tail log sequence number for at least one of the one or more participant nodes is less than the max log sequence number associated with the transaction at the at least one participant node, a determination is made as to whether another check has already been made (222). If no other check has been made, the coordinator node may wait with a timeout before checking again. On the other hand, if another check has already been made, a message is sent to the at least one participant node to flush one or more transaction log records at the at least one participant node (224).
- In some embodiments, the coordinator node will simply wait to check again after sending the message to flush one or more transaction log records. In other embodiments, the message may include the log sequence number to be flushed and a response request. The coordinator node will then wait for the response before proceeding. The need to actively send a request to a participant node to flush one or more transaction log records should be a rare occurrence, as a distributed system typically handles so many transactions that there are plenty of messages being sent between nodes to allow the tail log sequence numbers to be updated frequently.
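The check-wait-check-then-flush logic of 222/224 can be sketched as below. The helper names (`ensure_flushed`, `wait`, `request_flush`) and the two-check limit are illustrative assumptions drawn from the description above.

```python
# Sketch of steps 222-224: re-check the commit condition after a timed
# wait, and only if a participant still lags send it an explicit flush
# request naming the LSN to flush. Names are illustrative assumptions.

def ensure_flushed(tail: dict, needed: dict, wait, request_flush,
                   max_checks: int = 2) -> bool:
    """Return True if all participants are caught up; otherwise ask
    the lagging participants to flush and return False."""
    lagging = []
    for attempt in range(max_checks):
        lagging = [n for n, lsn in needed.items() if tail.get(n, -1) < lsn]
        if not lagging:
            return True
        if attempt + 1 < max_checks:
            wait()  # wait with a timeout before checking again (222)
    for node in lagging:
        request_flush(node, needed[node])  # rare explicit flush (224)
    return False

waits, flushes = [], []
ok = ensure_flushed({"B": 41}, {"B": 42},
                    wait=lambda: waits.append(1),
                    request_flush=lambda n, lsn: flushes.append((n, lsn)))
assert not ok and waits == [1] and flushes == [("B", 42)]
```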
- It is possible that a participant node performing crash recovery will not see any commit request, even though the transaction has been committed. Under those circumstances, the participant node performing crash recovery will communicate to the coordinator node to determine the state of the transaction and commit or abort, accordingly. The coordinator node will still remember the state of the transaction since it will not reclaim the log space assigned to retain information about the state of the transaction until it has received a response back from each participant node that the respective participant node committed the transaction locally.
- Certain transactions may include requests that do not modify, delete, or insert data, such as pre-fetching (i.e., read-ahead) requests. These requests are usually asynchronous and sometimes they encounter lock timeouts or deadlocks. A deadlock can occur when a first transaction has been allocated resource A and is waiting for the allocation of resource B, but at the same time, a second transaction has been allocated resource B and is waiting for the allocation of resource A.
- When a deadlock occurs on a node, the node will usually try to roll back one of the transactions after a lock timeout. However, in a distributed transaction, if the node is a participant node, it must ask the coordinator node for permission to roll back. In some embodiments, the coordinator node will grant the rollback permission if it is not already processing the commit, as described above, and will inform any other participant nodes to roll back. Otherwise, the coordinator node will prevent the participant node from performing a rollback.
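The coordinator-side arbitration just described can be sketched as follows; `request_rollback` and its parameters are assumed names for illustration only.

```python
# Sketch of the rollback-permission rule: the coordinator denies a
# participant's rollback request once commit processing has begun,
# and otherwise grants it and tells the other participants to roll
# back too. All names are illustrative assumptions.

def request_rollback(committing_txs: set, tx_id: str,
                     notify_rollback, participants: list) -> bool:
    if tx_id in committing_txs:
        return False  # commit already under way: rollback is denied
    for node in participants:
        notify_rollback(node, tx_id)  # inform the other participants
    return True

calls = []
# Transaction t1 is not committing, so rollback is granted and fanned out.
assert request_rollback({"t9"}, "t1",
                        lambda n, t: calls.append(n), ["node_b"])
assert calls == ["node_b"]
# Transaction t9 is in commit processing, so rollback is denied.
assert not request_rollback({"t9"}, "t9",
                            lambda n, t: calls.append(n), ["node_b"])
assert calls == ["node_b"]
```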
-
FIGS. 3A-3B show a distributed system 300 according to an implementation of the invention. Distributed system 300 includes nodes 302a-b. Node 302a is coupled to a database 304a and node 302b is coupled to databases 304b-c. Transaction logs 306a-b of nodes 302a-b are shown in FIG. 3B. Exemplary transaction log 306a includes a log sequence number (LSN) column 314a, a transaction ID column 314b, an operation column 314c, and a plurality of rows (i.e., records) 316. Transaction logs in other embodiments may be different, with more or fewer columns, different columns, different information, etc. -
Nodes 302a-b each maintain a tail LSN array. As shown in FIG. 3B, the arrays each include an entry for every other node, even though system 300 is shown with only nodes 302a-b in FIG. 3A. In the embodiment, entry 318a is an approximation of the last transaction log record flushed by node 302b and entry 318c is an approximation of the last transaction log record flushed by node 302a. - A Max Trans LSN Array 308b is included in node 302a, as a client 310 has submitted a transaction to node 302a in which node 302b is a participant node. Client 310 may be an application or process residing on node 302a, node 302b, or some other node (not shown) within or outside of distributed system 300. Max Trans LSN Array 308b includes an entry 318b for the highest transaction log record required for the transaction at node 302b. Messages 312a-h are illustrated in FIG. 3A to show the messaging between client 310, node 302a, and node 302b. - Depicted in
FIG. 4 is a distributed system 400 in accordance with another aspect of the invention. Distributed system 400 includes nodes 402a-d and databases 404a-k. A client application 406 is running on node 402a. Node 402a also includes a log space 410 for storing log records and agents 408a-b to coordinate transactions. Agents 408a-b may be tasks or processes running on node 402a. In some embodiments of the invention, one agent is used to handle execution of a transaction and another agent may be used to handle commitment of the transaction. - The invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In one aspect, the invention is implemented in software, which includes, but is not limited to, firmware, resident software, microcode, etc.
- Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include DVD, compact disk-read-only memory (CD-ROM), and compact disk-read/write (CD-R/W).
-
FIG. 5 illustrates a data processing system 500 suitable for storing and/or executing program code. Data processing system 500 includes a processor 502 coupled to memory elements 504a-b through a system bus 506. In other embodiments, data processing system 500 may include more than one processor, and each processor may be coupled directly or indirectly to one or more memory elements through a system bus. - Memory elements 504a-b can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times the code must be retrieved from bulk storage during execution. As shown, input/output or I/O devices 508a-b (including, but not limited to, keyboards, displays, pointing devices, etc.) are coupled to data processing system 500. I/O devices 508a-b may be coupled to data processing system 500 directly or indirectly through intervening I/O controllers (not shown). - In the embodiment, a network adapter 510 is coupled to data processing system 500 to enable data processing system 500 to become coupled to other data processing systems or remote printers or storage devices through communication link 512. Communication link 512 can be a private or public network. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters. - By piggybacking communications between nodes regarding tail log sequence numbers, max log sequence numbers, commit requests, and/or local commit confirmations on other existing messages, messaging traffic in distributed systems may be greatly reduced. This reduction in messaging traffic results in quicker transaction commit times and may allow for the use of lower cost systems, such as a less powerful network, while maintaining comparable performance.
- Various implementations for committing transactions in a distributed system have been described. Nevertheless, one of ordinary skill in the art will readily recognize that various modifications may be made to the implementations, and any variations would be within the spirit and scope of the present invention. For example, the above-described process flows are described with reference to a particular ordering of process actions. However, the ordering of many of the described process actions may be changed without affecting the scope or operation of the invention. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the following claims.
Claims (15)
1. A computer program product on a computer-readable medium containing a plurality of executable program instructions for committing transactions in a distributed system, the instructions when executed perform to:
receive a request from a client to commit a transaction at a coordinator node in the distributed system, the distributed system comprising one or more participant nodes;
track a tail log sequence number for each of all other nodes in the distributed system, each tail log sequence number being associated with a last transaction log record flushed by a respective node, wherein at least one node of the all other nodes is a participant node;
determine a max log sequence number associated with the transaction for each of one or more participant nodes, each max log sequence number corresponding to a highest transaction log record required for the transaction at the respective participant node; and
commit the transaction at the coordinator node when the tail log sequence number for each of the one or more participant nodes is greater than or equal to the max log sequence number associated with the transaction at the respective participant node.
2. The product of claim 1 , wherein
track the tail log sequence number for each of the all other nodes in the distributed system comprises:
maintain a first array comprising an entry for each of the all other nodes in the distributed system, each entry being operably configured to store the tail log sequence number for a respective other node, and
update the first array when a new tail log sequence number is piggybacked on a message from one of the all other nodes;
determine the max log sequence number associated with the transaction for each of the one or more participant nodes comprises:
create a second array for the transaction, the second array comprising an entry for each of the one or more participant nodes, wherein each entry is operable to store the max log sequence number associated with the transaction for the respective participant node, and
update the second array when the max log sequence number associated with the transaction is piggybacked on a response from one node of the one or more participant nodes; and
commit the transaction at the coordinator node comprises:
compare the second array to the first array to determine whether each of the one or more participant nodes has flushed the highest transaction log record required for the transaction at the respective participant node,
write a commit log record for the transaction at the coordinator node when each of the one or more participant nodes has flushed the highest transaction log record required for the transaction at the respective participant node,
schedule a commit request to be sent to each of the one or more participant nodes along with another message already scheduled to be sent to the respective participant node when each of the one or more participant nodes has flushed the highest transaction log record required for the transaction at the respective participant node, and
return a successful commit message to the client when each of the one or more participant nodes has flushed the highest transaction log record required for the transaction at the respective participant node.
3. The product of claim 2 , wherein the new tail log sequence number is not the most current transaction log record flushed by the node and the message comprises one or more responses, one or more requests, or at least one response and at least one request.
4. The product of claim 1, wherein the instructions when executed on the computer further cause the computer to:
reclaim a log space at the coordinator node assigned to retain information about the state of the transaction when a response is received from each of the one or more participant nodes specifying that the transaction has been committed at the respective participant node, wherein each response is piggybacked on an unrelated message being sent from the respective participant node to the coordinator node.
5. The product of claim 1 , wherein a first agent on the coordinator node handles execution of the transaction and a second agent on the coordinator node handles commitment of the transaction.
6. The product of claim 1, wherein the instructions when executed on the computer further cause the computer to:
send a message to at least one participant node to flush one or more transaction log records at the at least one participant node when the tail log sequence number for the at least one participant node is less than the max log sequence number associated with the transaction at the at least one participant node.
7. A distributed data processing system having a plurality of system nodes each of the plurality of system nodes being a respective data processing system where at least one of the plurality of system nodes is a client system at a coordinator node and each data processing system includes a processor coupled to one or more memory elements through a respective system bus of the data processing system and at least one input and at least one output device, the system comprising:
a plurality of databases; and
a plurality of nodes connected to one another, each of the plurality nodes being coupled to one or more of the plurality of databases, wherein at least one of the plurality of nodes is operable to:
receive a request from a client to commit a transaction at the at least one node, the at least one node being a coordinator node for the transaction, wherein one or more of the plurality of nodes are participant nodes;
track a tail log sequence number for each of all other nodes in the distributed system, each tail log sequence number being associated with a last transaction log record flushed by a respective node, wherein at least one node of the all other nodes is a participant node;
determine a max log sequence number associated with the transaction for each of one or more participant nodes, each max log sequence number corresponding to a highest transaction log record required for the transaction at the respective participant node; and
commit the transaction at the coordinator node when the tail log sequence number for each of the one or more participant nodes is greater than or equal to the max log sequence number associated with the transaction at the respective participant node.
8. The system of claim 7 , wherein
track the tail log sequence number for each of the all other nodes in the distributed system comprises:
maintain a first array comprising an entry for each of the all other nodes in the distributed system, each entry being operably configured to store the tail log sequence number for the respective other node, and
update the first array when a new tail log sequence number is piggybacked on a message from one of the all other nodes;
determine the max log sequence number associated with the transaction for each node of the one or more participant nodes comprises:
create a second array for the transaction, the second array comprising an entry for each node of the one or more participant nodes, wherein each entry is operable to store the max log sequence number associated with the transaction for the respective participant node, and
update the second array when the max log sequence number associated with the transaction is piggybacked on a response from one of the one or more participant nodes; and
commit the transaction at the coordinator node comprises:
compare the second array to the first array to determine whether each of the one or more participant nodes has flushed the highest transaction log record required for the transaction at the respective participant node,
write a commit log record for the transaction at the coordinator node when each of the one or more participant nodes has flushed the highest transaction log record required for the transaction at the respective participant node,
schedule a commit request to be sent to each of the one or more participant nodes along with another message already scheduled to be sent to the respective participant node when each of the one or more participant nodes has flushed the highest transaction log record required for the transaction at the respective participant node, and
return a successful commit message to the client when each of the one or more participant nodes has flushed the highest transaction log record required for the transaction at the respective participant node.
9. The system of claim 8 , wherein the new tail log sequence number is not the most current transaction log record flushed by the node and the message comprises one or more responses, one or more requests, or at least one response and at least one request.
10. The system of claim 8 , wherein the coordinator node is further operably configured to:
reclaim a log space at the coordinator node assigned to retain information about the state of the transaction when a response is received from each node of the one or more participant nodes specifying that the transaction has been committed at the respective participant node, wherein each response is piggybacked on an unrelated message being sent from the respective participant node to the coordinator node.
11. The distributed system of claim 7 , wherein a first agent on the coordinator node handles execution of the transaction and a second agent on the coordinator node handles commitment of the transaction and the coordinator node is further operable to:
send a message to at least one participant node to flush one or more transaction log records at the at least one participant node when the tail log sequence number for the at least one participant node is less than the max log sequence number associated with the transaction at the at least one participant node.
12. The product of claim 1 , wherein, for each tail log sequence number, the last transaction record flushed by the respective node and being associated therewith is not an actual last transaction log record flushed by the respective node.
13. The product of claim 12 , wherein the last transaction record flushed is an earlier transaction log record.
14. The system of claim 7 , wherein, for each tail log sequence number, the last transaction record flushed by the respective node and being associated therewith is not an actual last transaction log record flushed by the respective node.
15. The system of claim 14 , wherein the last transaction record flushed is an earlier transaction log record.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/131,006 US20080235245A1 (en) | 2005-12-19 | 2008-05-30 | Commitment of transactions in a distributed system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/312,994 US7725446B2 (en) | 2005-12-19 | 2005-12-19 | Commitment of transactions in a distributed system |
US12/131,006 US20080235245A1 (en) | 2005-12-19 | 2008-05-30 | Commitment of transactions in a distributed system |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/312,994 Continuation US7725446B2 (en) | 2005-12-19 | 2005-12-19 | Commitment of transactions in a distributed system |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080235245A1 true US20080235245A1 (en) | 2008-09-25 |
Family
ID=38024130
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/312,994 Active 2027-01-19 US7725446B2 (en) | 2005-12-19 | 2005-12-19 | Commitment of transactions in a distributed system |
US12/131,006 Abandoned US20080235245A1 (en) | 2005-12-19 | 2008-05-30 | Commitment of transactions in a distributed system |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/312,994 Active 2027-01-19 US7725446B2 (en) | 2005-12-19 | 2005-12-19 | Commitment of transactions in a distributed system |
Country Status (7)
Country | Link |
---|---|
US (2) | US7725446B2 (en) |
EP (1) | EP1963972A2 (en) |
JP (1) | JP2009520253A (en) |
CN (1) | CN101341466A (en) |
CA (1) | CA2633309A1 (en) |
TW (1) | TW200805162A (en) |
WO (1) | WO2007071592A2 (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090063486A1 (en) * | 2007-08-29 | 2009-03-05 | Dhairesh Oza | Data replication using a shared resource |
US20110178984A1 (en) * | 2010-01-18 | 2011-07-21 | Microsoft Corporation | Replication protocol for database systems |
US20110191299A1 (en) * | 2010-02-01 | 2011-08-04 | Microsoft Corporation | Logical data backup and rollback using incremental capture in a distributed database |
US20140040898A1 (en) * | 2012-07-31 | 2014-02-06 | Alan H. Karp | Distributed transaction processing |
US20140337303A1 (en) * | 2013-05-07 | 2014-11-13 | Red Hat, Inc | Bandwidth optimized two-phase commit protocol for distributed transactions |
US20150254273A1 (en) * | 2009-12-18 | 2015-09-10 | Microsoft Technology Licensing, Llc | Distributed transaction management |
US20150261563A1 (en) * | 2014-03-17 | 2015-09-17 | International Business Machines Corporation | Passive two-phase commit system for high-performance distributed transaction execution |
CN109002462A (en) * | 2018-06-04 | 2018-12-14 | 北京明朝万达科技股份有限公司 | A kind of method and system for realizing distributed things |
US10379977B2 (en) | 2014-05-30 | 2019-08-13 | Huawei Technologies Co., Ltd. | Data management method, node, and system for database cluster |
JP2020500458A (en) * | 2018-11-27 | 2020-01-09 | アリババ・グループ・ホールディング・リミテッドAlibaba Group Holding Limited | Information protection system and method |
US10700850B2 (en) | 2018-11-27 | 2020-06-30 | Alibaba Group Holding Limited | System and method for information protection |
US10715500B2 (en) | 2018-11-27 | 2020-07-14 | Alibaba Group Holding Limited | System and method for information protection |
US10726657B2 (en) | 2018-11-27 | 2020-07-28 | Alibaba Group Holding Limited | System and method for information protection |
US11080694B2 (en) | 2018-11-27 | 2021-08-03 | Advanced New Technologies Co., Ltd. | System and method for information protection |
US11102184B2 (en) | 2018-11-27 | 2021-08-24 | Advanced New Technologies Co., Ltd. | System and method for information protection |
US11144918B2 (en) | 2018-08-06 | 2021-10-12 | Advanced New Technologies Co., Ltd. | Method, apparatus and electronic device for blockchain transactions |
WO2022001629A1 (en) * | 2020-06-29 | 2022-01-06 | 华为技术有限公司 | Database system, and method and apparatus for managing transactions |
Families Citing this family (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7765361B2 (en) * | 2006-11-21 | 2010-07-27 | Microsoft Corporation | Enforced transaction system recoverability on media without write-through |
CN101706811B (en) * | 2009-11-24 | 2012-01-25 | 中国科学院软件研究所 | Transaction commit method of distributed database system |
US10198463B2 (en) | 2010-04-16 | 2019-02-05 | Salesforce.Com, Inc. | Methods and systems for appending data to large data volumes in a multi-tenant store |
JP2012018449A (en) * | 2010-07-06 | 2012-01-26 | Fujitsu Ltd | Snapshot acquisition processing program, snapshot acquisition processing method, snapshot participant computer, and snap shot coordinator computer |
CN102571850B (en) * | 2010-12-24 | 2014-08-06 | 中国移动通信集团山东有限公司 | Transaction committing system, method and equipment |
US8442962B2 (en) * | 2010-12-28 | 2013-05-14 | Sap Ag | Distributed transaction management using two-phase commit optimization |
US9047331B2 (en) * | 2011-04-21 | 2015-06-02 | International Business Machines Corporation | Scalable row-store with consensus-based replication |
US9760584B2 (en) | 2012-03-16 | 2017-09-12 | Oracle International Corporation | Systems and methods for supporting inline delegation of middle-tier transaction logs to database |
US9146944B2 (en) * | 2012-03-16 | 2015-09-29 | Oracle International Corporation | Systems and methods for supporting transaction recovery based on a strict ordering of two-phase commit calls |
US9665392B2 (en) | 2012-03-16 | 2017-05-30 | Oracle International Corporation | System and method for supporting intra-node communication based on a shared memory queue |
CN103257987A (en) * | 2012-12-30 | 2013-08-21 | 北京讯鸟软件有限公司 | Rule-based distributed log service implementation method |
US9195542B2 (en) * | 2013-04-29 | 2015-11-24 | Amazon Technologies, Inc. | Selectively persisting application program data from system memory to non-volatile data storage |
CN103294479A (en) * | 2013-06-19 | 2013-09-11 | 成都市欧冠信息技术有限责任公司 | Distribution type transaction processing method and system |
CN106997305B (en) * | 2013-10-29 | 2020-09-29 | 华为技术有限公司 | Transaction processing method and device |
CN105900059B (en) | 2014-01-21 | 2019-06-07 | 甲骨文国际公司 | System and method for supporting multi-tenant in application server, cloud or other environment |
US10348822B2 (en) * | 2014-01-21 | 2019-07-09 | Oracle International Corporation | System and method for clustering in a multitenant application server environment |
US10203981B2 (en) * | 2014-02-28 | 2019-02-12 | Red Hat, Inc. | Systems and methods for prepare list communication to participants in two-phase commit protocol transaction processing |
TWI556612B (en) * | 2014-04-29 | 2016-11-01 | 鼎捷軟件股份有限公司 | Timeout controlling unit for remote procedure call and method for remote procedure call |
US9613078B2 (en) | 2014-06-26 | 2017-04-04 | Amazon Technologies, Inc. | Multi-database log with multi-item transaction support |
JP6748638B2 (en) | 2014-09-24 | 2020-09-02 | オラクル・インターナショナル・コーポレイション | System and method for supporting patching in a multi-tenant application server environment |
US10318280B2 (en) | 2014-09-24 | 2019-06-11 | Oracle International Corporation | System and method for supporting patching in a multitenant application server environment |
US10430402B2 (en) * | 2015-01-16 | 2019-10-01 | Red Hat, Inc. | Distributed transaction with dynamic form |
US10250512B2 (en) | 2015-01-21 | 2019-04-02 | Oracle International Corporation | System and method for traffic director support in a multitenant application server environment |
WO2017013636A1 (en) * | 2015-07-23 | 2017-01-26 | Telefonaktiebolaget Lm Ericsson (Publ) | Leaderless consistency protocol |
CN107797850B (en) * | 2016-08-30 | 2021-09-21 | 阿里巴巴集团控股有限公司 | Method, device and system for distributed transaction processing |
US10552409B2 (en) | 2016-11-30 | 2020-02-04 | International Business Machines Corporation | Dynamically optimizing flows in a distributed transaction processing environment |
US11120006B2 (en) * | 2018-06-21 | 2021-09-14 | Amazon Technologies, Inc. | Ordering transaction requests in a distributed database according to an independently assigned sequence |
CN109783204A (en) * | 2018-12-28 | 2019-05-21 | 咪咕文化科技有限公司 | Distributed transaction processing method, device and storage medium |
US11681657B2 (en) * | 2019-07-31 | 2023-06-20 | EMC IP Holding Company, LLC | System and method for parallel flushing with bucketized data |
CN111124635B (en) * | 2019-12-06 | 2024-07-12 | 北京达佳互联信息技术有限公司 | Task processing method, device, electronic equipment and storage medium |
US11392614B2 (en) * | 2020-01-15 | 2022-07-19 | EMC IP Holding Company LLC | Techniques for performing offload copy operations |
CN111858171B (en) * | 2020-07-10 | 2024-03-12 | 上海达梦数据库有限公司 | Data backup method, device, equipment and storage medium |
CN113377502B (en) * | 2021-06-10 | 2024-06-14 | 上海达梦数据库有限公司 | Transaction processing method, device, server, database management system and medium |
US11768741B2 (en) * | 2021-07-30 | 2023-09-26 | International Business Machines Corporation | Replicating changes written by a transactional virtual storage access method |
JP7377305B2 (en) * | 2022-03-30 | 2023-11-09 | 株式会社日立製作所 | Distributed transaction control system and distributed transaction control method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5335343A (en) * | 1992-07-06 | 1994-08-02 | Digital Equipment Corporation | Distributed transaction processing using two-phase commit protocol with presumed-commit without log force |
US5799305A (en) | 1995-11-02 | 1998-08-25 | Informix Software, Inc. | Method of commitment in a distributed database transaction |
DE60111072T2 (en) * | 2000-10-26 | 2006-01-26 | Prismedia Networks, Inc., San Jose | METHOD AND APPARATUS FOR REAL-TIME PARALLEL DELIVERY OF FILE SEGMENTS |
- 2005
  - 2005-12-19 US US11/312,994 patent/US7725446B2/en active Active
- 2006
  - 2006-12-04 TW TW095144893A patent/TW200805162A/en unknown
  - 2006-12-12 EP EP06841332A patent/EP1963972A2/en not_active Withdrawn
  - 2006-12-12 CA CA002633309A patent/CA2633309A1/en not_active Abandoned
  - 2006-12-12 WO PCT/EP2006/069587 patent/WO2007071592A2/en active Application Filing
  - 2006-12-12 JP JP2008544985A patent/JP2009520253A/en active Pending
  - 2006-12-12 CN CNA2006800478642A patent/CN101341466A/en active Pending
- 2008
  - 2008-05-30 US US12/131,006 patent/US20080235245A1/en not_active Abandoned
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6510421B1 (en) * | 1998-12-29 | 2003-01-21 | Oracle Corporation | Performing 2-phase commit with presumed prepare |
US6684223B1 (en) * | 1998-12-29 | 2004-01-27 | Oracle International Corporation | Performing 2-phase commit with presumed prepare |
US6564215B1 (en) * | 1999-12-16 | 2003-05-13 | International Business Machines Corporation | Update support in database content management |
US7100076B2 (en) * | 2003-05-09 | 2006-08-29 | Hewlett-Packard Development Company, L.P. | Minimum latency reinstatement of database transaction locks |
US20040148289A1 (en) * | 2003-08-01 | 2004-07-29 | Oracle International Corporation | One-phase commit in a shared-nothing database system |
US6845384B2 (en) * | 2003-08-01 | 2005-01-18 | Oracle International Corporation | One-phase commit in a shared-nothing database system |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8527454B2 (en) * | 2007-08-29 | 2013-09-03 | Emc Corporation | Data replication using a shared resource |
US20090063486A1 (en) * | 2007-08-29 | 2009-03-05 | Dhairesh Oza | Data replication using a shared resource |
US10114837B2 (en) * | 2009-12-18 | 2018-10-30 | Microsoft Technology Licensing, Llc | Distributed transaction management |
US20150254273A1 (en) * | 2009-12-18 | 2015-09-10 | Microsoft Technology Licensing, Llc | Distributed transaction management |
US20110178984A1 (en) * | 2010-01-18 | 2011-07-21 | Microsoft Corporation | Replication protocol for database systems |
US20110191299A1 (en) * | 2010-02-01 | 2011-08-04 | Microsoft Corporation | Logical data backup and rollback using incremental capture in a distributed database |
US8825601B2 (en) | 2010-02-01 | 2014-09-02 | Microsoft Corporation | Logical data backup and rollback using incremental capture in a distributed database |
US20140040898A1 (en) * | 2012-07-31 | 2014-02-06 | Alan H. Karp | Distributed transaction processing |
US9465648B2 (en) * | 2012-07-31 | 2016-10-11 | Hewlett Packard Enterprise Development Lp | Distributed transaction processing through commit messages sent to a downstream neighbor |
US20140337303A1 (en) * | 2013-05-07 | 2014-11-13 | Red Hat, Inc | Bandwidth optimized two-phase commit protocol for distributed transactions |
US9201919B2 (en) * | 2013-05-07 | 2015-12-01 | Red Hat, Inc. | Bandwidth optimized two-phase commit protocol for distributed transactions |
US20150261563A1 (en) * | 2014-03-17 | 2015-09-17 | International Business Machines Corporation | Passive two-phase commit system for high-performance distributed transaction execution |
US10296371B2 (en) * | 2014-03-17 | 2019-05-21 | International Business Machines Corporation | Passive two-phase commit system for high-performance distributed transaction execution |
US10379977B2 (en) | 2014-05-30 | 2019-08-13 | Huawei Technologies Co., Ltd. | Data management method, node, and system for database cluster |
US10860447B2 (en) | 2014-05-30 | 2020-12-08 | Huawei Technologies Co., Ltd. | Database cluster architecture based on dual port solid state disk |
CN109002462A (en) * | 2018-06-04 | 2018-12-14 | 北京明朝万达科技股份有限公司 | A kind of method and system for realizing distributed things |
US11295303B2 (en) | 2018-08-06 | 2022-04-05 | Advanced New Technologies Co., Ltd. | Method, apparatus and electronic device for blockchain transactions |
US11144918B2 (en) | 2018-08-06 | 2021-10-12 | Advanced New Technologies Co., Ltd. | Method, apparatus and electronic device for blockchain transactions |
US10700850B2 (en) | 2018-11-27 | 2020-06-30 | Alibaba Group Holding Limited | System and method for information protection |
US11102184B2 (en) | 2018-11-27 | 2021-08-24 | Advanced New Technologies Co., Ltd. | System and method for information protection |
US10726657B2 (en) | 2018-11-27 | 2020-07-28 | Alibaba Group Holding Limited | System and method for information protection |
US10885735B2 (en) | 2018-11-27 | 2021-01-05 | Advanced New Technologies Co., Ltd. | System and method for information protection |
US10892888B2 (en) | 2018-11-27 | 2021-01-12 | Advanced New Technologies Co., Ltd. | System and method for information protection |
US10938549B2 (en) | 2018-11-27 | 2021-03-02 | Advanced New Technologies Co., Ltd. | System and method for information protection |
US11080694B2 (en) | 2018-11-27 | 2021-08-03 | Advanced New Technologies Co., Ltd. | System and method for information protection |
US10748370B2 (en) | 2018-11-27 | 2020-08-18 | Alibaba Group Holding Limited | System and method for information protection |
US11127002B2 (en) | 2018-11-27 | 2021-09-21 | Advanced New Technologies Co., Ltd. | System and method for information protection |
US10715500B2 (en) | 2018-11-27 | 2020-07-14 | Alibaba Group Holding Limited | System and method for information protection |
US11218455B2 (en) | 2018-11-27 | 2022-01-04 | Advanced New Technologies Co., Ltd. | System and method for information protection |
JP2020500458A (en) * | 2018-11-27 | 2020-01-09 | アリババ・グループ・ホールディング・リミテッドAlibaba Group Holding Limited | Information protection system and method |
US11277389B2 (en) | 2018-11-27 | 2022-03-15 | Advanced New Technologies Co., Ltd. | System and method for information protection |
US11282325B2 (en) | 2018-11-27 | 2022-03-22 | Advanced New Technologies Co., Ltd. | System and method for information protection |
WO2022001629A1 (en) * | 2020-06-29 | 2022-01-06 | 华为技术有限公司 | Database system, and method and apparatus for managing transactions |
Also Published As
Publication number | Publication date |
---|---|
CN101341466A (en) | 2009-01-07 |
WO2007071592A3 (en) | 2007-08-09 |
JP2009520253A (en) | 2009-05-21 |
EP1963972A2 (en) | 2008-09-03 |
WO2007071592A2 (en) | 2007-06-28 |
US7725446B2 (en) | 2010-05-25 |
WO2007071592B1 (en) | 2007-10-04 |
CA2633309A1 (en) | 2007-06-28 |
TW200805162A (en) | 2008-01-16 |
US20070143299A1 (en) | 2007-06-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7725446B2 (en) | Commitment of transactions in a distributed system | |
US7814065B2 (en) | Affinity-based recovery/failover in a cluster environment | |
US6510421B1 (en) | Performing 2-phase commit with presumed prepare | |
JP4293790B2 (en) | Disk writing in a distributed shared disk system. | |
US7392335B2 (en) | Anticipatory changes to resources managed by locks | |
US20050283658A1 (en) | Method, apparatus and program storage device for providing failover for high availability in an N-way shared-nothing cluster system | |
US8103643B2 (en) | System and method for performing distributed transactions using global epochs | |
US20100306256A1 (en) | Distributed Database Write Caching With Limited Durability | |
US9189303B2 (en) | Shadow queues for recovery of messages | |
CN111259083A (en) | Distributed transaction processing method and device | |
US6862595B1 (en) | Method and apparatus for implementing a shared message queue using a list structure | |
CN102831156A (en) | Distributed transaction processing method on cloud computing platform | |
US10970273B2 (en) | Aiding resolution of a transaction | |
CN111258976A (en) | Distributed lock implementation method, system, device and storage medium | |
US7693882B2 (en) | Replicating data across the nodes in a cluster environment | |
US20090193280A1 (en) | Method and System for In-doubt Resolution in Transaction Processing | |
CN112995262B (en) | Distributed transaction submission method, system and computing equipment | |
US7165061B2 (en) | Transaction optimization of read-only data sources | |
US20090193286A1 (en) | Method and System for In-doubt Resolution in Transaction Processing | |
US6212595B1 (en) | Computer program product for fencing a member of a group of processes in a distributed processing environment | |
US6490595B1 (en) | Method, system and program products for providing efficient syncpoint processing of distributed transactions | |
JP4486689B2 (en) | A method for managing information about where to recover after a failure, a method for recovering after a failure, and a method for recovering the current version of a data item after a failure in a system containing multiple caches | |
US6848037B2 (en) | Data processing arrangement and method | |
US6799172B2 (en) | Method and system for removal of resource manager affinity during restart in a transaction processing system | |
JP2008544371A (en) | How to handle lock-related inconsistencies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HURAS, MATTHEW ALBERT;VINCENT, TIMOTHY JON;REEL/FRAME:021042/0334;SIGNING DATES FROM 20051216 TO 20051219 |
STCB | Information on status: application discontinuation | | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |