
WO2017131623A1 - Performing operations on a graph - Google Patents

Performing operations on a graph

Info

Publication number
WO2017131623A1
Authority
WO
WIPO (PCT)
Prior art keywords
vertices
operations
graph
vertex
embedded database
Prior art date
Application number
PCT/US2016/014794
Other languages
French (fr)
Inventor
Hideaki Kimura
Alkis Simitsis
William K. Wilkinson
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp filed Critical Hewlett Packard Enterprise Development Lp
Priority to PCT/US2016/014794 priority Critical patent/WO2017131623A1/en
Publication of WO2017131623A1 publication Critical patent/WO2017131623A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A computing system includes a graph engine comprising an embedded database. The graph engine may store vertices of a graph in the embedded database as key-value pairs associated with each of the vertices. Each of the key-value pairs may comprise information associated with the vertices. The graph engine may further receive a request indicating operations to perform on a plurality of the vertices, partition the graph into partitions comprising a subset of the vertices of the graph, partition the vertices into blocks comprising a subset of the vertices of the partitions, and perform the indicated operations on the plurality of the vertices of the request.

Description

Performing Operations on a Graph
BACKGROUND
[0001] A graph is a data structure comprising a set of vertices. The vertices may be connected to each other with edges.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] Certain examples are described in the following detailed description and in reference to the drawings, in which:
[0003] FIG. 1 is a conceptual diagram of an example computing system that may perform operations on a graph;
[0004] FIG. 2 is another conceptual diagram of an example computing system that may perform operations on a graph;
[0005] FIG. 3 is another conceptual diagram of an example computing system that may perform operations on a graph;
[0006] FIG. 4 is a flowchart of an example method for performing operations on a graph;
[0007] FIG. 5 is a flowchart of an example method for performing operations on a graph; and
[0008] FIG. 6 is a block diagram of an example system for performing operations on a graph.
DETAILED DESCRIPTION
[0009] A graph is a data structure comprising vertices. The vertices may be connected to each other with edges. The edges may be bi-directional or unidirectional. Properties may be associated with a vertex or an edge. A computing device may perform operations, e.g. algorithms, on the graph to determine information about the structure of the graph.
[0010] As an example, a graph may be used to represent flight paths of an airline. The vertices of the graph may represent cities. The cities are connected to each other via flight paths, which may be represented using edges that connect the vertices. Each of the vertices may have an associated cost. The associated costs may represent, for example, distances associated with the flight paths.
[0011] A computing device may be configured to determine properties about the graph, such as the shortest path, or lowest cost that may be obtained to travel between two cities, based on the available flight paths. Various algorithms, such as Dijkstra's algorithm, the Bellman-Ford algorithm, or the like, may be used to determine lowest cost or shortest path by analyzing and/or navigating the vertices of the graph.
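As a concrete sketch of such a navigational query, the snippet below runs Dijkstra's algorithm over a small flight-path graph like the one described above. The city names, distances, and the `dijkstra` helper are illustrative assumptions, not part of the disclosure:

```python
import heapq

def dijkstra(edges, source, target):
    """Shortest-path cost between two vertices.

    `edges` maps each vertex to a list of (neighbor, cost) tuples,
    mirroring the flight-path graph described above.
    """
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, v = heapq.heappop(queue)
        if v == target:
            return d
        if d > dist.get(v, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, cost in edges.get(v, []):
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return float("inf")  # target unreachable from source

# Hypothetical flight paths: city -> [(city, distance)]
flights = {
    "SFO": [("DEN", 950), ("JFK", 2580)],
    "DEN": [("JFK", 1620)],
}
print(dijkstra(flights, "SFO", "JFK"))  # 2570, via DEN
```

The priority queue keeps the next-cheapest vertex at the front, so the search touches only a small neighborhood of the graph — the property that later motivates running such queries as a single transaction.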
[0012] Graphs may comprise millions or billions of vertices in some cases. Being able to efficiently perform graph workloads in a scalable manner is useful as the number of vertices scales upward. This disclosure is related to a graph engine capable of performing operations on graphs with such large numbers of vertices.
[0013] More particularly, the techniques of this disclosure are related to a graph engine. A graph engine is software specifically designed to perform operations on a graph. The graph engine of this disclosure may scale efficiently to hundreds or thousands of cores, and terabytes of memory. The graph engine of this disclosure may also perform well on non-uniform memory architecture (NUMA) systems, which have non-uniform costs for transferring data between cores of the computing system.
[0014] The graph engine as described herein comprises, or is coupled with, an embedded database. The embedded database is so-called because the graph engine has knowledge about the underlying structure and performance of the database. The graph engine uses the embedded database to store the vertices of the graph in a particular fashion, and to perform operations on the stored vertices in order to maximize performance. As an example, the graph engine may control partitioning of the database among various NUMA cores, and may control threading of the embedded database.
[0015] The embedded database may store each vertex as a row in the embedded database. Each row may further comprise a set of key-value pairs. Additionally, the embedded database may store additional information associated with each vertex along with each key-value pair, such as information that indicates the edges that connect with each vertex, a cost associated with each vertex, and/or metadata.
[0016] The graph engine also uses the embedded database to enforce transactional consistency (e.g., atomicity, consistency, isolation and durability, "ACID") when accessing a key-value pair associated with a vertex or metadata. When performing an operation on a key-value pair and associated information, the embedded database also performs "lightweight" transactions. These lightweight transactions do not lock significant portions of the database, but rather lock the portion of the database associated with a particular key-value pair. These database transactions also provide high transactional throughput due to the embedded database not locking larger objects, such as partitions of the database or entire tables of a database.
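The per-key locking behavior described above can be sketched in a few lines. This toy store is an illustration of the idea only — the class and method names are invented, and a real embedded database would add logging, durability, and deadlock handling:

```python
import threading

class KeyValueStore:
    """Toy store with per-key locks, sketching the "lightweight"
    transactions described above: an update locks only the row for
    the key it touches, never a whole table or partition."""

    def __init__(self):
        self._rows = {}
        self._locks = {}
        self._meta = threading.Lock()  # guards the lock table itself

    def _lock_for(self, key):
        with self._meta:
            return self._locks.setdefault(key, threading.Lock())

    def transact(self, key, update):
        # Lock only the one row; concurrent transactions on other
        # keys proceed without contention.
        with self._lock_for(key):
            self._rows[key] = update(self._rows.get(key))
            return self._rows[key]

store = KeyValueStore()
store.transact("v1", lambda old: (old or 0) + 1)
```

Because each transaction holds only its own row's lock, many such transactions can commit concurrently, which is the source of the high throughput the text attributes to not locking partitions or whole tables.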
[0017] FIG. 1 is a conceptual diagram of an example computing system that may perform operations on a graph. Computing system 100 is illustrated in FIG. 1. Computing system 100 comprises a graph engine 116, embedded database 112, and graph 102. Graph engine 116 may receive requests, e.g. from a client device. Based on the request, graph engine 116 may generate one or more operations 114 to perform on graph 102.
[0018] Embedded database 112 is coupled with graph engine 116, and comprises key-value pairs 118. Key-value pairs 118 are stored in blocks 106A-106N, and 108A-108N, although illustrated otherwise for the purpose of simplicity. Embedded database 112 may comprise a NoSQL database, in some examples. Graph engine 116 may control the structure of key-value pairs 118, as well as the execution of embedded database 112.
[0019] Graph 102 may comprise a set of vertices that are stored as key-value pairs 118 within embedded database 112. Graph engine 116 may cause embedded database 112 to store the vertices and edges of graph 102 as entries in a table, denoted as VG, also referred to as the "VertexIndexTable." VG is illustrated as table 210 (described below with respect to FIG. 2). The information in the table comprises rows, each of which is associated with a vertex. Each row comprises a vertex and information about the outgoing edges associated with that vertex.
[0020] Each row of VG may comprise a schema corresponding to: [vid, <meta>, edg_cnt, <eid1, meta1>, ..., <eidN, metaN>], where "vid" denotes a vertex id that is associated with a particular vertex, "<meta>" denotes metadata associated with the vertex, "edg_cnt" is the out-degree (the number of edges that exit) of the vertex, and <eid1, meta1>, ..., <eidN, metaN> are tuples corresponding to identifiers of each of the outgoing edges of the vertex and any metadata associated with those edges. Depending on the size and complexity of graph 102, graph engine 116 and embedded database 112 may store the metadata in a table row of VG, or the table row may comprise a reference to the metadata, e.g. a pointer.
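A minimal sketch of the VG row layout described above, using a Python dict in place of the database's on-disk encoding; the field names mirror the schema, but the helper itself is hypothetical:

```python
def encode_vertex_row(vid, meta, edges):
    """Build a (key, value) pair following the VG schema above:
    [vid, <meta>, edg_cnt, <eid1, meta1>, ..., <eidN, metaN>].
    The dict representation is illustrative only."""
    value = {
        "meta": meta,                 # vertex metadata (or a reference to it)
        "edg_cnt": len(edges),        # out-degree of the vertex
        "edges": list(edges),         # (edge id, edge metadata) tuples
    }
    return vid, value

# Hypothetical vertex with two outgoing edges
key, row = encode_vertex_row(
    vid=42,
    meta={"city": "SFO"},
    edges=[(7, {"dist": 950}), (9, {"dist": 2580})],
)
```

The vertex id doubles as the lookup key, matching the description below of keys being vertex identifiers and values being the corresponding table rows.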
[0021] Graph engine 116 may divide the table comprising the set of vertices of graph 102 into partitions, such as partitions 104A through 104N, where "N" is any number of partitions. Graph engine 116 may further divide the vertices of each partition into blocks comprising a subset of the vertices of the partition. For example, partition 104A is divided into blocks 106A-106N, and partition 104N is divided into blocks 108A-108N, where "N" may be any number of blocks. In various examples, a partition may comprise 256x256 blocks, and each block may comprise a grid having 32 vertices. Graph engine 116 may partition the vertices of graph 102 such that each block is assigned to a different core of a NUMA computing device in some examples.
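The two-level partitioning can be illustrated with simple index arithmetic. The sizes below (32 vertices per block, 4 blocks per partition) are illustrative parameters, not the 256x256-block layout the text mentions:

```python
def block_of(vertex_id, vertices_per_block, blocks_per_partition):
    """Map a vertex id to (partition, block) indices, sketching the
    two-level partitioning above. Sizes are illustrative assumptions."""
    block = vertex_id // vertices_per_block
    partition = block // blocks_per_partition
    return partition, block % blocks_per_partition

# e.g. 32 vertices per block, 4 blocks per partition:
# vertex 0 lands in partition 0, block 0
assert block_of(0, 32, 4) == (0, 0)
```

Because the mapping is a pure function of the vertex id, each block (and hence each range of vertices) can be bound to a particular NUMA core without any shared bookkeeping.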
[0022] Graph engine 116 may access a row having a particular vertex based on the vertex id, which is a key value that maps to the row of VG having the particular vertex id. Thus, the keys of key-value pairs 118 may comprise vertex identifiers, and the pair values may comprise the corresponding row of the table, VG. When graph engine 116 performs an operation, such as one of operations 114, on a vertex, embedded database 112 may perform a transaction on a row associated with a vertex id involved in the operation, and may exclusively lock the row associated with that vertex id during the transaction.
[0023] When performing an operation, graph engine 116 may cause embedded database 112 to execute a single transaction or multiple transactions depending on whether graph engine 116 is performing a navigational operation or an analytical operation on graph 102. A navigational operation may comprise a navigational query of graph 102 that touches or involves a relatively small number of vertices. An example navigational operation may comprise a shortest-path algorithm, which may for example be performed using Dijkstra's algorithm. The navigational query may attempt to find a shortest path between vertices located within tens of hops of each other. Graph engine 116 may cause embedded database 112 to perform the navigational operation using a single transaction that executes on a single thread, and/or on a single core of a computing device, as is described in greater detail elsewhere.
[0024] Although described as performing a single transaction, a single transaction may involve or interact with multiple vertices. For example, a navigational operation may interact with vertices 108A-108N. Corresponding to the interaction with vertices 108A-108N, embedded database 112 may lock the rows corresponding to the multiple involved vertices in some examples.
[0025] Operations 114 may also comprise an analytics operation in various examples. Graph engine 116 may also execute the analytics operation using vertices stored in embedded database 112. An analytics operation is a query that may access a large fraction of the vertices of graph 102. In some examples, the number of vertices involved may comprise hundreds of millions of vertices. As an example referring to FIG. 1, an analytics operation of operations 114 may access most or all of the vertices of each of partitions 104A-104N.
[0026] Graph engine 116 may cause embedded database 112 to execute multiple transactions when performing an analytics operation. Using multiple transactions may improve analytical operation execution performance. Each transaction may correspond to accessing a particular vertex or to updating a shared table that indicates the progress of the analytics operation. Graph engine 116 may use an algorithm, such as a Bellman-Ford algorithm, for analytics workloads in some examples. Additional details regarding performing an analytics operation are provided below.
[0027] FIG. 2 is another conceptual diagram of an example computing device that may perform operations on a graph. FIG. 2 illustrates a computing system 200. Graph engine 116 and embedded database 112 perform navigational operation 202 in the example of FIG. 2.
[0028] As described above, a navigational operation may involve a smaller number of vertices relative to an analytical operation. In the example of FIG. 2, graph engine 116 may receive a request 206, which may comprise a single operation or multiple operations that a client is requesting that graph engine 116 perform. Based on request 206, graph engine 116 determines that graph engine 116 is to perform a navigational operation, and generates navigational operation 202.
[0029] Graph engine 116 determines that embedded database 112 should perform a single transaction 204 as part of executing navigational operation 202. Transaction 204 may access vertices 108A and 108N in this example. Embedded database 112 may access vertices 108A and 108N as part of executing transaction 204. Each navigational transaction, e.g. transaction 204, may execute as a single thread. Additionally, embedded database 112 locks rows 208A and 208N of table 210 because rows 208A and 208N are associated with vertices 108A and 108N as part of performing transaction 204. Upon completing transaction 204, embedded database 112 unlocks rows 208A and 208N. As described herein, a lock may comprise a conceptual lock. A conceptual lock may comprise any transaction serialization mechanism that embedded database 112 may use to support high concurrency.
[0030] FIG. 3 is another conceptual diagram of an example computing device that may perform operations on a graph. FIG. 3 illustrates a computing system 300. Graph engine 116 and embedded database 112 perform an analytics operation 302 in the example of FIG. 3.
[0031] As described above, an analytical operation, such as analytical operation 302, may involve a much greater number of vertices relative to a navigational operation. In the example of FIG. 3, graph engine 116 may receive a request 206, which may comprise a single operation or multiple operations that a client is requesting that graph engine 116 perform. Based on request 206, graph engine 116 determines that graph engine 116 is to perform an analytical operation, and generates analytical operation 302.
[0032] Graph engine 116 determines that embedded database 112 should perform multiple transactions as part of executing analytical operation 302. More particularly, graph engine 116 may determine that analytical operation 302 accesses vertices 108A, 108B, and 108N. Based on the determination that vertices 108A, 108B, and 108N are accessed, embedded database 112 generates corresponding transactions, i.e. transactions 304A, 304B, and 304N.
[0033] In some examples, for each access of a vertex, embedded database 112 may perform a separate transaction. For example, embedded database 112 performs transaction 304A when accessing vertex 108A, and locks corresponding row 208A of table 210. Similarly, embedded database 112 performs transaction 304B when accessing vertex 108B, and locks row 208B, and follows a similar process when accessing vertex 108N. The transaction may then read, update, or add data to the locked row. Embedded database 112 unlocks a row when the corresponding transaction completes. In various examples, transactions 304 may depend upon each other and may be performed in a sequence based on their dependencies. For example, the execution of transaction 304B may depend upon the completion of transaction 304A.
[0034] As an example, analytical operation 302 may comprise a Bellman-Ford algorithm. A Bellman-Ford algorithm determines single-source shortest paths (SSSP) starting from a given source vertex, finding the shortest path to every other vertex in the graph that is connected with the source. The Bellman-Ford algorithm may be distributed across partitions and blocks to compute the shortest paths from the source vertex in parallel using multiple cores of a computing device.
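A minimal, single-threaded sketch of the Bellman-Ford relaxation described above; a distributed version as contemplated by the disclosure would split the edge list across partitions and blocks, but the per-edge update is the same:

```python
def bellman_ford(num_vertices, edges, source):
    """Single-source shortest paths from `source`.

    `edges` is a list of (u, v, cost) tuples; unreachable vertices
    keep a distance of infinity."""
    INF = float("inf")
    dist = [INF] * num_vertices
    dist[source] = 0
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:   # "relax" the edge
                dist[v] = dist[u] + cost
                changed = True
        if not changed:
            break  # nothing left to propagate; terminate early
    return dist

# Hypothetical 4-vertex graph: 0 -> 1 -> 2 -> 3, plus a costlier 0 -> 2
dist = bellman_ford(4, [(0, 1, 5), (1, 2, 3), (0, 2, 10), (2, 3, 1)], 0)
```

Unlike Dijkstra's algorithm, every edge can be relaxed independently in each round, which is what makes the computation easy to spread across many cores.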
[0035] To execute the Bellman-Ford algorithm or another analytical operation, threads may calculate and propagate distances from the source vertex in so-called "relax" operations. Upon propagating an updated distance value, the completing thread communicates with another thread to request further calculation and propagation. The request for further calculation is referred to as an "activate" operation.
[0036] When performing an analytical operation 302, such as a Bellman-Ford algorithm, graph engine 116 may cause embedded database 112 to generate and execute a transaction for both distance calculation and distance propagation. In the example of FIG. 3, transactions 304A-304N may each calculate a distance value. After a distance value has been calculated, embedded database 112 executes a separate transaction to update the distance value in shared table 308. Table 308 may comprise rows, and each row may comprise a vertex identifier, a distance value, and a parent vertex identifier.
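A toy version of such a shared distance table, where each proposed update runs under its own lock as its own small transaction; the class and method names are invented for illustration, not the disclosed implementation:

```python
import threading

class DistanceTable:
    """Sketch of shared table 308: one row per vertex holding
    (distance, parent), updated under a lock so each propagation
    runs as its own small transaction."""

    def __init__(self):
        self._rows = {}                  # vid -> (distance, parent_vid)
        self._lock = threading.Lock()

    def propose(self, vid, distance, parent):
        # Keep the proposed row only if it improves the distance.
        with self._lock:
            current = self._rows.get(vid, (float("inf"), None))
            if distance < current[0]:
                self._rows[vid] = (distance, parent)
                return True              # improved: worth activating neighbors
            return False

table = DistanceTable()
table.propose(1, 5.0, 0)
```

The boolean result models the activate decision: only an improved distance needs to be propagated to neighboring vertices by another thread.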
[0037] As a more particular example, if vertex 108A is the source vertex, graph engine 116 uses table 210 to determine all neighboring vertices and their associated distances. When graph engine 116 determines the distance between vertex 108A and a neighboring vertex, such as vertex 108B, embedded database 112 executes a single transaction 304A. Additionally, upon determining the distance between vertex 108A and 108B, embedded database 112 executes a separate transaction 306 to update the distance in shared table 308.
[0038] To generalize, graph engine 116 may partition operations to be performed on graph 102 into multiple computations. Graph engine 116 may spawn one or more threads. Each of the threads may be associated with, and may execute, one or more of the computations, and each of the threads may attach to a particular core. Examples of such computations may comprise activation and relaxation operations, e.g. performed as part of executing a Bellman-Ford algorithm.
[0039] By using fine-grained locking of rows associated with vertices, as well as shared table 308 to store state for various computations, e.g. for activation and relaxation operations, respectively, the graph engine techniques of this disclosure achieve speed-ups in throughput for analytical workloads. Additionally, the techniques of this disclosure may assign each block, e.g. block 106A, to a different core of a computing device. In NUMA systems, threads may attach to a particular core to be NUMA-aware. That is, graph engine 116 may assign cores, e.g. a group of cores comprising a NUMA node, to work on blocks that are near to each other. For a navigational workload, graph engine 116 may attach the navigational workload to a particular core.
[0040] The techniques of this disclosure also provide the ability to run navigational and analytical operations (referred to as "mixed workloads") without compromising performance, to execute mixed workloads on the same storage engine using transactions, and to execute mixed workloads without having to segregate navigational vs. analytical operations. Experimental observations indicate multiple orders of magnitude improvement when performing analytics workloads on large datasets as compared to other approaches for performing analytical graph workloads in some cases.
[0041] FIG. 4 is a flowchart of an example method for performing operations on a graph. Method 400 may be described below as being executed or performed by a system, for example, computing system 100 (FIG. 1), computing system 200 (FIG. 2), or computing system 300 (FIG. 3). In various examples, method 400 may be performed by hardware, software, firmware, or any combination thereof. Other suitable systems and/or computing devices may be used as well. Method 400 may be implemented in the form of executable instructions stored on at least one machine-readable storage medium of the system and executed by at least one processor of the system. Alternatively or in addition, method 400 may be implemented in the form of electronic circuitry (e.g., hardware). In alternate examples of the present disclosure, one or more blocks of method 400 may be executed substantially concurrently or in a different order than shown in FIG. 4. In alternate examples of the present disclosure, method 400 may include more or fewer blocks than are shown in FIG. 4. In some examples, one or more of the blocks of method 400 may, at certain times, be ongoing and/or may repeat.
[0042] Method 400 may start at block 402, at which point the computing system, e.g. graph engine 116 of computing system 100, may store vertices of graph 102 in an embedded database 112 comprising key-value pairs 118 associated with each of the vertices, e.g. vertices 108A-108N. Each of the key-value pairs may comprise information associated with vertices 108.
[0043] Method 400 may proceed to block 404, at which point graph engine 116 may cause embedded database 112 to partition the graph into partitions comprised of a subset of the vertices, and to block 406, where graph engine 116 and embedded database 112 may partition the vertices into blocks, e.g. blocks 106 comprising a subset of the vertices, e.g. vertices 108A, of the partition, e.g. partition 104A. Method 400 may proceed to block 408, at which point graph engine 116 and/or embedded database 112 may perform operations on a plurality of the vertices of embedded database 112, wherein performing the operations comprises performing transactions using embedded database 112.
[0044] FIG. 5 is a flowchart of an example method for performing operations on a graph. FIG. 5 illustrates method 500. Method 500 may be described below as being executed or performed by a system, for example, computing system 100 (FIG. 1) or computing system 200 (FIG. 2). Other suitable systems and/or computing devices may be used as well. Method 500 may be implemented in the form of executable instructions stored on at least one machine-readable storage medium of the system and executed by at least one processor of the system. Method 500 may be performed by hardware, software, firmware, or any combination thereof.
[0045] Alternatively or in addition, method 500 may be implemented in the form of electronic circuitry (e.g., hardware). In alternate examples of the present disclosure, one or more blocks of method 500 may be executed substantially concurrently or in a different order than shown in FIG. 5. In alternate examples of the present disclosure, method 500 may include more or fewer blocks than are shown in FIG. 5. In some examples, one or more of the blocks of method 500 may, at certain times, be ongoing and/or may repeat.
[0046] In various examples, method 500 may start at block 502, at which point graph engine 116 may store vertices of graph 102 in an embedded database 112 comprising key-value pairs 118 associated with each of the vertices, wherein each of the key-value pairs 118 comprises information associated with a vertex of the vertices. In various examples, the associated information may comprise at least one of: an identifier of the vertex, a distance to another vertex of the vertices, or edge information of the vertex.
[0047] Method 500 may proceed to block 504, at which point graph engine 116 may cause embedded database 112 to partition the graph into partitions comprised of a subset of the vertices, and to block 506, where graph engine 116 and embedded database 112 may partition the vertices into blocks, e.g. blocks 106 comprising a subset of the vertices, e.g. vertices 108A, of the partition, e.g. partition 104A.
[0048] In various examples, method 500 may proceed to block 508, at which point graph engine 116 may receive a request, such as request 206, that indicates operations to perform. At block 510, graph engine 116 and embedded database 112 may perform at least one of the operations. The operations may comprise at least one of a navigational operation, e.g. navigational operation 202, or an analytical operation, e.g. analytical operation 302.
[0049] In various examples, to perform the operations, graph engine 116 and embedded database 112 may partition the operations into multiple computations. Each of the computations may be handled by a thread associated with one of the operations. The threads may execute each of the operations associated with the threads. The threads may also lock rows of embedded database 112, e.g. using conceptual locking. The rows of embedded database 112 comprise key-value pairs 118, which are associated with each of the operations, e.g. one or more analytical operations 302.
[0050] Graph engine 116 and embedded database 112 may perform a single transaction for each of the operations comprising navigational operations. Graph engine 116 and embedded database 112 may perform a transaction for each relaxation operation of a vertex of a subset of vertices of a block, and a transaction for each activation operation associated with the vertex.
[0051] In various examples, method 500 may proceed to block 512. To perform block 512, e.g. to perform the operations on a vertex of the subset of vertices of the block, embedded database 112 may perform a transaction on the vertex. To perform the transaction, embedded database 112 may lock a row of the embedded database corresponding to a vertex of the vertices on which the operation is being performed.
[0052] In some examples, graph 102 may be partitioned into a first block and a second block. In this example, graph engine 116 may perform block 514, and may process the first block, e.g. block 106A, on a thread executing on a first core of a computing device and process the second block, e.g. block 106N, on a thread executing on a second core of the computing device.
[0053] FIG. 6 is a block diagram of an example for performing operations on a graph. In the example of FIG. 6, system 600 includes a processor 610 and a machine-readable storage medium 620. Although the following descriptions refer to a single processor and a single machine-readable storage medium, the descriptions may also apply to a system with multiple processors and multiple machine-readable storage mediums. In such examples, the instructions may be distributed (e.g., stored) across multiple machine-readable storage mediums and the instructions may be distributed (e.g., executed by) across multiple processors.
[0054] Processor 610 may be one or more central processing units (CPUs), microprocessors, and/or other hardware devices suitable for retrieval and execution of instructions stored in machine-readable storage medium 620. In the particular example shown in FIG. 6, processor 610 may fetch, decode, and execute instructions 622, 624, 626, 628 to perform operations on a graph.
[0055] As an alternative or in addition to retrieving and executing instructions, processor 610 may include one or more electronic circuits comprising a number of electronic components for performing the functionality of one or more of the instructions in machine-readable storage medium 620. With respect to the executable instruction representations (e.g., boxes) described and shown herein, it should be understood that part or all of the executable instructions and/or electronic circuits included within one box may, in alternate examples, be included in a different box shown in the figures or in a different box not shown.
[0056] Machine-readable storage medium 620 may be any electronic, magnetic, optical, or other physical storage device that stores executable instructions. Thus, machine-readable storage medium 620 may be, for example, Random Access Memory (RAM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disc, and the like. Machine-readable storage medium 620 may be disposed within system 600, as shown in FIG. 6. In this situation, the executable instructions may be "installed" on the system 600. Alternatively, machine-readable storage medium 620 may be a portable, external or remote storage medium, for example, that allows system 600 to download the instructions from the portable/external/remote storage medium.
[0057] Referring to FIG. 6, vertex storage instructions 622, when executed by a processor (e.g., 610), may cause processor 610 to store vertices, e.g. vertices 108, of a graph 102 in an embedded database 112 comprising key-value pairs 118 associated with each of vertices 108. Each of the key-value pairs 118 comprises information associated with a vertex of the vertices. In various examples, the information associated with the one of the vertices may comprise at least one of: an identifier of the vertex, a distance to another vertex of the vertices, or edge information of the vertex.
[0058] Graph partitioning instructions 624, if executed, may cause processor 610 to partition the graph into partitions, e.g. partitions 104, comprising a subset of the vertices.
[0059] Processor 610 may execute block partitioning instructions 626 in various examples. Block partitioning instructions 626 may cause processor 610 to partition the vertices into blocks comprising a subset of the vertices of the partitions.
[0060] In some examples, processor 610 may execute operation performance instructions 628. Operation performance instructions 628, if executed, may cause processor 610 to perform operations, e.g. operations 114, comprising transactions on a plurality of the vertices of embedded database 112.

Claims

1. A method for performing operations on a graph, the method comprising: storing vertices of the graph in an embedded database comprising key-value pairs associated with each of the vertices, wherein each of the key-value pairs comprises information associated with a vertex of the vertices;
partitioning the graph into partitions comprising a subset of the vertices; partitioning the vertices into blocks comprising a subset of the vertices of the partitions; and
performing operations on a plurality of the vertices of the embedded database, wherein performing the operations comprises performing transactions using the embedded database.
2. The method of claim 1, further comprising:
receiving a request that indicates the operations to perform; and performing the operations on the vertices, wherein the operations may comprise at least one of a navigational operation or an analytical operation.
3. The method of claim 1, wherein the information associated with the one of the vertices comprises at least one of: an identifier of the vertex, a distance to another vertex of the vertices, or edge information of the vertex.
4. The method of claim 1, wherein the blocks comprise a first block and a second block, the method further comprising:
processing the first block on a thread executing on a first core of a computing device and processing the second block on a thread executing on a second core of the computing device.
5. The method of claim 1, further comprising:
performing an operation on a vertex of the subset of vertices of the block; and
performing a transaction on the vertex using the embedded database, wherein performing the transaction comprises locking a row of the embedded database corresponding to a vertex of the vertices on which the operation is being performed.
6. The method of claim 1, wherein performing the operations comprises performing a single transaction for each of the operations comprising navigational operations.
7. The method of claim 1, wherein performing the operations comprises: partitioning the operations into multiple computations;
spawning threads associated with the computations;
executing, by the associated threads, the computations associated with the threads; and
locking, by each of the threads, rows of the embedded database comprising key-value pairs associated with executing the computations.
8. A computing system comprising:
a graph engine; and
an embedded database;
the graph engine to:
store vertices of a graph in the embedded database as key-value pairs associated with each of the vertices, wherein each of the key-value pairs comprises information associated with the vertices;
the graph engine further to:
receive a request indicating operations to perform on a plurality of the vertices;
partition the graph into partitions comprising a subset of the vertices of the graph;
partition the vertices into blocks comprising a subset of the vertices of the partitions; and
perform the indicated operations on the plurality of the vertices of the request.
9. The computing system of claim 8, further comprising:
a first core to:
perform a first operation of the operations on a first one of the blocks; and
a second core to:
perform a second, different operation of the operations on a second one of the blocks, wherein the second block is different than the first block.
10. The computing system of claim 8, wherein the key-value pairs further comprise information associated with the vertices, the information comprising: at least one of: an identifier of the vertex, a distance to another vertex of the vertices, or edge information of the vertex.
11. The computing system of claim 8, wherein the request comprises a plurality of requests, wherein each of the requests comprises a plurality of operations on a set of the vertices of the graph, the graph engine further to: perform the operations of the requests on the set of vertices concurrently.
12. The computing system of claim 8, wherein to perform the operations, the graph engine is further to:
perform a single transaction using the embedded database for each of the operations comprising navigational operations.
13. The computing system of claim 8, wherein to perform the operations, the graph engine is further to:
partition the operations into multiple computations;
spawn threads associated with the computations;
execute, by the associated threads, each of the computations associated with the threads; and
lock, by each of the threads, rows of the embedded database comprising key-value pairs associated with executing the computations.
14. A non-transitory machine-readable storage medium encoded with instructions that, when executed, cause a processor to:
store vertices of a graph in an embedded database comprising key-value pairs associated with each of the vertices, wherein each of the key-value pairs comprises information associated with a vertex of the vertices;
partition the graph into partitions comprising a subset of the vertices; partition the vertices of the partitions into blocks comprising a subset of the vertices of the partitions; and
perform operations comprising transactions on a plurality of the vertices of the embedded database.
15. The non-transitory machine-readable storage medium of claim 14, wherein the information associated with the one of the vertices comprises at least one of: an identifier of the vertex, a distance to another vertex of the vertices, or edge information of the vertex.
PCT/US2016/014794 2016-01-26 2016-01-26 Performing operations on a graph WO2017131623A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2016/014794 WO2017131623A1 (en) 2016-01-26 2016-01-26 Performing operations on a graph


Publications (1)

Publication Number Publication Date
WO2017131623A1 true WO2017131623A1 (en) 2017-08-03

Family

ID=59398290

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/014794 WO2017131623A1 (en) 2016-01-26 2016-01-26 Performing operations on a graph

Country Status (1)

Country Link
WO (1) WO2017131623A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100169137A1 (en) * 2008-12-31 2010-07-01 Ebay Inc. Methods and systems to analyze data using a graph
US20120310916A1 (en) * 2010-06-04 2012-12-06 Yale University Query Execution Systems and Methods
US20130097133A1 (en) * 2007-11-01 2013-04-18 Ebay Inc. Navigation for large scale graphs
US20140280360A1 (en) * 2013-03-15 2014-09-18 James Webber Graph database devices and methods for partitioning graphs
US20150169687A1 (en) * 2011-05-02 2015-06-18 Ab Initio Technology Llc Managing data queries



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16888360

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16888360

Country of ref document: EP

Kind code of ref document: A1