Releases: Pometry/Raphtory
v0.15.1
Graphql
- Added a new option to output the graphql schema without running the server, via raphtory-graphql schema > schema.graphql
- Graphql now accepts signed integers (bug with underlying library that we patched)
- Created gqldocuments, and now output nodes and edges as well as the gqldocument in that object, for use in vector search
- You can now provide a custom UI as part of a private raphtory server.
Misc
- Removed the dependency on numpy 2.0; Raphtory will now install and run with numpy <2
- Several library upgrades for CVE reasons.
- Improved python testing pipeline
What's Changed
- enable setting up custom ui through env variable by @ricopinazo in #2000
- Fix reading of Utf8View columns in parquet reader by @ljeub-pometry in #2003
- Output nodes and edges in similarity search by @rachchan in #1975
- Fix/utf8view by @ljeub-pometry in #2005
- Update python dependencies and testing by @ljeub-pometry in #2021
- add as_ref to NodeView by @ljeub-pometry in #2024
- add option to output graphql schema by @ricopinazo in #2023
- update-ui-db132d339 by @miratepuffin in #2029
- Fix security and deps by @miratepuffin in #2025
- Use fixed dynamic_graphql and up rust version to 1.86 by @louisch in #2020
- add patchelf for docs by @miratepuffin in #2032
- Release v0.15.1 by @github-actions in #2031
Full Changelog: v0.15.0...v0.15.1
v0.15.0
API and Model changes
Property changes for Graph to Parquet
As part of our work to unify the in-memory and on-disk storage models of Raphtory and allow us to save directly to formats such as arrow and parquet we have had to make several changes to the model. These include:
- Restricting Map properties such that for each instance of the map in a history, each key has the same property type.
- Restricting List properties such that all values must be the same type.
- Removing Graphs and PersistentGraph properties.
Through this you can now save to/load from parquet via to_parquet and from_parquet. Once we have improved this slightly and added the ability to stream updates in, we will be deprecating the proto format for saving and moving fully to parquet. This is because loading from proto uses a huge amount of memory and is quite slow.
If any of these changes affect your use case, please reach out and we can assist.
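The Map and List restrictions above can be pictured with a small standalone check. This is only an illustrative sketch in plain Python, not Raphtory's validation code; both helper names are hypothetical.

```python
def list_is_homogeneous(values):
    """True when every element of a List property has the same Python type."""
    return len({type(v) for v in values}) <= 1


def map_history_is_consistent(history):
    """True when each key keeps one value type across every map in a history.

    `history` is a list of dicts, one per update of the Map property.
    """
    seen = {}  # key -> type first observed for that key
    for update in history:
        for key, value in update.items():
            expected = seen.setdefault(key, type(value))
            if type(value) is not expected:
                return False
    return True
```

Under this sketch, a history such as [{"a": 1}, {"a": "one"}] would be rejected because the key "a" changes type between updates.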
Algorithm Result replaced with NodeState
One of the major roadmap objectives for Raphtory is to standardise all outputs as either a NodeState or EdgeState. These dataframe-like structures make post-processing significantly easier and, as more functionality is added, will allow more complicated pipelines to be optimised automatically by Raphtory, instead of having to swap over to writing a function in rust.
As part of this release we have replaced all instances of AlgorithmResult with NodeState, for example in the output of Pagerank.
These NodeState objects are indexable and have all of the same functionality previously available in the AlgorithmResult.
The only notable change is that Group_by has been renamed to groups, as there is only one value to group on. This returns a NodeGroups which is also indexable.
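To illustrate what grouping on the single value means, here is a rough plain-Python sketch. The helper group_by_value is hypothetical and works on an ordinary dict of node name to value; it is not the NodeState or NodeGroups API.

```python
from collections import defaultdict


def group_by_value(node_values):
    """Group node names by their value, returning (value, [nodes]) pairs."""
    groups = defaultdict(list)
    for node, value in node_values.items():
        groups[value].append(node)
    # A list of pairs is indexable, loosely mirroring an indexable group result.
    return sorted((value, sorted(nodes)) for value, nodes in groups.items())


# Hypothetical component ids, as a connected-components run might assign them.
component_of = {"Gandalf": 0, "Frodo": 0, "Sauron": 1}
groups = group_by_value(component_of)
```

Because each node carries exactly one value, there is no choice of column to group on, which is why a plain groups call needs no argument.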
Fixing Persistent Graph semantics
- Changed the semantics for edge deletions without a corresponding addition so that they are only considered as an instantaneous event (the edge does not exist before or after)
- Fixed bug where property values for exploded edges were incorrect for the PersistentGraph
- Cleaned up semantics for earliest and latest time on edges accordingly
- Multiple updates at the start of the window are now handled properly
- No more spurious exploded edges if there is an update at the start of the window
Smaller changes/fixes
- Fixed an issue where contains and keys were giving inconsistent results for edge properties, leading to a panic
from raphtory import Graph

g = Graph()
g.add_edge(0, 1, 2, layer="a")
g.add_edge(0, 1, 2)
g.edge(1, 2).add_constant_properties({"test": 1})
constant_exploded = g.layer("a").edges.explode().properties.constant.values() # used to panic here!
- Unified the logic between update_constant_properties and add_constant_properties on edges to make sure that the edge actually exists in the layer that the constant properties are being added to.
- Alongside this unification, if an edge has no temporal updates for one of its layers within a given window, it will now be correctly filtered out of the view - this was previously not happening if that layer had constant properties.
- Fixed a bug where adding empty temporal updates to graph properties incorrectly affected the earliest/latest time
- Removed the get_by_id function on Properties - it made no sense there and is now only available on temporal and constant properties individually.
- rolling and expanding can now accept Interval directly instead of complaining about incompatible Error types in the conversion.
- Fixed a bug where the const properties for edges did not align with the values.
- Materialising an empty graph view now preserves the layer information.
- Fixed a bug where loading from a DataFrame would miss adding edges to the layer adjacency lists
Graphql
Apply views
It can be quite annoying to parse the response from a Raphtory server when you have a use case where nested views are changed arbitrarily, altering the depth of results. As such we have added a new function applyViews which allows you to batch these views into a single call. This function is available on the Graph, GQLNodes, GQLEdges, Edge and Node.
An example of this can be seen below where we apply excludeNodes, before, layers and edgeFilter and then get the properties of exploded edges - in the first screenshot (how you would currently do this) the edges appear 6 objects deep, which would change if we removed one of these filters. In the second screenshot the edges are 3 objects deep and this won't change if we add or remove filters. The results will otherwise be the same.
Sorting in Graphql
Unlike in python or rust where it is easy to sort the edge/node iterators on anything you like, in graphql this was not possible. This meant a lot more client side processing and made it impossible to page results if you want them sorted by say earliest time.
As such we have added sorting functionality to GqlNodes and GqlEdges which allows you to order by time, property value and id (or a prioritised combination of these) before paging/listing. An example of this can be seen below where we are sorting nodes first by a property and then by the latest time.
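The idea of a prioritised combination of sort keys, applied before paging, can be sketched in plain Python. This is not the GraphQL API itself; the field names score and latest_time and the page size are made up for illustration.

```python
nodes = [
    {"name": "a", "score": 2, "latest_time": 10},
    {"name": "b", "score": 1, "latest_time": 30},
    {"name": "c", "score": 2, "latest_time": 5},
    {"name": "d", "score": 1, "latest_time": 20},
]

# Sort by the property first; latest_time only breaks ties between equal scores.
ordered = sorted(nodes, key=lambda n: (n["score"], n["latest_time"]))

# Paging happens after sorting, so each page is a slice of one global ordering.
page_size = 2
first_page = [n["name"] for n in ordered[:page_size]]
second_page = [n["name"] for n in ordered[page_size:2 * page_size]]
```

Sorting on the server matters because a client that only ever sees one page at a time cannot reconstruct a global order on its own.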
Namespaces and Graph metadata
We have added a new namespace API in graphql which allows you to easily explore the graphs which are present within each path, and explore the children and parent of each namespace. This will replace the GQLgraphs api which will be deprecated.
Calling the graph function within a namespace will return a new MetaGraph object which allows you to query information about that graph without loading it - notably the node/edge count, when it was created, and when it was last edited/accessed.
This information is being stored inside the .raph file which will be automatically updated for any graphs you have saved from <0.15.0.
Read write permissions via JWT
We have added a JWT bearer auth layer on top of Raphtory. It works by using an EdDSA public key, which reduces the server's responsibility to only two things:
- Correctly validating JWTs.
- Allowing access only to those resources stated in the JWT.
Preventing leakage of the secret is out of the equation, since Raphtory never has access to the private key that is used to sign JWTs.
Currently we are using this to specify if users can read (accessing all graphs) or write (able to modify all graphs). However, in future versions this will be used to limit users to specific namespaces and possibly information within each graph.
Other changes
- Changed anywhere that was returning a list of Nodes or list of Edges to GQLNodes and GQLEdges respectively. This is so all output can be correctly paged. If you notice anywhere that is not the case, please do raise an issue.
- The in- and out-components were not resetting the one-hop filter correctly - the GQLNodes which are returned now revert to the graph filter and can be layered/windowed differently from the node on which in/out-components was called.
- Added an optional ids argument to the nodes query in GraphQL for getting a subset of the nodes without having to reduce the graph via subgraph.
- Added a new mutation create_subgraph which we use to allow saving of graph views in the open source UI.
- Removed the ability to create RemoteEdge and RemoteNode directly in python; these can now only be obtained from a RemoteGraph
- Fixed a bug causing NaN floats to panic when querying through GraphQL
- Changed the schema queries so they don't eagerly iterate over all nodes in the graph - if a property has more than 100 variants, an empty list is returned to reduce computation.
Algorithms
- The docstrings, method signatures, and return types of many of the algorithms have been standardised as part of the swap to NodeState from AlgorithmResult
- Fixed the order in which nodes are considered in the in- and out-component algorithm so the calculated distances are correct.
- Added integer support to balance algorithm - Previously, edge properties had to be converted to floats. Now ints and floats both work as expected.
- 'clustering_coefficient' is renamed to 'global_clustering_coefficient'. All of the clustering coefficient variants have been moved to a submodule of 'metrics' called 'clustering_coefficient'.
- It was previously extremely inefficient to run LCC (local clustering coefficient) on a group of nodes. The new batch version should do a better job of parallelizing the process and reducing overhead.
- Removed inefficient early-culling code from SCC implementation
- The SCC implementation featured a block of code at the beginning which exhaustively checked which nodes belong to a strongly connected component by performing a BFS and checking if the source node is reachable from itself. As implemented, this was entirely redundant with simply executing Tarjan's SCC algorithm, which it already executed afterwards.
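For readers unfamiliar with why the pre-check was redundant, the following is a minimal sketch of Tarjan's algorithm in plain Python. It is not Raphtory's Rust implementation, and the function name and example graph are illustrative. A single depth-first pass already assigns every node to its component, so a separate per-node reachability search adds no information.

```python
def tarjan_scc(graph):
    """Return the strongly connected components of `graph`.

    `graph` maps each node to a list of its out-neighbours.
    """
    index = {}       # discovery order of each visited node
    lowlink = {}     # smallest index reachable from the node's DFS subtree
    stack = []       # nodes whose component is not yet finalised
    on_stack = set()
    components = []

    def visit(node):
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in graph.get(node, []):
            if nxt not in index:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index[nxt])
        if lowlink[node] == index[node]:
            # `node` is the root of a component: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    return components


graph = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4], 6: []}
components = tarjan_scc(graph)
```

The recursive form is used here for readability; on very deep graphs an iterative version would be needed to avoid Python's recursion limit.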
Documentation
- We have added a huge amount of documentation to python and graphql alon...
0.15-beta
API and Model changes
Property changes for Graph to Parquet
As part of our work to unify the in-memory and on-disk storage models of Raphtory and allow us to save directly to formats such as arrow and parquet we have had to make several changes to the model. These include:
- Restricting Map properties such that for each instance of the map in a history, each key has the same property type.
- Restricting List properties such that all values must be the same type.
- Removing Graphs and PersistentGraph properties.
Through this you can now save to/load from parquet via to_parquet and from_parquet. Once we have improved this slightly and added the ability to stream updates in, we will be deprecating the proto format for saving and moving fully to parquet. This is because loading from proto uses a huge amount of memory and is quite slow.
If any of these changes affect your use case, please reach out and we can assist.
Algorithm Result replaced with NodeState
One of the major roadmap objectives for Raphtory is to standardise all outputs as either a NodeState or EdgeState. These dataframe-like structures make post-processing significantly easier and, as more functionality is added, will allow more complicated pipelines to be optimised automatically by Raphtory, instead of having to swap over to writing a function in rust.
As part of this release we have replaced all instances of AlgorithmResult with NodeState, for example in the output of Pagerank.
These NodeState objects are indexable and have all of the same functionality previously available in the AlgorithmResult.
The only notable change is that Group_by has been renamed to groups, as there is only one value to group on. This returns a NodeGroups which is also indexable.
Fixing Persistent Graph semantics
- Changed the semantics for edge deletions without a corresponding addition so that they are only considered as an instantaneous event (the edge does not exist before or after)
- Fixed bug where property values for exploded edges were incorrect for the PersistentGraph
- Cleaned up semantics for earliest and latest time on edges accordingly
- Multiple updates at the start of the window are now handled properly
- No more spurious exploded edges if there is an update at the start of the window
Smaller changes/fixes
- Fixed an issue where contains and keys were giving inconsistent results for edge properties, leading to a panic
from raphtory import Graph

g = Graph()
g.add_edge(0, 1, 2, layer="a")
g.add_edge(0, 1, 2)
g.edge(1, 2).add_constant_properties({"test": 1})
constant_exploded = g.layer("a").edges.explode().properties.constant.values() # used to panic here!
- Unified the logic between update_constant_properties and add_constant_properties on edges to make sure that the edge actually exists in the layer that the constant properties are being added to.
- Alongside this unification, if an edge has no temporal updates for one of its layers within a given window, it will now be correctly filtered out of the view - this was previously not happening if that layer had constant properties.
- Fixed a bug where adding empty temporal updates to graph properties incorrectly affected the earliest/latest time
- Removed the get_by_id function on Properties - it made no sense there and is now only available on temporal and constant properties individually.
- rolling and expanding can now accept Interval directly instead of complaining about incompatible Error types in the conversion.
- Fixed a bug where the const properties for edges did not align with the values.
- Materialising an empty graph view now preserves the layer information.
- Fixed a bug where loading from a DataFrame would miss adding edges to the layer adjacency lists
Graphql
Apply views
It can be quite annoying to parse the response from a Raphtory server when you have a use case where nested views are changed arbitrarily, altering the depth of results. As such we have added a new function applyViews which allows you to batch these views into a single call. This function is available on the Graph, GQLNodes, GQLEdges, Edge and Node.
An example of this can be seen below where we apply excludeNodes, before, layers and edgeFilter and then get the properties of exploded edges - in the first screenshot (how you would currently do this) the edges appear 6 objects deep, which would change if we removed one of these filters. In the second screenshot the edges are 3 objects deep and this won't change if we add or remove filters. The results will otherwise be the same.
Sorting in Graphql
Unlike in python or rust where it is easy to sort the edge/node iterators on anything you like, in graphql this was not possible. This meant a lot more client side processing and made it impossible to page results if you want them sorted by say earliest time.
As such we have added sorting functionality to GqlNodes and GqlEdges which allows you to order by time, property value and id (or a prioritised combination of these) before paging/listing. An example of this can be seen below where we are sorting nodes first by a property and then by the latest time.
Other changes
- Changed anywhere that was returning a list of Nodes or list of Edges to GQLNodes and GQLEdges respectively. This is so all output can be correctly paged. If you notice anywhere that is not the case, please do raise an issue.
- The in- and out-components were not resetting the one-hop filter correctly - the GQLNodes which are returned now revert to the graph filter and can be layered/windowed differently from the node on which in/out-components was called.
- Added an optional ids argument to the nodes query in GraphQL for getting a subset of the nodes without having to reduce the graph via subgraph.
- Added a new mutation create_subgraph which we use to allow saving of graph views in the open source UI.
- Removed the ability to create RemoteEdge and RemoteNode directly in python; these can now only be obtained from a RemoteGraph
- Fixed a bug causing NaN floats to panic when querying through GraphQL
- Changed the schema queries so they don't eagerly iterate over all nodes in the graph - if a property has more than 100 variants, an empty list is returned to reduce computation.
Algorithms
- The docstrings, method signatures, and return types of many of the algorithms have been standardised as part of the swap to NodeState from AlgorithmResult
- Fixed the order in which nodes are considered in the in- and out-component algorithm so the calculated distances are correct.
- Added integer support to balance algorithm - Previously, edge properties had to be converted to floats. Now ints and floats both work as expected.
- 'clustering_coefficient' is renamed to 'global_clustering_coefficient'. All of the clustering coefficient variants have been moved to a submodule of 'metrics' called 'clustering_coefficient'.
- It was previously extremely inefficient to run LCC (local clustering coefficient) on a group of nodes. The new batch version should do a better job of parallelizing the process and reducing overhead.
- Removed inefficient early-culling code from SCC implementation
- The SCC implementation featured a block of code at the beginning which exhaustively checked which nodes belong to a strongly connected component by performing a BFS and checking if the source node is reachable from itself. As implemented, this was entirely redundant with simply executing Tarjan's SCC algorithm, which it already executed afterwards.
Documentation
- We have added a huge amount of documentation to python and graphql alongside improvements to the stub generator to let us know what is missing. There are currently warnings everywhere, as there is still a lot to add, but this should make it much easier to manage moving forward.
- We have turned the stub generator into a python package that can be installed for use with other projects - This will probably be released to pypi soon.
Vector APIs
- Added default document templates as having default templates is a first step towards a smart search view on the open source UI.
- Update vector API (on the server as well) to allow choosing between using the default template, a custom one, or nothing at all, for each of the three types of entities
- Fixed a bug that allowed subgraphs to contain the same node more than once
- Reviewed public API to stick to temporal_props / constant_props naming convention
Optimisations and misc
- Started work on several known issues when iterating over edges - still much to do, but should be noticeably faster now.
- Calling edges on a subgraph should no longer iterate over all edges in the entire graph to apply the subgraph filter.
- Now using DoubleEndedIterator to get the last value in node temporal properties.
- Fix the optimisation that checks if the window is actually a constraint to look at the underlying storage, not the wrapped view (which is both potentially slow and incorrect). This increases performance notably for nested windows.
- Fixed GIL deadlock when ...
v0.14.0
Cached View
We have added a new function .cache_view which builds a lightweight index of the nodes and edges present in the current view (i.e. when you have applied a window/layer filter etc). If you are running any global algorithms or analytical pipelines over views, this will make your analysis drastically faster!
Example:
from raphtory import Graph
from raphtory import algorithms as rp

g = Graph()
# add some updates
for windowed_graph in g.rolling("1 day"):
    cached = windowed_graph.cache_view()  # we are going to run several algorithms, so build an index
    rp.weakly_connected_components(cached)
    rp.pagerank(cached)
Node and edge filter view
We have added new views for the filtering of Nodes and Edges based upon property values. This includes checking:
- if a property exists/doesn't exist
- if the property value is less than/greater than/equal to a given argument
- if the property value is in/not in a list of given arguments.
Note the edge filters are currently disabled for PersistentGraph whilst we confirm there are no missing corner cases.
Python example:
from raphtory import Graph, Prop

g = Graph()
# add some updates
g.filter_edges(Prop("test_int") > 2)
g.filter_exploded_edges(Prop("test_str") != "first")
g.filter_nodes(Prop("node_bool").is_some())
g.filter_nodes(Prop("node_int") in [2,2,4])
Graphql example:
graph(path: "g") {
  nodes {
    nodeFilter(
      property: "prop1",
      condition: {
        operator: ANY,
        value: [10, 30, 50, 70]
      }
    ) {
      list {
        name
      }
    }
  }
}
Create Node
Added a create_node function which works exactly the same as add_node but will fail if the node is already in the graph. This is mostly useful in Graphql, where it is harder to first check if a node exists, but has been exposed in python as well.
Example:
from raphtory import Graph
g = Graph()
g.create_node(1,1) #Returns fine
g.create_node(1,1) #Throws an exception
g.add_node(1,1) #Returns fine
Import as
Added a set of import_as functions which allow renaming of nodes and edges when importing from one graph into another.
Example:
from raphtory import Graph
g1 = Graph()
a = g1.add_node(1, "A") #create node A in graph1
g2 = Graph()
g2.import_node_as(a, "X") # import A into graph2 as X - this brings all updates and properties as well
e = g1.add_edge(1,"A","B") # add edge A->B to graph1
g2.import_edge_as(e,("X","Y")) #import edge A->B into graph2 as X->Y - this brings all updates and properties with it
Python
- When using the Property APIs with any numerical properties Raphtory will now return numpy arrays instead of python lists. This is better for memory usage, faster to hand over from rust, and means aggregations etc are a lot more straightforward.
- Exposed the secondary time index, allowing management of updates which occur at the same time.
- Changed Graph.add_property to Graph.add_properties to bring it in line with other APIs.
- Fixed a bug in the repr where we were printing the wrong edge info (#1808)
- Added wrappers for constructing vecs from any python iterable, meaning Nodes and Edges can be handed over to import functions directly without collecting.
Algorithms
- Added FastRP based on "Fast and Accurate Network Embeddings via Very Sparse Random Projection" by Haochen Chen, et al.
- Added maximum-weighted matching based on "Efficient Algorithms for Finding Maximum Matching in Graphs" by Zvi Galil, et al.
- Changed the return of in-component and out-component to include the distance from the starting node.
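As a rough picture of what returning the distance from the starting node involves, here is a plain-Python breadth-first sketch of an in-component search. It is not Raphtory's implementation: the function name, the edge-list input and the choice to leave the starting node out of the result are all assumptions made for illustration.

```python
from collections import deque


def in_component_with_distance(edges, start):
    """Map every node that can reach `start` to its hop distance from it.

    `edges` is an iterable of (src, dst) pairs; the search walks edges backwards.
    """
    predecessors = {}
    for src, dst in edges:
        predecessors.setdefault(dst, []).append(src)

    distances = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for pred in predecessors.get(node, []):
            if pred not in seen:
                seen.add(pred)
                distances[pred] = dist + 1
                queue.append((pred, dist + 1))
    return distances


edges = [("a", "b"), ("b", "c"), ("d", "c"), ("c", "e")]
result = in_component_with_distance(edges, "c")
```

Processing nodes in breadth-first order is what guarantees each recorded distance is the shortest one; an out-component search is the same walk following edges forwards.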
UI updates
- We have added a Saved graphs page which enables you to open whole graphs and get some top level statistics on each of the graphs on your server. An example of this can be seen below.
- A whole heap of small bug fixes! We have noted several more (thank you everyone that is reporting them) and shall be blasting through them over the coming weeks before Christmas.
GraphQL
- Added the edge ID function which returns the names of the source and destination as an array.
- Added explode and explode_layers onto the edges object.
- Added all node property filters to graphql - examples of these can be found here.
- Added the namespace function onto graph/graphql to allow easier grouping by path.
- Removed the ability to create RemoteGraph directly; this can now only be done through the client
Core-Raphtory
- Made lazy node state support time ops and layer ops. This allows you to e.g. get a windowed degree for all nodes in the graph. This is a step towards our new NodeState APIs which should be complete soon.
- Exposed several low level APIs to make writing raphtory extensions easier.
- Subgraphs creation is now faster as we no longer need to build a hashset. Counting nodes should also be much faster now as well.
- Made the inner rust value accessible on python NodeState and LazyNodeState wrappers.
- Exposed parquet_loaders in rust.
- Updated our pyo3 version for python bindings to the new APIs.
- Removed snmalloc as the build started to fail due to some unknown upstream dependency.
Python Documentation
- Drastically improved the stub generation for hints within python IDEs
- Fixed many missing types/doc strings, incorrect/confusing descriptions
- Added warning for missing docs (still some to fix, but will mean in future we can fix a lot quicker)
Datasets
- Added some properties to the LOTR data for the basic graphRAG example.
What's Changed
- Py speedup1 by @fabianmurariu in #1840
- install rustup + cargo when generating readthedocs by @fabianmurariu in #1846
- Fix existing rust Dockerfile by @ricopinazo in #1844
- Adding initial docker files by @miratepuffin in #1836
- fix docker by @shivam-880 in #1849
- fix docker release by @shivam-880 in #1851
- Fix/workflow by @shivam-880 in #1852
- Update/pyo3 by @ljeub-pometry in #1847
- Make load edges pub in parquet_loaders.rs by @Alnaimi- in #1843
- Node property filters by @ljeub-pometry in #1830
- Feature/graphqlfunctions by @rachchan in #1853
- remove snmalloc by @fabianmurariu in #1856
- max weight matching by @miratepuffin in #1602
- Feature/create node by @shivam-880 in #1855
- Update pull_request_template.md by @miratepuffin in #1858
- add wrapper for constructing vec from any python iterable by @ljeub-pometry in #1862
- Sparse Node temporal props by @fabianmurariu in #1848
- impl filters and tests by @shivam-880 in #1857
- Feature/node state ops by @ljeub-pometry in #1854
- Feature/import as by @shivam-880 in #1859
- impl edge id for graphql and add test by @shivam-880 in #1868
- add fast_rp algorithm by @wyatt-joyner-pometry in #1867
- Fix iconify icons by @ricopinazo in #1863
- improve subgraph count_nodes performance by @ljeub-pometry in #1869
- Add UI section to README.md by @Alnaimi- in #1872
- fix issue with edge repr multiple layer by @shivam-880 in #1870
- no reason to make a Hashset when building a subgraph anymore by @ljeub-pometry in #1874
- update graphql ui by @ricopinazo in #1876
- Various improvements for disk graph by @fabianmurariu in #1866
- Add distance from starting node for in- and out-components by @ljeub-pometry in #1877
- Features/py sec indices by @shivam-880 in #1875
- make the inner rust value accessible on python NodeState and LazyNodeState wrappers by @ljeub-pometry in #1878
- add lotr_graph_with_props function by @ricopinazo in #1881
- Feature/more public apis by @ljeub-pometry in #1879
- Fix stubs with make tidy before release by @miratepuffin in #1880
- Release v0.14.0 by @github-actions in #1865
- Disable auto docker publish by @miratepuffin in #1882
New Contributors
- @wyatt-joyner-pometry made their first contribution in #1867
Full Changelog: v0.13.1...v0.14.0
v0.13.1
What's Changed
- GrapQL improvements for disk graph by @ricopinazo in #1824
- Support multiple layers for disk storage by @fabianmurariu in #1817
- GraphQL optional indexing by @ricopinazo in #1827
- Snapshot at/latest by @ricopinazo in #1832
- more stub cleanup to reduce the number of type errors in tests by @ljeub-pometry in #1834
- exclude LayeredGraph from EdgeFilterOps by @fabianmurariu in #1828
- Embed GraphQL playground into Raphtory UI by @ricopinazo in #1838
- Release v0.13.1 by @github-actions in #1841
Full Changelog: v0.13.0...v0.13.1
v0.13.0
UI Alpha
- We have released the first version of the Raphtory UI. This should work for any graph that you host within your GraphServer and is available at / by default. The graphql playground has been moved to /playground.
- We have many more plans for this UI, but in the meantime if you notice it isn't handling your data correctly, or you find a bug, please report an issue and we shall get it fixed.
- Below is an example of the UI with the Lord of the Rings graph loaded:
Small tweaks
- The python doc stubs now error when the return type is incorrect - all current errors have been fixed. We will start to enable more warnings and tidy these up fully over the coming releases.
- PyDirection is no more and direction arguments now take strings as input directly (the only way to construct a PyDirection was via passing in a string anyway, so this seemed entirely confusing and useless).
- Added layers to the edge repr to show what layers an edge/exploded edge is present in.
Bug fixes
- to_df in AlgorithmResult no longer returns internal ids
- Graph.edges.explode().to_df() is now equivalent to Graph.edges.to_df(explode=True), in particular the history is no longer duplicated for each exploded edge.
- The EmbeddingFunction was changed to return a Result to be able to bubble up errors instead of panicking. These changes were propagated all the way up.
- Path inputs in python now use PathBuf instead of String, removing a host of annoying issues, especially on Windows.
What's Changed
- make EmbeddingFunction return a result instead of panicking by @ricopinazo in #1806
- Use PathBuf for python path input by @ljeub-pometry in #1813
- Load const props by @fabianmurariu in #1811
- expose encode graph by @shivam-880 in #1812
- add support to exclude edge temp properties on import by @fabianmurariu in #1814
- Edge repr layers by @narnolddd in #1809
- type annotations in stubs created from docs by @ljeub-pometry in #1815
- GraphQL UI by @ricopinazo in #1816
- fix the to_df in AlgorithmResult and Edges by @ljeub-pometry in #1820
- Release v0.13.0 by @github-actions in #1823
Full Changelog: v0.12.1...v0.13.0
v0.12.1
Release v0.12.1
- Publish to crates.io
- Publish to PyPi
- Make Tag
- Release to Github
- Auto-generated by [create-pull-request] triggered by release action [1]
[1]: https://github.com/peter-evans/create-pull-request
v0.12.0
Obvious breaking changes
In our efforts to better support indexes over properties and vector representations of the graph we have changed the on-disk representation of a Raphtory graph to a folder. Within this folder we can store the graph itself, any vectors, indexes, metadata, etc. required to simplify the transfer of a graph between your machine and a GraphServer, or between yourself and colleagues working on the same data.
As such the function save_to_file() will now produce a folder containing this new structure. If you would like to continue having a single file (for purposes of transfer or ease) you can instead call save_to_zip(). This zip can be read directly by Raphtory when you call load_from_file(), so don't worry about having to unzip later.
New Vector APIs and integration with the GraphServer
- We have updated vector query APIs as per #1713. The new vector context makes it much easier to query the nodes/edges/graphs by both similarity and structural elements such as neighbours. We have also tried to make it a lot clearer what each function is bringing into the context.
- The embedding function is now set globally for the GraphServer and the conversion between graph/nodes/edges -> Document is now specified via jinja templates. This is to make it possible to store a vectorised Graph on disk.
- Vectors are now updated when a node/edge is updated.
Algorithms
- In_components and out_components have been optimised to do the minimal number of checks before returning a result.
- In_component and out_component have been added for when you only want to get the component for an individual node.
Graphql
- In_component and out_component have been made available on Node within Graphql - this returns a vec of Node objects allowing you to get metadata/properties of the nodes within this component.
- A generated Schema is now available via Graphql to see the types of all properties for both nodes and edges.
- We have drastically simplified the plugin APIs for the GraphServer, and now allow both custom mutations and queries. An example of this can be seen here: https://github.com/Pometry/Raphtory/tree/master/examples/custom-gql-apis.
- Added open telemetry tracing to the GraphServer, allowing you to track the speed of all raphtory queries.
- Added better logging throughout the GraphServer.
Edge filtering Alpha
- We have released an alpha of edge property filtering - this allows you to filter either whole edges or individual updates within edges (exploded edges) in a variety of useful ways.
- This is currently limited to the EventGraph whilst we fix some semantics for the PersistentGraph. If you give them a go, please let us know if you notice anything odd or unexpected.
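The difference between filtering whole edges and filtering exploded edges can be sketched in plain Python. This is an illustration only and does not use Raphtory's filter API: the function names are hypothetical, and judging a whole edge by its most recent value is an assumption of this sketch.

```python
def filter_exploded_edges(edges, predicate):
    """Keep only the individual updates whose value passes the predicate.

    Edges left with no matching updates disappear from the result.
    """
    kept = {}
    for key, updates in edges.items():
        matching = [(t, v) for t, v in updates if predicate(v)]
        if matching:
            kept[key] = matching
    return kept


def filter_edges(edges, predicate):
    """Keep or drop each edge as a whole, with its full history.

    Assumption for this sketch: an edge is judged by its latest value.
    """
    kept = {}
    for key, updates in edges.items():
        if not updates:
            continue
        _, latest_value = max(updates, key=lambda update: update[0])
        if predicate(latest_value):
            kept[key] = list(updates)
    return kept


# (src, dst) -> list of (time, weight) updates
edges = {
    ("a", "b"): [(1, 5.0), (2, 20.0)],
    ("b", "c"): [(1, 30.0), (3, 2.0)],
    ("c", "d"): [(4, 1.0)],
}


def heavy(weight):
    return weight > 10


exploded = filter_exploded_edges(edges, heavy)
whole = filter_edges(edges, heavy)
```

Here the exploded filter keeps one update on each of a-b and b-c, while the whole-edge filter keeps only a-b, with both of its updates.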
Latest and is_active
- We have added a latest() function to the graph, node and edge. This is the equivalent of doing x.at(graph.latest_time). This isn't a massive issue in rust/python, but is very helpful in graphql, where you would have to do an initial query to get the latest_time.
- We have exposed an is_active() function to nodes and edges, allowing you to check if they have any updates within the current window. This is very useful if you are calling rolling or expanding on a node/edge.
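The use of is_active() with rolling windows can be sketched without Raphtory. The NodeHistory class and rolling helper below are hypothetical stand-ins, and the half-open window convention [start, end) is an assumption of this sketch. It covers only the activity check, not latest().

```python
from bisect import bisect_left


class NodeHistory:
    """Hypothetical stand-in for a node: just its sorted update times."""

    def __init__(self, times):
        self.times = sorted(times)

    def is_active(self, start, end):
        """True if at least one update falls in the window [start, end)."""
        i = bisect_left(self.times, start)
        return i < len(self.times) and self.times[i] < end


def rolling(earliest, latest, window):
    """Yield consecutive, non-overlapping windows covering earliest..latest."""
    start = earliest
    while start <= latest:
        yield start, start + window
        start += window


node = NodeHistory([1, 2, 11])

# Skip windows in which the node has no updates at all.
active_windows = [
    (start, end)
    for start, end in rolling(0, 14, 5)
    if node.is_active(start, end)
]
```

Checking activity first means per-window work, such as computing a metric, is only done for windows where the node or edge actually changed.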
Bug fixes and performance improvements
- Floats are now supported as timestamps within python.
- Fixed an issue in the motif algorithms where self loops were not being correctly handled.
- Parallelised reading from saved graphs and generation of new graphs with materialise.
- Fixed the constant properties function in graphql as it was not set to async.
- Fixed several 'off-by-one' errors in the boundary checks for node/edge window inclusion within the PersistentGraph.
- Added the event_graph function to Graph and persistent_graph function to PersistentGraph - these are basically just NoOps, but make it so in python you can call them without knowing what type of graph you currently have.
Commits
- New vector API by @ricopinazo in #1717
- Bool properties and test_graphdb changes by @fabianmurariu in #1767
- Fix node types returned from graphql being empty by @louisch in #1775
- Motif self loops by @narnolddd in #1777
- fix/gql add new apis by @shivam-880 in #1750
- Feature/materialise improvement by @ljeub-pometry in #1773
- Graphql logging by @miratepuffin in #1746
- Implement edge property filtering by @ljeub-pometry in #1781
- Persistent Graph boundry fix by @miratepuffin in #1785
- Feature/latest and active by @miratepuffin in #1786
- impl create update graph gql apis as plugins by @shivam-880 in #1784
- fix issues with casting time columns by @fabianmurariu in #1793
- More edge property filters by @ljeub-pometry in #1783
- GraphQL vector updates by @ricopinazo in #1778
- add float timestamp support in python by @ljeub-pometry in #1794
- impl schema node edge prop type by @shivam-880 in #1789
- Updating request due to issue in quinn-proto (CVE-2024-45311) by @miratepuffin in #1797
- In_component out_component by @miratepuffin in #1790
- Release v0.12.0 by @github-actions in #1798
Full Changelog: v0.11.3...v0.12.0
v0.11.3
Parallel python loaders
Through some elegant dancing around locks, the pandas and parquet loaders now ingest into Raphtory’s underlying graph shards with minimal contention between threads. This has led to an order of magnitude improvement in ingestion speed in several of our use cases!
For example, the 129 million edges of the Graph500 SF23 dataset are ingested in 25 seconds on a laptop!
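One general way to keep lock contention low when loading into shards is to bucket rows by shard first, so each worker takes a shard's lock once per batch rather than once per row. The sketch below shows that pattern only. It is not Raphtory's Rust implementation: the ShardedEdgeStore class and the modulo sharding rule are assumptions, and Python threads will not reproduce the reported speed-up.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class ShardedEdgeStore:
    """Hypothetical edge store split into independently locked shards."""

    def __init__(self, num_shards):
        self.shards = [defaultdict(list) for _ in range(num_shards)]
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def shard_of(self, src):
        # Assumed sharding rule: integer source id modulo shard count.
        return src % len(self.shards)

    def load(self, edges, num_workers=4):
        """Ingest (src, dst, time) rows and return how many were written."""
        # Step 1: bucket rows by target shard (no locks needed here).
        buckets = defaultdict(list)
        for src, dst, t in edges:
            buckets[self.shard_of(src)].append((src, dst, t))

        # Step 2: each task owns one shard and locks it once per batch.
        def write_bucket(item):
            shard_id, rows = item
            with self.locks[shard_id]:
                shard = self.shards[shard_id]
                for src, dst, t in rows:
                    shard[(src, dst)].append(t)
            return len(rows)

        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            return sum(pool.map(write_bucket, buckets.items()))


store = ShardedEdgeStore(num_shards=8)
written = store.load([(i, i + 1, i) for i in range(1000)])
```

Because every bucket maps to exactly one shard, no two tasks ever wait on the same lock within a single load call.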
Other minor bug fixes
- Don't drop the update in the Cache if the write fails by @ljeub-pometry in #1745
- fix typo in cdf and ccdf functions by @narnolddd in #1748
- More cache writer error handling improvements by @ljeub-pometry in #1747
- fix latest_time for exploded edge on PersistentGraph for edges that are deleted at the same time as created by @ljeub-pometry in #1752
- implementation of temporal rich club by @narnolddd in #1692
- Write-locked graph storage implementation for faster bulk loaders by @ljeub-pometry in #1741
- Release v0.11.3 by @github-actions in #1753
Full Changelog: v0.11.2...v0.11.3
v0.11.2
Release v0.11.2
- Publish to crates.io
- Publish to PyPi
- Make Tag
- Release to Github
- Auto-generated by [create-pull-request] triggered by release action [1]
[1]: https://github.com/peter-evans/create-pull-request