A high-performance graph database library with Python bindings written in Rust.
- Installation
- Introduction
- Basic Usage
- Working with Nodes
- Creating Connections
- Filtering and Querying
- Traversing the Graph
- Statistics and Calculations
- Saving and Loading
- Performance Tips
pip install rusty-graph
# upgrade
pip install rusty-graph --upgrade
Rusty Graph is a Rust-based project that aims to empower the generation of high-performance knowledge graphs within Python environments. Specifically designed for aggregating and merging data from SQL databases, Rusty Graph facilitates the seamless transition of relational database information into structured knowledge graphs. By leveraging Rust's efficiency and Python's flexibility, Rusty Graph offers an optimal solution for data scientists and developers looking to harness the power of knowledge graphs in their data-driven applications.
- Efficient Data Integration: Easily import and merge data from SQL databases to construct knowledge graphs, optimizing for performance and scalability.
- High-Performance Operations: Utilize Rust's performance capabilities to handle graph operations, making Rusty Graph ideal for working with large-scale data.
- Python Compatibility: Directly integrate Rusty Graph into Python projects, allowing for a smooth workflow within Python-based data analysis and machine learning pipelines.
- Flexible Graph Manipulation: Create, modify, and query knowledge graphs with a rich set of features, catering to complex data structures and relationships.
import rusty_graph
import pandas as pd
# Create a new knowledge graph
graph = rusty_graph.KnowledgeGraph()
# Create some data using pandas
users_df = pd.DataFrame({
'user_id': [1001, 1002, 1003],
'name': ['Alice', 'Bob', 'Charlie'],
'age': [28, 35, 42]
})
# Add nodes to the graph
graph.add_nodes(
data=users_df,
node_type='User',
unique_id_field='user_id',
node_title_field='name'
)
# View graph schema
print(graph.get_schema())
# Add products to graph
products_df = pd.DataFrame({
'product_id': [101, 102, 103],
'title': ['Laptop', 'Phone', 'Tablet'],
'price': [999.99, 699.99, 349.99],
'stock': [45, 120, 30]
})
graph.add_nodes(
data=products_df,
node_type='Product',
unique_id_field='product_id',
node_title_field='title',
# Optional: specify which columns to include
columns=['product_id', 'title', 'price', 'stock', 'category'],
# Optional: how to handle conflicts with existing nodes
conflict_handling='update' # Options: 'update', 'replace', 'skip', 'preserve'
)
# Get all products
products = graph.type_filter('Product')
# Get node information
product_nodes = products.get_nodes()
print(product_nodes)
# Get specific properties
prices = products.get_properties(['price', 'stock'])
print(prices)
# Get only titles
titles = products.get_titles()
print(titles)
# Purchase data
purchases_df = pd.DataFrame({
'user_id': [1001, 1001, 1002],
'product_id': [101, 103, 102],
'date': ['2023-01-15', '2023-02-10', '2023-01-20'],
'quantity': [1, 2, 1]
})
# Create connections
graph.add_connections(
data=purchases_df,
connection_type='PURCHASED',
source_type='User',
source_id_field='user_id',
target_type='Product',
target_id_field='product_id',
# Optional additional fields to include
columns=['date', 'quantity']
)
# Create connections from currently selected nodes
users = graph.type_filter('User')
products = graph.type_filter('Product')
# This would connect all users to all products with a 'VIEWED' connection
users.selection_to_new_connections(connection_type='VIEWED')
# Filter by exact match
expensive_products = graph.type_filter('Product').filter({'price': 999.99})
# Filter using operators
affordable_products = graph.type_filter('Product').filter({
'price': {'<': 500.0}
})
# Multiple conditions
popular_affordable = graph.type_filter('Product').filter({
'price': {'<': 500.0},
'stock': {'>': 50}
})
# In operator
selected_products = graph.type_filter('Product').filter({
'product_id': {'in': [101, 103]}
})
Orphan nodes are nodes that have no connections (no incoming or outgoing edges). You can filter to include or exclude orphan nodes:
# Get only orphan nodes
orphans = graph.filter_orphans(include_orphans=True)
# Get only nodes that have at least one connection
connected = graph.filter_orphans(include_orphans=False)
# Filter orphans with sorting and limits
recent_orphans = graph.filter_orphans(
include_orphans=True,
sort_spec='created_date',
max_nodes=100
)
# Chain with other operations
product_orphans = graph.type_filter('Product').filter_orphans(include_orphans=True)
Rusty Graph offers flexible options for sorting nodes based on their properties. The sort_spec
parameter can be used in various methods including type_filter()
, filter()
, filter_orphans()
, traverse()
, and the standalone sort()
method.
-
Single field string: Sorts by the specified field in ascending order.
# Sort products by price (lowest to highest) sorted_products = graph.type_filter('Product').sort('price') # Can also be used in other methods cheap_products = graph.type_filter('Product').filter( {'stock': {'>': 10}}, sort_spec='price' )
-
Field with direction: Explicitly specify ascending or descending order.
# Sort products by price (highest to lowest) expensive_first = graph.type_filter('Product').sort('price', ascending=False)
-
List of tuples: For multi-field sorting with different directions.
# First sort by stock (descending), then by price (ascending) # This prioritizes high-stock items, and for items with equal stock, # shows the cheapest ones first complex_sort = graph.type_filter('Product').sort([ ('stock', False), # False = descending order ('price', True) # True = ascending order ])
-
Dictionary with field and direction: Alternative format for single field sorting.
# Sort by rating in descending order top_rated = graph.type_filter('Product').filter( {}, sort_spec={'field': 'rating', 'ascending': False} )
Sort specifications work consistently across methods:
# In type_filter
latest_users = graph.type_filter('User', sort_spec='creation_date', max_nodes=10)
# In filter
new_expensive = graph.type_filter('Product').filter(
{'price': {'>': 500.0}},
sort_spec=[('creation_date', False), ('price', True)]
)
# In traversal
alice_recent_purchases = graph.type_filter('User').filter({'name': 'Alice'}).traverse(
connection_type='PURCHASED',
sort_target='date',
max_nodes=5
)
# In filter_orphans
recent_orphans = graph.filter_orphans(
include_orphans=True,
sort_spec='last_modified',
max_nodes=20
)
# In children_properties_to_list
expensive_products = graph.type_filter('User').traverse('PURCHASED').children_properties_to_list(
property='title',
sort_spec='price', # Sort children by price before creating the list
max_nodes=3,
store_as='top_expensive_purchases'
)
# Get at most 5 nodes per group
limited_products = graph.type_filter('Product').max_nodes(5)
# Find products purchased by a specific user
alice = graph.type_filter('User').filter({'name': 'Alice'})
alice_products = alice.traverse(
connection_type='PURCHASED',
direction='outgoing'
)
# Access the resulting products
alice_product_data = alice_products.get_nodes()
# Filter the traversal target nodes
expensive_purchases = alice.traverse(
connection_type='PURCHASED',
filter_target={'price': {'>=': 500.0}},
sort_target='price',
max_nodes=10
)
# Get connection information
connection_data = alice.get_connections(include_node_properties=True)
# Get statistics for a property
price_stats = graph.type_filter('Product').statistics('price')
print(price_stats)
# Calculate unique values
unique_categories = graph.type_filter('Product').unique_values(
property='category',
# Store result in node property
store_as='category_list',
max_length=10
)
# Convert children properties to a comma-separated list in parent nodes
# Option 1: Store results in parent nodes
users_with_products = graph.type_filter('User').traverse('PURCHASED').children_properties_to_list(
property='title', # Default is 'title' if not specified
filter={'price': {'<': 500.0}}, # Optional filtering of children
sort_spec='price', # Optional sorting of children
max_nodes=5, # Optional limit of children per parent
store_as='purchased_products', # Property name to store the list in parent
max_length=100, # Optional maximum string length (adds "..." if truncated)
keep_selection=False # Whether to keep the current selection
)
# Option 2: Get results as a dictionary without storing them
product_names = graph.type_filter('User').traverse('PURCHASED').children_properties_to_list(
property='title',
sort_spec='price',
max_nodes=5
)
print(product_names) # Returns {'User1': 'Product1, Product2', 'User2': 'Product3, Product4, Product5'}
# Simple calculation: tax inclusive price
with_tax = graph.type_filter('Product').calculate(
expression='price * 1.1',
store_as='price_with_tax'
)
# Aggregate calculations per group
user_spending = graph.type_filter('User').traverse('PURCHASED').calculate(
expression='sum(price * quantity)',
store_as='total_spent'
)
# Count operations
products_per_user = graph.type_filter('User').traverse('PURCHASED').count(
store_as='product_count',
group_by_parent=True
)
# Save graph to file
graph.save("my_graph.bin")
# Load graph from file
loaded_graph = rusty_graph.load("my_graph.bin")
-
Batch Operations: Add nodes and connections in batches rather than individually.
-
Specify Columns: When adding nodes or connections, explicitly specify which columns to include to reduce memory usage.
-
Use Indexing: Filter on node type first before applying other filters.
-
Avoid Overloading: Keep node property count reasonable; too many properties per node will increase memory usage.
-
Conflict Handling: Choose the appropriate conflict handling strategy:
- Use
'update'
to merge new properties with existing ones - Use
'replace'
for a complete overwrite - Use
'skip'
to avoid any changes to existing nodes - Use
'preserve'
to only add missing properties
- Use
-
Connection Direction: Specify direction in traversals when possible to improve performance.
-
Limit Results: Use
max_nodes()
to limit result size when working with large datasets.