PR: Refactor GraphDB and Update ML Dependency Handling #3

lllangWV · 2025-05-08T18:02:58Z

Summary:

This pull request introduces significant refactoring to the GraphDB module and updates how machine learning dependencies are managed.

Changes:

GraphDB Refactoring (Commit 149b285):

The generalized GraphDB module has been moved into the parquetdb package and renamed to ParquetGraphDB.

MatGraphDB has been updated to act as a wrapper around the new ParquetGraphDB.

Necessary adjustments have been made throughout the codebase to accommodate these structural changes.
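As a rough illustration of the wrapper relationship described above, the sketch below uses two hypothetical stand-in classes. `BaseGraphStore` and `MaterialGraph` are invented names, not the ParquetGraphDB or MatGraphDB APIs. The PR does not say whether the real wrapper uses inheritance or composition, so inheritance here is an assumption.

```python
class BaseGraphStore:
    """Hypothetical stand-in for a general-purpose graph store (not the parquetdb API)."""

    def __init__(self, storage_path):
        self.storage_path = storage_path
        self.node_types = {}  # maps node type name -> list of records

    def add_node_type(self, name):
        # Register the node type if it is new and return its record list.
        return self.node_types.setdefault(name, [])


class MaterialGraph(BaseGraphStore):
    """Hypothetical domain wrapper: adds a material-specific default, inherits the rest."""

    def __init__(self, storage_path):
        super().__init__(storage_path)
        # The domain layer only contributes domain defaults; the generic
        # storage logic stays in the base class.
        self.add_node_type("material")
```

The point of the layout is that generic graph storage lives in one place, while the domain class stays thin.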

ML Dependency Management Update:

The torch library has been removed from the core dependencies listed in pyproject.toml.

A new optional dependency group, [ml], has been created in pyproject.toml to house machine learning-specific packages like torch.

The README.md file has been updated with detailed instructions on how to install these ML dependencies, including guidance for both CPU and GPU environments. This change is intended to better organize project dependencies and offer clearer setup instructions for users leveraging ML functionalities.
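Once torch is optional, code that needs it typically guards the import. The following is a minimal sketch of one common pattern, not the code in this PR. The helper names `ml_available` and `require_ml` are hypothetical, and the install hint assumes the extra is installed as `matgraphdb[ml]`; the README remains the authority for CPU and GPU specifics.

```python
import importlib.util


def ml_available(module_name: str = "torch") -> bool:
    """Return True if the optional top-level module can be found."""
    return importlib.util.find_spec(module_name) is not None


def require_ml(module_name: str = "torch"):
    """Import and return an optional ML module, or raise a helpful error."""
    if not ml_available(module_name):
        raise ImportError(
            f"'{module_name}' is not installed. It belongs to the optional "
            "ML dependencies; try: pip install matgraphdb[ml]"
        )
    return importlib.import_module(module_name)
```

ML-only code paths can call `require_ml()` at the point of use, so users without the extra can still import the rest of the package.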

Purpose:

To improve the modularity and organization of the GraphDB components by centralizing the core Parquet-based graph database logic.
To provide a more flexible and explicit way of managing machine learning dependencies, preventing unnecessary installation of heavy libraries for users who do not require ML features.
To enhance the user experience by providing clearer installation instructions for different environments.

…arquetGraphDB. Moved the generalized GraphDB module to the parquetdb package as ParquetGraphDB. MatGraphDB should just be a wrapper around this. Made subsequent changes throughout the package.

Removed 'torch' from the main dependencies in pyproject.toml and added a new optional '[ml]' dependency group for machine learning packages. Updated README.md to include installation instructions for the ML dependencies, clarifying the process for both CPU and GPU installations. This change improves the organization of dependencies and enhances user guidance for ML setups.

Moved MaterialStore and MatGraphDB from the materials package to the core package to streamline the structure. Updated imports across the codebase to reflect this change, ensuring consistency and maintainability. This refactor simplifies the module organization and enhances the clarity of the project's architecture.

Reorganized the optional dependencies section in pyproject.toml by adding 'matgraphdb[ml]' to the tests group. This change ensures that the necessary testing tools are included for machine learning functionality, improving the clarity and usability of the dependency management.

Moved 'pyg_lib' to the correct position in the optional dependencies section of pyproject.toml. This change improves the organization of the dependencies list, ensuring that all machine learning libraries are clearly defined and properly ordered.

Deleted 'pyg_lib', 'torch_scatter', 'torch_sparse', 'torch_cluster', and 'torch_spline_conv' from the optional dependencies section in pyproject.toml. This change reduces clutter and improves the organization of the dependencies list, ensuring only relevant libraries are included for machine learning functionality.

Updated the graph_db fixture in tests/test_pyg_builder.py to use ParquetGraphDB instead of GraphDB. This change aligns the test setup with the recent restructuring of the database modules, ensuring that tests are executed with the correct database implementation.

Changed imports in the "01 - Getting Started.ipynb" notebook to reflect the new structure of the matgraphdb package. This update ensures that the notebook uses the correct modules from the core package, maintaining consistency with recent refactoring efforts.

…d HeteroGraphBuilder classes

Added nbsphinx configuration to prevent execution of notebooks during the build process. This change ensures that the notebooks are treated as static content, improving the documentation generation workflow. Also updated import statements in the Graph Generators notebook to reflect the correct module paths.
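For reference, nbsphinx controls notebook execution through the `nbsphinx_execute` setting in the Sphinx `conf.py`. The fragment below is a sketch of what such a configuration usually looks like; the exact contents of this project's `conf.py` are not shown in the PR, so the extension list is an assumption.

```python
# Fragment of a Sphinx conf.py (assumed layout, not copied from this PR).
extensions = [
    "nbsphinx",  # renders Jupyter notebooks as documentation pages
]

# Never run notebooks during the docs build; use the outputs already
# stored in the .ipynb files.
nbsphinx_execute = "never"
```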

Changed the load method in HeteroGraphBuilder to accept ParquetGraphDB instead of GraphDB. This update aligns the method signature with recent refactoring efforts, ensuring consistency across the codebase.

Replaced instances of GraphBuilder with HeteroGraphBuilder in the test file tests/test_pyg_builder.py. This change aligns the test setup with the recent refactoring of the GraphBuilder class, ensuring consistency in the usage of the updated class name throughout the codebase.

Changed references from "materials" to "material" in test fixtures and assertions to align with recent naming conventions. Updated the load method in HeteroGraphBuilder to load hetero_data with weights_only set to False, ensuring compatibility with the latest data structure. This refactor improves consistency across the codebase and prepares for future enhancements.

Modified .gitignore to specify tmp.py for exclusion and removed the unnecessary entry for *data. This change improves the clarity of the ignored files and ensures that temporary Python files are not tracked. Also added a new Parquet file material_0.parquet to the test data directory, enhancing the test dataset for material-related tests.

Removed outdated references and files related to edge and node generators, and updated the materials API documentation to reflect the new structure. Added new sections for materials and node generators, ensuring clarity and consistency in the API reference. This change enhances the organization of the documentation and prepares for future updates.

Eliminated the verbose parameter from the MaterialStore class methods to simplify the API. This change removes unnecessary complexity and aligns with the current logging practices, as verbosity is no longer needed for logging purposes. Updated the test fixture to set verbosity directly when creating MaterialStore instances, ensuring compatibility with the new method signatures.

Updated import paths to reflect the new structure, ensuring compatibility with existing modules. This change enhances the functionality of the material management system and prepares for future developments.

Implemented a new script to run pytest continuously, capturing test results and logging failures with timestamps. This change allows for better monitoring of test outcomes and facilitates debugging by storing failure details in a dedicated log directory. The script handles graceful shutdown on receiving termination signals, ensuring that test execution can be stopped without abrupt interruptions.
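The script itself is not shown in this PR page. The sketch below shows one way such a loop could work, under stated assumptions: the function names, the `test_failure_logs` directory, and the `tests` target are all hypothetical.

```python
import signal
import subprocess
import sys
import time
from datetime import datetime
from pathlib import Path

_stop_requested = False


def _request_stop(signum, frame):
    """Signal handler: ask the loop to exit once the current run returns."""
    global _stop_requested
    _stop_requested = True


def run_continuously(cmd, log_dir, max_runs=None, pause_seconds=0.0):
    """Run `cmd` repeatedly and write one timestamped log per failing run.

    Returns the number of failed runs.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    failures = 0
    runs = 0
    while not _stop_requested and (max_runs is None or runs < max_runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        runs += 1
        if result.returncode != 0:
            failures += 1
            stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
            log_file = log_dir / f"failure_{stamp}_run{runs}.log"
            log_file.write_text(result.stdout + result.stderr)
        time.sleep(pause_seconds)
    return failures


def main():
    # Install handlers so Ctrl-C or a termination signal ends the loop cleanly.
    signal.signal(signal.SIGINT, _request_stop)
    signal.signal(signal.SIGTERM, _request_stop)
    cmd = [sys.executable, "-m", "pytest", "tests"]  # assumed test target
    failures = run_continuously(cmd, "test_failure_logs", pause_seconds=1.0)
    print(f"Stopped after {failures} failed run(s).")
```

Calling `main()` starts the loop. The handler only sets a flag, so a run in progress is allowed to finish before the loop exits.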

…tories

Replaced string-based temporary directory handling with Path objects for improved readability and consistency. This change simplifies directory management in test fixtures, ensuring that paths are handled more robustly across different operating systems. Updated the material_store and matgraphdb fixtures to utilize Path for creating and cleaning up temporary directories, enhancing the overall structure of the test code.
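The Path-based setup and cleanup described here can be sketched as follows. This is an illustration only: the helper name `temporary_store_dir` is hypothetical, and it is written as a plain context manager so it runs without pytest. In a real fixture the same body would sit under `@pytest.fixture`.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_store_dir(name: str = "material_store"):
    """Yield a fresh directory as a Path and remove it afterwards."""
    root = Path(tempfile.mkdtemp())
    store_dir = root / name  # "/" joins path segments portably on every OS
    store_dir.mkdir()
    try:
        yield store_dir
    finally:
        # Remove the whole temporary tree, even if the test body raised.
        shutil.rmtree(root, ignore_errors=True)
```

Compared with string concatenation, `Path` keeps separators and existence checks (`is_dir()`, `exists()`) in one object.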

Replaced the existing material_0.parquet file with a new version to enhance the test dataset for material-related tests. This update ensures that the tests are using the most current data, improving the reliability of test outcomes.

Deleted the old matgraphdb_base.rst file and created a new matgraphdb.rst file to better organize the documentation for the MatGraphDB class. This change improves clarity and maintains consistency in the API documentation structure.

lllangWV added 21 commits May 8, 2025 13:57

chore: Moved the generalized GraphDB module to parquetdb package as P…

149b285

…arquetGraphDB. Moved the generalized GraphDB module to the parquetdb package as ParquetGraphDB. MatGraphDB should just be a wrapper around this. Made subsequent changes throughout the package.

refactor: renamed module from data to builders CrystalGraphBuilder an…

6948140

…d HeteroGraphBuilder classes

refactor: Update load method to use ParquetGraphDB

090c2fc

Changed the load method in HeteroGraphBuilder to accept ParquetGraphDB instead of GraphDB. This update aligns the method signature with recent refactoring efforts, ensuring consistency across the codebase.

maint: Rename matgraphdb_base.py to matgraphdb.py

c50e4ca

Updated import paths to reflect the new structure, ensuring compatibility with existing modules. This change enhances the functionality of the material management system and prepares for future developments.

chore: Update material_0.parquet test data file

8b587f7

Replaced the existing material_0.parquet file with a new version to enhance the test dataset for material-related tests. This update ensures that the tests are using the most current data, improving the reliability of test outcomes.

lllangWV merged commit 0c7db33 into main May 9, 2025
8 checks passed

github-actions bot pushed a commit that referenced this pull request May 9, 2025

Merge PR #3 into next-release

cb5cf0f

github-actions bot mentioned this pull request May 9, 2025

Release v*.*.* #2

Open
