PR: Refactor GraphDB and Update ML Dependency Handling #3
Merged
…arquetGraphDB. Moved the generalized GraphDB module to the parquetdb package as ParquetGraphDB. MatGraphDB should just be a wrapper around this. Made subsequent changes throughout the package.
Removed 'torch' from the main dependencies in pyproject.toml and added a new optional '[ml]' dependency group for machine learning packages. Updated README.md to include installation instructions for the ML dependencies, clarifying the process for both CPU and GPU installations. This change improves the organization of dependencies and enhances user guidance for ML setups.
Moved MaterialStore and MatGraphDB from the materials package to the core package to streamline the structure. Updated imports across the codebase to reflect this change, ensuring consistency and maintainability. This refactor simplifies the module organization and enhances the clarity of the project's architecture.
Reorganized the optional dependencies section in pyproject.toml by adding 'matgraphdb[ml]' to the tests group. This change ensures that the necessary testing tools are included for machine learning functionality, improving the clarity and usability of the dependency management.
Moved 'pyg_lib' to the correct position in the optional dependencies section of pyproject.toml. This change improves the organization of the dependencies list, ensuring that all machine learning libraries are clearly defined and properly ordered.
Deleted 'pyg_lib', 'torch_scatter', 'torch_sparse', 'torch_cluster', and 'torch_spline_conv' from the optional dependencies section in pyproject.toml. This change reduces clutter and improves the organization of the dependencies list, ensuring only relevant libraries are included for machine learning functionality.
Updated the graph_db fixture in tests/test_pyg_builder.py to use ParquetGraphDB instead of GraphDB. This change aligns the test setup with the recent restructuring of the database modules, ensuring that tests are executed with the correct database implementation.
Changed imports in the "01 - Getting Started.ipynb" notebook to reflect the new structure of the matgraphdb package. This update ensures that the notebook uses the correct modules from the core package, maintaining consistency with recent refactoring efforts.
…d HeteroGraphBuilder classes
Added nbsphinx configuration to prevent execution of notebooks during the build process. This change ensures that the notebooks are treated as static content, improving the documentation generation workflow. Also updated import statements in the Graph Generators notebook to reflect the correct module paths.
Changed the load method in HeteroGraphBuilder to accept ParquetGraphDB instead of GraphDB. This update aligns the method signature with recent refactoring efforts, ensuring consistency across the codebase.
Replaced instances of GraphBuilder with HeteroGraphBuilder in the test file tests/test_pyg_builder.py. This change aligns the test setup with the recent refactoring of the GraphBuilder class, ensuring consistency in the usage of the updated class name throughout the codebase.
Changed references from "materials" to "material" in test fixtures and assertions to align with recent naming conventions. Updated the load method in HeteroGraphBuilder to load hetero_data with weights_only set to False, ensuring compatibility with the latest data structure. This refactor improves consistency across the codebase and prepares for future enhancements.
Modified .gitignore to specify tmp.py for exclusion and removed the unnecessary entry for *data. This change improves the clarity of the ignored files and ensures that temporary Python files are not tracked. Also added a new Parquet file material_0.parquet to the test data directory, enhancing the test dataset for material-related tests.
Removed outdated references and files related to edge and node generators, and updated the materials API documentation to reflect the new structure. Added new sections for materials and node generators, ensuring clarity and consistency in the API reference. This change enhances the organization of the documentation and prepares for future updates.
Eliminated the verbose parameter from the MaterialStore class methods to simplify the API. This change removes unnecessary complexity and aligns with the current logging practices, as verbosity is no longer needed for logging purposes. Updated the test fixture to set verbosity directly when creating MaterialStore instances, ensuring compatibility with the new method signatures.
Updated import paths to reflect the new structure, ensuring compatibility with existing modules. This change enhances the functionality of the material management system and prepares for future developments.
Implemented a new script to run pytest continuously, capturing test results and logging failures with timestamps. This change allows for better monitoring of test outcomes and facilitates debugging by storing failure details in a dedicated log directory. The script handles graceful shutdown on receiving termination signals, ensuring that test execution can be stopped without abrupt interruptions.
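The behavior described in this commit (repeated pytest runs, timestamped failure logs, graceful shutdown on termination signals) could be sketched roughly as follows. This is not the script added in the PR; the function names, the log file naming scheme, and the `test_failure_logs` directory are assumptions made for illustration.

```python
import signal
import subprocess
import sys
import time
from datetime import datetime
from pathlib import Path

# Set to True by the signal handler so the loop exits after the current run.
_stop_requested = False


def _request_stop(signum, frame):
    """Signal handler: ask the loop to stop instead of exiting abruptly."""
    global _stop_requested
    _stop_requested = True


def run_once(cmd, log_dir):
    """Run one command; on a non-zero exit, save its output to a timestamped log.

    Returns the log file path on failure and None on success.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    log_file = log_dir / f"failure_{stamp}.log"  # hypothetical naming scheme
    log_file.write_text(result.stdout + result.stderr)
    return log_file


def run_forever(cmd=None, log_dir="test_failure_logs", pause_seconds=1.0):
    """Repeat the test command until SIGINT or SIGTERM is received."""
    if cmd is None:
        cmd = [sys.executable, "-m", "pytest"]
    signal.signal(signal.SIGINT, _request_stop)
    signal.signal(signal.SIGTERM, _request_stop)
    while not _stop_requested:
        log_file = run_once(cmd, log_dir)
        if log_file is not None:
            print(f"Failure logged to {log_file}")
        time.sleep(pause_seconds)
```

Calling `run_forever()` from the repository root would loop until interrupted. Because the handler only sets a flag, the run in progress finishes and its failure log is written before the loop exits.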
…tories
Replaced string-based temporary directory handling with Path objects for improved readability and consistency. This change simplifies directory management in test fixtures, ensuring that paths are handled more robustly across different operating systems. Updated the material_store and matgraphdb fixtures to utilize Path for creating and cleaning up temporary directories, enhancing the overall structure of the test code.
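As a minimal sketch of the Path-based pattern this commit describes, the helper below creates a temporary directory as a `pathlib.Path` and removes it afterwards. The helper name and prefix are hypothetical; the actual material_store and matgraphdb fixtures are not shown in this PR page and may be structured differently.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_store_dir(prefix="matgraphdb_test_"):
    """Yield a fresh temporary directory as a Path and delete it on exit."""
    tmp_dir = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield tmp_dir
    finally:
        # ignore_errors avoids masking a test failure with a cleanup error.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

A pytest fixture can use the same create, yield, clean up shape, and pytest's built-in `tmp_path` fixture also hands tests a `Path` directly. Joining with `/` (for example `tmp_dir / "material_store"`) avoids hand-built separator strings.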
Replaced the existing material_0.parquet file with a new version to enhance the test dataset for material-related tests. This update ensures that the tests are using the most current data, improving the reliability of test outcomes.
Deleted the old matgraphdb_base.rst file and created a new matgraphdb.rst file to better organize the documentation for the MatGraphDB class. This change improves clarity and maintains consistency in the API documentation structure.
Summary:
This pull request introduces significant refactoring to the GraphDB module and updates how machine learning dependencies are managed.
Changes:
The generalized GraphDB module has been moved into the parquetdb package and renamed to ParquetGraphDB.
MatGraphDB has been updated to act as a wrapper around the new ParquetGraphDB.
Necessary adjustments have been made throughout the codebase to accommodate these structural changes.
The torch library has been removed from the core dependencies listed in pyproject.toml.
A new optional dependency group, [ml], has been created in pyproject.toml to house machine learning-specific packages like torch.
The README.md file has been updated with detailed instructions on how to install these ML dependencies, including guidance for both CPU and GPU environments. This change is intended to better organize project dependencies and offer clearer setup instructions for users leveraging ML functionalities.
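With torch moved into the optional [ml] group, code that needs it can check for the package at call time instead of importing it at module load. The sketch below shows one common way to do that with the standard library; the helper names are hypothetical, and this PR does not state that matgraphdb implements such a guard.

```python
import importlib
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def require_ml(module_name: str = "torch"):
    """Import an optional ML module, or raise with an installation hint."""
    if not has_module(module_name):
        raise ImportError(
            f"'{module_name}' is not installed. Install the optional ML "
            "dependencies, for example with: pip install matgraphdb[ml]"
        )
    return importlib.import_module(module_name)
```

Calling `require_ml()` inside the functions that need torch keeps a plain `import matgraphdb` working for users who installed only the core dependencies.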
Purpose:
To improve the modularity and organization of the GraphDB components by centralizing the core Parquet-based graph database logic.
To provide a more flexible and explicit way of managing machine learning dependencies, preventing unnecessary installation of heavy libraries for users who do not require ML features.
To enhance the user experience by providing clearer installation instructions for different environments.