PR: Refactor GraphDB and Update ML Dependency Handling by lllangWV · Pull Request #3 · romerogroup/MatGraphDB · GitHub

PR: Refactor GraphDB and Update ML Dependency Handling #3

Merged
merged 21 commits into from
May 9, 2025

Conversation

lllangWV
Member
@lllangWV lllangWV commented May 8, 2025

Summary:

This pull request introduces significant refactoring to the GraphDB module and updates how machine learning dependencies are managed.

Changes:

  • GraphDB Refactoring (Commit 149b285):

The generalized GraphDB module has been moved into the parquetdb package and renamed to ParquetGraphDB.

MatGraphDB has been updated to act as a wrapper around the new ParquetGraphDB.

Necessary adjustments have been made throughout the codebase to accommodate these structural changes.
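The wrapper relationship described above can be pictured with a minimal sketch. Every name in it (`GenericGraphStore`, `MaterialGraph`, `add_nodes`, `node_count`) is a hypothetical stand-in, not the actual ParquetGraphDB or MatGraphDB API. The sketch uses composition; the PR text does not say whether the real class holds an instance or subclasses the generalized one.

```python
from typing import Dict, List


class GenericGraphStore:
    """Hypothetical stand-in for a generalized graph store (the role ParquetGraphDB plays)."""

    def __init__(self) -> None:
        # Maps a node type name to the rows stored for that type.
        self._nodes: Dict[str, List[dict]] = {}

    def add_nodes(self, node_type: str, rows: List[dict]) -> None:
        self._nodes.setdefault(node_type, []).extend(rows)

    def node_count(self, node_type: str) -> int:
        return len(self._nodes.get(node_type, []))


class MaterialGraph:
    """Hypothetical domain wrapper (the role MatGraphDB plays)."""

    def __init__(self, store: GenericGraphStore) -> None:
        self._store = store

    def add_materials(self, materials: List[dict]) -> None:
        # Domain-specific call delegates to the generic storage layer.
        self._store.add_nodes("material", materials)

    def material_count(self) -> int:
        return self._store.node_count("material")


graph = MaterialGraph(GenericGraphStore())
graph.add_materials([{"formula": "NaCl"}, {"formula": "Si"}])
print(graph.material_count())
```

Keeping the generic storage logic in one class means the materials-specific layer only has to know about material concepts, which is the modularity goal stated under Purpose.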

  • ML Dependency Management Update:

The torch library has been removed from the core dependencies listed in pyproject.toml.

A new optional dependency group, [ml], has been created in pyproject.toml to house machine learning-specific packages like torch.

The README.md file has been updated with detailed instructions on how to install these ML dependencies, including guidance for both CPU and GPU environments. This change is intended to better organize project dependencies and offer clearer setup instructions for users leveraging ML functionalities.
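Once torch lives in an optional group, code paths that need it benefit from a clear failure message when the extra is not installed. The following is a minimal sketch of that pattern, not code from this PR: the helper name `require_optional` is hypothetical, and the install hint is inferred from the `[ml]` group name. The CPU- and GPU-specific steps from the README are not reproduced here.

```python
import importlib
import importlib.util


def require_optional(module_name: str, extra: str):
    """Import a top-level optional module, or raise an ImportError naming the extra."""
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install matgraphdb[{extra}]"
        )
    return importlib.import_module(module_name)


# Intended use inside an ML-only module (assumed, not taken from the PR):
#     torch = require_optional("torch", "ml")
```

Users who never touch the ML features never hit this check, so the core install stays free of the heavy dependency.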

Purpose:

  • To improve the modularity and organization of the GraphDB components by centralizing the core Parquet-based graph database logic.

  • To provide a more flexible and explicit way of managing machine learning dependencies, preventing unnecessary installation of heavy libraries for users who do not require ML features.

  • To enhance the user experience by providing clearer installation instructions for different environments.

lllangWV added 21 commits May 8, 2025 13:57
…arquetGraphDB.

Moved the generalized GraphDB module to the parquetdb package as ParquetGraphDB. MatGraphDB should just be a wrapper around this. Made subsequent changes throughout the package.

Removed 'torch' from the main dependencies in pyproject.toml and
added a new optional '[ml]' dependency group for machine learning
packages. Updated README.md to include installation instructions
for the ML dependencies, clarifying the process for both CPU
and GPU installations. This change improves the organization
of dependencies and enhances user guidance for ML setups.

Moved MaterialStore and MatGraphDB from the materials package to the core package to streamline the structure. Updated imports across the codebase to reflect this change, ensuring consistency and maintainability. This refactor simplifies the module organization and enhances the clarity of the project's architecture.

Reorganized the optional dependencies section in pyproject.toml by
adding 'matgraphdb[ml]' to the tests group. This change ensures that
the necessary testing tools are included for machine learning
functionality, improving the clarity and usability of the dependency
management.

Moved 'pyg_lib' to the correct position in the optional dependencies
section of pyproject.toml. This change improves the organization
of the dependencies list, ensuring that all machine learning
libraries are clearly defined and properly ordered.

Deleted 'pyg_lib', 'torch_scatter', 'torch_sparse', 'torch_cluster',
and 'torch_spline_conv' from the optional dependencies section in
pyproject.toml. This change reduces clutter and improves the
organization of the dependencies list, ensuring only relevant
libraries are included for machine learning functionality.

Updated the graph_db fixture in tests/test_pyg_builder.py to use
ParquetGraphDB instead of GraphDB. This change aligns the test
setup with the recent restructuring of the database modules,
ensuring that tests are executed with the correct database
implementation.

Changed imports in the "01 - Getting Started.ipynb" notebook to
reflect the new structure of the matgraphdb package. This update
ensures that the notebook uses the correct modules from the core
package, maintaining consistency with recent refactoring efforts.

Added nbsphinx configuration to prevent execution of notebooks during
the build process. This change ensures that the notebooks are treated
as static content, improving the documentation generation workflow.

Also updated import statements in the Graph Generators notebook to
reflect the correct module paths.
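In nbsphinx, notebook execution is controlled by the `nbsphinx_execute` option in the Sphinx `conf.py`. The PR page does not show the exact lines that were added, so the fragment below is an assumed illustration of such a setting; the file path and the contents of `extensions` are assumptions.

```python
# docs/conf.py (fragment; path and surrounding entries are assumptions)
extensions = [
    "nbsphinx",
]

# Never execute notebooks during the build; render the outputs
# already stored in the .ipynb files as static content.
nbsphinx_execute = "never"
```

With this value the documentation build does not need the notebooks' runtime dependencies, at the cost of showing whatever outputs were last saved.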
Changed the load method in HeteroGraphBuilder to accept ParquetGraphDB
instead of GraphDB. This update aligns the method signature with recent
refactoring efforts, ensuring consistency across the codebase.

Replaced instances of GraphBuilder with HeteroGraphBuilder in the test
file tests/test_pyg_builder.py. This change aligns the test setup with
the recent refactoring of the GraphBuilder class, ensuring consistency
in the usage of the updated class name throughout the codebase.

Changed references from "materials" to "material" in test fixtures and
assertions to align with recent naming conventions. Updated the load
method in HeteroGraphBuilder to load hetero_data with weights_only set
to False, ensuring compatibility with the latest data structure.

This refactor improves consistency across the codebase and prepares
for future enhancements.
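For context on the `weights_only` change: since PyTorch 2.6, `torch.load` defaults to `weights_only=True`, which restricts unpickling to tensors and allowlisted types, so a saved graph object may be rejected. A minimal sketch of the explicit call follows. The helper name `load_hetero_data` is hypothetical, and this is not the actual body of the load method in HeteroGraphBuilder.

```python
from pathlib import Path


def load_hetero_data(path: Path):
    """Load an object previously written with torch.save (hypothetical helper)."""
    import torch  # imported lazily because torch is an optional dependency

    # weights_only=False permits full unpickling of arbitrary Python objects.
    # Unpickling can execute code embedded in the file, so only load files
    # that come from a trusted source.
    return torch.load(path, weights_only=False)
```

Passing the flag explicitly also keeps behaviour stable across PyTorch versions whose defaults differ.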
Modified .gitignore to specify tmp.py for exclusion and removed the
unnecessary entry for *data. This change improves the clarity of the
ignored files and ensures that temporary Python files are not tracked.

Also added a new Parquet file material_0.parquet to the test data
directory, enhancing the test dataset for material-related tests.

Removed outdated references and files related to edge and node
generators, and updated the materials API documentation to reflect
the new structure. Added new sections for materials and node
generators, ensuring clarity and consistency in the API reference.

This change enhances the organization of the documentation and
prepares for future updates.

Eliminated the verbose parameter from the MaterialStore class methods
to simplify the API. This change removes unnecessary complexity and
aligns with the current logging practices, as verbosity is no longer
needed for logging purposes.

Updated the test fixture to set verbosity directly when creating
MaterialStore instances, ensuring compatibility with the new method
signatures.

Updated import paths to reflect the new structure, ensuring
compatibility with existing modules. This change enhances the
functionality of the material management system and prepares for
future developments.

Implemented a new script to run pytest continuously, capturing test
results and logging failures with timestamps. This change allows for
better monitoring of test outcomes and facilitates debugging by storing
failure details in a dedicated log directory.

The script handles graceful shutdown on receiving termination signals,
ensuring that test execution can be stopped without abrupt interruptions.
…tories

Replaced string-based temporary directory handling with Path objects
for improved readability and consistency. This change simplifies
directory management in test fixtures, ensuring that paths are
handled more robustly across different operating systems.

Updated the material_store and matgraphdb fixtures to utilize
Path for creating and cleaning up temporary directories, enhancing
the overall structure of the test code.
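The Path-based create-and-clean-up pattern can be sketched as below. This is an illustration, not the project's fixture code: the name `temp_store_dir` and the directory prefix are hypothetical, and a context manager is used so the block runs without pytest installed.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def temp_store_dir(prefix: str = "matgraphdb_test_") -> Iterator[Path]:
    """Yield a fresh temporary directory as a Path and remove it afterwards."""
    tmp_dir = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield tmp_dir
    finally:
        # Clean up even if the body raised an exception.
        shutil.rmtree(tmp_dir, ignore_errors=True)


with temp_store_dir() as tmp_dir:
    # The / operator builds paths portably; no manual separator handling.
    data_file = tmp_dir / "materials" / "material_0.parquet"
    data_file.parent.mkdir(parents=True)
    data_file.touch()
    print(data_file.exists())
```

In a pytest suite the same create, yield, and clean-up body can sit in a generator function decorated with `@pytest.fixture`. pytest's built-in `tmp_path` fixture also supplies a `Path` directly.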
Replaced the existing material_0.parquet file with a new version to
enhance the test dataset for material-related tests. This update
ensures that the tests are using the most current data, improving
the reliability of test outcomes.

Deleted the old matgraphdb_base.rst file and created a new
matgraphdb.rst file to better organize the documentation for
the MatGraphDB class. This change improves clarity and
maintains consistency in the API documentation structure.

@lllangWV lllangWV merged commit 0c7db33 into main May 9, 2025
8 checks passed
github-actions bot pushed a commit that referenced this pull request May 9, 2025
@github-actions github-actions bot mentioned this pull request May 9, 2025