8000 Method to truly root tree nodes by GavinHuttley · Pull Request #2353 · cogent3/cogent3 · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Method to truly root tree nodes #2353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 13, 2025
Merged

Conversation

GavinHuttley
Copy link
Collaborator
@GavinHuttley GavinHuttley commented Jun 13, 2025

Summary by Sourcery

Provide a clearer rooting API by renaming root() to get_root(), deprecating the old alias, adding a rooted() method for rerooting, and adjusting code, tests, and changelog accordingly.

New Features:

  • Introduce TreeNode.get_root() as the non‐misleading method for retrieving a tree’s root
  • Add TreeNode.rooted() to produce a new tree rerooted at a specified edge, tip, or internal node

Enhancements:

  • Deprecate the old TreeNode.root() alias in favor of get_root()
  • Replace internal calls to root() with get_root()
  • Simplify name_unnamed_nodes() by using a list comprehension

Documentation:

  • Update changelog to document deprecation of TreeNode.root()

Tests:

  • Update existing root() tests to use get_root()
  • Add comprehensive tests for the new rooted() method

…nt3#2252

[NEW] rooting is defined in this instance as two children from
    the "root" node, as distinct fromn >= 3 children.
    You must provide the name of an edge, either tip or internal
    node name.
Copy link
Contributor
sourcery-ai bot commented Jun 13, 2025

Reviewer's Guide

This PR refactors the tree rooting interface by renaming and deprecating the old method, introduces a robust “rooted” splitting API, optimizes node-naming, updates internal calls and tests accordingly, and adds the corresponding changelog entry.

Updated Class Diagram for TreeNode

classDiagram
    class TreeNode {
        +get_root() Self
        +root() Self$ "deprecated"
        +rooted(edge_name str) Self
        +total_length()
    }

    note for TreeNode "Method root() is deprecated in favor of get_root()"
Loading

File-Level Changes

Change Details Files
Refactor root lookup with new get_root and deprecate old root()
  • Added get_root() as the canonical root finder
  • Marked root() as deprecated and pointed it to get_root()
  • Replaced internal uses of root() with get_root() (e.g. total_length)
src/cogent3/core/tree.py
Implement true re-rooting via a new rooted(edge_name) method
  • Deep-copies tree and validates calling node is the root
  • Locates split node, halves branch length, and names new children
  • Constructs a new two-child root node preserving source
src/cogent3/core/tree.py
Streamline naming of unnamed nodes
  • Replaced loop-based name collection with a comprehension
src/cogent3/core/tree.py
Update and extend tests for rooting behavior
  • Replaced root() assertions with get_root()
  • Added tests covering rooted() on root, non-root, tip, and internal nodes
tests/test_core/test_tree.py
Uncomment and refine changelog deprecation entry
  • Enabled Deprecations section
  • Documented deprecation of TreeNode.root() in favor of get_root()
changelog.d/20250604_092738_Gavin.Huttley.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor
@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @GavinHuttley - I've reviewed your changes - here's some feedback:

  • Consider refactoring the lengthy rooted() method into smaller helper functions (e.g. splitting logic for naming and subtree extraction) to improve readability.
  • Double-check that all uses of TreeNode.root() (and PhyloNode.root()) are consistently aliased or replaced with get_root() to avoid any remaining inconsistencies.
  • Clean up the changelog by removing the surrounding HTML comment wrappers so the deprecation entry for root() is correctly displayed.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring the lengthy rooted() method into smaller helper functions (e.g. splitting logic for naming and subtree extraction) to improve readability.
- Double-check that all uses of TreeNode.root() (and PhyloNode.root()) are consistently aliased or replaced with get_root() to avoid any remaining inconsistencies.
- Clean up the changelog by removing the surrounding HTML comment wrappers so the deprecation entry for root() is correctly displayed.

## Individual Comments

### Comment 1
<location> `src/cogent3/core/tree.py:733` </location>
<code_context>
-        for node in self.traverse():
-            if node.name:
-                names_in_use.append(node.name)
+        names_in_use = [node.name for node in self.traverse() if node.name]
         # assign unique names to the Data property of nodes where Data = None
         name_index = 1
</code_context>

<issue_to_address>
Use a set for `names_in_use` to speed up membership tests

A set provides faster lookups and avoids duplicates, making the code clearer and more efficient.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
        # make a list of the names that are already in the tree
        names_in_use = [node.name for node in self.traverse() if node.name]
        # assign unique names to the Data property of nodes where Data = None
        name_index = 1
=======
        # make a set of the names that are already in the tree
        names_in_use = {node.name for node in self.traverse() if node.name}
        # assign unique names to the Data property of nodes where Data = None
        name_index = 1
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `src/cogent3/core/tree.py:1851` </location>
<code_context>
         """returns the sum of all branch lengths in tree"""
-        root = self.root()
+        root = self.get_root()
         if root is None:
             msg = "no root to this tree!"
             raise ValueError(msg)
</code_context>

<issue_to_address>
Unreachable branch: `get_root()` never returns None

The `if root is None` check is unnecessary and can be removed, unless you plan to modify `get_root()` to return None for missing roots.
</issue_to_address>

### Comment 3
<location> `tests/test_core/test_tree.py:2681` </location>
<code_context>
+    child1.name = "root"
+    child2.parent = None
+    child2.name = "root"
+    ts1, ts2 = (ts1, ts2) if "HUman" in child1.get_tip_names() else (ts2, ts1)
+    expect1 = make_tree(treestring=ts1)
+    expect2 = make_tree(treestring=ts2)
</code_context>

<issue_to_address>
Potential typo in test logic.

"HUman" should be corrected to "Human" to ensure the test works as intended.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    ts1, ts2 = (ts1, ts2) if "HUman" in child1.get_tip_names() else (ts2, ts1)
=======
    ts1, ts2 = (ts1, ts2) if "Human" in child1.get_tip_names() else (ts2, ts1)
>>>>>>> REPLACE

</suggested_fix>

### Comment 4
<location> `src/cogent3/core/tree.py:479` </location>
<code_context>
+    def root(self) -> typing_extensions.Self:
+        return self.get_root()
+
+    def rooted(self, edge_name: str) -> typing_extensions.Self:
+        """Returns a new tree with split at edge_name
+
</code_context>

<issue_to_address>
Consider refactoring the new rooted method by extracting helpers, using early returns, and unifying naming logic to improve readability.

```markdown
I agree the new `rooted` method can be made more readable by:

1. pulling each logical piece into a tiny helper,
2. replacing nested `if`/`else` with early-returns, and
3. unifying the tip vs internal naming logic.

Here’s one way to refactor:

```python
def rooted(self, edge_name: str) -> typing_extensions.Self:
    root = self.get_root()
    _validate_root_call(root, edge_name)

    tree = root.deepcopy()
    tree.source = None

    node = tree.get_node_matching_name(edge_name)
    left, right = _split_at(node)
    left.length = right.length = _half_length(node)
    left.name, right.name = _make_names(edge_name, node.is_tip())

    return self.__class__(name="root", children=[left, right], source=tree.source)


def _validate_root_call(root, edge_name):
    if root.name != "root":
        raise TreeError(f"cannot apply from non-root node {root.name!r}, use get_root()")
    if edge_name == "root":
        if len(root.children) == 2:
            return
        raise TreeError("cannot root at existing root")


def _split_at(node):
    parent = node.parent
    parent.children.remove(node)
    # detach fully so deepcopy of subtrees is unrooted
    node.parent = None

    return node.unrooted_deepcopy(), parent.unrooted_deepcopy()


def _half_length(node):
    return (getattr(node, "length", 0.0)) / 2


def _make_names(edge_name, is_tip):
    """
    Unify naming for both tip/internal:
      - tips:      ("<edge>-root", "<edge>")
      - internals: ("<edge>-L",    "<edge>-R")
    """
    if is_tip:
        return f"{edge_name}-root", edge_name
    return f"{edge_name}-L", f"{edge_name}-R"
```

This:

- removes deep nesting,
- centralizes validation and naming,
- preserves exactly the same behavior.
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines 904 to +905
for i in list(nodes.values()):
assert i.root() is root
assert i.get_root() is root
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Avoid loops in tests. (no-loop-in-tests)

ExplanationAvoid complex code, like loops, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:

  • loops
  • conditionals

Some ways to fix this:

  • Use parametrized tests to get rid of the loop.
  • Move the complex logic into helpers.
  • Move the complex part into pytest fixtures.

Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / Don't Put Logic in Tests

Comment on lines +497 to +498
msg = "cannot root at existing root"
raise TreeError(msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): We've found these issues:

Suggested change
msg = "cannot root at existing root"
raise TreeError(msg)
else:
raise TreeError("cannot root at existing root")

@coveralls
Copy link
Collaborator
coveralls commented Jun 13, 2025

Pull Request Test Coverage Report for Build 15627216644

Details

  • 41 of 41 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 90.755%

Totals Coverage Status
Change from base Build 15626421114: 0.02%
Covered Lines: 30119
Relevant Lines: 33187

💛 - Coveralls

@GavinHuttley GavinHuttley merged commit cf8ad8d into cogent3:develop Jun 13, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0