-
Notifications
You must be signed in to change notification settings - Fork 64
Method to truly root tree nodes #2353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…nt3#2252 [NEW] rooting is defined in this instance as two children from the "root" node, as distinct fromn >= 3 children. You must provide the name of an edge, either tip or internal node name.
Reviewer's GuideThis PR refactors the tree rooting interface by renaming and deprecating the old method, introduces a robust “rooted” splitting API, optimizes node-naming, updates internal calls and tests accordingly, and adds the corresponding changelog entry. Updated Class Diagram for TreeNodeclassDiagram
class TreeNode {
+get_root() Self
+root() Self$ "deprecated"
+rooted(edge_name str) Self
+total_length()
}
note for TreeNode "Method root() is deprecated in favor of get_root()"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @GavinHuttley - I've reviewed your changes - here's some feedback:
- Consider refactoring the lengthy rooted() method into smaller helper functions (e.g. splitting logic for naming and subtree extraction) to improve readability.
- Double-check that all uses of TreeNode.root() (and PhyloNode.root()) are consistently aliased or replaced with get_root() to avoid any remaining inconsistencies.
- Clean up the changelog by removing the surrounding HTML comment wrappers so the deprecation entry for root() is correctly displayed.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring the lengthy rooted() method into smaller helper functions (e.g. splitting logic for naming and subtree extraction) to improve readability.
- Double-check that all uses of TreeNode.root() (and PhyloNode.root()) are consistently aliased or replaced with get_root() to avoid any remaining inconsistencies.
- Clean up the changelog by removing the surrounding HTML comment wrappers so the deprecation entry for root() is correctly displayed.
## Individual Comments
### Comment 1
<location> `src/cogent3/core/tree.py:733` </location>
<code_context>
- for node in self.traverse():
- if node.name:
- names_in_use.append(node.name)
+ names_in_use = [node.name for node in self.traverse() if node.name]
# assign unique names to the Data property of nodes where Data = None
name_index = 1
</code_context>
<issue_to_address>
Use a set for `names_in_use` to speed up membership tests
A set provides faster lookups and avoids duplicates, making the code clearer and more efficient.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
# make a list of the names that are already in the tree
names_in_use = [node.name for node in self.traverse() if node.name]
# assign unique names to the Data property of nodes where Data = None
name_index = 1
=======
# make a set of the names that are already in the tree
names_in_use = {node.name for node in self.traverse() if node.name}
# assign unique names to the Data property of nodes where Data = None
name_index = 1
>>>>>>> REPLACE
</suggested_fix>
### Comment 2
<location> `src/cogent3/core/tree.py:1851` </location>
<code_context>
"""returns the sum of all branch lengths in tree"""
- root = self.root()
+ root = self.get_root()
if root is None:
msg = "no root to this tree!"
raise ValueError(msg)
</code_context>
<issue_to_address>
Unreachable branch: `get_root()` never returns None
The `if root is None` check is unnecessary and can be removed, unless you plan to modify `get_root()` to return None for missing roots.
</issue_to_address>
### Comment 3
<location> `tests/test_core/test_tree.py:2681` </location>
<code_context>
+ child1.name = "root"
+ child2.parent = None
+ child2.name = "root"
+ ts1, ts2 = (ts1, ts2) if "HUman" in child1.get_tip_names() else (ts2, ts1)
+ expect1 = make_tree(treestring=ts1)
+ expect2 = make_tree(treestring=ts2)
</code_context>
<issue_to_address>
Potential typo in test logic.
"HUman" should be corrected to "Human" to ensure the test works as intended.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
ts1, ts2 = (ts1, ts2) if "HUman" in child1.get_tip_names() else (ts2, ts1)
=======
ts1, ts2 = (ts1, ts2) if "Human" in child1.get_tip_names() else (ts2, ts1)
>>>>>>> REPLACE
</suggested_fix>
### Comment 4
<location> `src/cogent3/core/tree.py:479` </location>
<code_context>
+ def root(self) -> typing_extensions.Self:
+ return self.get_root()
+
+ def rooted(self, edge_name: str) -> typing_extensions.Self:
+ """Returns a new tree with split at edge_name
+
</code_context>
<issue_to_address>
Consider refactoring the new rooted method by extracting helpers, using early returns, and unifying naming logic to improve readability.
```markdown
I agree the new `rooted` method can be made more readable by:
1. pulling each logical piece into a tiny helper,
2. replacing nested `if`/`else` with early-returns, and
3. unifying the tip vs internal naming logic.
Here’s one way to refactor:
```python
def rooted(self, edge_name: str) -> typing_extensions.Self:
root = self.get_root()
_validate_root_call(root, edge_name)
tree = root.deepcopy()
tree.source = None
node = tree.get_node_matching_name(edge_name)
left, right = _split_at(node)
left.length = right.length = _half_length(node)
left.name, right.name = _make_names(edge_name, node.is_tip())
return self.__class__(name="root", children=[left, right], source=tree.source)
def _validate_root_call(root, edge_name):
if root.name != "root":
raise TreeError(f"cannot apply from non-root node {root.name!r}, use get_root()")
if edge_name == "root":
if len(root.children) == 2:
return
raise TreeError("cannot root at existing root")
def _split_at(node):
parent = node.parent
parent.children.remove(node)
# detach fully so deepcopy of subtrees is unrooted
node.parent = None
return node.unrooted_deepcopy(), parent.unrooted_deepcopy()
def _half_length(node):
return (getattr(node, "length", 0.0)) / 2
def _make_names(edge_name, is_tip):
"""
Unify naming for both tip/internal:
- tips: ("<edge>-root", "<edge>")
- internals: ("<edge>-L", "<edge>-R")
"""
if is_tip:
return f"{edge_name}-root", edge_name
return f"{edge_name}-L", f"{edge_name}-R"
```
This:
- removes deep nesting,
- centralizes validation and naming,
- preserves exactly the same behavior.
```
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
for i in list(nodes.values()): | ||
assert i.root() is root | ||
assert i.get_root() is root |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): Avoid loops in tests. (no-loop-in-tests
)
Explanation
Avoid complex code, like loops, in test functions.Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
- loops
- conditionals
Some ways to fix this:
- Use parametrized tests to get rid of the loop.
- Move the complex logic into helpers.
- Move the complex part into pytest fixtures.
Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / Don't Put Logic in Tests
msg = "cannot root at existing root" | ||
raise TreeError(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): We've found these issues:
- Swap positions of nested conditionals (
swap-nested-ifs
) - Lift code into else after jump in control flow (
reintroduce-else
) - Hoist nested repeated code outside conditional statements (
hoist-similar-statement-from-if
) - Inline variable that is only used once (
inline-variable
)
msg = "cannot root at existing root" | |
raise TreeError(msg) | |
else: | |
raise TreeError("cannot root at existing root") |
Pull Request Test Coverage Report for Build 15627216644Details
💛 - Coveralls |
Summary by Sourcery
Provide a clearer rooting API by renaming root() to get_root(), deprecating the old alias, adding a rooted() method for rerooting, and adjusting code, tests, and changelog accordingly.
New Features:
Enhancements:
Documentation:
Tests: