8000 Allow specification of sequence identity mode by dxu16 · Pull Request #31 · aivant/peppr · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Allow specification of sequence identity mode #31

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dxu16
Copy link
@dxu16 dxu16 commented May 19, 2025

This PR adds support for specifying the mode to use for sequence identity calculation in evaluator.

The reasoning for adding this is that when comparing AlphaFold predicted structure with ground truth, often times the predicted structure contains residues that are missing in the ground truth. Currently, the "all" mode is used for computing sequence identity during entity id assignment, which may produce a low sequence identity when ground truth has large gaps in the middle, even though the rest of the sequence aligns perfectly. Although one could address this by lowering the min_sequence_identity threshold, setting sequence_identity_mode to "shortest" may be more optimal in this senario.

@padix-key
Copy link
Collaborator

Hi, sorry for the late response. I generally agree with you: Other users experienced similar problems where terminal portions are missing. However, I would be a bit conservative in adding new parameters, as it 'burdens' the user with making a choice. In this case (correct if I am wrong) I think "shortest" would be a sensible solution for all use cases.

Furthermore, as mentioned above, often terminal sections are missing. To reflect this I think it is reasonable to ignore terminal gap penalties (terminal_penalty=False) when performing the sequence alignment.

I prepared an alternative PR #33 building upon your suggestion. Let me know what you think 🙂

@dxu16
Copy link
Author
dxu16 commented May 26, 2025

Hi, thank you for your response! That makes sense, and your proposed fix looks good. I will close this PR in favor of #33.

One potential issue that I can think of is that when homomers in a structure have vastly different number of resolved residues (for instance, one has 100 residues and the other only has 10), the current code will assign both as the same entity which may not be ideal depending on circumstances. But it is definitely an edge case and will probably need larger scope changes such as allowing the specification of chain pairs between reference and model.

@dxu16 dxu16 closed this May 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0