Allow specification of sequence identity mode #31

dxu16 · 2025-05-19T18:29:37Z

This PR adds support for specifying which mode the evaluator uses to calculate sequence identity.

The motivation is that when comparing an AlphaFold-predicted structure with the ground truth, the predicted structure often contains residues that are missing in the ground truth. Currently, the "all" mode is used for computing sequence identity during entity ID assignment, which may produce a low sequence identity when the ground truth has large gaps in the middle, even though the rest of the sequence aligns perfectly. Although one could address this by lowering the min_sequence_identity threshold, setting sequence_identity_mode to "shortest" may be a better fit in this scenario.
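To make the difference between the two modes concrete, here is a minimal sketch in plain Python. It is not the evaluator's code: the function `sequence_identity` and the toy sequences are hypothetical. It assumes that "all" divides the number of identical positions by the number of alignment columns, and that "shortest" divides it by the length of the shorter ungapped sequence.

```python
def sequence_identity(aligned_a, aligned_b, mode="all"):
    """Identity of two aligned, equal-length strings ('-' marks a gap).

    Hypothetical helper for illustration only; the mode definitions
    are assumptions stated in the text above.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Count columns where both sequences carry the same residue
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )

    if mode == "all":
        # Every alignment column counts, including gap columns
        denominator = len(aligned_a)
    elif mode == "shortest":
        # Only the residues of the shorter sequence count
        denominator = min(
            len(aligned_a.replace("-", "")),
            len(aligned_b.replace("-", "")),
        )
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return matches / denominator


# Toy example: the reference lacks 10 residues in the middle
predicted = "MKTAYIAKQRQISFVKSHFS"
reference = "MKTAY----------KSHFS"

identity_all = sequence_identity(predicted, reference, mode="all")
identity_shortest = sequence_identity(predicted, reference, mode="shortest")
```

In this toy case the resolved residues all match, yet "all" yields 0.5 because the gap columns stay in the denominator, while "shortest" yields 1.0.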

padix-key · 2025-05-26T16:38:25Z

Hi, sorry for the late response. I generally agree with you: other users have experienced similar problems where terminal portions are missing. However, I would be a bit conservative in adding new parameters, as each one 'burdens' the user with making a choice. In this case (correct me if I am wrong) I think "shortest" would be a sensible solution for all use cases.

Furthermore, as mentioned above, terminal sections are often missing. To reflect this, I think it is reasonable to ignore terminal gap penalties (terminal_penalty=False) when performing the sequence alignment.
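As a rough illustration of what ignoring terminal gap penalties means, the sketch below scores an already aligned pair with and without penalizing terminal gaps. It is an assumption-laden toy, not the alignment routine used in the project: `alignment_score`, its linear gap penalty, and the score values are all hypothetical.

```python
def _terminal_gap_columns(aligned):
    """Column indices that lie before the first or after the last residue."""
    n = len(aligned)
    leading = n - len(aligned.lstrip("-"))
    trailing = n - len(aligned.rstrip("-"))
    return set(range(leading)) | set(range(n - trailing, n))


def alignment_score(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2,
                    terminal_penalty=True):
    """Score a given gapped alignment with a linear gap penalty.

    Hypothetical helper: with terminal_penalty=False, gaps at either
    end of either sequence cost nothing.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    free_columns = set()
    if not terminal_penalty:
        free_columns = (
            _terminal_gap_columns(aligned_a) | _terminal_gap_columns(aligned_b)
        )

    score = 0
    for i, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        if a == "-" or b == "-":
            # Gap column: penalized unless it is an exempt terminal gap
            if i not in free_columns:
                score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


# Toy example: the reference lacks both termini of the model
model = "MKTAYIAKQR"
reference = "---AYIAK--"

penalized = alignment_score(model, reference, terminal_penalty=True)
ignored = alignment_score(model, reference, terminal_penalty=False)
```

Here the five matching residues score 5 either way, but the five terminal gap columns pull the penalized score down to -5, whereas with terminal_penalty=False the score stays at 5.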

I prepared an alternative PR #33 building upon your suggestion. Let me know what you think 🙂

dxu16 · 2025-05-26T20:18:04Z

Hi, thank you for your response! That makes sense, and your proposed fix looks good. I will close this PR in favor of #33.

One potential issue I can think of is that when homomers in a structure have vastly different numbers of resolved residues (for instance, one has 100 residues and the other only 10), the current code will assign both to the same entity, which may not be ideal depending on the circumstances. But it is definitely an edge case and would probably need larger-scope changes, such as allowing the specification of chain pairs between reference and model.

allow specification of seqeunce identity mode

c189875

padix-key mentioned this pull request May 26, 2025

Ignore gaps in shorter (usually pose) sequence #33

Merged

dxu16 closed this May 26, 2025
