code implement for DCC and DCA · Issue #3 · tiantz17/PocketAnchor · GitHub

code implement for DCC and DCA #3


Open
allen-xf opened this issue May 16, 2023 · 15 comments

Comments

@allen-xf

DCC and DVO metrics are used for evaluation. Could you please share the code that implements the evaluation?

@tiantz17
Owner

We used DCC and DCA for pocket detection evaluation.

After obtaining the predicted scores for individual anchors of a protein, a clustering algorithm was applied to generate pocket centers, that is,

import numpy as np
from sklearn.metrics import pairwise_distances


def get_dca_data(anchor_coords, pred, protein_coords, thre=5):
    '''
    anchor_coords: anchor coordinates
    pred: predicted scores for individual anchors
    protein_coords: protein atom coordinates (given chains)
    thre: threshold for clustering
    '''
    anchor_coords = anchor_coords.reshape(-1, 3)
    # keep only anchors within a distance of 6 from any protein atom
    if protein_coords is not None:
        index = pairwise_distances(anchor_coords, protein_coords).min(1) < 6
        anchor_coords = anchor_coords[index]
        pred = pred[index]

    # score cutoff: mean + 3 standard deviations of the predicted scores
    max_value = np.nanmean(pred) + 3*np.nanstd(pred)

    aa_dist = pairwise_distances(anchor_coords)
    num = len(pred)
    list_done = []
    list_centers = []
    list_coords = []
    list_scores = []
    while True:
        list_todo = np.array([i for i in range(num) if i not in list_done], dtype=int)
        # stop when every anchor is assigned, or when at least one pocket
        # exists and no remaining anchor scores above the cutoff
        if len(list_todo) == 0:
            break
        if len(list_centers) >= 1 and pred[list_todo].max() < max_value:
            break

        # seed a new pocket at the highest-scoring unassigned anchor
        seed = list_todo[pred[list_todo].argmax()]
        list_pocket = [seed]
        list_bfs = [seed]
        list_finished = []
        # grow the pocket through neighbors closer than thre that score above the cutoff
        while len(list_bfs) > 0:
            start = list_bfs.pop()
            list_finished.append(start)
            nei = list(np.arange(num)[(aa_dist[start] < thre) & (pred > max_value)])
            list_pocket.extend(nei)
            list_bfs.extend(list(set(nei) - set(list_finished)))
        list_pocket = list(set(list_pocket))

        # pocket center: softmax-weighted mean of the member anchor coordinates
        coords = np.array(anchor_coords[list_pocket])
        weight = np.exp(pred[list_pocket]).reshape(-1, 1)
        weight /= weight.sum()
        center = (coords * weight).sum(0)

        list_coords.append(coords)
        list_centers.append(center)
        list_scores.append(len(list_pocket))
        list_done.extend(list_pocket)
    pocket_centers = np.array(list_centers)
    return pocket_centers
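
As a rough illustration of the center computation in the function above, the sketch below isolates the softmax-weighted average of anchor coordinates. The helper name `weighted_center` and the toy coordinates and scores are made up for this example, and subtracting the maximum score before exponentiating is an added numerical-stability step that is not in the posted code (it leaves the weights unchanged).

```python
import numpy as np


def weighted_center(coords, scores):
    """Softmax-weighted mean of anchor coordinates (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float).reshape(-1, 3)
    scores = np.asarray(scores, dtype=float)
    # Subtracting the max does not change the softmax weights but avoids overflow.
    weight = np.exp(scores - scores.max()).reshape(-1, 1)
    weight /= weight.sum()
    return (coords * weight).sum(0)


# Toy anchors on the x axis (made-up values).
coords = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])

# Equal scores reduce to the plain centroid.
center_equal = weighted_center(coords, [1.0, 1.0])

# A higher score on the second anchor pulls the center toward it.
center_skewed = weighted_center(coords, [0.0, 2.0])
```

With equal scores the center is the ordinary centroid; with unequal scores it shifts toward the higher-scoring anchors, which is presumably why the weights are exponentiated scores rather than uniform.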

Then, DCC and DCA can be computed directly based on the definitions, that is,

'''
  ligand_coords: coordinates of ligand atoms
  pocket_centers: predicted pocket centers in descending order
  n: number of pockets for input protein
'''

import numpy as np
from sklearn.metrics import pairwise_distances

# DCC top-(n+2)
dcc_anchor_n2 = pairwise_distances(np.mean(ligand_coords, axis=0, keepdims=True), pocket_centers[:(n+2)]).min()

# DCC top-(n)
dcc_anchor_n = pairwise_distances(np.mean(ligand_coords, axis=0, keepdims=True), pocket_centers[:n]).min()

# DCA top-(n+2)
dca_anchor_n2 = pairwise_distances(ligand_coords, pocket_centers[:(n+2)]).min()

# DCA top-(n)
dca_anchor_n = pairwise_distances(ligand_coords, pocket_centers[:n]).min()

We did not use DVO because there is no voxelization in this work.
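
To make the two definitions concrete, here is a small toy sketch, not taken from the repository. It uses plain NumPy in place of `pairwise_distances`, and the helper names `cross_dist` and `dcc_dca` as well as all coordinates are assumptions made up for the illustration. DCC is measured from the ligand's geometric center, DCA from the closest ligand atom.

```python
import numpy as np


def cross_dist(a, b):
    """All pairwise Euclidean distances between rows of a and rows of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def dcc_dca(ligand_coords, pocket_centers, k):
    """Return (DCC, DCA) against the top-k predicted centers (hypothetical helper)."""
    ligand_coords = np.asarray(ligand_coords, dtype=float).reshape(-1, 3)
    top = np.asarray(pocket_centers, dtype=float).reshape(-1, 3)[:k]
    ligand_center = ligand_coords.mean(axis=0, keepdims=True)
    dcc = cross_dist(ligand_center, top).min()   # center-to-center distance
    dca = cross_dist(ligand_coords, top).min()   # center-to-closest-atom distance
    return dcc, dca


# Two-atom toy ligand; its geometric center is (2, 0, 0).
ligand = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
# Predicted centers, already ranked from best to worst.
centers = np.array([[10.0, 0.0, 0.0], [5.0, 0.0, 0.0]])

dcc_top1, dca_top1 = dcc_dca(ligand, centers, k=1)  # 8.0 and 6.0
dcc_top2, dca_top2 = dcc_dca(ligand, centers, k=2)  # 3.0 and 1.0
```

DCA is never larger than DCC for the same set of centers only when the closest atom is nearer than the ligand center, as in this toy case; in general the two metrics can rank predictions differently.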

@allen-xf changed the title from "code implement for DCC and DVO" to "code implement for DCC and DCA" on May 16, 2023
@allen-xf
Author

Thanks for the code. Your paper has also given me a lot of inspiration.

@allen-xf
Author

image
Why does COACH420 have only 348 labels? What causes the program to run abnormally?

@tiantz17
Owner

Hi, can you print the exception message?

@allen-xf
Author

Hi, I did not run the data preprocessing; the output (419, 348) in the figure may be from your earlier run. When I predict on COACH420, I find that only 348 labels are loaded, and you have commented out the import of the labels (https://github.com/tiantz17/PocketAnchor/blob/main/PocketDetection/src/COACH420.py#L30C1-L34).

@allen-xf
Author
allen-xf commented May 20, 2023

I guess you commented out the import of the labels because the dataloader would raise an error if the labels were incomplete. So I would like to know why the data preprocessing yields only 348 labels. I suspect that the code is wrong at the place indicated in the picture, and that the exception was then silently passed.

@tiantz17
Owner

Oh, I see.
The label_dict was generated as training labels for scPDB. COACH420 was only used for testing, not for training. So we did not use the label_dict of COACH420 here.
You can just ignore this.

@tiantz17
Owner

Here I provide the ligand coordinates of COACH420 used for evaluation.

coach_ligand_coord_load.zip

@allen-xf
Author

I see, thanks for your reply.

@allen-xf
Author

When calculating DCC/DCA, if one protein has multiple ligands, is the prediction for the protein counted as successful only if every ligand meets the requirements?

@allen-xf
Author
allen-xf commented May 21, 2023

See devalab/DeepPocket#9 (comment). In DeepPocket, the success rate is computed over the number of pockets, not the number of proteins. Is it the same in your method?

@tiantz17
Owner

When calculating DCC/DCA, if one protein has multiple ligands, is the prediction for the protein counted as successful only if every ligand meets the requirements?

If one protein has n ligands, then the top-(n) or top-(n+2) predicted pocket centers will be used for evaluation. The success rate is defined as the number of successfully predicted pockets divided by the number of total pockets, which was adopted by most methods.
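
A minimal sketch of this pocket-level success rate, under stated assumptions: the function name `dca_success_rate`, the input layout (one `(ligands, pocket_centers)` pair per protein), and the 4.0 distance cutoff with a strict `<` comparison are all assumptions for illustration and are not given in this thread. It uses DCA as the distance; swapping in DCC would only change the per-ligand distance.

```python
import numpy as np

CUTOFF = 4.0  # assumed success threshold; not stated in this thread


def min_dist(a, b):
    """Smallest Euclidean distance between any row of a and any row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).min()


def dca_success_rate(proteins, extra=0, cutoff=CUTOFF):
    """Fraction of ligands (pockets) hit by the top-(n + extra) centers.

    proteins: list of (ligands, pocket_centers) pairs, where ligands is a list
    of (m_i, 3) arrays and pocket_centers is ranked from best to worst.
    """
    hits, total = 0, 0
    for ligands, centers in proteins:
        k = len(ligands) + extra  # top-n (extra=0) or top-(n+2) (extra=2)
        top = np.asarray(centers, dtype=float).reshape(-1, 3)[:k]
        for lig in ligands:
            lig = np.asarray(lig, dtype=float).reshape(-1, 3)
            total += 1
            if len(top) > 0 and min_dist(lig, top) < cutoff:
                hits += 1
    # The denominator is the number of pockets, not the number of proteins.
    return hits / total if total else float("nan")


# Toy data: protein A has two ligands, protein B has one.
protein_a = ([[[0.0, 0.0, 0.0]], [[20.0, 0.0, 0.0]]],
             [[1.0, 0.0, 0.0], [50.0, 0.0, 0.0], [21.0, 0.0, 0.0]])
protein_b = ([[[0.0, 0.0, 0.0]]],
             [[10.0, 0.0, 0.0]])

rate_top_n = dca_success_rate([protein_a, protein_b], extra=0)   # 1 of 3 pockets
rate_top_n2 = dca_success_rate([protein_a, protein_b], extra=2)  # 2 of 3 pockets
```

In this toy case the second ligand of protein A is only recovered once the third-ranked center is admitted, which is the situation the top-(n+2) variant is meant to tolerate.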

@allen-xf
Author

Thanks. I reproduced the results on COACH420.

@allen-xf
Author

I am sorry, my earlier result may have been inaccurate because of some bugs in my code. After the revision, there seems to be a gap between the reproduced results and the results in the paper. Could you please provide the complete prediction code to help me reproduce the results?

@tiantz17
Owner

To make it easier to reproduce the results of our paper, we now provide a docker image containing code, data, environment, trained models, and prediction results.

You can pull it from https://hub.docker.com/r/tiantz17/pocketanchor or run docker pull tiantz17/pocketanchor.

Hope this can help.
