Hi, I may be misunderstanding this, but it looks like the plots are built from only one rank per system (chosen at random), because duplicates are dropped on the "reference" column, which holds the reference PLINDER system_id.
The scores_df keeps only one prediction per system_id (with a rank chosen at random) because of the drop_duplicates call. I checked, and merged_df does in fact have shape (n_test_points, n_columns) instead of (n_test_points * n_ranks, n_columns).
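A minimal sketch of the effect described above, using a toy table rather than the actual make_plots.py code. The column names (reference, rank, score) and all values are assumptions made for illustration. Note that pandas drop_duplicates keeps the first occurrence by default, so which rank survives depends on the row order of the input.

```python
import pandas as pd

# Toy scores table: 2 reference systems x 3 ranked predictions each.
# Column names and values are made up for illustration.
scores_df = pd.DataFrame(
    {
        "reference": ["sys_a"] * 3 + ["sys_b"] * 3,
        "rank": [2, 1, 3, 1, 2, 3],
        "score": [0.4, 0.9, 0.1, 0.7, 0.5, 0.2],
    }
)

# Dropping duplicates on "reference" keeps one row per system
# (the first one encountered, with the default keep="first").
deduped = scores_df.drop_duplicates(subset="reference")

# Dropping duplicates on the (reference, rank) pair keeps every rank.
per_rank = scores_df.drop_duplicates(subset=["reference", "rank"])

print(scores_df.shape)  # (6, 3)
print(deduped.shape)    # (2, 3)
print(per_rank.shape)   # (6, 3)
```

With this toy input, the deduplicated frame has one row per reference system, which matches the (n_test_points, n_columns) shape reported above.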
As a result, data is initialized with this merged_df. I don't think this is the intended behavior, since the data operations take a top_n argument, which doesn't make sense on a merged_df that has only one rank per reference system.
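To illustrate why a top_n argument has no effect once only one rank per system remains, here is a hypothetical top-n success computation. The function top_n_success, the rmsd column, and the 2.0 threshold are assumptions for this sketch and are not taken from the PLINDER code.

```python
import pandas as pd


def top_n_success(df: pd.DataFrame, top_n: int, threshold: float = 2.0) -> float:
    """Fraction of systems with at least one pose under `threshold`
    among the poses ranked 1..top_n (hypothetical metric)."""
    top = df[df["rank"] <= top_n]
    best = top.groupby("reference")["rmsd"].min()
    n_systems = df["reference"].nunique()
    return float((best < threshold).sum() / n_systems)


# Toy predictions: 2 systems x 3 ranks, values made up for illustration.
full = pd.DataFrame(
    {
        "reference": ["sys_a", "sys_a", "sys_a", "sys_b", "sys_b", "sys_b"],
        "rank": [1, 2, 3, 1, 2, 3],
        "rmsd": [3.5, 1.2, 4.0, 5.0, 6.0, 1.8],
    }
)

# One row per system survives, as in the deduplicated merged_df.
deduped = full.drop_duplicates(subset="reference")

full_curve = [top_n_success(full, n) for n in (1, 2, 3)]
deduped_curve = [top_n_success(deduped, n) for n in (1, 2, 3)]

print(full_curve)     # [0.0, 0.5, 1.0]
print(deduped_curve)  # [0.0, 0.0, 0.0]
```

On the full table the success rate grows with top_n, while on the deduplicated table it is flat, because there is nothing beyond the single surviving rank to select from.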
Let me know if I'm missing something! If this is a bug, I could fix it quickly, but I would first need to understand the reasoning behind the drop_duplicates operation.
DreRnc changed the title from "Plots dropping all except ranks except one for plotting" to "Plots dropping all ranks except one for plotting" on Feb 15, 2025
DreRnc changed the title from "Plots dropping all ranks except one for plotting" to "make_plots dropping all ranks except one for plotting" on Feb 15, 2025
Referenced code: plinder/src/plinder/eval/docking/make_plots.py, lines 43 to 65 in d303084