The current binarize function uses a cutoff of 0.5 for binarization:
Lines 28 to 34 in 3e26652
This is an issue for PyKEEN, where the scores that come from a model could all fall in a range such as [-5, -2], so a fixed cutoff of 0.5 would binarize every score to the same value. The current TODO text says to use Youden's J statistic (https://en.wikipedia.org/wiki/Youden%27s_J_statistic), but it's not clear how that would be used.
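One possible reading of the TODO, sketched here as an assumption rather than as what the author intended: treat every distinct score as a candidate cutoff, compute J = TPR - FPR for each, and binarize at the cutoff that maximizes J. The names `youden_threshold` and `binarize_with_youden` are hypothetical and not part of any existing package. Note that this approach needs the true labels in order to pick the cutoff.

```python
import numpy as np


def youden_threshold(y_true, y_score):
    """Return the score cutoff that maximizes Youden's J = TPR - FPR."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = y_true.sum()
    n_neg = (~y_true).sum()
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    best_j, best_t = -np.inf, None
    # Every distinct score is a candidate cutoff; predict positive when score >= cutoff.
    for t in np.unique(y_score):
        pred = y_score >= t
        tpr = (pred & y_true).sum() / n_pos
        fpr = (pred & ~y_true).sum() / n_neg
        j = tpr - fpr
        # Strict comparison keeps the lowest cutoff when several tie on J.
        if j > best_j:
            best_j, best_t = j, t
    return float(best_t)


def binarize_with_youden(y_true, y_score):
    """Binarize scores at the J-maximizing cutoff (hypothetical helper)."""
    cutoff = youden_threshold(y_true, y_score)
    return (np.asarray(y_score) >= cutoff).astype(int)


# Scores entirely inside [-5, -2], as in the PyKEEN case described above.
labels = [0, 0, 1, 1, 0, 1]
scores = [-5.0, -4.5, -2.5, -2.0, -4.0, -3.0]
cutoff = youden_threshold(labels, scores)
binary = binarize_with_youden(labels, scores)
```

Because the cutoff is derived from the scores themselves, it does not matter which range the model's scores fall in, unlike the fixed 0.5.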
As an alternative, the NetMF package implements the following code for constructing an indicator, which might be more applicable (though I don't personally recognize the method, and unfortunately it isn't documented):
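import numpy as np

def construct_indicator(y_score, y):
    # rank the labels by the scores directly
    # np.int was removed in NumPy 1.24, so the builtin int is used instead
    num_label = np.sum(y, axis=1, dtype=int)
    # column indices of each row, sorted by descending score
    y_sort = np.fliplr(np.argsort(y_score, axis=1))
    y_pred = np.zeros_like(y, dtype=int)
    for i in range(y.shape[0]):
        # mark the num_label[i] highest-scoring columns of row i as positive
        for j in range(num_label[i]):
            y_pred[i, y_sort[i, j]] = 1
    return y_pred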
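Reading the code, construct_indicator appears to mark, for each row, the k highest-scoring columns as positive, where k is the number of true labels in that row. It therefore depends only on the ranking of the scores, not on their absolute values. A vectorized sketch of the same idea follows; `top_k_indicator` is a hypothetical name, and it may break ties between equal scores differently from the loop version.

```python
import numpy as np


def top_k_indicator(y_score, y):
    """Mark each row's k highest scores as 1, where k is that row's true label count."""
    y_score = np.asarray(y_score)
    y = np.asarray(y)
    k = y.sum(axis=1).astype(int)            # number of true labels per row
    order = np.argsort(-y_score, axis=1)     # columns sorted by descending score
    ranks = np.empty_like(order)
    rows = np.arange(y.shape[0])[:, None]
    # ranks[i, c] is the position of column c in row i's descending ordering.
    ranks[rows, order] = np.arange(y.shape[1])[None, :]
    return (ranks < k[:, None]).astype(int)


# All scores lie in [-5, -2]; a fixed 0.5 cutoff would mark nothing as positive.
y_score = np.array([[-4.0, -2.1, -3.0],
                    [-2.5, -4.9, -3.3]])
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = top_k_indicator(y_score, y_true)
```

In this example the first row has two true labels, so its two best-scoring columns are selected; the second row has one, so only its best-scoring column is selected.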