Improve binning in binarize()

The current binarize function uses a cutoff of 0.5 for binarization:

Lines 28 to 34 in 3e26652

    
    def metric_wrapper(*args, **kwargs):
        # TODO: Move to optimal binning. Youden’s J statistic.
        y_score = args[1]
        y_score[y_score < 0.5] = 0
        y_score[y_score >= 0.5] = 1
        score = metric(*args, **kwargs)
        return score

This is an issue for PyKEEN, where the scores that come from a model could all lie in the range [-5, -2]. With a fixed cutoff of 0.5, every such score would be binarized to 0. The current TODO text says to use Youden's J statistic (https://en.wikipedia.org/wiki/Youden%27s_J_statistic), but it's not clear how that would be used.
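For reference, here is a rough sketch of how a Youden's J cutoff could work. Youden's J is TPR - FPR, so the idea is to try each observed score as a candidate threshold and keep the one that maximizes J. The names `youden_threshold` and `binarize_youden` are hypothetical and not part of the existing code, and the example scores are made up. Note that choosing the threshold from the same labels being evaluated will give optimistic metrics, so this may need a separate validation split.

```python
import numpy as np


def youden_threshold(y_true, y_score):
    """Return the score cutoff that maximizes Youden's J = TPR - FPR."""
    y_true = np.asarray(y_true).astype(bool).ravel()
    y_score = np.asarray(y_score, dtype=float).ravel()
    n_pos = y_true.sum()
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    best_j, best_t = -np.inf, None
    # Every distinct observed score is a candidate threshold.
    for t in np.unique(y_score):
        pred = y_score >= t
        tpr = np.sum(pred & y_true) / n_pos
        fpr = np.sum(pred & ~y_true) / n_neg
        j = tpr - fpr
        if j > best_j:
            best_j, best_t = j, t
    return best_t


def binarize_youden(y_true, y_score):
    """Binarize at the J-maximizing cutoff without mutating the input array."""
    threshold = youden_threshold(y_true, y_score)
    return (np.asarray(y_score) >= threshold).astype(int)


# Made-up scores on a PyKEEN-like scale, all far below 0.5.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([-4.8, -4.1, -2.5, -2.2, -3.9, -3.0])
print(youden_threshold(y_true, y_score))
print(binarize_youden(y_true, y_score))
```

Unlike the fixed 0.5 cutoff, this adapts to whatever scale the scores are on, and it returns a new array instead of overwriting `args[1]` in place.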

As an alternative, the NetMF package implements the following code for constructing an indicator that might be more applicable (though I don't personally recognize what method this is, and unfortunately it's not documented):

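import numpy as np

def construct_indicator(y_score, y):
    # rank the labels by the scores directly
    # (np.int was removed in NumPy 1.24; the builtin int is used here instead)
    num_label = np.sum(y, axis=1, dtype=int)
    # column indices of each row, ordered from highest to lowest score
    y_sort = np.fliplr(np.argsort(y_score, axis=1))
    y_pred = np.zeros_like(y, dtype=int)
    for i in range(y.shape[0]):
        # mark the top-k columns of row i, where k is its number of true labels
        for j in range(num_label[i]):
            y_pred[i, y_sort[i, j]] = 1
    return y_pred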
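Reading the code, it appears to mark, for each row, the k highest-scoring labels as positive, where k is the number of true labels in that row. That means it needs no fixed cutoff and is insensitive to the scale of the scores, but it does require the true label counts at prediction time. Below is a loop-free sketch of what I believe is the same behavior; `top_k_indicator` is a hypothetical name and the example data are made up.

```python
import numpy as np


def top_k_indicator(y_score, y):
    """Per row, set the k highest-scoring columns to 1, k = number of true labels."""
    y = np.asarray(y)
    y_score = np.asarray(y_score)
    k = y.sum(axis=1).astype(int)
    # Column indices ordered from highest to lowest score (same as the loop version).
    order = np.fliplr(np.argsort(y_score, axis=1))
    # argsort of a permutation is its inverse: ranks[i, c] is the rank of column c.
    ranks = np.argsort(order, axis=1)
    return (ranks < k[:, None]).astype(int)


# Two samples, four labels; all scores are negative.
y = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0]])
y_score = np.array([[-2.1, -2.5, -3.0, -4.0],
                    [-5.0, -2.5, -3.5, -4.5]])
print(top_k_indicator(y_score, y))
```

In the first row the two true labels are columns 0 and 2, but the two highest scores are in columns 0 and 1, so the prediction contains one error even though the number of predicted positives always matches the number of true positives.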
