Handle multiclass precision/recall by montanalow · Pull Request #152 · smartcorelib/smartcore · GitHub

Handle multiclass precision/recall #152

Merged
merged 4 commits into from
Sep 13, 2022

Conversation

montanalow
Collaborator
@montanalow montanalow commented Sep 13, 2022

This implements micro averaging for multiclass precision/recall to handle #151. I believe this is a good default, since it handles imbalanced labels better than macro averaging (which is also not yet implemented).
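For reference, micro averaging pools true positives, false positives, and false negatives across all classes before dividing. Below is a minimal sketch of that idea, not the code in this PR. It assumes labels are `u32` class ids (smartcore's metrics take floating-point labels), and `Counts`, `class_counts`, `micro_precision`, and `micro_recall` are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// Per-class true positives, false positives, and false negatives.
#[derive(Default, Clone, Copy)]
struct Counts {
    tp: usize,
    fp: usize,
    fn_: usize,
}

/// Tally per-class counts for single-label multiclass predictions.
/// Labels are assumed to be integer class ids (an assumption of this sketch).
fn class_counts(y_true: &[u32], y_pred: &[u32]) -> HashMap<u32, Counts> {
    assert_eq!(y_true.len(), y_pred.len());
    let mut counts: HashMap<u32, Counts> = HashMap::new();
    for (&t, &p) in y_true.iter().zip(y_pred) {
        if t == p {
            counts.entry(t).or_default().tp += 1;
        } else {
            // A wrong prediction is a false positive for the predicted class
            // and a false negative for the true class.
            counts.entry(p).or_default().fp += 1;
            counts.entry(t).or_default().fn_ += 1;
        }
    }
    counts
}

/// Micro-averaged precision: sum(TP) / (sum(TP) + sum(FP)) over all classes.
fn micro_precision(y_true: &[u32], y_pred: &[u32]) -> f64 {
    let counts = class_counts(y_true, y_pred);
    let tp: usize = counts.values().map(|c| c.tp).sum();
    let fp: usize = counts.values().map(|c| c.fp).sum();
    if tp + fp == 0 { 0.0 } else { tp as f64 / (tp + fp) as f64 }
}

/// Micro-averaged recall: sum(TP) / (sum(TP) + sum(FN)) over all classes.
fn micro_recall(y_true: &[u32], y_pred: &[u32]) -> f64 {
    let counts = class_counts(y_true, y_pred);
    let tp: usize = counts.values().map(|c| c.tp).sum();
    let fn_: usize = counts.values().map(|c| c.fn_).sum();
    if tp + fn_ == 0 { 0.0 } else { tp as f64 / (tp + fn_) as f64 }
}
```

With `y_true = [0, 0, 0, 0, 1, 2]` and `y_pred = [0, 0, 0, 0, 2, 1]`, both functions return 4/6. In the single-label case every false positive for one class is a false negative for another, so micro precision and micro recall coincide with accuracy.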

Additionally, the documentation for to_f32_bits is incorrect, and the function can cause unexpected behavior because it truncates half the bits of f64 values. This breaks things like the uniqueness check I'm using to count the number of distinct labels. I'd also be open to better ways to count distinct floating-point values in Rust, but I haven't seen an easy library function that handles this (with NaNs, etc.). I've added to_f64_bits to provide the behavior I expected.
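To illustrate why truncation matters for a uniqueness check, here is a minimal sketch using only the standard library. It assumes the truncation behaves like a plain `to_bits() as u32` cast, which keeps the low 32 bits. That assumption has not been checked against the smartcore source, and `truncated_bits`, `canonical_bits`, `count_distinct`, and `count_distinct_truncated` are hypothetical helpers, not smartcore APIs.

```rust
use std::collections::HashSet;

/// Assumed shape of the truncation: keep only the low 32 bits of the f64 pattern.
fn truncated_bits(x: f64) -> u32 {
    x.to_bits() as u32
}

/// Full 64-bit key, with all NaNs and both zeros collapsed to one pattern each
/// so that they count as a single distinct value.
fn canonical_bits(x: f64) -> u64 {
    if x.is_nan() {
        f64::NAN.to_bits()
    } else if x == 0.0 {
        0.0_f64.to_bits()
    } else {
        x.to_bits()
    }
}

/// Count distinct values using the full bit pattern as the hash key.
fn count_distinct(values: &[f64]) -> usize {
    values
        .iter()
        .map(|v| canonical_bits(*v))
        .collect::<HashSet<u64>>()
        .len()
}

/// Same count, but keyed on the truncated 32-bit pattern.
fn count_distinct_truncated(values: &[f64]) -> usize {
    values
        .iter()
        .map(|v| truncated_bits(*v))
        .collect::<HashSet<u32>>()
        .len()
}
```

Small integer-valued labels such as 0.0, 1.0, and 2.0 have all-zero low 32 bits in their f64 representation, so under this assumed truncation `count_distinct_truncated(&[0.0, 1.0, 2.0, 1.0])` reports 1 distinct label while `count_distinct` reports 3.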

@Mec-iS
Collaborator
Mec-iS commented Sep 13, 2022

thanks for your contribution!

This looks great. We would like to have your opinion on some major changes to how the lib manages arrays, available at PR #108.

@Mec-iS Mec-iS requested a review from morenol September 13, 2022 09:38
Collaborator
@morenol morenol left a comment

LGTM

@Mec-iS Mec-iS merged commit 2e5f88f into smartcorelib:development Sep 13, 2022
@montanalow
Collaborator Author
montanalow commented Sep 13, 2022

> This looks great, we would like to have your opinion about some major changes on how the lib manages arrays, available at PR#108

Overall, the goals for #108 look great! For context on my opinions, I'm working on https://postgresml.org/. We're currently implemented as a Python wrapper, but we're undertaking an effort to port everything to Rust, which is possible in no small part thanks to smartcore. Raw performance (not copying data) is the primary reason.

  • Avoiding copy operations is a win for performance, but an even bigger win for capability, because our models are often memory bound during data cleaning rather than compute or memory bound during training/inference. Being able to do more of that cleaning in Rust rather than SQL would be great.
  • Supporting multiple types in a Matrix would also help organize the data cleaning steps, allowing us to support more feature types. We currently operate as if everything were f32, but would eventually like to handle Strings, Enums, and Ints efficiently and automatically as categorical variables.
  • Having the flexibility to use things like 8-bit ints for large model weights will be interesting as we start to push the limits. See https://huggingface.co/blog/hf-bitsandbytes-integration
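
As a rough illustration of the 8-bit idea in the last bullet, here is a minimal sketch of plain absmax quantization of f32 weights to i8. It is not taken from smartcore or from the linked post's library, and `quantize_absmax` and `dequantize` are hypothetical names.

```rust
/// Quantize f32 weights to i8 with a single absmax scale factor.
/// Returns the quantized values and the scale needed to map them back.
fn quantize_absmax(weights: &[f32]) -> (Vec<i8>, f32) {
    // Largest magnitude in the slice; 0.0 for an empty or all-zero slice.
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    // Map the largest magnitude to 127; fall back to 1.0 to avoid dividing by zero.
    let scale = if max_abs > 0.0 { 127.0 / max_abs } else { 1.0 };
    let quantized = weights
        .iter()
        .map(|w| (w * scale).round() as i8)
        .collect();
    (quantized, scale)
}

/// Recover approximate f32 weights from the quantized values.
fn dequantize(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&q| q as f32 / scale).collect()
}
```

Each weight then takes one byte instead of four, at the cost of a rounding error of at most half a quantization step per value.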

morenol pushed a commit that referenced this pull request Nov 8, 2022
* handle multiclass precision/recall
3 participants