[BUG] KNeighborsTimeSeriesClassifier throw an OOM error when it should not #5914
Comments
I think this is due to the distance matrix computed internally - that is of size 250,000 times 250,000. Internally, |
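For a sense of scale, here is a rough estimate of what a dense matrix of that size would need. The 8 bytes per entry (float64) is an assumption, since the thread does not state the dtype, and `dense_matrix_bytes` is a hypothetical helper written for this illustration.

```python
# Back-of-the-envelope memory estimate for a dense pairwise distance matrix.
# Assumption: one float64 (8 bytes) per entry; the thread does not state the dtype.


def dense_matrix_bytes(n_samples: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to store a dense n_samples x n_samples matrix."""
    return n_samples * n_samples * bytes_per_entry


n = 250_000
total = dense_matrix_bytes(n)

print(f"{total / 1e9:.0f} GB")      # 500 GB
print(f"{total / 2**30:.1f} GiB")   # 465.7 GiB
```

Under that assumption the matrix alone is several orders of magnitude larger than training data measured in MBs, which would explain the OOM.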
To reduce the size of the matrix, you could try You say you tried other libraries where you do not get OOM - which, if I may ask? We could simply interface these. |
thx for the explanation and suggestion. I tried |
But that's a different algorithm - k-means, not knn. Anyway, I fixed up a version where a callable is passed to sklearn, not a distance matrix, so and different algorithms should be available: What would be appreciated is testing, would you be able to check whether #5937 is less memory hungry? Works only for |
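To illustrate the general idea of working from a distance callable rather than a precomputed matrix, here is a minimal pure-Python sketch. It is not the code from #5937 and not the sktime or sklearn API; `euclidean` and `predict_1nn` are hypothetical names, and plain Euclidean distance stands in for whatever time series distance is actually used.

```python
import math
from typing import Callable, List, Sequence

Series = Sequence[float]


def euclidean(a: Series, b: Series) -> float:
    """Stand-in distance; any callable taking two series would work here."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_1nn(
    X_train: List[Series],
    y_train: List[str],
    X_test: List[Series],
    distance: Callable[[Series, Series], float] = euclidean,
) -> List[str]:
    """1-nearest-neighbour prediction without storing a full distance matrix."""
    predictions = []
    for query in X_test:
        # Only one row of distances (len(X_train) floats) is alive at a time,
        # instead of a len(X_test) x len(X_train) matrix.
        dists = [distance(query, ref) for ref in X_train]
        best = min(range(len(dists)), key=dists.__getitem__)
        predictions.append(y_train[best])
    return predictions


X_train = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
y_train = ["a", "a", "b"]
X_test = [[0.2, 0.1, 0.0], [4.0, 5.0, 6.0]]

preds = predict_1nn(X_train, y_train, X_test)
print(preds)  # ['a', 'b']
```

The trade-off in this sketch is that every distance is recomputed on demand, so peak memory grows with the number of training samples rather than with its square, at the cost of not being able to reuse a cached matrix.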
Also, have you tried We've also been trying to add interfaces to Again, testing & report back would be much appreciated! |
Also for testing, here is an interface to the Having these all in the (I already see that it has a different set of distances to choose from...) |
mmm interesting! Thank you for posting that! I was digging into the sklearn kNN classifier and found that potentially you can pass something like a graph to avoid the OOM and the massive matrix. Perhaps KNeighborsTransformer which uses |
I will test that PR asap |
Thanks! Btw, there's a runtime profiler in Given that your problem is memory - do you know of a quick way to profile memory? This could be added as a feature, currently the |
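On the memory profiling question, one option from the Python standard library is `tracemalloc`. The sketch below is only an illustration: `peak_memory_bytes` and `build_square` are hypothetical helpers, not part of sktime, and `tracemalloc` only sees allocations made through Python's memory allocators, so memory allocated directly by native code may not be counted.

```python
import tracemalloc


def peak_memory_bytes(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) since start().
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_square(n):
    """Toy workload: an n x n nested list of floats, like a small dense matrix."""
    return [[0.0] * n for _ in range(n)]


_, peak = peak_memory_bytes(build_square, 500)
print(f"peak traced memory: {peak / 1e6:.1f} MB")
```

Wrapping a `fit` or `predict` call in such a helper on a small subsample would give a first indication of how memory scales before trying the full 250k samples.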
Hm, interesting idea. It does not work with a precomputed matrix, and needs a callable like the one we pass in #5937. That might work, but as said it would require passing a callable in the first place, and doing that could fix your problem without resorting to |
did you manage to check? |
I got |
Here's the one from So, in the end you could try any of the following:
Some questions:
As already mentioned, testing would be appreciated. |
@srggrs, let us know if any of the three KNN options released with 0.26.1 solves the issue (pyts, tslearn, and the new parameter to the sktime native one) |
Describe the bug
Fitting a standard KNeighborsTimeSeriesClassifier to a dataset of 250k samples will throw an OOM error even though the memory usage of the training data is very small (in the order of MBs). The error is this
To Reproduce
Set up a python >= 3.8 env and install sktime
Expected behavior
It should train without throwing a memory error
Additional context
I tried other libraries that offer timeseries clustering for python with the same algorithm (kNN) and there is no problem in fitting over this dataset.
Versions
latest