examples/1d/classif_keras.py is misleading · Issue #1040 · kymatio/kymatio · GitHub

examples/1d/classif_keras.py is misleading #1040

Open

kkm000 opened this issue Nov 12, 2023 · 3 comments

kkm000 commented Nov 12, 2023

ds = deeplake.load("hub://activeloop/spoken_mnist", verbose=False)
ds = ds[:200]

The dataset is arranged as digit{0..9}, then speaker{1..6}, then utterance{1..50}, i.e., in the first 50 records speaker 1 calls out 'zero', in the next 50 speaker 2 says 'zero', etc. Since there are 6 speakers, 'one' first occurs at record 300.

The first 200 records represent 4 speakers (out of 6), all pronouncing 'zero'.
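
A subsample that actually covers all ten digits is easy to build from this layout; a minimal sketch (it assumes deeplake accepts a list of indices, which should be verified):

import deeplake

# Layout from above: digit-major, 6 speakers x 50 utterances = 300
# consecutive records per digit. Stride 15 picks 20 records per digit,
# spread over all 6 speakers, for the same total of 200.
records_per_digit = 6 * 50
stride = records_per_digit // 20
indices = [d * records_per_digit + s
           for d in range(10)
           for s in range(0, records_per_digit, stride)]

ds = deeplake.load("hub://activeloop/spoken_mnist", verbose=False)
ds = ds[indices]   # assumes deeplake accepts a list of indices here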

In this example we use the 1D scattering transform to represent spoken digits,
which we then classify using a simple classifier. This shows that 1D scattering
representations are useful for this type of problem.

x_out = layers.Dense(10, activation='softmax')(x)

It's a bit hard to see how this is a classification problem. Yet the description reads, and the network structure suggests, as if 10 digits are being classified, not that the model is trained, validated, and evaluated on a single label out of 10.

Results:
Training accuracy = 99.7%
Testing accuracy = 98.0%

Accuracy of 100% (the actual number on the held-out set after either 50 or 35 epochs) should always be suspicious.
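
A two-line sanity check would have caught this (the tensor name labels is assumed from the hub dataset's schema):

import numpy as np

# Tensor name `labels` is an assumption; adjust to the dataset's schema.
labels = ds.labels.numpy().squeeze()
values, counts = np.unique(labels, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
# With ds = ds[:200] this prints {0: 200}: one class only, so even a
# constant predictor scores 100%.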


There are extremely serious problems with the TensorFlow implementation (I cannot say anything about other frameworks) which make Kymatio non-productizable right from the start, and likely slower than a proper implementation should be. The generated serialized computation graph from the same example's model is ≈200MB in size and consists of 337 identical traces, one per nested-loop iteration in scattering1d.py.
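
The size is easy to measure; a sketch (model stands for the Keras model built in the example, and the input shape below is an assumption for illustration):

import tensorflow as tf

T = 2 ** 13                      # hypothetical input signal length
concrete = tf.function(model).get_concrete_function(
    tf.TensorSpec([1, T], tf.float32))
graph_def = concrete.graph.as_graph_def()
print(len(graph_def.node), "nodes,",
      graph_def.ByteSize() // 2 ** 20, "MiB serialized")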

Kymatio integrates the construction of wavelet filter banks in 1D, 2D, and 3D, as well as memory-efficient algorithms for extracting wavelet scattering coefficients, under a common application programming interface.

Having all loops unrolled into a 200MB executable graph hardly qualifies as a “memory-efficient” computation in my book, but that's just my opinion. In the eye of this beholder, the README's tenor sets expectations unreasonably high.

I'll share more detail when I'm finished tearing it apart to get a working TensorFlow 1D implementation of the "classic" Andén et al. 2014 scattering transform; give me a few weeks. All I can say now is that the structure of the code is so Byzantine and, unfortunately, assumes nothing at all about TensorFlow, so that I, personally, wouldn't dare attempt to salvage the codebase and then take it up to its declared scope (i.e., the {1,2,3}D cases with multiple frameworks). Redesigning it would require much more intimate knowledge of all supported frameworks, which I simply don't have.

MuawizChaudhary (Collaborator) commented

Hi, sorry for the long delay in responding. I'd like to look into this issue a bit more. Thanks for the information and details on the spoken digits classification.

Can you give me more details about the TensorFlow stack trace? How did you get this trace? Why is the computation graph so large? What is your suggestion for solving this, just in the case of the 1D TensorFlow implementation?

kkm000 (Author) commented May 20, 2024

Hi @MuawizChaudhary, thanks for looking into it. There are in fact multiple issues. You can immediately look at the incorrect selection from the Spoken MNIST dataset: only one digit, by a few speakers, is ever used. I have a hunch that someone added ds = ds[:200] to the sample code while debugging, because the transform takes pretty much forever (the full dataset took close to an hour to transform; of course, I stored the features). The little single-layer network from the 1D example totally fails to converge into a 10-digit classifier; perhaps it needs beefing up.

As for TensorFlow, you may simply save the model with the tf.saved_model API. The saved file will take a good half a gig. The problem is that TF has two kinds of variables: tensors and normal Python variables. Keras internally traces all model layers. For a trace to be reused, a few conditions must be satisfied; for example, tensor shapes must be compatible. Some conditions absolutely prohibit even checking for trace compatibility. One of them is, unfortunately, the use of generators, and this is the core of the Kymatio implementation: every layer receives its filters via a yield as a Python data structure. Another restriction is that the XLA compiler may only compile Python code into a TF graph if all functions are traced within one compilation unit, so even returning arrays instead of the yield from another module didn't help. Some patterns, such as appending a value of any kind (scalar, object, tensor, anything) to a Python list, are also non-traceable (although this may no longer be true starting with experimental support in TF 2.13).
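
A toy illustration, not Kymatio's actual code, of how Python-side inputs defeat trace reuse:

import tensorflow as tf

@tf.function
def apply_filters(x, filters):          # `filters` is a plain Python list
    for f in filters:                   # unrolled at trace time
        x = x * f
    return x

x = tf.ones([4])
apply_filters(x, [1.0, 2.0])
apply_filters(x, [1.0, 3.0])            # different Python values: new trace
print(apply_filters.experimental_get_tracing_count())   # 2, not 1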

Generally, at runtime, tensors don't exist as instances of anything, except constants and TensorArray, Dataset and similar N-dim objects, which are designed to contain data; a regular tensor is a logical abstraction describing the input/output shapes of an op. Ops are graph vertices and are real sets of pre-written kernels optimized for different devices, but tensors are edges which only convey shapes. In the TF model, "reallocating" or "extending" a tensor does not even have a sensible definition; changing a shape mid-graph may have graph-wide repercussions.

As a simplified example, appending to a list (assuming that's possible, for the sake of the example) in a fixed-bound loop of final size, say, 100, is traced into 100 vertices, each concatenating an element to the end of a fixed TensorArray object; each of the 100 ops has fixed input and output shapes. Unfortunately, inter-op fusion is a much harder, generally undecidable problem, so TF's optimization capabilities are limited. And Python is not the easiest language to translate into rigorously proven data structures and flows, indeed! :)
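
A sketch of the contrast (toy code, nothing from Kymatio): the Python loop is unrolled into a hundred vertices at trace time, while an AutoGraph tf.range loop writing into a TensorArray becomes a single while-loop vertex with a small nested body:

import tensorflow as tf

@tf.function
def unrolled():
    xs = []
    for i in range(100):                 # Python loop: unrolled when traced
        xs.append(tf.fill([2], float(i)))
    return tf.stack(xs)                  # hundreds of graph vertices

@tf.function
def rolled():
    ta = tf.TensorArray(tf.float32, size=100)   # fixed bound, fixed shapes
    for i in tf.range(100):              # AutoGraph: one while-loop vertex
        ta = ta.write(i, tf.fill([2], tf.cast(i, tf.float32)))
    return ta.stack()

g_unrolled = unrolled.get_concrete_function().graph.as_graph_def()
g_rolled = rolled.get_concrete_function().graph.as_graph_def()
print("unrolled:", len(g_unrolled.node), "nodes")   # hundreds
print("rolled:  ", len(g_rolled.node), "nodes")     # a handful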

As you probably remember, there are 2 nested fixed loops, over layers and then over filter scales. Tracing these loops is possible in a regular language automaton (no Python code is executed twice during tracing, so it's always finite), so by the end all iteration counts, all filters, etc. are known and hardcoded, at the price of a massive and extremely slow, non-parallelizable graph. The whole body of the inner loop is traced and emitted into the graph anew on every iteration.

I shall get back to you in about two weeks; I'm near the finish line of quite a project. Then we'll be able to discuss the issue. First, I removed all but the NumPy and TF support and all but the 1D "classic" (non-JTF) code, then reworked it for TensorFlow compatibility, which required storing additional precomputed constants, including the index of the output coefficient that will eventually be stored: there is no memory reallocation at runtime, or at the least it's very expensive.

What worries me is that every framework has its own quirks, and it may be an excruciatingly hard problem to implement the 1D, 2D, and 3D transforms using multiple backends in an idiomatic and efficient way.

In the end I gave up and implemented the transform in C++, as I was not happy with its performance, although I had improved it possibly 50-fold. The highly branching tree with a variable number of branches is not an easy data shape to build efficient algorithms around, unfortunately. The DFS implemented in Kymatio, unlike the BFS in Andén's original 2014 Matlab code, is also the worst, purely sequential strategy for modern CPUs and especially GPUs.
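
To make the point concrete, here is a toy sketch, not Kymatio's data structures: BFS groups all frequency-decreasing paths of one scattering order into one level, and each level is a single batchable unit of work, whereas DFS walks one root-to-leaf path at a time:

J = 8  # number of scales; illustrative value

def bfs_levels(max_order):
    level = [(j,) for j in range(J)]            # first-order paths
    for order in range(1, max_order + 1):
        yield order, level                      # batchable unit of work
        # frequency-decreasing paths: children only at coarser scales
        level = [p + (j,) for p in level for j in range(p[-1] + 1, J)]

for order, paths in bfs_levels(2):
    print(f"order {order}: {len(paths)} paths -> one batched FFT")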

I'm sorry, I won't have time in the next two weeks, let's make it three just in case, but I believe the general architecture of the project is definitely worth discussing!

MuawizChaudhary (Collaborator) commented May 20, 2024

Hi @kkm000,

Thanks for the detailed reply. Indeed, there are two separate issues to be resolved here: (1) the error in the example, and (2) the inefficiency of the implementation in graph format.

On 1: I'll try to see if I or someone else has time in the next couple of weeks to resolve the issue. I'm not experienced with 1D audio or scattering, so preferably someone else would be able to provide a fix.

On 2: yes, I agree this is an issue that will take much time to resolve. Whenever you are done with your deadline, let's discuss more. #735 should be where we discuss this issue. I'll ping you in approximately a month from now.

EDIT: Good luck with your deadline!
