Description
I found your work very interesting, but when we applied it in practice, the results did not align with those presented in the paper. Before drawing any conclusions, I would like to check whether we might be using the method incorrectly.
For example, we used the human cortex dataset from your paper as a test case: we fed the raw count matrix directly into the embedding pipeline and followed the tutorial provided on GitHub, attempting to replicate the spatial domain detection task described in the paper. Unfortunately, as shown in the attached results, the performance was significantly worse than expected, let alone matching or exceeding that of stLearn or SpaGCN. Could you confirm whether this is the expected performance of the method, or whether there might be an issue with our usage?
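For reference, this is roughly how we scored the predicted domains; it is only a sketch of our evaluation, with the embedding and clustering step from your GitHub tutorial omitted. The file name and the `.obs` column names (`layer_guess` for the manual annotation, `pred_domain` for our clustering output) reflect our local copy of the data and may differ from yours:

```python
import scanpy as sc
from sklearn.metrics import adjusted_rand_score

# Load one LIBD DLPFC sample (file name / columns are from our local copy; adjust as needed).
# The matrix contains raw counts, which we fed to the pipeline as-is.
adata = sc.read_h5ad("DLPFC_151673.h5ad")

# Drop spots without a manual layer annotation before scoring.
adata = adata[~adata.obs["layer_guess"].isna()].copy()

# "pred_domain" holds the spatial domains produced by the tutorial pipeline
# (that step is omitted here); we score it against the manual layer labels.
ari = adjusted_rand_score(adata.obs["layer_guess"], adata.obs["pred_domain"])
print(f"ARI vs. manual layer annotation: {ari:.3f}")
```

If a different evaluation protocol was used in the paper, please let us know and we will rerun it accordingly.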
Secondly, the paper mentions that the model was fine-tuned before being compared with unsupervised methods such as SpaGCN, which seems like an unfair comparison. Additionally, results are reported for only two LIBD samples. Were the other samples used for training, or were these two simply the ones that performed reasonably well?
Lastly, the paper states that different experts were used to handle different platforms. However, the code does not seem to reflect this. While the model parameters include a platform dictionary, the input does not appear to specify platform settings explicitly. Could you clarify how the platform should be set in the implementation?
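To make that question concrete, is the intended usage something along the lines of the sketch below, where a platform key is passed explicitly to select the corresponding expert, or is the platform inferred automatically from the input? The class, method, and argument names here are placeholders for illustration only, not your actual API:

```python
# Placeholder names only -- this is the usage we are guessing at, not the real interface.
platform_dict = {"Visium": 0, "Stereo-seq": 1, "Slide-seq": 2}  # hypothetical mapping

model = EmbeddingModel(platform_dict=platform_dict)  # "EmbeddingModel" is a stand-in name
# Is an explicit per-sample platform key like this expected somewhere,
# or is the expert selected automatically from the input data?
embedding = model.encode(adata, platform="Visium")
```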
I appreciate your time and look forward to your response.