Access to raw (non-normalized) data #50

gwaybio · 2024-12-13T22:45:27Z

@roshankern - would you have a way to provide access to the raw features?

I believe these would be the datasets located in the following folder, which is currently ignored: https://github.com/WayScience/mitocheck_data/blob/main/.gitignore#L15

cc @hwarden162

gwaybio · 2024-12-15T15:13:58Z

I've been in touch with @roshankern (Roshan, please feel free to clarify anything), and we do have access to this data 🎉 It is about 45GB. Therefore, @hwarden162, here's my proposal:

You go ahead and start working on the aggregated real-world example.
In the meantime, let us know exactly the transformations that you're applying. IIRC, you have implemented this in R, so perhaps this is as simple as pointing us to this function.
Roshan, Hugh (or someone else in the lab) can develop a crude Python implementation. This will be the first step toward a stable pycytominer-based implementation.
Roshan, with access to maple, can apply the feature transformation to the 45GB single-cell dataset.
Roshan, again through maple, can process the new, transformed data and generate a new dataset of the same format as those in https://github.com/WayScience/mitocheck_data/tree/main/3.normalize_data/normalized_data called training_data__ic__rotationalvariancetransform.csv.gz. This would be the final deliverable, Roshan, for you to be included as an author on the paper Hugh is working on. We only need to do this for the CellProfiler features.
[Optional] We will pass this new dataset through our LOIO evaluation. Roshan, you can also take this on. Otherwise, Greg will run this. We are testing the hypothesis that transforming the rotationally variant texture CP features improves LOIO performance.
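For the step that applies the transformation to the 45GB single-cell dataset, here is a minimal sketch of one way to stream the file so it never has to fit in memory. It assumes the data is a gzipped CSV (as the deliverable's `.csv.gz` name suggests); `transform_file` and `placeholder_transform` are hypothetical names, and the placeholder just returns each row unchanged because the real transformation has not been specified in this thread yet.

```python
import csv
import gzip
from typing import Callable, Dict

Row = Dict[str, str]


def placeholder_transform(row: Row) -> Row:
    """Hypothetical stand-in for the real feature transformation (identity)."""
    return row


def transform_file(
    src: str,
    dst: str,
    transform: Callable[[Row], Row] = placeholder_transform,
) -> int:
    """Stream a gzipped CSV row by row, apply `transform`, and write a gzipped CSV.

    Returns the number of data rows written.
    """
    n_rows = 0
    with gzip.open(src, "rt", newline="") as fin, gzip.open(dst, "wt", newline="") as fout:
        reader = csv.DictReader(fin)
        # Reuse the input header so the output keeps the same column layout.
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        for row in reader:
            writer.writerow(transform(row))
            n_rows += 1
    return n_rows
```

Values are read as strings here, so a real transform would need to parse the CellProfiler columns to floats before doing any arithmetic.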

roshankern · 2024-12-15T20:04:25Z

This makes sense to me. @hwarden162 let me know if you are able to develop the crude Python implementation or if some version of this already exists!

gwaybio · 2024-12-16T16:30:09Z

For some additional context, we're hosting a rotation student next term, so ideally Roshan is able to complete his part before Feb 1.

hwarden162 · 2024-12-17T09:50:05Z

This looks really good, thank you! I will write up the transformations and the extra features I am suggesting be blocklisted, and will then update you here.

hwarden162 · 2024-12-30T22:49:44Z

Back at work today after the Christmas break. Looking into this, is there a way for me to get a csv of the first ~10 rows of the data (or failing this the column names and preferably their dtype)? Having this would let me deliver code to transform the data with the minimum likelihood of errors.

As a side note, is there a preference on if this transformation is in Python or R? I only ask because my data manipulation is a lot better in R, so R would be easier for me if it also works for you, but I can provide either if there are external factors that favor one over the other.

If I can get access to the head of the data then the turnaround on the code should be very fast. Thanks.
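In case it helps whoever pulls the sample, here is a minimal sketch of extracting the header plus the first ~10 rows without reading the whole file. It assumes the source is a gzipped CSV; `write_head` and its arguments are hypothetical names, not part of any existing tool in the repository.

```python
import csv
import gzip
from itertools import islice


def write_head(src_gz: str, dst_csv: str, n_rows: int = 10) -> int:
    """Copy the header plus the first n_rows data rows to a small plain CSV.

    Returns the number of data rows written.
    """
    with gzip.open(src_gz, "rt", newline="") as fin, open(dst_csv, "w", newline="") as fout:
        # islice stops after n_rows + 1 records, so the rest of the file is never read.
        rows = list(islice(csv.reader(fin), n_rows + 1))
        csv.writer(fout).writerows(rows)
    return max(len(rows) - 1, 0)
```

The column names are then simply the first line of the output file.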

roshankern · 2024-12-31T18:05:20Z

> Looking into this, is there a way for me to get a csv of the first ~10 rows of the data (or failing this the column names and preferably their dtype)?

I have uploaded the relevant head of the illumination-corrected training data here. I believe that we will also need the transform applied to the illumination-corrected negative control data that we use to normalize the training data, so I have uploaded the head of that data here as well.

The data frame structures are nearly identical, but the training data also has the Mitocheck-assigned phenotypic class and object outlines. The general data frame column structures are:

Metadata columns (str, int)
CellProfiler feature columns prefixed with CP__ (int, float)
DeepProfiler feature columns prefixed with DP__ (float)

Given that the transform is only relevant for CellProfiler features, we can ignore the DeepProfiler features.
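To make the column layout above concrete, here is a small sketch that groups column names by the prefixes described. It assumes every column without a `CP__` or `DP__` prefix is metadata; `split_columns` and the example column names in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List


def split_columns(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Group column names into metadata, CellProfiler, and DeepProfiler lists."""
    groups: Dict[str, List[str]] = {
        "metadata": [],
        "cellprofiler": [],
        "deepprofiler": [],
    }
    for name in columns:
        if name.startswith("CP__"):
            groups["cellprofiler"].append(name)
        elif name.startswith("DP__"):
            groups["deepprofiler"].append(name)
        else:
            # Assumption: anything without a feature prefix is metadata.
            groups["metadata"].append(name)
    return groups
```

For example, `split_columns(["Metadata_Plate", "CP__feature_a", "DP__feature_b"])["cellprofiler"]` gives the list of columns the transform would need to touch, and the DeepProfiler columns can be passed through unchanged or dropped.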

> As a side note, is there a preference on if this transformation is in Python or R?
We would definitely prefer this transformation in Python for easier integration into this and the pycytominer project.

Thank you for your help with this! Let me know if anything is unclear or more resources would be helpful.

roshankern · 2025-01-15T22:23:25Z

Hello @hwarden162 👋. Any chance there is an update on this transformation? We would like to complete the analysis before the next rotation student starts (if possible). Thanks!
