Access to raw (non-normalized) data #50
Comments
I've been in touch with @roshankern (Roshan, please feel free to clarify anything), and we do have access to this data 🎉. It is about 45 GB. Therefore, @hwarden162, here's my proposal:
This makes sense to me. @hwarden162, let me know if you are able to develop the crude Python implementation or if some version of this already exists!
For some additional context, we're hosting a rotation student next term, so ideally Roshan is able to perform his part before Feb 1.
This looks really good, thank you! I will write up the transformations and the extra features I am suggesting be blocklisted, and will then update you here.
Back at work today after the Christmas break. Looking into this, is there a way for me to get a CSV of the first ~10 rows of the data (or, failing that, the column names and preferably their dtypes)? That would let me deliver transformation code with the least chance of errors. As a side note, is there a preference for whether this transformation is written in Python or R? I only ask because my data manipulation is a lot better in R, so it would be easier if R works for you too, but I can provide either if external factors favor one over the other. If I can get access to the head of the data, the turnaround on the code should be very fast. Thanks.
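For reference, pulling just the head of a very large CSV does not require loading the whole file. Below is a minimal sketch using only the Python standard library. The function names (`csv_head`, `guess_dtype`, `describe_head`) are hypothetical helpers written for illustration, and the dtype guess is deliberately crude (int, float, or str inferred from the sampled rows only).

```python
import csv
from itertools import islice


def csv_head(path, n_rows=10):
    """Return (column_names, first n_rows data rows) without reading the whole file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)               # first line holds the column names
        rows = list(islice(reader, n_rows)) # stop after n_rows data rows
    return header, rows


def guess_dtype(values):
    """Crude dtype guess for a column of strings: 'int', 'float', or 'str'."""
    if not values:
        return "unknown"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "str"


def describe_head(path, n_rows=10):
    """Map each column name to a dtype guessed from the first n_rows rows."""
    header, rows = csv_head(path, n_rows)
    columns = list(zip(*rows)) if rows else [()] * len(header)
    return {name: guess_dtype(col) for name, col in zip(header, columns)}
```

Calling `describe_head("some_file.csv")` (placeholder path) returns a dictionary of column names and guessed dtypes, which is enough to share the data frame structure without sharing the full dataset.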
I have uploaded the relevant head of the illumination-corrected training data here. I believe that we will also need the transform applied to the illumination-corrected negative control data that we use to normalize the training data, so I have uploaded the head of that data here as well. The data frame structures are nearly identical, but the training data also has the Mitocheck-assigned phenotypic class and object outlines. The general data frame column structures are:
Given that the transform is only relevant for CellProfiler features, we can ignore the DeepProfiler features.
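Restricting the transform to CellProfiler features amounts to partitioning the column names before applying anything. A minimal sketch follows, assuming the feature columns can be told apart by a name prefix. The prefixes `CP__` and `DP__` and the function name `split_feature_columns` are assumptions for illustration and should be replaced with whatever the uploaded data actually uses.

```python
def split_feature_columns(columns, cp_prefix="CP__", dp_prefix="DP__"):
    """Partition column names into CellProfiler, DeepProfiler, and other columns.

    The default prefixes are assumptions, not confirmed names from the data.
    """
    groups = {"cellprofiler": [], "deepprofiler": [], "other": []}
    for name in columns:
        if name.startswith(cp_prefix):
            groups["cellprofiler"].append(name)
        elif name.startswith(dp_prefix):
            groups["deepprofiler"].append(name)
        else:
            # Metadata such as phenotypic class or object outlines lands here.
            groups["other"].append(name)
    return groups
```

Only the `cellprofiler` group would then be passed to the transform. The `deepprofiler` and `other` columns are carried through unchanged.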
Hello @hwarden162 👋. Any chance there is an update on this transformation? We would like to complete the analysis before the next rotation student starts (if possible). Thanks!
@roshankern - would you have a way to provide access to the raw features?
I believe these would be the datasets located in the following folder, which is currently ignored: https://github.com/WayScience/mitocheck_data/blob/main/.gitignore#L15
cc @hwarden162