normalize fails · Issue #133 · cytomining/cytominer · GitHub
normalize fails #133


Closed
cells2numbers opened this issue Oct 10, 2019 · 1 comment

Comments

@cells2numbers
Member

Normalization fails in a simple case where cytominer is used to normalize a complete data set (no groups). Attached is a csv containing the features used in the provided example code below.

Tested with dplyr > 0.8, so this issue could be related to #131 (not checked yet).

Example file population_ge_test.csv.tar.gz

Example:

```r
library(readr)
library(dplyr)
library(magrittr)

df <- read_csv('population_ge_test.csv') %>%
  filter(complete.cases(.))

# workaround for error: Error in parse(text = x) : <text>:1:7: unexpected input 1: 221227_ ^
colnames(df) <- c("Metadata_broad_sample_simple", 1:977)

df %<>% mutate(strata_col = 1)

feature_columns <- setdiff(colnames(df), "Metadata_broad_sample_simple") %>% print

ge_normalized <- cytominer::normalize(
  population = df,
  variables = feature_columns,
  sample = df,
  strata = c("strata_col"),
  operation = "standardize"
)
```

Error:

```
Error in FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) : non-numeric argument to binary operator
```

sessionInfo:

```
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] magrittr_1.5 dplyr_0.8.3  readr_1.3.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2           rstudioapi_0.10      knitr_1.24           hms_0.5.0            tidyselect_0.2.5     lattice_0.20-38      R6_2.4.0             rlang_0.4.0
 [9] foreach_1.4.7        tools_3.6.1          grid_3.6.1           xfun_0.8             lambda.r_1.2.3       futile.logger_1.4.3  iterators_1.0.12     assertthat_0.2.1
[17] tibble_2.1.3         crayon_1.3.4         Matrix_1.2-17        formatR_1.7          purrr_0.3.2          futile.options_1.0.1 codetools_0.2-16     vctrs_0.2.0
[25] zeallot_0.1.0        glue_1.3.1           compiler_3.6.1       cytominer_0.1.0.9000 pillar_1.4.2         backports_1.1.4      pkgconfig_2.0.2
```




@shntnu
Member
shntnu commented Mar 20, 2020

Fixed in #135

```r
library(readr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(magrittr)

df <- read_csv('~/Downloads/population_ge_test.csv') %>%
  filter(complete.cases(.))
#> Parsed with column specification:
#> cols(
#>   .default = col_double(),
#>   Metadata_broad_sample_simple = col_character()
#> )
#> See spec(...) for full column specifications.

# workaround for error: Error in parse(text = x) : <text>:1:7: unexpected input 1: 221227_ ^
colnames(df) <- c("Metadata_broad_sample_simple", 1:977)

df %<>% mutate(strata_col = 1)

feature_columns <- setdiff(colnames(df), "Metadata_broad_sample_simple")

ge_normalized <- cytominer::normalize(
  population = df,
  variables = feature_columns,
  sample = df,
  strata = c("strata_col"),
  operation = "standardize"
)

ge_normalized %>% select(1:3) %>% slice(1:5) %>% knitr::kable()
```
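| Metadata_broad_sample_simple |          1 |          2 |
|:-----------------------------|-----------:|-----------:|
| BRD-A01528713                | -1.2871379 |  1.0886862 |
| BRD-A02809788                | -0.0555724 | -0.7771730 |
| BRD-A03182941                |  0.5495872 | -0.9621648 |
| BRD-A04691170                | -0.4334830 | -0.3546694 |
| BRD-A08759443                | -0.1129926 | -0.5845238 |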

Created on 2020-03-20 by the reprex package (v0.3.0)

@shntnu closed this as completed Mar 20, 2020