Abstract
Monitoring the prevalence of breast cancer, a leading cause of cancer-related death among women, is highly important for governments and health organizations. However, accurate diagnosis of this disease is expensive, especially in developing countries. This article concerns a cost-efficient method for estimating the prevalence of breast cancer when diagnosis is based on a comprehensive biopsy procedure. Multistage ranked set sampling (MSRSS) is utilized to develop a proportion estimator. This design employs imprecise rankings based on some visually assessed cytological covariates, so as to provide the experimenter with a more informative sample. Theoretical properties of the proposed estimator are explored, and evidence from numerical studies is reported. The developed procedure can be substantially more efficient than its competitor in simple random sampling (SRS). In some situations, proportion estimation in MSRSS needs around 76% fewer observations than in SRS for a given precision level. Thus, using MSRSS may lead to a considerable reduction in cost relative to SRS. In many medical studies, e.g., diagnosing breast cancer based on a full biopsy procedure, exact quantification is difficult (costly and/or time-consuming), but the potential sample units can be ranked fairly accurately without actual measurements. In this setup, multistage ranked set sampling is an appropriate design for developing cost-efficient statistical methods.
References
Al-Saleh MF, Al-Omari AI (2002) Multistage ranked set sampling. J Stat Plan Inference 102:273–286
Chen H, Stasny EA, Wolfe DA (2005) Ranked set sampling for efficient estimation of a population proportion. Stat Med 24:3319–3329
Coelho F, Braga AP, Natowicz R et al (2011) Semi-supervised model applied to the prediction of the response to preoperative chemotherapy for breast cancer. Soft Comput 15:1137–1144
Dell TR, Clutter JL (1972) Ranked set sampling theory with order statistics background. Biometrics 28:545–555
Frey J (2007) New imperfect rankings models for ranked set sampling. J Stat Plan Inference 137:1433–1445
Frey J, Feeman TJ (2018) Finding the maximum efficiency for multistage ranked-set sampling. Commun Stat Theory Methods 47:4131–4141
Frey J, Zhang Y (2021) Robust confidence intervals for a proportion using ranked-set sampling. J Korean Stat Soc. https://doi.org/10.1007/s42952-020-00103-3
Gemayel NM, Stasny EA, Tackett JA, Wolfe DA (2012) Ranked set sampling: an auditing application. Rev Quant Financ Acc 39:413–422
Mahdizadeh M, Strzalkowska-Kominiak E (2017) Resampling based inference for a distribution function using censored ranked set samples. Comput Stat 32:1285–1308
Mahdizadeh M, Zamanzade E (2017) Reliability estimation in multistage ranked set sampling. REVSTAT 15:565–581
Mahdizadeh M, Zamanzade E (2019a) Dynamic reliability estimation in a rank-based design. Probab Math Stat 39:1–18
Mahdizadeh M, Zamanzade E (2019b) Efficient body fat estimation using multistage pair ranked set sampling. Stat Methods Med Res 28:223–234
Mahdizadeh M, Zamanzade E (2020a) Estimating asymptotic variance of M-estimators in ranked set sampling. Comput Stat 35:1785–1803
Mahdizadeh M, Zamanzade E (2020b) Estimation of a symmetric distribution function in multistage ranked set sampling. Stat Pap 61:851–867
Mahdizadeh M, Zamanzade E (2021) Smooth estimation of the area under the ROC curve in multistage ranked set sampling. Stat Pap 62:1753–1776
McIntyre GA (1952) A method of unbiased selective sampling using ranked sets. Aust J Agric Res 3:385–390
Ogiela MR, Krzyworzeka N (2016) Heuristic approach for computer-aided lesion detection in mammograms. Soft Comput 20:4193–4202
Penrose KW, Nelson AG, Fisher AG (1985) Generalized body composition prediction equation for men using simple measurement techniques. Med Sci Sports Exerc 17:189
Presnell B, Bohn LL (1999) U-statistics and imperfect ranking in ranked set sampling. J Nonparametr Stat 10:111–126
Street WN, Wolberg WH, Mangasarian OL (1993) Nuclear feature extraction for breast tumor diagnosis. Proc SPIE Int Soc Opt Eng 1905:861–870
Terpstra JT, Liudahl LA (2004) Concomitant-based rank set sampling proportion estimates. Stat Med 23:2061–2070
Terpstra JT, Miller ZA (2006) Exact inference for a population proportion based on a ranked set sample. Commun Stat Simul Comput 35:19–27
Terpstra JT, Wang P (2008) Confidence intervals for a population proportion based on a ranked set sample. J Stat Comput Simul 78:351–366
Wahde M, Szallasi Z (2006) Improving the prediction of the clinical outcome of breast cancer using evolutionary algorithms. Soft Comput 10:338–345
Wolfe DA (2012) Ranked set sampling: its relevance and impact on statistical inference. ISRN Probab Stat 2012:568385
Zamanzade E, Mahdizadeh M (2017) A more efficient proportion estimator in ranked set sampling. Stat Probab Lett 129:28–33
Zamanzade E, Mahdizadeh M (2018) Estimating the population proportion in pair ranked set sampling with application to air quality monitoring. J Appl Stat 45:426–437
Acknowledgements
The authors are grateful to the Area Editor and the reviewer for careful reading of an earlier version of this manuscript and providing many constructive comments. Ehsan Zamanzade’s research was carried out in IPM Isfahan branch and was in part supported by a grant from IPM, Iran (No. 1400620422).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
Proof of Proposition 1
(a) Let \(F_{[i]}(x)\) (\(i=1,\ldots ,m\)) be the common distribution function of judgment order statistics with rank i, i.e., \(X_{[i]1},\ldots ,X_{[i]n}\). If the ranking scheme is consistent, then we have
\[ F(x)=\frac{1}{m}\sum _{i=1}^m F_{[i]}(x), \tag{1} \]
where F(x) is the population distribution function (see Lemma 1 in Presnell and Bohn (1999) for details). The unbiasedness of \(\hat{p}_{\text {RSS}}\) is immediate from this identity. Also, the variance expression is obtained by noting that the elements of \(\{X_{[i]j}: i=1,\ldots ,m\,;j=1,\ldots ,n \}\) are independent, and each \(X_{[i]j}\) is a Bernoulli random variable with success probability \(p_{[i]}\).
(b) From the first part, one can write
where in the third equality, identity (1) has been used.
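The chain of equalities for part (b) can be sketched as follows. This is a reconstruction under stated assumptions, not necessarily the authors' display: it assumes that part (b) compares \(\hat{p}_{\text {RSS}}\) with the SRS proportion estimator based on mn observations, whose variance is \(p(1-p)/(mn)\), and it uses the variance of \(\hat{p}_{\text {RSS}}\) implied by part (c). For Bernoulli data, identity (1) reads \(p=\sum _{i=1}^m p_{[i]}/m\), so that \(\sum _{i=1}^m (p_{[i]}-p)=0\); this is what makes the linear term vanish in the third equality.

```latex
\begin{aligned}
\operatorname{Var}\left(\hat{p}_{\text{RSS}}\right)
  &= \frac{1}{nm^2}\sum_{i=1}^m p_{[i]}\left(1-p_{[i]}\right) \\
  &= \frac{1}{nm^2}\sum_{i=1}^m \left[\, p(1-p) + (1-2p)\left(p_{[i]}-p\right) - \left(p_{[i]}-p\right)^2 \right] \\
  &= \frac{p(1-p)}{nm} - \frac{1}{nm^2}\sum_{i=1}^m \left(p_{[i]}-p\right)^2 \\
  &\le \frac{p(1-p)}{nm}.
\end{aligned}
```

Equality holds in the last step only when all \(p_{[i]}\) equal p, i.e., when the rankings carry no information about the variable of interest.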
(c) It can be seen that
\[ \hat{p}_{\text {RSS}}=\frac{1}{n}\sum _{j=1}^n \left( \frac{1}{m}\sum _{i=1}^m X_{[i]j} \right) . \]
To put it another way, \(\hat{p}_{\text {RSS}}\) is the sample mean of n independent and identically distributed random variables with mean p and variance \(\sum _{i=1}^m p_{[i]} \left( 1-p_{[i]} \right) /m^2\). The asymptotic normality then follows from the central limit theorem. \(\square \)
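As a numerical illustration of the variance expression used in this proof, the following Python sketch evaluates \(\sum _{i=1}^m p_{[i]}(1-p_{[i]})/(nm^2)\) and compares it with the SRS variance \(p(1-p)/(nm)\). It assumes perfect rankings, which is a simplification and not the imperfect, covariate-based ranking considered in the article; in that case the ith order statistic of m Bernoulli(p) variables equals 1 exactly when at least \(m-i+1\) of them equal 1. The function names are hypothetical.

```python
from math import comb


def perfect_ranking_probs(m, p):
    """Success probabilities p_[1], ..., p_[m] under perfect rankings (assumption).

    The i-th order statistic of m Bernoulli(p) draws equals 1 exactly when
    at least m - i + 1 of the draws equal 1.
    """
    pmf = [comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]
    return [sum(pmf[m - i + 1:]) for i in range(1, m + 1)]


def var_rss(m, n, p):
    """Variance of the RSS proportion estimator with set size m and n cycles."""
    probs = perfect_ranking_probs(m, p)
    return sum(q * (1 - q) for q in probs) / (n * m**2)


def var_srs(m, n, p):
    """Variance of the sample proportion from an SRS of size m * n."""
    return p * (1 - p) / (n * m)


if __name__ == "__main__":
    m, n, p = 3, 10, 0.3
    probs = perfect_ranking_probs(m, p)
    # The average of the p_[i] recovers p, in line with identity (1).
    print(sum(probs) / m)
    # Relative efficiency of RSS with respect to SRS (at least 1).
    print(var_srs(m, n, p) / var_rss(m, n, p))
```

With imperfect rankings the \(p_{[i]}\) lie closer to p, so the efficiency gain is smaller than the value computed under this idealized assumption.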
Proof of Proposition 2
(a) An identity similar to (1) holds in MSRSS (see Proposition 3 in Mahdizadeh and Zamanzade (2020b)). It establishes that
\[ F(x)=\frac{1}{m}\sum _{i=1}^m F_{[i]}^{(r)}(x), \tag{2} \]
where \(F_{[i]}^{(r)}(x)\) (\(i=1,\ldots ,m\)) is the common distribution function of \(X_{[i]1}^{(r)},\ldots ,X_{[i]n}^{(r)}\). An application of identity (2) shows the unbiasedness of \(\hat{p}_{\text {MSRSS}}^{(r)}\). The corresponding variance is derived similarly to that of \(\hat{p}_{\text {RSS}}\).
(b) This is proved in line with part (b) of Proposition 1.
(c) Suppose \(\mathcal {X}_{(i)}^{(r-1)}\) (\(i=1,\ldots ,m\)) is the ith order statistic of \(X_{[1]1}^{(r-1)},\ldots ,X_{[m]1}^{(r-1)}\). Then, we have
It can be shown that the covariance terms in (3) are positive. To do so, without loss of generality, it is assumed that \(i<j\). Then, one can write
Putting (3) and (4) together, it follows that
where the first equality results from the fact that \(\mathcal {X}_{(i)}^{(r-1)}\) and \(X_{[i]1}^{(r)}\) are identically distributed under the MSRSS procedure.
(d) The proof of the asymptotic normality of \(\hat{p}_{\text {MSRSS}}^{(r)}\) parallels that of \(\hat{p}_{\text {RSS}}\) and is omitted. \(\square \)
Appendix B
Suppose that \(m^{r+1}\) units are randomly identified from the population and are assigned labels \(1,\ldots ,m^{r+1}\). Also, the corresponding measured values of the covariate are stored in the vector zcon. Then, a function can be used to draw a single cycle of an rth stage ranked set sample using set size m. It returns a vector of length m, consisting of labels \(\ell _i\) (\(i=1,\ldots ,m\)). The final sample is given by \(X_{[1]1}^{(r)},\ldots , X_{[m]1}^{(r)}\), where \(X_{[i]1}^{(r)}\) is the measurement of the variable of interest for the unit with label \(\ell _i\).
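A minimal Python sketch of such a function is given below. It is an illustration under stated assumptions rather than the authors' code: the name msrss_labels and its argument order are hypothetical, and the stage-wise grouping assumes the usual multistage construction in which every block of \(m^2\) current units is reduced to m units by selecting, from the ith set of size m in the block, the unit whose covariate value has rank i. Since the units are already in random order, consecutive labels are used to form the sets.

```python
def msrss_labels(zcon, m, r):
    """Labels of one cycle of an r-th stage ranked set sample (illustrative sketch).

    zcon : covariate values of the m**(r+1) identified units; zcon[k-1]
           belongs to the unit with label k (labels are 1-based).
    m    : set size.
    r    : number of stages.
    """
    if m < 1 or r < 1:
        raise ValueError("m and r must be positive integers")
    if len(zcon) != m ** (r + 1):
        raise ValueError("zcon must contain m**(r+1) values")

    labels = list(range(1, len(zcon) + 1))
    for _ in range(r):
        selected = []
        # Each block of m*m labels is reduced to one ranked set of size m.
        for start in range(0, len(labels), m * m):
            block = labels[start:start + m * m]
            for i in range(m):
                subset = block[i * m:(i + 1) * m]
                # Rank the i-th set by the covariate (ties keep label order).
                ranked = sorted(subset, key=lambda lab: zcon[lab - 1])
                # Keep the unit judged to have rank i + 1 in this set.
                selected.append(ranked[i])
        labels = selected
    return labels


if __name__ == "__main__":
    # Toy covariate values for m = 2 and r = 2 (eight units).
    zcon = [5, 1, 2, 8, 3, 7, 6, 4]
    print(msrss_labels(zcon, m=2, r=2))  # prints [2, 7]
```

Only the m units whose labels are returned need to undergo the costly measurement of the variable of interest; repeating the call on n independent batches of \(m^{r+1}\) units gives the n cycles.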
Cite this article
Mahdizadeh, M., Zamanzade, E. Using a rank-based design in estimating prevalence of breast cancer. Soft Comput 26, 3161–3170 (2022). https://doi.org/10.1007/s00500-022-06770-0
DOI: https://doi.org/10.1007/s00500-022-06770-0