Using a rank-based design in estimating prevalence of breast cancer


Abstract

It is highly important for governments and health organizations to monitor the prevalence of breast cancer as a leading source of cancer-related death among women. However, the accurate diagnosis of this disease is expensive, especially in developing countries. This article concerns a cost-efficient method for estimating prevalence of breast cancer, when diagnosis is based on a comprehensive biopsy procedure. Multistage ranked set sampling (MSRSS) is utilized to develop a proportion estimator. This design employs imprecise rankings based on some visually assessed cytological covariates, so as to provide the experimenter with a more informative sample. Theoretical properties of the proposed estimator are explored. Evidence from numerical studies is reported. The developed procedure can be substantially more efficient than its competitor in simple random sampling (SRS). In some situations, the proportion estimation in MSRSS needs around 76% fewer observations than that in SRS, given a precision level. Thus, using MSRSS may lead to a considerable reduction in cost with respect to SRS. In many medical studies, e.g., diagnosing breast cancer based on a full biopsy procedure, exact quantification is difficult (costly and/or time-consuming), but the potential sample units can be ranked fairly accurately without actual measurements. In this setup, multistage ranked set sampling is an appropriate design for developing cost-efficient statistical methods.


Notes

https://cran.r-project.org/web/packages/mlbench/index.html.
It is accessible at http://lib.stat.cmu.edu/datasets/bodyfat.

References

Al-Saleh MF, Al-Omari AI (2002) Multistage ranked set sampling. J Stat Plan Inference 102:273–286
Chen H, Stasny EA, Wolfe DA (2005) Ranked set sampling for efficient estimation of a population proportion. Stat Med 24:3319–3329
Coelho F, Braga AP, Natowicz R et al (2011) Semi-supervised model applied to the prediction of the response to preoperative chemotherapy for breast cancer. Soft Comput 15:1137–1144
Dell TR, Clutter JL (1972) Ranked set sampling theory with order statistics background. Biometrics 28:545–555
Frey J (2007) New imperfect rankings models for ranked set sampling. J Stat Plan Inference 137:1433–1445
Frey J, Feeman TJ (2018) Finding the maximum efficiency for multistage ranked-set sampling. Commun Stat Theory Methods 47:4131–4141
Frey J, Zhang Y (2021) Robust confidence intervals for a proportion using ranked-set sampling. J Korean Stat Soc. https://doi.org/10.1007/s42952-020-00103-3
Gemayel NM, Stasny EA, Tackett JA, Wolfe DA (2012) Ranked set sampling: an auditing application. Rev Quant Financ Acc 39:413–422
Mahdizadeh M, Strzalkowska-Kominiak E (2017) Resampling based inference for a distribution function using censored ranked set samples. Comput Stat 32:1285–1308
Mahdizadeh M, Zamanzade E (2017) Reliability estimation in multistage ranked set sampling. REVSTAT 15:565–581
Mahdizadeh M, Zamanzade E (2019a) Dynamic reliability estimation in a rank-based design. Probab Math Stat 39:1–18
Mahdizadeh M, Zamanzade E (2019b) Efficient body fat estimation using multistage pair ranked set sampling. Stat Methods Med Res 28:223–234
Mahdizadeh M, Zamanzade E (2020a) Estimating asymptotic variance of M-estimators in ranked set sampling. Comput Stat 35:1785–1803
Mahdizadeh M, Zamanzade E (2020b) Estimation of a symmetric distribution function in multistage ranked set sampling. Stat Pap 61:851–867
Mahdizadeh M, Zamanzade E (2021) Smooth estimation of the area under the ROC curve in multistage ranked set sampling. Stat Pap 62:1753–1776
McIntyre GA (1952) A method of unbiased selective sampling using ranked sets. Aust J Agric Res 3:385–390
Ogiela MR, Krzyworzeka N (2016) Heuristic approach for computer-aided lesion detection in mammograms. Soft Comput 20:4193–4202
Penrose KW, Nelson AG, Fisher AG (1985) Generalized body composition prediction equation for men using simple measurement techniques. Med Sci Sports Exerc 17:189
Presnell B, Bohn LL (1999) U-Statistics and imperfect ranking in ranked set sampling. J Nonparametr Stat 10:111–126
Street WN, Wolberg WH, Mangasarian OL (1993) Nuclear feature extraction for breast tumor diagnosis. Proc SPIE Int Soc Opt Eng 1905:861–870
Terpstra JT, Liudahl LA (2004) Concomitant-based rank set sampling proportion estimates. Stat Med 23:2061–2070
Terpstra JT, Miller ZA (2006) Exact inference for a population proportion based on a ranked set sample. Commun Stat Simul Comput 35:19–27
Terpstra JT, Wang P (2008) Confidence intervals for a population proportion based on a ranked set sample. J Stat Comput Simul 78:351–366
Wahde M, Szallasi Z (2006) Improving the prediction of the clinical outcome of breast cancer using evolutionary algorithms. Soft Comput 10:338–345
Wolfe DA (2012) Ranked set sampling: its relevance and impact on statistical inference. ISRN Probab Stat 2012:568385
Zamanzade E, Mahdizadeh M (2017) A more efficient proportion estimator in ranked set sampling. Stat Probab Lett 129:28–33
Zamanzade E, Mahdizadeh M (2018) Estimating the population proportion in pair ranked set sampling with application to air quality monitoring. J Appl Stat 45:426–437


Acknowledgements

The authors are grateful to the Area Editor and the reviewer for careful reading of an earlier version of this manuscript and providing many constructive comments. Ehsan Zamanzade’s research was carried out in IPM Isfahan branch and was in part supported by a grant from IPM, Iran (No. 1400620422).

Author information

Authors and Affiliations

Department of Statistics, Hakim Sabzevari University, P.O. Box 397, Sabzevar, Iran
M. Mahdizadeh
Department of Statistics, Faculty of Mathematics and Statistics, University of Isfahan, P.O. Box 81746-73441, Isfahan, Iran
Ehsan Zamanzade
School of Mathematics, Institute for Research in Fundamental Sciences (IPM), P.O. Box 19395–5746, Tehran, Iran
Ehsan Zamanzade

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Proof of Proposition 1

(a) Let $F_{[i]}(x)$ ($i=1,\ldots ,m$) be the common distribution function of judgment order statistics with rank i, i.e., $X_{[i]1},\ldots ,X_{[i]n}$. If the ranking scheme is consistent, then we have

$$\begin{aligned} F(x)=\frac{1}{m} \sum _{i=1}^m F_{[i]}(x), \end{aligned} \tag{1}$$

where F(x) is the population distribution function (see Lemma 1 in Presnell and Bohn (1999) for details). The unbiasedness of $\hat{p}_{\text {RSS}}$ is immediate from this identity. Also, the variance expression is obtained by noting that elements of $\{X_{[i]j}: i=1,\ldots ,m\,;j=1,\ldots ,n \}$ are independent, and each $X_{[i]j}$ is a Bernoulli random variable with the success probability $p_{[i]}$.

(b) From the first part, one can write

$$\begin{aligned} Var\left( \hat{p}_{\text {RSS}} \right) &= \frac{1}{n m^2} \sum _{i=1}^m p_{[i]} \left( 1-p_{[i]} \right) \\ &= \frac{1}{n m^2} \left[ \sum _{i=1}^m p_{[i]}-\sum _{i=1}^m p_{[i]}^2 \right] \\ &= \frac{1}{n m^2} \left[ mp-\sum _{i=1}^m \left( p_{[i]}-p+p \right) ^2 \right] \\ &= \frac{1}{n m^2} \left[ mp-mp^2-\sum _{i=1}^m \left( p_{[i]}-p \right) ^2 \right] \\ &= \frac{p(1-p)}{n m}-\frac{1}{n m^2} \sum _{i=1}^m \left( p_{[i]}-p \right) ^2 \\ &\le Var\left( \hat{p}_{\text {SRS}} \right) , \end{aligned}$$

where in the third equality, identity (1) has been used.
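The decomposition in part (b) can be checked numerically. The sketch below is an illustration only: it assumes perfect ranking on the binary response itself, so that $p_{[i]}$ is the probability that the ith order statistic of m independent Bernoulli(p) draws equals 1. In practice the ranking is based on covariates and the $p_{[i]}$ would differ. The function names and the values of p, m and n are hypothetical.

```python
from math import comb


def judgment_probs(p, m):
    """p_[1], ..., p_[m] under the ASSUMPTION of perfect ranking on the binary response."""
    pmf = [comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]
    # The i-th smallest of m binary draws equals 1 iff at least m - i + 1 draws equal 1.
    return [sum(pmf[m - i + 1:]) for i in range(1, m + 1)]


def var_rss(p, m, n):
    """First line of the derivation: (1 / (n m^2)) * sum_i p_[i] (1 - p_[i])."""
    probs = judgment_probs(p, m)
    return sum(q * (1 - q) for q in probs) / (n * m**2)


def var_rss_decomposed(p, m, n):
    """Fifth line: p (1 - p) / (n m) - (1 / (n m^2)) * sum_i (p_[i] - p)^2."""
    probs = judgment_probs(p, m)
    return p * (1 - p) / (n * m) - sum((q - p) ** 2 for q in probs) / (n * m**2)


def var_srs(p, m, n):
    """Variance of the sample proportion from a simple random sample of size n m."""
    return p * (1 - p) / (n * m)


if __name__ == "__main__":
    p, m, n = 0.3, 3, 10  # hypothetical values
    print("mean of p_[i]:", sum(judgment_probs(p, m)) / m)
    print("Var RSS (direct):    ", var_rss(p, m, n))
    print("Var RSS (decomposed):", var_rss_decomposed(p, m, n))
    print("Var SRS:             ", var_srs(p, m, n))
```

The two RSS expressions agree up to rounding, and both are no larger than the SRS variance, as the final inequality states.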

(c) Note that the estimator can be written as

$$\begin{aligned} \hat{p}_{\text {RSS}}=\frac{1}{n}\left( \frac{1}{m}\sum _{i=1}^m X_{[i]1} +\cdots + \frac{1}{m}\sum _{i=1}^m X_{[i]n} \right) . \end{aligned}$$

To put it another way, $\hat{p}_{\text {RSS}}$ is the sample mean of n independent and identically distributed random variables with mean p and variance $\sum _{i=1}^m p_{[i]} \left( 1-p_{[i]} \right) /m^2$. The asymptotic normality is then concluded from the central limit theorem. $\square $
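Written out, the limit implied by this argument takes the following form. This is a restatement in the notation above rather than an additional result; the symbol $\sigma ^2_{\text {RSS}}$ is introduced here only for convenience, and the limit is taken as $n\rightarrow \infty $ with the set size m held fixed.

$$\begin{aligned} \sqrt{n}\left( \hat{p}_{\text {RSS}}-p \right) \xrightarrow {d} N\left( 0,\sigma ^2_{\text {RSS}} \right) , \qquad \sigma ^2_{\text {RSS}}=\frac{1}{m^2}\sum _{i=1}^m p_{[i]} \left( 1-p_{[i]} \right) . \end{aligned}$$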

Proof of Proposition 2

(a) An identity similar to (1) holds in MSRSS (see Proposition 3 in Mahdizadeh and Zamanzade (2020b)). It establishes that

$$\begin{aligned} F(x)=\frac{1}{m} \sum _{i=1}^m F_{[i]}^{(r)}(x), \end{aligned} \tag{2}$$

where $F_{[i]}^{(r)}(x)$ ($i=1,\ldots ,m$) is the common distribution function of $X_{[i]1}^{(r)},\ldots ,X_{[i]n}^{(r)}$. An application of identity (2) shows the unbiasedness of $\hat{p}_{\text {MSRSS}}^{(r)}$. The corresponding variance is derived in the same way as that of $\hat{p}_{\text {RSS}}$.

(b) This is proved in line with part (b) of Proposition 1.

(c) Suppose $\mathcal {X}_{(i)}^{(r-1)}$ ($i=1,\ldots ,m$) is the ith order statistic of $X_{[1]1}^{(r-1)},\ldots ,X_{[m]1}^{(r-1)}$. Then, we have

$$\begin{aligned} Var\left( \hat{p}_{\text {MSRSS}}^{(r-1)} \right) &= \frac{1}{n m^2} Var\left( \sum _{i=1}^m X_{[i]1}^{(r-1)} \right) \\ &= \frac{1}{n m^2} Var\left( \sum _{i=1}^m \mathcal {X}_{(i)}^{(r-1)} \right) \\ &= \frac{1}{n m^2} \left[ \sum _{i=1}^m Var\left( \mathcal {X}_{(i)}^{(r-1)} \right) + \sum _{i\ne j=1}^m Cov\left( \mathcal {X}_{(i)}^{(r-1)}, \mathcal {X}_{(j)}^{(r-1)} \right) \right] . \end{aligned} \tag{3}$$

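It can be shown that the covariance terms in (3) are nonnegative. To do so, without loss of generality, it is assumed that $i<j$. Since the variables are binary and $\mathcal {X}_{(i)}^{(r-1)} \le \mathcal {X}_{(j)}^{(r-1)}$, the product $\mathcal {X}_{(i)}^{(r-1)} \mathcal {X}_{(j)}^{(r-1)}$ equals $\mathcal {X}_{(i)}^{(r-1)}$. Then, one can write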

$$\begin{aligned} Cov\left( \mathcal {X}_{(i)}^{(r-1)}, \mathcal {X}_{(j)}^{(r-1)} \right) &= E\left( \mathcal {X}_{(i)}^{(r-1)} \mathcal {X}_{(j)}^{(r-1)} \right) - E\left( \mathcal {X}_{(i)}^{(r-1)} \right) E\left( \mathcal {X}_{(j)}^{(r-1)} \right) \\ &= P\left( \mathcal {X}_{(i)}^{(r-1)}=1 \right) -P\left( \mathcal {X}_{(i)}^{(r-1)}=1 \right) P\left( \mathcal {X}_{(j)}^{(r-1)}=1 \right) \\ &= p_{[i]}^{(r)} \left( 1-p_{[j]}^{(r)} \right) . \end{aligned} \tag{4}$$

Putting (3) and (4) together, it follows that

$$\begin{aligned} Var\left( \hat{p}_{\text {MSRSS}}^{(r-1)} \right) &\ge \frac{1}{n m^2} \sum _{i=1}^m Var\left( \mathcal {X}_{(i)}^{(r-1)} \right) \\ &= \frac{1}{n m^2} \sum _{i=1}^m Var\left( X_{[i]1}^{(r)} \right) \\ &= Var\left( \hat{p}_{\text {MSRSS}}^{(r)} \right) , \end{aligned}$$

where the first equality results from the fact that $\mathcal {X}_{(i)}^{(r-1)}$ and $X_{[i]1}^{(r)}$ are identically distributed under the MSRSS procedure.
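The monotonicity in the number of stages can be illustrated numerically. The sketch below assumes perfect ranking, so that the rank-i observation at stage r is distributed as the ith order statistic of the m independent stage r-1 observations, as in the argument above. Stage 0 is taken to be SRS, with every success probability equal to p. The function names and parameter values are hypothetical, and the numbers produced are not results from the paper.

```python
def at_least_k_successes(probs, k):
    """P(at least k successes) among independent Bernoulli trials with the given probabilities."""
    dist = [1.0]  # dist[s] = probability of exactly s successes among the trials processed so far
    for q in probs:
        nxt = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            nxt[s] += mass * (1 - q)
            nxt[s + 1] += mass * q
        dist = nxt
    return sum(dist[k:])


def next_stage(probs):
    """Success probabilities of the m order statistics of the previous-stage observations."""
    m = len(probs)
    # The i-th smallest of m binary values equals 1 iff at least m - i + 1 of them equal 1.
    return [at_least_k_successes(probs, m - i + 1) for i in range(1, m + 1)]


def stage_variances(p, m, n, max_stage):
    """Variance of the proportion estimator for stages 0 (SRS) through max_stage."""
    probs = [p] * m
    variances = []
    for _ in range(max_stage + 1):
        variances.append(sum(q * (1 - q) for q in probs) / (n * m**2))
        probs = next_stage(probs)
    return variances


if __name__ == "__main__":
    for stage, v in enumerate(stage_variances(p=0.3, m=3, n=10, max_stage=4)):
        print(f"stage {stage}: variance = {v:.6f}")
```

Each printed variance is no larger than the one before it, in line with part (c). The ratio of a stage r variance to the stage 0 variance gives the fraction of the SRS sample size needed to reach the same precision under these idealized assumptions.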

(d) The proof of asymptotic normality of $\hat{p}_{\text {MSRSS}}^{(r)}$ parallels that of $\hat{p}_{\text {RSS}}$, and is omitted. $\square $

Appendix B

Suppose that $m^{r+1}$ units are randomly identified from the population and are assigned labels $1,\ldots ,m^{r+1}$. Also, the corresponding measured values for the covariate are stored in the vector "zcon". Then, we may use the following function to draw a single cycle of an rth stage ranked set sample using set size m. It returns a vector of length m, consisting of labels $\ell _i$ ($i=1,\ldots ,m$). The final sample is given by $X_{[1]1}^{(r)},\ldots , X_{[m]1}^{(r)}$, where $X_{[i]1}^{(r)}$ is the measurement of the variable of interest for the unit with label $\ell _i$.
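The listing itself does not appear here. As a stand-in, the following is a minimal Python sketch of the procedure just described. It is not the authors' code. The function name `msrss_cycle` is hypothetical; `zcon`, `m` and `r` follow the text. Two assumptions are made: sets are formed from consecutive labels, which is harmless because the units are already randomly identified, and ties in the covariate are broken by label order.

```python
def msrss_cycle(zcon, m, r):
    """Return the labels l_1, ..., l_m of one cycle of an r-th stage ranked set sample.

    zcon[k - 1] holds the covariate value of the unit with label k (labels are 1-based).
    """
    n_units = m ** (r + 1)
    if len(zcon) != n_units:
        raise ValueError(f"expected {n_units} covariate values, got {len(zcon)}")

    labels = list(range(1, n_units + 1))
    for _ in range(r):
        selected = []
        # ASSUMPTION: every m consecutive labels form a set, and every m consecutive
        # sets form a block of m^2 units that yields one ranked set of size m.
        for start in range(0, len(labels), m * m):
            for i in range(m):
                current_set = labels[start + i * m: start + (i + 1) * m]
                # Rank the set by the covariate only; sorted() is stable, so ties
                # keep their label order.
                ranked = sorted(current_set, key=lambda lab: zcon[lab - 1])
                # The (i + 1)-th set of the block contributes its (i + 1)-th ranked unit.
                selected.append(ranked[i])
        labels = selected  # the pool shrinks by a factor of m at each stage
    return labels


if __name__ == "__main__":
    # Toy covariate values for m = 2 and r = 2, i.e. 2^3 = 8 units.
    print(msrss_cycle([5, 1, 2, 8, 7, 3, 4, 6], m=2, r=2))
```

Only the units whose labels are returned need the costly measurement of the variable of interest; all ranking uses the covariate alone.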


About this article

Cite this article

Mahdizadeh, M., Zamanzade, E. Using a rank-based design in estimating prevalence of breast cancer. Soft Comput 26, 3161–3170 (2022). https://doi.org/10.1007/s00500-022-06770-0


Accepted: 22 December 2021
Published: 10 February 2022
Issue Date: April 2022
DOI: https://doi.org/10.1007/s00500-022-06770-0
