Using a rank-based design in estimating prevalence of breast cancer

Soft Computing (Foundations)

Abstract

Monitoring the prevalence of breast cancer, a leading cause of cancer-related death among women, is highly important for governments and health organizations. However, accurate diagnosis of this disease is expensive, especially in developing countries. This article concerns a cost-efficient method for estimating the prevalence of breast cancer when diagnosis is based on a comprehensive biopsy procedure. Multistage ranked set sampling (MSRSS) is utilized to develop a proportion estimator. This design employs imprecise rankings based on some visually assessed cytological covariates, so as to provide the experimenter with a more informative sample. Theoretical properties of the proposed estimator are explored, and evidence from numerical studies is reported. The developed procedure can be substantially more efficient than its competitor in simple random sampling (SRS). In some situations, proportion estimation in MSRSS needs around 76% fewer observations than in SRS for a given precision level. Thus, using MSRSS may lead to a considerable reduction in cost with respect to SRS. In many medical studies, e.g., diagnosing breast cancer based on a full biopsy procedure, exact quantification is difficult (costly and/or time-consuming), but the potential sample units can be ranked fairly accurately without actual measurements. In this setup, multistage ranked set sampling is an appropriate design for developing cost-efficient statistical methods.

Notes

  1. https://cran.r-project.org/web/packages/mlbench/index.html.

  2. It is accessible at http://lib.stat.cmu.edu/datasets/bodyfat.

References

  • Al-Saleh MF, Al-Omari AI (2002) Multistage ranked set sampling. J Stat Plan Inference 102:273–286

  • Chen H, Stasny EA, Wolfe DA (2005) Ranked set sampling for efficient estimation of a population proportion. Stat Med 24:3319–3329

  • Coelho F, Braga AP, Natowicz R et al (2011) Semi-supervised model applied to the prediction of the response to preoperative chemotherapy for breast cancer. Soft Comput 15:1137–1144

  • Dell TR, Clutter JL (1972) Ranked set sampling theory with order statistics background. Biometrics 28:545–555

  • Frey J (2007) New imperfect rankings models for ranked set sampling. J Stat Plan Inference 137:1433–1445

  • Frey J, Feeman TJ (2018) Finding the maximum efficiency for multistage ranked-set sampling. Commun Stat Theory Methods 47:4131–4141

  • Frey J, Zhang Y (2021) Robust confidence intervals for a proportion using ranked-set sampling. J Korean Stat Soc. https://doi.org/10.1007/s42952-020-00103-3

  • Gemayel NM, Stasny EA, Tackett JA, Wolfe DA (2012) Ranked set sampling: an auditing application. Rev Quant Financ Acc 39:413–422

  • Mahdizadeh M, Strzalkowska-Kominiak E (2017) Resampling based inference for a distribution function using censored ranked set samples. Comput Stat 32:1285–1308

  • Mahdizadeh M, Zamanzade E (2017) Reliability estimation in multistage ranked set sampling. REVSTAT 15:565–581

  • Mahdizadeh M, Zamanzade E (2019a) Dynamic reliability estimation in a rank-based design. Probab Math Stat 39:1–18

  • Mahdizadeh M, Zamanzade E (2019b) Efficient body fat estimation using multistage pair ranked set sampling. Stat Methods Med Res 28:223–234

  • Mahdizadeh M, Zamanzade E (2020a) Estimating asymptotic variance of M-estimators in ranked set sampling. Comput Stat 35:1785–1803

  • Mahdizadeh M, Zamanzade E (2020b) Estimation of a symmetric distribution function in multistage ranked set sampling. Stat Pap 61:851–867

  • Mahdizadeh M, Zamanzade E (2021) Smooth estimation of the area under the ROC curve in multistage ranked set sampling. Stat Pap 62:1753–1776

  • McIntyre GA (1952) A method of unbiased selective sampling using ranked sets. Aust J Agric Res 3:385–390

  • Ogiela MR, Krzyworzeka N (2016) Heuristic approach for computer-aided lesion detection in mammograms. Soft Comput 20:4193–4202

  • Penrose KW, Nelson AG, Fisher AG (1985) Generalized body composition prediction equation for men using simple measurement techniques. Med Sci Sports Exerc 17:189

  • Presnell B, Bohn LL (1999) U-Statistics and imperfect ranking in ranked set sampling. J Nonparamet Stat 10:111–126

  • Street WN, Wolberg WH, Mangasarian OL (1993) Nuclear feature extraction for breast tumor diagnosis. Proc SPIE Int Soc Opt Eng 1905:861–870

  • Terpstra JT, Liudahl LA (2004) Concomitant-based rank set sampling proportion estimates. Stat Med 23:2061–2070

  • Terpstra JT, Miller ZA (2006) Exact inference for a population proportion based on a ranked set sample. Commun Stat Simul Comput 35:19–27

  • Terpstra JT, Wang P (2008) Confidence intervals for a population proportion based on a ranked set sample. J Stat Comput Simul 78:351–366

  • Wahde M, Szallasi Z (2006) Improving the prediction of the clinical outcome of breast cancer using evolutionary algorithms. Soft Comput 10:338–345

  • Wolfe DA (2012) Ranked set sampling: its relevance and impact on statistical inference. ISRN Probab Stat 2012:568385

  • Zamanzade E, Mahdizadeh M (2017) A more efficient proportion estimator in ranked set sampling. Stat Probab Lett 129:28–33

  • Zamanzade E, Mahdizadeh M (2018) Estimating the population proportion in pair ranked set sampling with application to air quality monitoring. J Appl Stat 45:426–437

Acknowledgements

The authors are grateful to the Area Editor and the reviewer for their careful reading of an earlier version of this manuscript and for providing many constructive comments. Ehsan Zamanzade’s research was carried out at the IPM Isfahan branch and was supported in part by a grant from IPM, Iran (No. 1400620422).

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Proof of Proposition 1

(a) Let \(F_{[i]}(x)\) (\(i=1,\ldots ,m\)) be the common distribution function of judgment order statistics with rank i, i.e., \(X_{[i]1},\ldots ,X_{[i]n}\). If the ranking scheme is consistent, then we have

$$F(x)=\frac{1}{m} \sum_{i=1}^m F_{[i]}(x), \tag{1}$$

where F(x) is the population distribution function (see Lemma 1 in Presnell and Bohn (1999) for details). The unbiasedness of \(\hat{p}_{\text {RSS}}\) is immediate from this identity. Also, the variance expression is obtained by noting that elements of \(\{X_{[i]j}: i=1,\ldots ,m\,;j=1,\ldots ,n \}\) are independent, and each \(X_{[i]j}\) is a Bernoulli random variable with the success probability \(p_{[i]}\).

(b) From the first part, one can write

$$\begin{aligned}
Var\left( \hat{p}_{\text{RSS}} \right) &= \frac{1}{n m^2} \sum_{i=1}^m p_{[i]} \left( 1-p_{[i]} \right) \\
&= \frac{1}{n m^2} \left[ \sum_{i=1}^m p_{[i]}-\sum_{i=1}^m p_{[i]}^2 \right] \\
&= \frac{1}{n m^2} \left[ mp-\sum_{i=1}^m \left( p_{[i]}-p+p \right)^2 \right] \\
&= \frac{1}{n m^2} \left[ mp-mp^2-\sum_{i=1}^m \left( p_{[i]}-p \right)^2 \right] \\
&= \frac{p(1-p)}{n m}-\frac{1}{n m^2} \sum_{i=1}^m \left( p_{[i]}-p \right)^2 \\
&\le Var\left( \hat{p}_{\text{SRS}} \right),
\end{aligned}$$

where in the third equality, identity (1) has been used.
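Parts (a) and (b) can be checked numerically in a special case. The sketch below is only an illustration under an assumption that is not made in the proof: the ranking is error-free and is carried out on the binary variable itself, so that the rank-i unit is a success exactly when at least \(m-i+1\) of the m units in its set are successes. The function names and the values of p, m and n are hypothetical.

```python
from math import comb

def perfect_ranking_probs(p, m):
    """Success probabilities p_(i), i = 1..m, for the order statistics of
    m i.i.d. Bernoulli(p) variables (assumption: error-free ranking)."""
    # pmf of the number of successes in one set of size m
    pmf = [comb(m, k) * p**k * (1 - p)**(m - k) for k in range(m + 1)]
    # the i-th smallest binary value is 1 iff at least m - i + 1 values are 1
    return [sum(pmf[m - i + 1:]) for i in range(1, m + 1)]

def var_rss(probs, n):
    """Variance of the RSS estimator from part (a)."""
    m = len(probs)
    return sum(q * (1 - q) for q in probs) / (n * m**2)

def var_srs(p, m, n):
    """Variance of the SRS estimator based on n*m observations."""
    return p * (1 - p) / (n * m)

p, m, n = 0.3, 3, 10  # illustrative values only
probs = perfect_ranking_probs(p, m)

# identity (1) evaluated for a binary variable: the p_(i) average to p
identity_gap = abs(sum(probs) / m - p)

# last equality in part (b): Var(RSS) = Var(SRS) - sum (p_(i) - p)^2 / (n m^2)
reduction = sum((q - p)**2 for q in probs) / (n * m**2)
decomposition_gap = abs(var_rss(probs, n) - (var_srs(p, m, n) - reduction))

relative_efficiency = var_srs(p, m, n) / var_rss(probs, n)
print(probs, relative_efficiency)
```

Both gaps are zero up to floating-point error, and the relative efficiency exceeds one whenever the \(p_{[i]}\) are not all equal to p.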

(c) It can be seen that

$$\begin{aligned} \hat{p}_{\text {RSS}}=\frac{1}{n}\left( \frac{1}{m}\sum _{i=1}^m X_{[i]1} +\cdots + \frac{1}{m}\sum _{i=1}^m X_{[i]n} \right) . \end{aligned}$$

To put it another way, \(\hat{p}_{\text {RSS}}\) is the sample mean of n independent and identically distributed random variables with mean p and variance \(\sum _{i=1}^m p_{[i]} \left( 1-p_{[i]} \right) /m^2\). The asymptotic normality is then concluded from the central limit theorem. \(\square \)
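The normal approximation in part (c) suggests a Wald-type confidence interval for p. The following is a minimal sketch and not a procedure taken from the article: it assumes that the variance is estimated by plugging the rank-wise sample proportions into the variance expression of part (a), and the function name and the small data set are hypothetical. The approximation is only meaningful for a large number of cycles n; the tiny example merely shows the arithmetic.

```python
from statistics import NormalDist

def rss_proportion_ci(x, level=0.95):
    """Wald-type interval for p from a balanced ranked set sample.

    x[i][j] is the binary measurement with judgment rank i+1 in cycle j+1,
    so x has m rows (ranks) and n columns (cycles).
    """
    m, n = len(x), len(x[0])
    rank_means = [sum(row) / n for row in x]      # estimates of p_[i]
    p_hat = sum(rank_means) / m                   # RSS proportion estimator
    # plug-in version of sum p_[i](1 - p_[i]) / (n m^2)  (assumption)
    var_hat = sum(q * (1 - q) for q in rank_means) / (n * m**2)
    z = NormalDist().inv_cdf(0.5 + level / 2)     # standard normal quantile
    half_width = z * var_hat**0.5
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# hypothetical data: m = 3 ranks, n = 4 cycles
x = [
    [0, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
]
p_hat, lower, upper = rss_proportion_ci(x)
print(p_hat, lower, upper)
```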

Proof of Proposition 2

(a) An identity similar to (1) holds in MSRSS (see Proposition 3 in Mahdizadeh and Zamanzade (2020b)). It establishes that

$$F(x)=\frac{1}{m} \sum_{i=1}^m F_{[i]}^{(r)}(x), \tag{2}$$

where \(F_{[i]}^{(r)}(x)\) (\(i=1,\ldots ,m\)) is the common distribution function of \(X_{[i]1}^{(r)},\ldots ,X_{[i]n}^{(r)}\). An application of identity (2) shows the unbiasedness of \(\hat{p}_{\text {MSRSS}}^{(r)}\). The corresponding variance is derived in the same way as that of \(\hat{p}_{\text {RSS}}\).

(b) This is proved in line with part (b) of Proposition 1.

(c) Suppose \(\mathcal {X}_{(i)}^{(r-1)}\) (\(i=1,\ldots ,m\)) is the ith order statistic of \(X_{[1]1}^{(r-1)},\ldots ,X_{[m]1}^{(r-1)}\). Then, we have

$$\begin{aligned}
Var\left( \hat{p}_{\text{MSRSS}}^{(r-1)} \right) &= \frac{1}{n m^2} Var\left( \sum_{i=1}^m X_{[i]1}^{(r-1)} \right) \\
&= \frac{1}{n m^2} Var\left( \sum_{i=1}^m \mathcal{X}_{(i)}^{(r-1)} \right) \\
&= \frac{1}{n m^2} \left[ \sum_{i=1}^m Var\left( \mathcal{X}_{(i)}^{(r-1)} \right) + \sum_{i\ne j=1}^m Cov\left( \mathcal{X}_{(i)}^{(r-1)}, \mathcal{X}_{(j)}^{(r-1)} \right) \right].
\end{aligned} \tag{3}$$

It can be shown that the covariance terms in (3) are non-negative. To do so, without loss of generality, it is assumed that \(i<j\). Since the variables are binary and \(\mathcal {X}_{(i)}^{(r-1)} \le \mathcal {X}_{(j)}^{(r-1)}\), the product \(\mathcal {X}_{(i)}^{(r-1)} \mathcal {X}_{(j)}^{(r-1)}\) equals \(\mathcal {X}_{(i)}^{(r-1)}\). Then, one can write

$$\begin{aligned}
Cov\left( \mathcal{X}_{(i)}^{(r-1)}, \mathcal{X}_{(j)}^{(r-1)} \right) &= E\left( \mathcal{X}_{(i)}^{(r-1)} \mathcal{X}_{(j)}^{(r-1)} \right) - E\left( \mathcal{X}_{(i)}^{(r-1)} \right) E\left( \mathcal{X}_{(j)}^{(r-1)} \right) \\
&= P\left( \mathcal{X}_{(i)}^{(r-1)}=1 \right) - P\left( \mathcal{X}_{(i)}^{(r-1)}=1 \right) P\left( \mathcal{X}_{(j)}^{(r-1)}=1 \right) \\
&= p_{[i]}^{(r)} \left( 1-p_{[j]}^{(r)} \right).
\end{aligned} \tag{4}$$

Putting (3) and (4) together, it follows that

$$\begin{aligned}
Var\left( \hat{p}_{\text{MSRSS}}^{(r-1)} \right) &\ge \frac{1}{n m^2} \sum_{i=1}^m Var\left( \mathcal{X}_{(i)}^{(r-1)} \right) \\
&= \frac{1}{n m^2} \sum_{i=1}^m Var\left( X_{[i]1}^{(r)} \right) \\
&= Var\left( \hat{p}_{\text{MSRSS}}^{(r)} \right),
\end{aligned}$$

where the first equality results from the fact that \(\mathcal {X}_{(i)}^{(r-1)}\) and \(X_{[i]1}^{(r)}\) are identically distributed according to the MSRSS procedure.

(d) The proof of asymptotic normality of \(\hat{p}_{\text {MSRSS}}^{(r)}\) parallels that of \(\hat{p}_{\text {RSS}}\), and it is omitted. \(\square \)
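The monotonicity in part (c) can be illustrated numerically. The sketch below relies on assumptions that go beyond the proof: rankings are error-free at every stage, and stage \(r=0\) is taken to mean SRS with nm observations (a labelling convention adopted here only for convenience). It uses the fact from the proof that \(X_{[i]1}^{(r)}\) is distributed as the ith order statistic of \(X_{[1]1}^{(r-1)},\ldots ,X_{[m]1}^{(r-1)}\), which are independent Bernoulli variables. All function names and parameter values are hypothetical.

```python
def order_stat_probs(probs):
    """P(i-th smallest of independent Bernoulli(probs[k]) equals 1), i = 1..m."""
    m = len(probs)
    # pmf of the number of successes (Poisson-binomial), by dynamic programming
    pmf = [1.0] + [0.0] * m
    for q in probs:
        for k in range(m, 0, -1):
            pmf[k] = pmf[k] * (1 - q) + pmf[k - 1] * q
        pmf[0] *= 1 - q
    # the i-th smallest binary value is 1 iff at least m - i + 1 values are 1
    return [sum(pmf[m - i + 1:]) for i in range(1, m + 1)]

def msrss_variances(p, m, n, max_stage):
    """Variance of the stage-r estimator for r = 0 (SRS), 1 (RSS), ..., max_stage."""
    probs = [p] * m  # stage 0: every unit is a plain Bernoulli(p) draw
    variances = []
    for _ in range(max_stage + 1):
        variances.append(sum(q * (1 - q) for q in probs) / (n * m**2))
        probs = order_stat_probs(probs)  # success probabilities at the next stage
    return variances

variances = msrss_variances(p=0.3, m=3, n=10, max_stage=4)
for r, v in enumerate(variances):
    print(r, v)
```

Under these assumptions the printed variances do not increase with r, in agreement with part (c).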

Appendix B

Suppose that \(m^{r+1}\) units are randomly identified from the population and are assigned labels \(1,\ldots ,m^{r+1}\). Also, the corresponding measured values of the covariate are stored in the vector zcon. Then, we may use the following function to draw a single cycle of an rth stage ranked set sample using set size m. It returns a vector of length m, consisting of labels \(\ell _i\) (\(i=1,\ldots ,m\)). The final sample is given by \(X_{[1]1}^{(r)},\ldots , X_{[m]1}^{(r)}\), where \(X_{[i]1}^{(r)}\) is the measurement of the variable of interest for the unit with label \(\ell _i\).

[Figure a: listing of the sampling function; not reproduced here]
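A Python sketch of such a function is given below. It is a reconstruction from the description in this appendix and from the recursive structure used in the proof of Proposition 2, not the authors' original listing. The name `draw_msrss_cycle` is hypothetical, and it is assumed that the units are already in random order, so that consecutive blocks of labels can serve as the sets. At each stage, the current labels are split into blocks of \(m^2\) units, each block is split into m sets of size m, and the unit with the ith smallest covariate value is kept from the ith set.

```python
def draw_msrss_cycle(zcon, m, r):
    """Return the labels (1-based) of one cycle of an r-th stage ranked set sample.

    zcon -- covariate values of the m**(r+1) identified units, in label order
    m    -- set size
    r    -- number of stages (r = 1 gives ordinary ranked set sampling)
    """
    if len(zcon) != m ** (r + 1):
        raise ValueError("zcon must contain exactly m**(r+1) values")
    labels = list(range(1, len(zcon) + 1))
    for _ in range(r):
        selected = []
        # each block of m*m labels yields one ranked set of size m
        for start in range(0, len(labels), m * m):
            block = labels[start:start + m * m]
            for i in range(m):
                current_set = block[i * m:(i + 1) * m]
                # rank the set by the covariate; ties keep label order
                ranked = sorted(current_set, key=lambda lab: zcon[lab - 1])
                selected.append(ranked[i])  # keep the unit with rank i + 1
        labels = selected
    return labels

# m = 2, r = 1: sets {1, 2} and {3, 4}
print(draw_msrss_cycle([0.5, 0.1, 0.3, 0.9], m=2, r=1))
# m = 2, r = 2: eight units, two stages
print(draw_msrss_cycle([5, 1, 3, 9, 2, 8, 7, 4], m=2, r=2))
```

Only the units whose labels are returned need to be measured on the variable of interest; the covariate values are used for ranking alone.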

Cite this article

Mahdizadeh, M., Zamanzade, E. Using a rank-based design in estimating prevalence of breast cancer. Soft Comput 26, 3161–3170 (2022). https://doi.org/10.1007/s00500-022-06770-0

  • DOI: https://doi.org/10.1007/s00500-022-06770-0
