Classification of multivariate count data with multivariate log-linear conditional Poisson distribution

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

A classification model is proposed for distinguishing between several subpopulations using a multivariate count dataset. The classification rule, which minimizes the probability of misclassification, is obtained under the distributional hypothesis of a multivariate log-linear conditional Poisson distribution. A sample classification rule is defined based on the maximum likelihood estimators of the distributional parameters. This rule is based on functions associated with each of the subpopulations or, equivalently, on the estimated posterior probabilities. Additionally, the likelihood ratio test of equality of the parameters across all the subpopulations is analyzed, providing a measure of the power to discriminate between subpopulations. Furthermore, an algorithm to determine the most suitable subset of count variables for classification is proposed. Finally, real and simulated datasets are considered to illustrate the application of the methodology.

Corresponding author

Correspondence to Juan M. Muñoz-Pichardo.

Appendix A

Approximation to moments of the distribution

Consider the three-dimensional case, that is, \(\underline{Y} \sim MLCP_3(\underline{\eta },{\textbf{A}})\), with

$$\underline{\eta }=(\eta _1,\eta _2,\eta _3) \qquad \text { and } \qquad {\textbf{A}}= \left( \begin{array}{ccc} 0 & 0 & 0 \\ \alpha _{21} & 0 & 0 \\ \alpha _{31} & \alpha _{32} & 0 \end{array} \right) .$$

Because the moments of the distribution are difficult to calculate exactly, we approximate them by a quadratic approximation, that is, a second-degree Taylor polynomial. To evaluate how the moments change with the parameters that determine the statistical dependence structure between the components (\({\textbf{A}}\)), the polynomial approximation is carried out with respect to these parameters, keeping the parameter vector \(\underline{\eta }\) fixed. The proofs of the following results are given in Muñoz-Pichardo and Pino-Mejías (2023).

  1. Expected values

    (a) \(E[Y_{1}] = e^{\eta _{1}}\).

    (b) \(E[Y_{2}] = e^{\eta _{2}} \exp \left[ e^{\eta _{1}}(e^{\alpha _{21}}-1)\right]\).

    (c) The quadratic approximation of \(E[Y_{3}]\) is given by

      $$\begin{aligned} E[Y_{3}]\approx &\,e^{\eta _{3}} + e^{\eta _{1}+\eta _{3}} \; \alpha _{31} + e^{\eta _{2}+\eta _{3}}\; \alpha _{32} \\ & + \frac{1}{2} e^{\eta _{1}+\eta _{2}+\eta _{3}} \; \alpha _{21}\alpha _{32} + \frac{1}{2} e^{\eta _{1} + \eta _{3}}\left( e^{\eta _{1}}+1\right) \; \alpha _{31}^{2} \\ & + \frac{1}{2} e^{\eta _{1}+\eta _{2}+\eta _{3}} \; \alpha _{31} \alpha _{32} + \frac{1}{2} e^{\eta _{2}+\eta _{3}} \left( e^{\eta _{2}}+1\right) \; \alpha _{32}^{2}. \end{aligned}$$

  2. Variances

    (a) \(Var[Y_{1}] = e^{\eta _{1}}\).

    (b) \(Var[Y_{2}] = E[Y_{2}] +\left( E[Y_{2}]\right) ^2 \left\{ \exp \left[ e^{\eta _{1}} (e^{\alpha _{21}}-1)^{2} \right] -1 \right\}\).

    (c) The quadratic approximation of \(Var[Y_{3}]\) is given by

      $$\begin{aligned} Var[Y_{3}]\approx &\,e^{\eta _{3}} + e^{\eta _{1}+\eta _{3}} \alpha _{31} + e^{\eta _{2}+\eta _{3}}\alpha _{32} \\ & +\frac{1}{2} e^{\eta _{1}+\eta _{2}+\eta _{3}}\left( 1-e^{\eta _{3}}\right) \alpha _{21}\alpha _{32} + \frac{1}{2} e^{\eta _{1}+\eta _{3}}\left[ 1+e^{\eta _{1}}+2e^{\eta _{3}}\right] \alpha _{31}^{2} \\ & + \frac{1}{2} e^{\eta _{1}+\eta _{2}+\eta _{3}}\alpha _{31}\alpha _{32} + \frac{1}{2} e^{\eta _{2}+\eta _{3}}\left[ 1+e^{\eta _{2}}+2e^{\eta _{3}}\right] \alpha _{32}^{2}. \end{aligned}$$

  3. Covariances

    (a) \(Cov[Y_{1},Y_{2}] = e^{\eta _{1}}(e^{\alpha _{21}}-1)E[Y_{2}]\).

    (b) The quadratic approximation of \(Cov[Y_{1},Y_{3}]\) is given by

      $$\begin{aligned} Cov[Y_{1},Y_{3}] &\approx e^{\eta _{1}+\eta _{3}} \; \alpha _{31} + e^{\eta _{1}+\eta _{2}+\eta _{3}} \; \alpha _{21}\alpha _{32} \\ & \quad + \frac{1}{2} e^{\eta _{1}+\eta _{3}} \left[ 1+2e^{\eta _{1}} \right] \; \alpha _{31}^{2} + \frac{1}{2} e^{\eta _{1}+\eta _{2}+\eta _{3}} \;\alpha _{31}\alpha _{32}. \end{aligned}$$

    (c) The quadratic approximation of \(Cov[Y_{2},Y_{3}]\) is given by

      $$\begin{aligned} Cov[Y_{2},Y_{3}] & \approx e^{\eta _{2}+\eta _{3}} \; \alpha _{32} + \frac{1}{2} e^{\eta _{1}+\eta _{2}+\eta _{3}} \; \alpha _{21}\alpha _{31} \\ & \quad + \frac{1}{2} e^{\eta _{1}+\eta _{2}+\eta _{3}} \; \alpha _{21}\alpha _{32} + \frac{1}{2} e^{\eta _{1}+\eta _{2}+\eta _{3}}\; \alpha _{31}\alpha _{32}. \end{aligned}$$
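
The exact (non-approximate) expressions for \(E[Y_2]\), \(Var[Y_2]\) and \(Cov[Y_1,Y_2]\) listed above can be checked numerically. The Python sketch below rests on an assumption that this appendix does not state explicitly: that \(Y_1\) is Poisson with mean \(e^{\eta _1}\) and that, given \(Y_1=y_1\), \(Y_2\) is Poisson with mean \(e^{\eta _2+\alpha _{21}y_1}\). Under that assumption, the closed forms are compared with a truncated sum over the distribution of \(Y_1\). The function names and the truncation point `kmax` are illustrative choices, not part of the article.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(Y = k), computed on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def closed_form_moments(eta1, eta2, alpha21):
    """Closed-form E[Y2], Var[Y2] and Cov[Y1, Y2] as listed in the appendix."""
    lam1 = math.exp(eta1)                 # E[Y1] = Var[Y1] = exp(eta1)
    g = math.expm1(alpha21)               # exp(alpha21) - 1
    mean_y2 = math.exp(eta2) * math.exp(lam1 * g)
    var_y2 = mean_y2 + mean_y2 ** 2 * math.expm1(lam1 * g ** 2)
    cov_y1_y2 = lam1 * g * mean_y2
    return {"E[Y2]": mean_y2, "Var[Y2]": var_y2, "Cov[Y1,Y2]": cov_y1_y2}


def series_moments(eta1, eta2, alpha21, kmax=200):
    """Same moments by summing over Y1 = 0..kmax.

    ASSUMPTION (not stated in the appendix): Y1 ~ Poisson(exp(eta1)) and
    Y2 | Y1 = k ~ Poisson(exp(eta2 + alpha21 * k)).
    """
    lam1 = math.exp(eta1)
    m1 = m2 = cross = 0.0                 # E[mu2], E[mu2^2], E[Y1 * mu2]
    for k in range(kmax + 1):
        p = poisson_pmf(k, lam1)
        mu2 = math.exp(eta2 + alpha21 * k)    # conditional mean of Y2
        m1 += p * mu2
        m2 += p * mu2 * mu2
        cross += p * k * mu2
    # Law of total variance: Var[Y2] = E[Var(Y2|Y1)] + Var(E[Y2|Y1]).
    var_y2 = m1 + (m2 - m1 * m1)
    return {"E[Y2]": m1, "Var[Y2]": var_y2, "Cov[Y1,Y2]": cross - lam1 * m1}


if __name__ == "__main__":
    exact = closed_form_moments(0.5, 0.2, 0.3)
    numeric = series_moments(0.5, 0.2, 0.3)
    for name in exact:
        print(name, exact[name], numeric[name])
```

Setting `alpha21 = 0` in `closed_form_moments` reduces \(Y_2\) to a Poisson variable with mean and variance \(e^{\eta _2}\) and zero covariance with \(Y_1\), which is a quick sanity check on the role of \(\alpha _{21}\) as the dependence parameter.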

Cite this article

Muñoz-Pichardo, J.M., Pino-Mejías, R. Classification of multivariate count data with multivariate log-linear conditional Poisson distribution. Adv Data Anal Classif (2024). https://doi.org/10.1007/s11634-024-00617-2

  • DOI: https://doi.org/10.1007/s11634-024-00617-2