
Binary segmentation procedures using the bivariate binomial distribution for detecting streakiness in sports data

  • Original paper
  • Published in Computational Statistics

Abstract

Streakiness is an important characteristic of many sports datasets in which an individual player's or team's success rate is not constant over time. That is, there are many successes/failures during some periods and few or no successes/failures during other periods. In this paper we propose a Bayesian binary segmentation procedure using a bivariate binomial distribution to locate the changepoints and estimate the associated success rates. The proposed method consists of a series of nested hypothesis tests based on Bayes factors or posterior probabilities. At each stage, we compare three different changepoint models to the constant success rate model using bivariate binary data. As an illustration, the proposed method is applied to real sports datasets on baseball and basketball players. Extensive simulation studies are performed to demonstrate the usefulness of the proposed methodologies.
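The abstract's procedure, in outline, is recursive binary segmentation driven by Bayes factors. The paper's actual method uses a bivariate binomial and compares three changepoint models; the following is only a simplified univariate sketch of the same idea, assuming a single success rate per segment, Jeffreys Beta(1/2, 1/2) priors, a uniform prior on the changepoint location, and a hypothetical Bayes-factor cutoff of 3. All function names here are illustrative, not the authors' code.

```python
import math

def log_marg(s, f, a=0.5, b=0.5):
    # Log marginal likelihood of s successes / f failures under a
    # Beta(a, b) prior (a = b = 1/2 is the Jeffreys prior), up to the
    # binomial coefficient, which is common to all models and cancels.
    lbeta = lambda p, q: math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return lbeta(s + a, f + b) - lbeta(a, b)

def best_split(x, m):
    """Log Bayes factor of a one-changepoint model versus a constant-rate
    model, and the best changepoint, for per-game success counts x
    (each out of m attempts)."""
    n, s = len(x), sum(x)
    log_m0 = log_marg(s, n * m - s)
    best_lm, best_c = -math.inf, None
    for c in range(1, n):
        s1 = sum(x[:c])
        lm = (log_marg(s1, c * m - s1)
              + log_marg(s - s1, (n - c) * m - (s - s1))
              - math.log(n - 1))  # uniform prior over candidate changepoints
        if lm > best_lm:
            best_lm, best_c = lm, c
    return best_lm - log_m0, best_c

def binary_segment(x, m, offset=0, log_bf_cut=math.log(3.0)):
    """Recursive binary segmentation: keep splitting while the
    changepoint model is favored by the Bayes factor."""
    if len(x) < 2:
        return []
    lbf, c = best_split(x, m)
    if lbf <= log_bf_cut:
        return []
    return (binary_segment(x[:c], m, offset)
            + [offset + c]
            + binary_segment(x[c:], m, offset + c))
```

On clearly two-regime data such as `[1]*10 + [4]*10` with `m = 5` (a 20% success rate followed by an 80% one), the sketch recovers the single changepoint at game 10 and declines to split the homogeneous halves further.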



Acknowledgements

The first author’s research was partially supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07045804).

Author information


Corresponding author

Correspondence to Jinheum Kim.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Proposition 3.1

We only prove parts (b) and (c). The derivations for \(\pi _0\) and \(\pi _3\) can be obtained using arguments similar to those in the proofs of (b) and (c). Note that the Jeffreys prior for a multi-parameter case with \({\varvec{\theta }}=(\theta _1,\dots ,\theta _k)\) is \(\pi ({\varvec{\theta }}) \propto |I({\varvec{\theta }})|^{1/2}\), where \(I({\varvec{\theta }})\) is the Fisher information matrix for \({\varvec{\theta }}\) whose \((i,j)\) element is

$$\begin{aligned} I_{ij}=-\text{ E } \biggl [\frac{\partial ^2 \log L({\varvec{\theta }})}{\partial \theta _i \partial \theta _j}\biggr ] \quad \text { for } 1 \le i, j \le k. \end{aligned}$$
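For a single binomial proportion this recipe gives the familiar Beta(1/2, 1/2) Jeffreys prior, which the following small numerical check illustrates (the function names are ours, chosen for the sketch):

```python
import math

def fisher_info_binomial(theta, m, n=1):
    # Fisher information for n observations of a Binomial(m, theta) count:
    # I(theta) = n * m / (theta * (1 - theta))
    return n * m / (theta * (1.0 - theta))

def jeffreys_kernel(theta, m, n=1):
    # Jeffreys prior: pi(theta) proportional to |I(theta)|^{1/2}
    return math.sqrt(fisher_info_binomial(theta, m, n))

# For any fixed m and n the kernel is proportional to
# theta^{-1/2} * (1 - theta)^{-1/2}, i.e. a Beta(1/2, 1/2) density
# up to normalization, so this ratio is constant in theta:
ratio = [jeffreys_kernel(t, m=5, n=20) / (t ** -0.5 * (1 - t) ** -0.5)
         for t in (0.1, 0.3, 0.7)]
```

The sample size enters only through the constant of proportionality, which is why the Jeffreys prior for a binomial proportion does not depend on \(n\) or \(m\).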

(b) For a fixed value of c, the log-likelihood of model \({{{\mathcal {M}}}}_1\) is

$$\begin{aligned} \lambda _1&= \log {L_{1}(\theta _{11},\theta _{12},\theta _{20})}\\ &= \sum _{i=1}^{c}x_{1i} \log {\theta _{11}} + \sum _{i=1}^{c} (m-x_{1i})\log {(1-\theta _{11})}+\sum _{i=c+1}^{n}x_{1i} \log {\theta _{12}} \\ &\quad +\sum _{i=c+1}^{n} (m-x_{1i})\log {(1-\theta _{12})}+\sum _{i=1}^{n}x_{2i} \log {\theta _{20}}+\sum _{i=1}^{n}(x_{1i}-x_{2i})\log (1-\theta _{20}). \end{aligned}$$

The partial derivatives can be obtained as

$$\begin{aligned} \frac{\partial \lambda _1}{\partial \theta _{11}}&= \frac{\sum _{i=1}^{c}x_{1i}}{\theta _{11}}- \frac{\sum _{i=1}^{c}(m-x_{1i})}{1-\theta _{11}},\\ \frac{\partial ^{2}\lambda _1}{\partial \theta _{11}^{2}}&= -\frac{\sum _{i=1}^{c}x_{1i}}{\theta _{11}^{2}}- \frac{\sum _{i=1}^{c}(m-x_{1i})}{(1-\theta _{11})^{2}},\\ \frac{\partial \lambda _1}{\partial \theta _{12}}&= \frac{\sum _{i=c+1}^{n}x_{1i}}{\theta _{12}}- \frac{\sum _{i=c+1}^{n}(m-x_{1i})}{1-\theta _{12}},\\ \frac{\partial ^{2}\lambda _1}{\partial \theta _{12}^{2}}&= -\frac{\sum _{i=c+1}^{n}x_{1i}}{\theta _{12}^{2}}- \frac{\sum _{i=c+1}^{n}(m-x_{1i})}{(1-\theta _{12})^{2}},\\ \frac{\partial \lambda _1}{\partial \theta _{20}}&= \frac{\sum _{i=1}^{n}x_{2i}}{\theta _{20}}- \frac{\sum _{i=1}^{n}(x_{1i}-x_{2i})}{1-\theta _{20}},\\ \frac{\partial ^{2} \lambda _1}{\partial \theta _{20}^{2}}&= -\frac{\sum _{i=1}^{n}x_{2i}}{\theta _{20}^{2}}- \frac{\sum _{i=1}^{n}(x_{1i}-x_{2i})}{(1-\theta _{20})^{2}}. \end{aligned}$$

Since all the cross partial derivatives are zero, the Fisher information matrix is

$$\begin{aligned} I(\theta _{11},\theta _{12},\theta _{20})= \begin{pmatrix} \dfrac{cm}{\theta _{11}(1-\theta _{11})} &{} 0 &{} 0 \\ 0 &{} \dfrac{(n-c)m}{\theta _{12}(1-\theta _{12})} &{} 0 \\ 0 &{} 0 &{} \dfrac{cm\theta _{11}+(n-c)m\theta _{12}}{\theta _{20}(1-\theta _{20})} \end{pmatrix}. \end{aligned}$$
(A1)

The Jeffreys prior for model \({{{\mathcal {M}}}}_1\) readily follows from (A1), since the off-diagonal elements are all zero. \(\square \)
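Because (A1) is diagonal, \(|I|^{1/2}\) factors into the product of the square roots of the diagonal entries. A small sketch can both exploit that factorization and sanity-check the (3, 3) entry by Monte Carlo, using the conditional structure \(x_{1i}\sim \text{Bin}(m,\theta _{1\cdot })\), \(x_{2i}\mid x_{1i}\sim \text{Bin}(x_{1i},\theta _{20})\); the parameter values below are arbitrary illustrations, not from the paper.

```python
import math
import random

def fisher_diag_m1(t11, t12, t20, n, c, m):
    """Diagonal entries of the Fisher information (A1) for model M1."""
    return (c * m / (t11 * (1 - t11)),
            (n - c) * m / (t12 * (1 - t12)),
            (c * m * t11 + (n - c) * m * t12) / (t20 * (1 - t20)))

def jeffreys_m1(t11, t12, t20, n, c, m):
    # With zero off-diagonal entries, |I|^{1/2} is the product of the
    # square roots of the diagonal entries.
    return math.prod(math.sqrt(d)
                     for d in fisher_diag_m1(t11, t12, t20, n, c, m))

def mc_entry_33(t11, t12, t20, n, c, m, reps=4000, seed=7):
    """Monte Carlo estimate of -E[d^2 lambda_1 / d theta_20^2]:
    simulate x_1i ~ Bin(m, theta_1i), then x_2i | x_1i ~ Bin(x_1i, theta_20),
    and average the observed negative second derivative."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        s2 = s12 = 0
        for i in range(n):
            p1 = t11 if i < c else t12
            x1 = sum(rng.random() < p1 for _ in range(m))
            x2 = sum(rng.random() < t20 for _ in range(x1))
            s2 += x2
            s12 += x1 - x2
        total += s2 / t20 ** 2 + s12 / (1 - t20) ** 2
    return total / reps
```

For example, with \(\theta_{11}=0.3\), \(\theta_{12}=0.6\), \(\theta_{20}=0.5\), \(n=10\), \(c=4\), \(m=5\), the (3, 3) entry of (A1) is \((4\cdot 5\cdot 0.3 + 6\cdot 5\cdot 0.6)/(0.5\cdot 0.5) = 96\), and the Monte Carlo average agrees up to simulation error.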

(c) Again, for a fixed value of c, the log-likelihood of model \({{{\mathcal {M}}}}_2\) is

$$\begin{aligned} \lambda _2&= \log {L_{2}(\theta _{10},\theta _{21},\theta _{22})}\\ &= \sum _{i=1}^{n}x_{1i} \log {\theta _{10}} + \sum _{i=1}^{n} (m-x_{1i})\log {(1-\theta _{10})}+\sum _{i=1}^{c}x_{2i} \log {\theta _{21}} \\ &\quad +\sum _{i=1}^{c} (x_{1i}-x_{2i})\log {(1-\theta _{21})}+ \sum _{i=c+1}^{n}x_{2i}\log {\theta _{22}}+\sum _{i=c+1}^{n}(x_{1i}-x_{2i})\log (1-\theta _{22}). \end{aligned}$$

Differentiating \(\lambda _2\) with respect to \(\theta _{10}\), \(\theta _{21}\), and \(\theta _{22}\) yields

$$\begin{aligned} \frac{\partial \lambda _2}{\partial \theta _{10}}&= \frac{\sum _{i=1}^{n}x_{1i}}{\theta _{10}}- \frac{\sum _{i=1}^{n}(m-x_{1i})}{1-\theta _{10}},\\ \frac{\partial ^{2}\lambda _2}{\partial \theta _{10}^{2}}&= -\frac{\sum _{i=1}^{n}x_{1i}}{\theta _{10}^{2}}- \frac{\sum _{i=1}^{n}(m-x_{1i})}{(1-\theta _{10})^{2}},\\ \frac{\partial \lambda _2}{\partial \theta _{21}}&= \frac{\sum _{i=1}^{c}x_{2i}}{\theta _{21}}- \frac{\sum _{i=1}^{c}(x_{1i}-x_{2i})}{1-\theta _{21}},\\ \frac{\partial ^{2}\lambda _2}{\partial \theta _{21}^{2}}&= -\frac{\sum _{i=1}^{c}x_{2i}}{\theta _{21}^{2}}- \frac{\sum _{i=1}^{c}(x_{1i}-x_{2i})}{(1-\theta _{21})^{2}},\\ \frac{\partial \lambda _2}{\partial \theta _{22}}&= \frac{\sum _{i=c+1}^{n}x_{2i}}{\theta _{22}}- \frac{\sum _{i=c+1}^{n}(x_{1i}-x_{2i})}{1-\theta _{22}},\\ \frac{\partial ^{2} \lambda _2}{\partial \theta _{22}^{2}}&= -\frac{\sum _{i=c+1}^{n}x_{2i}}{\theta _{22}^{2}}-\frac{\sum _{i=c+1}^{n}(x_{1i}-x_{2i})}{(1-\theta _{22})^{2}}. \end{aligned}$$

As in the proof of part (b), the off-diagonal elements of the Fisher information matrix are all zero, so the result follows by routine algebra. \(\square \)


About this article


Cite this article

Kim, S.W., Shahin, S., Ng, H.K.T. et al. Binary segmentation procedures using the bivariate binomial distribution for detecting streakiness in sports data. Comput Stat 36, 1821–1843 (2021). https://doi.org/10.1007/s00180-020-00992-2

