Abstract
Streakiness is an important measure in many sports datasets for individual players or teams whose success rate is not constant over time; that is, there are many successes (or failures) during some periods and few or none during others. In this paper we propose a Bayesian binary segmentation procedure using a bivariate binomial distribution to locate the changepoints and estimate the associated success rates. The proposed method consists of a series of nested hypothesis tests based on Bayes factors or posterior probabilities. At each stage, we compare three different changepoint models with the constant-success-rate model using the bivariate binary data. As an illustration, the proposed method is applied to real sports datasets on baseball and basketball players. Extensive simulation studies are performed to demonstrate the usefulness of the proposed methodology.
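The nested-testing idea can be illustrated with a small sketch. It is not the paper's implementation and rests on several assumptions: per-period counts follow a Crowder and Sweeting style bivariate binomial (m[t] first-stage trials, r[t] first-stage successes, s[t] second-stage successes among those r[t]); the two rates p and q get independent Beta priors rather than the priors derived in Proposition 3.1; the changepoint c has a uniform prior; and a split is accepted when the Bayes factor exceeds a hypothetical threshold of 10. In this sketch M1 lets only p change, M2 lets only q change, and M3 lets both change. All function names are illustrative.

```python
from math import lgamma, log, exp

# Hypothetical priors (NOT the priors of Proposition 3.1):
# independent Beta priors on the first-stage rate p and second-stage rate q.
A_P, B_P = 1.0, 0.5
A_Q, B_Q = 0.5, 0.5


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marg(succ, fail, a, b):
    """Beta-binomial log marginal likelihood without binomial coefficients
    (they are identical under every model, so they cancel in Bayes factors)."""
    return log_beta(a + succ, b + fail) - log_beta(a, b)


def seg_terms(m, r, s, lo, hi):
    """Log marginals for p and for q on periods lo..hi-1."""
    M, R, S = sum(m[lo:hi]), sum(r[lo:hi]), sum(s[lo:hi])
    return log_marg(R, M - R, A_P, B_P), log_marg(S, R - S, A_Q, B_Q)


def log_sum_exp(xs):
    top = max(xs)
    return top + log(sum(exp(x - top) for x in xs))


def log_evidences(m, r, s, lo, hi, min_len):
    """(log evidence, MAP changepoint) for M0 (constant rates), M1 (p changes),
    M2 (q changes) and M3 (both change), with a uniform prior on c."""
    p_all, q_all = seg_terms(m, r, s, lo, hi)
    per_c = {"M1": [], "M2": [], "M3": []}
    cands = list(range(lo + min_len, hi - min_len + 1))
    for c in cands:
        p1, q1 = seg_terms(m, r, s, lo, c)
        p2, q2 = seg_terms(m, r, s, c, hi)
        per_c["M1"].append(p1 + p2 + q_all)
        per_c["M2"].append(p_all + q1 + q2)
        per_c["M3"].append(p1 + p2 + q1 + q2)
    out = {"M0": (p_all + q_all, None)}
    for k, vals in per_c.items():
        best_c = cands[max(range(len(vals)), key=vals.__getitem__)]
        out[k] = (log_sum_exp(vals) - log(len(cands)), best_c)
    return out


def segment(m, r, s, lo=0, hi=None, threshold=log(10.0), min_len=2):
    """Binary segmentation: split lo..hi at the MAP changepoint of the best
    changepoint model while its log Bayes factor against M0 exceeds threshold,
    then repeat the test on each half."""
    if hi is None:
        hi = len(m)
    if hi - lo < 2 * min_len:
        return []
    ev = log_evidences(m, r, s, lo, hi, min_len)
    best = max(("M1", "M2", "M3"), key=lambda k: ev[k][0])
    if ev[best][0] - ev["M0"][0] <= threshold:
        return []
    c = ev[best][1]
    return (segment(m, r, s, lo, c, threshold, min_len)
            + [(c, best)]
            + segment(m, r, s, c, hi, threshold, min_len))


# Toy data: p jumps from 0.2 to 0.8 after period 10; q stays at 0.5.
m = [10] * 20
r = [2] * 10 + [8] * 10
s = [1] * 10 + [4] * 10
print(segment(m, r, s))
```

For the toy data, `segment` returns a single split, `[(10, 'M1')]`: the first-stage rate changes, and the extra parameter in M3 is penalised because q is unchanged. Replacing the Beta priors with the model-specific priors of Proposition 3.1 would change only `log_marg` and `seg_terms`, not the segmentation loop.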
Acknowledgements
The first author’s research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07045804).
Appendix
Proof of Proposition 3.1
We only prove parts (b) and (c); the derivations for \(\pi _0\) and \(\pi _3\) follow from arguments similar to those in the proofs of (b) and (c). Recall that the Jeffreys prior for a multi-parameter model with \({\varvec{\theta }}=(\theta _1,\dots ,\theta _k)\) is \(\pi ({\varvec{\theta }}) \propto |I({\varvec{\theta }})|^{1/2}\), where \(I({\varvec{\theta }})\) is the Fisher information matrix for \({\varvec{\theta }}\), whose \((i,j)\)th entry is
\[
I_{ij}({\varvec{\theta }}) = -E\left[ \frac{\partial ^2 \lambda ({\varvec{\theta }})}{\partial \theta _i \, \partial \theta _j} \right], \qquad i,j=1,\dots ,k,
\]
with \(\lambda ({\varvec{\theta }})\) denoting the log-likelihood.
(b) For a fixed value of c, the log-likelihood of model \({{{\mathcal {M}}}}_1\) is
The partial derivatives can be obtained as
Since all the mixed partial derivatives are zero, we obtain the following Fisher information matrix
The Jeffreys prior for model \({{{\mathcal {M}}}}_1\) readily follows from (A1), since the off-diagonal elements are all zero. \(\square \)
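The step from (A1) to the prior uses two facts: vanishing mixed partial derivatives give a diagonal Fisher information matrix, and the determinant of a diagonal matrix is the product of its diagonal entries, so \(|I({\varvec{\theta }})|^{1/2}=\prod _i I_{ii}({\varvec{\theta }})^{1/2}\). The sketch below checks both facts numerically for the simplest no-changepoint case, assuming the Crowder and Sweeting form \(r \sim \mathrm{Bin}(m,p)\), \(s \mid r \sim \mathrm{Bin}(r,q)\); the log-likelihood of \({{{\mathcal {M}}}}_1\) itself is not reproduced here, and `fisher_info` and `jeffreys_density` are hypothetical helper names. Under that assumption \(I_{pp}=m/\{p(1-p)\}\) and \(I_{qq}=mp/\{q(1-q)\}\), so the Jeffreys prior is proportional to \((1-p)^{-1/2}q^{-1/2}(1-q)^{-1/2}\).

```python
from math import comb, sqrt


def fisher_info(m, p, q):
    """Expected Fisher information for the assumed model
    r ~ Bin(m, p), s | r ~ Bin(r, q), by exact enumeration over (r, s)."""
    info = [[0.0, 0.0], [0.0, 0.0]]
    for r in range(m + 1):
        pr = comb(m, r) * p**r * (1 - p) ** (m - r)
        for s in range(r + 1):
            w = pr * comb(r, s) * q**s * (1 - q) ** (r - s)
            # Negative second derivatives of the log-likelihood
            # r log p + (m - r) log(1 - p) + s log q + (r - s) log(1 - q).
            info[0][0] += w * (r / p**2 + (m - r) / (1 - p) ** 2)
            info[1][1] += w * (s / q**2 + (r - s) / (1 - q) ** 2)
            # The mixed partial in (p, q) is identically zero,
            # so the off-diagonal entries stay 0.
    return info


def jeffreys_density(m, p, q):
    """Unnormalised Jeffreys prior |I(p, q)|^{1/2}."""
    info = fisher_info(m, p, q)
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    return sqrt(det)


m, p, q = 5, 0.3, 0.6
info = fisher_info(m, p, q)
# Each line prints an enumerated value next to its closed form;
# the pairs agree up to floating-point rounding.
print(info[0][0], m / (p * (1 - p)))
print(info[1][1], m * p / (q * (1 - q)))
print(jeffreys_density(m, p, q), m / sqrt((1 - p) * q * (1 - q)))
```

Note that a diagonal entry may still depend on other parameters (here \(I_{qq}\) depends on p), so a diagonal Fisher information does not by itself make the Jeffreys prior a product of independent marginal priors.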
(c) Again, for a fixed value of c, the log-likelihood of model \({{{\mathcal {M}}}}_2\) is
Differentiating \(\lambda _2\) with respect to \(\theta _{10}\), \(\theta _{21}\), and \(\theta _{22}\) gives the partial derivatives
As in the proof of part (b), the off-diagonal elements of the Fisher information matrix are all zero, so the result follows from routine algebra. \(\square \)
Cite this article
Kim, S.W., Shahin, S., Ng, H.K.T. et al. Binary segmentation procedures using the bivariate binomial distribution for detecting streakiness in sports data. Comput Stat 36, 1821–1843 (2021). https://doi.org/10.1007/s00180-020-00992-2
DOI: https://doi.org/10.1007/s00180-020-00992-2