- Airoldi, E. M. and M. Bischof (2017). A regularization scheme on word occurrence rates that improves estimation and interpretation of topical content. Journal of the American Statistical Association 111(516), 1382–1403.
- Airoldi, E. M., E. A. Erosheva, S. E. Fienberg, C. Joutard, T. Love, and S. Shringarpure (2010). Reconceptualizing the classification of PNAS articles. In Proceedings of the National Academy of Sciences.
- Akaike, H. (1973). Information theory and the maximum likelihood principle. In 2nd International Symposium on Information Theory.
Angrist, J. D. and A. B. Krueger (2001). Instrumental variables and the search for identification: from supply and demand to natural experiments. Journal of Economic Perspectives 15(4), 69–85.
Antweiler, W. and M. Z. Frank (2004). Is all that talk just noise? The information content of internet stock message boards. Journal of Finance 59(3), 1259–1294.
- Armagan, A., D. B. Dunson, and J. Lee (2013). Generalized double Pareto shrinkage. Statistica Sinica 23(1), 119–143.
Baker, S. R., N. Bloom, and S. J. Davis (2016). Measuring economic policy uncertainty. Quarterly Journal of Economics 131(4), 1593–1636.
Banbura, M., D. Giannone, M. Modugno, and L. Reichlin (2013). Now-casting and the real-time data flow. In Handbook of Economic Forecasting, Volume 2. Elsevier.
Belloni, A., V. Chernozhukov, and C. Hansen (2011). Inference for high-dimensional sparse econometric models. In Advances in Economics & Econometrics: Tenth World Congress.
- Bickel, P. J., Y. Ritov, and A. B. Tsybakov (2009). Simultaneous analysis of lasso and Dantzig selector. Annals of Statistics 37(4), 1705–1732.
- Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press.
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
- Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM 55(4), 77–84.
- Blei, D. M. and J. D. Lafferty (2006). Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning.
- Blei, D. M. and J. D. Lafferty (2007). A correlated topic model of Science. Annals of Applied Statistics 1, 17–35.
- Blei, D. M. and J. D. McAuliffe (2007). Supervised topic models. In Advances in Neural Information Processing Systems.
- Blei, D. M., A. Y. Ng, and M. I. Jordan (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.
- Bollen, J., H. Mao, and X. Zeng (2011). Twitter mood predicts the stock market. Journal of Computational Science 2(1), 1–8.
- Bolukbasi, T., K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems.
Born, B., M. Ehrmann, and M. Fratzscher (2014). Central bank communication on financial stability. Economic Journal 124(577), 701–734.
- Breiman, L. (2001). Random forests. Machine Learning 45(1), 5–32.
- Breiman, L., J. Friedman, R. Olshen, and C. Stone (1984). Classification and Regression Trees. Boca Raton: Chapman & Hall/CRC.
Buehlmaier, M. M. and T. M. Whited (2016). Are financial constraints priced? Evidence from textual analysis. Simon School Working Paper No. FR 14-11.
- Buhlmann, P. and S. van de Geer (2011). Statistics for High-Dimensional Data. Heidelberg: Springer.
- Candes, E. J., M. B. Wakin, and S. P. Boyd (2008). Enhancing sparsity by reweighted L1 minimization. Journal of Fourier Analysis and Applications 14, 877–905.
Carvalho, C. M., N. G. Polson, and J. G. Scott (2010). The horseshoe estimator for sparse signals. Biometrika 97(2), 465–480.
- Chen, D. and C. D. Manning (2014). A fast and accurate dependency parser using neural networks. In Conference on Empirical Methods in Natural Language Processing.
Choi, H. and H. Varian (2012). Predicting the present with Google Trends. Economic Record 88, 2–9.
- Cook, R. D. (2007). Fisher lecture: dimension reduction in regression. Statistical Science 22(1), 1–26.
- Cowles, A. (1933). Can stock market forecasters forecast? Econometrica 1(3), 309–324.
Das, S. R. and M. Y. Chen (2007). Yahoo! for Amazon: sentiment extraction from small talk on the web. Management Science 53(9), 1375–1388.
Deerwester, S., S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407.
- Ginsberg, J., M. H. Mohebbi, R. S. Patel, L. Brammer, M. S. Smolinski, and L. Brilliant (2009). Detecting influenza epidemics using search engine query data. Nature 457, 1012–1014.
- Duchi, J., E. Hazan, and Y. Singer (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12, 2121–2159.
Efron, B. (2004). The estimation of prediction error: covariance penalties and cross-validation. Journal of the American Statistical Association 99(467), 619–632.
- Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani (2004). Least angle regression. Annals of Statistics 32(2), 407–499.
Engelberg, J. E. and C. A. Parsons (2011). The causal impact of media in financial markets. Journal of Finance 66(1), 67–97.
- Evans, J. A. and P. Aceves (2016). Machine translation: Mining text for social theory. Annual Review of Sociology 42, 21–50.
Fan, J. and R. Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456), 1348–1360.
- Fan, J., L. Xue, and H. Zou (2014). Strong oracle optimality of folded concave penalized estimation. Annals of Statistics 42(3), 819–849.
Flynn, C., C. Hurvich, and J. Simonoff (2013). Efficiency for regularization parameter selection in penalized likelihood estimation of misspecified models. Journal of the American Statistical Association 108(503), 1031–1043.
Friedman, J., T. Hastie, and R. Tibshirani (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1–22.
Gentzkow, M. and J. M. Shapiro (2010). What drives media slant? Evidence from U.S. daily newspapers. Econometrica 78(1), 35–72.
Gentzkow, M., J. M. Shapiro, and M. Taddy (2016). Measuring polarization in high-dimensional data: method and application to congressional speech. NBER Working Paper No. 22423.
- George, E. I. and R. E. McCulloch (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association 88(423), 881–889.
- Goldberg, Y. (2016). A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research 57, 345–420.
- Goldberg, Y. and J. Orwant (2013). A dataset of syntactic-ngrams over time from a very large corpus of English books. In Second Joint Conference on Lexical and Computational Semantics (*SEM).
- Goodfellow, I., Y. Bengio, and A. Courville (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org.
Grimmer, J. (2010). A Bayesian hierarchical topic model for political texts: measuring expressed agendas in Senate press releases. Political Analysis 18(1), 1–35.
Grimmer, J. and B. M. Stewart (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21(3), 267–297.
Groseclose, T. and J. Milyo (2005). A measure of media bias. Quarterly Journal of Economics 120(4), 1191–1237.
Hans, C. (2009). Bayesian lasso regression. Biometrika 96(4), 835–845.
Hansen, S., M. McMahon, and A. Prat (2014). Transparency and deliberation within the FOMC: a computational linguistics approach. Centre for Economic Performance Discussion Papers, CEPDP 1276.
- Hastie, T., R. Tibshirani, and J. Friedman (2009). The Elements of Statistical Learning. New York: Springer.
- Hastie, T., R. Tibshirani, and M. Wainwright (2015). Statistical Learning with Sparsity: the Lasso and Generalizations. CRC Press. https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS.pdf.
- Hoberg, G. and G. M. Phillips (2015). Text-based network industries and endogenous product differentiation. Journal of Political Economy 124(5), 1423–1465.
- Hoerl, A. and R. Kennard (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67.
- Hoffman, M. D., D. M. Blei, C. Wang, and J. Paisley (2013). Stochastic variational inference. Journal of Machine Learning Research 14, 1303–1347.
- Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the Twenty-Second Annual International SIGIR Conference.
- Iyyer, M., P. Enns, J. Boyd-Graber, and P. Resnik (2014). Political ideology detection using recursive neural networks. In Proceedings of the Association for Computational Linguistics.
Jegadeesh, N. and D. Wu (2013). Word power: a new approach for content analysis. Journal of Financial Economics 110(3), 712–729.
- Johnson, H. A., M. M. Wagner, W. R. Hogan, W. Chapman, R. T. Olszewski, J. Dowling, and G. Barnas (2004). Analysis of web access logs for surveillance of influenza. Studies in Health Technology and Informatics 107, 1202–1206.
- Johnson, R. and T. Zhang (2015). Semi-supervised convolutional neural networks for text categorization via region embedding. In Advances in Neural Information Processing Systems.
- Jurafsky, D. and J. H. Martin (2009). Speech and Language Processing (2nd ed.). USA: Prentice Hall.
- Kass, R. E. and L. Wasserman (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association 90(431), 928–934.
- Kingma, D. and J. Ba (2014). ADAM: a method for stochastic optimization. In International Conference on Learning Representations (ICLR).
- Lazer, D., R. Kennedy, G. King, and A. Vespignani (2014). The parable of Google Flu: traps in big data analysis. Science 343(6176), 1203–1205.
- Le, Q. V. and T. Mikolov (2014). Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning.
- LeCun, Y., Y. Bengio, and G. Hinton (2015). Deep learning. Nature 521, 436–444.
Li, F. (2010). The information content of forward-looking statements in corporate filings—a naïve Bayesian machine learning approach. Journal of Accounting Research 48(5), 1049–1102.
Loughran, T. and B. McDonald (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. Journal of Finance 66(1), 35–65.
- Lucca, D. O. and F. Trebbi (2011). Measuring central bank communication: an automated approach with application to FOMC statements. University of British Columbia Mimeo.
- Manela, A. and A. Moreira (2015). News implied volatility and disaster concerns. Journal of Financial Economics 123(1), 137–162.
- Manning, C. D., P. Raghavan, and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press.
- Mannion, D. and P. Dixon (1997). Authorship attribution: the case of Oliver Goldsmith. Journal of the Royal Statistical Society, Series D 46(1), 1–18.
- Manski, C. F. (1988). Analog Estimation Methods in Econometrics. New York: Chapman & Hall.
- Mikolov, T., I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.
- Morin, F. and Y. Bengio (2005). Hierarchical probabilistic neural network language model. In Proceedings of the international workshop on artificial intelligence and statistics.
- Mosteller, F. and D. L. Wallace (1963). Inference in an authorship problem. Journal of the American Statistical Association 58(302), 275–309.
- Murphy, K. P. (2012). Machine Learning: a Probabilistic Perspective. USA: MIT Press.
- Ng, A. Y. and M. I. Jordan (2002). On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In Advances in Neural Information Processing Systems.
- Pang, B., L. Lee, and S. Vaithyanathan (2002). Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP).
Park, T. and G. Casella (2008). The Bayesian lasso. Journal of the American Statistical Association 103(482), 681–686.
- Pennington, J., R. Socher, and C. D. Manning (2014). GloVe: global vectors for word representation. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP).
- Polson, N. G. and S. L. Scott (2011). Data augmentation for support vector machines. Bayesian Analysis 6(1), 1–23.
- Porter, M. F. (1980). An algorithm for suffix stripping. Program 14(3), 130–137.
- Pritchard, J. K., M. Stephens, and P. Donnelly (2000). Inference of population structure using multilocus genotype data. Genetics 155(2), 945–959.
Quinn, K. M., B. L. Monroe, M. Colaresi, M. H. Crespin, and D. R. Radev (2010). How to analyze political attention with minimal assumptions and costs. American Journal of Political Science 54(1), 209–228.
- Rabinovich, M. and D. Blei (2014). The inverse regression topic model. In Proceedings of The 31st International Conference on Machine Learning.
Roberts, M. E., B. M. Stewart, D. Tingley, E. M. Airoldi, et al. (2013). The structural topic model and applied social science. In Advances in Neural Information Processing Systems Workshop on Topic Models: Computation, Application, and Evaluation.
- Rumelhart, D., G. Hinton, and R. Williams (1986). Learning representations by back-propagating errors. Nature 323, 533–536.
Saiz, A. and U. Simonsohn (2013). Proxying for unobservable variables with internet document-frequency. Journal of the European Economic Association 11(1), 137–165.
- Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6(2), 461–464.
- Scott, S. and H. Varian (2014). Predicting the present with Bayesian structural time series. International Journal of Mathematical Modeling and Numerical Optimisation 5(1/2), 4–23.
Scott, S. and H. Varian (2015). Bayesian variable selection for nowcasting economic time series. In Economic Analysis of the Digital Economy. University of Chicago Press.
- Srivastava, N., G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, 1929–1958.
Stephens-Davidowitz, S. (2014). The cost of racial animus on a black candidate: evidence using Google search data. Journal of Public Economics 118, 26–40.
Stock, J. H. and F. Trebbi (2003). Retrospectives: who invented instrumental variable regression? Journal of Economic Perspectives 17(3), 177–194.
- Sutskever, I., O. Vinyals, and Q. V. Le (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems.
- Taddy, M. (2012). On estimation and selection for topic models. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS 2012).
- Taddy, M. (2013a). Measuring political sentiment on Twitter: factor optimal design for multinomial inverse regression. Technometrics 55(4), 415–425.
Taddy, M. (2013b). Multinomial inverse regression for text analysis. Journal of the American Statistical Association 108(503), 755–770.
Taddy, M. (2013c). Rejoinder: efficiency and structure in MNIR. Journal of the American Statistical Association 108(503), 772–774.
- Taddy, M. (2015a). Bayesian and empirical Bayesian forests. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015).
- Taddy, M. (2015b). Distributed multinomial regression. Annals of Applied Statistics 9(3), 1394–1414.
- Taddy, M. (2015c). Document classification by inversion of distributed language representations. In Proceedings of The 53rd Meeting of the Association for Computational Linguistics.
- Taddy, M. (2016). One-step estimator paths for concave regularization. Journal of Computational and Graphical Statistics. To appear.
- Taddy, M. (2017). Comment: A regularization scheme on word occurrence rates that improves estimation and interpretation of topical content. Journal of the American Statistical Association. To appear.
Taddy, M., M. Gardner, L. Chen, and D. Draper (2016). Nonparametric Bayesian analysis of heterogeneous treatment effects in digital experimentation. Journal of Business and Economic Statistics. To appear.
Teh, Y. W., M. I. Jordan, M. J. Beal, and D. M. Blei (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association 101(476), 1566–1581.
Tetlock, P. (2007). Giving content to investor sentiment: the role of media in the stock market. Journal of Finance 62(3), 1139–1168.
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58(1), 267–288.
- Vapnik, V. (1996). The Nature of Statistical Learning Theory. New York: Springer.
- Wager, S. and S. Athey (2015). Estimation and inference of heterogeneous treatment effects using random forests. arXiv: 1510.04342.
- Wager, S., T. Hastie, and B. Efron (2014). Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. Journal of Machine Learning Research 15, 1625–1651.
- Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using L1-constrained quadratic programming (lasso). IEEE Transactions on Information Theory 55(5), 2183–2202.
- Wainwright, M. J. and M. I. Jordan (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1(1–2), 1–305.
Wisniewski, T. P. and B. J. Lambe (2013). The role of media in the credit crunch: the case of the banking sector. Journal of Economic Behavior and Organization 85(1), 163–175.
- Wu, Y., M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al. (2016). Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv: 1609.08144.
- Yang, Y., M. J. Wainwright, M. I. Jordan, et al. (2016). On the computational complexity of high-dimensional Bayesian variable selection. Annals of Statistics 44(6), 2497–2532.
- Zeng, X. and M. Wagner (2002). Modeling the effects of epidemics on routinely collected data. Journal of the American Medical Informatics Association 9(6), s17–s22.
- Zhang, X., J. Zhao, and Y. LeCun (2015). Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476), 1418–1429.
Zou, H. and T. Hastie (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B 67(2), 301–320.
- Zou, H., T. Hastie, and R. Tibshirani (2007). On the degrees of freedom of the lasso. Annals of Statistics 35(5), 2173–2192.