Abstract
The body of knowledge related to modeling and simulation (M&S) comes from a variety of constituents: (1) practitioners and users, (2) tool developers, and (3) theorists and methodologists. Previous work has shown that categorizing M&S as a concentration within an existing, broader discipline is inadequate because it does not provide a uniform basis for research and education across all institutions. This article presents an approach for classifying M&S as a scientific discipline, along with a framework for the ensuing analysis. The novelty of the approach lies in applying machine learning classification to documents containing unstructured text (e.g., publications and funding solicitations) from a variety of established and emerging disciplines related to modeling and simulation. We demonstrate that machine learning classification models can be trained to accurately separate M&S from related disciplines using the abstracts in well-indexed research publication repositories. We evaluate the accuracy of the trained classifiers using k-fold cross-validation. We then show that the trained classifiers can effectively identify a set of previously unseen M&S funding solicitations and grant proposals. Finally, we use the approach to uncover new funding trends in M&S and to support a uniform basis for education and research.
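The abstract does not state which classifier or preprocessing the authors used, so the following is only a minimal sketch of the general idea: train a text classifier on labeled abstracts, estimate its accuracy with k-fold cross-validation, then apply it to unseen documents. It uses a multinomial naive Bayes model (one common choice for text categorization) with a small stop list. The names `STOP_WORDS`, `tokenize`, `NaiveBayesText`, `k_fold_accuracy`, and the toy `CORPUS` are all hypothetical illustrations, not the article's code or data.

```python
import math
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list (not the list used in the article).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on",
              "with", "is", "are", "we", "this", "using"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


class NaiveBayesText:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.n_docs = len(docs)
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum of log P(token | c), with add-one smoothing.
            score = math.log(self.doc_counts[c] / self.n_docs)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def k_fold_accuracy(docs, labels, k=5, seed=0):
    """Mean held-out accuracy over k folds; each document is tested exactly once."""
    if not 2 <= k <= len(docs):
        raise ValueError("k must be between 2 and the number of documents")
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = NaiveBayesText().fit([docs[i] for i in train], [labels[i] for i in train])
        correct = sum(model.predict(docs[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


# Hypothetical toy corpus: "MS" = modeling and simulation, "OTHER" = a related field.
CORPUS = [
    ("discrete event simulation model validation of queueing systems", "MS"),
    ("agent based simulation model of traffic flow", "MS"),
    ("verification and validation of simulation models", "MS"),
    ("monte carlo simulation model for supply chain risk", "MS"),
    ("distributed simulation interoperability and model composability", "MS"),
    ("gene expression analysis in cancer cell lines", "OTHER"),
    ("protein folding structure determined by crystallography", "OTHER"),
    ("clinical trial of a new vaccine in children", "OTHER"),
    ("cell signaling pathways in immune response", "OTHER"),
    ("survey of patient outcomes after cardiac surgery", "OTHER"),
]

texts = [text for text, _ in CORPUS]
labels = [label for _, label in CORPUS]

# Estimate accuracy on held-out folds, then train on everything and
# classify a previously unseen document.
print("5-fold accuracy:", k_fold_accuracy(texts, labels, k=5))
model = NaiveBayesText().fit(texts, labels)
print(model.predict("a simulation model of hospital emergency department queues"))
```

Because `fit` and `predict` are the only interface the cross-validation loop relies on, a different classifier (for example, a logistic regression model trained by stochastic gradient descent) could be swapped in without changing the evaluation code. Note that a document containing no words seen during training falls back to the class priors alone.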
Acknowledgments
We gratefully acknowledge our colleagues at the Virginia Modeling, Analysis and Simulation Center (VMASC), the University of Virginia (UVA), and Gettysburg College for manually classifying the 1,000 NSF and NIH grants used in the evaluation.
Cite this article
Gore, R., Diallo, S. & Padilla, J. Classifying modeling and simulation as a scientific discipline. Scientometrics 109, 615–628 (2016). https://doi.org/10.1007/s11192-016-2050-y
DOI: https://doi.org/10.1007/s11192-016-2050-y