survey
Open access

Data Science: A Comprehensive Overview

Published: 29 June 2017

Abstract

The 21st century has ushered in the age of big data and the data economy, in which data DNA, which carries important knowledge, insights, and potential, has become an intrinsic constituent of all data-based organisms. An appropriate understanding of data DNA and its organisms relies on the new field of data science and its keystone, analytics. Although it is widely debated whether big data is only hype and buzz, and although data science is still in a very early phase, significant challenges and opportunities are emerging from, or have been inspired by, the research, innovation, business, profession, and education of data science. This article provides a comprehensive survey and tutorial of the fundamental aspects of data science: the evolution from data analysis to data science, the data science concepts, a big picture of the era of data science, the major challenges and directions in data innovation, the nature of data analytics, new industrialization and service opportunities in the data economy, the profession and competency of data education, and the future of data science. This article is the first in the field to draw a comprehensive big picture, in addition to offering rich observations, lessons, and thinking about data science and analytics.

References

[1]
ACEMS. 2014. The Australian Research Council (ARC) Centre of Excellence for Mathematical and Statistical Frontiers. Retrieved from acems.org.au/.
[2]
Ritu Agarwal and Vasant Dhar. 2014. Editorial-big data, data science, and analytics: The opportunity and challenge for IS research. Information Systems Research 25, 3 (2014), 443--448.
[3]
Xinhua News Agency. 2016. The 13th Five-Year Plan for the National Economic and Social Development of the People’s Republic of China. Retrieved from http://news.xinhuanet.com/politics/2016lh/2016-03/17/c_1118366322.htm.
[4]
AGIMO. 2013. AGIMO Big Data Strategy - Issues Paper. Retrieved from www.finance.gov.au/files/2013/03/Big-Data-Strategy-Issues-Paper1.pdf.
[5]
Paul E. Anderson, James F. Bowring, Rene McCauley, George Pothering, and Christopher W. Starr. 2014. An undergraduate degree in data science: Curriculum and a decade of implementation experience. In Proceedings of the 45th ACM Technical Symposium on Computer Science Education (SIGCSE’14). 145--150.
[6]
ASA. 2015. ASA views on data science. Retrieved from http://magazine.amstat.org/?s=data+science8x=08y=0.
[7]
AU. 1990. Data-matching Program. Retrieved from http://www.comlaw.gov.au/Series/C2004A04095.
[8]
AU. 2010. Declaration of Open Government. Retrieved from http://agimo.gov.au/2010/07/16/declaration-of-open-government/.
[9]
AU. 2013. Attorney-General’s Department. Retrieved from http://www.attorneygeneral.gov.au/Mediareleases/Pages/2013/Seconder/22May2013-AustraliajoinsOpenGovernmentPartnership.aspx.
[10]
AU. 2016. Australia Big Data. Retrieved from http://www.finance.gov.au/big-data/.
[11]
Kayode Ayankoya, André P. Calitz, and Jean Greyling. 2014. Intrinsic relations between data science, big data, business analytics and datafication. ACM International Conference Proceeding Series 28 (2014), 192--198.
[12]
John Bailer, Roger Hoer, David Madigan, Jill Montaquila, and Tommy Wright. 2012. Report of the ASA workgroup on master’s degrees. Retrieved from http://magazine.amstat.org/wp-content/uploads/2013an/masterworkgroup.pdf.
[13]
Ben Baumer. 2015. A data science course for undergraduates: Thinking with data. The American Statistician 69, 4 (2015), 334--342.
[14]
BDL. 2016a. Big Data Landscape. Retrieved from www.bigdatalandscape.com.
[15]
BDL. 2016b. Big Data Landscape 2016 (Version 3.0). Retrieved from http://mattturck.com/2016/02/01/big-data-landscape/.
[16]
Mark A. Beyer and Douglas Laney. 2012. The Importance of ‘Big Data’: A Definition. Retrieved from https://www.gartner.com/doc/2057415 Gartner.
[17]
Anant Bhardwaj, Souvik Bhattacherjee, Amit Chavan, Amol Deshp, Aaron J. Elmore, Samuel Madden, and Aditya Parameswaran. 2015. Datahub: Collaborative data science 8 dataset version management at scale. In CIDR.
[18]
BigML. 2016. BigML. Retrieved from https://bigml.com/.
[19]
Kirk D. Borne, Suzanne Jacoby, Karen Carney, Andy Connolly, Timothy Eastman, M. Jordan Raddick, J. A. Tyson, and John Wallin. 2010. The revolution in astronomy education: Data science for the masses. Retrieved from http://arxiv.org/pdf/0909.3895v1.pdf.
[20]
Sebastien Boyer, Ben U. Gelman, Benjamin Schreck, and Kalyan Veeramachaneni. 2015. Data science foundry for MOOCs. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA’15). 1--10.
[21]
Leo Breiman. 2001. Statistical modeling: The two cultures. Statistical Science 16, 3 (2001), 199--231.
[22]
Gavin Brown. 2009. Review of Education in Mathematics, Data Science and Quantitative Disciplines: Report to the Group of Eight Universities. Retrieved from https://go8.edu.au/publication/go8-review-education-mathematics-data-scie nce-and-quantitative-disciplines.
[23]
Linda Burtch. 2014. The Burtch Works Study: Salaries of Data Scientists. Retrieved from http://www.burtchworks.com/files/2014/07/Burtch-Works-Study_DS_final.pdf.
[24]
Kanyarat Bussaban and Phanu Waraporn. 2015. Preparing undergraduate students majoring in computer science and mathematics with data science perspectives and awareness in the age of big data. In Proceedings of the 7th World Conference on Educational Sciences, Vol. 197. 1443--1446.
[25]
CA. 2016. Canada Capitalizing on Big Data. http://www.sshrc-crsh.gc.ca/news_room-salle_de_presse/latest_news-nouvell es_recentes/big_data_consultation-donnees_massives_consultation-eng.aspx.
[26]
Longbing Cao. 2010a. Domain driven data mining: Challenges and prospects. IEEE Transactions on Knowledge and Data Engineering 22, 6 (2010), 755--769.
[27]
Longbing Cao. 2010b. In-depth behavior understanding and use: The behavior informatics approach. Information Science 180, 17 (2010), 3067--3085.
[28]
Longbing Cao. 2011. Strategic Recommendations on Advanced Data Industry and Services for the Yanhuang Science and Technology Park.
[29]
Longbing Cao. 2014. Non-IIDness learning in behavioral and social data. The Computer Journal 57, 9 (2014), 1358--1370.
[30]
Longbing Cao. 2015a. Coupling learning of complex interactions. Journal of Information Processing and Management 51, 2 (2015), 167--186.
[31]
Longbing Cao. 2015b. Metasynthetic Computing and Engineering of Complex Systems. Springer.
[32]
Longbing Cao. 2016a. Data science and analytics: A new era. International Journal of Data Science and Analytics 1, 1 (2016), 1--2.
[33]
Longbing Cao. 2016b. Data science: Challenges and directions. Technical Report, UTS Advanced Analytics Institute.
[34]
Longbing Cao. 2016c. Data Science: Nature and Pitfalls. Technical Report, UTS Advanced Analytics Institute.
[35]
Longbing Cao. 2016d. Data Science: Profession and Education. Technical Report, UTS Advanced Analytics Institute.
[36]
Longbing Cao. 2017. Understand Data Science (to be published). Springer.
[37]
Longbing Cao and Ruwei Dai. 2008. Open Complex Intelligent Systems. Post Telecom Press.
[38]
Longbing Cao, Ruwei Dai, and Mengchu Zhou. 2009. Metasynthesis: M-space, m-interaction and m-computing for open complex giant systems. IEEE Transactions on Systems, Man, and Cybernetics--Part A 39, 5 (2009), 1007--1021.
[39]
Longbing Cao and Philip S. Yu (Eds). 2012. Behavior Computing: Modeling, Analysis, Mining and Decision. Springer.
[40]
Longbing Cao, Yuming Ou, and Philip S Yu. 2012. Coupled behavior analysis with applications. IEEE Transactions on Knowledge and Data Engineering 24, 8 (2012), 1378--1392.
[41]
Longbing Cao, Philip S. Yu, Chengqi Zhang, and Yanchang Zhao. 2010. Domain Driven Data Mining. Springer.
[42]
Capterra. 2016a. Top Project Management Tools. Retrieved from http://www.capterra.com/project-management-software/.
[43]
Capterra. 2016b. Top Reporting Software Products. Retrieved from http://www.capterra.com/reporting-software/.
[44]
CBDIO. 2016. China Big Data Industrial Observation. Retrieved from www.cbdio.com.
[45]
CCF-BDTF. 2013. China Computer Federation Task Force on Big Data. Retrieved from http://www.bigdataforum.org.cn/.
[46]
John M. Chambers. 1993. Greater or lesser statistics: A choice for future research. Statistics and Computing 3, 4 (1993), 182--184.
[47]
Swami Chandrasekaran. 2013. Becoming a Data Scientist. Retrieved from http://nirvacana.com/thoughts/becoming-a-data-scientist/.
[48]
Hsinchun Chen, Roger H. L. Chiang, and Veda C. Storey. 2012. Business intelligence and analytics: From big data to big impact. MIS Quarterly 36, 4 (2012), 1165--1188.
[49]
China Information Security. 2015. Big Data Strategies and Actions in Major Countries. Retrieved from http://www.cac.gov.cn/2015-07/03/c_1115812491.htm.
[50]
Thomas R. Clancy, Kathryn H. Bowles, Lillee Gelinas, Ida Androwich, Connie Delaney, Susan Matney, Joyce Sensmeier, Judith Warren, John Welton, and Bonnie Westra. 2014. A call to action: Engage in big data science. Nursing Outlook 62, 1 (2014), 64--65.
[51]
Classcentral. 2016. Data Science and Big Data—Free Online Courses. Retrieved from https://www.class-central.com/subject/data-science.
[52]
Kelly Clay. 2013. CES 2013: The Year of The Quantified Self? Retrieved from http://www.forbes.com/sites/kellyclay/2013/01/06/ces-2013-the-year-of-the-quantified-self/♯4cf4d2b55e74.
[53]
William S. Cleveland. 2001. Data science: An action plan for expanding the technical areas of the field of statistics. International Statistical Review 69, 1 (2001), 21--26.
[54]
CMIST. 2016. China Will Establish A Series of National Labs. Retrieved from http://news.sciencenet.cn/htmlnews/2016/4/344404.shtm.
[55]
CNSF. 2015. National Science Foundation China. Retrieved from http://www.nsfc.gov.cn/.
[56]
European Commission. 2014. Commission urges governments to embrace potential of big data. Retrieved from europa.eu/rapid/press-release_IP-14-769_en.htm.
[57]
Coursera. 2016. Coursera. Retrieved from www.coursera.org/data-science.
[58]
Kevin Crowston and Jian Qin. 2011. A capability maturity model for scientific data management: Evidence from the literature. Proceedings of the Association for Information Science and Technology 48, 10 (2011), 1--9.
[59]
CSC. 2012. Big data universe beginning to explode. Retrieved from http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode.
[60]
CSNSTC. 2009. Harnessing the Power of Digital Data for Science and Society: Report of the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council. Retrieved from https://www.nitrd.gov/About/Harnessing_Power_Web.pdf.
[61]
DABS. 2016. Data Analytics Book Series. Retrieved from http://www.springer.com/series/15063.
[62]
DARPA. 2016. DARPA Xdata program. Retrieved from www.darpa.mil/program/xdata.
[63]
Data61. 2016. Data61. Retrieved from https://www.data61.csiro.au/.
[64]
DataRobot. 2016. DataRobot. Retrieved from https://www.datarobot.com/.
[65]
Datasciences.org. 2005. Homepage. Retrieved from www.datasciences.org.
[66]
Thomas H. Davenport and D. J. Patil. 2012. Data scientist: The sexiest job of the 21st century. Harvard Business Review (2012), 70--76.
[67]
Jessica Davis. 2016. 10 Programming Languages And Tools Data Scientists Used. Retrieved from http://www.informationweek.com/devops/programming-languages/10-programming-languages-and-tools-data-scientists-use-now/d/d-id/1326034.
[68]
Devendra Desale. 2015. Top 30 Social Network Analysis and Visualization Tools. Retrieved from http://www.kdnuggets.com/2015/06/top-30-social-network-analysis-visualization-tools.html.
[69]
Vasant Dhar. 2013. Data science and prediction. Communications of the ACM 56, 12 (2013), 64--73.
[70]
Herman A. Dierick and Fabrizio Gabbiani. 2015. Drosophila neurobiology: No escape from ‘Big Data’ science. Current Biology 25, 14 (2015), 606--608.
[71]
Peter J. Diggle. 2015. Statistics: A data science for the 21st century. Journal of the Royal Statistical Society: Series A (Statistics in Society) 178, 4 (2015), 793--813.
[72]
David Donoho. 2015. 50 years of Data Science. Retrieved from http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf.
[73]
Bonnie J. Dorr, Craig S. Greenberg, Peter Fontana, Mark A. Przybocki, Marion Le Bras, Cathryn A. Ploehn, Oleg Aulov, Martial Michel, E. Jim Golden, and Wo Chang. 2015. The NIST data science initiative. In Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA’15). 1--10.
[74]
DSA. 2016. Data Science Association. Retrieved from http://www.datascienceassn.org/.
[75]
DSAA. 2014. IEEE/ACM/ASA International Conference on Data Science and Advanced Analytics. Retrieved from www.dsaa.co.
[76]
DSC. 2016a. College 8 University Data Science Degrees. Retrieved from http://datascience.community/colleges.
[77]
DSC. 2016b. The Data Science Community. Retrieved from http://datasciencebe.com/.
[78]
DSCentral. 2016. Data Science Central. Retrieved from http://www.datasciencecentral.com/.
[79]
DSE. 2015. Data Science and Engineering. Retrieved from http://link.springer.com/journal/41019.
[80]
DSJ. 2014. Data Science Journal.Retrieved from datascience.codata.org.
[81]
DSKD. 2007. Data Science and Knowledge Discovery Lab, UTS. Retrieved from http://www.uts.edu.au/research-and-teaching/our-research/quantum-computation-and-intelligent-systems/data-sciences-and.
[82]
David Ewing Duncan. 2009. Experimental Man: What One Man’s Body Reveals about His Future, Your Health, and Our Toxic World. Wiley 8 Sons, New York.
[83]
Edx. 2016. EDX Courses. Retrieved from https://www.edx.org/course?search_query=data+science.
[84]
EMC. 2011. Data science revealed: A data-driven glimpse into the burgeoning new field. Retrieved from www.emc.com/collateral/about/news/emc-data-science-study-wp.pdf.
[85]
EPJDS. 2012. EPJ Data Science. Retrieved from http://epjdatascience.springeropen.com/.
[86]
EU. 2014. EU Towards a Thriving Data-Driven Economy. Retrieved from https://ec.europa.eu/digital-single-market/en/towards-thriving-data-driven-economy.
[87]
EU-DSA. 2016. The European Data Science Academy. Retrieved from edsa-project.eu.
[88]
EU-OD. 2016. The European Union Open Data Portal. Retrieved from https://open-data.europa.eu/.
[89]
Facebook. 2016. Facebook Data. Retrieved from https://www.facebook.com/careers/teams/data/.
[90]
James H. Faghmous and Vipin Kumar. 2014. A big data guide to understanding climate change: The case for theory-guided data science. Big Data 2, 3 (2014), 155--163.
[91]
Joshua Fairfielda and Hannah Shteina. 2014. Big data, big problems: Emerging issues in the ethics of data science and journalism. Journal of Mass Media Ethics 29, 1 (2014), 38--51.
[92]
Jack Faris, Evelyne Kolker, Alex Szalay, Leon Bradlow, Ewa Deelman, Wu Feng, Judy Qiu, Donna Russell, Elizabeth Stewart, and Eugene Kolker. 2011. Communication and data-intensive science in the beginning of the 21st century. A Journal of Integrative Biology 15, 4 (2011), 213--215.
[93]
Tom Fawcett. 2016. Mining the quantified self: Personal knowledge discovery as a challenge for data science. Big Data 3, 4 (2016), 249--266.
[94]
Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. 1996. From data mining to knowledge discovery in databases. AI Magazine 17, 3 (1996), 37--54.
[95]
William Finzer. 2013. The data science education dilemma. Technology Innovations in Statistics Education 7, 2 (2013).
[96]
Geoffrey Fox, Siddharth Maini, Howard Rosenbaum, and David J. Wild. 2015. Data science and online education. In Proceedings of the 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom’15). 582--587.
[97]
Peter Fox and James Hendler. 2014. The science of data science. Big Data 2, 2 (2014), 68--70.
[98]
Molly Galetto. 2016. Top 50 Data Science Resources. Retrieved from http://www.ngdata.com/top-data-science-resources/?.
[99]
GEO. 2016. Gene Expression Omnibus. Retrieved from http://www.ncbi.nlm.nih.gov/geo/.
[100]
Deepak Ghodke. 2015. Bye Bye 2015: What lies ahead for BI. Retrieved from http://www.ciol.com/bye-bye-2015-what-lies-ahead-for-bi/.
[101]
Github. 2016a. Data Science Colleges. Retrieved from https://github.com/ryanswanstrom/awesome-datascience-colleges.
[102]
Github. 2016b. List of Recommender Systems. Retrieved from https://github.com/grahamjenson/list_of_recommender_systems.
[103]
Michael Gold, Ryan McClarren, and Conor Gaughan. 2013. The lessons Oscar taught us: Data science and media 8 entertainment. Big Data 1, 2 (2013), 105--109.
[104]
Google. 2016a. Google Bigquery and Cloud Platform. Retrieved from https://cloud.google.com/bigquery/.
[105]
Google. 2016b. Google Cloud Prediction API. Retrieved from https://cloud.google.com/prediction/docs/.
[106]
Google. 2016c. Google Online Open Education. Retrieved from https://www.google.com/edu/openonline/.
[107]
Google. 2016d. Google Trends. (2016). https://www.google.com.au/trends/explore#q=datalyticsz=Etc Retrieved on 14 Novermber 2016.
[108]
Google. 2016e. Open Mobile Data. Retrieved from https://console.developers.google.com/storage/browser/openmobiledata_public/.
[109]
Beijing Municipal Government. 2016. Beijing Big Data and Cloud Computing Development Action Plan. Retrieved from http://zhengwu.beijing.gov.cn/gh/dt/t1445533.htm.
[110]
China Government. 2015. China Big Data. Retrieved from http://www.gov.cn/zhengce/content/2015-09/05/content_10137.htm.
[111]
Matthew J. Graham. 2012. The art of data science. In Astrostatistics and Data Mining,Springer Series in Astrostatistics, Vol. 2. 47--59.
[112]
Jim Gray. 2007. eScience—A Transformed Scientific Method. Retrieved from http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt.
[113]
GTD. 2016. Global Terrorism Database. Retrieved from https://www.start.umd.edu/gtd/.
[114]
Akash Gupta, Ahmet Cecen, Sharad Goyal, Amarendra K. Singh, and Surya R. Kalidindi. 2015. Structure-property linkages using a data science approach: Application to a non-metallic inclusion/steel composite system. Acta Materialia 91 (2015), 239--254.
[115]
David J. Hand. 2015. Statistics and computing: The genesis of data science. Statistics and Computing 25, 4 (2015), 705--711.
[116]
Hardin. 2016. Github. Retrieved from hardin47.github.io/DataSciStatsMaterials/.
[117]
Johanna Hardin, Roger Hoerl, Nicholas J. Horton, and Deborah Nolan. 2015. Data science in statistics curricula: Preparing students to “Think with Data”. The American Statistician 69, 4 (2015), 343--353.
[118]
Harlan Harris, Sean Murphy, and Marck Vaisman. 2013. Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work. O’Reilly Media.
[119]
Benjamin T. Hazena, Christopher A. Booneb, Jeremy D. Ezellc, and L. Allison Jones-Farmer. 2014. Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. International Journal of Production Economics 154 (2014), 72--80.
[120]
Tony Hey, Stewart Tansley, and Kristin Tolle (Eds.). 2009. The Fourth Paradigm: Data-Intensive Scientific Discovery. Retrieved from http://research.microsoft.com/en-us/collaboration/fourthparadigm/.
[121]
Tony Hey and Anne Trefethen. 2003. The Data Deluge: An e-Science Perspective. John Wiley 8 Sons, Ltd, 809--824.
[122]
HLSG. 2010. Final report of the high level expert group on scientific data. http://ec.europa.eu/information_society/newsroom/cf/document.cfm?action=display8doc_id=707.
[123]
HLSG. 2014. An RDA Europe Report. Retrieved from http://www.e-nformation.ro/wp-content/uploads/2014/12/TheDataHarvestReport_-Final.pdf.
[124]
Horizon. 2014. European Commission Horizon 2020 Big Data Private Public Partnership. Retrieved from http://ec.europa.eu/programmes/horizon2020/en/h2020-section/information-and-communication-technologies.
[125]
Peter J. Huber. 2011. Data Analysis: What Can Be Learned From the Past 50 Years. John Wiley 8 Sons.
[126]
IASC. 1977. International Association for Statistical Computing. (1977). http://www.iasc-isi.org/.
[127]
IBM. 2010. Capitalizing on Complexity. Retrieved from http://www-935.ibm.com/services/us/ceo/ceostudy2010/multimedia.html.
[128]
IBM. 2016a. IBM Analytics and Big Data. Retrieved from http://www.ibm.com/analytics/us/en/orhttp://www-01.ibm.com/software/data/bigdata/.
[129]
IBM. 2016b. What is a Data Scientist? Retrieved from http://www-01.ibm.com/software/data/infosphere/data-scientist/.
[130]
IDA. 2014. International Institute of Data 8 Analytics. Retrieved from www.datasciences.org.
[131]
IEEEBD. 2014. IEEE Big Data Initiative. (2014). http://bigdata.ieee.org/.
[132]
IFSC-96. 1996. Data Science, Classification, and Related Methods. Retrieved from http://d-nb.info/955715512/04.
[133]
IJDS. 2016. International Journal of Data Science. (2016). http://www.inderscience.com/jhome.php?jcode=ijds.
[134]
IJRDS. 2017. International Journal of Research on Data Science. Retrieved from http://www.sciencepublishinggroup.com/journal/index?journalid=310.
[135]
INFORMS. 2014. Candidate Handbook. Retrieved from https://www.informs.org/Certification-Continuing-Ed/Analytics-Certificati on/Candidate-Handbook.
[136]
INFORMS. 2016. Institute for Operations Research and the Management Sciences. Retrieved from https://www.informs.org/.
[137]
Shuichi Iwata. 2008. Scientific “agenda” of data science. Data Science Journal 7, 5 (2008), 54--56.
[138]
H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis Papakonstantinou, Jignesh M. Patel, Raghu Ramakrishnan, and Cyrus Shahabi. 2014. Big data and its technical challenges. Communications of the ACM 57, 7 (2014), 86--94.
[139]
H. V. Jagadish. 2015. Big data and science: Myths and reality. Big Data Research 2, 2 (2015), 49--52.
[140]
JDS. 2002. Journal of Data Science. Retrieved from http://www.jds-online.com/.
[141]
JDSA. 2015. International Journal of Data Science and Analytics (JDSA). Retrieved from http://www.springer.com/41060.
[142]
JFDS. 2016. The Journal of Finance and Data Science. Retrieved from http://www.keaipublishing.com/en/journals/the-journal-of-finance-and-data-science/.
[143]
Kaggle. 2016. Kaggle Competition Data. Retrieved from https://www.kaggle.com/competitions.
[144]
Surya R. Kalidindi. 2015. Data science and cyberinfrastructure: Critical enablers for accelerated development of hierarchical materials. International Materials Reviews 60, 3 (2015), 150--168.
[145]
KDD89. 1989. IJCAI-89 Workshop on Knowledge Discovery in Databases. Retrieved from http://www.kdnuggets.com/meetings/kdd89/index.html.
[146]
KDnuggets. 2015. Visualization Software. Retrieved from http://www.kdnuggets.com/software/visualization.html.
[147]
Kdnuggets. 2016. Kdnuggets. Retrieved from http://www.kdnuggets.com/.
[148]
K Kelly. 2012. The quantified century. In Quantified Self Conference. Retrieved from http://quantifiedself.com/conference/Palo-Alto-2012.
[149]
Nawsher Khan, Ibrar Yaqoob, Ibrahim Abaker Targio Hashem, and et al. 2014. Big data: Survey, technologies, opportunities, and challenges. The Scientific World Journal 2014 (2014), 18.
[150]
John King and Roger Magoulas. 2015. 2015 Data Science Salary Survey. Retrieved from http://duu86o6n09pv.cloudfront.net/reports/2015-data-science-salary-survey.pdf.
[151]
Ron Kohavi, Neal J. Rothleder, and Evangelos Simoudis. 2002. Emerging trends in business analytics. Communications of the ACM 45, 8 (2002), 45--48.
[152]
AMP Lab. 2016. MLBase. Retrieved from http://mlbase.org/.
[153]
Alexandros Labrinidis and H. V. Jagadish. 2012. Challenges and opportunities with big data. Proceedings of the VLDB Endowment 5, 12 (2012), 2032--2033.
[154]
Douglas Laney. 2001. 3D Data Management: Controlling Data Volume, Velocity and Variety. Technical Report, META Group.
[155]
David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. The parable of Google flu: Traps in big data analysis. Science 343 (2014), 1203--1205.
[156]
LDC. 2016. Linguistic Data Consortium. Retrieved from https://www.ldc.upenn.edu/about.
[157]
LinkedIn. 2016. LinkedIn Jobs. Retrieved from https://www.linkedin.com/jobs/data-scientist-jobs.
[158]
Mike Loukides. 2011. The Evolution of Data Products. O’Reilly, Cambridge.
[159]
Mike Loukides. 2012. What is Data Science? O’Reilly Media, Sebastopol, CA. http://radar.oreilly.com/2010/06/what-is-data-science.htmldata-scientists.
[160]
Andrea Manieri, Steve Brewer, Ruben Riestra, Yuri Demchenko, Matthias Hemmje, Tomasz Wiktorski, Tiziana Ferrari, and Jrmy Frey. 2015. Data science professional uncovered: How the EDISON project will contribute to a widely accepted profile for data scientists. In Proceedings of the 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom’15). 588--593.
[161]
Kate Matsudaira. 2015. The science of managing data science. Communications of the ACM 58, 6 (2015), 44--47.
[162]
McKinsey. 2011. Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.
[163]
Claire Cain Miller. 2013. Data science: The numbers of our lives. New York Times Retrieved from http://www.nytimes.com/2013/04/14/education/edlife/universities-offer-courses-in-a-hot-new-field-data-science.html?pagewanted=all8_r=0.
[164]
Arthur John Havart Morrell (Ed.). 1968. Information processing. In Proceedings of IFIP Congress 1968. Edinburgh, UK.
[165]
Peter Murray-Rust. 2007. Data-driven science: A scientist’s view. In NSF/JISC 2007 Digital Repositories Workshop. http://www.sis.pitt.edu/repwkshop/papers/murray.pdf.
[166]
Peter Naur. 1968. ‘Datalogy’, the science of data and data processes. In Proceedings of IFIP Congress 1968, 1383--1387.
[167]
Peter Naur. 1974. Concise Survey of Computer Methods. Studentlitteratur, Lund, Sweden.
[168]
NCSU. 2007a. Institute for Advanced Analytics, North Carolina State University. Retrieved from http://analytics.ncsu.edu/.
[169]
NCSU. 2007b. Master of Science in Analytics, Institute for Advanced Analytics, North Carolina State University. Retrieved from http://analytics.ncsu.edu/.
[170]
Michael L. Nelson. 2009. Data-driven science: A new paradigm? EDUCAUSE Review 44, 4 (2009), 6--7.
[171]
NICTA. 2016. National ICT Australia. Retrieved from https://www.nicta.com.au/.
[172]
NIST. 2015. NIST Text Retrieval Conference Data. Retrieved from http://trec.nist.gov/data.html.
[173]
NSB. 2005. Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. Retrieved from http://www.nsf.gov/pubs/2005/nsb0540/.
[174]
NSF. 2007. US NSF07-28. Retrieved from http://www.nsf.gov/pubs/2007/nsf0728/nsf0728.pdf.
[175]
OECD. 2007. OECD Principles and Guidelines for Access to Research Data from Public Funding. Retrieved from https://www.oecd.org/sti/sci-tech/38500813.pdf.
[176]
OPENedX. 2016. OPENedX Online Education Platform. Retrieved from https://open.edx.org/.
[177]
Tim O’Reilly. 2005. What is Web 2.0. Retrieved from http://oreilly.com/pub/a/web2/archive/what-is-web-20.html?page=3.
[178]
D. J. Patil. 2011. Building Data Science Teams. O’Reilly Media.
[179]
Mark C. Paulk, Bill Curtis, Mary Beth Chrissis, and Charles V. Weber. 1993. Capability maturity model version 1.1. IEEE Software 10, 4 (1993), 18--27.
[180]
Gil Press. 2013. A Very Short History of Data Science. Retrieved from http://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/61ae3ebb69fd.
[181]
Xuesen Qian. 1991. Revisiting issues on open complex giant systems. International Journal of Pattern Recognition and Artificial Intelligence 4, 1 (1991), 5--8.
[182]
Xuesen Qian, Jingyuan Yu, and Ruwei Dai. 1993. A new discipline of science—The study of open complex giant system and its methodology. Chinese Journal of Systems Engineering 8 Electronics. 4, 2 (1993), 2--12.
[183]
RapidMiner. 2016. RapidMiner. (2016). https://rapidminer.com/.
[184]
Samantha Renae. 2011. Data analytics: Crunching the future. Bloomberg Businessweek (2011). September 8.
[185]
Solutions Review. 2016. Data Integration and Application Integration Solutions Directory. Retrieved from http://solutionsreview.com/data-integration/data-integration-solutions-directory/.
[186]
C. Rudin, D. Dunson, R. Irizarry, H. Ji, E. Laber, J. Leek, T. McCormick, Sherri Rose, C. Schafer, M. van der Laan, L. Wasserman, and L. Xue. 2014. Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society. Retrieved from http://www.amstat.org/policy/pdfs/BigDataStatisticsJune2014.pdf American Statistical Association.
[187]
SAS. 2013. Big Data Analytics: An Assessment of Demand for Labour and Skills, 2012-2017. Retrieved from https://www.thetechpartnership.com/globalassets/pdfs/research-2014/bigdata_report_nov14.pdf Report. SAS/The Tech Partnership.
[188]
SAS. 2016. SAS Retrieved from http://www.sas.com/en_us/insights.html.
[189]
Tobias Schoenherr and Cheri Speier-Pero. 2015. Data science, predictive analytics, and big data in supply chain management: Current state and future potential. Journal of Business Logistics 36, 1 (2015), 120--132.
[190]
SIAM. 2016. SIAM career center. (2016). http://jobs.siam.org/home/.
[191]
Christoph Siart, Simon Kopp, and Jochen Apel. 2015. The interface between data science, research assessment and science support—Highlights from the German perspective and examples from Heidelberg University. In Proceedings of the 2015 IIAI 4th International Congress on Advanced Applied Informatics (IIAI-AAI’15). 472--476.
[192]
Silk. 2016. Data Science University Programs. Retrieved from http://data-science-university-programs.silk.co/.
[193]
Larry Smarr. 2012. Quantifying your body: A how-to guide from a systems biology perspective. Biotechnology Journal 7, 8 (2012), 980--991.
[194]
F. Jack Smith. 2006. Data science as an academic discipline. Data Science Journal 5 (2006), 163--164.
[195]
SSDS. 2015. Springer Series in the Data Sciences. Retrieved from http://www.springer.com/series/13852.
[196]
Stanford. 2014. Stanford Data Science Initiatives, Stanford University. Retrieved from https://sdsi.stanford.edu/.
[197]
Thomas R. Stewart and Claude McMillan, Jr. 1987. Descriptive and prescriptive models for judgment and decision making: Implications for knowledge engineering. In Expert Judgment and Expert Systems, Jeryl L. Mumpower, Ortwin Renn, Lawrence D. Phillips, and V. R. R. Uppuluri (Eds.). Springer-Verlag, London, 305--320.
[198]
Michael Stonebraker, Sam Madden, and Pradeep Dubey. 2013. Intel ‘big data’ science and technology center vision and execution plan. SIGMOD Record 42, 1 (2013), 44--49.
[199]
Alma Swan and Sheridan Brown. 2008. The skills, role career structure of data scientists curators: Assessment of current practice future needs. (2008). Technical Report. University of Southampton.
[200]
Melanie Swan. 2013. The quantified self: Fundamental disruption in big data science and biological discovery. Big Data 1, 2 (2013), 85--99.
[201]
Technavio. 2016. Top 10 Healthcare Data Analytics Companies. Retrieved from http://www.technavio.com/blog/top-10-healthcare-data-analytics-companies.
[202]
TFDSAA. 2013. IEEE Task Force on Data Science and Advanced Analytics. Retrieved from http://dsaatf.dsaa.co/.
[203]
TOBD. 2015. IEEE Transactions on Big Data. Retrieved from https://www.computer.org/web/tbd.
[204]
Predictive Analytics Today. 2016. 29 Data Preparation Tools and Platforms. Retrieved from http://www.predictiveanalyticstoday.com/data-preparation-tools-and-platforms/.
[205]
John W. Tukey. 1962. The future of data analysis. The Annals of Mathematical Statistics 33, 1 (1962), 1--67.
[206]
John W. Tukey. 1977. Exploratory Data Analysis. Pearson.
[207]
Tutiempo. 2016. Global Climate Data. Retrieved from http://en.tutiempo.net/climate.
[208]
UCI. 2016. UCI Machine Learning Repository. Retrieved from archive.ics.uci.edu/ml/.
[209]
Udacity. 2016. Udacity Courses. Retrieved from https://www.udacity.com/courses/data-science.
[210]
Udemy. 2016. Udemy Courses. Retrieved from https://www.udemy.com/courses/search/?ref=home8src=ukw8q=data+science8lang=en.
[211]
UK. 2016. UK Big Data. Retrieved from http://www.rcuk.ac.uk/research/infrastructure/big-data/.
[212]
UK-HM. 2012. UK HM Government. Retrieved from http://data.gov.uk/sites/default/files/Open_data_White_Paper.pdf.
[213]
UK-OD. 2016. UK Open Data. Retrieved from http://data.gov.uk/.
[214]
UMichi. 2015. Michigan Institute For Data Science, University of Michigan. Retrieved from http://midas.umich.edu/.
[215]
UN. 2010. United Nations Global Pulse Projects. Retrieved from http://www.unglobalpulse.org/.
[216]
US-OD. 2016. US Government Open Data. Retrieved from https://www.data.gov/.
[217]
USD2D. 2016. US National Consortium for Data Science. Retrieved from data2discovery.org.
[218]
USDSC. 2016. US Degree Programs in Analytics and Data Science. Retrieved from http://analytics.ncsu.edu/?page_id=4184.
[219]
USNSF. 2012. US Big Data Research Initiative. Retrieved from http://www.nsf.gov/cise/news/bigdata.jsp.
[220]
UTS. 2011. Master of Analytics (Research) and Doctor of Philosophy Thesis: Analytics, Advanced Analytics Institute, University of Technology Sydney. Retrieved from http://www.uts.edu.au/research-and-teaching/our-research/advanced-analytics-institute/education-and-research-opportuniti-1.
[221]
UTSAAI. 2011. Advanced Analytics Institute, University of Technology Sydney. Retrieved from https://analytics.uts.edu.au/.
[222]
David van Dyk, Montse Fuentes, Michael I. Jordan, Michael Newton, Bonnie K. Ray, Duncan Temple Lang, and Hadley Wickham. 2015. ASA Statement on the Role of Statistics in Data Science. Retrieved from http://magazine.amstat.org/blog/2015/10/01/asa-statement-on-the-role-of-statistics-in-data-science/.
[223]
Vast. 2016. Visual Analytics Community. Retrieved from http://vacommunity.org/HomePage.
[224]
Dan Vesset, Benjamin Woo, Henry D. Morris, Richard L. Villars, Gard Little, Jean S. Bozman, Lucinda Borovick, Carl W. Olofson, Susan Feldman, Steve Conway, Matthew Eastwood, and Natalya Yezhkova. 2012. Worldwide Big Data Technology and Services 2012-2015 Forecast. IDC.
[225]
Ana Viseu and Lucy Suchman. 2010. Wearable Augmentations: Imaginaries of the Informed Body. Berghahn Books, New York, 161--184.
[226]
Whitehouse. 2015. The White House Names Dr. D. J. Patil as the First U.S. Chief Data Scientist. Retrieved from https://www.whitehouse.gov/blog/2015/02/18/white-house-names-dr-dj-patil-first-us-chief-data-scientist.
[227]
Wikipedia. 2016a. Comparison of Cluster Software. Retrieved from https://en.wikipedia.org/wiki/Comparison_of_cluster_software.
[228]
Wikipedia. 2016b. Informatics. Retrieved from https://en.wikipedia.org/wiki/Informatics.
[229]
Wikipedia. 2016c. List of Reporting Software. Retrieved from https://en.wikipedia.org/wiki/List_of_reporting_software.
[230]
WIRED. 2014. How Europe can Seize the Starring Role in Big Data. Retrieved from www.wired.com/insights/2014/09/europe-big-data/.
[231]
Gary Wolf. 2012. The data-driven life. New York Times. Retrieved from www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html.
[232]
Jeff Wu. 1997. Statistics = Data Science? Retrieved from http://www2.isye.gatech.edu/~jeffwu/presentations/datascience.pdf.
[233]
Yahoo. 2016. Yahoo Finance. Retrieved from finance.yahoo.com.
[234]
Nathan Yau. 2009. Rise of the Data Scientist. Retrieved from http://flowingdata.com/2009/06/04/rise-of-the-data-scientist/.
[235]
Chris Yiu. 2012. The Big Data Opportunity. Retrieved from http://www.policyexchange.org.uk/images/publications/thepportunity.pdf.
[236]
Bin Yu. 2014. IMS presidential address: Let us own data science. IMS Bulletin Online (Oct. 1, 2014).

    Published In

    ACM Computing Surveys, Volume 50, Issue 3
    May 2018
    550 pages
    ISSN: 0360-0300
    EISSN: 1557-7341
    DOI: 10.1145/3101309
    Editor: Sartaj Sahni
    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 June 2017
    Accepted: 01 April 2017
    Revised: 01 November 2016
    Received: 01 June 2016
    Published in CSUR Volume 50, Issue 3

    Author Tags

    1. Big data
    2. advanced analytics
    3. big data analytics
    4. computing
    5. data DNA
    6. data analysis
    7. data analytics
    8. data economy
    9. data education
    10. data engineering
    11. data industry
    12. data innovation
    13. data profession
    14. data science
    15. data scientist
    16. data service
    17. informatics
    18. statistics

    Qualifiers

    • Survey
    • Research
    • Refereed

    Funding Sources

    • Australian Research Council Discovery Grant
