[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Wilkinson's Tests and SQL Packages

Published: 20 December 2019 Publication History

Abstract

Wilkinson's Tests are used to benchmark the accuracy of some statistical functions in six SQL packages: Apache Hive, Microsoft Access, Microsoft SQL Server, MySQL, Oracle 11g SQL, and SAP Hana. Using the best choice of data type, we find that different packages use different rounding schemes, two packages use unreliable algorithms to compute the sample variance, one package returns the population standard deviation when the sample standard deviation is called, and one package has an unstable algorithm for computing the correlation coefficient. Using the wrong data type all but guarantees inaccurate results.

References

[1]
Michael Alexander. Microsoft Access 2007 Data Analysis. Wiley, 2007.
[2]
anonymous. https://social.msdn.micro soft.com/forums/sqlserver/en-US/9daa1b 60-d11c-421b-8b87-e38a299e372c/roundi ng-off-issue-during-oracle-to-sql-ser ver-migration, 2013. Accessed: 2019-07--15.
[3]
U. Bankhofer and A. Hilbert. Statistical software packages for windows: A market survey. Statistical Papers, 38:393--407, 1997.
[4]
Joe Celko. SQL for Smarties: Advanced SQL Programming, 5e. Morgan Kaufman, 2015.
[5]
T. F. Chan, Golub G. H. and R. J. Leveque. Algorithms for computing the sample variance: Analysis and recommendations. American Statistician, 37:242--247, 1983.
[6]
Hwan-sik Choi and Nicholas Kiefer. Software evaluation: Easyreg international. International Journal of Forecasting, 21(3):609--616, 2005.
[7]
Marin Fotache and Catalin Strimbel. Sql and data analysis. some implications for data analysis and higher education. Procedia Economics and Finance, 20:243--251, 2015.
[8]
Kellie Keeling and Robert Pavur. Statistical accuracy of spreadsheet software. The American Statistician, 65(4):265--273, 2011.
[9]
R. F. Ling. Comparison of several algorithms for computing sample means and variances. Journal of the American Statistical Association, 69:859866, 1974.
[10]
Gordon S. Linoff. Data analysis using SQL and Excel. Wiley, 2015.
[11]
Richard G. Lomax. Statistical accuracy of ipad applications: An initial examination. The American Statistician, 67(2):105--108, 2009.
[12]
Wei Lu, Jiajia HouYing, YanMeihui, Zhang Xiaoyong, and Thomas Moscibroda. Msql: efficient similarity search in metric spaces using sql. The VLDB Journal, 26(3):829--854, 2017.
[13]
B. D. McCullough. Wilkinson's tests and econometric software. Journal of Economic and Social Measurement, 29(1--3):261--270, 2004.
[14]
B. D. McCullough and A. Talha Yalta. Spreadsheets in the cloud -- not ready yet. Journal of Statistical Software, 52(7):1--14, 2013.
[15]
Kamat Niranjan and Arnab Nandi. A closer look at variance implementations in modern database systems. SIGMOD Record, 45(4):28--33, 2016.
[16]
C. Ordonez and S. K. Pitchaimalai. Bayesian classifiers programmed in sql. IEEE Transactions on Knowledge and Data Engineering, 22(1):139--144, 2010.
[17]
William R. Pearson and Aaron J. Mackey. Using sql databases for sequence similarity searching and analysis. Current Protocols in Bioinformatics, 59(1):1--22, 2017.
[18]
G. Sawitzki. Report on the numerical reliability of data analysis systems. Computational Statistics and Data Analysis, 18(2):289--301, 1994.
[19]
Charles Severance. Elizabeth Fong: Creating the SQL Database Standards. Computer, 47(8):7--8, 2014.
[20]
Robert P. Trueblood and Jr. John N. Lovett. Data mining and statistical analysis using SQL. Apress, 2001.
[21]
Leland Wilkinson. Statistics Quiz. Systat Inc., 1985. http://web.stanford.edu/~clint/ bench/wilk.txt.
[22]
Leland Wilkinson. Practical guidelines for testing statistical software. In P. Dirschedl and R. Ostermann, editors, Computational Statistics. Physica-Verlag, Heidelberg, 1994.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 48, Issue 3
September 2019
39 pages
ISSN:0163-5808
DOI:10.1145/3377391
Issue’s Table of Contents
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 December 2019
Published in SIGMOD Volume 48, Issue 3

Check for updates

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 96
    Total Downloads
  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)0
Reflects downloads up to 19 Dec 2024

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media