Purity: a New Dimension for Measuring Data Centralization Quality
Abstract
1 Introduction
1.1 Problem Statement
1.2 Contribution
2 Background
2.1 Data Centralization on Cloud
2.2 Existing Data Quality Dimensions
3 Related Work
4 Methodology for Measuring the Purity and the Quality of the Data
5 Data Purity Dimensions
5.1 Degree Centrality
5.2 Betweenness Centrality
5.3 Closeness Centrality
5.4 Centralized Quality
6 Mathematical Computing for Purity Dimension
6.1 Centrality Formulas
6.2 Centralized Quality Formulas
6.2.1 Timeliness and Uniqueness
6.2.2 Accuracy, Completeness, Consistency, and Validity
7 Validation Scenario
7.1 Mobility Use Case
# | Dataset | D | B | C |
1 | CAM data | 0.25 | 0 | 0 |
2 | DENM data | 0.25 | 0 | 0 |
3 | DATEX data | 0.12 | 0 | 0 |
4 | Meteo data | 0.38 | 0 | 0 |
5 | CAM meteo | 0.25 | 0 | 0.25 |
6 | DENM meteo | 0.25 | 0 | 0.25 |
7 | CAM Datex | 0.38 | 0.07 | 0.25 |
8 | ITSC data | 0.38 | 0.07 | 0.33 |
9 | ITSC meteo | 0.25 | 0 | 0.38 |
D: Degree, B: Betweenness, C: Closeness
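The D column is consistent with the standard normalized degree centrality, deg(v) / (n - 1), for n = 9 datasets: 0.25, 0.12 and 0.38 match 2/8, 1/8 and 3/8 up to rounding. The following is a minimal sketch of that calculation. The per-dataset link counts are inferred from the D column and are an assumption, since the dataset graph itself does not appear in this text, and the function name is hypothetical.

```python
def normalized_degree_centrality(degree: int, n_nodes: int) -> float:
    """Degree of a node divided by the maximum possible degree, n_nodes - 1."""
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    return degree / (n_nodes - 1)


# ASSUMPTION: link counts inferred from the D column of the table,
# not taken from the paper's dataset graph.
assumed_degrees = {
    "CAM data": 2,
    "DENM data": 2,
    "DATEX data": 1,
    "Meteo data": 3,
    "CAM meteo": 2,
    "DENM meteo": 2,
    "CAM Datex": 3,
    "ITSC data": 3,
    "ITSC meteo": 2,
}

n_datasets = len(assumed_degrees)

degree_centrality = {
    name: normalized_degree_centrality(degree, n_datasets)
    for name, degree in assumed_degrees.items()
}

for name, value in degree_centrality.items():
    print(f"{name}: {value:.3f}")
```

The B and C columns cannot be reproduced this way, because betweenness and closeness depend on the actual links and their direction, not only on link counts.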
Amount of Data | Duplicate Rows | Timeliness (F. 4): Real Quality | Timeliness (F. 4): Predicted Quality | Completeness (F. 5 and 6): Real Quality | Completeness (F. 5 and 6): Predicted Quality |
500 | 0 | 0.5999 | 0.5999 | 0.5547 | 0.5547 |
500 | 90 | 0.5849 | 0.5554 | 0.555 | 0.553 |
500 | 180 | 0.5979 | 0.5870 | 0.5602 | 0.5533 |
500 | 300 | 0.5784 | 0.5694 | 0.5604 | 0.5521 |
1500 | 0 | 0.617 | 0.617 | 0.557 | 0.557 |
1500 | 90 | 0.6158 | 0.5915 | 0.5584 | 0.5536 |
1500 | 180 | 0.624 | 0.6139 | 0.5622 | 0.5544 |
1500 | 300 | 0.6367 | 0.6144 | 0.5636 | 0.5536 |
3000 | 0 | 0.6086 | 0.6086 | 0.5544 | 0.5544 |
3000 | 90 | 0.6103 | 0.6022 | 0.555 | 0.5525 |
3000 | 180 | 0.6286 | 0.6069 | 0.5592 | 0.5529 |
3000 | 300 | 0.6364 | 0.6154 | 0.5735 | 0.5598 |
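One way to read the table above is to summarize how far the predicted quality falls from the real quality. The sketch below does this with the mean and maximum absolute error over the twelve rows. The choice of absolute error is an assumption made for illustration, since the paper's own evaluation metric does not appear in this text, and the helper names are hypothetical.

```python
# (amount of data, duplicate rows, timeliness real, timeliness predicted,
#  completeness real, completeness predicted), copied from the table above.
rows = [
    (500, 0, 0.5999, 0.5999, 0.5547, 0.5547),
    (500, 90, 0.5849, 0.5554, 0.555, 0.553),
    (500, 180, 0.5979, 0.5870, 0.5602, 0.5533),
    (500, 300, 0.5784, 0.5694, 0.5604, 0.5521),
    (1500, 0, 0.617, 0.617, 0.557, 0.557),
    (1500, 90, 0.6158, 0.5915, 0.5584, 0.5536),
    (1500, 180, 0.624, 0.6139, 0.5622, 0.5544),
    (1500, 300, 0.6367, 0.6144, 0.5636, 0.5536),
    (3000, 0, 0.6086, 0.6086, 0.5544, 0.5544),
    (3000, 90, 0.6103, 0.6022, 0.555, 0.5525),
    (3000, 180, 0.6286, 0.6069, 0.5592, 0.5529),
    (3000, 300, 0.6364, 0.6154, 0.5735, 0.5598),
]


def absolute_errors(real_idx: int, pred_idx: int) -> list[float]:
    """Per-row |real - predicted| for the chosen pair of columns."""
    return [abs(row[real_idx] - row[pred_idx]) for row in rows]


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# ASSUMPTION: absolute error is used here only as an illustrative summary.
timeliness_errors = absolute_errors(2, 3)
completeness_errors = absolute_errors(4, 5)

print(f"timeliness:   mean={mean(timeliness_errors):.4f} "
      f"max={max(timeliness_errors):.4f}")
print(f"completeness: mean={mean(completeness_errors):.4f} "
      f"max={max(completeness_errors):.4f}")
```

In the rows with zero duplicate rows the real and predicted values are identical, so any deviation in this summary comes from the rows that contain duplicates.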
7.2 Results
8 Conclusions and Future Work
Acknowledgments
Footnotes
References
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article