Abstract
Handling missing information is essential to the efficiency and robustness of database systems. In a big data environment, missing information tends to carry richer semantics, which leads to more complex computational logic and affects both data operations and their implementation. Existing methods either have limited semantic expressiveness or do not account for the influence of the big data environment. To address these problems, this paper proposes a novel method for processing missing information. Drawing on practical cases from big data environments, we classify missing information into two types, unknown values and nonexistent values, and define a four-valued logic to support logical operations over them. Relational algebra is systematically extended to describe the data operations. We implement our approach on the dynamic table model in Muldas, a self-developed big data management system. Experimental results on real large-scale sparse data sets show that the proposed approach offers good semantic expressiveness and computational efficiency.
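To make the idea of two kinds of missing value and a four-valued logic concrete, here is a minimal Python sketch. It is not the paper's definition: the abstract does not give the truth tables, so the ordering FALSE < NONEXISTENT < UNKNOWN < TRUE, the min/max rules for AND and OR, and all names (`Truth`, `Missing`, `eq4`, `and4`, `or4`, `select`) are assumptions chosen only for illustration.

```python
from enum import IntEnum


class Missing:
    """Hypothetical marker object for one kind of missing value."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


# Two kinds of missing information, following the abstract's classification.
UNKNOWN_VALUE = Missing("UNKNOWN_VALUE")          # a value exists but is not known
NONEXISTENT_VALUE = Missing("NONEXISTENT_VALUE")  # no value exists for this attribute


class Truth(IntEnum):
    # ASSUMPTION: this total order is illustrative, not taken from the paper.
    FALSE = 0
    NONEXISTENT = 1
    UNKNOWN = 2
    TRUE = 3


def and4(a, b):
    """Assumed conjunction: the smaller of the two truth values."""
    return Truth(min(a, b))


def or4(a, b):
    """Assumed disjunction: the larger of the two truth values."""
    return Truth(max(a, b))


def eq4(x, y):
    """Four-valued equality over values that may be missing."""
    if x is NONEXISTENT_VALUE or y is NONEXISTENT_VALUE:
        return Truth.NONEXISTENT
    if x is UNKNOWN_VALUE or y is UNKNOWN_VALUE:
        return Truth.UNKNOWN
    return Truth.TRUE if x == y else Truth.FALSE


def select(rows, predicate):
    """Selection that keeps only rows whose predicate is definitely TRUE."""
    return [row for row in rows if predicate(row) == Truth.TRUE]


rows = [
    {"id": 1, "phone": "555"},
    {"id": 2, "phone": UNKNOWN_VALUE},
    {"id": 3, "phone": NONEXISTENT_VALUE},
]
matches = select(rows, lambda r: eq4(r["phone"], "555"))
```

Under these assumptions, only the row with `id` 1 is selected: the other two comparisons evaluate to UNKNOWN and NONEXISTENT rather than TRUE, which is the kind of distinction a single SQL-style NULL cannot express.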
Acknowledgments
Shun Li is the corresponding author. This research is supported by the Natural Science Foundation of China (Grant No. 61572043), the National Key Research and Development Program (Grant No. 2016YFB1000704), and the High-performance Computing Platform of Peking University.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Chen, Y., Li, S., Yao, J. (2018). Processing Missing Information in Big Data Environment. In: Tan, Y., Shi, Y., Tang, Q. (eds) Data Mining and Big Data. DMBD 2018. Lecture Notes in Computer Science, vol 10943. Springer, Cham. https://doi.org/10.1007/978-3-319-93803-5_60
DOI: https://doi.org/10.1007/978-3-319-93803-5_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93802-8
Online ISBN: 978-3-319-93803-5
eBook Packages: Computer Science, Computer Science (R0)