Abstract
Recent research on multi-dimensional design has combined business data with User-Generated Content (UGC). These studies have integrated new analytical aspects, such as users' behavior, sentiments, opinions, or topics of interest, to improve decision analysis. In this paper, we address the complexity of designing a topic dimension schema, which stems from the dynamic and heterogeneous nature of its hierarchies. Researchers have partially addressed this issue by offering technical solutions for topic detection, but without focusing on the Extraction, Transformation and Loading (ETL) process that allows the detected topics to be integrated into a multi-dimensional schema. Our contribution consists in modeling the ETL steps that generate valid topic dimension hierarchies from informal UGC texts. We propose a generic process model, ETL4SocialTopic, which defines a set of operations executed in a specific order. The implementation of these steps provides a set of customized jobs that simplify the ETL designer's work by automating a large part of the process. Experimental results show that ETL4SocialTopic is consistent in designing valid topic dimension schemas in several contexts.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Walha, A., Ghozzi, F., Gargouri, F. (2021). Design and Execution of ETL Process to Build Topic Dimension from User-Generated Content. In: Cherfi, S., Perini, A., Nurcan, S. (eds) Research Challenges in Information Science. RCIS 2021. Lecture Notes in Business Information Processing, vol 415. Springer, Cham. https://doi.org/10.1007/978-3-030-75018-3_25
DOI: https://doi.org/10.1007/978-3-030-75018-3_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75017-6
Online ISBN: 978-3-030-75018-3
eBook Packages: Computer Science, Computer Science (R0)