Autonomous Data Association and Intelligent Information Discovery Based on Multimodal Fusion Technology
Figure 1. Flowchart of the methodology.
Figure 2. The organizational structure of data cells. (a) The structure of the initial data cell $u$. $c_{u}$ represents the information nucleus of $u$; $s_{u,1}$ to $s_{u,n}$ represent $n$ different strategies; $oc_{u,1}$ to $oc_{u,n}$ represent the data pipes corresponding to $s_{1}$ to $s_{n}$. (b) The structure of the binary combined data cell $(u,v)$. $c_{(u,v)}$ represents the information nucleus of the binary combined data cell $(u,v)$; $s_{1}$ to $s_{n}$ (larger yellow square outside cells $u$ and $v$) represent $n$ different strategies; $oc_{(u,v),1}$ to $oc_{(u,v),n}$ represent the data pipes corresponding to $s_{(u,v),1}$ to $s_{(u,v),n}$.
Figure 3. Sketch of the binary combined data cell $(u,v)$.
Figure 4. Construction of information nucleuses of initial data cells. (a) Table-type relational databases; (b) document-type non-relational databases; (c) image-type databases; (d) video-type databases.
Figure 5. The plus (+) data pipe with the information nucleus “weight” is associated with the output (⬅) data pipe with the information nucleus “height” to form the unified expression “weight + height”.
Figure 6. Symmetric association of data pipes and asymmetric association of data pipes. (a) and (b) are symmetric; (c) and (d) are asymmetric. The symbols “+”, “&”, “−”, and “confidence (A, B)” in the figure represent the “addition”, “and”, “subtraction”, and “confidence” operations, respectively.
Figure 7. The information-driven intelligent information discovery method.
Figure 8. The task-driven intelligent information discovery method.
Abstract
1. Introduction
- Propose an autonomous data association method for multi-source heterogeneous data based on the organizational structure of data cells. Initial data cells are constructed according to the definition of this organizational structure, and the symmetric and asymmetric random association of data pipes are defined by analyzing the symmetric and asymmetric strategic associations of data. Multi-level association of data pipes then generates information nucleuses of combined data cells, each containing independent association information, and together these form the set of information nucleuses of combined data cells.
- Propose an information-driven intelligent information discovery method. A reward and punishment model is constructed and trained on manually scored information nucleuses of combined data cells. The trained model then replaces manual scoring, so that meaningful information nucleuses, which carry a large amount of meaningful associated information, are screened and stored automatically.
- Propose a task-driven intelligent information discovery method. For a realistic task request, a topic model from natural language understanding (NLU) extracts the set of subject headings related to the request. The subject headings are used to search the information nucleuses of initial data cells, which yields the set of matched initial data cells. The data pipes of the matched initial data cells are randomly associated to form combined data cells, from which the set of information nucleuses of combined data cells related to the task request is obtained. A reward and punishment model is constructed and trained on manually sorted information nucleuses of combined data cells related to task requests. The trained model then sorts information nucleuses automatically, realizing task-driven intelligent information discovery.
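The task-driven steps in the last contribution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject headings are supplied by hand instead of coming from a topic model, matching is a plain substring test, candidates are enumerated rather than randomly associated, and the scoring function is a placeholder for the trained reward and punishment model. All function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def match_nucleuses(subject_headings: Sequence[str], nucleuses: Sequence[str]) -> List[str]:
    """Keep the initial nucleuses whose name contains any subject heading."""
    headings = [h.lower() for h in subject_headings]
    return [n for n in nucleuses if any(h in n.lower() for h in headings)]


def associate(matched: Sequence[str], max_size: int = 2) -> List[str]:
    """Join matched nucleuses with '*' into candidate combined nucleuses."""
    candidates = []
    for size in range(1, max_size + 1):
        for group in combinations(matched, size):
            candidates.append("*".join(f"←{name}" for name in group))
    return candidates


def rank(candidates: Sequence[str], score: Callable[[str], float]) -> List[str]:
    """Sort candidates from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


# Hypothetical inputs: headings would come from the topic model.
headings = ["air", "city"]
nucleuses = ["city", "date", "air temperature", "air quality", "population"]

matched = match_nucleuses(headings, nucleuses)
candidates = associate(matched)
# Placeholder scorer (assumption): prefer candidates that combine more nucleuses.
ranked = rank(candidates, score=lambda c: c.count("←"))
print(matched)
print(ranked[0])
```

In a real system the `score` argument would be the trained model, so the ranking logic itself would not need to change.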
2. Related Works
2.1. Multi-Source Heterogeneous Data Fusion
2.2. Data Association and Autonomous Knowledge Discovery
2.3. Data Self-Intelligence
3. Methodology
3.1. Data Cells
- Information nucleus: The information nucleus is the core constituent of an initial data cell and carries the intelligent implementation of data. Strategies and data pipes exist on the basis of the information nucleus. An information nucleus does not correspond one-to-one to a thing in the real world; it is an abstraction of transactions that share the same characteristics. The information nucleus is embeddable and objective: embeddability refers to the semi-automatic or automatic association between the information nucleus and the database system; objectivity means that, although information nucleuses do not correspond one-to-one to real-world things, they all have practical significance.
- Strategies: Strategies are a set of algorithms and scripts, that is, the methods and processes that realize specific transactions based on the data and logic provided by the information nucleus. Strategies are divided into internal strategies and external strategies. Because data cells are hierarchical, the distinction is relative: strategies within a combined data cell are internal with respect to that combined data cell and external with respect to the initial data cells that make it up.
- Data pipes: Data pipes are the channels through which initial data cells transfer information and communicate. Data pipes are not independent; each corresponds one-to-one to a strategy. The information nucleuses of combined data cells are constructed by associating the data pipes of initial data cells.
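The three components above map naturally onto a small data structure. The following Python sketch is one possible representation, not the authors' implementation; the class names `DataCell` and `DataPipe` and the helper `combine` are hypothetical. It enforces the one-to-one relationship between strategies and data pipes and builds the nucleus of a binary combined cell as a unified expression.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataPipe:
    """Channel that exposes the result of exactly one strategy of a cell."""
    nucleus: str
    strategy: str


@dataclass
class DataCell:
    """A data cell: an information nucleus plus strategies and their pipes."""
    nucleus: str
    strategies: Dict[str, Callable[..., object]] = field(default_factory=dict)
    pipes: List[DataPipe] = field(default_factory=list)

    def add_strategy(self, name: str, func: Callable[..., object]) -> DataPipe:
        # Each strategy gets exactly one pipe (one-to-one relationship).
        if name in self.strategies:
            raise ValueError(f"strategy {name!r} already registered")
        self.strategies[name] = func
        pipe = DataPipe(self.nucleus, name)
        self.pipes.append(pipe)
        return pipe


def combine(u: DataCell, v: DataCell, operator: str) -> DataCell:
    """Build a binary combined cell whose nucleus is a unified expression."""
    return DataCell(nucleus=f"({u.nucleus} {operator} {v.nucleus})")


weight = DataCell("weight")
weight.add_strategy("output", lambda rows: rows)
height = DataCell("height")
height.add_strategy("output", lambda rows: rows)

combined = combine(weight, height, "+")
print(combined.nucleus)  # (weight + height)
```

The combined cell starts with no strategies of its own; strategies added to it later are internal to the combined cell and external to `weight` and `height`, which mirrors the relative distinction described above.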
3.2. Construction of Initial Data Cells
3.2.1. Information Nucleus Construction
3.2.2. Strategy Construction
- Input strategy and output strategy.
- Data normalization strategies.
- Mathematical operation strategies.
- Logical operation strategies.
- Association rule strategies.
- Spatio-temporal relationship strategies.
- Empirical model strategies.
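Several of the strategies listed above are standard computations. As a minimal sketch, assuming min-max scaling as the normalization method and the usual definitions of support and confidence for association rules, they could be written as plain Python functions. The function names and the sample transactions are illustrative only.

```python
from typing import List, Set


def min_max_normalize(values: List[float]) -> List[float]:
    """Data normalization strategy: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def support(transactions: List[Set[str]], items: Set[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


transactions = [{"A", "B"}, {"A"}, {"A"}, {"A", "B"}]
print(min_max_normalize([2.0, 4.0, 6.0]))        # [0.0, 0.5, 1.0]
print(confidence(transactions, {"A"}, {"B"}))    # 0.5
print(confidence(transactions, {"B"}, {"A"}))    # 1.0
```

The two confidence values differ when the operands are swapped, which is why confidence is an asymmetric association while addition is a symmetric one.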
3.2.3. Data Pipe Construction
3.3. Autonomous Association of Multi-Source Heterogeneous Data Based on the Organizational Structure of Data Cells
3.4. Information-Driven Intelligent Information Discovery
3.5. Task-Driven Intelligent Information Discovery
4. Examples of Applications
4.1. The Example of the Information-Driven Method
4.2. The Example of the Task-Driven Method
5. Discussion
- Theoretical feasibility: Field names, key-value names, and technical names in a database denote data items of the same type, divided according to some boundary, and each represents a specific transaction. Data association is the analysis of relationships between transactions. This paper takes field names, key-value names, or technical names as the information nucleuses of initial data cells, and generates the nucleuses of combined data cells by random association of the data pipes of initial data cells, so the construction of information nucleuses is in accordance with the essence of data association. The construction of strategies and data pipes, together with the association of data pipes, realizes autonomous data association. Strategies are common algorithms and scripts, and the association of data pipes generates unified expressions, which associates multiple data sources of different types and formats. The association of data pipes is random, which is intended to maximize the amount of newly associated information that is acquired. According to Section 3.2.2, the unified expressions formed by the autonomous association of data pipes need to be judged against arithmetic rules and practical significance. We therefore select expressions that meet arithmetic rules and classify associations as symmetric or asymmetric to construct the information nucleuses of combined data cells, and we train reward and punishment models from two perspectives to simulate the manual screening process. The information-driven method filters nucleuses directly, whereas the task-driven method is oriented to the requirements of real applications. In conclusion, the method in this paper extracts meaningful associated information from subjective and objective perspectives and from overall and local perspectives.
- Technical feasibility: Constructing the information nucleuses of initial data cells amounts to extracting field names, key-value names, or technical names, which can be obtained directly with the relevant database statements; other critical information can be obtained with existing algorithms, such as image-recognition algorithms. Strategies are divided into underlying strategies and advanced strategies: underlying strategies are common basic mathematical operations, and advanced strategies are existing complex algorithms or combinations of different strategies, which can be realized via algorithm development. The association of data pipes is a sequential combination or connection of information nucleuses and strategies. Expressions that fail to conform to arithmetic rules can be identified in the manner of “calculator” applications, where logical judgments are implemented using manually predefined rules. The symmetry of data pipe associations can likewise be judged using manually predefined rules. For the two intelligent information discovery methods, we have given the loss functions and training data formats of the reward and punishment model, so the model can be implemented with more detailed design and programming. The training data can be obtained by manually scoring or ranking samples. Once the empirical knowledge reaches a certain level, the model can screen meaningful information nucleuses on its own and realize autonomous information discovery. The meaningful nucleuses are stored in the cloud brain, which is essentially a relational database and can be designed with reference to relational databases.
- Feasibility of data security: In actual applications, data are invoked according to the unified expressions of the meaningful information nucleuses stored in the cloud brain. Data cells are connected directly to the database, and the data are called and analyzed without changing the source data or the source environment; however, the data invoked through a unified expression have already been computed by various strategies and differ substantially from the original data, which provides an initial guarantee of the safety of the original data. To further guarantee data security, data managers can restrict access to information nucleuses in the cloud brain by personnel or department.
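The manually predefined rules mentioned under technical feasibility can be illustrated with two small lookup tables, one for operand types (the “calculator”-style validity check) and one for operator symmetry. The sketch below is an assumption-laden illustration: the tables `SYMMETRIC` and `OPERAND_TYPE`, the type labels, and the function names are hypothetical, and candidate pairs are enumerated exhaustively instead of being drawn at random.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical rule tables, predefined manually as the text suggests.
SYMMETRIC = {"+": True, "&": True, "-": False, "confidence": False}
OPERAND_TYPE = {"+": "number", "-": "number", "&": "bool", "confidence": "itemset"}


def is_valid(op: str, left_type: str, right_type: str) -> bool:
    """Calculator-style check: both operands must have the type the operator expects."""
    expected = OPERAND_TYPE[op]
    return left_type == expected and right_type == expected


def associate(nucleus_types: Dict[str, str], op: str) -> List[Tuple[str, str, str]]:
    """Enumerate valid pairwise associations, keeping one order for symmetric operators."""
    results = []
    for left, right in permutations(sorted(nucleus_types), 2):
        if not is_valid(op, nucleus_types[left], nucleus_types[right]):
            continue
        if SYMMETRIC[op] and left > right:
            continue  # (b op a) duplicates (a op b) for a symmetric operator
        results.append((left, op, right))
    return results


types = {"weight": "number", "height": "number", "smoker": "bool"}
print(associate(types, "+"))  # [('height', '+', 'weight')]
print(associate(types, "-"))  # [('height', '-', 'weight'), ('weight', '-', 'height')]
```

Treating symmetric operators specially halves the number of candidate nucleuses that later need to be scored, while asymmetric operators keep both operand orders because each order carries different information.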
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, B.; Qi, P.; Liu, B.; Di, S.; Liu, J.; Pei, J.; Yi, J.; Zhou, B. Trustworthy AI: From principles to practices. ACM Comput. Surv. 2023, 55, 177.
- Lotfian, M.; Ingensand, J.; Brovelli, M.A. The partnership of citizen science and machine learning: Benefits, risks, and future challenges for engagement, data collection, and data quality. Sustainability 2021, 13, 8087.
- Zha, D.; Bhat, Z.P.; Lai, K.-H.; Yang, F.; Hu, X. Data-centric AI: Perspectives and challenges. In Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), Minneapolis, MN, USA, 27–29 April 2023; pp. 945–948.
- Wang, T. A novel approach of integrating natural language processing techniques with fuzzy TOPSIS for product evaluation. Symmetry 2022, 14, 120.
- Shen, Z.; Zhang, X.; Liu, Z. PM2 VE: Power Metering Model for Virtualization Environments in Cloud Data Centers. IEEE Trans. Cloud Comput. 2023, 11, 3126–3138.
- Ethan, A. Data Virtualization: The Key to Realizing Big Data Analytics Potential. Int. J. Comput. Sci. Inf. 2022, 6, 20–50.
- Shiva, L. Data Virtualization Best Practices for Advanced Analytics in Big Data. Int. J. Comput. Sci. Inf. 2022, 6, 39–66.
- Al-Okaily, A.; Al-Okaily, M.; Teoh, A.P.; Al-Debei, M.M. An empirical study on data warehouse systems effectiveness: The case of Jordanian banks in the business intelligence era. EuroMed J. Bus. 2023, 18, 489–510.
- Nambiar, A.; Mundra, D. An Overview of Data Warehouse and Data Lake in Modern Enterprise Data Management. Big Data Cogn. Comput. 2022, 6, 132.
- Oueslati, W.; Tahri, S.; Limam, H.; Akaichi, J. A systematic review on moving objects’ trajectory data and trajectory data warehouse modeling. Comput. Sci. Rev. 2023, 47, 100516.
- Porshnev, S.; Borodin, A.; Ponomareva, O.; Mirvoda, S.; Chernova, O. The development of a heterogeneous MP data model based on the ontological approach. Symmetry 2021, 13, 813.
- Muniswamaiah, M.; Agerwala, T.; Tappert, C. Data virtualization for decision making in big data. Int. J. Softw. Eng. Appl. 2019, 10, 45–53.
- Saxena, G.; Agarwal, B.B. Data Warehouse Designing: Dimensional Modelling and ER Modelling. Int. J. Eng. Invent. 2014, 3, 28–34.
- Togatorop, P.R.; Sitorus, D.; Purba, Y.; Tarigan, A.M. Twitter Data Warehouse and Business Intelligence Using Dimensional Model and Data Mining. In Proceedings of the 2022 IEEE International Conference of Computer Science and Information Technology (ICOSNIKOM), Laguboti, Sumatera Utara, Indonesia, 19–21 October 2022; pp. 1–6.
- Rodríguez-Mazahua, N.; Rodríguez-Mazahua, L.; López-Chau, A.; Alor-Hernández, G.; Machorro-Cano, I. Decision-Tree-Based Horizontal Fragmentation Method for Data Warehouses. Appl. Sci. 2022, 12, 10942.
- Witanto, E.N.; Oktian, Y.E.; Lee, S.-G. Toward data integrity architecture for cloud-based AI systems. Symmetry 2022, 14, 273.
- Wu, X.; Duan, J.; Pan, Y.; Li, M. Medical knowledge graph: Data sources, construction, reasoning, and applications. Big Data Min. Anal. 2023, 6, 201–217.
- Hassan, M.M.; Karim, A.; Mollick, S.; Azam, S.; Ignatious, E.; Al Haque, A.F. An Apriori Algorithm-Based Association Rule Analysis to detect Human Suicidal Behaviour. Procedia Comput. Sci. 2023, 219, 1279–1288.
- Liu, T.; Zhang, X.; Du, P.; Du, Q.; Li, A.; Gong, L. Knowledge Discovery Method from Text Big Data for Earthquake Emergency. Geomat. Inf. Sci. Wuhan Univ. 2020, 45, 1205–1213.
- Cao, S.J.; Cao, R.Y. Research on Interdisciplinary Knowledge Discovery Based on Knowledge Graph to Support Scientific Research Innovation. Inf. Stud. Theory Appl. 2022, 45, 45–53.
- Huang, X.; Liu, Y.; Huang, L.; Onstein, E.; Merschbrock, C. BIM and IoT data fusion: The data process model perspective. Autom. Constr. 2023, 149, 104792.
- Moreno, C.; González, R.A.C.; Viedma, E.H. Data and artificial intelligence strategy: A conceptual enterprise big data cloud architecture to enable market-oriented organisations. Int. J. Interact. 2019, 5, 7–14.
- Yang, J.-T.; Chen, W.-Y.; Li, C.-H.; Huang, S.C.-H.; Wu, H.-C. APPFLChain: A Privacy Protection Distributed Artificial-Intelligence Architecture Based on Federated Learning and Consortium Blockchain. arXiv 2022, arXiv:2206.12790.
- Liu, J.; Li, T.; Xie, P.; Du, S.; Teng, F.; Yang, X. Urban big data fusion based on deep learning: An overview. Inf. Fusion 2020, 53, 123–133.
- Liu, W.; Zhang, C.; Yu, B.; Li, Y. A general multi-source data fusion framework. In Proceedings of the 2019 11th International Conference on Machine Learning and Computing, Zhuhai, China, 22–24 February 2019; pp. 285–289.
- Ji, Z.Y.; Pi, H.Y.; Yao, W. A hybrid recommendation model based on fusion of multi-source heterogeneous data. J. Beijing Univ. Posts Telecommun. 2019, 42, 126.
- Liu, Z.; Liu, H.; Huang, W.; Wang, B.; Sun, F. Audiovisual cross-modal material surface retrieval. Neural Comput. Appl. 2020, 32, 14301–14309.
- Meng, F.; Li, A.; Liu, Z. An Evidence theory and data fusion based classification method for decision making. Procedia Comput. Sci. 2022, 199, 892–899.
- Shu, X.; Ye, Y. Knowledge Discovery: Methods from data mining and machine learning. Soc. Sci. Res. 2023, 110, 102817.
- Rajput, D.S.; Meena, G.; Acharya, M.; Mohbey, K.K. Fault prediction using fuzzy convolution neural network on IoT environment with heterogeneous sensing data fusion. Meas. Sens. 2023, 26, 100701.
- Abdulahi Hasan, A.; Fang, H. Data Mining in Education: Discussing Knowledge Discovery in Database (KDD) with Cluster Associative Study. In Proceedings of the 2021 2nd International Conference on Artificial Intelligence and Information Systems, Chongqing, China, 28–30 May 2021; pp. 1–6.
- Mollaei, N.; Fujao, C.; Rodrigues, J.; Cepeda, C.; Gamboa, H. Occupational health knowledge discovery based on association rules applied to workers’ body parts protection: A case study in the automotive industry. Comput. Methods Biomech. Biomed. 2023, 26, 1875–1888.
- Jun, D.; Ruan, W. Research on Knowledge Map and Multidimensional Knowledge Discovery of Oral History Archives Resources. Libr. Inf. Serv. 2022, 66, 4–16.
- Janssen, M.; Brous, P.; Estevez, E.; Barbosa, L.S.; Janowski, T. Data governance: Organizing data for trustworthy Artificial Intelligence. Gov. Inf. Q. 2020, 37, 101493.
- Di Vaio, A.; Hassan, R.; Alavoine, C. Data intelligence and analytics: A bibliometric analysis of human–Artificial intelligence in public sector decision-making effectiveness. Technol. Forecast. Soc. Chang. 2022, 174, 121201.
- Zhi-Qiang, P.; Jian-Qiang, Y.; Zhen, L.; Teng-Hai, Q.; Jin-Lin, S.; Fei-Mo, L. Knowledge-based and data-driven integrating methodologies for collective intelligence decision making: A survey. Acta Autom. Sin. 2022, 48, 627–643.
- Zhe, J.; Yin, Z.; Fei, W.; Wenwu, Z.; Yunhe, P. Artificial Intelligence Algorithms Based on Data-driven and Knowledge-guided Models. J. Electron. Sci. Technol. 2023, 45, 2580–2594.
- Zhang, J.; Xiao, W.; Li, Y. Data and knowledge twin driven integration for large-scale device-free localization. IEEE Internet Things J. 2020, 8, 320–331.
- Zhu, J.; Chai, M.; Zhou, W. Three-three-three network architecture and learning optimization mechanism for B5G/6G. J. Commun. 2021, 42, 62–75.
- Sarker, I.H. Data science and analytics: An overview from data-driven smart computing, decision-making and applications perspective. SN Comput. Sci. 2021, 2, 377.
- Yin, T.; Lu, N.; Guo, G.; Lei, Y.; Wang, S.; Guan, X. Knowledge and data dual-driven transfer network for industrial robot fault diagnosis. Mech. Syst. Signal Process. 2023, 182, 109597.
- Yin, J.; Ren, X.; Liu, R.; Tang, T.; Su, S. Quantitative analysis for resilience-based urban rail systems: A hybrid knowledge-based and data-driven approach. Reliab. Eng. Syst. Saf. 2022, 219, 108183.
- Destro, F.; Salmon, A.J.; Facco, P.; Pantelides, C.C.; Bezzo, F.; Barolo, M. Monitoring a segmented fluid bed dryer by hybrid data-driven/knowledge-driven modeling. IFAC-PapersOnLine 2020, 53, 11638–11643.
- Wang, H.; Mao, K.; Yuan, Z.; Shi, J.; Cao, M.; Qin, Z.; Duan, S.; Tang, B. A method for land surface temperature retrieval based on model-data-knowledge-driven and deep learning. Remote Sens. Environ. 2021, 265, 112665.
- Wu, Z.; Zhang, Y.; Dong, Z. Prediction of NOx emission concentration from coal-fired power plant based on joint knowledge and data driven. Energy 2023, 271, 127044.
- Wu, W.; Song, C.; Liu, J.; Zhao, J. Data-knowledge-driven distributed monitoring for large-scale processes based on digraph. J. Process Control 2022, 109, 60–73.
- Shi, Z. Image semantic analysis and understanding. In Proceedings of the International Conference on Intelligent Information Processing, Manchester, UK, 13–16 October 2010; pp. 4–5.
- Kulkarni, G.; Premraj, V.; Ordonez, V.; Dhar, S.; Li, S.; Choi, Y.; Berg, A.C.; Berg, T.L. Babytalk: Understanding and generating simple image descriptions. IEEE Trans. Pattern Anal. 2013, 35, 2891–2903.
- Cohn, N.; Jackendoff, R.; Holcomb, P.J.; Kuperberg, G.R. The grammar of visual narrative: Neural evidence for constituent structure in sequential image comprehension. Neuropsychologia 2014, 64, 63–70.
- Dong, J.; Li, X.; Snoek, C.G. Predicting visual features from text for image and video caption retrieval. IEEE Trans. Multimed. 2018, 20, 3377–3388.
- Han, M.; Wang, Y.; Chang, X.; Qiao, Y. Mining inter-video proposal relations for video object detection. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 431–446.
- Yang, H.; Zhao, X.; Wang, L. Review of data normalization methods. Comput. Appl. Eng. Educ. 2023, 59, 13–22.
- Ahmad, Z.; Al-Thani, N.J. Undergraduate Research Experience Models: A systematic review of the literature from 2011 to 2021. Int. J. Educ. Res. 2022, 114, 101996.
- Rafailov, R.; Sharma, A.; Mitchell, E.; Ermon, S.; Manning, C.D.; Finn, C. Direct preference optimization: Your language model is secretly a reward model. arXiv 2023, arXiv:2305.18290.
- Churchill, R.; Singh, L. The evolution of topic modeling. ACM Comput. Surv. 2022, 54, 215.
- Tarakeswar, M.K.; Kavitha, D. Search engines: A study. J. Comput. Appl. 2011, 4, 29–33.
Score | Practical Significance Description of Information Nucleuses |
---|---|
1 | nucleuses with no practical significance. |
2 | nucleuses with a little practical significance. |
3 | nucleuses with general practical significance. |
4 | nucleuses with great practical significance. |
5 | nucleuses with common practical significance. |
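As a rough illustration of how the 1 to 5 scale above could drive screening, the following Python sketch keeps only nucleuses whose score reaches a cut-off. The cut-off of 3, the function name `screen`, and the example scores are all assumptions made for illustration; in the method described here the scorer would be the trained reward and punishment model, not a lookup in manual scores.

```python
from typing import Callable, Dict, Iterable

MIN_MEANINGFUL_SCORE = 3  # assumed cut-off on the 1-5 scale, not from the source


def screen(
    nucleuses: Iterable[str],
    scorer: Callable[[str], int],
    threshold: int = MIN_MEANINGFUL_SCORE,
) -> Dict[str, int]:
    """Return the nucleuses whose score is at least `threshold`, with their scores."""
    kept: Dict[str, int] = {}
    for nucleus in nucleuses:
        score = scorer(nucleus)
        if not 1 <= score <= 5:
            raise ValueError(f"score {score} outside the 1-5 scale")
        if score >= threshold:
            kept[nucleus] = score
    return kept


# Hypothetical manual scores standing in for the trained model.
manual_scores = {
    "weight + height": 1,
    "(←IT←IP→II) − (←TT←TP→TI)": 4,
}
stored = screen(manual_scores, scorer=manual_scores.__getitem__)
print(stored)
```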
Strategies | Abbreviation |
---|---|
Input | |
Output | |
Data normalization | |
Addition | |
Subtraction | |
Multiplication | |
Division | |
And | |
Or | |
Not | |
Support | |
Confidence | |
Spatio-temporal relationship | |
Empirical modeling | |
No. | Nucleuses | Explanation |
---|---|---|
1 | ←IT←IP→II | Input a year and a city, output the city’s cultural income for the year. |
2 | (←IT←IP→II) − (←TT←TP→TI) | When the input satisfies “←IT = ←TT, ←IP = ←TP”, it indicates the difference between tourism income and cultural income in the given city in the given year. (When “−” is replaced by “/”, it indicates the income ratio; when “−” is replaced by “+”, it indicates the sum of income.) |
3 | (←IT1←IP1→II1) − (←IT2←IP2→II2) | When the input satisfies “←IT1 = ←IT2, ←IP1 ≠ ←IP2”, it indicates the cultural income difference between different cities in the same year. When “←IT1 ≠ ←IT2, ←IP1 = ←IP2”, it indicates the cultural income difference in the given city between different years. (When “−” is replaced by “+”, it indicates the sum of income; when “−” is replaced by “/”, it indicates the income ratio.) |
4 | ((←IT1←IP1→II1) − (←IT2←IP2→II2)) / (←IT3←IP3→II3) | When the input satisfies “←IP1 = ←IP2 = ←IP3, ←IT2 = ←IT3”, and ←IT1 and ←IT2 are adjacent years, it indicates the annual growth rate of the cultural income of a city. |
5 | (((←IT1←IP1→II1) − (←IT2←IP2→II2)) / (←IT3←IP3→II3)) && (((←TT1←TP1→TI1) − (←TT2←TP2→TI2)) / (←TT3←TP3→TI3)) | When the inputs satisfy “←IP1 = ←IP2 = ←IP3 = ←TP1 = ←TP2 = ←TP3, ←IT2 = ←IT3”, ←IT1 and ←IT2 are adjacent years, and “←IT1 = ←TT1, ←IT2 = ←TT2, ←IT3 = ←TT3”, it indicates the positive and negative correlation analysis of the cultural income and tourism income of a city. |
6 | Model((←IT1←IP1→II1), (←IT2←IP2→II2), …, (←ITn←IPn→IIn)) | Empirical modeling analysis of changes in cultural income over time, or empirical modeling analysis of future cultural income, etc. |
7 | Model(((←IT1←IP1→II1), (←IT2←IP2→II2), …, (←ITn←IPn→IIn)), ((←TT1←TP1→TI1), (←TT2←TP2→TI2), …, (←TTn←TPn→TIn))) | Mining the relationship between cultural income and tourism income, etc. |
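Row 4 of the table above can be read as an executable expression. The sketch below evaluates it in Python under the stated constraints (same city for all three terms, the second and third terms share the same year, and the two years are adjacent). The lookup table, the city label, and the income figures are made-up placeholders, not data from the paper.

```python
from typing import Dict, Tuple

# Hypothetical data standing in for the cultural income (II) lookup.
CULTURAL_INCOME: Dict[Tuple[int, str], float] = {
    (2021, "CityA"): 120.0,
    (2022, "CityA"): 150.0,
}


def cultural_income(year: int, city: str) -> float:
    """Nucleus 1: input a year and a city, output the cultural income."""
    return CULTURAL_INCOME[(year, city)]


def annual_growth_rate(year: int, city: str) -> float:
    """Nucleus 4: (II1 - II2) / II3 with IP1 = IP2 = IP3 and IT2 = IT3 = IT1 - 1."""
    current = cultural_income(year, city)        # term 1: the given year
    previous = cultural_income(year - 1, city)   # terms 2 and 3: the adjacent earlier year
    return (current - previous) / previous


print(annual_growth_rate(2022, "CityA"))  # 0.25
```

Because the constraints force the second and third terms to be the same lookup, the three-term expression collapses to the familiar (current − previous) / previous form.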
Rank | Nucleuses of Combined Data Cells |
---|---|
1 | ←city*←date*←air temperature*←temperature*←air pollution index*…… |
2 | ←city*←date*←air temperature*←temperature*←air quality*←air pollution index*…… |
3 | ←city*←date*←air temperature*←temperature* |
… | … |
n−1 | ←air quality* |
n | ←air quality |
Method | Type and Range of Data Processed | Manual Dependency | Association Pattern between Data | Deep Mining of Data Association | Robustness |
---|---|---|---|---|---|
Traditional data association [18,19,20] | Several specific types | High. Requires data association definition | Relies on advance manual definition | No | Weak |
Data warehouse [5,6,7] | Multimodal data around a topic | High. Requires storage layer design | Relies on advance manual definition | No | General |
Data virtualization [8,9,10,11] | Multimodal data around a topic | High. Requires virtual layer design | Relies on advance manual definition | No | General |
Our method | Wide range of multimodal data | Low. Only manual annotation of samples | Autonomous association | Yes | Strong |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, W.; Li, J.; Jiang, J.; Wang, B.; Wang, Q.; Gao, E.; Yue, T. Autonomous Data Association and Intelligent Information Discovery Based on Multimodal Fusion Technology. Symmetry 2024, 16, 81. https://doi.org/10.3390/sym16010081