This section introduces the detailed design and implementation of the geographic data-sharing and computing framework.
3.1. Design of the Data Service Container
In terms of technical implementation, a data service container is a lightweight hosting server for geographic data services, whose design goal is to provide data-access services to users through a series of common interfaces for heterogeneous geographic data resources.
3.1.1. UDX Model
The UDX model was proposed for the structured description of multi-source heterogeneous geographic data [29]. UDX represents the content information of geographic data through the combination of data nodes, as shown in Figure 2. Each UDX node represents a specific data type, such as the “list” type, “float/real” type, “int” type, and so on. By organizing these nodes according to certain logic, arbitrary data content can be expressed. In Figure 2, the pollutant flux data stored in Excel are represented by two nested “DTKT_LIST” nodes.
As can be seen from Figure 2, unlike a specific data format, the UDX model is a descriptive model: it is only responsible for describing the content of the data, not how the data are organized and stored. In this way, users can obtain the data they want based on UDX. For example, the pollutant data of “Yuxi River” in January can be obtained directly by accessing the corresponding UDX data node, without reading the Excel file and extracting the required data from it.
In addition, in the traditional data-sharing process, after users receive the raw data, they often need to configure a corresponding development environment to process and operate the data to meet their specific data needs; for example, reading and writing Excel files requires downloading the corresponding development libraries and configuring the development environment. This process is often cumbersome and time consuming, and the data-processing methods developed in this way are difficult to share with other users. However, if users obtain UDX data, they can process the data with the read and write interface provided by UDX. This is much easier than using the underlying data read and write interfaces, and UDX-based data-processing methods are also easier to share with other users.
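To make this contrast concrete, the sketch below shows the raw-data route only: reading pollutant flux values for one site and one month directly from an Excel file with openpyxl. The file name, sheet layout, and column meanings are assumptions made purely for illustration; with a UDX description view, the same request would instead address the named data node through the UDX read interface, without knowing the Excel layout at all.

```python
# A minimal sketch of the "raw data" route: reading pollutant flux values
# for one site and one month directly from an Excel file with openpyxl.
# The file name, sheet layout, and column meanings are illustrative
# assumptions, not the actual data set used in the paper.
from openpyxl import load_workbook

def read_flux_from_excel(xlsx_path, site_name, month):
    """Scan the sheet row by row and collect flux values for one site/month."""
    workbook = load_workbook(xlsx_path, read_only=True)
    sheet = workbook.active
    values = []
    # Assumed layout: column A = site, column B = month, column C = flux.
    for site, mon, flux in sheet.iter_rows(min_row=2, max_col=3, values_only=True):
        if site == site_name and mon == month:
            values.append(flux)
    return values

# With a UDX description view, the same request would simply address the
# "Yuxi River" node for January through the UDX read interface.
# print(read_flux_from_excel("pollutant_flux.xlsx", "Yuxi River", 1))
```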
Therefore, for data sharing based on UDX, data providers, data users, and data processors will be separated. The different roles only need to focus on their responsibilities: the data provider only needs to be responsible for describing what data they share, the data user only needs to focus on how the data are processed to meet their data needs, and the data processor only needs to focus on how to implement UDX-based data processing and then share the processing method with the other two roles.
The construction of data services in this paper is also based on a UDX model, aiming to provide users with a general view of data operation, without paying attention to the details of the organization and storage of the underlying data, thus helping to improve the efficiency of data sharing.
3.1.2. Generation of Data Service
From the perspective of data use, in most cases, when users receive the data, they need to carry out specific data-processing operations to obtain the result data that meet their application requirements. Therefore, the data service proposed in this paper includes both a data resource service and data-processing service, as shown in Figure 3.
- (1) Data resource service
The raw data are standardized through a UDX model, and then published as data services; such data services are called data resource services. As shown in Figure 2, if the pollutant flux data are published as a service, it is a data resource service, and users can directly access UDX nodes to obtain corresponding data resources.
Generally, geographic data come from a wide range of sources and are stored in a variety of ways, such as data stored in files, data accessed through API interfaces, and data stored in spatial databases. However, no matter how the data are organized and stored, there are always relevant data read and write interfaces that can retrieve the data content from these original data sources. For example, common data formats such as GeoTIFF, Shapefile, and NetCDF (Network Common Data Form) have associated read–write libraries such as GDAL, GeoPandas, and the NetCDF read–write interface. For user-defined data formats, users can also write relevant code to read and write the data. After the data content information is obtained, the UDX API can be used to map it into UDX data nodes, and these data nodes can be organized according to certain logic to form a common data view that users can easily understand. At this point, the data resource service can be published, and the common data view serves as the data content that the data resource service shares externally. When users access the data resource service, they can understand the data content from the UDX description view and access the corresponding data nodes according to their own requirements. The service backend then invokes the UDX API to read the relevant data nodes and returns their content to the user.
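As an illustration of this mapping step, the following sketch maps simple (site, month, flux) records into a tree of UDX-style nodes that could serve as the common data view of a data resource service. The node class, kernel-type names, and method names are simplified stand-ins written for this example, not the actual UDX API.

```python
# A minimal, self-contained sketch of the data-mapping step behind a data
# resource service: raw records are mapped into a tree of UDX-style nodes
# forming the common data view exposed to users. The node class below is a
# simplified stand-in written for illustration, not the actual UDX API.
from dataclasses import dataclass, field

@dataclass
class UdxNode:
    name: str                       # node name, e.g., a site name
    kernel_type: str                # e.g., "DTKT_LIST", "DTKT_REAL"
    value: object = None            # payload for leaf nodes
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

def map_flux_records_to_udx(records):
    """Map (site, month, flux) records into nested list nodes, one per site."""
    root = UdxNode("PollutantFlux", "DTKT_LIST")
    site_nodes = {}
    for site, month, flux in records:
        if site not in site_nodes:
            site_nodes[site] = root.add(UdxNode(site, "DTKT_LIST"))
        site_nodes[site].add(UdxNode(f"month_{month}", "DTKT_REAL", value=flux))
    return root

# The published data resource service would expose this node tree; a user
# request for one node returns only that node's content.
view = map_flux_records_to_udx([("Yuxi River", 1, 0.82), ("Yuxi River", 2, 0.77)])
print([child.name for child in view.children])
```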
- (2) Data-processing service
When users need to obtain some specific data and the existing data resource service cannot meet their needs, they can realize data customization by invoking a data-processing service. For example, the data resource service that publishes the pollutant data in Figure 2 only provides the original pollutant data; if a user needs statistics such as the mean, variance, and standard deviation of the pollutant flux at each site, the data resource service cannot provide them directly. In this case, it is necessary to call a data-processing service with a statistical function, which performs the numerical statistics by reading the data content from the data resource service and then returns the result to the user.
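A data-processing service of this kind could be as simple as the following sketch, which computes the mean, variance, and standard deviation of flux values per site; the dictionary input stands in for data that would, in the framework, be read from the bound data resource service.

```python
# A minimal sketch of a statistics-type data-processing service. The input
# dictionary stands in for pollutant flux values that the framework would
# read from the data resource service; values here are purely illustrative.
import statistics

def flux_statistics(flux_by_site):
    """Compute summary statistics for each site's list of flux values."""
    result = {}
    for site, values in flux_by_site.items():
        result[site] = {
            "mean": statistics.mean(values),
            "variance": statistics.pvariance(values),
            "std_dev": statistics.pstdev(values),
        }
    return result

print(flux_statistics({"Yuxi River": [0.82, 0.77, 0.91]}))
```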
Data processing is a very broad concept, and any operation conducted on data can be called data processing. This paper focuses on three types of data-processing method: (a) Data extraction refers to extracting part of the specified data, such as extracting data of a research area of interest from DEM data. (b) Data mapping refers to data exchange operations between raw data and UDX data nodes; in other words, behind the data resource services mentioned above, it is the data-mapping method at work. (c) Data refactoring can be broadly understood as processing one form of input data into another form. For example, elevation data stored in GeoTIFF can be processed into ASCII GRID files using a data-refactoring method, and the statistical processing of pollutant data mentioned above can also be considered a data-refactoring operation.
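As a concrete instance of a data-refactoring method, the sketch below converts elevation data stored in GeoTIFF into an ASCII GRID file using the GDAL Python bindings (assumed to be installed); the file names are placeholders.

```python
# A minimal sketch of a data-refactoring method: converting a GeoTIFF raster
# into the Arc/Info ASCII GRID format with the GDAL Python bindings
# (assumed to be installed). File names are placeholders.
from osgeo import gdal

def geotiff_to_ascii_grid(src_path, dst_path):
    """Translate a GeoTIFF raster into an ASCII GRID file."""
    dataset = gdal.Translate(dst_path, src_path, format="AAIGrid")
    if dataset is None:
        raise RuntimeError(f"Failed to convert {src_path}")
    return dst_path

# Example call (placeholder paths):
# geotiff_to_ascii_grid("dem.tif", "dem.asc")
```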
Different data-processing methods may be implemented in different ways, mainly in the programming language they use (such as Python, Java, C#, etc.), the way they are called (such as Python scripts, DLL, exe, jar, etc.), and the running environment (such as the Python environment, .NET Framework, JDK, etc.). Since programs written in most programming languages can be invoked from the command line, this paper describes the invocation interface of data-processing methods as a command-line-based invocation mode, and passes the path of the data to be processed to the method in the form of command-line parameters, so as to realize data processing.
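The command-line invocation mode can be sketched as follows: the container starts the processing method in a child process and passes the path of the data to be processed as command-line parameters. The script name and argument names are placeholders, not an interface defined by the framework.

```python
# A minimal sketch of the command-line invocation mode: the container passes
# the path of the data to be processed to a processing method as command-line
# parameters and waits for the exit code. Script and argument names are
# placeholders for illustration only.
import subprocess
import sys

def invoke_processing_method(script_path, input_path, output_dir):
    """Run a data-processing script in a child process via the command line."""
    completed = subprocess.run(
        [sys.executable, script_path, "--input", input_path, "--output", output_dir],
        capture_output=True, text=True, timeout=3600,
    )
    if completed.returncode != 0:
        raise RuntimeError(f"Processing failed: {completed.stderr}")
    return completed.stdout

# Example call (placeholder paths):
# invoke_processing_method("clip_dem.py", "dem.asc", "./output")
```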
Generally, UDX works better for complex custom data formats, which lack common data read and write libraries. These custom data formats can be mapped to UDX data by data-mapping methods, and subsequent data processing based on UDX data spares users from manipulating the raw data directly. In this case, the data-processing method needs to support UDX reading and writing, so this part of the data-processing method needs to use the UDX API to encapsulate its data read and write interface. In this way, the published data-processing services and data resource services can cooperate with each other, so as to meet the diversified data requirements of users.
3.1.3. Access and Invocation of Data Service
After the data service is generated, the following questions need to be considered: How can users access and invoke the data service? How are data services executed? How does the container handle exceptions during data service execution? To address these issues, remote access to data services, asynchronous execution, and runtime-monitoring methods are designed, as shown in Figure 4.
- (1) Remote access
The sharing of data resources often involves multiple departments; for example, data centers usually have multiple subordinate units. How to coordinate data sharing between subordinate and leading departments, as well as among subordinate departments, is therefore an urgent problem to be solved. The data service container designed in this paper supports deployment on multiple different service nodes, and data services in different nodes can be migrated to each other. For example, suppose container node A holds a large-volume data resource service, but the related data-processing services are deployed in container node B. In general, migrating data-processing methods is much more efficient than migrating data resources, so the data-processing service can be migrated from container node B to node A, and users can then invoke the data-processing service in container A to obtain the data they want.
- (2) Asynchronous execution
Usually, the invocation of a data service is a time-consuming operation: whether the user directly requests a data resource service or invokes a data-processing service, a certain execution time is needed. Therefore, the invocation of a data service is implemented asynchronously. When the container receives the user’s invocation request, the container process prepares for the execution of the data service based on the user’s input (e.g., input data files, input parameters, etc.), such as initializing the working directory, specifying the output directory, and determining the parameter-passing order. The container process then creates a separate execution process in which the data service runs. At this point, the container process can continue to process other invocation requests without blocking, thus increasing the concurrency of data service invocations. When the data service is completed, the execution process notifies the container process and passes the results of the data service execution to it, and the container process then forwards the output to the user. Since the data service runs in an independent process, even if an exception occurs during its operation, it will not affect the container process, thus ensuring the stability of the data service container.
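The following sketch illustrates this asynchronous pattern in Python: the container-side call prepares a working directory, spawns the data service in a separate process, and returns immediately, while results come back through a queue. The function names and the placeholder service body are assumptions for illustration.

```python
# A minimal sketch of asynchronous invocation: the container process prepares
# a working directory, starts the data service in a separate process, and
# immediately returns to handle further requests. The process boundary keeps
# a crashing service from affecting the container. Names are illustrative.
import multiprocessing as mp
import tempfile

def run_service(service_id, inputs, result_queue):
    """Executes one data service invocation in its own process."""
    try:
        output = f"result of {service_id} on {inputs}"   # placeholder work
        result_queue.put(("finished", service_id, output))
    except Exception as exc:                             # exceptions stay in this process
        result_queue.put(("error", service_id, str(exc)))

def invoke_async(service_id, inputs, result_queue):
    """Container-side call: prepare the working directory and spawn the worker."""
    workdir = tempfile.mkdtemp(prefix=f"{service_id}_")
    worker = mp.Process(target=run_service, args=(service_id, inputs, result_queue))
    worker.start()                                       # non-blocking for the container
    return worker, workdir

if __name__ == "__main__":
    queue = mp.Queue()
    worker, _ = invoke_async("flux_statistics", {"site": "Yuxi River"}, queue)
    print(queue.get())                                   # container forwards this to the user
    worker.join()
```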
- (3) Service monitoring
In order to ensure the successful execution of data services, it is necessary to monitor the execution process of data services. The runtime monitoring of data services is the responsibility of the monitoring process. For the stability of the data service container, the monitoring process is also an independent process created by the container process, which is specifically responsible for the monitoring and exception handling of the execution process of the data service.
After the data service is started, the monitoring process first checks the input of the data service, such as whether the input data organization meets the requirements and whether the input control parameters contain invalid values. The monitoring process then continues to watch for exceptions during the execution of the data service, such as data I/O exceptions (e.g., errors in reading and writing data), timeout exceptions (e.g., long waits caused by processing large data sets), and runtime exceptions (e.g., manipulating a wrong memory address in the code). In addition, the monitoring process tracks the execution status of the data service in real time, such as the input data status, running status, output status, and error status, and feeds this execution information back to the container process. When the monitoring process detects that the data service is abnormal, it immediately terminates the data service execution process and reports the exception information to the container process. Finally, the temporary files generated during the execution of the data service and the system resources it occupied are cleared and released to ensure the efficiency of the container operation.
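A minimal monitoring routine along these lines is sketched below: it validates the input, runs the execution process with a timeout, records the failure cause, and cleans up the temporary working directory. The specific checks and the timeout limit are illustrative assumptions, not the framework's actual rules.

```python
# A minimal sketch of runtime monitoring: validate the input, run the
# execution process with a timeout, terminate it on failure, and clean up
# temporary files. Checks and limits are illustrative assumptions.
import os
import shutil
import subprocess
import sys

def monitor_execution(script_path, input_path, workdir, timeout_s=600):
    """Validate inputs, run the service process, and handle exceptions."""
    if not os.path.exists(input_path):                    # input check
        return {"status": "error", "detail": "input data not found"}
    try:
        proc = subprocess.Popen([sys.executable, script_path, input_path],
                                cwd=workdir, stderr=subprocess.PIPE, text=True)
        _, stderr = proc.communicate(timeout=timeout_s)   # wait with timeout
        status = "finished" if proc.returncode == 0 else "error"
        detail = stderr if status == "error" else ""
    except subprocess.TimeoutExpired:                     # timeout exception
        proc.kill()
        status, detail = "error", "execution timed out"
    finally:
        shutil.rmtree(workdir, ignore_errors=True)        # release temporary files
    return {"status": status, "detail": detail}
```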
3.2. Design of Workspace
In general, the data service container provides data services for users in the form of single services. In other words, users can only call one data service (data resource service or data-processing service) in the data service container at a time. When users have complex data requirements, they need to manually invoke data services several times, which is very inconvenient. Therefore, a place where users can access and configure data is needed; this is the workspace proposed in this paper. In the workspace, users can integrate the required data resources and data-processing methods, and then customize their data requirements based on these data services.
Figure 5 shows the design of the workspace.
3.2.1. Configuration of Data-Processing Workflow
When there are many data services distributed in the network space, how to find and use them in time is a major concern of data users. Therefore, a registration mechanism for data services is proposed in this paper. After a data provider deploys a data service container, he/she can decide whether the container provides data services on an intranet or in an open network environment. When a data service container is authorized to be shared in an open network, it is registered in the workspace, and any data service information updated in this container in the future is synchronized to the workspace. In this way, the data service resources in the open network environment are recorded in the workspace in real time.
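The registration step could, for instance, be an HTTP call from the container to the workspace, as in the hypothetical sketch below; the endpoint URL and payload fields are assumptions made only to illustrate the idea.

```python
# A hypothetical sketch of container registration with the workspace. The
# endpoint URL and payload fields are assumptions for illustration; only the
# requests library call itself is standard.
import requests

def register_container(workspace_url, container_host, services):
    """Register a data service container and its services with the workspace."""
    payload = {
        "host": container_host,          # address of the data service container
        "open_network": True,            # shared in the open network, not intranet only
        "services": services,            # e.g., [{"id": "...", "type": "resource"}]
    }
    response = requests.post(f"{workspace_url}/containers/register",
                             json=payload, timeout=10)
    response.raise_for_status()
    return response.json()
```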
When users need to customize their data requirements, they create a workspace instance in the workspace. A workspace instance contains all the resources and tools needed to customize the current data requirements, including data resources, data-processing methods, and data configuration tools (such as the workflow canvas, the service-binding tool, etc.). Then, according to the requirements of data processing, users can express the data-processing logic visually on the workflow canvas. For example, when users need to obtain DEM data of a target research area, they need to perform image mosaic, reprojection, and clipping operations on the original DEM data (which have been published as data services). The user then drags and drops three workflow nodes onto the workflow canvas to represent these three data-processing operations, and uses arrows to connect the three nodes in sequence. At this point, the workflow for DEM data extraction in the target research area is constructed. Moreover, when there are special data requirements, for example, when missing values in the DEM data need to be filled before clipping, the user can still drag and drop a node onto the workflow canvas to represent the data-filling operation, break the arrow between the “reprojection” and “clipping” nodes, and reconnect the “reprojection”, “data filling”, and “clipping” nodes. Thus, a data-processing workflow that meets the specific requirements is configured.
Each node in the workflow represents an abstract data-processing process that is not capable of execution. Users can customize any data-processing nodes and connect them according to the sequence of data-processing logic to form an abstract data-processing logic workflow. Finally, a data-processing workflow that meets the user’s data customization needs is built. However, the workflow cannot perform real data-processing operations until the corresponding data service is bound to each node through the service-binding tool.
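The abstract workflow can be pictured as a small data structure of nodes and links, as in the following sketch, which rebuilds the DEM example above and reports that the workflow is not executable until services are bound; the class and attribute names are illustrative.

```python
# A minimal sketch of the abstract workflow: nodes stand for data-processing
# steps, links record their order, and a node cannot run until a concrete
# data service is bound to it. Class names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class WorkflowNode:
    name: str                         # e.g., "mosaic", "reprojection", "clipping"
    bound_service: str = None         # id of the bound data service, if any

@dataclass
class Workflow:
    nodes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)   # (from_node, to_node) pairs

    def add_node(self, name):
        self.nodes[name] = WorkflowNode(name)

    def connect(self, src, dst):
        self.links.append((src, dst))

    def is_executable(self):
        """True only when every node has a data service bound to it."""
        return all(n.bound_service for n in self.nodes.values())

# DEM extraction example from the text: mosaic -> reprojection -> clipping,
# extended with a "data filling" node before clipping.
wf = Workflow()
for step in ("mosaic", "reprojection", "data filling", "clipping"):
    wf.add_node(step)
wf.connect("mosaic", "reprojection")
wf.connect("reprojection", "data filling")
wf.connect("data filling", "clipping")
print(wf.is_executable())   # False until services are bound to the nodes
```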
3.2.2. Generation of Data-Computing Solutions
When the data-processing workflow is configured, it can only represent the data-processing logic, and cannot really carry out data processing. At this point, it is necessary to bind data-processing services and data resource services to each logical node in the workflow. Since the workspace instance records all data services in the open network space, users can access and select the relevant data services directly and bind them to the corresponding workflow node. However, in practice, the existing data services in the network may not all meet the needs of users to configure data. Users can only associate and bind those data services that meet the requirements to the data-processing node, and for those missing data services, users can deploy their own data service container to publish the relevant data service for the corresponding data-processing nodes. For example, when users need to perform grid segmentation for a certain research area, if there is no corresponding data service in the network, users need to encapsulate the grid-partitioning algorithm (based on open-source or self-implemented code) and publish these algorithms in their own data service container as data-processing services. Then, when the container is registered with the workspace, users can access these grid-partitioning algorithm services in the workspace instance.
When the data service binding is complete, a data-computing solution that can actually run is formed.
Figure 6 shows an example of an XML expression of a data-computing solution.
An executable computing solution consists of three main parts:

(1) The “DataCollection” node refers to the collection of data resource services required in the current data-processing workflow. It consists of several “Data” nodes, each of which represents a specific data resource service. The “id” attribute is used to uniquely identify the data resource service, and the “source” attribute indicates the network host address of the data service container where the current data resource service resides. Thus, a data resource service can be uniquely located through the “id” and “source” attributes.

(2) The “MethodCollection” node refers to the collection of data-processing services required in the current data-processing workflow. It consists of several “Method” nodes, and each “Method” node consists of “InputCollection”, “ControlParams”, and “OutputCollection” nodes. The “InputCollection” node consists of several “Input” nodes; each “Input” node represents an input to the current data-processing service, and its “id” attribute corresponds to the “id” attribute of a “Data” node. The “ControlParams” node consists of several “Param” nodes; each “Param” node represents a control parameter of the current data-processing service, where the “type” and “value” attributes indicate the type and value of the control parameter, respectively. The “OutputCollection” node consists of several “Output” nodes; each “Output” node represents an output of the current data-processing method, its “id” attribute is used to uniquely identify the output, and the “target” attribute indicates the server where the output is stored.

(3) The “LinkCollection” node records the data-processing logic of the current data-processing workflow and is composed of several “Link” nodes. Each “Link” node records the connection between two nodes in the data-processing workflow: the “from” attribute indicates the “id” of the start node, and the “to” attribute indicates the “id” of the end node. Thus, when the data-computing solution is executed, the execution order of the data-processing services can be determined by traversing the “LinkCollection” node.
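To illustrate how such a document can be used, the sketch below parses a made-up solution fragment that follows the node and attribute names described above (it is not the actual content of Figure 6) and derives the execution order of a simple linear workflow by chaining the “Link” nodes.

```python
# A minimal sketch of traversing a data-computing solution document. The XML
# fragment follows the node and attribute names described in the text but is
# a made-up example, not the paper's Figure 6; the chaining assumes a simple
# linear workflow without branches.
import xml.etree.ElementTree as ET

SOLUTION_XML = """
<Solution>
  <DataCollection>
    <Data id="dem" source="http://nodeA.example/container"/>
  </DataCollection>
  <MethodCollection>
    <Method id="mosaic"><InputCollection><Input id="dem"/></InputCollection></Method>
    <Method id="reproject"><ControlParams><Param type="string" value="EPSG:4326"/></ControlParams></Method>
    <Method id="clip"><OutputCollection><Output id="dem_clip" target="http://nodeB.example/container"/></OutputCollection></Method>
  </MethodCollection>
  <LinkCollection>
    <Link from="reproject" to="clip"/>
    <Link from="mosaic" to="reproject"/>
  </LinkCollection>
</Solution>
"""

def execution_order(xml_text):
    """Chain the Link nodes of a linear workflow into an execution order."""
    root = ET.fromstring(xml_text)
    links = [(link.get("from"), link.get("to")) for link in root.iter("Link")]
    successor = dict(links)
    # The start node is the one that no link points to (single chain assumed).
    start_nodes = set(successor) - {to for _, to in links}
    order = []
    for node in start_nodes:
        while node is not None:
            order.append(node)
            node = successor.get(node)
    return order

print(execution_order(SOLUTION_XML))   # ['mosaic', 'reproject', 'clip']
```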
At this point, an executable data-computing solution is generated. A computing solution is used to provide data resources for specific data requirements, and users can save the computing solution to cope with changes in data requirements. At the same time, the computing solution can be easily shared with other users.
3.3. Design of Data-Computing Engine
The data-computing engine is designed for the execution of data-computing solutions, and its main goal is to ensure their safe and stable execution. Therefore, the core work of the data-computing engine is to accurately analyze the running-environment requirements before a computing solution is executed, and to monitor and handle exceptions in real time while the computing solution is running.
3.3.1. Generation of Data-Computing Task
To execute a data-computing solution, it is first necessary to understand the computing environment requirements for these data-processing tasks, that is, to create data-computing tasks. When the user calls the execution command of the data-computing solution in the workspace, the workspace will transfer the computing solution to the computing engine.
Firstly, the computing engine traverses all the nodes of the data-computing solution and collects the metadata of the data service corresponding to each node. The metadata of a data resource service usually include the size of the disk space occupied by the data resource, which can be used as a basis for allocating hard disk space for it. The metadata of a data-processing service usually include the hardware environment (such as the CPU architecture, memory size, etc.) and software environment (such as a required Python 3.7 installation, etc.) that the execution of the data-processing program depends on. Based on this software and hardware dependency information, the computing engine searches for suitable server nodes in the network (these nodes have already been registered in the computing engine). Server node matching is a complicated process that needs a specially designed matching algorithm. The matching algorithm scores each server node based on the software and hardware requirements of the current computing task and provides a list of candidate servers sorted by their scores. If a suitable server cannot be found, the user must manually configure the corresponding execution environment.
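A very simple version of such a matching algorithm is sketched below: missing software disqualifies a server, satisfied hardware requirements add to its score, and candidates are returned sorted by score. The requirement and server record structures, as well as the weights, are assumptions made only for illustration.

```python
# A minimal sketch of server-node matching: each registered server is scored
# against the software and hardware requirements of a computing task, and
# candidates are returned sorted by score. Record structures and weights are
# illustrative assumptions, not the framework's actual matching algorithm.
def score_server(server, requirements):
    """Missing software disqualifies; satisfied hardware needs add points."""
    score = 0
    for package in requirements.get("software", []):          # e.g., "python3.7"
        if package not in server.get("software", []):
            return 0                                           # disqualified
        score += 10
    if server.get("memory_gb", 0) >= requirements.get("memory_gb", 0):
        score += 5
    if server.get("disk_gb", 0) >= requirements.get("disk_gb", 0):
        score += 5
    return score

def candidate_servers(servers, requirements):
    scored = [(score_server(s, requirements), s["host"]) for s in servers]
    return sorted([entry for entry in scored if entry[0] > 0], reverse=True)

servers = [
    {"host": "nodeA", "software": ["python3.7"], "memory_gb": 16, "disk_gb": 500},
    {"host": "nodeB", "software": [], "memory_gb": 32, "disk_gb": 1000},
]
print(candidate_servers(servers, {"software": ["python3.7"], "memory_gb": 8, "disk_gb": 100}))
```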
Subsequently, each data-processing service will generate a data-computing task, as shown in Figure 7. The “serviceId” attribute of each “Task” node represents the data-processing service to be called by the task, and the “order” attribute represents the execution order of the task; for example, a value of “1” for “order” indicates that the task is executed first, and so on. The “Servers” node indicates the server where the current task is to run. There may be multiple “Server” nodes under the “Servers” node (that is, multiple servers meet the execution conditions of the current task). Each “Server” node has a “score” attribute that indicates how well the server node meets the computing-environment requirements of the current task; typically, the computing task will select a server node with a high score as its execution environment. The “Dependency” node indicates the software and hardware requirements of the current computing task: each “Environment” node under the “Software” node indicates a required software environment, and the “CPU”, “Memory”, “HardDisk”, and “Network” nodes under the “Hardware” node indicate the required hardware.
3.3.2. Execution of Data-Computing Tasks
After all computing tasks are generated, they can be prepared for execution, as shown in Figure 8. All computing tasks are stored in the task queue in order of priority. Computing tasks with the same priority can be executed at the same time; otherwise, the tasks with higher priority are executed first. When a task obtains the execution right, the resource files required by the computing task (such as data resources, data-processing method resources, etc.) are first synchronized from the data service container to the specified computing server. Because multiple tasks may run simultaneously on a computing server, insufficient system resources can cause the current task to fail. Therefore, before each task is executed, it is necessary to determine whether the current server resources can support its execution. If so, the computing task can be started; otherwise, the task continues to wait for an appropriate execution time.
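The scheduling rule can be sketched as follows: tasks are grouped by their priority, tasks sharing a priority may start together, and each task waits until the server's free memory and disk (checked here with psutil, which is assumed to be available) cover its declared needs. The task record fields and the polling interval are illustrative assumptions.

```python
# A minimal sketch of the scheduling rule: tasks are grouped by priority
# ("order" value), tasks sharing a priority may run together, and a task
# starts only when free memory and disk cover its declared needs (checked
# with psutil, assumed installed). Record fields are illustrative.
import itertools
import time
import psutil

def resources_sufficient(task):
    """Check whether free memory and disk cover the task's declared needs."""
    mem_ok = psutil.virtual_memory().available >= task["memory_bytes"]
    disk_ok = psutil.disk_usage("/").free >= task["disk_bytes"]
    return mem_ok and disk_ok

def run_in_priority_order(tasks, start_task):
    """Start tasks batch by batch; tasks sharing an 'order' value may run together."""
    ordered = sorted(tasks, key=lambda t: t["order"])
    for _, batch in itertools.groupby(ordered, key=lambda t: t["order"]):
        for task in batch:
            while not resources_sufficient(task):
                time.sleep(5)                 # wait for an appropriate execution time
            start_task(task)                  # e.g., spawn the execution process
```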
In order to avoid the interference between different computing tasks, this paper proposes a virtual computing container to manage the resources involved in the running of each computing task, including hardware resources such as CPU, memory, hard disk space, and related software resources such as Java runtime. The container process will monitor these software and hardware environments to ensure the correct execution of computing tasks. It also reports to the computing engine in real time when system resources are insufficient. In addition, the container process also manages the executing process and the monitoring process, with the former responsible for running computing tasks and the latter responsible for monitoring task execution.
When a data-computing task is started, its entire execution period is monitored by the monitoring process. The monitoring process determines the running status of the computing task by obtaining its status information (e.g., running, exception, finished, etc.). When a computing task encounters an exception, the monitoring process will check the running logs to determine the cause of the exception. For example, if the exception is due to insufficient system resources (such as memory, hard disk size), the monitoring process will re-run the task at the appropriate time. If the exception is caused by illegal input of a computing task, the monitoring process will directly report the exception to the user. In this case, all computing tasks that are executed after this computing task are suspended until the user reconfigures correct data for this computing task.