"Datafeeder" is geOrchestra's backend RESTful service to upload file based datasets and publish them to GeoServer and GeoNetwork in one shot.
The separate front-end UI service provides the wizard-like user interface to interact with this backend.
To integrate Datafeeder with geOrchestra's data-api, follow the sections below that match the version of the data-api you are using.
In datafeeder.properties:
datafeeder.publishing.ogcfeatures.public-url=https://${domainName}/data/ogcapi
In the JVM arguments, enable the data-api-schemas profile and set the additional configuration file:
-Dspring.profiles.active=georchestra,data-api-schemas -Dspring.config.additional-location=file:/etc/georchestra/data-api/application.yaml
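For example, a minimal sketch of launching the standalone application with those arguments (the jar name and paths are assumptions; adjust to your build and deployment):
java \
  -Dspring.profiles.active=georchestra,data-api-schemas \
  -Dspring.config.additional-location=file:/etc/georchestra/data-api/application.yaml \
  -jar datafeeder.jar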
Build the project:
To compile and run the unit and integration tests:
georchestra$ mvn clean install -f datafeeder/
Or
georchestra$ cd datafeeder
datafeeder$ mvn clean install
To build the datafeeder application and its docker image:
georchestra$ make docker-build-datafeeder
Use the following Maven properties to skip tests and/or integration tests:
-DskipTests skips both unit and integration tests
-DskipITs skips only integration tests
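For example, to build while skipping only the integration tests (unit tests still run):
georchestra$ mvn clean install -DskipITs -f datafeeder/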
For integration testing, some external services are required:
- A geOrchestra GeoServer instance
- A geOrchestra GeoNetwork instance
- A PostgreSQL database with PostGIS extension, for which we're using geOrchestra's database docker image
There is a docker composition with just the required external services in the docker-compose.yml file.
Before launching the integration tests, you will have to set up the provided docker composition by hand, as follows:
$ docker-compose -f docker-compose.yml up -d
With the services from the composition in place, run the tests as many times as needed from the IDE or the console:
$ mvn verify
The integration tests ought to be written in a way that supports multiple runs without re-initializing the external services' state (for example, by randomizing database schema names when creating a schema).
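When you're done, the composition can be torn down; the -v flag also removes its named volumes so the next run starts from a clean state:
$ docker-compose -f docker-compose.yml down -v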
Note on the database: while running from the datafeeder/integration branch, and until the work is merged to master, you'll need to build the database docker image in addition to datafeeder's, for the postgres datafeeder schema to be initialized, and to prune the database volume.
georchestra$ make docker-build-database
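If a database volume survives from a previous run, prune it so the new image re-initializes the schema; a sketch (the actual volume name depends on your composition, check docker volume ls):
$ docker-compose -f docker-compose.yml down
$ docker volume rm <database-volume-name>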
To build datafeeder's docker image:
georchestra$ make docker-build-datafeeder
georchestra$ docker images|grep datafeeder
georchestra/datafeeder 20.2-SNAPSHOT a2ca96143b9f 12 seconds ago 376MB
At this point, the service can run as a geOrchestra dockerized service, as part of its docker composition, or standalone for development purposes.
The service's REST API is defined as an OpenAPI 3 specification, and a swagger-ui user interface is provided when browsing to /import
(e.g. http://localhost:8080/import/ when running standalone, or https://georchestra.mydomain.org/import when running within the docker composition; you must log in first).
Running as a geOrchestra service requires a clone of geOrchestra's docker repository: git@github.com:georchestra/docker.git
At this time, since datafeeder is not part of the official distribution, you need to switch to the datafeeder branch, which in turn will set the config/ git submodule to the appropriate datafeeder branch, so that the config/datafeeder/datafeeder.properties file exists.
git clone git@github.com:georchestra/docker.git
git checkout datafeeder
git submodule update --init
Run geOrchestra with datafeeder integrated:
docker compose up -d
geOrchestra's "security proxy" API Gateway service has been configured to redirect all calls to /datafeeder/**
to this service.
Open https://georchestra-127-0-1-1.traefik.me/ in your browser.
Log in with these credentials:
- testuser/testuser
- testadmin/testadmin
Once logged in, datafeeder's OpenAPI test UI is available at https://georchestra-127-0-1-1.traefik.me/datafeeder
Run from within the datafeeder root folder with:
docker-compose -f docker-compose.yml up -d
mvn spring-boot:run -Dspring-boot.run.profiles=georchestra,it
or create an equivalent run configuration in your IDE with org.georchestra.datafeeder.app.DataFeederApplication as the application's main class.
Then datafeeder should start and run at http://localhost:8080/datafeeder/
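A quick smoke test to verify the standalone instance is up (should print an HTTP status code such as 200 or a redirect):
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/datafeeder/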
The following is a simple step-by-step guide to do manual testing with curl.
For a more complete API description, consult the api.yaml OpenAPI 3 definition.
You can also access the Swagger UI by browsing to http://localhost:8080/datafeeder.
The /datafeeder/upload endpoint receives a number of files, identifies which ones are geospatial datasets, starts an asynchronous analysis process, and returns the initial job state, which contains the job identifier as a UUID.
For example, given a shapefile:
wget https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip
unzip ne_10m_admin_0_countries.zip
ls *shp
ne_10m_admin_0_countries.shp
Launch the upload and analysis process with:
curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
-X POST \
http://localhost:8080/datafeeder/upload \
-F filename=@ne_10m_admin_0_countries.shp \
-F filename=@ne_10m_admin_0_countries.dbf \
-F filename=@ne_10m_admin_0_countries.shx \
-F filename=@ne_10m_admin_0_countries.prj
{
"_links" : {
"self" : {
"href" : "http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0"
}
},
"jobId" : "d1a62676-0de3-4b9d-a0f2-9a691f197cf0",
"progress" : 0.0,
"status" : "PENDING",
"datasets" : [ ]
}
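For scripting, the job identifier can be captured directly from the upload response; a sketch using jq (assumed to be installed):
JOB=$(curl -s -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' \
  -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
  -X POST http://localhost:8080/datafeeder/upload \
  -F filename=@ne_10m_admin_0_countries.shp -F filename=@ne_10m_admin_0_countries.dbf \
  -F filename=@ne_10m_admin_0_countries.shx -F filename=@ne_10m_admin_0_countries.prj \
  | jq -r .jobId)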
Then poll the job status with the returned jobId:
curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0
{
"_links" : {
"self" : {
"href" : "http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0"
}
},
"jobId" : "d1a62676-0de3-4b9d-a0f2-9a691f197cf0",
"progress" : 1.0,
"status" : "DONE",
"datasets" : [ {
"name" : "ne_10m_admin_0_countries",
"status" : "DONE",
"featureCount" : 258,
"nativeBounds" : {
"crs" : {
"srs" : "EPSG:4326",
"wkt" : "GEOGCS[\"GCS_WGS_1984\", DATUM[\"D_WGS_1984\", SPHEROID[\"WGS_1984\", 6378137.0, 298.257223563]], PRIMEM[\"Greenwich\", 0.0], UNIT[\"degree\", 0.017453292519943295], AXIS[\"Longitude\", EAST], AXIS[\"Latitude\", NORTH]]"
},
"minx" : -179.99999999999991,
"maxx" : 180.0,
"miny" : -89.99999999999994,
"maxy" : 83.63410065300008
},
"encoding" : "ISO-8859-1"
} ]
}
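To wait until the analysis completes, a small polling loop helps; a sketch reusing the $JOB variable captured above (and jq again):
until [ "$(curl -s -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' \
    -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
    http://localhost:8080/datafeeder/upload/$JOB | jq -r .status)" = "DONE" ]; do
  sleep 2
done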
The dataset's bounds can also be queried directly through the /bounds endpoint:
curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/ne_10m_admin_0_countries/bounds
{
"crs" : {
"srs" : "EPSG:4326",
"wkt" : "GEOGCS[\"GCS_WGS_1984\", DATUM[\"D_WGS_1984\", SPHEROID[\"WGS_1984\", 6378137.0, 298.257223563]], PRIMEM[\"Greenwich\", 0.0], UNIT[\"degree\", 0.017453292519943295], AXIS[\"Longitude\", EAST], AXIS[\"Latitude\", NORTH]]"
},
"minx" : -179.99999999999991,
"maxx" : 180.0,
"miny" : -89.99999999999994,
"maxy" : 83.63410065300008
Use a GET request on /datafeeder/upload/{jobId}/{typeName}/sampleFeature to get a sample GeoJSON feature from an uploaded dataset:
curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/ne_10m_admin_0_countries/sampleFeature
{
"type": "Feature",
"crs": {
"type": "name",
"properties": {"name": "EPSG:4326"}
},
"bbox": [95.01270592500003,-10.922621351999908,140.97762699400005,5.910101630000042],
"geometry": {
"type": "MultiPolygon",
"coordinates": [
[[[117.7036,4.1634], ...,[127.1304,4.7744]]]
]
},
"properties": {
"featurecla": "Admin-0 country",
"scalerank": 0,
"LABELRANK": 2,
"NAME_AR": "Ø¥Ù\u0086دÙ\u0088Ù\u0086Ù\u008AسÙ\u008Aا",
"NAME_BN": "à¦\u0087নà§\u008Dদà§\u008Bনà§\u0087শিয়া",
"NAME_DE": "Indonesien",
"NAME_EN": "Indonesia",
"NAME_ES": "Indonesia",
"NAME_FA": "اÙ\u0086دÙ\u0088Ù\u0086زÛ\u008C",
"NAME_FR": "Indonésie",
"NAME_EL": "Î\u0099νδονηÏ\u0083ία",
"NAME_HE": "×\u0090×\u0099× ×\u0093×\u0095× ×\u0096×\u0099×\u0094",
"NAME_HI": "à¤\u0087à¤\u0082डà¥\u008Bनà¥\u0087शिया",
"NAME_HU": "Indonézia",
"NAME_ID": "Indonesia",
"NAME_IT": "Indonesia",
"NAME_JA": "ã\u0082¤ã\u0083³ã\u0083\u0089ã\u0083\u008Dã\u0082·ã\u0082¢",
"NAME_KO": "ì\u009D¸ë\u008F\u0084ë\u0084¤ì\u008B\u009Cì\u0095\u0084",
"NAME_NL": "Indonesië",
"NAME_PL": "Indonezja",
"NAME_PT": "Indonésia",
"NAME_RU": "Ð\u0098ндонезиÑ\u008F",
"NAME_SV": "Indonesien",
"NAME_TR": "Endonezya",
"NAME_UK": "Ð\u0086ндонезÑ\u0096Ñ\u008F",
"NAME_UR": "اÙ\u0086Ú\u0088Ù\u0088Ù\u0086Û\u008CØ´Û\u008Cا",
"NAME_VI": "Indonesia",
"NAME_ZH": "å\u008D°åº¦å°¼è¥¿äº\u009A",
"NAME_ZHT": "å\u008D°åº¦å°¼è¥¿äº\u009E"
},
"id": "ne_10m_admin_0_countries.1"
}
Note the default shapefile encoding, ISO-8859-1, is not appropriate for the values (the sample property NAME_JA is messed up). The above query can receive an encoding query parameter to indicate how to interpret the shapefile's alphanumeric data:
curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/ne_10m_admin_0_countries/sampleFeature?encoding=UTF-8
{
"type": "Feature",
"crs": {
"type": "name",
"properties": {"name": "EPSG:4326"}
},
"bbox": [95.01270592500003,-10.922621351999908,140.97762699400005,5.910101630000042],
"geometry": {
"type": "MultiPolygon",
"coordinates": [
[[[117.7036,4.1634], ...,[127.1304,4.7744]]]
]
},
"properties": {
"featurecla": "Admin-0 country",
"scalerank": 0,
"LABELRANK": 2,
"NAME_AR" : "إندونيسيا",
"NAME_BN" : "ইন্দোনেশিয়া",
"NAME_CIAWF" : "Indonesia",
"NAME_DE" : "Indonesien",
"NAME_EL" : "Ινδονησία",
"NAME_EN" : "Indonesia",
"NAME_ES" : "Indonesia",
"NAME_FA" : "اندونزی",
"NAME_FR" : "Indonésie",
"NAME_HE" : "אינדונזיה",
"NAME_HI" : "इंडोनेशिया",
"NAME_HU" : "Indonézia",
"NAME_ID" : "Indonesia",
"NAME_IT" : "Indonesia",
"NAME_JA" : "インドネシア",
"NAME_KO" : "인도네시아",
"NAME_LEN" : 9,
"NAME_LONG" : "Indonesia",
"NAME_NL" : "Indonesië",
"NAME_PL" : "Indonezja",
"NAME_PT" : "Indonésia",
"NAME_RU" : "Индонезия",
"NAME_SORT" : "Indonesia",
"NAME_SV" : "Indonesien",
"NAME_TR" : "Endonezya",
"NAME_UK" : "Індонезія",
"NAME_UR" : "انڈونیشیا",
"NAME_VI" : "Indonesia",
"NAME_ZH" : "印度尼西亚",
"NAME_ZHT" : "印度尼西亞"
},
"id": "ne_10m_admin_0_countries.1"
}
Launching the dataset publishing process and querying/polling its status is performed through the /datafeeder/upload/{jobId}/publish endpoint.
At any point you can query the publishing status with a GET request. For example:
curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish
{
"_links" : {
"self" : {
"href" : "http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish"
}
},
"datasets" : [
{
"nativeName" : "ne_10m_admin_0_countries",
"progress" : 0,
"progressStep" : "SKIPPED",
"publish" : false,
"status" : "PENDING"
}
],
"jobId" : "d1a62676-0de3-4b9d-a0f2-9a691f197cf0",
"progress" : 0,
"status" : "PENDING"
}
After upload, the publish status will be PENDING, as in the sample response above.
To launch the publishing process, you'll need to POST to that same URL a JSON request body that matches the DatasetPublishRequest defined in the api.yaml OpenAPI 3 spec, for example:
curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
-X POST -H "Content-Type: application/json" \
http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish \
-d '{
"datasets": [
{
"encoding": "UTF-8",
"metadata": {
"title": "Include dataset title for the metadata record",
"abstract": "Include some dataset description to be used on the metadata record",
"creationDate": "2022-10-14",
"creationProcessDescription": "",
"scale": 25000,
"tags": [
"tag1", "tag2"
]
},
"nativeName": "ne_10m_admin_0_countries",
"publishedName": "ne_10m_admin_0_countries",
"srs": "EPSG:4326",
"srs_reproject": false
}
]
}'
Where the only mandatory fields are:
- datasets[].nativeName
- datasets[].metadata.title
- datasets[].metadata.abstract
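A minimal publish request containing only the mandatory fields would look like this sketch:
curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
  -X POST -H "Content-Type: application/json" \
  http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish \
  -d '{"datasets": [{"nativeName": "ne_10m_admin_0_countries", "metadata": {"title": "My title", "abstract": "My abstract"}}]}'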
Once the publishing process is launched, poll its status. If the process hasn't completed, it'll return a "status": "RUNNING" (or "SCHEDULED").
curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish
{
"_links" : {
"self" : {
"href" : "http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish"
}
},
"datasets" : [
{
"nativeName" : "ne_10m_admin_0_countries",
"progress" : 0,
"progressStep" : "SCHEDULED",
"publish" : true,
"publishedName" : "ne_10m_admin_0_countries",
"status" : "PENDING",
"title" : "Include dataset title for the metadata record"
}
],
"jobId" : "d1a62676-0de3-4b9d-a0f2-9a691f197cf0",
"progress" : 0,
"status" : "RUNNING"
}
Once the process finishes, the complete state will be returned:
curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish
{
"_links" : {
"self" : {
"href" : "http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish"
}
},
"jobId" : "d1a62676-0de3-4b9d-a0f2-9a691f197cf0",
"progress" : 1.0,
"status" : "DONE",
"datasets" : [ {
"_links" : {
"service" : [ {
"href" : "https://georchestra.mydomain.org/geoserver/test/wms?",
"title" : "Web Map Service entry point where the layer is published",
"name" : "WMS"
}, {
"href" : "https://georchestra.mydomain.org/geoserver/test/wfs?",
"title" : "Web Feature Service entry point where the layer is published",
"name" : "WFS"
} ],
"preview" : {
"href" : "https://georchestra.mydomain.org/geoserver/test/wms/reflect?LAYERS=ne_10m_admin_0_countries&width=800&format=application/openlayers",
"title" : "OpenLayers preview page for the layer published in GeoServer",
"type" : "application/openlayers",
"name" : "openlayers"
},
"describedBy" : [ {
"href" : "https://georchestra.mydomain.org/geonetwork/srv/api/0.1/records/ab798d02-684f-4874-a2d8-8be14bfbb718/formatters/xml",
"title" : "Metadata record XML representation",
"type" : "application/xml",
"name" : "metadata"
}, {
"href" : "https://georchestra.mydomain.org/geonetwork/srv/eng/catalog.search#/metadata/ab798d02-684f-4874-a2d8-8be14bfbb718",
"title" : "Metadata record web page",
"type" : "text/html",
"name" : "metadata"
} ]
},
"nativeName" : "ne_10m_admin_0_countries",
"publishedWorkspace" : "test",
"publishedName" : "ne_10m_admin_0_countries",
"metadataRecordId" : "ab798d02-684f-4874-a2d8-8be14bfbb718",
"title" : "Include dataset title for the metadata record",
"status" : "DONE",
"publish" : true,
"progress" : 1.0,
"progressStep" : "COMPLETED"
} ]
}
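The returned links are handy for scripting; for example, extracting the GeoServer preview URL with jq (assumed to be installed):
curl -s -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
  http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish \
  | jq -r '.datasets[0]._links.preview.href'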
Spring Mail is used to send notifications when jobs start, finish, or fail, with the following dependency:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-mail</artifactId>
</dependency>
The geOrchestra datadir's datafeeder/datafeeder.properties contains the SMTP configuration properties, such as:
spring.mail.host=${smtpHost}
spring.mail.port=${smtpPort}
spring.mail.username:
spring.mail.password:
spring.mail.protocol: smtp
spring.mail.test-connection: true
spring.mail.properties.mail.smtp.auth: false
spring.mail.properties.mail.smtp.starttls.enable: false
If these configuration properties are not provided, the application simply won't send emails (see DataFeederNotificationsAutoConfiguration and GeorchestraNotificationsAutoConfiguration).
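For local testing of these notifications, a disposable SMTP server such as MailHog can be used (not part of the geOrchestra composition; just a convenient assumption for development):
$ docker run -d -p 1025:1025 -p 8025:8025 mailhog/mailhog
Then point spring.mail.host to localhost and spring.mail.port to 1025, and inspect the sent messages in the web UI at http://localhost:8025.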
A message template consists of a number of one-line header parts, followed by the full message body.
Variables in a message template are specified using ${variable-name} notation.
These variable names can be any of the ones defined below, or an application context property. In either case, the property must exist or the application will fail to start up.
Note that the contents of the message templates can be modified while the application is running, and the changes will be picked up the next time an email is sent. However, be careful to use valid property names: validation of these properties occurs only during application start-up.
Here's a small template example:
to: ${user.email}
cc: ${administratorEmail}
bcc:
sender: ${administratorEmail}
from: Georchestra Importer Application
subject:
body:
Dear ${user.name},
....
The following variables are resolved against the job's user, dataset, or publishing attributes:
- ${user.name}
- ${user.lastName}
- ${user.email}
- ${job.id}
- ${job.createdAt}
- ${job.error}
- ${job.analizeStatus}
- ${job.publishStatus}
- ${dataset.name}
- ${dataset.featureCount}
- ${dataset.encoding}
- ${dataset.nativeBounds}
- ${publish.tableName}
- ${publish.layerName}
- ${publish.workspace}
- ${publish.srs}
- ${publish.encoding}
- ${metadata.id}
- ${metadata.title}
- ${metadata.abstract}
- ${metadata.creationDate}
- ${metadata.lineage}
- ${metadata.latLonBoundingBox}
- ${metadata.keywords}
- ${metadata.scale}
Additionally, any other ${property} will be resolved against the application context (for example, any property specified in default.properties or datafeeder.properties).
To publish a (CSV) job:
curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
  -X POST -H "Content-Type: application/json" \
  'http://localhost:8080/datafeeder/upload/cbc00cdd-5e5d-4c40-aa76-ea44998a66ec/publish' \
  -d '{
"datasets": [
{
"nativeName": "covoit-mel",
"srs": "EPSG:4326",
"metadata": {
"title": "My dataset title",
"abstract": "Description of the dataset",
"tags": [
"Soil"
],
"creationDate": "2024-01-16T13:07:37.094Z",
"scale": 10000,
"creationProcessDescription": "My creation process description"
}
}
]
}'