8000 GitHub - georchestra/datafeeder
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

georchestra/datafeeder

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

geOrchestra Datafeeder

geOrchestra Logo

Import and share SHP or CSV with ease.

Build Status License

Overview

"Datafeeder" is geOrchestra's backend RESTful service to upload file based datasets and publish them to GeoServer and GeoNetwork in one shot.

The separate front-end UI service provides the wizard-like user interface to interact with this backend.

Add georchestra/data-api

In order to integrate the Datafeeder with geOrchestra's data-api, you will have to follow the next sections, depending on the version of the data-api you are using.

For all data-api versions

In datafeeder.properties:

datafeeder.publishing.ogcfeatures.public-url=https://${domainName}/data/ogcapi

For data-api version >= 1.3:

JVM arguments, enable data-api-schemas profile and set the additional configuration file:

-Dspring.profiles.active=georchestra,data-api-schemas -Dspring.config.additional-location=file:/etc/georchestra/data-api/application.yaml

Building

Build the project:

To compile and run the unit and integration tests:

georchestra$ mvn clean install -f datafeeder/

Or

georchestra$ cd datafeeder
datafeeder$ mvn clean install

To build the datafeeder application and its docker image:

georchestra$ make docker-build-datafeeder

Skipping tests

Use the following maven properties to skip tests and/or integration tests:

  • -DskipTests skips both unit and integration tests
  • -DskipITs skips only integration tests

Integration testing: docker compose

For integration testing, some external services are required:

  • A geOrchestra GeoServer instance
  • A geOrchestra GeoNetwork instance
  • A PostgreSQL database with PostGIS extension, for which we're using geOrchestra's database docker image

There is a docker composition with just the required external services in the docker-compose.yml file.

Prior to launch the integration tests, you will have to set up the provided docker composition by hand, as follows:

$ docker-compose -f docker-compose.yml up -d

With the services from the composition in place, run the tests as many times as needed from the IDE or the console:

$ mvn verify

The integration tests ought to be written in a way that support multiple runs without re-initializing the external services state (for example, randomizing database schema names when going to create a schema and such).

Build the docker image:

database: while running from the datafeeder/integration branch, and until the work is merged to master, you'll need to build the database docker image in addition to datafeeder's, for the postgres datafeeder schema to be initialized, and prune the database volume.

georchestra$ make docker-build-database

To build datafeeder's docker image:

georchestra$ make docker-build-datafeeder
georchestra$ docker images|grep datafeeder
georchestra/datafeeder         20.2-SNAPSHOT       a2ca96143b9f        12 seconds ago      376MB

Running

At this point, the service can run as a geOrchestra dockerized service, as part of its docker composition, or standalone for development purposes.

With the service's REST API being defined as an Open API 3 specification, a swagger-ui user interface is provided when browsing to /import (e.g. http://localhost:8080/import/ when running standalone, https://georchestra.mydomain.org/import when running within the docker composition - must log in first - )

Georchestra environment

Checkout the datafeeder branch in docker:

Running as a geOrchestra service requires a clone of geOrchestra's docker repository git@github.com:georchestra/docker.git.

At this time, being not part of the official distribution, you need to switch to the datafeeder branch, which in turn will set the git config/ submodule to the appropriate datafeeder branch, so that the config/datafeeder/datafeder.properties file exists.

git clone git@github.com:georchestra/docker.git
git checkout datafeeder
git submodule update --init

Run geOrchestra docker composition

Run geOrchestra with datafeeder integrated:

docker compose up -d

geOrchestra's "security proxy" API Gateway service has been configured to redirect all calls to /datafeeder/** to this service.

Open https://georchestra-127-0-1-1.traefik.me/ in your browser.

Log in with these credentials:

  • testuser / testuser
  • testadmin / testadmin

Once logged in, datafeeder's OpenAPI test UI is available at https://georchestra-127-0-1-1.traefik.me/datafeeder

Standalone

Run from within the datafeeder root folder with:

docker-compose -f docker-compose.yml up -d
mvn spring-boot:run -Dspring-boot.run.profiles=georchestra,it

or create an equivalent run configuration in your IDE with org.georchestra.datafeeder.app.DataFeederApplication as the application's main class.

Then datafeeder should start and run at http://localhost:8080/datafeeder/

Manual testing

The following is a simple step by step guide to do manual testing with curl.

For a more complete API description, consult the api.yaml OpenAPI 3 definition.

You can also access the Swagger UI by browsing to http://localhost:8080/datafeeder.

Upload dataset

The /datafeeder/upload endpoint receives a number of files, identifies which ones are geospatial datasets, starts up an asynchronous analysis process, and returns the initial job state where to get the job identifier as a UUID.

For example, given a shapefile:

wget https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip
unzip ne_10m_admin_0_countries.zip
ls *shp
ne_10m_admin_0_countries.shp

Launch the upload and analysis process with:

curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
 -X POST \
 http://localhost:8080/datafeeder/upload \
 -F filename=@ne_10m_admin_0_countries.shp \
 -F filename=@ne_10m_admin_0_countries.dbf \
 -F filename=@ne_10m_admin_0_countries.shx \
 -F filename=@ne_10m_admin_0_countries.prj
 
{
  "_links" : {
    "self" : {
      "href" : "http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0"
    }
  },
  "jobId" : "d1a62676-0de3-4b9d-a0f2-9a691f197cf0",
  "progress" : 0.0,
  "status" : "PENDING",
  "datasets" : [ ]
}

Poll updload status

Then poll the job status with the returned jobId:

curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \\
 http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0
{
  "_links" : {
    "self" : {
      "href" : "http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0"
    }
  },
  "jobId" : "d1a62676-0de3-4b9d-a0f2-9a691f197cf0",
  "progress" : 1.0,
  "status" : "DONE",
  "datasets" : [ {
    "name" : "ne_10m_admin_0_countries",
    "status" : "DONE",
    "featureCount" : 258,
    "nativeBounds" : {
      "crs" : {
        "srs" : "EPSG:4326",
        "wkt" : "GEOGCS[\"GCS_WGS_1984\", DATUM[\"D_WGS_1984\", SPHEROID[\"WGS_1984\", 6378137.0, 298.257223563]], PRIMEM[\"Greenwich\", 0.0], UNIT[\"degree\", 0.017453292519943295], AXIS[\"Longitude\", EAST], AXIS[\"Latitude\", NORTH]]"
      },
      "minx" : -179.99999999999991,
      "maxx" : 180.0,
      "miny" : -89.99999999999994,
      "maxy" : 83.63410065300008
    },
    "encoding" : "ISO-8859-1"
  } ]
}

Query dataset bounds

curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR'  http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/ne_10m_admin_0_countries/bounds
{
  "crs" : {
    "srs" : "EPSG:4326",
    "wkt" : "GEOGCS[\"GCS_WGS_1984\", DATUM[\"D_WGS_1984\", SPHEROID[\"WGS_1984\", 6378137.0, 298.257223563]], PRIMEM[\"Greenwich\", 0.0], UNIT[\"degree\", 0.017453292519943295], AXIS[\"Longitude\", EAST], AXIS[\"Latitude\", NORTH]]"
  },
  "minx" : -179.99999999999991,
  "maxx" : 180.0,
  "miny" : -89.99999999999994,
  "maxy" : 83.63410065300008

Query sample features

Use a GET request on /datafeeder/updload/{id}/{typeName}/sampleFeature to get a sample GeoJSON feature from an uploaded dataset:

curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR'  \
 http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/ne_10m_admin_0_countries/sampleFeature
{
    "type": "Feature",
    "crs": {
        "type": "name",
        "properties": {"name": "EPSG:4326"}
    },
    "bbox": [95.01270592500003,-10.922621351999908,140.97762699400005,5.910101630000042],
    "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
            [[[117.7036,4.1634], ...,[127.1304,4.7744]]]
        ]
    },
    "properties": {
        "featurecla": "Admin-0 country",
        "scalerank": 0,
        "LABELRANK": 2,
        "NAME_AR": "Ø¥Ù\u0086دÙ\u0088Ù\u0086Ù\u008AسÙ\u008Aا",
        "NAME_BN": "à¦\u0087নà§\u008Dদà§\u008Bনà§\u0087শিয়া",
        "NAME_DE": "Indonesien",
        "NAME_EN": "Indonesia",
        "NAME_ES": "Indonesia",
        "NAME_FA": "اÙ\u0086دÙ\u0088Ù\u0086زÛ\u008C",
        "NAME_FR": "Indonésie",
        "NAME_EL": "Î\u0099νδονηÏ\u0083ία",
        "NAME_HE": "×\u0090×\u0099× ×\u0093×\u0095× ×\u0096×\u0099×\u0094",
        "NAME_HI": "à¤\u0087à¤\u0082डà¥\u008Bनà¥\u0087शिया",
        "NAME_HU": "Indonézia",
        "NAME_ID": "Indonesia",
        "NAME_IT": "Indonesia",
        "NAME_JA": "ã\u0082¤ã\u0083³ã\u0083\u0089ã\u0083\u008Dã\u0082·ã\u0082¢",
        "NAME_KO": "ì\u009D¸ë\u008F\u0084ë\u0084¤ì\u008B\u009Cì\u0095\u0084",
        "NAME_NL": "Indonesië",
        "NAME_PL": "Indonezja",
        "NAME_PT": "Indonésia",
        "NAME_RU": "Ð\u0098ндонезиÑ\u008F",
        "NAME_SV": "Indonesien",
        "NAME_TR": "Endonezya",
        "NAME_UK": "Ð\u0086ндонезÑ\u0096Ñ\u008F",
        "NAME_UR": "اÙ\u0086Ú\u0088Ù\u0088Ù\u0086Û\u008CØ´Û\u008Cا",
        "NAME_VI": "Indonesia",
        "NAME_ZH": "å\u008D°åº¦å°¼è¥¿äº\u009A",
        "NAME_ZHT": "å\u008D°åº¦å°¼è¥¿äº\u009E"
    },
    "id": "ne_10m_admin_0_countries.1"
}

Note the default shapefile "encoding", ISO-8859-1, is not appropriate for the values (the sample property NAME_JA is messed up). The above query can rec 8000 eive an encoding query parameter to indicate how to interpret the shapefile's alphanumeric data:

curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR'  \
 http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/ne_10m_admin_0_countries/sampleFeature?encoding=UTF-8
{
    "type": "Feature",
    "crs": {
        "type": "name",
        "properties": {"name": "EPSG:4326"}
    },
    "bbox": [95.01270592500003,-10.922621351999908,140.97762699400005,5.910101630000042],
    "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
            [[[117.7036,4.1634], ...,[127.1304,4.7744]]]
        ]
    },
    "properties": {
        "featurecla": "Admin-0 country",
        "scalerank": 0,
        "LABELRANK": 2,
      "NAME_AR" : "إندونيسيا",
      "NAME_BN" : "ইন্দোনেশিয়া",
      "NAME_CIAWF" : "Indonesia",
      "NAME_DE" : "Indonesien",
      "NAME_EL" : "Ινδονησία",
      "NAME_EN" : "Indonesia",
      "NAME_ES" : "Indonesia",
      "NAME_FA" : "اندونزی",
      "NAME_FR" : "Indonésie",
      "NAME_HE" : "אינדונזיה",
      "NAME_HI" : "इंडोनेशिया",
      "NAME_HU" : "Indonézia",
      "NAME_ID" : "Indonesia",
      "NAME_IT" : "Indonesia",
      "NAME_JA" : "インドネシア",
      "NAME_KO" : "인도네시아",
      "NAME_LEN" : 9,
      "NAME_LONG" : "Indonesia",
      "NAME_NL" : "Indonesië",
      "NAME_PL" : "Indonezja",
      "NAME_PT" : "Indonésia",
      "NAME_RU" : "Индонезия",
      "NAME_SORT" : "Indonesia",
      "NAME_SV" : "Indonesien",
      "NAME_TR" : "Endonezya",
      "NAME_UK" : "Індонезія",
      "NAME_UR" : "انڈونیشیا",
      "NAME_VI" : "Indonesia",
      "NAME_ZH" : "印度尼西亚",
      "NAME_ZHT" : "印度尼西亞"
    },
    "id": "ne_10m_admin_0_countries.1"
}

Publish uploaded dataset

Launching the dataset publishing process and querying/polling its status is performed through the /datafeeder/update/{jobId}/publish endpoint.

Query/poll publishing process status

At any point you can query the publishing status with a GET request. For example:

curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
 http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish

{
   "_links" : {
      "self" : {
         "href" : "http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish"
      }
   },
   "datasets" : [
      {
         "nativeName" : "ne_10m_admin_0_countries",
         "progress" : 0,
         "progressStep" : "SKIPPED",
         "publish" : false,
         "status" : "PENDING"
      }
   ],
   "jobId" : "d1a62676-0de3-4b9d-a0f2-9a691f197cf0",
   "progress" : 0,
   "status" : "PENDING"
}

After upload, the publish status will be PENDING, as in the sample response above.

Launch the publishing process

To launch the publishing process, you'll need to POST to that same URL with a JSON request body that matches the DatasetPublishRequest defined in the api.yaml OpenAPI 3 spec, for example:

curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
 -X POST -H "Content-Type: application/json" \
 http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish \
 -d '{
  "datasets": [
    {    
      "encoding": "UTF-8",    
      "metadata": {    
        "title": "Include dataset title for the metadata record",    
        "abstract": "Include some dataset description to be used on the metadata record",    
        "creationDate": "2022-10-14",    
        "creationProcessDescription": "",    
        "scale": 25000,    
        "tags": [    
          "tag1", "tag2"    
        ]    
      },    
      "nativeName": "ne_10m_admin_0_countries",    
      "publishedName": "ne_10m_admin_0_countries",    
      "srs": "EPSG:4326",    
      "srs_reproject": false    
    }    
  ]    
}'

Where the only mandatory fields are:

  • datasets[].nativeName
  • datasets[].metadata.title
  • datasets[].medatada.abstract

Once the publishing process is launched, poll its status. If the process hasn't completed, it'll return a "status": "RUNNING" (or "SCHEDULED").

curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' \
 http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish

{
   "_links" : {
      "self" : {
         "href" : "http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish"
      }
   },
   "datasets" : [
      {
         "nativeName" : "ne_10m_admin_0_countries",
         "progress" : 0,
         "progressStep" : "SCHEDULED",
         "publish" : true,
         "publishedName" : "ne_10m_admin_0_countries",
         "status" : "PENDING",
         "title" : "Include dataset title for the metadata record"
      }
   ],
   "jobId" : "d1a62676-0de3-4b9d-a0f2-9a691f197cf0",
   "progress" : 0,
   "status" : "RUNNING"
}

Once the process finishes, the complete state will be returned:

curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR'  http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish
{
  "_links" : {
    "self" : {
      "href" : "http://localhost:8080/datafeeder/upload/d1a62676-0de3-4b9d-a0f2-9a691f197cf0/publish"
    }
  },
  "jobId" : "d1a62676-0de3-4b9d-a0f2-9a691f197cf0",
  "progress" : 1.0,
  "status" : "DONE",
  "datasets" : [ {
    "_links" : {
      "service" : [ {
        "href" : "https://georchestra.mydomain.org/geoserver/test/wms?",
        "title" : "Web Map Service entry point where the layer is published",
        "name" : "WMS"
      }, {
        "href" : "https://georchestra.mydomain.org/geoserver/test/wfs?",
        "title" : "Web Feature Service entry point where the layer is published",
        "name" : "WFS"
      } ],
      "preview" : {
        "href" : "https://georchestra.mydomain.org/geoserver/test/wms/reflect?LAYERS=ne_10m_admin_0_countries&width=800&format=application/openlayers",
        "title" : "OpenLayers preview page for the layer published in GeoServer",
        "type" : "application/openlayers",
        "name" : "openlayers"
      },
      "describedBy" : [ {
        "href" : "https://georchestra.mydomain.org/geonetwork/srv/api/0.1/records/ab798d02-684f-4874-a2d8-8be14bfbb718/formatters/xml",
        "title" : "Metadata record XML representation",
        "type" : "application/xml",
        "name" : "metadata"
      }, {
        "href" : "https://georchestra.mydomain.org/geonetwork/srv/eng/catalog.search#/metadata/ab798d02-684f-4874-a2d8-8be14bfbb718",
        "title" : "Metadata record web page",
        "type" : "text/html",
        "name" : "metadata"
      } ]
    },
    "nativeName" : "ne_10m_admin_0_countries",
    "publishedWorkspace" : "test",
    "publishedName" : "ne_10m_admin_0_countries",
    "metadataRecordId" : "ab798d02-684f-4874-a2d8-8be14bfbb718",
    "title" : "Include dataset title for the metadata record",
    "status" : "DONE",
    "publish" : true,
    "progress" : 1.0,
    "progressStep" : "COMPLETED"
  } ]
}

Developer's corner

Email notifications

Spring mail is used to send notifications when jobs start, finish, or fail; with the following dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-mail</artifactId>
</dependency>

The geOrchestra datadir's datafeeder/datafeeder.properties contains the SMTP configuration properties, like:

spring.mail.host=${smtpHost}
spring.mail.port=${smtpPort}
spring.mail.username: 
spring.mail.password: 
spring.mail.protocol: smtp
spring.mail.test-connection: true
spring.mail.properties.mail.smtp.auth: false
spring.mail.properties.mail.smtp.starttls.enable: false

If theese configuration properties are not provided, the application simply won't send emails (see DataFeederNotificationsAutoConfiguration and GeorchestraNotificationsAutoConfiguration).

Email message templates

A message template consists of a number of one-line header parts, followed by the full message body.

Variables on a message template are specified using ${variable-name} notation. These variable names can be any of the ones defined below, or an application context property. In any case, the property must exist or the application will fail to start up.

Note the contents of the message templates can be modified while the application is running, and the changes will be picked up the next time an email is to be sent. However, be careful of using valid property names, validation of such properties occurs only during application start-up.

Here's a small template example:

to: ${user.email}
cc: ${administratorEmail}
bcc:
sender: ${administratorEmail}
from: Georchestra Importer Application
subject:
body:

Dear ${user.name}, 
....

The following variables are resolved against the job's user, dataset, or publishing attributes:

  • ${user.name}:
  • ${user.lastName}:
  • ${user.email}:
  • ${job.id}:
  • ${job.createdAt}:
  • ${job.error}:
  • ${job.analizeStatus}:
  • ${job.publishStatus}:
  • ${dataset.name}:
  • ${dataset.featureCount}:
  • ${dataset.encoding}:
  • ${dataset.nativeBounds}:
  • ${publish.tableName}:
  • ${publish.layerName}:
  • ${publish.workspace}:
  • ${publish.srs}:
  • ${publish.encoding}:
  • ${metadata.id}:
  • ${metadata.title}:
  • ${metadata.abstract}:
  • ${metadata.creationDate}:
  • ${metadata.lineage}:
  • ${metadata.latLonBoundingBox}:
  • ${metadata.keywords}:
  • ${metadata.scale}:

Additionally, any other ${property} will be resolved against the application context (for example, any property specified in default.properties or datafeeder.properties).

Payload

To publish a (CSV) job:

curl -H 'sec-proxy: true' -H 'sec-org: test' -H 'sec-orgname: Test Org' -H 'sec-username: testadmin' -H 'sec-roles: ROLE_ADMINISTRATOR' 'https://localhost:8080/datafeeder/upload/cbc00cdd-5e5d-4c40-aa76-ea44998a66ec/publish'
{
  "datasets": [
    {
      "nativeName": "covoit-mel",
      "srs": "EPSG:4326",
      "metadata": {
        "title": "My dataset title",
        "abstract": "Description of the dataset",
        "tags": [
          "Soil"
        ],
        "creationDate": "2024-01-16T13:07:37.094Z",
        "scale": 10000,
        "creationProcessDescription": "My creation process description"
      }
    }
  ]
}

About

No description, website, or topics provided.

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

No packages published

Contributors 12

Languages

0