The ROR API allows retrieving, searching and filtering the organizations indexed in ROR. The results are returned in JSON. See https://ror.readme.io for documentation.
This API also includes commands for indexing ROR data, generating new ROR IDs, and performing other internal operations.
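For example, a search against the hosted API returns matching organizations as JSON. The snippet below is a minimal sketch using the Python requests package against the public https://api.ror.org host; field names follow the v1 schema documented at https://ror.readme.io.

```python
import requests

# Query the organizations endpoint; the response is JSON with a result count
# and a page of matching records.
response = requests.get(
    "https://api.ror.org/organizations",
    params={"query": "oxford"},
    timeout=30,
)
response.raise_for_status()
data = response.json()

print(data["number_of_results"])
for org in data["items"]:
    print(org["id"], org["name"])
```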
- Install Docker Desktop
- Clone this project locally
- Create a .env file in the root of your local ror-api repo with the following values:

  ```
  ELASTIC_HOST=elasticsearch
  ELASTIC_PORT=9200
  ELASTIC_PASSWORD=changeme
  ROR_BASE_URL=http://localhost
  GITHUB_TOKEN=[GITHUB TOKEN]
  AWS_SECRET_ACCESS_KEY=[AWS SECRET ACCESS KEY]
  AWS_ACCESS_KEY_ID=[AWS ACCESS KEY ID]
  DATA_STORE=data.dev.ror.org
  ROUTE_USER=[USER]
  TOKEN=[TOKEN]
  ```
  ROR staff should replace the values in [ ] with valid credentials. External users who only wish to run the API locally do not need to add these values, as they are used only for management functionality.
- Optionally, uncomment line 24 in docker-compose.yml to pull the rorapi image from Docker Hub rather than building it from local code
- Start Docker Desktop
- In the project directory, run docker-compose to start all services:

  ```
  docker-compose up -d
  ```
- Index the latest ROR dataset from https://github.com/ror-community/ror-data:

  ```
  docker-compose exec web python manage.py setup v1.0-2022-03-17-ror-data
  ```

  Note: You must specify a dataset that exists in ror-data. A quick check that the new index is serving data is sketched after this list.
- Optionally, start other services, such as ror-app (the search UI) or generate-id (middleware microservice)
- Optionally, run tests:

  ```
  docker-compose exec web python manage.py test rorapi.tests
  docker-compose exec web python manage.py test rorapi.tests_integration
  docker-compose exec web python manage.py test rorapi.tests_functional
  ```
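Once the dataset is indexed, the local API should return results immediately. The check below is a minimal sketch; it assumes the API is published on port 9292 (the port used in the indexdata route example further down) and that the Python requests package is available.

```python
import requests

# Ask the locally running API for a page of organizations to confirm that
# the setup command populated the index.
resp = requests.get(
    "http://localhost:9292/organizations",
    params={"query": "university"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["number_of_results"], "matching records")
```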
Management command `indexror` downloads new/updated records from a specified AWS S3 bucket/directory and indexes them into an existing index. It is used in the data deployment process managed in ror-records. The command is triggered by GitHub Actions, but can also be run manually. See the ror-records readme for complete deployment process details.
- Create a .env file with values for DATA_STORE, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- In the project directory, run docker-compose to start all services:

  ```
  docker-compose up -d
  ```
- Index the latest ROR dataset from https://github.com/ror-community/ror-data:

  ```
  docker-compose exec web python manage.py setup v1.0-2022-03-17-ror-data
  ```

  Note: You must specify a dataset that exists in ror-data
- Add new/updated record files to a directory in the S3 bucket as files.zip. GitHub Actions in dev-ror-records can be used to automatically push files to the DEV S3 bucket.
- Index files for new/updated records from a directory in an S3 bucket (a rough sketch of what the command does follows this list).

  Through the route:

  ```
  curl -H "Token: <<token value>>" -H "Route-User: <<value>>" http://localhost:9292/indexdata/<<directory in S3 bucket>>
  ```

  Through the CLI:

  ```
  docker-compose exec web python manage.py indexror <<directory in S3 bucket>>
  ```
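Conceptually, `indexror` pulls a files.zip from the configured S3 data store directory and writes each record into the existing index. The sketch below only illustrates that flow; the boto3/elasticsearch-py client usage, the "organizations" index name, and the function itself are assumptions, not the actual implementation.

```python
import io
import json
import zipfile

import boto3
from elasticsearch import Elasticsearch

def index_directory(bucket: str, directory: str, index: str = "organizations"):
    """Download <directory>/files.zip from S3 and index every record it contains."""
    s3 = boto3.client("s3")
    es = Elasticsearch("http://elasticsearch:9200")

    # Fetch the zip archive of new/updated ROR records for this release directory.
    obj = s3.get_object(Bucket=bucket, Key=f"{directory}/files.zip")
    archive = zipfile.ZipFile(io.BytesIO(obj["Body"].read()))

    # Each member is a single ROR record in JSON; write it into the existing
    # index using its ROR ID as the document ID.
    for name in archive.namelist():
        if not name.endswith(".json"):
            continue
        record = json.loads(archive.read(name))
        es.index(index=index, id=record["id"], document=record)
```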
Management command `indexrordump` downloads and indexes a full ROR data dump. It is not used as part of the normal data deployment process; it is used when developing locally or when restoring a remote environment to a specific data dump.
To delete the existing index, create a new index, and index a data dump:

LOCALHOST: Run

```
docker-compose exec web python manage.py setup v1.0-2022-03-17-ror-data
```

DEV/STAGING/PROD: Access the running ror-api container and run:

```
python manage.py setup v1.0-2022-03-17-ror-data
```

Note: You must specify a dataset that exists in ror-data
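In outline, the command replaces the index wholesale: delete any existing index, create a fresh one, then bulk-index every record from the dump. The sketch below is only an illustration of that idea; the index name, dump path handling, and client calls are assumptions, not the project's actual code.

```python
import json

from elasticsearch import Elasticsearch, helpers

def reindex_from_dump(dump_path: str, index: str = "organizations"):
    """Drop the current index and rebuild it from a full ROR data dump file."""
    es = Elasticsearch("http://elasticsearch:9200")

    # Start from a clean slate, then recreate the index.
    es.indices.delete(index=index, ignore_unavailable=True)
    es.indices.create(index=index)

    # A data dump is a JSON array of ROR records; bulk-index them by ROR ID.
    with open(dump_path) as f:
        records = json.load(f)
    actions = ({"_index": index, "_id": r["id"], "_source": r} for r in records)
    helpers.bulk(es, actions)
```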
Steps used prior to Mar 2022:
- Convert latest GRID dataset to ROR (including assigning ROR IDs)
- Generate ROR data dump
- Index ROR data dump into Elasticsearch
As of Mar 2022, ROR is no longer based on GRID. Record additions/updates and data deployment are now managed in https://github.com/ror-community/ror-records using the `indexror` command described above.
Steps below no longer work, as data files have been moved to ror-data. This information is being maintained for historical purposes.
Management commands used in this process no longer work and are prefixed with "legacy".
To import GRID data, you need a system where `setup` has been run successfully. First update the `GRID` variable in settings.py, e.g.

```
GRID = {
    'VERSION': '2020-03-15',
    'URL': 'https://digitalscience.figshare.com/ndownloader/files/22091379'
}
```

Then, also in settings.py, set the `ROR_DUMP` variable, e.g.

```
ROR_DUMP = {'VERSION': '2020-04-02'}
```

Then run this command: `./manage.py upgrade`.
You should see this in the console:

```
Downloading GRID version 2020-03-15
Converting GRID dataset to ROR schema
ROR dataset created
ROR dataset ZIP archive created
```
This will create a new `data/ror-2020-03-15` folder, containing a `ror.json` and a `ror.zip`. To finish the process, add the new folder to git and push it to the GitHub repo.

To install the updated ROR data, run `./manage.py setup`.