An import tool that loads the British rail fares, routeing and timetable feeds into a database.
Although both the timetable and fares feeds are open data, you will need to obtain the fares feed via the ATOC website. The formal specification for the data inside the feeds is also available on the ATOC website.
The tool supports both MySQL and Snowflake databases.
- Node.js version 20 or higher
- npm (comes with Node.js)
```
npm install git+https://github.com/seatfrog/dtd2db.git
```
Or add to your package.json:
```json
{
  "dependencies": {
    "dtd2db": "github:seatfrog/dtd2db"
  }
}
```
This will install the `dtd2db` command, which can be used with either MySQL or Snowflake databases by setting the `DATABASE_TYPE` environment variable.
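For example, assuming a MySQL target, the variable can be set inline for a single run (the `--fares` import flag is described below, and the remaining connection variables listed later must also be set):

```
DATABASE_TYPE=mysql dtd2db --fares /path/to/RJFAFxxx.ZIP
```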
- Clone the repository:

  ```
  git clone git@github.com:seatfrog/dtd2db.git
  cd dtd2db
  ```
- Install dependencies and build:

  ```
  npm install
  npm run prepublishOnly
  ```
- Link it locally (optional, for development):

  ```
  npm link
  ```
- In your project, you can then either:
  - Add it as a local dependency in your package.json:

    ```json
    { "dependencies": { "dtd2db": "file:/path/to/dtd2db" } }
    ```

  - Or install directly from the local path:

    ```
    npm install /path/to/dtd2db
    ```
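Either way, once installed you can invoke the tool from an npm script; the script name below is hypothetical, and `DATABASE_TYPE` plus the other connection variables described below still need to be set in the environment:

```json
{
  "scripts": {
    "import-fares": "dtd2db --fares ./data/RJFAFxxx.ZIP"
  }
}
```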
You need to run `npm run prepublishOnly` in the following situations:
- After making any changes to TypeScript source files
- After pulling new changes from the repository that include TypeScript modifications
- After installing new dependencies that affect TypeScript compilation
- When you see TypeScript-related errors in the console that suggest the compiled JavaScript is out of sync with the source
The command compiles the TypeScript files into JavaScript in the `dist/` directory, which is what actually gets executed when you run `dtd2db`. Without running this command after changes, you might be running outdated code.
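For example, after pulling changes that touch the TypeScript source:

```
npm install
npm run prepublishOnly
```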
The `DATABASE_TYPE` environment variable is required for all operations and must be set to either `mysql` or `snowflake`.
Required environment variables for MySQL:

```
DATABASE_TYPE=mysql
DATABASE_HOSTNAME=localhost
DATABASE_USERNAME=root
DATABASE_PASSWORD=your_password
DATABASE_NAME=your_database
DATABASE_PORT=3306
```
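One way to supply these, assuming a POSIX shell (any import command works; `--fares` is just an example):

```
export DATABASE_TYPE=mysql DATABASE_HOSTNAME=localhost DATABASE_PORT=3306
export DATABASE_USERNAME=root DATABASE_PASSWORD=your_password DATABASE_NAME=your_database
dtd2db --fares /path/to/RJFAFxxx.ZIP
```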
Required environment variables for Snowflake:

```
DATABASE_TYPE=snowflake
DATABASE_NAME=your_database
SNOWFLAKE_ACCOUNT=your_account
SNOWFLAKE_USERNAME=your_username
SNOWFLAKE_PRIVATE_KEY_PATH=/path/to/private_key.p8
SNOWFLAKE_SCHEMA=your_schema
SNOWFLAKE_WAREHOUSE=your_warehouse
SNOWFLAKE_ROLE=your_role
```
Note: For Snowflake, key pair authentication is required and password authentication is not supported. You'll need to:
- Generate an RSA key pair
- Add the public key to your Snowflake user
- Store the private key securely and provide its path via `SNOWFLAKE_PRIVATE_KEY_PATH`
For more information on setting up key pair authentication with Snowflake, refer to the official documentation.
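As a minimal sketch of the key generation step, assuming OpenSSL is available (the Snowflake documentation remains the authoritative reference):

```
# Generate an unencrypted 2048-bit private key in PKCS#8 format
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt

# Derive the matching public key; its body (without the PEM header/footer)
# is what gets attached to your Snowflake user via ALTER USER ... SET RSA_PUBLIC_KEY
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```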
The application creates two separate Snowflake connections for different purposes:
- General-purpose Connection: Used for regular database operations like schema management, queries, and table operations.
- Streaming Connection: Used specifically for data streaming operations and bulk data loading.
This dual-connection architecture provides several benefits:
- Prevents streaming operations from blocking regular database operations
- Allows for better resource management
- Provides isolation between different types of database operations
- Enables parallel processing of data streams and regular queries
You'll see log messages indicating which type of connection is being created during execution.
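As an illustration only (this is not dtd2db's actual source), the dual-connection pattern might look like this in TypeScript with the `snowflake-sdk` package; the helper function and its labels are hypothetical:

```typescript
import * as snowflake from "snowflake-sdk";

// Build a connection from the environment variables documented above.
function createSnowflakeConnection(label: string): snowflake.Connection {
  console.log(`Creating ${label} connection`);
  return snowflake.createConnection({
    account: process.env.SNOWFLAKE_ACCOUNT!,
    username: process.env.SNOWFLAKE_USERNAME!,
    authenticator: "SNOWFLAKE_JWT",
    privateKeyPath: process.env.SNOWFLAKE_PRIVATE_KEY_PATH!,
    database: process.env.DATABASE_NAME!,
    schema: process.env.SNOWFLAKE_SCHEMA!,
    warehouse: process.env.SNOWFLAKE_WAREHOUSE!,
    role: process.env.SNOWFLAKE_ROLE!,
  });
}

// One connection for schema management and queries, a second dedicated to
// streaming/bulk loads, so a long-running load never blocks regular work.
const generalConnection = createSnowflakeConnection("general-purpose");
const streamingConnection = createSnowflakeConnection("streaming");
```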
When working with Snowflake:
- Column names and identifiers are automatically converted to uppercase
- Quoted identifiers are avoided to maintain compatibility with Snowflake's case-sensitivity rules
- Snowflake-specific record implementations are used for optimal performance
- The application uses Snowflake's native ID generation mechanisms instead of MySQL-style auto-increment
All commands require the `DATABASE_TYPE` environment variable to be set to either `mysql` or `snowflake`.
Import the fares into a database, creating the schema if necessary. This operation is destructive and will remove any existing data.
```
dtd2db --fares /path/to/RJFAFxxx.ZIP
```
Removes expired data and invalid fares, corrects railcard passenger quantities, and adds full date entries to restriction date records. This command will occasionally fail due to a database timeout (depending on hardware); re-running the command should correct the problem.
```
dtd2db --fares-clean
```
The fares feed uses the following short codes to indicate the action to take for each record:
| Code | Action |
| ---- | ------ |
| I    | Insert |
| U    | Update |
| D    | Delete |
These codes are used in the first column of each record in the fares feed to indicate whether the record should be inserted, updated, or deleted in the database.
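As a sketch of how a consumer might act on these codes (not dtd2db's actual implementation; the row type and log messages here are hypothetical):

```typescript
// Action code carried in the first column of a fares feed record.
type FeedAction = "I" | "U" | "D";

function applyAction(action: FeedAction, row: Record<string, unknown>): void {
  switch (action) {
    case "I":
      console.log("insert", row); // add a new record
      break;
    case "U":
      console.log("update", row); // replace the record with the same key
      break;
    case "D":
      console.log("delete", row); // remove the record with the same key
      break;
  }
}
```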
Import the timetable information into a database, creating the schema if necessary. This operation is destructive and will remove any existing data.
```
dtd2db --timetable /path/to/RJTTFxxx.ZIP
```
Convert the DTD/TTIS version of the timetable (up to 3 months into the future) to GTFS.
```
dtd2db --timetable /path/to/RJTTFxxx.ZIP
dtd2db --gtfs-zip filename-of-gtfs.zip
```
Import the routeing guide data into a database:

```
dtd2db --routeing /path/to/RJRGxxxx.ZIP
```
Import the NFM64 feed into a database:

```
dtd2db --nfm64 /path/to/nfm64.zip
```
The download commands will take the latest full refresh from an SFTP server (by default the DTD server).
Requires the following environment variables:

```
SFTP_USERNAME=dtd_username
SFTP_PASSWORD=dtd_password
SFTP_HOSTNAME=dtd_hostname (this will default to dtd.atocrsp.org)
```
There is a command for each feed:
```
dtd2db --download-fares /path/
dtd2db --download-timetable /path/
dtd2db --download-routeing /path/
dtd2db --download-nfm64 /path/
```
Or download and process in one command:
```
dtd2db --get-fares
dtd2db --get-timetable
dtd2db --get-routeing
dtd2db --get-nfm64
```
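For example, a hypothetical scheduled refresh, assuming a POSIX shell with the database variables already exported as described above:

```
export SFTP_USERNAME=dtd_username SFTP_PASSWORD=dtd_password
dtd2db --get-fares
dtd2db --get-timetable
```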
Values marked as all asterisks, all spaces, or, in the case of dates, zeros are set to null. This preserves the integrity of the column type. For instance, a route code is numerical, but the data feed often uses `*****` to signify "any", so this value is converted to null.
Although every record format has a composite key defined in the specification, an `id` field is added because the fields in the composite key are sometimes null, and null values in a primary key are no longer supported in modern versions of MariaDB or MySQL.
At present journey segments, class legends, rounding rules, print formats and the fares feed metadata are not imported. They are either deprecated or irrelevant. Raise an issue or PR if you would like them added.
The timetable data does not map to a relational database in a very logical fashion, so all LO, LI and LT records map to a single `stop_time` table.
Only schedule records that start up to 3 months into the future (using date of import as a reference point) are exported to GTFS for performance reasons. This will cause any data after that point to be either incomplete or incorrect, as override/cancellation records after that will be ignored as well.
```
git clone git@github.com:seatfrog/dtd2db
npm install --dev
npm test
```
The following short codes are used in the codebase to represent different record actions:
- I: Insert
- A: Update
- D: Delete
- DI: DelayedInsert
These codes are used in logging and throughout the codebase for clarity and consistency.