UK Rail Import
A tool to extract, transform and load (ETL) UK rail data feeds. This script will extract fares, flows, locations, group stations and restrictions from the ATOC fares feed and unofficial GTFS timetable feed and store them in a database.
Although both the timetable and fares feed are open data you will need to obtain the fares feed via the ATOC website. The formal specification for the data inside the feed also available on the ATOC website.
At the moment only mysql is supported as the data store but it could be extended to support other data stores. PRs are very welcome.
You don't have to install it globally but it makes it easier if you are not going to use it as part of another project. The
-g option usually requires
npm install -g uk-rail-import
You can run the application by pipping the output to mysql, or you can use the environment/profile variables. The environment variable method connects directly to mysql to import the data whereas the unix pipes just pass SQL queries around.
Each of these commands relies on the database settings being set in the environment variables. For example
DATABASE_USERNAME=root DATABASE_NAME=fares uk-rail-import --fares-clean.
uk-rail-import --fares /path/to/RJFAFxxx.ZIP
Clean (remove old an inaccurate data)
Clean (remove journeys in the past)
Values marked as all asterisks, empty spaces, or in the case of dates - zeros, are set to null. This is to preverse the integrity of the column type. For instance a route code is numerical although the data feed often uses ***** to signify any so this value is converted to null.
Although every record format has a composite key defined in the specification an
id field is added as the fields in the composite key are sometimes null. This is no longer supported in modern versions of MariaDB or MySQL.
At present journey segments, class legends, rounding rules, print formats and the fares data feed meta data are not imported. They are either deprecated or irrelevant. Raise an issue or PR if you would like them added.
Issues and PRs are very welcome.
If there is a way of automatically downloading the feed from the ATOC website that would be useful too.
The project is written in TypeScript.
This software is licensed under GNU GPLv3.
Copyright 2016 Linus Norton.