@tricoteuses/assemblee

1.7.0 • Public • Published

Tricoteuses-Assemblee

Retrieve, clean up & handle the French Assemblée nationale's open data

Installation

git clone https://git.en-root.org/tricoteuses/tricoteuses-assemblee
cd tricoteuses-assemblee/
npm install

Download and clean data

Basic usage

Create a folder where the data will be downloaded and run the following commands to download, reorganize and clean the data.

mkdir ../assemblee-data/

# Download open data
npm run data:retrieve_open_data ../assemblee-data

# Reorganizing open data files and directories into cleaner (and split) directories
npm run data:reorganize_data ../assemblee-data

# Validation & cleaning of JSON data
npm run data:clean_data ../assemblee-data

Data from other sources is also available:

# Retrieval of députés' pictures from Assemblée nationale's website
npm run data:retrieve_deputes_photos ../assemblee-data

# Retrieval of sénateurs' pictures from Sénat's website
npm run data:retrieve_senateurs_photos ../assemblee-data

# Retrieval of pending amendments from Assemblée nationale's website (waiting to be processed by Assemblée services)
npm run data:retrieve_pending_amendements ../assemblee-data

Notes:

Filtering options

Downloading and cleaning all the data takes a long time and a lot of disk space. To reduce the load, you can choose which types of data to retrieve.

To download only one type of dataset, use the --categories option (shortcut -k):

# Available options: ActeursEtOrganes, Agendas, Amendements, DossiersLegislatifs, Photos, Scrutins, Questions, ComptesRendusSeances
npm run data:retrieve_open_data ../assemblee-data -- --categories Amendements

To download only a specific legislature, use the --legislature option (shortcut -l):

# Available options: 14, 15, 16
npm run data:retrieve_open_data ../assemblee-data -- --legislature 16

If you use these options, pass them to all subsequent commands too (data:reorganize_data and data:clean_data).
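Since the same filters have to be repeated at every step, a small wrapper can keep them in sync. The sketch below is only an illustration and is not part of the package: buildPipelineCommands and the Filters type are hypothetical names, and combining both options in one command is an assumption. Only the npm script names and the --categories / --legislature options come from the text above.

```typescript
// Hypothetical helper (not part of @tricoteuses/assemblee): builds the three
// pipeline commands so that they all receive the same filtering options.
interface Filters {
  categories?: string; // e.g. "Amendements"
  legislature?: string; // e.g. "16"
}

const PIPELINE_SCRIPTS = [
  "data:retrieve_open_data",
  "data:reorganize_data",
  "data:clean_data",
];

function buildPipelineCommands(dataDir: string, filters: Filters = {}): string[] {
  const options: string[] = [];
  if (filters.categories !== undefined) {
    options.push("--categories", filters.categories);
  }
  if (filters.legislature !== undefined) {
    options.push("--legislature", filters.legislature);
  }
  // npm needs "--" before options that are meant for the script itself.
  const suffix = options.length > 0 ? ` -- ${options.join(" ")}` : "";
  return PIPELINE_SCRIPTS.map((script) => `npm run ${script} ${dataDir}${suffix}`);
}

// Print the commands to run, in order, for legislature 16 only.
for (const command of buildPipelineCommands("../assemblee-data", { legislature: "16" })) {
  console.log(command);
}
```

The helper only builds the command strings; running them (by hand or from a script) is left to the caller.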

Download using Docker

A Docker image that downloads and cleans the data in one step is available. Build it locally or pull it from the container registry:

docker pull registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest

Create a volume to store the downloaded data, and set the environment variables LEGISLATURE and CATEGORIES if needed:

docker volume create assemblee-data
docker run --name tricoteuses-assemblee -v assemblee-data:/app/assemblee -e LEGISLATURE=16 -d registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest

Using the data

Once the data is downloaded and cleaned, you can use loaders to read it. To use the loaders in your project, install the @tricoteuses/assemblee package and import the iterator functions you need.

npm install @tricoteuses/assemblee

import {
  iterLoadAssembleeActeurs,
  iterLoadAssembleeOrganes,
  iterLoadAssembleeReunions,
  iterLoadAssembleeScrutins,
  iterLoadAssembleeDocuments,
  iterLoadAssembleeDossiersParlementaires,
  iterLoadAssembleeAmendements,
  iterLoadAssembleeQuestions
} from "@tricoteuses/assemblee/lib/loaders";

// Pass data directory and legislature as arguments
for (const { acteur } of iterLoadAssembleeActeurs("../assemblee-data", "16")) {
  console.log(acteur.uid)
}
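Because the loaders are consumed with for...of, generic iterable helpers can be applied to them, for example to inspect only the first few entries. This is a minimal sketch, assuming the loaders return standard iterables: take is a hypothetical helper, and fakeActeurs is stand-in data with made-up uids so that the example runs without the downloaded files. Only the { acteur: { uid } } shape is taken from the example above.

```typescript
// Hypothetical helper: yields at most `limit` items from any iterable,
// then stops consuming the source.
function* take<T>(items: Iterable<T>, limit: number): Generator<T> {
  if (limit <= 0) return;
  let count = 0;
  for (const item of items) {
    yield item;
    count += 1;
    if (count >= limit) return;
  }
}

// Stand-in for iterLoadAssembleeActeurs("../assemblee-data", "16").
// The uids below are made up for illustration.
function* fakeActeurs(): Generator<{ acteur: { uid: string } }> {
  yield { acteur: { uid: "PA1" } };
  yield { acteur: { uid: "PA2" } };
  yield { acteur: { uid: "PA3" } };
}

// Only the first two entries are read from the source.
for (const { acteur } of take(fakeActeurs(), 2)) {
  console.log(acteur.uid);
}
```

With the real package, fakeActeurs() would be replaced by a call to one of the imported iterLoad... functions.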

Test loading everything in memory

Test loading small split files

npx babel-node --extensions ".ts" --max-old-space-size=2048 -- src/scripts/test_load.ts ../assemblee-data/

Test loading big non-split files

npx babel-node --extensions ".ts" --max-old-space-size=2048 -- src/scripts/test_load_big_files.ts ../assemblee-data/

Note: The big non-split open data files should not be used. Use small split files instead.

Generating schemas and documentation (for contributors only)

Initial generation of TypeScript & JSON schema files from JSON data

npx quicktype --acronym-style=camel -o src/raw_types/acteurs_et_organes.ts ../assemblee-data/AMO{10,20,30,40,50}_*.json
npx quicktype --acronym-style=camel -o src/raw_types/agendas.ts ../assemblee-data/Agenda_{XIV,XV}.json
npx babel-node --extensions ".ts" --max-old-space-size=8192 --  src/scripts/raw_types_from_amendements.ts ../assemblee-data/
npx quicktype --acronym-style=camel -o src/raw_types/dossiers_legislatifs.ts ../assemblee-data/Dossiers_Legislatifs_{XIV,XV}.json

Updating JSON schema files and validating JSON files

  • Convert src/types/*.ts into JSON schemas for comparison purposes
for f in src/types/*.ts ; do b=$(basename $f .ts) ; npx typescript-json-schema src/types/$b.ts '*' > src/schemas/converted_from_type/$b.json ; done
  • Manually update src/schemas/*/*.json to account for these differences
  • Verify the JSON files validate with the updated schema
npx babel-node --extensions .ts -- src/scripts/validate_json.ts --repository=$(git rev-parse --show-toplevel) --dataset ../data/assemblee-nettoye/AMO*nettoye
npx babel-node --extensions .ts -- src/scripts/validate_json.ts --repository=$(git rev-parse --show-toplevel) --dataset ../data/assemblee-nettoye/Dossiers_Legislatifs_XV_nettoye
etc.

If an error occurs and the schema must be fixed:

  • Verify that the schema works by passing --dev, which uses the schema from the current working directory instead of fetching it from the tag matching the version mentioned in the JSON file. For instance, if the file acteurs/PA766283.json has schemaVersion = "acteur-1.0", the schema found at the tag schema-acteur-1.0 is used rather than the one in the current working directory, unless --dev is given.
  • Once the schema is verified to work, add a tag matching the directory of the schema. For instance for amendement/Amendement.json or any of its references (i.e. amendement/*.json), set the tag schema-amendement-X.Y.
    • If the schema change is backward compatible (i.e. software using the corresponding JSON won't break), increment Y (X.1, X.2, ...)
    • If the schema change is not backward compatible, increment X and set Y to zero (1.0, 2.0, ...)
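The versioning rule in the list above can be sketched as a small function. This is an illustration only: nextSchemaTag is a hypothetical helper that does not exist in the repository, and it assumes tags always follow the schema-NAME-X.Y pattern shown in the examples.

```typescript
// Hypothetical helper illustrating the tagging rule; not part of the repository.
// Assumes tags look like "schema-<name>-<X>.<Y>".
function nextSchemaTag(currentTag: string, backwardCompatible: boolean): string {
  const match = /^(schema-.+)-(\d+)\.(\d+)$/.exec(currentTag);
  if (match === null) {
    throw new Error(`Unexpected schema tag: ${currentTag}`);
  }
  const [, prefix, major, minor] = match;
  return backwardCompatible
    ? `${prefix}-${major}.${Number(minor) + 1}` // compatible change: increment Y
    : `${prefix}-${Number(major) + 1}.0`; // breaking change: increment X, reset Y
}

console.log(nextSchemaTag("schema-amendement-1.0", true)); // schema-amendement-1.1
console.log(nextSchemaTag("schema-amendement-1.1", false)); // schema-amendement-2.0
```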

The tag with the highest version will be used by src/scripts/clean_reorganized_data.ts to add a schemaVersion field to all JSON files created in a *_nettoye repository from that point on. The goal is for each JSON file to validate against an immutable schema identified by a version tag, and to allow different JSON files to use different versions of the schema.
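How clean_reorganized_data.ts actually selects the tag is not shown here. As a rough sketch of what "highest version" could mean, the hypothetical helper below compares X and Y numerically (so that 1.10 ranks above 1.9); both the function name and this comparison rule are assumptions.

```typescript
// Hypothetical helper: among tags such as "schema-acteur-1.0", returns the one
// with the highest version for a given schema name, or undefined if none match.
function highestSchemaTag(tags: string[], schemaName: string): string | undefined {
  const prefix = `schema-${schemaName}-`;
  let best: { tag: string; major: number; minor: number } | undefined;
  for (const tag of tags) {
    if (!tag.startsWith(prefix)) continue;
    const match = /^(\d+)\.(\d+)$/.exec(tag.slice(prefix.length));
    if (match === null) continue; // ignore tags that do not end in X.Y
    const major = Number(match[1]);
    const minor = Number(match[2]);
    if (
      best === undefined ||
      major > best.major ||
      (major === best.major && minor > best.minor)
    ) {
      best = { tag, major, minor };
    }
  }
  return best?.tag;
}

// Made-up tag list for illustration.
const exampleTags = ["schema-acteur-1.9", "schema-acteur-1.10", "schema-amendement-2.0"];
console.log(highestSchemaTag(exampleTags, "acteur")); // schema-acteur-1.10
```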

See the discussion in the forum for more information.

Helpers to create documentation

All raw files are kept during the process

npx babel-node --extensions .ts -- src/scripts/document_dossiers_legislatifs.ts --data ../data/assemblee-nettoye/Dossiers_Legislatifs_{XIV,XV}_nettoye/dossiers/**/*.json

See the data-site README for more information about how it is used.

License

AGPL-3.0-or-later

Collaborators

  • eraviart