Tricoteuses-Assemblee

_Retrieve, clean up & handle French Assemblée nationale's open data_

_Tricoteuses-Assemblee_ is free and open source software.

- software repository
- GNU Affero General Public License version 3 or greater

Documentation

- Architecture
- TypeScript API
  - Main interfaces:
    - Acteur: natural person elected or appointed to organes
    - Amendement
    - CompteRendu: record of a parliamentary debate
    - Document: text of a government bill (projet de loi), a member's bill (proposition de loi), a report, etc.
    - DossierParlementaire: file tracking a government bill, a member's bill, a resolution, etc.
    - Organe: committee, political group, study group, friendship group, etc.
    - Question: question to the Government
    - Reunion: public sitting, meeting of a committee, of a study group, etc.
    - Scrutin: vote of each député in a public ballot
- JSON Schemas

Requirements

- Node >= 18

Installation

```bash
git clone https://git.tricoteuses.fr/logiciels/tricoteuses-assemblee
cd tricoteuses-assemblee/
```

```bash
npm install
```

Download and clean data

Create a directory to store the data, then run the following command to download, reorganize and clean the data.

```bash
mkdir ../assemblee-data/
npm run data:download ../assemblee-data
```

The following commands are available:

- `npm run data:download`: Download, reorganize, and clean data.
- `npm run data:retrieve_open_data`: Download raw data files.
- `npm run data:reorganize_data`: Reorganize raw files by entity.
- `npm run data:clean_data`: Clean and validate reorganized files.
- `npm run data:retrieve_deputes_photos`: Retrieve députés' pictures from the Assemblée nationale's website.
- `npm run data:retrieve_senateurs_photos`: Retrieve sénateurs' pictures from the Assemblée nationale's website.
- `npm run data:retrieve_documents`: Retrieve legislative documents from the Assemblée nationale's website.
- `npm run data:retrieve_pending_amendements`: Retrieve pending amendments (those waiting to be processed by the Assemblée's services) from the Assemblée nationale's website.

_Notes_:

- Reorganized files (generated by the _data:reorganize_data_ command) are also available in Tricoteuses / Data / Données brutes de l'Assemblée. They are updated regularly.
- Split and cleaned files (generated by the _data:clean_data_ command) are also available in Tricoteuses / Data / Données nettoyées de l'Assemblée, with the `_nettoye` suffix. They are updated regularly.

Downloading and cleaning all the data takes a long time and a lot of disk space. To reduce the load, you can choose which types of data to retrieve.

Examples:

```bash
# Only download amendments
npm run data:download ../assemblee-data -- -k Amendements

# Only process 16th and 17th legislatures
npm run data:download ../assemblee-data -- -l 16 -l 17
```

Options:

- `--categories` or `-k`: Filter by dataset categories (available options: ActeursEtOrganes, Agendas, Amendements, DossiersLegislatifs, Photos, Scrutins, Questions, ComptesRendusSeances)
- `--legislature` or `-l`: Specify one or more legislatures to process (e.g., `-l 15 -l 16`)
- `--dataDir` (mandatory): Path to the working directory where all data is stored
- `--silent` or `-s`: Disable logging
- `--verbose` or `-v`: Enable verbose logging
- `--fetch` or `-f`: Force re-download of data even if already present
- `--commit` or `-c`: Automatically commit cleaned data
- `--pull` or `-p`: Pull repositories before starting
- `--clone` or `-C`: Clone Git repositories from a remote group or organization
- `--remote` or `-r`: Push commits to specified Git remote(s)
- `--keepDir`: Keep Dir (Implement before cleaning data)
- `--only-recent` (number): If files are already present, skip files older than the specified number of days and skip old legislatures (e.g., `--only-recent 30`)

If you use these options, pass them to all subsequent commands too (_data:reorganize_data_ and _data:clean_data_).
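Because the same filters have to be repeated for each command, it can help to build the argument list in one place. The following is a minimal sketch, not part of this package: `buildNpmArgs` is a hypothetical helper, repeating `-k` once per category is an assumption that mirrors the documented `-l 16 -l 17` form, and passing the data directory as the first argument to every script is assumed from the `data:download` example.

```javascript
// Category names listed for --categories / -k in this README.
const KNOWN_CATEGORIES = [
  "ActeursEtOrganes",
  "Agendas",
  "Amendements",
  "DossiersLegislatifs",
  "Photos",
  "Scrutins",
  "Questions",
  "ComptesRendusSeances",
]

// Hypothetical helper: builds the arguments to pass to `npm` for one script.
function buildNpmArgs(script, dataDir, { categories = [], legislatures = [] } = {}) {
  for (const category of categories) {
    if (!KNOWN_CATEGORIES.includes(category)) {
      throw new Error(`Unknown category: ${category}`)
    }
  }
  const flags = [
    // Assumption: -k may be repeated, like the documented `-l 16 -l 17`.
    ...categories.flatMap((category) => ["-k", category]),
    ...legislatures.flatMap((legislature) => ["-l", String(legislature)]),
  ]
  // `--` tells npm to forward everything after it to the script itself.
  return ["run", script, dataDir, ...(flags.length > 0 ? ["--", ...flags] : [])]
}

// Reuse the same filters for every step of the pipeline.
const filters = { categories: ["Amendements"], legislatures: [16, 17] }
const steps = ["data:retrieve_open_data", "data:reorganize_data", "data:clean_data"].map(
  (script) => ["npm", ...buildNpmArgs(script, "../assemblee-data", filters)].join(" "),
)
console.log(steps.join("\n"))
```

The sketch only prints the command lines; each array returned by `buildNpmArgs` could instead be handed to a process runner of your choice.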

Additional options:

- `--dataset` or `-d`: Clean a specific dataset only
- `--no-reset-after-commit`: Skip Git reset after committing (useful to preserve local changes)
- `--no-validate` or `-V`: Skip schema validation during cleaning
- `--fetchDocuments`: Retrieve documents
- `--parseDocuments`: Parse documents into cleaned JSON
- `--fetchVideos`: Retrieve videos

Options for retrieving documents:

- `--full` or `-f`: Retrieve all documents, even those already downloaded
- `--document-type` or `-T`: Restrict to specific document types (e.g., PION)

Download using Docker

A Docker image that downloads and cleans the data in one step is available. Build it locally or run it from the container registry. Set the LEGISLATURE and CATEGORIES environment variables if needed.

```bash
docker run --pull always --name tricoteuses-assemblee -v ../assemblee-data:/app/assemblee-data -e LEGISLATURE=17 -d git.tricoteuses.fr/logiciels/tricoteuses-assemblee:latest
```

Using the data

Once the data is downloaded and cleaned, you can read it with loaders. To use them in your project, install the _@tricoteuses/assemblee_ package and import the iterator functions you need.

```bash
npm install @tricoteuses/assemblee
```

```js
import {
  iterLoadAssembleeActeurs,
  iterLoadAssembleeOrganes,
  iterLoadAssembleeReunions,
  iterLoadAssembleeScrutins,
  iterLoadAssembleeDocuments,
  iterLoadAssembleeDossiersParlementaires,
  iterLoadAssembleeAmendements,
  iterLoadAssembleeQuestions,
  iterLoadAssembleeComptesRendus,
} from "@tricoteuses/assemblee/loaders"

// Pass data directory and legislature as arguments
for (const { acteur } of iterLoadAssembleeActeurs("../assemblee-data", 17)) {
  console.log(acteur.uid)
}
```
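Since the loaders are iterators, their results can be consumed one item at a time and collected into whatever structure suits your project. The sketch below indexes acteurs by `uid`, the only field taken from the example above. `indexActeursByUid` is a hypothetical helper, and `fakeActeurs` is a stand-in generator with made-up uid values so that the snippet runs without any downloaded data; in a real project you would pass the result of `iterLoadAssembleeActeurs("../assemblee-data", 17)` instead.

```javascript
// Hypothetical helper: index the items yielded by a loader by acteur uid.
function indexActeursByUid(items) {
  const acteurByUid = new Map()
  for (const { acteur } of items) {
    acteurByUid.set(acteur.uid, acteur)
  }
  return acteurByUid
}

// Stand-in for iterLoadAssembleeActeurs(...); the uid values are made up.
function* fakeActeurs() {
  yield { acteur: { uid: "PA1" } }
  yield { acteur: { uid: "PA2" } }
}

const acteurByUid = indexActeursByUid(fakeActeurs())
console.log(acteurByUid.size) // 2
console.log(acteurByUid.has("PA2")) // true
```

A `Map` keeps insertion order and gives constant-time lookup by uid, which is convenient when other entities need to be matched against acteurs later on.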

Generating schemas and documentation (for contributors only)

View instructions here
