
Import FANTOIR database

Description

rVIPER contains parsing code for the FANTOIR database, a database of all streets, ways, private lots, and "pseudo-voies" such as metro stations in France.

FANTOIR

FANTOIR is the "fichier des voies et lieux-dits" (file of ways and named places) produced by the DGFiP, the French public finances administration.

The database is distributed as a fixed-width text file: each line is a record, and each piece of information is located between characters ... and ..., a format that is difficult to work with:

 ENEVERS                  2022110120223060000000
010        AIN                                             00000000000000 00000000000000
010001    WL'ABERGEMENT-CLEMENCIAT        N  3      000082500000000000000 00000001987001
010001A008WLOT BELLEVUE                   N  3  0          00000000000000 00000002001351               000592   BELLEVUE
010001A015DLOT LES CHARMILLES             N  3  0          00000000000000 00000001998274               000562   CHARMILL
010001A025PLOT LES COQUELICOTS            N  3  0          00000000000000 00000001999300               000572   COQUELIC
010001A028TLOT LES LILAS                  N  3  0          00000000000000 00000002001025               000582   LILAS
010001A030VLOT MUNETVILLE                 N  3  0          00000000000000 00000001991302               000522   MUNETVIL
010001A035ALOT LES MURIERS                N  3  0          00000000000000 00000002003352               000602   MURIERS
010001A100WLOT LES TROIS CHENES           N  3  0          00000000000000 00000001996150               000552   CHENES

Documentation of the format is available at https://data.economie.gouv.fr/api/datasets/1.0/fichier-fantoir-des-voies-et-lieux-dits/attachments/descriptif_du_fichier_national_fantoir_pdf
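
For illustration, a minimal Rust sketch of how a record can be sliced by character positions; the column ranges used below are assumptions for the example, the authoritative layout being the descriptif linked above:

```rust
/// Extract the field between 1-based, inclusive character positions,
/// the convention used by the format description. Out-of-range
/// positions yield an empty field; padding spaces are trimmed.
fn field(line: &str, start: usize, end: usize) -> &str {
    let bytes = line.as_bytes();
    let start = start - 1;
    let end = end.min(bytes.len());
    if start >= end {
        return "";
    }
    std::str::from_utf8(&bytes[start..end]).unwrap_or("").trim()
}

fn main() {
    // Start of one of the sample records above (truncated for brevity).
    let line = "010001A008WLOT BELLEVUE";

    // Column ranges here are illustrative assumptions; the descriptif
    // gives the authoritative positions for each field.
    let departement = field(line, 1, 2); // "01"
    let commune = field(line, 4, 6);     // "001"
    println!("département {departement}, commune {commune}");
}
```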

We'd like to modernize that dataset so we can make further use of it.

The current plan is:

  • Import data from this file into a relational database
  • Build a pipeline (DAG workflow?) to import fresh data when a new file is released
  • See if we can enrich data from Wikidata, OSM, BAN, BANO or other sources

PostgreSQL has been picked because:

  • PostgreSQL is pleasant to use from the Rust ecosystem (the first idea was to use Diesel, but as an ORM isn't really needed here, we use sqlx)
  • It offers fast search on way names, thanks to trigram indexes (see the sketch below)
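
For illustration, a minimal sqlx sketch of the trigram search, assuming a fantoir table with a libelle column and a DATABASE_URL environment variable (placeholders, not the final schema), and sqlx built with the tokio runtime and postgres features:

```rust
use sqlx::postgres::PgPoolOptions;

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    // DATABASE_URL is a standard PostgreSQL DSN,
    // e.g. postgres://user:password@host/database (placeholder).
    let url = std::env::var("DATABASE_URL").expect("DATABASE_URL not set");
    let pool = PgPoolOptions::new().max_connections(5).connect(&url).await?;

    // Enable the trigram extension and index the way name column.
    // Table and column names (fantoir, libelle) are placeholders.
    sqlx::query("CREATE EXTENSION IF NOT EXISTS pg_trgm")
        .execute(&pool)
        .await?;
    sqlx::query(
        "CREATE INDEX IF NOT EXISTS idx_fantoir_libelle_trgm
         ON fantoir USING gin (libelle gin_trgm_ops)",
    )
    .execute(&pool)
    .await?;

    // Fuzzy search on a way name; the % operator uses the trigram index.
    let rows: Vec<(String,)> = sqlx::query_as(
        "SELECT libelle FROM fantoir WHERE libelle % $1
         ORDER BY similarity(libelle, $1) DESC LIMIT 10",
    )
    .bind("BELLEVUE")
    .fetch_all(&pool)
    .await?;

    for (libelle,) in rows {
        println!("{libelle}");
    }
    Ok(())
}
```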

This task focuses on dataset building, i.e. creating a PostgreSQL database with all the information, and the tooling to update the data in it.

Next steps

From there, several things can be done:

  • Create a front-end to search it
  • Distribute JSON or XML files with rich information split by department
  • Help Wikidata detect disparities between the values in the FANTOIR file and the values on Wikidata elements
  • Add PostgreSQL support to rVIPER to use our dataset instead of the text file

Event Timeline

rDS now has a comprehensive fantoir-datasource tool.

Components to deploy

The next step is the deployment, with the following components:

  • A PostgreSQL server or cluster
  • fantoir-datasource, installed
  • A pipeline to execute it

PostgreSQL

The server or cluster should be reachable not only by our import tool but also by the rDB front-end.

A specialized Docker container could be fine, or a more generic one like acquisitariat.
But a bare-metal or VM installation is also a possible solution.

Installer

We need to install it either:

  • on the Docker PaaS, with a Docker image + a command wrapper
  • on the PostgreSQL server, if it's dedicated to datasources, as a FreeBSD package to install

The Docker combo, running it with a connection to a specified PostgreSQL server, seems preferable.

Pipeline engine

The pipeline could be driven by Jenkins (at the CD level, cd.nasqueron.org). The initial idea was to install a specific tool like Apache Airflow to run the pipeline as a DAG.

Can we easily build a pipeline on Salt? A reactor with events would be fine, but what about DAG visibility?

If the pipeline runs in a Docker container, how do we run fantoir-datasource? For Jenkins, we could add it to rust_brown (the Rust executor).

For Apache Airflow, full access to our main Docker engine looks dangerous; Apache Airflow with a dedicated Docker/k8s infrastructure is perhaps more relevant.

Pipeline itself

The ideal operation order is documented at https://agora.nasqueron.org/Fantoir-datasource

Two components are missing:

  • Send a notification; we can use T771
  • Configure rDB to use the new table $FANTOIR_TABLE; whether Consul or etcd is the ideal client/API remains to be decided, but in both cases an HTTP API exists (see the sketch below)
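
For illustration, a minimal sketch of that configuration call, assuming Consul is picked and using reqwest; the Consul address and the KV path datasources/fantoir/table are placeholders:

```rust
use reqwest::Client;

/// Publish the freshly imported table name so rDB can switch to it.
/// Consul KV HTTP API: PUT /v1/kv/<key> with the value as the request body.
async fn publish_fantoir_table(table: &str) -> Result<(), reqwest::Error> {
    let client = Client::new();
    client
        .put("http://localhost:8500/v1/kv/datasources/fantoir/table")
        .body(table.to_string())
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // $FANTOIR_TABLE, e.g. the table created by the latest import run.
    let table = std::env::var("FANTOIR_TABLE").expect("FANTOIR_TABLE not set");
    publish_fantoir_table(&table).await
}
```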

To configure Airflow, we may need a PostgreSQL DSN.

If so, see https://devcentral.nasqueron.org/T1791#25765 for how to build it.

Airflow is deployed, so we can test D2754.

Still to be configured: