
Enrich FANTOIR database with Wikidata information
Open, Normal, Public

Description

Wikidata uses the property P3182 to attach a FANTOIR code to an entity.

From there, we can get two interesting pieces of information:

  • the French label with rich typography
  • the nature (P31 property) of the element, to identify "pseudo-voies" such as metro stations

How to query Wikidata?

We can retrieve both with the following SPARQL query:

PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Streets with FANTOIR code
SELECT DISTINCT ?code_fantoir ?item ?itemLabel ?what
WHERE
{
  ?item wdt:P3182 ?code_fantoir .
  ?item wdt:P31 ?what
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr". }
}
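
As a rough illustration, the query above could be sent to the public Wikidata Query Service from a script. This is only a sketch in Python, not the import solution itself: the function names and the User-Agent string are hypothetical, and the parsing assumes the standard SPARQL JSON results layout.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?code_fantoir ?item ?itemLabel ?what
WHERE
{
  ?item wdt:P3182 ?code_fantoir .
  ?item wdt:P31 ?what .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr". }
}
"""


def build_request(query, user_agent="fantoir-enrichment-sketch/0.1 (hypothetical)"):
    """Prepare a GET request asking for SPARQL JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,
    })


def parse_bindings(payload):
    """Flatten SPARQL JSON results into plain dicts of variable name -> value."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def fetch_streets():
    """Run the query over the network; nothing calls this at import time."""
    with urllib.request.urlopen(build_request(QUERY), timeout=60) as response:
        return parse_bindings(json.load(response))
```

Each returned dict then carries the keys code_fantoir, item, itemLabel and what, one dict per (entity, P31 value) pair.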

Plan

  • query Wikidata from our FANTOIR import solution
  • if a Wikidata entity has several values for P31, determine the most relevant one
  • normalize FANTOIR codes, as they are imported to Wikidata in several formats.
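
The last two steps of the plan could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the task does not list the formats actually found in Wikidata, so normalization only handles case and separator characters, and the priority order between P31 values is hypothetical (Q928830 is metro station, Q79007 is street).

```python
import re

# ASSUMPTION: illustrative ranking only. Both items exist on Wikidata,
# but putting pseudo-voies first is a hypothetical choice.
NATURE_PRIORITY = ["Q928830", "Q79007"]


def normalize_code(raw):
    """Uppercase a code and drop whitespace and punctuation separators.

    ASSUMPTION: format differences are limited to case and separators;
    other variations found in Wikidata would need their own rules.
    """
    return re.sub(r"[^0-9A-Z]", "", raw.upper())


def group_natures(rows):
    """Group (code, P31 value) pairs by normalized code."""
    grouped = {}
    for code, nature in rows:
        grouped.setdefault(normalize_code(code), set()).add(nature)
    return grouped


def pick_nature(values, priority=NATURE_PRIORITY):
    """Return the value ranked first in `priority`, else the first one in sorted order."""
    if not values:
        return None
    ranked = [candidate for candidate in priority if candidate in values]
    return ranked[0] if ranked else sorted(values)[0]
```

Falling back to sorted order keeps the choice deterministic when no known nature matches; a real implementation might prefer to log those cases for review.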

Example of dataset

As I currently have a small proof of concept, let's query our FANTOIR and Wikidata information:

SELECT f.code_fantoir, item_label, code_nature_voie, libelle_voie
FROM fantoir_wikidata wd JOIN fantoir_202210 f
    ON wd.code_fantoir = f.code_fantoir;

This returns the following dataset:

Wikidata x FANTOIR.PNG (1×1 px, 256 KB)

Event Timeline

dereckson triaged this task as Normal priority. Jan 12 2023, 01:37
dereckson created this task.
dereckson added a project: Wikimedia.
dereckson moved this task from Backlog to Working on on the Nasqueron Databases board.

Wikidata uses local files, whose codes are not always in the national FANTOIR schema. Should we use them too?

If so:

Or:

  • Change the main fantoir schema to add a source field, 'N' for national, 'L' for local
  • Change the import command to add 'N' on this source field
  • Import the local data after the national data, in a separate follow-up command, before Wikidata enrichment and promotion

The second method would give a better trigram index.

That would help offer codes for cancelled voies that are no longer recorded in the national file.
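
The source-field option could be sketched as follows. This is a Python illustration using an in-memory SQLite database purely as a stand-in for the real one; the table layout, the import_rows helper and the sample rows are all hypothetical, and letting national rows win over local duplicates is an assumption rather than a decision recorded in this task.

```python
import sqlite3


def import_rows(conn, rows, source):
    """Insert (code, label) rows tagged with their source: 'N' or 'L'."""
    if source not in ("N", "L"):
        raise ValueError("source must be 'N' (national) or 'L' (local)")
    # ASSUMPTION: a code already imported is kept as-is, so the first
    # import (national) takes precedence over later ones (local).
    conn.executemany(
        "INSERT OR IGNORE INTO fantoir (code_fantoir, libelle_voie, source) "
        "VALUES (?, ?, ?)",
        [(code, label, source) for code, label in rows],
    )


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fantoir (
        code_fantoir TEXT PRIMARY KEY,
        libelle_voie TEXT NOT NULL,
        source TEXT NOT NULL CHECK (source IN ('N', 'L'))
    )
""")

# Placeholder rows, not real FANTOIR data.
import_rows(conn, [("CODE-A", "RUE EXEMPLE")], "N")
import_rows(conn, [("CODE-A", "RUE EXEMPLE (LOCAL)"), ("CODE-B", "IMPASSE EXEMPLE")], "L")

# Map each code to the source it was finally imported from.
sources = dict(conn.execute("SELECT code_fantoir, source FROM fantoir"))
```

With this ordering, a code present only in a local file ends up flagged 'L', which is what would let cancelled voies remain available after they disappear from the national file.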