
Query Wikidata to enrich FANTOIR file
Closed, Public

Authored by dereckson on Jan 10 2023, 19:53.
Tags
None
Referenced Files
F4019297: D2731.id6922.diff
Sat, Jan 18, 12:43
F4018062: D2731.diff
Sat, Jan 18, 05:43
Subscribers
None

Details

Summary

Create a fantoir_wikidata table with label and P31 information
to offer rich typography and determine what kind of pseudo-voie
an entry is.

Query Wikidata through the SPARQL end-point and map results.
Select most relevant P31 information.
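One way to select the most relevant P31 value is a priority list, sketched below. The function name, the priority order and the identifiers are assumptions for illustration (the identifiers are placeholders, not real Wikidata items); the revision may rank values differently.

```rust
/// Hypothetical priority list: earlier entries are treated as more relevant.
/// These identifiers are placeholders, not real Wikidata Q-ids.
const P31_PRIORITY: [&str; 3] = ["Q_SPECIFIC", "Q_INTERMEDIATE", "Q_GENERIC"];

/// Returns the candidate ranked first in the priority list, falling back
/// to the first candidate when none of them is listed.
fn select_most_relevant_p31<'a>(candidates: &[&'a str]) -> Option<&'a str> {
    P31_PRIORITY
        .iter()
        .find_map(|wanted| candidates.iter().copied().find(|c| c == wanted))
        .or_else(|| candidates.first().copied())
}
```

An entry with no P31 value at all yields None, so the caller decides how to store it.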

As some Wikidata entries import the RIVOLI code from BAN, we have two
competing formats for DOM/TOM. To allow a foreign key, but also
to identify such Wikidata entries for further improvement, we store
the FANTOIR code used by Wikidata in a separate column.
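A minimal sketch of a row carrying both codes follows. The struct and field names are hypothetical; they are not taken from the actual fantoir_wikidata schema.

```rust
/// Hypothetical row of the fantoir_wikidata table; all names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct WikidataEntry {
    /// Code in the format used by the FANTOIR table, usable as a foreign key.
    code_fantoir: String,
    /// Code exactly as published on Wikidata, possibly in the other format.
    code_fantoir_wikidata: String,
    /// Identifier of the Wikidata item.
    item: String,
}

impl WikidataEntry {
    /// True when Wikidata stores the code in a different form than our table,
    /// which flags the item as a candidate for later improvement.
    fn uses_other_format(&self) -> bool {
        self.code_fantoir != self.code_fantoir_wikidata
    }
}
```

Keeping the original value next to the normalized one means the entries to improve can be listed with a simple comparison.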

Ref T1751.

Test Plan

Run it and get a full fantoir_wikidata table

Diff Detail

Repository
rDS Nasqueron Datasources
Lint
No Lint Coverage
Unit
No Test Coverage
Branch
wikidata
Build Status
Buildable 4329
Build 4596: arc lint + arc unit

Event Timeline

dereckson created this revision.

A lot of code to handle exceptions, mainly because there are
several formats used to represent the FANTOIR code.

Two strategies will be used to find the full code:

  1. Compute it when the clé RIVOLI is missing or the code direction can be determined.
  2. Query it for departments like 13/59/75/92/94 (*)

(*) Blocked on a search service to query the fantoir_... table for this information.
I'll work on that in another branch, as the table may not exist or its name may be unknown.
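The choice between the two strategies can be sketched as a small dispatcher. This is a simplification that looks only at the department; the enum and function names are hypothetical, and the department list is the one quoted above.

```rust
/// How the full FANTOIR code is obtained; names are hypothetical.
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Derive the full code locally.
    Compute,
    /// Look the code up in the FANTOIR table.
    Query,
}

/// Departments listed in the comment above as needing a lookup.
const QUERY_DEPARTMENTS: [&str; 5] = ["13", "59", "75", "92", "94"];

/// Simplified choice based on the department only.
fn choose_strategy(department: &str) -> Strategy {
    if QUERY_DEPARTMENTS.iter().any(|d| *d == department) {
        Strategy::Query
    } else {
        Strategy::Compute
    }
}
```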

Also, a FANTOIR code will sometimes be missing. For example, on a Wikidata article I found one
that comes from a collectivity file and is not included in the national file.

We need code to skip them for now, and add a Wikidata health maintenance report later.
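A minimal sketch of the skip step, assuming the known codes are already loaded in memory (the function name and the in-memory set are assumptions). Skipped codes are returned rather than dropped, so a later maintenance report could reuse them.

```rust
use std::collections::HashSet;

/// Splits codes into those present in the known set (kept) and the
/// others (skipped). Order within each group follows the input order.
fn partition_known<'a>(
    codes: &[&'a str],
    known: &HashSet<&str>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    codes
        .iter()
        .copied()
        .partition(|code| known.contains(*code))
}
```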

Don't hardcode the Wikidata table name twice

Clean comment and debug code

This version is ready and can correctly update Wikidata. If we can solve the 403 issue when connecting to the SPARQL endpoint, it's ready to commit. Meanwhile, to test this code, you can run the query and save the result (in XML format) to tmp/wikidata-query-result.xml
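For illustration, reading such a saved result could look like the sketch below. The function name is hypothetical and the fallback behaviour is an assumption; it only shows how a missing file can be told apart from a real I/O error.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the saved query result when the file exists, or None when it
/// does not, so the caller can fall back to a live request.
fn read_saved_result(path: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(xml) => Ok(Some(xml)),
        Err(error) if error.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(error) => Err(error),
    }
}
```

A caller would pass `Path::new("tmp/wikidata-query-result.xml")` and query the endpoint only when None is returned.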

Fix SPARQL client, refactor

Refactor to a cleaner implementation, move sparql mod to services

dereckson retitled this revision from WIP: Query Wikidata to enrich FANTOIR file to Query Wikidata to enrich FANTOIR file. Jan 15 2023, 01:24
dereckson edited the summary of this revision.
dereckson added inline comments.
src/services/sparql.rs
1 ↗(On Diff #6946)

See T1752 for future projects to release this into a standalone crate.

This revision is now accepted and ready to land. Jan 15 2023, 01:29
This revision was automatically updated to reflect the committed changes.