Paste P337

README for language-subtag-registry-datasource

Authored by dereckson on May 29 2023, 20:31.
The `language-subtag-registry-datasource` utility downloads the
IANA language subtag registry defined in RFC 5646,
parses it, and transforms the output.
This registry shares language codes with the different ISO-639 lists,
but is more inclusive and descriptive.
It has been designed to output the index in an arbitrary format,
so we can export a Darkbot database for Odderon, one of our IRC bots.
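
To illustrate the parsing step, here is a minimal Python sketch, not the utility's actual code. It relies on the registry file format described in RFC 5646: records are separated by lines containing only `%%`, each field is a `Name: value` line, and long values continue on lines that start with whitespace. The function name `parse_registry` and the abbreviated sample are assumptions made for this example.

```python
def parse_registry(text):
    """Split registry text into records, each a list of (field, value) pairs."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: store the record collected so far.
            if current:
                records.append(current)
            current = []
        elif line[:1].isspace() and current:
            # Continuation line: fold it onto the previous field's value.
            key, value = current[-1]
            current[-1] = (key, value + " " + line.strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current.append((key.strip(), value.strip()))
    if current:
        records.append(current)
    return records


# Abbreviated, illustrative sample; real records carry more fields.
sample = """%%
Type: language
Subtag: ii
Description: Sichuan Yi
Description: Nuosu
%%
Type: language
Subtag: ik
Description: Inupiaq
"""

for record in parse_registry(sample):
    print(record)
```

Keeping each record as a list of pairs rather than a dict preserves repeated fields such as `Description`, which matter for the aggregation step.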
## Usage
`language-subtag-registry-datasource --format <format string> [--aggregation-separator <separator string>] [--source /path/to/registry.txt]`
The format string can be arbitrary text or variables:
| **Variable** | **Description** |
| %%<key>%% | A field in the registry |
| %%fulldescription%% | A string built with description, comments |
If an entry doesn't have the required field, it is left blank.
Examples for the variables:
- `%%Description%%` will output `Inupiaq` for the `ik` subtag
- `%%Description%%` will output `Sichuan Yi / Nuosu` for the `ii` subtag
- `%%Comments%%` will output an empty string for both `ik` and `ii` subtags
- `%%fulldescription%%` will output "Serbo-Croatian - sr, hr, bs are preferred for most modern uses" for `sh`
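
The variable expansion shown in these examples can be sketched in Python as follows. This is an illustration under assumptions, not the utility's implementation: the helper names `expand` and `full_description` are hypothetical, an entry is modelled as a plain dict, and the " - " joiner for `%%fulldescription%%` is inferred from the `sh` example.

```python
import re

# Matches %%key%% variables; keys may contain hyphens (e.g. Preferred-Value).
VARIABLE = re.compile(r"%%([\w-]+)%%")


def full_description(entry):
    # Assumption: description and comments are joined with " - ",
    # as the `sh` example suggests. Empty parts are skipped.
    parts = [entry.get("Description", ""), entry.get("Comments", "")]
    return " - ".join(part for part in parts if part)


def expand(format_string, entry):
    """Replace each %%key%% in format_string with the entry's value."""
    def replace(match):
        key = match.group(1)
        if key == "fulldescription":
            return full_description(entry)
        # A field missing from the entry is left blank.
        return entry.get(key, "")
    return VARIABLE.sub(replace, format_string)


ik = {"Type": "language", "Subtag": "ik", "Description": "Inupiaq"}
sh = {
    "Subtag": "sh",
    "Description": "Serbo-Croatian",
    "Comments": "sr, hr, bs are preferred for most modern uses",
}

print(expand("lang+%%Subtag%% %%fulldescription%%", ik))
print(expand("lang+%%Subtag%% %%fulldescription%%", sh))
```

Text outside the `%%...%%` markers passes through unchanged, which is what lets the same mechanism produce both a Darkbot line and a CSV row.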
If a field has several values for a language, they are coalesced into one
string, joined by a separator. The default separator is " / ". It can be
overridden with `--aggregation-separator`.
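
As an illustration of this coalescing, here is a short Python sketch. The function name `aggregate` and the representation of an entry as (field, value) pairs are assumptions for the example; the utility may do this differently.

```python
def aggregate(fields, separator=" / "):
    """Collapse repeated fields into one value per field name."""
    entry = {}
    for key, value in fields:
        if key in entry:
            # Repeated field: append the new value after the separator.
            entry[key] = entry[key] + separator + value
        else:
            entry[key] = value
    return entry


# The `ii` subtag has two descriptions.
ii = [
    ("Subtag", "ii"),
    ("Description", "Sichuan Yi"),
    ("Description", "Nuosu"),
]

print(aggregate(ii)["Description"])                  # default separator
print(aggregate(ii, separator="; ")["Description"])  # overridden separator
```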
The utility uses as source, in order of priority:
- the path passed to the `--source` argument
- any `registry.txt` file available in the current directory
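
The lookup order above can be sketched in Python like this. The function name `resolve_source` is hypothetical, and this text does not say what the utility does when neither candidate exists, so the sketch simply returns `None` in that case.

```python
from pathlib import Path


def resolve_source(source_arg=None, cwd=None):
    """Return the registry path to read, or None when no candidate is found."""
    # 1. An explicit --source path takes priority.
    if source_arg is not None:
        return Path(source_arg)
    # 2. Otherwise, look for registry.txt in the current directory.
    candidate = Path(cwd or Path.cwd()) / "registry.txt"
    if candidate.is_file():
        return candidate
    # Assumption: the fallback behaviour is not specified here.
    return None


print(resolve_source("/path/to/registry.txt"))
```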
## Recipes
### Darkbot database
language-subtag-registry-datasource --format "lang+%%Subtag%% %%fulldescription%%"
### CSV export
language-subtag-registry-datasource --format '%%Subtag%%,%%Type%%,"%%Description%%",%%Added%%,%%Scope%%,"%%Comments%%"'