Datasets come from a wide mix of data sources.
In some cases, especially with spreadsheets and CSV files, a column or field has no ready-to-use unique identifier, only a heading.
We want to automatically convert any UTF-8 heading into an ASCII identifier (Contractor name -> contractorName or contractor_name), or fall back to a generated one (label1, label2) when no identifier can be derived.
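The conversion described above could look roughly like the following Python sketch. The function name `heading_to_identifier`, the `style` parameter, and the choice to fall back when the result would start with a digit are all assumptions made for illustration, not decisions taken in this proposal.

```python
import re


def heading_to_identifier(heading, index, style="camel"):
    """Turn a column heading into an ASCII identifier, or label<index> as a fallback."""
    # Runs of ASCII letters and digits become words; everything else is a separator.
    words = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", heading)]
    if not words:
        # Nothing usable in the heading (empty, symbols only, non-Latin script).
        return f"label{index}"
    if style == "snake":
        identifier = "_".join(words)
    else:
        identifier = words[0] + "".join(w.capitalize() for w in words[1:])
    # Assumption: an identifier starting with a digit is treated as unusable.
    if identifier[0].isdigit():
        return f"label{index}"
    return identifier


print(heading_to_identifier("Contractor name", 1))           # contractorName
print(heading_to_identifier("Contractor name", 1, "snake"))  # contractor_name
print(heading_to_identifier("契約者", 2))                     # label2
```

Note that this sketch treats accented letters as separators, so a heading such as "Société" is split into fragments rather than mapped to "societe"; handling that properly is where transliteration would come in.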
The architecture should be flexible enough to accommodate additional rules later. One use case is adding transliteration.
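One possible shape for such an architecture is an ordered list of small text-transforming rules, with the fallback applied only when the rules leave nothing behind. The sketch below is a hypothetical design: `IdentifierBuilder`, `add_rule`, `strip_accents`, and `to_snake_case` are illustrative names, and `strip_accents` is only a crude stand-in for real transliteration (it drops diacritics via Unicode NFKD decomposition and discards anything that still is not ASCII).

```python
import re
import unicodedata
from typing import Callable, List, Optional

# A rule takes the current text and returns the transformed text.
Rule = Callable[[str], str]


def strip_accents(text: str) -> str:
    """Crude transliteration stand-in: drop diacritics, discard other non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def to_snake_case(text: str) -> str:
    """Join runs of ASCII letters and digits with underscores, lowercased."""
    return "_".join(w.lower() for w in re.findall(r"[A-Za-z0-9]+", text))


class IdentifierBuilder:
    """Applies rules in order; falls back to label<index> on an empty result."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = list(rules)

    def add_rule(self, rule: Rule, position: Optional[int] = None) -> None:
        # New rules go at the end unless a position is given.
        if position is None:
            self.rules.append(rule)
        else:
            self.rules.insert(position, rule)

    def build(self, heading: str, index: int) -> str:
        text = heading
        for rule in self.rules:
            text = rule(text)
        return text if text else f"label{index}"


builder = IdentifierBuilder([to_snake_case])
print(builder.build("Contractor name", 1))  # contractor_name

# Later, a transliteration-like rule can be slotted in ahead of the formatter.
builder.add_rule(strip_accents, position=0)
print(builder.build("Société", 2))          # societe
print(builder.build("日本語", 3))            # label3
```

Keeping each rule a plain function means a future transliteration library could replace `strip_accents` without touching the builder or the other rules.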