Library to convert between CSV and JSON data formats with sensible identifiers
Closed, Wontfix (Public)

Description

CSV seems to be the most versatile format for exporting data from an HTML table, a spreadsheet or a SQL table.

For example we could have:

Timestamp,Name,Age
4/11/2013,Raymond Herisse,22
4/11/2013,Gil Collar,18
4/12/2013,Stanley LaVon Gibson,43

If we convert this CSV into a JSON document, we'll be able:

  • to store it in a MongoDB database
  • to let the API use it, so that a dataset can be queried

The plan is to:

  • extract the column titles as identifiers
  • create an array of objects
  • give each object properties matching the column titles

For example, our CSV should be converted into this:

[
    {
        "Timestamp": "4/11/2013",
        "Name": "Raymond Herisse",
        "Age": 22
    },
    {
        "Timestamp": "4/11/2013",
        "Name": "Gil Collar",
        "Age": 18
    },
    {
        "Timestamp": "4/12/2013",
        "Name": "Stanley LaVon Gibson",
        "Age": 43
    }
]
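The plan above can be sketched in a few lines of Python with the standard csv and json modules. This is only a minimal sketch, not the library itself: the names csv_to_objects, convert_integers and SAMPLE are made up for illustration, and converting digit-only values to integers is an assumption taken from the "Age": 22 in the expected output.

```python
import csv
import io
import json
import re

# Sample data copied from the task description.
SAMPLE = (
    "Timestamp,Name,Age\n"
    "4/11/2013,Raymond Herisse,22\n"
    "4/11/2013,Gil Collar,18\n"
    "4/12/2013,Stanley LaVon Gibson,43\n"
)


def _maybe_int(value):
    """Return an int for digit-only strings, the value unchanged otherwise."""
    if isinstance(value, str) and re.fullmatch(r"-?[0-9]+", value):
        return int(value)
    return value


def csv_to_objects(text, convert_integers=True):
    """Convert CSV text into a list of dicts keyed by the first line's titles."""
    # restval="" keeps short rows from producing None values.
    reader = csv.DictReader(io.StringIO(text), restval="")
    objects = []
    for row in reader:
        if convert_integers:
            row = {key: _maybe_int(value) for key, value in row.items()}
        objects.append(dict(row))
    return objects


if __name__ == "__main__":
    # Serialize with the same indentation as the example in the description.
    print(json.dumps(csv_to_objects(SAMPLE), indent=4))
```

Integer conversion is optional here because it is lossy for values such as zip codes with leading zeros; passing convert_integers=False keeps every value as a string.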

Here is some messier data to test with. We could extract the column titles and derive camel case identifiers from them: a possessive 's could be discarded, parenthesised text ignored, and a number appended when several fields end up with the same identifier.

Timestamp,Subject's name,Subject's age,Subject's gender,Subject's race,URL of image of deceased,Date of injury resulting in death (month/day/year),Location of injury (address),Location of death (city),Location of death (state),Location of death (zip code),Location of death (county),Agency responsible for death,Cause of death,A brief description of the circumstances surrounding the death,Official disposition of death (justified or other),Link to news article or photo of official document,Symptoms of mental illness?,Source/Submitted by,Email address,Date&Description,,Unique identifier
4/11/2013,Raymond Herisse,22,Male,African-American/Black,http://graphics8.nytimes.com/images/2013/08/04/us/MIAMI/MIAMI-popup.jpg,"May 30, 2011",18th Street and Collins Avenue,"South Beach, Miami Beach, Florida",FL,33139,Miami-Dade,Miami Beach Police Department,Gunshot,Police tried to stop Herisse’s speeding four-door Hyundai as it barreled down a crowded Collins Avenue.,justified,http://www.miamiherald.com/2013/04/10/3336557/lab-report-man-slain-in-wild-sobe.html#storylink=cpy,Unknown,Burghart,,5/30/2011: Police tried to stop Herisse’s speeding four-door Hyundai as it barreled down a crowded Collins Avenue. http://www.miamiherald.com/2013/04/10/3336557/lab-report-man-slain-in-wild-sobe.html#storylink=cpy,,1.1

Expected column identifiers:

Timestamp
SubjectName
SubjectAge
SubjectGender
SubjectRace
URLOfImageDeceased
DateOfInjuryResultingInDeath
LocationOfInjury
LocationOfDeath1
LocationOfDeath2
LocationOfDeath3
[...]
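The rules described above could be sketched as follows. This is a hedged illustration, not the agreed implementation: to_identifier and make_identifiers are hypothetical names, the "Column" fallback for an empty heading is an assumption (the task does not say what to do with the blank column), and the sketch keeps every word, so it would yield URLOfImageOfDeceased rather than the URLOfImageDeceased listed above.

```python
import re
from collections import Counter


def to_identifier(heading):
    """Turn one column heading into a camel case identifier."""
    text = re.sub(r"\([^)]*\)", " ", heading)  # ignore parenthesised text
    text = re.sub(r"['’]s\b", "", text)        # discard the possessive 's
    words = re.findall(r"[A-Za-z0-9]+", text)  # split on anything else
    # Upper-case only the first letter so acronyms such as URL survive.
    return "".join(word[0].upper() + word[1:] for word in words)


def make_identifiers(headings):
    """Build identifiers for a header row, numbering duplicates from 1."""
    # "Column" is an assumed fallback for headings that produce nothing.
    base = [to_identifier(heading) or "Column" for heading in headings]
    totals = Counter(base)
    seen = Counter()
    identifiers = []
    for name in base:
        if totals[name] > 1:
            seen[name] += 1
            identifiers.append(f"{name}{seen[name]}")
        else:
            identifiers.append(name)
    return identifiers


if __name__ == "__main__":
    print(make_identifiers([
        "Timestamp",
        "Subject's name",
        "Location of injury (address)",
        "Location of death (city)",
        "Location of death (state)",
    ]))
```

The numbering needs two passes because the first duplicate also gets a suffix (LocationOfDeath1), which is only known once the whole header row has been read. The sketch does not guard against a numbered identifier colliding with a heading that already ends in a digit.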

Event Timeline

dereckson renamed this task from Convert between CVS and JSON data formats to Convert between CSV and JSON data formats.
dereckson raised the priority of this task from to Normal.
dereckson updated the task description.
dereckson added a project: Tasacora.
dereckson moved this task to Backlog on the Tasacora board.
dereckson added a subscriber: spI33n.
dereckson added a subscriber: dereckson.

I think the objectives are too vague to produce something concrete.

In a nutshell, the objective is to represent a CSV dataset in JSON, on the assumption that the first line gives us the property names.

Spl33n has started to work on this at D11.

Representing the CSV as JSON is the action, not the objective; it is the functionality being implemented.

The objectives quoted are:

  • to store it in a MongoDB database
  • to let the API use it, so that a dataset can be queried

The need for MongoDB has not been raised for the moment.
The design of the API, if one is needed at all, has not started yet. No one knows which format it will need.

In my opinion, this is a "you aren't gonna need it" feature.

Initial context of this task

Yes, it probably falls under that document's assumption.

The document says: "But unless your project is very different from mine, you already have too much to do right now. Doing more now is a very bad thing when you already have too much to do.".

One of the concerns here was to provide a concrete task for spl33n. It is probably a bad management move to write the spec for one part of the API and hand it to a contributor to implement before the general API architecture is well documented. That raises the need for a project manager on this project; we approached Kumkum for this role, but they instead provided us with (valuable) documentation and notes about geography and maps.

We had a short discussion on the channel about the API, but no document has been consolidated from the results. In this discussion, it was decided not to use XML, in favour of more lightweight formats.

We used the dataset as a working example, and agreed that data exchange will be in JSON. We also noted that datasets are often in CSV.

The general idea was that the format documented in the task description should be the format our API wants.

Action needed

We need to design and document the API in a working draft document to avoid such occurrences.

We are decreasing the priority of this task, as it is not currently vital to any specific project.

At this time, we've discussed offering an online converter at Nasqueron Tools and using it to feed Anuta (a CLI upload script for Wikimedia Commons, which wants CSV files).

Scope

So I'm revamping the task to give it a scope.

We want:

  1. to solve the problem of transforming generic data headings into universal alphabetic key identifiers
  2. to provide a library that applies this key pattern
  3. to offer a reference implementation, a CSV to JSON converter, to show this in practice (and nothing more as long as no real need, for example from the API spec, has been established)
dereckson renamed this task from Convert between CSV and JSON data formats to Library to convert between CSV and JSON data formats with sensible identifiers. Nov 30 2015, 21:15
dereckson lowered the priority of this task from Normal to Low.

I had not understood that it was meant to be an easy task. That totally changes my mind, and I now find it well suited.

I have reservations about the format. As the data is intended to be treated as tabular, I think the following would be more suitable:

{
    "headers": ["Timestamp", "Name", "Age"],
    "data": [
        ["4/11/2013", "Raymond Herisse", "22"],
        ["4/11/2013", "Gil Collar", "18"],
        ["4/12/2013", "Stanley LaVon Gibson", "43"]
    ]
}

This format has the additional advantage of being more compact, whether plain or compressed.

Note that in either format there is no reason to convert the age to an integer.
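To make the comparison concrete, here is a rough Python sketch that builds both shapes from the sample data and compares their serialized length. The function names csv_to_table and csv_to_objects are hypothetical, every value is left as a string as suggested above, and only the plain (uncompressed) size is measured; the compressed case is not checked here.

```python
import csv
import io
import json

# Sample data copied from the task description.
SAMPLE = (
    "Timestamp,Name,Age\n"
    "4/11/2013,Raymond Herisse,22\n"
    "4/11/2013,Gil Collar,18\n"
    "4/12/2013,Stanley LaVon Gibson,43\n"
)


def csv_to_table(text):
    """Return the tabular shape: one list of headers, one list of rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {"headers": [], "data": []}
    return {"headers": rows[0], "data": rows[1:]}


def csv_to_objects(text):
    """Return the array-of-objects shape, all values kept as strings."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    table = json.dumps(csv_to_table(SAMPLE))
    objects = json.dumps(csv_to_objects(SAMPLE))
    print(len(table), len(objects))
```

The gap grows with the number of rows, since the array-of-objects shape repeats every key on every row while the tabular shape states the headers once.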

dereckson claimed this task.

Archiving the Tasacora project, as it doesn't currently have any traction or resources.

Thanks a lot to Rama, Ash Crow, Kumkum and Harmonia for their support on this project.

If any developer is interested, please get in touch to reopen those tasks in bulk:
a bulk update from Wontfix to Open is suitable.