Currently, I am working on a Machine Learning project with my colleagues where we don’t have much data to train the model so we scrapped data from multiple places and kept in JSON format because sometimes we get data for some fields or keys and sometimes we don’t.

But when we use it for training we need to put them in table structure or CSV format. By doing it we also will know about which features we have data and for which we don’t. After that, we can analyze the data and then we can impute where we can and drop features that we can’t impute.

We can easily write JSON data to CSV file if JSON is flat structured and we know all the keys. The code is simple for this. Load each JSON so that it will become a dictionary object then we can put it in the list after that using Dictwriter in CSV module we can write it to CSV file but we have 3 problems here1. Nested JSON structure2. All JSONs don’t have the same structure. Some keys commonly available in each JSON and some keys present in some JSONs and absent in some JSONs.3. We don’t know the unique keys list.

After some googling found answer to the above problems here. Using that wrote code according to my requirement. For example purpose, I am going to create the JSON dataset which is somewhat similar to my problem for that I used the JSON Generator tool.

This code is pretty much self-explanatory. Basically what we are doing here is loading the JSON then iterating it one by one then converting each object(dictionary) from nested structure to flat structure then extracting unique keys after that writing it to CSV

Some times we want to write only specific keys data to CSV if that is the case then we just need to create a list with keys which we want to and use that list as CSV DictWriter fieldnames and in addition to it, we need to use extrasaction=”ignore” because if the dictionary has keys more than we mentioned in fieldnames it will give a ValueError.