I have automated identification system (AIS) data as nested dictionaries in a pandas dataframe. Here is an example:
dfAIS
Out[18]:
Message ... MetaData
0 {'ShipStaticData': {'AisVersion': 1, 'CallSign... ... {'MMSI': 255814000, 'MMSI_String': 255814000, ...
1 {'MultiSlotBinaryMessage': {'ApplicationID': {... ... {'MMSI': 2276003, 'MMSI_String': 2276003, 'Shi...
2 {'StandardClassBPositionReport': {'AssignedMod... ... {'MMSI': 503760500, 'MMSI_String': 503760500, ...
3 {'PositionReport': {'Cog': 25.2, 'Communicatio... ... {'MMSI': 211648000, 'MMSI_String': 211648000, ...
4 {'StaticDataReport': {'MessageID': 24, 'PartNu... ... {'MMSI': 338467989, 'MMSI_String': 338467989, ...
... ... ...
139625 {'PositionReport': {'Cog': 360, 'Communication... ... {'MMSI': 244730300, 'MMSI_String': 244730300, ...
139626 {'PositionReport': {'Cog': 231.5, 'Communicati... ... {'MMSI': 219025528, 'MMSI_String': 219025528, ...
139627 {'PositionReport': {'Cog': 360, 'Communication... ... {'MMSI': 273252100, 'MMSI_String': 273252100, ...
139628 {'UnknownMessage': {}} ... {'MMSI': 244730043, 'MMSI_String': 244730043, ...
139629 {'ShipStaticData': {'AisVersion': 1, 'CallSign... ... {'MMSI': 211666470, 'MMSI_String': 211666470, ...
[139630 rows x 3 columns]
The core data resides in column “Message”. Each element therein is a dictionary with only 1 key and 1 value. As shown above, the one key might be “ShipStaticData”, “MultiSlotBinaryMessage”, etc., which we can think of as message types. Let’s call this dictionary the level-1 dictionary.
The level-1 dictionary is an extraneous layer of mapping because the desired data resides in the corresponding value, which itself is an entire dictionary. Let’s call the latter the level-2 dictionary. As shown above, the fields in the level-2 dictionary can be “AisVersion”, “ApplicationID”, “Cog”, etc. The valid fields are specified here.
I don’t need the key of the level-1 dictionary because the level-2 dictionary contains a MessageID field that maps to the message type much less ambiguously. Furthermore, dataframe dfAIS also has a MessageType column, not shown above, that contains the same label as the sole key in the level-1 dictionary.
I’ve been educating myself on dataframe manipulations using apply to extract the MessageID from the level-2 dictionary. I also found that json_normalize is a great option. Unfortunately, for nested dictionaries, it requires the hierarchical path to a common field to have the same path components in every row. I cannot use it for the above scenario because Message.ShipStaticData.MessageID is a different path from Message.MultiSlotBinaryMessage.MessageID. In both of these paths, the 2nd of the 3 path components is the extraneous mapping layer that I wish didn’t exist.
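To make the path problem concrete, here is a minimal sketch. The two-row sample is hypothetical (the field values are made up, not taken from my real data), and it only mimics the shape of the “Message” column. It shows that json_normalize puts MessageID into a separate column per message type instead of one shared column.

```python
import pandas as pd

# Hypothetical sample mimicking the "Message" column; values are made up.
messages = [
    {"ShipStaticData": {"MessageID": 5, "AisVersion": 1}},
    {"MultiSlotBinaryMessage": {"MessageID": 26}},
]

# Each nested path becomes its own dotted column name, so the level-1 key
# ends up embedded in every column name.
flat = pd.json_normalize(messages)
print(sorted(flat.columns))
```

Because the level-1 key is part of every path, there is no single MessageID column to select.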
How can I un-nest the nested dictionary so that I can access the MessageID field?
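For what it’s worth, the result I’m after might look something like the sketch below. It assumes every level-1 dictionary has exactly one key, and that the level-2 dictionary can be empty (as in the UnknownMessage row above). The helper name unwrap and the three sample rows are hypothetical, made up purely for illustration; I’m not claiming this is the best or the idiomatic way.

```python
import pandas as pd

def unwrap(message):
    """Return the level-2 dictionary: the sole value of the level-1 dictionary.

    Falls back to an empty dict if the level-1 dictionary has no entries.
    """
    return next(iter(message.values()), {})

# Hypothetical stand-in for dfAIS; the real frame has more columns and rows.
dfAIS = pd.DataFrame({"Message": [
    {"PositionReport": {"MessageID": 1, "Cog": 25.2}},
    {"StaticDataReport": {"MessageID": 24, "PartNumber": False}},
    {"UnknownMessage": {}},
]})

# Drop the extraneous level-1 layer, keeping only the level-2 dictionaries.
level2 = [unwrap(m) for m in dfAIS["Message"]]

# .get() yields a missing value for rows whose level-2 dictionary is empty.
dfAIS["MessageID"] = [d.get("MessageID") for d in level2]

# With the level-1 layer gone, common fields share one path, so
# json_normalize can place them in common columns.
flat = pd.json_normalize(level2)
```

With the level-1 layer removed first, MessageID would live at the same path in every row, which is what json_normalize seems to need.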
P.S. I was also wrestling with how to refer to the labels “ShipStaticData”, “MultiSlotBinaryMessage”, etc. They are the keys of the level-1 dictionary. It would have been convenient to talk about the “value” taken on by the level-1 key, but “value” already refers to the thing that the key maps to. We can very carefully say that the latter is the value associated with the key, or the value for the key, but that is still asking for confusion. Is there a clearer and more succinct way to refer to “ShipStaticData”, “MultiSlotBinaryMessage”, etc.?