I am trying to set up a graph data exchange process between ArangoDB and DGL so that I can train graph neural networks on data stored in ArangoDB collections.
Case 1
I export data from ArangoDB to DGL using a metagraph, train a GNN, update the target labels, and import the results back into ArangoDB.
op_metagraph = {
    "vertexCollections": {
        "tmp_v_accounts": {
            "features": {
                "creation_dt": IdentityEncoder(dtype=torch.int),
                "foreign_flg": CategoricalEncoder()
            }
        }
    },
    "edgeCollections": {
        "tmp_e_operations": {
            "features": {
                "operation_dt": IdentityEncoder(dtype=torch.int),
                "operation_sum": IdentityEncoder(dtype=torch.int)
            },
            "label": "fraud_flg"
        }
    }
}
dgl_tx = adbdgl_adapter.arangodb_to_dgl(
    "operations_graph",
    metagraph=op_metagraph
)
# learning...
adb_g = adbdgl_adapter.dgl_to_arangodb("new_graph", dgl_tx)
The initial problem is that, although the network trained, I couldn’t interpret the results, because the import created new collections and:
- I could not add technical attributes (such as an element’s _key or _id) to the original graph data, as they would wrongly be used as features during training;
- I could not match records between the old and new collections using non-technical attributes, because those were changed during training; even if they had not been, there is no guarantee that a record’s attributes are unique across the data.
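One way to keep records matchable without putting keys into the feature tensors is a side table that maps each document's _key to the integer ID it receives in the graph. The following is only a minimal sketch of that idea: `build_key_index` and the sample keys are hypothetical, and it assumes the order in which documents become DGL node or edge IDs is known, which I have not confirmed for the adapter.

```python
from typing import Dict, List, Sequence, Tuple


def build_key_index(keys: Sequence[str]) -> Tuple[Dict[str, int], List[str]]:
    """Assign each ArangoDB _key a consecutive integer ID (0, 1, 2, ...).

    Returns a forward map (_key -> ID) and a reverse list (ID -> _key).
    The table lives outside the graph, so the model never sees it.
    """
    index_by_key: Dict[str, int] = {}
    key_by_index: List[str] = []
    for key in keys:
        if key in index_by_key:
            raise ValueError(f"duplicate _key: {key!r}")
        index_by_key[key] = len(key_by_index)
        key_by_index.append(key)
    return index_by_key, key_by_index


# Hypothetical edge keys, listed in the order the edges enter the graph.
edge_keys = ["9001", "9002", "9007"]
index_by_key, key_by_index = build_key_index(edge_keys)

# After training, look up which document a given edge ID came from.
print(key_by_index[2])       # 9007
print(index_by_key["9002"])  # 1
```

Because the keys stay strings in an ordinary Python list, no encoder has to handle them and they cannot leak into training.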
Case 2
I tried including technical attributes in the metagraph, hoping to exclude them from training later by editing the model itself.
Including the _key attribute as a string:
"features" : {
    ...,
    "_key" : CategoricalEncoder()
}
This produces KeyError: <some arbitrary number>, which I assume (but cannot confirm) means that DGL cannot load that much categorical data into the graph.
Creating a new attribute as TO_NUMBER(_key) via AQL and including it as an integer:
"features" : {
    ...,
    "t_int_key" : IdentityEncoder(dtype=torch.int)
}
This produces the following error:
Exception has occurred: DGLError
[12:18:17] C:\Users\peizhou\workspace\DGL_scripts\release\win-64\dgl\src\graph\unit_graph.cc:69: Check failed: aten::IsValidIdArray(src):
I could not find an explanation for this. What surprised me is that my username is not peizhou, and no such user exists on my computer, so the error does not even seem to originate from my own installation of DGL.
Summary
The question that stems from this is: how can I either
- update the existing ArangoDB collections with the data obtained from DGL, rather than importing it into entirely new collections, or
- include technical attributes, such as the elements’ keys, in the DGL graph, so that I can match the records in the old and new collections after the import?
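To make the first option concrete, here is a rough sketch of what I have in mind: pair each prediction with the _key of the document it belongs to, then send partial updates to the existing collection in batches. The helper names and the `predicted_fraud_flg` field are hypothetical, and the writer is passed in as a plain callable so the sketch runs without a database. With python-arango, I assume the collection's `update_many` method could fill that role.

```python
from typing import Any, Callable, Dict, List, Sequence


def make_update_docs(
    keys: Sequence[str],
    predictions: Sequence[Any],
    field: str = "predicted_fraud_flg",  # hypothetical attribute name
) -> List[Dict[str, Any]]:
    """Build one partial-update document per record: its _key plus the new value."""
    if len(keys) != len(predictions):
        raise ValueError("keys and predictions must have the same length")
    return [{"_key": k, field: int(p)} for k, p in zip(keys, predictions)]


def write_in_batches(
    docs: Sequence[Dict[str, Any]],
    write_batch: Callable[[List[Dict[str, Any]]], Any],
    batch_size: int = 1000,
) -> int:
    """Pass the documents to `write_batch` in chunks; return how many were sent."""
    written = 0
    for start in range(0, len(docs), batch_size):
        batch = list(docs[start:start + batch_size])
        write_batch(batch)
        written += len(batch)
    return written


# Stand-in writer that only records the batches it receives.
# Assumption: with python-arango this could instead be
# db.collection("tmp_e_operations").update_many
sent: List[List[Dict[str, Any]]] = []
docs = make_update_docs(["9001", "9002", "9007"], [0, 1, 1])
count = write_in_batches(docs, sent.append, batch_size=2)
print(count)      # 3
print(len(sent))  # 2
```

Writing the prediction to a separate attribute rather than overwriting fraud_flg would keep the original labels available for comparison.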