Title: How to Standardize Features and Relationship Weights for GraphSAGE in Neo4j?
Body:
I am working on preparing my Neo4j graph for machine learning with the GraphSAGE algorithm, and I need help with standardizing node features and relationship weights across heterogeneous nodes and relationships.
My graph consists of several node types (User, Product, Review, Word) and relationships (BOUGHT, REVIEWS, WROTE, CO_OCCURS, CONTAINS). The Word nodes have a vector property (a list of floats), and I need to create a uniform feature vector of length 100 for the other node types (User, Product, Review) where non-existent properties are represented by zeros, and the first element is a dummy feature set to 1.
Additionally, I am working with two types of weighted relationships whose weights (count and rating) differ significantly in range and significance. I need to standardize these weights for use in GraphSAGE.
Here is what I am currently doing:
- Setting Dummy Features:
cypher
CALL {
  MATCH (u:User) SET u.dummyFeature = 1
  RETURN count(*) AS updatedCount, 'User' AS nodeType
  UNION ALL
  MATCH (p:Product) SET p.dummyFeature = 1
  RETURN count(*) AS updatedCount, 'Product' AS nodeType
  UNION ALL
  MATCH (r:Review) SET r.dummyFeature = 1
  RETURN count(*) AS updatedCount, 'Review' AS nodeType
}
RETURN nodeType, updatedCount;
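- Updating Feature Vectors:
cypher
// Word nodes keep their embedding; all other nodes get [1.0, 0.0, ..., 0.0] (length 100).
// The zero padding uses a plain list comprehension instead of an APOC helper.
MATCH (n)
SET n.featureVector = CASE
  WHEN n:Word THEN n.vector
  ELSE [1.0] + [i IN range(1, 99) | 0.0]
END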
- Standardizing Relationship Weights:
cypher
// IS NOT NULL replaces exists(r.prop), which was removed in Neo4j 5.
MATCH ()-[r]->()
SET r.unifiedWeight = CASE
  WHEN r.count IS NOT NULL THEN toFloat(r.count)
  WHEN r.rating IS NOT NULL THEN toFloat(r.rating)
  ELSE 1.0
END
Questions:
- How can I ensure that the feature vector creation and weight standardization are set up well for GraphSAGE training? Is converting scalars to vectors like this the standard procedure?
- Are there more efficient or standardized methods to handle feature vector alignment and relationship weight normalization in Neo4j for machine learning purposes?
Any advice or improvements on this setup would be greatly appreciated!
Our GraphSAGE implementation supports having different properties on different, disjoint node labels, so it would be better to enable that mode of GraphSAGE. You do this by setting a value for projectedFeatureDimension.
That way, you can project a graph that has only the dummy feature for the node labels User, Product and Review, and the vector property for the Word nodes. If you set projectedFeatureDimension to 100, the algorithm takes care of the normalization by using learned projection matrices that map the input features into 100-dimensional features for each node.
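A minimal sketch of that setup, assuming GDS 2.x procedure names (older releases use gds.graph.create instead of gds.graph.project). The graph name 'reviews', the model name 'reviewsSage', the catch-all relationship projection and its UNDIRECTED orientation are placeholders chosen for illustration, not something your data requires:

```cypher
// Project each label with only the properties it actually has.
// 'reviews' is a hypothetical graph name.
CALL gds.graph.project(
  'reviews',
  {
    User:    {properties: ['dummyFeature']},
    Product: {properties: ['dummyFeature']},
    Review:  {properties: ['dummyFeature']},
    Word:    {properties: ['vector']}
  },
  // Assumption: load all relationship types as undirected.
  {ALL: {type: '*', orientation: 'UNDIRECTED'}}
);

// Setting projectedFeatureDimension switches on the multi-label mode:
// every label needs only a subset of featureProperties.
CALL gds.beta.graphSage.train('reviews', {
  modelName: 'reviewsSage',
  featureProperties: ['dummyFeature', 'vector'],
  projectedFeatureDimension: 100
})
YIELD modelInfo
RETURN modelInfo.modelName AS modelName;
```

With this projection there is no need to write a zero-padded featureVector property back to the database first.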
You might also want to use GDS to compute the degree and use that as a “dummy” feature. It might or might not give you better results than the constant value of 1.0.
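As a rough sketch, the degree can be added to the in-memory graph with the degree centrality procedure in mutate mode. This assumes a graph has already been projected under the hypothetical name 'reviews'; the property name 'degree' is likewise only an example:

```cypher
// Store each node's degree as an in-memory node property.
// 'reviews' and 'degree' are placeholder names.
CALL gds.degree.mutate('reviews', {mutateProperty: 'degree'})
YIELD nodePropertiesWritten
RETURN nodePropertiesWritten;
```

The new property can then be listed in featureProperties in place of, or next to, the constant dummy feature when training.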
Hope it helps 🙂
/ Jacob from GDS team