I am trying to solve a problem of finding a max repetitive sub-tree in an object tree.
By the object tree I mean a tree where each leaf and node has a name.
Each leaf has a type and a value of that type associated with that leaf.
Each node has a set of leaves / nodes in certain order.
Given an object tree that – we know – has a repetitive sub-tree in it.
By repetitive I mean 2 or more sub-trees that are similar in everything (names/types/order of sub-elements) but the values of leaves. No nodes/leaves can be shared between sub-trees.
Problem is to identify these sub-trees of the max height.
I know that the exhaustive search can do the trick. I am rather looking for more efficient approach.
You can hash your sub-trees: for each node, generate a hash value based on the values of interest (i.e., excluding leaf values) and the hash values of its children, in order.
Use this hash to insert your nodes into an appropriately-sized hash table, and check collisions to determine which are actually repetitive, and which are accidental.
1
You could trade off some storage for computation:
On insert check for similarity with parent and if found, copy “similar tree counter” from parent and increment for child. What you do with that is up to you. Either leave it in a tree or a keep a map somewhere indexed on the head of each sub-tree.
At minimum you’re only doing n-1 comparisons, not counting whatever accounting you choose to do to keep track of the sub-trees.