I’m not sure what the most efficent solution to this algorithm is. I’m not sure this fits exact cover, and approaching it via exhaustive backtracking seems somewhat inefficient. Any suggestions on how to approach this?
There are 210 Invoices and 1700 bills – these bills add up to these invoices
The association between bills and invoices is lost . The only way to match them is by adding them up to correct amounts that are equal to the invoices.
For example, consider the case where you have 2 invoices for 80, 210 and you have bills for values 50, 10, 10, 30, 20, 70, 100.
One of the possible solution is:
80 = 50 + 30
210 = 10 + 10 + 20 + 70 + 100
Other possible solution is
80 = 50 + 10 + 20
210 = 30 + 20 + 70 + 100
What is the best possible way to get all solutions? Remember you are dealing with big datasets.
This problem is harder than solving it for 2 invoices (just group all invoices but one). If we try solving it for 2 invoices and 1700 bills we see that the sum of the 2 invoices is equal to the sum of the 1700 bills, so all we need to find is a subset of the 1700 bills whose sum is equal to one of the invoices. Thus we have a subset sum problem, which means that this is a NP-hard problem. So, unless we have some reasonable restriction on what values the invoices can take this problem is not solvable in a reasonable amount of time.
The answer is thus that this problem is not solvable with the given information. However, if we assume that the largest invoice is reasonably small we can use dynamic programming to solve this problem.