Joining multiple Spark datasets in Java is very slow
In Spark, I am using MinHashLSH to do an approximate similarity join. I run this join on N sets of columns.
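For context, the similarity being approximated here is Jaccard similarity between sets. Below is a minimal plain-Java sketch of the MinHash idea, only to illustrate what each join is estimating. It is not Spark's MinHashLSH implementation: the class name `MinHashSketch`, the hash family, the modulus, and the seed handling are all assumptions made up for this illustration.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Illustrative only: a plain-Java MinHash sketch, NOT Spark's MinHashLSH.
class MinHashSketch {
    // Prime modulus for the hash family h(x) = (a*x + b) mod PRIME (an assumption of this sketch).
    private static final long PRIME = 2147483647L;

    // Builds a signature: for each of numHashes hash functions,
    // keep the minimum hash value over all items in the set.
    static int[] signature(Set<Integer> items, int numHashes, long seed) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("set must be non-empty");
        }
        Random rnd = new Random(seed); // same seed => same hash functions
        int[] sig = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            long a = 1 + rnd.nextInt((int) (PRIME - 1)); // 1..PRIME-1
            long b = rnd.nextInt((int) PRIME);           // 0..PRIME-1
            long min = Long.MAX_VALUE;
            for (int x : items) {
                long h = (a * Math.floorMod((long) x, PRIME) + b) % PRIME;
                min = Math.min(min, h);
            }
            sig[i] = (int) min; // h < PRIME, so it fits in an int
        }
        return sig;
    }

    // Fraction of signature positions that agree; this estimates Jaccard similarity.
    static double estimateSimilarity(int[] s1, int[] s2) {
        if (s1.length != s2.length) {
            throw new IllegalArgumentException("signatures must have equal length");
        }
        int matches = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                matches++;
            }
        }
        return (double) matches / s1.length;
    }

    // Exact Jaccard similarity, |A ∩ B| / |A ∪ B|, for comparison with the estimate.
    static double jaccard(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return union == 0 ? 0.0 : (double) inter.size() / union;
    }
}
```

Both signatures must be built with the same `numHashes` and `seed`, otherwise the positions are not comparable. More hash functions should give a tighter estimate at the cost of more work per row.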