Tag Archive for parallel-processing, dask, dask-dataframe

BrokenProcessPool Error processing large data set

I am trying to process a large data set of 167373 exercise logs. For context, I combine all the coordinate data in each exercise log into a vector and then calculate cosine similarity to find the most similar logs, which may indicate common routes for running or biking. I need to write my code in dask/parallel, and so far I think it breaks when I try to calculate the (log, cosine similarity value) pairs.
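For illustration only, here is a minimal sketch of one way to compute the (log, cosine similarity) pairs without holding every pairwise value in memory at once. The post does not show the failing code, so this is a guess at the problem rather than a diagnosis: BrokenProcessPool is raised when a worker process dies abruptly, and running out of memory is one common reason for that. A full 167373 x 167373 similarity matrix stored as float64 would take roughly 224 GB. The sketch uses plain NumPy rather than dask, assumes every log has already been turned into a vector of the same length, and the names `top_k_cosine`, `k`, and `block_size` are hypothetical.

```python
import numpy as np


def top_k_cosine(vectors, k=5, block_size=1024):
    """Return (indices, similarities) of the k most similar other rows per row.

    Assumption: `vectors` is an (n, d) array with one fixed-length vector per log.
    """
    vectors = np.asarray(vectors, dtype=np.float64)
    n = vectors.shape[0]
    if n < 2 or k < 1:
        raise ValueError("need at least two vectors and k >= 1")
    k = min(k, n - 1)

    # Normalize rows once so cosine similarity reduces to a dot product.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero vectors as zeros
    unit = vectors / norms

    top_idx = np.empty((n, k), dtype=np.int64)
    top_sim = np.empty((n, k), dtype=np.float64)

    # Process a block of rows at a time: only a (block_size, n) slab is live.
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        sims = unit[start:stop] @ unit.T
        # Exclude each row's similarity with itself.
        sims[np.arange(stop - start), np.arange(start, stop)] = -np.inf

        # Pick the k largest per row, then sort those k in descending order.
        part = np.argpartition(-sims, k - 1, axis=1)[:, :k]
        part_sims = np.take_along_axis(sims, part, axis=1)
        order = np.argsort(-part_sims, axis=1)
        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        top_sim[start:stop] = np.take_along_axis(part_sims, order, axis=1)

    return top_idx, top_sim


if __name__ == "__main__":
    demo = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    idx, sim = top_k_cosine(demo, k=1, block_size=2)
    print(idx.ravel(), sim.ravel())
```

Each block is independent of the others, so the loop body could in principle be handed to separate dask tasks or worker processes, with each worker returning only its small top-k arrays instead of a full slab of similarities.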
