I hope this isn’t a bad question to ask. For a university project, I’ll have to process several hundred (if not thousands of) .csv files, each containing two curves: one recorded while the machine approaches the target and the other while it retracts.
The variables are height, deflection.nm, separation, deflection.nN, and the categorical variable segment, which distinguishes the Approach and Retract curves. Because of the volume of curves we’ll have to process, I’m inclined to use libraries like mlpack or ROOT for this. My question is: how should I go about preparing a single file (or multiple files) to store all the data so that it can serve as input to a machine learning algorithm in an efficient way? (I’ve sketched the layout I’m currently considering below.)
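To make that concrete, this is roughly what I had in mind, nothing final: one long-format file where every row is a single sample point, tagged with an id for the curve it came from and with segment encoded as 0/1. The directory name `gathered_csv`, the output name `all_curves.csv`, and the assumption that segment is the last column of each per-curve file are just placeholders of mine for illustration, and I’m assuming the files have already been collected into one flat directory (the gathering step I describe further down).

```cpp
// Sketch: merge all per-curve .csv files into one long-format table.
// Assumptions (mine, not from any library): each file has a one-line header
// "height,deflection.nm,separation,deflection.nN,segment", and the segment
// column holds the literal strings "Approach" or "Retract".
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

int main() {
    const fs::path input_dir = "gathered_csv";   // flat directory of gathered files (assumed)
    std::ofstream out("all_curves.csv");          // combined output file

    // An extra column identifies the source curve, and the categorical segment
    // is encoded as 0 (Approach) / 1 (Retract) so a logistic regression
    // (e.g. mlpack's LogisticRegression) can consume it directly.
    out << "curve_id,height,deflection.nm,separation,deflection.nN,segment\n";

    std::size_t curve_id = 0;
    for (const auto& entry : fs::directory_iterator(input_dir)) {
        if (entry.path().extension() != ".csv") continue;

        std::ifstream in(entry.path());
        std::string line;
        std::getline(in, line);                   // skip the header row

        while (std::getline(in, line)) {
            if (line.empty()) continue;
            // The last comma-separated field is the segment label.
            const auto pos = line.rfind(',');
            if (pos == std::string::npos) continue;
            const std::string values  = line.substr(0, pos);
            const std::string segment = line.substr(pos + 1);
            const int label = (segment.find("Retract") != std::string::npos) ? 1 : 0;
            out << curve_id << ',' << values << ',' << label << '\n';
        }
        ++curve_id;
    }
    std::cout << "Merged " << curve_id << " curve files into all_curves.csv\n";
}
```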
I’ll probably use a shell script (or a small program, sketched below) to first gather all the .csv files, because they’re scattered across multiple directories for some reason. They’re all named in the same fashion, which would inevitably create conflicts once they sit in one place, so I’ll have to rename them as well. The last step is the one I’m a bit unsure about: how to properly store the data and then encode, for example, the group each curve belongs to, so it can be fed to a logistic regression.
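This is the kind of gathering/renaming step I was thinking of, written in C++ with std::filesystem instead of a shell script just to keep everything in one language. The directory names `raw_data` and `gathered_csv` are hypothetical, and making the new names unique relies on my assumption that prefixing each file name with its (sanitised) relative path is enough to avoid collisions.

```cpp
// Sketch: collect all scattered .csv files into one flat directory,
// renaming each one after its relative path so that identically named
// files from different directories no longer clash.
#include <algorithm>
#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

int main() {
    const fs::path source_root = "raw_data";      // assumed top-level directory of the raw files
    const fs::path target_dir  = "gathered_csv";  // flat directory used by the merge step above
    fs::create_directories(target_dir);

    for (const auto& entry : fs::recursive_directory_iterator(source_root)) {
        if (!entry.is_regular_file() || entry.path().extension() != ".csv")
            continue;

        // Build a unique name such as "sampleA_run3_curve.csv" from the
        // relative path, replacing directory separators with underscores.
        std::string unique = fs::relative(entry.path(), source_root).string();
        std::replace(unique.begin(), unique.end(),
                     static_cast<char>(fs::path::preferred_separator), '_');

        fs::copy_file(entry.path(), target_dir / unique,
                      fs::copy_options::overwrite_existing);
    }
    std::cout << "Copied all .csv files into " << target_dir << '\n';
}
```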
Thanks for any tips!