I have 54 data files generated by a simulation. Each file has 10 million rows and is several GB in size.
I need to read each file, compute its autocorrelation, and fit a curve to the result.
What is the best way to do this in the shortest time possible?
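For reference, the quantity I compute per file is the autocorrelation at lag k (assuming here the usual normalized sample estimate, in case the exact computation matters for performance suggestions):

```latex
\hat{\rho}(k) = \frac{\sum_{t=1}^{n-k}\left(x_t - \bar{x}\right)\left(x_{t+k} - \bar{x}\right)}{\sum_{t=1}^{n}\left(x_t - \bar{x}\right)^2}
```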
I am using C#, parallel loops, and stream readers. To give you an idea of the current performance: reading and processing 2 million lines from each of nine files (18 million lines in total) takes about 4 hours.
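Here is a stripped-down sketch of my current approach (the real code is more involved, and the fit step is omitted). The file names and lag cutoff are placeholders; the sketch writes two small synthetic files so it runs end to end:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public class AcfDemo
{
    // Naive normalized autocorrelation up to maxLag: O(n * maxLag).
    public static double[] Autocorrelation(double[] x, int maxLag)
    {
        int n = x.Length;
        double mean = x.Average();

        // Denominator: total sum of squared deviations.
        double variance = 0.0;
        foreach (double v in x)
            variance += (v - mean) * (v - mean);

        var acf = new double[maxLag + 1];
        for (int k = 0; k <= maxLag; k++)
        {
            double sum = 0.0;
            for (int i = 0; i < n - k; i++)
                sum += (x[i] - mean) * (x[i + k] - mean);
            acf[k] = sum / variance;
        }
        return acf;
    }

    public static void Main()
    {
        // Two tiny synthetic files stand in for the 54 multi-GB simulation
        // outputs, so the sketch is runnable as-is.
        var rng = new Random(1);
        var files = new List<string>();
        for (int f = 0; f < 2; f++)
        {
            string path = Path.Combine(Path.GetTempPath(), $"acf_demo_{f}.dat");
            File.WriteAllLines(path,
                Enumerable.Range(0, 1000)
                          .Select(_ => rng.NextDouble().ToString(CultureInfo.InvariantCulture)));
            files.Add(path);
        }

        // One worker per file, roughly as in my current code.
        Parallel.ForEach(files, path =>
        {
            double[] values = File.ReadLines(path)
                .Select(s => double.Parse(s, CultureInfo.InvariantCulture))
                .ToArray();

            double[] acf = Autocorrelation(values, 10);
            Console.WriteLine($"{Path.GetFileName(path)}: acf[0]={acf[0].ToString("F3", CultureInfo.InvariantCulture)}");
        });
    }
}
```

The slow part is the per-file read-parse loop, not the fit.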
This is the configuration of the remote computer I am working on:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Stepping: 7
CPU MHz: 2625.762
CPU max MHz:           2800.0000
CPU min MHz:           1200.0000
BogoMIPS: 4600.17
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-11
Should I migrate to Amazon AWS, or should I rent time on an HPC cluster?