I am implementing an algorithm which requires taking thousands of random samples from an array of 100 integers.
std::vector<int> weight_vector(100);
I would like to understand how I can improve the running time of my current approach, as it is the main bottleneck in my overall algorithm. Consider the following code which times weighted random sampling.
// let's just populate weight_vector somehow: 1, 2, ..., 99
iota(weight_vector.begin(), weight_vector.end(), 0);
std::discrete_distribution<int> weighted_dist = std::discrete_distribution<>(
weight_vector.begin(),
weight_vector.end()
)
auto start std::chrono::high_resolution_clock::now();
for (int n = 1; n <= 10000; n++)
{
int random_index = weighted_dist(generator);
// do something with randomly sampled index...
}
auto end std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> ms_double = end-start;
std::cout << ms_double.count() << "ms" << std::endl;
The running time of this loop is around 2.9ms. This is currently unacceptably high, and bringing this down to an order of magnitude of 0.1ms would be a great success. Is there any obvious way I can do this? Or perhaps a more efficient way to generate my random indices?
I suspect this might be possible because instead setting int random_index = rand()%100;
gives the loop a running time of less than 0.1ms. Of course, this is the wrong distribution, however it leads me to suspect sampling random bits need not be as time consuming as the first approach (e.g. can I get a weighted random bit through one or more calls to rand()
?). I am not concerned about the “quality” of the randomness, and if I was sampling a uniform index, rand()
would be entirely sufficient.
I would like to consider this before I turn towards parallelisation (all of my loop iterations are independent and the end result is basically an average of the loop values, so in theory this would be possible). If I eventually go this way, would parallelisation be feasible? Or are running times on the order of ms below the threshold at which the overheads of thread management begin to dominate?
Thank you very much for your help, and I apologise for the vague nature of my question. I am quite inexperienced with C++ and ‘low-level’ programming, so please excuse me.