I have this simple piece of code that iterates through 12,000 detection results (from a YOLOv8 detection model) and performs some processing on them. I've already applied a few optimizations, such as pre-allocating the result vectors and only processing candidates whose class scores are not all zero. However, the code is still too slow for my use case.
```cpp
// Prepare containers for detected boxes, confidences, and class IDs
std::vector<cv::Rect> boxes;
std::vector<float> confidences;
std::vector<int> class_ids;

// Pre-allocate vectors for efficiency
boxes.reserve(cols);
confidences.reserve(cols);
class_ids.reserve(cols);

clock_t start_time = clock();

// Iterate through the columns of the output array (one column per candidate detection)
for (int i = 0; i < cols; i++) {
    cv::Mat classes_scores = output.col(i).rowRange(4, output.rows);

    // Find the class with the maximum score
    double maxScore;
    cv::Point maxClassLoc;
    if (cv::countNonZero(classes_scores) > 0) {
        cv::minMaxLoc(classes_scores, nullptr, &maxScore, nullptr, &maxClassLoc);
        int maxClassIndex = maxClassLoc.y; // Note: use maxClassLoc.y for row index

        // Apply confidence threshold
        if (maxScore >= 0.25) {
            // Extract the box as center x, center y, width, height
            float x = output.at<float>(0, i);
            float y = output.at<float>(1, i);
            float w = output.at<float>(2, i);
            float h = output.at<float>(3, i);

            // Convert to a top-left corner and scale to the original image size
            int x_scaled = static_cast<int>((x - 0.5f * w) * scale);
            int y_scaled = static_cast<int>((y - 0.5f * h) * scale);
            int w_scaled = static_cast<int>(w * scale);
            int h_scaled = static_cast<int>(h * scale);

            cv::Rect box(x_scaled, y_scaled, w_scaled, h_scaled);
            boxes.push_back(box);
            confidences.push_back(static_cast<float>(maxScore));
            class_ids.push_back(maxClassIndex);
        }
    }
}
```
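For comparison, here is a minimal sketch of the same decoding step using only the C++ standard library. It assumes the network output is available as one contiguous row-major `float` buffer laid out like the `cv::Mat` above: rows 0 to 3 hold center x, center y, width and height, the remaining rows hold one score per class, and each column is one candidate (with OpenCV, such a pointer can be obtained from `output.ptr<float>()` when `output.isContinuous()` is true). `Box`, `Detection` and `decode_detections` are hypothetical names. The sketch makes two changes: it walks the class rows in memory order while keeping a running per-candidate maximum, instead of striding down each column, and it drops the separate `countNonZero` pass, since a candidate whose scores are all zero has a maximum of 0 and already fails the 0.25 threshold.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ stand-in for cv::Rect.
struct Box {
    int x, y, width, height;
};

// Hypothetical result type: one entry per candidate that passes the threshold.
struct Detection {
    Box box;
    float confidence;
    int class_id;
};

// Assumes `data` is a contiguous row-major buffer of `rows` x `cols` floats:
// rows 0..3 = cx, cy, w, h; rows 4..rows-1 = one score per class.
// Each column is one candidate detection.
std::vector<Detection> decode_detections(const float* data, int rows, int cols,
                                         float scale, float threshold = 0.25f) {
    std::vector<Detection> out;
    if (rows <= 4 || cols <= 0) return out;

    // Seed the running maximum with class 0 (first class row), so ties keep
    // the lowest class index, as minMaxLoc reports the first maximum.
    const float* first = data + static_cast<std::size_t>(4) * cols;
    std::vector<float> best_score(first, first + cols);
    std::vector<int> best_class(cols, 0);

    // Walk each remaining class row left to right: consecutive reads hit
    // consecutive memory instead of jumping `cols` floats per step.
    for (int r = 5; r < rows; ++r) {
        const float* row = data + static_cast<std::size_t>(r) * cols;
        for (int i = 0; i < cols; ++i) {
            if (row[i] > best_score[i]) {
                best_score[i] = row[i];
                best_class[i] = r - 4;
            }
        }
    }

    // Box rows, each also read contiguously.
    const float* cx = data;
    const float* cy = data + static_cast<std::size_t>(cols);
    const float* w = data + 2 * static_cast<std::size_t>(cols);
    const float* h = data + 3 * static_cast<std::size_t>(cols);

    for (int i = 0; i < cols; ++i) {
        if (best_score[i] < threshold) continue;  // same test as maxScore >= 0.25
        Box b{static_cast<int>((cx[i] - 0.5f * w[i]) * scale),
              static_cast<int>((cy[i] - 0.5f * h[i]) * scale),
              static_cast<int>(w[i] * scale),
              static_cast<int>(h[i] * scale)};
        out.push_back({b, best_score[i], best_class[i]});
    }
    return out;
}
```

Whether the row-wise traversal is actually faster depends on the device's cache behavior, so it would need to be confirmed by timing both versions on the target.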
How can I optimize this code further to reduce its runtime?
My target platform is a constrained device, so I'm limited to standard C++ functions and cannot use any third-party libraries. To compile, I simply run `g++ draw_detections.cpp`.
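The snippet times the loop with `clock()`, which reports processor time consumed by the program rather than elapsed time. If wall-clock latency is what matters on the device, `std::chrono::steady_clock` is a standard-library alternative. This is a minimal sketch; `time_ms` is a hypothetical helper name, not part of the original code.

```cpp
#include <chrono>

// Hypothetical helper: runs `fn` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double time_ms(Fn&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

For example, wrapping the detection loop in a lambda and passing it to `time_ms` gives one measurement; averaging several runs gives a more stable number on a noisy device.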
Here is the full C++ code, along with the inputs needed to reproduce and run it.