Yesterday at work, a colleague claimed that code generated by preprocessor macros is slower than variables and functions written out by hand. The context: we have a class to which member variables are occasionally added, and for each of these members, three methods have to be created following exactly the same pattern. We generate these automatically with macros, as in the simplified test below.
#include <cstdint>
#include <iostream>
#include <vector>
#include <windows.h>

struct Bar
{
    long long a;
    long long b;
    long long c;
    long long d;
};

struct Foo
{
    Bar var[1300];
};

typedef std::vector<Foo> TEST_TYPE;

class A
{
private:
    TEST_TYPE container;
public:
    TEST_TYPE& getcontainer()
    {
        return container;
    }
};

// The trailing backslashes continue the macro across multiple lines.
#define createBMember(TYPE, NAME) \
private:                          \
    TYPE NAME;                    \
public:                           \
    TYPE& get##NAME()             \
    {                             \
        return NAME;              \
    }

class B
{
    createBMember(TEST_TYPE, container);
};
double testA()
{
    A a;
    LARGE_INTEGER frequency;
    LARGE_INTEGER startA, endA;
    if (!QueryPerformanceFrequency(&frequency)) {
        std::cerr << "High-resolution timer not supported." << std::endl;
        return 1;
    }
    QueryPerformanceCounter(&startA);
    for (size_t i = 0; i < 10000; ++i)
    {
        a.getcontainer().push_back(Foo());
    }
    QueryPerformanceCounter(&endA);
    return static_cast<double>(endA.QuadPart - startA.QuadPart) / frequency.QuadPart;
}
double testB()
{
    B b;
    LARGE_INTEGER frequency;
    LARGE_INTEGER startB, endB;
    if (!QueryPerformanceFrequency(&frequency)) {
        std::cerr << "High-resolution timer not supported." << std::endl;
        return 1;
    }
    QueryPerformanceCounter(&startB);
    for (size_t i = 0; i < 10000; ++i)
    {
        b.getcontainer().push_back(Foo());
    }
    QueryPerformanceCounter(&endB);
    return static_cast<double>(endB.QuadPart - startB.QuadPart) / frequency.QuadPart;
}
//----------------------------------------------------[main]
int main()
{
    double Atest = 0;
    double Btest = 0;
    double AHigh = 0;
    double BHigh = 0;
    double ALow = 10000;
    double BLow = 10000;
    double a;
    double b;
    const uint16_t amount = 30;
    for (uint16_t i = 0; i < amount; ++i)
    {
        a = testA();
        AHigh = a > AHigh ? a : AHigh;
        ALow = a < ALow ? a : ALow;
        Atest += a;
    }
    for (uint16_t i = 0; i < amount; ++i)
    {
        b = testB();
        BHigh = b > BHigh ? b : BHigh;
        BLow = b < BLow ? b : BLow;
        Btest += b;
    }
    Atest /= amount;
    Btest /= amount;
    std::cout << "A: " << Atest << std::endl;
    std::cout << "B: " << Btest << std::endl;
    auto size = sizeof(Foo); // 1300 * sizeof(Bar) = 41,600 bytes with 8-byte long long
    return 0;
}
I tried to refute his claim with this test: a fairly large struct (each Foo holds 1300 Bars of four long longs, i.e. 41,600 bytes) that is simply appended to a vector in each test run.
The strange thing, however, was that although the preprocessor runs before compilation, so both classes should be identical, I still measured speed differences. I observed the following:
- In debug mode, without any optimization, whichever class is tested first is faster (see the ordering sketch after this list).
- In release mode, with "whole-program-optimization" and other settings enabled, B is faster. The most recent timings were A: 0.47695 s, B: 0.430825 s.
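One way to rule out such ordering effects, since the first run can warm up the allocator and page cache for the second, would be to interleave the two tests instead of running all A iterations first. A minimal sketch of how the measurement loops in main could be restructured (hypothetical, not what I actually ran):

    // Hypothetical variation of the measurement loops in main():
    // alternate testA() and testB() so that neither class consistently
    // benefits from memory already warmed up by the other.
    for (uint16_t i = 0; i < amount; ++i)
    {
        Atest += testA();
        Btest += testB();
    }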
This confuses me because, as I said, both classes should be token-for-token identical after preprocessing.
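For comparison, this is what class B should expand to after the preprocessor runs (MSVC can write the preprocessed source to a file with the /P switch, so this is checkable):

    // Hand-expanded form of class B; apart from the class name and the
    // stray empty declaration left by the trailing semicolon of the macro
    // invocation, it matches class A exactly.
    class B
    {
    private:
        TEST_TYPE container;
    public:
        TEST_TYPE& getcontainer()
        {
            return container;
        }
    };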
I should also mention that our development environment unfortunately ties us to Visual Studio 2010, which only implements an early snapshot of C++11. That's why I can't use std::chrono for the benchmarking, for example.
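To keep the QueryPerformanceCounter boilerplate out of the test functions, a small timer wrapper would still work on VS2010; a minimal sketch (the Stopwatch name is made up):

    // Minimal stopwatch built on QueryPerformanceCounter; compiles on VS2010.
    class Stopwatch
    {
    private:
        LARGE_INTEGER freq_;
        LARGE_INTEGER start_;
    public:
        Stopwatch()
        {
            QueryPerformanceFrequency(&freq_);
            QueryPerformanceCounter(&start_);
        }
        // Seconds elapsed since construction.
        double elapsed() const
        {
            LARGE_INTEGER now;
            QueryPerformanceCounter(&now);
            return static_cast<double>(now.QuadPart - start_.QuadPart) / freq_.QuadPart;
        }
    };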
I haven't been able to test it with other compilers yet. I also looked at the generated assembly on godbolt.org, but didn't spot anything that would explain such a big difference.
Admittedly, I'm still a trainee and would rate my skills as amateur. Does anyone have an idea what could be causing this difference in speed?