I’m working on a simple 4×4 matrix class template that stores its elements in an internal flat 1D array. Within the class so far, the default constructors should all be “trivial”, and I currently have overloads of both operator[] and operator() for indexing: direct indexing via [], and (row, col) indexing via () using the well-known formula:
index = row*cols + col;
which maps a 2D position onto the underlying 1D array. Fairly straightforward here. I’m maintaining [row][col] ordering and indexing for cache locality.
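To make the mapping concrete, here is a minimal sketch of the formula on its own, outside the class. The names kRows, kCols, flat_index, row_of, col_of and at are hypothetical and exist only for this illustration; they are not part of Mat4.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants for a 4x4 layout (not taken from the Mat4 class).
constexpr std::size_t kRows = 4;
constexpr std::size_t kCols = 4;

// Map a (row, col) pair to its offset in a flat row-major array.
constexpr std::size_t flat_index(std::size_t row, std::size_t col) {
    return row * kCols + col;
}

// Inverse mapping: recover the row and column from a flat offset.
constexpr std::size_t row_of(std::size_t index) { return index / kCols; }
constexpr std::size_t col_of(std::size_t index) { return index % kCols; }

// Elements of one row sit next to each other in memory.
static_assert(flat_index(0, 0) == 0);
static_assert(flat_index(1, 0) == 4);
static_assert(flat_index(2, 3) == 11);
static_assert(flat_index(3, 3) == 15);

// Read one element of a flat 16-element array through the 2D mapping.
inline int at(const int (&flat)[kRows * kCols], std::size_t row, std::size_t col) {
    assert(row < kRows && col < kCols);  // precondition: indices are in range
    return flat[flat_index(row, col)];
}
```

Since consecutive columns of a row map to consecutive offsets, walking a row touches adjacent memory, which is the reason for keeping [row][col] ordering.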
I’m also providing begin and end iterators for ease of use within range-based for loops, in the same way any basic standard container or algorithm would use them.
Simple pseudo example:
Mat4<unsigned int> my_mat4 { /* values here */ };
for (auto& v : my_mat4 ) {
// iterate over each element and do work on v.
}
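As a rough illustration of what exposing begin and end buys, the sketch below uses a hypothetical stand-in type, Grid4, which is only storage plus iterators and is not the actual Mat4. The helper names sum_elements, double_in_place and max_element_of are likewise made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>

// Hypothetical stand-in for Mat4: flat storage plus begin()/end().
template <typename T>
struct Grid4 {
    T data[16];

    T& operator[](std::size_t i) {
        assert(i < 16);  // plain precondition, used here instead of clamping
        return data[i];
    }
    T* begin() { return data; }
    const T* begin() const { return data; }
    T* end() { return data + 16; }
    const T* end() const { return data + 16; }
};

// Raw pointers are valid iterators, so standard algorithms work directly.
template <typename T>
T sum_elements(const Grid4<T>& g) {
    return std::accumulate(g.begin(), g.end(), T{});
}

// A range-based for loop only needs begin() and end().
template <typename T>
void double_in_place(Grid4<T>& g) {
    for (auto& v : g) {
        v *= 2;
    }
}

// Another standard algorithm over the same iterator pair.
template <typename T>
T max_element_of(const Grid4<T>& g) {
    return *std::max_element(g.begin(), g.end());
}
```

Because the iterators are plain pointers into contiguous storage, nothing beyond the four begin/end overloads is needed for either the loop or the algorithms.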
So far so good.
Now I’m at the point of adding arithmetic operators, and I’m planning on using vector intrinsics (SIMD instructions); in this particular example, SSE instructions. I can’t test AVX or AVX512 on my system, which only supports up to SSE4.1 (a personal limitation). Just a little bit of background for context.
Okay, so this leads to the need for template specialization, as my goal is to support most trivial arithmetic types, including the various integer and floating-point types.
And this is where I’m getting compiler errors on Compiler Explorer, where I have the following compiler and options set:
x86-64 gcc 14.1
- flags: g++ -std=c++17 -O3 -msse2
- error:
<source>:66:5: error: extra qualification 'Mat4<T>::' on member 'operator+' [-fpermissive]
   66 |     Mat4<T>::operator+(const Mat4& other) const {
      |     ^~~~~~~
I’m so close, yet I’m banging my head against the wall here. Since I’m doing this in Compiler Explorer, I’m not actually defining the template in a header file. I’ve also tried declaring the arithmetic operators outside of the class, and that failed even worse, so I kept the floating-point specializations outside of the class and brought the integer types back in.
The use of std::enable_if<> is really throwing me for a loop here.
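For reference, this is the general shape std::enable_if usually takes when it selects between overloads by type. It is only a sketch under assumptions: add16 is a hypothetical free function over raw 16-element arrays, not the member operator+ of Mat4, and it assumes an x86 target with SSE2 available.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics
#include <type_traits>

// Hypothetical overload: only participates when T is float.
template <typename T>
typename std::enable_if<std::is_same<T, float>::value>::type
add16(const T* a, const T* b, T* out) {
    assert(a && b && out);
    for (std::size_t i = 0; i < 16; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
}

// Hypothetical overload: only participates for 32-bit integral types.
template <typename T>
typename std::enable_if<std::is_integral<T>::value && sizeof(T) == 4>::type
add16(const T* a, const T* b, T* out) {
    assert(a && b && out);
    for (std::size_t i = 0; i < 16; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_add_epi32(va, vb));
    }
}

// Scalar fallback for every other type (double, 8/16/64-bit integers, ...).
template <typename T>
typename std::enable_if<!std::is_same<T, float>::value &&
                        !(std::is_integral<T>::value && sizeof(T) == 4)>::type
add16(const T* a, const T* b, T* out) {
    assert(a && b && out);
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = static_cast<T>(a[i] + b[i]);
    }
}
```

Each return type only exists when its condition holds, and the three conditions are mutually exclusive, so for any given T exactly one add16 takes part in overload resolution. With the nested type left at its default, std::enable_if<...>::type is void.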
Here’s what I currently have so far:
Source: sample.cpp
#include <algorithm>    // for std::clamp
#include <cstdint>      // for std::uint8_t
#include <immintrin.h>  // for SSE/SSE2 intrinsics
#include <type_traits>  // for std::is_floating_point, std::is_integral
#include <iostream>
#include <iomanip>
constexpr std::uint8_t stride = 4;
constexpr std::uint8_t size = 16;
template <typename T>
struct Mat4 {
    T data[size];
    // operator[] for direct access into the flat row-major array
    constexpr T& operator[]( std::uint8_t index ) {
        // clamp against the flat size (16 elements), not the row stride
        index = std::clamp( index, std::uint8_t(0), (std::uint8_t)(size - 1) );
        return data[index];
    }
    constexpr const T& operator[](std::uint8_t index) const {
        index = std::clamp( index, std::uint8_t(0), (std::uint8_t)(size - 1) );
        return data[index];
    }
    // operator() for (row, col) access
    constexpr T& operator()( std::uint8_t row, std::uint8_t col ) {
        row = std::clamp( row, std::uint8_t(0), (std::uint8_t)(stride - 1) );
        col = std::clamp( col, std::uint8_t(0), (std::uint8_t)(stride - 1) );
        return data[row * stride + col];
    }
    constexpr const T& operator()(std::uint8_t row, std::uint8_t col) const {
        row = std::clamp( row, std::uint8_t(0), (std::uint8_t)(stride - 1) );
        col = std::clamp( col, std::uint8_t(0), (std::uint8_t)(stride - 1) );
        return data[row * stride + col];
    }
    // begin and end iterators
    constexpr T* begin() {
        return data;
    }
    constexpr const T* begin() const {
        return data;
    }
    constexpr T* end() {
        return data + size;
    }
    constexpr const T* end() const {
        return data + size;
    }
    // Default implementations for basic operations
    Mat4 operator+(const Mat4& other) const;
    // Other operators here:
    // Mat4 operator-(const Mat4& other) const;
    // Mat4 operator*(const Mat4& other) const;
    // Specialization for integral types using SSE2
    template <typename U>
    typename std::enable_if<std::is_integral<U>::value, Mat4<U>>::type
    Mat4<T>::operator+(const Mat4& other) const {
        Mat4 result;
        for (std::uint8_t i = 0; i < size; i += 4) {
            __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&data[i]));
            __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&other.data[i]));
            __m128i c = _mm_add_epi32(a, b); // Change this to _mm_add_epi64 for 64-bit integers
            _mm_storeu_si128(reinterpret_cast<__m128i*>(&result.data[i]), c);
        }
        return result;
    }
    // Other Integer Type arithmetic operators here.
};
// Specialization for floating-point types
template <>
Mat4<float> Mat4<float>::operator+(const Mat4& other) const {
    Mat4 result;
    for (std::uint8_t i = 0; i < size; i += 4) {
        __m128 a = _mm_loadu_ps(&data[i]);
        __m128 b = _mm_loadu_ps(&other.data[i]);
        __m128 c = _mm_add_ps(a, b);
        _mm_storeu_ps(&result.data[i], c);
    }
    return result;
}
// Other floating-point type specialization arithmetic operators here
Here is the driver program I intend to use to test the class, once I can get the class to compile successfully:
int main() {
    Mat4<float> mat1 = {  1,  2,  3,  4,
                          5,  6,  7,  8,
                          9, 10, 11, 12,
                         13, 14, 15, 16 };
    Mat4<float> mat2 = { 16, 15, 14, 13,
                         12, 11, 10,  9,
                          8,  7,  6,  5,
                          4,  3,  2,  1 };
    Mat4<int> mat3 = {  1,  2,  3,  4,
                        5,  6,  7,  8,
                        9, 10, 11, 12,
                       13, 14, 15, 16 };
    Mat4<int> mat4 = { 16, 15, 14, 13,
                       12, 11, 10,  9,
                        8,  7,  6,  5,
                        4,  3,  2,  1 };
    Mat4<float> sum_f = mat1 + mat2;
    Mat4<int> sum_i = mat3 + mat4;
    std::cout << "Sum (float):\n";
    for (const auto& element : sum_f) {
        std::cout << std::setw(4) << element << ' ';
    }
    std::cout << "\n\nSum (int):\n";
    for (const auto& element : sum_i) {
        std::cout << std::setw(4) << element << ' ';
    }
    std::cout << std::endl;
    return 0;
}
I even attempted switching the compiler to MSVC and got a similar compiler error, so with no success there I eventually reverted back to GCC.
I’ve been staring at it for too long, and now my mind is doing this: ...
I know it’s probably staring me in the face, and when the resolution to this is known, it’s going to be one of those “duh… moments”. Just a fresh set of eyes to point out what I’m overlooking is much needed and extremely appreciated.
Edit
Here’s the current link to Compiler Explorer
The Q/A’s that were suggested as a solution or answer do not address my specific question or problem.
The first Q/A that was suggested does not address class templates or template specializations, nor their syntax.
The second Q/A isn’t even the same language, completely N/A.
The third Q/A is about the language standard itself.
And finally, none of them pertain to working within Compiler Explorer itself. I’m not using a stand-alone compiler and linker, either via a command-line terminal or via an IDE.
I also shortened it down to specifically the operator+()