Implementing real-time bitmap scaling with SSE2 intrinsics [closed]

I have this code that blits a bitmap onto the frame buffer with SSE2 intrinsics:

for (uint r = 0; r < height; r++)
{
    uint32* bufPixels = (frameBuffer->pixels + xPos) + frameBuffer->pitch * (r + yPos);
    uint32* bmpPixels = bitmap->pixels + bitmap->pitch * r;

    for (uint p = 0; p + NUM_SIMD_PIXELS /* 4 */ <= width; p += NUM_SIMD_PIXELS)
    {
        __m128i bmpVec = _mm_loadu_si128((__m128i*)(bmpPixels + p));

        _mm_storeu_si128((__m128i*)(bufPixels + p), bmpVec);
    }
}

I want to add scaling to this, but I can’t come up with an approach that doesn’t fall back to scalar operations.

With SSE2 I am always working with 4 pixels at once, so I can’t just loop over width * xScale and compute p / xScale for each pixel to get its index in the bitmap’s pixel buffer.

Not really looking for code examples, more just ideas.

EDIT: here is a scalar example of the scaling I wish to accomplish:

for (uint r = 0; r < height * yScale; ++r)
{
    uint bmpRowIndex = (uint)(r / yScale);

    uint32* bufPixels = (frameBuffer->pixels + xPos) + frameBuffer->pitch * (r + yPos);
    uint32* bmpPixels = bitmap->pixels + bitmap->pitch * bmpRowIndex;
    for (uint p = 0; p < width * xScale; ++p)
    {
        uint bmpPixelIndex = (uint)(p / xScale);
        bufPixels[p] = bmpPixels[bmpPixelIndex];
    }
}

It’s basic nearest-neighbor sprite scaling, nothing fancy. I just want this but in SIMD.

Based on the comments, it seems you are only interested in nearest-neighbor resampling, and only for the RGBA8 pixel format. I think SIMD is borderline useless for your use case. It is possible to do something smart with _mm_shuffle_epi8 (which requires SSSE3, not just SSE2) or, if you have AVX, with _mm_permutevar_ps, but I’m not sure you would gain much from these, if anything.
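One case where plain SSE2 is enough is an exact integer factor. Below is a minimal sketch, assuming a fixed 2x horizontal upscale of 32-bit pixels; upscaleRow2x is a hypothetical helper, not part of the code in the question. It duplicates each pixel by unpacking a vector with itself, and falls back to scalar code for the last few pixels of a row.

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstdint>
#include <cstddef>

// Hypothetical helper: reads `count` pixels from src and writes 2 * count pixels to dst,
// so that every source pixel appears twice in a row.
inline void upscaleRow2x( uint32_t* dst, const uint32_t* src, size_t count )
{
    size_t i = 0;
    // Vectorized part: 4 source pixels in, 8 destination pixels out
    for( ; i + 4 <= count; i += 4 )
    {
        const __m128i v = _mm_loadu_si128( (const __m128i*)( src + i ) ); // p0 p1 p2 p3
        const __m128i lo = _mm_unpacklo_epi32( v, v );                    // p0 p0 p1 p1
        const __m128i hi = _mm_unpackhi_epi32( v, v );                    // p2 p2 p3 p3
        _mm_storeu_si128( (__m128i*)( dst + 2 * i ), lo );
        _mm_storeu_si128( (__m128i*)( dst + 2 * i + 4 ), hi );
    }
    // Scalar remainder for widths that are not a multiple of 4
    for( ; i < count; i++ )
    {
        dst[ 2 * i ] = src[ i ];
        dst[ 2 * i + 1 ] = src[ i ];
    }
}
```

For a 2x vertical factor, the same source row would be expanded into two consecutive output rows. The trick relies on the output pattern being fixed, so it does not carry over to arbitrary scale factors.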

Assuming you compile for 64 bits, try the following version. The code is untested, but I hope the idea is clear. I’d be surprised if SIMD could speed it up by any meaningful margin, unless the scaling multiplier is restricted to small rational numbers.

#include <stdint.h>
#include <assert.h>
#include <cmath>
#include <algorithm>

struct RgbaBitmap
{
    // Bitmap data in system memory
    uint32_t* pointer;
    // Distance between rows; expressed in uint32_t elements, not bytes
    size_t rowPitch;
    // Size of the bitmap
    int width, height;
};

// End value for that 32.32 fixed-point number, incremented by `step` on each iteration
inline uint64_t scaledEnd32( uint64_t step, int rt, int sprite )
{
    if( rt > 0 && sprite > 0 )
    {
        // End value for the input sprite
        uint64_t sprite64 = (uint32_t)sprite;
        uint64_t endSprite = sprite64 << 32;
        // End value for the output bitmap
        uint64_t rt64 = (uint32_t)rt;
        uint64_t endRt = rt64 * step;
        // Return the minimum of them
        return std::min( endSprite, endRt );
    }
    else
        return 0;
}

void scaleBitmap( const RgbaBitmap& target, const RgbaBitmap& sprite,
    int xPos, int yPos, float scaling )
{
    // Supporting negative xPos / yPos is possible but rather tricky, not doing in this example
    assert( xPos >= 0 );
    assert( yPos >= 0 );
    assert( scaling > 0.0 );

    // Compute the inverse of the scaling multiplier.
    // The loops below map the opposite way, from output pixels to sprite pixels.
    // Also convert to 32.32 fixed point, rounding for optimal precision.
    constexpr double p32 = (double)( (int64_t)1 << 32 );
    const uint64_t step = (uint64_t)std::llround( p32 / scaling );

    const size_t destPitch = target.rowPitch;
    const size_t sourcePitch = sprite.rowPitch;
    uint32_t* rdiLine = target.pointer + yPos * destPitch + xPos;
    const uint64_t fxEnd = scaledEnd32( step, target.width - xPos, sprite.width );
    const uint64_t fyEnd = scaledEnd32( step, target.height - yPos, sprite.height );

    // The outer loop is by output rows
    for( uint64_t fy = 0; fy < fyEnd; fy += step, rdiLine += destPitch )
    {
        // Sprite Y coordinate for the current output row
        const size_t sourceY = ( fy >> 32 );
        assert( (int64_t)sourceY < sprite.height );
        // Source pointer to read from the sprite
        const uint32_t* const rsi = sprite.pointer + sourcePitch * sourceY;

        // The inner loop is within a single row, each iteration makes an output pixel
        uint32_t* rdi = rdiLine;
        for( uint64_t fx = 0; fx < fxEnd; fx += step, rdi++ )
        {
            const size_t sourceX = ( fx >> 32 );
            assert( (int64_t)sourceX < sprite.width );
            *rdi = rsi[ sourceX ];
        }
    }
}
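To make the 32.32 stepping concrete, here is a small sketch that lists which source index each output pixel samples along one axis. The sourceIndices helper is hypothetical and exists only for illustration; it mirrors the step computation and the loop condition above, and assumes the output is not clipped by the target bitmap.

```cpp
#include <cstdint>
#include <cmath>
#include <vector>

// Hypothetical helper, for illustration only: returns the source index
// sampled for each output pixel when a sprite of `spriteSize` pixels is scaled.
std::vector<uint32_t> sourceIndices( int spriteSize, float scaling )
{
    // Same inverse step as in scaleBitmap, in 32.32 fixed point
    const double p32 = (double)( (int64_t)1 << 32 );
    const uint64_t step = (uint64_t)std::llround( p32 / scaling );
    // End value for the input sprite, also in 32.32 fixed point
    const uint64_t end = (uint64_t)(uint32_t)spriteSize << 32;

    std::vector<uint32_t> result;
    for( uint64_t f = 0; f < end; f += step )
        result.push_back( (uint32_t)( f >> 32 ) ); // integer part is the source index
    return result;
}
```

With a 4-pixel sprite, a scaling of 2.0 yields the indices 0 0 1 1 2 2 3 3, and a scaling of 1.5 yields 0 0 1 2 2 3.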

If you compile for a 32-bit CPU where 64-bit integer arithmetic is expensive, refactor the code to use 16.16 fixed point, i.e. use (double)( 1 << 16 ) as the multiplier and shift by 16 bits when sampling from the source sprite.
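A minimal sketch of what the 16.16 variant of the inner loop might look like, equally untested in a real renderer. The scaleRow16 name, its signature and the range limits in the assertions are assumptions made for this example, not part of the code above.

```cpp
#include <cstdint>
#include <cstddef>
#include <cmath>
#include <cassert>

// Hypothetical helper: scales one sprite row into dst using 16.16 fixed point.
// Writes at most dstCount pixels and returns how many were written.
inline size_t scaleRow16( uint32_t* dst, size_t dstCount,
    const uint32_t* src, int srcWidth, float scaling )
{
    // Assumption: sprites narrower than 32768 pixels, so the end value stays below 2^31
    assert( srcWidth > 0 && srcWidth < 32768 );
    assert( scaling > 0.0f );

    // Inverse scaling multiplier in 16.16 fixed point
    const long long step64 = std::llround( 65536.0 / scaling );
    // Assumption: step is non-zero and fits in 31 bits, so fx + step cannot wrap around
    assert( step64 > 0 && step64 <= 0x7FFFFFFFll );
    const uint32_t step = (uint32_t)step64;
    const uint32_t fxEnd = (uint32_t)srcWidth << 16;

    size_t written = 0;
    for( uint32_t fx = 0; fx < fxEnd && written < dstCount; fx += step, written++ )
        dst[ written ] = src[ fx >> 16 ]; // integer part is the source pixel index
    return written;
}
```

The narrower format trades range and precision for cheaper arithmetic: with only 16 fractional bits, the rounding error of the step accumulates faster across a wide output row than it does in the 32.32 version.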



Tags: c++, rendering, simd, blit, sse2
