I wrote a Metal shader that draws around 512K vertices using the .lineStrip primitive, with blending enabled (isBlendingEnabled = true) and an .add blend operation, so that I get an accumulation effect “for free”. All vertices lie on the same Z-plane, and their positions are computed inside the vertex shader itself by reading the corresponding input pixel from an input texture2d<float, access::read>.
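For reference, the blend state is configured roughly like this (a trimmed-down sketch: the .one blend factors are just illustrative, and device, library and the pixel format are placeholders; the relevant parts are isBlendingEnabled and the .add operations):

let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction   = library.makeFunction(name: "vertex_function")
pipelineDescriptor.fragmentFunction = library.makeFunction(name: "vectorscope_fragment")
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
pipelineDescriptor.colorAttachments[0].isBlendingEnabled = true
pipelineDescriptor.colorAttachments[0].rgbBlendOperation = .add
pipelineDescriptor.colorAttachments[0].alphaBlendOperation = .add
pipelineDescriptor.colorAttachments[0].sourceRGBBlendFactor = .one          // illustrative: dst = dst + src
pipelineDescriptor.colorAttachments[0].destinationRGBBlendFactor = .one
pipelineDescriptor.colorAttachments[0].sourceAlphaBlendFactor = .one
pipelineDescriptor.colorAttachments[0].destinationAlphaBlendFactor = .one
let pipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)

The shaders themselves: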
struct VertexOut {
    float4 position [[position]];
    float4 color;
    float pointSize [[point_size]];
};
vertex VertexOut
vertex_function(texture2d<float, access::read> inTexture [[texture(0)]],
                uint id [[vertex_id]])
{
    // Map the linear vertex id back to a pixel coordinate in the input texture.
    uint width = inTexture.get_width();
    uint x = id % width;
    uint y = id / width;
    float4 inColor = inTexture.read(uint2(x, y));

    // Do some math here to compute xPos, yPos and
    // the vertex color (outColor) from inColor (elided).

    VertexOut out;
    out.position  = float4(xPos, yPos, 0, 1);
    out.color     = outColor;
    out.pointSize = 1;
    return out;
}
fragment float4 vectorscope_fragment(VertexOut input [[stage_in]]) {
    return input.color;
}
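The draw itself is encoded roughly like this (a minimal sketch: the encoder, texture and render-pass setup are placeholders; the vertex count is just the pixel count of the input texture):

let vertexCount = inputTexture.width * inputTexture.height   // ~512K
renderEncoder.setRenderPipelineState(pipelineState)
renderEncoder.setVertexTexture(inputTexture, index: 0)
renderEncoder.drawPrimitives(type: .lineStrip, vertexStart: 0, vertexCount: vertexCount)
renderEncoder.endEncoding()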
I’m struggling to understand why, when the vertices end up very close together (overlapping more in the final raster), GPU utilization is more than double what it is when the vertices are scattered far apart.
I’m guessing it has something to do with high pixel overdraw, but I’m not experienced enough with Metal and shading to know how to optimize the operation.
For this kind of drawing, I assumed GPU utilization would be a function only of the number of non-transparent vertices to render (i.e. of the size of the input texture), regardless of where they end up on screen, since no culling or depth testing is possible here.
Am I missing something obvious about how Metal draws .lineStrips with blending turned on? How can this be optimized to minimize pixel overdraw while preserving the additive blending result?