I am writing a program in Swift that does computation on the GPU. Below is the code that generates a random matrix using Metal Performance Shaders:
func randomMatrix(
    rows: Int,
    columns: Int,
    result: UnsafeMutableRawPointer
) {
    autoreleasepool {
        // Shared storage so the CPU can read the GPU's output directly
        let bufferSize = rows * columns * MemoryLayout<Float32>.size
        let resultBuffer = device!.makeBuffer(
            length: bufferSize,
            options: .storageModeShared
        )
        let matrixDescriptor = MPSMatrixDescriptor(
            rows: rows,
            columns: columns,
            rowBytes: MemoryLayout<Float32>.size * columns,
            dataType: .float32
        )
        let outputMatrix = MPSMatrix(
            buffer: resultBuffer!,
            descriptor: matrixDescriptor
        )
        let randomMatrixDescriptor =
            MPSMatrixRandomDistributionDescriptor.uniformDistributionDescriptor(
                withMinimum: -1.0,
                maximum: 1.0
            )
        let matrixRandomKernel = MPSMatrixRandomMTGP32(
            device: device!,
            destinationDataType: .float32,
            seed: 43,
            distributionDescriptor: randomMatrixDescriptor
        )
        let commandBuffer = commandQueue?.makeCommandBuffer()
        matrixRandomKernel.encode(commandBuffer: commandBuffer!, destinationMatrix: outputMatrix)
        commandBuffer?.commit() // Submit the command buffer for execution
        commandBuffer?.waitUntilCompleted()
        // Copy the generated values into the caller's memory
        memcpy(result, resultBuffer!.contents(), bufferSize)
    }
}
This code works well, but when generating large matrices (say, of dimensions 100,000 x 100,000 or more), memcpy creates a bottleneck. Therefore, I am looking for a way to eliminate the memcpy.
The signature of the function on the C side is
void metalRandomMatrix(float minimum, float maximum, int rows, int columns, float *result);
I have tried creating the buffer on the Swift side with the bytesNoCopy argument, but that crashes the application when it is run from the C side. Alternatively, I have tried the following:
// method 1 --- returns garbage values
result.pointee = resultBuffer!.contents()
// method 2 --- works but is similar to memcpy, so defeats the purpose
let resultPointer = resultBuffer!.contents().assumingMemoryBound(to: Float32.self)
result.update(from: resultPointer, count: rows * columns)
// method 3 --- returns garbage values
result.pointee = resultBuffer!.contents().assumingMemoryBound(to: Float32.self)
result.initialize(to: resultPointer)
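For context on the bytesNoCopy attempt: Metal's makeBuffer(bytesNoCopy:length:options:deallocator:) requires a page-aligned pointer and a length that is a multiple of the page size, which memory from a plain malloc does not guarantee. Below is a minimal sketch of how the C side could allocate memory that meets those two constraints, assuming a POSIX system. The helper names roundUpToPage and allocPageAligned are hypothetical and are not part of my actual code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round `bytes` up to the next multiple of the system page size. */
static size_t roundUpToPage(size_t bytes) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (bytes + page - 1) / page * page;
}

/* Hypothetical helper: allocate page-aligned storage for rows x columns floats.
   On success, stores the padded byte length in *paddedLength and returns the
   pointer; returns NULL if the allocation fails. The caller frees with free(). */
static float *allocPageAligned(size_t rows, size_t columns, size_t *paddedLength) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t length = roundUpToPage(rows * columns * sizeof(float));
    void *ptr = NULL;
    if (posix_memalign(&ptr, page, length) != 0) {
        return NULL;
    }
    *paddedLength = length;
    return (float *)ptr;
}
```

Under this assumption, the padded length (not rows * columns * sizeof(float)) would be what gets passed as the length on the Swift side, and the C side would keep ownership of the memory and free it once the buffer is no longer in use. Whether this is what causes the crash in my case is not confirmed.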
Could anyone help me with this? Thank you.