I downloaded the Google iOS sample for MediaPipe and tried to load my own model, which is 2.5 GB in size:
private var inference: LlmInference! = {
    // let path = Bundle.main.path(forResource: "gemma-2b-it-gpu-int4", ofType: "bin")!
    let path = Bundle.main.path(forResource: "slm1", ofType: "bin")!
    let llmOptions = LlmInference.Options(modelPath: path)
    return LlmInference(options: llmOptions)
}()
I am getting the following error:
-[MTLDebugDevice newBufferWithBytes:length:options:]:670: failed assertion `Buffer Validation
newBufferWith*:length 0x1f400000 must not exceed 256 MB.
I have since learned that a single MTLBuffer is limited to a maximum length of 256 MB. If a total allocation of more than 256 MB is needed, the usual approach is to allocate multiple buffers and split the data among them, but I don't know how to do that with this SDK, since it handles buffer allocation internally.
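For context, here is a minimal sketch of what "splitting data across multiple buffers" looks like in raw Metal, assuming you control the allocation yourself (MediaPipe's `LlmInference` does not expose this, which is the crux of the question). The function name `makeChunkedBuffers` is hypothetical; `MTLDevice.maxBufferLength` and `makeBuffer(bytes:length:options:)` are real Metal APIs:

```swift
import Metal
import Foundation

// Hypothetical helper: split a large blob into MTLBuffers no larger than
// the device's maximum single-buffer length.
func makeChunkedBuffers(for data: Data, device: MTLDevice) -> [MTLBuffer] {
    let maxLength = device.maxBufferLength  // per-buffer limit for this device
    var buffers: [MTLBuffer] = []
    var offset = 0
    while offset < data.count {
        let chunkSize = min(maxLength, data.count - offset)
        let chunk = data.subdata(in: offset..<(offset + chunkSize))
        chunk.withUnsafeBytes { raw in
            // Copy this chunk into its own buffer.
            if let buffer = device.makeBuffer(bytes: raw.baseAddress!,
                                              length: chunkSize,
                                              options: .storageModeShared) {
                buffers.append(buffer)
            }
        }
        offset += chunkSize
    }
    return buffers
}
```

This only illustrates the technique at the Metal level; since MediaPipe allocates its own buffers when loading the model, the question is whether the SDK offers any option to do this (or a model format that avoids a single oversized allocation).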