I have a project that needs to use llava-1.5-7b-hf
to run inference on both image-plus-text inputs and text-only inputs. I know how to handle the image-plus-text case, but I couldn't figure out how to run inference on text-only inputs. I tried:
inputs = processor(text=conversation, images=None, return_tensors="pt").to('mps')
and got this error:
TypeError: is_floating_point(): argument 'input' (position 1) must be Tensor, not NoneType
I also tried passing a dummy tensor as the image, but no luck:
inputs = processor(text=conversation, images=torch.zeros((1, 1, 1)), return_tensors="pt").to('mps')
...
ValueError: mean must have 1 elements if it is an iterable, got 3
Please advise on how to do this. Thanks a lot!
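
One direction that may be worth trying, sketched here under assumptions rather than verified against this exact setup: skip the image path entirely and tokenize the text with processor.tokenizer, so no pixel_values entry is ever created. The Hub id "llava-hf/llava-1.5-7b-hf", the single-turn "USER: ... ASSISTANT:" prompt string (used here instead of the conversation object from the attempts above), and the helper names build_text_only_prompt and generate_text_only are all assumptions made for illustration.

```python
def build_text_only_prompt(user_message: str) -> str:
    """Format a single-turn LLaVA-1.5 style prompt with no <image> placeholder."""
    return f"USER: {user_message} ASSISTANT:"


def generate_text_only(user_message: str, device: str = "mps",
                       max_new_tokens: int = 64) -> str:
    """Hypothetical helper: run text-only generation without the image processor."""
    # Imported lazily so the prompt helper works without these packages installed.
    import torch
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "llava-hf/llava-1.5-7b-hf"  # assumed Hub id for llava-1.5-7b-hf
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to(device)

    prompt = build_text_only_prompt(user_message)
    # Use only the tokenizer, so the batch holds input_ids and attention_mask
    # and nothing image-related.
    inputs = processor.tokenizer(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the tokens generated after the prompt, then decode them.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return processor.tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling generate_text_only("What is the capital of France?") would load the model and return the decoded reply. Whether a given transformers version accepts a batch without pixel_values for this model is something to confirm locally.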