I am working with the Mistral 7B model in Kaggle notebooks. I pass a document to the prompt and want the model to extract the functional needs from it. For example, given a text describing a project, the model should pull out the specific needs stated in that text.
The problem I am facing is memory usage, which I have worked around by using TPUs, but the model still takes very long to respond. Is there any way to make it faster?
I am considering sending one piece of the document at a time; is there another method that might improve the speed?
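For context, the chunking approach I am considering looks roughly like this (the chunk size, overlap, and function names are just placeholders, not what I have running):

```python
def chunk_text(text: str, max_words: int = 400, overlap: int = 50) -> list[str]:
    """Split a document into overlapping word windows so each prompt stays short.

    The overlap is meant to avoid cutting a functional need in half
    at a chunk boundary; the sizes here are guesses, not tuned values.
    """
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks


def extract_needs(document: str, generate) -> list[str]:
    """Run the extraction prompt once per chunk and collect the answers.

    `generate` stands in for whatever call runs the model
    (e.g. a pipeline or model.generate wrapper).
    """
    results = []
    for chunk in chunk_text(document):
        prompt = (
            "Extract the functional needs from the following text:\n\n" + chunk
        )
        results.append(generate(prompt))
    return results
```

My worry with this is that needs spanning two chunks might be missed, which is why I ask whether there is a better method.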