I am trying to run the following code using llama-index to read in a bunch of PDF books from a directory:
```python
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(
    input_dir="/home/ovo/code/datasets/ebooks/compsci/"
)
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")
```
It only uses 1 out of my 12 CPU cores and roughly 1/8 of my VRAM, and is very slow as a result.
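From what I can tell, recent llama-index versions let you pass `num_workers` to `load_data()` so files are parsed in separate processes (I haven't confirmed which version added this). The underlying pattern is just a process pool over the file list; here is a minimal stdlib sketch of what I mean, where `load_one` is a stand-in for the per-file parse (e.g. a `SimpleDirectoryReader(input_files=[path])` call):

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def load_one(path):
    # Stand-in for the per-file parsing work; a real version would
    # parse the PDF and return its Document objects.
    return Path(path).name

def load_parallel(paths, workers=4):
    # Fan the per-file work out across processes; pool.map preserves
    # the input order of `paths`.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_one, paths))
```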
The same thing happens when I then try to create embeddings from these documents:
```python
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)
index = VectorStoreIndex.from_documents(docs)
```
Is there a way to make the above process files in parallel (and batch the embedding work) so that it doesn't run so slowly?
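For context on the embedding half: my understanding is that this step is bound by GPU batch size rather than CPU cores, and `HuggingFaceEmbedding` appears to accept an `embed_batch_size` argument (the default seems small), so raising it should fill more VRAM per forward pass. The batching itself is nothing more than chunking the text list; a sketch of that chunking, independent of any library:

```python
def batched(items, size):
    # Yield successive chunks so the encoder sees `size` texts per
    # forward pass instead of one call per text.
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

For example, `list(batched(["a", "b", "c", "d", "e"], 2))` gives `[["a", "b"], ["c", "d"], ["e"]]`.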