I hope someone can help me with LlamaIndex. I am just starting to experiment with it, and I have written a program, main.py, that indexes a 741-page PDF document.
I then try to query the PDF through the index, which is persisted as JSON files; that program is query1.py.
main.py builds the index JSON files successfully.
But as soon as I run query1.py, CPU usage climbs to 350% and it never finishes.
Could someone tell me whether something is wrong in my programs, or whether they need optimizing? I am a novice, so I am surely not using LlamaIndex correctly.
Screenshots:
https://i.sstatic.net/jyeGImRF.png
https://i.sstatic.net/VZVj0Kth.png
https://i.sstatic.net/3G9Hox3l.png
main.py
```
from llama_index.core import SimpleDirectoryReader
from llama_index.core import GPTVectorStoreIndex
from llama_index.core import Settings
from llama_index.core.base.embeddings.base import BaseEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from sentence_transformers import SentenceTransformer

# Custom embedding wrapper from an earlier attempt (currently unused)
class LocalEmbeddingModel(BaseEmbedding):
    def __init__(self):
        self.model = SentenceTransformer('all-MiniLM-L6-v2')

    def embed(self, texts):
        return self.model.encode(texts)

# Initialize the local embedding model
#embed_model = LocalEmbeddingModel()
# Configure LlamaIndex with the appropriate settings
#settings = Settings(embed_model=embed_model)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
#service_context = ServiceContext.from_defaults(embed_model=None)

# Load the documents from a directory
documents = SimpleDirectoryReader('./mesdoc').load_data()

# Build an index from the documents
index = GPTVectorStoreIndex(documents)

# Save the index for future use
#index.save_to_disk('index.json')
index.storage_context.persist(persist_dir="./mesdoc")
```
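One detail I was unsure about: main.py persists the index JSON files into ./mesdoc, the same directory that SimpleDirectoryReader reads the PDF from, so I wonder whether query1.py then picks those JSON files up as documents too. A minimal variant I considered, where ./storage is just an illustrative directory name of my own:
```
# Variant: keep the persisted index out of the document directory.
# "./storage" is an illustrative name, not from my original script.
index = GPTVectorStoreIndex(documents)
index.storage_context.persist(persist_dir="./storage")
```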
query1.py
```
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Configure LlamaIndex with the appropriate settings
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Function to ask a question
def ask_question(question):
    # Reload the index from the persistence directory
    # (in practice this reloads the documents and rebuilds the index)
    documents = SimpleDirectoryReader("./mesdoc").load_data()
    index = VectorStoreIndex.from_documents(documents)
    #index = GPTVectorStoreIndex.load_from_storage(persist_dir="./",)

    # Ask the question and get an answer
    query_engine = index.as_query_engine()
    response = query_engine.query(question)
    return response

# Example question
question = "What is tasklist ?"
answer = ask_question(question)
print(f"Answer: {answer}")
```
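The commented-out load_from_storage line above was my attempt to reload the persisted index instead of rebuilding it on every run. For reference, this is the loading pattern I found in the current llama_index docs; it assumes the index was persisted to ./storage as in the variant above, and I have not verified it on my setup:
```
from llama_index.core import Settings, StorageContext, load_index_from_storage
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Configure the same embedding model that was used to build the index,
# so the query embedding matches the stored vectors
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Rebuild the storage context from the persisted JSON files
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Load the already-built index instead of re-embedding every document
index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine()
print(query_engine.query("What is tasklist ?"))
```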