Question:
I’m building a chatbot that combines a conversational retrieval system with Transformers and Gradio, using FAISS to retrieve context for each query. However, I’m encountering an error that I can’t seem to resolve.
Problem:
I’m receiving the following error message:
TypeError: string indices must be integers, not 'tuple'
This error occurs when the input message is processed by the FAISS retrieval step. The message is correctly formatted as a string, but something in the retrieval path appears to index it with a tuple.
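For reference, stripped down to just the failing call, this is the step that blows up (retriever here is built exactly as in the full code below):

# Isolated retrieval step; the input is a plain Python string
message = "Who is David Beckham?"
print(type(message))  # <class 'str'>
results = retriever.invoke(message)  # raises the TypeError above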
Details:
I’m using a DistilBERT-based model for embedding and question answering.
The FAISS index is used to retrieve context based on the user’s message (a sketch of how the index was built and saved is included below).
The ask function receives a message, retrieves context via the FAISS retriever, and then uses a question-answering model to answer based on that context.
Despite adding type checks and debugging, I’m still encountering the same error.
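For completeness, the index at ./saved-index-faiss was built in a separate script along these lines (a rough sketch, not my exact build code; HuggingFaceEmbeddings and the example texts are stand-ins):

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Stand-in embedding wrapper and documents for illustration
embeddings = HuggingFaceEmbeddings(model_name="distilbert-base-uncased")
texts = [
    "David Beckham is a former English footballer.",
    "Bitcoin is a decentralized digital currency.",
]
db = FAISS.from_texts(texts, embeddings)
db.save_local("./saved-index-faiss")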
Code:
import torch
from transformers import (
    DistilBertTokenizer,
    DistilBertModel,
    DistilBertForQuestionAnswering,
)
import gradio as gr
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_community.vectorstores import FAISS
import pathlib
import logging
# Set logging
logging.basicConfig(level=logging.INFO)
# Initialize tokenizer and models
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
embedding_model = DistilBertModel.from_pretrained("distilbert-base-uncased")
qa_model = DistilBertForQuestionAnswering.from_pretrained(
    "distilbert-base-uncased-distilled-squad"
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
embedding_model.to(device)
qa_model.to(device)
# Load FAISS index
index_path = pathlib.Path("./saved-index-faiss") # Update the path as necessary
embeddings_db = FAISS.load_local(
    index_path, embedding_model, allow_dangerous_deserialization=True
)
retriever = embeddings_db.as_retriever(
    search_kwargs={"k": 3}  # adjust 'k' as necessary for your context retrieval
)
# Setup memory for conversation
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
# Define a function to answer questions using the provided context
def answer_question(question, context):
    inputs = tokenizer(
        question,
        context,
        return_tensors="pt",
        truncation=True,
        padding="max_length",
        max_length=512,
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = qa_model(**inputs)
    answer_start = torch.argmax(outputs.start_logits)
    answer_end = torch.argmax(outputs.end_logits) + 1
    answer = tokenizer.decode(
        inputs["input_ids"][0][answer_start:answer_end], skip_special_tokens=True
    )
    return answer
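# Sanity check: answer_question works as expected when called directly with a
# hard-coded context (the sample context below is made up for illustration)
sample_context = "David Beckham is a former English professional footballer."
print(answer_question("Who is David Beckham?", sample_context))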
# Define the Gradio function
def ask(message, history):
    if not isinstance(message, str):
        message = str(message)  # convert to string if it's not already
    print("Input message:", message)
    # Fetch context based on the retrieval model
    results = retriever.invoke(message)
    context = results[0].page_content if results else "No relevant information found."
    print("Retrieved context:", context)
    # Tokenize the input message (debugging only; answer_question tokenizes again)
    inputs = tokenizer(
        message,
        context,
        return_tensors="pt",
        truncation=True,
        padding="max_length",
        max_length=512,
    )
    print("Tokenized inputs:", inputs)
    # Answer the question, then record the turn in conversation memory
    answer = answer_question(message, context)
    memory.save_context({"question": message}, {"answer": answer})
    return answer
# Create the Gradio interface
io = gr.ChatInterface(
    fn=ask,
    chatbot=gr.Chatbot(height=400),
    textbox=gr.Textbox(placeholder="Ask Away!", container=False, scale=6),
    title="WikiBuddy",
    description="Ask me any question",
    theme="soft",
    examples=[
        "Who is David Beckham?",
        "Is Bitcoin Dead?",
        "What club does Cristiano Ronaldo play for?",
    ],
    retry_btn=None,
    undo_btn="Delete Previous",
    clear_btn="Clear",
    analytics_enabled=True,
    fill_height=True,
)
# Run the Gradio app
io.launch()
Additional Notes:
I’ve noticed that the error started occurring only after introducing FAISS for context retrieval. Before that, when I used a simple model without FAISS, the chatbot was working fine.
The goal is to have the chatbot utilize FAISS for context retrieval and provide answers based on the retrieved context.
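In case it helps narrow things down, this is the sanity check I’d like to get past (page_content is the standard attribute on LangChain Document objects):

# Confirm what the retriever actually returns before wiring it into ask()
docs = retriever.invoke("Who is David Beckham?")
print(type(docs))  # expected: a list of Document objects
if docs:
    print(type(docs[0]), docs[0].page_content[:200])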
Any insights or solutions to resolve this issue would be greatly appreciated!