I work for a company with strict data restrictions. I have been using the LangChain SQL agent with these imports:
from langchain.agents import AgentType, create_sql_agent
from langchain.sql_database import SQLDatabase
from langchain.agents.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
This works extremely well with synthetic data. One of the agent’s chain-of-thought reasoning steps is to predict which table the data will come from and then fetch that table’s schema, which is compliant. The next step, however, is to query the first three rows of the table. That step violates our company’s data restrictions, since even a single full row counts as sensitive data.
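To make the distinction concrete, here is a minimal sketch of the two steps using only Python's built-in sqlite3 module. It is not LangChain's actual implementation; the `customers` table, its rows, and the helper names `get_schema` and `get_sample_rows` are all hypothetical and only illustrate what I mean. The first helper returns schema information only, which is fine for us. The second is the kind of row-level query we are not allowed to run.

```python
import sqlite3


def get_schema(conn, table):
    """Return the CREATE TABLE statement for `table` (schema only, no row data)."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown table: {table}")
    return row[0]


def get_sample_rows(conn, table, limit=3):
    """Return the first `limit` rows of `table` (the step that is NOT compliant)."""
    # Confirm the table exists before interpolating its name into the query.
    get_schema(conn, table)
    return conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,)).fetchall()


# Hypothetical in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    [
        ("Alice", "alice@example.com"),
        ("Bob", "bob@example.com"),
        ("Carol", "carol@example.com"),
        ("Dave", "dave@example.com"),
    ],
)

schema = get_schema(conn, "customers")       # compliant: column names and types only
sample = get_sample_rows(conn, "customers")  # not compliant: exposes full rows
```

In other words, I want the agent to stop after the equivalent of `get_schema` and never run anything like `get_sample_rows`.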
I have gone through the documentation for “SQLDatabaseToolkit,” and I can list the tools it uses with:
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
toolkit.get_tools()
I can see the list of tools here, but I’m not sure how to change the toolkit so that the agent never queries the first three rows.
I might be looking in the wrong part of the pipeline, too. Happy for any advice.
I’ve read through documentation, but can’t find where to prevent the chain from querying the first three rows.