I started working on a text-to-sql solution for my company using LLM. The model is anthropic claude 3 sonnet. I ran into a problem — two tables have one to many relation. Here is an example:
Orders:
Transaction ID | Amount |
---|---|
001 | 5.45 |
002 | 12.23 |
OrderDetail:
Transaction ID | Product ID | Quantity |
---|---|---|
001 | A001 | 12 |
001 | B001 | 12 |
001 | C001 | 3 |
002 | D001 | 4 |
Now the two tables are simplified data model, but it does represent the core issue. As you can see, one order has multiple records in OrderDetail
table. There are two scenarios I need: (1) join the two tables using Transaction ID
. (2) first get distinct Transaction ID
from OrderDetail
table, then join Orders
table. The problem is that I am not able to get LLM to draft the correct SQL queries consistently. The best I could come up with to break requests into two parts for scenario (2), ask LLM to get distinct Transaction ID
, then during the 2nd request, ask to join.
This wouldn’t work for my end users assuming most of them do not have SQL knowledge, and would ask questions in natural language. Would this be too complex for LLM right now, first determine the nature of the join, then decide the most suitable SQL based on the request? Any thoughts on this?
Resolved the issue by asking AI lol. Here are a few steps that one can do:
- Add table description, column description into the schema to help LLM understand each column
- Describe the data model, the primary and foreign keys in each table, and if it is one-to-many or many-to-many relation
- Provide query requirement in the prompt, for example asking the model to build a subquery, then join using the subquery.
This is not an ideal solution, but did give me the correct results.Potentially one can level up by using RAG to build a vector database, although I was unable to find much info on how to resolve this through RAG knowledge base.
1