The code below is my attempt to identify join relationships between tables in a Oracle database.
The idea is to build a map of joins to help identify implicit FK relationships.
In the code below I use the PlSqlParserListener to walk the parse tree.
In the Join_on_partContext
handler I breakout all the child tokens and try to get check if these are column names (I also want table names but keeping this example simple ).
import antlr4
from PlSqlLexer import PlSqlLexer
from PlSqlParser import PlSqlParser
from pretty_print_antlr_tree import to_string_tree
from PlSqlParserListener import PlSqlParserListener
class SQLListener(PlSqlParserListener):
def is_column_name(self, token, ctx):
for child_ctx in ctx.getChildren():
if isinstance(child_ctx, PlSqlParser.Column_nameContext):
token_start = child_ctx.start.tokenIndex
token_stop = child_ctx.stop.tokenIndex
if token.tokenIndex >= token_start and token.tokenIndex <= token_stop:
return True
return False
def enterJoin_on_part(self, ctx:PlSqlParser.Join_on_partContext):
print('Found a join on :', ctx.getText())
idx1 = ctx.start.tokenIndex
idx2 = ctx.stop.tokenIndex
istrm = ctx.parser._input
tks = istrm.getTokens(idx1, idx2 + 1)
for tk in tks:
print( " Check " , tk.text, " is a column" )
if self.is_column_name(tk, ctx):
print (" join col name:" , tk.text )
def enterTableview_name(self, ctx:PlSqlParser.Tableview_nameContext):
print ( " Found table name: ", ctx.getText() )
def enterColumn_name(self, ctx:PlSqlParser.Column_nameContext):
print(' Found a columnName:', ctx.getText() )
def enterTable_alias(self, ctx:PlSqlParser.Table_aliasContext):
print(' Found a table alias:', ctx.getText() )
def enterColumn_alias(self, ctx:PlSqlParser.Column_aliasContext):
print(' Found a column alias:', ctx.getText() )
sqltext = """
select t1.col1, t2,col2
from t1
join t2 on t1.id1 = t2.id1
join ( select t3.col3
from t3
where t3.col4 = 25 ) t3a on t2.col5 = t3a.col2
where t1.col5 = 2024
order by 1
"""
lexer = PlSqlLexer(antlr4.InputStream(sqltext))
parser = PlSqlParser(antlr4.CommonTokenStream(lexer))
root = parser.sql_script()
antlr4.ParseTreeWalker.DEFAULT.walk(SQLListener(), root)
I am aiming for the output to be:
Found table name: t1
Found table name: t2
Found a join on : ont1.id1=t2.id1
Check on is a column
Check is a column
Check t1 is a column
Check . is a column
Check id1 is a column
join col name: id1 <<< Missing from actual output
Check is a column
Check = is a column
Check is a column
Check t2 is a column
Check . is a column
Check id1 is a column
join col name: id1 <<< Missing from actual output
Found table name: t3
Found a table alias: t3a
Found a join on : ont2.col5=t3a.col2
Check on is a column
Check is a column
Check t2 is a column
Check . is a column
Check col5 is a column
join col name: col5 <<< Missing from actual output
Check is a column
Check = is a column
Check is a column
Check t3a is a column
Check . is a column
Check col2 is a column
join col name: col2 <<< Missing from actual output
I am thinking maybe it would be better to fashion a state machine so I can tell when I am in a JOIN clause, then break up the “table.column”, “alias.column” or “columns” token and check off against a lookup list of tables and columns. But this seems to suggest a gaping hole in the Antlr way of doing things, and makes me think there must a way to achieve what I want within the Antlr framework.