I have python dictionary with key values as table and columns. In below example, sec_identifier, sec_profile and sec_sch are tables and id_1, amount, name, interest and price, ukey, date are columns.
my_dictionary = {
"sec_identifier": ["ukey", "date", "id_1"],
"sec_profile": ["ukey", "date", "amount", "name"],
"sec_sch": ["ukey", "date", "interest","price"]
}
I wanted to create SELECT statement dynamically based on dictionary using python code.
ukey and dt are my primary key. This is what I wanted to create from above dictionary.
SELECT si.id_1, sp.amount, sp.name, sc.interest, sc.price FROM sec_profile sp
INNER JOIN sec_identifier si ON (sp.ukey, sp.date) = (si.ukey, si.date)
INNER JOIN sec_sch sc ON (sp.ukey, sp.date) = (sc.ukey, sc.date)
INNER JOIN will shrink or grow based on my dictionary tables.
I tried keeping INNER JOIN statements same and just dynamically replace si.id_1, sp.amount, sp.name, sc.interest, sc.price based on my dictionary. However I was not successful to generate entire INNER JOIN dynamically.
Sachin is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
3
This does the transformation with one big assumption that only the columns that are common across all the tables will be used for the join and the rest are just values that you want.
The generated sql differs that
- the table aliases are just t1, t2,t3,..
- and that the join conditions are not tuples, just
AND
s - all the joins are to t1, not the prior table.
you could probably fix that up, but i don’t think that would make the sqlengine any happier. Although you’d probably want to replace the t1,t2,t3 with a more predicable alias scheme,depending on what you are doing with the result. Technically this will be stable since python has ordereddict’s by default but that may not work for you in practice.
from pprint import pprint
def transform(input_dict):
# count the times each column appears
column_counts = {}
for columns in input_dict.values():
for column in columns:
column_counts[column] = column_counts.get(column, 0) + 1
# the join columns will be the ones that show up in every table
matches = [
column for column, count in column_counts.items() if count == len(input_dict)
]
tables = []
for table_name, columns in input_dict.items():
alias = f't{len(tables)+1}'
filtered_columns = [column for column in columns if column not in matches]
tables.append((alias, table_name, filtered_columns))
return tables, matches
def generate_sql(tables, matches):
select_clause = ', '.join(
f'{alias}.{column}' for alias, _, columns in tables for column in columns
)
from_clause = f'FROM {tables[0][1]} t1'
join_clauses = []
for alias, table_name, _ in tables[1:]:
# Join all tables to the first table (t1)
join_conditions = ' AND '.join(
f't1.{match} = {alias}.{match}' for match in matches
)
join_clause = f'INNER JOIN {table_name} {alias} ON {join_conditions}'
join_clauses.append(join_clause)
return f'SELECT {select_clause} {from_clause}n ' + 'n '.join(join_clauses)
if __name__ == '__main__':
input_dict = {
'sec_identifier': ['ukey', 'date', 'id_1'],
'sec_profile': ['ukey', 'date', 'amount', 'name'],
'sec_sch': ['ukey', 'date', 'interest', 'price'],
}
tables, matches = transform(input_dict)
sql_query = generate_sql(tables, matches)
print('Original Dictionary:')
pprint(input_dict)
print('nTransformed Dictionary:')
pprint(tables)
print('nMatches:')
pprint(matches)
print('nGenerated SQL Query:')
print(sql_query)
Returns:
$ python sqlgen.py
Original Dictionary:
{'sec_identifier': ['ukey', 'date', 'id_1'],
'sec_profile': ['ukey', 'date', 'amount', 'name'],
'sec_sch': ['ukey', 'date', 'interest', 'price']}
Transformed Dictionary:
[('t1', 'sec_identifier', ['id_1']),
('t2', 'sec_profile', ['amount', 'name']),
('t3', 'sec_sch', ['interest', 'price'])]
Matches:
['ukey', 'date']
Generated SQL Query:
SELECT t1.id_1, t2.amount, t2.name, t3.interest, t3.price FROM sec_identifier t1
INNER JOIN sec_profile t2 ON t1.ukey = t2.ukey AND t1.date = t2.date
INNER JOIN sec_sch t3 ON t1.ukey = t3.ukey AND t1.date = t3.date
$
1