I have a dataframe and want to update it, or create a new dataframe, based on input from an SQL table. Dataframe A has two columns (ID and Added_Date).
The SQL table has a few more columns, including ID, Transaction_Date, Year, Month and Day. My idea is to join dataframe A to the SQL table on ID and, after the join, pick all records whose Transaction_Date falls 30 days after the Added_Date. In summary, I want a dataframe with all transactions in the SQL table that happened 30 days after the Added_Date in dataframe A. The SQL table is very large and is partitioned by Year, Month and Day. How can I optimize this process?
I understand the join can be done once the dataframe is converted to a tuple or maybe a dictionary, but I don't know how to proceed beyond that.
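
To make the goal concrete, here is a minimal sketch of the result I am after. It is not a working solution for the real table, and several things in it are assumptions for illustration only: an in-memory SQLite database stands in for the real SQL engine, the table name `transactions` and the function name `transactions_within_window` are hypothetical, Transaction_Date is assumed to be stored as ISO text (YYYY-MM-DD), and "30 days after" is read as "from the Added_Date up to and including 30 days later". The idea is to send only a date range to the database, so that the filter on the Year partition column can limit what is scanned, and then do the per-ID comparison in pandas.

```python
import sqlite3

import pandas as pd


def transactions_within_window(df_a, conn, table="transactions", days=30):
    """Return transactions dated 0..`days` days after Added_Date for the same ID.

    `table` is a hypothetical table name; it is interpolated into the query,
    so it must come from trusted code, not from user input.
    """
    df_a = df_a.copy()
    df_a["Added_Date"] = pd.to_datetime(df_a["Added_Date"])

    # Overall date range that could contain any matching transaction.
    lo = df_a["Added_Date"].min()
    hi = df_a["Added_Date"].max() + pd.Timedelta(days=days)

    # Coarse filter on the Year partition column plus an exact date range.
    # Month and Day could be filtered the same way if the engine needs it.
    query = f"""
        SELECT ID, Transaction_Date, Year, Month, Day
        FROM {table}
        WHERE Year BETWEEN ? AND ?
          AND Transaction_Date BETWEEN ? AND ?
    """
    params = [
        int(lo.year),
        int(hi.year),
        lo.strftime("%Y-%m-%d"),
        hi.strftime("%Y-%m-%d"),
    ]
    tx = pd.read_sql_query(query, conn, params=params, parse_dates=["Transaction_Date"])

    # Join on ID, then keep rows inside each ID's own window.
    merged = tx.merge(df_a, on="ID", how="inner")
    delta = merged["Transaction_Date"] - merged["Added_Date"]
    mask = (delta >= pd.Timedelta(0)) & (delta <= pd.Timedelta(days=days))
    return merged.loc[mask].reset_index(drop=True)


# Toy data standing in for the real (much larger) partitioned table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(ID INTEGER, Transaction_Date TEXT, Year INTEGER, Month INTEGER, Day INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2023-01-10", 2023, 1, 10),  # 9 days after ID 1 was added
        (1, "2023-03-15", 2023, 3, 15),  # 73 days after ID 1 was added
        (2, "2023-02-20", 2023, 2, 20),  # before ID 2 was added
        (3, "2023-01-05", 2023, 1, 5),   # ID not present in dataframe A
    ],
)
conn.commit()

df_a = pd.DataFrame({"ID": [1, 2], "Added_Date": ["2023-01-01", "2023-02-25"]})
result = transactions_within_window(df_a, conn)
print(result)
```

With this toy data, only the ID 1 transaction dated 2023-01-10 is kept. What I don't know is whether this pattern holds up on the real table, or whether the join itself should be pushed into the database.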