Hi guys, I'm currently migrating my local ETL pipeline to AWS, using the Amazon MWAA service. I have a problem with loading my retrieved data into an Excel file that sits inside my S3 bucket. In my local pipeline I use openpyxl to manipulate the Excel file; below is the code for my local ETL pipeline (this function loads the data into an Excel file in my local folder):
import openpyxl

def export_andbeyond_to_excel(ti):
    df = ti.xcom_pull(task_ids='transform_andbeyond')
    excel_file = ti.xcom_pull(task_ids='latest_file')
    master_wb = openpyxl.load_workbook(excel_file)
    master_sheets = master_wb.active  # open the active sheet
    table = master_sheets['Q71':'Q89']  # select all cells in column Q from row 71 to row 89
    for i, row in enumerate(table, start=70):  # start=70 so Q71 maps to spreadsheet row 71 via i+1
        site = row[0].value
        if site in df['Sites'].values:
            index = df.index[df['Sites'] == site].tolist()[0]
            master_sheets[f'R{i+1}'] = df.at[index, 'Impressions']
            master_sheets[f'S{i+1}'] = df.at[index, 'Revenue']
        else:
            master_sheets[f'R{i+1}'] = 0
            master_sheets[f'S{i+1}'] = 0
    # Save the workbook back to the same local path
    master_wb.save(excel_file)
I want to do the same thing on AWS, where the Excel file is in the S3 bucket. The 'latest_file' value comes from another task that locates the most recently created file, and 'export_andbeyond_to_excel' then loads the data into that file; for the S3 version, I imagine the lookup would become something like the sketch below.
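This is only a rough sketch of what I have in mind for the S3 side of 'latest_file' (the bucket name and prefix are placeholders, and I'm assuming boto3 is available on the MWAA workers, since it ships with the environment):

import boto3

def find_latest_file(bucket='my-bucket', prefix='reports/'):  # placeholder bucket/prefix
    """Return the key of the most recently modified .xlsx object under the prefix."""
    s3 = boto3.client('s3')
    objects = s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get('Contents', [])
    xlsx = [o for o in objects if o['Key'].endswith('.xlsx')]
    return max(xlsx, key=lambda o: o['LastModified'])['Key']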
Can anybody suggest how I can do the same thing when moving from my local setup to the cloud platform?
If possible, I don't want to download the file from S3 to local disk and then upload it back to S3 again.
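From what I understand, S3 objects can't be modified in place, so the closest thing I can think of is doing the round trip entirely in memory with io.BytesIO instead of touching the local filesystem. This is just a sketch of what I mean (the bucket name is a placeholder, and the cell updates are the same as in my local version):

import io
import boto3
import openpyxl

def export_andbeyond_to_excel(ti, bucket='my-bucket'):  # placeholder bucket name
    df = ti.xcom_pull(task_ids='transform_andbeyond')
    key = ti.xcom_pull(task_ids='latest_file')  # now an S3 key instead of a local path

    s3 = boto3.client('s3')
    # Read the object straight into memory; nothing is written to local disk
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    master_wb = openpyxl.load_workbook(io.BytesIO(body))
    master_sheets = master_wb.active

    # Same cell updates as in the local version
    table = master_sheets['Q71':'Q89']
    for i, row in enumerate(table, start=70):
        site = row[0].value
        if site in df['Sites'].values:
            index = df.index[df['Sites'] == site].tolist()[0]
            master_sheets[f'R{i+1}'] = df.at[index, 'Impressions']
            master_sheets[f'S{i+1}'] = df.at[index, 'Revenue']
        else:
            master_sheets[f'R{i+1}'] = 0
            master_sheets[f'S{i+1}'] = 0

    # Save to an in-memory buffer and overwrite the S3 object
    buffer = io.BytesIO()
    master_wb.save(buffer)
    buffer.seek(0)
    s3.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())

Is this in-memory pattern the right way to do it on MWAA, or is there a better approach?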