Thiết kế website giá rẻ

Question

<code>data1 = {

'alias_cd': ['12345', '12345', '12345'],

'country_cd': ['AU', 'AU', 'AU2'],

'pos_name': ['st1', 'Jh', 'Jh'],

'ts_allocated': [100, 100, 100],

'tr_id': ['None', 'None', 'None'],

'ty_name': ['E2E', 'E2E', 'E2E']

}

data2 = {

'alias_cd': ['12345', '12345'],

'country_cd': ['AU', 'AU3'],

'pos_name': ['st1', 'st2'],

'ts_allocated': [200, 100],

'tr_id': ['None', 'None'],

'ty_name': ['E2E', 'E2E']

}

df1 = pd.DataFrame(data1)

df2 = pd.DataFrame(data2)

</code>

<code>data1 = { 'alias_cd': ['12345', '12345', '12345'], 'country_cd': ['AU', 'AU', 'AU2'], 'pos_name': ['st1', 'Jh', 'Jh'], 'ts_allocated': [100, 100, 100], 'tr_id': ['None', 'None', 'None'], 'ty_name': ['E2E', 'E2E', 'E2E'] } data2 = { 'alias_cd': ['12345', '12345'], 'country_cd': ['AU', 'AU3'], 'pos_name': ['st1', 'st2'], 'ts_allocated': [200, 100], 'tr_id': ['None', 'None'], 'ty_name': ['E2E', 'E2E'] } df1 = pd.DataFrame(data1) df2 = pd.DataFrame(data2) </code>

data1 = {
    'alias_cd': ['12345', '12345', '12345'],
    'country_cd': ['AU', 'AU', 'AU2'],
    'pos_name': ['st1', 'Jh', 'Jh'],
    'ts_allocated': [100, 100, 100],
    'tr_id': ['None', 'None', 'None'],
    'ty_name': ['E2E', 'E2E', 'E2E']
}

data2 = {
    'alias_cd': ['12345', '12345'],
    'country_cd': ['AU', 'AU3'],
    'pos_name': ['st1', 'st2'],
    'ts_allocated': [200, 100],
    'tr_id': ['None', 'None'],
    'ty_name': ['E2E', 'E2E']
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

output should be

<code> alias_cd country_cd pos_name ts_allocated tr_id ty_name etl_flag

1 12345 AU st1 200 None E2E U

2 12345 AU3 st2 100 None E2E D

3 12345 AU st1 100 None E2E I

4 12345 AU Jh 100 None E2E I

5 12345 AU2 Jh 100 None E2E I

</code>

<code> alias_cd country_cd pos_name ts_allocated tr_id ty_name etl_flag 1 12345 AU st1 200 None E2E U 2 12345 AU3 st2 100 None E2E D 3 12345 AU st1 100 None E2E I 4 12345 AU Jh 100 None E2E I 5 12345 AU2 Jh 100 None E2E I </code>

   alias_cd country_cd pos_name  ts_allocated          tr_id    ty_name  etl_flag
1  12345            AU  st1      200                   None      E2E     U
2  12345           AU3  st2      100                   None      E2E     D
3  12345            AU  st1      100                   None      E2E     I
4  12345            AU  Jh       100                   None      E2E     I
5  12345           AU2  Jh       100                   None      E2E     I

Because:

The combination of alias_cd and country_cd acts as a primary key.

1.If a combination exists in df2 and df1 (12345 AU), it will be marked for ‘Update’ in df2, and all corresponding rows in df1 for the same combination will be marked as ‘Insert’. for the above example for 12345 AU records in df2 will be etl_flag= ‘Update’ and add records for the same combination from df1 to df2 with etl_flag as ‘Insert’

2.12345 AU3 exists in df2 but not in df1, so it will be tagged as ‘DELETE’ in the etl_flag column.

3.Any new combination that appears in df1 and not present in df2 will be tagged as ‘Insert’ in the etl_flag column.

How can I achieve this efficiently?
This is what I tried but its doesn’t give correct output:

<code>df2['etl_flag'] = 'U'

to_insert = df1[~df1.apply(lambda x: (df2['alias_cd'] == x['alias_cd']) & (df2['country_cd'] == x['country_cd']), axis=1)]

to_insert['etl_flag'] = 'I'

df2 = pd.concat([df2, to_insert], ignore_index=True)

to_delete = df2[~df2.apply(lambda x: (df1['alias_cd'] == x['alias_cd']) & (df1['country_cd'] == x['country_cd']), axis=1)]

to_delete['etl_flag'] = 'D'

final_df = pd.concat([df2, to_delete], ignore_index=True)

final_df.sort_values(by=['alias_cd', 'country_cd'], inplace=True)

print(final_df[['alias_cd', 'country_cd', 'pos_name', 'ts_allocated', 'tr_id', 'ty_name', 'etl_flag']])

</code>

<code>df2['etl_flag'] = 'U' to_insert = df1[~df1.apply(lambda x: (df2['alias_cd'] == x['alias_cd']) & (df2['country_cd'] == x['country_cd']), axis=1)] to_insert['etl_flag'] = 'I' df2 = pd.concat([df2, to_insert], ignore_index=True) to_delete = df2[~df2.apply(lambda x: (df1['alias_cd'] == x['alias_cd']) & (df1['country_cd'] == x['country_cd']), axis=1)] to_delete['etl_flag'] = 'D' final_df = pd.concat([df2, to_delete], ignore_index=True) final_df.sort_values(by=['alias_cd', 'country_cd'], inplace=True) print(final_df[['alias_cd', 'country_cd', 'pos_name', 'ts_allocated', 'tr_id', 'ty_name', 'etl_flag']]) </code>

df2['etl_flag'] = 'U'

to_insert = df1[~df1.apply(lambda x: (df2['alias_cd'] == x['alias_cd']) & (df2['country_cd'] == x['country_cd']), axis=1)]
to_insert['etl_flag'] = 'I'
df2 = pd.concat([df2, to_insert], ignore_index=True)

to_delete = df2[~df2.apply(lambda x: (df1['alias_cd'] == x['alias_cd']) & (df1['country_cd'] == x['country_cd']), axis=1)]
to_delete['etl_flag'] = 'D'

final_df = pd.concat([df2, to_delete], ignore_index=True)
final_df.sort_values(by=['alias_cd', 'country_cd'], inplace=True)

print(final_df[['alias_cd', 'country_cd', 'pos_name', 'ts_allocated', 'tr_id', 'ty_name', 'etl_flag']])

Thiết kế website giá rẻ

Danh mục

How to efficiently compare 2 dataframe and produce a column value based on condition