I originally developed a model to work with a df as df1_ref. Now, I found more data I need in a different format as df2.
df1_ref:
id country point_id lon lat value_A
10195 United States 0 -98.52838821 35.33514183 22
10195 United States 1 -98.52838821 35.33514183 22
10195 United States 2 -98.52838821 35.33514183 22
10195 United States 3 -98.6100762 35.34134382 22
557 United States 0 -98.6100762 35.34134382 33
557 United States 1 -98.68102873 35.32256443 33
557 United States 2 -98.68102873 35.32256443 33
11354 United States 0 -95.4872305 31.85227725 22
11354 United States 1 -95.4872305 31.85227725 22
11354 United States 2 -95.474907 31.90126317 22
10131 United States 0 -98.7424264 35.777792 25
10131 United States 1 -98.58307728 35.74085314 25
10131 United States 2 -98.58307728 35.74085314 25
df2:
id country point_id_0 mid_points last_point_n route_id value_B
[10195, 557, 10123, 10105] United States [35.31514548, -98.46788543] [[35.33514183, -98.52838821], [35.34134382, -98.6100762]] [35.50457366645043, -98.98023683317503] 883 3500.69
[10029, 10027, 11354, 11355] United States [31.77540736, -95.11453227] [[31.83779617, -95.1591275], [31.89157317, -95.20337925]] [31.3187429993161, -95.450949000766] 1156 105.9092
[10131, 10204, 553, 2855] United States [35.77292283, -98.69682372] [[35.777792, -98.7424264], [35.74085314, -98.58307728]] [35.7434880000976, -99.2341759999093] 1186 4478.04
I need to convert df2 into df1_ref for me to run my previous model.
Here I explain what I need:
Various id in df2 come in the same observation, e.g., [10195, 557, 10123, 10105…]; I simplify to 4 id’s for the example. So, I need to split it into rows. I share some examples as how it would look like in df1.
point_id_0 from df2 is lat and lon of point_id = 0 of df1. So, I need to split point_id_0 in lat and lon columns, respectively.
mid_points from df2 are the next coordinates lat/lot of my network, each pair of coordinates belongs to one id in the same order (1,2,3…). So, first mid_points of route_id is id = 557, then to become 1 in point_id.
last_point_n is the last coordinate, belonging to the last id, e.g., 10105; point_id = 3.
Therefore, df3 is built from df2, considering df1 as a reference. So, the final df3 is as df1 + value_B + route_id.