I want to merge two CSV files from the open Bixi dataset. The problem is that after the outer merge, a row that exists in the left dataset appears to be missing:
In [148]: outer_merged_df['Code']==7150
Out[148]:
0 False
1 False
2 False
3 False
4 False
...
1045584 False
1045585 False
1045586 False
1045587 False
1045588 False
Name: Code, Length: 1045589, dtype: bool
But this row is present in the left dataset:
In [151]: df['Code']==7150
...
615 True
Here is the code for the outer merge:
outer_merged_df = pd.merge(df, df_ride, left_on='Code', right_on='start_station_code', how='outer', indicator=True)
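Because `indicator=True` adds a `_merge` column, the origin of each row can be counted directly. Below is a minimal sketch on made-up data; the `stations` and `rides` frames and their values are hypothetical stand-ins, not the real Bixi files:

```python
import pandas as pd

# Hypothetical stand-ins for Stations_2019.csv and OD_2019-08.csv
stations = pd.DataFrame({"Code": [7150, 6001, 6002]})
rides = pd.DataFrame({"start_station_code": [6001, 6001, 6003]})

merged = pd.merge(
    stations, rides,
    left_on="Code", right_on="start_station_code",
    how="outer", indicator=True,
)

# _merge is one of: "both", "left_only", "right_only"
counts = merged["_merge"].value_counts()
print(counts)

# Rows that exist only in the left (stations) frame
print(merged[merged["_merge"] == "left_only"])
```

On this toy data, a station with no rides should still show up in the outer merge, marked `left_only`.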
Here is the code to read the Bixi rides and the stations:
df_ride = pd.read_csv('OD_2019-08.csv')
df = pd.read_csv('Stations_2019.csv')
And here is the link to the CSV files. If you download them, please use the August 2019 file.
When I do a left merge, the row is found:
In [154]: merged_df_left=pd.merge(df, df_ride, left_on='Code', right_on='start_station_code', how='left')
In [155]: merged_df_left['Code']==7150
Out[155]:
913466 True
Name: Code, Length: 913470, dtype: bool
This is extremely confusing. Can someone please give a hint?
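One thing I am not sure about is whether the printed output can be trusted, since pandas only shows the first and last few rows of a long Series and elides the middle. Here is a small sketch of a check that does not depend on the display, using a made-up frame (the `merged` name and its values are hypothetical):

```python
import pandas as pd

# Hypothetical frame, long enough (100 rows) that pandas truncates its printed form
merged = pd.DataFrame({"Code": list(range(6000, 6100))})
merged.loc[50, "Code"] = 7150  # the matching row sits in the middle

mask = merged["Code"] == 7150
print(mask)  # head and tail are all False; the middle is elided

found = mask.any()   # True if at least one row matches
n = int(mask.sum())  # number of matching rows
rows = merged[mask]  # the matching rows themselves
print(found, n)
print(rows)
```

If `outer_merged_df['Code'].eq(7150).any()` returns `True` on the real data, then the row was never missing and only the truncated display made it look that way.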