I have two Pandas dataframes, one called “swepam” and the other called “ssn_monthly_mean”. Each has a column of numpy.float64 values and a column of datetimes stored as pandas._libs.tslibs.timestamps.Timestamp objects. Both datasets span 24 years, and the data in each dataframe (proton_speed for swepam and ssn for ssn_monthly_mean) was downsampled from a higher resolution to monthly averages.
I am trying to create a 2D histogram of the swepam data (proton speed on the y-axis and time on the x-axis), with time binned per year and proton speed binned per 25 km/s. I am also trying to normalize the columns of the 2D histogram: for each time bin (column), I want the count in each y-bin divided by the total number of counts in that column, so that the values in every column sum to one. Hopefully that makes sense, but I will be happy to clarify if not. I am also trying to overplot the ssn data onto the histogram as a line plot.
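To make the normalization concrete, here is a minimal sketch using made-up counts (the array values are purely illustrative and are not my real data). Rows stand for speed bins and columns for year bins. The `where=` guard is an extra I am assuming is sensible, so that a year with no data gives zeros rather than NaN.

```python
import numpy as np

# Made-up counts for illustration: rows are speed bins, columns are year bins.
counts = np.array([
    [2.0, 0.0, 1.0],
    [6.0, 0.0, 3.0],
    [2.0, 0.0, 4.0],
])

# Total number of counts in each column (one total per year bin).
column_totals = counts.sum(axis=0)

# Divide each column by its own total. Columns with no counts are left at
# zero instead of producing NaN from 0/0.
normed = np.divide(
    counts,
    column_totals,
    out=np.zeros_like(counts),
    where=column_totals > 0,
)

# Each non-empty column now sums to one; the empty column sums to zero.
print(normed.sum(axis=0))
```

With this layout, every column that contains data sums to one, which is the behavior I am after in the real histogram.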
My problem is twofold. First, and more importantly, I can get the histogram to show up, but I cannot get the line plot to show up. Second, the color bar for the histogram keeps covering the axis titles, regardless of positioning.
The following is the code I have put together to try and make this plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
# Define the range for proton speed bins
speed_min = 245
speed_max = swepam.proton_speed.max() + 25
speed_step = 25
speed_bins = np.arange(speed_min, speed_max, speed_step)
# Define the range for years
year_min = swepam['Datetime'].dt.year.min()
year_max = swepam['Datetime'].dt.year.max()
year_bins = np.arange(year_min, year_max + 1, 1) # include the last year
# Map the 'Datetime' to 'year' as a new column for easier grouping
swepam['Year'] = swepam['Datetime'].dt.year
# Create the 2D histogram data
histogram = np.zeros((len(speed_bins)-1, len(year_bins)-1))
for i, year in enumerate(year_bins[:-1]):
    year_data = swepam[swepam['Year'] == year]
    hist, _ = np.histogram(year_data['proton_speed'], bins=speed_bins)
    histogram[:, i] = hist
# Normalize by year columns
histogram_normed = histogram / histogram.sum(axis=0)
# Create a heatmap DataFrame
heatmap_data = pd.DataFrame(
    histogram_normed,
    index=speed_bins[:-1],
    columns=year_bins[:-1]
)
# Create the Seaborn heatmap
sns.heatmap(
    heatmap_data,
    annot=False,
    fmt=".2f",
    linewidths=.5,
    cmap='viridis',
    cbar_kws=dict(
        label='Counts (column bins sum to one)',
        use_gridspec=False,
        location="bottom",
    ),
    ax=ax1,
)
ax1.set_xlabel('Year')
ax1.set_ylabel('Proton Speed (km/s)')
# Invert the y-axis to have the higher speed bins at the top
ax1.invert_yaxis()
# Resample SSN data to get the monthly mean just in case.
ssn_monthly_mean = ssn_data['ssn'].resample('M').mean().reset_index()
# Plot the SSN data using Seaborn's lineplot on the secondary y-axis (ax2)
sns.lineplot(data=ssn_monthly_mean, x='Datetime', y='ssn', ax=ax2, color="orange", label='Monthly Mean SSN')
# Set secondary y-axis limits if needed
ax2.set_ylim(ssn_monthly_mean['ssn'].min(), ssn_monthly_mean['ssn'].max())
# Set the ylabel for the secondary y-axis and create the legend
ax2.set_ylabel('SSN', color='black')
ax2.legend(loc='upper left')
# Title and layout adjustments
plt.title('Proton Speed Distribution with SSN Overlay')
#plt.tight_layout()
plt.show()
swepam and ssn_monthly_mean have been defined in earlier lines and have been verified several times to contain the correct data and data types. The following is the resultant plot:
Given that the legend is being produced, I would assume that the plot line is being recognized, but the actual data won’t show up. Sometimes, using different plotting methods will cause the line plot to appear, but the line that shows up does not match the data at all. This is exemplified in the first of the following plots. The second plot is what the line should look like.
I have attempted to use seaborn and matplotlib, and have even tried building the plots from scratch with minimal canned routines. Nothing has worked. Some methods work better than others, but some end up causing more problems, such as overflow errors from dealing with Datetime objects.
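One thing I suspect, but have not confirmed as the cause here, is a coordinate mismatch: seaborn's heatmap draws its cells at integer positions (column 0, 1, 2, ...) rather than at the year values, whereas the line plot uses real datetime x-values. If that is the issue, the timestamps would need converting to fractional column positions before plotting. Below is a minimal sketch of such a conversion. The helper name `heatmap_x_position` and the example years are hypothetical, and it assumes timezone-naive timestamps and one heatmap column per calendar year.

```python
from datetime import datetime


def heatmap_x_position(timestamp, first_year):
    """Return the fractional heatmap column position for a timestamp.

    Assumes column 0 covers first_year, column 1 the following year, and so
    on, with each cell one unit wide and its left edge at the integer index.
    """
    year_start = datetime(timestamp.year, 1, 1)
    next_year_start = datetime(timestamp.year + 1, 1, 1)
    # Fraction of the calendar year elapsed at this timestamp (0 <= f < 1).
    fraction = (timestamp - year_start) / (next_year_start - year_start)
    return (timestamp.year - first_year) + fraction


# Hypothetical example: the first heatmap column is taken to be 1998.
start_position = heatmap_x_position(datetime(1998, 1, 1), 1998)
mid_position = heatmap_x_position(datetime(1999, 7, 2, 12), 1998)
print(start_position, mid_position)
```

Because pandas Timestamp objects subclass `datetime`, the same helper should accept values taken from the 'Datetime' column, and the resulting positions could then be passed as the x-values of the line in place of the raw datetimes.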
Thank you in advance for any help you are able to give!