I am trying to add a fitted line to each diagonal plot of a PairGrid, computed separately for each “Source” from the data in vars. The code below raises “KeyError: None”.
df_combined.columns
Index(['Temp (°C)', 'RH (%)', 'CO2 (ppm)', 'PM2.5 (µg/m³)', 'SH (g/kg)',
'Source']
The df_combined looks like:
TimeSeries | Temp (°C) | RH (%) | PM2.5 (µg/m³) | SH (g/kg) | Source |
---|---|---|---|---|---|
2024-03-01 00:00:00 | 19.556857 | 52.384013 | 474.000000 | 0.794159 | BF |
2024-03-01 00:15:00 | 19.792490 | 50.691260 | 473.931034 | 0.714120 | BF |
2024-03-01 00:30:00 | 19.877775 | 49.918059 | 471.533333 | 0.676270 | BF |
2024-03-01 00:45:00 | 19.855077 | 49.405916 | 469.700000 | 0.705883 | BF |
2024-03-01 01:00:00 | 19.719011 | 48.791986 | 467.965517 | 0.713488 | BF |
2024-03-01 00:00:00 | 19.556857 | 52.384013 | 474.000000 | 0.794159 | ATF |
2024-03-01 00:15:00 | 19.792490 | 50.691260 | 473.931034 | 0.714120 | ATF |
2024-03-01 00:30:00 | 19.877775 | 49.918059 | 471.533333 | 0.676270 | ATF |
2024-03-01 00:45:00 | 19.855077 | 49.405916 | 469.700000 | 0.705883 | ATF |
2024-03-01 01:00:00 | 19.719011 | 48.791986 | 467.965517 | 0.713488 | ATF |
The code is as follows:
g = sns.PairGrid(
    data=df_combined,
    vars=['Temp (°C)', 'SH (g/kg)', 'CO2 (ppm)', 'PM2.5 (µg/m³)'],
    hue="Source",
    height=2.5, aspect=1.5,
    diag_sharey=False, despine=False
)

# Map histograms and additional line plots to the diagonal
def diagonal_with_line(xdata, **kwargs):
    ax = plt.gca()  # Get the current axis
    variable = kwargs.get('var')  # Extract the variable name
    hue = kwargs.get('hue')  # Extract the hue parameter if available
    # Plot the histogram
    sns.histplot(x=xdata, kde=False, **kwargs, ax=ax)
    ax2 = ax.twinx()
    # Add a line plot for each hue value separately
    # Add a line plot (example: normal fit)
    # Compute a fitted normal distribution PDF as an example
    for level in xdata[hue].unique():
        subset = xdata[xdata[hue] == level]
        mu, std = np.mean(subset), np.std(subset)
        x_vals = np.linspace(stats.norm.ppf(0.01, scale=std, loc=mu),
                             stats.norm.ppf(0.99, scale=std, loc=mu), 1000)
        pdf = stats.norm.pdf(x=x_vals, scale=std, loc=mu)
        ax2 = sns.lineplot(x=x_vals, y=pdf, color='red', label="normal fit", ax=ax2)

# Replace the diagonal mapping with the custom function
g.map_diag(diagonal_with_line)
# Map scatter plots to the off-diagonal plots
g.map_offdiag(sns.scatterplot)
If I remove the for loop, I can get the figure below.
[Figure: pair grid produced without the loop]
There are a couple of problems:
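- **kwargs contains no “hue” key, so kwargs.get('hue') returns None and indexing xdata with None raises the KeyError. Use “label” instead. There is no “var” key either, but you never use that variable.
- Seaborn calls diagonal_with_line twice per feature (once for each of the 2 labels), and each call receives only the values for that label. The loop therefore needs to be removed, and for the same reason there is no need for subset: use the full xdata.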
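Both points can be checked by logging what seaborn hands to the diagonal function. The sketch below is not from the original post: the demo frame, its column names, and the record_call helper are made up for illustration, and it assumes a seaborn version in which PairGrid.map_diag loops over the hue levels for functions that have no hue parameter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for df_combined: two variables and two hue levels
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "a": rng.normal(size=40),
    "b": rng.normal(size=40),
    "Source": ["BF"] * 20 + ["ATF"] * 20,
})

calls = []  # one entry per invocation of the diagonal function

def record_call(xdata, **kwargs):
    # Store the number of values, the hue label, and the keyword names received
    calls.append((len(xdata), kwargs.get("label"), sorted(kwargs)))

g = sns.PairGrid(data=demo, vars=["a", "b"], hue="Source")
g.map_diag(record_call)

for n_values, label, keys in calls:
    print(n_values, label, keys)
```

With two variables and two Source levels this should log four calls, each carrying only the values of one level, with “label” among the keyword names and neither “hue” nor “var”.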
The code below should solve both issues.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats

g = sns.PairGrid(
    data=df_combined,
    vars=['Temp (°C)', 'SH (g/kg)', 'RH (%)', 'PM2.5 (µg/m³)'],
    hue="Source",
    height=2.5, aspect=1.5,
    diag_sharey=False, despine=False
)

# Map histograms and additional line plots to the diagonal
def diagonal_with_line(xdata, **kwargs):
    # Called once per hue level; xdata holds only that level's values
    ax = plt.gca()  # Get the current axis
    variable = kwargs.get('var')  # Always None: seaborn passes no 'var' (unused)
    hue = kwargs.get('label')  # Name of the current hue level (unused below)
    # Plot the histogram
    sns.histplot(x=xdata, kde=False, **kwargs, ax=ax)
    ax2 = ax.twinx()
    # Add a line plot (example: normal fit) for the current hue level
    mu, std = np.mean(xdata), np.std(xdata)
    x_vals = np.linspace(stats.norm.ppf(0.01, scale=std, loc=mu),
                         stats.norm.ppf(0.99, scale=std, loc=mu), 1000)
    pdf = stats.norm.pdf(x=x_vals, scale=std, loc=mu)
    ax2 = sns.lineplot(x=x_vals, y=pdf, color='red', label="normal fit", ax=ax2)

# Replace the diagonal mapping with the custom function
g.map_diag(diagonal_with_line)
# Map scatter plots to the off-diagonal plots
g.map_offdiag(sns.scatterplot)