I frequently want to make a contour plot of the density of bivariate data. Sometimes the data are bimodal, like this:
import numpy as np
import matplotlib.pyplot as plt
x = np.concatenate([np.random.normal(size=1000, scale=.5),
                    np.random.normal(size=100, loc=10, scale=.1)])
y = np.random.uniform(size=1100)
plt.scatter(x, y)
Scatterplot
If I try to use seaborn’s kdeplot, I get overly smooth contours (showing just two levels, for comparison with the R version):
import seaborn as sns
import pandas as pd
plt.scatter(x, y)
sns.kdeplot(data=pd.DataFrame({'x': x, 'y': y}), x='x', y='y', color='orange', levels=[.1, .5])
Two contours of the density of (x,y)
In fact, the 1D density estimation is also overly smooth if I don’t mess with the bandwidth myself:
plt.hist(x, density=True, bins=50)
sns.kdeplot(x)
Density of x
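To put a number on "messing with the bandwidth": as far as I understand it, seaborn hands the estimate to scipy's gaussian_kde, whose default (Scott's rule) scales the bandwidth by the overall standard deviation, while R's density() defaults to bw.nrd0, which uses the smaller of the standard deviation and IQR/1.34. The sketch below compares the two rules on data like mine. The seed and variable names are my own choices for illustration, and the formulas are written out by hand rather than taken from either library.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
x = np.concatenate([rng.normal(scale=0.5, size=1000),
                    rng.normal(loc=10, scale=0.1, size=100)])

n = x.size
sd = x.std(ddof=1)                      # inflated by the gap between the modes
q25, q75 = np.percentile(x, [25, 75])
iqr = q75 - q25                         # barely affected by the small far cluster

# Scott's rule for 1D data: sd * n^(-1/5)
scott_bw = sd * n ** (-1 / 5)
# R's bw.nrd0: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)
nrd0_bw = 0.9 * min(sd, iqr / 1.34) * n ** (-1 / 5)

# Factor by which the Scott bandwidth would have to shrink to match bw.nrd0
ratio = nrd0_bw / scott_bw
print(f"Scott: {scott_bw:.3f}  nrd0: {nrd0_bw:.3f}  ratio: {ratio:.3f}")
```

If that reasoning is right, passing the ratio as a multiplier, as in sns.kdeplot(x=x, bw_adjust=ratio), should give a 1D curve much closer to the R one, though I would treat that as a starting point rather than an exact match.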
This is the equivalent in R:
x = c(rnorm(1000, sd=.5), rnorm(100, 10, .1))
y = runif(1100)
plot(x, y)
plot(x, y, cex=.5, xlim=c(-4, 12), ylim=c(-.2, 1.2))
contour(MASS::kde2d(x, y, n=100, lims=c(c(-4, 12), c(-.2, 1.2))), col='red',
        add=TRUE, levels=c(.1, .5))
hist(x, freq=F, breaks=50)
lines(density(x), col='red')
Density of x and two contours of the density of (x,y)
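For comparison, here is my attempt at a NumPy sketch of what MASS::kde2d appears to do by default: a product of Gaussian kernels with a separate bandwidth per axis, each taken from the bandwidth.nrd rule and divided by 4. The function names bandwidth_nrd and kde2d_like are my own, and this is my reading of the R code rather than a verified port.

```python
import numpy as np

def bandwidth_nrd(v):
    """Normal-reference bandwidth in the style of MASS::bandwidth.nrd."""
    q25, q75 = np.percentile(v, [25, 75])
    robust_sd = (q75 - q25) / 1.34
    return 4 * 1.06 * min(v.std(ddof=1), robust_sd) * v.size ** (-1 / 5)

def norm_pdf(u):
    """Standard normal density."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde2d_like(x, y, n=100, lims=None):
    """Product-Gaussian KDE on an n-by-n grid (hypothetical helper)."""
    if lims is None:
        lims = (x.min(), x.max(), y.min(), y.max())
    gx = np.linspace(lims[0], lims[1], n)
    gy = np.linspace(lims[2], lims[3], n)
    hx = bandwidth_nrd(x) / 4           # per-axis kernel standard deviations
    hy = bandwidth_nrd(y) / 4
    kx = norm_pdf((gx[:, None] - x[None, :]) / hx)   # shape (n, len(x))
    ky = norm_pdf((gy[:, None] - y[None, :]) / hy)   # shape (n, len(y))
    z = kx @ ky.T / (x.size * hx * hy)  # z[i, j] is the density at (gx[i], gy[j])
    return gx, gy, z

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
x = np.concatenate([rng.normal(scale=0.5, size=1000),
                    rng.normal(loc=10, scale=0.1, size=100)])
y = rng.uniform(size=1100)

gx, gy, z = kde2d_like(x, y, n=100, lims=(-4, 12, -0.2, 1.2))
```

The grid can then be drawn with plt.contour(gx, gy, z.T, levels=[.1, .5], colors='red'). The transpose is needed because matplotlib expects rows to correspond to y.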
I am no student of density estimation, but the R version is doing what I would expect and the seaborn version is not. Is there a way to get seaborn’s kdeplot to produce contours similar to what I am getting in R? Or is there a better way with another established plotting tool in Python?